Configure an End-to-End Pipeline for EchoKit
In addition to the classic ASR-LLM-TTS pipeline, EchoKit supports real-time models that can reduce latency. However, this approach has several limitations:
- High API costs – Real-time speech APIs such as OpenAI's are priced well above text-only models, so check the provider's current rates before committing
- No voice customization – You cannot modify the generated voice
- Limited knowledge integration – External knowledge bases cannot be added to the model
- No MCP support – Model Context Protocol (MCP) is not supported in most cases
Prerequisites
Before setting up your end-to-end pipeline, ensure you have:
- EchoKit server source code – Follow the guide if you haven't already
- Gemini API key – Obtain from Google AI Studio
- TTS service (optional) – Must be running if you want custom voice synthesis
Gemini API Setup
Google's Gemini is one of the most advanced models supporting voice-to-voice interactions, and EchoKit fully supports it. For detailed implementation, see the Gemini example.
Getting Your Gemini API Key
- Visit Google AI Studio
- Sign in with your Google account
- Navigate to "Get API Key" in the left sidebar
- Create a new API key for your project
- Copy the key – you'll need it for the configuration
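If you would rather not paste the key into a file by hand, one option is to keep it in an environment variable and substitute it into the config text. The sketch below is only an illustration: the variable name `GEMINI_API_KEY` and the helper `fill_api_key` are assumptions made for this example, not something EchoKit itself reads or provides.

```python
import os

# Placeholder string used in the sample configuration.
PLACEHOLDER = "your_api_key_here"


def fill_api_key(config_text: str, env_var: str = "GEMINI_API_KEY") -> str:
    """Return config_text with the placeholder replaced by the key from env_var.

    The environment variable name is a convention chosen for this sketch.
    """
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"set {env_var} before generating the config")
    if PLACEHOLDER not in config_text:
        raise ValueError("placeholder not found in config text")
    return config_text.replace(PLACEHOLDER, key)
```

Write the returned text to your config.toml, and keep that file out of version control since it now contains a real key.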
Basic Configuration
Here's the complete configuration file for Gemini:
addr = "0.0.0.0:9090"
hello_wav = "hello.wav"
[gemini]
api_key = "your_api_key_here"
[[gemini.sys_prompts]]
role = "system"
content = """
You are a helpful assistant. Please answer user questions as concisely as possible while being accurate and truthful. Use short sentences. Try to be humorous and light-hearted.
"""
Starting the Server
After editing the configuration file, restart the EchoKit server to apply the changes.
Since you're using a different config.toml file in a custom path, your restart command should look like this:
./target/release/echokit_server ./examples/gemini/chat/config.toml
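Once the server has been started, a quick way to confirm that it is accepting connections is to probe the TCP port from the `addr` setting. This is a generic sketch, not an EchoKit tool: `wait_for_port` is a hypothetical helper, and it only shows that something is listening on the port, not that the Gemini session itself is healthy.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 10.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means a process is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Not listening yet (or refused); wait briefly and retry.
            time.sleep(0.2)
    return False
```

With `addr = "0.0.0.0:9090"`, calling `wait_for_port("127.0.0.1", 9090)` from the same machine should return `True` shortly after startup.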
Gemini + TTS (Custom Voice)
While real-time models typically don't allow voice customization, EchoKit enables you to customize the voice even when using Gemini!
Configuration with Custom TTS
Simply add TTS-related parameters to your config.toml file:
addr = "0.0.0.0:9090"
hello_wav = "hello.wav"
[gemini]
api_key = "your_api_key_here"
[tts]
platform = "StreamGSV"
url = "http://localhost:9094/v1/audio/stream_speech"
speaker = "cooper"
[[gemini.sys_prompts]]
role = "system"
content = """
You are a helpful assistant. Please answer user questions as concisely as possible while being accurate and truthful. Use short sentences. Try to be humorous and light-hearted.
"""
With these TTS settings configured, you can now use your preferred custom voice.