Configure an End-to-End Pipeline for EchoKit

In addition to the classic ASR-LLM-TTS pipeline, EchoKit supports real-time, voice-to-voice models, which skip the separate transcription and synthesis stages and can therefore reduce latency. However, this approach has several limitations:

High API costs – OpenAI's real-time API can cost up to $25 per 1M tokens
No voice customization – You cannot modify the generated voice
Limited knowledge integration – External knowledge bases cannot be added to the model
No MCP support – Model Context Protocol is not supported in most cases

Prerequisites

Before setting up your end-to-end pipeline, ensure you have:

EchoKit server source code – Follow the guide if you haven't already
Gemini API key – Obtain from Google AI Studio
TTS service running (optional) – If using custom voice synthesis

Gemini API Setup

Google's Gemini is one of the most advanced models supporting voice-to-voice interactions, and EchoKit fully supports it. For detailed implementation, see the Gemini example.

Getting Your Gemini API Key

Visit Google AI Studio
Sign in with your Google account
Navigate to "Get API Key" in the left sidebar
Create a new API key for your project
Copy the key – you'll need it for the configuration

Basic Configuration

Here's the complete configuration file for Gemini:

addr = "0.0.0.0:9090"
hello_wav = "hello.wav"

[gemini]
api_key = "your_api_key_here"  # replace with the key from Google AI Studio

[[gemini.sys_prompts]]
role = "system"
content = """
You are a helpful assistant. Please answer user questions as concisely as possible while being accurate and truthful. Use short sentences. Try to be humorous and light-hearted.
"""

Starting the Server

After editing the configuration file, restart the EchoKit server to apply the changes.

Because this configuration lives at a custom path rather than in the default config.toml, pass that path as the argument when you restart:

./target/release/echokit_server ./examples/gemini/chat/config.toml
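Once the server is up, a quick TCP check can confirm that something is listening on the configured `addr`. This is a minimal sketch under assumptions: `is_listening` is a hypothetical helper, it tests only that the port accepts connections (not that EchoKit itself is healthy), and port `9090` is taken from the example `addr` above.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 0.0.0.0 in the config means "all interfaces"; connect via loopback.
    print("listening" if is_listening("127.0.0.1", 9090) else "not reachable")
```

If the check fails right after a restart, the server may still be starting, so retrying after a short pause is reasonable.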

Gemini + TTS (Custom Voice)

While real-time models typically don't allow voice customization, EchoKit enables you to customize the voice even when using Gemini!

Configuration with Custom TTS

Add a [tts] section to your config.toml file:

addr = "0.0.0.0:9090"
hello_wav = "hello.wav"

[gemini]
api_key = "your_api_key_here"  # replace with the key from Google AI Studio

[tts]
platform = "StreamGSV"
url = "http://localhost:9094/v1/audio/stream_speech"
speaker = "cooper"

[[gemini.sys_prompts]]
role = "system"
content = """
You are a helpful assistant. Please answer user questions as concisely as possible while being accurate and truthful. Use short sentences. Try to be humorous and light-hearted.
"""

With these TTS settings configured, you can now use your preferred custom voice.
