# EchoKit server config options
The EchoKit server orchestrates multiple AI services to turn user voice input into voice responses. It supports two general approaches.
- The pipeline approach divides the task into multiple steps and uses a different AI service for each step.
  - The ASR service turns the user's voice audio into text.
  - The LLM service generates a text response to the user input. The LLM can be aided by built-in tools, such as web searches, and by custom tools in MCP servers.
  - The TTS service converts the response text to voice.
- The end-to-end real-time model approach uses multimodal models that can directly ingest voice input and generate voice output, such as Google Gemini Live.
The pipeline approach offers greater flexibility and customization: you can choose any voice, control costs by mixing different providers, integrate external knowledge, and run components locally for privacy. While end-to-end models can reduce latency, the classic pipeline gives you full control over each component.
You can configure how those AI services work together through the EchoKit server's `config.toml` file.
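As a quick orientation, the overall shape of `config.toml` looks roughly like the sketch below. The values shown are placeholders; the actual fields in each section are covered in the chapters that follow.

```toml
# Server settings (listening address and optional welcome audio)
addr = "0.0.0.0:8080"
hello_wav = "hello.wav"

# Voice-to-text settings
[asr]

# Large language model settings, including tools and MCP actions
[llm]

# Text-to-voice settings
[tts]
```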
## Prerequisites
- A running EchoKit server. Follow the quick start guide if needed.
- API keys for your favorite AI API providers (OpenAI, Groq, xAI, OpenRouter, ElevenLabs, Gemini, etc.)
## Configure server address and welcome audio

```toml
addr = "0.0.0.0:8080"
hello_wav = "hello.wav"
```

- `addr`: The server's listening address and port.
  - Use `0.0.0.0` to accept connections from any network interface.
  - Make sure that your firewall allows incoming connections to the port (`8080` in this example).
- `hello_wav`: Optional welcome audio file played when a device connects.
  - It supports the 16kHz WAV format.
  - Make sure that the file is in the same folder as `config.toml`.
## Configure AI services

The rest of the `config.toml` file specifies how to use different AI services. Each service will be covered in its own chapter.
- The `[asr]` section configures the voice-to-text services.
- The `[llm]` section configures the large language model services, including tools and MCP actions.
- The `[tts]` section configures the text-to-voice services.
It is important to note that each of those sections has the following fields.
- A `platform` field that designates the service protocol. A common example is `openai` for OpenAI-compatible API endpoints.
- A `url` field for the service endpoint. It is typically an `https://` or `wss://` URL; the latter is the WebSocket address for streaming services.
- Optional fields that are specific to the `platform`, such as `api_key`, `model`, and others.
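Putting those fields together, a single service section might look like the sketch below. The endpoint, key, and model values here are placeholders for illustration, not a real provider.

```toml
# Hypothetical ASR section; url, api_key, and model values are placeholders
[asr]
platform = "openai"   # speak the OpenAI-compatible protocol
url = "https://api.example.com/v1/audio/transcriptions"   # https:// or wss:// endpoint
api_key = "your_api_key_here"   # platform-specific optional field
model = "example-whisper-model" # platform-specific optional field
```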
## Complete Configuration Example
You will need a free API key from Groq.
```toml
# Server settings
addr = "0.0.0.0:8080"
hello_wav = "hello.wav"

# Speech recognition using the OpenAI transcriptions API, but hosted by Groq (instead of OpenAI)
[asr]
platform = "openai"
url = "https://api.groq.com/openai/v1/audio/transcriptions"
lang = "en"
api_key = "gsk_your_api_key_here"
model = "whisper-large-v3-turbo"

# Language model using the OpenAI chat completions API, but hosted by Groq (instead of OpenAI)
[llm]
platform = "openai_chat"
url = "https://api.groq.com/openai/v1/chat/completions"
api_key = "gsk_your_api_key_here"
model = "gpt-oss-20b"
history = 10

# Text-to-speech using the OpenAI speech API, but hosted by Groq (instead of OpenAI)
[tts]
platform = "openai"
url = "https://api.groq.com/openai/v1/audio/speech"
api_key = "gsk_your_api_key_here"
model = "playai-tts"
voice = "Cooper-PlayAI"

# System personality
[[llm.sys_prompts]]
role = "system"
content = """
Your name is EchoKit, a helpful AI assistant. Provide clear, concise responses and maintain a friendly, professional tone. Keep answers brief but informative.
"""
```