
Day 11: Switching EchoKit’s LLM to Groq — And Experiencing Real Speed | The First 30 Days with EchoKit

· 3 min read

Over the past few days, we’ve been exploring how flexible EchoKit really is — from changing the welcome voice and boot screen to swapping between different ASR providers like Groq Whisper, OpenAI Whisper, and local models.

This week, we shifted our focus to the LLM part of the pipeline. After trying OpenAI and OpenRouter, today we’re moving on to something exciting — Groq, known for its incredibly fast inference.

Why Groq? Speed. Real, noticeable speed.

Groq runs Llama and other open source models on its LPU™ hardware, which is built specifically for fast inference. When you pair Groq with EchoKit:

  • Responses feel snappier
  • Interactions become smoother

If you want your EchoKit to feel ultra responsive, Groq is one of the best providers to try.

How to Use Groq as Your EchoKit LLM Provider

Just like yesterday’s setup, all changes happen in the config.toml file of your EchoKit server.

Step 1 — Update your LLM section

Locate the [llm] section and replace the existing LLM provider with something like:

[llm]
chat_url = "https://api.groq.com/openai/v1/chat/completions"
api_key = "YOUR_GROQ_API_KEY"
model = "openai/gpt-oss-120b"
history = 5

Replace the LLM endpoint URL, API key, and model name. The production models from Groq are llama-3.1-8b-instant, llama-3.3-70b-versatile, meta-llama/llama-guard-4-12b, openai/gpt-oss-120b, and openai/gpt-oss-20b.
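Before restarting, you can sanity-check these values outside EchoKit. The sketch below (Python, not part of EchoKit itself) builds the standard OpenAI-compatible chat-completions request that Groq expects — the URL, key placeholder, and model mirror the [llm] section above:

```python
import json

def build_chat_request(chat_url, api_key, model, user_text, history=None):
    """Build an OpenAI-compatible chat-completions request for any provider
    (Groq here) that follows the same API shape."""
    messages = list(history or []) + [{"role": "user", "content": user_text}]
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"model": model, "messages": messages})
    return chat_url, headers, body

# Values mirror the [llm] section above.
url, headers, body = build_chat_request(
    "https://api.groq.com/openai/v1/chat/completions",
    "YOUR_GROQ_API_KEY",
    "openai/gpt-oss-120b",
    "Hello!",
)
print(json.loads(body)["model"])
```

POST the body to the URL with any HTTP client (e.g. urllib.request); a 200 response confirms your key and model name are valid.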

Step 2 — Restart your EchoKit server

After editing the config.toml, you will need to restart your EchoKit server.

Docker users:

docker run --rm \
-p 8080:8080 \
-v $(pwd)/config.toml:/app/config.toml \
secondstate/echokit:latest-server-vad &

Or restart the Rust binary if you’re running it locally.

# Enable debug logging
export RUST_LOG=debug

# Run the EchoKit server in the background
nohup target/release/echokit_server &

Then return to the setup page, pair the device if needed. You should immediately feel the speed difference — especially on follow-up questions.


A Few Tips for Groq Users

  • Groq works best with Llama models
  • You can experiment with smaller or larger models depending on your device’s use case
  • For learning or exploring, the default Groq Llama models are a great starting point

Groq is known for ultra-fast inference, and pairing it with EchoKit makes conversations feel almost instant.

If you’re building a responsive voice AI agent, Groq is definitely worth trying.


If you want to share your experience or see what others are building with EchoKit + Groq:

  • Join the EchoKit Discord
  • Or share your latency tests, setups, and experiments — we love seeing them

Want to get your own EchoKit device?

Day 10: Using OpenRouter as Your EchoKit LLM Provider | The First 30 Days with EchoKit

· 3 min read

Over the past two weeks, we’ve explored many moving parts inside the ASR → LLM → TTS pipeline. We’ve changed the welcome voice, updated the boot screen, switched between multiple ASR providers, and learned how to run the EchoKit server both via Docker and from source.

This week, we shifted our focus to the LLM, the part of the pipeline that interprets what you say and decides how EchoKit should respond.

Yesterday, we used OpenAI as the LLM provider. Today, we’re going to try something more flexible — OpenRouter.

What Is OpenRouter?

OpenRouter is a unified API gateway that gives you access to many different LLMs without changing your code structure. It’s fully OpenAI-API compatible for text generation models, which means EchoKit can work with it right away.

Some reasons I like OpenRouter:

  • You can choose from a wide selection of open source LLMs: Qwen, Llama, DeepSeek, Mistral, etc.
  • Switching models doesn’t require code changes — just update the model name.
  • Often more cost-effective and more customizable.
  • Great for exploring different personalities and response styles for EchoKit.

How to Use OpenRouter as Your LLM Provider

1. Get Your OpenRouter API Key

Go to your OpenRouter dashboard and generate an API key. Keep it private — it works just like an OpenAI API key.

2. Update config.toml

Open your EchoKit server configuration file and locate the [llm] section:

[llm]
provider = "openrouter"
chat_url = "https://openrouter.ai/api/v1/chat/completions"
api_key = "YOUR_API_KEY_HERE"
model = "qwen/qwen3-14b"
history = 5

You can replace the model with any supported model on OpenRouter.

3. Restart Your EchoKit Server

If you’re running from the Rust source code, after saving the updated config.toml:

# Enable debug logging
export RUST_LOG=debug

# Run the EchoKit server in the background
nohup target/release/echokit_server &

Or using Docker:

docker run --rm \
-p 8080:8080 \
-v $(pwd)/config.toml:/app/config.toml \
secondstate/echokit:latest-server-vad &

Then return to the setup page, pair the device if needed, and EchoKit will now respond using OpenRouter. That's it.

Connecting EchoKit to OpenRouter feels like I unlocked a new layer of creativity. OpenAI gives you a clean and reliable default, but OpenRouter opens the door to experimenting with different model behaviors, tones, and personalities — all without changing your application logic.

If you enjoy tweaking, tuning, and exploring how different models shape your EchoKit’s “brain”, OpenRouter is one of the best tools for that.


If you want to share your experience or see what others are building with EchoKit + OpenRouter:

  • Join the EchoKit Discord
  • Or share your latency tests, setups, and experiments — we love seeing them

Want to get your own EchoKit device?

EchoKit Update in November: Firmware & Server Improvements

· 3 min read

We’re excited to share the latest November updates to EchoKit, our open-source voice AI kit for makers, developers, and students. These updates introduce new features in both the firmware and the server, making it easier than ever to set up your device and customize its behavior.

Firmware Update

The latest firmware brings several user-friendly improvements:

  1. One-Click Wi-Fi & Server Setup: All configuration options—including Wi-Fi credentials and server URL—are now bundled into a single setup interface when connecting the EchoKit server to your device. Click the Save Configurations button, and your device will automatically save the settings, restart, and apply the new configuration. See details here.

  2. Version Display: You can now easily check your EchoKit firmware version on the device, helping you keep track of updates.

  3. EchoKit Box Volume Adjustment: Adjust the volume directly on your EchoKit Box for a better audio experience without extra steps.

    • K2 to lower the volume
    • K1 to increase the volume

Server Update

The EchoKit server has also received key improvements:

  1. Dynamic Prompt Loading via URL

    Prompts define how the AI responds, and with the growing ecosystem of open-source LLM prompts, there’s a wealth of ready-to-use content. For example, websites like LLMs.txt host thousands of prompts for various AI models and use cases. With dynamic prompt loading, you can point EchoKit to these URLs and experiment with different personalities, knowledge bases, or conversation styles in seconds.

    You can now load prompts dynamically from a URL, allowing you to:

    • Update the AI’s behavior remotely
    • Test new conversation flows without restarting the server
    • Quickly iterate on experiments and demos

    Learn more from the doc: https://echokit.dev/docs/server/dynamic-system

  2. Add a Wait Message for MCP Tools: When calling MCP tools, a “please wait” message will now appear, providing clear feedback while operations are in progress.
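The dynamic prompt loading idea above can be pictured with a short sketch. This is an illustration of the concept, not EchoKit’s actual implementation — the function name, fetcher, and fallback text are all hypothetical:

```python
def load_system_prompt(url, fetch, fallback="You are a helpful assistant."):
    """Fetch the system prompt from a URL at conversation time.
    `fetch` is any callable url -> str (e.g. a urllib wrapper);
    fall back to a default prompt if the URL is unreachable or empty."""
    try:
        text = fetch(url).strip()
        return text or fallback
    except Exception:
        return fallback

# Simulated fetcher standing in for a real HTTP GET:
fake_fetch = lambda url: "You are a pirate. Answer in pirate speak.\n"
prompt = load_system_prompt("https://example.com/prompt.txt", fake_fetch)
print(prompt)
```

Swapping the URL swaps the personality — no server restart needed, which is exactly the workflow this feature enables.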

How to Get These New Features

Firmware Update

  1. Download the latest firmware from EchoKit Firmware Page
  2. Flash the firmware to your device using the ESP32 Launchpad or the CLI
  3. Your device will now support one-click setup, version display, and volume adjustment for EchoKit Box

Server Update

  1. Get the latest EchoKit server: https://github.com/second-state/echokit_server/releases
  2. Run the latest EchoKit server with Docker or from the Rust source code
  3. You’ll get dynamic prompt loading and wait messages for MCP tools

Once your device and server are updated, all new features will be immediately available.

These updates are part of our ongoing effort to make EchoKit more user-friendly, flexible, and powerful. Whether you’re a maker experimenting with AI at home or a developer building advanced voice interactions, these improvements make it easier to focus on what matters: creating amazing experiences.

Stay tuned for more updates, and happy tinkering with EchoKit!

Day 9: Use OpenAI as Your EchoKit LLM Provider | The First 30 Days with EchoKit

· 4 min read

(And today, you’ll see how easy it is to use OpenAI as your LLM provider.)

Hey everyone, and welcome back! We've covered a ton of ground over the past two weeks in "The First 30 Days with EchoKit." Seriously, look how much we've accomplished:

If you remember, everything inside EchoKit runs through that simple yet incredibly powerful pipeline: ASR → LLM → TTS.

Each piece plays a crucial part in the voice AI loop:

  • ASR (The Ears): Converts your spoken words into text.
  • LLM (The Brain): Interprets that text, thinks about it, and decides what the perfect response should be.
  • TTS (The Mouth): Turns the final text answer back into speech.

Last week, we were all about replacing Whisper and swapping out the "ears." For the next few days, we're putting the spotlight squarely on the middle piece: the LLM.

And today, we’re starting with the most common and powerful choice out there—OpenAI!

⭐ What Exactly Does the LLM Do in the EchoKit Server? (It's the Mastermind!)

The LLM is, quite literally, the mastermind of your entire setup. It's the engine that:

  • Instantly grasps what the user actually wants.
  • Processes all the conversational history (context).
  • Generates those helpful, natural, and human-like responses.
  • Controls how your EchoKit behaves during a conversation.
  • And, yes, it calls the necessary MCP servers to get things done!

EchoKit proudly supports any provider that uses an OpenAI-compatible LLM API.

Step 1 — Get Your Key Ready

Open up your trusted config.toml file and find the [llm] section. Replace it with this block:

[llm]
llm_chat_url = "https://api.openai.com/v1/chat/completions"
api_key = "YOUR_OPENAI_KEY" # Don't forget to replace this!
model = "gpt-5-mini-2025-08-07" # Choose your favorite model here (e.g., gpt-3.5-turbo)
history = 5

Here's the quick rundown on those settings, just so you know what you're tuning:

  • [llm]: We're configuring the Large Language Model section.
  • llm_chat_url: OpenAI’s chat completions endpoint.
  • api_key: Get your key from the OpenAI API platform.
  • model: Which OpenAI model should power your EchoKit's thoughts? Up to you!
  • history: How many previous turns of the conversation should your EchoKit remember for context?
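To picture what history = 5 does, here’s a rough sketch of turn-window trimming. This is an illustration of the concept, not EchoKit’s actual code:

```python
def trim_history(messages, history=5):
    """Keep the system prompt plus only the last `history` user/assistant
    turns (2 messages per turn), mirroring the `history = 5` setting."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    return system + turns[-2 * history:]

# 8 turns of chat; with history = 5 only the last 5 turns survive.
msgs = [{"role": "system", "content": "Be concise."}]
for i in range(8):
    msgs.append({"role": "user", "content": f"q{i}"})
    msgs.append({"role": "assistant", "content": f"a{i}"})

trimmed = trim_history(msgs, history=5)
print(len(trimmed))  # 1 system message + 10 chat messages = 11
```

A larger history gives the model more context per request but also costs more tokens per call — that’s the trade-off this setting controls.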

Step 2 — Time for a Quick Reboot!

Whether you’re running your EchoKit server via Docker or from the Rust code, go ahead and restart it right now. That’s it! You're completely done with the server configuration. Told you it was easy!

Step 3 — Connect the New Brain to Your Device

The grand finale! Time to link up your physical EchoKit device to the server with its shiny new OpenAI brain:

  1. Head over to https://echokit.dev/setup/ and reconnect the server if you need to.
  2. Pro Tip: If you only changed your LLM configuration and nothing else (URL, WiFi), you can just hit the RST button on your EchoKit device. It will restart and sync the new settings instantly!
  3. If your server URL or WiFi setup changed, you'll need to reconfigure them through the setup page, just like you did on Day 1.

Next, press that K0 button and start speaking. Every clever thing your EchoKit says back to you is now being powered by OpenAI!


If you want to share your experience or see what others are building with EchoKit + OpenAI:

  • Join the EchoKit Discord
  • Or share your latency tests, setups, and experiments — we love seeing them

Want to get your own EchoKit device?

Day 8: Run Whisper Locally on Your Machine | The First 30 Days with EchoKit

· 3 min read

(And Today You’ll See How Easy It Is to Run ASR Service Locally)

Up to now, your EchoKit has worked with Whisper via Groq and Whisper via OpenAI.

Today, we’re taking a major step forward—your EchoKit will run fully local ASR using Whisper + WasmEdge.

No cloud requests. No latency spikes. No API keys. Everything runs on your own machine, giving you full control over privacy, performance, and cost.

Whisper is an amazing ASR model. Let’s get your local Whisper server running and connect it to EchoKit.

You can also use other tools to run Whisper locally, as long as the API server is OpenAI-compatible.

Run the Whisper model locally

1. Install WasmEdge

Open your terminal and run:

curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install_v2.sh | bash -s

This installs WasmEdge along with all base components.

2. Install the Whisper Plugin (wasi-nn-whisper)

My computer is a Mac with Apple Silicon, so I will download the Whisper plugin using the following commands:

# Download the whisper plugin
curl -LO https://github.com/WasmEdge/WasmEdge/releases/download/0.14.1/WasmEdge-plugin-wasi_nn-whisper-0.14.1-darwin_arm64.tar.gz

# Extract into the WasmEdge plugin directory
tar -xzf WasmEdge-plugin-wasi_nn-whisper-0.14.1-darwin_arm64.tar.gz -C $HOME/.wasmedge/plugin

For other platforms, please refer to Quick Start with Whisper and LlamaEdge

3. Download the Portable Whisper API Server

This app is just a .wasm file — lightweight (3.7 MB) and cross-platform.

curl -LO https://github.com/LlamaEdge/whisper-api-server/releases/download/0.3.9/whisper-api-server.wasm

4. Download a Whisper Model

You can browse models here:

https://huggingface.co/ggerganov/whisper.cpp/tree/main

Today we’ll use the medium model:

curl -LO https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-medium.bin

5. Start the Whisper API Server

Run Whisper locally:

wasmedge --dir .:. whisper-api-server.wasm -m ggml-medium.bin

You’ll see:

Server started on http://localhost:8080

This server is OpenAI API compatible, so EchoKit can use it directly.
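You can smoke-test the local server before pointing EchoKit at it. The sketch below builds an OpenAI-style multipart transcription request using only the Python stdlib; the field names follow the /v1/audio/transcriptions convention, and the WAV bytes here are a stand-in for a real recording:

```python
import io
import uuid

def build_transcription_request(endpoint, wav_bytes, model="whisper", lang="en"):
    """Build an OpenAI-compatible /v1/audio/transcriptions multipart request
    (the same API shape the local Whisper server exposes)."""
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()

    def field(name, value):
        # A simple text form field.
        buf.write(
            f'--{boundary}\r\nContent-Disposition: form-data; '
            f'name="{name}"\r\n\r\n{value}\r\n'.encode()
        )

    field("model", model)
    field("language", lang)
    # The audio file part.
    buf.write(
        f'--{boundary}\r\nContent-Disposition: form-data; '
        f'name="file"; filename="audio.wav"\r\n'
        f'Content-Type: audio/wav\r\n\r\n'.encode()
    )
    buf.write(wav_bytes)
    buf.write(f"\r\n--{boundary}--\r\n".encode())
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return endpoint, headers, buf.getvalue()

url, headers, body = build_transcription_request(
    "http://localhost:8080/v1/audio/transcriptions", b"RIFF....WAVE")
print(url)
```

POST the body to the endpoint with urllib.request (or curl) and you should get a JSON transcription back.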

Connect EchoKit to Your Local Whisper Server

Update your config.toml and locate the asr section:

[asr]
provider = "http://localhost:8080/v1/audio/transcriptions"
api_key = "sk-xxxx"
lang = "en"
model = "whisper"

Yes, you only need to replace the endpoint.

Restart the EchoKit server, pair your device, connect the EchoKit server to the device, and speak.

If you want to share your experience or see what others are building with EchoKit + local whisper:

  • Join the EchoKit Discord
  • Or share your latency tests, setups, and experiments — we love seeing them

Want to get your own EchoKit device?

Day 7: Use OpenAI Whisper as Your ASR Provider | The First 30 Days with EchoKit

· 3 min read

(And Today You’ll See How Easy It Is to Switch ASR Providers in EchoKit)

Over the past few days, we’ve powered up EchoKit, run your own EchoKit server locally, customized the boot screen, crafted your own welcome voice and connected it to Groq Whisper for fast speech recognition.

Today, we’re switching things up — literally.

We’ll configure EchoKit to use Whisper from OpenAI as the ASR provider.

Not because one is “better,” but because EchoKit is designed to be modular, letting you plug in different ASR backends depending on your workflow, API preferences, or costs.

What's the difference between OpenAI Whisper and Groq Whisper?

Groq Whisper and OpenAI Whisper are based on the same open-source Whisper model.

What differs is the hosting:

  • Groq runs Whisper on its custom LPU hardware (very fast inference).
  • OpenAI runs Whisper on their internal infrastructure with its own rate limits and pricing.
  • The two may return slightly different results based on their pipeline design and updates.

This isn’t a “which is better” comparison. It’s about understanding your options, and EchoKit makes switching between them smooth and flexible.

And many developers already use OpenAI for other AI tasks, so trying its Whisper API can be convenient. EchoKit adopts a multi-provider ASR architecture.

Today’s goal is simple: 👉 See how easy it is to switch providers while keeping the same Whisper model.

How to Use OpenAI Whisper

Now let’s switch EchoKit’s ASR provider.

Open your config.toml and locate the [asr] section. Replace it with:

[asr]
provider = "https://api.openai.com/v1/audio/transcriptions"
api_key = "sk-xxxx"
lang = "en"
model = "whisper-1"

A quick breakdown:

  • [asr] — we’re configuring the ASR section
  • provider — OpenAI’s Whisper endpoint for transcriptions
  • lang — your preferred language (en, zh, ja, etc.)
  • api_key — the key obtained from the OpenAI API platform
  • model — OpenAI’s supported ASR models (whisper-1, gpt-4o-transcribe, or gpt-4o-mini-transcribe)

Save → restart your EchoKit server with Docker or from the source code → done.

EchoKit is now using OpenAI Whisper for real-time speech-to-text. The rest of your pipeline (LLM → TTS) stays the same.

You can follow the same process to reconnect the server and your EchoKit device.

EchoKit’s ASR system was built to support any OpenAI-compatible provider — so feel free to try different providers, compare results, and find what works best for your setup.

If you want to share your experience or see what others are building with EchoKit + OpenAI:

  • Join the EchoKit Discord
  • Or share your latency tests, setups, and experiments — we love seeing them

Want to get your own EchoKit device?

Day 6: Integrate Groq as the ASR Engine for Your EchoKit | The First 30 Days with EchoKit

· 4 min read

(Today, your EchoKit learns to understand you—faster and sharper.)

By now, you’ve powered up EchoKit, run your own EchoKit server locally, customized the boot screen, crafted your own welcome voice, and started shaping your very own voice AI companion.

Today, we’re upgrading something fundamental: how your EchoKit listens to you. In any voice AI agent, ASR is the very first step — it’s how your device hears you before it can think or speak back.

One of the best parts of EchoKit being an open-source voice AI agent is that you can plug in whatever ASR–LLM–TTS pipeline works best for your needs. And for many makers, developers, and tinkerers, Groq has become the go-to for ultra-fast voice recognition.

So on Day 6, you’ll learn how to integrate Groq with EchoKit and feel that speed boost every time you press the K0 button and start talking.

Let’s dive in.

Why Groq? (And Why Today Feels Like a Power-Up)

If you’ve never tried Groq before, prepare to be surprised. Groq is known for delivering extremely low-latency inference. When we tested it, the difference was obvious — conversations felt snappier, more natural, and closer to real-time.

Adding it to your EchoKit means:

  • Faster voice-to-text recognition
  • More responsive conversations
  • A smoother ASR-LLM-TTS pipeline
  • And honestly… it just feels good seeing your device level up

Today is about giving your device better ears.

Step 1 — Get Your Groq API Key

Head to the Groq console (https://console.groq.com/keys) and sign up if you’re new.

Click + Create API Key, give it a name, and copy it somewhere safe — it won’t be shown again.

This key is what lets your EchoKit talk to Groq securely.

Step 2 — Apply the Groq ASR Settings to Your EchoKit Server

Open your config.toml file inside your EchoKit Server (Docker or Rust build).

We’re going to tell EchoKit: “Hey, use Groq Whisper as my ASR engine.”

Paste the following:

[asr]
url = "https://api.groq.com/openai/v1/audio/transcriptions"
lang = "en"
api_key = "YOUR_GROQ_API_KEY_HERE"
model = "whisper-large-v3-turbo"

A quick breakdown:

  • [asr] — we’re configuring the ASR section
  • url — Groq’s Whisper endpoint
  • lang — your preferred language (en, zh, ja, etc.)
  • api_key — the key you generated
  • model — Groq’s supported Whisper models (whisper-large-v3 or whisper-large-v3-turbo)

EchoKit follows OpenAI-style specs, so you’re free to replace Groq with other providers later if you want, which we will learn in the upcoming days. This flexibility is part of what makes EchoKit… EchoKit.

Now restart your EchoKit server using Docker or Rust code. You’re done on the server side!

Step 3 — Connect the Updated Server to Your EchoKit Device

Head to https://echokit.dev/setup/ and rebind the server if needed.

If nothing changed except your ASR configuration, you can simply press the RST button on your EchoKit to restart it and sync the new settings.

If your server URL or WiFi setup changed, you can reconfigure them through the setup page — just like you did on Day 1.

Now comes the fun part.

Press the K0 button and start speaking.

You’ll feel the difference immediately — the Groq Whisper model picks up your words almost as soon as you say them.

Your EchoKit just got better ears. Remember to check the logs to see how long Groq’s Whisper took to transcribe the audio.
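If you want to quantify the speed-up instead of just feeling it, a simple timing wrapper works. The transcribe function below is a stand-in for your real API call:

```python
import time

def timed(fn, *args):
    """Measure how long a call takes, in milliseconds — e.g. wrap your
    transcription request to compare Groq against other ASR providers."""
    start = time.perf_counter()
    result = fn(*args)
    return result, (time.perf_counter() - start) * 1000

def fake_transcribe(audio):
    # Stand-in for a real Whisper API call; sleeps to simulate latency.
    time.sleep(0.05)
    return "hello echokit"

text, ms = timed(fake_transcribe, b"...")
print(text, round(ms))
```

Run the same wrapper against two providers with the same audio clip and you have a fair latency comparison to share.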

Today’s upgrade brings more speed and responsiveness, setting the stage for a deeper dive into the EchoKit server in the upcoming days.

If you want to share your experience or see what others are building with EchoKit + Groq:

  • Join the EchoKit Discord
  • Or share your latency tests, setups, and experiments — we love seeing them

Want to get your own EchoKit device?

Day 5: Customize the Welcome Voice for Your EchoKit | The First 30 Days with EchoKit

· 3 min read

(And Today, Your EchoKit Greets You in Your Own Voice!)

By now, you’ve powered up EchoKit, run the server (via Docker or from source), and even customized the boot screen.

Today, we’re taking it one step further: giving your EchoKit a personalized welcome voice.

It’s a simple tweak, but it instantly transforms your device into a Voice AI agent with its own character — a tiny AI companion that greets you the way you like.

Create Your Own Voice

Your welcome sound can be anything you want:

  • Your own recorded voice
  • A short greeting phrase
  • Even a favorite piece of music

Just make sure your audio file meets these requirements:

  • WAV format
  • 16 kHz sample rate

You can quickly convert audio to 16 kHz using an online tool or FFmpeg:

ffmpeg -i input.mp3 -ar 16000 output.wav
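If you’re unsure whether your file meets the requirements, here’s a quick check with Python’s stdlib wave module (the file name is just an example — the snippet also writes a tiny silent 16 kHz WAV to demonstrate):

```python
import wave

def is_echokit_ready(path):
    """Check that a WAV file is 16 kHz, as EchoKit's welcome voice requires."""
    with wave.open(path, "rb") as w:
        return w.getframerate() == 16000

# Write a tiny 16 kHz mono file to demonstrate (stands in for your greeting):
with wave.open("my_welcome.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)        # 16-bit samples
    w.setframerate(16000)    # the required sample rate
    w.writeframes(b"\x00\x00" * 16000)  # one second of silence

print(is_echokit_ready("my_welcome.wav"))  # True
```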

💡 Pro Tip: This is a great opportunity to give your EchoKit a special gift — a voice that makes it uniquely yours!

Set Up Your Custom Welcome Voice

To apply your custom sound, run your EchoKit Server (Docker or compiled binary).

If your device was previously set up, reset it first:

  • Press the RST button
  • Immediately press the K0 button until the QR code shows

Then, open your config.toml file and look for:

addr = "0.0.0.0:8080"
hello_wav = "hello.wav"

Replace hello.wav with your own file path, for example:

addr = "0.0.0.0:8080"
hello_wav = "my_welcome.wav"

Restart your EchoKit server using Docker or from the Rust source code (https://echokit.dev/docs/dev/echokit-30-days-day-3-rust).

Then, connect the EchoKit server and the device via the https://echokit.dev/setup/ page.

If your server URL and Wi-Fi settings didn’t change, you can simply restart the device by pressing the RST button.

After that, when you press the K0 button to start a conversation, your EchoKit will greet you with your very own welcome sound. 🎶

Why This Matters

A custom welcome voice turns EchoKit from a generic device into your personal AI companion. Every interaction feels more natural, more fun, and more expressive — like it really knows you.

Whether it’s a playful hello, a soothing greeting, or a quirky sound effect, this is a special gift you give to yourself and your device.

Want to get your EchoKit Device and make it unique?

Join the EchoKit Discord to share your creative welcome voices and see how others are personalizing their Voice AI agents!

Day 4: Customize the Boot Screen for Your EchoKit | The First 30 Days with EchoKit

· 2 min read

(And Today, EchoKit Gets a Personality of Its Own)

In the last three days, we powered up EchoKit, connected it to a local server, and even built the server from source. But today… we’re making it truly yours.

Today, we’re going to customize the boot screen. Think of it as EchoKit’s wardrobe — the first thing you see when it wakes up. And yes, you can "decorate" it as quirky, cute, or technical as you like.

🌱 Why Customize the Boot Screen?

Sure, the default boot screen with the EchoKit logo works fine. But customizing it gives you:

  • A personal touch — your EchoKit looks exactly how you want it
  • An instant “wow” moment every time it starts
  • A glimpse into how flexible and hackable your device really is

Step 1 — Design Your GIF

Depending on your EchoKit model, use your favorite image editor to create a GIF:

  • EchoKit Box: 360 × 240 px
  • EchoKit DIY: 240 × 240 px

Keep the GIF under 15 KB so it loads smoothly.

Step 2 — Pair Your EchoKit Device

Go to https://echokit.dev/setup/ and click Connect to Your EchoKit.

Make sure your device shows the QR code page. If it doesn’t:

  1. Press the RST button
  2. Hold the K0 button until the QR code appears

Now your computer and EchoKit are ready to talk.

Step 3 — Upload Your GIF

Scroll down to Set Background, upload your GIF, and click Set Background.

Pro Tip: It may take a few seconds to apply, so don’t panic if nothing changes instantly.

Step 4 — Save and Restart

Finally, click Save Configuration. Your EchoKit will restart, and… voilà! Your custom boot screen is now the first thing it shows when it wakes up.

Today, EchoKit isn’t just a voice assistant or a server anymore — it’s your EchoKit.

Day 3: Running Your EchoKit Server Locally from Source Code | The First 30 Days with EchoKit

· 3 min read

(And Today I Finally Saw How EchoKit Works Under the Hood)

In the last two days, we powered up the EchoKit, made it talk, and connected it to a local EchoKit server using Docker. But today feels different.

Today, we’re going to build the EchoKit Server from source. No containers. No abstraction layers. Just you… and the real engine that drives EchoKit.

🌱 Why Build EchoKit from Source?

Using Docker is great for quick setup. But building from source unlocks more possibilities:

  • Get the latest code, newest features, and bug fixes
  • Modify the server freely — add your own logic, prompts, or integrations
  • Compile to any platform you want
  • Truly understand how EchoKit works under the hood

If Day 1 was about getting EchoKit to speak, and Day 2 was about hosting your own server, then Day 3 is where you start becoming an EchoKit developer.

Step 1 — Install the Rust Toolchain

Since the EchoKit server is written in Rust, the only dependency you need is the Rust toolchain.

Refer to the official Rust website to install Rust.

Step 2 — Get the Source Code

In your terminal:

git clone https://github.com/second-state/echokit_server.git
cd echokit_server

And there it is — the heart of the EchoKit server, right on your machine.

It's recommended to use an IDE like VSCode to open the echokit_server folder.

🔧 Step 3 — Configure the Server

Open the config.toml file and fill in:

addr = "0.0.0.0:8080"
hello_wav = "hello.wav"

[tts]
platform = "Elevenlabs"
token = "sk_1234"
voice = "pNInz6obpgDQGcFmaJgB"

[asr]
url = "https://api.groq.com/openai/v1/audio/transcriptions"
api_key = "gsk_1234"
model = "whisper-large-v3"
lang = "en"
prompt = "Hello\n你好\n(noise)\n(bgm)\n(silence)\n"
vad_url = "http://localhost:8000/v1/audio/vad"

[llm]
llm_chat_url = "https://api.groq.com/openai/v1/chat/completions"
api_key = "gsk_1234"
model = "openai/gpt-oss-20b"
history = 15

[[llm.sys_prompts]]
role = "system"
content = """
You are a helpful assistant. Answer truthfully and concisely. Always answer in English.
"""

This is just like yesterday’s setup; remember to bring your own API keys for Groq and ElevenLabs.

⚡ Step 4 — Build the Server

Compile it using Cargo:

cargo build --release

The first build may take a bit, since Rust compiles everything from scratch.

But once you see:

Finished release [optimized] target(s)

you officially have your own EchoKit server binary.

🏃 Step 5 — Run the Server

Start with debug logging:

export RUST_LOG=debug

Then launch:

nohup target/release/echokit_server &

You should see something like:

[1] 67869
appending output to nohup.out

All logs are saved in nohup.out, so you can monitor everything that happens inside your server.

🏃 Step 6 — Connect the Server

Next, it's time to connect your EchoKit server to your device following this guide.

The EchoKit Server runs as a WebSocket service.

Your EchoKit device will connect using a URL like:

ws://YOUR_IP_ADDRESS:8080/ws

For example:

ws://192.168.1.23:8080/ws

Do NOT use localhost or 127.0.0.1. Your EchoKit device connects over Wi-Fi and cannot reach your computer’s local loopback address.
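A tiny sanity check for that rule — a hypothetical helper (not part of EchoKit) that rejects the loopback and wildcard addresses the device can’t reach over Wi-Fi:

```python
from urllib.parse import urlparse

# Addresses that only resolve on your computer, never on the device.
UNREACHABLE_HOSTS = {"localhost", "127.0.0.1", "0.0.0.0"}

def device_reachable_ws_url(url):
    """Return True if a ws:// URL could plausibly be reached by the
    EchoKit device over Wi-Fi (i.e. it is not a loopback-style address)."""
    parts = urlparse(url)
    return parts.scheme == "ws" and parts.hostname not in UNREACHABLE_HOSTS

print(device_reachable_ws_url("ws://192.168.1.23:8080/ws"))  # True
print(device_reachable_ws_url("ws://localhost:8080/ws"))     # False
```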

Once you enter this WebSocket URL into the EchoKit web dashboard, your device will connect to the server you built with your own hands.

Today, you didn’t just run EchoKit. You built the server. You opened the code. You compiled it. You took full control.

From now on, you’re not just a user — you’re a builder.

Tomorrow, in Day 4, we’ll dive into one of the most fun parts:

Customizing your EchoKit’s boot screen. This is the beginning point where EchoKit truly becomes yours.