
Day 26: Generate config.toml with Claude Code Skills | The First 30 Days with EchoKit

· 6 min read

Over the first 25 days of this series, we've configured EchoKit by manually editing config.toml files. That works fine for tweaks, but it's tedious when you're setting up EchoKit for the first time or trying a completely different configuration.

Today, we're introducing a faster way: the EchoKit Config Generator skill for Claude Code.

This skill automates the entire setup process through an interactive conversation—no manual TOML editing required.

Watch the skill in action:

What Are Claude Code Skills?

Claude Code "skills" are reusable prompts that live in .claude/skills/ directories. Think of them as mini-programs written in natural language. Instead of explaining what you want every time, you trigger a skill, and it guides the AI through a structured workflow.

Why does EchoKit need a Claude Code skill?

Setting up an EchoKit server involves many steps: writing TOML configuration, understanding platform-specific field names, collecting API keys, building the server, finding your IP address, and launching with the right commands. For beginners, this can be overwhelming. Even experienced users can forget details like which section comes first, or whether ElevenLabs uses api_key or token.

The EchoKit Config Generator skill solves this by turning setup into a conversation. More importantly, it teaches you how to configure the EchoKit server along the way. As you answer questions, you learn:

  • How to set up EchoKit server — What goes into config.toml and why
  • How to run EchoKit server — The cargo build --release command, launching with debug logging
  • How to get your IP address — The skill shows you exactly how to find your actual local IP (not localhost) and construct the WebSocket URL

Unlike documentation that you read once and forget, the skill guides you through each step interactively. You see the config being generated, understand what each field does, and learn the workflow by doing it—while the skill handles the technical details for you.
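
For example, the IP-address step above boils down to something like this (the port in the URL is illustrative; the skill reads the real values from your config):

# Find your machine's local IP (not 127.0.0.1)
ipconfig getifaddr en0          # macOS (Wi-Fi is usually en0)
hostname -I | awk '{print $1}'  # Linux

# The device then connects to ws://<that-ip>:<port>/..., e.g.
# ws://192.168.1.42:9090/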

The EchoKit Config Generator skill comes bundled with the echokit_server repository. Just clone the repo, and Claude Code discovers it automatically.

Installing the Skill

First, clone the echokit_server repository:

git clone https://github.com/second-state/echokit_server.git
cd echokit_server

That's it. Claude Code automatically discovers skills in .claude/skills/ directories within your workspace. No additional installation required.
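
On disk, that looks something like this (the skill's directory name here is a stand-in; look under .claude/skills/ in the repo for the real one):

echokit_server/
└── .claude/
    └── skills/
        └── echokit-config-generator/   # hypothetical name
            └── SKILL.md                 # the skill's workflow instructions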

Using the Skill

In Claude Code, simply say: "Generate an EchoKit config for a coding assistant"

The skill guides you through a 5-phase process:

Phase 1: Describe Your Assistant — Answer 7 questions about purpose, tone, capabilities, response style, domain knowledge, constraints, and preferences. The skill generates a sophisticated system prompt from your answers.

Phase 2: Choose Platforms — For each service (ASR, TTS, LLM), select from pre-configured options or choose "Custom" to specify any platform. The skill auto-discovers API documentation via web search for custom platforms.

Phase 3: MCP Server — Optionally add an MCP server by providing the URL.

Phase 4: Preview and Generate — Review your complete config.toml, confirm it's correct, and the skill writes both config.toml and SETUP_GUIDE.md to your chosen directory.

Phase 5: API Keys and Launch — The skill shows where to get API keys, collects them from you, updates config.toml, builds the server with cargo build --release, and launches it with debug logging enabled. When the server starts, the skill automatically detects your local IP address and displays the WebSocket URL ready for you to connect.

From zero to running EchoKit in one conversation.
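
For reference, a generated config.toml might look roughly like the sketch below. This is hypothetical, built from field names used elsewhere in this series; the actual sections depend on the platforms you pick.

[llm]
llm_chat_url = "https://api.groq.com/openai/v1/chat/completions"
api_key = "YOUR_GROQ_API_KEY"
model = "llama-3.3-70b-versatile"
history = 5

[tts]
platform = "Elevenlabs"
token = "YOUR_ELEVENLABS_API_KEY"
voice = "YOUR_VOICE_ID"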

Why This Matters

The Config Generator offers several advantages:

Faster Setup — Answer questions instead of reading docs and writing TOML manually. The skill handles syntax, field names, and structure automatically.

Fewer Errors — No more wrong field names, incorrect section order, or missing fields. The skill knows platform-specific details like ElevenLabs using token instead of api_key.

Custom Platform Discovery — Want to use a new LLM provider? The skill searches the web for API documentation and confirms with you. Groq, DeepSeek, Mistral, Together—all auto-discovered.

Rich System Prompts — The 7-question phase generates sophisticated system prompts tailored to your use case, saving you time crafting them manually.

Complete Workflow — It doesn't just generate a config. It collects API keys, builds the server, launches it, and even detects your local IP address. You get a ready-to-use WebSocket URL—no manual IP lookup required.

Ready Connection Details — After launching, the skill automatically finds your actual local IP address (not localhost) and displays the complete WebSocket URL. Just copy and paste it into your EchoKit device to connect.

When to Use the Skill vs. Manual Configuration

Use the Skill When                  | Use Manual Config When
------------------------------------|-------------------------------
First-time EchoKit setup            | Quick API key changes
Learning how EchoKit server works   | Adjusting history value
Trying new LLM providers            | Minor parameter tweaks
Creating custom personalities       | Version-controlling configs
Exploring custom platforms          | Scripting deployments
Understanding the complete workflow | You know exactly what you need

Both approaches are valid. The skill is also a learning tool—it guides you through each step while explaining what's happening, so you understand the setup process deeply. Manual editing provides precision control once you're familiar with the configuration.

Supported Platforms

Pre-configured:

  • ASR: OpenAI Whisper, Local Whisper
  • TTS: OpenAI, ElevenLabs, GPT-SoVITS
  • LLM: OpenAI Chat, OpenAI Responses API

Custom (auto-discovered via web search):

  • Any OpenAI-compatible LLM: Groq, DeepSeek, Mistral, Together, and more
  • Any platform with documented APIs

Choose "Custom" and the skill finds the rest.

What's Next: Day 27

You now have a fully configured EchoKit server running a custom personality—set up through conversation, not configuration files.

But what happens when you want to share your EchoKit setup with others? Or deploy it to multiple devices?

On Day 27, we'll explore configuration management: versioning your configs, sharing setups, and managing multiple EchoKit instances.


Ready to try the Config Generator skill or share your own configurations?

Ready to get your own EchoKit?

Start building your own voice AI agent today.

Day 25: Built-in Web Search with LLM Providers | The First 30 Days with EchoKit

· 5 min read

On Day 15, we introduced EchoKit's ability to connect to MCP servers, giving your voice agent access to external tools and actions. We showed how to connect to Tavily for web search.

On Day 23, we added DuckDuckGo MCP for real-time web search.

Both approaches required you to host or connect to an external search service. That works great, but what if there were an even simpler way?

Today, we're exploring a different approach: using the LLM provider's own built-in web search.

No separate MCP servers to configure. No API keys for search services. No extra infrastructure.

Just enable a tool, and your EchoKit can search the web.

Before diving in, let's clarify the two approaches to web search we've covered:

Approach                   | Setup                               | Infrastructure                  | Control
---------------------------|-------------------------------------|---------------------------------|--------------------------------
MCP Servers (Days 15 & 23) | You connect to external search APIs | Requires separate MCP servers   | Full control over search source
Built-in Tools (Today)     | Enable in config.toml               | LLM provider handles everything | Provider manages search

Built-in tools are simpler — the LLM provider (OpenAI, xAI, etc.) handles everything. When your EchoKit needs current information, it just asks the provider, which performs the search and returns results.

MCP servers give you more control — you choose the search engine, can customize results, and can host it yourself.

Both approaches work. Today is about the simpler path: built-in tools.

The EchoKit server now supports the OpenAI Responses API — a stateful API that enables advanced LLM features including built-in web search.

Let's set up EchoKit with different LLM providers' built-in web search.

OpenAI with Web Search Preview

OpenAI offers the web_search_preview tool:

[llm]
platform = "openai_responses"
url = "https://api.openai.com/v1/responses"
api_key = "sk_ABCD"
model = "gpt-5-nano"

[[llm.extra.tools]]
type = "web_search_preview"

Key points:

  • platform = "openai_responses" enables the Responses API
  • type = "web_search_preview" enables OpenAI's built-in search

xAI Grok with Web Search

xAI's Grok offers a web_search tool with optional filtering:

[llm]
platform = "openai_responses"
url = "https://api.x.ai/v1/responses"
api_key = "xai_ABCD"
model = "grok-4-1-fast-non-reasoning"

[[llm.extra.tools]]
type = "web_search"
# Optional: filters = { "allowed_domains" = ["wikipedia.org"] }

Grok also provides an x_search tool to search posts on x.com (Twitter).
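
If you want to try it, it should follow the same tool-entry pattern. This is an assumption on my part, so confirm the exact type string in xAI's documentation:

[[llm.extra.tools]]
type = "x_search"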

Groq with Browser Search

Groq's implementation calls it browser_search:

[llm]
platform = "openai_responses"
url = "https://api.groq.com/openai/v1/chat/responses"
api_key = "gsk_ABCD"
model = "openai/gpt-oss-20b"

[[llm.extra.tools]]
type = "browser_search"

Ask EchoKit: "What's the Weather?"

Once configured, restart your EchoKit server and try a question that requires current information:

User: "What's the weather like in San Francisco right now?"

Under the hood, here's what happens with the Responses API:

  1. EchoKit sends query — Only the latest user message is sent
  2. LLM evaluates — The provider determines this needs current information
  3. Web search performed — The provider searches automatically
  4. Response generated — The LLM synthesizes an answer from search results
  5. Context saved — Search results are stored for follow-up questions

EchoKit might respond like this:

"Let me check the current weather...

Currently in San Francisco, it's 62 degrees Fahrenheit with partly cloudy skies. The high today will be around 68 degrees, with a low of 55 tonight."

Try Follow-up Questions

Because the Responses API is stateful, follow-up questions work naturally:

User: "What about tomorrow?"

The LLM provider already has the weather context from the previous search, so it can answer immediately without searching again.

"Tomorrow in San Francisco, expect sunny skies with a high of 72 degrees and a low of 58. Perfect weather for being outdoors."

This context awareness is one of the key advantages of the Responses API.

Built-in Tools vs. MCP: Which to Use?

We've now covered two approaches to web search. When should you use each?

Use Built-in Tools When:

  • You want the simplest possible setup
  • You're already using an LLM provider that offers search
  • You don't need to customize search behavior
  • Performance and simplicity are priorities

Use MCP Servers When:

  • You want to choose your own search engine
  • You need to filter or customize results
  • You want to host search infrastructure yourself
  • You're in a region where built-in search isn't available

Both approaches are valid. The beauty of EchoKit is that you can mix and match — use built-in tools from your provider while also connecting to custom MCP servers for specialized capabilities.

The Agentic Vision

Across Days 15, 23, and 25, we've seen EchoKit evolve from a simple chatbot into a true AI agent:

  • Day 15: Connected to external tools via MCP (Tavily search)
  • Day 23: Added DuckDuckGo for privacy-focused web search
  • Day 25: Enabled built-in search from LLM providers

Each approach adds capabilities. Your EchoKit can now:

  • Retrieve real-time information from the web
  • Reason about current events and live data
  • Respond with accurate, up-to-date answers
  • Act on that information (as we saw on Day 24 with Google Calendar)

This is the vision of agentic AI — not just conversation, but action. Not just static knowledge, but real-time information. Not just a chatbot, but a tool that bridges your voice to the entire internet.


Ready to explore more integrations or share your own agent setups?

Ready to get your own EchoKit?

Start building your own voice AI agent today.

Day 24: CES 2026 in Practice — Voice Agents That Act | The First 30 Days with EchoKit

· 6 min read

At CES 2026, the message was clear: Smartphones are so 2025.

The future isn't a bigger or foldable screen. It's AI pendants around your neck, holographic companions like Razer's Project AVA, robot pets that hug back, and always-on voice agents that act without touching any screen.

These aren't just "better assistants." They're proactive voice AI agents that listen, understand context, reason, act, and respond — all hands-free, no phone needed.

EchoKit is the open-source devkit showing how those AI devices work under the hood.

We've been building toward this. On Day 15, we introduced MCP (Model Context Protocol) as EchoKit's gateway to external tools. We showed how to connect to Tavily search. On Day 23, we added DuckDuckGo for real-time web search.

Those were about information — giving your voice agent the ability to retrieve knowledge from the web.

Today is about action.

Today, your EchoKit learns to do things for you. We'll show you how to integrate Zapier's Google Calendar MCP server with EchoKit so you can manage your Google Calendar via voice.

Why Action Matters

Imagine this: You're rushing to get ready in the morning, hands full, and you remember you need to schedule a meeting with your team tomorrow at 2 PM.

Without action capability, your EchoKit could say, "You should schedule that meeting when you get to your computer." Helpful, but not helpful enough.

With action capability, you simply say:

"Schedule a team meeting tomorrow at 2 PM for one hour"

And your EchoKit actually does it.

No phone. No computer. No screens. Just voice.

That's the difference between a conversational AI that talks about your schedule and an agentic AI that manages it.

Zapier's Google Calendar MCP Server

For today's integration, we're using Zapier's Google Calendar MCP server. Zapier has built an excellent MCP implementation that provides:

  • Create events — add calendar entries with title, time, and duration
  • List upcoming events — see what's scheduled
  • Search events — find specific appointments
  • Update events — modify existing calendar entries

The Zapier MCP server handles all the OAuth authentication and API details, exposing clean tools that EchoKit can use to take action on your behalf. Remember that EchoKit supports MCP servers in SSE and HTTP-Streamable modes.

Setting Up Zapier MCP Server

Before configuring EchoKit, you'll need to set up the Zapier MCP server and get your endpoint URL:

  1. Go to zapier.com/mcp — This is where you manage MCP integrations
  2. Click "+ New MCP Server" — Zapier will walk you through creating the MCP server you want
  3. Click Rotate token to get the MCP server URL — It looks like: https://mcp.zapier.com/api/v1/connect?token=YOUR_TOKEN

Keep this URL handy — you'll need it for the next step.

Configure EchoKit for Google Calendar

Now add the Zapier Google Calendar MCP server to your EchoKit config.toml:

[llm]
llm_chat_url = "https://api.groq.com/openai/v1/chat/completions"
api_key = "YOUR_GROQ_API_KEY"
model = "llama-3.3-70b-versatile" # Or any tool-capable model
history = 5

[[llm.mcp_server]]
server = "https://mcp.zapier.com/api/v1/connect?token=YOUR_TOKEN"
type = "http_streamable"
call_mcp_message = "Hold on a second. Let me check your calendar."

Key points:

  • server: Paste the Zapier MCP server endpoint URL you copied above
  • type: Use http_streamable for Zapier MCP servers (sse is also supported)
  • call_mcp_message: What EchoKit says while accessing your calendar

Ask EchoKit: "Schedule a Team Meeting"

Once configured, restart the EchoKit server and try a voice command:

User: "Schedule a team meeting tomorrow at 2 PM for one hour"

Under the hood, here's what happens:

  1. LLM parses the request — understands it's a calendar action with time and duration
  2. Tool call initiated — invokes the Google Calendar create_event tool via MCP
  3. Action executed — Zapier adds the event to your Google Calendar
  4. Confirmation returned — EchoKit confirms the action was completed

EchoKit might respond like this:

"Let me check your calendar...

I've scheduled your team meeting for tomorrow at 2 PM. The event will last one hour."

Notice what happened: EchoKit didn't just say something. It did something.

Try It Now

Restart your EchoKit server and test it:

  1. Say: "What's on my calendar today?"
  2. Wait for EchoKit to check
  3. Say: "Schedule a test meeting tomorrow at 10 AM"
  4. Check your Google Calendar — the event should appear, actually created

If it works, you're ready to go. If not, check the troubleshooting section below.

More Voice Commands to Try

Once you have Google Calendar connected, here are some practical voice commands:

  • "What's on my calendar today?" — Get a rundown of your schedule
  • "Schedule a dentist appointment next Tuesday at 3 PM" — Create events with natural language
  • "When is my next meeting?" — Check upcoming events
  • "Block out time for deep work tomorrow morning" — Reserve focused time
  • "Move my team meeting to 3 PM" — Reschedule existing events

The LLM understands natural language timing — "tomorrow morning," "next Tuesday," "in two hours" — and converts it into proper calendar entries.

What makes Zapier's MCP server powerful is that it's not just about calendars. Zapier connects to 5,000+ apps, and through MCP, EchoKit can potentially interact with many of them:

  • Slack — Send messages, check channels
  • Gmail — Compose emails, search inbox
  • Trello/Asana — Create tasks, update boards
  • Notion — Add database entries, create pages
  • GitHub — Create issues, check repositories

Each Zapier integration you enable adds a new action capability to your voice agent.

From Voice to Action

Your EchoKit has evolved through these 24 days:

It started as a conversational AI that could talk with you.

Then it learned to listen and understand intent.

On Day 15 and 23, it learned to search and retrieve information.

Today, it learned to act.

This is the vision of agentic AI — not just conversation, but action. Not just talking about doing things, but actually doing them.

Your EchoKit isn't just answering questions anymore. It's getting things done.


Ready to give your voice agent action capabilities?

Want to get your own EchoKit?

Start building your voice-powered productivity assistant today.

Day 23: Real-Time Web Search with DuckDuckGo MCP | The First 30 Days with EchoKit

· 4 min read

On Day 15, we introduced EchoKit's ability to connect to MCP (Model Context Protocol) servers, which unlocks access to external tools and actions beyond simple conversation. We showed an example using a Tavily-based search MCP server.

Today, we're diving deeper into real-time web search using DuckDuckGo.

Why DuckDuckGo? It's privacy-focused, doesn't require API keys for basic usage, and provides a simple way to bring real-world, up-to-date information into your voice AI conversations.

Why Real-Time Web Search Matters

LLMs have a knowledge cutoff — they only know what they were trained on. Ask about yesterday's news, today's stock prices, or events that happened after the model's training, and they'll simply... not know.

But when you connect EchoKit to a web search MCP server, something magical happens:

  • The LLM recognizes it needs current information
  • It automatically invokes the search tool
  • Results are retrieved from the web in real-time
  • The LLM synthesizes an answer citing actual sources

Suddenly, your EchoKit isn't just a chatbot anymore — it's an AI agent that can access the entire internet through voice.

DuckDuckGo Web Search MCP Server

For today's demo, we're using a DuckDuckGo-based web search MCP server. DuckDuckGo is an excellent choice because:

  • No API key required for basic usage — just point and go
  • Privacy-focused — searches aren't tracked or profiled
  • Open ecosystem — multiple open-source DuckDuckGo MCP implementations exist

The server exposes a simple search tool that queries DuckDuckGo and returns structured results with titles, URLs, and snippets.

DuckDuckGo doesn't provide an official MCP server. You can check out this GitHub repo for more details: https://github.com/nickclyde/duckduckgo-mcp-server
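
That implementation speaks stdio by default, while EchoKit needs a network endpoint. One way to bridge the gap is an MCP gateway; the sketch below assumes the uvx and supergateway tools (check each project's README for exact flags):

# Run the stdio-based DuckDuckGo MCP server behind an HTTP/SSE gateway
npx -y supergateway --stdio "uvx duckduckgo-mcp-server" --port 8000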

Remember that EchoKit supports MCP servers in SSE and HTTP-Streamable modes.

Add the DuckDuckGo MCP server to your EchoKit config.toml:

[llm]
llm_chat_url = "https://api.groq.com/openai/v1/chat/completions"
api_key = "YOUR_GROQ_API_KEY"
model = "llama-3.3-70b-versatile" # Or any tool-capable model
history = 5

[[llm.mcp_server]]
server = "MCP Endpoint"
type = "http_streamable"
call_mcp_message = "Let me search the web for the latest information."

Key points:

  • server: The DuckDuckGo MCP server endpoint
  • type: http_streamable for streaming responses; sse is also supported
  • call_mcp_message: What EchoKit says while searching (provides feedback during latency)

Ask EchoKit: "What's New in CES 2026?"

Now for the fun part. Restart the EchoKit server and ask a question that requires current information:

User: "What's new in CES 2026?"

Under the hood, here's what happens:

  1. LLM recognizes it needs real-time data about CES 2026
  2. Tool call initiated — the LLM invokes the DuckDuckGo search tool via MCP
  3. Search executed — DuckDuckGo queries the web for CES 2026 news
  4. Results returned — titles, URLs, and snippets come back through MCP
  5. Answer synthesized — the LLM processes the results and generates a natural response

EchoKit might respond like this:

"Let me search the web for the latest information...

CES 2026 highlights (as of the first week of the show) ...."

And it would cite the actual sources it found.

Once you have MCP configured, you're not limited to web search. The same protocol lets EchoKit:

  • Manage Google Calendar — add events, check schedules
  • Send messages — Slack, email, Discord
  • Control smart home — Home Assistant integration for lights, AC, security
  • Read and write files — local file system access
  • Run code — execute scripts and return results

Each MCP server adds a new capability. Mix and match to build the agent you need.

Today's DuckDuckGo web search demo shows how EchoKit breaks free from the LLM's training cutoff. It can now:

  • Answer questions about current events
  • Look up live data (sports scores, stock prices, weather)
  • Provide cited information from real sources
  • Act as a research assistant accessible by voice

This is the vision of agentic AI — not just conversation, but action. Not just static knowledge, but real-time information. Not just a chatbot, but a tool that bridges your voice to the entire internet.


Want to explore more MCP integrations or share your own agent setups?

Ready to get your own EchoKit?

Start building your own voice AI agent today.

Day 22: Flashing EchoKit from the Command Line | The First 30 Days with EchoKit

· 5 min read

Yesterday, we covered how to flash EchoKit firmware using the ESP Launchpad web tool. It's simple, browser-based, and works great for most people.

But if you're a developer — or if you've ever had the web flasher fail on you — you might want something more direct.

Today is about flashing EchoKit from the command line.

This approach is faster, gives you more control, and works even in situations where the browser-based tool might struggle. Plus, it feels more... natural for anyone comfortable with a terminal.

Why Flash from the Command Line?

The ESP Launchpad web tool is fantastic for getting started. It removes all friction: no toolchains, no dependencies, just click and flash.

But the command line approach has some real advantages:

  • Speed: Once set up, flashing is significantly faster
  • Reliability: Some USB configurations or systems don't play nicely with the web flasher — the CLI tool often works where the browser fails
  • Automation: If you're flashing multiple devices or setting up a fleet, CLI is scriptable
  • Developer experience: If you're already in the terminal, why leave it?

Best of all? The setup is straightforward if you have Rust installed.

Prerequisites: Rust Toolchain

The espflash tool we'll use is built in Rust. If you already have Rust installed, you can skip this step.

If not, installing Rust is quick:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

This will install Rust, Cargo, and the standard toolchain. Once it's done, restart your terminal or run:

source $HOME/.cargo/env

Install espflash

With Rust ready, install the flashing tools:

cargo install cargo-espflash espflash ldproxy

This will:

  • Install espflash — the actual flashing utility
  • Install cargo-espflash — a cargo helper for building and flashing
  • Install ldproxy — a linker proxy needed for some ESP32 builds

The compilation might take a few minutes. Once it's complete, you'll have the espflash command available globally.

Flashing EchoKit DIY

To flash EchoKit DIY from the command line, follow these steps:

Step 1: Download the Firmware

curl -L -o echokit https://echokit.dev/firmware/echokit_boards

Step 2: Connect Your Device

Use a USB cable to connect your computer to the USB-C port labeled OTG on your EchoKit DIY.

Your computer may prompt you to accept or trust the USB device — accept it.

Step 3: Flash the Firmware

With the device connected, run:

espflash flash --monitor --flash-size 16mb echokit

Here's what each part does:

  • --flash-size 16mb: Sets the flash size for EchoKit DIY
  • --monitor: Keeps the connection open after flashing so you can see the serial output
  • echokit: The firmware file you downloaded

espflash will detect your serial port and ask you to select it if multiple ports are available. Once flashing completes, you'll see the device boot up in the terminal, and the screen will display the QR code.
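
If you'd rather skip the interactive port prompt, you can pass the port explicitly (the device path below is just an example; it varies by OS):

# /dev/ttyUSB0 or /dev/ttyACM0 on Linux, /dev/cu.usbserial-* on macOS, COM3 on Windows
espflash flash --monitor --flash-size 16mb --port /dev/ttyUSB0 echokit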

Flashing EchoKit Box

Flashing EchoKit Box from the command line follows the same process, with just a couple of differences.

Step 1: Download the Firmware

For EchoKit Box, use the Box firmware binary:

curl -L -o echokit https://echokit.dev/firmware/echokit_box

Step 2: Connect Your Device

Use a USB cable to connect your computer to the USB-C port labeled SLAVE on your EchoKit Box.

Your computer may prompt you to accept or trust the USB device — accept it.

Step 3: Flash the Firmware

The command is identical to the DIY version:

espflash flash --monitor --flash-size 16mb echokit

espflash will detect your EchoKit Box, flash the firmware, and monitor the serial output. When it's done, the device will reboot and display the QR code on screen.

Troubleshooting

If something doesn't work, here are a few things to try:

  • Try the other USB port: On EchoKit DIY, if flashing fails on the OTG port, try the TTL port instead. Sometimes the USB data connection behaves differently on each port.
  • Force a reset: If the device isn't detected, press the RST button to reset it, then immediately run the flash command again.
  • Check USB permissions: On Linux, you might need to add your user to the dialout group or adjust udev rules for USB serial devices.

Both Approaches Have a Place

After yesterday's browser-based flashing and today's CLI approach, you now have two ways to keep your EchoKit firmware up to date:

  • ESP Launchpad (browser): Great for beginners, quick updates, or when you're already in a GUI
  • espflash (CLI): Faster, more reliable in tricky environments, and perfect for developers

Neither is "better" — they're different tools for different situations.

The important thing is that you're comfortable updating your firmware. EchoKit is an active, evolving project. New features land regularly. Being able to flash confidently — whether via browser or terminal — means you can stay current with the latest improvements.


Want to get your own EchoKit device and start building?

Join the EchoKit Discord to share your setup and see what others are building with their voice AI agents.

Day 21: Flashing EchoKit DIY and EchoKit Box from the Browser | The First 30 Days with EchoKit

· 3 min read

Over the last 20 days, we’ve been building EchoKit step by step — from voice pipelines and local models to MCP tools and personalities.

Today, I want to focus on something more foundational:

firmware.

In this post, we’ll walk through how to flash EchoKit firmware using the ESP Launchpad web tool. This approach is direct, dependency-free, and works entirely from the browser.

Want to learn to flash via the command line? We will talk about it tomorrow.

EchoKit Firmware Is Open Source — and Always Moving

EchoKit’s firmware is fully open source. The code is public, changes are visible, and improvements land continuously. Bugs are fixed in the open, and new capabilities are added incrementally.

Because of this, the firmware repository doesn’t stand still. As EchoKit evolves, the firmware evolves with it — whether that’s protocol adjustments, performance improvements, new device capabilities, or better defaults.

This means EchoKit is not a “flash once and forget” system.

You should expect to refresh the firmware from time to time. More importantly, you should feel comfortable doing so.


Flashing with ESP Launchpad

ESP Launchpad allows you to flash prebuilt EchoKit firmware directly from a browser, with no local toolchains or dependencies to install.

You can open the ESP Launchpad page here, which is preconfigured with EchoKit firmware profiles:

https://espressif.github.io/esp-launchpad/?flashConfigURL=https://echokit.dev/firmware/echokit.toml

The flashing process is exactly the same for EchoKit Box and EchoKit DIY. The only difference is which firmware profile you select.

EchoKit Box

To flash EchoKit Box, open the flashing page, connect the device to your computer via USB, select EchoKit Box, and click the Flash button.

The flashing process takes a few minutes. Once it completes successfully, the page will prompt you to reset the device. After rebooting, you’ll see the QR code screen, which indicates the firmware is ready. That’s it.

EchoKit DIY

EchoKit DIY uses the exact same flashing flow.

The only difference is the firmware profile you choose. On the same flashing page, connect your DIY device via USB, select EchoKit DIY, and click the Flash button.

Everything else is identical: the flashing takes a few minutes, you reset the device when prompted, and the QR code appears after reboot.

Once flashing is complete, move on to the next step: connecting the EchoKit server and your device so EchoKit can talk to you.

Why Firmware Refreshing Is Important

Many products try to hide firmware updates as much as possible.

EchoKit does the opposite.

EchoKit is an open system. You’re encouraged to explore it, modify it, and keep it up to date. Firmware updates are a normal part of the workflow, not an exception.

Using a browser-based flasher removes most of the friction. There are no toolchains to install, no OS-specific instructions, and no dependency management. This makes firmware updates accessible even to non-programmers.

Day 20: Running GPT-SoVITS Locally as EchoKit’s TTS Provider | The First 30 Days with EchoKit

· 4 min read

Over the past few days, we’ve been switching EchoKit between different cloud-based TTS providers and voice styles. It’s fun, it’s flexible, and it really shows how modular the EchoKit pipeline is.

But today, I want to go one step further.

Today is about running TTS fully locally. No hosted APIs. No external requests. Just an open-source model running on your own machine — and EchoKit talking through it.

For Day 20, I’m using GPT-SoVITS as EchoKit’s local TTS provider.

What Is GPT-SoVITS?

GPT-SoVITS is an open-source text-to-speech and voice cloning system that combines:

  • A GPT-style text encoder for linguistic understanding
  • SoVITS-based voice synthesis for natural prosody and timbre

Compared to traditional TTS systems, GPT-SoVITS stands out for two reasons.

First, it produces very natural, expressive speech, especially for longer sentences and conversational content.

Second, it supports high-quality voice cloning with relatively small reference audio, which has made it popular in open-source voice communities.

Most importantly for us: GPT-SoVITS can run entirely on your own hardware.

Running GPT-SoVITS Locally

To make local GPT-SoVITS easier to run, we also ported GPT-SoVITS to a Rust-based implementation.

This significantly simplifies local deployment and makes it much easier to integrate with EchoKit.

Check out Build and run a GPT-SoVITS server for details. The following steps are for a MacBook.

First, install the LibTorch dependencies:

curl -LO https://download.pytorch.org/libtorch/cpu/libtorch-macos-arm64-2.4.0.zip
unzip libtorch-macos-arm64-2.4.0.zip

Then, tell the system where to find LibTorch:

export DYLD_LIBRARY_PATH=$(pwd)/libtorch/lib:$DYLD_LIBRARY_PATH
export LIBTORCH=$(pwd)/libtorch

Next, clone the source code and build the GPT-SoVITS API server:

git clone https://github.com/second-state/gsv_tts
git clone https://github.com/second-state/gpt_sovits_rs

cd gsv_tts
cargo build --release

Then, download the required models. Since I’m running GPT-SoVITS locally on my MacBook, I’m using the CPU versions:

cd resources
curl -L -o t2s.pt https://huggingface.co/L-jasmine/GPT_Sovits/resolve/main/v2pro/t2s.cpu.pt
curl -L -o vits.pt https://huggingface.co/L-jasmine/GPT_Sovits/resolve/main/v2pro/vits.cpu.pt
curl -LO https://huggingface.co/L-jasmine/GPT_Sovits/resolve/main/v2pro/ssl_model.pt
curl -LO https://huggingface.co/L-jasmine/GPT_Sovits/resolve/main/v2pro/bert_model.pt
curl -LO https://huggingface.co/L-jasmine/GPT_Sovits/resolve/main/v2pro/g2pw_model.pt
curl -LO https://huggingface.co/L-jasmine/GPT_Sovits/resolve/main/v2pro/mini-bart-g2p.pt

Finally, start the GPT-SoVITS API server:

TTS_LISTEN=0.0.0.0:9094 nohup target/release/gsv_tts &
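
Before wiring it into EchoKit, you can check that the service answers. This is a hypothetical smoke test, since the exact request schema is defined by gsv_tts (see its README):

# Assumes a JSON body with text and speaker fields -- adjust to the actual API
curl -X POST http://localhost:9094/v1/audio/stream_speech \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello from EchoKit", "speaker": "cooper"}' \
  --output hello.wav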

Configure EchoKit to Use the Local TTS Provider

At this point, GPT-SoVITS is running as a local service and exposing a simple HTTP API.

Once the service is up, EchoKit only needs an endpoint that accepts text and returns audio.

Update the TTS section in the EchoKit server configuration:

[tts]
platform = "StreamGSV"
url = "http://localhost:9094/v1/audio/stream_speech"
speaker = "cooper"

Restart the EchoKit server, connect the service to the device, and EchoKit will start using the new local TTS provider.

A Fully Local Voice AI Pipeline

With today’s setup, we can now run the entire voice AI pipeline locally:

  • ASR: local speech-to-text
  • LLM: local open-source language models
  • TTS: GPT-SoVITS running on your own machine

That means:

  • No cloud dependency
  • No external APIs
  • No vendor lock-in

Just a complete, end-to-end voice AI system you can understand, modify, and truly own.


Want to get your own EchoKit device and make it unique?

Join the EchoKit Discord to share your custom voices and see how others are personalizing their voice AI agents.

Day 19: Switching EchoKit’s TTS Provider to Fish.audio | The First 30 Days with EchoKit

· 2 min read

Over the past few days, we’ve been iterating on different parts of EchoKit’s voice pipeline — ASR, LLMs, system prompts, and TTS (including ElevenLabs and Groq).

On Day 19, we switch EchoKit’s Text-to-Speech provider to Fish.audio, purely through a configuration change.

No code changes are required.

What Is Fish.audio?

Fish.audio is a modern text-to-speech platform focused on high-quality, expressive voices and fast iteration for developers.

One notable aspect of Fish.audio is the breadth of available voices. It offers a wide range of voice styles, including voices inspired by public figures, pop culture, and anime culture references, which makes it easy to experiment with playful or character-driven agents.

In addition to preset voices, Fish.audio also supports voice cloning, allowing developers to generate speech in a customized voice when needed.

These features make it particularly interesting for conversational and personality-driven voice AI systems.

EchoKit is designed to be provider-agnostic. As long as a TTS service matches the expected interface, it can be plugged into the system without affecting the rest of the pipeline.

The Exact Change in config.toml

Switching to Fish.audio in EchoKit only requires updating the TTS section in the config.toml file:

[tts]
platform = "fish"
speaker = "03397b4c4be74759b72533b663fbd001"
api_key = "YOUR_FISH_AUDIO_API_KEY"

A brief explanation of each field:

  • platform set to "fish" tells EchoKit to use Fish.audio as the TTS provider.
  • speaker specifies the TTS model ID, which can be obtained from the Fish.audio model detail page.
  • api_key is the API key used to authenticate with the Fish.audio service.

After restarting the EchoKit server and reconnecting the device, all voice output is generated by Fish.audio.

Everything else remains unchanged:

  • ASR stays the same
  • The LLM and system prompts stay the same
  • Conversation flow and tool calls stay the same

With Fish.audio added to the list of supported TTS providers, EchoKit’s voice layer becomes even more flexible — making it easier to experiment with different voices without reworking the system.


Want to get your own EchoKit device and make it unique?

Join the EchoKit Discord to share your welcome voices and see how others are personalizing their voice AI agents!

Day 18: Switching EchoKit to Groq PlayAI TTS | The First 30 Days with EchoKit

· 3 min read

Over the past two weeks, we’ve built almost every core component of a voice AI agent on EchoKit:

ASR to turn speech into text. LLMs to reason, chat, and call tools. System prompts to shape personality. MCP servers to let the agent take real actions. TTS to give EchoKit a voice.

Today, we close the loop again — but this time, with a new voice engine.

We’re switching EchoKit’s TTS backend to Groq’s PlayAI TTS.

Why change TTS?

Text-to-speech is often treated as the “last step” in a voice pipeline, but in practice, it’s the part users feel the most.

Latency, voice stability, and natural prosody directly affect whether a voice agent feels responsive or awkward. Since Groq already powers our ASR and LLM experiments with very low latency, it made sense to test their TTS offering as well.

PlayAI TTS fits EchoKit’s design goals nicely: It’s fast, simple to integrate, and exposed through an OpenAI-compatible API.

That means no special SDK, and no changes to EchoKit’s core architecture.

Switching EchoKit to Groq PlayAI TTS

On EchoKit, swapping TTS providers is mostly a configuration change.

To use Groq PlayAI TTS, we update the tts section in config.toml like this:

[tts]
platform = "openai"
url = "https://api.groq.com/openai/v1/audio/speech"
model = "Playai-tts"
api_key = "gsk_xxx"
voice = "Fritz-PlayAI"

A few things worth calling out:

The platform stays as openai because Groq exposes an OpenAI-compatible endpoint. We point the url directly to Groq’s audio speech API. The model is set to playai-tts. Voices are selected via the voice field — here we’re using Fritz-PlayAI.

Once this is in place, no other code changes are required.

Restart the EchoKit server, reconnect the EchoKit device to it, and the agent speaks with a new voice.

The bigger picture

Most importantly, switching TTS providers reinforces one of EchoKit’s core ideas: every part of the voice pipeline should be swappable.

It’s about treating voice as a first-class system component — something you can experiment with, replace, and optimize just like models or prompts.

EchoKit doesn’t lock you into one vendor or one voice. If tomorrow you want to try a different TTS engine, or even run one locally, the architecture already supports that.


Want to get your own EchoKit device and make it unique?

Join the EchoKit Discord to share your welcome voices and see how others are personalizing their voice AI agents!

Day 17: Giving EchoKit a Voice — Using ElevenLabs TTS | The First 30 Days with EchoKit

· 2 min read

Over the past three weeks, we’ve covered almost every core piece of a voice AI agent:

  • ASR: turning human speech into text
  • LLMs: reasoning, chatting, and tool calling
  • System prompts: shaping personality and behavior
  • MCP tools: letting EchoKit take real actions

Today, we complete the loop.

It’s time to talk about TTS — Text to Speech.

Without TTS, your agent can think, plan, and decide — but it can’t speak back. And for a voice-first device like EchoKit, that’s a deal breaker.

In Day 17, we’ll start with one of the most popular choices: ElevenLabs TTS.

Why ElevenLabs?

ElevenLabs is widely used because it offers:

  • Very natural-sounding voices
  • Low latency for real-time conversations
  • Multiple languages and accents
  • Voice cloning support (we’ll get to that later 😉)

For builders, it’s also simple to integrate and well-documented — which makes it a great first TTS provider for EchoKit.

What EchoKit Needs for ElevenLabs TTS

EchoKit’s ElevenLabs configuration lives in the EchoKit server’s config.toml file.

[tts]
platform = "Elevenlabs"
token = ""
voice = "yj30vwTGJxSHezdAGsv" # The voice I chose here is Jessa

  • platform: set to "Elevenlabs"
  • token: your ElevenLabs API key. You can generate one from the ElevenLabs Developer Dashboard
  • voice: the voice ID you want EchoKit to speak with

⚠️ Important: If you pick a voice in ElevenLabs, you must add it to “My Voices”. Otherwise, your API key may not be able to call it, even if the voice plays fine in the UI.

That’s it. model_id is optional in EchoKit’s config and not required for basic TTS.

Restart and Reconnect the Server

After updating the config, restart the EchoKit server, then reconnect the EchoKit device.

When you chat with the device again, you should hear EchoKit speak back — using the voice you selected.

With TTS working, EchoKit finally feels complete as a voice AI companion.

Want to get your own EchoKit device and make it unique?

Join the EchoKit Discord to share your welcome voices and see how others are personalizing their voice AI agents!