Voice interaction MCP server in pure Rust. Provides speech-to-text and text-to-speech capabilities for AI coding agents.
- chat - Full voice conversation with optional TTS prompt, audio chimes, and transcription
Requirements:
- Rust 1.85+ (uses Rust 2024 edition)
- OpenAI-compatible STT server (e.g., faster-whisper, whisper.cpp)
- OpenAI-compatible TTS server (e.g., Kokoro, Piper)
- Microphone and speakers
Build from source:
git clone https://github.com/qmx/budgie
cd budgie
cargo build --release
Budgie looks for configuration in this order:
- --config CLI flag
- $BUDGIE_CONFIG environment variable
- $XDG_CONFIG_HOME/budgie/config.toml (typically ~/.config/budgie/)
- ./config.toml (current directory)
- Built-in defaults (no config file required)
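The lookup order above can be sketched as a small pure function. This is only an illustration under stated assumptions, not Budgie's actual code: the function name `resolve_config_path`, its parameters, and the `exists` callback are hypothetical, and the sketch assumes that an explicit `--config` or `$BUDGIE_CONFIG` value is used as given, while the two file locations are used only if the file exists.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical sketch of the documented lookup order.
/// Returns `None` when no file is found, meaning built-in defaults apply.
pub fn resolve_config_path(
    cli_flag: Option<&str>,          // value of --config, if given
    budgie_config_env: Option<&str>, // value of $BUDGIE_CONFIG, if set
    xdg_config_home: Option<&str>,   // value of $XDG_CONFIG_HOME, if set
    home: Option<&str>,              // value of $HOME, used for the ~/.config fallback
    exists: impl Fn(&Path) -> bool,  // file-existence check, injectable for testing
) -> Option<PathBuf> {
    // 1. --config CLI flag (assumed to win unconditionally)
    if let Some(p) = cli_flag {
        return Some(PathBuf::from(p));
    }
    // 2. $BUDGIE_CONFIG environment variable
    if let Some(p) = budgie_config_env {
        return Some(PathBuf::from(p));
    }
    // 3. $XDG_CONFIG_HOME/budgie/config.toml, falling back to ~/.config
    let xdg_dir = xdg_config_home
        .map(PathBuf::from)
        .or_else(|| home.map(|h| Path::new(h).join(".config")));
    if let Some(dir) = xdg_dir {
        let candidate = dir.join("budgie").join("config.toml");
        if exists(&candidate) {
            return Some(candidate);
        }
    }
    // 4. ./config.toml in the current directory
    let local = PathBuf::from("./config.toml");
    if exists(&local) {
        return Some(local);
    }
    // 5. No file found: the caller falls back to built-in defaults
    None
}
```

In a real program the `exists` argument would typically be `|p| p.is_file()`, and the environment values would come from `std::env::var`.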
Create ~/.config/budgie/config.toml:
[server]
host = "127.0.0.1"
port = 8787
[stt]
endpoint = "http://localhost:8080/v1/audio/transcriptions"
model = "whisper"
[tts]
endpoint = "http://localhost:8080/v1/audio/speech"
model = "kokoro"
voice = "af_sky"
[audio]
sample_rate = 16000
channels = 1
[vad]
threshold = 0.5
min_silence_duration_ms = 1000
min_speech_duration_ms = 100
See config.example.toml for a fully commented example.
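To give an intuition for how the three [vad] values might interact, here is a minimal end-of-speech sketch. It is an assumption about how such settings are commonly used, not a description of Budgie's internals: the `VadParams` struct, the `end_of_speech` function, and the idea of one speech probability per fixed-length frame are all hypothetical.

```rust
/// Hypothetical parameters mirroring the [vad] keys in the config.
pub struct VadParams {
    pub threshold: f32,               // probability at or above which a frame counts as speech
    pub min_silence_duration_ms: u32, // silence needed after speech to stop recording
    pub min_speech_duration_ms: u32,  // continuous speech needed before it counts as speech
}

/// Returns the index of the frame at which recording would stop,
/// or `None` if the end-of-speech condition is never met.
/// `probs` holds one speech probability per frame of `frame_ms` milliseconds.
pub fn end_of_speech(probs: &[f32], frame_ms: u32, params: &VadParams) -> Option<usize> {
    let mut speech_run_ms = 0u32;
    let mut silence_run_ms = 0u32;
    let mut speech_confirmed = false;

    for (i, &p) in probs.iter().enumerate() {
        if p >= params.threshold {
            // Speech frame: extend the speech run and reset the silence run.
            speech_run_ms += frame_ms;
            silence_run_ms = 0;
            if speech_run_ms >= params.min_speech_duration_ms {
                speech_confirmed = true;
            }
        } else {
            // Silence frame: short blips before confirmation are discarded.
            silence_run_ms += frame_ms;
            if !speech_confirmed {
                speech_run_ms = 0;
            }
            if speech_confirmed && silence_run_ms >= params.min_silence_duration_ms {
                return Some(i);
            }
        }
    }
    None
}
```

Under this model, raising min_silence_duration_ms tolerates longer pauses mid-sentence, while raising min_speech_duration_ms makes short noises less likely to be treated as speech.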
# Run with default config
budgie
# Run with custom config
budgie --config /path/to/config.toml
# Run on specific port
budgie --port 9000
# Override STT/TTS endpoints
budgie --stt-endpoint http://localhost:9000/v1/audio/transcriptions
budgie --tts-endpoint http://localhost:9000/v1/audio/speech
# Enable debug logging
budgie --log-level debug
Add to your MCP client config:
{
  "mcpServers": {
    "budgie": {
      "type": "http",
      "url": "http://127.0.0.1:8787/mcp",
      "timeout": 180000
    }
  }
}
Important: The timeout must be at least as long as vad.max_duration_secs (default 120 seconds). Voice recordings can take up to the max duration, and if the MCP client times out first, the request will fail.
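The timeout rule can be checked with a little arithmetic. The sketch below assumes the client timeout is expressed in milliseconds (which the value 180000 next to a 120-second default suggests, but the client's documentation is the authority). The function names and the notion of extra headroom for the TTS prompt and transcription are illustrative assumptions.

```rust
/// Hypothetical helper: a client timeout (in ms) covering the maximum
/// recording duration plus some headroom for TTS playback and transcription.
pub fn min_client_timeout_ms(max_duration_secs: u64, headroom_secs: u64) -> u64 {
    max_duration_secs
        .saturating_add(headroom_secs)
        .saturating_mul(1000)
}

/// True when the client timeout is at least as long as the maximum recording duration.
pub fn timeout_is_sufficient(client_timeout_ms: u64, max_duration_secs: u64) -> bool {
    client_timeout_ms >= max_duration_secs.saturating_mul(1000)
}
```

With the default of 120 seconds and an assumed 60 seconds of headroom, `min_client_timeout_ms(120, 60)` gives 180000, matching the value in the example config.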
The chat tool runs a full voice conversation loop. It optionally speaks a TTS prompt, plays start/end chimes, records until speech ends (detected via VAD), and returns the transcript.
{
  "prompt": "What would you like to do?",
  "max_duration": 30
}
Returns: {"transcript": "user's spoken response"}
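For readers calling the tool without an MCP client library, the arguments above would normally travel inside a JSON-RPC 2.0 `tools/call` request, as defined by the MCP specification. The sketch below only builds that request body using the standard library. The helper names `json_escape` and `chat_call_body` are hypothetical, treating `max_duration` as required is an assumption, and a real session would also need the MCP initialize handshake before this call.

```rust
/// Minimal JSON string escaping (quotes, backslashes, control characters).
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Hypothetical helper: builds the JSON-RPC body for a `tools/call`
/// request to the `chat` tool. The prompt is omitted when `None`.
pub fn chat_call_body(id: u64, prompt: Option<&str>, max_duration: u32) -> String {
    let prompt_field = match prompt {
        Some(p) => format!("\"prompt\":\"{}\",", json_escape(p)),
        None => String::new(),
    };
    format!(
        "{{\"jsonrpc\":\"2.0\",\"id\":{id},\"method\":\"tools/call\",\"params\":{{\"name\":\"chat\",\"arguments\":{{{prompt_field}\"max_duration\":{max_duration}}}}}}}"
    )
}
```

The resulting string would be sent as the body of an HTTP POST to the configured /mcp URL. In practice a JSON library such as serde_json is a safer choice than hand-built strings.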
Inspired by VoiceMode by Mike Bailey.
AGPL-3.0-or-later. See LICENSE for details.