Budgie

Voice interaction MCP server in pure Rust. Provides speech-to-text and text-to-speech capabilities for AI coding agents.

Features

  • chat - Full voice conversation with optional TTS prompt, audio chimes, and transcription

Requirements

  • Rust 1.85+ (uses Rust 2024 edition)
  • OpenAI-compatible STT server (e.g., faster-whisper, whisper.cpp)
  • OpenAI-compatible TTS server (e.g., Kokoro, Piper)
  • Microphone and speakers

Installation

Build from source:

git clone https://github.com/qmx/budgie
cd budgie
cargo build --release

Configuration

Budgie looks for configuration in this order:

  1. --config CLI flag
  2. $BUDGIE_CONFIG environment variable
  3. $XDG_CONFIG_HOME/budgie/config.toml (typically ~/.config/budgie/config.toml)
  4. ./config.toml (current directory)
  5. Built-in defaults (no config file required)
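The lookup order above can be sketched as a small function. This is an illustration in Python, not Budgie's actual Rust code: the name resolve_config_path and its parameters are hypothetical. It assumes that a candidate is skipped when the file does not exist (Budgie may instead report an error for a missing --config path) and that an unset $XDG_CONFIG_HOME falls back to ~/.config.

```python
from pathlib import Path
from typing import Mapping, Optional


def resolve_config_path(
    cli_config: Optional[str],
    env: Mapping[str, str],
    cwd: Path,
    home: Path,
) -> Optional[Path]:
    """Return the first existing config file, or None for built-in defaults.

    Hypothetical helper: mirrors the documented order, not Budgie's source.
    """
    candidates = []
    # 1. --config CLI flag
    if cli_config:
        candidates.append(Path(cli_config))
    # 2. $BUDGIE_CONFIG environment variable
    if env.get("BUDGIE_CONFIG"):
        candidates.append(Path(env["BUDGIE_CONFIG"]))
    # 3. $XDG_CONFIG_HOME/budgie/config.toml (assumed fallback: ~/.config)
    xdg_base = env.get("XDG_CONFIG_HOME") or str(home / ".config")
    candidates.append(Path(xdg_base) / "budgie" / "config.toml")
    # 4. ./config.toml in the current directory
    candidates.append(cwd / "config.toml")

    for candidate in candidates:
        # Assumption: a path that does not exist is skipped silently.
        if candidate.is_file():
            return candidate
    # 5. Nothing found: the caller uses built-in defaults.
    return None
```

A caller would pass os.environ, Path.cwd(), and Path.home(); taking them as arguments keeps the function easy to test.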

Create ~/.config/budgie/config.toml:

[server]
host = "127.0.0.1"
port = 8787

[stt]
endpoint = "http://localhost:8080/v1/audio/transcriptions"
model = "whisper"

[tts]
endpoint = "http://localhost:8080/v1/audio/speech"
model = "kokoro"
voice = "af_sky"

[audio]
sample_rate = 16000
channels = 1

[vad]
threshold = 0.5
min_silence_duration_ms = 1000
min_speech_duration_ms = 100

See config.example.toml for a fully commented example.

Usage

# Run with default config
budgie

# Run with custom config
budgie --config /path/to/config.toml

# Run on specific port
budgie --port 9000

# Override STT/TTS endpoints
budgie --stt-endpoint http://localhost:9000/v1/audio/transcriptions
budgie --tts-endpoint http://localhost:9000/v1/audio/speech

# Enable debug logging
budgie --log-level debug

MCP Client Configuration

Add to your MCP client config:

{
  "mcpServers": {
    "budgie": {
      "type": "http",
      "url": "http://127.0.0.1:8787/mcp",
      "timeout": 180000
    }
  }
}

Important: Set the client timeout to at least vad.max_duration_secs (default 120 seconds). A recording can run for the full maximum duration, and if the MCP client times out first, the request fails. The example above uses 180000, which is 180 seconds if the client reads the value as milliseconds.
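The timeout rule can be checked with a one-line comparison. This is a minimal sketch: the function name timeout_is_safe is hypothetical, and it assumes the client timeout is expressed in milliseconds (as the example value 180000 suggests) while vad.max_duration_secs is in seconds.

```python
def timeout_is_safe(client_timeout_ms: int, max_duration_secs: int = 120) -> bool:
    """Return True if the client timeout covers the longest possible recording.

    Assumes the client timeout is in milliseconds; 120 is the documented
    default for vad.max_duration_secs.
    """
    return client_timeout_ms >= max_duration_secs * 1000


# The example client config (180000) against the default maximum (120 s).
print(timeout_is_safe(180000))
# A 60-second client timeout would be too short for the default maximum.
print(timeout_is_safe(60000))
```

In practice it is worth leaving some headroom beyond the bare minimum, since the request also includes the optional TTS prompt and the transcription step.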

Tools

chat

Full voice conversation loop. Optionally speaks a TTS prompt, plays start/end chimes, records until speech ends (via VAD), and returns the transcript.

{
  "prompt": "What would you like to do?",
  "max_duration": 30
}

Returns: {"transcript": "user's spoken response"}
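For clients that talk to the server directly, the arguments above travel inside an MCP tools/call request, which is a JSON-RPC 2.0 message. The sketch below only builds that message and does not send it. The helper name build_chat_call is hypothetical, and treating max_duration as optional is an assumption (the README only states that the prompt is optional). A real client must also complete the MCP initialize handshake before calling tools.

```python
import json
from typing import Optional


def build_chat_call(
    prompt: Optional[str] = None,
    max_duration: Optional[int] = None,
    request_id: int = 1,
) -> dict:
    """Build a JSON-RPC 2.0 'tools/call' message for the chat tool.

    Hypothetical helper for illustration; arguments left as None are omitted.
    """
    arguments = {}
    if prompt is not None:
        arguments["prompt"] = prompt
    if max_duration is not None:
        arguments["max_duration"] = max_duration
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "chat", "arguments": arguments},
    }


request = build_chat_call("What would you like to do?", 30)
print(json.dumps(request, indent=2))
```

Most MCP clients build this message themselves once the server is registered in their config, so the helper is mainly useful for debugging or for custom integrations.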

Acknowledgments

Inspired by VoiceMode by Mike Bailey.

License

AGPL-3.0-or-later. See LICENSE for details.
