Budgie

Budgie is a voice-interaction MCP server written in pure Rust. It provides speech-to-text and text-to-speech capabilities for AI coding agents.

Features

chat - Full voice conversation with optional TTS prompt, audio chimes, and transcription

Requirements

Rust 1.85+ (uses Rust 2024 edition)
OpenAI-compatible STT server (e.g., faster-whisper, whisper.cpp)
OpenAI-compatible TTS server (e.g., Kokoro, Piper)
Microphone and speakers

Installation

Build from source:

git clone https://github.com/qmx/budgie
cd budgie
cargo build --release

Configuration

Budgie looks for configuration in this order:

--config CLI flag
$BUDGIE_CONFIG environment variable
$XDG_CONFIG_HOME/budgie/config.toml (typically ~/.config/budgie/config.toml)
./config.toml (current directory)
Built-in defaults (no config file required)
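
The lookup order above can be sketched as a small pure function. This is an illustration only, not Budgie's actual code: the ConfigSource type and resolve_config function are hypothetical names, and it assumes that explicit paths (the CLI flag and $BUDGIE_CONFIG) are taken as given while the two file locations only count when the file exists. The fallback to ~/.config when $XDG_CONFIG_HOME is unset or empty follows the XDG Base Directory convention.

```rust
use std::path::{Path, PathBuf};

/// Where the configuration was found (hypothetical type, not Budgie's API).
#[derive(Debug, PartialEq)]
pub enum ConfigSource {
    CliFlag(PathBuf),
    EnvVar(PathBuf),
    XdgConfig(PathBuf),
    CurrentDir(PathBuf),
    BuiltInDefaults,
}

/// Walks the documented lookup order and returns the first match.
/// `exists` is injected so the logic can be exercised without touching disk.
pub fn resolve_config(
    cli_flag: Option<&str>,
    budgie_config: Option<&str>,
    xdg_config_home: Option<&str>,
    home: Option<&str>,
    exists: impl Fn(&Path) -> bool,
) -> ConfigSource {
    // 1. --config CLI flag (assumed to win unconditionally).
    if let Some(path) = cli_flag {
        return ConfigSource::CliFlag(PathBuf::from(path));
    }
    // 2. $BUDGIE_CONFIG environment variable.
    if let Some(path) = budgie_config {
        return ConfigSource::EnvVar(PathBuf::from(path));
    }
    // 3. $XDG_CONFIG_HOME/budgie/config.toml, falling back to ~/.config.
    let xdg_base = xdg_config_home
        .filter(|s| !s.is_empty())
        .map(PathBuf::from)
        .or_else(|| home.map(|h| Path::new(h).join(".config")));
    if let Some(base) = xdg_base {
        let candidate = base.join("budgie").join("config.toml");
        if exists(&candidate) {
            return ConfigSource::XdgConfig(candidate);
        }
    }
    // 4. ./config.toml in the current directory.
    let local = PathBuf::from("config.toml");
    if exists(&local) {
        return ConfigSource::CurrentDir(local);
    }
    // 5. Nothing found: built-in defaults apply.
    ConfigSource::BuiltInDefaults
}
```

In a real binary the arguments would come from the parsed CLI flags and std::env::var, with `|p| p.is_file()` as the existence check.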

Create ~/.config/budgie/config.toml:

[server]
host = "127.0.0.1"
port = 8787

[stt]
endpoint = "http://localhost:8080/v1/audio/transcriptions"
model = "whisper"

[tts]
endpoint = "http://localhost:8080/v1/audio/speech"
model = "kokoro"
voice = "af_sky"

[audio]
sample_rate = 16000
channels = 1

[vad]
threshold = 0.5
min_silence_duration_ms = 1000
min_speech_duration_ms = 100

See config.example.toml for a fully commented example.
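
To give a feel for how the three [vad] settings interact, here is a minimal end-of-speech detector. It is a sketch under assumptions: this README does not describe Budgie's VAD internals, so the VadConfig struct, the end_of_speech function, and the per-frame probability input are hypothetical. It follows the common interpretation of these parameters: a frame counts as speech when its probability reaches threshold, speech is only confirmed after min_speech_duration_ms of consecutive speech, and recording stops after min_silence_duration_ms of consecutive silence following confirmed speech.

```rust
/// Mirrors the [vad] keys from the example config (hypothetical struct).
pub struct VadConfig {
    pub threshold: f32,
    pub min_silence_duration_ms: u32,
    pub min_speech_duration_ms: u32,
}

/// Returns the index of the frame at which recording would stop,
/// or None if the end of speech is never detected.
/// `probs` holds one speech probability per audio frame of `frame_ms` milliseconds.
pub fn end_of_speech(probs: &[f32], frame_ms: u32, cfg: &VadConfig) -> Option<usize> {
    let mut speech_run_ms = 0u32; // length of the current run of speech frames
    let mut silence_ms = 0u32; // length of the current run of silence after speech
    let mut speech_confirmed = false;

    for (i, &p) in probs.iter().enumerate() {
        if p >= cfg.threshold {
            speech_run_ms += frame_ms;
            silence_ms = 0; // any speech frame resets the silence timer
            if speech_run_ms >= cfg.min_speech_duration_ms {
                speech_confirmed = true;
            }
        } else {
            speech_run_ms = 0;
            // Silence before any confirmed speech is ignored, so short
            // noise bursts and leading quiet do not end the recording.
            if speech_confirmed {
                silence_ms += frame_ms;
                if silence_ms >= cfg.min_silence_duration_ms {
                    return Some(i);
                }
            }
        }
    }
    None
}
```

With the example values, raising min_silence_duration_ms tolerates longer pauses mid-sentence at the cost of a slower response once the speaker has finished.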

Usage

# Run with default config
budgie

# Run with custom config
budgie --config /path/to/config.toml

# Run on specific port
budgie --port 9000

# Override STT/TTS endpoints
budgie --stt-endpoint http://localhost:9000/v1/audio/transcriptions
budgie --tts-endpoint http://localhost:9000/v1/audio/speech

# Enable debug logging
budgie --log-level debug

MCP Client Configuration

Add to your MCP client config:

{
  "mcpServers": {
    "budgie": {
      "type": "http",
      "url": "http://127.0.0.1:8787/mcp",
      "timeout": 180000
    }
  }
}

Important: The client timeout (180000 above, i.e. 180 seconds if your client measures it in milliseconds) must be at least as long as vad.max_duration_secs (default 120 seconds). A recording can run for the full maximum duration, and if the MCP client times out first, the request fails.
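
The timeout rule can be checked mechanically. The helper below is a hypothetical sketch, not part of Budgie, and it assumes the client expresses its timeout in milliseconds. It returns how much headroom the client leaves beyond the maximum recording length, or an error when the timeout is too short.

```rust
use std::time::Duration;

/// Compares an MCP client timeout (assumed to be in milliseconds) with
/// vad.max_duration_secs and returns the remaining headroom.
pub fn timeout_headroom(client_timeout_ms: u64, max_duration_secs: u64) -> Result<Duration, String> {
    let timeout = Duration::from_millis(client_timeout_ms);
    let max_recording = Duration::from_secs(max_duration_secs);
    // checked_sub yields None when the timeout is shorter than the recording limit.
    timeout.checked_sub(max_recording).ok_or_else(|| {
        format!(
            "client timeout of {} ms is shorter than max recording of {} s",
            client_timeout_ms, max_duration_secs
        )
    })
}
```

For the values in this README, timeout_headroom(180000, 120) leaves 60 seconds for prompt playback and transcription on top of the recording itself.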

Tools

chat

Full voice conversation loop. Optionally speaks a TTS prompt, plays start/end chimes, records until speech ends (via VAD), and returns the transcript.

{
  "prompt": "What would you like to do?",
  "max_duration": 30
}

Returns: {"transcript": "user's spoken response"}
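
For clients that talk to the server directly, the arguments above travel inside an MCP tools/call request, which is a JSON-RPC 2.0 message. The sketch below builds that request body using only the standard library. The chat_call_body and json_escape functions are hypothetical helpers, treating max_duration as optional is an assumption, and session setup and HTTP transport details are left out.

```rust
/// Escapes a string for use inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds the JSON-RPC body for calling the `chat` tool.
/// Arguments that are None are simply omitted from the request.
pub fn chat_call_body(id: u64, prompt: Option<&str>, max_duration: Option<u32>) -> String {
    let mut args = Vec::new();
    if let Some(p) = prompt {
        args.push(format!(r#""prompt":"{}""#, json_escape(p)));
    }
    if let Some(d) = max_duration {
        args.push(format!(r#""max_duration":{}"#, d));
    }
    format!(
        r#"{{"jsonrpc":"2.0","id":{},"method":"tools/call","params":{{"name":"chat","arguments":{{{}}}}}}}"#,
        id,
        args.join(",")
    )
}
```

A production client would more likely use a JSON library such as serde_json; the hand-rolled version here only keeps the example dependency-free.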

Acknowledgments

Inspired by VoiceMode by Mike Bailey.

License

AGPL-3.0-or-later. See LICENSE for details.
