# kokorox - fast Kokoro TTS in Rust

A Rust implementation of the Kokoro text-to-speech model: a small model (87M parameters) with high-quality output and very fast inference.
## Features

- Multi-language: English, Chinese, Japanese, Spanish, French, and more via espeak-ng
- Voice style mixing (e.g., `af_sky.4+af_nicole.5`)
- OpenAI-compatible API server
- Streaming and pipe modes for LLM integration
- Automatic language detection
## Quick Start

```sh
# Install (macOS)
brew install byteowlz/tap/koko

# Or download from GitHub Releases
# https://github.com/byteowlz/kokorox/releases

# Generate speech
koko text "Hello, this is a test"
# Output: tmp/output.wav
```
## Installation

### Pre-built Binaries

Download from [GitHub Releases](https://github.com/byteowlz/kokorox/releases) for Linux, macOS, and Windows.

### From Source

Requires the ONNX Runtime and espeak-ng:

```sh
# macOS
brew install espeak-ng

# Ubuntu/Debian
sudo apt-get install espeak-ng libespeak-ng-dev
```

Build:

```sh
git clone https://github.com/byteowlz/kokorox.git
cd kokorox
pip install -r scripts/requirements.txt
python scripts/download_voices.py --all
cargo build --release
```
### ONNX Runtime (Linux with NVIDIA GPU)

```sh
# Download the GPU build of ONNX Runtime (the .tgz from the ONNX Runtime
# releases page), then unpack and install it system-wide
tar -xzf onnxruntime-linux-x64-gpu-1.22.0.tgz
sudo cp -a onnxruntime-linux-x64-gpu-1.22.0/include /usr/local/
sudo cp -a onnxruntime-linux-x64-gpu-1.22.0/lib /usr/local/
sudo ldconfig
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
```
## Usage

### Basic

```sh
koko text "Hello, world!" -o greeting.wav
koko file poem.txt                       # One wav per line
koko file poem.txt --merge -o poem.wav   # Entire file as a single wav
```

### Multi-language

```sh
koko text "Hola, mundo!" --lan es
koko text "你好,世界!" --lan zh
koko -a text "Bonjour!"   # Auto-detect language
```

### Voice Styles

```sh
koko voices                                 # List available voices
koko voices --language en --gender female   # Filter voices
koko text "Hello" --style af_sky
koko text "Hello" --style af_sky.4+af_nicole.5   # Mix styles
```
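A style mix combines per-voice style vectors by weight. A minimal Python sketch of how a spec like `af_sky.4+af_nicole.5` could be interpreted — assuming `.4` denotes a relative weight 0.4 and weights are normalized; the function names are illustrative, not kokorox's internals:

```python
def parse_style_mix(spec: str) -> list[tuple[str, float]]:
    """Parse 'af_sky.4+af_nicole.5' into normalized (voice, weight) pairs.

    A bare voice name gets weight 1.0; a trailing '.N' is read as 0.N.
    Weights are normalized to sum to 1.
    """
    pairs = []
    for part in spec.split("+"):
        name, dot, digits = part.rpartition(".")
        if dot and digits.isdigit():
            pairs.append((name, float("0." + digits)))
        else:
            pairs.append((part, 1.0))
    total = sum(w for _, w in pairs)
    return [(name, w / total) for name, w in pairs]


def blend(vectors: dict[str, list[float]], spec: str) -> list[float]:
    """Weighted average of per-voice style vectors according to the spec."""
    pairs = parse_style_mix(spec)
    dim = len(next(iter(vectors.values())))
    out = [0.0] * dim
    for name, w in pairs:
        for i, v in enumerate(vectors[name]):
            out[i] += w * v
    return out
```

Under this reading, `af_sky.4+af_nicole.5` yields weights 0.4 and 0.5, normalized to 4/9 and 5/9 before blending.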
### Pipe Mode (LLM Integration)

```sh
ollama run llama3 "Tell me a story" | koko pipe
ollama run llama3 "Explain physics" | koko pipe --silent -o output.wav

# Use an already running OpenAI-compatible server
ollama run llama3 "Tell me a story" | koko pipe --backend openai --server-url http://127.0.0.1:3000

# Use an already running WebSocket server
ollama run llama3 "Tell me a story" | koko pipe --backend websocket --server-url ws://127.0.0.1:8766

# Configure URL preprocessing for spoken output
koko text "Read https://openshovelshack.com/blog/the-octopus-and-the-rake" --url-mode readable
koko text "Read https://openshovelshack.com/blog/the-octopus-and-the-rake" --url-mode domain
koko text "Read https://openshovelshack.com/blog/the-octopus-and-the-rake" --url-mode skip
```
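The `--url-mode` settings rewrite URLs in the input before synthesis. A hedged Python sketch of plausible behavior — the exact transformations kokorox applies are not documented here; this assumes `readable` speaks the host plus the path with separators expanded, `domain` keeps only the host, and `skip` drops the URL entirely:

```python
import re
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://\S+")


def preprocess_urls(text: str, mode: str) -> str:
    """Rewrite URLs for speech: mode is 'readable', 'domain', or 'skip'."""
    def rewrite(m: re.Match) -> str:
        parsed = urlparse(m.group(0))
        if mode == "skip":
            return ""
        if mode == "domain":
            return parsed.netloc
        # 'readable': host, then path segments with hyphens/slashes spelled out
        words = parsed.path.strip("/").replace("-", " ").replace("/", ", ")
        return f"{parsed.netloc}, {words}" if words else parsed.netloc

    # Collapse any whitespace left behind by removed URLs
    return re.sub(r"\s+", " ", URL_RE.sub(rewrite, text)).strip()
```

For the example URL above, `domain` would leave only `openshovelshack.com` in the spoken text, while `skip` removes the URL altogether.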
### OpenAI-Compatible Server

```sh
koko openai --ip 0.0.0.0 --port 3000

curl -X POST http://localhost:3000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"model": "kokoro", "input": "Hello!", "voice": "af_sky"}' \
  -o hello.wav

curl http://localhost:3000/v1/audio/voices            # List voice IDs
curl http://localhost:3000/v1/audio/voices/detailed   # Voice metadata
```
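The same endpoint can be driven from Python. A minimal stdlib sketch that builds the request the curl call above sends (the server must be listening on localhost:3000 before the commented `urlopen` call will succeed):

```python
import json
import urllib.request


def speech_request(text: str, voice: str = "af_sky",
                   base_url: str = "http://localhost:3000") -> urllib.request.Request:
    """Build a POST request for the /v1/audio/speech endpoint."""
    payload = json.dumps({"model": "kokoro", "input": text, "voice": voice})
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# With the server running, fetch and save the audio:
# with urllib.request.urlopen(speech_request("Hello!")) as resp:
#     open("hello.wav", "wb").write(resp.read())
```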
### Streaming

```sh
koko stream > output.wav
# Type text, press Enter. Ctrl+D to exit.
```
## Docker

```sh
docker build -t kokorox .
docker run -v ./tmp:/app/tmp kokorox text "Hello from docker!" -o tmp/hello.wav
docker run -p 3000:3000 kokorox openai --ip 0.0.0.0 --port 3000
```
## Debugging

```sh
koko text "Text here" --verbose        # Detailed processing logs
koko text "Accénted" --debug-accents   # Character-by-character analysis
```
## Additional Voices

The default installation includes the standard voices. More voices (54 total across 8 languages) can be converted from Hugging Face:

```sh
python scripts/convert_pt_voices.py --all
koko -d data/voices-custom.bin text "Hello" --style en_sarah
```
## License

GPL-3.0, due to the use of the espeak-rs-sys crate, which statically links espeak-ng.