# kokorox - fast Kokoro TTS in Rust

A Rust implementation of the Kokoro text-to-speech model: a small model (87M parameters) with high-quality output and very fast inference.
## Features

- Multi-language: English, Chinese, Japanese, Spanish, French, and more via espeak-ng
- Voice style mixing (e.g., `af_sky.4+af_nicole.5`)
- OpenAI-compatible API server
- Streaming and pipe modes for LLM integration
- Automatic language detection
## Quick Start

```sh
# Install (macOS)
brew install byteowlz/tap/koko

# Or download from GitHub Releases
# https://github.com/byteowlz/kokorox/releases

# Generate speech
koko text "Hello, this is a test"
# Output: tmp/output.wav
```
## Installation

### Pre-built Binaries

Download from [GitHub Releases](https://github.com/byteowlz/kokorox/releases) for Linux, macOS, and Windows.

### From Source

Requires the ONNX Runtime and espeak-ng:

```sh
# macOS
brew install espeak-ng

# Ubuntu/Debian
sudo apt-get install espeak-ng libespeak-ng-dev
```

Build:

```sh
git clone https://github.com/byteowlz/kokorox.git
cd kokorox
pip install -r scripts/requirements.txt
python scripts/download_voices.py --all
cargo build --release
```
### ONNX Runtime (Linux with NVIDIA GPU)

```sh
# Download the GPU build of ONNX Runtime (the .tgz from the ONNX Runtime
# releases page), then unpack and install it system-wide
tar -xzf onnxruntime-linux-x64-gpu-1.22.0.tgz
sudo cp -a onnxruntime-linux-x64-gpu-1.22.0/include /usr/local/
sudo cp -a onnxruntime-linux-x64-gpu-1.22.0/lib /usr/local/
sudo ldconfig
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
```
## Usage

### Basic

```sh
koko text "Hello, world!" -o greeting.wav
koko file poem.txt                       # One wav per line
koko file poem.txt --merge -o poem.wav   # Entire file as a single wav
```

### Multi-language

```sh
koko text "Hola, mundo!" --lan es
koko text "你好,世界!" --lan zh
koko -a text "Bonjour!"   # Auto-detect language
```

### Voice Styles

```sh
koko voices                                 # List available voices
koko voices --language en --gender female   # Filter voices
koko text "Hello" --style af_sky
koko text "Hello" --style af_sky.4+af_nicole.5   # Mix styles
```
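A style mix combines per-voice style vectors by weight. A minimal Python sketch of how a spec like `af_sky.4+af_nicole.5` could be interpreted — assuming `.4` denotes a relative weight 0.4 and weights are normalized; the function names are illustrative, not kokorox's internals:

```python
def parse_style_mix(spec: str) -> list[tuple[str, float]]:
    """Parse 'af_sky.4+af_nicole.5' into normalized (voice, weight) pairs.

    A bare voice name gets weight 1.0; a trailing '.N' is read as 0.N.
    Weights are normalized to sum to 1.
    """
    pairs = []
    for part in spec.split("+"):
        name, dot, digits = part.rpartition(".")
        if dot and digits.isdigit():
            pairs.append((name, float("0." + digits)))
        else:
            pairs.append((part, 1.0))
    total = sum(w for _, w in pairs)
    return [(name, w / total) for name, w in pairs]


def blend(vectors: dict[str, list[float]], spec: str) -> list[float]:
    """Weighted average of per-voice style vectors according to the spec."""
    pairs = parse_style_mix(spec)
    dim = len(next(iter(vectors.values())))
    out = [0.0] * dim
    for name, w in pairs:
        for i, v in enumerate(vectors[name]):
            out[i] += w * v
    return out
```

Under this reading, `af_sky.4+af_nicole.5` yields weights 0.4 and 0.5, normalized to 4/9 and 5/9 before blending.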
### Pipe Mode (LLM Integration)

```sh
ollama run llama3 "Tell me a story" | koko pipe
ollama run llama3 "Explain physics" | koko pipe --silent -o output.wav

# Use an already running OpenAI-compatible server
ollama run llama3 "Tell me a story" | koko pipe --backend openai --server-url http://127.0.0.1:3000

# Use an already running WebSocket server
ollama run llama3 "Tell me a story" | koko pipe --backend websocket --server-url ws://127.0.0.1:8766

# Configure URL preprocessing for spoken output
koko text "Read https://openshovelshack.com/blog/the-octopus-and-the-rake" --url-mode readable
koko text "Read https://openshovelshack.com/blog/the-octopus-and-the-rake" --url-mode domain
koko text "Read https://openshovelshack.com/blog/the-octopus-and-the-rake" --url-mode skip
```
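The `--url-mode` settings rewrite URLs in the input before synthesis. A hedged Python sketch of plausible behavior — the exact transformations kokorox applies are not documented here; this assumes `readable` speaks the host plus the path with separators expanded, `domain` keeps only the host, and `skip` drops the URL entirely:

```python
import re
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://\S+")


def preprocess_urls(text: str, mode: str) -> str:
    """Rewrite URLs for speech: mode is 'readable', 'domain', or 'skip'."""
    def rewrite(m: re.Match) -> str:
        parsed = urlparse(m.group(0))
        if mode == "skip":
            return ""
        if mode == "domain":
            return parsed.netloc
        # 'readable': host, then path segments with hyphens/slashes spelled out
        words = parsed.path.strip("/").replace("-", " ").replace("/", ", ")
        return f"{parsed.netloc}, {words}" if words else parsed.netloc

    # Collapse any whitespace left behind by removed URLs
    return re.sub(r"\s+", " ", URL_RE.sub(rewrite, text)).strip()
```

For the example URL above, `domain` would leave only `openshovelshack.com` in the spoken text, while `skip` removes the URL altogether.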
### OpenAI-Compatible Server

```sh
koko openai --ip 0.0.0.0 --port 3000

curl -X POST http://localhost:3000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"model": "kokoro", "input": "Hello!", "voice": "af_sky"}' \
  -o hello.wav

curl http://localhost:3000/v1/audio/voices            # List voice IDs
curl http://localhost:3000/v1/audio/voices/detailed   # Voice metadata
```
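The same endpoint can be driven from Python. A minimal stdlib sketch that builds the request the curl call above sends (the server must be listening on localhost:3000 before the commented `urlopen` call will succeed):

```python
import json
import urllib.request


def speech_request(text: str, voice: str = "af_sky",
                   base_url: str = "http://localhost:3000") -> urllib.request.Request:
    """Build a POST request for the /v1/audio/speech endpoint."""
    payload = json.dumps({"model": "kokoro", "input": text, "voice": voice})
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# With the server running, fetch and save the audio:
# with urllib.request.urlopen(speech_request("Hello!")) as resp:
#     open("hello.wav", "wb").write(resp.read())
```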
### Streaming

```sh
koko stream > output.wav
# Type text, press Enter. Ctrl+D to exit.
```
## Docker

```sh
docker build -t kokorox .
docker run -v ./tmp:/app/tmp kokorox text "Hello from docker!" -o tmp/hello.wav
docker run -p 3000:3000 kokorox openai --ip 0.0.0.0 --port 3000
```
## Debugging

```sh
koko text "Text here" --verbose        # Detailed processing logs
koko text "Accénted" --debug-accents   # Character-by-character analysis
```
## Additional Voices

The default installation includes the standard voices. More voices (54 total across 8 languages) can be converted from Hugging Face:

```sh
python scripts/convert_pt_voices.py --all
koko -d data/voices-custom.bin text "Hello" --style en_sarah
```
## License

GPL-3.0, due to the use of the espeak-rs-sys crate, which statically links espeak-ng.