Local-first voice AI for speech, chat, and audio workflows.
Website - Documentation - Releases - Issues
Izwi is a desktop app, web UI, CLI, and local inference server for voice AI. It runs on your machine and exposes both product workflows and OpenAI-compatible API routes without requiring cloud services or API keys.
- Real-time voice conversations with local ASR, chat, and TTS models.
- Text-to-speech, long-form Studio projects, voice cloning, voice design, and saved voices.
- Transcription, speaker diarization, forced alignment, and realtime speech-to-text.
- Local chat, model download/load/unload/delete, history, exports, and settings.
- OpenAI-compatible
/v1APIs for models, chat completions, audio speech, audio transcriptions, and preview Responses support.
Inference data stays local. Optional anonymous desktop analytics are disabled unless a user explicitly opts in, and they do not send prompts, transcripts, audio payloads, local paths, or personal identifiers.
Download the latest build from GitHub Releases.
- macOS: install the
.dmg, dragIzwi.appto Applications, then launch it. - Linux: install the
.debpackage withsudo dpkg -i izwi_*.deb. - Windows: run the
.exeinstaller.
Runtime support depends on the artifact:
- macOS Apple Silicon release builds use Metal.
- Linux and Windows release builds are CPU-only.
- CUDA is supported through the Docker CUDA profile or source builds on compatible NVIDIA hosts.
See the Runtime Support Matrix for the full contract.
Start the local server and web UI:
izwi serve --mode webServer-only and desktop modes are also available:
izwi serve
izwi serve --mode desktopDownload a model and generate speech:
izwi pull Qwen3-TTS-12Hz-0.6B-Base
izwi tts "Hello from Izwi." --output hello.wavTranscribe audio:
izwi pull Parakeet-TDT-0.6B-v3
izwi transcribe audio.wav --model Parakeet-TDT-0.6B-v3Open the app at http://localhost:8080. The local API reference is available
at http://localhost:8080/docs, and the raw OpenAPI document is available at
http://localhost:8080/openapi.json.
Run izwi list to see the enabled catalog. Current families include:
- TTS: Qwen3-TTS, Kokoro-82M, Voxtral TTS, and VibeVoice.
- ASR: Parakeet, Whisper, Qwen3-ASR, Nemotron 3.5 ASR, VibeVoice ASR, LFM2.5 Audio, and Voxtral Mini.
- Diarization and alignment: Sortformer diarization and Qwen3 ForcedAligner.
- Chat: Qwen3, Qwen3.5, LFM2.5, and Gemma.
Some model weights and bundled assets have their own licenses or access terms. Check the Models Guide before redistribution or commercial use of downloaded model artifacts.
For CLI/server installs, use the project install script:
./scripts/install-cli.shFor manual builds, scope Cargo to the binaries you need:
cargo build --release -p izwi-cli
cargo build --release -p izwi-serverCUDA source builds require the matching CUDA toolkit and Cargo features for the target host. See From Source for platform-specific setup.
Izwi is licensed under the MIT License.