eaRS is a Rust-based streaming speech-to-text stack built on Kyutai's models. The tool is now delivered as a single CLI with two responsibilities:
- Server management: ears server start|stop launches and controls the inference backend.
- Client capture: Running ears without subcommands streams microphone audio to the server and prints live transcripts.
On Linux, eaRS uses the system sentencepiece library to avoid protobuf conflicts with ONNX Runtime.
Ubuntu/Debian:
sudo apt-get update
sudo apt-get install -y libsentencepiece-dev sentencepiece
Fedora/RHEL:
sudo dnf install -y sentencepiece-devel
Arch Linux:
sudo pacman -S sentencepiece
On macOS and Windows, sentencepiece is compiled from source and statically linked during the build process. No manual installation required.
Use the provided just recipes which automatically check and install dependencies:
# Interactive installation with feature selection
just install-ears
# Or use specific presets:
just install-ears-metal # macOS with Metal acceleration
just install-ears-cuda # NVIDIA GPU with CUDA
just install-ears-parakeet # Enable Parakeet engine
just install-ears-default # CPU only
cargo build --release
cargo build --release --features apple # For Apple silicon
cargo build --release --features nvidia # For NVIDIA GPU
cargo build --release --features parakeet # Enable Parakeet (ONNX) engine (CPU)
cargo build --release --features "parakeet nvidia" # Parakeet + CUDA (Kyutai uses CUDA too)
cargo build --release --features "parakeet apple" # Parakeet + CoreML, Kyutai + Metal (Apple Silicon)
cargo build --release --features "parakeet amd" # Parakeet + ROCm (Kyutai stays CPU)
cargo build --release --features "parakeet directml" # Parakeet + DirectML (Kyutai stays CPU)
All binaries are emitted into ./target/release/.
The recommended way to install is using the just recipes (see System Dependencies above).
For manual installation:
cargo install --path .
cargo install --path . --features apple # For Apple silicon
cargo install --path . --features nvidia # For NVIDIA GPU
# 1. Start the transcription server (runs in the background)
./target/release/ears server start
# 2. Stream your microphone to the server and print live text
./target/release/ears
Press Ctrl+C in the client to stop streaming. When you are done with the backend:
./target/release/ears server stop
./target/release/ears server start \
[--bind 0.0.0.0:8765] \
[--engine kyutai|parakeet] \
[--hf-repo kyutai/stt-1b-en_fr-candle] \
[--parakeet-repo istupakov/parakeet-tdt-0.6b-v3-onnx] \
[--parakeet-device cpu|nvidia|apple|amd|directml] \
[--parakeet-chunk-seconds 3.0] \
[--parakeet-overlap-seconds 1.0] \
[--cpu] \
[--timestamps] \
[--vad] \
[--whisper] # requires --features whisper
- --bind: Override the default bind address (0.0.0.0:<port-from-config>).
- --engine: Choose the default engine; when compiled with parakeet, both engines load and you can switch via WebSocket {"type":"setengine","engine":"parakeet"}.
- --hf-repo: Choose a different Kyutai Speech repo hosted on Hugging Face.
- --parakeet-*: Configure the Parakeet ONNX engine (defaults are multilingual, no language selection needed). Parakeet weights are CC-BY and are downloaded at runtime; nothing is redistributed.
- --cpu: Force CPU execution (otherwise CUDA/Metal is used when available).
- --timestamps: Include word timestamps in the server stream.
- --vad: Enable voice-activity detection for automatic sentence segmentation.
- --whisper: Force-enable Whisper post-processing (only when compiled with the whisper feature).
The server writes a PID file to $XDG_STATE_HOME/ears/server.pid (or ~/.local/state/ears/server.pid) so subsequent start commands will refuse to launch if an instance is already running. ears server stop sends a SIGTERM to the stored PID and removes the PID file; stale files are cleaned up automatically.
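The PID file lookup and stale-file check described above can be sketched from the outside as follows. This is a minimal illustration, not eaRS's own code: the helper names pid_file_path and read_live_pid are hypothetical, and the file is assumed to contain a single decimal PID. The liveness probe relies on POSIX signal semantics.

```python
import os
from pathlib import Path


def pid_file_path(env=None):
    """Return the PID file location described in the text.

    Uses $XDG_STATE_HOME/ears/server.pid when the variable is set and
    falls back to ~/.local/state/ears/server.pid otherwise.
    """
    env = os.environ if env is None else env
    state_home = env.get("XDG_STATE_HOME")
    base = Path(state_home) if state_home else Path.home() / ".local" / "state"
    return base / "ears" / "server.pid"


def read_live_pid(path):
    """Return the PID stored in `path` if that process exists, else None.

    Assumption: the file holds one decimal PID; the real on-disk format
    is not documented in this README.
    """
    try:
        pid = int(Path(path).read_text().strip())
    except (FileNotFoundError, ValueError):
        return None
    if pid <= 0:
        return None  # never probe process groups
    try:
        os.kill(pid, 0)  # signal 0 checks existence without signalling (POSIX)
    except ProcessLookupError:
        return None  # stale PID file: no such process
    except PermissionError:
        return pid  # process exists but belongs to another user
    return pid
```

A wrapper script could call read_live_pid(pid_file_path()) before deciding whether to run ears server start.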
./target/release/ears [--device 1] [--server ws://host:port/] [--timestamps] [--list-devices]
- --list-devices: Print available input devices and exit.
- --device: Select a specific capture device by index.
- --server: Point the client at a remote server (ws://127.0.0.1:<config-port>/ by default).
- --timestamps: Print the final transcript with per-word timing instead of live text.
The client streams raw 24 kHz mono PCM to the server and displays each live word as it appears. When the backend signals completion, the final transcript (and optional timestamps) is printed.
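For anyone writing a custom client, the following sketch shows one way to cut 24 kHz mono audio into binary chunks. The README states only the sample rate and channel count, so the sample encoding (32-bit little-endian floats) and the 80 ms chunk length used here are assumptions for illustration, and frame_pcm is a hypothetical helper.

```python
import struct

SAMPLE_RATE_HZ = 24_000  # stated in the text: 24 kHz mono


def frame_pcm(samples, chunk_ms=80):
    """Split mono float samples into little-endian f32 byte chunks.

    Assumptions (not confirmed by the README): samples are floats in
    [-1.0, 1.0], encoded as 32-bit little-endian, `chunk_ms` per chunk.
    """
    chunk_len = SAMPLE_RATE_HZ * chunk_ms // 1000
    if chunk_len <= 0:
        raise ValueError("chunk_ms is too small for one sample")
    chunks = []
    for start in range(0, len(samples), chunk_len):
        block = samples[start:start + chunk_len]
        # Clip to the nominal range before packing.
        clipped = [max(-1.0, min(1.0, s)) for s in block]
        chunks.append(struct.pack(f"<{len(clipped)}f", *clipped))
    return chunks
```

With these assumptions, one second of audio yields twelve full chunks of 1920 samples plus a shorter final chunk; check the actual wire format against the server before relying on it.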
Runtime configuration lives at:
$XDG_CONFIG_HOME/ears/config.toml
# or ~/.config/ears/config.toml
Key sections:
- [storage]: Override model cache directories and reference audio location.
- [whisper]: Configure optional Whisper enhancement defaults (model, quantization, languages, sentence detection thresholds).
- [server]: Default WebSocket port used by ears server start and the capture client.
- [dictation]: Enable live typing and configure in-app hotkeys.
- [dictation.notifications]: Toggle desktop popups and customise start/pause/stop messages shown for dictation state changes.
- [dictation.hooks] (requires cargo build --features hooks): Run shell commands on start, pause, or stop transitions (e.g., change colours in status bars).
If the file does not exist, it is created on first run together with the reference audio bundle.
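To inspect the configuration from a script, the path resolution above can be mirrored as in the sketch below. The helper names config_path and config_sections are hypothetical, and no key names are assumed: the code only lists the top-level tables it finds. It uses tomllib, which is in the Python standard library from 3.11 onward.

```python
import os
from pathlib import Path

try:
    import tomllib  # standard library since Python 3.11
except ImportError:
    tomllib = None


def config_path(env=None):
    """Return $XDG_CONFIG_HOME/ears/config.toml or the ~/.config fallback."""
    env = os.environ if env is None else env
    config_home = env.get("XDG_CONFIG_HOME")
    base = Path(config_home) if config_home else Path.home() / ".config"
    return base / "ears" / "config.toml"


def config_sections(path):
    """Return the sorted top-level table names, or [] if unavailable."""
    if tomllib is None or not Path(path).is_file():
        return []
    with open(path, "rb") as fh:
        return sorted(tomllib.load(fh).keys())
```

Calling config_sections(config_path()) on a populated file would be expected to include names such as server and storage.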
The server emits JSON events:
- {"type":"word","word":"hello","start_time":1.23,"end_time":null} – live word updates.
- {"type":"final","text":"…","words":[…]} – final transcript with timestamp list.
- {"type":"whisper_processing"|"whisper_complete",…} – optional Whisper status messages when Whisper is enabled.
Clients may send {"type":"stop"} to end the current session (the capture client does this automatically when interrupted).
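A custom client needs to dispatch on the type field of each event and to build the control messages. The sketch below does both using only the fields shown in this README; the function names are hypothetical, the WebSocket transport itself is left out, and any fields beyond those listed are not assumed.

```python
import json

ENGINES = ("kyutai", "parakeet")  # engine names accepted by --engine


def describe_event(raw):
    """Turn one JSON event from the server into a printable line."""
    event = json.loads(raw)
    kind = event.get("type")
    if kind == "word":
        start = event.get("start_time")
        prefix = f"[{start:.2f}s] " if isinstance(start, (int, float)) else ""
        return prefix + event.get("word", "")
    if kind == "final":
        return "final: " + event.get("text", "")
    if kind in ("whisper_processing", "whisper_complete"):
        return "whisper: " + kind
    return f"unhandled event type: {kind!r}"


def stop_message():
    """Message a client sends to end the current session."""
    return json.dumps({"type": "stop"})


def set_engine_message(engine):
    """Message that switches engines when both are compiled in."""
    if engine not in ENGINES:
        raise ValueError(f"unknown engine: {engine!r}")
    return json.dumps({"type": "setengine", "engine": engine})
```

Unknown event types are reported rather than raised, so a client built this way keeps running if the server adds new messages.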
- ears server start reports "already running" – Use ears server stop to terminate the existing instance. If the PID no longer exists, stop will clean up the stale PID file.
- Client prints "failed to connect" – Ensure the server is running and reachable at the URL passed via --server (check the configured port).
- High latency – Run the server on the same machine as the client or enable GPU acceleration (--features nvidia or --features apple).
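When diagnosing "failed to connect", a quick TCP probe can tell a closed port apart from other problems. The sketch below is a hypothetical helper, not part of eaRS: it only checks that something accepts connections at the host and port of the given ws:// URL and does not perform a WebSocket handshake.

```python
import socket
from urllib.parse import urlparse


def server_reachable(url, timeout=2.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    try:
        parsed = urlparse(url)
        host = parsed.hostname
        # Fall back to the scheme's conventional port if none is given.
        port = parsed.port or (443 if parsed.scheme == "wss" else 80)
    except ValueError:
        return False  # malformed URL or port
    if host is None:
        return False
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False  # refused, timed out, or unresolvable
```

For example, server_reachable("ws://127.0.0.1:8765/") probes the port used in the --bind example above; substitute the port from your own configuration.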