Offline voice input desktop application optimized for Apple Silicon
- High-Accuracy Speech Recognition: Whisper, Parakeet TDT (JA), and Cohere Transcribe — all running fully in-process in Rust (no Python sidecar)
- Model Switching: Switch between Whisper, Parakeet-JA, and (opt-in) Cohere models in Settings
- LLM-Based Post-Processing: Optional on-device cleanup of filler words and self-corrections (Qwen3, in-process via MLX)
- Context-Aware Formatting: Detects active application for context-appropriate output
- Customizable Prompts: Edit post-processing behavior via Advanced Settings
- Always-On Transcription: Continuous listening mode with automatic speech detection (Silero VAD)
- Global Hotkey: System-wide keyboard shortcut for instant recording
- Real-time Transcription: Immediate transcription after recording ends
- Auto Text Insertion: Automatically paste transcription into active applications
- Transcription History: SQLite-backed history with full-text search for always-on mode
- Fully Offline: No internet connection required — all processing happens locally
- OS: macOS 14.0 (Sonoma) or later
- CPU: Apple Silicon (M1/M2/M3/M4)
- Memory: 8GB+ recommended
- Storage: 2GB+ free space (varies by model size)
All ASR models run in-process in Rust — there is no Python dependency at runtime.
Runs on whisper.cpp (Metal) via whisper-rs.
whisper-large-v3-turbo— balanced performance (default)whisper-large-v3— highest accuracywhisper-medium/small/base/tiny— lightweight models
NVIDIA FastConformer + TDT transducer, ported to Apple MLX via mlx-rs. Japanese-specialized (CER 6.4% on JSUT), ~160 ms inference, non-gated.
parakeet-tdt_ctc-0.6b-ja— 0.6B
Cohere Labs 2B, ported to Apple MLX via mlx-rs. 14 languages including JA/ZH/KO; strong on clean read-speech. Requires HuggingFace authentication and license acceptance — see Gated Model Access.
cohere-transcribe-03-2026— 2B (BF16, ~4GB download)
Note: Qwen3-ASR is not currently available — it has not yet been ported to the in-process Rust engine and is intentionally omitted to avoid a non-functional option.
Download the latest release from the Releases page:
- Download the
.dmgfile - Open the DMG and drag Echo to your Applications folder
- Launch Echo from Applications
- Grant microphone permissions when prompted
The app is fully self-contained — no Python or other runtime installation required.
- Launch the app
- Press and hold
Cmd+Shift+Spaceto start recording - Speak your message
- Release the key when finished
- Transcription appears automatically
- Text is auto-inserted into the active application (if enabled in Settings)
Echo also supports continuous listening mode, which runs in the background and automatically detects speech:
- Click the "Start Listening" button in the app
- Echo continuously monitors the microphone using Silero VAD (Voice Activity Detection)
- When speech is detected, it automatically records the segment
- After a silence pause (1.5s), the segment is transcribed and saved to history
- Click "Stop Listening" to end the session
Transcription history is stored in a local SQLite database and can be searched via the History panel.
Customize Echo via the Settings panel:
- ASR Model: Choose between Whisper, Parakeet-JA, or (when enabled) Cohere models
- Hotkey: Customize the recording keyboard shortcut
- Recognition Language: Auto-detect or manually specify language
- Input Device: Select your preferred microphone
- Auto Insert: Enable/disable automatic paste after transcription
- Post-Processing: Enable LLM-based cleanup to remove filler words and self-corrections
- Advanced Settings: Customize the post-processing prompt, and configure gated model access
Some models (currently Cohere Transcribe) require HuggingFace authentication and license acceptance. Enable them via Settings → Gated Models (Advanced):
- Accept the model license on the upstream HF page (e.g.
huggingface.co/CohereLabs/cohere-transcribe-03-2026). - Create a read-scoped HF token at
huggingface.co/settings/tokens. - Enter the token into Echo's Advanced settings and enable gated access.
The token is stored locally in settings.json and used only to download the gated checkpoint. When gated access is disabled, gated models are hidden from the picker.
- Node.js 20+
- Rust 1.83+
- Xcode 16+ with the Metal toolchain installed:
Required because both
xcodebuild -downloadComponent MetalToolchain # ~700 MB, one-timemlx-rs(Apple MLX + Metal kernels) andwhisper-rs(whisper.cpp + ggml + Metal) compile native code from source.
No Python is required — the ASR and post-processing engines are compiled into the app binary via the rust-asr crate.
# Install Node.js dependencies
npm install
# Build the Rust backend + in-process ASR engines (handled by Tauri)
cd src-tauri && cargo buildnpm run tauri:dev# Build frontend only
npm run build
# Build full Tauri application
npm run tauri:build
# Signed local build (preserves Accessibility permissions across updates)
export APPLE_SIGNING_IDENTITY="Developer ID Application: Your Name (TEAMID)"
npm run tauri:build:signedThe in-process engines can be exercised directly via the rust-asr crate:
cd rust-asr && cargo build --release
./target/release/rust-asr run <audio_16k.wav> <lang> # Whisper
./target/release/rust-asr pk-run <audio.wav> # Parakeet-JA
./target/release/rust-asr pp "<text>" [qwen3-model] # Post-processing- Frontend: React 18, TypeScript, Vite, Tailwind CSS
- Backend: Tauri 2.x, Rust
- ASR Engines (in-process): Whisper via
whisper-rs(whisper.cpp/Metal); Parakeet-JA + Cohere viamlx-rs(Apple MLX) - VAD: Silero VAD v5 via ONNX Runtime, rubato for resampling
- Post-Processing: Qwen3 (4-bit, default 4B) ported to
mlx-rs, in-process - Database: SQLite with FTS5 full-text search (
rusqlite) - Platform: macOS 14.0+ on Apple Silicon
Echo runs entirely in a single process — speech recognition and post-processing are native Rust, with no Python sidecar.
- Tauri App (Rust): Main application, hotkey handling, audio capture, VAD, active app detection
- React Frontend: User interface, settings management, transcription history
- In-Process ASR Engines (
rust-asr): Whisper (whisper.cpp/Metal), Parakeet-JA and Cohere (Apple MLX).ASREngineintranscription.rsowns the loaded engines and dispatches by the active model id. - In-Process Post-Processor: Optional on-device cleanup using Qwen3 (full-Rust MLX port), context-aware via active-app detection
- Continuous Pipeline (Rust): Streaming audio → Silero VAD (ONNX) → segment detection → ASR → SQLite
Both Metal backends (whisper.cpp and MLX) coexist in-process and are serialized
by the ASREngine mutex. Models are lazily loaded — they download on first use
and remain cached locally; the post-processor LLM auto-loads on startup if enabled.
Contributions are welcome! Please feel free to submit issues or pull requests.
MIT License - see LICENSE for details