Echo - Voice Input Application

Offline voice input desktop application optimized for Apple Silicon

Features

High-Accuracy Speech Recognition: Whisper, Parakeet TDT (JA), and Cohere Transcribe — all running fully in-process in Rust (no Python sidecar)
Model Switching: Switch between Whisper, Parakeet-JA, and (opt-in) Cohere models in Settings
LLM-Based Post-Processing: Optional on-device cleanup of filler words and self-corrections (Qwen3, in-process via MLX)
Context-Aware Formatting: Detects active application for context-appropriate output
Customizable Prompts: Edit post-processing behavior via Advanced Settings
Always-On Transcription: Continuous listening mode with automatic speech detection (Silero VAD)
Global Hotkey: System-wide keyboard shortcut for instant recording
Real-time Transcription: Immediate transcription after recording ends
Auto Text Insertion: Automatically paste transcription into active applications
Transcription History: SQLite-backed history with full-text search for always-on mode
Fully Offline: No internet connection required — all processing happens locally

System Requirements

OS: macOS 14.0 (Sonoma) or later
CPU: Apple Silicon (M1/M2/M3/M4)
Memory: 8GB+ recommended
Storage: 2GB+ free space (varies by model size)

Supported Models

All ASR models run in-process in Rust — there is no Python dependency at runtime.

Whisper (default)

Runs on whisper.cpp (Metal) via whisper-rs.

whisper-large-v3-turbo — balanced performance (default)
whisper-large-v3 — highest accuracy
whisper-medium / small / base / tiny — lightweight models

Parakeet TDT (Japanese)

NVIDIA FastConformer + TDT transducer, ported to Apple MLX via mlx-rs. Japanese-specialized (CER 6.4% on JSUT), ~160 ms inference, non-gated.

parakeet-tdt_ctc-0.6b-ja — 0.6B

Cohere Transcribe (gated, opt-in)

Cohere Labs 2B, ported to Apple MLX via mlx-rs. 14 languages including JA/ZH/KO; strong on clean read-speech. Requires HuggingFace authentication and license acceptance — see Gated Model Access.

cohere-transcribe-03-2026 — 2B (BF16, ~4GB download)

Note: Qwen3-ASR is not currently available — it has not yet been ported to the in-process Rust engine and is intentionally omitted to avoid a non-functional option.

Installation

Download the latest release from the Releases page:

Download the .dmg file
Open the DMG and drag Echo to your Applications folder
Launch Echo from Applications
Grant microphone permissions when prompted

The app is fully self-contained — no Python or other runtime installation required.

Usage

Launch the app
Press and hold Cmd+Shift+Space to start recording
Speak your message
Release the key when finished
Transcription appears automatically
Text is auto-inserted into the active application (if enabled in Settings)

Always-On Transcription

Echo also supports continuous listening mode, which runs in the background and automatically detects speech:

Click the "Start Listening" button in the app
Echo continuously monitors the microphone using Silero VAD (Voice Activity Detection)
When speech is detected, it automatically records the segment
After a silence pause (1.5s), the segment is transcribed and saved to history
Click "Stop Listening" to end the session

Transcription history is stored in a local SQLite database and can be searched via the History panel.

Settings

Customize Echo via the Settings panel:

ASR Model: Choose between Whisper, Parakeet-JA, or (when enabled) Cohere models
Hotkey: Customize the recording keyboard shortcut
Recognition Language: Auto-detect or manually specify language
Input Device: Select your preferred microphone
Auto Insert: Enable/disable automatic paste after transcription
Post-Processing: Enable LLM-based cleanup to remove filler words and self-corrections
Advanced Settings: Customize the post-processing prompt, and configure gated model access

Gated Model Access

Some models (currently Cohere Transcribe) require HuggingFace authentication and license acceptance. Enable them via Settings → Gated Models (Advanced):

Accept the model license on the upstream HF page (e.g. huggingface.co/CohereLabs/cohere-transcribe-03-2026).
Create a read-scoped HF token at huggingface.co/settings/tokens.
Enter the token into Echo's Advanced settings and enable gated access.

The token is stored locally in settings.json and used only to download the gated checkpoint. When gated access is disabled, gated models are hidden from the picker.

Development Setup

Prerequisites

Node.js 20+
Rust 1.83+
Xcode 16+ with the Metal toolchain installed:
```
xcodebuild -downloadComponent MetalToolchain   # ~700 MB, one-time
```
Required because both mlx-rs (Apple MLX + Metal kernels) and whisper-rs (whisper.cpp + ggml + Metal) compile native code from source.

No Python is required — the ASR and post-processing engines are compiled into the app binary via the rust-asr crate.

Installation

# Install Node.js dependencies
npm install

# Build the Rust backend + in-process ASR engines (handled by Tauri)
cd src-tauri && cargo build

Development Server

npm run tauri:dev

Building

# Build frontend only
npm run build

# Build full Tauri application
npm run tauri:build

# Signed local build (preserves Accessibility permissions across updates)
export APPLE_SIGNING_IDENTITY="Developer ID Application: Your Name (TEAMID)"
npm run tauri:build:signed

Standalone ASR CLI / parity harness

The in-process engines can be exercised directly via the rust-asr crate:

cd rust-asr && cargo build --release
./target/release/rust-asr run <audio_16k.wav> <lang>   # Whisper
./target/release/rust-asr pk-run <audio.wav>           # Parakeet-JA
./target/release/rust-asr pp "<text>" [qwen3-model]    # Post-processing

Tech Stack

Frontend: React 18, TypeScript, Vite, Tailwind CSS
Backend: Tauri 2.x, Rust
ASR Engines (in-process): Whisper via whisper-rs (whisper.cpp/Metal); Parakeet-JA + Cohere via mlx-rs (Apple MLX)
VAD: Silero VAD v5 via ONNX Runtime, rubato for resampling
Post-Processing: Qwen3 (4-bit, default 4B) ported to mlx-rs, in-process
Database: SQLite with FTS5 full-text search (rusqlite)
Platform: macOS 14.0+ on Apple Silicon

Architecture

Echo runs entirely in a single process — speech recognition and post-processing are native Rust, with no Python sidecar.

Tauri App (Rust): Main application, hotkey handling, audio capture, VAD, active app detection
React Frontend: User interface, settings management, transcription history
In-Process ASR Engines (rust-asr): Whisper (whisper.cpp/Metal), Parakeet-JA and Cohere (Apple MLX). ASREngine in transcription.rs owns the loaded engines and dispatches by the active model id.
In-Process Post-Processor: Optional on-device cleanup using Qwen3 (full-Rust MLX port), context-aware via active-app detection
Continuous Pipeline (Rust): Streaming audio → Silero VAD (ONNX) → segment detection → ASR → SQLite

Both Metal backends (whisper.cpp and MLX) coexist in-process and are serialized by the ASREngine mutex. Models are lazily loaded — they download on first use and remain cached locally; the post-processor LLM auto-loads on startup if enabled.

Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

License

MIT License - see LICENSE for details

Name		Name	Last commit message	Last commit date
Latest commit History 53 Commits
.github/workflows		.github/workflows
docs		docs
echo-cli		echo-cli
public		public
rust-asr		rust-asr
scripts		scripts
src-tauri		src-tauri
src		src
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
LICENSE		LICENSE
README.md		README.md
design_doc.md		design_doc.md
float.html		float.html
index.html		index.html
package-lock.json		package-lock.json
package.json		package.json
postcss.config.js		postcss.config.js
tailwind.config.js		tailwind.config.js
tsconfig.json		tsconfig.json
tsconfig.node.json		tsconfig.node.json
vite.config.ts		vite.config.ts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Echo - Voice Input Application

Features

System Requirements

Supported Models

Whisper (default)

Parakeet TDT (Japanese)

Cohere Transcribe (gated, opt-in)

Installation

Usage

Always-On Transcription

Settings

Gated Model Access

Development Setup

Prerequisites

Installation

Development Server

Building

Standalone ASR CLI / parity harness

Tech Stack

Architecture

Contributing

License

About

Uh oh!

Releases 15

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Echo - Voice Input Application

Features

System Requirements

Supported Models

Whisper (default)

Parakeet TDT (Japanese)

Cohere Transcribe (gated, opt-in)

Installation

Usage

Always-On Transcription

Settings

Gated Model Access

Development Setup

Prerequisites

Installation

Development Server

Building

Standalone ASR CLI / parity harness

Tech Stack

Architecture

Contributing

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 15

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages