Linux has always punched above its weight, except when it comes to voice typing. Vocalinux fixes that.
It's a free, GPLv3-licensed desktop app that lets you dictate text into any application, on X11 or Wayland, using fully offline speech recognition. Pick from three engines (whisper.cpp, OpenAI Whisper, or VOSK), get automatic GPU acceleration via Vulkan, and control it all with customizable keyboard shortcuts: toggle or push-to-talk.
No internet required. No data leaves your machine. Just speak and type.
🎉 Release: Remote API recognition, Silero VAD, and reliability hardening across threading, IBus, settings, installer, and model metadata.
| Feature | Description |
|---|---|
| 🌐 Remote API Engine | New speech recognition backend for self-hosted or compatible remote transcription services |
| 🎙️ Silero VAD | Neural voice activity detection drops silence-only buffers for cleaner, faster dictation |
| 🧵 Thread Safety | Hardened Remote API, IBus, and text injection threading behavior |
| 🔌 IBus Reliability | Preserves user engines for dead keys and captures scoped engines during activation |
| ⚙️ Settings Polish | Remote Server section now respects the Advanced toggle and the dialog fits lower-resolution screens |
| 📦 Installer & Models | CUDA diagnostics include auto-remediation and model download sizes are corrected |
- Remote API speech recognition engine — Configure Vocalinux to use compatible remote transcription services while keeping existing local engines available (#335)
- Silero VAD — Neural voice activity detection filters silence-only buffers for cleaner recognition when ONNX Runtime support is installed (#447)
- Threading: Harden Remote API, IBus, and text injection thread safety (#452)
- IBus: Preserve user engines for dead keys and capture the current engine during scoped activation (#457, #458)
- UI: Keep the Remote Server section behind the Advanced toggle and reduce settings dialog height for lower-resolution screens (#454, #456)
- Installer: Harden CUDA diagnostics with auto-remediation and behavioral tests (#451)
- Models: Correct whisper.cpp and VOSK download size metadata (#453)
- Startup: Allow launch without the pynput backend (#448)
- Website: Clarify speech demo browser support (#449)
- Developer docs — Remote API test server instructions for easier backend testing (#455)
- Community — GitHub Sponsors funding configuration added
- Behavioral coverage — CUDA diagnostics and release-facing reliability fixes include targeted tests
- 🎤 Toggle or Push-to-Talk activation modes
- ⚡ Real-time transcription with minimal latency
- 🌎 Universal compatibility across all Linux applications
- 🔒 100% Offline operation for privacy and reliability
- 🤖 whisper.cpp by default - High-performance C++ speech recognition
- 🎮 Universal GPU support - Vulkan acceleration for AMD, Intel, and NVIDIA
- 🎨 System tray integration with visual status indicators
- 🚀 Start on login support via XDG autostart (desktop-session startup)
- 🔊 Pleasant audio feedback - smooth gliding tones, headphone-friendly
- ⚙️ Graphical settings dialog for easy configuration
- 📦 3 engine choices - whisper.cpp (default), OpenAI Whisper, or VOSK
Here are some screenshots showcasing Vocalinux in action:
- Real-time voice-to-text transcription
- System tray with listening indicator
- About view with version info
- Log viewer for debugging
- Overview of key features and configuration options with annotations
Our new interactive installer guides you through setup with intelligent hardware detection:
curl -fsSL raw.githubusercontent.com/jatinkrmalik/vocalinux/main/install.sh -o /tmp/vl.sh && bash /tmp/vl.sh
Choose your engine:
- whisper.cpp ⭐ (Recommended) - Fast, works with any GPU via Vulkan
- Whisper (OpenAI) - PyTorch-based, NVIDIA GPU only
- VOSK - Lightweight, works on older systems
The installer will:
- Auto-detect your hardware (GPU, RAM, Vulkan support)
- Recommend the best engine for your system
- Download the appropriate model (~74MB for the default whisper.cpp tiny model)
- Install neural VAD support when ONNX Runtime is available
- Install in ~1-2 minutes (vs 5-10 min with old Whisper)
Note: The installer always installs the latest release. For a specific version, check GitHub Releases.
Default (whisper.cpp - recommended):
curl -fsSL raw.githubusercontent.com/jatinkrmalik/vocalinux/main/install.sh -o /tmp/vl.sh && bash /tmp/vl.sh
Fastest installation (~1-2 min), universal GPU support via Vulkan.
Whisper (OpenAI) - if you prefer PyTorch:
curl -fsSL raw.githubusercontent.com/jatinkrmalik/vocalinux/main/install.sh -o /tmp/vl.sh && bash /tmp/vl.sh --engine=whisper
NVIDIA GPU only (~5-10 min, downloads PyTorch + CUDA).
VOSK only - for low-RAM systems:
curl -fsSL raw.githubusercontent.com/jatinkrmalik/vocalinux/main/install.sh -o /tmp/vl.sh && bash /tmp/vl.sh --engine=vosk
Lightweight option (~40MB), works on systems with 4GB RAM.
# Clone the repository
git clone https://github.com/jatinkrmalik/vocalinux.git
cd vocalinux
# Run the installer (will prompt for Whisper)
./install.sh
# Or with Whisper support
./install.sh --with-whisper
The installer handles everything: system dependencies, Python environment, speech models, and desktop integration.
Developers and early adopters who want to test the latest features can use our GitHub Releases page, which includes both beta and nightly builds.
⚠️ Warning: Nightly releases contain the absolute latest code and may be unstable. For production use, we recommend using the latest beta release.
Nightly builds are automatically generated from the main branch every day. They include all merged changes but haven't undergone the same testing as beta releases.
Release Channels:
- Beta (Recommended) - Tested pre-releases with known features
- Nightly - Untested bleeding edge with latest commits
# If ~/.local/bin is in your PATH (recommended):
vocalinux
# Or activate the virtual environment first:
source ~/.local/bin/activate-vocalinux.sh
vocalinux
# Or run directly:
~/.local/share/vocalinux/venv/bin/vocalinux
Or launch it from your application menu!
- OS: Linux (tested on Ubuntu 22.04+, Debian 11+, Fedora 39+, Arch Linux, openSUSE Tumbleweed)
- Python: 3.9 or newer
- Display: X11 or Wayland
- Hardware: Microphone for voice input
Note: See Distribution Compatibility for distribution-specific information and experimental support for Gentoo, Alpine, Void, Solus, and more.
- Toggle mode: Double-tap the shortcut key (default Ctrl) to start recording
- Speak clearly into your microphone
- To stop: in Toggle mode, double-tap again (or pause speaking); in Push-to-Talk mode, release the key
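As a rough illustration of the double-tap activation described above, here is a minimal sketch of detecting two presses of the same key in quick succession. The class name, the 0.3 second window, and the timestamp-based interface are all assumptions made for this example; they are not taken from the Vocalinux source.

```python
class DoubleTapDetector:
    """Hypothetical helper: reports when two key presses arrive close together."""

    def __init__(self, max_interval_s: float = 0.3):
        # 0.3 s is an assumed window, not a documented Vocalinux value.
        self.max_interval_s = max_interval_s
        self._last_press = None

    def on_press(self, timestamp: float) -> bool:
        """Return True when this press completes a double-tap."""
        is_double = (
            self._last_press is not None
            and timestamp - self._last_press <= self.max_interval_s
        )
        # Reset after a double-tap so a third quick press starts a new sequence.
        self._last_press = None if is_double else timestamp
        return is_double
```

A key listener would call `on_press(time.monotonic())` for each press of the shortcut key and toggle recording whenever it returns `True`.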
| Command | Action |
|---|---|
| "new line" | Inserts a line break |
| "period" / "full stop" | Types a period (.) |
| "comma" | Types a comma (,) |
| "question mark" | Types a question mark (?) |
| "exclamation mark" | Types an exclamation mark (!) |
| "delete that" | Deletes the last sentence |
| "capitalize" | Capitalizes the next word |
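To make the table concrete, the following sketch shows one way spoken punctuation commands could be turned into characters after transcription. It covers only the insertion commands from the table; "delete that" and "capitalize" are editing actions and are left out. The function and dictionary names are hypothetical, and the regex approach is an assumption rather than a description of how Vocalinux processes commands.

```python
import re

# Spoken phrase -> literal text, taken from the command table above.
PUNCTUATION_COMMANDS = {
    "new line": "\n",
    "full stop": ".",
    "period": ".",
    "comma": ",",
    "question mark": "?",
    "exclamation mark": "!",
}

# Longest phrases first so multi-word commands are tried before shorter ones.
_PATTERN = re.compile(
    r"\s*\b("
    + "|".join(re.escape(p) for p in sorted(PUNCTUATION_COMMANDS, key=len, reverse=True))
    + r")\b",
    flags=re.IGNORECASE,
)


def apply_punctuation_commands(transcript: str) -> str:
    """Replace spoken punctuation commands with the characters they stand for."""
    return _PATTERN.sub(lambda m: PUNCTUATION_COMMANDS[m.group(1).lower()], transcript)
```

Note that this naive version also rewrites the word "period" when it is used in ordinary speech, so a real implementation would need some way to tell commands from content.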
vocalinux --help # Show all options
vocalinux --debug # Enable debug logging
vocalinux --engine whisper_cpp # Use whisper.cpp engine (default)
vocalinux --engine whisper # Use OpenAI Whisper engine
vocalinux --engine vosk # Use VOSK engine
vocalinux --model medium # Use medium-sized model
vocalinux --model medium.en-q5_0 # Use exact whisper.cpp model variant
vocalinux --model large-v3-turbo # Use large-v3 Turbo with whisper.cpp
vocalinux --wayland # Force Wayland mode
vocalinux --start-minimized # Start without first-run modal prompts
Vocalinux uses the Linux desktop standard for autostart:
- Mechanism: XDG autostart desktop entry (vocalinux.desktop)
- Path: $XDG_CONFIG_HOME/autostart/ or ~/.config/autostart/ (fallback)
- Launch mode: Starts as a regular user desktop app in your graphical session
- Not used: No systemd unit/service is created by Vocalinux for autostart
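The path rule above can be sketched in a few lines of Python. This is an illustration of the XDG lookup order only, not code from Vocalinux; the function name and the injectable `environ` argument are assumptions made so the example is easy to test.

```python
import os
from pathlib import Path


def autostart_entry_path(environ=None, filename: str = "vocalinux.desktop") -> Path:
    """Return where the autostart desktop entry would live for the current user."""
    env = os.environ if environ is None else environ
    config_home = env.get("XDG_CONFIG_HOME", "")
    # The XDG Base Directory spec says unset, empty, or relative values
    # should be ignored in favor of the default ~/.config.
    if config_home and os.path.isabs(config_home):
        base = Path(config_home)
    else:
        home = env.get("HOME") or os.path.expanduser("~")
        base = Path(home) / ".config"
    return base / "autostart" / filename
```

Checking whether Start on Login is enabled then amounts to `autostart_entry_path().exists()`.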
How to enable/disable:
- First-run welcome dialog
- Tray menu: Start on Login
- Settings dialog: Start on Login
Compatibility notes:
- Works on mainstream desktop environments (GNOME, KDE, Xfce, Cinnamon, MATE, LXQt)
- On minimal/custom window-manager sessions, an autostart handler may be required (for example DE-specific startup hooks or tools like dex)
Configuration is stored in ~/.config/vocalinux/config.json:
{
  "speech_recognition": {
    "engine": "whisper_cpp",
    "model_size": "tiny",
    "vad_sensitivity": 3,
    "silence_timeout": 2.0
  }
}
For whisper.cpp, model_size may be a size such as tiny or an exact ggml model ID such as medium.en-q5_0 or large-v3-turbo. You can also configure this through the graphical Settings dialog, where whisper.cpp models are split into Model Size and Specialization controls.
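If you want to inspect or script against this file, a small reader might look like the sketch below. The key names come from the example above; treating the example values as defaults, the function name, and the fallback behavior for a missing or invalid file are assumptions, not documented Vocalinux behavior.

```python
import json
from pathlib import Path

CONFIG_PATH = Path.home() / ".config" / "vocalinux" / "config.json"

# Values copied from the example above; using them as defaults is an assumption.
DEFAULTS = {
    "engine": "whisper_cpp",
    "model_size": "tiny",
    "vad_sensitivity": 3,
    "silence_timeout": 2.0,
}


def load_speech_settings(path: Path = CONFIG_PATH) -> dict:
    """Read the speech_recognition section, filling gaps from DEFAULTS."""
    settings = dict(DEFAULTS)
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return settings  # missing or unparsable file: keep the example values
    section = data.get("speech_recognition", {}) if isinstance(data, dict) else {}
    if isinstance(section, dict):
        settings.update(section)
    return settings
```

For example, `load_speech_settings()["model_size"]` returns the configured model, or `tiny` when the file does not exist yet.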
Vocalinux ships with a Silero VAD model and uses it automatically when onnxruntime is available. The official installer attempts to install this support automatically. Without it, recording falls back to the simpler amplitude-threshold VAD.
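For readers unfamiliar with the fallback, an amplitude-threshold VAD in its simplest form compares the loudness of each audio buffer against a fixed level. The sketch below illustrates that general technique for 16-bit mono PCM; the function name and the threshold of 500 are assumptions, and the actual Vocalinux fallback may work differently.

```python
import array
import math


def is_silence(pcm_bytes: bytes, rms_threshold: float = 500.0) -> bool:
    """Return True when a 16-bit mono PCM buffer stays below the RMS threshold."""
    samples = array.array("h")  # signed 16-bit, native byte order
    samples.frombytes(pcm_bytes[: len(pcm_bytes) // 2 * 2])  # drop a trailing odd byte
    if not samples:
        return True
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    # 500 is an assumed level, not a value taken from Vocalinux.
    return rms < rms_threshold
```

A neural VAD such as Silero replaces this loudness test with a per-buffer speech probability, which is why it copes better with steady background noise.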
For manual or PyPI installs, enable neural VAD with:
pip install "vocalinux[vad]"
Restart Vocalinux after installing. The Recognition tab in Settings shows which backend is active. The same vad_sensitivity (1-5) applies to both backends; internally it is mapped to a Silero probability threshold (1 = 0.8, 5 = 0.3).
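The sensitivity mapping can be pictured with a short sketch. Only the two endpoints (1 = 0.8 and 5 = 0.3) come from the text above; the linear interpolation for the values in between and the function name are assumptions made for illustration.

```python
def silero_threshold(vad_sensitivity: int) -> float:
    """Map sensitivity 1-5 to a speech-probability threshold (1 -> 0.8, 5 -> 0.3)."""
    if not 1 <= vad_sensitivity <= 5:
        raise ValueError("vad_sensitivity must be between 1 and 5")
    low, high = 0.8, 0.3  # endpoints stated in the text
    # Linear interpolation between the endpoints is an assumption.
    return round(low + (vad_sensitivity - 1) * (high - low) / 4, 3)
```

Under this assumption a higher sensitivity lowers the threshold, so quieter or less certain speech is still treated as voice.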
# Clone and install in dev mode
git clone https://github.com/jatinkrmalik/vocalinux.git
cd vocalinux
./install.sh --dev
# Activate environment
source venv/bin/activate
# Run tests
pytest
# Run from source with debug
python -m vocalinux.main --debug
vocalinux/
├── src/vocalinux/ # Main application code
│ ├── speech_recognition/ # Speech recognition engines (VOSK, Whisper, whisper.cpp)
│ │ └── recognition_manager.py # Unified engine interface
│ ├── text_injection/ # Text injection (X11/Wayland)
│ ├── ui/ # GTK UI components
│ └── utils/ # Utility functions
│ ├── whispercpp_model_info.py # whisper.cpp model metadata & hardware detection
│ └── vosk_model_info.py # VOSK model metadata
├── tests/ # Test suite
├── scripts/ # Development utilities
│ └── generate_sounds.py # Sound generation script
├── resources/ # Icons and sounds
├── docs/ # Documentation
└── web/ # Website source
- Installation Guide - Detailed installation instructions
- Update Guide - How to update Vocalinux
- User Guide - Complete user documentation
- Distribution Compatibility - Distro/session behavior and caveats
- Contributing - Development setup and contribution guidelines
Vocalinux uses smooth, pleasant gliding tones for audio feedback:
- Start: Ascending F4→A4 (0.6s) - positive, uplifting
- Stop: Descending A4→F4 (0.6s) - resolves completion
- Error: Lower descending E4→C4 (0.7s) - gentle but noticeable
All sounds use pure sine waves with smoothstep interpolation for buttery smooth pitch transitions - perfect for headphone use!
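A glide of this kind can be sketched with the standard library alone. The example below is a minimal illustration of a sine wave whose pitch follows a smoothstep curve; it is not the contents of scripts/generate_sounds.py. The sample rate, amplitude, and function names are assumptions, and the note frequencies are the standard-tuning values for F4 and A4.

```python
import math

SAMPLE_RATE = 44100  # assumed; the text does not state a sample rate


def smoothstep(t: float) -> float:
    """Classic smoothstep 3t^2 - 2t^3, clamped to [0, 1]."""
    t = max(0.0, min(1.0, t))
    return t * t * (3.0 - 2.0 * t)


def glide(start_hz: float, end_hz: float, duration_s: float, amplitude: float = 0.5):
    """Return float samples of a sine wave gliding from start_hz to end_hz."""
    total = round(SAMPLE_RATE * duration_s)
    samples, phase = [], 0.0
    for n in range(total):
        freq = start_hz + (end_hz - start_hz) * smoothstep(n / max(total - 1, 1))
        samples.append(amplitude * math.sin(phase))
        # Accumulating phase keeps the waveform continuous while the pitch changes.
        phase += 2.0 * math.pi * freq / SAMPLE_RATE
    return samples


# F4 is about 349.23 Hz and A4 is 440 Hz in standard tuning.
start_tone = glide(349.23, 440.0, 0.6)
```

Because smoothstep has zero slope at both ends, the pitch eases in and out instead of changing at a constant rate, which is what makes the glide sound soft.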
To modify or regenerate the notification sounds:
python scripts/generate_sounds.py
This script generates all three sounds using the same smooth glide algorithm. You can edit the frequencies, durations, and amplitudes in the script to customize the sounds to your preference.
- Custom icon design ✅
- Graphical settings dialog ✅
- Whisper AI support ✅
- Multi-language support (FR, DE, RU) ✅
- whisper.cpp integration (default engine) ✅
- Vulkan GPU support ✅
- In-app update mechanism
- Application-specific commands
- Debian/Ubuntu package (.deb)
- Wayland support via IBus ✅
- Voice command customization
Vocalinux is part of a family of privacy-first, offline voice dictation tools. Same mission, every operating system.
| Platform | Project | Website | GitHub | Status |
|---|---|---|---|---|
| 🐧 Linux | VocaLinux | vocalinux.com | jatinkrmalik/vocalinux | ✅ Beta v0.12.0 |
| 🍎 macOS | VocaMac | vocamac.com | jatinkrmalik/vocamac | 🚀 Beta |
| 🪟 Windows | VocaWin | vocawin.com | jatinkrmalik/vocawin | 📋 Planned |
Each platform uses native technologies for the best possible integration, while sharing the same privacy-first philosophy and offline-only architecture.
We welcome contributions! Whether it's bug reports, feature requests, or code contributions, please check out our Contributing Guide.
Thanks to everyone who has contributed to Vocalinux! 🙌
If you find Vocalinux useful, please consider:
- ⭐ Starring this repository
- 🐛 Reporting bugs you encounter
- 📖 Improving documentation
- 🔀 Contributing code
This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.
Made with ❤️ for the Linux community