A simple Emacs package that provides speech-to-text functionality using Whisper.cpp. Record audio directly from Emacs and have it transcribed and inserted at your cursor position.
- Record audio with simple key bindings
- Two transcription modes:
  - Fast mode (C-c v): Uses the base.en model for quick transcription
  - Accurate mode (C-c n): Uses the medium.en model for more accurate results
- Vocabulary hints: Provide a custom vocabulary file to improve recognition of proper nouns and specialized terms (e.g., Greek names like Socrates, Alcibiades, Diotima)
- Automatic transcription using Whisper.cpp
- Text insertion at cursor position
- Non-blocking recording with user-controlled stop
Before setting up this package, you need to install the following system dependencies:
Ubuntu/Debian:
sudo apt install sox

macOS:
brew install sox

Arch Linux:
sudo pacman -S sox

Clone and build Whisper.cpp:
# Clone the repository
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
# Build the project
make
# Download models
# For fast mode (required)
bash ./models/download-ggml-model.sh base.en
# For accurate mode (required)
bash ./models/download-ggml-model.sh medium.en

Make sure the paths in the code match your installation:
- Whisper binary: ~/whisper.cpp/build/bin/whisper-cli
- Fast mode model: ~/whisper.cpp/models/ggml-base.en.bin
- Accurate mode model: ~/whisper.cpp/models/ggml-medium.en.bin
If you install Whisper.cpp in a different location, you'll need to update the paths in whisper.el.
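Before loading the package, it can help to confirm that the three default paths listed above actually exist. The following is a minimal sketch; `check_path` is a hypothetical helper written for this example and is not part of the package.

```shell
#!/usr/bin/env bash
# Report whether each path expected by whisper.el exists.
# check_path is a hypothetical helper for illustration only.
check_path() {
    label="$1"
    path="$2"
    if [ -e "$path" ]; then
        echo "ok: $label ($path)"
    else
        echo "missing: $label ($path)"
    fi
}

# Default locations from this README; adjust them if you installed elsewhere.
check_path "whisper binary" "$HOME/whisper.cpp/build/bin/whisper-cli"
check_path "fast model" "$HOME/whisper.cpp/models/ggml-base.en.bin"
check_path "accurate model" "$HOME/whisper.cpp/models/ggml-medium.en.bin"
```

Any line starting with "missing" points at a path that needs to be built, downloaded, or updated in whisper.el.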
- Clone or download this repository:
  git clone <your-repo-url> ~/.emacs.d/whisper
- Add the following to your init.el or .emacs file:
  ;; Add the package directory to load-path
  (add-to-list 'load-path "~/.emacs.d/whisper")
  ;; Load the package
  (require 'whisper)
Alternatively, install the file manually:
- Copy whisper.el to your Emacs configuration directory:
  cp whisper.el ~/.emacs.d/
- Add the following to your init.el:
  ;; Load the whisper package
  (load-file "~/.emacs.d/whisper.el")
If you use use-package, add this to your init.el:
(use-package whisper
:load-path "~/.emacs.d/whisper"
:bind (("C-c v" . whisper-transcribe-fast)
("C-c n" . whisper-transcribe)))

- C-c v: Fast mode (base.en model) - quicker transcription, suitable for most use cases
- C-c n: Accurate mode (medium.en model) - slower but more accurate transcription
- Start recording: Press C-c v (fast) or C-c n (accurate) to begin recording audio
- Stop recording: Press C-g to stop recording and start transcription
- Get results: The transcribed text is automatically inserted at your cursor position
- Open any text buffer in Emacs
- Position your cursor where you want the transcribed text
- Press C-c v for fast transcription or C-c n for accurate transcription
- Speak into your microphone
- Press C-g when finished speaking
- Wait a moment for transcription to complete
- The text appears at your cursor position
To change the key bindings, modify your init.el:
;; Use different key bindings
(global-set-key (kbd "C-c s") #'whisper-transcribe-fast) ; Fast mode
(global-set-key (kbd "C-c S") #'whisper-transcribe) ; Accurate mode

To use a different model for accurate mode, set the whisper-model-path variable in your init.el:
;; Use a different model (e.g., the large-v3 model for even better accuracy).
;; Whisper's large models are multilingual only; there is no large.en variant.
(setq whisper-model-path "~/whisper.cpp/models/ggml-large-v3.bin")

If your Whisper.cpp installation is in a different location, you'll need to modify the paths in whisper.el:
;; Example: if whisper-cli is in /usr/local/bin/
;; Edit the format strings in whisper.el from:
;; "~/whisper.cpp/build/bin/whisper-cli -m ~/whisper.cpp/models/ggml-base.en.bin ..."
;; to:
;; "/usr/local/bin/whisper-cli -m /path/to/your/model.bin ..."

To improve transcription accuracy for proper nouns, technical terms, or specialized vocabulary, create a vocabulary file at ~/.emacs.d/whisper-vocabulary.txt.
Example ~/.emacs.d/whisper-vocabulary.txt:
This transcription discusses classical Greek philosophy, including scholars and figures such as Thrasymachus, Socrates, Plato, Diotima, Alcibiades, and Phaedrus.
Custom vocabulary location:
(setq whisper-vocabulary-file "~/Documents/my-vocabulary.txt")

For detailed guidance on vocabulary formats, tips, domain-specific examples, and managing multiple vocabularies, see VOCABULARY-GUIDE.md.
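To try a vocabulary file outside Emacs, note that whisper-cli accepts an initial prompt through its --prompt option. The sketch below assumes the vocabulary text is meant to be passed that way; how whisper.el actually hands the file to whisper-cli is not confirmed here, and `vocab_prompt` is a hypothetical helper, not part of the package.

```shell
#!/usr/bin/env bash
# Flatten a vocabulary file into a single-line prompt string.
# vocab_prompt is a hypothetical helper, not part of whisper.el.
vocab_prompt() {
    # tr joins the lines with spaces; sed trims trailing spaces.
    tr '\n' ' ' < "$1" | sed 's/ *$//'
}

WHISPER_BIN="$HOME/whisper.cpp/build/bin/whisper-cli"
MODEL="$HOME/whisper.cpp/models/ggml-base.en.bin"
VOCAB="$HOME/.emacs.d/whisper-vocabulary.txt"

# Run only when the binary, the vocabulary file, and a test recording all exist.
# Assumption: the vocabulary is supplied as whisper-cli's initial prompt.
if [ -x "$WHISPER_BIN" ] && [ -f "$VOCAB" ] && [ -f test.wav ]; then
    "$WHISPER_BIN" -m "$MODEL" -f test.wav --prompt "$(vocab_prompt "$VOCAB")"
fi
```

Comparing the output with and without --prompt on the same test.wav gives a rough sense of whether a given vocabulary file helps.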
- "sox: command not found"
  - Install sox using your system package manager
- "whisper-cli: command not found"
  - Ensure Whisper.cpp is built and the path is correct
  - Check that ~/whisper.cpp/build/bin/whisper-cli exists
- No audio recorded
  - Check your microphone permissions
  - Test sox manually: sox -d -r 16000 -c 1 -b 16 test.wav
- Transcription stuck on "Processing transcription, please wait..."
  - This issue has been fixed in recent versions
  - The fix includes improved process sentinel handling, automatic cleanup of old processes, and proper file management
  - Ensures reliable transcription completion when switching between fast and accurate modes
- Transcription not working
  - Verify the model files exist:
    - Fast mode: ~/whisper.cpp/models/ggml-base.en.bin
    - Accurate mode: ~/whisper.cpp/models/ggml-medium.en.bin
  - Test whisper-cli manually with a wav file
Test each component individually:
# Test sox recording (record 5 seconds)
sox -d -r 16000 -c 1 -b 16 test.wav trim 0 5
# Test whisper transcription (fast mode)
~/whisper.cpp/build/bin/whisper-cli -m ~/whisper.cpp/models/ggml-base.en.bin -f test.wav
# Test whisper transcription (accurate mode)
~/whisper.cpp/build/bin/whisper-cli -m ~/whisper.cpp/models/ggml-medium.en.bin -f test.wav

- Recording: Uses sox to record audio at 16kHz, mono, 16-bit
- Processing: Calls whisper-cli with the recorded audio file
- Integration: Captures the output and inserts it into your Emacs buffer
- Cleanup: Automatically cleans up temporary files and buffers
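The four steps above can be sketched as a standalone shell function. This is an illustration of the flow, not the package's actual code: `transcribe_once` is a hypothetical name, and it records for a fixed number of seconds, whereas the package stops when you press C-g.

```shell
#!/usr/bin/env bash
# Sketch of the record -> transcribe -> clean up flow (not whisper.el's code).
WHISPER_BIN="${WHISPER_BIN:-$HOME/whisper.cpp/build/bin/whisper-cli}"
MODEL="${MODEL:-$HOME/whisper.cpp/models/ggml-base.en.bin}"

# transcribe_once is a hypothetical helper: record N seconds, print the text.
transcribe_once() {
    seconds="${1:-5}"
    workdir="$(mktemp -d)"
    wav="$workdir/recording.wav"

    # Recording: 16 kHz, mono, 16-bit, stopped after $seconds seconds.
    if sox -d -r 16000 -c 1 -b 16 "$wav" trim 0 "$seconds"; then
        # Processing: -nt omits timestamps so only the text is printed.
        "$WHISPER_BIN" -m "$MODEL" -f "$wav" -nt 2>/dev/null
        status=$?
    else
        status=1
    fi

    # Cleanup: remove the temporary recording.
    rm -rf "$workdir"
    return "$status"
}
```

Calling `transcribe_once 5` records five seconds and prints the transcription to standard output, which is the text the package would insert into the buffer.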
This project is released under the MIT License.
Feel free to submit issues and pull requests to improve this package.