Sotto

sotto voce — in a quiet voice · そっと — softly, unobtrusively

Fully local dictation for macOS (Apple Silicon). Hold a hotkey, speak in English or Japanese, and release: your words are transcribed with Whisper, lightly cleaned up by a small local LLM (punctuation, filler-word removal), and pasted at the cursor in whatever app you're using.

Everything runs on-device via MLX. No audio or text ever leaves your Mac, and nothing is persisted — audio and transcripts live only in RAM.

ASR: mlx-community/whisper-large-v3-turbo (EN/JA auto-detect)
Cleanup: mlx-community/Qwen3.5-4B-4bit (can be toggled off)
Memory: ~4.5 GB resident while running
Latency: ~2–3 s for a 10 s utterance after warmup

Setup

Requires uv and an Apple Silicon Mac.

uv sync
uv run sotto download   # one-time, ~4 GB into the Hugging Face cache

macOS permissions

macOS attributes permissions to the app you launch from — during development that's your terminal (Terminal.app, iTerm2, Ghostty, ...). Grant your terminal all three in System Settings → Privacy & Security:

Permission	Used for	Symptom if missing
Microphone	recording your voice	recordings are silent (rms ≈ 0)
Input Monitoring	the global hold-to-talk hotkey	hotkey silently does nothing
Accessibility	simulated ⌘V to paste at the cursor	text never appears

The Microphone prompt appears automatically on the first recording. The other two usually need to be added manually (+ button → select your terminal app). Restart the terminal after granting. If you switch terminal apps, grant all three permissions again for the new one.

Run the permission doctor to check:

uv run sotto hotkey-test

It prints Accessibility status and DOWN/UP when you press the hotkey; silence on keypress means Input Monitoring is missing.
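
The "rms ≈ 0" symptom in the permissions table refers to the root-mean-square level of the recorded samples. A minimal sketch of such a level check, assuming samples are floats normalized to [-1.0, 1.0]. The function names and the 1e-4 silence threshold are illustrative assumptions, not Sotto's actual code:

```python
import math
from typing import Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of a block of audio samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def looks_silent(samples: Sequence[float], threshold: float = 1e-4) -> bool:
    """True when the level is near zero (the threshold is an assumed value)."""
    return rms(samples) < threshold


if __name__ == "__main__":
    # One second of a 440 Hz sine at half amplitude, sampled at 16 kHz.
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
    print(f"tone rms:    {rms(tone):.3f}")
    print(f"silence rms: {rms([0.0] * 16000):.3f}")
```

A recording made without Microphone permission would be expected to look like the all-zero case, while any real speech should sit well above the threshold.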

Usage

uv run sotto run        # menu bar app

Hold Right Option (⌥), speak, release. The cleaned text is pasted into the focused text field and your previous clipboard is restored.
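
The paste-and-restore behavior can be sketched as follows. This is a hedged illustration, not Sotto's implementation: the function names, the settle delay, and the choice of pbpaste/pbcopy/osascript as helpers are all assumptions. The clipboard and keystroke operations are passed in as callables so the core logic can be exercised without touching the real clipboard.

```python
import subprocess
import time
from typing import Callable


def paste_with_restore(
    text: str,
    read_clipboard: Callable[[], str],
    write_clipboard: Callable[[str], None],
    send_paste: Callable[[], None],
    settle_seconds: float = 0.1,  # assumed value
) -> None:
    """Put text on the clipboard, paste it, then restore the old contents."""
    previous = read_clipboard()
    write_clipboard(text)
    try:
        send_paste()
        # Give the target app time to read the clipboard before restoring it.
        time.sleep(settle_seconds)
    finally:
        write_clipboard(previous)


# macOS helpers (assumed approach; they handle plain text only).
def mac_read_clipboard() -> str:
    result = subprocess.run(["pbpaste"], capture_output=True, text=True, check=True)
    return result.stdout


def mac_write_clipboard(text: str) -> None:
    subprocess.run(["pbcopy"], input=text, text=True, check=True)


def mac_send_paste() -> None:
    # Simulates Command-V; needs Accessibility permission for the calling app.
    script = 'tell application "System Events" to keystroke "v" using command down'
    subprocess.run(["osascript", "-e", script], check=True)
```

On a Mac, `paste_with_restore("hello", mac_read_clipboard, mac_write_clipboard, mac_send_paste)` would paste into the focused field. Because these helpers read the clipboard as text, an image or file on the clipboard would not survive the round trip.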

Menu bar: 🎤 idle · 🔴 recording · ✍️ processing. The menu lets you toggle LLM cleanup, set the language, pick the microphone, change the hotkey (Right Option / Right Command / F13), and switch Whisper models.

For the language, Universal auto-detects from the audio; force English or Japanese for short utterances that auto-detect gets wrong. The Microphone submenu shows which device is in use, which is worth checking because virtual devices from Loom/Zoom/etc. can silently become the system default.
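
The three menu bar icons correspond to a small hold-to-talk state machine. A minimal sketch, assuming hypothetical event names (`hotkey_down`, `hotkey_up`, `done`) and assuming that events which do not apply to the current state are ignored; Sotto's real event handling may differ:

```python
from enum import Enum


class State(Enum):
    IDLE = "🎤"
    RECORDING = "🔴"
    PROCESSING = "✍️"


# Allowed transitions: (current state, event) -> next state.
# Event names are hypothetical, chosen for this illustration.
TRANSITIONS = {
    (State.IDLE, "hotkey_down"): State.RECORDING,
    (State.RECORDING, "hotkey_up"): State.PROCESSING,
    (State.PROCESSING, "done"): State.IDLE,
}


def step(state: State, event: str) -> State:
    """Return the next state; events that do not apply are ignored."""
    return TRANSITIONS.get((state, event), state)


if __name__ == "__main__":
    state = State.IDLE
    for event in ["hotkey_down", "hotkey_up", "done"]:
        state = step(state, event)
        print(f"{event} -> {state.value} {state.name}")
```

Keeping the transitions in one table means a hotkey press that arrives while an utterance is still being processed simply leaves the state unchanged in this sketch.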

Config lives at ~/Library/Application Support/sotto/config.toml.

Testing each stage

uv run sotto devices                # list input devices, show selected
uv run sotto record --seconds 3     # mic level check
uv run sotto transcribe --seconds 5 # record + Whisper
uv run sotto transcribe --language ja  # force Japanese (auto/en/ja)
uv run sotto clean "um so I think uh we should ship it"
uv run sotto inject "テスト ✅" --delay 3  # focus a text field within 3s
uv run sotto run --no-menubar       # full pipeline, headless with logs
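
To give a feel for what the cleanup stage is aiming at, here is a rule-based stand-in. It is not how Sotto cleans text (Sotto uses a local LLM); the filler list, the regex, and the `strip_fillers` name are assumptions made for illustration, and the rules only handle simple English input.

```python
import re

# Assumed filler list; a real cleanup model handles far more cases.
FILLERS = ("um", "uh", "er", "erm")
_FILLER_RE = re.compile(
    r"\b(?:" + "|".join(FILLERS) + r")\b[,.]?\s*", re.IGNORECASE
)


def strip_fillers(text: str) -> str:
    """Drop whole-word fillers, then capitalize and add a final period."""
    cleaned = _FILLER_RE.sub("", text)
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    if not cleaned:
        return cleaned
    cleaned = cleaned[0].upper() + cleaned[1:]
    if cleaned[-1] not in ".!?":
        cleaned += "."
    return cleaned


if __name__ == "__main__":
    print(strip_fillers("um so I think uh we should ship it"))
    # prints: So I think we should ship it.
```

The word boundaries keep words such as "umbrella" intact. Comparing this output with that of `uv run sotto clean` on the same sentence is a quick way to see what the LLM adds beyond simple rules.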

Limitations

Doesn't work in secure input fields (passwords) — macOS blocks synthetic events there by design.
Non-text clipboard contents (images, files) are not restored after pasting.
The fn key can't be used as the hotkey (not visible to event taps).
