STT

Hold a key, speak, release -- your words appear wherever your cursor is.

Like SuperWhisper, but free. Like Wispr Flow, but local.

  • Cross-platform -- macOS (Apple Silicon) and Linux (Wayland: Sway, Hyprland, etc.)
  • Any backend -- local MLX Whisper, any OpenAI-compatible server, whisper.cpp HTTP, or Groq cloud
  • Hold-to-record -- global hotkey works in any application
  • Free & open source -- no subscription, no cloud dependency required

Install

uv tool install git+https://github.com/jamesob/stt.git

A setup wizard runs on first launch. To update:

uv tool install --reinstall git+https://github.com/jamesob/stt.git

Linux dependencies

STT checks for missing dependencies at startup and prints install commands. For reference:

Arch Linux:

sudo pacman -S wtype wl-clipboard gtk4-layer-shell \
    gobject-introspection portaudio pipewire-pulse

Debian / Ubuntu:

sudo apt install wtype wl-clipboard gtk4-layer-shell-dev \
    libgirepository1.0-dev gir1.2-gtk-4.0 gir1.2-gtk4layershell-1.0 \
    libportaudio2 portaudio19-dev pipewire-pulse

Your user must be in the input group for keyboard capture:

sudo usermod -aG input $USER
newgrp input  # or log out and back in
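To confirm the group change has taken effect for the current session, a small check like the one below can help. It is a minimal sketch using Python's standard `grp` and `os` modules (Unix only); `in_group` is a hypothetical helper and not part of STT.

```python
import grp
import os


def in_group(group_name: str) -> bool:
    """Return True if the current process belongs to the named group."""
    try:
        gid = grp.getgrnam(group_name).gr_gid
    except KeyError:
        # The group does not exist on this system.
        return False
    # os.getgroups() lists supplementary groups; the effective gid is
    # checked as well because it is not guaranteed to appear in that list.
    return gid in os.getgroups() or gid == os.getegid()


if __name__ == "__main__":
    print("input group active:", in_group("input"))
```

The check reflects the groups of the running process, so it reports False until `newgrp input` has been run or a fresh login session has started.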

macOS permissions

Grant Accessibility and Input Monitoring (System Settings > Privacy & Security) to your terminal app -- not to "stt".

Usage

stt

| Action | Keys |
| --- | --- |
| Record | Hold trigger key (default: Right Cmd / Left Alt) |
| Record + Enter | Hold Shift while recording |
| Cancel | ESC |
| Quit | Ctrl+C |

Configuration

Settings live in ~/.config/stt/config.yml. Run stt --config to reconfigure. See config.sample.yml for all options.

language: en
hotkey: cmd_r
sound_enabled: true

backends:
  default:
    provider: openai
    openai_base_url: http://localhost:8000
    openai_whisper_model: whisper-large-v3

order:
  - default
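To make the relationship between `backends` and `order` concrete, here is a minimal sketch that resolves the names in `order` against the `backends` mapping. The config is written as a plain Python dict so the example needs no YAML library. `resolve_order` is a hypothetical helper, and the error it raises for an unknown name is an assumption, not STT's documented behavior.

```python
from typing import Any

# The sample configuration above, expressed as a dict.
CONFIG: dict[str, Any] = {
    "language": "en",
    "hotkey": "cmd_r",
    "sound_enabled": True,
    "backends": {
        "default": {
            "provider": "openai",
            "openai_base_url": "http://localhost:8000",
            "openai_whisper_model": "whisper-large-v3",
        }
    },
    "order": ["default"],
}


def resolve_order(config: dict[str, Any]) -> list[tuple[str, dict[str, Any]]]:
    """Return (name, settings) pairs in the sequence given by `order`."""
    backends = config.get("backends", {})
    resolved = []
    for name in config.get("order", []):
        if name not in backends:
            # Assumed behavior: treat a dangling name as a config error.
            raise KeyError(f"order references unknown backend: {name!r}")
        resolved.append((name, backends[name]))
    return resolved


if __name__ == "__main__":
    for name, settings in resolve_order(CONFIG):
        print(name, "->", settings["provider"])
```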

Providers

| Provider | Backend keys | Notes |
| --- | --- | --- |
| openai | openai_base_url, openai_api_key, openai_whisper_model | Any OpenAI-compatible server (vLLM, faster-whisper, etc.) |
| whisper-cpp-http | whisper_cpp_http_url | Local whisper.cpp HTTP server |
| mlx | whisper_model | Apple Silicon, offline |
| parakeet | parakeet_model | Apple Silicon, English only, very fast |
| groq | groq_api_key | Cloud, requires API key |

Fallback chains

Backends are tried in the sequence listed under the order key. If a backend that sets connect_timeout is unreachable, STT falls back to the next one:

backends:
  qwen:
    provider: openai
    openai_base_url: http://gpu-server:8200
    openai_whisper_model: Qwen/Qwen3-ASR-1.7B
    connect_timeout: 2
  local:
    provider: mlx
    whisper_model: large-v3-turbo

order:
  - qwen
  - local
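The fallback described above can be sketched as follows. This is an illustration under stated assumptions, not STT's actual implementation: it assumes reachability means "a TCP connection to the backend's host and port succeeds within `connect_timeout` seconds", and that backends without `connect_timeout` are used without probing. `is_reachable` and `pick_backend` are hypothetical names.

```python
import socket
from typing import Any, Callable, Optional
from urllib.parse import urlparse

CONFIG: dict[str, Any] = {
    "backends": {
        "qwen": {
            "provider": "openai",
            "openai_base_url": "http://gpu-server:8200",
            "openai_whisper_model": "Qwen/Qwen3-ASR-1.7B",
            "connect_timeout": 2,
        },
        "local": {"provider": "mlx", "whisper_model": "large-v3-turbo"},
    },
    "order": ["qwen", "local"],
}


def is_reachable(base_url: str, timeout: float) -> bool:
    """Return True if a TCP connection to the URL's host:port opens in time."""
    parsed = urlparse(base_url)
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS failures.
        return False


def pick_backend(
    config: dict[str, Any],
    probe: Callable[[str, float], bool] = is_reachable,
) -> Optional[str]:
    """Return the first backend in `order` that is usable, or None."""
    for name in config["order"]:
        backend = config["backends"][name]
        timeout = backend.get("connect_timeout")
        url = backend.get("openai_base_url")
        # Assumption: only backends with both a URL and a timeout are probed.
        if timeout is None or url is None:
            return name
        if probe(url, timeout):
            return name
    return None
```

Passing a stub for `probe` makes the behavior easy to check without a network: `pick_backend(CONFIG, probe=lambda url, timeout: False)` returns `"local"`, while a probe that always succeeds returns `"qwen"`.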

Benchmark mode runs all backends in parallel and logs timing for comparison. The first backend listed under order is the primary, and its result is the one used:

benchmark: true
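As a rough picture of what benchmark mode does, the sketch below runs every backend on the same audio in a thread pool, records how long each takes, and returns the primary backend's text. It is a sketch under assumptions, not STT's code: `run_benchmark` is a hypothetical function, each backend is modeled as a plain callable, and `order` is assumed to be non-empty.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_benchmark(
    order: list[str],
    transcribers: dict[str, Callable[[bytes], str]],
    audio: bytes,
) -> tuple[str, dict[str, float]]:
    """Run all backends in parallel; return (primary text, seconds per backend)."""

    def timed(name: str) -> tuple[str, str, float]:
        start = time.perf_counter()
        text = transcribers[name](audio)
        return name, text, time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=len(order)) as pool:
        # pool.map yields results in the same sequence as `order`.
        results = list(pool.map(timed, order))

    timings = {name: elapsed for name, _, elapsed in results}
    primary_text = results[0][1]  # the first backend in `order` is the primary
    return primary_text, timings


if __name__ == "__main__":
    fake = {
        "qwen": lambda audio: "hello from qwen",
        "local": lambda audio: "hello from local",
    }
    text, timings = run_benchmark(["qwen", "local"], fake, b"")
    print(text)
    for name, seconds in timings.items():
        print(f"{name}: {seconds:.4f}s")
```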

Prompt tuning

The prompt setting helps Whisper recognize domain-specific terms:

prompt: Claude, Anthropic, TypeScript, React, API endpoint
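For the openai provider, one plausible route for the prompt is the `prompt` form field of an OpenAI-style transcription request (`POST /v1/audio/transcriptions`), which accepts `model`, `language`, and `prompt` alongside the audio file. Whether STT builds its request this way is an assumption; `transcription_fields` below is a hypothetical helper that only assembles the text fields.

```python
from typing import Optional


def transcription_fields(
    model: str,
    language: Optional[str] = None,
    prompt: Optional[str] = None,
) -> dict[str, str]:
    """Build the non-file form fields for an OpenAI-style transcription request."""
    fields = {"model": model}
    if language:
        fields["language"] = language
    if prompt:
        # The prompt biases decoding toward the listed terms and spellings.
        fields["prompt"] = prompt
    return fields


if __name__ == "__main__":
    print(
        transcription_fields(
            "whisper-large-v3",
            language="en",
            prompt="Claude, Anthropic, TypeScript, React, API endpoint",
        )
    )
```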

Development

git clone https://github.com/jamesob/stt.git
cd stt
uv sync
uv run stt

License

MIT
