TAK — Talk to Keyboard

Push-to-talk speech-to-text that types anywhere.

Hold a key → speak → release → your words appear wherever you're typing. Works in any application — terminals, browsers, editors, chat apps, anything with a text cursor.

Support TAK

TAK is free and open source (MIT). If it makes your workflow faster, consider supporting continued development:

Method	Details
Credit card	Sponsor on GitHub — no fees, recurring or one-time
Solana (SOL)	`GWhM8kiCcAKqsNn1WMfyVv1tgTCdz6mGCGWy3UqqHLQx`

All crypto goes directly to the maintainer's wallets. Full details: docs/donations.md

Features

Push-to-talk — microphone is only open while you hold the key (no always-on listening)
System-wide — types into whatever window/field currently has focus
Cross-platform — Linux (X11) and macOS (Apple Silicon)
Bilingual — auto-detects English and Spanish
Local & private — runs entirely on your machine via faster-whisper (Linux) or mlx-whisper (macOS) — no cloud APIs
GPU-accelerated — uses CUDA on NVIDIA GPUs (Linux) or Metal on Apple Silicon (macOS)
Auto-normalization — automatically boosts quiet microphone levels
Voice activity detection — filters out silence and background noise
Modular architecture — platform-agnostic core with pluggable backends
Visual overlay — floating recording indicator on all screens (macOS)
Native menu bar app — macOS status item with recording state, preferences, and uninstall (macOS)
Preferences UI — graphical settings for trigger key, model, audio device, and clipboard mode (macOS)
In-app model downloads — progress bar with speed/ETA when switching models or on first launch (macOS)
Configurable — choose your trigger key, model size, and input method

Requirements

Linux

Linux with X11 (Wayland support planned)
NVIDIA GPU with CUDA (or use --cpu for CPU-only)
Conda (Miniconda or Anaconda)
System packages: xdotool, xclip, libportaudio2

macOS

macOS 13+ (Ventura or later)
Apple Silicon (M1/M2/M3/M4) recommended — Metal GPU acceleration via MLX
Intel Macs work but run CPU-only inference (significantly slower)
Homebrew
Conda (Miniconda or Anaconda)

Installation

macOS App (Recommended)

Download the latest signed and notarized DMG:

Download TAK.dmg

Open the disk image, drag TAK into Applications, and launch. Grant Accessibility and Microphone permissions when prompted. The speech model (~1.5 GB) downloads automatically on first launch.

Requires macOS 13+ and Apple Silicon (M1/M2/M3/M4).

Quick Install from Source (macOS and Linux)

git clone https://github.com/lchonkan/tak.git
cd tak
./install.sh

The installer automatically detects your platform, installs system dependencies, creates a conda environment, and verifies the setup.

Manual Install

Linux

1. Install system dependencies

sudo apt install xdotool xclip libportaudio2

2. Create the Conda environment

conda create -n tak python=3.11 -y
conda activate tak

3. Install Python dependencies

pip install -r requirements-linux.txt

Or install manually:

pip install faster-whisper pynput sounddevice numpy

For GPU acceleration (recommended), also install the CUDA libraries:

pip install nvidia-cublas-cu12 nvidia-cudnn-cu12

4. Input permissions

pynput needs access to /dev/input to detect key presses. Add your user to the input group:

sudo usermod -aG input $USER
# Log out and back in for the change to take effect

macOS

# 1. Install system dependencies
brew install portaudio ffmpeg

# 2. Create Conda environment and install Python packages
conda create -n tak python=3.11 -y
conda activate tak
pip install -r requirements-macos.txt

# 3. Grant Accessibility permission (required for key detection)
#    System Settings → Privacy & Security → Accessibility → add your terminal app

# 4. Run
./run.sh

TAK checks for Accessibility permission on startup and will show a clear error if it's missing. macOS will also prompt for Microphone access on the first recording — click "Allow".

Quick Start

Linux

./run.sh

Hold Right Ctrl → speak → release → text appears at cursor. Press Ctrl+C to quit.

macOS

./run.sh

Hold Right Option → speak → release → text appears at cursor. Press Ctrl+C to quit.

A red REC pill appears at the bottom of every screen while recording, turning yellow during transcription.

First run downloads the Whisper model (~1.5 GB). Subsequent runs start much faster.

Usage

Options

./run.sh --key caps_lock       # Use a different trigger key
./run.sh --model large-v3      # More accurate (uses more VRAM)
./run.sh --model small          # Faster, less accurate
./run.sh --model tiny           # Fastest, least accurate
./run.sh --clipboard            # Use clipboard paste (always on for macOS)
./run.sh --cpu                  # Run on CPU (Linux only, no GPU required)
./run.sh --device 2             # Use a specific audio input device

You can also run directly with Python (after activating the conda env):

conda activate tak
python -m tak --model medium

Platform defaults

	Linux	macOS
Trigger key	`ctrl_r` (Right Ctrl)	`alt_r` (Right Option)
Whisper model	`medium`	`turbo`
Text injection	Simulated keystrokes	Clipboard paste (Cmd+V)
GPU acceleration	CUDA (NVIDIA)	Metal (Apple Silicon)
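
As a rough illustration of how the defaults in this table could be selected at startup, here is a minimal sketch. The `PlatformDefaults` class, its field names, and `defaults_for` are hypothetical and are not taken from TAK's code; only the values come from the table above.

```python
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class PlatformDefaults:
    """Hypothetical container; field names are illustrative, not TAK's real config."""
    trigger_key: str
    model: str
    text_injection: str
    gpu_backend: str


# Values copied from the "Platform defaults" table.
DEFAULTS = {
    "linux": PlatformDefaults("ctrl_r", "medium", "keystrokes", "cuda"),
    "darwin": PlatformDefaults("alt_r", "turbo", "clipboard", "metal"),
}


def defaults_for(platform: str = sys.platform) -> PlatformDefaults:
    """Return the defaults for a sys.platform string; raise on unsupported platforms."""
    if platform.startswith("linux"):
        return DEFAULTS["linux"]
    if platform == "darwin":
        return DEFAULTS["darwin"]
    raise RuntimeError(f"Unsupported platform: {platform}")
```

Calling `defaults_for()` with no argument uses the platform of the running interpreter; command-line options such as `--key` or `--model` would then override individual fields.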

Available trigger keys

alt_r (macOS default), ctrl_r (Linux default), ctrl_l, alt_l,
shift_r, shift_l, scroll_lock, pause, insert, f1–f12, caps_lock

Note: scroll_lock, pause, and insert are only available on Linux.
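
The platform restriction in the note above can be expressed as a small validation check. This is a sketch under the assumption that key names are plain strings; `is_valid_trigger` and the two set names are hypothetical and do not come from TAK's source.

```python
# Key names taken from the list above; the split into two sets is illustrative.
COMMON_KEYS = {
    "alt_r", "ctrl_r", "ctrl_l", "alt_l",
    "shift_r", "shift_l", "caps_lock",
} | {f"f{n}" for n in range(1, 13)}  # f1 through f12
LINUX_ONLY_KEYS = {"scroll_lock", "pause", "insert"}


def is_valid_trigger(name: str, platform: str) -> bool:
    """Return True if `name` can be a trigger key on `platform` (a sys.platform string)."""
    if name in COMMON_KEYS:
        return True
    return name in LINUX_ONLY_KEYS and platform.startswith("linux")
```

For example, `is_valid_trigger("pause", "darwin")` is false while `is_valid_trigger("pause", "linux")` is true.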

Model sizes

Model	VRAM/RAM	Speed	Accuracy	Notes
`tiny`	~1 GB	Fastest	Basic
`base`	~1 GB	Fast	Good
`small`	~2 GB	Moderate	Better
`medium`	~5 GB	Slower	Great	Linux default
`large-v3`	~6 GB	Slowest	Best
`turbo`	~2 GB	Fast	Great	macOS default

Models are downloaded on first use and cached in ~/.cache/huggingface/hub/.
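
To check which models are already cached, the cache directory can be inspected directly. The sketch below relies on the Hugging Face hub cache layout, where each model lives in a folder named `models--<org>--<name>`; the `cached_models` helper is hypothetical and not part of TAK.

```python
from pathlib import Path

DEFAULT_CACHE = Path.home() / ".cache" / "huggingface" / "hub"


def cached_models(cache_dir=DEFAULT_CACHE):
    """Return repo IDs (org/name) of models found in a Hugging Face hub cache."""
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        return []
    found = []
    for entry in sorted(cache_dir.iterdir()):
        if entry.is_dir() and entry.name.startswith("models--"):
            rest = entry.name[len("models--"):]
            org, sep, repo = rest.partition("--")
            found.append(f"{org}/{repo}" if sep else rest)
    return found
```

If the `HF_HOME` or `HF_HUB_CACHE` environment variables are set, the Hugging Face libraries use a different location, which this sketch does not account for.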

How It Works

TAK has three main stages that run in a loop:

1. Key listener: pynput monitors for the trigger key. On press, recording starts; on release, recording stops.
2. Audio recording: On Linux, TAK captures audio via PipeWire (pw-record) or falls back to ALSA via sounddevice. On macOS, it captures audio via Core Audio through sounddevice. Audio is resampled to 16 kHz mono (Whisper's native format). Quiet audio is auto-normalized so that Whisper receives a usable signal level.
3. Transcription & typing: On Linux, faster-whisper transcribes the audio using CUDA on your NVIDIA GPU (or on the CPU with --cpu). On macOS, mlx-whisper transcribes using Metal on Apple Silicon. The detected text is injected into the focused window using platform-specific methods (xdotool on Linux, clipboard paste via Cmd+V on macOS).
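
The mono downmix and auto-normalization in the recording stage can be sketched with NumPy as follows. The function names, the peak-based approach, and the threshold values are assumptions chosen for illustration; TAK's actual normalization may work differently.

```python
import numpy as np


def to_mono(audio):
    """Average the channels of a (frames, channels) array into one channel."""
    audio = np.asarray(audio, dtype=np.float32)
    return audio if audio.ndim == 1 else audio.mean(axis=1)


def normalize_quiet(audio, target_peak=0.9, quiet_threshold=0.1):
    """Boost audio whose peak is below `quiet_threshold` up to `target_peak`.

    Both thresholds are illustrative assumptions, not TAK's real values.
    Silent input and already-loud input are returned unchanged.
    """
    audio = np.asarray(audio, dtype=np.float32)
    peak = float(np.max(np.abs(audio))) if audio.size else 0.0
    if peak == 0.0 or peak >= quiet_threshold:
        return audio
    return audio * (target_peak / peak)
```

Scaling by the peak keeps the waveform shape intact and cannot clip, which is why it is a common first choice for boosting a quiet microphone.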

Transcription runs in a background thread so the key listener stays responsive. If you start a new recording while the previous one is still being transcribed, the new recording waits until the current transcription finishes.
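
One way to get this serialized behavior is a single worker thread fed by a queue. The sketch below is an assumption about the general technique, not a description of TAK's implementation; `TranscriptionWorker` and its methods are hypothetical names, and TAK may use a lock or another mechanism instead.

```python
import queue
import threading


class TranscriptionWorker:
    """Runs transcriptions one at a time on a background thread."""

    def __init__(self, transcribe, on_text):
        self._transcribe = transcribe  # callable: audio -> text
        self._on_text = on_text        # callable: text -> None (e.g. type it)
        self._jobs = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, audio):
        """Called by the key listener on release; returns immediately."""
        self._jobs.put(audio)

    def stop(self):
        """Finish queued jobs, then shut the worker down."""
        self._jobs.put(None)  # sentinel that ends the loop
        self._thread.join()

    def _run(self):
        while True:
            audio = self._jobs.get()
            try:
                if audio is None:
                    return
                self._on_text(self._transcribe(audio))
            finally:
                self._jobs.task_done()
```

Because only one thread ever calls `transcribe`, a second recording submitted during a long transcription is processed after the first one finishes, and results arrive in the order they were recorded.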

On macOS, a floating pill overlay appears on all connected screens: red while recording, yellow while transcribing. The overlay uses PyObjC (NSPanel) and runs on the main thread via an NSApplication event loop, while pynput runs in a daemon thread.

Architecture

TAK uses a modular architecture with dependency injection. The core application logic is platform-agnostic, while platform-specific backends (audio recording, transcription, text injection) are plugged in at startup.

graph TD
    subgraph "tak/__main__.py (Entry Point)"
        EP[Platform Detection] --> |Linux| LINUX[platforms.linux]
        EP --> |macOS| MACOS[platforms.macos]
    end

    subgraph "tak/core/app.py (Shared Core)"
        APP[TakApp]
        BASE_REC[BaseAudioRecorder]
        BASE_TR[BaseTranscriber]
        PARSE[parse_args]
    end

    subgraph "tak/backend/linux.py"
        LREC[LinuxAudioRecorder<br/>PipeWire / ALSA]
        LTR[LinuxTranscriber<br/>faster-whisper + CUDA]
        LTI[type_text<br/>xdotool / xclip]
    end

    subgraph "tak/backend/macos.py"
        MREC[MacAudioRecorder<br/>Core Audio]
        MTR[MacTranscriber<br/>mlx-whisper + Metal]
        MTI[type_text<br/>AppleScript / pbcopy]
    end

    LINUX --> |injects backends| APP
    MACOS --> |injects backends| APP
    LREC --> |extends| BASE_REC
    LTR --> |extends| BASE_TR
    MREC --> |extends| BASE_REC
    MTR --> |extends| BASE_TR
    APP --> |uses| LREC
    APP --> |uses| LTR
    APP --> |uses| LTI
    APP --> |uses| MREC
    APP --> |uses| MTR
    APP --> |uses| MTI

For detailed architecture diagrams (class diagrams, sequence diagrams, state machines, threading model, audio pipeline, and more), see docs/architecture.md.
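
The dependency-injection arrangement described in this section can be sketched in a few lines. The class names `TakApp`, `BaseAudioRecorder`, and `BaseTranscriber` and the `type_text` function appear in the diagram; every method name and signature below is an assumption made for illustration, and the fake backends are stand-ins rather than real TAK classes.

```python
from abc import ABC, abstractmethod
from typing import Callable


class BaseAudioRecorder(ABC):
    @abstractmethod
    def start(self) -> None:
        """Begin capturing audio."""

    @abstractmethod
    def stop(self) -> bytes:
        """Stop capturing and return the recorded audio."""


class BaseTranscriber(ABC):
    @abstractmethod
    def transcribe(self, audio: bytes) -> str:
        """Return the text recognized in `audio`."""


class TakApp:
    """Platform-agnostic core: it only sees the injected interfaces."""

    def __init__(self, recorder: BaseAudioRecorder,
                 transcriber: BaseTranscriber,
                 type_text: Callable[[str], None]) -> None:
        self.recorder = recorder
        self.transcriber = transcriber
        self.type_text = type_text

    def on_press(self) -> None:
        self.recorder.start()

    def on_release(self) -> None:
        text = self.transcriber.transcribe(self.recorder.stop())
        if text:
            self.type_text(text)


# Stand-in backends for demonstration only; a real entry point would pick
# the Linux or macOS classes based on the detected platform.
class FakeRecorder(BaseAudioRecorder):
    def start(self) -> None:
        pass

    def stop(self) -> bytes:
        return b"raw-audio"


class FakeTranscriber(BaseTranscriber):
    def transcribe(self, audio: bytes) -> str:
        return "hello world"
```

With this shape, `TakApp(FakeRecorder(), FakeTranscriber(), print)` exercises the core logic without a microphone or a GPU, which is the main practical benefit of injecting the backends.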

Project structure

tak/                                # Project root
├── run.sh                          # Cross-platform launcher
├── requirements-linux.txt          # Linux Python dependencies
├── requirements-macos.txt          # macOS Python dependencies
├── TAK.spec                        # PyInstaller spec for macOS .app bundle
├── setup_app.py                    # .app bundle build script
├── resources/
│   └── tak.icns                    # macOS app icon
├── README.md                       # This file
├── CONTRIBUTING.md                 # Git Flow and contributor guide
├── CLAUDE.md                       # Instructions for Claude Code
├── LICENSE
├── docs/
│   ├── architecture.md             # Detailed architecture diagrams
│   ├── platform-architecture.md    # Cross-platform stack comparison
│   ├── macos-implementation-plan.md # macOS implementation spec (completed)
│   └── donations.md                # Donation methods and wallet addresses
├── tak/                            # Python package
│   ├── __init__.py                 # Package marker
│   ├── __main__.py                 # CLI entry point (platform detection, backend wiring)
│   ├── core/
│   │   ├── app.py                  # Shared core (TakApp, base classes, CLI, constants)
│   │   ├── config.py               # TakConfig dataclass (platform-agnostic settings)
│   │   └── models.py               # Shared model metadata (MLX repo IDs)
│   ├── backend/
│   │   ├── linux.py                # Linux backend (faster-whisper, PipeWire/ALSA, xdotool)
│   │   └── macos.py                # macOS backend (mlx-whisper, Core Audio, AppleScript)
│   └── ui/
│       └── macos/
│           ├── design.py           # macOS design system (colors, fonts, card views)
│           ├── gui_main.py         # GUI entry point for macOS .app bundle
│           ├── overlay.py          # Floating recording/transcribing pill overlay
│           ├── menubar.py          # macOS menu bar status item and dropdown
│           ├── settings.py         # Preferences window (NSUserDefaults persistence)
│           └── splash.py           # Model download splash screen
└── .gitignore

Troubleshooting

Linux: Text doesn't appear in some apps

Some applications don't accept simulated keystrokes from xdotool. Use clipboard mode instead:

./run.sh --clipboard
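
For context, the difference between keystroke injection and clipboard mode can be sketched as the external commands each one runs. This is an illustration that assumes the tools named in this README (xdotool, xclip, pbcopy, AppleScript) are invoked roughly as shown; `injection_commands` and `inject` are hypothetical helpers, and the exact arguments TAK passes are not documented here.

```python
import subprocess

# AppleScript that presses Cmd+V in the focused application.
PASTE_MACOS = [
    "osascript", "-e",
    'tell application "System Events" to keystroke "v" using command down',
]


def injection_commands(text, platform, clipboard=False):
    """Return a list of (argv, stdin_text) steps for injecting `text`."""
    if platform == "darwin":
        # macOS always goes through the clipboard.
        return [(["pbcopy"], text), (PASTE_MACOS, None)]
    if clipboard:
        # Copy to the X11 clipboard, then simulate a paste shortcut.
        return [
            (["xclip", "-selection", "clipboard"], text),
            (["xdotool", "key", "ctrl+v"], None),
        ]
    # Default on Linux: simulate one keystroke per character.
    return [(["xdotool", "type", "--clearmodifiers", "--", text], None)]


def inject(text, platform, clipboard=False):
    """Run the steps in order, feeding stdin where a step needs it."""
    for argv, stdin_text in injection_commands(text, platform, clipboard):
        subprocess.run(argv, input=stdin_text, text=True, check=True)
```

Clipboard mode sends a single paste shortcut instead of many synthetic key events, which is why it tends to work in applications that ignore simulated typing. Note that it overwrites whatever was on the clipboard.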

Linux: Permission denied / key not detected

pynput needs access to /dev/input. Make sure your user is in the input group:

sudo usermod -aG input $USER
# Log out and back in

No audio input

List available audio devices:

conda activate tak && python -m sounddevice

Then specify the device index:

./run.sh --device <index>
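
The device list printed by `python -m sounddevice` includes output-only devices as well. To see only the indices that are usable with `--device`, the list can be filtered on `max_input_channels`, as in this sketch. `sounddevice.query_devices()` and the `name` and `max_input_channels` fields are part of the sounddevice library; the helper names are hypothetical.

```python
def input_devices(devices):
    """Return (index, name) pairs for devices that can record audio."""
    return [
        (index, device["name"])
        for index, device in enumerate(devices)
        if device["max_input_channels"] > 0
    ]


def main():
    # Imported here so the filter above can be used without sounddevice installed.
    import sounddevice as sd

    for index, name in input_devices(sd.query_devices()):
        print(f"{index}: {name}")
```

Run `main()` inside the `tak` conda environment, then pass the chosen index to `./run.sh --device`.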

macOS: Accessibility permission not granted

TAK needs Accessibility permission for pynput to detect global key events. Go to:

System Settings → Privacy & Security → Accessibility

Add your terminal app (Terminal.app, iTerm2, etc.) to the list.

macOS: Microphone permission

macOS will prompt for microphone access on the first recording attempt. Click "Allow" when prompted. If you accidentally denied it, re-enable in:

System Settings → Privacy & Security → Microphone

PipeWire not available

If pw-record is not installed, TAK automatically falls back to direct ALSA recording via sounddevice. This works but may not see PipeWire virtual devices (e.g., Bluetooth headsets routed through PipeWire). To install PipeWire tools:

sudo apt install pipewire-pulse pipewire-audio-client-libraries

CUDA errors on startup

Make sure you have the NVIDIA CUDA pip packages installed:

pip install nvidia-cublas-cu12 nvidia-cudnn-cu12

Or bypass GPU entirely:

./run.sh --cpu

Model download is slow

Whisper models are downloaded from Hugging Face on first use. If downloads are slow, you can set a mirror:

export HF_ENDPOINT=https://hf-mirror.com
./run.sh

TODO

Apple Developer ID code signing and notarization for Gatekeeper-ready distribution
DMG installer packaging
Auto-launch on login (Launch Agent / Login Item)
Wayland support (Linux)
Windows support
Multiple language selection in preferences
Test model switching behavior (what happens when user changes model in Settings)
Evaluate which Whisper models to expose to end users
Fix Spanish accent characters pasting incorrectly (keyboard input encoding)

Contributing

Contributions are welcome! See CONTRIBUTING.md for the branching model, commit conventions, and PR guidelines.

License

MIT
