A TypeScript CLI that transcribes video/audio files to text using OpenAI Whisper. Runs locally — no API keys, no cloud services. Models download automatically on first use.
- Transcribe any video or audio file to timestamped text
- Record from microphone and transcribe on the fly
- Coherent sentence boundaries — doesn't break mid-sentence
- Multiple Whisper model sizes (tiny → large)
- Auto-downloads models on first use (cached locally)
- Auto-installs system dependencies (ffmpeg, sox) via Homebrew when missing
- Node.js >= 18
- macOS (Homebrew-based dependency management; Linux users install ffmpeg/sox manually)
- ffmpeg — for extracting audio from video/audio files (auto-installed if missing)
- sox — for microphone recording only (auto-installed if missing)
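The Homebrew auto-install behavior described above could look roughly like the sketch below. It is an illustration, not this project's code: the helper names (`commandExists`, `requiredTools`, `missingTools`, `installWithBrew`) are hypothetical, and it assumes `which` is available for PATH lookups and that file input needs only ffmpeg while `--mic` needs only sox, as the list above suggests.

```typescript
import { spawnSync } from "node:child_process";

// True if `name` resolves on PATH. A missing `which` binary also yields false.
function commandExists(name: string): boolean {
  return spawnSync("which", [name], { stdio: "ignore" }).status === 0;
}

// Hypothetical: file input needs ffmpeg, microphone recording needs sox.
function requiredTools(mic: boolean): string[] {
  return mic ? ["sox"] : ["ffmpeg"];
}

// Which tools are missing? `exists` is injectable so this can be tested
// without depending on the real PATH.
function missingTools(
  tools: readonly string[],
  exists: (tool: string) => boolean = commandExists,
): string[] {
  return tools.filter((tool) => !exists(tool));
}

// Hypothetical installer: uses Homebrew on macOS, otherwise asks the user
// to install the tools manually.
function installWithBrew(tools: readonly string[]): void {
  if (tools.length === 0) return;
  if (process.platform !== "darwin" || !commandExists("brew")) {
    throw new Error(`Please install manually: ${tools.join(", ")}`);
  }
  const result = spawnSync("brew", ["install", ...tools], { stdio: "inherit" });
  if (result.status !== 0) {
    throw new Error(`brew install failed for: ${tools.join(", ")}`);
  }
}
```

A caller would run `installWithBrew(missingTools(requiredTools(mic)))` once at startup, before any audio work begins.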
```bash
git clone <repo-url>
cd transcriber
npm install
```

```bash
npm run transcribe -- video.mp4
npm run transcribe -- podcast.mp3
npm run transcribe -- voice-memo.m4a
```

This creates a `.txt` file next to the input with timestamped sentences:

```
[00:00:00 → 00:00:07] Hello guys, my name is Piotr and I'm gonna teach you Laravel.
[00:00:07 → 00:00:17] First, let's start by talking about why it is so popular.
```

```bash
npm run transcribe -- --mic
```

Records until you press Ctrl+C, then transcribes. Saves to `recording-<timestamp>.txt`.
| Flag | Description | Default |
|---|---|---|
| `-o, --output <path>` | Output file path | `<input>.txt` or `recording-<timestamp>.txt` |
| `-m, --model <size>` | Whisper model size | `base` |
| `-l, --language <code>` | Language code (e.g. `en`, `pl`, `de`) | `en` |
| `--mic` | Record from microphone instead of a file | off |
| `--no-file` | Print to stdout only, don't write a file | off |
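The flags in this table map directly onto an argument parser. The project uses commander; to keep this sketch dependency-free it swaps in Node's built-in `util.parseArgs` (available from Node 18.3), so `parseCli` and the `CliOptions` shape are hypothetical. Commander turns `--no-file` into a `file: false` option automatically; the sketch declares it explicitly and applies the table's defaults by hand.

```typescript
import { parseArgs } from "node:util";

// Hypothetical shape of the parsed command line.
interface CliOptions {
  input?: string;
  output?: string;
  model: string;
  language: string;
  mic: boolean;
  writeFile: boolean;
}

function parseCli(argv: string[]): CliOptions {
  const { values, positionals } = parseArgs({
    args: argv,
    allowPositionals: true, // the input file is a positional argument
    options: {
      output: { type: "string", short: "o" },
      model: { type: "string", short: "m" },
      language: { type: "string", short: "l" },
      mic: { type: "boolean" },
      "no-file": { type: "boolean" },
    },
  });
  if (!values.mic && positionals.length === 0) {
    throw new Error("Provide an input file or pass --mic");
  }
  // Defaults mirror the table: base model, English, file output on.
  return {
    input: positionals[0],
    output: values.output,
    model: values.model ?? "base",
    language: values.language ?? "en",
    mic: values.mic ?? false,
    writeFile: !(values["no-file"] ?? false),
  };
}
```

In a real entry point this would be called as `parseCli(process.argv.slice(2))`; `npm run transcribe -- …` forwards everything after `--` to the script.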
| Size | Accuracy | Speed | Download |
|---|---|---|---|
| `tiny` | Low | Fastest | ~75 MB |
| `base` | Good | Fast | ~150 MB |
| `small` | Better | Moderate | ~500 MB |
| `medium` | Great | Slow | ~1.5 GB |
| `large-v3-turbo` | Best | Slowest | ~3 GB |
Models are downloaded from Hugging Face on first use and cached in ~/.cache/huggingface/.
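A CLI like this has to translate the `--model` value into a concrete Hugging Face checkpoint. The sketch below does that with a lookup table that rejects unknown sizes. The repository ids are assumptions (ONNX Whisper conversions published under the `Xenova` and `onnx-community` accounts), not necessarily the checkpoints this project downloads, and `resolveModel` is a hypothetical name.

```typescript
// Hypothetical mapping from --model values to ONNX Whisper repositories.
// A Map avoids accidental hits on Object.prototype keys like "constructor".
const MODEL_IDS = new Map<string, string>([
  ["tiny", "Xenova/whisper-tiny"],
  ["base", "Xenova/whisper-base"],
  ["small", "Xenova/whisper-small"],
  ["medium", "Xenova/whisper-medium"],
  ["large-v3-turbo", "onnx-community/whisper-large-v3-turbo"],
]);

function resolveModel(size: string): string {
  const id = MODEL_IDS.get(size);
  if (id === undefined) {
    const choices = [...MODEL_IDS.keys()].join(", ");
    throw new Error(`Unknown model size "${size}". Choose one of: ${choices}`);
  }
  return id;
}
```

Failing fast on an unknown size avoids a confusing download error from the Hub later on.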
```bash
# Use a larger model for better accuracy
npm run transcribe -- lecture.mp4 -m small

# Use tiny for quick drafts
npm run transcribe -- note.m4a -m tiny
```

```bash
# Transcribe a video, save next to it
npm run transcribe -- ~/recordings/lesson-01.mp4

# Transcribe with better accuracy
npm run transcribe -- interview.mp3 -m small -l en

# Transcribe Polish audio
npm run transcribe -- rozmowa.m4a -l pl

# Just print to terminal, no file
npm run transcribe -- memo.m4a --no-file

# Record a voice memo and transcribe
npm run transcribe -- --mic -o my-thought.txt

# Record with a specific model
npm run transcribe -- --mic -m small
```

Each line is a complete sentence with start and end timestamps:
```
[HH:MM:SS → HH:MM:SS] Sentence text here.
```
Whisper chunks are merged into coherent sentences — lines break on sentence-ending punctuation (. ! ?), not mid-sentence.
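The merging rule above can be sketched as follows, assuming chunks shaped like the `{ timestamp: [start, end], text }` entries a Whisper pipeline returns when timestamps are requested (times in seconds, end possibly `null` for a cut-off final chunk). `mergeIntoSentences`, `formatTimestamp`, and `formatLine` are illustrative names, not necessarily the project's.

```typescript
// Assumed chunk shape: [start, end] in seconds; end may be null.
interface Chunk {
  timestamp: [number, number | null];
  text: string;
}

interface Sentence {
  start: number;
  end: number;
  text: string;
}

// Seconds -> "HH:MM:SS" (fractions of a second are truncated).
function formatTimestamp(seconds: number): string {
  const total = Math.max(0, Math.floor(seconds));
  const h = Math.floor(total / 3600);
  const m = Math.floor((total % 3600) / 60);
  const s = total % 60;
  return [h, m, s].map((n) => String(n).padStart(2, "0")).join(":");
}

// . ! or ? at the end, optionally followed by closing quotes or brackets.
const SENTENCE_END = /[.!?]["')\]]*$/;

function mergeIntoSentences(chunks: Chunk[]): Sentence[] {
  const sentences: Sentence[] = [];
  let buffer = "";
  let start = 0;
  let end = 0;
  for (const chunk of chunks) {
    const text = chunk.text.trim();
    if (!text) continue;
    if (!buffer) start = chunk.timestamp[0]; // first chunk of a new sentence
    end = chunk.timestamp[1] ?? chunk.timestamp[0]; // null end falls back to start
    buffer = buffer ? `${buffer} ${text}` : text;
    if (SENTENCE_END.test(buffer)) {
      sentences.push({ start, end, text: buffer });
      buffer = "";
    }
  }
  // Flush a trailing fragment that never reached terminal punctuation.
  if (buffer) sentences.push({ start, end, text: buffer });
  return sentences;
}

function formatLine(s: Sentence): string {
  return `[${formatTimestamp(s.start)} → ${formatTimestamp(s.end)}] ${s.text}`;
}
```

This sketch only splits at chunk boundaries, so a chunk containing a sentence end in its middle stays on one line; a fuller version would also split inside chunks.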
- Audio extraction — ffmpeg converts the input file to raw PCM audio (16kHz, mono, float32)
- Mic recording — sox captures from the default microphone in the same format
- Transcription — the Whisper model (via `@huggingface/transformers` + ONNX Runtime) processes the audio with timestamps
- Sentence merging — raw chunks are merged into complete sentences using punctuation boundaries
- Output — formatted lines are printed and saved to a text file
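The audio steps above can be sketched with plain `child_process` calls. The flag sets are standard ffmpeg and sox options for 16 kHz mono 32-bit float raw output, but they are an assumption about what this project actually passes, and the helper names (`ffmpegArgs`, `soxArgs`, `decodeF32le`, `extractAudio`) are hypothetical. The resulting `Float32Array` is the form in which the `@huggingface/transformers` speech-recognition pipeline accepts raw audio.

```typescript
import { spawnSync } from "node:child_process";

const SAMPLE_RATE = 16000; // Whisper expects 16 kHz input

// ffmpeg: decode any audio/video input to raw little-endian float32 PCM,
// mono, 16 kHz, written to stdout. -vn drops any video stream.
function ffmpegArgs(input: string): string[] {
  return [
    "-nostdin", "-loglevel", "error",
    "-i", input, "-vn",
    "-ac", "1", "-ar", String(SAMPLE_RATE),
    "-f", "f32le", "pipe:1",
  ];
}

// sox: record from the default input device (-d) into the same raw format on stdout.
function soxArgs(): string[] {
  return [
    "-q", "-d",
    "-t", "raw", "-r", String(SAMPLE_RATE), "-c", "1",
    "-e", "floating-point", "-b", "32", "-",
  ];
}

// Raw f32le bytes -> samples. Reading each value explicitly avoids alignment
// problems with pooled Node Buffers; a trailing partial sample is ignored.
function decodeF32le(buf: Buffer): Float32Array {
  const samples = new Float32Array(Math.floor(buf.length / 4));
  for (let i = 0; i < samples.length; i++) samples[i] = buf.readFloatLE(i * 4);
  return samples;
}

// Run ffmpeg synchronously and decode its output (whole file held in memory).
function extractAudio(input: string): Float32Array {
  const result = spawnSync("ffmpeg", ffmpegArgs(input), { maxBuffer: 1024 * 1024 * 1024 });
  if (result.error) throw result.error;
  if (result.status !== 0) throw new Error(`ffmpeg failed: ${result.stderr.toString()}`);
  return decodeF32le(result.stdout);
}
```

Buffering the whole decode in memory keeps the sketch short; at 64 KB per second of audio this is fine for typical recordings, while multi-hour files would be better streamed.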
```bash
npm test
```

Tests cover: timestamp formatting, sentence merging logic, audio loading, and ffmpeg extraction (integration test).
- TypeScript + tsx (runtime)
- @huggingface/transformers — runs Whisper ONNX models in Node.js
- commander — CLI argument parsing
- ffmpeg — audio extraction from video/audio files
- sox — microphone recording
- Node built-in test runner — tests