piotrjura/transcript

transcriber

A TypeScript CLI that transcribes video/audio files to text using OpenAI Whisper. Runs locally — no API keys, no cloud services. Models download automatically on first use.

Features

  • Transcribe any video or audio file to timestamped text
  • Record from microphone and transcribe on the fly
  • Coherent sentence boundaries — doesn't break mid-sentence
  • Multiple Whisper model sizes (tiny → large-v3-turbo)
  • Auto-downloads models on first use (cached locally)
  • Auto-installs system dependencies (ffmpeg, sox) via Homebrew when missing

Requirements

  • Node.js >= 18
  • macOS (Homebrew-based dependency management; Linux users install ffmpeg/sox manually)
  • ffmpeg — for extracting audio from video/audio files (auto-installed if missing)
  • sox — for microphone recording only (auto-installed if missing)

Setup

git clone <repo-url>
cd transcript
npm install

Usage

Transcribe a file

npm run transcribe -- video.mp4
npm run transcribe -- podcast.mp3
npm run transcribe -- voice-memo.m4a

This creates a .txt file next to the input with timestamped sentences:

[00:00:00 → 00:00:07] Hello guys, my name is Piotr and I'm gonna teach you Laravel.
[00:00:07 → 00:00:17] First, let's start by talking about why it is so popular.

Record from microphone

npm run transcribe -- --mic

Records until you press Ctrl+C, then transcribes. Saves to recording-<timestamp>.txt.

Options

| Flag | Description | Default |
| --- | --- | --- |
| `-o, --output <path>` | Output file path | `<input>.txt` or `recording-<timestamp>.txt` |
| `-m, --model <size>` | Whisper model size | `base` |
| `-l, --language <code>` | Language code (e.g. `en`, `pl`, `de`) | `en` |
| `--mic` | Record from microphone instead of a file | off |
| `--no-file` | Print to stdout only, don't write a file | off |
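
The defaults for the output path can be sketched as a small pure function. This is a minimal illustration, not the tool's actual code: the `OutputOptions` shape and the `resolveOutputPath` name are hypothetical, `<input>.txt` is assumed to mean "same name with the extension replaced", and the timestamp format is an assumption. The `file: false` field follows commander's convention for negatable flags such as `--no-file`.

```typescript
import * as path from "node:path";

// Hypothetical option shape mirroring the flags in the table above.
interface OutputOptions {
  input?: string;  // file to transcribe (absent when --mic is used)
  output?: string; // value of -o, --output
  mic?: boolean;   // --mic
  file?: boolean;  // false when --no-file is passed
}

// Returns the path to write, or null when --no-file suppresses the file.
function resolveOutputPath(opts: OutputOptions, now: Date = new Date()): string | null {
  if (opts.file === false) return null;
  if (opts.output) return opts.output;
  if (opts.mic || !opts.input) {
    // ASSUMPTION: ISO 8601 timestamp with ":" and "." replaced by "-".
    const stamp = now.toISOString().replace(/[:.]/g, "-");
    return `recording-${stamp}.txt`;
  }
  // ASSUMPTION: the input extension is replaced, so video.mp4 becomes video.txt.
  const { dir, name } = path.parse(opts.input);
  return path.join(dir, `${name}.txt`);
}
```

Passing `now` as a parameter keeps the function deterministic, which makes the mic-recording case easy to unit test.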

Models

| Size | Accuracy | Speed | Download |
| --- | --- | --- | --- |
| tiny | Low | Fastest | ~75 MB |
| base | Good | Fast | ~150 MB |
| small | Better | Moderate | ~500 MB |
| medium | Great | Slow | ~1.5 GB |
| large-v3-turbo | Best | Slowest | ~3 GB |

Models are downloaded from Hugging Face on first use and cached in ~/.cache/huggingface/.

# Use a larger model for better accuracy
npm run transcribe -- lecture.mp4 -m small

# Use tiny for quick drafts
npm run transcribe -- note.m4a -m tiny
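
One way to turn the `-m` value into a Hugging Face model id is a validated lookup table. The sketch below is illustrative only: `resolveModelId` is a hypothetical helper, and the repository ids are examples of publicly hosted ONNX Whisper exports, not necessarily the ones this tool downloads.

```typescript
// Sizes accepted by -m, --model, taken from the Models table.
const MODEL_SIZES = ["tiny", "base", "small", "medium", "large-v3-turbo"] as const;
type ModelSize = (typeof MODEL_SIZES)[number];

// ASSUMPTION: example Hugging Face repositories with ONNX Whisper weights.
// The ids this project actually uses are not stated in the README.
const MODEL_IDS: Record<ModelSize, string> = {
  tiny: "Xenova/whisper-tiny",
  base: "Xenova/whisper-base",
  small: "Xenova/whisper-small",
  medium: "Xenova/whisper-medium",
  "large-v3-turbo": "onnx-community/whisper-large-v3-turbo",
};

// Validates the requested size and returns the matching model id.
// Defaults to "base", the documented default for -m.
function resolveModelId(size: string = "base"): string {
  if (!(MODEL_SIZES as readonly string[]).includes(size)) {
    throw new Error(
      `Unknown model size "${size}". Expected one of: ${MODEL_SIZES.join(", ")}`
    );
  }
  return MODEL_IDS[size as ModelSize];
}
```

Failing fast on an unknown size avoids starting a multi-gigabyte download for a mistyped flag.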

Examples

# Transcribe a video, save next to it
npm run transcribe -- ~/recordings/lesson-01.mp4

# Transcribe with better accuracy
npm run transcribe -- interview.mp3 -m small -l en

# Transcribe Polish audio
npm run transcribe -- rozmowa.m4a -l pl

# Just print to terminal, no file
npm run transcribe -- memo.m4a --no-file

# Record a voice memo and transcribe
npm run transcribe -- --mic -o my-thought.txt

# Record with a specific model
npm run transcribe -- --mic -m small

Output format

Each line is a complete sentence with start and end timestamps:

[HH:MM:SS → HH:MM:SS] Sentence text here.
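
A line in this format could be produced along the following lines. This is a sketch with hypothetical helper names (`formatTimestamp`, `formatLine`); rounding fractional seconds down is an assumption, since the README does not say how they are handled.

```typescript
// Converts a number of seconds into HH:MM:SS.
// ASSUMPTION: fractional seconds are truncated and negatives clamp to zero.
function formatTimestamp(totalSeconds: number): string {
  const s = Math.max(0, Math.floor(totalSeconds));
  const pad = (n: number): string => String(n).padStart(2, "0");
  const hours = Math.floor(s / 3600);
  const minutes = Math.floor((s % 3600) / 60);
  return `${pad(hours)}:${pad(minutes)}:${pad(s % 60)}`;
}

// Builds one output line: "[start → end] text".
function formatLine(start: number, end: number, text: string): string {
  return `[${formatTimestamp(start)} → ${formatTimestamp(end)}] ${text.trim()}`;
}
```

For example, `formatLine(0, 7.4, "Hello.")` yields `[00:00:00 → 00:00:07] Hello.`.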

Whisper chunks are merged into coherent sentences — lines break on sentence-ending punctuation (. ! ?), not mid-sentence.
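
The merging rule can be sketched as follows. It is a simplified illustration, not the project's implementation: the `Chunk` shape and `mergeIntoSentences` name are hypothetical, and the sketch only joins chunks; it does not split a single chunk that happens to contain several sentences.

```typescript
// Hypothetical shape of a timestamped piece of text, in seconds.
interface Chunk {
  start: number;
  end: number;
  text: string;
}

// Joins consecutive chunks until the accumulated text ends with
// sentence-ending punctuation (. ! ?), optionally followed by a
// closing quote or bracket.
function mergeIntoSentences(chunks: Chunk[]): Chunk[] {
  const sentences: Chunk[] = [];
  let current: Chunk | null = null;

  for (const chunk of chunks) {
    const text = chunk.text.trim();
    if (!text) continue; // skip empty chunks

    if (current === null) {
      current = { start: chunk.start, end: chunk.end, text };
    } else {
      current.end = chunk.end;
      current.text += " " + text;
    }

    if (/[.!?]["')\]]*$/.test(current.text)) {
      sentences.push(current);
      current = null;
    }
  }

  // Keep trailing text even if it never reached a sentence boundary.
  if (current !== null) sentences.push(current);
  return sentences;
}
```

Each merged sentence keeps the start time of its first chunk and the end time of its last, so the timestamps on a line always span the whole sentence.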

How it works

  1. Audio extraction: ffmpeg converts the input file to raw PCM audio (16 kHz, mono, float32)
  2. Mic recording: with --mic, sox captures from the default microphone in the same format, in place of step 1
  3. Transcription: the Whisper model (via @huggingface/transformers + ONNX runtime) processes the audio and returns timestamped chunks
  4. Sentence merging: raw chunks are merged into complete sentences using punctuation boundaries
  5. Output: formatted lines are printed and, unless --no-file is set, saved to a text file
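
The audio steps above can be sketched as argument builders plus a decoder for the raw byte stream. This is an illustration under assumptions: the function names are hypothetical and the exact flags this project passes to ffmpeg and sox are not documented here. The flags shown are standard ones that produce 16 kHz mono little-endian float32 PCM on stdout.

```typescript
const SAMPLE_RATE = 16000; // Hz, as stated in step 1

// Arguments for `ffmpeg` that write raw float32 PCM to stdout.
function ffmpegArgs(inputPath: string): string[] {
  return [
    "-loglevel", "error",
    "-i", inputPath,
    "-vn",                      // drop any video stream
    "-ac", "1",                 // mono
    "-ar", String(SAMPLE_RATE), // resample to 16 kHz
    "-f", "f32le",              // raw little-endian float32
    "pipe:1",                   // write to stdout
  ];
}

// Arguments for `sox` that record the default input device in the same format.
function soxArgs(): string[] {
  return [
    "-d",                       // default audio input device
    "-t", "raw",
    "-r", String(SAMPLE_RATE),
    "-c", "1",
    "-e", "floating-point",
    "-b", "32",
    "-L",                       // little-endian
    "-",                        // write to stdout
  ];
}

// Decodes collected stdout bytes into samples. A Node Buffer is a
// Uint8Array, so it can be passed in directly.
function decodeF32le(bytes: Uint8Array): Float32Array {
  const count = Math.floor(bytes.byteLength / 4);
  const view = new DataView(bytes.buffer, bytes.byteOffset, count * 4);
  const samples = new Float32Array(count);
  for (let i = 0; i < count; i++) {
    samples[i] = view.getFloat32(i * 4, true); // true = little-endian
  }
  return samples;
}
```

The argument arrays are meant for something like `child_process.spawn("ffmpeg", ffmpegArgs(file))`. Reading through a `DataView` avoids alignment errors when the byte offset of the incoming buffer is not a multiple of four.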

Tests

npm test

Tests cover: timestamp formatting, sentence merging logic, audio loading, and ffmpeg extraction (integration test).

Tech stack

  • TypeScript + tsx (runtime)
  • @huggingface/transformers — runs Whisper ONNX models in Node.js
  • commander — CLI argument parsing
  • ffmpeg — audio extraction from video/audio files
  • sox — microphone recording
  • Node built-in test runner — tests

About

Transcribes audio and video files and creates transcripts from mic recordings. 100% free and local.
