TubeScribe (ytx) — Fast YouTube Transcription & Captions

Repository: https://github.com/prateekjain24/TubeScribe

TubeScribe (CLI command: ytx) downloads YouTube audio, normalizes it, transcribes with your chosen engine, and writes clean transcript JSON + SRT captions. It includes smart caching, chapter processing, and optional summarization.

Resources

Quickstart: README.md
Engines: ytx/src/ytx/engines/ (whisper_engine.py, gemini_engine.py, openai_engine.py, deepgram_engine.py)
CLI entry: ytx/src/ytx/cli.py (commands, flags, progress)
Config guide: docs/CONFIG.md (env vars, timeouts, cache, engine options)
API overview: docs/API.md (modules, models, extension points)
Release checklist: docs/RELEASE.md
Health check: ytx health
End‑to‑end script: scripts/integration_e2e.sh

Quickstart (≈ 2 minutes)

Prereqs

Python 3.10+
FFmpeg on PATH (check: ffmpeg -version)

Install

Recommended (CLI): pipx install tubescribe (or pip install tubescribe)
Verify: ytx --version or tubescribe --version

Dev install (from source)

cd ytx && python3 -m venv .venv && source .venv/bin/activate
python -m pip install -U pip setuptools wheel
python -m pip install -e .
Or without installing: from repo root → export PYTHONPATH="$(pwd)/ytx/src" && cd ytx && python3 -m ytx.cli --help

Health check

ytx health (checks ffmpeg, API key presence, and basic network)

Transcribe (local Whisper)

ytx transcribe "https://youtu.be/<VIDEOID>" --engine whisper --model small

Engines at a glance

Whisper (local, faster‑whisper): fast on CPU/GPU; default.
Whisper.cpp (Metal): on Apple Silicon; pass --engine whispercpp --model /path/to/model.gguf.
Gemini (cloud): best‑effort timestamps; recommended --timestamps chunked --fallback.
OpenAI (cloud): --engine openai (SDK‑first optional; HTTP fallback).
Deepgram (cloud): --engine deepgram (SDK‑first optional; HTTP fallback).
ElevenLabs (cloud): stub in place; STT support pending.

Common flags

--engine whisper|whispercpp|gemini|openai|deepgram — choose an engine
--model <name> — engine model (e.g., small, large-v3-turbo, gemini-2.5-flash, whisper-1)
--timestamps native|chunked|none — timestamp policy (chunked recommended for LLM engines)
--engine-opts '{"k":v}' — provider options (e.g., Deepgram: {"utterances":true,"smart_format":true})
--max-download-abr-kbps <N> — cap YouTube audio bitrate during download (default 96; set 0 to disable)
--by-chapter --parallel-chapters --chapter-overlap 2.0 — process chapters in parallel
--summarize --summarize-chapters — overall TL;DR + bullets; per‑chapter summaries
--output-dir ./artifacts — write outputs outside the cache dir
--overwrite — ignore cache and reprocess
--fallback — on Gemini errors, fallback to Whisper
--debug — verbose logs

Examples

Whisper (CPU):
- ytx transcribe "https://youtu.be/<VIDEOID>" --engine whisper --model small
Whisper (Metal via whisper.cpp):
- ytx transcribe "https://youtu.be/<VIDEOID>" --engine whispercpp --model /path/to/gguf-large-v3-turbo.bin
Gemini (chunked timestamps + fallback):
- ytx transcribe "https://youtu.be/<VIDEOID>" --engine gemini --timestamps chunked --fallback
OpenAI (verbose segments when available):
- ytx transcribe "https://youtu.be/<VIDEOID>" --engine openai --timestamps native
Deepgram (utterances):
- ytx transcribe "https://youtu.be/<VIDEOID>" --engine deepgram --engine-opts '{"utterances":true,"smart_format":true}' --timestamps native
Chapters + summaries:
- ytx transcribe "https://youtu.be/<VIDEOID>" --by-chapter --parallel-chapters --chapter-overlap 2.0 --summarize-chapters --summarize
Summarize an existing transcript JSON:
- ytx summarize-file /path/to/<video_id>.json --write

Configuration (copy .env.example → .env)

Cloud keys: OPENAI_API_KEY, DEEPGRAM_API_KEY, GEMINI_API_KEY (or GOOGLE_API_KEY)
Engine defaults: YTX_ENGINE, WHISPER_MODEL
Engine options: YTX_ENGINE_OPTS (JSON), YTX_PREFER_SDK=true (prefer SDK for OpenAI/Deepgram)
Timeouts: YTX_NETWORK_TIMEOUT, YTX_DOWNLOAD_TIMEOUT, YTX_TRANSCRIBE_TIMEOUT, YTX_SUMMARIZE_TIMEOUT
Cache: YTX_CACHE_DIR, YTX_CACHE_TTL_SECONDS|DAYS
whisper.cpp: YTX_WHISPERCPP_BIN, YTX_WHISPERCPP_NGL, YTX_WHISPERCPP_THREADS

Outputs & cache

JSON: <video_id>.json (TranscriptDoc) — includes segments; optionally chapters and summary.
SRT: <video_id>.srt — wrapped captions.
Cache layout (XDG): ~/.cache/ytx/<video_id>/<engine>/<model>/<config_hash>/
- transcript.json, captions.srt, meta.json (provenance), summary.json (if generated)

Apple Silicon (whisper.cpp)

Build: make -j METAL=1 in whisper.cpp
Run: ytx transcribe ... --engine whispercpp --model /path/to/model.gguf
Tuning: YTX_WHISPERCPP_NGL (30–40 typical), YTX_WHISPERCPP_THREADS

Troubleshooting

“ffmpeg not found”: install FFmpeg and ensure it’s on PATH (see Requirements).
“Restricted / Private / Age‑restricted”: use cookies with yt‑dlp outside the tool to download audio locally, then run ytx summarize-file on the transcript.
“ytx: command not found”: ensure your Python scripts path is on PATH (e.g., ~/.local/bin or pipx bin dir).
“No module named ytx”: avoid running ytx from inside the ytx/ folder; or use module form: PYTHONPATH=ytx/src python -m ytx.cli …
Gemini timestamps: best‑effort; prefer --timestamps chunked for reliable coarse timings.

Health Reference

ffmpeg: must be on PATH. macOS: brew install ffmpeg; Ubuntu: sudo apt-get install -y ffmpeg.
whisper_engine: “available” if faster-whisper import works; if not, reinstall: pip install -U tubescribe.
whispercpp_bin: “configured” if YTX_WHISPERCPP_BIN points to a valid binary; “present” if a main whisper.cpp binary exists on PATH; otherwise “absent”. Build whisper.cpp with Metal for Apple Silicon and set the env.
yt_dlp: check presence of yt-dlp executable; install via pip install yt-dlp or brew install yt-dlp.
gemini_api_key: set GEMINI_API_KEY or GOOGLE_API_KEY for summaries.
openai_api_key: set OPENAI_API_KEY (should start with sk-) for --engine openai.
deepgram_api_key: set DEEPGRAM_API_KEY for --engine deepgram.
network: basic internet reachability.

Useful commands

Health: ytx health
Update check: ytx update-check
Cache: ytx cache ls | ytx cache stats | ytx cache clear --yes

Export Markdown notes

From cached transcript by video id:
- ytx export --video-id <VIDEOID> --to md --output-dir ./notes --md-frontmatter
From a TranscriptDoc JSON file:
- ytx export --from-file /path/to/<video_id>.json --to md --output-dir ./notes --md-frontmatter
Options:
- --md-link-style short|long — youtu.be vs full URL
- --md-include-transcript — append full transcript section (off by default)
- --md-include-chapters/--no-md-include-chapters — include chapter outline (on by default)
- --md-auto-chapters-min N — if no chapters are present, synthesize outline every N minutes

Notes-ready Markdown contents

Title linked to YouTube (short or full link style)
Optional YAML frontmatter (Obsidian-friendly): title, url, date, duration, engine, model, tags
Summary (TL;DR) and Key Points bullets (when present)
Chapter outline with clickable timestamps (native or synthesized)
Optional full transcript section with timestamped bullets

Example output (Markdown)

---
title: Sample Title
url: https://youtu.be/ABCDEFGHIJK
date: 2025-09-08
duration: 12:34
engine: gemini
model: gemini-2.5-flash
tags: [youtube, transcript]
---

# [Sample Title](https://youtu.be/ABCDEFGHIJK)

## Summary
One‑paragraph TL;DR here

## Key Points
- Point A
- Point B

## Chapters
### [0:00](https://youtu.be/ABCDEFGHIJK?t=0) Intro
### [5:23](https://youtu.be/ABCDEFGHIJK?t=323) Main Topic

Troubleshooting export

“No cache found” when using --video-id: upgrade to tubescribe>=0.3.3 and ensure both transcript and SRT exist in the branch. Legacy <video_id>.json/.srt are supported.
Always works: use --from-file /path/to/<video_id>.json to export from a specific cached TranscriptDoc.
For long commands in zsh, use trailing \ per line to avoid “command not found”.

Contributing

Code lives under ytx/src/ytx/ (CLI: cli.py). Tests under ytx/tests/.
Run tests: cd ytx && PYTHONPATH=src python -m pytest -q
Lint (if configured): ruff check .

Name		Name	Last commit message	Last commit date
Latest commit History 102 Commits
.github/workflows		.github/workflows
docs		docs
scripts		scripts
ytx		ytx
.DS_Store		.DS_Store
.gitignore		.gitignore
AGENTS.md		AGENTS.md
LICENSE		LICENSE
README.md		README.md
uv.toml		uv.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

TubeScribe (ytx) — Fast YouTube Transcription & Captions

About

Uh oh!

Releases 10

Packages

Languages

License

prateekjain24/TubeScribe

Folders and files

Latest commit

History

Repository files navigation

TubeScribe (ytx) — Fast YouTube Transcription & Captions

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 10

Packages 0

Languages

Packages