Skip to content

CLI to transcribe YouTube audio to clean JSON + SRT using local Whisper/whisper.cpp or cloud engines (Gemini/OpenAI/ Deepgram). Supports chapters, summaries, caching, and resilient retries.

License

Notifications You must be signed in to change notification settings

prateekjain24/TubeScribe

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

TubeScribe (ytx) — Fast YouTube Transcription & Captions

PyPI Release Python License CI Downloads

Repository: https://github.com/prateekjain24/TubeScribe

TubeScribe (CLI command: ytx) downloads YouTube audio, normalizes it, transcribes with your chosen engine, and writes clean transcript JSON + SRT captions. It includes smart caching, chapter processing, and optional summarization.

Resources

  • Quickstart: README.md
  • Engines: ytx/src/ytx/engines/ (whisper_engine.py, gemini_engine.py, openai_engine.py, deepgram_engine.py)
  • CLI entry: ytx/src/ytx/cli.py (commands, flags, progress)
  • Config guide: docs/CONFIG.md (env vars, timeouts, cache, engine options)
  • API overview: docs/API.md (modules, models, extension points)
  • Release checklist: docs/RELEASE.md
  • Health check: ytx health
  • End‑to‑end script: scripts/integration_e2e.sh

Quickstart (≈ 2 minutes)

  1. Prereqs
  • Python 3.10+
  • FFmpeg on PATH (check: ffmpeg -version)
  1. Install
  • Recommended (CLI): pipx install tubescribe (or pip install tubescribe)
  • Verify: ytx --version or tubescribe --version

Dev install (from source)

  • cd ytx && python3 -m venv .venv && source .venv/bin/activate
  • python -m pip install -U pip setuptools wheel
  • python -m pip install -e .
  • Or without installing: from repo root → export PYTHONPATH="$(pwd)/ytx/src" && cd ytx && python3 -m ytx.cli --help
  1. Health check
  • ytx health (checks ffmpeg, API key presence, and basic network)
  1. Transcribe (local Whisper)
  • ytx transcribe "https://youtu.be/<VIDEOID>" --engine whisper --model small

Engines at a glance

  • Whisper (local, faster‑whisper): fast on CPU/GPU; default.
  • Whisper.cpp (Metal): on Apple Silicon; pass --engine whispercpp --model /path/to/model.gguf.
  • Gemini (cloud): best‑effort timestamps; recommended --timestamps chunked --fallback.
  • OpenAI (cloud): --engine openai (SDK‑first optional; HTTP fallback).
  • Deepgram (cloud): --engine deepgram (SDK‑first optional; HTTP fallback).
  • ElevenLabs (cloud): stub in place; STT support pending.

Common flags

  • --engine whisper|whispercpp|gemini|openai|deepgram — choose an engine
  • --model <name> — engine model (e.g., small, large-v3-turbo, gemini-2.5-flash, whisper-1)
  • --timestamps native|chunked|none — timestamp policy (chunked recommended for LLM engines)
  • --engine-opts '{"k":v}' — provider options (e.g., Deepgram: {"utterances":true,"smart_format":true})
  • --max-download-abr-kbps <N> — cap YouTube audio bitrate during download (default 96; set 0 to disable)
  • --by-chapter --parallel-chapters --chapter-overlap 2.0 — process chapters in parallel
  • --summarize --summarize-chapters — overall TL;DR + bullets; per‑chapter summaries
  • --output-dir ./artifacts — write outputs outside the cache dir
  • --overwrite — ignore cache and reprocess
  • --fallback — on Gemini errors, fallback to Whisper
  • --debug — verbose logs

Examples

  • Whisper (CPU):
    • ytx transcribe "https://youtu.be/<VIDEOID>" --engine whisper --model small
  • Whisper (Metal via whisper.cpp):
    • ytx transcribe "https://youtu.be/<VIDEOID>" --engine whispercpp --model /path/to/gguf-large-v3-turbo.bin
  • Gemini (chunked timestamps + fallback):
    • ytx transcribe "https://youtu.be/<VIDEOID>" --engine gemini --timestamps chunked --fallback
  • OpenAI (verbose segments when available):
    • ytx transcribe "https://youtu.be/<VIDEOID>" --engine openai --timestamps native
  • Deepgram (utterances):
    • ytx transcribe "https://youtu.be/<VIDEOID>" --engine deepgram --engine-opts '{"utterances":true,"smart_format":true}' --timestamps native
  • Chapters + summaries:
    • ytx transcribe "https://youtu.be/<VIDEOID>" --by-chapter --parallel-chapters --chapter-overlap 2.0 --summarize-chapters --summarize
  • Summarize an existing transcript JSON:
    • ytx summarize-file /path/to/<video_id>.json --write

Configuration (copy .env.example.env)

  • Cloud keys: OPENAI_API_KEY, DEEPGRAM_API_KEY, GEMINI_API_KEY (or GOOGLE_API_KEY)
  • Engine defaults: YTX_ENGINE, WHISPER_MODEL
  • Engine options: YTX_ENGINE_OPTS (JSON), YTX_PREFER_SDK=true (prefer SDK for OpenAI/Deepgram)
  • Timeouts: YTX_NETWORK_TIMEOUT, YTX_DOWNLOAD_TIMEOUT, YTX_TRANSCRIBE_TIMEOUT, YTX_SUMMARIZE_TIMEOUT
  • Cache: YTX_CACHE_DIR, YTX_CACHE_TTL_SECONDS|DAYS
  • whisper.cpp: YTX_WHISPERCPP_BIN, YTX_WHISPERCPP_NGL, YTX_WHISPERCPP_THREADS

Outputs & cache

  • JSON: <video_id>.json (TranscriptDoc) — includes segments; optionally chapters and summary.
  • SRT: <video_id>.srt — wrapped captions.
  • Cache layout (XDG): ~/.cache/ytx/<video_id>/<engine>/<model>/<config_hash>/
    • transcript.json, captions.srt, meta.json (provenance), summary.json (if generated)

Apple Silicon (whisper.cpp)

  • Build: make -j METAL=1 in whisper.cpp
  • Run: ytx transcribe ... --engine whispercpp --model /path/to/model.gguf
  • Tuning: YTX_WHISPERCPP_NGL (30–40 typical), YTX_WHISPERCPP_THREADS

Troubleshooting

  • “ffmpeg not found”: install FFmpeg and ensure it’s on PATH (see Requirements).
  • “Restricted / Private / Age‑restricted”: use cookies with yt‑dlp outside the tool to download audio locally, then run ytx summarize-file on the transcript.
  • “ytx: command not found”: ensure your Python scripts path is on PATH (e.g., ~/.local/bin or pipx bin dir).
  • “No module named ytx”: avoid running ytx from inside the ytx/ folder; or use module form: PYTHONPATH=ytx/src python -m ytx.cli …
  • Gemini timestamps: best‑effort; prefer --timestamps chunked for reliable coarse timings.

Health Reference

  • ffmpeg: must be on PATH. macOS: brew install ffmpeg; Ubuntu: sudo apt-get install -y ffmpeg.
  • whisper_engine: “available” if faster-whisper import works; if not, reinstall: pip install -U tubescribe.
  • whispercpp_bin: “configured” if YTX_WHISPERCPP_BIN points to a valid binary; “present” if a main whisper.cpp binary exists on PATH; otherwise “absent”. Build whisper.cpp with Metal for Apple Silicon and set the env.
  • yt_dlp: check presence of yt-dlp executable; install via pip install yt-dlp or brew install yt-dlp.
  • gemini_api_key: set GEMINI_API_KEY or GOOGLE_API_KEY for summaries.
  • openai_api_key: set OPENAI_API_KEY (should start with sk-) for --engine openai.
  • deepgram_api_key: set DEEPGRAM_API_KEY for --engine deepgram.
  • network: basic internet reachability.

Useful commands

  • Health: ytx health
  • Update check: ytx update-check
  • Cache: ytx cache ls | ytx cache stats | ytx cache clear --yes

Export Markdown notes

  • From cached transcript by video id:
    • ytx export --video-id <VIDEOID> --to md --output-dir ./notes --md-frontmatter
  • From a TranscriptDoc JSON file:
    • ytx export --from-file /path/to/<video_id>.json --to md --output-dir ./notes --md-frontmatter
  • Options:
    • --md-link-style short|long — youtu.be vs full URL
    • --md-include-transcript — append full transcript section (off by default)
    • --md-include-chapters/--no-md-include-chapters — include chapter outline (on by default)
    • --md-auto-chapters-min N — if no chapters are present, synthesize outline every N minutes

Notes-ready Markdown contents

  • Title linked to YouTube (short or full link style)
  • Optional YAML frontmatter (Obsidian-friendly): title, url, date, duration, engine, model, tags
  • Summary (TL;DR) and Key Points bullets (when present)
  • Chapter outline with clickable timestamps (native or synthesized)
  • Optional full transcript section with timestamped bullets

Example output (Markdown)

---
title: Sample Title
url: https://youtu.be/ABCDEFGHIJK
date: 2025-09-08
duration: 12:34
engine: gemini
model: gemini-2.5-flash
tags: [youtube, transcript]
---

# [Sample Title](https://youtu.be/ABCDEFGHIJK)

## Summary
One‑paragraph TL;DR here

## Key Points
- Point A
- Point B

## Chapters
### [0:00](https://youtu.be/ABCDEFGHIJK?t=0) Intro
### [5:23](https://youtu.be/ABCDEFGHIJK?t=323) Main Topic

Troubleshooting export

  • “No cache found” when using --video-id: upgrade to tubescribe>=0.3.3 and ensure both transcript and SRT exist in the branch. Legacy <video_id>.json/.srt are supported.
  • Always works: use --from-file /path/to/<video_id>.json to export from a specific cached TranscriptDoc.
  • For long commands in zsh, use trailing \ per line to avoid “command not found”.

Contributing

  • Code lives under ytx/src/ytx/ (CLI: cli.py). Tests under ytx/tests/.
  • Run tests: cd ytx && PYTHONPATH=src python -m pytest -q
  • Lint (if configured): ruff check .

About

CLI to transcribe YouTube audio to clean JSON + SRT using local Whisper/whisper.cpp or cloud engines (Gemini/OpenAI/ Deepgram). Supports chapters, summaries, caching, and resilient retries.

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published