#pdf #slide #youtube #ffmpeg

app captube

Turn a YouTube slide-lecture video into a PDF of its unique slides, using ffmpeg keyframes + perceptual dedup

1 unstable release

0.1.0 Apr 19, 2026

MIT/Apache

25KB
429 lines

captube

Turn any YouTube video into slides.


captube takes a YouTube lecture URL and gives you back a PDF where every page is one unique slide, captured from the video itself.

Install

cargo install captube

Runtime dependencies (not bundled):

  • ffmpeg and ffprobe — on PATH
  • yt-dlp — on PATH
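If one of these tools is missing, the failure usually only shows up mid-run. The sketch below shows one way to check for them up front by scanning PATH. It is not taken from captube's source: the helper names `find_in` and `missing_tools` are hypothetical, and the lookup is simplified (it does not handle Windows `.exe` suffixes or check the executable bit).

```rust
use std::env;
use std::ffi::OsStr;
use std::path::PathBuf;

/// Search each directory listed in `path_var` for a regular file named `name`.
/// (Hypothetical helper; simplified: no `.exe` handling, no permission check.)
fn find_in(path_var: &OsStr, name: &str) -> Option<PathBuf> {
    env::split_paths(path_var)
        .map(|dir| dir.join(name))
        .find(|candidate| candidate.is_file())
}

/// Return the subset of `required` tools that cannot be found on the current PATH.
fn missing_tools(required: &[&'static str]) -> Vec<&'static str> {
    let path_var = env::var_os("PATH").unwrap_or_default();
    required
        .iter()
        .copied()
        .filter(|tool| find_in(&path_var, tool).is_none())
        .collect()
}

fn main() {
    // The three runtime dependencies named in this README.
    let missing = missing_tools(&["ffmpeg", "ffprobe", "yt-dlp"]);
    if missing.is_empty() {
        println!("all runtime dependencies found on PATH");
    } else {
        eprintln!("missing from PATH: {}", missing.join(", "));
    }
}
```

Running a check like this before invoking captube gives a clearer error than a failed subprocess halfway through the pipeline.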

Use

captube 'https://www.youtube.com/watch?v=<VIDEO_ID>' -o slides.pdf

All options:

captube <URL> [OPTIONS]

  -o, --output <PATH>              Output PDF path [default: output.pdf]
      --scene-threshold <F>        ffmpeg scene score cut-off (0.0-1.0) [default: 0.30]
      --fps <F>                    Sampling fps used during scene scanning [default: 2.0]
      --max-width <U32>            Maximum px width of embedded frames [default: 1280]
      --dedup-threshold <F>        Mean pixel diff (0-255) to consider frames
                                   the same slide — raise for fewer pages,
                                   lower to keep subtler slide variations
                                   [default: 20.0]
      --keep-workdir               Keep intermediate files for inspection
  -v, --verbose                    Print per-frame dedup decisions
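For reference, the defaults above can be written down as a plain Rust struct. This is only an illustrative sketch, not captube's actual argument parser: the `Options` type, its field names, and the `validate` method are hypothetical. The range checks for the two thresholds follow the ranges in the help text, while the checks on `fps` and `max_width` are assumptions.

```rust
use std::path::PathBuf;

/// Hypothetical mirror of the documented command-line options.
#[derive(Debug, Clone, PartialEq)]
struct Options {
    output: PathBuf,
    scene_threshold: f32,
    fps: f32,
    max_width: u32,
    dedup_threshold: f32,
    keep_workdir: bool,
    verbose: bool,
}

impl Default for Options {
    /// Defaults copied from the help text above.
    fn default() -> Self {
        Options {
            output: PathBuf::from("output.pdf"),
            scene_threshold: 0.30,
            fps: 2.0,
            max_width: 1280,
            dedup_threshold: 20.0,
            keep_workdir: false,
            verbose: false,
        }
    }
}

impl Options {
    /// Check the documented ranges. The fps and max_width rules are assumptions.
    fn validate(&self) -> Result<(), String> {
        if !(0.0..=1.0).contains(&self.scene_threshold) {
            return Err("--scene-threshold must be within 0.0-1.0".into());
        }
        if !(0.0..=255.0).contains(&self.dedup_threshold) {
            return Err("--dedup-threshold must be within 0-255".into());
        }
        if !(self.fps > 0.0) {
            return Err("--fps must be positive".into());
        }
        if self.max_width == 0 {
            return Err("--max-width must be non-zero".into());
        }
        Ok(())
    }
}

fn main() {
    // Raise the dedup threshold to get fewer pages, as the help text suggests.
    let opts = Options { dedup_threshold: 30.0, ..Options::default() };
    println!("{:?} -> {:?}", opts, opts.validate());
}
```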

How it works

  1. Download: yt-dlp fetches a video-only mp4 at ≤720p.
  2. Keyframe dump: ffmpeg -skip_frame nokey decodes only keyframes (about one per GOP). Modern H.264 encoders put keyframes on scene boundaries, so these cover every real slide change, plus duplicates for slides that outlast a single GOP.
  3. Perceptual dedup: each keyframe is hashed as a 256×256 grayscale thumbnail and compared to the previous kept frame by mean absolute difference. Mouse-cursor-only motion collapses away.
  4. Settle re-extract: every remaining keyframe is re-extracted via -ss pts+0.8. This bypasses a decoder quirk where -skip_frame nokey occasionally hands out corrupt-looking frames at cross-fade boundaries, and it also lands on the stable post-transition frame if the keyframe happened to fall mid-fade.
  5. Final dedup + PDF: a small-threshold pass collapses any frames whose settled versions converged onto the same slide; printpdf writes one page per remaining frame.
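The dedup pass in step 3 can be sketched in a few lines. The following is a minimal illustration, not captube's actual code: the function names are hypothetical, frames are assumed to be already reduced to equal-sized grayscale buffers (one byte per pixel), and treating a difference exactly equal to the threshold as "same slide" is an assumption.

```rust
/// Mean absolute difference between two equal-sized grayscale buffers (0.0-255.0).
fn mean_abs_diff(a: &[u8], b: &[u8]) -> f64 {
    assert_eq!(a.len(), b.len(), "thumbnails must have the same size");
    assert!(!a.is_empty(), "thumbnails must not be empty");
    let total: u64 = a
        .iter()
        .zip(b)
        .map(|(&x, &y)| u64::from(x.abs_diff(y)))
        .sum();
    total as f64 / a.len() as f64
}

/// Return the indices of frames to keep. Each frame is compared against the
/// previously *kept* frame and kept only if it differs by more than `threshold`.
fn dedup(frames: &[Vec<u8>], threshold: f64) -> Vec<usize> {
    let mut kept: Vec<usize> = Vec::new();
    for (i, frame) in frames.iter().enumerate() {
        let is_new_slide = match kept.last() {
            None => true, // the first frame is always kept
            Some(&prev) => mean_abs_diff(&frames[prev], frame) > threshold,
        };
        if is_new_slide {
            kept.push(i);
        }
    }
    kept
}

fn main() {
    // Tiny 2x2 "thumbnails" stand in for the 256x256 ones described above.
    let frames = vec![
        vec![10, 10, 10, 10],     // slide A
        vec![12, 12, 12, 12],     // slide A again, slight noise (diff 2)
        vec![200, 200, 200, 200], // slide B (diff 190 from A)
        vec![200, 200, 200, 120], // slide B with a small local change (diff 20)
    ];
    let kept = dedup(&frames, 20.0);
    println!("{:?}", kept); // prints [0, 2]
}
```

Because each frame is compared to the last kept frame rather than to its immediate predecessor, slow drift across many near-identical keyframes still eventually registers as a new slide. The final pass in step 5 could reuse the same function with a smaller threshold.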

On a 58-minute lecture the full pipeline (download → PDF) runs in ~17s on a modern x86_64 box.

License

Licensed under either the MIT license or the Apache license, at your option.
