Video Table/Text Extraction Tool

This repository contains tools and scripts for extracting tables or text from YouTube videos. The extraction process involves multiple steps, including video download, screenshot capture, PDF generation, OCR processing, and table extraction.

Workflow Overview

1. Download the video using yt-dlp.
2. Capture screenshots at specified intervals using FFmpeg.
3. Combine the screenshots into a single PDF file using PyPDF2.
4. (Optional) Run OCR on the PDF with OCRmyPDF to add extractable text.
5. Extract tables or text from the OCR-processed PDF using pytesseract. For higher accuracy on tables, Adobe Acrobat's PDF to Excel tool is recommended instead.
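
The command-line steps of the workflow above (download, screenshot capture, and the optional OCR pass) can be sketched in Python as follows. This is a minimal illustration, not the contents of download.sh, ext.sh, or extpdf.py: the function names, file names, the `shot_%04d.png` pattern, and the 10-second interval are all assumptions chosen for the example. It only builds and prints the commands; the PDF-building and table-extraction steps are not covered.

```python
import shlex
import subprocess


def download_cmd(url: str, out_template: str = "video.%(ext)s") -> list[str]:
    """Step 1: yt-dlp command; -o sets the output filename template."""
    return ["yt-dlp", "-o", out_template, url]


def screenshot_cmd(video: str, out_dir: str, interval_s: int = 10) -> list[str]:
    """Step 2: FFmpeg command that keeps one frame every interval_s seconds."""
    # fps=1/N emits one frame per N seconds; %04d numbers the output images.
    pattern = f"{out_dir}/shot_%04d.png"
    return ["ffmpeg", "-i", video, "-vf", f"fps=1/{interval_s}", pattern]


def ocr_cmd(pdf_in: str, pdf_out: str) -> list[str]:
    """Step 4: OCRmyPDF command that writes a copy of the PDF with a text layer."""
    return ["ocrmypdf", pdf_in, pdf_out]


def run(cmd: list[str]) -> None:
    """Echo a command, then execute it, raising if it exits non-zero."""
    print(shlex.join(cmd))
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical URL and paths, printed only; call run(cmd) to execute them.
    for cmd in (
        download_cmd("https://www.youtube.com/watch?v=VIDEO_ID"),
        screenshot_cmd("video.mp4", "shots", interval_s=10),
        ocr_cmd("slides.pdf", "slides_ocr.pdf"),
    ):
        print(shlex.join(cmd))
```

Building each command as a list rather than a single shell string avoids quoting problems when a URL or path contains spaces or special characters. The output directory (here `shots`) must exist before FFmpeg runs.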

Folders and files

sample_ouput
README.md
download.sh
ext.sh
extpdf.py
