This repository contains tools and scripts for extracting tables or text from YouTube videos. Extraction runs in five steps:
- Download the video using yt-dlp.
- Capture screenshots at specified intervals using FFmpeg.
- Combine screenshots into a PDF file using PyPDF2.
- (Optional) Perform OCR on the PDF to extract text using OCRmyPDF.
- Extract tables or text from the OCR-processed PDF using pytesseract. For higher accuracy on tables, Adobe Acrobat’s PDF to Excel tool is recommended instead.
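
The command-line steps above (download, screenshot capture, OCR) can be sketched roughly as follows. This is a minimal illustration, not the repository's actual scripts: the function names, file names, and the `frame_%04d.png` pattern are assumptions made for the example, and the PDF-assembly and table-extraction steps are left out because they depend on the repository's own code.

```python
import subprocess
from typing import List


def download_cmd(url: str, video_path: str) -> List[str]:
    """Build a yt-dlp command that saves the video to video_path (hypothetical helper)."""
    return ["yt-dlp", "-o", video_path, url]


def screenshot_cmd(video_path: str, frames_dir: str, interval_s: int) -> List[str]:
    """Build an FFmpeg command that writes one PNG every interval_s seconds."""
    # fps=1/N asks FFmpeg's fps filter for one frame per N seconds of video.
    pattern = f"{frames_dir}/frame_%04d.png"
    return ["ffmpeg", "-i", video_path, "-vf", f"fps=1/{interval_s}", pattern]


def ocr_cmd(pdf_in: str, pdf_out: str) -> List[str]:
    """Build an OCRmyPDF command that adds a text layer to pdf_in."""
    return ["ocrmypdf", pdf_in, pdf_out]


def run_steps(commands: List[List[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `run_steps([download_cmd(url, "video.mp4"), screenshot_cmd("video.mp4", "frames", 10)])` would download the video and then save a frame every 10 seconds, assuming `yt-dlp` and `ffmpeg` are on the `PATH` and the `frames` directory already exists.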