A Python script to generate summaries (Claude), podcasts (Whisper), and videos (RunwayML or Luma AI) from annoyingly long YouTube content.
- Original video: https://www.youtube.com/watch?v=_K-L9uhsBLM
- Summary: https://dl.dropbox.com/scl/fi/mdkbglfbs4m9ydeo9a2k7/video-_K-L9uhsBLM.mp4?rlkey=3wrowryg9gio1walaxhdbp2is&dl=0

Features:
- Generate concise summaries of YouTube videos
- Create engaging podcast scripts with multiple voices
- Generate AI-powered videos with synchronized podcast audio
- Support for multiple languages
- Multiple transcription options
- Multiple video generation providers

Installation:
- Clone the repository:
    git clone https://github.com/sliday/ytsum.git
    cd ytsum
- Install dependencies:
    pip install -r requirements.txt
- Install FFmpeg (required for audio/video processing):
  - macOS: brew install ffmpeg
  - Ubuntu/Debian: sudo apt-get install ffmpeg
  - Windows: Download from FFmpeg website

Create a .env file with your API keys:
ANTHROPIC_API_KEY=your_claude_api_key
OPENAI_API_KEY=your_openai_api_key
LUMAAI_API_KEY=your_lumaai_api_key
RUNWAYML_API_SECRET=your_runwayml_api_key
REPLICATE_API_TOKEN=your_replicate_api_key
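
Before a long run it can help to confirm that the keys above are visible to Python. The following is a minimal sketch and not part of ytsum: the `missing_keys` helper and the grouping of keys by feature are assumptions made for illustration, and only the variable names come from the `.env` example. It inspects the process environment, so it assumes the `.env` values have already been exported or loaded.

```python
import os

# Variable names come from the .env example above.
# The feature-to-key grouping is an assumption, not taken from ytsum itself.
REQUIRED_KEYS = {
    "summary": ["ANTHROPIC_API_KEY"],
    "podcast": ["ANTHROPIC_API_KEY", "OPENAI_API_KEY"],
    "lumaai": ["LUMAAI_API_KEY", "REPLICATE_API_TOKEN"],
    "runwayml": ["RUNWAYML_API_SECRET", "REPLICATE_API_TOKEN"],
}


def missing_keys(feature, env=None):
    """Return the variables needed for `feature` that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS[feature] if not env.get(name)]


if __name__ == "__main__":
    for feature in REQUIRED_KEYS:
        missing = missing_keys(feature)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{feature}: {status}")
```
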

# Summary only
python ytsum.py "https://www.youtube.com/watch?v=VIDEO_ID"
# Summary and podcast
python ytsum.py --podcast "https://www.youtube.com/watch?v=VIDEO_ID"
# Using Luma AI (faster, recommended)
python ytsum.py --podcast --lumaai "https://www.youtube.com/watch?v=VIDEO_ID"
# Using RunwayML
python ytsum.py --podcast --runwayml "https://www.youtube.com/watch?v=VIDEO_ID"

Options:
- --language: Specify output language (default: english)
- --ignore-subs: Force transcription even when subtitles exist
- --fast-whisper: Use Fast Whisper for transcription (faster)
- --whisper: Use OpenAI Whisper for transcription (more accurate)
- --replicate: Use Replicate's Incredibly Fast Whisper

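
When the script is driven from other Python code, the documented flags can be assembled programmatically. This is a small sketch under stated assumptions: `build_ytsum_command` is a hypothetical helper, and it assumes `--language` takes its value as the following argument, which the option list does not spell out.

```python
import sys


def build_ytsum_command(url, podcast=False, provider=None, language=None):
    """Assemble an argument list for ytsum.py from the documented flags."""
    if provider not in (None, "lumaai", "runwayml"):
        raise ValueError(f"unknown video provider: {provider}")
    cmd = [sys.executable, "ytsum.py"]
    if podcast or provider:
        # The documented video commands always pass --podcast as well.
        cmd.append("--podcast")
    if provider:
        cmd.append(f"--{provider}")
    if language:
        # Assumption: --language takes its value as the next argument.
        cmd += ["--language", language]
    cmd.append(url)
    return cmd


if __name__ == "__main__":
    command = build_ytsum_command(
        "https://www.youtube.com/watch?v=VIDEO_ID", provider="lumaai"
    )
    print(" ".join(command))
```

The returned list can be passed to `subprocess.run(command, check=True)` from the repository directory to launch the script.
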
All output files are saved in the out directory:
- summary-{video_id}.txt: Text summary
- podcast-{video_id}.txt: Podcast script
- podcast-{video_id}.mp3: Podcast audio
- video-{video_id}.mp4: Final video with podcast audio

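
The naming scheme above makes it easy to locate results from other scripts. The sketch below is illustrative only: `video_id_from_url` and `expected_outputs` are hypothetical helpers, and the URL parsing handles only the `watch?v=` form used in this README.

```python
from pathlib import Path
from urllib.parse import parse_qs, urlparse


def video_id_from_url(url):
    """Extract the `v` query parameter from a youtube.com/watch URL."""
    values = parse_qs(urlparse(url).query).get("v")
    if not values:
        raise ValueError(f"no video id found in {url!r}")
    return values[0]


def expected_outputs(video_id, out_dir="out"):
    """Map each documented artifact to its path inside the output directory."""
    out = Path(out_dir)
    return {
        "summary": out / f"summary-{video_id}.txt",
        "podcast_script": out / f"podcast-{video_id}.txt",
        "podcast_audio": out / f"podcast-{video_id}.mp3",
        "video": out / f"video-{video_id}.mp4",
    }


if __name__ == "__main__":
    vid = video_id_from_url("https://www.youtube.com/watch?v=_K-L9uhsBLM")
    for kind, path in expected_outputs(vid).items():
        print(f"{kind}: {path} (exists: {path.exists()})")
```
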
The tool supports two AI video generation providers:
- Luma AI (recommended)
  - Faster generation times
  - High-quality cinematic videos
  - Supports camera movements and scene transitions
  - Maintains visual consistency
  - Optional image input for style reference
- RunwayML
  - High-quality video generation
  - Requires input image
  - Longer processing times
  - Professional-grade output

Both providers:
- Generate base images using Flux AI
- Create video segments based on podcast content
- Combine segments with audio
- Support custom duration and aspect ratio
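
The last pipeline step, combining segments with audio, can be illustrated with a plain FFmpeg call. This README does not say how ytsum performs the step, so the sketch below is an assumption: it builds a command for the FFmpeg command-line tool using the concat demuxer (not the `ffmpeg-python` package), and `concat_list` and `mux_command` are hypothetical names. Stream copy with the concat demuxer also assumes all segments share the same codec and resolution.

```python
from pathlib import Path


def concat_list(segments):
    """Return the text of an FFmpeg concat-demuxer list file."""
    return "".join(f"file '{Path(s).as_posix()}'\n" for s in segments)


def mux_command(list_file, audio, output):
    """Build an ffmpeg call that joins the listed segments and adds the audio."""
    return [
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0", "-i", str(list_file),  # input 0: video segments
        "-i", str(audio),                                    # input 1: podcast audio
        "-map", "0:v:0", "-map", "1:a:0",                    # video from 0, audio from 1
        "-c:v", "copy", "-c:a", "aac",
        "-shortest",                                         # stop at the shorter stream
        str(output),
    ]


if __name__ == "__main__":
    print(concat_list(["segment-0.mp4", "segment-1.mp4"]), end="")
    print(" ".join(mux_command("segments.txt", "podcast.mp3", "video.mp4")))
```
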

Three transcription options are available:
- Fast Whisper (Default)
  - Quick transcription
  - Good accuracy
  - No API key required
- OpenAI Whisper
  - High accuracy
  - Slower processing
  - Requires OpenAI API key
- Replicate Whisper
  - Fastest option
  - Good accuracy
  - Requires Replicate API key

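
The trade-offs above amount to a small decision rule: use Fast Whisper unless another backend is requested, and require the matching key for the hosted options. The sketch below is a hypothetical illustration of that rule, not ytsum's actual logic; `choose_backend` and the backend labels are invented for the example, while the environment variable names come from the `.env` section.

```python
import os

# Backend label -> environment variable it needs (None means no key required).
BACKEND_KEYS = {
    "fast-whisper": None,
    "whisper": "OPENAI_API_KEY",
    "replicate": "REPLICATE_API_TOKEN",
}


def choose_backend(whisper=False, replicate=False, env=None):
    """Pick a transcription backend and check that its API key is present."""
    env = os.environ if env is None else env
    if whisper and replicate:
        raise ValueError("choose only one of --whisper and --replicate")
    backend = "whisper" if whisper else "replicate" if replicate else "fast-whisper"
    key = BACKEND_KEYS[backend]
    if key and not env.get(key):
        raise RuntimeError(f"{backend} needs {key} to be set")
    return backend


if __name__ == "__main__":
    print(choose_backend())  # Fast Whisper is the documented default
```
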
Run the test suite:
python test_ytsum.py
Run specific test groups:
# Run Luma AI tests only
pytest -v -m luma
# Run RunwayML tests only
pytest -v -m runway

Dependencies:
- anthropic: Claude API for text generation
- openai: Whisper API for transcription and TTS
- lumaai: Luma AI for video generation (recommended)
- runwayml: RunwayML for video generation
- replicate: Flux AI for image generation
- ffmpeg-python: Audio/video processing
- colorama: Terminal output formatting
- pytest: Testing framework

Contributing:
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.