A powerful and easy-to-use tool for transcribing podcast MP3 files into clean, readable transcripts using OpenAI's Whisper AI model.
- High Accuracy: Uses OpenAI's Whisper model for state-of-the-art transcription
- Batch Processing: Transcribe entire folders of MP3 files at once
- Multiple Formats: Save transcripts as TXT, JSON, and SRT (subtitle) formats
- Timestamps: Optional timestamps for each segment
- Language Support: Auto-detects language or specify manually
- Clean Output: Beautiful, readable transcripts
- Progress Tracking: Visual progress bars for long transcriptions
- Flexible Models: Choose from 5 model sizes (speed vs accuracy)
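Under the hood the tool drives the openai-whisper Python API; the sketch below shows roughly what a single-file transcription looks like and what Whisper returns (a minimal sketch only; `episode.mp3` is a placeholder filename, not a file shipped with this repo):

```python
# Minimal sketch of a single-file transcription with the openai-whisper API.
# "episode.mp3" is a placeholder filename.
import whisper

model = whisper.load_model("base")        # one of: tiny, base, small, medium, large
result = model.transcribe("episode.mp3")  # auto-detects the language by default

print(result["language"])                 # detected language code, e.g. "en"
print(result["text"])                     # the full transcript as one string
for seg in result["segments"]:            # per-segment timestamps, in seconds
    print(f"[{seg['start']:.1f} -> {seg['end']:.1f}] {seg['text']}")
```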
- Python 3.8 or higher
- macOS, Linux, or Windows
- At least 2GB RAM (more for larger models)
- Internet connection (first time only, to download models)
- Clone or download this repository
- Install dependencies:
  pip install -r requirements.txt
  Note for macOS with Apple Silicon (M1/M2/M3): If you have issues, install PyTorch first:
  pip install torch torchaudio
  pip install -r requirements.txt
- Verify installation:
  python transcribe.py --help
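As an extra sanity check, you can confirm that the underlying packages import cleanly (this assumes the tool depends on the openai-whisper and PyTorch packages, as the install steps above suggest):

```python
# Sanity check: both imports succeed only if the dependencies installed correctly.
import torch    # PyTorch backend that Whisper runs on
import whisper  # the openai-whisper package

print("PyTorch version:", torch.__version__)
print("Available Whisper models:", whisper.available_models())
```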
Transcribe all MP3 files in a folder:
python transcribe.py /path/to/your/podcasts

This will:
- Find all MP3 files in the specified folder
- Transcribe each file using the `base` model
- Save transcripts to a `transcripts/` folder
- Generate both TXT and JSON formats
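Conceptually, the batch run amounts to something like the sketch below (illustrative only, not the tool's actual source; the folder path and output directory name are placeholders):

```python
# Illustrative sketch of the batch workflow: find MP3s, transcribe each one,
# and save TXT and JSON transcripts. Not the tool's actual source code.
import json
from pathlib import Path

import whisper

def transcribe_folder(folder: str, output_dir: str = "transcripts", model_name: str = "base") -> None:
    model = whisper.load_model(model_name)
    out = Path(output_dir)
    out.mkdir(exist_ok=True)

    for mp3 in sorted(Path(folder).glob("*.mp3")):
        result = model.transcribe(str(mp3))

        # Plain-text transcript
        (out / f"{mp3.stem}.txt").write_text(result["text"].strip(), encoding="utf-8")

        # Full result (text, segments, detected language) as JSON
        (out / f"{mp3.stem}.json").write_text(
            json.dumps(result, ensure_ascii=False, indent=2, default=str),
            encoding="utf-8",
        )

transcribe_folder("/path/to/your/podcasts")
```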
Use a more accurate model:
python transcribe.py /path/to/podcasts --model medium

Specify output directory:

python transcribe.py /path/to/podcasts --output my_transcripts

Specify language (faster than auto-detect):

python transcribe.py /path/to/podcasts --language en

Choose output formats:

# Save as TXT only
python transcribe.py /path/to/podcasts --formats txt

# Save all formats (TXT, JSON, SRT)
python transcribe.py /path/to/podcasts --formats txt json srt

Complete example:

python transcribe.py ~/Downloads/podcasts \
    --model medium \
    --output transcripts \
    --language en \
    --formats txt json srt

Choose the right model for your needs:
| Model | Speed | Accuracy | RAM Usage | Best For |
|---|---|---|---|---|
| tiny | ⚡⚡⚡⚡⚡ | ⭐⭐ | ~1 GB | Quick drafts, testing |
| base | ⚡⚡⚡⚡ | ⭐⭐⭐ | ~1 GB | Default, good balance |
| small | ⚡⚡⚡ | ⭐⭐⭐⭐ | ~2 GB | Better accuracy |
| medium | ⚡⚡ | ⭐⭐⭐⭐⭐ | ~5 GB | High quality transcripts |
| large | ⚡ | ⭐⭐⭐⭐⭐⭐ | ~10 GB | Professional use |
Recommendation: Start with the `base` model. Upgrade to `medium` or `large` if you need better accuracy.
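If you are unsure which size to pick, a quick informal benchmark on one short episode can help (a sketch only; `sample.mp3` is a placeholder for any short file you have on hand):

```python
# Rough benchmark: transcribe one short file with each model size and
# compare wall-clock time. "sample.mp3" is a placeholder filename.
import time
import whisper

for name in ("tiny", "base", "small"):
    model = whisper.load_model(name)
    start = time.perf_counter()
    result = model.transcribe("sample.mp3")
    print(f"{name}: {time.perf_counter() - start:.1f}s, {len(result['text'])} characters")
```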
TXT output: Clean, readable text with two sections:
- Transcript with timestamps for each segment
- Clean transcript without timestamps (perfect for reading)
Example:
Transcript: podcast_episode_123.mp3
Generated: 2025-11-04 10:30:00
================================================================================
[00:00 -> 00:05] Welcome to the show, today we're talking about AI.
[00:05 -> 00:12] This is a fascinating topic that affects everyone.
================================================================================
CLEAN TRANSCRIPT (no timestamps):
Welcome to the show, today we're talking about AI. This is a fascinating topic that affects everyone.
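The `[MM:SS -> MM:SS]` labels are presumably just minute:second renderings of the segment start and end times, which Whisper reports in seconds:

```python
# How a segment time in seconds maps to the [MM:SS -> MM:SS] labels above.
def mmss(seconds: float) -> str:
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"

print(f"[{mmss(0.0)} -> {mmss(5.3)}]")  # prints: [00:00 -> 00:05]
```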
JSON output: Structured data with full details, including:
- Audio filename
- Detected language
- Complete text
- All segments with timestamps
- Generation timestamp
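A hypothetical way to work with the JSON output from Python (the key names used here mirror Whisper's own output and may not match this tool's exact schema; open a generated file to check):

```python
# Hypothetical reader for a generated JSON transcript. Key names ("text",
# "segments", "start", "end") are assumptions based on Whisper's output.
import json

with open("transcripts/podcast_episode_123.json", encoding="utf-8") as f:
    data = json.load(f)

print(data["text"][:200])  # first 200 characters of the full transcript
for seg in data["segments"]:
    print(seg["start"], "->", seg["end"], seg["text"])
```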
SRT output: Standard subtitle format that can be used with video players or video editing software.
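For reference, this is roughly how Whisper-style segments map onto SRT entries (a sketch of the standard format, not the tool's own code):

```python
# SRT format: a numbered entry, an "HH:MM:SS,mmm --> HH:MM:SS,mmm" time range,
# then the caption text, separated by blank lines.
def srt_timestamp(seconds: float) -> str:
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    lines = []
    for i, seg in enumerate(segments, start=1):
        lines += [
            str(i),
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}",
            seg["text"].strip(),
            "",
        ]
    return "\n".join(lines)

print(segments_to_srt([{"start": 0.0, "end": 5.0, "text": "Welcome to the show."}]))
```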
Whisper supports 99+ languages. Common language codes:
- `en` - English
- `es` - Spanish
- `fr` - French
- `de` - German
- `it` - Italian
- `pt` - Portuguese
- `ja` - Japanese
- `ko` - Korean
- `zh` - Chinese
Or omit the `--language` flag entirely for auto-detection!
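If you call the Whisper API directly, the equivalent of `--language` is the `language` argument to `transcribe()`, which skips the detection pass (sketch only; `episode.mp3` is a placeholder filename):

```python
# Passing a language code up front skips Whisper's language-detection pass.
import whisper

model = whisper.load_model("base")
result = model.transcribe("episode.mp3", language="en")
print(result["text"])
```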
- First Run: The first time you run the tool, it will download the selected Whisper model (~100MB-3GB depending on size). This happens once per model.
- Audio Quality: Higher quality audio = better transcriptions. Whisper is very robust, though!
- Processing Time:
  - A 1-hour podcast takes ~5-10 minutes with the `base` model
  - Use `tiny` for quick tests
  - Use `medium` or `large` for important content
- Batch Processing: Process multiple files overnight if you have many podcasts.
- Storage: Transcripts are small (typically <100KB per hour of audio).
Common issues:

If Whisper itself is missing, install it directly:

pip install openai-whisper

If you run out of memory, use a smaller model:

python transcribe.py /path/to/podcasts --model tiny

If transcription is slow:
- Use a smaller model (`tiny` or `base`)
- Ensure no other heavy processes are running
- Check if your CPU supports optimizations

If accuracy is poor:
- Try a larger model (`medium` or `large`)
- Ensure audio quality is good
- Specify the language explicitly
# 1. Put your podcast MP3s in a folder
mkdir ~/podcasts_to_transcribe
# (copy your MP3 files there)
# 2. Run the transcription
python transcribe.py ~/podcasts_to_transcribe --model base
# 3. Find your transcripts
ls transcripts/
# 4. Open and enjoy!
open transcripts/podcast_episode.txt

Found a bug or want a feature? Feel free to:
- Open an issue
- Submit a pull request
- Share your feedback
MIT License - feel free to use this tool for personal or commercial projects!
- Built with OpenAI Whisper
- Inspired by the need for accessible podcast transcripts
If you find this tool useful, please star the repo and share it with others!
Questions? Open an issue or check the Whisper documentation.