πŸŽ™οΈ Podcast Transcription Tool

A powerful, easy-to-use tool for transcribing podcast MP3 files into clean, readable transcripts using OpenAI's Whisper model. This repository also includes Neurotech Pub transcripts.

✨ Features

  • High Accuracy: Uses OpenAI's Whisper model for state-of-the-art transcription
  • Batch Processing: Transcribe entire folders of MP3 files at once
  • Multiple Formats: Save transcripts as TXT, JSON, and SRT (subtitle) formats
  • Timestamps: Optional timestamps for each segment
  • Language Support: Auto-detects language or specify manually
  • Clean Output: Beautiful, readable transcripts
  • Progress Tracking: Visual progress bars for long transcriptions
  • Flexible Models: Choose from 5 model sizes (speed vs accuracy)

πŸ“‹ Requirements

  • Python 3.8 or higher
  • macOS, Linux, or Windows
  • At least 2GB RAM (more for larger models)
  • Internet connection (first time only, to download models)

πŸš€ Installation

  1. Clone or download this repository

  2. Install dependencies:

    pip install -r requirements.txt

    Note for macOS with Apple Silicon (M1/M2/M3): If you have issues, install PyTorch first:

    pip install torch torchaudio
    pip install -r requirements.txt
  3. Verify installation:

    python transcribe.py --help

πŸ“– Usage

Basic Usage

Transcribe all MP3 files in a folder:

python transcribe.py /path/to/your/podcasts

This will:

  • Find all MP3 files in the specified folder
  • Transcribe each file using the base model
  • Save transcripts to a transcripts/ folder
  • Generate both TXT and JSON formats
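Under the hood, the script presumably wraps the openai-whisper Python API. A minimal sketch of transcribing a single file with that API (the file name is illustrative; the real script adds folder scanning, progress bars, and output writing):

import whisper

# Load the model (downloaded automatically on first use)
model = whisper.load_model("base")

# Transcribe one MP3; Whisper decodes the audio via ffmpeg
result = model.transcribe("episode.mp3")

print(result["language"])           # detected language code, e.g. "en"
print(result["text"])               # full transcript as a single string
for seg in result["segments"]:      # per-segment timestamps, in seconds
    print(seg["start"], seg["end"], seg["text"])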

Advanced Options

Use a more accurate model:

python transcribe.py /path/to/podcasts --model medium

Specify output directory:

python transcribe.py /path/to/podcasts --output my_transcripts

Specify language (faster than auto-detect):

python transcribe.py /path/to/podcasts --language en

Choose output formats:

# Save as TXT only
python transcribe.py /path/to/podcasts --formats txt

# Save all formats (TXT, JSON, SRT)
python transcribe.py /path/to/podcasts --formats txt json srt

Complete example:

python transcribe.py ~/Downloads/podcasts \
  --model medium \
  --output transcripts \
  --language en \
  --formats txt json srt

🎯 Model Sizes

Choose the right model for your needs:

Model  | Speed      | Accuracy      | RAM Usage | Best For
tiny   | ⚑⚑⚑⚑⚑ | ⭐⭐         | ~1 GB     | Quick drafts, testing
base   | ⚑⚑⚑⚑   | ⭐⭐⭐       | ~1 GB     | Default, good balance
small  | ⚑⚑⚑     | ⭐⭐⭐⭐     | ~2 GB     | Better accuracy
medium | ⚑⚑       | ⭐⭐⭐⭐⭐   | ~5 GB     | High-quality transcripts
large  | ⚑         | ⭐⭐⭐⭐⭐⭐ | ~10 GB    | Professional use

Recommendation: Start with the base model and upgrade to medium or large if you need better accuracy.

πŸ“ Output Formats

TXT Format

Clean, readable text with two sections:

  1. Transcript with timestamps for each segment
  2. Clean transcript without timestamps (perfect for reading)

Example:

Transcript: podcast_episode_123.mp3
Generated: 2025-11-04 10:30:00
================================================================================

[00:00 -> 00:05] Welcome to the show, today we're talking about AI.
[00:05 -> 00:12] This is a fascinating topic that affects everyone.

================================================================================
CLEAN TRANSCRIPT (no timestamps):

Welcome to the show, today we're talking about AI. This is a fascinating topic that affects everyone.

JSON Format

Structured data with full details including:

  • Audio filename
  • Detected language
  • Complete text
  • All segments with timestamps
  • Generation timestamp
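The exact key names depend on how transcribe.py writes the file, so the snippet below is only a sketch based on the fields listed above (keys such as "language" and "segments" are assumptions; check a generated .json file for the actual schema):

import json

# Path and key names are illustrative; adjust to your generated file
with open("transcripts/podcast_episode_123.json", encoding="utf-8") as f:
    data = json.load(f)

print(data["language"])                  # assumed key: detected language
for seg in data["segments"]:             # assumed key: list of segments
    print(f'[{seg["start"]:.1f}s -> {seg["end"]:.1f}s] {seg["text"]}')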

SRT Format

Standard subtitle format that can be used with video players or video editing software.
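For reference, an SRT file is plain text with numbered cues, HH:MM:SS,mmm --> HH:MM:SS,mmm timing lines, and the spoken text (the timings below reuse the example above and are illustrative):

1
00:00:00,000 --> 00:00:05,000
Welcome to the show, today we're talking about AI.

2
00:00:05,000 --> 00:00:12,000
This is a fascinating topic that affects everyone.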

πŸ› οΈ Configuration

Supported Languages

Whisper supports 99+ languages. Common language codes:

  • en - English
  • es - Spanish
  • fr - French
  • de - German
  • it - Italian
  • pt - Portuguese
  • ja - Japanese
  • ko - Korean
  • zh - Chinese

Or omit the --language flag entirely and Whisper will auto-detect the language.

πŸ’‘ Tips & Best Practices

  1. First Run: The first time you run the tool, it will download the selected Whisper model (~100MB-3GB depending on size). This happens once per model.

  2. Audio Quality: Higher-quality audio produces better transcriptions, though Whisper is quite robust to noisy recordings.

  3. Processing Time:

    • A 1-hour podcast takes roughly 5-10 minutes with the base model
    • Use tiny for quick tests
    • Use medium or large for important content
  4. Batch Processing: Process multiple files overnight if you have many podcasts.

  5. Storage: Transcripts are small (typically <100KB per hour of audio).

πŸ”§ Troubleshooting

"No module named 'whisper'"

pip install openai-whisper

"Out of memory" error

Use a smaller model:

python transcribe.py /path/to/podcasts --model tiny

Slow transcription

  • Use a smaller model (tiny or base)
  • Ensure no other heavy processes are running
  • Remember that transcription runs on the CPU unless PyTorch detects a CUDA GPU, which speeds it up considerably (see the quick check below)
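To see whether PyTorch can actually use a GPU on your machine, run a quick check:

import torch

# openai-whisper picks CUDA automatically when PyTorch reports a GPU,
# and otherwise falls back to the CPU
print(torch.cuda.is_available())          # True if a CUDA GPU is usable
print(torch.backends.mps.is_available())  # Apple Silicon GPU support in PyTorch
                                          # (Whisper still defaults to CPU here)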

Poor accuracy

  • Try a larger model (medium or large)
  • Ensure audio quality is good
  • Specify the language explicitly

πŸ“ Example Workflow

# 1. Put your podcast MP3s in a folder
mkdir ~/podcasts_to_transcribe
# (copy your MP3 files there)

# 2. Run the transcription
python transcribe.py ~/podcasts_to_transcribe --model base

# 3. Find your transcripts
ls transcripts/

# 4. Open and enjoy!
open transcripts/podcast_episode.txt

🀝 Contributing

Found a bug or want a feature? Feel free to:

  • Open an issue
  • Submit a pull request
  • Share your feedback

πŸ“„ License

MIT License - feel free to use this tool for personal or commercial projects!

πŸ™ Acknowledgments

  • Built with OpenAI Whisper
  • Inspired by the need for accessible podcast transcripts

πŸŽ‰ Happy Transcribing!

If you find this tool useful, please star the repo and share it with others!


Questions? Open an issue or check the Whisper documentation.
