Convert video audio to subtitles using AI-powered transcription.
audio-to-subs is a Python CLI tool that extracts audio from video files, transcribes it using Mistral AI's Voxtral Mini model, and generates accurate subtitle files in multiple formats (SRT, VTT, WebVTT, SBV).
- Extract audio from video files (.mp4, .mkv, .avi, etc.)
- AI-powered transcription using Mistral AI Voxtral Mini
- Generate timestamped subtitles in multiple formats:
  - SRT (SubRip)
  - VTT (WebVTT)
  - SBV (YouTube)
- Single video and batch processing
- Configuration file support (.audio-to-subs.yaml)
- Enhanced progress reporting:
  - Visual progress bars with percentage indicators
  - Real-time upload progress tracking (1 MB chunks)
  - Multi-stage progress monitoring (audio extraction, upload, transcription, generation)
  - Segment-level progress for large files
  - Configurable verbosity levels
- Subtitle naming conventions:
  - Automatic language code inclusion in filenames
  - Bazarr and media player compatibility
  - ISO 639-1/2 language code support (en, fr, es, de, etc.)
  - Proper filename format: filename.language_code.format
- Container-first development (Podman)
- Full test coverage with TDD/BDD
# Build production image
podman build -t audio-to-subs:latest .

Set your Mistral AI API key:
# Using Podman secrets (recommended)
echo "your_api_key" | podman secret create mistral_api_key -
# Or using environment variable
export MISTRAL_API_KEY=your_api_key

# Convert single video (preserves your UID/GID for output files)
podman run --rm \
--userns=keep-id \
--secret mistral_api_key,type=env,target=MISTRAL_API_KEY \
-v ./videos:/input:ro,Z \
-v ./subtitles:/output:Z \
audio-to-subs:latest -i /input/video.mp4 -o /output/video.srt
# With progress reporting (visual progress bars)
podman run --rm \
--userns=keep-id \
--secret mistral_api_key,type=env,target=MISTRAL_API_KEY \
-v ./videos:/input:ro,Z \
-v ./subtitles:/output:Z \
audio-to-subs:latest -i /input/video.mp4 -o /output/video.srt --progress
# Specify output format (default: srt)
podman run --rm \
--userns=keep-id \
--secret mistral_api_key,type=env,target=MISTRAL_API_KEY \
-v ./videos:/input:ro,Z \
-v ./subtitles:/output:Z \
audio-to-subs:latest -i /input/video.mp4 -o /output/video.vtt --format vtt

Important: Use --userns=keep-id to preserve your user ID/GID on output files, preventing permission issues.
Create .audio-to-subs.yaml in your working directory:
jobs:
  - input: ./videos/video1.mp4
    output: ./subtitles/video1.srt
  - input: ./videos/video2.mp4
    output: ./subtitles/video2.vtt
    format: vtt

Then run:
podman run --rm \
--userns=keep-id \
--secret mistral_api_key,type=env,target=MISTRAL_API_KEY \
-v $(pwd):/work:Z,rslave \
audio-to-subs:latest --config /work/.audio-to-subs.yaml

Or with docker-compose:

podman-compose up

The configuration file supports batch processing with custom settings:
# Default settings for all jobs
defaults:
  format: srt  # srt, vtt, webvtt, sbv
  temp_dir: /tmp/audio-to-subs

jobs:
  - input: videos/meeting.mp4
    output: subtitles/meeting.srt
  - input: videos/presentation.mkv
    output: subtitles/presentation.vtt
    format: vtt  # Override default format
  - input: videos/tutorial.avi
    output: subtitles/tutorial.sbv
    format: sbv

The system automatically generates subtitle filenames following industry standards for Bazarr and media player compatibility.
When you specify a language code using the --language parameter, the generated subtitle filename will include the language code:
audio-to-subs -i video.mp4 -o output.srt --language en
# Generates: output.en.srt
audio-to-subs -i movie.mp4 -o subtitles.srt --language fr
# Generates: subtitles.fr.srt

The system supports ISO 639-1 (2-letter) and ISO 639-2 (3-letter) language codes:
- 2-letter codes: en, fr, es, de, it, pt, ru, zh, ja, ko
- 3-letter codes: eng, fra, spa, deu, ita, por, rus, zho, jpn, kor
base_filename.language_code.format
Examples:
- movie.en.srt (English SRT subtitles)
- show.s01e01.fr.vtt (French WebVTT subtitles)
- documentary.de.sbv (German YouTube subtitles)
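The naming rule above can be sketched in Python (an illustrative helper, not the tool's actual API):

```python
from pathlib import Path
from typing import Optional

def language_coded_name(output_path: str, language: Optional[str]) -> str:
    """Insert an ISO 639 language code before the extension:
    'output.srt' + 'en' -> 'output.en.srt'."""
    if not language:
        return output_path
    p = Path(output_path)
    return str(p.with_suffix(f".{language}{p.suffix}"))
```

Multi-part base names such as `show.s01e01` are preserved because only the final extension is replaced.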
These naming conventions are supported by:
- Plex: Automatic subtitle detection and selection
- Jellyfin: Full subtitle naming support
- Kodi: Recognizes standard naming patterns
- VLC: Automatic subtitle loading
- MPV: Recognizes subtitle files with matching base names
- Bazarr: Expected subtitle file naming for automated management
# Generate English subtitles
audio-to-subs -i video.mp4 -o output.srt --language en
# Generate French subtitles
audio-to-subs -i video.mp4 -o output.srt --language fr
# Generate Spanish subtitles in VTT format
audio-to-subs -i video.mp4 -o output.vtt --language es --format vtt
# Batch processing with language codes
# .audio-to-subs.yaml
jobs:
  - input: videos/movie.mp4
    output: subtitles/movie.srt
    # Note: Language is set via CLI --language parameter

Wide compatibility with video players and editing software.
1
00:00:01,000 --> 00:00:05,000
First subtitle line

2
00:00:05,500 --> 00:00:10,000
Second subtitle line
Web standard subtitle format.
WEBVTT

00:00:01.000 --> 00:00:05.000
First subtitle line

00:00:05.500 --> 00:00:10.000
Second subtitle line
WebVTT with optional metadata and styling.
YouTube's legacy subtitle format.
0:00:01.000,0:00:05.000
First subtitle line

0:00:05.500,0:00:10.000
Second subtitle line
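The three formats differ mainly in timestamp notation: SRT uses `HH:MM:SS,mmm`, WebVTT uses `HH:MM:SS.mmm`, and SBV uses an unpadded hour with a period. A small sketch of illustrative helpers (not the tool's internals):

```python
def srt_ts(seconds: float) -> str:
    """SRT timestamp: HH:MM:SS,mmm (comma before milliseconds)."""
    whole = int(seconds)
    h, rem = divmod(whole, 3600)
    m, s = divmod(rem, 60)
    ms = round((seconds - whole) * 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def vtt_ts(seconds: float) -> str:
    """WebVTT timestamp: HH:MM:SS.mmm (period before milliseconds)."""
    return srt_ts(seconds).replace(",", ".")

def sbv_ts(seconds: float) -> str:
    """SBV timestamp: H:MM:SS.mmm (hours not zero-padded)."""
    t = vtt_ts(seconds)
    return t[1:] if t[0] == "0" else t
```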
# Clone repository
git clone https://github.com/guiand888/audio-to-subs.git
cd audio-to-subs
# Create virtual environment
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -e ".[dev]"
# Verify installation
audio-to-subs --help

# Build development container
podman build -f Dockerfile.dev -t audio-to-subs:dev .
# Build production container
podman build -t audio-to-subs:latest .
# Run with Podman
podman run --rm -it audio-to-subs:latest --help

# With coverage report
pytest tests/ -v --cov=src --cov-report=html
# Quick test run
pytest tests/ -v --no-cov
# Run specific test file
pytest tests/test_pipeline.py -v

# Run feature tests
pytest features/steps/ -v
# Run specific feature
pytest features/steps/audio_steps.py -v

Current test coverage: 120 tests passing, 3 skipped
- Audio extraction: 7 tests
- Subtitle generation: 14 tests
- Pipeline orchestration: 7 tests
- CLI interface: 9 tests
- Configuration parsing: 18 tests
- Format conversions: 25 tests
- Audio splitting: 25 tests
- Logging: 12 tests
- Integration tests: 3 tests (skipped - require API key)
Target coverage: >80% ✅
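The skipped integration tests are typically gated on the API key being present; one conventional way to express this with pytest (a sketch, the project's actual markers may differ):

```python
import os
import pytest

# Marker that skips integration tests when no API key is available
# in the environment (mirrors the "3 skipped" count above).
requires_api_key = pytest.mark.skipif(
    not os.getenv("MISTRAL_API_KEY"),
    reason="integration test requires MISTRAL_API_KEY",
)

@requires_api_key
def test_transcription_round_trip():
    """Would call the real Mistral API; only runs with a key set."""
```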
Error: FFmpeg not found or not in PATH
Solution:
# Install FFmpeg
# Ubuntu/Debian
sudo apt-get install ffmpeg
# macOS
brew install ffmpeg
# Verify installation
ffmpeg -version

Error: API key required. Provide with --api-key or set MISTRAL_API_KEY
Solution:
# Set environment variable
export MISTRAL_API_KEY=your_api_key
# Or pass directly
audio-to-subs -i video.mp4 -o output.srt --api-key your_api_key
# Get API key from https://console.mistral.ai

Error: Permission denied: 'output.srt'
Solution (with Podman):
# Use --userns=keep-id to preserve user permissions
podman run --rm --userns=keep-id \
--secret mistral_api_key,type=env,target=MISTRAL_API_KEY \
-v ./videos:/input:ro,Z \
-v ./subtitles:/output:Z \
audio-to-subs:latest -i /input/video.mp4 -o /output/video.srt

Issue: Processing takes a long time or runs out of memory
Solution:
- Audio files >15 minutes are automatically split into segments
- Each segment is transcribed separately
- Timestamps are automatically adjusted
- Temporary files are cleaned up automatically
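The timestamp adjustment step can be sketched as follows (hypothetical cue/segment shapes; the tool's internal types may differ):

```python
def merge_segments(segments):
    """Merge per-segment cues into one list with absolute timestamps.

    Each segment is (start_offset_seconds, [(start, end, text), ...]),
    where cue times are relative to the segment's own beginning.
    """
    merged = []
    for offset, cues in segments:
        for start, end, text in cues:
            # Shift each cue by the segment's position in the full audio.
            merged.append((start + offset, end + offset, text))
    return merged
```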
Error: Audio extraction failed: Unsupported video format
Solution:
# Convert video to supported format first
ffmpeg -i input.mov -c:v libx264 -c:a aac output.mp4
# Supported formats: mp4, mkv, avi, mov, flv, wmv, webm

audio-to-subs/
├── src/                        # Source code
│   ├── __main__.py             # Entry point
│   ├── cli.py                  # CLI interface
│   ├── pipeline.py             # Main orchestrator
│   ├── audio_extractor.py      # FFmpeg wrapper
│   ├── audio_splitter.py       # Large file handling
│   ├── transcription_client.py # Mistral AI client
│   ├── subtitle_generator.py   # Format generators
│   ├── config_parser.py        # YAML config parsing
│   └── logging_config.py       # Logging setup
├── tests/                      # Unit tests (120 tests)
├── features/                   # BDD scenarios
│   └── steps/                  # Step implementations
├── Dockerfile                  # Production container
├── Dockerfile.dev              # Development container
├── docker-compose.yml          # Compose configuration
├── pyproject.toml              # Project metadata
└── README.md                   # This file
1. Write tests first (TDD)

   pytest tests/test_new_feature.py -v

2. Implement the feature

   # Edit src/new_feature.py

3. Run tests

   pytest tests/ -v --cov=src

4. Code quality checks

   black src/
   ruff check src/
   mypy src/
# Build dev container
podman build -f Dockerfile.dev -t audio-to-subs:dev .
# Run tests in container
podman run --rm -v .:/app:Z audio-to-subs:dev pytest tests/ -v
# Interactive shell
podman run --rm -it -v .:/app:Z audio-to-subs:dev bash
# Run CLI in container
podman run --rm -v ./videos:/input:ro,Z -v ./output:/output:Z \
-e MISTRAL_API_KEY=your_key \
audio-to-subs:dev -i /input/video.mp4 -o /output/video.srt

For programmatic usage:
from src.pipeline import Pipeline

# Initialize pipeline
pipeline = Pipeline(api_key="your_key")

# Process single video
output = pipeline.process_video(
    video_path="video.mp4",
    output_path="subtitles.srt",
    output_format="srt"  # srt, vtt, webvtt, sbv
)

# Process batch
jobs = [
    {"input": "video1.mp4", "output": "sub1.srt"},
    {"input": "video2.mp4", "output": "sub2.vtt", "format": "vtt"}
]
results = pipeline.process_batch(jobs)

# With progress callback (enhanced with percentages)
def on_progress(message, percentage=None):
    if percentage is not None:  # 0 is a valid percentage
        print(f"[Progress {percentage}%] {message}")
    else:
        print(f"[Progress] {message}")

pipeline = Pipeline(
    api_key="your_key",
    progress_callback=on_progress,
    verbose_progress=True  # Show upload and segment progress with percentages
)

The enhanced progress reporting system provides detailed feedback during processing:
- Audio Extraction (0-25%): Extracting audio from video file
- Audio Upload (25-50%): Uploading audio to Mistral AI (with chunked progress)
- Transcription Processing (50-75%): Processing transcription results
- Subtitle Generation (75-100%): Generating final subtitle files
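The four stage bands above can be combined into one overall percentage; a minimal sketch (illustrative names, not the tool's internals):

```python
# Overall progress bands per stage, matching the ranges above.
STAGES = {
    "extract": (0, 25),
    "upload": (25, 50),
    "transcribe": (50, 75),
    "generate": (75, 100),
}

def overall_percent(stage: str, fraction: float) -> int:
    """Map a stage-local fraction (0.0-1.0) into the stage's band."""
    lo, hi = STAGES[stage]
    return round(lo + (hi - lo) * fraction)
```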
- --progress: Show visual progress bars with percentage indicators
- --verbose: Show detailed progress messages
- Combine both for comprehensive progress reporting
from typing import Optional

def progress_callback(message: str, percentage: Optional[int] = None) -> None:
    """
    Progress callback function signature.

    Args:
        message: Progress message text
        percentage: Optional percentage (0-100) for progress bars
    """
    pass

[Progress 10%] Extracting audio from video...
[Progress 25%] Audio extraction complete
[Progress 30%] Audio ready for transcription
[Progress 30%] Transcribing audio with Mistral AI...
[Progress 30%] Uploading segment 1/1: 0.0/1.5 MB (0%)
[Progress 35%] Uploading segment 1/1: 0.5/1.5 MB (33%)
[Progress 45%] Uploading segment 1/1: 1.5/1.5 MB (100%)
[Progress 75%] Transcription processing complete
[Progress 75%] Generating SRT subtitles...
[Progress 100%] Subtitle generation complete
[Progress 100%] Complete! Subtitles generated successfully.
- Files are uploaded in 1MB chunks for smooth progress updates
- Real-time percentage tracking during upload
- Segment-level progress for large files split into multiple parts
- Visual progress bars show upload completion
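The 1 MB chunked upload with percentage reporting can be sketched like this (the message format and reader are illustrative; the real client is internal to the tool):

```python
import os

CHUNK = 1024 * 1024  # 1 MB chunks, as noted above

def read_in_chunks(path, report):
    """Yield the file in 1 MB chunks, reporting upload progress."""
    total = os.path.getsize(path)
    sent = 0
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK):
            sent += len(chunk)
            report(f"Uploading: {sent / 1e6:.1f}/{total / 1e6:.1f} MB "
                   f"({100 * sent // total}%)")
            yield chunk
```

A 2.5 MB file yields three chunks, so the callback fires three times, ending at 100%.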
# Enhanced progress callback with upload tracking
def detailed_progress(message, percentage=None):
    if "Uploading" in message and percentage is not None:
        print(f"📤 {message}")
    elif percentage is not None:
        print(f"[{percentage:3d}%] {message}")
    else:
        print(f"→ {message}")

pipeline = Pipeline(
    api_key="your_key",
    progress_callback=detailed_progress,
    verbose_progress=True
)

# Process with detailed progress
pipeline.process_video("video.mp4", "subtitles.srt")

- Runtime: Podman or Docker
- API: Mistral AI API key (get from https://console.mistral.ai)
- Python: 3.9+ (for local development)
- FFmpeg: 4.0+ (included in containers)
- Single video: ~1-2 minutes (depends on video length and API response time)
- Batch processing: Processes videos sequentially
- Memory usage: ~200MB base + audio buffer
- Disk usage: Temporary audio files cleaned up automatically
This project follows strict development standards:
- Code Style: Black formatter, Ruff linter
- Type Checking: MyPy with strict mode
- Testing: TDD/BDD with >80% coverage
- Git Workflow: Feature branches, pull requests, code review
See dev/QUALITY.md for detailed standards.
Comprehensive development rules, coding standards, and project guidelines are maintained in a separate repository and referenced during development. These include standards for:
- Code quality and style
- Testing methodologies (TDD/BDD)
- Git workflow and collaboration
- Architecture and design patterns
- Deployment practices
For access to these standards, contact the project maintainers or see the development team setup documentation.
GPLv3 - See LICENSE file for details.
Guillaume Andre (@guiand888)