A comprehensive microservices-based solution for automatic video captioning, designed for 9:16 format videos (shorts). The system consists of four independent services that can be used together or separately, with a modern web interface for easy interaction.
Auto Captions
├── transcriptions/       # Audio/video transcription service
├── ffmpeg-captions/      # FFmpeg-based subtitle rendering
├── remotion-captions/    # Remotion-based video processing
├── web/                  # Web interface for user interaction
├── setup.sh              # Global setup script
└── docker-compose.yml    # Docker orchestration
Transcriptions
- Port: 3001
- Purpose: Extract audio from video/audio files and generate transcriptions
- Technology: TypeScript, OpenAI Whisper, Whisper.cpp
- Documentation: transcriptions/README.md
FFmpeg Captions
- Port: 3002
- Purpose: Generate captioned videos using FFmpeg with ASS subtitle styling
- Technology: TypeScript, FFmpeg, ASS subtitles
- Documentation: ffmpeg-captions/README.md
Remotion Captions
- Port: 3003
- Purpose: Create highly customizable captioned videos with Remotion
- Technology: TypeScript, Remotion, React-based styling
- Documentation: remotion-captions/README.md
Web Interface
- Port: 80
- Purpose: User-friendly web interface for the entire caption generation workflow
- Technology: PHP, JavaScript, Tailwind CSS
- Features: File upload, transcription editing, service management, real-time preview
- Documentation: web/README.md
demo.mp4
- Node.js 22+
- npm or yarn
- FFmpeg (required for all services)
- PHP 8.4+ (for web interface)
- Docker & Docker Compose (for containerized deployment)
- Clone the repository:
  git clone <repository-url>
  cd AutoCaptions
- Start all services:
  docker-compose up -d
- Access the web interface:
  http://localhost:80
- Clone the repository:
  git clone <repository-url>
  cd AutoCaptions
- Run global setup:
  chmod +x setup.sh
  ./setup.sh
- Start services individually:
  # Terminal 1 - Transcriptions
  cd transcriptions && npm run dev
  # Terminal 2 - FFmpeg Captions
  cd ffmpeg-captions && npm run dev
  # Terminal 3 - Remotion Captions
  cd remotion-captions && npm run dev
  # Terminal 4 - Web Interface
  cd web && php -S localhost:80
Once running, the services will be available at:
- Web Interface: http://localhost:80 (Primary user interface)
- Transcriptions API: http://localhost:3001
- FFmpeg Captions API: http://localhost:3002
- Remotion Captions API: http://localhost:3003
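To confirm that the three API services are actually reachable at these addresses, one option is to poll the `/health` endpoint that each service exposes. The following is a minimal sketch, not part of the project: the names `SERVICES`, `healthUrl`, and `checkServices` are hypothetical, and it assumes that any 2xx response from `/health` means the service is up (the response body is not inspected).

```typescript
// Base URLs as listed above; adjust them if your deployment differs.
const SERVICES: Record<string, string> = {
  transcriptions: "http://localhost:3001",
  "ffmpeg-captions": "http://localhost:3002",
  "remotion-captions": "http://localhost:3003",
};

// Build the health-check URL for a service base URL (tolerates a trailing slash).
function healthUrl(baseUrl: string): string {
  return `${baseUrl.replace(/\/+$/, "")}/health`;
}

// Minimal fetch-like signature so the check can be exercised without a network.
type FetchLike = (url: string) => Promise<{ ok: boolean }>;

// Resolves to a map of service name -> true when /health answered with a 2xx status.
// Assumption: a 2xx status is treated as "healthy"; the body is ignored.
async function checkServices(
  services: Record<string, string>,
  fetchFn: FetchLike,
): Promise<Record<string, boolean>> {
  const result: Record<string, boolean> = {};
  await Promise.all(
    Object.keys(services).map(async (name) => {
      try {
        const res = await fetchFn(healthUrl(services[name]));
        result[name] = res.ok;
      } catch {
        // Connection refused, DNS failure, and similar errors count as "down".
        result[name] = false;
      }
    }),
  );
  return result;
}
```

On Node.js 22+, where `fetch` is available globally, this could be called as `checkServices(SERVICES, (url) => fetch(url))`. Passing the fetch function in keeps the helper easy to test with a stub.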
- Open Web Interface: Navigate to http://localhost:80
- Upload Video: Drag and drop your 9:16 video file
- Generate Transcription: AI-powered speech-to-text processing
- Edit Captions: Fine-tune text, timing, and formatting
- Choose Rendering: Select FFmpeg (fast) or Remotion (advanced)
- Customize Styling: Fonts, colors, positioning, animations
- Download Result: Get your captioned video
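The "Choose Rendering" step corresponds to two different API calls. As a rough sketch, the choice could be expressed as a small request builder. The endpoints and top-level field names come from the curl examples in this README; everything else is an assumption: `buildRenderRequest` and its types are hypothetical, the FFmpeg `data` field is assumed to carry the transcription payload, and the contents of each field are passed through untouched because their schemas are documented in the individual service READMEs, not here.

```typescript
type Renderer = "ffmpeg" | "remotion";

interface RenderRequest {
  url: string;
  body: Record<string, unknown>;
}

// Maps a renderer choice to the endpoint and JSON body used in the curl examples.
// Assumption: `captions` is whatever the transcription step returned.
function buildRenderRequest(
  renderer: Renderer,
  video: string,
  captions: unknown,
  props: unknown = {}, // Remotion styling props; ignored for FFmpeg in this sketch
): RenderRequest {
  if (renderer === "ffmpeg") {
    return {
      url: "http://localhost:3002/api/captions/generate",
      body: { data: captions, video },
    };
  }
  return {
    url: "http://localhost:3003/render",
    body: { video, transcription: captions, props },
  };
}
```

The returned object can then be sent with any HTTP client, using `POST` with a `Content-Type: application/json` header, as the curl examples do.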
# 1. Generate transcription
curl -X POST http://localhost:3001/api/transcribe \
-F "file=@video.mp4" \
-F "service=whisper-cpp"
# 2. Generate captioned video (FFmpeg)
curl -X POST http://localhost:3002/api/captions/generate \
-H "Content-Type: application/json" \
-d '{
"data": {...},
"video": "video.mp4"
}'
# 3. Generate captioned video (Remotion)
curl -X POST http://localhost:3003/render \
-H "Content-Type: application/json" \
-d '{
"video": "video.mp4",
"transcription": {...},
"props": {...}
}'
Each service has its own .env configuration file. After running setup, review and customize:
- transcriptions/.env - Whisper models and API keys
- ffmpeg-captions/.env - FFmpeg paths and output settings
- remotion-captions/.env - Remotion rendering configuration
The web interface provides a settings panel to configure all service URLs:
- Click the gear icon in the header
- Update service URLs:
  - Transcriptions: http://localhost:3001 (Docker: http://transcriptions:3001)
  - FFmpeg Captions: http://localhost:3002 (Docker: http://ffmpeg-captions:3002)
  - Remotion Captions: http://localhost:3003 (Docker: http://remotion-captions:3003)
- Test connections and save
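The difference between the local and Docker URLs is only the hostname: inside Docker Compose, containers reach each other by service name, while a local setup uses `localhost`. A small sketch of that rule follows; `serviceUrl`, `ServiceName`, and `PORTS` are hypothetical names used for illustration and are not taken from the web interface's code.

```typescript
type ServiceName = "transcriptions" | "ffmpeg-captions" | "remotion-captions";

// Ports as documented in this README.
const PORTS: Record<ServiceName, number> = {
  transcriptions: 3001,
  "ffmpeg-captions": 3002,
  "remotion-captions": 3003,
};

// In Docker Compose the hostname is the service name; otherwise it is localhost.
function serviceUrl(name: ServiceName, inDocker: boolean): string {
  const host = inDocker ? name : "localhost";
  return `http://${host}:${PORTS[name]}`;
}
```

For example, `serviceUrl("transcriptions", true)` yields the Docker form `http://transcriptions:3001`, matching the values listed above.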
All services include health check endpoints:
# Check individual services
curl http://localhost:3001/health # Transcriptions
curl http://localhost:3002/health # FFmpeg Captions
curl http://localhost:3003/health # Remotion Captions
- ffmpeg-captions service is under the MIT License - see the LICENSE file for details.
- transcriptions service is under the MIT License - see the LICENSE file for details.
- remotion-captions service is under the Remotion License - see the LICENSE file for details.
- web service is under the MIT License - see the LICENSE file for details.
- Include the Remotion service in the web service
- Convert videos to webm in the remotion-captions service rather than h264 to avoid installing Google Chrome (and thus enable the ARM64 build)
- In the transcriptions service, add a fallback to whisper-cpp when openai-whisper is used but the API is down
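The planned whisper-cpp fallback could take roughly the following shape. This is only an illustration of the pattern, not the project's implementation: `transcribeWithFallback` and the `Transcriber` type are hypothetical, and the sketch treats any error thrown by the primary backend as "the API is down", which a real implementation would likely narrow to network or 5xx failures.

```typescript
// A transcriber takes a file path and resolves to some transcription result.
type Transcriber<T> = (file: string) => Promise<T>;

// Try the primary backend first; if it throws, retry once with the fallback.
// Assumption: every primary failure is treated as an outage.
async function transcribeWithFallback<T>(
  file: string,
  primary: Transcriber<T>,
  fallback: Transcriber<T>,
): Promise<{ result: T; usedFallback: boolean }> {
  try {
    return { result: await primary(file), usedFallback: false };
  } catch {
    return { result: await fallback(file), usedFallback: true };
  }
}
```

Returning `usedFallback` lets the caller report which backend produced the transcription, which may matter if the two backends differ in accuracy or timing detail.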
Built with ❤️