Microservices-based solution for automatic video captioning, designed for 9:16 format videos (shorts). Generate AI-powered transcriptions and create styled captions with FFmpeg or Remotion.

alehano/AutoCaptions

Auto Captions

A comprehensive microservices-based solution for automatic video captioning, designed for 9:16 format videos (shorts). The system consists of four independent services that can be used together or separately, with a modern web interface for easy interaction.

🏗️ Architecture

Auto Captions
├── transcriptions/     # Audio/video transcription service
├── ffmpeg-captions/    # FFmpeg-based subtitle rendering
├── remotion-captions/  # Remotion-based video processing
├── web/                # Web interface for user interaction
├── setup.sh            # Global setup script
└── docker-compose.yml  # Docker orchestration

📦 Services Overview

🎤 Transcriptions Service

  • Port: 3001
  • Purpose: Extract audio from video/audio files and generate transcriptions
  • Technology: TypeScript, OpenAI Whisper, Whisper.cpp
  • Documentation: transcriptions/README.md

🎬 FFmpeg Captions Service

  • Port: 3002
  • Purpose: Generate captioned videos using FFmpeg with ASS subtitle styling
  • Technology: TypeScript, FFmpeg, ASS subtitles
  • Documentation: ffmpeg-captions/README.md

🎨 Remotion Captions Service

  • Port: 3003
  • Purpose: Create highly customizable captioned videos with Remotion
  • Technology: TypeScript, Remotion, React-based styling
  • Documentation: remotion-captions/README.md

🌐 Web Interface

  • Port: 80
  • Purpose: User-friendly web interface for the entire caption generation workflow
  • Technology: PHP, JavaScript, Tailwind CSS
  • Features: File upload, transcription editing, service management, real-time preview
  • Documentation: web/README.md

Demo

demo.mp4

🚀 Quick Start

Prerequisites

  • Node.js 22+
  • npm or yarn
  • FFmpeg (required for all services)
  • PHP 8.4+ (for web interface)
  • Docker & Docker Compose (for containerized deployment)

Option 1: Docker Deployment (Recommended)

  1. Clone the repository:

    git clone <repository-url>
    cd AutoCaptions
  2. Start all services:

    docker-compose up -d
  3. Access the web interface:

    http://localhost:80
    

Option 2: Native Setup

  1. Clone the repository:

    git clone <repository-url>
    cd AutoCaptions
  2. Run global setup:

    chmod +x setup.sh
    ./setup.sh
  3. Start services individually:

    # Terminal 1 - Transcriptions
    cd transcriptions && npm run dev
    
    # Terminal 2 - FFmpeg Captions
    cd ffmpeg-captions && npm run dev
    
    # Terminal 3 - Remotion Captions
    cd remotion-captions && npm run dev
    
    # Terminal 4 - Web Interface (binding to port 80 may require elevated privileges)
    cd web && php -S localhost:80

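🌐 Access Points

Once running, the services are available at:

  • Web Interface: http://localhost:80
  • Transcriptions: http://localhost:3001
  • FFmpeg Captions: http://localhost:3002
  • Remotion Captions: http://localhost:3003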

🎮 Usage Workflows

Via Web Interface (Recommended)

  1. Open Web Interface: Navigate to http://localhost:80
  2. Upload Video: Drag and drop your 9:16 video file
  3. Generate Transcription: AI-powered speech-to-text processing
  4. Edit Captions: Fine-tune text, timing, and formatting
  5. Choose Rendering: Select FFmpeg (fast) or Remotion (advanced)
  6. Customize Styling: Fonts, colors, positioning, animations
  7. Download Result: Get your captioned video
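
Step 2 expects a 9:16 video. The README does not say whether other aspect ratios are rejected, so the following is only a minimal sketch of a client-side check one might run before uploading. The helper name `isNineBySixteen` and the tolerance value are hypothetical and not part of the project.

```typescript
// Hypothetical helper (not part of AutoCaptions): returns true when
// width:height matches 9:16 within a small tolerance.
function isNineBySixteen(width: number, height: number, tolerance = 0.01): boolean {
  // Reject non-positive dimensions to avoid dividing by zero.
  if (width <= 0 || height <= 0) return false;
  return Math.abs(width / height - 9 / 16) <= tolerance;
}

// isNineBySixteen(1080, 1920) → true  (portrait short)
// isNineBySixteen(1920, 1080) → false (landscape)
```

The tolerance allows for encoders that round dimensions to even numbers; tighten or loosen it as needed.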

Via Direct API Usage

# 1. Generate transcription
curl -X POST http://localhost:3001/api/transcribe \
  -F "file=@video.mp4" \
  -F "service=whisper-cpp"

# 2. Generate captioned video (FFmpeg)
curl -X POST http://localhost:3002/api/captions/generate \
  -H "Content-Type: application/json" \
  -d '{
    "data": {...},
    "video": "video.mp4"
  }'

# 3. Generate captioned video (Remotion)
curl -X POST http://localhost:3003/render \
  -H "Content-Type: application/json" \
  -d '{
    "video": "video.mp4",
    "transcription": {...},
    "props": {...}
  }'
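
For callers that prefer TypeScript over curl, the two rendering requests above could be described roughly as follows. This is a sketch, not the project's client code: the function and interface names are hypothetical, and because the payload shapes are elided in the examples above, `data`, `transcription`, and `props` are kept as opaque `unknown` values.

```typescript
// Plain description of a JSON POST request, independent of any HTTP client.
interface JsonPost {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Mirrors curl example 2 (FFmpeg). The shape of `data` is not documented here.
function ffmpegRequest(data: unknown, video: string): JsonPost {
  return {
    url: "http://localhost:3002/api/captions/generate",
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ data, video }),
  };
}

// Mirrors curl example 3 (Remotion). `transcription` and `props` stay opaque.
function remotionRequest(video: string, transcription: unknown, props: unknown): JsonPost {
  return {
    url: "http://localhost:3003/render",
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ video, transcription, props }),
  };
}
```

A request built this way can be sent with the built-in `fetch` available in Node.js 22, for example `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`.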

🔧 Configuration

Service Configuration

Each service has its own .env configuration file. After running setup, review and customize:

  • transcriptions/.env - Whisper models and API keys
  • ffmpeg-captions/.env - FFmpeg paths and output settings
  • remotion-captions/.env - Remotion rendering configuration

Web Interface Configuration

The web interface provides a settings panel to configure all service URLs:

  1. Click the gear icon in the header
  2. Update service URLs:
    • Transcriptions: http://localhost:3001 (Docker: http://transcriptions:3001)
    • FFmpeg Captions: http://localhost:3002 (Docker: http://ffmpeg-captions:3002)
    • Remotion Captions: http://localhost:3003 (Docker: http://remotion-captions:3003)
  3. Test connections and save
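
The URL pairs in step 2 follow one pattern: the port stays the same, and under Docker the host is the service's own name instead of `localhost`. A minimal sketch of that rule is shown below. The names `ServiceName`, `Deployment`, `SERVICE_PORTS`, and `serviceUrl` are hypothetical and only illustrate the mapping; they are not taken from the web interface code.

```typescript
type ServiceName = "transcriptions" | "ffmpeg-captions" | "remotion-captions";
type Deployment = "native" | "docker";

// Ports as listed in the Services Overview.
const SERVICE_PORTS: Record<ServiceName, number> = {
  transcriptions: 3001,
  "ffmpeg-captions": 3002,
  "remotion-captions": 3003,
};

// Native setups use localhost; Docker setups use the service name as the host.
function serviceUrl(name: ServiceName, deployment: Deployment): string {
  const host = deployment === "docker" ? name : "localhost";
  return `http://${host}:${SERVICE_PORTS[name]}`;
}

// serviceUrl("transcriptions", "native")  → "http://localhost:3001"
// serviceUrl("ffmpeg-captions", "docker") → "http://ffmpeg-captions:3002"
```

The Docker host names typically resolve only from other containers on the same Compose network, which is why the native and Docker values differ.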

📊 Health Checks & Monitoring

Service Status

All services include health check endpoints:

# Check individual services
curl http://localhost:3001/health    # Transcriptions
curl http://localhost:3002/health    # FFmpeg Captions
curl http://localhost:3003/health    # Remotion Captions
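
The same checks can be scripted. The sketch below polls the three `/health` endpoints and reports which services did not respond successfully. It assumes a healthy service answers with a 2xx status, which the README does not state explicitly. The names `HttpGet`, `HealthTarget`, `HEALTH_TARGETS`, and `findUnhealthy` are hypothetical. The HTTP call is injected so the function can be exercised without running services.

```typescript
// Minimal fetch-like signature, injected so the checker can be stubbed.
type HttpGet = (url: string) => Promise<{ ok: boolean }>;

interface HealthTarget {
  name: string;
  url: string;
}

const HEALTH_TARGETS: HealthTarget[] = [
  { name: "transcriptions", url: "http://localhost:3001/health" },
  { name: "ffmpeg-captions", url: "http://localhost:3002/health" },
  { name: "remotion-captions", url: "http://localhost:3003/health" },
];

// Resolves to the names of services that failed the check.
async function findUnhealthy(
  get: HttpGet,
  targets: HealthTarget[] = HEALTH_TARGETS,
): Promise<string[]> {
  const checks = targets.map(async (target): Promise<string | null> => {
    try {
      const response = await get(target.url);
      return response.ok ? null : target.name;
    } catch {
      // Connection refused, DNS failure, and similar errors count as unhealthy.
      return target.name;
    }
  });
  const results = await Promise.all(checks);
  return results.filter((name): name is string => name !== null);
}
```

With Node.js 22, the built-in `fetch` can serve as the injected function: `findUnhealthy((url) => fetch(url))`. An empty result means every service answered.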

📄 License

  • ffmpeg-captions service is under the MIT License - see the LICENSE file for details.
  • transcriptions service is under the MIT License - see the LICENSE file for details.
  • remotion-captions service is under the Remotion License - see the LICENSE file for details.
  • web service is under the MIT License - see the LICENSE file for details.

🐛 Known Bugs & Improvements to Be Made

  • Include the Remotion service in the web service
  • Convert videos to webm in the remotion-captions service rather than h264 to avoid installing Google Chrome (and thus enable the ARM64 build)
  • In the transcriptions service, add a fallback to whisper-cpp when openai-whisper is used but the API is down

Built with ❤️

Languages

  • PHP 43.6%
  • JavaScript 25.4%
  • TypeScript 23.8%
  • Shell 5.3%
  • Other 1.9%