Local AI Video Creation Platform - Setup Guide

pip install opencv-python torch torchvision numpy tqdm
pip install sk-video

Automated setup script for installing a complete local AI video creation pipeline on macOS/Windows/Linux. No cloud services required.

What Gets Installed

Phase 1: Prerequisites

  • Validates Python 3.10+, Git, and FFmpeg
  • Checks disk space (20-40 GB recommended)
  • Creates the installation directory structure (see the sketch below)
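
In rough terms, the Phase 1 validation boils down to something like this (a minimal sketch; check_prerequisites and its defaults are illustrative, not the actual setup script):

import os
import shutil
import sys

def check_prerequisites(install_dir="~/ai-video-studio", min_free_gb=20):
    # Python 3.10+ is required by ComfyUI and the tools installed later
    assert sys.version_info >= (3, 10), "Python 3.10+ required"
    # Git and FFmpeg must be on PATH
    for tool in ("git", "ffmpeg"):
        assert shutil.which(tool), f"{tool} not found on PATH"
    # Create the install directory and verify free disk space
    path = os.path.expanduser(install_dir)
    os.makedirs(path, exist_ok=True)
    free_gb = shutil.disk_usage(path).free / 1e9
    assert free_gb >= min_free_gb, f"{free_gb:.0f} GB free, need {min_free_gb}+"

check_prerequisites()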

Phase 2: ComfyUI Core

  • ComfyUI (visual node editor for Stable Diffusion)
  • Python virtual environment with all dependencies
  • ComfyUI-Manager (plugin management)
  • Launch scripts for easy startup

Phase 3: Video Generation

  • AnimateDiff (text/image to animation)
  • Video Helper Suite (video I/O nodes)
  • Stable Video Diffusion support (image to video)

Phase 4: Enhancement Tools

  • Real-ESRGAN (AI upscaling to 1080p/4K)
  • RIFE (frame interpolation for smooth 60fps)

Phase 5: Audio Tools

  • Coqui XTTS v2 (voice synthesis + cloning)
  • MusicGen/AudioCraft (AI music generation)
  • Example scripts

Quick Start

Install Everything

python3 setup_ai_video.py --all

Install Specific Phases

# Just prerequisites and ComfyUI
python3 setup_ai_video.py --phase 1 2

# Add video tools
python3 setup_ai_video.py --phase 3

# Add upscaling and audio
python3 setup_ai_video.py --phase 4 5

Custom Install Location

python3 setup_ai_video.py --all --dir ~/my-video-studio

Prerequisites

Before running the script, ensure you have:

  1. Python 3.10 or newer

    python3 --version
  2. Git

    git --version
  3. FFmpeg

    • macOS: brew install ffmpeg
    • Linux: sudo apt install ffmpeg
    • Windows: Download from ffmpeg.org
  4. 20-40 GB free disk space

  5. Optional but recommended:

    • Apple Silicon Mac, or
    • NVIDIA GPU with 8+ GB VRAM, or
    • Decent CPU (slower but works)

Usage After Installation

1. Launch ComfyUI

cd ~/ai-video-studio/ComfyUI
./launch.sh  # or launch.bat on Windows

Then open: http://127.0.0.1:8188

2. Download Models via ComfyUI-Manager

  • Click "Manager" button in ComfyUI web interface
  • Go to "Install Models" tab
  • Download:
    • SDXL base model (or Flux)
    • AnimateDiff motion modules
    • (Optional) Stable Video Diffusion model
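
Checkpoints downloaded through the Manager land under ComfyUI/models/ (e.g. models/checkpoints/ for SDXL); if a new model doesn't appear in a node's dropdown, refresh the browser page or restart ComfyUI.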

3. Create Your First Video

  1. In ComfyUI Manager → Workflows → Load AnimateDiff example
  2. Set your text prompt
  3. Configure: 16-24 frames, 12-24 fps, 768×432 resolution (at these settings a clip runs roughly one to two seconds)
  4. Click "Queue Prompt" to generate

4. Upscale with Real-ESRGAN

cd ~/ai-video-studio/tools/Real-ESRGAN
python inference_realesrgan_video.py -n RealESRGAN_x4plus -i input.mp4 -o output.mp4

Note: inference_realesrgan.py handles still images; use the inference_realesrgan_video.py variant for video input.

5. Frame Interpolation with RIFE

cd ~/ai-video-studio/tools/RIFE
python inference_video.py --exp=2 --video input.mp4 --output smooth_60fps.mp4

Here --exp sets the interpolation factor as a power of two: --exp=2 quadruples the frame count, e.g. turning a 15 fps clip into 60 fps.

6. Generate Voiceover

tts --model_name tts_models/multilingual/multi-dataset/xtts_v2 \
    --text "Your narration text here" \
    --speaker_wav speaker.wav --language_idx en \
    --out_path voiceover.wav

XTTS v2 clones the voice in the reference clip passed via --speaker_wav (here speaker.wav, a placeholder for a few seconds of clean speech) and uses --language_idx to select the output language.
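
For scripted pipelines, the same model is reachable through Coqui's Python API; a minimal sketch (speaker.wav is again a hypothetical reference clip):

from TTS.api import TTS

# Downloads XTTS v2 on first use
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

tts.tts_to_file(
    text="Your narration text here",
    speaker_wav="speaker.wav",  # hypothetical reference clip to clone
    language="en",
    file_path="voiceover.wav",
)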

7. Generate Background Music

cd ~/ai-video-studio/tools/audio
python generate_music_example.py
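
The contents of generate_music_example.py are not shown here, but a minimal MusicGen script using the AudioCraft API looks roughly like this (prompt, duration, and output name are placeholders):

from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained("facebook/musicgen-small")  # downloads the model on first run
model.set_generation_params(duration=15)  # seconds of audio to generate

wav = model.generate(["calm lo-fi background beat"])  # batch of one text prompt
audio_write("music", wav[0].cpu(), model.sample_rate, strategy="loudness")  # writes music.wav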

8. Combine Audio + Video

ffmpeg -i video.mp4 -i voice.wav -i music.wav \
  -filter_complex "[1:a]adelay=0|0[a1];[2:a]volume=0.35[a2];[a1][a2]amix=inputs=2[aout]" \
  -map 0:v -map "[aout]" -c:v libx264 -c:a aac -shortest final.mp4
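
In this filter graph, adelay=0|0 applies a zero-millisecond delay to both voice channels (raise it to start the narration later), volume=0.35 ducks the music under the voice, and amix merges the two streams into a single track; -shortest trims the output to the shortest input.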

Installation Directory Structure

~/ai-video-studio/
├── ComfyUI/
│   ├── .venv/                 # Python virtual environment
│   ├── custom_nodes/          # Plugins
│   │   ├── ComfyUI-Manager/
│   │   ├── ComfyUI-AnimateDiff-Evolved/
│   │   └── ComfyUI-VideoHelperSuite/
│   ├── models/                # AI models (downloaded via Manager)
│   └── launch.sh              # Startup script
│
└── tools/
    ├── Real-ESRGAN/           # Upscaling
    ├── RIFE/                  # Frame interpolation
    └── audio/                 # TTS & music generation
        └── generate_music_example.py

Phase Details

Phase 1: Prerequisites Check

  • Validates Python, Git, FFmpeg versions
  • Checks available disk space
  • Creates directory structure
  • Runtime: < 1 minute

Phase 2: ComfyUI Installation

  • Clones ComfyUI repository (~500 MB)
  • Creates isolated Python virtual environment
  • Installs PyTorch and dependencies (~2-5 GB)
  • Installs ComfyUI-Manager
  • Creates launch scripts
  • Runtime: 5-15 minutes (depending on network speed)

Phase 3: Video Tools

  • Installs AnimateDiff custom nodes
  • Installs Video Helper Suite
  • Sets up video generation capabilities
  • Runtime: 2-5 minutes

Phase 4: Enhancement Tools

  • Clones Real-ESRGAN repository
  • Installs upscaling dependencies
  • Clones RIFE for frame interpolation
  • Installs interpolation dependencies
  • Runtime: 5-10 minutes

Phase 5: Audio Tools

  • Installs Coqui TTS (~1-2 GB)
  • Installs MusicGen dependencies (~500 MB)
  • Creates example scripts
  • Runtime: 5-15 minutes

Total Installation Time: ~20-45 minutes (varies by network and system)

Troubleshooting

Import Errors in ComfyUI

If you see missing dependencies:

cd ~/ai-video-studio/ComfyUI
.venv/bin/pip install [missing-package]

GPU Not Detected

  • NVIDIA: Ensure CUDA drivers are installed
  • AMD: Check ROCm support
  • Apple Silicon: Should work out of the box with the MPS backend

Out of Memory Errors

  • Reduce resolution (e.g., 512×512 instead of 768×768)
  • Reduce batch size/frame count
  • Close other applications
  • Launch ComfyUI with --lowvram, or use CPU mode via --cpu (slower but works)

FFmpeg Audio Sync Issues

Add -async 1 flag:

ffmpeg -i video.mp4 -i audio.wav -async 1 -c:v copy -c:a aac output.mp4

Command Reference

View All Options

python3 setup_ai_video.py --help

Check Installation Status

# ComfyUI
ls ~/ai-video-studio/ComfyUI

# All tools
ls ~/ai-video-studio/tools

# Verify Python packages
~/ai-video-studio/ComfyUI/.venv/bin/pip list

Uninstall

rm -rf ~/ai-video-studio

Performance Tips

  1. First Run: Model downloads happen via ComfyUI-Manager (5-20 GB)
  2. Generation Speed:
    • CPU: 5-30 min per video
    • GPU: 30 sec - 5 min per video
  3. Start Small: Test with 512×512, 16 frames, then scale up
  4. Batch Processing: Use ComfyUI's queue system for multiple videos

Hardware Recommendations

Minimum

  • CPU: 4+ cores
  • RAM: 16 GB
  • Storage: 40 GB free
  • Generation time: 10-30 min/video

Recommended

  • GPU: NVIDIA RTX 3060+ (8 GB VRAM) or Apple M1/M2
  • RAM: 32 GB
  • Storage: 100 GB free (SSD)
  • Generation time: 1-5 min/video

Optimal

  • GPU: NVIDIA RTX 4090 (24 GB VRAM)
  • RAM: 64 GB
  • Storage: 500 GB SSD
  • Generation time: 30 sec - 2 min/video

Next Steps

  1. Read CLAUDE.md for detailed workflow tutorials
  2. Explore ComfyUI workflows in the Manager's workflow gallery
  3. Join communities:
    • r/StableDiffusion
    • ComfyUI Discord
    • Civitai.com (model sharing)

License

This setup script is provided as-is. Individual tools have their own licenses:

  • ComfyUI: GPL-3.0
  • Real-ESRGAN: BSD-3-Clause
  • RIFE: MIT
  • Coqui TTS: MPL-2.0
  • MusicGen: CC-BY-NC-4.0

Credits

Based on the comprehensive guide in CLAUDE.md. Tools by their respective authors and communities.
