```bash
pip install opencv-python torch torchvision numpy tqdm
pip install sk-video
```
Automated setup script for installing a complete local AI video creation pipeline on macOS/Windows/Linux. No cloud services required.
The script:
- Validates Python 3.10+, Git, and FFmpeg
- Checks disk space (20-40 GB recommended)
- Creates the installation directory structure

It installs:
- ComfyUI (visual node editor for Stable Diffusion)
- Python virtual environment with all dependencies
- ComfyUI-Manager (plugin management)
- Launch scripts for easy startup
- AnimateDiff (text/image to animation)
- Video Helper Suite (video I/O nodes)
- Stable Video Diffusion support (image to video)
- Real-ESRGAN (AI upscaling to 1080p/4K)
- RIFE (frame interpolation for smooth 60 fps)
- Coqui XTTS v2 (voice synthesis + cloning)
- MusicGen/AudioCraft (AI music generation)
- Example scripts
Run the installer with:

```bash
# Full installation
python3 setup_ai_video.py --all

# Just prerequisites and ComfyUI
python3 setup_ai_video.py --phase 1 2

# Add video tools
python3 setup_ai_video.py --phase 3

# Add upscaling and audio
python3 setup_ai_video.py --phase 4 5

# Install to a custom directory
python3 setup_ai_video.py --all --dir ~/my-video-studio
```

Before running the script, ensure you have:
- Python 3.10 or newer: check with `python3 --version`
- Git: check with `git --version`
- FFmpeg
  - macOS: `brew install ffmpeg`
  - Linux: `sudo apt install ffmpeg`
  - Windows: download from ffmpeg.org
- 20-40 GB free disk space
- Optional but recommended:
  - Apple Silicon Mac, or
  - NVIDIA GPU with 8+ GB VRAM, or
  - Decent CPU (slower but works)
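If you want to confirm these prerequisites in one step before launching the installer, a minimal preflight check could look like this (a hypothetical helper, not part of the setup script):

```python
# preflight.py - hypothetical prerequisite check, mirroring what the installer validates
import shutil
import sys

assert sys.version_info >= (3, 10), "Python 3.10+ required"
for tool in ("git", "ffmpeg"):
    assert shutil.which(tool), f"{tool} not found on PATH"

free_gb = shutil.disk_usage(".").free / 1e9
print(f"Free disk space: {free_gb:.0f} GB ({'OK' if free_gb >= 20 else 'below the 20 GB minimum'})")
```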
Once installation finishes, launch ComfyUI:

```bash
cd ~/ai-video-studio/ComfyUI
./launch.sh   # or launch.bat on Windows
```

Then open: http://127.0.0.1:8188
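The server can take a while to start, especially on first launch. If you script against it, you can poll until it responds (a small convenience sketch; the URL assumes the default port):

```python
# wait_for_comfyui.py - poll until the ComfyUI web server answers
import time
import urllib.request

for _ in range(60):
    try:
        urllib.request.urlopen("http://127.0.0.1:8188", timeout=1)
        print("ComfyUI is up")
        break
    except OSError:
        time.sleep(1)
else:
    print("ComfyUI did not come up within 60 seconds")
```

Once the server responds, continue in the browser: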
- Click "Manager" button in ComfyUI web interface
- Go to "Install Models" tab
- Download:
- SDXL base model (or Flux)
- AnimateDiff motion modules
- (Optional) Stable Video Diffusion model
- In ComfyUI Manager → Workflows → Load AnimateDiff example
- Set your text prompt
- Configure: 16-24 frames, 12-24 fps, 768×432 resolution
- Click "Queue Prompt" to generate
Upscale a video with Real-ESRGAN:

```bash
cd ~/ai-video-studio/tools/Real-ESRGAN
python inference_realesrgan.py -n RealESRGAN_x4plus -i input.mp4 -o output.mp4
```

Interpolate to smooth 60 fps with RIFE:

```bash
cd ~/ai-video-studio/tools/RIFE
python inference_video.py --exp=2 --video input.mp4 --output smooth_60fps.mp4
```

Generate a voiceover with Coqui XTTS v2:

```bash
tts --model_name tts_models/multilingual/multi-dataset/xtts_v2 \
    --text "Your narration text here" \
    --out_path voiceover.wav
```

Generate background music:

```bash
cd ~/ai-video-studio/tools/audio
python generate_music_example.py
```

Mux video, voiceover, and music into the final cut:

```bash
ffmpeg -i video.mp4 -i voice.wav -i music.wav \
  -filter_complex "[1:a]adelay=0|0[a1];[2:a]volume=0.35[a2];[a1][a2]amix=inputs=2[aout]" \
  -map 0:v -map "[aout]" -c:v libx264 -c:a aac -shortest final.mp4
```

The filtergraph delays the voice track by 0 ms on both channels (`adelay=0|0`, adjust to taste), lowers the music to 35% volume, and mixes the two into a single audio stream that is mapped alongside the original video.
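Phase 5 creates `generate_music_example.py`; its exact contents aren't reproduced here, but a minimal MusicGen script using the AudioCraft API would look roughly like this (model choice and prompt are illustrative):

```python
# Minimal MusicGen sketch (assumes the audiocraft package installed in phase 5)
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained("facebook/musicgen-small")  # small model, CPU-friendly
model.set_generation_params(duration=30)                    # seconds of audio
wav = model.generate(["calm ambient background music"])     # one clip per prompt
audio_write("music", wav[0].cpu(), model.sample_rate, strategy="loudness")
```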
The installer creates the following layout:

```
~/ai-video-studio/
├── ComfyUI/
│   ├── .venv/                # Python virtual environment
│   ├── custom_nodes/         # Plugins
│   │   ├── ComfyUI-Manager/
│   │   ├── ComfyUI-AnimateDiff-Evolved/
│   │   └── ComfyUI-VideoHelperSuite/
│   ├── models/               # AI models (downloaded via Manager)
│   └── launch.sh             # Startup script
│
└── tools/
    ├── Real-ESRGAN/          # Upscaling
    ├── RIFE/                 # Frame interpolation
    └── audio/                # TTS & music generation
        └── generate_music_example.py
```
Phase 1: Prerequisites
- Validates Python, Git, and FFmpeg versions
- Checks available disk space
- Creates the directory structure
- Runtime: < 1 minute

Phase 2: ComfyUI
- Clones the ComfyUI repository (~500 MB)
- Creates an isolated Python virtual environment
- Installs PyTorch and dependencies (~2-5 GB)
- Installs ComfyUI-Manager
- Creates launch scripts
- Runtime: 5-15 minutes (depending on network speed)

Phase 3: Video generation
- Installs AnimateDiff custom nodes
- Installs Video Helper Suite
- Sets up video generation capabilities
- Runtime: 2-5 minutes

Phase 4: Upscaling and interpolation
- Clones the Real-ESRGAN repository
- Installs upscaling dependencies
- Clones RIFE for frame interpolation
- Installs interpolation dependencies
- Runtime: 5-10 minutes

Phase 5: Audio
- Installs Coqui TTS (~1-2 GB)
- Installs MusicGen dependencies (~500 MB)
- Creates example scripts
- Runtime: 5-15 minutes

Total installation time: ~20-45 minutes (varies by network and system)
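The setup script itself isn't reproduced in this README, but the `--phase`/`--all` interface above maps naturally onto a dispatch table keyed by phase number. A hypothetical skeleton (function names and structure are illustrative, not the actual source):

```python
# Hypothetical skeleton of the phase dispatch in setup_ai_video.py
import argparse

def check_prereqs():       print("phase 1: validating prerequisites...")
def install_comfyui():     print("phase 2: installing ComfyUI...")
def install_video_nodes(): print("phase 3: installing video nodes...")
def install_enhancement(): print("phase 4: installing upscaling/interpolation...")
def install_audio():       print("phase 5: installing audio tools...")

PHASES = {1: check_prereqs, 2: install_comfyui, 3: install_video_nodes,
          4: install_enhancement, 5: install_audio}

parser = argparse.ArgumentParser()
parser.add_argument("--phase", type=int, nargs="+", choices=sorted(PHASES))
parser.add_argument("--all", action="store_true")
args = parser.parse_args()

for n in (sorted(PHASES) if args.all else args.phase or []):
    PHASES[n]()
```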
If you see missing dependencies:

```bash
cd ~/ai-video-studio/ComfyUI
.venv/bin/pip install [missing-package]
```

If your GPU is not being detected or used:
- NVIDIA: Ensure CUDA drivers are installed
- AMD: Check ROCm support
- Apple Silicon: Should work out of the box with the MPS backend
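To see which accelerator PyTorch actually detects, run a quick check with the interpreter inside the ComfyUI virtual environment (path assumes the default install directory):

```python
# gpu_check.py - report which accelerators PyTorch can see
import torch

print("CUDA available:", torch.cuda.is_available())
mps = getattr(torch.backends, "mps", None)
print("MPS available:", bool(mps and mps.is_available()))
```

Run it as `~/ai-video-studio/ComfyUI/.venv/bin/python gpu_check.py`.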
If you run out of memory during generation:
- Reduce resolution (e.g., 512×512 instead of 768×768)
- Reduce batch size/frame count
- Close other applications
- Use CPU mode (slower but works; ComfyUI can be launched with the `--cpu` flag)
If audio drifts out of sync with video, add the `-async 1` flag:

```bash
ffmpeg -i video.mp4 -i audio.wav -async 1 -c:v copy -c:a aac output.mp4
```

For the full list of installer options:

```bash
python3 setup_ai_video.py --help
```

To verify the installation:

```bash
# ComfyUI
ls ~/ai-video-studio/ComfyUI

# All tools
ls ~/ai-video-studio/tools

# Verify Python packages
~/ai-video-studio/ComfyUI/.venv/bin/pip list
```

To uninstall everything:

```bash
rm -rf ~/ai-video-studio
```

- First Run: Model downloads happen via ComfyUI-Manager (5-20 GB)
- Generation Speed:
  - CPU: 5-30 min per video
  - GPU: 30 sec - 5 min per video
- Start Small: Test with 512×512, 16 frames, then scale up
- Batch Processing: Use ComfyUI's queue system for multiple videos (see the sketch below)
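ComfyUI's queue can also be driven programmatically through its local HTTP API. A sketch, assuming a workflow exported in API format (the "Save (API Format)" option, which may need dev mode enabled in the settings) and saved as `workflow_api.json`:

```python
# queue_workflow.py - submit an API-format workflow to a running ComfyUI instance
import json
import urllib.request

with open("workflow_api.json") as f:   # exported via "Save (API Format)"
    graph = json.load(f)

req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": graph}).encode(),
    headers={"Content-Type": "application/json"},
)
print(urllib.request.urlopen(req).read().decode())
```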
Minimum:
- CPU: 4+ cores
- RAM: 16 GB
- Storage: 40 GB free
- Generation time: 10-30 min/video

Recommended:
- GPU: NVIDIA RTX 3060+ (8 GB VRAM) or Apple M1/M2
- RAM: 32 GB
- Storage: 100 GB free (SSD)
- Generation time: 1-5 min/video

High-end:
- GPU: NVIDIA RTX 4090 (24 GB VRAM)
- RAM: 64 GB
- Storage: 500 GB SSD
- Generation time: 30 sec - 2 min/video
- Read CLAUDE.md for detailed workflow tutorials
- Explore ComfyUI workflows in the Manager's workflow gallery
- Join communities:
- r/StableDiffusion
- ComfyUI Discord
- Civitai.com (model sharing)
This setup script is provided as-is. Individual tools have their own licenses:
- ComfyUI: GPL-3.0
- Real-ESRGAN: BSD-3-Clause
- RIFE: MIT
- Coqui TTS: MPL-2.0
- MusicGen: CC-BY-NC-4.0
Based on the comprehensive guide in CLAUDE.md. Tools by their respective authors and communities.