This project is a prototype of an end-to-end pipeline that generates short, multi-scene videos from a single text prompt. It leverages a series of state-of-the-art, open-source AI models, each specialized for a different part of the creative process, all wrapped in a simple, user-friendly web interface.
The core workflow is designed to mimic a real production studio:
- The Director (LLM): Creates a script and storyboard.
- The Art Department (Text-to-Image): Generates concept art for each scene.
- The Animation Studio (Image-to-Video): Animates the concept art into video clips.
- Multi-Step Guided UI: A tabbed web interface (Gradio) guides the user through the video creation process.
- State-of-the-Art Models: Uses separate models for LLM, image generation, and video generation for best results.
- Modular Architecture: Clear folder/module separation so components can be swapped or upgraded easily.
- Automated Pipeline: Outputs from one model feed into the next with minimal manual overhead (see the sketch after this list).
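The hand-off between these stages can be pictured as a simple orchestration loop. The sketch below is illustrative only: the function and class names (`generate_storyboard`, `generate_scene_image`, `animate_image`, `Scene`) are hypothetical placeholders, not the actual API of this repo.

```python
# Conceptual sketch of the stage hand-off. All names here are hypothetical
# placeholders, not the actual API of this repo.
from dataclasses import dataclass


@dataclass
class Scene:
    description: str   # shot description written by the LLM
    camera_move: str   # recommended camera move / timing


def generate_storyboard(prompt: str) -> list[Scene]:
    """The Director: the LLM expands the prompt into scenes (stub)."""
    raise NotImplementedError


def generate_scene_image(description: str) -> str:
    """The Art Department: text-to-image returns a concept-art path (stub)."""
    raise NotImplementedError


def animate_image(image_path: str, camera_move: str) -> str:
    """The Animation Studio: image-to-video returns a clip path (stub)."""
    raise NotImplementedError


def create_video(prompt: str) -> list[str]:
    """Chain the three stages: storyboard -> images -> clips."""
    clips = []
    for scene in generate_storyboard(prompt):
        image_path = generate_scene_image(scene.description)
        clips.append(animate_image(image_path, scene.camera_move))
    return clips
```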
| Step 1: Storyboard Generation | Step 2: Image Generation |
|---|---|
| Step 3: Video Clip Generation | Final Output Example |
- Director LLM: `Meta-Llama-3.1-8B-Instruct`
- Image Generation: `black-forest-labs/FLUX.1-dev`
- Video Generation: `Wan-AI/Wan2.2-T2V-A14B`
- Web Framework: Gradio
- Core Libraries: `transformers`, `diffusers`, `torch` (see the loading sketch after this list)
- GPU Backend: AMD ROCm
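As a rough illustration of how the first two models plug into those libraries, the snippet below loads them with standard `transformers` and `diffusers` calls. It is a minimal sketch, assuming the models have already been downloaded to the local paths described in the setup steps; it is not the repo's exact loading code, and the Wan2.2 video model is driven through its own repo (see the download notes below), so it is omitted here.

```python
# Minimal loading sketch -- not the repo's exact code. Assumes the models
# live under ./models/ as shown in the setup section.
import torch
from diffusers import FluxPipeline
from transformers import pipeline

# Director LLM (Llama 3.1 8B Instruct) via the transformers text-generation pipeline.
director = pipeline(
    "text-generation",
    model="./models/llama-3.1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Art department (FLUX.1-dev) via diffusers. ROCm builds of PyTorch expose
# the AMD GPU through the same "cuda" device name.
artist = FluxPipeline.from_pretrained(
    "./models/flux1",
    torch_dtype=torch.bfloat16,
).to("cuda")
```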
These steps assume you're working on a Linux machine with an AMD GPU and have `git` installed.
```bash
git clone https://github.com/YourUsername/ai-video-creator.git
cd ai-video-creator
```
Create and activate a virtual environment, then install dependencies:
```bash
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
Tip: Use `conda` or `pipx` if you prefer those tools; adapt the steps accordingly.
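Whichever tool you use, a quick import check confirms the core libraries are available. This snippet only prints versions; the package names are taken from the tech stack listed above.

```python
# Sanity check: run inside the activated environment after installing requirements.
import diffusers
import gradio
import torch
import transformers

print("torch       :", torch.__version__)
print("transformers:", transformers.__version__)
print("diffusers   :", diffusers.__version__)
print("gradio      :", gradio.__version__)
```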
The models are not included in this repo. Run the provided scripts to download them:
```bash
# Download the Llama 3.1 LLM
python download_llm.py

# Download the FLUX.1 Image Model
python download_image_model.py

# Download the Wan2.2 Video Model
# Note: Wan2.2 may require cloning a separate repo first — see the Wan2.2 repo page.
```
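If you want to see roughly what such a download script does (or adapt it to a different cache location), a `huggingface_hub` snapshot download is the typical pattern. This is a hedged sketch, not the scripts' actual contents; both Llama 3.1 and FLUX.1-dev are gated on Hugging Face, so you must accept their licenses and authenticate (e.g. `huggingface-cli login`) first.

```python
# Rough equivalent of a download script -- not necessarily the exact contents
# of download_llm.py / download_image_model.py.
from huggingface_hub import snapshot_download

# Gated repos require an accepted license and a valid Hugging Face token.
snapshot_download(
    repo_id="meta-llama/Meta-Llama-3.1-8B-Instruct",
    local_dir="./models/llama-3.1",
)
snapshot_download(
    repo_id="black-forest-labs/FLUX.1-dev",
    local_dir="./models/flux1",
)
```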
Place the downloaded model folders inside `./models/` (e.g., `./models/llama-3.1/`, `./models/flux1/`, `./models/wan2.2/`). The repo includes `.gitkeep` placeholders demonstrating the expected structure.
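A quick pre-flight check that the layout matches the example paths above (the folder names are the ones used throughout this README):

```python
# Optional pre-flight check for the example model folders.
from pathlib import Path

expected = ["llama-3.1", "flux1", "wan2.2"]
missing = [name for name in expected if not (Path("models") / name).is_dir()]
print("Missing model folders:", ", ".join(missing) if missing else "none")
```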
Wan2.2 has its own dependencies. Install them inside the same virtual environment (or a dedicated one):
```bash
cd Wan2.2
pip install -r requirements.txt
cd ..
```
Note: You may need system-level packages (FFmpeg, libsndfile, build tools). Follow Wan2.2 repo instructions if you encounter errors.
If you store models outside `./models`, create a `.env` or update `config/*.yaml` with the correct paths. Example `.env`:
```ini
MODEL_DIR=./models
LLM_PATH=./models/llama-3.1
IMAGE_MODEL_PATH=./models/flux1
VIDEO_MODEL_PATH=./models/wan2.2
```
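How the app consumes these values depends on `app.py` and the config files; the snippet below is only an assumed pattern using `python-dotenv`, with fallbacks matching the defaults above.

```python
# Hypothetical example of reading the .env values -- the actual app may read
# config/*.yaml instead. Requires the python-dotenv package.
import os

from dotenv import load_dotenv

load_dotenv()  # loads .env from the current working directory

LLM_PATH = os.getenv("LLM_PATH", "./models/llama-3.1")
IMAGE_MODEL_PATH = os.getenv("IMAGE_MODEL_PATH", "./models/flux1")
VIDEO_MODEL_PATH = os.getenv("VIDEO_MODEL_PATH", "./models/wan2.2")
```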
Start the app from the project root:
```bash
python app.py
```
The app will load the models (this may take several minutes). Gradio will print a local URL such as `http://localhost:7860` and, optionally, a public `*.gradio.live` sharing URL.
Important: Model loading is resource-intensive. On ROCm setups, ensure PyTorch/ROCm and drivers are installed and tested first.
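A quick way to confirm the ROCm build of PyTorch can see the GPU before launching:

```python
# Verify that the ROCm build of PyTorch sees the AMD GPU.
import torch

print("PyTorch version :", torch.__version__)
print("ROCm/HIP version:", getattr(torch.version, "hip", None))  # None on non-ROCm builds
print("GPU available   :", torch.cuda.is_available())  # ROCm GPUs use the CUDA API
if torch.cuda.is_available():
    print("Device          :", torch.cuda.get_device_name(0))
```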
- Welcome Tab — Overview + quick-start instructions.
- Step 1: Storyboard Tab — Enter a video idea and click Generate Storyboard. The LLM outputs scenes, shot descriptions, and recommended camera moves/timings.
- Step 2: Images Tab — Generate and review concept art for each scene. Regenerate or refine prompts as needed.
- Step 3: Videos Tab — Render short video clips from the images. Preview, download, and iterate. (A minimal wiring sketch of this tabbed layout follows below.)
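The tab structure maps directly onto Gradio's `Blocks` and `Tab` components. The sketch below shows only the general wiring; the handler functions are hypothetical stand-ins, not the callbacks actually defined in `app.py`.

```python
# Structural sketch of the tabbed UI. The handlers are hypothetical placeholders,
# not the actual callbacks in app.py.
import gradio as gr


def make_storyboard(idea: str) -> str:           # placeholder handler
    return f"Storyboard for: {idea}"


def make_images(storyboard_text: str) -> list:   # placeholder handler
    return []


def make_video() -> None:                        # placeholder handler
    # In the real app, the generated scene images feed this step.
    return None


with gr.Blocks(title="AI Video Creator") as demo:
    with gr.Tab("Welcome"):
        gr.Markdown("Overview and quick-start instructions go here.")

    with gr.Tab("Step 1: Storyboard"):
        idea = gr.Textbox(label="Video idea")
        storyboard = gr.Textbox(label="Storyboard", lines=10)
        gr.Button("Generate Storyboard").click(make_storyboard, inputs=idea, outputs=storyboard)

    with gr.Tab("Step 2: Images"):
        gallery = gr.Gallery(label="Concept art")
        gr.Button("Generate Images").click(make_images, inputs=storyboard, outputs=gallery)

    with gr.Tab("Step 3: Videos"):
        clip = gr.Video(label="Rendered clip")
        gr.Button("Generate Videos").click(make_video, inputs=None, outputs=clip)

demo.launch()  # prints the local URL and, with share=True, a *.gradio.live link
```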
Outputs are saved to `./outputs/` with a timestamped folder for each run.
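The exact naming scheme is defined by the app, but a timestamped run folder of roughly this form is what to expect (illustrative only):

```python
# Illustrative only: how a timestamped per-run output folder can be created.
from datetime import datetime
from pathlib import Path

run_dir = Path("outputs") / datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
run_dir.mkdir(parents=True, exist_ok=True)
print("Run outputs will be collected in", run_dir)
```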
If you'd like help setting this up for ROCm, creating `.env`/`config.sample.yaml`, or automating downloads with scripts, open an issue or reach out to ni3.singh.r@gmail.com (replace with your contact).
Generated with ❤️