ARON — A unified AI pipeline for text, image, video, and audio generation. Combine multimodal models in one platform for seamless AI-powered content creation.

🎬 AI Video Creator Pipeline

This project is a prototype of an end-to-end pipeline that generates short, multi-scene videos from a single text prompt. It leverages a series of state-of-the-art, open-source AI models, each specialized for a different part of the creative process, all wrapped in a simple, user-friendly web interface.

The core workflow is designed to mimic a real production studio (a minimal sketch of the hand-off follows the list):

  1. The Director (LLM): Creates a script and storyboard.
  2. The Art Department (Text-to-Image): Generates concept art for each scene.
  3. The Animation Studio (Image-to-Video): Animates the concept art into video clips.
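
A minimal sketch of this hand-off, with illustrative names only (the real orchestration lives in app.py, and the models behind each stub are listed under Technology Stack):

# Sketch only: stub signatures showing how each stage feeds the next.
def generate_storyboard(prompt: str) -> list[str]:
    """Stage 1 (Director): the LLM expands one idea into per-scene shot descriptions."""
    ...

def generate_image(scene: str):
    """Stage 2 (Art Department): text-to-image concept art for one scene."""
    ...

def animate_image(image) -> str:
    """Stage 3 (Animation Studio): image-to-video; returns a clip path."""
    ...

def create_video(prompt: str) -> list[str]:
    scenes = generate_storyboard(prompt)
    images = [generate_image(scene) for scene in scenes]
    return [animate_image(image) for image in images]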

✨ Features

  • Multi-Step Guided UI: A tabbed web interface (Gradio) guides the user through the video creation process; see the sketch after this list.
  • State-of-the-Art Models: A dedicated open-source model for each stage, specialized for scripting (LLM), image generation, and video generation respectively.
  • Modular Architecture: Clear folder/module separation so components can be swapped or upgraded easily.
  • Automated Pipeline: Outputs from one model feed into the next with minimal manual overhead.
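
As a sketch of how such a tabbed Gradio UI is wired (make_storyboard here is a hypothetical placeholder, not the repo's function):

import gradio as gr

def make_storyboard(idea: str) -> str:
    # Placeholder: the real app calls the Llama 3.1 LLM here.
    return f"Scene 1: {idea}\nScene 2: ..."

with gr.Blocks(title="AI Video Creator") as demo:
    with gr.Tab("Welcome"):
        gr.Markdown("Overview and quick-start instructions.")
    with gr.Tab("Step 1: Storyboard"):
        idea = gr.Textbox(label="Video idea")
        board = gr.Textbox(label="Storyboard", lines=8)
        gr.Button("Generate Storyboard").click(make_storyboard, inputs=idea, outputs=board)
    # The Images and Videos tabs follow the same pattern.

demo.launch()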

🖼️ Screenshots

  • Step 1: Storyboard Generation (Storyboard UI)
  • Step 2: Image Generation (Image Generator UI)
  • Step 3: Video Clip Generation (Video Generator UI)
  • Final Output Example (demo video)

🛠️ Technology Stack

  • Director LLM: Meta-Llama-3.1-8B-Instruct
  • Image Generation: black-forest-labs/FLUX.1-dev
  • Video Generation: Wan-AI/Wan2.2-T2V-A14B
  • Web Framework: Gradio
  • Core Libraries: transformers, diffusers, torch
  • GPU Backend: AMD ROCm
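
For orientation, the LLM and image model can be loaded with the listed libraries roughly like this (a sketch, not the exact code in app.py; Wan2.2 is driven through its own repo, see step 4 of the Quick Start). Both models are gated on Hugging Face, so accept their licenses and run huggingface-cli login first:

import torch
from transformers import pipeline
from diffusers import FluxPipeline

llm = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
image_pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")  # ROCm builds of PyTorch expose AMD GPUs through the "cuda" device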

🚀 Quick Start

These steps assume you're working on a Linux machine with an AMD GPU and have git installed.

1. Clone the repository

git clone https://github.com/NI3singh/ARON.git
cd ARON

2. Set up the Python environment

Create and activate a virtual environment, then install dependencies:

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Tip: If you prefer conda or another environment manager, adapt the steps accordingly.

3. Download the AI models

The models are not included in this repo. Run the provided scripts to download them:

# Download the Llama 3.1 LLM
python download_llm.py

# Download the FLUX.1 Image Model
python download_image_model.py

# Download the Wan2.2 Video Model
# Note: Wan2.2 may require cloning a separate repo first — see the Wan2.2 repo page.

Place the downloaded model folders inside ./models/ (e.g., ./models/llama-3.1/, ./models/flux1/, ./models/wan2.2/). The repo includes .gitkeep placeholders demonstrating the expected structure.
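
For reference, a hedged equivalent of what a download script might do with huggingface_hub (the repo's actual scripts may differ; gated models require accepting the license and running huggingface-cli login first):

from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="meta-llama/Meta-Llama-3.1-8B-Instruct",
    local_dir="./models/llama-3.1",
)
snapshot_download(
    repo_id="black-forest-labs/FLUX.1-dev",
    local_dir="./models/flux1",
)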

4. Install Wan2.2-specific dependencies

Wan2.2 has its own dependencies. Install them inside the same virtual environment (or a dedicated one):

cd Wan2.2
pip install -r requirements.txt
cd ..

Note: You may need system-level packages (FFmpeg, libsndfile, build tools). Follow Wan2.2 repo instructions if you encounter errors.

5. Configure model paths (optional)

If you store models outside ./models, create a .env or update config/*.yaml with the correct paths. Example .env:

MODEL_DIR=./models
LLM_PATH=./models/llama-3.1
IMAGE_MODEL_PATH=./models/flux1
VIDEO_MODEL_PATH=./models/wan2.2
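
A minimal sketch of how such a .env could be read, assuming python-dotenv (app.py may read config/*.yaml instead):

import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the working directory
llm_path = os.getenv("LLM_PATH", "./models/llama-3.1")
image_model_path = os.getenv("IMAGE_MODEL_PATH", "./models/flux1")
video_model_path = os.getenv("VIDEO_MODEL_PATH", "./models/wan2.2")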

💻 Usage

Start the app from the project root:

python app.py

The app will load the models (this may take several minutes). Gradio will print a local URL such as http://localhost:7860 and, optionally, a public *.gradio.live sharing URL.

Important: Model loading is resource-intensive. On ROCm setups, ensure PyTorch/ROCm and drivers are installed and tested first.
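
A quick sanity check that PyTorch sees the AMD GPU before launching the app:

import torch

print("torch:", torch.__version__)
print("HIP:", torch.version.hip)                    # non-None on ROCm builds
print("GPU available:", torch.cuda.is_available())  # ROCm GPUs report through the cuda API
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))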


Workflow (UI)

  1. Welcome Tab — Overview + quick-start instructions.
  2. Step 1: Storyboard Tab — Enter a video idea and click Generate Storyboard. The LLM outputs scenes, shot descriptions, and recommended camera moves/timings.
  3. Step 2: Images Tab — Generate and review concept art for each scene. Regenerate or refine prompts as needed.
  4. Step 3: Videos Tab — Render short video clips from the images. Preview, download, and iterate.

Outputs are saved to ./outputs/ with a timestamped folder for each run.
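
A sketch of how that per-run layout might be created (subfolder names are assumptions, not the repo's exact layout):

from datetime import datetime
from pathlib import Path

run_dir = Path("outputs") / datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
run_dir.mkdir(parents=True, exist_ok=True)
(run_dir / "images").mkdir(exist_ok=True)
(run_dir / "clips").mkdir(exist_ok=True)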


Contact

If you'd like help setting this up for ROCm, creating .env/config.sample.yaml, or automating downloads with scripts, open an issue or reach out to ni3.singh.r@gmail.com.


Generated with ❤️
