WoW (World-Omniscient World Model) is a generative world model trained on 2 million robotic interaction trajectories, designed to imagine, reason, and act in the physical world. Unlike passive vide…

Jupyter Notebook · 150 stars · 11 forks · Updated Jan 4, 2026

Phantom-video / Phantom

Phantom: Subject-Consistent Video Generation via Cross-Modal Alignment

Python · 1,497 stars · 97 forks · Updated Sep 11, 2025

buoyancy99 / large-video-planner

Python · 222 stars · 16 forks · Updated Jan 31, 2026

bghira / SimpleTuner

A general fine-tuning kit geared toward image/video/audio diffusion models.

Python · 2,808 stars · 277 forks · Updated Apr 2, 2026

tdrussell / diffusion-pipe

A pipeline parallel training script for diffusion models.

Python · 1,907 stars · 266 forks · Updated Feb 8, 2026

modelscope / DiffSynth-Studio

Enjoy the magic of Diffusion models!

Python · 12,164 stars · 1,183 forks · Updated Apr 2, 2026

Wan-Video / Wan2.2

Wan: Open and Advanced Large-Scale Video Generative Models

Python · 15,030 stars · 1,821 forks · Updated Mar 17, 2026

microsoft / VITRA

[ICRA 2026] VITRA: Scalable Vision-Language-Action Model Pretraining for Robotic Manipulation with Real-Life Human Activity Videos

Python · 343 stars · 19 forks · Updated Feb 24, 2026

zai-org / CogVideo

Text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

Python · 12,584 stars · 1,268 forks · Updated Nov 4, 2025

Yaofang-Liu / Pusa-VidGen

Pusa: Thousands Timesteps Video Diffusion Model

Python · 677 stars · 47 forks · Updated Feb 13, 2026

XavierXiao / Dreambooth-Stable-Diffusion

Implementation of Dreambooth (https://arxiv.org/abs/2208.12242) with Stable Diffusion

Jupyter Notebook · 7,743 stars · 803 forks · Updated Dec 8, 2022

google / dreambooth

1,023 stars · 97 forks · Updated Mar 6, 2023

lucidrains / transfusion-pytorch

PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI

Python · 1,331 stars · 71 forks · Updated Jan 27, 2026

DexForce / EmbodiChain

An end-to-end, GPU-accelerated, and modular platform for building generalized Embodied Intelligence.

Python · 134 stars · 9 forks · Updated Apr 3, 2026

thu-ml / Motus

Official code of Motus: A Unified Latent Action World Model

Python · 925 stars · 42 forks · Updated Jan 5, 2026

Tencent-Hunyuan / HunyuanWorld-1.0

Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels with Hunyuan3D World Model

Python · 2,741 stars · 241 forks · Updated Dec 17, 2025