Stars
[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL
PyTorch code and models for VJEPA2 self-supervised learning from video.
Reference PyTorch implementation and models for DINOv3
A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training
Versatile Diffusion: Text, Images and Variations All in One Diffusion Model, arXiv 2022 / ICCV 2023
Let's make video diffusion practical!
Optimus: the first large-scale pre-trained VAE language model
MAGI-1: Autoregressive Video Generation at Scale
[SIGGRAPH Asia 2025] DreamO: A Unified Framework for Image Customization
Phantom: Subject-Consistent Video Generation via Cross-Modal Alignment
SynCD: Generating Multi-Image Synthetic Data for Text-to-Image Customization (ICCV 2025)
[NeurIPS 2024] Official code for PuLID: Pure and Lightning ID Customization via Contrastive Alignment
Rembg is a tool to remove image backgrounds.
The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with an image prompt.
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
The official repo of Qwen-VL (通义千问-VL), the chat and pretrained large vision-language model proposed by Alibaba Cloud.
Official inference repo for FLUX.1 models
HunyuanVideo: A Systematic Framework For Large Video Generation Model
[ICCV 2025 Highlight] OminiControl: Minimal and Universal Control for Diffusion Transformer
The official PyTorch implementation for Improving Long-Text Alignment for Text-to-Image Diffusion Models (LongAlign)
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.