Stars
[ECCV 2024 Oral] LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation.
MoBA: Mixture of Block Attention for Long-Context LLMs
DIAMOND (DIffusion As a Model Of eNvironment Dreams) is a reinforcement learning agent trained in a diffusion world model. NeurIPS 2024 Spotlight.
[ICCV 2023] Make-It-3D: High-Fidelity 3D Creation from A Single Image with Diffusion Prior
[CVPR'24 Highlight & Best Demo Award] Gaussian Splatting SLAM
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
[ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.
Unofficial PyTorch implementation of Palette: Image-to-Image Diffusion Models
Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model
Implementation of "EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer" (ICCV 2025)
Implementing DeepSeek R1's GRPO algorithm from scratch
[CVPR 2024] This is the official source for our paper "SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis"
[ACM MM 2025] FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis
Official implementation of "MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling"
[NeurIPS 2025] MMaDA - Open-Sourced Multimodal Large Diffusion Language Models
A fork to add multimodal model training to open-r1
[ICLR'25 Oral] Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
[CVPR 2025] Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Video Diffusion Transformer
[ICCV 2025] 🔥🔥 UNO: A Universal Customization Method for Both Single and Multi-Subject Conditioning
Voyager is an interactive RGBD video generation model conditioned on camera input, and supports real-time 3D reconstruction.
[ICCV'25] DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion
[ICLR 2023 Oral] Zero-Shot Image Restoration Using Denoising Diffusion Null-Space Model
Official code for "Style Aligned Image Generation via Shared Attention"
[AAAI 2025]👔IMAGDressing👔: Interactive Modular Apparel Generation for Virtual Dressing. It enables customizable human image generation with flexible garment, pose, and scene control, ensuring high …
VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo
[CVPR 2025 Oral] Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models
PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from Meta AI
This codebase demonstrates how to synthesize realistic 3D character animations given an arbitrary speech signal and a static character mesh.
[ICCV'23] Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis
[CVPR'25] Tora: Trajectory-oriented Diffusion Transformer for Video Generation