Stars
Native and Compact Structured Latents for 3D Generation
Official repo for paper "Structured 3D Latents for Scalable and Versatile 3D Generation" (CVPR'25 Spotlight).
Unified Multimodal Model for image generation/editing/understanding
Official implementation of the paper "Transfer between Modalities with MetaQueries"
Native Multimodal Models are World Learners
[NeurIPS 2025 Oral] Representation Entanglement for Generation: Training Diffusion Transformers Is Much Easier Than You Think
Official PyTorch Implementation of "Latent Diffusion Model Without Variational Autoencoder".
[NeurIPS'25 Spotlight] Boosting Generative Image Modeling via Joint Image-Feature Synthesis
A part-based 3D generation framework & the largest and most comprehensively annotated 3D part dataset.
Part-X-MLLM: Part-aware 3D Multimodal Large Language Model
Official code for VIST3A: Text-to-3D by Stitching a Multi-view Reconstruction Network to a Video Generator
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
[NeurIPS 2025] Improving Video Generation with Human Feedback
[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL
Wan: Open and Advanced Large-Scale Video Generative Models
Enjoy the magic of Diffusion models!
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference
📹 A more flexible framework that can generate videos at any resolution and create videos from images.
[NeurIPS 2025] SyncHuman: Synchronizing 2D and 3D Generative Models for Single-view Human Reconstruction.
High-Resolution 3D Assets Generation with Large Scale Hunyuan3D Diffusion Models.
A PyTorch implementation of NeRF (Neural Radiance Fields) that reproduces the results.