Stars
✨✨[NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
Official implementation of Continuous 3D Perception Model with Persistent State
[ICLR 2026] The official code of "Routing Matters in MoE: Scaling Diffusion Transformers with Explicit Routing Guidance"
Official Code Repo for UniVA: Universal Video Agents
Accepted as [NeurIPS 2024] Spotlight Presentation Paper
(ICCV 2025) Official repository of the paper "ViSpeak: Visual Instruction Feedback in Streaming Videos"
The official implementation of the paper "Generation Enhances Understanding in Unified Multimodal Models via Multi-Representation Generation"
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
Automatic Metric for Evaluating Generated Videos
UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture
Official implementation of VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning (COLM 2024)
MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text.
[ICCV 2025 Workshop Outstanding Paper Award] VChain: Chain-of-Visual-Thought for Reasoning in Video Generation
Video-as-Answer: Predict and Generate Next Video Event with Joint-GRPO
A high-throughput and memory-efficient inference and serving engine for LLMs
[ICLR 2026 Oral] DiffusionNFT: Online Diffusion Reinforcement with Forward Process
[NeurIPS 2025] Improving Video Generation with Human Feedback
BranchGRPO: Stable and Efficient GRPO with Structured Branching in Diffusion Models
EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling
A curated list of papers on reinforcement learning for video generation
Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give it a star 🌟 if you find it useful.
Enjoy the magic of Diffusion models!
An official implementation of DanceGRPO: Unleashing GRPO on Visual Generation
[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL
[NeurIPS 2025] Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance
This is a repository dedicated to high-quality figures from EMNLP 2025 long papers.
This is a repository dedicated to high-quality figures from ACL 2025 long papers.