Stars
[ICML 2026] The official code of Diversity-Preserved Distribution Matching Distillation for Fast Visual Synthesis
[CVPR 2026] UFVideo: Towards Unified Fine-Grained Video Cooperative Understanding with Large Language Models
Codebase of 'From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model'
[ICML 2025] I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models
OmniGen2: Exploration to Advanced Multimodal Generation. https://arxiv.org/abs/2506.18871
[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL
[2025-TMLR] A Survey on the Honesty of Large Language Models
[TOG 2024] StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter
ToRA is a series of Tool-integrated Reasoning LLM Agents designed to solve challenging mathematical reasoning problems by interacting with tools [ICLR'24].
[ECCV2022] Learning Quality-aware Dynamic Memory for Video Object Segmentation