Stars
ContextGen: Contextual Layout Anchoring for Identity-Consistent Multi-Instance Generation
[CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
[NeurIPS 2025] Seg2Any: Open-set Segmentation-Mask-to-Image Generation with Precise Shape and Semantic Control
Official implementation of DepthLM
(NeurIPS 2025 D&B Track) OverLayBench: A Benchmark for Layout-to-Image Generation with Dense Overlaps
Official repo of the paper "Reconstruction Alignment Improves Unified Multimodal Models". Unlocking the Massive Zero-shot Potential in Unified Multimodal Models through Self-supervised Learning.
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
PyTorch implementation for the paper titled "SimpleAR: Pushing the Frontier of Autoregressive Visual Generation"
High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning
Official repository of "GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing"
[ACMMM 2025] "Set You Straight: Auto-Steering Denoising Trajectories to Sidestep Unwanted Concepts" (Official Implementation)
A Collection of Papers and Codes for CVPR2025/ICCV2025/CVPR2024/ECCV2024 AIGC
Calligrapher: Freestyle Text Image Customization
Unified layout planning and image generation, ICCV2025
This repository open-sources CreatiPoster, an AI-driven graphic design generation system for multi-layer and editable compositions with strong visual appeal.
ControlThinker: Unveiling Latent Semantics for Controllable Image Generation through Visual Reasoning
Implementation of "EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer" (ICCV2025)
A Unified Multi-Conditional Diffusion Transformer for Creative Graphic Design
[NeurIPS 2025] IEAP: Image Editing As Programs with Diffusion Models
Layout Conditioned Image Generation, NeurIPS2024
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
Official inference repo for FLUX.1 models
[CVPRW 2022] MANIQA: Multi-dimension Attention Network for No-Reference Image Quality Assessment
Code for "Diffusion Model Alignment Using Direct Preference Optimization"
Train transformer language models with reinforcement learning.