Stars
[CVPR 2025 Oral] Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models
[NeurIPS 2025 Oral] Representation Entanglement for Generation: Training Diffusion Transformers Is Much Easier Than You Think
[ICML 2024 Best Paper] Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution (https://arxiv.org/abs/2310.16834)
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
[NeurIPS 2024] Simple and Effective Masked Diffusion Language Models
A general framework for inference-time scaling and steering of diffusion models with arbitrary rewards.
[ICLR 2025 Oral] Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
Official PyTorch Implementation of "SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers"
Official Implementation for Diffusion Models Without Classifier-free Guidance
Code for Fast Training of Diffusion Models with Masked Transformers
Qwen3 is a series of large language models developed by the Qwen team at Alibaba Cloud.
Controlled text generation with programmable constraints
Official PyTorch implementation for the ICLR 2025 paper "Scaling up Masked Diffusion Models on Text"
A framework for few-shot evaluation of language models.
[ICCV 2023] Masked Diffusion Transformer, a state-of-the-art model for image synthesis
[NeurIPS 2025] Official repository for "Energy Matching: Unifying Flow Matching and Energy-Based Models for Generative Modeling"
[ICLR 2025] Official Implementation of Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis
[ICLR 2025] Halton Scheduler for Masked Generative Image Transformer
Official JAX Implementation of MaskGIT
SDAR (Synergy of Diffusion and AutoRegression), a family of large diffusion language models (1.7B, 4B, 8B, 30B)
Official Implementation for "StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery" (ICCV 2021 Oral)
A collection of diffusion model papers categorized by subarea
Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation
[arXiv:2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation
tiktoken is a fast BPE tokeniser for use with OpenAI's models (see the usage sketch after this list).
A curated list for awesome discrete diffusion models resources.
Implementation of Muse: Text-to-Image Generation via Masked Generative Transformers, in PyTorch
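As a quick illustration of the tiktoken entry above, here is a minimal sketch of typical usage; the encoding name "cl100k_base" and the sample string are assumptions chosen for illustration, not taken from any of the listed repositories.

```python
# Minimal sketch of BPE tokenisation with tiktoken (assumes `pip install tiktoken`).
# "cl100k_base" is one commonly used encoding name, chosen here as an example.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")            # load a BPE encoding
tokens = enc.encode("Masked diffusion language models")  # text -> token IDs
assert enc.decode(tokens) == "Masked diffusion language models"  # lossless round trip
print(len(tokens), tokens)
```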