Stars
The author's implementation of FUDOKI, a multimodal large language model based purely on discrete flow matching.
Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation
Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding"
Official PyTorch implementation for "Large Language Diffusion Models"
New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
SEED-Voken: A Series of Powerful Visual Tokenizers
Minimal implementation of scalable rectified flow transformers, based on SD3's approach
Janus-Series: Unified Multimodal Understanding and Generation Models
[TMLR 2025🔥] A survey of autoregressive models in vision.
An open source implementation of CLIP.
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
Code and models for ICML 2024 paper, NExT-GPT: Any-to-Any Multimodal Large Language Model
Adapting LLaMA Decoder to Vision Transformer
Taming Transformers for High-Resolution Image Synthesis
PyTorch package for the discrete VAE used for DALL·E.
A high-throughput and memory-efficient inference and serving engine for LLMs
[ICML'24 Spotlight] LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
Tools for merging pretrained large language models.
Example models using DeepSpeed