Stars
Enjoy the magic of Diffusion models!
PyTorch code and models for VJEPA2 self-supervised learning from video.
🔥 LeetCode for PyTorch — practice implementing softmax, attention, GPT-2 and more from scratch with instant auto-grading. Jupyter-based, self-hosted or try online.
Code to pretrain, fine-tune, and evaluate DreamZero and run sim & real-world evals
AI agents running research on single-GPU nanochat training automatically
Official PyTorch Implementation of "SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers"
[ICLR'25 Oral] Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
Code and website for Self-Flow: Self-Supervised Flow Matching for Scalable Multi-Modal Synthesis
[ICLR 2026] pi-Flow: Policy-Based Few-Step Generation via Imitation Distillation
Official inference repo for FLUX.2 models
Open Images is a dataset of ~9 million images that have been annotated with image-level labels and bounding boxes spanning thousands of classes.
[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL
Wan: Open and Advanced Large-Scale Video Generative Models
The open-source CapCut alternative
Open-source simulator for autonomous driving research.
The ultimate training toolkit for finetuning diffusion models
Generative Omnimatte (CVPR 2025)
Self-contained, minimalistic implementation of diffusion models with PyTorch.
Official repository of In-Context LoRA for Diffusion Transformers
Official repo for paper "Structured 3D Latents for Scalable and Versatile 3D Generation" (CVPR'25 Spotlight).
New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
Official code for ICCV 2025 paper, X2I: Seamless Integration of Multimodal Understanding into Diffusion Transformer via Attention Distillation
🧙 Automates the installation and updating of the Cursor .AppImage for Linux users, resolving common issues during setup and effortlessly handling configurations, updates, and related tasks.
Official code of "EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model"