Stars
DC-Gen: Post-Training Diffusion Acceleration with Deeply Compressed Latent Space
DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder
Qwen3 is the large language model series developed by the Qwen team, Alibaba Cloud.
[ICML2025, NeurIPS2025 Spotlight] Sparse VideoGen 1 & 2: Accelerating Video Diffusion Transformers with Sparse Attention
[ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training
A generative world for general-purpose robotics & embodied AI learning.
DINO-X: The World's Top-Performing Vision Model for Open-World Object Detection and Understanding
A suite of image and video neural tokenizers
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
Efficient Segment Anything in Medical Images
Grounding DINO 1.5: IDEA Research's Most Capable Open-World Object Detection Model Series
llama3 implementation one matrix multiplication at a time
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
The simplest, fastest repository for training/finetuning medium-sized GPTs.
[NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". A…
[CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
MiniCPM4 & MiniCPM4.1: Ultra-Efficient LLMs on End Devices, achieving 3x+ generation speedup on reasoning tasks
A neural network training interface based on PyTorch, with a focus on flexibility
The largest collection of PyTorch image encoders / backbones, including train, eval, inference, and export scripts, and pretrained weights -- ResNet, ResNeXt, EfficientNet, NFNet, Vision Transformer (V…
Effortless data labeling with AI support from Segment Anything and other awesome models.
ImageBind: One Embedding Space to Bind Them All
LAVIS - A One-stop Library for Language-Vision Intelligence
PyTorch code and models for the DINOv2 self-supervised learning method.
Efficient vision foundation models for high-resolution generation and perception.
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.