Stars
🚀🚀 [LLM] Train a small 64M-parameter GPT completely from scratch in just 2 hours! 🌏
Official inference framework for 1-bit LLMs
Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
A Gemini 2.5 Flash Level MLLM for Vision, Speech, and Full-Duplex Multimodal Live Streaming on Your Phone
🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Hackable and optimized Transformers building blocks, supporting a composable construction.
OpenMMLab Semantic Segmentation Toolbox and Benchmark.
🚀 [LLM] Train a 67M-parameter visual multimodal VLM from scratch in just 1 hour! 🌏
[CVPR 2024] Official RT-DETR (RTDETR paddle pytorch), Real-Time DEtection TRansformer, DETRs Beat YOLOs on Real-time Object Detection. 🔥 🔥 🔥
[NeurIPS 2025] SpatialLM: Training Large Language Models for Structured Indoor Modeling
Deformable DETR: Deformable Transformers for End-to-End Object Detection.
Everything about the SmolLM and SmolVLM family of models
The OCR approach is rephrased as Segmentation Transformer (https://arxiv.org/abs/1909.11065). This is an official implementation of semantic segmentation with HRNet (https://arxiv.org/abs/1908.07919).
Code release for "Masked-attention Mask Transformer for Universal Image Segmentation"
OpenMMLab Self-Supervised Learning Toolbox and Benchmark
D-FINE: Redefine Regression Task of DETRs as Fine-grained Distribution Refinement [ICLR 2025 Spotlight]
Semantic Segmentation on PyTorch (include FCN, PSPNet, Deeplabv3, Deeplabv3+, DANet, DenseASPP, BiSeNet, EncNet, DUNet, ICNet, ENet, OCNet, CCNet, PSANet, CGNet, ESPNet, LEDNet, DFANet)
[ECCV 2024, Oral] DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
NSGA2, NSGA3, R-NSGA3, MOEAD, Genetic Algorithms (GA), Differential Evolution (DE), CMAES, PSO
[IEEE TMI] Official Implementation for UNet++
Semantic search over videos using Gemini Embedding 2 or Qwen3-VL.
[ICLR 2020] Contrastive Representation Distillation (CRD), and benchmark of recent knowledge distillation methods
micronet, a model compression and deployment library. Compression: 1. quantization: quantization-aware training (QAT), high-bit (>2b) (DoReFa / Quantization and Training of Neural Networks for Efficient Integer-…
This is a collection of our NAS and Vision Transformer work.
Nvidia Semantic Segmentation monorepo
PyTorch implementation of various Knowledge Distillation (KD) methods.