Stars
[ICLR 2026] An official implementation of "CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning"
[ICCV 2025] MobileViCLIP: An Efficient Video-Text Model for Mobile Devices
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
(CVPR 2025 highlight✨) Official repository of paper "LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models"
Visual tracking library based on PyTorch.
Implementation of "YOLOv13: Real-Time Object Detection with Hypergraph-Enhanced Adaptive Visual Perception".
[COLM 2025] Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources
This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025
🐙 Guides, papers, lessons, notebooks and resources for prompt engineering, context engineering, RAG, and AI Agents.
Official project website for the CVPR 2021 paper "Exploring intermediate representation for monocular vehicle pose estimation"
Everything about the SmolLM and SmolVLM family of models
The hub for EleutherAI's work on interpretability and learning dynamics
This repository contains the official implementation of the research papers, "MobileCLIP" CVPR 2024 and "MobileCLIP2" TMLR August 2025
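A curated collection of open-source Chinese large language models, focusing on smaller models that can be privately deployed and are inexpensive to train, including base models, domain-specific fine-tunes and applications, datasets, and tutorials.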
The repo for "Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image" and "Metric3Dv2: A Versatile Monocular Geometric Foundation Model..."
[NeurIPS 2023] MixFormerV2: Efficient Fully Transformer Tracking
Official code implementation of Vary-toy (Small Language Model Meets with Reinforced Vision Vocabulary)
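A record of recent papers on 1) low-level vision tasks such as image/video super-resolution and enhancement; 2) image generation and related tasks, mainly DL-based methods from 2018 onward.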
A PyTorch library and evaluation platform for end-to-end compression research
X-Super-Resolution is dedicated to presenting the research efforts of XPixel in the realm of image super-resolution.
A collection of super-resolution papers, datasets, and repositories
A collection of publicly available person re-identification datasets
Awesome Person Re-identification
The official implementation of [CVPR2022] Decoupled Knowledge Distillation https://arxiv.org/abs/2203.08679 and [ICCV2023] DOT: A Distillation-Oriented Trainer https://openaccess.thecvf.com/content…
A released dataset for drone-based detection and tracking, including images, videos, and annotations.