Stars
Includes the VideoCount dataset and CountVid code for the paper Open-World Object Counting in Videos.
Awesome Incremental Learning
[NeurIPS 2025] Efficient Reasoning Vision Language Models
This repository collects and organises state‑of‑the‑art papers on spatial reasoning for Multimodal Vision–Language Models (MVLMs).
Awesome Spatial Intelligence (Personal Use)
[ICLR 2026] Official implementation of "Patch-as-Decodable-Token: Towards Unified Multi-Modal Vision Tasks in MLLMs"
[NeurIPS 2025 Spotlight] The official repository of "ReCon: Region-Controllable Data Augmentation with Rectification and Alignment for Object Detection".
Detect Anything via Next Point Prediction (Based on Qwen2.5-VL-3B)
[TMLR 2025🔥] A survey of autoregressive models in vision.
[CVPR 2025 Highlight] Official code and models for Encoder-only Mask Transformer (EoMT).
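🧑🚀 Summary of the world's best LLM resources (multimodal generation, agents, coding assistance, AI paper review, data processing, model training, model inference, o1 models, MCP, small language models, vision-language models).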
detrex is a research platform for DETR-based object detection, segmentation, pose estimation and other visual recognition tasks.
[NeurIPS 2025] YOLOv12: Attention-Centric Real-Time Object Detectors
YOLOv10: Real-Time End-to-End Object Detection [NeurIPS 2024]
A distilled Segment Anything (SAM) model capable of running in real time with NVIDIA TensorRT
[CVPR 2024] Official implementation of the paper "DePT: Decoupled Prompt Tuning"
The official implementation of [CVPR 2025] "5%>100%: Breaking Performance Shackles of Full Fine-Tuning on Visual Recognition Tasks".
This is the official code for the MobileSAM project, which makes SAM lightweight for mobile applications and beyond!
[CVPR 2024] Official implementation of "Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation"
YOLO-UniOW: Efficient Universal Open-World Object Detection
Official GitHub repo for SafeDialBench, a comprehensive multi-turn dialogue benchmark to evaluate LLMs' safety.
An awesome paper list of Semi-Supervised Learning under realistic settings.
A collection of papers on the topic of "Computer Vision in the Wild (CVinW)"
Label Studio is a multi-type data labeling and annotation tool with standardized output format
Writing AI Conference Papers: A Handbook for Beginners