Stars
[NeurIPS 2025 Spotlight] "SANSA: Unleashing the Hidden Semantics in SAM2 for Few-Shot Segmentation."
[DEIMv2] Real Time Object Detection Meets DINOv3
[NeurIPS 2025] YOLOv12: Attention-Centric Real-Time Object Detectors
Repository of the paper "AnyUp: Universal Feature Upsampling".
SpotX patcher used for patching the desktop version of Spotify
Make your job hunt easy by automating your application process with this Auto Applier
VisioFirm: Cross-Platform AI-assisted Annotation Tool for Computer Vision
Train YOLO on custom dataset — no coding required.
Implementation of "YOLOv13: Real-Time Object Detection with Hypergraph-Enhanced Adaptive Visual Perception".
[CVPR 2025] Official PyTorch implementation of "EdgeTAM: On-Device Track Anything Model"
NVIDIA DeepStream SDK 8.0 / 7.1 / 7.0 / 6.4 / 6.3 / 6.2 / 6.1.1 / 6.1 / 6.0.1 / 6.0 / 5.1 implementation for YOLO models
TensorRT for the YOLO series (YOLOv11, YOLOv10, YOLOv9, YOLOv8, YOLOv7, YOLOv6, YOLOX, YOLOv5), with NMS plugin support
Ultralytics YOLO with Additional Knowledge Distillation Capability
A unified library for object tracking featuring clean-room re-implementations of leading multi-object tracking algorithms
Provides functions to query offers from rebuy or momox
[CVPR 2025] DEIM: DETR with Improved Matching for Fast Convergence
D-FINE: Redefine Regression Task of DETRs as Fine-grained Distribution Refinement [ICLR 2025 Spotlight]
A high-performance library for detecting objects in images and videos, leveraging Rust's speed and safety. Optionally supports a gRPC API for building scalable microservices, enabling seamless inte…
The swiss army knife of lossless video/audio editing
Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚
We write your reusable computer vision tools. 💜
This is the dataset repository for the paper: POP909: A Pop-song Dataset for Music Arrangement Generation
An SDK for Transformers + YOLO and other SSD family models
AIHawk aims to ease the job hunt by automating the job application process. Using artificial intelligence, it enables users to apply for multiple jobs in a tailored way.
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.