Stars
A simple MPC controller for path tracking, implemented in Python
Eagle: Frontier Vision-Language Models with Data-Centric Strategies
SIFThinker: Spatially-Aware Image Focus for Visual Reasoning
[ICLR 2026] VisionReasoner: Unified Reasoning-Integrated Visual Perception via Reinforcement Learning
Learn it. Build it. Ship it for others.
Cosmos-Reason2 models understand physical common sense and generate appropriate embodied decisions in natural language through long chain-of-thought reasoning.
YOLOs-TRT is a header-only C++ library for running all YOLO models with all tasks with NVIDIA TensorRT on CUDA GPUs and Jetson. It features GPU preprocessing (letterbox/normalize/HWC→NCHW), CUDA Gr…
[DEIMv2] Real Time Object Detection Meets DINOv3
Train YOLO + VLM with one command. Auto-generate vision-language training data from YOLO labels - no extra labeling needed.
🕹️ SCConv: Spatial and Channel Reconstruction Convolution for Feature Redundancy
CODA: Repurposing Continuous VAEs for Discrete Tokenization
Inference repo for the Falcon-Perception and Falcon-OCR models: early-fusion, natively multimodal, dense autoregressive Transformer models.
An anomaly detection library comprising state-of-the-art algorithms and features such as experiment management, hyper-parameter optimization, and edge inference.
[CVPR 2022] Official code for "RegionCLIP: Region-based Language-Image Pretraining"
[ICLR 2026] Mobile-GS: Real-time Gaussian Splatting for Mobile Devices
A feed-forward 3D foundation model for reconstructing scenes from streaming data
OpenOCR: An Open-Source Toolkit for General-OCR Research and Applications, integrates a unified training and evaluation benchmark, commercial-grade OCR and Document Parsing systems, and faithful re…
[CVPR 2026 Oral] "INSID3: Training-Free In-Context Segmentation with DINOv3"
Simulation platform for general-purpose robotics & embodied AI learning.
Efficient Universal Perception Encoder: a single on-device vision encoder with versatile representations that match or exceed specialized experts across multiple task domains.
Become a cracked AI/ML Research Engineer
[CVPR 2026] 🚀🚀🚀 Official code for the paper "YOLO-Master: MOE-Accelerated with Specialized Transformers for Enhanced Real-time Detection." *(YOLO = You Only Look Once)* 🔥🔥🔥
CAR: Controllable AutoRegressive Modeling for Visual Generation
The official implementation of ICCV'25 paper "FlashDepth: Real-time Streaming Video Depth Estimation at 2K Resolution"
Official implementation of GeCo2 (AAAI 2026) -- Generalized-Scale Object Counting with Gradual Query Aggregation