Stars
[CVPR 2026] Official implementation of "ACoT-VLA: Action Chain-of-Thought for Vision-Language-Action Models"
CoWTracker: Tracking by Warping instead of Correlation
π RuView: WiFi DensePose turns commodity WiFi signals into real-time human pose estimation, vital sign monitoring, and presence detection — all without a single pixel of video.
Official implementation of the paper "DeepTag: A General Framework for Fiducial Marker Design and Detection"
This is a lightweight GAN developed for real-time deblurring. The model has a super tiny size and a rapid inference time. The motivation is to boost marker detection in robotic applications, howeve…
Code for the project "MegaSaM: Accurate, Fast and Robust Structure and Motion from Casual Dynamic Videos"
[RSS'25] This repository is the implementation of "NaVILA: Legged Robot Vision-Language-Action Model for Navigation"
A flexible, high-performance 3D simulator for Embodied AI research.
A GPU-accelerated TSDF and ESDF library for robots equipped with RGB-D cameras.
MonSter++: A Unified Geometric Foundation Model for Stereo and Multi-View Depth Estimation via the Unleashing of Monodepth Priors
The repo for "Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image" and "Metric3Dv2: A Versatile Monocular Geometric Foundation Model..."
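The first full-stack course on occupancy grid networks in China, "From BEV to Occupancy Network: Algorithm Principles and Engineering Practice", including on-device deployment. Surrounding Semantic Occupancy Perception Course for Autonomous Driving (docs, ppt and source code). Online course homepage: http://111.229.117.200:8100/ (built independently by the author)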
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second.
Code for robust monocular depth estimation described in "Ranftl et al., Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer, TPAMI 2022"
[CVPR 2023] An academic alternative to Tesla's occupancy network for autonomous driving.
This repository contains the code for the paper "Occupancy Networks - Learning 3D Reconstruction in Function Space"
GeoCalib: Learning Single-image Calibration with Geometric Optimization (ECCV 2024)
Visual SLAM/odometry package based on NVIDIA-accelerated cuVSLAM
[CVPR 2025] JamMa is a lightweight image matcher that enables fast internal and mutual interaction of images with joint Mamba.
[CVPR 2025 Best Paper Nomination] FoundationStereo: Zero-Shot Stereo Matching
A llama3.1-8b chatbot running on ROS; no internet connection needed, deployed and called locally through ollama!
Joint deep network for feature line detection and description
This code contains an algorithm to compute stereo visual SLAM by using both point and line segment features.
[CVPR 2025 Highlight] Real-time dense scene reconstruction with SLAM3R
Hypersim: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding