Stars
Anny, A Free and Interpretable Human Body Model for all ages, written in PyTorch.
Implementation of the KinectFusion approach in modern C++14 and CUDA
Open3D: A Modern Library for 3D Data Processing
Pangolin is a lightweight portable rapid development library for managing OpenGL display / interaction and abstracting video input.
ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual-Inertial and Multi-Map SLAM
Habitat-Web is a web application to collect human demonstrations for embodied tasks on Amazon Mechanical Turk (AMT) using the Habitat simulator.
The repository provides code associated with the paper VLFM: Vision-Language Frontier Maps for Zero-Shot Semantic Navigation (ICRA 2024)
Sparse Video Generation Model for Embodied Navigation conditioned on loose language guidance, 100% real world verification
Code and Data of the CVPR 2022 paper: Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
Official GitHub Repository for Paper "Bridging Zero-shot Object Navigation and Foundation Models through Pixel-Guided Navigation Skill", ICRA 2024
[IROS 2025] Official implementation of SmartWay: Enhanced Waypoint Prediction and Backtracking for Zero-Shot Vision-and-Language Navigation
The official repo for "SpatialBot: Precise Spatial Understanding with Vision Language Models".
[ICRA'25] One Map to Find Them All: Real-time Open-Vocabulary Mapping for Zero-shot Multi-Object Navigation
[NeurIPS 2024] SG-Nav: Online 3D Scene Graph Prompting for LLM-based Zero-shot Object Navigation
A modular high-level library to train embodied AI agents across a variety of tasks and environments.
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
DeepSeek-VL: Towards Real-World Vision-Language Understanding
[ICRA 2025] Official implementation of Open-Nav: Exploring Zero-Shot Vision-and-Language Navigation in Continuous Environment with Open-Source LLMs