Stars
starVLA: A Lego-like Codebase for Vision-Language-Action Model Development
[TPAMI 2024] Official repo of "ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments"
[RSS2024] Official implementation of "Hierarchical Open-Vocabulary 3D Scene Graphs for Language-Grounded Robot Navigation"
Code for "AutoRecon: Automated 3D Object Discovery and Reconstruction" CVPR 2023 (Highlight)
The Most Faithful Implementation of Segment Anything (SAM) in 3D
[ICCV 2025] A Simple yet Effective Pathway to Empowering LLaVA to Understand and Interact with 3D World
Official repo of VLABench, a large-scale benchmark designed for fairly evaluating VLAs, embodied agents, and VLMs.
HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model
[CVPR 2024] "LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning"; an interactive Large Language 3D Assistant.
Code for "Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation"
Open source library for Single Object Tracking in point clouds.
GraspVLA: a Grasping Foundation Model Pre-trained on Billion-scale Synthetic Action Data
Official implementation of ECCV24 paper "SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding"
[ICLR 2025 Oral] Seer: Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation
RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation
InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy
[NeurIPS 2023] PointGPT: Auto-regressively Generative Pre-training from Point Clouds
Codebase for Automated Creation of Digital Cousins for Robust Policy Learning
Full reimplementation of Siamese RPN; achieves 0.24 EAO on VOT2017.
[CVPR 2025] The official implementation of "Universal Actions for Enhanced Embodied Foundation Models"
Entity Alignment toolkit (EAkit), a lightweight, easy-to-use and highly extensible PyTorch implementation of many entity alignment algorithms.
[NeurIPS 2025 Spotlight] SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation
Code for "Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers" (NeurIPS 2024)
[ECCV'24] OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation
P2B: Point-to-Box Network for 3D Object Tracking in Point Clouds
TCTrack: Temporal Contexts for Aerial Tracking (CVPR2022) & TCTrack++ (TPAMI)
Official implementation for the CVPR2021 paper Alpha-Refine