Stars
[RSS 2025] Learning to Act Anywhere with Task-centric Latent Actions
moojink / openvla-oft
Forked from openvla/openvla
Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success
Baseline model for "GraspNet-1Billion: A Large-Scale Benchmark for General Object Grasping" (CVPR 2020)
CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks
Cosmos-Transfer1 is a world-to-world transfer model designed to bridge the perceptual divide between simulated and real-world environments.
Unofficial ROS2 SDK support for Unitree GO2 AIR/PRO/EDU
[ICCV'21] Learning Spatio-Temporal Transformer for Visual Tracking
Mask3D predicts accurate 3D semantic instances achieving state-of-the-art on ScanNet, ScanNet200, S3DIS and STPLS3D.
RoboBrain 2.0: Advanced version of RoboBrain. See Better. Think Harder. Do Smarter.
[CVPR 2024] OneLLM: One Framework to Align All Modalities with Language
[CVPR 2024 & NeurIPS 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
PyTorch Implementation of EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via Self-Supervision
[ICLR'24 Spotlight] Uni3D: 3D Visual Representation from BAAI
A collection of utilities for LeRobot.
Vision-and-Language Navigation in Continuous Environments using Habitat
[ICML 2024] 3D-VLA: A 3D Vision-Language-Action Generative World Model
Building General-Purpose Robots Based on Embodied Foundation Model
[ICLR 2025, Oral] EmbodiedSAM: Online Segment Any 3D Thing in Real Time
Official Python toolkit for generic object tracking benchmark GOT-10k and beyond
SpatialVLA: a spatial-enhanced vision-language-action model trained on 1.1 million real robot episodes. Accepted at RSS 2025.
[ECCV 2022] Joint Feature Learning and Relation Modeling for Tracking: A One-Stream Framework
[ICCV 2025 & ICCV 2025 RIWM Outstanding Paper] Aether: Geometric-Aware Unified World Modeling
[CVPR 2022 Oral & TPAMI 2024] MixFormer: End-to-End Tracking with Iterative Mixed Attention
[ICML 2024] Official code repository for 3D embodied generalist agent LEO
Differentiable IoU of rotated bounding boxes using PyTorch
[CVPR'24 Highlight] GPT4Point: A Unified Framework for Point-Language Understanding and Generation.
PyViz3D is a web-based visualizer for 3D objects and point clouds.