Stars
[CoRL 2025] Repository relating to "TrackVLA: Embodied Visual Tracking in the Wild"
RetinaFace: Deep Face Detection Library for Python
Official Algorithm Implementation of ICML'23 Paper "VIMA: General Robot Manipulation with Multimodal Prompts"
Embodied Reasoning Question Answer (ERQA) Benchmark
A comprehensive list of papers using large language/multi-modal models for Robotics/RL, including papers, codes, and related websites
This repository provides a valuable reference for researchers in the field of multimodality — start your exploration of RL-based reasoning MLLMs here!
SAPIEN Manipulation Skill Framework, an open source GPU parallelized robotics simulator and benchmark, led by Hillbot, Inc.
Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning
MiniCPM-V 4.5: A GPT-4o Level MLLM for Single Image, Multi Image and High-FPS Video Understanding on Your Phone
openvla / openvla
Forked from TRI-ML/prismatic-vlms. OpenVLA: An open-source vision-language-action model for robotic manipulation.
MichalZawalski / embodied-CoT
Forked from openvla/openvla. Embodied Chain of Thought: A robotic policy that reasons to solve tasks.
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks
RoboBrain 2.0: Advanced version of RoboBrain. See Better. Think Harder. Do Smarter.
[RSS 2025] Learning to Act Anywhere with Task-centric Latent Actions
This is a curated list of "Embodied AI or robot with Large Language Models" research. Watch this repository for the latest updates!
RoboOS: A Universal Embodied Operating System for Cross-Embodied and Multi-Robot Collaboration
Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory"
State-of-the-art 2D and 3D Face Analysis Project
[RSS 2024 & RSS 2025] VLN-CE evaluation code of NaVid and Uni-NaVid
[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
PyTorch implementation of paper "ARTrack" and "ARTrackV2"
[AAAI 2024] Hybrid-SORT: Weak Cues Matter for Online Multi-Object Tracking
BoxMOT: Pluggable SOTA multi-object tracking modules for segmentation, object detection, and pose estimation models
Explore Egocentric Vision: research, data, challenges, real-world apps. Stay updated & contribute to our dynamic repository! Work in progress; join us!
Official repo and evaluation implementation of VSI-Bench