Stars
Open-Sora: Democratizing Efficient Video Production for All
State-of-the-art 2D and 3D Face Analysis Project
MiniCPM-V 4.5: A GPT-4o Level MLLM for Single Image, Multi Image and High-FPS Video Understanding on Your Phone
Text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Implementation of Nougat Neural Optical Understanding for Academic Documents
BoxMOT: Pluggable SOTA multi-object tracking modules for segmentation, object detection and pose estimation models
Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory"
[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
Data processing for and with foundation models!
openvla / openvla
Forked from TRI-ML/prismatic-vlms. OpenVLA: An open-source vision-language-action model for robotic manipulation.
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.
SAPIEN Manipulation Skill Framework, an open-source, GPU-parallelized robotics simulator and benchmark, led by Hillbot, Inc.
PyTorch implementation of MAR + DiffLoss (https://arxiv.org/abs/2406.11838)
[ICLR & NeurIPS 2025] Repository for the Show-o series: One Single Transformer to Unify Multimodal Understanding and Generation.
RetinaFace: Deep Face Detection Library for Python
RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation
SEED-Voken: A Series of Powerful Visual Tokenizers
Official Algorithm Implementation of ICML'23 Paper "VIMA: General Robot Manipulation with Multimodal Prompts"
[RSS 2025] Learning to Act Anywhere with Task-centric Latent Actions
RoboBrain 2.0: Advanced version of RoboBrain. See Better. Think Harder. Do Smarter.
Official repo and evaluation implementation of VSI-Bench
Low-level locomotion policy training in Isaac Lab
MichalZawalski / embodied-CoT
Forked from openvla/openvla. Embodied Chain of Thought: A robotic policy that reasons to solve the task.
[RSS 2024 & RSS 2025] VLN-CE evaluation code for NaVid and Uni-NaVid
PyTorch implementation of the papers "ARTrack" and "ARTrackV2"
[CoRL 2025] Repository relating to "TrackVLA: Embodied Visual Tracking in the Wild"
Vision-Language Navigation Benchmark in Isaac Lab