Stars
📚 A curated collection of papers and open-source code repositories dedicated to the application of Vision-Language Models (VLMs) for streaming video.
Implementation of paper "Playful Agentic Robot Learning"
RoboBrain 2.5: Advanced version of RoboBrain. Depth in Sight, Time in Mind. 🎉🎉🎉
LAVIS - A One-stop Library for Language-Vision Intelligence
openvla / openvla
Forked from TRI-ML/prismatic-vlms
OpenVLA: An open-source vision-language-action model for robotic manipulation.
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
SenseNova-U series: Native Unified Paradigm with NEO-unify from the First Principles
NVIDIA Cosmos is an open platform of world models, datasets, and tools that enables developers to build Physical AI for robots, autonomous vehicles, smart infrastructure, and more.
Code to pretrain, fine-tune, and evaluate DreamZero and run sim & real-world evals
Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
A single CLAUDE.md file to improve Claude Code behavior, derived from Andrej Karpathy's observations on LLM coding pitfalls.
Fully Open Framework for Democratized Multimodal Training
Reinforcement Learning of Vision Language Models with Self Visual Perception Reward
A curated, continuously updated reading list, paper blogs, and resources for World Action Models (WAMs) in embodied AI.
An image-to-world skillset for Claude.
Official code base for LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels
Official Implementation of "Maximum Likelihood Reinforcement Learning (MaxRL)"
Implementation of the paper "Counting Through Occlusion: Framework for Open World Amodal Counting"
CAPTURe: Evaluating Spatial Reasoning in Vision Language Models via Occluded Object Counting
[ICLR 2026 Oral] Visual Planning: Let's Think Only with Images
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation
Official implementation of "RL Makes MLLMs See Better Than SFT"