- NVIDIA
- Seattle, Washington
- http://kaichun-mo.github.io
- @KaichunMo
Stars
A theoretical reconstruction of the Claude Mythos architecture, built from first principles using the available research literature.
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery 🧑‍🔬
Modular Multi-Agent System for Scientific Research Assistance
PointWorld: Scaling 3D World Models for In-The-Wild Robotic Manipulation
Eagle: Frontier Vision-Language Models with Data-Centric Strategies
Cosmos-Predict2 is a collection of general-purpose world foundation models for Physical AI that can be fine-tuned into customized world models for downstream applications.
Cosmos-Predict1 is a collection of general-purpose world foundation models for Physical AI that can be fine-tuned into customized world models for downstream applications.
Cosmos-Reason1 models understand physical common sense and generate appropriate embodied decisions in natural language through long chain-of-thought reasoning processes.
Cosmos-Transfer1 is a world-to-world transfer model designed to bridge the perceptual divide between simulated and real-world environments.
Enjoy the magic of Diffusion models!
RoboVerse: Towards a Unified Platform, Dataset and Benchmark for Scalable and Generalizable Robot Learning
[CVPR'25 Highlight] Official repository of Sonata: Self-Supervised Learning of Reliable Point Representations
🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning
Re-implementation of pi0 vision-language-action (VLA) model from Physical Intelligence
MichalZawalski / embodied-CoT
Forked from openvla/openvla
Embodied Chain of Thought: A robotic policy that reasons to solve the task.
🔥 SpatialVLA: a spatial-enhanced vision-language-action model that is trained on 1.1 Million real robot episodes. Accepted at RSS 2025.
Evaluating and reproducing real-world robot manipulation policies (e.g., RT-1, RT-1-X, Octo) in simulation under common setups (e.g., Google Robot, WidowX+Bridge) (CoRL 2024)
NVIDIA Cosmos is an open platform of world models, datasets, and tools that enables developers to build Physical AI for robots, autonomous vehicles, smart infrastructure, and more.
Awesome-LLM-3D: a curated list of resources on multi-modal large language models in the 3D world
This repository compiles a list of papers related to the application of video technology in the field of robotics! Star⭐ the repo and follow me if you like what you see🤩.
A curated list of 3D vision papers related to robotics in the era of large models (i.e., LLMs/VLMs), inspired by awesome-computer-vision, including papers, code, and related websites
Code of [CVPR 2024] "Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling"
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.