Stars
Physical laws underpin all existence, and harnessing them for generative modeling opens boundless possibilities for advancing science and shaping the future!
[NeurIPS 2025] Flow x RL. "ReinFlow: Fine-tuning Flow Policy with Online Reinforcement Learning". Supports VLAs, e.g., pi0 and pi0.5. Fully open-sourced.
🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning
Native Multimodal Models are World Learners
Official PyTorch Implementation of "F2M-Reg: Unsupervised RGB-D registration with Frame-to-Model Optimization"
Cosmos-Predict2.5, the latest version of the Cosmos World Foundation Models (WFMs) family, specialized for simulating and predicting the future state of the world in the form of video.
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.
Code for paper "CycleDiff: Cycle Diffusion Models for Unpaired Image-to-image Translation"
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation
[CVPR 2025 Highlight] GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control
Nvidia GEAR Lab's initiative to solve the robotics data problem using world models
Official PyTorch implementation of One-Minute Video Generation with Test-Time Training
Code for CoRL 2025 "LaDiWM: A Latent Diffusion-based World Model for Predictive Manipulation"
Cosmos-Predict2 is a collection of general-purpose world foundation models for Physical AI that can be fine-tuned into customized world models for downstream applications.
Implementation of Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination
Cosmos-Predict1 is a collection of general-purpose world foundation models for Physical AI that can be fine-tuned into customized world models for downstream applications.
New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos
[IROS 2025 Award Finalist] The Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems
DIAMOND (DIffusion As a Model Of eNvironment Dreams) is a reinforcement learning agent trained in a diffusion world model. NeurIPS 2024 Spotlight.
🆓 List of free ChatGPT mirror sites, continuously updated.
aod321 / ManiSkill
Forked from haosulab/ManiSkill
SAPIEN Manipulation Skill Framework, a GPU parallelized robotics simulator and benchmark
[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
CaCo: Both Positive and Negative Samples are Directly Learnable via Cooperative-adversarial Contrastive Learning
Code for paper "MS2A: Memory Storage-to-Adaptation for Cross-domain Few-annotation Object Detection"