Stars
Real-Time VLAs via Future-state-aware Asynchronous Inference.
RynnVLA-002: A Unified Vision-Language-Action and World Model
PyTorch code and models for VJEPA2 self-supervised learning from video.
[NeurIPS'25] Generalizable Reasoning through Compositional Energy Minimization
Open-source code of the paper: Real-to-Sim Robot Policy Evaluation with Gaussian Splatting Simulation of Soft-Body Interactions.
Cambrian-S: Towards Spatial Supersensing in Video
Muon is an optimizer for hidden layers in neural networks
Reference PyTorch implementation and models for DINOv3
PyTorch Code for Energy-Based Transformers paper -- generalizable reasoning and scalable learning
The official repo for the paper "VQ-VLA: Improving Vision-Language-Action Models via Scaling Vector-Quantized Action Tokenizers" (ICCV 2025)
AC-DiT: Adaptive Coordination Diffusion Transformer for Mobile Manipulation
A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.
Official code for the CVPR 2025 paper "Navigation World Models".
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
A PyTorch native platform for training generative AI models
NVIDIA Isaac GR00T N1.6 - A Foundation Model for Generalist Robots.
PyTorch code and models for V-JEPA self-supervised learning from video.
This repo contains the code for the paper "Intuitive physics understanding emerges from self-supervised pretraining on natural videos"
[IROS 2024] 📈 RISE: 3D Perception Makes Real-World Robot Imitation Simple and Effective
A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations
LinVT: Empower Your Image-level Large Language Model to Understand Videos