Stars
A beginner-friendly SLAM mini-course with Jupyter notebooks — covering Bayes Filters, Kalman Filters, Particle Filters, and Graph-based SLAM with hands-on Python examples.
A TTS model capable of generating ultra-realistic dialogue in one pass.
Official repository for "AM-RADIO: Reduce All Domains Into One"
This code corresponds to simulation environments used as part of the MimicGen project.
Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model
Reference workflow for generating large amounts of synthetic motion trajectories for robot manipulation from a few human demonstrations.
Cosmos-Reason1 models understand physical common sense and generate appropriate embodied decisions in natural language through long chain-of-thought reasoning processes.
Cosmos-Transfer1 is a world-to-world transfer model designed to bridge the perceptual divide between simulated and real-world environments.
Wan: Open and Advanced Large-Scale Video Generative Models
NVIDIA Isaac GR00T N1.6 - A Foundation Model for Generalist Robots.
The Official PyTorch implementation of DIFO: Diffusion Imitation from Observations (NeurIPS'24).
Learning Real-World Action-Video Dynamics with Heterogeneous Masked Autoregression
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM.
A collection of tutorials on state-of-the-art computer vision models and techniques. Explore everything from foundational architectures like ResNet to cutting-edge models like YOLO11, RT-DETR, SAM …
Fullstack app framework for web, desktop, and mobile.
Official repo of VLABench, a large-scale benchmark designed for fairly evaluating VLAs, embodied agents, and VLMs.
New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos
Elegant reading of real-time, trending news
[TMLR 2025] Latte: Latent Diffusion Transformer for Video Generation.
Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search
A collection of awesome video generation studies.
[ECCV 2024] Code for DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior
Revisiting Image Deblurring with an Efficient ConvNet - An efficient CNN performs better than a Transformer
A generative world for general-purpose robotics & embodied AI learning.
Code for FLIP: Flow-Centric Generative Planning for General-Purpose Manipulation Tasks
Efficient Triton Kernels for LLM Training