- Santa Clara
- https://dragonlong.github.io/
- @lxiaol9
Stars
Team Comet's 2025 BEHAVIOR Challenge Codebase
Cosmos-Reason2 models understand physical common sense and generate appropriate embodied decisions in natural language through long chain-of-thought reasoning.
Unified high-performance Python client for object and file stores.
InternVLA-A1: Unifying Understanding, Generation, and Action for Robotic Manipulation
EgoDex: Learning Dexterous Manipulation from Large-Scale Egocentric Video
InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy
VITRA: Scalable Vision-Language-Action Model Pretraining for Robotic Manipulation with Real-Life Human Activity Videos
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance…
DGGT: Feedforward 4D Reconstruction of Dynamic Driving Scenes using Unposed Images
InternRobotics' open platform for building generalized navigation foundation models.
GigaWorld-0: World Models as Data Engine to Empower Embodied AI
[RSS 2025] TactAR teleoperation app in "Reactive Diffusion Policy: Slow-Fast Visual-Tactile Policy Learning for Contact-Rich Manipulation"
[RSS 2025] Reactive Diffusion Policy: Slow-Fast Visual-Tactile Policy Learning for Contact-Rich Manipulation
Google DeepMind's software stack for physics-based simulation and Reinforcement Learning environments, using MuJoCo.
robomimic: A Modular Framework for Robot Learning from Demonstration
robosuite: A Modular Simulation Framework and Benchmark for Robot Learning
Universal Manipulation Interface: In-The-Wild Robot Teaching Without In-The-Wild Robots
Simulation benchmark from Toyota Research Institute containing 49 tasks that measure the performance of Large Behavior Model policies
Post-training scripts and samples for NVIDIA Cosmos ecosystem
Collection of step-by-step playbooks for setting up AI/ML workloads on NVIDIA DGX Spark devices with Blackwell architecture.
[IROS 2025 Award Finalist] The Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks
Enjoy the magic of Diffusion models!
🌀 R2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding (ECCV 2024)