Stars
Convert nuScenes data to the MCAP file format
Official models of Franka Robotics GmbH robots
A curated list of awesome robot descriptions (URDF, MJCF)
A collection of high-quality models for the MuJoCo physics engine, curated by Google DeepMind.
A lightweight, Pythonic ROS 2 package to connect the Genesis simulator and ROS 2
Evaluating and reproducing real-world robot manipulation policies (e.g., RT-1, RT-1-X, Octo) in simulation under common setups (e.g., Google Robot, WidowX+Bridge) (CoRL 2024)
C++ library designed to provide an abstraction for different rendering engines. It offers unified APIs for creating 3D graphics applications.
Builds on top of Qt to provide widgets that are useful when developing robotics applications, such as a 3D view, plots, and a dashboard, which can be used together in a convenient unified interface.
Lift, Splat, Shoot: Encoding Images from Arbitrary Camera Rigs by Implicitly Unprojecting to 3D (ECCV 2020)
🔎 🖼️ 🔥 PyTorch Toolbox for Image Quality Assessment, including PSNR, SSIM, LPIPS, FID, NIQE, NRQM(Ma), MUSIQ, TOPIQ, NIMA, DBCNN, BRISQUE, PI and more...
[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation
[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
PointNet and PointNet++ implemented in PyTorch (pure Python) and applied to ModelNet, ShapeNet, and S3DIS.
PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation
[ICML 2024 Best Paper] Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution (https://arxiv.org/abs/2310.16834)
An open-source framework for training large multimodal models.
A Minimalist, Batteries-included Repository for Advancing World Model Science.
A minimal PyTorch implementation of the VQ-VAE model described in "Neural Discrete Representation Learning".
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
Gemma open-weight LLM library, from Google DeepMind
[NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
ACL 2025: Synthetic data generation pipelines for text-rich images.
Open-Sora: Democratizing Efficient Video Production for All
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
A collection of papers on the topic of "Computer Vision in the Wild (CVinW)"
An open source implementation of CLIP.