Stars
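A practical interactive interface for large language models such as GPT and GLM, specially optimized for reading, polishing, and writing papers. Modular design with support for custom shortcut buttons and function plugins, project analysis and self-explanation for Python, C++, and other codebases, PDF/LaTeX paper translation and summarization, parallel queries to multiple LLMs, and local models such as chatglm3. Integrates Tongyi Qianwen, deepseekcoder, iFlytek Spark, ERNIE Bot, llama2, rwkv, claude2, m…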
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V…
Python sample codes and textbook for robotics algorithms.
Deep learning for image processing, including classification, object detection, etc.
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
Generate 3D objects conditioned on text or images
Enjoy the magic of Diffusion models!
Commonly used path planning algorithms with animations.
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
OpenMMLab's next-generation platform for general 3D object detection.
A PyTorch implementation of NeRF (Neural Radiance Fields) that reproduces the results.
OpenPCDet Toolbox for LiDAR-based 3D Object Detection.
openvla / openvla
Forked from TRI-ML/prismatic-vlms
OpenVLA: An open-source vision-language-action model for robotic manipulation.
Python package for the evaluation of odometry and SLAM
Visual localization made easy with hloc
[RSS 2023] Diffusion Policy: Visuomotor Policy Learning via Action Diffusion
Clean, Robust, and Unified PyTorch implementation of popular Deep Reinforcement Learning (DRL) algorithms (Q-learning, Duel DDQN, PER, C51, Noisy DQN, PPO, DDPG, TD3, SAC, ASL)
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
pySLAM is a hybrid Python/C++ Visual SLAM pipeline supporting monocular, stereo, and RGB-D cameras. It provides a broad set of modern local and global feature extractors, multiple loop-closure stra…
The repo for "Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image" and "Metric3Dv2: A Versatile Monocular Geometric Foundation Model..."
A Unified Framework for Surface Reconstruction
SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM (CVPR 2024)
[CVPR 2025] Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass
Transformer seq2seq model: a program that can build a language translator from a parallel corpus
Lift, Splat, Shoot: Encoding Images from Arbitrary Camera Rigs by Implicitly Unprojecting to 3D (ECCV 2020)
Gaussian-SLAM: Photo-realistic Dense SLAM with Gaussian Splatting
Official code and checkpoint release for mobile robot foundation models: GNM, ViNT, and NoMaD.
[CVPR 2025 Highlight] Real-time dense scene reconstruction with SLAM3R
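This is a book draft on SLAM that aims to clearly introduce the geometric methods and deep learning methods used in SLAM systems. The finished draft should run to about 200 pages, and the code for each chapter will also be compiled.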