- Institute of Automation, Chinese Academy of Sciences
- BEIJING, CHINA
- https://bitcats.github.io/
Stars
Official implementation of "IDOL: Inertial Deep Orientation-estimation & Localization." AAAI 2021.
This is the source code for our ICLR 2025 work EqNIO
HY-World 1.5: A Systematic Framework for Interactive World Modeling with Real-Time Latency and Geometric Consistency
[ICCV 2025] 3DGraphLLM is a model that uses a 3D scene graph and an LLM to perform 3D vision-language tasks.
PyTorch code and models for V-JEPA self-supervised learning from video.
Cosmos-Transfer2.5, built on top of Cosmos-Predict2.5, produces high-quality world simulations conditioned on multiple spatial control inputs.
Highly robust and accurate LiDAR-only and LiDAR-inertial odometry
🚀 Official code for “XStreamVGGT: Extremely Memory-Efficient Streaming Vision Geometry Grounded Transformer with KV Cache Compression”, published at SID’s Display Week 2026.
Ralph is an autonomous AI agent loop that runs repeatedly until all PRD items are complete.
[SIGGRAPH Asia 2025 (ACM TOG)] AnySplat: Feed-forward 3D Gaussian Splatting from Unconstrained Views
PyTorch implementation of JiT https://arxiv.org/abs/2511.13720
[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL
Official implementation for "SocialNav: Training Human-Inspired Foundation Model for Socially-Aware Embodied Navigation"
[IROS 24] Official repository of "Mind the Error! Detection and Localization of Instruction Errors in Vision-and-Language Navigation". We present the first dataset - R2R-IE-CE - to benchmark instru…
G2VLM: Geometry Grounded Vision Language Model with Unified 3D Reconstruction and Spatial Reasoning
[3DV 2026 Oral] Official Repo of "SAIL-Recon: Large SfM by Augmenting Scene Regression with Localization"
Official implementation of "Dynam3D: Dynamic Layered 3D Tokens Empower VLM for Vision-and-Language Navigation" (NeurIPS'25 Oral)
The repo for "Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image" and "Metric3Dv2: A Versatile Monocular Geometric Foundation Model..."
IGGT: Instance-Grounded Geometry Transformer for Semantic 3D Reconstruction
[CVPR 2025 Highlight] PlanarSplatting: Accurate Planar Surface Reconstruction in 3 Minutes
[NeurIPS 2025] Official project page for the paper "PLANA3R: Zero-shot Metric Planar 3D Reconstruction via Feed-Forward Planar Splatting"
PlaneRCNN detects and reconstructs piece-wise planar surfaces from a single RGB image
Official repository for BrickGPT, the first approach for generating physically stable toy brick models from text prompts.
CoTracker is a model for tracking any point (pixel) on a video.
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
[CVPR'25 Highlight] Official repository of Sonata: Self-Supervised Learning of Reliable Point Representations
Python3 library for downloading YouTube Videos.
[NeurIPS 2025] Pixel-Perfect Depth
MapAnything: Universal Feed-Forward Metric 3D Reconstruction