- Institute of Automation, Chinese Academy of Sciences
- BEIJING, CHINA
- https://bitcats.github.io/
Stars
PyTorch implementation of JiT https://arxiv.org/abs/2511.13720
[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL
Official implementation for "SocialNav: Training Human-Inspired Foundation Model for Socially-Aware Embodied Navigation"
[IROS 24] Official repository of "Mind the Error! Detection and Localization of Instruction Errors in Vision-and-Language Navigation". We present the first dataset - R2R-IE-CE - to benchmark instru…
G2VLM: Geometry Grounded Vision Language Model with Unified 3D Reconstruction and Spatial Reasoning
Official Repo of "SAIL-Recon: Large SfM by Augmenting Scene Regression with Localization"
Official implementation of "Dynam3D: Dynamic Layered 3D Tokens Empower VLM for Vision-and-Language Navigation" (NeurIPS'25 Oral)
The repo for "Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image" and "Metric3Dv2: A Versatile Monocular Geometric Foundation Model..."
IGGT: Instance-Grounded Geometry Transformer for Semantic 3D Reconstruction
[CVPR 2025 Highlight] PlanarSplatting: Accurate Planar Surface Reconstruction in 3 Minutes
[NeurIPS 2025] Official project page of the paper "PLANA3R: Zero-shot Metric Planar 3D Reconstruction via Feed-Forward Planar Splatting"
PlaneRCNN detects and reconstructs piece-wise planar surfaces from a single RGB image
An implementation of model parallel GPT-2 and GPT-3-style models using the mesh-tensorflow library.
Official repository for BrickGPT, the first approach for generating physically stable toy brick models from text prompts.
CoTracker is a model for tracking any point (pixel) on a video.
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
[CVPR'25 Highlight] Official repository of Sonata: Self-Supervised Learning of Reliable Point Representations
Python3 library for downloading YouTube Videos.
[NeurIPS 2025] Pixel-Perfect Depth
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
A collection of useful functions for 3D vision & graphics research in Python.
[ICLR 2025 Oral] NeuralPlane: Structured 3D Reconstruction in Planar Primitives with Neural Fields
A general and accurate MACs / FLOPs profiler for PyTorch models
RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation
GeoCalib: Learning Single-image Calibration with Geometric Optimization (ECCV 2024)
ViPE: Video Pose Engine for Geometric 3D Perception
[CVPR'25 Oral] MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision
Code of π^3: Permutation-Equivariant Visual Geometry Learning