University of Chinese Academy of Sciences
Beijing (UTC +08:00)
https://luoxubo.github.io/
@xubo_luo
Lists (24)
Attention mechanism
Autonomous driving
clip
Efficiency
ekf
Event Camera
Facial expression recognition
flow matching
Homography Estimation
IELTS
Image fusion
Image matching
Some nice image matching related works
Image retrieval
Lab homepage
Some nice templates for lab homepages
Learning
Multi-sensor localization
NeRF
Paper codes
Pose estimation
Segmentation
SLAM with deep learning
Tracking
Object tracking repos.
Visual localization
world model
Starred repositories
Official implementation of NavMorph: A Self-Evolving World Model for Vision-and-Language Navigation in Continuous Environments (ICCV'25).
Official implementation of paper "Unified World Models: Memory-Augmented Planning and Foresight for Visual Navigation"
A list of works on video generation towards world model
A Collection of Diffusion for Path Planning Papers, Toolboxes and Notes.
WorldPlay: Interactive World Modeling with Real-Time Latency and Geometric Consistency
Official code release for DynoSAM: Dynamic Object Smoothing And Mapping. Accepted to Transactions on Robotics (Visual SLAM SI). A visual SLAM framework and pipeline for dynamic environments, estimatin…
[TRO 2025] NeuPAN: Direct Point Robot Navigation with End-to-End Model-based Learning.
[CVPR 2025 Highlight] MATCHA: Towards Matching Anything
Official repository for "RLVR-World: Training World Models with Reinforcement Learning" (NeurIPS 2025), https://arxiv.org/abs/2505.13934
Vision-Language Navigation Benchmark in Isaac Lab
OmniVGGT: Omni-Modality Driven Visual Geometry Grounded Transformer
Collect some World Models for Autonomous Driving (and Robotic, etc.) papers.
[IROS 2025 oral] Official implementation of NOLO: Navigate Only Look Once
A frontier collection and survey of vision-language model papers and models, maintained as a GitHub repository. Continuously updated.
[AAAI 2024] Mono3DVG: 3D Visual Grounding in Monocular Images
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
Official code release for CoRL'25 paper: VT-Refine: Learning Bimanual Assembly with Visuo-Tactile Feedback via Simulation Fine-Tuning
Repository of the paper "AnyUp: Universal Feature Upsampling".
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
Reading list for research topics in embodied vision
This is the code for the IROS2025 RoboSense challenge track1: LLM for Driving
[ACMMM 2025] Official implementation of SeqVLM: Proposal-Guided Multi-View Sequences Reasoning via VLM for Zero Shot 3D Visual Grounding
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
SLAM-Former: Putting SLAM into One Transformer
[CVPR'25] SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding