- Nanjing University of Science and Technology
- Nanjing, China
- rayn-wu.github.io/
Stars
Official repo of "Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens"
DrivePI: Spatial-aware 4D MLLM for Unified Autonomous Driving Understanding, Perception, Prediction and Planning
🌐 Forging Spatial Intelligence: A Survey on Multi-Modal Pre-Training for Autonomous Systems
[ICCV 2025] AGO: Adaptive Grounding for Open World 3D Occupancy Prediction
Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning
SparseWorld-TC: Trajectory-Conditioned Sparse Occupancy World Model
OccSTeP: Benchmarking 4D Occupancy Spatio-Temporal Persistence
GaussianFormer with Semantic Render & Multi-Frame Supervision
💫 [CVPR 2024] LiDAR4D: Dynamic Neural Fields for Novel Space-time View LiDAR Synthesis
Code and Data for "Depth Based Semantic Scene Completion with Position Importance Aware Loss", ICRA2020 and RAL
Not All Pixels Are Equal: Learning Hardness Probability for Semantic Segmentation.
[NeurIPS2025 Spotlight] Implementation of "GaussianFusion: Gaussian-Based Multi-Sensor Fusion for End-to-End Autonomous Driving"
Collects papers on autonomous driving E2E learning and VLM/VLA, with organized research branches and trends in these fields.
🌐 WorldLens: Full-Spectrum Evaluations of Driving World Models in Real World
FB-OCC & FlashOcc Enhanced with Spatial Retrieval
Devkit, Dataset Curation Code, and Dataset (nuScenes-Geography) for Spatial Retrieval Augmented Autonomous Driving
IGGT: Instance-Grounded Geometry Transformer for Semantic 3D Reconstruction
G2VLM: Geometry Grounded Vision Language Model with Unified 3D Reconstruction and Spatial Reasoning
Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos
Official implementation of Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning
This is the official repository for the AAAI 2026 paper "DriveFlow: Rectified Flow Adaptation for Robust 3D Object Detection in Autonomous Driving"
Code for "Object as Query: Lifting any 2D Object Detector to 3D Detection"
Enhancing 3D Object Detection with 2D Detection-Guided Query Anchors, CVPR 2024
[AAAI'26] BEVDilation: LiDAR-Centric Multi-Modal Fusion for 3D Object Detection
Official implementation of Lotus-2: Advancing Geometric Dense Prediction with Powerful Image Generative Model
[NeurIPS 2025 Oral] Representation Entanglement for Generation: Training Diffusion Transformers Is Much Easier Than You Think
Official implementation of SRCN3D: Sparse R-CNN 3D Surround-View Cameras 3D Object Detection and Tracking for Autonomous Driving
Learning to Drive via Real-World Simulation at Scale