Chenxu Dang

Huazhong University of Science and Technology
Verified email at hust.edu.cn
Cited by 37

FASTer: Focal Token Acquiring-and-Scaling Transformer for Long-term 3D Object Detection

C Dang, Z Duan, P An, X Zhang… - Proceedings of the …, 2025 - openaccess.thecvf.com
Recent top-performing temporal 3D detectors based on Lidars have increasingly adopted
region-based paradigms. They first generate coarse proposals, followed by encoding and …

SparseWorld: A Flexible, Adaptive, and Efficient 4D Occupancy World Model Powered by Sparse and Dynamic Queries

C Dang, H Liu, J Bao, P An, X Tang, A Pan… - Proceedings of the …, 2026 - ojs.aaai.org
Semantic occupancy has emerged as a powerful representation in world models for its ability
to capture rich spatial semantics. However, most existing occupancy world models rely on …

SDGOCC: Semantic and Depth-Guided Bird's-Eye View Transformation for 3D Multimodal Occupancy Prediction

ZP Duan, CX Dang, X Hu, P An… - Proceedings of the …, 2025 - openaccess.thecvf.com
Multimodal 3D occupancy prediction has garnered significant attention for its potential in
autonomous driving. However, most existing approaches are single-modality: camera-based …

VGGDrive: Empowering Vision-Language Models with Cross-View Geometric Grounding for Autonomous Driving

J Wang, G Li, Z Huang, C Dang, H Ye, Y Han… - arXiv preprint arXiv …, 2026 - arxiv.org
The significance of cross-view 3D geometric modeling capabilities for autonomous driving
is self-evident, yet existing Vision-Language Models (VLMs) inherently lack this capability, …

DriveFine: Refining-Augmented Masked Diffusion VLA for Precise and Robust Driving

C Dang, S Ang, Y Li, H Tian, J Wang, G Li, H Ye… - arXiv preprint arXiv …, 2026 - arxiv.org
Vision-Language-Action (VLA) models for autonomous driving increasingly adopt generative
planners trained with imitation learning followed by reinforcement learning. Diffusion-…

MiMo-Embodied: X-Embodied Foundation Model Technical Report

…, S Ren, X Meng, Y Zhang, J Wu, J Lu, C Dang… - arXiv preprint arXiv …, 2025 - arxiv.org
We open-source MiMo-Embodied, the first cross-embodied foundation model to successfully
integrate and achieve state-of-the-art performance in both Autonomous Driving and …

SparseOccVLA: Bridging Occupancy and Vision-Language Models via Sparse Queries for Unified 4D Scene Understanding and Planning

C Dang, J Wang, G Li, Z Hou, Z You, H Ye, J Ma… - arXiv preprint arXiv …, 2026 - arxiv.org
In autonomous driving, Vision Language Models (VLMs) excel at high-level reasoning,
whereas semantic occupancy provides fine-grained details. Despite significant progress in …

OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation

…, Y Li, H Wang, S Xu, Y Luo, F Li, C Dang… - arXiv preprint arXiv …, 2026 - arxiv.org
Chain-of-Thought (CoT) reasoning has become a powerful driver of trajectory prediction in
VLA-based autonomous driving, yet its autoregressive nature imposes a latency cost that is …

MiMo-Embodied: X-Embodied Foundation Model

…, Z Lu, S Ren, X Meng, Y Zhang, J Wu, J Lu, C Dang… - 2026 - researchsquare.com
MiMo-Embodied is a groundbreaking cross-embodied foundation model that integrates
indoor robotics and outdoor autonomous driving in one model, effectively addressing the …

SAMoE-VLA: A Scene Adaptive Mixture-of-Experts Vision-Language-Action Model for Autonomous Driving

Z You, H Liu, C Dang, Z Wang, S Ang, A Wang… - arXiv preprint arXiv …, 2026 - arxiv.org
Recent advances in Vision-Language-Action (VLA) models have shown promising capabilities
in autonomous driving by leveraging the understanding and reasoning strengths of Large …