Chenxu Dang, Huazhong University of Science and Technology. Verified email at hust.edu.cn. Cited by 37.
FASTer: Focal Token Acquiring-and-Scaling Transformer for Long-term 3D Object Detection
Recent top-performing temporal 3D detectors based on LiDAR have increasingly adopted
region-based paradigms. They first generate coarse proposals, followed by encoding and …
SparseWorld: A Flexible, Adaptive, and Efficient 4D Occupancy World Model Powered by Sparse and Dynamic Queries
Semantic occupancy has emerged as a powerful representation in world models for its ability
to capture rich spatial semantics. However, most existing occupancy world models rely on …
SDGOCC: Semantic and Depth-Guided Bird's-Eye View Transformation for 3D Multimodal Occupancy Prediction
Multimodal 3D occupancy prediction has garnered significant attention for its potential in
autonomous driving. However, most existing approaches are single-modality: camera-based …
Vggdrive: Empowering vision-language models with cross-view geometric grounding for autonomous driving
The significance of cross-view 3D geometric modeling capabilities for autonomous driving
is self-evident, yet existing Vision-Language Models (VLMs) inherently lack this capability, …
Drivefine: Refining-Augmented Masked Diffusion VLA for Precise and Robust Driving
Vision-Language-Action (VLA) models for autonomous driving increasingly adopt generative
planners trained with imitation learning followed by reinforcement learning. Diffusion-…
MiMo-Embodied: X-Embodied Foundation Model Technical Report
We open-source MiMo-Embodied, the first cross-embodied foundation model to successfully
integrate and achieve state-of-the-art performance in both Autonomous Driving and …
SparseOccVLA: Bridging Occupancy and Vision-Language Models via Sparse Queries for Unified 4D Scene Understanding and Planning
In autonomous driving, Vision Language Models (VLMs) excel at high-level reasoning,
whereas semantic occupancy provides fine-grained details. Despite significant progress in …
OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation
Chain-of-Thought (CoT) reasoning has become a powerful driver of trajectory prediction in
VLA-based autonomous driving, yet its autoregressive nature imposes a latency cost that is …
MiMo-Embodied: X-Embodied Foundation Model
MiMo-Embodied is a groundbreaking cross-embodied foundation model that integrates
indoor robotics and outdoor autonomous driving in one model, effectively addressing the …
SAMoE-VLA: A Scene Adaptive Mixture-of-Experts Vision-Language-Action Model for Autonomous Driving
Recent advances in Vision-Language-Action (VLA) models have shown promising capabilities
in autonomous driving by leveraging the understanding and reasoning strengths of Large …