Chenxu Dang, Huazhong University of Science and Technology. Verified email at hust.edu.cn. Cited by 37.
FASTer: Focal Token Acquiring-and-Scaling Transformer for Long-term 3D Object Detection
Recent top-performing temporal 3D detectors based on LiDAR have increasingly adopted
region-based paradigms. They first generate coarse proposals, followed by encoding and …
SparseWorld: A Flexible, Adaptive, and Efficient 4D Occupancy World Model Powered by Sparse and Dynamic Queries
Semantic occupancy has emerged as a powerful representation in world models for its ability
to capture rich spatial semantics. However, most existing occupancy world models rely on …
SDGOCC: Semantic and Depth-Guided Bird's-Eye View Transformation for 3D Multimodal Occupancy Prediction
Multimodal 3D occupancy prediction has garnered significant attention for its potential in
autonomous driving. However, most existing approaches are single-modality: camera-based …
Vggdrive: Empowering vision-language models with cross-view geometric grounding for autonomous driving
The significance of cross-view 3D geometric modeling capabilities for autonomous driving
is self-evident, yet existing Vision-Language Models (VLMs) inherently lack this capability, …
Drivefine: Refining-Augmented Masked Diffusion VLA for Precise and Robust Driving
Vision-Language-Action (VLA) models for autonomous driving increasingly adopt generative
planners trained with imitation learning followed by reinforcement learning. Diffusion-…
MiMo-Embodied: X-Embodied Foundation Model Technical Report
We open-source MiMo-Embodied, the first cross-embodied foundation model to successfully
integrate and achieve state-of-the-art performance in both Autonomous Driving and …
SparseOccVLA: Bridging Occupancy and Vision-Language Models via Sparse Queries for Unified 4D Scene Understanding and Planning
In autonomous driving, Vision Language Models (VLMs) excel at high-level reasoning,
whereas semantic occupancy provides fine-grained details. Despite significant progress in …
OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation
Chain-of-Thought (CoT) reasoning has become a powerful driver of trajectory prediction in
VLA-based autonomous driving, yet its autoregressive nature imposes a latency cost that is …
MiMo-Embodied: X-Embodied Foundation Model
MiMo-Embodied is a groundbreaking cross-embodied foundation model that integrates
indoor robotics and outdoor autonomous driving in one model, effectively addressing the …
SAMoE-VLA: A Scene Adaptive Mixture-of-Experts Vision-Language-Action Model for Autonomous Driving
Recent advances in Vision-Language-Action (VLA) models have shown promising capabilities
in autonomous driving by leveraging the understanding and reasoning strengths of Large …