zhangzaibin/AD-H

🚗 AD-H: Autonomous Driving with Hierarchical Agents

Language-guided Autonomous Driving with Hierarchical Agents

arXiv: 2406.03474

🔥 Updates

  • [2024-06] AD-H is released on arXiv.
  • [2026-05] Inference/evaluation code is released.

📖 Overview

AD-H is a hierarchical multi-agent framework for language-guided autonomous driving that explicitly separates high-level decision-making from low-level vehicle execution:

  • 🧠 MLLM Planner (LLaVA-7B-v1.5 / Mipha-3B): interprets natural-language commands and environmental context to generate coherent mid-level driving instructions (e.g., "Approaching a junction, prepare to follow traffic rules. Slow down and make a slight left turn.").
  • ⚡ Lightweight Controller (OPT-350M): converts mid-level instructions into precise, continuous control signals via waypoints.

Instead of using a single end-to-end MLLM to map language directly to actions, AD-H uses this decomposition to exploit the reasoning ability of MLLMs while keeping actuation stable. Even the Mipha-3B variant outperforms 7B-scale single-agent baselines (3B + 350M vs. 7B parameters).
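The planner/controller split can be sketched as two functions chained together. This is a minimal illustration, not the repository's code: `HierarchicalAgent`, `toy_planner`, and `toy_controller` are hypothetical names, the string rules stand in for the MLLM and OPT-350M models, and the real controller also consumes camera features, which are omitted here.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# (x forward, y lateral) in the ego-vehicle frame; units and sign are assumptions.
Waypoint = Tuple[float, float]


@dataclass
class HierarchicalAgent:
    """Hypothetical wrapper that chains a planner and a controller."""
    planner: Callable[[str, str], str]           # (command, scene) -> mid-level instruction
    controller: Callable[[str], List[Waypoint]]  # mid-level instruction -> waypoints

    def step(self, command: str, scene: str) -> Tuple[str, List[Waypoint]]:
        instruction = self.planner(command, scene)   # high-level reasoning
        waypoints = self.controller(instruction)     # low-level execution
        return instruction, waypoints


def toy_planner(command: str, scene: str) -> str:
    """Stand-in for the MLLM planner: picks a mid-level instruction by keyword."""
    if "junction" in scene:
        return ("Approaching a junction, prepare to follow traffic rules. "
                "Slow down and make a slight left turn.")
    return "Maintain current speed. Keep steering straight."


def toy_controller(instruction: str) -> List[Waypoint]:
    """Stand-in for the controller: maps instruction text to four waypoints."""
    lateral = -0.2 if "left" in instruction else 0.0      # assumed: left is negative y
    spacing = 0.5 if "Slow down" in instruction else 1.0  # closer waypoints = slower
    return [(spacing * i, lateral * i) for i in range(1, 5)]


agent = HierarchicalAgent(planner=toy_planner, controller=toy_controller)
instruction, waypoints = agent.step("Turn left at the next intersection", "junction ahead")
```

The point of the sketch is the interface: the planner never emits control values, and the controller never sees the original high-level command, only the mid-level instruction.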

✨ Key Highlights

  • 🏆 Outperforms state-of-the-art 7B-scale single-agent baselines with roughly half the parameters (3B + 350M)
  • 🔄 Strong emergent generalization: self-corrects in unseen corner cases (e.g., oversteering)
  • Robust long-horizon instruction following: coherent planning across extended temporal sequences
  • 📊 1.15M hierarchical annotation pairs: constructed via a rule-based pipeline from 26 atomic sub-commands across the Perception, Speed, Steer, and Brake dimensions

🗂️ Method

| Component | Model | Role |
| --- | --- | --- |
| Planner | LLaVA-7B-v1.5 / Mipha-3B | Decomposes high-level instructions → mid-level driving commands |
| Controller | OPT-350M + Vision Encoder (R50 + Q-Former) | Decodes mid-level commands → waypoints → control signals (PID) |
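The last stage of the controller row (waypoints → control signals via PID) can be illustrated with a generic PID loop. The sketch below is an assumption-laden example, not the repository's implementation: the gains, the 0.5 s waypoint interval, the aim-point choice, and the sign conventions are all hypothetical.

```python
import math
from typing import List, Optional, Tuple

Waypoint = Tuple[float, float]  # (x forward, y lateral) in metres, ego frame (assumed)


def clamp(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


class PID:
    """Textbook PID controller with a running integral and a finite-difference derivative."""

    def __init__(self, kp: float, ki: float, kd: float) -> None:
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error: Optional[float] = None

    def step(self, error: float, dt: float) -> float:
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def waypoints_to_control(
    waypoints: List[Waypoint],
    speed: float,
    turn_pid: PID,
    speed_pid: PID,
    wp_interval: float = 0.5,  # assumed time between consecutive waypoints (s)
    dt: float = 0.05,          # assumed control period (s)
) -> Tuple[float, float, float]:
    """Return (steer, throttle, brake) from the first two predicted waypoints."""
    (x0, y0), (x1, y1) = waypoints[0], waypoints[1]
    # Lateral control: steer toward the midpoint of the first two waypoints.
    aim_x, aim_y = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    heading_error = math.atan2(aim_y, aim_x)
    steer = clamp(turn_pid.step(heading_error, dt), -1.0, 1.0)
    # Longitudinal control: waypoint spacing implies a target speed.
    target_speed = math.hypot(x1 - x0, y1 - y0) / wp_interval
    accel = speed_pid.step(target_speed - speed, dt)
    throttle = clamp(accel, 0.0, 1.0)
    brake = clamp(-accel, 0.0, 1.0)
    return steer, throttle, brake
```

Decoding to waypoints first and leaving the final actuation to a PID loop keeps the learned model's output in a geometric space, while the smoothness of steering and throttle is handled by a small, tunable controller.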

Mid-level commands are composed from 26 atomic sub-commands across four dimensions:

| Dimension | Examples |
| --- | --- |
| 🚦 Perception | "There is a pedestrian ahead" / "Traffic light is red" |
| 🏎️ Speed | "Maintain current speed" / "Accelerate gradually" |
| 🔀 Steer | "Make a slight left turn" / "Keep steering straight" |
| 🛑 Brake | "Apply brakes safely" |

These sub-commands combine to produce 170+ distinct mid-level driving commands.
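To make the composition idea concrete, the sketch below builds mid-level commands by picking at most one sub-command per dimension and joining them into a sentence. It uses only the seven example sub-commands from the table, not the full set of 26, and the combination rule is an assumption: the actual rule-based pipeline and its constraints are not described here, so the count it produces is unrelated to the 170+ figure.

```python
from itertools import product
from typing import Dict, List, Optional

# Only the example sub-commands listed in the table above, not the full set of 26.
SUB_COMMANDS: Dict[str, List[str]] = {
    "perception": ["There is a pedestrian ahead", "Traffic light is red"],
    "speed": ["Maintain current speed", "Accelerate gradually"],
    "steer": ["Make a slight left turn", "Keep steering straight"],
    "brake": ["Apply brakes safely"],
}
DIMENSIONS = ("perception", "speed", "steer", "brake")


def compose(
    perception: Optional[str] = None,
    speed: Optional[str] = None,
    steer: Optional[str] = None,
    brake: Optional[str] = None,
) -> str:
    """Join the chosen sub-commands into a single mid-level command string."""
    parts = [p for p in (perception, speed, steer, brake) if p]
    return ". ".join(parts) + "."


def enumerate_commands(sub_commands: Dict[str, List[str]]) -> List[str]:
    """Assumed rule: each dimension contributes zero or one sub-command, at least one overall."""
    options = [[None] + sub_commands[d] for d in DIMENSIONS]
    return [compose(*combo) for combo in product(*options) if any(combo)]


commands = enumerate_commands(SUB_COMMANDS)
example = compose(perception="Traffic light is red", brake="Apply brakes safely")
```

Even this toy vocabulary yields dozens of distinct commands, which shows how a small set of atomic sub-commands can cover a much larger space of mid-level instructions.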

🚀 Validation

For convenience, the model is quantized at inference time so that the CARLA simulator and the model can run together on a single GPU with 24 GB of VRAM.

💡 To validate the full (non-quantized) model on a GPU with more VRAM, change planner_load_8bit / planner_load_4bit in team_code/adh_agent_config.py. For multi-GPU setups, set planner_device, controller_device, and CARLA_DEVICE accordingly.
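As a rough illustration of how these settings interact, the sketch below shows one possible set of values and a small consistency check. Only the four setting names come from this README. The value formats (booleans, "cuda:N" strings), the helper `resolve_planner_precision`, and the rule that the two quantization flags are mutually exclusive are assumptions, so check team_code/adh_agent_config.py for the real layout. CARLA_DEVICE is left out because where it is set is not stated here.

```python
# Hypothetical values; the real team_code/adh_agent_config.py may be organised differently.
planner_load_8bit = False        # assumed boolean flag
planner_load_4bit = False        # assumed boolean flag
planner_device = "cuda:0"        # assumed PyTorch-style device string
controller_device = "cuda:1"     # assumed PyTorch-style device string


def resolve_planner_precision(load_8bit: bool, load_4bit: bool) -> str:
    """Hypothetical helper: report which precision a pair of flags selects."""
    if load_8bit and load_4bit:
        # Assumption: at most one quantization mode should be enabled at a time.
        raise ValueError("Enable at most one of planner_load_8bit / planner_load_4bit")
    if load_4bit:
        return "4-bit"
    if load_8bit:
        return "8-bit"
    return "full"


precision = resolve_planner_precision(planner_load_8bit, planner_load_4bit)
```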

🖥️ Hardware & System

  • GPU with at least 24GB VRAM
  • Ubuntu 22.04 / 20.04

📦 Installation

git clone https://github.com/zhangzaibin/AD-H.git
cd AD-H
conda env create -f requirements.yml   # If this fails, try a different conda channel or mirror
conda activate adh

🎮 Download CARLA 0.9.15 & Additional Maps

Download CARLA 0.9.15 and AdditionalMaps 0.9.15 from the CARLA release links, or via the command line:

wget https://carla-releases.b-cdn.net/Linux/CARLA_0.9.15.tar.gz
wget https://carla-releases.b-cdn.net/Linux/AdditionalMaps_0.9.15.tar.gz
mkdir CARLA_0.9.15
tar -xvf CARLA_0.9.15.tar.gz -C CARLA_0.9.15
mv AdditionalMaps_0.9.15.tar.gz CARLA_0.9.15/Import
cd CARLA_0.9.15 && bash ImportAssets.sh && cd ..

🔻 Download Pre-trained Models

TODO: checkpoints will be released soon.

Expected ./checkpoints structure:

checkpoints
    ├── llava15-ours/
    ├── opt-350m-ours/
    ├── opt-350m-ours.pth
    └── vision_weights/
        └── vision-encoder-r50.pth.tar
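Before launching an evaluation, it can help to confirm that the directory matches this layout. The following is a small, hypothetical helper, not part of the repository: it only checks that the four listed entries exist under a given root.

```python
from pathlib import Path
from typing import List, Union

# Entries taken from the expected ./checkpoints structure shown above.
EXPECTED_ENTRIES = [
    "llava15-ours",
    "opt-350m-ours",
    "opt-350m-ours.pth",
    "vision_weights/vision-encoder-r50.pth.tar",
]


def missing_checkpoints(root: Union[str, Path] = "checkpoints") -> List[str]:
    """Return the expected entries that are absent under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED_ENTRIES if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoint entries:", ", ".join(missing))
    else:
        print("All expected checkpoint entries found.")
```

If the weights live elsewhere, pass that directory as `root`; the paths used by the agent itself are still the ones configured in team_code/adh_agent_config.py.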

▶️ Run Evaluation

Place weights in checkpoints/ or modify paths in team_code/adh_agent_config.py, then:

conda activate adh
bash adh_evaluation.sh

📋 TODO

  • Release pre-trained model checkpoints
  • Release training code & dataset
  • Release full training data (1.15M hierarchical annotation pairs)

📄 Citation

@article{zhang2024adh,
  title={AD-H: Language-guided Autonomous Driving with Hierarchical Agents},
  author={Zhang, Zaibin and Fu, Talas and Tang, Shiyu and Zhang, Yuanhang and Wang, Yifan and Wang, Lijun and Lu, Huchuan},
  journal={arXiv preprint arXiv:2406.03474},
  year={2024}
}

📜 License

This project is released under the MIT License.
