[NeurIPS 2025] Pragmatic Heterogeneous Collaborative Perception via Generative Communication Mechanism
Junfei Zhou, Penglin Dai✉, Quanmin Wei, Bingyi Liu, Xiao Wu, Jianping Wang
This repository provides a unified multi-agent collaborative perception framework, extended from HEAL to support heterogeneous settings across sensors, modalities, and models. On top of HEAL, we add additional features, datasets, and multiple heterogeneous collaboration methods. Explore this repository for the full heterogeneous collaboration experience. 🌟
Inherited from HEAL
- Modality Support: LiDAR / Camera / LiDAR + Camera
- Heterogeneity Support: Sensor / Modality / Model
- Dataset Support: OPV2V / V2XSet / V2X-Sim 2.0 / DAIR-V2X-C
- Detector Support: PointPillars / SECOND / PIXOR / VoxelNet / Lift-Splat-Shoot
- Multiple collaborative perception methods
- Robustness Settings
  - Pose error
New in This Repository
- Multiple heterogeneous collaboration methods
- Additional Dataset Support
  - V2X-Real [ECCV 2024] (real-world dataset with 4 agents: 2 CAVs & 2 RSUs)
  - V2V4Real [CVPR 2023] (real-world dataset with 2 agents: 2 CAVs)
- Robustness Settings
  - Communication delay
  - Communication degradation
  - LiDAR simulation under snowy and foggy weather
- Two variants of AP computation; see details here
For dataset preparation, you can refer to the instructions from HEAL:
- OPV2V: Please refer to this repo. You also need to download additional-001.zip, which stores the data for the camera modality.
- OPV2V-H: We store our data on the Hugging Face Hub. Please refer to the Downloading datasets tutorial for usage.
- V2XSet: Please refer to this repo.
- V2X-Sim 2.0: Download the data from this page. Also download the pickle files from Google Drive.
- DAIR-V2X-C: Download the data from this page. We use the complemented annotations, so please also follow the instructions on this page.
Note that you only need to download the datasets you are interested in. OPV2V-H, DAIR-V2X-C, and V2X-Real are used in our experiments, so we recommend downloading and trying them first.
Create a dataset folder under GenComm and put your data there. Make the naming and structure consistent with the following:
GenComm/dataset
.
├── my_dair_v2x
│ ├── v2x_c
│ ├── v2x_i
│ └── v2x_v
├── OPV2V
│ ├── additional
│ ├── test
│ ├── train
│ └── validate
├── OPV2V_Hetero
│ ├── test
│ ├── train
│ └── validate
├── V2XSET
│ ├── test
│ ├── train
│ └── validate
├── v2xsim2-complete
│ ├── lidarseg
│ ├── maps
│ ├── sweeps
│ └── v1.0-mini
└── v2xsim2_info
├── v2xsim_infos_test.pkl
├── v2xsim_infos_train.pkl
└── v2xsim_infos_val.pkl
# create env
conda create -n gencomm python=3.8
conda activate gencomm
# install pytorch.
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.6 -c pytorch -c conda-forge
# install dependency
pip install -r requirements.txt # -i https://pypi.tuna.tsinghua.edu.cn/simple use this mirror if needed
# install this project. It's OK if EasyInstallDeprecationWarning shows up.
python setup.py develop
pip install spconv-cu116 # match your cudatoolkit version
python opencood/utils/setup.py build_ext --inplace
# OPTIONAL; in this repo, you can skip the following command
python opencood/pcdet_utils/setup.py build_ext --inplace
Note: By default, this repo uses spconv 2.x.
If you want to reproduce the checkpoints from the HEAL repo, you may encounter some bugs, especially when the LiDAR encoder is SECOND.
In that case, please refer to the HEAL repo to install spconv 1.2.1.
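After installation, a quick import check can catch a mismatched CUDA toolkit or spconv build early. The snippet below is a minimal optional sketch, not a script shipped with this repo:

```python
# Optional sanity check (not part of the repo): confirm the CUDA build of PyTorch
# is visible and that the spconv 2.x pytorch frontend imports cleanly.
import torch
import spconv.pytorch as spconv_pytorch

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("spconv.pytorch imported:", spconv_pytorch is not None)
```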
Following HEAL, we use identifiers such as m1, m2, ... to indicate the modality and model that an agent uses.
However, yaml files without identifiers like m1 (if you are familiar with the CoAlign repository) still work in this repository. For example, PointPillar Early Fusion.
Note that there will be some differences in the weight key names between the two kinds of checkpoints. For example, training with the m1 identifier prefixes some parameter names with encoder_m1., backbone_m1., etc. But since the model structures are the same, you can convert between them using the rename_model_dict_keys function in opencood/utils/model_utils.py.
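For illustration only, the conversion amounts to remapping state-dict keys and saving the result. The helper below and its exact renaming rule are assumptions made for this sketch; the supported utility is rename_model_dict_keys in opencood/utils/model_utils.py, whose signature may differ.

```python
# Illustrative sketch, assuming the .pth file stores a plain state dict.
# The supported way is rename_model_dict_keys in opencood/utils/model_utils.py.
import torch

def add_identifier_to_keys(ckpt_path, out_path, identifier="m1"):
    """Rename e.g. 'encoder.xxx' -> 'encoder_m1.xxx' so a checkpoint trained
    without identifiers matches a model built with the m1 identifier."""
    state_dict = torch.load(ckpt_path, map_location="cpu")
    renamed = {}
    for key, value in state_dict.items():
        head, _, tail = key.partition(".")
        renamed[f"{head}_{identifier}.{tail}" if tail else key] = value
    torch.save(renamed, out_path)

# add_identifier_to_keys("net_epoch_bestval_at17.pth", "net_m1.pth", "m1")
```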
- The identifiers like m1, m2 in opv2v_4modality.json are used to assign an agent type to each agent in the scene. With this assignment, we ensure the validation scenarios for all methods are consistent and fixed. To generate these json files, you can refer to heter_utils.py.
- The identifiers like m1, m2 in ${METHOD}.yaml are used to specify the sensor configuration and detection model used by this agent type (like m2 in the case of camera_pyramid.yaml).
In ${METHOD}.yaml there is also a mapping_dict. It maps the agent type assigned in opv2v_4modality.json to the agent type used in the current experiment. If all agent types in the mapping_dict are mapped to the same category, the setting is considered a homogeneous collaborative perception scenario.
Note that mapping_dict does not take effect during training, in order to introduce more data augmentation: each agent is randomly assigned one of the agent types that exist in the yaml.
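For intuition, the behavior of a mapping_dict once loaded is sketched below as a plain Python dict. The concrete values are illustrative, not a verbatim excerpt from any yaml in this repo; see opencood/hypes_yaml/exemplar.yaml for the real schema.

```python
# Illustrative only: the remapping that a mapping_dict describes.
mapping_dict = {
    "m1": "m1",  # m1 agents from opv2v_4modality.json keep their type
    "m2": "m2",  # m2 agents keep their type
    "m3": "m1",  # m3 and m4 agents are remapped, so only m1 and m2
    "m4": "m2",  # appear in this experiment's scenes
}

# Mapping every source type to the same target (e.g. all to "m1") makes the
# setting a homogeneous collaborative perception scenario.
assigned_type = "m3"                           # type given by opv2v_4modality.json
experiment_type = mapping_dict[assigned_type]  # -> "m1" in the current experiment
```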
We follow the basic training and testing command style of OpenCOOD and HEAL. These training and testing instructions apply to all end-to-end training methods.
We use yaml files to configure all training parameters. To train your own model from scratch or continue from a checkpoint, run the following commands:
python opencood/tools/train.py -y ${CONFIG_FILE} [--model_dir ${CHECKPOINT_FOLDER}]
Arguments Explanation:
- -y or hypes_yaml: the path of the training configuration file, e.g. opencood/hypes_yaml/opv2v/LiDAROnly/lidar_fcooper.yaml, meaning you want to train an FCooper model. We elaborate each entry of the yaml in the exemplar config file opencood/hypes_yaml/exemplar.yaml.
- model_dir (optional): the path of the checkpoint folder. This is used to fine-tune or continue training. When model_dir is given, the trainer will discard the hypes_yaml and load the config.yaml in the checkpoint folder. In this case, ${CONFIG_FILE} can be None.
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --use_env opencood/tools/train_ddp.py -y ${CONFIG_FILE} [--model_dir ${CHECKPOINT_FOLDER}]
--nproc_per_node indicates the number of GPUs you will use.
python opencood/tools/inference.py --model_dir ${CHECKPOINT_FOLDER} [--fusion_method intermediate]
inference.py has more optional arguments; you can inspect the file for details. [--fusion_method intermediate]: the default fusion method is intermediate fusion. According to your fusion strategy in training, the available fusion_method options are:
- single: only ego agent's detection, only ego's gt box. [only for late fusion dataset]
- no: only ego agent's detection, all agents' fused gt box. [only for late fusion dataset]
- late: late fusion detection from all agents, all agents' fused gt box. [only for late fusion dataset]
- early: early fusion detection from all agents, all agents' fused gt box. [only for early fusion dataset]
- intermediate: intermediate fusion detection from all agents, all agents' fused gt box. [only for intermediate fusion dataset]
All the baselines adopt a two-stage training strategy: first, training the collaborative base in a homogeneous setting for each agent type, and then training the baseline methods in a heterogeneous setting with different agent types.
Suppose you are now in the GenComm/ folder. If this is your first training attempt, execute mkdir opencood/logs. Then
mkdir opencood/logs/Baselines
mkdir opencood/logs/Baselines/stage1
mkdir opencood/logs/Baselines/stage1/OPV2V_m1_att
cp opencood/hypes_yaml/opv2v/GenComm_yamls/baselines/stage1/m1_att.yaml opencood/logs/Baselines/stage1/OPV2V_m1_att/config.yaml
CUDA_VISIBLE_DEVICES=x python opencood/tools/train.py -y None --model_dir opencood/logs/Baselines/stage1/OPV2V_m1_att/ # x is the index of GPUs
# you can also use DDP training:
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --use_env opencood/tools/train_ddp.py -y None --model_dir opencood/logs/Baselines/stage1/OPV2V_m1_att/ # PLEASE make sure nproc_per_node equals the number of GPUs
After the collaboration base training, you will obtain the best-validation checkpoint for each agent type (e.g., net_epoch_bestval_at17.pth). Now, in the heterogeneous collaboration stage:
- For CodeFilling and MPDA, when a new agent joins the collaboration, the entire module needs to be retrained. For example, we need to train combinations such as m1m2, m1m2m3, and m1m2m3m4.
- For BackAlign, when a new agent joins the collaboration, only the new agent's encoder needs to be retrained. We train m1m2, m1m3, and m1m4.
- For STAMP, when a new agent joins the collaboration, its adapter and reverter to the protocol space need to be trained. We train m0m1, m0m2, m0m3, and m0m4.
## CodeFilling & MPDA style
mkdir opencood/logs/Baselines/stage2/CodeFilling/OPV2V_m1m2_att
mkdir opencood/logs/Baselines/stage2/CodeFilling/OPV2V_m1m2m3_att
mkdir opencood/logs/Baselines/stage2/CodeFilling/OPV2V_m1m2m3m4_att
# Take m1m2m3 as an example
# copy config.yaml
cp opencood/hypes_yaml/opv2v/GenComm_yamls/baselines/stage2/CodeFilling/OPV2V_m1m2m3_att.yaml opencood/logs/Baselines/stage2/CodeFilling/OPV2V_m1m2m3_att/config.yaml
# combine ckpts from stage1
python opencood/tools/heal_tools.py merge_and_save \
opencood/logs/Baselines/stage1/OPV2V_m2_att \
opencood/logs/Baselines/stage1/OPV2V_m3_att \
opencood/logs/Baselines/stage1/OPV2V_m1_att \
opencood/logs/Baselines/stage2/CodeFilling/OPV2V_m1m2m3_att
# `python opencood/tools/heal_tools.py merge_and_save` will automatically search the best checkpoints for each folder and merge them together. The collaboration base's folder (m1 here) should be put in the second to last place, while the output folder should be put last.
# Then you can train the new agent types as below:
python opencood/tools/train.py -y None --model_dir opencood/logs/Baselines/stage2/CodeFilling/OPV2V_m1m2m3_att # you can also use DDP training
## BackAlign & STAMP style
mkdir opencood/logs/Baselines/stage2/BackAlign/OPV2V_m1m2_att
mkdir opencood/logs/Baselines/stage2/BackAlign/OPV2V_m1m3_att
mkdir opencood/logs/Baselines/stage2/BackAlign/OPV2V_m1m4_att
# Take m1m3 as an example
# copy config.yaml
cp opencood/hypes_yaml/opv2v/GenComm_yamls/baselines/stage2/BackAlign/m1m3_att.yaml opencood/logs/Baselines/stage2/BackAlign/OPV2V_m1m3_att/config.yaml
# combine ckpts from stage1
python opencood/tools/heal_tools.py merge_and_save \
opencood/logs/Baselines/stage1/OPV2V_m3_att \
opencood/logs/Baselines/stage1/OPV2V_m1_att \
opencood/logs/Baselines/stage2/BackAlign/OPV2V_m1m3_att
# Make sure that ego_dir is placed as the second-to-last argument, and the directory for saving the combined checkpoint is placed as the last argument.
# Then you can train the new agent types as below:
python opencood/tools/train.py -y None --model_dir opencood/logs/Baselines/stage2/BackAlign/OPV2V_m1m3_att # you can also use DDP training
At the inference stage, we consider two scenarios in our paper: static and dynamic.
- The static inference scenario usually includes two agent types, e.g., m1m2.
- The dynamic inference scenario refers to dynamically adding new agent types into the collaboration.
# you can run inference directly in any stage2 log_dir
CUDA_VISIBLE_DEVICES=x python opencood/tools/inference.py --model_dir opencood/logs/Baselines/stage2/BackAlign/OPV2V_m1m3_att
- CodeFilling & MPDA: use opencood/tools/inference_heter_in_order.py in the m1m2m3m4 log_dir of stage2.
CUDA_VISIBLE_DEVICES=x python opencood/tools/inference_heter_in_order.py --model_dir opencood/logs/Baselines/stage2/CodeFilling/OPV2V_m1m2m3m4_att --use_cav [2,3,4]
- BackAlign & STAMP: you need to first combine the four checkpoints as below, and then perform inference.
mkdir opencood/logs/Baselines/stage2/BackAlign/OPV2V_m1m2m3m4_att_infer/ # create a log folder for the final inference
cp opencood/hypes_yaml/opv2v/GenComm_yamls/baselines/stage2/BackAlign/m1m2m3m4_att_infer.yaml opencood/logs/Baselines/stage2/BackAlign/OPV2V_m1m2m3m4_att_infer/config.yaml
python opencood/tools/heal_tools.py merge_and_save \
opencood/logs/Baselines/stage2/BackAlign/OPV2V_m1m2_att \
opencood/logs/Baselines/stage2/BackAlign/OPV2V_m1m3_att \
opencood/logs/Baselines/stage2/BackAlign/OPV2V_m1m4_att \
opencood/logs/Baselines/stage1/OPV2V_m1_att \
opencood/logs/Baselines/stage2/BackAlign/OPV2V_m1m2m3m4_att_infer
python opencood/tools/inference_heter_in_order.py --model_dir opencood/logs/Baselines/stage2/BackAlign/OPV2V_m1m2m3m4_att_infer
This will overwrite many parameters in config.yaml, including mapping_dict and comm_range, and will gradually add m1, m2, m3, m4 agents into the scene. The ground truth will always be max_cav's fused gt boxes.
mkdir opencood/logs/GenComm/stage1/OPV2V_m1_att
mkdir opencood/logs/GenComm/stage1/OPV2V_m2_att
mkdir opencood/logs/GenComm/stage1/OPV2V_m3_att
mkdir opencood/logs/GenComm/stage1/OPV2V_m4_att
# Take m1 as an example
# copy the config.yaml
cp opencood/hypes_yaml/opv2v/GenComm_yamls/gencomm/stage1/m1_att_diffcomm.yaml opencood/logs/GenComm/stage1/OPV2V_m1_att/config.yaml
# Train
CUDA_VISIBLE_DEVICES=x python opencood/tools/train.py -y None --model_dir opencood/logs/GenComm/stage1/OPV2V_m1_att
mkdir opencood/logs/GenComm/stage2/OPV2V_m1m2_att
mkdir opencood/logs/GenComm/stage2/OPV2V_m1m3_att
mkdir opencood/logs/GenComm/stage2/OPV2V_m1m4_att
# Take m1m2 as an example
cp opencood/hypes_yaml/opv2v/GenComm_yamls/gencomm/stage2/m1m2_att.yaml opencood/logs/GenComm/stage2/OPV2V_m1m2_att/config.yaml
python opencood/tools/heal_tools.py merge_and_save \
opencood/logs/GenComm/stage1/OPV2V_m2_att \
opencood/logs/GenComm/stage1/OPV2V_m1_att \
opencood/logs/GenComm/stage2/OPV2V_m1m2_att
Refer to the inference style of BackAlign above.
Attention: there are dedicated training and inference scripts for certain datasets and methods, such as train_stamp.py, inference_v2xreal.py, and inference_v2xreal_heter_in_order.py. Please make sure to use the corresponding script.
- Stage1: GenComm base training under homogeneous collaboration
- Stage2: training of the specific Deformable Message Extractors (DMEs) under heterogeneous collaboration
- New agents join the collaboration by reaching a consensus with other vendors and training specific DMEs.
@article{zhou2025pragmatic,
title={Pragmatic Heterogeneous Collaborative Perception via Generative Communication Mechanism},
author={Zhou, Junfei and Dai, Penglin and Wei, Quanmin and Liu, Bingyi and Wu, Xiao and Wang, Jianping},
journal={arXiv preprint arXiv:2510.19618},
year={2025}
}
This repository is built upon the excellent foundations of OpenCOOD, HEAL, and V2X-R. We sincerely appreciate @yifanlu0227, @DerrickXuNu and @ylwhxht for their outstanding contributions to the community.