GenComm (Generative Communication)

[NeurIPS 2025] Pragmatic Heterogeneous Collaborative Perception via Generative Communication Mechanism

Junfei Zhou, Penglin Dai✉, Quanmin Wei, Bingyi Liu, Xiao Wu, Jianping Wang

Homepage | ArXiv | Poster

This repository provides a unified and integrated multi-agent collaborative perception framework, extended from HEAL to support heterogeneous settings across sensors, modalities, and models. Building on HEAL, we add support for additional features, datasets, and multiple heterogeneous collaboration methods. Explore this repository for the full heterogeneous collaboration experience. 🌟

GenComm Teaser

Repo Features

Inherited from HEAL

What's new 🌟

Data Preparation

You can refer to the information from HEAL:

  • OPV2V: Please refer to this repo. You also need to download additional-001.zip, which stores the data for the camera modality.
  • OPV2V-H: We store our data on the Hugging Face Hub. Please refer to the Downloading datasets tutorial for usage.
  • V2XSet: Please refer to this repo.
  • V2X-Sim 2.0: Download the data from this page. Also download the pickle files from Google Drive.
  • DAIR-V2X-C: Download the data from this page. We use the complemented annotations, so please also follow the instructions on this page.

Note that you only need to download the datasets you are interested in. OPV2V-H, DAIR-V2X-C, and V2X-Real are used in our experiments, so we recommend downloading and trying them first.

Create a dataset folder under GenComm and put your data there. Make the naming and structure consistent with the following:

GenComm/dataset

. 
├── my_dair_v2x 
│   ├── v2x_c
│   ├── v2x_i
│   └── v2x_v
├── OPV2V
│   ├── additional
│   ├── test
│   ├── train
│   └── validate
├── OPV2V_Hetero
│   ├── test
│   ├── train
│   └── validate
├── V2XSET
│   ├── test
│   ├── train
│   └── validate
├── v2xsim2-complete
│   ├── lidarseg
│   ├── maps
│   ├── sweeps
│   └── v1.0-mini
└── v2xsim2_info
    ├── v2xsim_infos_test.pkl
    ├── v2xsim_infos_train.pkl
    └── v2xsim_infos_val.pkl

Installation

# create env
conda create -n gencomm python=3.8
conda activate gencomm
# install pytorch. 
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.6 -c pytorch -c conda-forge
# install dependency
pip install -r requirements.txt # -i https://pypi.tuna.tsinghua.edu.cn/simple use this mirror if needed

# install this project. It's OK if EasyInstallDeprecationWarning shows up.
python setup.py develop

pip install spconv-cu116 # match your cudatoolkit version
python opencood/utils/setup.py build_ext --inplace

# OPTIONAL; in this repo, you can skip the following command
python opencood/pcdet_utils/setup.py build_ext --inplace

Note: By default, this repo uses spconv 2.x.
If you want to reproduce the checkpoints from the HEAL repo, you may encounter some bugs, especially when the LiDAR encoder is SECOND.
In that case, please refer to the HEAL repo to install spconv 1.2.1.

Yaml Style

Following HEAL, we use identifiers such as m1, m2, ... to indicate the modalities and models that an agent will use.

However, yaml files without identifiers like m1 (if you are familiar with the CoAlign repository) still work in this repository. For example, PointPillar Early Fusion.

Note that the two styles produce different weight key names in their checkpoints. For example, training with the m1 identifier prefixes some parameter names with encoder_m1., backbone_m1., etc. Since the model structures are the same, you can convert between them using the rename_model_dict_keys function in opencood/utils/model_utils.py.
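As a rough illustration of what this conversion does (a minimal sketch only; the repo's rename_model_dict_keys may have a different signature), the renaming simply rewrites state-dict keys with or without the identifier suffix:

```python
# Minimal sketch only -- the actual helper is rename_model_dict_keys in
# opencood/utils/model_utils.py; its exact signature may differ.
import torch

def add_identifier_prefix(state_dict, identifier="m1", modules=("encoder", "backbone")):
    """Rename plain keys (e.g. 'encoder.conv.weight') to identifier-style
    keys (e.g. 'encoder_m1.conv.weight')."""
    renamed = {}
    for key, value in state_dict.items():
        head, dot, tail = key.partition(".")
        if head in modules and dot:
            key = f"{head}_{identifier}.{tail}"
        renamed[key] = value
    return renamed

# Hypothetical checkpoint paths, for illustration only.
ckpt = torch.load("net_epoch_bestval_at17.pth", map_location="cpu")
torch.save(add_identifier_prefix(ckpt), "net_epoch_bestval_at17_m1.pth")
```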

Agent type identifier

  • The identifiers like m1, m2 in opv2v_4modality.json are used to assign agent type to each agent in the scene. With this assignment, we ensure the validation scenarios for all methods are consistent and fixed. To generate these json files, you can refer to heter_utils.py.

  • The identifiers like m1, m2 in ${METHOD}.yaml are used to specify the sensor configuration and detection model used by this agent type (like m2 in the case of camera_pyramid.yaml).

In ${METHOD}.yaml, there is also a concept of mapping_dict. It maps the given agent type of opv2v_4modality.json to the agent type in the current experiment. If all agent types in the mapping_dict are mapped to the same category, the setting is considered a homogeneous collaborative perception scenario.

Note that mapping_dict does not take effect during the training process, in order to introduce more data augmentation: each agent is instead randomly assigned one of the agent types present in the yaml.
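For intuition, here is a minimal sketch of the mapping semantics, written as plain Python rather than the repo's yaml/loader code; the dictionary contents are hypothetical:

```python
# Hypothetical mapping_dict for illustration: keys are the agent types fixed by
# opv2v_4modality.json, values are the agent types used in the current experiment.
mapping_dict = {"m1": "m1", "m2": "m1", "m3": "m2", "m4": "m2"}

def resolve_agent_type(json_type: str) -> str:
    """Remap the fixed json assignment at validation/inference time.
    (During training this mapping is skipped and types are sampled randomly.)"""
    return mapping_dict[json_type]

# If every value is the same (e.g. all mapped to "m1"), the evaluated scenario
# is effectively a homogeneous collaborative perception setting.
assert resolve_agent_type("m2") == "m1"
```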

Basic Train / Test Command

We follow the basic training and testing command style of OpenCOOD and HEAL. These training and testing instructions apply to all end-to-end training methods.

Train the model

We use a yaml file to configure all training parameters. To train your own model from scratch or continue from an existing checkpoint, run the following command:

python opencood/tools/train.py -y ${CONFIG_FILE} [--model_dir ${CHECKPOINT_FOLDER}]

Arguments Explanation:

  • -y or hypes_yaml : the path of the training configuration file, e.g. opencood/hypes_yaml/opv2v/LiDAROnly/lidar_fcooper.yaml, meaning you want to train an FCooper model. We elaborate on each entry of the yaml in the exemplar config file opencood/hypes_yaml/exemplar.yaml.
  • model_dir (optional) : the path to the checkpoint folder, used for fine-tuning or continued training. When model_dir is given, the trainer discards hypes_yaml and loads the config.yaml in the checkpoint folder. In this case, ${CONFIG_FILE} can be None.

Train the model in DDP

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch  --nproc_per_node=2 --use_env opencood/tools/train_ddp.py -y ${CONFIG_FILE} [--model_dir ${CHECKPOINT_FOLDER}]

--nproc_per_node indicates the number of GPUs you will use.

Test the model

python opencood/tools/inference.py --model_dir ${CHECKPOINT_FOLDER} [--fusion_method intermediate]
  • inference.py has more optional arguments; you can inspect the file for details.
  • [--fusion_method intermediate]: the default fusion method is intermediate fusion. Depending on the fusion strategy used in training, the available fusion_method options are:
    • single: only ego agent's detection, only ego's gt box. [only for late fusion dataset]
    • no: only ego agent's detection, all agents' fused gt box. [only for late fusion dataset]
    • late: late fusion detection from all agents, all agents' fused gt box. [only for late fusion dataset]
    • early: early fusion detection from all agents, all agents' fused gt box. [only for early fusion dataset]
    • intermediate: intermediate fusion detection from all agents, all agents' fused gt box. [only for intermediate fusion dataset]

Training and inference of Baselines

All the baselines adopt a two-stage training strategy: first, training the collaborative base in a homogeneous setting for each agent type, and then training the baseline methods in a heterogeneous setting with different agent types.

Stage 1: Train the Collaboration Base

Suppose you are now in the GenComm/ folder. If this is your first training attempt, execute mkdir opencood/logs. Then

mkdir opencood/logs/Baselines
mkdir opencood/logs/Baselines/stage1
mkdir opencood/logs/Baselines/stage1/OPV2V_m1_att

cp opencood/hypes_yaml/opv2v/GenComm_yamls/baselines/stage1/m1_att.yaml opencood/logs/Baselines/stage1/OPV2V_m1_att/config.yaml

CUDA_VISIBLE_DEVICES=x python opencood/tools/train.py -y None --model_dir opencood/logs/Baselines/stage1/OPV2V_m1_att/  # x is the GPU index

# you can also use DDP training:

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch  --nproc_per_node=2 --use_env opencood/tools/train_ddp.py -y None --model_dir opencood/logs/Baselines/stage1/OPV2V_m1_att/ # make sure nproc_per_node equals the number of GPUs

Stage 2: Train New Agent Types

After the collaboration base training, you will obtain the best-validation checkpoint for each agent type (e.g., net_epoch_bestval_at17.pth). Now, in the heterogeneous collaboration stage:

  • For CodeFilling and MPDA, when a new agent joins the collaboration, the entire module needs to be retrained. For example, we need to train combinations such as m1m2, m1m2m3, and m1m2m3m4.
  • For BackAlign, when a new agent joins the collaboration, only the new agent’s encoder needs to be retrained. We train m1m2, m1m3, and m1m4.
  • For STAMP, when a new agent joins the collaboration, its adapter and reverter to the protocol space need to be trained. We train m0m1, m0m2, m0m3, and m0m4.

## CodeFilling & MPDA style
mkdir opencood/logs/Baselines/stage2/CodeFilling/OPV2V_m1m2_att
mkdir opencood/logs/Baselines/stage2/CodeFilling/OPV2V_m1m2m3_att
mkdir opencood/logs/Baselines/stage2/CodeFilling/OPV2V_m1m2m3m4_att

# Take m1m2m3 as an example

# copy config.yaml
cp opencood/hypes_yaml/opv2v/GenComm_yamls/baselines/stage2/CodeFilling/OPV2V_m1m2m3_att.yaml opencood/logs/Baselines/stage2/CodeFilling/OPV2V_m1m2m3_att/config.yaml

# combine ckpt from stage1
python opencood/tools/heal_tools.py merge_and_save \
  opencood/logs/Baselines/stage1/OPV2V_m2_att \
  opencood/logs/Baselines/stage1/OPV2V_m3_att \
  opencood/logs/Baselines/stage1/OPV2V_m1_att \
  opencood/logs/Baselines/stage2/CodeFilling/OPV2V_m1m2m3_att

# `python opencood/tools/heal_tools.py merge_and_save` will automatically search the best checkpoints for each folder and merge them together. The collaboration base's folder (m1 here) should be put in the second to last place, while the output folder should be put last.

# Then you can train the new agent type as below:
python opencood/tools/train.py -y None --model_dir opencood/logs/Baselines/stage2/CodeFilling/OPV2V_m1m2m3_att # you can also use DDP training

## BackAlign & STAMP style
mkdir opencood/logs/Baselines/stage2/BackAlign/OPV2V_m1m2_att
mkdir opencood/logs/Baselines/stage2/BackAlign/OPV2V_m1m3_att
mkdir opencood/logs/Baselines/stage2/BackAlign/OPV2V_m1m4_att

# Take m1m3 as an example 
# copy config.yaml
cp opencood/hypes_yaml/opv2v/GenComm_yamls/baselines/stage2/BackAlign/m1m3_att.yaml opencood/logs/Baselines/stage2/BackAlign/OPV2V_m1m3_att/config.yaml

# combine ckpt from stage1
python opencood/tools/heal_tools.py merge_and_save \
  opencood/logs/Baselines/stage1/OPV2V_m3_att \
  opencood/logs/Baselines/stage1/OPV2V_m1_att \
  opencood/logs/Baselines/stage2/BackAlign/OPV2V_m1m3_att

# Make sure that ego_dir is placed as the second-to-last argument, and the directory for saving the combined checkpoint is placed as the last argument.

# Then you can train the new agent type as below:
python opencood/tools/train.py -y None --model_dir opencood/logs/Baselines/stage2/BackAlign/OPV2V_m1m3_att # you can also use DDP training

Stage 3: Inference

At the inference stage, we consider two scenarios in our paper: static and dynamic.

  • The static inference scenario usually includes two agent types, e.g., m1m2.
  • The dynamic inference scenario refers to dynamically adding new agent types into the collaboration.

For static inference

# you can run inference directly in any stage-2 log_dir
CUDA_VISIBLE_DEVICES=x python opencood/tools/inference.py --model_dir opencood/logs/Baselines/stage2/BackAlign/OPV2V_m1m3_att

For dynamic inference

  • CodeFilling & MPDA. Use opencood/tools/inference_heter_in_order.py in the m1m2m3m4 log_dir of stage 2:
CUDA_VISIBLE_DEVICES=x python opencood/tools/inference_heter_in_order.py --model_dir opencood/logs/Baselines/stage2/CodeFilling/OPV2V_m1m2m3m4_att --use_cav [2,3,4]
  • BackAlign & STAMP. You first need to merge the Stage 2 checkpoints together with the m1 base, and then perform inference.
mkdir opencood/logs/Baselines/stage2/BackAlign/OPV2V_m1m2m3m4_att_infer/ # create a log folder for the final inference

cp opencood/hypes_yaml/opv2v/GenComm_yamls/baselines/stage2/BackAlign/m1m2m3m4_att_infer.yaml opencood/logs/Baselines/stage2/BackAlign/OPV2V_m1m2m3m4_att_infer/config.yaml

python opencood/tools/heal_tools.py merge_and_save \
  opencood/logs/Baselines/stage2/BackAlign/OPV2V_m1m2_att \
  opencood/logs/Baselines/stage2/BackAlign/OPV2V_m1m3_att \
  opencood/logs/Baselines/stage2/BackAlign/OPV2V_m1m4_att \
  opencood/logs/Baselines/stage1/OPV2V_m1_att \
  opencood/logs/Baselines/stage2/BackAlign/OPV2V_m1m2m3m4_att_infer

python opencood/tools/inference_heter_in_order.py --model_dir opencood/logs/Baselines/stage2/BackAlign/OPV2V_m1m2m3m4_att_infer

This will overwrite several parameters in config.yaml, including mapping_dict and comm_range, and will gradually add m1, m2, m3, and m4 agents into the scene. Ground truth is always the fused gt boxes of max_cav agents.

Training and inference of GenComm

Stage 1: Base training of GenComm

mkdir opencood/logs/GenComm/stage1/OPV2V_m1_att
mkdir opencood/logs/GenComm/stage1/OPV2V_m2_att
mkdir opencood/logs/GenComm/stage1/OPV2V_m3_att
mkdir opencood/logs/GenComm/stage1/OPV2V_m4_att

# Take m1 as an example
# copy the config.yaml
cp opencood/hypes_yaml/opv2v/GenComm_yamls/gencomm/stage1/m1_att_diffcomm.yaml opencood/logs/GenComm/stage1/OPV2V_m1_att/config.yaml

# Train
CUDA_VISIBLE_DEVICES=x python opencood/tools/train.py -y None --model_dir opencood/logs/GenComm/stage1/OPV2V_m1_att

Stage 2: Heterogeneous Collaboration

mkdir opencood/logs/GenComm/stage2/OPV2V_m1m2_att
mkdir opencood/logs/GenComm/stage2/OPV2V_m1m3_att
mkdir opencood/logs/GenComm/stage2/OPV2V_m1m4_att

# Take m1m2 as an example
cp opencood/hypes_yaml/opv2v/GenComm_yamls/gencomm/stage2/m1m2_att.yaml opencood/logs/GenComm/stage2/OPV2V_m1m2_att/config.yaml

python opencood/tools/heal_tools.py merge_and_save \
  opencood/logs/GenComm/stage1/OPV2V_m2_att \
  opencood/logs/GenComm/stage1/OPV2V_m1_att \
  opencood/logs/GenComm/stage2/OPV2V_m1m2_att

Stage 3: Inference

Refer to the inference style of BackAlign above.

Attention: There are dataset- and method-specific training and inference scripts, such as train_stamp.py, inference_v2xreal.py, and inference_v2xreal_heter_in_order.py. Please make sure to use the corresponding script.

Real-World Practice Rationale

  • Stage 1: GenComm base training under homogeneous collaboration
  • Stage 2: training of specific Deformable Message Extractors under heterogeneous collaboration
  • New agents join the collaboration by reaching a consensus with other vendors and training specific DMEs.

Citation

@article{zhou2025pragmatic,
  title={Pragmatic Heterogeneous Collaborative Perception via Generative Communication Mechanism},
  author={Zhou, Junfei and Dai, Penglin and Wei, Quanmin and Liu, Bingyi and Wu, Xiao and Wang, Jianping},
  journal={arXiv preprint arXiv:2510.19618},
  year={2025}
}

Acknowledgements

This repository is built upon the excellent foundations of OpenCOOD, HEAL, and V2X-R. We sincerely appreciate @yifanlu0227, @DerrickXuNu and @ylwhxht for their outstanding contributions to the community.
