Joint Multi-View Embedding with Progressive Multi-Scale Alignment for Unaligned Infrared-Visible Image Fusion
| Name | Affiliation |
|---|---|
| Yida Chen | a |
| Yafei Zhang | a |
| Huafeng Li (✉ corresponding author) | a |
| Zhengtao Yu | a |
| Yu Liu | b |
🏫 Affiliations
a Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan, 650500, China
b Department of Biomedical Engineering, Hefei University of Technology, Hefei, Anhui, 230009, China
We propose an end-to-end infrared-visible image fusion network (ME-PMA) with feature-level registration, featuring:
- End-to-end registration and fusion for unaligned scenarios
- Progressive multi-scale feature alignment with multi-view embedding
- Superior performance across datasets with a single set of model weights
Key Components (a structural sketch follows this list):
- Feature Encoder: SFE, UIB_Block, and Restormer
- MSPA: Multi-Scale Progressive Alignment module
- Feature Decoder: FFCM fusion and FRRB reconstruction
- Restormer_Corr: Global feature extraction with local correlation
- UIB_CA: Channel attention for local features
- Reg_flow: Multi-view registration flow prediction
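To make the data flow concrete, here is a minimal structural sketch of how these components fit together. The module names mirror the list above, but every signature, channel width, and layer below is a stand-in assumption for illustration, not the repository's actual implementation:

```python
# Structural sketch only — all layers are stand-ins for the real SFE/UIB/Restormer/MSPA/FFCM/FRRB modules.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MEPMASketch(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        # Feature encoder: SFE, then a local (UIB_Block) and a global (Restormer) branch.
        self.sfe = nn.Conv2d(1, ch, 3, padding=1)
        self.local = nn.Conv2d(ch, ch, 3, padding=1)
        self.glob = nn.Conv2d(ch, ch, 3, padding=1)
        # MSPA / Reg_flow stand-in: predict a dense 2-channel flow from both views' features.
        self.reg_flow = nn.Conv2d(4 * ch, 2, 3, padding=1)
        # Feature decoder stand-ins: FFCM-style fusion and FRRB-style reconstruction.
        self.fuse = nn.Conv2d(4 * ch, ch, 3, padding=1)
        self.recon = nn.Conv2d(ch, 1, 3, padding=1)

    def encode(self, x):
        f = self.sfe(x)
        return self.local(f), self.glob(f)

    def warp(self, feat, flow):
        # Bilinearly warp features by the predicted flow (grid normalized to [-1, 1]).
        b, _, h, w = feat.shape
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij")
        grid = torch.stack((xs, ys), dim=-1).to(feat).unsqueeze(0).expand(b, -1, -1, -1)
        offset = flow.permute(0, 2, 3, 1) / torch.tensor([w / 2, h / 2]).to(feat)
        return F.grid_sample(feat, grid + offset, align_corners=True)

    def forward(self, ir, vis):
        ir_l, ir_g = self.encode(ir)
        vis_l, vis_g = self.encode(vis)
        flow = self.reg_flow(torch.cat([ir_l, ir_g, vis_l, vis_g], dim=1))
        ir_l, ir_g = self.warp(ir_l, flow), self.warp(ir_g, flow)  # align IR features to VIS
        fused = self.fuse(torch.cat([ir_l, ir_g, vis_l, vis_g], dim=1))
        return torch.sigmoid(self.recon(fused))

if __name__ == "__main__":
    ir, vis = torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64)
    print(MEPMASketch()(ir, vis).shape)  # torch.Size([1, 1, 64, 64])
```

In the actual network the alignment is progressive across multiple scales; this sketch collapses it to a single scale to show the overall encode → align → fuse → reconstruct flow.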
git clone https://github.com/yidamyth/ME-PMA.git
cd ME-PMA
# Create conda environment
conda create -n ME-PMA python=3.9.18
conda activate ME-PMA
# Install PyTorch
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
# Install dependencies
pip install -r requirements.txt
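To verify the environment, a quick sanity check (a minimal snippet, assuming a CUDA 11.3-capable machine):

```python
import torch
import torchvision

print(torch.__version__, torchvision.__version__)  # expect 1.12.1+cu113 and 0.13.1+cu113
print(torch.cuda.is_available())                   # should print True if the CUDA build is active
```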
./DataSet/IVIF/
├── M3FD
│   └── test
│       ├── ir
│       ├── ir_move
│       └── vis
├── MSRS
│   └── test
│       ├── ir
│       ├── ir_move
│       └── vis
└── RoadScene
    ├── RoadS_test
    │   ├── ir
    │   ├── ir_move
    │   └── vis
    └── RoadS_train
        ├── ir
        └── vis
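Before running the tests, you can verify that the dataset folders are in place. The snippet below is a hypothetical helper, not part of the repository; it simply mirrors the tree above:

```python
from pathlib import Path

ROOT = Path("./DataSet/IVIF")
EXPECTED = {
    "M3FD/test": ["ir", "ir_move", "vis"],
    "MSRS/test": ["ir", "ir_move", "vis"],
    "RoadScene/RoadS_test": ["ir", "ir_move", "vis"],
    "RoadScene/RoadS_train": ["ir", "vis"],
}
for split, subdirs in EXPECTED.items():
    for sub in subdirs:
        path = ROOT / split / sub
        print(("ok      " if path.is_dir() else "MISSING ") + str(path))
```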
End-to-end feature-level registration and fusion results (input images from unaligned scenarios):
python test_phase2.py
# Save to: ./DataSet/IVIF/RoadScene/RoadS_test/Results/UnAligned/

Direct fusion results, without using the registration module (input images from aligned scenarios):
python test.py
# Save to: ./DataSet/IVIF/RoadScene/RoadS_test/Results/Aligned/

You can switch datasets to obtain results on different test sets: the default is test_path['RoadScene'], and you can change it to test_path['M3FD'] or test_path['MSRS']. The same model weights are used for all dataset tests.
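For illustration, the dataset switch described above might look like the following dictionary. The actual variable lives in the test scripts; the exact paths here are assumptions based on the directory layout:

```python
# Hypothetical illustration of the test_path switch; check test.py / test_phase2.py for the real code.
test_path = {
    "RoadScene": "./DataSet/IVIF/RoadScene/RoadS_test/",
    "M3FD": "./DataSet/IVIF/M3FD/test/",
    "MSRS": "./DataSet/IVIF/MSRS/test/",
}
dataset = "RoadScene"  # switch to "M3FD" or "MSRS" to evaluate other datasets
print(test_path[dataset])
```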
# 1. Find python location
which python
# output: /home/yida/anaconda3/envs/ME-PMA/bin/python
# 2. Edit conda path
vim run.sh
# 3. Switch to your anaconda3 conda path
eval "$(/home/your_user_name_xxx/anaconda3/bin/conda shell.bash hook)"
# 4. Save and exit vim
# 5. Run
sh ./run.sh
# 6. Check logs
tail -f ./Logs/nohup/2024-1119-1001_time.log
# 7. Run program in background, can exit terminal
# Model save path: ./Model/Parameters/24-1119-1001/
# 8. Stop following the log (Ctrl + Z would only suspend tail; the nohup job keeps running)
Ctrl + C

Phase 2 training:

# 1. Edit conda path
vim run_phase2.sh
eval "$(/home/your_user_name_xxx/anaconda3/bin/conda shell.bash hook)"
# Load first stage model path
phase2_model_id='24-1119-1001'
phase2_ModelPath='./Model/Parameters/24-1119-1001/RegImageFusModel-best.pth'
# Save and exit vim
# 2. Run
sh ./run_phase2.sh
# 3. Check logs
tail -f ./Logs/nohup/2024-1119-1355_time.log
# 4. Stop following the log (the nohup job keeps running)
Ctrl + C

Fusion metrics: $Q_{CE}\downarrow$, $Q_{MI}\uparrow$, $Q_{VIF}\uparrow$, $Q_{AB/F}\uparrow$, $Q_{CB}\uparrow$, $Q_{CV}\downarrow$
You can obtain our detailed fusion metrics with the following example:

python ./Util/metrics_fus.py

Registration metrics: $Q_{MI}\uparrow$, $Q_{MS\text{-}SSIM}\uparrow$, $Q_{NCC}\uparrow$

You can obtain our detailed registration metrics with the following example:

python ./Util/metrics_reg.py

For convenience, the provided metric scripts allow you to directly reproduce the results reported in the paper.
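As an illustration of what these scripts measure, here is a minimal sketch of the normalized cross-correlation ($Q_{NCC}$) between a registered image and its reference. This is the textbook formula, not the repository's implementation:

```python
import numpy as np

def ncc(a: np.ndarray, b: np.ndarray) -> float:
    """Normalized cross-correlation between two same-sized grayscale images."""
    a = a.astype(np.float64).ravel()
    b = b.astype(np.float64).ravel()
    a -= a.mean()
    b -= b.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

# Identical images give the maximum score of 1.0.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(256, 256))
print(ncc(img, img))  # 1.0
```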
We provide the complete fusion results for direct and unbiased metric evaluation. For convenience, resized input images are also included to help reproduce the results. While the overall outputs remain consistent, minor pixel-level differences may occur due to the resizing operation.
Note that the registration evaluation metrics are averaged over the three datasets.
📌 Note: In each column, values in red bold are the best and values in orange bold are the second-best.
The related models will be updated and uploaded soon.
cd ./ME-PMA
python -m Model.Architecture.RegImageFusModel

The overall architecture of this project was independently designed by the author, @Yida Chen. However, parts of the implementation reference the following excellent open-source works:
- CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion
  - (CVPR 2023) https://github.com/haozixiang1228/MMIF-CDDFuse
- Correlation-aware Coarse-to-Fine MLPs for Deformable Medical Image Registration
- MobileNetV4 implementations
  - MobileNetV4-1: https://github.com/jiaowoguanren0615/MobileNetV4
  - MobileNetV4-2: https://github.com/jaiwei98/MobileNetV4-pytorch
- Analysis of Quality Objective Assessment Metrics for Visible and Infrared Image Fusion
  - (Journal of Image and Graphics 2023) https://github.com/sunbinuestc/VIF-metrics-analysis
- MulimgViewer (for local detail visualization)
We sincerely appreciate the open-source community for providing valuable tools, resources, and inspiration that greatly supported the development of this project.
If this work benefits your research, a citation to our paper would be greatly appreciated:
@article{2026_ME-PMA,
  title   = {Joint multi-view embedding with progressive multi-scale alignment for unaligned infrared-visible image fusion},
  author  = {Chen, Yida and Zhang, Yafei and Li, Huafeng and Yu, Zhengtao and Liu, Yu},
  journal = {Information Fusion},
  volume  = {128},
  pages   = {103960},
  year    = {2026},
  doi     = {10.1016/j.inffus.2025.103960}
}
This project is released under the MIT License. See the LICENSE file.
Thank you for your attention. If you have any questions, please contact us by email at yida_myth@163.com and we will get back to you as soon as possible; you may also raise questions through the Issues page.