Official implementation of "Data-efficient deep learning for gene rearrangement status prediction in DLBCL" (under review at npj Digital Medicine).
Diffuse large B-cell lymphoma (DLBCL) is the most common lymphoid malignancy, accounting for about 30% of adult lymphoma cases. Current DLBCL diagnostic workflows face challenges because conventional fluorescence in situ hybridization (FISH) testing is time-consuming, tissue-intensive, and costly.
To address these issues, we propose a novel framework named HE2FISH, trained on large-scale pathology image datasets to assist the DLBCL workflow by providing insights into the macroscopic morphological features that indicate DLBCL gene rearrangement status.
The main novelty of HE2FISH is illustrated in Figure 1. We propose an attention-based classification model with multi-scale experts that predicts the rearrangement status of three genes (BCL2, BCL6, and MYC) to assist DLBCL diagnosis.
Figure 1. Overview of HE2FISH for gene rearrangement status prediction using DLBCL pathology imaging.
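The attention-based model with multi-scale experts described above can be illustrated with a small dependency-free sketch. It is not the HE2FISH implementation (see trainer_moe_scale.py for that): the class names `AttentionExpert` and `MultiScaleMoEMIL`, the gate computed from pooled features, the toy 16-dimensional features, and the independent sigmoid head per gene are all illustrative assumptions. Each patch vector is assumed to hold the 20x and 5x features side by side, matching the concatenation described in the data preparation step; one attention expert pools each magnification, a softmax gate weights the experts, and three heads output a rearrangement probability for BCL2, BCL6, and MYC.

```python
import math
import random


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class AttentionExpert:
    """Attention pooling over patch features: a_i = softmax_i(w^T tanh(V h_i))."""

    def __init__(self, dim, hidden, rng):
        self.V = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(hidden)]
        self.w = [rng.gauss(0, 0.1) for _ in range(hidden)]

    def __call__(self, bag):
        scores = [dot(self.w, [math.tanh(dot(row, h)) for row in self.V]) for h in bag]
        attn = softmax(scores)  # one weight per patch, summing to 1
        pooled = [sum(a * h[d] for a, h in zip(attn, bag)) for d in range(len(bag[0]))]
        return pooled, attn


class MultiScaleMoEMIL:
    """Hypothetical sketch: one attention expert per magnification, a softmax gate, one head per gene."""

    GENES = ("BCL2", "BCL6", "MYC")

    def __init__(self, dim, hidden=8, scales=("20x", "5x"), seed=0):
        rng = random.Random(seed)
        self.dim, self.scales = dim, scales
        self.experts = {s: AttentionExpert(dim, hidden, rng) for s in scales}
        self.gate = {s: [rng.gauss(0, 0.1) for _ in range(dim)] for s in scales}
        self.heads = {g: [rng.gauss(0, 0.1) for _ in range(dim)] for g in self.GENES}

    def forward(self, bag):
        # Each patch vector is [20x features | 5x features]; expert i reads slice i.
        pooled, attn = {}, {}
        for i, s in enumerate(self.scales):
            part = [h[i * self.dim:(i + 1) * self.dim] for h in bag]
            pooled[s], attn[s] = self.experts[s](part)
        # Gate: softmax over experts, scored from each expert's pooled slide embedding.
        weights = softmax([dot(self.gate[s], pooled[s]) for s in self.scales])
        fused = [sum(wt * pooled[s][d] for wt, s in zip(weights, self.scales)) for d in range(self.dim)]
        # Independent sigmoid per gene: rearranged vs. not rearranged.
        probs = {g: 1.0 / (1.0 + math.exp(-dot(w, fused))) for g, w in self.heads.items()}
        return probs, dict(zip(self.scales, weights)), attn


rng = random.Random(1)
bag = [[rng.gauss(0, 1) for _ in range(2 * 16)] for _ in range(50)]  # 50 patches, 16 dims per scale
probs, gate, attn = MultiScaleMoEMIL(dim=16).forward(bag)
```

Sigmoid heads treat the three genes as separate binary tasks, since one tumor can carry more than one rearrangement (for example, double-hit lymphoma with MYC and BCL2); a softmax across genes would force them to compete.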
First clone the repository and enter the project directory:
git clone https://github.com/mahmoodlab/HE2FISH.git
cd HE2FISH
Install the dependencies with conda:
conda env create -f environment.yml
conda activate he2fish
The code was developed and tested with PyTorch 2.6.0 + cu126 (CUDA 12.6). Other PyTorch versions have not been fully tested.
Place your data under the Datasets/ directory, prepared with Trident or CLAM. For the data split script and the patient information format, refer to CLAM. HE2FISH extracts patch features with UNI2-h, a vision foundation model for computational pathology from the UNI family (published in Nature Medicine). We extract features at both 20x and 5x magnification and concatenate them, matching each 20x patch to its 5x features by patch coordinates.
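The 20x/5x concatenation can be sketched as follows. This assumes Trident- or CLAM-style top-left patch coordinates in level-0 pixels, a 20x base magnification, and 256-pixel patches at both magnifications, so that one 5x patch covers 1024 level-0 pixels. The function name `concat_multiscale` is hypothetical; each 20x patch is paired with the 5x patch that contains its center.

```python
def concat_multiscale(feats_20x, coords_20x, feats_5x, coords_5x, size_20x, size_5x):
    """Append to each 20x patch feature the feature of the 5x patch containing its center.

    Coordinates are assumed to be top-left corners in level-0 pixels; size_20x and
    size_5x are the level-0 side lengths covered by one patch at each magnification.
    """
    fused = []
    for (x, y), f20 in zip(coords_20x, feats_20x):
        cx, cy = x + size_20x / 2, y + size_20x / 2  # center of the 20x patch
        match = None
        for (x5, y5), f5 in zip(coords_5x, feats_5x):
            if x5 <= cx < x5 + size_5x and y5 <= cy < y5 + size_5x:
                match = f5
                break
        if match is None:
            raise ValueError(f"no 5x patch covers the 20x patch at {(x, y)}")
        fused.append(list(f20) + list(match))  # [20x features | 5x features]
    return fused


# Toy example with 1-D features: three 20x patches, two 5x patches.
fused = concat_multiscale(
    feats_20x=[[1.0], [2.0], [3.0]], coords_20x=[(0, 0), (256, 0), (1024, 0)],
    feats_5x=[[10.0], [20.0]], coords_5x=[(0, 0), (1024, 0)],
    size_20x=256, size_5x=1024,
)
```

The nested loop is O(N·M) and only shows the matching rule; for whole slides, a spatial index (or a dictionary keyed on the 5x grid position, when the two grids are aligned) makes each lookup fast.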
To evaluate our model, save the HE2FISH weights in a new directory, "./weights/". Alternatively, you can train the backbone from scratch.
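A small helper for the weights directory, as a sketch: it creates "./weights/" if it is missing and lists candidate checkpoint files. The function name `prepare_weights_dir` and the `.pt`/`.pth`/`.ckpt` extensions are assumptions; the file names the tester scripts actually expect are not specified here.

```python
from pathlib import Path


def prepare_weights_dir(weights_dir="./weights", suffixes=(".pt", ".pth", ".ckpt")):
    """Create the weights directory if needed and return checkpoint files, sorted by path."""
    path = Path(weights_dir)
    path.mkdir(parents=True, exist_ok=True)
    return sorted(p for p in path.iterdir() if p.is_file() and p.suffix in suffixes)


# Usage: checkpoints = prepare_weights_dir("./weights")
```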
Training your HE2FISH:
python trainer_moe_scale.py # only pathology
python trainer_moe_clinic.py # pathology with electronic health records
Testing your HE2FISH:
python tester_moe.py # only pathology
python tester_moe_with_clinic.py # pathology with electronic health records
We also provide evaluation utilities: attention map generation (visualization.ipynb), ROC and confusion matrix plotting (slide-level evaluation.ipynb or patient-level evaluation.ipynb), and explainability analyses (SI-MIL).
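Slide-level and patient-level ROC analyses differ only in the unit being scored. A minimal sketch, assuming a patient's score is the mean of its slide probabilities and a patient counts as positive if any of its slides is labeled positive; the evaluation notebooks may aggregate differently. The helper names are hypothetical, and the AUC is computed directly from its pairwise definition, with ties counted as one half.

```python
from collections import defaultdict


def roc_auc(scores, labels):
    """AUC as P(positive scored above negative), counting ties as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def to_patient_level(slide_probs, slide_labels, slide_to_patient):
    """Average slide probabilities per patient; a patient is positive if any slide is."""
    probs, labels = defaultdict(list), {}
    for slide, p in slide_probs.items():
        pid = slide_to_patient[slide]
        probs[pid].append(p)
        labels[pid] = max(labels.get(pid, 0), slide_labels[slide])
    pids = sorted(probs)
    return [sum(probs[i]) / len(probs[i]) for i in pids], [labels[i] for i in pids]


# Toy example: patient p1 has two slides, p2 has two, p3 has one.
slide_probs = {"s1": 0.9, "s2": 0.7, "s3": 0.2, "s4": 0.4, "s5": 0.75}
slide_labels = {"s1": 1, "s2": 1, "s3": 0, "s4": 0, "s5": 0}
slide_to_patient = {"s1": "p1", "s2": "p1", "s3": "p2", "s4": "p2", "s5": "p3"}

slides = sorted(slide_probs)
slide_auc = roc_auc([slide_probs[s] for s in slides], [slide_labels[s] for s in slides])
patient_auc = roc_auc(*to_patient_level(slide_probs, slide_labels, slide_to_patient))
```

In this toy case the slide-level AUC is 5/6 (slide s2 is ranked below s5), while averaging per patient ranks p1 above both negatives. In practice, `sklearn.metrics.roc_auc_score` returns the same AUC and `sklearn.metrics.roc_curve` provides the points for plotting.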
HE2FISH achieves encouraging performance on all three tasks, improving diagnostic efficiency and revealing pathological attributes contained in DLBCL slides:
Figure 2. Attention maps generated by HE2FISH.
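Raw attention weights are usually rescaled before being rendered as a heatmap, because softmax attention tends to concentrate on a few patches. A minimal sketch of one common choice, percentile (rank) normalization followed by placement on a patch grid; the function names are hypothetical, coordinates are assumed to be level-0 top-left corners on a regular grid, and visualization.ipynb may use a different scheme.

```python
def to_percentiles(scores):
    """Rank-normalize scores to [0, 1]; tied scores share their average rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1  # extend the run of tied scores
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2
        i = j + 1
    return [r / (n - 1) if n > 1 else 1.0 for r in ranks]


def attention_grid(scores, coords, patch_size):
    """Place percentile-normalized scores on a grid; cells without a patch stay None."""
    pct = to_percentiles(scores)
    xs = [x // patch_size for x, _ in coords]
    ys = [y // patch_size for _, y in coords]
    x0, y0 = min(xs), min(ys)
    grid = [[None] * (max(xs) - x0 + 1) for _ in range(max(ys) - y0 + 1)]
    for p, gx, gy in zip(pct, xs, ys):
        grid[gy - y0][gx - x0] = p
    return grid


grid = attention_grid([0.1, 0.5, 0.3], [(0, 0), (256, 0), (0, 256)], patch_size=256)
```

Rank normalization spreads the colors evenly over the tissue, so moderately attended regions remain visible next to the few highest-scoring patches.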
If you find this work or code helpful, please cite:
@article{cai2025attrimil,
title={AttriMIL: Revisiting attention-based multiple instance learning for whole-slide pathological image classification from a perspective of instance attributes},
author={Cai, Linghan and Huang, Shenjin and Zhang, Ye and Lu, Jinpeng and Zhang, Yongbing},
journal={Medical Image Analysis},
pages={103631},
year={2025},
publisher={Elsevier}
}
We thank the authors of the following projects, which this work builds on:
- TRIDENT: https://github.com/mahmoodlab/TRIDENT
- UNI: https://github.com/mahmoodlab/uni
- AttriMIL: https://github.com/MedCAI/AttriMIL
- CLAM: https://github.com/mahmoodlab/CLAM
Distributed under the Apache 2.0 License. See LICENSE for more information.