Hepato-LLaVA: An Expert MLLM with Sparse Topo-Pack Attention for Hepatocellular Pathology Analysis on Whole Slide Images

🔥 News

[2025/06/20] The code are publicly available! 🚀
[2026/06/13] Hepto-LLaVA has been accepted at MICCAI 2026 as a poster! 🎉
[2026/03/01] Hepto-LLaVA is now live on arXiv! 🔥

👀 Introduction

Hepatocellular Carcinoma (HCC) relies on histopathological Whole Slide Images (WSIs) examination as the gold standard. However, manual analysis of these gigapixel, highly heterogeneous WSIs is labor-intensive and prone to inter-observer variability. This has catalyzed WSI-based Multi-modal Large Language Models (MLLMs) to enable VQA.

A key challenge in pathology MLLMs is gigapixel WSI representation. Existing methods either use thumbnail-based approaches that lose critical high-resolution diagnostic details, or employ slide-encoder approaches that generate excessively redundant tokens.

We propose Hepato-LLaVA, a specialized MLLM for fine-grained hepatocellular pathology analysis. It features a novel Hierarchical Sparse Visual Attention (HSVA) mechanism that models 2D tissue topology to aggregate diagnostic evidence while preserving context. To address multiscale data scarcity, we also present HepatoPathoVQA, comprising 33K hierarchically structured QA pairs validated by pathologists. Hepato-LLaVA achieves state-of-the-art diagnostic accuracy, outperforming existing pathology MLLMs by an absolute 20%.

🛠️ Installation

git clone https://github.com/wssf3092/Hepato-LLaVA.git
cd Hepato-LLaVA

conda create -n hepato_llava python=3.10 -y
conda activate hepato_llava

pip install --upgrade pip
pip install -r requirements.txt

For the patch encoder, please follow the official installation instructions of CONCH to set up the model and obtain the pretrained weights.

📦 Data Preparation

Feature Extraction

Use the CONCH encoder to extract patch-level features from WSIs:

bash data/feature/1_run.sh

For data augmentation (generating 9 variants per WSI):

bash data/feature/1_run_augment.sh

Data Format Conversion

Convert VQA data to LLaVA fine-tuning format:

data/conversation/qa.py — convert QA JSONL to LLaVA fine-tuning format
data/conversation/caption.py — convert captioning data to fine-tuning format

🚀 Training

Hepato-LLaVA follows a three-stage training pipeline:

Stage 1: MAE Pre-training — Self-supervised pre-training of the HSAN slide encoder with curriculum masking (patch-level → pack-level):

bash scripts/run_mae.sh

Stage 2: MoCo Pre-training — Contrastive learning for summary token representations:

bash scripts/run_moco_summary.sh

Stage 3: LLaVA Fine-tuning — End-to-end fine-tuning with DeepSpeed and LoRA:

bash scripts/run_llava_finetune.sh

🔍 Inference & Evaluation

Run VQA evaluation:

bash scripts/run_eval_vqa.sh

For GPT-4 based open-ended evaluation:

python scripts/eval_open.py

For choice question statistics:

python scripts/stat_choice.py

⚙️ Hyperparameter Settings

LoRA

Parameter	Value
LORA_R	128
LORA_ALPHA	256

Training

Parameter	Value
NUM_EPOCHS	3
BATCH_SIZE	8
GRADIENT_ACCUMULATION	4
LEARNING_RATE	2e-5
MM_PROJECTOR_LR	2e-5
WARMUP_RATIO	0.03
MODEL_MAX_LENGTH	8192

Generation

Parameter	Value
TEMPERATURE	0.0
TOP_P	0.9
NUM_BEAMS	1
MAX_NEW_TOKENS	2048

📚 Citation

@article{hepatollava2026,
  title={Hepato-LLaVA: An Expert MLLM with Sparse Topo-Pack Attention for Hepatocellular Pathology Analysis on Whole Slide Images},
  author={Yang, Yuxuan and Yan, Zhonghao and Zhang, Yi and Yun, Bo and Diao, Muxi and Zhao, Guowei and Liang, Kongming and Li, Wenbin and Ma, Zhanyu},
  year={2026}
}

🙏 Acknowledgements

This code is built on CONCH and WSI-LLaVA. We thank the authors for sharing their codes.

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
assets		assets
data		data
model		model
network		network
scripts		scripts
static		static
third_party		third_party
utils		utils
.gitignore		.gitignore
.nojekyll		.nojekyll
README.md		README.md
eval_vqa.py		eval_vqa.py
index.html		index.html
pretrain_mae.py		pretrain_mae.py
requirements.txt		requirements.txt
train_moco_summary.py		train_moco_summary.py
train_wsi_llava_v2.py		train_wsi_llava_v2.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Hepato-LLaVA: An Expert MLLM with Sparse Topo-Pack Attention for Hepatocellular Pathology Analysis on Whole Slide Images

🔥 News

👀 Introduction

🛠️ Installation

📦 Data Preparation

Feature Extraction

Data Format Conversion

🚀 Training

🔍 Inference & Evaluation

⚙️ Hyperparameter Settings

LoRA

Training

Generation

📚 Citation

🙏 Acknowledgements

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

Hepato-LLaVA: An Expert MLLM with Sparse Topo-Pack Attention for Hepatocellular Pathology Analysis on Whole Slide Images

🔥 News

👀 Introduction

🛠️ Installation

📦 Data Preparation

Feature Extraction

Data Format Conversion

🚀 Training

🔍 Inference & Evaluation

⚙️ Hyperparameter Settings

LoRA

Training

Generation

📚 Citation

🙏 Acknowledgements

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages