`µ²`Tokenizer: Differentiable Multi-Scale Multi-Modal Tokenizer for Radiology Report Generation

🎉🎉🎉 Our paper has been accepted at the 28th conference of the Medical Image Computing and Computer Assisted Intervention Society (MICCAI). See you in Daejeon, Korea, September 23-27, 2025.

This repository contains the official code for μ² Tokenizer, a novel approach to automated radiology report generation (RRG), introduced in the paper "μ² Tokenizer: Differentiable Multi-Scale Multi-Modal Tokenizer for Radiology Report Generation".

Our proposed model, μ²LLM, leverages a multi-scale, multi-modal architecture to generate accurate and clinically salient radiology reports from CT scans.

👋 Introduction

We introduce μ²LLM, a multi-scale multimodal large language model. At its core is the novel μ² Tokenizer, an intermediate layer that fuses visual features from CT scans with textual information. The model is then refined with Direct Preference Optimization (DPO), guided by GREEN, a specialized metric for evaluating medical reports, so that the generated reports align with expert standards.

Our experimental results on four large-scale CT datasets show that μ²LLM outperforms existing methods, highlighting its potential for generating high-quality radiology reports even with limited training data.
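To make the DPO step more concrete, here is a minimal sketch in plain Python. It is not the training code from this repository. It assumes that each CT scan has several candidate reports already scored with GREEN, and that the highest- and lowest-scoring candidates form the preference pair; that pairing rule, the function names `build_preference_pair` and `dpo_loss`, and the value `beta=0.1` are all illustrative assumptions. The loss itself is the standard DPO objective for a single pair.

```python
import math


def build_preference_pair(candidates):
    """Pick a (chosen, rejected) pair from scored candidate reports.

    candidates: list of (report_text, green_score) tuples for one scan.
    Assumption: the best-scoring report is "chosen", the worst is "rejected".
    Returns None when all scores tie, since there is no preference signal.
    """
    ranked = sorted(candidates, key=lambda item: item[1])
    rejected, chosen = ranked[0], ranked[-1]
    if chosen[1] == rejected[1]:
        return None
    return {"chosen": chosen[0], "rejected": rejected[0]}


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one pair, given sequence log-probabilities.

    loss = -log(sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))))
    """
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow.
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


if __name__ == "__main__":
    pair = build_preference_pair([
        ("No acute findings.", 0.2),
        ("Small right pleural effusion.", 0.9),
        ("Lungs are clear.", 0.5),
    ])
    print(pair)
    print(dpo_loss(-1.0, -5.0, -2.0, -2.0))
```

The loss falls below log 2 once the policy favors the chosen report more strongly than the reference model does, and rises above it in the opposite case.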

🚀 Quickstart

The model can be used through Hugging Face.

Coming soon...

🤖 Model

Model	Download Link
μ²Qwen3-8B	HuggingFace
μ²Qwen3-1.7B	HuggingFace

⚙️ Installation

git clone https://github.com/Siyou-Li/u2Tokenizer.git
cd u2Tokenizer
pip install -r requirements.txt

Ensure that NVIDIA CUDA 11.8 or above is installed, for compatibility with PyTorch 2.2.2.
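A quick way to check this requirement after installation is sketched below. It is a hypothetical helper, not a script shipped with this repository. The names `parse_version`, `meets_minimum`, and `check_environment` are illustrative. Note that `torch.version.cuda` reports the CUDA version PyTorch was built against, which may differ from the toolkit or driver installed on the machine.

```python
def parse_version(text):
    """Turn '11.8' or '2.2.2+cu118' into a tuple of ints such as (11, 8)."""
    core = text.split("+")[0]  # drop local build tags like '+cu118'
    return tuple(int(part) for part in core.split(".") if part.isdigit())


def meets_minimum(found, minimum):
    """Return True when version string `found` is at least `minimum`."""
    return parse_version(found) >= parse_version(minimum)


def check_environment(min_cuda="11.8"):
    """Describe the installed PyTorch build and whether its CUDA is new enough."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    cuda = torch.version.cuda  # None for CPU-only builds
    if cuda is None:
        return f"PyTorch {torch.__version__} is a CPU-only build"
    status = "OK" if meets_minimum(cuda, min_cuda) else "too old"
    return f"PyTorch {torch.__version__}, CUDA {cuda}: {status}"


if __name__ == "__main__":
    print(check_environment())
```

Running the file prints a one-line summary of the environment; the comparison helpers can also be used on their own for any dotted version string.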

💿 Data

Coming soon...

🚄 Training

Coming soon...

🧰 System Hardware requirements

Training stages 1 and 2 use 4 × 80GB A100 GPUs. Inference uses a single 40GB A40 GPU. Loading a model checkpoint requires approximately 39GB of CPU memory.

🫡 Acknowledgements

✨ Cite our work

If you find this repo useful, please consider citing:

@misc{li2025mu2tokenizerdifferentiablemultiscalemultimodal,
      title={${\mu}^2$Tokenizer: Differentiable Multi-Scale Multi-Modal Tokenizer for Radiology Report Generation}, 
      author={Siyou Li and Pengyao Qin and Huanan Wu and Dong Nie and Arun J. Thirunavukarasu and Juntao Yu and Le Zhang},
      year={2025},
      eprint={2507.00316},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2507.00316}, 
}

