TCM-Ladder is the first comprehensive multimodal question-answering (QA) dataset specifically designed for evaluating large language models for Traditional Chinese Medicine (TCM).
- Multiple core disciplines of TCM: fundamental theory, diagnostics, herbal formulas, internal medicine, surgery, pharmacognosy, and pediatrics.
- Multimodal: TCM-Ladder incorporates various modalities such as images and videos.
- Multiple question formats: single-choice, multiple-choice, fill-in-the-blank, diagnostic dialogue, and visual comprehension tasks.

We trained a reasoning model on TCM-Ladder and conducted comparative experiments against nine state-of-the-art general-domain LLMs and five leading TCM-specific LLMs to evaluate their performance on the dataset. Moreover, we propose Ladder-Score, an evaluation method specifically designed for TCM question answering that assesses answer quality in terms of terminology usage and semantic expression. To the best of our knowledge, this is the first work to systematically evaluate mainstream general-domain and TCM-specific LLMs on a unified multimodal benchmark. The datasets and leaderboard are publicly available at https://tcmladder.com and will be continuously updated.
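
This README does not say how the single-choice and multiple-choice questions are scored. One common convention is exact match on the set of selected option letters, sketched below. The function names `normalize` and `exact_match_accuracy` are hypothetical, and the sketch assumes each model answer has already been reduced to option letters such as `A` or `A,C`. It is not the Ladder-Score method and not necessarily the authors' evaluation code.

```python
from typing import FrozenSet, Sequence


def normalize(answer: str) -> FrozenSet[str]:
    """Reduce an answer such as 'a, C' or 'AC' to the set of option letters.

    Assumption: the string contains only option letters plus separators
    (commas, spaces); free-form text must be parsed before calling this.
    """
    return frozenset(ch for ch in answer.upper() if ch.isalpha())


def exact_match_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of questions whose predicted option set equals the reference set.

    A multiple-choice answer counts as correct only if every option matches,
    so partially correct selections score zero.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return correct / len(references)


if __name__ == "__main__":
    preds = ["A", "c,a", "B"]   # hypothetical model outputs
    golds = ["A", "AC", "BD"]   # hypothetical reference answers
    print(f"accuracy = {exact_match_accuracy(preds, golds):.3f}")
```

Comparing sets rather than strings makes the score independent of option order and separators, so `c,a` and `AC` are treated as the same answer.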
- [2025-9] Our paper has been accepted to NeurIPS 2025.
- [2025-5] We released our preprint on arXiv.
- [2025-5] Our dataset TCM-Ladder was released on Hugging Face.
- English version.
- Instructions to run evaluation.
3. Performance of general-domain and TCM-specific language models on single-choice and multiple-choice question answering tasks.
4. Performance of large language models on questions about Chinese herbal medicine and tongue images.
If you find our work useful in your research, please consider citing:
@article{xie2025tcm,
title={TCM-Ladder: A Benchmark for Multimodal Question Answering on Traditional Chinese Medicine},
author={Xie, Jiacheng and Yu, Yang and Zhang, Ziyang and Zeng, Shuai and He, Jiaxuan and Vasireddy, Ayush and Tang, Xiaoting and Guo, Congyu and Zhao, Lening and Jing, Congcong and others},
journal={arXiv preprint arXiv:2505.24063},
year={2025}
}