This is the code repository for our ACL 2026 Findings paper "Eliciting Medical Reasoning with Knowledge-enhanced Data Synthesis: A Semi-Supervised Reinforcement Learning Approach".
We propose MedSSR, a framework that combines:
- Knowledge-enhanced data synthesis with controllable rare-disease knowledge injection.
- A two-stage semi-supervised RLVR pipeline: first self-supervised RL on pseudo-labeled synthetic data, then supervised RL on human-annotated real data.
Our environment is provided in MedSSR.yml (conda) and requirements.txt (pip).
Some key requirements:
torch==2.6.0+cu124
transformers==4.52.4
vllm==0.8.5.post1
verl==0.3.1
To install:
conda env create -f MedSSR.yml
conda activate MedSSR
Our test datasets are provided in the data folder.
Input data should be a JSON list. Each sample should contain at least the following fields:
{
  "id": "unique-id",
  "question": "Question text with answer options",
  "gold": "A",
  "name": "dataset_name"
}
See data/test_short_medqa.json for an example.
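The input format above can be checked before launching an evaluation run. The following is a minimal sketch, not part of the repository: the helper name `validate_samples` and the inline example data are hypothetical, and it only checks that the four fields listed above are present.

```python
import json

# Fields that every sample should contain, as listed above.
REQUIRED_FIELDS = ("id", "question", "gold", "name")


def validate_samples(samples):
    """Return (index, missing_fields) pairs for samples that lack required fields.

    Hypothetical helper for illustration; it is not provided by the repository.
    """
    if not isinstance(samples, list):
        raise ValueError("Input data must be a JSON list of samples")
    problems = []
    for index, sample in enumerate(samples):
        if not isinstance(sample, dict):
            # A non-object entry is missing every required field.
            problems.append((index, list(REQUIRED_FIELDS)))
            continue
        missing = [field for field in REQUIRED_FIELDS if field not in sample]
        if missing:
            problems.append((index, missing))
    return problems


# Inline example data (made up): the second sample lacks "gold" and "name".
example = json.loads(
    '[{"id": "unique-id", "question": "Question text with answer options",'
    ' "gold": "A", "name": "dataset_name"},'
    ' {"id": "no-gold", "question": "Q"}]'
)
problems = validate_samples(example)
print(problems)
```

To check a real file, load it with `json.load(open(path))` and pass the result to `validate_samples`; an empty list means every sample has the required fields.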
We apply logit bias in a second decoding pass to avoid invalid outputs. The provided launcher can evaluate multiple datasets from a single script:
MODEL_PATH=/path/to/your/model \
bash scripts/run_example.sh
You can also override the output and dataset directories:
MODEL_PATH=/path/to/your/model \
BASE_OUTPUT_DIR=./outputs \
DATASET_DIR=./data \
bash scripts/run_example.sh
To run the evaluation script directly on a single dataset:
python vllm_logitsbias_multi.py \
  --model /path/to/your/model \
  --dataset data/test_short_medqa.json \
  --question_type mcq \
  --temperature 0.6 \
  --top_p 0.95 \
  --top_k 20 \
  --max_tokens 2048 \
  --num_generations 4 \
  --output_prefix outputs/sample_run
For the training pipeline, we use the verl framework.
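The evaluation step applies logit bias in a second decoding pass to avoid invalid outputs. The idea can be illustrated with a small, library-free sketch. This is not the code in `vllm_logitsbias_multi.py`: the function names, the bias value, and the toy logits are assumptions made for illustration, and it shows only one plausible reading of the technique, in which the tokens for valid option letters receive a large positive bias so that the selected token is always a valid answer.

```python
def apply_logit_bias(logits, allowed_tokens, bias=100.0):
    """Add a large positive bias to every allowed token's logit.

    `logits` maps token strings to raw scores (toy stand-in for a vocabulary).
    The bias value of 100.0 is an illustrative assumption.
    """
    return {
        token: score + (bias if token in allowed_tokens else 0.0)
        for token, score in logits.items()
    }


def pick_answer(logits, allowed_tokens):
    """Greedy choice over the biased logits; the result is always an allowed token
    as long as at least one allowed token is present in `logits`."""
    biased = apply_logit_bias(logits, allowed_tokens)
    return max(biased, key=biased.get)


# Toy logits: without bias, the top token ("The") is not a valid option letter.
toy_logits = {"A": 1.2, "B": 2.5, "The": 4.0, "\n": 3.1}
unbiased_choice = max(toy_logits, key=toy_logits.get)
answer = pick_answer(toy_logits, {"A", "B", "C", "D"})
print(unbiased_choice, answer)
```

With a real inference engine, the same effect is usually obtained by passing a mapping from token IDs to bias values in the sampling parameters, rather than editing logits by hand.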
If you find our repo useful, please cite our paper:
@article{li2026eliciting,
title={Eliciting Medical Reasoning with Knowledge-enhanced Data Synthesis: A Semi-Supervised Reinforcement Learning Approach},
author={Li, Haolin and Jiang, Shuyang and Zhang, Ruipeng and Yao, Jiangchao and Zhang, Ya and Wang, Yanfeng},
journal={arXiv preprint arXiv:2604.11547},
year={2026}
}