[ACL 2026] MedSSR

This is the code repository for our ACL 2026 Findings paper "Eliciting Medical Reasoning with Knowledge-enhanced Data Synthesis: A Semi-Supervised Reinforcement Learning Approach".

Overview

We propose MedSSR, a framework that combines:

Knowledge-enhanced data synthesis with controllable rare-disease knowledge injection.
A two-stage semi-supervised RLVR pipeline: first self-supervised RL on pseudo-labeled synthetic data, then supervised RL on human-annotated real data.
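
To make the first stage more concrete, here is a minimal sketch of how a pseudo-label and a verifiable reward could be computed for a multiple-choice question. It assumes the pseudo-label is a majority vote over sampled answers, which this README does not confirm; the function names `majority_vote_label` and `verifiable_reward` are hypothetical and are not part of this repository or of verl.

```python
from collections import Counter


def majority_vote_label(sampled_answers):
    """Return the most frequent answer among samples, ignoring unparsable ones (None).

    Assumption: majority voting is used only for illustration here.
    """
    counts = Counter(answer for answer in sampled_answers if answer is not None)
    if not counts:
        return None  # no valid answer was sampled, so no pseudo-label
    return counts.most_common(1)[0][0]


def verifiable_reward(predicted, label):
    """Binary reward: 1.0 when the predicted option matches the label, else 0.0."""
    return 1.0 if predicted is not None and predicted == label else 0.0


# Stage 1 would score rollouts against the pseudo-label,
# stage 2 against the human-annotated "gold" answer.
pseudo_label = majority_vote_label(["A", "B", "A", None])
stage_one_reward = verifiable_reward("A", pseudo_label)
```

In this sketch the same reward function serves both stages; only the source of the label changes.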

Environment

Our environment is provided in MedSSR.yml (conda) and requirements.txt (pip).

Some key requirements:

torch==2.6.0+cu124
transformers==4.52.4
vllm==0.8.5.post1
verl==0.3.1

To install:

conda env create -f MedSSR.yml
conda activate MedSSR

Dataset

Our test datasets are provided in the data folder.

Input data should be a JSON list. Each sample should contain at least the following fields:

{
  "id": "unique-id",
  "question": "Question text with answer options",
  "gold": "A",
  "name": "dataset_name"
}

See data/test_short_medqa.json for an example.
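
Before running evaluation on your own data, it can help to check that every sample carries the four fields listed above. The following is a minimal sketch, not part of this repository; `find_invalid_samples` and `load_dataset` are hypothetical helper names.

```python
import json

# The fields the README lists as the minimum for each sample.
REQUIRED_FIELDS = ("id", "question", "gold", "name")


def find_invalid_samples(samples):
    """Return (index, missing_fields) pairs for samples lacking required fields."""
    problems = []
    for index, sample in enumerate(samples):
        missing = [field for field in REQUIRED_FIELDS if field not in sample]
        if missing:
            problems.append((index, missing))
    return problems


def load_dataset(path):
    """Load a JSON list of samples and raise ValueError if the format is wrong."""
    with open(path, encoding="utf-8") as handle:
        samples = json.load(handle)
    if not isinstance(samples, list):
        raise ValueError("expected a JSON list of samples")
    problems = find_invalid_samples(samples)
    if problems:
        raise ValueError(f"samples with missing fields: {problems}")
    return samples
```

For example, `load_dataset("data/test_short_medqa.json")` should return the list of samples when the file follows the format above.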

Evaluation

Option 1: Run the example script

We apply logit bias in a second decoding pass to avoid invalid outputs. The provided launcher can evaluate multiple datasets from a single script:

MODEL_PATH=/path/to/your/model \
bash scripts/run_example.sh

You can also override the output and dataset directories:

MODEL_PATH=/path/to/your/model \
BASE_OUTPUT_DIR=./outputs \
DATASET_DIR=./data \
bash scripts/run_example.sh
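
The idea behind the logit bias mentioned above can be illustrated with a toy example: a large positive bias is added to the scores of the valid option tokens so that the final answer is always one of them. This is a simplified sketch and not the code in vllm_logitsbias_multi.py; it uses plain strings instead of token ids, and the names `apply_logit_bias` and `pick_option` as well as the bias value are assumptions made for illustration.

```python
def apply_logit_bias(logits, allowed_tokens, bias=100.0):
    """Return a copy of logits with `bias` added to every allowed token."""
    return {
        token: score + (bias if token in allowed_tokens else 0.0)
        for token, score in logits.items()
    }


def pick_option(logits, allowed_tokens=("A", "B", "C", "D")):
    """Greedy choice after biasing, so the result is a valid option letter."""
    biased = apply_logit_bias(logits, set(allowed_tokens))
    return max(biased, key=biased.get)


# Without the bias, the highest-scoring token is "The", which is not a valid answer.
toy_logits = {"The": 5.0, "A": 1.0, "B": 2.5, "C": -1.0, "D": 0.0}
unbiased_choice = max(toy_logits, key=toy_logits.get)
biased_choice = pick_option(toy_logits)
```

The bias preserves the relative order of the option letters, so the model's preferred option among the valid ones is still selected.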

Option 2: Run the Python script directly

python vllm_logitsbias_multi.py \
  --model /path/to/your/model \
  --dataset data/test_short_medqa.json \
  --question_type mcq \
  --temperature 0.6 \
  --top_p 0.95 \
  --top_k 20 \
  --max_tokens 2048 \
  --num_generations 4 \
  --output_prefix outputs/sample_run
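
With `--num_generations 4`, each question receives several predictions. How the script aggregates them is not described here; one plausible summary is the mean accuracy over all generations, grouped by the `name` field. The sketch below assumes predictions are available as a mapping from sample `id` to a list of predicted option letters, which is a hypothetical structure and not the script's documented output format.

```python
from collections import defaultdict


def accuracy_by_dataset(samples, predictions):
    """Mean accuracy over all generations, grouped by each sample's "name".

    samples: list of dicts with "id", "gold", and "name".
    predictions: dict mapping sample id to a list of predicted letters (assumed layout).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for sample in samples:
        for predicted in predictions.get(sample["id"], []):
            total[sample["name"]] += 1
            correct[sample["name"]] += int(predicted == sample["gold"])
    return {name: correct[name] / total[name] for name in total}


samples = [
    {"id": "1", "gold": "A", "name": "medqa"},
    {"id": "2", "gold": "B", "name": "medqa"},
]
predictions = {"1": ["A", "A", "B", "A"], "2": ["B", "C", "B", "B"]}
scores = accuracy_by_dataset(samples, predictions)
```

Here 6 of the 8 generations match the gold answer, so `scores["medqa"]` is 0.75.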

Training

For the training pipeline, we use the verl framework.

Citation

If you find our repo useful, please cite our paper:

@article{li2026eliciting,
  title={Eliciting Medical Reasoning with Knowledge-enhanced Data Synthesis: A Semi-Supervised Reinforcement Learning Approach},
  author={Li, Haolin and Jiang, Shuyang and Zhang, Ruipeng and Yao, Jiangchao and Zhang, Ya and Wang, Yanfeng},
  journal={arXiv preprint arXiv:2604.11547},
  year={2026}
}
