tdlhl/MedSSR

[ACL 2026] MedSSR

Paper Model Trainset Testset

This is the code repository for our ACL 2026 Findings paper, "Eliciting Medical Reasoning with Knowledge-enhanced Data Synthesis: A Semi-Supervised Reinforcement Learning Approach".

Overview

We propose MedSSR, a framework that combines:

  • Knowledge-enhanced data synthesis with controllable rare-disease knowledge injection.
  • A two-stage semi-supervised RLVR pipeline: first self-supervised RL on pseudo-labeled synthetic data, then supervised RL on human-annotated real data.
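
To make the two-stage idea concrete, here is a minimal sketch of a verifiable reward for multiple-choice answers. It is not taken from this repository: it assumes that pseudo-labels come from a majority vote over sampled answers, which is one common choice that this README does not confirm, and the names `majority_pseudo_label` and `verifiable_reward` are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_pseudo_label(sampled_answers: Sequence[str]) -> Optional[str]:
    """Return the most frequent non-empty answer, or None if there is none.

    ASSUMPTION: majority voting is used here only for illustration.
    """
    votes = Counter(answer for answer in sampled_answers if answer)
    if not votes:
        return None
    return votes.most_common(1)[0][0]


def verifiable_reward(predicted: str, label: str) -> float:
    """Binary reward: 1.0 when the predicted option matches the label."""
    return 1.0 if predicted.strip().upper() == label.strip().upper() else 0.0


# Stage 1 (synthetic data): the label is a pseudo-label derived from samples.
samples = ["B", "B", "C", "B"]
pseudo = majority_pseudo_label(samples)
stage1_rewards = [verifiable_reward(s, pseudo) for s in samples]

# Stage 2 (real data): the label is the human-annotated gold answer.
stage2_reward = verifiable_reward("a", "A")
```

The same reward function serves both stages in this sketch; only the source of the label changes.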

Environment

Our environment is provided in MedSSR.yml (conda) and requirements.txt (pip).

Some key requirements:

torch==2.6.0+cu124
transformers==4.52.4
vllm==0.8.5.post1
verl==0.3.1

To install:

conda env create -f MedSSR.yml
conda activate MedSSR
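
After activating the environment, a quick check against the pinned versions can catch mismatches early. The following is a minimal sketch using only the standard library; the helper name `find_mismatches` is hypothetical, and the sketch assumes the distribution names match the package names listed above.

```python
from importlib import metadata

# Pins copied from the key requirements listed above.
EXPECTED = {
    "torch": "2.6.0+cu124",
    "transformers": "4.52.4",
    "vllm": "0.8.5.post1",
    "verl": "0.3.1",
}


def find_mismatches(expected, lookup=metadata.version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in find_mismatches(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {installed}")
```

The script prints nothing when every pinned package is installed at the expected version.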

Dataset

Our test datasets are provided in the data folder.

Input data should be a JSON list. Each sample should contain at least the following fields:

{
  "id": "unique-id",
  "question": "Question text with answer options",
  "gold": "A",
  "name": "dataset_name"
}

See data/test_short_medqa.json for an example.
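
Before running an evaluation on your own data, it can help to confirm that every sample carries the four fields shown above. The following is a minimal sketch, not part of the repository; the helper names `validate_samples` and `validate_file` are hypothetical.

```python
import json
from pathlib import Path

# Field names taken from the sample format described above.
REQUIRED_FIELDS = ("id", "question", "gold", "name")


def validate_samples(samples):
    """Return a list of problems found; an empty list means the data looks fine."""
    if not isinstance(samples, list):
        return ["top-level JSON value must be a list"]
    problems = []
    for index, sample in enumerate(samples):
        if not isinstance(sample, dict):
            problems.append(f"sample {index}: not a JSON object")
            continue
        missing = [field for field in REQUIRED_FIELDS if field not in sample]
        if missing:
            problems.append(f"sample {index}: missing {', '.join(missing)}")
    return problems


def validate_file(path):
    """Load a JSON dataset file and validate its samples."""
    with Path(path).open(encoding="utf-8") as handle:
        return validate_samples(json.load(handle))
```

For example, `validate_file("data/test_short_medqa.json")` should return an empty list if the file follows the format described above.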

Evaluation

Option 1: Run the example script

We apply logit bias in a second decoding pass to avoid invalid outputs. The provided launcher can evaluate multiple datasets from a single script:

MODEL_PATH=/path/to/your/model \
bash scripts/run_example.sh

You can also override the output and dataset directories:

MODEL_PATH=/path/to/your/model \
BASE_OUTPUT_DIR=./outputs \
DATASET_DIR=./data \
bash scripts/run_example.sh
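
The idea of constraining a second decoding pass with logit bias can be illustrated without a model. The sketch below is not the repository's implementation: it uses a toy vocabulary with made-up scores and hypothetical helper names, and only shows how a large positive bias on the option tokens forces the selected token to be a valid option.

```python
from typing import Dict, Iterable


def apply_logit_bias(
    logits: Dict[str, float], allowed: Iterable[str], bias: float = 100.0
) -> Dict[str, float]:
    """Add `bias` to the logit of every allowed token; leave the rest unchanged."""
    allowed_set = set(allowed)
    return {
        token: (score + bias) if token in allowed_set else score
        for token, score in logits.items()
    }


def greedy_pick(logits: Dict[str, float]) -> str:
    """Return the token with the highest logit."""
    return max(logits, key=logits.get)


# Toy next-token scores (illustrative values, not real model output).
toy_logits = {"The": 3.2, "A": 1.1, "B": 2.4, "C": 0.3, "D": -0.5}

unconstrained = greedy_pick(toy_logits)  # picks a non-option token
constrained = greedy_pick(apply_logit_bias(toy_logits, ["A", "B", "C", "D"]))
```

Without the bias, the top token here is "The", which is not a valid answer; with the bias, the top token is the highest-scoring option, "B". In vLLM, `SamplingParams` accepts a `logit_bias` mapping from token IDs to bias values; whether the provided scripts use it in exactly this way is an assumption.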

Option 2: Run the Python script directly

python vllm_logitsbias_multi.py \
  --model /path/to/your/model \
  --dataset data/test_short_medqa.json \
  --question_type mcq \
  --temperature 0.6 \
  --top_p 0.95 \
  --top_k 20 \
  --max_tokens 2048 \
  --num_generations 4 \
  --output_prefix outputs/sample_run

Training

For the training pipeline, we use the verl framework.

Citation

If you find our repo useful, please cite our paper:

@article{li2026eliciting,
  title={Eliciting Medical Reasoning with Knowledge-enhanced Data Synthesis: A Semi-Supervised Reinforcement Learning Approach},
  author={Li, Haolin and Jiang, Shuyang and Zhang, Ruipeng and Yao, Jiangchao and Zhang, Ya and Wang, Yanfeng},
  journal={arXiv preprint arXiv:2604.11547},
  year={2026}
}
