ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning

[📖Paper]    [🤗ReasonMed Dataset]

[🤗ReasonMed-7B model]    [🤗CoTMed-7B model]    [🤗ResponseMed-7B model]

Table of Contents

  1. Introduction
  2. Installation
  3. Modules
  4. Example Pipeline
  5. Training
  6. Evaluation
  7. Conclusion
  8. Citations

Introduction

ReasonMed is a multi-agent-generated dataset designed to advance medical reasoning capabilities. The accompanying repository provides tools and modules for generating, validating, optimizing, ranking, summarizing, and evaluating Chain-of-Thought (CoT) responses in the medical domain. ReasonMed's goal is to help researchers and practitioners improve and assess medical reasoning in clinical decision-making.

This README provides an overview of ReasonMed's core functionality, installation instructions, usage examples, and how to integrate each module into your medical reasoning workflow.

Installation

Clone the Repository

To get started, clone the ReasonMed repository:

git clone https://github.com/YuSun-Work/ReasonMed.git
cd ReasonMed

Requirements

conda create -n reasonmed python=3.11 -y
conda activate reasonmed
pip install -r requirements.txt

Note: Ensure that you have access to the models or endpoints that each script uses for inference.

Modules

Generate CoTs

This module generates multiple Chain-of-Thought (CoT) responses with three different models, each producing three CoTs per question (nine in total, hence generate_9cot.py).

Command Example:

python generate_9cot.py --data_path /path/to/question.json --model_path1 /path/to/model1 --model_path2 /path/to/model2 --model_path3 /path/to/model3 --json_path /path/to/save_cot.json

Input Example (question.json):

[
    {
        "question": "Chronic urethral obstruction due to benign prostatic hyperplasia can lead to the following change in kidney parenchyma",
        "options": [
            "Hyperplasia",
            "Hypertrophy",
            "Atrophy",
            "Dysplasia"
        ]
    }
]
  • data_path: The path to the JSON file containing clinical questions and multiple-choice options.
  • model_path1, model_path2, model_path3: Paths to the three models used for generating CoTs.
  • json_path: Path to save the generated CoTs.

Output:

The generated CoTs are saved to the JSON file given by --json_path; a hypothetical record is sketched below.
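
The exact schema is defined by generate_9cot.py; the record below is only an illustrative sketch, assuming each entry keeps the question together with its generated CoTs (all field names here are assumptions, not taken from the repository):

[
    {
        "question": "Chronic urethral obstruction due to benign prostatic hyperplasia can lead to the following change in kidney parenchyma",
        "answer": "Atrophy",
        "cots": [
            {
                "model": "model1",
                "cot": "Chronic obstruction raises intratubular pressure, compressing and thinning the renal parenchyma, so the expected change is atrophy."
            }
        ]
    }
]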

Intermediate File:

intermediate.json stores results after each stage, which is useful for debugging and troubleshooting.

Evaluate CoTs

This script validates the generated CoTs, checking each one for clinical and logical correctness.

Command Example:

python verifier.py --input_json /path/to/save_cot.json --model_path /path/to/eval_model
  • input_json: Path to the JSON file containing generated CoTs.
  • model_path: Path to the model used for evaluating the CoTs.

Output:

Validates the CoTs and outputs a verdict (e.g., Correct, Error).
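
The verdict format is not specified in this README; a hypothetical annotated entry, assuming the verifier attaches its verdict to each CoT, might look like this (fields are illustrative):

{
    "cot": "Chronic obstruction raises intratubular pressure ... atrophy.",
    "verdict": "Correct"
}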

Quality Ranker

The quality ranker scores the CoTs generated for each clinical question and keeps the two highest-quality ones.

Command Example:

python quality_ranker.py --input_json /path/to/save_cot.json --model_path /path/to/eval_model --intermediate_file /path/to/intermediate.json --final_output /path/to/final_results.json
  • input_json: The JSON file with the CoTs to be evaluated and ranked.
  • model_path: Path to the model used for ranking.
  • intermediate_file: Path to save intermediate ranking results.
  • final_output: Path to save the final ranked CoTs.

Output:

Ranks the CoTs and saves the best two CoTs per clinical question.
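
Assuming final_results.json keeps the two surviving CoTs per question together with their ranking scores, a hypothetical record could look like the following (fields are illustrative):

{
    "question": "Chronic urethral obstruction due to benign prostatic hyperplasia ...",
    "top_cots": [
        { "cot": "...", "rank_score": 9.2 },
        { "cot": "...", "rank_score": 8.8 }
    ]
}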

Error Refiner

This module refines CoTs that contain errors or incomplete reasoning, feeding error feedback back into the model to improve the reasoning.

Command Example:

python error_refiner_openai.py --input_json /path/to/save_cot.json --api_key /path/to/api_key --azure_endpoint /path/to/azure_endpoint --model /path/to/refine_model --output_json /path/to/refined_output.json
  • input_json: Path to the JSON file with generated CoTs to be refined.
  • api_key: Your Azure OpenAI API key.
  • azure_endpoint: The endpoint for the Azure OpenAI API.
  • model: The Azure OpenAI model (deployment) used for refinement.
  • output_json: Path to save the refined CoTs.

Output:

Refined CoTs that incorporate error corrections from previous iterations.
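
error_refiner_openai.py talks to the Azure OpenAI chat API; the snippet below is a minimal sketch of that call pattern using the official openai Python SDK, not the script's actual code (the api_version, deployment name, and prompt wording are assumptions):

from openai import AzureOpenAI

# Hypothetical client setup; the api_version and deployment name are placeholders.
client = AzureOpenAI(
    api_key="YOUR_AZURE_OPENAI_KEY",
    azure_endpoint="https://your-resource.openai.azure.com",
    api_version="2024-02-01",
)

def refine_cot(question: str, cot: str, error_feedback: str) -> str:
    """Ask the refinement model to rewrite a flawed CoT using error feedback."""
    response = client.chat.completions.create(
        model="your-deployment-name",  # an Azure deployment name, not a local path
        messages=[
            {"role": "system", "content": "You are a careful medical reasoning editor."},
            {
                "role": "user",
                "content": (
                    f"Question: {question}\n\n"
                    f"Flawed reasoning: {cot}\n\n"
                    f"Identified errors: {error_feedback}\n\n"
                    "Rewrite the reasoning so that the errors are corrected."
                ),
            },
        ],
    )
    return response.choices[0].message.content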

Diff Optimizer

This module performs advanced optimizations on the CoTs using the Azure OpenAI API. It focuses on deep reasoning improvements based on detailed feedback.

Command Example:

python diff_opti.py --input_json /path/to/save_cot.json --output_json /path/to/optimized_cot.json --api_key /path/to/api_key --azure_endpoint /path/to/azure_endpoint --model /path/to/optimize_model
  • input_json: Path to the JSON file with CoTs to be optimized.
  • output_json: Path to save the optimized CoTs.
  • api_key: Your Azure OpenAI API key.
  • azure_endpoint: The Azure OpenAI endpoint.
  • model: The model used for deep optimization.

Output:

Optimized CoTs that have undergone deeper analysis and improvement.

Response Summarizer

This module generates concise summaries for each CoT, transforming verbose reasoning into a one-sentence explanation.

Command Example:

python response_summarizer.py --input_json /path/to/save_cot.json --model /path/to/summary_model --azure_endpoint /path/to/azure_endpoint --api_key /path/to/api_key --results_file /path/to/summaries.json
  • input_json: Path to the JSON file with CoTs to be summarized.
  • model: The model used for summarization.
  • azure_endpoint: The endpoint for the Azure OpenAI API.
  • api_key: Your Azure OpenAI API key.
  • results_file: Path to save the summarized CoTs.

Output:

A JSON file with concise summaries for each CoT.
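
A hypothetical entry in the summaries file, assuming the summarizer pairs each question with its one-sentence distillation (fields are illustrative):

{
    "question": "Chronic urethral obstruction due to benign prostatic hyperplasia ...",
    "summary": "Chronic obstruction raises back-pressure on the nephrons, so the kidney parenchyma undergoes atrophy."
}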

Score Evaluator

This module evaluates the clinical accuracy of CoTs based on multiple criteria and generates scores for each CoT.

Command Example:

python score_evaluator.py --input_jsons /path/to/save_cot.json --model /path/to/score_model --azure_endpoint /path/to/azure_endpoint --api_key /path/to/api_key --final_output /path/to/scores.json
  • input_jsons: The input JSON file with CoTs to be evaluated.
  • model: The Azure OpenAI model used for scoring.
  • azure_endpoint: The endpoint for the Azure OpenAI API.
  • api_key: Your Azure OpenAI API key.
  • final_output: Path to save the final evaluation scores.

Output:

Scores for each CoT based on clinical accuracy, reasoning, and completeness.
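
The criteria named above suggest per-criterion sub-scores; a hypothetical scores.json entry with one value per criterion plus an overall score (the field names and 0-10 scale are assumptions):

{
    "question_id": 0,
    "clinical_accuracy": 9,
    "reasoning_quality": 8,
    "completeness": 9,
    "overall": 8.7
}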

Example Pipeline

Easy Pipeline:

To generate and evaluate CoTs:

python generate_9cot.py --data_path /path/to/question.json --model_path1 /path/to/model1 --model_path2 /path/to/model2 --model_path3 /path/to/model3 --json_path /path/to/save_cot.json
python verifier.py --input_json /path/to/save_cot.json --model_path /path/to/eval_model

Medium Pipeline:

To generate, evaluate, rank, and refine CoTs:

python generate_9cot.py --data_path /path/to/question.json --model_path1 /path/to/model1 --model_path2 /path/to/model2 --model_path3 /path/to/model3 --json_path /path/to/save_cot.json
python verifier.py --input_json /path/to/save_cot.json --model_path /path/to/eval_model
python quality_ranker.py --input_json /path/to/save_cot.json --model_path /path/to/eval_model --intermediate_file /path/to/intermediate.json --final_output /path/to/final_results.json
python error_refiner_openai.py --input_json /path/to/save_cot.json --api_key /path/to/api_key --azure_endpoint /path/to/azure_endpoint --model /path/to/refine_model --output_json /path/to/refined_output.json

Difficult Pipeline:

For advanced optimizations:

python generate_9cot.py --data_path /path/to/question.json --model_path1 /path/to/model1 --model_path2 /path/to/model2 --model_path3 /path/to/model3 --json_path /path/to/save_cot.json
python verifier.py --input_json /path/to/save_cot.json --model_path /path/to/eval_model
python quality_ranker.py --input_json /path/to/save_cot.json --model_path /path/to/eval_model --intermediate_file /path/to/intermediate.json --final_output /path/to/final_results.json
python error_refiner_openai.py --input_json /path/to/save_cot.json --api_key /path/to/api_key --azure_endpoint /path/to/azure_endpoint --model /path/to/refine_model --output_json /path/to/refined_output.json
python diff_opti.py --input_json /path/to/save_cot.json --output_json /path/to/optimized_cot.json --api_key /path/to/api_key --azure_endpoint /path/to/azure_endpoint --model /path/to/optimize_model
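
The same stages can also be chained from a single Python driver; the sketch below simply shells out to each script in order (the flags are taken from the commands above; all paths, keys, and model identifiers are placeholders):

import subprocess

# Placeholder paths; substitute your own data, model, and output locations.
STAGES = [
    ["python", "generate_9cot.py", "--data_path", "question.json",
     "--model_path1", "model1", "--model_path2", "model2",
     "--model_path3", "model3", "--json_path", "save_cot.json"],
    ["python", "verifier.py", "--input_json", "save_cot.json",
     "--model_path", "eval_model"],
    ["python", "quality_ranker.py", "--input_json", "save_cot.json",
     "--model_path", "eval_model", "--intermediate_file", "intermediate.json",
     "--final_output", "final_results.json"],
    ["python", "error_refiner_openai.py", "--input_json", "save_cot.json",
     "--api_key", "YOUR_KEY", "--azure_endpoint", "YOUR_ENDPOINT",
     "--model", "refine_model", "--output_json", "refined_output.json"],
]

for cmd in STAGES:
    # check=True aborts the pipeline as soon as one stage fails.
    subprocess.run(cmd, check=True)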

Training

Stay tuned

Evaluation

Stay tuned

Conclusion

ReasonMed provides an integrated framework for generating, optimizing, validating, and evaluating medical Chain-of-Thought responses, a pipeline intended to support the development and assessment of AI-powered clinical reasoning and decision-making models.

Citations

@misc{sun2025reasonmed370kmultiagentgenerated,
      title={ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning}, 
      author={Yu Sun and Xingyu Qian and Weiwen Xu and Hao Zhang and Chenghao Xiao and Long Li and Yu Rong and Wenbing Huang and Qifeng Bai and Tingyang Xu},
      year={2025},
      eprint={2506.09513},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2506.09513}, 
}
