This repository contains two Python scripts used to build a multi-agent LLM system for screening research articles.
The workflow follows the process described in the paper: two open-source LLMs act as worker models, and GPT-4o acts as the arbiter when the workers disagree.
Scripts included:
- `call_LLM.py` – runs one local open-source LLM on all papers.
- `tie_breaker.py` – resolves disagreements between the two worker models using GPT-4o.
`call_LLM.py` runs a single locally deployed LLM (e.g., Llama 4 or Qwen3) to evaluate each paper's title and abstract against predefined inclusion/exclusion criteria.
For each paper, the model outputs (see the sketch after this list):
- psychiatric focus
- multimodal data use
- AI methods
- original research
- final include/exclude decision
- a brief explanation
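A minimal sketch of what this call might look like, assuming an OpenAI-compatible local server; the prompt wording, the `screen_paper` helper, and the JSON field names are illustrative, not the script's exact ones:

```python
import json
from openai import OpenAI

# The local server exposes an OpenAI-compatible API, so the standard client
# works with a custom base_url; the API key is unused by local deployments.
client = OpenAI(base_url="http://localhost:8085/v1", api_key="not-needed")

# Hypothetical system prompt; the real criteria live in call_LLM.py.
PROMPT = (
    "You screen research articles for a systematic review. Given a title "
    "and abstract, reply with a JSON object containing yes/no fields "
    "psychiatric_focus, multimodal_data, ai_methods, original_research, "
    "include, plus a short 'reason' string. Reply with JSON only."
)

def screen_paper(title: str, abstract: str, model: str = "llama4_scout_inst") -> dict:
    """Ask the local worker LLM for a structured screening decision."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": PROMPT},
            {"role": "user", "content": f"Title: {title}\n\nAbstract: {abstract}"},
        ],
        temperature=0,  # deterministic decisions keep the two workers comparable
    )
    # The model is instructed to reply with JSON only; callers should still
    # guard against json.JSONDecodeError on malformed output.
    return json.loads(response.choices[0].message.content)
```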
Required inputs (see the loading sketch after this list):
- An Excel file containing:
  - PMID
  - Title
  - Abstract
- A local LLM endpoint at `http://localhost:8085/v1`
- A specified model name (e.g., `llama4_scout_inst`)
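One way these inputs could be wired together, assuming pandas and reusing the `screen_paper` helper from the sketch above; the file name `papers.xlsx` is a placeholder:

```python
import pandas as pd

# Load only the three columns the screening step needs.
papers = pd.read_excel("papers.xlsx", usecols=["PMID", "Title", "Abstract"])

for row in papers.itertuples(index=False):
    decision = screen_paper(row.Title, row.Abstract)
    print(row.PMID, decision.get("include"))
```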
Outputs: two CSV files (see the parsing sketch below):
- Raw model responses
- Parsed JSON results with yes/no decisions and reasons
These results are later compared with another model’s results (e.g., Qwen3 output).
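An illustrative parsing step that turns the raw-response CSV into the parsed yes/no CSV; the column names (`PMID`, `response`) and the output schema are assumptions, not the script's exact layout:

```python
import csv
import json

def parse_raw(raw_csv: str, parsed_csv: str) -> None:
    """Extract the include decision and reason from each raw model reply."""
    with open(raw_csv, newline="", encoding="utf-8") as fin, \
         open(parsed_csv, "w", newline="", encoding="utf-8") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=["PMID", "include", "reason"])
        writer.writeheader()
        for row in reader:
            try:
                result = json.loads(row["response"])
                writer.writerow({
                    "PMID": row["PMID"],
                    "include": result.get("include", "no"),
                    "reason": result.get("reason", ""),
                })
            except json.JSONDecodeError:
                # Keep unparseable replies visible instead of dropping them.
                writer.writerow({"PMID": row["PMID"], "include": "error",
                                 "reason": row["response"][:200]})
```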
`tie_breaker.py` acts as the arbitration step. After both worker LLMs have produced their include/exclude decisions, the script (sketched after the list below):
- Finds papers where the two models disagree.
- Retrieves their titles and abstracts.
- Sends them to GPT-4o through an internal Azure OpenAI endpoint.
- Parses the model’s decision.
- Produces the final combined decision:
- If workers agree → keep their decision
- If workers disagree → use GPT-4o’s decision
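A hedged sketch of the arbitration call, assuming the Azure OpenAI Python client; the deployment name, environment variable names, and prompt are placeholders for the internal endpoint and token listed below:

```python
import os
from openai import AzureOpenAI

# Credentials for the internal Azure endpoint; the variable names are illustrative.
arbiter = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

def arbitrate(title: str, abstract: str) -> str:
    """Ask GPT-4o for a final include/exclude call on a disputed paper."""
    response = arbiter.chat.completions.create(
        model="gpt-4o",  # Azure deployment name; adjust to your deployment
        messages=[
            {"role": "system", "content": (
                "You resolve disagreements between two screeners. Using the "
                "review's inclusion/exclusion criteria, answer with exactly "
                "one word: include or exclude."
            )},
            {"role": "user", "content": f"Title: {title}\n\nAbstract: {abstract}"},
        ],
        temperature=0,
    )
    return response.choices[0].message.content.strip().lower()
```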
Required inputs:
- A CSV file with worker decisions (`include_qwen` and `include_llama`)
- An Excel file with titles and abstracts
- GPT-4o API access (internal endpoint + token)
Outputs:
- A CSV containing the GPT-4o arbitration results
- A merged CSV with the final screening decision for each paper (see the merge sketch below)
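A possible shape for the final merge, assuming pandas and the worker column names used in this README (`include_qwen`, `include_llama`); the file names and the `include_gpt4o` column are placeholders:

```python
import pandas as pd

workers = pd.read_csv("worker_decisions.csv")  # include_qwen, include_llama per PMID
gpt4o = pd.read_csv("gpt4o_arbitration.csv")   # include_gpt4o for disputed PMIDs

merged = workers.merge(gpt4o, on="PMID", how="left")
agree = merged["include_qwen"] == merged["include_llama"]
# Agreement keeps the shared decision; disagreement defers to GPT-4o.
merged["final_include"] = merged["include_qwen"].where(agree, merged["include_gpt4o"])
merged.to_csv("final_decisions.csv", index=False)
```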