UNITES-Lab/CURA

CURA: Calibrated Uncertainty with Retrieval and Agents

CURA is a reliability-first multimodal medical decision-support framework. It couples knowledge-graph grounded retrieval with a heterogeneous clinical council, fuses opinions via Bayesian belief aggregation, and exposes a calibrated posterior to downstream decision policies — selective abstention, split conformal prediction sets, entropy-gated escalation, and uncertainty-driven interactive diagnosis.
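As a rough illustration of Bayesian belief aggregation in log-space, the sketch below assumes each council member returns a probability distribution over the same candidate answers. It fuses them by summing (optionally weighted) log-probabilities under a uniform prior, which is log-linear pooling, and then reports the normalized Shannon entropy of the result. The names `aggregate_beliefs`, `normalized_entropy`, and the `eps` floor are hypothetical; the likelihood model actually used in `aggregation/bayesian.py` may differ.

```python
import math
from typing import Dict, List, Optional

def aggregate_beliefs(
    expert_dists: List[Dict[str, float]],
    weights: Optional[List[float]] = None,
    eps: float = 1e-9,
) -> Dict[str, float]:
    """Fuse per-expert distributions by summing weighted log-probabilities
    (uniform prior assumed), then renormalize with log-sum-exp."""
    labels = sorted(set().union(*expert_dists))
    if weights is None:
        weights = [1.0] * len(expert_dists)
    log_post = {}
    for label in labels:
        # eps floors zero probabilities so a single expert cannot veto a label outright
        log_post[label] = sum(
            w * math.log(max(d.get(label, 0.0), eps))
            for d, w in zip(expert_dists, weights)
        )
    # log-sum-exp normalization avoids underflow when many experts are multiplied
    m = max(log_post.values())
    log_z = m + math.log(sum(math.exp(v - m) for v in log_post.values()))
    return {label: math.exp(v - log_z) for label, v in log_post.items()}

def normalized_entropy(posterior: Dict[str, float]) -> float:
    """Shannon entropy divided by log(K): 0 means certain, 1 means uniform."""
    k = len(posterior)
    if k <= 1:
        return 0.0
    h = -sum(p * math.log(p) for p in posterior.values() if p > 0)
    return h / math.log(k)

# Hypothetical opinions from three specialists over answers A/B/C.
experts = [
    {"A": 0.6, "B": 0.3, "C": 0.1},
    {"A": 0.5, "B": 0.4, "C": 0.1},
    {"A": 0.7, "B": 0.2, "C": 0.1},
]
posterior = aggregate_beliefs(experts)
print(posterior, normalized_entropy(posterior))
```

Because all three hypothetical experts lean toward A, the fused posterior is sharper than any single opinion, so its normalized entropy is lower than each expert's.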

Core modules (cura/):

  • kg/ — Co-evolutionary LLM↔KG grounding: entity extraction, embedding-based matching, Monte Carlo multi-hop path sampling, evidence reranking
  • expert_panel.py — Heterogeneous clinical council with specialist personas (sequential and adaptive modes)
  • aggregation/bayesian.py — Bayesian belief aggregation in log-space with normalized Shannon entropy
  • aggregation/conformal.py — Split conformal prediction sets with Clopper-Pearson CIs
  • aggregation/selective.py — Selective abstention (risk-coverage curves, AURC)
  • aggregation/calibration.py — ECE computation and calibration diagnostics
  • llm.py — Unified LLM interface (OpenAI, Azure, Anthropic, AWS Bedrock)
  • usage_tracker.py — Token and cost tracking across all agents
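The split conformal step in `aggregation/conformal.py` can be pictured with the sketch below. It assumes the common nonconformity score s = 1 - p(true label), calibrates the finite-sample quantile on held-out cases, and reports a Clopper-Pearson interval on the observed coverage. To stay dependency-free it computes Clopper-Pearson by inverting the exact binomial CDF with bisection rather than via beta quantiles; all function names and the calibration numbers are hypothetical, not CURA's API.

```python
import math
from typing import Dict, List, Sequence, Tuple

def conformal_threshold(cal_true_probs: Sequence[float], alpha: float = 0.1) -> float:
    """Return the ceil((n + 1)(1 - alpha))-th smallest score s = 1 - p(true label)."""
    scores = sorted(1.0 - p for p in cal_true_probs)
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration cases: every label is kept
    return scores[k - 1]

def prediction_set(posterior: Dict[str, float], qhat: float) -> List[str]:
    """Keep every candidate whose score 1 - p(label) is within the threshold."""
    return [label for label, p in posterior.items() if 1.0 - p <= qhat]

def _binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), 0 < p < 1, with terms built in log-space."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    log_n_fact = math.lgamma(n + 1)
    return sum(
        math.exp(log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                 + i * math.log(p) + (n - i) * math.log1p(-p))
        for i in range(k + 1)
    )

def _bisect(f, target: float, increasing: bool, iters: int = 100) -> float:
    """Solve f(p) = target on (0, 1) for a monotone function f."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k: int, n: int, conf: float = 0.95) -> Tuple[float, float]:
    """Exact two-sided binomial CI for k covered cases out of n."""
    a = (1 - conf) / 2
    # Lower bound solves P(X >= k) = a; that tail probability increases with p.
    lower = 0.0 if k == 0 else _bisect(lambda p: 1 - _binom_cdf(k - 1, n, p), a, True)
    # Upper bound solves P(X <= k) = a; that probability decreases with p.
    upper = 1.0 if k == n else _bisect(lambda p: _binom_cdf(k, n, p), a, False)
    return lower, upper

# Hypothetical calibration data: probability the model gave to the true answer.
cal = [0.9, 0.8, 0.7, 0.6, 0.5, 0.95, 0.85, 0.75, 0.65, 0.55]
qhat = conformal_threshold(cal, alpha=0.2)
print(prediction_set({"A": 0.6, "B": 0.3, "C": 0.1}, qhat))  # -> ['A']
print(clopper_pearson(k=87, n=100))  # CI around an observed 87% coverage
```

The quantile rank ceil((n + 1)(1 - alpha)) is what gives split conformal its marginal coverage guarantee of at least 1 - alpha; the Clopper-Pearson interval then shows how much the empirical coverage on a finite test set can be trusted.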

Setup

# Environment
cd cura_env && bash setup.sh
conda activate cura_e1

# Install
pip install -e .

# API keys
cp .env.example .env  # fill in your keys

Requires at least one LLM provider key (OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.). For Azure deployments, prefix model names with azure- (e.g. azure-gpt-4.1).

The KG backend requires Neo4j. See scripts/setup_neo4j.py and config/kg_settings.json.
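For intuition about the Monte Carlo multi-hop path sampling performed by kg/, the sketch below runs uniform random walks over a small in-memory graph instead of Neo4j and counts how often each path is drawn; frequent paths become evidence candidates for later reranking. Uniform neighbor choice, the acyclic-path constraint, `sample_paths`, and the toy graph are all assumptions for illustration; the real sampler may weight edges or hops differently.

```python
import random
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

# Toy adjacency list: node -> [(relation, neighbor), ...]
Graph = Dict[str, List[Tuple[str, str]]]

def sample_paths(graph: Graph, seeds: Sequence[str], n_samples: int = 200,
                 max_hops: int = 3, rng: Optional[random.Random] = None) -> Counter:
    """Monte Carlo random walks from seed entities; returns path -> hit count."""
    rng = rng if rng is not None else random.Random(0)
    counts: Counter = Counter()
    for _ in range(n_samples):
        node = rng.choice(list(seeds))
        path, visited = [node], {node}
        for _ in range(max_hops):
            # Only step to unvisited neighbors so every sampled path is acyclic.
            options = [(rel, nb) for rel, nb in graph.get(node, []) if nb not in visited]
            if not options:
                break
            rel, node = rng.choice(options)
            path += [rel, node]
            visited.add(node)
        if len(path) > 1:  # ignore walks that never left the seed entity
            counts[tuple(path)] += 1
    return counts

toy_kg: Graph = {
    "chest pain": [("symptom_of", "myocardial infarction"),
                   ("symptom_of", "pulmonary embolism")],
    "myocardial infarction": [("diagnosed_by", "troponin")],
    "pulmonary embolism": [("diagnosed_by", "CT angiography")],
}
paths = sample_paths(toy_kg, seeds=["chest pain"], n_samples=500, max_hops=2)
for path, hits in paths.most_common():
    print(" -> ".join(path), hits)
```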

Usage

from cura.agent import A1

agent = A1(path='./data', llm='azure-gpt-4.1')
result = agent.go("Diagnose this patient presenting with...")

Benchmarks

# Medical QA (MedQA, MedMCQA, MMLU-Medical, QA4MRE)
python benchmark/run_benchmark.py --dataset medqa --model azure-gpt-4.1

# AgentClinic interactive diagnosis (MedQA, NEJM scenarios)
python benchmark/run_agentclinic.py --dataset NEJM_Ext --n_scenarios 100 \
    --mode baseline_single --max_turns 10 \
    --commit_pmax 0.85 --commit_entropy 0.3

# VQA-RAD multimodal
python benchmark/run_benchmark.py --dataset vqa_rad --model azure-gpt-4.1
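The --commit_pmax and --commit_entropy flags in the AgentClinic command point to an entropy-gated commit rule for uncertainty-driven diagnosis. The sketch below assumes the agent keeps gathering information until the top posterior probability is at least commit_pmax and the normalized entropy is at most commit_entropy (requiring both is an assumption), and otherwise stops at --max_turns with its best guess. The helpers `should_commit` and `run_until_commit` and the per-turn posteriors are hypothetical, not the run_agentclinic.py implementation.

```python
import math
from typing import Dict, List, Tuple

def normalized_entropy(posterior: Dict[str, float]) -> float:
    """Shannon entropy divided by log(K), so 0 = certain and 1 = uniform."""
    k = len(posterior)
    if k <= 1:
        return 0.0
    return -sum(p * math.log(p) for p in posterior.values() if p > 0) / math.log(k)

def should_commit(posterior: Dict[str, float], commit_pmax: float = 0.85,
                  commit_entropy: float = 0.3) -> bool:
    """Assumed gate: the top diagnosis must be probable enough AND the whole
    posterior must be concentrated enough."""
    return (max(posterior.values()) >= commit_pmax
            and normalized_entropy(posterior) <= commit_entropy)

def run_until_commit(trajectory: List[Dict[str, float]], max_turns: int = 10,
                     **gate) -> Tuple[int, str, bool]:
    """Walk per-turn posteriors; return (turn, diagnosis, committed).
    If the gate never opens, fall back to the argmax at the last allowed turn."""
    turns = trajectory[:max_turns]
    for turn, post in enumerate(turns, start=1):
        if should_commit(post, **gate):
            return turn, max(post, key=post.get), True
    return len(turns), max(turns[-1], key=turns[-1].get), False

# Hypothetical posteriors after each question/answer turn.
trajectory = [
    {"MI": 0.50, "PE": 0.30, "GERD": 0.20},
    {"MI": 0.80, "PE": 0.15, "GERD": 0.05},
    {"MI": 0.95, "PE": 0.04, "GERD": 0.01},
]
print(run_until_commit(trajectory, max_turns=10, commit_pmax=0.85, commit_entropy=0.3))
# -> (3, 'MI', True)
```

Gating on entropy as well as the top probability matters when the remaining mass is spread out: under these thresholds a posterior such as {"A": 0.86, "B": 0.07, "C": 0.07} clears commit_pmax but still has normalized entropy of about 0.46, so the sketch would keep asking.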

Results are saved to experiments/results/ with per-case JSONL, metrics CSV, and run metadata JSON.
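Selective abstention (aggregation/selective.py) is usually evaluated with a risk-coverage curve: rank cases by confidence, answer only the most confident fraction, and measure the error rate among answered cases; AURC is the average risk over all coverage levels. The sketch below uses that common discrete definition and assumes each per-case JSONL record carries a confidence and a correctness flag. The field names `confidence` and `correct` and all helper names are hypothetical, and CURA's own implementation may handle ties or interpolation differently.

```python
import json
from typing import List, Sequence, Tuple

def load_pairs(path: str, conf_key: str = "confidence",
               correct_key: str = "correct") -> Tuple[List[float], List[bool]]:
    """Read (confidence, correct) pairs from a per-case JSONL file (keys hypothetical)."""
    confs, corr = [], []
    with open(path) as f:
        for line in f:
            if line.strip():
                rec = json.loads(line)
                confs.append(float(rec[conf_key]))
                corr.append(bool(rec[correct_key]))
    return confs, corr

def risk_coverage_curve(confidences: Sequence[float],
                        correct: Sequence[bool]) -> List[Tuple[float, float]]:
    """(coverage, selective risk) when answering only the top-ranked cases."""
    order = sorted(range(len(confidences)), key=lambda i: confidences[i], reverse=True)
    curve, errors = [], 0
    for rank, i in enumerate(order, start=1):
        errors += 0 if correct[i] else 1
        curve.append((rank / len(order), errors / rank))
    return curve

def aurc(confidences: Sequence[float], correct: Sequence[bool]) -> float:
    """Area under the risk-coverage curve, averaged over the n coverage levels."""
    curve = risk_coverage_curve(confidences, correct)
    return sum(risk for _, risk in curve) / len(curve)

# Hypothetical results: confidence is higher on the cases the model got right.
print(aurc([0.9, 0.8, 0.7, 0.6], [True, True, False, False]))
```

Lower AURC is better: it rewards confidence scores that push errors to the end of the ranking, which is exactly what abstention relies on.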

Key Scripts

| Script | Purpose |
|---|---|
| benchmark/run_benchmark.py | Run QA benchmarks with optional expert panel |
| benchmark/run_agentclinic.py | Interactive diagnosis (immediate / fixed-turn / uncertainty-driven) |
| scripts/setup_neo4j.py | Initialize Neo4j KG backend |
| scripts/build_benchmark_kg.py | Build domain KGs from literature |
| scripts/plot_reliability_diagram.py | Generate calibration and reliability plots |
| scripts/run_ablation_diversity.py | Council size / diversity ablations |
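Reliability diagrams and the ECE reported by aggregation/calibration.py both rest on binning predictions by confidence. The sketch below assumes the common equal-width binning with 10 bins; the function names and the example data are hypothetical. Each (count, mean confidence, accuracy) triple is one bar of a reliability diagram, and ECE is the count-weighted average gap between accuracy and confidence.

```python
from typing import List, Sequence, Tuple

def reliability_bins(confidences: Sequence[float], correct: Sequence[bool],
                     n_bins: int = 10) -> List[Tuple[int, float, float]]:
    """Equal-width bins -> (count, mean confidence, accuracy) for non-empty bins."""
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, correct):
        # min() places confidence 1.0 in the last bin instead of overflowing
        bins[min(int(c * n_bins), n_bins - 1)].append((c, y))
    stats = []
    for b in bins:
        if b:
            mean_conf = sum(c for c, _ in b) / len(b)
            acc = sum(1 for _, y in b if y) / len(b)
            stats.append((len(b), mean_conf, acc))
    return stats

def expected_calibration_error(confidences: Sequence[float], correct: Sequence[bool],
                               n_bins: int = 10) -> float:
    """ECE = sum over bins of (bin size / n) * |accuracy - mean confidence|."""
    n = len(confidences)
    return sum(count / n * abs(acc - conf)
               for count, conf, acc in reliability_bins(confidences, correct, n_bins))

# Hypothetical predictions: slightly underconfident at 0.95, overconfident at 0.65.
print(expected_calibration_error([0.95, 0.95, 0.65, 0.65], [True, True, True, False]))
```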

Tests

python tests/run_all_tests.py

License

Apache 2.0. See LICENSE.

Acknowledgements

Built on LangGraph and the Biomni framework.
