AAAI 2026 Paper: KDR-Agent — a Multi-Agent LLM Framework for Low-Resource, Multi-Domain NER.
KDR-Agent is a multi-agent LLM framework designed for low-resource, multi-domain Named Entity Recognition (NER). The core idea is: “understand before recognizing, and self-correct through reflection.”
Figure: Overview of KDR-Agent.
You can install all required packages via:

    pip install -r requirements.txt

Set your OpenAI API key as an environment variable:

    export OPENAI_API_KEY="sk-xxx"

Read it in Python:

    import os
    import openai

    openai.api_key = os.getenv("OPENAI_API_KEY")

Example config file: ./config/Bio_BC5CDR.json

    {
      "dataset": "Bio_BC5CDR",
      "test_file_path": "./data/Bio_BC5CDR/test_sample.json",
      "save_file_path": "./data/Bio_BC5CDR/test_sample.json",
      "model_name": "gpt-4o",
      "api_keys": "sk-*********************************",
      "max_loop": 10
    }

Parameter description:

| Parameter | Description |
|---|---|
| dataset | Dataset name |
| test_file_path | Path to the test set (JSON) |
| save_file_path | Path to save predictions (will overwrite / write back) |
| model_name | LLM name, e.g., gpt-4o |
| api_keys | OpenAI API key (can be replaced by an environment variable) |
| max_loop | Maximum number of retries when JSON schema validation fails |
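
The role of max_loop can be illustrated with a minimal sketch of a validate-and-retry loop. The names `call_with_retries` and `is_valid_prediction`, and the `generate` callable standing in for an LLM call, are hypothetical and not taken from the repository; the check shown is a simple structural check, not a full JSON Schema validator.

```python
import json
from typing import Callable, Optional


def is_valid_prediction(obj) -> bool:
    """Check that obj is a list of {"name": str, "type": str} dicts."""
    return isinstance(obj, list) and all(
        isinstance(e, dict)
        and isinstance(e.get("name"), str)
        and isinstance(e.get("type"), str)
        for e in obj
    )


def call_with_retries(generate: Callable[[], str], max_loop: int = 10) -> Optional[list]:
    """Call generate() up to max_loop times until its output parses and validates."""
    for _ in range(max_loop):
        raw = generate()
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed JSON: ask again
        if is_valid_prediction(parsed):
            return parsed
    return None  # every attempt failed validation
```

Under this sketch, a larger max_loop trades extra API calls for a lower chance of ending up with no usable prediction for a sample.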

Run the framework with a config file:

    python main.py --args_file ./config/Bio_BC5CDR.json

After running, you will get:
- Updated test_sample.json with predictions
- Precision / Recall / F1
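
As a rough illustration of how such an args file might be consumed, the sketch below loads the JSON config and resolves the API key. The function name `load_config`, the rule that the `OPENAI_API_KEY` environment variable takes precedence over `api_keys`, and the default of 10 for `max_loop` are assumptions made for this example, not the behavior of the repository's arguments.py.

```python
import json
import os


def load_config(path: str) -> dict:
    """Load a JSON config and resolve the API key, preferring the environment."""
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    # Assumption: OPENAI_API_KEY, when set, overrides the "api_keys" field.
    env_key = os.getenv("OPENAI_API_KEY")
    if env_key:
        config["api_keys"] = env_key
    # Assumption: fall back to 10 retries, mirroring the example config.
    config.setdefault("max_loop", 10)
    return config
```

Reading the key from the environment keeps secrets out of config files that may end up under version control.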
The input file at test_file_path should be a JSON list. Each sample contains at least:

    [
      {
        "sentence": "Docetaxel was compared with paclitaxel in breast cancer.",
        "entities": [
          {"name": "Docetaxel", "type": "Chemical"},
          {"name": "paclitaxel", "type": "Chemical"},
          {"name": "breast cancer", "type": "Disease"}
        ]
      }
    ]

The output will append a prediction field for each sample:

    "predicts": [
      {"name": "...", "type": "..."}
    ]

We compute metrics using get_PRF(test_data):
- P (Precision)
- R (Recall)
- F1
You may replace it with your preferred evaluation method in tool.py
(e.g., strict match, partial match, or span-level evaluation).
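
For reference, a strict-match variant could look like the sketch below. It is not the repository's get_PRF; the function name `strict_prf` is hypothetical, and the sketch assumes micro-averaging over all samples with an entity counted as correct only when both its name and type match exactly.

```python
def strict_prf(samples):
    """Micro-averaged P/R/F1 with exact (name, type) matching per sample."""
    tp = n_pred = n_gold = 0
    for sample in samples:
        gold = {(e["name"], e["type"]) for e in sample.get("entities", [])}
        pred = {(e["name"], e["type"]) for e in sample.get("predicts", [])}
        tp += len(gold & pred)  # predictions that match a gold entity exactly
        n_pred += len(pred)
        n_gold += len(gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1
```

Because entities are compared as sets, repeated mentions of the same entity within one sentence count once. A span-level variant would need character or token offsets, which the sample format above does not include.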
We use the Python wikipedia API to fetch short summaries as background knowledge.
Retrieval uses a fallback strategy: we first try a lookup with fuzzy matching/auto-suggest; if the concept is ambiguous or not found, we fall back to candidate pages from the disambiguation list or from search results.
This makes Wikipedia context retrieval more reliable for rare, noisy, or ambiguous entity mentions.
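
The fallback order described above can be sketched as follows, using the `wikipedia` package's `summary`, `search`, `DisambiguationError`, and `PageError`. The function name `fetch_summary`, the limit of five search results, and the `wiki` parameter (which allows a stand-in module to be passed for offline testing) are assumptions for this example; the repository's actual retrieval code may differ.

```python
def fetch_summary(concept, sentences=2, wiki=None):
    """Return a short Wikipedia summary for concept, or None if nothing is found."""
    if wiki is None:
        import wikipedia as wiki  # third-party package: pip install wikipedia

    try:
        # Step 1: direct lookup with auto-suggest enabled.
        return wiki.summary(concept, sentences=sentences, auto_suggest=True)
    except wiki.exceptions.DisambiguationError as err:
        candidates = list(err.options)  # titles from the disambiguation page
    except wiki.exceptions.PageError:
        candidates = wiki.search(concept, results=5)  # fall back to search

    # Step 2: try candidate titles in order, without auto-suggest.
    for title in candidates:
        try:
            return wiki.summary(title, sentences=sentences, auto_suggest=False)
        except (wiki.exceptions.DisambiguationError, wiki.exceptions.PageError):
            continue
    return None
```

Network errors are not handled in this sketch; in practice they would likely need their own retry or timeout handling.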

Project layout:

    KDR-Agent/
    ├─ main.py
    ├─ arguments.py
    ├─ tool.py
    ├─ config/
    │  └─ Bio_BC5CDR.json
    ├─ data/
    │  └─ Bio_BC5CDR/
    │     ├─ test_sample.json
    │     └─ ...
    └─ README.md

If the LLM sometimes produces invalid JSON, consider:
- Increasing max_loop
- Strengthening schema constraints in prompts
- Increasing