English | 简体中文
[ACL 2026] RouteMoA: Dynamic Routing without Pre-Inference Boosts Efficient Mixture-of-Agents
Welcome to the official repository for RouteMoA (Paper link). This project introduces an efficient and dynamic routing mechanism designed to boost the performance of Mixture-of-Agents (MoA) architectures without relying on pre-inference steps.
Concept comparison between our RouteMoA and previous MoA-based methods.
We highly recommend using conda to create an isolated environment.
System Requirement: We strongly recommend using a Linux (x86_64) operating system. This guide is based on Ubuntu 22.04, and other operating systems are currently not officially supported.
conda create -n YOUR_ENV_NAME python=3.10 -y
conda activate YOUR_ENV_NAME
# Install the OpenCompass evaluation framework (for small model pool experiments)
cd opencompass
pip install -e .
# Install the EMoA core module (small model pool)
cd ../emoa
pip install -e .
# Install the EMoA large model pool module
cd ../emoa_large
pip install -e .We suggest downloading the LLM checkpoints locally first. Our experiments were conducted using five NVIDIA A800 80GB GPUs. For reproducibility, We strongly recommend deploying each model on a dedicated high-performance NVIDIA GPU with more than 70GB of VRAM.
Set up your local checkpoint directory:
mkdir -p </path/to/your/local/checkpoint/folder>Install the HuggingFace CLI tool:
pip install -U huggingface_hubDownload the required models from HuggingFace:
# Download Bio-Medical Llama
huggingface-cli download ContactDoctor/Bio-Medical-Llama-3-8B \
--token <your-huggingface-token> \
--local-dir </path/to/your/local/checkpoint/folder>/Bio-Medical-Llama-3-8B
# Download Qwen models
huggingface-cli download Qwen/Qwen2.5-Coder-7B-Instruct \
--local-dir </path/to/your/local/checkpoint/folder>/Qwen2.5-Coder-7B-Instruct
huggingface-cli download Qwen/Qwen2.5-Math-7B-Instruct \
--local-dir </path/to/your/local/checkpoint/folder>/Qwen2.5-Math-7B-Instruct
# Download Gemma and Ministral models
huggingface-cli download google/gemma-2-9b-it \
--token <your-huggingface-token> \
--local-dir </path/to/your/local/checkpoint/folder>/gemma-2-9b-it
huggingface-cli download mistralai/Ministral-8B-Instruct-2410 \
--local-dir </path/to/your/local/checkpoint/folder>/Ministral-8B-Instruct-2410Once the models are ready, deploy them using LMDeploy:
bash lmdeploy.shNext, set up the router and launch the core services.
- Download the Router checkpoint from Google Drive.
- Place the downloaded checkpoint into your local
checkpoints/directory. - Download the router backbone (
mdeberta-v3-base) from the official Microsoft project:- Model weights: microsoft/mdeberta-v3-base
- Or simply use the CLI:
huggingface-cli download microsoft/mdeberta-v3-base --local-dir </path/to/mdeberta-v3-base>
- Update the config file (
emoa/configs/emoa_v2.json):- Set
router_pth_pathto the absolute path of your downloaded Router checkpoint. - Set
router_backboneto the absolute path of yourmdeberta-v3-basefolder.
- Set
Now, start the services:
conda activate YOUR_ENV_NAME
# Start the EMoA service
python3 -m emoa.serve.app_v2 --config emoa/configs/emoa_v2.json --host 0.0.0.0 --port 10666
# Start the SMoA service
python3 -m emoa.serve.smoa --config emoa/configs/smoa.json --host 0.0.0.0 --port 10667We use OpenCompass to evaluate the performance of our EMoA service.
First, run the inference:
conda activate YOUR_ENV_NAME
opencompass examples/eval_emoa.py -r latest --mode infer --dump-eval-detailsAfter inference completes, run the evaluation to calculate the scores:
opencompass examples/eval_emoa.py -r latest --mode eval --dump-eval-detailsIf you need to analyze the API costs and latency, we provide a handy script:
cd opencompass
python bill_stat.pyNote on Reproducibility: To demonstrate the reproducibility of our experiments, we have provided the full OpenCompass evaluation results. You can download and verify them here.
The large model pool experiment calls external LLM APIs (e.g., DeepSeek, Qwen) instead of deploying local models. All code is in the emoa_large/ directory.
a) Install dependencies:
cd emoa_large
pip install -e .b) Fill in your API credentials in emoa_large/configs/api_info.csv:
Open the file and replace YOUR_API_BASE_URL and YOUR_API_KEY for each model with your actual API endpoint and key. The file uses the standard OpenAI-compatible API format:
model,model_name,model_id,input_price,output_price,api_type,api_base,api_key
deepseek-ai/DeepSeek-V3-0324,deepseek-v3-0324,,0.28,1.14,openai,https://api.deepseek.com/v1,YOUR_KEY
...
You do not need to populate all 15 models — only configure the models you plan to use and update the candidate_models list in the relevant config JSON accordingly.
c) Download the router weights (required for RouteMoA only):
Download router_large.pth from Google Drive and place it at:
emoa_large/weights/router_large.pth
d) Download the router backbone (microsoft/mdeberta-v3-base):
huggingface-cli download microsoft/mdeberta-v3-base --local-dir /path/to/mdeberta-v3-baseThen update router_backbone in emoa_large/configs/routemoa.json to the local path, or leave it as "microsoft/mdeberta-v3-base" to download automatically from HuggingFace.
All three methods expose an OpenAI-compatible /v1/chat/completions endpoint.
MoA (baseline):
cd emoa_large
python3 -m emoa.serve.app_moa --config configs/moa.json --host 0.0.0.0 --port 10078RouteMoA (ours):
cd emoa_large
python3 -m emoa.serve.app_routemoa --config configs/routemoa.json --host 0.0.0.0 --port 10079SMoA (baseline):
cd emoa_large
python3 -m emoa.serve.app_smoa --config configs/smoa.json --host 0.0.0.0 --port 10080You can verify a service is running by checking its health endpoint:
curl http://localhost:10078/healthNote on evaluation methodology: The large model pool results in Table 1 of the paper were produced using an internal evaluation platform that is not publicly available. We provide an equivalent standalone evaluation suite in
emoa_large/eval/that reproduces the same benchmark, metrics, and results using only open-source tools.
The emoa_large/eval/ directory contains everything needed to reproduce the paper's large model pool results.
Step 1 — Install evaluation dependencies:
pip install openai
pip install lawrouge # for RougeMetric (lcsts)
pip install pkuseg nltk # for GEC F1 (nlpcc2018_task2, conll2014)For GEC F1, also install m2scorer:
git clone https://github.com/nusnlp/m2scorer
export PYTHONPATH=/path/to/m2scorer:$PYTHONPATHStep 2 — Run inference against a running service:
cd emoa_large/eval
python inference.py \
--base_url http://localhost:10079/v1 \
--api_key dummy \
--model routemoa \
--input benchmark_questions.json \
--output predictions_routemoa.json \
--workers 4Step 3 — Evaluate predictions:
python evaluate.py \
--predictions predictions_routemoa.json \
--benchmark benchmark_questions.json \
--output eval_results_routemoa.json \
--judge_base_url https://api.openai.com/v1 \
--judge_api_key YOUR_OPENAI_KEY \
--judge_model gpt-4oThe script prints per-dataset scores, category averages, and a global average to stdout.
Pre-computed results are available in
emoa_large/eval/for all three methods (moa.json,routemoa.json,smoa.json) along with a cross-model summary insummary_all.json. Seeemoa_large/eval/README.mdfor the full benchmark description and paper results.
Our router training pipeline is built on top of RouterDC.
You can train your own router using custom data by following the instructions provided in the RouterDC repository. We have currently released the pre-trained checkpoint of our router, and the full training code tailored for RouteMoA will be released in the future.
This project is built upon the great work from the open-source community. We sincerely thank: