Hongcheng Gao*, Yue Liu*, Yufei He, Longxu Dou, Chao Du, Zhijie Deng,
Bryan Hooi, Min Lin, Tianyu Pang†
*Equal Contribution † Corresponding Author
This paper proposes FlowReasoner, a query-level meta-agent that automates the design of query-level multi-agent systems, i.e., one system per user query. Our core idea is to incentivize a reasoning-based meta-agent via external execution feedback. Concretely, we first distill DeepSeek R1 to endow FlowReasoner with the basic ability to reason about generating multi-agent systems. We then further enhance it via reinforcement learning (RL) with external execution feedback. A multi-purpose reward guides the RL training in terms of performance, complexity, and efficiency. In this manner, FlowReasoner can generate a personalized multi-agent system for each user query via deliberative reasoning. Experiments on both engineering and competition code benchmarks demonstrate the superiority of FlowReasoner. Remarkably, it surpasses o1-mini by
We follow MetaGPT to install the required dependencies. Please run the following commands:
git clone https://github.com/sail-sg/FlowReasoner
cd code
pip install --upgrade -e .
All experiments are conducted on NVIDIA A100 GPUs with 80GB of memory.
Configure LLM parameters in config/config2.yaml (see examples/FlowReasoner/config2.example.yaml for reference)
models:
  "<model_name>": # model: "gpt-4-turbo" # or gpt-3.5-turbo
    api_type: "openai" # or azure / ollama / groq etc.
    base_url: "<your base url>"
    api_key: "<your api key>"
    temperature: 0
  "<model_name>":
    api_type: "openai"
    base_url: "<your base url>"
    api_key: "<your api key>"
    temperature: 0
CALC_USAGE: True
python -m examples.FlowReasoner.optimize --dataset MATH
python -m examples.FlowReasoner.optimize --dataset MATH --sample n --optimized_path xxx ...
The SFT dataset is generated by the inference stage. The SFT is conducted by the standard training process using LLaMA-Factory while the RL is based on EasyRL.
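The optimize invocations shown above can also be assembled from a script, which may be convenient when sweeping over datasets. The following is a minimal sketch, not part of the repository: the helper name `build_optimize_command` is hypothetical, it emits only the flags that appear in this README (`--dataset`, `--sample`, `--optimized_path`), and it assumes `--sample` takes a positive integer.

```python
import shlex
import sys
from typing import List, Optional

# Module path copied from the commands in this README.
MODULE = "examples.FlowReasoner.optimize"


def build_optimize_command(
    dataset: str,
    sample: Optional[int] = None,
    optimized_path: Optional[str] = None,
    python: str = sys.executable,
) -> List[str]:
    """Return an argv list for the optimize entry point (hypothetical helper).

    Only flags that appear in the README are emitted; no others are guessed.
    """
    if not dataset:
        raise ValueError("dataset must be a non-empty string")
    argv = [python, "-m", MODULE, "--dataset", dataset]
    if sample is not None:
        # Assumption: --sample expects a positive integer.
        if sample <= 0:
            raise ValueError("sample must be a positive integer")
        argv += ["--sample", str(sample)]
    if optimized_path is not None:
        argv += ["--optimized_path", optimized_path]
    return argv


if __name__ == "__main__":
    cmd = build_optimize_command("MATH", sample=4, optimized_path="xxx")
    # Print a shell-safe version of the command instead of running it.
    print(shlex.join(cmd))
    # To launch it from the code directory: subprocess.run(cmd, check=True)
```

Returning an argument list rather than a single string avoids shell quoting issues if the list is later passed to `subprocess.run`.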
This repository builds on the codebases of MetaGPT, LLaMA-Factory, and EasyRL. Thanks for their impressive work!
If you find our work helpful, please cite it as:
@misc{gao2025flowreasonerreinforcingquerylevelmetaagents,
  title={FlowReasoner: Reinforcing Query-Level Meta-Agents},
  author={Hongcheng Gao and Yue Liu and Yufei He and Longxu Dou and Chao Du and Zhijie Deng and Bryan Hooi and Min Lin and Tianyu Pang},
  year={2025},
  eprint={2504.15257},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2504.15257},
}