FlowReasoner

This paper proposes a query-level meta-agent named FlowReasoner to automate the design of query-level multi-agent systems, i.e., one system per user query. Our core idea is to incentivize a reasoning-based meta-agent via external execution feedback. Concretely, by distilling DeepSeek R1, we first endow FlowReasoner with basic reasoning abilities for generating multi-agent systems. We then further enhance it via reinforcement learning (RL) with external execution feedback. A multi-purpose reward is designed to guide the RL training from the aspects of performance, complexity, and efficiency. In this manner, FlowReasoner generates a personalized multi-agent system for each user query via deliberative reasoning. Experiments on both engineering and competition code benchmarks demonstrate the superiority of FlowReasoner. Remarkably, it surpasses o1-mini by 10.52% accuracy across three benchmarks.
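The multi-purpose reward above combines performance, complexity, and efficiency signals. As a minimal illustrative sketch only (the weights, inputs, and function name below are hypothetical, not the released implementation):

# Illustrative sketch of a multi-purpose reward balancing performance,
# complexity, and efficiency, as described in the abstract.
# Weights and inputs are hypothetical, not the released implementation.
def multi_purpose_reward(pass_rate: float, num_agents: int, latency_s: float,
                         w_perf: float = 1.0, w_cplx: float = 0.1,
                         w_eff: float = 0.05) -> float:
    """Reward = performance bonus minus complexity and efficiency penalties."""
    return w_perf * pass_rate - w_cplx * num_agents - w_eff * latency_s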

Installation

We follow MetaGPT to install the required dependencies. Please run the following commands:

git clone https://github.com/sail-sg/FlowReasoner
cd FlowReasoner/code
pip install --upgrade -e .

All experiments are conducted on NVIDIA A100 GPUs with 80GB of memory.

Configure optimization parameters:

Configure LLM parameters in config/config2.yaml (see examples/FlowReasoner/config2.example.yaml for reference)

models:
  "<model_name>":  # e.g., "gpt-4-turbo" or "gpt-3.5-turbo"
    api_type: "openai"  # or azure / ollama / groq etc.
    base_url: "<your base url>"
    api_key: "<your api key>"
    temperature: 0
  "<model_name>":
    api_type: "openai"
    base_url: "<your base url>"
    api_key: "<your api key>"
    temperature: 0
CALC_USAGE: True
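Before running, you can sanity-check the YAML with a small script (a sketch assuming PyYAML is installed and that each model entry carries the keys shown in the example above):

# Sanity-check config/config2.yaml before running (requires PyYAML).
import yaml

with open("config/config2.yaml") as f:
    cfg = yaml.safe_load(f)

for name, model in cfg["models"].items():
    # Each model entry should carry the fields shown in the example above.
    missing = {"api_type", "base_url", "api_key"} - set(model)
    assert not missing, f"{name} is missing keys: {missing}"
print("Config OK:", ", ".join(cfg["models"]))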

Run the inference

Using default parameters

python -m examples.FlowReasoner.optimize --dataset MATH

Or with custom parameters

python -m examples.FlowReasoner.optimize --dataset MATH --sample n --optimized_path xxx ...
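To run several benchmarks in sequence, a small driver can wrap the same CLI (a sketch; dataset names other than MATH are assumptions and should match your setup):

# Minimal driver that sweeps several datasets through the optimizer CLI.
# Dataset names other than MATH are assumptions; adjust to your setup.
import subprocess

for dataset in ["MATH", "HumanEval", "MBPP"]:
    subprocess.run(
        ["python", "-m", "examples.FlowReasoner.optimize", "--dataset", dataset],
        check=True,
    )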

Training Stage

The SFT dataset is generated during the inference stage. SFT follows the standard training process in LLaMA-Factory, while RL is based on EasyRL.
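As a rough sketch of the data-preparation step, inference-stage outputs can be converted into LLaMA-Factory's alpaca-style JSON for SFT; the input-side field names below (query, workflow) are assumptions, not the repository's actual schema:

# Sketch: convert inference-stage records into LLaMA-Factory's alpaca-style
# JSON for SFT. The record keys (query/workflow) are assumptions.
import json

def to_alpaca(records, out_path="sft_data.json"):
    data = [
        {
            "instruction": "Design a multi-agent system for the query below.",
            "input": r["query"],       # assumed key in the inference output
            "output": r["workflow"],   # assumed key: generated system + reasoning
        }
        for r in records
    ]
    with open(out_path, "w") as f:
        json.dump(data, f, indent=2)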

Acknowledgments

This repository is based on the codebases of MetaGPT, LLaMA-Factory, and EasyRL. Thanks for their impressive work!

Citation

If you find our work helpful, please cite it as:

@misc{gao2025flowreasonerreinforcingquerylevelmetaagents,
      title={FlowReasoner: Reinforcing Query-Level Meta-Agents}, 
      author={Hongcheng Gao and Yue Liu and Yufei He and Longxu Dou and Chao Du and Zhijie Deng and Bryan Hooi and Min Lin and Tianyu Pang},
      year={2025},
      eprint={2504.15257},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2504.15257}, 
}
