FlowReasoner

This paper proposes a query-level meta-agent named FlowReasoner to automate the design of query-level multi-agent systems, i.e., one system per user query. Our core idea is to incentivize a reasoning-based meta-agent via external execution feedback. Concretely, by distilling DeepSeek R1, we first endow FlowReasoner with basic reasoning abilities for generating multi-agent systems. We then further enhance it via reinforcement learning (RL) with external execution feedback. A multi-purpose reward is designed to guide the RL training from the aspects of performance, complexity, and efficiency. In this manner, FlowReasoner generates a personalized multi-agent system for each user query via deliberative reasoning. Experiments on both engineering and competition code benchmarks demonstrate the superiority of FlowReasoner. Remarkably, it surpasses o1-mini by 10.52% accuracy across three benchmarks.
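The multi-purpose reward above combines performance, complexity, and efficiency signals. As a minimal illustrative sketch only (the weights, inputs, and function name below are hypothetical, not the released implementation):

# Illustrative sketch of a multi-purpose reward balancing performance,
# complexity, and efficiency, as described in the abstract.
# Weights and inputs are hypothetical, not the released implementation.
def multi_purpose_reward(pass_rate: float, num_agents: int, latency_s: float,
                         w_perf: float = 1.0, w_cplx: float = 0.1,
                         w_eff: float = 0.05) -> float:
    """Reward = performance bonus minus complexity and efficiency penalties."""
    return w_perf * pass_rate - w_cplx * num_agents - w_eff * latency_s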

Installation

We follow MetaGPT to install the required dependencies. Please run the following commands:

git clone https://github.com/sail-sg/FlowReasoner
cd FlowReasoner/code
pip install --upgrade -e .

All experiments are conducted on NVIDIA A100 GPUs with 80GB of memory.

Configure optimization parameters:

Configure LLM parameters in config/config2.yaml (see examples/FlowReasoner/config2.example.yaml for reference)

models:
  "<model_name>":  # e.g., "gpt-4-turbo" or "gpt-3.5-turbo"
    api_type: "openai"  # or azure / ollama / groq etc.
    base_url: "<your base url>"
    api_key: "<your api key>"
    temperature: 0
  "<model_name>":
    api_type: "openai"
    base_url: "<your base url>"
    api_key: "<your api key>"
    temperature: 0
CALC_USAGE: True
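Before running, you can sanity-check the YAML with a small script (a sketch assuming PyYAML is installed and that each model entry carries the keys shown in the example above):

# Sanity-check config/config2.yaml before running (requires PyYAML).
import yaml

with open("config/config2.yaml") as f:
    cfg = yaml.safe_load(f)

for name, model in cfg["models"].items():
    # Each model entry should carry the fields shown in the example above.
    missing = {"api_type", "base_url", "api_key"} - set(model)
    assert not missing, f"{name} is missing keys: {missing}"
print("Config OK:", ", ".join(cfg["models"]))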

Run the inference

Using default parameters

python -m examples.FlowReasoner.optimize --dataset MATH

Or with custom parameters

python -m examples.FlowReasoner.optimize --dataset MATH --sample n --optimized_path xxx ...
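To run several benchmarks in sequence, a small driver can wrap the same CLI (a sketch; dataset names other than MATH are assumptions and should match your setup):

# Minimal driver that sweeps several datasets through the optimizer CLI.
# Dataset names other than MATH are assumptions; adjust to your setup.
import subprocess

for dataset in ["MATH", "HumanEval", "MBPP"]:
    subprocess.run(
        ["python", "-m", "examples.FlowReasoner.optimize", "--dataset", dataset],
        check=True,
    )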

Training Stage

The SFT dataset is generated during the inference stage. SFT follows the standard training process in LLaMA-Factory, while RL is based on EasyRL.
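As a rough sketch of the data-preparation step, inference-stage outputs can be converted into LLaMA-Factory's alpaca-style JSON for SFT; the input-side field names below (query, workflow) are assumptions, not the repository's actual schema:

# Sketch: convert inference-stage records into LLaMA-Factory's alpaca-style
# JSON for SFT. The record keys (query/workflow) are assumptions.
import json

def to_alpaca(records, out_path="sft_data.json"):
    data = [
        {
            "instruction": "Design a multi-agent system for the query below.",
            "input": r["query"],       # assumed key in the inference output
            "output": r["workflow"],   # assumed key: generated system + reasoning
        }
        for r in records
    ]
    with open(out_path, "w") as f:
        json.dump(data, f, indent=2)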

Acknowledgments

This repository is based on the codebases of MetaGPT, LLaMA-Factory, and EasyRL. Thanks for their impressive work!

Citation

If you find our work helpful, please cite it as:

@misc{gao2025flowreasonerreinforcingquerylevelmetaagents,
      title={FlowReasoner: Reinforcing Query-Level Meta-Agents}, 
      author={Hongcheng Gao and Yue Liu and Yufei He and Longxu Dou and Chao Du and Zhijie Deng and Bryan Hooi and Min Lin and Tianyu Pang},
      year={2025},
      eprint={2504.15257},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2504.15257}, 
}
