
Off-Policy Correction For Multi-Agent Reinforcement Learning

Published: 09 May 2022

Abstract

Multi-agent reinforcement learning (MARL) provides a framework for problems involving multiple interacting agents. Despite its similarity to the single-agent case, the multi-agent setting is often harder to train in and to analyze theoretically. In this work, we propose MA-Trace, a new on-policy actor-critic algorithm, which extends V-Trace to the MARL setting. The key advantage of our algorithm is its high scalability in a multi-worker setting. To this end, MA-Trace utilizes importance sampling as an off-policy correction method, which allows distributing the computations with negligible impact on the quality of training. Furthermore, our algorithm is theoretically grounded -- we provide a fixed-point theorem that guarantees convergence. We evaluate the algorithm extensively on the StarCraft Multi-Agent Challenge, a standard benchmark for multi-agent algorithms. MA-Trace achieves high performance on all its tasks and exceeds state-of-the-art results on some of them.
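The off-policy correction the abstract refers to builds on the clipped importance sampling of V-Trace (from the IMPALA line of work), which MA-Trace extends to the multi-agent setting. Below is a minimal single-trajectory sketch in plain Python, not the authors' implementation: the function name is illustrative, and in a MARL setting the per-step probability ratio could be the product of per-agent ratios (an assumption here, based on the abstract rather than the full paper).

```python
def vtrace_targets(values, rewards, target_probs, behaviour_probs,
                   gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """Compute V-Trace value targets v_s for one trajectory.

    values:          V(x_0)..V(x_T), length T+1 (last entry is the bootstrap value)
    rewards:         r_0..r_{T-1}, length T
    target_probs:    pi(a_t | x_t) under the learner's (target) policy
    behaviour_probs: mu(a_t | x_t) under the worker's (possibly stale) policy
    """
    T = len(rewards)
    # Clipped importance weights: rho_t for the TD error, c_t for the trace.
    rhos = [min(rho_bar, p / q) for p, q in zip(target_probs, behaviour_probs)]
    cs = [min(c_bar, p / q) for p, q in zip(target_probs, behaviour_probs)]
    targets = [0.0] * T
    acc = 0.0  # running correction v_t - V(x_t), built backwards
    for t in reversed(range(T)):
        delta = rhos[t] * (rewards[t] + gamma * values[t + 1] - values[t])
        acc = delta + gamma * cs[t] * acc
        targets[t] = values[t] + acc
    return targets
```

When the worker and learner policies coincide, the weights are all 1 and the targets reduce to ordinary n-step returns; when they diverge, the clipping bounds the variance of the correction, which is what makes distributed multi-worker training stable.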


Published In

AAMAS '22: Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems
May 2022
1990 pages
ISBN:9781450392136

Publisher

International Foundation for Autonomous Agents and Multiagent Systems

Richland, SC


Author Tags

  1. importance sampling
  2. reinforcement learning
  3. scalability
  4. v-trace

Qualifiers

  • Extended-abstract

Funding Sources

  • Polish National Science Center

Conference

AAMAS '22

Acceptance Rates

Overall Acceptance Rate 1,155 of 5,036 submissions, 23%
