
Off-Policy Correction For Multi-Agent Reinforcement Learning

Published: 09 May 2022

Abstract

Multi-agent reinforcement learning (MARL) provides a framework for problems involving multiple interacting agents. Despite its similarity to the single-agent case, the multi-agent setting is often harder to train in and to analyze theoretically. In this work, we propose MA-Trace, a new on-policy actor-critic algorithm, which extends V-Trace to the MARL setting. The key advantage of our algorithm is its high scalability in a multi-worker setting. To this end, MA-Trace utilizes importance sampling as an off-policy correction method, which allows distributing the computations with negligible impact on the quality of training. Furthermore, our algorithm is theoretically grounded -- we provide a fixed-point theorem that guarantees convergence. We evaluate the algorithm extensively on the StarCraft Multi-Agent Challenge, a standard benchmark for multi-agent algorithms. MA-Trace achieves high performance on all its tasks and exceeds state-of-the-art results on some of them.
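The off-policy correction the abstract refers to builds on the clipped importance sampling of V-Trace (from the IMPALA line of work), which MA-Trace extends to the multi-agent setting. Below is a minimal single-trajectory sketch in plain Python, not the authors' implementation: the function name is illustrative, and in a MARL setting the per-step probability ratio could be the product of per-agent ratios (an assumption here, based on the abstract rather than the full paper).

```python
def vtrace_targets(values, rewards, target_probs, behaviour_probs,
                   gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """Compute V-Trace value targets v_s for one trajectory.

    values:          V(x_0)..V(x_T), length T+1 (last entry is the bootstrap value)
    rewards:         r_0..r_{T-1}, length T
    target_probs:    pi(a_t | x_t) under the learner's (target) policy
    behaviour_probs: mu(a_t | x_t) under the worker's (possibly stale) policy
    """
    T = len(rewards)
    # Clipped importance weights: rho_t for the TD error, c_t for the trace.
    rhos = [min(rho_bar, p / q) for p, q in zip(target_probs, behaviour_probs)]
    cs = [min(c_bar, p / q) for p, q in zip(target_probs, behaviour_probs)]
    targets = [0.0] * T
    acc = 0.0  # running correction v_t - V(x_t), built backwards
    for t in reversed(range(T)):
        delta = rhos[t] * (rewards[t] + gamma * values[t + 1] - values[t])
        acc = delta + gamma * cs[t] * acc
        targets[t] = values[t] + acc
    return targets
```

When the worker and learner policies coincide, the weights are all 1 and the targets reduce to ordinary n-step returns; when they diverge, the clipping bounds the variance of the correction, which is what makes distributed multi-worker training stable.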


Published In

AAMAS '22: Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems
May 2022
1990 pages
ISBN:9781450392136

Publisher

International Foundation for Autonomous Agents and Multiagent Systems

Richland, SC


Author Tags

  1. importance sampling
  2. reinforcement learning
  3. scalability
  4. v-trace

Qualifiers

  • Extended-abstract

Funding Sources

  • Polish National Science Center

Conference

AAMAS '22

Acceptance Rates

Overall Acceptance Rate 1,155 of 5,036 submissions, 23%
