DOI: 10.5555/3535850.3535908
Research article

Sample-based Approximation of Nash in Large Many-Player Games via Gradient Descent

Published: 09 May 2022

Abstract

Nash equilibrium is a central concept in game theory. Several Nash solvers exist, yet none scale to normal-form games with many actions and many players, especially those whose payoff tensors are too big to be stored in memory. In this work, we propose an approach that iteratively improves an approximation to a Nash equilibrium through joint play. It accomplishes this by tracing a previously established homotopy that defines a continuum of equilibria for the game regularized with decaying levels of entropy. This continuum asymptotically approaches the limiting logit equilibrium, proven by McKelvey and Palfrey (1995) to be unique in almost all games, thereby partially circumventing the well-known equilibrium selection problem of many-player games. To encourage iterates to remain near this path, we efficiently minimize average deviation incentive via stochastic gradient descent, intelligently sampling entries in the payoff tensor as needed. Monte Carlo estimates of the stochastic gradient from joint play are biased due to the appearance of a nonlinear max operator in the objective, so we introduce additional innovations to the algorithm to alleviate gradient bias. The descent process can also be viewed as repeatedly constructing and reacting to a polymatrix approximation to the game. In these ways, our proposed approach, average deviation incentive descent with adaptive sampling (ADIDAS), is most similar to three classical approaches, namely homotopy-type, Lyapunov, and iterative polymatrix solvers. The lack of local convergence guarantees for biased gradient descent prevents guaranteed convergence to Nash; however, we demonstrate through extensive experiments the ability of this approach to approximate a unique Nash equilibrium in normal-form games with as many as seven players and twenty-one actions (several billion outcomes), orders of magnitude larger than those tractable with prior algorithms.
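
To make the abstract's procedure concrete, the sketch below (a minimal NumPy illustration, not the authors' implementation) traces the logit quantal-response homotopy on a toy normal-form game whose payoff tensors fit in memory: it anneals the entropy temperature, damps each player's strategy toward their softmax response, and reports the resulting unregularized average deviation incentive. The function names, annealing schedule, and damping step are illustrative assumptions; in particular, the sketch omits the payoff-entry sampling and gradient bias correction that are the paper's central contributions.

    import numpy as np

    def payoff_gradient(payoffs, strategies, player):
        """Expected payoff to `player` for each of their pure actions, obtained
        by contracting the full payoff tensor with the other players' mixtures."""
        u = payoffs[player]
        for j in reversed(range(len(strategies))):
            if j != player:
                u = np.tensordot(u, strategies[j], axes=([j], [0]))
        return u  # shape: (num_actions_of_player,)

    def softmax(z, tau):
        z = z / tau
        z -= z.max()  # numerical stability
        e = np.exp(z)
        return e / e.sum()

    def deviation_incentive(payoffs, strategies):
        """Unregularized average deviation incentive: the average gain a player
        obtains by deviating to a best response against the others."""
        n = len(strategies)
        total = 0.0
        for i in range(n):
            u = payoff_gradient(payoffs, strategies, i)
            total += u.max() - u.dot(strategies[i])
        return total / n

    def anneal_qre_homotopy(payoffs, num_actions,
                            taus=(1.0, 0.3, 0.1, 0.03, 0.01),
                            inner_steps=200, lr=0.1):
        """Illustrative full-payoff sketch: damped fixed-point iteration toward
        each player's softmax (quantal) response while annealing tau, so the
        iterates loosely follow the entropy-regularized equilibrium continuum."""
        strategies = [np.full(a, 1.0 / a) for a in num_actions]
        for tau in taus:
            for _ in range(inner_steps):
                responses = [softmax(payoff_gradient(payoffs, strategies, i), tau)
                             for i in range(len(strategies))]
                # Damped simultaneous update keeps iterates near the homotopy path.
                strategies = [(1 - lr) * x + lr * br
                              for x, br in zip(strategies, responses)]
        return strategies, deviation_incentive(payoffs, strategies)

    if __name__ == "__main__":
        # Random 3-player, 4-action game as a small stand-in for a large payoff tensor.
        rng = np.random.default_rng(0)
        num_actions = (4, 4, 4)
        payoffs = [rng.standard_normal(num_actions) for _ in range(3)]
        x, adi = anneal_qre_homotopy(payoffs, num_actions)
        print("approx. Nash strategies:", [np.round(xi, 3) for xi in x])
        print("average deviation incentive:", round(adi, 4))
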

References

[1]
Thomas Anthony, Tom Eccles, Andrea Tacchetti, János Kramár, Ian Gemp, Thomas C Hudson, Nicolas Porcel, Marc Lanctot, Julien Pérolat, Richard Everett, et al. 2020. Learning to Play No-Press Diplomacy with Best Response Policy Iteration. In Advances in Neural Information Processing Systems.
[2]
Ayala Arad and Ariel Rubinstein. 2012. Multi-dimensional iterative reasoning in action: The case of the Colonel Blotto game. Journal of Economic Behavior & Organization, Vol. 84, 2 (2012), 571--585.
[3]
W Brian Arthur. 1994. Complexity in economic theory: Inductive reasoning and bounded rationality. The American Economic Review, Vol. 84, 2 (1994), 406--411.
[4]
Yakov Babichenko. 2016. Query complexity of approximate Nash equilibria. Journal of the ACM (JACM), Vol. 63, 4 (2016), 36:1--36:24.
[5]
Amir Beck and Marc Teboulle. 2003. Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, Vol. 31, 3 (2003), 167--175.
[6]
Soheil Behnezhad, Sina Dehghani, Mahsa Derakhshan, MohammadTaghi HajiAghayi, and Saeed Seddighin. 2017. Faster and simpler algorithm for optimal strategies of Blotto game. In Proceedings of the AAAI Conference on Artificial Intelligence. 369--375.
[7]
David Blackwell. 1956. An analog of the minimax theorem for vector payoffs. Pacific J. Math., Vol. 6, 1 (1956), 1--8.
[8]
Avrim Blum and Yishay Mansour. 2007. Learning, Regret Minimization, and Equilibria. Cambridge University Press, 79--102. https://doi.org/10.1017/CBO9780511800481.006
[9]
Ben Blum, Christian R Shelton, and Daphne Koller. 2006. A continuation method for Nash equilibria in structured games. Journal of Artificial Intelligence Research, Vol. 25 (2006), 457--502.
[10]
Enric Boix-Adserà, Benjamin L Edelman, and Siddhartha Jayanti. 2020. The Multiplayer Colonel Blotto Game. In Proceedings of the 21st ACM Conference on Economics and Computation. 47--48.
[11]
Michael Bowling, Neil Burch, Michael Johanson, and Oskari Tammelin. 2015. Heads-up Limit Hold'em Poker is Solved. Science, Vol. 347, 6218 (January 2015), 145--149.
[12]
Noam Brown and Tuomas Sandholm. 2017. Superhuman AI for Heads-up No-limit Poker: Libratus beats top professionals. Science, Vol. 360, 6385 (December 2017).
[13]
Jie Chen and Ronny Luss. 2018. Stochastic gradient descent with biased but consistent gradient estimators. arXiv preprint arXiv:1807.11880 (2018).
[14]
Jie Chen, Tengfei Ma, and Cao Xiao. 2018. Fastgcn: fast learning with graph convolutional networks via importance sampling. arXiv preprint arXiv:1801.10247 (2018).
[15]
Xi Chen, Xiaotie Deng, and Shang-Hua Teng. 2009. Settling the complexity of computing two-player Nash equilibria. Journal of the ACM (JACM), Vol. 56, 3 (2009), 1--57.
[16]
Constantinos Daskalakis, Paul W Goldberg, and Christos H Papadimitriou. 2009. The complexity of computing a Nash equilibrium. SIAM J. Comput., Vol. 39, 1 (2009), 195--259.
[17]
Argyrios Deligkas, John Fearnley, Tobenna Peter Igwe, and Rahul Savani. 2016. An empirical study on computing equilibria in polymatrix games. arXiv preprint arXiv:1602.06865 (2016).
[18]
Argyrios Deligkas, John Fearnley, Rahul Savani, and Paul Spirakis. 2017. Computing approximate Nash equilibria in polymatrix games. Algorithmica, Vol. 77, 2 (2017), 487--514.
[19]
Alain Durmus, Pablo Jiménez, Éric Moulines, Salem Said, and Hoi-To Wai. 2020. Convergence analysis of Riemannian stochastic approximation schemes. arXiv preprint arXiv:2005.13284 (2020).
[20]
Francisco Facchinei and Jong-Shi Pang. 2007. Finite-dimensional variational inequalities and complementarity problems. Springer Science & Business Media.
[21]
John Fearnley, Martin Gairing, Paul W Goldberg, and Rahul Savani. 2015. Learning equilibria of games via payoff queries. The Journal of Machine Learning Research, Vol. 16, 1 (2015), 1305--1344.
[22]
John Fearnley and Rahul Savani. 2016. Finding approximate Nash equilibria of bimatrix games via payoff queries. ACM Transactions on Economics and Computation (TEAC), Vol. 4, 4 (2016), 25:1--25:19.
[23]
Lampros Flokas, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Thanasis Lianeas, Panayotis Mertikopoulos, and Georgios Piliouras. 2020. No-regret learning and mixed Nash equilibria: They do not mix. arXiv preprint arXiv:2010.09514 (2020).
[24]
Drew Fudenberg and David K Levine. 1998. The Theory of Learning in Games. MIT Press Books, Vol. 1 (1998).
[25]
Ian Gemp, Rahul Savani, Marc Lanctot, Yoram Bachrach, Thomas Anthony, Richard Everett, Andrea Tacchetti, Tom Eccles, and János Kramár. 2022. Sample-based Approximation of Nash in Large Many-Player Games via Gradient Descent. arXiv preprint arXiv:2106.01285 (2022).
[26]
Srihari Govindan and Robert Wilson. 2003. A global Newton method to compute Nash equilibria. Journal of Economic Theory, Vol. 110, 1 (2003), 65--86.
[27]
Srihari Govindan and Robert Wilson. 2004. Computing Nash equilibria by iterated polymatrix approximation. Journal of Economic Dynamics and Control, Vol. 28, 7 (2004), 1229--1241.
[28]
Jonathan Gray, Adam Lerer, Anton Bakhtin, and Noam Brown. 2020. Human-Level Performance in No-Press Diplomacy via Equilibrium Search. arXiv preprint arXiv:2010.02923 (2020).
[29]
Galina M Korpelevich. 1976. The extragradient method for finding saddle points and other problems. Matecon, Vol. 12 (1976), 747--756.
[30]
Marc Lanctot, Edward Lockhart, Jean-Baptiste Lespiau, Vinicius Zambaldi, Satyaki Upadhyay, Julien Pérolat, Sriram Srinivasan, Finbarr Timbers, Karl Tuyls, Shayegan Omidshafiei, Daniel Hennes, Dustin Morrill, Paul Muller, Timo Ewalds, Ryan Faulkner, János Kramár, Bart De Vylder, Brennan Saeta, James Bradbury, David Ding, Sebastian Borgeaud, Matthew Lai, Julian Schrittwieser, Thomas Anthony, Edward Hughes, Ivo Danihelka, and Jonah Ryan-Davis. 2019. OpenSpiel: A Framework for Reinforcement Learning in Games. CoRR, Vol. abs/1908.09453 (2019). arxiv: 1908.09453 [cs.LG] http://arxiv.org/abs/1908.09453
[31]
Marc Lanctot, Vinicius Zambaldi, Audrunas Gruslys, Angeliki Lazaridou, Karl Tuyls, Julien Pérolat, David Silver, and Thore Graepel. 2017. A unified game-theoretic approach to multiagent reinforcement learning. In Advances in Neural Information Processing Systems. 4190--4203.
[32]
Carlton Lemke and Joseph Howson, Jr. 1964. Equilibrium points of bimatrix games. Journal of the Society for Industrial and Applied Mathematics, Vol. 12, 2 (1964), 413--423.
[33]
Edward Lockhart, Marc Lanctot, Julien Pérolat, Jean-Baptiste Lespiau, Dustin Morrill, Finbarr Timbers, and Karl Tuyls. 2019. Computing Approximate Equilibria in Sequential Adversarial Games by Exploitability Descent. In Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI).
[34]
Richard D McKelvey and Andrew McLennan. 1996. Computation of equilibria in finite games. Handbook of Computational Economics, Vol. 1 (1996), 87--142.
[35]
Richard D McKelvey, Andrew M McLennan, and Theodore L Turocy. 2016. Gambit: Software tools for game theory, version 16.0.1.
[36]
Richard D McKelvey and Thomas R Palfrey. 1995. Quantal response equilibria for normal form games. Games and Economic Behavior, Vol. 10, 1 (1995), 6--38.
[37]
H Brendan McMahan, Geoffrey J Gordon, and Avrim Blum. 2003. Planning in the presence of cost functions controlled by an adversary. In Proceedings of the 20th International Conference on Machine Learning (ICML-03). 536--543.
[38]
Panayotis Mertikopoulos, Christos Papadimitriou, and Georgios Piliouras. 2018. Cycles in adversarial regularized learning. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms. SIAM, 2703--2717.
[39]
Lars Mescheder, Sebastian Nowozin, and Andreas Geiger. 2017. The Numerics of GANs. In Proceedings of the 31st International Conference on Neural Information Processing Systems (Long Beach, California, USA) (NIPS'17). Curran Associates Inc., Red Hook, NY, USA, 1823--1833.
[40]
Matej Moravčík, Martin Schmid, Neil Burch, Viliam Lisý, Dustin Morrill, Nolan Bard, Trevor Davis, Kevin Waugh, Michael Johanson, and Michael Bowling. 2017. DeepStack: Expert-level artificial intelligence in Heads-up No-limit Poker. Science, Vol. 358, 6362 (October 2017).
[41]
Eugene Nudelman, Jennifer Wortman, Yoav Shoham, and Kevin Leyton-Brown. 2004. Run the GAMUT: A comprehensive approach to evaluating game-theoretic algorithms. In AAMAS, Vol. 4. 880--887.
[42]
Georg Ostrovski and Sebastian van Strien. 2013. Payoff performance of fictitious play. arXiv preprint arXiv:1308.4049 (2013).
[43]
Julien Perolat, Remi Munos, Jean-Baptiste Lespiau, Shayegan Omidshafiei, Mark Rowland, Pedro Ortega, Neil Burch, Thomas Anthony, David Balduzzi, Bart De Vylder, et al. 2020. From Poincaré Recurrence to Convergence in Imperfect Information Games: Finding Equilibrium via Regularization. arXiv preprint arXiv:2002.08456 (2020).
[44]
Ryan Porter, Eugene Nudelman, and Yoav Shoham. 2008. Simple search methods for finding a Nash equilibrium. Games and Economic Behavior, Vol. 63, 2 (2008), 642--662.
[45]
Tuomas Sandholm, Andrew Gilpin, and Vincent Conitzer. 2005. Mixed-integer programming methods for finding Nash equilibria. In AAAI. 495--501.
[46]
Yoav Shoham and Kevin Leyton-Brown. 2009. Multiagent systems: Algorithmic, game-theoretic, and logical foundations. Cambridge University Press.
[47]
Richard S Sutton, Csaba Szepesvári, and Hamid Reza Maei. 2008. A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation. Advances in Neural Information Processing Systems, Vol. 21, 21 (2008), 1609--1616.
[48]
Theodore L Turocy. 2005. A dynamic homotopy interpretation of the logistic quantal response equilibrium correspondence. Games and Economic Behavior, Vol. 51, 2 (2005), 243--263.
[49]
Gerard van der Laan, AJJ Talman, and L Van der Heyden. 1987. Simplicial variable dimension algorithms for solving the nonlinear complementarity problem on a product of unit simplices using a general labelling. Mathematics of Operations Research, Vol. 12, 3 (1987), 377--397.
[50]
Oriol Vinyals, Igor Babuschkin, Wojciech M Czarnecki, Michaël Mathieu, Andrew Dudzik, Junyoung Chung, David H Choi, Richard Powell, Timo Ewalds, Petko Georgiev, et al. 2019. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature, Vol. 575, 7782 (2019), 350--354.
[51]
Michael P Wellman. 2006. Methods for empirical game-theoretic analysis. In AAAI. 1552--1556.

Cited By

  • (2024) On the Stability of Learning in Network Games with Many Players. Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, 861-870. https://doi.org/10.5555/3635637.3662940
  • (2023) The impact of exploration on convergence and performance of multi-agent Q-learning dynamics. Proceedings of the 40th International Conference on Machine Learning, 14178-14202. https://doi.org/10.5555/3618408.3618986
  • (2023) Beyond strict competition. Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 135-143. https://doi.org/10.24963/ijcai.2023/16

    Published In

    AAMAS '22: Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems
    May 2022
    1990 pages
    ISBN: 9781450392136

    Publisher

    International Foundation for Autonomous Agents and Multiagent Systems

    Richland, SC

    Author Tags

    1. Nash
    2. empirical game theory
    3. homotopy
    4. limiting logit equilibrium
    5. n-player
    6. normal-form
    7. quantal response equilibrium

    Conference

    AAMAS '22

    Acceptance Rates

    Overall Acceptance Rate 1,155 of 5,036 submissions, 23%
