Bayesian Strategy Networks Based Soft Actor-Critic Learning

Published: 29 March 2024

Abstract

A strategy refers to the rules by which an agent chooses among available actions to achieve its goals. Adopting reasonable strategies is challenging but crucial for an intelligent agent with limited resources working in hazardous, unstructured, and dynamic environments: it improves the system’s utility, decreases the overall cost, and increases the probability of mission success. This article proposes a novel hierarchical strategy decomposition approach based on Bayesian chaining that separates an intricate policy into several simple sub-policies and organizes their relationships as a Bayesian strategy network (BSN). We integrate this approach into the state-of-the-art deep reinforcement learning (DRL) method, soft actor-critic (SAC), and build the corresponding Bayesian soft actor-critic (BSAC) model by organizing the sub-policies as a joint policy. Our method achieves state-of-the-art performance on standard continuous-control benchmarks in the OpenAI Gym environment, and the results demonstrate the promising potential of BSAC to significantly improve training efficiency. Furthermore, we extend the discussion to multi-agent systems (MAS), outlining potential research fields and directions.
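
To make the idea concrete, the sketch below illustrates one way a BSN-style joint policy could be assembled from sub-policies: each sub-policy covers one group of action dimensions and is conditioned on the state and on the actions of its parent nodes in the network, so the joint log-probability factorizes along the DAG by the chain rule and can be plugged into a standard SAC actor loss. This is a minimal PyTorch sketch under our own assumptions (Gaussian sub-policies, a fixed DAG over action groups, hypothetical names such as SubPolicy and BayesianJointPolicy), not the authors' implementation.

# Minimal sketch of composing sub-policies along a Bayesian strategy network (BSN).
# Assumptions (not from the article): PyTorch, Gaussian sub-policies, a fixed DAG;
# class and variable names are hypothetical.
import torch
import torch.nn as nn

class SubPolicy(nn.Module):
    """Gaussian sub-policy over one action group, conditioned on the state
    and on the actions of its parent nodes in the BSN."""
    def __init__(self, state_dim, parent_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + parent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mu = nn.Linear(hidden, action_dim)
        self.log_std = nn.Linear(hidden, action_dim)

    def forward(self, state, parent_actions):
        h = self.net(torch.cat([state, parent_actions], dim=-1))
        std = self.log_std(h).clamp(-20, 2).exp()
        dist = torch.distributions.Normal(self.mu(h), std)
        a = dist.rsample()                      # reparameterized sample
        log_prob = dist.log_prob(a).sum(-1)     # log pi_i(a_i | s, parents); tanh correction omitted
        return torch.tanh(a), log_prob

class BayesianJointPolicy(nn.Module):
    """Joint policy pi(a|s) = prod_i pi_i(a_i | s, parents(a_i)),
    with nodes evaluated in topological order of the BSN."""
    def __init__(self, state_dim, node_dims, parents):
        # parents: list of parent-index lists, one per node, topologically ordered
        super().__init__()
        self.parents = parents
        self.nodes = nn.ModuleList([
            SubPolicy(state_dim, sum(node_dims[p] for p in ps), d)
            for d, ps in zip(node_dims, parents)
        ])

    def forward(self, state):
        actions, total_log_prob = [], 0.0
        for node, ps in zip(self.nodes, self.parents):
            parent_a = (torch.cat([actions[p] for p in ps], dim=-1)
                        if ps else state.new_zeros(state.shape[0], 0))
            a_i, lp_i = node(state, parent_a)
            actions.append(a_i)
            total_log_prob = total_log_prob + lp_i   # chain rule over the DAG
        return torch.cat(actions, dim=-1), total_log_prob

With such a factorization, the summed log-probability plays the same role as the single-policy log-probability in standard SAC, so the entropy term and actor loss can be computed unchanged while the sub-policies are trained jointly.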


Published In

ACM Transactions on Intelligent Systems and Technology, Volume 15, Issue 3
June 2024
646 pages
EISSN: 2157-6912
DOI: 10.1145/3613609
Editor: Huan Liu

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 29 March 2024
Online AM: 01 February 2024
Accepted: 24 January 2024
Revised: 21 November 2023
Received: 15 August 2023
Published in TIST Volume 15, Issue 3

Author Tags

  1. Strategy
  2. Bayesian networks
  3. deep reinforcement learning
  4. soft actor-critic
  5. utility
  6. expectation

Qualifiers

  • Research-article
