DOI: 10.5555/3463952.3464058
research-article

Identification of Unexpected Decisions in Partially Observable Monte-Carlo Planning: A Rule-Based Approach

Published: 03 May 2021

Abstract

Partially Observable Monte-Carlo Planning (POMCP) is a powerful online algorithm able to generate approximate policies for large Partially Observable Markov Decision Processes. The online nature of this method supports scalability by avoiding a complete policy representation. The lack of an explicit representation, however, hinders interpretability. In this work, we propose a methodology based on Satisfiability Modulo Theory (SMT) for analyzing POMCP policies by inspecting their traces, namely sequences of belief-action-observation triplets generated by the algorithm. The proposed method explores local properties of policy behavior to identify unexpected decisions. We propose an iterative process of trace analysis consisting of three main steps: i) the definition of a question by means of a parametric logical formula describing (probabilistic) relationships between beliefs and actions, ii) the generation of an answer by computing the parameters of the logical formula that maximize the number of satisfied clauses (solving a MAX-SMT problem), iii) the analysis of the generated logical formula and the related decision boundaries for identifying unexpected decisions made by POMCP with respect to the original question. We evaluate our approach on Tiger, a standard benchmark for POMDPs, and a real-world problem related to mobile robot navigation. Results show that the approach can exploit human knowledge of the domain, outperforming state-of-the-art anomaly detection methods in identifying unexpected decisions. In our tests, an improvement in the Area Under Curve of up to 47% was achieved.
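To make step ii) concrete, the sketch below shows how such a MAX-SMT query could be posed with the z3-solver Python package, one solver that supports soft constraints. The rule template ("listen when the belief of the tiger being behind the left door is at most x"), the free parameter x, and the toy traces list are illustrative assumptions for the Tiger domain, not the paper's actual rules or implementation.

```python
# Minimal sketch of step ii): learn the parameter of a rule template by MAX-SMT.
# Assumptions (not from the paper): 'traces' holds (belief_left, action) pairs
# collected from POMCP runs on Tiger, and the rule template is
# "listen when belief_left <= x", with x the free threshold to learn.
from z3 import Optimize, Real, sat

# belief of "tiger behind left door" and the action taken at that step
traces = [(0.50, "listen"), (0.62, "listen"), (0.91, "open_right"),
          (0.85, "listen"), (0.95, "open_right"), (0.97, "open_right")]

opt = Optimize()
x = Real("x")                      # free parameter of the rule template
opt.add(x >= 0, x <= 1)            # x is a probability threshold (hard constraint)

# One soft clause per trace step: the rule should explain the observed action.
for belief, action in traces:
    if action == "listen":
        opt.add_soft(belief <= x)  # rule predicts "listen" below the threshold
    else:
        opt.add_soft(belief > x)   # otherwise an "open" action is expected

# MAX-SMT: find x maximizing the number of satisfied soft clauses
if opt.check() == sat:
    print("learned threshold x =", opt.model()[x])
    # Trace steps whose soft clause is violated by the learned rule are
    # candidate unexpected decisions (step iii of the proposed process).
```

The decision boundary induced by the learned threshold can then be inspected by a domain expert, and trace steps falling on the "wrong" side of it flagged for further analysis.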


Cited By

  • (2023) Scalable safe policy improvement via Monte Carlo tree search. Proceedings of the 40th International Conference on Machine Learning, 3732-3756. DOI: 10.5555/3618408.3618560. Online publication date: 23-Jul-2023.



    Information

    Published In

    AAMAS '21: Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems
    May 2021
    1899 pages
    ISBN: 9781450383073


    Publisher

    International Foundation for Autonomous Agents and Multiagent Systems

    Richland, SC

    Publication History

    Published: 03 May 2021


    Author Tags

    1. anomaly detection
    2. explainable planning
    3. max-smt
    4. pomcp

    Qualifiers

    • Research-article

    Conference

    AAMAS '21

    Acceptance Rates

    Overall Acceptance Rate 1,155 of 5,036 submissions, 23%


