DOI: 10.5555/3463952.3464058
research-article

Identification of Unexpected Decisions in Partially Observable Monte-Carlo Planning: A Rule-Based Approach

Published: 03 May 2021

Abstract

Partially Observable Monte-Carlo Planning (POMCP) is a powerful online algorithm able to generate approximate policies for large Partially Observable Markov Decision Processes. The online nature of this method supports scalability by avoiding a complete policy representation. The lack of an explicit representation, however, hinders interpretability. In this work, we propose a methodology based on Satisfiability Modulo Theory (SMT) for analyzing POMCP policies by inspecting their traces, namely sequences of belief-action-observation triplets generated by the algorithm. The proposed method explores local properties of policy behavior to identify unexpected decisions. We propose an iterative process of trace analysis consisting of three main steps: i) the definition of a question by means of a parametric logical formula describing (probabilistic) relationships between beliefs and actions, ii) the generation of an answer by computing the parameters of the logical formula that maximize the number of satisfied clauses (solving a MAX-SMT problem), iii) the analysis of the generated logical formula and the related decision boundaries for identifying unexpected decisions made by POMCP with respect to the original question. We evaluate our approach on Tiger, a standard benchmark for POMDPs, and a real-world problem related to mobile robot navigation. Results show that the approach can exploit human knowledge of the domain, outperforming state-of-the-art anomaly detection methods in identifying unexpected decisions. In our tests, an improvement in the Area Under Curve of up to 47% was achieved.
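To make step ii) concrete, the sketch below shows how such a MAX-SMT query could be posed with the z3-solver Python package, one solver that supports soft constraints. The rule template ("listen when the belief of the tiger being behind the left door is at most x"), the free parameter x, and the toy traces list are illustrative assumptions for the Tiger domain, not the paper's actual rules or implementation.

```python
# Minimal sketch of step ii): learn the parameter of a rule template by MAX-SMT.
# Assumptions (not from the paper): 'traces' holds (belief_left, action) pairs
# collected from POMCP runs on Tiger, and the rule template is
# "listen when belief_left <= x", with x the free threshold to learn.
from z3 import Optimize, Real, sat

# belief of "tiger behind left door" and the action taken at that step
traces = [(0.50, "listen"), (0.62, "listen"), (0.91, "open_right"),
          (0.85, "listen"), (0.95, "open_right"), (0.97, "open_right")]

opt = Optimize()
x = Real("x")                      # free parameter of the rule template
opt.add(x >= 0, x <= 1)            # x is a probability threshold (hard constraint)

# One soft clause per trace step: the rule should explain the observed action.
for belief, action in traces:
    if action == "listen":
        opt.add_soft(belief <= x)  # rule predicts "listen" below the threshold
    else:
        opt.add_soft(belief > x)   # otherwise an "open" action is expected

# MAX-SMT: find x maximizing the number of satisfied soft clauses
if opt.check() == sat:
    print("learned threshold x =", opt.model()[x])
    # Trace steps whose soft clause is violated by the learned rule are
    # candidate unexpected decisions (step iii of the proposed process).
```

The decision boundary induced by the learned threshold can then be inspected by a domain expert, and trace steps falling on the "wrong" side of it flagged for further analysis.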


Cited By

  • (2023) Scalable safe policy improvement via Monte Carlo tree search. Proceedings of the 40th International Conference on Machine Learning, 3732-3756. DOI: 10.5555/3618408.3618560. Online publication date: 23-Jul-2023.



    Information

    Published In

    AAMAS '21: Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems
    May 2021
    1899 pages
    ISBN: 9781450383073


    Publisher

    International Foundation for Autonomous Agents and Multiagent Systems

    Richland, SC

    Publication History

    Published: 03 May 2021


    Author Tags

    1. anomaly detection
    2. explainable planning
    3. max-smt
    4. pomcp

    Qualifiers

    • Research-article

    Conference

    AAMAS '21

    Acceptance Rates

    Overall Acceptance Rate 1,155 of 5,036 submissions, 23%


