DOI: 10.5555/3635637.3662902
Research article

Reinforcement Learning in the Wild with Maximum Likelihood-based Model Transfer

Published: 06 May 2024

Abstract

In this paper, we study the problem of transferring available Markov Decision Process (MDP) models to learn and plan efficiently in an unknown but similar MDP. We refer to this as the Model Transfer Reinforcement Learning (MTRL) problem. First, we formulate MTRL for discrete MDPs and for Linear Quadratic Regulators (LQRs) with continuous states and actions. Then, we propose a generic two-stage algorithm, MLEMTRL, to address the MTRL problem in both discrete and continuous settings. In the first stage, MLEMTRL uses a constrained Maximum Likelihood Estimation (MLE)-based approach to estimate the target MDP model from a set of known MDP models. In the second stage, using the estimated target MDP model, MLEMTRL deploys a model-based planning algorithm appropriate for the MDP class. Theoretically, we prove worst-case regret bounds for MLEMTRL in both the realisable and non-realisable settings. We empirically demonstrate that MLEMTRL enables faster learning in new MDPs than learning from scratch, and that it achieves near-optimal performance depending on the similarity of the available MDPs to the target MDP.
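To make the two-stage procedure concrete, here is a minimal sketch for the discrete-MDP case. It assumes the known models are given as transition tensors `models[k]` of shape (S, A, S), the reward is a matrix `R` of shape (S, A), and the target model is parametrised as a simplex-weighted mixture of the known models; the function names (`estimate_weights`, `value_iteration`) and this mixture parametrisation are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np
from scipy.optimize import minimize


def estimate_weights(models, transitions):
    """Stage 1 (sketch): constrained MLE of mixture weights over known models.

    models      -- list of K transition tensors, each of shape (S, A, S)
    transitions -- iterable of observed (s, a, s_next) triples from the target MDP

    The target model is taken to be P_hat = sum_k w_k * models[k], with w
    constrained to the probability simplex (an assumption for illustration).
    """
    K = len(models)

    def neg_log_likelihood(w):
        # Mixture of the known transition models under the current weights.
        P_hat = sum(w_k * P_k for w_k, P_k in zip(w, models))
        # Log-likelihood of the observed target transitions under P_hat.
        ll = sum(np.log(P_hat[s, a, s_next] + 1e-12) for s, a, s_next in transitions)
        return -ll

    w0 = np.full(K, 1.0 / K)  # start from the uniform mixture
    result = minimize(
        neg_log_likelihood,
        w0,
        method="SLSQP",
        bounds=[(0.0, 1.0)] * K,
        constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],
    )
    return result.x


def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """Stage 2 (sketch): plan in the estimated model with value iteration.

    P -- estimated transition tensor of shape (S, A, S)
    R -- reward matrix of shape (S, A)
    """
    S, A, _ = P.shape
    V = np.zeros(S)
    while True:
        Q = R + gamma * np.einsum("sat,t->sa", P, V)  # Q(s, a) under current V
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return Q.argmax(axis=1), V_new  # greedy policy and its value estimate
        V = V_new
```

With the weights estimated, the mixture model P_hat = sum_k w_k * models[k] is handed to the planner; in the continuous LQR setting the second stage would instead use a planner appropriate to that class, such as solving the associated Riccati equation.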


Published In

AAMAS '24: Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems
May 2024
2898 pages
ISBN: 9798400704864

Publisher

International Foundation for Autonomous Agents and Multiagent Systems

Richland, SC


Author Tags

  1. linear quadratic regulator
  2. maximum likelihood estimation
  3. reinforcement learning
  4. transfer learning

Qualifiers

  • Research-article

Funding Sources

  • Inria-Kyoto University Associate Team RELIANT
  • ANR JCJC grant for the REPUBLIC project
  • Wallenberg AI Autonomous Systems and Software Program (WASP)

Conference

AAMAS '24

Acceptance Rates

Overall Acceptance Rate 1,155 of 5,036 submissions, 23%
