Skip to main navigation Skip to search Skip to main content

Fast Bellman Updates for Robust MDPs

  • Chin Pang Ho*
  • , Marek Petrik
  • , Wolfram Wiesemann
  • *Corresponding author for this work

Research output: Chapters, Conference Papers, Creative and Literary WorksRGC 32 - Refereed conference paper (with host publication)peer-review

Abstract

We describe two efficient, and exact, algorithms for computing Bellman updates in robust Markov decision processes (MDPs). The first algorithm uses a homotopy continuation method to compute updates for L1-constrained s, a-rectangular ambiguity sets. It runs in quasi-linear time for plain L1 norms and also generalizes to weighted L1 norms. The second algorithm uses bisection to compute updates for robust MDPs with s-rectangular ambiguity sets. This algorithm, when combined with the homotopy method, also has a quasi-linear runtime. Unlike previous methods, our algorithms compute the primal solution in addition to the optimal objective value, which makes them useful in policy iteration methods. Our experimental results indicate that the proposed methods are over 1,000 times faster than Gurobi, a state-of-the-art commercial optimization package, for small instances, and the performance gap grows considerably with problem size.
Original languageEnglish
Title of host publicationProceedings of 35th International Conference on Machine Learning, ICML 2018
PublisherInternational Machine Learning Society (IMLS)
Pages1979-1988
Volume5
ISBN (Print)9781510867963
Publication statusPublished - Jul 2018
Externally publishedYes
Event35th International Conference on Machine Learning (ICML 2018) - Stockholm, Sweden
Duration: 10 Jul 201815 Jul 2018
https://icml.cc/Conferences/2018

Publication series

NameProceedings of Machine Learning Research (PMLR)
Volume80
ISSN (Electronic)2640-3498

Conference

Conference35th International Conference on Machine Learning (ICML 2018)
PlaceSweden
CityStockholm
Period10/07/1815/07/18
Internet address

Fingerprint

Dive into the research topics of 'Fast Bellman Updates for Robust MDPs'. Together they form a unique fingerprint.

Cite this