Bounded-parameter partially observable Markov decision processes

Yaodong Ni, Zhi-Qiang Liu

Research output: Chapters, Conference Papers, Creative and Literary Works | RGC 32 - Refereed conference paper (with host publication) | peer-review

12 Citations (Scopus)

Abstract

The POMDP is considered a powerful model for planning under uncertainty. However, it is usually impractical to use a POMDP with exact parameters to model real-life situations precisely, for reasons such as limited data for learning the model. In this paper, assuming that the parameters of POMDPs are imprecise but bounded, we formulate the framework of bounded-parameter partially observable Markov decision processes (BPOMDPs). A modified value iteration is proposed as a basic strategy for tackling parameter imprecision in BPOMDPs. In addition, we design the UL-based value iteration algorithm, in which each value backup is based on two sets of vectors called the U-set and the L-set. We propose four typical strategies for setting the U-set and L-set, some of which guarantee that the algorithm implements the modified value iteration. We theoretically analyze the computational complexity and the reward loss of the algorithm. Empirical studies demonstrate the effectiveness and robustness of the algorithm. Copyright © 2008, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
Original language: English
Title of host publication: ICAPS 2008 - Proceedings of the 18th International Conference on Automated Planning and Scheduling
Pages: 240-247
Publication status: Published - 2008
Event: 18th International Conference on Automated Planning and Scheduling, ICAPS 2008 - Sydney, NSW, Australia
Duration: 14 Sept 2008 to 18 Sept 2008

Conference

Conference: 18th International Conference on Automated Planning and Scheduling, ICAPS 2008
Place: Australia
City: Sydney, NSW
Period: 14/09/08 to 18/09/08
