Policy Optimization for Complex Systems

Student thesis: Doctoral Thesis

Abstract

As artificial intelligence and machine learning continue to advance, the demand for more capable models of complex decision-making is growing rapidly. Markov Decision Processes (MDPs) are a foundational framework for sequential decision-making problems. Their goal is to identify an optimal policy that helps decision-makers maximize cumulative rewards. Backed by well-established theory, MDPs offer versatile solution methods for addressing such problems efficiently. However, in certain complex, modern decision-making scenarios, the underlying dynamics exhibit intricate structures that existing methods cannot handle effectively. In such cases, efficient and fast algorithms must be developed to overcome these challenges. Moreover, in many real-world applications, essential elements of the MDP framework, such as transition probabilities and rewards, are unknown and can usually only be estimated from data, which inevitably introduces estimation errors. Consequently, the actual performance of the computed policy often diverges from what decision-makers expect. In this context, conservative decision-makers may adopt a pessimistic approach, planning cautiously to safeguard against the worst-case scenario.
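
To make the MDP objective of maximizing cumulative (discounted) reward concrete, here is a minimal sketch of textbook value iteration on a small finite MDP whose transition probabilities and rewards are assumed known. Several choices here are illustrative assumptions: the function name `value_iteration`, the data layout (`P[s][a]` as a list of `(next_state, probability)` pairs, `R[s][a]` as an expected reward), the discount factor `gamma`, and the two-state example. It is a standard baseline, not one of the algorithms developed in the thesis.

```python
# Textbook value iteration for a finite, discounted MDP (illustrative sketch).
# Assumed layout: P[s][a] is a list of (next_state, probability) pairs and
# R[s][a] is the expected immediate reward for taking action a in state s.

def value_iteration(P, R, gamma=0.9, tol=1e-10, max_iter=10_000):
    n_states = len(P)
    V = [0.0] * n_states

    def q_value(s, a, V):
        # Expected reward plus discounted expected value of the next state.
        return R[s][a] + gamma * sum(p * V[t] for t, p in P[s][a])

    for _ in range(max_iter):
        # Bellman optimality update applied to every state.
        V_new = [max(q_value(s, a, V) for a in range(len(P[s])))
                 for s in range(n_states)]
        diff = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if diff < tol:
            break

    # Greedy policy with respect to the final value estimate.
    policy = [max(range(len(P[s])), key=lambda a, s=s: q_value(s, a, V))
              for s in range(n_states)]
    return V, policy


# Hypothetical two-state example.
# State 0: action 0 stays put (reward 0), action 1 moves to state 1 (reward 0).
# State 1: action 0 stays put (reward 1), action 1 returns to state 0 (reward 0).
P = [
    [[(0, 1.0)], [(1, 1.0)]],
    [[(1, 1.0)], [(0, 1.0)]],
]
R = [
    [0.0, 0.0],
    [1.0, 0.0],
]
V, policy = value_iteration(P, R)
print(V, policy)
```

With `gamma=0.9`, the values converge to roughly 9 and 10. The greedy policy is `[1, 0]`: move to state 1, then stay there and keep collecting the reward.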

This dissertation studies policy optimization in complex systems and focuses on two areas. The first is the development of efficient algorithms for computing optimal policies in certain challenging, ill-conditioned MDPs. The second is the establishment of novel methodologies based on robust MDPs (RMDPs) for computing robust policies that significantly improve worst-case performance. The methodologies presented in this thesis have potential applications across a wide range of contemporary real-world domains, including finance, manufacturing, molecular biophysics, healthcare, and robotics. Moreover, our extensive numerical experiments demonstrate that our techniques outperform alternative methods.
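
The worst-case planning idea behind RMDPs can be illustrated with a minimal sketch of robust value iteration. It assumes an (s, a)-rectangular uncertainty set given as a finite list of candidate transition kernels per state-action pair, and an adversary who picks the worst candidate at every step. The function name `robust_value_iteration`, this finite uncertainty set, and the three-state example are hypothetical choices for illustration. The thesis's own RMDP formulations and algorithms may use different uncertainty sets and solution methods.

```python
# Robust value iteration for an (s, a)-rectangular RMDP (illustrative sketch).
# Assumed layout: P_set[s][a] is a finite list of candidate transition kernels,
# each a list of (next_state, probability) pairs; R[s][a] is the reward.
# An adversary ("nature") picks the worst candidate kernel at every (s, a).

def robust_value_iteration(P_set, R, gamma=0.9, tol=1e-10, max_iter=10_000):
    n_states = len(P_set)
    V = [0.0] * n_states

    def worst_case_q(s, a, V):
        # Minimum expected continuation value over the candidate kernels.
        worst = min(sum(p * V[t] for t, p in kernel) for kernel in P_set[s][a])
        return R[s][a] + gamma * worst

    for _ in range(max_iter):
        # Robust Bellman update: maximize over actions, minimize over kernels.
        V_new = [max(worst_case_q(s, a, V) for a in range(len(P_set[s])))
                 for s in range(n_states)]
        diff = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if diff < tol:
            break

    policy = [max(range(len(P_set[s])), key=lambda a, s=s: worst_case_q(s, a, V))
              for s in range(n_states)]
    return V, policy


# Hypothetical example. State 0: action 0 is "safe" (reward 0.5, stay in 0);
# action 1 is "risky" (reward 0, reach good state 1 with probability 0.9 under
# the estimated model, but possibly only 0.3). State 1 pays 1 forever; state 2 pays 0.
R = [[0.5, 0.0], [1.0], [0.0]]
P_nominal = [
    [[[(0, 1.0)]], [[(1, 0.9), (2, 0.1)]]],
    [[[(1, 1.0)]]],
    [[[(2, 1.0)]]],
]
P_robust = [
    [[[(0, 1.0)]], [[(1, 0.9), (2, 0.1)], [(1, 0.3), (2, 0.7)]]],
    [[[(1, 1.0)]]],
    [[[(2, 1.0)]]],
]
V_nom, pi_nom = robust_value_iteration(P_nominal, R)  # singleton sets = nominal MDP
V_rob, pi_rob = robust_value_iteration(P_robust, R)
print(pi_nom, pi_rob)
```

Under the estimated model alone, the risky action looks better: its value is about 8.1, versus 7.79 for playing safe, so the nominal policy is `[1, 0, 0]`. Once the pessimistic kernel is admitted, the risky action's worst-case value drops to 2.7. The robust policy therefore switches to the safe action, which is worth 5, giving `[0, 0, 0]`. This is the kind of cautious, worst-case planning described above.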
Date of Award: 27 Dec 2023
Original language: English
Awarding Institution: City University of Hong Kong
Supervisors: Duan LI (Supervisor), Chin Pang HO (Supervisor) & Qi WU (Co-supervisor)
