Policy Optimization for Complex Systems


Student thesis: Doctoral Thesis





Award date: 27 Dec 2023


As artificial intelligence and machine learning continue to advance, demand for better models of complex decision-making is growing rapidly. Markov Decision Processes (MDPs) serve as a foundational framework for sequential decision-making problems, where the goal is to identify an optimal policy that maximizes the decision-maker's cumulative reward. Supported by well-established theory, MDPs offer versatile solution methods for solving such problems efficiently. However, in some complex modern decision-making scenarios, the underlying dynamics have intricate structure that existing methods cannot handle effectively, and new, efficient algorithms are needed to overcome these challenges. Moreover, in many real-world applications, essential elements of the MDP framework, such as transition probabilities and rewards, are unknown and can only be estimated from data, which inherently introduces estimation error. Consequently, the actual performance of the computed policy frequently falls short of what decision-makers expect. In this context, conservative decision-makers may adopt a pessimistic approach, planning cautiously to safeguard against the worst-case scenario.

This dissertation explores policy optimization in complex systems. It focuses on two areas: first, the development of efficient algorithms for computing optimal policies in certain challenging, ill-conditioned MDPs; and second, the establishment of novel methodologies based on robust MDPs (RMDPs) for computing robust policies that significantly improve worst-case performance. The methodologies presented in this thesis hold significant potential for application across a wide range of contemporary real-world domains, including finance, manufacturing, molecular biophysics, healthcare, and robotics. Our extensive numerical experiments demonstrate the superior performance of our techniques compared to alternative methods.