Value-Gradient Algorithm for Optimal Control and Residual-Quantile Adjustment Method

最優控制的價值函數-梯度計算方法和殘差分位數自適應方法

Student thesis: Doctoral Thesis

Award date: 16 May 2024

Abstract

This dissertation presents a value-gradient formulation for optimal control problems and a residual-quantile adjustment (RQA) technique for residual-based PDE solvers. For the first topic, we analyze convergence in depth and provide extensive numerical experiments to demonstrate robustness. For the second topic, we explain the insight behind adaptivity and demonstrate the effectiveness of RQA through various examples. Brief introductions to both topics are given below.

Typically, an optimal control problem is solved by first finding the value function through the Hamilton–Jacobi equation (HJE) and then taking the minimizer of the Hamiltonian to obtain the control. Instead of focusing on the value function, we propose a new formulation for the gradient of the value function (the value-gradient) as a decoupled system of partial differential equations, in the context of continuous-time deterministic discounted optimal control problems. An efficient iterative scheme, PI-lambda, is developed to solve this system of equations in parallel by exploiting the property that the equations share the same characteristic curves as the HJE for the value function.

For the theoretical part, we prove that this iterative scheme converges linearly in the $L^2_\alpha$ sense, for a suitable exponent $\alpha$ in the weight function.

For the numerical method, we combine the characteristic line method with machine learning techniques. Specifically, at each policy iteration we generate multiple characteristic curves from an ensemble of initial states, and compute both the value function and its gradient simultaneously on each curve as labelled data. Supervised machine learning is then applied to minimize the weighted squared loss for both the value function and its gradient.
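The data-generation step can be illustrated on a toy problem. The sketch below is not the thesis's implementation: it assumes a hypothetical one-dimensional linear-quadratic problem with a fixed linear policy, and it obtains the gradient label from a forward sensitivity equation rather than from whatever characteristic system the thesis uses. The names `closed_loop` and `characteristic_labels`, and all parameter values, are made up for illustration. For this toy problem the value under the fixed policy has the closed form V(x) = P x^2, which gives a check on the labels.

```python
import numpy as np

# Hypothetical 1-D test problem (not from the thesis): x' = a*x + b*u,
# running cost q*x^2 + r*u^2, discount rate rho, fixed linear policy u = -k*x.
a, b, q, r, rho, k = -1.0, 1.0, 1.0, 1.0, 0.1, 0.5


def closed_loop(x):
    """Closed-loop drift, running cost, and their x-derivatives under u = -k*x."""
    c = a - b * k          # closed-loop drift coefficient
    w = q + r * k**2       # closed-loop cost coefficient
    return c * x, w * x**2, c * np.ones_like(x), 2.0 * w * x


def characteristic_labels(x0, horizon=20.0, steps=20_000):
    """Return (V(x0), dV/dx0) for an ensemble of initial states x0.

    Uses forward Euler in time and a left Riemann sum for the discounted cost.
    """
    dt = horizon / steps
    x = np.asarray(x0, dtype=float).copy()
    s = np.ones_like(x)            # sensitivity s(t) = dx(t)/dx0
    value = np.zeros_like(x)
    grad = np.zeros_like(x)
    discount = 1.0
    step_decay = np.exp(-rho * dt)
    for _ in range(steps):
        f, cost, df_dx, dcost_dx = closed_loop(x)
        value += discount * cost * dt           # V += e^{-rho t} l(x) dt
        grad += discount * dcost_dx * s * dt    # dV/dx0 += e^{-rho t} l'(x) s dt
        x = x + f * dt                          # advance the state along the curve
        s = s + df_dx * s * dt                  # advance the linearized flow
        discount *= step_decay
    return value, grad


x0 = np.linspace(-2.0, 2.0, 9)                  # ensemble of initial states
V, lam = characteristic_labels(x0)

# Closed form for this toy problem under the fixed policy: V(x) = P x^2.
P = (q + r * k**2) / (rho - 2.0 * (a - b * k))
```

The pairs `(x0, V)` and `(x0, lam)` play the role of the labelled data: a regression model would be fitted to both at once, with the gradient term entering the squared loss through the model's own derivative.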

Experimental results demonstrate that this new method not only significantly increases the accuracy but also improves the efficiency and robustness of the numerical estimates, particularly when less characteristic data or fewer training steps are available.

Besides the HJE, machine learning methods are also applied to provide numerical solutions to general high-dimensional partial differential equations (PDEs). The rapid development of residual-based solvers, such as the physics-informed neural network (PINN), has left open problems concerning effective and accurate training.

Adaptive training methods for PINN require careful construction of the distribution of weights assigned to the training samples. Efficiently finding such an optimal weight distribution is not a simple task, and most existing methods choose the adaptive weights by approximating either the full distribution or the maximum of the residuals. We show that the bottleneck in the adaptive choice of samples for training efficiency is the behavior of the tail distribution of the numerical residual. We therefore propose the Residual-Quantile Adjustment (RQA) method for a better choice of weight for each training sample. After initially setting the weights proportional to the p-th power of the residual, the RQA method reassigns all weights above the q-quantile (the 90% quantile, for example) to the median value, so that the weights follow a quantile-adjusted distribution derived from the residuals. This iterative reweighting technique is also very easy to implement. Experimental results show that the proposed method can outperform several adaptive methods on various partial differential equation problems.
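The reweighting rule described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the thesis's code: the function name `rqa_weights`, the default values of `p` and `q`, the strict "greater than" comparison at the threshold, and the final normalization to unit sum are all assumptions made here.

```python
import numpy as np


def rqa_weights(residuals, p=2.0, q=0.9):
    """Quantile-adjusted sample weights from pointwise residuals (illustrative).

    Assumes at least one residual is nonzero, so the final sum is positive.
    """
    w = np.abs(np.asarray(residuals, dtype=float)) ** p   # raw weights ~ |residual|^p
    threshold = np.quantile(w, q)                         # q-quantile of the raw weights
    w = np.where(w > threshold, np.median(w), w)          # tail weights reset to the median
    return w / w.sum()                                    # assumption: normalize to sum to 1


# Example: one very large residual would otherwise dominate the weights.
residuals = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0])
weights = rqa_weights(residuals, p=1.0, q=0.9)
```

In a training loop, the weights would be recomputed from the current residuals at each reweighting step and used either as loss weights or as sampling probabilities; which of the two the thesis uses is not stated in the abstract.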

The dissertation is organized as follows.

Chapter 1 provides the theoretical foundations for the value-gradient formulation. Specifically, the optimal control problem and the HJE are briefly reviewed in Sections 1.1 and 1.2. The PI-lambda algorithm is proposed in Section 1.3, followed by the convergence analysis in Section 1.4.

Chapter 2 first gives an overview of the deep-learning-based PI-lambda framework in Section 2.1. The numerical details of the characteristic line method and supervised learning are then introduced in Section 2.2. The following three sections (Sections 2.3, 2.4, and 2.5) present numerical results and discuss the performance of PI-lambda from different aspects in detail.

Chapter 3 presents the RQA method. Section 3.1 introduces the background and problem formulation. Previous work, especially PINN and SelectNet, is covered in Section 3.2. Section 3.3 contains the main method, and Section 3.4 presents the numerical results.

The dissertation is concluded in Chapter 4.

    Research areas

  • Machine learning, Control theory, Hamilton-Jacobi equations, Adaptive, Partial Differential Equation (PDE)