Value-gradient based formulation of optimal control problem and machine learning algorithm
Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review
Detail(s)
Original language | English |
---|---|
Pages (from-to) | 973-994 |
Number of pages | 22 |
Journal / Publication | SIAM Journal on Numerical Analysis |
Volume | 61 |
Issue number | 2 |
Online published | 26 Apr 2023 |
Publication status | Published - 2023 |
Link(s)
DOI | DOI |
---|---|
Link to Scopus | https://www.scopus.com/record/display.uri?eid=2-s2.0-85159768814&origin=recordpage |
Permanent Link | https://scholars.cityu.edu.hk/en/publications/publication(ef143737-e2e8-4266-aa29-993152843f7a).html |
Abstract
The optimal control problem is typically solved by first finding the value function through the Hamilton-Jacobi equation (HJE) and then taking the minimizer of the Hamiltonian to obtain the control. In this work, instead of focusing on the value function, we propose a new formulation for the gradient of the value function (value-gradient) as a decoupled system of partial differential equations in the context of a continuous-time deterministic discounted optimal control problem. We develop an efficient iterative scheme that solves this system of equations in parallel by utilizing the fact that they share the same characteristic curves as the HJE for the value function. For the theoretical part, we prove that this iterative scheme converges linearly in the $L^2_\alpha$ sense for some suitable exponent $\alpha$ in a weight function. For the numerical method, we combine a characteristic line method with machine learning techniques. Specifically, we generate multiple characteristic curves at each policy iteration from an ensemble of initial states and compute both the value function and its gradient simultaneously on each curve as the labeled data. Then supervised machine learning is applied to minimize the weighted squared loss for both the value function and its gradient. Experimental results demonstrate that this new method not only significantly increases the accuracy but also improves the efficiency and robustness of the numerical estimates, particularly with less characteristic data or fewer training steps. © 2023 Society for Industrial and Applied Mathematics
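To make the numerical loop described in the abstract concrete, the following is a minimal sketch on a toy problem and is not the authors' implementation. It assumes a one-dimensional discounted linear-quadratic problem (dynamics x' = u, running cost x^2 + u^2, discount rate `GAMMA`), a linear policy u = -k x, and a one-parameter model V(x) = theta * x^2 fitted by closed-form least squares in place of a general machine learning model. The names `labels_on_characteristics`, `fit_theta`, `policy_iteration` and the weight `mu` are hypothetical and chosen only for illustration.

```python
import numpy as np

GAMMA = 0.5  # discount rate (illustrative choice, not from the paper)


def trapezoid(y, t):
    """Trapezoidal rule along the last axis."""
    return np.sum(0.5 * (y[..., 1:] + y[..., :-1]) * np.diff(t), axis=-1)


def labels_on_characteristics(k, x0, horizon=60.0, n=6001):
    """Value and value-gradient labels at x0 under the policy u = -k x."""
    t = np.linspace(0.0, horizon, n)
    flow = np.exp(-k * t)                   # dx(t)/dx0 for x' = -k x
    x = x0[:, None] * flow[None, :]         # one characteristic curve per row
    disc = np.exp(-GAMMA * t)               # discount factor along the curve
    cost = (1.0 + k**2) * x**2              # x^2 + u^2 with u = -k x
    dcost = 2.0 * (1.0 + k**2) * x * flow   # derivative of the cost w.r.t. x0
    value = trapezoid(disc * cost, t)
    grad = trapezoid(disc * dcost, t)
    return value, grad


def fit_theta(x0, v, g, mu=1.0):
    """Minimize sum (theta*x^2 - v)^2 + mu * sum (2*theta*x - g)^2 over theta."""
    num = np.sum(x0**2 * v) + 2.0 * mu * np.sum(x0 * g)
    den = np.sum(x0**4) + 4.0 * mu * np.sum(x0**2)
    return num / den


def policy_iteration(k=1.0, iters=10, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(iters):
        x0 = rng.uniform(-2.0, 2.0, size=32)         # ensemble of initial states
        v, g = labels_on_characteristics(k, x0)      # labeled data on the curves
        theta = fit_theta(x0, v, g)                  # supervised fit to both labels
        # The minimizer of V'(x) u + x^2 + u^2 is u = -V'(x)/2 = -theta * x.
        k = theta
    return k


k_star = policy_iteration()
# Positive root of k^2 + GAMMA*k - 1 = 0, the exact optimal gain for this toy problem.
k_exact = 0.5 * (-GAMMA + np.sqrt(GAMMA**2 + 4.0))
print(k_star, k_exact)
```

In this toy setting the gradient labels enter the loss on the same footing as the value labels, which is the feature the abstract emphasizes. A realistic problem would replace the quadratic model with a trainable function approximator and the exact flow with a numerical ODE solver.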
Research Area(s)
- characteristic curve, Hamilton-Jacobi equation, machine learning, optimal control, value function
Citation Format(s)
In: SIAM Journal on Numerical Analysis, Vol. 61, No. 2, 2023, p. 973-994.