Abstract
An optimal control problem is typically solved by first finding the value function through the Hamilton-Jacobi equation (HJE) and then taking the minimizer of the Hamiltonian to obtain the control. In this work, instead of focusing on the value function, we propose a new formulation for the gradient of the value function (value-gradient) as a decoupled system of partial differential equations in the context of a continuous-time deterministic discounted optimal control problem. We develop an efficient iterative scheme that solves this system of equations in parallel by utilizing the fact that they share the same characteristic curves as the HJE for the value function. On the theoretical side, we prove that this iterative scheme converges linearly in the $L^2_\alpha$ sense for some suitable exponent $\alpha$ in a weight function. On the numerical side, we combine a characteristic line method with machine learning techniques. Specifically, we generate multiple characteristic curves at each policy iteration from an ensemble of initial states and compute both the value function and its gradient simultaneously on each curve as the labeled data. Supervised machine learning is then applied to minimize the weighted squared loss for both the value function and its gradient. Experimental results demonstrate that this new method not only significantly increases the accuracy but also improves the efficiency and robustness of the numerical estimates, particularly with fewer characteristic curves or fewer training steps. © 2023 Society for Industrial and Applied Mathematics
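As a rough illustration of what "labeled data on a characteristic curve" can look like, the sketch below computes a value label and a gradient label at one initial state for a fixed policy. It is not the scheme from the paper: the scalar linear-quadratic problem, the linear policy `u = -k*x`, the function name `characteristic_labels`, all parameter values, and the use of a forward sensitivity equation for the gradient are assumptions chosen only so that the result can be checked against a closed form.

```python
import math

def characteristic_labels(x0, a=-0.5, b=1.0, q=1.0, r=1.0, k=0.5,
                          gamma=0.1, T=20.0, n=20000):
    """Value and value-gradient labels at x0 for the fixed policy u = -k*x.

    Hypothetical test problem (not from the paper):
      dynamics      x' = a*x + b*u
      running cost  q*x**2 + r*u**2, discounted at rate gamma
    The gradient is obtained from the sensitivity xi(t) = dx(t)/dx0,
    which obeys the linearized closed-loop dynamics.
    """
    dt = T / n
    c = a - b * k                 # closed-loop rate: x' = c*x
    x, xi = x0, 1.0               # state and its sensitivity to x0
    value, grad = 0.0, 0.0
    for i in range(n):
        disc = math.exp(-gamma * i * dt)          # discount factor at time t_i
        u = -k * x
        cost = q * x**2 + r * u**2
        dcost_dx = 2.0 * (q + r * k**2) * x       # d(cost)/dx along the policy
        value += disc * cost * dt                 # left Riemann sum for the value
        grad += disc * dcost_dx * xi * dt         # chain rule through xi
        x += c * x * dt                           # forward Euler step for the state
        xi += c * xi * dt                         # forward Euler step for the sensitivity
    return value, grad

v, g = characteristic_labels(1.0)
```

For this test problem the closed-loop value is `(q + r*k**2) * x0**2 / (gamma - 2*c)`, so the two labels can be compared with the exact value and its derivative. A supervised fit would then penalize the error in both quantities, which is the role the weighted squared loss plays in the abstract.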
Original language | English |
---|---|
Pages (from-to) | 973-994 |
Number of pages | 22 |
Journal | SIAM Journal on Numerical Analysis |
Volume | 61 |
Issue number | 2 |
Online published | 26 Apr 2023 |
DOIs | https://doi.org/10.1137/21M1442838 |
Publication status | Published - 2023 |
Funding
Received by the editors August 30, 2021; accepted for publication (in revised form) October 13, 2022; published electronically April 26, 2023. https://doi.org/10.1137/21M1442838
Funding: The work of the first author was supported by National Science Foundation grant DMS-1905449, HKSAR-GRF grant 14301321, and NSF-DMS grant 2204795. The work of the second author was supported by the HKUGC for Ph.D. candidates; part of the current work contributes to the partial fulfillment of her Ph.D. dissertation. The work of the third author was partially supported by HKGRF grant 14300319 with the project title "Shape-constrained Inference: Testing for Monotonicity" and HKGRF grant 14301321 with the project title "General Theory for Infinite Dimensional Stochastic Control: Mean Field and Some Classical Problems" awarded by Hong Kong RGC. The work of the fourth author was supported by Hong Kong RGC GRF grants 11307319, 11308121, and 11318522.
Research Keywords
- characteristic curve
- Hamilton-Jacobi equation
- machine learning
- optimal control
- value function
Publisher's Copyright Statement
- COPYRIGHT TERMS OF DEPOSITED FINAL PUBLISHED VERSION FILE: © 2023 Society for Industrial and Applied Mathematics.
Fingerprint
Dive into the research topics of 'Value-gradient based formulation of optimal control problem and machine learning algorithm'. Together they form a unique fingerprint.
-
GRF: Topics on Dynamics and Algorithms for Saddle Point Calculation
ZHOU, X. (Principal Investigator / Project Coordinator)
1/09/22 → …
Project: Research
-
GRF: Theory of Deep Learning: from CNNs to RNNs
ZHOU, X. (Principal Investigator / Project Coordinator)
1/01/22 → …
Project: Research
-
GRF: Learning Theory of Deep Structured Neural Networks
ZHOU, X. (Principal Investigator / Project Coordinator)
1/01/20 → 28/12/23
Project: Research
Student theses
-
Value-Gradient Algorithm for Optimal Control and Residual-Quantile Adjustment Method
HAN, J. (Author), ZHOU, X. (Supervisor), 16 May 2024
Student thesis: Doctoral Thesis