Value-gradient based formulation of optimal control problem and machine learning algorithm

Alain BENSOUSSAN*, Jiayue HAN, Sheung Chi Phillip YAM, Xiang ZHOU

*Corresponding author for this work

Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review

3 Citations (Scopus)
108 Downloads (CityUHK Scholars)

Abstract

The optimal control problem is typically solved by first finding the value function through the Hamilton-Jacobi equation (HJE) and then taking the minimizer of the Hamiltonian to obtain the control. In this work, instead of focusing on the value function, we propose a new formulation for the gradient of the value function (value-gradient) as a decoupled system of partial differential equations in the context of a continuous-time deterministic discounted optimal control problem. We develop an efficient iterative scheme for this system of equations in parallel by utilizing the fact that they share the same characteristic curves as the HJE for the value function. On the theoretical side, we prove that this iterative scheme converges linearly in the L^2_α sense for a suitable exponent α in a weight function. For the numerical method, we combine a characteristic-line method with machine learning techniques. Specifically, at each policy iteration we generate multiple characteristic curves from an ensemble of initial states and compute both the value function and its gradient simultaneously on each curve as labeled data. Supervised machine learning is then applied to minimize the weighted squared loss for both the value function and its gradient. Experimental results demonstrate that this new method not only significantly increases the accuracy but also improves the efficiency and robustness of the numerical estimates, particularly with less characteristic-curve data or fewer training steps. © 2023 Society for Industrial and Applied Mathematics
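As a rough illustration of the supervised-learning step described in the abstract, the sketch below (not the authors' implementation) fits a neural network V_θ to value labels v_i and value-gradient labels g_i gathered along characteristic curves by minimizing a weighted squared loss on both quantities. The network architecture, the weights w_i, the optimizer settings, and the synthetic data are placeholder assumptions; the characteristic-curve data generation and the outer policy iteration are omitted.

```python
# Hypothetical sketch of the weighted value / value-gradient fitting step.
# Not the paper's code: architecture, weights, and data below are placeholders.
import torch

d = 2                                   # state dimension (assumed)
net = torch.nn.Sequential(              # V_theta : R^d -> R
    torch.nn.Linear(d, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def training_step(x, v, g, w):
    """One supervised step on a batch of characteristic-curve data.

    x : (N, d) states sampled along characteristic curves
    v : (N,)   value labels computed on those curves
    g : (N, d) value-gradient labels computed on those curves
    w : (N,)   weights (e.g. from the weight function), placeholders here
    """
    x = x.clone().requires_grad_(True)
    V = net(x).squeeze(-1)
    # gradient of the network output with respect to the state, via autograd
    gradV = torch.autograd.grad(V.sum(), x, create_graph=True)[0]
    # weighted squared loss on both the value and its gradient
    loss = (w * ((V - v) ** 2 + ((gradV - g) ** 2).sum(dim=1))).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Usage with synthetic placeholder labels (real labels come from the curves):
N = 128
x = torch.randn(N, d)
v = torch.randn(N)
g = torch.randn(N, d)
w = torch.ones(N)
print(training_step(x, v, g, w))
```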

Original language: English
Pages (from-to): 973-994
Number of pages: 22
Journal: SIAM Journal on Numerical Analysis
Volume: 61
Issue number: 2
Online published: 26 Apr 2023
DOIs: 10.1137/21M1442838
Publication status: Published - 2023

Funding

*Received by the editors August 30, 2021; accepted for publication (in revised form) October 13, 2022; published electronically April 26, 2023. https://doi.org/10.1137/21M1442838 Funding: The work of the first author was supported by National Science Foundation grants DMS-1905449 and DMS-2204795 and by HKSAR-GRF grant 14301321. The work of the second author was supported by the HKUGC for Ph.D. candidates; part of the current work contributes to the partial fulfillment of her Ph.D. dissertation. The work of the third author was partially supported by HKGRF grant 14300319 with the project title "Shape-constrained Inference: Testing for Monotonicity" and HKGRF grant 14301321 with the project title "General Theory for Infinite Dimensional Stochastic Control: Mean Field and Some Classical Problems", awarded by the Hong Kong RGC. The work of the fourth author was supported by Hong Kong RGC GRF grants 11307319, 11308121, and 11318522.

Research Keywords

  • characteristic curve
  • Hamilton-Jacobi equation
  • machine learning
  • optimal control
  • value function

Publisher's Copyright Statement

  • COPYRIGHT TERMS OF DEPOSITED FINAL PUBLISHED VERSION FILE: © 2023 Society for Industrial and Applied Mathematics.
