Abstract
The state and input constraints of nonlinear systems can greatly impede the realization of optimal control with reinforcement learning (RL)-based approaches, since the commonly used quadratic utility functions cannot meet the requirements of solving constrained optimization problems. This article develops a novel optimal control approach for constrained discrete-time (DT) nonlinear systems based on safe RL. Specifically, a barrier function (BF) is introduced and incorporated into the value function to transform the constrained optimization problem into an unconstrained one, while guaranteeing that the minimum of the transformed problem occurs at the origin. A constrained policy iteration (PI) algorithm is then developed to realize the optimal control of the nonlinear system while satisfying the state and input constraints. The constrained optimal control policy and its corresponding value function are approximated by two neural networks (NNs). Performance analysis shows that the proposed control approach retains the convergence and optimality properties of the traditional PI algorithm. Simulation results for three examples demonstrate its effectiveness. © 2023 IEEE.
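The role of the barrier function can be illustrated with a minimal sketch. The abstract does not give the article's BF, so the logarithmic barrier below, the symmetric box constraints, the scalar state and input, and the names `barrier`, `stage_cost`, `x_bound`, `u_bound`, `q`, and `r` are all assumptions chosen for illustration. The sketch only shows the two properties the abstract mentions: the augmented cost grows without bound as a constraint boundary is approached, and its minimum stays at the origin.

```python
import math


def barrier(z: float, bound: float) -> float:
    """Illustrative log barrier on the open interval (-bound, bound).

    Assumed form, not the article's BF: it is zero at z = 0, positive
    elsewhere inside the interval, and infinite on or outside the boundary.
    """
    if bound <= 0:
        raise ValueError("bound must be positive")
    if abs(z) >= bound:
        return math.inf
    return -math.log(1.0 - (z / bound) ** 2)


def stage_cost(x: float, u: float, x_bound: float, u_bound: float,
               q: float = 1.0, r: float = 1.0) -> float:
    """Quadratic utility augmented with barrier terms (hypothetical weights q, r)."""
    quadratic = q * x * x + r * u * u
    return quadratic + barrier(x, x_bound) + barrier(u, u_bound)
```

With such a cost, a policy that drives the state or input toward a constraint boundary incurs an unbounded penalty, so an unconstrained minimization of the accumulated cost implicitly keeps trajectories inside the admissible set.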
| Original language | English |
|---|---|
| Pages (from-to) | 854-865 |
| Journal | IEEE Transactions on Neural Networks and Learning Systems |
| Volume | 36 |
| Issue number | 1 |
| Online published | 31 Oct 2023 |
| Publication status | Published - Jan 2025 |
Funding
This work was supported in part by the National Natural Science Foundation of China under Grant 62073286; in part by the Science Fund for Creative Research Groups of the National Natural Science Foundation of China under Grant 61621002; and in part by the Fellowship Award from the Research Grants Council of the Hong Kong Special Administrative Region, China, under Project CityU PDFS2324-1S02.
Research Keywords
- Artificial neural networks
- Barrier function (BF)
- Constrained policy iteration (PI)
- Discrete-time (DT) nonlinear systems
- Iterative methods
- Nonlinear systems
- Optimal control
- Optimization
- Process control
- Safe reinforcement learning (RL)
- Safety