Abstract
As classic mathematical models for describing the dynamic behavior of physical systems, continuous-time nonlinear systems are widely applied across various industrial fields. However, traditional control theory for nonlinear systems depends strongly on accurate dynamical models, which limits the intelligent development of such systems to a certain extent. As an important branch of artificial intelligence, reinforcement learning leverages information gathered from interaction with the environment to solve complex decision-making problems, thus relaxing the reliance on the system model and expert knowledge. Over the years, research on reinforcement learning control for nonlinear systems, in which the value function and control input are solved iteratively, has attracted widespread attention from both theoretical and industrial researchers in the control field. Since the vast majority of the dynamic systems under consideration possess continuous state and action spaces, it is often necessary to introduce function approximators to estimate the state value function. This type of approach, which integrates function approximation, reinforcement learning, and optimal control, is also referred to as adaptive dynamic programming and has already achieved remarkable results in the intelligent control field.

Nevertheless, the study of reinforcement learning-based control for continuous-time nonlinear systems still faces several limitations. First, reinforcement learning control based on policy iteration algorithms typically requires an initial admissible control policy, which is often difficult to obtain for complex nonlinear systems; it is therefore of practical significance to investigate generalized policy iteration algorithms for nonlinear systems. Second, the impact of other inputs acting on the system, such as external disturbances or multiple inputs controlling the system simultaneously, also warrants investigation. Third, policy optimization algorithms for nonlinear systems usually require a large amount of data and many exploration trials to learn effective policies; this requirement is hard to meet in practical applications, especially when the cost of data collection is high, so improving data efficiency is a valuable research problem. Finally, when multiple nonlinear systems are interconnected, the increased state dimensionality makes centralized control difficult to implement, so it is necessary to employ reinforcement learning to study the decentralized control problem of interconnected nonlinear systems, thereby extending existing learning approaches to the control of interconnected systems.

The main work and innovations of this dissertation are as follows:
• For continuous-time nonlinear systems, the robust control problem is transformed into a two-player zero-sum game by treating the external disturbance as a hostile player that tries to degrade system performance, and a novel approach is proposed to find the saddle point from upper and lower performance index functions. Subsequently, a partially model-free generalized policy iteration algorithm driven by system state and input trajectory data is developed, in which a user-defined update-horizon parameter adjusts the convergence speed; the result is a mixture of the existing policy iteration and value iteration algorithms that relaxes the dependence on the dynamic model. Convergence of the generalized policy iteration algorithm based on the upper (lower) value function is established from the properties of the Bellman equation. A hedged sketch of the zero-sum formulation is given after this list.
• For the optimal control problem of continuous-time nonlinear systems with hierarchical inputs and external disturbances, a costate is introduced to transform the problem into a leader-follower optimization problem, and an approximate solution of the Stackelberg-saddle equilibrium based on parametric equations is proposed. With the aid of neural networks, an identifier-actor-critic-disturbance structure is established, and both the convergence of the algorithm and the stability of the system are guaranteed. The proposed approach extends existing Stackelberg policy learning schemes to continuous-time nonlinear systems under external disturbances and avoids solving recursive difference Riccati equations, improving computational efficiency. The leader-follower structure is sketched after this list.
• For the discounted-cost optimal control problem of continuous-time input-nonaffine systems, a new value function that still satisfies the Bellman equation is constructed to improve data utilization efficiency. A temporal-difference-based multi-step Q-learning algorithm for the discretized system is proposed to learn the optimal control policy from running data alone, without prior knowledge of the system model, and a convergence analysis of the advantage function in this algorithm is further provided. This study strengthens the connection between adjacent sampled states and fills a research gap in multi-step Q-learning-based control of nonaffine systems, in which the states within the sampling intervals previously played no role beyond contributing their single-step costs and only discrete-time systems were considered. The n-step Bellman relation underlying this design is sketched after this list.
• For the trajectory tracking control problem of a class of two-wheeled mobile robots subject to model uncertainties and unknown disturbances, an easy-to-operate velocity error dynamic system is derived from both the kinematic and the dynamic models. A logic-based iterative learning control scheme with neural network approximation is proposed to compensate for system uncertainties and external disturbances, overcoming the negative effects of unknown dynamics and periodic disturbances on system stability. Moreover, the boundedness of all closed-loop signals is rigorously analyzed based on Lyapunov stability theory to provide a theoretical foundation. The proposed approach validates the feasibility and effectiveness of the adaptive dynamic programming principle in practical control scenarios. A generic form of such a learning update is sketched after this list.
• For the control problem of interconnected continuous-time nonlinear systems, a decentralized stabilizing control policy set based on the local information of each subsystem is developed, overcoming the difficulty of measuring interconnection terms and the high dimensionality of centralized control. Using the least-squares method, an off-policy adaptive dynamic programming algorithm is established to iteratively solve for each subsystem's best response. The equivalence between the decoupled Hamilton-Jacobi-Bellman equation and the coupled one is established, and the stability of the closed-loop interconnected system is further analyzed. The proposed research extends decentralized stabilizing control learning schemes to interconnected systems with unknown dynamics. The local Hamilton-Jacobi-Bellman equation is sketched after this list.
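To make the first contribution concrete, the following is a minimal sketch of a standard two-player zero-sum formulation for an input-affine system \(\dot{x} = f(x) + g(x)u + k(x)d\); the weights \(Q\), \(R\) and attenuation level \(\gamma\) are illustrative assumptions, not the dissertation's exact choices.

```latex
% Upper and lower value functions of the zero-sum game; a saddle
% point exists when they coincide (Q, R, \gamma are illustrative).
\overline{V}(x_0) = \min_{u}\max_{d} \int_{0}^{\infty}
    \left( x^{\top} Q x + u^{\top} R u - \gamma^{2} d^{\top} d \right) \mathrm{d}\tau ,
\qquad
\underline{V}(x_0) = \max_{d}\min_{u} \int_{0}^{\infty}
    \left( x^{\top} Q x + u^{\top} R u - \gamma^{2} d^{\top} d \right) \mathrm{d}\tau .
```

Generalized policy iteration evaluates the current policies only over a user-defined horizon before improving them, which interpolates between policy iteration (full evaluation) and value iteration (one-step evaluation).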
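The leader-follower structure of the second contribution can be sketched, under assumed cost functionals \(J_1\), \(J_2\) (notation hypothetical, not taken from the dissertation), as the leader anticipating the follower's best response:

```latex
% Stackelberg structure: the leader u_1 commits first, the follower
% u_2 best-responds; the disturbance d acts as a maximizing player
% (J_1, J_2, and d's placement here are illustrative assumptions).
u_2^{*}(u_1) = \arg\min_{u_2} J_2(u_1, u_2, d),
\qquad
u_1^{*} = \arg\min_{u_1} \max_{d} J_1\!\left(u_1,\, u_2^{*}(u_1),\, d\right).
```

Combining the hierarchical ordering of the two control inputs with the worst-case treatment of the disturbance yields the Stackelberg-saddle equilibrium mentioned above.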
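The multi-step relation exploited in the third contribution can be written, for a discount rate \(\gamma > 0\) and sampling period \(\Delta t\) (notation assumed for illustration), as:

```latex
% n-step Bellman relation for the discounted value function:
% every sampled state inside the n-step window contributes its
% running cost, tightening the link between adjacent samples.
V^{u}(x_t) = \int_{t}^{t+n\Delta t} e^{-\gamma(\tau - t)}\,
    r\big(x(\tau), u(\tau)\big)\, \mathrm{d}\tau
    + e^{-\gamma n \Delta t}\, V^{u}\!\left(x_{t+n\Delta t}\right).
```

A temporal-difference target built from this n-step relation reuses every intermediate sample, which is the source of the improved data efficiency claimed above.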
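For the fourth contribution, the abstract does not state the exact control law; a generic logic-based iterative learning update with neural-network compensation, with gain \(\Gamma\), basis \(\phi\), and weight estimates \(\hat{W}_k\) all assumed for illustration, takes the form:

```latex
% Generic ILC update over learning iterations k: repeat the task,
% feed back the velocity tracking error e_k, and let an NN term
% compensate unknown dynamics (all symbols illustrative).
u_{k+1}(t) = u_k(t) + \Gamma\, e_k(t)
    + \hat{W}_{k}^{\top}\, \phi\big(x_k(t)\big).
```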
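Finally, the decoupled equation of the fifth contribution can be illustrated for a subsystem with assumed local dynamics \(\dot{x}_i = f_i(x_i) + g_i(x_i)u_i\) and local cost \(r_i(x_i, u_i) = q_i(x_i) + u_i^{\top} R_i u_i\) (all illustrative; the dissertation's handling of the interconnection terms is not detailed here):

```latex
% Local HJB equation for subsystem i and the resulting
% decentralized policy; only local information x_i is needed.
0 = \min_{u_i}\left[ r_i(x_i, u_i)
    + \nabla V_i^{\top}(x_i)\big( f_i(x_i) + g_i(x_i) u_i \big) \right],
\qquad
u_i^{*}(x_i) = -\tfrac{1}{2} R_i^{-1} g_i^{\top}(x_i)\, \nabla V_i(x_i).
```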
Date of Award | 24 Dec 2024
---|---
Original language | English
Awarding Institution |
Supervisor | Min XIE (Supervisor) & Junlin Xiong (External Supervisor)