A Policy Optimization Method Towards Optimal-time Stability

Research output: Chapters, Conference Papers, Creative and Literary Works > RGC 32 - Refereed conference paper (with host publication) > peer-review

Author(s)

  • Shengjie Wang
  • Fengbo Lan
  • Xiang Zheng
  • Yuxue Cao
  • Oluwatosin Oseni
  • Haotian Xu
  • Tao Zhang
  • Yang Gao

Detail(s)

Original language: English
Title of host publication: Proceedings of The 7th Conference on Robot Learning
Editors: Jie Tan, Marc Toussaint, Kourosh Darvish
Publisher: ML Research Press
Number of pages: 29
Publication status: Published - 2023

Publication series

Name: Proceedings of Machine Learning Research
Volume: 229
ISSN (Print): 2640-3498

Conference

Title: 2023 Conference on Robot Learning (CoRL 2023)
Place: United States
City: Atlanta
Period: 6 - 9 November 2023

Abstract

In current model-free reinforcement learning (RL) algorithms, stability criteria based on sampling methods are commonly utilized to guide policy optimization. However, these criteria only guarantee the infinite-time convergence of the system's state to an equilibrium point, which leads to sub-optimality of the policy. In this paper, we propose a policy optimization technique incorporating sampling-based Lyapunov stability. Our approach enables the system's state to reach an equilibrium point within an optimal time and maintain stability thereafter, referred to as "optimal-time stability". To achieve this, we integrate the optimization method into the Actor-Critic framework, resulting in the development of the Adaptive Lyapunov-based Actor-Critic (ALAC) algorithm. Through evaluations conducted on ten robotic tasks, our approach outperforms previous studies significantly, effectively guiding the system to generate stable patterns. © 2023 Proceedings of Machine Learning Research. All Rights Reserved.

Research Area(s)

  • Reinforcement Learning, Robotic Control, Stability

Bibliographic Note

Research Unit(s) information for this publication is provided by the author(s) concerned.

Citation Format(s)

A Policy Optimization Method Towards Optimal-time Stability. / Wang, Shengjie; Lan, Fengbo; Zheng, Xiang et al.
Proceedings of The 7th Conference on Robot Learning. ed. / Jie Tan; Marc Toussaint; Kourosh Darvish. ML Research Press, 2023. (Proceedings of Machine Learning Research; Vol. 229).
