Dynamic Weights and Prior Reward in Policy Fusion for Compound Agent Learning
Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review
Author(s)
XU, Meng; SHE, Yechao; JIN, Yang et al.
Detail(s)
| Field | Value |
|---|---|
| Original language | English |
| Article number | 107 |
| Number of pages | 28 |
| Journal / Publication | ACM Transactions on Intelligent Systems and Technology |
| Volume | 14 |
| Issue number | 6 |
| Online published | 14 Nov 2023 |
| Publication status | Published - Dec 2023 |
Abstract
In the Deep Reinforcement Learning (DRL) domain, a compound learning task is often decomposed into several sub-tasks in a divide-and-conquer manner, each trained separately and then fused to accomplish the original task concurrently, a process referred to as policy fusion. However, state-of-the-art (SOTA) policy fusion methods treat all sub-tasks as equally important throughout the task process, preventing the agent from relying on different sub-tasks at different stages. To address this limitation, we propose a generic policy fusion approach, referred to as Policy Fusion Learning with Dynamic Weights and Prior Reward (PFLDWPR), to automate the time-varying selection of sub-tasks. Specifically, PFLDWPR produces a time-varying one-hot vector over the sub-tasks to dynamically select the most suitable sub-task and mask the rest throughout the entire task process, enabling the fused strategy to optimally guide the agent in executing the compound task. The sub-task policies, weighted by this dynamic one-hot vector, are then aggregated to obtain the action policy for the original task. Moreover, we collect the sub-tasks' rewards at the pre-training stage as a prior reward, which, together with the current reward, is used to train the policy fusion network; this reduces fusion bias by leveraging prior experience. Experimental results on three popular learning tasks demonstrate that the proposed method significantly outperforms three SOTA policy fusion methods in terms of task duration, episode reward, and score difference. © 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM.
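As a rough illustration of the mechanism the abstract describes, the following minimal Python sketch shows a one-hot gating vector selecting a single sub-task policy per step and a prior-reward term blended into the training signal. All names here (`SubPolicy`, `FusionGate`, the mixing coefficient `alpha`, and the placeholder prior-reward values) are hypothetical and not from the paper; the actual PFLDWPR trains the fusion network end-to-end with DRL, whereas this sketch uses fixed random scorers purely to make the data flow concrete.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

class SubPolicy:
    """Stand-in for a pre-trained sub-task policy (a linear scorer over actions)."""
    def __init__(self, state_dim, n_actions):
        self.W = rng.normal(size=(n_actions, state_dim))
    def action_probs(self, state):
        return softmax(self.W @ state)

class FusionGate:
    """Produces a time-varying one-hot vector that selects one sub-task
    policy per step and masks the rest (the dynamic-weights idea)."""
    def __init__(self, state_dim, n_subtasks):
        self.V = rng.normal(size=(n_subtasks, state_dim))
    def one_hot(self, state):
        scores = self.V @ state
        mask = np.zeros_like(scores)
        mask[np.argmax(scores)] = 1.0  # hard selection: exactly one sub-task active
        return mask

def fused_action_probs(state, sub_policies, gate):
    """Aggregate the sub-task policies under the one-hot mask to obtain
    the action distribution for the compound task."""
    w = gate.one_hot(state)
    probs = sum(w_i * p.action_probs(state) for w_i, p in zip(w, sub_policies))
    return probs / probs.sum()

# Prior reward: per-sub-task reward statistics recorded at pre-training time.
# These values are placeholders, not figures from the paper.
prior_reward = np.array([0.6, 0.3, 0.9])

def training_signal(current_reward, active_subtask, alpha=0.5):
    """Blend the current environment reward with the selected sub-task's
    prior reward; `alpha` is a hypothetical mixing coefficient."""
    return current_reward + alpha * prior_reward[active_subtask]

state_dim, n_actions, n_subtasks = 4, 3, 3
subs = [SubPolicy(state_dim, n_actions) for _ in range(n_subtasks)]
gate = FusionGate(state_dim, n_subtasks)
s = rng.normal(size=state_dim)
print("fused action distribution:", fused_action_probs(s, subs, gate))
print("shaped reward:", training_signal(1.0, int(np.argmax(gate.one_hot(s)))))
```

A soft (softmax) weighting over sub-policies would also yield a valid fused distribution; the one-hot choice described in the abstract instead makes exactly one sub-task responsible at each step, which is what distinguishes the dynamic-weights scheme from fusion methods that weight all sub-tasks equally.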
Research Area(s)
- Compound agent learning, deep reinforcement learning, policy fusion, dynamic weights, prior reward
Bibliographic Note
Research Unit(s) information for this publication is provided by the author(s) concerned.
Citation Format(s)
Dynamic Weights and Prior Reward in Policy Fusion for Compound Agent Learning. / XU, Meng; SHE, Yechao; JIN, Yang et al.
In: ACM Transactions on Intelligent Systems and Technology, Vol. 14, No. 6, 107, 12.2023.