Abstract
Recent work in Deep Reinforcement Learning (DRL) has focused on extracting more knowledge from the reward signal to improve sample efficiency. The Multi-Reward Architecture (MRA) achieves this by decomposing the original reward function into multiple sub-reward branches and training a source-specific policy branch for each one. However, existing MRAs either treat all source-specific policy branches as equally important or assign a fixed level of importance based on the current task conditions, which prevents DRL agents from prioritizing the most important branch at different task stages. It also necessitates a manual, time-consuming reset of the branch weights whenever task conditions change. It is therefore crucial to automate time-varying importance assignment across branches. We propose a generic MRA approach that achieves this goal and can be applied to improve state-of-the-art (SOTA) MRA methods. First, we add a policy branch corresponding to the original reward function, allowing the MRA to learn from each sub-reward branch without losing the experience provided by the original reward. Then, we apply the Asynchronous Advantage Actor-Critic (A3C) algorithm to learn time-varying weights for all policy branches. These weights are shaped into a one-hot vector that selects the most suitable policy branch for producing an action. Extensive experiments demonstrate that the proposed method effectively improves three SOTA MRA methods across four tasks in terms of episode reward, success rate, score difference, and episode duration. © 2024 IEEE.
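The selection step described in the abstract can be illustrated with a minimal, self-contained sketch (not the authors' implementation): state-dependent logits are turned into branch weights, the weights are shaped into a one-hot vector, and the vector picks one policy branch for the current step. The fixed linear map below stands in for the A3C-trained weight network, and the branch count and example state are illustrative assumptions.

```python
import math

def softmax(logits):
    """Convert raw logits into normalized branch weights."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def one_hot_argmax(weights):
    """Shape branch weights into a one-hot selection vector."""
    k = max(range(len(weights)), key=lambda i: weights[i])
    return [1.0 if i == k else 0.0 for i in range(len(weights))]

class BranchSelector:
    """Selects one policy branch per step from state-dependent logits.

    In the paper the logits would come from a weight network trained
    with A3C; here a fixed linear map stands in for that network.
    """
    def __init__(self, n_branches, state_dim):
        # Illustrative fixed weights, not learned parameters.
        self.W = [[0.1 * (i + 1) * (j + 1) for j in range(state_dim)]
                  for i in range(n_branches)]

    def select(self, state):
        logits = [sum(w * s for w, s in zip(row, state)) for row in self.W]
        weights = softmax(logits)          # time-varying branch weights
        return one_hot_argmax(weights)     # one-hot branch selection

# Branches: [sub-reward 1, sub-reward 2, original reward] (illustrative).
sel = BranchSelector(n_branches=3, state_dim=2)
mask = sel.select([1.0, 0.5])  # → [0.0, 0.0, 1.0]: original-reward branch
```

One design point worth noting: because the weights are collapsed to a one-hot vector, exactly one branch produces the action at each step, rather than a blend of all branch policies.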
| Original language | English |
|---|---|
| Pages (from-to) | 1865-1881 |
| Number of pages | 17 |
| Journal | IEEE Transactions on Emerging Topics in Computational Intelligence |
| Volume | 8 |
| Issue number | 2 |
| Online published | 6 Feb 2024 |
| DOIs | |
| Publication status | Published - Apr 2024 |
Bibliographical note
Information for this record is supplemented by the author(s) concerned.
Funding
This work was supported in part by the Science and Technology Innovation Committee Foundation of Shenzhen under Grant JCYJ20200109143223052 and in part by the Hong Kong Research Grants Council under GRF Grant 11218621.
Research Keywords
- Deep reinforcement learning
- multi-reward architecture
- time-varying importance
- A3C algorithm
RGC Funding Information
- RGC-funded
Fingerprint
Dive into the research topics of 'Time-Varying Weights in Multi-Reward Architecture for Deep Reinforcement Learning'. Together they form a unique fingerprint.
Projects
- 1 Finished
GRF: Age of Information Centric Task Scheduling in Autonomous Driving Systems
WANG, J. (Principal Investigator / Project Coordinator) & Qiao, C. (Co-Investigator)
1/01/22 → 12/12/25
Project: Research