TY - GEN
T1 - Exploring a Reinforcement Learning Agent with Improved Prioritized Experience Replay for a Confrontation Game
AU - Zhao, Tian
PY - 2022
Y1 - 2022
AB - Deep Q-network (DQN) has been applied successfully to many reinforcement learning (RL) problems and to challenging tasks with real-world complexity. Its current limitation is the unacceptably long training time needed to reach results comparable to a human's. To address this obstacle, I propose a new reinforcement learning strategy. This paper focuses on a two-player confrontation game environment with sparse rewards, no direct hindsight reward function, and no fixed goals. Based on a set of strategies, the algorithm feeds intermediate game situations into reinforcement learning through reward functions and experience replay, giving the agent reference judgments in the middle of a game. To demonstrate the effectiveness of the proposed strategy, a new game is designed. The Fence game is a two-player confrontation game in which one player tries its best to fence the other in Die ow. The custom environment of this game provides rewards only at the end of a game: win, lose, or draw. The performance results show that both 1) Prioritized Experience Replay with Dynamic Hindsight reward function (DH-PER) and 2) Prioritized Experience Replay with Dynamic Hindsight reward function and Sharing (DHS-PER) let the RL agents converge more quickly.
KW - Deep Q-network (DQN)
KW - Dynamic Hindsight Experience Replay (DHER)
KW - experience replay
KW - experience sharing
KW - Hindsight Experience Replay (HER)
KW - Prioritized Experience Replay (PER)
KW - reinforcement learning
UR - https://www.scopus.com/pages/publications/85129580557
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-85129580557&origin=recordpage
U2 - 10.1109/BDICN55575.2022.00075
DO - 10.1109/BDICN55575.2022.00075
M3 - RGC 32 - Refereed conference paper (with host publication)
SN - 978-1-6654-8477-0
T3 - Proceedings - International Conference on Big Data, Information and Computer Network, BDICN
SP - 373
EP - 381
BT - PROCEEDINGS - 2022 International Conference on Big Data, Information and Computer Network
PB - IEEE
T2 - 2022 International Conference on Big Data, Information and Computer Network, BDICN 2022
Y2 - 20 January 2022 through 22 January 2022
ER -