TY - JOUR
T1 - Hybrid MDP based integrated hierarchical Q-learning
AU - Chen, ChunLin
AU - Dong, DaoYi
AU - Li, Han-Xiong
AU - Tarn, Tzyh-Jong
PY - 2011/11
Y1 - 2011/11
AB - As a widely used reinforcement learning method, Q-learning is bedeviled by the curse of dimensionality: the computational complexity grows dramatically with the size of the state-action space. To combat this difficulty, an integrated hierarchical Q-learning framework is proposed based on a hybrid Markov decision process (MDP) that uses temporal abstraction instead of the simple MDP. The learning process is naturally organized into multiple levels, e.g., a quantitative (lower) level and a qualitative (upper) level, which are modeled as an MDP and a semi-MDP (SMDP), respectively. This hierarchical control architecture constitutes a hybrid MDP as the model for hierarchical Q-learning, bridging the two levels of learning. The proposed hierarchical Q-learning scales up well, and the upper-level learning process speeds up learning. Hence this approach provides an effective integrated learning and control scheme for complex problems. Several experiments are carried out on a puzzle problem in a gridworld environment and a navigation control problem for a mobile robot. The experimental results demonstrate the effectiveness and efficiency of the proposed approach. © 2011 Science China Press and Springer-Verlag Berlin Heidelberg.
KW - hierarchical Q-learning
KW - hybrid MDP
KW - reinforcement learning
KW - temporal abstraction
UR - http://www.scopus.com/inward/record.url?scp=80255135494&partnerID=8YFLogxK
U2 - 10.1007/s11432-011-4332-6
DO - 10.1007/s11432-011-4332-6
M3 - RGC 21 - Publication in refereed journal
SN - 1674-733X
VL - 54
SP - 2279
EP - 2294
JO - Science China Information Sciences
JF - Science China Information Sciences
IS - 11
ER -