Rate Control Method Based on Deep Reinforcement Learning for Dynamic Video Sequences in HEVC

Research output: Journal Publications and Reviews · Publication in refereed journal · Peer-review

13 Scopus Citations



Original language: English
Article number: 9088297
Pages (from-to): 1106-1121
Journal / Publication: IEEE Transactions on Multimedia
Online published: 6 May 2020
Publication status: Published - 2021


Rate control (RC) plays a critical role in the transmission of high-quality video data under bandwidth restrictions in High Efficiency Video Coding (HEVC). Most current HEVC RC algorithms derive rate-distortion (R-D) model parameters from spatio-temporal information and therefore cannot effectively handle dynamic video sequences that contain fast-moving objects, significant object occlusion, or scene changes. In this paper, we propose an RC method based on deep reinforcement learning (DRL) for dynamic video sequences in HEVC to improve coding efficiency. First, the rate control problem is formulated as a Markov decision process (MDP). Second, with the MDP model, we develop a DRL-based algorithm that finds the optimal quantization parameters (QPs) by training a deep neural network. The resulting intelligent agent observes the current state of the encoder and selects the RC strategy that reduces distortion as well as buffer and quality fluctuations. The asynchronous advantage actor-critic (A3C) method is used to solve the MDP problem. Finally, the proposed DRL-based RC method is implemented in the latest HEVC reference software. Experimental results show that the proposed method offers substantially enhanced RC accuracy and consistently outperforms the HEVC reference software and other state-of-the-art algorithms.
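To make the MDP formulation concrete, the sketch below shows one plausible way to cast QP selection as observing an encoder state and choosing an action from a discrete QP-offset set. The state features, action set, reward weights, and the linear scoring function are all illustrative assumptions; the paper trains a deep neural network with A3C rather than this hand-weighted toy policy.

```python
import math

# Action set: adjust the QP around a base value (an assumed discretization).
QP_OFFSETS = [-2, -1, 0, 1, 2]

def encoder_state(buffer_fullness, bits_remaining_ratio, prev_qp):
    """Observed encoder state (a hypothetical feature set: buffer
    occupancy, ratio of remaining bit budget, normalized previous QP)."""
    return (buffer_fullness, bits_remaining_ratio, prev_qp / 51.0)

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def select_qp(state, base_qp, weights):
    """Stand-in for the actor network: score each QP offset from the
    state with a linear layer, then pick the most probable action.
    A3C would sample from the policy during training; greedy selection
    here mimics inference."""
    scores = [sum(w * s for w, s in zip(weights[a], state))
              for a in range(len(QP_OFFSETS))]
    probs = softmax(scores)
    a = max(range(len(QP_OFFSETS)), key=lambda i: probs[i])
    # Clip to the valid HEVC QP range [0, 51].
    return min(51, max(0, base_qp + QP_OFFSETS[a]))

def reward(distortion, rate_error, buffer_overflow):
    """Assumed reward shape: penalize distortion, rate mismatch, and
    buffer risk, matching the objectives named in the abstract."""
    return -(distortion + 0.5 * rate_error + 2.0 * buffer_overflow)
```

In the actual method, the critic estimates the value of each state so that multiple asynchronous workers can update a shared policy; this sketch only fixes the interface an agent of that kind would expose to the encoder.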

Research Area(s)

  • reinforcement learning, rate control, rate-distortion optimization, dynamically changing video