Optimization Techniques for Rate Control in High Efficiency Video Coding

高效視頻編碼中碼率控制的優化技術

Student thesis: Doctoral Thesis

Author(s)

  • Wei GAO

Detail(s)

Award date: 24 Oct 2016

Abstract

With the arrival of the era of the Internet of Things (IoT), cloud computing and big data, the explosive growth in data volume has become difficult to handle, especially for storage and transmission. Video data is inherently large in volume and makes up a major part of multimedia content, so storing and transmitting video content are urgent problems in many practical applications. Video compression, i.e. video coding, reduces the volume of video data to relieve the burden on storage and transmission, and is therefore an effective way to reduce the cost of storage devices and communication bandwidth.
Research on and industrialization of video coding have continued for decades. Several standards have been successively updated and improved, including new coding tools for Rate-Distortion Optimization (RDO) and solutions to practical transmission problems. Recently, H.264/AVC has been succeeded by the new High Efficiency Video Coding standard (H.265/HEVC). The standardization of HEVC was completed in 2013 by the Joint Collaborative Team on Video Coding (JCT-VC), which was organized by the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG).
However, storing and transmitting HEVC-compressed video effectively remains a challenge. Rate control (RC) is an important problem in video coding: it aims to provide the best reconstructed video quality under a given bandwidth constraint while also achieving desirable results in quality smoothness, buffer occupancy, bitrate accuracy and computational complexity. As a typical resource allocation problem, bit allocation in RC aims to achieve the best overall rate-distortion (R-D) performance by efficiently allocating the limited bit budget at different coding levels, such as Groups of Pictures (GOPs), frames and Coding Tree Units (CTUs). High-performance bit allocation and RC schemes can deliver much better visual quality and viewing experience for multimedia video content. A reasonable and effective modeling approach is the key to achieving the optimization goals of RC.
In this thesis, three approaches are proposed and discussed to improve the performance of RC in HEVC. They are based on statistical analysis, game theory modeling and perceptual cues, respectively.
For the statistical analysis based RC approach, a new ρ-domain frame-level bit allocation method is proposed. It builds on a new simplified Synthesized Laplacian Distribution (SynLD) model for the distribution of Discrete Cosine Transform (DCT) coefficients and on an adaptive prediction method for the Quality Dependency Factors (QDFs) among frames. The existing model for the DCT coefficient distribution is the Mixture Laplacian Distribution (MLD) model, which is based on the quadtree partition scheme in HEVC. However, the MLD model is complex, and its multiply-add form makes it difficult to derive effective rate and distortion models for further optimization. The proposed SynLD model is derived from a Kullback-Leibler (KL) divergence analysis and has a form and complexity similar to those of the Single Laplacian Distribution (SLD) model, which greatly helps in obtaining numerical solutions for the rate and distortion models. Moreover, the quality dependency among frames in the Random Access (RA) coding structure of HEVC is analyzed. Existing works usually set the QDF to a constant value for different frames and sequences, which is not reasonable since the QDF is highly related to the video content. In this thesis, QDFs are modeled as correlated with quantization parameters (QPs) and skip mode percentages (SMPs). Since an approximate relationship exists between QDF and SMP, SMP is used to predict the QDF adaptively. Based on these two improvements, a ρ-domain bit allocation and RC method is proposed to enhance RC performance in HEVC. Compared with bit allocation that uses fixed bit ratios chosen according to the bits per pixel (bpp) value of a GOP, the proposed method adapts better to frame content and coding complexity. Therefore, by better handling DCT coefficient modeling and quality dependency in HEVC, the proposed statistical analysis based RC approach performs better.
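To illustrate why a single-Laplacian form is convenient in the ρ-domain, the following is a minimal sketch and not the SynLD model of the thesis. It assumes the commonly used linear ρ-domain rate model R = θ(1 − ρ), where ρ is the fraction of zero quantized coefficients, and a uniform quantizer with rounding offset, so that a coefficient quantizes to zero when its magnitude is below (1 − offset)·Qstep. The function names, the parameter θ (`theta`), the Laplacian parameter (`lam`) and the default offset are all illustrative assumptions. With a single Laplacian the quantization step for a target rate has a closed form, whereas the weighted-sum mixture form does not invert as directly.

```python
import math


def rho_single(lam, qstep, offset=1.0 / 6.0):
    """Zero fraction for coefficients with pdf (lam/2)*exp(-lam*|x|).

    Assumes a coefficient becomes zero when |x| < (1 - offset) * qstep.
    """
    threshold = (1.0 - offset) * qstep
    return 1.0 - math.exp(-lam * threshold)


def rho_mixture(weights, lams, qstep, offset=1.0 / 6.0):
    """Zero fraction for a Laplacian mixture (weights assumed to sum to 1)."""
    return sum(w * rho_single(l, qstep, offset) for w, l in zip(weights, lams))


def rate_from_rho(theta, rho):
    """Linear rho-domain rate model: bits = theta * (1 - rho)."""
    return theta * (1.0 - rho)


def qstep_for_rate(theta, lam, target_bits, offset=1.0 / 6.0):
    """Closed-form Qstep for a target rate under the single-Laplacian model.

    Inverts theta * exp(-lam * (1 - offset) * qstep) = target_bits.
    """
    if not 0.0 < target_bits <= theta:
        raise ValueError("target_bits must lie in (0, theta]")
    return -math.log(target_bits / theta) / (lam * (1.0 - offset))
```

Under these assumptions, calling `qstep_for_rate` and feeding the result back through `rho_single` and `rate_from_rho` reproduces the target rate; for the mixture, the same inversion would need a numerical search over Qstep.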
For the game theory modeling based RC approach, a Structural Similarity (SSIM) based cooperative game theory modeling method is applied to CTU-level bit allocation for intra frames in HEVC. The proposed SSIM-based game theory approach (SSIM-GT) makes several contributions to improving the R-D performance of RC. Firstly, complexity-based R-D models are investigated for CTUs based on the Mean Squared Error (MSE) and SSIM metrics, respectively. The consistency of coding complexity is measured to ensure prediction accuracy. Detailed R-D modeling for CTUs is the prerequisite for improving the R-D performance of a frame. Secondly, an SSIM-based GT method is applied to CTU bit allocation optimization for the first time. Every CTU is modeled as a player, the convexity of the feasible SSIM-based utility set is proved, and the Nash Bargaining Solution (NBS) is used to obtain the optimal bit allocation. The influence of the minimum utility setting in this method is also discussed. Thirdly, a two-stage remaining bit refinement method is proposed for more accurate and smoother bit allocation. The SSIM-GT based RC method improves both the MSE-based and SSIM-based quality metrics.
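The bargaining idea can be sketched in a deliberately simplified setting. The code below is not the SSIM-based utility model of the thesis: it assumes each CTU's utility is linear in its bits, so that a minimum utility corresponds to a minimum bit count. Under that assumption, maximizing the Nash product (the product of each player's utility gain over its minimum) subject to a fixed frame budget has a closed form: every CTU receives its minimum bits plus an equal share of the surplus. The names `nbs_allocate` and `log_nash_product` are hypothetical.

```python
import math


def nbs_allocate(total_bits, min_bits):
    """Nash bargaining allocation under assumed linear utilities.

    Each player (CTU) gets its minimum bits plus an equal share of the
    remaining budget, which maximizes prod(bits_i - min_bits_i) subject
    to sum(bits_i) == total_bits.
    """
    surplus = total_bits - sum(min_bits)
    if surplus <= 0:
        raise ValueError("budget must exceed the sum of minimum bits")
    share = surplus / len(min_bits)
    return [m + share for m in min_bits]


def log_nash_product(bits, min_bits):
    """Log of the Nash product; constant utility scale factors are dropped."""
    return sum(math.log(b - m) for b, m in zip(bits, min_bits))
```

With a nonlinear concave utility, such as one derived from an SSIM-based R-D model, the equal-surplus rule no longer holds and the maximizer generally has to be found from the first-order conditions or numerically; the sketch only shows the role that the minimum utility plays in the solution.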
For the perceptual cue based RC approach, we explore Phase Congruency (PC) maps and PC distortion (PCD) in video coding for better RC optimization. Multiscale PC analysis can suppress the non-salient background in an image, and PC can extract an edge saliency map that indicates textural complexity. This is the first time a PC-based edge saliency map has been introduced into image and video coding. The relationship between PCD and the quantization step (Qstep) and that between PCD and the rate (R) are investigated. PCD increases monotonically with Qstep, so PCD can be used to evaluate compression loss in perceptual image and video coding. An R-PCD model is proposed based on curve fitting accuracy results. Then, a two-step CTU-level bit allocation and rate control method is proposed based on MSE-oriented and PCD-oriented quality optimization solutions. The proposed method achieves significant R-D performance gains and also performs very well in PC map preservation, quality smoothness and bitrate accuracy.