Rate-Distortion-Complexity Optimization for High Efficiency Video Coding


Student thesis: Doctoral Thesis





Award date: 2 Nov 2018


The boom in multimedia services in recent years has driven the development of video compression standards, which must make it possible to store and deliver more video content with relatively limited resources. The latest video compression standard, High Efficiency Video Coding (HEVC), greatly improves Rate-Distortion (RD) performance, achieving a 50% bit rate reduction at the same visual quality compared with its predecessor, H.264/Advanced Video Coding (AVC). However, with its quad-tree structured partitions and other sophisticated coding tools, HEVC significantly increases computational complexity, which limits its use in real-time applications. Fortunately, Machine Learning (ML) and Deep Learning (DL) offer a promising solution. These techniques can improve video coding in several ways, reducing complexity and also improving RD performance.

This thesis focuses on Rate-Distortion-Complexity (RDC) optimization for video coding using ML and DL. It consists of four main parts: 1) a fuzzy Support Vector Machine (SVM) based Coding Unit (CU) decision approach for HEVC; 2) a multi-class ranking based Prediction Unit (PU) decision approach for HEVC; 3) a binary and multi-class learning based CU and PU decision approach for HEVC; 4) a Convolutional Neural Network (CNN) based synthesized view quality enhancement approach for three-dimensional (3D) HEVC. The first three parts address low-complexity optimization for HEVC, and the last addresses RD performance improvement for 3D HEVC.

In the first part, the CU decision process is formulated as a cascaded multilevel classification task. The optimal feature set is selected according to a defined misclassification cost, and a risk area is introduced for uncertain classification outputs. To further improve RDC performance, different regularization parameters are adopted in the SVM and outliers in the training samples are eliminated. Additionally, the proposed CU decision method is incorporated into a joint RDC optimization framework, in which the width of the risk area is adaptively adjusted to allocate computational complexity flexibly among CUs, with the aim of minimizing computational complexity under a configurable constraint on RD performance degradation.
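The role of the risk area can be illustrated with a minimal sketch. It assumes that the classifier produces a signed score (positive meaning "split") and that the risk area is a symmetric band of half-width `risk_width` around zero; the function names, the score values, and the symmetric band are illustrative assumptions, not the thesis's actual formulation.

```python
from typing import Sequence


def cu_decision(score: float, risk_width: float) -> str:
    """Three-way CU decision from a signed classifier score (hypothetical).

    Scores inside the band [-risk_width, risk_width] are treated as
    uncertain and deferred to the full RD cost check.
    """
    if risk_width < 0:
        raise ValueError("risk_width must be non-negative")
    if score > risk_width:
        return "split"
    if score < -risk_width:
        return "no_split"
    return "full_rd_check"


def fraction_full_rd(scores: Sequence[float], risk_width: float) -> float:
    """Share of CUs that still need the exhaustive RD check."""
    decisions = [cu_decision(s, risk_width) for s in scores]
    return decisions.count("full_rd_check") / len(decisions)


# Made-up scores for eight CUs, for illustration only.
scores = [-2.1, -0.4, 0.05, 0.3, 1.7, -0.9, 0.8, 2.5]
for width in (0.0, 0.5, 1.0):
    print(width, fraction_full_rd(scores, width))
```

In this sketch, a wider band sends more CUs to the exhaustive check, which costs more computation but risks fewer misclassifications; a narrower band does the opposite. Adjusting the width per CU is one way to picture the complexity allocation described above.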

In the second part, PU selection is treated as a binary classification task followed by a multi-class ranking task, and incremental learning is applied in classifier training to better exploit the information in newly arriving training data. Furthermore, complexity can be allocated flexibly, with the aim of minimizing computational cost under a constraint on RD performance degradation.
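The constrained complexity allocation described here can be written generically as follows. The symbols are assumptions introduced for illustration: $\theta$ stands for the tunable decision parameters, $C(\theta)$ for the resulting computational complexity, $\Delta_{RD}(\theta)$ for the RD performance degradation, and $\Delta_T$ for the allowed degradation. The thesis's own notation and solution method are not given in this abstract.

```latex
\theta^{*} = \arg\min_{\theta} \; C(\theta)
\quad \text{subject to} \quad \Delta_{RD}(\theta) \le \Delta_{T}
```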

In the third part, the recursive CU decision and PU selection processes in HEVC are modeled as hierarchical binary classification and multi-class classification structures, respectively. Based on these two structures, CU decision and PU selection are optimized with binary and multi-class SVMs, i.e., the CU and PU can be predicted directly by classifiers without intensive RD cost calculation. In particular, to achieve better prediction performance, a learning method based on a multiple reviewers system is proposed to combine off-line and on-line ML modes for the classifiers. Additionally, an optimal parameter determination scheme is adopted for flexible complexity allocation under a given RD constraint.
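The hierarchical binary structure of the CU decision can be sketched as a recursive quad-tree traversal in which a classifier replaces the RD cost comparison at each depth. In the sketch below, `should_split` is a hypothetical stand-in for a trained binary classifier, and the 64x64 root with an 8x8 minimum CU size follows common HEVC settings rather than anything stated in this abstract.

```python
from typing import Callable, List, Tuple

CU = Tuple[int, int, int]  # (x, y, size) of a leaf coding unit


def partition_ctu(
    x: int,
    y: int,
    size: int,
    should_split: Callable[[int, int, int], bool],
    min_size: int = 8,
) -> List[CU]:
    """Recursively partition a block, asking a binary classifier at each level."""
    if size <= min_size or not should_split(x, y, size):
        return [(x, y, size)]
    half = size // 2
    leaves: List[CU] = []
    for dy in (0, half):
        for dx in (0, half):
            leaves.extend(
                partition_ctu(x + dx, y + dy, half, should_split, min_size)
            )
    return leaves


def toy_classifier(x: int, y: int, size: int) -> bool:
    """Illustrative rule: split the root, then only the top-left 32x32 block."""
    return size == 64 or (size == 32 and x == 0 and y == 0)


leaves = partition_ctu(0, 0, 64, toy_classifier)
print(leaves)
```

Each call makes one binary decision, so the number of classifier evaluations grows with the depth actually visited rather than with the full set of candidate partitions.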

In the fourth part, the problem of eliminating warping and coding distortion in the synthesized view is cast as an image restoration task. Both geometric and compression distortions are considered, according to the specific characteristics of synthesized view distortion. Because of the strong ability of CNNs in signal-level reconstruction, a CNN is adopted in both the View Synthesis Optimization (VSO) and post-processing modules, with the aim of reconstructing the latent distortion-free image. Accordingly, a new Lagrange multiplier in the RD cost function is derived to achieve a better trade-off between view synthesis distortion variation and coding bits.
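For context, the RD cost function referred to here has the standard Lagrangian form used in rate-distortion optimization. In the generic statement below, $D$ is the distortion term (in the VSO setting, a measure of synthesized view distortion change), $R$ is the number of coding bits, and $\lambda$ is the Lagrange multiplier. The specific multiplier derived in the thesis is not given in this abstract and is not reproduced here.

```latex
J = D + \lambda R,
\qquad
m^{*} = \arg\min_{m \in \mathcal{M}} \; J(m)
```

Here $\mathcal{M}$ denotes the set of candidate coding modes, a symbol introduced only for this illustration. A larger $\lambda$ favors fewer bits at the cost of more distortion, so changing how distortion is measured, for example after CNN-based enhancement, generally calls for a re-derived $\lambda$.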

    Research areas

  • machine learning, support vector machine, high efficiency video coding, convolutional neural network, low complexity optimization, 3D, view synthesis, multi-view plus depth, rate-distortion optimization