Content Adaptive Compression and Optimization Techniques for Video Coding


Student thesis: Doctoral Thesis





Award date: 16 Aug 2021


The remarkable popularity of video-oriented applications and the tremendous growth of video data present new challenges to video compression technology. This thesis focuses on coding performance improvement and encoding optimization for the Versatile Video Coding (VVC) and AVS3 standards. It consists of three parts: 1) new coding unit (CU) partitioning schemes that promote local adaptability in VVC and AVS3; 2) a low-complexity trellis-coded quantization scheme for the VVC encoder; 3) super-resolution for compressed screen content videos. The first topic enhances coding performance, the second exploits low-complexity, high-throughput coding tools, and the third enhances the quality of compression-distorted screen content videos.

In the first part, new CU partitioning methods are investigated to further develop the principles behind the existing partitioning structures in the VVC and AVS3 standards. First, a central extended quad-tree (CENTRAL-EQT) partitioning is introduced, which extends traditional QT partitioning with a central pattern and generates four carefully designed sub-CUs of different sizes. Second, we propose a parallel extended quad-tree (PARALLEL-EQT) partitioning that splits the CU along a single direction into four identically sized sub-blocks. Third, we present an asymmetric ternary-tree (ATT) method, which splits the CU asymmetrically into three sub-blocks. The proposed partitioning methods can be interleaved with binary-tree partitioning for enhanced adaptability, and they can be jointly enabled to capture different characteristics of the local content. The optimal partitioning mode is determined by rate-distortion optimization. The CENTRAL-EQT partitioning has been adopted into the AVS3 standard.
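The partition geometries and the rate-distortion mode decision described above can be sketched as follows. This is an illustrative outline only: the abstract does not give the exact split ratios or cost computation, so the CENTRAL-EQT/PARALLEL-EQT/ATT geometries below (including the assumed 1:1:2 ATT ratio) and the `rd_cost` callback are assumptions.

```python
# Hedged sketch: plausible sub-block geometries for the proposed partition
# modes, plus selection of the mode with minimum total rate-distortion cost.
# Split ratios are illustrative assumptions, not the normative definitions.

def split_cu(w, h, mode):
    """Return sub-block rectangles (x, y, w, h) for a w x h CU."""
    if mode == "QT":                      # classic quad-tree: 4 equal quadrants
        return [(0, 0, w // 2, h // 2), (w // 2, 0, w // 2, h // 2),
                (0, h // 2, w // 2, h // 2), (w // 2, h // 2, w // 2, h // 2)]
    if mode == "CENTRAL_EQT_H":           # central pattern: two quarter-height
        return [(0, 0, w, h // 4),        # strips around a vertically split half
                (0, h // 4, w // 2, h // 2), (w // 2, h // 4, w // 2, h // 2),
                (0, 3 * h // 4, w, h // 4)]
    if mode == "PARALLEL_EQT_H":          # four identical strips, one direction
        return [(0, i * h // 4, w, h // 4) for i in range(4)]
    if mode == "ATT_H":                   # asymmetric ternary split (1:1:2 assumed)
        return [(0, 0, w, h // 4), (0, h // 4, w, h // 4), (0, h // 2, w, h // 2)]
    raise ValueError(mode)

def best_partition(w, h, modes, rd_cost):
    """Pick the mode whose sub-blocks yield the lowest total RD cost."""
    return min(modes, key=lambda m: sum(rd_cost(b) for b in split_cu(w, h, m)))
```

In a real encoder, `rd_cost` would recursively encode each sub-block and return its Lagrangian cost `D + λR`; here it is left as a caller-supplied function.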

In the second part, we propose a low-complexity trellis-coded quantization scheme for the VVC encoder. The VVC standard adopts trellis-coded quantization, which uses a trellis graph to map the quantization candidates within one block onto the optimal path. Despite its high compression efficiency, the exhaustive trellis search with soft-decision quantization may hinder practical applications due to high complexity and low throughput. To reduce the complexity, rate and distortion models are established through theoretical modeling. With these models, the trellis departure point can be adaptively adjusted and unnecessarily visited branches are pruned, reducing the total number of trellis stages and simplifying the transition branches. One implementation of the proposed scheme has been adopted into the VVC reference software.
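The idea of a trellis search over quantization candidates, with small coefficients pruned before entering the trellis, can be sketched as a miniature Viterbi search in the spirit of VVC's dependent quantization. The state-transition table, rate model, candidate set, and pruning threshold below are illustrative assumptions, not the normative VVC tables or the thesis's derived models.

```python
import math

# Hedged sketch of soft-decision trellis quantization: two interleaved
# scalar quantizers (Q0 on even, Q1 on odd multiples of delta/2) selected
# by a small state machine, searched with Viterbi dynamic programming.
# Coefficients below `prune_thr` are forced to level 0 without branching,
# a crude stand-in for the adaptive departure point / branch pruning.

NEXT_STATE = {0: (0, 2), 1: (2, 0), 2: (1, 3), 3: (3, 1)}  # indexed by level parity

def rate_bits(level):
    """Crude rate model: larger levels cost more bits (assumption)."""
    return 1.0 + 2.0 * math.log2(abs(level) + 1)

def recon(level, state, delta):
    """Reconstruct a level: Q0 for states 0/1, Q1 for states 2/3."""
    if level == 0:
        return 0.0
    half = delta / 2.0
    if state < 2:
        return 2 * level * half                      # Q0: even multiples
    sign = 1 if level > 0 else -1
    return (2 * level - sign) * half                 # Q1: odd multiples

def trellis_quantize(coeffs, delta, lam=0.1, prune_thr=None):
    """Return quantization levels on the minimum RD-cost trellis path."""
    if prune_thr is None:
        prune_thr = 0.5 * delta
    best = {0: (0.0, [])}                            # state -> (cost, levels)
    for c in coeffs:
        sign = 1 if c >= 0 else -1
        base = sign * int(round(abs(c) / delta))
        # pruning: tiny coefficients get level 0 only, skipping the search
        cands = {0} if abs(c) < prune_thr else {0, base - 1, base, base + 1}
        nxt = {}
        for s, (cost, levels) in best.items():
            for lv in cands:
                d = (c - recon(lv, s, delta)) ** 2
                j = cost + d + lam * rate_bits(lv)   # Lagrangian cost D + lam*R
                ns = NEXT_STATE[s][abs(lv) & 1]
                if ns not in nxt or j < nxt[ns][0]:
                    nxt[ns] = (j, levels + [lv])
        best = nxt
    return min(best.values())[1]                     # levels on cheapest path
```

The thesis's contribution is precisely to avoid running this full search: its rate-distortion models predict where the trellis can start and which branches can never win, so most of the inner loop is skipped.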

In the third part, we concentrate on the super-resolution (SR) of compressed screen content video, addressing real-world challenges by considering the underlying characteristics of screen content. First, we propose a new dataset for the SR of screen content video with different distortion levels. Inspired by the multi-hypothesis philosophy, we then design an efficient SR structure that captures the characteristics of compressed screen content video and exploits the inter-connections among consecutive compressed low-resolution frames, facilitating high-quality recovery of the high-resolution counterpart. Moreover, we design a new loss function for network training to better remedy both compression distortion and perceptual distortion. Experimental results demonstrate the effectiveness and superiority of the proposed method.
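A combined training loss of the kind described above might pair a pixel-wise reconstruction term with an edge-aware term, since screen content is dominated by sharp text and graphics edges that compression artifacts degrade. The abstract does not specify the actual formulation, so the Charbonnier term, the gradient-difference term standing in for the perceptual component, and the weight `alpha` are all assumptions.

```python
import numpy as np

# Hedged sketch of a combined SR training loss: a smooth pixel-wise
# reconstruction term plus an edge-aware term. Terms and weight `alpha`
# are illustrative assumptions, not the thesis's exact loss.

def charbonnier(pred, target, eps=1e-3):
    """Smooth L1-like reconstruction loss, robust near zero error."""
    return float(np.mean(np.sqrt((pred - target) ** 2 + eps ** 2)))

def gradient_loss(pred, target):
    """Penalize mismatched horizontal/vertical image gradients (edges)."""
    gx = np.abs(np.diff(pred, axis=-1) - np.diff(target, axis=-1)).mean()
    gy = np.abs(np.diff(pred, axis=-2) - np.diff(target, axis=-2)).mean()
    return float(gx + gy)

def sr_loss(pred, target, alpha=0.5):
    """Total loss: reconstruction fidelity plus weighted edge fidelity."""
    return charbonnier(pred, target) + alpha * gradient_loss(pred, target)
```

In practice the perceptual component would more likely be computed on deep features of a pretrained network rather than raw gradients; the gradient term is used here only to keep the sketch self-contained.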

In summary, this thesis improves video compression efficiency from three aspects: 1) rate-distortion performance is improved with more flexible coding unit partitioning; 2) encoding complexity is reduced with a low-complexity trellis-coded quantization; 3) the quality of reconstructed frames is enhanced with a deep neural network. The merit of the proposed schemes lies in their content adaptivity combined with practical value. Extensive experimental results verify the effectiveness of the proposed schemes.