Rate control optimization and image quality assessment in three dimensional video coding
Student thesis: Doctoral Thesis
Recent developments in three-dimensional (3D) display technologies have spurred research in 3D video coding (3DVC). Compared to traditional single-view video, 3D video contains higher-dimensional visual information, which can improve the user’s experience, for example through depth perception. Because of the large volume of 3D video content and the limited resources available, high compression performance must be achieved at the coding stage. Currently, there are numerous 3DVC schemes for different application environments, which differ in transport option, 3D video format and display terminal. Rate control (RC) is a classical resource allocation problem in 3DVC, which aims to balance visual quality, quality smoothness and buffer smoothness under bandwidth constraints. In 3DVC, bit allocation problems exist across hierarchical coding units such as the view level and the frame level, which makes RC optimization more complicated. On the other hand, the ultimate optimization criterion in 3DVC is subjective perceptual quality. However, subjective quality assessment is time-consuming. Meanwhile, traditional 2D image quality assessment metrics cannot reflect perceptual stereoscopic image quality. In addition, the distortion in virtual views differs significantly from common distortion types, so traditional 2D image quality assessment metrics fail. Therefore, developing reliable and generic 3D image quality assessment metrics is a challenging issue in 3DVC.
The contribution of this thesis consists of two main parts: rate control optimization in 3DVC and the design of 3D image quality assessment metrics. For the rate control optimization part, we aim to identify the disadvantages of existing work and to discover promising directions for improving rate control performance in 3DVC.
First, we propose a user-preference-guided bit allocation framework for view-level bit allocation in multiview-plus-depth-based 3DVC. Cauchy-density-based rate-distortion models of the texture video and depth map are employed to represent their rate-distortion properties. The relationship between the distortion of the synthesized view and the quantization step sizes of the texture videos and depth maps is represented as a linear model. The proposed algorithm achieves good performance with acceptable computational complexity compared with the full search scheme. Second, we design a one-pass frame-level bit allocation and rate control algorithm based on bargaining game theory for the hierarchical B-picture structure, which is the basic coding structure of 3DVC. The frame-level bit allocation problem is modeled as a cooperative game. The bit allocation problem is then solved by the Nash bargaining solution, which allocates bits effectively among the frames in different temporal levels. The proposed one-pass rate control algorithm outperforms the benchmark algorithm JVT-W043 and other one-pass algorithms in both bit rate accuracy and visual quality. Third, we analyze the inner-layer bit allocation problem of scalable 3DVC. From the perspective of centralized resource allocation optimization, the inner-layer bit allocation problem of scalable 3DVC is similar to the bargaining problem. For each spatial layer, the encoding constraints, such as bit rate and buffer size, are jointly modeled as resources in the inner-layer bit allocation bargaining game. A modified rate-distortion model that incorporates the inter-layer coding information is investigated. Then the generalized Nash bargaining solution is employed to achieve an optimal bit allocation solution. Using the generalized Nash bargaining solution, the bandwidth is allocated to the frames adaptively according to their bargaining powers.
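To illustrate how a generalized Nash bargaining solution can divide a bit budget according to bargaining powers, the following is a minimal sketch under a strong simplifying assumption: each frame's utility is taken to be simply the number of bits it receives above a guaranteed minimum. The thesis builds its utilities on rate-distortion models, which are not reproduced here, so the function name `nash_bargaining_allocation` and its parameters are hypothetical and chosen for illustration only. Under this assumption, maximizing the product of (bits minus minimum) raised to each frame's bargaining power, subject to the total budget, has a closed form: every frame gets its minimum plus a share of the surplus proportional to its bargaining power.

```python
def nash_bargaining_allocation(total_bits, min_bits, powers):
    """Generalized Nash bargaining split of a bit budget (illustrative only).

    Assumes utility_i = bits_i - min_bits[i], which is NOT the thesis's
    rate-distortion-based utility. Maximizes
        prod_i (bits_i - min_bits[i]) ** powers[i]
    subject to sum(bits_i) == total_bits.
    """
    if len(min_bits) != len(powers):
        raise ValueError("min_bits and powers must have the same length")
    if any(p <= 0 for p in powers):
        raise ValueError("bargaining powers must be positive")

    surplus = total_bits - sum(min_bits)
    if surplus < 0:
        raise ValueError("budget cannot cover the minimum requirements")

    total_power = sum(powers)
    # Setting the Lagrangian's derivatives to zero gives
    # bits_i - min_bits[i] = surplus * powers[i] / total_power.
    return [m + surplus * p / total_power for m, p in zip(min_bits, powers)]


# Hypothetical example: three frames, the first with twice the bargaining power.
allocation = nash_bargaining_allocation(1000.0, [100.0, 50.0, 50.0], [2.0, 1.0, 1.0])
print(allocation)  # [500.0, 250.0, 250.0]
```

With equal bargaining powers this reduces to the symmetric Nash bargaining solution, in which the surplus is split evenly among the frames.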
The proposed rate control algorithm achieves appealing visual quality improvement and buffer smoothness.
For the 3D image quality assessment part, we aim to design different types of metrics that can be used in different application environments. First, we develop a 3D full-reference image quality assessment metric based on binocular spatial sensitivity. The binocular spatial sensitivity map is modeled to reflect the binocular fusion properties. Then, a framework for integrating the binocular spatial sensitivity map into quality assessment is presented. The proposed metric is shown to be effective on benchmark datasets. Second, we propose a 3D reduced-reference image quality assessment metric based on 3D natural image statistics in the contourlet domain. In this metric, a Gaussian scale mixture model is employed to normalize the coefficients in each contourlet subband of the luminance image and disparity map of the 3D images. After the divisive normalization transform, we find that the marginal distributions of the coefficients are approximately Gaussian. Based on this observation, the standard deviations of the fitted Gaussian distributions are extracted as the feature parameters in our metric for each contourlet subband of the luminance image and disparity map of the 3D images. After that, the feature similarity index is employed to measure the 3D visual quality at the receiver side without accessing the original stereoscopic images. The proposed metric is consistent with human subjective perception of 3D content, and it can be implemented in many end-to-end 3D video systems.
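The reduced-reference feature extraction described above can be sketched roughly as follows. This is a simplified stand-in and not the thesis's implementation: it operates on a one-dimensional list of subband coefficients, omits the contourlet transform entirely, and replaces the Gaussian scale mixture based normalization with a plain local-energy divisive normalization. The function names, the window `radius`, and the sample coefficients are all hypothetical.

```python
import math
import statistics


def divisive_normalize(coeffs, radius=1, eps=1e-8):
    """Divide each coefficient by the RMS of its local neighborhood.

    Simplified local-energy normalization (an assumption made here),
    not the Gaussian scale mixture estimator used in the thesis.
    """
    n = len(coeffs)
    normalized = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        energy = sum(c * c for c in coeffs[lo:hi]) / (hi - lo)
        normalized.append(coeffs[i] / math.sqrt(energy + eps))
    return normalized


def gaussian_sigma(values):
    """Maximum-likelihood standard deviation of a Gaussian fitted to values."""
    return statistics.pstdev(values)


# Hypothetical subband coefficients; a real system would obtain them from
# the contourlet subbands of the luminance image and the disparity map.
subband = [0.5, -1.0, 2.0, -0.25, 1.5, -3.0]
feature = gaussian_sigma(divisive_normalize(subband))
print(feature)
```

One such value per subband would form the compact feature vector sent to the receiver, which is what makes the approach reduced-reference: the receiver compares features rather than needing the original stereoscopic images.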
- Coding theory, Image processing, 3-D video (Three-dimensional imaging)