Computational complexity optimization for scalable and multiview video coding
Student thesis: Doctoral Thesis
In recent years, multimedia products have been widely adopted and have become an important part of everyday life in the new century. Nevertheless, compared with the increasing demand for multimedia information, limited storage and transmission capacity will remain a bottleneck for the foreseeable future. Video coding, which addresses how to represent video streams for storage and transmission, has become an essential component of the broadcast and entertainment media industry. During the past two decades, video coding techniques have improved greatly, resulting in the MPEG-x and H.26x series as the most representative and widespread video coding standards. In addition, the Scalable Video Coding (SVC) and Multiview Video Coding (MVC) standards have been developed as extensions of the state-of-the-art video coding standard H.264. To promote compression efficiency, all of the above standards are designed as lossy coding standards, in which one major concern is to minimize the distortion subject to a given bit rate constraint. To improve Rate-Distortion (RD) performance, several techniques are adopted in H.264/AVC, such as RD optimization, multiple reference selection, quarter-pixel Motion Estimation (ME), and variable block partitions for mode decision. In SVC and MVC, inter-layer and inter-view predictions are also utilized. Although these techniques yield high compression efficiency, they also dramatically increase computational complexity, which severely limits real-time and mobile applications. It is therefore highly desirable, and even imperative, to reduce the computational complexity of the encoding procedure while maintaining encoding efficiency. To address this problem, several novel and effective optimization techniques are proposed in this thesis, covering mode decision, multiple reference selection, and ME.
First, the characteristics of mode partitions in H.264 and its extensions are investigated as the basis of this thesis. Then, two conventional mode decision algorithms (i.e., algorithms based on mode features and prediction structure, like most existing mode decision algorithms) are proposed: an approximated probability model for the IPPP/IBBP structures and a joint model for the Hierarchical B Picture (HBP) structure. Although the two conventional models perform well, they share one drawback: they are designed for specific prediction structures and thus cannot be widely employed across different video coding standards. To avoid this, a mode mapping assumption is proposed, which treats the coding modes as points in Euclidean space and searches for the best mode in a manner similar to ME. Separate models based on mode mapping are designed for H.264 and its extensions, and the experimental results demonstrate the efficiency and robustness of these models as well as of the mode mapping assumption. To obtain a more generic decision method, optimal stopping theory is introduced into mode decision for the first time. The duration problem in optimal stopping theory is extended with the probability and examination time of each candidate, which results in two constrained models. Combining the advantages of these two models, a hybrid model is then designed. The validity of the proposed models is also proved mathematically in the course of the derivation. To apply optimal stopping theory to mode decision, the probability and time percentage of each mode are first estimated statistically; then, using the conditions derived from the optimal stopping model, the candidate mode list is initialized and examined with early termination.
Extensive experiments on SVC and MVC demonstrate that, compared with other recent algorithms, the proposed model can significantly reduce computational complexity with negligible degradation of video quality. Finally, two further algorithms are developed to optimize multiple reference selection and ME. A Gaussian model is presented for RD cost estimation before mode decision; it jointly optimizes multiple reference selection and ME and can also be combined with other mode decision algorithms. In addition, an adaptive ME early termination scheme is designed with overflow control. The coding efficiency and robustness of both algorithms are further examined experimentally.
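The candidate-list procedure described above (estimate a probability and a time percentage for each mode, order the candidates, examine them with early termination) can be illustrated with a minimal sketch. It is not the thesis's model: the ordering key (probability divided by time fraction), the cumulative-probability stopping rule, and all names (`ModeStats`, `order_candidates`, `decide_mode`, `confidence`) are assumptions made here for illustration, and the conditions actually derived from optimal stopping theory are not reproduced.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ModeStats:
    """Hypothetical per-mode statistics, assumed to be estimated offline."""
    name: str
    prob: float       # estimated probability that this mode is the best one
    time_frac: float  # estimated fraction of mode-decision time it consumes


def order_candidates(stats: List[ModeStats]) -> List[ModeStats]:
    # Assumed heuristic: test likely and cheap modes first.
    return sorted(stats, key=lambda m: m.prob / m.time_frac, reverse=True)


def decide_mode(
    stats: List[ModeStats],
    rd_cost: Callable[[str], float],
    confidence: float = 0.9,
) -> Tuple[str, List[str]]:
    """Examine candidates in order and stop early.

    Stops once the accumulated probability of the tested modes reaches
    `confidence` (an assumed stand-in for the derived stopping condition).
    Returns the best tested mode and the list of modes actually examined.
    """
    best_name, best_cost = "", float("inf")
    covered = 0.0
    tested: List[str] = []
    for mode in order_candidates(stats):
        cost = rd_cost(mode.name)  # the expensive RD evaluation of one mode
        tested.append(mode.name)
        if cost < best_cost:
            best_name, best_cost = mode.name, cost
        covered += mode.prob
        if covered >= confidence:
            break  # early termination: remaining modes are skipped
    return best_name, tested
```

A lower `confidence` skips more candidates and saves more time, at the risk of missing the mode with the lowest RD cost; this is the same complexity versus quality trade-off the abstract refers to.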
- Computational complexity, Video compression, Digital video, Coding theory