Content-Adaptive Rate Distortion Optimization Techniques for Video Coding
基於內容自適應的率失真優化方法研究
Student thesis: Doctoral Thesis
Author(s)
Related Research Unit(s)
Detail(s)
Awarding Institution | |
---|---|
Supervisors/Advisors |
|
Award date | 13 Apr 2023 |
Link(s)
Permanent Link | https://scholars.cityu.edu.hk/en/theses/theses(ff834489-a185-43a0-800f-9a3c0a4b63da).html |
---|---|
Abstract
Efficient video coding technologies are in high demand across a wide range of applications. This thesis focuses on improving the compression capability of the Versatile Video Coding (VVC) and AOMedia Video 1 (AV1) standards. It consists of three parts: 1) learning-based rate control for the AV1 encoder; 2) adaptive quantization parameter (QP) selection with dependency modeling for the VVC encoder; 3) content-aware screen content rate control for the VVC encoder. The first topic aims to improve the coding performance of AV1 with pre-trained machine learning models. The second topic explores the dependency relationship among frames in the VVC encoder and enhances the coding performance under the fixed-QP configuration. The third topic utilizes spatial and temporal information to overcome the challenges of screen content rate control.
In the first part, the performance of rate control is enhanced with two machine learning models. First, two learning-based models are proposed for hierarchical bit allocation and frame-level parameter estimation in AV1. More specifically, the coding features are reused for frame-wise bit allocation and R-Q model parameter estimation, so no extra computation is required for feature extraction. Second, we propose a training data acquisition strategy based on multi-pass coding for AV1, which ensures the best coding efficiency during offline training. As such, the effectiveness of the machine learning models can be guaranteed. Third, the proposed models are incorporated into AV1, and the results demonstrate the effectiveness of our method in terms of RD performance and bitrate accuracy.
In the second part, dependency information among frames is utilized in global rate distortion optimization (RDO) for QP determination, so that the encoding QP for each frame is derived from content characteristics. First, the distortion dependency and rate dependency between the reference frame and the to-be-coded frame are modeled in a principled way, and the parameters of the dependency models are adaptively calculated from the statistics of the first-pass encoding. Second, the global RDO across different frames is achieved based on the precise reference relationship modeling, such that the optimal QP for each frame is adaptively derived. Moreover, the corresponding Lagrange multiplier is obtained from the content-aware QP-λ relationship, in an effort to thoroughly optimize the RD performance.
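For orientation, the global RDO mentioned above can be written in its generic textbook form. This is only a sketch: the symbols N (number of frames), D_i and R_i (distortion and rate of frame i), and R_c (the rate budget) are assumed for illustration, and the thesis's own dependency models are not reproduced here.

```latex
\min_{\{QP_i\}} \sum_{i=1}^{N} D_i
\quad \text{s.t.} \quad \sum_{i=1}^{N} R_i \le R_c
\qquad \Longrightarrow \qquad
\min_{\{QP_i\}} \; J = \sum_{i=1}^{N} D_i + \lambda \sum_{i=1}^{N} R_i
```

Because of inter-frame prediction, each D_i and R_i generally depends not only on QP_i but also on the QPs of the frames that frame i references, which is why the frames cannot be optimized independently.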
In the third part, we propose a content-aware CTU-level rate control for screen content coding. First, a hyperbolic complexity-aware rate model and distortion model are proposed for screen content videos. Compared with the existing λ-domain models and complexity-aware model, the proposed models achieve superior accuracy. In addition, the traditional fixed λ-QP relationship is replaced with a complexity-adaptive λ-QS relationship, which is derived from the proposed models. As such, the encoding QP and λ can be calculated more precisely. Second, we propose an advanced CU-Tree based pre-analysis strategy that is tailored to the screen content characteristics. In addition, the temporal importance of each frame or CTU is computed from the dependency relationship. As such, the spatial complexity and temporal referencing relationship can be captured more effectively for each individual frame or CTU. Third, we propose a rate control scheme at the frame level and CTU level for VVC screen content coding, based upon the rate-distortion modeling and pre-analysis. More specifically, efficient bit allocation is performed based on the pre-analysis outcomes, such that the content complexity and temporal dependency are jointly considered, leading to improved compression performance for screen content video coding under the low-delay configuration.
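As background for the "traditional fixed λ-QP relationship" and the λ-domain models mentioned above, the following is a minimal sketch of the conventional baseline only, not of the thesis's proposed complexity-adaptive models. The function names are hypothetical, and the constants (4.2005, 13.7122, and the initial α = 3.2003, β = -1.367) are the values commonly used in the HEVC/VVC reference software, stated here as assumptions.

```python
import math


def rate_to_lambda(bpp: float, alpha: float = 3.2003, beta: float = -1.367) -> float:
    """Conventional hyperbolic R-lambda model: lambda = alpha * bpp ** beta.

    alpha and beta are assumed initial values; real encoders update them
    as frames are coded.
    """
    if bpp <= 0:
        raise ValueError("bits per pixel must be positive")
    return alpha * bpp ** beta


def lambda_to_qp(lam: float, qp_min: int = 0, qp_max: int = 63) -> int:
    """Fixed lambda-QP mapping: QP = 4.2005 * ln(lambda) + 13.7122, rounded and clipped."""
    qp = round(4.2005 * math.log(lam) + 13.7122)
    return max(qp_min, min(qp_max, qp))


def qp_to_qstep(qp: int) -> float:
    """Quantization step (QS) for a given QP: it doubles every 6 QP units."""
    return 2.0 ** ((qp - 4) / 6.0)


if __name__ == "__main__":
    for bpp in (0.02, 0.1, 0.5):
        lam = rate_to_lambda(bpp)
        qp = lambda_to_qp(lam)
        print(f"bpp={bpp:<5} lambda={lam:9.2f} QP={qp:2d} QS={qp_to_qstep(qp):7.2f}")
```

In this baseline the mapping from λ to QP is the same for every sequence; a content-adaptive scheme such as the one described above would instead let the relationship vary with the measured complexity.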
Overall, this thesis improves video compression capability in three aspects: 1) The bit allocation and QP determination in rate control are improved with content-adaptive machine learning models. 2) The dependency information is thoroughly explored such that the global RDO can be effectively solved. 3) The content characteristics are effectively modeled in the screen content rate control, and optimal bit allocation is achieved for each frame. The effectiveness of the proposed schemes has been verified with extensive experimental results.