Visual Quality Assessment and Optimization: Exploring Generalization Capability
基於模型泛化的視覺質量評價與優化研究 (Research on Visual Quality Assessment and Optimization Based on Model Generalization)
Student thesis: Doctoral Thesis
Author(s)
Related Research Unit(s)
Detail(s)
Awarding Institution | |
---|---|
Supervisors/Advisors | |
Award date | 3 May 2022 |
Link(s)
Permanent Link | https://scholars.cityu.edu.hk/en/theses/theses(d418aa46-d67e-41af-8382-7f2e54f03df6).html |
---|---|
Abstract
As a key component of many computer vision applications, objective image/video quality assessment (I/VQA) has become an essential yet challenging task. Deep learning has achieved great success in various computer vision tasks and is now widely employed for quality assessment. However, deep learning-based methods rest on the strong assumption that the training and testing data are drawn from the same distribution, which brings a high risk of poor generalization to unseen testing data. As a consequence, the learned I/VQA models may not be reliable enough for quality optimization, especially when the data have dramatically different statistics from those in the training set. In view of the increasing demand for better generalization in quality assessment and optimization, this thesis explores generalization capability in I/VQA. It consists of four parts: 1) A pseudo-reference based no-reference (NR) IQA model is designed, aiming for generalized quality assessment by improving the model's discrimination capability. 2) The transfer of quality assessment from natural scene images to screen content images (SCIs) is explored with domain adaptation. 3) An NR-VQA model with high generalization capability in cross-content, cross-resolution and cross-frame-rate quality prediction is proposed. 4) A loop framework of quality assessment and quality optimization is constructed for low-light image enhancement.
In the first part, we propose a novel NR-IQA method via pseudo-reference (PR) estimation. Instead of predicting the reference at the image level, we learn the reference information at the feature level, which removes the need to design a dedicated network for PR image generation. In particular, we first construct the PR feature from the distorted image with a mutual learning strategy, so that the PR feature can be learned from the feature of the pristine reference. To keep the PR feature and the distortion feature discriminable, a triplet constraint is further adopted. We then fuse the PR feature and the corresponding distortion feature with an invertible neural layer for final quality prediction. Because the quality scores estimated by our model are patch-wise, a gated recurrent unit (GRU) based quality aggregation module is proposed to aggregate the predicted scores of different patches adaptively. Experimental results demonstrate the effectiveness and superiority of the proposed method.
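The triplet constraint mentioned above can be illustrated with the standard margin-based triplet loss. The following is only a minimal sketch: the abstract does not state which feature plays which role, so treating the PR feature as the anchor, the pristine reference feature as the positive, and the distortion feature as the negative is an assumption, as are the function names, the Euclidean distance, and the margin value.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the anchor is closer to the positive than to
    the negative by at least `margin`.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Hypothetical 3-dimensional features, for illustration only.
pr_feature = [0.9, 0.1, 0.4]          # assumed anchor: pseudo-reference feature
reference_feature = [1.0, 0.0, 0.5]   # assumed positive: pristine reference feature
distortion_feature = [0.2, 0.8, 0.1]  # assumed negative: distortion feature

satisfied = triplet_loss(pr_feature, reference_feature, distortion_feature, margin=0.5)
violated = triplet_loss(pr_feature, distortion_feature, reference_feature, margin=0.5)
```

In this toy case `satisfied` is zero because the PR feature already sits much closer to the reference feature than to the distortion feature, while swapping the roles gives a positive penalty.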
In the second part, rooted in the widely accepted view that the human visual system has adapted and evolved through the perception of the natural environment, we develop an unsupervised domain adaptation based NR quality assessment method for SCIs that leverages the rich subjective ratings of natural images (NIs). Because NIs and SCIs have dramatically different statistical characteristics, the proposed quality measure is designed to improve transferability and discriminability in a pair-wise manner. To enhance feature discriminability, we propose a center-based loss that rectifies the classifier and improves its prediction capability not only for the source domain (NI) but also for the target domain (SCI). To minimize feature discrepancy, the maximum mean discrepancy (MMD) is imposed on the extracted ranking features of NIs and SCIs. Furthermore, to enhance feature diversity, we introduce a correlation penalization between different feature dimensions, leading to features with lower rank and higher diversity. The proposed method also sheds light on learning quality assessment measures for unseen application-specific content without cumbersome and costly subjective evaluations.
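For readers unfamiliar with the MMD term used above, the sketch below computes the common biased estimate of squared MMD between two small sets of feature vectors. The Gaussian kernel, the bandwidth `sigma`, and the toy feature values are assumptions made for illustration; the abstract does not say which kernel or estimator the thesis uses.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def mean_kernel(xs, ys, sigma=1.0):
    """Average kernel value over all pairs drawn from xs and ys."""
    total = sum(gaussian_kernel(x, y, sigma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(source, target, sigma=1.0):
    """Biased estimate of squared MMD between two feature sets."""
    return (
        mean_kernel(source, source, sigma)
        + mean_kernel(target, target, sigma)
        - 2.0 * mean_kernel(source, target, sigma)
    )


# Hypothetical 2-dimensional features standing in for NI and SCI features.
ni_features = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0]]
sci_features = [[2.0, 2.1], [1.8, 2.2], [2.1, 1.9]]

same_domain = mmd_squared(ni_features, ni_features)
cross_domain = mmd_squared(ni_features, sci_features)
```

Minimizing a term of this kind during training pulls the two feature distributions together: `same_domain` is zero, while `cross_domain` is clearly positive for the shifted toy features.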
In the third part, we concentrate on a generalized NR-VQA model, aiming to improve quality prediction across content, resolution and frame rate. In the spatial domain, to handle resolution and content variations, we impose Gaussian distribution constraints on the quality features. The unified distribution can significantly reduce the domain gap between different video samples, resulting in a more generalized quality feature representation. Along the temporal dimension, inspired by the mechanism of visual perception, we propose a pyramid temporal aggregation module that involves short-term and long-term memory to aggregate the frame-level quality. Experiments show that our method outperforms state-of-the-art methods in cross-dataset settings and achieves comparable performance in intra-dataset configurations, demonstrating the high generalization capability of the proposed method.
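One way to picture a Gaussian distribution constraint on features is as a penalty that grows when the feature statistics drift away from a fixed Gaussian. The sketch below uses the closed-form KL divergence between a univariate Gaussian fitted to the feature values and the standard normal. This is an assumed formulation chosen for illustration; the abstract does not specify how the thesis imposes the constraint, and the function name and `eps` floor are hypothetical.

```python
import math


def gaussian_kl_penalty(values, eps=1e-8):
    """KL( N(mean, var) || N(0, 1) ) for the empirical mean and variance.

    Closed form: 0.5 * (var + mean^2 - 1 - ln(var)).
    `eps` floors the variance so the logarithm stays defined.
    """
    n = len(values)
    mean = sum(values) / n
    var = max(sum((v - mean) ** 2 for v in values) / n, eps)
    return 0.5 * (var + mean ** 2 - 1.0 - math.log(var))


# Feature values with zero mean and unit variance incur no penalty.
matched = gaussian_kl_penalty([-1.0, 1.0])

# The same spread shifted to mean 5 is penalized by 0.5 * 5^2 = 12.5.
shifted = gaussian_kl_penalty([4.0, 6.0])
```

Applied to features from videos of different content or resolution, a penalty like this would push all of them toward one shared distribution, which is the intuition behind the reduced domain gap described above.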
In the fourth part, to close the gap between enhancement and assessment, we propose a loop enhancement framework that produces a clear picture of how the enhancement of low-light images can be optimized towards better visual quality. In particular, we create a large-scale database for QUality assessment Of The Enhanced LOw-Light image (QUOTE-LOL), which serves as the foundation for studying and developing objective quality assessment measures. The objective quality assessment measure plays a critical bridging role between visual quality and enhancement, and is further incorporated into the optimization when learning the enhancement model, steering it towards perceptually pleasing results. Finally, we iteratively perform the enhancement and optimization tasks, enhancing the low-light images continuously. The superiority of the proposed scheme is validated on various low-light scenes.
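The alternating structure of such a loop can be shown with a deliberately tiny toy. Everything here is a stand-in: `toy_quality` is a hypothetical score that simply prefers a mean brightness near 0.5, `enhance` is a one-parameter brightness gain, and the greedy step rule replaces the learned quality measure and enhancement model described above. It is not the thesis's method, only an illustration of letting a quality score steer repeated enhancement.

```python
def toy_quality(pixels):
    """Hypothetical quality score in [0, 1]; highest when mean brightness is 0.5."""
    mean = sum(pixels) / len(pixels)
    return 1.0 - 2.0 * abs(mean - 0.5)


def enhance(pixels, gain):
    """Toy enhancement: multiply brightness by `gain` and clip to [0, 1]."""
    return [min(1.0, p * gain) for p in pixels]


def loop_enhance(pixels, step_size=0.25, max_steps=50):
    """Raise the gain step by step while the quality score keeps improving."""
    gain = 1.0
    for _ in range(max_steps):
        current = toy_quality(enhance(pixels, gain))
        candidate = toy_quality(enhance(pixels, gain + step_size))
        if candidate <= current:
            break  # the quality score no longer improves, so stop
        gain += step_size
    return gain, enhance(pixels, gain)


# Hypothetical low-light "image" as a flat list of intensities in [0, 1].
low_light = [0.05, 0.10, 0.15]
final_gain, enhanced = loop_enhance(low_light)
```

The loop stops as soon as a further enhancement step would lower the score, so the quality measure acts as the stopping and steering signal for the enhancement.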
Overall, this thesis improves the performance of I/VQA models in four respects. 1) The generalization capability of NR-IQA models is improved by exploring more discriminative quality features based on the construction of PR features. 2) The transferability of the IQA model from NIs to SCIs is improved by a domain adaptation framework. 3) The generalization capability of the NR-VQA model is enhanced with a unified distribution regularization. 4) The optimization of perceptual enhancement for low-light images is explored in a loop manner. Extensive experimental results verify the effectiveness of the proposed schemes.
- Image quality assessment, video quality assessment, low-light enhancement, generalization capability, domain adaptation