No-reference Image Quality Assessment via Non-local Modeling
Research on No-reference Image Quality Assessment Based on Non-local Modeling
Student thesis: Master's Thesis
Author(s)
Related Research Unit(s)
Detail(s)
Awarding Institution | |
---|---|
Supervisors/Advisors | |
Award date | 4 May 2023 |
Link(s)
Permanent Link | https://scholars.cityu.edu.hk/en/theses/theses(2d1e72fb-2405-43df-aac9-4838b6da1875).html |
---|---|
Other link(s) |
Abstract
In recent years, delivering and generating visually appealing content has become a key area of competition among streaming-media technology giants such as YouTube, Meta, ByteDance (TikTok), and Netflix. Distributing pleasing, human-centric content demands visual quality assessment that is general, accurate, and robust. Subjective quality assessment is the most accurate, reliable, and trustworthy option, but collecting subjective opinions is cumbersome, time-consuming, and expensive, which makes subjective studies impractical for data-intensive real-world applications. Objective Image Quality Assessment (IQA), by contrast, measures the quality of an input image automatically and plays a fundamental role in computer vision tasks such as image enhancement, compression, editing, synthesis, and generation. It improves customers’ visual experience and optimizes the production line. Objective IQA models fall into three categories according to the availability of pristine reference images. Full-reference (FR) IQA algorithms have access to the pristine image and optimize and evaluate image processing systems by designing an image fidelity metric. Reduced-reference (RR) IQA algorithms rely on partial information from the reference image, for example a subset of features; they are mainly employed in live television streaming services, where they control transmission quality on the fly. No-reference (NR), or blind, IQA is highly desirable and practical, yet challenging, in most real-life scenarios, where pristine images are typically unavailable. This thesis therefore focuses on objective quality evaluation, specifically NR-IQA, and contains two main parts:
1) Local and Non-local Analyses of Natural Images: We analyze and discuss the significance of local and non-local modeling on natural images.
2) A Proposed Non-local Modeling Method: A novel NR-IQA method based on non-local features learned by a Graph Neural Network (GNN) is proposed to explore non-local interactions in quality prediction.
Local modeling encodes spatially proximate neighborhoods. State-of-the-art quality assessment and perceptual optimization methods are primarily built upon conventional CNNs, in which strong inductive biases such as locality are inherent. Local processing brings many benefits, e.g., translation equivariance, translation invariance, and weight sharing. However, because of the local priors of CNNs, non-local information is usually absent, and the performance of local modeling methods may be improved by considering non-local features and long-range dependencies. We therefore analyze non-local information and long-range dependencies in natural images. Non-local modeling integrates information across space through long- and short-range communication with different spatial weighting functions. Non-local features are internal statistics of natural images, a kind of Natural Scene Statistics (NSS), and they hold strong predictive power and expressiveness. From the perspective of dependency and relational modeling, non-local methods have shown a powerful capability for semantic and content understanding, capturing context information from images. Non-local feature extraction is also content-adaptive and flexible, and it explores geometric structures and relations over the whole image. Furthermore, non-local features are broadly used to regularize the solution space of ill-posed and under-constrained vision tasks, such as image restoration and generation. Such a prior carries common-sense knowledge about the environment, reflecting the statistical properties of the world. Thus, non-local features are perceptually relevant to the Human Visual System (HVS) and directly affect visual quality evaluation and perceptual optimization.
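The idea of integrating information across the whole image with a weighting function can be illustrated with a minimal sketch. It assumes a softmax over dot-product similarities as the weighting function, which is one common choice in the non-local literature and not necessarily the one analyzed in the thesis. The function name `non_local_aggregate`, the `temperature` parameter, and the random patch features are hypothetical.

```python
import numpy as np

def non_local_aggregate(features, temperature=1.0):
    """Aggregate every position from all positions, near or far.

    features: (N, D) array, one row per image position or patch.
    Returns the aggregated features (N, D) and the weights (N, N).
    Assumption: softmax of dot-product similarity as the weighting function.
    """
    sim = features @ features.T / temperature      # pairwise similarity, (N, N)
    sim = sim - sim.max(axis=1, keepdims=True)     # stabilize the exponentials
    weights = np.exp(sim)
    weights = weights / weights.sum(axis=1, keepdims=True)  # each row sums to 1
    return weights @ features, weights

# Hypothetical input: 6 patches with 4-dimensional features.
rng = np.random.default_rng(0)
patches = rng.normal(size=(6, 4))
out, w = non_local_aggregate(patches)
```

Unlike a convolution, whose weights depend only on relative position within a small window, the weights here depend on content similarity, so two distant but similar patches can influence each other strongly.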
In the second part, we introduce the non-local behavior and propose a GNN-based non-local modeling method for NR-IQA. The method is rooted in the view that image quality is perceived through non-local and long-range dependencies among different regions, which motivates us to explore non-local and long-range interactions in visual quality assessment and perceptual optimization. Specifically, a two-stage superpixel-based GNN approach is presented to derive non-local features and model long-range dependencies. In the first stage, a one-layer GNN aggregates features within visually meaningful superpixels. The subsequent GNN layers then integrate features via a spatial attention module and link the distributed, widely separated local regions, i.e., superpixels, into a comprehensive and robust non-local feature representation of the overall image quality. In addition to the non-local features, local features are extracted from spatially proximate neighborhoods by a pre-trained VGGNet-16. Finally, the learned non-local features are combined with the local features, and the combination outperforms either feature type used on its own. Extensive intra-dataset experiments on the LIVE, CSIQ, TID2013, and KADID-10k databases demonstrate the superiority of integrating local and non-local features for quality prediction and perceptual optimization. The results for individual distortion types further reveal that non-local modeling handles a wide variety of global distortions, i.e., globally and uniformly distributed distortions with non-local recurrences over the image. Meanwhile, it remains sensitive to local distortions, i.e., non-uniformly distributed distortions confined to a local region. It shows particularly strong prediction capability on noisy and compressed images. Lastly, the cross-dataset results verify the strong generalization capability of the proposed method.
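The two-stage pipeline described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the thesis implementation: mean pooling stands in for the first-stage one-layer GNN, a single dense softmax attention step stands in for the later GNN layers with spatial attention, a regular 2x2 grid stands in for real superpixels, and a random vector stands in for VGGNet-16 local features. All function and variable names are hypothetical.

```python
import numpy as np

def pool_superpixels(pixel_feats, labels):
    """Stage 1 stand-in: average pixel features inside each superpixel.

    pixel_feats: (H, W, D) array; labels: (H, W) integer ids in 0..K-1.
    Returns a (K, D) array with one node feature per superpixel.
    """
    k = int(labels.max()) + 1
    flat_feats = pixel_feats.reshape(-1, pixel_feats.shape[-1])
    flat_labels = labels.reshape(-1)
    sums = np.zeros((k, flat_feats.shape[1]))
    np.add.at(sums, flat_labels, flat_feats)        # unbuffered scatter-add per superpixel
    counts = np.bincount(flat_labels, minlength=k)[:, None]
    return sums / np.maximum(counts, 1)

def attend_superpixels(nodes):
    """Stage 2 stand-in: every superpixel attends to every other one."""
    scores = nodes @ nodes.T / np.sqrt(nodes.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=1, keepdims=True)
    return attn @ nodes

def fuse(non_local_nodes, local_feat):
    """Concatenate an image-level non-local descriptor with local features."""
    return np.concatenate([non_local_nodes.mean(axis=0), local_feat])

# Hypothetical 8x8 image with 3 feature channels and four square regions.
rng = np.random.default_rng(0)
pixel_feats = rng.normal(size=(8, 8, 3))
labels = (np.arange(8)[:, None] // 4) * 2 + (np.arange(8)[None, :] // 4)
local_feat = rng.normal(size=5)                     # placeholder for CNN features

nodes = pool_superpixels(pixel_feats, labels)
fused = fuse(attend_superpixels(nodes), local_feat)
```

In a full model, the fused vector would feed a regression head that predicts the quality score; that head and any training procedure are omitted here.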
We find that non-local modeling is complementary to traditional local methods, and that both local and non-local features contribute to image quality assessment. The local features modeled by CNNs are effective and robust, while non-local modeling is a supporting element that reinforces prediction power and boosts representation ability. In practice, neither local nor non-local features alone are adequate to evaluate perceptual quality; the combination of the two is superior for visual quality assessment. This thesis thus opens the door to considering the non-local concept in the field, together with the local features from local modeling methods.
- Image Quality Assessment, No-reference, Human Visual System, Non-local Modeling, Superpixel, Graph Neural Network