Visual Quality Assessment for Human and Machine Vision Systems

Student thesis: Doctoral Thesis

Abstract

Visual quality assessment aims to quantify the extent of the distortions that corrupt visual data, providing quality monitoring criteria or optimization goals for numerous vision-centric systems. Over the past decade, most quality assessment work has focused on assessing the quality of natural images for the human visual system (HVS). In recent years, artificial intelligence (AI) techniques have enabled notable advancements in image generation. AI-generated images have found applications in wide-ranging domains, such as advertising, entertainment, and scientific research, leading to a high demand for assessing their quality. In addition, advancements in AI techniques have significantly enhanced the capability and efficiency of intelligent machines in various visual analysis tasks, e.g., image detection and recognition. For specific machine vision tasks, high-quality images are required to guarantee the stability and reliability of machines. Therefore, it is highly desirable to assess image quality from the perspective of machine vision. This thesis comprises three parts that study visual quality assessment for human vision and machine vision.

The first part explores the drawbacks of traditional full-reference image quality assessment (FR-IQA) and introduces a new FR-IQA paradigm. More specifically, traditional FR-IQA methods predict the perceptual quality of a distorted image with a given pristine-quality image as the reference. However, the near-threshold visual perception of the HVS suggests that there could be numerous pristine-quality representations of a scene that are mutually indistinguishable, and the so-called pristine image used as the reference in FR-IQA is just one of them. While numerous FR-IQA approaches have been proposed to evaluate perceptual similarity, much less work has been dedicated to locating the best reference for the deterministic perceptual similarity measure. Therefore, we aim to answer the question of whether allowing freedom in reference image selection can lead to better performance, by designing a new FR-IQA paradigm, FLexible REference (FLRE). The FLRE paradigm operates in the feature space, obtaining the feature-level reference of the distorted image by selecting its best explanation within an equal-quality space. We implement FLRE as a plug-in module before the deterministic FR-IQA process, and experimental results demonstrate that combining FLRE with existing deep feature-based FR-IQA models significantly improves quality prediction performance, largely surpassing state-of-the-art methods.

The second part systematically studies quality assessment of face images generated by generative adversarial networks (GANs), i.e., GAN-generated face image quality assessment (GFIQA), for the HVS. The distortions of GAN-generated face images (GFIs) usually vary across different GAN models, so a GFIQA model must possess high generalization capability. Therefore, we first establish a large GFIQA database by collecting various GFIs from existing popular GAN models. Subsequently, we propose a Causal Representation Learning scheme for a generalized GFIQA model (CRL-GFIQA), under the assumption that the causal knowledge underlying human quality assessment is shareable across different scenarios. In particular, we disentangle the learned features into causal and non-causal components via an invertible neural network, endowing the proposed CRL-GFIQA model with high generalization on unseen domains. Extensive experimental results demonstrate the effectiveness of our CRL-GFIQA model.

The third part investigates the accuracy-rate equilibrium (ARE) for face recognition systems. Facial data is often compressed to accommodate transmission or storage limitations, which can lead to the loss of critical identity details and diminish the effectiveness of face recognition systems. Therefore, we develop an ARE prediction method for the Face Recognition system (ARE-FR), which automatically infers the ARE of face images. The goal of the proposed ARE-FR is to maximize redundancy removal without impairing robust identity information. Considering that high-level semantic features effectively capture crucial identity information, we constrain the proposed ARE-FR to focus only on features in identity-related regions when predicting the ARE images; these features are derived from the interactive relationships between deep and shallow features. Furthermore, to facilitate the learning process, we develop a method to annotate training samples according to the definition of ARE in face recognition. Experimental results demonstrate that combining our proposed ARE-FR with an image coding algorithm saves more bits while maintaining the performance of the face recognition system.
Date of Award: 22 Nov 2024
Original language: English
Awarding Institution:
  • City University of Hong Kong
Supervisors: Shiqi WANG (Supervisor) & Tak Wu Sam KWONG (External Co-Supervisor)