Just-Noticeable-Distortion Optimization: From Methodology to Application

Student thesis: Doctoral Thesis

Award date: 23 Nov 2021

Abstract

Investigating the perception mechanisms of the Human Visual System (HVS) and incorporating them into perception-oriented applications are long-standing research topics. As a significant characteristic of the HVS, Just-Noticeable-Distortion (JND) denotes the limit of the HVS's ability to distinguish distortions and has been studied over the past few decades. This thesis aims at an overall optimization of current JND research, covering both the methodology of JND computation and the corresponding applications. It consists of three main parts: 1) a novel patch decomposition-based JND inference model, which provides an interpretable visual information representation and accurate JND prediction; 2) a JND-guided perceptual video coding scheme, in which the JND provides reliable guidance for perceptually lossless compression of visual content; and 3) a JND-based target domain transfer scheme, proposed to improve the perceptual quality of current Convolutional Neural Network (CNN)-based image super-resolution models.

In the first part, we point out the limitations of existing JND models and propose an effective approach to infer the JND profile based on patch-level structural visibility learning. Instead of estimating the JND profile at the pixel level, we take the image patch, which correlates better with human perception, as the basic processing unit and further decompose it into three conceptually independent components for visibility estimation. In particular, to incorporate structural degradation into the patch-level JND model, a deep learning-based structural degradation estimation model is trained to approximate the masking of structural visibility. To facilitate the learning process, a JND dataset is established, comprising 202 pristine images and 7,878 distorted images generated by advanced compression algorithms based on the Versatile Video Coding (VVC) standard. Extensive experimental results show the superiority of the proposed approach over the state of the art.
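
To make the patch decomposition concrete, the following minimal PyTorch sketch splits a luma patch into mean luminance, contrast, and a normalized structural residual (an SSIM-style decomposition), and combines two hand-crafted maskings with a learned structural masking into a single patch-level JND threshold. The network architecture, the decomposition, and all constants here are illustrative assumptions rather than the thesis's actual model.

```python
import torch
import torch.nn as nn

class StructuralVisibilityNet(nn.Module):
    """Hypothetical CNN regressing a structural-masking score per patch;
    the abstract does not specify the real architecture."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 1)

    def forward(self, patch):
        return torch.sigmoid(self.head(self.features(patch).flatten(1)))

def decompose_patch(patch):
    """Toy three-way decomposition of a (B, 1, H, W) luma patch into
    mean luminance, contrast (std), and the normalized structural residual."""
    mu = patch.mean(dim=(-1, -2), keepdim=True)
    sigma = patch.std(dim=(-1, -2), keepdim=True)
    structure = (patch - mu) / (sigma + 1e-6)
    return mu, sigma, structure

def patch_jnd(patch, net, base_threshold=5.0):
    """Combine per-component visibilities into one JND value per patch.
    Assumes 8-bit luma values in [0, 255]; all weights are illustrative."""
    mu, sigma, structure = decompose_patch(patch)
    luminance_masking = 1.0 + 0.5 * mu.flatten(1).mean(1) / 128.0  # brighter -> higher threshold
    contrast_masking = 1.0 + 0.05 * sigma.flatten(1).mean(1)       # busier -> higher threshold
    structural_masking = 1.0 + net(structure).squeeze(1)           # learned structural visibility
    return base_threshold * luminance_masking * contrast_masking * structural_masking
```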

The second part proposes a JND-guided perceptually lossless coding framework for VVC intra coding. Within this framework, a pattern-based pixel-wise JND model guides the distortion distribution, and the most appropriate quantization parameter (QP) is then chosen for each Coding Tree Unit (CTU). A content-adaptive, Laplacian distribution-based distortion-quantization (D-Q) model, combined with a two-pass coding framework, is established to derive the QP that satisfies the perceptually lossless coding criterion. The whole scheme is integrated into the VVC intra coding framework. Experimental results demonstrate that the proposed scheme achieves highly accurate prediction and efficient perceptually lossless intra coding, yielding around 10% bit-rate savings compared with a frame-level QP derivation scheme.
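
To make the D-Q reasoning concrete, the sketch below (a simplification, not the thesis's exact model) numerically evaluates the expected squared error of a round-to-nearest uniform quantizer applied to a zero-mean Laplacian residual, then picks the largest CTU-level QP whose predicted distortion stays within a JND-derived MSE budget. The QP-to-step mapping q_step = 2^((QP - 4) / 6) is the standard HEVC/VVC convention; the deadzone of the real VVC quantizer and the two-pass refinement are omitted here.

```python
import numpy as np

def laplacian_distortion(q_step, lam):
    """Expected MSE of a round-to-nearest uniform quantizer with step q_step
    on a zero-mean Laplacian source f(x) = 0.5 * lam * exp(-lam * |x|),
    evaluated by numerical integration (no deadzone: an approximation)."""
    xs = np.linspace(-20.0 / lam, 20.0 / lam, 200001)
    dx = xs[1] - xs[0]
    pdf = 0.5 * lam * np.exp(-lam * np.abs(xs))
    recon = np.round(xs / q_step) * q_step
    return float(np.sum((xs - recon) ** 2 * pdf) * dx)

def select_ctu_qp(lam, jnd_mse, qp_candidates=range(52)):
    """Largest QP whose predicted distortion stays within the CTU's JND
    budget, i.e. the perceptually lossless criterion."""
    best_qp = 0
    for qp in qp_candidates:
        q_step = 2.0 ** ((qp - 4) / 6.0)  # standard HEVC/VVC QP-to-step map
        if laplacian_distortion(q_step, lam) <= jnd_mse:
            best_qp = qp  # distortion grows with QP, so keep searching upward
        else:
            break
    return best_qp

# Example: a CTU whose residuals fit lam = 0.2 and whose JND budget is MSE 30.
print(select_ctu_qp(lam=0.2, jnd_mse=30.0))
```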

In the third part, we present a preliminary exploration of applying JND to CNN-based single image super-resolution (SISR) models. Conventional CNN-based SISR models attempt to lift degraded images back to the real-world image domain with deep learning, ignoring the intrinsic divergence between images acquired from the real world and images generated by neural networks. In view of this gap, we propose a learning-based JND model that relies on inferring visibility maskings for domain transfer from the real-world acquisition domain to the deep neural network generation domain. To obtain the JND model, a dataset is established, including 500 real-world images and their 4,500 degraded versions produced by 9 CNN-based auto-encoders. Combined with subjective testing, the domain transfer is achieved by inferring the JND profile in an end-to-end manner, yielding a perceptually equivalent, CNN-generated space for SISR model training. The proposed scheme provides reliable guidance for SISR models, leading to better perceptual quality by compensating for the inconsistency between the distributions of model-generated images and real-world acquired images. Extensive experimental results demonstrate that the proposed scheme further improves the performance of state-of-the-art SISR methods.
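
The training-side idea can be sketched in PyTorch as follows. Here JNDTransferNet is a hypothetical network (assumed name and architecture, not specified by the thesis) that maps each real-world high-resolution image into the perceptually equivalent CNN-generation domain by adding a bounded JND residual; the SISR model is then trained against these transferred targets instead of the raw ground truth.

```python
import torch
import torch.nn as nn

class JNDTransferNet(nn.Module):
    """Hypothetical domain-transfer network; the layers and the residual
    bound below are illustrative assumptions."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1),
        )

    def forward(self, hr):
        # Predict a bounded residual so the transferred target stays within
        # a just-noticeable envelope around the input (images in [0, 1]).
        return hr + torch.tanh(self.body(hr)) * (5.0 / 255.0)

def train_step(sisr_model, transfer_net, lr_batch, hr_batch, optimizer):
    """One SISR training step against JND-transferred targets rather than
    the raw real-world ground truth."""
    with torch.no_grad():
        target = transfer_net(hr_batch)  # perceptually equivalent target
    sr = sisr_model(lr_batch)
    loss = nn.functional.l1_loss(sr, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```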