Exploiting Deep Prior and Bitstream Prior for Visual Data Enhancement

利用深度先驗和比特流先驗進行視覺數據增強

Student thesis: Doctoral Thesis

Award date: 4 May 2023

Abstract

Visual data plays a dominant role in diverse application scenarios, such as medical imaging, robot vision, and augmented reality. However, practical constraints, including the limitations of capture devices, lossy compression, and unstable transmission bandwidth, inevitably introduce degradations that reduce the perceptual quality of visual data for human viewing and its utility for downstream tasks. This thesis therefore focuses on visual data enhancement, combining the deep priors learned by neural networks with the off-the-shelf priors available in the bitstream. It consists of three parts: 1) compressed domain deep video super-resolution; 2) conversion of standard dynamic range television (SDRTV) video to high dynamic range (HDR); 3) occupancy map guided attribute deblocking for video-based point cloud compression (V-PCC). The first part pioneers compressed video enhancement by introducing super-resolution on the decoder side. The second enhances existing SDR videos for vivid playback on upcoming HDR devices. The third improves the quality of compressed point clouds, which is promising for next-generation immersive and realistic communication.

The first part investigates a novel approach to compressed domain deep video super-resolution (SR) that jointly leverages coding priors and deep priors. By directly exploiting the diverse, ready-made spatial and temporal coding priors (e.g., partition maps and motion vectors) that can be extracted from the video bitstream at little cost, video SR in the compressed domain can reconstruct the high-resolution video accurately, with high flexibility and substantially reduced computational complexity. Specifically, a Guided Spatial Feature Transform (GSFT) layer is proposed to modulate prior features in a fine-grained and content-adaptive manner. To incorporate the temporal coding prior, a guided soft alignment scheme is designed to generate local attention offsets for decoded motion vector compensation. To promote research on compressed domain video SR, a novel Compressed Video with Coding Prior (CVCP) dataset is built, comprising decoded low-resolution (LR) videos of diverse content and various coding priors extracted from the bitstream. Extensive experimental results demonstrate the effectiveness of coding priors in compressed domain video SR.
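As a rough illustration of what motion vector compensation with decoded vectors involves, the following minimal sketch warps a reference frame using one integer motion vector per block. It is not the thesis's guided soft alignment scheme, which additionally learns local attention offsets; the function name `motion_compensate`, the `(dy, dx)` sign convention, the fixed square block size, and the edge clamping are all assumptions made for this example.

```python
def motion_compensate(ref, mvs, block):
    """Warp a 2D frame `ref` using one integer motion vector per block.

    ref   : list of rows (a single-channel frame)
    mvs   : mvs[by][bx] = (dy, dx), the displacement into `ref` (assumed convention)
    block : side length of the square blocks (hypothetical fixed partition)
    """
    h, w = len(ref), len(ref[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = mvs[y // block][x // block]
            # Clamp source coordinates to the frame border (edge padding).
            sy = min(max(y + dy, 0), h - 1)
            sx = min(max(x + dx, 0), w - 1)
            out[y][x] = ref[sy][sx]
    return out


# Usage: a 4x4 frame split into four 2x2 blocks, each with its own vector.
ref = [[y * 4 + x for x in range(4)] for y in range(4)]
mvs = [[(0, 0), (0, 1)],
       [(1, 0), (-1, -1)]]
warped = motion_compensate(ref, mvs, block=2)
```

Because the vectors are read from the bitstream rather than estimated, a step of this kind needs no optical-flow computation, which is one source of the complexity savings described above.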

The second part proposes a two-stage learning paradigm for faithfully converting existing standard dynamic range television (SDRTV) video content to its HDR television (HDRTV) counterpart, adopting hybrid attention mechanisms to fully exploit spatial, channel-wise, and regional correlations. In the first stage, domain mapping, Depth-wise Self-Attention (DSA) and a Global Calibration Layer (GCL) are proposed; these adaptively leverage intra-feature relationships to construct a better scene representation. In the second stage, highlight generation, a Regional Self-Attention (RSA) module is proposed to restore missing highlights efficiently, since over-exposed regions lose detail and therefore pose a major challenge for the conversion. Extensive experimental results on public databases show that the proposed method outperforms state-of-the-art approaches across different quality evaluation measures.
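The abstract does not say how over-exposed regions are located. One common heuristic, shown here purely as an assumption and not as part of the proposed RSA module, is a soft mask that is zero below a luminance threshold and rises linearly to one at full scale. The function name `highlight_mask` and the default threshold are hypothetical.

```python
def highlight_mask(luma, threshold=0.9):
    """Soft over-exposure mask for a luminance map normalized to [0, 1].

    Returns 0.0 below `threshold` and ramps linearly to 1.0 at full scale.
    The threshold value is an illustrative assumption, not a thesis setting.
    """
    scale = 1.0 - threshold
    return [
        [min(1.0, max(0.0, (v - threshold) / scale)) for v in row]
        for row in luma
    ]


# Usage: only the two brightest pixels receive a non-zero weight.
mask = highlight_mask([[0.25, 0.5], [0.75, 1.0]], threshold=0.5)
```

A mask of this kind marks where SDR detail has been clipped, which is where the second stage has to generate content instead of merely remapping it.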

The third part presents a deep-learning-based attribute map enhancement method for compressed point cloud data, which leverages the bitstream guidance available in the occupancy map to perform adaptive local feature modification and selective non-local attention. Because the occupancy map identifies the padding pixels in the attribute map, it gives the model useful clues for improving point cloud quality. The proposed approach operates on the decoder side and can be readily incorporated into the V-PCC codec. Extensive evaluations show that the proposed method is effective at artifact removal, achieving the equivalent of 5.0% BD-rate savings.
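To make the role of the occupancy map concrete, the sketch below smooths an attribute map while ignoring padding pixels, so that padding values never bleed into real samples. It uses a plain box filter as a stand-in for the learned network, which the abstract does not specify in detail; the function name `masked_box_filter`, the single-channel input, and the `radius` parameter are assumptions.

```python
def masked_box_filter(attr, occ, radius=1):
    """Average each occupied pixel over its occupied neighbours only.

    attr : 2D list, one attribute channel (e.g. luma) of the decoded map
    occ  : 2D list of 0/1 flags, 1 where the pixel maps to a real point
    Padding pixels (occ == 0) are skipped as sources and left unchanged.
    """
    h, w = len(attr), len(attr[0])
    out = [row[:] for row in attr]
    for y in range(h):
        for x in range(w):
            if not occ[y][x]:
                continue
            total, count = 0.0, 0
            for ny in range(max(0, y - radius), min(h, y + radius + 1)):
                for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                    if occ[ny][nx]:
                        total += attr[ny][nx]
                        count += 1
            # count >= 1 because the centre pixel itself is occupied.
            out[y][x] = total / count
    return out


# Usage: the right-hand column is padding (value 255) and is excluded.
attr = [[10, 20, 255],
        [30, 40, 255]]
occ = [[1, 1, 0],
       [1, 1, 0]]
smoothed = masked_box_filter(attr, occ)
```

An unmasked filter would pull the occupied pixels toward the padding value of 255. Excluding padding samples is the simplest form of the guidance that the occupancy map provides.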

In summary, this thesis studies the enhancement of visual data in practical scenarios, supporting more immersive and realistic interactions across applications. The benefits of deep priors and bitstream priors for visual enhancement are studied systematically. Comprehensive evaluations validate the proposed approaches, which stand to benefit fields that rely on visual data as a fundamental component, such as industry, entertainment, and healthcare.