Physical Priors Based Representation Learning for Image Enhancement
基於物理先驗的圖像增強表徵學習
Student thesis: Doctoral Thesis
Detail(s)
Award date | 23 Mar 2021
Link(s)
Permanent Link | https://scholars.cityu.edu.hk/en/theses/theses(5b3663e2-ef79-4d67-9b40-2196ec4b35ce).html
Abstract
The human visual system (HVS) adapts readily to scenes with varying properties, perceiving and preserving scene information at high resolution and with an appropriate dynamic range. Similarly, it is essential to equip the camera systems of AI agents with the ability to perceive scenes as humans do. An effective pixel-level visual perception model helps AI agents with scene-level understanding, reasoning, and decision making. In this thesis, we analyze typical degradation factors across the image formation pipeline and investigate several methods that incorporate physical priors (i.e., scene properties and image representations) for image restoration and enhancement, including image super-resolution, low-light restoration, and rain removal.
We first study the super-resolution (SR) problem. Since cameras use discrete pixels to record scene irradiance, distant scene objects suffer from low resolution. Super-resolving low-resolution images is challenging, due both to the diversity of image types, which share few common properties, and to the speed required by online applications, e.g., target identification. We first explore the merits and demerits of recent deep learning based and conventional patch-based SR methods, and show that they can be integrated in a complementary manner that balances reconstruction quality and time cost. Motivated by these initial results, we further propose an integration framework that takes the results of the FSRCNN and A+ methods as inputs and directly learns a pixel-wise mapping between the inputs and the reconstructed result using a Gaussian Conditional Random Field (GCRF). The learned pixel-wise integration mapping is flexible enough to accommodate different upscaling factors. Experimental results show that the proposed framework achieves superior performance compared with the state of the art, while remaining efficient.
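To illustrate the integration idea, the sketch below fuses two candidate SR reconstructions with a learned per-pixel weight map. It is a minimal PyTorch illustration, not the thesis implementation: the GCRF-based pixel-wise mapping is replaced here by a small convolutional weight predictor, and the class and variable names (PixelwiseFusion, sr_cnn, sr_patch) are hypothetical.

```python
# Minimal sketch of pixel-wise integration of two SR results.
# The thesis learns this mapping with a Gaussian CRF; here a small CNN
# predicts per-pixel fusion weights instead (an illustrative simplification).
import torch
import torch.nn as nn


class PixelwiseFusion(nn.Module):
    def __init__(self, channels=3):
        super().__init__()
        # Takes the two candidate reconstructions, outputs a per-pixel weight map.
        self.weight_net = nn.Sequential(
            nn.Conv2d(2 * channels, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),  # per-pixel, per-channel weight in [0, 1]
        )

    def forward(self, sr_cnn, sr_patch):
        # sr_cnn: output of a deep SR model (e.g., FSRCNN);
        # sr_patch: patch-based result (e.g., A+), both at the target resolution.
        w = self.weight_net(torch.cat([sr_cnn, sr_patch], dim=1))
        return w * sr_cnn + (1.0 - w) * sr_patch  # convex combination per pixel


# Usage: fuse two 3-channel reconstructions of the same size.
fusion = PixelwiseFusion()
out = fusion(torch.rand(1, 3, 128, 128), torch.rand(1, 3, 128, 128))
```

Because the fusion operates pixel by pixel on already-upscaled inputs, the same mapping can in principle be applied regardless of the upscaling factor, which is the flexibility the integration framework aims for.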
We then study low-light scenes. Images taken by cameras in low light typically suffer from two problems. First, they have low visibility (i.e., small pixel values). Second, noise becomes significant and disrupts the image content, due to the low signal-to-noise ratio. Most existing low-light image enhancement methods, however, learn from datasets with negligible noise. They rely on users having good photographic skills, so that images are captured with little noise. Unfortunately, this is not the case for the majority of low-light images. On the other hand, attempting to enhance a low-light image while removing its noise is an ill-posed problem. We observe that noise exhibits different levels of contrast in different frequency layers, and that it is much easier to detect noise in the low-frequency layer than in the high-frequency one. Inspired by this observation, we propose a frequency-based decomposition-guided model for low-light image restoration and enhancement. Based on this model, we present a novel network that first learns to recover image objects in the low-frequency layer and then enhances high-frequency details under the guidance of the recovered image objects. In addition, we have prepared a new low-light image dataset with real noise to facilitate learning. Finally, we have conducted extensive experiments to show that the proposed method outperforms state-of-the-art approaches in enhancing practical noisy low-light images.
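A minimal sketch of the two-stage, decomposition-guided idea is given below. It is illustrative only: a Gaussian blur stands in for the frequency decomposition, the two small convolutional stages are placeholders for the recovery and detail-enhancement sub-networks, and the names (gaussian_blur, TwoStageLowLight) are hypothetical rather than from the thesis.

```python
# Minimal sketch of frequency-decomposition-guided low-light restoration.
# Stage 1 recovers content in the low-frequency layer (where noise is easier
# to detect); stage 2 enhances high-frequency details guided by stage 1.
import torch
import torch.nn as nn
import torch.nn.functional as F


def gaussian_blur(x, kernel_size=9, sigma=3.0):
    # Depthwise Gaussian low-pass filter used here as a simple frequency split.
    coords = torch.arange(kernel_size, dtype=torch.float32) - kernel_size // 2
    g = torch.exp(-coords ** 2 / (2 * sigma ** 2))
    g = g / g.sum()
    kernel = torch.outer(g, g).repeat(x.shape[1], 1, 1, 1).to(x.device)
    return F.conv2d(x, kernel, padding=kernel_size // 2, groups=x.shape[1])


class TwoStageLowLight(nn.Module):
    def __init__(self, channels=3):
        super().__init__()
        # Stage 1: recover image content in the low-frequency layer.
        self.low_freq_net = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, channels, 3, padding=1))
        # Stage 2: enhance details, guided by the recovered low-frequency layer.
        self.detail_net = nn.Sequential(
            nn.Conv2d(2 * channels, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, channels, 3, padding=1))

    def forward(self, dark_noisy):
        low = gaussian_blur(dark_noisy)          # low-frequency layer
        high = dark_noisy - low                  # high-frequency residual (details + noise)
        recovered_low = self.low_freq_net(low)   # stage 1
        details = self.detail_net(torch.cat([high, recovered_low], dim=1))  # stage 2
        return recovered_low + details


out = TwoStageLowLight()(torch.rand(1, 3, 128, 128))
```

The design choice mirrors the observation in the abstract: handling the low-frequency layer first lets the network suppress noise where it is most visible, and the recovered content then constrains how the noisier high-frequency details are enhanced.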
Finally, we study outdoor scenes under extreme weather. In particular, we study the rain effect, which degrades image visual quality and disrupts object structures, obscuring their details and erasing their colors. Existing rain removal methods are primarily based on modeling either the visual appearance of rain or its physical characteristics (e.g., rain direction and density), and thus suffer from two common problems. First, due to the stochastic nature of rain, they tend to fail to recognize rain streaks correctly and wrongly remove image structures and details. Second, they fail to recover the image colors erased by heavy rain. We address these two problems with the following three contributions. First, we propose a novel PHP block to aggregate comprehensive spatial and hierarchical information for removing rain streaks of different sizes. Second, we propose a novel network that first removes rain streaks, then recovers object structures and colors, and finally enhances details. Third, to train the network, we prepare a new dataset and propose a novel loss function that introduces semantic and color regularization for rain removal. Extensive experiments demonstrate the superiority of the proposed method over state-of-the-art rain removal methods on both synthesized and real-world data, in terms of visual quality, quantitative accuracy, and running speed.
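As a rough illustration of aggregating spatial context at several scales, the block below pools features at multiple resolutions and fuses them with the original feature map. This is only an interpretation sketch: the actual PHP block in the thesis also aggregates hierarchical (cross-layer) information and differs in design, and the class name MultiScaleAggregation and its parameters are hypothetical.

```python
# Illustrative multi-scale spatial aggregation block, sketching the idea of
# pooling context at several scales so that rain streaks of different sizes
# can be distinguished from image structures. Not the thesis's PHP block.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiScaleAggregation(nn.Module):
    def __init__(self, channels=64, pool_sizes=(1, 2, 4, 8)):
        super().__init__()
        self.pool_sizes = pool_sizes
        # One 1x1 conv per pooled scale to compress its contribution.
        self.reduce = nn.ModuleList([
            nn.Conv2d(channels, channels // len(pool_sizes), kernel_size=1)
            for _ in pool_sizes])
        self.fuse = nn.Conv2d(channels * 2, channels, kernel_size=3, padding=1)

    def forward(self, feat):
        h, w = feat.shape[-2:]
        pooled = []
        for size, conv in zip(self.pool_sizes, self.reduce):
            p = F.adaptive_avg_pool2d(feat, size)  # context at one spatial scale
            p = F.interpolate(conv(p), size=(h, w), mode='bilinear',
                              align_corners=False)
            pooled.append(p)
        # Concatenate original features with all rescaled context maps, then fuse.
        return self.fuse(torch.cat([feat] + pooled, dim=1))


out = MultiScaleAggregation()(torch.rand(1, 64, 64, 64))
```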