Image Deblurring and Matting via Deep Learning
基於深度學習的圖像去模糊和摳圖算法 (Chinese title; English: Image Deblurring and Matting Algorithms Based on Deep Learning)
Student thesis: Doctoral Thesis
Author(s)
Related Research Unit(s)
Detail(s)
Awarding Institution:
Supervisors/Advisors:
Award date: 8 Mar 2018
Link(s)
Permanent Link: https://scholars.cityu.edu.hk/en/theses/theses(46466122-c601-42f4-b475-b9a006a89eac).html
Abstract
Low-level vision is an important area of computer vision, ranging from image generation to image pixel manipulation. Unlike mid- and high-level vision tasks, which mainly focus on image understanding, low-level vision aims at synthesizing various kinds of images. Its applications include, but are not limited to, image denoising, image deblurring, image super-resolution, and image matting. With the prevalence of deep learning frameworks and parallel computing platforms, many low-level vision tasks can now be solved effectively and efficiently via deep learning.
In this thesis, we apply deep learning to solve two important low-level vision tasks: image deblurring and image matting. We propose to utilize fully convolutional neural networks (FCNNs) to learn a good image prior for non-blind image deblurring, and we apply spatially variant recurrent neural networks (RNNs) to directly solve blind image deblurring for dynamic scenes. We also develop a deep learning-based image matting method, in which the proposed network refines the alpha matte under the guidance of learned feature maps via skip links.
Image deblurring can be divided into two steps: blind deblurring and non-blind deblurring. Blind deblurring estimates the blur kernels for an input blurry image, while non-blind deblurring reconstructs a clean output image given the blurry image and the estimated blur kernels. To solve the non-blind deblurring problem, we propose an FCNN for iterative non-blind deconvolution. We decompose the non-blind deconvolution problem into image denoising and image deconvolution: we train an FCNN to remove noise in the gradient domain and use the learned gradients to guide the image deconvolution step. In contrast to existing deep neural network based methods, we iteratively deconvolve the input blurred image in a multi-stage framework. The proposed method is able to learn an adaptive image prior that preserves both local (detail) and global (structural) information.
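The alternation described above can be sketched with classical stand-ins for the learned components: a soft-shrinkage operator plays the role of the FCNN gradient denoiser, and the deconvolution step is the standard closed-form FFT solution of a gradient-regularized least-squares problem. All function names and parameter values here are hypothetical; this is a minimal illustration of the iterative denoise-then-deconvolve scheme under circular boundary conditions, not the thesis implementation.

```python
import numpy as np

def grad_filters(shape):
    """Circular forward-difference filters, returned in the frequency domain."""
    H, W = shape
    dx = np.zeros(shape); dx[0, 0] = -1.0; dx[0, -1] = 1.0
    dy = np.zeros(shape); dy[0, 0] = -1.0; dy[-1, 0] = 1.0
    return np.fft.fft2(dx), np.fft.fft2(dy)

def shrink(g, strength=0.1):
    """Toy stand-in for the learned FCNN denoiser: soft-shrinkage
    that suppresses small (noise-dominated) gradients."""
    return np.sign(g) * np.maximum(np.abs(g) - strength, 0.0)

def iterative_nonblind_deblur(blurred, kernel, n_iters=3, reg=0.05):
    """Alternate gradient-domain denoising and FFT-domain deconvolution."""
    H, W = blurred.shape
    K = np.fft.fft2(kernel, s=(H, W))
    B = np.fft.fft2(blurred)
    Dx, Dy = grad_filters((H, W))
    x = blurred.copy()
    for _ in range(n_iters):
        X = np.fft.fft2(x)
        # 1) denoising step in the gradient domain (FCNN stand-in)
        gx = shrink(np.real(np.fft.ifft2(Dx * X)))
        gy = shrink(np.real(np.fft.ifft2(Dy * X)))
        # 2) deconvolution step guided by the denoised gradients:
        #    closed-form minimizer of |k*x - b|^2 + reg(|dx*x - gx|^2 + |dy*x - gy|^2)
        num = (np.conj(K) * B
               + reg * (np.conj(Dx) * np.fft.fft2(gx)
                        + np.conj(Dy) * np.fft.fft2(gy)))
        den = np.abs(K) ** 2 + reg * (np.abs(Dx) ** 2 + np.abs(Dy) ** 2)
        x = np.real(np.fft.ifft2(num / den))
    return x
```

In the thesis, the shrinkage stand-in is replaced by a trained FCNN, which is what makes the prior adaptive rather than fixed.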
However, like most existing deblurring algorithms, the above method cannot deal with the dynamic scene deblurring problem. This is because in dynamic scene deblurring, foreground and background objects may require different blur kernels and it is difficult to accurately segment the foreground from the background. To address this limitation, we propose spatially variant recurrent neural networks (RNNs) for dynamic scene deblurring without the need to estimate the blur kernels. Our work is motivated by the interesting observation that RNNs are equivalent to the deconvolution operation when used to handle 1D signal restoration. We develop a deep convolutional neural network (CNN) to learn the pixel-wise weights. With the guidance of learned pixel-wise weights, the proposed spatially variant RNN is able to recover clear images from dynamic scenes. We analyze the relationship between the proposed spatially variant RNN and the deconvolution process to show that the spatially variant RNN is able to model the deblurring process. We also develop an auto-encoder scheme to reduce the size of the network.
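The 1D observation above can be illustrated with a first-order spatially variant recurrence; the per-position weights here are supplied by hand, whereas the thesis predicts them pixel-wise with a CNN. The function name is hypothetical, and this sketch covers only one scan direction of one row.

```python
import numpy as np

def sv_rnn_1d(x, w):
    """One scan direction of a spatially variant RNN over a 1D signal.
    w[t] in (0, 1) is a per-position weight; in the thesis these weights
    are predicted pixel-wise by a CNN (this sketch takes them as input)."""
    h = np.zeros_like(x)
    acc = 0.0
    for t in range(len(x)):
        acc = w[t] * acc + (1.0 - w[t]) * x[t]
        h[t] = acc
    return h

# With a constant weight w, the recurrence has impulse response
# (1 - w) * w**k: an exponentially decaying filter with infinite support,
# which is why a few recurrent scans can emulate a large deconvolution filter.
impulse = np.zeros(10); impulse[0] = 1.0
response = sv_rnn_1d(impulse, np.full(10, 0.6))
```

Running such scans in four directions (left-right, right-left, top-down, bottom-up) with spatially varying weights composes a large-support, per-pixel filter over the image, which is the property the deblurring network exploits.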
We also develop a deep learning-based image matting algorithm in this thesis. Our algorithm is motivated by the fact that the success of propagation-based methods for alpha matting relies upon the propagation of trimaps under the guidance of handcrafted features. Therefore, we propose an encoder-decoder architecture to learn useful features. Using skip links, the trimaps are refined under the guidance of the features learned by the encoder. We also propose an iterative approach to increase the accuracy of the alpha matte estimation. By training in an end-to-end manner, the proposed algorithm is able to recover fine details without any initialization.
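The architecture described above can be sketched at the shape level: an encoder that downsamples, a decoder that upsamples and concatenates encoder features through a skip link, and an outer loop that feeds the estimated alpha matte back in place of the trimap. The convolutions use fixed random weights purely for illustration; layer widths, depths, and function names are all hypothetical, not the thesis network.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv3x3(x, out_ch, relu=True):
    """Toy 3x3 convolution with random weights (illustration only)."""
    H, W, C = x.shape
    k = rng.standard_normal((3, 3, C, out_ch)) * 0.1
    pad = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((H, W, out_ch))
    for i in range(3):
        for j in range(3):
            out += pad[i:i + H, j:j + W, :] @ k[i, j]
    return np.maximum(out, 0.0) if relu else out

def down2(x):
    """2x average pooling (encoder downsampling)."""
    H, W, C = x.shape
    return x.reshape(H // 2, 2, W // 2, 2, C).mean(axis=(1, 3))

def up2(x):
    """Nearest-neighbour upsampling (decoder)."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def matting_step(image, trimap):
    """One encoder-decoder pass refining the trimap toward an alpha matte."""
    x = np.concatenate([image, trimap[..., None]], axis=-1)
    e1 = conv3x3(x, 8)                  # encoder, full resolution
    e2 = conv3x3(down2(e1), 16)         # encoder, half resolution
    # skip link: decoder features concatenated with encoder features
    d1 = conv3x3(np.concatenate([up2(e2), e1], axis=-1), 8)
    logits = conv3x3(d1, 1, relu=False)[..., 0]
    return 1.0 / (1.0 + np.exp(-logits))  # alpha values in (0, 1)

# Iterative refinement: feed the estimate back in place of the trimap.
image = rng.random((16, 16, 3))
alpha = rng.random((16, 16))            # stand-in trimap
for _ in range(2):
    alpha = matting_step(image, alpha)
```

In the thesis the weights are of course trained end-to-end; this sketch only shows how the skip link and the iterative trimap-refinement loop fit together structurally.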
Experimental results demonstrate that the above proposed methods can perform favorably against state-of-the-art methods in terms of speed and quality.