Saliency Detection and Feature Matching for Object Segmentation in Digital Images
數字圖像中目標分割的顯著度檢測與特徵匹配
Student thesis: Doctoral Thesis
Author(s)
Related Research Unit(s)
Detail(s)
Awarding Institution | |
---|---|
Supervisors/Advisors | |
Award date | 13 Aug 2020 |
Link(s)
Permanent Link | https://scholars.cityu.edu.hk/en/theses/theses(ef4e7c62-1fbb-459d-b9ef-e1e2a6d487c3).html |
---|---|
Abstract
Saliency detection is a growing research area in computer vision. Its main goal is to model human perception in order to provide useful information for vision tasks such as object segmentation, image cropping, adaptive compression, image matching, health care systems, and visual surveillance. The human visual system works as a filter that allocates more attention to the attractive and interesting regions of an image, which are called salient regions. Modeling human-like, perception-based saliency detection in cluttered images with noisy backgrounds is a challenging problem in computer vision. Recently, many saliency detection algorithms have been proposed that exploit background information, in the form of boundary priors, to detect salient objects. These algorithms may not provide satisfactory detection results for color images due to the assimilation of local spatial information.
In the first phase of this work, the problem of saliency detection is investigated, and a novel technique is proposed that uses the Porter-Duff method to compose binary maps obtained by fuzzy c-means clustering. Each binary map generated by fuzzy c-means clustering contains specific parts of the salient region, and these parts are combined by the Porter-Duff composition method. Outliers in the extracted salient regions are removed in post-processing by morphological operations. Finally, an image mask, formed as a composite of the frequency, color, and location priors of the image, is used to extract the final saliency map from the blended binary maps.
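The first-phase idea can be illustrated with a minimal sketch. It is not the thesis implementation: the deterministic initialisation, the use of raw intensity as the only feature, the choice of the Porter-Duff "over" operator, and all function names are assumptions made for illustration, and the morphological clean-up and prior-based mask are omitted.

```python
import numpy as np

def fuzzy_c_means(x, c=3, m=2.0, iters=30):
    """Plain fuzzy c-means on a 1-D feature vector x.

    Centers start evenly spaced over the data range (an assumption made
    here so the sketch is deterministic).
    """
    centers = np.linspace(x.min(), x.max(), c)
    for _ in range(iters):
        # Distance of every sample to every center (small offset avoids 0-division).
        d = np.abs(x[None, :] - centers[:, None]) + 1e-9
        inv = d ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=0, keepdims=True)      # membership update
        w = u ** m
        centers = (w @ x) / w.sum(axis=1)             # center update
    return centers, u

def porter_duff_over(a, b):
    """Porter-Duff 'A over B' applied to coverage (alpha) maps in [0, 1]."""
    return a + b * (1.0 - a)

# Toy image: dark background, object made of two differently bright parts.
img = np.full((8, 8), 0.1)
img[2:6, 2:4] = 0.6
img[2:6, 4:6] = 0.9

centers, u = fuzzy_c_means(img.ravel(), c=3)
labels = u.argmax(axis=0).reshape(img.shape)
order = np.argsort(centers)                           # darkest -> brightest cluster
maps = [(labels == k).astype(float) for k in order]   # one binary map per cluster

# Compose the two brightest binary maps into a single object mask.
composite = porter_duff_over(maps[2], maps[1])
```

For disjoint binary maps the "over" operator reduces to a union, which is why the two object parts merge into one region here; other Porter-Duff operators (in, out, atop, xor) would combine the maps differently.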
In the second phase of this work, an unsupervised saliency detection technique using morphological gradient images computed in multiple color spaces is proposed. These gradient images contain different edge features, which are useful for obtaining an accurate contour-based superpixel image containing both foreground and background clusters. To remove background clusters, a robust background measuring technique, which describes the spatial relationship of an image cluster to the image boundaries, is implemented. This geometric clarification method effectively removes multiple low-level cues to produce a precise and uniform saliency map. The initially obtained saliency maps are then fused using a multi-map fusion technique to obtain a compact saliency map. Experiments on nine different data sets, covering both synthetic and real images, evaluate the performance of the proposed algorithm relative to several state-of-the-art algorithms.
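The two ingredients named above can be sketched roughly as follows. This is an illustration under assumptions, not the thesis method: the gradient is a plain dilation-minus-erosion over a square window, only RGB channels are used in place of several color spaces, and the background measure is assumed to be a boundary-connectivity ratio (boundary pixels divided by the square root of the cluster area). The function names are hypothetical.

```python
import numpy as np

def morph_gradient(channel, k=1):
    """Morphological gradient: dilation minus erosion with a (2k+1)x(2k+1) square."""
    p = np.pad(channel, k, mode="edge")
    h, w = channel.shape
    # Collect every shifted view of the window, then take max/min across them.
    windows = [p[i:i + h, j:j + w]
               for i in range(2 * k + 1) for j in range(2 * k + 1)]
    stack = np.stack(windows)
    return stack.max(axis=0) - stack.min(axis=0)

def boundary_connectivity(labels):
    """Per cluster: pixels on the image border / sqrt(area).

    A larger value suggests the cluster is attached to the image boundary
    and is therefore more likely to be background.
    """
    border = np.zeros(labels.shape, dtype=bool)
    border[0, :] = border[-1, :] = True
    border[:, 0] = border[:, -1] = True
    scores = {}
    for k in np.unique(labels):
        region = labels == k
        scores[int(k)] = (region & border).sum() / np.sqrt(region.sum())
    return scores

# Toy RGB image with an orange square on a black background.
rgb = np.zeros((10, 10, 3))
rgb[3:7, 3:7] = [1.0, 0.5, 0.0]

# One gradient image per channel, merged by a pixel-wise maximum.
grads = [morph_gradient(rgb[..., c]) for c in range(3)]
edge_map = np.max(grads, axis=0)

# Stand-in for the superpixel clustering: 0 = background, 1 = object.
labels = (rgb[..., 0] > 0.5).astype(int)
scores = boundary_connectivity(labels)
```

In this toy case the object cluster never touches the border, so its score is zero, while the background cluster gets a clearly positive score and would be the one removed.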
In the third phase of this work, the features of supervised and unsupervised models are examined. Most existing saliency methods measure foreground saliency by using the contrast of a foreground region within its local context, or by using boundary priors and spatial compactness. These methods are not powerful enough to extract the precise salient region from noisy and cluttered backgrounds. High-level features from both supervised and unsupervised methods are therefore combined to propose an affinity-based robust background subtraction technique and a maximum attention map obtained from a pre-trained convolutional neural network. The technique uses pixel similarities to propagate salient pixel values among foreground regions, background regions, and the union of the foreground and background. These salient pixel values control the foreground and background information using multiple pixel affinities. The maximum attention map is derived from the convolutional neural network using the features of its pooling and ReLU layers. It can simultaneously detect the salient regions in images that have high contrast with the background. The experimental results demonstrate the effectiveness of the proposed approach on six different saliency data sets and benchmarks. The approach improves detection quality and achieves greater precision than other detection approaches.
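One plausible reading of the maximum attention map is sketched below. The abstract does not specify the network, the layers, or the fusion rule, so everything here is an assumption: the activations are hand-made arrays standing in for ReLU and pooling outputs, each layer is reduced by a channel-wise maximum, and the layers are fused by a pixel-wise maximum after nearest-neighbour upsampling. The function names are hypothetical.

```python
import numpy as np

def layer_attention(feat):
    """Channel-wise maximum of a (C, H, W) activation tensor, scaled to [0, 1]."""
    a = feat.max(axis=0)
    span = a.max() - a.min()
    return (a - a.min()) / span if span > 0 else np.zeros_like(a)

def upsample_nearest(a, size):
    """Nearest-neighbour resize of a square map to (size, size).

    Assumes size is an integer multiple of the map's side length.
    """
    f = size // a.shape[0]
    return np.repeat(np.repeat(a, f, axis=0), f, axis=1)

def max_attention_map(feats, size):
    """Fuse per-layer attention maps with a pixel-wise maximum."""
    maps = [upsample_nearest(layer_attention(f), size) for f in feats]
    return np.maximum.reduce(maps)

# Stand-ins for activations of a pre-trained CNN (not real network outputs).
relu_feat = np.zeros((4, 8, 8))
relu_feat[1, 2:5, 2:5] = 2.0        # strong response in one channel
pool_feat = np.zeros((4, 4, 4))
pool_feat[0, 1:3, 1:3] = 1.0        # coarser response after pooling

att = max_attention_map([relu_feat, pool_feat], size=8)
```

With real activations, `relu_feat` and `pool_feat` would be replaced by tensors read from the chosen layers of the network; the fusion step itself does not depend on how they were produced.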
In the last part of this work, a hypergraph matching technique is proposed for matching multiple feature points. Hypergraph matching has been shown to have great potential for solving many challenging problems in computer vision. Matching a large number of feature points under hypergraph constraints is an NP-hard problem, which results in high computational complexity in many algorithms, such as spectral graph matching, tensor graph matching, and reweighted random walk matching. In this work, a computationally efficient cluster-based algorithm for one-to-one hypergraph matching is proposed. It partitions a large hypergraph into many sub-hypergraphs, each of which is matched using a tensor model so as to guarantee the maximum matching score. The results from the sub-hypergraphs are then used to match all feature points in the entire hypergraph. Simulation results on real and synthetic data sets validate the efficiency of the proposed method.
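The cluster-then-match idea can be shown on a toy example. This sketch makes several assumptions that the abstract does not confirm: hyperedges are point triples scored by the similarity of their triangle angles, each sub-hypergraph is matched by exhaustive search over permutations, and the clusters and their correspondence between the two point sets are taken as given. All function names are hypothetical.

```python
import itertools
import numpy as np

def triangle_angles(p, i, j, k):
    """Interior angles at vertices i, j, k (invariant to rotation, shift, scale)."""
    def ang(a, b, c):
        u, v = p[b] - p[a], p[c] - p[a]
        cosv = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
        return np.arccos(np.clip(cosv, -1.0, 1.0))
    return np.array([ang(i, j, k), ang(j, k, i), ang(k, i, j)])

def match_score(p, q, perm, gamma=10.0):
    """Third-order score: summed similarity of corresponding triangles."""
    score = 0.0
    for i, j, k in itertools.combinations(range(len(p)), 3):
        diff = (triangle_angles(p, i, j, k)
                - triangle_angles(q, perm[i], perm[j], perm[k]))
        score += np.exp(-gamma * (diff @ diff))
    return score

def match_cluster(p, q):
    """Exhaustive one-to-one matching of one small sub-hypergraph."""
    return max(itertools.permutations(range(len(q))),
               key=lambda perm: match_score(p, q, perm))

def cluster_match(p, q, clusters_p, clusters_q):
    """Match corresponding clusters, then merge into one global assignment."""
    assignment = {}
    for idx_p, idx_q in zip(clusters_p, clusters_q):
        perm = match_cluster(p[idx_p], q[idx_q])
        for a, b in zip(idx_p, perm):
            assignment[int(a)] = int(idx_q[b])
    return assignment

# Two well-separated groups of four points each.
p = np.array([[0, 0], [2, 0.3], [0.5, 1.4], [2.4, 2.2],
              [10, 5], [11.5, 5.1], [10.2, 6.8], [12, 7.5]], dtype=float)

# q is a rotated, scaled, shifted copy of p, shuffled inside each cluster.
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta), np.cos(theta)]])
truth = np.array([2, 0, 3, 1, 5, 7, 4, 6])   # q[truth[i]] corresponds to p[i]
q = np.empty_like(p)
q[truth] = 1.5 * p @ rot.T + np.array([3.0, -1.0])

clusters = [np.arange(0, 4), np.arange(4, 8)]
result = cluster_match(p, q, clusters, clusters)
```

The saving comes from the search space: matching two clusters of four points examines 2 × 4! = 48 permutations, whereas an exhaustive search over all eight points would examine 8! = 40320. The exhaustive step only guarantees the best score inside each sub-hypergraph, and it relies on the clustering having grouped corresponding points together.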
These results demonstrate that the proposed saliency detection methods form a compact pipeline that effectively simplifies salient object detection in many computer vision problems. The findings from this research could foster new research trends and novel applications. Learning that combines unsupervised and supervised techniques would likely be more useful and practicable and could lead to further progress on challenging problems in computer vision.