Study on Image Superpixel Segmentation Algorithms based on Structure Optimization


Student thesis: Doctoral Thesis

Award date: 2 Feb 2021


Superpixel segmentation is a fundamental task that groups pixels into higher-level primitives based on intrinsic properties such as coherent color and similar texture. Rather than treating pixels as isolated entities, superpixels provide perceptually meaningful atomic units that adhere to object boundaries and reduce the number of primitives. As a basic processing unit, superpixels have been widely used in many tasks, such as image enhancement, saliency detection, image segmentation, video segmentation, object tracking, object recognition, medical image retrieval, medical image segmentation, and video analysis and summarization. The success of superpixels is attributed to their consistency with human visual perception and their reduced data redundancy.

Most existing algorithms achieve excellent segmentation performance while reducing the number of primitives. However, some drawbacks and bottlenecks remain, for example, the irregular shape of superpixels, the loss of detailed information in superpixel segmentation, and the difficulty of combining conventional superpixel segmentation methods with multi-view images. This thesis proposes a corresponding improvement for each of these problems to promote the further development of the field.

One obvious disadvantage is that superpixels are irregular in shape, which may substantially increase the data storage required by subsequent operations. The original intention of superpixel segmentation is to reduce the number of primitives, not only in terms of the visible image content but also in terms of storage and computation. Generating storage-efficient, regular-shaped superpixels is therefore a crucial issue. To address the irregular shape of superpixels, a superpixel segmentation method is proposed that generates approximately structural superpixels with sharp boundary adherence and comprehensive semantic information. Superpixel segmentation is formulated as a square-wise asymmetric partition problem, in which semantically perceptual superpixels are recorded at the square level to preserve abundant semantic information while saving storage. Moreover, so that the regular-shaped superpixel units adhere better to image boundaries and contours, a combinatorial optimization strategy is devised to find an optimal combination of squares and isolated pixels.
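
To make the idea of covering an image with squares and isolated pixels concrete, the following is a minimal sketch that uses a plain quadtree split. This is an assumed stand-in for illustration only, not the partition or the combinatorial optimization proposed in the thesis. The function name `quadtree_squares`, the tolerance `tol`, and the toy image are all hypothetical, and the sketch assumes a square grayscale image whose side is a power of two.

```python
def quadtree_squares(img, x, y, size, tol):
    """Cover the size x size block at (x, y) with homogeneous squares.

    A block is kept whole when its intensity range is within tol;
    otherwise it is split into four quadrants. Squares of size 1 play
    the role of isolated pixels. Returns a list of (x, y, size) tuples.
    """
    vals = [img[r][c] for r in range(y, y + size) for c in range(x, x + size)]
    if size == 1 or max(vals) - min(vals) <= tol:
        return [(x, y, size)]
    half = size // 2
    squares = []
    for dy in (0, half):
        for dx in (0, half):
            squares += quadtree_squares(img, x + dx, y + dy, half, tol)
    return squares


# Toy 4 x 4 image: flat background with a detailed bottom-right corner.
img = [
    [10, 10, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 90, 90],
    [10, 10, 90, 20],
]
squares = quadtree_squares(img, 0, 0, 4, tol=5)
```

In this toy case the flat regions are stored as three 2 x 2 squares, while the detailed corner falls back to four isolated pixels, so each stored unit needs only a position and a size.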

Another obvious weakness of existing superpixel methods is their limited ability to adapt to local image details, such as content boundaries and object contours. In many practical problems and industrial tasks, local details are the key information and should not be ignored. For most methods, the detailed boundaries of image content are still hard to preserve because prior knowledge about the shape and size of the superpixels in an image is lacking. These methods have to increase the number of superpixels and reduce the superpixel size to capture detailed boundaries. This usually leads to large data redundancy in sparse areas, which is undesirable for practical tasks. Balancing detailed information against the number of superpixels is a challenging problem for applying superpixel segmentation to industrial tasks. To keep detailed image boundaries, a spatial regularization term that emphasizes spatial correlation is devised, and a spatially constrained subspace clustering based superpixel segmentation model is proposed to generate superpixels with more accurate and detailed boundaries, making it more appropriate for practical and industrial tasks.

In recent years, dual-camera systems have become increasingly popular and are widely used in mobile phones and autonomous vehicles. Moreover, stereo image pairs are more consistent with human perception than a single image, and the information from the two views is complementary and correlated, which is conducive to scene representation and object modeling. However, superpixel segmentation for stereo image pairs is a challenging new problem, because the consistency and the differences between the two viewpoints need to be considered jointly. For stereo images, the task is to obtain superpixel segmentation results for the left and right views cooperatively and consistently, rather than simply segmenting each view independently, yet there is little research in this area. To produce stereo superpixels, a left-right interactive optimization framework for stereo superpixel segmentation is proposed. Considering the view difference between the left and right images, we first divide the images into paired and non-paired regions according to the disparity, and construct a matching relationship between paired regions to alleviate the matching errors caused by occlusion. Then, building on left-right matching consistency, we propose a collaborative optimization scheme that refines the matched superpixels of the left and right images in an interactive manner, making the matched superpixels in stereo pairs more consistent and accurate.
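
One common way to separate paired from non-paired pixels using disparity is a left-right consistency check, sketched below. This is an assumption made for illustration and may differ from the criterion used in the thesis. The function name `paired_mask`, the tolerance `tol`, and the toy disparity rows are hypothetical. The sketch assumes integer disparity maps for rectified views, where a left-view pixel at column `x` with disparity `d` corresponds to column `x - d` in the right view.

```python
def paired_mask(disp_left, disp_right, tol=1):
    """Mark left-view pixels whose disparity is confirmed by the right view.

    A pixel is treated as paired when its match lies inside the right
    image and the right-view disparity at that position agrees within
    tol. Everything else (occluded or out-of-view pixels) is non-paired.
    """
    height, width = len(disp_left), len(disp_left[0])
    mask = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            d = disp_left[y][x]
            xr = x - d  # matching column in the right view
            if 0 <= xr < width and abs(disp_right[y][xr] - d) <= tol:
                mask[y][x] = True
    return mask


# Toy single-row disparity maps.
disp_left = [[1, 1, 1, 3]]
disp_right = [[1, 1, 0, 0]]
mask = paired_mask(disp_left, disp_right, tol=0)
```

Here the first pixel maps outside the right image and the last pixel disagrees with the right-view disparity, so both are treated as non-paired; only the two middle pixels would be passed on to any cross-view matching step.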

Guided by the structural optimization of superpixels, this thesis addresses the problems of existing superpixel segmentation methods from three aspects: structural storage, structural preservation and structural coordination. The corresponding solutions alleviate these problems and promote the further development of the field of superpixel segmentation.