Towards Image Segmentation Based on Deep Neural Networks: Exploration on Imperfect Data

面向基於深度神經網絡的圖像分割:不完美數據的探索

Student thesis: Doctoral Thesis

Author(s)

Related Research Unit(s)

Detail(s)

Awarding Institution
Supervisors/Advisors
  • Kam Yiu LAM (Supervisor)
  • Chi Yin CHOW (External person) (External Co-Supervisor)
Award date: 4 Sept 2023

Abstract

Image segmentation is a fundamental task in computer vision with applications in many fields. The advent of deep neural networks, such as Convolutional Neural Networks (CNNs), has significantly advanced image segmentation technology. Deep learning-based image segmentation is now widely used in photography, object recognition, autonomous driving, medicine, and other domains. Neural network-based segmentation methods have achieved promising results on high-resolution natural images. However, there is still ample room for research and development in segmenting "imperfect images", such as medical images and images taken with specialized equipment or under unusual conditions. Underwater images, for example, typically exhibit significant noise, and their resolution is often low because of the high water-resistance and capacity requirements of underwater equipment. The complex underwater environment and varying light conditions pose further challenges for image segmentation. Similarly, medical images, particularly ultrasound images, contain noise that stems from the imaging equipment and imaging principles. Moreover, medical images are predominantly grayscale, so objects can be differentiated only by shape, outline, and gray value rather than by color. These factors significantly degrade image segmentation performance. More importantly, most current deep learning techniques for imperfect images remain confined to a single task and therefore fail to meet the requirements of image analysis in practical settings. This thesis investigates image segmentation on imperfect images using two distinct types of such images: underwater photos and medical ultrasound images.

First, we explore how to use deep learning to denoise medical ultrasound images and improve the performance of instance segmentation. We propose a novel unsupervised learning approach called Dual Image (DI) for denoising medical ultrasound images. Unlike many existing supervised denoising methods, DI does not require clean medical ultrasound images. Instead, it uses Computed Tomography (CT) images and noise patches extracted from medical ultrasound images to achieve denoising. After denoising, we build a medical ultrasound image segmentation network, called Segmenting on Ultrasound Image, that strengthens the communication and fusion of different feature layers.
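The abstract does not say how DI combines CT images with ultrasound noise patches. As a rough illustration only, one way to obtain training pairs without clean ultrasound images is to superimpose a noise patch on a clean CT patch. The function name `make_training_pair`, the additive zero-mean noise model, and the [0, 1] intensity range below are all assumptions for this sketch and are not taken from the thesis.

```python
from typing import List, Tuple

Patch = List[List[float]]  # grayscale patch, intensities assumed to lie in [0, 1]


def make_training_pair(clean_patch: Patch, noise_patch: Patch) -> Tuple[Patch, Patch]:
    """Build a (noisy input, clean target) pair from a clean patch and a noise patch.

    Assumption: noise is additive. The noise patch is shifted to zero mean so
    that it perturbs the clean patch without changing its overall brightness.
    """
    flat = [v for row in noise_patch for v in row]
    if not flat:
        raise ValueError("noise patch must not be empty")
    mean = sum(flat) / len(flat)

    noisy = [
        # Clip to the valid intensity range after adding the zero-mean residual.
        [min(1.0, max(0.0, c + (n - mean))) for c, n in zip(clean_row, noise_row)]
        for clean_row, noise_row in zip(clean_patch, noise_patch)
    ]
    return noisy, clean_patch
```

A denoising network could then be trained to map the first element of each pair to the second; real ultrasound speckle is often modeled as multiplicative, so the additive form here is a simplification.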

Second, we present Boundmask, a specialized instance segmentation framework for medical ultrasound median nerve images designed to achieve superior segmentation results. The Boundmask framework includes two key features. Firstly, we introduce the nesting attention module, which combines spatial and channel attention to enhance feature information even with a simple backbone. Secondly, we design a boundary-guided segmentation mechanism, which considers the unique traits and border information of objects during the segmentation process.
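Boundary-guided training generally needs a boundary target derived from the ground-truth mask. The abstract does not describe how Boundmask obtains or uses border information, so the following is only a minimal sketch of one common way to extract a boundary map from a binary mask. The function name `mask_boundary` and the 4-neighbor rule are assumptions.

```python
from typing import List

Mask = List[List[int]]  # binary mask: 1 = object, 0 = background


def mask_boundary(mask: Mask) -> Mask:
    """Mark object pixels that touch the background in any 4-neighbor direction.

    Pixels outside the image are treated as background, so an object that
    touches the image border has boundary pixels along that border.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    boundary = [[0] * width for _ in range(height)]

    for r in range(height):
        for c in range(width):
            if mask[r][c] != 1:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                outside = not (0 <= nr < height and 0 <= nc < width)
                if outside or mask[nr][nc] == 0:
                    boundary[r][c] = 1
                    break
    return boundary
```

Such a boundary map could serve as an auxiliary supervision signal alongside the full mask, for example in a separate boundary loss term.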

Third, we address the challenge of real-time instance segmentation of ultrasound images with limited datasets. To this end, we propose CoarseInst, a novel weakly supervised framework comprising two stages. The first stage generates pseudo-mask labels through coarse mask generation, while the second stage uses self-training techniques to iteratively improve the quality of the masks. We also add an object enhancement block to mitigate the performance loss and propose a pair of loss functions for training with noisy labels.
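The two-stage idea can be illustrated with a small sketch. The abstract does not state which weak labels CoarseInst uses or how its self-training step updates the masks. The code below assumes bounding-box annotations, a hypothetical per-pixel probability map produced by the current model, and a fixed threshold; all names are illustrative.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive


def coarse_mask_from_box(height: int, width: int, box: Box) -> List[List[int]]:
    """Stage 1 (assumed): use the filled bounding box as a coarse pseudo-mask."""
    top, left, bottom, right = box
    return [
        [1 if top <= r < bottom and left <= c < right else 0 for c in range(width)]
        for r in range(height)
    ]


def refine_pseudo_mask(
    prob_map: List[List[float]], box: Box, threshold: float = 0.5
) -> List[List[int]]:
    """Stage 2 (assumed): one self-training round.

    Keep pixels the current model predicts as foreground, but only inside the
    annotated box, so the refined mask never grows beyond the weak label.
    """
    top, left, bottom, right = box
    return [
        [
            1 if (top <= r < bottom and left <= c < right and p >= threshold) else 0
            for c, p in enumerate(row)
        ]
        for r, row in enumerate(prob_map)
    ]
```

In a full self-training loop, the model would be retrained on the refined masks and `refine_pseudo_mask` applied again to its new predictions, repeating for a fixed number of rounds.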

Lastly, we introduce a novel Transformer-based network for medical image segmentation, called MISTKD. The network comprises a teacher network and a student network and achieves performance similar to that of state-of-the-art Transformer models with fewer parameters. This is accomplished by using the teacher network to train the student network. During training, sequences are extracted from the encoders of both the teacher and the student, and losses are calculated between corresponding sequences. This enables the student network to learn from the teacher network through knowledge distillation, resulting in improved segmentation accuracy.
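The distillation step described above can be sketched as a loss between matching encoder sequences. The abstract does not name the loss function or how it is weighted, so the mean squared error, the weighting factor `alpha`, and both function names below are assumptions. The sketch also assumes the teacher and student sequences already have the same shape.

```python
from typing import List

TokenSequence = List[List[float]]  # one feature vector per token


def sequence_distillation_loss(teacher_seq: TokenSequence, student_seq: TokenSequence) -> float:
    """Mean squared error between teacher and student token sequences."""
    if len(teacher_seq) != len(student_seq):
        raise ValueError("sequences must contain the same number of tokens")
    total, count = 0.0, 0
    for t_tok, s_tok in zip(teacher_seq, student_seq):
        if len(t_tok) != len(s_tok):
            raise ValueError("token feature dimensions must match")
        for t, s in zip(t_tok, s_tok):
            total += (t - s) ** 2
            count += 1
    if count == 0:
        raise ValueError("sequences must not be empty")
    return total / count


def total_student_loss(
    segmentation_loss: float, distillation_losses: List[float], alpha: float = 0.5
) -> float:
    """Combine the student's segmentation loss with the distillation terms.

    `alpha` is a hypothetical weight balancing the two objectives.
    """
    return segmentation_loss + alpha * sum(distillation_losses)
```

If the student uses a smaller feature dimension than the teacher, a learned projection would be needed before the two sequences can be compared; that step is omitted here.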