Semantic Segmentation Based on Deep Neural Networks

基於深度神經網絡的語義分割

Student thesis: Doctoral Thesis

Award date: 10 Mar 2022

Abstract

Semantic segmentation is a fundamental task in computer vision that can be formulated as assigning a semantic label to every pixel of a given image or video frame. Segmentation has made unprecedented progress thanks to the prevalence of artificial neural networks, especially deep convolutional neural networks (CNNs). Recently, efficient semantic segmentation of natural images and videos has received considerable attention, driven by growing demand in autonomous vehicles, human-machine interaction, and similar applications. These applications require low-latency inference while still expecting competitive segmentation accuracy. This thesis therefore investigates image and video semantic segmentation methods that balance inference speed and segmentation accuracy. Meanwhile, the importance of medical image segmentation in computer-aided diagnosis has drawn increasing attention to new segmentation algorithms. Hence, this thesis also develops a novel reinforcement learning-based method for left ventricle segmentation.
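
The per-pixel labeling formulation above can be illustrated with a minimal sketch. It assumes a network has already produced one score map per class; the function name `label_map`, the toy scores, and the three classes are hypothetical and are not taken from the thesis.

```python
def label_map(scores):
    """Turn per-class score maps into a dense label map.

    scores[c][y][x] is the score of class c at pixel (y, x).
    Returns labels[y][x], the index of the highest-scoring class.
    """
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(num_classes), key=lambda c: scores[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


# Toy 2x2 image with 3 hypothetical classes.
scores = [
    [[0.7, 0.1], [0.2, 0.3]],  # class 0
    [[0.2, 0.8], [0.1, 0.3]],  # class 1
    [[0.1, 0.1], [0.7, 0.4]],  # class 2
]
labels = label_map(scores)
```

Every pixel receives exactly one class index, which is what "dense" labeling means here.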

General semantic segmentation networks adopt large-scale backbones to improve segmentation accuracy. However, heavy backbones suffer from high computational complexity and low inference speed, which makes them unsuitable for real-time use. To tackle this problem, a lightweight Cascaded Selective Resolution Network (CSRNet) is proposed to improve the performance of real-time image semantic segmentation through multiple context information embedding and enhanced feature aggregation. The proposed network builds a three-stage segmentation system that integrates feature information from low resolution to high resolution in each stage and refines features progressively. Comprehensive experiments on the Cityscapes and CamVid datasets demonstrate that the proposed CSRNet outperforms mainstream efficient semantic segmentation approaches in accuracy while running in real time.
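
The abstract does not specify how CSRNet aggregates features, so the following is only a generic sketch of low-to-high resolution fusion, not the CSRNet design. It assumes nearest-neighbour 2x upsampling and element-wise addition; the names `upsample2x`, `fuse`, and `cascade` are hypothetical.

```python
def upsample2x(feat):
    """Nearest-neighbour 2x upsampling of a 2D grid (assumed operator)."""
    out = []
    for row in feat:
        wide = [v for v in row for _ in range(2)]  # repeat each column
        out.append(wide)
        out.append(list(wide))                     # repeat each row
    return out


def fuse(low, high):
    """Add an upsampled low-resolution map to a high-resolution map."""
    up = upsample2x(low)
    return [[a + b for a, b in zip(ru, rh)] for ru, rh in zip(up, high)]


def cascade(feats):
    """Fuse maps ordered from lowest to highest resolution, each 2x the previous."""
    refined = feats[0]
    for higher in feats[1:]:
        refined = fuse(refined, higher)
    return refined


f_low = [[1.0]]
f_mid = [[0.0, 1.0], [1.0, 0.0]]
f_high = [[0.0] * 4 for _ in range(4)]
out = cascade([f_low, f_mid, f_high])
```

The coarse map carries context to the finer maps step by step, which is the general idea behind progressive refinement; a real network would use learned layers in place of plain addition.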

Compared to images, videos involve a much larger volume of data and rich spatial-temporal information. Directly applying single-image semantic segmentation to video sequences produces temporally inconsistent results. To take advantage of the strong cross-frame relations in video, we revisit the ideas of feature reuse and feature warping. In this thesis, an efficient distortion map-guided feature rectification method is proposed for video semantic segmentation, specifically targeting feature updating and correction in distorted regions where the optical flow is unreliable. The distortion map is generated in a coarse-to-fine manner and serves as weighted attention to guide the feature rectification process. The proposed network is end-to-end trainable and highly modular. Comprehensive experiments on the Cityscapes and CamVid datasets demonstrate that the proposed method achieves state-of-the-art performance in the trade-off among accuracy, inference speed, and temporal consistency for video semantic segmentation.
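
As a rough illustration of feature warping followed by distortion-guided correction, the sketch below warps previous-frame features with an integer flow field and then blends them with freshly computed features. The linear blend `(1 - d) * warped + d * fresh` is an assumed form of "weighted attention", not the formula from the thesis, and `warp`, `rectify`, and all toy values are hypothetical.

```python
def warp(prev_feat, flow):
    """Backward-warp previous-frame features to the current frame.

    flow[y][x] = (dy, dx) is an integer offset from a current-frame pixel
    to its source location in the previous frame (clamped at the borders).
    """
    h, w = len(prev_feat), len(prev_feat[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = flow[y][x]
            sy = min(max(y + dy, 0), h - 1)
            sx = min(max(x + dx, 0), w - 1)
            out[y][x] = prev_feat[sy][sx]
    return out


def rectify(warped, fresh, distortion):
    """Per-pixel blend: rely on fresh features where distortion is high (assumed form)."""
    return [
        [(1.0 - d) * fw + d * ff for fw, ff, d in zip(rw, rf, rd)]
        for rw, rf, rd in zip(warped, fresh, distortion)
    ]


prev_feat = [[1.0, 2.0], [3.0, 4.0]]
flow = [[(0, 1), (0, 0)], [(0, 0), (-1, 0)]]
fresh = [[9.0, 9.0], [9.0, 9.0]]
distortion = [[0.0, 0.0], [1.0, 0.5]]  # 0 = flow trusted, 1 = flow unreliable

warped = warp(prev_feat, flow)
rectified = rectify(warped, fresh, distortion)
```

Where the distortion value is 0 the warped feature is reused unchanged, and where it is 1 the feature is fully replaced, so recomputation effort can be concentrated on regions with unreliable flow.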

Because medical semantic segmentation has wide applications in radiological diagnosis, we conduct a deeper study of left ventricle segmentation using deep reinforcement learning (DRL). A DRL agent is designed to imitate the way a human delineates the left ventricle. For this purpose, the segmentation problem is formulated as a sequential decision-making problem, namely a Markov Decision Process (MDP). The state, action, and reward of the DRL agent are defined accordingly. The proposed DRL agent, which is optimized with a double Deep Q-Network, locates the edge points of the left ventricle successively and ultimately obtains a closed segmentation mask. The experimental results show that the proposed model outperforms previous reinforcement learning methods and achieves performance comparable to deep learning baselines on the Automated Cardiac Diagnosis Challenge 2017 dataset and the Sunnybrook 2009 dataset.
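
For readers unfamiliar with double Deep Q-Networks, the standard bootstrapped target can be sketched as follows: the online network selects the next action and the target network evaluates it. The thesis's actual state, action, and reward definitions are not given in the abstract, so the function name `double_dqn_target`, the three-action example, and the discount factor are hypothetical.

```python
def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Standard double DQN target for a single transition.

    next_q_online / next_q_target hold the Q-values of every action in the
    next state, from the online and target networks respectively.
    """
    if done:
        return reward  # no bootstrapping past a terminal state
    # Action selection uses the online network ...
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # ... while action evaluation uses the target network.
    return reward + gamma * next_q_target[best_action]


# Hypothetical transition with three candidate moves for the next edge point.
target = double_dqn_target(1.0, [0.2, 0.9, 0.5], [1.0, 2.0, 3.0], gamma=0.5)
terminal = double_dqn_target(1.0, [0.2, 0.9, 0.5], [1.0, 2.0, 3.0], done=True)
```

Decoupling selection from evaluation is what distinguishes this target from plain DQN, which would take the maximum of the target network's own values (3.0 in this toy case) and tend to overestimate.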

    Research areas

  • Semantic segmentation, Deep Neural Networks, Attention mechanism, Deep learning, Reinforcement learning