Distortion Map-Guided Feature Rectification for Efficient Video Semantic Segmentation

Research output: Journal Publications and Reviews (RGC: 21, 22, 62)21_Publication in refereed journalpeer-review

View graph of relations

Related Research Unit(s)


Original languageEnglish
Journal / PublicationIEEE Transactions on Multimedia
Online published16 Dec 2021
Publication statusOnline published - 16 Dec 2021


To leverage the strong cross-frame relations of videos, many video semantic segmentation methods tend to explore feature reuse and feature warping based on motion clues. However, since the video dynamics are too complex to model accurately, some warped feature values may be invalid. Moreover, the warping errors can accumulate across frames, thereby resulting in degraded segmentation performance. To tackle this problem, we present an efficient distortion map-guided feature rectification method for video semantic segmentation, specifically targeting the feature updating and correction on the distorted regions with unreliable optical flow. The updated features for the distorted regions are extracted from a light correction network (CoNet). A distortion map serves as the weighted attention to guide the feature rectification by aggregating the warped features and the updated features. The generation of the distortion map is simple yet effective in predicting the distorted areas in the warped features, i.e., moving boundaries, thin objects, and occlusions. In addition, we propose an auxiliary edge-semantics loss to implement the distorted region supervision with classes. Our network is trained in an end-to-end manner and highly modular. Comprehensive experiments on Cityscapes and CamVid datasets demonstrate that the proposed method has achieved state-of-the-art performance by weighing accuracy, inference speed, and temporal consistency on video semantic segmentation.

Research Area(s)

  • deep neural networks, Distortion, Feature extraction, feature warping and correction, Image analysis, Image segmentation, Optical distortion, optical flow, Optical imaging, Semantics, Video semantic segmentation