CNN-Based RGB-D Salient Object Detection: Learn, Select, and Fuse

Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review

16 Scopus Citations


Detail(s)

Original language: English
Pages (from-to): 2076–2096
Journal / Publication: International Journal of Computer Vision
Volume: 129
Issue number: 7
Online published: 5 May 2021
Publication status: Published - Jul 2021

Abstract

This work presents a systematic solution for RGB-D salient object detection that addresses three aspects within a unified framework: modal-specific representation learning, complementary cue selection, and cross-modal complement fusion. To learn discriminative modal-specific features, we propose a hierarchical cross-modal distillation scheme, in which the progressive predictions from the well-learned source modality supervise the learning of feature hierarchies and inference in the new modality. To better select complementary cues, we formulate a residual function that adaptively incorporates complements from the paired modality. Furthermore, a top-down fusion structure is constructed for sufficient cross-modal, cross-level interaction. The experimental results demonstrate the effectiveness of the proposed cross-modal distillation scheme in learning from a new modality, the advantages of the proposed multi-modal fusion pattern in selecting and fusing cross-modal complements, and the generalization of the proposed designs to different tasks.
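The residual selection idea in the abstract, where a primary-modality feature is kept and only an adaptively gated complement from the paired modality is added, can be sketched roughly as below. This is a minimal illustrative sketch, not the paper's actual formulation: the sigmoid gate, the scalar weight `w`, and the function names are all assumptions introduced here for clarity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def residual_complement_fusion(f_rgb, f_depth, w=1.0):
    """Hypothetical sketch of residual complement selection.

    The RGB feature is passed through unchanged; a gate (here a simple
    sigmoid of the modality difference, an assumption) decides how much
    of the depth feature is added as a complementary residual.
    """
    gate = sigmoid(w * (f_depth - f_rgb))  # opens where depth disagrees with RGB
    residual = gate * f_depth              # complementary cues selected from depth
    return f_rgb + residual                # fused feature = primary + residual

# Usage: fuse two toy 2x2 feature maps.
f_rgb = np.ones((2, 2))
f_depth = np.ones((2, 2))
fused = residual_complement_fusion(f_rgb, f_depth)
```

In a real network the gate would be a learned sub-network rather than a fixed sigmoid of the difference; the point of the residual form is that the primary-modality feature is preserved even when the paired modality contributes nothing.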

Research Area(s)

  • Convolutional neural network, Cross-modal distillation, RGB-D, Salient object detection