CNN-Based RGB-D Salient Object Detection: Learn, Select, and Fuse

Research output: Journal Publications and Reviews · Publication in refereed journal · peer-review

9 Scopus Citations

Original language: English
Pages (from-to): 2076–2096
Journal / Publication: International Journal of Computer Vision
Issue number: 7
Online published: 5 May 2021
Publication status: Published - Jul 2021


Abstract

The goal of this work is to present a systematic solution for RGB-D salient object detection that addresses three aspects within a unified framework: modal-specific representation learning, complementary cue selection, and cross-modal complement fusion. To learn discriminative modal-specific features, we propose a hierarchical cross-modal distillation scheme in which the progressive predictions from the well-learned source modality supervise the learning of feature hierarchies and inference in the new modality. To better select complementary cues, we formulate a residual function that adaptively incorporates complements from the paired modality. Furthermore, a top-down fusion structure is constructed to enable sufficient cross-modal, cross-level interactions. The experimental results demonstrate the effectiveness of the proposed cross-modal distillation scheme in learning from a new modality, the advantages of the proposed multi-modal fusion pattern in selecting and fusing cross-modal complements, and the generalization of the proposed designs to different tasks.
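The residual cue-selection idea in the abstract can be illustrated with a minimal sketch: the paired modality contributes only a learned residual ("complement") on top of the primary modality's feature, so the fused feature defaults to the primary representation when the paired cue adds nothing. This is a hypothetical NumPy mock-up for intuition only, not the authors' implementation; the projection `W` and the `tanh` activation are assumptions.

```python
import numpy as np

def residual_complement_fusion(f_rgb, f_depth, W):
    """Fuse RGB and depth features via a residual function.

    Instead of replacing the RGB feature, the depth feature contributes
    only a learned residual (the selected "complement"); with W = 0 the
    fused feature reduces exactly to f_rgb.  W is a hypothetical learned
    projection standing in for the paper's selection module.
    """
    joint = np.concatenate([f_rgb, f_depth], axis=-1)  # cross-modal pairing
    residual = np.tanh(joint @ W)                      # selected complement
    return f_rgb + residual                            # residual fusion

# Toy features: 4 spatial positions, 8 channels per modality.
rng = np.random.default_rng(0)
f_rgb = rng.standard_normal((4, 8))
f_depth = rng.standard_normal((4, 8))
W = rng.standard_normal((16, 8)) * 0.1

fused = residual_complement_fusion(f_rgb, f_depth, W)
print(fused.shape)  # same shape as the RGB feature it refines: (4, 8)
```

The design choice this models is that selection is additive rather than substitutive: unreliable depth cues can be suppressed by driving the residual toward zero, rather than corrupting the RGB feature.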

Research Area(s)

  • Convolutional neural network, Cross-modal distillation, RGB-D, Salient object detection