Discriminative Cross-Modal Transfer Learning and Densely Cross-Level Feedback Fusion for RGB-D Salient Object Detection

Research output: Journal Publications and Reviews (RGC: 21, 22, 62)21_Publication in refereed journalpeer-review

18 Scopus Citations
View graph of relations

Related Research Unit(s)


Original languageEnglish
Article number8820129
Pages (from-to)4808-4820
Journal / PublicationIEEE Transactions on Cybernetics
Issue number11
Online published30 Aug 2019
Publication statusPublished - Nov 2020


This article addresses two key issues in RGB-D salient object detection based on the convolutional neural network (CNN). 1) How to bridge the gap between the "data-hungry" nature of CNNs and the insufficient labeled training data in the depth modality? 2) How to take full advantages of the complementary information among two modalities. To solve the first problem, we model the depth-induced saliency detection as a CNN-based cross-modal transfer learning problem. Instead of directly adopting the RGB CNN as initialization, we additionally train a modality classification network (MCNet) to encourage discriminative modality-specific representations in minimizing the modality classification loss. To solve the second problem, we propose a densely cross-level feedback topology, in which the cross-modal complements are combined in each level and then densely fed back to all shallower layers for sufficient cross-level interactions. Compared to traditional two-stream frameworks, the proposed one can better explore, select, and fuse cross-modal cross-level complements. Experiments show the significant and consistent improvements of the proposed CNN framework over other state-of-the-art methods.

Research Area(s)

  • Convolutional neural networks (CNNs), crossmodal transfer, dense feedback, RGB-D, salient object detection