Salient object detection with image-level binary supervision

Pengjie Wang, Yuxuan Liu, Ying Cao*, Xin Yang, Yu Luo, Huchuan Lu, Zijian Liang, Rynson W.H. Lau

*Corresponding author for this work

Research output: Journal Publications and Reviews; RGC 21 - Publication in refereed journal; peer-review

12 Citations (Scopus)

Abstract

Recent deep-learning-based salient object detection (SOD) methods have achieved impressive performance. However, fully-supervised methods require a large amount of labeled data, and weakly-supervised methods still require considerable human effort. To address this problem, we propose a novel weakly-supervised method for salient object detection based only on binary image tags, which are much cheaper to collect. Our basic idea is to construct a dataset of images that are labeled as either salient (with salient objects) or non-salient (without salient objects), and to leverage these binary labels as supervision to learn a salient object detector based on existing unsupervised methods. In particular, we propose a target saliency map hallucinator, which can synthesize pseudo ground truth saliency maps for the salient images in the training data solely from binary labels. We can then use the pseudo ground truth labels to train a salient object detector. Experimental results show that our method performs comparably to the state-of-the-art weakly-supervised methods, but requires considerably less human supervision.
Original language: English
Article number: 108782
Journal: Pattern Recognition
Volume: 129
Online published: 10 May 2022
DOIs
Publication status: Published - Sept 2022

Bibliographical note

Full text of this publication does not contain sufficient affiliation information. With consent from the author(s) concerned, the Research Unit(s) information for this record is based on the existing academic department affiliation of the author(s).

Research Keywords

  • Binary labels
  • Salient object detection
  • Weak supervision
