Boosting Weakly-Supervised Referring Image Segmentation via Progressive Comprehension

Zaiquan Yang, Yuhao Liu*, Jiaying Lin, Gerhard Hancke*, Rynson W.H. Lau*

*Corresponding author for this work

Research output: Chapters, Conference Papers, Creative and Literary Works > RGC 32 - Refereed conference paper (with host publication) > peer-review

Abstract

This paper explores the weakly-supervised referring image segmentation (WRIS) problem, and focuses on a challenging setup where target localization is learned directly from image-text pairs. We note that the input text description typically already contains detailed information on how to localize the target object, and we also observe that humans often follow a step-by-step comprehension process (i.e., progressively utilizing target-related attributes and relations as cues) to identify the target object. Hence, we propose a novel Progressive Comprehension Network (PCNet) to leverage target-related textual cues from the input description for progressively localizing the target object. Specifically, we first use a Large Language Model (LLM) to decompose the input text description into short phrases. These short phrases are taken as target-related cues and fed into a Conditional Referring Module (CRM) in multiple stages, to allow updating the referring text embedding and enhance the response map for target localization in a multi-stage manner. Based on the CRM, we then propose a Region-aware Shrinking (RaS) loss to constrain the visual localization to be conducted progressively in a coarse-to-fine manner across different stages. Finally, we introduce an Instance-aware Disambiguation (IaD) loss to suppress instance localization ambiguity by differentiating overlapping response maps generated by different referring texts on the same image. Extensive experiments show that our method outperforms SOTA methods on three common benchmarks.

© 2024 Neural Information Processing Systems Foundation. All rights reserved.

Original language: English
Title of host publication: 38th Conference on Neural Information Processing Systems (NeurIPS 2024)
Editors: A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, C. Zhang
Publisher: Neural Information Processing Systems (NeurIPS)
Pages: 93213-93239
ISBN (Electronic): 9798331314385
Publication status: Published - Dec 2024
Event: 38th Annual Conference on Neural Information Processing Systems (NeurIPS 2024) - Vancouver Convention Center, Vancouver, Canada
Duration: 10 Dec 2024 to 15 Dec 2024
https://neurips.cc/
https://proceedings.neurips.cc/

Publication series

Name: Advances in Neural Information Processing Systems
Volume: 37
ISSN (Print): 1049-5258

Conference

Conference: 38th Annual Conference on Neural Information Processing Systems (NeurIPS 2024)
Abbreviated title: NeurIPS 2024
Country/Territory: Canada
City: Vancouver
Period: 10/12/24 to 15/12/24
