Compare and Reweight: Distinctive Image Captioning Using Similar Images Sets

Research output: Chapters, Conference Papers, Creative and Literary Works; RGC 32 - Refereed conference paper (with host publication); peer-review

29 Scopus Citations

Detail(s)

Original language: English
Title of host publication: Computer Vision – ECCV 2020
Subtitle of host publication: Proceedings, Part I
Editors: Andrea Vedaldi, Horst Bischof, Thomas Brox, Jan-Michael Frahm
Publisher: Springer
Pages: 370-386
ISBN (electronic): 9783030584528
ISBN (print): 9783030584511
Publication status: Online published - 3 Nov 2020

Publication series

Name: Lecture Notes in Computer Science
Volume: 12346
ISSN (Print): 0302-9743
ISSN (electronic): 1611-3349

Conference

Title: 16th European Conference on Computer Vision (ECCV 2020)
Location: Online
Place: United Kingdom
City: Glasgow
Period: 23 - 28 August 2020

Abstract

A wide range of image captioning models has been developed, achieving significant improvement based on popular metrics, such as BLEU, CIDEr, and SPICE. However, although the generated captions can accurately describe the image, they are generic for similar images and lack distinctiveness, i.e., they cannot properly describe the uniqueness of each image. In this paper, we aim to improve the distinctiveness of image captions through training with sets of similar images. First, we propose a distinctiveness metric, between-set CIDEr (CIDErBtw), to evaluate the distinctiveness of a caption with respect to those of similar images. Our metric shows that the human annotations of each image are not equally distinctive. Thus, we propose several new training strategies to encourage the distinctiveness of the generated caption for each image, which are based on using CIDErBtw in a weighted loss function or as a reinforcement learning reward. Finally, extensive experiments show that our proposed approach significantly improves both distinctiveness (as measured by CIDErBtw and retrieval metrics) and accuracy (e.g., as measured by CIDEr) for a wide variety of image captioning baselines. These results are further confirmed through a user study. Project page: https://wenjiaxu.github.io/ciderbtw/.
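The idea behind a between-set score can be illustrated with a minimal sketch. This sketch assumes that CIDErBtw is the average CIDEr of a caption computed against the reference captions of each similar image, so a lower value means the caption is less interchangeable with its neighbours. The CIDEr used here is simplified: it uses TF-IDF n-gram cosine similarity for n = 1..4, scaled by 10, and omits the length penalty and count clipping of the commonly used CIDEr-D variant. All function names are hypothetical, and `annotation_weights` is an illustrative softmax reweighting, not the paper's loss.

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def document_frequency(corpus, max_n=4):
    """df[g] = number of images whose reference set contains n-gram g at least once."""
    df = Counter()
    for refs in corpus:
        seen = set()
        for ref in refs:
            toks = ref.split()
            for n in range(1, max_n + 1):
                seen.update(ngram_counts(toks, n))
        df.update(seen)  # each image contributes at most 1 per n-gram
    return df

def tfidf(counts, df, num_images):
    """TF-IDF weights; n-grams present in every image's references get weight 0."""
    total = sum(counts.values()) or 1
    return {g: (c / total) * math.log(num_images / max(1, df[g])) for g, c in counts.items()}

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts (0 if either is all zeros)."""
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def cider(candidate, refs, df, num_images, max_n=4):
    """Simplified CIDEr: 10 x mean over n of the average TF-IDF cosine to each reference."""
    cand = candidate.split()
    score = 0.0
    for n in range(1, max_n + 1):
        c_vec = tfidf(ngram_counts(cand, n), df, num_images)
        sims = [cosine(c_vec, tfidf(ngram_counts(r.split(), n), df, num_images)) for r in refs]
        score += sum(sims) / len(sims)
    return 10.0 * score / max_n

def cider_btw(candidate, similar_ref_sets, df, num_images):
    """Assumed between-set score: mean CIDEr against each similar image's references."""
    scores = [cider(candidate, refs, df, num_images) for refs in similar_ref_sets]
    return sum(scores) / len(scores)

def annotation_weights(btw_scores, temperature=1.0):
    """Hypothetical reweighting: softmax over -CIDErBtw, so distinctive captions weigh more."""
    exps = [math.exp(-s / temperature) for s in btw_scores]
    z = sum(exps)
    return [e / z for e in exps]

# Toy data (made up for illustration): one target image and two similar images.
target = ["a red bus parked on a street", "a red double decker bus on the road"]
similar = [["a bus parked on a street", "a white bus on a city street"],
           ["a man riding a bike on a street", "a person on a bicycle in the road"]]
corpus = [target] + similar
df = document_frequency(corpus)

generic = cider_btw("a bus on a street", similar, df, len(corpus))
distinctive = cider_btw("a red double decker bus", similar, df, len(corpus))
print(generic > distinctive)  # True: the generic caption also fits the similar images
```

In this toy example, the generic caption shares weighted n-grams such as "bus" and "a bus" with the similar images, so it receives a higher between-set score. The caption mentioning "red double decker" matches only on "bus" and therefore scores lower. A weighting such as `annotation_weights` would then give more influence to the more distinctive human annotations.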

Citation Format(s)

Compare and Reweight: Distinctive Image Captioning Using Similar Images Sets. / Wang, Jiuniu; Xu, Wenjia; Wang, Qingzhong et al.
Computer Vision – ECCV 2020: Proceedings, Part I. ed. / Andrea Vedaldi; Horst Bischof; Thomas Brox; Jan-Michael Frahm. Springer, 2020. p. 370-386 (Lecture Notes in Computer Science; Vol. 12346).
