On Distinctive Image Captioning via Comparing and Reweighting
Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review
Detail(s)
Original language | English |
---|---|
Pages (from-to) | 2088-2103 |
Journal / Publication | IEEE Transactions on Pattern Analysis and Machine Intelligence |
Volume | 45 |
Issue number | 2 |
Online published | 16 Mar 2022 |
Publication status | Published - Feb 2023 |
Link(s)
DOI | DOI |
---|---|
Link to Scopus | https://www.scopus.com/record/display.uri?eid=2-s2.0-85126518407&origin=recordpage |
Permanent Link | https://scholars.cityu.edu.hk/en/publications/publication(c621469f-0019-45ee-be2f-7e0cb1a51de2).html |
Abstract
Recent image captioning models achieve impressive results on popular metrics such as BLEU, CIDEr, and SPICE. However, these metrics consider only the overlap between generated captions and human annotations, so optimizing for them can favor common words and phrases and yield captions that lack distinctiveness, i.e., many similar images receive the same caption. In this paper, we aim to improve the distinctiveness of image captions by comparing and reweighting with a set of similar images. First, we propose a distinctiveness metric, between-set CIDEr (CIDErBtw), to evaluate the distinctiveness of a caption with respect to those of similar images. Our metric reveals that the human annotations of each image in the MSCOCO dataset are not equivalent in distinctiveness; however, previous works normally treat the human annotations equally during training, which could be a reason for generating less distinctive captions. In contrast, we reweight each ground-truth caption according to its distinctiveness during training. We further integrate a long-tailed weight strategy to highlight the rare words that contain more information, and captions from the similar image set are sampled as negative examples to encourage the generated sentence to be unique. Finally, extensive experiments show that our proposed approach significantly improves both distinctiveness (as measured by CIDErBtw and retrieval metrics) and accuracy (e.g., as measured by CIDEr) for a wide variety of image captioning baselines. These results are further confirmed through a user study. © 2022 IEEE.
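The comparing-and-reweighting idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: `overlap_similarity` is a simplified n-gram cosine similarity standing in for CIDEr (no TF-IDF weighting), and the weighting rule in `distinctiveness_weights`, like all function names here, is an assumption made for illustration only.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def overlap_similarity(a, b, max_n=2):
    """Simplified stand-in for CIDEr (assumption): cosine similarity of
    raw n-gram counts, averaged over n = 1..max_n, without TF-IDF."""
    ta, tb = a.lower().split(), b.lower().split()
    scores = []
    for n in range(1, max_n + 1):
        ca, cb = ngrams(ta, n), ngrams(tb, n)
        dot = sum(ca[g] * cb[g] for g in ca)
        norm = (math.sqrt(sum(v * v for v in ca.values()))
                * math.sqrt(sum(v * v for v in cb.values())))
        scores.append(dot / norm if norm else 0.0)
    return sum(scores) / len(scores)


def between_set_score(caption, similar_captions):
    """Mean similarity of one caption to the captions of similar images.
    A lower score means the caption is more distinctive."""
    if not similar_captions:
        return 0.0
    total = sum(overlap_similarity(caption, c) for c in similar_captions)
    return total / len(similar_captions)


def distinctiveness_weights(gt_captions, similar_captions):
    """Hypothetical weighting rule: weight grows as between-set similarity
    falls; weights are rescaled so that their mean is 1."""
    raw = [max(0.0, 1.0 - between_set_score(c, similar_captions))
           for c in gt_captions]
    total = sum(raw)
    if total == 0:
        return [1.0] * len(gt_captions)
    return [len(raw) * r / total for r in raw]


# Toy data (invented for this example, not taken from MSCOCO).
gt = ["a man riding a wave on a surfboard",
      "a surfer in a red wetsuit carves a large wave"]
similar = ["a man riding a wave on a surfboard",
           "a person riding a wave"]
weights = distinctiveness_weights(gt, similar)
```

In a training loop, such weights would scale the per-caption loss so that ground-truth captions that resemble those of similar images contribute less; the long-tailed word weighting and negative sampling mentioned in the abstract are not covered by this sketch.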
Research Area(s)
- Annotations, between-set CIDEr, distinctiveness, Image captioning, Maximum likelihood estimation, Measurement, metric, Semantics, Training, training strategies, Web and internet services, Xenon
Citation Format(s)
On Distinctive Image Captioning via Comparing and Reweighting. / Wang, Jiuniu; Xu, Wenjia; Wang, Qingzhong et al.
In: IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 45, No. 2, 02.2023, p. 2088-2103.