Neighbours Matter: Image Captioning with Similar Images
Research output: Chapters, Conference Papers, Creative and Literary Works › RGC 32 - Refereed conference paper (with host publication) › peer-review
Detail(s)
Original language | English |
---|---|
Title of host publication | 31st British Machine Vision Conference, BMVC 2020 |
Publisher | British Machine Vision Association, BMVA |
Number of pages | 14 |
Publication status | Published - Sept 2020 |
Publication series
Name | British Machine Vision Conference, BMVC |
---|
Conference
Title | 31st British Machine Vision Conference (BMVC 2020) |
---|---|
Location | Virtual |
Period | 7 - 10 September 2020 |
Link(s)
Document Link | Links |
---|---|
Link to Scopus | https://www.scopus.com/record/display.uri?eid=2-s2.0-85132254968&origin=recordpage |
Permanent Link | https://scholars.cityu.edu.hk/en/publications/publication(0e99a2a5-0101-4d92-9cda-f475617c59f3).html |
Abstract
Most image captioning models aim to generate captions based solely on the input image. However, images that are similar to the given input image contain variations of the same or similar concepts as the input image. Thus, aggregating information over similar images could improve image captioning models by strengthening or inferring concepts that are in the input image. In this paper, we propose an image captioning model based on KNN graphs composed of the input image and its similar images, where each node denotes an image or a caption. An attention-in-attention (AiA) model is developed to refine the node representations. Using the refined features significantly improves the baseline performance, e.g., the CIDEr score obtained by the Updown model increases from 120.1 to 125.6. Compared with state-of-the-art methods, our proposed method obtains a CIDEr score of 129.3 and a SPICE score of 22.6 on Karpathy's test split, which is competitive with models that employ fine-grained image features such as scene graphs and image parsing trees. © 2020. The copyright of this document resides with its authors. It may be distributed unchanged freely in print or electronic forms.
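The general idea of aggregating information over similar images can be illustrated with a small sketch. This is not the authors' KNN-graph or attention-in-attention (AiA) model: it substitutes plain cosine-similarity retrieval and a single scaled dot-product attention step, and the function names (`knn_indices`, `attend`), the feature dimension, and the random features are all assumptions made for illustration.

```python
import numpy as np


def knn_indices(query, gallery, k):
    """Return indices of the k gallery rows most cosine-similar to query."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = g @ q                      # cosine similarity to every gallery image
    return np.argsort(-sims)[:k]      # highest similarity first


def attend(query, neighbours):
    """Softmax-weighted average of neighbour features (scaled dot-product)."""
    scores = neighbours @ query / np.sqrt(query.shape[0])
    weights = np.exp(scores - scores.max())   # subtract max for stability
    weights /= weights.sum()
    return weights @ neighbours, weights


# Hypothetical data: 100 gallery images with 16-dimensional features.
rng = np.random.default_rng(0)
gallery = rng.normal(size=(100, 16))
# The query is a slightly perturbed copy of gallery image 3.
query = gallery[3] + 0.01 * rng.normal(size=16)

idx = knn_indices(query, gallery, k=5)        # the similar images
refined, weights = attend(query, gallery[idx])  # aggregated feature vector
```

Here `refined` stands in for a feature that pools evidence from the neighbours; in a captioning pipeline such a vector would be passed to the caption decoder alongside the input image features.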
Citation Format(s)
Neighbours Matter: Image Captioning with Similar Images. / Wang, Qingzhong; Wang, Jiuniu; Chan, Antoni B. et al.
31st British Machine Vision Conference, BMVC 2020. British Machine Vision Association, BMVA, 2020. (British Machine Vision Conference, BMVC).