Image captioning via semantic element embedding

Research output: Journal Publications and Reviews > RGC 21 - Publication in refereed journal > peer-review

25 Scopus Citations

Author(s)

  • Xiaodan Zhang
  • Shengfeng He
  • Xinhang Song
  • Jianbin Jiao
  • Qixiang Ye

Related Research Unit(s)

Detail(s)

Original language: English
Pages (from-to): 212-221
Journal / Publication: Neurocomputing
Volume: 395
Online published: 13 Jul 2019
Publication status: Published - 28 Jun 2020

Abstract

Image captioning approaches that rely on global Convolutional Neural Network (CNN) features cannot represent and describe all the important elements in complex scenes. In this paper, we enrich the semantic representations of images and update the language model through semantic element embedding. For semantic element discovery, an object detection module predicts regions of the image, and a Long Short-Term Memory (LSTM) captioning model generates local descriptions for these regions. The predicted descriptions and categories are used to generate the semantic feature, which not only contains detailed information but also shares a word space with the descriptions, and thus bridges the modality gap between visual images and semantic captions. We further integrate the CNN feature with the semantic feature in the proposed Element Embedding LSTM (EE-LSTM) model to predict a language description. Experiments on MS COCO datasets demonstrate that the proposed approach outperforms conventional captioning methods and can be flexibly combined with baseline models to achieve superior performance.
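The abstract does not specify how the semantic feature is computed, so the following is only a toy sketch of one plausible reading: region descriptions and detected category names are projected into the caption vocabulary as a normalized bag-of-words vector, which is then concatenated with a global CNN feature. The function names (`build_vocab`, `semantic_feature`, `fuse`), the bag-of-words encoding, and the concatenation step are all assumptions made for illustration, not the EE-LSTM formulation from the paper.

```python
from collections import Counter


def build_vocab(captions):
    """Map each distinct lowercase token to an index, in first-seen order."""
    vocab = {}
    for text in captions:
        for token in text.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def semantic_feature(region_captions, region_categories, vocab):
    """Hypothetical semantic feature: normalized bag-of-words over the
    predicted region descriptions and category names, expressed in the
    same word space (vocabulary) as the captions."""
    counts = Counter()
    for text in list(region_captions) + list(region_categories):
        for token in text.lower().split():
            if token in vocab:  # out-of-vocabulary words are ignored
                counts[token] += 1
    vec = [0.0] * len(vocab)
    total = sum(counts.values())
    if total == 0:
        return vec
    for token, count in counts.items():
        vec[vocab[token]] = count / total
    return vec


def fuse(cnn_feature, sem_feature):
    """Assumed fusion: plain concatenation of global and semantic features."""
    return list(cnn_feature) + list(sem_feature)


# Toy inputs standing in for detector and region-captioner outputs.
vocab = build_vocab(["a man riding a horse", "a dog on the grass"])
sem = semantic_feature(
    region_captions=["a man riding a horse", "green grass"],
    region_categories=["person", "horse"],
    vocab=vocab,
)
fused = fuse([0.1, 0.2, 0.3], sem)  # made-up 3-dimensional "CNN feature"
print(len(vocab), len(fused))
```

In a real system the fused vector would feed a language model at each decoding step; here it only shows how text predicted for image regions can live in the same word space as the target captions.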

Research Area(s)

  • CNN, Element embedding, Image captioning, LSTM

Bibliographic Note

Full text of this publication does not contain sufficient affiliation information. With consent from the author(s) concerned, the Research Unit(s) information for this record is based on the existing academic department affiliation of the author(s).

Citation Format(s)

Image captioning via semantic element embedding. / Zhang, Xiaodan; He, Shengfeng; Song, Xinhang et al.
In: Neurocomputing, Vol. 395, 28.06.2020, p. 212-221.
