Keyword-driven image captioning via Context-dependent Bilateral LSTM

Research output: Chapters, Conference Papers, Creative and Literary Works > RGC 32 - Refereed conference paper (with host publication) > peer-review

7 Scopus Citations

Author(s)

  • Xiaodan Zhang
  • Shengfeng He
  • Xinhang Song
  • Pengxu Wei
  • Shuqiang Jiang
  • Qixiang Ye
  • Jianbin Jiao

Detail(s)

Original language: English
Title of host publication: Proceedings of the IEEE International Conference on Multimedia and Expo (ICME) 2017
Publisher: Institute of Electrical and Electronics Engineers, Inc.
Pages: 781-786
ISBN (print): 9781509060672
Publication status: Published - Jul 2017

Publication series

ISSN (electronic): 1945-788X

Conference

Title: 2017 IEEE International Conference on Multimedia and Expo (ICME)
Place: Hong Kong
Period: 10 - 14 July 2017

Abstract

Image captioning has recently received much attention. Existing approaches, however, are limited to describing images with simple contextual information: they typically generate one sentence per image with only a single contextual emphasis. In this paper, we address this limitation from a user perspective with a novel approach. Given some keywords as additional inputs, the proposed method generates various descriptions according to the provided guidance. Hence, descriptions with different focuses can be generated for the same image. Our method is based on a new Context-dependent Bilateral Long Short-Term Memory (CDB-LSTM) model that predicts a keyword-driven sentence by considering the word dependence. The word dependence is explored externally with a bilateral pipeline, and internally with a unified and joint training process. Experiments on the MS COCO dataset demonstrate that the proposed approach not only significantly outperforms the baseline method but also shows good adaptation and consistency with various keywords.

Research Area(s)

  • Image Captioning, Keyword-driven, LSTM

Citation Format(s)

Keyword-driven image captioning via Context-dependent Bilateral LSTM. / Zhang, Xiaodan; He, Shengfeng; Song, Xinhang et al.
Proceedings of the IEEE International Conference on Multimedia and Expo (ICME) 2017. Institute of Electrical and Electronics Engineers, Inc., 2017. p. 781-786 8019525.
