Cross-Modal Event Retrieval: A Dataset and a Baseline Using Deep Semantic Learning

Runwei Situ, Zhenguo Yang*, Jianming Lv, Qing Li, Wenyin Liu

*Corresponding author for this work

Research output: Chapters, Conference Papers, Creative and Literary Works; RGC 32 - Refereed conference paper (with host publication); peer-review

6 Citations (Scopus)

Abstract

In this paper, we propose to learn a Deep Semantic Space (DSS) for cross-modal event retrieval, which is achieved by exploiting deep learning models to extract semantic features from images and textual articles jointly. More specifically, a VGG network is used to transfer deep semantic knowledge from a large-scale image dataset to the target image dataset. Simultaneously, a fully-connected network is designed to model semantic representations from textual features (e.g., TF-IDF, LDA). Furthermore, the obtained deep semantic representations for images and text can be mapped into a high-level semantic space, in which the distance between data samples can be measured directly for cross-modal event retrieval. In particular, we collect a dataset called the Wiki-Flickr event dataset for cross-modal event retrieval, in which the data are weakly aligned, unlike the image-text pairs in existing cross-modal retrieval datasets. Extensive experiments conducted on both the Pascal Sentence dataset and our Wiki-Flickr event dataset show that our DSS outperforms the state-of-the-art approaches.
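The retrieval step described in the abstract, measuring distance between samples in a shared semantic space, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the distance measure, so cosine distance is an assumption here, and the three-dimensional vectors, the item names, and the `retrieve` helper are all hypothetical. The vectors stand in for representations that the image and text networks would already have mapped into the common space.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # treat a zero vector as maximally dissimilar
    return 1.0 - dot / (norm_u * norm_v)


def retrieve(query, gallery, top_k=3):
    """Rank gallery items by distance to the query; return top_k (id, distance) pairs."""
    scored = [(item_id, cosine_distance(query, vec)) for item_id, vec in gallery.items()]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:top_k]


# Hypothetical semantic-space vectors (assumed: scores over three event classes).
image_query = [0.9, 0.05, 0.05]
text_gallery = {
    "article_a": [0.8, 0.1, 0.1],
    "article_b": [0.1, 0.8, 0.1],
    "article_c": [0.2, 0.2, 0.6],
}

# Image-to-text retrieval: rank text articles for one image query.
ranking = retrieve(image_query, text_gallery, top_k=2)
print(ranking)
```

With these made-up vectors, `article_a` ranks first because its vector points in nearly the same direction as the image query. Text-to-image retrieval would use the same function with the roles of query and gallery swapped.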
Original language: English
Title of host publication: Advances in Multimedia Information Processing – PCM 2018
Subtitle of host publication: 19th Pacific-Rim Conference on Multimedia, Hefei, China, September 21-22, 2018, Proceedings, Part II
Editors: Richang Hong, Wen-Huang Cheng, Toshihiko Yamasaki, Meng Wang, Chong-Wah Ngo
Publisher: Springer Verlag
Pages: 147-157
ISBN (Electronic): 978-3-030-00767-6
ISBN (Print): 978-3-030-00766-9
DOIs
Publication status: Published - Sept 2018
Event: 19th Pacific-Rim Conference on Multimedia (PCM 2018) - Hefei, China
Duration: 21 Sept 2018 – 22 Sept 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11165 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 19th Pacific-Rim Conference on Multimedia (PCM 2018)
Place: China
City: Hefei
Period: 21/09/18 – 22/09/18

Research Keywords

  • Common space
  • Cross-modal event retrieval
  • Deep learning
