CONQUER: Contextual Query-aware Ranking for Video Corpus Moment Retrieval

Research output: Chapters, Conference Papers, Creative and Literary Works | RGC 32 - Refereed conference paper (with host publication) | peer-review

20 Scopus Citations

Detail(s)

Original language: English
Title of host publication: MM '21
Subtitle of host publication: Proceedings of the 29th ACM International Conference on Multimedia
Place of Publication: New York
Publisher: Association for Computing Machinery
Pages: 3900-3908
Number of pages: 9
ISBN (print): 978-1-4503-8651-7
Publication status: Published - 2021

Publication series

Name: MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia

Conference

Title: 29th ACM International Conference on Multimedia (MM 2021)
Location: Hybrid
Place: China
City: Chengdu
Period: 20 - 24 October 2021

Abstract

This paper tackles the recently proposed Video Corpus Moment Retrieval task. This task is essential because advanced video retrieval applications should enable users to retrieve a precise moment from a large video corpus. We propose a novel CONtextual QUery-awarE Ranking (CONQUER) model for effective moment localization and ranking. CONQUER explores query context for multi-modal fusion and representation learning in two different steps. The first step derives fusion weights for the adaptive combination of multi-modal video content. The second step performs bi-directional attention to tightly couple video and query as a single joint representation for moment localization. As query context is fully engaged in video representation learning, from feature fusion to transformation, the resulting feature is user-centered and has a larger capacity for capturing multi-modal signals specific to the query. We conduct studies on two datasets, TVR for closed-world TV episodes and DiDeMo for open-world user-generated videos, to investigate the potential advantages of fusing video and query online as a joint representation for moment retrieval.
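
The two steps named in the abstract can be illustrated with a toy sketch. It is not the CONQUER architecture: the abstract does not give the model's equations, so the dot-product scoring, the softmax normalization, and every name below (`fusion_weights`, `fuse`, `bidirectional_attention`, `modality_keys`) are assumptions chosen only to show the general idea of query-dependent fusion followed by attention in both directions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def fusion_weights(query, modality_keys):
    """Step 1 (assumed form): one weight per modality, derived from the query."""
    return softmax([dot(query, key) for key in modality_keys])


def fuse(features, weights):
    """Weighted sum of the per-modality feature vectors of one clip."""
    dim = len(features[0])
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]


def bidirectional_attention(clips, words):
    """Step 2 (assumed form): attention in both directions over one similarity matrix."""
    sim = [[dot(c, w) for w in words] for c in clips]
    clip_to_word = [softmax(row) for row in sim]  # each clip attends over words
    word_to_clip = [
        softmax([sim[i][j] for i in range(len(clips))])  # each word attends over clips
        for j in range(len(words))
    ]
    return clip_to_word, word_to_clip


# Hypothetical toy inputs: two modalities (e.g. visual and subtitle), 2-d features.
query = [1.0, 0.0]
modality_keys = [[2.0, 0.0], [0.0, 2.0]]
weights = fusion_weights(query, modality_keys)
fused_clip = fuse([[1.0, 1.0], [3.0, 3.0]], weights)
c2w, w2c = bidirectional_attention([fused_clip, [0.5, 2.0]], [query, [0.0, 1.0]])
```

In this toy setup the query aligns with the first modality key, so the first modality receives the larger weight and the fused feature sits closer to that modality's vector. A different query would shift the weights, which is the sense in which the fused representation depends on the query.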

Research Area(s)

  • cross-modal retrieval, moment localization with natural language

Bibliographic Note

Research Unit(s) information for this publication is provided by the author(s) concerned.

Citation Format(s)

CONQUER: Contextual Query-aware Ranking for Video Corpus Moment Retrieval. / Hou, Zhijian; Ngo, Chong-Wah; Chan, W. K.
MM '21: Proceedings of the 29th ACM International Conference on Multimedia. New York: Association for Computing Machinery, 2021. p. 3900-3908 (MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia).
