CONQUER: Contextual Query-aware Ranking for Video Corpus Moment Retrieval
Research output: Chapters, Conference Papers, Creative and Literary Works › RGC 32 - Refereed conference paper (with host publication) › peer-review
Detail(s)
Original language | English |
---|---|
Title of host publication | MM '21 |
Subtitle of host publication | Proceedings of the 29th ACM International Conference on Multimedia |
Place of Publication | New York |
Publisher | Association for Computing Machinery |
Pages | 3900-3908 |
Number of pages | 9 |
ISBN (print) | 978-1-4503-8651-7 |
Publication status | Published - 2021 |
Publication series
Name | MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia |
---|
Conference
Title | 29th ACM International Conference on Multimedia (MM 2021) |
---|---|
Location | Hybrid |
Place | China |
City | Chengdu |
Period | 20 - 24 October 2021 |
Abstract
This paper tackles a recently proposed Video Corpus Moment Retrieval task. This task is essential because advanced video retrieval applications should enable users to retrieve a precise moment from a large video corpus. We propose a novel CONtextual QUery-awarE Ranking (CONQUER) model for effective moment localization and ranking. CONQUER explores query context for multi-modal fusion and representation learning in two different steps. The first step derives fusion weights for the adaptive combination of multi-modal video content. The second step performs bi-directional attention to tightly couple video and query as a single joint representation for moment localization. As query context is fully engaged in video representation learning, from feature fusion to transformation, the resulting feature is user-centered and has a larger capacity in capturing multi-modal signals specific to query. We conduct studies on two datasets, TVR for closed-world TV episodes and DiDeMo for open-world user-generated videos, to investigate the potential advantages of fusing video and query online as a joint representation for moment retrieval.
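The first step described in the abstract, deriving fusion weights from the query to combine multi-modal video content, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `query_aware_fusion`, the per-modality key vectors, the dot-product scoring, and the toy numbers are all assumptions made for illustration. The abstract only states that the weights depend on the query.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def query_aware_fusion(query_vec, modality_keys, clip_features):
    """Fuse per-clip features from several modalities with query-dependent weights.

    query_vec:      query embedding (list of floats).
    modality_keys:  hypothetical learned vector per modality, used to score
                    how relevant that modality is to the query (assumption).
    clip_features:  modality name -> list of clip vectors; every modality
                    must have the same number of clips and the same dimension.
    """
    names = sorted(modality_keys)
    # One scalar weight per modality, normalised so the weights sum to 1.
    weights = softmax([dot(query_vec, modality_keys[n]) for n in names])
    num_clips = len(clip_features[names[0]])
    fused = []
    for i in range(num_clips):
        dim = len(clip_features[names[0]][i])
        vec = [0.0] * dim
        for w, n in zip(weights, names):
            for d in range(dim):
                vec[d] += w * clip_features[n][i][d]
        fused.append(vec)
    return dict(zip(names, weights)), fused


# Toy example: the query aligns with the (hypothetical) visual key, so the
# visual features should dominate the fused representation.
query = [1.0, 0.0]
keys = {"visual": [2.0, 0.0], "subtitle": [0.0, 2.0]}
features = {"visual": [[1.0, 1.0]], "subtitle": [[3.0, 5.0]]}
weights, fused = query_aware_fusion(query, keys, features)
print(weights)
print(fused)
```

In this sketch a single weight per modality is shared across all clips of a video. A finer-grained variant could compute weights per clip. The second step (bi-directional attention between video and query) is not covered here.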
Research Area(s)
- cross-modal retrieval, moment localization with natural language
Bibliographic Note
Research Unit(s) information for this publication is provided by the author(s) concerned.
Citation Format(s)
CONQUER: Contextual Query-aware Ranking for Video Corpus Moment Retrieval. / Hou, Zhijian; Ngo, Chong-Wah; Chan, W. K.
MM '21: Proceedings of the 29th ACM International Conference on Multimedia. New York: Association for Computing Machinery, 2021. p. 3900-3908 (MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia).