Improving Interpretable Embeddings for Ad-hoc Video Search with Generative Captions and Multi-word Concept Bank

Research output: Chapters, Conference Papers, Creative and Literary Works › RGC 32 - Refereed conference paper (with host publication) › peer-review

Abstract

Aligning a user query with video clips in a cross-modal latent space and aligning them with semantic concepts are the two mainstream approaches for ad-hoc video search (AVS). However, the effectiveness of existing approaches is bottlenecked by the small size of available video-text datasets and the low quality of concept banks, which results in failures on unseen queries and the out-of-vocabulary problem. This paper addresses these two problems by constructing a new dataset and developing a multi-word concept bank. Specifically, capitalizing on a generative model, we construct a new dataset consisting of 7 million generated text-video pairs for pre-training. To tackle the out-of-vocabulary problem, we develop a multi-word concept bank based on syntax analysis to enhance the capability of a state-of-the-art interpretable AVS method in modelling relationships between query words. We also study the impact of current advanced features on the method. Experimental results show that the integration of the above-proposed elements doubles the R@1 performance of the AVS method on the MSRVTT dataset and improves the xinfAP on the TRECVid AVS query sets for 2016–2023 (eight years) by margins ranging from 2% to 77%, with an average of about 20%. The code and model are available at https://github.com/nikkiwoo-gh/ImprovedITV. © 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM.
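The abstract mentions building the multi-word concept bank via syntax analysis of captions. The snippet below is only a minimal illustrative sketch of that general idea, extracting multi-word concept candidates (noun phrases and verb-object pairs) with an off-the-shelf dependency parser; it is not the authors' implementation, which is available in the linked repository, and the function name and parser choice here are assumptions.

```python
# Illustrative sketch only: mining multi-word concept candidates from a caption
# with spaCy dependency parsing. Not the authors' code (see the GitHub link above).
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed


def extract_multiword_concepts(caption: str) -> set[str]:
    """Collect noun phrases and verb-object pairs as multi-word concept candidates."""
    doc = nlp(caption)
    concepts = set()

    # Noun phrases such as "a red bicycle" -> "red bicycle"
    for chunk in doc.noun_chunks:
        words = [t.lemma_.lower() for t in chunk if not t.is_stop and not t.is_punct]
        if len(words) >= 2:
            concepts.add(" ".join(words))

    # Verb-object pairs such as "riding a bicycle" -> "ride bicycle"
    for token in doc:
        if token.dep_ in ("dobj", "obj") and token.head.pos_ == "VERB":
            concepts.add(f"{token.head.lemma_.lower()} {token.lemma_.lower()}")

    return concepts


if __name__ == "__main__":
    print(extract_multiword_concepts("A man is riding a red bicycle down a busy street."))
```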
Original language: English
Title of host publication: ICMR 2024 - Proceedings of the 2024 International Conference on Multimedia Retrieval
Publisher: Association for Computing Machinery
Pages: 73-82
ISBN (Print): 9798400706028
DOIs
Publication status: Published - May 2024
Event: 14th International Conference on Multimedia Retrieval (ICMR 2024) - Dusit Thani Laguna Phuket, Phuket, Thailand
Duration: 10 Jun 2024 - 14 Jun 2024
https://icmr2024.org/

Publication series

Name: ICMR - Proceedings of the International Conference on Multimedia Retrieval

Conference

Conference: 14th International Conference on Multimedia Retrieval (ICMR 2024)
Country/Territory: Thailand
City: Phuket
Period: 10/06/24 - 14/06/24

Funding

This research is supported by the Ministry of Education, Singapore, under its Academic Research Fund Tier 2 (Proposal ID: T2EP20222-0047) and the CityU MF_EXT (project no. 9678180). Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not reflect the views of the Ministry of Education, Singapore.

Research Keywords

  • Ad-hoc video search
  • Concept bank construction
  • Interpretable embedding
  • Large-scale video-text dataset
  • Out of vocabulary
