Reinforcement Learning-Based Interactive Video Search

Research output: Chapters, Conference Papers, Creative and Literary Works; RGC 32 - Refereed conference paper (with host publication); peer-review

4 Scopus Citations

Detail(s)

Original language: English
Title of host publication: MultiMedia Modeling
Subtitle of host publication: 28th International Conference, MMM 2022, Phu Quoc, Vietnam, June 6–10, 2022, Proceedings, Part II
Editors: Björn Þór Jónsson, Cathal Gurrin, Minh-Triet Tran, Duc-Tien Dang-Nguyen, Anita Min-Chun Hu, Binh Huynh Thi Thanh, Benoit Huet
Place of Publication: Cham
Publisher: Springer
Pages: 549-555
ISBN (electronic): 978-3-030-98355-0
ISBN (print): 978-3-030-98354-3
Publication status: Published - 2022

Publication series

Name: Lecture Notes in Computer Science
Volume: 13142
ISSN (Print): 0302-9743
ISSN (electronic): 1611-3349

Conference

Title: 28th International Conference on MultiMedia Modeling (MMM 2022)
Location: Phu Quoc Island (on-site and on-line)
Place: Viet Nam
City: Phu Quoc
Period: 6 - 10 June 2022

Abstract

Despite rapid progress in text-to-video search driven by advances in cross-modal representation learning, existing techniques still fall short in helping users quickly identify search targets. In particular, when a system suggests a long list of similar candidates, the user must painstakingly inspect every search result. The experience is frustrating because similar clips are watched repeatedly and, worse, search targets may be overlooked due to mental fatigue. This paper explores reinforcement learning (RL)-based search to relieve the user of the burden of brute-force inspection. Specifically, the system maintains a graph connecting shots based on their temporal and semantic relationships. Using the navigation paths outlined by the graph, an RL agent learns to seek a path that maximizes the reward based on continuous user feedback. In each round of interaction, the system recommends the single most likely video candidate for the user to inspect. In addition to RL, two incremental changes are introduced to improve the VIREO search engine. First, the dual-task cross-modal representation learning is revised to index phrases and to model user queries and unlikelihood relationships more effectively. Second, two additional deep features, extracted from SlowFast and Swin-Transformer respectively, are used in dual-task model training. Substantial improvement is observed on the automatic Ad-hoc Video Search (AVS) task on the V3C1 dataset.
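
The graph-plus-agent idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the learning algorithm, the reward, or how the graph is built, so everything below is an assumption made for illustration. Shots are linked to their temporal neighbours in the same video and to their most similar shots by cosine similarity, tabular Q-learning stands in for the RL agent, and user feedback is simulated as a reward of 1 when the agent steps onto a known target shot. The names build_shot_graph, train_q and recommend_path are hypothetical.

```python
import random


def build_shot_graph(shots, embeddings, top_k=1):
    """Link shots by temporal adjacency and by embedding similarity.

    shots: list of (video_id, shot_index) pairs (illustrative format).
    embeddings: one feature vector per shot, in the same order.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(y * y for y in b) ** 0.5
        return dot / (na * nb) if na and nb else 0.0

    index = {shot: i for i, shot in enumerate(shots)}
    graph = {i: set() for i in range(len(shots))}
    for i, (video, pos) in enumerate(shots):
        # Temporal edges: previous and next shot of the same video.
        for neighbour in ((video, pos - 1), (video, pos + 1)):
            if neighbour in index:
                graph[i].add(index[neighbour])
        # Semantic edges: the top_k most similar other shots.
        others = [j for j in range(len(shots)) if j != i]
        others.sort(key=lambda j: cosine(embeddings[i], embeddings[j]),
                    reverse=True)
        for j in others[:top_k]:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def train_q(graph, target, episodes=2000, max_steps=10,
            alpha=0.5, gamma=0.9, epsilon=0.3, seed=0):
    """Tabular Q-learning over graph edges (assumed stand-in for the agent).

    Simulated feedback: reward 1.0 when the chosen shot is the target.
    """
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in graph for a in graph[s]}
    nodes = sorted(graph)
    for _ in range(episodes):
        state = rng.choice(nodes)
        for _ in range(max_steps):
            if state == target:
                break
            actions = sorted(graph[state])
            if rng.random() < epsilon:
                action = rng.choice(actions)          # explore
            else:
                action = max(actions, key=lambda a: q[(state, a)])  # exploit
            reward = 1.0 if action == target else 0.0
            future = 0.0 if action == target else max(
                (q[(action, a)] for a in graph[action]), default=0.0)
            q[(state, action)] += alpha * (
                reward + gamma * future - q[(state, action)])
            state = action
    return q


def recommend_path(graph, q, start, target, max_steps=10):
    """Follow the highest-valued edge from start, one shot per round."""
    path = [start]
    state = start
    while state != target and len(path) <= max_steps:
        state = max(sorted(graph[state]), key=lambda a: q[(state, a)])
        path.append(state)
    return path
```

In an interactive setting the target is unknown, so the reward would have to come from the user's responses in each round rather than from a fixed target; the simulated reward here only keeps the sketch self-contained.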

Research Area(s)

  • Feature enhancement, Interactive video retrieval, Query understanding, Reinforcement learning

Citation Format(s)

Reinforcement Learning-Based Interactive Video Search. / Ma, Zhixin; Wu, Jiaxin; Hou, Zhijian et al.
MultiMedia Modeling: 28th International Conference, MMM 2022, Phu Quoc, Vietnam, June 6–10, 2022, Proceedings, Part II. ed. / Björn Þór Jónsson; Cathal Gurrin; Minh-Triet Tran; Duc-Tien Dang-Nguyen; Anita Min-Chun Hu; Binh Huynh Thi Thanh; Benoit Huet. Cham: Springer, 2022. p. 549-555 (Lecture Notes in Computer Science; Vol. 13142).
