Leveraging LLMs and Generative Models for Interactive Known-Item Video Search

Research output: Chapters, Conference Papers, Creative and Literary Works; RGC 32 - Refereed conference paper (with host publication); peer-review

Author(s)

Ma, Zhixin; Wu, Jiaxin; Ngo, Chong Wah

Detail(s)

Original language: English
Title of host publication: MultiMedia Modeling
Subtitle of host publication: 30th International Conference, MMM 2024, Amsterdam, The Netherlands, January 29 – February 2, 2024, Proceedings, Part IV
Editors: Stevan Rudinac, Alan Hanjalic, Cynthia Liem, Marcel Worring, Björn Þór Jónsson, Bei Liu, Yoko Yamakata
Place of Publication: Cham
Publisher: Springer
Pages: 380-386
ISBN (electronic): 978-3-031-53302-0
ISBN (print): 978-3-031-53301-3
Publication status: Published - 2024

Publication series

Name: Lecture Notes in Computer Science
Volume: 14557
ISSN (Print): 0302-9743
ISSN (electronic): 1611-3349

Conference

Title: 30th International Conference on MultiMedia Modeling (MMM 2024)
Place: Netherlands
City: Amsterdam
Period: 29 January - 2 February 2024

Abstract

While embedding techniques such as CLIP have considerably boosted search performance, user strategies in interactive video search still largely operate on a trial-and-error basis. Users often have to adjust their queries manually and inspect the search results carefully, so the outcome depends heavily on each user's capability and proficiency. Recent advances in large language models (LLMs) and generative models offer promising avenues for enhancing interactivity in video retrieval and reducing personal bias in query interpretation, particularly in known-item search. Specifically, LLMs can expand and diversify the semantics of a query while avoiding grammar mistakes and language barriers. In addition, generative models can imagine or visualize a verbose query as images. We integrate these new LLM capabilities into our existing system and evaluate their effectiveness on the V3C1 and V3C2 datasets. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.

Research Area(s)

  • Generative Model, Interactive Video Retrieval, Known-Item Search, Large Language Models

Citation Format(s)

Leveraging LLMs and Generative Models for Interactive Known-Item Video Search. / Ma, Zhixin; Wu, Jiaxin; Ngo, Chong Wah.
MultiMedia Modeling: 30th International Conference, MMM 2024, Amsterdam, The Netherlands, January 29 – February 2, 2024, Proceedings, Part IV. ed. / Stevan Rudinac; Alan Hanjalic; Cynthia Liem; Marcel Worring; Björn Þór Jónsson; Bei Liu; Yoko Yamakata. Cham: Springer, 2024. p. 380-386 (Lecture Notes in Computer Science; Vol. 14557).
