Fusion of Multimodal Embeddings for Ad-Hoc Video Search
Research output: Chapters, Conference Papers, Creative and Literary Works › RGC 32 - Refereed conference paper (with host publication) › peer-review
Detail(s)
Original language | English |
---|---|
Title of host publication | 2019 International Conference on Computer Vision, Workshop, ICCV 2019 |
Subtitle of host publication | Proceedings |
Publisher | Institute of Electrical and Electronics Engineers |
Pages | 1868-1872 |
ISBN (electronic) | 978-1-7281-5023-9 |
ISBN (print) | 978-1-7281-5024-6 |
Publication status | Published - Oct 2019 |
Publication series
Name | Proceedings - International Conference on Computer Vision Workshop, ICCV |
---|---|
ISSN (Print) | 2473-9936 |
ISSN (electronic) | 2473-9944 |
Conference
Title | 17th IEEE/CVF International Conference on Computer Vision (ICCV 2019) |
---|---|
Location | COEX Convention Center |
Place | Korea, Republic of |
City | Seoul |
Period | 27 October - 2 November 2019 |
Abstract
The challenge of Ad-Hoc Video Search (AVS) originates from free-form (i.e., no pre-defined vocabulary) and free-style (i.e., natural language) query descriptions. Bridging the semantic gap between AVS queries and videos is highly difficult, as evidenced by the low retrieval accuracy of AVS benchmarking in TRECVID.
In this paper, we study a new method to fuse multimodal embeddings that have been derived from completely disjoint datasets. The method is tested on two datasets for two distinct tasks: on MSR-VTT for unique video retrieval and on V3C1 for multiple-video retrieval.
Research Area(s)
- Deep learning, Multimedia, Multimodal embeddings, Multimodal fusion, Video search
Bibliographic Note
Full text of this publication does not contain sufficient affiliation information. With consent from the author(s) concerned, the Research Unit(s) information for this record is based on the existing academic department affiliation of the author(s).
Citation Format(s)
Fusion of Multimodal Embeddings for Ad-Hoc Video Search. / Francis, Danny; Nguyen, Phuong Anh; Huet, Benoit et al.
2019 International Conference on Computer Vision, Workshop, ICCV 2019: Proceedings. Institute of Electrical and Electronics Engineers, 2019. p. 1868-1872 9021984 (Proceedings - International Conference on Computer Vision Workshop, ICCV).