Visual Relations Augmented Cross-modal Retrieval
Research output: Chapters, Conference Papers, Creative and Literary Works (RGC: 12, 32, 41, 45) › 32_Refereed conference paper (with ISBN/ISSN) › peer-review
Detail(s)
| Original language | English |
|---|---|
| Title of host publication | ICMR '20 |
| Subtitle of host publication | Proceedings of the 2020 International Conference on Multimedia Retrieval |
| Publisher | Association for Computing Machinery |
| Pages | 9-15 |
| ISBN (Print) | 978-1-4503-7087-5 |
| Publication status | Published - Oct 2020 |
Publication series
| Name | ICMR - Proceedings of the International Conference on Multimedia Retrieval |
|---|---|
Conference
| Title | ICMR 2020 |
|---|---|
| Place | Ireland |
| City | Dublin |
| Period | 26 - 29 October 2020 |
Abstract
Retrieving relevant samples across multiple modalities is a primary topic that consistently receives research interest in the multimedia community, and it has benefited various real-world multimedia applications (e.g., text-based image search). Current models mainly focus on learning a unified visual-semantic embedding space to bridge visual content and text queries, aiming to align relevant samples from different modalities as neighbors in the embedding space. However, these models do not consider relations between visual components when learning visual representations, leaving them unable to distinguish images with the same visual components but different relations (see Figure 1). To model visual content precisely, we introduce a novel framework that enhances visual representations with relations between components. Specifically, visual relations are represented by the scene graph extracted from an image, which is then encoded by graph convolutional networks to learn visual relational features. We combine the relational and compositional representations for image-text retrieval. Empirical results on the challenging MS-COCO and Flickr30K datasets demonstrate the effectiveness of our proposed model for the cross-modal retrieval task.
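The encoding step described in the abstract — scene-graph nodes aggregated by a graph convolution, then fused with a global (compositional) image feature — can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function names, feature dimensions, and pooling/fusion choices (mean-pooling, concatenation) are assumptions made for clarity.

```python
import math

def gcn_layer(node_feats, adj, weight):
    """One graph-convolution layer over scene-graph node features.

    node_feats: n x d_in list of node (object/relation) feature vectors
    adj:        n x n 0/1 adjacency matrix from the scene-graph edges
    weight:     d_in x d_out projection matrix
    """
    n = len(node_feats)
    d_in, d_out = len(weight), len(weight[0])
    # Add self-loops, then apply symmetric normalisation D^-1/2 (A+I) D^-1/2.
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    # Aggregate neighbour features.
    agg = [[sum(norm[i][k] * node_feats[k][d] for k in range(n))
            for d in range(d_in)] for i in range(n)]
    # Linear projection followed by ReLU.
    return [[max(0.0, sum(agg[i][d] * weight[d][o] for d in range(d_in)))
             for o in range(d_out)] for i in range(n)]

def image_embedding(node_feats, adj, weight, global_feat):
    """Mean-pool the relational (GCN) node features and concatenate them
    with the compositional (global) image feature, as the abstract outlines."""
    rel = gcn_layer(node_feats, adj, weight)
    n, d_out = len(rel), len(rel[0])
    pooled = [sum(r[o] for r in rel) / n for o in range(d_out)]
    return pooled + list(global_feat)

# Toy example: a 3-node scene graph ("man" -- "rides" -- "bike").
nodes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
w_identity = [[1.0, 0.0], [0.0, 1.0]]
emb = image_embedding(nodes, edges, w_identity, global_feat=[0.5, 0.5])
```

The resulting `emb` is the image-side vector that would be matched against text embeddings (e.g., by cosine similarity) in the retrieval stage.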
Research Area(s)
- Image-text retrieval, Scene graph, Visual relation, Visual-semantic embedding
Citation Format(s)
Visual Relations Augmented Cross-modal Retrieval. / Guo, Yutian; Chen, Jingjing; Zhang, Hao; Jiang, Yu-Gang.
ICMR '20: Proceedings of the 2020 International Conference on Multimedia Retrieval. Association for Computing Machinery, 2020. p. 9-15 (ICMR - Proceedings of the International Conference on Multimedia Retrieval).