Exploiting Interaction and Co-occurrence of Multi-modal Features for Early Fusion in Retrieval

Project: Research


This project investigates early fusion by assessing the interaction and co-occurrence of different modalities prior to semantic retrieval. Specifically, it exploits the inter-dependency between speech-based text and visual tokens by coupling the two for concept detection during retrieval. The detected concepts serve as semantic filters for answering multimedia queries. To study multi-modal feature interaction, multiple instance learning is proposed, which allows visual-text associations to be learned from a small set of partially labelled examples. Most importantly, the problem of many-to-one region-to-word mapping is addressed through visual flow characterization and the Expectation-Maximization algorithm, in contrast to state-of-the-art techniques, which mostly rely on one-to-one region-to-word translation. In addition to modality interaction, event-based sensory retrieval is examined by characterizing high-level concepts through the co-clustering of multi-modal features. This makes it possible to investigate how different modalities can be used together to answer difficult queries that involve multimedia concepts.


Project number: 9041150
Grant type: GRF
Effective start/end date: 1/01/07 to 1/03/10