Multimedia search by self, external, and crowdsourcing knowledge

Multimedia search: from the original object and related resources to crowd-intelligence analysis

Student thesis: Doctoral Thesis


Author(s)

  • Ting YAO

Detail(s)

Award date: 3 Oct 2014

Abstract

This thesis investigates the problem of multimedia search under the umbrella of knowledge transfer by considering three cases: 1) how to exploit visual patterns in the initial ranked list to boost search precision, 2) how to leverage external knowledge as a prior to aid search, and 3) how to exploit widely available click-through data (i.e., crowdsourced human intelligence) for annotation and search.

A common practice for improving search performance is to rerank the initial visual documents returned by a search engine by seeking consensus among various visual features. We propose a new reranking algorithm, named circular reranking, that reinforces the mutual exchange of information across multiple modalities, following the philosophy that a strong-performing modality can learn from weaker ones, while a weak modality benefits from interacting with stronger ones. Technically, circular reranking conducts multiple runs of random walks, exchanging ranking scores among different features in a cyclic manner. We also study several properties of circular reranking, including how, and in which order, information propagation should be configured to fully exploit the potential of the modalities for reranking.

For the transfer of external knowledge, we first systematically analyze the factors that lead to the success or failure of transferring classifiers. A simple yet practical model is proposed for predicting the transfer from clues such as the distribution shift of the data, the concept category, and the contextual relationships among concepts. Next, we develop semi-supervised domain adaptation with subspace learning and a transfer RankBoost algorithm for one-to-one and multiple-to-one domain adaptation, respectively.
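The cyclic random-walk idea behind circular reranking can be illustrated as follows. This is a minimal sketch, not the thesis's exact formulation: the function name, the damping factor `alpha`, and the assumption that each modality is summarized by a row-stochastic similarity matrix are all illustrative choices.

```python
import numpy as np

def circular_rerank(transition_mats, init_scores, alpha=0.8, n_cycles=5):
    """Sketch of circular reranking: propagate ranking scores through one
    modality's random-walk transition matrix at a time, feeding the result
    into the next modality in a fixed cyclic order.

    transition_mats : list of row-stochastic (n x n) similarity matrices,
                      one per visual modality (hypothetical inputs).
    init_scores     : initial ranking scores from the search engine.
    alpha           : damping factor trading propagation vs. the prior.
    """
    s = init_scores / init_scores.sum()  # normalize to a distribution
    prior = s.copy()
    for _ in range(n_cycles):
        for P in transition_mats:        # cyclic order of modalities
            # random-walk step on this modality, anchored to the prior
            s = alpha * (P.T @ s) + (1 - alpha) * prior
            s = s / s.sum()              # keep scores a distribution
    return s
```

The order in which the matrices appear in `transition_mats` fixes the propagation order, which is exactly the configuration question the thesis studies.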
The former jointly explores invariant low-dimensional structures across domains to correct the mismatch in data distributions, and leverages available unlabeled target examples to exploit the intrinsic structure of the target domain. The latter extends the generic RankBoost learning framework to transfer knowledge from multiple sources.

To investigate the use of click-through data, we devise a novel video similarity measure based on polynomial semantic indexing. Two mappings that project queries and video documents into a common latent space are learnt by minimizing the margin ranking loss over the observed query-video pairs on the click-through bipartite graph. The dot product in the latent space then serves as the similarity function between videos, and this similarity is further applied to three major tasks in video tagging: tag assignment, ranking, and enrichment. Finally, to bridge the user intention gap and allow direct comparison of text queries and visual images, a click-through-based cross-view learning approach is presented for image search. The objective is formalized as latent space learning that jointly minimizes the distance between the mappings of a query and its clicked image in the latent space while preserving the inherent structure of each original space.

We evaluate all the proposed techniques on several large-scale real-world image and video datasets. Experimental evaluations demonstrate promising results and the advantages of applying our techniques to various multimedia search applications.
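The shared ingredient of the click-through methods above is a latent space trained with a margin ranking loss. The sketch below shows that ingredient in its simplest form: linear maps for the two views and a hinge loss that ranks clicked documents above unclicked ones. All names, the linear parameterization, and the plain SGD updates are illustrative assumptions, not the thesis's actual models.

```python
import numpy as np

def train_latent_space(triplets, dq, dv, k=4, margin=1.0, lr=0.05,
                       epochs=200, seed=0):
    """Sketch of click-through-based latent space learning: learn linear
    maps U (queries) and V (documents) so that the dot product of mapped
    vectors ranks clicked documents above unclicked ones, via a margin
    ranking (hinge) loss.

    triplets : iterable of (q, d_pos, d_neg) feature vectors, where d_pos
               was clicked for q on the click-through bipartite graph and
               d_neg was not (hypothetical training data).
    """
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.1, size=(k, dq))  # query-side projection
    V = rng.normal(scale=0.1, size=(k, dv))  # document-side projection
    for _ in range(epochs):
        for q, dp, dn in triplets:
            lq, lp, ln = U @ q, V @ dp, V @ dn
            loss = margin - lq @ lp + lq @ ln    # hinge on the score gap
            if loss > 0:                         # violated pair: SGD step
                U -= lr * np.outer(ln - lp, q)
                V -= lr * (np.outer(-lq, dp) + np.outer(lq, dn))
    return U, V
```

After training, the similarity between a query `q` and a document `d` is simply `(U @ q) @ (V @ d)`, i.e., the dot product in the learnt latent space.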

    Research areas

  • Multimedia systems, Database searching