Multimedia search by self, external, and crowdsourcing knowledge
多媒體搜索 : 從原始對象, 相關資源到群體智慧分析 (Multimedia search: from original objects and related resources to crowd-intelligence analysis)
Student thesis: Doctoral Thesis
Detail(s)
Award date: 3 Oct 2014
Link(s)
Permanent Link: https://scholars.cityu.edu.hk/en/theses/theses(261895cd-1e88-4520-ae29-c39ba7f6b6db).html
Abstract
This thesis investigates the problem of multimedia search under the umbrella of
knowledge transfer by considering three cases: 1) how to exploit visual patterns
from the initial ranked list to boost search precision, 2) how to leverage external
knowledge as a prior to aid the search, and 3) how to exploit the widely available
click-through data (i.e., crowdsourced human intelligence) for annotation and
search.
A common practice for improving search performance is to rerank the initial
visual documents returned by a search engine by seeking consensus among various
visual features. We propose a new reranking algorithm, named circular reranking,
that reinforces the mutual exchange of information across multiple modalities,
following the philosophy that a strong-performing modality can learn from weaker
ones, while a weak modality benefits from interacting with stronger ones.
Technically, circular reranking conducts multiple runs of random walks,
exchanging ranking scores among different features in a cyclic manner. Moreover,
we study several properties of circular reranking, including how, and in which
order, information propagation should be configured to fully exploit the
potential of each modality for reranking.
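The cyclic exchange of ranking scores can be sketched as follows. This is a minimal illustration of the idea, not the thesis's exact formulation: the affinity matrices, the restart weight `alpha`, and the number of cyclic rounds are all assumptions made here for concreteness.

```python
import numpy as np

def random_walk(W, scores, alpha=0.8, iters=20):
    """Smooth ranking scores over one modality's affinity graph W,
    with restart to the incoming scores (personalized-PageRank style)."""
    P = W / W.sum(axis=1, keepdims=True)      # row-stochastic transitions
    s = scores.copy()
    for _ in range(iters):
        s = alpha * (P.T @ s) + (1 - alpha) * scores
    return s

def circular_rerank(affinities, init_scores, rounds=3):
    """Pass ranking scores cyclically through each modality's graph:
    modality 1 -> 2 -> ... -> M -> 1, for several rounds."""
    s = init_scores.copy()
    for _ in range(rounds):
        for W in affinities:                   # cyclic information exchange
            s = random_walk(W, s)
    return s
```

With two toy modalities whose graphs both link document 1 tightly to the top-scored document 0, the walk propagates score mass to document 1 ahead of the unrelated document 2.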
For the transfer of external knowledge, we first systematically analyze the
different factors that lead to the success or failure of transferring classifiers. A
simple yet innovative and practical model is proposed for predicting transferability
from clues such as the distribution shift of the data, the concept category, and the
contextual relationships among concepts. Next, we develop semi-supervised domain
adaptation with subspace learning and a transfer RankBoost algorithm for one-to-one
and multiple-to-one domain adaptation, respectively. The former jointly explores
invariant low-dimensional structures across domains to correct the data distribution
mismatch, and leverages available unlabeled target examples to exploit the intrinsic
information underlying the target domain. The latter extends the generic RankBoost
learning framework to transfer knowledge from multiple sources.
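For the one-to-one case, the flavor of subspace-based adaptation can be conveyed with a classical subspace-alignment sketch: PCA bases of source and target are related by a linear map so that features from both domains become comparable. This is an assumed stand-in for illustration only, not the thesis's semi-supervised algorithm, which additionally exploits unlabeled target examples.

```python
import numpy as np

def pca_basis(X, d):
    """Top-d principal directions of X as a (features, d) basis."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:d].T

def subspace_align(Xs, Xt, d=10):
    """Project source Xs and target Xt into aligned d-dim subspaces.
    The map M = Ps^T Pt rotates the source basis toward the target's,
    correcting (part of) the distribution mismatch between domains."""
    Ps, Pt = pca_basis(Xs, d), pca_basis(Xt, d)
    M = Ps.T @ Pt                       # source-to-target alignment matrix
    return Xs @ Ps @ M, Xt @ Pt         # comparable low-dimensional features
```

A classifier trained on the aligned source features can then be applied directly to the projected target features.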
To investigate the use of click-through data, we devise a novel video similarity
measurement based on polynomial semantic indexing. Two mappings that project
queries and video documents into a common latent space are learnt by minimizing
the margin ranking loss over the observed query-video pairs on the click-through
bipartite graph. The dot product in the latent space is then taken as the similarity
function between videos, and this video similarity is further applied to three major
tasks in video tagging: tag assignment, ranking, and enrichment. Finally, to bridge
the user intention gap and allow direct comparison of text queries and visual
images, a click-through-based cross-view learning approach is presented for image
search. The objective is formalized as latent space learning that jointly minimizes
the distance between the mappings of query and image in the latent space and
preserves the inherent structure of each original space.
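The margin-ranking objective over click pairs can be sketched with two linear maps trained by SGD: a clicked (query, video) pair should outscore a randomly sampled negative video by a margin, and the dot product of latent codes serves as the video similarity. This is a minimal linear sketch; the latent dimension, learning rate, and negative-sampling scheme are assumptions, and the thesis's polynomial semantic indexing is richer than this.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_latent_maps(Q, V, clicks, dim=8, lr=0.05, margin=1.0, epochs=100):
    """SGD on a margin ranking loss over observed (query, video) click pairs:
    hinge(margin - score(q, clicked) + score(q, negative))."""
    Wq = rng.normal(scale=0.1, size=(Q.shape[1], dim))
    Wv = rng.normal(scale=0.1, size=(V.shape[1], dim))
    for _ in range(epochs):
        for q, v in clicks:
            neg = int(rng.integers(V.shape[0]))        # sampled negative video
            zq, zp, zn = Q[q] @ Wq, V[v] @ Wv, V[neg] @ Wv
            if margin - zq @ zp + zq @ zn > 0:         # hinge is active
                Wq -= lr * np.outer(Q[q], zn - zp)     # sub-gradient steps
                Wv -= lr * (np.outer(V[neg], zq) - np.outer(V[v], zq))
    return Wq, Wv

def video_similarity(V, Wv):
    Z = V @ Wv                  # map videos into the latent space
    return Z @ Z.T              # dot product as the similarity function
```

The resulting similarity matrix can then drive tag assignment, ranking, and enrichment by propagating tags among latently similar videos.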
We evaluate all the proposed techniques on several large-scale, real-world image
and video datasets. Experimental evaluations demonstrate promising results and
the advantages of our techniques when applied to various multimedia search
applications.
- Multimedia systems, Database searching