Large scale semantic concept detection, fusion, and selection for domain adaptive video search
大規模語義概念的檢測, 融合及選擇進行數據域自適應視頻檢索
Student thesis: Doctoral Thesis
Detail(s)
Award date | 2 Oct 2009 |
---|---|
Link(s)
Permanent Link | https://scholars.cityu.edu.hk/en/theses/theses(ac130d8c-9a2c-40e2-ada1-8a42ecc177ef).html |
---|---|
Abstract
This thesis investigates the problem of video search based on semantic concepts.
We present approaches to handle three correlated issues that are critical to this
problem: (1) how to construct an effective feature representation for semantic
concept detection, (2) how to exploit semantic context to improve the detection of
these concepts, and (3) how to select the most suitable concept detectors to answer
user queries. In particular, as the target videos may come from different domains
(genres or sources) with distinctive data characteristics, for each of the issues, we
will need to cope with the domain changes.
Video frames are represented by bag-of-visual-words (BoW) derived from local
keypoint features, which are invariant to rotation, scale and illumination. We first
conduct a comprehensive study on the representation choices of BoW, including
vocabulary size, weighting scheme, stop word removal, feature selection, spatial
information, and visual bi-gram. The aim is to offer practical insights into how
these choices will impact the performance of BoW for semantic concept detection.
We also show how to further augment the BoW representation by exploring
the linguistic and ontological aspects of visual words. A visual-word ontology is
constructed to hierarchically specify their hyponym relationship, which is
incorporated into BoW for improved video frame representation.
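
As a rough illustration of the basic BoW step described above, the following Python sketch quantizes local descriptors against a visual vocabulary and builds a term-frequency or binary histogram. It is a minimal sketch under stated assumptions, not the thesis's implementation: the function name `bow_histogram`, the nearest-neighbour (hard) assignment, and the two weighting options are illustrative choices, and the vocabulary is assumed to have been built beforehand (for example by clustering keypoint descriptors).

```python
import numpy as np

def bow_histogram(descriptors, vocabulary, binary=False):
    """Map local descriptors to their nearest visual words and count them.

    descriptors: (n, d) array of keypoint descriptors from one frame.
    vocabulary:  (k, d) array of visual words (assumed precomputed).
    binary:      if True, record only word presence; otherwise use
                 normalized term frequency.
    """
    descriptors = np.asarray(descriptors, dtype=float)
    vocabulary = np.asarray(vocabulary, dtype=float)
    # Squared Euclidean distance from every descriptor to every visual word.
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    # Hard assignment: each descriptor votes for its closest word.
    words = d2.argmin(axis=1)
    counts = np.bincount(words, minlength=len(vocabulary)).astype(float)
    if binary:
        return (counts > 0).astype(float)
    total = counts.sum()
    return counts / total if total > 0 else counts

# Toy example with a hypothetical two-word vocabulary.
vocab = np.array([[0.0, 0.0], [10.0, 10.0]])
desc = np.array([[0.0, 1.0], [1.0, 0.0], [9.0, 9.0]])
hist = bow_histogram(desc, vocab)
```

Vocabulary size and the weighting switch are exactly the kind of representation choices the study compares; the other factors listed (stop word removal, spatial information, bi-grams) are not covered by this sketch.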
To exploit semantic context, we develop a novel and efficient domain adaptive
semantic diffusion algorithm. Inter-concept relationship is modeled using a
semantic graph, which treats concepts as nodes and the concept affinities as the
weights of edges. It is then applied to refine the initial detection results through
a function level graph diffusion process, aiming to recover the consistency and
smoothness of the detection results over the graph. To handle the domain change
between training and test sets, our algorithm involves a graph adaptation process
which iteratively refines the concept affinity based on the target domain data
characteristics. This algorithm is efficient and scalable to large scale data sets.
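
The idea of smoothing detection scores over a concept graph can be sketched as follows. This is a generic graph-Laplacian smoothing written as an assumption for illustration; the abstract does not give the exact energy function or update rule used in the thesis, and the graph adaptation step is omitted. The names `diffuse_scores`, `step`, and `iterations` are hypothetical.

```python
import numpy as np

def diffuse_scores(scores, affinity, step=0.1, iterations=20):
    """Smooth detector outputs over a concept graph by gradient descent.

    scores:   (n_samples, n_concepts) initial detection scores.
    affinity: (n_concepts, n_concepts) symmetric concept affinity matrix.

    Each iteration moves against the gradient of the energy
    sum_ij W_ij * ||g_i - g_j||^2, where g_i is the score column of
    concept i (constant factors are folded into `step`).
    """
    W = np.asarray(affinity, dtype=float)
    laplacian = np.diag(W.sum(axis=1)) - W
    G = np.asarray(scores, dtype=float).copy()
    for _ in range(iterations):
        G -= step * G @ laplacian
    return G

# Two strongly related concepts: their scores are pulled toward each other.
W = np.array([[0.0, 1.0], [1.0, 0.0]])
initial = np.array([[1.0, 0.0]])
refined = diffuse_scores(initial, W)
```

In this toy case the gap between the two scores shrinks at every iteration while their sum stays constant, which is the "consistency and smoothness" effect in its simplest form. The step size must be small relative to the affinity weights for the iteration to remain stable.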
For the selection of concept detectors, we focus on exploring heterogeneous
knowledge sources for better measurement of query-detector similarity. Instead
of using WordNet as in most existing works, we exploit the context information
associated with Flickr images to estimate the similarity between queries and
concept detectors. This similarity measure, named FCS, reflects the word correlation
in images rather than text corpora. With an initial detector set selected by FCS
for each query, we further propose a semantic context transfer algorithm that
adapts the query-detector similarity to a target data set. The adaptation process
is highly efficient, satisfying the critical requirement of online video search.
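
To make the detector selection step concrete, the sketch below ranks detectors for a query word using a co-occurrence similarity computed from image tag counts. The abstract does not state the FCS formula, so the normalized-distance form used here is an assumption in the style of the Normalized Google Distance. The counts, the function names, and the parameter `k` are hypothetical, and the subsequent semantic context transfer step is not shown.

```python
import math

def cooccurrence_similarity(count_x, count_y, count_xy, total):
    """Similarity in (0, 1] from tag counts; 0.0 if the words never co-occur.

    count_x, count_y: number of images tagged with each word.
    count_xy:         number of images tagged with both words.
    total:            collection size (assumed larger than either count).
    """
    if count_xy == 0:
        return 0.0
    lx, ly, lxy = math.log(count_x), math.log(count_y), math.log(count_xy)
    distance = (max(lx, ly) - lxy) / (math.log(total) - min(lx, ly))
    return math.exp(-distance)

def select_detectors(query_word, detectors, counts, pair_counts, total, k=3):
    """Return the k detectors most similar to the query word."""
    scored = []
    for name in detectors:
        pair = pair_counts.get(frozenset((query_word, name)), 0)
        sim = cooccurrence_similarity(counts[query_word], counts[name], pair, total)
        scored.append((sim, name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]

# Hypothetical tag statistics for illustration only.
counts = {"boat": 100, "water": 200, "office": 150}
pair_counts = {
    frozenset(("boat", "water")): 80,
    frozenset(("boat", "office")): 1,
}
top = select_detectors("boat", ["water", "office"], counts, pair_counts, 10000, k=1)
```

Because the statistics come from tags attached to images, words that appear together visually (such as "boat" and "water") score higher than words that are only related in text.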
We evaluate all the proposed techniques on large scale video search benchmarks
provided by TRECVID from years 2005 to 2008. Experimental evaluations
demonstrate promising results of our techniques, and their potential to be
applied to other applications such as visual object categorization and web scale
image retrieval.
- Optical pattern recognition, Digital techniques, Image processing