Semantic-based video retrieval has long been recognized as one of the hardest problems in multimedia computing. The challenges include 1) the lack of correspondence between low-level features and user expectations, which gives rise to the well-known "semantic gap", and 2) the fact that most users are accustomed to text-based queries, while the effectiveness of searching for videos with a few text keywords remains questionable. One popular search methodology that addresses these challenges is Concept-based Video Search (CBVS), in which a set of semantic concept detectors is developed for predicting query semantics. Under CBVS, detectors that can interpret the search intention are selected and fused through reasoning to answer queries. This thesis addresses three main open research issues in concept detector selection and fusion for CBVS: which detectors should be selected to answer a given text query, how many of them, and how to fuse them. Two novel spaces, namely a semantic space and a context space, are proposed and developed to provide computable platforms for inter-concept relationship modeling and reasoning. With these spaces, detectors can be reasoned over and fused uniformly for large-scale video search.
This thesis first proposes a novel construction of a semantic space for determining concept similarity globally. In contrast to conventional ontology reasoning, such as that over WordNet, this space enables a uniform and global similarity measure of inter-concept relationships. In this space, basis vectors are formed by modeling the ontological relationships among concepts, and each concept is represented as a vector for measuring similarity. Because ontology knowledge is taken into account when building the semantic space, we call the space "ontology-enriched". We propose two variants of the semantic space, distinguished by whether the basis vectors are orthogonal. The first is named Ontology-enriched Semantic Space (OSS), while the second is called Ontology-enriched Orthogonal Semantic Space (OS2). Both OSS and OS2 are successfully demonstrated on several tasks, including concept detector selection, word sense disambiguation, and search.
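To make the construction concrete, the following is a minimal sketch assuming hypothetical basis concepts and affinity values; the thesis's actual basis, derived from WordNet-style ontological relationships, is not reproduced here. It only illustrates how a concept-as-vector representation yields a uniform similarity measure.

```python
# Minimal sketch: concepts as vectors over ontology-derived basis axes.
# The basis, the vectors, and the name `concept_vectors` are hypothetical.
import numpy as np

# Hypothetical basis axes, e.g. ("vehicle", "person", "outdoor").
concept_vectors = {
    "car":   np.array([0.9, 0.1, 0.6]),
    "truck": np.array([0.8, 0.1, 0.7]),
    "crowd": np.array([0.1, 0.9, 0.5]),
}

def similarity(a: str, b: str) -> float:
    """Cosine similarity between two concepts in the semantic space."""
    u, v = concept_vectors[a], concept_vectors[b]
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(similarity("car", "truck"))  # high: ontologically close concepts
print(similarity("car", "crowd"))  # low: weakly related concepts
```

OS2 would additionally orthogonalize the basis (for instance with a Gram-Schmidt step), which this sketch omits.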
In addition to the semantic space, a context space is proposed to exploit the fact that semantic concepts do not exist in isolation but are correlated with each other. Exploiting such contextual relationships can also greatly facilitate concept selection and fusion. The developed context space considers the global consistency of concept relationships, addresses the problem of missing annotations, and is extensible to cross-domain detector fusion. The space can be built by modeling inter-concept relationships through annotations provided by either manual labeling or machine tagging. The context space has been successfully demonstrated on the task of Context-Based Concept Fusion (CBCF) in both concept detector development and search.
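The following is a minimal sketch of how such a space might be estimated from annotations, assuming a toy binary annotation matrix and Pearson correlation as the affinity statistic; neither is claimed to be the thesis's exact model.

```python
# Minimal sketch: a context space estimated from concept co-occurrence
# in annotated videos, then used for CBCF-style score refinement.
# The annotation matrix and the smoothing weights are illustrative.
import numpy as np

# Rows are videos, columns are the concepts (car, road, person);
# 1 means the concept is annotated as present.
annotations = np.array([
    [1, 1, 0],   # video 1: car, road
    [1, 1, 1],   # video 2: car, road, person
    [0, 0, 1],   # video 3: person
    [1, 0, 0],   # video 4: car
], dtype=float)

# Pairwise correlation between concept columns acts as the context space:
# context[i, j] is the learned affinity between concepts i and j.
context = np.corrcoef(annotations, rowvar=False)

# CBCF-style refinement: each detector's raw score on a shot is smoothed
# with the scores of its correlated neighbors (illustrative weighting).
raw = np.array([0.7, 0.2, 0.4])
refined = 0.5 * raw + 0.5 * (context @ raw) / len(raw)
print(refined)
```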
With the semantic and context spaces in place, a novel multi-level fusion framework is then proposed for CBVS. To answer a query, the framework considers several aspects of the detectors, including their semantics, context, reliability, and diversity. In the concept selection step, the number of appropriate detectors is adaptively determined by joint reasoning in the semantic and context spaces. In the fusion step, the selected detectors are combined hierarchically, where each level of fusion emphasizes one aspect of the detectors. Experimental results obtained with our methodology on the TRECVID datasets from 2005 to 2008 are very encouraging and demonstrate state-of-the-art performance for CBVS.
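A minimal sketch of this two-step query answering is given below, assuming a fixed selection threshold and a linear reliability-weighted combination; the threshold, weights, and function names are hypothetical placeholders rather than the thesis's actual fusion hierarchy.

```python
# Minimal sketch: adaptive detector selection followed by weighted fusion.
import numpy as np

def answer_query(query_sim, detector_scores, reliability, threshold=0.5):
    """query_sim: semantic relatedness of each detector to the query.
    detector_scores: (n_detectors, n_shots) prediction matrix.
    reliability: per-detector accuracy estimate (e.g., validation AP).
    Returns one fused relevance score per video shot."""
    # Selection: keep detectors sufficiently related to the query, so the
    # number of selected detectors adapts to the query instead of being fixed.
    keep = query_sim >= threshold
    # Fusion: weight each selected detector by both its semantic
    # relatedness and its reliability, then combine linearly.
    w = (query_sim * reliability)[keep]
    return w @ detector_scores[keep] / w.sum()

sim = np.array([0.8, 0.6, 0.2])    # e.g. detectors "car", "road", "animal"
scores = np.random.rand(3, 5)      # three detectors scoring five shots
rel = np.array([0.7, 0.5, 0.9])
print(answer_query(sim, scores, rel))
```

In the actual framework the fusion is hierarchical, with each level emphasizing one aspect (semantics, context, reliability, diversity); this flat weighted sum only conveys the selection-then-fusion flow.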
| Date of Award | 2 Oct 2009 |
|---|---|
| Original language | English |
| Awarding Institution | City University of Hong Kong |
| Supervisor | Chong Wah NGO (Supervisor) |
- Optical pattern recognition
- Image processing
- Digital techniques
Concept-based video search by semantic and context reasoning
WEI, X. (Author). 2 Oct 2009
Student thesis: Doctoral Thesis