Large scale semantic concept detection, fusion, and selection for domain adaptive video search


Student thesis: Doctoral Thesis


Author(s)

  • Yugang JIANG

Detail(s)

Award date: 2 Oct 2009

Abstract

This thesis investigates the problem of video search based on semantic concepts. We present approaches to handle three correlated issues that are critical to this problem: (1) how to construct an effective feature representation for semantic concept detection, (2) how to exploit semantic context to improve the detection of these concepts, and (3) how to select the most suitable concept detectors to answer user queries. In particular, as the target videos may come from different domains (genres or sources) with distinctive data characteristics, for each of these issues we need to cope with domain changes.

Video frames are represented by bags of visual words (BoW) derived from local keypoint features, which are invariant to rotation, scale, and illumination. We first conduct a comprehensive study of the representation choices of BoW, including vocabulary size, weighting scheme, stop-word removal, feature selection, spatial information, and visual bi-grams. The aim is to offer practical insights into how these choices impact the performance of BoW for semantic concept detection. We also show how to further augment the BoW representation by exploring the linguistic and ontological aspects of visual words: a visual-word ontology is constructed to hierarchically specify their hyponym relationships, and is incorporated into BoW for an improved video frame representation.

To exploit semantic context, we develop a novel and efficient domain adaptive semantic diffusion algorithm. Inter-concept relationships are modeled with a semantic graph that treats concepts as nodes and concept affinities as edge weights. The graph is applied to refine the initial detection results through a function-level graph diffusion process, aiming to recover the consistency and smoothness of the detection results over the graph. To handle the domain change between training and test sets, the algorithm includes a graph adaptation process that iteratively refines the concept affinities based on the characteristics of the target-domain data. The algorithm is efficient and scales to large data sets.

For the selection of concept detectors, we focus on exploring heterogeneous knowledge sources for better measurement of query-detector similarity. Instead of using WordNet as in most existing works, we exploit the context information associated with Flickr images to estimate the similarity between queries and concept detectors. This similarity measure, named FCS, reflects word correlation in images rather than in text corpora. With an initial detector set selected by FCS for each query, we further propose a semantic context transfer algorithm that adapts the query-detector similarity to a target data set. The adaptation process is highly efficient, satisfying the critical requirement of online video search.

We evaluate all the proposed techniques on the large scale video search benchmarks provided by TRECVID from 2005 to 2008. Experimental evaluations demonstrate promising results and the potential of our techniques to be applied in other applications such as visual object categorization and web-scale image retrieval.
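As a concrete illustration of the BoW representation studied in the first part of the abstract, the sketch below builds a visual-word histogram from local keypoint descriptors. It is a minimal sketch in Python, not the thesis code: the function name, the Gaussian soft-weighting form, and the parameter sigma are illustrative assumptions, and the visual vocabulary is assumed to have been learned offline (e.g. by k-means over training descriptors).

```python
import numpy as np

def bow_histogram(descriptors, vocabulary, soft=False, sigma=90.0):
    # descriptors: (n, d) local keypoint features (e.g. SIFT) from one frame
    # vocabulary:  (k, d) visual-word centroids learned offline (assumption:
    #              k-means; the thesis studies several vocabulary sizes)
    dists = np.linalg.norm(descriptors[:, None, :] - vocabulary[None, :, :],
                           axis=2)
    k = vocabulary.shape[0]
    if soft:
        # Soft weighting: every descriptor votes for all words, weighted by
        # a Gaussian kernel on its distance to each word (illustrative form).
        hist = np.exp(-(dists ** 2) / (2.0 * sigma ** 2)).sum(axis=0)
    else:
        # Hard assignment (term frequency): each descriptor votes once for
        # its nearest visual word.
        hist = np.bincount(dists.argmin(axis=1), minlength=k).astype(float)
    # L1 normalization so frames with different keypoint counts are comparable.
    return hist / max(hist.sum(), 1e-12)
```

Weighting scheme, vocabulary size, and soft versus hard assignment are exactly the kinds of representation choices whose impact the thesis evaluates.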
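The domain adaptive semantic diffusion step can be pictured as gradient descent of the detection scores over a normalized concept graph, with the affinities re-estimated from the target domain. The sketch below is a hedged reconstruction of that idea, not the thesis algorithm verbatim: the step sizes alpha and beta, the iteration count, and the correlation-based affinity update are illustrative assumptions.

```python
import numpy as np

def semantic_diffusion(scores, affinity, alpha=0.2, beta=0.05, iters=10):
    # scores:   (n_shots, n_concepts) initial detector outputs in [0, 1]
    # affinity: (n_concepts, n_concepts) symmetric concept-affinity matrix
    g = scores.copy()
    W = affinity.copy()
    for _ in range(iters):
        np.fill_diagonal(W, 0.0)
        # Symmetrically normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.
        d = np.maximum(W.sum(axis=1), 1e-12)
        L = np.eye(W.shape[0]) - W / np.sqrt(np.outer(d, d))
        # Function-level diffusion: a gradient step that pulls each concept's
        # scores toward those of its strongly related neighbors, recovering
        # smoothness of the detections over the graph.
        g = g - alpha * g @ L
        # Graph adaptation (illustrative update): nudge the affinities toward
        # the correlation of the refined scores on the target-domain data.
        corr = np.nan_to_num(np.corrcoef(g, rowvar=False))
        W = np.clip((1 - beta) * W + beta * corr, 0.0, None)
    return g
```

Because each iteration is a few matrix products over the concept graph, the cost grows linearly with the number of shots, which is consistent with the abstract's claim of scalability to large data sets.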
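For detector selection, the abstract's FCS measure estimates query-detector similarity from the context of Flickr images rather than from a text corpus. The sketch below shows one plausible form of such a measure, a Normalized-Google-Distance-style statistic computed over image-tag occurrence counts and mapped to a similarity; the exact formulation, the count values, and the helper name flickr_context_similarity are assumptions for illustration.

```python
import math

def flickr_context_similarity(n_x, n_y, n_xy, n_total):
    # n_x, n_y: number of images whose context contains word x / word y
    # n_xy:     number of images containing both; n_total: collection size
    if min(n_x, n_y, n_xy) == 0:
        return 0.0
    # NGD-style distance, computed over image counts instead of web pages.
    ngd = ((max(math.log(n_x), math.log(n_y)) - math.log(n_xy)) /
           (math.log(n_total) - min(math.log(n_x), math.log(n_y))))
    # Map the distance to a similarity in (0, 1].
    return math.exp(-ngd)

# Hypothetical example: rank two concept detectors for the query "beach".
# (query count, detector count, co-occurrence count) are made-up numbers.
detectors = {"water": (5_000_000, 1_200_000, 400_000),
             "building": (5_000_000, 3_000_000, 150_000)}
for name, (n_q, n_c, n_qc) in detectors.items():
    print(name, flickr_context_similarity(n_q, n_c, n_qc, 6_000_000_000))
```

Since the statistics come from images, word pairs that co-occur visually (e.g. a query term and a related detector) score higher than they would under purely textual co-occurrence, which is the motivation the abstract gives for FCS.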

Research areas

  • Optical pattern recognition, Digital techniques, Image processing