Video hyperlinking for multimedia search

基於視頻超鏈接的多媒體檢索

Student thesis: Doctoral Thesis

Author(s)

  • Hung Khoon TAN

Detail(s)

Award date: 4 Oct 2010

Abstract

With the spread of Web 2.0, web videos have become prevalent online. There is a growing need for effective modeling and organization of video data to facilitate browsing and retrieval. Meanwhile, the modeling of web pages through hyperlink graphs has seen tremendous success with algorithms such as PageRank for web page ranking. Applying the same approach to video documents nevertheless meets many challenges. In contrast to web pages, hyperlinks between videos do not exist in reality, and the automatic creation of such links to interconnect related videos is a fundamental and important step for a variety of multimedia tasks. Furthermore, web videos are heterogeneous documents accompanied by various user-supplied information sources. In this thesis, we investigate two research issues related to video hyperlinking: (a) construction of a video hyperlink graph by means of partial near-duplicate links and (b) fusion of multiple hyperlink graphs constructed from auxiliary modalities of video documents.

The main focus of this thesis is on the first issue, which investigates how to create partial-duplicate links among videos to form a media network inter-relating different portions of videos. We consider the mining and localization of near-duplicate segments at arbitrary positions of partial near-duplicate videos in a corpus. Scalable detection is achieved by jointly considering visual similarity and temporal consistency, where temporal constraints are embedded into a network structure as directed edges. Through this structure, partial alignment is converted, in a novel way, into a network flow problem for which highly efficient solutions exist. To handle multiple alignments, we consider two properties of the network structure, conciseness and divisibility, for efficient and effective mining. To ensure precision, frame-level matching is further integrated into the temporal network for alignment verification. This results in an iterative alignment-verification procedure that fine-tunes the localization of near-duplicate segments.

In the second issue, multiple relational graphs are considered. Besides partial-duplicate links, different context graphs are constructed using heterogeneous sources of information associated with videos, i.e., user tags, descriptions and titles. The difficulty of fusing these graphs comes from the fact that the modalities derived from web videos are often noisy, diverse and in conflict with each other. We investigate how the agreement among heterogeneous modalities can be exploited to guide data fusion. The problem of fusion is cast as the simultaneous mining of agreement from different modalities and adaptation of fusion weights to construct a fused graph from these modalities. An iterative framework based on agreement-fusion optimization is thus proposed. We plug two well-known algorithms, random walk and semi-supervised learning, into this framework to illustrate how agreement (conflict) is incorporated (compromised) in the case of uniform and adaptive fusion.
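
As an illustration of the temporal network idea from the first issue, the following is a minimal sketch and not the thesis's actual algorithm. It assumes candidate frame matches are already available as (query frame, reference frame, similarity) triples, links two matches with a directed edge only when both timelines move forward by at most a hypothetical max_gap, and recovers one aligned segment as the highest-scoring path through the resulting acyclic network. A dynamic-programming path search stands in here for the network flow formulation named in the abstract; the function name align_segment and its parameters are assumptions made for this example.

```python
from typing import List, Tuple

# (query_frame, reference_frame, visual_similarity); hypothetical input format
Match = Tuple[int, int, float]


def align_segment(matches: List[Match], max_gap: int = 5) -> Tuple[float, List[Match]]:
    """Return the best-scoring temporally consistent chain of frame matches."""
    if not matches:
        return 0.0, []

    # Nodes of the temporal network, ordered by query frame then reference frame.
    nodes = sorted(matches)
    best = [m[2] for m in nodes]   # best chain score ending at each node
    prev = [-1] * len(nodes)       # back-pointers for path recovery

    for j, (qj, rj, sj) in enumerate(nodes):
        for i in range(j):
            qi, ri, _ = nodes[i]
            # Directed edge i -> j: both videos advance, by at most max_gap frames.
            if 0 < qj - qi <= max_gap and 0 < rj - ri <= max_gap:
                if best[i] + sj > best[j]:
                    best[j] = best[i] + sj
                    prev[j] = i

    # Trace back from the highest-scoring end node.
    end = max(range(len(nodes)), key=best.__getitem__)
    path = []
    k = end
    while k != -1:
        path.append(nodes[k])
        k = prev[k]
    path.reverse()
    return best[end], path


if __name__ == "__main__":
    candidates = [(0, 10, 0.9), (1, 11, 0.8), (2, 12, 0.9), (1, 40, 0.95), (3, 5, 0.7)]
    score, segment = align_segment(candidates)
    print(score, segment)
```

In this toy input, the match (1, 40) has the highest individual similarity but cannot be chained with its neighbours, so the recovered segment follows the temporally consistent matches instead. This mirrors, in a simplified way, how visual similarity and temporal consistency are considered jointly.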

    Research areas

  • Database searching, Multimedia systems