Video hyperlinking for multimedia search
基於視頻超鏈接的多媒體檢索
Student thesis: Doctoral Thesis
Author(s)
Related Research Unit(s)
Detail(s)
Awarding Institution | |
---|---|
Supervisors/Advisors |
|
Award date | 4 Oct 2010 |
Link(s)
Permanent Link | https://scholars.cityu.edu.hk/en/theses/theses(65692fed-0b24-4de5-b589-f3a3280bf457).html |
---|---|
Abstract
With the spread of Web 2.0, web videos have become prevalent online. There is
a growing need for effective modeling and organization of video data to facilitate
browsing or retrieval. Meanwhile, modeling web pages through hyperlink
graphs has seen tremendous success, with algorithms such as PageRank for web
page ranking. Applying the same approach to video documents nevertheless
faces many challenges. In contrast to web pages, hyperlinks between
videos do not exist in reality, and the automatic creation of such links to interconnect
related videos is a fundamental step for a variety of
multimedia tasks. Furthermore, web videos are heterogeneous documents accompanied
by various user-supplied information sources. In this thesis, we investigate
two research issues related to video hyperlinking: (a) construction of video hyperlink
graph by means of partial near-duplicate links and (b) fusion of multiple
hyperlink graphs constructed from auxiliary modalities of video documents.
The main focus of this thesis is on the first issue, which investigates how to
create partial-duplicate links among videos to form a media network inter-relating
different portions of videos. We consider the mining and localization of near-duplicate
segments at arbitrary positions of partial near-duplicate videos in a
corpus. Scalable detection is achieved by jointly considering visual similarity and
temporal consistency where temporal constraints are embedded into a network
structure as directed edges. Through this structure, partial alignment is converted,
in a novel way, into a network flow problem for which highly efficient solutions exist. To
handle multiple alignments, we consider two properties of network structure:
conciseness and divisibility, for efficient and effective mining. To ensure precision,
frame-level matching is further integrated into the temporal network for alignment
verification. This results in an iterative alignment-verification procedure to
fine-tune the localization of near-duplicate segments.
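The idea of embedding temporal constraints as directed edges can be illustrated with a minimal sketch. It is not the thesis's algorithm: the abstract does not specify the network construction, so the node definition (one node per candidate frame match), the `max_gap` parameter, and the function name `best_alignment` are all assumptions made for illustration. The sketch also swaps in dynamic programming over a directed acyclic graph, which finds the single highest-scoring path (equivalent to routing one unit of flow), rather than a general network flow solver.

```python
from typing import List, Tuple

# Hypothetical representation: (query_frame, reference_frame, visual_similarity)
Match = Tuple[int, int, float]


def best_alignment(matches: List[Match], max_gap: int = 3) -> List[Match]:
    """Return the highest-scoring temporally consistent chain of frame matches."""
    if not matches:
        return []
    # Sorting by query frame gives a topological order: edges only point forward.
    nodes = sorted(matches)
    score = [m[2] for m in nodes]   # best path score ending at each node
    prev = [-1] * len(nodes)        # back-pointers for path recovery
    for j, (qj, rj, sj) in enumerate(nodes):
        for i in range(j):
            qi, ri, _ = nodes[i]
            # Directed edge i -> j only if both videos advance in time
            # by at most max_gap frames (the assumed temporal constraint).
            if 0 < qj - qi <= max_gap and 0 < rj - ri <= max_gap:
                if score[i] + sj > score[j]:
                    score[j] = score[i] + sj
                    prev[j] = i
    # Walk back from the best-scoring end node.
    end = max(range(len(nodes)), key=score.__getitem__)
    path: List[Match] = []
    while end != -1:
        path.append(nodes[end])
        end = prev[end]
    return path[::-1]


candidates = [(0, 10, 0.9), (1, 11, 0.8), (2, 12, 0.9), (1, 50, 0.95), (5, 3, 0.7)]
segment = best_alignment(candidates)
```

In this toy input, the match `(1, 50, 0.95)` has the highest visual similarity but is temporally inconsistent with its neighbours, so it is excluded from the recovered segment. This is the kind of interplay between visual similarity and temporal consistency that the paragraph above describes.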
For the second issue, multiple relational graphs are considered. Besides partial-duplicate
links, different context graphs are constructed using heterogeneous
sources of information associated with videos, i.e., user tags, descriptions and titles.
The difficulties of fusing these graphs come from the fact that the modalities
derived from web videos are often noisy, diverse and conflicting with each other.
We investigate how the agreement among heterogeneous modalities can be exploited
to guide data fusion. The problem of fusion is cast as the simultaneous
mining of agreement from different modalities and adaptation of fusion weights
to construct a fused graph from these modalities. An iterative framework based
on agreement-fusion optimization is thus proposed. We plug two well-known
algorithms, random walk and semi-supervised learning, into this framework to illustrate
how agreement (conflict) is incorporated (compromised) in
the case of uniform and adaptive fusion.
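The alternation between mining agreement and adapting fusion weights can be sketched as follows. The abstract does not state the agreement measure or the update rule, so this sketch assumes cosine similarity between each modality graph and the current fused graph, followed by normalization of those similarities into weights. The names `fuse_graphs`, `_cosine` and `_combine` are hypothetical, and the random walk and semi-supervised learning steps are omitted.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # dense affinity matrix for one modality


def _cosine(a: Matrix, b: Matrix) -> float:
    """Cosine similarity between two matrices treated as flat vectors."""
    dot = sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    na = sum(x * x for row in a for x in row) ** 0.5
    nb = sum(x * x for row in b for x in row) ** 0.5
    return dot / (na * nb) if na and nb else 0.0


def _combine(graphs: List[Matrix], weights: List[float]) -> Matrix:
    """Weighted sum of the modality graphs."""
    n = len(graphs[0])
    return [[sum(w * g[i][j] for w, g in zip(weights, graphs))
             for j in range(n)] for i in range(n)]


def fuse_graphs(graphs: List[Matrix], iters: int = 10) -> Tuple[Matrix, List[float]]:
    """Alternate between fusing graphs and re-weighting them by agreement."""
    weights = [1.0 / len(graphs)] * len(graphs)  # start from uniform fusion
    for _ in range(iters):
        fused = _combine(graphs, weights)
        # Assumed agreement measure: similarity of each modality to the fusion.
        agreement = [_cosine(g, fused) for g in graphs]
        total = sum(agreement)
        if total == 0:
            break
        weights = [a / total for a in agreement]  # adaptive fusion weights
    return _combine(graphs, weights), weights


tags = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
titles = [[0, 0.9, 0], [0.9, 0, 0.1], [0, 0.1, 0]]
descriptions = [[0, 0, 1], [0, 0, 0], [1, 0, 0]]  # conflicts with the other two
fused, weights = fuse_graphs([tags, titles, descriptions])
```

Here the first iteration corresponds to uniform fusion, and later iterations shift weight toward the modalities that agree with each other, so the conflicting graph ends up with the smallest weight.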
- Database searching, Multimedia systems