As the bandwidth accessible to average users increases, video has become one
of the fastest-growing data types on the Internet. With the popularity of social
media in particular, the number of videos available on the Web has grown exponentially.
Among these huge volumes of videos are large numbers of near-duplicates and copies.
Near-duplicates (NDs) carry both useful and redundant signals. For example, NDs
provide rich visual clues for indexing and summarizing broadcast videos from different
channels. On the other hand, the excessive number of NDs makes browsing web
videos streamed over the Internet extremely time-consuming. In this thesis, we
study algorithms and techniques for large-scale mining of near-duplicate videos.
The issues we address include feature indexing and matching, pattern modeling,
and scalable detection and mining for emerging multimedia applications.
We begin by investigating matching algorithms for near-duplicate keyframes
(NDKs) using local interest points (LIPs). We propose a one-to-one symmetric (OOS)
matching algorithm for fault-tolerant NDK identification. To address the indexing
problem, we also present a new indexing algorithm, named LIP-IS, for efficient
LIP matching, and give a comparative study contrasting LIP-IS with a more popular
indexing technique, locality-sensitive hashing (LSH). To model the patterns formed
by LIP matching, we propose two techniques, one supervised and one unsupervised,
for efficient learning. In the supervised version, we capture the LIP matching pattern as
a histogram and adopt support vector machines (SVMs) to learn the pattern for NDK
classification. In the unsupervised version, we propose a novel entropy-based evaluation
to assess the degree of near-duplication. Two algorithms, named pattern entropy
(PE) and scale-rotation invariant PE (SR-PE), are studied accordingly for unsupervised
learning of LIP matching patterns.
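The one-to-one symmetric matching step above can be illustrated as mutual
nearest-neighbour filtering: a pair of points is kept only if each is the other's
nearest neighbour. The following is a minimal sketch, assuming Euclidean distance
over descriptor vectors; the function name and toy data are illustrative, not from
the thesis.

```python
# Sketch of one-to-one symmetric (OOS) matching between two sets of local
# descriptors, keeping only mutual nearest-neighbour pairs.
import numpy as np

def oos_match(desc_a, desc_b):
    """Return index pairs (i, j) where desc_a[i] and desc_b[j] are
    each other's nearest neighbours under Euclidean distance."""
    # Pairwise squared Euclidean distances (n_a x n_b matrix).
    d = ((desc_a[:, None, :] - desc_b[None, :, :]) ** 2).sum(axis=2)
    nn_ab = d.argmin(axis=1)   # nearest neighbour in B for each point in A
    nn_ba = d.argmin(axis=0)   # nearest neighbour in A for each point in B
    # Keep only symmetric (mutual) matches.
    return [(i, j) for i, j in enumerate(nn_ab) if nn_ba[j] == i]

a = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
b = np.array([[0.1, 0.0], [4.9, 5.1]])
print(oos_match(a, b))  # → [(0, 0), (2, 1)]
```

Note how point `a[1]` is dropped: its nearest neighbour is `b[0]`, but `b[0]`
prefers `a[0]`, so the pair fails the symmetry test. This one-sided ambiguity is
exactly what OOS filtering removes.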
Because LIP matching is computationally expensive, especially when there is an
excessive number of LIPs to match between two keyframes, we further study visual
keyword (VK) matching for NDK identification. VK models the LIPs in a keyframe as a
bag of words, and each keyframe is represented as a histogram of words. Thus, instead
of performing nearest-neighbor matching, words can be matched directly and efficiently.
With an inverted file index as the indexing structure and Hamming embedding as a
refinement technique, LIPs collected from large video datasets
can be efficiently indexed, searched, and matched. Nevertheless, the effectiveness and
accuracy of matching remain a problem for VK. We thus study different algorithms
and techniques to improve the matching effectiveness of LIPs, including an enhanced
weak geometric consistency checking (E-WGC) algorithm for match verification
and a reverse entropy (RE) technique for frame aggregation.
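The bag-of-words matching with an inverted file index can be sketched as follows,
assuming descriptors have already been quantised into visual-word IDs. The function
names and the simple co-occurrence score (no idf weighting or Hamming embedding)
are illustrative simplifications, not the thesis's exact formulation.

```python
# Minimal sketch of visual-word matching via an inverted file index:
# word ID -> postings of (frame_id, term frequency).
from collections import Counter, defaultdict

def build_index(frames):
    """frames: dict of frame_id -> list of visual-word IDs."""
    index = defaultdict(list)
    for fid, words in frames.items():
        for w, tf in Counter(words).items():
            index[w].append((fid, tf))
    return index

def score_candidates(index, query_words):
    """Score each frame by the number of visual words shared with the
    query (histogram intersection); only frames sharing at least one
    word are ever touched, which is the point of the inverted index."""
    scores = Counter()
    for w, qtf in Counter(query_words).items():
        for fid, tf in index.get(w, []):
            scores[fid] += min(qtf, tf)
    return scores.most_common()

frames = {"kf1": [3, 3, 7, 9], "kf2": [1, 2, 7]}
idx = build_index(frames)
print(score_candidates(idx, [3, 7, 8]))  # → [('kf1', 2), ('kf2', 1)]
```

Because scoring only visits postings lists for words present in the query, the
cost scales with the number of shared words rather than the number of keyframes,
which is what makes direct word matching efficient at large scale.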
Finally, we demonstrate our work on two applications: video copy detection
and automatic video tagging. In copy detection, we achieve very competitive
performance compared with state-of-the-art results reported on the TRECVID copy
detection benchmark. In video tagging, we demonstrate two types of tagging: cross-source
tagging, which automatically mines tags from one search engine to enrich the tags of
videos in another; and cross-time tagging, which automatically tags newly
uploaded videos according to historical video data collected over the years.
| Date of Award | 4 Oct 2010 |
|---|---|
| Original language | English |
| Awarding Institution | City University of Hong Kong |
| Supervisor | Chong Wah NGO (Supervisor) |
- Data mining
- Multimedia systems
Algorithms and techniques for large-scale near-duplicate video mining
ZHAO, W. (Author). 4 Oct 2010
Student thesis: Doctoral Thesis