Algorithms and techniques for large-scale near-duplicate video mining

  • Wanlei ZHAO

Student thesis: Doctoral Thesis

Abstract

As the bandwidth accessible to average users increases, video has become one of the fastest growing data types on the Internet. Especially with the popularity of social media, the number of videos available on the Web has grown exponentially. Among these huge volumes of videos there exist large numbers of near-duplicates and copies. Near-duplicates (NDs) carry both useful and redundant signals. For example, NDs provide rich visual clues for indexing and summarizing broadcast videos from different channels. On the other hand, the excessive amount of NDs makes browsing web videos streamed over the Internet an extremely time-consuming task. In this thesis, we study algorithms and techniques for large-scale mining of near-duplicate videos. The issues we study include feature indexing and matching, pattern modeling, scalable detection, and mining for different emerging multimedia applications.

We begin by investigating the matching of near-duplicate keyframes (NDKs) using local interest points (LIPs). We propose a one-to-one symmetric (OOS) matching algorithm for fault-tolerant NDK identification. To deal with the indexing problem, we also present a new indexing algorithm, named LIP-IS, for efficient LIP matching. A comparative study contrasts LIP-IS with a more popular indexing technique, Locality Sensitive Hashing (LSH). To model the patterns formed by LIP matching, we propose two techniques, one supervised and one unsupervised, for efficient learning. In the supervised version, we capture the LIP matching pattern as a histogram and adopt support vector machines (SVMs) to learn the pattern for NDK classification. In the unsupervised version, we propose a novel entropy-based evaluation to assess the degree of near-duplication. Two algorithms, named pattern entropy (PE) and scale-rotation invariant PE (SR-PE), are studied accordingly for unsupervised learning of the LIP matching pattern.
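The core idea of one-to-one symmetric matching can be illustrated with a minimal sketch (not code from the thesis; the function name and Euclidean distance choice are illustrative assumptions): a descriptor pair is accepted only when each point is the other's nearest neighbour, so every interest point matches at most once.

```python
import numpy as np

def oos_match(desc_a, desc_b):
    """Sketch of one-to-one symmetric (OOS) matching: keep a pair (i, j)
    only when j is i's nearest neighbour in desc_b AND i is j's nearest
    neighbour in desc_a (mutual nearest neighbours)."""
    # Pairwise Euclidean distances between the two descriptor sets.
    d = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    nn_ab = d.argmin(axis=1)  # nearest point in B for each point in A
    nn_ba = d.argmin(axis=0)  # nearest point in A for each point in B
    # Retain only the mutual (symmetric) pairs.
    return [(i, int(j)) for i, j in enumerate(nn_ab) if nn_ba[j] == i]
```

The symmetry constraint is what provides the fault tolerance: an unmatched or ambiguous point in one keyframe cannot force a spurious match, since the reverse nearest-neighbour check rejects it.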
Since LIP matching is computationally expensive, especially when there is an excessive number of LIPs to match between two keyframes, we further study visual keyword (VK) matching for NDK identification. VK models the LIPs in a keyframe as a bag-of-words, so each keyframe is represented as a histogram of words. Thus, instead of performing nearest-neighbor matching, direct matching of words can be conducted efficiently. In particular, with an inverted file index as the indexing structure and Hamming embedding as a refinement technique, LIPs collected from a large video dataset can be efficiently indexed, searched, and matched. Nevertheless, the effectiveness and accuracy of matching remain a problem for VK. We therefore study different algorithms and techniques to improve the matching effectiveness of LIPs, including an enhanced weak geometric consistency checking (E-WGC) algorithm for matching verification and a reverse entropy (RE) technique for frame aggregation. Finally, we demonstrate our work in two applications: video copy detection and automatic video tagging. In copy detection, we achieve very competitive performance compared to state-of-the-art results reported on the TRECVID copy detection benchmark. In video tagging, we demonstrate two types of tagging: cross-source tagging, which automatically mines tags from one search engine to enrich the tags of videos in another; and cross-time tagging, which automatically tags newly uploaded videos according to historical video data collected over years.
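The bag-of-visual-words pipeline with an inverted file index can be sketched as follows (a minimal illustration, not the thesis implementation; the vocabulary, function names, and shared-word voting score are assumptions, and refinements such as Hamming embedding are omitted): each descriptor is quantized to its nearest visual word, the index maps each word to the frames containing it, and a query votes for frames that share words with it.

```python
import numpy as np
from collections import defaultdict

def quantize(descs, vocab):
    """Assign each local descriptor to its nearest visual word (codebook row)."""
    d = np.linalg.norm(descs[:, None, :] - vocab[None, :, :], axis=2)
    return d.argmin(axis=1)

def build_inverted_index(frames, vocab):
    """Inverted file index: visual word -> set of frame ids containing it."""
    index = defaultdict(set)
    for fid, descs in frames.items():
        for w in quantize(descs, vocab):
            index[w].add(fid)
    return index

def query(index, descs, vocab):
    """Rank candidate frames by the number of visual words shared with the query."""
    votes = defaultdict(int)
    for w in set(quantize(descs, vocab)):
        for fid in index[w]:
            votes[fid] += 1
    return sorted(votes.items(), key=lambda kv: -kv[1])
```

Because the index only touches frames that share at least one word with the query, lookup cost scales with posting-list length rather than with the size of the whole collection, which is what makes the approach attractive at web scale.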
Date of Award: 4 Oct 2010
Original language: English
Awarding Institution
  • City University of Hong Kong
Supervisor: Chong Wah NGO

Keywords

  • Data mining
  • Multimedia systems
