Coherent bag-of audio words model for efficient large-scale video copy detection

Yang Liu, Wan-Lei Zhao, Chong-Wah Ngo, Chang-Sheng Xu, Han-Qing Lu

Research output: Chapters, Conference Papers, Creative and Literary Works › RGC 32 - Refereed conference paper (with host publication) › peer-review

48 Citations (Scopus)

Abstract

Current content-based video copy detection approaches mostly concentrate on visual cues and neglect audio information. In this paper, we tackle the video copy detection task using audio information, which is as important as visual information in multimedia processing. First, inspired by the bag-of visual words model, a bag-of audio words (BoA) representation is proposed to characterize each audio frame. Unlike naive audio retrieval approaches built on single-based modeling, BoA is a high-level model owing to its perceptual and semantic properties. Within the BoA model, a coherency vocabulary indexing structure is adopted to achieve more efficient and effective indexing than the single vocabulary of the standard BoW model. The coherency vocabulary takes advantage of multiple audio features by computing their co-occurrence across different feature spaces. By enforcing a tight coherency constraint across feature spaces, the coherency vocabulary makes the BoA model more discriminative and more robust to various audio transforms. A 2D Hough transform is then applied to aggregate scores from matched audio segments. The segments that fall into the peak bin are identified as the copy segments in the reference video. In addition, we perform video copy detection from both audio and visual cues using four late fusion strategies, to demonstrate the complementarity of audio and visual information in video copy detection. Extensive experiments are conducted on the large-scale TRECVID 2009 dataset, and competitive results are achieved. Copyright © 2010 ACM.
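
The matching pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each audio frame has already been quantized to one word id per feature space, that the coherency constraint can be approximated by requiring the word ids to agree in every feature space (a joint key), and that the two Hough dimensions are the reference video id and the quantized time offset. The names `build_index`, `hough_vote` and `bin_width` are hypothetical.

```python
from collections import defaultdict


def build_index(reference):
    """Build an inverted index keyed by the joint (coherent) word tuple.

    reference: {video_id: [(time_sec, (word_a, word_b, ...)), ...]}
    A frame is indexed under the tuple of its word ids from all feature
    spaces, so a lookup only hits frames that agree in every space.
    """
    index = defaultdict(list)
    for video_id, frames in reference.items():
        for t, words in frames:
            index[tuple(words)].append((video_id, t))
    return index


def hough_vote(query_frames, index, bin_width=1.0):
    """Accumulate votes in a 2D table over (video_id, time-offset bin).

    Each matched frame pair votes for the offset between reference time
    and query time. The peak bin gives the most consistent alignment.
    Returns (video_id, offset_sec, score), or None if nothing matched.
    """
    acc = defaultdict(float)
    for tq, words in query_frames:
        for video_id, tr in index.get(tuple(words), []):
            offset_bin = round((tr - tq) / bin_width)
            acc[(video_id, offset_bin)] += 1.0
    if not acc:
        return None
    (video_id, offset_bin), score = max(acc.items(), key=lambda kv: kv[1])
    return video_id, offset_bin * bin_width, score


# Toy data: two feature spaces, so each frame carries a pair of word ids.
reference = {
    "ref1": [(10.0, (1, 7)), (11.0, (2, 8)), (12.0, (3, 9))],
    "ref2": [(0.0, (1, 7)), (5.0, (4, 4))],
}
query = [(0.0, (1, 7)), (1.0, (2, 8)), (2.0, (3, 9))]

index = build_index(reference)
best = hough_vote(query, index)
```

In this toy example three query frames align with `ref1` at a common offset of 10 seconds, while `ref2` receives a single isolated vote, so the peak bin points to `ref1`. A frame whose word ids match in only one feature space produces no vote at all, which is the intended effect of the joint key.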
Original language: English
Title of host publication: CIVR 2010 - 2010 ACM International Conference on Image and Video Retrieval
Pages: 89-96
Publication status: Published - 2010
Event: ACM International Conference on Image and Video Retrieval, ACM-CIVR 2010 - Xi'an, China
Duration: 5 Jul 2010 to 7 Jul 2010

Conference

Conference: ACM International Conference on Image and Video Retrieval, ACM-CIVR 2010
Country/Territory: China
City: Xi'an
Period: 5/07/10 to 7/07/10

Research Keywords

  • Audio words
  • Coherency vocabulary
  • Copy detection
