HNIP: Compact Deep Invariant Representations for Video Matching, Localization, and Retrieval

Research output: Journal Publications and Reviews (RGC: 21, 22, 62) > 21_Publication in refereed journal > peer-review

28 Scopus Citations
Author(s)

  • Jie Lin
  • Ling-Yu Duan
  • Shiqi Wang
  • Yan Bai
  • Yihang Lou
  • Vijay Chandrasekhar
  • Tiejun Huang
  • Alex Kot
  • Wen Gao

Detail(s)

Original language: English
Article number: 7944594
Pages (from-to): 1968-1983
Journal / Publication: IEEE Transactions on Multimedia
Volume: 19
Issue number: 9
Online published: 8 Jun 2017
Publication status: Published - Sep 2017

Abstract

With emerging demand for large-scale video analysis, MPEG initiated the compact descriptor for video analysis (CDVA) standardization in 2014. Beyond the handcrafted descriptors adopted by the current MPEG-CDVA reference model, we study the problem of deep learned global descriptors for video matching, localization, and retrieval. First, inspired by a recent invariance theory, we propose a nested invariance pooling (NIP) method to derive compact deep global descriptors from convolutional neural networks (CNNs) by progressively encoding translation, scale, and rotation invariances into the pooled descriptors. Second, our empirical studies show that a sequence of well-designed pooling moments (e.g., max or average) may drastically impact video matching performance, which motivates us to design hybrid pooling operations via NIP (HNIP). HNIP further improves the discriminability of deep global descriptors. Third, we analyze the technical merits and performance improvements of combining deep and handcrafted descriptors to better investigate their complementary effects. We evaluate the effectiveness of HNIP within the well-established MPEG-CDVA evaluation framework. Extensive experiments demonstrate that HNIP outperforms the state-of-the-art deep and canonical handcrafted descriptors with significant mAP gains of 5.5% and 4.7%, respectively. In particular, the combination of HNIP-incorporated CNN descriptors and handcrafted global descriptors significantly boosts the performance of CDVA core techniques with comparable descriptor size.
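
The nested pooling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `pool` and `nested_pooling`, the tensor layout (rotations, scales, translated regions, channels), and the particular moments chosen at each level are assumptions made for illustration only. The sketch pools CNN feature responses over translations first, then scales, then rotations, with a possibly different moment (max or average) at each level, and L2-normalizes the result.

```python
import numpy as np


def pool(x, moment, axis):
    """Pool array x along one axis using the named moment."""
    if moment == "avg":
        return x.mean(axis=axis)
    if moment == "max":
        return x.max(axis=axis)
    raise ValueError(f"unknown pooling moment: {moment!r}")


def nested_pooling(features, moments):
    """Sketch of nested pooling over transformation groups.

    features: array of shape (R, S, T, C), a hypothetical layout holding
              CNN responses for R rotations, S scales, T translated
              regions, and C channels.
    moments:  three moment names, applied to translation, scale, and
              rotation in that order (the hybrid choice is up to the caller).
    Returns an L2-normalized descriptor of length C.
    """
    t_moment, s_moment, r_moment = moments
    x = pool(features, t_moment, axis=2)  # translations -> (R, S, C)
    x = pool(x, s_moment, axis=1)         # scales       -> (R, C)
    x = pool(x, r_moment, axis=0)         # rotations    -> (C,)
    norm = np.linalg.norm(x)
    return x / norm if norm > 0 else x
```

Calling `nested_pooling(features, ("avg", "avg", "max"))` on an array of shape `(4, 3, 5, 8)` would return an 8-dimensional unit-length vector. Because each level pools over a whole axis, reordering the rotated copies of the input leaves the descriptor unchanged, which is the sense in which the pooled descriptor is invariant in this toy setting.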

Research Area(s)

  • Convolutional neural networks (CNNs), deep global descriptors, handcrafted descriptors, hybrid nested invariance pooling, MPEG CDVA, MPEG CDVS

Citation Format(s)

HNIP: Compact Deep Invariant Representations for Video Matching, Localization, and Retrieval. / Lin, Jie; Duan, Ling-Yu; Wang, Shiqi; Bai, Yan; Lou, Yihang; Chandrasekhar, Vijay; Huang, Tiejun; Kot, Alex; Gao, Wen.

In: IEEE Transactions on Multimedia, Vol. 19, No. 9, 7944594, 09.2017, p. 1968-1983.