Deep spectral feature pyramid in the frequency domain for long-term action recognition

Gaoyun An, Zhenxing Zheng*, Dapeng Wu, Wen Zhou

*Corresponding author for this work

Research output: Journal Publications and Reviews; RGC 21 - Publication in refereed journal; peer-review

6 Citations (Scopus)

Abstract

In this paper, we propose a novel Deep Spectral Feature Pyramid in the Frequency domain (DSFP) that combines the merits of deep features and spectral approaches for long-term action recognition. More specifically, in the spatial domain, deep features of sparsely sampled frames are extracted by Convolutional Neural Networks (CNNs) to cover the long-term temporal structure. In the frequency domain, the appearance features of the sampled frames are partitioned recursively along the time dimension, and a spectral transform is applied to each partition. The coefficients of all partitions are then concatenated, in the form of a pyramid, into a video-level feature that better models the spatio-temporal structure of actions. DSFP can therefore model actions at both microscopic and macroscopic scales. Extensive experiments on two challenging action benchmarks, UCF101 and HMDB51, show that the proposed DSFP is effective for spatio-temporal representation of actions and achieves performance comparable to the state of the art.
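
The partition-transform-concatenate step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the spectral transform, the pyramid depth, or the number of coefficients kept, so the sketch assumes a plain discrete Fourier transform (magnitudes only), dyadic splits at each level, and small illustrative defaults. The names `dft_magnitudes` and `spectral_pyramid` are hypothetical, and the per-frame vectors stand in for CNN features.

```python
import cmath
from typing import List

Feature = List[float]  # one per-frame feature vector (stand-in for a CNN feature)


def dft_magnitudes(segment: List[Feature], n_coeffs: int) -> List[float]:
    """Magnitudes of the first n_coeffs DFT coefficients along time, per feature dimension."""
    t_len = len(segment)
    dim = len(segment[0])
    out: List[float] = []
    for d in range(dim):
        series = [frame[d] for frame in segment]
        for k in range(n_coeffs):
            if k < t_len:
                coeff = sum(
                    x * cmath.exp(-2j * cmath.pi * k * t / t_len)
                    for t, x in enumerate(series)
                )
                out.append(abs(coeff))
            else:
                # Assumption: zero-pad so every segment yields a fixed-length vector.
                out.append(0.0)
    return out


def spectral_pyramid(features: List[Feature], levels: int = 3, n_coeffs: int = 2) -> List[float]:
    """Concatenate spectral coefficients of dyadic temporal partitions.

    Level 0 covers the whole sequence, level 1 its two halves, and so on.
    The dyadic split is an assumption made for this sketch.
    """
    if len(features) < 2 ** (levels - 1):
        raise ValueError("too few frames for the requested number of levels")
    video_feature: List[float] = []
    for level in range(levels):
        n_parts = 2 ** level
        for p in range(n_parts):
            start = p * len(features) // n_parts
            end = (p + 1) * len(features) // n_parts
            video_feature.extend(dft_magnitudes(features[start:end], n_coeffs))
    return video_feature
```

Under these assumptions the video-level feature has (2^levels - 1) x dim x n_coeffs entries: the coarse levels summarise the whole clip, while the finer levels capture short-term temporal structure.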
Original language: English
Article number: 102650
Journal: Journal of Visual Communication and Image Representation
Volume: 64
Online published: 17 Sept 2019
Publication status: Published - Oct 2019
Externally published: Yes

Research Keywords

  • Action recognition
  • Deep learning
  • Spectral feature
  • Video classification
