TY - JOUR
T1 - Deep spectral feature pyramid in the frequency domain for long-term action recognition
AU - An, Gaoyun
AU - Zheng, Zhenxing
AU - Wu, Dapeng
AU - Zhou, Wen
PY - 2019/10
Y1 - 2019/10
N2 - In this paper, we propose a novel Deep Spectral Feature Pyramid in the Frequency domain (DSFP) that combines the merits of deep features and spectral approaches for long-term action recognition. More specifically, in the spatial domain, deep features of sparsely sampled frames are extracted by Convolutional Neural Networks (CNNs) to cover the long-term temporal structure. In the frequency domain, the appearance features of the sampled frames are partitioned recursively along the time dimension, and a spectral transform is applied to each partitioned feature. The coefficients of all partitioned features are then concatenated into a video-level feature to better model the spatio-temporal structure of actions in the form of a pyramid. DSFP can therefore model actions from both microscopic and macroscopic aspects. Extensive experiments on two challenging action benchmarks, UCF101 and HMDB51, show that the proposed DSFP is effective for the spatio-temporal representation of actions and achieves performance comparable to state-of-the-art methods.
AB - In this paper, we propose a novel Deep Spectral Feature Pyramid in the Frequency domain (DSFP) that combines the merits of deep features and spectral approaches for long-term action recognition. More specifically, in the spatial domain, deep features of sparsely sampled frames are extracted by Convolutional Neural Networks (CNNs) to cover the long-term temporal structure. In the frequency domain, the appearance features of the sampled frames are partitioned recursively along the time dimension, and a spectral transform is applied to each partitioned feature. The coefficients of all partitioned features are then concatenated into a video-level feature to better model the spatio-temporal structure of actions in the form of a pyramid. DSFP can therefore model actions from both microscopic and macroscopic aspects. Extensive experiments on two challenging action benchmarks, UCF101 and HMDB51, show that the proposed DSFP is effective for the spatio-temporal representation of actions and achieves performance comparable to state-of-the-art methods.
KW - Action recognition
KW - Deep learning
KW - Spectral feature
KW - Video classification
UR - http://www.scopus.com/inward/record.url?scp=85072573793&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-85072573793&origin=recordpage
U2 - 10.1016/j.jvcir.2019.102650
DO - 10.1016/j.jvcir.2019.102650
M3 - RGC 21 - Publication in refereed journal
SN - 1047-3203
VL - 64
JO - Journal of Visual Communication and Image Representation
JF - Journal of Visual Communication and Image Representation
M1 - 102650
ER -