Learning Representations from Skeletal Self-Similarities for Cross-View Action Recognition

Zhanpeng Shao, Youfu Li, Hong Zhang*

*Corresponding author for this work

Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review

Abstract

Existing research in vision-based action recognition generally focuses on recognizing actions from the same views seen in the training data. One of the major challenges in action recognition lies in the large variation of action representations when actions are captured from entirely different viewpoints. This paper addresses this problem by learning view-invariant representations from skeletal self-similarities of varying scales with a lightweight multi-stream neural network (MSNN). As human skeletons have proven to be an effective feature modality for action recognition and are easy to obtain, we first create a view-invariant action description by formulating the skeletal self-similarities at each frame as an image (SSI), which shows high structural stability under view changes. Accordingly, an MSNN based on 3D CNN and LSTM units is designed to learn representations from SSIs of multiple scales, where the multi-scale scheme makes our method robust to view changes. In addition, because the computation of SSIs is simple, we integrate it into the MSNN by wrapping it as a custom learnable layer, instead of normalizing and transforming skeletons with hand-crafted preprocessing. Extensive experimental evaluations on three challenging cross-view datasets demonstrate the effectiveness of our proposed method, which outperforms state-of-the-art algorithms on cross-view recognition. The source code of this work will be released shortly to facilitate future studies in this field.
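
The idea of a per-frame self-similarity image can be illustrated with a minimal sketch. The abstract does not give the exact SSI formulation, so the code below assumes the simplest plausible choice: the matrix of pairwise Euclidean distances between the 3D joints of one frame. The function names (`self_similarity_image`, `rotate_about_y`) and the four-joint toy skeleton are hypothetical and are not taken from the paper or its code release. The sketch only shows why such a matrix is unaffected by a rotation of the viewpoint; it does not cover the multi-scale scheme or the MSNN.

```python
import math

def self_similarity_image(joints):
    """Return the J x J matrix of Euclidean distances between the joints of one frame.

    Assumption: pairwise joint distance stands in for the paper's self-similarity measure.
    """
    return [[math.dist(a, b) for b in joints] for a in joints]

def rotate_about_y(joints, angle_rad):
    """Rotate 3D points about the vertical (y) axis to simulate a viewpoint change."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in joints]

# Hypothetical toy skeleton with four joints (x, y, z); not from any dataset.
frame = [(0.0, 1.7, 0.0), (0.0, 1.4, 0.0), (0.3, 1.1, 0.1), (-0.3, 1.0, -0.2)]

ssi_front = self_similarity_image(frame)
ssi_side = self_similarity_image(rotate_about_y(frame, math.radians(60)))

# Largest element-wise difference between the two matrices.
max_diff = max(
    abs(a - b)
    for row_a, row_b in zip(ssi_front, ssi_side)
    for a, b in zip(row_a, row_b)
)
print(max_diff)
```

Because a rotation preserves distances between points, the two matrices differ only by floating-point rounding, which is the kind of structural stability under view changes that the abstract attributes to SSIs.
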
Original language: English
Article number: 8955925
Pages (from-to): 160-174
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: 31
Issue number: 1
Online published: 10 Jan 2021
Publication status: Published - Jan 2021

Research Keywords

  • Cross-view action recognition
  • human skeleton
  • multi-stream neural network
  • self-similarity
  • view-invariant representation
