Abstract
Existing research on vision-based action recognition generally focuses on recognizing actions from the same views seen in the training data. One of the major challenges in action recognition is the large variation in action representations when actions are captured from very different viewpoints. This paper addresses this problem by learning view-invariant representations from skeletal self-similarities of varying scales with a very lightweight multi-stream neural network (MSNN). As human skeletons have been proven to be an effective feature modality for action recognition and are easy to obtain, we first create a view-invariant action description by formulating the skeletal self-similarities at each frame as an image (SSI), which shows high structural stability under view changes. Accordingly, an MSNN is designed based on 3D CNN and LSTM units to learn representations from SSIs of multiple scales, where the multi-scale scheme makes our method robust to view changes. In addition, because the computation of SSIs is simple, we integrate it into the MSNN by wrapping it as a custom learnable layer, instead of normalizing and transforming skeletons with hand-crafted preprocessing. Extensive experimental evaluations on three challenging cross-view datasets demonstrate the effectiveness of the proposed method, which outperforms state-of-the-art algorithms on cross-view recognition. The source code of this work will be released shortly to facilitate future studies in this field.
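To illustrate why a skeletal self-similarity description can stay stable under view changes, the following minimal sketch builds a per-frame matrix of pairwise joint distances and checks that it is unchanged after a simulated camera rotation and translation. The abstract does not give the exact SSI formulation or its scaling, so the Euclidean distance, the 4-joint toy skeleton, and the helper names (`self_similarity`, `change_view`, `max_abs_diff`) are all assumptions made for illustration, not the authors' implementation.

```python
import math

def self_similarity(joints):
    """Pairwise Euclidean distances between the 3D joints of one frame (J x J matrix).
    Assumption: plain Euclidean distance stands in for the paper's similarity measure."""
    return [[math.dist(a, b) for b in joints] for a in joints]

def change_view(joints, angle_rad, shift):
    """Rotate joints about the vertical (y) axis, then translate them.
    This simulates observing the same pose from a different camera position."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    tx, ty, tz = shift
    return [(c * x + s * z + tx, y + ty, -s * x + c * z + tz) for x, y, z in joints]

def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equally sized matrices."""
    return max(abs(p - q) for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b))

# Hypothetical toy skeleton with four joints (coordinates in metres).
frame = [(0.0, 1.7, 0.0), (0.0, 1.4, 0.0), (0.3, 1.1, 0.1), (-0.3, 1.1, -0.1)]

ssi_front = self_similarity(frame)
ssi_side = self_similarity(change_view(frame, math.radians(75), (2.0, 0.0, -1.5)))

# Rigid view changes preserve distances, so the two matrices agree up to rounding.
view_gap = max_abs_diff(ssi_front, ssi_side)
```

Because pairwise distances are preserved by rotation and translation, the matrix can be treated as an image whose structure does not depend on the camera pose. How the paper derives multiple scales from this description is not stated in the abstract and is not modelled here.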
| Original language | English |
|---|---|
| Article number | 8955925 |
| Pages (from-to) | 160-174 |
| Journal | IEEE Transactions on Circuits and Systems for Video Technology |
| Volume | 31 |
| Issue number | 1 |
| Online published | 10 Jan 2021 |
| DOIs | |
| Publication status | Published - Jan 2021 |
Research Keywords
- Cross-view action recognition
- Human skeleton
- Multi-stream neural network
- Self-similarity
- View-invariant representation
Paper title: Learning Representations from Skeletal Self-Similarities for Cross-View Action Recognition