Multi-view depth-based pairwise feature learning for person-person interaction recognition

Meng Li*, Howard Leung

*Corresponding author for this work

Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review

13 Citations (Scopus)

Abstract

This paper addresses the problem of recognizing person-person interaction using multi-view data captured by depth cameras. Due to the complex spatio-temporal structure of interaction between two persons, it is difficult to characterize different classes of person-person interactions for recognition. To handle this difficulty, we divide each person-person interaction into body part interactions, and analyze the person-person interaction using the pairwise features of these body part interactions. We first use two features to represent the relative movement and local physical contact between the body parts of two people, and extract the pairwise features to characterize the corresponding body part interaction. For processing each camera view, we propose a regression-based learning approach with a sparsity-inducing regularizer to model each person-person interaction as the combination of pairwise features for a sparse set of body part interactions. To take full advantage of the information in all depth camera views, we further extend the proposed interaction learning model to combine features from multiple views in order to increase the recognition performance. Our approach is evaluated on three public activity recognition datasets captured with depth cameras. Experimental results on the three datasets demonstrate the efficacy of the proposed method.
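The abstract names a regression-based learner with a sparsity-inducing regularizer but does not state its objective. As a rough illustration of the general idea only, the sketch below assumes a plain L1-regularized least-squares objective solved with iterative soft-thresholding (ISTA). This is a swapped-in textbook technique, not the paper's formulation: the loss, the regularizer, the feature definitions, and the multi-view extension are all assumptions or omitted. The function names, the six synthetic "body part pair" columns, and the data are hypothetical.

```python
import random


def soft_threshold(value, threshold):
    """Proximal operator of threshold * |value|: shrink toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_sparse_weights(X, y, lam=0.1, iters=500):
    """Minimise (1/2n) * ||X w - y||^2 + lam * ||w||_1 by ISTA."""
    n, d = len(X), len(X[0])
    # n / ||X||_F^2 never exceeds 1/L, where L is the gradient's
    # Lipschitz constant, so this fixed step size is safe.
    step = n / sum(x * x for row in X for x in row)
    w = [0.0] * d
    for _ in range(iters):
        # Residual of the current linear prediction for every sample.
        residual = [sum(wj * xj for wj, xj in zip(w, row)) - yi
                    for row, yi in zip(X, y)]
        # Gradient of the smooth least-squares term.
        grad = [sum(r * row[j] for r, row in zip(residual, X)) / n
                for j in range(d)]
        # Gradient step followed by the L1 proximal step.
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad)]
    return w


# Synthetic stand-in data (hypothetical): 40 clips, 6 pairwise features,
# one per body part pair; the class score depends only on pairs 0 and 3.
random.seed(0)
X = [[random.gauss(0.0, 1.0) for _ in range(6)] for _ in range(40)]
y = [2.0 * row[0] - 1.5 * row[3] for row in X]

weights = fit_sparse_weights(X, y)
selected_pairs = [j for j, wj in enumerate(weights) if abs(wj) > 1e-6]
print(weights)
print(selected_pairs)
```

In this toy setting, the entries of `weights` that stay away from zero play the role of the sparse set of body part interactions. Combining several camera views would need a fusion rule that the abstract does not specify, so none is sketched here.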
Original language: English
Pages (from-to): 5731–5749
Journal: Multimedia Tools and Applications
Volume: 78
Issue number: 5
Online published: 14 Feb 2018
Publication status: Published - Mar 2019

Research Keywords

  • Depth camera
  • Multi-view
  • Pairwise feature
  • Person-person interaction recognition
  • Regression-based learning
