TY - JOUR
T1 - Multi-view depth-based pairwise feature learning for person-person interaction recognition
AU - Li, Meng
AU - Leung, Howard
PY - 2019/3
Y1 - 2019/3
N2 - This paper addresses the problem of recognizing person-person interaction using multi-view data captured by depth cameras. Due to the complex spatio-temporal structure of interaction between two persons, it is difficult to characterize different classes of person-person interactions for recognition. To handle this difficulty, we divide each person-person interaction into body part interactions, and analyze the person-person interaction using the pairwise features of these body part interactions. We first make use of two features for representing the relative movement and local physical contact between the body parts of two people and extract the pairwise features to characterize the corresponding body part interaction. For processing each camera view, we propose a regression-based learning approach with a sparsity inducing regularizer to model each person-person interaction as the combination of pairwise features for a sparse set of body part interactions. To take full advantage of the information in all depth camera views, we further extend the proposed interaction learning model to combine features from multi-views to order to increase the recognition performance. Our approach is evaluated on three public activity recognition datasets captured with depth cameras. Experimental results on the three datasets have demonstrated the efficacy of the proposed method.
AB - This paper addresses the problem of recognizing person-person interaction using multi-view data captured by depth cameras. Due to the complex spatio-temporal structure of interaction between two persons, it is difficult to characterize different classes of person-person interactions for recognition. To handle this difficulty, we divide each person-person interaction into body part interactions, and analyze the person-person interaction using the pairwise features of these body part interactions. We first make use of two features for representing the relative movement and local physical contact between the body parts of two people and extract the pairwise features to characterize the corresponding body part interaction. For processing each camera view, we propose a regression-based learning approach with a sparsity inducing regularizer to model each person-person interaction as the combination of pairwise features for a sparse set of body part interactions. To take full advantage of the information in all depth camera views, we further extend the proposed interaction learning model to combine features from multi-views to order to increase the recognition performance. Our approach is evaluated on three public activity recognition datasets captured with depth cameras. Experimental results on the three datasets have demonstrated the efficacy of the proposed method.
KW - Depth camera
KW - Multi-view
KW - Pairwise feature
KW - Person-person interaction recognition
KW - Regression-based learning
UR - http://www.scopus.com/inward/record.url?scp=85042116949&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-85042116949&origin=recordpage
U2 - 10.1007/s11042-018-5738-6
DO - 10.1007/s11042-018-5738-6
M3 - RGC 21 - Publication in refereed journal
SN - 1380-7501
VL - 78
SP - 5731
EP - 5749
JO - Multimedia Tools and Applications
JF - Multimedia Tools and Applications
IS - 5
ER -