Graph-Based Modeling and Sparse Feature Fusion for Human Activity Analysis


Student thesis: Doctoral Thesis

Award date: 11 Aug 2017

Abstract

Motion capture has many applications in the real world. However, motion capture data exhibits large spatio-temporal variations, which makes it challenging to encode human activity for analysis. In this thesis, using two types of motion capture technology, marker-based motion capture and depth-based motion capture (depth cameras), we propose novel approaches to model the captured data for different activity analysis tasks.

First, we propose different graph models to characterize skeletal activity data. Each graph model has three parts: the graph structure, the edge attributes and the vertex attributes. In particular, we propose different graph models to represent different kinds of skeletal activity data. The structures of these graphs are based on different ways of selecting the top-N most important joint pairs, including the top-N Relative Ranges of Joint Relative Distances (RRJRDs), the top-N single-person Relative Variances of Joint Relative Distances (RVJRDs) and the top-N person-person RVJRDs. We then use the RRJRD/RVJRD values as the edge attributes of these graph models and use the proposed temporal pyramid covariance on joint relative locations as the vertex attributes.
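As an illustration of the joint-pair selection step, the following sketch ranks joint pairs by a relative-variance score on joint relative distances over a sequence and keeps the top-N pairs as graph edges. The abstract does not give the exact definitions of RRJRD/RVJRD, so the normalization used here (variance divided by the squared mean distance) and all function names are assumptions made for illustration only.

```python
import numpy as np
from itertools import combinations

def top_n_joint_pairs(positions, n_pairs):
    """Rank joint pairs by the relative variance of their joint relative
    distances over a sequence and keep the top-N as graph edges.

    positions : array of shape (T, J, 3) -- T frames, J joints, 3D coordinates.
    Returns a list of ((i, j), score) tuples for the N highest-scoring pairs.
    """
    T, J, _ = positions.shape
    scores = []
    for i, j in combinations(range(J), 2):
        # Joint relative distance between joints i and j in every frame.
        d = np.linalg.norm(positions[:, i, :] - positions[:, j, :], axis=1)
        # Relative variance: variance normalized by the squared mean distance,
        # so the score does not depend on the subject's overall scale
        # (assumed normalization, not taken from the thesis).
        score = d.var() / (d.mean() ** 2 + 1e-8)
        scores.append(((i, j), score))
    scores.sort(key=lambda s: s[1], reverse=True)
    return scores[:n_pairs]
```

The selected pairs would then define the graph edges, with the scores reused as edge attributes as described above.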

Second, we propose different graph kernels to measure the similarity between the corresponding graphs. To compare two graphs, we need to evaluate the similarity of their structures, their vertex attributes and their edge attributes. First, we define an edge kernel and a vertex kernel to measure the similarity of edge attributes and vertex attributes respectively. Then, we construct a walk kernel based on the edge kernel and vertex kernel to measure the similarity of walks of the same length. Finally, we use all the walk kernels to construct the graph kernel, since walks reflect the structure of a graph. Furthermore, multiple kernel learning is applied to determine the optimal weights for combining the graph kernels into an overall similarity between two graphs. Based on graph kernel matching, we then propose different mechanisms for different applications, including human activity retrieval, segmentation and skeletal activity recognition.
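To make the kernel construction concrete, the sketch below computes a fixed-length walk kernel between two attributed graphs from user-supplied edge and vertex kernels, using the standard direct-product-graph formulation. The thesis's actual edge and vertex kernels and the multiple kernel learning step are not specified in the abstract, so this is an assumed, generic instance of the approach rather than the proposed method.

```python
import numpy as np

def walk_kernel(A1, X1, A2, X2, length, vertex_kernel, edge_kernel):
    """Length-`length` walk kernel between two attributed graphs.

    A1, A2 : nested lists where entry [i][j] holds the edge attribute of
             edge (i, j), or None if the edge is absent.
    X1, X2 : sequences of per-vertex attributes.
    The kernel sums, over all pairs of equal-length walks (one per graph),
    the product of vertex-kernel and edge-kernel values along the walks,
    computed via the weighted adjacency matrix of the direct-product graph.
    """
    n1, n2 = len(X1), len(X2)
    # Vertex-kernel values for every cross-graph vertex pair.
    V = np.array([[vertex_kernel(X1[i], X2[r]) for r in range(n2)]
                  for i in range(n1)])
    # Weighted adjacency of the direct-product graph: a transition from
    # pair (i, r) to pair (j, s) is weighted by the edge kernel of the two
    # edges times the vertex kernel of the two endpoint vertices.
    W = np.zeros((n1 * n2, n1 * n2))
    for i in range(n1):
        for j in range(n1):
            if A1[i][j] is None:
                continue
            for r in range(n2):
                for s in range(n2):
                    if A2[r][s] is None:
                        continue
                    W[i * n2 + r, j * n2 + s] = (
                        edge_kernel(A1[i][j], A2[r][s]) * V[j, s])
    start = V.reshape(-1)  # vertex kernel at the start of each walk pair
    return float(start @ np.linalg.matrix_power(W, length) @ np.ones(n1 * n2))
```

A graph kernel in this style would combine walk kernels of several lengths, with the combination weights learned by multiple kernel learning as described above.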

Third, we propose an efficient multi-modal feature fusion approach to recognize human activities captured by depth cameras. For multi-modal single-person activity recognition, we extract part-based features from the skeletal data and the depth maps to represent the dynamics and appearance of the human body parts. We then propose a sparse part-based feature fusion model for action recognition. The model selects discriminative part-based features by exploiting the shareable and specific structures of these features when modelling each single-person activity. Finally, we use the sparsely fused features for multi-modal single-person activity recognition. For multi-modal person-person activity recognition, we introduce novel multi-modal pairwise features that describe the interaction between two people. We then propose a sparse pairwise feature fusion model for representing person-person activities and use the features learned by this fusion model to recognize person-person activities.
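The sparse selection idea can be illustrated with a plain group-lasso over concatenated part-based feature blocks, where each block corresponds to a body part or modality and whole blocks can be zeroed out. This is a simplified stand-in for the proposed fusion model: it does not capture the shareable and specific structures described above, and the squared loss, the penalty and all names are assumptions made for this sketch.

```python
import numpy as np

def sparse_part_fusion(parts, y, lam=0.1, lr=1e-3, n_iter=500):
    """Group-sparse selection of part-based feature blocks (minimal sketch).

    parts : list of arrays, each of shape (N, d_p) -- one block per body
            part / modality for N training samples.
    y     : labels in {-1, +1}, shape (N,).
    Trains a linear scorer with a squared loss plus a group-lasso penalty
    (one group per part) by proximal gradient descent, so entire parts can
    receive zero weight and only discriminative parts survive.
    Returns the learned weight block for each part.
    """
    X = np.hstack(parts)
    dims = [p.shape[1] for p in parts]
    bounds = np.cumsum([0] + dims)            # column range of each part
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / len(y)     # gradient of the squared loss
        w -= lr * grad
        # Proximal step: block soft-thresholding, one block per part.
        for b0, b1 in zip(bounds[:-1], bounds[1:]):
            norm = np.linalg.norm(w[b0:b1])
            if norm <= lam * lr:
                w[b0:b1] = 0.0
            else:
                w[b0:b1] *= 1.0 - lam * lr / norm
    return [w[b0:b1] for b0, b1 in zip(bounds[:-1], bounds[1:])]
```

Blocks whose returned weights are all zero play the role of discarded parts; the surviving blocks correspond, loosely, to the fused discriminative features used for recognition.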

In this thesis, we propose novel approaches to model 3D human activity captured with marker-based and depth-based motion capture, and we develop efficient mechanisms for human activity retrieval, segmentation and activity recognition based on these approaches. The experimental results show that our approaches are robust under several kinds of variation and demonstrate superior performance compared with other state-of-the-art methods. With the proposed approaches, the user can either retrieve human activities from large motion capture databases or segment a long 3D human activity sequence into different types of activities. Moreover, the proposed activity recognition algorithms have potential applications in many fields, such as intelligent surveillance, virtual reality and human-computer interaction.