Abstract
In developing effective robot-human interaction, an indispensable capability of intelligent robots is recognizing the motion behavior of objects or individuals from visual observations. Motion trajectories generated by objects and individuals provide compact and informative cues for spatiotemporal motion characterization, which can be used in motion perception, representation, and recognition. Undesirable variations in raw data induced by changing viewpoints, noise contamination, or different individuals performing the same actions are among the inherent challenges in trajectory-based motion analysis. An invariant descriptor for motion trajectories can offer substantial advantages over raw data in capturing spatiotemporal features. However, most previous invariant descriptors have been proposed mainly for representing 3-D point trajectories and are thus insufficient for motion characterization, because they do not consider the 3-D rotations of moving objects. Accordingly, this thesis focuses on the 6-D motion trajectory of a rigid body, which can be parameterized by a set of 3-D position vectors of a reference point on the rigid body and the 3-D rotations of this body over time. Three invariant and flexible descriptors for representing 6-D rigid body motion trajectories are proposed to achieve trajectory-based motion recognition. These descriptors are the main contributions of this study.
First, 6-D trajectories that represent the same motion can naturally be regarded as having similar shapes. Unlike existing descriptors that involve calculating high-order derivatives, a gradient-based Dual Square-Root Function (DSRF) descriptor is proposed, which calculates the well-established Square-Root Velocity Function (SRVF) of the 3-D point and angular trajectories to capture local shape features. To measure the distance between two DSRF descriptor sequences, the optimal rotational alignment is obtained under a least-squares criterion.
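As a rough illustration of the SRVF mentioned above, the sketch below computes q(t) = v(t) / sqrt(|v(t)|) for a sampled 3-D point trajectory and compares two such sequences after a least-squares rotational alignment. It is a minimal sketch under stated assumptions, not the DSRF descriptor itself: the function names `srvf` and `aligned_distance` are hypothetical, the derivative is approximated with finite differences, only a single point trajectory is handled (no angular part), and the alignment uses the standard SVD-based (Kabsch) solution as a stand-in for whatever alignment the thesis adopts.

```python
import numpy as np

def srvf(points, dt=1.0, eps=1e-12):
    """Square-Root Velocity Function of a sampled curve of shape (T, d)."""
    points = np.asarray(points, dtype=float)
    velocity = np.gradient(points, dt, axis=0)            # finite-difference derivative
    speed = np.linalg.norm(velocity, axis=1, keepdims=True)
    return velocity / np.sqrt(np.maximum(speed, eps))     # q = v / sqrt(|v|)

def aligned_distance(q_a, q_b):
    """Frobenius distance between q_a and q_b after rotating q_b onto q_a.

    Assumption: both sequences have the same length and the rotation is
    found with the SVD-based (Kabsch) least-squares solution.
    """
    u, _, vt = np.linalg.svd(q_b.T @ q_a)
    d = np.ones(q_a.shape[1])
    d[-1] = np.sign(np.linalg.det(u @ vt))                # keep det(rotation) = +1
    rotation = (u * d) @ vt
    return np.linalg.norm(q_a - q_b @ rotation)
```

Because the SRVF depends only on the derivative, it is unaffected by translating the trajectory, and rotating the trajectory rotates the SRVF by the same amount, which is why a rotational alignment step can remove the remaining viewpoint dependence in this simplified setting.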
Second, a kinematics-based Rotational and Relative Velocity (RRV) descriptor is devised to represent rigid body motion trajectories. At each time step, the RRV descriptor calculates the square-root velocity vector of the normalized 3-D point trajectory in a local coordinate system as translational invariants and the re-parameterized unit quaternion as rotational invariants. Compared with the DSRF descriptor, the RRV descriptor is advantageous in preserving temporal information in trajectories and capturing the internal relationship between the translational and rotational invariants. Two frameworks that incorporate the proposed DSRF and RRV descriptors are adopted for trajectory-based motion recognition. On the one hand, we follow the template matching framework by calculating pairwise distances between the test descriptor sequence and all training templates. The nearest-neighbor (1-NN) rule is adopted to predict the label of the test sample. On the other hand, a DSRF/RRV descriptor sequence can be encoded into a statistical vector with the Bag-of-Words (BoW) approach, which serves as the input of a Support Vector Machine (SVM) classifier.
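The two recognition frameworks described above can be sketched generically as follows. This is an illustrative sketch only: `predict_1nn` and `bow_histogram` are hypothetical names, the distance function is supplied by the caller, and the codebook is assumed to have been learned beforehand (for example by clustering training descriptors), a step that is not shown here.

```python
import numpy as np

def predict_1nn(test_seq, templates, labels, distance):
    """Template matching: return the label of the closest training template."""
    dists = [distance(test_seq, template) for template in templates]
    return labels[int(np.argmin(dists))]

def bow_histogram(descriptor_seq, codebook):
    """Encode a (T, d) descriptor sequence as a normalized codeword histogram."""
    seq = np.asarray(descriptor_seq, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    # Squared distance from every frame to every codeword, shape (T, K).
    d2 = ((seq[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    counts = np.bincount(d2.argmin(axis=1), minlength=len(codebook))
    return counts / counts.sum()
```

In the statistical encoding framework, the resulting fixed-length histogram is the kind of vector that could be passed to an SVM classifier, whereas the template matching route needs no training beyond storing the labeled templates.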
Third, 6-D rigid body motion trajectories are transformed into Multi-layer Self-similarity Matrices (MSM) at the trajectory and component levels, where a component denotes the trajectory represented in one dimension. The MSM representation captures local and global spatiotemporal features at different levels, exhibits strong invariance to rigid transformations, and provides rich descriptions. Each similarity matrix can be regarded as a gray-scale image; hence, the well-known Local Binary Pattern (LBP) and Histogram of Oriented Gradients (HOG) features extracted from the MSM are concatenated as the final trajectory descriptor. In the classification stage, an SVM with a linear kernel is utilized for multiclass recognition tasks.
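To make the idea of a self-similarity matrix concrete, the sketch below builds one matrix from the whole trajectory and one per component, using pairwise Euclidean distances between time samples. It is a simplified illustration under stated assumptions: `self_similarity` and `multilayer_ssm` are hypothetical names, the input is treated as a plain (T, d) array (for instance the 3-D position part only), and the exact layers and rotation handling used in the thesis may differ.

```python
import numpy as np

def self_similarity(samples):
    """Pairwise Euclidean distances between time samples, shape (T, T)."""
    samples = np.asarray(samples, dtype=float)
    if samples.ndim == 1:                       # a single component: make it (T, 1)
        samples = samples[:, None]
    diff = samples[:, None, :] - samples[None, :, :]
    return np.linalg.norm(diff, axis=2)

def multilayer_ssm(trajectory):
    """Stack one trajectory-level matrix and one matrix per component."""
    trajectory = np.asarray(trajectory, dtype=float)
    layers = [self_similarity(trajectory)]                  # trajectory level
    layers += [self_similarity(trajectory[:, k])            # component level
               for k in range(trajectory.shape[1])]
    return np.stack(layers)                                 # shape (1 + d, T, T)
```

In this simplified form, the trajectory-level matrix is unchanged when the whole trajectory is rotated or translated, since only distances between samples enter it. Each (T, T) layer can then be treated as an image from which texture features such as LBP and HOG are extracted.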
The analysis of multiple rigid body motion trajectories with the proposed DSRF descriptor is also explored to address skeleton-based human action recognition. To model the human body effectively, a skeleton is decomposed into five body parts and the joint trajectories are represented in a human body coordinate system. The Most Informative Part (MIP) method is proposed for selecting the salient body parts in each action. A Virtual Rigid Body (VRB) configuration is introduced in each part to improve the compactness of the final skeletal representation. In the recognition stage, the template matching and statistical encoding frameworks are also investigated for skeleton-based human action recognition.
Finally, extensive experiments on various public datasets (covering sign language, 3-D handwriting, and human actions) are conducted to evaluate the proposed invariant descriptors for trajectory-based motion recognition. Experimental results demonstrate the effectiveness of the three proposed descriptors in rigid body motion trajectory representation and recognition.
| Date of Award | 4 Dec 2017 |
|---|---|
| Original language | English |
| Awarding Institution | |
| Supervisor | You Fu LI (Supervisor) |