Modeling of single character motions with temporal sparse representation and Gaussian processes for human motion retrieval and synthesis


Student thesis: Doctoral Thesis



  • Liuyang ZHOU

Related Research Unit(s)


Awarding Institution
Award date: 3 Oct 2014


3D motion capture (mocap) is the process of recording and digitizing the movement of people or objects. Mocap technology is widely used in computer animation, man-machine interaction games, athletic training and 3D movies. However, capturing human motion is time-consuming and labor-intensive, as it involves calibrating the system and post-processing artifacts in the captured data. Therefore, it is essential either to reuse pre-captured data or to develop effective methods to synthesize new motions. To reuse pre-captured data, we need an efficient retrieval mechanism to search for a particular motion in a large corpus. Human motion retrieval has proven challenging because human motion is high-dimensional in both the spatial and temporal domains. Moreover, semantically similar motions are not necessarily numerically similar because of speed variations. With the retrieved similar motions, we propose to synthesize human motion variations for intended applications. However, the joints of the human skeleton are highly correlated owing to its articulated structure, which makes it challenging to synthesize natural human motions.

In this thesis, we develop new methods to address the problem of reusing human motion capture data, which comprises three sub-problems: human motion retrieval, human motion variation synthesis and human posture reconstruction.

For human motion retrieval, an effective feature representation plays an important role in the motion matching procedure. We propose to learn features from motion data instead of designing them by hand, since hand-crafted features are not comprehensive enough to represent different kinds of motions. Motivated by recent advances in sparse representation, which is commonly used to solve computer vision problems, we propose a temporal sparse representation (TSR) for human motion retrieval. Compared with existing methods that adopt sparse representation, our TSR encodes the temporal information within motions and thus generates a more compact and discriminative representation. In addition, we propose a spatial temporal pyramid matching (STPM) kernel based on TSR, which can be used for logical comparison between motions. Our STPM improves the effectiveness of motion retrieval in terms of accuracy and speed. To allow the user to retrieve desired motions in a natural and intuitive way, we develop a touch-less interactive human motion retrieval system. The system allows the user to specify the query motion by performing it directly with Kinect. The user also interacts with the retrieval system using gestures, so no controller is needed and the system delivers a natural user interface.

With the retrieved similar motions, we synthesize variations that can be used for intended applications. Human motion variation synthesis is important for enhancing the quality of crowd simulation and interactive applications. Here, we propose a novel generative probabilistic model to synthesize variations of human motion from the retrieved similar motions. Our key idea is to model the conditional distribution of each joint via a multivariate Gaussian process model, namely the Semiparametric Latent Factor Model (SLFM). SLFM can effectively model the correlations between the degrees of freedom (DOFs) of joints rather than treating each DOF separately, as existing methods do. A detailed evaluation shows that the proposed approach can effectively synthesize variations of different types of motions. Motions generated by our method show richer variations than those generated by existing methods.

Besides retrieving motions from a pre-recorded motion capture database, human posture reconstruction with a low-cost device is an alternative way to obtain human motions. Recent research shows that devices that can estimate 3D postures from a single depth image (e.g. Kinect) have made interactive applications more appealing. In addition, it is rather costly to obtain postures with marker-based motion capture technology. Hence, it is necessary to develop a robust method to reconstruct human posture using Kinect. Yet it remains challenging to estimate pose accurately from a single depth camera because of the inherently noisy data derived from depth images and the self-occluding actions performed by the user. Here, we present a probabilistic framework to enhance the accuracy of postures captured live by Kinect. We apply a Gaussian process model as a prior to leverage position data obtained with Kinect and pose data from a marker-based motion capture system. We also incorporate a temporal consistency term into the optimization framework to minimize the discrepancy between the current pose and the previous ones. Experimental results demonstrate that our system can achieve high-quality postures even under severe self-occlusion, which makes it promising for real-time posture-based applications.

Our proposed methods relieve the user of the burden of creating realistic human motions by hand and make it possible to capture new human movements with a cheap device. With the proposed methods, the user can either retrieve motions from a large motion capture database or synthesize similar motions using the proposed variation synthesis approach. Moreover, the proposed posture reconstruction system allows the user to capture high-quality human motions. Our methods are promising for use in computer games and animation, where introducing realistic human motions can enhance animation quality.
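
As an illustration of the SLFM idea mentioned in the abstract (DOFs modeled jointly rather than one at a time), the following is a minimal sketch and not the thesis implementation. It assumes the standard SLFM construction, in which each output is a linear mix of a few independent latent Gaussian processes over time. The function names (`rbf_kernel`, `sample_slfm`, `dof_covariance`), the squared-exponential kernel, the mixing matrix values and the length scales are all hypothetical choices made for illustration. The sketch only draws from the prior and does not condition on retrieved motions.

```python
import numpy as np


def rbf_kernel(t, lengthscale):
    """Squared-exponential covariance between all pairs of time stamps (assumed kernel)."""
    diff = t[:, None] - t[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)


def sample_slfm(t, mixing, lengthscales, rng, jitter=1e-6):
    """Draw one sample of D correlated outputs from an SLFM-style prior.

    Each of the Q latent functions is an independent zero-mean GP over time.
    Every output (here: one DOF) is a linear mix of those latent functions,
    so the outputs share structure through the mixing matrix of shape (D, Q).
    """
    n = len(t)
    latent = np.empty((len(lengthscales), n))
    for q, ell in enumerate(lengthscales):
        # Small diagonal jitter keeps the Cholesky factorization numerically stable.
        cov = rbf_kernel(t, ell) + jitter * np.eye(n)
        latent[q] = np.linalg.cholesky(cov) @ rng.standard_normal(n)
    return mixing @ latent  # shape (D, n)


def dof_covariance(mixing):
    """Covariance between DOFs at a single time stamp implied by the mixing matrix.

    Holds for unit-variance latent kernels such as the RBF kernel above,
    ignoring the jitter term.
    """
    return mixing @ mixing.T


if __name__ == "__main__":
    # Hypothetical setup: 3 DOFs driven by 2 latent functions.
    t = np.linspace(0.0, 1.0, 20)
    mixing = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]])
    rng = np.random.default_rng(0)
    trajectories = sample_slfm(t, mixing, [0.2, 0.5], rng)
    print(trajectories.shape)
    print(dof_covariance(mixing))
```

Off-diagonal entries of `dof_covariance(mixing)` are nonzero whenever two DOFs load on the same latent function, which is how this kind of model couples DOFs; independent per-DOF models would correspond to a diagonal matrix.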

    Research areas

  • Digital techniques, Gaussian processes, Computer simulation, Image processing, Human locomotion