Modeling of single character motions with temporal sparse representation and Gaussian processes for human motion retrieval and synthesis
基於時域稀疏表示和高斯過程的單角色動作模型的建立及其在動作檢索和生成的應用
Student thesis: Doctoral Thesis
Detail(s)
Awarding Institution | |
---|---|
Supervisors/Advisors | |
Award date | 3 Oct 2014 |
Link(s)
Permanent Link | https://scholars.cityu.edu.hk/en/theses/theses(f60a6d29-7008-408f-b504-c282028a0c07).html |
---|---|
Abstract
3D motion capture (mocap) is the process of recording and digitizing the movement of
people or objects. Mocap technology is widely used in computer animation, man-machine
interaction games, athletic training, and 3D movies. However, capturing human motion
costs considerable time and manpower, since it involves calibrating the system
and post-processing the captured data to remove artifacts. Therefore, it is essential
to either reuse pre-captured data or develop effective methods to synthesize new motions.
To reuse pre-captured data, we need an efficient retrieval mechanism to search
for a particular motion in a large corpus. Human motion retrieval has proven
challenging because human motion is high-dimensional in both the spatial and temporal domains.
Besides, semantically similar motions are not necessarily numerically similar
because of speed variations. With the retrieved similar motions, we propose to
synthesize human motion variations for intended applications. However, the joints of
the human skeleton are highly correlated because of its articulated structure,
which makes it challenging to synthesize natural human motions. In this thesis, we develop
new methods to address the problem of reusing human motion capture data, which
includes three sub-problems, i.e., human motion retrieval, human motion variation synthesis and human posture reconstruction.
For human motion retrieval, an effective feature representation plays an important
role during the motion matching procedure. In this thesis, we propose to learn
features from motion data instead of designing them, since hand-crafted features
are not comprehensive enough to represent different kinds of motions. Motivated by
recent advances in sparse representation, which is commonly used to solve
computer vision problems, we propose a temporal sparse representation (TSR) for
human motion retrieval. Compared with existing methods that adopt sparse representation,
our TSR encodes the temporal information within motions and thus
generates a more compact and discriminative representation. In addition, we propose
a spatial temporal pyramid matching (STPM) kernel based on TSR, which can
be used for logical comparison between motions. Our STPM improves the effectiveness
of motion retrieval in terms of accuracy and speed. To allow the user to retrieve
desired motions in a natural and intuitive way, we develop a touch-less interactive
human motion retrieval system. The system allows the user to specify the query
motion by performing it directly in front of a Kinect. The user also interacts with the
retrieval system through gestures, so no controller is needed and the system delivers a
natural user interface.
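To illustrate how a pyramid-style kernel can compare motions of different lengths, the following is a minimal sketch and not the STPM kernel of the thesis. It covers only the temporal half of the idea and assumes that each frame has already been encoded as a sparse coefficient vector. The function names, the max-pooling step, the intersection score, and the level weights are all illustrative assumptions.

```python
def max_pool(codes):
    """Max-pool absolute coefficients over a non-empty run of per-frame codes."""
    dim = len(codes[0])
    return [max(abs(code[k]) for code in codes) for k in range(dim)]


def pyramid_features(codes, levels=3):
    """Pool codes over 1, 2, 4, ... equal temporal segments, one list per level."""
    n = len(codes)
    features = []
    for level in range(levels):
        parts = 2 ** level
        pooled = []
        for p in range(parts):
            start = p * n // parts
            end = max((p + 1) * n // parts, start + 1)  # never an empty segment
            pooled.append(max_pool(codes[start:end]))
        features.append(pooled)
    return features


def temporal_pyramid_kernel(codes_a, codes_b, levels=3):
    """Weighted sum of per-segment intersections (hypothetical weighting)."""
    feats_a = pyramid_features(codes_a, levels)
    feats_b = pyramid_features(codes_b, levels)
    score = 0.0
    for level in range(levels):
        # Assumption: finer levels count more, the finest level has weight 1.
        weight = 2.0 ** level / 2.0 ** (levels - 1)
        for seg_a, seg_b in zip(feats_a[level], feats_b[level]):
            score += weight * sum(min(x, y) for x, y in zip(seg_a, seg_b))
    return score


# Toy sparse codes over a 3-atom dictionary (made-up values).
motion = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
slow_motion = [frame for frame in motion for _ in range(2)]  # half speed
other = [[0, 0, 1], [0, 0, 1], [0, 1, 0], [0, 1, 0]]

print(temporal_pyramid_kernel(motion, motion))       # 6.75
print(temporal_pyramid_kernel(motion, slow_motion))  # 6.75
print(temporal_pyramid_kernel(motion, other))        # 0.5
```

Because pooling happens over relative segments of the sequence rather than over absolute frame indices, the half-speed copy scores the same as the original in this toy case, while the unrelated sequence scores much lower.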
With the retrieved similar motions, we synthesize variations that can be used
for intended applications. Human motion variation synthesis is important for crowd
simulation and interactive applications, where it enhances the quality of the synthesized animation. Here, we
propose a novel generative probabilistic model to synthesize variations of human motion
from the retrieved similar motions. Our key idea is to model the conditional distribution of each joint via a multivariate Gaussian process model, namely the Semiparametric
Latent Factor Model (SLFM). SLFM can effectively model the correlations
between the degrees of freedom (DOFs) of joints rather than treating each DOF
separately, as existing methods do. A detailed evaluation
shows that the proposed approach can effectively synthesize variations of different types
of motions. Motions generated by our method show richer variation than
those generated by existing methods.
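The core construction behind a semiparametric latent factor model can be sketched as follows: each output (here, a DOF trajectory) is a linear mixture of a few independent latent Gaussian processes, so the outputs become correlated through the shared latents. This is a minimal sketch of that general construction under stated assumptions, not the thesis's conditional per-joint model. The RBF kernel, the length scale, the jitter, the mixing matrix `phi`, and all function names are illustrative choices.

```python
import math
import random


def rbf_kernel(times, length_scale=1.0, jitter=1e-6):
    """Squared-exponential covariance matrix with a small diagonal jitter."""
    n = len(times)
    return [[math.exp(-0.5 * ((times[i] - times[j]) / length_scale) ** 2)
             + (jitter if i == j else 0.0)
             for j in range(n)] for i in range(n)]


def cholesky(a):
    """Lower-triangular factor L with L * L^T = a (a must be positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_latent(times, rng, length_scale=1.0):
    """Draw one zero-mean GP sample path at the given time stamps."""
    low = cholesky(rbf_kernel(times, length_scale))
    z = [rng.gauss(0.0, 1.0) for _ in times]
    return [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(times))]


def sample_dofs(times, phi, rng, length_scale=1.0):
    """Mix independent latent GPs: y[c][t] = sum_p phi[c][p] * u[p][t]."""
    num_latents = len(phi[0])
    latents = [sample_latent(times, rng, length_scale) for _ in range(num_latents)]
    return [[sum(row[p] * latents[p][t] for p in range(num_latents))
             for t in range(len(times))] for row in phi]


rng = random.Random(0)
times = [0.5 * i for i in range(8)]
# Hypothetical mixing matrix: 3 DOFs driven by 2 shared latent processes.
phi = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]
trajectories = sample_dofs(times, phi, rng)
print(len(trajectories), len(trajectories[0]))  # 3 8
```

Each call to `sample_dofs` yields a different set of trajectories, and DOFs that load on the same latent process vary together, which is the property that treating each DOF with its own independent model would lose.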
Besides retrieving motions from a pre-recorded motion capture database, human
posture reconstruction with a low-cost device is an alternative way to obtain human
motions. Recent research shows that devices that can estimate 3D postures
from a single depth image (e.g. Kinect) have made interactive applications more
appealing. In addition, it is rather costly to obtain postures with marker-based
motion capture technology. Hence, it is necessary to develop a robust method to
reconstruct human posture using Kinect. Yet it is still challenging to estimate pose
accurately from a single depth camera because of the inherently noisy data derived from
depth images and the self-occluding actions performed by the user. Here, we present a
probabilistic framework to enhance the accuracy of the postures captured live by
Kinect. We apply a Gaussian process model as a prior to leverage position data
obtained with Kinect and pose data from a marker-based motion capture system. We
also incorporate a temporal consistency term into the optimization framework to minimize
the discrepancy between the current pose and the previous ones. Experimental
results demonstrate that our system can achieve high-quality postures even under
severe self-occlusion, which makes it promising for real-time posture-based applications.
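The way a data term, a prior term, and a temporal consistency term trade off can be shown with a deliberately simplified quadratic objective. This is a sketch under strong assumptions, not the optimization used in the thesis: the Gaussian process prior is replaced by a fixed per-coordinate prior mean, every term is a weighted squared difference, and the names and weights are hypothetical. For one coordinate x, minimizing w_obs (x - observed)^2 + w_prior (x - prior_mean)^2 + w_temp (x - previous)^2 has the closed-form solution used below.

```python
def fuse_coordinate(observed, prior_mean, previous, w_obs, w_prior, w_temp):
    """Minimizer of the three weighted squared-error terms for one coordinate."""
    total = w_obs + w_prior + w_temp
    if total <= 0.0:
        raise ValueError("at least one weight must be positive")
    return (w_obs * observed + w_prior * prior_mean + w_temp * previous) / total


def fuse_pose(observed, prior_mean, previous, reliability, w_prior=1.0, w_temp=1.0):
    """Fuse a pose coordinate by coordinate.

    reliability[i] is the assumed weight of the sensor reading for coordinate i;
    0.0 stands for a fully occluded joint whose reading is ignored.
    """
    return [fuse_coordinate(o, m, p, r, w_prior, w_temp)
            for o, m, p, r in zip(observed, prior_mean, previous, reliability)]


# Made-up numbers: the second coordinate is occluded, so its noisy reading (9.0)
# is ignored and the estimate falls back on the prior mean and the previous pose.
pose = fuse_pose(observed=[0.0, 9.0],
                 prior_mean=[1.0, 1.0],
                 previous=[3.0, 3.0],
                 reliability=[2.0, 0.0])
print(pose)  # [1.0, 2.0]
```

In this toy setting an occluded coordinate is pulled entirely toward the prior and the previous frame, which is the qualitative behavior one would want from a temporal consistency term under self-occlusion.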
Our proposed methods can free the user from manually creating realistic human motions
and allow new human movements to be captured with an inexpensive device. With the proposed methods,
the user can either retrieve motions from a large motion capture
database or synthesize similar motions based on the proposed variation synthesis approach.
Moreover, the proposed posture reconstruction system allows the user to
capture high-quality human motions. Our methods show promise for
computer games and animation, where introducing
realistic human motions enhances animation quality.
- Digital techniques, Gaussian processes, Computer simulation, Image processing, Human locomotion