A coupled HMM approach to video-realistic speech animation
Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review
Author(s)
- Xie, Lei
- Liu, Zhi-Qiang
Detail(s)
| Original language | English |
|---|---|
| Pages (from-to) | 2325-2340 |
| Journal / Publication | Pattern Recognition |
| Volume | 40 |
| Issue number | 8 |
| Publication status | Published - Aug 2007 |
Abstract
We propose a coupled hidden Markov model (CHMM) approach to video-realistic speech animation, which realizes realistic facial animations driven by speaker-independent continuous speech. Unlike hidden Markov model (HMM)-based animation approaches that use a single state chain, we use CHMMs to explicitly model the subtle characteristics of audio-visual speech, e.g., the asynchrony, temporal dependency (synchrony), and different speech classes between the two modalities. We derive an expectation maximization (EM)-based A/V conversion algorithm for the CHMMs, which converts acoustic speech into facial animation parameters. We also present a video-realistic speech animation system. The system transforms the facial animation parameters into a mouth animation sequence, refines the animation with a performance refinement process, and finally stitches the animated mouth seamlessly onto a background facial sequence. We have compared the animation performance of the CHMMs with HMMs, multi-stream HMMs, and factorial HMMs, both objectively and subjectively. Results show that the CHMMs achieve superior animation performance. The ph-vi-CHMM system, which adopts different state variables (phoneme states and viseme states) in the audio and visual modalities, performs the best. The proposed approach indicates that explicitly modelling audio-visual speech is promising for speech animation. © 2006 Pattern Recognition Society.
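To make the conversion idea concrete, the sketch below shows one common way a coupled HMM can drive audio-to-visual conversion: each chain's transition depends on both chains' previous states, and decoding is done by Viterbi over the joint (audio-state, visual-state) space, after which the decoded viseme states are mapped to animation parameters. This is an illustrative sketch only, not the paper's EM-based algorithm; all model parameters (`A_a`, `A_v`, `mu_a`, `mu_v`, the state counts) are hypothetical placeholders.

```python
# Illustrative CHMM audio-to-visual conversion sketch (hypothetical
# parameters, joint-space Viterbi decoding rather than the paper's EM).
import numpy as np

rng = np.random.default_rng(0)

NA, NV = 3, 2   # audio (e.g. phoneme) and visual (e.g. viseme) state counts
D = 4           # acoustic feature dimensionality

# Coupled transitions: P(a_t | a_{t-1}, v_{t-1}) and P(v_t | a_{t-1}, v_{t-1}).
A_a = rng.dirichlet(np.ones(NA), size=(NA, NV))   # shape (NA, NV, NA)
A_v = rng.dirichlet(np.ones(NV), size=(NA, NV))   # shape (NA, NV, NV)
pi_a = np.full(NA, 1.0 / NA)
pi_v = np.full(NV, 1.0 / NV)

mu_a = rng.normal(size=(NA, D))   # Gaussian mean per audio state (unit cov.)
mu_v = rng.normal(size=(NV, 3))   # animation parameters per viseme state

def log_gauss(x, mu):
    # Log-likelihood up to a constant, identity covariance.
    return -0.5 * np.sum((x - mu) ** 2, axis=-1)

def av_convert(audio):
    """Viterbi over the joint (audio, visual) state space; return the
    animation parameters of the decoded viseme state per frame."""
    T = len(audio)
    ll = np.array([log_gauss(audio[t], mu_a) for t in range(T)])  # (T, NA)
    delta = np.log(pi_a)[:, None] + np.log(pi_v)[None, :] + ll[0][:, None]
    back = np.zeros((T, NA, NV, 2), dtype=int)
    # trans[i, j, k, l] = log P(a_t=k | i, j) + log P(v_t=l | i, j)
    trans = np.log(A_a)[:, :, :, None] + np.log(A_v)[:, :, None, :]
    for t in range(1, T):
        cand = delta[:, :, None, None] + trans        # (NA, NV, NA, NV)
        flat = cand.reshape(NA * NV, NA, NV)
        best = flat.argmax(axis=0)                    # best predecessor index
        back[t, :, :, 0], back[t, :, :, 1] = np.unravel_index(best, (NA, NV))
        delta = flat.max(axis=0) + ll[t][:, None]     # audio emissions only
    # Backtrack the best joint path.
    k, l = np.unravel_index(delta.argmax(), (NA, NV))
    path = [(k, l)]
    for t in range(T - 1, 0, -1):
        k, l = back[t, path[-1][0], path[-1][1]]
        path.append((int(k), int(l)))
    path.reverse()
    return np.array([mu_v[l] for _, l in path])

audio = rng.normal(size=(10, D))
params = av_convert(audio)   # one 3-D animation parameter vector per frame
```

Treating the coupled model as a single HMM on the product state space keeps decoding exact, at the cost of a state space that grows multiplicatively with the number of chains; the factored transitions are what distinguish the CHMM from a plain product-state HMM.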
Research Area(s)
- Audio-to-visual conversion, Coupled hidden Markov models (CHMMs), Facial animation, Speech animation, Talking faces
Citation Format(s)
A coupled HMM approach to video-realistic speech animation. / Xie, Lei; Liu, Zhi-Qiang.
In: Pattern Recognition, Vol. 40, No. 8, 08.2007, p. 2325-2340.