Speech enhancement is concerned with improving both the quality and intelligibility
of speech that is corrupted by background noise. It has a wide range of real-world applications,
such as mobile communications and front-end processing for robust speech
recognition. It is widely reported in the literature that existing speech enhancement
techniques based on optimal filtering generate disturbing artifacts (commonly referred
to as "musical tones") due to the errors in estimating the noise and clean speech spectra
in non-stationary and/or low signal-to-noise ratio (SNR) environments.
With ever-growing computing power, it is now practical
to consider additional prior information about speech (e.g., harmonic structure,
temporal correlation) to improve state-of-the-art speech enhancement techniques. In
this thesis, a novel model-based technique is presented to enhance noisy speech. It
is based on an analysis-synthesis framework using the harmonic noise model (HNM).
The target speech is re-synthesized from HNM parameters (e.g., pitch, spectral envelope,
excitation variance) estimated from the noisy observations. In addition, the evolution
of the spectral envelope, represented as a sequence of line spectrum
frequencies (LSF), is modeled with a linear dynamical system (LDS), and a dynamic feature tracking
scheme is proposed that improves the spectral envelope estimate by performing online
Kalman smoothing. System identification for the Kalman filter is achieved via
a combined design of codebook mapping and a maximum likelihood estimator (MLE)
with parallel training data. The thesis also verifies that the proposed
dynamic tracking scheme can be employed as a post-processing tool to
improve conventional speech enhancement techniques.
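As a rough illustration of the tracking idea, the sketch below smooths a noisy LSF trajectory with a Kalman filter followed by a Rauch-Tung-Striebel backward pass. It is a minimal offline sketch under stated assumptions, not the thesis's online scheme: the function name `kalman_rts_smooth`, the identity observation matrix, and the hand-picked matrices `A`, `Q`, and `R` are all hypothetical, whereas the thesis identifies the system from codebook mapping and MLE.

```python
import numpy as np

def kalman_rts_smooth(y, A, Q, R, x0, P0):
    """Smooth noisy feature vectors y (shape T x d) under the assumed model
    x[t] = A x[t-1] + w,  y[t] = x[t] + v,  w ~ N(0, Q),  v ~ N(0, R).
    The identity observation matrix is an assumption of this sketch."""
    T, d = y.shape
    I = np.eye(d)
    xp = np.zeros((T, d)); Pp = np.zeros((T, d, d))  # predicted mean/covariance
    xf = np.zeros((T, d)); Pf = np.zeros((T, d, d))  # filtered mean/covariance
    x, P = x0, P0
    for t in range(T):
        if t > 0:
            # Time update: propagate the previous filtered estimate.
            x = A @ x
            P = A @ P @ A.T + Q
        xp[t], Pp[t] = x, P
        # Measurement update with the noisy observation y[t].
        K = P @ np.linalg.inv(P + R)
        x = x + K @ (y[t] - x)
        P = (I - K) @ P
        xf[t], Pf[t] = x, P
    # Backward (RTS) pass: refine each estimate using later frames.
    xs = xf.copy()
    for t in range(T - 2, -1, -1):
        G = Pf[t] @ A.T @ np.linalg.inv(Pp[t + 1])
        xs[t] = xf[t] + G @ (xs[t + 1] - xp[t + 1])
    return xs
```

An online variant would run the backward pass over a short window of recent frames only, trading a few frames of delay for a smoother envelope estimate.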
The proposed speech enhancement techniques are thoroughly evaluated based on
a study of short-time spectra, spectrograms, objective measures and subjective listening
tests. It is verified that the proposed model-based speech enhancement technique
with dynamic feature tracking consistently outperforms conventional techniques
in objective measures. On average, improvements of approximately 2 dB in the log-spectral distance (LSD) measure and 0.3 points in the perceptual
evaluation of speech quality (PESQ) measure are achieved. Furthermore, subjective
listening test results confirm the perceptual improvement. The proposed dynamic
tracking scheme is not restricted to speech enhancement for
voice conversation. It can potentially be extended to other applications involving
autoregressive parameters, such as speech coding and feature enhancement for speech recognition.
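For readers unfamiliar with the LSD measure quoted above, the following is a minimal sketch of one common definition: the root-mean-square difference between the reference and estimated log power spectra in dB, averaged over frames. The abstract does not state the exact variant used in the thesis, so the function name `log_spectral_distance`, the frame-by-bin array layout, and the `eps` floor are assumptions of this example.

```python
import numpy as np

def log_spectral_distance(ref_power, est_power, eps=1e-12):
    """Mean over frames of the RMS dB difference across frequency bins.
    Both inputs are power spectra shaped (frames, bins); eps guards log(0)."""
    ref_db = 10.0 * np.log10(np.maximum(ref_power, eps))
    est_db = 10.0 * np.log10(np.maximum(est_power, eps))
    per_frame = np.sqrt(np.mean((ref_db - est_db) ** 2, axis=1))
    return float(np.mean(per_frame))
```

Under this definition a lower value means the enhanced spectrum is closer to the clean one, so an improvement corresponds to a reduction in LSD.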
| Date of Award | 15 Jul 2013 |
|---|---|
| Original language | English |
| Awarding Institution | City University of Hong Kong |
| Supervisor | Cheung Fat CHAN (Supervisor) |
- Digital techniques
- Speech processing systems
- Signal processing
Model-based speech enhancement with dynamic feature tracking
CHEN, R. (Author). 15 Jul 2013
Student thesis: Doctoral Thesis