
Model-based speech enhancement with dynamic feature tracking

  • Ruofei CHEN

Student thesis: Doctoral Thesis

Abstract

Speech enhancement is concerned with improving both the quality and intelligibility of speech corrupted by background noise. It has a wide range of real-world applications, such as mobile communications and front-end processing for robust speech recognition. It is widely reported in the literature that existing speech enhancement techniques based on optimal filtering generate disturbing artifacts (commonly referred to as "musical tones") because of errors in estimating the noise and clean speech spectra in non-stationary and/or low signal-to-noise ratio (SNR) environments. With ever-growing computing power, it is now practical to exploit additional prior information about speech (e.g., harmonic structure, temporal correlation) to improve state-of-the-art speech enhancement techniques. In this thesis, a novel model-based technique is presented to enhance noisy speech. It is based on an analysis-synthesis framework using the harmonic noise model (HNM). The target speech is re-synthesized from HNM parameters (e.g., pitch, spectral envelope, excitation variance) estimated from the noisy observations. In addition, by modeling the evolution of the spectral envelope, represented as a sequence of line spectrum frequencies (LSF), with a linear dynamical system (LDS), a dynamic feature tracking scheme is proposed that improves the spectral envelope estimation by performing online Kalman smoothing. System identification for the Kalman filter is achieved via a combined design of codebook mapping and a maximum likelihood estimator (MLE) with parallel training data. The thesis also verifies that the proposed dynamic tracking scheme can be employed as a post-processing tool to improve conventional speech enhancement techniques. The proposed speech enhancement techniques are thoroughly evaluated through a study of short-time spectra, spectrograms, objective measures and subjective listening tests.
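The idea of tracking a slowly evolving feature vector with a linear dynamical system and Kalman smoothing can be illustrated with a minimal sketch. It is not the method of the thesis: it assumes identity dynamics, hand-picked noise covariances `Q` and `R` (in place of the codebook mapping and MLE identification), a fixed-interval Rauch-Tung-Striebel backward pass (in place of online smoothing), and a synthetic random-walk track standing in for LSF trajectories. The function name `kalman_rts_smooth` and all parameter values are hypothetical.

```python
import numpy as np


def kalman_rts_smooth(y, A, C, Q, R, x0, P0):
    """Kalman filter plus Rauch-Tung-Striebel smoother for
    x[t] = A x[t-1] + w,  y[t] = C x[t] + v,  w ~ N(0, Q),  v ~ N(0, R).

    y has shape (T, m); returns filtered and smoothed means, each (T, n).
    """
    T, n = y.shape[0], x0.shape[0]
    x_pred = np.zeros((T, n))
    P_pred = np.zeros((T, n, n))
    x_filt = np.zeros((T, n))
    P_filt = np.zeros((T, n, n))

    for t in range(T):
        # Prediction step (the prior is used directly for the first frame).
        if t == 0:
            x_pred[t], P_pred[t] = x0, P0
        else:
            x_pred[t] = A @ x_filt[t - 1]
            P_pred[t] = A @ P_filt[t - 1] @ A.T + Q
        # Measurement update with the noisy observation of this frame.
        S = C @ P_pred[t] @ C.T + R
        K = P_pred[t] @ C.T @ np.linalg.inv(S)
        x_filt[t] = x_pred[t] + K @ (y[t] - C @ x_pred[t])
        P_filt[t] = (np.eye(n) - K @ C) @ P_pred[t]

    # Backward pass: refine each frame using information from later frames.
    x_smooth = x_filt.copy()
    for t in range(T - 2, -1, -1):
        G = P_filt[t] @ A.T @ np.linalg.inv(P_pred[t + 1])
        x_smooth[t] = x_filt[t] + G @ (x_smooth[t + 1] - x_pred[t + 1])
    return x_filt, x_smooth


# Synthetic stand-in for two LSF-like tracks (assumed values, not thesis data).
rng = np.random.default_rng(0)
T, n = 200, 2
A, C = np.eye(n), np.eye(n)
Q, R = 1e-4 * np.eye(n), 1e-2 * np.eye(n)
truth = np.array([0.3, 0.6]) + np.cumsum(rng.normal(0.0, 1e-2, (T, n)), axis=0)
noisy = truth + rng.normal(0.0, 1e-1, (T, n))

filt, smooth = kalman_rts_smooth(noisy, A, C, Q, R, np.full(n, 0.5), np.eye(n))
print("MSE of noisy track:   ", np.mean((noisy - truth) ** 2))
print("MSE of smoothed track:", np.mean((smooth - truth) ** 2))
```

In this toy setting the smoothed track follows the underlying trajectory more closely than the raw noisy observations, which is the effect the tracking scheme relies on for spectral envelope estimation.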
The proposed model-based speech enhancement technique with dynamic feature tracking is shown to consistently outperform conventional techniques in objective measures. On average, improvements of approximately 2 dB in the log-spectral distance (LSD) measure and 0.3 points in the perceptual evaluation of speech quality (PESQ) measure are achieved. Furthermore, subjective listening test results confirm the perceptual improvement. The proposed dynamic tracking scheme is not restricted to speech enhancement for voice conversation. It can potentially be extended to other applications involving autoregressive parameters, such as speech coding and feature enhancement for speech recognition.
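For readers unfamiliar with the LSD measure, the following sketch shows one common way to compute a log-spectral distance in dB between a reference signal and an enhanced signal. The abstract does not give the exact definition used in the thesis, so the frame length, hop size, Hann window, and the use of power spectra here are all assumptions, and the function name `log_spectral_distance` is hypothetical.

```python
import numpy as np


def log_spectral_distance(ref, est, frame_len=512, hop=256, eps=1e-10):
    """Mean over frames of the RMS difference (in dB) between the
    log power spectra of two time-aligned signals."""
    n = min(len(ref), len(est))
    if n < frame_len:
        raise ValueError("signals are shorter than one analysis frame")
    window = np.hanning(frame_len)
    frame_dists = []
    for start in range(0, n - frame_len + 1, hop):
        ref_pow = np.abs(np.fft.rfft(window * ref[start:start + frame_len])) ** 2
        est_pow = np.abs(np.fft.rfft(window * est[start:start + frame_len])) ** 2
        # eps guards against log of zero in empty frequency bins.
        diff_db = 10.0 * np.log10(ref_pow + eps) - 10.0 * np.log10(est_pow + eps)
        frame_dists.append(np.sqrt(np.mean(diff_db ** 2)))
    return float(np.mean(frame_dists))


# Toy usage with white noise standing in for speech (assumed data).
rng = np.random.default_rng(0)
clean = rng.normal(0.0, 1.0, 4000)
degraded = clean + rng.normal(0.0, 0.3, 4000)
print("LSD (dB):", log_spectral_distance(clean, degraded))
```

Because LSD is a distance, a lower value means the spectrum of the processed signal is closer to that of the reference.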
Date of Award: 15 Jul 2013
Original language: English
Awarding Institution
  • City University of Hong Kong
Supervisor: Cheung Fat CHAN (Supervisor)

Keywords

  • Digital techniques
  • Speech processing systems
  • Signal processing
