Robust feature extraction techniques for speech recognition
Student thesis: Master's Thesis
Award date: 1 Nov 1994
The performance of conventional LPC-based speech recognition systems degrades rapidly in noisy environments. Recent research has concentrated on three strategies for dealing with this problem: enhancement preprocessing, robust modeling, and reduction of the mismatch in noise conditions between the testing and reference templates. This thesis addresses all three strategies. The Second-order Volterra Prediction filter (SVP) is proposed as a new nonlinear modeling technique for speech. It exploits the inherent nonlinear characteristics of speech by introducing a quadratic kernel into the prediction analysis. It is applied as a preprocessor module, since it produces a substantial signal-to-noise ratio (SNR) improvement when enhancing speech with low input SNR. Two-Sided Linear Prediction coding (TSLP) is proposed as a robust feature extraction technique. It uses both past and future samples in the prediction, with symmetric or asymmetric weightings. Its special pole structure produces sharp, enhanced spectral peaks, which make TSLP successful in describing the characteristics of noisy speech. As a result, better spectral matching is achieved and the performance of noisy speech recognition can be improved. The reduced-rank subspace approach is adopted to minimize the mismatch in noise conditions between testing and training patterns. It partitions the vector space of noisy speech into a signal subspace and a noise subspace, using rank determination methods. This approach can be integrated with the TSLP model, so that more robust features can be extracted from the noise-resistant signal subspace. The new feature extraction techniques are compared with several benchmark algorithms using an isolated-word, speaker-dependent Cantonese speech recognition system in noisy conditions. Simulation results show that the new techniques achieve better recognition rates than the benchmark algorithms.
- Speech processing systems
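As an illustration of the two-sided prediction idea described in the abstract, the following is a minimal sketch and not the thesis's actual formulation. It assumes the symmetric-weighting case, in which a sample x[n] is predicted as the sum over k = 1..p of a_k (x[n-k] + x[n+k]), and it assumes the coefficients are estimated by ordinary least squares over the interior samples of a frame. The abstract does not state the estimation method, the asymmetric variant, or any windowing, so those are left out. The function names (`solve_linear`, `two_sided_regressors`, `tslp_symmetric`, `residual_energy`) are hypothetical and chosen only for this example.

```python
import math


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's data is left untouched.
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular; try a lower order")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def two_sided_regressors(x, p):
    """For each interior n, pair the target x[n] with [x[n-k] + x[n+k], k=1..p]."""
    rows, targets = [], []
    for n in range(p, len(x) - p):
        rows.append([x[n - k] + x[n + k] for k in range(1, p + 1)])
        targets.append(x[n])
    return rows, targets


def tslp_symmetric(x, p):
    """Least-squares symmetric two-sided predictor coefficients a_1..a_p.

    Assumption for this sketch: equal weights on x[n-k] and x[n+k].
    """
    rows, targets = two_sided_regressors(x, p)
    # Normal equations R a = r built from the two-sided regressors.
    R = [[sum(row[i] * row[j] for row in rows) for j in range(p)]
         for i in range(p)]
    r = [sum(row[i] * t for row, t in zip(rows, targets)) for i in range(p)]
    return solve_linear(R, r)


def residual_energy(x, a):
    """Sum of squared two-sided prediction errors over the interior samples."""
    rows, targets = two_sided_regressors(x, len(a))
    return sum((t - sum(ak * s for ak, s in zip(a, row))) ** 2
               for row, t in zip(rows, targets))


if __name__ == "__main__":
    # Two noiseless sinusoids: an order-2 symmetric predictor can fit them.
    x = [math.cos(0.3 * n) + 0.5 * math.cos(1.1 * n + 0.4) for n in range(200)]
    a = tslp_symmetric(x, 2)
    print("coefficients:", a)
    print("residual energy:", residual_energy(x, a))
```

For a single noiseless sinusoid cos(ωn), the identity x[n-1] + x[n+1] = 2 cos(ω) x[n] means an order-1 symmetric predictor fits exactly with a_1 = 1 / (2 cos ω), which gives a quick sanity check on the sketch. The code is kept dependency-free for clarity; a practical implementation would more likely use a numerical library and frame-by-frame analysis.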