Telephone speech is narrowband (0.3 – 3.4 kHz) and generally demands more listening effort than wideband speech (0 – 8 kHz). Because of the legacy telephone network, the high-band spectral information is lost, which makes the speech sound muffled. Although wideband voice terminals are gradually being introduced to replace narrowband terminals, there will be a long transition period before wideband systems are fully deployed. During this period, when the network contains a mixture of narrowband and wideband terminals, a speech bandwidth extension system is an economical way to enhance perceived speech quality and intelligibility without modifying the physical telephone network infrastructure. This thesis presents two speech bandwidth extension systems based on different statistical models. Both exploit speech state dynamics so that the missing high-band information is estimated more accurately using prior knowledge from previous estimates. The first method is built on a continuous-density hidden Markov model (HMM) and maps the Markov states of the narrowband speech to those of the wideband speech. The second method employs a linear state-space model in which the wideband speech is estimated with the well-known Kalman filtering algorithm. Despite their different underlying statistical models, both systems exploit the relationship between narrowband and wideband speech state dynamics, which is essential for significantly improving the performance of bandwidth extension systems. The performance of the two proposed systems is evaluated and compared with other reported bandwidth extension systems using both objective and subjective measures. In the objective evaluation, the HMM state mapping approach reduces high-band spectral distortion by 0.3 dB relative to the conventional Gaussian mixture model (GMM) approach, while the linear state dynamic approach achieves a 0.35 dB improvement.
In the subjective assessment, the enhanced wideband speech sounds crisper and clearer than narrowband telephone speech. Moreover, the hissing artifacts present in all bandwidth extension systems are substantially reduced. The results indicate that exploiting speech state dynamics can improve the performance of bandwidth extension systems.
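As a rough illustration of the Kalman filtering step mentioned in the abstract, the sketch below runs a scalar Kalman filter over a sequence of observations. It is not the thesis's implementation: the scalar model, the function names `kalman_step` and `kalman_filter`, and the parameters `a`, `h`, `q`, `r` are all assumptions chosen for illustration. In a bandwidth extension setting the hidden state would presumably stand for a wideband speech parameter and the observation for a narrowband feature, with vector-valued quantities in place of scalars.

```python
def kalman_step(x_est, p_est, y, a, h, q, r):
    """One predict/update cycle of a scalar Kalman filter (illustrative only).

    Assumed model:  x_k = a * x_{k-1} + w,  w ~ N(0, q)
                    y_k = h * x_k     + v,  v ~ N(0, r)
    """
    # Predict: propagate the previous estimate through the state dynamics.
    x_pred = a * x_est
    p_pred = a * p_est * a + q

    # Update: correct the prediction with the new observation.
    gain = p_pred * h / (h * p_pred * h + r)
    x_new = x_pred + gain * (y - h * x_pred)
    p_new = (1.0 - gain * h) * p_pred
    return x_new, p_new


def kalman_filter(observations, x0, p0, a, h, q, r):
    """Filter a whole sequence and return the list of state estimates."""
    x_est, p_est = x0, p0
    estimates = []
    for y in observations:
        x_est, p_est = kalman_step(x_est, p_est, y, a, h, q, r)
        estimates.append(x_est)
    return estimates
```

Each estimate depends on the previous one through the prediction step, which is the sense in which earlier estimates act as prior knowledge for the current frame.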
| Date of Award | 16 Jul 2007 |
|---|---|
| Original language | English |
| Awarding Institution | City University of Hong Kong |
| Supervisor | Cheung Fat CHAN (Supervisor) |
- Broadband communication systems
- Design
Bandwidth extension for enhancement of narrowband speech by utilizing speech state dynamics
YAO, S. (Author). 16 Jul 2007
Student thesis: Master's Thesis