TY - GEN
T1 - Speech bandwidth enhancement using state space speech dynamics
AU - Yao, Sheng
AU - Chan, Cheung-Fat
PY - 2006
Y1 - 2006
N2 - Extending narrowband speech (0-4 kHz) to wideband speech (0-8 kHz) has applications in telephone systems and in speech recognition systems where wideband training speech data may not be available. Several methods have been proposed to retrieve the missing high-band information (4-8 kHz) from narrowband speech. Memoryless systems are likely to produce large hissing artifacts because the mutual information between the low-band (0-4 kHz) and high-band (4-8 kHz) spectra is quite low. Generally speaking, bandwidth extension cannot recover the original high-band information, but good approximations with less over-estimation of the high-band energy, which is usually heard as a hissing artifact, can be obtained by considering the neighboring speech frames. In this paper, we propose a new bandwidth extension system with memory that uses a state-space model to capture the long-term speech dynamics. The model parameters can be trained in the maximum likelihood (ML) sense, and the enhancement is obtained via wideband state vector estimation and Kalman filtering. The performance in terms of spectral distortion is shown to be much better than that of memoryless systems and comparable with that of an earlier Continuous Density Hidden Markov Model (CDHMM) system with memory. The new state-space method is inherently sequential and has the advantages of lower processing delay and robustness against block detection errors. © 2006 IEEE.
AB - Extending narrowband speech (0-4 kHz) to wideband speech (0-8 kHz) has applications in telephone systems and in speech recognition systems where wideband training speech data may not be available. Several methods have been proposed to retrieve the missing high-band information (4-8 kHz) from narrowband speech. Memoryless systems are likely to produce large hissing artifacts because the mutual information between the low-band (0-4 kHz) and high-band (4-8 kHz) spectra is quite low. Generally speaking, bandwidth extension cannot recover the original high-band information, but good approximations with less over-estimation of the high-band energy, which is usually heard as a hissing artifact, can be obtained by considering the neighboring speech frames. In this paper, we propose a new bandwidth extension system with memory that uses a state-space model to capture the long-term speech dynamics. The model parameters can be trained in the maximum likelihood (ML) sense, and the enhancement is obtained via wideband state vector estimation and Kalman filtering. The performance in terms of spectral distortion is shown to be much better than that of memoryless systems and comparable with that of an earlier Continuous Density Hidden Markov Model (CDHMM) system with memory. The new state-space method is inherently sequential and has the advantages of lower processing delay and robustness against block detection errors. © 2006 IEEE.
UR - http://www.scopus.com/inward/record.url?scp=33947693850&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-33947693850&origin=recordpage
U2 - 10.1109/ICASSP.2006.1660064
DO - 10.1109/ICASSP.2006.1660064
M3 - RGC 32 - Refereed conference paper (with host publication)
SN - 9781424404698
VL - 1
SP - I 489
EP - I 492
BT - 2006 IEEE International Conference on Acoustics, Speech and Signal Processing
PB - IEEE
T2 - 2006 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2006)
Y2 - 14 May 2006 through 19 May 2006
ER -