TY - JOUR
T1 - Perceptually non-uniform spectral compression for noisy speech recognition
AU - Chu, K. K.
AU - Leung, S. H.
AU - Yip, C. S.
PY - 2003
Y1 - 2003
N2 - Loudness is a function of sound pressure level. The power law used to approximate the loudness function has an exponent that depends on the bandwidth of the sound signal: it decreases from about 0.3 for a narrowband tone to 0.23 for broadband uniform-exciting noise. Exploiting this psychoacoustic property of hearing, this paper proposes a new feature extraction method for robust speech recognition with FFT-based front ends. Instead of applying a fixed compression to all frequency bands, as in root cepstral analysis or perceptually based linear prediction (PLP), the method applies larger energy compression to the broadband-like high-frequency bands of each frame's power spectrum. In addition, frames with broadband characteristics, such as those of fricatives, are also given larger compression, with the frame energy used as the index that determines the degree of compression. This non-uniform spectral compression yields a significant improvement in recognition accuracy in white noise, especially at very low SNRs.
AB - Loudness is a function of sound pressure level. The power law used to approximate the loudness function has an exponent that depends on the bandwidth of the sound signal: it decreases from about 0.3 for a narrowband tone to 0.23 for broadband uniform-exciting noise. Exploiting this psychoacoustic property of hearing, this paper proposes a new feature extraction method for robust speech recognition with FFT-based front ends. Instead of applying a fixed compression to all frequency bands, as in root cepstral analysis or perceptually based linear prediction (PLP), the method applies larger energy compression to the broadband-like high-frequency bands of each frame's power spectrum. In addition, frames with broadband characteristics, such as those of fricatives, are also given larger compression, with the frame energy used as the index that determines the degree of compression. This non-uniform spectral compression yields a significant improvement in recognition accuracy in white noise, especially at very low SNRs.
UR - http://www.scopus.com/inward/record.url?scp=0141479992&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-0141479992&origin=recordpage
U2 - 10.1109/ICASSP.2003.1198803
DO - 10.1109/ICASSP.2003.1198803
M3 - RGC 21 - Publication in refereed journal
SN - 0736-7791
VL - 1
SP - 404
EP - 407
JO - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
JF - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
T2 - 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing
Y2 - 6 April 2003 through 10 April 2003
ER -