Perceptually non-uniform spectral compression for noisy speech recognition

K. K. Chu, S. H. Leung, C. S. Yip

Research output: Journal Publications and Reviews; RGC 21 - Publication in refereed journal; peer-review

6 Citations (Scopus)

Abstract

Loudness is a function of sound pressure level. The power law used to approximate the loudness function has an exponent that depends on the bandwidth of the sound signal: it decreases from about 0.3 for a narrow-band tone to 0.23 for broadband uniform-exciting noise. Exploiting this psychoacoustic property of hearing, this paper proposes a new feature extraction method for robust speech recognition with FFT-based methods. In this method, larger energy compression is applied to the broadband-like high-frequency bands of the power spectrum of each frame, instead of the fixed compression applied to all frequency bands in root cepstral analysis or perceptually based linear prediction (PLP). In addition, sound segments or frames with broadband characteristics, such as fricatives, are also given larger compression. The frame energy is used as the index that determines the degree of compression. With this non-uniform spectral compression scheme, recognition accuracy improves significantly under white noise, especially at very low SNR.
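
The abstract gives no formula for the compression, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a linear ramp of the exponent across FFT bins, from 0.3 at the lowest bin toward 0.23 at the highest, and assumes that low-energy frames are the broadband-like ones that receive the larger compression (smaller exponent). The names `band_exponents`, `compress_frame`, and `frame_energy_norm`, and the 0.5 scaling factor, are hypothetical.

```python
import numpy as np

# Exponents quoted in the abstract: about 0.3 for a narrow-band tone,
# 0.23 for broadband uniform-exciting noise.
ALPHA_NARROW = 0.30
ALPHA_BROAD = 0.23


def band_exponents(n_bins, frame_energy_norm):
    """Per-bin compression exponents for one frame.

    frame_energy_norm is the frame energy scaled to [0, 1]
    (a hypothetical normalization, not taken from the paper).
    Assumption: low-energy frames are treated as more broadband-like,
    so their high-frequency exponent sits closer to ALPHA_BROAD.
    """
    e = float(np.clip(frame_energy_norm, 0.0, 1.0))
    # Exponent at the highest bin: ALPHA_BROAD when e = 0, moving halfway
    # toward ALPHA_NARROW when e = 1 (the 0.5 factor is illustrative).
    alpha_high = ALPHA_BROAD + 0.5 * e * (ALPHA_NARROW - ALPHA_BROAD)
    # Linear ramp over frequency: an illustrative choice of mapping.
    return np.linspace(ALPHA_NARROW, alpha_high, n_bins)


def compress_frame(power_spectrum, frame_energy_norm):
    """Raise each power-spectrum bin to its own exponent."""
    p = np.asarray(power_spectrum, dtype=float)
    return p ** band_exponents(p.size, frame_energy_norm)


if __name__ == "__main__":
    flat = np.full(5, 100.0)          # flat toy spectrum, power > 1
    print(compress_frame(flat, 0.0))  # low-energy frame: strongest compression
    print(compress_frame(flat, 1.0))  # high-energy frame: milder compression
```

By contrast, a fixed-compression front end such as root cepstral analysis would use a single exponent for every bin and every frame; here the exponent varies with both frequency and frame energy.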
Original language: English
Pages (from-to): 404-407
Journal: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 1
Publication status: Published - 2003
Event: 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing - Hong Kong, Hong Kong, China
Duration: 6 Apr 2003 to 10 Apr 2003
