TY - JOUR
T1 - SNR-dependent non-uniform spectral compression for noisy speech recognition
AU - Chu, K. K.
AU - Leung, S. H.
PY - 2004
Y1 - 2004
AB - It is known that the loudness of a tone as perceived by humans is spectrally masked by background noise. This masking effect not only shifts the just-audible sound pressure level of the tone but also produces a masked loudness function with a steeper slope than the unmasked one. This masking property of perceived loudness motivates us to propose a new mel-scale-based feature extraction method with non-uniform spectral compression for speech recognition in noisy environments. In this method, the speech power spectrum undergoes mel-scaled band-pass filtering, as in the standard MFCC front end. However, the energies of the filter outputs are compressed by different root values defined by a compression function, which depends on the SNR in each filter band. Using this new scheme of SNR-dependent non-uniform spectral compression (SNSC) for mel-scaled filter-bank-based cepstral coefficients, substantial recognition improvements are found in different noisy environments compared with standard MFCC features and features derived with cube-root spectral compression.
UR - http://www.scopus.com/inward/record.url?scp=4544388717&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-4544388717&origin=recordpage
U2 - 10.1109/ICASSP.2004.1326150
DO - 10.1109/ICASSP.2004.1326150
M3 - RGC 21 - Publication in refereed journal
SN - 0736-7791
VL - 1
JO - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
JF - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
T2 - Proceedings - IEEE International Conference on Acoustics, Speech, and Signal Processing
Y2 - 17 May 2004 through 21 May 2004
ER -
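The abstract describes the SNSC front end only at the level of its stages: mel-scaled filtering of the power spectrum, a per-band root compression whose exponent depends on that band's SNR, and then cepstral coefficients. A minimal Python sketch of that pipeline follows. Every numeric choice in it is an assumption, not taken from the paper: the mel formula, the filter layout, the noise-subtraction SNR estimate, and especially the hypothetical `compression_root` mapping. That mapping is linear in SNR and clamped between 0 and 30 dB. It moves from an exponent of 2/3 in heavily masked bands to cube-root compression in clean ones, chosen only to mirror the steeper masked loudness slope described above.

```python
import math
from typing import List, Sequence

EPS = 1e-10  # floor to avoid log(0) and zero-energy edge cases


def hz_to_mel(f: float) -> float:
    # A common mel formula (assumption; other variants exist).
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters: int, n_fft: int, sample_rate: float) -> List[List[float]]:
    """Triangular mel filters over the n_fft // 2 + 1 power-spectrum bins."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 equally spaced mel points give each filter's left/centre/right edges.
    mel_points = [mel_max * i / (n_filters + 1) for i in range(n_filters + 2)]
    edges = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mel_points]
    fbank = []
    for j in range(1, n_filters + 1):
        left, centre, right = edges[j - 1], edges[j], edges[j + 1]
        row = [0.0] * n_bins
        for k in range(left, centre):  # rising slope
            row[k] = (k - left) / (centre - left)
        for k in range(centre, right):  # falling slope, peak 1.0 at the centre bin
            row[k] = (right - k) / (right - centre)
        fbank.append(row)
    return fbank


def band_energies(power_spectrum: Sequence[float], fbank: List[List[float]]) -> List[float]:
    """Weighted sum of the power spectrum under each triangular filter."""
    return [sum(w * p for w, p in zip(row, power_spectrum)) for row in fbank]


def band_snr_db(noisy_energy: float, noise_energy: float) -> float:
    # Assumption: speech energy ~ noisy band energy minus a noise estimate (floored).
    speech = max(noisy_energy - noise_energy, EPS)
    return 10.0 * math.log10(speech / max(noise_energy, EPS))


def compression_root(snr_db: float, alpha_clean: float = 1.0 / 3.0,
                     alpha_noisy: float = 2.0 / 3.0,
                     snr_low: float = 0.0, snr_high: float = 30.0) -> float:
    """HYPOTHETICAL compression function: linear in SNR, clamped to [snr_low, snr_high].

    Low-SNR (masked) bands get a larger root exponent (steeper growth);
    high-SNR bands fall back to cube-root compression.
    """
    t = (snr_db - snr_low) / (snr_high - snr_low)
    t = min(max(t, 0.0), 1.0)
    return alpha_noisy + t * (alpha_clean - alpha_noisy)


def dct_ii(x: Sequence[float], n_coeffs: int) -> List[float]:
    """Unnormalised DCT-II, used here to turn compressed band values into cepstra."""
    n = len(x)
    return [sum(x[k] * math.cos(math.pi * i * (k + 0.5) / n) for k in range(n))
            for i in range(n_coeffs)]


def snsc_cepstrum(power_spectrum: Sequence[float], noise_band_energies: Sequence[float],
                  fbank: List[List[float]], n_ceps: int = 13) -> List[float]:
    """Mel filter bank -> per-band SNR-dependent root compression -> DCT."""
    energies = band_energies(power_spectrum, fbank)
    compressed = []
    for e, n in zip(energies, noise_band_energies):
        alpha = compression_root(band_snr_db(e, n))
        compressed.append(max(e, EPS) ** alpha)
    return dct_ii(compressed, n_ceps)
```

The noise estimate is passed in per band so that the sketch stays independent of any particular noise-tracking method; when every band has a very high SNR, it reduces to cube-root compressed cepstra.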