SNR-dependent non-uniform spectral compression for noisy speech recognition

K. K. Chu, S. H. Leung

Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review

17 Citations (Scopus)

Abstract

It is known that the loudness of a tone perceived by humans is spectrally masked by background noise. This masking effect not only shifts the just-audible sound pressure level of the tone but also produces a masked loudness function with a steeper slope than the unmasked one. This masking property of perceived loudness motivates a new mel-scale-based feature extraction method with non-uniform spectral compression for speech recognition in noisy environments. In this method, the speech power spectrum undergoes mel-scaled band-pass filtering, as in the standard MFCC front-end. However, the energies of the filter outputs are compressed by different root values defined by a compression function, which is a function of the SNR in each filter band. Using this new scheme of SNR-dependent non-uniform spectral compression (SNSC) for mel-scaled filter-bank-based cepstral coefficients, substantial improvement is found for recognition in different noisy environments, compared with the standard MFCC and with features derived using cubic-root spectral compression.
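
The processing chain described in the abstract (mel filter-bank energies, a per-band root compression chosen from the band SNR, then cepstral coefficients) can be illustrated with a minimal Python sketch. The abstract does not give the compression function, so `compression_exponent` below is a hypothetical placeholder: a linear ramp between two assumed exponents (`exp_low`, `exp_high`) over an assumed SNR range, with an arbitrarily chosen direction. The per-band SNR estimate (subtracting an assumed noise estimate from the band energy) and the function names are also assumptions, not the authors' formulation.

```python
import numpy as np

def compression_exponent(snr_db, exp_low=0.1, exp_high=1.0 / 3.0,
                         snr_low=0.0, snr_high=30.0):
    """Hypothetical compression function: per-band SNR (dB) -> root exponent.

    ASSUMPTION: a clipped linear ramp from exp_low to exp_high. The actual
    function used in the paper is not stated in the abstract.
    """
    snr_db = np.asarray(snr_db, dtype=float)
    t = np.clip((snr_db - snr_low) / (snr_high - snr_low), 0.0, 1.0)
    return exp_low + t * (exp_high - exp_low)

def snsc_features(band_energies, noise_energies, n_ceps=13):
    """Sketch of SNR-dependent non-uniform spectral compression.

    band_energies:  mel filter-bank energies, shape (..., n_bands)
    noise_energies: assumed per-band noise energy estimate, same shape
    """
    eps = 1e-12
    band_energies = np.asarray(band_energies, dtype=float)
    noise_energies = np.asarray(noise_energies, dtype=float)

    # ASSUMPTION: band SNR estimated as (noisy - noise) / noise.
    clean_est = np.maximum(band_energies - noise_energies, eps)
    snr_db = 10.0 * np.log10(clean_est / np.maximum(noise_energies, eps))

    # Each band gets its own root exponent instead of a fixed log or cube root.
    exponents = compression_exponent(snr_db)
    compressed = np.power(np.maximum(band_energies, eps), exponents)

    # Unnormalized DCT-II across bands gives the cepstral coefficients.
    n_bands = band_energies.shape[-1]
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_bands)[None, :]
    basis = np.cos(np.pi * k * (n + 0.5) / n_bands)
    return compressed @ basis.T
```

Setting `exp_low` equal to `exp_high` (for example both to 1/3) reduces this sketch to uniform root compression, which corresponds to the cubic-root baseline the abstract compares against.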
