Audio enhancement and intelligent classification of household sound events using a sparsely deployed array

Mingsian R. Bai*, Shih-Syuan Lan, Jong-Yi Huang, Yi-Cheng Hsu, Hing-Cheung So

*Corresponding author for this work

Research output: Journal Publications and Reviews; RGC 21 - Publication in refereed journal; peer-review

8 Citations (Scopus)
52 Downloads (CityUHK Scholars)

Abstract

A household sound event classification system consisting of an audio localization and enhancement front-end cascaded with an intelligent classification back-end is presented. The front-end is composed of a sparsely deployed microphone array and a preprocessing unit to localize the source and extract the associated signal. In the front-end, a two-stage method and a direct method are compared for localization. The two-stage method introduces a subspace algorithm to estimate the time difference of arrival, followed by a constrained least squares algorithm to determine the source location. The direct localization methods (the delay-and-sum beamformer, the minimum power distortionless response beamformer, and the multiple signal classification algorithm) are compared in terms of localization performance for a sparse array configuration. A modified particle swarm optimization algorithm enables an efficient grid search. A minimum variance distortionless response beamformer in conjunction with a minimum-mean-square-error postfilter is exploited to extract the source signals for the sound event classification tasks that follow. The back-end of the system is a sound event classifier that is based on convolutional neural networks (CNNs) and convolutional long short-term memory networks. Mel-spectrograms are used as the input features to the CNNs. Simulations and experiments conducted in a live room have demonstrated the strengths and weaknesses of the direct and two-stage methods. Signal quality enhancement using the array-based front-end proves beneficial for improved classification accuracy over a single microphone. © 2020 Acoustical Society of America.
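The second stage of the two-stage method, going from time differences of arrival (TDOAs) to a source position, can be illustrated with a minimal sketch. It is not the authors' implementation: the constrained least squares solver is swapped for a plain exhaustive grid search, the TDOAs are simulated without noise rather than estimated by a subspace algorithm, and the room size, microphone positions, speed of sound, and function names (tdoas, locate_by_grid) are all assumptions made for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def tdoas(point, mics, c=SPEED_OF_SOUND):
    """TDOAs (seconds) of every microphone relative to the first one."""
    dists = [math.dist(point, m) for m in mics]
    return [(d - dists[0]) / c for d in dists[1:]]


def locate_by_grid(measured, mics, x_range, y_range, step):
    """Return the grid point whose predicted TDOAs best match `measured`
    in the least-squares sense (exhaustive, unconstrained search)."""
    best_point, best_cost = None, math.inf
    nx = int(round((x_range[1] - x_range[0]) / step)) + 1
    ny = int(round((y_range[1] - y_range[0]) / step)) + 1
    for i in range(nx):
        for j in range(ny):
            p = (x_range[0] + i * step, y_range[0] + j * step)
            cost = sum((a - b) ** 2 for a, b in zip(tdoas(p, mics), measured))
            if cost < best_cost:
                best_point, best_cost = p, cost
    return best_point, best_cost


# Hypothetical 4 m x 3 m room with one microphone at each corner.
mics = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
true_source = (1.5, 2.0)
measured = tdoas(true_source, mics)  # simulated, noise-free TDOAs

estimate, cost = locate_by_grid(measured, mics, (0.0, 4.0), (0.0, 3.0), 0.25)
print(estimate, cost)
```

In this noise-free setup the simulated source lies on the search grid, so the search returns it with zero residual. With measured TDOAs the residual would be nonzero and the grid step would bound the attainable resolution, which is one reason a closed-form or swarm-based search may be preferred over an exhaustive one.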
Original language: English
Pages (from-to): 11-24
Journal: Journal of the Acoustical Society of America
Volume: 147
Issue number: 1
Online published: 13 Jan 2020
DOI: https://doi.org/10.1121/10.0000492
Publication status: Published - Jan 2020

Publisher's Copyright Statement

  • COPYRIGHT TERMS OF DEPOSITED FINAL PUBLISHED VERSION FILE: This article may be downloaded for personal use only. Any other use requires prior permission of the author and AIP Publishing. This article appeared in Mingsian R. Bai, Shih-Syuan Lan, Jong-Yi Huang, Yi-Cheng Hsu, and Hing-Cheung So, "Audio enhancement and intelligent classification of household sound events using a sparsely deployed array", The Journal of the Acoustical Society of America 147, 11-24 (2020) and may be found at https://doi.org/10.1121/10.0000492.
