AuralNet: Hierarchical Attention-based 3D Binaural Localization of Overlapping Speakers

  • Linya Fu (Co-first Author)
  • Yu Liu (Co-first Author)
  • Zhijie Liu
  • Zedong Yang
  • Zhong-Qiu Wang
  • Youfu Li
  • He Kong

Research output: Chapters, Conference Papers, Creative and Literary Works; RGC 32 - Refereed conference paper (with host publication); peer-review

Abstract

We propose AuralNet, a novel 3D multi-source binaural sound source localization approach that localizes overlapping sources in both azimuth and elevation without prior knowledge of the number of sources. AuralNet employs a gated coarse-to-fine architecture, combining a coarse classification stage with a fine-grained regression stage, allowing for flexible spatial resolution through sector partitioning. The model incorporates a multi-head self-attention mechanism to capture spatial cues in binaural signals, enhancing robustness in noisy-reverberant environments. A masked multi-task loss function is designed to jointly optimize sound detection, azimuth, and elevation estimation. Extensive experiments in noisy-reverberant conditions demonstrate the superiority of AuralNet over recent methods. © 2025 International Speech Communication Association. All rights reserved.
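
The coarse-to-fine decoding and the masked multi-task loss described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 0.5 detection threshold, the equal-width azimuth sectors, the offset scaling, and the loss weights are all assumptions made for illustration. The sketch only shows the general idea that a per-sector detection score gates which sectors emit a source, and that regression errors are counted only in sectors that actually contain a source.

```python
import math

def decode_sectors(det_probs, az_offsets, el_preds, sector_width=30.0, threshold=0.5):
    """Hypothetical decoder: coarse detection gates fine regression per sector.

    det_probs  : per-sector probability that a source is present (coarse stage)
    az_offsets : per-sector azimuth offset in [-1, 1], relative to the sector center
    el_preds   : per-sector elevation estimate in degrees (assumed direct regression)
    """
    sources = []
    for k, p in enumerate(det_probs):
        if p >= threshold:  # gate: only detected sectors produce a source
            center = (k + 0.5) * sector_width
            azimuth = (center + az_offsets[k] * sector_width / 2.0) % 360.0
            sources.append((azimuth, el_preds[k]))
    return sources  # the list length is the estimated number of sources

def masked_multitask_loss(det_probs, az_pred, el_pred, active, az_true, el_true,
                          w_az=1.0, w_el=1.0):
    """Hypothetical loss: detection BCE over all sectors plus masked squared errors."""
    eps = 1e-7
    # Detection term: binary cross-entropy averaged over every sector.
    bce = -sum(a * math.log(max(p, eps)) + (1 - a) * math.log(max(1.0 - p, eps))
               for p, a in zip(det_probs, active)) / len(active)
    n_active = sum(active)
    if n_active == 0:
        return bce  # no source present, so there is nothing to regress
    # Regression terms: the mask `active` zeroes out sectors without a source.
    az_err = sum(a * (p - t) ** 2 for a, p, t in zip(active, az_pred, az_true)) / n_active
    el_err = sum(a * (p - t) ** 2 for a, p, t in zip(active, el_pred, el_true)) / n_active
    return bce + w_az * az_err + w_el * el_err
```

Under these assumptions, the number of sources never has to be supplied: it falls out of how many sectors pass the detection gate. The masking means that arbitrary regression outputs in empty sectors do not affect the loss.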
Original language: English
Title of host publication: Proceedings of Interspeech 2025
Pages: 938-942
Number of pages: 5
Publication status: Published - Aug 2025
Event: 26th Interspeech Conference 2025 - Rotterdam, Netherlands
Duration: 17 Aug 2025 to 21 Aug 2025
https://www.interspeech2025.org/

Conference

Conference: 26th Interspeech Conference 2025
Place: Netherlands
City: Rotterdam
Period: 17/08/25 to 21/08/25

Funding

This work was supported by the Science, Technology, and Innovation Commission of Shenzhen Municipality, China (Grant No. ZDSYS20220330161800001), the Shenzhen Science and Technology Program (Grant No. KQTD20221101093557010), and the Guangdong Science and Technology Program (Grant No. 2024B1212010002).

Research Keywords

  • 3D localization
  • binaural sound source localization
  • coarse-to-fine architecture
  • overlapping sources
  • self-attention
