基于Conformer的实时多场景说话人识别模型

Translated title of the contribution: Conformer-Based Speaker Recognition Model for Real-Time Multi-Scenarios

宣茜, 韩润萍*, 高静欣

*Corresponding author for this work

Research output: Journal Publications and Reviews; RGC 21 - Publication in refereed journal; peer-review

Abstract

To address the poor performance of speaker verification systems in scenarios involving cross-domain, long-duration, and noisy utterances, this paper designs PMS-Conformer, a real-time, robust speaker recognition model based on Conformer. Its architecture is inspired by the state-of-the-art MFA-Conformer model. PMS-Conformer improves on MFA-Conformer's acoustic feature extractor, network components, and loss calculation module, yielding a novel and effective acoustic feature extractor and a robust speaker embedding extractor with high generalization capability. PMS-Conformer is trained on the VoxCeleb1&2 datasets and compared with the baseline MFA-Conformer and with ECAPA-TDNN in extensive speaker verification experiments. The results show that on VoxMovies (cross-domain utterances), SITW (long-duration utterances), and VoxCeleb-O with noise added to its utterances, the automatic speaker verification (ASV) system built with PMS-Conformer is more competitive than those built with MFA-Conformer or ECAPA-TDNN. Moreover, the number of trainable parameters (Params) and the real-time factor (RTF) of PMS-Conformer's speaker embedding extractor are significantly lower than those of ECAPA-TDNN. Overall, the evaluation results demonstrate that PMS-Conformer performs well in real-time, multi-scenario settings.
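
The abstract reports RTF without defining it. As commonly used, the real-time factor is the processing time divided by the duration of the audio processed, so values below 1 mean faster than real time. The sketch below illustrates that conventional definition only; it is not the authors' measurement code, and `real_time_factor`, `measure_rtf`, the `extract` callable, and the 16 kHz default sample rate are all hypothetical names and assumptions introduced here.

```python
import time


def real_time_factor(process_seconds: float, audio_seconds: float) -> float:
    """Conventional RTF: processing time divided by audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return process_seconds / audio_seconds


def measure_rtf(extract, samples, sample_rate: int = 16000) -> float:
    """Time one call to a (hypothetical) embedding extractor and return its RTF.

    `extract` is any callable that takes a sequence of audio samples;
    the 16 kHz sample rate is an assumption, not taken from the paper.
    """
    start = time.perf_counter()
    extract(samples)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples) / sample_rate)
```

For example, an extractor that needs 0.5 s to process a 10 s utterance has an RTF of 0.05.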
Original language: Chinese (Simplified)
Pages (from-to): 147-156
Journal: 计算机工程与应用 Computer Engineering and Applications
Volume: 60
Issue number: 7
Publication status: Published - Apr 2024
Externally published: Yes

Research Keywords

  • speaker verification
  • MFA-Conformer
  • Sub-center AAM-Softmax
  • speaker embedding
  • acoustic feature extraction
