Abstract
Voice control has become an indispensable interaction method in smart devices. Compared to traditional microphones, mmWave radar offers a promising solution for speech sensing in noisy environments. However, most current research relies on single-view information, such as vocal cord vibrations or lip movements, to classify speech. This overlooks important details like timbre, speech rate, and intonation, and limits the application of speech sensing. To address these issues, we develop a high-quality speech sensing method based on mmWave radar, named SpDiff. This method accurately localizes the vocalizing target and, based on the human vocal mechanism, extracts multi-view speech features according to the movement characteristics of the vocal cords, lips, and face. Additionally, to generate high-quality speech signals, we design a conditional latent diffusion model (CLDM), which uses the multi-view radar information as conditional guidance to capture the complex mapping between the radar and speech signal distributions. To evaluate SpDiff, we build a mmWave system using the IWR1443Boost and recruit 14 volunteers to construct a dataset. Experimental results show that SpDiff achieves high-quality speech sensing: when the generated speech is fed directly into existing recognition models, it achieves average character and word error rates (CER/WER) of only 2.33% and 3.05%, respectively.
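For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how CER and WER are conventionally computed from an edit (Levenshtein) distance. It is not taken from the paper, which does not state its scoring tool or text normalization. The function names `edit_distance`, `wer`, and `cer` are hypothetical, and counting spaces as characters in CER is an assumption.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences
    (minimum number of substitutions, insertions, and deletions)."""
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference items into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length.
    Assumption: spaces are counted as characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    ref = "turn on the light"
    hyp = "turn on the lights"
    print(f"WER = {wer(ref, hyp):.4f}")
    print(f"CER = {cer(ref, hyp):.4f}")
```

In this toy pair, one of four reference words is wrong while only one of 17 reference characters differs, so WER is much higher than CER. This kind of difference is why the two metrics are usually reported together.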
© 2025 IEEE
| Original language | English |
|---|---|
| Title of host publication | 2025 IEEE Wireless Communications and Networking Conference (WCNC) |
| Publisher | IEEE |
| Number of pages | 6 |
| ISBN (Electronic) | 979-8-3503-6836-9 |
| DOIs | |
| Publication status | Published - 2025 |
Funding
The work of Xuanheng Li was supported in part by the National Natural Science Foundation of China under Grant 62271100, in part by the Science and Technology Program of Liaoning Province under Grant 2023JH2/101700366, in part by the Fundamental Research Funds for the Central Universities under Grant DUT24ZD127, in part by the Open Research Fund of National Mobile Communications Research Laboratory, Southeast University, under Grant 2025D02, and in part by the Xiaomi Young Talents Program.
Research Keywords
- Wireless Sensing
- Millimeter Wave Radar
- Speech Sensing
- Latent Diffusion Model
- Deep Learning
Fingerprint
Dive into the research topics of 'SpDiff: A Speech Sensing System With Diffusion Model Based on mmWave Radar'. Together they form a unique fingerprint.