Understanding the Security Issues of Microphones on Smart Phones

針對手機麥克風安全問題的研究

Student thesis: Doctoral Thesis

Author(s)

Related Research Unit(s)

Detail(s)

Awarding Institution
Supervisors/Advisors
Award date: 2 May 2023

Abstract

Mobile phones today are equipped with multiple microphones that support applications such as noise cancellation in phone calls and automatic speech recognition. Advances in microphone hardware enable useful applications, but they also raise serious security concerns. In this dissertation, we study two threats arising from the use of microphones on mobile phones, together with their corresponding countermeasures.

We first find that when a user types on a phone, the sounds generated by the vibration of the finger tapping the screen surface can be captured by the phone's dual microphones. These acoustic signals contain enough information to infer the position of the user's keystrokes, which creates a privacy risk: private information typed on the phone can leak. Specifically, we apply dedicated signal processing methods to the raw acoustic signals and accurately extract the keystroke sounds. We then combine time difference of arrival (TDoA) measurements with an unsupervised auto-encoder model to recognize the position of each keystroke, which removes the need for keystroke signals collected from the victim. The evaluation shows promising success rates for twenty-six keyboards on various types of mobile phones.

Second, acoustic commands collected by microphones are widely used in automatic speech recognition (ASR) systems for human-computer interaction, including Amazon Alexa, Apple Siri and Microsoft Cortana. ASR systems are vulnerable to replay attacks, in which an attacker records voice commands from legitimate users and plays them back. Several state-of-the-art defense systems identify replayed signals through the distinctive acoustic features that speaker playback introduces in the frequency and phase domains. Previous works have successfully modulated the signals played by the speaker to compensate for the differences in the frequency domain, which misleads some defense systems. More recent works, however, use distinctive features in the phase domain to detect replayed signals. We find that these features are caused by hardware imperfections of the speakers, which are difficult to measure with conventional signal processing methods. Instead of designing conventional filters, we compensate the replayed signals in both the frequency domain and the phase domain using deep learning networks trained in a two-stage process, and successfully attack state-of-the-art defense systems.

In summary, this dissertation studies the threats posed by acoustic signals that microphones on mobile phones capture for various applications, and provides corresponding countermeasures to defend against the proposed attacks.
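The abstract does not say how the TDoA measurements are obtained. As a rough illustration only, the sketch below estimates the delay between two microphone channels by searching for the lag that maximizes their cross-correlation, a common generic approach that is not claimed to be the thesis's method. The names `estimate_tdoa`, `synthetic_tap`, `max_lag`, and the 48 kHz sample rate are hypothetical choices made for this example.

```python
import math


def estimate_tdoa(sig_a, sig_b, sample_rate, max_lag):
    """Estimate the delay of sig_b relative to sig_a, in seconds.

    Generic cross-correlation sketch (an assumption, not the thesis's
    pipeline). The result is positive when sig_b lags behind sig_a.
    max_lag bounds the search in samples; on a real phone it would be
    limited by the microphone spacing and the speed of sound.
    """
    n = min(len(sig_a), len(sig_b))
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Correlate sig_a[i] with sig_b[i + lag] over the overlapping samples.
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += sig_a[i] * sig_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / sample_rate


def synthetic_tap(delay_samples, length=512, onset=200):
    """Hypothetical test signal: a short decaying sinusoid standing in for a tap."""
    sig = [0.0] * length
    for t in range(64):
        sig[onset + delay_samples + t] = math.exp(-t / 10.0) * math.sin(2 * math.pi * t / 8.0)
    return sig


# Example: the same tap reaches microphone B five samples after microphone A.
mic_a = synthetic_tap(0)
mic_b = synthetic_tap(5)
tdoa_seconds = estimate_tdoa(mic_a, mic_b, sample_rate=48000, max_lag=20)
```

A single TDoA value from two microphones only constrains the tap position to a curve on the screen, which may be one reason the abstract pairs TDoA with a learned model rather than using it alone.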