Abstract
Virtual personal assistant (VPA) services encompass a large number of third-party applications (or apps) to enrich their functionalities. These apps have been well examined to scrutinize their data collection behaviors against their declared privacy policies. Nonetheless, it is often overlooked that most users tend to ignore privacy policies at installation time. Dishonest developers can thus exploit this situation by embedding excessive declarations to cover their data collection behaviors during compliance auditing.
In this work, we present Pico, a privacy inconsistency detector, which checks a VPA app's privacy compliance by analyzing the (in)consistency between the data it requests and the data essential for its functionality. Pico infers the app's functionality topics from its publicly available textual data, and leverages advanced GPT-based language models to address domain-specific challenges. By comparing an app against counterparts with similar functionality, suspicious data collection can be detected through the lens of anomaly detection. We apply Pico to understand the status quo of data-functionality compliance among all 65,195 skills in the Alexa app store. Our study reveals that 21.7% of the analyzed skills exhibit suspicious data collection, including Top 10 popular Alexa skills that pose threats to 54,116 users. These findings should alert both developers and users to compliance with the purpose limitation principle in data regulations.

© 2024 Copyright is held by the owner/author(s). Publication rights licensed to ACM.
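The peer-comparison idea in the abstract can be illustrated with a minimal sketch. This is not Pico's implementation: the abstract does not specify the anomaly-detection method, so the rarity rule, the `min_peers` and `rarity_threshold` parameters, the field names (`name`, `topic`, `requests`), and the sample skills below are all hypothetical. The sketch assumes each skill already has a functionality topic assigned, and flags a requested data type when only a small share of same-topic skills request it.

```python
from collections import Counter, defaultdict


def flag_suspicious(skills, min_peers=3, rarity_threshold=0.2):
    """Return {skill name: data types that few same-topic skills request}.

    Hypothetical rarity rule for illustration only; not the method
    used by Pico.
    """
    # Group skills by their (assumed, pre-computed) functionality topic.
    by_topic = defaultdict(list)
    for skill in skills:
        by_topic[skill["topic"]].append(skill)

    flagged = {}
    for group in by_topic.values():
        if len(group) < min_peers:
            continue  # too few peers to judge what is typical
        # Count how many skills in the topic request each data type.
        counts = Counter(d for s in group for d in set(s["requests"]))
        for s in group:
            rare = sorted(
                d for d in set(s["requests"])
                if counts[d] / len(group) < rarity_threshold
            )
            if rare:
                flagged[s["name"]] = rare
    return flagged


# Made-up sample data: six weather skills and two recipe skills.
example_skills = [
    {"name": f"weather_{c}", "topic": "weather", "requests": ["location"]}
    for c in "abcde"
] + [
    {"name": "weather_f", "topic": "weather",
     "requests": ["location", "phone_number"]},
    {"name": "recipe_a", "topic": "recipes", "requests": ["email"]},
    {"name": "recipe_b", "topic": "recipes", "requests": []},
]

print(flag_suspicious(example_skills))
```

In this made-up data, every weather skill requests `location`, so it is treated as typical for the topic, while `phone_number` is requested by one of six peers and falls below the threshold. The recipe topic is skipped because it has fewer than `min_peers` skills.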
| Original language | English |
|---|---|
| Title of host publication | ICSE '24: Proceedings of the IEEE/ACM 46th International Conference on Software Engineering |
| Publisher | Association for Computing Machinery |
| ISBN (Print) | 9798400702174 |
| DOIs | |
| Publication status | Published - May 2024 |
| Externally published | Yes |
| Event | 46th IEEE/ACM International Conference on Software Engineering (ICSE 2024), Centro Cultural de Belém, Lisbon, Portugal. Duration: 14 Apr 2024 → 20 Apr 2024. https://conf.researchr.org/home/icse-2024 |
Publication series
| Name | Proceedings - International Conference on Software Engineering |
|---|---|
| ISSN (Print) | 0270-5257 |
Conference
| Conference | 46th IEEE/ACM International Conference on Software Engineering (ICSE 2024) |
|---|---|
| Place | Portugal |
| City | Lisbon |
| Period | 14/04/24 → 20/04/24 |
| Internet address | https://conf.researchr.org/home/icse-2024 |
Funding
We thank the anonymous reviewers for their insightful comments to improve this manuscript. This work is partially supported by Australian Research Council Discovery Projects (DP230101196, DP240103068).
Research Keywords
- Alexa skills
- privacy compliance
- Virtual Personal Assistant
Fingerprint
Dive into the research topics of 'Are Your Requests Your True Needs? Checking Excessive Data Collection in VPA Apps'. Together they form a unique fingerprint.