TY - JOUR
T1 - Beyond Class-Level Privacy Leakage
T2 - Breaking Record-Level Privacy in Federated Learning
AU - Yuan, Xiaoyong
AU - Ma, Xiyao
AU - Zhang, Lan
AU - Fang, Yuguang
AU - Wu, Dapeng
PY - 2022/2/15
Y1 - 2022/2/15
N2 - Federated learning (FL) enables multiple clients to collaboratively build a global learning model without sharing their raw data, thereby protecting privacy. Unfortunately, recent research has still found privacy leakage in FL, especially in image classification tasks, such as the reconstruction of class representatives. However, such analyses of image classification tasks cannot uncover the privacy threats against natural language processing (NLP) tasks, whose records, composed of sequential text, cannot be grouped into class representatives. The finer (record-level) granularity of NLP tasks not only makes it more challenging to extract individual text records but also exposes more serious threats. This article presents the first attempt to explore record-level privacy leakage against NLP tasks in FL. We propose a framework that investigates the exposure of records of interest in federated aggregations by leveraging the perplexity of language modeling. By monitoring exposure patterns, we propose two correlation attacks to identify the corresponding clients when extracting their specific records. Extensive experimental results demonstrate the effectiveness of the proposed attacks. We have also examined several countermeasures and shown that they are ineffective at mitigating such attacks; hence, further research is expected.
KW - Federated learning (FL)
KW - language modeling
KW - natural language processing
KW - neural networks
KW - privacy
UR - http://www.scopus.com/inward/record.url?scp=85112202658&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-85112202658&origin=recordpage
U2 - 10.1109/JIOT.2021.3089713
DO - 10.1109/JIOT.2021.3089713
M3 - RGC 21 - Publication in refereed journal
SN - 2327-4662
VL - 9
SP - 2555
EP - 2565
JO - IEEE Internet of Things Journal
JF - IEEE Internet of Things Journal
IS - 4
ER -