Preserving model privacy for machine learning in distributed systems

Qi Jia, Linke Guo*, Zhanpeng Jin, Yuguang Fang

*Corresponding author for this work

Research output: Journal Publications and Reviews > RGC 21 - Publication in refereed journal > peer-review

53 Citations (Scopus)

Abstract

Machine learning-based data classification is a widely used data mining technique. By learning from massive data collected from the real world, data classification helps learners discover hidden data patterns. These hidden data patterns are represented by the learned model in different machine learning schemes. Based on such models, a user can classify whether new incoming data belongs to an existing class, or multiple entities may test the similarity of their datasets. However, due to data locality and privacy concerns, it is infeasible for large-scale distributed systems to share each individual's datasets for classifying or testing. On the one hand, the learned model is an entity's private asset and may leak private information, so it should be well protected from all other non-collaborative entities. On the other hand, the new incoming data may contain sensitive information that cannot be disclosed directly for classification. To address these privacy issues, we propose an approach to preserve the model privacy of data classification and similarity evaluation for distributed systems. With our scheme, neither new data nor learned models are directly revealed during the classification and similarity evaluation procedures. Based on extensive real-world experiments, we have evaluated the privacy preservation, feasibility, and efficiency of the proposed scheme.
Original language: English
Pages (from-to): 1808-1822
Journal: IEEE Transactions on Parallel and Distributed Systems
Volume: 29
Issue number: 8
DOIs
Publication status: Published - 1 Aug 2018
Externally published: Yes

Research Keywords

  • data classification
  • Machine learning
  • model evaluation
  • privacy preservation
