Privacy-Preserving Machine Learning Algorithms for Big Data Systems

Kaihe Xu, Hao Yue, Linke Guo, Yuanxiong Guo, Yuguang Fang

Research output: Chapters, Conference Papers, Creative and Literary Works; RGC 32 - Refereed conference paper (with host publication); peer-review

93 Citations (Scopus)

Abstract

Machine learning has played an increasingly important role in big data systems due to its capability of efficiently discovering valuable knowledge and hidden information. Oftentimes, big data systems such as healthcare or financial systems involve multiple organizations that have different privacy policies and may not explicitly share their data publicly, even though joint data processing may be a must. Thus, how to share big data among distributed data processing entities while mitigating privacy concerns becomes a challenging problem. Traditional methods rely on cryptographic tools and/or randomization to preserve privacy. Unfortunately, these methods alone may be inadequate for emerging big data systems because they were mainly designed for traditional small-scale data sets. In this paper, we propose a novel framework to achieve privacy-preserving machine learning where the training data are distributed and each shared data portion is of large volume. Specifically, we exploit the data locality property of the Apache Hadoop architecture and perform only a limited number of cryptographic operations in the Reduce() procedures to achieve privacy preservation. We show that the proposed scheme is secure in the semi-honest model and use extensive simulations to demonstrate its scalability and correctness.
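The abstract describes two ideas only at a high level: keep heavy computation local to each data holder, and confine cryptography to the Reduce() step. The minimal Python sketch below illustrates one way that split can look. It is not the authors' protocol. It assumes logistic regression trained by gradient descent, and it substitutes pairwise additive masking (a standard secure-aggregation technique) for the paper's cryptographic operations. Every name in it (`local_gradient`, `pairwise_masks`, `masked_share`, `reduce_sum`, `MODULUS`, `SCALE`) is hypothetical. The masks are drawn from one seeded generator purely for simulation. Real parties would derive each shared mask r_ij through some key agreement.

```python
import math
import random

MODULUS = 2 ** 64  # ring for masked fixed-point arithmetic (assumption)
SCALE = 2 ** 20    # fixed-point scaling factor (assumption)


def encode(x: float) -> int:
    """Map a real number to a fixed-point element of Z_MODULUS."""
    return round(x * SCALE) % MODULUS


def decode(v: int) -> float:
    """Invert encode(), reading the upper half of the ring as negative values."""
    if v >= MODULUS // 2:
        v -= MODULUS
    return v / SCALE


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def local_gradient(w, records):
    """'Map' step (hypothetical): a party sums the logistic-loss gradient over its own records."""
    grad = [0.0] * len(w)
    for x, y in records:
        err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
        for k, xk in enumerate(x):
            grad[k] += err * xk
    return grad


def pairwise_masks(n_parties, dim, rng):
    """For each pair i < j, party i adds a random r_ij and party j subtracts it, so all masks sum to 0."""
    masks = [[0] * dim for _ in range(n_parties)]
    for i in range(n_parties):
        for j in range(i + 1, n_parties):
            r = [rng.randrange(MODULUS) for _ in range(dim)]
            for k in range(dim):
                masks[i][k] = (masks[i][k] + r[k]) % MODULUS
                masks[j][k] = (masks[j][k] - r[k]) % MODULUS
    return masks


def masked_share(grad, mask):
    """What a party actually sends to the reducer: encode(grad) + mask (mod MODULUS)."""
    return [(encode(g) + m) % MODULUS for g, m in zip(grad, mask)]


def reduce_sum(shares):
    """'Reduce' step: the masks cancel, so only the aggregate gradient is revealed."""
    dim = len(shares[0])
    total = [sum(s[k] for s in shares) % MODULUS for k in range(dim)]
    return [decode(t) for t in total]


def train(partitions, dim, epochs=200, lr=0.5, seed=0):
    """Gradient descent in which the reducer only ever sees masked per-party gradients."""
    rng = random.Random(seed)
    n = sum(len(p) for p in partitions)
    w = [0.0] * dim
    for _ in range(epochs):
        masks = pairwise_masks(len(partitions), dim, rng)
        shares = [masked_share(local_gradient(w, p), m)
                  for p, m in zip(partitions, masks)]
        agg = reduce_sum(shares)
        w = [wi - lr * g / n for wi, g in zip(w, agg)]
    return w
```

For example, `train([[([1.0, -2.0], 0), ([1.0, 1.0], 1)], [([1.0, 2.0], 1)]], dim=2)` trains over two parties, where each record is a feature vector (with a leading bias term) and a label. The sketch uses fixed-point integers modulo 2^64 rather than floats so that the masks cancel exactly. With floating-point masks, cancellation would be only approximate and could leak or distort the aggregate.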
Original language: English
Title of host publication: Proceedings - 2015 IEEE 35th International Conference on Distributed Computing Systems, ICDCS 2015
Publisher: IEEE
Pages: 318-327
Volume: 2015-July
ISBN (Print): 9781467372145
DOIs:
Publication status: Published - 22 Jul 2015
Externally published: Yes
Event: 35th IEEE International Conference on Distributed Computing Systems, ICDCS 2015 - Columbus, United States
Duration: 29 Jun 2015 to 2 Jul 2015

Publication series

Name: Proceedings - International Conference on Distributed Computing Systems
Volume: 2015-July

Conference

Conference: 35th IEEE International Conference on Distributed Computing Systems, ICDCS 2015
Place: United States
City: Columbus
Period: 29/06/15 to 2/07/15
