Outsourced Machine Learning with Privacy Protection
Research on Secure Outsourced Computation for Machine Learning
Student thesis: Doctoral Thesis
Detail(s)
Awarding Institution | |
---|---|
Supervisors/Advisors | |
Award date | 26 Jun 2019 |
Link(s)
Permanent Link | https://scholars.cityu.edu.hk/en/theses/theses(079a9572-5be4-433f-a195-239e1b228024).html |
---|---|
Abstract
Due to growing storage and communication requirements, organizations increasingly outsource their data to remote servers such as cloud service providers. Because the outsourced data, e.g., financial transactions and medical records, may contain sensitive information, data owners usually encrypt it before outsourcing it to the server. However, encryption in turn hinders data utilization. Meanwhile, machine learning has achieved remarkable success in many areas in recent years, and outsourcing expensive machine learning tasks to a remote server is a promising approach for ordinary users with limited computing resources. This thesis therefore focuses on privacy-preserving outsourced computation for machine learning.
By dividing machine learning into three stages, i.e., feature extraction, model training, and model application, we address each stage individually. The first part of the thesis concerns the design of privacy-preserving outsourcing of feature extraction. Specifically, we focus on two prevailing feature extraction algorithms, the scale-invariant feature transform (SIFT) and speeded-up robust features (SURF), and propose two new privacy-preserving outsourcing protocols for them that preserve the key characteristics of the feature descriptors. The second part of the thesis investigates two classical training methods, ridge regression and canonical correlation analysis (CCA). A library of building blocks is first designed to support various arithmetic operations over encrypted real numbers. Based on this library, we develop two approaches to perform ridge regression in the ciphertext domain and propose a novel privacy-preserving scheme for CCA. Finally, we consider biometric identification, a typical model application scenario, and show how to efficiently perform biometric identification over encrypted outsourced biometric data without revealing private information. Two solutions with different security levels are proposed, offering a tradeoff between privacy and efficiency.
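The abstract does not state which cryptographic primitives the thesis uses. As a rough illustration of what arithmetic over protected real numbers and outsourced ridge regression involve, the Python sketch below swaps in a simpler, well-known technique: fixed-point encoding plus additive secret sharing between two servers assumed not to collude. Each data owner computes local statistics XᵢᵀXᵢ and Xᵢᵀyᵢ, splits every entry into two random shares, and the servers add the shares they receive. The user then recombines the sums and solves (XᵀX + λI)w = Xᵀy. The modulus `MOD`, the scale `SCALE`, and all function names are hypothetical choices, not the thesis's construction, and the user learns the aggregate statistics in this simplified setting.

```python
import secrets

MOD = 2**61 - 1  # prime modulus for the share space (hypothetical choice)
SCALE = 2**16    # fixed-point scaling factor (hypothetical choice)


def encode(x: float) -> int:
    """Map a real number to a field element using fixed-point encoding."""
    return round(x * SCALE) % MOD


def decode(v: int) -> float:
    """Inverse of encode; elements above MOD // 2 represent negative numbers."""
    if v > MOD // 2:
        v -= MOD
    return v / SCALE


def share(v: int) -> tuple[int, int]:
    """Split a field element into two additive shares that sum to v mod MOD."""
    r = secrets.randbelow(MOD)
    return r, (v - r) % MOD


def reconstruct(a: int, b: int) -> int:
    """Recombine two additive shares."""
    return (a + b) % MOD


def local_stats(X, y):
    """Plaintext X^T X and X^T y, computed by a data owner on its own rows."""
    d = len(X[0])
    G = [[sum(row[i] * row[j] for row in X) for j in range(d)] for i in range(d)]
    h = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(d)]
    return G, h


def aggregate(owners_data, d):
    """Sum every owner's statistics; each server only ever sees random shares."""
    server_a = [0] * (d * d + d)
    server_b = [0] * (d * d + d)
    for X, y in owners_data:
        G, h = local_stats(X, y)
        flat = [G[i][j] for i in range(d) for j in range(d)] + h
        for k, value in enumerate(flat):
            sa, sb = share(encode(value))
            server_a[k] = (server_a[k] + sa) % MOD
            server_b[k] = (server_b[k] + sb) % MOD
    # The user combines both servers' sums to recover the aggregate statistics.
    agg = [decode(reconstruct(a, b)) for a, b in zip(server_a, server_b)]
    G = [agg[i * d:(i + 1) * d] for i in range(d)]
    return G, agg[d * d:]


def solve(A, b):
    """Solve A w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def ridge_regression(owners_data, lam):
    """Closed-form ridge regression w = (X^T X + lam*I)^(-1) X^T y."""
    d = len(owners_data[0][0][0])
    G, h = aggregate(owners_data, d)
    for i in range(d):
        G[i][i] += lam
    return solve(G, h)


if __name__ == "__main__":
    # Two owners whose rows satisfy y = 2*x1 + 3*x2.
    owners = [([[1, 0], [0, 1]], [2, 3]), ([[1, 1], [2, 1]], [5, 7])]
    print(ridge_regression(owners, 0.01))
```

With the small regularizer λ = 0.01, the printed weights are close to [2, 3]. Encoding each owner's locally computed statistics, rather than the raw features, keeps all share-side arithmetic to additions, so no fixed-point truncation after multiplication is needed in this sketch.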