
Cluster Comput

DOI 10.1007/s10586-017-0849-9

Privacy-preserving outsourced classification in cloud computing


Ping Li1 · Jin Li1 · Zhengan Huang1 · Chong-Zhi Gao1 ·
Wen-Bin Chen1 · Kai Chen2

Received: 26 December 2016 / Revised: 16 February 2017 / Accepted: 28 March 2017


© Springer Science+Business Media New York 2017

Abstract  Classifiers have been widely applied in machine learning tasks such as pattern recognition, medical diagnosis, credit scoring, banking and weather prediction. Because of the limited local storage at the user side, data and classifiers have to be outsourced to the cloud for storage and computing. However, due to privacy concerns, it is important to preserve the confidentiality of data and classifiers in cloud computing, because the cloud servers are usually untrusted. In this work, we propose a framework for privacy-preserving outsourced classification in cloud computing (POCC). Using POCC, an evaluator can securely train a classification model over data encrypted with different public keys, which are outsourced from multiple data providers. We prove that our scheme is secure in the semi-honest model.

Keywords  Cryptography · Privacy-preserving · Machine learning · Classification · Homomorphic encryption

Corresponding author: Jin Li, jinli71@gmail.com
1 School of Computational Science & Education Software, Guangzhou University, Guangzhou 510006, People's Republic of China
2 Institute of Information Engineering, Chinese Academy of Sciences, Beijing, People's Republic of China

1 Introduction

Cloud computing is capable of providing seemingly unlimited storage and computation resources to users as services across the Internet, while hiding platform and implementation details from users. Due to these powerful storage and computation resources, users often outsource their data and computation to the cloud for storing and processing. Recently, machine learning has attracted much attention in both academia and industry, owing to its effectiveness and efficiency when processing big data and complex calculations. In particular, the classifier [1–3] is an essential technique for many tasks of machine learning, such as pattern recognition [4], medical diagnosis, credit scoring, banking and weather prediction. Classifiers have been widely used in these applications; however, this raises the problem of the security of sensitive data being handled by the cloud [5–11]. That is because, on the one hand, the cloud can collect, store and compute vast amounts of data, which are outsourced from diverse sources or parties (data providers), without requiring any interaction with the data providers. On the other hand, the data providers have no control over their outsourced data and will not realize what is being derived from their data. A typical scenario is that a user might be happy to provide his/her face image for face recognition but not for deriving his/her personal information. Therefore, it is important to apply some confidential processing before outsourcing data and computation to an untrusted cloud server [12–14].

Mostly, to preserve the privacy of sensitive data when outsourcing computation, the user encrypts his/her private data before outsourcing them to the cloud server. Then, the cloud does some computations over the encrypted data, such as face recognition, keyword searching, feature extraction, classification and so forth. Usually, to realize computation on encrypted data, the encryption scheme should possess some homomorphic property. For instance, Erkin et al. [15] used an additive homomorphic property (the Paillier cryptosystem) to support secure computation and realized privacy-preserving face recognition. Yuan et al. [16] adopted the BGN "doubly homomorphic" encryption [17] to support flexible operations over ciphertexts, and realized a privacy-preserving back-propagation algorithm. However, the BGN cryptosystem supports only one multiplication and

an unlimited number of addition operations. Thus, to perform an arbitrary number of multiplication and addition operations on encrypted data, a fully homomorphic encryption scheme was used to protect the privacy of data. Zhang et al. [18] utilized a fully homomorphic property (the BGV cryptosystem [19]) to perform the outsourced computation. The existing literature on privacy preservation in outsourced computation is mostly based on encrypting the input data with the same public key. In practice, there are more scenarios in which data are outsourced from many diverse sources and parties. More precisely, we consider the following scenario in this paper:

• For n mutually distrusting data providers P1, P2, ..., Pn, every data provider Pi (i ∈ [1, n]) has his/her own public and private key pair, encrypts data with his/her respective public key before outsourcing the ciphertexts to an evaluator for storage, and allows the evaluator to do some processing.
• The evaluator does part of the classification computation over the encrypted domain, but the input data, the intermediate results and the output remain confidential.
• Generally, apart from providing data to the evaluator, there is no other interaction between the data providers and the evaluator.
• Once the classifier is trained, the evaluator can respond to the client's query while the evaluator learns nothing at all.

Based on this setting, this paper focuses on privacy-preserving outsourced classification tasks in cloud computing over data encrypted with different public keys.

1.1 Challenges

The privacy-preserving outsourced classification model poses various issues and challenges. The key challenge can be broken down into two aspects: (1) The computation performed over the sensitive data by the classifier is quite complex, including additions, multiplications and nonlinear computations, which leads to low efficiency in practice. (2) To support computation over sensitive data encrypted with different public keys, one needs to choose an efficient fully homomorphic encryption scheme according to the operations in the privacy-preserving outsourced classification model.

1.2 Contributions

In this paper, we design a scheme for Privacy-preserving Outsourced Classification in Cloud Computing, called POCC. In our POCC scheme, multiple data providers can securely outsource their sensitive data to an untrusted evaluator for storing and processing. The evaluator can utilize these outsourced ciphertexts to build a classification model and respond to the clients' prediction queries. To support an arbitrary number of multiplication and addition operations on ciphertexts securely, a proxy fully homomorphic encryption scheme is used to protect the privacy of data and queries. This proxy fully homomorphic encryption is based on Gentry's scheme [20]. Gentry's work used a bootstrapping technique to extend a somewhat homomorphic encryption scheme to a fully homomorphic encryption scheme. In addition, we present a security analysis of our proposed POCC scheme. In summary, our contributions are as follows:

• We propose a privacy-preserving outsourced classification scheme in cloud computing, which allows clients to make private prediction queries with high accuracy.
• Our POCC scheme is designed to operate on ciphertexts under different public keys, and to preserve the privacy of the data providers and the client without leakage.
• Our POCC scheme allows the evaluator to train a classification model by using the data providers' outsourced data. This classification model is stored in encrypted form at the evaluator side.
• The security analysis proves that our proposed POCC scheme is privacy-preserving in the semi-honest model.

1.3 Organization

The rest of the paper is organized as follows. Section 2 discusses related work. Section 3 gives some notations and presents the definitions of homomorphic encryption and classification related to this paper. The system model is given in Sect. 4. The privacy-preserving classification system based on the proxy fully homomorphic encryption scheme is illustrated in Sect. 5, while security is analyzed in Sect. 6. Finally, the whole paper is concluded in Sect. 7.

2 Related work

Most of the privacy-preserving techniques in machine learning focus on secure multiparty computation (SMC) [21], (fully) homomorphic encryption [20,22], garbled circuits (GCs) [23] and so forth. These techniques can support compute and compare operations over the encrypted domain. Generally speaking, privacy-preserving machine learning can be divided into two categories: (1) privacy-preserving training, which needs an efficient algorithm to solve the optimization problem securely by using samples or past experience, and (2) privacy-preserving classification, which forecasts what will happen in new situations from data that describe what happened in the past, often by predicting the class of new samples. Some existing works focus on the first or the second category, which we describe in Sects. 2.1 and 2.2, respectively. Our work falls in both categories: we not only protect the privacy

of the training samples provided by the data providers, but also protect the classifier parameters of the evaluator and the prediction results of the clients; only the clients can obtain the prediction results.

2.1 Privacy-preserving training

A lot of works have been proposed on privacy-preserving training algorithms, such as back-propagation networks [16,24], naïve Bayes [25,26], K-means clustering [27–29], association rules [30,31], linear regression [32,33], decision trees [31,34] and so forth.

The authors in [35,36] proposed privacy-preserving two-party distributed algorithms for back-propagation training with vertically partitioned data and arbitrarily partitioned data, respectively. Graepel et al. [37] used somewhat homomorphic encryption as the privacy-preserving technique, describing how to implement two binary classification algorithms (Linear Means and Fisher's Linear Discriminant) over the encrypted domain. In a weaker security model, the client may learn some information about the classification model, even though their work supports private classification. Shokri et al. [38] used parallel deep learning to design, implement, and evaluate privacy-preserving deep learning. In their work, every participant trains his/her local data set with the same neural network model and uses selective sharing of model parameters as a technique to benefit from other participants' models without explicit sharing of training inputs. However, this method takes up storage space at each participant, who holds the sensitive data. Meanwhile, every participant needs to do some computation for training.

2.2 Privacy-preserving classification

Classifier learning is one of the most fundamental tasks in data mining and machine learning. In the case of medical diagnosis for prediction, if we have a rule that fits the patients' historical data and if a new patient's data are similar to the historical data, then we can make correct predictions for the new patient's diagnosis. However, the (new) patient's data are sensitive, including age, gender, past medical history, and current symptoms. Hence, to preserve the confidentiality of (new) patients' data, a privacy-preserving medical diagnosis should be proposed.

Barni et al. [39] implemented two methods for secure classification of ECG signals: the former based on linear branching programs and the latter based on neural networks. Their key technical tools are homomorphic encryption and garbled circuits. Liu et al. [40] realized a secure patient-centric clinical decision support system based on naïve Bayesian classification. They used an additive homomorphic proxy aggregation scheme to convert encryption under different public keys into encryption under a proxy public key. In [41], Bost et al. developed a two-party computation framework and used a fully homomorphic encryption scheme as the privacy-preserving technique, which allows the classifier models of hyperplane decision, naïve Bayes and binary decision trees to be handled over ciphertexts. However, these schemes incur large computation and communication costs. Zhang et al. [42] focused on a class of regularized empirical risk minimization machine learning problems, and proposed two solutions, which used dual variable perturbation and primal variable perturbation to support dynamic differential privacy.

Table 1 Notations used in this paper

Acronym       Description
S             Evaluator
CSP           Crypto service provider
(pki, ski)    Data provider's public and secret key
(pk0, sk0)    Crypto service provider's public and secret key
G             Classifier
G(x, θ)       Prediction of G with input x and θ
[·]u          Encryption under the public key u

3 Preliminaries

In this section, we define some notations and review some cryptographic primitives used in our secure system. The notations used in this paper are listed in Table 1.

3.1 Homomorphic encryption

According to Gentry's scheme [20], a public-key encryption scheme ℰ = (KeyGen, Encrypt, Decrypt, Evaluate) is called homomorphic if the evaluation circuit performs the computation over encrypted data without decryption. In other words, performing certain cryptographic operations over ciphertexts is equivalent to performing the corresponding operations over the plaintexts of those ciphertexts, without decryption.

Key generation algorithm KeyGen takes a security parameter λ as input and outputs a public key pk and a private key sk, i.e., (pk, sk) ← KeyGen(λ).

Encryption algorithm Encrypt takes as input a public key pk and a plaintext x and returns the corresponding ciphertext c, i.e., c ← Encrypt(pk, x).

Decryption algorithm Decrypt takes as input a private key sk and a ciphertext c and returns a plaintext x, i.e., x ← Decrypt(sk, c).

Evaluation algorithm Evaluate takes as input a public key pk, a circuit C ∈ 𝒞, and a tuple of ciphertexts (c1, c2, ..., cn), and returns a ciphertext c, i.e.,

c ← Evaluate(pk, C, c1, c2, ..., cn),

such that C(x1, x2, ..., xn) = Decrypt(sk, c), where 𝒞 is a permitted circuit set and ci ← Encrypt(pk, xi), i ∈ [1, n].

Here, the circuit C (∈ 𝒞) may be any circuit with different functions. Suppose circuit C is the decryption circuit Dℰ (Dℰ ∈ 𝒞). Then Evaluate can perform the decryption operation. In this case, we say ℰ is bootstrappable.

3.2 Classification in machine learning

Most machine learning algorithms are used for classification. Generally speaking, a classification model can be broken down into two stages: the inference stage, in which we use a training set to learn a model for computing feature vectors, and the decision stage, in which we use these feature vectors to make optimal class assignments. Meanwhile, a classification model can also be viewed as implementing a set of discriminant functions g1(x), ..., gK(x) which maps each input x directly into one of K class labels, denoted Ck, where the input vector x has n attributes: x = (x1, x2, ..., xn) ∈ Rⁿ. That is, we choose Ck if gk(x) = max_i gi(x), and do not choose Ck otherwise.

According to the training set, classification algorithms can be broadly categorized as supervised or unsupervised. Supervised classification algorithms experience a dataset containing some labeled or targeted feature samples, while unsupervised classification algorithms experience a dataset containing many feature samples and learn useful knowledge about the structure of this dataset. In order to give a more specific description, we show a supervised classifier, i.e., the naïve Bayes classifier, as follows.

3.3 Naïve Bayes classifiers

According to the above description, we represent the naïve Bayes classifier by setting

gi(x) = −R(vi|x) = −∑_{k=1}^{K} λik Pr(Ck|x),    (3.1)

where vi denotes the decision to assign the input to class Ci, λik denotes the loss incurred for taking vi when the input belongs to class Ck, and R is the expected risk for taking vi. Hence, Eq. (3.1) shows that the maximum discriminant function corresponds to the minimum conditional risk. If we use the zero-one loss function, then we have

gi(x) = Pr(Ci|x) = Pr(x|Ci)Pr(Ci) / Pr(x).    (3.2)

Given the training sample input x, there is no relationship between the "evidence" Pr(x) and the labeled category. Therefore, Eq. (3.2) can be written as

gi(x) ∝ Pr(x|Ci)Pr(Ci),    (3.3)

which divides the feature space into K decision regions R1, ..., RK, where Ri = {x | gi(x) = max_k gk(x)}. According to Eq. (3.3), the decision regions are separated by decision boundaries, surfaces in feature space where ties occur among the largest discriminant functions (refer to Fig. 1).

Fig. 1 Example of decision boundaries and decision regions

For simplicity, if there are three classes, we define a single discriminant function

g(x) = g1(x) − g2(x).    (3.4)

Then we choose C1 if g(x) > 0; C0 if g(x) = 0; and C2 otherwise.

It should be noted that the naïve Bayes classifier is rooted in the simplifying assumption that, given the target value, the attribute values are conditionally independent. Namely, given the labeled value of the instance, the probability of observing the conjunction x1, x2, ..., xn corresponds to the product of the probabilities for the individual attributes: Pr(x|Ci) = ∏_{j=1}^{n} Pr(xj|Ci).

4 Problem formulation

In this section, we first describe our system model for privacy-preserving outsourced classification in cloud computing (POCC). Then, we describe the adversary model of POCC. Finally, we give a description of the design goals of our POCC.

4.1 System model for POCC

In our POCC scheme, there are four types of entities, including a group of data providers, an evaluator S, individual clients, and a crypto service provider CSP. More concretely, each type of entity in our system is described as follows (illustrated in Fig. 2):

Fig. 2 Model overview

• Data providers: Each data provider Pi (i ∈ [1, n]) outsources his/her historical record data to an untrusted evaluator S for storage and allows the evaluator S to perform some operations on these stored data. Since the historical record data contain the data providers' personal information, they are sensitive. To prevent privacy leakage in the public cloud, Pi (i ∈ [1, n]) encrypts his/her sensitive data before outsourcing them to the evaluator S.
• Semi-honest evaluator: The evaluator S provides storage space and query services for data providers and clients. S can be a company or institution (such as a hospital or a medical research and development institution) in real life. The evaluator S stores the data (encrypted with different public keys chosen by the data providers independently), and undertakes the work of using these ciphertexts to construct a classification model. Finally, S uses this trained classifier for analytics or prediction in response to clients' queries.
• Semi-honest crypto service provider: The crypto service provider CSP only provides online crypto services to users. For instance, a CSP can handle ciphertexts and decrypt a ciphertext sent by S.
• Clients: Clients have gathered some symptom information or feature vectors about current records. If clients want to query the evaluator S for prediction services securely, then they encrypt their queries before outsourcing them to S.

4.2 Adversary model

In our setting, to ensure security and privacy, our POCC is based on the semi-honest model, also called honest-but-curious [43], which means that all types of entities (i.e., data providers, clients, evaluator S, and crypto service provider CSP) will run our system honestly, but try to collect or discover as much information about the intermediate results of the training process as possible. Let us assume the evaluator S is able to correctly train a classifier. If S were a malicious entity, it would want to corrupt some data providers, or to construct an incorrect classifier in order to learn some information about the sensitive data. However, this behavior goes against its own interest. Assume that S obtains some information from the data providers' sensitive data or gives some false prediction results to the clients. Once the information leak or the false prediction is discovered, no one will provide data to it for storage and processing, and no client will want to query it for classification services. Therefore, a malicious S should get nothing about the data providers and clients.

For similar reasons, a malicious CSP also gets nothing about the data providers' and clients' sensitive data. Hence, in our system, we assume the following conditions are satisfied:

• Assume that the evaluator S and the crypto service provider CSP do not collude with each other. Even if they want to disturb the system, they only do so independently.
• Assume that the crypto service provider CSP publishes a public key, which can be verified by using the certificate from the certificate authority.

4.3 Design goals

In our work, to address the security and adversary models, we design a POCC scheme, which allows the evaluator S to process and construct a privacy-preserving classifier over the multiple data providers' data and respond to the clients' prediction queries. Our system is designed with the following goals:

• Classifier accuracy: The system should classify correctly; for every query from the client, the prediction should be correct with high probability.
• Privacy preservation: In our setting, the system needs to provide privacy guarantees without disclosing sensitive information about the data providers'/clients' data or the classifier model. To realize this security goal, the training phase and the prediction phase should be performed in the encrypted domain. Once the evaluator S gives a prediction, only the client can obtain the decrypted result by using its private key. Furthermore, the data remain unknown to and hidden from the CSP.
• Flexibility: In our system, the CSP is not a fixed service provider. According to different motivations and functions, the CSP can be a different entity or institution and publish the corresponding different public keys.
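These goals rest on the evaluator being able to compute on data it cannot decrypt. As a minimal, self-contained illustration of that idea (a toy Paillier-style additively homomorphic cipher with tiny parameters — not the proxy fully homomorphic scheme POCC actually uses, and not secure for any real deployment), the sketch below shows how a party holding only the public key can combine two ciphertexts so that the result decrypts to the sum of the plaintexts:

```python
# Toy Paillier-style additively homomorphic encryption (illustrative only:
# tiny primes, no message packing or padding -- NOT secure for real use).
import math
import random

def keygen():
    p, q = 1000003, 1000033            # toy primes; real keys use ~1024-bit primes
    n = p * q
    lam = math.lcm(p - 1, q - 1)       # lambda = lcm(p-1, q-1)
    mu = pow(lam, -1, n)               # valid because the generator is g = n+1
    return (n, n + 1), (lam, mu, n)    # public key (n, g), secret key

def encrypt(pk, m):
    n, g = pk
    r = random.randrange(2, n)         # fresh randomness per ciphertext
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(sk, c):
    lam, mu, n = sk
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n     # L(x) = (x-1)/n, then multiply by mu

pk, sk = keygen()
c1, c2 = encrypt(pk, 20), encrypt(pk, 22)
c_sum = (c1 * c2) % (pk[0] ** 2)       # multiplying ciphertexts adds plaintexts
assert decrypt(sk, c_sum) == 42        # the evaluator never saw 20 or 22
```

A fully homomorphic scheme such as Gentry's extends this from addition only to arbitrary circuits over ciphertexts, which is what the training and prediction phases in Sect. 5 require.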
5 POCC scheme

5.1 Main idea

In our POCC scheme, we focus on how to realize a classification model over multiple data sets securely, and how to use this classification model to provide prediction services for clients confidentially. A set of n mutually distrusting data providers P1, P2, ..., Pn outsource their encrypted data to an evaluator S for storage, and allow the evaluator S to perform some operations on these stored data sets. To support computation on data encrypted with different public keys, a proxy re-encryption fully homomorphic encryption scheme is used as the privacy-preserving technique to encrypt the data providers' sensitive data before outsourcing them to the evaluator S. Using these outsourced ciphertexts, S constructs a classification model and stores this classification model in encrypted form. Using this classification model, the evaluator S can respond to the queries of the clients.

5.2 The overview of POCC

In this subsection, we give an overview of POCC and show how to achieve data outsourcing and classification securely.

• Secure data outsourcing: When data are outsourced to the evaluator S, data providers need to do some confidential processing, so as to preserve the confidentiality of the data. In our work, the method for confidential processing is based on Gentry's scheme ℰ, which can be used to convert ciphertexts under different public keys into ciphertexts under the same public key.
• Secure training: After receiving the outsourced data, the evaluator S will use these outsourced data to build a classification model. Since the outsourced data are encrypted with different public keys, the evaluator S and the crypto service provider CSP will jointly train a classification model. After training, the classification model is stored in encrypted form at the evaluator S. Later, S uses this classification model to provide a prediction service for the clients' queries.
• Secure prediction: Now, the evaluator S runs a classifier, denoted as G, over a new encrypted query (a feature vector) z which is contributed by the client, and predicts a result G([z]pk0, [w]pk0). Here, to preserve the confidentiality of the query, the client encrypts it before outsourcing it to S. Once the classification result or prediction is given (encrypted under pk0), the evaluator S transforms this prediction into an encryption under the client's public key pkc. Finally, S sends the transformed encryption G([z]pkc, [w]pkc) to the client.
• Secure extraction: Once the client obtains a prediction G([z]pkc, [w]pkc), he/she decrypts this prediction by using his/her own private key skc and gets the result.

5.3 Detailed design of POCC

In this subsection, we present a more detailed description of our POCC scheme, which is divided into three phases: privacy-preserving training of the classifier (Phase 1), privacy-preserving prediction for queries (Phase 2), and privacy-preserving extraction of results (Phase 3). The overall construction of our system is shown in Fig. 2.

5.4 Phase 1: Privacy-preserving training classifier

In this phase, the main work is training the classifier over the encrypted domain, on data sent from the multiple data providers P1, P2, ..., Pn.

5.4.1 Initialization

Initially, a setup process initializes Gentry's scheme ℰ and distributes the system security parameters. Meanwhile, in this phase, according to the target or motivation, a CSP entity with the required function is determined. Once the CSP is confirmed, it is issued a public/private key pair (pk0, sk0).

5.4.2 Key generation

Data provider Pi (i ∈ [1, n]) runs KeyGen to generate a public key pki and a private key ski, taking the security parameter λ as input, i.e., (pki, ski) ← KeyGen(1^λ).

5.4.3 Outsourcing

In this phase, data provider Pi (i ∈ [1, n]) encrypts its sensitive symptom vector xi ∈ R^ns and the associated desired vector yi ∈ R^nd before outsourcing the ciphertexts to the evaluator S. Pi (i ∈ [1, n]) needs to run the following algorithms:

– ReKeyGen(pk0, ski): Taking the public key pk0 and its own private key ski as input, it outputs sk̄i = {sk̄ij}_{j=1}^{J}, where sk̄ij ← Encrypt(pk0, skij) and skij is the jth bit of the binary representation of ski as elements of the plaintext space P = {0, 1}.
– Encrypt(pki, message): Taking a public key pki and a plaintext message ∈ P as input, it outputs the ciphertext [message]pki ← Encrypt(pki, message), where message = xi or yi.
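The ReKeyGen step above can be sketched as follows. This is an illustrative stand-in, not the paper's construction: `toy_encrypt` merely tags each bit with the target key (a real deployment would call the FHE scheme's Encrypt), and the key width J = 16 is an arbitrary assumption for the example.

```python
# Sketch of ReKeyGen: bit-decompose the provider's secret key ski and encrypt
# each bit under the CSP's public key pk0, yielding the hint {[ski_j]_pk0}.
def toy_encrypt(pk, bit):
    # placeholder "encryption": tags the bit with the key id; NOT real crypto
    return {"under": pk, "payload": bit}

def to_bits(value, width):
    # jth bit of the binary representation, least significant bit first
    return [(value >> j) & 1 for j in range(width)]

def rekeygen(pk0, sk_i, width=16):
    return [toy_encrypt(pk0, b) for b in to_bits(sk_i, width)]

hint = rekeygen("pk0", 0b1011, width=4)
assert [c["payload"] for c in hint] == [1, 1, 0, 1]  # bits of 0b1011, LSB first
```

ReEncrypt (Sect. 5.4.4) later evaluates the decryption circuit homomorphically over these encrypted key bits, which is what converts a ciphertext under pki into one under pk0 without ever exposing ski.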

Then, the data provider Pi (i ∈ [1, n]) outsources the discriminant function G is linear, then the computation over
triple ( pki , [xi ] pki , [yi ] pki , sk i ) to evaluator S for data stor- encrypted domain can be done directly by secure multiplica-
ing and allows S to process these encrypted data set. tion and secure addition, because the encryption scheme is
fully homomorphic encryption. However, if the discriminant
5.4.4 Training function G is nonlinear, then we can use Taylor or Maclaurin
series or liner function to approximate it.
Once evaluator S received the outsourcing data set, it needs
to run a ReEncrypt algorithm such that the data provider 5.5 Phase 2: Privacy-preserving predicting for the query
Pi ’s ciphertext [xi ] pki can be converted into some ciphertext
under public key pk0 . That is for ReEncrypt 5.5.1 Prediction
 
set xi j ← Encrypt pk0 , [xi ] pki , j , After learning a classification model G(·) over encrypted
domain with the parameter θ ∗ , evaluator S runs this clas-
output sifier over a new encrypted feature vector [z] pk c which is
  contributed by the client.
[xi j ] pk0 ← Evaluate pk0 , D , sk i j , xi j , Suppose the client’s query z with p attributes, i.e.,
z = (z 1 , . . . , z p ), he/she needs to compute sk cj ←
where [xi ] pki , j is an encryption of the j th bit of the ReKeyGen( pk0 , skcj ) and [z i ] pkcj ← Encrypt( pkc , z i j ),
ciphertext [xi ] pki , the encryption under pk0 is [xi ] pk0 = where encryption sk cj and [z i ] pkcj denote the encrypted form
([xi1 ] pk0 , . . . , [xi J ] pk0 ), j ∈ [1, J ]. Similarly, the encrypted about the j th binary representation of skc and z i , respec-
labeled class under pk0 is [yi ] pk0 = ([yi1 ] pk0 , · · · , [yi J ] pk0 ), tively. Here, i ∈ [1, p] and j ∈ [1, J ]. Therefore, the
j ∈ [1, J ]. client sends the data queries [z] pkc = ([z 1 ] pkc , . . . , [z p ] pkc ),
Hence, evaluator S obtains the re-encryptions (sk i , [xi ] pk0 , sk c = (sk c1 , . . . , sk c J ) and pkc to S. Then S transforms the
[yi ] pk0 ) under pk0 for each data provider Pi (i ∈ [1, n]) encryption [z] pkc into [z] pk0 by using ReEncrypt.
by using proxy fully homomorphic technique. Let’s denote Right now, evaluator S runs a classifier G(·) over unseen
K mutually disjoint classes by C1 , C2 , . . . , C K . So, for en feature vector [z] pk0 , and uses the parameter θ ∗ to output a
encrypted training set X = {[xi ] pk0 , [yi ] pk0 }i=1 n , the eval-
prediction [g] pk 0 = G([z] pk0 , [θ ∗ ] pk0 ). Since the classifier
uator S needs to design an optimized approximation to [yi]pk0 by using the classifier model G([xi]pk0, [θ]pk0). Here, G(X, θ) denotes a classifier model, which defines the hypothesis class, and a particular value of the parameter θ instantiates one hypothesis from that class. For example, in classification problems, if we take a rectangle as our model, then the parameter θ consists of four coordinates; if we take linear regression as our model, then the slope and intercept parameters are learned from the data.

If evaluator S computes the approximation value G([xi]pk0, [θ]pk0), then it needs to measure the difference between the desired vector [yi]pk0 and this approximation. The approximation loss is

    E([θ]pk0, X) = Σ_i L([yi]pk0, G([xi]pk0, [θ]pk0)),    (5.1)

where L(·) is the loss function, which checks whether the prediction matches the desired value.

Evaluator S then needs an optimization algorithm to find the θ* that minimizes the approximation loss:

    [θ*]pk0 = argmin_θ E([θ]pk0, X),    (5.2)

where argmin returns the argument of the minimum. During the optimization, evaluator S uses Yao's garbled circuits [23] for the comparisons.

It is worth noting that once the prediction g has been computed by G(·) over the encrypted domain under pk0, S chooses a randomness vector r ∈ R^p, blinds the prediction as [g']pk0 = AddF(pk0, [g]pk0, [r]pk0), and sends this blinded encryption to the CSP. Even though the CSP decrypts the message by using the private key sk0, it obtains only the blinded value g', which leaks no information about the prediction g. The CSP therefore encrypts the blinded message g' under the client's public key as [g']pkc ← Encrypt(pkc, g') and sends it back to S. Finally, S removes the randomness r as [g]pkc ← AddF(pkc, [g']pkc, [−r]pkc).

5.6 Phase 3: Privacy-preserving result extraction

5.6.1 Extraction

Once the transformed prediction result [g]pkc has been computed, the evaluator S sends it back to the client. When the client receives the encrypted prediction [g]pkc, he/she uses the algorithm Decrypt to recover it; that is, the client takes the private key skc and the ciphertext [g]pkc as input, and outputs g ← Decrypt(skc, [g]pkc).

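Looking back at the training phase, the plaintext analogue of Eqs. (5.1) and (5.2) is simply choosing the parameter that minimizes a summed loss. The sketch below does this with a toy grid search; the model, data, grid, and squared loss are all illustrative (in POCC the same computation runs over ciphertexts, with each comparison performed inside Yao's garbled circuits).

```python
# Plaintext analogue of Eqs. (5.1)-(5.2): pick the parameter theta that
# minimizes E(theta, X) = sum_i L(y_i, G(x_i, theta)).

def G(x, theta):
    # toy hypothesis: a line through the origin with slope theta
    return theta * x

def L(y, y_hat):
    # squared loss stands in for the paper's loss function
    return (y - y_hat) ** 2

X = [1.0, 2.0, 3.0, 4.0]
Y = [2.1, 3.9, 6.2, 7.8]        # noisy samples of roughly y = 2x

def E(theta):
    return sum(L(y, G(x, theta)) for x, y in zip(X, Y))

# argmin over a grid of candidates; a real system would take gradient
# steps, and over ciphertexts each comparison runs in a garbled circuit
grid = [i / 100 for i in range(401)]        # theta in [0.00, 4.00]
theta_star = min(grid, key=E)

assert abs(theta_star - 2.0) < 0.1          # close to the true slope
```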

6 Security definition and analysis

In this section, we analyze the privacy characteristics of our proposed POCC scheme against possible attacks by the various parties involved in our model. First, we give the definition of semantic security [44].

Definition 6.1 (Semantic security (SS), IND-CPA) A public-key encryption scheme E = (KeyGen, Encrypt, Decrypt) is semantically secure if, for any stateful probabilistic polynomial-time adversary A = (A1, A2), its advantage

    Adv^SS_{E,A}(λ) := |Pr[Exp^SS_{E,A}(λ) = 1] − 1/2|

is negligible, where the experiment Exp^SS_{E,A}(λ) is defined as follows:

    Exp^SS_{E,A}(λ):
      b ← {0, 1}
      (pk, sk) ← KeyGen(λ)
      (m0, m1) ← A1(pk)
      c ← Encrypt(pk, mb)
      b' ← A2(c)
      if b' = b, return 1; else, return 0

Here, the messages m0 and m1 are called the chosen plaintexts, and c is called the challenge ciphertext. We require the two messages to have equal length, i.e., |m0| = |m1|; if they do not, the shorter message can be padded to the length of the other. From Definition 6.1 we obtain the following property of semantic security.

Property 6.2 Whatever can be efficiently computed about the plaintext given the ciphertext can also be efficiently computed without the ciphertext.

6.1 Semi-honest crypto service provider

A semi-honest crypto service provider CSP obtains only a blinded plaintext when it decrypts with the private key sk0. Since evaluator S randomly chooses a blinding factor and adds it to the fully homomorphic ciphertext before sending it to the CSP, the CSP cannot gain any additional information from the blinded ciphertext.

6.2 Semi-honest evaluator

We now give a security analysis of our POCC system, proving that it is semantically secure. Assume that a user U (the adversary) chooses two plaintexts m0 = (m01, m02, ..., m0k) and m1 = (m11, m12, ..., m1k), with each component in P = {0, 1}, and some circuit C ∈ C of size k = poly(κ). Then U sends the two plaintexts m0, m1 and the circuit C to the semi-honest evaluator S (the challenger). The cloud server S tosses a fair coin b ∈ {0, 1} and outputs z* ← Evaluate(pk, C, xb1, xb2, ..., xbk), where xbi ← Encrypt(pk, mbi). However, since U knows the encryptions xbi and the circuit C, it can compute Evaluate by itself; in this respect, the roles of U and S are the same. Nevertheless, evaluator S can guess the encrypted plaintext correctly with only negligible advantage, even when some of the stored encrypted data share the same plaintext. Therefore, according to Property 6.2, we have the following theorem.

Theorem 6.3 The POCC scheme presented in Sect. 5 is semantically secure if the underlying public-key encryption scheme E is semantically secure.

7 Conclusions and future work

In this work, we proposed a POCC scheme which allows multiple data providers to outsource fully homomorphic ciphertexts to an evaluator S for storage and processing. Based on Gentry's scheme [20], we use a proxy fully homomorphic encryption technique to preserve the privacy of sensitive data. In the POCC scheme, the evaluator S and the crypto service provider CSP jointly train a classification model over data encrypted under different public keys. This classification model is stored in encrypted form at S and is used to provide a prediction service to the clients. However, the scheme requires several rounds of interaction between S and the CSP; as future work, we aim to reduce these computation and communication costs.

Acknowledgements This work was supported by the National Natural Science Foundation of China (No. 61472091), the Natural Science Foundation of Guangdong Province for Distinguished Young Scholars (2014A030306020), and the Science and Technology Planning Project of Guangdong Province, China (2015B010129015).

References

1. Gu, B., Sheng, V.S.: A robust regularization path algorithm for ν-support vector classification. IEEE Trans. Neural Netw. Learn. Syst. 4, 1–32 (2016). doi:10.1109/TNNLS.2016.2527796
2. Gu, B., Sun, X., Sheng, V.S.: Structural minimax probability machine. IEEE Trans. Neural Netw. Learn. Syst. (2016). doi:10.1109/TNNLS.2016.2544779
3. Wen, X.Z., Shao, L., Xue, Y., Fang, W.: A rapid learning algorithm for vehicle classification. Inf. Sci. 295(1), 395–406 (2015)
4. Graves, A., Mohamed, A.R., Hinton, G.: Speech recognition with deep recurrent neural networks. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2013), pp. 6645–6649. IEEE (2013)
5. Gupta, B.B., Badve, O.P.: Taxonomy of DoS and DDoS attacks and desirable defense mechanism in a cloud computing environment. Neural Comput. Appl. 2016, 1–28 (2016)
6. Stergiou, C., Psannis, K.E., Kim, B.G., et al.: Secure integration of IoT and cloud computing. Futur. Gener. Comput. Syst. (2016). doi:10.1016/j.future.2016.11.031
7. Li, J., Li, J.W., Chen, X.F., et al.: Identity-based encryption with outsourced revocation in cloud computing. IEEE Trans. Comput. 64(2), 425–437 (2015)


8. Li, J., Yan, H.Y., Liu, Z.L., et al.: Location-sharing systems with enhanced privacy in mobile online social networks. IEEE Syst. J. (2015)
9. Li, J., Chen, X.F., Huang, X.Y., et al.: Secure distributed deduplication systems with improved reliability. IEEE Trans. Comput. 64(12), 3569–3579 (2015)
10. Badve, O.P., Gupta, B.B.: Taxonomy of recent DDoS attack prevention, detection, and response schemes in cloud environment. In: Proceedings of the International Conference on Recent Cognizance in Wireless Communication and Image Processing, pp. 683–693. Springer, Delhi (2016)
11. Gou, Z., Yamaguchi, S., Gupta, B.B.: Analysis of various security issues and challenges in cloud computing environment: a survey. In: Handbook of Research on Modern Cryptographic Solutions for Computer and Cyber Security, pp. 393–419. IGI Global (2016)
12. Xia, Z.H., Wang, X.H., Sun, X.M., et al.: A secure and dynamic multi-keyword ranked search scheme over encrypted cloud data. IEEE Trans. Parallel Distrib. Syst. 27(2), 340–352 (2015)
13. Fu, Z.J., Huang, F.X., Sun, X.M., et al.: Enabling semantic search based on conceptual graphs over encrypted outsourced data. IEEE Trans. Serv. Comput. (2016)
14. Fu, Z.J., Ren, K., Shu, J.G., et al.: Enabling personalized search over encrypted outsourced data with efficiency improvement. IEEE Trans. Parallel Distrib. Syst. 27(9), 2546–2559 (2016)
15. Erkin, Z., Franz, M., Guajardo, J., et al.: Privacy-preserving face recognition. In: International Symposium on Privacy Enhancing Technologies, pp. 235–253. Springer, Berlin (2009)
16. Yuan, J.W., Yu, S.C.: Privacy preserving back-propagation neural network learning made practical with cloud computing. IEEE Trans. Parallel Distrib. Syst. 25(1), 212–221 (2014)
17. Boneh, D., Goh, E.J., Nissim, K.: Evaluating 2-DNF formulas on ciphertexts. In: Theory of Cryptography Conference, pp. 325–341. Springer, Heidelberg (2005)
18. Zhang, Q., Yang, L.T., Chen, Z.: Privacy preserving deep computation model on cloud for big data feature learning. IEEE Trans. Comput. 65(5), 1351–1362 (2016)
19. Brakerski, Z., Gentry, C., Vaikuntanathan, V.: (Leveled) fully homomorphic encryption without bootstrapping. ACM Trans. Comput. Theory (TOCT) 6(3), 13 (2014)
20. Gentry, C.: Fully homomorphic encryption using ideal lattices. In: STOC 2009, pp. 169–178. ACM (2009)
21. Goldreich, O.: Secure multi-party computation. Manuscript, preliminary version, pp. 86–97 (1998)
22. Van Dijk, M., Gentry, C., Halevi, S., et al.: Fully homomorphic encryption over the integers. In: Annual International Conference on the Theory and Applications of Cryptographic Techniques, pp. 24–43. Springer, Berlin (2010)
23. Yao, A.C.: How to generate and exchange secrets. In: 27th Annual Symposium on Foundations of Computer Science, pp. 162–167. IEEE (1986)
24. Schlitter, N.: A protocol for privacy preserving neural network learning on horizontally partitioned data. PSD (2008)
25. Agrawal, D., Srikant, R.: Privacy-preserving data mining. Proc. ACM Conf. Manag. Data 29(2), 439–450 (2000)
26. Vaidya, J., Kantarcoǧlu, M., Clifton, C.: Privacy-preserving naive Bayes classification. VLDB J. 17(4), 879–898 (2008)
27. Samanthula, B.K., Rao, F.Y., Bertino, E., et al.: Privacy-preserving and outsourced multi-user k-means clustering. arXiv preprint arXiv:1412.4378 (2014)
28. Jagannathan, G., Wright, R.: Privacy-preserving distributed k-means clustering over arbitrarily partitioned data. In: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, pp. 593–599. ACM (2005)
29. Lin, K.P.: Privacy-preserving kernel k-means clustering outsourcing with random transformation. Knowl. Inf. Syst. 49(3), 885–908 (2016)
30. Vaidya, J., Clifton, C.: Privacy preserving association rule mining in vertically partitioned data. In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 639–644. ACM (2002)
31. Agrawal, R., Srikant, R.: Privacy-preserving data mining. ACM SIGMOD Record 29(2), 439–450 (2000)
32. Dankar, F.K.: Privacy preserving linear regression on distributed databases. Trans. Data Priv. 8(1), 3–28 (2015)
33. Dankar, F., Brien, R., Adams, C., Matwin, S.: Secure multi-party linear regression. In: EDBT/ICDT Workshops, pp. 406–414 (2014)
34. Vaidya, J., Clifton, C., Kantarcioglu, M., et al.: Privacy-preserving decision trees over vertically partitioned data. ACM Trans. Knowl. Discov. Data (TKDD) 2(3), 14 (2008)
35. Bansal, A., Chen, T., Zhong, S.: Privacy preserving back-propagation neural network learning over arbitrarily partitioned data. Neural Comput. Appl. 20(1), 143–150 (2011)
36. Chen, T.T., Zhong, S.: Privacy-preserving back-propagation neural network learning. IEEE Trans. Neural Netw. 20(10), 1554–1564 (2009)
37. Graepel, T., Lauter, K., Naehrig, M.: ML confidential: machine learning on encrypted data. In: Information Security and Cryptology (ICISC), p. 121. Springer, Berlin (2012)
38. Shokri, R., Shmatikov, V.: Privacy-preserving deep learning. In: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 1310–1321. ACM (2015)
39. Barni, M., Failla, P., Lazzeretti, R., et al.: Efficient privacy-preserving classification of ECG signals. In: First IEEE International Workshop on Information Forensics and Security (WIFS 2009), pp. 91–95. IEEE (2009)
40. Liu, X., Lu, R., Ma, J., et al.: Privacy-preserving patient-centric clinical decision support system on naive Bayesian classification. IEEE J. Biomed. Health Inf. 20(2), 655–668 (2016)
41. Bost, R., Popa, R.A., Tu, S., Goldwasser, S.: Machine learning classification over encrypted data. NDSS (2015)
42. Zhang, T., Zhu, Q.: Dynamic differential privacy for ADMM-based distributed classification learning. IEEE Trans. Inf. Forensics Secur. 12(1), 172–187 (2017)
43. Di Vimercati, S.D.C., Foresti, S., Jajodia, S., et al.: Over-encryption: management of access control evolution on outsourced data. In: Proceedings of the 33rd International Conference on Very Large Data Bases, pp. 123–134. VLDB Endowment (2007)
44. Goldwasser, S., Micali, S.: Probabilistic encryption. J. Comput. Syst. Sci. 28(2), 270–299 (1984)

Ping Li received the M.S. and Ph.D. degrees in applied mathematics from Sun Yat-sen University in 2011 and 2016, respectively. She is currently a postdoc with the School of Computer Science and Educational Software, Guangzhou University. Her current research interests include cryptography, privacy preservation and cloud computing.


Jin Li received the B.S. degree in mathematics from Southwest University in 2002 and the Ph.D. degree in information security from Sun Yat-sen University in 2007. Currently, he works at Guangzhou University as a professor. He has been selected as one of the science and technology new stars in Guangdong province. His research interests include applied cryptography and security in cloud computing. He has published more than 70 research papers in refereed international conferences and journals and has served as the program chair or program committee member in many international conferences.

Zhengan Huang received his B.S. and M.S. degrees from the Department of Mathematics, Sun Yat-sen University in 2009 and 2011, respectively, and his Ph.D. degree from the Department of Computer Science and Engineering, Shanghai Jiao Tong University in 2015. He served as a security engineer at Huawei Technologies Co. Ltd. from 2015 to 2016. Currently, he is a postdoctoral researcher at Guangzhou University. His research interests include public-key cryptography and information security.

Chong-Zhi Gao received his Ph.D. degree in applied mathematics from Sun Yat-sen University in 2004. Currently, he is a professor at the School of Computer Science and Educational Software, Guangzhou University. His research interests include cryptography and privacy in machine learning.

Wen-Bin Chen received his M.S. degree in mathematics from the Institute of Software, Chinese Academy of Sciences in 2003, and the Ph.D. degree in computer science from North Carolina State University, U.S.A. in 2010. He is currently an associate professor at the School of Computer Science and Educational Software, Guangzhou University. His research interests focus on theoretical computer science, including lattice-based cryptography, algorithm design and analysis, bioinformatics algorithms, graph algorithms, graph mining, computational complexity, etc.

Kai Chen received the B.S. degree from Nanjing University in 2004, and the Ph.D. degree from the University of Chinese Academy of Sciences in 2010. He is an Associate Professor in the Institute of Information Engineering, Chinese Academy of Sciences. He was also a postdoc at Pennsylvania State University, State College, PA, USA. His research interests include software security, security testing on smartphones, and privacy protection in social networks.
