

Authorized Public Auditing of Dynamic Big Data Storage on Cloud with Efficient Verifiable Fine-Grained Updates

Chang Liu, Jinjun Chen, Senior Member, IEEE, Laurence T. Yang, Member, IEEE, Xuyun Zhang, Chi Yang, Rajiv Ranjan, and Ramamohanarao Kotagiri

Abstract—Cloud computing opens a new era in IT by providing various elastic and scalable services in a pay-as-you-go fashion, allowing users to avoid huge capital investments in their own IT infrastructure. In this paradigm, users of cloud storage services no longer physically maintain direct control over their data, which makes data security one of the major concerns of using cloud. Existing research already allows data integrity to be verified without possession of the actual data file. When the verification is done by a trusted third party, this verification process is also called data auditing, and this third party is called an auditor. However, existing schemes suffer from several common drawbacks. First, a necessary authorization/authentication process between the auditor and the cloud service provider is missing, i.e., anyone can challenge the cloud service provider for a proof of integrity of a certain file, which potentially puts the quality of the so-called 'auditing-as-a-service' at risk. Second, although some recent work based on BLS signatures can already support fully dynamic data updates, it supports updates only with fixed-sized blocks as the basic unit, which we call coarse-grained updates. As a result, every small update causes re-computation and updating of the authenticator for an entire file block, which in turn causes higher storage and communication overheads. In this paper, we provide a formal analysis of possible types of fine-grained data updates and propose a scheme that can fully support authorized auditing and fine-grained update requests. Based on our scheme, we also propose an enhancement that can dramatically reduce communication overheads for verifying small updates. Theoretical analysis and experimental results demonstrate that our scheme offers not only enhanced security and flexibility, but also significantly lower overhead for big data applications with a large number of frequent small updates, such as applications in social media and business transactions.

Index Terms—Cloud computing, big data, data security, provable data possession, authorized auditing, fine-grained dynamic data update

1 INTRODUCTION

CLOUD computing is being intensively referred to as one of the most influential innovations in information technology in recent years [1], [2]. With resource virtualization, cloud can deliver computing resources and services in a pay-as-you-go mode, and it is envisioned to become as convenient to use as daily-life utilities such as electricity, gas, water and telephone in the near future [1]. These computing services can be categorized into Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS) and Software-as-a-Service (SaaS) [3]. Many international IT corporations now offer powerful public cloud services to users on a scale from individual to enterprise all over the world; examples include Amazon AWS, Microsoft Azure, and IBM SmartCloud.

Although the current development and proliferation of cloud computing is rapid, debates and hesitations on the usage of cloud still exist. Data security/privacy is one of the major concerns in the adoption of cloud computing [3], [4], [5]. Compared to conventional systems, users lose direct control over their data. In this paper, we investigate the problem of integrity verification for big data storage in cloud. This problem can also be called data auditing [6], [7] when the verification is conducted by a trusted third party; from cloud users' perspective, it may also be called 'auditing-as-a-service'. To date, extensive research has been carried out to address this problem [6], [7], [8], [9], [10], [11], [12], [13], [14], [15]. In a remote verification scheme, the cloud storage server (CSS) cannot provide a valid integrity proof of a given proportion of data to a verifier unless all of this data is intact. Ensuring the integrity of user data stored with a cloud service provider is of no less importance than any data protection mechanism deployed by the cloud service provider (CSP) itself [16], no matter how secure those mechanisms seem to be, in that it provides the verifier a piece of direct, trustworthy and real-time intelligence on the integrity of the cloud user's data through a challenge request. It is especially recommended that data auditing be conducted on a regular basis for users who have high-level security demands over their data.

C. Liu is with the School of Comput. Sci. and Tech., Huazhong Uni. of Sci. and Tech., China, and also with the Faculty of Eng. and IT, Uni. of Tech., Sydney, Australia. E-mail: changliu.it@gmail.com.
J. Chen, X. Zhang, and C. Yang are with the Faculty of Eng. and IT, Uni. of Tech., Sydney, Australia. E-mail: {jinjun.chen, xyzhanggz, chiyangit}@gmail.com.
L.T. Yang is with the School of Comput. Sci. and Tech., Huazhong Uni. of Sci. and Tech., China, and also with the Dept. of Comput. Sci., St. Francis Xavier Uni., Canada. E-mail: ltyang@stfx.ca.
R. Ranjan is with the CSIRO Computational Informatics Division, Australia. E-mail: rranjans@gmail.com.
R. Kotagiri is with the Dept. of Comput. and Information Systems, The Uni. of Melbourne, Australia. E-mail: kotagiri@unimelb.edu.au.

Manuscript received 21 May 2013; revised 23 July 2013; accepted 24 July 2013. Date of publication 4 Aug. 2013; date of current version 13 Aug. 2014. Recommended for acceptance by Y. Xiang. Digital Object Identifier no. 10.1109/TPDS.2013.191.

Although existing data auditing schemes already have various desirable properties (see Section 2), potential risks and inefficiencies still exist, such as security risks from unauthorized auditing requests and inefficiency in processing small updates. In this paper, we focus on better support for small dynamic updates, which benefits the scalability and efficiency of a cloud storage server. To achieve this, our scheme utilizes a flexible data segmentation strategy and a ranked Merkle hash tree (RMHT). Meanwhile, we address a potential security problem in supporting public verifiability to make the scheme more secure and robust, which is achieved by adding an additional authorization process among the three participating parties: the client, the CSS and a third-party auditor (TPA).

The research contributions of this paper can be summarized as follows:

1. For the first time, we formally analyze different types of fine-grained dynamic data update requests on variable-sized file blocks in a single dataset. To the best of our knowledge, we are the first to propose a public auditing scheme based on BLS signatures and the Merkle hash tree (MHT) that can support fine-grained update requests. Compared to existing schemes, ours supports updates whose size is not restricted by the size of file blocks, thereby offering extra flexibility and scalability.

2. For better security, our scheme incorporates an additional authorization process with the aim of eliminating threats of unauthorized audit challenges from malicious or pretended third-party auditors, which we term 'authorized auditing'.

3. We investigate how to improve the efficiency of verifying frequent small updates, which exist in many popular cloud and big data contexts such as social media. Accordingly, we propose a further enhancement to make our scheme more suitable for this situation than existing schemes. Both theoretical analysis and experimental results demonstrate that our modified scheme can significantly lower communication overheads.

For the convenience of readers, we list some frequently used acronyms in Appendix 1, which is available in the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.191.

1.1 Paper Organization
The rest of this paper is organized as follows. Section 2 discusses related work. Section 3 provides motivating examples as well as a detailed analysis of our research problem. Section 4 describes our proposed scheme in detail, along with a detailed analysis of fine-grained update requests and how they can be supported. Section 5 provides a security analysis of our design. Section 6 provides experimental results. Section 7 concludes our research and points out future work.

2 RELATED WORK

Compared to traditional systems, scalability and elasticity are key advantages of cloud [1], [2], [3]. As such, efficiency in supporting dynamic data is of great importance. Security and privacy protection of dynamic data has been studied extensively in the past [6], [8], [12], [17]. In this paper, we focus on small and frequent data updates, which are important because they exist in many cloud applications such as business transactions and online social networks (e.g., Twitter [18]). Cloud users may also need to split big datasets into smaller datasets and store them on different physical servers for reliability, privacy preservation or efficient processing.

Among the most pressing problems related to cloud is data security/privacy [4], [5], [19]. It has been one of the most frequently raised concerns [5], [20]. There is a lot of work trying to enhance cloud data security/privacy with technological approaches on the CSP side, such as [21], [22]. As discussed in Section 1, these are of equal importance to our focus of external verification.

Integrity verification for outsourced data storage has attracted extensive research interest. The concept of proofs of retrievability (POR) and its first model were proposed by Juels et al. [14]. Unfortunately, their scheme can only be applied to static data storage such as an archive or library. In the same year, Ateniese et al. proposed a similar model named 'provable data possession' (PDP) [10]. Their schemes offer 'blockless verification', which means the verifier can verify the integrity of a proportion of the outsourced file by verifying a combination of pre-computed file tags, which they call homomorphic verifiable tags (HVTs) or homomorphic linear authenticators (HLAs). Work by Shacham et al. [15] provided an improved POR model with stateless verification. They also proposed a MAC-based private verification scheme and the first public verification scheme in the literature based on the BLS signature scheme [23]. In their second scheme, the generation and verification of integrity proofs are similar to the signing and verification of BLS signatures. At the same security strength (say, 80-bit security), a BLS signature (160 bits) is much shorter than an RSA signature (1024 bits), which is a desired benefit for a POR scheme. They also proved the security of both their schemes and the PDP scheme by Ateniese et al. [9], [10]. From then on, the concepts of PDP and POR were in fact unified under this new compact POR model. Ateniese et al. extended their scheme for enhanced scalability [8], but only partial data dynamics and a predefined number of challenges are supported. In 2009, Erway et al. proposed the first PDP scheme based on a skip list that can support fully dynamic data updates [12]. However, public auditability and variable-sized file blocks are not supported by default. Wang et al. [6] proposed a scheme based on BLS signatures that can support public auditing (especially from a third-party auditor, TPA) and full data dynamics; it is one of the latest works on public data auditing with dynamics support. However, their scheme lacks support for fine-grained updates and authorized auditing, which are the main focuses of our work. The latest work by Wang et al. [7] added a random masking technique on top of [6] to ensure the TPA cannot infer the raw data file from a series of integrity proofs. In their scheme, they also incorporated a strategy first proposed in [15] to segment file blocks into multiple 'sectors'. However, the use of this strategy was limited to trading off storage cost against communication cost.

Other lines of research in this area include the work of Ateniese et al. [24] on how to transform a mutual identification protocol into a PDP scheme; the scheme by Zhu et al. [13] that allows different service providers in a hybrid cloud to cooperatively prove data integrity to the data owner; and the MR-PDP scheme based on PDP [10], proposed by Curtmola et al. [11], that can efficiently prove the integrity of multiple replicas along with the original data file.

3 MOTIVATING EXAMPLES AND PROBLEM ANALYSIS

3.1 Motivating Examples
Cost-efficiency brought by elasticity is one of the most important reasons why cloud is being widely adopted. For example, Vodafone Australia is currently using the Amazon cloud to provide their users with mobile online-video-watching services. According to their statistics, the number of video requests per second (RPS) can reach an average of over 700 during less than 10 percent of the time, such as Friday nights and public holidays, compared to a mere 70 on average for the remaining 90 percent of the time; the variation in demand is more than 9 times [3]. Without cloud computing, Vodafone could not avoid purchasing computing facilities that can process 700 RPS, which would be wasted most of the time. This is where cloud computing can save a significant amount of investment: cloud's elasticity allows the user-purchased computation capacity to scale up or down on-the-fly at any time, so user requests can be fulfilled without wasting investment in computational power. Two other large companies, which own news.com.au and realestate.com.au, respectively, are using the Amazon cloud for the same reason [3]. We can see through these cases that scalability and elasticity, and thereby the capability and efficiency of supporting data dynamics, are of extreme importance in cloud computing.

Many big data applications keep user data stored on the cloud with small-sized but very frequent updates. A most typical example is Twitter, where each tweet is restricted to 140 characters (which equals 140 bytes in ASCII), yet these tweets can add up to a total of 12 terabytes of data per day [18]. Storage of transaction records in banking or securities markets is a similar and more security-sensitive example. Moreover, cloud users may need to split large-scale datasets into smaller chunks before uploading to the cloud for privacy preservation [17] or efficient scheduling [19]. In this regard, efficiency in processing small updates is essential in big data applications.

To better support the scalability and elasticity of cloud computing, some recent public data auditing schemes do support data dynamics. However, the types of updates supported are limited, so previous schemes may not be suitable for some practical scenarios. Besides, there is a potential security threat in the existing schemes. We discuss these problems in detail in Section 3.2.

Fig. 1. Relationship between the participating parties in a public auditing scheme.

3.2 Problem Analysis

3.2.1 Roles of the Participating Parties
Most PDP and POR schemes support public data verification. In such schemes, there are three participating parties: the client, the CSS and the TPA. The relationships between the three parties are shown in Fig. 1. In brief, both CSS and TPA are only semi-trusted by the client. In the old model, the challenge message is very simple, so that anyone can send a challenge to the CSS for a proof over a certain set of file blocks, which enables malicious exploits in practice. First, a malicious party can launch distributed denial-of-service (DDoS) attacks by sending multiple challenges from multiple clients at a time, causing additional overhead on the CSS and congestion of its network connections, thereby degrading service quality. Second, an adversary may obtain privacy-sensitive information from the integrity proofs returned by the CSS. By challenging the CSS multiple times, an adversary can either obtain considerable information about user data (because returned integrity proofs are computed over client-selected data blocks), or gather statistical information about the cloud service status. To this end, traditional PDP models cannot quite meet the security requirements of 'auditing-as-a-service', even though they support public verifiability.

3.2.2 Verifiable Fine-Grained Dynamic Data Operations
Some of the existing public auditing schemes can already support full data dynamics [6], [7], [12]. In their models, only insertions, deletions and modifications of fixed-sized blocks are discussed. Particularly, in BLS-signature-based schemes [6], [7], [13], [15] with 80-bit security, the size of each data block is restricted by the 160-bit prime group order p, as each block is segmented into a fixed number of 160-bit sectors. This design is inherently unsuitable for supporting variable-sized blocks, despite its remarkable advantage of shorter integrity proofs. In fact, as described in Section 2, existing schemes can only support insertion, deletion or modification of one or multiple fixed-sized blocks, which we call 'coarse-grained' updates.
Although support for coarse-grained updates gives an integrity verification scheme basic scalability, data updating operations in practice can be more complicated. For example, the verifiable update process introduced in [6], [12] cannot handle deletions or modifications smaller than a block. For insertions, there is a simple extension that enables insertion of an arbitrary-sized dataset: CSS can always create a new block (or several blocks) for every insertion. However, when there is a large number of small updates (especially insertions), the amount of wasted storage will be huge. For example, in [6], [12] the recommended size for a data block is 16k bytes. For each insertion of a 140-byte Twitter message, more than 99 percent of the newly allocated storage is wasted, and it cannot be reused until the block is deleted. These problems can all be resolved if fine-grained data updates are supported. According to this observation, support for fine-grained updates brings not only additional flexibility, but also improved efficiency.

Our model assumes the following:

Assumption 1. CSS will honestly answer all data queries from its clients. In other words, if a user asks to retrieve a certain piece of her data stored on CSS, CSS will not try to cheat her with an incorrect answer.

This assumption, reliability, should be a basic service quality guarantee for cloud storage services.

PDP and POR are different models with similar goals. One main difference is that the file is encoded with an error-correction code in the POR model, but not in the PDP model [6]. As in [6], [7], we will not restrict our work to either of the models.

4 THE PROPOSED SCHEME

Some common notations are introduced in Appendix A.

4.1 Preliminaries

4.1.1 Bilinear Map
Assume a group $G$ is a gap Diffie-Hellman (GDH) group with prime order $p$. A bilinear map is a map $e: G \times G \to G_T$, where $G_T$ is a multiplicative cyclic group with prime order. A useful $e$ should have the following properties: bilinearity, $\forall m, n \in G \Rightarrow e(m^a, n^b) = e(m, n)^{ab}$; non-degeneracy, $\forall m \in G, m \neq 0 \Rightarrow e(m, m) \neq 1$; and computability, i.e., $e$ should be efficiently computable. For simplicity, we will use this symmetric bilinear map in our scheme description. Alternatively, the more efficient asymmetric bilinear map $e: G_1 \times G_2 \to G_T$ may also be applied, as was pointed out in [23].

4.1.2 Ranked Merkle Hash Tree (RMHT)
The Merkle hash tree (MHT) [25] has been intensively studied in the past. In this paper we utilize an extended MHT with ranks, which we name the RMHT. Similar to a binary tree, each node $N$ has a maximum of 2 child nodes; in fact, according to the update algorithm, every non-leaf node constantly has 2 child nodes. The information contained in a node $N$ of an RMHT $T$ is represented as $\{H, r_N\}$, where $H$ is a hash value and $r_N$ is the rank of this node. $T$ is constructed as follows. For a leaf node $LN$ based on a message $m_i$, we have $H = h(m_i)$ and $r_{LN} = s_i$. A parent node of $N_1 = \{H_1, r_{N_1}\}$ and $N_2 = \{H_2, r_{N_2}\}$ is constructed as $N_P = \{h(H_1 \| H_2),\, r_{N_1} + r_{N_2}\}$, where $\|$ is the concatenation operator. A leaf node $m_i$'s auxiliary authentication information (AAI) $\Omega_i$ is a set of hash values, one chosen from each of its upper levels, so that the root value $R$ can be computed from $\{m_i, \Omega_i\}$. For example, for the RMHT demonstrated in Fig. 2, $m_1$'s AAI is $\Omega_1 = \{h(m_2), h(e), h(d)\}$. According to the properties of the RMHT, the number of hash values included in $\Omega_i$ equals the depth of $m_i$ in $T$.

Fig. 2. Example of a rank-based Merkle hash tree (RMHT).

4.2 Framework and Definitions
We first define the following block-level fine-grained update operations:

Definition 1 (Types of Block-Level Operations in Fine-Grained Updates). Block-level operations in fine-grained dynamic data updates may contain the following 5 types of operations: partial modification PM, where a consecutive part of a certain block needs to be updated; whole-block modification M, where a whole block needs to be replaced by a new set of data; block deletion D, where a whole block needs to be deleted from the tree structure; block insertion J, where a whole block needs to be created on the tree structure to contain newly inserted data; and block splitting SP, where a part of the data in a block needs to be taken out to form a new block inserted next to it.(1)

(1) There are other possible operations, such as block merging ME, where two blocks are merged into the first block before the second block is deleted, and data moving MV, where a part of the data is moved from one block to another if the size of the second block does not exceed $s_{max} \cdot \sigma$ after this update. The fine-grained update requests discussed in this paper do not involve these operations, thus we omit them in our current discussion and leave the problem of how to exploit them to future work.
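Before describing the scheme itself, the following is a minimal Python sketch of the RMHT from Section 4.1.2; both the basic and modified update flows below recompute the root exactly this way. It is our illustration, not the authors' code: SHA-256 stands in for both hash functions $H$ and $h$, a leaf's rank is its segment count $s_i$, and promoting an unpaired node unchanged is one possible tie-breaking convention (the paper only states that non-leaf nodes always have two children). The AAI $\Omega_i$ is modeled as a list of sibling digests ordered from leaf to root.

```python
import hashlib

def h(data: bytes) -> bytes:
    """SHA-256 stands in for both H and h in this sketch."""
    return hashlib.sha256(data).digest()

class Node:
    """An RMHT node {H, r_N}: a digest plus the rank (segments below it)."""
    def __init__(self, digest, rank, left=None, right=None):
        self.digest, self.rank = digest, rank
        self.left, self.right = left, right

def build_rmht(blocks, ranks):
    """Build an RMHT over blocks m_i with ranks s_i; a parent of N1, N2 is
    {h(H1 || H2), r_N1 + r_N2}. Returns the root node (R = root.digest)."""
    level = [Node(h(m), r) for m, r in zip(blocks, ranks)]
    while len(level) > 1:
        nxt = [Node(h(a.digest + b.digest), a.rank + b.rank, a, b)
               for a, b in zip(level[::2], level[1::2])]
        if len(level) % 2:            # unpaired node promoted unchanged
            nxt.append(level[-1])
        level = nxt
    return level[0]

def root_from_aai(leaf_digest, aai):
    """Recompute R from a leaf hash and its AAI, given as a list of
    (sibling_digest, sibling_is_left) pairs ordered from leaf to root."""
    acc = leaf_digest
    for sibling, sibling_is_left in aai:
        acc = h(sibling + acc) if sibling_is_left else h(acc + sibling)
    return acc
```

During a verifiable update, the client calls root_from_aai twice against the same AAI: once with $h(m_i)$ to authenticate the old root, and once with $h(m'_i)$ to check the new root claimed by CSS.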

The framework of a public auditing scheme with data dynamics support consists of a series of algorithms. Similar to [12], the algorithms in our framework are KeyGen, FilePreProc, Challenge, Verify, GenProof, PerformUpdate and VerifyUpdate. Detailed definitions and descriptions can be found in Appendix B.

4.3 Our Scheme
We now describe our proposed scheme, with the aim of supporting variable-sized data blocks, authorized third-party auditing and fine-grained dynamic data updates.

4.3.1 Overview
Our scheme is described in three parts:

1. Setup: the client generates keying materials via KeyGen and FilePreProc, then uploads the data to CSS. Different from previous schemes, the client stores an RMHT instead of an MHT as metadata. Moreover, the client authorizes the TPA by sharing a value $sig_{AUTH}$.

2. Verifiable Data Updating: the CSS performs the client's fine-grained update requests via PerformUpdate, then the client runs VerifyUpdate to check whether CSS has honestly performed the updates on both the data blocks and their corresponding authenticators (used for auditing).

3. Challenge, Proof Generation and Verification: describes how the integrity of the data stored on CSS is verified by the TPA via GenChallenge, GenProof and Verify.

We now describe our scheme in detail as follows.

4.3.2 Setup
This phase is similar to the existing BLS-based schemes except for the segmentation of file blocks. Let $e: G \times G \to G_T$ be a bilinear map as defined in Section 4.1, where $G$ is a GDH group supported by $\mathbb{Z}_p$. $H: \{0,1\}^* \to G$ is a collision-resistant hash function, and $h$ is another cryptographic hash function.

After all parties have finished negotiating the fundamental parameters above, the client runs the following algorithms:

$KeyGen(1^k)$: The client generates a secret value $\alpha \in \mathbb{Z}_p$ and a generator $g$ of $G$, then computes $v = g^{\alpha}$.(2) A signing key pair $\{spk, ssk\}$ is chosen with respect to a designated provably secure signature scheme whose signing algorithm is denoted as $Sig(\cdot)$. This algorithm outputs $\{ssk, \alpha\}$ as the secret key $sk$ and $\{spk, v, g\}$ as the public key $pk$. For simplicity, in our settings we use the same key pair for signatures, i.e., $ssk = \alpha$, $spk = \{v, g\}$.

(2) Most exponential operations in this paper are modulo $p$. Therefore, for simplicity, we write $g^x$ instead of $g^x \bmod p$ unless otherwise specified.

$FilePreProc(F, sk, SegReq)$: According to the preemptively determined segmentation requirement $SegReq$ (including $s_{max}$, a predefined upper bound on the number of segments per block), this algorithm segments file $F$ into $F = \{m_{ij}\}$, $i \in [1, l]$, $j \in [1, s_i]$, $s_i \in [1, s_{max}]$, i.e., $F$ is segmented into a total of $l$ blocks, with the $i$th block having $s_i$ segments. In our settings, every file segment should be of the same size $\sigma \in (0, p)$ and as large as possible (see [15]). Since $|p| = 20$ bytes is used in a BLS signature with 80-bit security (sufficient in practice), $\sigma = 20$ bytes is a common choice. According to $s_{max}$, a set $U = \{u_k \in \mathbb{Z}_p\}_{k \in [1, s_{max}]}$ is chosen so that the client can compute the HLA $\sigma_i$ for each block: $\sigma_i = (H(m_i) \prod_{j=1}^{s_i} u_j^{m_{ij}})^{\alpha}$, which together constitute the ordered set $\Phi = \{\sigma_i\}_{i \in [1, l]}$. This is similar to signing a message with a BLS signature. The client also generates a root $R$ based on the construction of an RMHT $T$ over $H(m_i)$ and computes $sig = (H(R))^{\alpha}$. Finally, let $u = (u_1 \| \ldots \| u_{s_{max}})$; the client computes the file tag for $F$ as $t = name \| n \| u \| Sig_{ssk}(name \| n \| u)$ and then outputs $\{F, \Phi, T, R, sig, t\}$.

4.3.3 Prepare for Authorization
The client asks (her choice of) TPA for its identity $VID$ (for security, $VID$ is used for authorization only). The TPA returns its ID, encrypted with the client's public key. The client then computes $sig_{AUTH} = Sig_{ssk}(AUTH \| t \| VID)$ and sends $sig_{AUTH}$ along with the auditing delegation request to the TPA for it to compose challenges later on.

Different from existing schemes, after the execution of the above two algorithms, the client keeps only the RMHT 'skeleton', with the ranks of each node and the indices of each file block, in order to reduce fine-grained update requests to block-level operations; we show how this is done in Section 4.4. The client then sends $\{F, t, \Phi, sig, AUTH\}$ to CSS and deletes $\{F, \Phi, t, sig\}$ from its local storage. The CSS constructs an RMHT $T$ based on $m_i$, which should be identical to the tree just built at the client side, and keeps $T$ stored with $\{F, t, \Phi, sig, AUTH\}$ for later verification.
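As an illustration of KeyGen and the per-block HLA computation $\sigma_i = (H(m_i) \prod_j u_j^{m_{ij}})^{\alpha}$ in FilePreProc, here is a toy Python sketch. It is our own illustration, not the paper's implementation: the pairing-friendly GDH group is replaced by a tiny Schnorr group (prime order p = 101 inside Z_607*), the hash-to-group is insecure, and, following compact POR [15], the $u_k$ are treated as group elements. It exercises only the algebra, not the security.

```python
import hashlib
from secrets import randbelow

# Toy Schnorr group: subgroup of prime order p = 101 in Z_q^* with
# q = 607 = 6p + 1. Real deployments use a 160-bit pairing-friendly
# group; these tiny parameters illustrate the algebra only.
q, p = 607, 101
g = pow(2, (q - 1) // p, q)            # generator of the order-p subgroup

def H(data: bytes) -> int:
    """Toy hash-to-group (insecure: its discrete log is known)."""
    e = 1 + int.from_bytes(hashlib.sha256(data).digest(), "big") % (p - 1)
    return pow(g, e, q)

def serialize(sectors):
    """m_i as the concatenation of its sectors (20-byte sectors in the paper)."""
    return b"".join(s.to_bytes(20, "big") for s in sectors)

# KeyGen: secret alpha, public v = g^alpha.
alpha = 1 + randbelow(p - 1)
v = pow(g, alpha, q)

def hla(sectors, u):
    """sigma_i = (H(m_i) * prod_j u_j^{m_ij})^alpha  (Section 4.3.2)."""
    acc = H(serialize(sectors))
    for m_ij, u_j in zip(sectors, u):
        acc = acc * pow(u_j, m_ij, q) % q
    return pow(acc, alpha, q)

s_max = 4
u = [pow(g, 1 + randbelow(p - 1), q) for _ in range(s_max)]   # the set U
sectors = [7, 42, 99]                  # one block, s_i = 3 sectors in Z_p
sigma_i = hla(sectors, u)
```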

Fig. 3. Verifiable PM-typed Data Update in our scheme.

4.3.4 Verifiable Data Updating
Like Setup, this process takes place between the client and CSS. We discuss 5 types of block-level updates (operations) that affect $T$: PM, M, D, J and SP (see Definition 1). We discuss how these requests form fine-grained update requests in general in Section 4.4.

The verifiable data update process for a PM-typed update is as follows (see Fig. 3):

1. The client composes an update request UpdateReq as defined in Section 4.2 and sends it to CSS.

2. CSS executes the following algorithm:

$PerformUpdate(UpdateReq, F)$: CSS parses UpdateReq and gets $\{PM, i, o, m_{new}\}$. When $Type = PM$, CSS updates $m_i$ and $T$ accordingly, then outputs $P_{update} = \{m_i, \Omega_i, R', sig\}$ (note that $\Omega_i$ stays the same during the update) and the updated file $F'$. Upon finishing this algorithm, CSS sends $P_{update}$ to the client.

3. After receiving $P_{update}$, the client executes the following algorithm:

$VerifyUpdate(pk, P_{update})$: The client computes $m'_i$ using $\{m_i, UpdateReq\}$, then parses $P_{update}$ as $\{m_i, \Omega_i, R', sig\}$ and computes $R$ (and $H(R)$) and $R_{new}$ using $\{m_i, \Omega_i\}$ and $\{m'_i, \Omega_i\}$, respectively. It verifies $sig$ using $H(R)$, and checks whether $R_{new} = R'$. If either of these two verifications fails, it outputs FALSE and returns this to CSS; otherwise it outputs TRUE.

If the output of the algorithm is TRUE, the client computes $\sigma'_i = (H(m'_i) \prod_{j=1}^{s_i} u_j^{m'_{ij}})^{\alpha}$ and $sig' = (H(R'))^{\alpha}$, then sends $\{\sigma'_i, sig'\}$ to CSS.

4. The CSS updates $\sigma_i$ to $\sigma'_i$ and $sig$ to $sig'$ accordingly, discarding the old $F$, if it receives $\{\sigma'_i, sig'\}$; it runs $PerformUpdate(\cdot)$ again if it receives FALSE. A cheating CSS will fail the verification and constantly receive FALSE until it has performed the update as the client requested.

Due to their similarity to the process described above, other types of operations are only briefly discussed. For the whole-block operations M, D and J, as in the model of the existing work [6], the client can directly compute $\sigma'_i$ without retrieving data from the original file $F$ stored on CSS; thus the client can send $\sigma'_i$ along with UpdateReq in the first phase, and in responding to the update request CSS only needs to send back $H(m_i)$ instead of $m_i$. Otherwise these operations proceed as in the case $Type = PM$. For an SP-typed update, in addition to updating $m_i$ to $m'_i$, a new block $m^*$ needs to be inserted into $T$ after $m'_i$. Nonetheless, as the content of $m^*$ is a part of the old $m_i$, the CSS still needs to send $m_i$ back to the client. The process afterwards is similar to a PM-typed update, with the only exception that the client computes $R_{new}$ using $\{m'_i, h(m^*), \Omega_i\}$ to compare with $R'$, instead of using $\{m'_i, \Omega_i\}$ as in the PM-typed update.

4.3.5 Challenge, Proof Generation and Verification
In our setting, the TPA must show CSS that it is indeed authorized by the file owner before it can challenge a certain file.

1. TPA runs the following algorithm:

$GenChallenge(Acc, pk, sig_{AUTH})$: According to the accuracy required for this audit, TPA decides to verify $c$ out of the total $l$ blocks. Then a challenge message $chal = \{sig_{AUTH}, \{VID\}_{PK_{CSS}}, \{i, v_i\}_{i \in I}\}$ is generated, where $VID$ is TPA's ID, $I$ is a randomly selected subset of $[1, l]$ with $c$ elements, and $\{v_i \in \mathbb{Z}_p\}_{i \in I}$ are $c$ randomly chosen coefficients. Note that $VID$ is encrypted with the CSS's public key $PK_{CSS}$ so that CSS can later decrypt $\{VID\}_{PK_{CSS}}$ with the corresponding secret key. TPA then sends $chal$ to CSS.

2. After receiving $chal$, CSS runs the following algorithm:

$GenProof(pk, F, sig_{AUTH}, \Phi, chal)$: Let $w = \max\{s_i\}_{i \in I}$. CSS first verifies $sig_{AUTH}$ with $AUTH$, $t$, $VID$ and the client's public key $spk$, and outputs REJECT if this fails. Otherwise, CSS computes $\mu_k = \sum_{i \in I} v_i m_{ik}$, $k \in [1, w]$, and $\sigma = \prod_{i \in I} \sigma_i^{v_i}$, and composes the proof $P = \{\sigma, \{\mu_k\}_{k \in [1, w]}, \{H(m_i), \Omega_i\}_{i \in I}, sig\}$, then outputs $P$. Note that during the computation of $\mu_k$, we let $m_{ik} = 0$ if $k > s_i$. After execution of this algorithm, CSS sends $P$ to TPA.

3. After receiving $P$, TPA runs the following algorithm:

$Verify(pk, chal, P)$: TPA computes $R$ using $\{H(m_i), \Omega_i\}$ and verifies $sig$ using the public keys $g$ and $v$ by comparing $e(sig, g)$ with $e(H(R), v)$. If they are equal, let $\omega = \prod_{i \in I} H(m_i)^{v_i} \cdot \prod_{k \in [1, w]} u_k^{\mu_k}$; TPA further checks whether $e(\sigma, g)$ equals $e(\omega, v)$, which is similar to verifying a BLS signature. If both equations hold, the algorithm returns TRUE; otherwise it returns FALSE.

An illustration of the Challenge and Verification processes can be found in Fig. 4.
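For readability we add the correctness argument behind the second pairing check in Verify; it follows directly from bilinearity and the definitions of $\sigma$, $\mu_k$ and $\omega$ above, with $m_{ik} = 0$ for $k > s_i$:

$$
e(\sigma, g) = e\Bigl(\prod_{i \in I} \sigma_i^{v_i},\, g\Bigr)
= e\Bigl(\prod_{i \in I} \Bigl(H(m_i)\prod_{k=1}^{w} u_k^{m_{ik}}\Bigr)^{v_i},\, g^{\alpha}\Bigr)
= e\Bigl(\prod_{i \in I} H(m_i)^{v_i} \cdot \prod_{k=1}^{w} u_k^{\mu_k},\, v\Bigr)
= e(\omega, v),
$$

since $\mu_k = \sum_{i \in I} v_i m_{ik}$. Thus an honestly computed proof always passes, while producing a forged proof reduces to breaking the underlying BLS-type assumption.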

Fig. 4. Challenge, Proof Generation and Verification in our scheme.
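Continuing the toy-group sketch from Section 4.3.2 (again our illustration only), the aggregation inside GenProof and the quantity $\omega$ from Verify look as follows. Since the sketch has no pairing, the final assert checks the equivalent relation $\sigma = \omega^{\alpha}$ with the secret key, which only a test harness holding $\alpha$ can do; a real TPA uses the pairing equation instead.

```python
def gen_proof(challenge, blocks, sigmas):
    """challenge: list of (i, v_i); blocks[i]: sector list; sigmas[i]: HLA.
    Returns mu with mu_k = sum_i v_i * m_ik (mod p), padding m_ik = 0 for
    k > s_i, and sigma = prod_i sigmas[i]^{v_i} (mod q)."""
    w = max(len(blocks[i]) for i, _ in challenge)
    mu, sigma = [0] * w, 1
    for i, v_i in challenge:
        for k in range(w):
            m_ik = blocks[i][k] if k < len(blocks[i]) else 0
            mu[k] = (mu[k] + v_i * m_ik) % p
        sigma = sigma * pow(sigmas[i], v_i, q) % q
    return mu, sigma

def omega(challenge, blocks, u, mu):
    """omega = prod_i H(m_i)^{v_i} * prod_k u_k^{mu_k}  (Section 4.3.5)."""
    o = 1
    for i, v_i in challenge:
        o = o * pow(H(serialize(blocks[i])), v_i, q) % q
    for k, mu_k in enumerate(mu):
        o = o * pow(u[k], mu_k, q) % q
    return o

chal = [(0, 1 + randbelow(p - 1))]          # challenge the single toy block
mu, sigma = gen_proof(chal, {0: sectors}, {0: sigma_i})
assert sigma == pow(omega(chal, {0: sectors}, u, mu), alpha, q)
```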

4.4 Analysis on Fine-Grained Dynamic Data Updates
Following the settings of our proposed scheme, we now define a fine-grained update request for an outsourced file divided into $l$ variable-sized blocks, where each block consists of $s_i \in [1, s_{max}]$ segments of a fixed size $\sigma$ each. Assume an RMHT $T$ is built upon $\{m_i\}_{i \in [1, l]}$ for authentication, which means $T$ must be kept updated with each RMHT operation so that CSS can send back the root $R$ for the client to verify the correctness of the operation (see Section 4.3). We now define and categorize all types of fine-grained updates, and then analyze the RMHT operations with $Type = PM, M, D, J$ or $SP$ that are invoked along with the update of the data file.

Definition 2 (Fine-Grained Data Update Request). A fine-grained update request is defined as $FReq = \{o, len, m_{new}\}$, where $o$ indicates the starting offset of this update in $F$, $len$ indicates the data length after $o$ that needs to be updated (so that $\{o, len\}$ characterizes the exact proportion of the original file $F$ that needs to be updated, which we later call $m_{old}$), and $m_{new}$ is the new message to be inserted into $F$ from offset $o$.

We assume the data to be made obsolete and the new data to be added share a common starting offset $o$ in $F$; otherwise the request can be split into multiple updates as defined in Definition 2, executed in sequence. We now introduce a rule to follow during all update processes:

Condition 1 (Block Size Limits in Updates). An update operation must not cause the size of any block to exceed $s_{max} \cdot \sigma$. After any operation, a block that has 0 bits of data remaining must be deleted from $T$.

A detailed analysis can be found in Appendix C; it is summarized in the following theorem:

Theorem 1. Any valid fine-grained update request of the form $\{o, len, m_{new}\}$ can either directly belong to, or be split into smaller requests that belong to, the following 5 types of block-level update requests: PM, M, J, D and SP.

Proof. See Appendix C.

Through the analysis above, we know that a large number of small updates, whether insertions, deletions or modifications, will always invoke a large number of PM operations. We therefore optimize PM operations in the next section to make them more efficient.

4.5 Modification for Better Support of Small Updates
Although our proposed scheme can support fine-grained update requests, the client still needs to retrieve the entire file block from CSS to compute the new HLA, because the client is the only party that has the secret key $\alpha$ needed to compute the new HLA, yet clients do not have $F$ stored locally. Therefore, the additional communication cost would be huge for frequent updates. In this section, we propose a modification to address this problem, utilizing the fact that CSS only needs to send back the data in the block that stayed unchanged.

The framework used here is identical to the one introduced in Section 4.2 (we will also call the scheme above 'the basic scheme' hereafter). Changes are made to PerformUpdate and VerifyUpdate; the Setup, Challenge, Proof Generation and Verification phases are the same as in our basic scheme. Therefore, we only describe the two changed algorithms:

4.5.1 Verifiable Data Updating
We again discuss PM operations first.

PerformUpdate: After CSS has received the update request UpdateReq from the client, it parses it as $\{PM, i, o, m_{new}\}$ and uses $\{o, |m_{new}|\}$ to gather the sectors that are not involved in this update, which we denote $\{m_{ij}\}_{j \in M}$. CSS then performs the update to get $m'_i$, computes $R'$, and sends the proof of update $P_{update} = \{\{m_{ij}\}_{j \in M}, H(m_i), \Omega_i, R', sig\}$ to the client.

VerifyUpdate: After the client receives $H(m_i)$, it first computes $R$ using $\{H(m_i), \Omega_i\}$ and verifies $sig$; then it computes $m'_i$ using $\{\{m_{ij}\}_{j \in M}, m_{new}\}$, computes $R_{new}$ with $\{m'_i, \Omega_i\}$, and compares $R_{new}$ with $R'$. If $R_{new} = R'$, the client returns $\{\sigma'_i, sig'\}$ to CSS for it to update accordingly.

For an SP operation, the process is the same as in our basic scheme: since the data placed in the new block is not new data but part of the old $m_i$, retrieving the entire data block is inevitable when $\sigma'_i$ and $\sigma^*$ must be computed. For other types of operations, no old data is involved in the new blocks, so those processes also remain the same. The process is shown in Fig. 5.

4.6 Extensions and Generalizations
Our strategy can also be applied in RSA-based PDP or POR schemes to achieve authorized auditing and fine-grained data update requests. As RSA can inherently support variable-sized blocks, the process will be even easier. The batch auditing variation of [6], [7] can also be applied to our scheme, as we did not change the construction of the HLAs or the verifications on them.

For the same reason, the random masking strategy for privacy preservation proposed in [7] can also be incorporated into our scheme to prevent the TPA from recovering the challenged file blocks from a series of integrity proofs over the same set of blocks. Alternatively, we can restrict the number of challenges allowed on the same subset of data blocks. When data updates are frequent enough, the success rate of this attack drops dramatically, because there is a high probability that one or many of the challenged blocks will already have been updated before $c$ challenges are completed; this is the reason we did not incorporate the masking strategy into our scheme.
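Before moving to the security analysis, here is a small Python sketch of the client-side check in the modified VerifyUpdate of Section 4.5.1, reusing h() and root_from_aai() from the RMHT sketch in Section 4.1.2. The sector layout and parameter names are our own illustration: the client splices the returned unchanged sectors together with the new data, hashes the resulting $m'_i$ as the leaf, and compares the recomputed root against $R'$.

```python
SECTOR = 20  # sigma = 20 bytes under 80-bit security

def client_verify_pm(kept, new_sectors, lo, aai, r_prime):
    """kept: {j: sector_bytes} for positions untouched by the update;
    new_sectors: the rewritten sectors, occupying positions lo, lo+1, ...;
    aai: the AAI Omega_i of block i (unchanged by a PM update);
    r_prime: the new root R' claimed by CSS.
    Returns True iff the root recomputed from m'_i matches R'."""
    s_i = len(kept) + len(new_sectors)
    block = bytearray(SECTOR * s_i)
    for j, sec in kept.items():                      # sectors CSS sent back
        block[j * SECTOR:(j + 1) * SECTOR] = sec
    for off, sec in enumerate(new_sectors):          # sectors from m_new
        j = lo + off
        block[j * SECTOR:(j + 1) * SECTOR] = sec
    return root_from_aai(h(bytes(block)), aai) == r_prime
```

Only on success does the client compute and release $\{\sigma'_i, sig'\}$; note that $\Omega_i$ itself is unchanged by a PM update, so the same AAI authenticates both the old and the new block.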

Fig. 5. Verifiable PM-typed Data Update in our modified scheme.

5 SECURITY ANALYSIS
In this section, the soundness and security of our scheme are discussed separately by phase, as the aim and behavior of the malicious adversary differ in each phase of our scheme.

5.1 Challenge and Verification
In the challenge/verification process of our scheme, we secure the scheme against a malicious CSS who tries to deceive the verifier TPA about the integrity status of the client's data, which is the same goal as previous work on both PDP and POR. In this phase, aside from the new authorization process (discussed in detail below), the only difference compared to [6] is the RMHT and the variable-sectored blocks. Therefore, the security of this phase can be proven through a process highly similar to [6], using the same framework, adversarial model and interactive games defined there. A detailed security proof for this phase is therefore omitted here.

5.2 TPA Authorization
Security of the new authorization strategy in our scheme is based on the existential unforgeability of the chosen signature scheme. We first define the behavior of a malicious third-party auditor.

Definition 3 (Malicious TPA). A malicious TPA is a third party who aims at challenging a user's data stored on CSS for an integrity proof without this user's permission. The malicious TPA has access to the entire network.

According to this definition, none of the previous data auditing schemes is resilient against a malicious TPA. In our scheme, we have the following theorem:

Theorem 2. Through the authorization process, no malicious TPA can cause the CSS to respond with an integrity proof $P$ over an arbitrary subset of file $F$, namely $m_i, i \in I$, except with negligible probability.

Proof. See Appendix D.

From this theorem, we can see that the security of a public auditing scheme is strengthened by adding the authorization process. In fact, the scheme is now resilient against malicious or pretended auditing requests, as well as potential DDoS attacks launched by malicious auditors. For even higher security, the client may mix a nonce into the authorization message to make every auditing message distinct, so that no one can reuse a previous authorization message. However, this setting may not be appropriate for many scenarios, as the client must stay online whenever an audit happens.

5.3 Verifiable Data Updating
In the verifiable updating process, the main adversary is an untrustworthy CSS who did not carry out the data update successfully but still tries to return a satisfactory response to the client. We illustrate the security of this phase of our scheme in the following theorem:

Theorem 3. In the verifiable update process of both our basic scheme and the modification, CSS cannot provide the client with a satisfactory result, i.e., $R'$ cannot match the $R_{new}$ computed by the client from $\{H(m'_i), \Omega_i\}$, if CSS did not update the data as requested.

Proof. See Appendix D.

Note that data retrieval is part of the verifiable update process. According to Assumption 1, CSS will respond to this query with the correct $m_i$. Without Assumption 1, it is recommended to independently retrieve $\{m_{ij}\}_{j \in M}$ before the update, so that CSS cannot cheat the client intentionally, as it cannot distinguish whether the following update is based on this retrieval.

If CSS can be trusted even further, the client may let CSS compute $u_j^{m_{ij}}$ (where the $m_{ij}$ are the sectors that did not change) and send them back to the client; the client can then compute $\sigma'$ from them together with $m_{new}$ and $H(m'_i)$. This would keep the communication cost of this phase at a constantly low level. However, as the CSS is only considered semi-trusted and it is difficult for the client to verify $u_j^{m_{ij}}$ without $m_{ij}$, this trust assumption is unfortunately too strong for the majority of scenarios.

6 EVALUATION AND EXPERIMENTAL RESULTS
We have provided an overall evaluation and comparison in Appendix E.

We conducted our experiments on U-Cloud, a cloud computing environment located at the University of Technology, Sydney (UTS), whose computing facilities are located in several labs of the Faculty of Engineering and IT, UTS. On top of the hardware and Linux OS, we installed the KVM Hypervisor [26], which virtualizes the infrastructure and allows it to provide unified computing and storage resources. Upon the virtualized data centers, Hadoop [27] is installed to facilitate the MapReduce programming model and its distributed file system. Moreover, we installed the OpenStack open source cloud platform [28], which is responsible for global management, resource scheduling, task distribution and interaction with users.

open source cloud platform [28] which is responsible for


global management, resource scheduling, task distribution
and interaction with users.
We implemented both our scheme and its modification
on U-Cloud, using a virtual machine with 36 CPU cores,
32GB RAM and 1TB storage in total. As in previous work
[6], [12], we also used a 1GB randomly generated dataset
for testing. The scheme is implemented under 80-bit
security, i.e.,  ¼ jpj ¼ 160 bits. As the number of sectors s
(per block) is one of the most influential metrics to overall
performance, we will use it as our primary metrics. For
saving of the first wave of allocated storage, we used
Fig. 7. Comparison of the total storage overhead invoked by 10 140-byte
si ¼ smax in the initial data splitting and uploading. Note insertions to the i-th block in our scheme, as opposed to the direct
that smax decides the total number of blocks for an arbitrary extension of [6].
jF j. However, according to [10], the number of authenti-
cated blocks is a constant with respect to a certain for our scheme stays constant, while in the extended old
percentage of file tampered and a certain success rate of scheme [6] (see Section 3.2.2) the storage increases linearly
detection, therefore we will not take the number of audited with the increase in size of the affected block. These results
blocks as our primary variable of measurement. All demonstrated that our scheme with fine-grained data
experimental results are an average of 20 runs. update support can incur significantly lower storage
We first tested how smax can influence the size of proof overhead (down to 0:14 in our test scenarios) for small
P , which is missing in former schemes [6], [7]. From Fig. 6, insertions when compared to existing scheme.
we can see that generally the proof size decreases when Third, we investigated the performance improvement
smax increases, because the average depth of leaf nodes mi of the modification introduced in Section 4.5. We used
of T decreases when smax increases to a certain level, es- 3 pieces of random data with sizes of 100 bytes, 140 bytes
pecially when right after the initial uploading of F . Note and 180 bytes, respectively, to update several blocks that
that the storage of HLA and RMHT at CSS side will also contain 10 to 50 standard 20-byte sectors each. Data
decrease with the increase of the average number of blocks. retrieval is a key factor of communication overheads in
Therefore, a relatively large smax (but not too large, which the verifiable update phase. For each update, we recorded
we will discuss along with the third experiment) is re- the total amount of data retrieval for both our modified
commended in our dynamic setting. scheme and our basic scheme. The results in comparison are
Second, we tested the storage overhead for small shown in Fig. 9. We can see that our modified scheme always
insertions. Without support for fine-grained updates, every has better efficiency with respect to data-retrieval-invoked
small insertion will cause creation of a whole new block communication overheads, and the advantage is more
and update of related MHT nodes, which is why our significant for larger updates. However, for an update of the
scheme has efficiency advantage. We compared our same size, the advantage will decrease with the increase of
scheme against a representative (and also recent) public jsi j where a larger number of sectors in the original file are
auditing scheme [6]. For comparison, we extended the needed to be retrieved. Therefore, the block size needs to be
older scheme a bit to let it support the communication- kept low if less communication in verifiable updates is
storage trade-off introduced in [15] so that it can support highly demanded.
larger file blocks with multiple (but only a predefined From the experimental results on small updates, we can
constant number of) sectors each. The updates chosen for see that our scheme can incur significantly lower storage
experiments are 10  140 Bytes and 10  280 Bytes, filled overhead while our modified scheme can dramatically
with random data. Results are shown in Figs. 7 and 8. For reduce communication overheads compared to the existing
updates of the same total size, the increased storage on CSS scheme. In practice, the important parameter smax should

Fig. 8. Comparison of the total storage overhead invoked by 10 280-byte


Fig. 6. Communication overhead invoked by an integrity proof with 80-bit insertions to the i-th block in our scheme, as opposed to the direct
security under different smax for a 1GB data. extension of [6].
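As a back-of-envelope check of the figures above (our own arithmetic, not the authors' measurement code): with the 16KB block size recommended in [6], [12], a coarse-grained scheme allocates a whole block per small insertion, so the wasted fraction is simply 1 - insert_size / block_size.

```python
block_size = 16 * 1024                      # 16KB blocks, as in [6], [12]
for insert_size in (140, 280):              # tweet-sized insertions
    wasted = 1 - insert_size / block_size
    print(f"{insert_size}-byte insertion: {wasted:.2%} of the new block wasted")
# 140-byte insertion: 99.15% of the new block wasted
# 280-byte insertion: 98.29% of the new block wasted
```

The 140-byte case matches the 'more than 99 percent' figure quoted in Section 3.2.2.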

Fig. 9. Percentage saving in communication overhead for data retrieval in the modified scheme, compared to our basic scheme.

7 CONCLUSION AND FUTURE WORK
In this paper, we have provided a formal analysis of possible types of fine-grained data updates and proposed a scheme that can fully support authorized auditing and fine-grained update requests. Based on our scheme, we have also proposed a modification that can dramatically reduce communication overheads for verifications of small updates. Theoretical analysis and experimental results have demonstrated that our scheme offers not only enhanced security and flexibility, but also significantly lower overheads for big data applications with a large number of frequent small updates, such as applications in social media and business transactions.

Building on this paper's contributions to improved data auditing, we plan to investigate next how to improve other server-side protection methods for efficient data security with effective data confidentiality and availability. Besides, we also plan to investigate auditability-aware data scheduling in cloud computing. As data security is also considered a metric of quality-of-service (QoS), along with other metrics such as storage and computation, a highly efficient security-aware scheduling scheme will play an essential role in most cloud computing contexts.

ACKNOWLEDGMENT
This research work is partly supported by the Australian Research Council under Linkage Project LP0990393.

REFERENCES
[1] R. Buyya, C.S. Yeo, S. Venugopal, J. Broberg, and I. Brandic, ''Cloud Computing and Emerging IT Platforms: Vision, Hype, and Reality for Delivering Computing as the 5th Utility,'' Future Gen. Comput. Syst., vol. 25, no. 6, pp. 599-616, June 2009.
[2] M. Armbrust, A. Fox, R. Griffith, A.D. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, and M. Zaharia, ''A View of Cloud Computing,'' Commun. ACM, vol. 53, no. 4, pp. 50-58, Apr. 2010.
[3] Customer Presentations at the Amazon Summit Australia, Sydney, 2012, accessed on: March 25, 2013. [Online]. Available: http://aws.amazon.com/apac/awssummit-au/.
[4] J. Yao, S. Chen, S. Nepal, D. Levy, and J. Zic, ''TrustStore: Making Amazon S3 Trustworthy With Services Composition,'' in Proc. 10th IEEE/ACM Int'l Symposium on Cluster, Cloud and Grid Computing (CCGRID), 2010, pp. 600-605.
[5] D. Zissis and D. Lekkas, ''Addressing Cloud Computing Security Issues,'' Future Gen. Comput. Syst., vol. 28, no. 3, pp. 583-592, Mar. 2011.
[6] Q. Wang, C. Wang, K. Ren, W. Lou, and J. Li, ''Enabling Public Auditability and Data Dynamics for Storage Security in Cloud Computing,'' IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 5, pp. 847-859, May 2011.
[7] C. Wang, Q. Wang, K. Ren, and W. Lou, ''Privacy-Preserving Public Auditing for Data Storage Security in Cloud Computing,'' in Proc. 30th IEEE Conf. on Comput. and Commun. (INFOCOM), 2010, pp. 1-9.
[8] G. Ateniese, R.D. Pietro, L.V. Mancini, and G. Tsudik, ''Scalable and Efficient Provable Data Possession,'' in Proc. 4th Int'l Conf. Security and Privacy in Commun. Netw. (SecureComm), 2008, pp. 1-10.
[9] G. Ateniese, R. Burns, R. Curtmola, J. Herring, O. Khan, L. Kissner, Z. Peterson, and D. Song, ''Remote Data Checking Using Provable Data Possession,'' ACM Trans. Inf. Syst. Security, vol. 14, no. 1, May 2011, Article 12.
[10] G. Ateniese, R. Burns, R. Curtmola, J. Herring, L. Kissner, Z. Peterson, and D. Song, ''Provable Data Possession at Untrusted Stores,'' in Proc. 14th ACM Conf. on Comput. and Commun. Security (CCS), 2007, pp. 598-609.
[11] R. Curtmola, O. Khan, R.C. Burns, and G. Ateniese, ''MR-PDP: Multiple-Replica Provable Data Possession,'' in Proc. 28th IEEE Conf. on Distrib. Comput. Syst. (ICDCS), 2008, pp. 411-420.
[12] C. Erway, A. Küpçü, C. Papamanthou, and R. Tamassia, ''Dynamic Provable Data Possession,'' in Proc. 16th ACM Conf. on Comput. and Commun. Security (CCS), 2009, pp. 213-222.
[13] Y. Zhu, H. Hu, G.-J. Ahn, and M. Yu, ''Cooperative Provable Data Possession for Integrity Verification in Multi-Cloud Storage,'' IEEE Trans. Parallel Distrib. Syst., vol. 23, no. 12, pp. 2231-2244, Dec. 2012.
[14] A. Juels and B.S. Kaliski Jr., ''PORs: Proofs of Retrievability for Large Files,'' in Proc. 14th ACM Conf. on Comput. and Commun. Security (CCS), 2007, pp. 584-597.
[15] H. Shacham and B. Waters, ''Compact Proofs of Retrievability,'' in Proc. 14th Int'l Conf. on Theory and Appl. of Cryptol. and Inf. Security (ASIACRYPT), 2008, pp. 90-107.
[16] S. Nepal, S. Chen, J. Yao, and D. Thilakanathan, ''DIaaS: Data Integrity as a Service in the Cloud,'' in Proc. 4th Int'l Conf. on Cloud Computing (IEEE CLOUD), 2011, pp. 308-315.
[17] Y. He, S. Barman, and J.F. Naughton, ''Preventing Equivalence Attacks in Updated, Anonymized Data,'' in Proc. 27th IEEE Int'l Conf. on Data Engineering (ICDE), 2011, pp. 529-540.
[18] E. Naone, ''What Twitter Learns From All Those Tweets,'' Technology Review, Sept. 2010, accessed on: March 25, 2013. [Online]. Available: http://www.technologyreview.com/view/420968/what-twitter-learns-from-all-those-tweets/
[19] X. Zhang, L.T. Yang, C. Liu, and J. Chen, ''A Scalable Two-Phase Top-Down Specialization Approach for Data Anonymization Using MapReduce on Cloud,'' IEEE Trans. Parallel Distrib. Syst., vol. 25, no. 2, pp. 363-373, Feb. 2014.
[20] S.E. Schmidt, ''Security and Privacy in the AWS Cloud,'' presented at the Amazon Summit Australia, Sydney, Australia, May 2012, accessed on: March 25, 2013. [Online]. Available: http://aws.amazon.com/apac/awssummit-au/.
[21] C. Liu, X. Zhang, C. Yang, and J. Chen, ''CCBKE: Session Key Negotiation for Fast and Secure Scheduling of Scientific Applications in Cloud Computing,'' Future Gen. Comput. Syst., vol. 29, no. 5, pp. 1300-1308, July 2013.
[22] X. Zhang, C. Liu, S. Nepal, S. Pandey, and J. Chen, ''A Privacy Leakage Upper-Bound Constraint Based Approach for Cost-Effective Privacy Preserving of Intermediate Datasets in Cloud,'' IEEE Trans. Parallel Distrib. Syst., vol. 24, no. 6, pp. 1192-1202, June 2013.

[23] D. Boneh, B. Lynn, and H. Shacham, ''Short Signatures from the Weil Pairing,'' J. Cryptol., vol. 17, no. 4, pp. 297-319, Sept. 2004.
[24] G. Ateniese, S. Kamara, and J. Katz, ''Proofs of Storage from Homomorphic Identification Protocols,'' in Proc. 15th Int'l Conf. on Theory and Appl. of Cryptol. and Inf. Security (ASIACRYPT), 2009, pp. 319-333.
[25] R.C. Merkle, ''A Digital Signature Based on a Conventional Encryption Function,'' in Proc. Int'l Cryptol. Conf. on Adv. in Cryptol. (CRYPTO), 1987, pp. 369-378.
[26] KVM Hypervisor, accessed on: March 25, 2013. [Online]. Available: www.linux-kvm.org/.
[27] Hadoop MapReduce. [Online]. Available: http://hadoop.apache.org
[28] OpenStack Open Source Cloud Software, accessed on: March 25, 2013. [Online]. Available: http://openstack.org/

Chang Liu received the BEng and MSc degrees from Shandong University, China. He is currently pursuing the PhD degree at the Faculty of Engineering and IT, University of Technology, Sydney, Australia. His research interests include cloud and distributed computing, resource management, cryptography and data security.

Jinjun Chen received the PhD degree in computer science and software engineering from Swinburne University of Technology, Australia. He is an Associate Professor in the Faculty of Engineering and IT, University of Technology, Sydney (UTS), Australia, and the Director of the Lab of Cloud Computing and Distributed Systems at UTS. His research interests include cloud computing, big data, workflow management, privacy and security, and related research topics. His research results have been published in more than 100 papers in high-quality journals and at conferences, including IEEE Transactions on Service Computing, ACM Transactions on Autonomous and Adaptive Systems, ACM Transactions on Software Engineering and Methodology (TOSEM), IEEE Transactions on Software Engineering (TSE), and IEEE Transactions on Parallel and Distributed Systems (TPDS). He received the Swinburne Vice-Chancellor's Research Award for early career researchers (2008), the IEEE Computer Society Outstanding Leadership Award (2008-2009 and 2010-2011), the IEEE Computer Society Service Award (2007), and the Swinburne Faculty of ICT Research Thesis Excellence Award (2007). He is an Associate Editor for IEEE Transactions on Parallel and Distributed Systems. He is the Vice Chair of the IEEE Computer Society's Technical Committee on Scalable Computing (TCSC), Vice Chair of the Steering Committee of the Australasian Symposium on Parallel and Distributed Computing, Founder and Coordinator of the IEEE TCSC Technical Area on Workflow Management in Scalable Computing Environments, and Founder and Steering Committee Co-Chair of the International Conference on Cloud and Green Computing and the International Conference on Big Data and Distributed Systems. He is a Senior Member of the IEEE.

Laurence T. Yang received the BE degree in computer science and technology from Tsinghua University, Beijing, China, and the PhD degree in computer science from the University of Victoria, Victoria, BC, Canada. He is a Professor in the School of Computer Science and Technology, Huazhong University of Science and Technology, China, and in the Department of Computer Science, St. Francis Xavier University, Canada. His current research interests include parallel and distributed computing, and embedded and ubiquitous computing. His research has been supported by the Natural Sciences and Engineering Research Council of Canada and the Canada Foundation for Innovation. He is a Member of the IEEE.

Xuyun Zhang received the BS and MS degrees in computer science from Nanjing University, China, and is currently working towards the PhD degree at the Faculty of Engineering and IT, University of Technology, Sydney, Australia. His research interests include cloud computing, privacy and security, big data, MapReduce and OpenStack. He has published several papers in refereed international journals, including IEEE Transactions on Parallel and Distributed Systems (TPDS).

Chi Yang received the BS degree from Shandong University at Weihai, China, and the MS (by research) degree in computer science from Swinburne University of Technology, Melbourne, Australia, in 2007, and is currently pursuing the PhD degree full-time at the University of Technology, Sydney, Australia. His major research interests include distributed computing, XML data streams, scientific workflow, distributed systems, green computing, big data processing and cloud computing.

Rajiv Ranjan received the PhD degree in engineering from the University of Melbourne, Australia, in 2009. He is a Research Scientist and a Julius Fellow in the CSIRO Computational Informatics Division (formerly known as the CSIRO ICT Centre). His expertise is in datacenter cloud computing, application provisioning, and performance optimization. He has published 62 scientific, peer-reviewed papers (7 books, 25 journal papers, 25 conference papers, and 5 book chapters). His h-index is 20, with a lifetime citation count of 1660+ (Google Scholar), and his papers have received 140+ ISI citations. Seventy percent of his journal papers and 60 percent of his conference papers have been A*/A-ranked ERA publications. Dr. Ranjan has been invited to serve as a Guest Editor for leading distributed systems journals, including IEEE Transactions on Cloud Computing, Future Generation Computing Systems, and Software: Practice and Experience. One of his papers appeared in 2011's top computer science journal, IEEE Communication Surveys and Tutorials.

Ramamohanarao (Rao) Kotagiri received the PhD degree from Monash University and was awarded the Alexander von Humboldt Fellowship in 1983. He has been at the University of Melbourne since 1980 and was appointed Professor in computer science in 1989. He has held several senior positions, including Head of Computer Science and Software Engineering, Head of the School of Electrical Engineering and Computer Science at the University of Melbourne, and Research Director for the Cooperative Research Centre for Intelligent Decision Systems. He served on the editorial board of the Computer Journal and is at present on the editorial boards of Universal Computer Science, Data Mining and Knowledge Discovery, IEEE Transactions on Knowledge and Data Engineering, and the VLDB (Very Large Data Bases) Journal. He was program co-chair for the VLDB, PAKDD, DASFAA, and DOOD conferences, and is a steering committee member of IEEE ICDM, PAKDD, and DASFAA. He received a Distinguished Contribution Award for Data Mining. He is a Fellow of the Institute of Engineers Australia, the Australian Academy of Technological Sciences and Engineering, and the Australian Academy of Science. He was awarded a Distinguished Contribution Award in 2009 by the Computing Research and Education Association of Australasia.

For more information on this or any other computing topic, please visit our Digital Library at www.computer.org/publications/dlib.
