2016 IEEE 3rd International Conference on Cyber Security and Cloud Computing
Survey on Data Integrity in Cloud
Ensar Şeker
Cyber Security Institute
BILGEM TUBİTAK (The Scientific and Technological Research Council of Turkey)
Kocaeli, TURKEY
ensar.seker@tubitak.gov.tr

Kamile Nur Seviş
National Common Criteria Evaluation Laboratory
BILGEM TUBİTAK (The Scientific and Technological Research Council of Turkey)
Kocaeli, TURKEY
nur.sevis@tubitak.gov.tr
978-1-5090-0946-6/16 $31.00 © 2016 IEEE
DOI 10.1109/CSCloud.2016.35

Abstract: Cloud computing has attracted growing attention in recent years. Outsourcing hardware and software resources while still managing them remotely is revolutionary, bringing benefits such as high computing power, competitiveness, cost efficiency, scalability, flexibility, accessibility and availability. These advantages, however, come with drawbacks. The security and integrity of data stored on an untrustworthy server are critically important and raise concerns. Because the data resides on a remote server, it can be modified, removed, corrupted or even stolen, either by the untrusted server itself or by unauthorized users. Various integrity checking methods have therefore been proposed for cloud computing systems. This survey analyzes and compares research on data integrity proofs for these systems.

Keywords: Cloud Computing, Data Integrity, Data Storage, Cloud Audit

I. INTRODUCTION

Although the term cloud computing has been around for some time, there is no single agreed definition of it. Many descriptions exist, and none is clearly better or worse than the others, so comparing and combining them gives a better idea of what the cloud is. According to IBM, “The cloud is a delivery of on demand computing resources – everything from applications to data centers – over the internet on a pay for use basis” [1]. Gartner defines it as “a style of computing in which scalable and elastic IT-enabled capabilities are delivered as a service using internet technologies” [2]. The most widely recognized description comes from NIST:

“Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction” [3].

To understand cloud computing better, it is important to look at cloud service and deployment models. Cloud services are delivered in three main forms: SaaS, PaaS, and IaaS. (1) SaaS (Software as a Service): End users have no control over the infrastructure (hardware and software); they use software-based services, that is, applications that run on a cloud infrastructure, through web-based interfaces. Google Docs, Microsoft Office Web Access and Dropbox are examples of SaaS. (2) PaaS (Platform as a Service): A software developer needs hardware, software, an operating system and other components to develop software. In this model, the service provider retains the control and management functions but provides all the necessary platforms for software developers. Appistry Cloud IQ Platform, Flexiscale, Gizmox Visual WebGUI and Google App Engine are examples of PaaS. (3) IaaS (Infrastructure as a Service): Clients are provided with platforms in a virtual and physical environment on which to run their own managed operating systems and applications. Thus, the client has control over the applications, storage and operating system, while the service provider manages and controls the infrastructure (network, servers, etc.). Amazon Web Services, Bluelock and Cisco CloudVerse are examples of IaaS [4].

Deployment models generally come in four main formats: (1) Public clouds: networks open to public use. (2) Private clouds: only authorized users have access to the system. (3) Community clouds: shared by a specific group of users with common requirements, mission, and policy considerations. (4) Hybrid clouds: a mix of public, community, and private systems [5].

Analyzing real-life examples also helps to eliminate the confusion.

II. CLOUD COMPUTING HISTORY

It is uncertain when the term cloud computing was first used. As an approach, however, its roots go back to the 1950s, if the sharing of a central processing unit by multiple users is regarded as the fundamental concept of cloud computing.

The term Intergalactic Computer Network was used in 1962 by J.C.R. Licklider, who is regarded as one of the most important figures in computer science. He anticipated that every user would be able to access data and programs from anywhere [6].

Douglas Parkhill’s book “The Challenge of the Computer Utility”, published in 1966, is considered to be the first book to mention the characteristics of cloud computing [7, 8]. Remote users’ connection to a centralized computing facility
over a network is what he describes in his book as the idea of computing as a public utility [9].

Around the 1970s, virtual machines came into use, marking a new level for cloud computing as well as for information technologies.

With further developments in the 1990s and 2000s, cloud computing usage increased significantly. Salesforce.com, established in 1999, was one of the milestones in cloud computing history and brought a new concept. Amazon introduced Amazon Web Services in 2002 and launched Elastic Compute Cloud (EC2) in 2006. Another milestone came in 2009 with Web 2.0, when Google became a key player among cloud service providers. Cloud computing appears to have a bright future in the information technology field.

III. ADVANTAGES AND DISADVANTAGES OF CLOUD COMPUTING

Today, companies need to be competitive to stay in the race. Many are looking for efficient, frequent, quick, and secure IT solutions. Keeping company data on their own servers brings considerable costs for staff, space, security, and maintenance. For years, companies have tried to lower their IT costs. The cloud can make IT cheaper, more flexible, elastic, simple, available and accessible to anyone over a network [5, 7].

Cost: Cloud computing has many advantages, and cost is perhaps the most important. Using cloud services on a pay-as-you-go basis can reduce security, maintenance, software licensing, personnel training, ownership and operational costs.

Time: Any data in the cloud can be reached from anywhere at any time via the Internet. Clients do not need to spend time on maintenance and management.

Compatibility: The cloud makes compatibility feasible between documents and between different operating systems.

Availability and Flexibility: Anyone can access the cloud anywhere, anytime. This flexibility and availability let users around the world work on the same project at the same time.

Elasticity: As long as there is an internet connection, users can reach their data day and night from the office, home, or elsewhere. Storage and networking are available elastically and appear limitless to the user.

Like everything else, the cloud has disadvantages alongside its advantages.

Availability: Being available from anywhere at any time can be a nightmare as much as a dream. Not only authorized users but also unauthorized users with malicious purposes can try to reach the data in the cloud.

Malfunction Time: The benefits of the cloud hold only as long as the internet connection is available. However good the cloud service is, it does not matter when the internet is down; there is nothing to do but wait until it is back. The longer the connection is lost, the higher the cost.

Confidentiality and Privacy: Confidentiality and privacy cannot be ignored. It is critically important to keep data and personal information safe and secure in the cloud. Even when the service provider ensures a good security layer, users have their own responsibilities too. Since the data is stored on external servers, questions such as who can access it, who maintains it and how secure it is must be answered satisfactorily. If using the cloud is not secure, it is better not to use it at all.

Data Location: The physical location of data in the cloud is unknown, and there is no transparency about where the data actually resides. The servers may not even be in the same country as the customers; they could be located anywhere in the world.

Integrity: The integrity of data is one of the main concerns for customers. Issues that need to be addressed include client verification of file availability and integrity, and the ability to retrieve and/or recover data without loss or corruption.

IV. DATA INTEGRITY

Data integrity refers to the accuracy, consistency, fidelity and validity of remotely stored data, and it is a fundamental component of information security. Data integrity can be compromised in many ways, such as human error, software bugs, hardware malfunctions or malicious attacks. Backing up data, using security mechanisms and applying error detection and correction software are not enough to guarantee data integrity. The correctness and availability of the data therefore become a major question for users. Many models and systems have been proposed to address the challenge of data integrity checking and auditing [10].

Cloud Service Providers (CSPs) need to convince their clients, by using one or more of these techniques, that their data has remained unaltered and is kept safe from corruption, modification or unauthorized disclosure.

V. METHODS OF DATA INTEGRITY IN CLOUD

A. Provable Data Possession (PDP)

Using a challenge-response protocol, the cloud user can check possession of the data to make sure the server holds the original. Retrieving the files from the cloud is not necessary [11].

To achieve this, MACs (Message Authentication Codes), symmetric encryption or other methods are used. Before sending the file to the untrusted cloud server, the user generates metadata for the file. After sending the file to the CSP server, the user retains this metadata and uses it to check the file's integrity. The original PDP method can be used for both encrypted and plain data.

1) Static PDP

One method is based on hashing with a key. The user hashes the data with a key of their own before the data is sent to the CSP server. Whenever the integrity of the data needs to be checked, all the user has to do is reveal the key and send it to the CSP server. The user then compares the hash value in the server's reply with the stored one.
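
The keyed-hash check described for Static PDP can be illustrated with a minimal sketch. It assumes HMAC-SHA256 as the keyed hash and assumes that the user precomputes one digest per single-use key before uploading; the function names (`precompute_digests`, `server_respond`, `verify`) are hypothetical and are not taken from the surveyed papers.

```python
import hashlib
import hmac
import secrets


def keyed_hash(key: bytes, data: bytes) -> bytes:
    """Keyed hash of the data (HMAC-SHA256 is an assumption of this sketch)."""
    return hmac.new(key, data, hashlib.sha256).digest()


def precompute_digests(data: bytes, rounds: int) -> list[tuple[bytes, bytes]]:
    """User side, before upload: one (key, digest) pair per future check."""
    pairs = []
    for _ in range(rounds):
        key = secrets.token_bytes(32)
        pairs.append((key, keyed_hash(key, data)))
    return pairs


def server_respond(stored_data: bytes, revealed_key: bytes) -> bytes:
    """Server side: hash the stored copy with the key the user reveals."""
    return keyed_hash(revealed_key, stored_data)


def verify(expected: bytes, reply: bytes) -> bool:
    """User side: constant-time comparison of the two hash values."""
    return hmac.compare_digest(expected, reply)


original = b"file contents outsourced to the cloud"
checks = precompute_digests(original, rounds=2)

# First check: the server still holds the original data.
key, expected = checks.pop()
intact = verify(expected, server_respond(original, key))

# Second check: the server's copy has been altered.
key, expected = checks.pop()
tampered = verify(expected, server_respond(original + b"!", key))
```

In this sketch a key is spent once it has been revealed, so the number of checks is bounded by the number of precomputed pairs, and the server must read the whole file for every check.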
For the MAC-based method, the user computes a MAC of the entire data with a key and then sends the data to the cloud server. The data no longer needs to be stored on the local disk. When it is time for an integrity check, the user releases the key and compares the recomputed MAC with the previous one.

The privacy-preserving method uses encryption and a Third Party Auditor (TPA), who can check the data integrity without any knowledge of the data contents [12].

2) Dynamic PDP

Dynamic PDP supports dynamic operations such as modification, deletion and insertion.

Scalable PDP is a method that uses symmetric encryption. The cloud tenant challenges the CSP server with a set of random-looking block indices. The server computes a short integrity check over the specified blocks and returns it to the data owner [13, 14].

Collaborative integrity verification is mostly applied in hybrid clouds and uses homomorphic verifiable responses and hash index hierarchy techniques. This method also uses systems such as IPS (Multi-Prover Interactive Proof System) and MPZKPS (Multi-Prover Zero Knowledge Proof System).

3) Multicopy PDP

Encryption, signature aggregation, bilinear maps/pairings and rank-based authenticated skip lists are some of the techniques used for Multi-copy PDP. Distinct, differentiable copies of the file are generated over numerous servers. Each replica can be produced at the time of the challenge [15, 16].

B. Proof of Retrievability (POR)

With the POR method, data on the CSP server can be verified remotely using an authentication code. The data does not need to be retrieved from the CSP server to the local disk.

1) Static POR

In this method, the data owner computes an authentication code with a secret key. The data is partially encrypted (only some blocks). The file and the code are then sent to the CSP server. It is no longer necessary to keep the file and the code in local storage. The data owner needs only the private key to verify the cloud server's response to an integrity check [17, 18, 19].

A keyed hash function can be used as another technique. In this method, a cryptographic hash of the data is generated before the data is sent to the cloud. Releasing the private key lets the cloud respond with the value of the cryptographic hash, which allows the data owner to check the integrity of the files.

Using sentinels is an approach generally applied to large files. Sentinel blocks, which are hidden and randomly embedded among the data blocks, are sent to the CSP server as part of the original data. When the cloud service client challenges the cloud by specifying the positions of a collection of sentinels to perform an integrity check, the CSP server is expected to return the associated sentinel values [20].

HAIL (High Availability and Integrity Layer) can use MACs, universal hash functions and pseudorandom functions to assure availability and integrity. The tenant is able to store files on numerous independent servers. If there is a failure, storage resources can be tested and the failure can be detected by using PORs as building blocks [21].

2) Dynamic POR

The Merkle Hash Tree is one of the techniques used for Dynamic POR, alongside others such as BLS signatures and the Diffie-Hellman assumption. Some of these methods are very efficient and secure, and reduce the computational and storage overhead for both the owner and the CSP server [22, 23].

C. Other Methods

Besides the PDP and POR protocols, many other techniques are used as well, such as hash, encryption, MAC and signature methods.

In the hash method, the file is read and compressed, then used as the input of a hash function to obtain a hash value. For verification, the CSP server reads the file back and applies the same hash function to generate a hash value. The two hash values must match for the integrity of the data to be verified.

Encryption methods sometimes use a trusted third party, called a cloud broker, to assure data integrity. For this purpose, the broker calculates the hash values of all encrypted segments and compares them with the hash values stored in a database. Some other encryption methods prefer XOR operations [24].

The MAC method is another option. Before sending the data to the remote server, the user precomputes MACs for the whole data with a private key. For each check, the user releases the private key to the CSP server and compares the server's MAC with the one stored on the local disk to check the integrity of the data.

In the signature method, the client precomputes the signature of each block and then sends the data and signatures to the cloud server. This method can only support static data.

VI. CONCLUSION

In this paper, we reviewed integrity checking methods for cloud computing. Research papers on data integrity are explained, including their definitions, techniques, operations, advantages and disadvantages. The study analyzes and compares all these techniques (a comparison is shown in Table I). In conclusion, cloud computing needs to be designed to be more efficient, multi-user, computational, dynamic, retrievable, applicable and, most importantly, secure. That is why cloud computing, and data integrity in particular, remains a wide open issue for researchers.
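
As an illustration of the sentinel approach described under Static POR in Section V.B, the following is a minimal sketch rather than the scheme of [20]. It assumes the data blocks are already encrypted so that sentinels look like ordinary blocks, derives sentinel values with HMAC-SHA256, and picks positions with Python's non-cryptographic `random` module seeded by the key; the block size and all function names are hypothetical choices made for the example.

```python
import hashlib
import hmac
import random
import secrets

BLOCK_SIZE = 16  # bytes per block; an arbitrary choice for this sketch


def sentinel_value(key: bytes, index: int) -> bytes:
    """Derive the index-th sentinel block from the owner's secret key."""
    mac = hmac.new(key, index.to_bytes(4, "big"), hashlib.sha256).digest()
    return mac[:BLOCK_SIZE]


def embed_sentinels(blocks, key, count):
    """Owner side: hide `count` sentinels at key-derived positions.

    random.Random is NOT cryptographically secure; it stands in here for
    the keyed permutation a real scheme would use (an assumption).
    """
    total = len(blocks) + count
    positions = random.Random(key).sample(range(total), count)
    position_set = set(positions)
    data_iter = iter(blocks)
    stored = []
    for pos in range(total):
        if pos in position_set:
            stored.append(sentinel_value(key, positions.index(pos)))
        else:
            stored.append(next(data_iter))
    return stored, positions


def server_respond(stored, challenged_positions):
    """Server side: return the blocks at the challenged positions."""
    return [stored[p] for p in challenged_positions]


def verify(key, positions, challenged, reply):
    """Owner side: recompute the sentinels and compare with the reply."""
    expected = [sentinel_value(key, positions.index(p)) for p in challenged]
    return len(reply) == len(expected) and all(
        hmac.compare_digest(e, r) for e, r in zip(expected, reply)
    )


key = secrets.token_bytes(32)
# Random bytes stand in for already-encrypted data blocks.
blocks = [secrets.token_bytes(BLOCK_SIZE) for _ in range(20)]
stored, positions = embed_sentinels(blocks, key, count=5)

challenge = positions[:2]
ok = verify(key, positions, challenge, server_respond(stored, challenge))

corrupted = list(stored)
corrupted[challenge[0]] = bytes(BLOCK_SIZE)  # overwrite one sentinel
bad = verify(key, positions, challenge, server_respond(corrupted, challenge))
```

Because the server cannot tell sentinels from data blocks, damage to a large part of the file is likely to hit some sentinels, so detection is probabilistic. A sentinel whose position has been revealed cannot be reused, which bounds the number of checks in this sketch.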
TABLE I. COMPARATIVE STUDY OF VARIOUS DATA INTEGRITY METHODS

Reference Paper | Integrity Check Method | Static/Dynamic Operation | Advantage | Disadvantage
--- | --- | --- | --- | ---
[11] | PDP | Static | I. Public Verifiability; II. Blockless Verification; III. Unbound of Queries; IV. Don't need TPA | I. Not Recoverable; II. Lack of Privacy Preservation; III. Relatively Long
[12] | PDP | Static | I. Ensure Privacy Preservation | I. High Communication Complexity; II. Lack of Verification
[13] | PDP | Dynamic | I. Blockless Verification; II. Bulk Encryption isn't required; III. Secure and Efficient Dynamic Operation; IV. Don't need TPA | I. Lack of Privacy Preservation; II. Not Recoverable; III. Unbound no. of Queries; IV. Does not Perform Block Insertions; V. Limited Number of Updates; VI. No Public Verifiability
[14] | PDP | Dynamic | I. Unique Replica | I. Storage Complexity; II. Computation Complexity
[15] | PDP | Dynamic | I. Offers Fully Dynamic Operation | I. Computational Complexity; II. Not Include Provisions for Robustness; III. Not Applicable for Thin Client
[16] | PDP | Static | I. Public Verifiability; II. Unlimited Number of Auditing | I. Extra Storage Cost; II. Extra Communication Cost
[17] | POR | Static | I. Minimum Costs and Efforts; II. Minimum Storage Overhead | I. No Data Prevention Mechanism; II. Limited Number of Queries
[18] | POR | Static | I. Wide Range of Parameter Tradeoffs; II. Lower Storage Overhead | I. Limited Number of Queries
[19] | POR | Static | I. Minimize Costs and Efforts; II. Minimize the Computational; III. Minimize Network Bandwidth Consumption | I. Only to Static Storage of Data; II. Limited Number of Eserişe
[20] | POR | Static | I. Privacy Preservation; II. Ensure Recoverability; III. Blockless Verification; IV. Ensure Both Possession; V. Don't need TPA | I. Unbound no. of Queries; II. Transmission Cost; III. No Public Verifiability
[21] | POR | Static | I. Strong File Intactness Assurance; II. Direct Client Server Communication; III. Low Overhead, Strong Adversarial Model | I. Not Suitable for Thin Client; II. Only for Static Data
[22] | POR | Dynamic | I. Detect File Corruptions with High Probability | I. Less Efficient
[23] | POR | Dynamic | I. Data Freshness; II. Data Retrievability; III. Strong Integrity Assurances with High Performance; IV. End-to-end Design; V. Scalability | I. Not Enough Number of Trial; II. Dependency on Caches
[24] | Other Methods (Encryption based) | Dynamic | I. Confidentiality; II. Less Computation | I. No Public Auditability
ACKNOWLEDGMENT

This research was partially supported by the Scientific and Technological Research Council of Turkey (TUBITAK) Informatics and Information Security Research Center (BILGEM). We thank our colleagues from TUBITAK BILGEM who provided insight and expertise that greatly assisted the research, although they may not agree with all of the interpretations/conclusions of this paper.

REFERENCES

[1] B. M. Leiner et al., “A Brief History of the Internet,” 2009. [Online]. Available: http://www.cs.ucsb.edu/~almeroth/classes/F10.176A/papers/internet-history-09.pdf. Accessed: Feb. 12, 2016.
[2] “What is cloud computing?”. [Online]. Available: http://www.ibm.com/cloud-computing/what-is-cloud-computing.html. Accessed: Feb. 12, 2016.
[3] NIST, “Publication moved: NIST SP 800-145, the NIST definition of cloud computing,” 2016. [Online]. Available: http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf. Accessed: Feb. 12, 2016.
[4] E. Gorelik, “Cloud computing models,” 2013. [Online]. Available: http://web.mit.edu/smadnick/www/wp/2013-01.pdf. Accessed: Feb. 12, 2016.
[5] M. Sajid and Z. Raza, “Cloud computing: Issues & challenges,” International Conference on Cloud, Big Data and Trust, RGPV, 2013.
[6] A. Bendovschi and B. Ionescu, “The Gap between Cloud Computing Technology and the Audit and Information Security Supporting Standards and Regulations,” 2015.
[7] R. Tauwhare, “Cloud Computing, Export Controls, and Sanctions,” 2015.
[8] “Cloud computing promises speed, agility, flexibility, & innovation - learn from Gartner’s analysts,” Gartner. [Online]. Available: http://www.gartner.com/technology/topics/cloud-computing.jsp. Accessed: Feb. 12, 2016.
[9] D. F. Parkhill, “Challenge of the computer utility,” Addison-Wesley, 1966. [Online]. Available: http://agris.fao.org/agris-search/search.do?recordID=US201300597968. Accessed: Feb. 12, 2016.
[10] V. Raut and S. Itkar, “A survey on data integrity of cloud storage in cloud computing,” International Journal of Advance Foundation and Research in Computer, vol. 1, no. 2, 2014.
[11] G. Ateniese et al., “Provable data possession at untrusted stores,” Cryptology ePrint Archive, Report 2007/202, May 2007.
[12] M. A. Shah, R. Swaminathan, and M. Baker, “Privacy-preserving audit and extraction of digital contents,” Cryptology ePrint Archive, 2008.
[13] G. Ateniese, R. Di Pietro, L. V. Mancini, and G. Tsudik, “Scalable and efficient provable data possession,” in Proceedings of the 4th International Conference on Security and Privacy in Communication Networks, Istanbul, Turkey, 2008.
[14] C. C. Erway, A. Küpçü, C. Papamanthou, and R. Tamassia, “Dynamic provable data possession,” 2009. [Online]. Available: https://eprint.iacr.org/2008/432.pdf. Accessed: Feb. 12, 2016.
[15] R. Curtmola, O. Khan, R. Burns, and G. Ateniese, “MR-PDP: Multiple-replica provable data possession,” in The 28th International Conference on Distributed Computing Systems, 2008.
[16] A. F. Barsoum and M. A. Hasan, “Integrity verification of multiple data copies over untrusted cloud servers,” in Cluster, Cloud and Grid Computing (CCGrid), IEEE, 2012.
[17] S. Kumar and A. Saxena, “Data integrity proofs in cloud storage,” in Communication Systems and Networks (COMSNETS), IEEE, 2011.
[18] K. D. Bowers, A. Juels, and A. Oprea, “Proofs of retrievability: Theory and implementation,” ACM Workshop on Cloud Computing Security, 2009.
[19] P. Deore, M. Kale, S. Jadhav, and N. Bhadane, “Data integrity proofs in cloud storage,” 2015. [Online]. Available: http://www.ijraset.com/fileserve.php?FID=1648. Accessed: Feb. 12, 2016.
[20] A. Juels and B. S. Kaliski, “PORs: Proofs of retrievability for large files,” Cryptology ePrint Archive, 2007.
[21] K. D. Bowers, A. Juels, and A. Oprea, “HAIL: A high-availability and integrity layer for cloud storage,” 6th ACM Conference on Computer and Communications Security, 2009.
[22] Z. Mo, Y. Zhou, and S. Chen, “A dynamic proof of retrievability (PoR) scheme with O(logn) complexity,” in Communication and Information Systems Security Symposium, IEEE, 2012.
[23] E. Stefanov, M. van Dijk, A. Juels, and A. Oprea, “Iris: A scalable cloud file system with efficient integrity checks,” in 28th Annual Computer Security Applications Conference, ACM, 2012.
[24] P. Varalakshmi and H. Deventhiran, “Integrity checking for cloud environment using encryption algorithm,” in Recent Trends in Information Technology (ICRTIT), IEEE, 2012.