
IEEE TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING, VOL. 11, NO. 4, JULY/AUGUST 2024

Blockchain-Based Decentralized Cloud Storage With Reliable Deduplication and Storage Balancing

Jingyi Li, Yidong Li, Senior Member, IEEE, Jigang Wu, Member, IEEE, Zikai Zhang, Member, IEEE, and Yi Jin

Abstract—Data deduplication schemes have been widely used in cloud storage systems to save storage space by eliminating duplicate outsourced data. However, the reduction of redundancy leads to decreased data availability and unbalanced data distribution. This paper proposes a blockchain-based decentralized storage system with a reliable deduplication and storage balance strategy to provide reliability for deduplicated outsourced data. Encrypted data is split into chunks by a ramp secret sharing scheme and distributed to multiple independent cloud servers. States of the chunks are recorded on the tamper-proof blockchain, where they can be used to recover the raw data or support the verification of user identity. To balance the distribution of data among storage servers, a heuristic matching algorithm is designed to efficiently allocate the available storage space. The allocation services are published by autonomous smart contracts, and other participants gain rewards by giving the best matching results of data chunks and storage servers. Formulation analysis demonstrates the correctness of the proposed scheme in terms of data consistency, integrity, and reliability. Experimental results show that the proposed scheme preserves the confidentiality of outsourced data with acceptable computational consumption.

Index Terms—Deduplication, blockchain, decentralized, cloud storage, security.

Manuscript received 12 September 2021; revised 21 March 2022; accepted 13 April 2022. Date of publication 28 February 2024; date of current version 12 June 2024. This work was supported in part by the Guangdong Basic and Applied Basic Research Foundation under Grant 2021B1515120010 and in part by the National Natural Science Foundation of China under Grant 62072118. Recommended for acceptance by Dr. W. Saad. (Corresponding author: Jigang Wu.)
Jingyi Li and Jigang Wu are with the School of Computer Science and Technology, Guangdong University of Technology, Guangzhou 510006, China (e-mail: asjgwucn@outlook.com).
Yidong Li, Zikai Zhang, and Yi Jin are with the School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China (e-mail: ydli@bjtu.edu.cn).
Digital Object Identifier 10.1109/TNSE.2024.3369630

I. INTRODUCTION

CLOUD storage platforms provide users with efficient, scalable, and flexible cross-platform data management services. Individuals and enterprises are attracted to outsourcing their data to remote centralized cloud servers to reduce local storage overhead and access their data whenever necessary. Since the demand for data outsourcing services is increasing rapidly, the ever-increasing data and backups lead to a severe disk crisis. According to the CISCO Global Cloud Index [1], the storage capacity of global data centers was expected to grow 4-fold, reaching a total of 2.6 ZB in 2021 from 663 EB in 2016. In general, two-thirds of the data on the remote cloud are duplicates in industrial practice. Although massive replicas enhance the availability of outsourced data, the occupation of extra disk space and transmission bandwidth wastes cloud resources. Users may need to endure network congestion when uploading data, and cloud service providers have to provide additional storage space for data copies, thereby increasing the cost of outsourcing storage services.

In response to heavy data redundancy, deduplication techniques are adopted by cloud storage service providers to reduce disk consumption. The main purpose of data deduplication is to eliminate all redundant copies of a file from the database. It is reported that famous cloud storage platforms, such as Dropbox [2], Mozy [3], and Google Drive [4], have saved more than 90% of disk space by implementing deduplication [5]. Specifically, deduplication approaches can be categorized into server-side and client-side. The former performs deduplication by eliminating redundant data from the storage server, while the latter prevents clients from uploading duplicate data. If deduplication is done on the client side, users can leverage a short proof of ownership (PoW) message instead of uploading a complete file. In such a manner, client-side methods require lower transmission resources than server-side methods in a cloud computing architecture. Deduplication can also be categorized into file-level and block-level. File-level schemes eliminate the entire file from the database, while block-level schemes split the file into chunks and remove the duplicate chunks. Since files are mostly stored as chunks in cloud storage architectures, block-level schemes provide more flexible and fine-grained deduplication. Although deduplication has been studied for many years, there are still some challenges to deploying deduplication in the cloud environment.

Firstly, since the cloud system is untrusted, outsourced data must be encrypted before being uploaded. With commonly used encryption methods, data owners generate various ciphertexts depending on their unique private keys, but these ciphertexts cannot be identified as duplicates. One of the typical approaches to solve this problem is convergent encryption [6]. Convergent encryption schemes generate deterministic secret keys depending on the plaintext and output identifiable ciphertext which supports the duplicate check. Most convergent-encryption-based designs focus on improving the efficiency and storage performance of the encryption process [7], [8], but the reliability of outsourced data is not considered. As a result, in previous works the loss of a many-referenced deduplicated data chunk influences all the related data owners. In this case, secure and reliable deduplication schemes have been proposed to solve this problem.
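The duplicate-identifiability property of convergent encryption discussed above can be illustrated with a short, stdlib-only Python toy. The hash-derived XOR keystream below is a deliberately simplified stand-in for a real deterministic cipher, and `tag` for a duplicate-check tag; it is not the construction of [6] or of this paper.

```python
import hashlib

def convergent_key(plaintext: bytes) -> bytes:
    # The key is derived deterministically from the content itself.
    return hashlib.sha256(plaintext).digest()

def encrypt(plaintext: bytes, key: bytes) -> bytes:
    # Toy deterministic cipher: XOR with a SHA-256-based keystream.
    # (Illustrative only; a real system would use a proper deterministic cipher.)
    stream = bytearray()
    counter = 0
    while len(stream) < len(plaintext):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(p ^ k for p, k in zip(plaintext, stream))

def tag(ciphertext: bytes) -> str:
    # Public tag used for duplicate checks.
    return hashlib.sha256(ciphertext).hexdigest()

# Two independent owners of the same file produce identical ciphertexts,
# so a server (or an on-chain record) can detect the duplicate by tag alone.
data = b"same file, two owners"
c1 = encrypt(data, convergent_key(data))
c2 = encrypt(data, convergent_key(data))
assert c1 == c2 and tag(c1) == tag(c2)
# Decryption is XOR with the same keystream:
assert encrypt(c1, convergent_key(data)) == data
```

Because the key, cipher, and tag are all deterministic functions of the content, two independent uploads of the same file collide on the same tag, which is exactly what makes server-side or client-side duplicate checks possible without comparing plaintexts.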

2327-4697 © 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: Maharaja Institute of Tech. Downloaded on February 25,2025 at 10:16:03 UTC from IEEE Xplore. Restrictions apply.

Li et al. provide a secure deduplication scheme with improved reliability by using a deterministic secret sharing scheme [9]. In addition, key allocation combined with data encryption can impede the repeated uploading of the same data [10].

Without exception, the above-mentioned schemes all deploy trusted third parties to perform encryption and identity verification. However, once the third party is attacked or colludes with cloud servers, the outsourced data is vulnerable to unauthorized access or tampering by malicious attackers. Wang et al. [11] propose a novel storage module called JointCloud, which consists of multiple cloud storage providers and supports efficient cross-cloud services. Zhang et al. [12] improve JointCloud by using a fully randomized tag generation algorithm to construct a decentralized storage deduplication architecture. Although the proposed method satisfies the requirement of key management reliability, it still suffers from the single point of failure.

Recently, blockchain techniques have attracted much attention for their decentralized architecture and consensus mechanism. The unique features of blockchain can be used to achieve trusted global synchronization of data states in an untrusted environment. This provides novel insight into distributed storage under recent cloud computing architectures or even 6G systems [13]. In terms of cloud computing, blockchain has been leveraged to replace the role of the trusted authority in decentralized storage systems. Specifically, existing blockchain-based storage schemes often permanently store data objects and operations on the immutable blockchain to perform credible validation or behavior monitoring [14], [15], [16]. On this basis, Xu et al. [17] propose a blockchain-based data auditing mechanism to reduce the burden of users and service providers while achieving efficient and trustworthy auditing. Li et al. [18] combine risk management and blockchain to construct a trust mechanism for distributed IoT devices that ensures the data sharing process and data integrity, in addition to its resistance against malicious attacks.

However, most previous works ignore the issue of the severe storage burden in cloud storage, while the systems with deduplication schemes mentioned above are careless about the reliability of deduplicated data. Besides, the heterogeneity of servers in a distributed storage system leads to another problem called storage unbalance. For instance, data owners often choose storage servers based on the principle of proximity and send their data to the higher-capacity servers among the nearest ones. As such, nearby outsourced data are more likely to be allocated to a unique server which has better network conditions and richer hard disk space. This centralization of data distribution leads to vulnerability to single point attacks and decreases the availability of stored data. Due to the uniqueness of deduplicated data, once the centralized storage server fails or is attacked, data owners are more likely to lose the entire data.

To address these challenges, we present a novel deduplication-enabled blockchain-based storage system. A client-side, block-level, deterministic-secret-sharing-based deduplication scheme is proposed to securely and reliably distribute the outsourced data to multiple cloud storage servers as file chunks. Even if part of the chunks fail to be accessed, data owners are allowed to recover the original data using metadata and several remaining chunks. The proposed system further leverages a convergent encryption algorithm to protect outsourced data from collusion attacks. With the implementation of a heuristic matching algorithm, the proposed scheme achieves balanced storage allocation among cloud servers depending on the ratio of utilized storage space. Besides, we use a blockchain database to permanently record only the metadata of outsourced files to support lightweight, reliable ownership verification with high integrity and consistency. Blockchain-based smart contracts are also presented to substitute a trusted third party to deploy automatic identity verification, data allocation, and payment processes without concern for single point attacks.

The contributions of this paper are summarized as follows.
• A novel blockchain-based storage system with reliable deduplication is proposed in this paper. An improved deterministic-secret-sharing-based deduplication scheme is implemented to securely eliminate replicas from cloud storage servers. The threshold secret sharing strategy allows a data owner to store a file in a distributed way and retrieve the original one even if part of the outsourced data chunks fail. The secret sharing scheme is implemented upon convergent encrypted ciphertext to protect outsourced data from collusion attacks. A blockchain database is used to record data states and operations to support local duplicate checks and integrity verification. To further protect the proposed system from single point attacks, a fundamental smart contract design is proposed to allocate data chunks and handle payment processes instead of introducing a trusted third party.
• A heuristic matching algorithm is proposed to properly allocate outsourced data chunks to multiple cloud servers. The proposed algorithm maintains the balance of utilized disk space by minimizing the variance of available storage space. To further improve the performance of allocation, a data owner publishes a data uploading request to smart contracts and pays for the best matching results. Multiple specified allocation servers (Allocators) obtain the parameters of outsourced data from the smart contract and execute the proposed heuristic algorithm to meet the requirements of the data owner. The best matching results are verified by all Allocators before being used by data owners to ensure correctness and effectiveness.
• A bilinear-map-based verification algorithm is given and proved to be effective in this paper. Meanwhile, the basic protocol of the data storage process is proposed as a sample implementation. Security analysis demonstrates that confidentiality, integrity, and reliability can be preserved for outsourced data under collusion attacks or malicious tampering. Simulation results show that the proposed system can preserve the reliability of deduplicated data with acceptable computational consumption, in comparison to other baselines.

Section II gives a brief overview of related work. Section III provides the preliminaries and notations. Section IV presents the


system model and security model. Section V proposes the deduplication scheme. Section VI describes the heuristic matching algorithm, and Section VII gives the security analysis. Section VIII shows the simulation results. Section IX discusses the contributions and limitations, and Section X concludes the paper.

II. RELATED WORK

In terms of secure deduplication, convergent encryption is commonly used to generate identifiable ciphertexts where the encryption key is related to the raw data [6]. On this basis, Bellare et al. propose DupLESS, a deduplication scheme that encrypts the outsourced data with content-based keys generated by a centralized key server [19]. Liu et al. further improve the convergent encryption scheme by providing convergent key management based on an identity-based broadcast encryption scheme [20]. However, the convergent encryption scheme suffers from brute-force attacks and tag consistency threats. As such, Ding et al. construct a user-identifiable data deduplication scheme with homomorphic re-encryption to avoid brute-force attacks [21]. Attribute-based encryption is another kind of brute-force-resistant encryption scheme widely used in deduplication systems [22]. DeyPoS implements deduplication with a novel proof of storage scheme to support the dynamic auditing of tags [23]. Although the works mentioned above satisfy the requirements of data security in the deduplication process, they cannot provide sufficient data reliability, since the single copy of a deduplicated data item is vulnerable to a single point of failure.

To provide reliability to the deduplicated data, Wu et al. present a Per-File Parity scheme that computes the XOR parity of data chunks before deduplication to provide parity redundancy [24]. However, the trusted third parties introduced by the above methods weaken the fault tolerance of the storage systems. As the centralized trusted party often stores the system parameters and processes validation operations, a successful malicious attack on this entity leads to the ineffectiveness of system functions. Considering this problem, Shen et al. use erasure codes to distribute the outsourced data to multiple servers, and the integrity of each data chunk is verified by a group of key servers [25]. With the utilization of Reed-Solomon codes, the reliability of outsourced data can be preserved by storing a few extra replicas. Li et al. distribute the outsourced data to multiple servers by using a deterministic secret sharing scheme [9]. The centralized trusted third party in this scheme is only used to randomize the deterministic encoding algorithm to enhance the confidentiality of outsourced data. Li et al. further improve the deterministic secret sharing scheme by deploying the encoding process to distributed virtual machines so that the participation of a centralized entity is eliminated [26]. Although both schemes provide strong reliability for outsourced data, the consistency and reliability of tags and operation records have not been fully considered.

Data storage services can benefit from introducing blockchain techniques to enhance the consistency and integrity of data object states. On this basis, a system named Storj is proposed as a blockchain-based storage space trading platform upon the P2P storage network to provide reliable transaction recording and processing services [27]. Another system, named Sia, leverages a proof of hard drive capacity consensus to generate blocks [28]. Li et al. introduce blockchain into the P2P storage network to prevent outsourced data from malicious tampering by recording the metadata of outsourced data [29]. As for deduplication in storage systems, Singh et al. propose a blockchain-based key management scheme to prevent the convergent key from tampering in a secure data deduplication scheme [30]. Huang et al. combine blockchain and bilinear map cryptography to perform auditable secure deduplication [31]. Yuan et al. use a message-locked encryption algorithm to identically eliminate replicas and smart contracts to punish malicious cloud service providers [32]. Zhang et al. construct a role key for outsourced data by using a hierarchical role hash tree, and they implement integrity checks by leveraging blockchain-based audit protocols [12]. Li et al. propose a blockchain-based client-side block-level deduplication scheme with reliability and eliminate the third party by introducing smart contracts [33]. Liu et al. [34] combine a blockchain-aided keyword-based searchable attribute-based encryption scheme and Pedersen (k, n) secret sharing to achieve data confidentiality and fine-grained access control between resource-limited IoT devices and cloud servers. Wang et al. [35] consider the efficiency of blockchain-based data sharing systems and present a blockchain-based special vehicle priority access guarantee scheme to achieve real-time recognition and scheduling of special vehicles. However, the consistency and integrity of data tags are not strongly preserved in [33], and users may benefit from the penalty mechanism by deliberately providing incorrect proofs of retrievability to servers.

To allocate available disk space among storage servers and improve system reliability, load balancing techniques are leveraged in previous works. As to data sharing, Victor et al. balance the storage utilization of servers by using a classification model and leveraging a dynamic availability load balancing scheme [36]. Considering the characteristics of blockchain networks, Yin et al. propose a financial incentive load-balancing protocol to allocate the distribution of outsourced data in a blockchain-based storage system [37]. Liu et al. balance the global storage resources in a blockchain network according to the Gini coefficient and solve the problem through a heuristic algorithm [38]. However, to the best of our knowledge, there are no related works on constructing a blockchain-based deduplicatable storage system with balanced data allocation.
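Several of the schemes surveyed above ([9], [25], [26], and the Pedersen (k, n) sharing in [34]) distribute outsourced data by threshold secret sharing. As a minimal illustration of the threshold property — any k shares reconstruct the secret, fewer reveal nothing — here is a classic Shamir (k, n) sharing sketch over a prime field; the deterministic ramp variant used in this paper is a different construction, described in Section III-C.

```python
import random

P = 2**127 - 1  # a Mersenne prime; all arithmetic is over GF(P)

def share(secret: int, k: int, n: int) -> list[tuple[int, int]]:
    """Split `secret` into n shares; any k of them suffice to recover it."""
    assert 0 <= secret < P and 1 <= k <= n
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    def f(x):  # evaluate the degree-(k-1) polynomial at x (Horner's rule)
        acc = 0
        for c in reversed(coeffs):
            acc = (acc * x + c) % P
        return acc
    return [(x, f(x)) for x in range(1, n + 1)]

def recover(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 over GF(P)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret

s = 123456789
shares = share(s, k=3, n=5)
# Any 3 of the 5 shares reconstruct the secret.
assert recover(shares[:3]) == s
assert recover([shares[0], shares[2], shares[4]]) == s
```

In a (t, k, n) ramp scheme such as the one in Section III-C, the shared value carries (k − t) data blocks plus t random blocks, trading some secrecy margin for storage efficiency; the k-out-of-n recovery principle is analogous.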
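The storage-balancing line of work above ([36], [37], [38]) and the heuristic matching algorithm of this paper share a common objective: spreading chunks so that the servers' free space stays even. The greedy sketch below is only a hypothetical illustration of that objective (the function names and the variance criterion stand in for the paper's formulation, which is given as formula (5) and constraints (7) in Section VI).

```python
import statistics

def allocate(chunk_sizes: list[int], free_space: list[int]) -> list[int]:
    """Greedy heuristic: place each chunk (largest first) on the server whose
    choice minimizes the variance of remaining free space across servers."""
    free = list(free_space)
    assignment = [-1] * len(chunk_sizes)
    order = sorted(range(len(chunk_sizes)), key=lambda i: -chunk_sizes[i])
    for i in order:
        best, best_var = None, None
        for s in range(len(free)):
            if free[s] < chunk_sizes[i]:
                continue  # capacity constraint: the chunk must fit
            trial = free.copy()
            trial[s] -= chunk_sizes[i]
            v = statistics.pvariance(trial)
            if best_var is None or v < best_var:
                best, best_var = s, v
        if best is None:
            raise ValueError("no server can hold chunk %d" % i)
        free[best] -= chunk_sizes[i]
        assignment[i] = best
    return assignment

# Three servers with unequal free space: the heuristic steers the large
# chunks toward the roomiest server first, evening out utilization.
a = allocate([40, 30, 20], [100, 60, 50])
assert len(a) == 3 and all(s in (0, 1, 2) for s in a)
```

Each placement costs one variance evaluation per server, so the sketch runs in O(m · n²) for m chunks and n servers; a production heuristic would maintain the variance incrementally.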


• Bilinearity: ê(aP, bQ) = ê(P, Q)^{ab} for all P, Q ∈ G and a, b ∈ Z*_q, where Z*_q = {ρ | 1 ≤ ρ ≤ q − 1}.
• Non-degeneracy: ê(P, P) ≠ 1.
• Computability: there exists an efficient algorithm to compute ê(P, Q) for all P, Q ∈ G.

B. Convergent Encryption

Convergent encryption is a cryptosystem that produces identical ciphertext from a given plaintext, irrespective of who performs the encryption [6]. In particular, let F be the plaintext of a file to be uploaded; it is encrypted by a symmetric encryption method Encrypt_K(·) with a data-derived secret key K, which is generally the hash value of F. Thus, the ciphertext C is capable of being deduplicated due to the deterministic method Encrypt_K(·) and key K.

In this paper, three primitive functions, KeyGen_CE(F), Encrypt_CE(F, K), and Decrypt_CE(C, K), are used to define a convergent encryption scheme as follows.
KeyGen_CE(F): the key generation algorithm that takes the plaintext F as input and outputs a convergent key K.
Encrypt_CE(F, K): the symmetric encryption algorithm that takes the convergent key K and plaintext F as input and outputs the ciphertext C.
Decrypt_CE(C, K): the symmetric decryption algorithm that takes the convergent key K and ciphertext C as input and outputs the plaintext F if C and K are correct.

C. Ramp Secret Sharing Scheme

In the proposed scheme, we leverage a deterministic ramp secret sharing scheme based on [9] to secretly split the data into chunks and easily recover the raw data by collecting a threshold number of chunks. It is a threshold secret sharing protocol that distributes a secret into n shares. We require that the distribution algorithm satisfy the following: (1) k or more shares can reconstruct the secret, and (2) t or fewer shares gain no information about the secret. Two sub-schemes named Share and Recover are included in the proposed (t, k, n) ramp scheme.
Share(S): this scheme divides a secret S into (k − t) sub-secrets and generates t pieces of the same size. Then the k pieces are encoded by a non-systematic k-of-n erasure code into n shares.
Recover(shares): using at least k out of n shares, a user can reconstruct the secret S.

D. Public Blockchain and Smart Contract

A blockchain consists of sequential blocks in which each subsequent block stores the collision-resistant hash value of the previous block. A typical block contains the hash value of the previous block, a timestamp, transaction data, and a randomized parameter nonce, which is used to adjust the difficulty of block generation in Proof of Work [39] or Proof of Stake (PoS) [40]. Generally, only the longest chain is admitted, and the recorded information is believed to be true. As such, states and information can be synchronized by using the approved blockchain.

In terms of decentralization, blockchains can be categorized into three types: public, private, and consortium. The public blockchain provides the highest level of confidentiality and integrity protection to stored data among these types. For instance, at least 51% of the computing power is needed for an adversary to tamper with the blockchain data. However, for independent miners, honestly mining with computing power earns more rewards than performing malicious behaviors. Hence, it is difficult to break the public chain when there are sufficiently many independent participants.

The smart contract is a kind of self-executing program that runs across the blockchain network. The deployed contract reads contents from the blockchain and executes the defined business process automatically if the contract is triggered by certain conditions [41]. To deploy a smart contract, users pre-design the trigger condition, input, and output for the contract, and then publish it on the blockchain. When the contract is approved by all nodes, it is continually maintained until it expires. Once the smart contract receives qualified input, the contained code is automatically executed to perform the defined operations.

In this paper, we implement our system on the virtualchain [15]. The virtualchain architecture runs on an existing public blockchain. It can maintain low computing and storage consumption while preserving the confidentiality and reliability of the public chain. Here, we give a brief description of our blockchain design as follows.
• Participants of the proposed storage system interact with the virtualchain API to perform data outsourcing. System operations are recognized and processed by the virtualchain, while the underlying blockchain nodes are agnostic to them.
• The underlying blockchain records the data operations as transaction data. Blockchain nodes are responsible for verifying the consistency of transactions. Only storage system participants recognize and process the system operations noted on the blockchain.
• In the proposed system, smart contracts are published by system participants and recorded on the blockchain as transactions. Each blockchain node stores blockchain data, including smart contracts, in principle. The smart contract in the proposed system is used to publish data allocation requirements and automatically pay for the best allocation results. Besides, the smart contract is also used for payment: the contract transfers prepaid costs to the transaction address of the related cloud server if the server provides correct outsourced data to its owner.

We allow a resource-limited data owner to be a light node if the underlying blockchain platform supports lightweight clients; the reliability of a lightweight node then also depends on the blockchain service providers. Obviously, the underlying blockchain needs to support some form of smart contracts, such as Hyperledger [42] or Ethereum [40].

As to the smart contract for data allocation, a data owner first introduces the identity number of storage servers, number of


Fig. 1. System model.

outsourced data chunks, and size of chunks. Then, a verification algorithm is used to calculate the performance of the input matching strategy. The specific verification algorithm refers to formula (5) and the constraints in (7) in Section VI. A tuple is introduced in the smart contract to store the current best result with its publisher's address. Other participants access the available disk space of storage servers from the blockchain and perform matching algorithms depending on the requirements noted on the smart contract. After a period of time (which is predefined by the contract owner and blockchain consensus), the smart contract transfers the contract owner's prepaid costs to the address recorded in the tuple to reward the publisher. As to the smart contract for storage service payment, the data chunk identity and predefined price are included. Detailed functions of the payment contract are introduced in Section V-B3.

IV. SYSTEM MODEL AND THREAT MODEL

In this section, we discuss the system model and the threat model considered in this paper.

A. System Model

As shown in Fig. 1, the proposed scheme mainly consists of three types of entities, i.e., Cloud Server, Data Owner, and Allocator.
• Cloud Server: The cloud server provides cloud storage services and benefits from maintaining the availability of outsourced data. Specifically, we consider a distributed storage architecture including multiple cloud servers, which belong to multiple cloud service providers with different cloud resources and service capacities. Each cloud server cooperates through software definition on top of the blockchain architecture. The proposed system does not support the transmission of outsourced data between cloud servers.
• Data Owner: The data owner outsources data to cloud servers and pays for the storage service. The data owner performs the duplicate check and data encoding before the uploading process to achieve client-side deduplication. In the proposed system, data owners outsource their data to the nearest cloud servers as in a conventional distributed storage system. In particular, outsourced data is split and assigned through a heuristic algorithm to properly balance the storage burden. As such, data owners are also responsible for publishing the allocation requirements for their data.
• Allocator: The Allocator is a kind of participant who obtains financial benefit by properly allocating the storage resources. Allocators compete for the reward of providing the best matching strategy between data chunks and cloud servers by executing a specific heuristic matching algorithm. Then all Allocators verify the optimal matching results, and they record the states of outsourced data as the transaction contents of the blockchain. Allocators are also responsible for confirming the correctness of the chosen matching through smart contracts.

The proposed system iteratively runs the following steps.
Step 1: The users, including cloud servers and data owners, who demand the deduplication service register a private-key-based blockchain address to interact with other participants.

Step 2: The data owner performs a duplicate check according to the data states recorded on the blockchain before outsourcing data.
Step 3: The data owner performs the ramp secret sharing scheme to split the raw data into chunks. Then the data owner publishes a data allocation task through a smart contract for all the non-duplicate chunks.
Step 4: The allocators execute an allocation algorithm. Then they send the matching results to the smart contract, according to the given task information.
Step 5: The data owner distributes data chunks following the matching results on the smart contract. Then the data owner records the states of these data chunks on the blockchain. Specifically, metadata of the outsourced data are recorded on the blockchain at the same time.
Step 6: When the data owner retrieves the outsourced data, the cloud servers calculate the proof of storage for the related data chunks. Then the cloud servers submit the proofs to the blockchain to get paid by providing the correct verification.
Step 7: The data owner recovers the raw data according to the metadata recorded on the blockchain.

In the proposed system, blockchain is only used to store metadata of outsourced files, including the tag, the metadata for Recover(·) in the secret sharing scheme, and the allocated destination (the mapping between data chunks and target cloud servers). Here, we use a Cauchy matrix-based erasure code to construct the secret sharing scheme as mentioned in [43]. The metadata for Recover(·) can be described as an n × k matrix. Data chunks are not required content for blockchain transactions. Note that the miners of the blockchain do not participate in the storage system or process any data operations. Data owners are allowed to hold only block headers for necessary verification, instead of keeping complete blockchain data or executing computation-intensive mining algorithms. Thus, the storage consumption and computing burden of the data owner can be significantly decreased.

B. Threat Model and Security Goals

Here, we consider threats both from inside and outside of the system. As to the inside, we assume that semi-trusted servers and malicious users may try to access the raw data without authorization, and the latter may further try to tamper with the outsourced data and metadata. As to the outside, an unbalanced data distribution strategy leads to storage centralization. In this case, the outsourced data is vulnerable to a single point of failure.
• Semi-trusted Servers: In the presented system, cloud servers are considered to be semi-trusted, or "honest-but-curious" [21], [44]. Cloud servers are assumed to follow our designed protocols to interact with other participants, but they are curious about more information on the raw data. They may collude with others, who preserve the data with the same tags, to obtain stored data chunks. Then they can recover the raw data by using the decoding algorithm with the related published metadata.
• Malicious Users: These adversaries pretend to be the data owners and interact with cloud servers to access outsourced data [45]. As data tags are commonly used as proofs of ownership and are shown on the blockchain in the proposed system, they may attempt to access the data chunks without authorization by using a man-in-the-middle attack. A malicious user may also upload data with the same tag but different contents as the outsourced data. Later data owners can then pass the proofs of ownership and believe that duplicate data chunks identical to their own exist. However, when those data owners try to recover the raw data, they may retrieve the wrong data chunks due to the tampering behavior of malicious users. Besides, an adversary may attempt to perform a dictionary attack on outsourced data to compromise user privacy.
• Unbalanced Data Distribution: In a distributed storage system, data allocation conventionally depends on the single storage server's state, such as remaining disk space, communication bandwidth, and popularity. If some storage servers have a higher capacity than others, most of the storage redundancy will be assigned to these servers. The concentration of access requests leads to storage centralization, and the system becomes vulnerable to a single point of failure. Attacks on the hot servers can also lead to wide-ranging data losses, especially since deduplication schemes commonly eliminate all duplicates except one copy in the whole network.

According to the proposed threat model, we define the security requirements for our system, which consist of confidentiality, integrity, and reliability.

Confidentiality: In order to guarantee plaintext confidentiality, the outsourced data should be encrypted before uploading so that the raw data cannot be exposed to adversaries. No participant can prove ownership of the outsourced data unless it already possesses that data, which prevents unauthorized access to user data. Also, collusion among a certain number of cloud servers can be prevented: even if no more than a threshold number of related storage servers collaborate to recover and obtain the raw data, they cannot decrypt the ciphertext or access additional information about the original content.

Integrity: Satisfying the data integrity requirement means that the data downloaded from the cloud storage servers can be verified by the data owner. Credible data tags are required: tags cannot be modified by unauthorized participants, so that they support integrity verification. Also, data owners and cloud servers can verify the correctness of tags to ensure the outsourced data is not tampered with by adversaries.

Reliability: The reliability requirement means that the storage system can protect user data from a single point of failure by providing a predefined degree of fault tolerance for outsourced data. In this situation, data owners can retrieve the raw data back when the number of failed servers is less than a predefined threshold. To avoid data skew and centralization of the whole system, data distribution should be balanced among cloud servers as much as possible to enhance data reliability.

Some of the main parameters mentioned in the proposed scheme are shown in Table I.
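Step 3 of the workflow splits the raw data with a ramp secret sharing scheme (the paper instantiates it with a Cauchy matrix-based erasure code), so that any k of n chunks suffice for recovery while fewer than r chunks leak nothing. As a stdlib-only illustration of the k-of-n threshold idea, the following toy Shamir-style scheme over a prime field (an illustrative stand-in, not the paper's actual Cauchy-matrix construction) shares and recovers an integer secret:

```python
import secrets

# Toy (k, n) threshold sharing over a prime field. The field and the
# polynomial construction are assumptions of this sketch.
PRIME = 2**127 - 1  # Mersenne prime used as the field modulus

def share(secret: int, k: int, n: int):
    # Random degree-(k-1) polynomial with f(0) = secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    def f(x):
        acc = 0
        for c in reversed(coeffs):       # Horner evaluation
            acc = (acc * x + c) % PRIME
        return acc
    return [(x, f(x)) for x in range(1, n + 1)]

def recover(shares):
    # Lagrange interpolation at x = 0 from any k shares.
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

Any k of the n points reconstruct the secret; fewer than k reveal nothing about it, which is the property the distribution strategy relies on.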


TABLE I
NOTATIONS

V. THE PROPOSED BLOCKCHAIN-BASED DEDUPLICATION SCHEME
A. Overview
Our deduplication scheme consists of two main phases, named Data Uploading and Data Download. In the file upload phase, data owners utilize the convergent encryption scheme to encrypt F and split the ciphertext into fixed-size chunks, regardless of whether duplicate files exist. After that, data owners download the public chain data and perform a file-level duplicate check to find whether a tag of the outsourced data already exists. If the file-level duplicate check succeeds, only the missing chunks of the data need to be uploaded; the data owners then perform the PoW for the stored chunks. Otherwise, data owners perform a block-level check to find which chunks are to be uploaded. After that, data owners outsource the data chunks with the verification message to the chosen storage servers. In the file download phase, challenges are sent to the smart contract by data owners. Then, the related cloud servers provide the proof of retrievability to the smart contract. If the deployed verification is correct and the payment transaction is completed, data owners can retrieve the outsourced data chunks. Finally, data owners can recover the raw data through the recovery algorithm with the related metadata on the blockchain.
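Convergent encryption derives the key deterministically from the content, so identical plaintexts produce identical ciphertexts and tags, keeping duplicates detectable after encryption. A minimal stdlib-only sketch (a SHA-256 counter-mode keystream stands in here for the AES-256 used in the evaluated implementation; the helper names are hypothetical):

```python
import hashlib

def keygen_ce(data: bytes) -> bytes:
    # KeyGen_CE(F): convergent key derived deterministically from the content.
    return hashlib.sha256(data).digest()

def _keystream(key: bytes, n: int) -> bytes:
    # Illustrative keystream: SHA-256 over (key || counter) blocks.
    out = bytearray()
    ctr = 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return bytes(out[:n])

def encrypt_ce(data: bytes, key: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))

decrypt_ce = encrypt_ce  # an XOR stream cipher is its own inverse

def tag_gen(data: bytes) -> str:
    # TagGen(F) = h(F)
    return hashlib.sha256(data).hexdigest()
```

Because encrypt_ce is deterministic in F, two owners of the same file produce the same ciphertext and the same tag, which is exactly what the file-level and block-level duplicate checks rely on.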

B. Construction of Blockchain-Based Deduplication Scheme


Let S be a group of cloud servers, S = {s1, s2, . . . , sm}. Each sk has an identity idk, k = 1, 2, . . . , m. According to the model mentioned in Section IV-A, S and a data owner u are included in the scheme.
1) System Setup: In this step, the system parameters are generated. Let G and GT indicate two groups of the same prime order q, where G is an additive group and GT is a multiplicative group. Let P be an arbitrary generator of G. We choose ê as a bilinear map that satisfies ê : G × G → GT. Then we choose H : {0, 1}* → G and h : {0, 1}* → Zq as two collision-resistant hash functions. To sum up, the system parameters are determined as Para = {ê, G, GT, P, H, h}.
To access the blockchain service, u randomly chooses sku ∈ Z*q as the private key, then computes pku = sku·P ∈ G as the public key. Additionally, u generates Addru = H(pku ‖ λ) as an exclusive address for anonymous communication in the blockchain network. Here, λ ∈ Zq is randomly selected by u as a private number. Similarly, each sk (k = 1, 2, . . . , m) randomly chooses sksk ∈ Z*q, computes pksk = sksk·P ∈ G as its public key, and then generates Addrsk = H(sksk ‖ γ ‖ idk), where γ ∈ Zq is randomly chosen. Digital signatures signu(·) and signsj(·) are generated by the Elliptic Curve Digital Signature Algorithm.
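The key and address generation of the setup can be mimicked with a toy discrete-log group. The modulus and generator below are illustrative stand-ins for the pairing group G and its generator P, not the paper's actual parameters:

```python
import hashlib
import secrets

# Toy stand-in: a multiplicative group modulo a prime plays the role of G,
# and modular exponentiation plays the role of pk_u = sk_u · P.
PRIME_Q = 2**127 - 1   # hypothetical modulus, for illustration only
GEN = 3                # hypothetical generator, standing in for P

def keypair():
    sk = secrets.randbelow(PRIME_Q - 2) + 1   # sk_u in Z_q^*
    pk = pow(GEN, sk, PRIME_Q)                # stand-in for sk_u · P
    return sk, pk

def blockchain_address(pk: int, blind: int) -> str:
    # Addr_u = H(pk_u || lambda): the private blinding number lambda makes
    # the address unlinkable to the bare public key.
    data = pk.to_bytes(16, "big") + blind.to_bytes(16, "big")
    return hashlib.sha256(data).hexdigest()
```

The address is deterministic given (pk, λ), so the owner can reproduce it, while outsiders who do not know λ cannot link the address to pk.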
2) Data Uploading: As described in Section V-B1, the system parameters Para are provided to u before the uploading phase. A data owner u uploads a file F according to the following steps.
Step 1: u precomputes TagGen(F), resulting in tag(F). Then u computes KeyGenCE(F), resulting in K, to generate a secret key K from the raw data F. And u generates the encrypted data C by computing EncryptCE(F, K) with output C. Here


TagGen(F) = h(F), and the secret key K cannot be inferred from h(F).
Step 2: Given the encrypted data C, u divides the ciphertext into l chunks as {c1, c2, . . . , cl} and generates the data tags {tag(c1), tag(c2), . . . , tag(cl)}. To distribute data chunks among multiple cloud servers, u runs Share(ci) to generate {ciq}, where ci denotes the i-th chunk and {ci1, ci2, . . . , cin} is the set of encoded shares, i = 1, 2, . . . , l.
Step 3: In this step, u receives the recent file tags {tag(F′), tag(C′), tag(c′iq)} from the blockchain and updates the local blockchain database.
Step 4: Upon updating the local data, u performs a duplicate check to confirm whether tag(F) already exists by checking

∃ tag(F) = tag(F′), (tag(F′) ∈ {tag(F′)}).

If the duplicate check succeeds at the file level, u proceeds to perform block-level deduplication among {ciq} as in Step 5. Otherwise, the block-level duplicate check is executed as follows.

∃ tag(ciq) = tag(c′iq), (tag(c′iq) ∈ {tag(c′iq)}).

Step 5: For those chunks that are recorded in the latest blockchain transaction, u believes that they are already stored in a cloud server. In this case, u simply prepares the PoW as follows:

powu = H(ciq ‖ pku ‖ nonce)^sku.

Meanwhile, the user sends its identities to the related servers as follows:

Tci = {tagi, powu, pku, Addru, nonce},

where tagi = {tag(F), tag(ci), tag(ciq)} and nonce is chosen from the previous block header. On the other hand, if no transaction about ciq is maintained, ciq is considered to have been tampered with or lost, so u should proceed as in Step 6.
Step 6: In this step, based on the lost data chunk cik, u computes the PoW

powu = H(cik ‖ pku ‖ nonce)^sku,

and the identities

T′ci = {cik, tagi, powu, pku, Addru, nonce}.

Here k = 1, 2, . . . , n. After that, u proceeds as in Step 7.
Step 7: Before uploading the chunks, u uploads an allocation task with Tci to the blockchain. Allocators execute the task and publish their matching results to the blockchain. The smart contract chooses the optimal matching and sends it to u. u uses H(ciq) ‖ v as the retrieval certificate, where v ∈ Z*p is randomly chosen. Then H(H(ciq) ‖ v) is added to Tci. After that, Tci and T′ci ought to be sent to the related servers. Meanwhile, the payment for the allocation task is automatically deducted from the address of u.
Step 8: A cloud server sj verifies the PoW of ciq by checking

ê(H(ciq ‖ pku ‖ nonce), pku) = ê(powu, P).

If the verification fails, u is refused by the cloud server sj. Otherwise, sj computes

pows = H(ciq ‖ pksj ‖ nonce)^sksj,

then it returns {pows, tag(ciq), pksj, nonce}, a file pointer, and a signed smart contract script Script to u. The Script is defined below.

Script = (blank) ‖ signsj(Script) ‖ verify ‖ price.

Here (blank) is a vacancy that should be filled with the signature of u. verify is a smart contract algorithm implemented on the blockchain to automate the payment. More specifically, this algorithm receives a verification message E(H(ciq ‖ v), sku) from u and E(H(c′iq ‖ v), sksj) from sj, then decrypts them with pku and pksj, respectively. At last, the algorithm makes a comparison and outputs the value of trigger as follows:

trigger = 1, if H(ciq ‖ v) = H(c′iq ‖ v); 0, otherwise.

Besides, price is an amount of ETH that has already been agreed upon by u and sj. Both u and sj transfer deposits from their private accounts to this contract. Once the smart contract is executed, price is deducted from the addresses of the participants until the contract is over.
Step 9: u receives pows and checks it as follows:

ê(H(ciq ‖ pksj ‖ nonce), pksj) = ê(pows, P).

If the verification fails, u chooses another server to outsource the data. Otherwise, u fills the (blank) of Script with its signature and publishes the smart contract SCont = signu(Script) ‖ signsj(Script) ‖ verify ‖ price.
3) Data Download: To download the raw data F, u requests the recent blockchain data again. The difference is that, as a light node mentioned in Section IV-B, u is allowed to maintain only block headers and the related transaction data about F. As a result, u spends insignificant storage space in obtaining the blockchain service. When the blockchain data is updated, u works as follows (hereinafter, we take the process between u and sj as an example).
Step 1: u sends v to sj, and sj verifies the consistency of v by computing

H(H(ciq) ‖ v).

If the result equals the value stored in Tci, sj computes H(c′iq ‖ v) and then processes Step 2. Otherwise, the data download process fails because of the dishonest behavior of u, and the deposit is returned to each participant.
Step 2: H(ciq ‖ v) and H(c′iq ‖ v) are input to SCont to run verify.
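The comparison that verify performs inside SCont reduces to an equality test on two challenge-bound digests. A minimal sketch with hypothetical helper names, where trigger = 1 on a match (the server holds the correct chunk, so payment proceeds) and 0 otherwise (the owner is refunded):

```python
import hashlib

def digest(chunk: bytes, v: int) -> str:
    # H(c_iq || v): digest of the chunk bound to the random challenge v.
    return hashlib.sha256(chunk + v.to_bytes(32, "big")).hexdigest()

def verify(owner_digest: str, server_digest: str) -> int:
    # trigger = 1 when the digests match, 0 when they differ.
    return 1 if owner_digest == server_digest else 0
```

Binding the challenge v into the digest prevents a server from replaying a precomputed hash of a chunk it no longer stores.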


Step 3: If trigger = 0, it indicates that ciq has been lost or tampered with. SCont generates a transaction to transfer all the contract balance to the account of u. Otherwise, sj takes all the balance from the transaction created by the contract.
Step 4: Upon receiving at least k chunks of the corresponding data, u reconstructs ci by Recover({ciq}).
Finally, u accesses the raw data F by executing DecryptCE(C, K).

VI. ALGORITHM FOR STORAGE BALANCE

In this section, we propose an optimization algorithm to appropriately allocate the storage space of multiple servers without the participation of any third party. A heuristic matching algorithm is proposed in our scheme to distribute data chunks to servers. The goal of the algorithm is to minimize the variance of the available disk space across storage servers. Allocators execute the matching according to the tasks published on the blockchain; then the best allocation strategy is chosen and rewarded by a smart contract.

A. Formal Definition

The storage balance problem can be defined as a bipartite graph matching problem, and the goal is to find the minimum-fitness matching between data chunks and cloud servers. As there is no duplicate strategy in the deduplication scheme, each chunk can only be uploaded to one cloud server.
Allocation tasks are published after every time period T. Each matching process is executed for t rounds. During each round t, we assume that there are m data chunks and n storage servers. For every round t, we construct a complete weighted bipartite graph G^(t)(U^(t) ∪ V^(t), E^(t)), where U^(t) is the set of outsourced data chunks and V^(t) is the set of cloud servers. Let m be the size of U^(t) and n be the size of V^(t), with m < n. u_i^(t) denotes the i-th chunk in set U^(t) for i = 1, 2, . . . , m. Note that u_i^(t) represents a set of data chunks generated by the secret sharing scheme; the size of the set is less than r.
Besides, a random number of fixed-size data chunks are randomly sorted and mixed with other files' data chunks as a cluster to participate in data allocation. As such, the scale of U^(t) can be reduced before the allocation process, and an adversary can obtain no valuable information from the chunk order or chunk size recorded on the blockchain, while the clustering further decreases the computing consumption of the heuristic matching algorithm. In this configuration, the retrievability of the outsourced data will not be significantly affected as long as the failure rate of cloud servers is not more than 0.1%; in practice it is lower than 0.01%. Even if malicious attackers successfully break through a single server, they cannot obtain any valuable information, since they obtain fewer than r data chunks.
v_j^(t) denotes the j-th cloud server in set V^(t), for j = 1, 2, . . . , n. The required storage space for data chunks in round t is s_i^{(r)(t)}. The total storage space, utilized storage space, and available storage space of the j-th cloud server are s_j^{(t)(t)}, s_j^{(u)(t)}, and s_j^{(a)(t)}, respectively. The initial usage ratio of the j-th cloud server is p_j^{(i)(t)}, which is defined by the following expression:

p_j^{(i)(t)} = s_j^{(u)(t)} / s_j^{(t)(t)}.  (1)

Let E^(t) = {e_ij^(t) | e_ij^(t) = (u_i^(t), v_j^(t)), i = 1, 2, . . . , m, j = 1, 2, . . . , n} be the set of edges. The edge e_ij^(t) means that the data chunk u_i^(t) can be stored in cloud server v_j^(t). The weight w_ij^(t) on edge e_ij^(t) represents the increment of the usage ratio caused by storing u_i^(t) on v_j^(t). The expression of w_ij^(t) is

w_ij^(t) = s_i^{(r)(t)} / s_j^{(t)(t)}.  (2)

Let P^(t) be the set of usage ratios of the cloud servers in V^(t). p_j^(t) ∈ P^(t) indicates the usage ratio of the j-th cloud server. p_j^(t) can be calculated as follows:

p_j^(t) = Σ_{i=1}^{m} w_ij^(t) · x_ij^(t) + p_j^{(i)(t)}.  (3)

Let X^(t) = {x_ij^(t)} be the set of matching results, where x_ij^(t) is a binary variable defined as follows:

x_ij^(t) = 1, if u_i^(t) is uploaded to v_j^(t); 0, otherwise.  (4)

We aim to find the X^(t) that minimizes the variance of P^(t). The variance F(X^(t)) is calculated as follows:

F(X^(t)) = (1/n) Σ_{j=1}^{n} (p_j^(t) − (1/n) Σ_{j=1}^{n} p_j^(t))².  (5)

Let η_j be the failure rate of v_j^(t). The availability of the cloud servers, A(X^(t)), is calculated as follows:

A(X^(t)) = Σ_{i=1}^{m} Σ_{j=1}^{n} η_j · x_ij^(t).  (6)

In each round t, we aim to obtain the best matching decision B^(t) for the storage balance problem. Thus, the problem is formulated as follows, where M = {1, 2, . . . , m} and N = {1, 2, . . . , n}:

B^(t) = argmin_{X^(t)} F(X^(t))
s.t. Σ_{i=1}^{m} x_ij^(t) ≤ 1, j ∈ N,
     Σ_{j=1}^{n} x_ij^(t) = 1, i ∈ M,
     Σ_{i=1}^{m} Σ_{j=1}^{n} x_ij^(t) · s_i^{(r)(t)} ≤ s_j^{(a)(t)}, i ∈ M, j ∈ N,
     A(X^(t)) > ρ,
     x_ij^(t) ∈ {0, 1}, i ∈ M, j ∈ N,
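Putting (1), (2), (3), and (5) together with a greedy minimum-variance selection, one allocation round can be sketched compactly in Python. This is a simplification of the heuristic matching of Algorithm 1: it keeps the largest-chunk-first order and the one-chunk-per-server rule, but omits the availability constraint and the TH occupancy threshold:

```python
def heuristic_match(chunk_sizes, total_space, used_space):
    """Greedy sketch: repeatedly take the largest unassigned chunk set and
    place it on the unvisited server whose resulting usage ratios have the
    minimum variance, per eqs. (1), (2), (3), and (5)."""
    m, n = len(chunk_sizes), len(total_space)
    ratio = [u / t for u, t in zip(used_space, total_space)]   # (1): p_j^{(i)(t)}
    solution = [-1] * m
    visited_cs = [False] * n
    # Visit chunks from largest required size to smallest, as GetChunk does.
    for i in sorted(range(m), key=lambda i: -chunk_sizes[i]):
        best_j, best_var = -1, float("inf")
        for j in range(n):
            if visited_cs[j]:
                continue  # each server takes at most one chunk set per round
            trial = ratio[:]
            trial[j] += chunk_sizes[i] / total_space[j]        # (2): w_ij
            mean = sum(trial) / n
            var = sum((p - mean) ** 2 for p in trial) / n      # (5): F(X)
            if var < best_var:
                best_j, best_var = j, var
        solution[i] = best_j
        visited_cs[best_j] = True
        ratio[best_j] += chunk_sizes[i] / total_space[best_j]  # (3): update p_j
    return solution, ratio
```

For example, with one 50-unit chunk and two servers of capacity 100 and 1000, the sketch picks the larger server, since that keeps the usage-ratio variance smallest.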


     w_ij^(t) ∈ (0, 1), i ∈ M, j ∈ N,
     p_j^(t) ∈ [0, 1], j ∈ N.  (7)

Let ρ be the security parameter; ρ represents the minimum number of usable servers. The value of ρ depends on the practical requirements of the system design. To ensure that at least k servers are available to return the correct data chunks, we define ρ > k + η_j. The first and second constraints ensure that each data chunk matches a cloud server one to one. The third constraint indicates that the size of the data chunks is less than or equal to the cloud servers' available storage space. The fourth constraint indicates that the matching results must satisfy the requirement of the security parameter.

B. Heuristic Matching Algorithm

It is clear that storage balance is a typical combinatorial optimization problem. Therefore, it is difficult to find the optimal solution in polynomial time, and we focus on finding a heuristic algorithm to solve it. The main idea of our heuristic strategy is to select the optimal edge in each iteration, in order to make F(X^(t)) the smallest.
In each round t, the steps of the proposed heuristic algorithm are as follows.
Step 1: Calculate p_j^{(i)(t)} by (1) and w_ij^(t) by (2). Initialize the arrays VisitedDC and VisitedCS to 0.
Step 2: Select the u_i^(t) with the maximum size.
Step 3: Calculate p_j^(t) by (3) and sort the p_j^(t) in ascending order; calculate the variance by (5) and the availability of servers by (6).
Step 4: Find the minimum variance with A(X^(t)) larger than ρ. Find the corresponding matching x_ij^(t), and set VisitedDC[i], VisitedCS[j], and VisitedDCCS[i][j] to 1, respectively. Put x_ij^(t) into the solution set.
Step 5: If all the data chunks are visited, output the result set. Otherwise, go to Step 2.
The pseudocode of the heuristic algorithm is shown in Algorithm 1. Note that GetChunk is used to obtain the index of the u_i^(t) with the largest size, which is stored in DCSize. GetServer is used to obtain the index of the cloud server with which the minimum variance is achieved in a matching process.

VII. SECURITY ANALYSIS

In this section, we demonstrate that the proposed scheme satisfies the requirements of confidentiality, integrity, and reliability mentioned in Section IV-B.
Theorem 1: The proposed system satisfies the data confidentiality requirement under the attacks of malicious adversaries. Unauthorized adversaries cannot obtain meaningful information from the public data.
Proof: The proposed system achieves data confidentiality by implementing the secret sharing scheme, encryption algorithms, and the blockchain architecture.

Algorithm 1: Heuristic Matching Algorithm.
Input: G^(t)(U^(t) ∪ V^(t), E^(t)) /* a weighted bipartite graph */
Output: Solution[1:n] /* Solution[i] = j indicates u_i^(t) is assigned to v_j */
Begin
1: function GetChunk(DCSize[m])
2:   index ← −1;
3:   temp ← 0;
4:   for i ← 0 to m − 1 do
5:     if VisitedDC[i] == 0 then
6:       if temp < DCSize[i] then
7:         temp ← DCSize[i];
8:         index ← i;
9:       end if
10:    end if
11:  end for
12:  Return index;
13: end function
14:
15: function GetServer(i, w_ij^(t), p_j^{(i)(t)})
16:   index ← −1;
17:   tempSet ← ∅; /* store temporary results */
18:   for j ← 0 to n − 1 do
19:     if VisitedCS[j] == 0 then
20:       arrayT ← p_j^{(i)(t)}; /* make a copy of p_j^{(i)(t)} */
21:       /* TH is a pre-determined safe disk occupancy ratio threshold */
22:       if (w[i][j] + arrayT[j]) < TH then
23:         arrayT[j] ← w[i][j] + arrayT[j];
24:         Execute the availability value by (6);
25:         if A(X^(t)) > ρ then
26:           Sort arrayT in ascending order;
27:           Execute the variance by (5);
28:           Put the matching type, variance, and availability value into tempSet;
29:         end if
30:       end if
31:     end if
32:   end for
33:   if tempSet ≠ ∅ then
34:     index ← the index of the cloud server with the minimum variance;
35:   end if
36:   Return index;
37: end function
38:
39: /* Main algorithm */
40: for i ← 0 to m − 1 do
41:   VisitedDC[i] = 0; /* initialization */
42: end for
43: for j ← 0 to n − 1 do
44:   VisitedCS[j] = 0; /* initialization */
45: end for
46: DCSize[m] ← /* one-dimensional array storing the required sizes of the data chunks */


47: Execute p_j^{(i)(t)} by (1);
48: WJ[n] ← p_j^{(i)(t)}; /* one-dimensional array WJ storing the initial usage ratio of the j-th cloud server */
49: Execute w_ij^(t) by (2);
50: WIJ[m][n] ← w_ij^(t); /* two-dimensional array WIJ for computing the utilized storage spaces */
51: for i ← 0 to m − 1 do
52:   /* obtain the index of the u_i^(t) with the largest data chunks to store */
53:   iIndexDC ← GetChunk(DCSize);
54:   /* obtain the index of the cloud server with the minimum variance */
55:   iIndexCS ← GetServer(iIndexDC, WJ);
56:   if iIndexDC ≠ −1 and iIndexCS ≠ −1 then
57:     Solution[iIndexDC] ← iIndexCS;
58:     VisitedDC[i] = 1;
59:     VisitedCS[j] = 1;
60:     WJ[iIndexCS] ← WIJ[iIndexDC][iIndexCS] + WJ[iIndexCS];
61:   end if
62: end for
63: Return Solution;
End

As for using an (r, k, n) secret sharing scheme, the raw data is distributed to n servers. Users need to obtain k or more data chunks to recover the raw data, and no one can extract valuable information from fewer than r data chunks. In this case, adversaries must attack or cheat at least k servers to obtain sufficient data chunks. However, it is difficult for adversaries to successfully invade multiple servers in a heterogeneous network. Therefore, the data distribution strategy of the secret sharing scheme can provide confidentiality protection for outsourced data.
As for using encryption algorithms, outsourced data chunks are encrypted by convergent encryption before uploading to cloud servers. As mentioned in [6], the convergent encryption scheme is proved to be secure under our security assumptions. Hence, the ciphertext cannot be decrypted by unauthorized users if the convergent key K is prevented from leakage. Even if more than k servers collude and obtain sufficient data chunks, they cannot decrypt the ciphertext, due to the utilization of convergent encryption. Therefore, the confidentiality of encrypted data is preserved by convergent encryption.
As for using blockchain, it is difficult for adversaries to tamper with the metadata meta(F) generated from the outsourced data F. For instance, if we use the Bitcoin blockchain to implement our system, the adversaries cannot tamper with the blockchain message unless they control over 51% of the computing power in the whole network [39]. In this case, meta(F) stored on the blockchain can be used for integrity checks and identity verification during the file download phase. By leveraging the collision-resistant hash function Hash, only users who have owned the plaintext F′ = F can generate meta(F′) = meta(F). Hence, it is easy for authorized users to observe whether the stored data is correct or the data owner is legitimate by comparing meta(F′) and meta(F). This concludes that our system can provide data confidentiality under the threats mentioned in Section IV-B. □

Theorem 2: The presented system satisfies the data integrity requirement by maintaining the correctness of tags.
Proof: Blockchain is used to store the states of outsourced data in our system. Each datum is verified by system members before it is recorded on the blockchain. Therefore, the recorded data, together with data tags, data-server allocation results, and verification processes, is reliable.
Besides, participants cannot upload a wrong verification unless they control more than 51% of the computing power of the blockchain network. Let A be a malicious adversary who has less than 51% of the global network computing power. Let P(A) be the probability that A successfully generates an incorrect block. It is proved that P(A) is no more than 2% in [45].
In terms of tag verification, u and s are required to verify the integrity of the data chunk ciq by computing

ê(H(ciq ‖ pku), pku) = ê(powu, P),
ê(H(ciq ‖ pksj), pksj) = ê(pows, P),
trigger = 1, if H(ciq ‖ v) = H(c′iq ‖ v); 0, otherwise.

The correctness of the equations can be proved as follows.

ê(H(ciq ‖ pku), pku) = ê(H(ciq ‖ pku), sku·P) = ê(H(ciq ‖ pku)^sku, P) = ê(powu, P).

The correctness of the verification algorithm is preserved by the hardness of the Elliptic Curve Discrete Logarithm Problem (ECDLP). For example, for ciq ‖ v, no adversary can generate ciq ‖ vf such that H(ciq ‖ vf) = H(ciq ‖ v), or vice versa. Meanwhile, the correctness of data tags is guaranteed, since we have proved the immutability and credibility of the blockchain data in Theorem 1. Thus, the verification process and the integrity check process are correct due to the correctness of the data tags. This concludes that our system can provide data integrity under the threats mentioned in Section IV-B. □

Theorem 3: The proposed scheme achieves the reliability requirement by using the secret sharing scheme and the proposed heuristic matching algorithm.
Proof: The outsourced data is distributed by an (r, k, n) secret sharing scheme in the proposed system. If cloud servers are available 50% of the time, they provide P = 1 − (50%)^n available time for the outsourced data. Let m be the number of duplication operations; the available time of the file is shown in Fig. 2. Note that the available time is 87.5% for m = 3, where m is the commonly used number of data replicas, while it is 99.8% for m ≥ 9.
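The availability figures in this proof can be checked numerically, assuming each server is independently available with probability p:

```python
from math import comb

def replication_availability(m: int, p: float) -> float:
    # A file with m replicas survives if at least one replica is reachable.
    return 1 - (1 - p) ** m

def erasure_availability(k: int, n: int, p: float) -> float:
    # A k-of-n coded file survives if at least k of the n chunks are reachable:
    # P' = 1 - sum_{i=0}^{k-1} C(n, i) p^i (1 - p)^(n - i)
    return 1 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
```

With p = 0.5, three replicas give 87.5% availability and nine replicas give about 99.8%, while a 20-of-60 code exceeds 99.6%, matching the figures quoted in the proof at the same or lower storage overhead.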


Fig. 2. Availability of the duplicate storage system.

If the same data is stored by a k-of-n erasure code with 3-fold redundancy, such as 20 of 60, more than 99.6% available time can be provided according to the following P′:

P′ = 1 − Σ_{k=0}^{m−1} C_m^k · p^k · (1 − p)^{m−k}.

Hence, the secret sharing scheme can provide similar available time for outsourced data with less consumption of storage space compared to the traditional duplication scheme.
Besides, outsourced data chunks are distributed to multiple cloud servers by using the proposed heuristic matching algorithm. The proposed matching algorithm minimizes the variance of the remaining storage space of the servers in each round of allocation. As such, the centralization of storage is reduced after rounds of data chunk allocation. To preserve the effectiveness of the secret sharing scheme, our algorithm allocates no more than r data chunks to a single cloud server. In this case, a single point of failure does not impact the reliability of the outsourced data in our system. Meanwhile, the PoW mentioned in Section V-B3 ensures that the retrieved data chunks are correct. Thus, the data chunks used in Recover(·) are correct. Therefore, the correctness of the data reconstruction algorithm can be preserved. Further, the reliability of the data download process can be achieved.
With the utilization of blockchain, metadata of outsourced data are reliably stored by all participants instead of being saved at a single server. The consistency of metadata can also be preserved by leveraging the blockchain architecture, as mentioned in Theorem 2. As a result, the data recovery process cannot be disrupted by the loss of correct metadata unless half of the blockchain nodes fail, a number believed to be greater than n. Hence, the complete data can invariably be recovered unless more than n − k chunks become inaccessible, and the reliability of outsourced data can be indirectly protected. This concludes that our system can provide data reliability under the threats mentioned in Section IV-B. □

VIII. PERFORMANCE ANALYSIS

In this section, we discuss the performance of the proposed methods. It can be noticed that the size or number of outsourced data mainly determines the encoding and decoding time consumption of the secret sharing scheme. Although the verification information of data chunks is uploaded in the form of multiple transactions, these transactions will be recorded on the same block and processed within the same block generation period. Therefore, when the amount of data increases, the computing cost of storage operations mainly depends on the implementation and performance of the underlying blockchain. Thus, our main consideration is the performance of the proposed secret sharing scheme and the heuristic algorithm. We first show the performance of the secret sharing scheme in terms of storage cost and computing cost. Then we analyze the performance of the heuristic matching algorithm in terms of balance and availability. Simulations are performed on a laptop with an AMD Ryzen 3 PRO 1200 Quad-Core Processor (3.10 GHz) and Ubuntu 16.04 LTS OS. We set the default chunk size to 4 KB, which is commonly used in fixed-size chunking deduplication.

A. Performance of Secret Sharing Scheme

In this section, we compare the proposed scheme with the baseline [9] and SeShare [31]. As to storage cost, we explore the relationship between coding strategy and data availability, as described in Section 3. Then we consider the impact of the RSSS parameters (t, k, n) on the storage cost. As to computing cost, we emphasize the overhead of data uploading. The scheme is implemented in Python 3.6 with a development tool named Jerasure Version 1.2 [43]. The SHA-256 hash function is used to generate the 32-byte convergent key. AES-256 (in Cipher-Block Chaining mode) is adopted as the symmetric encryption algorithm. The PBC library [46] is used to implement the bilinear pairing algorithms with the type A pairing setting.
1) Storage Cost: We define blow-up as the ratio of the encoded data size to the raw data size, which is calculated by Σ_{i=1}^{l} Σ_{q=1}^{n} |c_iq| / RawDataSize. The reliability of the raw data is related to the combination of k and n. Different configurations of k and n also influence the blow-up. Figs. 4 and 5 show the experimental results on a 5 MB text file. In general, the reliability of outsourced data rises with the increase of n/k. This is because the algorithm Recover(·) needs more data chunks with increasing k. Both the blow-up and the availability of raw data increase with decreasing k/n. This is because when k/n decreases, more redundancies are generated.
2) Computing Cost: In terms of the computing cost, we compare the proposed scheme with the existing works [9], [31]. Here, Exp indicates an exponential operation. Pair indicates a pairing operation on ê : G × G → GT. Mul indicates a multiplication in G. Add indicates a modular addition operation. Hash is a collision-resistant hash function on G.
The main algorithms in the deduplication scheme are procedure tag(·) for tag generation, procedures KeyGenCE(·) and EncryptCE(·) for convergent encryption, procedure Share for encoding, and procedure Recover for decoding. Besides, the most notable algorithms in the baseline are procedure tag′(·) for tag generation, procedure Share for encoding, and procedure Recover for decoding.
We explore the impact of the (t, k, n) settings on computing cost with a 5 MB text file, as shown in Fig. 3. Here, the most compute-intensive algorithm is Share. The computing cost of

Authorized licensed use limited to: Maharaja Institute of Tech. Downloaded on February 25,2025 at 10:16:03 UTC from IEEE Xplore. Restrictions apply.
LI et al.: BLOCKCHAIN-BASED DECENTRALIZED CLOUD STORAGE WITH RELIABLE DEDUPLICATION AND STORAGE BALANCING 3301

Fig. 3. Computing cost with different t-k-n.

Fig. 4. Blow-up with different k and n.

Fig. 5. Availability with different k and n.

Fig. 6. Comparisons in computing cost of encoding and decoding.

Share increases as the value of t/n decreases. This is because additional operations for data splitting are required in this configuration. The computing costs of the procedures Recover and tag(·) are not sensitive to the change of parameters, because they are only influenced by the number of data chunks required in Recover.

As to the SeShare scheme, we mainly consider the computing cost of the convergent encryption procedures KeyGen_CE(·) and Encrypt_CE(·) and the signature algorithm. The comparison among [9], [31], and our scheme is illustrated in Fig. 6, with plaintext sizes of 5, 10, . . ., 100 MB. The overhead complexity of the signature algorithm in SeShare is cExp + (c − 1)Mul + cMul, where c is the number of verified chunks. The overhead complexity of RSSS is cHash + rAdd, plus about 7 XOR operations in GF(2^8), as analyzed in [47]. Besides, the overhead complexity of the PoW algorithm is cHash, and the overhead complexity of the verification algorithm for PoW is c(Hash + Pair). The Exp operation spends more computing cost than the hash, addition, and XOR operations. Thus, our scheme is always better than SeShare in terms of computing cost under the predetermined parameters. Fig. 6 shows that our method has performance similar to the baseline [9] in terms of computing cost. This is because the proposed scheme is based on the baseline and uses lightweight

3302 IEEE TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING, VOL. 11, NO. 4, JULY/AUGUST 2024

algorithms to improve the performance. Specifically, the proposed scheme utilizes convergent encryption and a hash function to protect the outsourced data from collusion attacks while supporting the identity verification algorithm. Both the convergent encryption scheme and the hash function are microsecond-level methods, so the costs of our method and the baseline are close, in contrast to the second-level computational consumption of SeShare. The difference in performance between our method and the baseline remains at the microsecond level as the data size increases.
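To make the microsecond-level cost of these primitives concrete, the deterministic key and tag derivation of convergent encryption can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's full construction: the AES-256-CBC encryption step and the RSSS encoding are omitted, and the tag function shown here is a simple hash-of-key stand-in.

```python
import hashlib

def convergent_key(data: bytes) -> bytes:
    # Convergent encryption derives the key deterministically from the
    # plaintext itself, so identical files always yield identical keys.
    return hashlib.sha256(data).digest()  # 32-byte key, as in the paper

def dedup_tag(key: bytes) -> bytes:
    # A tag derived from the key (here simply another SHA-256 hash) lets
    # the system detect duplicates without ever seeing the plaintext.
    return hashlib.sha256(key).hexdigest().encode()

# Two uploads of the same file produce the same tag, so the ciphertext
# is stored only once; a different file yields a different tag.
a = dedup_tag(convergent_key(b"report-2024 contents"))
b = dedup_tag(convergent_key(b"report-2024 contents"))
c = dedup_tag(convergent_key(b"other file"))
assert a == b and a != c
```

Because both steps are single hash invocations, their cost is dominated by the data size, which matches the observation that the proposed scheme stays within microseconds of the baseline.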

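The operation counts given above can also be turned into a rough cost model. The sketch below is illustrative only: the per-operation unit costs are hypothetical placeholders, not measured timings; c and r follow the notation in the text.

```python
# Hypothetical per-operation unit costs (placeholders, not measurements):
# exponentiation and pairing are orders of magnitude costlier than hashing.
COST = {"Exp": 1000.0, "Mul": 100.0, "Pair": 2000.0, "Hash": 1.0, "Add": 0.1}

def seshare_signature_cost(c: int) -> float:
    # SeShare signature overhead: cExp + (c - 1)Mul + cMul,
    # where c is the number of verified chunks.
    return c * COST["Exp"] + (c - 1) * COST["Mul"] + c * COST["Mul"]

def our_scheme_cost(c: int, r: int) -> float:
    # RSSS overhead cHash + rAdd (the ~7 XOR ops in GF(2^8) are
    # negligible), plus the PoW term cHash.
    return c * COST["Hash"] + r * COST["Add"] + c * COST["Hash"]

# For any chunk count, the hash-based pipeline stays far below SeShare,
# mirroring the second-level vs. microsecond-level gap reported above.
assert our_scheme_cost(100, 10) < seshare_signature_cost(100)
```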
B. Performance of Heuristic Matching Algorithm


Fig. 7. Balance indexes under different numbers of cloud servers.

In this section, we first discuss the computing complexity of
the heuristic matching algorithm. Then the comparison between our method and the other two baselines is conducted under the assumptions in Section VI-A. The algorithms are implemented in Java, and the runtime environment is JDK 1.8.
1) Computing Complexity: The number of edges in the complete bipartite graph G(t) is m · n. The time complexities of the functions GetChunk and GetServer are O(m) and O(n²), respectively. Since we make m = n by adding virtual nodes, the total complexity of the heuristic matching algorithm is bounded by O(n³), which means our algorithm terminates in polynomial time.
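One allocation round of the matching strategy described above can be sketched as follows. This is a simplified, hypothetical rendering, not the paper's exact GetChunk/GetServer procedures: each arriving chunk is placed on the server that minimizes the variance of the servers' remaining storage space, subject to the cap of r chunks per server; scanning all n candidates with an O(n) variance check per candidate gives the O(n²) per-round behavior noted above.

```python
import statistics

def assign_chunk(chunk_size, remaining, counts, r):
    """Return the index of the server that minimizes the variance of
    remaining storage space after placing the chunk, skipping servers
    that are full or already hold r chunks; None if no server fits."""
    best, best_var = None, None
    for j, space in enumerate(remaining):
        if counts[j] >= r or space < chunk_size:
            continue  # enforce the r-chunk cap and the capacity check
        trial = list(remaining)
        trial[j] -= chunk_size
        var = statistics.pvariance(trial)  # O(n) per candidate -> O(n^2) round
        if best_var is None or var < best_var:
            best, best_var = j, var
    return best

remaining = [800.0, 500.0, 300.0]
counts = [0, 0, 0]
j = assign_chunk(200, remaining, counts, r=2)
# The largest server absorbs the chunk, keeping remaining spaces closest.
assert j == 0
```

Repeating this per arriving chunk (and updating `remaining` and `counts` after each placement) reproduces the round-by-round variance minimization that drives the storage balancing below.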
Fig. 8. Balance Indexes with different α.

2) Storage Balancing: The random decision strategy and the FIFO strategy are chosen as the baselines for the proposed heuristic matching algorithm. Let the number of cloud servers be 10, 20, 30, 40, and 50. Let the size of each arriving data chunk be a random number from 0 to 500, and the available storage space of each server be a random number from 0 to 1000. η_j is chosen from 0 to 0.0001. Here we use the Balance Index to measure the performance of storage balance. The Balance Index is calculated by

    α Σ_{i=1,j=1}^{m,n} s_i^{(r)(t)} η_j + (1 − α) F(X^{(t)}),   (8)

where α is used to adjust the Balance Index between data availability and storage balance. Let Θ = Σ_{i=1,j=1}^{m,n} s_i^{(r)(t)} η_j; Θ denotes the size of the failed data chunks. Let Π = F(X^{(t)}); Π denotes the variance of the usage ratio. The availability of outsourced data increases as Θ decreases, because the reliability of outsourced data increases as the failure of data chunks decreases. Obviously, the storage balance improves as Π decreases. Thus, the lower the value of the Balance Index, the better the performance of the matching result. To clearly illustrate the results, we also normalize the data using a mean normalization algorithm. Let α = 0.5; the experimental results are shown in Fig. 7. Our proposed algorithm has the best performance on the Balance Index, while the random decision strategy has the worst. As the number of cloud servers increases, the Balance Index of the proposed algorithm increases but still remains at a low level. The Balance Index values of the three methods are unstable; this undulation of the results is caused by the randomized available storage resources. To demonstrate the influence of α, we run the algorithm with α = 0.1, 0.2, . . . , 0.9 under 10 cloud servers, as shown in Fig. 8. It is clear that the Balance Index tends to decrease as the value of α increases, which means the variance makes more impact on the Balance Index. The instability of the Balance Index in Fig. 7 demonstrates that the Balance Index can properly represent the performance of our algorithm for α > 0.4.

IX. DISCUSSIONS AND LIMITATIONS

Our proposed system can provide reliable deduplication for cloud storage. The heuristic matching algorithm distributes the outsourced data to multiple cloud servers with balanced allocation. With the help of the smart contract, the proposed schemes can be processed without a third-party arbiter. Thus, single-point attacks do not harm the system unless more than half of the participants are malicious. The distributed nature of the storage network further enhances the reliability of outsourced data.

However, there are some limitations to the proposed system. First, the stability of the proposed system relies on the security of the underlying blockchain network. A private or consortium blockchain may not have the minimum number of nodes remaining online to verify block generation. In this case, participants of the storage system can easily mount a double-spend attack by secretly generating a task and solving the task by themselves. Therefore, a reasonable consensus mechanism is required for non-public blockchains in future works to improve the adaptability of our work.


Second, when the proposed system is deployed on a public blockchain, the efficiency of the mining algorithm limits the performance of the storage service, which is unacceptable for an efficiency-intensive application. Thus, improved consensus mechanisms and a low-coupling architecture are needed in future works.

The last problem is that the definition of storage balance is debatable as more constraints are considered, such as the transmission bandwidth, the state of hardware, and the geographic location. For each specific application situation, the proposed design is expected to be modified for better performance.

X. CONCLUSION

This paper has proposed a blockchain-based storage system with secure and reliable deduplication. Blockchain techniques have been utilized to record the state of outsourced data chunks permanently and transparently to achieve trusted deduplication, without the participation of any third parties. To avoid the centralization of storage, a heuristic matching algorithm is also conducted to allocate the outsourced data in a balanced way. Security analysis has shown that the confidentiality of outsourced data and the tag consistency can be preserved in our design. Several kinds of attacks have been considered and overcome in the proposed system. Experimental results have demonstrated that the security and reliability have been successfully improved.

ACKNOWLEDGMENT

The authors would like to thank Jiaxing Li, Long Chen, and Yalan Wu for their constructive comments to improve the paper.

REFERENCES

[1] “Cisco global cloud index: Forecast and methodology, 2016–2021 white paper,” Cisco, San Jose, CA, USA, Feb. 2018.
[2] Dropbox. [Online]. Available: https://www.dropbox.com/
[3] Mozy. [Online]. Available: http://mozy.com/
[4] Google Drive. [Online]. Available: http://drive.google.com/
[5] D. Harnik, B. Pinkas, and A. Shulman-Peleg, “Side channels in cloud services: Deduplication in cloud storage,” IEEE Secur. Privacy, vol. 8, no. 6, pp. 40–47, Nov./Dec. 2010.
[6] J. R. Douceur, A. Adya, W. J. Bolosky, P. Simon, and M. Theimer, “Reclaiming space from duplicate files in a serverless distributed file system,” in Proc. IEEE 22nd Int. Conf. Distrib. Comput. Syst., 2002, pp. 617–624.
[7] T. Jiang, X. Chen, Q. Wu, J. Ma, W. Susilo, and W. Lou, “Secure and efficient cloud data deduplication with randomized tag,” IEEE Trans. Inf. Forensics Secur., vol. 12, no. 3, pp. 532–543, Mar. 2017.
[8] J. Yin, Y. Tang, S. Deng, Z. Bangpeng, and A. Zomaya, “MUSE: A multi-tiered and SLA-driven deduplication framework for cloud storage systems,” IEEE Trans. Comput., vol. 70, no. 5, pp. 759–774, May 2021.
[9] J. Li et al., “Secure distributed deduplication systems with improved reliability,” IEEE Trans. Comput., vol. 64, no. 12, pp. 3569–3579, Dec. 2015.
[10] L. Wang, B. Wang, W. Song, and Z. Zhang, “A key-sharing based secure deduplication scheme in cloud storage,” Inf. Sci., vol. 504, pp. 48–60, 2019.
[11] H. Wang, P. Shi, and Y. Zhang, “JointCloud: A cross-cloud cooperation architecture for integrated internet service customization,” in Proc. IEEE 37th Int. Conf. Distrib. Comput. Syst., 2017, pp. 1846–1855.
[12] G. Zhang, Z. Yang, H. Xie, and W. Liu, “A secure authorized deduplication scheme for cloud data based on blockchain,” Inf. Process. Manage., vol. 58, no. 3, 2021, Art. no. 102510.
[13] W. Saad, M. Bennis, and M. Chen, “A vision of 6G wireless systems: Applications, trends, technologies, and open research problems,” IEEE Netw., vol. 34, no. 3, pp. 134–142, May/Jun. 2020.
[14] J. Benet, “IPFS - content addressed, versioned, P2P file system,” 2014, arXiv:1407.3561.
[15] M. Ali, J. Nelson, R. Shea, and M. J. Freedman, “Blockstack: A global naming and storage system secured by blockchains,” in Proc. USENIX Annu. Tech. Conf., 2016, pp. 181–194.
[16] X. Liang, S. Shetty, D. Tosh, C. Kamhoua, K. Kwiat, and L. Njilla, “ProvChain: A blockchain-based data provenance architecture in cloud environment with enhanced privacy and availability,” in Proc. IEEE/ACM 17th Int. Symp. Cluster, Cloud Grid Comput., 2017, pp. 468–477.
[17] Y. Xu, C. Zhang, G. Wang, Z. Qin, and Q. Zeng, “A blockchain-enabled deduplicatable data auditing mechanism for network storage services,” IEEE Trans. Emerg. Topics Comput., vol. 9, no. 3, pp. 1421–1432, Jul.–Sep. 2021.
[18] F. Li et al., “Blockchain-based trust management in distributed Internet of Things,” Wireless Commun. Mobile Comput., vol. 2020, pp. 1–12, 2020.
[19] S. Keelveedhi, M. Bellare, and T. Ristenpart, “DupLESS: Server-aided encryption for deduplicated storage,” in Proc. 22nd USENIX Secur. Symp., 2013, pp. 179–194.
[20] L. Liu, Y. Zhang, and X. Li, “KeyD: Secure key-deduplication with identity-based broadcast encryption,” IEEE Trans. Cloud Comput., vol. 9, no. 2, pp. 670–681, Apr.–Jun. 2021.
[21] W. Ding, Z. Yan, and R. H. Deng, “Secure encrypted data deduplication with ownership proof and user revocation,” in Proc. 17th Int. Conf. Algorithms Architectures Parallel Process., 2017, pp. 297–312.
[22] H. Cui, R. H. Deng, Y. Li, and G. Wu, “Attribute-based storage supporting secure deduplication of encrypted data in cloud,” IEEE Trans. Big Data, vol. 5, no. 3, pp. 330–342, Sep. 2019.
[23] K. He, J. Chen, R. Du, Q. Wu, G. Xue, and X. Zhang, “DeyPoS: Deduplicatable dynamic proof of storage for multi-user environments,” IEEE Trans. Comput., vol. 65, no. 12, pp. 3631–3645, Dec. 2016.
[24] S. Wu, B. Mao, H. Jiang, H. Luan, and J. Zhou, “PFP: Improving the reliability of deduplication-based storage systems with per-file parity,” IEEE Trans. Parallel Distrib. Syst., vol. 30, no. 9, pp. 2117–2129, Sep. 2019.
[25] S.-T. Shen, H.-Y. Lin, and W.-G. Tzeng, “An effective integrity check scheme for secure erasure code-based storage systems,” IEEE Trans. Rel., vol. 64, no. 3, pp. 840–851, Sep. 2015.
[26] M. Li, C. Qin, and P. P. Lee, “CDStore: Toward reliable, secure, and cost-efficient cloud storage via convergent dispersal,” in Proc. USENIX Annu. Tech. Conf., 2015, pp. 111–124.
[27] Storj, 2016. [Online]. Available: https://storj.io/storj.pdf
[28] Sia, 2014. [Online]. Available: http://www.sia.tech/
[29] J. Li, J. Wu, and L. Chen, “Block-secure: Blockchain based scheme for secure P2P cloud storage,” Inf. Sci., vol. 465, pp. 219–231, 2018.
[30] P. Singh, N. Agarwal, and B. Raman, “Secure data deduplication using secret sharing schemes over cloud,” Future Gener. Comput. Syst., vol. 88, pp. 156–167, 2018.
[31] L. Huang, G. Zhang, S. Yu, A. Fu, and J. Yearwood, “SeShare: Secure cloud data sharing based on blockchain and public auditing,” Concurrency Comput.: Pract. Experience, vol. 31, 2017, Art. no. e4359.
[32] H. Yuan, X. Chen, J. Wang, J. Yuan, H. Yan, and W. Susilo, “Blockchain-based public auditing and secure deduplication with fair arbitration,” Inf. Sci., vol. 541, pp. 409–425, 2020.
[33] J. Li, J. Wu, L. Chen, and J. Li, “Blockchain-based secure and reliable distributed deduplication scheme,” in Proc. Int. Conf. Algorithms Architectures Parallel Process., 2018, pp. 393–405.
[34] S. Liu, J. Yu, Y. Xiao, Z. Wan, S. Wang, and B. Yan, “BC-SABE: Blockchain-aided searchable attribute-based encryption for cloud-IoT,” IEEE Internet Things J., vol. 7, no. 9, pp. 7851–7867, Sep. 2020.
[35] Y. Wang, J. Yu, B. Yan, G. Wang, and Z. Shan, “BSV-PAGS: Blockchain-based special vehicles priority access guarantee scheme,” Comput. Commun., vol. 161, pp. 28–40, 2020.
[36] V. J. Sosa-Sosa, A. Barron, J. L. Gonzalez, J. Carretero, and I. Lopez-Arevalo, “Improving performance and capacity utilization in cloud storage for content delivery and sharing services,” IEEE Trans. Cloud Comput., vol. 10, no. 1, pp. 439–450, Jan.–Mar. 2020.
[37] H. Yin et al., “A blockchain-based storage system with financial incentives for load-balancing,” IEEE Trans. Netw. Sci. Eng., vol. 8, no. 2, pp. 1178–1188, Apr.–Jun. 2021.
[38] T. Liu, J. Wu, J. Li, J. Li, and Z. Zhang, “Efficient algorithms for storage load balancing of outsourced data in blockchain network,” Comput. J., vol. 65, no. 6, pp. 1512–1526, 2021.
[39] S. Nakamoto, “Bitcoin: A peer-to-peer electronic cash system,” white paper, 2008. [Online]. Available: https://bitcoin.org/bitcoin.pdf


[40] G. Wood, “Ethereum: A secure decentralised generalised transaction ledger,” 2014. [Online]. Available: https://gavwood.com/paper.pdf
[41] I. Weber, X. Xu, R. Riveret, G. Governatori, A. Ponomarev, and J. Mendling, “Untrusted business process monitoring and execution using blockchain,” in Proc. Int. Conf. Bus. Process Manage., 2016, pp. 329–347.
[42] E. Androulaki et al., “Hyperledger fabric: A distributed operating system for permissioned blockchains,” in Proc. 13th EuroSys Conf., 2018, pp. 1–15.
[43] J. S. Plank, S. Simmerman, and C. D. Schuman, “Jerasure: A library in C/C++ facilitating erasure coding for storage applications - version 1.2,” Univ. Tennessee, Tech. Rep. CS-08-627, 2008.
[44] Z. Yan, W. Ding, X. Yu, H. Zhu, and R. H. Deng, “Deduplication on encrypted big data in cloud,” IEEE Trans. Big Data, vol. 2, no. 2, pp. 138–150, Jun. 2016.
[45] Y. Zhang, C. Xu, X. Lin, and X. S. Shen, “Blockchain-based public integrity verification for cloud storage against procrastinating auditors,” IEEE Trans. Cloud Comput., vol. 9, no. 3, pp. 923–937, Jul.–Sep. 2021.
[46] B. Lynn, “PBC library - pairing-based cryptography,” 2007. [Online]. Available: http://crypto.stanford.edu/pbc
[47] J. S. Plank and L. Xu, “Optimizing Cauchy Reed-Solomon codes for fault-tolerant network storage applications,” in Proc. IEEE Int. Symp. Netw. Comput. Appl., 2006, pp. 173–180.

Jingyi Li received the B.Sc. degree from Central China Normal University, Wuhan, China, in 2017. He is currently working toward the master's degree with the School of Computer Science and Technology, Guangdong University of Technology, Guangzhou, China, and has been recommended to join the Ph.D. program with the School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China. His research interests include blockchain, cybersecurity, and machine learning.

Yidong Li (Senior Member, IEEE) received the B.Eng. degree in electrical and electronic engineering from Beijing Jiaotong University, Beijing, China, in 2003, and the M.Sc. and Ph.D. degrees in computer science from The University of Adelaide, Adelaide, SA, Australia, in 2006 and 2010, respectively. He is currently a Professor with the School of Computer and Information Technology, Beijing Jiaotong University. He has authored or coauthored more than 100 research papers in various journals and refereed conferences. His research interests include Big Data analysis, privacy preserving and information security, and intelligent transportation. He has organized several international conferences and workshops and was also a program committee member for several major international conferences.

Jigang Wu (Member, IEEE) received the B.Sc. degree from Lanzhou University, Lanzhou, China, in 1983, and the Doctoral degree from the University of Science and Technology of China, Hefei, China, in 2000. From 2000 to 2010, he was a Research Fellow with the Center for High Performance Embedded Systems, Nanyang Technological University, Singapore. From 2010 to 2015, he was the Dean and a Tianjin Distinguished Professor with the School of Computer Science and Software, Tianjin Polytechnic University, Tianjin, China. He is currently a Distinguished Professor with the School of Computer Science and Technology, Guangdong University of Technology, Guangzhou, China. He has authored or coauthored more than 300 papers in IEEE TRANSACTIONS ON COMPUTERS, IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION, IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, TSMC, JPDC, PARCO, JSA, and international conferences. His research interests include mobile computing, data science, and high performance computing. He serves the China Computer Federation as a Technical Committee Member in the branch committees on High Performance Computing and Theoretical Computer Science.

Zikai Zhang (Member, IEEE) received the B.S. degree from Tianjin Polytechnic University, Tianjin, China, in 2014. He is currently working toward the Ph.D. degree with the School of Electronic and Information Engineering and the State Key Lab of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, China. His main research interests include Big Data analysis, security and intelligent transportation, and deep learning.

Yi Jin received the Ph.D. degree in signal and information processing from the Institute of Information Science, Beijing Jiaotong University, Beijing, China, in 2010. She is currently an Associate Professor with the School of Computer Science and Information Technology, Beijing Jiaotong University. From 2013 to 2014, she was a Visiting Scholar with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore. Her research interests include computer vision, pattern recognition, image processing, and machine learning.

