
Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.

Digital Object Identifier 10.1109/ACCESS.2017.DOI

DropStore: A Secure Backup System Using Multi-Cloud and Fog Computing
REDA MAHER1, AND OMAR A. NASR1
1Department of Electronics and Electrical Communication Engineering, Cairo University, Giza, Egypt

Corresponding author: Reda Maher (e-mail: eng.redamaher@gmail.com).

ABSTRACT Data backup is essential for disaster recovery. Current cloud-based solutions offer a secure infrastructure; however, there is no guarantee of data privacy while hosting the data on a single cloud. An alternative is Multi-Cloud technology. Although using multiple clouds to store smaller pieces of data can enhance privacy, it comes at the cost of the edge device having to manage different accounts and communicate with different clouds. These drawbacks have kept the technology rarely used. This paper proposes DropStore to provide an easy-to-use, highly secure, and reliable backup system using modern Multi-Cloud and encryption techniques. DropStore adds an abstraction layer for the end-user that hides all system complexities behind a locally hosted device, "the Droplet," fully managed by the user. Hence, the user does not rely on any untrusted third party. This is accomplished using Fog Computing technology. The uniqueness of DropStore comes from the convergence of Multi-Cloud and Fog Computing principles. Simulation results show that the proposed system improves data protection in terms of reliability, security, and privacy preservation while maintaining a simple and easy interface with edge devices.

INDEX TERMS Multi-Cloud, Fog Computing, Data Reliability, Disaster Recovery, User Privacy.

I. INTRODUCTION

Digital storage is rapidly embracing networked computing. However, digital data face many threats, such as operation errors, security attacks, and hardware failures. Data backup guards against these threats, and cloud backup systems are commonly used to add protection and disaster recovery.

Cloud computing [1] technology has enabled users to perform fully remote computing. Millions of people use different types of cloud services, directly or indirectly, and it has become a big challenge to ensure the protection of their data. Many cloud service providers worldwide are available at low cost, and some provide free services. They all deliver different services but are not identical in their system settings, privacy policies, rules, and regulations. Therefore, they do not enforce any uniform policy that guarantees protection and preserves the privacy of the data owner. For these reasons, many researchers adopted the concept of Multi-Cloud [2] to increase the data protection level.

Multi-Cloud is a heterogeneous architecture using various cloud computing and storage facilities, which can come from a public cloud, a private cloud, or a standalone cloud-like on-premise facility. In a Multi-Cloud architecture, users are aware of the multiple clouds and are responsible for their resource management and services, or a third party is responsible for handling them. There are various reasons to adopt a Multi-Cloud architecture, including reducing dependency on any single provider, cost efficiency, flexibility in choice, and disaster immunity.

Many useful applications build on the Multi-Cloud concept, including data storage applications. Depending on the system architecture, there are many advantages of using the Multi-Cloud model for data storage and backup. The most prominent include:
1) Increasing data protection: Due to the isolation of the data between different providers, a violation in one of them only affects a small amount of data, which allows simple isolation of attacks.
2) Increasing flexibility: The use of storage facilities from various providers helps to prevent provider lock-in and improves data reliability through replication.
3) Cost optimization: The ability to combine different storage facilities helps to tailor the cost and the choices.
Although Multi-Cloud storage offers a wide range of benefits, maintaining, preserving, and distributing data in a unified manner remains challenging.
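One common way to cope with this heterogeneity is to hide each provider behind a single storage interface on the user's side. A minimal sketch follows; the class and method names are illustrative and not taken from any real provider SDK:

```python
from abc import ABC, abstractmethod

class CloudStore(ABC):
    """Provider-agnostic view of one cloud storage account (illustrative)."""

    @abstractmethod
    def upload(self, name: str, data: bytes) -> None:
        """Store a named blob on this provider."""

    @abstractmethod
    def download(self, name: str) -> bytes:
        """Fetch a named blob back from this provider."""

class InMemoryStore(CloudStore):
    """Stand-in backend used here in place of a real provider SDK."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def upload(self, name: str, data: bytes) -> None:
        self._blobs[name] = data

    def download(self, name: str) -> bytes:
        return self._blobs[name]
```

Code that speaks only to `CloudStore` can then mix accounts from different providers without caring about their individual APIs.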
VOLUME 4, 2016 1
Maher et al.: DropStore: A Secure Backup System Using Multi-Cloud and Fog Computing

There are some common challenges, including:
1) Different APIs: Various providers build different API frameworks and use different programming languages.
2) Compatibility problems: Storage systems need to be consistent across the clouds to fit seamlessly into a single environment. Therefore, systems have to follow the same data structures and allow the same resources to be incorporated.
3) Complex management: also called centralized management and service aggregation, such as identity and access controls. In addition, normal users typically need simplified operations to back up and protect their data.

To overcome many of the Multi-Cloud problems, DropStore employs Fog Computing [3]–[5]. The Fog Computing concept was initially established to minimize the data access latency between the cloud and the end devices. Fog Computing provides data processing and networking facilities at the network edge. The idea is to install dedicated servers located geographically at the edge of the network in micro/nano data centers close to the end-users. Cloud Computing then provides centralized resources in the network core, while Fog Computing offers distributed services and resources near or at the network edge. The Fog Computing architecture provides short-latency services, location awareness, quick response time, and real-time interactions.

The centralized nature of cloud computing cannot meet the demands of the increasing number of internet-connected devices [6]. Insisting on using cloud computing alone will lead to network congestion, low service quality, and high latency. Moreover, some applications demanding real-time responses will not be able to work correctly. Adopting Fog Computing enables broadly distributed applications and services. It encourages innovation in position-aware services and real-time applications that need quick responses, and it supports the mobility of edge devices. In addition, Fog Computing optimizes energy usage, reduces network congestion, facilitates service delivery, and optimizes infrastructure spending.

Fog nodes can be typical network elements such as routers or middle-end servers geographically positioned near the end-users. These nodes can execute applications and store data to provide the required services and enhance the user experience. They are connected to the cloud core through high-speed connections and can be considered the cloud's arms, while the brain is in the center of the network. Fog nodes are responsible for processing the local information, which reduces traffic across the network. For high-level processing, the cloud receives data after it is processed initially by the fog nodes. For example, in smart cars and cities, the cloud makes the future planning decisions: it has the big picture based on the data collected by the fog nodes, whereas the fog nodes process the real-time interactions locally [7], [8].

This paper introduces DropStore, a new data backup system based on Multi-Cloud and Fog Computing. This system utilizes the advantages of Multi-Cloud storage to ensure users' data protection and reliability. At the same time, it overcomes the problems of Multi-Cloud using the Fog Computing paradigm. System users can easily and securely back up, restore, and modify their data without caring about the sophisticated operations that protect and secure the data on Multi-Cloud storage.

The proposed system has many advantages over the existing systems. The following points summarize the most important benefits:
1) It is the first system that combines the advantages of the Multi-Cloud and Fog Computing paradigms.
2) It provides high-speed backup and a better user experience.
3) It does not depend on untrusted third parties for security management.

The remainder of this paper is organized as follows: Section II gives a brief overview of the previous work in Multi-Cloud backup systems and the research efforts in Fog Computing based storage. Section III describes the DropStore system architecture. Section IV evaluates the system performance, and Section V concludes and discusses some future improvements.

II. RELATED WORK
A. DATA BACKUP ON MULTI-CLOUD
Multi-Cloud storage has gained significant interest in recent years because it offers high availability, solid security, and prevention of service provider lockouts. For example, Zaman et al. [9] designed a distributed Multi-Cloud storage system that uses hybrid encryption to secure data. The user data are encrypted offline, then divided into chunks and distributed to multiple cloud servers. The solution deployment depends on a third-party cloud service provider, which keeps track of the chunk sequence and addresses. It also needs a separate key management server to take care of the encryption keys. The system does not implement any redundancy technique to ensure data reliability, and no explicit versioning is used to reduce the storage needs. In addition, the third-party cloud service provider that deploys the system is a vulnerable bottleneck and a single point of failure.

Singh et al. [10] proposed a secure data deduplication technique using secret sharing schemes. The data are sliced based on the Permutation Ordered Binary (POB) numbering system [11] and stored on multiple cloud servers. The key information is divided into various random shares based on the Chinese Remainder Theorem (CRT) [12], [13] and saved to multiple servers. Whereas the key can be restored from any k servers out of n, where k is less than n, the data can be restored only if all of the shares are available. Therefore, this system will not survive cloud service provider lockouts.

Triviback [14] is a chunking-based backup system that minimizes storage using the sec-cs data structure [15] for deduplication of flat contents. It offers Multi-Cloud storage for the generated backups. Whereas the storage usage is efficient, this comes at the expense of data reliability and immunity against lockouts.
TrustyDrive [16] is a document storage system on multiple cloud providers. It tries to preserve user and document anonymity. Although the focus was on saving and securing document files only, the system does not provide an interactive or easy way to share and view the saved documents.

ExpanStor [17] is another Multi-Cloud storage system with dynamic data distribution. It applies a Client-Server architecture instead of a pure client-based implementation. To add redundancy and security, it uses Low-Density Parity-Check (LDPC) [18] codes. Also, Subramanian et al. [19] proposed another storage framework using a dynamic data slicing technique based on dynamic index cryptographic data slicing. The common disadvantage of these systems is the complex operations on the edge side.

B. DATA BACKUP ON FOG NETWORKS
Data backup on Fog networks is not a popular concept yet. Moysiadis et al. [20] classified the Fog Computing data storage service models into cloud offloading, data aggregation on behalf of the cloud, and Peer-to-Peer collaboration to provide abstracted storage as a service to the edge devices. Implementing a distributed storage system on Fog networks requires fault tolerance, handling of different data types, scalability, low bandwidth consumption, low latency, energy efficiency, security, and privacy preservation.

The research efforts in data storage on Fog networks mainly follow four directions. The first direction is creating new algorithms for better data handling in Fog networks. For example, Gao et al. [21] proposed a hybrid data dissemination technique to economically use the Fog-Cloud bandwidth with guaranteed download performance for users. Zhang et al. [22] proposed an identity-based Fog data storage scheme with anonymous key generation to enhance security.

The second research direction is the performance analysis of existing systems used in Fog networks. Confais et al. [23] evaluated three "off-the-shelf" object-store solutions, namely Rados, Cassandra, and the InterPlanetary File System (IPFS), on Fog networks in terms of access times and network traffic.

The third direction is enhancing existing data distribution systems to fit the topology of Fog networks. Confais et al. [24] extended IPFS with a Scale-out Network Attached Storage (NAS) system to reduce the inter-site exchanges.

The last research direction in data storage on Fog networks introduces new systems designed to work specifically on Fog networks. ElfStore [25] and FogStore [26] are two examples in this direction. ElfStore is a unified edge-local distributed storage service over unreliable edge devices, with Fog devices managing the operations using a super-peer overlay network. It uses Bloom Filters for indexing and a differential replication scheme to achieve reliability. FogStore, on the other hand, manages replica placement and consistency in Fog networks for stateful applications and Virtualized Network Functions (VNFs).

III. DROPSTORE ARCHITECTURE
DropStore provides a unique data backup system architecture. This uniqueness comes from combining the advantages of Fog Networking and Multi-Cloud. Whereas Multi-Cloud methods provide a highly reliable and secure storage environment, Fog Computing offers better throughput and lower latency for the backup process.

FIGURE 1. Droplet in the network architecture.

The enabler of this unique architecture is the Droplet fog device [27]. The Droplet is a personal Fog node. It is controlled by the end-user, like the rest of the personal devices such as smartphones and laptops. The Droplet can be considered a private Fog device; in a way, it is similar to the Private Cloud concept, where organizations deploy and manage their own cloud infrastructure. Fig. 1 shows where the Droplet is located in the network architecture. Being a personal device, the Droplet enables many applications to use the Fog Computing paradigm. DropStore benefits from this advantage to provide a better backup experience and a unique system that outperforms the existing systems.

In this section, we describe the DropStore system components. Then we discuss the details of the User-Fog and the Fog-Cloud interactions. Finally, we illustrate the data retrieval procedure.

A. DROPSTORE SYSTEM COMPONENTS
Fig. 2 shows the system components. The role of each element is described below:
Edge nodes: These are the end-user devices that need to back up and secure their data. These edge nodes can be users' mobile phones, laptops, IP cameras, etc. They require a secure and fast backup interface


with the ability to read, modify, and delete the stored data at any time. The privacy of each edge node should be protected so that no node can access the other nodes' data on the system. These requirements should be achieved without complicated operations at the edge nodes.
Droplet: It represents the Fog layer but with some added advantages because it is a personal device. For example, it is fully controlled by the user. Hence it acts as a fully trusted local backup server.
Public Cloud: The data are collected from the edge devices on the Droplet and periodically backed up to several public cloud servers. Backing up the data to the cloud servers offers disaster recovery and increases reliability. The data are encrypted and divided into multiple chunks before being stored in the public cloud. This method prevents any malicious cloud service provider from utilizing the user data or compromising the user's privacy.
DropStore System Software: This is the backup system that runs on the Droplet. DropStore offers a safe but straightforward backup interface to the edge nodes while fully protecting the data on distributed Multi-Cloud storage. DropStore is responsible for all the complicated operations that keep the user data secure and reliable, so the edge nodes do not need to pay much attention to these operations. In other words, DropStore offloads the processing required by the edge devices to the Fog nodes. This type of processing offload allows the use of modern backup techniques without being limited by the low resources of the edge nodes. On the one hand, the edge nodes are usually battery-operated and in low power consumption mode in general. On the other hand, the fog nodes are connected to a power source; hence, there are fewer restrictions on their activity time.

FIGURE 2. DropStore System Components.

By using the DropStore system architecture, the system can solve several problems related to backup. DropStore improves the user experience in terms of backup speed and facility, retrieval, and online modifications. It adopts state-of-the-art security and data obfuscation strategies to store the user data on public Multi-Cloud storage. In addition, DropStore helps in reducing the congestion in the user's home network by utilizing the low-traffic periods to upload data to the cloud. This is important, especially in areas that have low upload speeds. In a disaster, DropStore restores the user data and the whole system easily and quickly.

B. USER-DROPLET SIDE
The DropStore architecture is designed in a way that guarantees high speed, security, and privacy:

1) Speed
DropStore runs on the Droplet on the user's premises. User devices and the Droplet are in the same Local Area Network (LAN). Consequently, this results in many advantages. First, the data backup speed from the user devices to the Droplet will be very fast, up to the LAN speed, compared to direct data upload to the cloud. Second, the backup service is available even if there is no internet access.

2) Security
The Droplet is a personal device. Thus the connected devices will be well known and belong to the Droplet owner. The security threats are more controllable, and the edge devices are under user control. DropStore restricts Droplet access to the specified users only through authentication, which allows better management and reduces security threats.
Edge devices can upload, download, or modify their data interactively and at high speed. End-to-end encryption and authentication are used between the edge devices and the Droplet to protect the edge devices from password sniffing, Man-In-The-Middle, or any other attacks mounted through access to the user's local area network.

3) Privacy
Isolating the edge devices' data on the Droplet maintains privacy: no edge device can access the data of the other devices. Therefore DropStore can be used by all users on the same LAN without privacy concerns.
DropStore can be offered as a product by network service providers, cloud service providers, or any third-party company, which introduces a new business opportunity. The storage capacity of the Droplet can vary according to the user's needs, the number of expected edge devices, and the nature of the data.

C. DROPLET-CLOUD SIDE
The Droplet controls the edge devices' data. DropStore backs up these data periodically, without user intervention, to Multi-Cloud storage using a secure scheme.

FIGURE 3. DropStore Backup Stages.

Fig. 3 shows the backup stages from the Droplet to the Multi-Cloud storage. Each stage is described in the following:

1) Data Delta Calculation: DropStore aims to save network bandwidth and storage. Hence, it adopts a versioning scheme to back up the data. At each periodic backup, DropStore calculates the delta from the last backup instant and only stores this new delta. In the case of the first backup, all data will be in the delta. The system generates the mandatory metadata to enable the backup chain to be reconstructed from the incremental backups and perform data recovery. These metadata are also necessary to calculate the delta between a backup instant and the previous backup instant.
2) Data Encryption and Compression: Data are no longer under user control and visibility when uploaded to the public cloud. They can be used without full authority from the user, especially if the service providers are not trusted. DropStore solves this problem by using encryption and data partitioning. The data delta coming from the first stage is encrypted using the OpenPGP [28] scheme. The owner of the Droplet provides the key pair used for data encryption in backup scenarios and data decryption in retrieval scenarios. After the delta encryption, DropStore compresses the encrypted delta to reduce the storage needs on the remote cloud servers.
3) Data Partitioning: As described in stage 2, DropStore obfuscates the user data by encryption and partitioning. The encrypted data from stage 2 is split into several chunks. The chunk size can be configured in DropStore. Data partitioning means that no Cloud Service Provider (CSP) has the complete information, so it is hard for CSPs to learn anything from the encrypted chunks.
4) Data Redundancy: Each chunk generated in the previous stage is replicated to offer data reliability against cloud account lockdown or CSP unavailability. The number of replicas is configurable in DropStore to provide high versatility. Storage needs increase proportionally with the replica count. If all CSPs are highly reliable, the replica count can be configured to 1 (no replicas). This case is similar to Level 0 of a Redundant Array of Inexpensive Disks (RAID-0) [29], where each CSP has a unique chunk; restoring all the chunks from all the CSPs retrieves the data. In the opposite scenario, if all CSPs are unreliable, the number of replicas can be set to the number of CSPs. This case is equivalent to the RAID-1 level [29], where each CSP has all chunks, and the data can be recovered from any surviving CSP. In general, the number of replicas ranges from 1 up to the number of CSPs configured in DropStore. Whereas raising the replica count improves data reliability and immunity against lockdowns by CSPs, it increases the storage needs and requires more network bandwidth. In addition, a high replica count weakens the security level, as each CSP holds a relatively large portion of the data (all the data in the case where the replica count equals the number of CSP accounts). Choosing the replica count is therefore a compromise between data reliability, storage needs, network bandwidth, and security. Although we propose using simple replication to ensure data reliability, other techniques (such as channel coding) can be used to detect or correct errors.
5) Data Upload: This is the final stage of the backup scenario in DropStore. The data chunks are uploaded to the cloud servers. DropStore authenticates itself with each CSP automatically and uploads the user data. DropStore adopts a round-robin scheme to preserve the balance of storage usage on each CSP account. Metadata are distributed and replicated in the same manner as the user data. This scheme keeps the storage balanced across all the CSPs over the long term and avoids overwhelming a specific CSP.

D. DATA RETRIEVAL
Edge devices can restore their data at any time as long as they have access to the DropStore. Even if there is no internet connection, edge devices can download the data from the DropStore in case of any data loss incident at the edge devices. Data retrieval only requires the edge device to authenticate with the DropStore using its username and password. After this authentication, the edge device has immediate and full access to its previously backed-up data.
In the event of a disaster in the Fog layer (the Droplet), DropStore can perform data retrieval and recovery conveniently. Disaster events in the Fog layer include Droplet software failures, hardware failures, hard drive crashes, or other natural disasters. To recover the system, the Droplet owner only needs to provide the master key and the credentials of the used CSPs after bringing up a new Droplet. Then, DropStore searches for the metadata on all the configured CSPs and constructs the backup chain. After that, the data can be downloaded, decrypted, and reconstructed automatically. Finally, the whole system is recovered, and the edge devices can access their data as usual.

E. COMPARISON WITH MULTI-CLOUD TECHNIQUES
Comparing DropStore to the existing Multi-Cloud data storage and backup techniques shows its advantages. Table 1 summarizes this comparison between DropStore and some of the Multi-Cloud techniques mentioned in Section II. DropStore outperforms the existing Multi-Cloud storage techniques in multiple ways. In DropStore, the processing complexity is moved to the Droplet rather than the edge devices. This enables low-end edge devices to store and back up their data using the most sophisticated and advanced methods on Multi-Cloud storage.
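Stages 3 to 5 above (partitioning, replication, and round-robin placement) can be sketched as follows. This is an illustrative simplification, not DropStore's actual implementation; the function names, chunk size, and CSP identifiers are hypothetical:

```python
def partition(encrypted_delta: bytes, chunk_size: int) -> list[bytes]:
    """Stage 3: split the encrypted, compressed delta into fixed-size chunks
    so that no single CSP ever holds the complete information."""
    return [encrypted_delta[i:i + chunk_size]
            for i in range(0, len(encrypted_delta), chunk_size)]

def place_round_robin(chunks: list[bytes], replicas: int,
                      csps: list[str]) -> dict[str, list[tuple[int, bytes]]]:
    """Stages 4-5: replicate each chunk `replicas` times and assign the copies
    to CSP accounts in round-robin order, keeping storage usage balanced.
    With replicas == 1 this behaves like RAID-0; with replicas == len(csps),
    every CSP holds every chunk, as in RAID-1."""
    placement: dict[str, list[tuple[int, bytes]]] = {csp: [] for csp in csps}
    slot = 0
    for index, chunk in enumerate(chunks):
        for _ in range(replicas):
            placement[csps[slot % len(csps)]].append((index, chunk))
            slot += 1
    return placement
```

Because consecutive slots map to consecutive CSPs, the copies of any one chunk land on distinct providers whenever the replica count does not exceed the number of CSPs, which is exactly the reliability property the replication stage needs.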
DropStore improves data availability and upload/download time. Even if there is no internet connection, the data are always accessible and available to the edge devices through the Droplet. Furthermore, upload and download times at the edge devices are not affected by the internet speed. This helps manage power consumption on the edge devices and enables backup and storage services in countries with slow internet connections. DropStore also does not rely on any third party for key and metadata management and does not require special support from the cloud side. This improves data security and eliminates the dependency on the cloud vendors to deploy the system.
FIGURE 4. DropStore Software Architecture.
IV. PERFORMANCE EVALUATION
In this section, we assess DropStore performance based on the results of our experiments.

A. SYSTEM IMPLEMENTATION
Fig. 4 shows the DropStore evaluation and demonstration setup. We built the User-Droplet interface using the Secure File Transfer Protocol (SFTP) [30]. SFTP provides a secure channel between the edge devices and the Droplet node through end-to-end encryption and authentication. SFTP clients are widely popular, so it is easy to find a free and open-source client suitable to run on any type of edge device. We used SFTP Jail [31] on the Droplet to prevent the users from accessing each other's data. SFTP Jail achieves a level of data isolation and enhances the privacy preservation of each edge device. In addition, it limits the control of each edge device to its own files only, so no unintended action can affect the Droplet or the whole system.

For the Droplet-Cloud interface, a modified version of Duplicity [32] is used to achieve the DropStore system requirements. Duplicity is open-source backup software that supports incremental backup, encryption, and various protocols and cloud servers. It is written in Python and requires a POSIX-like operating system. Although Duplicity has essential support for data backup on multiple servers, it does not support data redundancy in a flexible manner. Therefore, we created our own version of Duplicity that fills this gap and achieves a level of storage usage balance between the different cloud servers.

To allow easy system installation and configuration, we implemented a friendly interface for DropStore that installs all the required packages, configures the edge devices' accounts, configures the cloud accounts, and restores the old data (in case of system recovery). We published the software we developed under the Apache-2.0 license, and it is available on GitHub under https://github.com/RedaMaher/DropStore. Our modified Duplicity is available under https://github.com/RedaMaher/DropStore_duplicity.

B. SYSTEM SETUP
The system was evaluated on two different setups. The first setup uses the original Droplet implementation [27] on a Raspberry Pi 3 Model B SBC [33]. It has a 1.2GHz 64-bit quad-core ARMv8 CPU with 1GB RAM and 4 USB ports. We also extended the Droplet storage with an external 1TB hard disk to accommodate the user data. This setup will be referred to as the Original Droplet in the following experiments.

TABLE 1. Comparison between DropStore and recent Multi-Cloud storage techniques.

Versioning (incremental backup):
  Zaman et al. [9]: Not supported | Singh et al. [10]: Not supported | Triviback [14]: Supported | ExpanStor [17]: Not supported | DropStore: Supported
Encryption:
  Zaman et al. [9]: AES-256 | Singh et al. [10]: Based on the POB numbering system | Triviback [14]: Supported, based on the sec-cs data structure [15] | ExpanStor [17]: AES-256 | DropStore: OpenPGP encryption [28]
Complexity (at edge devices):
  Zaman et al. [9]: High | Singh et al. [10]: High | Triviback [14]: High | ExpanStor [17]: High | DropStore: Low (all backup operations are done on the Droplet)
Data upload time (at edge devices):
  Zaman et al. [9]: Large (upload to cloud directly) | Singh et al. [10]: Large (upload to cloud directly) | Triviback [14]: Large (upload to cloud directly) | ExpanStor [17]: Large (upload to cloud directly) | DropStore: Small (upload to the Droplet in the same LAN)
Data download time (at edge devices):
  Zaman et al. [9]: Large (download from cloud directly) | Singh et al. [10]: Large (download from cloud directly) | Triviback [14]: Large (download from cloud directly) | ExpanStor [17]: Large (download from cloud directly) | DropStore: Small (download from the Droplet in the same LAN)
Data redundancy:
  Zaman et al. [9]: Not supported | Singh et al. [10]: Not supported | Triviback [14]: Not supported | ExpanStor [17]: Supported, based on LDPC codes | DropStore: Supported through data chunk replication
Key and metadata management:
  Zaman et al. [9]: Done by a third party | Singh et al. [10]: Uses key management servers | Triviback [14]: Depends on back-end support | ExpanStor [17]: Client-server system that depends on back-end support | DropStore: No dependency on any third party

The second setup, referred to as the Enhanced Droplet, uses a more powerful personal machine with an Intel Core i7 7th-generation CPU, 8GB of RAM, and 1TB of storage.
User devices are connected to the Droplet via the
Wireless Local Area Network (WLAN). They can
upload/download data to/from the DropStore at maximum
WLAN speed without any latency due to the slowness of the
internet connection or the cloud servers. The communication
between the user devices and the DropStore is end-to-end
encrypted and isolated to maintain privacy.
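The per-device isolation enforced by SFTP Jail (Section IV-A) effectively reduces to confining each account's paths to its own directory. A simplified sketch of that check follows; the jail root `/srv/dropstore` and the user names are hypothetical, and this is an illustration rather than the actual SFTP Jail code:

```python
import posixpath

def jailed_path(user: str, requested: str,
                jail_root: str = "/srv/dropstore") -> str:
    """Resolve a user's requested path inside that user's private jail.

    Raises PermissionError if the normalized path would escape the jail,
    so one edge device can never reach another device's backups.
    """
    root = posixpath.join(jail_root, user)
    target = posixpath.normpath(posixpath.join(root, requested.lstrip("/")))
    if target != root and not target.startswith(root + "/"):
        raise PermissionError(f"{requested!r} escapes the jail of {user!r}")
    return target
```

For example, `jailed_path("phone", "photos/img.jpg")` resolves inside `/srv/dropstore/phone/`, whereas `jailed_path("phone", "../laptop/x")` is rejected because the normalized path leaves the `phone` jail.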

C. DATASETS
In our experiments, randomized datasets were used to emulate natural user data. The datasets consist of images, text files, videos, etc. Their selection was random to ensure unbiased outcomes. DropStore periodically (daily) backs up the data to the cloud servers during uncongested times in the user's home network, typically after midnight.

FIGURE 5. Total storage required vs the number of cloud servers and replica count.
size of the data user for this experiment is approximately
D. EVALUATION METRICS 800MB. The results show that the required storage
Several parameters are considered to evaluate the system's increases proportionally with higher replica counts. In other
performance. These parameters are the number of cloud words, the relationship between the necessary storage and
servers, data chunk size, replica count, network latency, user the replica count is linear. The replica count can be less
data size, and many other parameters. Network latency than or equal to the configured CSP count. So, the number
depends mainly on the speed of the internet connection to of cloud servers limits the maximum replica count.
which the Droplet is connected. So in our experiments (except Fig. 6 shows the ratio between the required total storage and
the last one), we used local servers to simplify the experiments the data user size for the same experiment. In using a replica
and eliminate the internet connection speed. count of 1, the ratio is less than 1 since DropStore compresses
the user data to minimize storage needs. For example, when
E. RESULTS data are almost text, the required storage is nearly 20% of the
Fig. 5 shows the total required storage on the cloud servers af- raw data size. This ratio increases when the data contain
ter multiple backups of the user data against different counts of images and videos. Even when replica count 5 is used, the
cloud servers and different replica counts. The overall raw required storage ratio is
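The linear relationship above can be captured in a small back-of-the-envelope model. This is only a sketch: the helper name and the constants are ours, not part of the DropStore implementation.

```python
def required_storage_mb(raw_size_mb, compression_ratio, replica_count, csp_count):
    """Estimate the total cloud storage needed for one backup.

    compression_ratio is compressed size / raw size, e.g. ~0.2 for
    mostly-text data. The replica count cannot exceed the number of
    configured cloud service providers (CSPs).
    """
    if replica_count > csp_count:
        raise ValueError("the CSP count limits the maximum replica count")
    compressed = raw_size_mb * compression_ratio
    # Total storage grows linearly with the replica count; the CSP
    # count itself changes only how the data are distributed.
    return compressed * replica_count

# ~800MB of mostly-text data with replica count 1 needs ~160MB (20%):
print(required_storage_mb(800, 0.2, 1, 5))  # 160.0
```

With a replica count of 5 the same data would need five times as much storage, matching the linear trend in Fig. 5.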

VOLUME 4, 2016 7
Maher et al.: DropStore: A Secure Backup System Using Multi-Cloud and Fog Computing

TABLE 2. Metadata size vs different chunk sizes (Source data size ~800MB).

Chunk Size (MB) | Total Metadata Size (KB) | Metadata Size / Total Storage
100 | 6673.473 | 0.96%
90 | 6673.478 | 0.96%
80 | 6673.527 | 0.96%
70 | 6673.55 | 0.96%
60 | 6673.596 | 0.96%
50 | 6673.685 | 0.96%
40 | 6673.813 | 0.96%
30 | 6674.092 | 0.96%
20 | 6674.605 | 0.96%
10 | 6675.901 | 0.96%
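The storage balance that Fig. 7 illustrates can be sketched as a round-robin placement of chunk replicas over the CSPs. This is an illustrative scheme only; DropStore's actual scheduler may differ.

```python
from collections import Counter

def place_replicas(num_chunks, replica_count, csp_count):
    """Round-robin placement: replica r of chunk c goes to CSP (c + r) % N.

    Replicas of the same chunk always land on distinct CSPs, and the
    per-CSP load stays balanced regardless of replica count and chunk size.
    """
    assert replica_count <= csp_count, "replica count limited by CSP count"
    return [[(c + r) % csp_count for r in range(replica_count)]
            for c in range(num_chunks)]

# ~800MB at a 30MB chunk size -> 27 chunks, replica count 1, 5 CSPs:
load = Counter(csp for replicas in place_replicas(27, 1, 5) for csp in replicas)
print(sorted(load.values()))  # [5, 5, 5, 6, 6] -> balanced across CSPs
```

No CSP ever holds more than one extra chunk compared to the others, which avoids flooding a particular cloud server while the rest sit idle.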

still less than 5 due to compression. Changing the number of cloud servers does not affect the required total storage.

FIGURE 6. Storage ratio vs the number of cloud servers and replica count.

DropStore maintains the storage utilization balance between the different cloud servers. Fig. 7 illustrates this with a replica count of 1 and a chunk size of 30MB. Maintaining balanced storage usage is important to avoid flooding a particular cloud server while the others remain underused. DropStore keeps this balance regardless of the configured replica count and chunk size.

FIGURE 7. Required Storage at each CSP vs CSP count.

The metadata overhead in DropStore is minimal, usually around 1% of the total required storage. The metadata size is slightly influenced by the chunk size, as seen in Table 2, while the number of CSPs used does not affect it at all.
The compression ratio affects the metadata size notably. When the compression ratio is high (the required storage is small), the metadata size is relatively larger than in the low compression ratio cases, as shown in Table 3.

FIGURE 8. Backup Time vs the number of cloud servers and replica count (Original Droplet).

Fig. 8 and Fig. 9 demonstrate the backup time of ~200 MB from the Droplet to the cloud on the Original and Enhanced Droplet setups, respectively. The average time is 150 seconds on the original setup and 20 seconds on the enhanced setup. This time covers all backup operations (calculation of the delta, encryption, partitioning, and generation of replicas). The upload time is not included, as it depends mainly on the speed of the internet connection. The backup time (without the upload time) increases slightly with higher replica counts due to the processing needed to generate the replicas. The backup time is not a critical aspect of the system, as the cloud backup process is performed offline and does not impact the user experience. This means that the original Droplet setup is still a reasonable option for the DropStore system.
The time to restore the user data after a disaster is a crucial aspect of the system. Fig. 10 shows that DropStore takes nearly 300 seconds to restore 800 MB of user data after multiple backups when running on the Original Droplet setup (Raspberry Pi 3). This time includes all the operations needed to retrieve the user data: reconstruction of the backup chain, decryption, and decompression. The download time is not included to simplify the results.
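The restore chain described above (decrypt each retrieved chunk, decompress, reassemble) can be sketched as a toy pipeline. Here zlib and a byte-wise XOR stand in for DropStore's real compression and encryption primitives, and the function names are ours.

```python
import zlib

KEY = 0x5A  # toy key; a real deployment would use proper encryption

def backup_chunk(data: bytes) -> bytes:
    """Forward pipeline for one chunk: compress, then 'encrypt' (XOR stand-in)."""
    return bytes(b ^ KEY for b in zlib.compress(data))

def restore(chunks) -> bytes:
    """Reverse pipeline: decrypt and decompress each chunk, then reassemble."""
    return b"".join(zlib.decompress(bytes(b ^ KEY for b in chunk))
                    for chunk in chunks)

# Three chunks go through the forward pipeline, then are fully recovered:
parts = [backup_chunk((b"chunk-%d " % i) * 100) for i in range(3)]
assert restore(parts).startswith(b"chunk-0 chunk-0")
```

Every forward step must be undone in reverse order during restore, which is why the restore time includes decryption and decompression on top of the download itself.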


TABLE 3. Metadata size vs Compression Ratio.

Data Type | Source Data Size (MB) | Total Storage Required (MB) | Total Metadata Size (MB) | Required Storage / Source Data Size | Metadata Size / Required Storage
Mixed | ~200 | 160.96 | 2.15 | 78.50% | 1.30%
Mostly Videos | ~5600 | 5282.75 | 33.18 | 94.30% | 0.63%
Mostly Text | ~6150 | 1405.98 | 54.35 | 22.80% | 3.87%

FIGURE 10. Restore Time vs the number of cloud servers and replica count
(Original Droplet).

FIGURE 9. Backup Time vs the number of cloud servers and replica count
(Enhanced Droplet).

When running on the Enhanced Droplet setup (the Intel Core i7 machine), the restore time decreased to about 30 seconds (~10% of the time on the Original Droplet), as shown in Fig. 11. Hence, upgrading the hardware improves both the backup and restore times, and the experiments show that the DropStore software is flexible enough to run on different hardware.

FIGURE 11. Restore Time vs the number of cloud servers and replica count (Enhanced Droplet).

The backup and restore time results show slight variance on the Enhanced Droplet setup. This variance comes from the fact that the Enhanced Droplet runs a general-purpose operating system (OS), whose timing is not accurately predictable. Using a dedicated OS distribution that contains only the packages required to run DropStore can reduce this variance. The results are more consistent on the Original Droplet setup because it runs a light OS version on a Raspberry Pi.
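The run-to-run variance discussed above can be quantified with a small harness of the following kind. This is illustrative only; it times an arbitrary operation, not DropStore itself.

```python
import statistics
import time

def measure(operation, runs=10):
    """Time an operation several times; report mean and standard deviation.

    On a general-purpose OS the spread can be noticeable; on a dedicated,
    stripped-down distribution it should shrink.
    """
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), statistics.stdev(samples)

mean_s, stdev_s = measure(lambda: sum(range(100_000)))
print(f"mean {mean_s:.5f}s, stdev {stdev_s:.5f}s")
```

Reporting the standard deviation alongside the mean makes the OS-induced jitter visible rather than hiding it in a single averaged number.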
The user-experience advantage of DropStore is most apparent when it is used in a country with low internet speeds, especially low upload speeds. Fig. 12 shows the backup time of ~260 MB from DropStore to the cloud servers. We configured DropStore with a chunk size of 10 MB, a replica count of 1, and a CSP count of 5. The experiment compares the backup time in two cases: uploading the data to local servers (the simulation scenario) and uploading the data to real cloud servers (the real scenario). The experiment was performed from Cairo, Egypt, over a home internet connection with up to 1 Mbps of upload speed. The backup time difference between the two cases is huge, as uploading data to the cloud servers takes up most of the backup process. This indicates that users would have a very bad experience if they backed up their data directly to the cloud. DropStore directly addresses this issue by performing all the cloud backup operations, including the data upload, on the edge device.

V. CONCLUSION AND FUTURE WORK
In this paper, we proposed DropStore, a unique backup solution, to tackle data security and reliability problems. The solution uses Multi-Cloud and Fog Computing

FIGURE 12. Comparison between backup times in local and cloud scenarios (Original Droplet).
paradigms. Data encryption and partitioning over Multi-Cloud storage maintain data security and user privacy. The solution abstracts the individual users from the system's complications and improves the backup experience by exploiting the advantages of Fog Computing. We have built a working implementation of the system and ran many experiments on real-world scenarios. DropStore enables securing the user data with minimal complexity for the users. In future work, the data uploading process needs more attention: new scheduling strategies should consider QoS parameters and the remaining storage at each CSP, and developing linear block codes for data replication, instead of repeating the entire data block, would add error detection and correction capabilities.
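As a glimpse of that future direction, even a single RAID-5-style parity block (a far simpler relative of the linear block codes mentioned above; this sketch is ours, not part of DropStore) lets a system rebuild any one lost chunk:

```python
def xor_block(blocks):
    """XOR equal-length blocks together; feed in data blocks, get parity out."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

data = [b"AAAA", b"BBBB", b"CCCC"]  # three equal-size data chunks
parity = xor_block(data)            # stored as a fourth block on another CSP

# Lose chunk 1: XOR of the survivors plus the parity recovers it exactly.
recovered = xor_block([data[0], data[2], parity])
assert recovered == b"BBBB"
```

The storage overhead is one extra block instead of a full replica, which is exactly the trade-off that motivates replacing whole-block repetition with coding.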
ACKNOWLEDGMENT
We are grateful to Khaled A. Helal for his useful comments and generous help with this article. We would also like to thank the Egyptian ITAC for sponsoring the Droplet project, "Smart Droplet: an enabler for Fog networks," which was the basis of this work.


REDA MAHER received the B.Sc. degree in Electronics and Electrical Communication Engineering from Cairo University in 2013. He has served as an embedded software engineer in many companies since 2014. During this period, he built deep experience in wireless communication standards, networks, and embedded software development. He is currently pursuing the M.Sc. degree in computer engineering at Cairo University, Egypt. His main interests are fog computing, networks, the Internet of Things, and embedded software architecture.

OMAR A. NASR received his B.Sc. degree from Cairo University in 2003. He received his M.Sc. in the field of speech recognition and compression from Cairo University in 2005. In 2009, he received his Ph.D. degree in the field of Wireless Communications from the University of California at Los Angeles (UCLA). His Ph.D. research was in conjunction with Silvus Technologies Inc., where he worked on the development and testing of 802.11n MIMO-OFDM systems.
He joined Cairo University in 2010, where he is currently an Associate Professor at the Electronics and Electrical Communication Department. He has diverse research interests covering video processing, artificial intelligence and machine learning, networking, implementation of signal processing blocks, and IoT. He is the main shareholder of a startup (TechiBees Inc.) working in computer vision and embedded systems. He joined the R&D team of the Egyptian National Telecommunications Regulatory Authority (NTRA) as a consultant from 2011 to 2020, where he was responsible for evaluations, approvals, and funding for tens of research projects covering many areas, including video analytics, machine learning, smart grids, smart irrigation systems, software-defined radios, the Internet of Things, and others.
