Secure Multi-Cloud Backup with Fog
ABSTRACT Data backup is essential for disaster recovery. Current cloud-based solutions offer a secure infrastructure. However, there is no guarantee of data privacy when the data are hosted on a single cloud. An alternative is to use Multi-Cloud technologies. Although spreading smaller pieces of data over multiple clouds can enhance privacy, it comes at the cost of the edge device having to manage different accounts and communicate with different clouds. These drawbacks have made this technology rarely used. This paper proposes DropStore to provide an easy-to-use, highly secure, and reliable backup system using modern Multi-Cloud and encryption techniques. DropStore adds an abstraction layer for the end-user that hides all system complexities behind a locally hosted device, "the Droplet," fully managed by the user. Hence, the user does not rely on any untrusted third party. This is accomplished using Fog Computing technology. The uniqueness of DropStore comes from the convergence of Multi-Cloud and Fog Computing principles. Simulation results show that the proposed system improves data protection in terms of reliability, security, and privacy preservation while maintaining a simple and easy interface with edge devices.
INDEX TERMS Multi-Cloud, Fog Computing, Data Reliability, Disaster Recovery, User Privacy.
There are some common challenges, including:
1) Different APIs: Various providers build different API frameworks and use different programming languages.
2) Compatibility problems: Storage systems need to be consistent across the clouds to fit seamlessly into a single environment. Therefore, systems have to follow the same data structures and allow the same resources to be incorporated.
3) Complex management: this is also called centralized management and service aggregation, such as identity and access controls. In addition, normal users typically need simplified operations to back up and protect their data.

To overcome many of these Multi-Cloud problems, DropStore employs Fog Computing [3]–[5]. The Fog Computing concept was initially established to minimize the data access latency between the cloud and the end devices. Fog Computing provides data processing and networking facilities at the network edge. The idea is to install dedicated servers located geographically at the edge of the network, in micro/nano data centers close to the end-users. Thus, Cloud Computing provides centralized resources in the network core, while Fog Computing offers distributed services and resources near or at the network edge. The Fog Computing architecture provides short-latency services, location awareness, quick response times, and real-time interactions.

The centralized nature of cloud computing cannot meet the increasing number of internet-connected devices [6]. Insisting on using cloud computing will lead to network congestion, low service quality, and high latency. Moreover, some applications demanding real-time responses will not be able to work correctly. Adopting Fog Computing enables broadly distributed applications and services. It will encourage innovation in position-aware services and in real-time applications that need quicker responses than the network core can provide. It also enables supporting the mobility of edge devices. In addition, Fog Computing optimizes energy usage, reduces network congestion, facilitates service delivery, and optimizes spending on the infrastructure.

Fog nodes can be typical network elements, such as routers or middle-end servers, geographically positioned near the end-users. These nodes can execute applications and store data to provide the required services and enhance the user experience. They are connected to the cloud core through high-speed connections and can be considered the arms of the cloud, while the brain stays in the center of the network.

Fog nodes are responsible for processing the local information, which reduces traffic across the network. For high-level processing, the cloud receives the data after it has been initially processed by the fog nodes. For example, in smart cars and smart cities, the cloud makes the future planning decisions because it has the big picture based on the data collected by the fog nodes, while the fog nodes process the real-time interactions locally [7], [8].

This paper introduces DropStore, a new data backup system based on Multi-Cloud and Fog Computing. This system utilizes the advantages of Multi-Cloud storage to ensure users' data protection and reliability. At the same time, it overcomes the problems of Multi-Cloud using the Fog Computing paradigm. System users can easily and securely back up, restore, and modify their data without caring about the sophisticated operations needed to protect and secure the data on Multi-Cloud storage.

The proposed system has many advantages over the existing systems. The following bullets summarize the most important benefits:
• It is the first system that combines the advantages of the Multi-Cloud and Fog Computing paradigms.
• It provides high-speed backup and a better user experience.
• It does not depend on untrusted third parties for security management.

The remainder of this paper is organized as follows: Section II gives a brief overview of the previous work in Multi-Cloud backup systems and of the research efforts in Fog Computing based storage. Section III describes the DropStore system architecture. Then, Section IV evaluates the system performance. Finally, Section V concludes the paper and discusses some future improvements.

II. RELATED WORK
A. DATA BACKUP ON MULTI-CLOUD
Multi-Cloud Storage has gained significant interest in recent years because it offers high availability and solid security, and it prevents service provider lockouts. For example, Zaman et al. [9] designed a distributed Multi-Cloud storage system that uses hybrid encryption to secure data. The user data are encrypted offline, then divided into chunks and distributed to multiple cloud servers. The solution deployment depends on a third-party cloud service provider, which keeps track of the chunk sequence and addresses. Also, it needs a separate key management server to take care of the encrypted keys. The system did not implement any redundancy technique to ensure data reliability, and no explicit versioning is used to reduce the storage needs. In addition, the third-party cloud service provider, which deploys the system, is a vulnerable bottleneck and a single point of failure.

Singh et al. [10] proposed a secure data deduplication technique using secret sharing schemes. The data are sliced based on the Permutation Ordered Binary (POB) numbering system [11] and stored on multiple cloud servers. The critical information is divided into various random shares based on the Chinese Remainder Theorem (CRT) [12], [13] and saved to multiple servers. Whereas the key can be restored from any k servers out of n (where k is less than n), the data can only be restored if all of the shares are available. Therefore, this system will not survive cloud service provider lockouts.

Triviback [14] is a chunking based backup system that minimizes the storage using the sec-cs data structure [15].
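To make the k-out-of-n key reconstruction cited above for Singh et al. [10] concrete, the following toy sketch shows only the Chinese Remainder Theorem combination step. The moduli, the secret value, and the share layout are illustrative assumptions, not the actual parameters or protocol of [10].

import math

def crt_combine(residues, moduli):
    # Reconstruct the unique x (mod prod(moduli)) with x % m_i == r_i,
    # assuming the moduli are pairwise coprime.
    M = math.prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)  # pow(Mi, -1, m) is the modular inverse
    return x % M

# Toy example: a small "key" stored as one residue (share) per server.
key = 123456789
moduli = [1009, 1013, 1019, 1021]  # pairwise coprime, one modulus per server
shares = [key % m for m in moduli]

# Any subset whose moduli product exceeds the key is enough to rebuild it
# (here 3 of the 4 servers), mirroring the k-out-of-n property.
subset = [0, 1, 2]
recovered = crt_combine([shares[i] for i in subset],
                        [moduli[i] for i in subset])
assert recovered == key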
FIGURE 2. DropStore System Components.

…with the ability to read, modify, and delete the stored data at any time. The privacy of each edge node should be protected so that no node can access the other nodes' data on the system. These requirements should be achieved without complicated operations at the edge nodes.
• Droplet: It represents the Fog layer, but with some added advantages because it is a personal device. For example, it is fully controlled by the user. Hence, it acts as a fully trusted local backup server.
• Public Cloud: The data are collected from the edge devices on the Droplet and periodically backed up to several public cloud servers. Backing up the data to the cloud servers offers disaster recovery and increases reliability. The data will be encrypted and divided into multiple chunks before being stored in the public cloud. This method prevents any malicious cloud service provider from utilizing the user data or compromising the user's privacy.
• DropStore System Software: This is the backup system that runs on the Droplet. DropStore offers a safe but straightforward backup interface to the edge nodes, yet it ultimately protects the data on a distributed Multi-Cloud storage. DropStore is responsible for all the complicated operations that keep the user data secure and reliable, whereas the edge nodes do not need to pay much attention to these operations. In other words, DropStore offloads the processing required by the edge devices to the Fog nodes. This type of processing offload allows using modern backup techniques without being limited by the low resources of the edge nodes. On the one hand, the edge nodes are usually battery-operated and generally in a low power consumption mode. On the other hand, the fog nodes are connected to a power source; hence, there are fewer restrictions on their activity time.

By using the DropStore system architecture, the system can solve several problems related to backup.

B. USER-DROPLET SIDE
The DropStore architecture is designed in a way that guarantees high speed, security, and privacy:

1) Speed
DropStore runs on the Droplet on the user premises. The user devices and the Droplet are in the same Local Area Network (LAN). Consequently, this results in many advantages. First, the data backup speed from the user devices to the Droplet will be very fast, up to the LAN speed, compared to direct data upload to the cloud. Second, the backup service is available even if there is no internet access.

2) Security
The Droplet is a personal device. Thus, the connected devices will be well known and will belong to the Droplet owner. The security threats are more controllable, and the edge devices are under user control. DropStore restricts Droplet access to the specified users only through authentication, which allows better management and reduces security threats. Edge devices can upload, download, or modify their data interactively and at high speed. End-to-end encryption and authentication are used between the edge devices and the Droplet to protect the edge devices from Password-Sniffing, Man-In-The-Middle, or any other attacks launched through access to the user's local area network.

3) Privacy
Isolating the edge devices' data on the Droplet maintains privacy, as no edge device can access the data of the other devices. Therefore, DropStore can be used by all users on the same LAN without privacy concerns.

DropStore can be offered as a product by network service providers, cloud service providers, or any third-party company, which introduces a new business opportunity. The storage capacity of the Droplet can vary according to the user needs, the number of expected edge devices, and the nature of the data.

C. DROPLET-CLOUD SIDE
The Droplet controls the edge devices' data. DropStore will back up these data periodically, without user intervention, using a secure scheme to Multi-Cloud storage.
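As a concrete illustration of this unattended operation, the sketch below triggers a backup job once per day. The run_multicloud_backup() routine is a hypothetical placeholder for the actual DropStore backup job, and the 1:00 AM schedule is an assumption based on the after-midnight backup window mentioned in Section IV; neither is the real DropStore scheduler.

import time
from datetime import datetime, timedelta

BACKUP_HOUR = 1  # assumed uncongested time in the user's home network

def run_multicloud_backup():
    # Placeholder for the real backup job: delta calculation, encryption,
    # chunking, replication, and upload to the configured cloud accounts.
    print("backup started at", datetime.now())

def seconds_until_next_run(hour):
    now = datetime.now()
    next_run = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if next_run <= now:
        next_run += timedelta(days=1)
    return (next_run - now).total_seconds()

if __name__ == "__main__":
    while True:  # the Droplet is mains-powered and stays on
        time.sleep(seconds_until_next_run(BACKUP_HOUR))
        run_multicloud_backup()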
Fig. 3 shows the backup stages from the Droplet to the Multi-Cloud storage. Each stage is described in the following:
1) Data Delta Calculation: DropStore aims to save network bandwidth and storage needs. Hence, it adopts a versioning scheme to back up the data. At each periodic backup, DropStore calculates the delta from the last backup instant and stores only the new delta. In the case of the first backup, all data will be in the delta. The system generates the mandatory metadata to enable the construction of the data chain from the incremental backups and to perform data recovery. These metadata are necessary to calculate the delta between a backup instant and a previous backup instant.
2) Data Encryption and Compression: Data are no longer under user control and visibility when uploaded to the public cloud. They can be used without full authority from the user, especially if the service providers are not trusted. DropStore solves this problem by using encryption and data partitioning. The data delta coming from the first stage is encrypted using the OpenPGP [28] scheme. The owner of the Droplet should provide the key pairs for the data encryption in backup scenarios and the data decryption in retrieval scenarios. After the delta encryption, DropStore compresses this encrypted delta to reduce the storage needs on the remote cloud servers.
3) Data Partitioning: As described in stage 2, DropStore obfuscates the user data by encryption and partitioning. The encrypted data from stage 2 will be split into several chunks. The chunk size can be configured in DropStore. Data partitioning means that no Cloud Service Provider (CSP) has the complete information, so it is going to be hard for CSPs to learn anything from the encrypted chunks.
4) Data Redundancy: Each chunk generated in the previous stage is replicated to offer data reliability against cloud account lockdowns or CSP unavailability. The number of replicas is configurable in DropStore to provide high versatility. The storage needs increase proportionally with the replica count. If all CSPs are highly reliable, the replica count can be configured to 1 (no replicas). This case is similar to Level 0 of a Redundant Array of Inexpensive Disks (RAID-0) [29], where each CSP has a unique chunk; the data are retrieved by restoring all the chunks from all the CSPs. In the opposite scenario, if all CSPs are unreliable, the number of replicas can be set to the number of CSPs. This case is equivalent to the RAID-1 level [29], where each CSP has all the chunks, and the data can be recovered from any alive CSP. In general, the number of replicas is at most equal to the number of CSPs configured in DropStore. Whereas raising the replica count improves the data reliability and the immunity against lockdowns by CSPs, it increases the storage needs and requires more network bandwidth. In addition, a high replica count weakens the security level, as each CSP holds a relatively large portion of the data (all of the data in the case where the replica count equals the number of CSP accounts). Choosing the replica count is therefore a compromise between data reliability, storage needs, network bandwidth, and security. Although we propose using simple replication to ensure data reliability, other techniques (such as channel coding) can be used to detect or correct errors.
5) Data Upload: This is the final stage of the backup scenario in DropStore. The data chunks are uploaded to the cloud servers. DropStore authenticates itself with each CSP automatically and uploads the user data. DropStore adopts a round-robin scheme to preserve the balance of storage usage on each CSP account. Metadata are distributed and replicated in the same manner as the user data. This scheme keeps the storage balanced on all the CSPs over the long term and avoids overwhelming a specific CSP.

D. DATA RETRIEVAL
Edge devices can restore their data at any time as long as they have access to the DropStore. Even if there is no internet connection, edge devices can download the data from the DropStore in case of any data loss incident at the edge devices. Data retrieval only requires the edge device to authenticate with the DropStore using its username and password. After this authentication, the edge device has immediate and full access to its previously backed-up data.

In the event of a disaster in the Fog layer (the Droplet), DropStore can perform data retrieval and recovery conveniently. Disaster events in the Fog layer include Droplet software failures, hardware failures, hard drive crashes, or other natural disasters. To recover the system, the Droplet owner only needs to provide the master key and the credentials of the used CSPs after bringing up a new Droplet. Then, DropStore can search for the metadata on all the configured CSPs and construct the backup chain. After that, the data can be downloaded, decrypted, and reconstructed automatically. Finally, the whole system is recovered again, and the edge devices can access their data as usual.

E. COMPARISON WITH MULTI-CLOUD TECHNIQUES
Comparing DropStore to the existing Multi-Cloud data storage and backup techniques shows its advantages. Table 1 summarizes this comparison between DropStore and some of the Multi-Cloud techniques mentioned in Section II. DropStore outperforms the existing Multi-Cloud storage techniques in multiple ways. The processing complexities in DropStore are moved to the Droplet rather than the edge devices. This method enables low-end edge devices to store and back up their data using the most sophisticated and advanced methods on Multi-Cloud storage.
DropStore improves data availability and upload/download time. Even if there is no internet connection, the data are always accessible and available to the edge devices through the Droplet. Furthermore, the upload and download times at the edge devices are not affected by the internet speed. This can help manage the power consumption of the edge devices and enables backup and storage services in countries with slow internet connections. DropStore also does not rely on any third party for key and metadata management, and it does not require exceptional support from the cloud side. This improves data security and eliminates the dependency on cloud vendors to deploy the system.
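As a summary of the five backup stages described in Section C, the following sketch traces one backup pass from delta calculation to the round-robin upload. It is a minimal illustration under stated assumptions, not the actual DropStore or Duplicity code: compute_delta() and openpgp_encrypt() are hypothetical stand-ins (the real system builds incremental backup chains and uses OpenPGP [28]), InMemoryCloudAccount merely simulates a CSP account, and the chunk size and replica count are example values for the configurable parameters.

import gzip
import itertools

CHUNK_SIZE = 4 * 1024 * 1024  # example chunk size; configurable in DropStore
REPLICA_COUNT = 2             # 1 behaves like RAID-0, #CSPs like RAID-1

def compute_delta(snapshot, last_backup):
    # Stage 1 (placeholder): on the first backup, everything is in the delta.
    # A real implementation would diff against the previous backup chain.
    if last_backup is None:
        return snapshot
    raise NotImplementedError("incremental delta not sketched here")

def openpgp_encrypt(data, key_id):
    # Stage 2 (placeholder): stand-in for OpenPGP [28] encryption with the
    # Droplet owner's key pair. This identity function is NOT real encryption.
    return data

class InMemoryCloudAccount:
    # Simulated CSP account; the real system authenticates with each CSP.
    def __init__(self, name):
        self.name, self.objects = name, {}
    def upload(self, obj_name, data):
        self.objects[obj_name] = data

def backup(snapshot, last_backup, key_id, cloud_accounts):
    delta = compute_delta(snapshot, last_backup)            # stage 1
    blob = gzip.compress(openpgp_encrypt(delta, key_id))    # stage 2: encrypt, then compress
    chunks = [blob[i:i + CHUNK_SIZE]                        # stage 3: partition
              for i in range(0, len(blob), CHUNK_SIZE)]
    replicas = [(i, c) for i, c in enumerate(chunks)        # stage 4: replicate
                for _ in range(REPLICA_COUNT)]
    targets = itertools.cycle(cloud_accounts)               # stage 5: round robin
    for i, chunk in replicas:
        next(targets).upload("chunk-%06d" % i, chunk)

if __name__ == "__main__":
    csps = [InMemoryCloudAccount("csp%d" % i) for i in range(3)]
    backup(b"user data to protect", None, "owner-key-id", csps)
    print({a.name: list(a.objects) for a in csps})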
FIGURE 4. DropStore Software Architecture.
IV. PERFORMANCE EVALUATION
In this section, we address DropStore performance based on the results of our experiments.

A. SYSTEM IMPLEMENTATION
Fig. 4 shows the DropStore software architecture used for the evaluation and demonstration. We built the User-Droplet interface using the Secure File Transfer Protocol (SFTP) [30]. SFTP provides a secure channel between the edge devices and the Droplet node through end-to-end encryption and authentication. SFTP clients are widely popular, so it is easy to find a free and open-source client suitable to run on any type of edge device. We used SFTP Jail [31] on the Droplet to prevent the users from accessing each other's data. SFTP Jail achieves a level of data isolation and enhances the privacy preservation of each edge device. In addition, it limits the control of each edge device to its files only, without affecting the Droplet or the whole system by any unintended action.

For the Droplet-Cloud interface, a modified version of Duplicity [32] is used to achieve the DropStore system requirements. Duplicity is open-source backup software that supports incremental backup, encryption, and various protocols and cloud servers. It is written in Python and requires a POSIX-like operating system. Although Duplicity has essential support for data backup on multiple servers, it does not support data redundancy in a flexible manner. Therefore, we created our own version of Duplicity that fills this gap and achieves a level of storage usage balance between the different cloud servers.

To allow easy system installation and configuration, we implemented a friendly interface for DropStore that installs all the required packages, configures the edge devices' accounts, configures the cloud accounts, and restores the old data (in case of system recovery). We published the software we developed under the Apache-2.0 license, and it is available on GitHub under https://github.com/RedaMaher/DropStore. Our modified Duplicity is available under https://github.com/RedaMaher/DropStore_duplicity.

B. SYSTEM SETUP
The system was evaluated on two different setups. The first setup uses the original Droplet implementation [27] on a Raspberry Pi 3 Model B SBC [33]. It has a 1.2GHz 64-bit quad-core ARMv8 CPU with 1GB RAM and 4 USB ports. We also extended the Droplet storage with an external 1TB hard disk to accommodate the user data. This setup is referred to as the Original Droplet in the following.
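The edge-device side of the SFTP interface described in Section IV-A can be as simple as the snippet below, which pushes a file from an edge device to the Droplet. The host name, account credentials, and paths are hypothetical, and paramiko is used only as one example of the many freely available SFTP clients mentioned above.

import paramiko

DROPLET_HOST = "droplet.local"  # the Droplet on the same LAN (assumed name)
DROPLET_PORT = 22

def push_to_droplet(local_path, remote_path, username, password):
    transport = paramiko.Transport((DROPLET_HOST, DROPLET_PORT))
    try:
        # Each edge device has its own account; the SFTP jail on the Droplet
        # confines it to its own directory, isolating it from other devices.
        transport.connect(username=username, password=password)
        sftp = paramiko.SFTPClient.from_transport(transport)
        sftp.put(local_path, remote_path)  # sent over the encrypted SSH channel
        sftp.close()
    finally:
        transport.close()

if __name__ == "__main__":
    push_to_droplet("photos.tar", "/backup/photos.tar", "alice", "secret")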
TABLE 1. Comparison between DropStore and the Multi-Cloud techniques mentioned in Section II.

Technique/Criteria | Zaman et al. [9] | Singh et al. [10] | Triviback [14] | ExpanStor [17] | DropStore
Versioning (incremental backup) | Not supported | Not supported | Supported | Not supported | Supported
Encryption | AES-256 | Based on the POB numbering system | Supported, based on the sec-cs data structure [15] | AES-256 | OpenPGP encryption [28]
Complexity (at edge devices) | High | High | High | High | Low (all backup operations are done on the Droplet)
Data upload time (at edge devices) | Large (upload to cloud directly) | Large (upload to cloud directly) | Large (upload to cloud directly) | Large (upload to cloud directly) | Small (upload to the Droplet in the same LAN)
Data download time (at edge devices) | Large (download from cloud directly) | Large (download from cloud directly) | Large (download from cloud directly) | Large (download from cloud directly) | Small (download from the Droplet in the same LAN)
Data redundancy | Not supported | Not supported | Not supported | Supported, based on LDPC codes | Supported through data chunk replication
Key and metadata management | Done by a third party | The system uses key management servers | The system depends on the back-end support | Client-server system that depends on the back-end support | No dependency on any third party
C. DATASETS
In our experiments, randomized datasets were used to represent natural user data. The datasets consist of images, text files, videos, etc. Their selection was random to ensure unbiased outcomes. DropStore periodically backs up the data to the cloud servers once a day, at uncongested times in the user's home network; this can typically be achieved after midnight.

D. EVALUATION METRICS
Several parameters are considered to evaluate the system's performance. These parameters are the number of cloud servers, the data chunk size, the replica count, the network latency, the user data size, and many other parameters. Network latency depends mainly on the speed of the internet connection to which the Droplet is connected. So in our experiments (except the last one), we used local servers to simplify the experiments and eliminate the effect of the internet connection speed.

E. RESULTS
Fig. 5 shows the total required storage on the cloud servers after multiple backups of the user data against different counts of cloud servers and different replica counts. The overall raw size of the user data for this experiment is approximately 800MB. The results show that the required storage increases proportionally with higher replica counts. In other words, the relationship between the necessary storage and the replica count is linear. The replica count can be less than or equal to the configured CSP count, so the number of cloud servers limits the maximum replica count.

FIGURE 5. Total storage required vs the number of cloud servers and replica count.

Fig. 6 shows the ratio between the required total storage and the user data size for the same experiment. When a replica count of 1 is used, the ratio is less than 1 since DropStore compresses the user data to minimize storage needs. For example, when the data are almost all text, the required storage is nearly 20% of the raw data size. This ratio increases when the data contain images and videos. Even when a replica count of 5 is used, the required storage ratio remains below 5 because of the compression.
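Assuming the metadata overhead is negligible, this linear behaviour can be summarized by the approximate relation: required storage ≈ replica count × compression ratio × raw data size, so the storage ratio ≈ replica count × compression ratio. With the roughly 20% compression ratio observed for mostly-text data, a replica count of 1 gives a ratio of about 0.2 and a replica count of 5 gives a ratio of about 1, while for poorly compressible data (images and videos) the ratio approaches the replica count itself.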
TABLE 2. Metadata size vs different chunk sizes (Source data size ~800MB).

FIGURE 6. Storage ratio vs the number of cloud servers and replica count.
FIGURE 8. Backup Time vs the number of cloud servers and replica count (Original Droplet).

Data Type | Source Data Size (MB) | Total Storage Required (MB) | Total Metadata Size (MB) | Required Storage / Source Data Size | Metadata Size / Required Storage
Mixed | ~200 | 160.96 | 2.15 | 78.50% | 1.30%
Mostly Videos | ~5600 | 5282.75 | 33.18 | 94.30% | 0.63%
Mostly Text | ~6150 | 1405.98 | 54.35 | 22.80% | 3.87%
FIGURE 10. Restore Time vs the number of cloud servers and replica count (Original Droplet).
FIGURE 9. Backup Time vs the number of cloud servers and replica count (Enhanced Droplet).