www.ijecs.in
International Journal Of Engineering And Computer Science ISSN:2319-7242
Volume 4 Issue 4 April 2015, Page No. 11212-11214
Improving Accessing Efficiency of Cloud Storage Using De-Duplication and Feedback Schemes
R.K.Saranya1, R.Sanjana2, Steffi Miriam Philip3, Shahana M.S.A4
1 Assistant Professor, Department of Computer Science and Engineering; 2, 3 B.E. Final Year Students,
Jeppiaar Engineering College, Chennai
saranya.rks@gmail.com1, sanjanaramesh15@gmail.com2, miriam.steffi@gmail.com3, shahanamsa@gmail.com4
Abstract: File storage in the cloud is handled by third parties. Files can be integrated so that users are able to access them through centralized management. Because of the great number of users and devices in the cloud network, managers cannot effectively manage the efficiency of the storage nodes; hardware is therefore wasted and the complexity of managing the files increases. In order to reduce the workload caused by duplicate files, we propose the Index Name Server (INS). The INS handles file storage, data de-duplication, optimized node selection, server load balancing, file compression, chunk matching, real-time feedback control, IP information, and busy-level index monitoring, which increases performance. By using the INS, files can be reasonably distributed and the workload can be decreased.
Key Words: de-duplication, load balancing, hash-code function.

I. Introduction

Files stored in the cloud can be accessed at any time from any place as long as we have Internet access. Another benefit is that cloud storage provides organizations with off-site backups of data, which reduces the costs associated with disaster recovery. Cloud storage can provide the benefits of greater accessibility and reliability; rapid deployment; strong protection for backup, archival and disaster recovery purposes; and lower overall storage costs as a result of not having to purchase, manage and maintain expensive hardware. However, cloud storage also has the potential for security and compliance concerns.

Data deduplication is one of the hottest technologies in storage right now because it enables companies to save a lot of money on the storage costs to store the data and on the bandwidth costs to move the data when replicating it offsite for disaster recovery. This is great news for cloud providers, because if you store less, you need less hardware. If you can deduplicate what you store, you can better utilize your existing storage space, which saves money by using what you have more efficiently. If you store less, you also back up less, which again means less hardware and backup media. If you store less, you also send less data over the network in case of a disaster, which means you save money in hardware and network costs over time. These are the business benefits of data deduplication.

Load balancing distributes workloads across multiple computing resources, such as computers, a computer cluster, network links, central processing units or disk drives. Load balancing aims to optimize resource use, maximize throughput, minimize response time, and avoid overload of any single resource. Using multiple components with load balancing instead of a single component may increase reliability through redundancy. Load balancing usually involves dedicated software or hardware, such as a multilayer switch or a Domain Name System server process.
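To make the idea concrete, the short Python sketch below (our illustration, not part of any cited system; the node names and load counters are hypothetical) sends each incoming request to the node that currently reports the smallest workload:

# Illustrative, workload-aware dispatch (hypothetical node names and load counters).
current_load = {"node-A": 12, "node-B": 3, "node-C": 7}   # outstanding requests per node

def pick_least_loaded(loads):
    # Choose the node with the fewest outstanding requests to avoid overloading any single resource.
    return min(loads, key=loads.get)

def dispatch(request_id, loads):
    node = pick_least_loaded(loads)
    loads[node] += 1                     # the chosen node takes on one more request
    return node

for rid in range(5):
    print("request", rid, "->", dispatch(rid, current_load))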
II. Related Work

To decrease the workload caused by duplicated files, this paper proposes a new data management structure, the Index Name Server (INS), which integrates data de-duplication with node optimization mechanisms for cloud storage performance enhancement. The INS can manage and optimize the nodes according to the client-side transmission conditions. Through the INS, each node can be controlled to work in its best status and matched to suitable clients as far as possible. This improves the performance of the cloud storage system and efficiently distributes the files to reduce the load of each storage node. Using techniques such as run-length encoding (RLE), dictionary coding, calculation of the digital fingerprints of data chunks, the distributed hash table (DHT), and the Bloom filter, there have been several investigations into load balancing in cloud computing systems.
A digital fingerprint is the essential feature of a data chunk. Each data chunk has its unique fingerprint, and different chunks have different fingerprints. If two chunks have the same hash value, we can say that they carry the same original data, and data with different hash values must come from different original input data.
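As an illustration of chunk fingerprinting, the following sketch (our own example; the 4 KB chunk size is an assumption, since the paper does not fix one) computes a SHA-1 fingerprint per chunk and stores a chunk only when its fingerprint has not been seen before:

import hashlib

CHUNK_SIZE = 4096                 # assumed fixed chunk size; the paper does not specify one
chunk_store = {}                  # fingerprint (hex SHA-1 digest) -> stored chunk

def fingerprint(chunk: bytes) -> str:
    # The digital fingerprint of a data chunk: here, its SHA-1 digest.
    return hashlib.sha1(chunk).hexdigest()

def deduplicate(data: bytes) -> list:
    # Split the data into chunks and keep only one physical copy of each distinct chunk.
    recipe = []                   # fingerprints needed to rebuild the original data
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        fp = fingerprint(chunk)
        if fp not in chunk_store: # an already-seen fingerprint means the chunk is already stored
            chunk_store[fp] = chunk
        recipe.append(fp)
    return recipe

recipe = deduplicate(b"A" * 10000)        # identical 4 KB chunks are stored only once
print(len(recipe), "chunks referenced,", len(chunk_store), "chunk(s) actually stored")

Files that share identical chunks are thus stored only once, which is exactly the saving described above.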
The Bloom filter is composed of a long binary vector and a series of random mapping functions. It is used to test whether an element is included in a set. However, as the number of elements in the set increases, more storage space is needed and the retrieval speed slows down.
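A minimal Bloom filter along these lines might look as follows (the vector size, the number of mapping functions, and the use of salted SHA-1 digests are illustrative choices, not taken from the paper):

import hashlib

class BloomFilter:
    # A long binary vector plus several hash-derived mapping functions.
    def __init__(self, size=1024, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [0] * size

    def _positions(self, item):
        # Derive several bit positions from salted SHA-1 digests of the item.
        for i in range(self.num_hashes):
            digest = hashlib.sha1(("%d:%s" % (i, item)).encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means definitely absent; True means possibly present (false positives are allowed).
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
bf.add("fingerprint-1")
print(bf.might_contain("fingerprint-1"), bf.might_contain("fingerprint-2"))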
A DHT node does not maintain and possess all the information in the network, but stores only its own data and those of its neighboring nodes. This greatly reduces hardware and bandwidth consumption. Essentially, the features of DHTs include decentralization, scalability, and fault tolerance, even when the set of participating nodes keeps changing.
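The sketch below (added here for illustration; it is a toy consistent-hashing ring, not the INS implementation) shows how hashing both node addresses and data keys onto the same identifier space lets each node be responsible only for the keys nearest to it:

import bisect
import hashlib

def ring_id(value):
    # Map a string (node address or data key) onto the identifier space with SHA-1.
    return int(hashlib.sha1(value.encode()).hexdigest(), 16)

class ToyDHT:
    # Each key is owned by the first node clockwise from its hash on the ring,
    # so every node only has to track its own keys and those of its neighbours.
    def __init__(self, nodes):
        self.ring = sorted((ring_id(n), n) for n in nodes)
        self.ids = [i for i, _ in self.ring]

    def owner(self, key):
        idx = bisect.bisect(self.ids, ring_id(key)) % len(self.ring)
        return self.ring[idx][1]

dht = ToyDHT(["10.0.0.1", "10.0.0.2", "10.0.0.3"])   # hypothetical node addresses
print(dht.owner("chunk-fingerprint-42"))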
In the existing system, the opportunistic load balancing (OLB) algorithm is used, which tries to keep every node busy. OLB does not consider the current workload of each node, but distributes the unprocessed tasks randomly to the available nodes. Although OLB is simple and direct, this scheduling algorithm does not consider the expected task execution time and therefore cannot achieve a good makespan.
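The weakness described above can be seen in a few lines: an OLB-style dispatcher assigns each unprocessed task to a random available node, so the makespan depends on chance rather than on the nodes' workloads (the node names and task times below are made up for illustration):

import random

queues = {"node-A": [], "node-B": [], "node-C": []}   # per-node task queues (hypothetical)
tasks = [3, 1, 4, 1, 5, 9, 2, 6]                      # made-up task execution times

# OLB-style dispatch: hand each unprocessed task to a random available node,
# ignoring both the node's current workload and the task's expected execution time.
for t in tasks:
    node = random.choice(list(queues))
    queues[node].append(t)

makespan = max(sum(q) for q in queues.values())       # completion time of the busiest node
print(queues, "makespan:", makespan)

A workload-aware scheduler, such as the one sketched in the Introduction, would instead consult the queue lengths before assigning each task.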
III. Present System & Framework

The INS (Index Name Server) uses a complex P2P-like structure to manage the cloud data. The INS principally handles the one-to-many matches between the storage nodes' IP addresses and hash codes. Three main functions of the INS include:
1) Switching the fingerprints to their corresponding storage nodes;
2) Confirming and balancing the load of the storage nodes;
3) Fulfilling user requirements for transmission as far as possible.

In the present work, we implement the SHA-1 function to improve efficiency. SHA-1 is an advanced technique that provides enhanced functionality for cloud storage security and protects the stored files. This novel technique will improve the service of cloud storage: it notifies the user if a duplicate file is present in the cloud and then automatically removes the duplicate files using the hash-code functions. By doing this, the space available in the cloud is increased, duplicate files are reduced, load balancing takes place, and an optimized node is selected.
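A highly simplified view of this behaviour is sketched below; the in-memory dictionaries, the node addresses, and the printed notification are our assumptions, not the actual INS interface:

import hashlib

index = {}                                   # hash code -> IP addresses of nodes holding the data
busy_level = {"10.0.0.1": 0, "10.0.0.2": 0}  # hypothetical busy-level counters per storage node

def store_file(name, data):
    code = hashlib.sha1(data).hexdigest()    # SHA-1 hash code of the file content
    if code in index:
        # Duplicate detected: notify the user and do not store a second copy.
        print(name, "is already stored on", index[code], "- duplicate not uploaded")
        return code
    node = min(busy_level, key=busy_level.get)   # optimized node selection: least busy node
    busy_level[node] += 1
    index[code] = [node]
    return code

store_file("report.doc", b"some file content")
store_file("report copy.doc", b"some file content")  # identical bytes trigger the notification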
IV. Algorithm

STEP 1: R(k): The initial expected value;
STEP 2: F(k): The output feedback;
STEP 3: M(k): The modified feedback;
STEP 4: Fs(k): The modified internal function of the storage node;
STEP 5: D(k): The external random variable;
STEP 6: X(k): The result within the storage node;
STEP 7: Y(k): The actual result;
STEP 8: KINS: The optimal node determined by the SHA-1 based on the feedback.
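The paper lists the variables of the feedback scheme but does not state an explicit update law, so the loop below is only one plausible reading written to show how these quantities could interact; the proportional correction, the saturation used for Fs(k), the noise model, and the node comparison are all our assumptions:

import random

def Fs(u):
    # Fs(k): assumed internal response function of the storage node (a simple saturation here).
    return max(0.0, min(u, 2.0))

def feedback_loop(R, steps=10, gain=0.5):
    # Drive the actual result Y(k) toward the expected value R(k) using the feedback of each round.
    Y = 0.0                                   # Y(k): actual result observed after transmission
    for k in range(steps):
        F = Y                                 # F(k): output feedback from the previous round
        M = gain * (R - F)                    # M(k): modified feedback (assumed proportional form)
        D = random.uniform(-0.05, 0.05)       # D(k): external random variable (assumed noise)
        X = Fs(F + M)                         # X(k): result produced within the storage node
        Y = X + D                             # Y(k): actual result including the disturbance
    return Y

# K_INS: the node whose actual result ends up closest to the expected value (toy comparison).
R = 1.0
results = {node: feedback_loop(R) for node in ["10.0.0.1", "10.0.0.2", "10.0.0.3"]}
K_INS = min(results, key=lambda n: abs(results[n] - R))
print("optimal node:", K_INS)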
V. Conclusion

We proposed the SHA-1 scheme to handle not only file compression, chunk matching, data de-duplication, real-time feedback control, IP information, and busy-level index monitoring, but also file storage, optimized node selection, and server load balancing.

Based on several SHA parameters that monitor the IP information and the busy-level index of each node, our proposed scheme can determine the location of maximum loading and trace back to the source of demands to determine the optimal backup node. According to the transmission states of the storage nodes and clients, the SHA-1 scheme receives the feedback of the previous transmissions and adjusts the transmission parameters to attain the optimal performance for the storage nodes. The files are compressed and partitioned according to the chunk size of the cloud file system.
REFERENCES
[1] Y.-M. Huo, H.-Y. Wang, L.-A. Hu, and H.-G. Yang, "A cloud storage architecture model for data-intensive applications," in Proc. Int. Conf. Comput. Manage., May 2011, pp. 1-4.
[2] L. B. Costa and M. Ripeanu, "Towards automating the configuration of a distributed storage system," in Proc. 11th IEEE/ACM Int. Conf. Grid Comput., Oct. 2010, pp. 201-208.
[3] C.-Y. Chen, K.-D. Chang, and H.-C. Chao, "Transaction pattern based anomaly detection algorithm for IP multimedia subsystem," IEEE Trans. Inform. Forensics Security, vol. 6, no. 1, pp. 152-161, Mar. 2011.
[4] G. Urdaneta, G. Pierre, and M. Van Steen, "A survey of DHT security techniques," ACM Comput. Surveys (CSUR), vol. 43, no. 2, pp. 8:1-8:49, Jan. 2011.
[5] T.-Y. Wu, W.-T. Lee, and C. F. Lin, "Cloud storage performance enhancement by real-time feedback control and de-duplication," in Proc. Wireless Telecommun. Symp., Apr. 2012, pp. 1-5.
[6] H. He and L. Wang, "P&P: A combined push-pull model for resource monitoring in cloud computing environment," in Proc. IEEE 3rd Int. Conf. Cloud Comput., Jul. 2010, pp. 260-267.
[7] R. Tong and X. Zhu, "A load balancing strategy based on the combination of static and dynamic," in Proc. 2nd Int. Workshop Database Technol. Appl., Nov. 2010, pp. 1-4.
[8] T.-Y. Wu, W.-T. Lee, Y.-S. Lin, Y.-S. Lin, H.-L. Chan, and J.-S. Huang, "Dynamic load balancing mechanism based on cloud storage," in Proc. Comput. Commun. Appl. Conf., Jan. 2012, pp. 102-106.
[9] Y. Zhang, C. Zhang, Y. Ji, and W. Mi, "A novel load balancing scheme for DHT-based server farm," in Proc. 3rd IEEE Int. Conf. Comput. Broadband Netw. Multimedia Technol., Oct. 2010, pp. 980-984.