International Journal of Research in Computer Applications & Information
Technology, Volume 1, Issue 2, October-December, 2013, pp. 195-203
© IASTER 2013, www.iaster.com ISSN Online: 2347-5099
Web Server Farm to Balance and Protect
the Loaded File in Cloud
Maheswari. R, Praveena. R, Aswiga. R.V, Jayanthi. S
Department of CSE, Angel College of Engineering and Technology, Tirupur, Tamil Nadu, India
ABSTRACT
Many organizations are increasingly turning to the cloud because it provides ease of
access to resources, cost-effectiveness, enhanced collaboration, nearly limitless flexibility,
and security for running resource-intensive applications. Cloud systems are economical and
useful for businesses of all sizes. However, even though cloud infrastructure is provisioned on an
on-demand, pay-per-use basis, the demand must still be served in real time, which can lead to
overload on the client side. Another problem concerns storage of the uploaded file, which may be
accessed by unauthorized persons. In this paper we show how to build a large-scale web server
farm in the cloud that balances load on the client side and encrypts the uploaded file to protect it.
Our performance study shows that existing cloud optimization techniques do not provide high
scalability. The proposed architecture offers high throughput, scalability, and better protection.
Keywords: Amazon S3, Cloud, Data Encryption, Load Balancing, Virtual Machine, Web Server.
I. INTRODUCTION
Amazon's EC2/S3 infrastructure cloud services change the economics of computing. First, they
provide practically unlimited infrastructure capacity on demand. Users can elastically provision
infrastructure resources from the provider's pool only when needed. Second, with pay-per-use
pricing, users pay for actual consumption instead of peak capacity. Third, a cloud infrastructure
is much larger than most enterprise data centers. These characteristics make the cloud an
attractive infrastructure solution, especially for web applications, because of their variable loads.
Since a web application can see a dramatic difference between its peak load and its normal load, a
traditional infrastructure is ill-suited for it: we either grossly over-provision for the potential
peak, wasting valuable capital, or provision for the normal load and are then unable to handle the
peak when it materializes. Using the elastic provisioning capability of a cloud, a web application
can ideally provision its infrastructure to track the load in real time and pay only for the capacity
needed to serve the real application demand.
Due to the large infrastructure capacity a cloud provides, there is a common myth that an
application can scale up unlimitedly and automatically when application demand increases. In
reality, our study shows that scaling an application in a cloud is more difficult, because a cloud is
very different from a traditional enterprise infrastructure in at least several respects.
First, in enterprises, application owners can choose an optimal infrastructure for their
applications amongst various options from various vendors. In comparison, a cloud
infrastructure is owned and maintained by the cloud providers. Because of their commodity
business model, they only offer a limited set of infrastructure components. For example, Amazon
EC2 offers only 5 types of virtual servers, and application owners cannot customize their
specifications.
Second, again due to its commodity business model, a cloud typically provides only commodity
Virtual Machines (VMs). Their computation power and network bandwidth are typically lower than
those of high-end servers. For example, all Amazon VMs can transmit at roughly 800 Mbps at most,
whereas commercial web servers routinely have several Network Interface Cards (NICs), each
capable of at least 1 Gbps. These commodity VMs force us to use horizontal scaling to increase
system capacity.
Third, unlike in an enterprise, application owners have little or no control of the underlying cloud
infrastructure. For example, for security reasons, Amazon EC2 disabled many networking layer
features, such as ARP, promiscuous mode, IP spoofing, and IP multicast. Application owners have
no ability to change these infrastructure features. Many performance optimization techniques
rely on the infrastructure choice and control. For example, to scale a web application,
application owners either ask for a hardware load balancer or ask for the ability to assign
the same IP address to all web servers to achieve load balancing. Unfortunately, neither
option is available in Amazon EC2.
Last, commodity machines are likely to fail more frequently. Any architecture designed for the
cloud must handle machine failures quickly, ideally in a few milliseconds or faster, in order not
to disrupt service frequently. Because of these characteristics, cloud-hosted web applications
tend to run on a cluster with many standard commodity web servers, and thus require a scalable
and agile load balancing solution. In this study, we propose a client-side load balancing
architecture that not only leverages the strength of existing cloud components but also overcomes
the limitations listed above, and we propose an encryption technique to protect the uploaded data.
More specifically, we present the following contributions.
A. Load Balancing Architecture
Differing from previous proposals on client-side load balancing, our proposal is built on insights
gained from our performance studies of cloud components. We leverage the strength of a cloud
component (S3's scalability) to avoid any single point of scalability bottleneck.
B. Encryption Technique in Storage
One of the important concerns that needs to be addressed is assuring the customer of the
integrity, i.e., correctness, of the data. Since the data is not physically accessible to the user,
the cloud should provide a way for the user to check whether the integrity of the data is
maintained or has been compromised. We provide a scheme that gives a proof of data integrity in
the cloud, which the customer can employ to check the correctness of the data. This proof can be
agreed upon by both the cloud and the customer and can be incorporated in the Service Level
Agreement (SLA). It is important to note that our proof-of-data-integrity protocol only checks the
integrity of the data, i.e., whether the data has been illegally modified or deleted. In the rest
of the paper, we show the limitations of the cloud for hosting a web presence using standard
techniques. We then describe our contributions.
II. PRIOR WORK
There are well-established techniques to scale a web server farm in an owned infrastructure. We
briefly visit them here and point out their limitations when deployed in a cloud.
A. Load Balancer
A standard way to scale web applications is by using a hardware-based load balancer [5]. The load
balancer assumes the IP address of the web application, so all communication with the web
application hits the load balancer first. The load balancer is connected to one or more identical
web servers in the back-end. Depending on the user session and the load on each web server, the
load balancer forwards packets to different web servers for processing. A hardware-based load
balancer is designed to handle a high level of load, so it can easily scale.
However, a hardware-based load balancer uses application-specific hardware components and is
therefore typically expensive. Because of the cloud's commodity business model, a hardware-based
load balancer is rarely offered by cloud providers as a service. Instead, one has to use a
software-based load balancer running on a generic server.
A software-based load balancer [8, 12, 1] is not a scalable solution, though. Its scalability is
usually limited by the CPU and network bandwidth of the generic server it runs on, and a generic
server's capacity is much smaller than that of a hardware-based load balancer. For example, in our
test [11], we found that an Amazon EC2 instance can handle at most 400 Mbps of combined ingress
and egress traffic.
Even though some cloud platforms, such as Google App Engine [7], implicitly offer a hardware-based
load balancer, we cannot easily get around their limitations because of the limited control we
have. In our test, Google App Engine was only able to handle 10 Mbps of inbound/outbound traffic
or less because of its quota mechanism.
The HTTP protocol [6] has a built-in redirect capability, which can instruct the client to send
the request to another location instead of returning the requested page. Using HTTP redirect, a
front-end server can load-balance traffic across a number of back-end servers. However, just as
with a software load balancer, the front end remains a single point of failure and a scalability
bottleneck.
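As a rough illustration only (this is not the architecture proposed in this paper), such a
redirect front end could be sketched in Node.js as follows; the back-end host names and port are
hypothetical.

// Minimal sketch of HTTP-redirect load balancing (illustrative only).
const http = require('http');

const backends = ['http://web1.example.com', 'http://web2.example.com'];
let next = 0;

http.createServer((req, res) => {
  // Round-robin choice; the front end itself remains a single point of
  // failure and a scalability bottleneck, as noted above.
  const target = backends[next];
  next = (next + 1) % backends.length;
  res.writeHead(302, { Location: target + req.url });
  res.end();
}).listen(8080);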
B. DNS Load Balancing
Another well-established technique is DNS aliasing [10]. When a user browses to a domain (e.g.,
www.website.com), the browser first asks its local DNS server for the IP address (e.g.,
209.8.231.11) and then contacts that IP address. If the local DNS server does not have the IP
address information for the requested domain, it contacts other DNS servers that have the
information, eventually reaching the original DNS server that the web server farm directly
manages. The original DNS server can hand out different IP addresses to different requesting DNS
servers, so the load is distributed among the servers sitting at each IP address.
DNS load balancing has drawbacks in load-balancing granularity and adaptiveness that are not
specific to the cloud. First, it does a poor job of balancing the load. For performance reasons, a
local DNS server caches the IP address information. Thus, all browsers contacting the same DNS
server get the same IP address. Since that DNS server could be responsible for a large number of
hosts, the load cannot be effectively smoothed out.
Second, the local DNS server caches the IP address for a set period of time, e.g., for days. Until
the cache expires, the local DNS server guides requests from browsers to the same web server. When
traffic fluctuates at a time scale much smaller than days, tweaking DNS server settings has little
effect. Traditionally, this drawback has not been as pronounced because the number of back-end web
servers and their IP addresses were static anyway. However, it seriously affects the scalability of
a cloud-based web server farm, which elastically changes the number of web servers to track the
traffic volume at a granularity of minutes. Days of DNS caching dramatically reduce this
elasticity. More specifically, even if the web server farm adds web servers to serve the peak load,
the IP addresses of the new web servers will not be propagated to DNS servers that already hold a
cached IP address. Therefore, requests relying on those DNS servers keep being sent to the old web
servers, overloading them while the new web servers remain idle. In addition, when a web server
fails, the DNS entry cannot be updated immediately. While the DNS changes propagate, users are
unable to access the service even though other live web servers exist.
C. Layer 2 Optimization
In an enterprise, where one can fully control the infrastructure, we can apply layer 2 optimization
techniques to build a scalable web server farm that avoids the drawbacks discussed above: expensive
hardware, a single performance bottleneck, and lack of adaptiveness.
There are several variations of layer 2 optimization. One, referred to as direct web server
return [4], uses a set of web servers that all have the same IP address but different layer 2
(MAC) addresses. A browser request may first hit one web server, which may in turn load-balance
the request to other web servers. However, when replying to a browser request, any web server can
reply directly. By removing the constraint that all replies have to go through the same server, we
can achieve higher scalability. This technique requires the ability to dynamically change the
mapping between an IP address and a layer 2 address at the router level.
Another variation, TCP handoff [9], works in a slightly different way. A browser first establishes
a TCP connection with a front-end dispatcher. Before any data transfer occurs, the dispatcher
transfers the TCP state to one of the back-end servers, which takes over the communication with the
client. This technique again requires the back-end servers to masquerade as the dispatcher's IP
address.
Unfortunately, this ability could open doors for security exploits. For example, one could
intercept all packets destined for a host by launching another host with the same IP address.
Because of these security concerns, Amazon EC2 disables all layer 2 capabilities, so any layer 2
technique to scale an application will not work in the Amazon cloud.
D. Client Load Balancing
The concept of client-side load balancing is not new [3]. One existing approach, in an earlier
version of Netscape, requires modification to the browser. Given the diversity of web browsers
available today, it is difficult to make sure that all visitors to a web site have the required
modification. Smart Client [18], developed as part of the WebOS project [15] [16], requires Java
Applets to perform load balancing at the client. Unfortunately, it has several drawbacks. First,
Java Applets require the Java Virtual Machine, which is not available by default on most browsers;
this is especially true in the mobile environment. Second, if the user accidentally agrees, a Java
Applet could gain full access to the client machine, leaving open a big security vulnerability.
Third, many organizations only allow administrators to install software, so users cannot view
applets by default.
Fourth, a Java Applet is an application; the HTML page navigation structure is lost when navigating
within the applet. Last, Smart Client still relies on a central server to download the Java Applet
and the server list, which again presents a single point of failure and a scalability bottleneck.
E. Data Storage
As data generation far outpaces data storage, it proves costly for small firms to frequently update
their hardware whenever additional data is created. Maintaining the storage can also be a difficult
task, and transmitting files across the network to the client can consume heavy bandwidth. The
problem is further complicated by the fact that the owner of the data may use a small device, such
as a PDA (personal digital assistant) or a mobile phone, which has limited CPU power, battery
power, and communication bandwidth.
III. NEW ARCHITECTURE
Since traditional techniques for designing a scalable web server farm do not work in a cloud
environment, we need to devise new techniques that leverage scalable cloud components while getting
around their limitations. We also provide an encryption technique to protect the uploaded data.
A. Overview and Originality
We present a new web server farm architecture: client-side load balancing with encryption. With
this architecture, a browser decides which web server it will communicate with among the available
back-end web servers in the farm. The decision is based on information listing the web servers' IP
addresses and their individual loads; this list is maintained in encrypted form and unlocked with
an encryption key. The new architecture gets around the limitations posed by cloud components and
achieves a high degree of scalability with infrastructure components currently available from cloud
providers.

Figure 1: Browser and web server interaction in our client-side load balancing with encryption
architecture
Compared to a software load balancer, our architecture has no single point of scalability
bottleneck and protects the uploaded data. The browser decides on a back-end web server from the
list of servers, and communication flows directly between the browser and the chosen back-end web
server. Compared to the DNS aliasing technique, our architecture has finer load-balancing
granularity and better adaptiveness. Moreover, changes in the web server list due to auto-scaling
or failures can be quickly propagated to browsers (in milliseconds), so that clients do not
experience extended congestion or outages. This is because the server information is not cached for
days; rather, it is updated whenever a session is created. We achieve high scalability without
requiring the sophisticated infrastructure control that layer 2 optimization needs; only the IP
addresses of the web servers and their individual load information have to be shared with clients.
B. Our Contribution
Our performance study showed that, being a purposely designed platform, S3 has a high degree of
scalability when delivering static content. This is not surprising: in order to serve millions of
customers, S3 has adopted a distributed implementation that can easily scale. However, S3 alone is
not suitable as a web hosting platform, because modern web sites have to deal with both static and
dynamic content, and Amazon S3 does not support dynamic content processing such as CGI, PHP, or
JSP.
In our architecture, we use back-end web servers for dynamic content processing and use a
client-side load balancing technique to distribute the traffic across the back-end web servers. We
host the client-side logic (written in JavaScript) as well as the list of web servers and their
load information in S3, in order to avoid a single point of scalability bottleneck.
The detailed architecture is shown in Figure 1. For each dynamic page, we create a static anchor
page. This anchor page includes two parts. The first part contains the list of web servers' IP
addresses and their individual load information (CPU, memory, network bandwidth, etc.), stored in a
set of JavaScript variables that can be accessed directly from other JavaScript code. The second
part contains the client-side load balancing logic, again written in JavaScript. We need one anchor
page for each dynamic page in case a user browses to the URL directly; however, the contents of the
anchor pages are all the same. All of this information is stored in encrypted form.
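As an illustration, the first part of an anchor page might declare variables such as the following;
the variable names and values are hypothetical, and in our architecture this payload would be kept
encrypted in S3 and decrypted in the browser with the session key.

// Hypothetical server list and load information embedded in the anchor page.
var serverList = ["10.0.1.11", "10.0.1.12", "10.0.1.13"];  // back-end web server IPs
var serverLoad = [0.35, 0.60, 0.20];                       // relative load of each server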
All anchor pages are hosted in S3 to achieve scalability. We use S3's domain hosting capability,
which maps a domain (by DNS aliasing) to S3: we create a bucket with the same name as the domain
name (e.g., www.website.com), and when a user accesses the domain, the request is routed to S3,
which uses the Host header to determine the bucket from which to retrieve the content. The client
browser must activate the key to access the page contents.
When a client browser loads an anchor page, the browser executes the following steps:
For key generation, suppose the verifier wishes to store a file with the archive, and let this
file consist of n data blocks, each of m bits; this is a typical data file that the client wishes
to store in the cloud. We initially preprocess the file and create a key, which is used to derive
the data that will be appended to the file.
Each data block mi is encrypted using a suitable algorithm to give a new modified block Mi.
Without loss of generality, we show this process using a simple XOR operation; the encryption
method can be improved to provide still stronger protection for the verifier's data. All the data
blocks generated by this procedure are concatenated together, and the concatenated data is
appended to the file before storing it at the cloud server. The file, along with the appended
data, is archived with the cloud.
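A minimal sketch of this preprocessing step, assuming the simple XOR operation mentioned above and
Node.js Buffers for the blocks (the function names are ours, not part of the scheme):

// Sketch: XOR each data block with the key to obtain the modified block M_i,
// then append the concatenated modified blocks to the original file.
function encryptBlock(block, key) {
  const out = Buffer.alloc(block.length);
  for (let i = 0; i < block.length; i++) {
    out[i] = block[i] ^ key[i % key.length];   // simple XOR, as in the text
  }
  return out;
}

function preprocessFile(blocks, key) {
  // Concatenate the modified blocks and append them to the original file data.
  const appended = Buffer.concat(blocks.map(b => encryptBlock(b, key)));
  return Buffer.concat([Buffer.concat(blocks), appended]);
}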
The browser then examines the load variables to determine which web server it should send the
actual request to. The current algorithm randomly chooses a web server, where the probability of
choosing any one server is inversely proportional to its relative load. This weighted random
distribution is designed to avoid all client browsers flocking to the same web server at the same
time, as a deterministic algorithm would cause them to do.
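A sketch of this weighted random choice, using the hypothetical serverList and serverLoad variables
from the earlier anchor-page sketch:

// Pick a back-end server with probability inversely proportional to its load.
function chooseServer(serverList, serverLoad) {
  const weights = serverLoad.map(l => 1 / Math.max(l, 0.001));  // guard against zero load
  const total = weights.reduce((a, b) => a + b, 0);
  let r = Math.random() * total;
  for (let i = 0; i < serverList.length; i++) {
    r -= weights[i];
    if (r <= 0) return serverList[i];
  }
  return serverList[serverList.length - 1];  // numerical fallback
}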
The JavaScript then sends a request to a proxy on the target web server. The proxy is currently
implemented as another PHP web server running on the same machine but on a different port. The
JavaScript sends over two pieces of information encoded as URL parameters: the browser cookie
associated with the site (document.cookie) and the URL path (location.pathname).
The proxy uses the cookie and URL path to reconstruct a new HTTP request and sends it to the
actual web server. The web server processes the request, invoking a dynamic script processor such
as CGI, PHP, or JSP as necessary, and returns the result to the proxy. The proxy wraps the result
in JavaScript. Finally, the client browser executes the JavaScript returned from the proxy,
updates the page display, and updates the cookies if a Set-Cookie header has been returned. A
sketch of such a proxy is shown below.
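The proxy in our prototype is a PHP script; the following Node.js sketch only illustrates the same
idea under our assumptions (the URL parameter names, the local web server port 80, and the proxy
port 8081 are all hypothetical).

// Illustrative proxy: rebuild the request from the cookie and path passed as
// URL parameters, forward it to the local web server, and wrap the response
// in JavaScript for the browser to execute.
const http = require('http');

http.createServer((clientReq, clientRes) => {
  const params = new URL(clientReq.url, 'http://localhost').searchParams;
  const cookie = params.get('cookie') || '';
  const path = params.get('path') || '/';

  http.get({ host: 'localhost', port: 80, path: path,
             headers: { Cookie: cookie } }, (res) => {
    let body = '';
    res.on('data', chunk => { body += chunk; });
    res.on('end', () => {
      const setCookie = (res.headers['set-cookie'] || []).join('; ');
      clientRes.writeHead(200, { 'Content-Type': 'application/javascript' });
      clientRes.end(
        'function page() { return ' + JSON.stringify(body) + '; }\n' +
        'function setCookie() {' +
        (setCookie ? ' document.cookie = ' + JSON.stringify(setCookie) + ';' : '') +
        ' }');
    });
  });
}).listen(8081);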
An example anchor page is shown below. It simply includes two JavaScript sections along with an
empty "ToBeReplaced" tag, which will later be replaced with the real page content.
<html>
<head>
  <title></title>
  <script type="text/javascript">
    // the load balancing logic
  </script>
  <script type="text/javascript">
    // the server list and load information in JavaScript variables
  </script>
</head>
<body onLoad="load();">
  <span id="ToBeReplaced"></span>
</body>
</html>
As described above, the goal of the load balancing logic is to choose a back-end web server based
on the load, send the client request (the cookie along with the URL path) to the proxy, receive the
returned JavaScript, and update the current HTML page. The JavaScript file returned from the proxy
looks like the following.
function page()
{
  return "<HTML page content>";
}

function setCookie()
{
  // set cookie if instructed by the web server
}
The load balancing logic calls the page() function to replace the current page content (the
"ToBeReplaced" tag) with the string returned by page(). It also calls the setCookie() function,
which contains the logic to set the client-side cookie if the web server has indicated so;
setCookie() may be empty if the server does not set any cookie.
Even though JavaScript has the capability to load additional content using XMLHttpRequest, most
browsers do not allow XMLHttpRequest to a different server for security reasons. Since we are
making a request to a web server other than S3, we have to adopt an architecture like the one shown
above.
To avoid further round-trip delays to fetch the anchor page, we keep the load balancing JavaScript
in browser memory. When a user clicks any link on the current page, the JavaScript intercepts the
request, loads the corresponding link from the proxy, and replaces the current page content with
the new page, as sketched below. If the web server fails during a user session (i.e., the proxy is
not responding), the JavaScript randomly chooses another web server and resends the request. This
optimization (keeping the JavaScript in memory) also ensures that all requests from the same
session go to the same back-end server, making it easy to maintain state across requests.
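Putting these pieces together, the in-memory load balancing logic might look roughly like the
following sketch; chooseServer, the /proxy path, and port 8081 are the hypothetical names from the
earlier sketches, not our actual implementation.

// Rough sketch: pick a back-end server once per session, request each page
// from its proxy wrapped in JavaScript, inject the returned script, and fall
// back to another server if the proxy does not respond.
var currentServer = null;

function load(path) {
  if (!currentServer) {
    currentServer = chooseServer(serverList, serverLoad);  // weighted random pick
  }
  var script = document.createElement('script');
  script.src = 'http://' + currentServer + ':8081/proxy' +
               '?cookie=' + encodeURIComponent(document.cookie) +
               '&path=' + encodeURIComponent(path || location.pathname);
  script.onload = function () {
    document.getElementById('ToBeReplaced').innerHTML = page();
    setCookie();                          // apply the Set-Cookie value, if any
  };
  script.onerror = function () {
    currentServer = null;                 // proxy not responding: pick another server
    load(path);
  };
  document.head.appendChild(script);
}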
C. Data Encryption
(i) Simply Archives
The problem is to obtain and verify a proof that the data stored by a user at a remote data storage
in the cloud (called a cloud storage archive, or simply the archive) has not been modified by the
archive, thereby assuring the integrity of the data. The cloud archive must not cheat the owner,
where cheating, in this context, means that the storage archive might delete or modify some of the
data. While developing proofs of data possession at untrusted cloud storage servers, we are often
limited by the resources at the cloud server as well as at the client.
(ii) Sentinels
In this scheme, unlike in the key-hash approach, only a single key can be used irrespective of the
size of the file or the number of files whose retrievability is to be verified. Also, the archive
needs to access only a small portion of the file, unlike the key-hash scheme, which requires the
archive to process the entire file for each protocol verification. If the prover has modified or
deleted a substantial portion of the file, then with high probability it will also have suppressed
a number of sentinels.
(iii) Verification Phase
Before storing the file at the archive, the verifier preprocesses the file, appends some data to
it, and stores it at the archive. At the time of verification, the verifier uses this appended data
to verify the integrity of the file; this prevents the archive from modifying or deleting data
without detection.
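A minimal sketch of such a check, under the assumption that the appended data is the XOR-encrypted
blocks produced by the preprocessing sketch above and that the verifier keeps the key (encryptBlock
is the illustrative function from that sketch):

// Re-encrypt the challenged data block with the verifier's key and compare it
// with the corresponding appended block returned by the archive.
function verifyBlock(dataBlock, appendedBlock, key) {
  const expected = encryptBlock(dataBlock, key);  // same XOR step as in preprocessing
  return expected.equals(appendedBlock);          // mismatch => data was modified or deleted
}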
IV. CONCLUSION
For web applications, an infrastructure cloud is an attractive solution because it enables them to
dynamically adjust infrastructure capacity on demand.
In this study, we propose a client-side load balancing architecture built on a cloud storage
service, Amazon S3. Through S3, the architecture directly delivers static content while allowing a
client to choose a corresponding back-end web server for dynamic content. A client makes the load
balancing decision based on the list of back-end web servers and their load information, and this
list is maintained in encrypted form. To access the uploaded file, a client must have proper
authentication. Our study shows higher performance and better protection.
REFERENCES
[1] Accoria. Rock web server and load balancer. http://www.accoria.com.
[2] Amazon Web Services. Amazon Web Services (AWS). http://www.amazon.com.
[3] V. Cardellini, M. Colajanni, and P. S. Yu. Dynamic load balancing on web-server systems.
IEEE Internet Computing, 3(3):28-39, 1999.
[4] L. Cherkasova. FLEX: Load Balancing and Management Strategy for Scalable Web Hosting Service.
IEEE Symposium on Computers and Communications, 0-8, 2000.
[5] F5 Networks. F5 Networks. http://www.f5.com.
[6] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, and T. Berners-Lee.
Hypertext Transfer Protocol -- HTTP/1.1. IETF RFC 2616, 1999.
[7] Google Inc. Google App Engine. http://code.google.com/appengine/.
[8] HAProxy. HAProxy load balancer. http://haproxy.1wt.eu/.
[9] G. Hunt, E. Nahum, and J. Tracey. Enabling content-based load distribution for scalable
services. Technical report, 1997.
[10] E. Katz, M. Butler, and R. McGrath. A scalable HTTP server: The NCSA prototype. In Proc.
First International Conference on the World Wide Web, Apr. 1994.
[11] H. Liu and S. Wee. Web Server Farm in the Cloud: Performance Evaluation and Dynamic
Architecture. In Proc. of the 1st International Conference on Cloud Computing (CloudCom 2009),
Dec. 2009.
[12] Nginx. Nginx web server and load balancer. http://nginx.net/.
[13] Zona Research. The need for speed II. http://www.keynote.com/downloads/ZonaNeedForSpeed.pdf.
[14] A. Vahdat, M. Dahlin, P. Eastham, C. Yoshikawa, T. Anderson, and D. Culler. WebOS: Software
Support for Scalable Web Services. In Proc. of the Sixth Workshop on Hot Topics in Operating
Systems, 1997.
[15] E. Mykletun, M. Narasimha, and G. Tsudik. Authentication and integrity in outsourced
databases. Trans. Storage, vol. 2, no. 2, pp. 107-138, 2006.
[16] D. X. Song, D. Wagner, and A. Perrig. Practical techniques for searches on encrypted data. In
SP '00: Proceedings of the 2000 IEEE Symposium on Security and Privacy. Washington, DC, USA: IEEE
Computer Society, 2000, p. 44.
[17] A. Juels and B. S. Kaliski, Jr. PORs: proofs of retrievability for large files. In CCS '07:
Proceedings of the 14th ACM Conference on Computer and Communications Security. New York, NY, USA:
ACM, 2007, pp. 584-597.
[18] G. Ateniese, R. Burns, R. Curtmola, J. Herring, L. Kissner, Z. Peterson, and D. Song. Provable
data possession at untrusted stores. In CCS '07: Proceedings of the 14th ACM Conference on Computer
and Communications Security. New York, NY, USA: ACM, 2007, pp. 598-609.