Module 3.
Cloud computing services
Content Delivery Services
Analytics Services
Deployment & Management Services
Identity & Access Management Services
Open Source Private Cloud Software
Content Delivery Services
• Content delivery services (CDS) include Content Delivery Networks (CDNs)
• CDN – A distributed system of servers located across multiple
geographic locations to serve content to end-users with high
availability and high performance
• CDNs are useful for serving static content such as text, images etc.
and streaming media
• A CDN has a number of edge locations deployed in multiple locations,
often over multiple backbones
• Requests for static or streaming media content served by a CDN are
directed to the nearest edge location
• CDNs cache popular content on the edge servers, which reduces
bandwidth use and cost and improves response time
• Benefits: High speed, Less load, Less bandwidth, High security
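The request routing and edge caching described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular CDN is implemented: the EdgeLocation class, the fetch function and the map coordinates are hypothetical, and "nearest" is taken as straight-line distance, whereas real CDNs typically route on network conditions.

```python
from dataclasses import dataclass, field
from math import hypot


@dataclass
class EdgeLocation:
    name: str
    x: float  # hypothetical map coordinates, not real geography
    y: float
    cache: dict = field(default_factory=dict)


def nearest_edge(edges, user_x, user_y):
    """Pick the edge location closest to the user."""
    return min(edges, key=lambda e: hypot(e.x - user_x, e.y - user_y))


def fetch(edges, origin, path, user_x, user_y):
    """Serve `path` from the nearest edge, filling its cache on a miss."""
    edge = nearest_edge(edges, user_x, user_y)
    if path in edge.cache:
        return edge.name, "hit", edge.cache[path]
    content = origin[path]      # a miss goes back to the origin server
    edge.cache[path] = content  # later requests are served from the edge
    return edge.name, "miss", content


origin = {"/logo.png": b"PNG-bytes"}
edges = [EdgeLocation("mumbai", 0, 0), EdgeLocation("frankfurt", 10, 10)]
first = fetch(edges, origin, "/logo.png", 1, 1)
second = fetch(edges, origin, "/logo.png", 1, 2)
print(first[:2], second[:2])
```

The first request is a cache miss that goes to the origin; the second request from a nearby user is served from the same edge location without touching the origin.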
1. Amazon CloudFront
• Content delivery service from Amazon
• CloudFront can deliver dynamic, static and streaming content using
a global network of edge locations
• Content in CloudFront is organized into distributions
• Each distribution specifies the original location of the content to be
delivered (e.g. an Amazon S3 bucket or an EC2 instance)
• Distributions can be accessed by their domain names
2. Windows Azure Content Delivery Network
• Content delivery service by Microsoft
• Azure CDN caches Windows Azure blobs and static content at the
edge locations to improve website performance
• Azure CDN can be enabled on a Windows Azure storage account
Analytics Services
• Cloud-based analytics services allow analyzing massive data sets
stored in the cloud, either in cloud storage or in cloud databases
• They analyze the data using programming models such as MapReduce
• Using cloud analytics services, applications can perform
data-intensive tasks such as data mining, log file analysis, machine
learning, etc.
1. Amazon Elastic MapReduce (EMR)
• Amazon EMR is the MapReduce service from Amazon
• It is based on the Hadoop framework running on Amazon EC2 and
Amazon S3
• EMR supports various job types:
i. Custom JAR: Runs a Java program that has been uploaded to Amazon S3
ii. Hive: A data warehouse system for Hadoop. You can use Hive to
process data using an SQL-like language called HiveQL. You can create a
Hive job flow with EMR, which can be either an interactive Hive job or a
Hive script
iii. Streaming job: A streaming job flow runs a single Hadoop job
consisting of map and reduce functions implemented in a script or
binary that has been uploaded to Amazon S3. You can write the map and
reduce scripts in Ruby, Perl, Python, PHP, R, Bash or C++
iv. Pig programs: Pig is a platform for analyzing large data sets that
consists of a high-level language for expressing data analysis
programs, coupled with infrastructure for evaluating these programs.
You can create a Pig job flow with EMR, which can be either a Pig job or
a Pig script
v. HBase: A distributed, scalable NoSQL database built on top of
Hadoop. EMR allows you to launch an HBase cluster. HBase can be
used for referencing data for Hadoop analytics, real-time log
ingestion, batch log analytics, etc.
• To create a MapReduce job flow, enter a job name and select the streaming
option for the job flow
• Specify the locations of the input, the output, and the mapper and reducer
programs
• Also specify the number of nodes to use in the Hadoop cluster and the
instance sizes
• A Hadoop cluster is created as specified in the job flow, and the MapReduce
program specified in the input is executed
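As an illustration of the mapper and reducer programs a streaming job flow needs, here is a minimal word-count sketch in Python. In a real streaming job the mapper and reducer would be two separate scripts that read lines from standard input and print tab-separated records; here they are written as functions, and run_locally is a hypothetical helper that imitates the sort Hadoop performs between the two phases so the logic can be tried without a cluster.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per word, as a streaming mapper would."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_records):
    """Sum the counts for each word; the input must be sorted by key."""
    pairs = (record.split("\t") for record in sorted_records)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines):
    """Stand-in for the shuffle/sort step between the map and reduce phases."""
    return list(reducer(sorted(mapper(lines))))


word_counts = run_locally(["the cat", "The dog the end"])
print(word_counts)
```

The reducer relies on its input being sorted by key, which is why all records for one word arrive together.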
2. Google MapReduce Service
• It is part of the App Engine platform
• App Engine MapReduce is optimized for the App Engine environment
and provides different data analysis capabilities
• It can be accessed using the Google MapReduce API
• To execute a MapReduce job, a MapReduce pipeline object is
instantiated within the App Engine application
• The MapReduce pipeline specifies the mapper, reducer, data input
reader and output writer
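The four parts a pipeline specifies can be pictured with a small plain-Python sketch. The MapReducePipeline class below is a hypothetical stand-in written for illustration; it is not the App Engine class and does not reproduce its signature. It only shows how an input reader, mapper, reducer and output writer fit together.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class MapReducePipeline:
    """Hypothetical stand-in, not the App Engine MapReduce API."""
    input_reader: Callable[[], Iterable]
    mapper: Callable
    reducer: Callable
    output_writer: Callable

    def run(self):
        grouped = defaultdict(list)
        for record in self.input_reader():
            for key, value in self.mapper(record):
                grouped[key].append(value)  # shuffle: group values by key
        for key in sorted(grouped):
            self.output_writer(self.reducer(key, grouped[key]))


results = []
pipeline = MapReducePipeline(
    input_reader=lambda: ["GET /a", "GET /b", "POST /a"],
    mapper=lambda line: [(line.split()[1], 1)],
    reducer=lambda path, counts: (path, sum(counts)),
    output_writer=results.append,
)
pipeline.run()
print(results)
```

Here the job counts requests per path in a few sample log lines; swapping any one of the four callables changes where the data comes from, how it is processed, or where the results go.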
3. Google BigQuery
• Service for querying massive datasets
• BigQuery allows querying datasets using SQL-like queries
• To query data, it is first loaded into BigQuery using the BigQuery
console or the BigQuery API
• Data can be in either CSV or JSON format
• Uploaded data can be queried using BigQuery’s SQL dialect
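The load-then-query workflow can be tried locally with a short sketch. BigQuery itself is not used here: Python's built-in sqlite3 module stands in for it, and the table name, column names and sample rows are hypothetical. BigQuery's SQL dialect differs from SQLite's in many details, so this only illustrates the two steps.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for an uploaded CSV file.
CSV_DATA = "word,corpus,word_count\nking,hamlet,30\nking,macbeth,12\nqueen,hamlet,8\n"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT, corpus TEXT, word_count INTEGER)")

# Step 1: load the CSV rows into a table.
rows = [
    (r["word"], r["corpus"], int(r["word_count"]))
    for r in csv.DictReader(io.StringIO(CSV_DATA))
]
conn.executemany("INSERT INTO words VALUES (?, ?, ?)", rows)

# Step 2: query the loaded data with SQL.
query = """
    SELECT word, SUM(word_count) AS total
    FROM words
    GROUP BY word
    ORDER BY total DESC
"""
totals = conn.execute(query).fetchall()
print(totals)
```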
4. Windows Azure HDInsight
• It is an analytics service from Microsoft
• HDInsight deploys and provisions Hadoop clusters in the Azure cloud
and makes Hadoop available as a service
• It uses Windows Azure Blob storage as the default file system
• It provides interactive consoles for both JavaScript and Hive
Deployment and Management Services
• Cloud-based deployment & management services allow you to
easily deploy and manage applications in the cloud
• They automatically handle deployment tasks, e.g. capacity
provisioning, load balancing, auto-scaling and application health
monitoring
1. Amazon Elastic Beanstalk
• Deployment service by Amazon
• It is a PaaS service used for deploying and scaling web applications
• Allows you to deploy and manage applications quickly in the AWS cloud
• It supports Java, PHP, .NET, Node.js, Python and Ruby applications
• Process to use Elastic Beanstalk:
i. Upload the application
ii. Specify the configuration settings in a simple wizard
• The service then automatically handles instance provisioning, server
configuration, load balancing and monitoring
Features:
• Elastic Beanstalk is a fast and simple way to deploy your application
on AWS
• Lets you focus on writing code rather than spending time managing
and configuring servers
• Automatically scales up and down based on application requirements
• Lets you select the AWS resources to use, such as the EC2 instance type
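Automatic scaling can be pictured as a simple rule applied at regular intervals. The sketch below is a hypothetical threshold rule written for illustration; the function name, the CPU thresholds and the instance limits are assumptions and do not reflect Elastic Beanstalk's actual scaling configuration.

```python
def desired_instances(current, cpu_percent, low=30.0, high=70.0,
                      min_instances=1, max_instances=4):
    """Return the instance count after one scaling decision."""
    if cpu_percent > high:
        target = current + 1  # scale out under heavy load
    elif cpu_percent < low:
        target = current - 1  # scale in when load is light
    else:
        target = current
    # Never go below the minimum or above the maximum fleet size.
    return max(min_instances, min(max_instances, target))


count = 1
history = []
for cpu in [85, 90, 75, 50, 20, 10, 5]:
    count = desired_instances(count, cpu)
    history.append(count)
print(history)
```

With the sample load figures the fleet grows one instance at a time up to the maximum, holds steady at moderate load, and then shrinks back to the minimum.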
• Two types of tier:
i. Web server tier (web server environment)
ii. Worker tier (worker environment)
2. Amazon CloudFormation
• Deployment management service by Amazon
• It creates deployments from a collection of AWS resources such as
Amazon Elastic Compute Cloud, Elastic Block Store, Amazon SNS,
Elastic Load Balancing and Auto Scaling
• The collection of resources you want to use is organized in a stack
• Stacks are created using CloudFormation templates
• You can use pre-defined templates or create your own
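A template is a JSON (or YAML) document that lists the resources in a stack. The sketch below builds a minimal one as a Python dictionary and serializes it. The logical names WebServer and AlertTopic are hypothetical, and the ImageId value is a placeholder rather than a real AMI ID, so this template would need a valid image ID before it could be used to create a stack.

```python
import json

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Sketch of a stack with one EC2 instance and one SNS topic",
    "Resources": {
        "WebServer": {                     # hypothetical logical name
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "ImageId": "ami-EXAMPLE",  # placeholder, not a real AMI ID
                "InstanceType": "t2.micro",
            },
        },
        "AlertTopic": {                    # hypothetical logical name
            "Type": "AWS::SNS::Topic",
        },
    },
}

template_json = json.dumps(template, indent=2)
resource_types = sorted(r["Type"] for r in template["Resources"].values())
print(resource_types)
```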
Identity and Access Management Services (IAM)
• Allows managing authentication and authorization of users
• Provides secure access to cloud resources
• Useful for organizations which have multiple users who access cloud
resources
• It manages: User identifiers, user permissions, security credentials,
access keys
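How user permissions translate into an access decision can be sketched with a toy policy check. The statement layout (Effect, Action, Resource) follows the shape of AWS IAM policy statements, but the users, bucket name and is_allowed function are hypothetical, and the evaluator is greatly simplified: it only models "deny by default" and "an explicit Deny overrides an Allow".

```python
from fnmatch import fnmatchcase

# Hypothetical users and policy statements for illustration only.
policies = {
    "alice": [
        {"Effect": "Allow", "Action": "s3:*",
         "Resource": "arn:aws:s3:::reports/*"},
        {"Effect": "Deny", "Action": "s3:DeleteObject", "Resource": "*"},
    ],
    "bob": [
        {"Effect": "Allow", "Action": "s3:GetObject",
         "Resource": "arn:aws:s3:::reports/*"},
    ],
}


def is_allowed(user, action, resource):
    """Deny by default; an explicit Deny overrides any Allow."""
    allowed = False
    for stmt in policies.get(user, []):
        matches = (fnmatchcase(action, stmt["Action"])
                   and fnmatchcase(resource, stmt["Resource"]))
        if matches:
            if stmt["Effect"] == "Deny":
                return False
            allowed = True
    return allowed


print(is_allowed("alice", "s3:GetObject", "arn:aws:s3:::reports/q1.csv"))
print(is_allowed("alice", "s3:DeleteObject", "arn:aws:s3:::reports/q1.csv"))
```

A user with no matching statement, or no policy at all, is refused access.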
1. Amazon Identity & Access Management
2. Windows Azure Active Directory
Open Source Private Cloud Software
• Open source cloud software that is used to build private clouds
1. CloudStack
2. Eucalyptus
3. OpenStack
1. CloudStack
• Apache CloudStack is open source cloud software for creating
private clouds
• It manages the network, storage and computing nodes (i.e. the whole
infrastructure)
• The infrastructure can be as small as one host running the hypervisor
or as large as a cluster of hundreds of hosts
• A management server manages one or more zones, where each zone
can be a single datacenter
• Each zone has one or more pods
• A pod is a rack of hardware comprising a switch and one or more
clusters
• A cluster consists of one or more hosts and primary storage
• Primary storage – Stores disk volumes for all virtual machines
running on hosts in the cluster
• Secondary storage – Stores templates, ISO images and disk volume
snapshots
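The zone, pod, cluster and host hierarchy above can be modelled with a few data classes. This is only a sketch of the containment relationships: the class and field names are hypothetical rather than CloudStack's own, and attaching secondary storage to the zone is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cluster:
    name: str
    hosts: List[str]
    primary_storage: str  # holds disk volumes for VMs in this cluster


@dataclass
class Pod:
    name: str
    clusters: List[Cluster] = field(default_factory=list)


@dataclass
class Zone:
    name: str
    pods: List[Pod] = field(default_factory=list)
    secondary_storage: str = ""  # assumption: one store per zone


def hosts_in_zone(zone):
    """Walk zone -> pods -> clusters -> hosts and collect every host name."""
    return [h for pod in zone.pods for c in pod.clusters for h in c.hosts]


zone = Zone(
    name="datacenter-1",
    secondary_storage="nfs-templates",
    pods=[
        Pod("rack-1", [Cluster("c1", ["host-a", "host-b"], "san-1")]),
        Pod("rack-2", [Cluster("c2", ["host-c"], "san-2")]),
    ],
)
print(hosts_in_zone(zone))
```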
2. Eucalyptus
• Open source software for building private and hybrid clouds that are
compatible with the AWS API
Fig. Eucalyptus Architecture
• Node Controller (NC): Hosts the virtual machine instances and
manages the virtual network endpoints
• Cluster Controller (CC): Manages the virtual machines and is the front
end for a cluster
• Storage Controller (SC): Manages Eucalyptus block volumes and
snapshots for the instances within a specific cluster. This is
equivalent to Amazon Elastic Block Store (EBS)
• Cloud Controller (CLC): Provides an administrative interface for cloud
management and performs high-level resource scheduling, system
accounting, authentication and quota management
• Walrus: Equivalent to Amazon S3; serves as persistent storage
• VMware Broker: An optional component that provides an
AWS-compatible interface for VMware environments
3. OpenStack
• It is a cloud operating system comprising a collection of interacting
services that control computing, storage and networking resources
• Nova-compute: Compute service manages networks of virtual
machines running on nodes, providing virtual servers on demand
• Nova-networking: Provides connectivity between the interfaces of
network service
• Cinder: Volume service manages storage volumes for virtual
machines
• Swift: Object storage service allows users to store and retrieve files
• Keystone: Identity service for authentication & authorization
• Glance: Image registry acts as a catalog and repository for virtual
machine images
• Nova-scheduler: The OpenStack scheduler maps nova-api calls to the
appropriate OpenStack components
• The messaging service acts as a central node for message passing
• Orchestration activities such as running an instance are performed
by nova-api