
UNIT-3

Explain the NIST Reference Architecture model in detail?

● NIST stands for National Institute of Standards and Technology

● The goal is to achieve effective and secure cloud computing so as to reduce cost
and improve services

● NIST formed five major working groups specific to cloud computing

○ Cloud computing target business use cases work group


○ Cloud computing Reference architecture and Taxonomy work group
○ Cloud computing standards roadmap work group
○ Cloud computing SAJACC (Standards Acceleration to Jumpstart
Adoption of Cloud Computing) work group
○ Cloud Computing security work group

● Objectives of NIST Cloud Computing reference architecture


○ Illustrate and understand the various levels of services
○ Provide a technical reference
○ Categorize and compare cloud computing services
○ Analyze security, interoperability and portability

● In general, NIST produces reports for future reference, which include surveys and
analyses of existing cloud computing reference models, vendors and federal
agencies.

● The conceptual reference architecture shown in Figure 3.2 involves five
actors. Each actor is an entity that participates in cloud computing
● Cloud consumer: A person or an organization that maintains a business
relationship with, and uses services from, cloud providers

Figure 3.2 Conceptual reference model. The five actors are the Cloud Consumer,
Cloud Provider, Cloud Auditor (security audit, privacy impact audit, performance
audit), Cloud Broker (service intermediation, service aggregation, service
arbitrage) and Cloud Carrier. Within the Cloud Provider, service orchestration
spans the service layer (SaaS, PaaS, IaaS), the resource abstraction and control
layer and the physical resource layer, while cloud service management covers
business support, provisioning/configuring and portability/interoperability,
alongside security and privacy.

● Cloud provider: A person, organization or entity responsible for making a


service available to interested parties

● Cloud auditor: A party that conducts independent assessments of cloud
services, information system operations, performance and security of the cloud
implementation

● Cloud broker: An entity that manages the performance and delivery of cloud
services and negotiates relationships between cloud providers and consumers.

● Cloud carrier: An intermediary that provides connectivity and transport of


cloud services from cloud providers to consumers.
Figure 3.3 Interaction between actors (Consumer, Provider, Broker and Auditor)

● Figure 3.3 illustrates the common interaction that exists between cloud
consumer and provider, where the broker provides services to the consumer
and the auditor collects the audit information.

● The interactions between the actors may lead to different use case scenarios.

● Figure 3.4 shows one kind of scenario in which the Cloud consumer may
request service from a cloud broker instead of contacting service provider
directly. In this case, a cloud broker can create a new service by combining
multiple services.

Figure 3.4 Service from a Cloud Broker (the Consumer contacts the Broker, which
combines services from Provider 1 and Provider 2)

● Figure 3.5 illustrates the use of different kinds of Service Level
Agreements (SLAs) between consumer, provider and carrier.

Figure 3.5 Multiple SLAs between actors (SLA #1, between consumer and provider,
maintains a consistent level of service; SLA #2, between provider and carrier,
specifies capacity and functionality)


● Figure 3.6 shows the scenario where the Cloud auditor conducts independent
assessment of operation and security of the cloud service implementation.

Figure 3.6 Independent assessments by the cloud auditor (the Auditor assesses
the service interaction between Consumer and Provider)

● The cloud consumer is the principal stakeholder for the cloud computing service
and requires service level agreements to specify the performance
requirements to be fulfilled by the cloud provider.

What is the importance of cloud storage and what are the different types of cloud storage?

In cloud computing, cloud storage is a virtual locker where we can remotely stash any data. When we upload a file
to a cloud-based server such as Google Drive, OneDrive, or iCloud, that file is copied over the Internet into a data
server; the cloud is actual physical space where companies store files on multiple hard drives.

There are 3 types of storage systems in the Cloud as follows.

 Block-Based Storage System

 File-Based Storage System

 Object-Based Storage System

Let’s discuss them one by one as follows.

1. Block-Based Storage System –

 Hard drives are block-based storage systems. Your operating system, such as Windows or Linux, sees a
hard disk drive: a drive on which you can create a volume, and then partition and format that
volume.

 For example, If a system has 1000 GB of volume, then we can partition it into 800 GB and 200 GB for local
C and local D drives respectively.
 Remember, with a block-based storage system your computer sees a drive on which you can create
volumes and partitions.

2. File-Based Storage System –

 In this, you are actually connecting through a Network Interface Card (NIC). You are going over a network,
and then you can access the network-attached storage server (NAS). NAS devices are file-based storage
systems.

 This storage server is another computing device with its own disks. It has already created a file
system, already formatted its partitions, and it shares its file systems over the network. Here,
you can map a drive to its network location.

 Here, unlike with block storage, there is no need for the user to partition and format the volume; that is
already done in file-based storage systems. So the operating system sees a file system that is mapped to a local
drive letter.

3. Object-Based Storage System –

 In this, a user uploads objects using a web browser to a container, i.e., an Object
Storage Container. This uses the HTTP protocol with REST APIs (for example: GET, PUT, POST,
DELETE).

 For example, when you connect to any website, you need to download images, text, or whatever else the
website contains; the browser issues an HTTP GET request for each. If you want to submit a product review,
PUT and POST requests are used.

 Also, there is no hierarchy of objects in the container. Every file is on the same level in an Object-Based
storage system.
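
To make the request flow concrete, here is a minimal sketch in Python using the `requests` library against a hypothetical object-storage endpoint (the URL, container name and object key are placeholders, not a real service):

```python
import requests

# Hypothetical endpoint and container; real services (S3, Swift, ...)
# differ in URL layout and authentication.
BASE = "https://objects.example.com/mycontainer"

# PUT uploads an object under a flat key (there is no folder hierarchy).
with open("cat.jpg", "rb") as f:
    requests.put(f"{BASE}/photos-2024-cat.jpg", data=f,
                 headers={"Content-Type": "image/jpeg"}).raise_for_status()

# GET retrieves the object back over plain HTTP.
resp = requests.get(f"{BASE}/photos-2024-cat.jpg")
resp.raise_for_status()
with open("cat_copy.jpg", "wb") as f:
    f.write(resp.content)

# DELETE removes the object from the container.
requests.delete(f"{BASE}/photos-2024-cat.jpg").raise_for_status()
```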

Advantages of Cloud Storage

 Scalability – Capacity and storage can be expanded and performance can be enhanced.

 Flexibility – Data can be manipulated and scaled according to the rules.

 Simpler Data Migrations – Capacity can be added and removed as required, which eliminates
disruptive data migrations.

 Recovery – In the event of a hard drive failure or other hardware malfunction, you can still access your files in
the cloud.
Best Cloud Storage Providers

Below are the various cloud storage providers:

1. Dropbox

Ideal for users with limited data. This platform offers a state-of-the-art workspace to store files in a
central location. You can access it from any geographical location, 24x7. Computers, mobiles and
tablets can all access files in Dropbox.

Features:

 It can be employed by users of any size.

 Permits you to share files with anybody.

 Admin controls make team management a breeze

 Shared data is secured.

Disadvantage:

 Plans start off with a mere 2GB of free data which may be insufficient for many.

Pricing:

 2GB is offered at no charge. There are two plans for individuals as well as two plans for teams. The two
individual plans are termed Plus and Professional. In the Plus plan, the user is charged $8.25 a month. In
the Professional plan, the user is charged $16.58 a month.

 The two team plans are termed Standard and Advanced. In the Standard plan, the user is charged $12.50 a
month. In the Advanced plan, the user is charged $20 a month. A free trial is offered for the
Advanced, Professional and Standard plans. No free trial exists for the Plus plan.

2. Microsoft OneDrive

OneDrive is a built-in feature of Windows 10’s File Explorer. To use it you need not download any extra app,
so it is extremely convenient for Windows 10 users. Microsoft’s Photos app has the option of
employing OneDrive in order to sync images across all your devices. Recently AutoCAD support has been
added to OneDrive, a move that may please as well as attract AutoCAD users. The Personal Vault feature
provides an extra layer of security. An app exists for iOS as well as Android devices. There is also a handy
app in the App Store meant for Mac users.

Pricing:
 OneDrive offers 5GB at no cost. A 100GB tier is offered at $1.99 per month, and 1TB at $7 per
month. OneDrive for Business provides unlimited storage at $10 per month.

3. Google Drive

Google Drive is one of today’s most powerful cloud storage services. To use it, however, is a bit different
from what you might be used to, so here we see the advantages and disadvantages of Google Drive.

Advantages:

 Huge free storage space is offered.

 State-of-the-art productivity suite for collaboration purposes.

 Comes with desktop-to-desktop file syncing functionality

 A plethora of third-party integrations

 Apps designed for different platforms

Disadvantages:

 Everything is stored locally by the consumer desktop utility

 Privacy is a major and relevant concern

 The productivity applications fare poorly when benchmarked against contemporary Microsoft Office.

 Lacks password protection that is much needed for shared files.

Overall, Google Drive is a slick, feature-packed cloud storage provider with the added bonus of huge free
storage. If you are looking for superior collaboration capabilities along with storage, then this is the
right choice.

4. iCloud

This platform’s claim to fame is that it is the ideal cloud storage provider for Apple
aficionados, and it also works great for private users. The platform runs on operating
systems including iOS, macOS and Windows.
Features:

 Its claim to fame is that it is the prestigious Apple company’s proprietary cloud storage platform.

 Users can collaborate in applications that include Notes, Keynote and Pages.

 Resumes each conversation from the point where it was paused

 Content carries over seamlessly when the user changes phones.


Explain the Amazon S3 service model in detail?

● The best-known cloud storage service is Amazon’s Simple Storage Service
(S3), which was launched in 2006.

● Amazon S3 is designed to make web scale computing easier for developers.

● Amazon S3 provides a simple web services interface that can be used to


store and retrieve any amount of data, at any time, from anywhere on
the Web.

● It gives any developer access to the same highly scalable data storage
infrastructure that Amazon uses to run its own global network of web
sites.

● The service aims to maximize benefits of scale and to pass those


benefits on to developers.

● Amazon S3 is intentionally built with a minimal feature set that includes


the following functionality:

○ Write, read, and delete objects containing from 1 byte to 5


gigabytes of data each. The number of objects that can be stored
is unlimited.
○ Each object is stored and retrieved via a unique, developer-assigned key.
○ Objects can be made private or public and rights can be assigned
to specific users.
○ Uses standards-based REST and SOAP interfaces designed to work
with any Internet development toolkit.
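
As an illustration of these operations, below is a minimal sketch using the AWS boto3 SDK; the bucket name and key are placeholders, and the code assumes AWS credentials are already configured in the environment:

```python
import boto3

s3 = boto3.client("s3")          # credentials come from the environment
BUCKET = "example-bucket"        # placeholder bucket name

# Write: store an object under a developer-assigned key.
s3.put_object(Bucket=BUCKET, Key="notes/hello.txt", Body=b"hello, S3")

# Rights: objects can be made private or public via ACLs.
s3.put_object_acl(Bucket=BUCKET, Key="notes/hello.txt", ACL="private")

# Read: retrieve the object by the same key.
body = s3.get_object(Bucket=BUCKET, Key="notes/hello.txt")["Body"].read()

# Delete: remove the object.
s3.delete_object(Bucket=BUCKET, Key="notes/hello.txt")
```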
● Design requirements: Amazon built S3 to fulfill the following design
requirements:

○ Scalable: Amazon S3 can scale in terms of storage, request rate


and users to support an unlimited number of web-scale
applications.
○ Reliable: Store data durably with 99.99 percent availability. Amazon
says it does not allow any downtime.
○ Fast: Amazon S3 was designed to be fast enough to support high-
performance applications. Server-side latency must be insignificant
relative to Internet latency.
○ Inexpensive: Amazon S3 is built from inexpensive commodity
hardware components.
○ Simple: Building highly scalable, reliable, fast and inexpensive storage is
difficult; doing so in a way that is easy to use anywhere is harder still, and S3 aims to do both.

● Design principles: Amazon used the following principles of distributed
system design to meet the Amazon S3 requirements:

○ Decentralization: It uses fully decentralized techniques to remove


scaling bottlenecks and single points of failure.
○ Autonomy: The system is designed such that individual components
can make decisions based on local information.
○ Local responsibility: Each individual component is responsible for
achieving its consistency. This is never the burden of its peers.
○ Controlled concurrency: Operations are designed such that no or
limited concurrency control is required.
○ Failure toleration: The system considers the failure of components to
be a normal mode of operation and continues operation with no or
minimal interruption.
○ Controlled parallelism: Abstractions used in the system are of such
granularity that parallelism can be used to improve performance and
robustness of recovery or the introduction of new nodes.
○ Symmetry: Nodes in the system are identical in terms of functionality,
and require no or minimal node specific configuration to function.
○ Simplicity: The system should be made as simple as possible, but no
simpler.
● Amazon discloses little about how S3 works internally, but according to
Amazon, S3’s design aims to provide scalability, high availability, and low
latency at commodity costs.
● S3 stores arbitrary objects of up to 5GB in size, and each is accompanied by
up to 2KB of metadata.

● Objects are organized into buckets.

● Each bucket is owned by an AWS account, and buckets are identified
by a unique, user-assigned key.

● Buckets and objects are created, listed and retrieved using either a
REST or SOAP interface.

● Objects can also be retrieved using the HTTP GET interface or via BitTorrent.

● An access control list restricts who can access the data in each bucket.

● Bucket names and keys are formulated so that they can be accessed using
HTTP.

● Requests are authorized using an access control list associated with each
bucket and object, for instance:
http://s3.amazonaws.com/samplebucket/samplekey

● The Amazon AWS authentication tools allow the bucket owner to
create an authenticated URL that is valid for a set amount of time
(a sketch follows).
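
In boto3, this corresponds to generating a presigned URL; a minimal sketch (bucket and key are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# The URL embeds an authentication signature and expires after 3600 seconds.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "example-bucket", "Key": "notes/hello.txt"},
    ExpiresIn=3600,
)
print(url)  # anyone holding this URL can GET the object until it expires
```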

● Bucket items can also be accessed via a BitTorrent feed, enabling S3 to


act as a seed for the client.

● Buckets can also be set up to save HTTP log information to another bucket.

● This information can be used for later data mining.


Explain the service models in detail?

● The development of cloud computing introduced the concept of Everything as
a Service (XaaS). This is one of the most important elements of cloud
computing.

● Cloud services from different providers can be combined to provide a


completely integrated solution covering all the computing stack of a system.

● IaaS providers can offer the bare metal in terms of virtual machines where
PaaS solutions are deployed.

● When there is no need for a PaaS layer, it is possible to directly customize the
virtual infrastructure with the software stack needed to run applications.

● This is the case of virtual Web farms: a distributed system composed of Web
servers, database servers and load balancers on top of which prepackaged
software is installed to run Web applications.
● Other solutions provide prepackaged system images that already contain the
software stack required for the most common uses: Web servers, database
servers or LAMP stacks.

● Besides the basic virtual machine management capabilities, additional


services can be provided, generally including the following:

○ SLA/resource-based allocation
○ Workload management
○ Support for infrastructure design through advanced Web interfaces
○ Integration of third-party IaaS solutions

● Figure 3.11 provides an overall view of the components forming an


Infrastructure as a Service solution.

● It is possible to distinguish three principal layers:

○ Physical infrastructure
○ Software management infrastructure
○ User interface

Figure 3.11 IaaS reference implementation. Three layers: a Web-based
management interface on top; an infrastructure management layer providing
provisioning, monitoring, reservation, scheduling, QoS/billing, VM image and
VM pool services; and, at the bottom, the physical infrastructure (datacenters,
clusters, desktops), possibly extended with third-party IaaS clouds.


● At the top layer the user interface provides access to the services
exposed by the software management infrastructure.

● Such an interface is generally based on Web 2.0 technologies: Web
services, RESTful APIs and mashups.

● Web services and RESTful APIs allow programs to interact with the
service without human intervention, thus providing complete integration
within a software system.

● The core features of an IaaS solution are implemented in the infrastructure


management software layer.

● In particular, management of the virtual machines is the most


important function performed by this layer.

● A central role is played by the scheduler, which is in charge of allocating the


execution of virtual machine instances.

● The scheduler interacts with other components such as the following
(a toy sketch follows the list):
○ Pricing and billing component


○ Monitoring component
○ Reservation component
○ QoS/SLA management component
○ VM repository component
○ VM pool manager component
○ Provisioning component
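
To illustrate the scheduler's role, here is a toy Python sketch (not any vendor's actual implementation) that places VM requests on hosts using a simple first-fit policy; the points where the other components would be invoked are marked as comments:

```python
from dataclasses import dataclass, field

@dataclass
class Host:
    name: str
    free_cpus: int
    free_mem_gb: int
    vms: list = field(default_factory=list)

def schedule(hosts, vm_name, cpus, mem_gb):
    """First-fit placement: pick the first host with enough capacity."""
    for host in hosts:
        if host.free_cpus >= cpus and host.free_mem_gb >= mem_gb:
            host.free_cpus -= cpus
            host.free_mem_gb -= mem_gb
            host.vms.append(vm_name)
            # A real scheduler would now call the provisioning component
            # to start the VM, and notify monitoring and billing.
            return host
    # No capacity: the reservation or VM pool manager would be consulted.
    raise RuntimeError(f"no host can fit {vm_name}")

pool = [Host("node1", 8, 32), Host("node2", 16, 64)]
print(schedule(pool, "vm-web-01", 4, 8).name)   # -> node1
print(schedule(pool, "vm-db-01", 8, 16).name)   # -> node2
```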

● The bottom layer is composed of the physical infrastructure, on top


of which the management layer operates.
● From an architectural point of view, the physical layer also includes the
virtual resources that are rented from external IaaS providers.

● In the case of complete IaaS solutions, all three levels are offered as service.

● This is generally the case with public cloud vendors such as Amazon, GoGrid,
Joyent, Rightscale, Terremark, Rackspace, ElasticHosts, and Flexiscale, which
own large datacenters and give access to their computing infrastructures
using an IaaS approach.

3.4.1 IaaS

● Infrastructure or Hardware as a Service (IaaS/HaaS) solutions are the most


popular and developed market segment of cloud computing.

● They deliver customizable infrastructure on demand.

● The available options within the IaaS offering umbrella range from single
servers to entire infrastructures, including network devices, load balancers,
database servers and Web servers.

● The main technology used to deliver and implement these solutions is


hardware virtualization: one or more virtual machines opportunely configured
and interconnected define the distributed system on top of which applications
are installed and deployed.

● Virtual machines also constitute the atomic components that are deployed
and priced according to the specific features of the virtual hardware:
memory, number of processors and disk storage.

● IaaS/HaaS solutions bring all the benefits of hardware virtualization: workload


partitioning, application isolation, sandboxing and hardware tuning.
● From the perspective of the service provider, IaaS/HaaS allows better
exploitation of the IT infrastructure and provides a more secure environment
for executing third-party applications.

● From the perspective of the customer, it reduces the administration and


maintenance cost as well as the capital costs allocated to purchase
hardware.

● At the same time, users can take advantage of the full customization offered
by virtualization to deploy their infrastructure in the cloud.

3.4.2 PaaS

● Platform as a Service (PaaS) solutions provide a development and


deployment platform for running applications in the cloud.

● They constitute the middleware on top of which applications are built.

● A general overview of the features characterizing the PaaS approach is given


in Figure 3.12.

Figure 3.12 PaaS reference implementation. A Web-based management interface
and programming APIs/libraries sit on top of the PaaS core middleware, which
provides elasticity, runtime and resource management, QoS/billing, and
application and user management; the middleware runs on a physical
infrastructure (datacenters, clusters, desktops) or on an underlying IaaS cloud.

● Application management is the core functionality of the middleware.


● PaaS implementations provide applications with a runtime environment and
do not expose any service for managing the underlying infrastructure.

● They automate the process of deploying applications to the infrastructure,


configuring application components, provisioning and configuring supporting
technologies such as load balancers and databases and managing system
change based on policies set by the user.

● The core middleware is in charge of managing the resources and scaling


applications on demand or automatically, according to the commitments
made with users.

● From a user point of view, the core middleware exposes interfaces that allow
programming and deploying applications on the cloud.

● Some implementations provide a completely Web based interface hosted in


the cloud and offering a variety of services.

● It is possible to find integrated development environments based on 4GL and
visual programming concepts, or rapid prototyping environments where
applications are built by assembling mashups and user-defined components
and subsequently customized.

● Other implementations of the PaaS model provide a complete object model


for representing an application and provide a programming language-based
approach.

● Developers generally have the full power of programming languages such as
Java, .NET, Python and Ruby, with some restrictions imposed to provide better
scalability and security.

● PaaS solutions can offer middleware for developing applications together with
the infrastructure or simply provide users with the software that is installed
on the user premises.
● In the first case, the PaaS provider also owns large datacenters where
applications are executed

● In the second case, referred to in this book as Pure PaaS, the middleware
constitutes the core value of the offering.

● PaaS implementations are classified into three broad categories:

○ PaaS-I
○ PaaS-II
○ PaaS-III

● The first category identifies PaaS implementations that completely


follow the cloud computing style for application development and
deployment.

○ They offer an integrated development environment hosted within the


Web browser where applications are designed, developed, composed,
and deployed.
○ This is the case of Force.com and Longjump. Both deliver as platforms
the combination of middleware and infrastructure.

● The second class focuses on providing a scalable infrastructure for Web
applications, mostly websites.

○ In this case, developers generally use the provider’s APIs, which are
built on top of industrial runtimes, to develop applications.
○ Google AppEngine is the most popular product in this category (a minimal handler sketch follows this list).
○ It provides a scalable runtime based on the Java and Python
programming languages, which have been modified for providing a
secure runtime environment and enriched with additional APIs and
components to support scalability.
○ AppScale, an open-source implementation of Google AppEngine,
provides interface-compatible middleware that has to be installed on a
physical infrastructure.
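
For flavor, a request handler in the classic App Engine Python runtime looked roughly like the sketch below (it uses the webapp2 framework bundled with that runtime; treat it as illustrative rather than a complete application):

```python
import webapp2  # bundled with the classic App Engine Python runtime

class MainPage(webapp2.RequestHandler):
    def get(self):
        # The platform scales instances of this handler automatically;
        # the developer writes only the request-handling logic.
        self.response.headers["Content-Type"] = "text/plain"
        self.response.write("Hello from App Engine")

# App Engine routes incoming requests to this WSGI application.
app = webapp2.WSGIApplication([("/", MainPage)])
```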

● The third category consists of all those solutions that provide a cloud
programming platform for any kind of application, not only Web
applications.

○ Among these, the most popular is Microsoft Windows Azure, which


provides a comprehensive framework for building service-oriented
cloud applications on top of the .NET technology, hosted on Microsoft’s
datacenters.
○ Other solutions in the same category, such as Manjrasoft Aneka,
Apprenda SaaSGrid, Appistry Cloud IQ Platform, DataSynapse, and
GigaSpaces DataGrid, provide only middleware with different
services.

● Some essential characteristics that identify a PaaS solution:

○ Runtime framework: This framework represents the software stack of


the PaaS model and the most intuitive aspect that comes to people’s
minds when they refer to PaaS solutions.
○ Abstraction: PaaS solutions are distinguished by the higher level of
abstraction that they provide.
○ Automation: PaaS environments automate the process of deploying
applications to the infrastructure, scaling them by provisioning
additional resources when needed.
○ Cloud services: PaaS offerings provide developers and architects with
services and APIs, helping them to simplify the creation and delivery of
elastic and highly available cloud applications.

3.4.3 SaaS

● Software as a Service (SaaS) is a software delivery model that provides


access to applications through the Internet as a Web based service.
● It provides a means to free users from complex hardware and software
management by offloading such tasks to third parties, which build
applications accessible to multiple users through a Web browser.

● On the provider side, the specific details and features of each customer’s
application are maintained in the infrastructure and made available on
demand.

● The SaaS model is appealing for applications serving a wide range of users
and that can be adapted to specific needs with little further customization.

● This requirement characterizes SaaS as a one-to-many software delivery


model, whereby an application is shared across multiple users.

● This is the case for CRM and ERP applications, which constitute common
needs for almost all enterprises, from small and medium-sized to large
businesses.

● Every enterprise will have the same requirements for the basic features
concerning CRM and ERP and different needs can be satisfied with further
customization.

● SaaS applications are naturally multitenant.

● Multitenancy, which distinguishes SaaS from traditional packaged
software, allows providers to centralize and sustain the effort of managing
large hardware infrastructures, to maintain and upgrade applications
transparently to users, and to optimize resources by sharing the costs
among the large user base.

● On the customer side, such costs constitute a minimal fraction of the usage
fee paid for the software.

● The analysis carried out by the Software and Information Industry Association
(SIIA) was mainly oriented toward application service providers (ASPs) and
all their variations, which capture the concept of software applications
consumed as a service in a broader sense.

● ASPs already had some of the core characteristics of SaaS:

○ The product sold to customers is application access


○ The application is centrally managed
○ The service delivered is one-to-many
○ The service delivered is an integrated solution delivered on the
contract, which means provided as promised.

● ASPs provided access to packaged software solutions that addressed the


needs of a variety of customers.

● Initially this approach was affordable for service providers, but it later
became inconvenient when the cost of customizations and specializations
increased.

● The SaaS approach introduces a more flexible way of delivering application


services that are fully customizable by the user by integrating new services,
injecting their own components and designing the application and
information workflows.

● Initially the SaaS model was of interest only for lead users and early adopters.

● The benefits delivered at that stage were the following:

○ Software cost reduction and total cost of ownership (TCO) were


paramount
○ Service level improvements
○ Rapid implementation
○ Standalone and configurable applications
○ Rudimentary application and data integration
○ Subscription and pay as you go (PAYG) pricing
● With the advent of cloud computing there has been an increasing
acceptance of SaaS as a viable software delivery model.

● This led to the transition to SaaS 2.0, which does not introduce new
technology but transforms the way in which SaaS is used.

● In particular, SaaS 2.0 is focused on providing a more robust


infrastructure and application platforms driven by SLAs.

● SaaS 2.0 will focus on the rapid achievement of business objectives.

● Software as a Service based applications can serve different needs. CRM,


ERP, and social networking applications are definitely the most popular
ones.

● SalesForce.com is probably the most successful and popular example of a CRM


service.

● It provides a wide range of services for applications: customer relationship


and human resource management, enterprise resource planning, and
many other features.

● SalesForce.com builds on top of the Force.com platform, which provides a


fully featured environment for building applications.

● In particular, through AppExchange customers can publish, search and


integrate new services and features into their existing applications.

● This makes SalesForce.com applications completely extensible and


customizable.

● Similar solutions are offered by NetSuite and RightNow.

● NetSuite is an integrated software business suite featuring financials, CRM,


inventory, and ecommerce functionalities integrated all together.
● RightNow is a customer-experience-centered SaaS application that integrates
different features, from chat to Web communities, to support the
common activities of an enterprise.

● Another important class of popular SaaS applications comprises social


networking applications such as Facebook and professional networking sites
such as LinkedIn.

● Other than providing the basic features of networking, they allow


incorporating and extending their capabilities by integrating third-party
applications.

● Office automation applications are also important representatives of SaaS
applications:

○ Google Documents and Zoho Office are examples of Web based


applications that aim to address all user needs for documents,
spreadsheets and presentation management.
○ These applications offer a Web based interface for creating, managing,
and modifying documents that can be easily shared among users and
made accessible from anywhere.

Explain the layered cloud architecture design?

It is possible to organize all the concrete realizations of cloud computing into a layered view covering the
entire stack, from hardware appliances to software systems. Cloud resources provide the “computing
horsepower” needed to deliver services. This layer is frequently built using a data center with dozens or even
millions of stacked nodes. Because it can be constructed from a range of resources, including clusters and
even networked PCs, cloud infrastructure can be heterogeneous in character. The infrastructure can also
include database systems and other storage services.
The core middleware, whose goals are to create an optimal runtime environment for applications and to
best utilize resources, manages the physical infrastructure. Virtualization technologies are employed at the
bottom of the stack to ensure runtime environment modification, application isolation, sandboxing, and
service quality. At this level, hardware virtualization is most frequently utilized. The distributed
infrastructure is exposed as a collection of virtual computers via hypervisors, which control the pool of
available resources. By adopting virtual machine technology, it is feasible to precisely divide up hardware
resources like CPU and memory as well as virtualize particular devices to accommodate user and
application needs.
Layered Architecture of Cloud

Application Layer
1. The application layer, which is at the top of the stack, is where the actual cloud apps are located.
Cloud applications, as opposed to traditional applications, can take advantage of the automatic-
scaling functionality to gain greater performance, availability, and lower operational costs.
2. This layer consists of different Cloud Services which are used by cloud users. Users can access
these applications according to their needs. Applications are divided into Execution
layers and Application layers.
3. In order for an application to transfer data, the application layer determines whether
communication partners are available. Whether enough cloud resources are accessible for the
required communication is decided at the application layer. Applications must cooperate in order to
communicate, and an application layer is in charge of this.
4. The application layer, in particular, is responsible for processing IP traffic handling protocols like
Telnet and FTP. Other examples of application layer systems include web browsers, the SNMP
protocol, and HTTP or its secured variant HTTPS.
Platform Layer
1. The operating system and application software make up this layer.
2. Users should be able to rely on the platform to provide Scalability, Dependability, and
Security Protection, giving them a space to create their apps, test operational processes, and
keep track of execution outcomes and performance. This layer is the foundation on which SaaS
applications are implemented.
3. The objective of this layer is to deploy applications directly on virtual machines.
4. Operating systems and application frameworks make up the platform layer, which is built on top of
the infrastructure layer. The platform layer’s goal is to lessen the difficulty of deploying
programs directly into VM containers.
5. By way of illustration, Google App Engine functions at the platform layer to provide API support
for implementing storage, databases, and business logic of ordinary web apps.
Infrastructure Layer
1. It is a layer of virtualization where physical resources are divided into a collection of virtual
resources using virtualization technologies like Xen, KVM, and VMware.
2. This layer serves as the Central Hub of the Cloud Environment, where resources are
constantly added utilizing a variety of virtualization techniques.
3. It is the base upon which the platform layer is created, constructed from virtualized network, storage,
and computing resources, and it gives users the flexibility they want.
4. Automated resource provisioning is made possible by virtualization, which also improves
infrastructure management.
5. The infrastructure layer, sometimes referred to as the virtualization layer, partitions the physical
resources using virtualization technologies like Xen, KVM, Hyper-V, and VMware to create a
pool of compute and storage resources.
6. The infrastructure layer is crucial to cloud computing since virtualization technologies are the only
ones that can provide many vital capabilities, like dynamic resource assignment.
Datacenter Layer
 In a cloud environment, this layer is responsible for Managing Physical Resources such as
servers, switches, routers, power supplies, and cooling systems.
 Providing end users with services requires all resources to be available and managed in data
centers.
 Physical servers connect through high-speed devices such as routers and switches to the data
center.
 In software application designs, the division of business logic from the persistent data it
manipulates is well-established. This is due to the fact that the same data cannot be incorporated
into a single application because it can be used in numerous ways to support numerous use cases.
The requirement for this data to become a service has arisen with the introduction of
microservices.
 A single database used by many microservices creates a very close coupling. As a result, it is hard
to deploy new or emerging services separately if such services need database modifications that
may have an impact on other services. A data layer containing many databases, each serving a
single microservice or perhaps a few closely related microservices, is needed to break complex
service interdependencies.
As per NIST, give the definition of Infrastructure as a Service, and write a brief note on Cloud
storage?

According to the National Institute of Standards and Technology (NIST), Infrastructure as a Service (IaaS) is
defined as:

"The capability provided to the consumer is to provision processing, storage, networks, and other fundamental
computing resources where the consumer is able to deploy and run arbitrary software, which can include
operating systems and applications. The consumer does not manage or control the underlying cloud
infrastructure but has control over operating systems, storage, and deployed applications; and possibly limited
control of select networking components (e.g., host firewalls)."

In cloud computing, cloud storage is a virtual locker where we can remotely stash any data. When we upload a file
to a cloud-based server such as Google Drive, OneDrive, or iCloud, that file is copied over the Internet into a data
server; the cloud is actual physical space where companies store files on multiple hard drives.

There are 3 types of storage systems in the Cloud as follows.

 Block-Based Storage System

 File-Based Storage System

 Object-Based Storage System

Let’s discuss them one by one as follows.

1. Block-Based Storage System –

 Hard drives are block-based storage systems. Your operating system, such as Windows or Linux, sees a
hard disk drive: a drive on which you can create a volume, and then partition and format that
volume.

 For example, If a system has 1000 GB of volume, then we can partition it into 800 GB and 200 GB for local
C and local D drives respectively.

 Remember, with a block-based storage system your computer sees a drive on which you can create
volumes and partitions.

2. File-Based Storage System –

 In this, you are actually connecting through a Network Interface Card (NIC). You are going over a network,
and then you can access the network-attached storage server (NAS). NAS devices are file-based storage
systems.
 This storage server is another computing device with its own disks. It has already created a file
system, already formatted its partitions, and it shares its file systems over the network. Here,
you can map a drive to its network location.

 Here, unlike with block storage, there is no need for the user to partition and format the volume; that is
already done in file-based storage systems. So the operating system sees a file system that is mapped to a local
drive letter.

3. Object-Based Storage System –

 In this, a user uploads objects using a web browser to a container, i.e., an Object
Storage Container. This uses the HTTP protocol with REST APIs (for example: GET, PUT, POST,
DELETE).

 For example, when you connect to any website, you need to download images, text, or whatever else the
website contains; the browser issues an HTTP GET request for each. If you want to submit a product review,
PUT and POST requests are used.

 Also, there is no hierarchy of objects in the container. Every file is on the same level in an Object-Based
storage system.

Advantages of Cloud Storage

 Scalability – Capacity and storage can be expanded and performance can be enhanced.

 Flexibility – Data can be manipulated and scaled according to the rules.

 Simpler Data Migrations – Capacity can be added and removed as required, which eliminates
disruptive data migrations.

 Recovery – In the event of a hard drive failure or other hardware malfunction, you can still access your files in
the cloud.

Disadvantages of Cloud Storage

 Data centers require electricity and a reliable internet connection to operate; without these, the
system will not work properly.

 Support for cloud storage isn’t the best, especially if you are using a free version of a cloud provider.

 When you use a cloud provider, your data is no longer on your physical storage.

 Cloud-based storage is dependent on having an internet connection. If you are on a slow network you
may have issues accessing your storage.
Describe cloud storage providers along with an example?

Cloud Storage Providers:

Cloud storage providers offer services that allow individuals and businesses to store data on remote servers
accessed via the internet. These providers maintain and manage the storage infrastructure, including
hardware, security, and network, allowing users to focus on accessing and managing their data without
worrying about the underlying infrastructure.

Key Features of Cloud Storage Providers:

1. Scalability: Users can easily scale their storage needs up or down based on demand.

2. Accessibility: Data can be accessed from anywhere with an internet connection, making it ideal for
remote work and global collaboration.

3. Reliability: Cloud storage providers typically offer high availability with data replication across multiple
locations, reducing the risk of data loss.

4. Security: Providers implement advanced security measures such as encryption, access control, and
authentication to protect data.

5. Cost-Effectiveness: Users pay for the storage they use, which can be more economical compared to
maintaining on-premises storage infrastructure.

Example of a Cloud Storage Provider:

Amazon Simple Storage Service (Amazon S3):

● The best-known cloud storage service is Amazon’s Simple Storage Service
(S3), which was launched in 2006.

● Amazon S3 is designed to make web scale computing easier for developers.

● Amazon S3 provides a simple web services interface that can be used to


store and retrieve any amount of data, at any time, from anywhere on
the Web.

● It gives any developer access to the same highly scalable data storage
infrastructure that Amazon uses to run its own global network of web
sites.

● The service aims to maximize benefits of scale and to pass those


benefits on to developers.

● Amazon S3 is intentionally built with a minimal feature set that includes


the following functionality:

○ Write, read, and delete objects containing from 1 byte to 5


gigabytes of data each. The number of objects that can be stored
is unlimited.
○ Each object is stored and retrieved via a unique, developer-assigned key.
○ Objects can be made private or public and rights can be assigned
to specific users.
○ Uses standards-based REST and SOAP interfaces designed to work
with any Internet development toolkit.
● Design requirements: Amazon built S3 to fulfill the following design
requirements:

○ Scalable: Amazon S3 can scale in terms of storage, request rate


and users to support an unlimited number of web-scale
applications.
○ Reliable: Store data durably with 99.99 percent availability. Amazon
says it does not allow any downtime.
○ Fast: Amazon S3 was designed to be fast enough to support high-
performance applications. Server-side latency must be insignificant
relative to Internet latency.
○ Inexpensive: Amazon S3 is built from inexpensive commodity
hardware components.
○ Simple: Building highly scalable, reliable, fast and inexpensive storage is
difficult; doing so in a way that is easy to use anywhere is harder still, and S3 aims to do both.

● Design principles: Amazon used the following principles of distributed
system design to meet the Amazon S3 requirements:

○ Decentralization: It uses fully decentralized techniques to remove


scaling bottlenecks and single points of failure.
○ Autonomy: The system is designed such that individual components
can make decisions based on local information.
○ Local responsibility: Each individual component is responsible for
achieving its consistency. This is never the burden of its peers.
○ Controlled concurrency: Operations are designed such that no or
limited concurrency control is required.
○ Failure toleration: The system considers the failure of components to
be a normal mode of operation and continues operation with no or
minimal interruption.
○ Controlled parallelism: Abstractions used in the system are of such
granularity that parallelism can be used to improve performance and
robustness of recovery or the introduction of new nodes.
○ Symmetry: Nodes in the system are identical in terms of functionality,
and require no or minimal node specific configuration to function.
○ Simplicity: The system should be made as simple as possible, but no
simpler.

● Amazon discloses little about how S3 works internally, but according to
Amazon, S3’s design aims to provide scalability, high availability, and low
latency at commodity costs.
● S3 stores arbitrary objects of up to 5GB in size, and each is accompanied by
up to 2KB of metadata.

● Objects are organized into buckets.

● Each bucket is owned by an AWS account, and buckets are identified
by a unique, user-assigned key.

● Buckets and objects are created, listed and retrieved using either a
REST or SOAP interface.

● Objects can also be retrieved using the HTTP GET interface or via BitTorrent.

● An access control list restricts who can access the data in each bucket.

● Bucket names and keys are formulated so that they can be accessed using
HTTP.

● Requests are authorized using an access control list associated with each
bucket and object, for instance:
http://s3.amazonaws.com/samplebucket/samplekey

● The Amazon AWS authentication tools allow the bucket owner to
create an authenticated URL that is valid for a set amount of time.

● Bucket items can also be accessed via a BitTorrent feed, enabling S3 to


act as a seed for the client.

● Buckets can also be set up to save HTTP log information to another bucket.

● This information can be used for later data mining.


Discuss the challenges of architectural design?
Challenge 1 : Service Availability and Data Lock-in Problem
Service Availability
Service availability in the cloud might be affected because of:
Single Point of Failure
Distributed Denial of Service
Single Point of Failure
Depending on a single service provider might result in failure.
In the case of a single service provider, even if the company has multiple data centers located in different
geographic regions, it may have common software infrastructure and accounting systems.
Solution:
Multiple cloud providers may provide more protection from failures, and they provide High Availability
(HA).
Multiple cloud providers also guard against the loss of all data.

Distributed Denial of service (DDoS) attacks.


 Cyber criminals, attack target websites and online services and makes services unavailable to
users.
 A DDoS attack tries to make services unavailable to users by generating more traffic
than the server or network can accommodate.
Solution:
Some SaaS providers provide the opportunity to defend against DDoS attacks by using quick scale-ups.
Customers cannot easily extract their data and programs from one site to run on another.
Solution:
Have standardization among service providers so that customers can deploy (install) services and data
across multiple cloud providers.

Data Lock-in
Data lock-in is a situation in which a customer using the services of a provider cannot move to another service
provider because the technologies used by the provider are incompatible with those of other providers.
This makes the customer dependent on the vendor for services and unable to use the services of
another vendor.
Solution:
Have standardization (in technologies) among service providers so that customers can easily move from a
service provider to another.

Challenge 2: Data Privacy and Security Concerns


Cloud services are prone to attacks because they are accessed through internet.
Security is given by
o Storing the encrypted data in to cloud.
o Firewalls, filters.
Cloud environment attacks include
o Guest hopping
o Hijacking
o VM rootkits.
Guest Hopping: Virtual machine hyper jumping (VM jumping) is an attack method that exploits a
hypervisor weakness allowing one virtual machine (VM) to be accessed from another.
Hijacking: a type of network security attack in which the attacker takes control of a communication.
VM Rootkit: a collection of malicious (harmful) software designed to enable access to a computer
that is not otherwise allowed.
A man-in-the-middle (MITM) attack is a form of eavesdropping (spying) where communication between
two users is monitored and modified by an unauthorized party.
o Man-in-the-middle attacks may take place during VM migrations [in a virtual machine (VM) migration, a VM
is moved from one physical host to another].
Passive attacks steal sensitive data or passwords.
Active attacks may manipulate (control) kernel data structures, which can cause major damage to cloud
servers.
Challenge 3: Unpredictable Performance and Bottlenecks
Multiple VMs can share CPUs and main memory in cloud computing, but I/O sharing is problematic.
Internet applications continue to become more data-intensive (handles huge amount of data).
Handling huge amounts of data (data-intensive workloads) is a bottleneck in the cloud environment.
Weak servers that do not handle data transfers properly must be removed from the cloud environment.
Challenge 4: Distributed Storage and Widespread Software Bugs
The database is always growing in cloud applications.
There is a need to create a storage system that meets this growth.
This demands the design of efficient distributed SANs (Storage Area Network of Storage devices).
Data centres must meet
o Scalability
o Data durability
o HA(High Availability)
o Data consistence
Bug refers to errors in software.
Debugging must be done in data centres.

Challenge 5: Cloud Scalability, Interoperability and Standardization


Cloud Scalability
Cloud resources are scalable. Costs increase when storage and network bandwidth are scaled up (increased).
Interoperability
Open Virtualization Format (OVF) describes an open, secure, portable, efficient, and extensible format for
the packaging and distribution of VMs.
OVF defines a transport mechanism for VM, that can be applied to different virtualization platforms
Standardization
Cloud standardization should give a virtual machine the ability to run on any virtualization platform.

Challenge 6: Software Licensing and Reputation Sharing


Cloud providers can use both pay-for-use and bulk-use licensing schemes to widen the business coverage.
Cloud providers must create reputation-guarding services similar to the “trusted e-mail” services
Cloud providers want legal liability to remain with the customer, and vice versa.
Describe public, private and hybrid clouds

1. Public Cloud:
Definition: The public cloud is a cloud computing model where services and infrastructure are provided
over the internet by third-party providers. These services are available to the general public, businesses,
and other organizations on a pay-as-you-go basis.
Key Features:
 Shared Resources: Infrastructure and resources are shared among multiple users (tenants),
making it cost-effective.
 Scalability: Public clouds offer virtually unlimited scalability, allowing users to scale up or down
quickly based on demand.
 Accessibility: Resources are accessible over the internet, from anywhere, and on any device.
 Cost-Efficiency: Users only pay for the resources they consume, avoiding the cost of maintaining
on-premises hardware.
Examples:
 Amazon Web Services (AWS)
 Microsoft Azure
 Google Cloud Platform (GCP)
2. Private Cloud:
Definition: A private cloud is a cloud computing model where the infrastructure and services are
dedicated to a single organization. The cloud environment can be managed internally by the organization
or externally by a third-party provider but is not shared with other users.
Key Features:
 Dedicated Resources: The infrastructure is dedicated to a single organization, providing higher
levels of control and security.
 Customization: Private clouds can be tailored to meet the specific needs of an organization,
including security, compliance, and performance requirements.
 Security and Compliance: Enhanced security features and the ability to meet strict regulatory
compliance needs make private clouds ideal for sensitive data.
 Control: Organizations have full control over their infrastructure, including hardware, software,
and network configurations.
Examples:
 VMware vCloud
 Microsoft Azure Stack
 OpenStack Private Cloud

3. Hybrid Cloud:
Definition: A hybrid cloud is a computing environment that combines public and private clouds, allowing
data and applications to be shared between them. This model provides the flexibility of multiple
deployment methods and optimizes workloads across various environments.
Key Features:
 Flexibility: Workloads can be moved between public and private clouds based on business needs,
cost, and security requirements.
 Optimized Resource Use: Hybrid clouds allow organizations to use public clouds for non-
sensitive operations and private clouds for critical workloads.
 Scalability: Organizations can scale their on-premises infrastructure with the resources of the
public cloud when needed, avoiding over-provisioning.
 Cost Management: Hybrid clouds enable organizations to use the cost-effective public cloud for
general purposes while maintaining critical operations in the private cloud.
Examples:
 Microsoft Azure with Azure Stack
 AWS Outposts
 Google Anthos
UNIT-4

Discuss the security features in Cloud Computing?

Security in cloud computing is a critical concern, given the shared nature of cloud environments and the
sensitivity of data being stored and processed. Cloud providers implement a variety of security features to
protect data, applications, and infrastructure from potential threats. Here’s an overview of the key security
features in cloud computing:
1. Data Encryption:
Definition: Encryption is the process of converting data into a coded format that is unreadable to
unauthorized users. In cloud computing, encryption is used to protect data both at rest (stored data) and in
transit (data being transmitted over the network).
Key Points:
 At-Rest Encryption: Data stored in the cloud is encrypted using algorithms like AES-256 to
prevent unauthorized access if the storage medium is compromised.
 In-Transit Encryption: Data transmitted between the cloud and the user or between different parts
of the cloud infrastructure is encrypted using protocols like SSL/TLS, ensuring data remains
confidential during transmission.
 Key Management: Cloud providers often offer key management services (KMS) to help users
manage their encryption keys securely.
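
As a small illustration of at-rest encryption, the sketch below uses the Python cryptography library's Fernet recipe (AES-based symmetric encryption); in a real deployment the key would be created and held by a KMS rather than generated in the application:

```python
from cryptography.fernet import Fernet

# In production the key lives in a KMS, never hard-coded or generated ad hoc.
key = Fernet.generate_key()
f = Fernet(key)

plaintext = b"customer record: alice, card ending 4242"
ciphertext = f.encrypt(plaintext)           # what actually lands on disk
assert f.decrypt(ciphertext) == plaintext   # only key holders can read it
```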
2. Identity and Access Management (IAM):
Definition: IAM is a framework of policies and technologies for ensuring that the right individuals have
the appropriate access to technology resources.
Key Points:
 User Authentication: Strong authentication mechanisms, such as multi-factor authentication
(MFA), are used to verify the identity of users before granting access.
 Role-Based Access Control (RBAC): Access to cloud resources is granted based on the roles of
users, ensuring that only authorized personnel can access sensitive data or perform certain actions.
 Single Sign-On (SSO): Allows users to log in with a single set of credentials across multiple
applications, reducing the risk of password-related security breaches.
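
A role-based access check reduces to a small lookup; the sketch below is a framework-agnostic Python illustration (the role and permission names are invented for the example):

```python
# Hypothetical role-to-permission mapping, for illustration only.
ROLE_PERMISSIONS = {
    "viewer": {"storage:read"},
    "editor": {"storage:read", "storage:write"},
    "admin":  {"storage:read", "storage:write", "iam:manage"},
}

def is_allowed(user_roles, permission):
    """Grant access if any of the user's roles carries the permission."""
    return any(permission in ROLE_PERMISSIONS.get(r, set())
               for r in user_roles)

assert is_allowed(["editor"], "storage:write")
assert not is_allowed(["viewer"], "iam:manage")
```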
3. Firewalls and Network Security:
Definition: Firewalls and network security mechanisms are used to control and monitor incoming and
outgoing network traffic based on predetermined security rules.
Key Points:
 Virtual Firewalls: Cloud providers offer virtual firewalls to protect cloud workloads from
unauthorized access, similar to traditional hardware firewalls used in on-premises networks.
 Network Segmentation: By segmenting the network into different zones (e.g., public and private
subnets), cloud providers can limit the exposure of critical resources to potential threats.
 Intrusion Detection and Prevention Systems (IDPS): These systems monitor cloud networks for
suspicious activities or policy violations and can automatically block or alert administrators to
potential threats.
4. Security Information and Event Management (SIEM):
Definition: SIEM tools collect and analyze security data from across the cloud infrastructure, providing
real-time visibility and enabling quick response to security incidents.
Key Points:
 Log Management: SIEM tools aggregate logs from various cloud services and components,
making it easier to detect and investigate security incidents.
 Anomaly Detection: Advanced SIEM systems use machine learning to identify unusual patterns or
behaviors that might indicate a security threat.
 Incident Response: SIEM tools can automate incident response actions, such as isolating
compromised resources or notifying administrators of potential breaches.
5. Data Loss Prevention (DLP):
Definition: DLP technologies help prevent the unauthorized transfer of sensitive data, ensuring that
critical information does not leave the cloud environment without proper authorization.
Key Points:
 Content Inspection: DLP systems inspect data moving in and out of the cloud for sensitive information, such as credit card numbers or personally identifiable information (PII).
 Policy Enforcement: DLP tools enforce policies that restrict the sharing, downloading, or copying
of sensitive data to unauthorized locations or users.
 Alerting and Reporting: When a potential data breach is detected, DLP systems can alert
administrators and provide detailed reports for investigation.
6. Compliance and Regulatory Security:
Definition: Cloud providers offer features and tools to help organizations meet compliance and regulatory
requirements for data security, privacy, and reporting.
Key Points:
 Compliance Certifications: Many cloud providers are certified or independently audited against standards and regulations such as ISO/IEC 27001, SOC 2, GDPR, and HIPAA, ensuring that their security practices meet global regulatory requirements.
 Auditing and Reporting: Cloud providers offer tools for tracking access and changes to data and
resources, helping organizations demonstrate compliance during audits.
 Data Residency: Some cloud providers offer options for data residency, allowing organizations to
store data in specific geographic regions to meet local regulatory requirements.
7. Backup and Disaster Recovery:
Definition: Backup and disaster recovery features ensure that data and applications can be restored
quickly in the event of a security incident, such as a data breach, ransomware attack, or hardware failure.
Key Points:
 Automated Backups: Cloud providers offer automated backup services to regularly back up data
and applications, ensuring that the latest versions can be restored if needed.
 Redundancy: Data is often stored across multiple locations and servers to prevent data loss due to
hardware failures or natural disasters.
 Disaster Recovery Plans: Cloud providers often offer disaster recovery as a service (DRaaS),
enabling quick recovery of IT infrastructure after a catastrophic event.
8. Physical Security:
Definition: Physical security refers to the protection of the cloud provider’s data centers from physical
threats, such as unauthorized access, natural disasters, or theft.
Key Points:
 Access Controls: Data centers are equipped with strict access controls, including biometric
scanners, security personnel, and surveillance cameras.
 Environmental Controls: Data centers are designed to withstand natural disasters, with features
such as redundant power supplies, fire suppression systems, and climate control.
 Geographic Distribution: Cloud providers often distribute data across multiple geographically
diverse locations to ensure data availability and durability.

Explain IAM in detail


Identity and Access Management (IAM) is a framework of policies, processes, and technologies that
ensures the right individuals have the appropriate access to technology resources in an organization. IAM
is critical in cloud computing because it helps manage who has access to what resources, under what
conditions, and enforces policies that protect sensitive data and systems.
Key Components of IAM
1. Identity Management:
o User Identities: In IAM, user identities are managed to ensure that each user has a unique
identity, which can be authenticated and authorized to access resources. This includes not
only employees but also contractors, partners, and customers.
o User Directories: IAM systems often integrate with directories such as Active Directory
(AD) or Lightweight Directory Access Protocol (LDAP) to manage user identities
centrally.
2. Authentication:
o Single Sign-On (SSO): SSO allows users to log in once and gain access to multiple systems
without needing to re-authenticate. This streamlines access while reducing the need for
multiple passwords.
o Multi-Factor Authentication (MFA): MFA adds an extra layer of security by requiring users
to provide two or more verification factors to gain access. These factors can include
something they know (password), something they have (smartphone), or something they
are (fingerprint).
o Password Management: IAM systems enforce strong password policies, such as complexity
requirements and regular password changes, to prevent unauthorized access.
3. Authorization:
o Role-Based Access Control (RBAC): RBAC assigns access permissions based on a user's
role within the organization. For example, a finance officer may have access to financial
systems, while an IT administrator has access to technical systems.
o Attribute-Based Access Control (ABAC): ABAC allows access decisions based on
attributes (e.g., department, location, time of access). This provides more granular control
than RBAC.
o Policy Management: IAM allows organizations to create and enforce access policies,
ensuring that only authorized users can perform specific actions on resources.
4. Access Management:
o Provisioning and De-Provisioning: IAM automates the process of granting and revoking
access to resources as users join, move within, or leave an organization. This ensures that
users have the right access at the right time.
o Just-In-Time (JIT) Access: JIT access grants users temporary access to resources, which is
automatically revoked after a certain period. This is useful for temporary projects or
external collaborators.
o Delegated Access: IAM systems can allow certain users to delegate access rights to others,
typically with restrictions on the scope and duration of the access.
5. Monitoring and Auditing:
o Access Logs: IAM systems generate logs of user access, detailing who accessed what
resource, when, and what actions were taken. These logs are crucial for auditing and
forensic investigations.
o Anomaly Detection: Advanced IAM systems can detect unusual patterns of access that may
indicate a security breach, such as multiple failed login attempts or access from unusual
locations.
o Compliance Reporting: IAM tools help organizations generate reports to demonstrate
compliance with regulatory requirements, such as GDPR, HIPAA, or SOX.
IAM in Cloud Computing
In cloud environments, IAM plays an essential role in managing access to cloud resources. Here’s how
IAM operates in some popular cloud platforms:
1. AWS Identity and Access Management (AWS IAM):
o User and Group Management: AWS IAM allows you to create users and groups, assign
permissions using policies, and control access to AWS services and resources.
o Roles: AWS IAM roles allow you to delegate access to resources across AWS accounts or
provide temporary access to services without needing to share long-term credentials.
o Policies: AWS IAM uses JSON-based policies to define permissions. These policies can be attached to users, groups, or roles to control what actions can be performed on which resources (a minimal example policy appears after this list).
2. Azure Active Directory (Azure AD):
o Enterprise Identity Service: Azure AD provides identity management for Azure resources
and integrates with on-premises Active Directory.
o Conditional Access: Azure AD Conditional Access provides access control based on
conditions such as user location, device state, or risk level.
o Identity Protection: Azure AD includes tools to detect and respond to identity-based risks,
such as suspicious sign-ins or compromised accounts.
3. Google Cloud Identity and Access Management (Google Cloud IAM):
o Fine-Grained Control: Google Cloud IAM offers fine-grained access control by allowing
you to grant specific permissions to users for specific Google Cloud resources.
o Service Accounts: Google Cloud uses service accounts to provide identities for services
and applications, allowing them to authenticate and interact with other Google Cloud
services.
o Custom Roles: Google Cloud IAM allows you to create custom roles tailored to your
organization’s needs, providing flexibility in how permissions are assigned.
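Referring back to the AWS IAM policies point above, here is what a minimal least-privilege policy document might look like, built as a Python dict and serialized to the JSON form AWS IAM expects. The bucket name is a placeholder, not a real resource.

import json

# Hypothetical read-only policy for a single S3 bucket (bucket name is a placeholder).
read_only_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket", "s3:GetObject"],
            "Resource": [
                "arn:aws:s3:::example-bucket",    # ListBucket applies to the bucket
                "arn:aws:s3:::example-bucket/*",  # GetObject applies to its objects
            ],
        }
    ],
}

print(json.dumps(read_only_policy, indent=2))  # the JSON to attach to a user, group, or role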
Best Practices for Implementing IAM
1. Principle of Least Privilege:
o Ensure that users have only the permissions they need to perform their job functions. This
reduces the risk of unauthorized access or actions.
2. Regular Audits and Reviews:
o Conduct regular reviews of user access rights and audit logs to ensure compliance with
policies and identify any unauthorized access.
3. Use Multi-Factor Authentication:
o Implement MFA wherever possible to add an additional layer of security, especially for
access to sensitive resources.
4. Automate Provisioning and De-Provisioning:
o Automate the process of granting and revoking access to ensure that users have the correct
access at all times and that access is promptly removed when no longer needed.
5. Monitor and Respond to Anomalies:
o Use monitoring tools to detect unusual access patterns and respond quickly to potential
security incidents.

Explain about the security standard in detail

● Security standards define the processes, procedures, and practices necessary for implementing a security program.

● These standards also apply to cloud-related IT activities and include specific steps that should be taken to ensure a secure environment is maintained that provides privacy and security of confidential information in a cloud environment.

● Security standards are based on a set of key principles intended to protect this type of trusted environment.

There are three security standards that are commonly defined:
1. SAML (Security Assertion Markup Language)
2. OAuth (Open Authorization)
3. OpenID

SAML(Security Assertion Markup Language)


SAML is the underlying technology that allows people to sign in once using one set of credentials and
access multiple applications. Identity providers, like Microsoft Entra ID, verify users when they sign in,
and then use SAML to pass that authentication data to the service provider that runs the site, service, or
app that the users wish to access.
SAML helps strengthen security for businesses and simplify the sign-in process for employees, partners,
and customers. Organizations use it to enable single sign-on, which allows people to use one username
and password to access multiple sites, services, and apps. Decreasing the number of passwords that people
must memorize is not only easier for them, but it also reduces the risk that one of those passwords will be
stolen. Organizations can also set security standards for authentications across their SAML-enabled apps.
SAML Provider:
A SAML provider is a system that shares identity authentication and authorization data with other
providers. There are two types of SAML providers:
 Identity providers authenticate and authorize users. They provide the sign-in page where people
enter their credentials. They also enforce security policies, such as by requiring multifactor
authentication or a password reset. Once the user is authorized, identity providers pass the data to
service providers.

 Service providers are the apps and websites that people want to access. Instead of requiring
people to sign into their apps individually, service providers configure their solutions to trust
SAML authorization and rely on the identity providers to verify identities and authorize access.

SAML assertion is the XML document containing data that confirms to the service provider that the
person who is signing in has been authenticated.
There are three types:
 Authentication assertion identifies the user and includes the time the person signed in and the type of authentication they used, such as a password or multifactor authentication.

 Attribute assertion passes user attributes to the service provider inside the SAML token. This assertion includes specific data about the user.

 An authorization decision assertion tells the service provider whether the user is authenticated or
if they are denied either because of an issue with their credentials or because they don’t have
permissions for that service.
How does SAML authentication work?
In SAML authentication, service providers and identity providers share sign-in and user data to confirm
that each person who requests access is authenticated. It typically follows the following steps:
1. An employee begins work by signing in using the login page provided by the identity provider.

2. The identity provider validates that the employee is who they say they are by confirming a
combination of authentication details, such as username, password, PIN, device, or biometric data.

3. The employee launches a service provider app, such as Microsoft Word or Workday.

4. The service provider communicates with the identity provider to confirm that the employee is authorized to access that app.
5. The identity provider sends the authorization and authentication assertions back.

6. The employee accesses the app without signing in a second time.
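To make the assertion concept concrete, the sketch below builds a toy (heavily trimmed) SAML 2.0 authentication assertion and reads the subject and sign-in time out of it with Python's standard XML parser. Real SAML responses are digitally signed, and the signature must be validated before any of this data is trusted; that step is omitted here, and the name and timestamp are placeholders.

import xml.etree.ElementTree as ET

SAML_NS = "urn:oasis:names:tc:SAML:2.0:assertion"

# Toy authentication assertion; a real one is signed and far more detailed.
assertion_xml = f"""
<Assertion xmlns="{SAML_NS}">
  <Subject><NameID>alice@example.com</NameID></Subject>
  <AuthnStatement AuthnInstant="2024-01-15T09:30:00Z"/>
</Assertion>
"""

root = ET.fromstring(assertion_xml)
name_id = root.find(f"{{{SAML_NS}}}Subject/{{{SAML_NS}}}NameID")
authn = root.find(f"{{{SAML_NS}}}AuthnStatement")
print(name_id.text, "authenticated at", authn.get("AuthnInstant"))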

OAuth (Open Authorization)
OAuth (Open Authorization) is an open standard for access delegation, commonly used as a way for users
to grant websites or applications limited access to their information without exposing their passwords. It is
primarily used to enable third-party applications to access user data stored on another platform securely.
Key Concepts of OAuth:
1. Resource Owner:
o The user who owns the data or resources that are being accessed.
2. Client:
o The application requesting access to the user's resources. This could be a mobile app, a web
application, or any other type of software that needs access to the user’s information.
3. Authorization Server:
o The server that issues the access token after authenticating the user and obtaining
authorization. This server typically belongs to the service provider that hosts the user’s
data.
4. Resource Server:
o The server that hosts the protected user resources (e.g., data or services) and is capable of
accepting and responding to requests for protected resources using access tokens. This is
often the same as the authorization server.
5. Access Token:
o A token that represents the authorization granted by the resource owner to the client. It is
used by the client to access the protected resources on behalf of the resource owner.
How OAuth Works:
1. Request Authorization:
o The client application requests permission from the user to access certain data. This usually
involves redirecting the user to the authorization server, where they log in and approve the
request.
2. Authorization Grant:
o If the user consents, the authorization server grants an authorization code to the client. This
code is short-lived and can be exchanged for an access token.
3. Exchange Authorization Code for Access Token:
o The client sends the authorization code to the authorization server in exchange for an
access token.
4. Access Resources:
o The client uses the access token to make API requests to the resource server on behalf of
the user. The resource server verifies the token and, if valid, grants access to the requested
resources.
5. Token Expiry and Refresh:
o Access tokens are typically short-lived for security reasons. When a token expires, the
client can use a refresh token (if one was issued) to request a new access token without
requiring the user to log in again.
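A minimal sketch of steps 3 and 4 in Python, using the requests library, is shown below. All endpoint URLs and credentials are placeholders for whatever your OAuth provider issues; a real client would also handle errors, token expiry, and refresh.

import requests

# Placeholder endpoints and credentials -- substitute your provider's real values.
TOKEN_URL = "https://auth.example.com/oauth/token"
API_URL = "https://api.example.com/v1/profile"
CLIENT_ID = "my-client-id"
CLIENT_SECRET = "my-client-secret"
REDIRECT_URI = "https://myapp.example.com/callback"

def exchange_code_for_token(authorization_code):
    # Step 3: trade the short-lived authorization code for an access token.
    resp = requests.post(TOKEN_URL, data={
        "grant_type": "authorization_code",
        "code": authorization_code,
        "redirect_uri": REDIRECT_URI,
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
    })
    resp.raise_for_status()
    return resp.json()  # typically contains access_token, expires_in, refresh_token

def call_api(access_token):
    # Step 4: present the token to the resource server as a Bearer credential.
    resp = requests.get(API_URL, headers={"Authorization": f"Bearer {access_token}"})
    resp.raise_for_status()
    return resp.json()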

OpenID :

OpenID is an open standard and decentralized authentication protocol that allows users to log in to
multiple websites or applications using a single set of credentials, managed by an OpenID
provider. Instead of creating separate usernames and passwords for each website, users can
authenticate with one identity provider (like Google or Facebook) and use that authentication
across various platforms.

Key Concepts of OpenID:

1. OpenID Provider (OP):

o The service that authenticates the user and provides their OpenID identity. Examples
include Google, Facebook, and other major identity providers.

2. Relying Party (RP):

o The website or application that accepts OpenID for authentication. The RP trusts the
OpenID provider to verify the user's identity.

3. User (End-User):

o The individual who wants to authenticate using their OpenID identity.

4. OpenID Identifier:

o A unique identifier (usually a URL) associated with the user’s OpenID account. This
identifier is provided by the OpenID provider and is used across different relying parties.

How OpenID Works:

1. User Chooses OpenID Provider:

o The user selects an OpenID provider (e.g., Google) and logs into that provider's service.

2. User Visits a Relying Party:

o When the user wants to log in to a website (the relying party) that supports OpenID, they
choose to sign in using OpenID.
3. Redirection to OpenID Provider:

o The relying party redirects the user to the OpenID provider’s login page. If the user is not
already logged in, they will be prompted to do so.

4. Authentication and Authorization:

o The OpenID provider authenticates the user and, upon successful authentication, asks if
the user wants to share their identity with the relying party.

5. Token or Assertion is Returned:

o After the user approves, the OpenID provider sends a token or assertion back to the
relying party, confirming the user's identity.

6. Access Granted:

o The relying party receives the token and grants the user access based on the verified
identity.

Define resource provisioning and explain its methods?


Resource Provisioning is the process of bringing virtual and physical resources online. It has both a hands-
on component (racking and connecting devices) and a bootstrap component (configuring how the
resources boot into a “ready” state). Resource Provisioning happens when a cloud deployment is first
installed—i.e., an initial set of resources are provisioned—but also incrementally over time as new
resources are added, obsolete resources are removed, and out-of-date resources are upgraded.

1. Demand-Driven Resource Provisioning


2. Event-Driven Resource Provisioning
3. Popularity-Driven Resource Provisioning

Demand-Driven Resource Provisioning:

● This method adds or removes computing instances based on the current utilization level of the allocated resources.

● For example, the demand-driven method automatically allocates a second Xeon processor to a user application when the user has been using one Xeon processor more than 60 percent of the time for an extended period.

● In general, when a resource has surpassed a threshold for a certain amount of time, the scheme increases that resource based on demand (see the sketch after this list).

● When a resource is below a threshold for a certain amount of time, that resource could be decreased accordingly.

● Amazon implements such an auto-scale feature in its EC2 platform. This method is easy to implement.

● The scheme does not work well if the workload changes abruptly.
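The sketch below captures this threshold logic in a few lines of Python. The 60 percent trigger mirrors the Xeon example above; the low-water mark, window length, and one-instance step are assumptions for illustration, not any vendor's actual auto-scale policy.

HIGH, LOW = 0.60, 0.20   # utilization thresholds (60% busy -> grow, 20% idle -> shrink)
WINDOW = 5               # the threshold must hold for this many consecutive checks

def next_instance_count(utilization_history, instances):
    recent = utilization_history[-WINDOW:]
    if len(recent) < WINDOW:
        return instances                   # not enough data for a sustained signal
    if all(u > HIGH for u in recent):
        return instances + 1               # demand sustained above threshold: scale up
    if all(u < LOW for u in recent) and instances > 1:
        return instances - 1               # sustained idle capacity: scale down
    return instances                       # abrupt spikes won't satisfy either test

print(next_instance_count([0.7, 0.8, 0.75, 0.9, 0.65], instances=1))  # -> 2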

Event-Driven Resource Provisioning

● This scheme adds or removes machine instances based on a specific time event.

● The scheme works better for seasonal or predicted events such as Christmastime in the West and the Lunar New Year in the East.

● During these events, the number of users grows before the event period and then decreases during the event period.

● This scheme anticipates peak traffic before it happens.

● The method results in a minimal loss of QoS if the event is predicted correctly.

● Otherwise, wasted resources are even greater due to events that do not follow a fixed pattern.
Popularity-Driven Resource Provisioning

● In this method, the provider tracks the popularity of certain applications on the Internet and creates instances according to popularity demand.

● The scheme anticipates increased traffic with popularity.

● Again, the scheme has a minimal loss of QoS if the predicted popularity is correct.

● Resources may be wasted if traffic does not occur as expected.

What are the different security challenges in cloud? Explain.


Cloud computing offers significant advantages, such as scalability, cost efficiency, and flexibility, but it
also introduces a range of security challenges that organizations must address to protect their data and
operations. Here are some of the key cloud security challenges:
1. Data Breaches:
 Risk: Cloud environments, by nature, are accessible over the internet, making them targets for
attackers seeking to steal sensitive data. Data breaches can lead to significant financial losses, legal
liabilities, and damage to reputation.
 Challenge: Protecting data at rest and in transit, managing encryption keys, and ensuring only
authorized access to data.
2. Data Loss:
 Risk: Data loss can occur due to accidental deletion, malicious attacks, or hardware failures. In
cloud environments, it’s also possible to lose data if access to the cloud provider's services is
interrupted.
 Challenge: Ensuring robust backup and recovery mechanisms, and maintaining data integrity
across distributed environments.
3. Insufficient Identity, Credential, and Access Management:
 Risk: Poorly managed identities and access controls can lead to unauthorized access to cloud
resources, resulting in data breaches or service disruptions.
 Challenge: Implementing strong identity and access management (IAM) practices, including multi-
factor authentication (MFA), role-based access control (RBAC), and regular auditing of access
rights.
4. Insecure APIs:
 Risk: Cloud services often expose APIs for interaction and integration, which can become attack
vectors if not properly secured. Insecure APIs can be exploited to bypass security controls, leading
to unauthorized access or data leakage.
 Challenge: Securing APIs through authentication, authorization, encryption, and regular testing for
vulnerabilities.
5. Misconfiguration:
 Risk: Misconfigured cloud resources, such as storage buckets or databases, are one of the most
common causes of cloud security incidents. Misconfigurations can inadvertently expose sensitive
data to the internet or create insecure entry points.
 Challenge: Implementing automated tools for configuration management, regular audits, and
adherence to security best practices and guidelines.
6. Compliance and Regulatory Challenges:
 Risk: Organizations must comply with various regulations depending on their industry and region
(e.g., GDPR, HIPAA). Storing data in the cloud can complicate compliance, especially when data
is distributed across multiple jurisdictions.
 Challenge: Ensuring that cloud providers comply with relevant regulations, maintaining data
residency, and regularly auditing cloud environments for compliance.
7. Insider Threats:
 Risk: Employees, contractors, or third parties with legitimate access to cloud resources may
intentionally or unintentionally compromise security. This risk is heightened in cloud environments
where access is often granted to a broad range of users.
 Challenge: Implementing strict access controls, monitoring user activity, and conducting regular
security awareness training.
8. Shared Responsibility Model:
 Risk: Cloud providers and customers share responsibility for security, but confusion about where
responsibilities lie can lead to security gaps. For instance, while the provider secures the cloud
infrastructure, customers are responsible for securing their data and applications within the cloud.
 Challenge: Clearly understanding and adhering to the shared responsibility model for each cloud
provider, and ensuring all security controls and policies are effectively implemented.
9. Lack of Visibility and Control:
 Risk: In cloud environments, organizations may lack full visibility into their infrastructure, making
it difficult to monitor and control what is happening within their cloud environments. This can lead
to undetected vulnerabilities or unauthorized activities.
 Challenge: Utilizing cloud security tools that provide visibility into cloud environments,
implementing logging and monitoring solutions, and using centralized management consoles.
10. Account Hijacking:
 Risk: If an attacker gains control of cloud user accounts, they can manipulate services, steal data,
or launch further attacks. Account hijacking can occur through phishing, credential theft, or
exploiting weak passwords.
 Challenge: Implementing strong password policies, multi-factor authentication (MFA), and
monitoring account activities for suspicious behavior.
Explain in detail about the security governance in cloud.
Def & Introduction
Cloud security governance is the set of policies, procedures, and controls that ensure the security and
compliance of cloud-based systems and data.
Cloud security governance involves managing risks and ensuring that security measures are in place to
protect against data breaches, cyber-attacks, and other security threats. It encompasses a range of
activities, including risk assessment, security policy development, access control, and monitoring and
reporting. By implementing effective cloud security governance, organizations can ensure the
confidentiality, integrity, and availability of their data and applications in the cloud.
Key Components of Cloud Security Governance
Cloud security governance is essential to ensure the protection of sensitive data and maintain regulatory
compliance. An effective cloud security governance framework includes several key components that
work together to create a comprehensive security plan.
The first component is risk assessment. This involves identifying potential security risks and
vulnerabilities and assessing their potential impact on the organization. The next component is security
policies and procedures. These policies and procedures should be documented and communicated to all
employees to ensure consistent security practices. The third component is access control. This involves
implementing measures to control access to sensitive data and systems. The fourth component is
monitoring and reporting. This involves monitoring security events and generating reports to identify
potential security threats. The final component is incident response. This involves having a plan in place
to respond to security incidents and minimize the impact on the organization. By implementing these key
components, organizations can establish a strong cloud security governance framework and protect their
sensitive data.
Common Challenges in Cloud Security Governance
Cloud computing has revolutionized the way businesses operate by providing them with a flexible and
scalable IT infrastructure. However, with this convenience comes the challenge of ensuring the security of
data and applications stored in the cloud. One of the common challenges in cloud security governance is
the lack of visibility and control over the cloud environment. This can lead to unauthorized access, data
breaches, and compliance issues. To overcome this challenge, businesses can implement a cloud security
strategy that includes continuous monitoring, access controls, and encryption.
Another challenge is the complexity of managing multiple cloud environments and ensuring consistency
in security policies across them. This can result in gaps in security coverage and increase the risk of cyber
attacks. To address this challenge, businesses can adopt a unified cloud security management platform that
provides a centralized view of all cloud environments and enables consistent security policies and
controls. Additionally, regular security assessments and audits can help identify any vulnerabilities and
ensure compliance with industry regulations. By addressing these challenges, businesses can leverage the
benefits of cloud computing while maintaining the security and integrity of their data and applications.
Role of Cloud Service Provider in Cloud Security Governance
Cloud service providers play a crucial role in ensuring the security of cloud computing systems. As the use
of cloud computing continues to grow, so does the need for effective security governance. This is where
cloud service providers come in, providing a range of security measures to protect data and ensure
compliance with regulations.
Cloud service providers offer a range of security services, including data encryption, access controls, and
network security. They also provide regular security updates and patches to ensure that systems are
protected against the latest threats. Additionally, cloud service providers often have teams of security
experts who monitor systems 24/7 and respond to any security incidents. By partnering with a cloud
service provider, organizations can ensure that their cloud systems are secure and compliant with
regulations, allowing them to focus on their core business activities.

Classify the various security standards in cloud?


Security standards define the processes, procedures, and practices necessary for implementing a
security program.

These standards also apply to cloud related IT activities and include specific steps that should be taken
to ensure a secure environment is maintained that provides privacy and security of confidential
information in a cloud environment.

Different Security Standards in Cloud are

1. SAML (Security Assertion Markup Language)
2. OAuth (Open Authorization)
3. OpenID

SAML

Security Assertion Markup Language (SAML) is an open standard for sharing security information about identity, authentication, and authorization across different systems. SAML is implemented with
the Extensible Markup Language (XML) standard for sharing data. It provides a framework for
implementing single sign-on (SSO) and other federated identity systems. A federated identity system
links an individual identity to multiple identity domains. This approach enables SSO that encompasses
resources on an enterprise network, trusted third-party vendor and customer networks.
Organizations use SAML both for business-to-business and business-to-consumer applications. It is used
to share user credentials across one or more networked systems. The SAML framework is designed to
accomplish two things:
1. user authentication
2. user authorization
SAML is most often used to implement SSO authentication systems that enable end users to log in to their
networks once and be authorized to access multiple resources on that network. For example, SSO
implemented with Microsoft Active Directory (AD) can be integrated with SAML 2.0 authentication
requests.
Authentication is the process of determining whether an entity is what it claims to be. It is required before
authorization, which is the process of determining whether the authenticated identity has permission to use
a resource.
SAML authentication depends on verifying user credentials, which, at a minimum, include user identity
and password. SAML can also enable support for multifactor authentication.

SAML entities
SAML defines three categories of entities:
1. End users. An end user is a person who needs to be authenticated before being allowed to use an
application.
2. Service providers. A service provider is any system that provides services, typically the services
for which users seek authentication, including web or enterprise applications.
3. Identity providers. An identity provider is a special type of service provider that administers
identity information.
SAML components
SAML incorporates four different types of components:
1. SAML assertions are statements of identity, authentication and authorization information. These
are formatted using XML tags specified in SAML.
SAML specifies three types of assertions:
1. An authentication assertion indicates that the subject of the assertion has been authenticated. It
includes the time and method of authentication, as well as the subject being authenticated.
2. An attribute assertion associates the subject of the assertion with the specified attributes. A
specified SAML attribute is one that refers to a defined piece of information relating to the
authentication subject.
3. An authorization decision assertion indicates whether a subject's request to access a resource has
been approved or declined.
2. SAML protocols define how different entities request and respond to requests for security
information. Like SAML assertions, these protocols are encoded with XML tags specified in
SAML.

Authentication Request Protocol defines requests for authentication assertions and valid
responses to such requests. This protocol is used when a request sent from a user to a service
provider needs to be redirected to an identity provider.

Single Logout Protocol defines a technique in which all of a user's active sessions can be
terminated nearly simultaneously. This capability is important for SSO implementations that
require terminating sessions with multiple resources when the user logs out.

 Assertion Query and Request Protocol defines requests for new and existing authentication assertions.

3. SAML bindings are the formats specified for SAML protocol messages to be embedded and
transported over different transmission mechanisms.
4. SAML profiles determine how SAML assertions, protocols and bindings are used together for
interoperability in certain applications.

OAuth (Open Authorization)

OAuth is an open standard protocol for the authorization of an application to use user information. In general, it allows a third-party application to access user-related info such as name, DOB, or email from an application like Facebook or Google, without giving the third-party app the user's password. It is pronounced as oh-auth.
You might have seen a "login with Google" or "login with Facebook" button on the login/signup page of a website; it makes it easier to start using the service or website by simply logging into one of those services and granting the client application permission to access your data without giving a password. This is done with OAuth.
OAuth is designed to work with HTTP (Hypertext Transfer Protocol), and it allows access tokens to be issued to the third-party application by an authorization server with the approval of the owner. There are three components in the OAuth mechanism:

1. OAuth Provider – This is the OAuth provider, e.g., Google, Facebook, etc.
2. OAuth Client – This is the website where we are sharing or authenticating the usage of our information, e.g., GeeksforGeeks.
3. Owner – The user whose login authenticates the sharing of information.
OAuth can be implemented via the Google API Console for "Login/Sign Up with Google" on a web app.
Pattern to be Followed

1. Get an OAuth 2.0 Client ID from the Google API Console.
2. Next, obtain an access token from the Google Authorization Server to access the API.
3. Send the request with the access token to an API.
4. Get a refresh token if longer access is required.

OpenID

OpenID Connect (OIDC) is an open specification for authentication and single sign-on, built as an identity layer on top of the OAuth 2.0 protocol.


Key components of OIDC
There are six primary components in OIDC:
 Authentication is the process of verifying that the user is who they say they are.

 A client is the software, such as website or application, that requests tokens that are used to
authenticate a user or access a resource.

 Relying parties are the applications that use OpenID providers to authenticate users.

 Identity tokens contain identity data including the outcome of the authentication process, an
identifier for the user, and information about how and when the user is authenticated.

 OpenID providers are the applications for which a user already has an account. Their role in
OIDC is to authenticate the user and pass that information on to the relying party.

 Users are people or services that seek to access an application without creating a new account or
providing a username and password.

OIDC authentication works by allowing users to sign in to one application and receive access to another.
For example, if a user wants to create an account at a news site, they may have an option to use Facebook
to create their account rather than creating a new account. If they choose Facebook, they are using OIDC
authentication. Facebook, which is referred to as the OpenID provider, handles the authentication process
and obtains the user’s consent to provide specific information, such as a user profile, to the news site,
which is the relying party.
ID tokens
The OpenID provider uses ID tokens to transmit authentication results and any pertinent information to
the relying party. Examples of the type of data that are sent include an ID, email address, and name.
Scopes
Scopes define what the user can do with their access. OIDC provides standard scopes, which define things
such as which relying party the token was generated for, when the token was generated, when the token
will expire, and the encryption strength used to authenticate the user.
A typical OIDC authentication process includes the following steps:
1. A user goes to the application they wish to access (the relying party).
2. The user types in their username and password.
3. The relying party sends a request to the OpenID provider.
4. The OpenID provider validates the user’s credentials and obtains authorization.
5. The OpenID provider sends an identity token and often an access token to the relying party.
6. The relying party sends the access token to the user’s device.
7. The user is given access based on the information provided in the access token and relying party.
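Since an ID token is a signed JWT, the relying party's main job in step 5 is to validate it. Below is a minimal sketch using the third-party PyJWT library; in practice the signing key is fetched from the provider's published JWKS endpoint, and the client ID and issuer values here are placeholders.

import jwt  # PyJWT

def verify_id_token(id_token, signing_key):
    # Reject the token unless it was signed by the expected provider (issuer)
    # and minted for this relying party (audience).
    claims = jwt.decode(
        id_token,
        key=signing_key,
        algorithms=["RS256"],
        audience="my-client-id",          # placeholder relying-party client ID
        issuer="https://op.example.com",  # placeholder OpenID provider
    )
    return claims  # e.g. sub (user ID), email, name, iat, exp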

UNIT -5
Define Hadoop? Explain its components along with its architecture?
Hadoop is a framework written in Java that utilizes a large cluster of commodity hardware to maintain and store big-sized data. Hadoop works on the MapReduce programming model that was introduced by Google. Today lots of big-brand companies are using Hadoop in their organizations to deal with big data, e.g. Facebook, Yahoo, Netflix, eBay, etc. The Hadoop architecture mainly consists of four components:

 MapReduce
 HDFS(Hadoop Distributed File System)
 YARN(Yet Another Resource Negotiator)
 Common Utilities or Hadoop Common
1. MapReduce
MapReduce is a programming model, built on the YARN framework, whose major feature is performing distributed processing in parallel across a Hadoop cluster, which makes Hadoop work so fast. When you are dealing with Big Data, serial processing is no longer of any use. MapReduce has mainly two tasks, divided phase-wise: in the first phase Map is utilized, and in the next phase Reduce is utilized.

Here, the input is provided to the Map() function, then its output is used as an input to the Reduce() function, and after that we receive our final output. Let's understand what Map() and Reduce() do.

The input provided to Map() is a set of data. The Map() function breaks these data blocks into tuples, which are nothing but key-value pairs. These key-value pairs are then sent as input to Reduce(). The Reduce() function combines the tuples based on their key value, performs operations such as sorting and summation, and forms a smaller set of tuples that is sent to the final output node. Finally, the output is obtained.
The data processing is always done in the Reducer, depending upon the business requirement of that industry. This is how first Map() and then Reduce() is utilized, one by one.
Let's understand the Map Task and Reduce Task in detail.
Map Task:

 RecordReader: The purpose of the RecordReader is to break the records. It is responsible for providing key-value pairs to the Map() function. The key is the locational information of the record, and the value is the data associated with it.
 Map: A map is a user-defined function whose work is to process the tuples obtained from the RecordReader. The Map() function may generate no key-value pairs at all, or multiple pairs of these tuples.
 Combiner: The combiner is used for grouping the data in the Map workflow. It is similar to a local reducer. The intermediate key-value pairs that are generated in the Map phase are combined with the help of this combiner. Using a combiner is optional.
 Partitioner: The partitioner is responsible for fetching the key-value pairs generated in the Mapper phase. It generates the shards corresponding to each reducer. The hashcode of each key is also fetched by the partitioner. The partitioner then takes the modulus of that hashcode with the number of reducers (key.hashCode() % (number of reducers)).
Reduce Task

 Shuffle and Sort: The task of the Reducer starts with this step. The process in which the Mapper generates the intermediate key-value pairs and transfers them to the Reducer task is known as Shuffling. Using the shuffling process, the system can sort the data by its key value. Shuffling begins once some of the map tasks are done, which is why it is a faster process that does not wait for every Mapper task to complete.
 Reduce: The main task of Reduce is to gather the tuples generated from Map and then perform sorting and aggregation on those key-value pairs depending on their key element.
 OutputFormat: Once all the operations are performed, the key-value pairs are written into a file with the help of a record writer, each record on a new line, with the key and value in a space-separated manner.
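The classic word-count example below imitates this pipeline in plain Python: a map phase emits (word, 1) pairs, an in-memory sort stands in for shuffle-and-sort, and a reduce phase sums the values per key. This is a single-process sketch of the model, not Hadoop's actual distributed API.

from itertools import groupby
from operator import itemgetter

def map_fn(offset, line):
    # The RecordReader would supply (key=offset, value=line); Map emits (word, 1).
    for word in line.split():
        yield (word.lower(), 1)

def reduce_fn(word, counts):
    # Reduce aggregates every value that shares a key.
    return (word, sum(counts))

def word_count(lines):
    pairs = [kv for off, line in enumerate(lines) for kv in map_fn(off, line)]
    pairs.sort(key=itemgetter(0))                      # shuffle-and-sort stand-in
    return [reduce_fn(k, [v for _, v in grp])
            for k, grp in groupby(pairs, key=itemgetter(0))]

print(word_count(["Hadoop stores data", "Hadoop processes data"]))
# [('data', 2), ('hadoop', 2), ('processes', 1), ('stores', 1)]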
2. HDFS
HDFS (Hadoop Distributed File System) is utilized for storage. It is mainly designed for working on commodity hardware devices (inexpensive devices) with a distributed file system design. HDFS is designed in such a way that it favors storing data in large blocks rather than storing many small data blocks.
HDFS in Hadoop provides fault tolerance and high availability to the storage layer and the other devices present in that Hadoop cluster. Data storage nodes in HDFS:

 NameNode(Master)
 DataNode(Slave)
NameNode: The NameNode works as the master in a Hadoop cluster and guides the DataNodes (slaves). The NameNode is mainly used for storing metadata, i.e., data about the data. The metadata can be the transaction logs that keep track of the user's activity in the Hadoop cluster.
DataNode: DataNodes work as slaves and are mainly utilized for storing data in a Hadoop cluster; the number of DataNodes can range from 1 to 500 or even more. The more DataNodes there are, the more data the Hadoop cluster can store, so it is advised that DataNodes have high storage capacity to hold a large number of file blocks.
High Level Architecture Of Hadoop

File Block In HDFS: Data in HDFS is always stored in terms of blocks. A single file is divided into multiple blocks of 128 MB each, which is the default size and can also be changed manually.

Let's understand this concept of breaking a file into blocks with an example. Suppose you upload a file of 400 MB to HDFS; this file gets divided into blocks of 128 MB + 128 MB + 128 MB + 16 MB = 400 MB, meaning 4 blocks are created, each of 128 MB except the last one. Hadoop doesn't know or care about what data is stored in these blocks, so it treats the final file block as a partial record, as it has no idea regarding it.
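The arithmetic behind the 400 MB example can be checked with a few lines of Python:

def split_into_blocks(file_size_mb, block_mb=128):
    # Full 128 MB blocks, plus one partial block for whatever remains.
    blocks = [block_mb] * (file_size_mb // block_mb)
    if file_size_mb % block_mb:
        blocks.append(file_size_mb % block_mb)
    return blocks

print(split_into_blocks(400))  # [128, 128, 128, 16] -> 4 blocks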
Replication In HDFS: Replication ensures the availability of the data. Replication means making a copy of something, and the number of copies made of that particular thing is expressed as its Replication Factor.
Rack Awareness: A rack is nothing but the physical collection of nodes in a Hadoop cluster (maybe 30 to 40). A large Hadoop cluster consists of many racks. With the help of this rack information, the NameNode chooses the closest DataNode to achieve maximum performance while performing read/write operations, which reduces network traffic.

3. YARN (Yet Another Resource Negotiator)

YARN is the framework on which MapReduce works. YARN performs two operations: job scheduling and resource management. The purpose of the Job Scheduler is to divide a big task into small jobs so that each job can be assigned to various slaves in a Hadoop cluster and processing can be maximized. The Job Scheduler also keeps track of which job is important, which job has more priority, the dependencies between the jobs, and all the other information like job timing, etc. The Resource Manager is used to manage all the resources that are made available for running a Hadoop cluster.
Features of YARN

 Multi-Tenancy
 Scalability
 Cluster-Utilization
 Compatibility

Explain the Four levels of federation with a detailed description?


● Federation is the ability for two XMPP servers in different domains to exchange XML stanzas.

● According to XEP-0238: XMPP Protocol Flows for Inter-Domain Federation, there are at least four basic types of federation:

● Permissive federation

○ Permissive federation occurs when a server accepts a connection from a peer network server without verifying its identity using DNS lookups or certificate checking.

○ The lack of verification or authentication may lead to domain spoofing (the unauthorized use of a third-party domain name in an email message in order to pretend to be someone else), which opens the door to widespread spam and other abuses. With the release of the open source jabberd 1.2 server in October 2000, which included support for the Server Dialback protocol (fully supported in Jabber XCP), permissive federation met its demise on the XMPP network.

● Verified federation

○ This type of federation occurs when a server accepts a connection from a peer after the identity of the peer has been verified.

○ It uses information obtained via DNS and by means of domain-specific keys exchanged beforehand.

○ The connection is not encrypted, and the use of identity verification effectively prevents domain spoofing.

○ To make this work, federation requires proper DNS setup, and it is still subject to DNS poisoning attacks.

○ Verified federation has been the default service policy on the open XMPP network since the release of the open-source jabberd 1.2 server.

● Encrypted federation

○ In this mode, a server accepts a connection from a peer if and only if the peer supports Transport Layer Security (TLS) as defined for XMPP in Request for Comments (RFC) 3920.

○ The peer must present a digital certificate. The certificate may be self-signed, but this prevents using mutual authentication. If this is the case, both parties proceed to weakly verify identity using Server Dialback.

○ XEP-0220 defines the Server Dialback protocol, which is used between XMPP servers to provide identity verification. Server Dialback uses the DNS as the basis for verifying identity.

○ The basic approach is that when a receiving server receives a server-to-server connection request from an originating server, it does not accept the request until it has verified a key with an authoritative server for the domain asserted by the originating server.

○ Although Server Dialback does not provide strong authentication or trusted federation, and although it is subject to DNS poisoning attacks, it has effectively prevented most instances of address spoofing on the XMPP network since its release in 2000.

○ This results in an encrypted connection with weak identity verification.

● Trusted federation

○ In this federation, a server accepts a connection from a peer only under the stipulation that the peer supports TLS and can present a digital certificate issued by a root certification authority (CA) that is trusted by the authenticating server.

○ The list of trusted root CAs may be determined by one or more factors, such as the operating system, XMPP server software, or local service policy.

○ In trusted federation, the use of digital certificates results not only in channel encryption but also in strong authentication.

○ The use of trusted domain certificates effectively prevents DNS poisoning attacks but makes federation more difficult, since such certificates have traditionally not been easy to obtain.

Explain the concept of Federation in cloud ?


Cloud Federation, also known as Federated Cloud is the deployment and management of several external
and internal cloud computing services to match business needs. It is a multi-national cloud system that
integrates private, community, and public clouds into scalable computing platforms. Federated cloud is
created by connecting the cloud environment of different cloud providers using a common standard.

The architecture of Federated Cloud:


The architecture of Federated Cloud consists of three basic components:
1. Cloud Exchange
The Cloud Exchange acts as a mediator between the cloud coordinator and the cloud broker. The demands of the cloud broker are mapped by the cloud exchange to the available services provided by the cloud coordinator. The cloud exchange keeps track of the present cost, demand patterns, and available cloud providers, and this information is periodically updated by the cloud coordinator.
2. Cloud Coordinator
The cloud coordinator assigns the resources of the cloud to the remote users based on the quality of
service they demand and the credits they have in the cloud bank. The cloud enterprises and their
membership are managed by the cloud controller.
3. Cloud Broker
The cloud broker interacts with the cloud coordinator, analyzes the Service-level agreement and the
resources offered by several cloud providers in cloud exchange. Cloud broker finalizes the most suitable
deal for their client.
Benefits of Federated Cloud:
1. It minimizes the consumption of energy.
2. It increases reliability.
3. It minimizes the time and cost of providers due to dynamic scalability.
4. It connects various cloud service providers globally. The providers may buy and sell services on
demand.
5. It provides easy scaling up of resources.

Describe OpenStack along with its advantages and disadvantages?
It is a free open standard cloud computing platform that first came into existence on July 21, 2010. It was
a joint project of Rackspace Hosting and NASA to make cloud computing more ubiquitous in nature. It is
deployed as Infrastructure-as-a-service(IaaS) in both public and private clouds where virtual resources are
made available to the users. The software platform contains interrelated components that control multi-
vendor hardware pools of processing, storage, networking resources through a data center. In OpenStack,
the tools which are used to build this platform are referred to as “projects”. These projects handle a large
number of services including computing, networking, and storage services. Unlike virtualization, in which
resources such as RAM, CPU, etc are abstracted from the hardware using hypervisors, OpenStack uses a
number of APIs to abstract those resources so that users and the administrators are able to directly interact
with the cloud services.
OpenStack components
Apart from the various projects which constitute the OpenStack platform, there are nine major services, namely Nova, Neutron, Swift, Cinder, Keystone, Glance, Horizon, Ceilometer, and Heat. Here is the basic definition of each component, which will give us a basic idea about them.
1. Nova (compute service): It manages the compute resources like creating, deleting, and handling
the scheduling. It can be seen as a program dedicated to the automation of resources that are
responsible for the virtualization of services and high-performance computing.
2. Neutron (networking service): It is responsible for connecting all the networks across
OpenStack. It is an API driven service that manages all networks and IP addresses.
3. Swift (object storage): It is an object storage service with high fault tolerance capabilities, used to store and retrieve unstructured data objects with the help of a RESTful API. Being a distributed platform, it is also used to provide redundant storage within servers that are clustered together. It is able to successfully manage petabytes of data.
4. Cinder (block storage): It is responsible for providing persistent block storage that is made
accessible using an API (self- service). Consequently, it allows users to define and manage the
amount of cloud storage required.
5. Keystone (identity service provider): It is responsible for all types of authentications and
authorizations in the OpenStack services. It is a directory-based service that uses a central
repository to map the correct services with the correct user.
6. Glance (image service provider): It is responsible for registering, storing, and retrieving virtual
disk images from the complete network. These images are stored in a wide range of back-end
systems.
7. Horizon (dashboard): It is responsible for providing a web-based interface for OpenStack
services. It is used to manage, provision, and monitor cloud resources.
8. Ceilometer (telemetry): It is responsible for metering and billing of services used. Also, it is used
to generate alarms when a certain threshold is exceeded.
9. Heat (orchestration): It is used for on-demand service provisioning with auto-scaling of cloud
resources. It works in coordination with the ceilometer.
These are the services around which this platform revolves. These services individually handle storage, compute, networking, identity, etc. They are the base on which the rest of the projects rely, allowing them to orchestrate services, enable bare-metal provisioning, handle dashboards, and so on.
Features of OpenStack
 Modular architecture: OpenStack is designed with a modular architecture that enables users to
deploy only the components they need. This makes it easier to customize and scale the platform to
meet specific business requirements.
 Multi-tenancy support: OpenStack provides multi-tenancy support, which enables multiple users
to access the same cloud infrastructure while maintaining security and isolation between them.
This is particularly important for cloud service providers who need to offer services to multiple
customers.
 Open-source software: OpenStack is an open-source software platform that is free to use and
modify. This enables users to customize the platform to meet their specific requirements, without
the need for expensive proprietary software licenses.
 Distributed architecture: OpenStack is designed with a distributed architecture that enables users
to scale their cloud infrastructure horizontally across multiple physical servers. This makes it easier
to handle large workloads and improve system performance.
 API-driven: OpenStack is API-driven, which means that all components can be accessed and controlled through a set of APIs. This makes it easier to automate and integrate with other tools and services (see the sketch after this list).
 Comprehensive dashboard: OpenStack provides a comprehensive dashboard that enables users
to manage their cloud infrastructure and resources through a user-friendly web interface. This
makes it easier to monitor and manage cloud resources without the need for specialized technical
skills.
 Resource pooling: OpenStack enables users to pool computing, storage, and networking
resources, which can be dynamically allocated and de-allocated based on demand. This enables
users to optimize resource utilization and reduce waste.
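As a small illustration of the API-driven point above, the sketch below uses the official openstacksdk Python library to talk to the compute (Nova) API directly instead of going through the Horizon dashboard. It assumes a cloud named "mycloud" is already defined in a local clouds.yaml file; the name is a placeholder.

import openstack  # openstacksdk

# Credentials are read from clouds.yaml; "mycloud" is a placeholder entry name.
conn = openstack.connect(cloud="mycloud")

# The same information Horizon shows, obtained through the Nova API.
for server in conn.compute.servers():
    print(server.name, server.status)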
Advantages of using OpenStack
 It enables rapid provisioning of resources, which makes orchestration and scaling resources up and down easy.
 Deployment of applications using OpenStack does not consume a large amount of time.
 Since resources are scalable therefore they are used more wisely and efficiently.
 The regulatory compliances associated with its usage are manageable.
Disadvantages of using OpenStack
 OpenStack is not very robust when orchestration is considered.
 Even today, the APIs provided and supported by OpenStack are not compatible with many of the
hybrid cloud providers, thus integrating solutions becomes difficult.
 Like all cloud service providers OpenStack services also come with the risk of security breaches.

Explain about VirtualBox in detail


VirtualBox is software that enables us to run operating systems like Ubuntu, Windows, and many other operating systems on a single machine.
Oracle Corporation develops VirtualBox, and it is also known as VB. It acts as a hypervisor for x86 machines. Originally, it was created by Innotek GmbH, who made it accessible to all in 2007.
In general, VirtualBox is a software virtualization program that may be run as an application on any operating system; that is one of its numerous advantages. It supports the installation of additional operating systems, known as guest OSes. It can then set up and administer guest virtual machines, each with its own operating system and virtual environment. VirtualBox is supported by several host operating systems, including Windows XP, Windows 7, Linux, Windows Vista, Mac OS X, Solaris, and OpenSolaris. Windows, Linux, OS/2, BSD, Haiku, and other guest operating systems are supported in various versions and derivatives.
It can be used in the following projects:
 Software portability
 Application development
 System testing and debugging
 Network simulation
 General computing
Advantages of Virtual Box
 Isolation – A virtual machine’s isolated environment is suitable for testing software or running
programmes that demand more resources than are accessible in other settings.
 Virtualization- VirtualBox allows users to run another OS on a single computer without
purchasing a new device. It generates a virtual machine that functions just like a real computer,
with its own processing cores, RAM, and hard disc space dedicated only to the virtual
environment.
 Cross-Platform Compatibility- VirtualBox can run Windows, Linux, Solaris, OpenSolaris, and macOS as its host operating system (OS). Users do not have to be concerned about compatibility difficulties while setting up virtual machines on numerous devices or platforms.
 Easy Control Panel- VirtualBox’s simple control interface makes it easier to configure parameters
like CPU cores and RAM. Users may begin working on their projects within a few moments of
installing the software program on their PCs or laptops.
 Multiple Modes- Users have control over how they interact with their installations, whether in full-screen mode, seamless window mode, scaled window mode, or with 3D graphics acceleration. This allows users to customize their experience according to the kind of project they are working on.
Disadvantages of Virtual Box
 VirtualBox, however, relies on the computer’s hardware. Thus, the virtual machine will only be
effective if the host is faster and more powerful. As a result, VirtualBox is dependent on its host
computer.
 If the host computer has any defects and the OS only has one virtual machine, just that system will
be affected; if there are several virtual machines operating on the same OS, all of them would be
affected.
 Though these machines act like real machines, they are not genuine; every request must pass through the host CPU, resulting in slower usability. So, when compared to real computers, these virtual machines are not as efficient.
Installation of Virtual Box
 Download it from its official website at the Oracle VM VirtualBox site.
 Select your platform package.
 Click on the installer .exe file.
 Follow all the VirtualBox installer instructions.
 Start VirtualBox and create your guest OS.
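Beyond the GUI steps above, VirtualBox also ships a command-line front end, VBoxManage. The hedged sketch below drives it from Python to create, configure, and start a VM; the VM name, OS type, and sizes are placeholders, and the exact flags can vary between VirtualBox versions.

import subprocess

def vbox(*args):
    # Thin wrapper: raise an error if any VBoxManage invocation fails.
    subprocess.run(["VBoxManage", *args], check=True)

vbox("createvm", "--name", "DemoVM", "--ostype", "Ubuntu_64", "--register")
vbox("modifyvm", "DemoVM", "--memory", "2048", "--cpus", "2")  # 2 GB RAM, 2 vCPUs
vbox("startvm", "DemoVM", "--type", "headless")  # boot without opening a GUI window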
