NIST Cloud Architecture Guide
● The goal is to achieve effective and secure cloud computing to reduce cost
and improve services
● In general, NIST produces reports for future reference, which include surveys and
analyses of existing cloud computing reference models, vendors, and federal
agencies.
[Figure: NIST cloud computing conceptual reference model, showing the cloud provider's
service layers (e.g., IaaS), the resource abstraction and control layer, and the physical
resource layer; cloud service management (business support, provisioning/configuration,
portability and interoperability); cross-cutting security and privacy; the cloud broker
(service aggregation and arbitrage); the cloud auditor (security audit, privacy impact
audit, performance audit); and the cloud carrier.]
● Cloud broker: An entity that manages the performance and delivery of cloud
services and negotiates relationships between cloud providers and cloud consumers.
● The interaction between the actors may lead to different use-case scenarios.
● Figure 3.4 shows one such scenario, in which the cloud consumer may
request service from a cloud broker instead of contacting the cloud provider
directly. In this case, the cloud broker can create a new service by combining
multiple services.
[Figure 3.4: The cloud consumer requests service through a cloud broker, which combines
services from Provider 1 and Provider 2.]
[Figure: A cloud provider arranges for a cloud carrier to deliver services to the consumer;
SLA #1 (consumer-provider) maintains the consistent level of service, while SLA #2
(provider-carrier) specifies the capacity and functionality required of the carrier.]
[Figure: A cloud auditor conducts independent assessments of the services offered by the
cloud provider to the cloud consumer.]
● The cloud consumer is the principal stakeholder for the cloud computing service
and requires service-level agreements to specify the performance
requirements to be fulfilled by a cloud provider.
What is the importance of cloud storage and what are the different types of cloud storage?
In cloud computing, cloud storage is a virtual locker where we can remotely stash any data. When we upload a file
to a cloud-based service like Google Drive, OneDrive, or iCloud, that file is copied over the Internet into a data
server: the actual physical space where companies store files on multiple hard drives.
Block-Based Storage
Hard drives are block-based storage systems. Your operating system, such as Windows or Linux, actually sees a
hard disk drive. So, it sees a drive on which you can create a volume, and then you can partition that
volume and format it.
For example, if a system has a 1000 GB volume, we can partition it into 800 GB and 200 GB for the local
C and local D drives respectively.
Remember that with a block-based storage system, your computer sees a drive on which you can create
volumes and partitions.
File-Based Storage (NAS)
In this case, you connect through a Network Interface Card (NIC). You go over a network
and access a network-attached storage (NAS) server. NAS devices are file-based storage
systems.
This storage server is another computing device that has its own disks. It has already created a file
system, formatted its partitions, and it shares its file systems over the network. Here,
you can map a drive to its network location.
In this case, there is no need for the user to partition and format the volume; this is already
done in file-based storage systems. So, the operating system sees a file system that is mapped to a local
drive letter.
Object-Based Storage
In this case, a user uploads objects using a web browser into a container, i.e., an object
storage container. This uses the HTTP protocol with REST APIs (for example: GET, PUT, POST,
DELETE).
For example, when you connect to any website, you need to download some images, text, or other content that
the website contains. That is done with an HTTP GET request. If you want to post a review of a product, you
can use PUT and POST requests.
Also, there is no hierarchy of objects in the container. Every file is on the same level in an object-based
storage system.
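To make the request/response pattern concrete, here is a minimal sketch using Python's requests library. The endpoint, container name, and token are hypothetical placeholders; real object stores (S3, Azure Blob, Swift) add their own authentication and header conventions.

```python
import requests

# Hypothetical object storage endpoint and credentials (for illustration only).
BASE_URL = "https://storage.example.com"
HEADERS = {"Authorization": "Bearer <access-token>"}

# PUT: upload an object into a container; the object name is a flat key,
# not a path in a directory hierarchy.
with open("report.pdf", "rb") as f:
    resp = requests.put(f"{BASE_URL}/my-container/report.pdf",
                        data=f, headers=HEADERS)
    resp.raise_for_status()

# GET: download the same object back over HTTP.
resp = requests.get(f"{BASE_URL}/my-container/report.pdf", headers=HEADERS)
resp.raise_for_status()
with open("report-copy.pdf", "wb") as f:
    f.write(resp.content)

# DELETE: remove the object from the container.
requests.delete(f"{BASE_URL}/my-container/report.pdf", headers=HEADERS)
```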
Benefits of cloud storage:
Scalability – Capacity and storage can be expanded and performance can be enhanced.
Simpler Data Migrations – New data can be added and old data removed as required, eliminating
disruptive data migrations.
Recovery – In the event of a hard drive failure or other hardware malfunction, you can access your files in
the cloud.
Best Cloud Storage Providers
1. Dropbox
Ideal for users having limited data. This platform offers a state-of-the-art workspace to store files at a
central location. You can access it from any geographical location 24X7. Computers, mobiles as well as
tablets can access files in Dropbox.
Features:
Disadvantage:
Plans start off with a mere 2GB of free data which may be insufficient for many.
Pricing:
2GB is offered free of charge. There are two plans for individuals and two plans for teams. The two
individual plans are termed Plus and Professional. In the Plus plan, the user is charged $8.25 a month. In
the Professional plan, the user is charged $16.58 a month.
The 2 team plans are termed Standard and Advanced. In the Standard Plan, the user is charged $12.50 a
month. In the Advanced plan, the user is charged $20 a month. The user is offered a free trial for the
Advanced, Professional as well as Standard plans. No free trial exists for the Plus plan.
2. Microsoft OneDrive
OneDrive is a built-in feature of Windows 10’s File Explorer. To use it you need not download any extra app.
Hence it is extremely convenient for Windows 10 users. Microsoft’s Photos app has the option of
employing OneDrive to sync images across all your devices. Recently, AutoCAD has been
added to OneDrive, a move that may please as well as attract AutoCAD users. The Personal Vault feature
provides an extra layer of security. An app exists for iOS as well as Android devices. There is also a handy
app in the App Store meant for Mac users.
Pricing:
OneDrive offers 5GB at no cost. A 100GB plan is offered at $1.99 per month, and 1TB comes at $7 per
month. OneDrive for Business provides unlimited storage at $10 per month.
3. Google Drive
Google Drive is one of today’s most powerful cloud storage services. To use it, however, is a bit different
from what you might be used to, so here we see the advantages and disadvantages of Google Drive.
Advantages:
Disadvantages:
Its productivity applications fare poorly when benchmarked against contemporary Microsoft Office.
Overall, Google Drive is a slick, feature-packed cloud storage provider with the added bonus of huge cloud
storage. If you are looking for superior collaboration capabilities along with storage, then this is the
right choice.
4. iCloud
The novelty of this platform is that it is the ideal cloud storage provider for Apple
aficionados. Moreover, it works great for private users. The cloud storage platform runs on operating
systems that include iOS, macOS, and Windows.
Features:
Its claim to fame is that it is Apple's proprietary cloud storage platform.
Users can collaborate with applications that include Notes, Keynote as well as Pages.
● The best-known cloud storage service is Amazon’s Simple Storage Service
(S3), which was launched in 2006.
● It gives any developer access to the same highly scalable data storage
infrastructure that Amazon uses to run its own global network of web
sites.
● Each bucket is owned by an AWS account, and the objects in a bucket are
identified by a unique, user-assigned key.
● Buckets and objects are created, listed and retrieved using either a
REST or SOAP interface.
● Objects can also be retrieved using the HTTP GET interface or via BitTorrent.
● An access control list restricts who can access the data in each bucket.
● Bucket names and keys are formulated so that they can be accessed using
HTTP.
● Requests are authorized using an access control list associated with each
bucket and object, for instance:
http://s3.amazonaws.com/samplebucket/samplekey
● Buckets can also be set up to save HTTP log information to another bucket.
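The bucket/key model above can be sketched with the boto3 SDK. This is an illustrative example only: it assumes AWS credentials are already configured (environment variables or ~/.aws/credentials), the bucket name is a placeholder and must be globally unique, and buckets outside us-east-1 additionally need a region configuration.

```python
import boto3

# Assumes credentials are already configured; the bucket name is illustrative.
s3 = boto3.client("s3")

# Create a bucket owned by this AWS account.
s3.create_bucket(Bucket="samplebucket-demo-123")

# Store an object under a user-assigned key.
s3.put_object(Bucket="samplebucket-demo-123", Key="samplekey",
              Body=b"hello cloud storage")

# List and retrieve objects through the same REST-backed API.
listing = s3.list_objects_v2(Bucket="samplebucket-demo-123")
obj = s3.get_object(Bucket="samplebucket-demo-123", Key="samplekey")
print(obj["Body"].read())
```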
● IaaS providers can offer the bare metal in terms of virtual machines where
PaaS solutions are deployed.
● When there is no need for a PaaS layer, it is possible to directly customize the
virtual infrastructure with the software stack needed to run applications.
● This is the case of virtual Web farms: a distributed system composed of Web
servers, database servers and load balancers on top of which prepackaged
software is installed to run Web applications.
● Other solutions provide prepackaged system images that already contain the
software stack required for the most common uses: Web servers, database
servers or LAMP stacks.
● An IaaS solution is generally organized into three principal layers:
○ Physical infrastructure
○ Software management infrastructure
○ User interface
● Web services and RESTful APIs allow programs to interact with the
service without human intervention, thus providing complete integration
within a software system.
● In the case of complete IaaS solutions, all three levels are offered as services.
● This is generally the case with public clouds vendors such as Amazon, GoGrid,
Joyent, Rightscale, Terremark, Rackspace, ElasticHosts, and Flexiscale, which
own large datacenters and give access to their computing infrastructures
using an IaaS approach.
3.4.1 IaaS
● The available options within the IaaS offering umbrella range from single
servers to entire infrastructures, including network devices, load balancers,
database servers and Web servers.
● Virtual machines also constitute the atomic components that are deployed
and priced according to the specific features of the virtual hardware:
memory, number of processors and disk storage.
● At the same time, users can take advantage of the full customization offered
by virtualization to deploy their infrastructure in the cloud.
3.4.2 PaaS
● From a user point of view, the core middleware exposes interfaces that allow
programming and deploying applications on the cloud.
● PaaS solutions can offer middleware for developing applications together with
the infrastructure or simply provide users with the software that is installed
on the user premises.
● In the first case, the PaaS provider also owns large datacenters where
applications are executed
● In the second case, referred to in this book as Pure PaaS, the middleware
constitutes the core value of the offering.
○ PaaS-I
○ PaaS-II
○ PaaS-III
○ In this case, developers generally use the provider’s APIs, which are
built on top of industrial runtimes, to develop applications.
○ Google AppEngine is the most popular product in this category.
○ It provides a scalable runtime based on the Java and Python
programming languages, which have been modified for providing a
secure runtime environment and enriched with additional APIs and
components to support scalability.
○ AppScale, an open-source implementation of Google AppEngine,
provides interface-compatible middleware that has to be installed on a
physical infrastructure.
● The third category consists of all those solutions that provide a cloud
programming platform for any kind of application, not only Web
applications.
3.4.3 SaaS
● On the provider side, the specific details and features of each customer’s
application are maintained in the infrastructure and made available on
demand.
● The SaaS model is appealing for applications serving a wide range of users
and that can be adapted to specific needs with little further customization.
● This is the case of CRM and ERP applications that constitute common
needs for almost all enterprises, from small to medium-sized and large
business.
● Every enterprise will have the same requirements for the basic features
concerning CRM and ERP and different needs can be satisfied with further
customization.
● On the customer side, such costs constitute a minimal fraction of the usage
fee paid for the software.
● Initially this approach was affordable for service providers, but it later
became inconvenient when the cost of customizations and specializations
increased.
● Initially the SaaS model was of interest only for lead users and early adopters.
● This led to the transition to SaaS 2.0, which does not introduce a new
technology but transforms the way in which SaaS is used.
It is possible to organize all the concrete realizations of cloud computing into a layered view covering the
entire stack, from hardware appliances to software systems.
All of the physical manifestations of cloud computing can be arranged into a layered picture that
encompasses everything from software systems to hardware appliances. Utilizing cloud resources can
provide the “computing horsepower” needed to deliver services. This layer is frequently realized as a
data center with dozens or even millions of stacked nodes. Because it can be constructed from a range of
resources, including clusters and even networked PCs, cloud infrastructure can be heterogeneous in
character. The infrastructure can also include database systems and other storage services.
The core middleware, whose goals are to create an optimal runtime environment for applications and to
best utilize resources, manages the physical infrastructure. Virtualization technologies are employed at the
bottom of the stack to ensure runtime environment modification, application isolation, sandboxing, and
service quality. At this level, hardware virtualization is most frequently utilized. The distributed
infrastructure is exposed as a collection of virtual computers via hypervisors, which control the pool of
available resources. By adopting virtual machine technology, it is feasible to precisely divide up hardware
resources like CPU and memory as well as virtualize particular devices to accommodate user and
application needs.
Layered Architecture of Cloud
Application Layer
1. The application layer, which is at the top of the stack, is where the actual cloud apps are located.
Cloud applications, as opposed to traditional applications, can take advantage of the automatic-
scaling functionality to gain greater performance, availability, and lower operational costs.
2. This layer consists of different Cloud Services which are used by cloud users. Users can access
these applications according to their needs. Applications are divided into Execution
layers and Application layers.
3. In order for an application to transfer data, the application layer determines whether
communication partners are available. Whether enough cloud resources are accessible for the
required communication is decided at the application layer. Applications must cooperate in order to
communicate, and an application layer is in charge of this.
4. The application layer, in particular, is responsible for handling IP traffic protocols such as
Telnet and FTP. Other examples of application-layer systems include web browsers, the SNMP
protocol, and HTTP or HTTPS (HTTP secured with TLS).
Platform Layer
1. The operating system and application software make up this layer.
2. Users should be able to rely on the platform to provide them with scalability, dependability, and
security protection. It gives users a space to create their apps, test operational processes, and
keep track of execution outcomes and performance. This layer also serves as the foundation on
which SaaS applications are implemented.
3. The objective of this layer is to deploy applications directly on virtual machines.
4. Operating systems and application frameworks make up the platform layer, which is built on top of
the infrastructure layer. The platform layer’s goal is to lessen the difficulty of deploying
programs directly into VM containers.
5. By way of illustration, Google App Engine functions at the platform layer to provide API support
for implementing storage, databases, and business logic of ordinary web apps.
Infrastructure Layer
1. It is a layer of virtualization where physical resources are divided into a collection of virtual
resources using virtualization technologies like Xen, KVM, and VMware.
2. This layer serves as the Central Hub of the Cloud Environment, where resources are
constantly added utilizing a variety of virtualization techniques.
3. It is the base upon which the platform layer is created, constructed from virtualized network,
storage, and computing resources, and it gives users the flexibility they want.
4. Automated resource provisioning is made possible by virtualization, which also improves
infrastructure management.
5. The infrastructure layer sometimes referred to as the virtualization layer, partitions the physical
resources using virtualization technologies like Xen, KVM, Hyper-V, and VMware to create a
pool of compute and storage resources.
6. The infrastructure layer is crucial to cloud computing since virtualization technologies are the only
ones that can provide many vital capabilities, like dynamic resource assignment.
Datacenter Layer
In a cloud environment, this layer is responsible for Managing Physical Resources such as
servers, switches, routers, power supplies, and cooling systems.
Providing end users with services requires all resources to be available and managed in data
centers.
Physical servers connect through high-speed devices such as routers and switches to the data
center.
In software application designs, the division of business logic from the persistent data it
manipulates is well-established. This is due to the fact that the same data cannot be incorporated
into a single application because it can be used in numerous ways to support numerous use cases.
The requirement for this data to become a service has arisen with the introduction of
microservices.
A single database used by many microservices creates a very close coupling. As a result, it is hard
to deploy new or emerging services separately if such services need database modifications that
may have an impact on other services. A data layer containing many databases, each serving a
single microservice or perhaps a few closely related microservices, is needed to break complex
service interdependencies.
As per NIST, give the definition of Infrastructure as a Service (IaaS), and write a brief note on
cloud storage.
According to the National Institute of Standards and Technology (NIST), Infrastructure as a Service (IaaS) is
defined as:
"The capability provided to the consumer is to provision processing, storage, networks, and other fundamental
computing resources where the consumer is able to deploy and run arbitrary software, which can include
operating systems and applications. The consumer does not manage or control the underlying cloud
infrastructure but has control over operating systems, storage, and deployed applications; and possibly limited
control of select networking components (e.g., host firewalls)."
In cloud computing, cloud storage is a virtual locker where we can remotely stash any data. When we upload a file
to a cloud-based service like Google Drive, OneDrive, or iCloud, that file is copied over the Internet into a data
server: the actual physical space where companies store files on multiple hard drives.
Block-Based Storage
Hard drives are block-based storage systems. Your operating system, such as Windows or Linux, actually sees a
hard disk drive. So, it sees a drive on which you can create a volume, and then you can partition that
volume and format it.
For example, if a system has a 1000 GB volume, we can partition it into 800 GB and 200 GB for the local
C and local D drives respectively.
Remember that with a block-based storage system, your computer sees a drive on which you can create
volumes and partitions.
File-Based Storage (NAS)
In this case, you connect through a Network Interface Card (NIC). You go over a network
and access a network-attached storage (NAS) server. NAS devices are file-based storage
systems.
This storage server is another computing device that has its own disks. It has already created a file
system, formatted its partitions, and it shares its file systems over the network. Here,
you can map a drive to its network location.
In this case, there is no need for the user to partition and format the volume; this is already
done in file-based storage systems. So, the operating system sees a file system that is mapped to a local
drive letter.
Object-Based Storage
In this case, a user uploads objects using a web browser into a container, i.e., an object
storage container. This uses the HTTP protocol with REST APIs (for example: GET, PUT, POST,
DELETE).
For example, when you connect to any website, you need to download some images, text, or other content that
the website contains. That is done with an HTTP GET request. If you want to post a review of a product, you
can use PUT and POST requests.
Also, there is no hierarchy of objects in the container. Every file is on the same level in an object-based
storage system.
Benefits of cloud storage:
Scalability – Capacity and storage can be expanded and performance can be enhanced.
Simpler Data Migrations – New data can be added and old data removed as required, eliminating
disruptive data migrations.
Recovery – In the event of a hard drive failure or other hardware malfunction, you can access your files in
the cloud.
Disadvantages of cloud storage:
Data centers require electricity and a reliable internet connection to operate; without them, the system
will not work properly.
Support for cloud storage isn’t the best, especially if you are using a free version of a cloud provider.
When you use a cloud provider, your data is no longer on your physical storage.
Cloud-based storage is dependent on having an internet connection. If you are on a slow network you
may have issues accessing your storage.
Describe cloud storage providers along with an example.
Cloud storage providers offer services that allow individuals and businesses to store data on remote servers
accessed via the internet. These providers maintain and manage the storage infrastructure, including
hardware, security, and network, allowing users to focus on accessing and managing their data without
worrying about the underlying infrastructure.
1. Scalability: Users can easily scale their storage needs up or down based on demand.
2. Accessibility: Data can be accessed from anywhere with an internet connection, making it ideal for
remote work and global collaboration.
3. Reliability: Cloud storage providers typically offer high availability with data replication across multiple
locations, reducing the risk of data loss.
4. Security: Providers implement advanced security measures such as encryption, access control, and
authentication to protect data.
5. Cost-Effectiveness: Users pay for the storage they use, which can be more economical compared to
maintaining on-premises storage infrastructure.
● The best-known cloud storage service is Amazon’s Simple Storage Service
(S3), which was launched in 2006.
● It gives any developer access to the same highly scalable data storage
infrastructure that Amazon uses to run its own global network of web
sites.
● Amazon keeps its lips pretty tight about how S3 works, but according to
Amazon, S3’s design aims to provide scalability, high availability, and low
latency at commodity costs.
● S3 stores arbitrary objects of up to 5GB in size, each accompanied by
up to 2KB of metadata.
● Each bucket is owned by an AWS account, and the objects in a bucket are
identified by a unique, user-assigned key.
● Buckets and objects are created, listed and retrieved using either a
REST or SOAP interface.
● Objects can also be retrieved using the HTTP GET interface or via BitTorrent.
● An access control list restricts who can access the data in each bucket.
● Bucket names and keys are formulated so that they can be accessed using
HTTP.
● Requests are authorized using an access control list associated with each
bucket and object, for instance:
http://s3.amazonaws.com/samplebucket/samplekey
● Buckets can also be set up to save HTTP log information to another bucket.
Data Lock-in
Data lock-in is a situation in which a customer using the services of one provider cannot move to another
service provider because the technologies used by the provider are incompatible with those of other providers.
This makes the customer dependent on a single vendor for services and unable to use the services of
another vendor.
Solution:
Have standardization (in technologies) among service providers so that customers can easily move from
one service provider to another.
1. Public Cloud:
Definition: The public cloud is a cloud computing model where services and infrastructure are provided
over the internet by third-party providers. These services are available to the general public, businesses,
and other organizations on a pay-as-you-go basis.
Key Features:
Shared Resources: Infrastructure and resources are shared among multiple users (tenants),
making it cost-effective.
Scalability: Public clouds offer virtually unlimited scalability, allowing users to scale up or down
quickly based on demand.
Accessibility: Resources are accessible over the internet, from anywhere, and on any device.
Cost-Efficiency: Users only pay for the resources they consume, avoiding the cost of maintaining
on-premises hardware.
Examples:
Amazon Web Services (AWS)
Microsoft Azure
Google Cloud Platform (GCP)
2. Private Cloud:
Definition: A private cloud is a cloud computing model where the infrastructure and services are
dedicated to a single organization. The cloud environment can be managed internally by the organization
or externally by a third-party provider but is not shared with other users.
Key Features:
Dedicated Resources: The infrastructure is dedicated to a single organization, providing higher
levels of control and security.
Customization: Private clouds can be tailored to meet the specific needs of an organization,
including security, compliance, and performance requirements.
Security and Compliance: Enhanced security features and the ability to meet strict regulatory
compliance needs make private clouds ideal for sensitive data.
Control: Organizations have full control over their infrastructure, including hardware, software,
and network configurations.
Examples:
VMware vCloud
Microsoft Azure Stack
OpenStack Private Cloud
3. Hybrid Cloud:
Definition: A hybrid cloud is a computing environment that combines public and private clouds, allowing
data and applications to be shared between them. This model provides the flexibility of multiple
deployment methods and optimizes workloads across various environments.
Key Features:
Flexibility: Workloads can be moved between public and private clouds based on business needs,
cost, and security requirements.
Optimized Resource Use: Hybrid clouds allow organizations to use public clouds for non-
sensitive operations and private clouds for critical workloads.
Scalability: Organizations can scale their on-premises infrastructure with the resources of the
public cloud when needed, avoiding over-provisioning.
Cost Management: Hybrid clouds enable organizations to use the cost-effective public cloud for
general purposes while maintaining critical operations in the private cloud.
Examples:
Microsoft Azure with Azure Stack
AWS Outposts
Google Anthos
UNIT-4
Security in cloud computing is a critical concern, given the shared nature of cloud environments and the
sensitivity of data being stored and processed. Cloud providers implement a variety of security features to
protect data, applications, and infrastructure from potential threats. Here’s an overview of the key security
features in cloud computing:
1. Data Encryption:
Definition: Encryption is the process of converting data into a coded format that is unreadable to
unauthorized users. In cloud computing, encryption is used to protect data both at rest (stored data) and in
transit (data being transmitted over the network).
Key Points:
At-Rest Encryption: Data stored in the cloud is encrypted using algorithms like AES-256 to
prevent unauthorized access if the storage medium is compromised.
In-Transit Encryption: Data transmitted between the cloud and the user or between different parts
of the cloud infrastructure is encrypted using protocols like SSL/TLS, ensuring data remains
confidential during transmission.
Key Management: Cloud providers often offer key management services (KMS) to help users
manage their encryption keys securely.
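The at-rest encryption idea above can be illustrated with the cryptography library's Fernet recipe (AES-based authenticated encryption). This is a minimal sketch only: in a real cloud deployment the key would be generated, stored, and rotated by the provider's KMS rather than held in application code, and the ciphertext would live in cloud storage.

```python
from cryptography.fernet import Fernet

# In practice this key would live in a key management service (KMS),
# not in the application; it is generated inline here only for illustration.
key = Fernet.generate_key()
cipher = Fernet(key)

# Encrypt data before writing it to cloud storage (encryption at rest).
plaintext = b"customer record: card ending 4242"
ciphertext = cipher.encrypt(plaintext)

# Later, decrypt the stored blob with the same managed key.
assert cipher.decrypt(ciphertext) == plaintext
```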
2. Identity and Access Management (IAM):
Definition: IAM is a framework of policies and technologies for ensuring that the right individuals have
the appropriate access to technology resources.
Key Points:
User Authentication: Strong authentication mechanisms, such as multi-factor authentication
(MFA), are used to verify the identity of users before granting access.
Role-Based Access Control (RBAC): Access to cloud resources is granted based on the roles of
users, ensuring that only authorized personnel can access sensitive data or perform certain actions.
Single Sign-On (SSO): Allows users to log in with a single set of credentials across multiple
applications, reducing the risk of password-related security breaches.
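The role-based access control point above can be sketched in a few lines of Python. The role and permission names are invented for illustration; real IAM systems (e.g., AWS IAM) express this as managed policy documents rather than in application code.

```python
# Hypothetical role-to-permission mapping, for illustration only.
ROLE_PERMISSIONS = {
    "admin":   {"storage:read", "storage:write", "vm:start", "vm:stop"},
    "auditor": {"storage:read"},
    "dev":     {"storage:read", "storage:write", "vm:start"},
}

def is_allowed(roles, action):
    """Grant access only if one of the user's roles carries the permission."""
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in roles)

print(is_allowed(["auditor"], "storage:read"))   # True
print(is_allowed(["auditor"], "storage:write"))  # False
```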
3. Firewalls and Network Security:
Definition: Firewalls and network security mechanisms are used to control and monitor incoming and
outgoing network traffic based on predetermined security rules.
Key Points:
Virtual Firewalls: Cloud providers offer virtual firewalls to protect cloud workloads from
unauthorized access, similar to traditional hardware firewalls used in on-premises networks.
Network Segmentation: By segmenting the network into different zones (e.g., public and private
subnets), cloud providers can limit the exposure of critical resources to potential threats.
Intrusion Detection and Prevention Systems (IDPS): These systems monitor cloud networks for
suspicious activities or policy violations and can automatically block or alert administrators to
potential threats.
4. Security Information and Event Management (SIEM):
Definition: SIEM tools collect and analyze security data from across the cloud infrastructure, providing
real-time visibility and enabling quick response to security incidents.
Key Points:
Log Management: SIEM tools aggregate logs from various cloud services and components,
making it easier to detect and investigate security incidents.
Anomaly Detection: Advanced SIEM systems use machine learning to identify unusual patterns or
behaviors that might indicate a security threat.
Incident Response: SIEM tools can automate incident response actions, such as isolating
compromised resources or notifying administrators of potential breaches.
5. Data Loss Prevention (DLP):
Definition: DLP technologies help prevent the unauthorized transfer of sensitive data, ensuring that
critical information does not leave the cloud environment without proper authorization.
Key Points:
Content Inspection: DLP systems inspect data moving in and out of the cloud for sensitive
information, such as credit card numbers or personal identifiable information (PII).
Policy Enforcement: DLP tools enforce policies that restrict the sharing, downloading, or copying
of sensitive data to unauthorized locations or users.
Alerting and Reporting: When a potential data breach is detected, DLP systems can alert
administrators and provide detailed reports for investigation.
6. Compliance and Regulatory Security:
Definition: Cloud providers offer features and tools to help organizations meet compliance and regulatory
requirements for data security, privacy, and reporting.
Key Points:
Compliance Certifications: Many cloud providers are certified under various standards, such as
ISO/IEC 27001, SOC 2, GDPR, and HIPAA, ensuring that their security practices meet global
regulatory standards.
Auditing and Reporting: Cloud providers offer tools for tracking access and changes to data and
resources, helping organizations demonstrate compliance during audits.
Data Residency: Some cloud providers offer options for data residency, allowing organizations to
store data in specific geographic regions to meet local regulatory requirements.
7. Backup and Disaster Recovery:
Definition: Backup and disaster recovery features ensure that data and applications can be restored
quickly in the event of a security incident, such as a data breach, ransomware attack, or hardware failure.
Key Points:
Automated Backups: Cloud providers offer automated backup services to regularly back up data
and applications, ensuring that the latest versions can be restored if needed.
Redundancy: Data is often stored across multiple locations and servers to prevent data loss due to
hardware failures or natural disasters.
Disaster Recovery Plans: Cloud providers often offer disaster recovery as a service (DRaaS),
enabling quick recovery of IT infrastructure after a catastrophic event.
8. Physical Security:
Definition: Physical security refers to the protection of the cloud provider’s data centers from physical
threats, such as unauthorized access, natural disasters, or theft.
Key Points:
Access Controls: Data centers are equipped with strict access controls, including biometric
scanners, security personnel, and surveillance cameras.
Environmental Controls: Data centers are designed to withstand natural disasters, with features
such as redundant power supplies, fire suppression systems, and climate control.
Geographic Distribution: Cloud providers often distribute data across multiple geographically
diverse locations to ensure data availability and durability.
● These standards also apply to cloud related IT activities and include specific
steps that should be taken to ensure a secure environment is maintained that
provides privacy and security of confidential information in a cloud
environment.
2. OAuth (Open Authorization)
3. OpenID
Service providers are the apps and websites that people want to access. Instead of requiring
people to sign into their apps individually, service providers configure their solutions to trust
SAML authorization and rely on the identity providers to verify identities and authorize access.
SAML assertion is the XML document containing data that confirms to the service provider that the
person who is signing in has been authenticated.
There are three types:
Authentication assertion identifies the user and includes the time the person signed-in and the
type of authentication they used, such as a password or multifactor authentication.
Attribute assertion passes the SAML token to the service provider. This assertion includes specific data
about the user.
An authorization decision assertion tells the service provider whether the user is authenticated or
if they are denied either because of an issue with their credentials or because they don’t have
permissions for that service.
How does SAML authentication work?
In SAML authentication, service providers and identity providers share sign-in and user data to confirm
that each person who requests access is authenticated. It typically follows the following steps:
1. An employee begins work by signing in using the login page provided by the identity provider.
2. The identity provider validates that the employee is who they say they are by confirming a
combination of authentication details, such as username, password, PIN, device, or biometric data.
3. The employee launches a service provider app, such as Microsoft Word or Workday.
4. The service provider communicates with the identity provider to confirm that the employee is
authorized to access that app.
5. The identity provider sends the authorization and authentication response back to the service provider.
OAuth (Open Authorization)
OAuth (Open Authorization) is an open standard for access delegation, commonly used as a way for users
to grant websites or applications limited access to their information without exposing their passwords. It is
primarily used to enable third-party applications to access user data stored on another platform securely.
Key Concepts of OAuth:
1. Resource Owner:
o The user who owns the data or resources that are being accessed.
2. Client:
o The application requesting access to the user's resources. This could be a mobile app, a web
application, or any other type of software that needs access to the user’s information.
3. Authorization Server:
o The server that issues the access token after authenticating the user and obtaining
authorization. This server typically belongs to the service provider that hosts the user’s
data.
4. Resource Server:
o The server that hosts the protected user resources (e.g., data or services) and is capable of
accepting and responding to requests for protected resources using access tokens. This is
often the same as the authorization server.
5. Access Token:
o A token that represents the authorization granted by the resource owner to the client. It is
used by the client to access the protected resources on behalf of the resource owner.
How OAuth Works:
1. Request Authorization:
o The client application requests permission from the user to access certain data. This usually
involves redirecting the user to the authorization server, where they log in and approve the
request.
2. Authorization Grant:
o If the user consents, the authorization server grants an authorization code to the client. This
code is short-lived and can be exchanged for an access token.
3. Exchange Authorization Code for Access Token:
o The client sends the authorization code to the authorization server in exchange for an
access token.
4. Access Resources:
o The client uses the access token to make API requests to the resource server on behalf of
the user. The resource server verifies the token and, if valid, grants access to the requested
resources.
5. Token Expiry and Refresh:
o Access tokens are typically short-lived for security reasons. When a token expires, the
client can use a refresh token (if one was issued) to request a new access token without
requiring the user to log in again.
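Steps 3 and 4 of the flow above can be sketched with Python's requests library. The endpoints, client credentials, and redirect URI below are hypothetical placeholders; a real provider publishes its own authorization and token endpoints and required parameters.

```python
import requests

# Hypothetical OAuth 2.0 endpoints and client registration (illustration only).
TOKEN_URL = "https://auth.example.com/oauth/token"
API_URL = "https://api.example.com/v1/profile"

# Step 3: exchange the short-lived authorization code for an access token.
token_resp = requests.post(TOKEN_URL, data={
    "grant_type": "authorization_code",
    "code": "<authorization-code-from-redirect>",
    "redirect_uri": "https://client.example.com/callback",
    "client_id": "<client-id>",
    "client_secret": "<client-secret>",
})
tokens = token_resp.json()             # typically contains access_token, expires_in,
access_token = tokens["access_token"]  # and sometimes a refresh_token

# Step 4: call the resource server on the user's behalf with a Bearer token.
profile = requests.get(API_URL,
                       headers={"Authorization": f"Bearer {access_token}"})
print(profile.json())
```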
OpenID:
OpenID is an open standard and decentralized authentication protocol that allows users to log in to
multiple websites or applications using a single set of credentials, managed by an OpenID
provider. Instead of creating separate usernames and passwords for each website, users can
authenticate with one identity provider (like Google or Facebook) and use that authentication
across various platforms.
1. OpenID Provider:
o The service that authenticates the user and provides their OpenID identity. Examples
include Google, Facebook, and other major identity providers.
2. Relying Party (RP):
o The website or application that accepts OpenID for authentication. The RP trusts the
OpenID provider to verify the user's identity.
3. User (End-User):
4. OpenID Identifier:
o A unique identifier (usually a URL) associated with the user’s OpenID account. This
identifier is provided by the OpenID provider and is used across different relying parties.
1. User Selects an OpenID Provider:
o The user selects an OpenID provider (e.g., Google) and logs into that provider's service.
2. User Initiates Sign-In at the Relying Party:
o When the user wants to log in to a website (the relying party) that supports OpenID, they
choose to sign in using OpenID.
3. Redirection to OpenID Provider:
o The relying party redirects the user to the OpenID provider’s login page. If the user is not
already logged in, they will be prompted to do so.
4. Authentication and Consent:
o The OpenID provider authenticates the user and, upon successful authentication, asks if
the user wants to share their identity with the relying party.
5. Identity Assertion Sent:
o After the user approves, the OpenID provider sends a token or assertion back to the
relying party, confirming the user's identity.
6. Access Granted:
o The relying party receives the token and grants the user access based on the verified
identity.
● The scheme does not work out right if the workload changes abruptly.
● During these events, the number of users grows before the event period and
then decreases during the event period.
● The method results in a minimal loss of QoS, if the event is predicted correctly.
● Otherwise, wasted resources are even greater due to events that do not follow
a fixed pattern.
Popularity-Driven Resource Provisioning
● Again, the scheme has a minimal loss of QoS, if the predicted popularity is
correct.
These standards also apply to cloud related IT activities and include specific steps that should be taken
to ensure a secure environment is maintained that provides privacy and security of confidential
information in a cloud environment.
SAML
Security Assertion Markup Language (SAML) is an open standard for sharing security information
about identity, authentication, and authorization across different systems. SAML is implemented with
the Extensible Markup Language (XML) standard for sharing data. It provides a framework for
implementing single sign-on (SSO) and other federated identity systems. A federated identity system
links an individual identity to multiple identity domains. This approach enables SSO that encompasses
resources on an enterprise network, trusted third-party vendor and customer networks.
Organizations use SAML both for business-to-business and business-to-consumer applications. It is used
to share user credentials across one or more networked systems. The SAML framework is designed to
accomplish two things:
1. user authentication
2. user authorization
SAML is most often used to implement SSO authentication systems that enable end users to log in to their
networks once and be authorized to access multiple resources on that network. For example, SSO
implemented with Microsoft Active Directory (AD) can be integrated with SAML 2.0 authentication
requests.
Authentication is the process of determining whether an entity is what it claims to be. It is required before
authorization, which is the process of determining whether the authenticated identity has permission to use
a resource.
SAML authentication depends on verifying user credentials, which, at a minimum, include user identity
and password. SAML can also enable support for multifactor authentication.
SAML entities
SAML defines three categories of entities:
1. End users. An end user is a person who needs to be authenticated before being allowed to use an
application.
2. Service providers. A service provider is any system that provides services, typically the services
for which users seek authentication, including web or enterprise applications.
3. Identity providers. An identity provider is a special type of service provider that administers
identity information.
SAML components
SAML incorporates four different types of components:
1. SAML assertions are statements of identity, authentication and authorization information. These
are formatted using XML tags specified in SAML.
SAML specifies three types of assertions:
1. An authentication assertion indicates that the subject of the assertion has been authenticated. It
includes the time and method of authentication, as well as the subject being authenticated.
2. An attribute assertion associates the subject of the assertion with the specified attributes. A
specified SAML attribute is one that refers to a defined piece of information relating to the
authentication subject.
3. An authorization decision assertion indicates whether a subject's request to access a resource has
been approved or declined.
2. SAML protocols define how different entities request and respond to requests for security
information. Like SAML assertions, these protocols are encoded with XML tags specified in
SAML.
Authentication Request Protocol defines requests for authentication assertions and valid
responses to such requests. This protocol is used when a request sent from a user to a service
provider needs to be redirected to an identity provider.
Single Logout Protocol defines a technique in which all of a user's active sessions can be
terminated nearly simultaneously. This capability is important for SSO implementations that
require terminating sessions with multiple resources when the user logs out.
Assertion Query and Request Protocol defines requests for new and existing authentication
assertions.
3. SAML bindings are the formats specified for SAML protocol messages to be embedded and
transported over different transmission mechanisms.
4. SAML profiles determine how SAML assertions, protocols and bindings are used together for
interoperability in certain applications.
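To make the assertion structure described above concrete, the sketch below parses a heavily simplified SAML 2.0 authentication assertion with Python's standard xml.etree module and extracts the subject and authentication time. It is illustrative only: a real service provider would also validate the XML signature, audience restriction, and validity window, and real assertions carry many more elements.

```python
import xml.etree.ElementTree as ET

NS = {"saml": "urn:oasis:names:tc:SAML:2.0:assertion"}

# A stripped-down, unsigned example assertion (illustration only).
assertion_xml = """
<saml:Assertion xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion">
  <saml:Subject><saml:NameID>alice@example.com</saml:NameID></saml:Subject>
  <saml:AuthnStatement AuthnInstant="2024-01-15T09:30:00Z"/>
</saml:Assertion>
"""

root = ET.fromstring(assertion_xml)
subject = root.find("./saml:Subject/saml:NameID", NS).text
authn_time = root.find("./saml:AuthnStatement", NS).attrib["AuthnInstant"]
print(f"{subject} authenticated at {authn_time}")
```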
OAuth
It is an open standard protocol for authorization of an application to use user information. In general, it
allows a third-party application access to user-related info like name, DOB, email, or other required data
from an application like Facebook, Google, etc., without giving the third-party app the user's password. It is
pronounced as oh-auth.
You might have seen a “login with Google” or “login with Facebook” button on the login/signup page of
a website. This makes it easier to start using the service or website: you simply log into one of these services
and grant the client application permission to access your data without giving it your password. This is done
with OAuth.
It is designed to work with HTTP (Hypertext Transfer Protocol), and it allows access tokens to be issued to
a third-party application by an authorization server with the approval of the owner. There are three
components in the OAuth mechanism:
1. OAuth Provider – The provider of OAuth, e.g., Google, Facebook, etc.
2. OAuth Client – The website or application where we are sharing or authenticating the usage of our
information, e.g., GeeksforGeeks.
3. Owner – The user whose login authenticates the sharing of information.
OAuth can be implemented via google console for “Login/Sign Up with Google” on a web app.
Pattern to be followed:
1. Get OAuth 2.0 Client ID from Google API Console
2. Next, Obtain an access token from the Google Authorization Server to access the API.
3. Send the request with the access token to an API.
4. Get Refresh token if longer access is required.
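As a rough sketch of steps 3 and 4, assuming a client ID and secret obtained from the Google API Console: the token endpoint shown is Google's published OAuth 2.0 token URL, but the field names, scopes, and userinfo endpoint should be checked against the current Google documentation, and all credential values below are placeholders.

```python
import requests

GOOGLE_TOKEN_URL = "https://oauth2.googleapis.com/token"

# Step 4: when the access token expires, use the refresh token to obtain a new
# one without asking the user to sign in again (placeholder credentials).
resp = requests.post(GOOGLE_TOKEN_URL, data={
    "grant_type": "refresh_token",
    "refresh_token": "<stored-refresh-token>",
    "client_id": "<client-id-from-google-console>",
    "client_secret": "<client-secret>",
})
new_access_token = resp.json()["access_token"]

# Step 3: send an API request carrying the access token.
userinfo = requests.get("https://www.googleapis.com/oauth2/v3/userinfo",
                        headers={"Authorization": f"Bearer {new_access_token}"})
print(userinfo.json())
```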
OpenID
OpenID Connect (OIDC) involves the following components:
A client is the software, such as a website or application, that requests tokens that are used to
authenticate a user or access a resource.
Relying parties are the applications that use OpenID providers to authenticate users.
Identity tokens contain identity data including the outcome of the authentication process, an
identifier for the user, and information about how and when the user is authenticated.
OpenID providers are the applications for which a user already has an account. Their role in
OIDC is to authenticate the user and pass that information on to the relying party.
Users are people or services that seek to access an application without creating a new account or
providing a username and password.
OIDC authentication works by allowing users to sign in to one application and receive access to another.
For example, if a user wants to create an account at a news site, they may have an option to use Facebook
to create their account rather than creating a new account. If they choose Facebook, they are using OIDC
authentication. Facebook, which is referred to as the OpenID provider, handles the authentication process
and obtains the user’s consent to provide specific information, such as a user profile, to the news site,
which is the relying party.
ID tokens
The OpenID provider uses ID tokens to transmit authentication results and any pertinent information to
the relying party. Examples of the type of data that are sent include an ID, email address, and name.
Scopes
Scopes define what the user can do with their access. OIDC provides standard scopes, which define things
such as which relying party the token was generated for, when the token was generated, when the token
will expire, and the encryption strength used to authenticate the user.
A typical OIDC authentication process includes the following steps:
1. A user goes to the application they wish to access (the relying party).
2. The user types in their username and password.
3. The relying party sends a request to the OpenID provider.
4. The OpenID provider validates the user’s credentials and obtains authorization.
5. The OpenID provider sends an identity token and often an access token to the relying party.
6. The relying party sends the access token to the user’s device.
7. The user is granted access by the relying party based on the information provided in the access token.
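The identity token mentioned above is a signed JWT. Below is a hedged sketch of how a relying party might inspect and verify one using the PyJWT library; in production the signing key is fetched from the provider's published JWKS endpoint rather than hard-coded, and the issuer, audience, and key values shown are placeholders.

```python
import jwt  # PyJWT

id_token = "<id-token-received-from-the-openid-provider>"

# Peek at the claims without verification (debugging only, never for auth decisions).
claims = jwt.decode(id_token, options={"verify_signature": False})
print(claims.get("iss"), claims.get("sub"), claims.get("email"))

# Proper verification: check the signature, audience (our client ID), and issuer.
# `provider_public_key` would normally be fetched from the provider's JWKS URL.
provider_public_key = "<provider-rsa-public-key>"
verified_claims = jwt.decode(
    id_token,
    key=provider_public_key,
    algorithms=["RS256"],
    audience="<our-client-id>",
    issuer="https://accounts.example.com",
)
```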
UNIT-5
Define Hadoop. Explain its components along with its architecture.
Hadoop is a framework written in Java that utilizes a large cluster of commodity hardware to maintain and
store big data. Hadoop works on the MapReduce programming algorithm that was introduced by
Google. Today, lots of big-brand companies use Hadoop in their organizations to deal with big data,
e.g., Facebook, Yahoo, Netflix, eBay, etc. The Hadoop architecture mainly consists of 4 components:
MapReduce
HDFS(Hadoop Distributed File System)
YARN(Yet Another Resource Negotiator)
Common Utilities or Hadoop Common
1. MapReduce
MapReduce is nothing but an algorithm or data structure that is based on the YARN framework.
The major feature of MapReduce is to perform distributed processing in parallel in a Hadoop cluster,
which makes Hadoop work so fast. When you are dealing with Big Data, serial processing is no longer
of any use. MapReduce has mainly two tasks, which are divided phase-wise:
in the first phase, Map is utilized, and in the next phase, Reduce is utilized.
Here, we can see that the input is provided to the Map() function, then its output is used as an input to the
Reduce() function, and after that we receive our final output. Let’s understand what Map() and
Reduce() do.
As we can see, an input is provided to Map(); since we are using Big Data, the input is a set of
data. The Map() function here breaks these data blocks into tuples that are nothing but key-value pairs.
These key-value pairs are then sent as input to Reduce(). The Reduce() function combines these
broken tuples or key-value pairs based on their key value, forms a set of tuples, and performs some
operation like sorting or a summation-type job, which is then sent to the final output node. Finally, the
output is obtained.
The data processing is always done in the Reducer, depending upon the business requirement of that industry.
This is how first Map() and then Reduce() are utilized one by one.
Let’s understand the Map Task and Reduce Task in detail.
Map Task:
RecordReader: The purpose of the RecordReader is to break the records. It is responsible for providing
key-value pairs to the Map() function. The key is actually its locational information and the value is
the data associated with it.
Map: A map is nothing but a user-defined function whose work is to process the tuples obtained
from the RecordReader. The Map() function either does not generate any key-value pair or generates
multiple pairs of these tuples.
Combiner: The combiner is used for grouping the data in the Map workflow. It is similar to a local
reducer. The intermediate key-value pairs that are generated in the Map phase are combined with the help
of this combiner. Using a combiner is not necessary, as it is optional.
Partitioner: The partitioner is responsible for fetching key-value pairs generated in the Mapper
phase. The partitioner generates the shards corresponding to each reducer. The hashcode of each key
is also fetched by this partitioner. The partitioner then takes the modulus of the hashcode with the number
of reducers (key.hashCode() % (number of reducers)).
Reduce Task
Shuffle and Sort: The task of the Reducer starts with this step. The process in which the Mapper
generates the intermediate key-value pairs and transfers them to the Reducer task is known as shuffling.
Using the shuffling process, the system can sort the data by key value.
Shuffling begins once some of the Map tasks are done, which is why it is a faster process; it does not
wait for the completion of all the tasks performed by the Mapper.
Reduce: The main function or task of the Reduce step is to gather the tuples generated from the Map step
and then perform some sorting and aggregation on those key-value pairs depending on their key
element.
OutputFormat: Once all the operations are performed, the key-value pairs are written into the file
with the help of the RecordWriter, each record on a new line, with the key and value separated by a
space.
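A classic illustration of the Map and Reduce tasks is word count. The sketch below is written in the style of Hadoop Streaming (a mapper and reducer reading from standard input), which lets plain Python scripts act as the Map() and Reduce() functions; the local sort stands in for the shuffle-and-sort step that Hadoop itself performs between the two phases.

```python
import sys
from itertools import groupby

def mapper(lines):
    """Map: break each input record into (word, 1) key-value tuples."""
    for line in lines:
        for word in line.strip().split():
            yield word, 1

def reducer(pairs):
    """Reduce: group the shuffled/sorted tuples by key and sum the counts."""
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

if __name__ == "__main__":
    # Local simulation of the shuffle-and-sort step between Map and Reduce.
    mapped = sorted(mapper(sys.stdin))
    for word, total in reducer(mapped):
        print(f"{word}\t{total}")
```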
2. HDFS
HDFS (Hadoop Distributed File System) is utilized for storage. It is mainly designed for
working on commodity hardware (inexpensive devices), using a distributed file system
design. HDFS is designed in such a way that it prefers storing data in large blocks
rather than storing many small data blocks.
HDFS in Hadoop provides fault tolerance and high availability to the storage layer and the other devices
present in that Hadoop cluster. The data storage nodes in HDFS are:
NameNode(Master)
DataNode(Slave)
NameNode: The NameNode works as a master in a Hadoop cluster that guides the DataNodes (slaves).
The NameNode is mainly used for storing the metadata, i.e., the data about the data. Metadata can be the
transaction logs that keep track of the user’s activity in a Hadoop cluster.
DataNode:
DataNodes work as slaves. DataNodes are mainly utilized for storing the data in a Hadoop cluster; the
number of DataNodes can range from 1 to 500 or even more. The more DataNodes there are, the more
data the Hadoop cluster will be able to store. So it is advised that the DataNodes should have high
storage capacity to store a large number of file blocks.
High Level Architecture Of Hadoop
File Blocks in HDFS: Data in HDFS is always stored in terms of blocks. A single file of data is
divided into multiple blocks of size 128 MB, which is the default and can also be changed manually.
Let’s understand this concept of breaking a file into blocks with an example. Suppose you upload a file of
400 MB to HDFS; this file gets divided into blocks of
128 MB + 128 MB + 128 MB + 16 MB = 400 MB. This means 4 blocks are created, each of 128 MB except
the last one. Hadoop doesn’t know or care about what data is stored in these blocks, so it considers
the final file block as a partial record, as it has no idea regarding its contents.
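The 400 MB example can be reproduced with a few lines of Python; the 128 MB figure is the default HDFS block size and can be changed per cluster or per file.

```python
def split_into_blocks(file_size_mb, block_size_mb=128):
    """Return HDFS block sizes for a file: full blocks plus one partial block."""
    full_blocks = file_size_mb // block_size_mb
    remainder = file_size_mb % block_size_mb
    blocks = [block_size_mb] * full_blocks
    if remainder:
        blocks.append(remainder)  # the last block holds only the leftover data
    return blocks

print(split_into_blocks(400))  # [128, 128, 128, 16] -> 4 blocks for a 400 MB file
```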
Replication in HDFS: Replication ensures the availability of the data. Replication is making a copy of
something, and the number of times you make a copy of that particular thing can be expressed as its
replication factor.
Rack Awareness: A rack is nothing but the physical collection of nodes in our Hadoop cluster
(maybe 30 to 40). A large Hadoop cluster consists of many racks. With the help of this rack
information, the NameNode chooses the closest DataNode to achieve maximum performance while
performing read/write operations, which reduces network traffic.
3. YARN (Yet Another Resource Negotiator)
YARN is the framework on which MapReduce works; it performs job scheduling and resource
management across the cluster. Features of YARN include:
Multi-Tenancy
Scalability
Cluster-Utilization
Compatibility
● Permissive federation
● Verified federation
● Encrypted federation
● Trusted federation
Describe OpenStack along with its advantages and disadvantages.
It is a free, open-standard cloud computing platform that first came into existence on July 21, 2010. It was
a joint project of Rackspace Hosting and NASA to make cloud computing more ubiquitous in nature. It is
deployed as Infrastructure-as-a-Service (IaaS) in both public and private clouds, where virtual resources are
made available to the users. The software platform contains interrelated components that control multi-
vendor hardware pools of processing, storage, and networking resources across a data center. In OpenStack,
the tools which are used to build this platform are referred to as “projects”. These projects handle a large
number of services including computing, networking, and storage services. Unlike virtualization, in which
resources such as RAM, CPU, etc are abstracted from the hardware using hypervisors, OpenStack uses a
number of APIs to abstract those resources so that users and the administrators are able to directly interact
with the cloud services.
OpenStack components
Apart from the various projects which constitute the OpenStack platform, there are nine major services,
namely Nova, Neutron, Swift, Cinder, Keystone, Glance, Horizon, Ceilometer, and Heat. Here is the basic
definition of each of these components, which will give us a basic idea about them.
1. Nova (compute service): It manages the compute resources like creating, deleting, and handling
the scheduling. It can be seen as a program dedicated to the automation of resources that are
responsible for the virtualization of services and high-performance computing.
2. Neutron (networking service): It is responsible for connecting all the networks across
OpenStack. It is an API driven service that manages all networks and IP addresses.
3. Swift (object storage): It is an object storage service with high fault-tolerance capabilities, and it is
used to store and retrieve unstructured data objects with the help of a RESTful API. Being a distributed
platform, it is also used to provide redundant storage within servers that are clustered together. It is
able to successfully manage petabytes of data.
4. Cinder (block storage): It is responsible for providing persistent block storage that is made
accessible using an API (self- service). Consequently, it allows users to define and manage the
amount of cloud storage required.
5. Keystone (identity service provider): It is responsible for all types of authentications and
authorizations in the OpenStack services. It is a directory-based service that uses a central
repository to map the correct services with the correct user.
6. Glance (image service provider): It is responsible for registering, storing, and retrieving virtual
disk images from the complete network. These images are stored in a wide range of back-end
systems.
7. Horizon (dashboard): It is responsible for providing a web-based interface for OpenStack
services. It is used to manage, provision, and monitor cloud resources.
8. Ceilometer (telemetry): It is responsible for metering and billing of services used. Also, it is used
to generate alarms when a certain threshold is exceeded.
9. Heat (orchestration): It is used for on-demand service provisioning with auto-scaling of cloud
resources. It works in coordination with the ceilometer.
These are the services around which this platform revolves. These services individually handle
storage, compute, networking, identity, etc. They are the base on which the rest of the projects
rely, enabling them to orchestrate services, allow bare-metal provisioning, handle dashboards, etc.
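As a rough sketch of how these API-driven services are used together, the openstacksdk library can drive Nova, Glance, Neutron, and Swift from Python. The cloud name, image, flavor, network, and container names below are placeholders, and method names should be checked against the installed SDK version.

```python
import openstack

# Connect using credentials defined in clouds.yaml under a cloud named "mycloud"
# (placeholder; adjust to your own environment).
conn = openstack.connect(cloud="mycloud")

# Nova (compute): list existing servers and boot a new one.
for server in conn.compute.servers():
    print(server.name, server.status)

image = conn.compute.find_image("cirros")        # Glance-backed image (placeholder)
flavor = conn.compute.find_flavor("m1.small")    # compute flavor (placeholder)
network = conn.network.find_network("private")   # Neutron network (placeholder)

server = conn.compute.create_server(
    name="demo-instance",
    image_id=image.id,
    flavor_id=flavor.id,
    networks=[{"uuid": network.id}],
)
server = conn.compute.wait_for_server(server)

# Swift (object storage): create a container and upload an object.
conn.object_store.create_container(name="demo-container")
conn.object_store.upload_object(container="demo-container",
                                name="hello.txt", data=b"hello openstack")
```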
Features of OpenStack
Modular architecture: OpenStack is designed with a modular architecture that enables users to
deploy only the components they need. This makes it easier to customize and scale the platform to
meet specific business requirements.
Multi-tenancy support: OpenStack provides multi-tenancy support, which enables multiple users
to access the same cloud infrastructure while maintaining security and isolation between them.
This is particularly important for cloud service providers who need to offer services to multiple
customers.
Open-source software: OpenStack is an open-source software platform that is free to use and
modify. This enables users to customize the platform to meet their specific requirements, without
the need for expensive proprietary software licenses.
Distributed architecture: OpenStack is designed with a distributed architecture that enables users
to scale their cloud infrastructure horizontally across multiple physical servers. This makes it easier
to handle large workloads and improve system performance.
API-driven: OpenStack is API-driven, which means that all components can be accessed and
controlled through a set of APIs. This makes it easier to automate and integrate with other tools
and services.
Comprehensive dashboard: OpenStack provides a comprehensive dashboard that enables users
to manage their cloud infrastructure and resources through a user-friendly web interface. This
makes it easier to monitor and manage cloud resources without the need for specialized technical
skills.
Resource pooling: OpenStack enables users to pool computing, storage, and networking
resources, which can be dynamically allocated and de-allocated based on demand. This enables
users to optimize resource utilization and reduce waste.
Advantages of using OpenStack
It boosts rapid provisioning of resources due to which orchestration and scaling up and down of
resources becomes easy.
Deployment of applications using OpenStack does not consume a large amount of time.
Since resources are scalable therefore they are used more wisely and efficiently.
The regulatory compliances associated with its usage are manageable.
Disadvantages of using OpenStack
OpenStack is not very robust when orchestration is considered.
Even today, the APIs provided and supported by OpenStack are not compatible with many of the
hybrid cloud providers, thus integrating solutions becomes difficult.
Like all cloud platforms, OpenStack services also come with the risk of security breaches.