Cloud Computing Unit 5
 Security in Clouds
     Cloud security challenges
     Software as a Service Security
 Common Standards
     The Open Cloud Consortium
     The Distributed Management Task Force
     Standards for Application Developers
     Standards for Messaging
     Standards for Security
 End user Access to Cloud Computing
 Mobile Internet devices and the Cloud
 Hadoop, MapReduce, Virtual Box, Google App Engine
 Programming Environment for Google App Engine
                                Security in Clouds
 Cloud Security, also known as cloud computing security, consists of a set of policies, controls, procedures
  and technologies that work together to protect cloud-based systems, data, and infrastructure.
 These security measures are configured to protect cloud data, support regulatory compliance and
  protect customers' privacy as well as setting authentication rules for individual users and devices.
 From authenticating access to filtering traffic, cloud security can be configured to the exact needs of the
  business. And because these rules can be configured and managed in one place, administration
  overheads are reduced and IT teams empowered to focus on other areas of the business.
 The way cloud security is delivered will depend on the individual cloud provider or the cloud security
  solutions in place. However, implementation of cloud security processes should be a joint responsibility
  between the business owner and solution provider.
 For businesses making the transition to the cloud, robust cloud security is imperative. Security threats are
  constantly evolving and becoming more sophisticated, and cloud computing is no less at risk than an on-
  premise environment. For this reason, it is essential to work with a cloud provider that offers best-in-class
  security that has been customized for your infrastructure.
                          Benefits of Cloud Security
1. Centralized security: Just as cloud computing centralizes applications and data, cloud
security centralizes protection. Cloud-based business networks consist of numerous
devices and endpoints that can be difficult to manage when dealing with shadow IT or
BYOD. Managing these entities centrally enhances traffic analysis and web filtering,
streamlines the monitoring of network events and results in fewer software and policy
updates. Disaster recovery plans can also be implemented and actioned easily when they
are managed in one place.
2. Reduced costs: One of the benefits of utilizing cloud storage and security is that it
eliminates the need to invest in dedicated hardware. Not only does this reduce capital
expenditure, but it also reduces administrative overheads. Where once IT teams were
firefighting security issues reactively, cloud security delivers proactive security features
that offer protection 24/7 with little or no human intervention.
4. Reliability: Cloud computing services offer the ultimate in dependability. With the
right cloud security measures in place, users can safely access data and applications
within the cloud no matter where they are or what device they are using.
                 Software as a Service Security
 SaaS security is cloud-based security designed to protect the data that software-as-a-service
  applications carry.
 It’s a set of practices that companies that store data in the cloud put in place to
  protect sensitive information pertaining to their customers and the business itself.
 However, SaaS security is not the sole responsibility of the organization using the
  cloud service. In fact, the service customer and the service provider share the
  obligation to adhere to SaaS security guidelines published by the National Cyber
  Security Centre (NCSC).
 SaaS security is also an important part of SaaS management, which aims to reduce
  unused licenses and shadow IT and to decrease security risks by creating as much
  visibility as possible.
                   6 SaaS Security best practices
One of the main benefits that SaaS has to offer is that the respective applications are on-
demand, scalable, and very fast to implement, saving companies valuable resources and
time. On top of that, the SaaS provider typically handles updates and takes care of software
maintenance.
This flexibility and the fairly open access have created new security risks that SaaS security
best practices are trying to address and mitigate. Below are 6 security practices and
solutions that every cloud-operating business should know about.
1. Enhanced Authentication
Offering a cloud-based service to your customers means that there has to be a way for them
to access the software. Usually, this access is regulated through login credentials. That’s
why knowing how your users access the resource and how the third-party software provider
handles the authentication process is a great starting point.
Once you understand the various methods, you can make better SaaS security decisions and
enable additional security features like multifactor authentication or integrate other enhanced
authentication methods.
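One widely used form of multifactor authentication is the time-based one-time password (TOTP). As an illustration, here is a minimal sketch of the RFC 4226/6238 algorithms using only the Python standard library (real deployments should rely on a vetted authentication library rather than hand-rolled code):

```python
import hashlib
import hmac
import struct
import time

def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226)."""
    msg = struct.pack(">Q", counter)            # 8-byte big-endian counter
    digest = hmac.new(secret, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                  # dynamic truncation
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % (10 ** digits)).zfill(digits)

def totp(secret: bytes, period: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP over a time counter."""
    return hotp(secret, int(time.time()) // period, digits)
```

Because the server and the user's device share the secret, both can derive the same short-lived code, giving a second factor beyond the password.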
2. Data Encryption
The majority of channels that SaaS applications use to communicate employ TLS (Transport Layer Security)
to protect data that is in transit. However, data that is at rest can be just as vulnerable to cyber attacks as
data that is being exchanged. That’s why more and more SaaS providers offer encryption capabilities that
protect data in transit and at rest. It’s a good idea to talk to your provider and check whether enhanced data
encryption is available for all the SaaS services you use.
5. Consider CASBs
It is possible that the SaaS provider that you are choosing is not able to provide the level of SaaS security that your
company requires. If there are no viable alternatives when it comes to the vendor, consider cloud access security
broker (CASB) tool options. This allows your company to add a layer of additional security controls that are not native
to your SaaS application. When selecting a CASB –whether proxy or API-based –make sure it fits into your existing IT
architecture.
The most well-known standard in information security and compliance is ISO 27001,
developed by the International Organization for Standardization.
The ISO 27001 standard was created to assist enterprises in protecting sensitive data
through a set of best practices.
Cloud compliance is the principle that cloud-delivered systems must be compliant with
the standards their customers require. Cloud compliance ensures that cloud computing
services meet compliance requirements.
Reference: https://kinsta.com/blog/cloud-security/#how-does-cloud-security-work
                           The Open Cloud Consortium (OCC)
 OCC manages and operates resources including the Open Science Data Cloud (aka OSDC), which is a
  multi-petabyte scientific data sharing resource.
 The consortium is based in Chicago, Illinois, and is managed by the 501(c)(3) Center for Computational Science.
3.The Open Cloud Testbed - This working group manages and operates the Open Cloud Testbed. The
Open Cloud Testbed (OCT) is a geographically distributed cloud testbed spanning four data centers and
connected with 10G and 100G network connections. The OCT is used to develop new cloud computing
software and infrastructure.
4.The Biomedical Data Commons - The Biomedical Data Commons (BDC) is cloud-based infrastructure that
provides secure, compliant cloud services for managing and analyzing genomic data, electronic medical records
(EMR), medical images, and other PHI data. It provides resources to researchers so that they can more easily make
discoveries from large complex controlled access datasets. The BDC provides resources to those institutions in the
BDC Working Group. It is an example of what is sometimes called the condominium model of sharing research
infrastructure in which the research infrastructure is operated by a consortium of educational and research
organizations and provides resources to the consortium.
5. NOAA Data Alliance Working Group - The OCC National Oceanographic and Atmospheric
Administration (NOAA) Data Alliance Working Group supports and manages the NOAA data
commons and the surrounding community interested in the open redistribution of NOAA
datasets.
In 2015, the OCC was accepted into the Matter healthcare community at Chicago's historic
Merchandise Mart. Matter is a community of healthcare entrepreneurs and industry leaders
working together in a shared space to individually and collectively fuel the future of
healthcare innovation.
In 2015, the OCC announced a collaboration with the National Oceanic and Atmospheric
Administration (NOAA) to help release their vast stores of environmental data to the general
public. This effort is managed by the OCC's NOAA data alliance working group.
                     The Distributed Management Task Force (DMTF)
   DMTF is a 501(c)(6) nonprofit industry standards organization that creates open manageability standards spanning diverse emerging
    and traditional IT infrastructures including cloud, virtualization, network, servers and storage. Member companies and alliance
    partners collaborate on standards to improve interoperable management of information technologies.
   Based in Portland, Oregon, the DMTF is led by a board of directors representing technology companies including: Broadcom Inc., Cisco,
    Dell Technologies, Hewlett Packard Enterprise, Intel Corporation, Lenovo, NetApp, Positive Tecnologia S.A., and Verizon.
   Founded in 1992 as the Desktop Management Task Force, the organization's first standard was the now-legacy Desktop Management
    Interface (DMI). As the organization evolved to address distributed management through additional standards, such as the Common
    Information Model (CIM), it changed its name to the Distributed Management Task Force in 1999; it is now known simply as DMTF.
 The DMTF continues to address converged, hybrid IT and the Software Defined Data Center (SDDC)
  with its latest specifications, such as the CADF (Cloud Auditing Data Federation), CIMI (Cloud Infrastructure Management Interface),
  CIM (Common Information Model), DASH (Desktop and Mobile Architecture for System Hardware), MCTP (Management
  Component Transport Protocol), NC-SI (Network Controller Sideband Interface), OVF (Open Virtualization Format), PLDM (Platform
  Level Data Model), Redfish Device Enablement (RDE), Redfish (including protocols, schema, host interface, and profiles), SMASH (Systems
  Management Architecture for Server Hardware) and SMBIOS (System Management BIOS).
 DMTF enables more effective management of millions of IT systems
  worldwide by bringing the IT industry together to collaborate on
  the development, validation and promotion of systems
  management standards.
 The group spans the industry with 160 member companies and
  organizations, and more than 4,000 active participants crossing
  43 countries.
 The DMTF board of directors is led by 16 innovative, industry-
  leading technology companies.
 DMTF management standards are critical to enabling interoperability
  among multi-vendor systems, tools and solutions within the enterprise.
 The DMTF started the Virtualization Management Initiative (VMAN).
 The Open Virtualization Format (OVF) is a standard that emerged
  within the VMAN Initiative.
 Benefits of VMAN include lowering the IT learning curve and reducing complexity
  for vendors implementing their solutions.
 Standardized Approaches available to
   Companies due to VMAN Initiative
 Deploy virtual computer systems
 Discover and take inventory of virtual computer
  systems
 Manage the life cycle of virtual computer systems
 Add/change/delete virtual resources
 Monitor virtual systems for health and performance
 Standards for Application Developers
 The purpose of application development standards is to ensure
  uniform, consistent, high-quality software solutions.
 An Ajax framework helps developers to build dynamic web pages on the client
  side. Data is sent to or from the server using requests, usually written in
  JavaScript.
 The acronym LAMP derives from the fact that it includes Linux, Apache,
   MySQL, and PHP (or Perl or Python) and is considered by many to be
   the platform of choice for development and deployment of high-
   performance web applications which require a solid and reliable
   foundation.
                           Standards for Messaging
 The Post Office Protocol (POP) was introduced so that mail could be held on a
  server and retrieved by the client whenever it connects.
 Once the client connects, POP servers begin to download the messages and subsequently
   delete them from the server (a default setting) in order to make room for more messages.
Internet Message Access Protocol
 Once mail messages are downloaded with POP, they are automatically deleted
  from the server when the download process has finished.
 To get around these problems, a standard called the Internet Message Access Protocol
  (IMAP) was created. IMAP allows messages to be kept on the server but viewed and
  manipulated (usually via a browser) as though they were stored locally.
 Standards for Security
 Security standards define the processes, procedures, and practices
  necessary for implementing a secure environment that provides
  privacy and security of confidential information in a cloud
  environment.
 Security protocols, used in the cloud are:
     Security Assertion Markup Language (SAML)
       Open Authorization (OAuth)
      OpenID
      SSL/TLS
Security Assertion Markup Language (SAML)
 SAML is an XML-based standard for communicating authentication, authorization,
  and attribute information among online partners. It allows businesses to securely
  send assertions between partner organizations regarding the identity and
  entitlements of a principal.
 SAML allows a user to log on once for affiliated but separate Web sites.
  SAML is designed for business-to-business (B2B) and business-to-consumer
  (B2C) transactions.
 SAML is built on a number of existing standards, namely, SOAP, HTTP, and
  XML. SAML relies on HTTP as its communications protocol and specifies the
  use of SOAP.
 Most SAML transactions are expressed in a standardized form of XML.
  SAML assertions and protocols are specified using XML schema.
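To make the structure concrete, here is a minimal sketch of a SAML-style assertion built with Python's standard XML tooling. The issuer URL and subject name below are hypothetical, and a real assertion would also carry timestamps, conditions, and an XML digital signature:

```python
import xml.etree.ElementTree as ET

# The official SAML 2.0 assertion namespace.
SAML_NS = "urn:oasis:names:tc:SAML:2.0:assertion"
ET.register_namespace("saml", SAML_NS)

def build_assertion(issuer: str, name_id: str) -> str:
    """Build a bare-bones assertion skeleton: who issued it, and about whom."""
    assertion = ET.Element(f"{{{SAML_NS}}}Assertion",
                           {"Version": "2.0", "ID": "_demo-assertion-1"})
    ET.SubElement(assertion, f"{{{SAML_NS}}}Issuer").text = issuer
    subject = ET.SubElement(assertion, f"{{{SAML_NS}}}Subject")
    ET.SubElement(subject, f"{{{SAML_NS}}}NameID").text = name_id
    return ET.tostring(assertion, encoding="unicode")
```

The identity provider would sign such an assertion and send it to the partner site, which verifies the signature rather than asking the user to log in again.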
                 Open Authorization (OAuth)
 OAuth is an open protocol, initiated by Blaine Cook and Chris Messina,
  to allow secure API authorization in a simple, standardized method for
  various types of web applications.
 OAuth is a method for publishing and interacting with protected
  data.
 OAuth      provides users access to their data while protecting
  account credentials.
 OAuth by itself provides no privacy at all and   depends on other protocols
  such as SSL to accomplish that.
                           OpenID
 OpenID is an open, decentralized standard for user authentication and access
  control that allows users to log onto many services using the same digital
  identity.
 It is a single-sign-on (SSO) method of access control.
 It replaces the common log-in process (i.e., a log-in name and a password)
  by allowing users to log in once and gain access to resources across
  participating systems.
 An OpenID is in the form of a unique URL and is authenticated by the
  entity hosting the OpenID URL.
                               SSL/TLS
 Transport Layer Security (TLS) and its predecessor, Secure Sockets Layer (SSL), are
  cryptographically secure protocols designed to provide security and data integrity for
  communications over TCP/IP
 TLS and SSL encrypt the segments of network connections at the transport layer.
 TLS provides endpoint authentication and data confidentiality by using
  cryptography.
 TLS involves three basic phases:
     Peer negotiation for algorithm support
     Key exchange and authentication
     Symmetric cipher encryption and message authentication
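These three phases are exactly what Python's standard ssl module performs when wrapping a socket. A small client-side sketch (the host is whichever server you choose; no network connection is made until the function is called):

```python
import socket
import ssl

# A client context with safe defaults: certificate verification
# and hostname checking are both enabled.
context = ssl.create_default_context()

def negotiated_parameters(host: str, port: int = 443):
    """Run a full TLS handshake and return what was negotiated.

    The handshake covers the three phases above: the peers agree on
    algorithms, exchange keys and authenticate the server via its
    certificate, then switch to symmetric encryption for the data.
    """
    with socket.create_connection((host, port), timeout=10) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version(), tls.cipher()
```

Calling `negotiated_parameters("example.org")` would report the protocol version (e.g. TLS 1.3) and the symmetric cipher suite chosen during phase one.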
   End user Access to Cloud Computing
 In its most strict sense, end-user computing (EUC) refers to computer systems and
  platforms that help non-programmers create applications. What matters is that
  a well-designed EUC/VDI plan can allow users to access the digital platforms they need
  to be productive, both on-premises and working remotely in the cloud.
 An End-User Computing application or EUC is any application that is not managed and
  developed in an environment that employs robust IT general controls. Although
  the most pervasive EUCs are spreadsheets, EUCs also can include user databases,
  queries, scripts, or output from various reporting tools.
 Broadly, end-user computing covers a wide range of user-facing resources, such as:
  desktop and notebook end user computers; desktop operating systems and
  applications; wearables and smartphones; cloud, mobile, and web applications; and
  virtual desktops and applications.
       Mobile Internet devices and the Cloud
 Mobile cloud computing uses cloud computing to deliver applications to mobile devices. These
  mobile apps can be built and deployed quickly and flexibly using cloud-hosted development tools.
 Mobile cloud storage is a form of cloud storage that is accessible on mobile devices such as
  laptops, tablets, and smartphones. Mobile cloud storage providers offer services that allow the
  user to create and organize files, folders, music, and photos, similar to other cloud computing
  models.
 The mobile cloud is Internet-based data, applications and related services accessed through
  smartphones, laptop computers, tablets and other portable devices. Mobile cloud computing
  is differentiated from mobile computing in general because the devices run cloud- based Web
  apps rather than native apps.
 Locator apps and remote backup are two types of cloud-enabled services for mobile devices
 A mobile cloud app is a software program designed to be accessible via the internet through
  portable devices. In terms of the real world, there are many examples of mobile cloud
  solutions, including email.
         Hadoop (https://en.wikipedia.org/wiki/Apache_Hadoop)
 It is a collection of open-source software utilities that facilitates using a network of many
  computers to solve problems involving massive amounts of data and computation.
 It provides a software framework for distributed storage and processing of big data using
  the MapReduce programming model.
 Hadoop was originally designed for computer clusters built from commodity hardware, which
  is still the common use. It has since also found use on clusters of higher-end hardware.
 All the modules in Hadoop are designed with a fundamental assumption that hardware
   failures are common occurrences and should be automatically handled by the framework.
 The core of Apache Hadoop consists of a storage part, known as Hadoop Distributed File
  System (HDFS), and a processing part which is a MapReduce programming model.
 Hadoop splits files into large blocks and distributes them across nodes in a cluster. It then
  transfers packaged code into nodes to process the data in parallel. This approach takes
  advantage of data locality, where nodes manipulate the data they have access to.
 This allows the dataset to be processed faster and more efficiently than it would be in a more
  conventional supercomputer architecture that relies on a parallel file system where
  computation and data are distributed via high-speed networking.
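The block-splitting and replica-placement idea above can be sketched in a few lines of Python. This is a toy model: real HDFS placement is rack-aware, and the 128 MB block size is only the common default:

```python
def split_into_blocks(data: bytes, block_size: int = 128 * 1024 * 1024) -> list:
    """Split a byte stream into fixed-size blocks, as HDFS does with files."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(num_blocks: int, nodes: list, replication: int = 3) -> dict:
    """Toy round-robin placement: each block is copied to `replication` nodes.

    The point of replication is data locality and fault tolerance:
    computation can later be scheduled on any node holding a copy.
    """
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [nodes[(block_id + r) % len(nodes)]
                               for r in range(replication)]
    return placement
```

With three replicas per block, the scheduler has three candidate nodes on which a map task can run without moving data over the network.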
The base Apache Hadoop framework is composed of the following modules:
     Hadoop Common – contains libraries and utilities needed by other Hadoop modules;
     Hadoop Distributed File System (HDFS) – a distributed file-system that stores data on commodity
        machines, providing very high aggregate bandwidth across the cluster;
     Hadoop YARN – (introduced in 2012) a platform responsible for managing computing resources in
        clusters and using them for scheduling users' applications;
     Hadoop MapReduce – an implementation of the MapReduce programming model for large-scale
        data processing.
     Hadoop Ozone – (introduced in 2020) An object store for Hadoop
 The term Hadoop is often used for both base modules and sub-modules and also the ecosystem, or
  collection of additional software packages that can be installed on top of or alongside Hadoop, such as
  Apache Pig, Apache Hive, Apache HBase, Apache Phoenix, Apache Spark, Apache ZooKeeper,
  Cloudera Impala, Apache Flume, Apache Sqoop, Apache Oozie, and Apache Storm.
 Apache Hadoop's MapReduce and HDFS components were inspired by Google papers on MapReduce
  and Google File System.
 The Hadoop framework itself is mostly written in the Java programming language, with some native code
  in C and command-line utilities written as shell scripts. Though MapReduce Java code is common, any
  programming language can be used with Hadoop Streaming to implement the map and reduce parts of the
  user's program. Other projects in the Hadoop ecosystem expose richer user interfaces.
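A Hadoop Streaming-style word count can be expressed as plain Python map and reduce functions. This is a simplified sketch: a real streaming job reads tab-separated pairs on stdin and writes them to stdout, with Hadoop performing the sort between the two phases:

```python
from itertools import groupby

def map_words(lines):
    """Map phase: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reduce_counts(sorted_pairs):
    """Reduce phase: sum the counts for each word.

    Pairs must arrive grouped by key, which Hadoop's shuffle-and-sort
    step guarantees between the map and reduce phases.
    """
    for word, group in groupby(sorted_pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)
```

For example, `dict(reduce_counts(sorted(map_words(["the quick fox", "the dog"]))))` counts "the" twice, mirroring what the same mapper and reducer would produce across a cluster.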
                                          MapReduce
 MapReduce is a programming model or pattern within the Hadoop framework that is used to
  access big data stored in the Hadoop Distributed File System (HDFS). MapReduce facilitates concurrent
  processing by splitting petabytes of data into smaller chunks and processing them in parallel on
  Hadoop commodity servers.
 MapReduce is a programming model for processing large amounts of data in a parallel and
  distributed fashion. It is useful for large, long-running jobs that cannot be handled within the scope of
  a single request, tasks like:
     Analyzing application logs
     Aggregating related data from external sources
     Transforming data from one format to another
     Exporting data for external analysis
     App Engine MapReduce is a community-maintained, open source library that is built on top of
         App Engine services, including Datastore and Task Queues. The library is available on GitHub at
         these locations:
            Java source project.
            Python source project.
 MapReduce is a software framework for easily writing
  applications which process vast amounts of data (multi-terabyte
  data-sets) in-parallel on large clusters (thousands of nodes) of
  commodity hardware in a reliable, fault-tolerant manner.
 A MapReduce job usually splits the input data-set into
  independent chunks which are processed by the map tasks in a
  completely parallel manner.
 The framework sorts the outputs of the maps, which are then
  input to the reduce tasks.
 Typically both the input and the output of the job are stored in a
   file-system.
 The framework takes care of scheduling tasks, monitoring them
  and re-executes the failed tasks.
 Typically the compute nodes and the storage nodes are the same, that is, the
  MapReduce framework and the Hadoop Distributed File System are running on
  the same set of nodes. This configuration allows the framework to effectively
  schedule tasks on the nodes where data is already present, resulting in very
  high aggregate bandwidth across the cluster.
 The MapReduce framework consists of a single master JobTracker and one
  slave TaskTracker per cluster-node. The master is responsible for scheduling
  the jobs' component tasks on the slaves, monitoring them and re-executing
  the failed tasks. The slaves execute the tasks as directed by the master.
 Minimally, applications specify the input/output locations and supply map and
  reduce functions via implementations of appropriate interfaces and/or abstract
  classes. These, and other job parameters, comprise the job configuration.
 The Hadoop job client then submits the job (jar/executable etc.) and
  configuration to the JobTracker which then assumes the responsibility of
  distributing the software/configuration to the slaves, scheduling tasks and
  monitoring them, providing status and diagnostic information to the job-client.
                                    VirtualBox
 VirtualBox is a general-purpose Type-2 Hypervisor virtualization tool for x86 and x86-
   64 hardware developed by Oracle Corp., targeted at server, desktop, and embedded use,
   that allows users and administrators to easily run multiple guest operating systems on a
   single host.
 VirtualBox was originally created by Innotek GmbH, which was acquired by Sun
  Microsystems in 2008, which was in turn acquired by Oracle in 2010.
 VirtualBox may be installed on Microsoft Windows, MacOS, Linux, Solaris and
  OpenSolaris. There are also ports to FreeBSD and Genode.
 It supports the creation and management of guest virtual machines running Windows,
  Linux, BSD, OS/2, Solaris, Haiku, and OSx86, as well as limited virtualization of macOS
  guests on Apple hardware. For some guest operating systems, a "Guest Additions"
  package of device drivers and system applications is available, which typically improves
  performance, especially that of graphics, and allows changing the resolution of the guest
  OS automatically when the window of the virtual machine on the host OS is resized.
                             Google App Engine
 Google App Engine (often referred to as GAE or simply App Engine) is a cloud computing
  platform as a service for developing and hosting web applications in Google-managed
  data centers. Applications are sandboxed and run across multiple servers.
 Google App Engine is a platform-as-a-service (PaaS) offering that gives software
  developers access to Google's scalable hosting.
 An App Engine web application can be described as having three major parts:
         Application instances
         Scalable data storage
         Scalable services
 Programming Environment for Google App Engine
 Google App Engine (often referred to as GAE or simply App Engine) is a cloud computing platform as
  a service for developing and hosting web applications in Google-managed data centers.
 Applications are sandboxed and run across multiple servers. App Engine offers automatic scaling for
  web applications—as the number of requests increases for an application, App Engine automatically
  allocates more resources for the web application to handle the additional demand.
 Google App Engine primarily supports Go, PHP, Java, Python, Node.js, .NET, and Ruby applications,
  although it can also support other languages via "custom runtimes". The service is free up to a
  certain level of consumed resources and only in standard environment but not in flexible
  environment. Fees are charged for additional storage, bandwidth, or instance hours required by the
  application. It was first released as a preview version in April 2008 and came out of preview in
  September 2011.
 The environment you choose depends on the language and related technologies you want to use for
  developing the application.
Runtimes and framework
 Google App Engine primarily supports Go, PHP, Java, Python, Node.js, .NET,
  and Ruby applications, although it can also support other languages via "custom runtimes".
 Python web frameworks that run on Google App Engine include Django, CherryPy, Pyramid,
  Flask, web2py and webapp2, as well as a custom Google-written webapp framework and
  several others designed specifically for the platform that emerged since the release.
 Any Python framework that supports WSGI, using the CGI adapter, can be used to create
  an application; the framework can be uploaded with the developed application. Third-party
  libraries written in pure Python may also be uploaded.
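Since any WSGI-compliant framework can run on App Engine's Python runtime, the interface itself is small. A minimal WSGI application looks like this (a generic sketch of the PEP 3333 interface, not App Engine-specific code):

```python
def application(environ, start_response):
    """Minimal WSGI callable: receives the request environment,
    reports status and headers via start_response, returns the body."""
    body = f"Hello, {environ.get('PATH_INFO', '/')}".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Frameworks such as Flask or webapp2 ultimately expose exactly this callable to the server, which is why they are interchangeable on a WSGI-based platform.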
 Google App Engine supports many Java standards and frameworks. Core to this is
   the servlet 2.5 technology using the open-source Jetty Web Server, along with
   accompanying technologies such as JSP. JavaServer Faces operates with some
workarounds. A newer release of App Engine Standard Java in Beta supports Java 8, Servlet
3.1 and Jetty 9.
 Though the integrated database, Google Cloud Datastore, may be unfamiliar
  to programmers, it is accessed and supported with JPA, JDO, and by the
  simple low-level API.
 There are several alternative libraries and frameworks you can use to model
  and map the data to the database such as Objectify, Slim3 and Jello
  framework.
 The Spring Framework works with GAE. However, the Spring Security module
  (if used) requires workarounds. Apache Struts 1 is supported, and Struts 2
  runs with workarounds.
 The Django web framework and applications running on it can be used on App
  Engine with modification.
 Django-nonrel aims to allow Django to work with non-relational databases, and
  the project includes support for App Engine.