S-24 SOL
1. Attempt any FIVE of the following :
   a) Define :
        i.    Cloud
       
       Cloud refers to a network of servers that are accessed over the internet and are present at a
       remote location (servers situated far away from the user's machine).
       OR
       The cloud is a term used to describe a global network of servers, each with a unique function.
       The cloud is not a physical entity; instead, it is a vast network of remote servers around the
       globe that are hooked together and meant to operate as a single ecosystem.
         ii.   Cloud Computing
        REFER W-23
   b) Differentiate between SLA and SLO. (Any 2 points)

      i.  An SLA (Service Level Agreement) is the formal agreement between the cloud provider and the
          customer that defines the overall service commitment, including responsibilities and penalties
          if it is not met, whereas an SLO (Service Level Objective) is a specific, measurable target
          (e.g., 99.9% monthly uptime) defined within the SLA.
      ii. An SLA is the complete contract made up of many objectives, while an SLO is one individual,
          quantifiable objective used to check whether the SLA is being satisfied.
   c) Define Key-Value Databases and give any two advantages.
   
   A key-value data model or database is also referred to as a key-value store. It is a non-relational
   type of database. In this, an associative array is used as a basic database in which an individual key
    is linked with just one value in a collection. Keys act as unique identifiers for the values, and a
    value can be any kind of data. A collection of such key-value pairs stored as separate records forms
    a key-value database, and it does not have a predefined structure.
    Advantages:
    1. Simplicity and speed: data is read and written directly by key, so lookups are very fast.
    2. Scalability and flexibility: key-value stores scale out easily across servers and do not force a
       fixed schema on the stored values.
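    To make the key-value model concrete, the following is a minimal, illustrative sketch in Python of an
    in-memory key-value store (it is not a real key-value database such as Redis or DynamoDB; the keys
    and values are made up for the example):

        # Minimal in-memory key-value store, for illustration only.
        store = {}

        def put(key, value):
            # The key is a unique identifier; the value can be any kind of data.
            store[key] = value

        def get(key):
            # Lookup happens directly by key, which is why reads are fast.
            return store.get(key)

        put("user:101", {"name": "Asha", "cart": ["pen", "notebook"]})
        put("session:9f2c", "active")
        print(get("user:101"))      # {'name': 'Asha', 'cart': ['pen', 'notebook']}
        print(get("session:9f2c"))  # active

    Note that no schema is declared anywhere: each record is just a key and whatever value is stored
    against it.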
   d) Define Data Pipeline.
    REFER W-23
   e) State difference between Hybrid and Multicloud Kubernetes. (Any 2 points)
    REFER W-23
   f) Enlist any two features of Azure ML Studio.

      1. Drag-and-drop interface for building machine learning pipelines with little or no coding.
      2. Built-in modules for data preparation, model training, evaluation, and deployment of models as
         web services.
   g) Define Cloud Computing Data Warehouse.
   
   A data warehouse is a centralized repository for storing and managing large amounts of data from
   various sources for analysis and reporting. It is optimized for fast querying and analysis, enabling
   organizations to make informed decisions by providing a single source of truth for data.
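    As a rough illustration of the "single source of truth for analysis" idea, the sketch below uses
    Python's built-in sqlite3 module as a stand-in for a warehouse; the table, sources, and figures are
    invented purely for the example:

        import sqlite3

        # Tiny stand-in for a warehouse: one fact table holding sales collected
        # from different source systems.
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE sales (region TEXT, source TEXT, amount REAL)")
        conn.executemany(
            "INSERT INTO sales VALUES (?, ?, ?)",
            [("West", "online", 120.0), ("West", "store", 80.0), ("East", "online", 200.0)],
        )

        # Analytical query: aggregate across all sources for reporting.
        for region, total in conn.execute(
            "SELECT region, SUM(amount) FROM sales GROUP BY region"
        ):
            print(region, total)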
2. Attempt any THREE of the following :
   a) Give advantages of cloud computing in machine learning. (Any 4 points)
    REFER W-23
b) Explain Cloud Deployment Models.
c) Differentiate Batch data and Streaming data in machine learning.
 REFER W-23
d) Explain evolution of cloud computing.
Distributed Systems:
It is a composition of multiple independent systems, but all of them appear to users as a single entity.
The purpose of distributed systems is to share resources and use them effectively and efficiently.
and efficiently. Distributed systems possess characteristics such as scalability, concurrency,
continuous availability, heterogeneity, and independence in failures. But the main problem with
this system was that all the systems were required to be present at the same geographical location.
Mainframe computing:
Mainframes, which first came into existence in 1951, are highly powerful and reliable computing
machines. They are responsible for handling large volumes of data and massive input-output operations.
Even today they are used for bulk processing tasks such as online transactions. These systems have
almost no downtime and high fault tolerance. After distributed computing, they increased the processing
capability of systems.
Cluster computing:
In the 1980s, cluster computing came as an alternative to mainframe computing. Each machine in the
cluster was connected to the others by a high-bandwidth network. Clusters were far cheaper than
mainframe systems and were equally capable of high-performance computation. Also, new nodes could
easily be added to the cluster if required.
Grid computing:
In the 1990s, the concept of grid computing was introduced. Different systems were placed at entirely
different geographical locations and were all connected via the internet. These systems belonged to
different organizations, and thus the grid consisted of heterogeneous nodes. Although it solved some
problems, new problems emerged as the distance between nodes increased.
Virtualization:
It was introduced nearly 40 years ago. It refers to the process of creating a virtual layer over the
hardware, which allows the user to run multiple instances simultaneously on the same hardware. It is a
key technology used in cloud computing and is the base on which major cloud computing services such as
Amazon EC2, VMware vCloud, etc. are built.
Web 2.0:
It is the interface through which cloud computing services interact with clients. It is because of
Web 2.0 that we have interactive and dynamic web pages. It also increases flexibility among web pages.
Popular examples of Web 2.0 include Google Maps and Facebook.
Service orientation:
It acts as a reference model for cloud computing. It supports low-cost, flexible, and evolvable
applications. Two important concepts were introduced in this computing model: Quality of Service (QoS)
and Software as a Service (SaaS).
   Utility computing:
   It is a computing model that defines service provisioning techniques for services such as compute,
   storage, and infrastructure, which are provisioned on a pay-per-use basis.
3. Attempt any THREE of the following :
   a) Explain Resource Management and Disaster Recovery in Cloud Computing.
   
   b) Compare SaaS and IaaS. (Any 4 points)
   
c) Describe Evolving from ETL to ELT with example.
d) Explain Cloud Computing architecture.
1. Frontend :
Frontend of the cloud architecture refers to the client side of the cloud computing system. It contains
all the user interfaces and applications that the client uses to access cloud computing
services/resources, for example, a web browser used to access the cloud platform.
Client Infrastructure – Client Infrastructure is a part of the frontend component. It contains the
applications and user interfaces required to access the cloud platform.
In other words, it provides a GUI (Graphical User Interface) to interact with the cloud.
2. Backend :
Backend refers to the cloud itself which is used by the service provider. It contains the resources
as well as manages the resources and provides security mechanisms. Along with this, it includes
huge storage, virtual applications, virtual machines, traffic control mechanisms, deployment
models, etc.
Application –
Application in the backend refers to the software or platform that the client accesses. It provides the
service in the backend as per the client's requirement.
Service –
Service in the backend refers to the three major types of cloud-based services: SaaS, PaaS and IaaS. It
also manages which type of service the user accesses.
Runtime Cloud-
Runtime cloud in the backend provides the execution and runtime platform/environment to the virtual
machines.
Storage –
Storage in backend provides flexible and scalable storage service and management of stored data.
Infrastructure –
Cloud infrastructure in the backend refers to the hardware and software components of the cloud; it
includes servers, storage, network devices, virtualization software, etc.
Management –
Management in the backend refers to the management of backend components like application, service,
runtime cloud, storage, infrastructure, and other security mechanisms.
Security –
Security in the backend refers to the implementation of different security mechanisms to secure cloud
resources, systems, files, and infrastructure for end-users.
Internet –
Internet connection acts as the medium or a bridge between frontend and backend and establishes
the interaction and communication between frontend and backend.
Database–
Database in the backend refers to database services for storing structured and unstructured data, such
as SQL and NoSQL databases. Examples of database services include Amazon RDS, Microsoft Azure SQL
Database and Google Cloud SQL.
Networking–
Networking in the backend refers to services that provide networking infrastructure for applications in
the cloud, such as load balancing, DNS and virtual private networks.
Analytics–
Analytics in the backend refers to services that provide analytics capabilities for data in the cloud,
such as data warehousing, business intelligence and machine learning.
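As a minimal sketch of the frontend/backend split described above, the snippet below plays the role of
the frontend (a plain HTTP client standing in for a browser) reaching a backend service over the
internet; the endpoint URL is purely hypothetical:

    import json
    import urllib.request

    # Hypothetical backend endpoint exposed by a cloud service (made-up URL).
    ENDPOINT = "https://api.example-cloud.com/v1/status"

    # The frontend talks to the backend over the internet, which acts as the bridge.
    try:
        with urllib.request.urlopen(ENDPOINT, timeout=5) as response:
            payload = json.loads(response.read().decode("utf-8"))
            print("Backend replied:", payload)
    except (OSError, ValueError) as exc:
        print("Could not reach the backend:", exc)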
4. Attempt any THREE of the following :
   a) State properties and characteristics of cloud computing.
    REFER W-23
   b) Explain platform as a service in detail.
    REFER W-23
   c) Explain steps involved in collecting and ingesting data.
    REFER W-23
   d) Explain elastic resources in cloud computing.
    REFER W-23
5. Attempt any TWO of the following:
   a) Give various Cloud-based tools used in Data Science in Machine Learning.
   
   Cloud-based Tools for Data Science in Machine Learning
   Cloud-based tools have become essential for data scientists and machine learning practitioners
   because they provide scalable infrastructure, prebuilt algorithms, collaborative environments, and
   easy integration with data storage and processing services. Here are some widely used tools (a short
   query sketch follows the list):
   1. Google Cloud Platform (GCP)
   Tools:
      BigQuery: A serverless data warehouse for analyzing large datasets using SQL.
      AI Platform: Provides services to train, deploy, and manage machine learning models.
      TensorFlow on GCP: Integration of TensorFlow with GCP for deep learning.
      Vertex AI: Unified platform to build, deploy, and scale ML models.
      Dataflow: Real-time stream and batch data processing.
   2. Amazon Web Services (AWS)
   Tools:
      SageMaker: End-to-end machine learning service for building, training, and deploying
       models at scale.
      Redshift: Cloud data warehouse for storing and querying large datasets.
      AWS Lambda: For running serverless machine learning pipelines.
      Glue: ETL (Extract, Transform, Load) service to prepare and process data for ML.
      AWS DeepLens: A deep learning-enabled video camera for edge ML development.
3. Microsoft Azure
Tools:
   Azure Machine Learning Studio: A drag-and-drop interface for building ML pipelines without
    extensive coding.
   Azure Synapse Analytics: Combines big data and data warehouse capabilities.
   Data Factory: Helps build ETL and ELT pipelines.
   Azure Cognitive Services: Prebuilt APIs for vision, speech, and language processing.
   Azure Databricks: Collaborative platform for big data analytics and machine learning.
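As one concrete example from the tools listed above, the sketch below queries a BigQuery public dataset
with the google-cloud-bigquery Python client; it assumes the package is installed and that Google Cloud
credentials and a default project are already configured in the environment:

    from google.cloud import bigquery

    client = bigquery.Client()

    # Standard SQL run against a BigQuery public dataset.
    query = """
        SELECT name, SUM(number) AS total
        FROM `bigquery-public-data.usa_names.usa_1910_2013`
        GROUP BY name
        ORDER BY total DESC
        LIMIT 5
    """

    # The query runs on BigQuery's serverless backend; only the results come back.
    for row in client.query(query).result():
        print(row.name, row.total)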
b) Define Container and explain Docker in detail.
Containerization is defined as a form of operating system virtualization, through which
applications are run in isolated user spaces called containers, all using the same shared operating
system (OS). A container is essentially a fully packaged and portable computing environment.
Docker
Docker can package an application and its dependencies in a virtual container that can run on any
Linux, Windows, or macOS computer. This enables the application to run in a variety of locations, such
as on-premises, in a public cloud and/or in a private cloud.
1) Docker is a containerization platform that is used to package your application and all its
dependencies together in the form of containers, so that your application works seamlessly in any
environment, whether development, testing, or production.
2) Docker is a tool designed to make it easier to create, deploy, and run applications by using
containers.
3) Docker is the world's leading software container platform. It was launched in 2013 by a company
called dotCloud, Inc., which was later renamed Docker, Inc. It is written in the Go language.
Docker architecture consists of the Docker client, the Docker daemon running on the Docker host, and
the Docker Hub repository. Docker has a client-server architecture in which the client communicates
with the Docker daemon running on the Docker host through a REST API, over UNIX sockets or a network
interface (TCP).
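A small sketch of this client/daemon interaction using the Docker SDK for Python (the docker package)
is shown below; it assumes the SDK is installed and a local Docker daemon is running:

    import docker

    # The client connects to the Docker daemon via the local socket.
    client = docker.from_env()

    # Pull (if needed) and run the official hello-world image; remove=True deletes
    # the container after it exits, and its stdout is returned as bytes.
    output = client.containers.run("hello-world", remove=True)
    print(output.decode("utf-8"))

The same packaged image runs unchanged on any host with a Docker engine, which is the portability
benefit described above.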
c) Explain Jupyter Notebook with its workflow.
The Jupyter Notebook is an open-source web application that allows you to create and share documents.
These documents contain live code, equations, visualizations and narrative text. It is useful for data
cleaning and transformation, numerical simulation, statistical modelling, data visualization, machine
learning, and much more. Language of choice: Jupyter supports over 40 programming languages.
Notebooks can be shared with others using email, Dropbox, GitHub and the Jupyter Notebook Viewer. Your
code can produce rich, interactive output: HTML, images, videos, and custom MIME types. Big data
integration: leverage big data tools, such as Apache Spark, from Python, R and Scala, and explore that
same data with pandas, scikit-learn, ggplot2 and TensorFlow.
   Key Features of Jupyter Notebook
   1. Supports Multiple Languages: While originally supporting Python, Jupyter now works with
      over 40 programming languages (e.g., R, Julia, JavaScript, etc.).
   2. Interactive Development: Users can write and execute code in real-time.
   3. Rich Text Integration: Combine live code with Markdown for explanatory text, making it
      perfect for documentation and tutorials.
   4. Data Visualization: Seamlessly integrates with visualization libraries like Matplotlib,
      Seaborn, and Plotly.
   5. Modular and Reusable Code: Users can execute specific cells independently, allowing for
      quick debugging and experimentation.
   6. Open and Shareable: Notebooks are stored in .ipynb format and can be easily shared via
      platforms like GitHub.
   7. Extensions: A wide array of extensions (e.g., JupyterLab, nbconvert) adds functionality like
      exporting notebooks or collaborative editing.
   How Jupyter Notebook Works
   1. Web Interface: Jupyter runs on a local server, and users interact with it through a web
      browser.
   2. Cells: The interface is divided into cells, which can contain:
           o   Code Cells: Run live code and see outputs immediately.
           o   Markdown Cells: Add documentation with rich text formatting.
   3. Kernel: The kernel is the computational engine (e.g., Python or R) that executes the code in
      the notebook.
   4. Notebook Server: Hosts the notebook interface, allowing users to run computations locally
      or on the cloud. (A minimal sketch of the notebook file structure follows.)
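   The sketch below makes the cell structure concrete by building a tiny notebook programmatically with
   nbformat (installed along with Jupyter); the file name and cell contents are arbitrary examples:

       import nbformat

       # A notebook is just a list of cells (Markdown and code) saved as an .ipynb file.
       nb = nbformat.v4.new_notebook()
       nb.cells = [
           nbformat.v4.new_markdown_cell("# Demo\nThis cell documents the analysis."),
           nbformat.v4.new_code_cell("x = [1, 2, 3]\nprint(sum(x))"),
       ]

       with open("demo.ipynb", "w", encoding="utf-8") as f:
           nbformat.write(nb, f)

   Opening demo.ipynb in Jupyter shows the two cells; running the code cell sends it to the kernel,
   which returns the output 6 to the browser.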
6. Attempt any TWO of the following:
   a) Draw and explain architecture of Modern Data Pipeline.
    REFER W-23
   b) Explain any six issues which are common with Kubernetes.
    REFER W-23
   c) Elaborate AWS SageMaker.
   
AWS SageMaker is a service provided by Amazon Web Services (AWS) that gives every developer and data
scientist the ability to build, train and deploy machine learning models quickly.
It is a fully managed service that covers the entire machine learning workflow: labelling and preparing
the data, choosing an algorithm, training the model, tuning and optimizing it for deployment, and making
predictions and taking action. This helps in faster model production with less effort and cost.
AWS SageMaker enables data scientists and developers to quickly and easily build, train and
deploy machine learning models at any scale.
It includes modules that can be used together or independently to build, train and deploy ML models.
Applications of AWS SageMaker:
1. Predictive Analysis
2. Computer Vision
3. Natural Language Processing (NLP)
4. Fraud Detection
5. Recommendation System
Workflow in SageMaker (a code sketch of steps 2-5 follows this list):
1. Data Preparation: The first step in the workflow is to prepare the data for training the machine
learning model. This includes tasks such as collecting, cleaning, and transforming data into the
appropriate format.
2. Model Building: Once the data is prepared, the next step is to build the machine learning
model. SageMaker provides a variety of pre-built algorithms and frameworks, or users can bring
their own custom algorithms.
3. Model Training: After the model is built, the next step is to train it using the prepared data.
SageMaker provides a range of options for training, including distributed training on multiple
instances for faster results.
4. Model Optimization: Once the model is trained, the next step is to optimize it for
performance. This includes tasks such as fine-tuning hyperparameters and optimizing the model’s
architecture.
5. Model Deployment: Once the model is optimized, the next step is to deploy it for use in a
production environment. SageMaker provides options for deploying models to various endpoints,
including Amazon EC2 instances, Lambda functions, and API Gateway.
6. Model Monitoring: Once the model is deployed, the next step is to monitor its performance in
real time. SageMaker provides built-in monitoring tools that track the model’s performance
metrics and detect anomalies.
7. Model Management: Finally, once the model is in production, it’s important to manage it over
time. This includes tasks such as updating the model with new data, retraining the model
periodically, and ensuring that it remains performant over time.
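The rough sketch below shows how steps 2-5 above might look with the SageMaker Python SDK; it assumes
the sagemaker package is installed, and the IAM role, training script and S3 paths shown are all
placeholders:

    import sagemaker
    from sagemaker.sklearn.estimator import SKLearn

    session = sagemaker.Session()
    role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder role ARN

    # Model building and training: a managed scikit-learn training job on one instance.
    estimator = SKLearn(
        entry_point="train.py",          # user-provided training script (placeholder)
        framework_version="1.2-1",
        instance_type="ml.m5.large",
        instance_count=1,
        role=role,
        sagemaker_session=session,
    )
    estimator.fit({"train": "s3://my-bucket/train/"})  # placeholder S3 path

    # Deployment: create a real-time endpoint that serves predictions over HTTPS.
    predictor = estimator.deploy(
        initial_instance_count=1, instance_type="ml.m5.large"
    )
    print(predictor.endpoint_name)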
Advantages of SageMaker:
1. Scalability: Supports training on multiple instances, including powerful GPUs.
2. Cost-Effectiveness: Pay-as-you-go model with features like spot instances and serverless inference
to reduce cost.
3. Ease of Use: Provides managed infrastructure and tools, reducing the complexity of setting up and
managing the ML environment.
4. Integration with AWS: Seamlessly integrates with other AWS services like S3, Redshift and Lambda.