
AWS AI-ML VIRTUAL INTERNSHIP

Internship-II report submitted in partial fulfillment of


requirements for the award of degree of

Bachelor of Technology
in
Electronics And Communication Engineering
by
UNDURTY TEJA BABU (323103312L24)
Under the guidance of

Dr. S.M.K Chaitanya

Department of Electronics and Communication Engineering


GAYATRI VIDYA PARISHAD COLLEGE OF ENGINEERING (AUTONOMOUS)
(Affiliated to Andhra University, Visakhapatnam, A.P)
VISAKHAPATNAM- 530048
AUGUST-2025

Gayatri Vidya Parishad College of Engineering (Autonomous)
Visakhapatnam

ELECTRONICS AND COMMUNICATION ENGINEERING


DEPARTMENT

CERTIFICATE

This is to certify that the Mini Project-II/Intern-II titled AWS AI-ML VIRTUAL INTERNSHIP is a
bonafide record of the work done by UNDURTY TEJA BABU (323103312L24) in partial
fulfillment of the requirements for the award of the degree of Bachelor of Technology in
Electronics and Communication Engineering of the Gayatri Vidya Parishad College of
Engineering (Autonomous), affiliated to Andhra University, Visakhapatnam, during the year
2025-2026.

Under the guidance of: Head of Department:

Dr. S.M.K Chaitanya Dr. N. Deepika Rani


MTech., Ph.D. B.E., M.E., Ph.D.
Assistant Professor, Professor & HOD,
Dept. of ECE, Dept. of ECE,
GVPCE(A) GVPCE(A)

INTERNSHIP CERTIFICATE

ACKNOWLEDGEMENT
I would like to express my deep sense of gratitude to our esteemed institute, Gayatri Vidya
Parishad College of Engineering (Autonomous), which has provided me an opportunity to
fulfill my cherished desire.

I thank our course coordinator Dr. CH. Sitha Kumari, Associate Professor, Department of
Computer Science and Engineering, and our internship mentor Dr. D. Uma Devi, Associate
Professor, for their kind suggestions and guidance towards the successful completion of this internship.

I am highly indebted to Dr. N. Deepika Rani, Professor and Head of the
Department of Electronics and Communication Engineering, Gayatri Vidya Parishad
College of Engineering (Autonomous), for giving me an opportunity to do the internship in
college.

I express my sincere thanks to our Principal, Dr. A. B. Koteswara Rao, Gayatri Vidya
Parishad College of Engineering (Autonomous), for his encouragement during this
project and for giving us a chance to explore and learn new technologies in the form of mini projects.

I am grateful to EDUSKILLS and AICTE for providing this learning opportunity.

Finally, I am indebted to the teaching and non-teaching staff of the Electronics and
Communication Engineering Department for all their support in the completion of this project.

UNDURTY TEJA BABU (323103312L24)

INDEX
CLOUD FOUNDATION

1. Introduction to cloud computing
2. Cloud economics and billing
3. AWS global infrastructure
4. AWS cloud security
5. Networking and content delivery
6. Compute
7. Storage
8. Databases
9. Cloud architecture
10. Auto scaling and monitoring

AI-ML

11. Introduction to machine learning
12. ML pipeline with Amazon SageMaker
13. Forecasting
14. Computer vision
15. Natural language processing
16. Case study
17. Conclusion
18. References

ABSTRACT

The AI-ML virtual internship covers topics in artificial intelligence and machine
learning. Before introducing AI-ML, the course covered the basics of the cloud
computing technologies, which are termed the cloud foundations.

By the end of this internship I understood the complete concepts of the cloud
foundations, along with the basic topics of artificial intelligence, which include
subtopics such as machine learning, natural language processing (NLP), and deep learning.

A Cloud Foundation is a multi-disciplinary team of enterprise architects, developers
and operators, network and security engineers, and system and database administrators. The
team governs and enables the organization's cloud transformation process.

AI, which stands for artificial intelligence, refers to systems or machines that mimic
human intelligence to perform tasks and can iteratively improve themselves based on
the information they collect. AI manifests in a number of forms.

TOPICS LEARNT IN CLOUD FOUNDATION:

1. Introduction to cloud computing

2. Cloud economics and billing

3. AWS global infrastructure

4. AWS cloud security

5. Networking and content delivery

6. Compute

7. Storage

8. Databases

9. Cloud Architecture

10. Auto scaling and monitoring

I. INTRODUCTION TO CLOUD COMPUTING

Cloud computing is the on-demand delivery of compute power, database, storage,


applications, and other IT resources via the internet with pay-as-you-go pricing. When
you use a cloud service provider like AWS, that service provider owns the computers that
you are using.
Cloud computing enables you to stop thinking of your infrastructure as hardware, and
instead think of and use it as software, since it can change much more quickly, easily, and
cost-effectively.
The service models of cloud computing are IaaS (Infrastructure as a Service), PaaS
(Platform as a Service), and SaaS (Software as a Service).
The cloud deployment models are: cloud (public), hybrid, and on-premises (private).
The advantages of cloud computing: Trade capital expense for variable expense,
Benefit from massive economies of scale, Stop guessing capacity, Increase speed and
agility, Stop spending money on running and maintaining data centers, Go global in
minutes.
A web service is any piece of software that makes itself available over the internet or on
private (intranet) networks. A web service uses a standardized format such as Extensible
Markup Language (XML) or JavaScript Object Notation (JSON) for the request and the
response of an application programming interface (API) interaction.

Few services provided by AWS: Amazon Elastic Compute Cloud (Amazon EC2) ,
Amazon Simple Storage Service (Amazon S3), Amazon Virtual Private Cloud (Amazon
VPC), Amazon Relational Database Service (Amazon RDS) etc.

Three ways to interact with AWS: AWS Management Console(Easy to use graphical
interface), Command Line Interface (AWS CLI) (Access to services by discrete
commands or scripts), Software Development Kits (SDKs)(Access services directly from
your code (such as Java, Python, and others)).

The AWS Cloud Adoption Framework (AWS CAF) provides guidance and best practices
to help organizations identify gaps in skills and processes.

Six core perspectives: Business, People, and Governance perspectives (focus on business
capabilities.) Platform, Security, and Operations perspectives (focus on technical
capabilities.)

II. CLOUD ECONOMICS AND BILLING

AWS pricing model: There are three fundamental drivers of cost with AWS: compute,
storage, and outbound data transfer.
 How you pay for AWS:
1) Pay for what you use
2) Pay less when you reserve
3) Pay less when you use more
4) Pay even less as AWS grows

Services with no charge: Amazon VPC(Virtual Private Cloud), IAM(Identity and Access
Management), AWS Elastic Beanstalk, Cloud Formation, Automatic Scaling, AWS Ops
Works, Consolidated Billing.
Total Cost of Ownership (TCO): It is the financial estimate to help identify direct and
indirect costs of a system.
 Uses of TCO:
• To compare the costs of running an entire infrastructure environment or specific
work load on-premises versus on AWS
• To budget and build the business case for moving to the cloud.
TCO considerations: Server costs, Storage costs, Network costs, IT labor costs.
 Use the AWS Pricing Calculator to:
• Estimate monthly costs
• Identify opportunities to reduce monthly costs
• Model your solutions before building them
• Explore price points and calculations behind your estimate
• Find the available instance types and contract terms that meet your needs
• Name your estimate and create and name groups of services
AWS ORGANIZATIONS: It is a free account management service that enables you to
consolidate multiple AWS accounts into an organization that you create and centrally
manage. AWS Organizations includes consolidated billing and account management
capabilities that help you to better meet the budgetary, security, and compliance needs of
your business.
 BENEFITS:
Policy-based account management
Group-based account management
Application programming interfaces (APIs) that automate account management
Consolidated billing.
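The "pay less when you use more" idea can be sketched as a tiny cost calculator. This is a minimal illustration only; the rates and the volume tier below are hypothetical, not real AWS prices.

```python
# Rough sketch of pay-as-you-go billing: charge only for hours used,
# with a cheaper rate once usage crosses a volume tier.
# The rates below are hypothetical, not real AWS prices.

def monthly_compute_cost(hours_used, on_demand_rate=0.10,
                         tier_threshold=500, tier_rate=0.08):
    """Pay for what you use; pay less when you use more."""
    if hours_used <= tier_threshold:
        return hours_used * on_demand_rate
    tiered_hours = hours_used - tier_threshold
    return tier_threshold * on_demand_rate + tiered_hours * tier_rate

print(round(monthly_compute_cost(100), 2))  # 100 h at the base rate -> 10.0
print(round(monthly_compute_cost(800), 2))  # 500 h base + 300 h discounted -> 74.0
```

For a real estimate, the AWS Pricing Calculator described above models the actual rates, instance types, and contract terms.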

III. AWS GLOBAL INFRASTRUCTURE

The AWS Global Infrastructure is designed and built to deliver a flexible, reliable,
scalable, and secure cloud computing environment with high-quality global network
performance.

The AWS Cloud infrastructure is built around Regions. AWS has 22 Regions

worldwide. An AWS region is a physical geographical location with one or more

Availability Zones. Availability Zones in turn consist of one or more data centers.

Each AWS Region has multiple, isolated locations that are known as availability

Zones. AWS Points of Presence are located in most of the major cities around the

world.

AWS INFRASTRUCTURE FEATURES:

Elasticity and scalability
• Elastic infrastructure; dynamic adaptation of capacity
• Scalable infrastructure; adapts to accommodate growth
Fault tolerance
• Continues operating properly in the presence of a failure
• Built-in redundancy of components
High availability
• High level of operational performance
• Minimized downtime
• No human intervention

AWS categories of Services:


Storage service category, Compute service category, Database service category,
Networking and content delivery service category, Security, identity, and compliance
service category, AWS cost management service category, Management and
governance service category.

IV. AWS CLOUD SECURITY

AWS SHARED RESPONSIBILITY MODEL: This shared model can help relieve the
customer's operational burden, as AWS operates, manages, and controls the components
from the host operating system down. AWS's responsibility under this model is protecting
the infrastructure that runs all the services offered in the AWS Cloud.

Security of the Cloud: AWS is responsible for the physical infrastructure that hosts your
resources, including: Physical security of data centers, Hardware infrastructure, Software
infrastructure, Network infrastructure (routers, switches)

Security in the Cloud: Customer is responsible for Amazon Elastic Compute Cloud
(Amazon EC2) instance operating system (Including patching, maintenance),
Applications (Passwords, role-based access, etc.), Security group configuration, OS
or host-based firewalls (Including intrusion detection or prevention systems),
Network configurations, Account management (Login and permission settings for
each user).

AWS Identity and Access Management (IAM) allows you to control access to compute,
storage, database, and application services, and handles authentication in the AWS Cloud.
Essential components: IAM user, IAM group, IAM policy, IAM role

Authentication is a basic computer security concept. A user or system must first prove
their identity. You can assign two different types of access to users:
Programmatic access:
1) Authenticate using Access key ID, Secret access key
2) Provides AWS CLI and AWS SDK access
AWS Management Console access:
1) Authenticate using a 12-digit account ID, IAM user name, and IAM password
2) If enabled, multi-factor authentication (MFA) prompts for an authentication code.

Authorization is the process of determining what permissions a user, service, or application
should be granted. By default, IAM users do not have permissions to access any resources
or data in an AWS account. Instead, you must explicitly grant permissions to a user, group,
or role by creating a policy, which is a document in JavaScript Object Notation (JSON) format.

An IAM policy is a formal statement of permissions that will be granted to an entity. There
are two types of IAM policies.

1) Identity-based policies: Permissions policies that you can attach to a principal such as
an IAM user. Categorized as managed policies and inline policies.
2) Resource-based policies: These are JSON policy documents that you attach to a
resource, such as an S3 bucket.
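An identity-based policy is a JSON document with a Version, and one or more Statements that each carry an Effect, Actions, and Resources. A minimal sketch, built as a Python dict (the bucket name is a hypothetical example):

```python
import json

# Sketch of an identity-based IAM policy document. It allows read-only
# access to one S3 bucket; "example-bucket" is a made-up name.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::example-bucket",
                "arn:aws:s3:::example-bucket/*",
            ],
        }
    ],
}

# Serialize to the JSON form that IAM actually stores.
print(json.dumps(policy, indent=2))
```

Attaching this document to an IAM user, group, or role grants exactly the listed actions on the listed resources and nothing else.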

Securing data on AWS:


• Encryption of data at rest (data that is physically stored on disk or on tape is
encrypted by using the open-standard Advanced Encryption Standard (AES)-256
encryption algorithm)
• Encryption of data in transit (data that is moving across the network is encrypted by
using Transport Layer Security (TLS) 1.2 with an open-standard AES-256 cipher;
TLS was formerly called Secure Sockets Layer (SSL))

AWS compliance programs:


AWS security compliance programs provide information about the policies, processes, and
controls that are established and operated by AWS.
• AWS Config is used to assess, audit, and evaluate the configurations of AWS
resources.
• AWS Artifact provides access to security and compliance reports.

V. NETWORKING AND CONTENT DELIVERY

A computer network is two or more client machines that are connected together to share
resources. A 32-bit IP address is called an IPv4 address. IPv6 addresses, which are 128
bits, are also available and can accommodate many more user devices.

A common method to describe networks is Classless Inter-Domain Routing (CIDR).


The CIDR address is expressed as follows: an IP address (the first address in the network),
a slash character (/), and a number that tells you how many bits of the routing prefix must
be fixed or allocated for the network identifier.
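Python's standard library can parse CIDR notation directly, which makes the idea concrete. In 10.0.1.0/24, the first 24 bits are fixed as the network identifier, leaving 8 bits (256 addresses) for hosts:

```python
import ipaddress

# A /24 fixes the first 24 bits as the network identifier,
# leaving 8 bits (2**8 = 256 addresses) for hosts.
network = ipaddress.ip_network("10.0.1.0/24")
print(network.network_address)   # 10.0.1.0
print(network.prefixlen)         # 24
print(network.num_addresses)     # 256

# IPv6 CIDR blocks work the same way, just over 128-bit addresses.
v6 = ipaddress.ip_network("2001:db8::/32")
print(v6.num_addresses)          # 2**96 addresses
```

The same notation is what a VPC or subnet CIDR block uses when you create it.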

The Open Systems Interconnection (OSI) model is a conceptual model that is used to
explain how data travels over a network. It consists of seven layers, shows the common
protocols and addresses that are used to send data at each layer, and can also be used to
understand how communication takes place in a virtual private cloud (VPC).

A VPC is a logically isolated section of the AWS Cloud. It belongs to one Region,
requires a CIDR block, and is subdivided into subnets. A subnet belongs to one
Availability Zone and requires a CIDR block.

VPC networking options include: internet gateway, NAT gateway, VPC peering,
VPC sharing, AWS Site-to-Site VPN, AWS Direct Connect, and AWS Transit Gateway.
You can use the VPC Wizard to implement your design.

VPC security:

1) Build security into your VPC architecture (isolate subnets if possible, choose the
appropriate gateway device or VPN connection for your needs, and use firewalls)
2) Security groups and network ACLs are firewall options that you can use to secure your
VPC.

Amazon Route 53 is a highly available and scalable cloud DNS web service that translates
domain names into numeric IP addresses, supports several types of routing policies.

Multi-Region deployment improves your application's performance for a global audience,
and you can use Amazon Route 53 failover to improve the availability of your applications.

A CDN is a globally distributed system of caching servers that accelerates delivery of
content. Amazon CloudFront is a fast CDN service that securely delivers data, videos,
applications, and APIs over a global infrastructure with low latency and high transfer
speeds.
Amazon CloudFront offers many benefits, including:
• Fast and global
• Security at the edge
• Highly programmable

VI. COMPUTE

Amazon Elastic Compute Cloud (Amazon EC2): It provides virtual machines where you
can host the same kinds of applications that you might run on a traditional on premises
server. It provides secure, resizable compute capacity in the cloud. EC2 instances can
support a variety of workloads.

Common uses for EC2 instances: Application servers, Web servers, Database
servers, Game servers, Mail servers, Media servers, Catalog servers, File servers,
Computing servers, Proxy servers

Launching an Amazon EC2 instance:

1) Select an AMI
2) Select an instance type
3) Specify network settings
4) Attach an IAM role (optional)
5) Add a user data script (optional)
6) Specify storage
7) Add tags
8) Configure security group settings
9) Identify or create the key pair

The four pillars of cost optimization:

Right size: Choose the right balance of instance types. Notice when servers can be
either sized down or turned off, and still meet your performance requirements.
Increase elasticity: Design your deployments to reduce the amount of server
capacity that is idle by implementing deployments that are elastic, such as
deployments that use automatic scaling to handle peak loads.
Optimal pricing model: Recognize the available pricing options. Analyse your
usage patterns so that you can run EC2 instances with the right mix of pricing
options.
Optimize storage choices: Analyse the storage requirements of your deployments.
Reduce unused storage overhead when possible, and choose less expensive storage
options if they can still meet your requirements for storage performance.

Benefits of AWS Lambda (a serverless compute service that runs your code in response to events):

• It supports multiple programming languages
• Completely automated administration
• Built-in fault tolerance
• It supports the orchestration of multiple functions
• Pay-per-use pricing
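A Lambda function is, at its core, just a handler function that receives an event and returns a response; the service handles the servers. A minimal sketch, invoked locally (no AWS account needed; the event shape here is made up for illustration):

```python
# Minimal sketch of a Lambda-style handler: a plain function that
# receives an event dict and a context object. On Lambda the service
# invokes it for you; locally we can call it directly.

def lambda_handler(event, context):
    name = event.get("name", "world")
    return {"statusCode": 200, "body": f"Hello, {name}!"}

# Simulate an invocation.
print(lambda_handler({"name": "AWS"}, None))
# {'statusCode': 200, 'body': 'Hello, AWS!'}
```

Pay-per-use pricing follows from this model: you are billed per invocation and per unit of execution time, not for an idle server.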

VII. STORAGE

Amazon EBS provides block-level storage volumes for use with Amazon EC2 instances.
Amazon EBS provides three volume types: General Purpose SSD, Provisioned IOPS SSD,
and magnetic.

Benefits: replication in the same Availability Zone, easy and transparent


encryption, elastic volumes, and backup by using snapshots.

Amazon Simple Storage Service (Amazon S3): Data is stored as objects in buckets;
storage is virtually unlimited; a single object is limited to 5 TB; designed for 11 9s of
durability; granular access to buckets and objects. Amazon S3 offers a range of object-level
storage classes that are designed for different uses: Amazon S3 Standard, Amazon S3
Intelligent-Tiering, Amazon S3 Standard-IA, Amazon S3 One Zone-IA, Amazon S3
Glacier, and Amazon S3 Glacier Deep Archive.

Key benefits:
You pay for only what you use; you can access Amazon S3 at any time from anywhere
through a URL; and Amazon S3 offers rich security controls.

Amazon Elastic File System (Amazon EFS): It provides file storage over a network;
Perfect for big data and analytics, media processing workflows, content management, web
serving, and home directories; Fully managed service that eliminates storage administration
tasks; Accessible from the console, an API, or the CLI; Scales up or down as files are
added or removed and you pay for what you use.

Amazon S3 Glacier: Amazon S3 Glacier is a secure, durable, and extremely low-cost


cloud storage service for data archiving and long-term backup use cases: Media asset
archiving, Healthcare information archiving, Regulatory and compliance archiving,
Scientific data archiving, Digital preservation, Magnetic tape replacement. Amazon S3
lifecycle policies enable you to delete or move objects based on age.
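A lifecycle policy is itself a small configuration document. The sketch below builds one as a Python dict in the shape the S3 API expects; the rule ID and the "logs/" prefix are hypothetical, and no API call is made:

```python
# Sketch of an S3 lifecycle configuration that moves objects under the
# "logs/" prefix to S3 Glacier after 90 days, then deletes them after
# 365 days. The rule ID and prefix are made-up examples.
lifecycle = {
    "Rules": [
        {
            "ID": "archive-then-expire",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
            "Expiration": {"Days": 365},
        }
    ]
}

rule = lifecycle["Rules"][0]
print(rule["Transitions"][0]["StorageClass"])  # GLACIER
print(rule["Expiration"]["Days"])              # 365
```

In real use, this dict would be passed to the bucket's lifecycle configuration API; here it only illustrates the transition-then-expire pattern described above.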

Amazon S3 Glacier pricing is based on Region. Its extremely low-cost design works well
for long-term archiving. The service is designed to provide 11 9s of durability for objects.

VIII. DATABASES

Amazon Relational Database Service: (From on-premises DB to Amazon RDS)


When your database is on premises, the database administrator is responsible for optimizing
applications and queries as well as setting up the hardware and air conditioning (HVAC). If
you move to a database that runs on an Amazon Elastic Compute Cloud (Amazon EC2)
instance, you no longer need to manage the underlying hardware or handle data center
operations, but you are still responsible for patching the OS and handling all software and
backup operations.
If you set up your database on Amazon RDS or Amazon Aurora, you reduce your
administrative responsibilities and can focus on what matters most: optimizing your
application.
You can run an instance by using Amazon Virtual Private Cloud (Amazon VPC).
When you use a virtual private cloud (VPC), you have control over your virtual
networking environment.

Use cases:
Web and mobile applications: High throughput, Massive storage scalability, High availability
Ecommerce applications: Low-cost database, Data security, Fully managed solution
Mobile and online games: Rapidly grow capacity, Automatic scaling, Database monitoring

Amazon DynamoDB: DynamoDB is a fast and flexible NoSQL database service for all
applications that need consistent, single-digit-millisecond latency at any scale. The core
DynamoDB components are tables, items, and attributes; runs exclusively on SSDs, and it
supports document and key-value store models& works well for mobile, web, gaming, ad
tech, and Internet of Things (IoT)applications. It’s accessible via the console, the AWS
CLI, and API calls.

Amazon Redshift features: Fast, fully managed data warehouse service; Easily scale with
no downtime; Columnar storage and parallel processing architectures;
Automatically and continuously monitors cluster; Encryption is built in.

Amazon Aurora is a highly available, performant, and cost-effective managed relational


database. Multiple levels of security are available, including network isolation by using
Amazon VPC, encryption at rest by using AWS Key Management Service (AWS KMS),
and encryption of data in transit by using Secure Sockets Layer (SSL). Finally, Amazon
Aurora is fully managed by Amazon RDS. Aurora automates database management tasks
such as hardware provisioning, software patching, setup, configuration, and backups.

IX. CLOUD ARCHITECTURE

Architecture is the art and science of designing and building large structures.
Large systems require architects to manage their size and complexity.
The course's example business, AnyCompany Corporation, has three main departments:
• Fly and Snap –image acquisition, preprocessing, and storage
• Show and Sell –promoting, selling, and working with customers
• Make and Ship –manufacturing of products and delivery

The AWS Well-Architected Framework is organized into six pillars:


Operational excellence design principles:
• Perform operations as code and Anticipate failure
• Refine operations procedures frequently
• Learn from all operational events and failures

Security design principles:


• Enable traceability
• Apply and Automate security at all layers
• Protect data in transit and at rest

Reliability design principles:


• Automatically recover from failure
• Test recovery procedures
• Scale horizontally to increase aggregate workload availability

Performance efficiency design principles:


• Go global in minutes
• Use serverless architectures
• Consider mechanical sympathy

Cost optimization design principles:


• Implement Cloud Financial Management
• Adopt a consumption model
• Analyze and attribute expenditure

Reliability is a measure of your system’s ability to provide functionality when desired by


the user. A common way to measure reliability is to use statistical measurements, such as
Mean Time Between Failures (MTBF).
MTBF is the total time in service over the number of failures.
Availability is defined as the percentage of uptime (that is, length of time that a system is
online between failures) over a period of time (commonly 1 year).

Factors that influence availability: Fault tolerance, Recoverability and Scalability
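The two definitions above translate directly into arithmetic. A small sketch with made-up service figures: MTBF is total time in service divided by the number of failures, and availability is uptime as a percentage of the period:

```python
# MTBF = total time in service / number of failures.
# Availability = percentage of uptime over a period (commonly 1 year).
HOURS_PER_YEAR = 365 * 24  # 8760

def mtbf(total_service_hours, failures):
    return total_service_hours / failures

def availability(uptime_hours, period_hours=HOURS_PER_YEAR):
    return 100 * uptime_hours / period_hours

# Example (made-up): one year of service, 2 failures, 9 hours total downtime.
print(mtbf(8760, 2))                      # 4380.0 hours between failures
print(round(availability(8760 - 9), 3))   # about 99.897 percent uptime
```

This is why downtime budgets are often quoted in "nines": 99.9% availability allows roughly 8.76 hours of downtime per year.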


X. AUTOMATIC SCALING AND MONITORING

Elastic Load Balancing is an AWS service that distributes incoming application or
network traffic across multiple targets, such as Amazon Elastic Compute Cloud (Amazon
EC2) instances, containers, internet protocol (IP) addresses, and Lambda functions, in a
single Availability Zone or across multiple Availability Zones. It is available in three types:
Application Load Balancer, Network Load Balancer, and Classic Load Balancer.

How Elastic Load Balancing works


A load balancer accepts incoming traffic from clients and routes requests to its registered
targets (such as EC2 instances) in one or more Availability Zones.

Amazon CloudWatch
Amazon CloudWatch is a monitoring and observability service that is built for DevOps
engineers, developers, site reliability engineers (SRE), and IT managers. CloudWatch
monitors your AWS resources (and the applications that you run on AWS) in real time.

CloudWatch alarms
You can create a CloudWatch alarm that watches a single CloudWatch metric or the result
of a math expression based on CloudWatch metrics. You can create a CloudWatch alarm
based on a static threshold, anomaly detection, or a metric math expression. For an alarm
based on a static threshold, you must specify the: Namespace, Metric, Statistic, Period,
Conditions
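Those required fields map directly onto the parameters of a CloudWatch alarm request. A sketch of the parameter set a static-threshold alarm would take (built as a plain dict; the alarm name "high-cpu" is hypothetical, and no AWS call is made):

```python
# Sketch of the request parameters for a static-threshold CloudWatch
# alarm: namespace, metric, statistic, period, and the conditions.
# No AWS call is made; "high-cpu" is a made-up alarm name.
alarm_params = {
    "AlarmName": "high-cpu",
    "Namespace": "AWS/EC2",              # namespace
    "MetricName": "CPUUtilization",      # metric
    "Statistic": "Average",              # statistic
    "Period": 300,                       # period: 5 minutes, in seconds
    "EvaluationPeriods": 2,              # conditions: breach for 2 periods...
    "Threshold": 80.0,                   # ...above 80 percent CPU...
    "ComparisonOperator": "GreaterThanThreshold",  # ...to trigger the alarm
}

print(sorted(alarm_params))
```

In real use these parameters would be passed to the CloudWatch PutMetricAlarm API (for example via an AWS SDK); here they only illustrate what each required field contributes.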

Amazon EC2 Auto Scaling

AWS Auto Scaling is a separate service that monitors your applications, and it
automatically adjusts capacity for the following resources:
• Amazon EC2 instances and Spot Fleets
• Amazon ECS tasks
• Amazon DynamoDB tables and indexes
• Amazon Aurora Replicas

Dynamic scaling uses Amazon EC2 Auto Scaling, CloudWatch, and Elastic Load
Balancing. AWS Auto Scaling is a separate service from Amazon EC2 Auto Scaling.

TOPICS LEARNT IN AI – ML:

1. AWS ACADEMY MACHINE LEARNING

2. INTRODUCING MACHINE LEARNING

3. MACHINE LEARNING PIPELINE WITH AMAZON SAGEMAKER

4. INTRODUCING FORECASTING

5. INTRODUCING COMPUTER VISION

6. INTRODUCING NATURAL LANGUAGE PROCESSING

I, II. INTRODUCING MACHINE LEARNING

Machine learning is the scientific study of algorithms and statistical models to perform a
task by using inference instead of instructions. It is a subset of AI, which is a broad branch
of computer science for building machines that can do human tasks.

Deep learning is itself a subdomain of machine learning. It represents a significant leap
forward in the capabilities of AI and ML. The theory behind deep learning was inspired
by how the human brain works.

Machine learning has three main types:

The first type is supervised learning, where a model uses known inputs and outputs to
generalize to future outputs. You can have different types of problems within supervised
learning, divided into two categories:

1) Classification:
• Binary classification (classifying an observation into one of two categories.)
• Multiclass classification (classify an observation into one of three or more
categories.)
2) Regression: mapping inputs to a continuous value, like an integer.
Most business problems are supervised learning.
The second type is unsupervised learning, where the model doesn't know inputs or outputs;
it finds patterns in the data without help.

The third type is reinforcement learning, where an agent (what drives the learning)
continuously learns, through trial and error, as it interacts with an environment (the place
where the agent learns).
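The supervised case can be made concrete with a toy classifier. The sketch below is a 1-nearest-neighbour binary classifier built only from known inputs (features) and outputs (labels); the data is made up for illustration:

```python
# Toy supervised learning: predict a label for a new observation by
# copying the label of the closest known example (1-nearest-neighbour).

def predict(train_X, train_y, x):
    # Squared Euclidean distance from x to every training row.
    dists = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in train_X]
    return train_y[dists.index(min(dists))]

# Features: [hours studied, hours slept]; labels: 1 = pass, 0 = fail.
X = [[1, 4], [2, 5], [8, 7], [9, 6]]
y = [0, 0, 1, 1]

print(predict(X, y, [8.5, 7]))  # 1  (closest to the "pass" examples)
print(predict(X, y, [1.5, 4]))  # 0  (closest to the "fail" examples)
```

This is binary classification: the known outputs let the model generalize to a new, unseen input. Regression would instead return a continuous value.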

Machine learning process:


Iterative Model Training: Feature engineering is the process of selecting or creating the
features that you will use to train your model. Features are the columns of data that you
have within your dataset.

The goal of the model is to try to correctly estimate the target value for new data.
The ML algorithm uses the features to predict the target.

Machine learning tools overview:
Jupyter Notebook is an open-source web application that enables you to create and share
documents that contain live code, equations, visualizations, and narrative text.

JupyterLab is a web-based interactive development environment for Jupyter notebooks,
code, and data. JupyterLab is flexible.

JupyterLab is also extensible and modular. You can write plugins that add new
components and integrate with existing ones.

Matplotlib is a library for creating scientific static, animated, and interactive visualizations
in Python.

NumPy is one of the fundamental scientific computing packages in Python. It contains


functions for N-dimensional array objects and useful math functions such as linear algebra,
Fourier transform, and random number capabilities.
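A small sketch of what the NumPy description above means in practice: N-dimensional arrays with built-in math helpers, including linear algebra (this assumes NumPy is installed):

```python
import numpy as np

# An N-dimensional array with elementwise math helpers.
a = np.array([[1.0, 2.0], [3.0, 4.0]])
print(a.shape)   # (2, 2)
print(a.mean())  # 2.5

# Linear algebra: solve the system a @ x = b for x.
b = np.array([5.0, 11.0])
x = np.linalg.solve(a, b)
print(x)         # [1. 2.]
```

The same array type underpins pandas, scikit-learn, and Matplotlib, which is why NumPy is described as a fundamental package.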

scikit-learn is an open-source machine learning library that supports supervised and


unsupervised learning.

Machine learning challenges:

The biggest problems that you directly influence are related to data, but you will also
deal with people, business and technology challenges.

III. MACHINE LEARNING PIPELINE WITH AMAZON SAGEMAKER

Collecting data: Private data, Commercial data, Open-source data

ETL process: Extract, Transform, Load


A typical ETL framework has several components: crawler, job, and schedule or event.
AWS Glue is a fully managed ETL service.

Data scientists write scripts in their Jupyter notebook to handle data.


A simple extract and load script is shown, which includes:

Imports and variables –This section imports the libraries that are used.
Download and extract –This section makes a web request and saves the bytes from the URL
as a stream.
Upload to Amazon S3 –With the extracted files in a folder, this section enumerates the
folder’s files.

Evaluating your data:

To run statistics on your data and better understand it, you must ensure that it's in the right
format for analysis.
Loading data can be done by using pandas; you can load data in many different formats,
such as CSV, JSON, Excel, and Pickle.
Descriptive statistics - histogram, scatter plot, correlation matrix
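The loading and descriptive-statistics steps above can be sketched in a few lines with pandas (assumed installed); the CSV data here is made up for illustration:

```python
import io

import pandas as pd

# Load a small CSV (here from an in-memory string; a file path or URL
# works the same way). The columns and values are made-up examples.
csv = io.StringIO("age,income\n25,30000\n32,45000\n47,62000\n51,58000\n")
df = pd.read_csv(csv)

# Descriptive statistics: count, mean, std, min, quartiles, max.
print(df.describe())

# Correlation matrix between the numeric columns.
print(df.corr())
```

A histogram or scatter plot of the same DataFrame would come from Matplotlib (for example via df.plot), completing the three descriptive tools the text lists.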

Two things can make your models more successful:

Feature selection: selecting the features that are most relevant and discarding the rest.

Feature extraction or creation: building up valuable information from raw data by


reformatting, combining, and transforming primary features into new ones.

Cleaning data: Outliers fall into two broad categories: a single variation of a single
variable (univariate) and a variation of two or more variables (multivariate). Options for
handling them include deleting the outlier, transforming the outlier, and imputing a new
value for the outlier.

Feature selection methods are available:

Filter: Pearson's correlation coefficient, linear discriminant analysis (LDA), analysis of
variance (ANOVA), chi-square.
Wrapper: Forward selection starts with no features and adds them until the best model is
found. Backward selection starts with all features, drops them one at a time, and selects
the best model.
Embedded: Combines the qualities of filter and wrapper methods.

Split data into training, testing, and validation sets to help you validate the model's accuracy.
 K-fold cross-validation can help with smaller datasets
 You can use two key algorithms for supervised learning: XGBoost and linear learner
 Use k-means for unsupervised learning
 Use Amazon SageMaker training jobs to train models

Evaluating the accuracy of the model:


To evaluate the model, you must have data that the model hasn't seen, through either
a hold-out set or by using k-fold cross-validation.
 Different machine learning models use different metrics.
 Classification can use a confusion matrix, and the AUC-ROC that can be
generated from it.
 Regression can use mean squared error (MSE).
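Both metrics are simple to compute by hand. A sketch on made-up predictions: a binary confusion matrix counts true/false positives and negatives, and MSE averages the squared errors of a regression:

```python
# Binary confusion matrix: count true/false positives and negatives
# by comparing actual labels with predicted labels.

def confusion_matrix(actual, predicted):
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn}

# Mean squared error for regression: average of squared differences.
def mse(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

print(confusion_matrix([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
# {'tp': 2, 'tn': 1, 'fp': 1, 'fn': 1}
print(mse([3.0, 5.0, 2.0], [2.5, 5.0, 3.0]))  # (0.25 + 0 + 1) / 3
```

Accuracy, precision, recall, and the ROC curve behind AUC-ROC are all derived from these four confusion-matrix counts.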

Tune the model’s hyperparameters to improve model performance.


It is important to find the best solution to your business problem.
 Hyperparameters can be tuned for the model, optimizer, and data.
 Amazon SageMaker can perform automatic hyperparameter tuning.
 Overall model development can be accelerated by using SageMaker Autopilot.

Securing data: Done with the AWS Identity and Access Management (IAM) service,
encryption at rest and in transit, and AWS CloudTrail.

IV. FORECASTING

Forecasting is an important area of machine learning because so many opportunities for
predicting future outcomes are based on historical data.

You can think of time series data as falling into two broad categories.
The first type is univariate, which means that it has only one variable. The second type is
multivariate, which means that it has more than one variable.
In addition to these two categories, most time series datasets also follow one of the following
patterns:
 Trend – A pattern that shows the values as they increase, decrease, or stay the
same over time
 Seasonal – A repeating pattern that is based on the seasons in a year
 Cyclical – Some other form of a repeating pattern
 Irregular – Changes in the data over time that appear to be random or that have
no discernible pattern

Applications:

• Marketing applications, such as sales forecasting or demand projections.
• Inventory management systems to anticipate required inventory levels. Often, this type of
forecast includes information about delivery times.
• Energy consumption to determine when and where energy is needed.
• Weather forecasting systems for governments, and commercial applications such as
agriculture.

A common occurrence in real-world forecasting problems is missing values in the raw data.
The missing data can be filled in several ways:

 Forward fill – Uses the last known value for the missing value.
 Moving average – Uses the average of the last known values to calculate the missing
value.
 Backward fill – Uses the next known value after the missing value. This practice is
known as lookahead, and it should be avoided.
 Interpolation – Essentially uses an equation to calculate the missing value.
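Forward fill and interpolation can be sketched for a series whose gaps are represented as None (a minimal pure-Python illustration; libraries such as pandas provide these operations directly):

```python
def forward_fill(series):
    """Replace each None with the last known value."""
    filled, last = [], None
    for v in series:
        last = v if v is not None else last
        filled.append(last)
    return filled

def interpolate(series):
    """Replace interior None runs with linearly interpolated values."""
    filled = list(series)
    for i, v in enumerate(filled):
        if v is None:
            left = i - 1              # last known value before the gap
            right = i
            while filled[right] is None:
                right += 1            # next known value after the gap
            step = (filled[right] - filled[left]) / (right - left)
            filled[i] = filled[left] + step * (i - left)
    return filled

series = [10, None, None, 16, 18]
print(forward_fill(series))  # [10, 10, 10, 16, 18]
print(interpolate(series))   # [10, 12.0, 14.0, 16, 18]
```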
26
Some of the time series challenges include:
 Handling different time formats
 Handling missing data through downsampling, upsampling, and smoothing
 Handling seasonality, such as weekday and yearly cycles
 Avoiding bad correlations

Time series algorithms:

 Autoregressive Integrated Moving Average (ARIMA)
 DeepAR+
 Exponential Smoothing (ETS)
 Non-Parametric Time Series (NPTS)
 Prophet
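To give a feel for the smoothing family, simple exponential smoothing blends each new observation with the previous smoothed value (this sketch covers only the simplest ETS variant; the managed algorithms also handle trend and seasonality):

```python
def simple_exponential_smoothing(series, alpha):
    """Smooth a series: s[t] = alpha * x[t] + (1 - alpha) * s[t-1]."""
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

demand = [100, 120, 110, 130, 125]
print(simple_exponential_smoothing(demand, 0.5))
# [100, 110.0, 110.0, 120.0, 122.5]
```

A larger alpha weights recent observations more heavily; a smaller alpha smooths the series more aggressively.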

Supported domains:
 Retail – Product demand
 Inventory planning – Raw materials requirements
 EC2 capacity – Capacity demand for Amazon Elastic Compute Cloud
 Workforce – Workload projections
 Web traffic – Projected traffic to one or more websites
 Metrics – Projecting metrics such as revenue, sales, or cash flow
 Custom – Projections for a domain that you can’t map to one of the previous domains

The Root Mean Square Error (RMSE) is another method for evaluating the reliability of
your forecasts. Like weighted quantile loss, RMSE calculates how far off the forecasted values
were from the actual test data. The RMSE finds the difference between the actual target
value in the dataset and the forecasted value for that time period, and it then squares the
differences.
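The RMSE computation described above takes only a few lines (the values below are illustrative):

```python
import math

def rmse(actual, forecast):
    """Root mean square error between actual and forecasted values."""
    squared_errors = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

actual   = [100, 120, 110, 130]
forecast = [102, 118, 115, 128]
print(rmse(actual, forecast))  # sqrt(9.25), about 3.04
```

Because the differences are squared before averaging, RMSE penalizes a few large forecast errors more heavily than many small ones.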

Data includes: time series data, item metadata, and related time series data

27
V. INTRODUCING COMPUTER VISION (CV)

Object detection provides the categories of the image and the location of the objects in the
image. The location is provided by a set of coordinates for a box that surrounds the object,
which is known as the bounding box.

Computer vision is the automated extraction of information from images.


You can divide computer vision into two distinct areas—image analysis and video analysis.
Image analysis includes object classification, detection, and segmentation.
Video analysis includes instance tracking, action recognition, and motion estimation.

Amazon Rekognition is a computer vision service based on deep learning.


You can easily add image and video analysis to your applications.
Amazon Rekognition provides image and video detection of faces, sentiment, text, and unsafe
content, as well as library search.
Amazon Rekognition is integrated with other AWS services.
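A label-detection call can be sketched with boto3 as below. The bucket and image names are hypothetical, and the import is deferred inside the function so the sketch stays loadable without AWS credentials or the boto3 package installed:

```python
def detect_labels(bucket, image_key, max_labels=10, min_confidence=80):
    """Ask Amazon Rekognition for labels found in an image stored in S3."""
    import boto3  # deferred so the module loads without boto3 installed

    client = boto3.client("rekognition")
    response = client.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": image_key}},
        MaxLabels=max_labels,
        MinConfidence=min_confidence,
    )
    # Each label carries a name and a confidence score (0-100).
    return [(label["Name"], label["Confidence"]) for label in response["Labels"]]

# Hypothetical usage (requires AWS credentials and an existing bucket):
# print(detect_labels("my-images-bucket", "factory-floor.jpg"))
```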

You can use Amazon Rekognition Custom Labels to:


Simplify data labelling
Provide automated machine learning
Provide simplified model evaluation, inference, and feedback

Amazon SageMaker Ground Truth enables you to build high-quality training datasets
for your machine learning models. Ground Truth can use active learning to automate the
labelling of your input data. Models must be trained for the specific domain that you want
to analyse. You can set up a custom labelling workflow for your specific business case, in
which you must label images and create bounding boxes for objects.

28
VI. INTRODUCING NATURAL LANGUAGE PROCESSING

Overview of natural language processing.


NLP is a broad term for a general set of business or computational problems that you can
solve with machine learning (ML). NLP systems predate ML.

You can apply NLP to a wide range of problems. Some of the more common applications
include:
 Search applications (such as Google and Bing)
 Human machine interfaces (such as Alexa)
 Sentiment analysis for marketing or political campaigns
 Social research that is based on media analysis
 Chatbots to mimic human speech in applications

Natural language processing managed services.


Amazon Transcribe is the first managed machine learning service that you will learn about.
You can use Amazon Transcribe to recognize speech in audio files and produce a
transcription.

Some of the more common use cases for Amazon Transcribe include:
 Medical transcription
 Video subtitles
 Streaming content labeling
 Customer call center monitoring

Amazon Polly
Amazon Polly can convert text into lifelike speech. You can input either plaintext files or a
file in Speech Synthesis Markup Language (SSML) format. SSML is a markup language
that you can use to provide special instructions for how speech should sound.

Amazon Translate
With Amazon Translate, you can create multilanguage experiences in your applications.
You can create systems for reading documents in one language and then rendering or storing
them in another language. You can also use Amazon Translate as part of a document analysis
system.
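A translation call can be sketched with boto3 as follows (the text and language codes are illustrative, and the import is deferred so the sketch loads without AWS credentials):

```python
def translate_text(text, source_lang="en", target_lang="es"):
    """Translate a string with Amazon Translate."""
    import boto3  # deferred so the sketch loads without boto3 installed

    client = boto3.client("translate")
    response = client.translate_text(
        Text=text,
        SourceLanguageCode=source_lang,
        TargetLanguageCode=target_lang,
    )
    return response["TranslatedText"]

# Hypothetical usage (requires AWS credentials):
# print(translate_text("Hello, world"))
```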

29
CASE STUDY: PREDICTIVE MAINTENANCE FOR INDUSTRIAL
EQUIPMENT
PROBLEM STATEMENT:
The manufacturing enterprise is facing challenges with high equipment downtime
and maintenance costs due to unforeseen breakdowns in a large number of industrial
machines and equipment. This unplanned downtime results in significant production losses
and increased maintenance expenses. To tackle this issue effectively, the company needs a
solution capable of analysing sensor data from these machines to predict potential
malfunctions or maintenance needs. By implementing a predictive maintenance strategy,
the company aims to proactively schedule maintenance tasks, prevent failures, and reduce
overall maintenance costs by minimizing downtime.

SOLUTION:
To address this problem, the enterprise can leverage AWS AI/ML services to build a
predictive maintenance solution. The solution involves the following steps:

1. Data Collection: Use AWS IoT Core to ingest sensor data from various industrial
equipment and machines in real-time. The sensor data may include measurements such as
temperature, vibration, pressure, and other relevant parameters.

2. Data Storage: Store the collected sensor data in a scalable and durable storage service
like Amazon S3.

3. Model Training: Build a predictive maintenance model using AWS SageMaker. The
company can choose from various built-in algorithms or implement custom models using
frameworks like TensorFlow or PyTorch. The model will be trained on historical sensor
data and labeled with instances of equipment failures or maintenance activities.

4. Model Deployment: Deploy the trained predictive maintenance model to an AWS
SageMaker endpoint or AWS Lambda function for real-time inference.

5. Real-time Inference: As new sensor data streams in from the industrial equipment, the
deployed model will analyze this data and predict the likelihood of equipment failure or the
need for maintenance.

6. Integration: Integrate the predictive maintenance solution with the company's
maintenance scheduling system or dashboard using AWS Amplify or AWS API Gateway.
This integration will allow maintenance personnel to receive timely alerts and schedule
maintenance activities accordingly.
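Step 5, real-time inference, can be sketched as an AWS Lambda handler that forwards incoming sensor readings to the deployed SageMaker endpoint. The endpoint name and payload format here are hypothetical:

```python
import json

def lambda_handler(event, context):
    """Forward a sensor reading to a SageMaker endpoint and return its prediction."""
    import boto3  # deferred import; requires AWS credentials at runtime

    runtime = boto3.client("sagemaker-runtime")
    # Hypothetical payload format: comma-separated sensor readings.
    payload = ",".join(str(v) for v in event["sensor_readings"])
    response = runtime.invoke_endpoint(
        EndpointName="predictive-maintenance-endpoint",  # hypothetical name
        ContentType="text/csv",
        Body=payload,
    )
    prediction = json.loads(response["Body"].read())
    return {"statusCode": 200, "body": json.dumps(prediction)}
```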
30
FLOWCHART:

31
CONCLUSION:

The manufacturing company can significantly reduce equipment downtime and
maintenance costs by implementing a predictive maintenance solution that leverages AWS
AI/ML services such as AWS SageMaker, AWS IoT Core, and AWS Lambda. This
solution collects sensor data, trains predictive models, and performs real-time inference to
enable proactive scheduling of maintenance activities based on the model's predictions,
minimizing unexpected failures and optimizing resource allocation. Furthermore, the
solution can be seamlessly integrated with existing maintenance scheduling systems or
dashboards using AWS Amplify or AWS API Gateway, demonstrating the power of AWS
AI/ML services in enabling data-driven decision-making and improving operational
efficiency in manufacturing and industrial settings.

32
REFERENCES:

1. Predictive Maintenance: https://integrio.net/blog/machine-learning-in-predictive-maintenance-benefits-main-use-cases

2. AWS SageMaker: https://docs.aws.amazon.com/sagemaker/index.html

3. AWS IoT Core: https://docs.aws.amazon.com/iot/

4. AWS Lambda: https://docs.aws.amazon.com/lambda/

5. AWS Amplify: https://docs.amplify.aws/

6. AWS API Gateway: https://docs.aws.amazon.com/apigateway/

7. AWS S3: https://docs.aws.amazon.com/s3/

33
34
