R20 ML Notes Unit-I
(AUTONOMOUS) L T P C
3 - - 3
III B.Tech. – II Sem.
REFERENCES
1. Tom M Mitchell, Machine Learning, First Edition, McGraw Hill Education, 2013
UNIT-I
Machine learning is a growing technology which enables computers to learn automatically from past
data. Machine learning uses various algorithms for building mathematical models and making
predictions using historical data or information. Currently, it is being used for various tasks such
as image recognition, speech recognition, email filtering, Facebook auto-tagging, recommender
systems, and many more.
Machine learning is a subset of artificial intelligence that is mainly concerned with the
development of algorithms which allow a computer to learn from data and past experience on
its own. The term machine learning was first introduced by Arthur Samuel in 1959. We can define it
in a summarized way as:
Machine learning enables a machine to automatically learn from data, improve performance from
experiences, and predict things without being explicitly programmed.
With the help of sample historical data, which is known as training data, machine learning algorithms
build a mathematical model that helps in making predictions or decisions without being explicitly
programmed. Machine learning brings computer science and statistics together for creating predictive
models. Machine learning constructs or uses algorithms that learn from historical data. In general, the
more relevant, good-quality data we provide, the better the performance tends to be.
A machine has the ability to learn if it can improve its performance by gaining more data.
Machine learning is a subfield of artificial intelligence that involves training computers to learn from
data without being explicitly programmed. In other words, machine learning algorithms use statistical
techniques to find patterns in data and use these patterns to make predictions or take actions.
A machine learning system learns from historical data, builds prediction models, and whenever
it receives new data, predicts the output for it. The accuracy of the predicted output depends on the
amount and quality of the data: a large amount of representative data generally helps to build a better
model, which predicts the output more accurately.
Suppose we have a complex problem in which we need to make some predictions. Instead of writing
code for it by hand, we feed the data to generic algorithms, and with the help of these algorithms the
machine builds the logic from the data and predicts the output. Machine learning has changed our way
of thinking about such problems. The block diagram below explains the working of a machine learning
algorithm:
The need for machine learning is increasing day by day. The reason is that it is capable of doing tasks
that are too complex for a person to implement directly. As humans, we have limitations: we cannot
process huge amounts of data manually, so we need computer systems, and this is where machine
learning makes things easier for us.
We can train machine learning algorithms by providing them with a large amount of data and letting them
explore the data, construct the models, and predict the required output automatically. The performance
of a machine learning algorithm depends on the amount of data, and it can be measured by a cost
function. With the help of machine learning, we can save both time and money.
The importance of machine learning can be easily understood from its use cases. Currently, machine
learning is used in self-driving cars, cyber fraud detection, face recognition, friend suggestions
by Facebook, etc. Top companies such as Netflix and Amazon have built machine learning
models that use vast amounts of data to analyze user interests and recommend products
accordingly.
Following are some key points which show the importance of Machine Learning:
Some 40-50 years ago, machine learning was science fiction, but today it is a part of
our daily life. Machine learning is making our day-to-day life easier, from self-driving cars to the Amazon
virtual assistant "Alexa". However, the idea behind machine learning is old and has a long history.
Some milestones in the history of machine learning are given below:
Machine learning research has now advanced greatly, and it is present everywhere around
us, in self-driving cars, Amazon Alexa, chatbots, recommender systems, and many more. It
includes supervised, unsupervised, and reinforcement learning, with
clustering, classification, decision tree, SVM algorithms, etc.
Modern machine learning models can be used for making various predictions, including weather
prediction, disease prediction, stock market analysis, etc.
Prerequisites
Before learning machine learning, you should have basic knowledge of the following so that you can
easily understand the concepts of machine learning:
Machine learning is a buzzword in today's technology, and it is growing very rapidly day by day. We
use machine learning in our daily life even without knowing it, for example in Google Maps, Google
Assistant, Alexa, etc. Below are some of the most trending real-world applications of machine learning:
1. Image Recognition:
Image recognition is one of the most common applications of machine learning. It is used to identify
objects, persons, places, etc. in digital images. A popular use case of image recognition and face detection
is automatic friend tagging suggestion:
Facebook provides a feature of automatic friend tagging suggestions. Whenever we upload a photo with our
Facebook friends, we automatically get a tagging suggestion with their names, and the technology behind
this is machine learning's face detection and recognition algorithm.
It is based on the Facebook project named "DeepFace," which is responsible for face recognition and
person identification in the picture.
2. Speech Recognition
While using Google, we get the option "Search by voice." It comes under speech recognition, which is
a popular application of machine learning.
Speech recognition is the process of converting voice instructions into text, and it is also known as "speech
to text" or "computer speech recognition." At present, machine learning algorithms are widely used
in speech recognition applications. Google Assistant, Siri, Cortana, and Alexa
use speech recognition technology to follow voice instructions.
3. Traffic prediction:
If we want to visit a new place, we take the help of Google Maps, which shows us the correct path with the
shortest route and predicts the traffic conditions.
It predicts whether traffic is clear, slow-moving, or heavily congested
in two ways:
o Real-time location of vehicles from the Google Maps app and sensors
o Average time taken on past days at the same time of day.
Everyone who uses Google Maps is helping to make the app better. It takes information from the
user and sends it back to its database to improve performance.
4. Product recommendations:
Machine learning is widely used by various e-commerce and entertainment companies such as
Amazon, Netflix, etc., for recommending products to the user. Whenever we search for some product
on Amazon, we start getting advertisements for the same product while surfing the internet in the
same browser, and this is because of machine learning.
Google understands the user's interests using various machine learning algorithms and suggests products
according to those interests.
Similarly, when we use Netflix, we find recommendations for series, movies, etc.,
and this is also done with the help of machine learning.
5. Self-driving cars:
One of the most exciting applications of machine learning is self-driving cars. Machine learning plays
a significant role in self-driving cars. Tesla, a well-known car manufacturer, is working on
self-driving cars. It is using an unsupervised learning method to train the car models to detect people and
objects while driving.
6. Email Spam and Malware Filtering:
Whenever we receive a new email, it is filtered automatically as important, normal, or spam. Important
mail arrives in our inbox marked with the important symbol and spam emails go to our spam box, and
the technology behind this is machine learning. Below are some spam filters used by Gmail:
o Content Filter
o Header filter
o General blacklists filter
o Rules-based filters
o Permission filters
Some machine learning algorithms such as Multi-Layer Perceptron, Decision tree, and Naïve Bayes
classifier are used for email spam filtering and malware detection.
7. Virtual Personal Assistant:
We have various virtual personal assistants such as Google Assistant, Alexa, Cortana, and Siri. As the name
suggests, they help us find information using our voice instructions. These assistants can help us
in various ways just by our voice instructions, such as playing music, calling someone, opening an email,
scheduling an appointment, etc.
These assistants record our voice instructions, send them to a server in the cloud, decode them using ML
algorithms, and act accordingly.
8. Online Fraud Detection:
Machine learning is making our online transactions safe and secure by detecting fraudulent transactions.
Whenever we perform an online transaction, there are various ways in which fraud
can take place, such as fake accounts, fake IDs, and money stolen in the middle of a transaction. To
detect this, a feed-forward neural network helps us by checking whether a transaction is genuine or
fraudulent.
For each genuine transaction, the output is converted into hash values, and these values become
the input for the next round. Each genuine transaction follows a specific pattern, which changes
for a fraudulent transaction; the network detects this change and so makes our online transactions more secure.
9. Stock Market Trading:
Machine learning is widely used in stock market trading. In the stock market, there is always a risk of
ups and downs in share prices, so machine learning's long short-term memory (LSTM) neural network is used
for the prediction of stock market trends.
10. Medical Diagnosis:
In medical science, machine learning is used for diagnosing diseases. With this, medical technology is
growing very fast and is able to build 3D models that can predict the position of lesions in the
brain. It helps in finding brain tumors and other brain-related diseases more easily.
11. Automatic Language Translation:
Nowadays, if we visit a new place and do not know the language, it is not a problem at all,
because machine learning helps us by converting text into a language we know. Google's
GNMT (Google Neural Machine Translation) provides this feature. It is a neural machine translation system
that translates text into our familiar language, and this is called automatic translation.
The technology behind automatic translation is a sequence-to-sequence learning algorithm, which
is used together with image recognition and translates text from one language to another.
Supervised learning is the type of machine learning in which machines are trained using well "labelled"
training data, and on the basis of that data, machines predict the output. Labelled data means that some input
data is already tagged with the correct output.
In supervised learning, the training data provided to the machine works as the supervisor that teaches
the machine to predict the output correctly. It applies the same concept as a student learning under the
supervision of a teacher.
Supervised learning is a process of providing input data as well as correct output data to the machine
learning model. The aim of a supervised learning algorithm is to find a mapping function to map the
input variable(x) with the output variable(y).
In the real-world, supervised learning can be used for Risk Assessment, Image classification, Fraud
Detection, spam filtering, etc.
In supervised learning, models are trained using a labelled dataset, where the model learns about each type
of data. Once the training process is completed, the model is tested on test data (a labelled subset of
the dataset that is held out from training), and then it predicts the output.
The working of Supervised learning can be easily understood by the below example and diagram:
Suppose we have a dataset of different types of shapes which includes squares, rectangles, triangles, and
other polygons such as hexagons. The first step is to train the model on each shape.
o If the given shape has four sides, and all the sides are equal, then it will be labelled as
a square.
o If the given shape has three sides, then it will be labelled as a triangle.
o If the given shape has six equal sides, then it will be labelled as a hexagon.
After training, we test our model using the test set, and the task of the model is to identify the
shape.
The machine is already trained on all types of shapes, and when it finds a new shape, it classifies the
shape on the basis of its number of sides and predicts the output.
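The shape example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the feature encoding (number of sides, whether all sides are equal), the tiny training set, and the nearest-neighbour rule are all choices made here, not something the notes prescribe.

```python
# Hypothetical labelled training data: (number_of_sides, all_sides_equal) -> label
training_data = [
    ((3, True), "triangle"),
    ((3, False), "triangle"),
    ((4, True), "square"),
    ((4, False), "rectangle"),
    ((6, True), "hexagon"),
]

def distance(a, b):
    """Difference in side count, plus a small penalty if the 'equal sides' flags differ."""
    return abs(a[0] - b[0]) + (0 if a[1] == b[1] else 0.5)

def predict(features):
    """Return the label of the closest labelled example (1-nearest neighbour)."""
    _, label = min(training_data, key=lambda item: distance(item[0], features))
    return label

print(predict((4, True)))    # a new four-sided shape with equal sides
print(predict((3, False)))   # a new three-sided shape
```

The model never sees an explicit rule such as "four equal sides means square"; it only compares a new shape with the labelled examples it was given.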
1. Regression
Regression algorithms are used if there is a relationship between the input variable and the output
variable. It is used for the prediction of continuous variables, such as Weather forecasting, Market
Trends, etc. Below are some popular Regression algorithms which come under supervised learning:
o Linear Regression
o Regression Trees
o Non-Linear Regression
o Bayesian Linear Regression
o Polynomial Regression
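As a minimal sketch of the first algorithm in this list, the following fits a straight line to one input variable by ordinary least squares. The data (hours studied against a score) is made up for illustration, and `fit_line` is a helper name chosen here.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept with one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data that lies exactly on the line y = 2x + 1
hours = [1.0, 2.0, 3.0, 4.0]
scores = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(hours, scores)
print(slope, intercept)             # 2.0 1.0
print(slope * 5.0 + intercept)      # continuous prediction for a new input: 11.0
```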
2. Classification
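Classification algorithms are used when the output variable is categorical, that is, when it takes one of
a fixed set of classes such as Yes-No, Male-Female, or True-False. Spam filtering is a typical example.
Below are some popular classification algorithms which come under supervised learning: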
o Random Forest
o Decision Trees
o Logistic Regression
o Support vector Machines
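To make one of these algorithms concrete, here is a small sketch of logistic regression trained by gradient descent on a single feature. The data is hypothetical (a count of suspicious words per email, with label 1 for spam), and the learning rate and number of epochs are arbitrary choices for this toy example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit weight w and bias b by gradient descent on the average log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(x, w, b):
    """Return class 1 if the predicted probability is at least 0.5, else class 0."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0

# Hypothetical training data: suspicious-word count -> 0 (not spam) or 1 (spam)
counts = [0, 1, 2, 3, 6, 7, 8, 9]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train_logistic(counts, labels)
print(predict(1, w, b), predict(8, w, b))
```

Unlike regression, the output here is a class label; the sigmoid only serves to turn a score into a probability that is then thresholded.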
In the previous topic, we learned about supervised machine learning, in which models are trained using labeled
data under the supervision of training data. But there may be many cases in which we do not have labeled
data and need to find the hidden patterns in the given dataset. To solve such cases in
machine learning, we need unsupervised learning techniques.
Unsupervised learning is a machine learning technique in which models are not supervised using a training
dataset. Instead, the models themselves find the hidden patterns and insights in the given data. It can be
compared to the learning which takes place in the human brain while learning new things. It can be defined
as:
Unsupervised learning is a type of machine learning in which models are trained using unlabeled
dataset and are allowed to act on that data without any supervision.
Unsupervised learning cannot be directly applied to a regression or classification problem because unlike
supervised learning, we have the input data but no corresponding output data. The goal of unsupervised
learning is to find the underlying structure of dataset, group that data according to similarities,
and represent that dataset in a compressed format.
Example: Suppose the unsupervised learning algorithm is given an input dataset containing images of
different types of cats and dogs. The algorithm is never trained on labelled examples, which means it
has no prior idea about the features of the dataset. The task of the unsupervised learning algorithm
is to identify the image features on its own. It will perform this task by
clustering the image dataset into groups according to the similarities between images.
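The grouping idea can be sketched with a tiny two-cluster version of k-means. As an assumption for illustration, each image is reduced to a single made-up numeric feature; real image clustering would use many features, and the function name `two_means` is chosen here.

```python
def two_means(points, iterations=10):
    """Split 1-D points into two clusters by repeatedly assigning and re-averaging."""
    centroids = [min(points), max(points)]      # simple deterministic starting guess
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        for p in points:
            nearest = min((0, 1), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical one-number summary of each image; no labels are given.
features = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centroids, clusters = two_means(features)
print(centroids)
print(clusters)
```

The algorithm is never told which points are "cats" and which are "dogs"; the two groups emerge only from how close the feature values are to each other.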
Below are some main reasons which describe the importance of Unsupervised Learning:
o Unsupervised learning is helpful for finding useful insights from the data.
o Unsupervised learning is similar to how a human learns to think from their own experiences,
which makes it closer to real AI.
o Unsupervised learning works on unlabeled and uncategorized data, which makes it
more important.
o In the real world, we do not always have input data with the corresponding output, so to solve such
cases, we need unsupervised learning.
Once a suitable algorithm is applied, it divides the data objects into groups according to
the similarities and differences between the objects.
The unsupervised learning algorithm can be further categorized into two types of problems:
o Clustering: Clustering is a method of grouping objects into clusters such that objects with the
most similarities remain in one group and have few or no similarities with the objects of another
group. Cluster analysis finds the commonalities between the data objects and categorizes them
according to the presence and absence of those commonalities.
o Association: Association rule learning is an unsupervised learning method used for finding
relationships between variables in a large database. It determines the sets of items that occur
together in the dataset. Association rules make marketing strategy more effective. For example, people
who buy item X (say, bread) also tend to purchase item Y (butter or jam). A typical
example of association rule learning is Market Basket Analysis.
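The bread and butter example can be quantified with the two standard measures used in Market Basket Analysis, support and confidence. The transactions below are made up for illustration.

```python
# Hypothetical shopping baskets
transactions = [
    {"bread", "butter"},
    {"bread", "jam"},
    {"bread", "butter", "milk"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    return support(antecedent | consequent) / support(antecedent)

print(support({"bread", "butter"}))        # 0.5: two of the four baskets
print(confidence({"bread"}, {"butter"}))   # two of the three bread baskets also have butter
```

A rule such as "bread implies butter" is considered interesting when both its support and its confidence are high enough for the application.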
Unsupervised Learning algorithms:
Below is the list of some popular unsupervised learning algorithms:
o K-means clustering
o KNN (k-nearest neighbors), although it is more commonly used as a supervised method
o Hierarchical clustering
o Anomaly detection
o Neural Networks
o Principal Component Analysis
o Independent Component Analysis
o Apriori algorithm
o Singular value decomposition
Supervised and unsupervised learning are the two main techniques of machine learning, but they
are used in different scenarios and with different datasets. The two
learning methods are compared in the table below.
The main differences between Supervised and Unsupervised learning are given below:
Supervised learning | Unsupervised learning
Supervised learning model takes direct feedback to check whether it is predicting the correct output. | Unsupervised learning model does not take any feedback.
Supervised learning model predicts the output. | Unsupervised learning model finds the hidden patterns in data.
In supervised learning, input data is provided to the model along with the output. | In unsupervised learning, only input data is provided to the model.
The goal of supervised learning is to train the model so that it can predict the output when it is given new data. | The goal of unsupervised learning is to find the hidden patterns and useful insights from the unknown dataset.
Supervised learning needs supervision to train the model. | Unsupervised learning does not need any supervision to train the model.
Supervised learning model generally produces more accurate results. | Unsupervised learning model may give less accurate results compared to supervised learning.
Supervised learning is not close to true artificial intelligence, as we first train the model on each kind of data, and only then can it predict the correct output. | Unsupervised learning is closer to true artificial intelligence, as it learns in the way a child learns daily routine things from experience.
Reinforcement learning:
The above image shows a robot, a diamond, and fire. The goal of the robot is to get the reward, which
is the diamond, and avoid the hurdles, which are the fires. The robot learns by trying the possible paths
and then choosing the path which gives it the reward with the fewest hurdles. Each right step
gives the robot a reward and each wrong step subtracts from the robot's reward. The total reward
is calculated when it reaches the final reward, that is, the diamond.
Main points in Reinforcement learning –
• Input: The input should be an initial state from which the model will start
• Output: There are many possible outputs as there are a variety of solutions to a particular
problem
• Training: The training is based upon the input. The model will return a state and the user will
decide to reward or punish the model based on its output.
• The model continues to learn.
• The best solution is decided based on the maximum reward.
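The points above can be illustrated with tabular Q-learning, which is one common reinforcement learning algorithm and is swapped in here as an example; the notes do not name a specific method. The environment is an assumption: a line of five positions with fire at one end, the diamond at the other, a reward of +1 for reaching the diamond and -1 for reaching the fire.

```python
import random

N_STATES = 5                      # positions 0..4 on a line
FIRE, DIAMOND, START = 0, 4, 2    # fire on the far left, diamond on the far right
ACTIONS = {"left": -1, "right": +1}

def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Learn action values by trial and error with the Q-learning update rule."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = START                                  # input: the initial state
        while state not in (FIRE, DIAMOND):
            action = rng.choice(list(ACTIONS))         # explore by acting at random
            next_state = state + ACTIONS[action]
            if next_state == DIAMOND:
                reward = 1.0                           # reward the model
            elif next_state == FIRE:
                reward = -1.0                          # punish the model
            else:
                reward = 0.0
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q

q_table = train()
# The best solution: in each position, pick the action with the highest learned value.
policy = {s: max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in (1, 2, 3)}
print(policy)
```

After training, the learned policy moves right, towards the diamond, from every non-terminal position.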
1. Positive –
Positive reinforcement occurs when an event that follows a particular behavior
increases the strength and frequency of that behavior. In other words, it has a positive effect
on behavior.
Advantages of positive reinforcement:
• Maximizes performance
• Sustains change for a long period of time
Disadvantage of positive reinforcement:
• Too much reinforcement can lead to an overload of states, which can diminish the
results
2. Negative –
Negative reinforcement is defined as the strengthening of a behavior because a negative condition is
stopped or avoided.
Advantages of negative reinforcement:
• Increases behavior
• Provides defiance to a minimum standard of performance
Disadvantage of negative reinforcement:
• It only provides enough to meet the minimum behavior
Various Practical applications of Reinforcement Learning –
Model selection refers to the process of selecting the best model from a set of candidate models based
on their performance on a given task. This process typically involves splitting the available data into
training and validation sets, using the training set to train each candidate model, and then evaluating
their performance on the validation set. The model with the best performance on the validation set is
selected as the final model.
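The selection procedure described above can be sketched with two very simple candidate models, a constant predictor and a straight-line fit. The data, the split, and the candidate set are all assumptions made for illustration.

```python
def mse(predict, xs, ys):
    """Mean squared error of a prediction function on a dataset."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit_constant(xs, ys):
    """Candidate 1: always predict the mean of the training targets."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_linear(xs, ys):
    """Candidate 2: least-squares straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return lambda x: my + slope * (x - mx)

# Made-up data, already split into a training set and a validation set
train_x, train_y = [0, 1, 2, 3, 4, 5], [0.1, 1.9, 4.2, 5.8, 8.1, 9.9]
val_x, val_y = [6, 7], [12.1, 13.8]

candidates = {"constant": fit_constant, "linear": fit_linear}
val_errors = {name: mse(fit(train_x, train_y), val_x, val_y)
              for name, fit in candidates.items()}
best_model = min(val_errors, key=val_errors.get)
print(val_errors, best_model)
```

Each candidate is trained only on the training set; the validation set is used solely to compare them.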
Generalization refers to the ability of a model to perform well on new, unseen data. When a model is
trained on a dataset, it may overfit the training data by memorizing specific patterns in the data that
are not representative of the underlying distribution. This can lead to poor performance on new data.
To ensure good generalization, it is important to evaluate a model's performance on a separate test set
that was not used during model selection or training.
To improve generalization, techniques such as regularization, early stopping, and data augmentation
can be used. Regularization involves adding a penalty term to the loss function to discourage complex
models that are prone to overfitting. Early stopping involves monitoring the validation error during
training and stopping the training process when the error begins to increase. Data augmentation
involves generating new training examples by applying transformations to existing examples, which
can increase the size and diversity of the training set and help prevent overfitting.
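As a small worked example of regularization, consider fitting y ≈ w·x with a squared penalty on the weight (L2, or ridge, regularization). Minimizing sum((w·x − y)²) + λ·w² gives the closed form w = Σxy / (Σx² + λ), so a larger penalty λ pulls the weight towards zero. The data and penalty values below are made up.

```python
def ridge_weight(xs, ys, penalty):
    """Weight w minimising sum((w*x - y)**2) + penalty * w**2 (model without intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + penalty)

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]     # data lying exactly on y = 2x

weights = {penalty: ridge_weight(xs, ys, penalty) for penalty in (0.0, 1.0, 10.0)}
for penalty, w in weights.items():
    print(penalty, round(w, 3))
```

With no penalty the fit recovers w = 2 exactly; increasing the penalty trades some fit on the training data for a smaller, simpler weight.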
Overall, model selection and generalization are crucial aspects of machine learning that help ensure
that models are accurate and reliable, and can be applied successfully to new data.
Fig: Model Selection
1. Collecting Data:
As you know, machines initially learn from the data that you give them. It is of the utmost importance
to collect reliable data so that your machine learning model can find the correct patterns. The quality of
the data that you feed to the machine will determine how accurate your model is. If you have incorrect
or outdated data, you will have wrong outcomes or predictions which are not relevant.
Make sure you use data from a reliable source, as it will directly affect the outcome of your model. Good
data is relevant, contains very few missing and repeated values, and has a good representation of the
various subcategories/classes present.
2. Preparing the Data:
After you have your data, you have to prepare it. You can do this by:
• Putting together all the data you have and randomizing it. This helps make sure that data is evenly
distributed, and the ordering does not affect the learning process.
• Cleaning the data to remove unwanted data, missing values, unneeded rows and columns, and duplicate values,
and to convert data types where needed. You might even have to restructure the dataset and change the rows
and columns or the index of rows and columns.
• Visualizing the data to understand how it is structured and to understand the relationships between
the various variables and classes present.
• Splitting the cleaned data into two sets: a training set and a testing set. The training set is the set
your model learns from. The testing set is used to check the accuracy of your model after training.
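The cleaning, randomizing, and splitting steps can be sketched with the standard library alone. The row format (tuples, with None marking a missing value), the 20% test fraction, and the function name `prepare` are assumptions made for this illustration.

```python
import random

def prepare(rows, test_fraction=0.2, seed=42):
    """Clean, shuffle, and split rows into (training_set, testing_set)."""
    # Drop rows with missing values (None) and exact duplicates, keeping first occurrences.
    cleaned = []
    for row in rows:
        if None not in row and row not in cleaned:
            cleaned.append(row)
    # Shuffle so that the original ordering does not affect learning.
    random.Random(seed).shuffle(cleaned)
    # Hold out the first n_test shuffled rows as the testing set.
    n_test = max(1, int(len(cleaned) * test_fraction))
    return cleaned[n_test:], cleaned[:n_test]

# Made-up rows: ten valid ones, one duplicate, and one with a missing value
rows = [(i, i * 2) for i in range(10)] + [(3, 6), (11, None)]
train_set, test_set = prepare(rows)
print(len(train_set), len(test_set))    # 8 2
```

Fixing the seed makes the split reproducible, which helps when comparing models later.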
Figure 3: Cleaning and Visualizing Data
3. Choosing a Model:
A machine learning model determines the output you get after running a machine learning algorithm
on the collected data. It is important to choose a model which is relevant to the task at hand. Over the
years, scientists and engineers developed various models suited for different tasks like speech
recognition, image recognition, prediction, etc. Apart from this, you also have to see if your model is
suited for numerical or categorical data and choose accordingly.
4. Training the Model:
Training is the most important step in machine learning. In training, you pass the prepared data to your
machine learning model to find patterns and make predictions. It results in the model learning from the
data so that it can accomplish the task set. Over time, with training, the model gets better at predicting.
5. Evaluating the Model:
After training your model, you have to check how it is performing. This is done by testing the
performance of the model on previously unseen data. The unseen data used is the testing set that you
split your data into earlier. If testing is done on the same data that was used for training, you will not
get an accurate measure, as the model has already seen the data and finds the same patterns in it as it
previously did. This will give you disproportionately high accuracy.
When the model is evaluated on testing data, you get an accurate measure of how it will perform, and of its speed.
Figure 6: Evaluating a model
6. Parameter Tuning:
Once you have created and evaluated your model, see if its accuracy can be improved in any way. This
is done by tuning the parameters present in your model. These parameters (often called hyperparameters)
are the settings of the model that the programmer generally chooses, rather than values learned from the data.
At particular values of these parameters, the accuracy will be at its maximum. Parameter tuning refers to finding these values.
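A simple way to tune such a setting is a grid search: try each candidate value and keep the one with the lowest error on held-out data. The sketch below tunes k, the number of neighbours in a k-nearest-neighbour regressor; the model, the data, and the candidate values are assumptions chosen for illustration.

```python
def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def validation_error(train, validation, k):
    """Mean squared error on the validation points for a given k."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in validation) / len(validation)

# Made-up (input, target) pairs
train = [(0, 0.0), (1, 1.2), (2, 1.9), (3, 3.1), (4, 3.9), (5, 5.2), (6, 5.8), (7, 7.1)]
validation = [(1.5, 1.5), (3.5, 3.5), (5.5, 5.5)]

# Grid search: evaluate every candidate value and keep the best one.
errors = {k: validation_error(train, validation, k) for k in (1, 2, 4, 8)}
best_k = min(errors, key=errors.get)
print(errors)
print(best_k)
```

Note that k is never learned by the model itself; it is chosen from outside by comparing validation errors.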
7. Making Predictions
In the end, you can use your model on unseen data to make predictions accurately.
How to Implement Machine Learning Steps in Python?
You will now see how to implement a machine learning model using Python.
In this example, data collected is from an insurance company, which tells you the variables that come
into play when an insurance amount is set. Using this, you will have to predict the insurance amount
for a person. This data was collected from Kaggle.com, which has many reliable datasets.
You need to start by importing any necessary modules, as shown.
Now, clean your data by removing duplicate values, and transforming columns into numerical values
to make them easier to work with.
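The cleaning step can be sketched without any external library. The rows below are made-up values in the style of an insurance dataset, and the column names (`age`, `sex`, `smoker`, `charges`) are assumptions, not the actual columns of the dataset used in the notes.

```python
# Hypothetical rows with made-up values; the column names are assumptions.
rows = [
    {"age": 19, "sex": "female", "smoker": "yes", "charges": 16000.0},
    {"age": 18, "sex": "male", "smoker": "no", "charges": 1700.0},
    {"age": 18, "sex": "male", "smoker": "no", "charges": 1700.0},   # duplicate row
    {"age": 28, "sex": "male", "smoker": "no", "charges": 4400.0},
]

def clean(rows):
    """Remove duplicate rows and turn the text columns into numbers."""
    seen, result = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            continue                      # skip rows already encountered
        seen.add(key)
        encoded = dict(row)
        encoded["sex"] = 1 if row["sex"] == "male" else 0
        encoded["smoker"] = 1 if row["smoker"] == "yes" else 0
        result.append(encoded)
    return result

cleaned = clean(rows)
print(len(cleaned))
print(cleaned[0])
```

After this step every column is numerical, which is what a regression model needs as input.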
As you need to predict a numerical value based on some parameters, you can use Linear
Regression. The model needs to learn from your training set. This is done by using the '.fit' command.
Now, predict your testing dataset and find how accurate your predictions are.
Figure 15: Predicting using your model
1.0 is the highest level of accuracy you can get. Now, get your parameters.
The above picture shows the learned parameters (coefficients), which indicate how the various variables in your dataset affect the prediction.
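The notes do not name the score used above. Assuming it is the coefficient of determination (R²), which is a common default score for regression models, the following hand-written helper shows why 1.0 is the best possible value: R² = 1 − (sum of squared errors) / (total variation around the mean).

```python
def r2_score(actual, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

actual = [3.0, 5.0, 7.0, 9.0]                          # made-up target values

perfect = r2_score(actual, actual)                     # every prediction exact
baseline = r2_score(actual, [6.0, 6.0, 6.0, 6.0])      # always predicting the mean
print(perfect, baseline)                               # 1.0 0.0
```

A score of 0.0 means the model does no better than always predicting the average, and a negative score means it does worse.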
AI & ML Differences
AI is a bigger concept to create intelligent machines that can simulate human thinking capability and
behavior, whereas, machine learning is an application or subset of AI that allows machines to learn
from data without being programmed explicitly.
Below are some main differences between AI and machine learning along with the overview of Artificial
intelligence and machine learning
Artificial Intelligence
Artificial intelligence is a field of computer science which builds computer systems that can mimic
human intelligence. It is composed of two words, "Artificial" and "intelligence", which together mean "a
human-made thinking power." Hence we can define it as,
Artificial intelligence is a technology using which we can create intelligent systems that can simulate
human intelligence.
An artificial intelligence system does not need to be pre-programmed; instead, it uses
algorithms which can work with their own intelligence. It involves machine learning algorithms such as
reinforcement learning algorithms and deep learning neural networks. AI is being used in multiple places
such as Siri, Google's AlphaGo, AI in chess playing, etc.
Currently, we are working with weak AI and general AI. The future of AI is strong AI, which it is
said will be more intelligent than humans.
Machine learning
Machine learning is about extracting knowledge from the data. It can be defined as,
Machine learning is a subfield of artificial intelligence, which enables machines to learn from past
data or experiences without being explicitly programmed.
Artificial Intelligence | Machine Learning
Artificial intelligence is a technology which enables a machine to simulate human behavior. | Machine learning is a subset of AI which allows a machine to automatically learn from past data without being programmed explicitly.
The goal of AI is to make a smart computer system like humans to solve complex problems. | The goal of ML is to allow machines to learn from data so that they can give accurate output.
In AI, we make intelligent systems to perform any task like a human. | In ML, we teach machines with data to perform a particular task and give an accurate result.
Machine learning and deep learning are the two main subsets of AI. | Deep learning is a main subset of machine learning.
AI has a very wide range of scope. | Machine learning has a limited scope.
AI is working to create an intelligent system which can perform various complex tasks. | Machine learning is working to create machines that can perform only those specific tasks for which they are trained.
AI system is concerned about maximizing the chances of success. | Machine learning is mainly concerned about accuracy and patterns.
The main applications of AI are Siri, customer support using chatbots, expert systems, online game playing, intelligent humanoid robots, etc. | The main applications of machine learning are online recommender systems, Google search algorithms, Facebook auto friend tagging suggestions, etc.
On the basis of capabilities, AI can be divided into three types, which are Weak AI, General AI, and Strong AI. | Machine learning can also be divided into mainly three types, which are Supervised learning, Unsupervised learning, and Reinforcement learning.
It includes learning, reasoning, and self-correction. | It includes learning and self-correction when introduced with new data.
AI completely deals with structured, semi-structured, and unstructured data. | Machine learning deals with structured and semi-structured data.