UNIT 2-3 - Notes
Machine Learning Techniques (Dr. A.P.J. Abdul Kalam Technical University)
Downloaded by Neeraj Pandey (neeraj.cse@kipm.edu.in)
UNIT-II
Practical of ML: https://www.youtube.com/watch?v=dl_ZsuHSIFE
REGRESSION: Linear Regression and Logistic Regression
BAYESIAN LEARNING - Bayes theorem, Concept learning, Bayes Optimal
Classifier, Naïve Bayes classifier, Bayesian belief networks, EM algorithm. SUPPORT
VECTOR MACHINE: Introduction, Types of support vector kernel – (Linear kernel,
polynomial kernel, and Gaussian kernel), Hyperplane – (Decision surface), Properties
of SVM, and Issues in SVM.
Regression is a technique for investigating the relationship between independent
variables or features and a dependent variable or outcome. It’s used as a method for
predictive modelling in machine learning, in which an algorithm is used to predict
continuous outcomes.
Difference between Regression and Classification
Consider the following comparison:
o In Regression, the output variable must be of a continuous nature or a real value; in Classification, the output variable must be a discrete value.
o The task of the regression algorithm is to map the input value (x) to a continuous output variable (y); the task of the classification algorithm is to map the input value (x) to a discrete output variable (y).
o Regression algorithms are used with continuous data; classification algorithms are used with discrete data.
o In Regression, we try to find the best-fit line, which can predict the output more accurately; in Classification, we try to find the decision boundary, which can divide the dataset into different classes.
o Regression algorithms can be used to solve regression problems such as weather prediction, house price prediction, etc.; classification algorithms can be used to solve classification problems such as identification of spam emails, speech recognition, identification of cancer cells, etc.
o Regression algorithms can be further divided into Linear and Non-linear Regression; classification algorithms can be divided into Binary Classifiers and Multi-class Classifiers.
Linear Regression vs Logistic Regression
Linear Regression and Logistic Regression are two well-known Machine Learning
algorithms that come under the supervised learning technique. Since both
algorithms are supervised in nature, they use labeled datasets to make
predictions. The main difference between them is how they are used: Linear
Regression is used for solving regression problems, whereas Logistic Regression
is used for solving classification problems. A description of both algorithms is
given below, along with a difference table.
Linear Regression vs Logistic Regression:
o Linear Regression is used to predict a continuous dependent variable using a given set of independent variables; Logistic Regression is used to predict a categorical dependent variable using a given set of independent variables.
o Linear Regression is used for solving regression problems; Logistic Regression is used for solving classification problems.
o In Linear Regression, we predict the value of continuous variables; in Logistic Regression, we predict the values of categorical variables.
o In Linear Regression, we find the best-fit line, by which we can easily predict the output; in Logistic Regression, we find the S-curve, by which we can classify the samples.
o The least squares estimation method is used to estimate the coefficients in Linear Regression; the maximum likelihood estimation method is used in Logistic Regression.
o The output of Linear Regression must be a continuous value, such as price, age, etc.; the output of Logistic Regression must be a categorical value, such as 0 or 1, Yes or No, etc.
o In Linear Regression, the relationship between the dependent and independent variables must be linear; in Logistic Regression, a linear relationship between the dependent and independent variables is not required.
o In Linear Regression, there may be collinearity between the independent variables; in Logistic Regression, there should not be collinearity between the independent variables.
Regression Analysis in Machine learning
Regression analysis is a statistical method to model the relationship between a
dependent (target) variable and one or more independent (predictor) variables.
More specifically, regression analysis helps us to understand how the value of
the dependent variable changes corresponding to one independent variable when
the other independent variables are held fixed. It predicts continuous/real
values such as temperature, age, salary, price, etc.
We can understand the concept of regression analysis using the below example:
Example: Suppose there is a marketing company A that runs various
advertisements every year and gets sales from them. The list below shows the
advertisement spend of the company over the last 5 years and the corresponding
sales:
Now, the company wants to spend $200 on advertisement in the year 2019 and
wants to predict the sales for that year. To solve such prediction problems in
machine learning, we need regression analysis.
Regression is a supervised learning technique which helps in finding the correlation
between variables and enables us to predict a continuous output variable based
on one or more predictor variables. It is mainly used for prediction,
forecasting, time series modeling, and determining the cause-and-effect
relationship between variables.
In Regression, we plot a graph between the variables which best fits the given
datapoints; using this plot, the machine learning model can make predictions about
the data. In simple words, "Regression shows a line or curve that passes
through the datapoints on the target-predictor graph in such a way that
the vertical distance between the datapoints and the regression line is
minimum." The distance between the datapoints and the line tells whether the
model has captured a strong relationship or not.
Some examples of regression are:
o Prediction of rain using temperature and other factors
o Determining Market trends
o Prediction of road accidents due to rash driving.
Terminologies Related to the Regression Analysis:
o Dependent Variable: The main factor in regression analysis which we want
to predict or understand is called the dependent variable. It is also
called the target variable.
o Independent Variable: The factors which affect the dependent variable, or
which are used to predict its values, are called independent variables,
also called predictors.
o Outliers: An outlier is an observation which contains either a very low value or
a very high value in comparison to other observed values. An outlier may
hamper the result, so it should be avoided.
o Multicollinearity: If the independent variables are highly correlated with
each other, then such a condition is called multicollinearity. It should not
be present in the dataset, because it creates problems when ranking the
most affecting variables.
o Underfitting and Overfitting: If our algorithm works well with the training
dataset but not with the test dataset, then such a problem is
called overfitting. And if our algorithm does not perform well even with the
training dataset, then such a problem is called underfitting.
Why do we use Regression Analysis?
As mentioned above, regression analysis helps in the prediction of a continuous
variable. There are various real-world scenarios where we need future
predictions, such as weather conditions, sales, marketing trends, etc.; for
such cases we need a technique which can make predictions accurately.
Regression analysis is such a statistical method, used in machine learning and
data science. Below are some other reasons for using regression analysis:
o Regression estimates the relationship between the target and the
independent variables.
o It is used to find the trends in data.
o It helps to predict real/continuous values.
o By performing regression, we can confidently determine the most
important factor, the least important factor, and how each factor
affects the others.
Types of Regression
There are various types of regression used in data science and machine
learning. Each type has its own importance in different scenarios, but at the
core, all regression methods analyze the effect of the independent variables on
the dependent variable. Some important types of regression are discussed below:
o Linear Regression
o Logistic Regression
o Polynomial Regression
o Support Vector Regression
o Decision Tree Regression
o Random Forest Regression
o Ridge Regression
o Lasso Regression
Linear Regression:
o Linear regression is a statistical regression method which is used for
predictive analysis.
o It is one of the simplest algorithms; it works on regression and
shows the relationship between continuous variables.
o It is used for solving regression problems in machine learning.
o Linear regression shows the linear relationship between the independent
variable (X-axis) and the dependent variable (Y-axis), hence the name linear
regression.
o If there is only one input variable (x), then such linear regression is
called simple linear regression. And if there is more than one input
variable, then it is called multiple linear regression.
o The relationship between variables in the linear regression model can be
explained using the image below, where we predict the salary of an
employee on the basis of years of experience.
o Below is the mathematical equation for linear regression:
Y = aX + b
Here, Y = dependent variable (target variable),
X = independent variable (predictor variable),
a and b are the linear coefficients.
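The equation above can be sketched in Python. The advertisement/sales figures below are illustrative placeholders (the original data table is an image in the source), and `np.polyfit` is used here as a convenient least-squares fitter:

```python
# A minimal sketch of fitting Y = aX + b by least squares.
# The advertisement-spend/sales numbers are made up for illustration.
import numpy as np

X = np.array([90.0, 120.0, 150.0, 100.0, 130.0])        # ad spend
Y = np.array([1000.0, 1300.0, 1800.0, 1200.0, 1380.0])  # sales

# np.polyfit returns the slope a and intercept b that minimise the
# sum of squared vertical distances between the points and the line.
a, b = np.polyfit(X, Y, deg=1)

predicted = a * 200.0 + b  # predicted sales for a $200 spend
print(round(a, 2), round(b, 2), round(predicted, 1))
```

A quick sanity check on any least-squares fit: the fitted line always passes through the point of means (mean of X, mean of Y).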
Some popular applications of linear regression are:
o Analyzing trends and sales estimates
o Salary forecasting
o Real estate prediction
o Arriving at ETAs in traffic.
Logistic Regression:
o Logistic regression is another supervised learning algorithm which is used to
solve classification problems. In classification problems, the
dependent variable is in a binary or discrete format, such as 0 or 1.
o The logistic regression algorithm works with categorical variables such as 0 or
1, Yes or No, True or False, Spam or Not Spam, etc.
o It is a predictive analysis algorithm which works on the concept of probability.
o Logistic regression is a type of regression, but it differs from the linear
regression algorithm in how it is used.
o Logistic regression uses the sigmoid function (also called the logistic
function) to model the data. The function can be represented as:
f(x) = 1 / (1 + e^(-x))
Here, f(x) = output between the 0 and 1 value,
x = input to the function,
e = base of the natural logarithm.
When we provide the input values (data) to the function, it gives the S-curve as
follows:
o It uses the concept of threshold levels: values above the threshold level are
rounded up to 1, and values below the threshold level are rounded down to 0.
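The sigmoid and the thresholding step described above can be written as a small sketch (the 0.5 threshold is an assumed default, not fixed by the notes):

```python
# A minimal sketch of the sigmoid (logistic) function and thresholding.
import math

def sigmoid(x):
    # f(x) = 1 / (1 + e^(-x)); the output always lies between 0 and 1
    return 1.0 / (1.0 + math.exp(-x))

def classify(x, threshold=0.5):
    # values above the threshold map to class 1, otherwise class 0
    return 1 if sigmoid(x) > threshold else 0

print(sigmoid(0))                     # → 0.5
print(classify(2.5), classify(-2.5))  # → 1 0
```

Note how sigmoid(0) sits exactly on the default threshold: large positive inputs are pushed toward 1 and large negative inputs toward 0, which is what produces the S-curve.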
There are three types of logistic regression:
o Binary (0/1, pass/fail)
o Multinomial (cats, dogs, lions)
o Ordinal (low, medium, high)
What is Bayes Theorem?
Bayes' theorem is one of the most popular machine learning concepts; it helps to
calculate the probability of one event occurring, under uncertain knowledge,
given that another event has already occurred.
Bayes' theorem can be derived using the product rule and the conditional
probability of event X with known event Y:
o According to the product rule, we can express the probability of event X
occurring with known event Y as follows:
P(X ∩ Y) = P(X|Y) P(Y)    {equation 1}
o Further, the probability of event Y occurring with known event X:
P(X ∩ Y) = P(Y|X) P(X)    {equation 2}
Mathematically, Bayes' theorem is obtained by equating the right-hand sides of
both equations and dividing by P(Y). We get:
P(X|Y) = P(Y|X) P(X) / P(Y)
This holds for any two events X and Y, and the above equation is called Bayes'
Rule or Bayes' Theorem.
o P(X|Y) is called the posterior, which we need to calculate. It is defined as
the updated probability after considering the evidence.
o P(Y|X) is called the likelihood. It is the probability of the evidence given
that the hypothesis is true.
o P(X) is called the prior probability, the probability of the hypothesis before
considering the evidence.
o P(Y) is called the marginal probability. It is defined as the probability of
the evidence under any consideration.
Hence, Bayes Theorem can be written as:
posterior = likelihood * prior / evidence
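A quick numeric check of the rule posterior = likelihood * prior / evidence, using made-up probabilities (the 1% prior, 90% likelihood, and 5% false-positive rate below are illustrative assumptions):

```python
# A small numeric check of Bayes' rule. X = hypothesis, Y = evidence.
p_x = 0.01              # prior P(X): hypothesis is true
p_y_given_x = 0.90      # likelihood P(Y|X): evidence if hypothesis true
p_y_given_not_x = 0.05  # P(Y|not X): evidence if hypothesis false

# marginal P(Y) by the law of total probability
p_y = p_y_given_x * p_x + p_y_given_not_x * (1 - p_x)

# posterior = likelihood * prior / evidence
p_x_given_y = p_y_given_x * p_x / p_y
print(round(p_x_given_y, 3))  # → 0.154
```

Even with a 90% likelihood, the small prior keeps the posterior low, which is exactly the kind of update Bayes' theorem captures.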
Bayes Optimal Classifier: The Bayes Optimal Classifier is a
probabilistic model that predicts the most likely outcome for a new
situation.
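As a rough sketch of the idea, the Bayes optimal prediction combines the predictions of all hypotheses, weighted by their posterior probabilities (the three hypotheses and their posteriors below are made-up illustrative numbers):

```python
# Posterior P(h|D) for three hypothetical hypotheses.
posteriors = {"h1": 0.4, "h2": 0.3, "h3": 0.3}
# Each hypothesis' predicted label for a new instance.
predictions = {"h1": "positive", "h2": "negative", "h3": "negative"}

# Sum posterior weight behind each candidate label.
votes = {}
for h, p in posteriors.items():
    label = predictions[h]
    votes[label] = votes.get(label, 0.0) + p

best = max(votes, key=votes.get)
print(best, votes)  # → negative {'positive': 0.4, 'negative': 0.6}
```

Note that h1 is the single most probable hypothesis, yet the combined, posterior-weighted vote favours "negative"; this is what makes the classifier "optimal" rather than simply picking the best single hypothesis.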
Support Vector Machine Algorithm
Support Vector Machine, or SVM, is one of the most popular supervised learning
algorithms, used for classification as well as regression problems.
However, it is primarily used for classification problems in machine learning.
The goal of the SVM algorithm is to create the best line or decision boundary that
can segregate n-dimensional space into classes so that we can easily put a new
data point in the correct category in the future. This best decision boundary is
called a hyperplane.
SVM chooses the extreme points/vectors that help in creating the hyperplane.
These extreme cases are called support vectors, and hence the algorithm is termed
Support Vector Machine. Consider the diagram below, in which two
different categories are classified using a decision boundary or hyperplane:
Example: SVM can be understood with the example that we used for the KNN
classifier. Suppose we see a strange cat that also has some features of dogs; if
we want a model that can accurately identify whether it is a cat or a dog, such a
model can be created using the SVM algorithm. We will first train our model with
lots of images of cats and dogs so that it can learn their different features,
and then test it with this strange creature. Because SVM creates a decision
boundary between the two classes (cat and dog) and chooses the extreme cases
(support vectors), it will consider the extreme cases of cat and dog. On the
basis of the support vectors, it will classify the creature as a cat. Consider
the diagram below:
The SVM algorithm can be used for face detection, image classification, text
categorization, etc.
Types of SVM
SVM can be of two types:
o Linear SVM: Linear SVM is used for linearly separable data. If a
dataset can be classified into two classes by using a single straight line, then
such data is termed linearly separable data, and the classifier used is called a
Linear SVM classifier.
o Non-linear SVM: Non-linear SVM is used for non-linearly separable data.
If a dataset cannot be classified by using a straight line, then
such data is termed non-linear data, and the classifier used is called a
Non-linear SVM classifier.
Hyperplane and Support Vectors in the SVM algorithm:
Hyperplane: There can be multiple lines/decision boundaries to segregate the
classes in n-dimensional space, but we need to find the best decision boundary
that helps to classify the data points. This best boundary is known as the
hyperplane of SVM.
The dimension of the hyperplane depends on the number of features in the
dataset: if there are 2 features (as shown in the image), the hyperplane will be
a straight line; if there are 3 features, the hyperplane will be a
2-dimensional plane.
We always create the hyperplane that has the maximum margin, i.e., the maximum
distance between the hyperplane and the nearest data points of each class.
Support Vectors:
The data points or vectors that are closest to the hyperplane and which affect
the position of the hyperplane are termed support vectors. Since these vectors
support the hyperplane, they are called support vectors.
How does SVM work?
Linear SVM:
The working of the SVM algorithm can be understood using an example. Suppose
we have a dataset that has two tags (green and blue), and the dataset has two
features, x1 and x2. We want a classifier that can classify each pair (x1, x2) of
coordinates as either green or blue. Consider the image below:
Since this is a 2-d space, by just using a straight line we can easily separate
these two classes. But there can be multiple lines that can separate these
classes. Consider the image below:
Hence, the SVM algorithm helps to find the best line or decision boundary; this
best boundary or region is called a hyperplane. The SVM algorithm finds the
points of both classes that are closest to the line. These points are called
support vectors. The distance between the vectors and the hyperplane is called
the margin, and the goal of SVM is to maximize this margin. The hyperplane with
the maximum margin is called the optimal hyperplane.
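The margin idea above can be sketched with a toy calculation. For a candidate hyperplane w·x + b = 0 (the four points, w, and b below are all illustrative assumptions, not data from the notes), the margin is the smallest perpendicular distance from any training point to the plane; SVM searches for the w and b that make this margin as large as possible:

```python
# A minimal numpy sketch of computing the margin of a candidate hyperplane.
import numpy as np

blue = np.array([[1.0, 1.0], [2.0, 1.5]])   # illustrative class "blue" points
green = np.array([[3.0, 3.5], [4.0, 3.0]])  # illustrative class "green" points

w = np.array([1.0, 1.0])  # candidate hyperplane normal vector
b = -4.5                  # chosen so the classes fall on opposite sides

points = np.vstack([blue, green])
# perpendicular distance of each point to the plane: |w·x + b| / ||w||
dist = np.abs(points @ w + b) / np.linalg.norm(w)
margin = dist.min()
# the closest point(s) — the support vectors — define the margin
print(points[np.isclose(dist, margin)], round(margin, 3))
```

Here the blue points get negative values of w·x + b and the green points positive values, confirming the plane separates them; the single closest point is the support vector for this candidate plane.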
Non-Linear SVM:
If data is linearly arranged, then we can separate it by using a straight line,
but for non-linear data we cannot draw a single straight line. Consider the
image below:
To separate these data points, we need to add one more dimension. For linear
data we have used the two dimensions x and y, so for non-linear data we will add
a third dimension z. It can be calculated as:
z = x² + y²
By adding the third dimension, the sample space becomes as in the image below:
So now SVM will divide the datasets into classes in the following way. Consider
the image below:
Since we are in 3-d space, the separating boundary looks like a plane parallel
to the x-y plane. If we convert it back to 2-d space by taking z = 1, the
boundary becomes a circle:
Hence, we get a circle of radius 1 in the case of this non-linear data.
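The z = x² + y² mapping described above can be checked with a small sketch (the two rings of points, with radii 0.5 and 1.5, are illustrative):

```python
# A minimal sketch of the feature map z = x^2 + y^2: points on an inner
# circle vs an outer ring are not separable by a line in (x, y), but
# after lifting to 3-d, the plane z = 1 separates them.
import numpy as np

angles = np.linspace(0, 2 * np.pi, 8, endpoint=False)
inner = 0.5 * np.column_stack([np.cos(angles), np.sin(angles)])  # radius 0.5
outer = 1.5 * np.column_stack([np.cos(angles), np.sin(angles)])  # radius 1.5

def lift(p):
    # append the third coordinate z = x^2 + y^2 to each point
    z = (p ** 2).sum(axis=1)
    return np.column_stack([p, z])

# in 3-d, the plane z = 1 puts the two rings on opposite sides
print(lift(inner)[:, 2].max() < 1, lift(outer)[:, 2].min() > 1)  # → True True
```

Since z is just the squared radius, every inner point lifts to z = 0.25 and every outer point to z = 2.25, so the plane z = 1 cleanly separates them; projected back to 2-d, that plane is exactly the circle x² + y² = 1.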