ML Unit-2
Machine Learning Techniques (KCS 055)
Regression Algorithm

Example: House Price
Challenges in estimating the house price
Predicting the price with the help of an ML model

Regression Model
Simple Linear Regression

Y = a + bX

Y : Dependent variable
X : Independent variable
a : Y-intercept (the value of Y when X is 0)
b : Slope (how much Y changes for a unit change in X)
Linear Regression

Area (sq. feet)   Price (in Lakhs)
100               10
200               20
300               30

[Scatter plot of Price (in Lakhs) against Area (sq. feet) for the three points above]
Linear Regression

Y = a + bX
Y -> Price
X -> Area

[Same scatter plot of Price against Area]
Linear Regression

Slope (b) = Sum of product of deviations / Sum of squares of deviations for X
Y-intercept (a) = Mean of Y – (b * Mean of X)

Area (X)     Price (Y)   Mean    Mean    Deviation (X)        Deviation (Y)    Product of    Square of
(sq. feet)   (Lakhs)     of X    of Y    X – mean(X)          Y – mean(Y)      Deviations    Deviation for X
100          10          200     20      100 – 200 = -100     10 – 20 = -10    1000          10,000
200          20                          200 – 200 = 0        20 – 20 = 0      0             0
300          30                          300 – 200 = 100      30 – 20 = 10     1000          10,000

Slope (b) = (1000 + 0 + 1000) / (10,000 + 0 + 10,000) = 2000 / 20,000 = 0.1
Y-intercept (a) = 20 – (0.1 * 200) = 0, so Price = 0.1 * Area
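The slope and intercept can be computed directly from the deviation sums above. A minimal NumPy sketch (using the three Area/Price rows from the table) might look like this:

```python
import numpy as np

# Area in sq. feet (X) and price in lakhs (Y), from the table above
X = np.array([100.0, 200.0, 300.0])
Y = np.array([10.0, 20.0, 30.0])

dev_x = X - X.mean()     # deviations of X from its mean (200)
dev_y = Y - Y.mean()     # deviations of Y from its mean (20)

b = np.sum(dev_x * dev_y) / np.sum(dev_x ** 2)   # slope = 2000 / 20,000 = 0.1
a = Y.mean() - b * X.mean()                      # intercept = 20 - 0.1 * 200 = 0

print(f"Price = {a:.2f} + {b:.2f} * Area")       # Price = 0.00 + 0.10 * Area
```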
Outliers

An observation that lies an abnormal distance from other
values in a random sample from a population.

[Scatter plot of Price against Area with the outlying points labelled "Outliers"]
Predict the price of a pizza whose diameter is 20 inches.

Diameter (X)   Price (Y)   Mean    Mean    Deviation (X)    Deviation (Y)     Product of    Square of
(inches)       (Dollar)    of X    of Y    X – mean(X)      Y – mean(Y)       Deviations    Deviation for X
8              10          10      13      8 – 10 = -2      10 – 13 = -3      6             4
10             13                          10 – 10 = 0      13 – 13 = 0       0             0
12             16                          12 – 10 = 2      16 – 13 = 3       6             4
Slope (b) = Sum of product of deviations / Sum of squares of deviations for X = (6 + 0 + 6) / (4 + 0 + 4) = 1.5
Y-intercept (a) = Mean of Y – (b * Mean of X) = 13 – (1.5 * 10) = -2

Price when X is 20:
Price = a + bX = -2 + 1.5 * 20 = 28
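The same least-squares formulas give the pizza prediction; here is a small sketch of the calculation, assuming only the three data points from the table:

```python
import numpy as np

# Diameter in inches (X) and price in dollars (Y)
X = np.array([8.0, 10.0, 12.0])
Y = np.array([10.0, 13.0, 16.0])

b = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)  # 12 / 8 = 1.5
a = Y.mean() - b * X.mean()                                                # 13 - 15 = -2

print("Predicted price for a 20-inch pizza:", a + b * 20)   # -2 + 1.5 * 20 = 28.0
```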
[Plot: Pizza Price against Diameter]
The world is not so linear
Multiple Linear Regression

• When the data has more than one independent variable.

Y = a + b1X1 + b2X2 + b3X3 + ………… + bnXn
Dataset

Use the following steps to fit a multiple linear regression model to the dataset.

In our example, the fitted model is:
Y = -6.867 + 3.148x1 – 1.656x2
Matrix Approach

Coefficients = ((XᵀX)⁻¹ Xᵀ) Y

 Y    X1   X2
 1    1    4
 6    2    5
 8    3    8
12    4    2

      1   1   4                  1   1   1   1
X =   1   2   5    (4x3)   Xᵀ =  1   2   3   4    (3x4)
      1   3   8                  4   5   8   2
      1   4   2

      1
Y =   6    (4x1)
      8
     12

Dimension check: ((XᵀX)⁻¹)(3x3) · Xᵀ(3x4) · Y(4x1) = result (3x1)
Matrix Approach

Coefficients = ((XᵀX)⁻¹ Xᵀ) Y

        1   1   1   1       1   1   4
XᵀX =   1   2   3   4   *   1   2   5
        4   5   8   2       1   3   8
                            1   4   2

         4    10    19
XᵀX =   10    30    46
        19    46   109

            3.15    −0.59    −0.30
(XᵀX)⁻¹ =  −0.59     0.20     0.016
           −0.30     0.016    0.054
Matrix Approach

Coefficients = ((XᵀX)⁻¹ Xᵀ) Y

              3.15    −0.59    −0.30       1   1   1   1
(XᵀX)⁻¹Xᵀ =  −0.59     0.20     0.016  *   1   2   3   4
             −0.30     0.016    0.054      4   5   8   2

              0.05     0.47    −1.02     0.19
(XᵀX)⁻¹Xᵀ =  −0.32    −0.098    0.155    0.26
             −0.065    0.005    0.185   −0.125
Matrix Approach

Coefficients = ((XᵀX)⁻¹ Xᵀ) Y

                   0.05     0.47    −1.02     0.19         1
((XᵀX)⁻¹Xᵀ)Y =    −0.32    −0.098    0.155    0.26     *   6
                  −0.065    0.005    0.185   −0.125        8
                                                          12

                  −1.69     b0
((XᵀX)⁻¹Xᵀ)Y =     3.48  =  b1
                  −0.05     b2

b0 = -1.69, b1 = 3.48, b2 = -0.05
Matrix Approach

So, the coefficients are:
b0 = -1.69, b1 = 3.48, b2 = -0.05

Y = b0 + b1X1 + b2X2
Y = -1.69 + 3.48X1 – 0.05X2
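The whole matrix computation can be reproduced in a few lines of NumPy; this sketch simply evaluates the normal equation ((XᵀX)⁻¹Xᵀ)Y on the dataset above (the leading column of ones supplies the intercept b0):

```python
import numpy as np

# Design matrix with a leading column of ones, followed by X1 and X2
X = np.array([[1, 1, 4],
              [1, 2, 5],
              [1, 3, 8],
              [1, 4, 2]], dtype=float)
Y = np.array([1, 6, 8, 12], dtype=float)

# Coefficients = ((X^T X)^-1 X^T) Y  -- the normal equation
coeffs = np.linalg.inv(X.T @ X) @ X.T @ Y
print(coeffs)   # ~[-1.70  3.48 -0.05]; the slide rounds these to -1.69, 3.48, -0.05
```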
Polynomial Regression Model

It is the extended version of the Simple Linear Model.

Polynomial
• Zero degree polynomial
      Y = ax^0 = a = Constant
• One degree polynomial
      Y = a + b1x = Simple Linear Equation
• Two degree polynomial
      Y = a + b1x + b2x^2
• n degree polynomial
      Y = a + b1x + b2x^2 + b3x^3 + ………… + bnx^n
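A degree-2 polynomial can be fitted in the same spirit with NumPy; the data below is made up purely for illustration:

```python
import numpy as np

# Illustrative data that roughly follows a quadratic trend (made-up values)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.9, 10.2, 17.1, 26.0])

# Fit Y = a + b1*x + b2*x^2 (np.polyfit returns the highest-degree coefficient first)
b2, b1, a = np.polyfit(x, y, deg=2)
print(f"Y = {a:.2f} + {b1:.2f}*x + {b2:.2f}*x^2")

# Predict at a new point
x_new = 6.0
print("Prediction at x = 6:", a + b1 * x_new + b2 * x_new ** 2)
```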
Regression Model

Simple Linear Regression     • Y = a + bX
Multiple Linear Regression   • Y = a + b1X1 + b2X2 + b3X3 + ………… + bnXn
Polynomial Regression        • Y = a + b1X + b2X^2 + b3X^3 + ………… + bnX^n
1 - Linear Relationship
      Between the dependent and independent variables.

2 - Normal Distribution of Residuals
      The mean of the residuals should be zero.

3 - Very Low / No Multicollinearity
      There should be little or no relationship between the independent variables.

4 - No Auto-correlation
      The residuals should not be correlated with one another.
Logistic Regression

Sigmoid Function:

Y = σ(a + bx)

Y = 1 / (1 + e^-(a+bx))
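A minimal sketch of the sigmoid applied to the linear term a + bx (the values of a and b here are placeholders, not fitted):

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + np.exp(-z))

a, b = -4.0, 1.0            # placeholder intercept and slope
x = np.array([2, 4, 6, 8])  # e.g. study hours
print(sigmoid(a + b * x))   # outputs squashed into the range (0, 1)
```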
Logistic Regression

Study Hours (X)   Exam Result (Y)
2                 0
3                 0
4                 0
5                 1
6                 1
7                 1
8                 1

• Supervised classification model.
• Dependent variable (Y) is categorical or binary (0 or 1).
• Independent variable (X) is continuous.
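A minimal scikit-learn sketch that fits logistic regression to the study-hours data above (assuming scikit-learn is available):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Study hours (X) and exam result (Y) from the table above
X = np.array([[2], [3], [4], [5], [6], [7], [8]])
y = np.array([0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)

# Probability of passing after 4.5 hours of study, and the predicted class
print(model.predict_proba([[4.5]])[0, 1])
print(model.predict([[4.5]]))   # 0 = fail, 1 = pass
```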
Linear Regression vs. Logistic Regression

What is error?
Support Vector Machine (SVM)

• Supervised machine learning algorithm.
• Used for binary classification.
• "Vectors" are the data points.
Basic Concepts in SVM

• Kernels are mathematical functions.
• They take data as input and transform it into the required output.
• Different kernel functions are:
   – Linear Kernel
   – Polynomial Kernel
   – Gaussian Kernel
   – Radial Basis Function (RBF)
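A short scikit-learn sketch showing how the kernel is selected when training an SVM classifier; the toy data points are made up for illustration:

```python
import numpy as np
from sklearn.svm import SVC

# Toy binary-classification data (made up for illustration)
X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 8]])
y = np.array([0, 0, 0, 1, 1, 1])

# The kernel is chosen with the `kernel` argument; 'rbf' is the Gaussian / RBF kernel
for kernel in ["linear", "poly", "rbf"]:
    clf = SVC(kernel=kernel)
    clf.fit(X, y)
    print(kernel, clf.predict([[4, 4]]))
```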
                                  Linear Kernel
Naïve Bayes Classifier

Step-1: Make a frequency table

Outlook    Yes   No
Overcast   5     0
Rainy      2     2
Sunny      3     2
Total      10    4
Step-2: Make a likelihood table

Outlook    P(Outlook|Yes)   P(Outlook|No)
Overcast   5/10             0
Rainy      2/10             2/4
Sunny      3/10             2/4
Find the probability to play tennis on the 15th day using the Naïve Bayes
classifier when the Outlook is Sunny.

Step-3: Apply Bayes' Theorem:

P(A|B) = P(B|A) · P(A) / P(B)

• First, we find the probability of Yes when it is Sunny:

P(Yes|Sunny) = P(Sunny|Yes) · P(Yes) / P(Sunny)
P(Yes|Sunny) = (3/10 * 10/14) / (5/14) = 3/5 = 0.60
• Second, we find the probability of No when it is Sunny:

P(No|Sunny) = P(Sunny|No) · P(No) / P(Sunny)
            = (2/4 * 4/14) / (5/14) = 2/5 = 0.40
• So, P(Yes|Sunny) > P(No|Sunny), i.e. 0.60 > 0.40.

Therefore, we can say that the player can play tennis on a sunny day.
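The same calculation expressed as a small Python sketch, using the counts from the frequency table above:

```python
# Counts from the frequency table (Outlook = Sunny on 5 of the 14 days)
total_days = 14
yes, no = 10, 4                  # class counts used on this slide
p_sunny = 5 / total_days         # P(Sunny) = 5/14

p_sunny_given_yes = 3 / yes      # P(Sunny | Yes) = 3/10
p_sunny_given_no = 2 / no        # P(Sunny | No)  = 2/4

p_yes_given_sunny = p_sunny_given_yes * (yes / total_days) / p_sunny
p_no_given_sunny = p_sunny_given_no * (no / total_days) / p_sunny

print(round(p_yes_given_sunny, 2))   # 0.6
print(round(p_no_given_sunny, 2))    # 0.4
```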
Naïve Bayes on the full Play-Tennis dataset:

P(Play Tennis = yes) = 9/14 = 0.64
P(Play Tennis = no)  = 5/14 = 0.36

[Likelihood tables are built in the same way for each attribute: Outlook
(Sunny, Overcast, …), Temperature (hot, mild, …), Humidity (High, Normal)
and Wind (true: 6/14, false: 8/14).]
• Advantages:
   – Fast and easy algorithm.
   – Can be used for both binary and multi-class classification.
   – Mostly used for text classification.
• Disadvantages:
   – Assumes the features are independent, so it cannot learn relationships
     between them.
Bayesian Belief Network
          • Probabilistic Graphical Model.
          • Represents a set of variables and
            their conditional dependencies
            using a directed acyclic graph.
          • Two major components:
             • Directed Acyclic Graph (DAG)
             • Table of Conditional
               Probabilities
Bayesian Belief Network

Calculate the probability that the alarm has sounded, but neither a burglary
nor an earthquake has occurred, and both David and Sophia called Harry.

P(A) = P(A|B,E) P(B) P(E) +
       P(A|B,¬E) P(B) P(¬E) +
       P(A|¬B,E) P(¬B) P(E) +
       P(A|¬B,¬E) P(¬B) P(¬E)
What is the probability that David called?

P(D) = P(D|A) P(A) + P(D|¬A) P(¬A)

P(¬A) = P(¬A|B,E) P(B) P(E) +
        P(¬A|B,¬E) P(B) P(¬E) +
        P(¬A|¬B,E) P(¬B) P(E) +
        P(¬A|¬B,¬E) P(¬B) P(¬E)
What is the probability that David called?

• P(A) = 0.00252
• P(¬A) = 0.99748
• P(D) = P(D|A) P(A) + P(D|¬A) P(¬A)
• P(D) = 0.91 * 0.00252 + 0.05 * 0.99748 ≈ 0.052
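A sketch of this calculation in Python. The original conditional-probability tables are not reproduced above, so the CPT values below are the ones commonly used with this burglary-alarm example; they are assumptions, but they do reproduce the P(A) = 0.00252 quoted on the slide:

```python
# Assumed prior probabilities for Burglary and Earthquake
p_b, p_e = 0.001, 0.002

# Assumed P(Alarm | Burglary, Earthquake) for the four parent combinations
p_a_given = {(1, 1): 0.94, (1, 0): 0.95, (0, 1): 0.29, (0, 0): 0.001}

# Assumed P(David calls | Alarm) and P(David calls | no Alarm)
p_d_given_a, p_d_given_not_a = 0.91, 0.05

# Marginalise the alarm over burglary and earthquake:
# P(A) = sum over B, E of P(A|B,E) P(B) P(E)
p_a = sum(p_a_given[(b, e)]
          * (p_b if b else 1 - p_b)
          * (p_e if e else 1 - p_e)
          for b in (0, 1) for e in (0, 1))

p_d = p_d_given_a * p_a + p_d_given_not_a * (1 - p_a)
print(round(p_a, 5), round(p_d, 4))   # ~0.00253 and ~0.0522
```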
EM Algorithm

• E -> Expectation
• M -> Maximization
• Used to find latent variables.
• Latent variable – a variable that is not directly observed.
• Basically, it is used in many unsupervised clustering algorithms.
Steps involved in the EM Algorithm

• Step 1 – A set of initial values is considered.
   – The set of incomplete data is given to the system.
• Step 2 – Expectation Step or E-step
   – Use the observed data to estimate (guess) the values of the missing data.
• Step 3 – Maximization Step or M-step
   – Update the parameter values generated in the E-step.
• Step 4 – Check whether the values are converging or not.
   – If converging – stop.
   – Otherwise, repeat steps 2 and 3 until convergence occurs (see the sketch
     below).
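A minimal sketch of the E-step / M-step loop for a two-component, one-dimensional Gaussian mixture (the data and initial guesses are made up, and the variances are fixed at 1.0 for brevity):

```python
import numpy as np

def gauss(x, mu, sigma=1.0):
    """Gaussian density with mean mu and standard deviation sigma."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Made-up 1-D observations drawn from two clusters
x = np.array([1.0, 1.2, 0.8, 5.0, 5.3, 4.7])

mu = np.array([0.0, 6.0])      # Step 1: initial guesses for the two means
pi = np.array([0.5, 0.5])      # initial mixing weights

for _ in range(50):            # Step 4: iterate the E and M steps until convergence
    # E-step: responsibility of each component for each point
    dens = np.vstack([pi[k] * gauss(x, mu[k]) for k in range(2)])
    resp = dens / dens.sum(axis=0)

    # M-step: update the means and mixing weights from the responsibilities
    mu = (resp * x).sum(axis=1) / resp.sum(axis=1)
    pi = resp.sum(axis=1) / len(x)

print(mu)   # the means converge to roughly 1.0 and 5.0
```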
Usage of the EM Algorithm

• Used to fill in missing data.
• Used for unsupervised clustering.
• Used to discover the values of latent variables.
• Used to estimate the parameters of a Gaussian density function.
• Used to estimate the parameters of Hidden Markov Models.
Advantages & Disadvantages

Advantages:
• Easy to implement, as it has only two steps: the E-step and the M-step.
• The likelihood increases after each iteration.
• The solution of the M-step exists in closed form.

Disadvantages:
• Slow convergence.
• Converges only to a local optimum.
• Requires both forward and backward probabilities (e.g., when training HMMs).
Concept Learning

• "A task of acquiring a potential hypothesis (solution) that best fits the
  given training examples."
   1) +ve
   S1 = < Sunny, Warm, Normal,
   Strong, Warm, Same>
   G1 = <?,?,?,?,?,?>
2) +ve
S2 = < Sunny, Warm, ?, Strong, Warm, Same>
G2 = <?,?,?,?,?,?>
3) –ve
S3 = < Sunny, Warm, ?, Strong, Warm, Same>
G3 = <<Sunny,?,?,?,?,?>, <?,Warm,?,?,?,?>, <?,?,?,?,?,Same>>
4) +ve
S4 = < Sunny, Warm, ?, Strong, ?, ?>
G4 = <<Sunny,?,?,?,?,?>,<?,Warm,?,?,?,?>>
S0 = <Փ, Փ, Փ, Փ, Փ, Փ>

Find-S Algorithm
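A minimal sketch of the Find-S algorithm in Python. The attribute values for the second and third examples below are taken from the standard EnjoySport dataset (an assumption, but they are consistent with the S-boundary trace shown above):

```python
# Training examples in the style of the EnjoySport data above: (attributes, label)
examples = [
    (["Sunny", "Warm", "Normal", "Strong", "Warm", "Same"],   "+"),
    (["Sunny", "Warm", "High",   "Strong", "Warm", "Same"],   "+"),
    (["Rainy", "Cold", "High",   "Strong", "Warm", "Change"], "-"),
    (["Sunny", "Warm", "High",   "Strong", "Cool", "Change"], "+"),
]

# Start from the most specific hypothesis S0 = <Փ, Փ, Փ, Փ, Փ, Փ>
hypothesis = ["Փ"] * 6

for attrs, label in examples:
    if label != "+":
        continue                       # Find-S ignores negative examples
    for i, value in enumerate(attrs):
        if hypothesis[i] == "Փ":
            hypothesis[i] = value      # first positive example fills in the values
        elif hypothesis[i] != value:
            hypothesis[i] = "?"        # generalise any attribute that disagrees

print(hypothesis)   # ['Sunny', 'Warm', '?', 'Strong', '?', '?']
```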
• F1 -> A, B
• F2 -> X, Y
• Instance Space: (A,X), (A,Y), (B,X), (B,Y) – 4 instances
• Hypothesis Space: (A,X), (A,Y), (A,Փ), (A,?), (B,X), (B,Y), (B,Փ), (B,?),
  (Փ,X), (?,X), (Փ,Y), (?,Y), (Փ,Փ), (Փ,?), (?,Փ), (?,?) – 16 hypotheses
List-Then-Eliminate Algorithm