ML Unit-2
Machine Learning Techniques (KCS 055)
Regression Algorithm

Example: House Price
Challenges in estimating the house price
Predicting the price with the help of an ML model

Regression Model
Simple Linear Regression

Y = a + bX

Y : Dependent variable
X : Independent variable
a : Y-intercept (the value of Y when X is 0)
b : Slope (how much Y changes for a unit change in X)
Linear Regression

Area (sq. feet)   Price (in Lakhs)
100               10
200               20
300               30

[Scatter plot of Price (in Lakhs) against Area (sq. feet) for the three points above]
Linear Regression

Y = a + bX
Y -> Price
X -> Area

[Same scatter plot of Price against Area]
Linear Regression

Slope (b) = Sum of product of deviations / Sum of squares of deviations for X
Y-intercept (a) = Mean of Y – (b * Mean of X)

Area (X)     Price (Y)   Mean    Mean    Deviation (X)        Deviation (Y)    Product of    Square of
(sq. feet)   (Lakhs)     of X    of Y    X – mean(X)          Y – mean(Y)      Deviations    Deviation for X
100          10          200     20      100 – 200 = -100     10 – 20 = -10    1000          10,000
200          20                          200 – 200 = 0        20 – 20 = 0      0             0
300          30                          300 – 200 = 100      30 – 20 = 10     1000          10,000

Slope (b) = (1000 + 0 + 1000) / (10,000 + 0 + 10,000) = 2000 / 20,000 = 0.1
Y-intercept (a) = 20 – (0.1 * 200) = 0, so Price = 0.1 * Area
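The slope and intercept can be computed directly from the deviation sums above. A minimal NumPy sketch (using the three Area/Price rows from the table) might look like this:

```python
import numpy as np

# Area in sq. feet (X) and price in lakhs (Y), from the table above
X = np.array([100.0, 200.0, 300.0])
Y = np.array([10.0, 20.0, 30.0])

dev_x = X - X.mean()     # deviations of X from its mean (200)
dev_y = Y - Y.mean()     # deviations of Y from its mean (20)

b = np.sum(dev_x * dev_y) / np.sum(dev_x ** 2)   # slope = 2000 / 20,000 = 0.1
a = Y.mean() - b * X.mean()                      # intercept = 20 - 0.1 * 200 = 0

print(f"Price = {a:.2f} + {b:.2f} * Area")       # Price = 0.00 + 0.10 * Area
```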
Outliers

An observation that lies an abnormal distance from other
values in a random sample from a population.

[Scatter plot of Price against Area with the outlying points labelled "Outliers"]
Predict the price of a pizza whose diameter is 20 inches.

Diameter (X)   Price (Y)   Mean    Mean    Deviation (X)    Deviation (Y)     Product of    Square of
(inches)       (Dollar)    of X    of Y    X – mean(X)      Y – mean(Y)       Deviations    Deviation for X
8              10          10      13      8 – 10 = -2      10 – 13 = -3      6             4
10             13                          10 – 10 = 0      13 – 13 = 0       0             0
12             16                          12 – 10 = 2      16 – 13 = 3       6             4
Slope (b) = Sum of product of deviations / Sum of squares of deviations for X = (6 + 0 + 6) / (4 + 0 + 4) = 1.5
Y-intercept (a) = Mean of Y – (b * Mean of X) = 13 – (1.5 * 10) = -2

Price when X is 20:
Price = a + bX = -2 + 1.5 * 20 = 28
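The same least-squares formulas give the pizza prediction; here is a small sketch of the calculation, assuming only the three data points from the table:

```python
import numpy as np

# Diameter in inches (X) and price in dollars (Y)
X = np.array([8.0, 10.0, 12.0])
Y = np.array([10.0, 13.0, 16.0])

b = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)  # 12 / 8 = 1.5
a = Y.mean() - b * X.mean()                                                # 13 - 15 = -2

print("Predicted price for a 20-inch pizza:", a + b * 20)   # -2 + 1.5 * 20 = 28.0
```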
[Plot: Pizza Price against Diameter]
The world is not so linear
Multiple Linear Regression

• When the data has more than one independent variable.

Y = a + b1X1 + b2X2 + b3X3 + ………… + bnXn
Dataset

Use the following steps to fit a multiple linear regression model to the dataset.

In our example, the fitted model is:
Y = -6.867 + 3.148x1 – 1.656x2
Matrix Approach

Coefficients = ((XᵀX)⁻¹ Xᵀ) Y

 Y    X1   X2
 1    1    4
 6    2    5
 8    3    8
12    4    2

      1   1   4                  1   1   1   1
X =   1   2   5    (4x3)   Xᵀ =  1   2   3   4    (3x4)
      1   3   8                  4   5   8   2
      1   4   2

      1
Y =   6    (4x1)
      8
     12

Dimension check: ((XᵀX)⁻¹)(3x3) · Xᵀ(3x4) · Y(4x1) = result (3x1)
Matrix Approach

Coefficients = ((XᵀX)⁻¹ Xᵀ) Y

        1   1   1   1       1   1   4
XᵀX =   1   2   3   4   *   1   2   5
        4   5   8   2       1   3   8
                            1   4   2

         4    10    19
XᵀX =   10    30    46
        19    46   109

            3.15    −0.59    −0.30
(XᵀX)⁻¹ =  −0.59     0.20     0.016
           −0.30     0.016    0.054
Matrix Approach

Coefficients = ((XᵀX)⁻¹ Xᵀ) Y

              3.15    −0.59    −0.30       1   1   1   1
(XᵀX)⁻¹Xᵀ =  −0.59     0.20     0.016  *   1   2   3   4
             −0.30     0.016    0.054      4   5   8   2

              0.05     0.47    −1.02     0.19
(XᵀX)⁻¹Xᵀ =  −0.32    −0.098    0.155    0.26
             −0.065    0.005    0.185   −0.125
Matrix Approach

Coefficients = ((XᵀX)⁻¹ Xᵀ) Y

                   0.05     0.47    −1.02     0.19         1
((XᵀX)⁻¹Xᵀ)Y =    −0.32    −0.098    0.155    0.26     *   6
                  −0.065    0.005    0.185   −0.125        8
                                                          12

                  −1.69     b0
((XᵀX)⁻¹Xᵀ)Y =     3.48  =  b1
                  −0.05     b2

b0 = -1.69, b1 = 3.48, b2 = -0.05
Matrix Approach

So, the coefficients are:
b0 = -1.69, b1 = 3.48, b2 = -0.05

Y = b0 + b1X1 + b2X2
Y = -1.69 + 3.48X1 – 0.05X2
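The whole matrix computation can be reproduced in a few lines of NumPy; this sketch simply evaluates the normal equation ((XᵀX)⁻¹Xᵀ)Y on the dataset above (the leading column of ones supplies the intercept b0):

```python
import numpy as np

# Design matrix with a leading column of ones, followed by X1 and X2
X = np.array([[1, 1, 4],
              [1, 2, 5],
              [1, 3, 8],
              [1, 4, 2]], dtype=float)
Y = np.array([1, 6, 8, 12], dtype=float)

# Coefficients = ((X^T X)^-1 X^T) Y  -- the normal equation
coeffs = np.linalg.inv(X.T @ X) @ X.T @ Y
print(coeffs)   # ~[-1.70  3.48 -0.05]; the slide rounds these to -1.69, 3.48, -0.05
```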
Polynomial Regression Model

It is the extended version of the Simple Linear Model.

Polynomial
• Zero degree polynomial
      Y = ax^0 = a = Constant
• One degree polynomial
      Y = a + b1x = Simple Linear Equation
• Two degree polynomial
      Y = a + b1x + b2x^2
• n degree polynomial
      Y = a + b1x + b2x^2 + b3x^3 + ………… + bnx^n
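A degree-2 polynomial can be fitted in the same spirit with NumPy; the data below is made up purely for illustration:

```python
import numpy as np

# Illustrative data that roughly follows a quadratic trend (made-up values)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.9, 10.2, 17.1, 26.0])

# Fit Y = a + b1*x + b2*x^2 (np.polyfit returns the highest-degree coefficient first)
b2, b1, a = np.polyfit(x, y, deg=2)
print(f"Y = {a:.2f} + {b1:.2f}*x + {b2:.2f}*x^2")

# Predict at a new point
x_new = 6.0
print("Prediction at x = 6:", a + b1 * x_new + b2 * x_new ** 2)
```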
Regression Model

Simple Linear Regression     • Y = a + bX
Multiple Linear Regression   • Y = a + b1X1 + b2X2 + b3X3 + ………… + bnXn
Polynomial Regression        • Y = a + b1X + b2X^2 + b3X^3 + ………… + bnX^n
1 - Linear Relationship
      Between the dependent and independent variables.

2 - Normal Distribution of Residuals
      The mean of the residuals should be zero.

3 - Very Low / No Multicollinearity
      There should be little or no relationship between the independent variables.

4 - No Auto-correlation
      The residuals should not be correlated with one another.
Logistic Regression

Sigmoid Function:

Y = σ(a + bx)

Y = 1 / (1 + e^-(a+bx))
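A minimal sketch of the sigmoid applied to the linear term a + bx (the values of a and b here are placeholders, not fitted):

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + np.exp(-z))

a, b = -4.0, 1.0            # placeholder intercept and slope
x = np.array([2, 4, 6, 8])  # e.g. study hours
print(sigmoid(a + b * x))   # outputs squashed into the range (0, 1)
```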
Logistic Regression

Study Hours (X)   Exam Result (Y)
2                 0
3                 0
4                 0
5                 1
6                 1
7                 1
8                 1

• Supervised classification model.
• Dependent variable (Y) is categorical or binary (0 or 1).
• Independent variable (X) is continuous.
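A minimal scikit-learn sketch that fits logistic regression to the study-hours data above (assuming scikit-learn is available):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Study hours (X) and exam result (Y) from the table above
X = np.array([[2], [3], [4], [5], [6], [7], [8]])
y = np.array([0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)

# Probability of passing after 4.5 hours of study, and the predicted class
print(model.predict_proba([[4.5]])[0, 1])
print(model.predict([[4.5]]))   # 0 = fail, 1 = pass
```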
Linear Regression vs. Logistic Regression

What is error?
Support Vector Machine (SVM)

• Supervised machine learning algorithm.
• Used for binary classification.
• "Vectors" are the data points.
Basic Concepts in SVM

• Kernels are mathematical functions.
• They take data as input and transform it into the required output.
• Different kernel functions are:
   – Linear Kernel
   – Polynomial Kernel
   – Gaussian Kernel
   – Radial Basis Function (RBF)
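A short scikit-learn sketch showing how the kernel is selected when training an SVM classifier; the toy data points are made up for illustration:

```python
import numpy as np
from sklearn.svm import SVC

# Toy binary-classification data (made up for illustration)
X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 8]])
y = np.array([0, 0, 0, 1, 1, 1])

# The kernel is chosen with the `kernel` argument; 'rbf' is the Gaussian / RBF kernel
for kernel in ["linear", "poly", "rbf"]:
    clf = SVC(kernel=kernel)
    clf.fit(X, y)
    print(kernel, clf.predict([[4, 4]]))
```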
                                  Linear Kernel
Naïve Bayes Classifier

Step-1: Make a frequency table

Outlook    Yes   No
Overcast   5     0
Rainy      2     2
Sunny      3     2
Total      10    4
Step-2: Make a likelihood table

Outlook    P(Outlook|Yes)   P(Outlook|No)
Overcast   5/10             0
Rainy      2/10             2/4
Sunny      3/10             2/4
Find the probability to play tennis on the 15th day using the Naïve Bayes
classifier when the Outlook is Sunny.

Step-3: Apply Bayes' Theorem:

P(A|B) = P(B|A) · P(A) / P(B)

• First, we find the probability of Yes when it is Sunny:

P(Yes|Sunny) = P(Sunny|Yes) · P(Yes) / P(Sunny)
P(Yes|Sunny) = (3/10 * 10/14) / (5/14) = 3/5 = 0.60
• Second, we find the probability of No when it is Sunny:

P(No|Sunny) = P(Sunny|No) · P(No) / P(Sunny)
            = (2/4 * 4/14) / (5/14) = 2/5 = 0.40
• So, P(Yes|Sunny) > P(No|Sunny), i.e. 0.60 > 0.40.

Therefore, we can say that the player can play tennis on a sunny day.
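The same calculation expressed as a small Python sketch, using the counts from the frequency table above:

```python
# Counts from the frequency table (Outlook = Sunny on 5 of the 14 days)
total_days = 14
yes, no = 10, 4                  # class counts used on this slide
p_sunny = 5 / total_days         # P(Sunny) = 5/14

p_sunny_given_yes = 3 / yes      # P(Sunny | Yes) = 3/10
p_sunny_given_no = 2 / no        # P(Sunny | No)  = 2/4

p_yes_given_sunny = p_sunny_given_yes * (yes / total_days) / p_sunny
p_no_given_sunny = p_sunny_given_no * (no / total_days) / p_sunny

print(round(p_yes_given_sunny, 2))   # 0.6
print(round(p_no_given_sunny, 2))    # 0.4
```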
Naïve Bayes on the full Play-Tennis dataset:

P(Play Tennis = yes) = 9/14 = 0.64
P(Play Tennis = no)  = 5/14 = 0.36

[Likelihood tables are built in the same way for each attribute: Outlook
(Sunny, Overcast, …), Temperature (hot, mild, …), Humidity (High, Normal)
and Wind (true: 6/14, false: 8/14).]
• Advantages:
   – Fast and easy algorithm.
   – Can be used for both binary and multi-class classification.
   – Mostly used for text classification.
• Disadvantages:
   – Assumes the features are independent, so it cannot learn relationships
     between them.
Bayesian Belief Network
          • Probabilistic Graphical Model.
          • Represents a set of variables and
            their conditional dependencies
            using a directed acyclic graph.
          • Two major components:
             • Directed Acyclic Graph (DAG)
             • Table of Conditional
               Probabilities
Bayesian Belief Network

Calculate the probability that the alarm has sounded, but neither a burglary
nor an earthquake has occurred, and both David and Sophia called Harry.

P(A) = P(A|B,E) P(B) P(E) +
       P(A|B,¬E) P(B) P(¬E) +
       P(A|¬B,E) P(¬B) P(E) +
       P(A|¬B,¬E) P(¬B) P(¬E)
What is the probability that David called?

P(D) = P(D|A) P(A) + P(D|¬A) P(¬A)

P(¬A) = P(¬A|B,E) P(B) P(E) +
        P(¬A|B,¬E) P(B) P(¬E) +
        P(¬A|¬B,E) P(¬B) P(E) +
        P(¬A|¬B,¬E) P(¬B) P(¬E)
What is the probability that David called?

• P(A) = 0.00252
• P(¬A) = 0.99748
• P(D) = P(D|A) P(A) + P(D|¬A) P(¬A)
• P(D) = 0.91 * 0.00252 + 0.05 * 0.99748 ≈ 0.052
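A sketch of this calculation in Python. The original conditional-probability tables are not reproduced above, so the CPT values below are the ones commonly used with this burglary-alarm example; they are assumptions, but they do reproduce the P(A) = 0.00252 quoted on the slide:

```python
# Assumed prior probabilities for Burglary and Earthquake
p_b, p_e = 0.001, 0.002

# Assumed P(Alarm | Burglary, Earthquake) for the four parent combinations
p_a_given = {(1, 1): 0.94, (1, 0): 0.95, (0, 1): 0.29, (0, 0): 0.001}

# Assumed P(David calls | Alarm) and P(David calls | no Alarm)
p_d_given_a, p_d_given_not_a = 0.91, 0.05

# Marginalise the alarm over burglary and earthquake:
# P(A) = sum over B, E of P(A|B,E) P(B) P(E)
p_a = sum(p_a_given[(b, e)]
          * (p_b if b else 1 - p_b)
          * (p_e if e else 1 - p_e)
          for b in (0, 1) for e in (0, 1))

p_d = p_d_given_a * p_a + p_d_given_not_a * (1 - p_a)
print(round(p_a, 5), round(p_d, 4))   # ~0.00253 and ~0.0522
```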
EM Algorithm

• E -> Expectation
• M -> Maximization
• Used to find latent variables.
• Latent variable – a variable that is not directly observed.
• Basically, it is used in many unsupervised clustering algorithms.
Steps involved in the EM Algorithm

• Step 1 – A set of initial values is considered.
   – The set of incomplete data is given to the system.
• Step 2 – Expectation Step or E-step
   – Use the observed data to estimate (guess) the values of the missing data.
• Step 3 – Maximization Step or M-step
   – Update the parameter values generated in the E-step.
• Step 4 – Check whether the values are converging or not.
   – If converging – stop.
   – Otherwise, repeat steps 2 and 3 until convergence occurs (see the sketch
     below).
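A minimal sketch of the E-step / M-step loop for a two-component, one-dimensional Gaussian mixture (the data and initial guesses are made up, and the variances are fixed at 1.0 for brevity):

```python
import numpy as np

def gauss(x, mu, sigma=1.0):
    """Gaussian density with mean mu and standard deviation sigma."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Made-up 1-D observations drawn from two clusters
x = np.array([1.0, 1.2, 0.8, 5.0, 5.3, 4.7])

mu = np.array([0.0, 6.0])      # Step 1: initial guesses for the two means
pi = np.array([0.5, 0.5])      # initial mixing weights

for _ in range(50):            # Step 4: iterate the E and M steps until convergence
    # E-step: responsibility of each component for each point
    dens = np.vstack([pi[k] * gauss(x, mu[k]) for k in range(2)])
    resp = dens / dens.sum(axis=0)

    # M-step: update the means and mixing weights from the responsibilities
    mu = (resp * x).sum(axis=1) / resp.sum(axis=1)
    pi = resp.sum(axis=1) / len(x)

print(mu)   # the means converge to roughly 1.0 and 5.0
```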
Usage of the EM Algorithm

• Used to fill in missing data.
• Used for unsupervised clustering.
• Used to discover the values of latent variables.
• Used to estimate the parameters of a Gaussian density function.
• Used to estimate the parameters of Hidden Markov Models.
Advantages & Disadvantages

Advantages:
• Easy to implement, as it has only two steps: the E-step and the M-step.
• The likelihood increases after each iteration.
• The solution of the M-step exists in closed form.

Disadvantages:
• Slow convergence.
• Converges only to a local optimum.
• Requires both forward and backward probabilities (e.g., when training HMMs).
Concept Learning

• "A task of acquiring a potential hypothesis (solution) that best fits the
  given training examples."
   1) +ve
   S1 = < Sunny, Warm, Normal,
   Strong, Warm, Same>
   G1 = <?,?,?,?,?,?>
2) +ve
S2 = < Sunny, Warm, ?, Strong, Warm, Same>
G2 = <?,?,?,?,?,?>
3) –ve
S3 = < Sunny, Warm, ?, Strong, Warm, Same>
G3 = <<Sunny,?,?,?,?,?>, <?,Warm,?,?,?,?>, <?,?,?,?,?,Same>>
4) +ve
S4 = < Sunny, Warm, ?, Strong, ?, ?>
G4 = <<Sunny,?,?,?,?,?>,<?,Warm,?,?,?,?>>
S0 = <Փ, Փ, Փ, Փ, Փ, Փ>

Find-S Algorithm
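A minimal sketch of the Find-S algorithm in Python. The attribute values for the second and third examples below are taken from the standard EnjoySport dataset (an assumption, but they are consistent with the S-boundary trace shown above):

```python
# Training examples in the style of the EnjoySport data above: (attributes, label)
examples = [
    (["Sunny", "Warm", "Normal", "Strong", "Warm", "Same"],   "+"),
    (["Sunny", "Warm", "High",   "Strong", "Warm", "Same"],   "+"),
    (["Rainy", "Cold", "High",   "Strong", "Warm", "Change"], "-"),
    (["Sunny", "Warm", "High",   "Strong", "Cool", "Change"], "+"),
]

# Start from the most specific hypothesis S0 = <Փ, Փ, Փ, Փ, Փ, Փ>
hypothesis = ["Փ"] * 6

for attrs, label in examples:
    if label != "+":
        continue                       # Find-S ignores negative examples
    for i, value in enumerate(attrs):
        if hypothesis[i] == "Փ":
            hypothesis[i] = value      # first positive example fills in the values
        elif hypothesis[i] != value:
            hypothesis[i] = "?"        # generalise any attribute that disagrees

print(hypothesis)   # ['Sunny', 'Warm', '?', 'Strong', '?', '?']
```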
• F1 -> A, B
• F2 -> X, Y
• Instance Space: (A,X), (A,Y), (B,X), (B,Y) – 4 instances
• Hypothesis Space: (A,X), (A,Y), (A,Փ), (A,?), (B,X), (B,Y), (B,Փ), (B,?),
  (Փ,X), (?,X), (Փ,Y), (?,Y), (Փ,Փ), (Փ,?), (?,Փ), (?,?) – 16 hypotheses
List-Then-Eliminate Algorithm