Support Vector Machine (SVM)
History of SVM
• Vladimir N. Vapnik, a Russian statistician, researcher, and academic, invented the original SVM algorithm in 1963.
• His non-linear model became very popular in the ML community.
• The accuracy of SVM is in the range of the multilayer perceptrons that were developed in challenges during the 1980s.
• He also published a book called Statistical Learning Theory, which is highly cited in the ML community.
Support Vector Machine Algorithm
• Support Vector Machine or SVM is one of the most popular
  Supervised Learning algorithms, which is used for
  Classification as well as Regression problems. However,
  primarily, it is used for Classification problems in Machine
  Learning.
• The goal of the SVM algorithm is to create the best line or
  decision boundary that can segregate n-dimensional space
  into classes so that we can easily put the new data point in
  the correct category in the future. This best decision
  boundary is called a hyperplane.
• SVM chooses the extreme points/vectors that help in
  creating the hyperplane. These extreme cases are called
  support vectors, and hence the algorithm is termed a
  Support Vector Machine.
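A minimal usage sketch (added; it assumes scikit-learn and NumPy and is not part of the original slides): fit a linear SVM on a few labelled points, then classify a new point by which side of the learned hyperplane it falls on.

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2-D data: two linearly separable classes
X = np.array([[2.0, 1.0], [2.0, 0.5], [1.0, 1.0],   # class +1
              [4.0, 3.0], [5.0, 4.0], [4.5, 3.5]])  # class -1
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="linear", C=1e6)   # a very large C approximates a hard margin
clf.fit(X, y)

print("weights w:", clf.coef_)                  # normal vector of the hyperplane
print("bias b:", clf.intercept_)                # offset of the hyperplane
print("support vectors:", clf.support_vectors_)
print("prediction for (3, 2):", clf.predict([[3.0, 2.0]]))
```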
Hyperplane
A hyperplane in a d-dimensional space is a (d-1)-dimensional
subspace that divides the space into two parts.
For instance, a point (0-D) divides a line (1-D) into two parts,
a line (1-D) divides a plane (2-D) into two parts,
and a plane (2-D) divides three-dimensional space into two parts.
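In equation form (standard notation, added for reference), a hyperplane is the set of points satisfying one linear equation, and it splits the space into two half-spaces:

```latex
% Standard definition (added): a hyperplane in R^d with normal vector w and offset b
\{\, x \in \mathbb{R}^{d} : w \cdot x + b = 0 \,\},
\qquad
\text{half-spaces: } w \cdot x + b > 0 \;\text{ and }\; w \cdot x + b < 0 .
```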
Linearly separable and non-linearly separable data
• Linearly separable data is data that is populated in such a way that
  it can be easily classified with a straight line or a hyperplane.
• Non-linearly separable data, on the other hand, is described as data
  that cannot be separated using a simple straight line (requires a
  complex classifier).
Number of Hyperplanes
• The problem is that there can be an infinite number of
  hyperplanes that correctly classify the linearly
  separable classes.
Linear SVM
Defining a line
Any pair of values (w1, w2) defines a line through the origin:
    w1·x1 + w2·x2 = 0
[Figure: the line defined by w = (1, 2) in the (x1, x2) plane]
We can also view it as the line perpendicular to the
weight vector w = (1, 2).
Classifying with a line
Mathematically, how can we classify points based on a line?
Check the sign of w·x. For w = (1, 2):
    (1, 1):    1·1 + 2·1  =  3 > 0   →  BLUE
    (1, -1):   1·1 + 2·(-1) = -1 < 0  →  RED
The sign indicates which side of the line the point falls on.
Defining a line
Adding an offset shifts the line away from the origin. With
w = (1, 2), the equation x1 + 2·x2 = -1 defines a line that
now intersects the x1-axis at -1:
    x1     x2
    -2     0.5
    -1     0
     0     -0.5
     1     -1
     2     -1.5
Every (x1, x2) pair in the table satisfies x1 + 2·x2 = -1.
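A quick numerical check (added sketch, assuming NumPy) that the tabulated points all lie on the shifted line w·x = -1:

```python
import numpy as np

w = np.array([1.0, 2.0])
points = np.array([[-2, 0.5], [-1, 0], [0, -0.5], [1, -1], [2, -1.5]])

# Every point in the table gives the same value of w . x, namely -1
print(points @ w)   # -> [-1. -1. -1. -1. -1.]
```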
Linear models
A linear model in n-dimensional space (i.e. n
features) is defined by n+1 weights:
In two dimensions, a line:
    0 = b + w1·f1 + w2·f2        (where b = -a; for the example line above, a = -1 so b = 1)
In three dimensions, a plane:
    0 = b + w1·f1 + w2·f2 + w3·f3
In n dimensions, a hyperplane:
    0 = b + Σi wi·fi
Classifying with a linear model
We can classify with a linear model by checking
the sign:
    x1, x2, …, xm  →  classifier  →  positive example if  b + Σi wi·xi > 0
                                     negative example if  b + Σi wi·xi < 0
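A small sketch of this rule (added, assuming NumPy): a linear classifier is just a weight vector, a bias, and a sign check.

```python
import numpy as np

def classify(w, b, x):
    """Return +1 if x is on the positive side of the hyperplane, else -1."""
    return 1 if np.dot(w, x) + b > 0 else -1

w = np.array([1.0, 2.0])
b = 0.0
print(classify(w, b, np.array([1.0,  1.0])))   # +1 (the BLUE side)
print(classify(w, b, np.array([1.0, -1.0])))   # -1 (the RED side)
```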
Learning a linear model
Geometrically, we know what a linear model represents.
Given a linear model (i.e. a set of weights and b), we can
classify examples.
The question: given training data (data with labels),
how do we learn a linear model?
Which hyperplane would you choose?
Large margin classifiers
[Figure: two classes separated by a hyperplane, with the margin marked on each side]
Choose the line where the distance to the
nearest point(s) is as large as possible.
The margin of a classifier is the distance to the closest points of either class.
Large margin classifiers attempt to maximize this margin.
Large margin classifier setup
Select the hyperplane with the largest margin where
the points are classified correctly!
Set it up as a constrained optimization problem:
    maximize   margin(w, b)
    subject to:   yi·(w·xi + b) > 0   for all i
What does this constraint say? Every example must lie on the correct side of the hyperplane.
      yi: label for example i, either 1 (positive) or -1 (negative)
      xi: our feature vector for example i
Large margin classifier setup
For a finite training set the constraint can be written equivalently as
    subject to:   yi·(w·xi + b) > 0
or, after rescaling, as
    subject to:   yi·(w·xi + b) ≥ c,   for some c > 0
Large margin classifier setup
    maximize   margin(w, b)
    subject to:   yi·(w·xi + b) ≥ 1
We'll assume c = 1; however, any c > 0 works.
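Putting these pieces together gives the standard hard-margin formulation (added for reference; it uses the margin = 1/||w|| result derived later in these slides):

```latex
% Standard hard-margin SVM (added for reference)
\max_{w,b}\; \frac{1}{\lVert w \rVert}
\;\;\text{s.t.}\;\; y_i (w \cdot x_i + b) \ge 1 \;\;\forall i
\qquad\Longleftrightarrow\qquad
\min_{w,b}\; \tfrac{1}{2}\lVert w \rVert^{2}
\;\;\text{s.t.}\;\; y_i (w \cdot x_i + b) \ge 1 \;\;\forall i
```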
Measuring the margin
How do we calculate the
margin?
                    Support vectors
For any separating hyperplane, there exists a set of “closest points”.
These are called the support vectors
       Measuring the margin
The margin is the distance to the support
vectors, i.e. the “closest points”, on either side
of the hyperplane
Distance from the hyperplane
How far away is the point (-1, -2) from the hyperplane
defined by w = (1, 2) (a line through the origin)?
[Figure: the point (-1, -2) and the weight vector w = (1, 2) in the (f1, f2) plane]
Distance from the hyperplane
How far away is the point (1, 1) from the hyperplane
defined by w = (1, 2)?
Use the length-normalized weight vector: the signed distance is
    d(x) = (w·x) / ||w|| = (1·1 + 2·1) / √(1² + 2²) = 3/√5 ≈ 1.34
(For the earlier point (-1, -2):  d = -5/√5 = -√5; the sign tells us
which side of the hyperplane the point is on.)
Distance from the hyperplane
Why length normalized? The weight vectors w = (1, 2), w = (2, 4),
and w = (0.5, 1) all define the same line, but the raw projections
of the point (1, 1) differ:
    w = (1, 2):     w·x = 3
    w = (2, 4):     w·x = 6
    w = (0.5, 1):   w·x = 1.5
Dividing by ||w|| removes this dependence on the scale of w;
all three give the same distance:
    3/√5  =  6/√20  =  1.5/√1.25  ≈ 1.34
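A quick numerical check of this scale invariance (added sketch, assuming NumPy):

```python
import numpy as np

x = np.array([1.0, 1.0])

# Three weight vectors that define the same hyperplane (scaled copies of each other)
for w in (np.array([1.0, 2.0]), np.array([2.0, 4.0]), np.array([0.5, 1.0])):
    raw = np.dot(w, x)                # depends on the scale of w
    dist = raw / np.linalg.norm(w)    # length-normalized distance
    print(f"w={w}: w.x={raw:.2f}, distance={dist:.3f}")
# All three distances come out to 3/sqrt(5) ~= 1.342
```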
Measuring the margin
Thought experiment: given the closest points of each class,
where is the max margin hyperplane?
Measuring the margin
    Margin = (d+ - d-)/2
The max margin hyperplane is halfway in between the
positive support vectors and the negative support vectors:
  - All support vectors are the same distance from the hyperplane
  - To maximize the margin, the hyperplane should be directly in between them
Measuring the margin
    Margin = (d+ - d-)/2
What is w·x + b for the support vectors?
Hint:
    subject to:   yi·(w·xi + b) ≥ 1
Measuring the margin
Recall the constraint:
    subject to:   yi·(w·xi + b) ≥ 1
The support vectors have yi·(w·xi + b) = 1, i.e. w·x + b = 1 for
positive support vectors and w·x + b = -1 for negative ones.
Measuring the margin
    Margin = (d+ - d-)/2
[Figure: d+ measured to a positive support vector, d- to a negative example on the other side of the hyperplane]
Deriving the margin equation
Derivation
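A standard version of this derivation (added for reference; it follows from the support-vector condition w·x + b = ±1 above):

```latex
% Standard margin derivation (added): for a positive support vector x_+
% and a negative support vector x_-,
d^{+} = \frac{w \cdot x_{+} + b}{\lVert w \rVert} = \frac{1}{\lVert w \rVert},
\qquad
d^{-} = \frac{w \cdot x_{-} + b}{\lVert w \rVert} = \frac{-1}{\lVert w \rVert},
\qquad\Longrightarrow\qquad
\text{margin} = \frac{d^{+} - d^{-}}{2} = \frac{1}{\lVert w \rVert}.
```

Maximizing the margin 1/||w|| is therefore equivalent to minimizing ½||w||² subject to the constraints yi·(w·xi + b) ≥ 1.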
Lagrange formulation and associated dual
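A standard statement of the Lagrangian for the problem above and the resulting dual (added for reference; this is the textbook formulation):

```latex
% Lagrangian of the hard-margin primal (standard; added)
L(w, b, \alpha) = \tfrac{1}{2}\lVert w \rVert^{2}
  - \sum_{i} \alpha_i \bigl[ y_i (w \cdot x_i + b) - 1 \bigr],
\qquad \alpha_i \ge 0.

% Setting the gradients with respect to w and b to zero gives
w = \sum_{i} \alpha_i y_i x_i, \qquad \sum_{i} \alpha_i y_i = 0,

% which leads to the dual problem
\max_{\alpha}\; \sum_{i} \alpha_i
  - \tfrac{1}{2} \sum_{i}\sum_{j} \alpha_i \alpha_j\, y_i y_j\, (x_i \cdot x_j)
\quad\text{s.t.}\quad \alpha_i \ge 0,\;\; \sum_{i} \alpha_i y_i = 0.
```

Only the examples with αi > 0 end up contributing to w; these are exactly the support vectors.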
Primal and dual visualization
     Example (Linear SVM)
Data point    Coordinate    yi
x1            (2, 1)        +1
x2            (4, 3)        -1
x3            (2, 0.5)      +1
Example (Linear SVM)
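The worked solution can be sketched as follows (added; the numbers come from solving the dual by hand, and the code, which assumes scikit-learn, should recover approximately the same values). With x1 = (2, 1) and x2 = (4, 3) as the active constraints, the maximum-margin solution is w = (-0.5, -0.5), b = 2.5, giving margin 1/||w|| = √2 ≈ 1.41; x3 = (2, 0.5) satisfies w·x3 + b = 1.25 > 1, so it is not a support vector.

```python
# Added sketch (assuming scikit-learn): fit a hard-margin linear SVM to the
# three example points above and inspect the solution.
import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 1.0],    # x1, label +1
              [4.0, 3.0],    # x2, label -1
              [2.0, 0.5]])   # x3, label +1
y = np.array([1, -1, 1])

clf = SVC(kernel="linear", C=1e6)   # a very large C approximates a hard margin
clf.fit(X, y)

w = clf.coef_[0]
b = clf.intercept_[0]
print("w =", w)                                       # approximately (-0.5, -0.5)
print("b =", b)                                       # approximately 2.5
print("support vectors:", clf.support_vectors_)       # x1 and x2
print("margin = 1/||w|| =", 1.0 / np.linalg.norm(w))  # approximately 1.41
```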