
Support Vector Machine: An Overview

Shihabudheen KV
Dept. of Electrical Engineering, NIT Calicut
Support Vector Machines
• Support Vector Machines are arguably the most important and interesting recent discovery in Machine Learning.
• Support Vector Machines were introduced by Vladimir Naumovich Vapnik.


Which Separating Hyperplane to Use?
[Figure: several candidate hyperplanes separating two classes of points in the (Var1, Var2) plane.]
Maximizing the Margin
Select the separating hyperplane that maximizes the margin!
[Figure: two candidate hyperplanes in the (Var1, Var2) plane, each shown with its margin width; the one with the larger margin is preferred.]
Support Vectors
[Figure: the maximum-margin hyperplane in the (Var1, Var2) plane; the training points lying on the margin boundaries are the support vectors.]
Example
Training points: x1 = [0 0]^T and x2 = [1 0]^T with label +1; x3 = [2 0]^T and x4 = [0 2]^T with label -1.
[Figure: the points (0,0), (1,0), (2,0), (0,2) plotted in the plane, together with the candidate separating lines H1 and H2.]

Equation forms of a straight line in the plane

Intercept form:  x1/p + x2/q = 1,   e.g.  x1/1.5 + x2/1.5 = 1
General form:    a x1 + b x2 + c = 0,   e.g.  -2x1 - 2x2 + 3 = 0,  i.e.  [-2 -2][x1 x2]^T + 3 = 0,  or  w^T x + b = 0
The other two lines of the figure in intercept form:  x1/1 + x2/1 = 1  and  x1/2 + x2/2 = 1
Equations of the three hyperplanes:
  H1: -2x1 - 2x2 + 2 = 0
  H3: -2x1 - 2x2 + 3 = 0   (the separating hyperplane)
  H2: -2x1 - 2x2 + 4 = 0

Distance from the line a x1 + b x2 + c = 0 to a point P(x0, y0):
  |a x0 + b y0 + c| / √(a² + b²)
If P is the origin, the distance is |c| / √(a² + b²).

Distance from the origin to H1 = 2 / √(2² + 2²) = 2/√8
Distance from the origin to H2 = 4/√8
Distance from the origin to H3 = 3/√8
Distance between H1 and H2 = (4 - 2)/√8 = 2/√8 = 2/‖w‖

Rewriting the same three lines relative to the middle one (w = [-2 -2]^T, b = 3):
  -2x1 - 2x2 + 2 = 0  ⇔  -2x1 - 2x2 + 3 = +1  ⇔  2x1 + 2x2 - 3 = -1
  -2x1 - 2x2 + 3 = 0  ⇔  -2x1 - 2x2 + 3 =  0  ⇔  2x1 + 2x2 - 3 =  0
  -2x1 - 2x2 + 4 = 0  ⇔  -2x1 - 2x2 + 3 = -1  ⇔  2x1 + 2x2 - 3 = +1

[Figure: the points (0,0), (1,0), (2,0), (0,2) with the lines H1 and H2; the +1 points lie in the region 2x1 + 2x2 - 3 ≤ -1 and the -1 points in the region 2x1 + 2x2 - 3 ≥ +1, so every training point satisfies y_i(w^T x_i + b) ≥ 1.]
Maximizing the margin width 2/‖w‖ subject to y_i(w^T x_i + b) ≥ 1 for all i is equivalent to minimizing (1/2) w^T w under the same constraints. This is a constrained optimization problem; solving it requires some new tools.
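As a quick numerical check of the hand-worked example, the following sketch (assuming NumPy and scikit-learn are available) fits a linear SVM with a very large C, which approximates the hard-margin problem stated above, and recovers the canonical hyperplane -2x1 - 2x2 + 3 = 0.

# Minimal sketch (scikit-learn/NumPy assumed): hard margin approximated by a large C.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
y = np.array([+1, +1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)

print("w =", clf.coef_[0])                          # expected roughly [-2, -2]
print("b =", clf.intercept_[0])                     # expected roughly 3
print("support vectors:\n", clf.support_vectors_)   # (1,0), (2,0) and (0,2)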
Lagrange function

Construct the Lagrange function (or Lagrangian):

  J(w, b, α) = (1/2) w^T w − Σ_{i=1}^{N} α_i [ y_i (w^T x_i + b) − 1 ]

where the α_i are Lagrange multipliers.

According to the Karush-Kuhn-Tucker (KKT) conditions, for non-zero multipliers the corresponding constraints are met exactly. Formally,

  α_i [ y_i (w^T x_i + b) − 1 ] = 0,   i = 1, …, N.

The patterns x_i for which α_i > 0 are called support vectors.
Setting ∂J(w, b, α)/∂w = 0:
  ∂/∂w { (1/2) w^T w − Σ_{i=1}^{N} α_i [ y_i (w^T x_i + b) − 1 ] } = 0
  ⇒  w − Σ_{i=1}^{N} α_i y_i x_i = 0
  ⇒  w = Σ_{i=1}^{N} α_i y_i x_i

Setting ∂J(w, b, α)/∂b = 0:
  ∂/∂b { (1/2) w^T w − Σ_{i=1}^{N} α_i [ y_i (w^T x_i + b) − 1 ] } = 0
  ⇒  Σ_{i=1}^{N} α_i y_i = 0
Two conditions of optimality:
  Condition 1:  ∂J(w, b, α)/∂w = 0  ⇒  w = Σ_{i=1}^{N} α_i y_i x_i
  Condition 2:  ∂J(w, b, α)/∂b = 0  ⇒  Σ_{i=1}^{N} α_i y_i = 0
Dual problem
Given a constrained optimization problem with a convex cost function and linear constraints, a dual problem can be formulated in which the Lagrange multipliers provide the solution.

Duality Theorem (Bertsekas, 1995)
(a) If the primal problem has an optimal solution, then the dual problem has an optimal solution with the same optimal value.
(b) For w_o to be an optimal primal solution and α_o to be an optimal dual solution, it is necessary and sufficient that w_o is feasible for the primal problem and that
  Φ(w_o) = J(w_o, b_o, α_o) = min_w J(w, b_o, α_o),
where Φ denotes the primal cost function.
Since
  w = Σ_{i=1}^{N} α_i y_i x_i
we have
  w^T w = Σ_{i=1}^{N} α_i y_i w^T x_i

We can then expand J(w, b, α):
  J(w, b, α) = (1/2) w^T w − Σ_{i=1}^{N} α_i [ y_i (w^T x_i + b) − 1 ]
             = (1/2) w^T w − Σ_{i=1}^{N} α_i y_i w^T x_i − b Σ_{i=1}^{N} α_i y_i + Σ_{i=1}^{N} α_i
             = (1/2) w^T w − w^T w − b Σ_{i=1}^{N} α_i y_i + Σ_{i=1}^{N} α_i
             = −(1/2) w^T w − b Σ_{i=1}^{N} α_i y_i + Σ_{i=1}^{N} α_i

Since Σ_{i=1}^{N} α_i y_i = 0, the middle term vanishes. Hence
  J(w, b, α) = −(1/2) w^T w + Σ_{i=1}^{N} α_i
Furthermore, using w = Σ_{i=1}^{N} α_i y_i x_i,
  w^T w = Σ_{i=1}^{N} α_i y_i w^T x_i = Σ_{i=1}^{N} Σ_{j=1}^{N} α_i α_j y_i y_j x_i^T x_j

Accordingly, the objective function can be reformulated as
  J(w, b, α) = Σ_{i=1}^{N} α_i − (1/2) Σ_{i=1}^{N} Σ_{j=1}^{N} α_i α_j y_i y_j x_i^T x_j

Setting Q(α) = J(w, b, α), we obtain
  Q(α) = Σ_{i=1}^{N} α_i − (1/2) Σ_{i=1}^{N} Σ_{j=1}^{N} α_i α_j y_i y_j x_i^T x_j,   where α_i ≥ 0
Dual Problem

The dual problem: given the training sample {(x_i, y_i)}, find the Lagrange multipliers {α_i} that maximize the objective function
  Q(α) = Σ_{i=1}^{N} α_i − (1/2) Σ_{i=1}^{N} Σ_{j=1}^{N} α_i α_j y_i y_j x_i^T x_j
subject to the constraints
  Σ_{i=1}^{N} α_i y_i = 0
  α_i ≥ 0   for i = 1, 2, …, N

The dual problem is cast entirely in terms of the training data: the objective Q(α) to be maximized depends on the input patterns only through the dot products x_i^T x_j.
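To make the dependence on dot products concrete, here is a small sketch (NumPy assumed; the function name is illustrative) that evaluates Q(α) for the linear kernel directly from the Gram matrix.

# Sketch (NumPy assumed): evaluate the dual objective
#   Q(alpha) = sum_i alpha_i - 1/2 sum_{i,j} alpha_i alpha_j y_i y_j x_i^T x_j
import numpy as np

def dual_objective(alpha, X, y):
    """Q(alpha) for the linear kernel; X is (N, d), y in {-1, +1}, alpha is (N,)."""
    K = X @ X.T                          # Gram matrix of dot products x_i^T x_j
    return alpha.sum() - 0.5 * alpha @ (np.outer(y, y) * K) @ alpha

# The four training points of the running example, with a feasible guess for alpha
X = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
y = np.array([+1.0, +1.0, -1.0, -1.0])
alpha = np.array([0.0, 1.0, 0.5, 0.5])   # satisfies sum(alpha * y) = 0 and alpha >= 0
print(dual_objective(alpha, X, y))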
Classifying new data points
• Once the parameters (α*, b*) are found by solving the required quadratic optimisation on the training set of points, the SVM is ready to be used for classifying new points.
• Given a new point x, its class membership is sign[f(x, α*, b*)], where
    f(x, α*, b*) = w*^T x + b* = Σ_{i=1}^{N} α_i* y_i x_i^T x + b* = Σ_{i∈SV} α_i* y_i x_i^T x + b*
• Data enter only in the form of dot products!
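The decision function depends only on the support vectors. A small sketch (NumPy assumed; names are illustrative), using multipliers for the running example chosen so that w = Σ α_i y_i x_i = [-2 -2]^T and Σ α_i y_i = 0:

# Sketch (NumPy assumed): classify a new point using only the support vectors,
#   f(x) = sum_{i in SV} alpha_i* y_i (x_i . x) + b*.
import numpy as np

def svm_decision(x_new, alpha, b, X_sv, y_sv):
    return np.sum(alpha * y_sv * (X_sv @ x_new)) + b

# Support vectors of the running example; alpha chosen so that
# w = sum_i alpha_i y_i x_i = [-2, -2] and sum_i alpha_i y_i = 0, with b* = 3.
X_sv  = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
y_sv  = np.array([+1.0, -1.0, -1.0])
alpha = np.array([4.0, 3.0, 1.0])
b     = 3.0

for x in ([0.0, 0.0], [1.5, 1.5]):
    f = svm_decision(np.array(x), alpha, b, X_sv, y_sv)
    print(x, "->", int(np.sign(f)))   # +1 on the origin side of the hyperplane, -1 beyond it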
In summary, the dual problem is:
  maximise   Q(α) = −(1/2) Σ_{i,j=1}^{N} α_i α_j y_i y_j x_i^T x_j + Σ_{i=1}^{N} α_i
  subject to Σ_{i=1}^{N} α_i y_i = 0,   α_i ≥ 0,  i = 1, …, N

Applying this to the four-point example (x1, x2 with y = +1; x3, x4 with y = −1) and the linear kernel K(x_i, x_j) = x_i^T x_j:

  Σ_i α_i = α1 + α2 + α3 + α4

  The Gram matrix is
    K = [ 0  0  0  0
          0  1  2  0
          0  2  4  0
          0  0  0  4 ]
  e.g.  K_23 = x2^T x3 = [1 0][2 0]^T = 2.

  −(1/2) Σ_{i,j=1}^{N} α_i α_j y_i y_j x_i^T x_j = −(1/2) [α1  α2  −α3  −α4] K [α1  α2  −α3  −α4]^T

Hence
  Q(α) = (α1 + α2 + α3 + α4) − (1/2)(α2² − 4 α2 α3 + 4 α3² + 4 α4²)
subject to
  α1 + α2 − α3 − α4 = 0,   α_i ≥ 0,  i = 1, …, 4.
Non-Linearly Separable Data

Introduce slack variables ξ_i: allow some instances to fall within the margin, but penalize them.

  w·x_i + b ≥ +1 − ξ_i   for y_i = +1
  w·x_i + b ≤ −1 + ξ_i   for y_i = −1
  ξ_i ≥ 0

or, compactly,
  y_i (w·x_i + b) ≥ 1 − ξ_i

[Figure: the hyperplane w·x + b = 0 and the margin boundaries w·x + b = ±1 in the (Var1, Var2) plane; the margin half-width is 1/‖w‖, and the slack ξ_i measures how far a violating point falls inside the margin.]
Formulating the Optimization Problem

The constraint becomes:
  y_i (w·x_i + b) ≥ 1 − ξ_i,  ∀x_i,    ξ_i ≥ 0

The objective function penalizes misclassified instances and those within the margin by introducing an extra cost term:
  min  (1/2)‖w‖² + C Σ_i ξ_i

C trades off margin width against misclassifications.

[Figure: the same geometry as before, with slack variables ξ_i attached to the points that violate the margin.]
Linear, Soft-Margin SVMs

  min  (1/2)‖w‖² + C Σ_i ξ_i
  s.t.  y_i (w·x_i + b) ≥ 1 − ξ_i,  ∀x_i,    ξ_i ≥ 0

• As C → ∞, we get closer to the hard-margin solution.
• The soft-margin problem always has a solution.
• It yields smoother surfaces (in the non-linear case).
• The ξ_i do not appear in the dual problem.


Robustness of Soft vs Hard Margin SVMs

[Figure: the decision boundary w·x + b = 0 for a soft-margin SVM (left) and a hard-margin SVM (right) on the same data; the soft-margin SVM absorbs an outlier with a small slack ξ_i and keeps a wide margin.]

A small numerical sketch of this trade-off is given below.
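The sketch below (assuming scikit-learn and NumPy; the data, including the added outlier, are made up for illustration) shows how C controls the trade-off: a small C keeps a wide margin and tolerates the outlier with a large slack, while a large C approaches the hard-margin behaviour and shrinks the margin to classify the outlier correctly.

# Sketch (scikit-learn/NumPy assumed): the earlier four points plus a -1 "outlier"
# placed close to the +1 class.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [0.0, 2.0], [1.2, 0.2]])
y = np.array([+1, +1, -1, -1, -1])

for C in (0.1, 1.0, 1000.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w, b = clf.coef_[0], clf.intercept_[0]
    slack = np.maximum(0.0, 1.0 - y * (X @ w + b))   # xi_i = max(0, 1 - y_i f(x_i))
    print(f"C={C:7.1f}  margin width={2.0 / np.linalg.norm(w):.3f}  total slack={slack.sum():.3f}")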
The Soft-Margin Classifier (cont.)
The Lagrangian for the soft-margin problem is

  L_p(w, b, ξ, α, r) = (1/2)⟨w, w⟩ + C Σ_{i=1}^{l} ξ_i − Σ_{i=1}^{l} α_i [ y_i (⟨x_i, w⟩ + b) − 1 + ξ_i ] − Σ_{i=1}^{l} r_i ξ_i        (1)

with α_i ≥ 0 and r_i ≥ 0. Differentiating with respect to w, ξ_i and b:

  ∂L_p/∂w   = w − Σ_{i=1}^{l} α_i y_i x_i = 0        (2)
  ∂L_p/∂ξ_i = C − α_i − r_i = 0                      (3)
  ∂L_p/∂b   = Σ_{i=1}^{l} α_i y_i = 0                (4)

Substituting (2), (3) and (4) into the primal Lagrangian (1), we can form the dual formulation of the soft-margin problem.
Soft Margin: Dual Lagrangian

The dual problem:
  maximise  L_D = Σ_i α_i − (1/2) Σ_{i,j} α_i α_j y_i y_j ⟨x_i, x_j⟩
  subject to  0 ≤ α_i ≤ C   and   Σ_i α_i y_i = 0

The non-separable case differs from the separable case only in that the constraint α_i ≥ 0 is replaced by the more stringent box constraint 0 ≤ α_i ≤ C (a small numerical sketch follows below).
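A minimal sketch of this change (assuming NumPy and SciPy; SLSQP is used purely for illustration): the dual QP of the running example is unchanged except that each α_i is now confined to the box [0, C].

# Sketch (NumPy/SciPy assumed): soft-margin dual of the four-point example.
import numpy as np
from scipy.optimize import minimize

X = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
y = np.array([+1.0, +1.0, -1.0, -1.0])
H = np.outer(y, y) * (X @ X.T)                            # H_ij = y_i y_j x_i^T x_j

def solve_soft_dual(C):
    res = minimize(lambda a: 0.5 * a @ H @ a - a.sum(),   # minimize -Q(alpha)
                   x0=np.zeros(4),
                   bounds=[(0.0, C)] * 4,                 # 0 <= alpha_i <= C
                   constraints={"type": "eq", "fun": lambda a: a @ y})
    return np.round(res.x, 3)

print(solve_soft_dual(1e6))   # effectively hard margin: roughly [0, 4, 3, 1]
print(solve_soft_dual(1.0))   # some multipliers clipped at C=1: roughly [0.5, 1, 1, 0.5]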
Mapping Data to a High-Dimensional Space
• Find a function φ(x) that maps the data to a different space; the SVM formulation then becomes:
    min  (1/2)‖w‖² + C Σ_i ξ_i
    s.t.  y_i (w·φ(x_i) + b) ≥ 1 − ξ_i,  ∀x_i,    ξ_i ≥ 0
• Data appear as φ(x); the weights w are now weights in the new space.
• Explicit mapping is expensive if φ(x) is very high dimensional.
• Solving the problem without explicitly mapping the data is desirable.
The Dual of the SVM Formulation

Original SVM formulation:
    min_{w,b}  (1/2)‖w‖² + C Σ_i ξ_i
    s.t.  y_i (w·φ(x_i) + b) ≥ 1 − ξ_i,  ∀x_i,    ξ_i ≥ 0
  – n inequality constraints
  – n positivity constraints
  – n slack variables ξ_i

The dual of this problem:
    max_α  Σ_i α_i − (1/2) Σ_{i,j} α_i α_j y_i y_j ( φ(x_i)·φ(x_j) )
    s.t.  C ≥ α_i ≥ 0,  ∀x_i,    Σ_i α_i y_i = 0
  – n box constraints
  – one equality constraint
  – n variables α_i (Lagrange multipliers)
  – a more complicated objective function

Data only appear as the inner products φ(x_i)·φ(x_j).
The Kernel Trick
• φ(x_i)·φ(x_j) means: map the data into the new space, then take the inner product of the new vectors.
• We can find a function such that K(x_i, x_j) = φ(x_i)·φ(x_j), i.e. the kernel evaluated on the data equals the inner product of the images of the data.
• Then we do not need to explicitly map the data into the high-dimensional space to solve the optimization problem (for training).

Example 1: x, y are points in a 2-D input space, mapped into a 3-D feature space:
  x = [x1 x2]^T,  y = [y1 y2]^T,  φ(x) = [x1², √2 x1 x2, x2²]^T
  ⇒  K(x, y) = φ(x)·φ(y) = (x·y)²
 
◼ The linear classifier relies on the dot product between vectors: K(x_i, x_j) = x_i^T x_j.
◼ If every data point is mapped into a high-dimensional space via some transformation Φ: x → φ(x), the dot product becomes K(x_i, x_j) = φ(x_i)^T φ(x_j).
◼ A kernel function is some function that corresponds to an inner product in some expanded feature space.
◼ Example 2: 2-dimensional vectors x = [x1 x2]; let K(x_i, x_j) = (1 + x_i^T x_j)².
  We need to show that K(x_i, x_j) = φ(x_i)^T φ(x_j) for some mapping φ:
    K(x_i, x_j) = (1 + x_i^T x_j)²
                = 1 + x_i1² x_j1² + 2 x_i1 x_j1 x_i2 x_j2 + x_i2² x_j2² + 2 x_i1 x_j1 + 2 x_i2 x_j2
                = [1  x_i1²  √2 x_i1 x_i2  x_i2²  √2 x_i1  √2 x_i2]^T [1  x_j1²  √2 x_j1 x_j2  x_j2²  √2 x_j1  √2 x_j2]
                = φ(x_i)^T φ(x_j),   where φ(x) = [1  x1²  √2 x1 x2  x2²  √2 x1  √2 x2]^T
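A quick numerical check of Example 2 (NumPy assumed; function names are illustrative):

# Sketch (NumPy assumed): verify that (1 + x.y)^2 equals phi(x).phi(y) for
# phi(x) = [1, x1^2, sqrt(2) x1 x2, x2^2, sqrt(2) x1, sqrt(2) x2].
import numpy as np

def phi(x):
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, x1**2, s * x1 * x2, x2**2, s * x1, s * x2])

rng = np.random.default_rng(0)
xi, xj = rng.normal(size=2), rng.normal(size=2)

lhs = (1.0 + xi @ xj) ** 2        # kernel evaluated in the 2-D input space
rhs = phi(xi) @ phi(xj)           # inner product in the 6-D feature space
print(np.isclose(lhs, rhs))       # True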
Examples of Kernel Functions
◼ Linear:  K(x_i, x_j) = x_i^T x_j
◼ Polynomial of power p:  K(x_i, x_j) = (1 + x_i^T x_j)^p
◼ Gaussian (radial-basis function network):  K(x_i, x_j) = exp( −‖x_i − x_j‖² / (2σ²) )
◼ Sigmoid:  K(x_i, x_j) = tanh(β0 x_i^T x_j + β1)
(Minimal implementations of these kernels are sketched below.)
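Minimal implementations of the four kernels (NumPy assumed; parameter names are illustrative):

# Sketch (NumPy assumed): the kernels listed above, written for single vectors.
import numpy as np

def linear_kernel(xi, xj):
    return xi @ xj

def poly_kernel(xi, xj, p=2):
    return (1.0 + xi @ xj) ** p

def rbf_kernel(xi, xj, sigma=1.0):
    return np.exp(-np.sum((xi - xj) ** 2) / (2.0 * sigma ** 2))

def sigmoid_kernel(xi, xj, beta0=1.0, beta1=0.0):
    return np.tanh(beta0 * (xi @ xj) + beta1)

xi, xj = np.array([1.0, 2.0]), np.array([0.5, -1.0])
for k in (linear_kernel, poly_kernel, rbf_kernel, sigmoid_kernel):
    print(k.__name__, k(xi, xj))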


SVM-Architecture
[Figure: the SVM network architecture: the input vector x feeds one kernel unit K(x, x_i) per support vector; the kernel outputs are combined with weights α_i y_i and the bias b, and the sign of the sum gives the class.]
XOR Problem
Kernel: K(x_i, x_j) = (1 + x_i^T x_j)²

Input vector X = (x1, x2)     Class label Y ∈ {+1, −1}
  ( 1,  1)   →  −1
  ( 1, −1)   →  +1
  (−1,  1)   →  +1
  (−1, −1)   →  −1

K(x_i, x_j) = (1 + [x_i1  x_i2] [x_j1  x_j2]^T)² = (1 + x_i1 x_j1 + x_i2 x_j2)²
            = 1 + x_i1² x_j1² + 2 x_i1 x_j1 x_i2 x_j2 + x_i2² x_j2² + 2 x_i1 x_j1 + 2 x_i2 x_j2

The Gram matrix is
  K = [ 9 1 1 1
        1 9 1 1
        1 1 9 1
        1 1 1 9 ]
e.g. (MATLAB check)  K11 = (1+[-1 -1]*[-1 -1]')^2 = 9,   K12 = (1+[-1 -1]*[-1 1]')^2 = 1

Notice that in the dual problem the images of the input vectors are involved only through inner products, so the optimisation can be performed in the (lower-dimensional) input space, with the inner product replaced by the inner-product kernel:

  Q(α) = Σ_i α_i − (1/2) Σ_i Σ_j α_i α_j d_i d_j φ(x_i)^T φ(x_j)
       = Σ_i α_i − (1/2) Σ_i Σ_j α_i α_j d_i d_j K(x_i, x_j)
       = α1 + α2 + α3 + α4 − (1/2)( 9α1² − 2α1α2 − 2α1α3 + 2α1α4 + 9α2² + 2α2α3 − 2α2α4 + 9α3² − 2α3α4 + 9α4² )

Setting ∂Q/∂α_i = 0 gives the linear system
  1 = 9α1 − α2 − α3 + α4
  1 = −α1 + 9α2 + α3 − α4
  1 = −α1 + α2 + 9α3 − α4
  1 = α1 − α2 − α3 + 9α4
whose solution is α1 = α2 = α3 = α4 = 1/8 (which also satisfies the constraint Σ_i d_i α_i = 0); a numerical check follows below.
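The Gram matrix and the solution α_i = 1/8 can be checked numerically (NumPy assumed):

# Sketch (NumPy assumed): build the XOR Gram matrix with K(xi, xj) = (1 + xi.xj)^2
# and solve the stationarity equations grad Q(alpha) = 0.
import numpy as np

X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)
d = np.array([-1.0, +1.0, +1.0, -1.0])           # class labels

K = (1.0 + X @ X.T) ** 2                         # 9 on the diagonal, 1 elsewhere
H = np.outer(d, d) * K                           # H_ij = d_i d_j K(x_i, x_j)

alpha = np.linalg.solve(H, np.ones(4))           # the four linear equations above
print(alpha)                                     # [0.125 0.125 0.125 0.125]
print(alpha @ d)                                 # the constraint sum_i d_i alpha_i = 0 holds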
MultiClass SVMs
With C classes in total:
• One-versus-all (OVA): train C binary SVMs, one per class (that class versus all the rest).
• One-versus-one, also called all-vs-all (AVA): train C(C−1)/2 binary SVMs, one per pair of classes.
(Both strategies are sketched below.)
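A brief sketch with scikit-learn (assumed available; the tiny three-class dataset below is made up for illustration):

# Sketch (scikit-learn/NumPy assumed): with 3 classes, OVA trains 3 binary SVMs
# and OvO (AVA) trains 3*2/2 = 3 binary SVMs.
import numpy as np
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

X = np.array([[0, 0], [0, 1], [3, 0], [3, 1], [0, 4], [1, 4]], dtype=float)
y = np.array([0, 0, 1, 1, 2, 2])                            # three class labels

ovo = OneVsOneClassifier(SVC(kernel="linear")).fit(X, y)
ova = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y)
print(len(ovo.estimators_), len(ova.estimators_))           # number of binary SVMs trained
print(ovo.predict([[2.5, 0.5]]), ova.predict([[2.5, 0.5]]))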


Support Vector Regression
Linear ε-Insensitive Loss Regression

  min  (1/2)‖w‖² + C Σ_{i=1}^{l} (ξ_i + ξ_i*)
  subject to
    y_i − ⟨w, x_i⟩ − b ≤ ε + ξ_i
    ⟨w, x_i⟩ + b − y_i ≤ ε + ξ_i*
    ξ_i, ξ_i* ≥ 0

ε decides the width of the insensitive zone; C sets the trade-off between the training error and ‖w‖.
➢ ε and C must be tuned simultaneously.
Is regression more difficult than classification?
Dual Formulation
• The Lagrangian function will help us formulate the dual problem:

  L = (1/2)‖w‖² + C Σ_{i=1}^{l} (ξ_i + ξ_i*)
      − Σ_{i=1}^{l} α_i ( ε + ξ_i − y_i + ⟨w, x_i⟩ + b )
      − Σ_{i=1}^{l} α_i* ( ε + ξ_i* + y_i − ⟨w, x_i⟩ − b )
      − Σ_{i=1}^{l} ( β_i ξ_i + β_i* ξ_i* )

• ε: insensitive-zone width;  α_i, α_i*, β_i, β_i*: Lagrange multipliers;
  ξ_i: difference value for points above the ε-band;  ξ_i*: difference value for points below the ε-band.

• Optimality conditions:
  ∂L/∂w    = w − Σ_{i=1}^{l} (α_i − α_i*) x_i = 0
  ∂L/∂b    = Σ_{i=1}^{l} (α_i* − α_i) = 0
  ∂L/∂ξ_i  = C − α_i − β_i = 0
  ∂L/∂ξ_i* = C − α_i* − β_i* = 0
Dual Formulation (cont.)
• Dual problem:

  max  −(1/2) Σ_{i=1}^{l} Σ_{j=1}^{l} (α_i − α_i*)(α_j − α_j*) ⟨x_i, x_j⟩ − ε Σ_{i=1}^{l} (α_i + α_i*) + Σ_{i=1}^{l} y_i (α_i − α_i*)

  subject to  Σ_i (α_i − α_i*) = 0,    α_i, α_i* ∈ [0, C]

• Solving gives  w = Σ_i (α_i − α_i*) x_i,  and the regression function is

  f(x) = Σ_{i=1}^{l} (α_i − α_i*) ⟨x_i, x⟩ + b
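In practice, ε-insensitive regression is available off the shelf; a minimal sketch (assuming scikit-learn and NumPy; the data and parameter values are illustrative):

# Sketch (scikit-learn/NumPy assumed): epsilon-insensitive SVR on noisy 1-D data.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.linspace(0.0, 2.0 * np.pi, 60).reshape(-1, 1)
y = np.sin(X).ravel() + 0.1 * rng.normal(size=60)

model = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, y)    # epsilon sets the insensitive zone
print("support vectors used:", len(model.support_))         # points on or outside the eps-tube
print("prediction at pi/2:", model.predict([[np.pi / 2]]))  # roughly sin(pi/2) = 1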
Extended versions of SVM
• Least-squares support vector machine (LS-SVM)
  Suykens, J. A., & Vandewalle, J. (1999). Least squares support vector machine classifiers. Neural Processing Letters, 9(3), 293-300.
• Proximal support vector machine (PSVM)
  Fung, G. M., & Mangasarian, O. L. (2005). Multicategory proximal support vector machine classifiers. Machine Learning, 59(1-2), 77-97.
Comparison with Neural Networks

Neural Networks:
• Hidden layers map to lower-dimensional spaces
• Search space has multiple local minima
• Training is expensive
• Classification is extremely efficient
• Requires the number of hidden units and layers to be chosen
• Very good accuracy in typical domains

SVMs:
• Kernel maps to a very high-dimensional space
• Search space has a unique minimum
• Training is extremely efficient
• Classification is extremely efficient
• Kernel and cost C are the two parameters to select
• Very good accuracy in typical domains
• Extremely robust
SVM: Nonlinear Case
Model Selection Procedure

- We have to decide which kernel function and which value of C to use.
- "In practice a Gaussian radial basis or a low degree polynomial kernel is a good start." [Andrew Moore]
- We then check which set of parameters (such as C, or γ if we choose the Gaussian radial basis kernel) is the most appropriate by K-fold cross-validation:
  1) Randomly divide all the available training examples into K equal-sized subsets.
  2) Use all but one subset to train the SVM with the chosen parameters.
  3) Use the held-out subset to measure the classification error.
  4) Repeat steps 2 and 3 for each subset.
  5) Average the results to get an estimate of the generalization error of the SVM classifier.
- The SVM is tested using this procedure for various parameter settings. In the end, the model with the smallest generalization error is adopted, and the SVM classifier is then trained with these parameters over the whole training set.
- For the Gaussian RBF kernel, trying exponentially growing sequences of C and γ is a practical method to identify good parameters. A good choice* is the following grid:
    C = 2^-5, 2^-4, …, 2^15
    γ = 2^-15, 2^-14, …, 2^3
  * This grid is suggested by LibSVM (an integrated and easy-to-use tool for SVM classification). A sketch of such a grid search is given below.
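The exponential grid above maps directly onto a cross-validated grid search; a sketch (assuming scikit-learn and NumPy; the dataset is only a placeholder):

# Sketch (scikit-learn/NumPy assumed): K-fold cross-validated grid search over the
# exponentially growing grid of C and gamma suggested above.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)               # placeholder dataset

param_grid = {
    "C":     2.0 ** np.arange(-5, 16),          # 2^-5 ... 2^15
    "gamma": 2.0 ** np.arange(-15, 4),          # 2^-15 ... 2^3
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)   # 5-fold cross-validation
search.fit(X, y)

print("best parameters:", search.best_params_)
print("cross-validated accuracy:", round(search.best_score_, 3))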
Software: Popular Implementations

SVMlight: http://svmlight.joachims.org/
By Joachims; one of the most widely used SVM classification and regression packages. Distributed as C++ source and binaries for Linux, Windows, Cygwin, and Solaris. Kernels: polynomial, radial basis function, and neural (tanh).

LibSVM: http://www.csie.ntu.edu.tw/~cjlin/libsvm/
LIBSVM (Library for Support Vector Machines) is developed by Chang and Lin and is also widely used. Developed in C++ and Java, it also supports multi-class classification, weighted SVM for unbalanced data, cross-validation, and automatic model selection. It has interfaces for Python, R, Splus, MATLAB, Perl, Ruby, and LabVIEW. Kernels: linear, polynomial, radial basis function, and neural (tanh).

For applications: http://www.clopinet.com/isabelle/Projects/SVM/applist.html
