6.867 Nonlinear Models, Kernels
Fall 2016
Dual problem

$$\max_{\alpha}\;\; -\frac{1}{2}\sum_{i,j} \alpha_i \alpha_j\, y_i y_j \langle x_i, x_j \rangle + \sum_i \alpha_i$$

$$\text{subject to}\quad \sum_i \alpha_i y_i = 0 \quad\text{and}\quad \alpha_i \in [0, C]$$

Optimality (KKT) conditions, with weight vector $w = \sum_i y_i \alpha_i x_i$ and complementary slackness $\alpha_i \big[ y_i [\langle w, x_i \rangle + b] + \xi_i - 1 \big] = 0$, $\xi_i \eta_i = 0$ (where $\eta_i$ is the multiplier for $\xi_i \ge 0$):

$$\alpha_i = 0 \;\Longrightarrow\; y_i [\langle w, x_i \rangle + b] \ge 1$$
$$0 < \alpha_i < C \;\Longrightarrow\; y_i [\langle w, x_i \rangle + b] = 1$$
$$\alpha_i = C \;\Longrightarrow\; y_i [\langle w, x_i \rangle + b] \le 1$$

courtesy: A. Smola (CMU, 2013)
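To make the case split concrete, here is a minimal NumPy sketch that tallies the three KKT cases given a solved dual vector; `alpha`, `b`, and `C` are assumed to come from some dual solver, and all names are illustrative:

```python
import numpy as np

def kkt_summary(X, y, alpha, b, C, tol=1e-6):
    # X: (n, d) inputs, y: (n,) labels in {-1, +1},
    # alpha: (n,) solved dual variables, b: offset, C: box constraint.
    w = (alpha * y) @ X                 # w = sum_i y_i alpha_i x_i
    margins = y * (X @ w + b)           # y_i [<w, x_i> + b]
    inactive  = alpha < tol                         # alpha_i = 0    => margin >= 1
    on_margin = (alpha > tol) & (alpha < C - tol)   # 0 < alpha_i < C => margin = 1
    at_bound  = alpha > C - tol                     # alpha_i = C    => margin <= 1
    return {
        "non-SVs (margin >= 1)":   int(inactive.sum()),
        "margin SVs (margin = 1)": int(on_margin.sum()),
        "bound SVs (margin <= 1)": int(at_bound.sum()),
    }
```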


The Kernel Trick

Linear soft margin problem

$$\min_{w,b}\;\; \frac{1}{2}\|w\|^2 + C \sum_i \xi_i$$

$$\text{subject to}\quad y_i [\langle w, x_i \rangle + b] \ge 1 - \xi_i \quad\text{and}\quad \xi_i \ge 0$$

Dual problem

$$\max_{\alpha}\;\; -\frac{1}{2}\sum_{i,j} \alpha_i \alpha_j\, y_i y_j \langle x_i, x_j \rangle + \sum_i \alpha_i$$

$$\text{subject to}\quad \sum_i \alpha_i y_i = 0 \quad\text{and}\quad \alpha_i \in [0, C]$$

Support vector expansion

$$f(x) = \sum_i \alpha_i y_i \langle x_i, x \rangle + b$$

courtesy: A. Smola (CMU, 2013)
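For illustration, the pieces of the dual solution can be read off a fitted scikit-learn `SVC`: its `dual_coef_` attribute stores the products $y_i \alpha_i$ for the support vectors, and `intercept_` stores $b$ (the toy dataset and value of C below are arbitrary choices):

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Toy 2D binary problem, chosen only for illustration.
X, y = make_blobs(n_samples=100, centers=2, random_state=0)

clf = SVC(kernel="linear", C=10.0).fit(X, y)

# dual_coef_ holds y_i * alpha_i for the support vectors only;
# points with alpha_i = 0 do not appear in the expansion.
print("number of support vectors:", clf.support_.shape[0])
print("y_i * alpha_i:", clf.dual_coef_)
print("b:", clf.intercept_)
```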
The Kernel Trick

Soft margin problem in feature space

$$\min_{w,b}\;\; \frac{1}{2}\|w\|^2 + C \sum_i \xi_i$$

$$\text{subject to}\quad y_i [\langle w, \phi(x_i) \rangle + b] \ge 1 - \xi_i \quad\text{and}\quad \xi_i \ge 0$$

Dual problem

$$\max_{\alpha}\;\; -\frac{1}{2}\sum_{i,j} \alpha_i \alpha_j\, y_i y_j\, k(x_i, x_j) + \sum_i \alpha_i$$

$$\text{subject to}\quad \sum_i \alpha_i y_i = 0 \quad\text{and}\quad \alpha_i \in [0, C]$$

Support vector expansion

$$f(x) = \sum_i \alpha_i y_i\, k(x_i, x) + b$$

The data enter the dual and the expansion only through inner products, so replacing $\langle x_i, x_j \rangle$ with a kernel $k(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle$ solves the linear problem in the feature space induced by $\phi$ without ever computing $\phi$ explicitly.

courtesy: A. Smola (CMU, 2013)
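A minimal NumPy sketch of this expansion with a Gaussian kernel; `alpha` and `b` are assumed to come from some dual solver, and all names are illustrative:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # k(a, b) = exp(-gamma * ||a - b||^2), computed for all pairs.
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

def decision_function(X_train, y_train, alpha, b, X_test, gamma=1.0):
    # f(x) = sum_i alpha_i y_i k(x_i, x) + b
    K = rbf_kernel(X_train, X_test, gamma)   # shape (n_train, n_test)
    return (alpha * y_train) @ K + b
```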
[Figure slides: decision boundaries of the soft margin SVM on a 2D toy dataset, one slide per value of C = 1, 2, 5, 10, 20, 50, 100; the sweep appears three times in the deck. As C increases, the boundary becomes more nonlinear and the number of training errors drops. courtesy: A. Smola (CMU, 2013)]
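The C sweep above is easy to reproduce; a minimal sketch using scikit-learn's `SVC` with an RBF kernel (the dataset and kernel width are arbitrary illustrative choices, not those from the slides):

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Noisy 2D data, so that small C leaves some training errors.
X, y = make_moons(n_samples=200, noise=0.3, random_state=0)

for C in [1, 2, 5, 10, 20, 50, 100]:
    clf = SVC(kernel="rbf", gamma=1.0, C=C).fit(X, y)
    errors = (clf.predict(X) != y).sum()
    print(f"C={C:>3}: {errors} training errors, "
          f"{clf.support_.shape[0]} support vectors")
```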


now with a narrower kernel

[Figure slides: the same C sweep repeated with a narrower Gaussian kernel, giving a more flexible decision boundary. courtesy: A. Smola (CMU, 2013)]
now with a very wide kernel

[Figure slides: the same data with a very wide Gaussian kernel; the decision boundary is nearly linear and the data cannot be separated without error. courtesy: A. Smola (CMU, 2013)]
Nonlinear separation

- Increasing C allows for more nonlinearities and decreases the number of errors
- The SV boundary need not be contiguous
- The kernel width adjusts the function class

From Pattern Recognition, Figure 7.10: 2D toy example of a binary classification problem solved using a soft margin SVC. In all cases, a Gaussian kernel (7.27) is used. From left to right, we decrease the kernel width. Note that for a large width, the decision boundary is almost linear, and the data set cannot be separated without error (see text). Solid lines represent decision boundaries; dotted lines depict the edge of the margin (where (7.34) becomes an equality with $\xi_i = 0$).

In each case the same value of C was used, but the kernel width c was varied. For large values of c, the classifier is almost linear and cannot separate the data set without errors; for a small width, the data set can be separated, at the price of a more complex decision boundary.

courtesy: A. Smola (CMU, 2013)
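The kernel width effect can be reproduced the same way as the C sweep; a sketch with the same arbitrary toy data, varying scikit-learn's `gamma` parameter (large `gamma` corresponds to a narrow kernel, since $\gamma = 1/(2c^2)$ for width $c$):

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.3, random_state=0)

# Small gamma ~ wide kernel (near-linear boundary, some errors unavoidable);
# large gamma ~ narrow kernel (highly nonlinear boundary, fewer training errors).
for gamma in [0.01, 0.1, 1.0, 10.0, 100.0]:
    clf = SVC(kernel="rbf", gamma=gamma, C=10.0).fit(X, y)
    errors = (clf.predict(X) != y).sum()
    print(f"gamma={gamma:>6}: {errors} training errors, "
          f"{clf.support_.shape[0]} support vectors")
```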
