Support Vector Machines & Kernels
Lecture 5
David Sontag
New York University
Slides adapted from Luke Zettlemoyer and Carlos Guestrin
Multi-class SVM
As in the binary SVM, we introduce slack variables and maximize the margin.
To predict, we take the highest-scoring class.
Now, how do we learn it? One common formulation is sketched below.
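A sketch of the standard multi-class objective, assuming one weight vector w_y per class and the same slack/penalty setup (C, ξ_j) as in the binary case:

\[
\min_{w,\xi}\ \tfrac{1}{2}\sum_{y}\|w_y\|^2 + C\sum_j \xi_j
\quad\text{s.t.}\quad w_{y_j}\cdot x_j \;\ge\; w_{y'}\cdot x_j + 1 - \xi_j \ \ \forall j,\ \forall y'\neq y_j,
\qquad \xi_j \ge 0
\]

\[
\text{To predict:}\quad \hat{y} = \arg\max_{y}\ w_y\cdot x
\]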
How to deal with imbalanced data?
• In many practical applications we may have
imbalanced data sets
• We may want errors to be equally distributed
between the positive and negative classes
• A slight modification to the SVM objective
does the trick!
Class-specific weighting of the slack variables
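A sketch of that modification, with separate (illustrative) slack penalties C₊ and C₋ for the positive and negative classes:

\[
\min_{w,b,\xi}\ \tfrac{1}{2}\|w\|^2
\;+\; C_{+}\!\!\sum_{j:\,y_j=+1}\!\!\xi_j
\;+\; C_{-}\!\!\sum_{j:\,y_j=-1}\!\!\xi_j
\quad\text{s.t.}\quad y_j(w\cdot x_j + b)\ \ge\ 1-\xi_j,\qquad \xi_j \ge 0
\]

Raising C₊ relative to C₋ makes errors on the positive class more expensive.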
What’s Next!
• Learn one of the most interesting and
exciting recent advancements in machine
learning
– The “kernel trick”
– High dimensional feature spaces at no extra
cost!
• But first, a detour
– Constrained optimization!
Constrained optimization
Example: minimize x²
• No constraint: x* = 0
• Subject to x ≥ −1: x* = 0
• Subject to x ≥ 1: x* = 1
How do we solve with constraints?
Lagrange Multipliers!!!
Lagrange multipliers – Dual variables
Problem: minimize x² subject to x ≥ b
Rewrite the constraint: x − b ≥ 0
Add a Lagrange multiplier α ≥ 0 and introduce the Lagrangian (objective):
   L(x, α) = x² − α(x − b)
We will solve:
   min_x max_{α≥0} L(x, α)
Why is this equivalent?
• min is fighting max! (note the newly added constraint α ≥ 0 on the multiplier)
  – If x < b: (x − b) < 0, so max_{α≥0} −α(x − b) = ∞. min won't let this happen!
  – If x > b: (x − b) > 0, so max_{α≥0} −α(x − b) = 0 with α* = 0. min is cool with 0, and L(x, α) = x² (the original objective).
  – If x = b: α can be anything, and L(x, α) = x² (the original objective).
• The min on the outside forces max to behave, so the constraint will be satisfied.
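As a quick worked check of this toy problem (my own arithmetic, not from the slide): minimizing L(x, α) = x² − α(x − b) over x gives

\[
\frac{\partial L}{\partial x} = 2x - \alpha = 0 \;\Rightarrow\; x = \frac{\alpha}{2},
\qquad
g(\alpha) = L\!\left(\tfrac{\alpha}{2}, \alpha\right) = -\frac{\alpha^2}{4} + \alpha b .
\]

Maximizing g over α ≥ 0 gives α* = 2b and x* = b when b > 0 (e.g. x* = 1 for the constraint x ≥ 1), and α* = 0, x* = 0 when b ≤ 0, matching the constrained solutions above.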
Dual SVM derivation (1) – the linearly
separable case
Original optimization problem:
Rewrite constraints; one Lagrange multiplier per example
Lagrangian:
Our goal now is to solve:
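Spelled out, these are the standard hard-margin pieces (consistent with the ∂L/∂w expression used in the next step):

\[
\text{Original problem:}\quad \min_{w,b}\ \tfrac{1}{2}\|w\|^2
\quad\text{s.t.}\quad y_j(w\cdot x_j + b) \ge 1 \ \ \forall j,
\qquad\text{rewritten as}\quad y_j(w\cdot x_j + b) - 1 \ge 0
\]

\[
\text{Lagrangian:}\quad L(w,b,\alpha) = \tfrac{1}{2}\|w\|^2 - \sum_j \alpha_j\big[\,y_j(w\cdot x_j + b) - 1\,\big],
\qquad
\text{solve}\ \ \min_{w,b}\ \max_{\alpha\ge 0}\ L(w,b,\alpha)
\]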
Dual SVM derivation (2) – the linearly
separable case
(Primal)
Swap min and max
(Dual)
Slater’s condition from convex optimization guarantees that
these two optimization problems are equivalent!
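Schematically, the two problems are:

\[
\text{(Primal)}\quad \min_{w,b}\ \max_{\alpha\ge 0}\ L(w,b,\alpha)
\qquad\longleftrightarrow\qquad
\text{(Dual)}\quad \max_{\alpha\ge 0}\ \min_{w,b}\ L(w,b,\alpha)
\]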
Dual SVM derivation (3) – the linearly separable case

(Dual)   max_{α≥0} min_{w,b} L(w, b, α)

Can solve for the optimal w, b as a function of α:

   ∂L/∂w = w − Σ_j α_j y_j x_j = 0   ⇒   w = Σ_j α_j y_j x_j
   ∂L/∂b = −Σ_j α_j y_j = 0   ⇒   Σ_j α_j y_j = 0

Substituting these values back in (and simplifying), we obtain:

(Dual)   max_{α≥0}   Σ_j α_j − ½ Σ_{j,k} α_j α_k y_j y_k (x_j · x_k)   subject to   Σ_j α_j y_j = 0

The sums run over all training examples; the α_j α_k y_j y_k factors are scalars; x_j · x_k is a dot product.

So, in the dual formulation we will solve for α directly!
• w and b are computed from α (if needed)
Dual SVM derivation (3) – the linearly
separable case
Lagrangian:   L(w, b, α) = ½ ‖w‖² − Σ_j α_j [ y_j (w · x_j + b) − 1 ]

α_j > 0 for some j implies that the corresponding constraint is tight. We use this to obtain b:

(1)   y_j (w · x_j + b) = 1
(2)   b = y_j − w · x_j      (since y_j ∈ {−1, +1}, 1/y_j = y_j)
(3)   b = y_j − Σ_k α_k y_k (x_k · x_j)
Dual for the non-separable case – same basic
story (we will skip details)
Primal: Solve for w,b,α:
Dual:
What changed?
• Added upper bound of C on αi!
• Intuitive explanation:
• Without slack, αi → ∞ when constraints are violated (points misclassified)
• The upper bound of C limits the αi, so misclassifications are allowed
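For reference, the standard soft-margin primal and the dual it yields (this is where the upper bound on αj comes from):

\[
\text{(Primal)}\quad \min_{w,b,\xi}\ \tfrac{1}{2}\|w\|^2 + C\sum_j \xi_j
\quad\text{s.t.}\quad y_j(w\cdot x_j + b)\ \ge\ 1-\xi_j,\ \ \xi_j\ge 0
\]

\[
\text{(Dual)}\quad \max_{\alpha}\ \sum_j \alpha_j - \tfrac{1}{2}\sum_{j,k}\alpha_j\alpha_k\, y_j y_k\,(x_j\cdot x_k)
\quad\text{s.t.}\quad 0\le \alpha_j\le C,\ \ \sum_j \alpha_j y_j = 0
\]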
Wait a minute: why did we learn about the dual
SVM?
• There are some quadratic programming
algorithms that can solve the dual faster than the
primal
– At least for small datasets
• But, more importantly, the “kernel trick”!!!
Reminder: What if the data is not
linearly separable?
Use features of features
of features of features….
φ(x) = ( x(1), …, x(n), x(1) x(2), x(1) x(3), …, e^{x(1)}, … )
Feature space can get really large really quickly!
Higher order polynomials
[Plot: number of monomial terms vs. number of input dimensions, for polynomial degrees d = 2, 3, 4]
m – number of input features, d – degree of polynomial
The number of terms grows fast! For d = 6, m = 100: about 1.6 billion terms.
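This count follows from a standard combinatorial fact (not stated on the slide): the number of monomials of degree exactly d in m input features is

\[
\binom{m+d-1}{d}, \qquad\text{e.g.}\quad \binom{105}{6}\approx 1.6\times 10^{9}\ \ \text{for } d=6,\ m=100 .
\]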
Dual formulation only depends on
dot-products, not on w!
First, we introduce features (remember: the examples x only appear in one dot product):
Next, replace the dot product with a kernel:
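That is, wherever the dot product between (featurized) examples appears in the dual objective, we substitute the kernel K(x_j, x_k):

\[
\max_{\alpha\ge 0}\ \sum_j \alpha_j - \tfrac{1}{2}\sum_{j,k}\alpha_j\alpha_k\, y_j y_k\,\phi(x_j)\cdot\phi(x_k)
\quad\longrightarrow\quad
\max_{\alpha\ge 0}\ \sum_j \alpha_j - \tfrac{1}{2}\sum_{j,k}\alpha_j\alpha_k\, y_j y_k\, K(x_j, x_k)
\]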
Why is this useful???
Efficient dot-product of polynomials

Polynomials of degree exactly d:

d = 1:
   φ(u)·φ(v) = (u1, u2) · (v1, v2) = u1 v1 + u2 v2 = u·v

d = 2:
   φ(u)·φ(v) = (u1², u1 u2, u2 u1, u2²) · (v1², v1 v2, v2 v1, v2²)
             = u1² v1² + 2 u1 v1 u2 v2 + u2² v2²
             = (u1 v1 + u2 v2)²
             = (u·v)²

For any d (we will skip the proof):   φ(u)·φ(v) = (u·v)^d
• Cool! Taking a dot product and exponentiating gives the same result as mapping into the high-dimensional space and then taking the dot product.
Finally: the “kernel trick”!
• Never compute features explicitly!!!
– Compute dot products in closed form
• Constant-time high-dimensional dot-
products for many classes of features
• But, O(n²) time in the size of the dataset to compute the objective
– A naïve implementation is slow
– Much work on speeding this up
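A minimal numerical sketch of the trick in Python/NumPy (my own example, not from the lecture), using the degree-2 feature map from the earlier slide: the explicit features and the kernel give the same value, but the kernel never builds the feature vector.

import numpy as np

def phi(u):
    # Explicit degree-2 feature map for 2-D inputs: (u1^2, u1*u2, u2*u1, u2^2)
    return np.array([u[0] * u[0], u[0] * u[1], u[1] * u[0], u[1] * u[1]])

def poly_kernel(u, v, d=2):
    # Kernel evaluation (u . v)^d: never constructs the high-dimensional features
    return np.dot(u, v) ** d

u = np.array([1.0, 2.0])
v = np.array([3.0, -1.0])

print(np.dot(phi(u), phi(v)))   # explicit mapping: 1.0
print(poly_kernel(u, v))        # kernel trick:     1.0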
Common kernels
• Polynomials of degree exactly d
• Polynomials of degree up to d
• Gaussian kernels
• Sigmoid
• And many others: very active area of research!
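In their standard forms, listed in the same order as above (the parameter names σ, η, ν are generic, not from the slide):

\[
K(u,v) = (u\cdot v)^d,
\qquad
K(u,v) = (u\cdot v + 1)^d,
\qquad
K(u,v) = \exp\!\left(-\frac{\|u-v\|^2}{2\sigma^2}\right),
\qquad
K(u,v) = \tanh(\eta\, u\cdot v + \nu)
\]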