SVM Tutorial

Support Vector Machine (SVM) is a supervised machine learning algorithm that can be used for both classification and regression. For classification it finds the hyperplane that separates the classes with the largest margin, i.e. the greatest distance between the decision boundary and the nearest data points of each class. With kernel functions, SVMs can efficiently perform nonlinear classification by implicitly mapping their inputs into high-dimensional feature spaces. SVMs have been successfully applied to areas like text categorization, image recognition, and cancer classification.

Support Vector Machine

& Its Applications

A portion (1/3) of the slides are taken from Prof. Andrew Moore's SVM tutorial at http://www.cs.cmu.edu/~awm/tutorials

Mingyue Tan
The University of British Columbia Nov 26, 2004

Overview

- Intro. to Support Vector Machines (SVM)
- Properties of SVM
- Applications
  - Gene Expression Data Classification
  - Text Categorization (if time permits)
- Discussion

Linear Classifiers
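[Figure: scatter plot of two classes; one marker denotes +1, the other denotes -1. Block diagram: x → f → y_est]

$f(x, w, b) = \operatorname{sign}(w \cdot x + b)$

The line $w \cdot x + b = 0$ is the decision boundary, with $w \cdot x + b > 0$ on the +1 side and $w \cdot x + b < 0$ on the -1 side.

How would you classify this data?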
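The decision rule f(x, w, b) = sign(w x + b) can be sketched in a few lines of plain Python. The weight vector, bias and sample points below are hypothetical values chosen only for illustration, and assigning boundary points to +1 is an assumption the slides do not make.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def predict(x, w, b):
    """Return +1 or -1 according to sign(w . x + b).

    Points exactly on the boundary (w . x + b == 0) are assigned +1 here;
    this tie-breaking rule is an assumption for the sketch.
    """
    return 1 if dot(w, x) + b >= 0 else -1


# Hypothetical parameters and points, for illustration only.
w, b = [1.0, -1.0], 0.5
points = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
labels = [predict(x, w, b) for x in points]
print(labels)
```

Any (w, b) that puts the two classes on opposite sides of the line yields zero training error, which is why the choice among such lines needs a further criterion.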

Linear Classifiers

$f(x, w, b) = \operatorname{sign}(w \cdot x + b)$

[Figure: several candidate separating lines drawn through the same data]

Any of these would be fine... but which is best?

[Figure: one candidate boundary; a point is marked "Misclassified to +1 class"]

Classifier Margin
[Figure: linear boundary with its margin band; markers denote +1 and -1]

$f(x, w, b) = \operatorname{sign}(w \cdot x + b)$

Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a datapoint.
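As a rough illustration of this definition, the sketch below measures the margin of a given linear classifier on a small dataset. It assumes the band is centered on the boundary, so the width is twice the distance from the boundary to the nearest point; the data and the parameters (w, b) are hypothetical.

```python
import math


def margin_width(X, w, b):
    """Width of the widest band centered on w . x + b = 0 that touches no point.

    The distance from a point x to the boundary is |w . x + b| / ||w||, so the
    band can grow to twice the smallest such distance.
    """
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    distances = [
        abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm_w for x in X
    ]
    return 2.0 * min(distances)


# Hypothetical data and classifier, for illustration only.
X = [[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]]
w, b = [1.0, 1.0], -2.0
width = margin_width(X, w, b)
print(width)
```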

Maximum Margin
[Figure: maximum margin boundary; markers denote +1 and -1]

$f(x, w, b) = \operatorname{sign}(w \cdot x + b)$

The maximum margin linear classifier is the linear classifier with the maximum margin. This is the simplest kind of SVM (called an LSVM, or linear SVM).

Support vectors are those datapoints that the margin pushes up against.

1. Maximizing the margin is good according to intuition and PAC theory.
2. It implies that only support vectors are important; other training examples are ignorable.
3. Empirically it works very well.

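Linear SVM Mathematically

[Figure: the parallel planes $w \cdot x + b = +1$, $w \cdot x + b = 0$ and $w \cdot x + b = -1$; the "Predict Class = +1" zone lies beyond the first and the "Predict Class = -1" zone beyond the last. $x^{+}$ and $x^{-}$ are points on the +1 and -1 planes; M = margin width.]

What we know:

$$w \cdot x^{+} + b = +1$$
$$w \cdot x^{-} + b = -1$$
$$w \cdot (x^{+} - x^{-}) = 2$$

$$M = \frac{(x^{+} - x^{-}) \cdot w}{\lVert w \rVert} = \frac{2}{\lVert w \rVert}$$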
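The step from w . (x+ - x-) = 2 to the margin width can be spelled out as follows. This is a sketch under the assumption that x- is the point on the -1 plane closest to x+, so that x+ - x- is parallel to w; the scalar λ is introduced here only for the derivation.

```latex
\begin{aligned}
x^{+} &= x^{-} + \lambda w \quad \text{for some } \lambda > 0, \\
w \cdot (x^{+} - x^{-}) &= \lambda \, (w \cdot w) = 2
  \;\Rightarrow\; \lambda = \frac{2}{w \cdot w}, \\
M &= \lVert x^{+} - x^{-} \rVert = \lambda \lVert w \rVert
   = \frac{2 \lVert w \rVert}{w \cdot w} = \frac{2}{\lVert w \rVert}.
\end{aligned}
```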

Linear SVM Mathematically

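Goal:

1) Correctly classify all training data:

$$w \cdot x_i + b \ge 1 \quad \text{if } y_i = +1$$
$$w \cdot x_i + b \le -1 \quad \text{if } y_i = -1$$
$$y_i (w \cdot x_i + b) \ge 1 \quad \text{for all } i$$

2) Maximize the margin $M = \frac{2}{\lVert w \rVert}$, which is the same as minimizing $\frac{1}{2} w^{T} w$.

We can formulate a Quadratic Optimization Problem and solve for w and b:

$$\text{Minimize } \Phi(w) = \frac{1}{2} w^{T} w \quad \text{subject to } y_i (w \cdot x_i + b) \ge 1 \text{ for all } i$$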

Solving the Optimization Problem

Find w and b such that $\Phi(w) = \frac{1}{2} w^{T} w$ is minimized; and for all $\{(x_i, y_i)\}$: $y_i (w^{T} x_i + b) \ge 1$

Need to optimize a quadratic function subject to linear constraints. Quadratic optimization problems are a well-known class of mathematical programming problems, and many (rather intricate) algorithms exist for solving them.

The solution involves constructing a dual problem where a Lagrange multiplier $\alpha_i$ is associated with every constraint in the primal problem:

Find $\alpha_1 \dots \alpha_N$ such that
$$Q(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j x_i^{T} x_j$$
is maximized and
(1) $\sum_i \alpha_i y_i = 0$
(2) $\alpha_i \ge 0$ for all i
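To make the dual objective concrete, the sketch below evaluates Q for a two-point toy problem (the data are hypothetical). With one point per class, constraint (1) forces both multipliers to share a value a, so Q reduces to 2a - 4a^2 and a small grid is enough to locate the maximum. A real solver would use a quadratic programming routine rather than a grid.

```python
def dual_objective(alpha, X, y):
    """Q(alpha) = sum_i alpha_i - 1/2 sum_i sum_j alpha_i alpha_j y_i y_j x_i.x_j"""
    n = len(X)
    quad = 0.0
    for i in range(n):
        for j in range(n):
            dot_ij = sum(a * b for a, b in zip(X[i], X[j]))
            quad += alpha[i] * alpha[j] * y[i] * y[j] * dot_ij
    return sum(alpha) - 0.5 * quad


# Hypothetical two-point training set: one example per class.
X = [[1.0, 1.0], [-1.0, -1.0]]
y = [1, -1]

# sum_i alpha_i y_i = 0 forces alpha_1 = alpha_2 = a, so scan a on a grid.
candidates = [0.0, 0.125, 0.25, 0.375, 0.5]
values = [dual_objective([a, a], X, y) for a in candidates]
best = candidates[values.index(max(values))]
print(best, max(values))
```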

The Optimization Problem Solution

The solution has the form:

$$w = \sum_i \alpha_i y_i x_i \qquad b = y_k - w^{T} x_k \text{ for any } x_k \text{ such that } \alpha_k \ne 0$$

Each non-zero $\alpha_i$ indicates that the corresponding $x_i$ is a support vector.

Then the classifying function will have the form:

$$f(x) = \sum_i \alpha_i y_i x_i^{T} x + b$$

Notice that it relies on an inner product between the test point x and the support vectors $x_i$; we will return to this later.

Also keep in mind that solving the optimization problem involved computing the inner products $x_i^{T} x_j$ between all pairs of training points.
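A minimal sketch of how w, b and the classifying function follow from the multipliers. The two-point dataset is hypothetical, and the multiplier values (0.25 each) are assumed to be the dual maximizer for it, which can be checked by hand for a problem this small.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def recover_w_b(alpha, X, y):
    """w = sum_i alpha_i y_i x_i; b = y_k - w . x_k for a support vector x_k."""
    dim = len(X[0])
    w = [sum(alpha[i] * y[i] * X[i][d] for i in range(len(X))) for d in range(dim)]
    k = next(i for i, a in enumerate(alpha) if a > 1e-12)  # any non-zero alpha_k
    return w, y[k] - dot(w, X[k])


def decision(x, alpha, X, y, b):
    """f(x) = sum_i alpha_i y_i x_i . x + b, using inner products only."""
    return sum(alpha[i] * y[i] * dot(X[i], x) for i in range(len(X))) + b


# Hypothetical toy problem; alpha is assumed to solve its dual.
X = [[1.0, 1.0], [-1.0, -1.0]]
y = [1, -1]
alpha = [0.25, 0.25]

w, b = recover_w_b(alpha, X, y)
print(w, b, decision([2.0, 0.0], alpha, X, y, b))
```

Both training points evaluate to exactly +1 or -1, as expected for support vectors lying on the margin planes.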

Dataset with noise

[Figure: two classes with a few noisy points; markers denote +1 and -1]

Hard Margin: So far we require all data points be classified correctly
- No training error

What if the training set is noisy?
- Solution 1: use very powerful kernels

OVERFITTING!

Soft Margin Classification

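Slack variables $\xi_i$ can be added to allow misclassification of difficult or noisy examples.

[Figure: the planes $w \cdot x + b = +1$, $w \cdot x + b = 0$ and $w \cdot x + b = -1$, with slack labels on the points that violate the margin]

What should our quadratic optimization criterion be? Minimize

$$\frac{1}{2} w \cdot w + C \sum_{k=1}^{N} \xi_k$$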

Hard Margin v.s. Soft Margin

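The old formulation:

Find w and b such that $\Phi(w) = \frac{1}{2} w^{T} w$ is minimized and for all $\{(x_i, y_i)\}$: $y_i (w^{T} x_i + b) \ge 1$

The new formulation incorporating slack variables:

Find w and b such that $\Phi(w) = \frac{1}{2} w^{T} w + C \sum_i \xi_i$ is minimized and for all $\{(x_i, y_i)\}$: $y_i (w^{T} x_i + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$ for all i

Parameter C can be viewed as a way to control overfitting: a large C penalizes slack heavily and approaches the hard margin, while a small C tolerates more margin violations.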
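The effect of C can be seen by evaluating the soft-margin objective for a fixed classifier. The sketch below uses the fact that, for given w and b, the smallest slack satisfying both constraints is max(0, 1 - y_i (w . x_i + b)). The data, w and b are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def soft_margin_objective(w, b, X, y, C):
    """Return (1/2 w.w + C * sum of slacks, list of slacks) for fixed w, b."""
    slacks = [max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y)]
    return 0.5 * dot(w, w) + C * sum(slacks), slacks


# Hypothetical data: two clean points, one inside the margin, one misclassified.
X = [[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0], [0.5, 1.0]]
y = [1, -1, 1, -1]
w, b = [1.0, 0.0], 0.0

obj_small_c, slacks = soft_margin_objective(w, b, X, y, C=1.0)
obj_large_c, _ = soft_margin_objective(w, b, X, y, C=10.0)
print(slacks, obj_small_c, obj_large_c)
```

A slack between 0 and 1 marks a point inside the margin but on the correct side; a slack above 1 marks a misclassified point.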

Linear SVMs: Overview

- The classifier is a separating hyperplane.
- The most important training points are the support vectors; they define the hyperplane.
- Quadratic optimization algorithms can identify which training points $x_i$ are support vectors with non-zero Lagrangian multipliers $\alpha_i$.
- Both in the dual formulation of the problem and in the solution, training points appear only inside dot products:

Find $\alpha_1 \dots \alpha_N$ such that
$$Q(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j x_i^{T} x_j$$
is maximized and
(1) $\sum_i \alpha_i y_i = 0$
(2) $0 \le \alpha_i \le C$ for all i

$$f(x) = \sum_i \alpha_i y_i x_i^{T} x + b$$

Non-linear SVMs

Datasets that are linearly separable with some noise work out great:

[Figure: 1-D data along the x axis]

But what are we going to do if the dataset is just too hard? How about mapping data to a higher-dimensional space:

[Figure: the same data plotted as $x^2$ against x]

Non-linear SVMs: Feature spaces

General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable:

$$\Phi: x \to \varphi(x)$$

The Kernel Trick

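The linear classifier relies on the dot product between vectors: $K(x_i, x_j) = x_i^{T} x_j$.

If every data point is mapped into a high-dimensional space via some transformation $\Phi: x \to \varphi(x)$, the dot product becomes $K(x_i, x_j) = \varphi(x_i)^{T} \varphi(x_j)$.

A kernel function is some function that corresponds to an inner product in some expanded feature space.

Example: 2-dimensional vectors $x = [x_1 \;\; x_2]$; let $K(x_i, x_j) = (1 + x_i^{T} x_j)^2$. We need to show that $K(x_i, x_j) = \varphi(x_i)^{T} \varphi(x_j)$:

$$\begin{aligned}
K(x_i, x_j) &= (1 + x_i^{T} x_j)^2 \\
&= 1 + x_{i1}^2 x_{j1}^2 + 2 x_{i1} x_{j1} x_{i2} x_{j2} + x_{i2}^2 x_{j2}^2 + 2 x_{i1} x_{j1} + 2 x_{i2} x_{j2} \\
&= [1 \;\; x_{i1}^2 \;\; \sqrt{2} x_{i1} x_{i2} \;\; x_{i2}^2 \;\; \sqrt{2} x_{i1} \;\; \sqrt{2} x_{i2}]^{T} \, [1 \;\; x_{j1}^2 \;\; \sqrt{2} x_{j1} x_{j2} \;\; x_{j2}^2 \;\; \sqrt{2} x_{j1} \;\; \sqrt{2} x_{j2}] \\
&= \varphi(x_i)^{T} \varphi(x_j)
\end{aligned}$$

where $\varphi(x) = [1 \;\; x_1^2 \;\; \sqrt{2} x_1 x_2 \;\; x_2^2 \;\; \sqrt{2} x_1 \;\; \sqrt{2} x_2]$.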
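The identity in this example is easy to check numerically. The sketch below compares the kernel value with the explicit inner product in the six-dimensional feature space, using the square-root-of-two scaling on the cross term and the linear terms; the two input vectors are arbitrary illustrative values.

```python
import math


def poly_kernel(x, z):
    """K(x, z) = (1 + x . z)^2 for 2-dimensional inputs."""
    return (1.0 + x[0] * z[0] + x[1] * z[1]) ** 2


def phi(x):
    """Explicit feature map whose inner product reproduces poly_kernel."""
    r2 = math.sqrt(2.0)
    return [1.0, x[0] ** 2, r2 * x[0] * x[1], x[1] ** 2, r2 * x[0], r2 * x[1]]


# Arbitrary illustrative inputs.
x, z = [1.0, 2.0], [3.0, -1.0]
via_kernel = poly_kernel(x, z)
via_features = sum(a * b for a, b in zip(phi(x), phi(z)))
print(via_kernel, via_features)
```

The kernel needs one dot product in two dimensions, while the explicit route builds two six-dimensional vectors first; that saving is the point of the trick.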

What Functions are Kernels?

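For some functions $K(x_i, x_j)$, checking that $K(x_i, x_j) = \varphi(x_i)^{T} \varphi(x_j)$ can be cumbersome.

Mercer's theorem: Every positive semi-definite symmetric function is a kernel.

Positive semi-definite symmetric functions correspond to a positive semi-definite symmetric Gram matrix:

$$K = \begin{bmatrix}
K(x_1, x_1) & K(x_1, x_2) & K(x_1, x_3) & \cdots & K(x_1, x_N) \\
K(x_2, x_1) & K(x_2, x_2) & K(x_2, x_3) & \cdots & K(x_2, x_N) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
K(x_N, x_1) & K(x_N, x_2) & K(x_N, x_3) & \cdots & K(x_N, x_N)
\end{bmatrix}$$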
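A rough numerical sanity check of the Gram matrix condition, using the degree-2 polynomial kernel as an example. It builds the matrix for a few random points, tests symmetry, and evaluates the quadratic form c^T K c for random vectors c. A negative value would disprove positive semi-definiteness; non-negative values are only evidence, not a proof. The sample size, seed and tolerance are arbitrary assumptions.

```python
import random


def poly_kernel(x, z):
    """Degree-2 polynomial kernel K(x, z) = (1 + x . z)^2."""
    return (1.0 + sum(a * b for a, b in zip(x, z))) ** 2


def gram_matrix(X, kernel):
    """Matrix with entry (i, j) equal to kernel(x_i, x_j)."""
    return [[kernel(xi, xj) for xj in X] for xi in X]


def is_symmetric(K, tol=1e-12):
    n = len(K)
    return all(abs(K[i][j] - K[j][i]) <= tol for i in range(n) for j in range(n))


def quadratic_form(K, c):
    """c^T K c, which must be >= 0 for every c if K is positive semi-definite."""
    n = len(K)
    return sum(c[i] * K[i][j] * c[j] for i in range(n) for j in range(n))


rng = random.Random(0)
X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(5)]
K = gram_matrix(X, poly_kernel)
trials = [[rng.uniform(-1, 1) for _ in X] for _ in range(200)]
all_nonnegative = all(quadratic_form(K, c) >= -1e-9 for c in trials)
print(is_symmetric(K), all_nonnegative)
```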

Examples of Kernel Functions

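- Linear: $K(x_i, x_j) = x_i^{T} x_j$
- Polynomial of power p: $K(x_i, x_j) = (1 + x_i^{T} x_j)^p$
- Gaussian (radial-basis function network): $K(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2}\right)$
- Sigmoid: $K(x_i, x_j) = \tanh(\beta_0 x_i^{T} x_j + \beta_1)$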
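The four kernels listed on this slide can be written directly as functions. This is a plain-Python sketch; the default parameter values (p, sigma, beta0, beta1) are placeholders chosen for illustration, not recommendations from the slides.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(x, z):
    return dot(x, z)


def polynomial_kernel(x, z, p=2):
    return (1.0 + dot(x, z)) ** p


def gaussian_kernel(x, z, sigma=1.0):
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def sigmoid_kernel(x, z, beta0=1.0, beta1=0.0):
    return math.tanh(beta0 * dot(x, z) + beta1)


# Two orthogonal unit vectors as an illustrative input pair.
x, z = [1.0, 0.0], [0.0, 1.0]
results = {
    "linear": linear_kernel(x, z),
    "polynomial": polynomial_kernel(x, z),
    "gaussian": gaussian_kernel(x, z),
    "sigmoid": sigmoid_kernel(x, z),
}
print(results)
```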

Non-linear SVMs Mathematically

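Dual problem formulation:

Find $\alpha_1 \dots \alpha_N$ such that
$$Q(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j K(x_i, x_j)$$
is maximized and
(1) $\sum_i \alpha_i y_i = 0$
(2) $\alpha_i \ge 0$ for all i

The solution is:

$$f(x) = \sum_i \alpha_i y_i K(x_i, x) + b$$

Optimization techniques for finding the $\alpha_i$'s remain the same!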
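A small worked case may help here: the XOR pattern, which no line separates in the input space, with the degree-2 polynomial kernel. The multipliers (1/8 each) and b = 0 are taken as given for this toy problem rather than computed; the code only evaluates the kernelized decision function and shows that every training point lands exactly on its margin.

```python
def poly_kernel(x, z):
    """K(x, z) = (1 + x . z)^2"""
    return (1.0 + sum(a * b for a, b in zip(x, z))) ** 2


def decision(x, alpha, X, y, b, kernel):
    """f(x) = sum_i alpha_i y_i K(x_i, x) + b"""
    return sum(a * yi * kernel(xi, x) for a, yi, xi in zip(alpha, y, X)) + b


# XOR-style toy data: the label is +1 when the two coordinates differ.
X = [[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]]
y = [-1, 1, 1, -1]

# Assumed dual solution for this toy problem (all four points are support vectors).
alpha = [0.125, 0.125, 0.125, 0.125]
b = 0.0

outputs = [decision(x, alpha, X, y, b, poly_kernel) for x in X]
print(outputs)
```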

Nonlinear SVM - Overview

- SVM locates a separating hyperplane in the feature space and classifies points in that space.
- It does not need to represent the space explicitly; it only needs a kernel function.
- The kernel function plays the role of the dot product in the feature space.

Properties of SVM

- Flexibility in choosing a similarity function
- Sparseness of solution when dealing with large data sets
  - only support vectors are used to specify the separating hyperplane
- Ability to handle large feature spaces
  - complexity does not depend on the dimensionality of the feature space
- Overfitting can be controlled by soft margin approach
- Nice math property: a simple convex optimization problem which is guaranteed to converge to a single global solution
- Feature Selection

SVM Applications

SVM has been used successfully in many real-world problems

- text (and hypertext) categorization
- image classification
- bioinformatics (protein classification, cancer classification)
- hand-written character recognition

Application 1: Cancer Classification

- High dimensional
  - p > 1000; n < 100
- Imbalanced
  - fewer positive samples

$$K[x, x] = k(x, x) + \lambda \frac{n^{+}}{N}$$

- Many irrelevant features
- Noisy

[Table: gene expression data matrix with one row per patient (p-1, p-2, ..., p-n) and one column per gene (g-1, g-2, ..., g-p)]

FEATURE SELECTION: In the linear case, $w_i^2$ gives the ranking of dim i.

SVM is sensitive to noisy (mis-labeled) data.
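The ranking rule for the linear case amounts to sorting dimensions by squared weight. A minimal sketch, with a hypothetical weight vector standing in for a trained linear SVM:

```python
def rank_features(w):
    """Return feature indices ordered from largest to smallest w_i^2."""
    return sorted(range(len(w)), key=lambda i: w[i] ** 2, reverse=True)


# Hypothetical weights of a trained linear SVM over four genes.
w = [0.1, -2.0, 0.5, 1.5]
ranking = rank_features(w)
print(ranking)
```

Squaring makes the ranking ignore the sign of the weight, so a gene that pushes strongly towards either class ranks high.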

Weakness of SVM

- It is sensitive to noise
  - A relatively small number of mislabeled examples can dramatically decrease the performance
- It only considers two classes
  - How to do multi-class classification with SVM?
  - Answer:
    1) With output arity m, learn m SVMs:
       SVM 1 learns "Output == 1" vs "Output != 1"
       SVM 2 learns "Output == 2" vs "Output != 2"
       :
       SVM m learns "Output == m" vs "Output != m"
    2) To predict the output for a new input, just predict with each SVM and find out which one puts the prediction the furthest into the positive region.
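The prediction step of this one-vs-rest scheme can be sketched as follows. The three linear scorers are hypothetical stand-ins for trained SVMs, and "furthest into the positive region" is read here as the largest raw value of w . x + b, which is an assumption about how the scores are compared.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def predict_one_vs_rest(x, classifiers):
    """classifiers: list of (label, w, b), one binary SVM per class.

    Returns the label whose SVM gives the largest decision value w . x + b.
    """
    best = max(classifiers, key=lambda c: dot(c[1], x) + c[2])
    return best[0]


# Hypothetical one-vs-rest SVMs for a three-class problem.
classifiers = [
    (1, [1.0, 0.0], 0.0),    # "Output == 1" vs "Output != 1"
    (2, [0.0, 1.0], 0.0),    # "Output == 2" vs "Output != 2"
    (3, [-1.0, -1.0], 0.0),  # "Output == 3" vs "Output != 3"
]

predictions = [
    predict_one_vs_rest(x, classifiers)
    for x in ([2.0, 0.5], [0.0, 3.0], [-1.0, -2.0])
]
print(predictions)
```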

Application 2: Text Categorization

Task: The classification of natural text (or hypertext) documents into a fixed number of predefined categories based on their content.
- email filtering, web searching, sorting documents by topic, etc.

A document can be assigned to more than one category, so this can be viewed as a series of binary classification problems, one for each category

Representation of Text
- IR's vector space model (aka bag-of-words representation)
- A doc is represented by a vector indexed by a pre-fixed set or dictionary of terms
- Values of an entry can be binary or weights
- Normalization, stop words, word stems
- Doc x => $\varphi(x)$

Text Categorization using SVM

The similarity between two documents is measured by the inner product $\varphi(x) \cdot \varphi(z)$.

$K(x, z) = \varphi(x) \cdot \varphi(z)$ is a valid kernel, so SVM can be used with K(x, z) for discrimination.

Why SVM?
- High dimensional input space
- Few irrelevant features (dense concept)
- Sparse document vectors (sparse instances)
- Most text categorization problems are linearly separable
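A toy version of this document kernel, assuming raw term counts as the entry values (the slides allow binary values or other weights) and whitespace tokenization with lowercasing. The dictionary and documents are made up for illustration.

```python
def bag_of_words(doc, dictionary):
    """Map a document to a vector of term counts over a fixed dictionary."""
    tokens = doc.lower().split()
    return [tokens.count(term) for term in dictionary]


def doc_kernel(x_doc, z_doc, dictionary):
    """K(x, z) = phi(x) . phi(z) on bag-of-words vectors."""
    phi_x = bag_of_words(x_doc, dictionary)
    phi_z = bag_of_words(z_doc, dictionary)
    return sum(a * b for a, b in zip(phi_x, phi_z))


# Hypothetical dictionary and documents.
dictionary = ["svm", "kernel", "margin", "gene"]
d1 = "SVM kernel kernel margin"
d2 = "kernel margin margin gene"
d3 = "gene gene"

print(bag_of_words(d1, dictionary))
print(doc_kernel(d1, d2, dictionary), doc_kernel(d1, d3, dictionary))
```

Documents that share no dictionary terms get a kernel value of zero, and most entries of each vector are zero, which matches the sparse-instance property listed above.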

Some Issues

- Choice of kernel
  - Gaussian or polynomial kernel is default
  - if ineffective, more elaborate kernels are needed
  - domain experts can give assistance in formulating appropriate similarity measures
- Choice of kernel parameters
  - e.g. $\sigma$ in Gaussian kernel
  - $\sigma$ is the distance between closest points with different classifications
  - In the absence of reliable criteria, applications rely on the use of a validation set or cross-validation to set such parameters.
- Optimization criterion: hard margin vs. soft margin
  - a lengthy series of experiments in which various parameters are tested

Additional Resources

An excellent tutorial on VC-dimension and Support Vector Machines:

C.J.C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2):121-167, 1998.

The VC/SRM/SVM Bible:

Statistical Learning Theory by Vladimir Vapnik, Wiley-Interscience, 1998

http://www.kernel-machines.org/

Reference

- Support Vector Machine Classification of Microarray Gene Expression Data. Michael P. S. Brown, William Noble Grundy, David Lin, Nello Cristianini, Charles Sugnet, Manuel Ares, Jr., David Haussler
- www.cs.utexas.edu/users/mooney/cs391L/svm.ppt
- Text categorization with Support Vector Machines: learning with many relevant features. T. Joachims, ECML-98
