Foundations of Machine Learning
Module 3: Instance Based Learning and
Feature Reduction
Part B: Feature Selection
Sudeshna Sarkar
IIT Kharagpur
Feature Reduction in ML
- The information about the target class is inherent
in the variables.
- Naïve view:
More features
=> More information
=> More discrimination power.
- In practice:
many reasons why this is not the case!
Curse of Dimensionality
• If the number of training examples is fixed, the classifier's performance will usually degrade as the number of features grows!
Feature Reduction in ML
- Irrelevant and redundant features can confuse learners.
- Limited training data.
- Limited computational resources.
- Curse of dimensionality.
Feature Selection
Problem of selecting some subset of features, while
ignoring the rest
Feature Extraction
Criteria for selection/extraction:
either improve or maintain the classification accuracy while simplifying the classifier's complexity.
Feature Selection - Definition
Subset selection
Feature Selection Steps
Feature selection is an optimization problem.
o Step 1: Search the space of possible feature subsets.
o Step 2: Pick the subset that is optimal or near-optimal with respect to some objective function.
Feature Selection Steps (cont’d)
Search strategies
– Optimum
– Heuristic
– Randomized
Evaluation strategies
- Filter methods
- Wrapper methods
Evaluating feature subset
• Supervised (wrapper method)
– Train using selected subset
– Estimate error on validation dataset
• Unsupervised (filter method)
– Look at input only
– Select the subset that has the most information
Evaluation Strategies
(figure: filter methods vs. wrapper methods)
Subset selection
• Select uncorrelated features
• Forward search
– Start from an empty set of features
– Try each of the remaining features
– Estimate the classification/regression error for adding each specific feature
– Select the feature that gives the greatest reduction in validation error
– Stop when there is no significant improvement (a sketch follows after this list)
• Backward search
– Start with the original set of size d
– Drop the features with the smallest impact on error
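A minimal sketch of greedy forward selection, assuming a helper `validation_error(subset)` that trains the model on the chosen features and returns its validation error (both the helper and the tolerance `tol` are illustrative, not from the slides):

```python
def forward_selection(all_features, validation_error, tol=1e-4):
    """Greedy forward search: grow the feature subset one feature at a time."""
    selected = []
    remaining = list(all_features)
    best_err = float("inf")
    while remaining:
        # Try adding each remaining feature and keep the best candidate.
        errs = {f: validation_error(selected + [f]) for f in remaining}
        f_best = min(errs, key=errs.get)
        if best_err - errs[f_best] < tol:   # no significant improvement: stop
            break
        selected.append(f_best)
        remaining.remove(f_best)
        best_err = errs[f_best]
    return selected
```

Backward search works the same way in reverse: start from all features and repeatedly drop the one whose removal increases the validation error the least.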
Feature selection
Univariate (looks at each feature independently of others)
– Pearson correlation coefficient
– F-score
– Chi-square
– Signal to noise ratio
– Mutual information
– Etc.
Univariate methods measure some type of correlation between two random variables: the label (y_i) and a fixed feature (x_ij for a fixed j).
• Rank features by importance
• Ranking cut-off is determined by user
Pearson correlation coefficient
(formula and illustrative figure from Wikipedia)
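For reference, the standard definition the slide points to is the sample Pearson correlation between a fixed feature x_ij (fixed j) and the label y:

$$ r_j = \frac{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}, $$

with values near ±1 indicating a strong linear relationship and values near 0 indicating little or none.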
Signal to noise ratio
• Difference in class means divided by the sum of the class standard deviations:
S2N(X, Y) = (μ_X − μ_Y) / (σ_X + σ_Y)
• Large absolute values indicate a strong association between the feature and the class label
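A minimal NumPy sketch of ranking features by this score for a binary-labelled dataset (the function name is illustrative; the denominator follows the sum-of-standard-deviations form given above):

```python
import numpy as np

def s2n_scores(X, y):
    """Signal-to-noise score of every feature for binary labels y in {0, 1}."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    sd0, sd1 = X0.std(axis=0), X1.std(axis=0)
    return (mu0 - mu1) / (sd0 + sd1)

# Rank features by |score|, largest first; the ranking cut-off is chosen by the user.
# ranking = np.argsort(-np.abs(s2n_scores(X, y)))
```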
Multivariate feature selection
• Multivariate (considers all features simultaneously)
• Consider the weight vector w of any linear classifier.
• Classification of a point x is given by w^T x + w_0.
• Small entries of w have little effect on the dot product, so the corresponding features are less relevant.
• For example, if w = (10, 0.01, −9) then features 0 and 2 contribute more to the dot product than feature 1.
– The ranking of features given by this w is 0, 2, 1.
Multivariate feature selection
• The weight vector w can be obtained from any linear classifier.
• A variant of this approach is called recursive feature elimination (RFE; a sketch follows after these steps):
1. Compute w on all features
2. Remove the feature with the smallest |w_i|
3. Recompute w on the reduced data
4. If the stopping criterion is not met, go to step 2
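A minimal sketch of recursive feature elimination for binary labels, using scikit-learn's LinearSVC just to supply a weight vector w (any linear classifier exposing its weights would do; the target subset size `n_keep` is illustrative):

```python
import numpy as np
from sklearn.svm import LinearSVC  # any linear classifier with a coef_ attribute works

def recursive_feature_elimination(X, y, n_keep):
    """Repeatedly drop the feature whose weight has the smallest magnitude."""
    active = list(range(X.shape[1]))           # indices of surviving features
    while len(active) > n_keep:                # stopping criterion: subset size
        w = LinearSVC().fit(X[:, active], y).coef_.ravel()  # binary case: one weight per feature
        worst = int(np.argmin(np.abs(w)))      # least influential feature
        del active[worst]                      # remove it and refit on the rest
    return active
```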
Feature Extraction
Foundations of Machine Learning
Module 3: Instance Based Learning and
Feature Reduction
Part C: Feature Extraction
Sudeshna Sarkar
IIT Kharagpur
Feature extraction - definition
Feature Extraction
PCA
Geometric picture of principal components (PCs)
(sequence of figures illustrating the geometric picture of the PCs)
Algebraic definition of PCs
Given a sample of p observations on a vector of N variables

$$ \mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_p \in \mathbb{R}^N, $$

define the first principal component of the sample by the linear transformation

$$ z_1 = \mathbf{a}_1^{\mathsf T} \mathbf{x}_j = \sum_{i=1}^{N} a_{i1} x_{ij}, \qquad j = 1, 2, \ldots, p, $$

where the vector $\mathbf{a}_1 = (a_{11}, a_{21}, \ldots, a_{N1})^{\mathsf T}$ and $\mathbf{x}_j = (x_{1j}, x_{2j}, \ldots, x_{Nj})^{\mathsf T}$, with $\mathbf{a}_1$ chosen such that $\mathrm{var}[z_1]$ is maximum.
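Writing S for the N × N sample covariance matrix (a symbol not used on the slide itself), var[z_1] = a_1^T S a_1; maximizing this subject to the usual normalization a_1^T a_1 = 1 gives the standard eigenvalue problem

$$ \mathbf{S}\,\mathbf{a}_1 = \lambda_1 \mathbf{a}_1, $$

so a_1 is the eigenvector of S with the largest eigenvalue λ_1, and var[z_1] = λ_1.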
PCA
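A minimal NumPy sketch of PCA along these lines (the function name `pca` and the choice to return both the projected data and the top-k directions are illustrative, not from the slides):

```python
import numpy as np

def pca(X, k):
    """Project the rows of X onto the k directions of maximum variance."""
    Xc = X - X.mean(axis=0)                          # centre the data
    S = np.cov(Xc, rowvar=False)                     # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)             # eigenvalues in ascending order
    top = eigvecs[:, np.argsort(eigvals)[::-1][:k]]  # k eigenvectors with largest eigenvalues
    return Xc @ top, top                             # scores z and directions a_1..a_k
```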
PCA for image compression
(figure: reconstructions using p = 1, 2, 4, 8, 16, 32, 64, and 100 principal components, alongside the original image)
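A sketch of the compression idea, assuming the hypothetical `pca` helper above and treating the rows of a 2-D image array as samples; keeping only p components gives the kind of approximations shown in the figure:

```python
def reconstruct(X, p):
    """Approximate X from its top-p principal components."""
    mean = X.mean(axis=0)
    scores, directions = pca(X, p)        # (X - mean) projected onto the top-p directions
    return scores @ directions.T + mean   # back-project and add the mean back
```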
Is PCA a good criterion for classification?
• Data variation determines the projection direction
• What's missing?
– Class information
What is a good projection?
• Similarly, what is a good criterion?
– Separating the different classes
(figures: a projection where the two classes overlap vs. one where the two classes are separated)
What class information may be useful?
• Between-class distance
– Distance between the centroids of different classes
(figure: between-class distance)
What class information may be useful?
• Between-class distance
– Distance between the centroids of different classes
• Within-class distance
– Accumulated distance of each instance to the centroid of its class
• Linear discriminant analysis (LDA) finds the most discriminant projection by
– maximizing the between-class distance
– and minimizing the within-class distance
(figure: within-class distance)
Linear Discriminant Analysis
Means and Scatter after projection
Good Projection
• Means are as far apart as possible
• Scatter is as small as possible
• Fisher Linear Discriminant
$$ J(\mathbf{w}) = \frac{|m_1 - m_2|^2}{s_1^2 + s_2^2}, $$

where m_1, m_2 are the projected class means and s_1^2, s_2^2 the projected within-class scatters.
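Maximizing J(w) has the well-known closed form w ∝ S_W^{-1}(m_1 − m_2), with S_W the within-class scatter matrix; a minimal NumPy sketch for two classes (the function name is illustrative, and S_W is assumed invertible):

```python
import numpy as np

def fisher_direction(X, y):
    """Fisher discriminant direction w proportional to S_W^{-1} (m1 - m2) for binary labels y."""
    X1, X2 = X[y == 0], X[y == 1]
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter: sum of the per-class scatter matrices.
    Sw = (np.cov(X1, rowvar=False) * (len(X1) - 1)
          + np.cov(X2, rowvar=False) * (len(X2) - 1))
    return np.linalg.solve(Sw, m1 - m2)   # assumes Sw is non-singular
```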
Multiple Classes