Feature selection
Feature selection is the process of identifying and selecting a subset of relevant features (data columns) from your dataset, i.e., the ones that help a machine learning model make better predictions.
It helps make the model:
• Faster to train
• More accurate
• Easier to understand
Why Feature Selection Is Important:
1. Improves Accuracy: Reducing irrelevant features can lead to better model performance.
2. Speeds Up Training: Fewer features reduce model complexity and training time.
3. Enhances Interpretability: Simpler models are easier to understand and explain.
4. Reduces Overfitting: Less redundant data means less chance for the model to make decisions based on noise.
Steps in Feature Engineering:
1. Creating new features
2. Transforming features (e.g., scaling, encoding)
3. Selecting important features
4. Reducing the number of features (dimensionality reduction)
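Example (Python): a minimal sketch of step 2 (transforming features) using scikit-learn; the DataFrame and its column names (age, city, buys) are made up purely for illustration.

import pandas as pd
from sklearn.preprocessing import StandardScaler, OneHotEncoder

df = pd.DataFrame({
    "age": [22, 35, 58, 41],
    "city": ["Pune", "Delhi", "Pune", "Mumbai"],
    "buys": [0, 1, 1, 0],
})

# Transforming features: scale the numeric column, one-hot encode the categorical one.
age_scaled = StandardScaler().fit_transform(df[["age"]])
city_encoded = OneHotEncoder().fit_transform(df[["city"]]).toarray()

print(age_scaled.ravel())   # standardized ages (mean 0, unit variance)
print(city_encoded)         # one 0/1 column per city: Delhi, Mumbai, Pune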
Types of Feature Selection Methods:
1. Filter Methods (Like Pre-checking Features)
• Think of this like doing a quick check before training a model.
• It looks at each feature on its own to see if it has a strong relationship with the target (the thing you're trying
to predict).
• Example: "Does age affect whether someone buys a product?"
Tools used: correlation, chi-square test, etc.
Good: Fast and simple
Bad: Doesn't look at how features work together
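Example (Python): a minimal filter-method sketch using scikit-learn's SelectKBest with the chi-square score on the built-in Iris dataset (chi-square expects non-negative feature values, which Iris happens to satisfy); each feature is scored on its own and only the top 2 are kept.

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)             # 4 features, 3 classes
selector = SelectKBest(score_func=chi2, k=2)  # keep the 2 highest-scoring features
X_selected = selector.fit_transform(X, y)

print(selector.scores_)    # one chi-square score per feature
print(X_selected.shape)    # (150, 2) -> only the 2 best features remain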
2. Wrapper Methods (Like Trying Different Combinations)
• These methods test many combinations of features by actually training models.
• They pick the combination that works best for prediction.
• Like trying different outfits to see which one fits best!
Tools used: Recursive Feature Elimination (RFE), forward selection, backward elimination
Good: Very accurate
Bad: Slower and takes more computer power
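Example (Python): a minimal wrapper-method sketch with Recursive Feature Elimination (RFE), which repeatedly trains a model and drops the weakest feature until the requested number remains; the choice of logistic regression here is just one reasonable option.

from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=2)
rfe.fit(X, y)

print(rfe.support_)   # True/False mask of the features that were kept
print(rfe.ranking_)   # 1 = selected; larger numbers were eliminated earlier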
3. Embedded Methods (Built Into the Model)
• These methods let the model pick the best features while it's being trained.
• It's like cooking and choosing the best ingredients at the same time.
Tools used: Lasso (L1), Decision Trees, Random Forest
Good: A nice balance between speed and accuracy
Bad: Tied to specific types of models
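Example (Python): a minimal embedded-method sketch with Lasso (L1) regression on synthetic data; the L1 penalty shrinks the coefficients of unhelpful features to exactly zero while the model trains, so the non-zero coefficients are the "selected" features.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data: 10 features, but only 3 actually drive the target.
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)
lasso = Lasso(alpha=1.0).fit(X, y)

print(lasso.coef_)                  # most coefficients end up at (or near) 0
print(np.flatnonzero(lasso.coef_))  # indices of the features Lasso kept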
REQUIREMENT AND CLASSIFICATION OF FEATURE ENGINEERING
➤ Why is it needed?
• Raw data often contains irrelevant or redundant features.
• Helps models learn better by focusing on useful data.
➤ Classification:
1. Feature Selection
This means choosing only the most useful features from the data and removing the rest.
• Why? Some features may not help the model and can even make it worse.
• How?
o Filter Methods
o Wrapper Methods
o Embedded Methods
2. Feature Extraction
This means creating new features from the existing ones.
• Why? Sometimes combining or transforming old features can give better insights.
• Example: PCA (Principal Component Analysis) combines many features into fewer ones that still keep most
of the information.
3. Dimensionality Reduction
This means reducing the number of features while keeping the important information.
• Why?
o Makes models faster to train.
o Easier to visualize the data.
o Reduces overfitting (when a model learns too much from noise in the data).
• Common Techniques: PCA, t-SNE, LDA.
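Example (Python): a minimal dimensionality-reduction sketch for visualization, projecting the 4-dimensional Iris data down to 2 dimensions with t-SNE so it can be plotted.

from sklearn.datasets import load_iris
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

X, y = load_iris(return_X_y=True)
X_2d = TSNE(n_components=2, random_state=0).fit_transform(X)

plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y)   # color each point by its class
plt.title("Iris projected to 2-D with t-SNE")
plt.show()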
Univariate Analysis
➤ What is it?
Univariate analysis means analyzing one variable at a time.
• It helps us understand the basic characteristics of that variable.
• Useful to see how data is spread and whether there are any unusual values (outliers).
Why is it important?
• Helps find central values (like average).
• Shows variability (how spread out the values are).
• Makes it easier to decide how to clean or transform the data.
Common Techniques & Tools
• Mean, Median, Mode: Show the central tendency.
• Histogram: Shows the distribution (how frequently values appear).
• Box Plot: Shows spread, median, and outliers.
• Standard Deviation & Range: Show how spread out the values are.
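Example (Python): a minimal univariate-analysis sketch on a single made-up column of exam scores, computing the central tendency and spread and drawing a histogram and a box plot.

import pandas as pd
import matplotlib.pyplot as plt

scores = pd.Series([55, 62, 67, 70, 71, 73, 73, 78, 80, 98], name="exam_score")

print(scores.mean(), scores.median(), scores.mode()[0])  # central tendency
print(scores.std(), scores.max() - scores.min())         # spread (std dev and range)

scores.plot(kind="hist", title="Distribution of exam scores")
plt.figure()
scores.plot(kind="box", title="Box plot (the 98 shows up as a possible outlier)")
plt.show()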
CORRELATION BASED FEATURE SELECTION (CFS)
➤ What is it?
• Selects features that are highly correlated with the target variable but not with each other.
Method: Chi-Square Test
• Used for categorical features.
• Measures the relationship between two variables.
• Large value → Strong relationship between feature and target.
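Example (Python): a minimal chi-square sketch for two categorical variables using scipy; the "gender" and "buys" columns are made up purely for illustration. A large statistic (small p-value) suggests the feature and the target are related.

import pandas as pd
from scipy.stats import chi2_contingency

df = pd.DataFrame({
    "gender": ["M", "F", "F", "M", "F", "M", "F", "M"],
    "buys":   ["yes", "yes", "yes", "no", "yes", "no", "no", "no"],
})

table = pd.crosstab(df["gender"], df["buys"])   # contingency table of counts
stat, p_value, dof, expected = chi2_contingency(table)

print(table)
print(f"chi-square = {stat:.2f}, p-value = {p_value:.3f}")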
CFS - Heatmap (Correlation Feature Selection Heatmap)
➤ What is it?
A CFS Heatmap is a visual tool used to show how strongly features in a dataset are related (correlated) to each other.
• It uses color intensity to represent the strength of correlation between pairs of features.
How to Read It:
• More intense colors = stronger correlation (positive or negative), on a typical color scale.
• Paler colors = weaker or no correlation.
• Values usually range from -1 to 1:
o +1 = Strong positive correlation (both features increase together)
o -1 = Strong negative correlation (one increases while the other decreases)
o 0 = No correlation
Why is it useful?
• Helps in Feature Selection: Identify and remove highly correlated (redundant) features.
• Makes complex relationships easy to understand visually.
• Often used before model building to clean up and simplify data.
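Example (Python): a minimal correlation-heatmap sketch, computing the pairwise correlation matrix of the Iris features and drawing it as a color-coded grid with seaborn.

import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
corr = iris.data.corr()   # pairwise correlations, values between -1 and +1

sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
plt.title("Feature correlation heatmap")
plt.show()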
FEATURE EXTRACTION – PCA (Principal Component Analysis)
➤ What is PCA?
PCA is a technique to reduce the number of features by combining them into new variables (principal components)
that still capture most of the information (variance).
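Example (Python): a minimal PCA sketch, compressing the 4 Iris features into 2 principal components and checking how much of the original variance they keep; features are standardized first because PCA is sensitive to scale.

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

print(X_pca.shape)                    # (150, 2) -> 4 features became 2
print(pca.explained_variance_ratio_)  # share of variance kept by each component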
Dimensionality Reduction in Machine Learning
Dimensionality reduction is the process of reducing the number of input variables (features) in a dataset while
keeping as much important information as possible.
It helps simplify the data, make models faster, reduce overfitting, and improve visualization.
• Real-world data often has too many features (high-dimensional data).
• More features = more complexity, harder to visualize and slower to train.
• Some features may be irrelevant, redundant, or noisy.
Example:
Imagine you're analyzing students with these features:
• Age
• Height
• Weight
• Shoe Size
• Shirt Size
• Grade
• Exam Score
Many of these features are correlated (e.g., height and shoe size). Dimensionality reduction can combine them into
fewer features without losing useful information.
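Example (Python): a hypothetical sketch of the student scenario above, with made-up numbers; height, weight, and shoe size are generated to be strongly correlated, so PCA can fold the four columns into just two components while keeping most of the information.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
height = rng.normal(165, 10, 300)
weight = 0.9 * height + rng.normal(0, 5, 300)   # correlated with height
shoe = 0.25 * height + rng.normal(0, 1, 300)    # correlated with height
exam = rng.normal(70, 12, 300)                  # unrelated to body measurements

X = np.column_stack([height, weight, shoe, exam])
X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=2).fit(X_scaled)
print(pca.explained_variance_ratio_.sum())  # most of the variance survives in just 2 components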