Feature Selection
• Feature selection is a procedure used in machine learning to find a subset of
  features that produces a good model for the given dataset while:
• Avoiding overfitting
• Achieving better generalization ability
• Reducing storage
• Reducing training time
Why Dimensionality Reduction?
• It is easy and convenient to collect data
• Data accumulates at an unprecedented speed
• Data preprocessing is an important part of effective machine learning and data
  mining
• Most machine learning and data mining techniques may not be effective for
  high-dimensional data
  • Curse of Dimensionality
  • Query accuracy and efficiency degrade rapidly as the dimension increases
• The intrinsic dimension may be small
  • For example, the number of genes responsible for a certain type of disease may be small
• Dimensionality reduction is an effective approach to downsizing data
Why Dimensionality Reduction?
• Visualization: projection of high-dimensional data onto 2D or 3D.
• Data compression: efficient storage and retrieval.
• Noise removal: positive effect on query accuracy.
Applications of Dimensionality Reduction
• Customer relationship management
• Text mining
• Image retrieval
• Microarray data analysis
• Protein classification
• Face recognition
• Handwritten digit recognition
• Intrusion detection
Document Classification

[Figure: documents collected from the Internet and from digital libraries such as ACM Portal, IEEE Xplore, and PubMed (web pages, emails, articles) are represented as a term-document matrix, with class labels such as Sports, Travel, and Jobs:]

            T1   T2   ……   TN
   D1       12    0   ……    6
   D2        3   10   ……   28
   …
   DM        0   11   ……   16

■ Task: to classify unlabeled documents into categories
■ Challenge: thousands of terms
■ Solution: apply dimensionality reduction
Other Types of High-Dimensional Data

[Figure: face images and handwritten digits.]
Major Techniques of Dimensionality Reduction
1. Feature selection
2. Feature extraction (reduction)
Feature Selection vs. Feature Extraction
• Feature selection
  • A process that chooses an optimal subset of features according to an objective
    function (only a subset of the original features is selected)
• Feature extraction/reduction
  • All original features are used
  • The transformed features are linear combinations of the original features
1. Feature Selection
• Feature or Variable Selection refers to the process of selecting the features that are used in
  predicting the target or output.
• The purpose of Feature Selection is to select the features that contribute the most to output
  prediction.
• The following line, from the abstract of a journal article on variable selection, sums up the
  purpose of Feature Selection:
The objective of variable selection is three-fold: improving the prediction performance of
the predictors, providing faster and more cost-effective predictors, and providing a better
            understanding of the underlying process that generated the data.
• The following benefits of Feature Selection are usually quoted:
  • Reduces overfitting
  • Improves accuracy
  • Reduces training time
Feature Selection Models
Feature selection methods are commonly categorized into filter, wrapper, and intrinsic (embedded) methods.
a. Filter Method
• Filter Methods
  • Filter methods select features based on their statistical scores with respect to the
    output column.
  • The filter method ranks each feature using some univariate metric and then
    selects the highest-ranking features.
  • The selection of features is independent of any machine learning algorithm.
  • Two rules of thumb:
    • The more a feature is correlated with the output column (the column to be
      predicted), the better the performance of the model.
    • Features should be least correlated with each other. If some of the input features
      are correlated with other input features, this situation is known as
      multicollinearity. It is recommended to remove it for better performance of the model.
a. Filter Method
Filter methods select independent features with:
  • No constant variables
  • No/few quasi-constant variables
  • No duplicate rows
  • High correlation with the target variable
  • Low correlation with other independent variables
  • Higher information gain or mutual information with the target
• mRMR score:
  • Selecting an optimal feature subset from a large feature space is a challenging problem.
  • The mRMR (Minimum Redundancy and Maximum Relevance) feature selection framework solves this
    problem by selecting relevant features while controlling for the redundancy within the selected features.
  • The mRMR feature selection method is used in different classification problems, such as the marketing
    machine learning platform at Uber, which automates the creation and deployment of targeting and
    personalization models at scale.
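The following is a minimal, illustrative sketch of the mRMR idea (not Uber's production implementation): at each step it greedily picks the feature whose relevance to the target (mutual information) minus its average redundancy (absolute correlation) with the already-selected features is largest. The dataset and column names are synthetic.

  # Greedy mRMR-style selection: relevance = mutual information with the target,
  # redundancy = mean absolute correlation with the already-selected features.
  import pandas as pd
  from sklearn.datasets import make_classification
  from sklearn.feature_selection import mutual_info_classif

  X, y = make_classification(n_samples=500, n_features=10, n_informative=4, random_state=0)
  X = pd.DataFrame(X, columns=[f"f{i}" for i in range(10)])

  relevance = pd.Series(mutual_info_classif(X, y, random_state=0), index=X.columns)
  corr = X.corr().abs()

  selected, candidates, k = [], list(X.columns), 5
  for _ in range(k):
      scores = {}
      for f in candidates:
          redundancy = corr.loc[f, selected].mean() if selected else 0.0
          scores[f] = relevance[f] - redundancy      # mRMR criterion (difference form)
      best = max(scores, key=scores.get)
      selected.append(best)
      candidates.remove(best)

  print("Selected features:", selected)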
a. Filter Method
• Removing features with low variance (low-variance filter)
  • Variance Threshold is a simple baseline approach to Feature Selection.
  • It removes all features whose variance doesn't meet some threshold.
  • In the simplest case this removes features that have the same value in all rows
    (zero-variance features), or in a specified fraction of rows.
  • Such features provide no value in building the machine learning predictive model.
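A minimal scikit-learn sketch of this low-variance filter on a toy matrix whose first column is constant; the threshold value here is just an example.

  from sklearn.feature_selection import VarianceThreshold
  import numpy as np

  X = np.array([[0, 2.0, 0.1],
                [0, 1.5, 0.2],
                [0, 3.0, 0.1],
                [0, 2.5, 0.3]])   # first column is constant (zero variance)

  selector = VarianceThreshold(threshold=0.0)   # drop features with zero variance
  X_reduced = selector.fit_transform(X)
  print(selector.get_support())   # [False  True  True]
  print(X_reduced.shape)          # (4, 2)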
Example: rental bikes
Predict the count of bikes that have been rented.
Example: rental bikes
❑ Target variable: count of bikes
❑ Total 6 variables or columns
❑ First drop the ID variable
❑ Apply the low-variance filter and try to reduce the dimensionality of the data:
  1. Normalize the data
  2. Compute the variance of each feature
  3. Set a variance threshold (here, 0.006)
  4. Select the features whose variance is greater than the set threshold
Example: rental bikes
4. Select the features whose variance is greater than the set threshold.
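A short pandas sketch of these four steps. The bike-rental values and column names below are hypothetical stand-ins (the actual dataset is not reproduced here); features are min-max normalized so that their variances are comparable before the 0.006 threshold is applied.

  import pandas as pd

  # Hypothetical bike-rental data; 'count' is the target and 'ID' is dropped first.
  df = pd.DataFrame({
      "ID": range(1, 7),
      "temperature": [9.8, 12.1, 14.3, 15.0, 14.9, 15.1],
      "humidity":    [81, 77, 70, 66, 64, 63],
      "windspeed":   [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],   # constant column
      "count":       [120, 150, 210, 230, 225, 240],
  })
  X = df.drop(columns=["ID", "count"])

  # 1. Min-max normalize each feature (a constant column has zero range, hence the fillna)
  X_norm = ((X - X.min()) / (X.max() - X.min())).fillna(0.0)

  # 2./3./4. Compute variances and keep features above the threshold
  variances = X_norm.var()
  threshold = 0.006
  selected = variances[variances > threshold].index.tolist()
  print(variances.round(4))
  print("Selected features:", selected)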
How to choose a feature selection method?
B. Pearson's correlation coefficient
• A Pearson correlation is a number between -1 and 1 that indicates the
  extent to which two variables are linearly related.
• The Pearson correlation is also known as the "product moment
  correlation coefficient" (PMCC) or simply "correlation".
• Pearson correlations are suitable only for metric variables.
• The correlation coefficient takes values between -1 and 1:
  • A value closer to 0 implies weaker correlation (exactly 0 implies no linear correlation)
  • A value closer to 1 implies stronger positive correlation
  • A value closer to -1 implies stronger negative correlation
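A small pandas sketch (on synthetic data) of ranking features by the absolute value of their Pearson correlation with the target, following the rules of thumb above.

  import numpy as np
  import pandas as pd

  rng = np.random.default_rng(0)
  n = 200
  df = pd.DataFrame({
      "x1": rng.normal(size=n),
      "x2": rng.normal(size=n),
      "x3": rng.normal(size=n),
  })
  df["y"] = 3 * df["x1"] - 2 * df["x2"] + rng.normal(scale=0.5, size=n)   # x3 is irrelevant

  # Pearson correlation of each feature with the target, ranked by absolute value
  corr_with_target = df.corr()["y"].drop("y").abs().sort_values(ascending=False)
  print(corr_with_target)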
Mpg data set: heat map
• It is seen that the variables cyl and disp are highly correlated with each
  other (0.902033).
• Hence we compare them with the target variable: mpg is more highly correlated
  with cyl, so we keep that one and drop the other.
• The same process is then repeated with the remaining variables, one by one,
  until the last variable.
• We are left with four features: wt, qsec, gear, carb.
• These are the final features given by the Pearson correlation filter.
• Multicollinearity: Variance Inflation Factor (VIF)
B. Pearson's correlation coefficient
• Example: A researcher in a scientific foundation wished to evaluate the relation between
  annual salaries of mathematicians (Y, in thousand dollars) and an index of work quality
  (X1), number of years of experience (X2), and an index of publication success (X3).

  X1 (work quality)   X2 (years of experience)   X3 (publication success)   Y (annual salary, $K)
  3.5                 9                          6.1                        33.2
  5.1                 18                         7.4                        38.7
  6.0                 13                         5.9                        37.5
  3.1                 5                          5.8                        30.1
  4.5                 25                         5.0                        38.2

  Find the correlation matrix for the given data set.
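One way to compute the requested correlation matrix is shown below (a pandas sketch; the numbers are exactly those from the table above).

  import pandas as pd

  data = pd.DataFrame({
      "X1_work_quality":        [3.5, 5.1, 6.0, 3.1, 4.5],
      "X2_years_experience":    [9, 18, 13, 5, 25],
      "X3_publication_success": [6.1, 7.4, 5.9, 5.8, 5.0],
      "Y_annual_salary":        [33.2, 38.7, 37.5, 30.1, 38.2],
  })

  # 4x4 Pearson correlation matrix of the three predictors and the target
  print(data.corr(method="pearson").round(3))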
C. Mutual Information (Information Gain)
• Information gain is a measure of how much information a feature provides
  about a class.
• The feature having the most information is considered important by the
  algorithm and is used for training the model.
• The effort is to reduce the entropy and maximize the information gain.
• Information gain helps to determine the order of attributes in the nodes of
  a decision tree.
• Information gain is used in decision trees and random forests to decide the
  best split.
C. Mutual Information
• A more robust approach is to use Mutual Information, which can be
  thought of as the reduction in uncertainty about one random variable given
  knowledge of another.
• Entropy of variable X
• Entropy of X after observing Y
• Information gain
(The formulas for these three quantities are given below.)
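The formulas themselves are not reproduced on the slide; the standard definitions (with base-2 logarithms) are:

  H(X) = -\sum_x P(x) \log_2 P(x)

  H(X \mid Y) = -\sum_y P(y) \sum_x P(x \mid y) \log_2 P(x \mid y)

  IG(X \mid Y) = H(X) - H(X \mid Y)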
Entropy
• Entropy is an information-theoretic metric that measures the impurity or uncertainty in a group of
  observations.
• Entropy is the uncertainty or randomness in the data; the more the randomness, the higher the entropy.
  Information gain uses entropy to make decisions.
• The lower the entropy, the more information is gained.
• Higher entropy implies greater uncertainty or lack of predictability.
Example #2: Illustrative Data Set
[Table: sunburn data.]
Exercise: Which attribute has the highest information gain for this data?
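To answer this kind of question, the information gain of each attribute can be computed directly. Below is a small self-contained sketch; the mini data set is an illustrative stand-in, not the actual sunburn table.

  import math
  from collections import Counter

  def entropy(labels):
      n = len(labels)
      return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

  def information_gain(attribute_values, labels):
      # IG = H(labels) - sum over values v of P(v) * H(labels | attribute = v)
      n = len(labels)
      remainder = 0.0
      for v in set(attribute_values):
          subset = [lab for a, lab in zip(attribute_values, labels) if a == v]
          remainder += (len(subset) / n) * entropy(subset)
      return entropy(labels) - remainder

  # Illustrative stand-in data (not the actual sunburn table)
  hair   = ["blonde", "blonde", "brown", "red", "brown", "blonde"]
  lotion = ["no", "yes", "no", "no", "yes", "no"]
  burned = ["yes", "no", "no", "yes", "no", "yes"]

  for name, attr in [("hair", hair), ("lotion", lotion)]:
      print(name, round(information_gain(attr, burned), 3))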
Filter Method
• Univariate Selection Methods
• Univariate feature selection methods select the best features based on
  univariate statistical tests. Scikit-learn provides the following univariate
  feature selection methods:
  • SelectKBest
  • SelectPercentile
• These two are the most commonly used methods.
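A minimal SelectKBest sketch on a built-in dataset; the scoring function (ANOVA F-test) and k=2 are just example choices, and SelectPercentile is used the same way with a percentile instead of k.

  from sklearn.datasets import load_iris
  from sklearn.feature_selection import SelectKBest, f_classif

  X, y = load_iris(return_X_y=True)

  # Keep the 2 features with the highest ANOVA F-scores against the class label
  selector = SelectKBest(score_func=f_classif, k=2)
  X_new = selector.fit_transform(X, y)
  print(selector.get_support())   # boolean mask of the selected columns
  print(X_new.shape)              # (150, 2)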
Filter Method
• Advantages of filter methods
  • Filter methods are model-agnostic
  • They rely entirely on features in the dataset
  • They are computationally very fast
  • They are based on different statistical methods
• Disadvantages of filter methods
  • The filter method looks at individual features to identify their relative importance.
    A feature may not be useful on its own but may be an important influencer when
    combined with other features. Filter methods may miss such features.
  • One thing that should be kept in mind is that the filter method does not remove
    multicollinearity. So, you must deal with the multicollinearity of features as well
    before training models for your data.
2. Wrapper Methods (Feature Selection)
• Wrapper Methods: In Wrapper Methods the problem of Feature
  Selection is reduced to a search problem.
    • A model is built using a set of features and its accuracy is recorded.
    • Based on the accuracy, more features are added or removed, and
      the process is repeated.
2. Wrapper Methods (Feature Selection)
• Wrapper methods include the following:
   • Forward Selection
   • Backward Elimination
   • Exhaustive Search
   • Recursive Feature Elimination
Wrapper Methods
• A. Forward Selection
  • Forward Selection is an iterative method.
  • In this method, we start with one feature and keep adding
    features until no improvement in the model is observed.
  • The search is stopped after a pre-set criterion is met.
  • This is a greedy approach: at each step it adds the feature that gives
    the largest boost to performance.
  • If the number of features is large, it can be computationally
    expensive.
• B. Backward Elimination
  • This process is the opposite of the Forward Selection method.
  • It starts with all the features and keeps removing features
    until no improvement is observed.
Feature Selection: Wrapper Methods
• C. Exhaustive Search
  • This feature selection method tries all possible combinations of features to
    select the best model.
  • This method is quite computationally expensive.
  • For example, if we have five features, we will be evaluating 2^5 = 32 models
    before finalizing a model with good accuracy.
• D. Recursive Feature Elimination
  • Recursive Feature Elimination (RFE) involves the following steps:
  • We train the model on the initial set of features and the importance of each
    feature is calculated.
  • In the second iteration, a model is built again using the most important features
    and excluding the least important features.
  • These steps are repeated recursively until we are left with the most important
    features for the problem under consideration.
  • The scikit-learn library provides a function for Recursive Feature Elimination
    (see the sketch below).
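A hedged sketch of Recursive Feature Elimination with scikit-learn's RFE; the estimator, synthetic dataset, and number of features to keep are arbitrary example choices.

  from sklearn.datasets import make_classification
  from sklearn.feature_selection import RFE
  from sklearn.linear_model import LogisticRegression

  X, y = make_classification(n_samples=300, n_features=8, n_informative=3, random_state=0)

  # Recursively drop the least important feature until 3 remain
  rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=3)
  rfe.fit(X, y)
  print(rfe.support_)   # True for the selected features
  print(rfe.ranking_)   # 1 = selected; larger ranks were eliminated earlier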
Wrapper-based model: forward feature selection
• Fitness level prediction
• The first step in Forward Feature Selection is to train n models, using each
  feature individually, and check the performance.
• If we have three independent variables, we will train three models, one
  using each of these three features individually.
Forward feature selection: Example
• Let's say we trained the model using the Calories_Burnt feature and the target
  variable, Fitness_Level, and we get an accuracy of 87%.
Forward feature selection: Example cont.
• Next, we train the model using the Gender feature, and we get an accuracy
  of 80%.
Example cont.
• Next, we repeat this process and add one variable at a time. Of course we
  keep the Calories_Burnt variable and keep adding one variable. Taking
  Gender here, we get an accuracy of 88%.
Example cont.
• Using Plays_Sport along with Calories_Burnt, we get an accuracy of 91%.
  The variable that produces the highest improvement is retained.
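The forward procedure walked through above can be reproduced with scikit-learn's SequentialFeatureSelector. The data below is a synthetic stand-in for the fitness example (only the column names Calories_Burnt, Gender, and Plays_Sport come from the slides); passing direction="backward" would perform backward elimination instead.

  import numpy as np
  import pandas as pd
  from sklearn.feature_selection import SequentialFeatureSelector
  from sklearn.linear_model import LogisticRegression

  rng = np.random.default_rng(0)
  n = 300
  X = pd.DataFrame({
      "Calories_Burnt": rng.normal(2000, 300, n),
      "Gender": rng.integers(0, 2, n),
      "Plays_Sport": rng.integers(0, 2, n),
  })
  # Synthetic Fitness_Level driven mainly by Calories_Burnt and Plays_Sport
  y = ((X["Calories_Burnt"] > 2000).astype(int) + X["Plays_Sport"] >= 1).astype(int)

  # Greedily add features (starting from none) until 2 are selected
  sfs = SequentialFeatureSelector(LogisticRegression(max_iter=1000),
                                  n_features_to_select=2, direction="forward", cv=5)
  sfs.fit(X, y)
  print(list(X.columns[sfs.get_support()]))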
Feature Selection (Wrapper model): Backward Feature Elimination
• These are our assumptions:
  • No missing values in the dataset
  • Variance of the variables is high
  • Low correlation between the independent variables
Backward Feature Elimination: Example
• Fitness level prediction
• The first step is to train the model using all the variables.
• Of course, we do not use the ID variable to train the model, since ID
  contains a unique value for each observation.
• So we first train the model using the other three independent variables
  and, of course, the target variable, which is Fitness_Level.
• We get an accuracy of 92% using all three independent variables.
Backward Feature Elimination: Example
• Gender produces the smallest change in model performance: accuracy was 92%
  when we took all the variables, and 91.6% when we dropped Gender. So we can
  infer that Gender does not have a high impact on the Fitness_Level variable,
  and hence it can be dropped.
• Finally, we repeat all these steps until no more variables can be dropped.
• It is a very simple but very effective technique.
3. Intrinsic Methods (Feature Selection)
• Intrinsic or Embedded Methods
  • Embedded methods learn which features contribute the most to the model's
    performance while the model is being created. You have seen feature selection
    methods in the previous lessons, and several more, like decision-tree-based
    methods, will be discussed in future lessons.
  • Ridge Regression (L2 regularization)
  • Lasso Regression (L1 regularization)
  • Elastic-Net Regression (uses both L1 and L2 regularization)
  • Decision-tree-based methods (Decision Tree classification, Random Forest
    classification, XGBoost classification, LightGBM)
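A hedged sketch of one embedded approach: Lasso (L1) regression combined with scikit-learn's SelectFromModel, which keeps only the features whose coefficients survive the L1 penalty. The synthetic dataset and the alpha value are example choices.

  from sklearn.datasets import make_regression
  from sklearn.feature_selection import SelectFromModel
  from sklearn.linear_model import Lasso
  from sklearn.preprocessing import StandardScaler

  X, y = make_regression(n_samples=300, n_features=10, n_informative=3, noise=5.0, random_state=0)
  X = StandardScaler().fit_transform(X)

  # The L1 penalty zeroes out coefficients of uninformative features;
  # SelectFromModel keeps the features with non-zero coefficients.
  selector = SelectFromModel(Lasso(alpha=1.0))
  selector.fit(X, y)
  print(selector.get_support())
  print(selector.transform(X).shape)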
Variance Inflation Factor (VIF)
• Multicollinearity is a statistical phenomenon that occurs when two or more independent
  variables in a regression model are highly correlated with each other.
• It is a challenge in regression analysis because it becomes difficult to determine the
  individual effect of each independent variable on the dependent variable accurately.
• VIF measures the strength of the correlation between the independent variables.
• The VIF score of an independent variable represents how well that variable is explained
  by the other independent variables.
VIF
• VIF ranges from 1 to ∞
• VIF = 1: no correlation between the independent variable and the other variables
• VIF between 1 and 5: variables are moderately correlated
• VIF greater than 5: variables are highly correlated
• VIF exceeding 10 indicates significant multicollinearity that needs to be corrected
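For feature i, VIF_i = 1 / (1 - R_i^2), where R_i^2 comes from regressing feature i on the remaining independent variables. A short statsmodels sketch on synthetic data follows; a constant column is added because VIF assumes a regression with an intercept, and x2 is deliberately made almost collinear with x1.

  import numpy as np
  import pandas as pd
  import statsmodels.api as sm
  from statsmodels.stats.outliers_influence import variance_inflation_factor

  rng = np.random.default_rng(0)
  n = 200
  x1 = rng.normal(size=n)
  X = pd.DataFrame({
      "x1": x1,
      "x2": x1 + rng.normal(scale=0.1, size=n),   # nearly collinear with x1
      "x3": rng.normal(size=n),
  })

  X_const = sm.add_constant(X)
  vif = pd.Series(
      [variance_inflation_factor(X_const.values, i) for i in range(1, X_const.shape[1])],
      index=X.columns,
  )
  print(vif.round(2))   # x1 and x2 should show VIF well above 5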
• One method to detect multicollinearity is to calculate the variance inflation factor
  (VIF) for each independent variable
• To fix multicollinearity, one can remove one of the highly correlated variables,
  combine them into a single variable, or use a dimensionality reduction technique
  such as principal component analysis (PCA) to reduce the number of variables while
  retaining most of the information.