FEATURE TRANSFORMATION
Feature Transformation is the process of changing, improving, or deriving features
so that a machine learning model can use them more effectively.
It mainly includes:
1. Feature Construction (building/deriving new features).
2. Feature Extraction (reducing and combining features into useful
representations).
FEATURE CONSTRUCTION
1. Encoding Categorical (Nominal) Variables
Nominal variables represent categories without any order.
Eg: gender (Male, Female), city (Paris, London, Tokyo), color (Red, Blue, Green).
Since most ML algorithms require numeric input, these must be encoded.
Techniques:
i). One-Hot Encoding
Creates a new column for each category.
Eg: Color = {Red, Blue, Green} →
Red = [1,0,0]
Blue = [0,1,0]
Green = [0,0,1]
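A minimal Python sketch of one-hot encoding with pandas (the DataFrame and the "Color" values are illustrative, not from these notes):

import pandas as pd

df = pd.DataFrame({"Color": ["Red", "Blue", "Green", "Blue"]})
one_hot = pd.get_dummies(df["Color"])   # one 0/1 column per category
print(one_hot)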
ii). Dummy Encoding
Similar to one-hot, but one column is dropped to avoid redundancy.
Eg: Color = {Red, Blue, Green} →
Red = [1,0]
Blue = [0,1]
Green = [0,0]
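Dummy encoding can be sketched with the same pandas call by dropping one column; the data is the same illustrative example as above:

import pandas as pd

df = pd.DataFrame({"Color": ["Red", "Blue", "Green", "Blue"]})
dummies = pd.get_dummies(df["Color"], drop_first=True)   # k categories -> k-1 columns
print(dummies)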
2. Encoding Categorical (Ordinal) Variables
Ordinal variables represent categories with a natural order or ranking, but the
distance between levels is not meaningful.
Eg: education level (High School < Bachelor < Master < PhD), satisfaction (Low <
Medium < High).
Techniques:
i). Label / Ordinal Encoding
Assigns integers that follow the natural order of the categories.
Eg: Size = {Small, Medium, Large} →
Small = 1, Medium = 2, Large = 3
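A small sketch with scikit-learn's OrdinalEncoder, passing the order explicitly (the Size values are illustrative; note the codes start at 0 rather than 1):

from sklearn.preprocessing import OrdinalEncoder

sizes = [["Small"], ["Large"], ["Medium"]]
enc = OrdinalEncoder(categories=[["Small", "Medium", "Large"]])  # explicit order
print(enc.fit_transform(sizes))   # Small=0.0, Medium=1.0, Large=2.0; add 1 to match the notes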
ii). Custom Scoring / Ranking
Numbers are given based on domain knowledge.
Eg: Credit Rating = {Poor, Average, Good, Excellent} →
Poor = 1, Average = 2, Good = 3, Excellent = 4
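Custom scores are usually just a hand-written mapping; a minimal pandas sketch with the credit-rating example (the values are illustrative):

import pandas as pd

ratings = pd.Series(["Good", "Poor", "Excellent", "Average"])
scores = {"Poor": 1, "Average": 2, "Good": 3, "Excellent": 4}   # domain-defined weights
print(ratings.map(scores))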
3. Transforming Numeric Features into Categories
Continuous/numeric variables are sometimes grouped into bins or categories to
simplify models or highlight meaningful ranges.
Techniques:
i). Binning / Discretization
Divides the value range into fixed intervals, either equal-width or chosen from domain knowledge.
Eg: Age →
0–17 = Child
18–59 = Adult
60+ = Senior
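A minimal sketch of fixed-range binning with pandas (the ages and bin edges are illustrative):

import pandas as pd

ages = pd.Series([5, 23, 47, 61, 80])
bins = [0, 17, 59, 120]                         # edges give (0,17], (17,59], (59,120]
labels = ["Child", "Adult", "Senior"]
print(pd.cut(ages, bins=bins, labels=labels))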
ii). Equal-Frequency Binning (Quantile Binning)
Groups data so each bin has roughly the same number of samples.
Eg: Income of 1000 people → split into 4 groups (25% each) = Low, Medium,
High, Very High.
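A sketch of equal-frequency binning with pandas qcut on synthetic incomes (the distribution is made up just to have 1000 values):

import numpy as np
import pandas as pd

incomes = pd.Series(np.random.default_rng(0).lognormal(10, 1, 1000))
labels = ["Low", "Medium", "High", "Very High"]
binned = pd.qcut(incomes, q=4, labels=labels)   # roughly 250 samples per bin
print(binned.value_counts())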
4. Text-Specific Feature Construction
Text is unstructured and must be transformed into numerical features for ML.
Techniques:
i). Bag of Words (BoW)
Represents text by counting how often each word appears, ignoring word order.
Eg: “I love AI” → {I:1, love:1, AI:1}
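A minimal Bag-of-Words sketch with scikit-learn (the documents are illustrative; note the default tokenizer drops one-letter words such as "I"):

from sklearn.feature_extraction.text import CountVectorizer

docs = ["I love AI", "AI is the future"]
vec = CountVectorizer()
counts = vec.fit_transform(docs)          # sparse document-term count matrix
print(vec.get_feature_names_out())
print(counts.toarray())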
ii). TF-IDF (Term Frequency–Inverse Document Frequency)
Weights words by how often they appear in a document and how rare they are across all documents, so distinctive words count more than common ones.
Eg: In product reviews, the word “excellent” is given higher weight than common
words like “the” or “is.”
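A TF-IDF sketch with scikit-learn on made-up review snippets:

from sklearn.feature_extraction.text import TfidfVectorizer

reviews = ["excellent product", "the product is fine", "the delivery is slow"]
vec = TfidfVectorizer()
tfidf = vec.fit_transform(reviews)
print(vec.get_feature_names_out())
print(tfidf.toarray().round(2))           # distinctive words like "excellent" get higher weight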
FEATURE EXTRACTION
1. Principal Component Analysis (PCA)
PCA is an unsupervised dimensionality reduction technique.
It creates new features (principal components) that are linear combinations of the
original features.
The first component captures the maximum variance in the data, the second
captures the next highest variance, and so on.
PCA reduces the number of features while still keeping most of the important
information.
Helps in handling multicollinearity (when two or more features are highly
correlated).
Eg 1: A dataset with 100 features can be reduced to 20 features that still explain
about 95% of the data variance.
Eg 2: If “Height” and “Weight” are strongly correlated, PCA combines them into
one new feature like “Body Size.”
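A minimal PCA sketch with scikit-learn on synthetic data (200 samples x 100 random features, so the exact number of components kept is not meaningful in itself):

import numpy as np
from sklearn.decomposition import PCA

X = np.random.default_rng(0).normal(size=(200, 100))
pca = PCA(n_components=0.95)              # keep enough components for ~95% of the variance
X_reduced = pca.fit_transform(X)
print(X_reduced.shape, pca.explained_variance_ratio_.sum())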
2. Singular Value Decomposition (SVD)
SVD is a matrix factorization technique that decomposes a matrix into three matrices
(A = U Σ Vᵀ); keeping only the largest singular values gives a compact, truncated approximation.
It helps identify hidden patterns and structure in large datasets.
Very useful when data is represented as a large matrix (e.g., documents vs words,
users vs movies).
Helps reduce high-dimensional and sparse data into fewer meaningful dimensions.
Eg 1: In a term-document matrix (rows = documents, columns = words), SVD
reduces thousands of words into a smaller set of “concepts,” which improves
search engines and topic modeling.
Eg 2: Netflix uses SVD in its recommender system. From the sparse ratings of
thousands of users and movies, SVD predicts how much a user might like a new
movie.
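A truncated-SVD sketch with scikit-learn on a synthetic term-document-style count matrix (the sizes and counts are made up):

import numpy as np
from sklearn.decomposition import TruncatedSVD

X = np.random.default_rng(0).poisson(0.1, size=(100, 1000))   # 100 "documents" x 1000 "words"
svd = TruncatedSVD(n_components=20, random_state=0)
X_concepts = svd.fit_transform(X)         # each document now lives in a 20-dim "concept" space
print(X_concepts.shape)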
3. Linear Discriminant Analysis (LDA)
LDA is a supervised dimensionality reduction method.
Unlike PCA (which ignores class labels), LDA uses class information to find
features that best separate different categories.
It projects data into a new space where the distance between different classes is
maximized and the variation within the same class is minimized.
Especially useful when the goal is classification.
Eg 1: In medical data, LDA can separate patients into “Cancer Present” vs “Cancer
Absent” groups by projecting features onto a line that maximizes class separation.
Eg 2: In face recognition, each image has thousands of pixel features. LDA
reduces them into fewer features that highlight differences between people’s faces,
making classification easier.
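A minimal LDA sketch with scikit-learn on the built-in Iris dataset (4 features, 3 classes), just to show the supervised fit_transform call:

from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)
lda = LinearDiscriminantAnalysis(n_components=2)   # at most n_classes - 1 = 2 components
X_lda = lda.fit_transform(X, y)                    # uses the class labels y, unlike PCA
print(X_lda.shape)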