L5 Dimensionality Reduction
Applications
[Workflow diagram: Data → Data pre-processing → ML model development → AI algorithm development → Optimizer → DSS, with applications such as Modelling & Simulation and APC.]
Dimensionality reduction
Why do we need dimensionality reduction?
• Dimensionality reduction is a comprehensive approach to reduce dataset complexity.
• Dimensionality reduction techniques help us overcome the problems associated with high-dimensional data:
• Mitigates the curse of dimensionality, enhancing model performance.
• Reduces overfitting by eliminating irrelevant features.
• Improves computational efficiency, speeding up training.
• Facilitates data visualization for better interpretation.
• Focuses on crucial features for clearer model insights.
Feature extraction:
Given the initial set of $N$ features
$F = \{x_1, x_2, x_3, \dots, x_N\}$,
find a projected/transformed set of $M$ new features, with $M < N$, by optimizing
$F' = \{x'_1, x'_2, x'_3, \dots, x'_M\}$
Dimensionality Reduction Techniques: Overview

Feature Selection: Remove less significant features from the data so that the model is trained only on significant features.
• Filter Method: Correlation Method, Chi-Square Test, ANOVA, Variance Inflation Factor
• Wrapper Method: Step Forward, Step Backward, Recursive Feature Elimination (RFE)
• Embedded Method: Lasso, Ridge, Elastic Net; Random Forest, XGBoost, Decision Tree algorithms

Feature Extraction: A method by which the initial set of raw data is reduced to more manageable groups for processing. These techniques will form a part of our lecture:
• PCA: Principal Component Analysis
• SVD: Singular Value Decomposition
• ICA: Independent Component Analysis
• t-SNE: t-distributed Stochastic Neighbour Embedding
• UMAP: Uniform Manifold Approximation and Projection
• LDA: Linear Discriminant Analysis
• Pearson’s Correlation: It is used as a measure for quantifying the linear dependence between two continuous variables X and Y. Its value varies from -1 to +1. Pearson’s correlation (r) is given as:
$$r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}}$$
• LDA: Linear discriminant analysis is used to find a linear combination of features that characterizes
or separates two or more classes (or levels) of a categorical variable.
• It maximizes the separation between multiple classes while minimizing the variance within
each class.
• LDA assumes that the features are normally distributed and that the classes have identical covariance matrices.
• ANOVA: ANOVA stands for Analysis of Variance.
• It is similar to LDA except that it operates on one or more categorical independent features and one continuous dependent feature.
• It provides a statistical test of whether the means of several groups are equal or not.
• Chi-Square: It is a statistical test applied to groups of categorical features to evaluate the likelihood of correlation or association between them using their frequency distribution. (A code sketch of these filter tests follows below.)
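A minimal sketch of the filter methods above using SciPy and scikit-learn. The synthetic dataset, the choice of keeping 3 features, and the shifting of features to be non-negative for the chi-square test are illustrative assumptions, not part of the slides.

```python
# Filter-method feature scoring: Pearson's r, ANOVA F-test, chi-square
import numpy as np
from scipy.stats import pearsonr
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif, chi2

X, y = make_classification(n_samples=200, n_features=6, n_informative=3,
                           random_state=0)

# Pearson's correlation of each feature with the target
for i in range(X.shape[1]):
    r, p = pearsonr(X[:, i], y)
    print(f"feature {i}: r = {r:+.2f} (p = {p:.3f})")

# ANOVA F-test: keep the 3 features with the strongest group-mean differences
X_anova = SelectKBest(score_func=f_classif, k=3).fit_transform(X, y)

# Chi-square needs non-negative (e.g., count-like) features, so shift them
X_pos = X - X.min(axis=0)
X_chi2 = SelectKBest(score_func=chi2, k=3).fit_transform(X_pos, y)

print(X_anova.shape, X_chi2.shape)   # (200, 3) (200, 3)
```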
Feature selection
Wrapper method
• Principle: Select subsets of features that contribute most to the performance of a given model. This involves searching through different combinations and evaluating their performance.
• Select a subset of features and train a model using that subset.
• Based on the performance of the subset, decide to add or remove features from your subset.
• The problem is essentially reduced to a search problem.
• These methods are usually computationally very expensive.
[Flow diagram: select all features → select the initial subset → optimization (objective function, learning algorithm) → analysis of performance.]
Forward selection
• Step 1: Start with an initial feature set and estimate the accuracy.
• Step 2: Add the remaining features one by one and estimate the accuracy, i.e. the classification/regression error.
• Step 3: Select the feature that gives the maximum improvement (using the validation set).
• Step 4: Stop when there is no improvement.

Backward selection
• Step 1: Start with the full feature set and estimate the accuracy.
• Step 2: Remove features one by one and estimate the accuracy, i.e. the classification/regression error.
• Step 3: Drop the feature that gives the minimum degradation in accuracy (using the validation set).
• Step 4: Stop when dropping further features causes a significant degradation.

Recursive Feature Elimination (RFE) follows the same backward idea: repeatedly fit the model and eliminate the weakest feature(s) until the desired number remains. (A scikit-learn sketch follows below.)
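A minimal sketch of the wrapper strategies above with scikit-learn. The estimator, dataset, cross-validation setting, and the choice of keeping 4 features are illustrative assumptions.

```python
# Wrapper methods: step-forward, step-backward, and RFE
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=10, n_informative=4,
                           random_state=0)
model = LogisticRegression(max_iter=1000)

# Step-forward selection: add features one by one while CV accuracy improves
forward = SequentialFeatureSelector(model, n_features_to_select=4,
                                    direction="forward", cv=5).fit(X, y)

# Step-backward selection: start from all features and drop them one by one
backward = SequentialFeatureSelector(model, n_features_to_select=4,
                                     direction="backward", cv=5).fit(X, y)

# Recursive Feature Elimination: repeatedly drop the weakest-coefficient feature
rfe = RFE(model, n_features_to_select=4).fit(X, y)

print(forward.get_support(), backward.get_support(), rfe.support_, sep="\n")
```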
Feature Selection
Embedded Method
• Embedded methods combine the qualities of filter and wrapper methods.
• They are implemented by algorithms that have their own built-in feature selection methods.
• Some of the most popular examples of these methods are LASSO and Ridge regression, which have in-built penalization functions to reduce overfitting.
• Lasso regression performs L1 regularization, which adds a penalty equivalent to the absolute value of the magnitude of the coefficients.
• Ridge regression performs L2 regularization, which adds a penalty equivalent to the square of the magnitude of the coefficients.
• Further details and implementation of LASSO and Ridge regression will be covered after the mid-semester. (A short sketch follows below.)
[Flow diagram: select all features → select the initial subset → optimization (objective function, learning algorithm) → analysis of performance.]
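A minimal sketch of the embedded idea above: with L1 regularization weak coefficients are driven to (or very near) zero, effectively deselecting those features, while L2 only shrinks them. The dataset and alpha values are illustrative assumptions.

```python
# Embedded methods: Lasso (L1) vs Ridge (L2) coefficient behaviour
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X, y = make_regression(n_samples=200, n_features=8, n_informative=3,
                       noise=5.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

print("Lasso coefficients:", np.round(lasso.coef_, 2))  # uninformative ones ~0 -> dropped
print("Ridge coefficients:", np.round(ridge.coef_, 2))  # shrunk but non-zero
```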
Feature Extraction
Principal component analysis (PCA)
▪ Water quality data from different locations (L1–L6): how do we identify the relation between a parameter and a location, or between parameters?

Parameter    L1    L2    L3    L4    L5    L6
pH (-)       2     6     2.4   6.5   7     7.5
TDS (g/l)    10    20    10.1  12    19    18.5
BOD (g/l)    1     0.1   1.2   0.2   0.3   0.15
COD (g/l)    1.0   0.2   1.5   0.1   0.2   0.3
Temp (°C)    23    23    21    24    25    20

▪ In the case of a single parameter, the six locations can be compared along one axis (e.g., pH).
▪ In the case of two parameters, the locations can be plotted on a 2D scatter (TDS vs pH), where L1/L3 and L2/L5/L6 form visible groups.
[Plots: 1D comparison of the locations along the pH axis; 2D scatter of TDS vs pH.]
Feature Extraction
Principal component analysis (PCA)
▪ In the case of three parameters, the locations can still be shown on a 3D scatter plot (pH, TDS, BOD).
• If there are more than three parameters, plotting and grouping may no longer be possible.
• Therefore, a higher-dimensional problem has to be reduced to a lower-dimensional problem.
• PCA helps to reduce the higher-dimensional problem to a lower dimension through principal components (PCs).
[Figures: 3D scatter plot of the six locations (pH, TDS, BOD), a different view of the same 3D scatter plot, and the 3D scatter plot of the centered data; axes annotated with explained-variance percentages (83%, 6.53%).]
Similarly for 3D
Scree Plot
• Lastly, we find PC3, the best-fitting line that goes through the origin and is perpendicular to PC1 & PC2.
[Scree plot: PC1 explains 93.39% of the variance, PC2 6.53%, and PC3 0.08%.]

With $B$ the centered data matrix and $V$ the matrix of eigenvectors:
$B = X - \bar{X}$
$\mathrm{Cov}(X) = B^{T}B$
$PC = BV$

Step-by-Step Process
• Standardize the Data: Ensure each feature contributes equally.
• Compute the covariance matrix of the standardized data, then its eigenvalues and eigenvectors, and keep the top eigenvectors as the feature vector V.
• Transform the original dataset into a lower-dimensional space using this feature vector (a NumPy sketch follows below).
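A minimal sketch of these PCA steps in NumPy, using the water quality table from the earlier slide (rows = locations L1–L6, columns = pH, TDS, BOD, COD, Temp). Scaling by the sample standard deviation and keeping two PCs are my illustrative choices; the slide's $\mathrm{Cov}(X)=B^{T}B$ differs from the line below only by the $1/(n-1)$ factor.

```python
# PCA step by step: standardize, covariance, eigen-decomposition, projection
import numpy as np

X = np.array([[2.0, 10.0, 1.00, 1.0, 23.0],
              [6.0, 20.0, 0.10, 0.2, 23.0],
              [2.4, 10.1, 1.20, 1.5, 21.0],
              [6.5, 12.0, 0.20, 0.1, 24.0],
              [7.0, 19.0, 0.30, 0.2, 25.0],
              [7.5, 18.5, 0.15, 0.3, 20.0]])

# Step 1: standardize (centre, and scale so each feature contributes equally)
B = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)     # B = X - X_bar (scaled)

# Step 2: covariance matrix of the standardized data
C = B.T @ B / (B.shape[0] - 1)

# Step 3: eigen-decomposition; sort eigenvectors by explained variance
eigvals, V = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]
print("explained variance ratio:", np.round(eigvals / eigvals.sum(), 3))

# Step 4: project onto the first two PCs: PC = B V
PC = B @ V[:, :2]
print(PC)        # each location is now described by two coordinates
```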
Application of PCA
How to segregate the image content using PCs: results like this can be achieved with PCA.
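A minimal sketch of this kind of image application: keeping only a few principal components and reconstructing the images from them. Scikit-learn's digits dataset and the choice of 10 components are stand-in assumptions, not the example from the slides.

```python
# Image content through a few PCs: compress and approximately reconstruct
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)            # 1797 images, 8x8 = 64 pixels each

pca = PCA(n_components=10).fit(X)              # keep only 10 of 64 components
X_reduced = pca.transform(X)                   # compressed representation
X_restored = pca.inverse_transform(X_reduced)  # approximate reconstruction

print("variance kept:", round(pca.explained_variance_ratio_.sum(), 3))
print("first image, original vs restored:")
print(X[0].reshape(8, 8).round(0))
print(X_restored[0].reshape(8, 8).round(0))
```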
Advantages of LDA
Efficiency: LDA is particularly useful when the datasets are linearly separable.
Simplicity: It is straightforward to implement and understand.
Performance: It often outperforms other linear classifiers, especially when the assumptions of common covariance (homoscedasticity), Gaussian distributions and statistically independent features approximately hold.
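A minimal sketch of LDA used both as a classifier and as a supervised dimensionality reduction step; the iris dataset and the train/test split are illustrative assumptions.

```python
# LDA: classify and project onto discriminant axes
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

lda = LinearDiscriminantAnalysis(n_components=2).fit(X_train, y_train)
print("test accuracy:", lda.score(X_test, y_test))

# 4 features -> 2 discriminant axes that maximize between-class separation
X_2d = lda.transform(X_train)
print(X_2d.shape)       # (n_train_samples, 2)
```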
t-SNE is a powerful tool for visualizing high-dimensional data in two or three dimensions. Unlike PCA, which
preserves global structure, t-SNE focuses on preserving local relationships between points, making it excellent
for identifying clusters or groups in the data.
How t-SNE Works:
• Neighborhoods in High Dimensions: Imagine each data point in high-dimensional space has a bubble of
neighbors around it. This bubble isn't rigid; it's more like a cloud of points that are close to it, where
"closeness" is measured by probability (similar points have a higher probability of being neighbors).
• Bringing it Down to Earth: t-SNE aims to represent these high-dimensional neighborhoods faithfully on a
2D or 3D map. It's like taking a complex constellation of stars and drawing it on paper such that stars close
together in space are also close on the paper.
• Attention to the Local: While PCA reduces dimensions by capturing the most variance across the whole
dataset, t-SNE zeroes in on the local structure. It ensures that if two points are neighbors in the high-
dimensional space, they should also be neighbors in the lower-dimensional space.
• Clusters Emerge: By focusing on these local relationships, t-SNE allows clusters of similar data points to
emerge naturally in the visualization, even if those clusters are shaped by complex, nonlinear relationships
that PCA might miss.
t-SNE
When we need to reduce a non-linear data distribution into lower dimensions, t-SNE becomes a very important dimensionality reduction technique; it is widely used, for example, to visualize the high-dimensional feature representations learned by CNNs. In this technique, we calculate a similarity score of each point with every other point, where the similarity of a target data point to another data point is the conditional probability that it would pick that point as its neighbour. That distribution of similarity scores is used to project the points into a lower dimension. Once projected onto the lower-dimensional space, the points are clustered according to the different groups they belong to.
t-SNE vs. PCA: Key Differences
• Focus: PCA captures the directions of maximum variance, useful for reducing dimensions and sometimes for visualization. t-SNE, however, excels in visualization by preserving local neighbor relationships.
• Linearity vs. Non-linearity: PCA is linear, while t-SNE is non-linear, making t-SNE better for capturing the true intricacies of the data.
• Global vs. Local: PCA looks at the big picture (global structure), while t-SNE focuses on the detailed local patterns, making it easier to identify clusters or groups.

Practical Tips for Using t-SNE
• t-SNE for Visualization: Use t-SNE when your main goal is to visualize high-dimensional data in a way that highlights clusters or groups.
• Starting with PCA: For very large datasets, it's often beneficial to first reduce the dimensionality with PCA (to about 50 dimensions) before applying t-SNE, to make the computation faster and less noisy (see the sketch after this list).
• Parameter Tuning: t-SNE has a few parameters (like perplexity) that can significantly affect the outcome. Experimenting with these can help achieve a more informative visualization.
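A minimal sketch of the tips above: PCA first, then t-SNE for a 2-D map. The digits dataset, the number of PCA components, and the perplexity value are illustrative assumptions.

```python
# PCA -> t-SNE pipeline for visualization
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)            # 64-dimensional inputs

X_pca = PCA(n_components=30).fit_transform(X)  # denoise and speed things up first
X_2d = TSNE(n_components=2, perplexity=30,
            init="pca", random_state=0).fit_transform(X_pca)

print(X_2d.shape)   # (n_samples, 2): scatter-plot this, coloured by y,
                    # and the digit classes appear as separate clusters
```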
UMAP
PCA might not work well for very complex datasets: PCA only supports such visualisations when the first two or three components capture most of the variance in the data. If that is not the case, we may not be able to visualise the data with PCA, and we use UMAP instead. UMAP is very popular because it is relatively faster than other techniques such as t-SNE. It works on a similar idea to t-SNE by calculating similarity scores, but it does not use t-SNE's probability measures; instead, it scales the characteristic similarity curve of each point so that the scores of its neighbours sum to log base 2 of the user-defined number of nearest neighbours (an important UMAP parameter), and clusters together points accordingly. Once the similarity scores are obtained, UMAP moves the points in the lower-dimensional space so that neighbouring points end up in the same cluster.
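A minimal sketch using the umap-learn package (pip install umap-learn); n_neighbors is the "number of nearest neighbours" parameter discussed above, and the dataset and parameter values are illustrative assumptions.

```python
# UMAP: non-linear dimensionality reduction for visualization
import umap
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)

reducer = umap.UMAP(n_neighbors=15, min_dist=0.1, n_components=2,
                    random_state=42)
X_2d = reducer.fit_transform(X)

print(X_2d.shape)   # (n_samples, 2); typically much faster than t-SNE
```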
Example runtime comparison on the same data: t-SNE took 7.36 sec, UMAP took 1.23 sec.
Disadvantages of Dimensionality Reduction