Data Preprocessing
Learning Objectives
Upon completion, you will be able to:
● Explain data pre-processing tasks.
● Illustrate methods to handle missing values and noisy data.
● Explain the importance of outlier removal and redundant data removal from datasets.
● List the methods for dimensionality reduction and numerosity reduction.
● Define data discretization and its methods.
● Explain data transformation and the importance of normalization.
● Demonstrate typical data pre-processing tasks in Python.
Data Quality and Data Format
Overview
Agenda
In this session, we will discuss:
● Concepts of Data Pre-processing:
○ Data Quality
○ Data Formats
○ Major Tasks in Data Pre-processing
Data Quality: Why Preprocess the Data?
Measures for data quality: a multidimensional view
● Accuracy: are recorded values correct or incorrect, accurate or not?
● Completeness: values not recorded or unavailable, missing values, important variables not included.
● Consistency: dangling references; some features are modified while related features are not.
● Interpretability: how easily the data can be understood; cryptic codes as variable names or coded values, semantic ambiguity in nominal values.
● Timeliness: is the data updated in a timely manner?
● Believability: how trustworthy the data is, as perceived by the end user.
● Evaluate all of the above to assess the data’s fitness for the task.
Data Formats: Tidy Data
1. Each variable forms a column.
2. Each observation forms a row.
3. Each type of observational unit forms a table.
        Var 1   Var 2   …      …        Var n
Obs 1   2.3     34      Yes    123.45   0.3
Obs 2   3.6     23      No     567.34   0.7
…
Obs n   5.6     56      No     112.7    0.56

● Provides a standard way of structuring a dataset.
● Makes it easier to extract the variables needed for analysis.
Data Formats: Wide Format vs Long Format
● “wide” format: consider the variables “Math” and “English”

Name   Math   English
Anna   86     90
John   43     75
Cath   80     82

● “long” format: consider the variable “Subject”

Name   Subject   Grade
Anna   Math      86
Anna   English   90
John   Math      43
John   English   75
Cath   Math      80
Cath   English   82
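The same reshaping can be done programmatically. Below is a minimal pandas sketch (not part of the original slides) that converts the grade table above between wide and long format:

import pandas as pd

# Wide format: one column per subject
wide = pd.DataFrame({
    "Name": ["Anna", "John", "Cath"],
    "Math": [86, 43, 80],
    "English": [90, 75, 82],
})

# Wide -> long: every (Name, Subject) pair becomes its own observation row
long = wide.melt(id_vars="Name", var_name="Subject", value_name="Grade")

# Long -> wide again
wide_again = long.pivot(index="Name", columns="Subject", values="Grade")
print(long)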
Data Pre-processing: Major Tasks
● Data cleaning
○ Handle missing values and noisy data, resolve inconsistencies, and identify or remove outliers
● Data integration
○ Integration of multiple databases, data cubes, or files
● Data reduction
○ Dimensionality reduction (PCA)
○ Numerosity reduction
● Data transformation
○ Normalization
○ Data discretization
○ Concept hierarchy generation
Data Pre-processing: Major Tasks
Task                        Methods
Missing values              Imputation, regression
Noisy data                  Binning, histogram analysis, regression, clustering, classification
Outliers                    Box plots, regression, clustering
Redundancy                  Correlation/covariance analysis
Dimensionality reduction    PCA, feature selection
Numerosity reduction        Sampling, histograms, clustering, data compression
Data discretization         Binning, concept hierarchy
Scale differences           Data normalization
Summary
In this session, we discussed:
● Data quality: format, accuracy, completeness, consistency, timeliness, believability, interpretability.
● Tidy data provides a standard way of structuring a dataset.
● Major pre-processing tasks: data cleaning, data integration, data reduction, and data transformation.
Tasks and Methods:
Missing Values and Noisy Data
Agenda
In this session, we will discuss:
● Different Tasks and Methods
○ Missing values
○ How to handle missing data?
○ Simple Linear Regression
○ Multiple Linear Regression
○ Noisy data
Missing Values
● Empty cells or cells filled with “NA”-like tokens.
● Semantics of missing data
○ An empty data cell could mean:
■ Value exists:
● Value is available but not recorded, e.g., due to human error
○ Negative findings are left empty (e.g., negative values of asymmetric binary variables)
● Value is not available (e.g., I don’t know my grandpa’s birthday)
■ Value does not exist:
● Absence of a value (I don’t have a middle name)
● Not applicable (I don’t have a tail)
○ Different semantics should be encoded as different values: NA (not applicable), Missing (applicable but not available), etc.
How to Handle Missing Data?
● Ignore tuples with missing values
○ Usually done when the class label is missing (in classification).
○ Not effective when the percentage of missing information varies greatly per attribute, resulting in a large number of tuples being excluded from analyses.
● Fill in the missing values manually: a major feasibility issue.
● Replace empty cells with “NA”, “Missing”, etc. For more, see
https://support.datacite.org/docs/schema-values-unknown-information-v42
How to Handle Missing Data?
● Fill in automatically (imputation) with:
○ A global constant, e.g., NA. Not ideal, but often done.
○ The attribute mean/median/mode.
○ The mean/median/mode of all data objects in the same class (smarter).
○ The most probable value: regression- or inference-based methods such as Bayesian inference or decision trees. Often the best option, but is it problem-free? See the sketch after this list.
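A minimal pandas sketch of these imputation options (not part of the original slides; the column names and values are hypothetical):

import pandas as pd

df = pd.DataFrame({
    "class": ["A", "A", "B", "B", "B"],
    "income": [50000, None, 30000, None, 34000],
})

# Global constant
df["income_const"] = df["income"].fillna(-1)

# Attribute mean
df["income_mean"] = df["income"].fillna(df["income"].mean())

# Class-conditional mean: the "smarter" variant above
df["income_class_mean"] = df["income"].fillna(
    df.groupby("class")["income"].transform("mean")
)
print(df)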
Simple Linear Regression
● A statistical method that summarizes and studies the relationships between two continuous
(quantitative) variables
○ Independent (predictor) variable: x = height
○ Dependent (response) variable: y = weight
● Goal: find the best straight line that fits the data
○ y = bx + a
● Method: find a and b that minimize the objective function: the sum of squared residuals, Σᵢ (yᵢ − (b·xᵢ + a))².
● How good is the fit? Use the coefficient of determination (R², where 1 is a perfect fit) or, with multiple predictors, the adjusted R².
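A minimal NumPy sketch of fitting and scoring a simple linear regression (the height/weight numbers are hypothetical):

import numpy as np

x = np.array([60.0, 62, 65, 68, 70, 72])       # height (inches)
y = np.array([115.0, 120, 135, 150, 160, 172])  # weight (lbs)

# Least-squares fit: minimizes the sum of squared residuals
b, a = np.polyfit(x, y, deg=1)

# Coefficient of determination R^2
y_hat = b * x + a
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
print(f"y = {b:.2f}x + {a:.2f}, R^2 = {1 - ss_res / ss_tot:.3f}")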
Simple Linear Regression
[Figure: scatter plot of weight (lbs) against height (inches), with the fitted line y = bx + a. For the point (55, 100), the residual is r = 100 − 150 = −50.]
‘r’ here shows a residual: the difference between the true value and the predicted value.
Multiple Linear Regression
● Multiple linear regression: more than one independent variable; X and beta are vectors.
● Tips on choosing the best model:
○ http://blog.minitab.com/blog/adventures-in-statistics-2/how-to-choose-the-best-regression-model
● Uses (see the sketch after this list):
○ missing values: use predicted values to replace missing values.
○ data smoothing: use predicted values to replace the original data.
○ data reduction: save only the function, parameters, and outliers (not the original data for the predicted dimensions).
○ outlier detection: identify (visualize) data points that are far away from the predicted values.
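A minimal scikit-learn sketch of regression-based imputation (the data is hypothetical): fit a multiple linear regression on the complete rows, then predict the missing response values.

import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = np.array([5.1, 4.9, 11.2, 10.8, np.nan])  # last value is missing

known = ~np.isnan(y)
model = LinearRegression().fit(X[known], y[known])

# Replace the missing value with the model's prediction
y[~known] = model.predict(X[~known])
print(y)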
Noisy Data: Noise
● Noise has two main sources:
○ Implicit inaccuracies caused by measuring devices
○ Random errors caused by human errors or other issues
● Noise can occur in attribute names and attribute values, including class labels
Noisy Data: How to Handle Noisy Data?
● Binning/Histogram analysis
○ First, sort data and partition it into (e.g., equal-frequency) bins.
○ Then smooth by bin means, smooth by bin median, or smooth by bin borders.
● Regression
○ Smooth by fitting data into the regression functions
● Clustering
○ Smooth data by cluster centers
○ Detect and remove outliers/errors
Noisy Data: How to Handle Noisy Data?
● Truncation
○ Truncate the least significant digits in a real number
● Combined computer and human inspection
○ Detect suspicious values automatically and have humans check them
Noisy Data: Smooth by Binning
● Divide sorted data into bins.
● Partitioning rules:
○ Equal-width: equal bin range
○ Equal-frequency (or equal-depth): equal # of
data points in the bins
● For data smoothing/discretization, replace each value with the bin mean, median, etc., or with the bin label.
● In effect, this also reduces the number of distinct data values (the cardinality of the variable).
Noisy Data: Equal-Width Binning
● Equal-width (interval) partitioning
○ Divides the range into N bins of equal width.
○ If A and B are the lowest and highest values of the attribute, the width of the intervals will be W = (B − A)/N.
○ In practice, the Freedman–Diaconis rule works well (other rules exist); see the sketch below:
■ W = 2 × IQR × n^(−1/3), with N = (B − A)/W
○ The most straightforward approach, but outliers may dominate the presentation.
○ Skewed data is not handled well.
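A minimal NumPy sketch of the Freedman–Diaconis rule on hypothetical data; NumPy's histogram function also implements the rule directly via bins="fd".

import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=50, scale=10, size=1000)

# W = 2 * IQR * n^(-1/3)
q75, q25 = np.percentile(data, [75, 25])
width = 2 * (q75 - q25) * len(data) ** (-1 / 3)
n_bins = int(np.ceil((data.max() - data.min()) / width))

counts, edges = np.histogram(data, bins="fd")
print(n_bins, len(edges) - 1)  # the two bin counts agree (up to rounding)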
Noisy Data: Equal-Depth Binning
● Equal-depth (count, frequency) partitioning
○ Divides the entire range into N bins of equal number of data points.
○ Good data scaling with varied bin width
Noisy Data: Example Equal-Depth Binning for Data Smoothing
● Sorted data for the price (in dollars): 4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34
○ Partition into equal-frequency (equi-depth) bins:
■ Bin 1: 4, 8, 9, 15
■ Bin 2: 21, 21, 24, 25
■ Bin 3: 26, 28, 29, 34
○ Smoothing by bin boundaries:
■ Bin 1: 4, 4, 4, 15
■ Bin 2: 21, 21, 25, 25
■ Bin 3: 26, 26, 26, 34
○ Smoothing by bin means:
■ Bin 1: 9, 9, 9, 9
■ Bin 2: 23, 23, 23, 23
■ Bin 3: 29, 29, 29, 29
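A minimal pandas sketch (not from the original slides) reproducing the smoothing-by-bin-means step above with equal-frequency bins:

import pandas as pd

prices = pd.Series([4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34])

# Equal-frequency (equal-depth) partitioning into 3 bins
bins = pd.qcut(prices, q=3, labels=False)

# Smooth each value by its bin mean
smoothed = prices.groupby(bins).transform("mean")
print(smoothed.round().astype(int).tolist())
# [9, 9, 9, 9, 23, 23, 23, 23, 29, 29, 29, 29]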
Noisy Data: Clustering
● Partition continuous, discrete, or mixed datasets into clusters based on similarity (distance).
○ There are many choices of distance functions, clustering definitions, and clustering algorithms.
● Can be used for smoothing noisy data, outlier detection, numerosity reduction, and data discretization; see the sketch after this list.
○ Data smoothing/discretization: take cluster means,
median, etc.
○ Data reduction: store cluster representation only
○ Outlier detection: visualize data points far away
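A minimal scikit-learn sketch of smoothing by cluster centers, reusing the price data from the binning example:

import numpy as np
from sklearn.cluster import KMeans

values = np.array([4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34], dtype=float)
X = values.reshape(-1, 1)  # scikit-learn expects a 2-D feature matrix

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Replace each value with the center of its cluster
smoothed = km.cluster_centers_[km.labels_].ravel()
print(np.round(smoothed, 2))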
Noisy Data: Clustering
● Can be very useful if the data is clustered, but may not be effective if the data is “splattered.”
● Clustering can be hierarchical, and clusters can be stored in multi-dimensional index tree structures.
● A non-parametric method: no distributional assumptions. Let the data tell the story.
Summary
In this session, we discussed:
● Empty cells or cells filled with “NA”-like tokens are referred to as missing data.
● Noisy data can include implicit errors introduced by measurement tools, such as different types of sensors, as well as random errors.
● There are different ways to handle missing data and noisy data, including various imputation methods and data smoothing methods.
Tasks and Methods:
Outliers and Data Redundancy
Agenda
In this session, we will discuss:
● Tasks and Methods
○ Outliers
○ Data Redundancy
Outliers: Outlier Detection
● Exploratory data analysis:
○ Data summary plots – boxplots
○ Histogram analysis
● Regression
○ Data points that do not fit the fitted model are flagged as outliers.
● Clustering
○ Outliers form small, distant clusters, or are not included in any cluster.
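A minimal NumPy sketch of boxplot-style (Tukey fence) outlier detection on hypothetical data:

import numpy as np

data = np.array([4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 95])  # 95 looks suspect

# Values beyond Q1 - 1.5*IQR or Q3 + 1.5*IQR fall outside the boxplot whiskers
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

print(data[(data < lower) | (data > upper)])  # flags 95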
Data Redundancy: Data Integration
● Data integration:
○ Data from multiple sources is combined into a coherent store.
● Database schema integration
○ Challenging; metadata originating from the various sources must be examined carefully.
● Data redundancy, e.g., the entity identification problem:
○ Identify real-world entities across a variety of data sources.
● Detecting and resolving data value conflicts and scale differences:
○ Attribute values from different sources may differ for the same real-world item.
○ Possible reasons: different representations (e.g., date, GPA) or different scales (e.g., metric vs. British units).
Data Redundancy: Handling Redundancy in Data Integration
● Redundant attributes may be detected by correlation analysis or covariance analysis.
● Redundant attributes should be removed
● Attributes that are correlated but not redundant should often be kept.
● Careful integration of data from various sources may aid in the reduction/avoidance of redundancies and inconsistencies, as well as the improvement of mining speed and quality.
Data Redundancy: Correlation Analysis (Nominal Data)
                               Play chess [c1]   Not play chess [c2]   Sum (row)
Like science fiction [r1]          250 (90)           200 (360)         450 [R=r1]
Not like science fiction [r2]       50 (210)         1000 (840)        1050 [R=r2]
Sum (col.)                         300 [C=c1]        1200 [C=c2]       1500 [n]

Observed counts are shown with expected counts in parentheses, where expected = (row total × column total) / n.
Data Redundancy: Chi-Square Calculation
                           Play chess   Not play chess   Sum (row)
Like science fiction          250 (90)       200 (360)       450
Not like science fiction       50 (210)     1000 (840)      1050
Sum (col.)                        300            1200        1500

● H0: A and B are not correlated. alpha = 0.001
● Χ² (chi-square) value calculation:
Χ² = Σ (observed − expected)² / expected
   = (250 − 90)²/90 + (200 − 360)²/360 + (50 − 210)²/210 + (1000 − 840)²/840 = 507.93
● Using the Χ² table (next slide), we find the critical value = 10.828 for this alpha and d.f. = 1.
● Since Χ² = 507.93 > 10.828, reject H0: A and B are correlated.
● Most tests will give you a p-value; if p-value < alpha, reject H0.
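A minimal SciPy sketch verifying the test on the slide's contingency table:

import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[250, 200],
                     [50, 1000]])

# correction=False matches the plain chi-square formula used above
chi2, p, dof, expected = chi2_contingency(observed, correction=False)
print(chi2, dof)   # ~507.93, df = 1
print(expected)    # [[90, 360], [210, 840]]
print(p < 0.001)   # True: reject H0, the attributes are correlated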
Data Redundancy: Critical Value Table
[Table: critical values of the chi-square distribution with d degrees of freedom.]
Data Redundancy: Correlation Analysis (Numeric Data)
● Correlation coefficient (also called Pearson’s product moment coefficient), in [−1, 1]:
r(A,B) = Σᵢ (aᵢ − Ā)(bᵢ − B̄) / (n·σA·σB) = (Σᵢ aᵢbᵢ − n·Ā·B̄) / (n·σA·σB)
○ where n is the number of tuples, Ā and B̄ are the respective means of A and B,
○ σA and σB are the respective standard deviations of A and B,
○ Σᵢ aᵢbᵢ is the sum of the AB cross-product.
● If r(A,B) > 0, A and B are positively linearly correlated (A’s values increase as B’s do). The higher the value of r(A,B), the stronger the correlation.
● r(A,B) = 0: not linearly correlated; the variables may still be associated in other ways.
● r(A,B) < 0: negatively linearly correlated.
Data Redundancy: Visually Evaluating Correlation
[Figure: scatter plots showing Pearson coefficients ranging from −1 to 1.]
Data Redundancy: Covariance (Numeric Data)
● Covariance is similar to correlation:
Cov(A,B) = E[(A − Ā)(B − B̄)] = Σᵢ (aᵢ − Ā)(bᵢ − B̄) / n = E(A·B) − Ā·B̄
● Contrast with the correlation coefficient:
r(A,B) = Cov(A,B) / (σA·σB)
○ where n is the number of tuples, Ā and B̄ are the respective means or expected values (E) of A and B, and σA and σB are the respective standard deviations of A and B.
Data Redundancy: Covariance (Numeric Data)
● Positive covariance: Cov(A,B) > 0, indicating that A and B both tend to be larger than their expected values.
● Negative covariance: Cov(A,B) < 0, indicating that the two variables change in different directions: one is larger and the other is smaller than its expected value.
● Independence implies Cov(A,B) = 0, but the reverse is not true:
○ Some pairs of random variables have a covariance of zero yet are not independent. A covariance of 0 implies independence only under certain additional conditions (for example, when the data follow multivariate normal distributions).
Data Redundancy: Covariance: An Example
● Suppose two stocks A and B have the following values in one week: (2,5), (3, 8), (5, 10), (4,
11), (6, 14).
● Question: Do the prices of A and B rise or fall together?
● E(A) = (2 + 3 + 5 + 4 + 6)/5 = 20/5 = 4
● E(B) = (5 + 8 + 10 + 11 + 14)/5 = 48/5 = 9.6
● Cov(A,B) = (2x5+3x8+5x10+4x11+6x14)/5 - 4 x 9.6 = 4
Thus, A and B rise together since Cov(A, B) > 0.
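A minimal NumPy sketch verifying the stock example (and the related Pearson correlation):

import numpy as np

A = np.array([2, 3, 5, 4, 6])
B = np.array([5, 8, 10, 11, 14])

# Population covariance E(A*B) - E(A)*E(B), dividing by n as on the slide
print(np.mean(A * B) - A.mean() * B.mean())   # 4.0

# np.cov divides by n-1 by default; bias=True divides by n
print(np.cov(A, B, bias=True)[0, 1])          # 4.0
print(np.corrcoef(A, B)[0, 1])                # Pearson r ~ 0.94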
Summary
In this session, we discussed:
● Outliers can be detected with boxplots, histogram analysis, regression, and clustering.
● Data redundancy occurs mostly because of data integration, and redundant attributes may be
detected by correlation or covariance analysis.
● Redundant attributes should be removed.
● Correlated attributes are often useful in mining tasks.
Tasks and Methods:
Dimensionality Reduction and Numerosity Reduction
Agenda
In this session, we will discuss:
● Tasks and Methods
○ Dimensionality reduction
○ Curse of Dimensionality and data sparseness
○ PCA – Principal Component Analysis
○ Numerosity reduction and random sampling methods
Data Reduction Strategies
● Data reduction: Obtain a reduced representation of the data set that is much smaller in volume but
yet produces the same (or almost the same) analytical results.
● Why data reduction? — A database/data warehouse may store terabytes of data. Complex data
analysis may take a long time on the complete data set.
Data Reduction Strategies
● Data reduction strategies
○ Dimensionality reduction, e.g., removing or merging attributes
■ Principal Components Analysis (PCA).
■ Feature subset selection, feature creation
○ Numerosity reduction (reduce data volume, use smaller forms of data representation)
■ Regression
■ Histograms/binning, clustering, sampling
■ Data cube aggregation
○ Data compression
Dimensionality Reduction: Curse of Dimensionality
● Curse of dimensionality
○ As the dimensionality of the features in a dataset increases, the data becomes increasingly sparse in the feature space.
○ Density and distance between points, which are critical for clustering and outlier analysis, become less meaningful.
○ The number of possible subspace combinations grows exponentially.
● Dimensionality reduction
○ Avoid the curse of dimensionality by reducing the number of features.
○ Dimensionality reduction helps eliminate irrelevant features and reduce noise.
○ Reduces the time and space required for data mining.
○ Makes the data easier to visualize.
● Dimensionality reduction techniques
○ Principal Component Analysis (PCA)
○ Supervised techniques
○ Nonlinear techniques and feature selection
Curse of Dimensionality: Sparseness
[Figure: three scatter-plot panels.
(1) A single feature does not result in a perfect separation of our training data.
(2) Adding a second feature still does not result in a linearly separable classification problem.
(3) Adding a third feature results in a linearly separable classification problem in our training data.]
Sparseness: More Training Data Needed
Sparseness -> Everything is Equal-Distanced
● With increased dimensionality, the hypersphere occupies only a very small
portion of the search space; all training examples are essentially located in
the corners.
● When dim -> infinity, all training examples are at the same distance from all
other examples.
Principal Component Analysis (PCA): Numeric Data
● Finds the projection that captures the largest amount of variance in the data.
● The original data can be projected onto a much smaller space, which reduces dimensionality while preserving variability. We find the eigenvectors (“characteristic” vectors) of the covariance matrix, and these eigenvectors define the new space.
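A minimal scikit-learn sketch of PCA; the iris data is assumed here because the loadings on the next slide reference its attributes:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = load_iris().data  # 150 x 4 numeric matrix

# Standardize first so no attribute dominates because of its scale
X_std = StandardScaler().fit_transform(X)

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_std)  # 150 x 2 projection

print(pca.explained_variance_ratio_)  # fraction of variance per component
print(pca.components_)                # eigenvector loadings per PC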
Principal Component Analysis (PCA)
● The first three PCs capture 75% of the original variance, based on the loadings.
● Component values are weighted sums of the original dimensions.
● Comp1 = 0.361*Sepal.Length + 0.867*Petal.Length + 0.358*Petal.Width
● Subsequent analysis uses the reduced representation/dimensions.
Numerosity Reduction
● Reduce the size of data volume by choosing alternative smaller forms of data representation.
● Parametric methods (example: regression)
○ Assume the data fits some model, estimate the model parameters, store only the parameters, and discard the data (except possible outliers).
● Non-parametric methods
○ Do not assume parameterized probability distributions.
○ Major families: histograms/binning, clustering, sampling, …
Sampling
● Obtaining a small sample “s” to represent the whole data set “N”.
● Also used in sampling training and test examples.
● Allow mining algorithms to run at a complexity that is possibly sub-linear to data size.
● Key principle: choose a representative subset of the data.
○ In skewed datasets, simple random sampling may perform poorly.
○ Develop adaptive sampling methods, e.g., stratified sampling.
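A minimal pandas sketch contrasting simple random sampling with stratified sampling on a hypothetical skewed dataset:

import pandas as pd

df = pd.DataFrame({
    "cls": ["A"] * 90 + ["B"] * 10,  # class "B" is rare
    "value": range(100),
})

# Simple random sample: may under-represent the rare class
srs = df.sample(frac=0.2, random_state=0)

# Stratified sample: 20% drawn from each class separately
strat = df.groupby("cls", group_keys=False).sample(frac=0.2, random_state=0)
print(srs["cls"].value_counts().to_dict(), strat["cls"].value_counts().to_dict())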
Summary
In this session, we discussed,
● Data reduction obtains a reduced representation of the data set that is much smaller in volume but
yet produces the same (or almost the same) analytical results.
● Data reduction can be done by:
○ Dimensionality reduction: removing unimportant attributes.
○ Numerosity reduction: reducing data volume by using smaller forms of data representation.
○ Data compression
● Sampling is about obtaining a small sample s to represent the whole data set N.
Tasks and Methods:
Data Transformation
Agenda
In this session, we will discuss:
● Tasks and Methods
○ Data transformation: Normalization
○ Data discretization methods
○ Concept Hierarchy generation
Data Transformation
● Data are transformed or consolidated into forms appropriate for mining.
● Methods
○ Smoothing: Remove noise from data
○ Attribute / feature construction
■ New attributes constructed from the given ones
○ Aggregation: Data cube construction, summarization
○ Normalization: Scaled to fall within a smaller, specified range for more meaningful comparison
■ min-max normalization
■ z-score normalization
■ normalization by decimal scaling
○ Discretization: Concept hierarchy climbing
Normalization
● Min-max normalization to [new_minA, new_maxA]:
v' = (v − minA) / (maxA − minA) × (new_maxA − new_minA) + new_minA
○ Ex. Let income range from $12,000 to $98,000, normalized to [0.0, 1.0]. Then $73,600 is mapped to (73,600 − 12,000) / (98,000 − 12,000) × (1.0 − 0) + 0 = 0.716.
● Z-score normalization (μ: mean, σ: standard deviation):
v' = (v − μ) / σ
○ Ex. Let μ = 54,000 and σ = 16,000. Then (73,600 − 54,000) / 16,000 = 1.225.
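A minimal NumPy sketch reproducing both examples:

import numpy as np

income = np.array([12000.0, 54000.0, 73600.0, 98000.0])  # hypothetical values

# Min-max normalization to [0.0, 1.0]
min_max = (income - 12000) / (98000 - 12000) * (1.0 - 0.0) + 0.0
print(round(min_max[2], 3))  # 73,600 -> 0.716

# Z-score normalization with mu = 54,000, sigma = 16,000
z = (income - 54000) / 16000
print(z[2])  # 73,600 -> 1.225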
Normalization
● Normalization by decimal scaling:
v' = v / 10^j, where j is the smallest integer such that max(|v'|) ≤ 1
○ Ex. (50, 20) -> (0.5, 0.2) with j = 2
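A minimal NumPy sketch of decimal scaling (assumes at least one non-zero value):

import numpy as np

v = np.array([50.0, 20.0])

# Smallest integer j such that max(|v / 10**j|) <= 1
j = int(np.ceil(np.log10(np.abs(v).max())))
print(j, v / 10 ** j)  # j = 2, [0.5 0.2]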
Data Discretization
Discretization: divide the range of a continuous attribute into intervals.
● Actual data values are replaced with interval labels.
● Reduces attribute cardinality.
● Handles outliers and skewed data.
● Can be supervised or unsupervised.
● Prepares data for further analysis, e.g., classification.
Data Discretization Methods
Typical methods:
All the methods mentioned below can be applied recursively.
● Histogram and Binning analysis
○ Top-down split
○ Unsupervised
● Clustering analysis (unsupervised, top-down split, or bottom-up merge)
● Classification analysis, e.g., decision-tree (supervised, top-down split)
● Correlation (e.g., χ2) analysis, e.g., ChiMerge (supervised, bottom-up merge)
Discretization by Correlation Analysis
Correlation analysis (e.g., Chi-merge: χ2-based discretization)
● Exploit the correlation between intervals and class labels.
● "Interval – Class” contingency tables
● If two adjacent intervals have low χ² values (they are less correlated with the class labels), merge them to form a larger interval (keeping them separate offers no more information on how to classify objects).
● Merge performed recursively until a predefined stopping condition is met.
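A minimal SciPy sketch of one ChiMerge step, testing the first pair of adjacent single-sample intervals from the worked example that follows:

import numpy as np
from scipy.stats import chi2_contingency

# Rows: adjacent intervals {2,5} and {5,7.5}; columns: class counts K=1, K=2
table = np.array([[0, 1],
                  [1, 0]])

chi2, p, dof, _ = chi2_contingency(table, correction=False)
print(chi2)  # 2.0

# ChiMerge merges the pair when chi2 is below the threshold
if chi2 < 2.7024:  # critical value at alpha = 0.1, df = 1
    print("merge the two intervals")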
Chi-Merge Discretization Example
● A statistical approach to data discretization.
● Discretizes the data based on class labels, using the chi-square approach.
● F: attribute
● K: class label

Sample   F    K
1         1   1
2         3   2
3         7   1
4         8   1
5         9   1
6        11   2
7        23   2
8        37   1
9        39   2
10       45   1
11       46   1
12       59   1
Chi-Merge Discretization Example
● Sort the attribute you want to discretize (here, attribute F).
● Begin by placing each unique value of the attribute in its own interval.

Sample   F    K   Interval
1         1   1   {0,2}
2         3   2   {2,5}
3         7   1   {5,7.5}
4         8   1   {7.5,8.5}
5         9   1   {8.5,10}
6        11   2   {10,17}
7        23   2   {17,30}
8        37   1   {30,38}
9        39   2   {38,42}
10       45   1   {42,45.5}
11       46   1   {45.5,52}
12       59   1   {52,60}
Chi-Merge Discretization Example
● Calculate the chi-square test on every pair of adjacent intervals.
● Interval/class contingency tables, e.g., for samples 2 & 3 and for samples 3 & 4:

Sample   K=1   K=2   Total
2          0     1       1
3          1     0       1
Total      1     1       2

Sample   K=1   K=2   Total
3          1     0       1
4          1     0       1
Total      2     0       2
Chi-Merge Discretization Example
For samples 2 & 3:

Sample   K=1   K=2   Total
2          0     1       1
3          1     0       1
Total      1     1       2

Expected counts (row total × column total / n):
E11 = (1/2)·1 = 0.5    E12 = (1/2)·1 = 0.5
E21 = (1/2)·1 = 0.5    E22 = (1/2)·1 = 0.5

Χ² = (0 − 0.5)²/0.5 + (1 − 0.5)²/0.5 + (1 − 0.5)²/0.5 + (0 − 0.5)²/0.5 = 2

For samples 3 & 4:

Sample   K=1   K=2   Total
3          1     0       1
4          1     0       1
Total      2     0       2

E11 = (1/2)·2 = 1    E12 = (1/2)·0 = 0
E21 = (1/2)·2 = 1    E22 = (1/2)·0 = 0

Χ² = (1 − 1)²/1 + (1 − 1)²/1 = 0   (terms with an expected count of 0 contribute 0)

At significance level 0.1 with df = 1, the Χ² critical value from the chi-square distribution is 2.7024. Both values are below the critical value: the intervals are not correlated with the class label and can be merged.
Chi-Merge Discretization Example
● Calculate the chi-square values for all pairs of adjacent intervals.
● Merge the intervals with the smallest chi-square values.

Sample   F    K   Interval     Χ² (vs. next interval)
1         1   1   {0,2}        2
2         3   2   {2,5}        2
3         7   1   {5,7.5}      0
4         8   1   {7.5,8.5}    0
5         9   1   {8.5,10}     2
6        11   2   {10,17}      0
7        23   2   {17,30}      2
8        37   1   {30,38}      2
9        39   2   {38,42}      2
10       45   1   {42,45.5}    0
11       46   1   {45.5,52}    0
12       59   1   {52,60}      —
Chi-Merge Discretization Example
Repeat: keep merging the intervals with small Χ² values until all Χ² > 2.7024.

Sample   F    K   Interval   Χ² (vs. next interval)
1         1   1   {0,2}      2
2         3   2   {2,5}      4
3         7   1   {5,10}
4         8   1   {5,10}
5         9   1   {5,10}     5
6        11   2   {10,30}
7        23   2   {10,30}    3
8        37   1   {30,38}    2
9        39   2   {38,42}    4
10       45   1   {42,60}
11       46   1   {42,60}
12       59   1   {42,60}    —
Chi-Merge Discretization Example
Sample   F    K   Interval   Χ² (vs. next interval)
1         1   1   {0,10}
2         3   2   {0,10}
3         7   1   {0,10}
4         8   1   {0,10}
5         9   1   {0,10}     2.72
6        11   2   {10,42}
7        23   2   {10,42}
8        37   1   {10,42}
9        39   2   {10,42}    3.93
10       45   1   {42,60}
11       46   1   {42,60}
12       59   1   {42,60}    —

● End: there are no more adjacent intervals with Χ² < 2.7024.
● The resulting intervals are correlated with the class labels.
Concept Hierarchy Generation
● A concept hierarchy organizes concepts (i.e., attribute values) hierarchically and is typically associated with each dimension in a data warehouse.
● In data warehouses, concept hierarchies enable drilling and rolling to view data at various granularities.
● Concept hierarchy generation
○ Specified by domain experts, taxonomies/thesauri/ontologies
○ Generated from data sets (for some simple, specific cases)
■ Discretization for numerical or ordinal data
■ Frequency counts for categorical data (limited cases)
○ Concept hierarchy learning
■ Natural language processing and ML approaches
Concept Hierarchy Generation for Nominal Data
● Specification of a partial/total ordering of attributes explicitly at the schema level by users or
experts.
○ street < city < state < country
● Specification of a hierarchy for a set of values by explicit data grouping.
○ {Urbana, Champaign, Chicago} < Illinois
● Specification of only a partial set of attributes.
○ E.g., only street < city, not others
● Automatic generation of hierarchies (or attribute levels) by the analysis of the number of distinct
values.
○ E.g. for a set of attributes: {street, city, state, country}
Automatic Concept Hierarchy Generation
● Some hierarchies can be built automatically based on a study of the number of distinct values for
each attribute in the data collection.
○ The attribute with the most distinct values is placed at the bottom of the hierarchy.
○ Exceptions exist, e.g., the time hierarchy weekday, month, quarter, year.

country                 15 distinct values
province_or_state      365 distinct values
city                 3,567 distinct values
street             674,339 distinct values
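A minimal pandas sketch of ordering attributes by their distinct-value counts (the location table is hypothetical):

import pandas as pd

df = pd.DataFrame({
    "street": ["1 A St", "2 B St", "3 C St", "4 D St", "5 E St"],
    "city": ["Tucson", "Tucson", "Phoenix", "Toronto", "Montreal"],
    "province_or_state": ["AZ", "AZ", "AZ", "ON", "QC"],
    "country": ["US", "US", "US", "Canada", "Canada"],
})

# Fewer distinct values -> higher in the generated hierarchy
print(df.nunique().sort_values())  # country < province_or_state < city < street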
Summary
In this session, we discussed:
● Normalization – The data is scaled to fall within a smaller, specified range for more meaningful
comparison.
● Discretization divides the range of a continuous attribute into intervals.
● A worked ChiMerge discretization example.
● Concept hierarchy organizes concepts (i.e., attribute values) hierarchically and is usually associated
with each dimension in a data warehouse.
● Concept hierarchy generation for nominal data
Learning Outcomes
You should now be able to:
● Apply data pre-processing tasks and methods to prepare data for a data mining task.
● Summarize the importance of outlier removal and redundant data removal from data sets.
● Explain the methods for dimensionality reduction and numerosity reduction.
● Implement data transformation strategies, such as normalization, discretization, and concept hierarchy generation.
● Perform typical data pre-processing tasks in Python.
Thank you!