Data Transformation

The document discusses data transformation in statistical hydrology, emphasizing its importance for achieving normal distribution in variables. It outlines various transformation techniques, particularly logarithmic transformations, and introduces Tukey's Ladder of Powers for addressing skewness in data. Additionally, it highlights methods for assessing normality, including graphical examinations and goodness-of-fit tests.

Uploaded by

engr Muhammad Ibrahim

Class Information

AE 5418 Statistical Hydrology

Engr. Dr. Muhammad Ajmal

Associate Professor
PhD in Water Resources & Environmental Engineering
Civil & Environmental Engineering, Hanyang University, South Korea

Agricultural Engineering Department

University of Engineering & Technology Peshawar, Pakistan

Statistical Hydrology

Data Transformation

04. Data Transformation
What is a data transformation?
➢ Many statistical methods require that the numeric variables we are working with have an approximately normal distribution.

➢ A data transformation is a mathematical function that is applied to all the observations of a given variable:

y = f(x)
⚫ x represents the original variable, y is the transformed variable, and f is the mathematical function that is applied to the data.

⚫ Transformation of a variable can change its distribution from a skewed distribution to a normal distribution (bell-shaped, symmetric about its center).
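
A minimal sketch of y = f(x) in Python, using only the standard library. The observations are made-up numbers for illustration, and `transform` and `skewness` are hypothetical helper names that do not come from the lecture. The sketch applies log10 to every observation and compares the moment-based skewness before and after.

```python
import math


def transform(values, f):
    """Apply the same function f to every observation: y = f(x)."""
    return [f(x) for x in values]


def skewness(values):
    """Moment-based sample skewness m3 / m2**1.5 (0 for symmetric data)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed observations (illustrative numbers only).
x = [1, 2, 2, 3, 4, 6, 9, 15, 40, 120]
y = transform(x, math.log10)

# Skewness of the raw data, then of the log-transformed data.
print(round(skewness(x), 2), round(skewness(y), 2))
```

For this sample the log-transformed values have a skewness much closer to zero than the raw values, although a single transformation is not guaranteed to make every data set normal.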

Why data transformation?

Fig.: Scatterplots in which the areas of the sovereign states and dependent territories in the world are plotted on the vertical axis against their populations on the horizontal axis.
❖ Panel (a) uses the raw data. In panel (b), both the area and population data have been transformed using the logarithm function. (Source: Wikipedia)

▣ Data Transformation
⚫ Transformations are used for three purposes:
① To make data more symmetric
② To make data more linear, and
③ To make data more constant in variance

✓ To make an asymmetric distribution more symmetric, the data can be transformed or re-expressed into new units.
✓ The new units alter the distances between observations on a line plot.
✓ The effect is to either expand or contract the distances to extreme observations on one side of the median, making it look more like the other side.
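
The effect on distances can be sketched with a small made-up sample. The values and the helper name `tail_distances` are assumptions for illustration, and log base 2 is chosen only because it gives round numbers for this sample.

```python
import math
from statistics import median

# Hypothetical observations with a long upper tail (illustrative only).
x = [1, 2, 4, 8, 16, 32, 64]


def tail_distances(values):
    """Return (median - minimum, maximum - median) for the data."""
    m = median(values)
    return m - min(values), max(values) - m


raw_low, raw_high = tail_distances(x)
log_low, log_high = tail_distances([math.log2(v) for v in x])

print(raw_low, raw_high)  # distances below and above the median, original units
print(log_low, log_high)  # the same two distances in log2 units
```

In the original units the largest observation lies 56 above the median while the smallest lies only 7 below it. In log2 units both distances equal 3, so the upper side has been contracted to match the lower side.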

✓ The most commonly used transformation in water resources is the logarithm.
✓ Logs of water resources data (for example, stream discharge, hydraulic conductivity, and sediment concentration) are often taken before statistical analyses are performed.


✓ In most instances, the purpose of a data transformation is not merely to obtain a normal distribution from a non-normal one, but to meet the assumptions of a statistical test or procedure (parametric or non-parametric).

✓ If the data do not meet the assumptions of a given test or procedure, and the problem appears to be due to the distribution of a variable we are using, then we often try transformations. Alternatively, we can try a different test or procedure that might have different assumptions or be more robust.


✓ Normality can be assessed using different statistical tests based on the p-value. The null hypothesis of these tests is that the data are normally distributed. If the p-value is small (most often p < 0.05 or p < 0.01), the null hypothesis is rejected and the data are treated as non-normal.
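
The p-value decision can be sketched with the Jarque-Bera test. This test is an assumption of the sketch and is not named in the lecture. It is used here only because its approximate p-value can be computed with the Python standard library, whereas tests such as Shapiro-Wilk are normally run from a statistics package. The data are synthetic, and the chi-square approximation is rough for small samples.

```python
import math
import random


def jarque_bera(values):
    """Return (JB statistic, approximate p-value) for H0: data are normal."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skew ** 2 + excess_kurtosis ** 2 / 4.0)
    # Under H0, JB is asymptotically chi-square with 2 degrees of freedom,
    # whose upper-tail probability is exp(-jb / 2).
    return jb, math.exp(-jb / 2.0)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
skewed = [rng.lognormvariate(0.0, 1.0) for _ in range(500)]  # synthetic data
logged = [math.log(v) for v in skewed]

for name, sample in (("raw", skewed), ("log", logged)):
    jb, p = jarque_bera(sample)
    verdict = "reject normality" if p < 0.05 else "no evidence against normality"
    print(name, round(jb, 2), verdict)
```

A large p-value does not prove that the data are normal. It only means that the test found no evidence against normality at the chosen level.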

✓ To decide how to transform a variable, you might find "Tukey's ladder" to be a useful search term, as the great mathematician John Wilder Tukey (1915-2000) created an ordered list of transformations to bring skewed distributions toward normality.


✓ In simple cases, it might make sense to use a test that, say, converts the raw values to ranks (as many nonparametric tests do) and so sidesteps some of the problems that a skewed distribution may cause for some parametric tests.
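
Converting raw values to ranks can be sketched as follows. The function name `average_ranks` and the flow values are hypothetical, and giving tied values their mean rank is one common convention assumed here.

```python
def average_ranks(values):
    """Rank observations from 1 (smallest) upward; ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Mean of the 1-based positions start+1 .. end+1.
        shared = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


# Hypothetical flows with one extreme value (illustrative only).
flows = [12.0, 15.0, 15.0, 20.0, 950.0]
print(average_ranks(flows))  # [1.0, 2.5, 2.5, 4.0, 5.0]
```

The extreme value 950 simply becomes rank 5, so it no longer dominates any statistic computed from the ranks.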

✓ If you need something more complex, such as multiple regression, a Tukey-style transformation may help you meet the requirements for the residuals that you cannot meet with the original, untransformed variable.

Tukey’s Ladder of Powers for Transformation
Here X represents our variable of interest. We are going to consider this variable raised to a power λ, i.e. X^λ.

We go up the ladder to remove left skewness and down the ladder to remove right skewness.

Fig.: Tukey's Ladder of Powers
UP (left skewed; bigger impact toward the top)
X^4
X^3
X^2
X (middle rung: no transformation, λ = 1)
√X
∛X
log10 X (think of this as X^0)
−1/X
−1/X^2
DOWN (right skewed; bigger impact toward the bottom)


✓ To remove right skewness, we typically take the square root, cube root, logarithm, or reciprocal of a variable, i.e. V^0.5, V^0.333, log10(V) (think of this as V^0), V^-1, etc.

✓ To remove left skewness, we raise the variable to a power greater than 1, such as squaring or cubing the values, i.e. V^2, V^3, etc.
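
Walking the ladder can be sketched by trying each power in turn and keeping the one whose transformed data have skewness closest to zero. The helper names `ladder` and `skewness`, the list of rungs, and the data are assumptions for illustration. The data must be strictly positive for the logarithm and the reciprocal to be defined.

```python
import math


def ladder(value, power):
    """One rung of the ladder: value**power, with log10 standing in for power 0."""
    if power == 0:
        return math.log10(value)
    return value ** power


def skewness(values):
    """Moment-based sample skewness m3 / m2**1.5 (0 for symmetric data)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


RUNGS = [3, 2, 1, 0.5, 1 / 3, 0, -1]  # powers named in the text above

# Hypothetical strictly positive, right-skewed observations.
v = [1, 2, 2, 3, 4, 6, 9, 15, 40, 120]

for power in RUNGS:
    transformed = [ladder(x, power) for x in v]
    print(round(power, 3), round(skewness(transformed), 2))

# Rung whose transformed data are closest to symmetric.
best = min(RUNGS, key=lambda p: abs(skewness([ladder(x, p) for x in v])))
print("best power:", best)
```

For this sample the logarithm (power 0) gives the skewness closest to zero. Going further down to the reciprocal overshoots, and going up the ladder makes the right skew worse.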

Transformations to Achieve Normality
⚫ How can we determine if observations are normally distributed?

⚫ Graphical examination
✓ Frequency plot (histogram)
✓ Boxplot
✓ Normal quantile-quantile plot (QQ-plot)

⚫ Goodness of fit tests

✓ Chi-Square Test
✓ Shapiro-Wilk Test
✓ Kolmogorov-Smirnov Test
✓ Anderson-Darling Test
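
The points of a normal Q-Q plot can be sketched without any plotting library. The plotting position (i - 0.5) / n is one common convention assumed here, the helper names and data are hypothetical, and the correlation of the Q-Q points is only a rough summary of how straight the plot is, not one of the tests listed above.

```python
import math
from statistics import NormalDist, mean


def qq_points(values):
    """Pair each sorted observation with the matching standard-normal quantile."""
    ordered = sorted(values)
    n = len(ordered)
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


def correlation(pairs):
    """Pearson correlation between the two coordinates of the Q-Q points."""
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical strictly positive, right-skewed observations.
v = [1, 2, 2, 3, 4, 6, 9, 15, 40, 120]

raw_r = correlation(qq_points(v))
log_r = correlation(qq_points([math.log10(x) for x in v]))
print(round(raw_r, 3), round(log_r, 3))
```

The closer the points lie to a straight line, the closer the correlation is to 1. For this sample the log-transformed data give the straighter plot.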

How to Express a Distribution

Fig.: Panels titled Cumulative Density, Probability Density, Probability Plot, and Equation.

Which method conveys the information best to you?

⚫ Original and Transformed Data
Fig.: Q-Q Plot for Normally Distributed Data
Fig.: Q-Q Plot for Left Skewed Data
Fig.: Q-Q Plot for Right Skewed Data
Fig.: Q-Q Plot for Leptokurtic (high peak) and Low Spread Data
Fig.: Q-Q Plot for Platykurtic (low peak) and More Spread Data
⚫ Some Models with Transformed Data

Homework

What is a normality test? Why is it conducted? Using any software, test an example data set from water resources and discuss the results in terms of its normality or non-normality. Also, which techniques would be suitable to normalize it?

Questions?
