Measures of Variation
(Dispersion)
Variation (Dispersion)
Variation (dispersion) describes how the observations
in a data set are spread about an average value
(mean, median, or mode).
If the observations lie close to the average, the
variation is small; if they are spread far from the
centre, the variation is large.
Explanation
Suppose we have three groups of students
who obtained the following marks in a test.
Group A: 46, 48, 50, 52, 54   (mean $\bar{X}_A = 50$)
Group B: 30, 40, 50, 60, 70   (mean $\bar{X}_B = 50$)
Group C: 40, 50, 60, 70, 80   (mean $\bar{X}_C = 60$)
Explanation
Groups A and B have the same mean, but in group A the
observations are concentrated near the centre, while in
group B they are not close to the centre (mean): one
observation is as small as 30 and another is as large
as 70. Groups B and C have the same variation but
different means. Groups A and C differ in both
variation and mean.
IMPORTANCE
(1) Measuring variation helps maintain consistency in
the wages of factory workers, which supports worker
satisfaction.
(2) Variation can be used to measure economic
disparity.
(3) Variation is important for making predictions about
the prices of commodities, the standard of living of
different peoples, the distribution of wealth and land,
and so on.
How Can We Measure Variability?
The following measures are used to
quantify variation:
– Range
– Variance
– Standard Deviation
– Coefficient of Variation
(1) RANGE
The range is the difference between the
maximum (Xmax) and minimum (Xmin) values in
a data set, i.e.,
Range = Xmax - Xmin.
In the case of grouped data:
Range = (upper boundary of the highest class)
- (lower boundary of the lowest class)
Application
(1) It is the simplest measure of dispersion.
(2) It is applied in quality control methods,
which are used to maintain the quality of the
products produced in factories. The quality
of the products must be kept within a certain
range of values.
EXAMPLE
Two experimental brands of outdoor paint are
tested to see how long each will last before
fading. Six cans of each brand constitute a
small population. The results (in months) are
shown. Find the mean and range of each
group.
EXAMPLE
Brand A   Brand B
  10        35
  60        45
  50        30
  30        35
  40        40
  20        25

Brand A: $\mu = \frac{\sum X}{N} = \frac{210}{6} = 35$, $R = 60 - 10 = 50$
Brand B: $\mu = \frac{\sum X}{N} = \frac{210}{6} = 35$, $R = 45 - 25 = 20$
EXAMPLE
The average for both brands is the same, but
the range for Brand A is much greater than the
range for Brand B.
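The comparison above can be reproduced with a short Python sketch. The helper name `value_range` is chosen here for illustration and is not part of the original slides.

```python
from statistics import mean

# Months before fading, taken from the paint example above.
brand_a = [10, 60, 50, 30, 40, 20]
brand_b = [35, 45, 30, 35, 40, 25]


def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)


for name, data in (("Brand A", brand_a), ("Brand B", brand_b)):
    print(name, "mean =", mean(data), "range =", value_range(data))
```

Both brands print the same mean, while the ranges differ, which is the point the example makes.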
(2) Variance & Standard Deviation
The variance is the average of the
squares of the distance each value is
from the mean.
The standard deviation is the square
root of the variance.
The standard deviation is a measure of
how spread out your data are.
Uses of the Variance and Standard
Deviation
To determine the spread of the data.
To determine the consistency of a
variable.
To determine the number of data values
that fall within a specified interval in a
distribution
Used in inferential statistics.
FORMULAS
The population variance is
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$
The population standard deviation is
$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$
EXAMPLE
Find the variance and standard deviation
for the data set for Brand A paint.
10, 60, 50, 30, 40, 20.
Solution
Months, X    µ    X - µ    (X - µ)²
   10       35     -25       625
   60       35      25       625
   50       35      15       225
   30       35      -5        25
   40       35       5        25
   20       35     -15       225
                   Total    1750

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{1750}{6} \approx 291.7$$
$$\sigma = \sqrt{\frac{1750}{6}} \approx 17.1$$
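As a check on the hand calculation, here is a minimal Python sketch of the population formulas. The function name `population_variance` is an illustrative choice, not something defined in the slides.

```python
from math import sqrt

# Brand A paint data (months before fading).
brand_a = [10, 60, 50, 30, 40, 20]


def population_variance(values):
    """Average of the squared distances from the mean (divisor N)."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


variance = population_variance(brand_a)
std_dev = sqrt(variance)
print(round(variance, 1), round(std_dev, 1))
```

Python's standard library offers `statistics.pvariance` and `statistics.pstdev` for the same population quantities.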
Variance & Standard Deviation
(Sample Theoretical Model)
The sample variance is
$$s^2 = \frac{n \sum X^2 - \left(\sum X\right)^2}{n(n - 1)}$$
The sample standard deviation
is:
$$s = \sqrt{s^2}$$
EXAMPLE
Find the variance and standard deviation
for the amount of European auto sales for
a sample of 6 years. The data are in
millions of dollars:
11.2, 11.9, 12.0, 12.8, 13.4, 14.3
Solution
   X        X²
 11.2     125.44
 11.9     141.61
 12.0     144.00
 12.8     163.84
 13.4     179.56
 14.3     204.49
 75.6     958.94   (totals: $\sum X$ and $\sum X^2$)

$$s^2 = \frac{n \sum X^2 - \left(\sum X\right)^2}{n(n - 1)} = \frac{6(958.94) - (75.6)^2}{6(5)} \approx 1.28$$
$$s = \sqrt{1.28} \approx 1.13$$
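The shortcut formula used in this solution can be sketched in Python and compared against the standard library's `statistics.variance`, which computes the sample variance with divisor n - 1. The function name `sample_variance_shortcut` is an assumption made for this illustration.

```python
from math import sqrt
from statistics import variance

# European auto sales for a sample of 6 years (millions of dollars).
sales = [11.2, 11.9, 12.0, 12.8, 13.4, 14.3]


def sample_variance_shortcut(values):
    """s^2 = [n * sum(X^2) - (sum X)^2] / [n * (n - 1)]."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (n * sum_x2 - sum_x ** 2) / (n * (n - 1))


s2 = sample_variance_shortcut(sales)
s = sqrt(s2)
print(round(s2, 2), round(s, 2))
```

The shortcut form avoids computing each deviation from the mean, which is convenient by hand; in floating-point code the deviation-based form is usually the more numerically stable choice.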
Coefficient of Variation
The coefficient of variation is the
standard deviation divided by the mean,
expressed as a percentage.
$$\text{CVar} = \frac{s}{\bar{X}} \cdot 100\%$$
Use CVar to compare standard
deviations when the units are different.
Example
The mean of the number of sales of cars
over a 3-month period is 87, and the
standard deviation is 5. The mean of the
commissions is $5225, and the standard
deviation is $773. Compare the variations
of the two.
Solution
Sales: $\text{CVar} = \frac{5}{87} \cdot 100\% \approx 5.7\%$
Commissions: $\text{CVar} = \frac{773}{5225} \cdot 100\% \approx 14.8\%$
Commissions are more variable than sales.
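The same comparison, written as a small Python sketch. The function name `coefficient_of_variation` is illustrative and not taken from the slides.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    return std_dev / mean * 100


cv_sales = coefficient_of_variation(5, 87)
cv_commissions = coefficient_of_variation(773, 5225)
print(round(cv_sales, 1), round(cv_commissions, 1))
```

Because the result is a unit-free percentage, the number of cars sold and the dollar commissions can be compared directly.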
Practice problems
Exercise 3-3
Page # 126, Q # 10, 11, 12.
Page # 128, Q # 28-31.
Measures of Position
Measures of position identify where a data
value lies within a data set. Common
measures include percentiles, deciles,
and quartiles.
How Can We Measure Position?
Standard scores (z-scores)
Percentiles
Quartiles
Outliers
Standard scores OR Z-scores
A z-score or standard score for a value is obtained by
subtracting the mean from the value and dividing the
result by the standard deviation.
For a sample: $z = \frac{X - \bar{X}}{s}$   For a population: $z = \frac{X - \mu}{\sigma}$
The z-score represents the number of standard
deviations that a data value falls above or below the mean.
EXAMPLE
A student scored 65 on a calculus test
that had a mean of 50 and a standard
deviation of 10; he scored 30 on a history
test with a mean of 25 and a standard
deviation of 5. Compare his relative
positions on the two tests.
Solution
Calculus: $z = \frac{X - \bar{X}}{s} = \frac{65 - 50}{10} = 1.5$
History: $z = \frac{X - \bar{X}}{s} = \frac{30 - 25}{5} = 1.0$
Since his z-score in calculus is higher, he
has a higher relative position in the
calculus class than in the history class.
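A minimal Python sketch of the z-score comparison; the function name `z_score` is chosen here for illustration.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / std_dev


z_calculus = z_score(65, 50, 10)
z_history = z_score(30, 25, 5)
print(z_calculus, z_history)
```

A larger z-score means a higher position relative to the rest of that class, even though the raw scores are on different scales.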
NOTE
If the z-score is positive, the score is above
the mean. If the z-score is zero, the score
is the same as the mean. If the z-score is
negative, the score is below the mean.
Percentiles
Percentiles separate the data set into
100 equal groups.
A percentile rank for a datum represents
the percentage of data values below the
datum.
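FORMULA
$$\text{Percentile} = \frac{(\text{number of values below } X) + 0.5}{\text{total number of values}} \cdot 100\%$$
The position c of the value corresponding to a given
percentile p in a sorted data set of n values is:
$$c = \frac{n \cdot p}{100}$$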
Example 1
A teacher gives a 20-point test to 10
students. Find the percentile rank of a
score of 12.
18, 15, 12, 6, 8, 2, 3, 5, 20, 10
Solution
Sort in ascending order:
2, 3, 5, 6, 8, 10, 12, 15, 18, 20
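Six values (2, 3, 5, 6, 8, 10) lie below 12, so
$$\text{Percentile} = \frac{(\text{number of values below } X) + 0.5}{\text{total number of values}} \cdot 100\% = \frac{6 + 0.5}{10} \cdot 100\% = 65\%$$
A student whose score was 12 did better than
65% of the class.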
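The percentile rank formula can be sketched in Python as follows. The function name `percentile_rank` is an assumption made for this illustration.

```python
def percentile_rank(values, x):
    """Percentile rank = (count of values below x + 0.5) / n * 100."""
    below = sum(1 for v in values if v < x)
    return (below + 0.5) * 100 / len(values)


scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]
rank = percentile_rank(scores, 12)
print(rank)
```

Sorting is not needed for this calculation, since only the count of values below the score matters.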
EXAMPLE 2
A teacher gives a 20-point test to 10
students. Find the value corresponding to
the 25th percentile.
18, 15, 12, 6, 8, 2, 3, 5, 20, 10
SOLUTION
Sort in ascending order.
2, 3, 5, 6, 8, 10, 12, 15, 18, 20
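$$c = \frac{n \cdot p}{100} = \frac{10 \cdot 25}{100} = 2.5$$
Since c is not a whole number, round it up to 3 and
count to the third value in the sorted list.
The value 5 corresponds to the 25th
percentile.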
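Below is a hedged Python sketch of this lookup. The example rounds c = 2.5 up to 3; the rule used here for a whole-number c (average the c-th and (c+1)-th values) is a common textbook convention and is an assumption, since the slides do not show that case. The function name `value_at_percentile` is also illustrative.

```python
from math import ceil


def value_at_percentile(values, p):
    """Value at the p-th percentile using the position c = n * p / 100."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)
    n = len(ordered)
    c = n * p / 100
    if c != int(c):
        # c is not whole: round up and take the value at that position.
        return ordered[ceil(c) - 1]
    # Assumption (not shown in the slides): when c is whole,
    # average the c-th and (c+1)-th values.
    c = int(c)
    return (ordered[c - 1] + ordered[c]) / 2


scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]
p25 = value_at_percentile(scores, 25)
print(p25)
```

Positions are counted from 1 in the formula, so the code subtracts 1 when indexing the sorted list.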
Quartile
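Quartiles separate the data set into 4
equal groups. Q1 = P25, Q2 = MD (the median), Q3 = P75.
Each group holds 25% of the data, with Q1, Q2, and Q3
as the dividing points between consecutive groups.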
PROCEDURE
Step 1: Arrange the data in order from lowest to
highest.
Step 2: Find the median of the data values. This is
the value for Q2.
Step 3: Find the median of the data values that fall
below Q2. This is the value for Q1.
Step 4: Find the median of the data values that fall
above Q2. This is the value for Q3.
EXAMPLE
Find Q1, Q2, and Q3 for the data set
15, 13, 6, 5, 12, 50, 22, 18.
Solution:
Step 1: Arrange the data in order.
5, 6, 12, 13, 15, 18, 22, 50
Step 2: Find the median (Q2). The two middle values
are 13 and 15.
$$Q_2 = \frac{13 + 15}{2} = 14$$
Step 3: Find the median of the data values less
than 14.
5, 6, 12, 13
$$Q_1 = \frac{6 + 12}{2} = 9$$
Step 4: Find the median of the data values greater
than 14.
15, 18, 22, 50
$$Q_3 = \frac{18 + 22}{2} = 20$$
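The median-of-halves procedure can be sketched in Python as below. The function name `quartiles` is illustrative. One assumption is labeled in the code: the data are split by position, so for an odd number of values the median itself is left out of both halves.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves procedure."""
    ordered = sorted(values)
    n = len(ordered)
    q2 = median(ordered)
    # Assumption: split by position; with odd n the middle value
    # is excluded from both the lower and the upper half.
    lower = ordered[: n // 2]
    upper = ordered[(n + 1) // 2:]
    return median(lower), q2, median(upper)


data = [15, 13, 6, 5, 12, 50, 22, 18]
q1, q2, q3 = quartiles(data)
print(q1, q2, q3)
```

Other quartile conventions exist (for example, the interpolation methods in `statistics.quantiles`), and they can give slightly different values for small data sets.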
OUTLIER
An outlier is an extremely high or
extremely low data value when compared
with the rest of the data values.
The interquartile range (IQR) is
IQR = Q3 – Q1.
Procedure to find outliers
A data value less than Q1 – 1.5(IQR) or
greater than Q3 + 1.5(IQR) can be
considered an outlier.
EXAMPLE
Check the following data set for outliers.
5, 6, 12, 13, 15, 18, 22, 50
Solution: We have found Q1 = 9 and Q3 =20
Interquartile range (IQR) = Q3 - Q1 = 20-9=11
Q1 - 1.5(IQR) = 9 – 1.5(11) = -7.5
Q3 + 1.5(IQR) = 20 + 1.5(11) = 36.5
Now check the data set for any values that fall
outside the interval [-7.5, 36.5]. The only such value
is 50, so it is considered an outlier.
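The outlier check can be sketched in Python as follows, taking Q1 = 9 and Q3 = 20 from the example as given. The function name `find_outliers` is an illustrative choice.

```python
def find_outliers(values, q1, q3):
    """Return values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] and the fences."""
    iqr = q3 - q1
    low = q1 - 1.5 * iqr
    high = q3 + 1.5 * iqr
    outliers = [v for v in values if v < low or v > high]
    return outliers, (low, high)


data = [5, 6, 12, 13, 15, 18, 22, 50]
outliers, fences = find_outliers(data, q1=9, q3=20)
print(outliers, fences)
```

Passing the quartiles in as arguments keeps the check independent of which quartile convention was used to compute them.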
Practice Problems
Page # 141: Q#10, Q#12, Q#14, Q#15.
Page # 142: Q#22, Q#24, Q#26, Q#28
SKEWNESS
Skewness measures the degree of asymmetry
exhibited by the data.
$$\text{skewness} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{n s^3}$$
NOTE
The histogram is an effective graphical
technique for showing both the skewness
and kurtosis of a data set
Positive skewness vs negative
skewness
Positive skewness
– There are more observations below the mean than above it
– When the mean is greater than the median
Negative skewness
– There are a small number of low observations and a large
number of high ones
– When the median is greater than the mean
If skewness equals zero, the histogram is
symmetric about the mean
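A hedged Python sketch of the skewness formula follows. It assumes that s is the sample standard deviation (divisor n - 1), as defined earlier in these slides; the slides do not state this explicitly for skewness. The function name `skewness` is illustrative.

```python
from statistics import mean, stdev


def skewness(values):
    """Sum of cubed deviations from the mean, divided by n * s^3."""
    n = len(values)
    x_bar = mean(values)
    s = stdev(values)  # assumption: sample standard deviation
    return sum((x - x_bar) ** 3 for x in values) / (n * s ** 3)


symmetric = [1, 2, 3, 4, 5]
right_tailed = [5, 6, 12, 13, 15, 18, 22, 50]
print(skewness(symmetric), skewness(right_tailed))
```

The symmetric data give a skewness of zero, while the data set with the single large value 50 gives a positive skewness, consistent with its mean exceeding its median.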
Kurtosis
Kurtosis measures how peaked the histogram is
$$\text{kurtosis} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{n s^4} - 3$$
The kurtosis of a normal distribution is 0
Kurtosis characterizes the relative peakedness or flatness of a
distribution compared to the normal distribution
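A hedged Python sketch of the kurtosis formula, including the subtraction of 3 so that a normal distribution scores 0. As with skewness, it assumes that s is the sample standard deviation (divisor n - 1); the function name `excess_kurtosis` is illustrative.

```python
from statistics import mean, stdev


def excess_kurtosis(values):
    """Sum of fourth-power deviations over n * s^4, minus 3."""
    n = len(values)
    x_bar = mean(values)
    s = stdev(values)  # assumption: sample standard deviation
    return sum((x - x_bar) ** 4 for x in values) / (n * s ** 4) - 3


flat = [1, 2, 3, 4, 5]
with_extreme_value = [5, 6, 12, 13, 15, 18, 22, 50]
print(excess_kurtosis(flat), excess_kurtosis(with_extreme_value))
```

The evenly spread data give a negative value, while the data set containing the extreme value 50 gives a positive one.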