MEASURES OF DISPERSION
• Measures of dispersion describe the spread of the data. They include the range, interquartile range, standard deviation and variance.
Range and Interquartile Range
• The range is given by the smallest and largest observations, and is often reported as the difference between them. This is the simplest measure of variability.
Machine 1    Machine 2
10.07        8.01
5.85         7.99
8.1          7.95
9.52         8.03
6.41         8.02
Mean: 8.0    Mean: 8.0
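The two machines have nearly the same mean but very different spreads. The sketch below illustrates this by computing the range of each column. It assumes the range is reported as the difference between the largest and smallest readings; the names `machine_1`, `machine_2` and `data_range` are illustrative only.

```python
# Readings for the two machines, five per machine.
machine_1 = [10.07, 5.85, 8.1, 9.52, 6.41]
machine_2 = [8.01, 7.99, 7.95, 8.03, 8.02]


def data_range(values):
    """Return the largest observation minus the smallest observation."""
    return max(values) - min(values)


range_1 = data_range(machine_1)
range_2 = data_range(machine_2)

# Round only for display; the raw differences carry floating-point noise.
print("Machine 1 range:", round(range_1, 2))
print("Machine 2 range:", round(range_2, 2))
```

Machine 1 has a much larger range than Machine 2, even though both means round to 8.0.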
STANDARD DEVIATION
What is standard deviation?
-The standard deviation of a set of
numerical data makes use of the
amount by which each individual
data value deviates from the
mean.
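If x₁, x₂, ..., xₙ is a population of n numbers with a mean of μ, then the standard deviation of the population is
$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{n}}$$
If x₁, x₂, ..., xₙ is a sample of n numbers with a mean of x̄, then the standard deviation of the sample is
$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$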
Standard deviation of the population
Population data: 8.01, 7.99, 7.95, 8.03, 8.02
$$\mu = \frac{8.01 + 7.99 + 7.95 + 8.03 + 8.02}{5} = \frac{40}{5} = 8$$
$$\sum (x_i - \mu)^2 = (8.01 - 8)^2 + (7.99 - 8)^2 + (7.95 - 8)^2 + (8.03 - 8)^2 + (8.02 - 8)^2$$
$$\sum (x_i - \mu)^2 = (0.01)^2 + (-0.01)^2 + (-0.05)^2 + (0.03)^2 + (0.02)^2$$
$$\sum (x_i - \mu)^2 = 0.0001 + 0.0001 + 0.0025 + 0.0009 + 0.0004 = 0.004$$
$$\sigma = \sqrt{\frac{0.004}{5}} = \sqrt{0.0008} \approx 0.03$$
Standard deviation of the sample
Sample data: 8.01, 7.99, 7.95, 8.03, 8.02
$$\bar{x} = \frac{8.01 + 7.99 + 7.95 + 8.03 + 8.02}{5} = \frac{40}{5} = 8$$
$$\sum (x_i - \bar{x})^2 = (8.01 - 8)^2 + (7.99 - 8)^2 + (7.95 - 8)^2 + (8.03 - 8)^2 + (8.02 - 8)^2$$
$$\sum (x_i - \bar{x})^2 = (0.01)^2 + (-0.01)^2 + (-0.05)^2 + (0.03)^2 + (0.02)^2$$
$$\sum (x_i - \bar{x})^2 = 0.0001 + 0.0001 + 0.0025 + 0.0009 + 0.0004 = 0.004$$
$$s = \sqrt{\frac{0.004}{n - 1}} = \sqrt{\frac{0.004}{5 - 1}} = \sqrt{0.001} \approx 0.03$$
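The two hand calculations differ only in the divisor (n for a population, n - 1 for a sample). A minimal Python sketch of both follows, checked against the standard library's `statistics.pstdev` and `statistics.stdev`. The variable names are illustrative.

```python
import math
import statistics

data = [8.01, 7.99, 7.95, 8.03, 8.02]
n = len(data)

# Mean, then the sum of squared deviations from the mean.
mean = sum(data) / n
sum_sq_dev = sum((x - mean) ** 2 for x in data)

# Population formula divides by n; sample formula divides by n - 1.
sigma = math.sqrt(sum_sq_dev / n)
s = math.sqrt(sum_sq_dev / (n - 1))

print(round(sigma, 4), round(s, 4))  # 0.0283 0.0316

# Cross-check against the standard library implementations.
print(math.isclose(sigma, statistics.pstdev(data), abs_tol=1e-12))
print(math.isclose(s, statistics.stdev(data), abs_tol=1e-12))
```

Both values round to 0.03 at two decimal places, which is why the two worked examples end with the same figure.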
Variance
- A statistic known as the variance is
also used as a measure of dispersion.
- The variance for a given set of data is
the square of the standard deviation
of the data.
Sample Variance Formula
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

x        x - x̄     (x - x̄)²
7.95     -0.05     0.0025
7.99     -0.01     0.0001
8.01      0.01     0.0001
8.02      0.02     0.0004
8.03      0.03     0.0009
Sum: 40            0.004

$$\bar{x} = \frac{40}{5} = 8$$
$$s^2 = \frac{0.004}{5 - 1} = \frac{0.004}{4} = 0.001$$
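As a quick check of the variance calculation, the sketch below uses Python's `statistics.variance` (sample, divides by n - 1) and `statistics.pvariance` (population, divides by n), and confirms that the variance is the square of the standard deviation. Variable names are illustrative.

```python
import statistics

data = [7.95, 7.99, 8.01, 8.02, 8.03]

sample_variance = statistics.variance(data)        # divides by n - 1
population_variance = statistics.pvariance(data)   # divides by n

print(round(sample_variance, 4))      # 0.001
print(round(population_variance, 4))  # 0.0008

# The variance is the square of the matching standard deviation.
sample_sd_squared = statistics.stdev(data) ** 2
print(abs(sample_sd_squared - sample_variance) < 1e-12)
```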
MEASURES OF RELATIVE POSITION
The number of standard deviations
between the data value and the mean is
known as the data value's z-score or
standard score
Z-Scores
The z-score for a given data value x is the
number of standard deviations that x is
above or below the mean of the data.
Always remember that:
• If a z-score is equal to 0, the raw score is equal to the mean.
• A positive z-score indicates the raw score is higher than the mean. For example, if a z-score is equal to +1, it is 1 standard deviation above the mean.
• A negative z-score indicates the raw score is below the mean. For example, if a z-score is -2, it is 2 standard deviations below the mean.
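The following formulas show how to calculate the z-score for a data value x in a population and in a sample.
Population:
$$z_x = \frac{x - \mu}{\sigma}$$
where x is the raw score, μ is the mean of the population, and σ is the standard deviation of the population.
Sample:
$$z_x = \frac{x - \bar{x}}{s}$$
where x is the raw score, x̄ is the mean of the sample, and s is the standard deviation of the sample.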
EXAMPLE 1
Compare z-Scores
Raul has taken two tests in his chemistry
class. He scored 72 on the first test, for which
the mean of all scores was 65 and the
standard deviation was 8. He received a 60 on
a second test, for which the mean of all scores
was 45 and the standard deviation was 12. In
comparison to the other students, did Raul do
better on the first test or the second test?
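SOLUTION:
Given for the first test: x = 72, μ = 65, σ = 8
Formula:
$$z_{72} = \frac{72 - 65}{8} = 0.875$$
Raul scored 0.875 standard deviations above the mean on the first test.
Given for the second test: x = 60, μ = 45, σ = 12
Formula:
$$z_{60} = \frac{60 - 45}{12} = 1.25$$
Raul scored 1.25 standard deviations above the mean on the second test.
Because 1.25 > 0.875, Raul did better on the second test in comparison to the other students.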
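The comparison of Raul's two tests can be sketched in a few lines of Python. The helper name `z_score` is hypothetical, and the sketch treats the class results as a population (mean and standard deviation of all scores).

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev


z_first = z_score(72, mean=65, std_dev=8)
z_second = z_score(60, mean=45, std_dev=12)

print(z_first)   # 0.875
print(z_second)  # 1.25

# The larger z-score marks the better result relative to the class.
better = "second" if z_second > z_first else "first"
print("Raul did better on the", better, "test")
```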
PERCENTILE
Most standardized examinations provide scores
in terms of percentiles, which are defined as
follows:
pth Percentile
A value x is called the pth percentile of a data
set provided p% of the data values are less
than x.
The following formula can be used to find the
percentile that corresponds to a particular data
value in a set of data
PERCENTILE FOR A GIVEN DATA VALUE
Given a set of data and a data value x,
$$\text{Percentile of score } x = \frac{\text{number of data values less than } x}{\text{total number of data values}} \cdot 100$$
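A minimal sketch of the percentile formula in Python follows. The function name `percentile_of` and the list of ten scores are hypothetical, made up only to exercise the formula.

```python
def percentile_of(x, data):
    """Percentile of x: the share of data values strictly less than x, times 100."""
    below = sum(1 for value in data if value < x)
    return 100 * below / len(data)


# Hypothetical exam scores, not taken from the lecture.
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

p = percentile_of(85, scores)
print(p)  # 60.0, because 6 of the 10 scores are less than 85
```

A score of 85 in this made-up set is therefore at the 60th percentile.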
QUARTILES
The keyword is "quarter", which means to divide into four.
In quartiles, the data set is partitioned into 4 approximately equal groups. The medians that occupy the demarcation lines are the quartiles (Q1, Q2, Q3).
Q1 is the "middle" value in the first half of
the rank-ordered data set.
Q2 is the median value in the set.
Q3 is the "middle" value in the second half
of the rank-ordered data set.
How to find the Quartiles using the Medians
A. Rank the data.
B. Find the median and label it Q2.
C. Find the median Q1 of the group of data values less than Q2.
D. Find the median Q3 of the group of data values greater than Q2.
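EXAMPLE
FIND THE QUARTILES USING THE MEDIAN
35 31 29 28 29 31 27 33 32 39 31
Sort from lowest to highest:
27 28 29 29 31 31 31 32 33 35 39
Q2 is the middle (6th) value, 31. Q1 is the median of the five values in the first half, 29. Q3 is the median of the five values in the second half, 33.
Q1 = 29, Q2 = 31, Q3 = 33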
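Steps A to D can be sketched in Python as follows. This assumes the halves are split by position in the ranked list and that the median itself is left out of both halves when the number of values is odd; other quartile conventions exist and can give slightly different answers. The function name `quartiles_by_medians` is hypothetical.

```python
import statistics


def quartiles_by_medians(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ranked = sorted(data)                 # Step A: rank the data
    n = len(ranked)
    half = n // 2
    q2 = statistics.median(ranked)        # Step B: median of all values
    lower = ranked[:half]                 # values below the median position
    # Skip the middle element when n is odd so it belongs to neither half.
    upper = ranked[half + 1:] if n % 2 else ranked[half:]
    q1 = statistics.median(lower)         # Step C
    q3 = statistics.median(upper)         # Step D
    return q1, q2, q3


values = [35, 31, 29, 28, 29, 31, 27, 33, 32, 39, 31]
q1, q2, q3 = quartiles_by_medians(values)
print(q1, q2, q3)  # 29 31 33
```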
Normal Distribution
A normal distribution is a continuous probability distribution that is symmetrical around its mean. Most of the observations cluster around the central peak, and the probabilities for values further away from the mean taper off equally in both directions.
Frequency Distributions
A frequency distribution is a representation that displays the number of observations within a given interval. ...
Frequency distributions are particularly useful for normal distributions, where they show how the observations are divided among standard deviations from the mean.
Histogram
A histogram is a bar graph-like
representation of data that buckets a
range of outcomes into columns along
the x-axis.
The y-axis represents the count or percentage of occurrences in the data for each column, so the chart can be used to visualize data distributions.
What is histogram and its uses?
A histogram allows you to see the
frequency distribution of a data set. It
offers an “at a glance” picture of a
distribution pattern, charted in specific
categories. Histograms are one of the
most frequently used methods for
charting historical data. ... It's a
simple chart that employs a horizontal
and vertical axis.
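To make the link between a frequency distribution and a histogram concrete, the sketch below buckets values into equal-width intervals and prints one row of `#` marks per interval (a text histogram turned on its side). It reuses the eleven values from the quartile example; the bin width of 5 and the function name `frequency_distribution` are assumptions chosen for illustration.

```python
from collections import Counter


def frequency_distribution(data, bin_width):
    """Map each interval's lower bound to the number of values falling in it."""
    counts = Counter((value // bin_width) * bin_width for value in data)
    return dict(sorted(counts.items()))


values = [35, 31, 29, 28, 29, 31, 27, 33, 32, 39, 31]
bin_width = 5
freq = frequency_distribution(values, bin_width)

# One row per interval; the number of '#' marks is the frequency.
for start, count in freq.items():
    print(f"{start}-{start + bin_width - 1}: {'#' * count}")
```

For these values the intervals 25-29, 30-34 and 35-39 contain 4, 5 and 2 observations respectively.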
THE NORMAL DISTRIBUTION AND THE EMPIRICAL RULE
A NORMAL DISTRIBUTION FORMS A BELL-SHAPED CURVE THAT IS SYMMETRIC ABOUT A VERTICAL LINE THROUGH THE MEAN OF THE DATA. IT IS SOMETIMES CALLED THE BELL CURVE OR NORMAL CURVE.
PROPERTIES OF A NORMAL DISTRIBUTION
🐶 THE GRAPH IS SYMMETRIC ABOUT A VERTICAL LINE THROUGH THE MEAN OF THE DISTRIBUTION.
🐶 THE MEAN, MEDIAN, AND MODE ARE EQUAL.
🐶 THE Y-VALUE OF EACH POINT ON THE CURVE IS THE PERCENT (EXPRESSED AS A DECIMAL) OF THE DATA AT THE CORRESPONDING X-VALUE.
🐶 AREAS UNDER THE CURVE THAT ARE SYMMETRIC ABOUT THE MEAN ARE EQUAL.
🐶 THE TOTAL AREA UNDER THE CURVE IS 1.
THE FOLLOWING RULE, CALLED THE EMPIRICAL RULE, DESCRIBES THE PERCENT OF DATA THAT LIES WITHIN 1, 2, AND 3 STANDARD DEVIATIONS OF THE MEAN IN A NORMAL DISTRIBUTION.
EMPIRICAL RULE OF A NORMAL DISTRIBUTION
🐶 ABOUT 68% OF THE DATA LIE WITHIN 1 STANDARD DEVIATION OF THE MEAN.
🐶 ABOUT 95% OF THE DATA LIE WITHIN 2 STANDARD DEVIATIONS OF THE MEAN.
🐶 ABOUT 99.7% OF THE DATA LIE WITHIN 3 STANDARD DEVIATIONS OF THE MEAN.
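The three percentages in the empirical rule are rounded values. They can be checked with the standard library's `statistics.NormalDist` (available from Python 3.8), as in this sketch; the helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

# A normal distribution with mean 0 and standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(100 * share_within(k), 1), "%")
```

The printed shares are roughly 68.3%, 95.4% and 99.7%, which the rule rounds to 68%, 95% and 99.7%.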
The Standard Normal Distribution
►If the original distribution of X is a normal distribution, then the corresponding distribution of z-scores will also be a normal distribution. This normal distribution of z-scores is called the Standard Normal Distribution. See Figure 4.7. It has a mean of 0 and a standard deviation of 1.
The Standard Normal Distribution
►The standard normal distribution is the
normal distribution that has a mean 0 and
standard deviation of 1.
►Tables and calculators are often used to determine the area under a portion of the standard normal curve.
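In place of a printed table, the area under a portion of the standard normal curve can also be computed in software. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `area_between` and the chosen z-values are illustrative assumptions.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
z_dist = NormalDist(mu=0, sigma=1)


def area_between(z_low, z_high):
    """Area under the standard normal curve between two z-scores."""
    return z_dist.cdf(z_high) - z_dist.cdf(z_low)


area_0_to_1 = area_between(0, 1)    # between the mean and z = 1
area_left_of_125 = z_dist.cdf(1.25)  # everything to the left of z = 1.25

print(round(area_0_to_1, 4))       # 0.3413
print(round(area_left_of_125, 4))  # 0.8944
```

Because the curve is symmetric and the total area is 1, the area between z = -1 and z = 0 is also about 0.3413.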