Chapter 3.
Describing data with numerical measures
1. Population and sample means
1) Population mean
· The population mean is the sum of all the values in the
population divided by the number of values in the population.
· Formula:
Population mean = (Sum of all the values in the population) / (Number of values in the population)
Or
$\mu = \frac{\sum X}{N}$
Where:
$\mu$ represents the population mean.
$N$ is the number of values in the population.
$X$ represents any particular value.
$\sum$ indicates the operation of adding.
$\sum X$ is the sum of the $X$ values in the population.
· Parameter
- Any measurable characteristic of a population.
- The mean of a population is a parameter.
· Example: Compute the mean of the following population
values:
6 3 5 7 6
$\mu = \frac{\sum X}{N} = \frac{6 + 3 + 5 + 7 + 6}{5} = \frac{27}{5} = 5.4$.
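The formula $\mu = \sum X / N$ translates directly into code. Below is a minimal Python sketch that applies it to the population example above. The function name `population_mean` is a hypothetical name chosen for illustration.

```python
def population_mean(values):
    """Return mu = (sum of X) / N for a list treated as a whole population."""
    n = len(values)       # N, the number of values in the population
    total = sum(values)   # sum of X
    return total / n

population = [6, 3, 5, 7, 6]
print(population_mean(population))  # 5.4
```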
2) The sample mean
· The sample mean is the sum of all the sampled values divided
by the number of sampled values.
· Formula:
Sample mean = (Sum of all the values in the sample) / (Number of values in the sample)
Or
$\bar{X} = \frac{\sum X}{n}$
Where:
$\bar{X}$ represents the sample mean.
$n$ is the number of values in the sample.
· Statistic
- A measurable characteristic of a sample.
- The mean of a sample is a statistic.
· Example: Compute the mean of the following sample values:
16.25 12.91 14.58
$\bar{X} = \frac{\sum X}{n} = \frac{16.25 + 12.91 + 14.58}{3} = \frac{43.74}{3} = 14.58$.
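The sample mean is computed the same way, dividing by $n$ instead of $N$. This sketch computes it by hand and checks it against Python's standard `statistics.mean`. The result is rounded because decimal inputs such as 12.91 are not stored exactly as binary floats.

```python
from statistics import mean

sample = [16.25, 12.91, 14.58]
x_bar = sum(sample) / len(sample)   # x-bar = (sum of X) / n
print(round(x_bar, 2))              # 14.58
print(round(mean(sample), 2))       # 14.58, standard-library result
```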
3) Properties of the mean
a. Every set of interval- or ratio-level data has a mean.
b. All the values are included in computing the mean.
c. The mean is unique.
d. The sum of the deviations of each value from the mean is zero.
Expressed symbolically:
$\sum (X - \bar{X}) = 0$.
As an example, the mean of 3, 8, and 4 is 5. Then,
$\sum (X - \bar{X}) = (3 - 5) + (8 - 5) + (4 - 5) = -2 + 3 - 1 = 0$.
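Property d can be checked numerically. This sketch uses `fractions.Fraction` so the check is exact. With ordinary floats, a mean such as 7/3 would leave a tiny rounding residue instead of an exact zero. The helper name `deviations_from_mean` is hypothetical.

```python
from fractions import Fraction

def deviations_from_mean(values):
    """Return the deviations X - mean, computed exactly with Fractions."""
    exact = [Fraction(v) for v in values]
    m = sum(exact) / len(exact)
    return [v - m for v in exact]

print(sum(deviations_from_mean([3, 8, 4])))  # 0  (deviations -2, 3, -1)
print(sum(deviations_from_mean([1, 2, 4])))  # 0  (mean is 7/3, not a whole number)
```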
2. Median vs. Mode
1) Median
· The midpoint of the values after they have been ordered from
the smallest to the largest, or the largest to the smallest.
· The median is denoted by m.
· If n is odd, m is the middle number; if n is even, m is the mean
of the middle two numbers.
· Examples
a. The observations 5, 7, 4, 5, 20, 6, 2.
b. The observations 5, 7, 4, 5, 20, 6.
· Properties:
a. It is not affected by extremely large or small values.
b. It can be computed for ordinal-level data or higher.
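The odd/even rule for the median can be sketched as follows and applied to examples a and b. The last call replaces 20 with a hypothetical extreme value of 2000 to illustrate property a. The hand-written function `median` mirrors the rule above; Python's `statistics.median` follows the same rule.

```python
def median(values):
    """Median m: middle value if n is odd, mean of the two middle values if n is even."""
    ordered = sorted(values)   # order from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([5, 7, 4, 5, 20, 6, 2]))    # 5    (ordered: 2, 4, 5, 5, 6, 7, 20)
print(median([5, 7, 4, 5, 20, 6]))       # 5.5  (ordered: 4, 5, 5, 6, 7, 20)
print(median([5, 7, 4, 5, 2000, 6, 2]))  # 5    (an extreme value does not move it)
```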
2) Mode
· The value of the observation that appears most frequently.
· Example: Recall Table 2-3.
Selling Prices ($ thousands)    Frequency
15 up to 18                      8
18 up to 21                     23
21 up to 24                     17
24 up to 27                     18
27 up to 30                      8
30 up to 33                      4
33 up to 36                      2
Total                           80
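For grouped data like Table 2-3, the modal class is the class with the highest frequency. This sketch finds it. Reporting the class midpoint as "the mode" is a common convention for grouped data, and is an assumption here since the slide does not state a final answer.

```python
# Table 2-3: (lower limit, upper limit) in $ thousands, then frequency
classes = [
    ((15, 18), 8),
    ((18, 21), 23),
    ((21, 24), 17),
    ((24, 27), 18),
    ((27, 30), 8),
    ((30, 33), 4),
    ((33, 36), 2),
]

# The class with the largest frequency is the modal class.
(lower, upper), freq = max(classes, key=lambda item: item[1])
midpoint = (lower + upper) / 2

print(f"Modal class: {lower} up to {upper}, frequency {freq}")  # 18 up to 21, 23
print(f"Midpoint of modal class: {midpoint}")                    # 19.5 ($ thousands)
```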
3. The relative positions of the mean, median, and mode
1) Symmetric distribution (with single peak)
· Mean = Median = Mode.
2) Positively skewed distribution
· Skewed to the right
· Example: Income distribution
· Mode < Median < Mean.
3) Negatively skewed distribution
· Skewed to the left
· Mean < Median < Mode.
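To make the orderings concrete, this sketch uses two small made-up data sets. They are not from the source and were chosen only because they follow the patterns described above: a long right tail pulls the mean above the median, and a long left tail pulls it below.

```python
from statistics import mean, median, mode

# Hypothetical data sets, chosen only to illustrate the ordering.
right_skewed = [1, 2, 2, 3, 4, 6, 12]    # long tail to the right
left_skewed = [1, 7, 9, 10, 11, 11, 12]  # long tail to the left

for name, data in [("right", right_skewed), ("left", left_skewed)]:
    print(name, mode(data), median(data), round(mean(data), 2))
# right 2 3 4.29    -> Mode < Median < Mean
# left 11 10 8.71   -> Mean < Median < Mode
```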
4. Why study dispersion?
1) To obtain information about the spread of the data.
· The mean or the median describes only the center of the data.
It is valuable from that standpoint, but it does not tell us
anything about the spread of the data.
· For example, if your nature guide told you that the river ahead
averaged 3 feet in depth, would you want to wade across on
foot without additional information? Probably not. You would
want to know something about the variation in the depth.
2) To compare the spread in two or more distributions.
· Suppose a statistics instructor has two classes, one in the
morning and one in the evening, each with six students. In the
morning class (AM) the students’ ages are 18, 20, 21, 21, 23,
and 23 years. In the evening class (PM) the ages are 17, 17, 18,
20, 25, and 29 years. Note that for both classes the mean age is
21 years but there is more variation or dispersion in the ages of
the evening students.
5. Measures of Dispersion
1) Range
· The difference between the largest and the smallest values in a
data set.
· Formula:
Range = Largest value − Smallest value
· Example: The statistics instructor referred to above has two
classes with the ages indicated:
A.M. Class: 18, 20, 21, 21, 23, 23
P.M. Class: 17, 17, 18, 20, 25, 29
The range for the classes is:
A.M. Class: (23 - 18) = 5
P.M. Class: (29 - 17) = 12
Thus we can say that there is more spread in the ages of the
students enrolled in the evening (P.M.) class compared with
the morning (A.M.) class.
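A short sketch that computes both the mean and the range for the two classes. It shows that equal means can hide very different spreads. The function is named `value_range` (a hypothetical name) to avoid shadowing Python's built-in `range`.

```python
am_class = [18, 20, 21, 21, 23, 23]
pm_class = [17, 17, 18, 20, 25, 29]

def value_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)

for label, ages in [("A.M.", am_class), ("P.M.", pm_class)]:
    print(label, "mean:", sum(ages) / len(ages), "range:", value_range(ages))
# A.M. mean: 21.0 range: 5
# P.M. mean: 21.0 range: 12
```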
2) Mean Deviation
· The arithmetic mean of the absolute values of the deviations
from the arithmetic mean.
· In terms of a formula, the mean deviation (MD) is computed
for a sample by:
$MD = \frac{\sum |X - \bar{X}|}{n}$
Where
$X$ is the value of each observation.
$\bar{X}$ is the sample mean.
$n$ is the number of observations in the sample.
| | indicates the absolute value.
· Example: The numbers of cappuccinos sold at the Starbucks
location in the Orange County Airport between 4 and 7 p.m.
for a sample of 5 days last year were 20, 40, 50, 60, and 80.
Determine the mean deviation for the number of cappuccinos
sold.
Mean = 50, Median = 50, Range = 80 - 20 = 60
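The mean deviation formula applied to the cappuccino sample, as a minimal sketch. The helper name `mean_deviation` is hypothetical. The absolute deviations are 30, 10, 0, 10, and 30, which sum to 80.

```python
def mean_deviation(values):
    """MD = sum(|X - x_bar|) / n for a sample."""
    n = len(values)
    x_bar = sum(values) / n
    return sum(abs(x - x_bar) for x in values) / n

cups = [20, 40, 50, 60, 80]
print(mean_deviation(cups))  # 16.0, i.e. 80 / 5
```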
3) Population variance and population standard deviation (SD)
· Population variance:
$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{\sum X^2}{N} - \mu^2$
Where
$\sigma^2$ is the population variance. (Read “sigma squared”.)
$X$ is a particular value in the population.
$\mu$ is the population mean.
$N$ is the number of values in the population.
· Population standard deviation:
$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} = \sqrt{\frac{\sum X^2}{N} - \mu^2}$
· Note the process of computing the variance.
a. We begin by finding the mean.
b. Next we find the difference between each observation and
the mean, and square that difference.
c. Then we sum all the squared differences.
d. And finally we divide the sum of the squared differences by
the number of items in the population.
· Note the process of computing the standard deviation.
a-d. Repeat steps a through d to obtain the population variance.
e. Take the square root of the population variance.
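Steps a through e map one-to-one onto lines of code. This sketch labels each step in a comment and applies the steps to the population 8, 3, 7, 3, 4 from the example. The function names are hypothetical; Python's `statistics.pvariance` and `statistics.pstdev` compute the same population quantities.

```python
import math

def population_variance(values):
    n = len(values)
    mu = sum(values) / n                       # a. find the mean
    squared = [(x - mu) ** 2 for x in values]  # b. square each deviation from the mean
    total = sum(squared)                       # c. sum the squared deviations
    return total / n                           # d. divide by N

def population_sd(values):
    return math.sqrt(population_variance(values))  # e. take the square root

data = [8, 3, 7, 3, 4]
print(population_variance(data))      # 4.4
print(round(population_sd(data), 3))  # 2.098
```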
· Example: Consider these five values as a population: 8, 3, 7, 3,
and 4.
a. Determine the mean of the population. (5)
b. Determine the variance. (4.4)
c. Determine the standard deviation. (2.098)
X      X − μ          (X − μ)²
8      8 − 5 = 3         9
3      3 − 5 = −2        4
7      7 − 5 = 2         4
3      3 − 5 = −2        4
4      4 − 5 = −1        1
Sum    25               22
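The table can be reproduced and cross-checked against the second form of the variance formula, $\sum X^2 / N - \mu^2$. Here $\sum X^2 = 147$, so $147/5 - 25 = 4.4$. This is a sketch only. In floating point, the shortcut form can differ from the definitional form in the last few digits because it subtracts two nearby numbers, so both results are rounded before printing.

```python
data = [8, 3, 7, 3, 4]
N = len(data)
mu = sum(data) / N  # 5.0

print("X  X-mu  (X-mu)^2")
for x in data:
    print(f"{x}  {x - mu:+.0f}  {(x - mu) ** 2:.0f}")

definitional = sum((x - mu) ** 2 for x in data) / N  # sum((X - mu)^2) / N
shortcut = sum(x ** 2 for x in data) / N - mu ** 2   # sum(X^2) / N - mu^2
print(round(definitional, 6), round(shortcut, 6))    # 4.4 4.4
```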
4) Sample variance and sample standard deviation (SD)
· The conversion from population variance to sample variance
requires a change in the denominator. Instead of substituting n,
the number in the sample, for N, the number in the population,
we replace N with (n – 1).
· Thus, the formula for the sample variance is:
$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
Where
$s^2$ is the sample variance.
$X$ is the value of each observation in the sample.
$\bar{X}$ is the sample mean.
$n$ is the number of observations in the sample.
· Changing the denominator to (n − 1) may seem insignificant;
however, using n tends to underestimate the population
variance. Using (n − 1) in the denominator provides an
appropriate correction factor.
· Sample standard deviation:
$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$.
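One way to see the underestimation is to enumerate every possible sample exactly. This sketch assumes sampling with replacement, uses a sample size of n = 2, and borrows the population 8, 3, 7, 3, 4 from the earlier example. It then averages the variance computed with divisor n and with divisor n − 1 over all 25 equally likely samples. This is an illustration under those assumptions, not a proof. The helper `var_with_divisor` is a hypothetical name.

```python
from fractions import Fraction
from itertools import product

population = [Fraction(v) for v in [8, 3, 7, 3, 4]]
N = len(population)
mu = sum(population) / N
sigma2 = sum((x - mu) ** 2 for x in population) / N  # population variance, 22/5

def var_with_divisor(sample, divisor):
    """Sum of squared deviations from the sample mean, divided by `divisor`."""
    x_bar = sum(sample) / len(sample)
    return sum((x - x_bar) ** 2 for x in sample) / divisor

n = 2
# All N**n ordered samples drawn with replacement; each is equally likely.
samples = list(product(population, repeat=n))
avg_div_n = sum(var_with_divisor(s, n) for s in samples) / len(samples)
avg_div_n_minus_1 = sum(var_with_divisor(s, n - 1) for s in samples) / len(samples)

print(sigma2)             # 22/5
print(avg_div_n)          # 11/5  (too small on average)
print(avg_div_n_minus_1)  # 22/5  (equals sigma^2 on average)
```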
· Example: The hourly wages for a sample of part-time
employees at Home Depot are: $12, $20, $16, $18, and $19.
What is the sample variance?
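A worked sketch for the Home Depot wages. The sample mean is 17. The squared deviations are 25, 9, 1, 1, and 4, which sum to 40, so $s^2 = 40 / (5 - 1) = 10$. The standard-library `statistics.variance` also uses the n − 1 denominator and should agree.

```python
import math
import statistics

wages = [12, 20, 16, 18, 19]
n = len(wages)
x_bar = sum(wages) / n                               # 17.0
s2 = sum((x - x_bar) ** 2 for x in wages) / (n - 1)  # 40 / 4
s = math.sqrt(s2)                                    # sample standard deviation

print(x_bar, s2, round(s, 2))       # 17.0 10.0 3.16
print(statistics.variance(wages))   # standard-library sample variance (n - 1)
```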
6. Interpretation and uses of the standard deviation
1) General interpretation
· The standard deviation is used to measure the spread of the
data.
· A small standard deviation indicates that the data are clustered
close to the mean, so the mean is representative of the data.
· A large standard deviation indicates that the data are spread
out from the mean, so the mean is less representative of the
data.
2) Chebyshev’s Theorem
· We can use Chebyshev’s theorem to determine the percent of
the values that lie within a specified number of standard
deviations of the mean.
· Definition: For any set of observations (sample or population),
the proportion of the values that lie within k standard
deviations of the mean is at least $1 - 1/k^2$, where k is any
constant greater than 1.
· The theorem holds for any set of observations regardless of the
shape of the distribution.
· Example: The arithmetic mean biweekly amount contributed
by the Dupree Paint employees to the company’s profit-
sharing plan is $51.54, and the standard deviation is $7.51. At
least what percent of the contributions lie within plus 3.5
standard deviations and minus 3.5 standard deviations of the
mean?
At least approximately 92 percent, found by
$1 - \frac{1}{k^2} = 1 - \frac{1}{(3.5)^2} = 1 - \frac{1}{12.25} \approx 0.92$.
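A sketch of the Chebyshev bound applied to the Dupree Paint figures. It also computes the dollar interval $51.54 \pm 3.5 \times 7.51$ that the bound refers to. The function name `chebyshev_min_proportion` is hypothetical. The guard on k reflects the theorem's requirement that k be greater than 1.

```python
def chebyshev_min_proportion(k):
    """Lower bound 1 - 1/k**2 on the share of values within k SDs of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2

avg, sd, k = 51.54, 7.51, 3.5
low, high = avg - k * sd, avg + k * sd

print(round(chebyshev_min_proportion(k), 4))  # 0.9184
print(f"${low:.2f} to ${high:.2f}")           # interval mean +/- 3.5 SD
```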
3) The Empirical Rule (68-95-99.7 rule)
· If the distribution is approximately symmetrical and bell-shaped,
the Empirical Rule, or Normal Rule as it is often called, can be
applied.
· Definition: For a symmetrical, bell-shaped frequency
distribution, approximately 68 percent of the observations will
lie within plus and minus one standard deviation of the mean;
about 95 percent of the observations will lie within plus and
minus two standard deviations of the mean; and practically all
(99.7 percent) will lie within plus and minus three standard
deviations of the mean.
· The rule states that:
a. The mean, plus and minus one standard deviation, will
include about 68% of the observations.
b. The mean, plus and minus two standard deviations, will
include about 95% of the observations.
c. The mean, plus and minus three standard deviations, will
include about 99.7% of the observations.
· [Chart 3-7] A symmetric, bell-shaped distribution with a
mean of 100 and a standard deviation of 10.
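The 68-95-99.7 figures are rounded probabilities from the normal distribution. This sketch assumes an exactly normal distribution with the Chart 3-7 parameters (mean 100, standard deviation 10). It uses the standard-library `statistics.NormalDist` to compute the share within 1, 2, and 3 standard deviations.

```python
from statistics import NormalDist

# Assumption: an exactly normal distribution with the Chart 3-7 parameters.
dist = NormalDist(mu=100, sigma=10)

shares = {}
for k in (1, 2, 3):
    low = dist.mean - k * dist.stdev
    high = dist.mean + k * dist.stdev
    shares[k] = dist.cdf(high) - dist.cdf(low)  # P(low < X < high)
    print(f"{low:.0f} to {high:.0f}: {shares[k]:.4f}")
# 90 to 110: 0.6827
# 80 to 120: 0.9545
# 70 to 130: 0.9973
```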
· Example: The distribution of the weights of a sample of 1400
cargo containers is symmetric and bell-shaped. According to
the Empirical Rule, what percent of the weights will lie:
a. Between $\bar{X} - 2s$ and $\bar{X} + 2s$?
b. Between $\bar{X}$ and $\bar{X} + 2s$? Below $\bar{X} - 2s$?
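The answers follow from the Empirical Rule percentages plus symmetry. Half of the roughly 95 percent within two standard deviations lies above the mean. The roughly 5 percent outside is split evenly between the two tails. This is a sketch of that arithmetic. The container count is only a rough expected figure, since the rule itself is approximate.

```python
# Empirical Rule percentages (approximate) for mean +/- k standard deviations
within = {1: 68.0, 2: 95.0, 3: 99.7}

a = within[2]                    # between x_bar - 2s and x_bar + 2s
b_upper_half = within[2] / 2     # between x_bar and x_bar + 2s, by symmetry
b_below = (100 - within[2]) / 2  # below x_bar - 2s: half of the 5% outside

n_containers = 1400
print(a, b_upper_half, b_below)             # 95.0 47.5 2.5
print(round(n_containers * b_below / 100))  # 35, roughly, below x_bar - 2s
```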