Chapter 7
Sampling Distributions and Point Estimation of
Parameters
Part 1: Sampling Distributions,
the Central Limit Theorem,
Point Estimation & Estimators
Sections 7-1 to 7-2
Statistical Inferences
A random sample is collected on a population to draw conclusions, or
make statistical inferences, about the population.
Definition (Random Sample)
The random variables X1 , X2 , . . . , Xn are a random sample of size n if...
1) the Xi ’s are independent
2) every Xi has the same probability distribution
Types of statistical inference:
1 Parameter estimation (e.g. estimating µ) with a confidence interval
For estimating µ, we collect data and we use the observed sample
mean x̄ as a point estimate for µ and create a confidence interval to
report a likely range in which µ lies.
2 Hypothesis testing about a population parameter (e.g. H0 : µ = 50)
We wish to compare the mean time that women and men spend at the
CRWC. Is H0 : µM = µW plausible, or is there evidence against this
hypothesis?
Sample Mean X̄, a Point Estimate for µ
The sample mean X̄ is used as a point estimate for the population
parameter µ. It is a point estimate because it is a single value.
NOTATION: µ̂ = X̄ (a ‘hat’ over a parameter represents an estimator)
X̄ is the estimator here
Prior to data collection, X̄ is a random variable; it is the statistic
of interest calculated from the data when estimating µ.
The value we get for X̄ (the sample mean) depends on the specific
sample chosen!
If X̄ is a random variable, then it has a certain expected value,
variance, and distribution. The distribution of the random variable X̄
is called the sampling distribution of X̄.
Sample-to-Sample Variability
As stated earlier, there is randomness in the X̄ value we get from a
random sample. Suppose I want to estimate a population mean
height µ using a sample mean X̄.
Suppose I randomly select 50 individuals from a population, measure
their heights, and find the sample mean x̄ = 5 foot 6 inches.
Suppose I repeat the process: I again randomly select 50 individuals
from the population, measure their heights, and find the sample mean
x̄ = 5 foot 8 inches.
Suppose I repeat the process once more: I again randomly select 50
individuals from the population, measure their heights, and find the
sample mean x̄ = 5 foot 5 inches.
I didn’t do anything wrong in my data collection; this is just
SAMPLING VARIABILITY!
[NOTE: In reality, we only take one sample. The above is meant to emphasize the
existence of sample-to-sample variability.]
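The repeated-sampling story above can be imitated with a short simulation. This is only a sketch: the population (heights in inches, normal with mean 67 and standard deviation 3) is a hypothetical choice made for illustration and does not come from the slides.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

# Hypothetical population of heights in inches (assumed values, not from the slides).
POP_MEAN, POP_SD = 67.0, 3.0
N = 50  # sample size used in the slide's story

def sample_mean(n):
    """Draw n heights at random and return their sample mean x-bar."""
    return statistics.fmean(random.gauss(POP_MEAN, POP_SD) for _ in range(n))

# Repeat the whole data-collection process three times.
sample_means = [sample_mean(N) for _ in range(3)]
for k, xbar in enumerate(sample_means, start=1):
    print(f"sample {k}: x-bar = {xbar:.2f} inches")
```

Each run of the process gives a slightly different x̄ even though the population never changes, which is exactly the sample-to-sample variability described above.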
The Sampling Distribution of X̄
Definition (Sampling Distribution)
The probability distribution of a statistic is called a sampling distribution.
X̄ is a statistic calculated from a random sample X1 , X2 , . . . , Xn .
X̄ is a linear combination of random variables.
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{1}{n} X_1 + \frac{1}{n} X_2 + \cdots + \frac{1}{n} X_n$
For a random sample X1 , X2 , . . . , Xn drawn from any distribution
with E(Xi ) = µ and V (Xi ) = σ², or Xi ∼ ?(µ, σ²), we have
$E(\bar{X}) = \mu$ and $V(\bar{X}) = \frac{\sigma^2}{n}$
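As a sketch of why these two results hold, the derivation below uses only linearity of expectation and, for the variance, the independence of the Xi from the definition of a random sample. The notation matches the slide.

```latex
\begin{align*}
E(\bar{X}) &= E\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
            = \frac{1}{n}\sum_{i=1}^{n} E(X_i)
            = \frac{1}{n}\,(n\mu) = \mu \\[4pt]
V(\bar{X}) &= V\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
            = \frac{1}{n^2}\sum_{i=1}^{n} V(X_i)
            \quad \text{(independence of the } X_i\text{)} \\
           &= \frac{1}{n^2}\,(n\sigma^2) = \frac{\sigma^2}{n}
\end{align*}
```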
But a mean and a variance do not fully specify a distribution.
Do we know the probability distribution of X̄? ...
The Sampling Distribution of X̄
It turns out that X̄ has some predictable behavior...
If the X1 , X2 , . . . , Xn are drawn from a normal distribution, or by
notation Xi ∼ N (µ, σ²) for all i, then
$\bar{X} \sim N(\mu, \sigma^2/n)$ for any sample size n.
Example
Suppose IQ scores are normally distributed with mean µ = 100 and
variance σ² = 256. If n = 9 IQ scores are drawn at random from this
population, what is the probability that the sample mean is less than 93?
ANSWER: Find P (X̄ < 93) (next page).
The Sampling Distribution of X̄
Example
Suppose IQ scores are normally distributed with mean µ = 100 and
variance σ² = 256. If n = 9 IQ scores are drawn at random from this
population, what is the probability that the sample mean is less than 93?
ANSWER: Find P (X̄ < 93).
We first need a distribution for X̄ (it follows a normal distribution!), and
then we’ll use it to create a Z random variable and use the Z-table.
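One way to carry out this calculation is sketched below, using Python's standard-library `statistics.NormalDist` in place of the Z-table. The variable names are illustrative; the numbers µ = 100, σ² = 256, n = 9 and the cutoff 93 are from the example.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma2, n = 100, 256, 9       # population mean, population variance, sample size
cutoff = 93

# X-bar ~ N(mu, sigma2 / n), so its standard deviation (the standard error) is:
std_error = sqrt(sigma2 / n)      # 16 / 3, about 5.33

# Standardize the cutoff to a Z value, then look up the standard normal CDF.
z = (cutoff - mu) / std_error     # -7 / (16/3) = -1.3125
prob = NormalDist().cdf(z)

print(f"standard error = {std_error:.3f}")
print(f"z = {z:.4f}")
print(f"P(X-bar < {cutoff}) = {prob:.4f}")
```

The result is about 0.095. A printed Z-table read at z = −1.31 gives 0.0951, which differs slightly only because the table rounds z to two decimals.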
The Sampling Distribution of X̄
The graphic below shows how the variability in X̄ decreases as n
increases. Recall $\bar{X} \sim N(\mu, \sigma^2/n)$.
The Sampling Distribution of X̄
Notation:
$E(\bar{X}) = \mu_{\bar{X}} = E(X) = \mu$
$V(\bar{X}) = \sigma^2_{\bar{X}} = \frac{V(X)}{n} = \frac{\sigma^2}{n}$
Terminology:
The term standard deviation refers to the population standard
deviation, or $\sqrt{V(X)} = \sigma$, and...
$Z = \frac{X - \mu}{\sigma}$
The term standard error is a value related to X̄ and is also more
fully stated as the standard error of the sample mean and it is the
square root of the variance of X̄.
Std. Error of X̄ is $\sqrt{V(\bar{X})} = \sqrt{\frac{\sigma^2}{n}} = \frac{\sigma}{\sqrt{n}}$
And then...
$Z = \frac{\bar{X} - \mu}{\sqrt{\sigma^2/n}} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
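To make the distinction between standard deviation and standard error concrete, the sketch below tabulates σ/√n for several sample sizes. The helper name `standard_error` is made up for this illustration, and σ = 16 is borrowed from the IQ example (variance 256).

```python
from math import sqrt

def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    if n < 1:
        raise ValueError("sample size n must be at least 1")
    return sigma / sqrt(n)

SIGMA = 16.0  # population standard deviation from the IQ example

# The population s.d. stays fixed; the standard error shrinks as n grows.
for n in (1, 4, 9, 16, 64):
    print(f"n = {n:>2}: standard error = {standard_error(SIGMA, n):.2f}")
```

With n = 1 the standard error equals the population standard deviation (16), and quadrupling n halves it.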
The Sampling Distribution of X̄
Even when Xi are NOT drawn from a normal distribution, it turns
out that X̄ has some predictable behavior...
If the X1 , X2 , . . . , Xn were NOT drawn from a normal distribution,
or by notation Xi ∼ ?(µ, σ²) for all i, then X̄ is approximately
normally distributed as long as n is large enough, or
$\bar{X} \overset{\cdot}{\sim} N(\mu, \sigma^2/n)$ for n > 25 or 30.
Thus, X̄ approximately follows a normal distribution (for a sufficiently large n)!
This is an incredibly useful result for calculating probabilities for X̄!
The Sampling Distribution of X̄
Example (Probability for X̄, Flaws in a copper wire)
Let X denote the number of flaws in a 1 inch length of copper wire. The
probability mass function of X is presented in the following table:
x P (X = x)
0 0.48
1 0.39
2 0.12
3 0.01
Suppose n = 100 wires are sampled from this population. What is the
probability that the average number of flaws per wire in the sample is less
than 0.5? (i.e. find P (X̄ < 0.5)... next page)
The Sampling Distribution of X̄
Example (Probability for X̄, Flaws in a copper wire)
ANSWER: P (X̄ < 0.5) =
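A possible route to the answer is sketched below: compute µ and σ² from the probability mass function, then apply the normal approximation for X̄ with n = 100. The sketch assumes the plain normal approximation with no continuity correction, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Probability mass function of X (number of flaws), from the table in the example.
pmf = {0: 0.48, 1: 0.39, 2: 0.12, 3: 0.01}
n = 100

# Population mean and variance of X.
mu = sum(x * p for x, p in pmf.items())               # 0.66
var = sum(x**2 * p for x, p in pmf.items()) - mu**2   # 0.96 - 0.4356 = 0.5244

# By the normal approximation for large n, X-bar is roughly N(mu, var / n).
std_error = sqrt(var / n)
z = (0.5 - mu) / std_error
prob = NormalDist().cdf(z)

print(f"mu = {mu:.2f}, variance = {var:.4f}")
print(f"standard error = {std_error:.4f}, z = {z:.2f}")
print(f"P(X-bar < 0.5) is approximately {prob:.4f}")
```

Here z is about −2.21, so the probability is roughly 0.014.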
Central Limit Theorem (CLT)
Definition (Central Limit Theorem)
Let X1 , X2 , . . . , Xn be a random sample drawn from any population (or
distribution) with mean µ and variance σ². If the sample size is
*sufficiently large*, then X̄ follows an approximate normal distribution.
We write: $\bar{X} \xrightarrow{d} N(\mu, \sigma^2/n)$ as $n \to \infty$
Or: $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0, 1)$ as $n \to \infty$
If the random sample is drawn from a non-normal population, then X̄ is
approximately normal for sufficiently large n (at least 25 or 30) and the
approximation gets better and better as n increases.
NOTE: If the original ‘parent population’ from which the sample was drawn is
normal, then X̄ follows a normal distribution for any n (a linear combination of
normals is normal), and the CLT is not needed to achieve normality.
The Sampling Distribution of X̄ (simulation)
Let’s simulate this situation...
Case 1: Original population is normally distributed
[Figure: density f(x) of a normal parent population]
1 Choose a sample of size n from a normal distribution
2 Compute x̄
3 Plot the x̄ on our frequency histogram
4 Do steps 1-3 many times, such as 1000 times
5 Draw a histogram of the 1000 x̄ values
(to see the sampling distribution of X̄)
See applet at:
http://onlinestatbook.com/stat_sim/sampling_dist/index.html
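The five steps above can also be sketched in code without the applet. The sketch assumes a normal parent population with µ = 16 and σ = 5 (the values quoted for the applet's normal population in these slides) and a sample size of n = 25; the text histogram is just a rough stand-in for the applet's plot.

```python
import math
import random
import statistics

random.seed(7)  # fixed seed so repeated runs give the same numbers

MU, SIGMA = 16.0, 5.0   # normal parent population
N, REPS = 25, 1000      # sample size and number of repeated samples

def sample_mean(n):
    """Steps 1-2: draw a sample of size n and compute its mean x-bar."""
    return statistics.fmean(random.gauss(MU, SIGMA) for _ in range(n))

# Steps 3-4: repeat the sampling REPS times and keep every x-bar.
xbars = [sample_mean(N) for _ in range(REPS)]

mean_of_xbars = statistics.fmean(xbars)
sd_of_xbars = statistics.stdev(xbars)
theoretical_se = SIGMA / math.sqrt(N)

print(f"mean of the x-bars: {mean_of_xbars:.3f} (population mean {MU})")
print(f"s.d. of the x-bars: {sd_of_xbars:.3f} (theoretical {theoretical_se:.3f})")

# Step 5: crude text histogram, one row per unit-wide bin, one '#' per 10 x-bars.
counts = {}
for xb in xbars:
    left = math.floor(xb)
    counts[left] = counts.get(left, 0) + 1
for left in sorted(counts):
    print(f"{left:>3} to {left + 1:<3} {'#' * (counts[left] // 10)}")
```

The mean of the simulated x̄ values should land near 16 and their standard deviation near σ/√n = 1, with a roughly bell-shaped histogram.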
The Sampling Distribution of X̄ (simulation)
Case 1: Original population is normally distributed (with n=2)
The empirical distribution for X̄ with n = 2 is in the lower plot (in blue). Its mean
is very close to the parent population mean µ = 16, and its s.d. of 3.59 is
very close to the theoretical $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 5/\sqrt{2} = 3.54$.
The Sampling Distribution of X̄ (simulation)
Case 1: Original population is normally distributed (with n=25)
The empirical distribution for X̄ with n = 25 is in the lower plot (in blue). Its
mean is very close to the parent population mean µ = 16, and its s.d. of
1.0 is the same as the theoretical $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 5/\sqrt{25} = 1$.
The Sampling Distribution of X̄ (simulation)
[Figure: density f(x) of a normal parent population]
RESULT - If the parent population (the one you are drawing from)
is normal, then X̄ will follow a normal distribution for any sample
size n, with mean and variance as shown below.
$\bar{X} \sim N(\mu, \sigma^2/n)$
The Sampling Distribution of X̄ (simulation)
Let’s simulate this situation...
Case 2: Original population is NOT normally distributed...
[Figure: three example non-normal densities f(x)]
1 Choose a sample of size n from a NON-normal distribution
2 Compute x̄
3 Plot the x̄ on our frequency histogram
4 Do steps 1-3 many times, such as 1000 times
5 Draw a histogram of the 1000 x̄ values
(to see the sampling distribution of X̄)
See applet at:
http://onlinestatbook.com/stat_sim/sampling_dist/index.html
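The same steps can be sketched for a non-normal parent population. The example below assumes an exponential population with mean 8, a hypothetical right-skewed choice that is not the applet's population. It uses the sample skewness of the simulated x̄ values as a rough numerical stand-in for looking at the histogram: a value near 0 suggests a symmetric, bell-like shape. The sketch uses 4000 repetitions rather than 1000 to steady the skewness estimate.

```python
import random
import statistics

random.seed(11)  # fixed seed for reproducibility

POP_MEAN = 8.0   # hypothetical mean of the exponential (right-skewed) population
REPS = 4000      # number of repeated samples per sample size

def sample_mean(n):
    """Draw n values from the exponential population and return x-bar."""
    return statistics.fmean(random.expovariate(1 / POP_MEAN) for _ in range(n))

def skewness(values):
    """Moment-based sample skewness: third central moment / (variance ** 1.5)."""
    m = statistics.fmean(values)
    m2 = statistics.fmean((v - m) ** 2 for v in values)
    m3 = statistics.fmean((v - m) ** 3 for v in values)
    return m3 / m2 ** 1.5

skew_by_n = {}
for n in (2, 10, 25):
    xbars = [sample_mean(n) for _ in range(REPS)]
    skew_by_n[n] = skewness(xbars)
    print(f"n = {n:>2}: mean of x-bars = {statistics.fmean(xbars):.2f}, "
          f"skewness of x-bars = {skew_by_n[n]:.2f}")
```

The skewness of the x̄ values should shrink toward 0 as n grows, while their mean stays near the population mean for every n.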
The Sampling Distribution of X̄ (simulation)
Case 2: Original population is NOT normally distributed
(with right-skewed parent population and n=10)
The empirical distribution for X̄ with n = 10 is in the lower plot (in blue). It is
bell-shaped with a mean equal to the parent population mean µ = 8.08.
Its s.d. of 1.96 is very close to the theoretical $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 6.22/\sqrt{10} = 1.97$.
The Sampling Distribution of X̄ (simulation)
Case 2: Original population is NOT normally distributed
(with very non-normal parent population and n=2)
FAIL! The empirical distribution for X̄ with n = 2 is in the lower plot (in blue)
and it is not normally distributed. This sample size is just too small to
overcome the very non-normal parent population.
The Sampling Distribution of X̄ (simulation)
Case 2: Original population is NOT normally distributed
(with very non-normal parent population and n=25)
The empirical distribution for X̄ with n = 25 is in the lower plot (in blue). It is bell-shaped
with a mean close to the parent population mean µ = 16.92. Its s.d. of 2.46 is
very close to the theoretical $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 12.29/\sqrt{25} = 2.458$.
The Sampling Distribution of X̄ (simulation)
[Figure: three example non-normal densities f(x)]
RESULT - If the parent population (the one you are drawing from)
is NOT normal, then X̄ will follow an approximate normal
distribution for sufficiently large n (we’ll say n > 25 or 30).
$\bar{X} \overset{\cdot}{\sim} N(\mu, \sigma^2/n)$
This is the Central Limit Theorem.
The approximation improves as n increases.
The Sampling Distribution of X̄
A couple comments:
Averages are less variable than individual observations.
The distribution for X̄ has less variability than the distribution for X.
The distribution of our estimator X̄ is squeezed closer to, or is
tighter around, the thing we’re trying to estimate (µ) as n increases.
For some non-normal distributions, the approximation is pretty good
for n lower than 25 or 30, so it depends on the parent population
from which we are drawing.
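The last comment can be illustrated with a small simulation at n = 5. This is a sketch under assumed populations (a uniform(0, 1) population, which is symmetric, and an exponential population with mean 1, which is strongly right-skewed); neither comes from the slides. It checks how often the standardized sample mean falls below −1.645, which would happen 5% of the time if X̄ were exactly normal.

```python
import random
import statistics
from math import sqrt

random.seed(3)  # fixed seed for reproducibility

N, REPS = 5, 20000
Z_CUT = -1.645  # standard normal 5th percentile, so the nominal tail area is 0.05

def lower_tail_fraction(draw, mu, sigma):
    """Fraction of standardized sample means (x-bar - mu) / (sigma / sqrt(N)) below Z_CUT."""
    std_error = sigma / sqrt(N)
    hits = 0
    for _ in range(REPS):
        xbar = statistics.fmean(draw() for _ in range(N))
        if (xbar - mu) / std_error < Z_CUT:
            hits += 1
    return hits / REPS

# Uniform(0, 1): mean 1/2, variance 1/12.  Exponential(rate 1): mean 1, s.d. 1.
tail_uniform = lower_tail_fraction(random.random, 0.5, sqrt(1 / 12))
tail_expo = lower_tail_fraction(lambda: random.expovariate(1.0), 1.0, 1.0)

print(f"uniform parent, n = {N}:     tail fraction = {tail_uniform:.3f} (nominal 0.050)")
print(f"exponential parent, n = {N}: tail fraction = {tail_expo:.3f} (nominal 0.050)")
```

For the uniform parent the tail fraction should sit close to 0.05 even at n = 5, while for the exponential parent it should fall well below 0.05, showing that the n needed depends on the parent population.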
The Sampling Distribution of X̄
The next graphic shows 3 different original populations (one nearly
normal, two that are not), and the sampling distribution for X̄ based
on a sample of size n = 5 and size n = 30.
The three original distributions are on the far left (one that is nearly
symmetric and bell-shaped, one that is right skewed, and one that is
highly right skewed).
The graphic emphasizes the concept that the normal approximation
becomes better as n increases.
The Sampling Distribution of X̄
As shown in: Navidi, W. ‘Statistics for Engineers and Scientists’, McGraw Hill, 2006
The Sampling Distribution of X̄
The variability of X̄ decreases as n increases.
Recall: $V(\bar{X}) = \sigma^2/n$.
If the original population has a shape that’s closer to normal, a smaller
n is sufficient for X̄ to be approximately normal.
The normal approximation gets better with larger n when you’re
starting with a non-normal population.
Even when X has a very non-normal distribution, X̄ is still approximately
normal for a large enough n.