11/16/20
END 311E
STATISTICS
Sampling and Sampling Distributions
16.11.2020 1
Population
census
Are we able to observe all
members of the population?
Motor vehicle
registration
Sampling
§ Sampling: the process of selecting a sample that represents the population
§ Sample: a finite part of a population whose properties are studied to gain information about the whole
§ Sample statistic: any value computed from the observed data, especially one used to estimate the corresponding parameter of the population
Why Sample?
• Selecting a sample is less time-consuming than selecting every item in
the population.
• Selecting a sample is less costly than selecting every item in the
population.
• An analysis of a sample is less cumbersome and more practical than an
analysis of the entire population.
Sampling Distributions
• A sampling distribution is a distribution of all of the possible values of a sample
statistic for a given size sample selected from a population.
• For example, suppose you sample 50 students from your college and compute their mean GPA. If you drew many different samples of 50, you would compute a different mean for each sample. We are interested in the distribution of all the mean GPAs we might calculate for any given sample of 50 students.
Developing a Sampling Distribution
• Assume there is a population of four individuals: A, B, C, D
• Population size N = 4
• Random variable, X, is age of individuals
• Values of X: 18, 20, 22, 24 (years)
Developing a Sampling Distribution
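Summary Measures for the Population Distribution:
\[ \mu = \frac{\sum X_i}{N} = \frac{18 + 20 + 22 + 24}{4} = 21 \]
\[ \sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} = 2.236 \]
[Figure: population distribution, P(x) against x, with equal-height bars at x = 18 (A), 20 (B), 22 (C), 24 (D). Uniform Distribution]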
Developing a Sampling Distribution
Now consider all possible samples of size n = 2.
16 possible samples (sampling with replacement):

1st Obs | 2nd Observation
        | 18     20     22     24
18      | 18,18  18,20  18,22  18,24
20      | 20,18  20,20  20,22  20,24
22      | 22,18  22,20  22,22  22,24
24      | 24,18  24,20  24,22  24,24

16 Sample Means:

1st Obs | 2nd Observation
        | 18  20  22  24
18      | 18  19  20  21
20      | 19  20  21  22
22      | 20  21  22  23
24      | 21  22  23  24
Developing a Sampling Distribution
Sampling Distribution of All Sample Means
[Figure: the table of 16 sample means beside the Sample Means Distribution, $P(\bar{X})$ against $\bar{X}$ for $\bar{X}$ = 18, 19, ..., 24 (no longer uniform)]
Developing a Sampling Distribution
Summary Measures of this Sampling Distribution:
\[ \mu_{\bar{X}} = \frac{18 + 19 + 19 + \cdots + 24}{16} = 21 \]
\[ \sigma_{\bar{X}} = \sqrt{\frac{(18 - 21)^2 + (19 - 21)^2 + \cdots + (24 - 21)^2}{16}} = 1.58 \]
Note: Here we divide by 16 because there are 16 different samples of size 2.
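The summary measures above can be checked by brute force. The following is a minimal Python sketch, assuming only the four ages from the example; the variable names are illustrative and not part of the slides. It enumerates all 16 ordered samples of size 2 drawn with replacement and summarizes their means.

```python
from collections import Counter
from itertools import product
from statistics import mean, pstdev

# Population from the example: ages of A, B, C, D.
population = [18, 20, 22, 24]
n = 2  # sample size

# Every ordered sample of size n drawn with replacement (4**2 = 16 samples).
samples = list(product(population, repeat=n))
sample_means = [mean(s) for s in samples]

# Sampling distribution of the mean: relative frequency of each value.
counts = Counter(sample_means)
distribution = {m: c / len(samples) for m, c in sorted(counts.items())}

# Population summary measures (divide by N) and the same measures
# computed over the 16 sample means (divide by 16).
mu = mean(population)
sigma = pstdev(population)
mu_xbar = mean(sample_means)
sigma_xbar = pstdev(sample_means)

print(distribution)
print(mu, round(sigma, 3), mu_xbar, round(sigma_xbar, 2))
```

The distribution peaks at 21 and tapers toward 18 and 24, which is why it is no longer uniform.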
Comparing the Population Distribution to the Sample Means Distribution
Population (N = 4): μ = 21, σ = 2.236
Sample Means Distribution (n = 2): $\mu_{\bar{X}} = 21$, $\sigma_{\bar{X}} = 1.58$
[Figure: P(X) against X for the population (A, B, C, D at 18, 20, 22, 24) beside $P(\bar{X})$ against $\bar{X}$ for the sample means (18 to 24)]
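Population mean = the mean of all possible sample means
Sample mean is an UNBIASED ESTIMATOR for the population mean!!
Suppose we have a random sample $X_1, X_2, \ldots, X_n$, where each $E[X_i] = \mu$:
\[ E\left[\sum_{i=1}^{n} X_i\right] = E[X_1] + E[X_2] + \cdots + E[X_n] = n\mu \]
\[ E[\bar{X}] = E\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right] = \frac{1}{n}\, n\mu = \mu \]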
Sample Mean Sampling Distribution:
Standard Error of the Mean
• Different samples of the same size from the same
population will yield different sample means
• A measure of the variability in the mean from sample to sample is given by the Standard Error of the Mean:
\[ \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \]
(This assumes that sampling is with replacement or sampling is without replacement from an infinite population)
• Note that the standard error of the mean decreases as the
sample size increases
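To see how quickly the standard error shrinks, here is a small Python sketch. The helper name `standard_error` and the choice σ = 15 (borrowed from the cereal example later in these slides) are assumptions made for illustration.

```python
from math import sqrt


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean, sigma / sqrt(n).

    Assumes sampling with replacement, or without replacement
    from an infinite population.
    """
    return sigma / sqrt(n)


sigma = 15.0  # illustrative population standard deviation
for n in (1, 4, 25, 100):
    print(f"n = {n:>3}: standard error = {standard_error(sigma, n):.2f}")
```

Because of the square root, quadrupling the sample size only halves the standard error.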
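Sample variance (with divisor n) is a BIASED ESTIMATOR for the population variance!!
Suppose we have a random sample $X_1, X_2, \ldots, X_n$ of independent observations, each with $\mathrm{Var}(X_i) = \sigma^2$:
\[ \mathrm{Var}\left(\sum_{i=1}^{n} X_i\right) = \mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \cdots + \mathrm{Var}(X_n) = n\sigma^2 \]
\[ \mathrm{Var}(\bar{X}) = \mathrm{Var}\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n} \]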
Sample Mean Sampling Distribution:
If the Population is Normal
• If a population is normal with mean μ and standard deviation σ, the sampling distribution of $\bar{X}$ is also normally distributed with
\[ \mu_{\bar{X}} = \mu \quad \text{and} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \]
Z-value for Sampling Distribution of the Mean
• Z-value for the sampling distribution of $\bar{X}$:
\[ Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \]
where: $\bar{X}$ = sample mean
μ = population mean
σ = population standard deviation
n = sample size
Sampling Distribution Properties
$\mu_{\bar{x}} = \mu$ (i.e. $\bar{x}$ is unbiased)
[Figure: Normal Population Distribution centered at μ, above the Normal Sampling Distribution of $\bar{x}$ centered at $\mu_{\bar{x}}$ (has the same mean)]
Example
• A cereal firm fills thousands of boxes of cereals during a day.
• To be consistent with the package labeling, boxes should contain 368 grams of
cereal.
• Cereal weight varies from box to box.
• The standard deviation of the cereal-filling process is 15 grams.
• What is the standard error for a sample of 25 boxes?
Example
The standard error is 3:
\[ \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{25}} = 3 \]
The variation in the sample means for samples of n = 25 is much less than the variation in the individual boxes of cereal.
Example
• What is the probability that the mean of a sample (n = 25) is less than 365 g?
\[ Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \]
\[ Z = \frac{365 - 368}{3} = -1 \]
\[ P\{Z < -1\} = 0.1587 \]
Example
\[ P\{Z < -1\} = P\{Z > 1\} = 1 - P\{Z < 1\} \]
[Figure: standard normal curves showing, by symmetry, that the area below -1 equals the area above 1]
Example
• Find the probability that a single box contains less than 365 g, using the population parameters.
\[ Z = \frac{365 - 368}{15} = -0.2 \]
\[ P\{Z < -0.2\} = 0.4207 \]
Many more individual boxes than sample means are below 365 grams: the chance that the sample mean of 25 boxes is far away from the population mean is less than the chance that a single box is far away!
Example
• What is the probability that the mean of a sample (n = 100) is less than 365 g?
\[ \sigma_{\bar{X}} = \frac{15}{\sqrt{100}} = 1.5 \]
\[ Z = \frac{365 - 368}{1.5} = -2 \]
\[ P\{Z < -2\} = 0.0228 \]
Taking a larger sample results in less variability in the sample means from sample to sample!
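The three cereal-box probabilities can be reproduced without a printed table. This is a minimal sketch using `statistics.NormalDist` from the Python standard library; the function name `prob_mean_below` is hypothetical, and treating a single box as n = 1 assumes the individual fill weights are themselves normally distributed.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 368.0, 15.0    # cereal-filling process parameters from the example
std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1


def prob_mean_below(x: float, n: int) -> float:
    """P(sample mean < x) for samples of size n, via the Z transformation."""
    standard_error = sigma / sqrt(n)
    z = (x - mu) / standard_error
    return std_normal.cdf(z)


p_n25 = prob_mean_below(365, 25)    # Z = -1
p_single = prob_mean_below(365, 1)  # Z = -0.2, a single box (assumes normal weights)
p_n100 = prob_mean_below(365, 100)  # Z = -2

# Symmetry of the standard normal: P{Z < -1} = 1 - P{Z < 1}
p_symmetry = 1 - std_normal.cdf(1)

print(round(p_n25, 4), round(p_single, 4), round(p_n100, 4))
```

The probability of falling below 365 g drops as n grows because the sampling distribution tightens around 368 g.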
Sampling Distribution Properties
As n increases, $\sigma_{\bar{x}}$ decreases
[Figure: sampling distributions of $\bar{x}$ centered at μ for a larger sample size and a smaller sample size]
Determining An Interval Including A Fixed Proportion of the
Sample Means
Find an interval, symmetric around μ, that will include 95% of the sample means when μ = 368, σ = 15, and n = 25.
• Since the interval contains 95% of the sample means, 5% of the sample means will be outside the interval.
• Since the interval is symmetric, 2.5% will be above the upper limit and 2.5% will be below the lower limit.
• From the standardized normal table, the Z score with 2.5% (0.0250) below it is -1.96 and the Z score with 2.5% (0.0250) above it is 1.96.
Determining An Interval Including A Fixed Proportion of the
Sample Means
• Calculating the lower limit of the interval
\[ \bar{X}_L = \mu + Z\frac{\sigma}{\sqrt{n}} = 368 + (-1.96)\frac{15}{\sqrt{25}} = 362.12 \]
• Calculating the upper limit of the interval
\[ \bar{X}_U = \mu + Z\frac{\sigma}{\sqrt{n}} = 368 + (1.96)\frac{15}{\sqrt{25}} = 373.88 \]
• 95% of all sample means of sample size 25 are between 362.12 and 373.88
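The same interval can be computed from the inverse normal CDF instead of a table lookup. A short sketch, assuming Python's standard `statistics.NormalDist`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 368.0, 15.0, 25  # values from the example
coverage = 0.95                 # proportion of sample means to include

# Split the excluded 5% evenly between the two tails.
tail = (1 - coverage) / 2
z = NormalDist().inv_cdf(1 - tail)  # Z score with 2.5% above it

standard_error = sigma / sqrt(n)
lower = mu - z * standard_error
upper = mu + z * standard_error

print(round(z, 2), round(lower, 2), round(upper, 2))
```

Changing `coverage` (for example to 0.90 or 0.99) gives the corresponding narrower or wider interval.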
Sample Mean Sampling Distribution:
If the Population is not Normal
• We can apply the Central Limit Theorem:
• Even if the population is not normal,
• …sample means from the population will be approximately normal as long as the sample size is large enough.
Properties of the sampling distribution:
\[ \mu_{\bar{x}} = \mu \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]
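We want $E[S^2] = \sigma^2$. Which divisor k achieves this?
\[ E\left[\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{k}\right] = \sigma^2 \implies k = ? \]
\[ \frac{1}{k} E\left[\sum_{i=1}^{n} (X_i - \bar{X})^2\right] = \sigma^2 \]
Write $X_i - \bar{X} = (X_i - \mu) - (\bar{X} - \mu)$:
\[ \frac{1}{k} E\left[\sum_{i=1}^{n} \big((X_i - \mu) - (\bar{X} - \mu)\big)^2\right] = \sigma^2 \]
\[ \frac{1}{k} E\left[\sum_{i=1}^{n} \big((X_i - \mu)^2 - 2(X_i - \mu)(\bar{X} - \mu) + (\bar{X} - \mu)^2\big)\right] = \sigma^2 \]
\[ \frac{1}{k} \sum_{i=1}^{n} \Big(E[(X_i - \mu)^2] - 2E[(X_i - \mu)(\bar{X} - \mu)] + E[(\bar{X} - \mu)^2]\Big) = \sigma^2 \]
Here $E[(X_i - \mu)^2] = \sigma^2$ and $E[(\bar{X} - \mu)^2] = \frac{\sigma^2}{n}$ (the square of the standard error):
\[ \frac{1}{k} \left(n\sigma^2 + n\frac{\sigma^2}{n} - 2E\left[\sum_{i=1}^{n} (X_i - \mu)(\bar{X} - \mu)\right]\right) = \sigma^2 \]
Since $(\bar{X} - \mu)$ is constant with respect to the summation index, it can be taken outside the sum; then multiply and divide by n:
\[ \frac{1}{k} \left(n\sigma^2 + \sigma^2 - 2\frac{n}{n} E\left[(\bar{X} - \mu)\sum_{i=1}^{n} (X_i - \mu)\right]\right) = \sigma^2 \]
\[ \frac{1}{k} \left(n\sigma^2 + \sigma^2 - 2n E\left[(\bar{X} - \mu)\sum_{i=1}^{n} \frac{X_i - \mu}{n}\right]\right) = \sigma^2 \]
Since $\sum_{i=1}^{n} \frac{X_i - \mu}{n} = \bar{X} - \mu$:
\[ \frac{1}{k} \left(n\sigma^2 + \sigma^2 - 2n E[(\bar{X} - \mu)^2]\right) = \sigma^2 \]
\[ \frac{1}{k} \left(n\sigma^2 + \sigma^2 - 2n\frac{\sigma^2}{n}\right) = \sigma^2 \implies \frac{1}{k} \left(n\sigma^2 + \sigma^2 - 2\sigma^2\right) = \sigma^2 \]
\[ \frac{1}{k} \left(n\sigma^2 - \sigma^2\right) = \sigma^2 \implies \frac{1}{k}(n - 1)\sigma^2 = \sigma^2 \implies k = n - 1 \implies S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} \]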
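The result k = n - 1 can be checked exactly on the small population used earlier in these slides (ages 18, 20, 22, 24, samples of size 2 with replacement). The sketch below is illustrative only and the helper name `sum_sq_dev` is hypothetical. It averages the variance estimate over all 16 equally likely samples, once with divisor n and once with divisor n - 1.

```python
from itertools import product
from statistics import mean, pvariance

population = [18, 20, 22, 24]    # ages from the earlier example
n = 2                            # sample size
sigma2 = pvariance(population)   # population variance (divide by N)


def sum_sq_dev(sample):
    """Sum of squared deviations of a sample from its own mean."""
    xbar = sum(sample) / len(sample)
    return sum((x - xbar) ** 2 for x in sample)


# All 16 equally likely ordered samples (sampling with replacement).
samples = list(product(population, repeat=n))

# Expected value of each estimator = its average over all possible samples.
mean_div_n = mean(sum_sq_dev(s) / n for s in samples)
mean_div_n_minus_1 = mean(sum_sq_dev(s) / (n - 1) for s in samples)

print(sigma2, mean_div_n, mean_div_n_minus_1)
```

Dividing by n underestimates the population variance by the factor (n - 1)/n on average, while dividing by n - 1 recovers it exactly.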
Central Limit Theorem
As the sample size gets large enough (n↑), the sampling distribution of the sample mean becomes almost normal regardless of shape of population.
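A quick simulation makes the theorem concrete. The sketch below is an illustration under stated assumptions: the population is taken to be exponential with rate 1 (strongly right-skewed, with μ = 1 and σ = 1), and the sample size, number of replications and seed are arbitrary choices, none of which come from the slides.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

random.seed(0)         # fixed seed so the run is repeatable
n = 36                 # sample size
num_samples = 20_000   # number of simulated samples

# Assumed population: exponential with rate 1, so mu = 1 and sigma = 1.
mu, sigma = 1.0, 1.0

# Draw many samples and record each sample mean.
sample_means = [
    sum(random.expovariate(1.0) for _ in range(n)) / n
    for _ in range(num_samples)
]

standard_error = sigma / sqrt(n)
mean_of_means = fmean(sample_means)
sd_of_means = pstdev(sample_means)

# If the sample means are roughly normal, about 95% of them should
# fall within 1.96 standard errors of mu.
within = sum(
    abs(m - mu) <= 1.96 * standard_error for m in sample_means
) / num_samples

print(round(mean_of_means, 3), round(sd_of_means, 3), round(within, 3))
```

Plotting a histogram of `sample_means` for increasing n would show the skew of the population fading from the sampling distribution.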
Sample Mean Sampling Distribution:
If the Population is not Normal
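Sampling distribution properties:
Central Tendency: $\mu_{\bar{x}} = \mu$
Variation: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
[Figure: Population Distribution centered at μ, above the Sampling Distribution of $\bar{x}$ centered at $\mu_{\bar{x}}$ (becomes normal as n increases), shown for a larger and a smaller sample size]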
Example
• Suppose a population has mean μ = 8 and standard
deviation σ = 3. Suppose a random sample of size n
= 36 is selected.
• What is the probability that the sample mean is
between 7.8 and 8.2?
Example
Solution:
• Even if the population is not normally distributed, the central limit theorem can be used (n is relatively large)
• … so the sampling distribution of $\bar{x}$ is approximately normal
• … with $\mu_{\bar{x}} = 8$
• … and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{36}} = 0.5$
Example
Solution (continued):
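\[ P(7.8 < \bar{X} < 8.2) = P\left(\frac{7.8 - 8}{3 / \sqrt{36}} < \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} < \frac{8.2 - 8}{3 / \sqrt{36}}\right) \]
\[ = P(-0.4 < Z < 0.4) = 0.6554 - 0.3446 = 0.3108 \]
[Figure: Population Distribution (unknown shape, μ = 8), then Sample, giving the Sampling Distribution of $\bar{X}$ ($\mu_{\bar{X}} = 8$, shaded from 7.8 to 8.2), then Standardize, giving the Standard Normal Distribution ($\mu_Z = 0$, shaded from -0.4 to 0.4)]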
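The calculation can be reproduced by modelling the sample mean directly as a normal variable, which is the approximation the Central Limit Theorem licenses here. A minimal sketch with Python's standard `statistics.NormalDist`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 8.0, 3.0, 36         # values from the example
standard_error = sigma / sqrt(n)    # 3 / 6 = 0.5

# Approximate sampling distribution of the sample mean (by the CLT).
xbar_dist = NormalDist(mu, standard_error)

prob = xbar_dist.cdf(8.2) - xbar_dist.cdf(7.8)
print(round(prob, 4))
```

Working with `NormalDist(mu, standard_error)` skips the explicit standardization step but gives the same probability as P(-0.4 < Z < 0.4).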
Population Proportions
π = the proportion of the population having some characteristic
• Sample proportion (p) provides an estimate of π:
\[ p = \frac{X}{n} = \frac{\text{number of items in the sample having the characteristic of interest}}{\text{sample size}} \]
• 0 ≤ p ≤ 1
• p is approximately distributed as a normal distribution when n is large (assuming sampling with replacement from a finite population or without replacement from an infinite population)
Sampling Distribution of p
• Approximated by a normal distribution if:
\[ n\pi \ge 5 \quad \text{and} \quad n(1 - \pi) \ge 5 \]
where
\[ \mu_p = \pi \quad \text{and} \quad \sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} \]
(where π = population proportion)
[Figure: Sampling Distribution, $P(p_s)$ against p from 0 to 1]
Standardize p to a Z value with the formula:
\[ Z = \frac{p - \pi}{\sigma_p} = \frac{p - \pi}{\sqrt{\frac{\pi(1 - \pi)}{n}}} \]
Suppose q is the probability of success and (1-q) is the probability of failure where
2
𝑞 = 3.
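\[ E[X] = nq \]
\[ \mathrm{Var}(X) = nq(1 - q) \]
\[ E[p] = E\left[\frac{X}{n}\right] = \frac{nq}{n} = q \]
\[ \mathrm{Var}(p) = \mathrm{Var}\left(\frac{X}{n}\right) = \frac{1}{n^2}\, nq(1 - q) = \frac{q(1 - q)}{n} \]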
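These two moments can be verified numerically from the binomial probabilities. The sketch below assumes X counts successes in n independent trials with success probability q (which is what E[X] = nq and Var(X) = nq(1 - q) imply); the function name and the values n = 20, q = 0.3 are illustrative, not taken from the slides.

```python
from math import comb


def proportion_moments(n: int, q: float) -> tuple[float, float]:
    """Exact mean and variance of p = X / n when X ~ Binomial(n, q)."""
    # Binomial probabilities P(X = x) for x = 0, 1, ..., n.
    pmf = [comb(n, x) * q**x * (1 - q) ** (n - x) for x in range(n + 1)]
    mean_p = sum((x / n) * w for x, w in enumerate(pmf))
    var_p = sum((x / n - mean_p) ** 2 * w for x, w in enumerate(pmf))
    return mean_p, var_p


n, q = 20, 0.3  # illustrative values
mean_p, var_p = proportion_moments(n, q)

print(mean_p, q)                # both approximately 0.3
print(var_p, q * (1 - q) / n)   # both approximately 0.0105
```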
Example
• If the true proportion of voters who support
Proposition A is π = 0.4, what is the probability that
a sample of size 200 yields a sample proportion
between 0.40 and 0.45?
i.e.: if π = 0.4 and n = 200, what is
P(0.40 ≤ p ≤ 0.45) ?
Example
if π = 0.4 and n = 200, what is P(0.40 ≤ p ≤ 0.45)?
Find $\sigma_p$:
\[ \sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} = \sqrt{\frac{0.4(1 - 0.4)}{200}} = 0.03464 \]
Convert to standardized normal:
\[ P(0.40 \le p \le 0.45) = P\left(\frac{0.40 - 0.40}{0.03464} \le Z \le \frac{0.45 - 0.40}{0.03464}\right) = P(0 \le Z \le 1.44) \]
Example
Utilize the cumulative normal table:
P(0 ≤ Z ≤ 1.44) = 0.9251 – 0.5000 = 0.4251
[Figure: Sampling Distribution of p (shaded from 0.40 to 0.45), then Standardize, giving the Standardized Normal Distribution (shaded from 0 to 1.44, area 0.4251)]
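The voter example can be worked end to end in a few lines. This is a sketch using Python's standard `statistics.NormalDist`, with illustrative variable names. It also shows the effect of rounding Z to two decimals, which is what a printed table requires.

```python
from math import sqrt
from statistics import NormalDist

pi_pop, n = 0.4, 200  # population proportion and sample size from the example

# Check the conditions for the normal approximation.
assert n * pi_pop >= 5 and n * (1 - pi_pop) >= 5

sigma_p = sqrt(pi_pop * (1 - pi_pop) / n)
z_low = (0.40 - pi_pop) / sigma_p
z_high = (0.45 - pi_pop) / sigma_p

std_normal = NormalDist()

# Table-style answer: Z rounded to two decimals first.
prob_table = std_normal.cdf(round(z_high, 2)) - std_normal.cdf(round(z_low, 2))

# Same calculation without rounding Z.
prob_unrounded = std_normal.cdf(z_high) - std_normal.cdf(z_low)

print(round(sigma_p, 5), round(z_high, 2), round(prob_table, 4))
print(prob_unrounded)
```

The unrounded probability differs from the table value only in the fourth decimal place, so the table lookup is a reasonable shortcut.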