Lecture 6

The document is a lecture on estimation and confidence intervals in biostatistics, focusing on statistical inference and interval estimation methods. It explains point estimation, confidence intervals, and the implications of known versus unknown population standard deviation. The lecture also covers the use of the t-distribution for confidence intervals when the population standard deviation is unknown, along with practical applications related to fetal alcohol syndrome.

Biostatistics

Lecture 6
Estimation/Confidence Intervals
2024-1 Fall Semester

Instructor: Min Jin Ha


Department of Health Informatics and Biostatistics
Graduate School of Public Health
Yonsei University
Reading
• Pagano and Gauvreau, Chapter 9
Statistical Inference
• Having investigated the theoretical properties of the distribution of
sample means, we are ready to take the next step and apply this
knowledge to the process of statistical inference

• Aim: estimate some characteristic of a continuous random variable
(e.g., the mean) using information contained in a sample of observations
Interval Estimation
• Point estimation: using sample data to calculate a single number to
estimate the parameter of interest
• Sample mean 𝑥̅ to estimate the population mean 𝜇
• The problem is that two different samples are very likely to result in
different sample means → there is some degree of uncertainty
• A point estimate does not provide any information about the inherent
variability of the point estimator
• From the CLT, we know that 𝑥̅ is more likely to be near the true population
mean if it is based on a large sample
• Interval estimation provides a range of reasonable values that are intended
to contain the parameter of interest with a certain degree of confidence
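The sample-to-sample variability of the point estimate can be seen in a small simulation. This is only a minimal sketch: the population (normal with mean 100 and standard deviation 15), the sample sizes, and the helper name `sample_means` are illustrative assumptions, not values from the lecture.

```python
import random
import statistics

def sample_means(n, reps, mu=100.0, sigma=15.0, seed=1):
    """Draw `reps` samples of size n from N(mu, sigma) and return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(reps)
    ]

# Spread of the sample mean for a small and a large sample size
small = sample_means(n=10, reps=500)
large = sample_means(n=1000, reps=500)
spread_small = statistics.stdev(small)   # roughly 15 / sqrt(10)
spread_large = statistics.stdev(large)   # roughly 15 / sqrt(1000)
print(round(spread_small, 2), round(spread_large, 2))
```

Different samples give different means, and the means from the larger samples cluster much more tightly around the population mean.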
What is a Confidence Interval?
• A confidence interval provides a range of reasonable values that are
intended to contain the parameter of interest with a certain degree of
confidence. It often takes the form
Point estimate ± margin of error
and is written
(point estimate – margin of error, point estimate + margin of error)
Caveat
• For illustration, we start by assuming 𝜎 is known.
• When is 𝜎 known?
• Almost never!
• However, it’s easier to understand if we assume that to start.
• By the end of the class, we’ll get rid of this assumption
Two-sided 95% Confidence Intervals (𝜎 known)
• A random variable 𝑋 has mean 𝜇 and standard deviation 𝜎
• The CLT states that
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$
• From Lecture 5, we know $P(-1.96 < Z < 1.96) = 0.95$
• Equivalently,
$$P\left(-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = 0.95$$
• Given this, we are able to manipulate the inequality inside the parentheses
without altering the probability statement to the form
$$P(L < \mu < U) = 0.95$$
Show how L and U are derived
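One way to carry out the manipulation, using only the definitions on this slide (a sketch of the algebra, not necessarily the derivation given in class):

```latex
\begin{align*}
0.95 &= P\!\left(-1.96 < \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} < 1.96\right) \\
     &= P\!\left(-1.96\,\frac{\sigma}{\sqrt{n}} < \bar{X}-\mu < 1.96\,\frac{\sigma}{\sqrt{n}}\right)
        && \text{multiply by } \sigma/\sqrt{n} > 0 \\
     &= P\!\left(-\bar{X}-1.96\,\frac{\sigma}{\sqrt{n}} < -\mu < -\bar{X}+1.96\,\frac{\sigma}{\sqrt{n}}\right)
        && \text{subtract } \bar{X} \\
     &= P\!\left(\bar{X}-1.96\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{X}+1.96\,\frac{\sigma}{\sqrt{n}}\right)
        && \text{multiply by } -1 \text{ and reverse the inequalities}
\end{align*}
```

So $L = \bar{X} - 1.96\,\sigma/\sqrt{n}$ and $U = \bar{X} + 1.96\,\sigma/\sqrt{n}$.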
Two-sided 95% Confidence Intervals (𝜎 known)
$$P\left(\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) = 0.95$$
• The quantities $\bar{X} - 1.96\,\sigma/\sqrt{n}$ and $\bar{X} + 1.96\,\sigma/\sqrt{n}$ are the 95% confidence limits for the
population mean $\mu$
• We are 95% confident that the interval will cover $\mu$
• If we were to select 100 random samples from the population and use these samples to
calculate 100 CIs for $\mu$, approximately 95 of the CIs would cover the true population mean $\mu$
and 5 would not
• Wrong interpretation:
• "There is a 95% chance that $\mu$ lies in the interval"
• Why is it wrong? $\mu$ is fixed and does not move; it is the interval that varies from sample to sample
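The repeated-sampling interpretation can be checked with a short simulation. This is a minimal sketch under assumed values: the population parameters (mean 50, standard deviation 10), the sample size, and the function name `coverage` are hypothetical and chosen only for illustration.

```python
import random
from statistics import NormalDist, fmean

def coverage(mu, sigma, n, reps, conf=0.95, seed=0):
    """Fraction of z-based CIs (sigma known) that cover the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # about 1.96 when conf = 0.95
    margin = z * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        # A new sample gives a new interval; mu itself never changes
        xbar = fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if xbar - margin < mu < xbar + margin:
            hits += 1
    return hits / reps

# Hypothetical population: mu = 50, sigma = 10, samples of size 25
rate = coverage(mu=50.0, sigma=10.0, n=25, reps=10_000)
print(rate)   # typically close to 0.95
```

The 95% describes the long-run behaviour of the procedure over many samples, not the probability that one particular computed interval contains the mean.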
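Two-sided (1 − 𝛼)×100% Confidence Intervals (𝜎 known)
• A generic confidence interval for $\mu$ can be obtained
• Let $z_{\alpha/2}$ be the upper $\alpha/2$ quantile, i.e., $P(Z > z_{\alpha/2}) = \alpha/2 = P(Z < -z_{\alpha/2})$
• The generic form of the $(1 - \alpha) \times 100\%$ CI for $\mu$ is
$$\left(\bar{X} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right)$$
• When $\alpha = 0.05$, the $(1 - \alpha) \times 100\%$ CI is the 95% CI that we found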
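The generic interval translates directly into a few lines of code. The sketch below uses Python's standard library; the function name `z_confidence_interval` and the example inputs are assumptions made for illustration only.

```python
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, conf=0.95):
    """Two-sided (1 - alpha) * 100% CI for mu when sigma is known."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)   # upper alpha/2 quantile of N(0, 1)
    margin = z * sigma / n ** 0.5             # margin of error
    return xbar - margin, xbar + margin

# Hypothetical inputs: sample mean 20, known sigma 4, n = 64
low, high = z_confidence_interval(xbar=20.0, sigma=4.0, n=64)
print(round(low, 2), round(high, 2))
```

Passing a different `conf` changes only the quantile; the rest of the formula stays the same.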
99% Confidence Interval
• For a 99% interval, we need the z-value that cuts off the top 0.5% or
0.005 of the distribution, which is ?
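The required value can be read from a standard normal table or computed; it is about 2.576. A minimal sketch using Python's standard library (the variable names are illustrative):

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

for conf in (0.90, 0.95, 0.99):
    alpha = 1 - conf
    z = std_normal.inv_cdf(1 - alpha / 2)   # cuts off the top alpha/2
    print(f"{conf:.0%} interval: z = {z:.3f}")

# The z-value that leaves 0.005 in the upper tail
z_99 = std_normal.inv_cdf(1 - 0.005)
```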
When can we use this CI?
• The CI given by
$$\left(\bar{X} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right)$$
is safe to use in the following circumstances when $\sigma$ is known
• X is normal (regardless of sample size)
• X is non-normal but the sample size is large
• It is typically not safe to use this CI when the sample size is small and
X is not a normal random variable.
How can we get a narrower CI?
$$\left(\bar{X} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right)$$
• Known $\sigma$
• Decrease the margin of error
1. Compromise on our level of confidence, e.g., use a 90% interval
2. Increase the sample size $n$
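Both options act on the margin of error $z_{\alpha/2}\,\sigma/\sqrt{n}$. The sketch below compares them numerically; the value of sigma, the sample sizes, and the function name `margin_of_error` are hypothetical.

```python
from statistics import NormalDist

def margin_of_error(sigma, n, conf):
    """Half-width of the z-based CI: z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return z * sigma / n ** 0.5

sigma = 10.0  # hypothetical known standard deviation

# 1. Lower confidence level, same n
m95 = margin_of_error(sigma, n=25, conf=0.95)
m90 = margin_of_error(sigma, n=25, conf=0.90)

# 2. Larger n, same confidence level: quadrupling n halves the margin
m95_big = margin_of_error(sigma, n=100, conf=0.95)

print(round(m95, 2), round(m90, 2), round(m95_big, 2))
```

Because the margin shrinks with $\sqrt{n}$, halving the width of the interval requires four times as many observations.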
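What if 𝜎 is unknown?
$$\left(\bar{X} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right)$$
• The CI cannot be computed if $\sigma$ is unknown
• We use the sample standard deviation $s$ as an estimate of $\sigma$
• In practice we almost never know $\sigma$, and if we replace $\sigma$ with $s$, then
• We can no longer rely on the CLT result $Z \sim N(0, 1)$, because $s$ itself varies from sample to sample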
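A small simulation illustrates the problem with simply plugging $s$ into the z-based interval. This is a sketch under assumed conditions (normally distributed data with mean 0 and standard deviation 1, and the hypothetical helper `z_with_s_coverage`); it is not part of the lecture.

```python
import math
import random
from statistics import NormalDist

def z_with_s_coverage(n, reps, mu=0.0, sigma=1.0, seed=0):
    """Coverage of the interval xbar +/- 1.96 * s / sqrt(n) for normal data."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.975)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        # Sample standard deviation s (divisor n - 1) stands in for sigma
        s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
        margin = z * s / math.sqrt(n)
        if xbar - margin < mu < xbar + margin:
            hits += 1
    return hits / reps

small_n = z_with_s_coverage(n=5, reps=20_000)    # noticeably below 0.95
large_n = z_with_s_coverage(n=100, reps=5_000)   # close to 0.95
print(small_n, large_n)
```

With small samples the nominal 95% interval covers the mean less often than advertised, which is the gap the t distribution is designed to close.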
What do you do when 𝜎 is unknown?
• While working for the Guinness brewery in Dublin, William
Sealy Gosset published a paper on the t distribution, which
became known as Student’s t distribution.
(He published under the pseudonym “Student” because the brewery didn’t
allow him to use his own name)

• The t distribution is appropriate for constructing a
confidence interval for the mean when we need to
account for the additional variability due to estimating 𝜎
with s
Student’s t-distribution
• Student’s t-distribution accounts
for the additional variability due to
estimating 𝜎 with 𝑠
• The t distribution looks a lot like the normal
except that it has fatter tails
• The parameter of the t distribution is called the degrees of freedom (df)
• As the df (denoted by 𝜈 in the figure) gets bigger, the t distribution looks
more and more like the normal
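The "fatter tails" can be quantified by comparing densities far from the centre. The sketch below evaluates the standard formula for the t density with Python's standard library; the evaluation point x = 3 and the df values are arbitrary illustrative choices.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

normal_at_3 = NormalDist().pdf(3.0)

ratios = []
for df in (2, 10, 100):
    ratio = t_pdf(3.0, df) / normal_at_3
    ratios.append(ratio)
    print(f"df = {df:>3}: t density at x = 3 is {ratio:.2f} times the normal density")
```

The ratio is above 1 for every df and moves toward 1 as the df grow, matching the statement that the t distribution approaches the normal.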
t distribution for CI
• The df measure the amount of information available in the data to
estimate $\sigma$
• The statistic
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
has a t distribution with $n - 1$ df (denoted by $t_{n-1}$). One df is used up by estimating the mean with $\bar{x}$
• Thus, as $n$ gets larger → $s$ becomes a better estimate of $\sigma$ → the
distribution of the t statistic looks more like the normal
• With large enough $n$, the normal approximation can be used to construct
the CI.
• The CI from the t distribution is wider, accounting for the uncertainty about $\sigma$
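For a sense of scale, the multipliers for a 95% interval at a few df are shown below. These are standard t-table values rounded to three decimals; they are not taken from the lecture.

```latex
\begin{array}{c|ccccc}
n - 1 \text{ (df)} & 5 & 10 & 30 & 100 & \infty \ (\text{normal}) \\ \hline
t_{n-1,\,0.025} & 2.571 & 2.228 & 2.042 & 1.984 & 1.960
\end{array}
```

The t multiplier always exceeds 1.960, so the t-based interval is wider for the same $s/\sqrt{n}$, and the difference becomes small once the df are large.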
What if 𝜎 is unknown?
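• We use the CI given by
$$P\left(\bar{X} - t_{n-1,\alpha/2}\,\frac{s}{\sqrt{n}} < \mu < \bar{X} + t_{n-1,\alpha/2}\,\frac{s}{\sqrt{n}}\right) = 1 - \alpha$$
where $t_{n-1,\alpha/2}$ is the quantile of probability $1 - \alpha/2$ from $t_{n-1}$
• In R, use qt(p=0.975, df=n-1) for a 95% interval ($\alpha = 0.05$)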
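Outside R, the same quantile can be approximated numerically. The sketch below is a rough standard-library stand-in for `qt`, written only to show what the function computes; it is not how R implements it. It integrates the t density with Simpson's rule and inverts the result by bisection. All function names are hypothetical, and the search range of ±200 is an assumption that would fail for extreme probabilities with very few df.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x), integrating the density from 0 to |x| with Simpson's rule."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                     # area between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area

def t_quantile(p, df, tol=1e-8):
    """Approximate quantile of t_df by bisection on t_cdf."""
    lo, hi = -200.0, 200.0                   # assumed search range
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def t_confidence_interval(xbar, s, n, conf=0.95):
    """Two-sided CI for mu when sigma is estimated by s."""
    t = t_quantile(1 - (1 - conf) / 2, df=n - 1)
    margin = t * s / math.sqrt(n)
    return xbar - margin, xbar + margin

t_30 = t_quantile(0.975, df=30)
print(round(t_30, 3))   # about 2.042
```

In practice a statistics library is the better choice; the point here is only that the multiplier depends on both the confidence level and $n - 1$.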
Applications
• Consider the distribution of heights for the population of individuals
between the ages of 12 and 40 who suffer from fetal alcohol syndrome.
Fetal alcohol syndrome is the severe end of the spectrum of
disabilities caused by maternal alcohol use during pregnancy. The
distribution of heights has unknown mean 𝜇. A random sample of 31
patients is selected from the underlying population; the average
height for these individuals was 𝑥̅ = 147.4 cm.
1. When 𝜎 is known to be 6 cm, construct 90% and 99% confidence intervals
for 𝜇. Interpret the results.
2. When 𝜎 is not known and the sample standard deviation is calculated to be
6 cm, construct 90% and 99% confidence intervals for 𝜇. Interpret the results.
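A worked sketch of both parts, using the interval formulas from the earlier slides. The quantiles $z_{0.05} \approx 1.645$, $z_{0.005} \approx 2.576$, $t_{30,\,0.05} \approx 1.697$ and $t_{30,\,0.005} \approx 2.750$ are standard table values rather than numbers given in the lecture, and all results are rounded to two decimals (in cm).

```latex
\begin{align*}
\frac{\sigma}{\sqrt{n}} = \frac{s}{\sqrt{n}} &= \frac{6}{\sqrt{31}} \approx 1.0776 \\[4pt]
\text{1. } \sigma \text{ known, 90\%:}\quad
  & 147.4 \pm 1.645 \times 1.0776 \approx 147.4 \pm 1.77 \;\Rightarrow\; (145.63,\ 149.17) \\
\sigma \text{ known, 99\%:}\quad
  & 147.4 \pm 2.576 \times 1.0776 \approx 147.4 \pm 2.78 \;\Rightarrow\; (144.62,\ 150.18) \\[4pt]
\text{2. } \sigma \text{ unknown, 90\%:}\quad
  & 147.4 \pm 1.697 \times 1.0776 \approx 147.4 \pm 1.83 \;\Rightarrow\; (145.57,\ 149.23) \\
\sigma \text{ unknown, 99\%:}\quad
  & 147.4 \pm 2.750 \times 1.0776 \approx 147.4 \pm 2.96 \;\Rightarrow\; (144.44,\ 150.36)
\end{align*}
```

At each confidence level the t-based interval (30 df) is slightly wider than the z-based one, and the 99% intervals are wider than the 90% intervals. The interpretation follows the repeated-sampling logic: a procedure that produces such intervals covers the true mean height in about 90% (or 99%) of samples.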
