Stats for Health Researchers
• Studied confidence intervals and tests for a single population mean μ, a single population proportion p, and a single population variance σ².
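Figure: sampling distribution of X̄, centred at μX̄ = μ, with α/2 = 0.025 in each tail; sample means x̄1, x̄2 are shown with their intervals.
Intervals extend from
$$\bar{X} - Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \quad\text{to}\quad \bar{X} + Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$
(1 − α)100% of the intervals constructed this way contain μ; α·100% do not.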
Confidence Intervals
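A minimal sketch of the σ-known interval, using only the Python standard library. The helper name `z_confidence_interval` and the numbers (x̄ = 100, σ = 15, n = 25) are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(xbar, sigma, n, conf=0.95):
    """Return (lower, upper) for mu when sigma is known: xbar +/- Z_{alpha/2} * sigma / sqrt(n)."""
    alpha = 1 - conf
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # Z_{alpha/2}; about 1.96 for a 95% interval
    half_width = z_crit * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


# Hypothetical example: xbar = 100, sigma = 15, n = 25 (standard error = 3)
lower, upper = z_confidence_interval(100, 15, 25)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")  # 95% CI: (94.12, 105.88)
```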
Student 𝑡 Distribution
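If a population has a normal distribution, then the distribution of
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
is a Student t distribution with n − 1 degrees of freedom. When σ is unknown, the confidence interval for μ is
$$\bar{X} \pm t_{\alpha/2}\,\frac{S}{\sqrt{n}}$$
(where tα/2 is the critical value of the t distribution with n − 1 degrees of freedom and an area of α/2 in each tail).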
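A sketch of the σ-unknown interval under one assumption worth stating: the Python standard library has no t quantile function, so the critical value tα/2 is passed in from a t table (if SciPy is available, `scipy.stats.t.ppf(1 - alpha / 2, n - 1)` returns it). The helper name and the sample are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def t_confidence_interval(sample, t_crit):
    """Return (lower, upper) for mu when sigma is unknown: xbar +/- t_{alpha/2} * S / sqrt(n).

    t_crit is t_{alpha/2} with n - 1 degrees of freedom, read from a t table.
    """
    n = len(sample)
    xbar = mean(sample)
    s = stdev(sample)  # sample standard deviation S (divisor n - 1)
    half_width = t_crit * s / sqrt(n)
    return xbar - half_width, xbar + half_width


# Hypothetical sample of n = 5; t_{0.025} with 4 d.f. is about 2.776 (95% interval)
sample = [1, 2, 3, 4, 5]
print(t_confidence_interval(sample, t_crit=2.776))
```

Because t has fatter tails than Z, this interval is wider than a Z interval with the same S and n.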
Student’s t Distribution
Note: t → Z as n increases.
Figure: density curves of the standard normal (t with df = ∞), t (df = 13) and t (df = 5), all centred at 0.
t-distributions are bell-shaped and symmetric, but have ‘fatter’ tails than the normal.
The Null Hypothesis, H0
▪ Begin with the assumption that the null hypothesis is true
  • Similar to the notion of innocent until proven guilty
▪ Refers to the status quo or historical value
▪ Always contains an “=”, “≤”, or “≥” sign
▪ May or may not be rejected

Verdict of Jury     Defendant Innocent     Defendant Guilty
Not Guilty          Correct                Incorrect
Guilty              Incorrect              Correct

A test result is judged the same way, against the true population situation: μ = μ0 or μ ≠ μ0.
Critical Values
Level of significance = α
H0: μ = 211
H1: μ ≠ 211
Figure: sampling distribution centred at 211, with a region of non-rejection in the middle and a rejection region of area α/2 in each tail, bounded by the critical values.
This is a two-tail test because there is a rejection region in both tails.
Z Test of Hypothesis for the Mean (σ Known)
• Convert the sample statistic (X̄) to a ZSTAT test statistic
Hypothesis tests for μ:
  • σ known (large sample): Z test
  • σ unknown (small sample): t test
The test statistic is:
$$Z_{STAT} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
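A minimal standard-library sketch of the Z test with its two-tail p-value. The helper name `z_test_mean` and the values x̄ = 215, σ = 20, n = 64 are hypothetical; only H0: μ = 211 echoes the slides.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(xbar, mu0, sigma, n):
    """One-sample Z test (sigma known). Returns Z_STAT and the two-tail p-value."""
    z_stat = (xbar - mu0) / (sigma / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))  # area in both tails beyond |Z_STAT|
    return z_stat, p_value


# Hypothetical numbers for H0: mu = 211 vs H1: mu != 211
z_stat, p_value = z_test_mean(xbar=215, mu0=211, sigma=20, n=64)
print(f"Z_STAT = {z_stat:.2f}, p-value = {p_value:.4f}")
```

With α = 0.05, ZSTAT = 1.6 lies inside ±1.96 (p ≈ 0.11), so H0 would not be rejected.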
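p-Value Approach to Testing: Interpreting the p-value
• Compare the p-value with α
• If p-value < α, reject H0
• If p-value ≥ α, do not reject H0
• Remember: when σ is unknown, the test statistic is
$$t_{STAT} = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
with n − 1 degrees of freedom.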
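A short sketch of the tSTAT formula and the p-value rule. The helper names and the sample are hypothetical; the p-value for tSTAT itself needs the t distribution (a t table, or for example `scipy.stats.t.sf` if SciPy is installed), which the standard library does not provide.

```python
from math import sqrt
from statistics import mean, stdev


def t_stat_mean(sample, mu0):
    """One-sample t statistic (sigma unknown): (xbar - mu0) / (S / sqrt(n)), with n - 1 d.f."""
    n = len(sample)
    return (mean(sample) - mu0) / (stdev(sample) / sqrt(n))


def reject_h0(p_value, alpha=0.05):
    """p-value rule: reject H0 if p-value < alpha, otherwise do not reject."""
    return p_value < alpha


t_stat = t_stat_mean([1, 2, 3, 4, 5], mu0=2)  # hypothetical sample
print(round(t_stat, 3))
print(reject_h0(0.03), reject_h0(0.20))  # True False
```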
Exact Methods for Testing Claims About a Population Proportion p
• Instead of using the normal distribution as an approximation to the binomial distribution, we can get exact results by using the binomial probability distribution itself. Binomial probabilities are a real nuisance to calculate manually, but technology makes this approach quite simple. Also, this exact approach does not require that np ≥ 5 and nq ≥ 5 (where q = 1 − p), so we have a method that applies when that requirement is not satisfied.
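As an illustration of the exact approach, here is a standard-library sketch that sums binomial probabilities directly. The helper names and the data (6 side effects among 10 patients, H0: p = 0.2) are hypothetical. The two-tail value doubles the smaller tail, which is one common convention; other software (for example SciPy's `binomtest`) may use a different two-sided rule when p0 ≠ 0.5.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def exact_binomial_test(x, n, p0):
    """Exact p-values for H0: p = p0, given x successes in n trials.

    Returns (lower-tail, upper-tail, two-tail). The two-tail value doubles the
    smaller tail and caps it at 1 (one convention among several).
    """
    lower_tail = sum(binom_pmf(k, n, p0) for k in range(0, x + 1))  # P(X <= x)
    upper_tail = sum(binom_pmf(k, n, p0) for k in range(x, n + 1))  # P(X >= x)
    two_tail = min(1.0, 2 * min(lower_tail, upper_tail))
    return lower_tail, upper_tail, two_tail


# Hypothetical: 6 side effects among 10 patients, H0: p = 0.2.
# Here np = 2 < 5, so the normal approximation would not be appropriate.
lower_tail, upper_tail, two_tail = exact_binomial_test(6, 10, 0.2)
print(f"P(X >= 6) = {upper_tail:.4f}, two-tail p = {two_tail:.4f}")
```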
Objectives
In this lecture, you will learn:
• How to use hypothesis testing to compare two populations:
  • The difference between two means
  • The difference between two proportions
Identifying the Target Parameter
• For instance, an epidemiologist might need to estimate the difference in mean life expectancy between inner-city and suburban residents.
Choosing the target parameter (examples):
• Population means, independent samples: Group 1 vs. Group 2
• Population means, related samples: same group before vs. after treatment
• Population proportions: Proportion 1 vs. Proportion 2
• Population variances: Variance 1 vs. Variance 2
Comparing two means
Comparing two (sub-)sample means
• For example: blood cholesterol levels between pre-menopausal and
post-menopausal women
• Another example: lung function (FEV1) between asthmatic and
non-asthmatic individuals
• Sample means will almost always differ, even if only slightly
When σ1 and σ2 are unknown but assumed equal:
• Use Sp to estimate the unknown σ
• Use a pooled-variance t test
Two-tail test:
H0: μ1 = μ2
H1: μ1 ≠ μ2
i.e.,
H0: μ1 – μ2 = 0
H1: μ1 – μ2 ≠ 0
Choose the Appropriate Test
• Use a t-test if variances are unknown and sample size is small.
• Consider whether the test should be paired or unpaired, based on the study design.
Hypothesis tests for μ1 – μ2
Two population means, independent samples:
• σ1 and σ2 unknown, assumed equal (*)
• σ1 and σ2 unknown, not assumed equal

Hypothesis tests for µ1 – µ2 with σ1 and σ2 unknown and assumed equal
(*) Assumptions:
▪ Populations are normally distributed or both sample sizes are at least 30 (if in doubt, you may test normality with Q-Q plots or the Shapiro-Wilk test)
▪ Population variances are unknown but assumed equal
If these assumptions do not hold, we can perform a non-parametric test called the Wilcoxon-Mann-Whitney (or Mann-Whitney) test:
• Assesses whether the two samples come from the same distribution
• Non-parametric = it does not assume normality of the samples
• The Mann-Whitney test is the non-parametric “brother” of the two-sample t-test
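To make the rank idea behind the Mann-Whitney test concrete, the sketch below computes only the U statistic (ties receive average ranks); turning U into a p-value needs tables or a normal approximation, not shown here. If SciPy is installed, `scipy.stats.mannwhitneyu` performs the full test. The helper name and the values are hypothetical.

```python
def mann_whitney_u(x, y):
    """Return (U1, U2), the Mann-Whitney U statistics for samples x and y.

    Observations are ranked together; tied values receive the average of
    the ranks they occupy. U1 counts how often an x-value exceeds a y-value
    (ties count 1/2), and U1 + U2 = n1 * n2.
    """
    pooled = sorted(list(x) + list(y))
    avg_rank = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # sorted positions i .. j-1 hold ranks i+1 .. j; tied values share their mean
        avg_rank[pooled[i]] = (i + 1 + j) / 2
        i = j
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(avg_rank[v] for v in x)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    return u1, u2


# Hypothetical values for two small independent groups (one tie at 3.4)
print(mann_whitney_u([1.2, 3.4, 2.2], [4.1, 3.4, 5.0, 6.3]))
```

A U value far from n1·n2/2 (in either direction) suggests the two samples do not come from the same distribution.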
Hypothesis tests for µ1 – µ2 with σ1 and σ2 unknown and assumed equal
Population means, independent samples; σ1 and σ2 unknown, assumed equal
• The pooled variance is:
$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{(n_1 - 1) + (n_2 - 1)}$$
• The test statistic is:
$$t_{STAT} = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
• Where tSTAT has d.f. = (n1 + n2 – 2)
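A standard-library sketch of the pooled-variance formulas, checked against the cotinine example numbers used later in this lecture (n1 = 21, x̄1 = 3.27, S1 = 1.30; n2 = 25, x̄2 = 2.53, S2 = 1.16). The helper name `pooled_variance_t` is hypothetical. If SciPy is available, `scipy.stats.ttest_ind_from_stats(..., equal_var=True)` computes the same statistic with a p-value.

```python
from math import sqrt


def pooled_variance_t(xbar1, s1, n1, xbar2, s2, n2, hypothesized_diff=0.0):
    """Pooled-variance t test for mu1 - mu2 (sigma1, sigma2 unknown, assumed equal).

    Returns (Sp^2, t_STAT, d.f.).
    """
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / ((n1 - 1) + (n2 - 1))
    standard_error = sqrt(sp2 * (1 / n1 + 1 / n2))
    t_stat = ((xbar1 - xbar2) - hypothesized_diff) / standard_error
    return sp2, t_stat, n1 + n2 - 2


sp2, t_stat, df = pooled_variance_t(3.27, 1.30, 21, 2.53, 1.16, 25)
print(f"Sp^2 = {sp2:.4f}, t_STAT = {t_stat:.3f}, d.f. = {df}")
# Sp^2 = 1.5021, t_STAT = 2.040, d.f. = 44
```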
Confidence interval for µ1 – µ2 with σ1 and σ2 unknown and assumed equal
Population means, independent samples; σ1 and σ2 unknown, assumed equal
The confidence interval for μ1 – μ2 is:
$$(\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2}\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
Where tα/2 has d.f. = n1 + n2 – 2
Pooled-Variance t Test Example
You are a pneumonologist investigating the effect of second-hand smoke. Is there a difference in absorbed nicotine (indicated by cotinine) between smokers and passive (environmental) smokers, listed as SMOK and ENV.SMOK? You collect the following data:

                   SMOK    ENV.SMOK
Number             21      25
Sample mean        3.27    2.53
Sample std dev     1.30    1.16

Assuming both populations are approximately normal with equal variances, is there a difference in mean cotinine level (α = 0.05)?
Pooled-Variance t Test Example: Calculating the Test Statistic
H0: μ1 – μ2 = 0, i.e. (μ1 = μ2)
H1: μ1 – μ2 ≠ 0, i.e. (μ1 ≠ μ2)
The pooled variance is:
$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{(n_1 - 1) + (n_2 - 1)} = \frac{(21 - 1)\,1.30^2 + (25 - 1)\,1.16^2}{(21 - 1) + (25 - 1)} = 1.5021$$
The test statistic is:
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(3.27 - 2.53) - 0}{\sqrt{1.5021\left(\frac{1}{21} + \frac{1}{25}\right)}} = 2.040$$
Interpret Results
• If p < α (here 0.05), we reject the null hypothesis and conclude that the means are significantly different.
• The p-value is the probability, assuming H0 is true, of observing a difference between groups at least as large as the one actually observed. A small p-value (typically < 0.05) suggests a statistically significant result.
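Pooled-Variance t Test Example: Hypothesis Test Solution
Decision: with d.f. = 21 + 25 − 2 = 44, the critical values are ±2.0154. Since tSTAT = 2.040 > 2.0154, it falls in the upper rejection region, so we reject H0.
The 95% confidence interval for μ1 – μ2 is:
$$(\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2}\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = 0.74 \pm 2.0154 \times 0.3628 = (0.009,\ 1.471)$$
Since 0 is less than the entire interval, we can be 95% confident that μSMOK > μENV.SMOK.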
A confidence interval provides a range within which the true difference between two
means or proportions is likely to fall, with a certain level of confidence (usually 95%).
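The interval arithmetic above can be reproduced with a short standard-library sketch. The helper name `pooled_t_interval` is hypothetical, and the critical value 2.0154 is taken from the slides rather than computed, since the standard library has no t quantile function.

```python
from math import sqrt


def pooled_t_interval(xbar1, s1, n1, xbar2, s2, n2, t_crit):
    """CI for mu1 - mu2 with equal variances: (xbar1 - xbar2) +/- t_crit * sqrt(Sp^2 (1/n1 + 1/n2))."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / ((n1 - 1) + (n2 - 1))
    half_width = t_crit * sqrt(sp2 * (1 / n1 + 1 / n2))
    diff = xbar1 - xbar2
    return diff - half_width, diff + half_width


# Cotinine example: t_{0.025} with 44 d.f. = 2.0154 (value used in the slides)
lower, upper = pooled_t_interval(3.27, 1.30, 21, 2.53, 1.16, 25, t_crit=2.0154)
print(f"95% CI for mu1 - mu2: ({lower:.3f}, {upper:.3f})")
# 95% CI for mu1 - mu2: (0.009, 1.471)
```

The interval excludes 0 exactly when |tSTAT| exceeds the same critical value, so the CI and the two-tail test reach the same decision.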
Self-Assessment Exercises
Comparing Two Methods of Teaching
Suppose you wish to compare a new method of teaching reading to "learners"
with the current standard method. You decide to base your comparison on the
results of a reading test given at the end of a learning period of six months. Of a
random sample of 22 "learners," 10 are taught by the new method and 12 are
taught by the standard method. All 22 students are taught by qualified instructors
under similar conditions for the designated six-month period. The results of the reading test at the end of this period are given in the table below.
Reading Test Scores for Learners
New Method (n = 10):        80  80  79  81  76  66  71  76  70  85
Standard Method (n = 12):   79  62  70  68  73  76  86  73  72  68  75  66
Cystic fibrosis-iron in blood
• Consider the distribution of serum iron levels for the population of healthy
children and the population of children with cystic fibrosis.
• A random sample is selected from each population. The sample of n1 = 9 healthy children has mean serum iron level x̄1 = 18.9 μmol/l and standard deviation s1 = 5.9 μmol/l; the sample of n2 = 13 children with cystic fibrosis has mean iron level x̄2 = 11.9 μmol/l and standard deviation s2 = 6.3 μmol/l. Is it likely that the observed difference in sample means (18.9 versus 11.9 μmol/l) is the result of chance variation, or should we conclude that the discrepancy is due to a true difference in population means?
Related Populations: The Paired Difference Test (FYI only)
• Paired differences: Di = X1i − X2i
• Eliminates variation among subjects
• Assumptions:
  • Differences are normally distributed
  • Or, if not normal, use large samples
The ith paired difference is Di, where Di = X1i − X2i.
Related samples (paired samples):
• The point estimate for the paired difference population mean μD is D̄:
$$\bar{D} = \frac{\sum_{i=1}^{n} D_i}{n}$$
• The sample standard deviation is SD:
$$S_D = \sqrt{\frac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n - 1}}$$
• n is the number of pairs in the paired sample
The Paired Difference Test: Finding tSTAT
• The test statistic for μD (related/paired samples) is:
$$t_{STAT} = \frac{\bar{D} - \mu_D}{S_D/\sqrt{n}}$$
where
$$S_D = \sqrt{\frac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n - 1}}$$
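A standard-library sketch of the paired difference test working from raw paired observations. The helper name `paired_t` and the three pairs are hypothetical; it follows the slide's convention Di = X1i − X2i.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x1, x2, mu_d=0.0):
    """Paired difference t test: D_i = X1i - X2i, t_STAT = (D_bar - mu_D) / (S_D / sqrt(n)).

    Returns (D_bar, S_D, t_STAT, d.f.) with d.f. = n - 1, n = number of pairs.
    """
    if len(x1) != len(x2):
        raise ValueError("paired samples must have the same length")
    d = [a - b for a, b in zip(x1, x2)]
    n = len(d)
    d_bar = mean(d)
    s_d = stdev(d)  # divisor n - 1
    t_stat = (d_bar - mu_d) / (s_d / sqrt(n))
    return d_bar, s_d, t_stat, n - 1


# Hypothetical measurements on three subjects (X1 and X2 for each subject)
print(paired_t([5, 7, 9], [4, 5, 6]))
```

Working with the differences Di is what removes the between-subject variation: each subject serves as its own control.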
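Paired Difference Test: Example
• Assume the University introduces the seminar “antimicrobial prescription” for young doctors. Has the course made a difference in the number of antimicrobial tablets prescribed? You collect the following data:
M.O.: 4, 0, −4 (…)
Summary of the n = 5 paired differences: ΣDi = −21, so
$$\bar{D} = \frac{\sum D_i}{n} = \frac{-21}{5} = -4.2, \qquad S_D = \sqrt{\frac{\sum (D_i - \bar{D})^2}{n - 1}} = 5.67$$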
Paired Difference Test: Solution
• Has the training made a difference in the number of antimicrobial tablets prescribed (at the 0.01 level)?
H0: μD = 0
H1: μD ≠ 0
α = 0.01, D̄ = −4.2
d.f. = n − 1 = 4, critical values t0.005 = ±4.604 (rejection region of area α/2 in each tail)
Test statistic:
$$t_{STAT} = \frac{\bar{D} - \mu_D}{S_D/\sqrt{n}} = \frac{-4.2 - 0}{5.67/\sqrt{5}} = -1.66$$
Decision: Do not reject H0 (tSTAT = −1.66 is not in the rejection region, since −4.604 < −1.66 < 4.604).
Conclusion: There is insufficient evidence of a change in the number of antimicrobial tablets prescribed.
The Paired Difference Confidence Interval: Example
The confidence interval for μD is:
$$\bar{D} \pm t_{\alpha/2}\,\frac{S_D}{\sqrt{n}}$$
With D̄ = −4.2 and SD = 5.67, the 99% CI for μD is:
$$-4.2 \pm 4.604\,\frac{5.67}{\sqrt{5}} = (-15.87,\ 7.47)$$
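The example's test statistic and 99% interval can both be checked from the summary statistics alone. This sketch uses a hypothetical helper name; the inputs D̄ = −4.2, SD = 5.67, n = 5 and t0.005 = 4.604 are the slide values.

```python
from math import sqrt


def paired_from_summary(d_bar, s_d, n, t_crit, mu_d=0.0):
    """t_STAT and CI for mu_D from summary statistics of the paired differences."""
    standard_error = s_d / sqrt(n)
    t_stat = (d_bar - mu_d) / standard_error
    ci = (d_bar - t_crit * standard_error, d_bar + t_crit * standard_error)
    return t_stat, ci


# Antimicrobial seminar example: D_bar = -4.2, S_D = 5.67, n = 5, t_{0.005} (4 d.f.) = 4.604
t_stat, (lower, upper) = paired_from_summary(-4.2, 5.67, 5, t_crit=4.604)
print(f"t_STAT = {t_stat:.2f}, 99% CI = ({lower:.2f}, {upper:.2f})")
# t_STAT = -1.66, 99% CI = (-15.87, 7.47)
```

The interval contains 0, which agrees with the decision not to reject H0 at the 0.01 level.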
Medical Example:
• When estimating the rate of a side effect in a large trial, the normal approximation can provide quick, reasonable estimates.
Two Population Proportions
Goal: test a hypothesis or form a confidence interval for the difference between two population proportions, p1 – p2.
Under H0 (p1 = p2), the two samples are pooled to estimate the common proportion:
$$\bar{p} = \frac{X_1 + X_2}{n_1 + n_2}$$
where X1 and X2 are the number of items of interest in samples 1 and 2.
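Two Population Proportions
The test statistic for p1 – p2 is a Z statistic:
$$Z_{STAT} = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\bar{p}(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
where
$$\bar{p} = \frac{X_1 + X_2}{n_1 + n_2}, \qquad \hat{p}_1 = \frac{X_1}{n_1}, \qquad \hat{p}_2 = \frac{X_2}{n_2}$$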
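A standard-library sketch of the pooled two-proportion Z test for H0: p1 – p2 = 0. The helper name and the counts (30 of 100 vs. 20 of 100) are hypothetical and are not the data behind the specialty-choice example.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Z test for H0: p1 - p2 = 0 using the pooled proportion p_bar = (X1 + X2) / (n1 + n2)."""
    p_hat1, p_hat2 = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)
    standard_error = sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    z_stat = (p_hat1 - p_hat2) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))  # two-tail test
    return z_stat, p_value


# Hypothetical counts: 30 of 100 vs 20 of 100
z_stat, p_value = two_proportion_z_test(30, 100, 20, 100)
print(f"Z_STAT = {z_stat:.2f}, p-value = {p_value:.4f}")
```

The pooled p̄ is used in the standard error because the test is carried out under H0, where both groups share one proportion.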
Hypothesis Tests for Two Population Proportions
Population proportions
Two-tail test:
H0: p1 = p2
H1: p1 ≠ p2
i.e.,
H0: p1 – p2 = 0
H1: p1 – p2 ≠ 0
Hypothesis Tests for Two Population Proportions
Population proportions
Lower-tail test:   H0: p1 – p2 ≥ 0    H1: p1 – p2 < 0
Upper-tail test:   H0: p1 – p2 ≤ 0    H1: p1 – p2 > 0
Two-tail test:     H0: p1 – p2 = 0    H1: p1 – p2 ≠ 0

Example (two-tail test, α = 0.05): critical values = ±1.96, with α/2 in each tail; ZSTAT = −2.20.
Decision: Reject H0 (−2.20 < −1.96).
Conclusion: There is evidence of a significant difference in the proportion of men and women who will choose women-gender-related specialties.
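Confidence Interval for Two Population Proportions
$$(\hat{p}_1 - \hat{p}_2) \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$
Since the 95% interval for the example does not contain 0, we can be 95% confident that the two proportions are different.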
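A standard-library sketch of the usual unpooled (Wald) interval for p1 – p2. Unlike the Z test, the interval does not assume p1 = p2, so each proportion keeps its own variance term. The helper name and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_interval(x1, n1, x2, n2, conf=0.95):
    """Wald CI for p1 - p2: (p1_hat - p2_hat) +/- Z_{alpha/2} * sqrt(p1(1-p1)/n1 + p2(1-p2)/n2)."""
    p_hat1, p_hat2 = x1 / n1, x2 / n2
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    standard_error = sqrt(p_hat1 * (1 - p_hat1) / n1 + p_hat2 * (1 - p_hat2) / n2)
    diff = p_hat1 - p_hat2
    return diff - z_crit * standard_error, diff + z_crit * standard_error


lower, upper = two_proportion_interval(30, 100, 20, 100)  # hypothetical counts
contains_zero = lower <= 0 <= upper
print(f"95% CI: ({lower:.3f}, {upper:.3f}); contains 0: {contains_zero}")
```

With these hypothetical counts the interval contains 0, so, unlike the lecture's example, they would not show a significant difference at the 95% level.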
Clinical Significance vs. Statistical Significance
a. Use a 0.05 significance level to test the claim that the fatality rate of occupants
is lower for those in cars equipped with airbags.
b. Construct a 90% confidence interval estimate of the difference between the two
population proportions. What does the result suggest about the claim that “the
fatality rate of occupants is lower for those in cars equipped with airbags”?
Recap - what we learned today
• Getting a 95% CI for a sample mean (one-sample t-test) and comparing two sample means (two-sample t-test) by getting a 95% CI for their difference
• If the 95% CI for the difference does not include the null value (zero), then the difference is described as “statistically significant”
• If the 95% CI includes the null value, then the difference is described as statistically non-significant
• Thank you!