Measures of Association

The document discusses measures of association in statistical analysis, particularly focusing on the Chi-Square test for categorical data. It explains how to use a 2x2 contingency table to assess the association between two categorical variables and introduces concepts such as Relative Risk (RR) and Odds Ratio (OR). The document also covers the assumptions of the Chi-Square test and provides examples to illustrate the calculations involved.

Uploaded by

Bewket Chalachew

Measures of Association

• While a test of hypothesis can be used to
determine whether an association exists
between two random variables, it cannot
provide a measure of the strength of the
association
• Several methods are available for
estimating the magnitude of the effect
given the categorical data in a 2× 2
contingency table
• For the most part, we have been applying
the techniques of hypothesis testing to
either continuous or ordinal data
• What about nominal data?
• Instead of using the normal approximation
to the binomial distribution, we could reach
the same conclusion using different
techniques
Categorical Data
1. Chi-Square Test
• The chi-square (χ²) distribution is a
probability distribution used to make
statistical inferences about categorical data
(proportions) with two or more categories.
• Widely used in the analysis of
contingency tables.
• The chi-square test allows us to test for
association between two categorical
variables.
H0: There is no association between the variables.
HA: There is an association between the variables.
• Consequently, a significant p-value is
evidence of an association.
• The chi-square test compares observed to
expected counts (frequencies) under the
assumption of no association (that is, H0 is
true).
• With this method, data are arranged in the
form of a contingency table.
X² Distribution
• Indexed by the degrees of freedom (n)
• Unlike the z and t distributions, which are
always symmetric about 0, the X² distribution
takes on only non-negative values and is
always skewed to the right.
• The skewness diminishes as n increases
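[Figure: X² distribution with 10 degrees of freedom. The acceptance region (area 0.95) lies to the left of the critical value 18.307; the rejection region (area 0.05) lies to the right.]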
Contingency Table
• A table composed of rows cross-classified
by columns
• A 2x2 contingency table is a table
composed of two rows cross-classified by
two columns
• Appropriate to display data that can be
classified by two different variables, each
of which has only two possible outcomes
Example
• A study was conducted to look at the effects of
oral contraceptives (OC) on heart disease in
women 40 to 44 years of age. Among 5000
current OC users at baseline, 13 women
developed a myocardial infarction (MI) over a
3-year period, whereas among 10,000 non-OC
users, 7 developed an MI over a 3-year period.
– P1 = 0.0026, P2 = 0.0007
– Z-test = 2.77, P-value = 0.006
– There is a highly significant association between MI
and OC use
Display the above data in the form of a 2x2
contingency table

                 MI status over 3 years
OC-use group     Yes       No        Total
OC users          13     4987         5000
Non-OC users       7     9993       10,000
Total             20   14,980       15,000

Is the proportion of MI the same in OC users and non-OC users?

What can be said about the relationship between MI status and OC use?
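The Z-test figures quoted for the OC-MI data (Z = 2.77, P-value = 0.006) can be reproduced with a two-sample test of proportions. This is a minimal sketch, not necessarily the slide author's exact procedure: the function name is hypothetical, and it assumes a pooled standard error with a continuity correction, because that choice reproduces 2.77 (without the correction the statistic is about 3.01).

```python
import math

def two_proportion_z(x1, n1, x2, n2, continuity=True):
    """Two-sided Z-test for H0: p1 = p2 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    diff = abs(p1 - p2)
    if continuity:
        # Assumed continuity correction: subtract 1/(2*n1) + 1/(2*n2).
        diff = max(diff - (1 / (2 * n1) + 1 / (2 * n2)), 0.0)
    z = diff / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(z)).
    p_value = math.erfc(z / math.sqrt(2))
    return p1, p2, z, p_value

p1, p2, z, p_value = two_proportion_z(13, 5000, 7, 10000)
print(round(p1, 4), round(p2, 4), round(z, 2), round(p_value, 3))
```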
Definition
• The chi-square statistic measures the
discrepancy between k observed frequencies
O1, O2, …, Ok and the corresponding expected
frequencies E1, E2, …, Ek.
• When the Ho of no association is true, the observed and expected
counts will be similar, their difference will be close to zero, resulting
in a SMALL chi square statistic value.
• When the HA of an association is true the observed counts will be
unlike the expected counts, their difference will be non zero and
their squared difference will be positive, resulting in a LARGE
POSITIVE chi square statistic value.
• The chi-square test is based on the table of χ²
critical values for different degrees of freedom (df).
• Requires the data to be arranged in a
contingency table (here, a 2x2 table)
• If the value of χ² is zero, there is no
discrepancy between the observed and the
expected frequencies.
• The greater the discrepancy, the larger will
be the value of χ2.
• The calculated value of χ2 is compared
with the tabulated value for the given df.
Degrees of Freedom
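• Counts in the chi-square test of a 2x2
table are represented as “a”, “b”, “c” and
“d”, with N = a + b + c + d.
• The general calculation:
$X^2 = \sum \frac{(O - E)^2}{E}$
• is the same calculation as the following
shortcut formula:
$X^2 = \frac{N(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}$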
Expected Value
• Is the product of the row total multiplied by
the column total, divided by the grand total
• The expected numbers must be computed for
each cell.
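The rule "row total times column total, divided by grand total" can be sketched in a few lines. The helper name `expected_counts` is hypothetical; the observed counts are the OC-MI data from the earlier example.

```python
def expected_counts(table):
    """Expected cell counts under no association for an R x C table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Rows: OC users, non-OC users. Columns: MI yes, MI no.
observed = [[13, 4987], [7, 9993]]
expected = expected_counts(observed)
for row in expected:
    print([round(value, 1) for value in row])
```

The expected table keeps the same row and column totals as the observed table, which is a quick sanity check on the arithmetic.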
Example
• Compute the expected table for the OC-MI
data in the previous example

Observed counts
                 MI status over 3 years
OC-use group     Yes       No        Total
OC users          13     4987         5000
Non-OC users       7     9993       10,000
Total             20   14,980       15,000

Expected counts
                 MI status over 3 years
OC-use group     Yes       No        Total
OC users         6.7   4993.3         5000
Non-OC users    13.3   9986.7       10,000
Total             20   14,980       15,000

• X² ≈ 7.67 with the continuity correction (roughly 8), which equals (2.77)² from the Z-test; p-value ≈ 0.006
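To see where a value of roughly 8 comes from, the sketch below computes the chi-square statistic for the OC-MI table with and without Yates' continuity correction. The function name is hypothetical, and reading the slide's value as the Yates-corrected one is an assumption based on its agreement with the Z-test (2.77² ≈ 7.67).

```python
import math

def chi_square_2x2(a, b, c, d, yates=False):
    """Chi-square statistic for a 2x2 table, optionally Yates-corrected."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Yates' correction shrinks |ad - bc| by N/2 (not below zero).
        diff = max(diff - n / 2, 0.0)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def p_value_1df(x2):
    """Upper-tail area of the chi-square distribution with 1 df."""
    return math.erfc(math.sqrt(x2 / 2))

plain = chi_square_2x2(13, 4987, 7, 9993)
corrected = chi_square_2x2(13, 4987, 7, 9993, yates=True)
print(round(plain, 2), round(p_value_1df(plain), 4))
print(round(corrected, 2), round(p_value_1df(corrected), 4))
```

Either version leads to the same conclusion here: the p-value is well below 0.05.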

Example

X2 = 8.30, P-value = 0.004

Example: Observed Numbers
Response by Treatment
Expected Numbers
Shortcut Formula for 2x2 Tables
Example
• A study was conducted to investigate the
possible cause of a gastroenteritis outbreak
following a lunch served in a high school
cafeteria. Among the 225 students who
ate the sandwiches, 109 became ill, whereas
among the 38 students who did not eat the
sandwiches, 4 became ill.
• Present the data in a 2x2 contingency table
• With this method, data are arranged in the
form of a contingency table

                     Became ill
Ate sandwich      Yes     No    Total
Yes               109    116      225
No                  4     34       38
Total             113    150      263

• This is a 2 × 2 table for two dichotomous
random variables
• We again wish to know whether the
proportions of students who became ill in
each of the groups are identical
• To carry out the test, we first calculate the
expected counts for the table assuming
that:
H0: p1 = p2
HA: p1 ≠ p2
p1 = 109/225 = 48.44%, p2 = 4/38 = 10.53%
Z test = 4.37
• Expected counts are represented as follows:
• The chi-square test compares the
observed frequencies in each category
with the expected frequencies given that
H0 is true
• Are the deviations between Observed and
Expected too large to be attributed to
chance?
• To determine this, the deviations from all 4
cells must be combined
• Calculate the sum X² = Σ (O − E)² / E over
the four cells
• H0 is rejected at level α if X² is too
large, in particular if X² exceeds the upper-α
critical value of the chi-square distribution
with 1 degree of freedom
• If α = 0.05, we reject H0 for X²
greater than 3.84
• Here X² ≈ 19.07, so we reject H0
• The p-value is given by the area under the
chi-square distribution (1 df) to the right of
the observed X²
• P-value < 0.001
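The steps above can be sketched for the gastroenteritis data as follows. The counts 116 and 34 are derived from the text (225 − 109 and 38 − 4); the function name is hypothetical; and the p-value shortcut via `math.erfc` applies only to 1 degree of freedom.

```python
import math

def chi_square_statistic(observed):
    """Sum of (O - E)^2 / E over all cells, with E from the margins."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total
            statistic += (o - e) ** 2 / e
    return statistic

# Rows: ate sandwich, did not eat. Columns: ill, not ill.
observed = [[109, 116], [4, 34]]
x2 = chi_square_statistic(observed)
critical_value = 3.84                      # 1 df, alpha = 0.05 (from the text)
p_value = math.erfc(math.sqrt(x2 / 2))     # valid for 1 df only
print(round(x2, 2), x2 > critical_value, p_value < 0.001)
```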
Relationship between X2 and Z test
X² = Z²
19.07 ≈ (4.37)²
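The identity X² = Z² can be checked directly for a 2x2 table. In this sketch (function names are hypothetical) the Z statistic uses the pooled proportion without a continuity correction, and X² uses the standard shortcut form N(ad − bc)² / [(a+b)(c+d)(a+c)(b+d)]; both choices are assumptions about what the slides intend.

```python
import math

def z_statistic(x1, n1, x2, n2):
    """Uncorrected two-sample Z statistic with pooled standard error."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se

def shortcut_chi_square(a, b, c, d):
    """Shortcut chi-square formula for a 2x2 table with cells a, b, c, d."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

z = z_statistic(109, 225, 4, 38)
chi2 = shortcut_chi_square(109, 116, 4, 34)
print(round(z, 2), round(z ** 2, 2), round(chi2, 2))
```

The two printed squares agree up to floating-point error, which is why the Z-test and the uncorrected chi-square test always give the same p-value for a 2x2 table.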
Assumptions of the χ² test
• No expected value in the table should be < 1,
and no more than 20% of the expected
frequencies should be < 5.
• If this does not hold:
• - row or column categories can
sometimes be combined to make the
expected frequencies larger, or
• - use Yates' continuity correction
• For a 2x2 table, when the total number of
observations is less than 20, or when it is
greater than 20 and the smallest of the
four expected frequencies is < 5,
use Fisher’s exact test.
Fisher’s Exact Test

• Given the fixed margins, the probability of
obtaining the specific table which was
observed is
$P = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{N!\,a!\,b!\,c!\,d!}$
• Both the Chi-square test and the exact test
can be generalized to allow the
comparison of three or more proportions
• The data are arranged in the form of an
R × C contingency table
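As an illustration of Fisher's exact test for a 2x2 table, the sketch below computes the probability of one table given fixed margins (the hypergeometric probability) and a two-sided p-value. The function names and the small table are hypothetical, not data from the slides, and the two-sided rule used here (sum the probabilities of all tables no more likely than the observed one) is one common convention among several.

```python
from math import comb

def table_probability(a, b, c, d):
    """Probability of a specific 2x2 table given its fixed margins."""
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)

def fisher_two_sided(a, b, c, d):
    """Two-sided p-value: sum over tables no more likely than the observed."""
    row1, row2, col1 = a + b, c + d, a + c
    p_observed = table_probability(a, b, c, d)
    total = 0.0
    # Enumerate every value of the top-left cell allowed by the margins.
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = table_probability(x, row1 - x, col1 - x, row2 - (col1 - x))
        if p <= p_observed * (1 + 1e-9):   # tolerance for float ties
            total += p
    return total

# Hypothetical small table: a=3, b=1, c=1, d=3.
p_table = table_probability(3, 1, 1, 3)
p_two_sided = fisher_two_sided(3, 1, 1, 3)
print(round(p_table, 4), round(p_two_sided, 4))
```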
2. Relative Risk (RR)
• Or Risk Ratio
• Defined as the ratio of the incidence of
disease in the exposed group divided by
the corresponding incidence of disease in
the non-exposed group
• A point estimate of the risk ratio
(RR=p1/p2) is given by:
Disease
Exposure Yes No Total
Yes a b a+b
No c d c+d
Total a+c b+d N

$\widehat{RR} = \frac{a/(a+b)}{c/(c+d)}$

Age at first birth    Breast cancer
                      Yes     No     Total
≥25 years              31    1597    1628
<25 years              65    4475    4540
Total                  96    6072    6168

a/(a+b) = 31/1628 = 0.019
c/(c+d) = 65/4540 = 0.014
RR = 0.019/0.014 ≈ 1.36 with the rounded proportions (about 1.33 unrounded)

• Women who first give birth at an older age
are an estimated 36% more likely to develop
breast cancer
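• To obtain a CI for the RR, first compute a CI for ln(RR):
$\ln(\widehat{RR}) \pm z_{1-\alpha/2}\sqrt{\frac{b}{a\,n_1} + \frac{d}{c\,n_2}}$
• Where n1 = a+b, n2 = c+d, and
ln = natural logarithm
• Exponentiate each limit to get a CI for RR
• For the breast cancer data, a 95% CI for
ln(RR) is
ln(1.36) ± 1.96(0.216)
or
(−0.117, 0.732)
• Consequently, a 95% CI for RR itself is
(e^−0.117, e^0.732)
or
(0.89, 2.08)
• This interval contains the value 1, so the
association is not statistically significant at
the 5% level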
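The relative risk and its interval can be sketched from the four cell counts. This assumes the usual large-sample standard error for ln(RR), sqrt(b/(a·n1) + d/(c·n2)); the function name is hypothetical. Working from unrounded proportions gives RR ≈ 1.33 and an interval slightly different from the one quoted, which starts from the rounded value 1.36; either way the interval includes 1.

```python
import math

def relative_risk_ci(a, b, c, d, z=1.96):
    """Relative risk and a large-sample CI built on the log scale."""
    n1, n2 = a + b, c + d
    rr = (a / n1) / (c / n2)
    se_log = math.sqrt(b / (a * n1) + d / (c * n2))
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, se_log, lower, upper

# Breast cancer data: exposed = first birth at 25 years or older.
rr, se_log, lower, upper = relative_risk_ci(31, 1597, 65, 4475)
print(round(rr, 2), round(se_log, 3), round(lower, 2), round(upper, 2))
```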
3. The Odds Ratio
• The odds ratio (OR) is the odds in favor of
disease for the exposed group divided by
the odds in favor of disease for the
unexposed group
• The odds in favor of disease = p/(1-p),
where p = probability of a disease
• Odds = Pr (event occurs) / Pr (event does
not occur) = p/(1-p)
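• The odds ratio is defined as:
$OR = \frac{p_1/(1-p_1)}{p_2/(1-p_2)}$
where p1 and p2 are the probabilities of disease
in the exposed and unexposed groups
• It is estimated by
$\widehat{OR} = \frac{a/b}{c/d} = \frac{ad}{bc}$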
Example:
• In a study of the risk factors for invasive
cervical cancer, the following data were
collected (Case-Control):
• The odds ratio is estimated by:

• The odds of smoking among women with
cancer are 1.52 times the odds of smoking
among women without cancer
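• A CI can be constructed for OR
• To find a CI for the underlying OR, we first
find a CI for ln(OR) = (c1, c2), where
$c_1, c_2 = \ln\widehat{OR} \mp Z\sqrt{\frac{1}{a}+\frac{1}{b}+\frac{1}{c}+\frac{1}{d}}$
• Exponentiate the upper and lower confidence
limits for the natural log of the OR:
$\left(e^{\ln\widehat{OR} - Z\sqrt{\frac{1}{a}+\frac{1}{b}+\frac{1}{c}+\frac{1}{d}}},\ e^{\ln\widehat{OR} + Z\sqrt{\frac{1}{a}+\frac{1}{b}+\frac{1}{c}+\frac{1}{d}}}\right)$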
• For the cervical cancer data, the estimated
standard error of ln(OR) is 0.166
• Therefore, a 95% CI for ln(OR) is
ln(1.52) ± 1.96(0.166)
or
(0.093, 0.744)
• A 95% CI for the OR itself is
(e^0.093, e^0.744)
or
(1.10, 2.10)
• This interval does not contain the value 1
• We conclude that the odds of developing
cervical cancer are significantly higher for
smokers than for nonsmokers
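The back-transformation from the log scale can be checked with the two summary numbers given for the cervical cancer data (OR = 1.52 and a standard error of 0.166 for ln(OR)). The function name is hypothetical; the underlying cell counts are not reproduced in the text, so only the summary values are used.

```python
import math

def exponentiate_log_ci(estimate, se_log, z=1.96):
    """CI on the log scale and its back-transformed limits."""
    center = math.log(estimate)
    log_ci = (center - z * se_log, center + z * se_log)
    ci = (math.exp(log_ci[0]), math.exp(log_ci[1]))
    return log_ci, ci

log_ci, ci = exponentiate_log_ci(1.52, 0.166)
print([round(v, 3) for v in log_ci], [round(v, 2) for v in ci])
```

Because the lower limit stays above 1 after exponentiation, the interval excludes the null value of the odds ratio.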
Example: Odds of Death Related to Vit A Use (Case-Control Study)
• What is the estimated OR?

• Estimated OR = (46/61)/(74/59)=0.60
• 95% CI = (0.36, 1.04)
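The estimate above can be recomputed from the four counts that appear in the ratio (46/61)/(74/59). This is a sketch under assumptions: the function name is hypothetical, the assignment of counts to cells a, b, c, d follows the order in which they appear in that ratio (the original table is not reproduced in the text), and the interval uses the log-based formula with sqrt(1/a + 1/b + 1/c + 1/d). Under these assumptions the upper limit comes out at about 1.00 rather than 1.04, so the quoted interval may come from a different method or from rounding.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio (a/b)/(c/d) = ad/bc with a log-based CI."""
    or_hat = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_hat) - z * se_log)
    upper = math.exp(math.log(or_hat) + z * se_log)
    return or_hat, lower, upper

# Odds 46:61 in one group and 74:59 in the other, as written in the text.
or_hat, lower, upper = odds_ratio_ci(46, 61, 74, 59)
print(round(or_hat, 2), round(lower, 2), round(upper, 2))
```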
