CHI-SQUARE TEST FOR INDEPENDENCE
SRT605/SRT666
Used to explore the relationship between two categorical variables
Each variable can have two or more categories (e.g. gender: M/F; smoker: Y/N)
The test compares the observed frequencies/proportions of cases in each category with the values that would be expected if there were no association between the two variables being measured
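The comparison of observed and expected frequencies can be sketched in a few lines of Python. This is a minimal sketch, not part of the original slides: the counts are illustrative, the function name `pearson_chi_square` is hypothetical, and the statistic summed is the standard Pearson form, the sum over all cells of (observed - expected)^2 / expected.

```python
def pearson_chi_square(observed):
    """Return (chi_square, expected) for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Expected count if there is no association:
    # row total x column total / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi_square = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    return chi_square, expected


# Illustrative 2 x 2 counts (rows: Male, Female; columns: smoker Yes, No).
# These numbers are made up for the sketch.
illustrative = [[30, 20], [20, 30]]
chi_square, expected = pearson_chi_square(illustrative)
print(expected)
print(chi_square)
```

The larger the gap between observed and expected counts, the larger the statistic becomes; identical observed and expected tables give a value of zero.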
Based on a crosstabulation (contingency) table:
                    Smoker
                    Yes     No      TOTAL
Gender   Male
         Female
TOTAL
Examples of research questions (RQ):
Is there an association between gender and
smoking behaviour?
Are males more likely to be smokers than females?
Is the proportion of males that smoke the same as
the proportion of females?
Each combination of a row category and a column category is one cell of the table
Assumptions:
1. Random sampling
2. Independence of observations
3. No cell should have an expected count (EC) of < 5, OR fewer than 20% of cells should have an EC of < 5
HOW TO CALCULATE EXPECTED COUNT?
                             Group A
                             Yes                  No                   Total
Group B   Yes   Observed     a                    b                    a+b
                Expected     (a+b) x (a+c) / T    (a+b) x (b+d) / T
          No    Observed     c                    d                    c+d
                Expected     (c+d) x (a+c) / T    (c+d) x (b+d) / T
Total                        a+c                  b+d                  T = a+b+c+d
Expected count (EC) = (Total of row x Total of column) / Grand total = (TR x TC) / T
% of cells with EC < 5 = (No. of cells with EC < 5 / Total no. of cells) x 100
Example: Expected count = (Total of row x Total of column) / Grand total = (22 x 31) / 69 = 9.9
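The arithmetic in this example can be checked with a one-line function. This is only a sketch; the function name `expected_count` is hypothetical, and the three totals are the ones quoted in the example.

```python
def expected_count(row_total, column_total, grand_total):
    """Expected count of a cell = row total x column total / grand total."""
    return row_total * column_total / grand_total


# Totals taken from the worked example: 22, 31 and 69
ec = expected_count(22, 31, 69)
print(round(ec, 1))  # 9.9
```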
In this 2 X 3 table, no cell has an EC < 5, so the expected-count assumption for the Chi-square test for independence is met
If you have a 2 X 2 table in which any cell has an EC < 10, use Fisher’s Exact test
HOW TO CALCULATE % OF CELLS WITH EC < 5?
                                 Group A
                                 Yes       No        Total
Group B   High      Observed     25        16        41
                    Expected     18.48     22.52
          Average   Observed     4         20        24
                    Expected     10.82     13.18
          Low       Observed     3         3         6
                    Expected     2.70      3.30
Total                            32        39        71
Percentage of cells with EC < 5 = (No. of cells with EC < 5 / Total no. of cells) x 100
= (2 / 6) x 100
= 33.33%
Because 33.33% is more than 20%, the expected-count assumption is not met for this table
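The expected counts and the percentage above can be reproduced from the observed counts alone. The sketch below uses the observed values of the High/Average/Low table; the variable names are hypothetical, and the 20% threshold is the one stated in Assumption 3.

```python
# Observed counts from the table (columns: Group A Yes, Group A No)
observed = [
    [25, 16],  # High
    [4, 20],   # Average
    [3, 3],    # Low
]

row_totals = [sum(row) for row in observed]        # 41, 24, 6
col_totals = [sum(col) for col in zip(*observed)]  # 32, 39
grand_total = sum(row_totals)                      # 71

# Expected count of each cell = row total x column total / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Count the cells whose expected count falls below 5
n_cells = len(row_totals) * len(col_totals)
n_small = sum(1 for row in expected for e in row if e < 5)
percent_small = n_small / n_cells * 100

# Assumption 3: fewer than 20% of cells may have an expected count below 5
assumption_met = percent_small < 20

for row in expected:
    print([round(e, 2) for e in row])
print(round(percent_small, 2), assumption_met)
```

For this table only the two cells in the Low row fall below 5, which gives 2 of 6 cells and a failed assumption check.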
RQ: IS THERE AN ASSOCIATION BETWEEN RISK
FACTORS AND GOAL ACHIEVEMENT?
Step 1: Generate H0 and HA (2-tailed)
H0: There is no association between risk factors and goal achievement (Observed = Expected)
HA: There is an association between risk factors and goal achievement (Observed ≠ Expected)
Step 2: Set the significance level (α)
α = 0.05
Step 3: Check the assumptions
Dataset: https://goo.gl/ng8Nc7
Assumption 1: Random sampling
Assumption 2: Independence of observations
Assumption 3: No cell should have an EC < 5, OR fewer than 20% of cells should have an EC < 5
In SPSS: Analyze → Descriptive Statistics → Crosstabs
In the Crosstabs dialogue box, move risk factor into Row(s) and achieve_goal.3 into Column(s)
Click Statistics and select Chi-square → Continue
Click Cells → tick Observed and Expected in the Counts box → Continue → OK
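What the Crosstabs step produces is a table of counts for every row/column combination. A minimal Python sketch of the same tally is shown below; the records and the level names ("High", "Low", "Yes", "No") are hypothetical and are not taken from the dataset.

```python
from collections import Counter

# Hypothetical case-level records: (risk factor level, goal achieved)
records = [
    ("High", "Yes"), ("High", "No"), ("Low", "Yes"),
    ("Low", "Yes"), ("High", "No"), ("Low", "No"),
]

# Count how many cases fall into each (row, column) combination
counts = Counter(records)
row_levels = sorted({row for row, _ in records})  # ['High', 'Low']
col_levels = sorted({col for _, col in records})  # ['No', 'Yes']

# Build the observed-count table, one list per row level
table = [[counts[(r, c)] for c in col_levels] for r in row_levels]

print("     ", col_levels)
for level, row in zip(row_levels, table):
    print(level, row)
```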
Step 4: Determine the test statistic and p values
Refer to the Pearson Chi-Square row and its Asymp. Sig. value
χ2 (2, n = 69) = 11.48
p-value = 0.003
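The reported p-value can be cross-checked by hand. The sketch below assumes the usual degrees of freedom for a contingency table, (rows - 1) x (columns - 1), and uses the fact that for 2 degrees of freedom the upper-tail probability of the chi-square distribution is exactly exp(-x / 2). The function names are hypothetical, and the closed form applies to df = 2 only.

```python
import math


def degrees_of_freedom(n_rows, n_cols):
    """Degrees of freedom of an r x c contingency table."""
    return (n_rows - 1) * (n_cols - 1)


def chi_square_p_value_df2(chi_square):
    """Upper-tail p-value of a chi-square statistic with exactly 2 df."""
    return math.exp(-chi_square / 2)


df = degrees_of_freedom(2, 3)            # the 2 X 3 table from the slides
p_value = chi_square_p_value_df2(11.48)  # statistic reported in Step 4
print(df, round(p_value, 3))
```

Rounded to three decimals, the result agrees with the Asymp. Sig. value of 0.003.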
Step 5: Statistical decision
Reject H0 as the p-value is <0.05
Step 6: Conclusion
A Chi-Square Test for Independence indicated a
significant association between risk factors and goal
achievement, χ2 (2, n = 69) = 11.48, p = .003
RQ: IS THERE AN ASSOCIATION BETWEEN GENDER
AND SMOKING BEHAVIOUR?
For a 2 X 2 table, refer to the Continuity Correction
Asymp. Sig. value
This is the Yates' Correction for Continuity which
compensates for the overestimate of the chi-square
value when used with a 2 X 2 table
The p-value = 0.562 i.e. >0.05
Fail to reject H0
A Chi-Square Test for Independence (with Yates'
Continuity Correction) indicated no significant
association between gender and smoking status,
χ2 (1, n = 436) = .34, p = .56
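A minimal sketch of the continuity correction for a 2 X 2 table follows. It assumes the standard form of Yates' correction, in which 0.5 is subtracted from each absolute difference |observed - expected| (never going below zero) before squaring, and it uses the exact upper-tail formula for 1 degree of freedom, erfc(sqrt(x / 2)). The function name and the example counts are hypothetical; the gender and smoking data themselves are not reproduced here.

```python
import math


def yates_chi_square(table):
    """Return (chi_square, p_value) for a 2 x 2 table with Yates' correction."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            # Shrink each difference by 0.5, but never below zero
            diff = max(0.0, abs(observed - expected) - 0.5)
            chi_square += diff ** 2 / expected
    # Upper-tail probability of chi-square with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Illustrative counts only (not the gender/smoking data)
chi_square, p_value = yates_chi_square([[30, 20], [20, 30]])
print(round(chi_square, 2), round(p_value, 3))

# Cross-check of the reported result: chi-square of .34 with 1 df
reported_p = math.erfc(math.sqrt(0.34 / 2))
print(round(reported_p, 2))
```

The corrected statistic for the illustrative table is smaller than the uncorrected value of 4.0, which shows how the correction pulls the chi-square value down.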
FISHER’S EXACT TEST
Fisher’s exact test is generated automatically as part of the chi-square output for a 2 X 2 table
Use it when the chi-square EC assumption is not met
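For readers who want to see what the test computes, here is a minimal sketch of a two-sided Fisher's exact test for a 2 X 2 table. It assumes one common two-sided definition: with all margins held fixed, add up the hypergeometric probabilities of every table that is no more likely than the observed one. The function name and the example counts are hypothetical, and statistical software may define the two-sided p-value slightly differently.

```python
from math import comb


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact p-value for a 2 x 2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)   # smallest possible top-left count
    highest = min(row1, col1)      # largest possible top-left count
    p_observed = prob(a)
    # Small relative tolerance so that ties are not lost to rounding
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Illustrative small-sample table with low expected counts
p_value = fisher_exact_two_sided([[3, 1], [1, 3]])
print(round(p_value, 4))
```

Every expected count in this illustrative table is 2, so the chi-square EC assumption fails and an exact test is the appropriate choice.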