
Stat 8

The document outlines the statistical methods for analyzing the association between categorical variables, focusing on concepts such as residuals, measures of association, relative risk, and odds ratios. It discusses the misuse of the chi-square test and provides examples of calculating standardized residuals and interpreting measures of association. Additionally, it emphasizes the importance of checking p-values when interpreting the strength of associations.

Uploaded by

beliz tuzel
Copyright
© © All Rights Reserved

WEEK 9.

ANALYZING THE ASSOCIATION BETWEEN CATEGORICAL VARIABLES (CONT.)

STATISTICAL METHODS IN SOCIOLOGY II


SOC 242
Spring 2024-2025

Tuesday, 14:40-16:30 G204


Thursday, 12:40-14:30 G204

FACULTY OF ARTS AND SCIENCES


DEPARTMENT OF SOCIOLOGY

SESSION PLAN

o Residuals

o Measure of Association

o Relative Risk (RR)

o Odds Ratio (OR)

o Small sample sizes: Fisher’s Exact Test

Statistical Methods in Sociology II, Week 9 2


MISUSE OF CHI-SQUARE TEST

The chi-squared test is misused when it is applied:

▪ When some of the expected frequencies are too small.
▪ When separate rows or columns are dependent samples, such as when each row of the table has the same subjects.
▪ To data that do not result from a random sample or randomized experiment.
▪ To data obtained by classifying quantitative variables into categories. This results in a loss of information. It is usually more appropriate to analyze the data with methods for quantitative variables.



RESIDUALS
• Residuals help us to reveal the pattern of association.
• A residual is the difference ($f_o - f_e$) between an observed and an expected cell frequency.

Each cell shows the observed count, the expected count in parentheses, and the residual.

                     Gender
Subjective health    Male               Female             Total
Very good            172 (155)  +17     148 (165)  -17      320
Good                 385 (380)   +5     396 (401)   -5      781
Fair                 179 (189)  -10     210 (200)  +10      389
Poor                  27  (39)  -12      53  (41)  +12       80
Total                763                807                1570

Data source: WVS Turkey, 2011
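
As a rough illustration of where the expected counts and residuals in this table come from, the sketch below recomputes them in Python from the observed counts. It assumes the usual rule for expected counts under independence, (row total × column total) / n. The variable names are illustrative. The slide rounds expected counts to whole numbers, so the unrounded values here can differ from the table by up to about one unit.

```python
# Observed counts from the slide.
# Rows: very good, good, fair, poor. Columns: male, female.
observed = [
    [172, 148],
    [385, 396],
    [179, 210],
    [27, 53],
]
labels = ["Very good", "Good", "Fair", "Poor"]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: (row total * column total) / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Residual: observed minus expected.
residuals = [
    [f_o - f_e for f_o, f_e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

for label, exp_row, res_row in zip(labels, expected, residuals):
    print(label, [round(e, 1) for e in exp_row], [round(r, 1) for r in res_row])
```

In a table with two columns, the two residuals in each row always cancel out, which is why the table shows pairs such as +17 and -17.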



CHI-SQUARED (χ²) RESIDUALS

O      E      O-E    (O-E)^2    (O-E)^2/E
172    155     17    289        1.865
148    165    -17    289        1.751
385    380      5     25        0.066
396    401     -5     25        0.062
179    189    -10    100        0.529
210    200     10    100        0.500
 27     39    -12    144        3.692
 53     41     12    144        3.512
Total                           11.98

$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$
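
The column of contributions can be reproduced with a few lines of Python. This is only a sketch: it takes the O and E columns exactly as printed on the slide (with expected counts rounded to whole numbers), and the names `cells`, `contributions` and `chi2` are illustrative.

```python
# (observed, expected) pairs as printed in the O and E columns.
cells = [
    (172, 155), (148, 165),
    (385, 380), (396, 401),
    (179, 189), (210, 200),
    (27, 39), (53, 41),
]

# Each cell contributes (O - E)^2 / E to the chi-squared statistic.
contributions = [(o - e) ** 2 / e for o, e in cells]
chi2 = sum(contributions)

for (o, e), term in zip(cells, contributions):
    print(o, e, o - e, (o - e) ** 2, round(term, 3))
print("chi-squared:", round(chi2, 2))
```

Using unrounded expected counts would give a slightly different statistic, so small discrepancies from software output are to be expected.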



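BUT HOW BIG IS THE RESIDUAL TO INDICATE A SIGNIFICANT DEVIATION?

Standardized residual: the number of standard errors that $f_o - f_e$ falls from the value of 0 that we expect when $H_0$ is true.

$z = \frac{f_o - f_e}{se} = \frac{f_o - f_e}{\sqrt{f_e (1 - \text{row proportion})(1 - \text{column proportion})}}$

Example: "very good" subjective health and male gender. Observed 172, expected 155, residual +17, row total 320, column total 763.

$z = \frac{172 - 155}{\sqrt{155 \left(1 - \frac{155}{320}\right)\left(1 - \frac{155}{763}\right)}} = 2.13$

Here 155/320 is the column proportion and 155/763 is the row proportion, because $f_e$ = (row total × column total) / n.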
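
A minimal Python sketch of this single-cell calculation follows. It assumes the row and column proportions are taken as row total / n and column total / n with n = 1570 (the grand total from the residuals table), and it uses the slide's rounded expected count of 155. The function name `standardized_residual` is an illustrative choice.

```python
import math


def standardized_residual(f_o, f_e, row_prop, col_prop):
    """Return (f_o - f_e) / se, with se = sqrt(f_e * (1 - row_prop) * (1 - col_prop))."""
    se = math.sqrt(f_e * (1 - row_prop) * (1 - col_prop))
    return (f_o - f_e) / se


# "Very good" health and male: observed 172, expected 155,
# row total 320, column total 763, grand total 1570.
z = standardized_residual(172, 155, 320 / 1570, 763 / 1570)
print(round(z, 2))
```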



ALE ASSIGNMENT

                     Gender
Subjective health    Male               Female             Total
Very good            172 (155)  +17     148 (165)  -17      320
Good                 385 (380)   +5     396 (401)   -5      781
Fair                 179 (189)  -10     210 (200)  +10      389
Poor                  27  (39)  -12      53  (41)  +12       80
Total                763                807                1570

Data source: WVS Turkey, 2011

Calculate standardized residuals for each cell and write whether there is “dependence” or “independence”:
• Reporting “very good” health status and being “male”
• Reporting “very good” health status and being “female”
• Reporting “good” health status and being “male”
• Reporting “good” health status and being “female”
• Reporting “fair” health status and being “male”
• Reporting “fair” health status and being “female”
• Reporting “poor” health status and being “male”
• Reporting “poor” health status and being “female”

30 May: send via e-mail



BUT HOW BIG IS THE RESIDUAL TO INDICATE A SIGNIFICANT DEVIATION?

▪ A standardized residual reports the number of standard errors that an observed count falls from its expected count. The se describes how much the (observed - expected) difference would tend to vary in repeated random sampling if the variables were independent.

▪ When H0: independence is true, the standardized residuals have approximately a standard normal distribution: They fluctuate around a mean of 0, with a standard deviation of 1.
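
To make the idea concrete, the sketch below computes a standardized residual for every cell of the health-by-gender table using unrounded expected counts. The `CUTOFF` value of 2 is an assumption made purely for illustration, since the slides do not state a threshold. It is motivated only by the fact that values beyond about 2 in absolute value are unusual for a standard normal variable.

```python
import math

# Observed counts. Rows: very good, good, fair, poor. Columns: male, female.
observed = [
    [172, 148],
    [385, 396],
    [179, 210],
    [27, 53],
]
labels = ["Very good", "Good", "Fair", "Poor"]
CUTOFF = 2.0  # illustrative assumption; not a value given in the slides

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

z = []
for i, row in enumerate(observed):
    z_row = []
    for j, f_o in enumerate(row):
        f_e = row_totals[i] * col_totals[j] / n
        se = math.sqrt(f_e * (1 - row_totals[i] / n) * (1 - col_totals[j] / n))
        z_row.append((f_o - f_e) / se)
    z.append(z_row)

for label, z_row in zip(labels, z):
    flags = ["beyond cutoff" if abs(v) > CUTOFF else "within cutoff" for v in z_row]
    print(label, [round(v, 2) for v in z_row], flags)
```

Because the expected counts are not rounded here, the values differ slightly from hand calculations based on the whole-number expected counts in the table.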



MEASURE OF ASSOCIATION
➢ Is there an association? Chi-square
➢ How does each cell differ from independence? Standardized residuals
➢ How strong is the association? Measure of association

A measure of association is a statistic or a parameter that summarizes the strength of the dependence between two variables.
For a measure such as the difference of proportions, the value falls between -1 and 1. The stronger the association, the larger the absolute value of the measure of association.
(i.e. -1 and 1 indicate extreme association, 0 indicates no association)



MEASURE OF ASSOCIATION:
DIFFERENCE OF PROPORTIONS

                     Case A              Case B
                     Gender              Gender
Subjective health    Male    Female      Male    Female
Good                 100     0           55      55
Poor                 0       100         45      45
Total                100     100         100     100

Case A: $\frac{100}{100} - \frac{0}{100} = 1 - 0 = 1$ (strong association)

Case B: $\frac{55}{100} - \frac{55}{100} = 0$ (no association)
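
The two cases can be checked with a short Python sketch. The function name `difference_of_proportions` and its argument names are illustrative. It compares the proportion reporting good health among males with the same proportion among females.

```python
def difference_of_proportions(success_1, total_1, success_2, total_2):
    """Proportion of successes in group 1 minus the proportion in group 2."""
    return success_1 / total_1 - success_2 / total_2


# Case A: all 100 males report good health, no females do.
case_a = difference_of_proportions(100, 100, 0, 100)

# Case B: 55 of 100 males and 55 of 100 females report good health.
case_b = difference_of_proportions(55, 100, 55, 100)

print(case_a, case_b)
```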



MEASURE OF ASSOCIATION:
RELATIVE RISK (RR)

• Compares the probability of an event occurring in one group with the probability of the same event in another group: “How much more likely?”

Example: developing lung cancer among people who were exposed to smoking in the last ten years.

              Disease    No Disease
Exposed       A          B
Unexposed     C          D



MEASURE OF ASSOCIATION: RELATIVE RISK (RR)

$\text{Ratio of risks} = \frac{\text{Disease risk (incidence) in exposed}}{\text{Disease risk (incidence) in unexposed}} = \frac{366 \div (366 + 32)}{64 \div (64 + 319)} = \frac{0.92}{0.17} = 5.41$

              Disease    No Disease
Exposed       366        32
Unexposed     64         319
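
As a sketch of the same calculation in Python, the function below uses the A, B, C, D cell layout from the generic exposure table. The function name is illustrative. Computed without intermediate rounding, the ratio is about 5.50. The slide's 5.41 comes from dividing the rounded risks 0.92 and 0.17.

```python
def relative_risk(a, b, c, d):
    """Relative risk for a 2x2 table laid out as
    exposed: (a = disease, b = no disease), unexposed: (c = disease, d = no disease)."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed


rr = relative_risk(366, 32, 64, 319)
rr_from_rounded_risks = round(0.92 / 0.17, 2)

print(round(rr, 2), rr_from_rounded_risks)
```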



INTERPRETING RELATIVE RISK (RR)

• If RR = 1, the incidence in the exposed is the same as the incidence in the non-exposed.
   • No increased risk, no association.

• If RR > 1, the incidence in the exposed is greater than the incidence in the non-exposed.
   • Increased risk, positive association.

• If RR < 1, the incidence in the exposed is lower than the incidence in the non-exposed.
   • Decreased risk, negative association.

• The further the RR is from 1, the stronger the association.
• NOTE: You need to check the p-value before you interpret its strength.



THE ODDS RATIO (OR)
➢ Success (outcome of interest) and failure
➢ Odds = Probability / (1 - Probability)
Example: with a 60% probability that he will pass the exam, the odds are 1.5 (i.e. 60:40).

➢ The estimated odds for a binary response (e.g. yes/no) equal the number of successes divided by the number of failures.
➢ The odds ratio is a measure of association for 2x2 contingency tables that equals the odds in row 1 divided by the odds in row 2.

$\text{Odds} = \frac{\text{Probability of success (Yes)}}{\text{Probability of failure (No)}}$
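
The conversion from a probability to odds, and the estimate of odds from counts, can be sketched in Python as follows. The function names are illustrative, and the 60/40 counts simply restate the exam example as 60 successes and 40 failures.

```python
def odds_from_probability(p):
    """Odds = probability of success / probability of failure."""
    return p / (1 - p)


def odds_from_counts(successes, failures):
    """Estimated odds = number of successes / number of failures."""
    return successes / failures


exam_odds = odds_from_probability(0.60)
exam_odds_from_counts = odds_from_counts(60, 40)

print(round(exam_odds, 2), exam_odds_from_counts)
```

Both routes give odds of 1.5, up to floating-point rounding in the first.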



RELATIVE RISK (RR) VS ODDS RATIO (OR)

$RR = \frac{\text{Probability that an exposed person gets the disease}}{\text{Probability that an unexposed person gets the disease}}$

$OR = \frac{\text{Odds that a case was exposed}}{\text{Odds that a control was exposed}}$



MEASURE OF ASSOCIATION:
ODDS RATIO (OR)

$OR = \frac{\text{Odds that a case was exposed}}{\text{Odds that a control was exposed}} = \frac{A \div C}{B \div D} = \frac{AD}{BC}$

              Disease    No Disease
              (Case)     (Control)
Exposed       A          B
Unexposed     C          D



MEASURE OF ASSOCIATION:
ODDS RATIO (OR)

$OR = \frac{\text{Odds that a case was exposed}}{\text{Odds that a control was exposed}} = \frac{A \div C}{B \div D} = \frac{87/201}{147/508} = \frac{0.43}{0.29} = 1.48$

Parental smoking     Disease     No Disease
in pregnancy         (Cancer)    (No cancer)
Yes: smoking         87          147
No: No smoking       201         508
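
A small Python sketch of this odds ratio follows, using the A, B, C, D layout of the case-control table. The function name is illustrative. Without intermediate rounding the result is about 1.50. The slide's 1.48 comes from dividing the rounded odds 0.43 and 0.29.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table laid out as
    exposed: (a = cases, b = controls), unexposed: (c = cases, d = controls)."""
    odds_case_exposed = a / c      # odds that a case was exposed
    odds_control_exposed = b / d   # odds that a control was exposed
    return odds_case_exposed / odds_control_exposed


# Parental smoking in pregnancy: 87 and 147 exposed, 201 and 508 unexposed.
or_smoking = odds_ratio(87, 147, 201, 508)
cross_product = (87 * 508) / (147 * 201)  # the AD / BC shortcut

print(round(or_smoking, 2), round(cross_product, 2))
```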



ODDS RATIO: INTERPRETATION
The odds ratio takes values between 0 and ∞. When OR = 1, there is no association.

Examples of odds:
• P(success) = 0.75, P(failure) = 0.25, odds = 0.75/0.25 = 3
‘Success is three times as likely as failure’
• P(success) = 0.50, P(failure) = 0.50, odds = 1
‘Success is as likely as failure’
• P(success) = 0.25, P(failure) = 0.75, odds = 1/3
‘Failure is three times as likely as success, or we expect three failures for each success’

For the example:
If OR = 1, exposure is not associated with the disease
If OR > 1, exposure is positively associated with the disease
If OR < 1, exposure is negatively associated with the disease

You need to check the P-value before you interpret!



ODDS RATIO: EXAMPLE
Cross-classification of Race of Victims and Race of Offenders

                     Race of victim
Race of offender     White    Black    Total
White                3150     230      3380
Black                516      2984     3500

For white offenders:

Proportion of white victims: 3150/3380 = 0.932
Proportion of black victims: 230/3380 = 0.068
Odds = 0.932/0.068 = 13.7
For white offenders, having a white victim is about 14 times as likely as having a black victim.



ODDS RATIO: EXAMPLE
Cross-classification of Race of Victims and Race of Offenders

                     Race of victim
Race of offender     White    Black    Total
White                3150     230      3380
Black                516      2984     3500

For black offenders, what are the odds of having a black victim?

Proportion of black victims: 2984/3500
Proportion of white victims: 516/3500
Odds = 0.852/0.147 = 5.8
For black offenders, having a black victim is about 6 times as likely as having a white victim.



ODDS RATIO: EXAMPLE
Cross-classification of Race of Victims and Race of Offenders

                     Race of victim
Race of offender     White    Black    Total
White                3150     230      3380
Black                516      2984     3500

For black offenders, what are the odds of having a white victim?

Proportion of white victims: 516/3500
Proportion of black victims: 2984/3500
Odds = 0.147/0.852 = 0.17 ≈ 1/6
For black offenders, for every 6 black victims, there is about one white victim.
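
The row-wise odds in this example, and the single odds ratio obtained by dividing the odds in row 1 by the odds in row 2, can be sketched in Python as follows. The variable names are illustrative, and the final odds ratio is a direct application of that definition to the same table rather than a figure quoted from the slides.

```python
# Victim counts by race of offender (rows) and race of victim (columns).
white_offender = {"white_victim": 3150, "black_victim": 230}
black_offender = {"white_victim": 516, "black_victim": 2984}

# Odds of a white victim within each row (successes / failures).
odds_white_offender = white_offender["white_victim"] / white_offender["black_victim"]
odds_black_offender = black_offender["white_victim"] / black_offender["black_victim"]

# Odds ratio: odds in row 1 divided by odds in row 2.
odds_ratio_victim = odds_white_offender / odds_black_offender

print(round(odds_white_offender, 1))   # odds of a white victim, white offenders
print(round(odds_black_offender, 2))   # odds of a white victim, black offenders
print(round(odds_ratio_victim, 1))     # odds ratio for the whole table
```

An odds ratio this far from 1 indicates that the race of the victim depends very strongly on the race of the offender in this table.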



SMALL SAMPLE SIZES:
FISHER’S EXACT TEST
▪ The chi-squared test of independence, like one- and two-sample z tests for proportions, is a large-sample test.

▪ When the expected frequencies are small, any of them being less than about 5, small-sample tests are more appropriate than the chi-squared test. For 2 x 2 contingency tables, Fisher’s exact test is a small-sample test of independence.



SMALL SAMPLE SIZES:
FISHER’S EXACT TEST
Fisher’s Exact Test of Independence for 2 x 2 Tables
1. Assumptions:
• Two binary categorical variables
• Randomization, such as random sampling or a randomized experiment
2. Hypotheses:
• H0: The two variables are independent (H0: p1 = p2)
• Ha: The two variables are associated
(Choose Ha: p1 ≠ p2 or Ha: p1 > p2 or Ha: p1 < p2).
3. Test statistic: First cell count (this determines the others, given the margin totals).
4. P-value: Probability that the first cell count equals the observed value or a value even more extreme than observed in the direction predicted by Ha.
5. Conclusion: Report P-value and interpret in context. If a decision is needed, reject H0 when P-value ≤ significance level (such as 0.05).
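
The steps above can be sketched in Python for the one-sided alternative Ha: p1 > p2. This is a minimal illustration, assuming that with fixed margin totals the first cell count follows a hypergeometric distribution under H0. The function name and the small 2 x 2 table are hypothetical and do not come from the slides.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided P-value (Ha: p1 > p2) for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of first-cell counts that are
    equal to or larger than the observed count a, given the margin totals.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    total_tables = comb(n, col1)
    largest_a = min(row1, col1)
    favourable = sum(
        comb(row1, k) * comb(row2, col1 - k) for k in range(a, largest_a + 1)
    )
    return favourable / total_tables


# Hypothetical small table: 3 of 4 successes in row 1, 1 of 4 in row 2.
p_value = fisher_exact_greater(3, 1, 1, 3)
print(round(p_value, 3))
print("reject H0" if p_value <= 0.05 else "do not reject H0")
```

For the two-sided alternative, software typically also adds the probabilities of equally unlikely tables in the other direction, which this sketch does not attempt.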



Questions? Ideas?

Thank you for your attention!
