WEEK 9.
ANALYZING THE ASSOCIATION BETWEEN CATEGORICAL VARIABLES (CONT.)
         STATISTICAL METHODS IN SOCIOLOGY II
                        SOC 242
                    Spring 2024-2025
                     Tuesday, 14:40-16:30 G204
                     Thursday, 12:40-14:30 G204
     FACULTY OF ARTS AND SCIENCES
        DEPARTMENT OF SOCIOLOGY
                       SESSION PLAN
o Residuals
o Measure of Association
o Relative Risk (RR)
o Odds Ratio (OR)
o Small sample sizes: Fisher’s Exact Test
                          Statistical Methods in Sociology II, Week 9   2
         MISUSE OF CHI-SQUARE TEST
▪ When some of the expected frequencies are too small.
▪ When separate rows or columns are dependent samples,
  such as when each row of the table has the same subjects.
▪ When the data do not result from a random sample or
  randomized experiment.
▪ When quantitative variables are classified into
  categories. This results in a loss of information; it is
  usually more appropriate to analyze the data with
  methods for quantitative variables.
                                  RESIDUALS
• Residuals help reveal the pattern of association.
• A residual is the difference (fo − fe) between an
  observed and an expected cell frequency.
  Each cell below shows: observed (expected) residual.

                                 Gender
  Subjective health      Male              Female            Total
  Very good              172 (155) +17     148 (165) −17       320
  Good                   385 (380) +5      396 (401) −5        781
  Fair                   179 (189) −10     210 (200) +10       389
  Poor                    27 (39) −12       53 (41) +12         80
  Total                  763               807                1570
  Data source: WVS Turkey, 2011
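As a check on the expected counts shown in parentheses, here is a minimal Python sketch that applies the independence formula fe = (row total × column total) / n. The dictionary layout and the function name `expected_counts` are illustrative choices, not part of the course material. Unrounded, the "very good" row comes out at about 155.5 (male) and 164.5 (female); the table reports whole numbers.

```python
# Observed counts from the WVS Turkey 2011 table
# (rows: subjective health; columns: male, female).
observed = {
    "Very good": (172, 148),
    "Good": (385, 396),
    "Fair": (179, 210),
    "Poor": (27, 53),
}


def expected_counts(table):
    """Expected count under independence: fe = row total * column total / n."""
    n_cols = len(next(iter(table.values())))
    col_totals = [sum(row[j] for row in table.values()) for j in range(n_cols)]
    n = sum(col_totals)
    return {
        label: tuple(sum(row) * col_total / n for col_total in col_totals)
        for label, row in table.items()
    }


expected = expected_counts(observed)
for label, row in observed.items():
    fe = expected[label]
    residuals = [fo - e for fo, e in zip(row, fe)]  # fo - fe for each cell
    print(label, [round(e, 1) for e in fe], [round(r, 1) for r in residuals])
```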
      CHI-SQUARED (χ²)
                    RESIDUALS
O      E        O-E              (O-E)2                   (O-E)2/E
172   155         17                289                    1.865
148   165        -17                289                     1.751
385   380          5                 25                    0.066
396   401         -5                 25                    0.062
179   189        -10                100                    0.529
210   200         10                100                      0.5
 27    39        -12                144                    3.692
 53    41         12                144                    3.512
                                                    Total: 11.977
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$
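The chi-squared statistic is the sum of the last column of the table. The sketch below (function names are hypothetical) computes it twice: with the rounded expected counts printed in the table, which gives about 11.98, and with unrounded expected counts derived from the margins, which gives about 11.65. The gap between the two is purely a rounding effect.

```python
# Observed counts (rows: very good, good, fair, poor; columns: male, female).
observed = [[172, 148], [385, 396], [179, 210], [27, 53]]
# Expected counts as rounded in the table above.
slide_expected = [[155, 165], [380, 401], [189, 200], [39, 41]]


def chi_squared(obs, exp):
    """Sum of (fo - fe)^2 / fe over all cells."""
    return sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(obs, exp)
        for fo, fe in zip(obs_row, exp_row)
    )


def expected_from_margins(obs):
    """Unrounded expected counts: row total * column total / n."""
    row_totals = [sum(r) for r in obs]
    col_totals = [sum(c) for c in zip(*obs)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


print(round(chi_squared(observed, slide_expected), 2))
print(round(chi_squared(observed, expected_from_margins(observed)), 2))
```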
              BUT HOW BIG IS THE RESIDUAL TO
         INDICATE A SIGNIFICANT DEVIATION?
   Standardized residual: the number of standard errors that fo − fe
   falls from the value of 0 that we expect when H0 is true.

$$z = \frac{f_o - f_e}{se} = \frac{f_o - f_e}{\sqrt{f_e\,(1 - \text{row proportion})\,(1 - \text{column proportion})}}$$

                        Gender
  Subjective health     Male              Total
  Very good             172 (155) +17       320
  Total                 763                1570

$$z = \frac{172 - 155}{\sqrt{155\left(1 - \frac{320}{1570}\right)\left(1 - \frac{763}{1570}\right)}} \approx \frac{17}{7.96} \approx 2.13$$
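A short sketch of the standardized-residual formula in Python, assuming the inputs are read straight off the table (fe = 155 as printed, n = 1570 as the grand total). The function name `standardized_residual` is illustrative. It reproduces z ≈ 2.13 for the "very good" and male cell.

```python
import math


def standardized_residual(fo, fe, row_total, col_total, n):
    """z = (fo - fe) / sqrt(fe * (1 - row proportion) * (1 - column proportion))."""
    row_prop = row_total / n
    col_prop = col_total / n
    se = math.sqrt(fe * (1 - row_prop) * (1 - col_prop))
    return (fo - fe) / se


# Worked example: "very good" health x male, with the rounded fe = 155.
z = standardized_residual(fo=172, fe=155, row_total=320, col_total=763, n=1570)
print(round(z, 2))
```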
                               ALE ASSIGNMENT
                                 Gender
  Subjective health      Male              Female            Total
  Very good              172 (155) +17     148 (165) −17       320
  Good                   385 (380) +5      396 (401) −5        781
  Fair                   179 (189) −10     210 (200) +10       389
  Poor                    27 (39) −12       53 (41) +12         80
  Total                  763               807                1570
  Data source: WVS Turkey, 2011
Calculate the standardized residual for each cell and state whether it indicates “dependence” or “independence”:
• Reporting “very good” health status and being “male”
• Reporting “very good” health status and being “female”
• Reporting “good” health status and being “male”
• Reporting “good” health status and being “female”
• Reporting “fair” health status and being “male”
• Reporting “fair” health status and being “female”
• Reporting “poor” health status and being “male”
• Reporting “poor” health status and being “female”
Due: 30 May (send via e-mail)
                  BUT HOW BIG IS THE RESIDUAL TO
                INDICATE A SIGNIFICANT DEVIATION?
▪ A standardized residual reports the number of
  standard errors that an observed count falls from
  its expected count. The se describes how much the
  (observed - expected) difference would tend to
  vary in repeated random sampling if the variables
  were independent.
▪ When H0: independence is true, the standardized
  residuals have approximately a standard normal
  distribution: They fluctuate around a mean of 0,
  with a standard deviation of 1.
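Because standardized residuals are approximately standard normal when H0 is true, a residual can be translated into a two-sided tail probability. This sketch uses Python's `math.erfc`; the helper name `two_sided_p` is hypothetical. For the worked example (z ≈ 2.13) it gives roughly 0.033, and z = 1.96 gives about 0.05.

```python
import math


def two_sided_p(z):
    """P(|Z| >= |z|) for Z ~ N(0, 1), using the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2))


# Standardized residual from the "very good" x male worked example.
print(round(two_sided_p(2.13), 3))
```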
               MEASURE OF ASSOCIATION
➢ Is there an association? Chi-square test
➢ How does each cell differ from independence? Standardized residuals
➢ How strong is the association? Measure of association
A measure of association is a statistic or a parameter that summarizes
     the strength of the dependence between two variables.
 Many measures of association (such as the difference of proportions) fall
 between −1 and 1: the stronger the association, the larger the absolute
 value of the measure (−1 and 1: extreme association; 0: no association).
             MEASURE OF ASSOCIATION:
            DIFFERENCE OF PROPORTIONS
                          Case A                  Case B
                          Gender                  Gender
  Subjective health    Male    Female         Male    Female
  Good                  100       0             55       55
  Poor                    0     100             45       45
  Total                 100     100            100      100

$$\text{Case A} = \frac{100}{100} - \frac{0}{100} = 1 - 0 = 1 \quad \text{(strong association)}$$
$$\text{Case B} = \frac{55}{100} - \frac{55}{100} = 0 \quad \text{(no association)}$$
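The same arithmetic as a minimal Python sketch: the proportion reporting "good" health among males minus the proportion among females. The function name is illustrative, and the sketch assumes the comparison is made within each gender column, as in Cases A and B.

```python
def difference_of_proportions(good_male, total_male, good_female, total_female):
    """Proportion reporting "good" among males minus that among females."""
    return good_male / total_male - good_female / total_female


case_a = difference_of_proportions(100, 100, 0, 100)
case_b = difference_of_proportions(55, 100, 55, 100)
print(case_a, case_b)  # 1.0 0.0
```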
                 MEASURE OF ASSOCIATION:
                   RELATIVE RISK (RR)
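• Compares the probability of an event (such as a disease) in an exposed
  group with its probability in an unexposed group: “How much more likely?”
EX: the risk of developing lung cancer among people exposed to smoking in the last ten years, compared with people not exposed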
                              Disease                  No Disease
             Exposed                A                            B
             Unexposed              C                            D
  MEASURE OF ASSOCIATION: RELATIVE RISK (RR)

$$RR = \frac{\text{Disease risk (incidence) in exposed}}{\text{Disease risk (incidence) in unexposed}} = \frac{A \div (A + B)}{C \div (C + D)} = \frac{366 \div (366 + 32)}{64 \div (64 + 319)} \approx \frac{0.9196}{0.1671} \approx 5.50$$

                                Disease                No Disease
            Exposed                366                        32
            Unexposed               64                       319
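A minimal Python sketch of the relative risk, assuming the A, B, C, D layout of the exposure table (exposed row: A = disease, B = no disease; unexposed row: C = disease, D = no disease). The function name is illustrative. Keeping the risks unrounded gives RR ≈ 5.50; rounding them to 0.92 and 0.17 before dividing would give about 5.41 instead.

```python
def relative_risk(a, b, c, d):
    """RR = [A / (A + B)] / [C / (C + D)] for the 2x2 layout
    exposed: A = disease, B = no disease; unexposed: C = disease, D = no disease.
    """
    risk_exposed = a / (a + b)      # incidence among the exposed
    risk_unexposed = c / (c + d)    # incidence among the unexposed
    return risk_exposed / risk_unexposed


rr = relative_risk(366, 32, 64, 319)
print(round(rr, 2))
```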
                 INTERPRETING RELATIVE RISK (RR)
• If RR = 1, the incidence in the exposed is the same as the incidence in the
  unexposed: no increased risk, no association.
• If RR > 1, the incidence in the exposed is greater than the incidence in the
  unexposed: increased risk, positive association.
• If RR < 1, the incidence in the exposed is lower than the incidence in the
  unexposed: decreased risk, negative association.
• The further the RR is from 1, the stronger the association.
• NOTE: Check the p-value before you interpret the strength of the association.
             THE ODDS RATIO (OR)
➢ Success (outcome of interest) and failure
➢ Odds = Probability / (1 − Probability)
Example: if there is a 60% probability that he will pass the exam, the odds
are 0.60/0.40 = 1.5 (i.e., 60:40).
➢ The estimated odds for a binary response (e.g., yes/no) equal the
  number of successes divided by the number of failures.
➢ The odds ratio is a measure of association for 2 x 2 contingency
  tables that equals the odds in row 1 divided by the odds in row 2.

$$\text{Odds} = \frac{\text{Probability of success (Yes)}}{\text{Probability of failure (No)}}, \qquad \text{Odds ratio} = \frac{\text{Odds in row 1}}{\text{Odds in row 2}}$$
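Both ways of getting the odds on the slide (from a probability, and from counts of successes and failures) fit in a few lines of Python. The function names are hypothetical; the 60% exam example is the one used above.

```python
def odds(p):
    """Odds from a probability: p / (1 - p)."""
    return p / (1 - p)


def odds_from_counts(successes, failures):
    """Estimated odds for a binary response: successes / failures."""
    return successes / failures


print(round(odds(0.60), 2))          # 60% chance of passing -> 1.5
print(odds_from_counts(60, 40))      # 60:40 -> 1.5
```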
RELATIVE RISK (RR) VS ODDS RATIO (OR)

$$RR = \frac{\text{Probability that an exposed person gets the disease}}{\text{Probability that an unexposed person gets the disease}}$$

$$OR = \frac{\text{Odds that a case was exposed}}{\text{Odds that a control was exposed}}$$
MEASURE OF ASSOCIATION:
  ODDS RATIO (OR)

$$OR = \frac{\text{Odds that a case was exposed } (A \div C)}{\text{Odds that a control was exposed } (B \div D)} = \frac{AD}{BC}$$

                             Disease                  No Disease
                             (Case)                    (Control)
   Exposed                        A                        B
   Unexposed                      C                        D
 MEASURE OF ASSOCIATION:
   ODDS RATIO (OR)

$$OR = \frac{\text{Odds that a case was exposed } (A \div C)}{\text{Odds that a control was exposed } (B \div D)} = \frac{87/201}{147/508} \approx \frac{0.433}{0.289} \approx 1.50$$

  Parental smoking          Disease                No Disease
  in pregnancy              (Cancer)               (No cancer)
  Yes: smoking                  87                        147
  No: no smoking               201                        508
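The shortcut OR = AD / BC can be checked against the parental-smoking table with a small Python sketch (the function name is illustrative; the cell layout follows the A, B, C, D table on the previous slide). Without intermediate rounding the result is about 1.50.

```python
def odds_ratio(a, b, c, d):
    """OR = (A / C) / (B / D) = AD / BC for the table
    exposed:   A = case (disease), B = control (no disease)
    unexposed: C = case,           D = control
    """
    return (a * d) / (b * c)


# Parental smoking in pregnancy example.
or_smoking = odds_ratio(87, 147, 201, 508)
print(round(or_smoking, 2))
```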
             ODDS RATIO: INTERPRETATION
Odds ratio takes values between 0 and ∞. When OR = 1, there is no
association.
Examples of odds:
• P(success) = 0.75, P(failure) = 0.25, odds = 0.75/0.25 = 3
                      ‘Success is three times as likely as failure’
• P(success) = 0.50, P(failure) = 0.50, odds = 1
                            ‘Success is as likely as failure’
• P(success) = 0.25, P(failure) = 0.75, odds = 1/3
  ‘Failure is three times as likely as success, or we expect three failures for each
                                        success’
For the example:
• If OR = 1, exposure is not associated with the disease.
• If OR > 1, exposure is positively associated with the disease.
• If OR < 1, exposure is negatively associated with the disease.
You need to check the p-value before you interpret!
     ODDS RATIO: EXAMPLE
   Cross-classification of Race of Victims and Race of Offenders
                                    Race of victim
                              White        Black        Total
  Race of       White          3150          230         3380
  offender      Black           516         2984         3500
For white offenders:
Proportion of white victims: 3150/3380 = 0.932
Proportion of black victims: 230/3380 = 0.068
Odds = 0.932/0.068 ≈ 13.7
For white offenders, having a white rather than a black victim is about
14 times as likely.
             ODDS RATIO: EXAMPLE
For black offenders, what are the odds of having a black victim?
Proportion of black victims: 2984/3500 = 0.852
Proportion of white victims: 516/3500 = 0.147
Odds = 0.852/0.147 ≈ 5.8
For black offenders, having a black rather than a white victim is about 6
times as likely.
         ODDS RATIO: EXAMPLE
For black offenders, what are the odds of having a white victim?
Proportion of white victims: 516/3500 = 0.147
Proportion of black victims: 2984/3500 = 0.852
Odds = 0.147/0.852 ≈ 0.17 ≈ 1/5.8
For black offenders, there is about one white victim for every six black
victims.
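The ratios computed on these slides (about 13.7, 5.8 and 0.17) compare two counts within one row of the table, so each is an odds. Dividing one row's odds by the other's gives the odds ratio in the sense defined earlier (odds in row 1 divided by odds in row 2). A minimal Python sketch, with illustrative names, puts both steps side by side; the resulting odds ratio is about 79.2.

```python
# Race of offender (rows) by race of victim (columns).
table = {
    "White offender": {"White victim": 3150, "Black victim": 230},
    "Black offender": {"White victim": 516, "Black victim": 2984},
}


def odds_white_victim(row):
    """Odds of a white (vs black) victim within one row of the table."""
    return row["White victim"] / row["Black victim"]


odds_w = odds_white_victim(table["White offender"])  # about 13.7
odds_b = odds_white_victim(table["Black offender"])  # about 0.17
print(round(odds_w, 1), round(odds_b, 2), round(odds_w / odds_b, 1))
```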
                SMALL SAMPLE SIZES:
                FISHER’S EXACT TEST
▪ The chi-squared test of independence, like one- and
  two-sample z tests for proportions, is a large-sample
  test.
▪ When the expected frequencies are small (any of them
  less than about 5), small-sample tests are more
  appropriate than the chi-squared test. For 2 x 2
  contingency tables, Fisher’s exact test is a small-sample
  test of independence.
                            SMALL SAMPLE SIZES:
                            FISHER’S EXACT TEST
Fisher’s Exact Test of Independence for 2 x 2 Tables
1. Assumptions:
• Two binary categorical variables
• Randomization, such as random sampling or a randomized experiment
2. Hypotheses:
• H0: The two variables are independent (H0: p1 = p2 )
• Ha: The two variables are associated
(Choose Ha: p1 ≠ p2 or Ha: p1 > p2 or Ha: p1 < p2 ).
3. Test statistic: First cell count (this determines the others, given the margin
totals).
4. P-value: Probability that the first cell count equals the observed value or a value
even more extreme than observed in the direction predicted by Ha.
5. Conclusion: Report the P-value and interpret it in context. If a decision is needed,
reject H0 when P-value ≤ significance level (such as 0.05).
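The steps above can be sketched in Python. Given fixed margins, the first cell count follows a hypergeometric distribution under H0, so the one-sided P-value is a sum of hypergeometric probabilities at the observed count and beyond in the direction of Ha. The 2 x 2 table used here is a hypothetical illustration, not course data, and `fisher_exact_one_sided` is an illustrative name; in practice `scipy.stats.fisher_exact` performs the same test, including a two-sided version. For the table [[3, 1], [1, 3]] with Ha: p1 > p2, the P-value is 17/70 ≈ 0.243.

```python
from math import comb


def fisher_exact_one_sided(a, b, c, d, alternative="greater"):
    """One-sided Fisher exact P-value for the 2x2 table [[a, b], [c, d]].

    With the margins fixed, the first cell count follows a hypergeometric
    distribution under H0; sum its probabilities at the observed value
    and beyond in the direction of Ha ("greater" means Ha: p1 > p2).
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)

    def prob(x):
        # P(first cell = x) given the margins.
        return comb(row1, x) * comb(n - row1, col1 - x) / comb(n, col1)

    if alternative == "greater":
        values = range(a, hi + 1)
    elif alternative == "less":
        values = range(lo, a + 1)
    else:
        raise ValueError("alternative must be 'greater' or 'less'")
    return sum(prob(x) for x in values)


# Hypothetical small table (not from the slides): [[3, 1], [1, 3]].
p = fisher_exact_one_sided(3, 1, 1, 3, alternative="greater")
print(round(p, 4))
```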
 Questions? Ideas?
Thank you for your attention!