Categorical Data Analysis Guide

The document discusses contingency analysis, which allows determination of whether two categorical variables are independent. It provides examples of questions that can be addressed, such as whether fur color is related to gender. Contingency tables and mosaic plots can be used to examine relationships between categorical variables. Odds ratios are calculated to assess the odds of success relative to failure between groups. A chi-square contingency test determines if variables are independent by comparing observed and expected frequencies in a contingency table.

Uploaded by

Jai Kanth


Contingency Analysis: association between categorical variables

Contingency Analysis
Mosaic Plots
Odds
Odds Ratio
SE & CI for Odds Ratio
χ² Contingency Test
R example
Assumptions of χ²
Correction for Continuity
Fisher's Exact Test
G-tests

Contingency Analysis

There are many examples in biology where we wish to relate two variables that are categorical.

For example:

1. Do bright and drab butterflies differ in their probability of being eaten?
2. Is fur color (tan, brown, black) related to gender?
3. Is tree death related to slope aspect?

These questions are best approached using contingency analysis, which allows us to determine whether two or more categorical variables are independent.

Mosaic Plots

The Titanic disaster provides a simple example of the use of mosaic plots for examining the structure of frequency data.

Plots are composed of a series of graphical blocks or boxes. The area of each box is proportional to the number of elements in that group. Groups can be compared side by side (row-wise or column-wise).

The plot clearly shows that women experienced a greater survival rate than men.
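A plot of this kind can be sketched with R's built-in `Titanic` table. This is an assumption made for illustration: the figure on the original slide may have used a different data source or layout. The code collapses the four-way table to sex by survival, draws the mosaic plot, and computes the row-wise survival proportions that the box areas represent.

```r
# Built-in 4-way contingency table: Class x Sex x Age x Survived
data(Titanic)

# Collapse to a 2-way table of Sex (rows) by Survived (columns)
sex_surv <- margin.table(Titanic, c(2, 4))

# Box areas are proportional to the counts in each group
mosaicplot(sex_surv, main = "Titanic: survival by sex", color = TRUE)

# Row-wise proportions make the comparison between sexes explicit
surv_rate <- prop.table(sex_surv, margin = 1)[, "Yes"]
surv_rate
```

Transposing the table with `t()` before plotting swaps which variable is split first, which is how the row-wise and column-wise comparisons differ.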


Odds
Let's consider a variable (e.g., our previous coin toss example) for which a random trial yields one of two outcomes: success or failure (heads or tails).

The probability of success is p and the probability of failure is 1 - p. The odds of success (O) are the probability of success divided by the probability of failure:

$$O = \frac{p}{1 - p}$$

The estimate of the odds is calculated from a random sample of trials using the observed proportion of successes (p-hat):

$$\hat{O} = \frac{\hat{p}}{1 - \hat{p}}$$
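The definition of odds translates directly into a one-line R function. The helper name `odds` is hypothetical and chosen only for this sketch.

```r
# Odds of success for a probability p (hypothetical helper, not a base R function)
odds <- function(p) {
  stopifnot(all(p >= 0 & p < 1))  # odds are undefined at p = 1
  p / (1 - p)
}

odds(0.5)   # fair coin: success and failure equally likely, odds of 1 (1:1)
odds(0.75)  # success three times as likely as failure, odds of 3 (3:1)
```

Passing an observed proportion instead of a known probability gives the estimated odds.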

Odds
- Example -

It is well established that there is a link between the use of aspirin and decreased risk of heart attack. A suggestion was made that there may also be a link with reduction in cancer risk. A total of 39,876 women were split into two groups: roughly half took aspirin (19,934) and the rest a placebo (19,942). After 10 years the prevalence of cancer was assessed in the two groups:

             Aspirin   Placebo
Cancer          1438      1427
No cancer      18496     18515
Total          19934     19942
The estimated proportion that did not get cancer while taking aspirin (and its complement, those that did get cancer) is:

$$\hat{p}_1 = \frac{18496}{19934} = 0.9279$$

$$1 - \hat{p}_1 = 1 - 0.9279 = 0.0721$$

The odds of not getting cancer while taking aspirin are:

$$\hat{O}_1 = \frac{\hat{p}_1}{1 - \hat{p}_1} = \frac{0.9279}{0.0721} = 12.87$$

So, the odds are ca. 12.87:1 of not getting cancer if taking aspirin.

But what are the odds of not getting cancer if taking the placebo?

$$\hat{p}_2 = \frac{18515}{19942} = 0.9284$$

$$1 - \hat{p}_2 = 1 - 0.9284 = 0.0716$$

$$\hat{O}_2 = \frac{\hat{p}_2}{1 - \hat{p}_2} = \frac{0.9284}{0.0716} = 12.97$$

The difference between 12.87 and 12.97 is negligible, so aspirin is not likely to influence cancer rate.
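The same arithmetic can be sketched in R from the raw counts that appear in the fractions above. The variable names are illustrative. Working from unrounded proportions gives 12.86 for the aspirin group rather than the 12.87 obtained from the rounded values.

```r
# Counts taken from the fractions in the text
no_cancer <- c(aspirin = 18496, placebo = 18515)
total     <- c(aspirin = 19934, placebo = 19942)

p_hat    <- no_cancer / total       # proportion that stayed cancer-free
odds_hat <- p_hat / (1 - p_hat)     # estimated odds of staying cancer-free

round(p_hat, 4)
round(odds_hat, 2)
```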
Odds Ratio

But, as statisticians, we are seldom convinced by just a small difference (with such a large sample size, even a small difference could still be significant). We can use the odds ratio (OR) to assess the odds of success in one group relative to the odds of success in the other group:

$$\widehat{OR} = \frac{\hat{O}_1}{\hat{O}_2}$$

If the odds ratio is equal to one, then the odds of success in the response variable are the same for both groups.

Odds Ratio
- Example -

$$\widehat{OR} = \frac{\hat{O}_1}{\hat{O}_2} = \frac{12.87}{12.97} = 0.992$$

The OR suggests that the odds of remaining cancer-free while taking aspirin were about the same as while taking the placebo. Because the value is slightly less than one, the odds of remaining cancer-free were, if anything, marginally lower in the aspirin group, so the point estimate shows no benefit of taking aspirin.

We're still left with the question of whether aspirin has a significant effect on cancer risk (even if small). We can evaluate this using the SE and CI around the OR.

Odds Ratio
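Because the sampling distribution of the OR is highly skewed, we must convert the OR to its natural log form and then calculate the SE, from which we can derive the CI. Here a, b, c, and d are the four cell counts of the 2 × 2 table:

$$SE[\ln(\widehat{OR})] = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}$$

$$SE[\ln(\widehat{OR})] = \sqrt{\frac{1}{1438} + \frac{1}{1427} + \frac{1}{18496} + \frac{1}{18515}}$$

$$SE[\ln(\widehat{OR})] = 0.03878$$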
Now that we have the SE, we can calculate the 95% CI, using ln(0.992) = -0.00803 as the point estimate:

$$-0.00803 - 1.96(0.03878) < \ln(OR) < -0.00803 + 1.96(0.03878)$$

$$-0.084 < \ln(OR) < 0.068$$

$$e^{-0.084} < OR < e^{0.068}$$

$$0.92 < OR < 1.07$$

The CI is tightly bounded around 1.0, so the data provide good evidence that aspirin has little or no effect on the probability of developing cancer.
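The odds ratio, its standard error, and the confidence interval can be reproduced in a few lines of R. This is a minimal sketch with illustrative variable names, using the four counts that appear in the SE calculation. Starting from unrounded counts, ln(OR) comes out near -0.0087 rather than -0.00803, but the interval agrees with the hand calculation to two decimals.

```r
# Four cell counts of the 2 x 2 table
cancer    <- c(aspirin = 1438,  placebo = 1427)
no_cancer <- c(aspirin = 18496, placebo = 18515)

# Odds of staying cancer-free in each group, and their ratio
odds_hat <- no_cancer / cancer
or_hat   <- odds_hat[["aspirin"]] / odds_hat[["placebo"]]

# SE of ln(OR): square root of the summed reciprocals of the four counts
se_ln_or <- sqrt(sum(1 / c(cancer, no_cancer)))

# 95% CI on the log scale, then back-transformed to the OR scale
ci_ln <- log(or_hat) + c(-1, 1) * qnorm(0.975) * se_ln_or
ci_or <- exp(ci_ln)

round(c(OR = or_hat, SE = se_ln_or), 4)
round(ci_or, 2)
```

`qnorm(0.975)` supplies the 1.96 multiplier used in the hand calculation.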

χ² Contingency Test

The most commonly used frequency data analysis method is the chi-square contingency test for association.

You may also see this test referred to in the literature as an R × C (row-by-column) association test. R can have two or more categories and C can have two or more categories.

This test is widely adaptable to a variety of tests dealing with the comparison of categorical data (and can be expanded to 3+ dimensions = log-linear analysis).

χ² Contingency Test
- Example -

Example 9.3 provides a biological example involving the infection of fish with a parasite and their risk of predation by birds as a function of their position in the water column.

The two variables of interest are infection status (uninfected, lightly infected, and highly infected) and predation (eaten, not eaten).

The corresponding hypotheses:

H0: Parasite infection and being eaten are independent.

HA: Parasite infection and being eaten are not independent.

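Under H0, the probability of a fish falling in a given cell is the product of the two marginal probabilities, and the expected frequency is that probability multiplied by the total number of fish (141):

$$\widehat{\Pr}[\text{uninfected}] = 50/141 = 0.3546$$

$$\widehat{\Pr}[\text{eaten}] = 48/141 = 0.3404$$

$$\widehat{\Pr}[\text{uninfected and eaten}] = 0.3546 \times 0.3404 = 0.1207$$

$$\text{Expected}[\text{uninfected and eaten}] = 0.1207 \times 141 = 17.0$$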
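The expected frequencies for every cell can be obtained at once, since each one equals (row total × column total) / n. The sketch below uses the observed counts from this example. The names `obs` and `expd` are illustrative.

```r
# Observed counts: rows = predation, columns = infection level
obs <- matrix(c(1, 49, 10, 35, 37, 9), nrow = 2,
              dimnames = list(Predation = c("Eaten", "Not Eaten"),
                              Infection = c("Uninfected", "Light", "Heavy")))

n <- sum(obs)  # total number of fish

# Expected count = Pr[row] * Pr[column] * n = (row total * column total) / n
expd <- outer(rowSums(obs), colSums(obs)) / n

round(expd, 1)
```

The `outer()` call builds the full table of row-total by column-total products, so no loop over cells is needed.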

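χ² Statistic

Now that we have observed frequencies and expected frequencies, we can generate a chi-square test statistic using our general formula:

$$\chi^2 = \sum_{\text{column}=1}^{c} \sum_{\text{row}=1}^{r} \frac{(\text{Observed}_{\text{column},\text{row}} - \text{Expected}_{\text{column},\text{row}})^2}{\text{Expected}_{\text{column},\text{row}}}$$

$$\chi^2 = \frac{(1 - 17.0)^2}{17.0} + \frac{(49 - 33.0)^2}{33.0} + \cdots + \frac{(9 - 30.3)^2}{30.3} = 69.5$$

$$\chi^2_{2,\,0.05} = 5.99, \text{ therefore reject } H_0$$

NB: df = (r - 1)(c - 1) = (2 - 1)(3 - 1) = 2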
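The hand calculation can be checked with a short sketch that applies the general formula directly. Variable names are illustrative. With unrounded expected frequencies the statistic comes out near 69.76 rather than the 69.5 obtained from rounded values.

```r
# Observed counts (rows = eaten / not eaten, columns = infection level)
obs  <- matrix(c(1, 49, 10, 35, 37, 9), nrow = 2)
expd <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Sum of (Observed - Expected)^2 / Expected over all cells
chi_sq <- sum((obs - expd)^2 / expd)

# Degrees of freedom, critical value at alpha = 0.05, and P-value
dof   <- (nrow(obs) - 1) * (ncol(obs) - 1)
crit  <- qchisq(0.95, dof)
p_val <- pchisq(chi_sq, dof, lower.tail = FALSE)

c(chi_sq = chi_sq, df = dof, critical = crit, p = p_val)
```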

Example
How would we solve this problem in R? Basically, a row x column table is a matrix; so in keeping with the approach of using vectors for data, we create an array using the matrix function and specify the number of rows (note how R cycles through the data vector to create the matrix, filling it column by column by default):

> # counts are entered column by column: (eaten, not eaten) for each infection level
> fish<-matrix(c(1,49,10,35,37,9),nrow=2)
> fish
     [,1] [,2] [,3]
[1,]    1   10   37
[2,]   49   35    9
While we have a perfectly workable matrix, let's prettify it and add the appropriate variable names and levels:

> fish<-matrix(c(1,49,10,35,37,9), nrow=2,
+   dimnames=list("Predation"=c("Eaten", "Not Eaten"),
+   "Infection" = c("Uninfected", "Light", "Heavy")))
> fish
           Infection
Predation   Uninfected Light Heavy
  Eaten              1    10    37
  Not Eaten         49    35     9

And the chi-square test...

> chisq.test(fish)

        Pearson's Chi-squared test

data:  fish
X-squared = 69.7557, df = 2, p-value = 7.124e-16

The object returned by chisq.test() has a number of components that we can take further advantage of:

> chisq.test(fish)$observed
           Infection
Predation   Uninfected Light Heavy
  Eaten              1    10    37
  Not Eaten         49    35     9
> chisq.test(fish)$expected
           Infection
Predation   Uninfected    Light    Heavy
  Eaten       17.02128 15.31915 15.65957
  Not Eaten   32.97872 29.68085 30.34043
> mosaicplot(t(fish),cex=1.25,color=TRUE)

Assumptions of χ²

The chi-square contingency test makes the same assumptions as the goodness of fit test:

1. No more than 20% of the cells can have an expected frequency less than 5, and
2. No cell can have an expected frequency less than one.

If either is violated, the response is the same: (a) combine a row or column [if the array is bigger than 2 × 2], (b) if the table is 2 × 2 use Fisher's Exact Test, or (c) use a randomization procedure (discussed at end of course).
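Both rules are easy to check from the expected frequencies before trusting the test. The following is a minimal sketch for the fish data, with illustrative variable names.

```r
fish <- matrix(c(1, 49, 10, 35, 37, 9), nrow = 2)
expd <- chisq.test(fish)$expected

prop_below_5 <- mean(expd < 5)  # share of cells with expected frequency < 5
min_expected <- min(expd)       # smallest expected frequency

# TRUE when both assumptions hold
prop_below_5 <= 0.20 && min_expected >= 1
```

Note that the check uses expected frequencies, not observed ones: the fish table contains an observed count of 1, yet every expected frequency is above 15.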

Correction for Continuity

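When the contingency table is 2 × 2, many statisticians recommend the use of a continuity correction factor. This modification is known as the Yates Correction for Continuity:

$$\chi^2 = \sum_{\text{column}=1}^{c} \sum_{\text{row}=1}^{r} \frac{\left(\left|\text{Observed}_{\text{column},\text{row}} - \text{Expected}_{\text{column},\text{row}}\right| - \frac{1}{2}\right)^2}{\text{Expected}_{\text{column},\text{row}}}$$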
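As a sketch of the correction in practice, the 2 × 2 aspirin counts from the odds ratio example can be run through the formula by hand and compared with `chisq.test()`, which applies the correction to 2 × 2 tables by default (`correct = TRUE`). The table layout and variable names are choices made for this illustration.

```r
# 2 x 2 table built from the aspirin example counts
tab <- matrix(c(1438, 18496, 1427, 18515), nrow = 2,
              dimnames = list(Outcome = c("Cancer", "No cancer"),
                              Group   = c("Aspirin", "Placebo")))

expd <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Yates: shrink each |Observed - Expected| by 1/2 before squaring
yates_manual <- sum((abs(tab - expd) - 0.5)^2 / expd)

# Built-in test with and without the correction
yates_r     <- chisq.test(tab, correct = TRUE)$statistic
uncorrected <- chisq.test(tab, correct = FALSE)$statistic

c(manual = yates_manual, corrected = unname(yates_r),
  uncorrected = unname(uncorrected))
```

The corrected statistic is always a little smaller than the uncorrected one, which makes the test slightly more conservative.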
Fisher's Exact Test

Fisher's Exact Test is used specifically for 2 × 2 contingency tests. The test is an improvement over the ordinary chi-square test in cases where the expected cell frequencies are too low to meet the regular assumptions. Thus, this test is used for small data sets comparing two categorical variables.

Let's look at Example 9.4, which examines the feeding habits of vampire bats. The main question is whether or not cows in estrus have a greater chance of being attacked by bats compared to cows not in estrus.

> bats<-matrix(c(15,7,6,322),nrow=2)
> bats
     [,1] [,2]
[1,]   15    6
[2,]    7  322

> fisher.test(bats)

        Fisher's Exact Test for Count Data

data:  bats
p-value < 2.2e-16
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
  29.94742 457.26860
sample estimates:
odds ratio
  108.3894
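To see why the exact test is the appropriate choice for these data, the expected frequencies can be inspected first. This is a sketch with illustrative variable names. `suppressWarnings()` is used only to silence R's warning that the chi-square approximation may be incorrect.

```r
bats <- matrix(c(15, 7, 6, 322), nrow = 2)

# Expected frequencies under independence
expd <- suppressWarnings(chisq.test(bats)$expected)
round(expd, 2)

# Fraction of cells with an expected frequency below 5
mean(expd < 5)

# Individual pieces of the Fisher's Exact Test result
ft <- fisher.test(bats)
ft$p.value
ft$estimate
ft$conf.int
```

One of the four cells (25%) has an expected frequency below 5, which exceeds the 20% limit of the chi-square assumptions.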
G-tests

The G-test is another contingency test seen frequently in the literature. The G-test gives results very similar to those of the chi-square test across a wide range of circumstances. It utilizes the natural logarithm (ln) in its calculation.

The G-test may not be as powerful as the chi-square test for small sample sizes.

R code for the G-test statistic is available, but is not part of the normal stats base package or related packages.
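Because the statistic is simple, it can also be computed by hand in base R. The sketch below assumes the standard form G = 2 Σ O ln(O / E), which is not spelled out in these notes, and compares G against the χ² distribution with (r - 1)(c - 1) degrees of freedom. It uses the fish data and illustrative variable names.

```r
fish <- matrix(c(1, 49, 10, 35, 37, 9), nrow = 2)
expd <- outer(rowSums(fish), colSums(fish)) / sum(fish)

# G = 2 * sum(O * ln(O / E)); a cell with O = 0 would need special handling
G <- 2 * sum(fish * log(fish / expd))

dof   <- (nrow(fish) - 1) * (ncol(fish) - 1)
p_val <- pchisq(G, dof, lower.tail = FALSE)

c(G = G, df = dof, p = p_val)
```

For these data G leads to the same conclusion as the chi-square test: H0 is rejected.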
