
Statistics

Uploaded by Muskan Sikarwar
Copyright © All Rights Reserved

Hypothesis Testing for Categorical Data (Chi-Square Test)
By:
Kingshuk Roy (26)
Kunal Ahirwar (27)
Manoj Singh (28)
Mohan Singh Bhadoriya (29)
Muskan Sikarwar (30)
Roadmap
• Introduction to Hypothesis Testing
• Understanding the Chi-Square Test
• Defining the Chi-Square Test Statistic (χ²)
• Chi-Square Goodness-of-Fit Test
• Hypothesis Testing for a Population Proportion Using the Chi-Square Goodness-of-Fit Test
• Chi-Square Test of Independence
• Chi-Square Test of Homogeneity

Introduction to Hypothesis Testing for Categorical Data
Meaning:
Hypothesis testing for categorical data helps determine whether there are meaningful patterns or relationships in data that fall into categories (such as survey responses, customer preferences, or demographic characteristics).
• What is Categorical Data? Data grouped into categories without inherent numeric meaning (e.g., gender, brand preference).
• Chi-Square Test Purpose: Used to evaluate relationships and test hypotheses about categorical data without assuming that the data follow a normal distribution.
• Applications: Commonly used in fields such as market research, healthcare studies, and the social sciences.
Defining the Chi-Square Test Statistic (χ²)
Chi-Square Test Overview:
The Chi-Square (χ²) test is a statistical method used to analyze categorical data. It assesses whether observed frequencies in categories differ significantly from expected frequencies.
Formula:
χ² = Σ((O - E)² / E)
• O = Observed Frequency, E = Expected Frequency; the sum runs over every category (or every cell of a table).
Interpretation:
A higher χ² value indicates a greater difference between observed and expected frequencies. Whether that difference is statistically significant is decided by comparing χ² with the critical value of the chi-square distribution for the appropriate degrees of freedom and significance level (or, equivalently, by its p-value).
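The formula translates directly into a few lines of code. Below is a minimal Python sketch, assuming the observed and expected counts are passed as equal-length sequences in the same category order; the name `chi_square_statistic` is our own illustrative helper, not a library function.

```python
from typing import Sequence


def chi_square_statistic(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Return the sum of (O - E)^2 / E over all categories (or cells)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected frequencies must be positive")
    # Each term grows with the squared gap between observed and expected counts.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Four categories with observed counts 50, 60, 30, 60 and 50 expected in each.
print(chi_square_statistic([50, 60, 30, 60], [50, 50, 50, 50]))  # 12.0
```

Dividing by E scales each squared gap relative to the size of the category, so a gap of 10 counts more in a small category than in a large one.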
Conditions for Applying the Chi-Square Test
Key Conditions:
• Categorical Data Required: The test is designed for categorical variables, such as survey responses or demographic groups, and is applied to frequency counts.
• Sample Size: Expected frequencies in each category should be at least 5 (a common rule of thumb) to ensure reliable results.
• Independence of Observations: Each observation should be independent, meaning that one outcome doesn’t influence another.
Importance:
Meeting these conditions is essential for the validity of the Chi-Square test, as it helps produce accurate and meaningful results.
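The sample-size condition is easy to check before running the test. This hypothetical helper (`check_expected_counts` is our own name) flags categories whose expected frequency falls below the rule-of-thumb threshold of 5; the independence condition depends on how the data were collected and cannot be verified from the counts alone.

```python
from typing import List, Sequence


def check_expected_counts(expected: Sequence[float], minimum: float = 5.0) -> List[int]:
    """Return the indices of categories whose expected frequency is below `minimum`.

    An empty list means the rule of thumb (every expected count >= minimum) is met.
    """
    return [i for i, e in enumerate(expected) if e < minimum]


# 200 responses spread equally over 4 categories: E = 50 in each, so the rule is met.
print(check_expected_counts([200 / 4] * 4))  # []
# Hypothetical small survey: 12 responses over 4 categories gives E = 3 in each.
print(check_expected_counts([12 / 4] * 4))   # [0, 1, 2, 3]
```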
Chi-Square Goodness-of-Fit Test
Purpose:
The Goodness-of-Fit test checks whether the observed frequency distribution of a single categorical variable matches an expected distribution (e.g., testing whether preferences are spread equally across options).
• The chi-square test for goodness-of-fit uses frequency data from a sample to test hypotheses about the shape or proportions of a population.
• The data, called observed frequencies, simply count how many individuals from the sample are in each category.

Example: Candy Color Preference


Objective:
• To determine if customers have a uniform preference for candy colors.
Hypotheses:
• Null Hypothesis (H0): Customers prefer candy colors uniformly.
• Alternative Hypothesis (H1): Customers do not prefer candy colors uniformly.
Data Collection:
A survey of 200 customers yields the following observed preferences:
• Red: 50
• Blue: 60
• Green: 30
• Yellow: 60
Expected Distribution:
Assuming equal preference, the expected count for each color is:
Expected count = 200 / 4 = 50
Chi-Square Calculation: Using the formula χ² = Σ((O - E)² / E):
For Red: (50 - 50)²/50 = 0
For Blue: (60 - 50)²/50 = 2
For Green: (30 - 50)²/50 = 8
For Yellow: (60 - 50)²/50= 2
Total χ²:
χ² = 0 + 2 + 8 + 2 = 12
Degrees of Freedom:
df=k−1=4−1=3
Critical Value (α = 0.05):
Approximately 7.815 for df=3
Conclusion:
• Since 12 > 7.815, we reject the null hypothesis.
• Interpretation: There is significant evidence that customer preferences for candy colors are not uniform.
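The candy-color calculation can be reproduced in plain Python as a quick check. This sketch assumes a uniform null hypothesis and hard-codes the α = 0.05 critical values from a standard chi-square table (3.841, 5.991, 7.815, 9.488 for df = 1 to 4) rather than computing them; a statistics library (for example SciPy's `scipy.stats.chi2.ppf`) could supply exact values instead.

```python
# Candy color goodness-of-fit test at alpha = 0.05, plain Python.
observed = {"Red": 50, "Blue": 60, "Green": 30, "Yellow": 60}

n = sum(observed.values())   # 200 customers
k = len(observed)            # 4 colors
expected = n / k             # 50 per color under H0 (uniform preference)

# Per-color contribution (O - E)^2 / E, then the total statistic.
contributions = {color: (o - expected) ** 2 / expected for color, o in observed.items()}
chi2 = sum(contributions.values())
df = k - 1

# Chi-square table values for alpha = 0.05 (assumed lookup, not computed here).
CRITICAL_005 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}
critical = CRITICAL_005[df]

for color, value in contributions.items():
    print(f"{color}: {value:g}")
print(f"chi2 = {chi2:g}, df = {df}, critical = {critical}")
print("Reject H0" if chi2 > critical else "Fail to reject H0")
```

The per-color output matches the hand calculation (0, 2, 8, 2), and the total of 12 exceeds 7.815.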
Case Study for Goodness-of-Fit Test: Survey on Preferred Coffee Flavors
Objective: Test whether customer preferences for three coffee flavors (Vanilla, Mocha, Hazelnut) are uniformly distributed.

Case Study Details:


• Hypotheses:
⚬ Null (H0): Preferences are uniformly distributed across flavors.
⚬ Alternative (H1): Preferences are not uniformly distributed.
• Data Collection:
⚬ Survey results: 150 respondents, with observed preferences:
■ Vanilla: 40
■ Mocha: 55
■ Hazelnut: 55
• Expected Distribution (Assume Equal Preference):
⚬ Expected count for each flavor = 150 / 3 = 50.
Chi-Square Calculation:
• For Vanilla: (40 - 50)² / 50 = 2
• For Mocha: (55 - 50)² / 50 = 0.5
• For Hazelnut: (55 - 50)² / 50 = 0.5
• Total χ² = 2 + 0.5 + 0.5 = 3
Interpretation: The calculated χ² value for this test is 3. To determine whether this value is statistically significant, we compare it to the critical chi-square value for the specified degrees of freedom (df) and significance level (commonly set at 0.05).
• Degrees of Freedom (df): Calculated as the number of categories minus one, so df = 3 - 1 = 2.
• Critical Value: For df = 2 at a 0.05 significance level, the critical chi-square value is
approximately 5.99.
Decision Rule:
• If χ² ≤ 5.99: Fail to reject the null hypothesis (H0); any differences in flavor preferences are consistent with random chance.
• If χ² > 5.99: Reject the null hypothesis (H0); the differences in observed preferences are statistically significant, that is, unlikely to arise from random variation alone.
Conclusion for This Case Study: Since the calculated χ² value (3) is less than the critical value (5.99), we fail to reject the null hypothesis. The data provide no significant evidence that customer preferences across Vanilla, Mocha, and Hazelnut flavors differ from a uniform distribution; the observed variation is consistent with random chance rather than a true preference.
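For df = 2 the chi-square distribution has a simple closed form (it is an exponential distribution with mean 2), so both the p-value and the critical value of the coffee-flavor test can be computed without a table: P(χ² > x) = exp(-x/2), and the critical value solves exp(-c/2) = α, giving c = -2 ln α. This sketch relies on that df = 2 special case only; other degrees of freedom need a table or a statistics library.

```python
import math

# Coffee flavor goodness-of-fit test. Valid only for df == 2, where
# P(chi-square > x) = exp(-x / 2).
observed = [40, 55, 55]                    # Vanilla, Mocha, Hazelnut
expected = sum(observed) / len(observed)   # 150 / 3 = 50 under H0

chi2 = sum((o - expected) ** 2 / expected for o in observed)
df = len(observed) - 1                     # 3 - 1 = 2

alpha = 0.05
p_value = math.exp(-chi2 / 2)              # tail probability for df = 2
critical = -2 * math.log(alpha)            # solves exp(-c / 2) = alpha

print(f"chi2 = {chi2:.2f}, p = {p_value:.3f}, critical = {critical:.3f}")
print("Reject H0" if chi2 > critical else "Fail to reject H0")
```

It reports χ² = 3.00, a p-value of about 0.223, and a critical value of about 5.991, which agrees with the table value of 5.99.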
Hypothesis Testing for a Population Proportion Using the Chi-Square Goodness-of-Fit Test
Objective:
Use the Chi-Square (χ²) test as an alternative to the z-test for testing whether an observed proportion matches an expected population proportion.
Example Scenario:
A retailer believes 60% of its customers prefer online shopping, while 40% prefer in-store shopping. To verify this, a survey is conducted with 200 customers.
Steps:
1. Formulate Hypotheses:
⚬ Null Hypothesis (H0): The observed distribution matches the expected proportions (60% online, 40% in-store).
⚬ Alternative Hypothesis (H1): The observed distribution does not match the expected proportions.
2. Collect Observed and Expected Frequencies:
⚬ Observed (from Survey):
■ Online: 120 customers, In-store: 80 customers
⚬ Expected (based on Assumption):
■ Online: 200 × 0.60 = 120, In-store: 200 × 0.40 = 80
3. Apply χ² Formula:
⚬ χ² = Σ((Observed - Expected)² / Expected) for each category
⚬ Here, χ² = ((120 - 120)² / 120) + ((80 - 80)² / 80) = 0
4. Interpret Result (p-value or Critical Value):
⚬ Degrees of Freedom (df): 1 (number of categories - 1)
⚬ Decision Rule:
■ If χ² is less than or equal to the critical value for df = 1 (3.84 at the 0.05 significance level), we fail to reject H0, suggesting that the observed proportions are consistent with the expected 60% online / 40% in-store split.
■ Here χ² = 0 ≤ 3.84, so we fail to reject H0.
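With only two categories, the chi-square goodness-of-fit test and the one-proportion z-test are equivalent: χ² equals z², and with df = 1 both give the same two-sided p-value. This sketch (the function name `proportion_chi_square` is hypothetical) computes both statistics; the p-value uses the identity P(χ² > x) = erfc(√(x/2)), which holds for df = 1.

```python
import math
from typing import Tuple


def proportion_chi_square(successes: int, n: int, p0: float) -> Tuple[float, float, float]:
    """Test H0: population proportion = p0 with a two-category goodness-of-fit test.

    Returns (chi2, z, p_value). With two categories chi2 equals z**2 from the
    one-proportion z-test, so both tests give the same two-sided p-value.
    """
    observed = [successes, n - successes]
    expected = [n * p0, n * (1 - p0)]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    p_hat = successes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

    # For df = 1: P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, z, p_value


# Retailer survey: 120 of 200 customers prefer online shopping; H0: p = 0.60.
chi2, z, p = proportion_chi_square(120, 200, 0.60)
print(f"chi2 = {chi2:.4f}, z = {z:.4f}, p-value = {p:.4f}")
```

For the retailer's survey it returns χ² = 0, z = 0 and a p-value of 1, so H0 is not rejected.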
Case Study for Test of Independence: Customer Satisfaction and Brand Loyalty
Objective:
Investigate whether customer satisfaction levels are associated with brand loyalty status.
Case Study Details:
Hypotheses:
• Null Hypothesis (H0): Satisfaction and loyalty are independent.
• Alternative Hypothesis (H1): Satisfaction and loyalty are not independent.
Data Collection: Survey results show:
• Satisfied: 70 loyal, 30 not loyal
• Neutral: 40 loyal, 60 not loyal
• Dissatisfied: 10 loyal, 90 not loyal
Contingency Table:
                 Loyal   Not Loyal   Row Total
Satisfied           70          30         100
Neutral             40          60         100
Dissatisfied        10          90         100
Total              120         180         300

Expected Frequencies Calculation:
For each cell, calculate the expected frequency using the formula:
Expected Frequency = (Row Total × Column Total) / Grand Total
• For Satisfied & Loyal: (100 × 120) / 300 = 40
• For Satisfied & Not Loyal: (100 × 180) / 300 = 60
• For Neutral & Loyal: (100 × 120) / 300 = 40
• For Neutral & Not Loyal: (100 × 180) / 300 = 60
• For Dissatisfied & Loyal: (100 × 120) / 300 = 40
• For Dissatisfied & Not Loyal: (100 × 180) / 300 = 60
Chi-Square Calculation:
Using the formula χ² = Σ((Observed - Expected)² / Expected):
• For Satisfied & Loyal: (70 - 40)² / 40 = 22.5
• For Satisfied & Not Loyal: (30 - 60)² / 60 = 15
• For Neutral & Loyal: (40 - 40)² / 40 = 0
• For Neutral & Not Loyal: (60 - 60)² / 60 = 0
• For Dissatisfied & Loyal: (10 - 40)² / 40 = 22.5
• For Dissatisfied & Not Loyal: (90 - 60)² / 60 = 15
Total Chi-Square Value:
χ² = 22.5 + 15 + 0 + 0 + 22.5 + 15 = 75

Interpretation:
Compare the calculated χ² value (75) to the critical value with degrees of freedom df = (rows - 1) × (columns - 1) = (3 - 1) × (2 - 1) = 2 at a significance level of 0.05 (critical value ≈ 5.99).
Conclusion:
Since 75 > 5.99, we reject the null hypothesis (H0). This indicates that customer
satisfaction levels are associated with brand loyalty status.
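The expected-frequency and cell-by-cell calculation can be sketched in plain Python for any table of counts. Here the satisfaction/loyalty table is hard-coded, and the critical value 5.991 (df = 2, α = 0.05) is taken from a chi-square table rather than computed.

```python
from typing import List

# Rows = satisfaction level, columns = (Loyal, Not Loyal).
table: List[List[int]] = [
    [70, 30],   # Satisfied
    [40, 60],   # Neutral
    [10, 90],   # Dissatisfied
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(table):
    for j, observed in enumerate(row):
        # Expected count under independence: (row total x column total) / grand total.
        expected = row_totals[i] * col_totals[j] / grand_total
        chi2 += (observed - expected) ** 2 / expected

df = (len(table) - 1) * (len(table[0]) - 1)
critical = 5.991  # chi-square table value for df = 2, alpha = 0.05

print(f"chi2 = {chi2:g}, df = {df}")
print("Reject H0 (not independent)" if chi2 > critical else "Fail to reject H0")
```

Running it gives χ² = 75 with df = 2, matching the hand calculation.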
Chi-Square Test of Homogeneity
Objective:
To test whether the distribution of preferences is the same across different groups.
Example Use Case:
Testing whether beverage preferences differ among various age groups (18-25, 26-35, 36-45).
Case Study Details:
• Hypotheses:
⚬ Null Hypothesis (H0): Beverage preferences are the same across age groups.
⚬ Alternative Hypothesis (H1): Beverage preferences differ across age groups.

Data Collection:
Age Group   Beverage A   Beverage B   Beverage C   Row Total
18-25           30           20           50          100
26-35           25           35           40          100
36-45           20           30           50          100
Total           75           85          140          300

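Steps Involved:
Calculate Expected Frequencies:
Expected Count = (Row Total × Column Total) / Grand Total. For example, for Age Group 18-25 and Beverage A:
Expected Count = 100 × 75 / 300 = 25
Because every age group has 100 respondents, the expected counts are the same in each row: Beverage A = 25, Beverage B = 100 × 85 / 300 ≈ 28.33, Beverage C = 100 × 140 / 300 ≈ 46.67.
Chi-Square Calculation:
For each cell, calculate (O - E)²/E, then sum over all cells: χ² = Σ((O - E)²/E)
Calculated values for the 18-25 group:
Beverage A (18-25): (30 - 25)²/25 = 1
Beverage B (18-25): (20 - 28.33)²/28.33 ≈ 2.45
Beverage C (18-25): (50 - 46.67)²/46.67 ≈ 0.24
The same calculation gives 0, 1.57, and 0.95 for the 26-35 group, and 1, 0.10, and 0.24 for the 36-45 group.
Total Chi-Square Value:
χ² = 1 + 2.45 + 0.24 + 0 + 1.57 + 0.95 + 1 + 0.10 + 0.24 ≈ 7.55
Degrees of Freedom:
df = (r − 1)(c − 1) = (3 − 1)(3 − 1) = 4
Critical Value (α = 0.05):
From the Chi-Square table, the critical value for df = 4 is approximately 9.488.
Conclusion:
Decision: Since 7.55 < 9.488, we fail to reject H0.
Interpretation: At the 0.05 significance level, these data do not provide significant evidence that beverage preferences differ across age groups, so on their own they do not support tailoring marketing strategies to each demographic.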
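The homogeneity test uses the same arithmetic as the test of independence: expected count = (row total × column total) / grand total, with (O - E)²/E summed over every cell. This sketch wraps that arithmetic in a hypothetical helper, `contingency_chi_square`, and uses Python's `fractions.Fraction` so expected counts such as 85/3 stay exact; the critical value 9.488 (df = 4, α = 0.05) is taken from a chi-square table.

```python
from fractions import Fraction
from typing import List, Tuple


def contingency_chi_square(table: List[List[int]]) -> Tuple[Fraction, int]:
    """Chi-square statistic and degrees of freedom for an r x c table of counts.

    Fractions keep every expected count exact, so rounding happens only on output.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = Fraction(0)
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = Fraction(row_totals[i] * col_totals[j], grand_total)
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Rows: age groups 18-25, 26-35, 36-45; columns: Beverage A, B, C.
beverages = [
    [30, 20, 50],
    [25, 35, 40],
    [20, 30, 50],
]
chi2, df = contingency_chi_square(beverages)
critical = 9.488  # chi-square table value for df = 4, alpha = 0.05
print(f"chi2 = {float(chi2):.3f}, df = {df}")
print("Reject H0" if chi2 > critical else "Fail to reject H0")
```

For the beverage table the nine cells sum to exactly 898/119 ≈ 7.546 with df = 4, which is below 9.488.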
Conclusion
In summary, the Chi-Square test is a widely used method for hypothesis testing with categorical data, applied across fields such as marketing, healthcare, and the social sciences. The case studies discussed highlight its practical applications, particularly in analyzing customer preferences and behaviors. By letting researchers and analysts assess the relationships and distributions of categorical variables, the Chi-Square test supports informed decision-making and strategic planning and improves understanding of market trends and consumer choices.
THANK YOU!
