State Project

The document presents a mini project conducted by a group of data science students analyzing household income, gender, marital status, job satisfaction, and retirement status using various statistical tests. Independent-samples T-tests indicated no significant difference in household income between genders, while chi-square tests revealed a significant association between gender and marital status. Additionally, a one-way ANOVA showed significant differences in household income across job satisfaction levels, and a chi-square test confirmed a significant difference in retirement status between males and females.

Uploaded by

bisratengda613

Statistics Group4 Mini Project

DATA SCIENCE 2017


GROUP MEMBER ID

1 BISRAT ENGDA 1601073

2 ANDARGE SEYFU 1601391

3 BEREKET ANDUALEM 1601442

4 AFOMIA KELEMEWERK 1601505

5 BETHLEHEM FASIL 1601455

6 ALEMTSEHAY DERIBE 1601519

7 AYELE GIRUM 1601406


1. Perform an independent-samples t-test to show whether there is a significant difference in household income between
males and females, and interpret the result.

Null and Alternative Hypotheses for the Independent-Samples T-Test


Null Hypothesis (H₀):
"There is no significant difference in household income between males and females."
(Formally: μ_Female = μ_Male, where μ is the population mean income.)
Alternative Hypothesis (H₁):
"There is a significant difference in household income between males and females."
(Formally: μ_Female ≠ μ_Male)

Group Statistics (Household income in thousands)

Gender   N      Mean      Std. Deviation   Std. Error Mean
Female   3179   68.7798   75.73510         1.34323
Male     3221   70.1608   81.56216         1.43712

Independent Samples Test (Household income in thousands)

Levene's Test for Equality of Variances: F = 1.865, Sig. = .172

t-test for Equality of Means
                              t       df         Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       -.702   6398       .483              -1.38101          1.96808                 -5.23912       2.47709
Equal variances not assumed   -.702   6374.362   .483              -1.38101          1.96713                 -5.23725       2.47522

In the output above, the significance level for Levene’s test is 0.172, which is greater than 0.05. This means that the assumption of equal variances has
not been violated; therefore, our report will be based on the t-value provided in the first line of the table (equal variances assumed).
An independent-samples t-test was conducted to compare household income between males and females. Since p = 0.483, which is greater than 𝛼 =
0.05, we fail to reject the null hypothesis and conclude that there was no significant difference in mean household income between males and females
(t(6398) = -0.702, p = 0.483).

The mean difference was -$1,381.01 (95% CI [-$5,239.12, $2,477.09]), indicating that the average household income for females was slightly
lower than that of males, but this difference was not statistically significant.
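The reported t-value can be checked from the Group Statistics table alone. The following is a minimal sketch, not the SPSS procedure itself: it recomputes the pooled-variance t statistic from the published N, mean, and standard deviation, and it assumes that with 6398 degrees of freedom the t distribution is close enough to the standard normal to approximate the two-sided p-value.

```python
import math

# Summary statistics copied from the Group Statistics table (income in thousands).
n_f, mean_f, sd_f = 3179, 68.7798, 75.73510
n_m, mean_m, sd_m = 3221, 70.1608, 81.56216

# Pooled-variance (equal variances assumed) t-test.
df = n_f + n_m - 2
pooled_var = ((n_f - 1) * sd_f ** 2 + (n_m - 1) * sd_m ** 2) / df
se_diff = math.sqrt(pooled_var * (1 / n_f + 1 / n_m))
mean_diff = mean_f - mean_m
t_stat = mean_diff / se_diff

# Assumption: df is large, so the normal tail is used in place of the t tail.
p_approx = math.erfc(abs(t_stat) / math.sqrt(2))

print(f"t({df}) = {t_stat:.3f}, SE = {se_diff:.5f}, p ~ {p_approx:.3f}")
```

The result should agree with the first row of the Independent Samples Test table up to rounding.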

2. Use the chi-square test for independence to see the association between the variables gender and marital status and provide
interpretation.

Null Hypothesis (H₀):


"There is no association between gender and marital status."
(Formally: Gender and marital status are independent variables.)
Alternative Hypothesis (H₁):
"There is an association between gender and marital status."
(Formally: Gender and marital status are not independent.)

Gender * Marital status Crosstabulation (Count)

Gender   Unmarried   Married   Total
Female   1533        1646      3179
Male     1691        1530      3221
Total    3224        3176      6400

Chi-Square Tests

                             Value        df   Asymptotic Sig. (2-sided)   Exact Sig. (2-sided)   Exact Sig. (1-sided)
Pearson Chi-Square           11.705 (a)   1    .001
Continuity Correction (b)    11.534       1    .001
Likelihood Ratio             11.708       1    .001
Fisher's Exact Test                                                        .001                   .000
N of Valid Cases             6400
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 1577.58.
b. Computed only for a 2x2 table
From the result, a footnote below the Chi-Square Tests table indicates that "0 cells (0.0%) have expected count less than 5". This means we have not
violated the assumption, as all expected cell sizes were greater than 5 (in this case, the minimum expected count was 1577.58).
Results of the Analysis:
The null hypothesis stated that there is no relationship between gender and marital status. Since the p-value (0.001) is less than our chosen significance
level (𝛼 = 0.05), we reject the null hypothesis and conclude that there was a statistically significant association between gender and marital status (χ²(1)
= 11.705, p = 0.001).

Interpretation of Findings:
Unmarried: Males were slightly more likely to be unmarried (1691 of 3221, 52.5%) than females (1533 of 3179, 48.2%).
Married: Females were slightly more likely to be married (1646 of 3179, 51.8%) than males (1530 of 3221, 47.5%).

The chi-square test indicates that this distribution is unlikely to be due to chance alone, which points to a significant (but small) association between gender and
marital status in the sample.
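The Pearson chi-square statistic and the minimum expected count can be recomputed from the four cell counts in the crosstabulation. This is a sketch under the usual independence model, with no continuity correction. The phi coefficient at the end is not part of the SPSS output above; it is added here only as one common way to quantify how small the association is.

```python
import math

# Observed counts from the crosstabulation: rows = Female, Male; columns = Unmarried, Married.
observed = [[1533, 1646],
            [1691, 1530]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: (row total * column total) / N.
expected = [[r * c / n for c in col_totals] for r in row_totals]
min_expected = min(min(row) for row in expected)

chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))

# For df = 1 the chi-square upper tail equals erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

# Phi coefficient as an effect size for a 2x2 table (an addition, not in the SPSS output).
phi = math.sqrt(chi2 / n)

print(f"chi2(1) = {chi2:.3f}, p = {p_value:.4f}, min expected = {min_expected:.2f}, phi = {phi:.3f}")
```

A phi value this close to zero is consistent with describing the association as statistically significant but small.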

3. Conduct a one-way ANOVA with post-hoc tests (if appropriate) to compare household income across employment
satisfaction levels and report the result.

Null and Alternative Hypotheses for One-Way ANOVA


Null Hypothesis (H₀):
"There is no significant difference in mean household income across different employment satisfaction levels."
(Formally: µ₁ = µ₂ = µ₃ = ... = µₖ, where µ represents the population mean income for each satisfaction group)
Alternative Hypothesis (H₁):
"There is a significant difference in mean household income between at least two employment satisfaction levels."

Test of Homogeneity of Variances (Household income in thousands)

                                       Levene Statistic   df1   df2        Sig.
Based on Mean                          148.715            4     6395       .000
Based on Median                        96.645             4     6395       .000
Based on Median and with adjusted df   96.645             4     4355.855   .000
Based on trimmed mean                  119.925            4     6395       .000

ANOVA (Household income in thousands)

                 Sum of Squares   df     Mean Square   F         Sig.
Between Groups   3454592.305      4      863648.076    152.580   .000
Within Groups    36197529.645     6395   5660.286
Total            39652121.950     6399

Robust Tests of Equality of Means (Household income in thousands)

        Statistic (a)   df1   df2        Sig.
Welch   160.911         4     3133.977   .000
a. Asymptotically F distributed.

Multiple Comparisons
Dependent Variable: Household income in thousands
Tukey HSD

(I) Job satisfaction    (J) Job satisfaction    Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower Bound   95% CI Upper Bound
Highly dissatisfied     Somewhat dissatisfied   -11.24592*              3.09320      .003   -19.6859             -2.8060
Highly dissatisfied     Neutral                 -23.26447*              3.02776      .000   -31.5259             -15.0031
Highly dissatisfied     Somewhat satisfied      -41.61645*              3.02155      .000   -49.8609             -33.3720
Highly dissatisfied     Highly satisfied        -67.91674*              3.11903      .000   -76.4272             -59.4063
Somewhat dissatisfied   Highly dissatisfied     11.24592*               3.09320      .003   2.8060               19.6859
Somewhat dissatisfied   Neutral                 -12.01855*              2.92016      .000   -19.9864             -4.0507
Somewhat dissatisfied   Somewhat satisfied      -30.37053*              2.91372      .000   -38.3208             -22.4203
Somewhat dissatisfied   Highly satisfied        -56.67082*              3.01469      .000   -64.8966             -48.4451
Neutral                 Highly dissatisfied     23.26447*               3.02776      .000   15.0031              31.5259
Neutral                 Somewhat dissatisfied   12.01855*               2.92016      .000   4.0507               19.9864
Neutral                 Somewhat satisfied      -18.35199*              2.84415      .000   -26.1124             -10.5916
Neutral                 Highly satisfied        -44.65227*              2.94751      .000   -52.6947             -36.6098
Somewhat satisfied      Highly dissatisfied     41.61645*               3.02155      .000   33.3720              49.8609
Somewhat satisfied      Somewhat dissatisfied   30.37053*               2.91372      .000   22.4203              38.3208
Somewhat satisfied      Neutral                 18.35199*               2.84415      .000   10.5916              26.1124
Somewhat satisfied      Highly satisfied        -26.30029*              2.94113      .000   -34.3253             -18.2753
Highly satisfied        Highly dissatisfied     67.91674*               3.11903      .000   59.4063              76.4272
Highly satisfied        Somewhat dissatisfied   56.67082*               3.01469      .000   48.4451              64.8966
Highly satisfied        Neutral                 44.65227*               2.94751      .000   36.6098              52.6947
Highly satisfied        Somewhat satisfied      26.30029*               2.94113      .000   18.2753              34.3253
*. The mean difference is significant at the 0.05 level.

One-Way ANOVA Report: Household Income by Job Satisfaction Level


A one-way between-groups analysis of variance was conducted to explore whether there is a significant difference in household income across five
levels of job satisfaction (Highly Dissatisfied, Somewhat Dissatisfied, Neutral, Somewhat Satisfied, Highly Satisfied).
Assumption Checks
• Levene's test indicated a violation of homogeneity of variances (F(4, 6395) = 148.715, p < .001). Because of this violation, the robust Welch ANOVA
results were prioritized.
ANOVA Results
• The Welch ANOVA confirmed a statistically significant difference in household income across job satisfaction levels (F(4, 3133.977) =
160.911, p < .001).
Post-Hoc Comparisons (Tukey HSD)
Post-hoc comparisons revealed the following significant differences (p < .05), where MD is the mean difference in household income:
1. Highly Satisfied employees had significantly higher household income than all other groups:
o vs. Highly Dissatisfied: MD = $67,916.74, 95% CI [$59,406.3, $76,427.2]
o vs. Somewhat Dissatisfied: MD = $56,670.82, 95% CI [$48,445.1, $64,896.6]
o vs. Neutral: MD = $44,652.27, 95% CI [$36,609.8, $52,694.7]
o vs. Somewhat Satisfied: MD = $26,300.29, 95% CI [$18,275.3, $34,325.3]
2. Somewhat Satisfied employees had significantly higher household income than:
o Highly Dissatisfied: MD = $41,616.45, 95% CI [$33,372.0, $49,860.9]
o Somewhat Dissatisfied: MD = $30,370.53, 95% CI [$22,420.3, $38,320.8]
o Neutral: MD = $18,351.99, 95% CI [$10,591.6, $26,112.4]
3. Neutral employees had significantly higher household income than:
o Highly Dissatisfied: MD = $23,264.47, 95% CI [$15,003.1, $31,525.9]
o Somewhat Dissatisfied: MD = $12,018.55, 95% CI [$4,050.7, $19,986.4]
4. Somewhat Dissatisfied employees had significantly higher household income than:
o Highly Dissatisfied: MD = $11,245.92, 95% CI [$2,806.0, $19,685.9]
Conclusion
There was a significant association between job satisfaction and household income (Welch's F(4, 3133.977) = 160.911, p < .001), with highly satisfied
employees reporting the highest incomes. All ten pairwise comparisons were significant (p < .05), indicating a graded relationship: higher job
satisfaction was associated with progressively higher income.
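The classical F ratio in the ANOVA table follows directly from the sums of squares and degrees of freedom. The sketch below recomputes it from those published values. Note that it reproduces the standard F (152.580), not the Welch statistic, which needs per-group means and variances that the output does not list. The eta-squared effect size is an addition that does not appear in the SPSS output.

```python
# Values copied from the ANOVA table (household income in thousands).
ss_between, df_between = 3454592.305, 4
ss_within, df_within = 36197529.645, 6395

# Mean squares are sums of squares divided by their degrees of freedom.
ms_between = ss_between / df_between
ms_within = ss_within / df_within

# F is the ratio of between-group to within-group mean squares.
f_stat = ms_between / ms_within

# Eta squared: share of total variance explained by group membership
# (an addition for illustration, not reported in the original output).
eta_squared = ss_between / (ss_between + ss_within)

print(f"F({df_between}, {df_within}) = {f_stat:.3f}, eta^2 = {eta_squared:.4f}")
```

An eta squared near 0.09 suggests that job satisfaction level accounts for roughly 9% of the variance in household income in this sample.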

4. Test a hypothesis to determine whether there is a significant difference between the proportions of males and females who are
retired.

Gender * Retired Crosstabulation

Gender                      Retired: No   Retired: Yes   Total
Female   Count              3053 (a)      126 (b)        3179
         Expected Count     3026.0        153.0          3179.0
         % within Gender    96.0%         4.0%           100.0%
         % within Retired   50.1%         40.9%          49.7%
Male     Count              3039 (a)      182 (b)        3221
         Expected Count     3066.0        155.0          3221.0
         % within Gender    94.3%         5.7%           100.0%
         % within Retired   49.9%         59.1%          50.3%
Total    Count              6092          308            6400
         Expected Count     6092.0        308.0          6400.0
         % within Gender    95.2%         4.8%           100.0%
         % within Retired   100.0%        100.0%         100.0%
Each subscript letter (a, b) denotes a subset of Retired categories whose column
proportions do not differ significantly from each other at the .05 level.

Chi-Square Tests

                             Value       df   Asymptotic Sig. (2-sided)   Exact Sig. (2-sided)   Exact Sig. (1-sided)
Pearson Chi-Square           9.939 (a)   1    .002
Continuity Correction (b)    9.574       1    .002
Likelihood Ratio             9.995       1    .002
Fisher's Exact Test                                                       .002                   .001
N of Valid Cases             6400
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 152.99.
b. Computed only for a 2x2 table

Interpretation of Chi-Square Test for Independence: Gender and Retirement Status


1. Assumption Check
o A footnote below the Chi-Square Tests table indicates that "0 cells (0.0%) have an expected count less than 5."
o This means we have not violated the assumption for the Chi-Square test, as all expected cell counts are greater than 5 (the minimum
expected count was 152.99).
2. Hypothesis Testing
o Null Hypothesis (H₀): There is no association between gender and retirement status.
o Alternative Hypothesis (H₁): There is a significant association between gender and retirement status.
3. Test Results
o The Pearson Chi-Square test showed a statistically significant association: χ²(1) = 9.939, p = .002.
o Since p < 0.05, we reject the null hypothesis and conclude that there is a significant relationship between gender and retirement status.
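4. Effect Size & Proportions
o Retirement Rates:
• Females: 4.0% retired (126 of 3179)
• Males: 5.7% retired (182 of 3221)
o The difference in proportions is statistically significant, with males having a higher retirement rate than females.
o Effect size: φ = √(χ²/N) = √(9.939/6400) ≈ 0.039, which indicates a very small association.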
5. Final Reporting
"A Chi-Square Test of Independence revealed a statistically significant association between gender and retirement status, χ²(1, N = 6400) = 9.939, p
= .002. The proportion of retired males (5.7%) was significantly higher than that of females (4.0%)."
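Because the task is phrased in terms of proportions, the same conclusion can be reached with a two-proportion z-test, which for a 2x2 table is equivalent to the Pearson chi-square test (z² equals χ²). The sketch below uses the counts from the crosstabulation. The z-test framing is an alternative presentation, not the procedure the report ran.

```python
import math

# Counts from the Gender * Retired crosstabulation.
retired_f, n_f = 126, 3179
retired_m, n_m = 182, 3221

p_f = retired_f / n_f
p_m = retired_m / n_m

# Pooled proportion under H0: both genders share one retirement rate.
p_pooled = (retired_f + retired_m) / (n_f + n_m)
se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n_f + 1 / n_m))

z = (p_m - p_f) / se

# Two-sided p-value from the standard normal distribution.
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"female = {p_f:.1%}, male = {p_m:.1%}, z = {z:.3f}, z^2 = {z * z:.3f}, p = {p_value:.4f}")
```

Squaring z should recover the Pearson chi-square value in the table, up to rounding.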

5. Measure the relationship between household income and price of primary vehicle and its significance using Spearman's
rank-order correlation coefficient.
• Null Hypothesis (H₀):
There is no significant relationship between household income and the price of the primary vehicle.
(ρ = 0)
• Alternative Hypothesis (H₁):
There is a significant relationship between household income and the price of the primary vehicle.
(ρ ≠ 0)

Correlations (Spearman's rho)

                                                          Household income in thousands   Price of primary vehicle
Household income in thousands   Correlation Coefficient   1.000                           .998**
                                Sig. (2-tailed)           .                               .000
                                N                         6400                            6400
Price of primary vehicle        Correlation Coefficient   .998**                          1.000
                                Sig. (2-tailed)           .000                            .
                                N                         6400                            6400
**. Correlation is significant at the 0.01 level (2-tailed).
Spearman’s Rank-Order Correlation Report: Household Income & Vehicle Price
1. Analysis Summary
A Spearman’s rank-order correlation was conducted to examine the relationship between household income (in thousands) and the price of
primary vehicle.
2. Key Results
• Correlation Coefficient (ρ): 0.998
• p-value: < .001
• Sample Size (N): 6,400
3. Interpretation
• Strength & Direction:
The correlation coefficient of 0.998 indicates an almost perfect positive monotonic relationship between household income and vehicle price. As
income increases, the price of the primary vehicle almost always increases as well. Spearman's ρ is based on ranks, so it does not by itself show that the increase is proportional.
• Statistical Significance:
The p-value (< .001) confirms the correlation is significant at the 0.01 level.
4. Conclusion
There is an extremely strong, statistically significant positive relationship between household income and vehicle price (Spearman's ρ = .998, p
< .001). Higher-income households tend to have more expensive primary vehicles.
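The raw data behind the correlation are not included in the report, so the coefficient cannot be recomputed here. As an illustration of what Spearman's ρ measures, the sketch below ranks two variables and correlates the ranks. The `income` and `price` values are hypothetical toy data, chosen so that the relationship is perfectly monotonic but not linear.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical toy data (not from the report): price grows with the square of income.
income = [20, 35, 50, 80, 120, 200]
price = [v ** 2 / 100 for v in income]

rho = spearman_rho(income, price)
r = pearson(income, price)
print(f"Spearman rho = {rho:.3f}, Pearson r = {r:.3f}")
```

In this toy case ρ reaches 1 while Pearson's r stays below 1, which shows why a high ρ indicates a consistent ordering rather than a strictly proportional increase.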
