The dataset used in this project is from Kaggle: Education Inequality Data
License: MIT License
Original author: Shamim Hasan
- 🔼 Shows the first 5 rows of the dataset
df.head()
- 🔼 Lists every column with its data type and non-null count (which reveals missing values)
df.info()
- 🔼 Gives summary statistics (count, mean, std, min, max, quartiles) for the numeric columns
df.describe()
- 🔼 Counts the missing values in each column
df.isnull().sum()
- 🔼 Counts the duplicate rows
df.duplicated().sum()
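The missing-value and duplicate checks above can be tried on a toy frame. This is a minimal sketch, not the project's actual data: the values are made up, and only the two column names that appear later in this notebook are reused.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Kaggle data; every value here is invented.
df = pd.DataFrame(
    {
        "funding_per_student_usd": [9000.0, 12000.0, np.nan, 12000.0],
        "avg_test_score_percent": [71.5, 80.0, 65.0, 80.0],
    }
)

# NaN count per column (a Series indexed by column name)
missing_per_column = df.isnull().sum()

# Number of rows that exactly repeat an earlier row
duplicate_rows = int(df.duplicated().sum())

print(missing_per_column)
print("duplicate rows:", duplicate_rows)

# One possible cleanup step; the original frame is left untouched
clean = df.drop_duplicates().dropna()
print("rows after cleanup:", len(clean))
```

Whether dropping rows is appropriate depends on the analysis; if both counts are zero on the real data, no cleanup is needed.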
- 🔼 Number of schools in each state
school_by_state
- 🔼 Number of schools of each type
school_by_type
- 🔼 Number of schools at each level
school_by_level
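Counts like the three above are typically produced with `value_counts()`. The sketch below assumes the columns are called `state`, `school_type`, and `grade_level`; those names and the sample rows are hypothetical and may not match the dataset.

```python
import pandas as pd

# Hypothetical column names and made-up rows, for illustration only.
df = pd.DataFrame(
    {
        "state": ["Texas", "Texas", "Ohio", "Ohio", "Ohio"],
        "school_type": ["Public", "Private", "Public", "Charter", "Public"],
        "grade_level": ["High", "Middle", "High", "Elementary", "High"],
    }
)

# value_counts() returns counts sorted from most to least frequent
school_by_state = df["state"].value_counts()
school_by_type = df["school_type"].value_counts()
school_by_level = df["grade_level"].value_counts()

print(school_by_state)
print(school_by_type)
print(school_by_level)
```

Passing `normalize=True` to `value_counts()` would give shares instead of raw counts.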
- 🔼 Bar graph of schools by type
school_by_type
- 🔼 Histogram of funding per student (in usd)
funding_per_student_usd
- 🔼 Histogram of average test score percent
avg_test_score_percent
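The two histograms could be drawn along these lines. This is a sketch under assumptions: the data is randomly generated rather than loaded from Kaggle, the bin count of 20 is arbitrary, and the output file name is hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Random placeholder data; only the column names come from the notebook.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {
        "funding_per_student_usd": rng.uniform(5000, 25000, size=200),
        "avg_test_score_percent": rng.uniform(40, 100, size=200),
    }
)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# hist() returns the per-bin counts, the bin edges, and the drawn patches
counts_funding, _, _ = axes[0].hist(df["funding_per_student_usd"], bins=20)
axes[0].set_xlabel("Funding per student (USD)")
axes[0].set_ylabel("Number of schools")

counts_score, _, _ = axes[1].hist(df["avg_test_score_percent"], bins=20)
axes[1].set_xlabel("Average test score (%)")
axes[1].set_ylabel("Number of schools")

fig.tight_layout()
fig.savefig("histograms.png")  # hypothetical output file name
```

In a notebook, the `matplotlib.use("Agg")` line and `savefig` call can be dropped in favor of inline display.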
- 🔼 Correlation summary in table format
corr
- 🔼 Correlation Heatmap of School Factors
heatmap
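A correlation table and heatmap of this kind can be sketched as follows. The notebook does not show which plotting library was used, so plain matplotlib `imshow` is used here as a stand-in; the data is random, and `dropout_rate_percent` and `school_type` are assumed column names.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Random placeholder data; some column names are assumptions.
rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame(
    {
        "school_type": rng.choice(["Public", "Private"], size=n),
        "funding_per_student_usd": rng.uniform(5000, 25000, size=n),
        "avg_test_score_percent": rng.uniform(40, 100, size=n),
        "dropout_rate_percent": rng.uniform(0, 15, size=n),
    }
)

# Keep numeric columns only, then compute pairwise Pearson correlations
corr = df.select_dtypes(include="number").corr()

fig, ax = plt.subplots(figsize=(5, 4))
im = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns, rotation=45, ha="right")
ax.set_yticks(range(len(corr.index)))
ax.set_yticklabels(corr.index)

# Write each coefficient into its cell, rounded to two decimals
for i in range(corr.shape[0]):
    for j in range(corr.shape[1]):
        ax.text(j, i, f"{corr.iat[i, j]:.2f}", ha="center", va="center")

fig.colorbar(im, ax=ax, label="Pearson r")
ax.set_title("Correlation Heatmap of School Factors")
fig.tight_layout()
fig.savefig("correlation_heatmap.png")  # hypothetical output file name
```

Fixing the color scale to the full range of -1 to 1 keeps near-zero coefficients visually neutral instead of stretching small differences across the whole palette.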
🔍 Reading specific values
- Funding vs Test Scores → 0.02 → Almost no correlation
- Student–Teacher Ratio vs Dropout Rate → 0.01 → Essentially no relationship
- Percent Low-Income vs Avg Test Score → -0.00 → No significant link
- Internet Access vs Dropout Rate → 0.02 → Weak, negligible positive relationship
- All off-diagonal coefficients lie between -0.07 and +0.04, showing no strong explanatory links
🔍 Key Insights
- All correlations are weak (close to 0) → dataset doesn’t show strong linear relationships
- Real-world expectations (but not seen strongly here):
  - More funding → better test scores
  - Higher low-income % → higher dropout rate
  - Better internet access → higher test scores
🔍 Strongest Positive Correlations
- Student–Teacher Ratio vs Avg Test Score → 0.04
- Internet Access vs Student–Teacher Ratio → 0.03
- Internet Access vs Avg Test Score → 0.02
- 👉 All negligible
🔍 Strongest Negative Correlations
- Student–Teacher Ratio vs Percent Low Income → -0.07
- Dropout Rate vs Id → -0.05
- Id vs Percent Low Income → -0.05
- 👉 Again, very weak; the two pairs involving Id are not meaningful if Id is only a row identifier
🔍 Summary
- All off-diagonal correlations fall between -0.07 and +0.04
- In practice → no strong linear relationships
- Heatmap confirms near-zero linear correlation between variables (this rules out strong linear links, not every kind of dependence)
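The strongest positive and negative pairs can be read off a correlation matrix programmatically instead of by eye. The sketch below uses three of the coefficients reported above (0.04, -0.07, and roughly 0.00); the column names `student_teacher_ratio` and `percent_low_income` are assumptions about how the dataset labels those variables.

```python
from itertools import combinations

import pandas as pd

# Column names are assumed; the coefficients are the ones quoted in the notes.
cols = ["student_teacher_ratio", "avg_test_score_percent", "percent_low_income"]
corr = pd.DataFrame(
    [
        [1.00, 0.04, -0.07],
        [0.04, 1.00, 0.00],
        [-0.07, 0.00, 1.00],
    ],
    index=cols,
    columns=cols,
)

# Visit each unordered pair once, which skips the diagonal and mirror entries
pairs = pd.Series(
    {(a, b): corr.loc[a, b] for a, b in combinations(corr.columns, 2)}
).sort_values()

print("most negative:", pairs.index[0], pairs.iloc[0])
print("most positive:", pairs.index[-1], pairs.iloc[-1])
print("largest |r|:", pairs.abs().max())
```

On the full matrix, identifier columns such as Id would normally be dropped before ranking, since their correlations carry no interpretation.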
- 🔼 Average Test Scores by School Type
avg_scores_by_type
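A per-type average like this is usually a `groupby` followed by `mean`. This is a minimal sketch with made-up rows; `school_type` is an assumed column name.

```python
import pandas as pd

# Made-up rows; "school_type" is an assumed column name.
df = pd.DataFrame(
    {
        "school_type": ["Public", "Public", "Private", "Private", "Charter"],
        "avg_test_score_percent": [60.0, 70.0, 80.0, 90.0, 75.0],
    }
)

# Mean test score within each school type, highest first
avg_scores_by_type = (
    df.groupby("school_type")["avg_test_score_percent"]
    .mean()
    .sort_values(ascending=False)
)

print(avg_scores_by_type)
```

Calling `avg_scores_by_type.plot(kind="bar")` would turn the result into a bar chart.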
- 🔼 Scatter Plots || Funding vs Test Score
Funding vs Test Scores (with Trend Line)
- 🔼 Scatter Plots || Student-Teacher Ratio vs Test Score
Student-Teacher Ratio vs Test Score (with Trend Line)
- 🔼 Scatter Plots || Percent Low-Income vs Dropout Rate
Percent Low-Income vs Dropout Rate (with Trend Line)
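One common way to add a trend line to scatter plots like these is a degree-1 least-squares fit with `np.polyfit`. The notebook does not show how its trend lines were computed, so this is only a sketch of that approach, using random placeholder data and a hypothetical output file name.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Random placeholder data standing in for funding and test scores
rng = np.random.default_rng(2)
x = rng.uniform(5000, 25000, size=200)  # funding per student (USD)
y = rng.uniform(40, 100, size=200)      # average test score (%)

# Ordinary least-squares line y = slope * x + intercept
slope, intercept = np.polyfit(x, y, deg=1)

# Evaluate the fitted line across the observed x range
xs = np.linspace(x.min(), x.max(), 100)

fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(x, y, s=10, alpha=0.5, label="Schools")
ax.plot(xs, slope * xs + intercept, color="red", label="Trend line")
ax.set_xlabel("Funding per student (USD)")
ax.set_ylabel("Average test score (%)")
ax.set_title("Funding vs Test Scores (with Trend Line)")
ax.legend()
fig.tight_layout()
fig.savefig("funding_vs_score.png")  # hypothetical output file name
```

With correlations as close to zero as those reported above, the fitted line would be expected to come out nearly flat.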