Aashirwad10/Educational_Data_Analysis

Data Source

The dataset used in this project is from Kaggle: Education Inequality Data
License: MIT License
Original author: Shamim Hasan


A. Data Quality Check

Preview (Screenshot)

Data Head

  • 🔼 Shows the first 5 rows of the dataset
  • df.head()

Data info

  • 🔼 Shows every column with its data type and non-null count
  • df.info()

Describe

  • 🔼 Gives summary statistics for the numeric columns (count, mean, std, min, quartiles, max)
  • df.describe()

Checking Missing values

  • 🔼 Counts the missing values in each column
  • df.isnull().sum()

Checking duplicate values

  • 🔼 Counts the fully duplicated rows
  • df.duplicated().sum()
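
The five checks above can be run together. The following is a minimal sketch on a tiny stand-in DataFrame, not the project's actual notebook: the `id` and `state` column names and the sample values are assumptions made for illustration, and the real project would load the Kaggle CSV with `pd.read_csv` instead.

```python
import pandas as pd

# Stand-in data (assumed values). Row 4 repeats row 3, and row 5 has gaps,
# so both the duplicate check and the missing-value check have something to find.
df = pd.DataFrame({
    "id": [1, 2, 3, 3, 4],
    "state": ["Texas", "Ohio", "Iowa", "Iowa", None],
    "funding_per_student_usd": [12000.0, 9500.0, 11000.0, 11000.0, 10250.0],
    "avg_test_score_percent": [71.5, 64.0, 68.0, 68.0, None],
})

print(df.head())      # first 5 rows
df.info()             # column names, dtypes, non-null counts
print(df.describe())  # count, mean, std, min, quartiles, max (numeric columns)

missing_per_column = df.isnull().sum()  # missing values per column
duplicate_rows = df.duplicated().sum()  # rows identical to an earlier row

print(missing_per_column)
print("Duplicate rows:", duplicate_rows)
```

If `duplicate_rows` is non-zero, `df.drop_duplicates()` returns a copy without the repeated rows.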

B. Basic Summaries

Preview (Screenshot)

School by state

  • 🔼 Shows how many schools there are in each state
  • school_by_state

School by type

  • 🔼 Shows how many schools there are of each type
  • school_by_type

School by level

  • 🔼 Shows how many schools there are at each level
  • school_by_level
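
One way summaries like these can be produced is with `value_counts()`. This is a sketch under assumptions: the variable names `school_by_state`, `school_by_type`, and `school_by_level` come from this README, but the column names `state`, `school_type`, and `grade_level` and the sample rows are hypothetical.

```python
import pandas as pd

# Stand-in data; column names are assumptions, not confirmed by the dataset.
df = pd.DataFrame({
    "state": ["Texas", "Texas", "Ohio", "Iowa", "Ohio", "Texas"],
    "school_type": ["Public", "Private", "Public", "Charter", "Public", "Public"],
    "grade_level": ["High", "Elementary", "Middle", "High", "High", "Elementary"],
})

# value_counts() counts rows per category and sorts by count, largest first.
school_by_state = df["state"].value_counts()
school_by_type = df["school_type"].value_counts()
school_by_level = df["grade_level"].value_counts()

print(school_by_state)
print(school_by_type)
print(school_by_level)
```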

C. Basic Visual

Preview (Screenshot)

Bar graph of schools by type

  • 🔼 Bar Graph on schools by type
  • school_by_type

Histogram of funding_per_student_usd

  • 🔼 Histogram of funding per student (in USD)
  • funding_per_student_usd

Histogram of avg_test_score_percent

  • 🔼 Histogram of average test score percent
  • avg_test_score_percent
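
The three charts above could be drawn roughly as follows. This is a sketch with matplotlib on stand-in data, not necessarily the plotting code used for the screenshots; the `school_type` column name, the sample values, and the bin count are assumptions.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in data (assumed values).
df = pd.DataFrame({
    "school_type": ["Public", "Public", "Private", "Charter",
                    "Public", "Private", "Public", "Charter"],
    "funding_per_student_usd": [8200, 9100, 15300, 11000, 7600, 14800, 9900, 12100],
    "avg_test_score_percent": [61, 66, 82, 74, 58, 85, 69, 72],
})

school_by_type = df["school_type"].value_counts()

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Bar graph: one bar per school type.
axes[0].bar(school_by_type.index, school_by_type.values)
axes[0].set_title("Schools by type")
axes[0].set_ylabel("Number of schools")

# Histograms: distribution of a single numeric column each.
axes[1].hist(df["funding_per_student_usd"], bins=5)
axes[1].set_title("Funding per student (USD)")

axes[2].hist(df["avg_test_score_percent"], bins=5)
axes[2].set_title("Average test score (%)")

fig.tight_layout()

# Render to an in-memory PNG; pass a file path instead to write to disk.
buf = io.BytesIO()
fig.savefig(buf, format="png")
```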

D. Find patterns & correlations

Preview (Screenshot)

Correlation Summary

  • 🔼 Correlation summary in table format
  • corr

Heatmap

  • 🔼 Correlation Heatmap of School Factors
  • heatmap

  • 🔍 Reading specific values

    • Funding vs Test Scores → 0.02 → Almost no correlation
    • Student–Teacher Ratio vs Dropout Rate → 0.01 → Essentially no relationship
    • Percent Low-Income vs Avg Test Score → -0.00 → No linear link
    • Internet Access vs Dropout Rate → 0.02 → Weak, negligible positive relationship
    • All off-diagonal values lie between -0.07 and +0.04, so no pair of variables has a strong linear link
  • 🔍 Key Insights

    • All correlations are weak (close to 0) → the dataset doesn’t show strong linear relationships
    • Relationships one would expect in the real world, but which do not appear here:
      • More funding → better test scores
      • Higher low-income % → higher dropout rate
      • Better internet access → higher test scores
  • 🔍 Strongest Positive Correlations

    • Student–Teacher Ratio vs Avg Test Score → 0.04
    • Internet Access vs Student–Teacher Ratio → 0.03
    • Internet Access vs Avg Test Score → 0.02
    • 👉 All negligible
  • 🔍 Strongest Negative Correlations

    • Student–Teacher Ratio vs Percent Low Income → -0.07
    • Dropout Rate vs Id → -0.05
    • Id vs Percent Low Income → -0.05
    • 👉 Again, very weak; the two pairs involving Id carry no meaning if Id is only a row identifier
  • 🔍 Summary

    • All correlations fall between -0.07 and +0.04
    • In practice → no strong linear relationships
    • Heatmap confirms the variables are nearly uncorrelated (non-linear relationships are not ruled out by this check)
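
A correlation table and the "strongest pairs" lists above can be derived along these lines. This is a sketch on synthetic data: every column is drawn independently at random, so the near-zero values are true by construction and say nothing about the real dataset. All column names except `funding_per_student_usd` and `avg_test_score_percent` are assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic, independently generated columns (assumed names and ranges).
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "funding_per_student_usd": rng.uniform(5000, 25000, n),
    "student_teacher_ratio": rng.uniform(10, 30, n),
    "percent_low_income": rng.uniform(0, 100, n),
    "internet_access_percent": rng.uniform(40, 100, n),
    "avg_test_score_percent": rng.uniform(40, 100, n),
    "dropout_rate_percent": rng.uniform(0, 20, n),
})

# Pearson correlation matrix over the numeric columns.
corr = df.corr(numeric_only=True)

# Keep each pair once: mask the diagonal and the lower triangle,
# flatten to a Series indexed by (row, column), and drop the masked cells.
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(upper).stack().dropna()

strongest_positive = pairs.sort_values(ascending=False).head(3)
strongest_negative = pairs.sort_values().head(3)

print(corr.round(2))
print(strongest_positive.round(2))
print(strongest_negative.round(2))
```

Dropping an identifier column such as Id before calling `corr()` keeps meaningless pairs out of these rankings.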

E. Group Comparisons (categorical vs numeric)

Preview (Screenshot)

Average Test Scores by School Type

  • 🔼 Average Test Scores by School Type
  • avg_scores_by_type
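
A categorical-vs-numeric comparison like this is typically a `groupby` followed by `mean`. The sketch below uses stand-in values; `avg_scores_by_type` is the name used in this README, while the `school_type` column name is an assumption.

```python
import pandas as pd

# Stand-in data (assumed values).
df = pd.DataFrame({
    "school_type": ["Public", "Private", "Public", "Charter", "Private", "Charter"],
    "avg_test_score_percent": [60.0, 80.0, 70.0, 75.0, 90.0, 65.0],
})

# Mean test score per school type, highest first.
avg_scores_by_type = (
    df.groupby("school_type")["avg_test_score_percent"]
    .mean()
    .sort_values(ascending=False)
)

print(avg_scores_by_type)
```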

F. Scatter Plots

Preview (Screenshot)

Scatter Plots || Funding vs Test Score

  • 🔼 Scatter Plots || Funding vs Test Score
  • Funding vs Test Scores (with Trend Line)

Scatter Plots || Student-Teacher Ratio vs Test Score

  • 🔼 Scatter Plots || Student-Teacher Ratio vs Test Score
  • Student-Teacher Ratio vs Test Score (with Trend Line)

Scatter Plots || Percent Low-Income vs Dropout Rate

  • 🔼 Scatter Plots || Percent Low-Income vs Dropout Rate
  • Percent Low-Income vs Dropout Rate (with Trend Line)
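
A scatter plot with a trend line can be sketched as below, fitting a straight line by least squares with `np.polyfit`. The helper `scatter_with_trend` is hypothetical (it is not taken from this repository), and the sample points are deliberately perfectly linear for illustration, unlike the near-flat trends reported for the real data.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def scatter_with_trend(ax, x, y, xlabel, ylabel):
    """Hypothetical helper: scatter x vs y and overlay a least-squares line."""
    slope, intercept = np.polyfit(x, y, deg=1)  # degree-1 fit: y = slope*x + intercept
    ax.scatter(x, y, alpha=0.6)
    xs = np.linspace(x.min(), x.max(), 100)
    ax.plot(xs, slope * xs + intercept, color="red", label="Trend line")
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    ax.legend()
    return slope, intercept


# Illustrative points lying exactly on y = 0.002 * x + 46 (assumed values).
df = pd.DataFrame({
    "funding_per_student_usd": [8000.0, 10000.0, 12000.0, 14000.0, 16000.0],
    "avg_test_score_percent": [62.0, 66.0, 70.0, 74.0, 78.0],
})

fig, ax = plt.subplots(figsize=(6, 4))
slope, intercept = scatter_with_trend(
    ax,
    df["funding_per_student_usd"].to_numpy(),
    df["avg_test_score_percent"].to_numpy(),
    "Funding per student (USD)",
    "Average test score (%)",
)
ax.set_title("Funding vs Test Scores (with Trend Line)")

# Render to an in-memory PNG; pass a file path instead to write to disk.
buf = io.BytesIO()
fig.savefig(buf, format="png")
```

The same helper can be reused for the other two pairs by passing different columns and axis labels.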
