Exploratory Data Analysis (EDA)
Summary Report
1. Dataset Overview
• Observations: 1,000
• Features (Columns): 20
• Data Types:
- Numerical: 14
- Categorical: 6
2. Missing Data Analysis
Feature        Missing Count   Missing %   Suggested Imputation
Age            75              7.5%        Mean (MCAR)
Gender         30              3.0%        Mode (MAR)
Credit Score   100             10.0%       Median (MAR)
Types of Missingness:
- MCAR (Missing Completely at Random)
- MAR (Missing at Random)
- MNAR (Missing Not at Random)
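The suggested imputations can be sketched in a few lines. This is a minimal illustration using only Python's standard library, not the method used to produce this report. The `impute` helper and the toy columns are hypothetical, and missing values are assumed to be represented as `None`.

```python
from statistics import mean, median, mode

def impute(values, strategy):
    """Replace None entries with the mean, median, or mode of the observed values."""
    observed = [v for v in values if v is not None]
    fill = {"mean": mean, "median": median, "mode": mode}[strategy](observed)
    return [fill if v is None else v for v in values]

# Hypothetical toy columns (not the report's data).
age = [25, None, 35, 45]
gender = ["Male", "Female", None, "Male"]
credit_score = [600, 700, None, 690, 850]

age_filled = impute(age, "mean")                # numerical, assumed MCAR
gender_filled = impute(gender, "mode")          # categorical
credit_filled = impute(credit_score, "median")  # numerical, robust to outliers
```

Single-value imputation like this shrinks the variance of the filled column, so it is best treated as a baseline rather than a final choice.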
3. Descriptive Statistics (Numerical Features)
Feature        Mean     Median   Std Dev   Min      Max
Age            34.5     34       9.2       18       65
Income         55,000   52,000   14,000    20,000   120,000
Credit Score   680      690      85        400      850
4. Categorical Analysis
Feature   Unique Values            Most Frequent   Mode %
Gender    Male, Female             Male            62%
Status    Single, Married, Other   Single          48%
5. Feature Correlation (Top Numeric Features)
               Age    Income   Credit Score
Age            1.00   0.45     0.30
Income         0.45   1.00     0.55
Credit Score   0.30   0.55     1.00
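A correlation matrix of this kind can be computed with the Pearson coefficient. The sketch below assumes Pearson correlation was used (the report does not say which coefficient it reports), and the `pearson` helper and sample columns are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical toy columns (not the report's data).
columns = {
    "Age": [22, 35, 47, 51, 29],
    "Income": [30000, 52000, 61000, 75000, 41000],
    "Credit Score": [610, 690, 700, 720, 640],
}

# Build the full symmetric matrix as a nested dict.
corr = {a: {b: pearson(columns[a], columns[b]) for b in columns} for a in columns}
```

The diagonal is 1 by construction and the matrix is symmetric, which matches the layout of the table above.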
6. Target Variable Analysis (Delinquency)
• Positive Cases (Delinquent): 220
• Negative Cases (Non-delinquent): 780
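• Imbalance Ratio: approximately 1:3.5 (22% delinquent)
The classes are imbalanced, so training may benefit from methods such as SMOTE (synthetic oversampling of the minority class) or class reweighting.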
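As an illustration of reweighting, the class counts above can be turned into inverse-frequency weights. This sketch assumes the common "balanced" heuristic, weight = n_samples / (n_classes × class_count). The function name and class labels are hypothetical.

```python
def balanced_class_weights(counts):
    """Inverse-frequency weights: total / (number of classes * class count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}

# Class counts taken from the target variable analysis above.
counts = {"delinquent": 220, "non_delinquent": 780}
weights = balanced_class_weights(counts)

# Ratio of negatives to positives, i.e. the imbalance ratio.
imbalance = counts["non_delinquent"] / counts["delinquent"]
```

With these counts, the delinquent class gets a weight of about 2.27 and the non-delinquent class about 0.64, so each class contributes equally to a weighted loss in aggregate.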
7. Model Readiness & Recommendations
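• Handle missing data via mean/median/mode imputation, as suggested in Section 2.
• Address class imbalance with resampling techniques (e.g., SMOTE) or class reweighting.
• Scale numerical features and encode categorical features.
• Consider logistic regression as a starting model for binary classification.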
• Evaluate models using confusion matrix, AUC-ROC, F1 score, precision, and recall.
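The threshold-based metrics listed above all derive from the confusion matrix. The sketch below shows the arithmetic. The confusion-matrix counts are hypothetical (chosen only to be consistent with 220 positives and 780 negatives), not results from any model in this report. AUC-ROC is omitted because it needs predicted scores rather than a single confusion matrix.

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)   # share of predicted positives that are correct
    recall = tp / (tp + fn)      # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

# Hypothetical counts: 150 + 70 = 220 delinquent, 50 + 730 = 780 non-delinquent.
metrics = classification_metrics(tp=150, fp=50, fn=70, tn=730)
```

In this hypothetical case accuracy is 0.88 while recall is only about 0.68, which is why accuracy alone can be misleading on imbalanced data.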