
EDA Summary Report

The Exploratory Data Analysis (EDA) report covers a dataset with 1,000 observations and 20 features, including missing data analysis and descriptive statistics. Key findings include missing values in age, gender, and credit score, with suggested imputations and a notable imbalance in the target variable of delinquency. Recommendations for model readiness include handling missing data, addressing class imbalance, and evaluating models using various performance metrics.

Uploaded by

2005amansheikh
Copyright: © All Rights Reserved

Exploratory Data Analysis (EDA) Summary Report

1. Dataset Overview
• Observations: 1,000
• Features (Columns): 20
• Data Types:
- Numerical: 14
- Categorical: 6

2. Missing Data Analysis


Feature        Missing Count   Missing %   Suggested Imputation
Age            75              7.5%        Mean (MCAR)
Gender         30              3.0%        Mode (MAR)
Credit Score   100             10.0%       Median (MAR)

Types of Missingness:
- MCAR (Missing Completely at Random)
- MAR (Missing at Random)
- MNAR (Missing Not at Random)
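
The missing-count, missing-percentage and imputation steps summarized above can be sketched as follows. This is a minimal illustration using only the Python standard library; the helper names (`missing_summary`, `impute`) and the toy columns are assumptions for the example, not part of the report, and `None` is assumed to mark a missing value.

```python
from statistics import mean, median, mode

def missing_summary(column):
    """Return (missing_count, missing_percent) for a list that uses None for gaps."""
    missing = sum(1 for v in column if v is None)
    return missing, 100.0 * missing / len(column)

def impute(column, strategy):
    """Fill None entries with the mean, median, or mode of the observed values."""
    observed = [v for v in column if v is not None]
    fill = {"mean": mean, "median": median, "mode": mode}[strategy](observed)
    return [fill if v is None else v for v in column]

# Hypothetical toy columns, not the report's data.
age = [25, None, 35, 40]
gender = ["Male", "Female", None, "Male"]

count, pct = missing_summary(age)        # 1 of 4 values is missing
age_filled = impute(age, "mean")         # gap filled with the mean of 25, 35, 40
gender_filled = impute(gender, "mode")   # gap filled with the most frequent value

# The report's own figure for Age follows the same arithmetic: 75 of 1,000 rows.
age_missing_pct = 100.0 * 75 / 1000
```

Simple fill-in imputation of this kind is usually considered safest when values are MCAR; under MAR or MNAR it may bias the result, so the choice is worth revisiting during modeling.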


3. Descriptive Statistics (Numerical Features)


Feature        Mean     Median   Std Dev   Min      Max
Age            34.5     34       9.2       18       65
Income         55,000   52,000   14,000    20,000   120,000
Credit Score   680      690      85        400      850
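
As a rough sketch of how the columns of this table might be computed, the snippet below uses the standard-library `statistics` module. The function name `describe` and the sample values are hypothetical. The report does not say whether its standard deviation is the sample or population version; the sample version (`stdev`) is assumed here, and missing values are assumed to be skipped.

```python
from statistics import mean, median, stdev

def describe(values):
    """Summarize a numeric column, ignoring None (missing) entries."""
    observed = [v for v in values if v is not None]
    return {
        "mean": mean(observed),
        "median": median(observed),
        "std": stdev(observed),  # sample standard deviation (n - 1 denominator)
        "min": min(observed),
        "max": max(observed),
    }

# Hypothetical ages, not the report's data.
ages = [18, 30, None, 34, 40, 65]
stats = describe(ages)
```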

4. Categorical Analysis

Feature   Unique Values            Most Frequent   Mode %
Gender    Male, Female             Male            62%
Status    Single, Married, Other   Single          48%
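
A summary like the one in this table could be produced along the following lines. The helper `categorical_summary` and the sample list are illustrative assumptions. The report does not state whether "Mode %" is taken over all rows or only non-missing rows; this sketch assumes non-missing rows.

```python
from collections import Counter

def categorical_summary(values):
    """Return unique values, the most frequent value, and its share in percent."""
    observed = [v for v in values if v is not None]
    counts = Counter(observed)
    top, top_count = counts.most_common(1)[0]
    return {
        "unique": sorted(counts),
        "most_frequent": top,
        "mode_pct": 100.0 * top_count / len(observed),
    }

# Hypothetical marital-status column, not the report's data.
status = ["Single", "Married", "Single", "Other", None, "Married", "Single"]
summary = categorical_summary(status)
```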

5. Feature Correlation (Top Numeric Features)


Feature        Age    Income   Credit Score
Age            1.00   0.45     0.30
Income         0.45   1.00     0.55
Credit Score   0.30   0.55     1.00
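
The report does not name the correlation measure; assuming it is the Pearson coefficient, a matrix of this shape could be computed as sketched below. The function names and the toy columns are hypothetical, and rows with a missing value in either column are assumed to be dropped pairwise.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation over the pairs where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Build a nested dict {row_name: {col_name: r}} for every pair of columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical toy data (income in thousands), not the report's data.
data = {
    "Age": [20, 30, 40, 50],
    "Income": [30, 50, 45, 80],
}
matrix = correlation_matrix(data)
```

The diagonal is 1.00 by construction and the matrix is symmetric, which matches the layout of the table above.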

6. Target Variable Analysis (Delinquency)


• Positive Cases (Delinquent): 220
• Negative Cases (Non-delinquent): 780
• Imbalance Ratio: 1:3.5
Delinquent cases make up 22% of the observations, so the target is imbalanced; techniques such as SMOTE or class reweighting may be needed.
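
The arithmetic behind these figures, together with one common reweighting scheme, is sketched below. The counts come from the report; the inverse-frequency formula `total / (n_classes * class_count)` is an assumption about which reweighting might be used (it is the same heuristic scikit-learn applies for `class_weight="balanced"`), and the function name is hypothetical.

```python
positives = 220   # delinquent cases, from the report
negatives = 780   # non-delinquent cases, from the report
total = positives + negatives

positive_rate = positives / total   # share of delinquent cases
ratio = negatives / positives       # negatives per positive, roughly 3.5

def balanced_weights(counts):
    """Inverse-frequency class weights: total / (n_classes * class_count)."""
    n_total = sum(counts.values())
    n_classes = len(counts)
    return {label: n_total / (n_classes * n) for label, n in counts.items()}

weights = balanced_weights({"delinquent": positives, "non_delinquent": negatives})
```

With these weights each class contributes the same total weight (500 each here), so the minority class is no longer outvoted simply because it is rarer.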

7. Model Readiness & Recommendations


• Handle missing data via mean/median/mode imputation.
• Address class imbalance with resampling (e.g. SMOTE) or class reweighting.
• Scale numerical features and encode categorical features.
• Consider logistic regression for binary classification.
• Evaluate models using confusion matrix, AUC-ROC, F1 score, precision, and recall.
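
To make the evaluation metrics in the last recommendation concrete, the sketch below derives a confusion matrix, precision, recall and F1 from binary labels. The labels and predictions are hypothetical, the function names are assumptions for illustration, and 1 is assumed to mean "delinquent". AUC-ROC is left out because it needs predicted scores rather than hard labels.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1, returning 0.0 where a ratio is undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Hypothetical labels and predictions, not model output from the report.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
```

On imbalanced targets such as this one, precision, recall and F1 on the delinquent class tend to be more informative than plain accuracy, which a model can inflate by always predicting the majority class.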
