SuperLender-Loan-Prediction

This project predicts whether a loan will be Good (1) or Bad (0) using machine learning models. The goal is to help SuperLender assess credit risk effectively.

SuperLender is a digital lending company struggling with loan defaults. Using customer demographics, loan history, and repayment behavior, this project builds machine learning models to predict the likelihood that a new loan will be repaid (“Good”) or default (“Bad”). The analysis revealed that loan amount, repayment history, and employment status were among the strongest predictors of creditworthiness. With the best-performing model (Random Forest, 95.87% accuracy), SuperLender can automate risk scoring and reduce default rates by focusing on high-risk applicants before disbursing loans.

Datasets

traindemographics.csv – Customer demographic information
trainperf.csv – Performance of current loans (target: good_bad_flag)
trainprevloans.csv – History of previous loans
test... files – Similar structure for testing

Features

Data cleaning (handling nulls, datetime conversion)
Feature engineering (loan ratios, repayment gaps, etc.)
Correlation analysis via heatmap
Multiple ML models:
- Logistic Regression
- SVM
- KNN
- LightGBM
- XGBoost
- Gradient Boosting
Model comparison and evaluation using accuracy, ROC-AUC, and confusion matrix.

Data Cleaning

-Converted date columns to datetime Removed null values and irrelevant columns Merged all train and test data into unified datasets

Feature Engineering Created ratios such as totaldue / loanamount Calculated repayment duration and loan term features Encoded categorical variables (education, bank name, etc.)
Exploratory Data Analysis Correlation analysis with heatmap Visualization of loan amount distribution and term duration Identification of key predictors for default.

Key Insights

-Customers with shorter repayment gaps and stable employment have better repayment behavior. -Loan amount and education level strongly influence credit performance. -Ensemble models (Random Forest, LightGBM, XGBoost) achieved the best overall accuracy and ROC-AUC.

Results

LightGBM achieved the best performance (highest ROC-AUC).
Feature importance plots highlight key predictors of loan repayment.

Technologies

Python 3
Pandas, NumPy
Scikit-learn
LightGBM, XGBoost
Matplotlib, Seaborn
Jupyter Notebook

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
README.md		README.md
loan.ipynb		loan.ipynb
submission.csv		submission.csv
testdemographics.csv		testdemographics.csv
testperf.csv		testperf.csv
testprevloans.csv		testprevloans.csv
traindemographics.csv		traindemographics.csv
trainperf.csv		trainperf.csv
trainprevloans.csv		trainprevloans.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

SuperLender-Loan-Prediction

This project predicts whether a loan will be Good (1) or Bad (0) using machine learning models. The goal is to help SuperLender assess credit risk effectively.

Datasets

Features

Data Cleaning

Key Insights

Results

Technologies

Thank you for viewing this project! Feedback and suggestions are welcome.

About

Uh oh!

Releases

Packages

Languages

ZainabAbari/SuperLender-Loan-Prediction

Folders and files

Latest commit

History

Repository files navigation

SuperLender-Loan-Prediction

This project predicts whether a loan will be Good (1) or Bad (0) using machine learning models. The goal is to help SuperLender assess credit risk effectively.

Datasets

Features

Data Cleaning

Key Insights

Results

Technologies

Thank you for viewing this project! Feedback and suggestions are welcome.

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages