Bank Loan Case Study

The Bank Loan Case Study demonstrates the application of Exploratory Data Analysis (EDA) in banking risk analytics, focusing on data cleaning, outlier detection, and visualization using MS Excel. The analysis involved two datasets: current and previous loan applications, revealing insights on loan defaults and factors influencing loan approval. Key findings include the impact of income, age, and education on loan applications and defaults, as well as the challenges of handling large datasets with missing values and outliers.

Uploaded by

Nitika

Trainity Project-6

Project-6: BANK LOAN CASE STUDY

Description:
This case study demonstrates the application of EDA in a real-world business
environment. In addition to applying the techniques learned in the EDA module,
you will gain a basic grasp of risk analytics in banking and financial
services, and of how data is used to reduce the risk of losing money when
lending to consumers.

Approach:
For this project, my approach was to analyse the dataset and clean it by finding
the blanks and missing values, then imputing them with an appropriate method
(mean, median, or mode). I then looked for outliers. The dataset contains some
anomalies, such as negative values, that need to be either deleted or
standardized. After that, I used pivot tables and basic charts to visualise the
data and drew insights based on my understanding.

Tech-Stack:
• MS Excel (2023)
• Dataset provided (Bank Dataset)

Insights:
The dataset contains 3 files:
1. application_data.csv: contains all the information about the client at the
time of application. The data indicates whether a client has payment
difficulties.
2. previous_application.csv: contains information about the client's previous
loans. It records whether each previous application was Approved, Cancelled,
Refused, or an Unused offer.
3. columns_descrption.csv: a data dictionary that describes the meaning of the
variables.

Both sets of data contained many columns that are not useful for risk
analytics, as well as many blanks, so I cleaned up the data.
Following the data cleaning procedure, I split the columns in the dataset into
two categories of variables:
1) Categorical variables
2) Numerical variables
Categorical variables (non-numerical variables): for example, a person's
occupation or education status.
Numerical variables: for example, income or credit amount.
The following are some of the categorical and numerical variables from the
provided dataset.

Categorical variables    Numeric variables

Gender                   Age
Name contract type       Days employed
Income type              Amount Income
Education                Amount Annuity
Housing type             Amount Credit

I completed a full EDA on the current application data and then on the previous
application data. In this report, I summarize the results for both datasets and
provide business insights.

Current application.csv

Task 2 (Find Missing Data):

Importing the dataset into Excel:

Imputing the missing values using the mean, median, and mode:
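
The imputation step was done in Excel. As a rough illustration of the same idea, the sketch below fills missing entries with the mean, median, or mode of the observed values using only the Python standard library. The helper name `impute` and the sample values are hypothetical and are not taken from the dataset.

```python
from statistics import mean, median, mode

def impute(values, strategy):
    """Replace None entries with the mean, median, or mode of the observed values."""
    observed = [v for v in values if v is not None]
    fill = {"mean": mean, "median": median, "mode": mode}[strategy](observed)
    return [fill if v is None else v for v in values]

# Hypothetical sample columns (illustrative values only).
annuity = [24700.5, None, 6750.0, 29686.5]             # continuous amount
family_size = [2, 2, None, 1, 4]                       # small counts, possibly skewed
income_type = ["Working", None, "Pensioner", "Working"]  # categorical

annuity_filled = impute(annuity, "mean")
family_filled = impute(family_size, "median")
income_type_filled = impute(income_type, "mode")
```

A common rule of thumb, assumed here rather than stated in the report, is to use the mode for categorical columns and to prefer the median over the mean when a numeric column is skewed or contains outliers.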

3. Outliers can only be identified on numeric variables.

Box plots of the Target column vs:

1) Amount credit
2) Amount Income
3) Amount Annuity
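
Box plots conventionally flag points that lie more than 1.5 times the interquartile range (IQR) beyond the quartiles. The sketch below applies that rule directly. It is an assumption that this matches how the Excel charts drew their whiskers, and the income values are hypothetical.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical income column with one extreme value.
incomes = [90000, 112500, 135000, 157500, 180000, 202500, 225000, 1170000]
flagged = iqr_outliers(incomes)
```

Whether a flagged value is deleted, capped, or kept is a separate judgment call. The rule only identifies candidates.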

Data imbalance: Data imbalance occurs when the values of a variable are
distributed unevenly across its classes. I plotted data imbalance using pivot
charts.

Pivot chart: NAME CONTRACT TYPE
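
The imbalance shown in a pivot chart can also be expressed as numbers. The sketch below computes each class's share of a column and the ratio between the largest and smallest class. The labels and counts are hypothetical and only illustrate the calculation.

```python
from collections import Counter

def class_shares(labels):
    """Return each label's share of the total number of rows."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

# Hypothetical contract-type column: 9 rows of one class, 1 of another.
contract_type = ["Cash loans"] * 9 + ["Revolving loans"] * 1

shares = class_shares(contract_type)
imbalance_ratio = max(shares.values()) / min(shares.values())
```

A ratio close to 1 indicates a balanced column, while a large ratio indicates that one class dominates.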

Task 5 (EDA):
Univariate Analysis:

INFERENCE
Individuals with higher incomes are less likely to apply for loans. The credit
amount of a bank loan is typically in the range of 45,000 to 1,045,000. The
majority of loan applications come from people between the ages of 35 and 50.
Those with 0 to 8 years of work experience are the most likely to seek loans.
Individuals who own homes are more likely to apply for loans than others.
Married applicants have taken out more loans. Working people have requested
more loans. Applicants who came unaccompanied have requested more loans.

Charts: Amount Income, Amount Credit, Age, Name suite type

Bivariate Analysis:
INFERENCE
Customers who live in low-rated regions have higher default rates.
Individuals with lower incomes are more likely to default. Young people are
more likely to default, and the share of defaulters declines with age. Women
are less likely than men to default. Default rates are higher among applicants
who are on maternity leave or unemployed. Customers with more than five family
members are more likely to default on their bank loan. Customers with fewer
educational qualifications are more likely to default on a bank loan. Customers
with little work experience are more likely to default.
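
Each of these comparisons comes down to a default rate per group, which is what a pivot table of Target by category produces. A minimal sketch follows, assuming that TARGET is 1 for a client with payment difficulties and 0 otherwise. The key name `GENDER` and the rows are hypothetical.

```python
from collections import defaultdict

def default_rate_by_group(rows, key):
    """Return the share of rows with TARGET == 1 within each value of `key`."""
    totals = defaultdict(int)
    defaults = defaultdict(int)
    for row in rows:
        totals[row[key]] += 1
        defaults[row[key]] += row["TARGET"]  # assumes TARGET is 0 or 1
    return {group: defaults[group] / totals[group] for group in totals}

# Hypothetical rows (illustrative only).
rows = [
    {"GENDER": "F", "TARGET": 0},
    {"GENDER": "F", "TARGET": 0},
    {"GENDER": "F", "TARGET": 1},
    {"GENDER": "F", "TARGET": 0},
    {"GENDER": "M", "TARGET": 1},
    {"GENDER": "M", "TARGET": 0},
]
rates = default_rate_by_group(rows, "GENDER")
```

Comparing rates rather than raw counts matters here because the groups differ in size, so a larger group can show more defaults even when its default rate is lower.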

Charts: Region Rating Client vs Target, Amount Income vs Target, Age vs Target,
Gender vs Target, Income Type vs Target, Family Members vs Target, Education
Type vs Target, Months Employed vs Target

Task 6 (Finding top 10 correlations):

Top 10 driving factors in current application.csv

1. Income type
2. Count of Family Members
3. Children count
4. External source
5. Region rating of client
6. Age
7. Months Employed
8. Amount credit
9. Amount Goods Price
10. Amount total income
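
The report does not say how the factors were ranked. One common approach, sketched below as an assumption, is to compute the Pearson correlation of each numeric column with the Target column and sort by absolute value. The column names and values are hypothetical. Categorical factors such as income type would first need a numeric encoding, which this sketch does not cover.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def rank_by_abs_correlation(columns, target, top=10):
    """Rank columns by the absolute value of their correlation with the target."""
    scores = {name: pearson(values, target) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)[:top]

# Hypothetical data (illustrative only).
target = [0, 0, 1, 1]
columns = {
    "CHILDREN_COUNT": [0, 1, 0, 1],
    "AGE": [50, 40, 30, 20],
}
ranked = rank_by_abs_correlation(columns, target)
```

Sorting by absolute value keeps strongly negative relationships, such as age in this toy example, near the top of the list.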

Previous application.csv

Task 2 (Data Cleaning):

Column removal: I used the COUNTBLANK function to determine the number of
blanks in each column, and if the share of blanks exceeded 5%, I removed the
column. I also removed a couple of columns that were of no use to the analysis.
The dataset has 1,670,214 rows, whereas an Excel worksheet has a maximum of
1,048,576 rows, and the project requires that only Excel be used for the
analysis. The analysis was therefore limited to 1,048,576 rows.
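
The 5% rule and the row limit can be sketched as follows. The function `blank_share` mirrors what COUNTBLANK divided by the row count gives in Excel. The table contents and the column name `SPARSE_COLUMN` are hypothetical, while the two row counts are the ones quoted above.

```python
def blank_share(column):
    """Share of entries that are missing (None or an empty/whitespace string)."""
    blanks = sum(1 for value in column if value is None or str(value).strip() == "")
    return blanks / len(column)

def drop_sparse_columns(table, threshold=0.05):
    """Keep only the columns whose blank share does not exceed the threshold."""
    return {name: col for name, col in table.items() if blank_share(col) <= threshold}

# Hypothetical table stored as column name -> list of values.
table = {
    "AMT_APPLICATION": [17145, 607500, 112500, 450000],
    "SPARSE_COLUMN": [0.1, "", None, 0.3],
}
kept = drop_sparse_columns(table)

# Rows that do not fit into a single Excel worksheet.
ROWS_IN_FILE = 1_670_214
EXCEL_MAX_ROWS = 1_048_576
rows_excluded = ROWS_IN_FILE - EXCEL_MAX_ROWS
share_analysed = EXCEL_MAX_ROWS / ROWS_IN_FILE
```

Roughly 63% of the rows fit within the limit. If the file is ordered in some way, the retained rows may not be representative of the full dataset, which is worth keeping in mind when reading the results.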

Task 3 (Finding Outliers):

Task 4 (Data Imbalance):

Below are the columns where data is unevenly distributed.

Task 5 (EDA):
Univariate Analysis:

Inference

Customers have largely chosen cash loans and consumer loans. The majority of our
clients are repeat customers.

The majority of current loan applicants are individuals who applied for a loan
less than ten months ago. More loans have been requested for consumer
electronics.

Bivariate Analysis:
Inference
Customers who applied for more than Rs. 350,000 will most likely be refused. The
majority of loans sought through credit and cash offices are cancelled. Most
loans to new clients were approved. So far, car loan applications have been
refused. Loans made to MLM partner clients are likely to be cancelled. Nearly
80% of the loans were approved, with refusals occurring at a steady rate.
Consumer loans have almost no cancellations and the highest approval rate.
Several loans in the first seller place area group were cancelled. Clients who
apply for another loan within 10 months of their previous loan are more likely
to have it cancelled. Walk-in loans have a higher refusal rate.

Task 6 (Finding Correlations): Top ten reasons for loan cancellation and refusal

1. Amount Application
2. Cash loan Purpose
3. Goods Category
4. Product Combination
5. Product type
6. Channel type
7. Months Decision
8. Contract type
9. Client type
10. Payment type

Task 7 (Combining two sheets): I then ran the analysis on the common set of
data by joining the Target column to the previous application table. I used
MySQL for the join: I loaded the data into MySQL Workbench and ran the
following query.

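Query:
-- Attach each client's TARGET flag to that client's previous applications.
-- Table aliases avoid ambiguity, since SK_ID_CURR exists in both tables.
-- Assumption: every column other than TARGET comes from previous_application.
SELECT a.TARGET,
       p.SK_ID_CURR,
       p.NAME_CONTRACT_TYPE,
       p.AMT_APPLICATION,
       p.NAME_CASH_LOAN_PURPOSE,
       p.NAME_CONTRACT_STATUS,
       p.NAME_CLIENT_TYPE,
       p.DAYS_DECISION,
       p.CODE_REJECT_REASON,
       p.NAME_SELLER_INDUSTRY,
       p.NAME_PORTFOLIO,
       p.NAME_PRODUCT_TYPE,
       p.CHANNEL_TYPE,
       p.SELLERPLACE_AREA,
       p.NAME_YIELD_GROUP,
       p.PRODUCT_COMBINATION
FROM application_data AS a
JOIN previous_application AS p
  ON a.SK_ID_CURR = p.SK_ID_CURR;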
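
To make the join semantics concrete, the sketch below performs the same kind of inner join on SK_ID_CURR in plain Python. It assumes SK_ID_CURR is unique in the application data, and the rows are hypothetical.

```python
def attach_target(application_rows, previous_rows):
    """Inner join: add TARGET to each previous application with a matching client."""
    # Assumes each SK_ID_CURR appears once in the application data.
    target_by_id = {row["SK_ID_CURR"]: row["TARGET"] for row in application_rows}
    return [
        {**prev, "TARGET": target_by_id[prev["SK_ID_CURR"]]}
        for prev in previous_rows
        if prev["SK_ID_CURR"] in target_by_id
    ]

# Hypothetical rows (illustrative only).
application_rows = [
    {"SK_ID_CURR": 100001, "TARGET": 0},
    {"SK_ID_CURR": 100002, "TARGET": 1},
]
previous_rows = [
    {"SK_ID_CURR": 100001, "NAME_CONTRACT_STATUS": "Approved"},
    {"SK_ID_CURR": 100001, "NAME_CONTRACT_STATUS": "Refused"},
    {"SK_ID_CURR": 100003, "NAME_CONTRACT_STATUS": "Canceled"},
]
joined = attach_target(application_rows, previous_rows)
```

A client with several previous applications appears once per application in the result, and clients without any previous application drop out entirely. Both effects influence any default rate computed on the joined data.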

Pivot table analysis:

Clients who have applied for previous loans have no defaults in current loans.

Excel file:
https://docs.google.com/spreadsheets/d/1HqSNv0NdM7yg0Q0uC0IL7x_DsXt6JgLD/edit?usp=drive_link&ouid=115404029938861642621&rtpof=true&sd=true

RESULT:

This project involved extensive use of Excel. The major challenge was working
with such a large amount of data, and the project helped me understand how to
work with large datasets and how two datasets are merged for analysis. The
dataset contained a lot of missing data and outliers. Handling them was a task
in itself, and it helped me understand the what, how, and why of handling
outliers and null values. The project also helped me discover new add-ins such
as Data Analysis.
