Project Documents

This document discusses the development of a machine learning model for predicting loan approval, focusing on reducing loan defaults and enhancing accuracy in the banking sector. It explores various algorithms, particularly the decision tree method, and emphasizes the importance of data privacy in handling sensitive borrower information. The study aims to automate the loan approval process by analyzing applicant data to identify eligible borrowers and mitigate risks for investors in peer-to-peer lending scenarios.

Uploaded by

paldibya712

Prediction Of Loan Approval Using Machine Learning Algorithm

ABSTRACT :
In the banking system, banks sell a wide range of products, but credit lines are their primary source of
revenue, since they profit from the interest on the loans they extend. A bank's profit or loss is
mostly determined by whether its customers repay their loans or default. Identifying likely loan
defaulters can help the bank lower its non-performing assets, which makes this problem important to
study. Prior studies in this area have demonstrated a wide range of approaches to loan default
control. Because accurate forecasts are essential for maximizing earnings, it is important to examine
the nature of these approaches and compare them.
Loan defaulter prediction is studied here using a key predictive analytics technique: the logistic
regression model. The data for analysis and forecasting were collected from Kaggle. Several
performance metrics have been calculated for the logistic regression models, and metrics such as
sensitivity and specificity are used to compare the models. The final results show that the models
produce different outputs.

INTRODUCTION:
Small loans play a crucial role in our daily lives by enabling would-be business owners to launch
their ventures, enabling inquisitive students to afford higher education that would otherwise be
unaffordable without a steady source of income, and—above all—enabling regular people without
friends or family to get short-term financial support and get back on their feet so they can pursue
the American Dream. That being said, there is a chance of default with any loan. Financially
speaking, default refers to the inability to fulfill a loan's legal duty to repay principal and interest.
It's a prevalent issue in the financial sector and one of the main dangers of making loans. Of course,
most of the time, default does not occur, and the lending banks are typically able to recover the
loss from a loan that defaults by using other loans that have been fully paid back, along with the
interest that goes along with them. Additionally, banks are trading off a higher likelihood of default
with a bigger profit from the high interest rate when they offer loans with higher interest rates to
those who have a high potential of defaulting. All things considered, default is inevitable, and the
majority of financial institutions have a well-established procedure to absorb the loss and lessen its
effects. However, what if the loan is made up of money from multiple investors rather than being
issued by a single bank? Lending Club is one of many peer-to-peer lending companies that create
this unusual situation. Peer-to-peer lending companies, to put it simply, serve as a
middleman between investors and borrowers. The business establishes a platform on which
investors can look for and select the small, unsecured personal loans they want to invest in.
The company receives a cut from both parties (an origination fee from borrowers and a service charge
from investors), investors earn from the loan interest, and borrowers get the loan they desire. This
implies that a single bank no longer bears the entire loss in the event of a loan default; rather,
some or all of the investors do. If all of the loans came from a single lender, the total profit might
be positive because other fully repaid loans could make up the loss. However, if investors do
not diversify, there will be winners and losers under this new type of lending practice. Predicting
whether a certain loan will default, based on the borrower's initial information and credit report,
is a clear solution to this issue. Lending Club already has its own methodology in place for
approving the loans that are listed on its website.
In order to reduce the burden placed on individual investors, this study examines the steps
and outcomes involved in developing a new machine learning model that can forecast a loan
default. More significantly, the model concentrates on reducing the total loss on
investment from bad loans. The study also examines privacy-preserving measures for sensitive
data obtained from the borrower's credit record. The ultimate objective is to
assess a condensed form of RAPPOR (Randomized Aggregatable Privacy-Preserving Ordinal
Response) and ascertain whether data randomized by this algorithm can still be used to forecast loan default.
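
To make the privacy idea concrete, the following is a minimal Python sketch of a randomized-response step in the style of RAPPOR's permanent randomized response. It is an illustration under stated assumptions, not the condensed form evaluated in the study: the Bloom-filter hashing and the second (instantaneous) randomization of full RAPPOR are omitted, and the noise parameter f, the function names, and the sample flags are all hypothetical. Each reported bit is replaced by a fair coin flip with probability f, so a single report reveals little, while the aggregate rate can still be estimated.

```python
import random

def permanent_randomized_response(bits, f, rng):
    """With probability f replace each bit by a fair coin flip, else keep it."""
    noisy = []
    for b in bits:
        if rng.random() < f:
            noisy.append(1 if rng.random() < 0.5 else 0)
        else:
            noisy.append(b)
    return noisy

def estimate_true_rate(observed_rate, f):
    """Invert E[observed] = f/2 + (1 - f) * true_rate."""
    return (observed_rate - f / 2) / (1 - f)

rng = random.Random(0)
f = 0.5  # hypothetical noise level

# Hypothetical sensitive flag, invented for illustration:
# 1 = borrower has a past delinquency, 0 = no delinquency.
true_flags = [1] * 3000 + [0] * 7000

reported = [permanent_randomized_response([b], f, rng)[0] for b in true_flags]
observed_rate = sum(reported) / len(reported)
estimated_rate = estimate_true_rate(observed_rate, f)
print(f"observed {observed_rate:.3f}, estimated true rate {estimated_rate:.3f}")
```

The estimate recovers the population rate only approximately, and only in aggregate. Whether a per-borrower default model trained on such randomized features remains useful is the question the study sets out to assess.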

LITERATURE SURVEY:
Amira Kamil Ibrahim Hassan and Ajith Abraham (2008) built a loan default prediction model on a
supervised two-layer feed-forward network trained with three distinct training procedures. Their
findings indicate that the training method improves the loan default prediction model.
Angelini (2008) used a neural network with standard topology and a feed-forward neural network
with ad hoc connections as prediction models. The paper reports that both models give good
results with low error.
Ngai (2009) reviewed 87 papers published between 2000 and 2006 on data mining techniques and
applications in CRM. Classification models are used to predict future customer behavior, and
neural networks are the most frequently used model in the CRM field.
Dr. A. Chitra and S. Uma (2010) introduced an ensemble learning method for time series
prediction based on Radial Basis Function (RBF) networks, K-Nearest Neighbor (KNN) and
Self-Organizing Map (SOM). Their proposed model, PAPEM, performs better than the
individual models.
Akkoç (2012) used a hybrid Adaptive Neuro-Fuzzy Inference model that combines
statistics and a neuro-fuzzy network. 10-fold cross-validation is used to obtain more reliable
results and to compare the model with other models.
Sarwesh Site and Dr. Sadhna K. Mishra (2013) proposed a method in which two or more classifiers
are combined to produce an ensemble model for better prediction. They used
bagging and boosting techniques and then the random forest technique.
Maher Alaraj, Maysam Abbod, and Ziad Hunaiti (2014) proposed a new ensemble method for
classifying customer loans. The ensemble is based on neural networks. They state
that the proposed method gives better results and accuracy than a single classifier or
any other model.
M. Alaraj and M. Abbod (2015) introduced a model based on homogeneous and heterogeneous
classifiers. The ensemble is built from three classifiers: logistic artificial neural network,
logistic regression and support vector machine.

PROBLEM STATEMENT :
The goal is to develop and put into operation a system that uses data mining and machine learning to
forecast a user's likelihood of receiving a bank loan, in order to increase accuracy and reduce
fraud. All around the world, banks, home finance companies, and some NBFCs deal in a variety
of loan products, including business, personal, and housing loans. These businesses operate in
urban, semi-urban, and rural regions. After a customer applies for a loan, these companies check
whether the customer is eligible for it. This paper provides a solution to automate
this process by employing a machine learning algorithm. The customer fills in an online loan
application form. The form collects details such as Sex, Marital Status, Qualification, Details of
Dependents, Annual Income, Amount of Loan, Credit History of Applicant and others. To
automate the process, the algorithm first identifies
the customer segments that are eligible for loan amounts, so that the bank can focus on
these customers.

ABOUT THE DATA SET :

Here is the description of the variables present in the dataset:

Variable            Description                             Type
Loan_ID             Unique loan ID                          Object
Gender              Male / Female                           Categorical
Married             Applicant married (Yes/No)              Categorical
Dependents          Number of dependents                    Categorical
Education           Education (Graduate / Not Graduate)     Categorical
Self_Employed       Self-employed (Yes/No)                  Categorical
ApplicantIncome     Applicant's income                      Numerical
CoapplicantIncome   Co-applicant's income                   Numerical
LoanAmount          Loan amount in thousands                Numerical
Loan_Amount_Term    Term of loan in months                  Numerical
Credit_History      Credit history meets guidelines         Numerical
Property_Area       Urban / Semiurban / Rural               Categorical
Loan_Status         Loan approved (Yes/No), the class       Categorical
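
As an illustration of how the variable list above might be used when loading the data, the following Python sketch parses CSV text according to that schema, converting numerical columns and marking empty cells as missing. It is a sketch under assumptions: the column names are written without spaces, and the two sample rows are invented for illustration and are not records from the actual dataset.

```python
import csv
import io

# Column name -> kind, following the variable list above
# (names written without spaces, which is an assumption).
SCHEMA = {
    "Loan_ID": "object",
    "Gender": "categorical",
    "Married": "categorical",
    "Dependents": "categorical",
    "Education": "categorical",
    "Self_Employed": "categorical",
    "ApplicantIncome": "numerical",
    "CoapplicantIncome": "numerical",
    "LoanAmount": "numerical",
    "Loan_Amount_Term": "numerical",
    "Credit_History": "numerical",
    "Property_Area": "categorical",
    "Loan_Status": "categorical",
}

# Hypothetical sample rows, invented for illustration only.
SAMPLE_CSV = """\
Loan_ID,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status
LP000001,Male,No,0,Graduate,No,5000,0,120,360,1,Urban,Yes
LP000002,Female,Yes,1,Not Graduate,No,3000,1500,,360,1,Rural,No
"""

def load_rows(text):
    """Parse CSV text into dicts, typed according to SCHEMA."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {}
        for col, kind in SCHEMA.items():
            value = raw[col].strip()
            if value == "":
                row[col] = None           # missing value, to be imputed later
            elif kind == "numerical":
                row[col] = float(value)
            else:
                row[col] = value
        rows.append(row)
    return rows

rows = load_rows(SAMPLE_CSV)
# Count missing values per column as a first data-quality check.
missing = {col: sum(r[col] is None for r in rows) for col in SCHEMA}
print(missing["LoanAmount"])
```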

PROPOSED SYSTEM :
The decision tree is a machine learning method that efficiently performs both
classification and regression tasks [2]. Decision trees are widely used in
the banking industry because of their high accuracy and their ability to express a statistical model in
plain language. In a decision tree, each node represents a feature (attribute), each link (branch)
represents a decision (rule), and each leaf represents an outcome (a categorical or continuous
value). Using data analytics tools, loan outcomes and their severity can be forecast.
The process requires training a model on the data with different algorithms and then applying the
trained model to a user's data to predict the nature of the loan. Several R functions and packages were used
to prepare the data and to build the classification model. The work indicates that R
is an effective tool for visualization and for applying data mining techniques. Using R packages, a customer's
data can be analyzed, and based on that analysis the bank can sanction or reject the loan. In practice,
customer data sets may contain many missing values, which need to be replaced (imputed) with
valid values generated from the available complete data. The dataset has many
attributes that define the credibility of customers seeking several types of loan. The
values of these attributes can include outliers that do not fit into the regular range of the data.
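
The two data-preparation concerns above, missing values and outliers, can be sketched briefly in Python. This is not the study's R workflow: it swaps in simple median/mode imputation and the 1.5 × IQR rule as stand-ins, and the function names and sample values are hypothetical.

```python
import statistics

def impute(values, kind):
    """Fill None entries with the median (numerical) or the mode (categorical)."""
    present = [v for v in values if v is not None]
    if kind == "numerical":
        fill = statistics.median(present)
    else:
        fill = statistics.mode(present)
    return [fill if v is None else v for v in values]

def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

# Hypothetical applicant incomes and property areas, invented for illustration.
incomes = [2500, 3000, None, 3200, 4100, 3800, 2900, 81000]
areas = ["Urban", None, "Rural", "Urban", "Semiurban"]

filled_incomes = impute(incomes, "numerical")
filled_areas = impute(areas, "categorical")
outliers = iqr_outliers(filled_incomes)
print(filled_incomes, filled_areas, outliers)
```

Flagged outliers are only candidates for review. Whether to cap, transform, or keep them is a modelling decision that the text does not specify.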

ALGORITHM USED :
Decision Tree (DT) is a supervised learning algorithm used to solve both classification and regression
problems. DT uses a tree representation to solve the prediction problem: the internal
nodes of the tree represent attributes and the leaf nodes represent
class labels. The pseudocode for the DT model is given below:

Step 1: The best attribute is chosen as the tree's root.

Step 2: The training set is divided into subsets such that each subset contains records with the same
value for that attribute.
Step 3: Step 1 and Step 2 are repeated on each subset until every branch of the tree ends in a
leaf node.
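
The three steps above can be sketched in Python as follows. The text does not say how the "best" attribute is selected, so this sketch assumes information gain (as in ID3) and categorical attributes only. The training rows are hypothetical, and the function names are illustrative rather than taken from the study.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, attr, target):
    """Reduction in entropy of the target after splitting on attr."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

def build_tree(rows, attrs, target):
    labels = [r[target] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    # A pure subset, or no attributes left, becomes a leaf node.
    if len(set(labels)) == 1 or not attrs:
        return majority
    # Step 1: choose the best attribute for this node.
    best = max(attrs, key=lambda a: information_gain(rows, a, target))
    # Step 2: divide the rows into subsets sharing one value of that attribute.
    children = {}
    for value in sorted({r[best] for r in rows}):
        subset = [r for r in rows if r[best] == value]
        # Step 3: repeat on each subset with the remaining attributes.
        remaining = [a for a in attrs if a != best]
        children[value] = build_tree(subset, remaining, target)
    return {"attr": best, "children": children, "default": majority}

def predict(tree, row):
    """Walk from the root to a leaf; unseen values fall back to the majority."""
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["attr"]], tree["default"])
    return tree

# Hypothetical training rows, invented for illustration only.
data = [
    {"Credit_History": "1", "Education": "Graduate", "Loan_Status": "Yes"},
    {"Credit_History": "1", "Education": "Not Graduate", "Loan_Status": "Yes"},
    {"Credit_History": "0", "Education": "Graduate", "Loan_Status": "No"},
    {"Credit_History": "0", "Education": "Not Graduate", "Loan_Status": "No"},
    {"Credit_History": "1", "Education": "Graduate", "Loan_Status": "Yes"},
    {"Credit_History": "1", "Education": "Not Graduate", "Loan_Status": "No"},
]
tree = build_tree(data, ["Education", "Credit_History"], "Loan_Status")
print(tree["attr"])
```

On this toy data the root split is on Credit_History, because it separates the classes better than Education. Other selection criteria, such as the Gini index, fit the same three-step structure.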

Fig. An Example of Decision Tree Algorithm

RESULT :

VALIDATION :
We use k-fold cross-validation for this problem. With this method, the preprocessed dataset is
divided at random into k disjoint folds. In each round, one fold serves as the test set and the
remaining k − 1 folds form the training set, so the training set and the test set have no records in common.
For example, if we have a dataset of N (= 500) data records, using 10-fold CV (i.e. k = 10), the
number of data records in the training set will be
N × (k − 1)/k = 500 × 9/10 = 450
and the number of data records in the test set will be
N × 1/k = 500 × 1/10 = 50
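
The arithmetic above can be checked with a short Python sketch that produces the k train/test index splits. The function name and the fixed random seed are illustrative assumptions, not details taken from the study.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)          # random assignment to folds
    folds = [indices[i::k] for i in range(k)]     # k disjoint folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

splits = list(k_fold_indices(500, 10))
train, test = splits[0]
print(len(splits), len(train), len(test))
```

Each record appears in exactly one test fold, so every record is used for testing once and for training k − 1 times.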

CONCLUSION :
The analytical process started with data cleaning and processing and missing-value imputation with the mice
package, followed by exploratory analysis and finally model building and evaluation. The best accuracy on the public
test set is 0.721. The analysis yields the following insights about approval. Applicants whose credit history
does not meet the guidelines fail to get approved, probably because they have a higher probability of not paying back.
Most of the time, applicants with high income who request a low loan amount are more likely to get approved,
which makes sense because they are more likely to pay back their loans. Some basic characteristics, such as gender and marital
status, do not seem to be taken into consideration by the company.

REFERENCES :

[1] Dr. A. Chitra and S. Uma, “An Ensemble Model of Multiple Classifiers for Time Series
Prediction”, International Journal of Computer Theory and Engineering, Vol. 2, Issue 3, pp. 454–458,
June 2010.
[2] M. V. Jagannatha Reddy and B. Kavitha, “Extracting Prediction Rules for Loan Default
Using Neural Networks through Attribute Relevance Analysis”, International Journal of Computer
Theory and Engineering, Vol. 2, Issue 4, pp. 596–601, August 2010.
[3] Ms. Neethu Baby, Mrs. Priyanka L.T., “Customer Classification And Prediction Based On
Data Mining Technique”, International Journal of Emerging Technology and Advanced
Engineering, Vol. 2, Issue 12, pp. 314–318, December 2012.
[4] Sivasree M S, Rekha Sunny T, “Loan Credibility Prediction System Based on Decision Tree
Algorithm”, International Journal of Engineering Research & Technology, Vol. 4, Issue 09, pp.
825–830, September 2015.
[5] E. Angelini, A. Roli, and G. di Tollo, “A neural network approach for credit risk evaluation”,
The Quarterly Review of Economics and Finance, Elsevier, Vol. 48, Issue 4, pp. 733–755,
November 2008.
[6] Suresh Ramakrishna, Maryam Mirzaei and Mahmoud Bekri, “Adaboost Ensemble
Classifiers for Corporate Default Prediction”, 1st International Conference of Recent Trends in
Information and Communication Technologies, pp. 258–269, September 2014.
[7] Amira Kamil Ibrahim Hassan and Ajith Abraham, “Modeling Consumer Loan Default
Prediction Using Ensemble Neural Networks”, International Conference on Computing, Electrical
and Electronics Engineering, pp. 719–724, August 2013.
[8] Toby Segaran, “Programming Collective Intelligence: Building Smart Web 2.0
Applications”, O’Reilly Media.
[9] Drew Conway and John Myles White, “Machine Learning for Hackers: Case Studies and
Algorithms to Get you Started”, O’Reilly Media.
[10] Trevor Hastie, Robert Tibshirani, and Jerome Friedman, “The Elements of Statistical
Learning: Data Mining, Inference, and Prediction”, Springer, Kindle.
[11] PhilHyo Jin Do, Ho-Jin Choi, “Sentiment analysis of real-life situations using location,
people and time as contextual features”, International Conference on Big Data and Smart
Computing (BIGCOMP), pp. 39–42, IEEE, 2015.
[12] Bing Liu, “Sentiment Analysis and Opinion Mining”, Morgan & Claypool Publishers, May
2012.
[13] Bing Liu, “Sentiment Analysis: Mining Opinions, Sentiments, and Emotions”, Cambridge
University Press, ISBN: 978-1-107-01789-4.
[14] Shiyang Liao, Junbo Wang, Ruiyun Yu, Koichi Sato, and Zixue Cheng, “CNN for situations
understanding based on sentiment analysis of twitter data”, Procedia Computer Science,
111:376–381, 2017.
[15] K I Rahmani, M.A. Ansari, Amit Kumar Goel, “An Efficient Indexing Algorithm for
CBIR”, IEEE International Conference on Computational Intelligence & Communication
Technology, 13-14 Feb 2015.
[16] Gurlove Singh, Amit Kumar Goel, “Face Detection and Recognition System using Digital
Image Processing”, 2nd International Conference on Innovative Mechanism for Industry
Application ICMIA 2020, 5-7 March 2020, IEEE.
[17] Amit Kumar Goel, Kalpana Batra, Poonam Phogat, “Manage big data using optical
networks”, Journal of Statistics and Management Systems, Volume 23, 2020, Issue 2, Taylor &
Francis.
[18] Raj, J. S., & Ananthi, J. V., “Recurrent neural networks and nonlinear prediction in support
vector machine”, Journal of Soft Computing Paradigm (JSCP), 1(01), 33–40, 2019.
[19] Aakanksha Saha, Tamara Denning, Vivek Srikumar, Sneha Kumar Kasera, “Secrets in Source
Code: Reducing False Positives using Machine Learning”, 2020 International Conference on
Communication Systems & Networks (COMSNETS), 2020.
[20] X. Frencis Jensy, V.P. Sumathi, Janani Shiva Shri, “An exploratory Data Analysis for Loan
Prediction based on nature of clients”, International Journal of Recent Technology and
Engineering (IJRTE), Volume-7, Issue-4S, November 2018.
[21] Pidikiti Supriya, Myneedi Pavani, Nagarapu Saisushma, Namburi Vimala Kumari, k
Vikash, “Loan Prediction by using Machine Learning Models”, International Journal of Engineering
and Techniques, Volume 5, Issue 2, Mar-Apr 2019.
