
Project Documents

This document discusses the development of a machine learning model for predicting loan approval, focusing on reducing loan defaults and enhancing accuracy in the banking sector. It explores various algorithms, particularly the decision tree method, and emphasizes the importance of data privacy in handling sensitive borrower information. The study aims to automate the loan approval process by analyzing applicant data to identify eligible borrowers and mitigate risks for investors in peer-to-peer lending scenarios.

Uploaded by

paldibya712

Prediction Of Loan Approval Using Machine Learning Algorithm

ABSTRACT :
In the banking system, banks sell a wide range of products, but their credit lines are their primary
source of revenue: they profit from the interest on the loans they extend. A bank's profit or loss is
largely determined by whether its customers repay their loans or default. Identifying likely loan
defaulters can help the bank lower its non-performing assets, which makes it crucial to study this
phenomenon. Prior studies in this area have demonstrated a wide range of approaches to the problem
of loan default control. Because accurate forecasts are crucial for maximizing earnings, it is important
to examine the nature of these approaches and compare them. The problem of loan defaulter prediction
is studied here using an important predictive analytics technique: the logistic regression model. The
data for analysis and forecasting is taken from Kaggle. Several performance metrics are calculated
for the logistic regression models, and metrics such as sensitivity and specificity are used to compare
them. The final results show that the models produce different outputs.

INTRODUCTION:
Small loans play a crucial role in our daily lives by enabling would-be business owners to launch
their ventures, enabling inquisitive students to afford higher education that would otherwise be
unaffordable without a steady source of income, and—above all—enabling regular people without
friends or family to get short-term financial support and get back on their feet so they can pursue
the American Dream. That being said, there is a chance of default with any loan. Financially
speaking, default refers to the inability to fulfill a loan's legal duty to repay principal and interest.
It's a prevalent issue in the financial sector and one of the main dangers of making loans. Of course,
most of the time, default does not occur, and the lending banks are typically able to recover the
loss from a loan that defaults by using other loans that have been fully paid back, along with the
interest that goes along with them. Additionally, banks are trading off a higher likelihood of default
with a bigger profit from the high interest rate when they offer loans with higher interest rates to
those who have a high potential of defaulting. All things considered, default is inevitable, and the
majority of financial institutions have a well-established procedure to absorb the loss and lessen its
effects. However, what if the loan is funded by multiple investors rather than issued by a single
bank? Lending Club is one of the many peer-to-peer lending companies that create this unusual
situation. Put simply, peer-to-peer lending companies serve as a middleman between investors and
borrowers. The company provides a platform on which investors can search for and select small,
unsecured personal loans to invest in. The company receives a cut from both parties (an origination
fee from borrowers and a service charge from investors), investors earn the loan interest, and
borrowers get the loan they want. This means that a single bank no longer bears the entire loss when
a loan defaults; instead, some or all of the investors do. If all of the loans came from a single
lender, the total profit might still be positive because other fully repaid loans could make up for
the loss. However, if investors do not diversify, there will be winners and losers under this new
type of lending practice. Predicting whether a given loan will default, based on the borrower's
initial information and credit report, is a clear way to address this issue. Lending Club presumably
already has a methodology in place for approving the loans that are listed on its website.
To reduce the burden placed on individual investors, this study examines the steps and outcomes
involved in developing a new machine learning model that can forecast loan default. More
importantly, the model concentrates on reducing the total loss on investment from bad loans. The
study also examines privacy-preserving measures for sensitive data obtained from the borrower's
credit record. The ultimate objective is to assess a simplified form of RAPPOR (Randomized
Aggregatable Privacy-Preserving Ordinal Response) and determine whether data encoded by this
algorithm can still be used to forecast loan default.
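
The idea behind such a privacy-preserving encoding can be illustrated with a minimal sketch. It is
not the study's implementation: full RAPPOR hashes each value into a Bloom filter and adds a second,
per-report randomization step, whereas this sketch assumes a plain one-hot encoding and applies only
a single randomized-response step. The function names, the noise parameter f = 0.5 and the use of
the Property_Area categories are illustrative assumptions.

```python
import random


def one_hot(value, categories):
    """Encode a categorical value as a list of 0/1 bits, one per category."""
    return [1 if value == c else 0 for c in categories]


def randomized_response(bits, f, rng):
    """Return a noisy copy of `bits`.

    Each bit is replaced by 1 with probability f/2, by 0 with probability f/2,
    and kept unchanged with probability 1 - f.
    """
    noisy = []
    for b in bits:
        u = rng.random()
        if u < f / 2:
            noisy.append(1)
        elif u < f:
            noisy.append(0)
        else:
            noisy.append(b)
    return noisy


# Hypothetical example: hide one borrower's property area before it is stored.
CATEGORIES = ["Urban", "Semiurban", "Rural"]
rng = random.Random(0)  # fixed seed only to make the sketch reproducible
true_bits = one_hot("Rural", CATEGORIES)
reported = randomized_response(true_bits, f=0.5, rng=rng)
```

A larger f gives stronger deniability for an individual record but leaves less signal for a
classifier, which is the trade-off the study sets out to assess.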

LITERATURE SURVEY:
Amira Kamil Ibrahim Hassan and Ajith Abraham (2013) built a supervised two-layer feed-forward
network as a prediction model, trained with three distinct training procedures. Their findings
indicate that the training method improves the loan default prediction model.
Angelini (2008) used two neural network models: a feed-forward neural network with standard
topology and one with ad hoc connections. The paper shows that neural networks can be used as
prediction models and that both models give good results with low error.
Ngai (2009) examined the prediction of future customer behavior in CRM using classification
models. Neural networks are the most often utilized model in the CRM field. The study identified
87 papers, published between 2000 and 2006, on data mining techniques and applications.
Dr. A. Chitra and S. Uma (2010) introduced an ensemble learning method for time series
prediction based on Radial Basis Function (RBF) networks, K-Nearest Neighbor (KNN) and Self-
Organizing Map (SOM). Their proposed model, PAPEM, performs better than the individual
models.
Akkoç (2012) used a hybrid Adaptive Neuro-Fuzzy Inference model, which combines statistics
with a neuro-fuzzy network. 10-fold cross-validation is used to obtain more reliable results and
to compare the model with other models.
Sarwesh Site and Dr. Sadhna K. Mishra (2013) proposed a method in which two or more classifiers
are combined into an ensemble model for better prediction. They used bagging and boosting
techniques and then the random forest technique.
Maher Alaraj, Maysam Abbod, and Ziad Hunaiti (2014) proposed a new ensemble method for
classifying customer loans. This ensemble method is based on neural networks. They state
that the proposed method gives better results and accuracy than a single classifier or
any other model.
Alaraj and Abbod (2015) introduced a model based on homogeneous and heterogeneous
classifiers. The ensemble model is based on three classifiers: logistic artificial neural network,
logistic regression and support vector machine.

PROBLEM STATEMENT :
The goal is to develop and put into operation a system that uses data mining and machine learning
to forecast a user's likelihood of receiving a bank loan, in order to increase accuracy and reduce
fraud. All around the world, banks, home finance companies, and some NBFCs deal in a variety
of loan products, including business, personal, and housing loans. These businesses operate in
urban, semi-urban, and rural regions. After a customer applies for a loan, these companies check
whether the customer is eligible to receive it. This paper provides a solution that automates
this process using a machine learning algorithm. The customer fills in an online loan
application form, which contains details such as Sex, Marital Status, Qualification, Details of
Dependents, Annual Income, Amount of Loan, Credit History of Applicant and others. The
algorithm first identifies the segments of customers who are eligible for loan amounts, so that
the bank can focus on these customers.

ABOUT THE DATA SET :


Here is the description of the variables present in the dataset:

Variable            Description                            Type
Loan_ID             Unique Loan ID                         Object
Gender              Male / Female                          Categorical
Married             Applicant married (Yes/No)             Categorical
Dependents          Number of dependents                   Categorical
Education           Education (Graduate / Not Graduate)    Categorical
Self_Employed       Self-employed (Yes/No)                 Categorical
ApplicantIncome     Applicant's income                     Numerical
CoapplicantIncome   Co-applicant's income                  Numerical
LoanAmount          Loan amount in thousands               Numerical
Loan_Amount_Term    Term of loan in months                 Numerical
Credit_History      Credit history meets guidelines        Numerical
Property_Area       Urban / Semiurban / Rural              Categorical
Loan_Status         Loan approved (Yes/No), class label    Categorical
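
As a rough illustration of how one record with these variables could be turned into numeric
features, the sketch below maps each categorical field to an integer code and parses the numerical
fields. The column names are written without spaces, the value spellings follow the table above, and
the "3+" style suffix for Dependents and the example record are assumptions made for illustration,
not data from the study.

```python
# Integer codes for the categorical fields (spellings assumed from the table).
YES_NO = {"Yes": 1, "No": 0}
GENDER = {"Male": 1, "Female": 0}
EDUCATION = {"Graduate": 1, "Not Graduate": 0}
PROPERTY_AREA = {"Rural": 0, "Semiurban": 1, "Urban": 2}


def to_number(text):
    """Parse a numeric field; an empty string is treated as missing (None)."""
    text = text.strip()
    return float(text) if text else None


def encode_applicant(row):
    """Map one raw record (dict of strings) to numeric features.

    Unknown or empty categorical values become None so that they can be
    imputed in a later step.
    """
    return {
        "Gender": GENDER.get(row["Gender"]),
        "Married": YES_NO.get(row["Married"]),
        # Assumption: a trailing "+" (as in "3+") may appear and is dropped.
        "Dependents": to_number(row["Dependents"].rstrip("+")),
        "Education": EDUCATION.get(row["Education"]),
        "Self_Employed": YES_NO.get(row["Self_Employed"]),
        "ApplicantIncome": to_number(row["ApplicantIncome"]),
        "CoapplicantIncome": to_number(row["CoapplicantIncome"]),
        "LoanAmount": to_number(row["LoanAmount"]),
        "Loan_Amount_Term": to_number(row["Loan_Amount_Term"]),
        "Credit_History": to_number(row["Credit_History"]),
        "Property_Area": PROPERTY_AREA.get(row["Property_Area"]),
        "Loan_Status": YES_NO.get(row["Loan_Status"]),
    }


# Made-up record, used only to show the shape of the input.
example_row = {
    "Loan_ID": "LP000001", "Gender": "Male", "Married": "Yes",
    "Dependents": "1", "Education": "Graduate", "Self_Employed": "No",
    "ApplicantIncome": "4583", "CoapplicantIncome": "1508",
    "LoanAmount": "128", "Loan_Amount_Term": "360", "Credit_History": "1",
    "Property_Area": "Rural", "Loan_Status": "No",
}
encoded = encode_applicant(example_row)
```

Loan_ID is left out of the features on purpose, since a unique identifier carries no information
about eligibility.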

PROPOSED SYSTEM :
The decision tree is a machine learning method that handles both classification and regression
tasks efficiently [2]. Decision trees are widely used in the banking industry because of their
high accuracy and their ability to express a statistical model in plain language. In a decision
tree, each node represents a feature (attribute), each link (branch) represents a decision (rule)
and each leaf represents an outcome (a categorical or continuous value). Using different data
analytics tools, loan outcomes and their severity can be forecast. This requires training models
on the data with different algorithms and then comparing a new applicant's data against the
trained model to predict the nature of the loan. Several R functions and packages were used
to prepare the data and to build the classification model. The work suggests that R is an
efficient visualization tool for applying data mining techniques. Using R packages, a customer's
data can be analyzed, and the bank can sanction or reject the loan on that basis. Real-world
customer data sets may contain many missing values, which need to be replaced with valid
values generated from the available complete data. The dataset has many attributes that
define the credibility of customers seeking several types of loan. The values of these
attributes can contain outliers that do not fit into the regular range of the data.
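
The two data preparation concerns mentioned above, missing values and outliers, can be sketched as
follows. This is a simplified stand-in, not the study's procedure: it assumes median imputation for
numerical columns, most-frequent-value imputation for categorical columns, and the common
1.5 × IQR rule for flagging outliers. The loan amounts are made up.

```python
from statistics import median, mode, quantiles


def impute_numeric(values):
    """Replace None with the median of the known values."""
    fill = median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def impute_categorical(values):
    """Replace None with the most frequent known value."""
    fill = mode(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


# Hypothetical LoanAmount column with one missing entry and one extreme value.
loan_amounts = [120, None, 100, 140, 110, 130, 900]
filled = impute_numeric(loan_amounts)
suspicious = iqr_outliers(filled)

married = impute_categorical(["Yes", "No", None, "Yes"])
```

Whether a flagged value is removed, capped or kept is a modelling decision; the sketch only
identifies candidates.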
ALGORITHM USED :
Decision Tree (DT) is a supervised learning algorithm used to solve both classification and
regression problems. DT uses a tree representation to solve the prediction problem: each internal
node of the tree represents an attribute and each leaf node represents a class label. The
pseudocode for the DT model is as follows:

Step 1: The best attribute is chosen as the tree's root.

Step 2: The training set is divided into subsets such that each subset contains the same value
for that attribute.

Step 3: Step 1 and Step 2 are repeated on each subset until every branch ends in a leaf node.
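
The three steps above can be sketched in a few lines. The source does not say how the "best"
attribute is chosen, so this sketch assumes an ID3-style criterion (largest information gain,
measured with entropy) and categorical attributes only. The four training records are made up.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting the data on `attr`."""
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(label)
    remainder = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder


def build_tree(rows, labels, attrs):
    """Return a nested dict (internal node) or a class label (leaf)."""
    # Leaf: the labels agree, or no attribute is left; use the majority class.
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    # Step 1: choose the best attribute for this node.
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attrs if a != best]
    tree = {"attribute": best, "branches": {}}
    # Step 2: one subset per value of the chosen attribute.
    for value in {row[best] for row in rows}:
        keep = [i for i, row in enumerate(rows) if row[best] == value]
        # Step 3: repeat on each subset.
        tree["branches"][value] = build_tree(
            [rows[i] for i in keep], [labels[i] for i in keep], remaining
        )
    return tree


def predict(tree, row):
    """Follow the branches that match `row` until a leaf is reached."""
    while isinstance(tree, dict):
        tree = tree["branches"][row[tree["attribute"]]]
    return tree


# Hypothetical training data.
rows = [
    {"Credit_History": "1", "Married": "Yes"},
    {"Credit_History": "1", "Married": "No"},
    {"Credit_History": "0", "Married": "Yes"},
    {"Credit_History": "0", "Married": "No"},
]
labels = ["Yes", "Yes", "No", "No"]
tree = build_tree(rows, labels, ["Married", "Credit_History"])
```

In this toy data Credit_History separates the classes perfectly, so it becomes the root. The sketch
raises a KeyError for attribute values not seen during training; a practical implementation would
fall back to the majority class at that node.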

Fig. An Example of Decision Tree Algorithm


RESULT :

VALIDATION :
We use k-fold cross-validation for this problem. The preprocessed dataset is randomly divided
into k folds of equal size. In each round, one fold serves as the test set and the remaining
k – 1 folds form the training set, so the training set and the test set are completely disjoint.
For example, if we have a dataset of N (=500) data records, using 10-fold CV (i.e. k=10), the
number of data records in the training set will be
N × (k – 1)/k = 500 × 9/10 = 450
and the number of data records in the test set will be
N × 1/k = 500 × 1/10 = 50
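
The arithmetic above can be checked with a small sketch of the splitting step. It works on record
indices only; the function name and the fixed seed are illustrative assumptions.

```python
import random


def k_fold_splits(n_records, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)  # random assignment of records to folds
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_splits(500, 10))
train, test = splits[0]
print(len(train), len(test))  # 450 50
```

Every record appears in exactly one test fold, so each record is used for testing once and for
training k – 1 times.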
CONCLUSION :
The analytical process started with data cleaning and processing, including missing value imputation
with the mice package, followed by exploratory analysis and finally model building and evaluation. The
best accuracy on the public test set is 0.721. The analysis suggests the following insights about
approval. Applicants whose credit history does not meet the guidelines mostly fail to get approved,
probably because they are more likely not to pay back the loan. Applicants with a high income who
request a low loan amount are more likely to get approved, which makes sense because they are more
likely to pay back their loans. Some basic characteristics, such as gender and marital status, do not
seem to be taken into consideration by the company.

REFERENCES :

[1] Dr. A. Chitra and S. Uma, “An Ensemble Model of Multiple Classifiers for Time Series
Prediction”, International Journal of Computer Theory and Engineering, Vol. 2, Issue 3, pp. 454–
458, June 2010.
[2] M. V. Jagannatha Reddy and B. Kavitha, “Extracting Prediction Rules for Loan Default
Using Neural Networks through Attribute Relevance Analysis”, International Journal of Computer
Theory and Engineering, Vol. 2, Issue 4, pp. 596–601, August 2010.
[3] Ms. Neethu Baby and Mrs. Priyanka L.T., “Customer Classification And Prediction Based On
Data Mining Technique”, International Journal of Emerging Technology and Advanced
Engineering, Vol. 2, Issue 12, pp. 314–318, December 2012.
[4] Sivasree M S and Rekha Sunny T, “Loan Credibility Prediction System Based on Decision Tree
Algorithm”, International Journal of Engineering Research & Technology, Vol. 4, Issue 09, pp.
825–830, September 2015.
[5] E. Angelini, A. Roli, and G. di Tollo, “A neural network approach for credit risk evaluation”,
Elsevier, The Quarterly Review of Economics and Finance, Vol. 48, Issue 4, pp. 733–755,
November 2008.
[6] Suresh Ramakrishna, Maryam Mirzaei and Mahmoud Bekri, “Adaboost Ensemble
Classifiers for Corporate Default Prediction”, 1st International Conference of Recent Trends in
Information and Communication Technologies, pp. 258–269, September 2014.
[7] Amira Kamil Ibrahim Hassan and Ajith Abraham, “Modeling Consumer Loan Default
Prediction Using Ensemble Neural Networks”, International Conference on Computing, Electrical
and Electronics Engineering, pp. 719–724, August 2013.
[8] Toby Segaran, “Programming Collective Intelligence: Building Smart Web 2.0
Applications”, O’Reilly Media.
[9] Drew Conway and John Myles White, “Machine Learning for Hackers: Case Studies and
Algorithms to Get you Started”, O’Reilly Media.
[10] Trevor Hastie, Robert Tibshirani, and Jerome Friedman, “The Elements of Statistical
Learning: Data Mining, Inference, and Prediction”, Springer, Kindle.
[11] PhilHyo Jin Do, Ho-Jin Choi, “Sentiment analysis of real-life situations using location,
people and time as contextual features”, International Conference on Big Data and Smart
Computing (BIGCOMP), pp. 39–42, IEEE, 2015.
[12] Bing Liu, “Sentiment Analysis and Opinion Mining”, Morgan & Claypool Publishers, May
2012.
[13] Bing Liu, “Sentiment Analysis: Mining Opinions, Sentiments, and Emotions”, Cambridge
University Press, ISBN: 978-1-107-01789-4.
[14] Shiyang Liao, Junbo Wang, Ruiyun Yu, Koichi Sato, and Zixue Cheng, “CNN for situations
understanding based on sentiment analysis of twitter data”, Procedia Computer Science,
111:376–381, 2017.
[15] K I Rahmani, M.A. Ansari, Amit Kumar Goel, “An Efficient Indexing Algorithm for
CBIR”, IEEE International Conference on Computational Intelligence & Communication
Technology, 13-14 Feb 2015.
[16] Gurlove Singh, Amit Kumar Goel, “Face Detection and Recognition System using Digital
Image Processing”, 2nd International Conference on Innovative Mechanism for Industry
Application ICMIA 2020, 5-7 March 2020, IEEE Publisher.
[17] Amit Kumar Goel, Kalpana Batra, Poonam Phogat, “Manage big data using optical
networks”, Journal of Statistics and Management Systems, Volume 23, 2020, Issue 2, Taylor &
Francis.
[18] Raj, J. S., & Ananthi, J. V., “Recurrent neural networks and nonlinear prediction in support
vector machine”, Journal of Soft Computing Paradigm (JSCP), 1(01), 33–40, 2019.
[19] Aakanksha Saha, Tamara Denning, Vivek Srikumar, Sneha Kumar Kasera, “Secrets in Source
Code: Reducing False Positives using Machine Learning”, 2020 International Conference on
Communication Systems & Networks (COMSNETS), 2020.
[20] X. Frencis Jensy, V.P. Sumathi, Janani Shiva Shri, “An exploratory Data Analysis for Loan
Prediction based on nature of clients”, International Journal of Recent Technology and
Engineering (IJRTE), Volume-7 Issue-4S, November 2018.
[21] Pidikiti Supriya, Myneedi Pavani, Nagarapu Saisushma, Namburi Vimala Kumari, k
Vikash, “Loan Prediction by using Machine Learning Models”, International Journal of Engineering
and Techniques, Volume 5 Issue 2, Mar-Apr 2019.
