BANKING CHURN PREDICTION
Mohammad Haseeb Mir          Mohammed Adnan Ahmad          Munmeeth K
Dayananda Sagar Academy of Technology and Management
haseebrecovery393@gmail.com   mohammedadnan10@gmail.com   munmeeth2003@gmail.com

Prof. Rajesh M
Dayananda Sagar Academy of Technology and Management
Abstract - Customer churn is a critical challenge for the banking sector, directly impacting profitability and sustainability. Predicting churn enables banks to proactively identify customers at risk of leaving and implement effective retention strategies. This paper focuses on developing a robust churn prediction model using advanced data analytics and machine learning techniques. By analyzing customer behavior, transaction patterns, demographic data, and engagement metrics, the model uncovers key factors driving customer attrition. Various algorithms, including logistic regression, decision trees, and ensemble methods, are evaluated to identify the most effective approach for churn prediction. Additionally, this study highlights the importance of feature engineering in capturing customer preferences and trends, leading to actionable insights. Real-world banking datasets are used to validate the model, achieving significant improvements in prediction accuracy compared to traditional methods. This research underscores the need for a customer-centric approach in the banking industry, leveraging technology to anticipate churn and sustain long-term customer relationships.

Keywords: Banking, Churn Prediction, Machine Learning, Customer Retention, Predictive Analytics
INTRODUCTION
In a world of cut-throat competition and changing dynamics,
many service providers have emerged to compete with one
another. One of the biggest challenges faced by service
providers now is to respond to the changing customer
behavior, which seems to be driven by increased expectations
from customers. Compared with the previous generation,
today's customers demand different approaches regarding
connectivity along with more custom-made and modernized
solutions. Those customers are educated and more aware of
prevailing trends; thus, their purchasing habits reflect a
different paradigm. This paves the way for a phenomenon called 'analysis paralysis', wherein customers analyze the buying process far more than is reasonable; such analysis is usually a window through which buyers refine their overall decision process. Today's service providers are therefore challenged to think outside the box and present clear value to customers.
Ever since the first bank was founded in India in 1786, banking has been rooted in the customer. Attrition arises from intense competition and shifting market dynamics, and in response banks have become very aggressive in customer acquisition and in pursuing customer loyalty. Owing to this inability to engender customer loyalty, the banking sector has had to endure a high level of attrition. Retention strategies are usually assessed through "drop rates," the number of people who stop using the service in question. Churn is the cost of a customer leaving and is defined as the "failure or exit of one-time or long-term customers." One of the chief items on a company's agenda for addressing customer churn is customer satisfaction, which is shaped by customers' experiences with the company.
I. LITERATURE SURVEY
Client churn analysis within the banking space is a vast field of research. In one study [7], client churn prediction in commercial banks was studied with an SVM (Support Vector Machine) model. The dataset consisted of 50,000 customer records from a Chinese commercial bank, with 46,406 valid records remaining after preprocessing. Two SVM models were studied: a linear SVM and an SVM with a radial basis kernel function. An undersampling approach was applied to resolve the data imbalance, leading to a considerable enhancement in predictive performance; however, the skewed distribution of the dataset still posed a challenge for SVM in churn prediction, with overall evaluation metrics failing to do justice to its predictive power. The predictions were further enhanced by combining random sampling with SVM.

Another study [8] presented data mining applications in the banking sector to draw actionable insights, stating that customers making more use of banking products were more loyal, so banks should target marketing at those who use fewer than three products. The study employed a dataset containing 1,866 customers and a neural network model constructed using Alyuda NeuroIntelligence software, subdividing the data into training, validation, and testing sets. The proposed neural network recorded a validation accuracy (CCR) of 93.96%. However, even though quite effective, the model was reported to be slow and tedious, more so due to the high representation of seniors in the dataset, which is likely to be a challenge for wider generalization.

A third study [9] built a churn prediction model for telecom operators on a big data machine learning platform. Using Syriatel data, the underlying techniques were Decision Tree, Random Forest, Gradient Boosted Machine (GBM), and Extreme Gradient Boosting (XGBoost). The models' hyperparameters were tuned using k-fold cross-validation, and class imbalance was addressed through oversampling/random undersampling. Among the models tested, XGBoost achieved the maximum AUC (93.3%) with 180 trees.

II. METHODOLOGY

This study develops a predictive model for customer churn in which a commercial bank applies appropriate data mining techniques to this end. The proposed model is illustrated in the diagram in Fig. 1.

A. Dataset Overview
The dataset selected for this analysis was downloaded from Kaggle and is composed of information about 10,000 bank customers, of which 2,037 samples contribute to the negative class, i.e., customers who have exited. The target variable is flagged with a binary response: 1 is used for a sample showing that a customer has left the bank, while 0 is used for a customer who remained. The dataset contains 13 feature vectors (predictors) based on customer and transaction data, as shown in Table II.

B. Data Preprocessing
Data preprocessing is one of the essential steps for the success of data mining tasks. In this phase, randomness, noisiness, and unreliability in the data are resolved, together with data conversion where necessary. This optimization ensures that the dataset suffices for accurate analysis. Preprocessing procedures for the predictors are summarized in Table III, showing each attribute's use in the churn prediction pipeline.
Irrelevant Data: Irrelevant attributes are those that do not bear on the aim of the analysis. If kept, irrelevant features can impede the performance of classification algorithms. Variables beyond those customarily adopted were therefore removed from the analysis by hand; for the churn dataset treated in this research, these were Row Number, Customer ID, Surname, and Geography.
Data Transformation: Data transformation is the process of changing the format of data. Proper transformation ensures that the data are arranged, validated, and free of deficiencies such as null values, duplicate redundancy, incorrect indexing, or incorrect formats, which enhances quality. In this study, the data were transformed as follows (a combined sketch of these preprocessing steps appears below):

Gender: Coded as Female → 0 and Male → 1.
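As a minimal sketch of the two preprocessing steps just described, written with pandas: the file name "Churn_Modelling.csv" and the column names follow the public Kaggle bank-churn dataset and are assumptions about the exact copy used here.

import pandas as pd

# Load the Kaggle churn dataset (assumed file and column names).
df = pd.read_csv("Churn_Modelling.csv")

# Irrelevant data: drop the four attributes excluded by hand above.
df = df.drop(columns=["RowNumber", "CustomerId", "Surname", "Geography"])

# Data transformation: code Gender as Female -> 0, Male -> 1.
df["Gender"] = df["Gender"].map({"Female": 0, "Male": 1})

# Separate predictors from the binary churn target (1 = exited).
X = df.drop(columns=["Exited"])
y = df["Exited"]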
Feature Selection

Selecting relevant features is an integral step of the data preparation stage in data analysis. Filter-type feature selection methods do not depend on the training algorithm; instead, they rank features based on their characteristics.

Relief Algorithm: This is a filter-based feature selection method that ranks features using the Relief algorithm. It estimates feature importance for distance-based supervised models that predict outcomes using pairwise distances between observations. Based on a given number of nearest neighbors, the algorithm ranks the predictors according to their relevance, delivering an output list of predictors ordered by importance.
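The following is a minimal sketch of the basic Relief weight-update idea for a binary target, as described above; the function name and defaults are illustrative assumptions, and a library implementation (e.g., ReliefF in the scikit-rebate package) would be the practical choice.

import numpy as np

def relief_scores(X, y, n_iter=100, seed=0):
    """Rank features by how well they separate nearest hits and misses."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0                      # guard constant features
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        i = rng.integers(len(X))               # pick a random sample
        d = np.abs(X - X[i]).sum(axis=1)       # pairwise L1 distances
        d[i] = np.inf                          # exclude the sample itself
        hit = np.where(y == y[i], d, np.inf).argmin()   # nearest same-class
        miss = np.where(y != y[i], d, np.inf).argmin()  # nearest other-class
        # Reward features that differ at the miss and agree at the hit.
        w += (np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])) / span
    return w / n_iter                          # higher score = more relevant

# Usage: order predictors by decreasing relevance.
# ranking = np.argsort(relief_scores(X.values, y.values))[::-1]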
Oversampling

Oversampling and undersampling are methods used to balance the class distribution in datasets. The data used in this study were both highly imbalanced (7,963 positive-class samples and 2,037 negative-class samples) and quite small; hence, oversampling seemed appropriate, as undersampling could discard necessary samples while building a diverse model. Random oversampling was therefore adopted to balance the minority (negative) class.
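A minimal sketch of this random oversampling step, assuming the pandas DataFrame df from the preprocessing sketch above; imbalanced-learn's RandomOverSampler is a common drop-in alternative.

import pandas as pd
from sklearn.utils import resample

# Split the frame by class (here Exited == 1 marks churned customers).
majority = df[df["Exited"] == 0]
minority = df[df["Exited"] == 1]

# Randomly duplicate minority rows (with replacement) until the two
# classes are the same size, then shuffle the combined frame.
minority_up = resample(minority, replace=True,
                       n_samples=len(majority), random_state=42)
df_balanced = (pd.concat([majority, minority_up])
               .sample(frac=1, random_state=42)
               .reset_index(drop=True))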
Classification

Classification algorithms applied to the preprocessed data include k-Nearest Neighbor, Support Vector Machine, Decision Tree, and Random Forest, and the models are compared on their output results. The effectiveness of these classifiers was also evaluated on various features against different feature selection methods.
k-Nearest Neighbor (KNN): KNN is a simple, non-parametric classification method based on supervised learning. It classifies new samples by searching for the k nearest neighbors in the training dataset and assigning the sample to the class with the highest likelihood.

Support Vector Machine (SVM): Support Vector Machines, stemming from Vapnik's statistical learning theory [13], [14], [15], represent a very effective supervised learning algorithm with applications in classification [16], regression [17], time series prediction, and estimation in geotechnical and mining sciences [18]. The main aim of an SVM is to find the optimal hyperplane that maximizes the margin between the two classes, thereby minimizing misclassification on both training and test data. For this study, the LSVM model, originally intended for binary classification problems [19], was selected.

Decision Tree (DT): The decision tree divides the data into branch-like segments, making the model simple to interpret [20]. While other algorithms such as neural networks tend to yield better accuracy, decision trees offer transparency, factorizing complex nonlinear relationships between predictors and target variables. The two main strategies in implementing decision trees are tree creation and classification [21].

Random Forest (RF): Random Forest is an ensemble, introduced by Breiman [22], of several decision trees built on randomly sampled data. The random forest approach lessens the variance of individual decision trees that could otherwise lead them to wrong predictions. Random forests work efficiently with high-dimensional data without the need for dimensionality reduction or feature selection. In addition, RF training is considerably fast and is thus compatible with parallel processing models. A sketch instantiating these four classifiers follows the checklists below.

3.3 Prerequisites
Before starting, ensure the following tools are installed:
• Programming languages: Python, R, or MATLAB.
• Libraries: Pandas, NumPy, Scikit-learn, TensorFlow, or PyTorch.
• Big data platforms (if needed): Apache Spark or Hadoop.

3.4 Domain Knowledge
• Understanding of customer behavior, banking operations, and factors influencing churn in the financial sector.
• Awareness of regulatory requirements in the banking industry.

3.5 Data Preparation
• Data Cleaning: Handling missing values, outliers, and duplicate records.
• Feature Engineering: Creating relevant predictors (e.g., average transaction value, frequency of service usage).
• Data Transformation: Converting categorical data into numerical formats (e.g., encoding gender, geography).
• Imbalanced Dataset Handling: Applying oversampling or undersampling techniques if churn and non-churn classes are disproportionate.

3.6 Data Collection
• Access to relevant customer data, such as demographics, transaction history, account details, and service usage.
• Inclusion of both churned and non-churned customer records for balanced insights.

3.7 Machine Learning Algorithms
Familiarity with various ML models, including:
• Logistic Regression: For baseline classification.
• Decision Tree (DT): For interpretability.
• Random Forest (RF): For ensemble predictions.
• Gradient Boosted Machines (GBM/XGBoost): For handling complex patterns.
• Support Vector Machine (SVM): For high-dimensional data.
• K-Nearest Neighbors (KNN): For similarity-based predictions.
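As referenced above, a brief scikit-learn sketch instantiating the four classifiers compared in this study; the hyperparameter values shown (k = 5 with Euclidean distance, a linear kernel for the LSVM, about 100 trees for RF) anticipate the settings reported in Section III, and the dictionary layout is merely a convenient arrangement.

from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

models = {
    "KNN": KNeighborsClassifier(n_neighbors=5, metric="euclidean"),
    "SVM": SVC(kernel="linear"),                  # the LSVM variant
    "DT": DecisionTreeClassifier(random_state=42),
    "RF": RandomForestClassifier(n_estimators=100, random_state=42),
}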
III. RESULTS AND DISCUSSION
After the preprocessing of the data has been completed, the dataset becomes operational and ready for analysis. This study proceeds with the 10 features derived from preprocessing. In total, 70% of the data is set aside for training, while the remaining 30% is randomly selected for testing. Classification models are applied independently and then combined with the specific feature selection methods. Each model's performance is judged through the accuracy obtained using 10-fold cross-validation along with the confusion matrix. The performance of the classifiers varies with the feature selection method chosen. The selected feature and classifier parameters are discussed below.

The k value is set at 5 for KNN, meaning five nearest neighbors are considered for classification. Fewer neighbors (i.e., fewer than 5) can sometimes yield higher accuracy, but this fails consistently when the data are randomly selected; conversely, increasing the number of neighbors usually leads to a considerable decrease in accuracy. Hence, the analysis uses k = 5 to balance and optimize system performance, with Euclidean distance as the classification metric. For SVM, the LSVM employs a linear kernel function. For RF, the number of decision trees is set to around 100, the value found to perform best during the classification process.
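A minimal sketch of this evaluation protocol (70/30 split, 10-fold cross-validation, confusion matrix), assuming the feature matrix X and target y from the preprocessing sketch and the models dictionary from the methodology sketch:

from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import accuracy_score, confusion_matrix

# 70% training, 30% randomly selected test data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y)

for name, model in models.items():
    # Mean accuracy over 10-fold cross-validation on the training data.
    cv_acc = cross_val_score(model, X_train, y_train, cv=10,
                             scoring="accuracy").mean()
    # Held-out test accuracy and confusion matrix.
    y_pred = model.fit(X_train, y_train).predict(X_test)
    print(name, round(cv_acc, 4), round(accuracy_score(y_test, y_pred), 4))
    print(confusion_matrix(y_test, y_pred))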
The classification outcomes are provided in Table IV, covering all methods run both with and without oversampling. The DT and RF classifiers showed better accuracy after oversampling; in contrast, no change in accuracy occurred for KNN, whereas the accuracy of SVM decreased with oversampling, suggesting that this learner is less suited to the enlarged dataset.

The six significant features selected by the MRMR method are Number of Products, Is Active Member, Gender, Age, Balance, and Tenure. Table V summarizes the accuracies after applying MRMR: KNN's accuracy improved, whereas SVM's did not, and the DT and RF accuracies decreased slightly compared with the models without MRMR. Among the features recommended by the Relief method, Number of Products, Age, Balance, Tenure, Gender, and Has CrCard were selected. Compared with the baseline models, KNN's accuracy improved, SVM's was unchanged, and the accuracies of DT and RF were slightly lower.

Using any of the feature selection methods improves KNN performance marginally but does not move the needle on SVM. The performance of DT and RF dropped a little because tree-based classifiers rely more heavily on the available features, and a decrease in features invariably affects their confidence levels.

The undersampling method addresses class imbalance by sampling to equalize the observations of classes 0 and 1. Its drawback here is that the classification accuracy for SVM deteriorated, as the resulting data restrain SVM's performance. KNN accuracy remains relatively constant after resampling, while the DT and RF models benefit from the improved class balance of the dataset.

Finally, based on this study, RF with oversampling performed better than KNN, SVM, and DT with respect to accuracy. Although feature selection methods did not lead to much improvement in the performance of the tree classifiers, "Number of Products" was the most prominent feature. This implies that customers who use multiple bank products, such as mobile banking, internet banking, savings accounts, and fixed deposits, have a reduced risk of churn; banks should therefore focus on retaining customers with fewer products.

IV. CONCLUSION
In banking, as in a few other industries, customer engagement is becoming a core focus, and efforts have accordingly turned toward predicting customer churn. Numerous churn prediction studies are being conducted within banking, and different types of organizations evaluate their churn rates by measuring churn in different ways over different data sources. There therefore exists a large gap for a generalized, functional model that detects customer churn much earlier. Such a model would take fixed and variable data sources as input, irrespective of any particular service provider, allowing minimal information input while providing maximum predictive output. That is what this research aims for.

The goal of this research is to develop the most effective model for early-stage customer churn prediction in banks. The study is based on a small (10,000 samples), heavily imbalanced dataset; even larger real banking datasets could have been used, and oversampling countered some of these issues. The study compares KNN, SVM, Decision Tree, and RF classifiers in various settings. On cross-validation, oversampled data with the RF classifier achieved the best result of 95.74%. Feature selection methods do not have much impact on the tree-based classifiers (Decision Tree and RF); the results showed that feature reduction slightly downgraded their performance.

An interesting observation is that, unlike the other classes of classifiers, oversampling yields relatively poor performance for SVM. This is mostly due to the imbalance in the dataset, against which SVM is found not to perform well.
V. REFERENCES

[1] D.-R. Liu and Y.-Y. Shih, "Integrating AHP and data mining for product recommendation based on customer lifetime value," Information & Management, vol. 42, no. 3, pp. 387–400, 2005.

[2] G. Canning Jr., "Do a value analysis of your customer base," Industrial Marketing Management, vol. 11, no. 2, pp. 89–93, 1982.

[3] R. W. Stone and D. J. Good, "The assimilation of computer-aided marketing activities," Information & Management, vol. 38, no. 7, pp. 437–447, 2001.

[4] M.-S. Chen, J. Han, and P. S. Yu, "Data mining: an overview from a database perspective," IEEE Transactions on Knowledge and Data Engineering, vol. 8, no. 6, pp. 866–883, 1996.

[5] M.-K. Kim, M.-C. Park, and D.-H. Jeong, "The effects of customer satisfaction and switching barrier on customer loyalty in Korean mobile telecommunication services," Telecommunications Policy, vol. 28, no. 2, pp. 145–159, 2004.

[6] I. T. J. Swamidason, "Survey of data mining algorithms for intelligent computing systems," Journal of Trends in Computer Science and Smart Technology, vol. 1, pp. 14–23, Sep. 2019.

[7] B. He, Y. Shi, Q. Wan, and X. Zhao, "Prediction of customer attrition of commercial banks based on SVM model," Procedia Computer Science, vol. 31, pp. 423–430, 2014.

[8] A. Bilal Zoric, "Predicting customer churn in banking industry using neural networks," Interdisciplinary Description of Complex Systems: INDECS, vol. 14, no. 2, pp. 116–124, 2016.

[9] A. K. Ahmad, A. Jafar, and K. Aljoumaa, "Customer churn prediction in telecom using machine learning in big data platform," Journal of Big Data, vol. 6, no. 1, p. 28, 2019.

[10] Y. Jiang and C. Li, "MRMR-based feature selection for classification of cotton foreign matter using hyperspectral imaging," Computers and Electronics in Agriculture, vol. 119, pp. 191–200, 2015.

[11] L. Beretta and A. Santaniello, "Implementing ReliefF filters to extract meaningful features from genetic lifetime datasets," Journal of Biomedical Informatics, vol. 44, no. 2, pp. 361–369, 2011.

[12] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, John Wiley & Sons, 2012.

[13] C. Cortes and V. Vapnik, "Support-vector networks," Machine Learning, vol. 20, no. 3, pp. 273–297, 1995.

[14] V. Vapnik, The Nature of Statistical Learning Theory, Springer Science & Business Media, 2013.

[15] V. N. Vapnik, "An overview of statistical learning theory," IEEE Transactions on Neural Networks, vol. 10, no. 5, pp. 988–999, 1999.

[16] J. Raj and V. Ananthi, "Recurrent neural networks and nonlinear prediction in support vector machines," Journal of Soft Computing Paradigm, vol. 2019, pp. 33–40, Sep. 2019.

[17] P. G. Nieto, E. F. Combarro, J. del Coz Díaz, and E. Montañés, "A SVM-based regression model to study the air quality at local scale in Oviedo urban area (Northern Spain): A case study," Applied Mathematics and Computation, vol. 219, no. 17, pp. 8923–8937, 2013.

[18] S.-G. Cao, Y.-B. Liu, and Y.-P. Wang, "A forecasting and forewarning model for methane hazard in working face of coal mine based on LSSVM," Journal of China University of Mining and Technology, vol. 18, no. 2, pp. 172–176, 2008.

[19] Y. Tang, "Deep learning using linear support vector machines," arXiv preprint arXiv:1306.0239, 2013.

[20] L. Breiman, J. Friedman, C. J. Stone, and R. A. Olshen, Classification and Regression Trees, CRC Press, 1984.

[21] N. B. Amor, S. Benferhat, and Z. Elouedi, "Qualitative classification with possibilistic decision trees," in Modern Information Processing, Elsevier, 2006, pp. 159–169.

[22] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.