Research
*Correspondence: joao.brito@ufrgs.br; jbgb@uol.com.br

1 Departamento de Engenharia de Produção, Universidade Federal do Rio Grande do Sul, Av. Osvaldo Aranha, 99 – 5° Andar, Porto Alegre, RS 90035‑190, Brazil
2 Escola de Administração, Universidade Federal do Rio Grande do Sul, Washington Luiz, 855, Porto Alegre, RS 90010‑460, Brazil
3 Escola de Administração de Empresas de São Paulo, Fundação Getulio Vargas, Av. 9 de julho, 2029, Bela Vista, São Paulo, SP 01313‑902, Brazil

Abstract
Managing customer retention is critical to a company’s profitability and firm value. However, predicting customer churn is challenging. The extant research on the topic mainly focuses on the type of model developed to predict churn, devoting little or no effort to data preparation methods. These methods directly impact the identification of patterns, increasing the model’s predictive performance. We addressed this problem by (1) employing feature engineering methods to generate a set of potential predictor features suitable for the banking industry and (2) preprocessing the majority and minority classes to improve the learning of the classification model pattern. The framework encompasses state-of-the-art data preprocessing methods: (1) feature engineering with recency, frequency, and monetary value concepts to address the imbalanced dataset issue, (2) oversampling using the adaptive synthetic sampling (ADASYN) algorithm, and (3) undersampling using the NEARMISS algorithm. After data preprocessing, we use XGBoost and elastic net methods for churn prediction. We validated the proposed framework with a dataset of more than 3 million customers and about 170 million transactions. The framework outperformed alternative methods reported in the literature in terms of precision-recall area under curve, accuracy, recall, and specificity. From a practical perspective, the framework provides managers with valuable information to predict customer churn and develop strategies for customer retention in the banking industry.

Keywords: Customer churn prediction, Imbalanced dataset treatment, Feature engineering, RFM
Introduction
The strategy of traditional banks has always been centered on their products and ser-
vices, prioritizing internal practices and processes and considering customers as the
commercial target (Lähteenmäki and Nätti 2013). However, the advancement of com-
puting power and the reduction in its costs have led to a significant transformation in
the industry by rapidly implementing new and highly competitive financial products and
services (Feyen et al. 2021). Competitiveness has grown sharply as new financial technology (fintech) firms have risen (Murinde et al. 2022). Fintech is recognized as one of the most significant technological innovations in the financial sector and is characterized by
© The Author(s) 2024. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits
use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original
author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third
party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the mate-
rial. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or
exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://
creativecommons.org/licenses/by/4.0/.
Brito et al. Financial Innovation (2024) 10:17 Page 2 of 29
rapid development (Kou et al. 2021a). Digital innovations coupled with abundant data
have made it possible to devise new businesses and financial services, placing custom-
ers at the center of marketing decisions (Pousttchi and Dehnert 2018). In this scenario,
managing customer retention is critical to avoid the defection of valuable customers
(Mutanen et al. 2010). Reichheld and Sasser (1990) pointed out that reducing churn by
5% might increase a bank’s profits from those customers by 85%.
Churn prediction is key in churn management programs (Ascarza 2018). Thus, cus-
tomer churn prevention is a strategic issue that allows companies to monitor customer
engagement behavior and act when engagement decreases (Gordini and Veglio 2017).
Additionally, as the costs of acquiring new customers are typically higher than retain-
ing existing ones, developing reliable strategies for churn management is critical for the
sustainability of companies (Lemmens and Gupta 2020). However, managing customer
churn is a challenging task for marketing managers (Ascarza and Hardie 2013).
Therefore, predictive models tailored to identify potential churners are fundamental
in supporting managerial decisions (Zhao et al. 2022). Several models have been pro-
posed to predict churn (Benoit and Poel 2012; He et al. 2014; Gordini and Veglio 2017),
estimate the churn probability of each customer, and assess the accuracy of these pre-
dictions in the holdout sample. However, most of these methods focus on the type and
mathematical properties of the model used to predict churn. While this concern is legiti-
mate and important, less effort is devoted to data preprocessing, which comprises a set
of tasks performed before training the models to improve the predictive performance
(Jassim and Abdulwahid 2021). Furthermore, one of the main challenges of customer
churn prediction (CCP) is training binary classification models when the churn event
is rare (Zhu et al. 2018), as is the case of the retail banking industry (see Mutanen et al.
2010 and Keramati et al. 2016). The low proportion of the rare event compared with the
majority class (e.g., retained clients) can create pattern gaps, making it hard for predic-
tive models to identify a customer who is likely to churn. In extreme cases, a classifica-
tion model may classify all customers as retained, mispredicting churns (Sun et al. 2009;
Megahed et al. 2021). For instance, suppose there is a company with 98% of retained
customers and 2% of churned customers and we want to predict churners. In this case, if
we directly apply a commonly used learning algorithm, such as random forest or logistic
regression, the outcome will probably be a high accuracy for the majority class (recall
measure) and a poor specificity measure for the rare class.
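To make this concrete, a minimal Python sketch (the 98/2 split and the degenerate majority-class classifier are illustrative, not drawn from our data) shows how overall accuracy can look excellent while not a single churner is detected:

```python
# Illustrative 98/2 split: a degenerate classifier that predicts "retained"
# for every customer looks accurate but never detects a churner.
labels = ["retained"] * 98 + ["churned"] * 2   # ground truth, 2% churn rate
predictions = ["retained"] * 100                # majority-class classifier

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
caught = sum(1 for p, y in zip(predictions, labels)
             if y == "churned" and p == "churned")
churn_detection_rate = caught / labels.count("churned")
```

Here accuracy is 0.98 even though the rare class is never predicted, which is why metrics sensitive to the rare class must be reported alongside accuracy.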
Therefore, proposing strategies to deal with the problem of highly imbalanced datasets
is a relevant topic. According to Megahed et al. (2021), a challenging task is to increase
the number of instances of the rare class by collecting additional samples from the real-
world process, so rebalancing methods are important. These methods are part of the
broad concept of data preprocessing, which is a set of strategies carried out on raw
data before applying a learning algorithm. Recent studies aim to increase CCP perfor-
mance by testing multiple alternative techniques along the predictive modeling process,
including outliers treatment, missing values imputation (MVI), feature engineering (FE),
imbalanced dataset treatment (IDT), feature selection (FS), hyperparameters optimiza-
tion, and testing different classification models. Although such approaches address the
challenges of modeling churn behavior, they do not deal with scenarios where the churn
event is rare (Zhu et al. 2018). Therefore, data analysts should better understand the
data preprocessing idea to ensure that data manipulation does not change the underly-
ing data structure and favors the classification technique applied to such data (Megahed
et al. 2021).
This study proposes a novel framework for CCP in the banking industry, where rare
churn events are persistent (Gür Ali and Arıtürk 2014). Rarity tends to jeopardize the
performance of traditional techniques aimed at binary classification. Our objectives are
as follows:
(1) Propose and validate a data preprocessing phase that combines different approaches
for data preprocessing (FE focused on the retail banking context, IDT oversampling
(IDT-over), and IDT undersampling (IDT-under));
(2) Perform and validate a model training and testing phase with state-of-the-art clas-
sification techniques (XGBoost and elastic net) while using Bayesian approaches for
hyperparameter optimization;
(3) Evaluate and discuss the impact of different data preprocessing approaches to
improve the predictive power of classification models.
Our study makes a valuable contribution to the research on decision support systems. It sheds light on the importance of conducting data preprocessing steps in scenarios where a substantial class imbalance is observed. To address this issue, rigorous tests were applied, confirming the practical value of the model.
The rest of the paper is organized as follows. “Literature review” section reviews the
literature; “Proposed framework for CCP” section describes our framework; “Data and
empirical context” section presents the tests with the proposed framework; “Results and
discussion” section discusses the results; and, finally, “Conclusions” section concludes.
Literature review
In this section, we present a literature review on the fundamentals of the techniques
employed in the proposed framework.
CCP
Financial institutions have been challenged to develop new methods for min-
ing and interpreting customers’ data to help them understand their needs (Broby
2021). Broby (2021) also reported that the advancement of technology has reduced
information asymmetry between banks and their clients, boosting competition and
making customer retention management increasingly important. Similarly, Livne
et al. (2011) examined the link between customer relationships and financial perfor-
mance at the firm level in the context of US and Canadian wireless (cellular) firms.
They found a positive association between customer retention and future revenues.
They also suggested that customer retention and the level of service usage play an
important mediating role in the relationship between investments in customer acqui-
sition and financial performance. These results indicate the importance of generating
and retaining customer loyalty to drive financial performance.
Given the relevance of retaining customers, churn prevention practices have gained
significance. The development of churn prediction models is an effective alterna-
tive to potentiate customer retention efforts. According to Broby (2022), statistical
and computational models of machine learning such as CCP models are increas-
ingly being integrated into decision support systems in the financial area. Thus, many
studies have been conducted to determine effective classification models for churn
prediction, especially when using data from financial sources (Lahmiri et al. 2020).
However, modeling financial data is complicated due to the presence of multiple
latent factors that can evolve and are usually correlated and even autocorrelated (Li
et al. 2022). Table 1 presents some studies that focused on the banking industry to
exemplify the methods commonly used to preprocess the dataset and the classifica-
tion models adopted. In the table, we also compare our study with the other studies.
The extant studies typically focused on comparing the performance of predictive
models using different fundamentals and the managerial benefits of churn prediction
on customer management. However, data preprocessing methods and approaches
for hyperparameter optimization are scarce in the CCP literature. Usually, authors
that described data preprocessing adopted traditional practices already established in
previous studies, including random undersampling (RU) (Benoit and Poel 2012; He
et al. 2014; Gordini and Veglio 2017), MVI (Lemmens and Croux 2006), and outlier
detection and elimination (Zhao and Dang 2008; Keramati et al. 2016). Most studies,
except that by Geiler et al. (2022), have not analyzed how predictive performance can
benefit from employing more robust data preprocessing techniques, especially when a
substantial imbalance between classes is verified (a common issue in CCP).
There are noticeable differences between previous studies and our propositions. To
the best of our knowledge, our study is the first to propose a complete framework
for churn prediction that encompasses a sequence of preprocessing steps prior to the
classification task. Additionally, our framework relies on more sophisticated algo-
rithms for IDT-over and IDT-under steps, enabling a performance gain due to the
synergy between these steps. Our study also differs from those of Lemmens and Croux (2006), He et al. (2014), Farquad et al. (2014), and Geiler et al. (2022) in terms of the distribution of the features—both IDT-over and IDT-under did not modify the sample distribution significantly (as verified by a Kolmogorov–Smirnov (KS) test) because the number of added artificial instances was controlled. This is desirable because it preserves the relationships between features. Another interesting difference is that our framework does not devote a step to FS, as seen in the studies by Farquad et al. (2014) and Keramati et al. (2016), because the classification techniques employed perform this task through embedded feature selection.
Table 1 Previous research assessing churn prediction models (specifications of our study are presented in the bottom line for comparison purposes)

| Authors | Context | Data preprocessing stages | Predictive models | Dataset size |
| Lemmens and Croux (2006) | Telecom | MVI, IDT-Over, IDT-Under | Logistic regression, bagging, and stochastic gradient boosting | Datasets 1 and 2: 51,306 customers; Dataset 3: 100,462 customers |
| Zhao and Dang (2008) | Banking | – | Random forests, neural networks, decision trees, and Support Vector Machines | 1,524 customers |
| Xie et al. (2009) | Banking | OT | Support Vector Machines | 2,382 customers |
| Benoit and Poel (2012) | Banking | – | Random forest | 244,787 customers |
| Huang et al. (2012) | Telecom | FE | Logistic regression, Naive Bayes, linear classification, C4.5, neural networks, Support Vector Machines, and data mining by evolutionary learning (DMEL) | 827,124 customers |
| Farquad et al. (2014) | Banking | IDT-Over, IDT-Under, FS | Support Vector Machines, Naive Bayes trees | 14,814 customers |
| He et al. (2014) | Banking | IDT-Over, IDT-Under | Logistic regression and Support Vector Machines | 46,406 customers |
| Datta et al. (2015) | TV subscription | MVI, FE | Binomial probit model | 16,512 customers |
| Keramati et al. (2016) | Banking | OT, MVI, IDT-Over, FS | Decision trees | 4,383 customers |
| Geiler et al. (2022) | Banking and others | IDT-Over, IDT-Under | KNN, Logistic regression, Naive Bayes, Support Vector Machines, Decision trees, neural networks, Random Forest, XGBoost | 16 datasets (average of 108,473 customers) |
| Tékouabou et al. (2022) | Banking | MVI, FS | Chandy-Misra-Bryant | 45,000 customers |
| Our study | Banking | MVI, FE, IDT-Over, IDT-Under | XGBoost and Elastic Net | 3,283,332 customers |
Additionally, for Pyle (1999), the data preprocessing phase comprises more than 80% of the modeling effort, encompassing activities beyond cleaning and selection. In our study, we adopted three preprocessing methods—FE, IDT-over, and IDT-under—which are described in more detail in the following sections.
FE
FE is performed to transform the raw data into a new format that favors modeling
and contributes to performance gains (Khoh et al. 2023). It involves the initial discov-
ery of features and their stepwise improvement based on domain knowledge to adjust
the data representation (Kuhn and Johnson 2019). Mathematical operations, interac-
tions between attributes, matrix decomposition, discretization, and binarization are
typical FE operations performed regardless of the business context. However, based on
content assumptions, we can create new features using several practices, e.g., business
context and customer behavior (Zheng and Casari 2018). According to Ascarza et al.
(2018), customer retention involves the sustained continuity of customer transactions with a company. In this sense, RFM features are built from transactional data and can enhance prediction performance by capturing customer behavior, making them suitable churn predictors (Fader et al. 2005). Similarly, Kou et al. (2021b) used transaction data to enhance bankruptcy prediction in the banking industry.
IDT‑over
In binary classification modeling, it is common to have a rare class (Megahed et al.
2021). In extreme cases, a class rarity does not convey a sufficiently delimited decision
boundary between two groups (Weiss 2004). Classification problems such as bank fraud
detection, infrequent medical diagnoses, and oil spill recognition in satellite images are
well-known examples of rare classes (Galar et al. 2012). The imbalance between cate-
gories can make predictive classification modeling unfeasible. IDT-over methods try to
circumvent this by generating synthetic instances that reinforce the boundary between
classes (Triguero et al. 2012).
Researchers have used IDT-over methods in two ways—to create a new representation
of the data (dismissing the original data completely) and to oversample (creating artifi-
cial instances of the rare class) (Sun et al. 2009). One popular oversampling algorithm
is the synthetic minority oversampling technique (SMOTE), which assembles synthetic
examples of the infrequent category (Fernandez et al. 2018). With similar purposes, the
ADASYN algorithm searches for cases that are more difficult to discriminate between
classes and generates synthetic instances to reinforce them (He et al. 2008). The algo-
rithm uses a k-nearest neighbor (KNN) algorithm to add new artificial examples related
to the minority class, making it more distinct from the majority class. The objective of
ADASYN is to identify a weighted distribution of the rare class, considering its learning
difficulty (He et al. 2008). It uses rare class instances very close to the majority class to
generate other cases, reinforcing points in the boundary between the two groups. Recent
studies have demonstrated that ADASYN outperforms SMOTE (Dey and Pratap 2023).
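The core idea can be illustrated with a toy, pure-Python sketch of ADASYN-style generation (our experiments use the R implementation in the "themis" package; the function below, its parameters, and the 2-D points are simplified illustrations, not the production algorithm):

```python
import random
from math import dist  # Euclidean distance (Python >= 3.8)

def adasyn(minority, majority, k=3, seed=0):
    """Toy sketch of ADASYN (He et al. 2008): synthesize minority points,
    concentrating generation on minority samples whose k nearest neighbours
    are dominated by the majority class (the hard-to-learn boundary region)."""
    rng = random.Random(seed)
    n_to_add = len(majority) - len(minority)  # synthetics needed for balance

    # r_i: share of majority points among the k nearest neighbours of each
    # minority point; a high r_i marks a hard-to-learn region.
    ratios = []
    for i, p in enumerate(minority):
        pool = ([(q, "min") for j, q in enumerate(minority) if j != i]
                + [(q, "maj") for q in majority])
        pool.sort(key=lambda item: dist(p, item[0]))
        ratios.append(sum(label == "maj" for _, label in pool[:k]) / k)

    total = sum(ratios) or 1.0
    synthetic = []
    for i, (p, r) in enumerate(zip(minority, ratios)):
        g_i = round(r / total * n_to_add)  # more synthetics in harder regions
        neighbours = sorted((q for j, q in enumerate(minority) if j != i),
                            key=lambda q: dist(p, q))[:k]
        for _ in range(g_i):
            q = rng.choice(neighbours)
            lam = rng.random()  # interpolate between p and a minority neighbour
            synthetic.append(tuple(a + lam * (b - a) for a, b in zip(p, q)))
    return synthetic

# Illustrative 2-D data: 3 churned (minority) vs. 7 retained (majority) points.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
majority = [(0.2, 0.1), (0.9, 0.2), (0.1, 0.8),
            (2.0, 2.0), (2.0, 3.0), (3.0, 2.0), (3.0, 3.0)]
synthetic = adasyn(minority, majority, k=3, seed=42)
```

Minority points surrounded by majority neighbours receive more synthetic copies, which is precisely the weighted-distribution idea of He et al. (2008).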
IDT‑under
IDT-under methods remove instances from the majority class to balance the groups
(Fernandez et al. 2018). The techniques range from simple RU to more refined algo-
rithms. However, procedures such as RU do not guarantee the retainment of instances
that maximize the identification of patterns, failing to select significant examples (He
and Ma 2013). Therefore, Lin et al. (2017) proposed the CLUS algorithm to maximize
the heterogeneity of the majority class instances by reducing redundancies. Similarly,
Zhang and Mani (2003) proposed the NEARMISS algorithm.
The NEARMISS algorithm uses a KNN-based method to identify and remove instances with noisy patterns or those already represented in the data (redundant) (Zhang and Mani 2003). The procedure removes majority class instances with large average distances to their k-nearest neighbors, keeping the majority class instances near the decision boundary (Bafna et al. 2023).
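A minimal sketch of one common variant (NearMiss-1-style: keep the majority instances with the smallest average distance to their k nearest minority neighbours, i.e., those near the boundary) conveys the selection rule; the function name, parameters, and data are illustrative, not the authors' implementation:

```python
from math import dist

def nearmiss1(majority, minority, n_keep, k=3):
    """NearMiss-1-style undersampling sketch: rank each majority instance by
    its average distance to its k nearest minority instances, and keep the
    n_keep instances closest to the minority class (near the boundary)."""
    def avg_dist_to_minority(p):
        nearest = sorted(dist(p, q) for q in minority)[:k]
        return sum(nearest) / len(nearest)
    return sorted(majority, key=avg_dist_to_minority)[:n_keep]

# Illustrative data: boundary points are kept, far (redundant) points dropped.
majority = [(0.1, 0.1), (5.0, 5.0), (0.2, 0.0), (6.0, 6.0)]
minority = [(0.0, 0.0), (0.0, 0.2)]
kept = nearmiss1(majority, minority, n_keep=2)
```

The two majority points far from the minority cluster are discarded, shrinking the majority class while preserving its boundary region.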
Classification algorithms
Researchers have developed several classification algorithms to predict a churn event.
As a customer either remains with the firm or churns, binary classification models are a promising alternative to address this problem. We now present the fundamentals of the two state-of-the-art classification techniques we test in our framework—XGBoost and elastic net.
XGBoost
XGBoost is a tree-based ensemble method under the gradient-boosting decision tree framework, implemented in the R package “xgboost” (Chen et al. 2022). This method uses an ensemble of classification and regression trees (CARTs) to fit the training data samples, utilizing residuals to
calibrate a previous model at each iteration toward optimizing the loss function with a
performance metric [e.g., precision-recall area under curve (PR-AUC)] (Chen and Gues-
trin 2016). The XGBoost adds one CART when it identifies a subset of hard-to-classify
instances. To avoid overfitting in the calibration process, XGBoost adds a regularization
term into the objective function, controlling the complexity of the model. It also com-
bines first- and second-order gradient statistics to approximate the loss function before
optimization (Chen and Guestrin 2016). Equation 1 presents the objective function of the XGBoost model at each iteration:

$$J(f_t) \cong \sum_{i=1}^{n}\left[L\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(\vec{x}_i) + \frac{1}{2} h_i f_t^2(\vec{x}_i)\right] + \Omega(f_t) \quad (1)$$

where $n$ is the number of training instances; $f_t$ is the CART added at the $t$-th iteration; $g_i$ and $h_i$ are the first and second derivatives of the loss function $L$, respectively; and $\Omega$ characterizes the regularization term described in Eq. 2 (Chen and Guestrin 2016):

$$\Omega(f_t) = \gamma T_t + \frac{\lambda}{2}\sum_{j=1}^{T} \omega_j^2 \quad (2)$$

where $\gamma$ and $\lambda$ are constants controlling the regularization process; $T$ denotes the number of leaves in the tree; and $\omega_j$ is the score of each leaf. Representing the structure of a CART $f$ as a function $q : \mathbb{R}^m \to \{1, 2, 3, \ldots, T\}$ mapping an observed data instance to the corresponding leaf index, we derive $f(\vec{x}_i) = \omega_{q(\vec{x}_i)}$, with $\omega \in \mathbb{R}^T$. Next, plugging Eq. 2 into Eq. 1, removing the constant terms $L(y_i, \hat{y}_i^{(t-1)})$ that do not impact the optimization process, and defining $I_j = \{i : q(\vec{x}_i) = j\}$ as the instance set of leaf $j$, we can rewrite the objective function at each iteration as Eq. 3 (Chen and Guestrin 2016):

$$J(f_t) \cong \sum_{j=1}^{T}\left[\omega_j \sum_{i\in I_j} g_i + \frac{1}{2}\,\omega_j^2\left(\sum_{i\in I_j} h_i + \lambda\right)\right] + \gamma T \quad (3)$$
For a fixed tree structure $q$, the optimal weight of leaf $j$, $\omega_j^*$, is obtained by deriving Eq. 3 with respect to $\omega_j$ and equating it to zero, leading to Eq. 4 (Chen and Guestrin 2016):

$$\omega_j^* = -\frac{\sum_{i\in I_j} g_i}{\sum_{i\in I_j} h_i + \lambda} \quad (4)$$

The optimal value of the objective function is given by Eq. 5 (Chen and Guestrin 2016):

$$J^*(q_t) = -\frac{1}{2}\sum_{j=1}^{T}\frac{\left(\sum_{i\in I_j} g_i\right)^2}{\sum_{i\in I_j} h_i + \lambda} + \gamma T \quad (5)$$
Equation 6 is used as a scoring function to gauge the tree structure quality. High gain scores denote tree structures that better discriminate observations. As it is impossible to enumerate all possible tree structures, a greedy algorithm starting from a single leaf and iteratively adding branches to the tree is used. Letting $I_L$ and $I_R$ represent the instance sets of the left and right nodes, respectively, after a split (i.e., $I = I_L \cup I_R$ is the instance set of the node before splitting it), the new tree has $T + 1$ leaves with the same structure, except around the split node.

$$\text{Gain} = \frac{1}{2}\left[\frac{\left(\sum_{i\in I_L} g_i\right)^2}{\sum_{i\in I_L} h_i + \lambda} + \frac{\left(\sum_{i\in I_R} g_i\right)^2}{\sum_{i\in I_R} h_i + \lambda} - \frac{\left(\sum_{i\in I} g_i\right)^2}{\sum_{i\in I} h_i + \lambda}\right] - \gamma \quad (6)$$

This process quantifies how a given node split compares to the previous structure. When the Gain is negative, the algorithm stops growing that branch of the tree. The XGBoost algo-
rithm prunes the features that do not contribute to the prediction (embedded FS) and
generates feature importance ranking (Gain) based on the relative contribution of each
attribute to the model, as depicted in Eq. 6.
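Equations 4 and 6 can be sketched numerically in a few lines; the gradient and Hessian sums below are made-up values for a single candidate split, with lam and gamma standing in for λ and γ:

```python
def leaf_weight(g_sum, h_sum, lam):
    """Optimal leaf weight (Eq. 4): w* = -sum(g) / (sum(h) + lambda)."""
    return -g_sum / (h_sum + lam)

def split_gain(gl, hl, gr, hr, lam, gamma):
    """Split gain (Eq. 6): improvement from splitting a node into left and
    right children, minus the complexity penalty gamma."""
    def score(g, h):
        return g * g / (h + lam)
    return 0.5 * (score(gl, hl) + score(gr, hr) - score(gl + gr, hl + hr)) - gamma

# Made-up gradient/Hessian sums for one candidate split.
gl, hl = -4.0, 5.0   # sums over the left instance set I_L
gr, hr = 3.0, 4.0    # sums over the right instance set I_R
lam, gamma = 1.0, 0.1

left_weight = leaf_weight(gl, hl, lam)
gain = split_gain(gl, hl, gr, hr, lam, gamma)
```

A positive gain means the split improves the objective enough to pay the complexity penalty; a negative gain would stop growth at that node.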
Elastic net
Friedman et al. (2010) developed and implemented fast algorithms for fitting general-
ized linear models with elastic net penalties in the R programming language package
“glmnet.” We adopted the two-class logistic regression. It considers a response variable $Y \in \{-1, +1\}$ and a vector of predictors $\vec{x} \in \mathbb{R}^m$ ($m$ is the number of features), representing the class-conditional probabilities through a linear function of the predictors (see Eqs. 7–9):

$$P\left(Y = -1 \mid \vec{x}\right) = \frac{1}{1 + e^{-\left(\beta_0 + \vec{x}^{\,\prime}\beta\right)}} \quad (7)$$

$$P\left(Y = +1 \mid \vec{x}\right) = \frac{1}{1 + e^{\beta_0 + \vec{x}^{\,\prime}\beta}} = 1 - P\left(Y = -1 \mid \vec{x}\right) \quad (8)$$

$$P\left(Y = -1 \mid \vec{x}\right) + P\left(Y = +1 \mid \vec{x}\right) = 1 \quad (9)$$

The model is fit by regularized maximum (binomial) likelihood. Let $P(\vec{x}_i)$ be the probability of Eq. 7 for the $i$-th instance of the training set at particular values of the parameters $\beta_0$ and $\beta$; $\vec{x}_i \in \mathbb{R}^m$ represents the $i$-th instance of the training set, and $y_i \in \{-1, +1\}$ is the $i$-th observed response. We maximize the penalized log-likelihood function:

$$\frac{1}{n}\sum_{i=1}^{n}\left[I\left(y_i = -1\right)\log P(\vec{x}_i) + I\left(y_i = +1\right)\log\left(1 - P(\vec{x}_i)\right)\right] - \lambda\, PF_\alpha(\beta) \quad (10)$$

where $I(\cdot)$ is the indicator function (note that $I(y_i = -1) + I(y_i = +1) = 1$), and $PF_\alpha(\beta) = \sum_{k=1}^{m}\left[\frac{1}{2}(1-\alpha)\beta_k^2 + \alpha|\beta_k|\right]$ is the elastic net penalty factor (Zou and Hastie 2005), representing a compromise between the ridge regression penalty ($\alpha = 0$) and the lasso penalty ($\alpha = 1$). We can rewrite Eq. 10 more explicitly as Eq. 11:

$$\frac{1}{n}\sum_{i=1}^{n}\left[I\left(y_i = -1\right)\left(\beta_0 + \vec{x}_i^{\,\prime}\beta\right) - \log\left(1 + e^{\beta_0 + \vec{x}_i^{\,\prime}\beta}\right)\right] - \lambda\sum_{k=1}^{m}\left[\frac{1}{2}(1-\alpha)\beta_k^2 + \alpha|\beta_k|\right] \quad (11)$$
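The penalty factor $PF_\alpha(\beta)$ can be computed directly, which makes the ridge/lasso interpolation tangible; the coefficient vector below is illustrative:

```python
def elastic_net_penalty(beta, alpha):
    """Elastic net penalty factor PF_alpha(beta) (Zou and Hastie 2005):
    sum_k [ (1 - alpha)/2 * beta_k**2 + alpha * |beta_k| ].
    alpha = 0 gives the ridge penalty; alpha = 1 gives the lasso penalty."""
    return sum(0.5 * (1 - alpha) * b * b + alpha * abs(b) for b in beta)

beta = [0.5, -2.0, 0.0, 1.5]            # illustrative coefficient vector
ridge = elastic_net_penalty(beta, 0.0)  # pure ridge: 0.5 * sum of squares
lasso = elastic_net_penalty(beta, 1.0)  # pure lasso: sum of absolute values
mixed = elastic_net_penalty(beta, 0.5)  # halfway compromise
```

In the "glmnet" package, the alpha argument plays exactly this mixing role.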
Step 1: FE preprocessing
Most churn models use features that are not readily available in the dataset. Moreover,
the original feature set may not be sufficiently informative to predict customer churn.
Therefore, creating new features through FE is critical to expanding the number of
potential predictor candidates.
In the first step of the proposed framework, we create new features based on RFM,
which summarize customers’ purchasing behavior (Fader et al. 2005; Zhang et al. 2015)
and are strongly related to churn behavior in the banking industry. Recency is the time
of the most recent purchase; frequency corresponds to the number of prior purchases;
and monetary value depicts the average purchase amount per transaction (Fader et al.
2005; Zhang et al. 2015; Heldt et al. 2021).
In our propositions, we considered the concept of the traditional RFM approach
(Fader et al. 2005) and disaggregated per product category (RFM/P) (Heldt et al. 2021).
We used credit cards, overall credits, and investments as categories. We proposed new
features based on the original data and aligned them with the recency concept, which
comprehends the overall recency, recency per product category, and recency per chan-
nel. We also proposed new features derived from frequency, overall, or per product cat-
egory. We registered the number of periods with at least one transaction. In each period,
we recorded the total number of transactions, the overall binary indicator of purchase
incidence, the binary indicator of purchase incidence, and the binary indicator of using
a specific channel. Finally, consistent with the monetary value concept, we proposed
new features, such as the overall contribution margin, overall revenue, and overall value
transacted per product category.
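As an illustration of this step, the sketch below derives the aggregated RFM features and per-category recency (RFM/P) from a toy transaction log; the record layout, field names, and reference period are hypothetical rather than the authors' schema, and our actual implementation is in R:

```python
from collections import defaultdict

# Hypothetical transaction log: (customer_id, period, product_category, amount).
transactions = [
    ("c1", 1, "credit_card", 120.0),
    ("c1", 3, "investments", 500.0),
    ("c1", 6, "credit_card", 80.0),
    ("c2", 2, "overall_credit", 1000.0),
]
CURRENT_PERIOD = 6  # end of the training window (illustrative)

def rfm(transactions, current_period):
    """Recency, frequency, and monetary value per customer, plus recency
    disaggregated per product category (the RFM/P idea)."""
    by_customer = defaultdict(list)
    for customer, period, category, amount in transactions:
        by_customer[customer].append((period, category, amount))

    features = {}
    for customer, rows in by_customer.items():
        recency = current_period - max(p for p, _, _ in rows)  # time since last purchase
        frequency = len(rows)                                  # number of prior purchases
        monetary = sum(a for _, _, a in rows) / frequency      # average amount per transaction
        last_per_category = {}
        for period, category, _ in rows:
            last_per_category[category] = max(last_per_category.get(category, period), period)
        features[customer] = {
            "recency": recency,
            "frequency": frequency,
            "monetary": monetary,
            "recency_per_category": {c: current_period - p
                                     for c, p in last_per_category.items()},
        }
    return features

features = rfm(transactions, CURRENT_PERIOD)
```

Each customer row then becomes one observation in the predictor matrix, with the per-category recencies filling the RFM/P columns.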
Framework testing
Combining the three methods selected for the preprocessing phase (see “Phase 1: data preprocessing” section), we derive eight combinations: FE (yes or no) × IDT-over (ADASYN or no) × IDT-under (NEARMISS or RU). We compare all the combinations against a ninth alternative taken as a baseline that skips the preprocessing phase entirely (no FE, no IDT-over, and no IDT-under), using only the 32 original features.
Moreover, we combined all nine preprocessing choices with the two classification
techniques (XGBoost or elastic net) examined, deriving 18 configurations for testing.
They are presented in the first five columns of Table 5 as a framework.
The ADASYN algorithm does not determine a priori the optimal number of churned customers to synthesize. We want to add a number of churned customers that reinforces the boundary between classes without compromising the sample’s representativeness. We therefore performed experiments, adding as many artificial churned customers as possible without identifying any significant difference between the empirical cumulative distributions of the rare class—original versus after-ADASYN.
We also performed experiments to define the KNN parameter k, on which both ADASYN and NEARMISS rely. Numerical experiments suggested k = 5.
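The distribution check described above can be sketched as a two-sample KS statistic, the maximum gap between empirical CDFs (our tests run in R; the samples below are illustrative values, not our data):

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: maximum vertical distance
    between the two empirical cumulative distribution functions (ECDFs)."""
    a, b = sorted(sample_a), sorted(sample_b)
    def ecdf(sorted_sample, x):
        # Fraction of observations less than or equal to x.
        return bisect_right(sorted_sample, x) / len(sorted_sample)
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in sorted(set(a + b)))

# Illustrative check: an augmented sample that barely moves the ECDF
# versus a clearly shifted one.
original = [1.0, 2.0, 3.0, 4.0, 5.0]
augmented = original + [2.5]          # original plus one synthetic point
shifted = [x + 10.0 for x in original]
```

Synthetic instances are added only while the statistic stays below the significance threshold, so the augmented rare class keeps its original shape.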
The implementations used the R programming language version 4.2.1 (R Core Team
2022) package “themis” (Hvitfeldt 2022) for NEARMISS and ADASYN implementa-
tions, package “tune” for Bayesian hyperparameter optimization (Kuhn 2022), package
“xgboost” (Chen et al. 2022) for XGBoost, and package “glmnet” (Friedman et al. 2010)
for elastic net. Next, we describe the process of creating new features in the FE stage.
RFM aggregated
33. Purchase recency [int]
34. Purchase frequency [int]
35. Relative purchase frequency [real]
36. Purchase incidence [bin]
37. Effective total amount transacted monthly [real]
38. Number of distinct monthly transactions [int]
39. Total value transacted monthly [real]
40. Total contribution margin [real]

RFM disaggregated per product category
41. Credit purchase recency [int]
42. Investment purchase recency [int]
43. Use of mobile channels recency [int]
44. Credit purchase incidence [bin]
45. Investment purchase incidence [bin]
46. Use of mobile channels incidence [bin]
47. Number of direct debits [int]
48. Number of interactions in mobile channels [int]
49. Number of transactions and purchases in mobile channels [int]
50. Number of distinct products purchased [int]
51. Whether the customer has used a credit card [bin]
52. Whether the customer has done any debt renegotiation [bin]
53. Whether the customer has any credit taken [bin]
54. Whether the customer has any overdue credit [bin]
55. Whether the customer has a credit card [bin]
56. Overall investment value [real]
57. Overall credit value [real]
58. Ratio of overall credit value over credit value taken by the customer with all banks in the market [real]
59. Number of investment products terminated [int]
60. Number of credit contracts finished or terminated [int]

Binary features [bin] ∈ {0, 1}; real features [real] ∈ ℝ; integer non-negative features [int] ∈ ℤ⁺; and categorical features [cat]
For traits at nominal and binary levels, we take the most recent value in the training time frame, converting it to dummy features using the one-hot encoding procedure. For the other continuous features, we take the median of the last six periods.
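These two aggregation rules can be sketched as follows (category names, values, and the window length default are illustrative assumptions, not our production schema):

```python
from statistics import median

def one_hot(value, categories):
    """Expand one nominal trait into one dummy (0/1) feature per category."""
    return {f"is_{c}": int(value == c) for c in categories}

def median_of_last(history, window=6):
    """Aggregate a continuous feature as the median of its last `window` periods."""
    return median(history[-window:])

# Illustrative inputs: a nominal channel trait and a monthly balance series.
channel = one_hot("mobile", ["mobile", "branch", "internet"])
balance = median_of_last([100.0, 120.0, 90.0, 200.0, 150.0, 130.0, 110.0])
```

The median over the last six periods damps one-off spikes, while one-hot dummies let the linear elastic net handle nominal traits.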
In the following subsection, we analyze the impact of preprocessing stages and classi-
fication models on churn prediction performance. We also examine the influence of the
IDT-over and IDT-under algorithms on the majority and rare class distributions.
Fig. 3 Comparison among cumulative distributions from the rare and majority classes after applying IDT
preprocessing stages
features—credit purchase recency and investment purchase recency. These features are
among the most important for predicting churn behavior according to the importance
features index (Gain) from XGBoost (discussed at the end of this section).
Figure 3a, c show that the artificially added churned customers did not modify the original distribution (p value 1.0), even though ADASYN does not use information from the distribution. Figure 3b, d show the same result for the NEARMISS algorithm—the distributions of retained clients before and after NEARMISS are not significantly different (p value 0.9), even though NEARMISS does not use information from the distribution.
Performance results
The predictive performance of the proposed framework (configurations C1 and C2 in
Table 5) was tested against 16 alternative configurations by (1) using or not using FE,
(2) using ADASYN for IDT-over or not using ADASYN, and (3) using NEARMISS or
RU for IDT-under. We alternated the classification techniques (i.e., XGBoost and elastic net) in these configurations to assess their performance on churn prediction. We trained and tested each of the 18 configurations using 100-fold cross-validation. We used configurations C17 and C18, which do not undergo any preprocessing stages, as references. Table 5
presents the average PR-AUC, accuracy, recall, and specificity sorted by decreasing PR-
AUC. “Appendix B” depicts a confusion matrix of the average of the 100-fold cross-vali-
dation procedure for the recommended configurations C1 and C2.
Table 5 reveals that, on average, configurations C1 and C2 (i.e., FE, ADASYN, and NEARMISS coupled with XGBoost and elastic net, respectively) yielded the highest predictive
Table 5 Average of the 100-fold cross validation predictive performance for the 18 assessed
configurations
Configuration FE IDT-Over IDT-Under Classification model PR-AUC Accuracy Recall Specificity
customers as retained, which accounted for a perfect specificity. Although the accuracies
of C17 and C18 are the highest among all configurations, they are misleading as they did
not classify a single churned customer correctly. Figure 4 depicts the boxplots for all 16
configurations and four performance metrics (we omit configurations C17 and C18 due
to the effects of data unbalance on assessed performance metrics). Such boxplots depict
a similar variability of all performance metrics in the 100 replicates, suggesting high sta-
bility of the assessed metrics even when the performance levels are low.
Next, we statistically assessed how the different preprocessing steps and classification
techniques impacted the performance metrics. We first employed the multivariate anal-
ysis of variance (MANOVA) to test whether the three stages (FE, data balance, and clas-
sification model) are significant in the four predictive performance metrics. To apply the
MANOVA test, we initially tested for the normality of the performance metrics using
the Shapiro–Wilk test. Only three out of the 64 performance averages did not pass the
test (α < 0.05). Next, we carried out the MANOVA test. Table 6 presents the results.
The results indicate that all groups within the stages are different regarding the per-
formance metrics. The preprocessing steps are informative for the predictive perfor-
mance across all the metrics tested. The Pillai results for each stage indicate that FE had
the highest impact on predictive performance (Pillai = 0.8115), followed by the clas-
sification model (Pillai = 0.5646), IDT-over (Pillai = 0.2739), and IDT-under method
(Pillai = 0.2255).
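Pillai's trace is computed from the hypothesis (between-group) and error (within-group) SSCP matrices as trace(H(H + E)⁻¹). The study's actual design crosses four factors, so the one-way sketch below is illustrative only:

```python
import numpy as np

def pillai_trace(groups):
    """Pillai's trace for a one-way MANOVA: trace(H @ inv(H + E)), where H is
    the between-group and E the within-group SSCP matrix.
    groups: list of (n_i x p) arrays of multivariate observations."""
    allobs = np.vstack(groups)
    grand = allobs.mean(axis=0)
    # Between-group (hypothesis) SSCP
    H = sum(len(g) * np.outer(g.mean(axis=0) - grand, g.mean(axis=0) - grand)
            for g in groups)
    # Within-group (error) SSCP
    E = sum((g - g.mean(axis=0)).T @ (g - g.mean(axis=0)) for g in groups)
    return float(np.trace(H @ np.linalg.inv(H + E)))
```

In the univariate case this reduces to SSB/(SSB + SSW), so larger values indicate that a factor explains more of the joint variation in the four performance metrics, which is how the FE > classification model > IDT-over > IDT-under ordering should be read.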
Based on the MANOVA results, we conducted individual univariate ANOVA tests for
each performance metric, as presented in Table 7. This was conducted to check the sig-
nificance of each factor in each performance metric.
The univariate ANOVAs confirmed the significant differences when considering each
performance metric separately. They also supported the higher impact on all perfor-
mance metrics obtained from using FE relative to the other stages studied. Addition-
ally, we assessed the statistical differences of the best-ranked configurations using C1
configuration (e.g., FE, ADASYN, NEARMISS, and XGBoost) as a reference. A post hoc
analysis using pairwise t-tests revealed that the pairs C1–C2, C1–C3, C1–C4, and C1–
C5 are all significantly different (α < 0.05) for all performance metrics.
Given the impact of the FE stage in the framework proposed, we extracted the feature
importance (Gain) using Eq. 1 for C1 configuration to compare the contribution of the
most relevant original and created features. Table 8 presents the 10 features comprising 80% of the accumulated feature importance. Following the classic Pareto rule, Appendix C exhibits the complete list of the remaining features resulting from XGBoost.
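The 80% Pareto cut can be reproduced from normalized Gain values. A sketch with hypothetical gains: only the 46.23% share of credit purchase recency mirrors the text; the other names and values are made up for illustration:

```python
def top_features_by_gain(gains, coverage=0.80):
    """Return features whose normalized gain accumulates to `coverage`
    of total importance (classic Pareto cut)."""
    total = sum(gains.values())
    selected, acc = [], 0.0
    for name, g in sorted(gains.items(), key=lambda kv: -kv[1]):
        selected.append(name)
        acc += g / total
        if acc >= coverage:
            break
    return selected

# Hypothetical gains; only credit_purchase_recency's share follows the text
gains = {"credit_purchase_recency": 0.4623, "mobile_channel_recency": 0.20,
         "investment_purchase_recency": 0.15, "income": 0.10,
         "n_distinct_products": 0.05, "gender": 0.0377}
```

With real XGBoost output, the same dictionary comes from the booster's gain-based importance scores.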
Credit purchase recency is the topmost important feature, accounting for 46.23%
of feature importance when predicting customer churn. This finding reinforces the
conceptual foundations of customer behavior in the banking industry and the method
used to support the proposition of this new feature. When acquiring credit products,
clients sign contracts that usually last several months, so it is not surprising that a
customer remains active for a while. Once the credit contract has ceased and recency
starts to increase, the likelihood of defection also increases significantly, most likely
because of low switching costs.
According to the traditional RFM concept, recency is more influential than fre-
quency in defining the likelihood of a customer being active (Fader et al. 2005).
Regarding the derived RFM/P concept, the same rationale is valid, with the only
difference being the disaggregation per product category. In the present study, the
relevance of credit purchase recency for customer retention in the banking industry
confirms the findings of the existing literature.
Besides credit purchase recency, usage of mobile channels and investment purchase
recency are also relevant for churn prediction using the recommended C1 configuration.
It confirms the importance of features based on the RFM/P concept, which is consist-
ent with the rationale that recency is critical to defining whether a customer is likely to
remain active or not in the future.
Apart from the credit-related features, deposit and investment-related features were
also influential for churn prediction. Customers’ withdrawals from checking and invest-
ment accounts indicate possible churn events in the future. As recency measures the
time between withdrawals, smaller recency suggests that a customer is likely to churn.
The recency of customer interactions through mobile channels also contributes to
that sense, helping to identify customers that might be making transactions with a
competitor.
Overall customer income is also deemed one of the most relevant features by the
proposed framework as it contributes to increasing the retention rate. Low-income
customers are less likely to purchase investment products and more likely to default.
Additionally, due to their low credit limits, such customers often acquire credit from dif-
ferent financial institutions.
The number of distinct products purchased was also relevant to predicting customer
churn. This feature correlates with the concept of purchase frequency as it indicates
that a customer has purchased at least one financial service in a period. However, it also
encompasses information on how many different financial services a customer has pur-
chased. Therefore, it reflects the success of cross-selling efforts made by sales teams. The
relevancy of this feature to predict churn indicates that the higher the number of finan-
cial services a customer purchases, the less likely she is to churn.
We analyzed the impact of customer churn on a firm’s financial performance in terms
of the profit loss that would be avoided by employing the proposed framework. We
observed that 97.25% of profitability could be maintained if successful action was taken
to reverse the potential churn. When no action is taken, a firm would incur costs to
acquire new customers in order to recover the loss in profitability.
Furthermore, customers who are most likely to defect are those currently interacting with other banks, holding a median of 62% of their business at other financial institutions (“Appen-
dix D”, Fig. 5). Based on the Mann–Whitney test, a significant difference (p value 0.00)
was found in the share of wallet between the churn group and the group of retained customers. The median share of wallet for the churn group (median = 0.38) was substantially lower than that of the nonchurn group (median = 0.78). This suggests that the churn group conducts business with other financial institutions, and their defection represents a missed opportunity to capture that business. By retaining customers, a financial institution can capitalize on their business potential and generate long-term revenue streams.
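The Mann–Whitney comparison of share-of-wallet distributions rests on the U statistic, which counts how often one group's values exceed the other's. A minimal sketch follows; real analyses would use `scipy.stats.mannwhitneyu`, which also provides the p value. The toy samples are hypothetical values placed around the reported medians of 0.38 (churn) and 0.78 (retained):

```python
def mann_whitney_u(x, y):
    """U statistic for x versus y: number of pairs (xi, yj) with xi > yj,
    counting ties as one half."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u

# Hypothetical share-of-wallet samples around the reported medians
churn = [0.30, 0.38, 0.45]
retained = [0.70, 0.78, 0.85]
# Here U(churn, retained) = 0: every churner's share is below every
# retained customer's share, an extreme version of the reported gap
```

A U near 0 (or near len(x)*len(y)) signals strongly separated distributions, matching the significant difference (p value 0.00) reported above.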
Regarding a bank’s products, the profitability loss appears to come mainly from credit.
For example, rural credit was the most affected product, accounting for 35% of the total
lost margin and losing 8% of its margin during the sample period. This information suggests the need for adjustments in the offering of that product.
Managerial implications
To deal with customer churn, marketing managers try to identify in advance the
decrease in customer engagement with their company and plan marketing efforts to pre-
vent customers from defecting (Benoit and Poel 2012; Gordini and Veglio 2017). Regard-
ing the retail banking sector, adequate customer retention management has become
increasingly relevant due to the growing competition. The entry of new players relying
on digital and scalable innovative solutions tends to replace traditional services offered
by incumbents and reduce customers’ switching costs (Lazari and Machado 2021).
Managers benefit from the framework proposed in this study in many ways. They can
anticipate potentially churning customers three months in advance, even with highly imbalanced data. The framework not only allows managers to identify the most likely churning
customers but also gives them a reasonable amount of time to plan marketing efforts
proactively to prevent such churn from happening.
In addition, based on the FE step of the framework (Step 1) and using the concept
of RFM, we sought to generate features related to churn behavior in the retail banking
industry (see Tables 2, 3). This provides a tailored list of adequate predictors that manag-
ers can use to harness their existing databases, allowing classification algorithms to pre-
dict customer churn more precisely and thus increase the effectiveness of the customer
management team to monitor and manage potential churners.
Finally, based on these proposed features and after estimating the classification algo-
rithms, we also identify features with higher gain (Table 8). Analyzing such metrics
contributes to understanding features that are the most relevant churn predictors in
the retail banking context. For instance, credit purchase recency is the most pertinent
feature. Although customers might transfer a credit contract to another banking insti-
tution anytime they want, this finding suggests that the existence of credit contracts sig-
nificantly increases the likelihood of retaining a customer. Another key feature is mobile channel usage recency. As competition increasingly depends on
building digital relationships with customers, engaging them through the frequent use
of mobile channels is critical for their loyalty to a company. In summary, knowing fea-
tures that impact churn behavior is relevant for managers to drive well-designed market-
ing efforts and avoid customer defection. Thus, adopting the proposed framework has
important managerial implications, providing managers with resources to enhance cus-
tomer retention management and thrive in a digital market environment with growing
competition and lower switching costs.
Conclusions
Our study makes a valuable contribution to the research on decision support systems
by proposing a framework to model customer churn behavior in the context of highly
imbalanced classes. We emphasize the importance of conducting data preprocessing
strategies (i.e., FE, IDT-over, and IDT-under processes) to improve model performance.
The FE process is fundamental for building new and informative features to better char-
acterize customers’ behavior, directly impacting model effectiveness. Regarding the
IDT-over and IDT-under strategies, they handle the class imbalance issue through the
ADASYN and NEARMISS algorithms, respectively, providing the machine learning
technique with a suitable number of instances at the modeling stage. In the IDT-over
stage, the ADASYN algorithm reinforced the decision boundary toward the minority
class (churned customers). In the IDT-under step, NEARMISS provided an efficient way to undersample the retained customers of the majority class such that the maintained sample has reduced noise and redundancy, making it easier to identify patterns with the XGBoost and elastic net models.
Compared with alternative configurations, we demonstrated that the algorithms used
in our proposed framework perform well in terms of PR-AUC, accuracy, recall (sensitivity), and specificity. Additionally, as we tested two classification models (XGBoost and elastic net),
we demonstrated that adequate data preprocessing procedures improve the predictive
power in classification models relying on different mathematical fundamentals.
The proposed framework also provided a methodological contribution to the literature
on churn prediction by adequately dealing with highly imbalanced datasets. The thor-
ough work on data preprocessing in the FE step and rebalancing classes, reinforcing the
decision boundary between them, and reducing data redundancy and noise, combined
with state-of-the-art classification techniques, led to a higher predictive performance of
customer churn. Thus, our study substantially enhances customer retention practices,
fostering the adoption of more effective marketing efforts to prevent churn. Preventing
churn increases customer portfolio profitability.
These results confirm the effectiveness of conducting a detailed data preprocessing
before proceeding to the model training. Given the lack of extant research on this topic,
we encourage additional studies to investigate different data preprocessing methods.
They should not only test diverse classification models to achieve gains in predictive performance but also test different combinations of data preprocessing techniques, which have significant potential to provide even more accurate predictions.
Finally, we highlight some limitations of this study that can become the subject of future research. One limitation is the use of a single dataset, mainly due to the challenge of obtaining real datasets from the retail banking industry. We
encourage future studies to extend the proposed framework to other cases in this indus-
try or even apply it to address binary classification problems other than churn predic-
tion. Another limitation relates to the classification techniques tested in our experiments
(XGBoost and elastic net); different algorithms, such as artificial neural networks and
support vector machines, can also be tested. The decision to use only two types of algo-
rithms was due to the focus of our study on the data preprocessing phase. Another limi-
tation of the study is the absence of features regarding customer transactions with other
companies. Data indicating customers’ share-of-wallet change over time would probably
increase predictive performance. For instance, a decrease in the number of transactions
of a given customer with a focal company may increase the number of transactions of
this customer with competitors. The only available information regarding such behavior
was the binary feature on whether a customer took credit with other companies in the
banking system. However, we found only a weak association of this binary feature with
churn. Therefore, it does not appear as a good churn predictor. Given the lack of such
features in our dataset, we had to apply the framework utilizing data mainly related to
the strict relationship between the focal company and a customer. Future studies can
benefit from using features regarding customer transactions with other companies.
Feature Mean Customers with value 0 Customers with value 1
30. Whether the customer has received a salary with the bank 0.42 1,892,600 1,390,732
31. Whether the customer is a civil servant 0.16 2,748,934 534,398
32. Whether the customer has a dedicated account manager 0.15 2,790,652 492,680
44. Credit purchase incidence 0.83 570,809 2,712,523
45. Investment purchase incidence 0.37 2,080,999 1,202,333
36. Purchase incidence 0.98 58,702 3,224,630
46. Use of mobile channels incidence 0.61 1,290,043 1,993,289
55. Whether the customer has a credit card 0.20 2,635,776 647,556
53. Whether the customer has any credit taken 0.41 1,922,302 1,361,030
54. Whether the customer has any overdue credit 0.08 3,012,753 270,579
52. Whether the customer has done any debt renegotiation 0.01 3,252,890 30,442
51. Whether the customer has used a credit card 0.14 2,816,557 466,775
Table 11 shows the following descriptive statistics of each real and integer feature: (1)
average, (2) standard deviation, (3) minimum value, (4) median, and (5) maximum value.
We highlight how the distributions of the purchasing recency per product differ signifi-
cantly from the distribution of the overall purchase recency feature (specifically, credit
purchase recency and investment purchase recency). It indicates how the feature engi-
neering process based on the recency, frequency, and monetary value concept (RFM)
was important to generate features that uncover a relevant level of variability that was not observable using only the overall purchase recency feature. This provides a richer set of predictor features for the learning algorithms, increasing the likelihood of predicting churn more precisely.
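Per-product recency features of this kind can be derived from a raw transaction log by keeping, for each customer and product category, the days since the most recent purchase. The sketch below uses a hypothetical log and dates of our own; the paper's actual feature definitions are those in Tables 2 and 3:

```python
from datetime import date

def recency_per_product(transactions, reference_date):
    """RFM/P-style recency: days since the last purchase in each product
    category (credit, investment, ...) rather than one overall recency."""
    last_seen = {}
    for customer, category, tx_date in transactions:
        key = (customer, category)
        if key not in last_seen or tx_date > last_seen[key]:
            last_seen[key] = tx_date
    return {key: (reference_date - d).days for key, d in last_seen.items()}

# Hypothetical transaction log: (customer_id, product_category, date)
log = [
    (1, "credit", date(2023, 1, 10)),
    (1, "credit", date(2023, 3, 5)),
    (1, "investment", date(2022, 12, 1)),
    (2, "credit", date(2023, 2, 20)),
]
rec = recency_per_product(log, date(2023, 3, 31))
# rec[(1, "credit")] -> 26 days; rec[(1, "investment")] -> 120 days
```

Disaggregating recency per category is exactly what lets customer 1 show a short credit recency but a long investment recency, variability that a single overall recency feature would hide.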
C1 (rows: prediction; columns: actual churn, actual retained)
Churn 6,631.10 36,133.96
Retained 763.90 287,647.04
C2 (rows: prediction; columns: actual churn, actual retained)
Churn 6,399.63 45,588.36
Retained 995.37 278,192.64
Table 13 (continued)
Feature Original/from FE Gain
37. Effective total amount transacted monthly From FE 0.0166
19. Professional profile Original 0.0146
34. Purchase frequency From FE 0.0142
40. Total contribution margin From FE 0.0137
13. Marital status Original 0.0122
57. Overall credit value From FE 0.0117
59. Number of investment products terminated From FE 0.0090
11. Education level Original 0.0086
05. Checking account balance Original 0.0082
56. Overall investment value From FE 0.0073
45. Investment purchase incidence From FE 0.0071
20. Savings account balance Original 0.0058
48. The number of interactions in mobile channels From FE 0.0053
46. Use of mobile channels incidence From FE 0.0042
12. Gender Original 0.0039
22. Segment (Method B) Original 0.0036
33. Purchase recency From FE 0.0031
54. Whether the customer has any overdue credit From FE 0.0029
17. Overdraft limit Original 0.0027
58. The ratio of overall credit value over credit value taken by the customer with all banks in the market From FE 0.0026
23. The region where the customer’s bank agency is located Original 0.0016
Fig. 5 Mann–Whitney test to compare the share of wallet between churned and retained customers
Abbreviations
C1 Configuration 1
C2 Configuration 2
C3 Configuration 3
C4 Configuration 4
C5 Configuration 5
C6 Configuration 6
C7 Configuration 7
C8 Configuration 8
C9 Configuration 9
C10 Configuration 10
C11 Configuration 11
C12 Configuration 12
C13 Configuration 13
C14 Configuration 14
C15 Configuration 15
C16 Configuration 16
C17 Configuration 17
C18 Configuration 18
CCP Customer churn prediction
FE Feature engineering
FS Feature selection
IDT Imbalanced dataset treatment
IDT-over Imbalanced dataset treatment oversampling
IDT-under Imbalanced dataset treatment undersampling
KNN K-Nearest neighbor
KS-test Kolmogorov–Smirnov test
MVI Missing values imputation
PR-AUC Precision-recall area under curve
OT Outliers treatment
RFM Recency, frequency, and monetary value
RFM/P Recency, frequency, and monetary value/product
RU Random undersampling
SMOTE Synthetic minority oversampling technique
Acknowledgements
Not applicable.
Author contributions
JBGB designed the model and the computational framework, analyzed the data, and carried out the implementation. GBB, RH, JLB, and MJA verified the analytical methods and contributed to writing the manuscript with input from all authors. CSS, FBL, and MJA supervised the findings of this work and contributed to the final version of the manuscript. All
authors read and approved the final manuscript.
Funding
Not applicable.
Declarations
Competing interests
The authors declare no competing interests.
References
Ascarza E (2018) Retention futility: targeting high-risk customers might be ineffective. J Mark Res 55:80–98. https://doi.
org/10.2139/ssrn.2759170
Ascarza E, Hardie BGS (2013) A joint model of usage and churn in contractual settings. Mark Sci 32:570–590. https://doi.
org/10.1287/mksc.2013.0786
Ascarza E, Neslin SA, Netzer O et al (2018) In pursuit of enhanced customer retention management: review, key issues,
and future directions. Cust Need Solut 5:65–81. https://doi.org/10.1007/s40547-017-0080-0
Bafna R, Jain R, Malhotra R (2023) A comparative study of classification techniques and imbalanced data treatment for
prediction of software faults. Res Sq. https://doi.org/10.21203/rs.3.rs-2809140/v1
Benoit DF, den Poel DV (2012) Improving customer retention in financial services using kinship network information.
Expert Syst Appl 39:11435–11442. https://doi.org/10.1016/j.eswa.2012.04.016
Broby D (2021) Financial technology and the future of banking. Financ Innov 7:47. https://doi.org/10.1186/
s40854-021-00264-y
Broby D (2022) The use of predictive analytics in finance. J Finance Data Sci 8:145–161. https://doi.org/10.1016/j.jfds.2022.
05.003
Chen T, Guestrin C (2016) XGBoost: a scalable tree boosting system. In: proceedings of the 22nd ACM SIGKDD interna-
tional conference on knowledge discovery and data mining. Association for Computing Machinery, New York, pp
785–794
Chen T, He T, Benesty M, et al (2022) xgboost: Extreme gradient boosting. CRAN R package version 1.6.0.1: https://
CRAN.R-project.org/package=xgboost
Datta H, Foubert B, Van Heerde HJ (2015) The challenge of retaining customers acquired with free trials. J Mark Res
52:217–234. https://doi.org/10.1509/jmr.12.0160
Dey I, Pratap V (2023) A comparative study of SMOTE, borderline-SMOTE, and ADASYN oversampling techniques using
different classifiers. In: 2023 3rd international conference on smart data intelligence (ICSMDI), pp 294–302
Fader PS, Hardie BGS, Lee KL (2005) “Counting your customers” the easy way: an alternative to the pareto/NBD model.
Mark Sci 24:275–284. https://doi.org/10.1287/mksc.1040.0098
Farquad MAH, Ravi V, Raju SB (2014) Churn prediction using comprehensible support vector machine: an analytical CRM
application. Appl Soft Comput 19:31–40. https://doi.org/10.1016/j.asoc.2014.01.031
Fernandez A, Garcia S, Herrera F, Chawla NV (2018) SMOTE for learning from imbalanced data: progress and challenges,
marking the 15-year anniversary. J Artif Intell Res 61:863–905. https://doi.org/10.1613/jair.1.11192
Feyen E, Frost J, Gambacorta L et al (2021) Fintech and the digital transformation of financial services: implications for
market structure and public policy. BIS Papers 117. https://www.bis.org/publ/bppdf/bispap117.htm
Friedman J, Hastie T, Tibshirani R (2010) Regularization paths for generalized linear models via coordinate descent. J Stat
Softw 33:1–22
Galar M, Fernandez A, Barrenechea E et al (2012) A review on ensembles for the class imbalance problem: bagging-,
boosting-, and hybrid-based approaches. IEEE Trans Syst Man Cybern Part C (appl Rev) 42:463–484. https://doi.org/
10.1109/TSMCC.2011.2161285
García S, Luengo J, Herrera F (2014) Data preprocessing in data mining. Springer, Berlin
Geiler L, Affeldt S, Nadif M (2022) A survey on machine learning methods for churn prediction. Int J Data Sci Anal. https://
doi.org/10.1007/s41060-022-00312-5
Gordini N, Veglio V (2017) Customers churn prediction and marketing retention strategies. An application of support
vector machines based on the AUC parameter-selection technique in B2B e-commerce industry. Ind Mark Manag
62:100–107. https://doi.org/10.1016/j.indmarman.2016.08.003
Gür Ali Ö, Arıtürk U (2014) Dynamic churn prediction framework with more effective use of rare event data: the case of
private banking. Expert Syst Appl 41:7889–7903. https://doi.org/10.1016/j.eswa.2014.06.018
He H, Ma Y (2013) Imbalanced learning: foundations, algorithms, and applications. Wiley, New York
He H, Bai Y, Garcia EA, Li S (2008) ADASYN: adaptive synthetic sampling approach for imbalanced learning. In: 2008 IEEE international joint conference on neural networks (IEEE world congress on computational intelligence). IEEE, Hong Kong, pp 1322–1328
He B, Shi Y, Wan Q, Zhao X (2014) Prediction of customer attrition of commercial banks based on SVM model. Procedia
Comput Sci 31:423–430. https://doi.org/10.1016/j.procs.2014.05.286
Heldt R, Silveira CS, Luce FB (2021) Predicting customer value per product: from RFM to RFM/P. J Bus Res 127:444–453.
https://doi.org/10.1016/j.jbusres.2019.05.001
Huang B, Kechadi MT, Buckley B (2012) Customer churn prediction in telecommunications. Expert Syst Appl 39:1414–
1425. https://doi.org/10.1016/j.eswa.2011.08.024
Hvitfeldt E (2022) themis: Extra recipes steps for dealing with unbalanced data. CRAN R package version 1.0.0: https://
CRAN.R-project.org/package=themis
Jassim MA, Abdulwahid SN (2021) Data mining preparation: process, techniques and major issues in data analysis. IOP
Conf Ser: Mater Sci Eng 1090:012053. https://doi.org/10.1088/1757-899X/1090/1/012053
Keramati A, Ghaneei H, Mirmohammadi SM (2016) Developing a prediction model for customer churn from electronic
banking services using data mining. Financ Innov 2:10. https://doi.org/10.1186/s40854-016-0029-6
Khoh WH, Pang YH, Ooi SY et al (2023) Predictive churn modeling for sustainable business in the telecommunication
industry: optimized weighted ensemble machine learning. Sustainability 15:8631. https://doi.org/10.3390/su151
18631
Kou G, Olgu Akdeniz Ö, Dinçer H, Yüksel S (2021a) Fintech investments in European banks: a hybrid IT2 fuzzy multidimen-
sional decision-making approach. Financ Innov 7:39. https://doi.org/10.1186/s40854-021-00256-y
Kou G, Xu Y, Peng Y et al (2021b) Bankruptcy prediction for SMEs using transactional data and two-stage multiobjective
feature selection. Decis Support Syst 140:113429. https://doi.org/10.1016/j.dss.2020.113429
Kuhn M (2022) tune: Tidy tuning tools. CRAN R package version 1.0.1. https://CRAN.R-project.org/package=tune
Kuhn M, Johnson K (2019) Feature engineering and selection: a practical approach for predictive models. CRC Press, Boca
Raton
Lähteenmäki I, Nätti S (2013) Obstacles to upgrading customer value-in-use in retail banking. Int J Bank Mark 31:334–347.
https://doi.org/10.1108/IJBM-11-2012-0109
Lahmiri S, Bekiros S, Giakoumelou A, Bezzina F (2020) Performance assessment of ensemble learning systems in financial
data classification. Int J Intell Syst Account Finance Manag 27:3–9. https://doi.org/10.1002/isaf.1460
Lazari N, Machado G (2021) The future of banking: growing digitalization of Brazil’s financial system will foster efficiency
and intensify competition. S&P Global Ratings
Lemmens A, Croux C (2006) Bagging and boosting classification trees to predict churn. J Mark Res 43:276–286. https://
doi.org/10.1509/jmkr.43.2.276
Lemmens A, Gupta S (2020) Managing churn to maximize profits. Mark Sci 39:956–973. https://doi.org/10.1287/mksc.
2020.1229
Li T, Kou G, Peng Y, Yu PS (2022) An integrated cluster detection, optimization, and interpretation approach for financial
data. IEEE Trans Cybern 52:13848–13861. https://doi.org/10.1109/TCYB.2021.3109066
Lin W-C, Tsai C-F, Hu Y-H, Jhang J-S (2017) Clustering-based undersampling in class-imbalanced data. Inf Sci 409–410:17–
26. https://doi.org/10.1016/j.ins.2017.05.008
Livne G, Simpson A, Talmor E (2011) Do customer acquisition cost, retention and usage matter to firm performance and
valuation? J Bus Financ Acc 38:334–363. https://doi.org/10.1111/j.1468-5957.2010.02229.x
Megahed FM, Chen Y-J, Megahed A et al (2021) The class imbalance problem. Nat Methods 18:1270–1272. https://doi.
org/10.1038/s41592-021-01302-4
Murinde V, Rizopoulos E, Zachariadis M (2022) The impact of the FinTech revolution on the future of banking: opportuni-
ties and risks. Int Rev Financ Anal 81:102103. https://doi.org/10.1016/j.irfa.2022.102103
Mutanen T, Nousiainen S, Ahola J (2010) Customer churn prediction—a case study in retail banking. In: Data mining for
business applications, pp 77–83. https://doi.org/10.3233/978-1-60750-633-1-77
Pousttchi K, Dehnert M (2018) Exploring the digitalization impact on consumer decision-making in retail banking. Elec-
tron Markets 28:265–286. https://doi.org/10.1007/s12525-017-0283-0
Pyle D (1999) Data preparation for data mining (The Morgan Kaufmann series in data management systems), Book&CD-
ROM 1st. Morgan Kaufmann, Burlington
R Core Team (2022) R: a language and environment for statistical computing. R Project. https://www.R-project.org/
Reichheld FF, Sasser WE (1990) Zero defections: quality comes to services. Harvard business review. https://hbr.org/1990/
09/zero-defections-quality-comes-to-services
Sammut C, Webb GI (eds) (2010) Data preprocessing. In: Encyclopedia of machine learning. Springer, Boston, MA, pp
260–260
Snoek J, Larochelle H, Adams RP (2012) Practical bayesian optimization of machine learning algorithms. In: Advances in
neural information processing systems 25 (NIPS 2012). NeurIPS proceedings
Sofaer HR, Hoeting JA, Jarnevich CS (2019) The area under the precision-recall curve as a performance metric for rare
binary events. Methods Ecol Evol 10:565–577. https://doi.org/10.1111/2041-210X.13140
Sun Y, Wong AKC, Kamel MS (2009) Classification of imbalanced data: a review. Int J Patt Recogn Artif Intell 23:687–719.
https://doi.org/10.1142/S0218001409007326
Tékouabou SCK, Gherghina ŞC, Toulni H, Neves Mata P, Mata MN, Martins JM (2022) A Machine Learning Framework
towards Bank Telemarketing Prediction. J Risk Financ Manag 15:269. https://doi.org/10.3390/jrfm15060269
Triguero I, Derrac J, Garcia S, Herrera F (2012) A taxonomy and experimental study on prototype generation for nearest
neighbor classification. IEEE Trans Syst, Man, Cybern C 42:86–100. https://doi.org/10.1109/TSMCC.2010.2103939
Victoria AH, Maragatham G (2021) Automatic tuning of hyperparameters using Bayesian optimization. Evol Syst
12:217–223. https://doi.org/10.1007/s12530-020-09345-2
Weiss GM (2004) Mining with rarity: a unifying framework. SIGKDD Explor Newsl 6:7–19. https://doi.org/10.1145/10077
30.1007734
Xie Y, Li X, Ngai EWT, Ying W (2009) Customer churn prediction using improved balanced random forests. Expert Syst
Appl 36:5445–5449. https://doi.org/10.1016/j.eswa.2008.06.121
Zhang J, Mani I (2003) kNN approach to unbalanced data distributions: a case study involving information extraction. In: Proceedings of the ICML 2003 workshop on learning from imbalanced datasets, Washington, DC
Zhang Y, Bradlow ET, Small DS (2015) Predicting customer value using clumpiness: from RFM to RFMC. Mark Sci 34:195–
208. https://doi.org/10.1287/mksc.2014.0873
Zhao J, Dang X-H (2008) Bank Customer churn prediction based on support vector machine: taking a commercial bank’s
VIP customer churn as the example. In: 2008 4th international conference on wireless communications, networking
and mobile computing. IEEE, Dalian, China, pp 1–4
Zhao H, Zuo X, Xie Y (2022) Customer churn prediction by classification models in machine learning. In: 2022 9th interna-
tional conference on electrical and electronics engineering (ICEEE). pp 399–407
Zheng A, Casari A (2018) Feature engineering for machine learning: principles and techniques for data scientists. O’Reilly
Media, Inc
Zhu B, Baesens B, Backiel A, vanden Broucke SKLM (2018) Benchmarking sampling techniques for imbalance learning in
churn prediction. J Oper Res Soc 69:49–65. https://doi.org/10.1057/s41274-016-0176-1
Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Stat Soc Ser B (stat Methodol)
67:301–320. https://doi.org/10.1111/j.1467-9868.2005.00503.x
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.