Abstract: Customer churn prediction is very important for e-commerce enterprises to formulate
effective customer retention measures and implement successful marketing strategies. According to
the characteristics of longitudinal timelines and multidimensional data variables of B2C e-commerce
customers’ shopping behaviors, this paper proposes a loss prediction model based on the combination
of k-means customer segmentation and support vector machine (SVM) prediction. The method
divides customers into three categories and determines the core customer groups. The support vector
machine and logistic regression were compared to predict customer churn. The results show that each
prediction index after customer segmentation was significantly improved, which proves that k-means
clustering segmentation is necessary. The accuracy of the SVM prediction was higher than that of
the logistic regression prediction. These research results have significance for customer relationship
management of B2C e-commerce enterprises.
Keywords: B2C e-commerce customer; k-means; logistic regression; SVM; customer churn
1. Introduction

Customers are one of the most important assets of an enterprise, and they play a very important role in improving the market competitiveness and performance of the enterprise [1]. Amid fierce market competition, customers can easily choose among numerous products or service providers [2]. Studies show that the cost of developing a new customer is often higher than the cost of retaining an old customer [3]. If an enterprise maintains a good relationship with customers for a long time, it will gain more profits from existing customers. If the customer retention rate increases by 5%, the net present value of the enterprise will increase by 25–95% [4]. When the customer churn rate is reduced by 5%, the average profit margin of an enterprise will increase by 25–85% [5,6]. In order to maintain market advantages, it has become important for enterprises to determine how to make use of existing customer resources and avoid the loss of existing customers [7].

Customer churn prediction techniques can be used to identify customers that may be lost. Then, marketing strategies can be improved according to the forecast results. Retention of existing customers can effectively prevent performance loss [8]. Datta et al. proposed a framework for customer churn management that comprised data acquisition, business understanding, feature selection, a construction model, and a validation model [9]. This work described CHAMP (CHurn Analysis, Modeling, and Prediction), an automated system, and offered specific guiding methods for researching customer churn management, which is significant. Jain et al. showed that in the highly competitive telecom market, companies must analyze customer behaviors to determine customer loss and take effective measures to retain existing customers [10]. Research on churn prediction has been carried out in the telecom, banking, retail, and other industries. For example, in the prediction of telecom customer churn, Coussement et al. examined the customers' calling behavior, the customer and operator's interaction behavior, package subscription,
2. Literature Review
The forecasting methods of customer churn can be summarized into three types:
forecasting methods based on traditional statistical analysis, prediction methods based
on machine learning, and prediction methods based on combinatorial classifiers. The
prediction methods based on traditional statistical analysis mainly include linear discrim-
inant analysis, the naive Bayesian model, cluster analysis, and logistic regression. For
example, Pınar et al. used a naive Bayes classifier to predict customer churn of a tele-
com company in 2011. Their results showed that the average call duration of customers
was strongly correlated with customer churn [22]. Renjith et al. predicted e-commerce
customer churn by logistic regression and proposed a personalized customer retention
strategy using machine learning [23]. Caignya et al. combined logistic regression and a
decision tree to predict customer churn in the telecom industry [24]. In these studies, tradi-
tional statistical analysis methods were used for prediction, and the prediction model had
strong interpretability. However, these methods have limitations when dealing with big data and multidimensional variable data; their prediction performance is often limited in these cases.
Prediction methods based on machine learning mainly include decision trees, the
support vector machine, and artificial neural networks, among others. For example,
Neslin et al. stated that decision trees are widely used in practical customer churn prediction.
The decision tree algorithm can be applied as the basic model of loss prediction [25].
Zhang et al. used the C5.0 decision tree to predict the loss of postal short message service
of telecom enterprises. The results showed that the C5.0 decision tree prediction model
had high accuracy [26]. Farquad et al. predicted the churn of bank credit card customers
and proposed a hybrid method to extract rules from the support vector machine [27].
Gordini et al. predicted the loss of B2B e-commerce customers, and the results showed
that the support vector machine had a good prediction performance in processing noisy,
unbalanced, and non-linear B2B e-commerce data [7]. Tian et al. predicted the churn
of telecom customers, extracted appropriate variables from the original data with a two-
layer neural network and proposed a churn prediction model based on an artificial neural
network. The results showed that the prediction effect of this method was better than
that of the decision tree and naive Bayes classifier [28]. Yu et al. studied the prediction
of telecom customer churn, used iterative particle classification optimization and particle
fitness calculation to train the prediction model, and proposed a BP neural network based
on particle classification optimization. The results showed that their method improved the
accuracy of churn prediction [29].
The prediction methods based on combinatorial classifiers integrate several weak
classifiers and form a strong classifier. Commonly used combinative classifier methods
include AdaBoost, XGBoost, and random forest. For example, Wu et al. studied the e-commerce customer churn prediction problem: they balanced the positive and negative classes by adjusting the sampling ratio, reduced the size of the dataset, and improved the classification accuracy of the classifier by combining it with the AdaBoost algorithm. The results showed that AdaBoost had a good prediction effect [30]. For datasets with time charac-
teristics, such as that of telecom industry customers, Ji et al. proposed a hybrid feature
selection algorithm based on XGBoost. The algorithm chose features from two angles to
select the most important characteristic to predict customer churn while removing redun-
dant features. The experimental results showed that this method had a good prediction
performance [31]. Ahmed et al. studied the prediction of customer churn in the telecom in-
dustry and proposed a prediction model based on a combinatorial heuristic algorithm [32].
Ying et al. performed an in-depth study on the duality of bank customers and adopted
integrated LDA and boosting methods to predict customer churn and achieved good pre-
diction results [33]. Zhang et al. used CART and the adaptive boosting integrated model to
predict telecom customer churn. The results showed that their method had high prediction
accuracy [34].
In summary, researchers have used various forecasting methods to conduct in-depth
research on the churn of contractual customers in the telecom industry, banking industry,
and B2B e-commerce enterprises. Previous works have also discussed the advantages of various methods, thus making valuable contributions to the research on churn prediction of contractual customers. Customer loss of B2C e-commerce enterprises is concerning. Such customers' shopping behavior is multidimensional, and their shopping intentions and tendencies are personalized. Therefore, using the characteristics of customer data, this work studies the loss of non-contractual customers of B2C e-commerce enterprises.

3. Research Methods

E-commerce customers have two possible behaviors: customer churn and customer non-churn. Therefore, customer churn prediction is a binary classification problem. Online shopping has its own characteristics in terms of shopping time and behavior. Taking shopping time and behavioral tendency into full consideration may play an important role in judging customer loss. In this research, there are two important steps: customer segmentation and churn prediction. The clustering algorithm and the prediction model to be adopted are important choices that must be made. The chosen clustering algorithm depends on the type of data and the purpose of clustering. In the literature on customer segmentation from the last five years, the k-means customer segmentation method has been adopted in the e-commerce [35,36], retail [37,38], finance [39,40], and telecommunications industries [41,42]. This algorithm can help process large-scale data and exhibits simple calculation and high operation efficiency, and thus has been widely adopted in other industries [43]. In terms of prediction models, most of the literature compares the prediction effects of different models on a unified data set to determine the optimal one. In the literature on churn prediction [7], comparing prediction models such as LR, SVM, neural networks, decision trees, and random forest, the research results show that the SVM model exhibited a positive prediction performance with a quick training speed. Therefore, considering the obvious advantages of the k-means and SVM algorithms, we chose k-means and SVM as the main algorithms in this paper and made a comparative analysis with LR. On the basis of pre-processed data, this study first uses the k-means algorithm for the clustering subdivision of customers and then uses the LR algorithm and the SVM algorithm to establish prediction models. The prediction accuracy and effectiveness of these two prediction models are investigated, and a comparative analysis is conducted. The research process is shown in Figure 1.

The following is a brief introduction to the algorithms of SVM and LR.

Figure 1. The process of the SVM and LR models for customer churn prediction.
\sum_{i=1}^{m} a_i y_i = 0 \qquad (3)

Substitute Equations (2) and (3) into Equation (1), and finally convert the original problem into the following objective function (4) for the solution:

\max_{a} \ \sum_{i=1}^{m} a_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} a_i a_j y_i y_j x_i^{T} x_j \quad \text{s.t.} \ a_i \geq 0, \ i = 1, 2, \ldots, m \qquad (4)
SVM can simulate the nonlinear decision boundary well with the kernel function and
also control overfitting.
The advantage of SVM is that it can improve the generalization ability and deal with
high-dimensional data problems.
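As an illustration only (not the exact configuration used in this paper), a soft-margin SVM with an RBF kernel can be fitted as in the following sketch; the feature matrix X, the labels y, and the parameter values C and gamma are assumptions introduced for the example.

```python
# Minimal sketch of a kernel SVM classifier (assumed setup, not the paper's exact one).
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

X = np.random.rand(200, 4)          # placeholder behavioral features
y = np.random.randint(0, 2, 200)    # placeholder churn labels

# The RBF kernel lets the SVM model a nonlinear decision boundary;
# C trades off margin width against training errors (overfitting control).
svm_clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
svm_clf.fit(X, y)
print(svm_clf.predict(X[:5]))
```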
P(Y = 1 \mid X) = \frac{\exp(wx + b)}{1 + \exp(wx + b)} \qquad (5)

P(Y = 0 \mid X) = \frac{1}{1 + \exp(wx + b)} \qquad (6)
where the real values of the linear regression model z = w^{T}x + b are transformed into values
between [0, 1] by the sigmoid function. In other words, the result of a certain sample X
calculated by Equation (5) or (6) is the probability that the sample point X belongs to a
certain class. We can categorize them by setting thresholds. Generally, values calculated by
the sigmoid function are classified as category 1 if they are greater than or equal to 0.5 and
category 0 if they are not. The advantage of LR is that the results are easy to interpret and
applicable to both continuous and categorical variables.
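The calculation in Equations (5) and (6) and the 0.5 threshold can be written out directly. In the following sketch, the weight vector w, intercept b, and sample x are made-up values used only to illustrate the computation.

```python
# Sketch of the sigmoid transform and the 0.5 threshold described above (illustrative values).
import numpy as np

w = np.array([0.8, -0.5, 0.3])   # assumed weights
b = -0.2                         # assumed intercept
x = np.array([1.2, 0.7, 2.0])    # one sample

z = w @ x + b                               # linear score z = w^T x + b
p_class1 = np.exp(z) / (1.0 + np.exp(z))    # P(Y = 1 | X), Equation (5)
p_class0 = 1.0 / (1.0 + np.exp(z))          # P(Y = 0 | X), Equation (6)
label = 1 if p_class1 >= 0.5 else 0         # classify with the 0.5 threshold
print(p_class1, p_class0, label)
```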
4. Empirical Study
4.1. Data Preparation
The original dataset was published by the Alibaba Cloud Tianchi platform [49] for scientific research and big data competitions. This dataset contains all the behavior data of 987,994 random users who had shopping activities between
23 November 2017 and 4 December 2017. The data set includes five categories of
indicators, namely User ID, Item ID, Category ID, Behavior Type, and Timestamp.
Among them, there are four types of behaviors: PV (page view of an item’s detail
page, equivalent to an item click), Buy (purchase an item), Cart (add an item to the
shopping cart), and Fav (favorite an item). The original data were preprocessed and
the time of shopping behavior was divided into two stages: the first six days as the
observation period and the last six days as the verification period. A customer who
bought more than once during the observation period and bought again more than
once during the verification period is defined as a non-lost customer and is represented
by 0. Customers who purchased more than once during the observation period and 0
times during the verification period were defined as lost customers and represented
by 1. In the R statistical language, customers were first grouped according to User
ID, then the purchase times of each customer in the observation period and verifi-
cation period were calculated, and the customers that met the screening conditions
were retained. Finally, 95,388 pieces of data of 8156 customers were retained, among
which 7576 customers were lost, accounting for 92.8%. There were 580 non-lost cus-
tomers, accounting for 7.2%. Customer data was unbalanced, so the data imbalance
was addressed.
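A minimal sketch of the labeling rule described above is given below (the original processing was carried out in R). The schema, the "buy" behavior code, the boundary date between the two six-day periods, and the purchase-count threshold are assumptions for illustration, and the subsequent treatment of the class imbalance is not reproduced here.

```python
# Sketch of the churn-labeling rule: customers who bought in the observation period but not in the
# verification period are labeled 1 (lost); those who bought in both periods are labeled 0 (non-lost).
import pandas as pd

# Placeholder behavior log with an assumed schema (user_id, behavior, time).
df = pd.DataFrame({
    "user_id":  [1, 1, 2, 2, 3, 3],
    "behavior": ["buy", "buy", "buy", "pv", "buy", "buy"],
    "time": pd.to_datetime(["2017-11-24", "2017-12-01", "2017-11-25",
                            "2017-12-02", "2017-11-26", "2017-12-03"]),
})

split = pd.Timestamp("2017-11-29")                       # assumed boundary between the two six-day periods
buys = df[df["behavior"] == "buy"]
obs = buys[buys["time"] < split].groupby("user_id").size()
ver = buys[buys["time"] >= split].groupby("user_id").size()

labels = obs.to_frame("obs_buys")                        # keep customers who bought during observation
labels["ver_buys"] = ver.reindex(labels.index).fillna(0)
labels["churn"] = (labels["ver_buys"] == 0).astype(int)  # 1 = lost customer, 0 = non-lost customer
print(labels)
```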
Figure 2. Relationship between the contour coefficient and K.
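Figure 2 plots the contour coefficient (the silhouette coefficient) against K. A curve of this kind can be produced as in the sketch below; the feature matrix X and the candidate range of K are illustrative assumptions rather than the exact setup of this study.

```python
# Sketch: compute the average silhouette (contour) coefficient for several K and pick the best.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X = np.random.rand(500, 4)          # placeholder customer feature matrix

scores = {}
for k in range(2, 9):               # assumed candidate range for K
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    scores[k] = silhouette_score(X, km.labels_)
    print(k, round(scores[k], 3))

best_k = max(scores, key=scores.get)   # K with the highest average silhouette coefficient
labels = KMeans(n_clusters=best_k, n_init=10, random_state=42).fit_predict(X)
```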
As can be seen in Figure 2, we clustered the data into three categories and observed the number of lost and non-lost samples of each cluster customer group. The clustering results are shown in Table 1.

Table 1. Results of K-means classification.

Customer Type   Churn and Non-Churn   Number of Actual Customers   Number of Clustered Customers
Cluster I       0                     484                          4935
                1                     4451
Cluster II      0                     83                           2697
                1                     2614
Cluster III     0                     13                           524
                1                     511

(0: Non-churn customer; 1: Churn customer).

As can be seen in Table 1, there were 4935 customers in Cluster I, among which 484 were non-churn customers, accounting for 9.8% of the total number of Cluster I customers. There were 2697 customers in Cluster II, of which 83 were non-churn customers, accounting for 3.1% of the total number of Cluster II customers. There were 524 customers in Cluster III, of which 13 were non-churn customers, accounting for 2.5% of the total number of Cluster III customers. The customer churn rate of Cluster I was 90.2%, that of Cluster II customers was 96.9%, and that of Cluster III customers was 97.5%. Therefore, Cluster I customers were the core customer group to be focused on.

4.4. Random Forest and Feature Selection

The customer churn data set contained many customer features, and not all variables were conducive to churn prediction performance. Excessive redundancy and irrelevant variables in the dataset may hinder the predictive performance of the model [54]. Therefore, feature selection was carried out next. Random forest is an effective feature selection algorithm with high classification accuracy, good robustness to noise and outliers, and strong generalization ability [55]. Therefore, random forest has been widely used in business management, economics, finance, biological sciences, and other fields. The number of variables in this study was high at 17, so the random forest algorithm was used to select feature variables. The key problem in feature selection is how to select the optimal number of features (M). The out-of-bag error (OOB error) was used to determine the number of features [56]. When each tree was constructed using the random forest, different bootstrap samples were used for the training set. Due to the characteristics of sampling, the OOB error could be calculated. The calculation results of the OOB error are shown in Table 2.
Table 2. Calculation results of the OOB error.

Number of Features   2      3      4      5      6      7      8      9      10     11
OOB error            0.069  0.080  0.081  0.083  0.089  0.097  0.098  0.099  0.104  0.104
By changing the number of randomly selected features each time, the difference of
out-of-bag error rate was found to be very small, and the influence of feature number M
was not large. When the number of features selected was 4 each time, the OOB error was
relatively small. So, the random forest with the feature number M set to 4 was established
and the importance of the variables was output, as shown in Table 3.
The Gini index in the random forest can be used to judge the importance of features,
and the importance of each variable can be judged by calculating the value of the Gini
index. The larger the Gini value is, the stronger the importance of the variable is [57].
According to the results in Table 3, four variables were selected as the variables of loss
prediction, namely, “Night Buy,” “PM Buy,” “Night PV,” and “PM PV”.
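The OOB-error screening of the number of randomly selected features M and the Gini-based importance ranking described above can be sketched as follows; the placeholder data, the number of trees, and the candidate range of M are assumptions introduced for the example.

```python
# Sketch: OOB error for different values of M (max_features) and Gini importance ranking.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

X = np.random.rand(1000, 17)                       # placeholder: 17 candidate behavioral variables
y = np.random.randint(0, 2, 1000)                  # placeholder churn labels

for m in range(2, 12):                             # candidate numbers of features drawn at each split
    rf = RandomForestClassifier(n_estimators=500, max_features=m,
                                oob_score=True, random_state=42)
    rf.fit(X, y)
    print(m, round(1.0 - rf.oob_score_, 3))        # OOB error = 1 - OOB accuracy

# Final forest with M = 4 and its Gini-based variable importances.
rf4 = RandomForestClassifier(n_estimators=500, max_features=4,
                             oob_score=True, random_state=42).fit(X, y)
ranking = sorted(enumerate(rf4.feature_importances_), key=lambda t: -t[1])
print(ranking[:4])                                 # indices of the four most important variables
```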
                      Predicted Positive (0)   Predicted Negative (1)   Accuracy   Recall   Precision   AUC
Actual positive (0)   753                      1                        0.9081     0.9990   0.8175      0.942
Actual negative (1)   138                      620

                      Predicted Positive (0)   Predicted Negative (1)   Accuracy   Recall   Precision   AUC
Actual positive (0)   753                      1                        0.9065     0.9984   0.8149      0.938
Actual negative (1)   140                      618

              Actual         Predicted Positive (0)   Predicted Negative (1)   Accuracy   Recall   Precision   AUC
Cluster I     positive (0)   11                       1                        0.9256     0.9184   0.9298      0.992
              negative (1)   4                        48
Cluster II    positive (0)   107                      0                        0.9158     0.9982   0.8820      0.966
              negative (1)   31                       231
Cluster III   positive (0)   635                      0                        0.9053     0.9998   0.7706      0.927
              negative (1)   102                      343
Avg.                                                                           0.9156     0.9721   0.8608

              Actual         Predicted Positive (0)   Predicted Negative (1)   Accuracy   Recall   Precision   AUC
Cluster I     positive (0)   10                       2                        0.9050     0.8496   0.9176      0.963
              negative (1)   4                        47
Cluster II    positive (0)   107                      0                        0.9098     1        0.8728      0.955
              negative (1)   33                       228
Cluster III   positive (0)   635                      0                        0.9050     1        0.7697      0.904
              negative (1)   102                      343
Avg.                                                                           0.9066     0.9498   0.8533
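For reference, the four indicators reported in the tables above can be computed from the actual labels and predicted scores as in the following sketch. The data are placeholders, the 0.5 threshold is an assumption, and treating label 0 as the positive class simply follows the layout of the tables; the exact definitions used in this paper may differ.

```python
# Sketch: computing Accuracy, Recall, Precision, and AUC for a churn classifier (illustrative only).
import numpy as np
from sklearn.metrics import (accuracy_score, recall_score, precision_score,
                             roc_auc_score, confusion_matrix)

y_true = np.random.randint(0, 2, 300)        # placeholder actual labels
y_score = np.random.rand(300)                # placeholder predicted scores (e.g., predict_proba)
y_pred = (y_score >= 0.5).astype(int)        # assumed 0.5 decision threshold

print(confusion_matrix(y_true, y_pred))                            # rows: actual, columns: predicted
print("Accuracy :", accuracy_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred, pos_label=0))     # label 0 as 'positive', as in the tables
print("Precision:", precision_score(y_true, y_pred, pos_label=0))
print("AUC      :", roc_auc_score(y_true, y_score))
```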
them. They can then provide real-time and customized product promotions for customers
during these time periods according to this operable information for the sake of continuous
customer retention.
The literature on B2C e-commerce customer relationship management primarily
focuses on researching customer churn prediction modeling [69–71]. One of the pur-
poses of customer churn prediction for enterprises is to evaluate the effectiveness of
customer retention measures and conduct targeted commodity promotion activities
for retained customers. These marketing strategies should be operable, or enterprises
should not only determine target customers according to the customer churn tendency
but evaluate the practical effects of customer retention measures according to the practi-
cal situation [72]. This paper adopts the research method Classification before Prediction.
The customers are first segmented into three types according to the k-means algorithm.
After the type of customers is made clear, customer churn can then be predicted. The
recent literature [73] emphasizes the importance of customer segmentation, which places
customer segmentation as the first step, and is thus fundamentally consistent with our
research method (Classification before Prediction). For e-commerce enterprises, our
research method holds practical significance. For example, when an enterprise deter-
mines that Cluster I is the key customer group to be focused on, it can make a churn
prediction for Cluster I customers through an appropriate algorithm, judge the churn
tendency of Cluster I customers according to the prediction and evaluation indicators
and then formulate appropriate marketing strategies. If the enterprise cannot judge
such customers in time, no effective customer retention strategies will be carried out,
and, accordingly, part of the Cluster I customers may be subject to churn. Therefore,
the research method utilized in this paper fully takes into account customer retention.
Our research results reflect practicality, which embodies another important value of
this paper.
LR and SVM are compared to judge the prediction performance of the two models,
which is consistent with the research methods and results of other literature. The
prediction performance of SVM is better than that of LR [74–76]. When evaluating
the prediction performance of a model, undoubtedly several performance indicators
are always used. Our research adopts three indicators, namely Accuracy, Recall, and
Precision, which are basically the same as those evaluation indicators mentioned in
other literature [7,77,78]. However, we believe that the customer data of e-commerce
enterprises is unique. Enterprises often change product information or upload a variety
of evaluation information about consumption for the sake of customer retention, which
is quite different from financial and telecommunication customer information. The
training and testing times of various prediction models are quite disparate when dealing
with customer data. As a matter of fact, when we use the SVM model to train customer
data, the training time for SVM is significantly shorter than that for LR. To retain
customers, e-commerce enterprises need to predict customers in real time. Therefore,
the operational efficiency of the models should be considered, which is mentioned in
only a few pieces of literature [57].
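Training-time differences of the kind mentioned above can be measured directly by timing the fit calls of the two models on the same data, as in the sketch below; the data, model parameters, and resulting timings are illustrative and depend entirely on the dataset and implementation, so this is not a reproduction of the paper's measurement.

```python
# Sketch: comparing the training time of SVM and LR on the same data (illustrative only).
import time
import numpy as np
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

X = np.random.rand(5000, 4)
y = np.random.randint(0, 2, 5000)

for name, model in [("SVM", SVC(kernel="rbf")), ("LR", LogisticRegression(max_iter=1000))]:
    start = time.perf_counter()
    model.fit(X, y)
    print(name, "training time: %.3f s" % (time.perf_counter() - start))
```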
B2C e-commerce environment. Although a large number of shopping websites exist, not
all variables have the same impact on customer churn. Previous research [68] forms a basis for this paper; it helps us continue to expand the research scope and shows that shopping time (time period) is a key variable for prediction. Traditional R and F variables
are not key for predictions. For customers who are most likely to choose other companies
for consumption, the “Buy” and “PV” within the Night and PM periods are the most
relevant variables for customer churn. Previous research on customer churn [69–71]
has focused on the prediction performance of models. Although we have a limited
understanding of the reasons for customer churn in the B2C e-commerce industry and
enterprises, studying the impact of time variables (P, B, C and F) in this context helps
prove the generalizability of the traditional RFM model. In addition, this paper, based on
the research method segmentation before prediction, clarifies how to predict customer
churn more accurately in the e-commerce industry, which is another achievement of
this paper.
Money). Our research supports “segmentation first” as the first step in predictions, which
is of practical guiding significance for enterprise customer retention. B2C e-commerce
companies should understand that it is feasible to fully utilize machine learning
tools (technologies) in the customer recommendation system. Marketing managers need to
constantly learn the application methods for various machine learning tools (technologies).
The method mentioned in this paper is reliable and operable and allows an enterprise to
judge which customers are liable to leave in time. Our research shows that marketing
managers can easily optimize their customer retention strategies and product promotion
activities by seeking target customer groups. Considering the operation efficiency of the
model and the ability inherent in enterprise technical teams, the fruits of our method are
consistent with previous literature [80,81]. The technical teams and managers at companies
should not only consider prediction technologies, but also the ability and knowledge of
the personnel or managers using machine learning technologies. In short, for enterprises,
cooperation between people and technologies is always necessary for data processing and
customer segmentation or model optimization and performance evaluation.
7. Conclusions
Customer churn predictions are very important in e-commerce. To maintain market
competitiveness, B2C enterprises should make full use of machine learning in customer
relationship management to predict the potential loss of customers and devise new market-
ing strategies and customer retention measures according to the prediction results. This
will help establish efficient and accurate loss prediction for e-commerce enterprises.
This paper used customer behavior data of a B2C e-commerce enterprise to test the
predictive ability of the SVM and LR models. To evaluate the prediction performance of the
two models, the k-means algorithm was first used for clustering subdivision to classify the customers into three types, and then predictions were made for these three types of customers.
Accuracy, recall, precision, and AUC were calculated. There were two motivations for
our research. The first purpose was to study the effectiveness of customer segmentation
and the prediction effect of the model before and after customer segmentation according
to the longitudinal timeliness and multivariate variables of customer shopping behavior.
The experimental results show that each prediction index improved significantly after customer segmentation. Therefore, k-means clustering segmentation is necessary.
The second purpose was to compare the effects of the LR model’s prediction based on
traditional statistics and the SVM’s prediction based on machine learning. The results
prove that the accuracy of the SVM model prediction was higher than that of the LR model
prediction. These research results have significance for customer relationship management
of B2C e-commerce enterprises.
The results of this study also have some limitations. We used a real data set containing
987,994 customers under the B2C environment; that is, the selection of data is limited to
a certain extent. Ideally, the research results should be verified by several data sets. The
results of customer segmentation greatly impact the prediction performance of the model.
In terms of the method, this paper only uses the k-means algorithm to segment and divide
customer types, which may have limitations, because if two or more segmentation methods
are compared, more convincing results may be obtained. In addition, we only use a small
number of predictive variables, which limits the generalizability of our results, because
a lot of shopping information is presented on B2C websites, and some of it may be ignored.
Future research can be conducted from several aspects. First, we can collect and
compare the customer behavioral data from several companies to further enhance the
generalizability of our model. Second, considering that customer retention is a continuous task for enterprises, it is necessary to continuously predict and evaluate churn. It is also crucial to carry out segmentation that takes customer value into account and to build a prediction model with value variables, because all prediction modeling work and customer retention activities of enterprises ultimately aim at profit.
Author Contributions: Conceptualization, X.X. and Y.H.; methodology, X.X. and Y.H.; software, X.X.;
validation, X.X. and Y.H.; formal analysis, X.X. and Y.H.; investigation, X.X. and Y.H.; resources, X.X.;
data curation, X.X.; writing—original draft preparation, X.X. and Y.H.; writing—review and editing,
X.X. and Y.H.; visualization, X.X. All authors have read and agreed to the published version of
the manuscript.
Funding: This research received no external funding.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Informed consent was obtained from all subjects involved in
the study.
Data Availability Statement: Not applicable.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Bi, Q.Q. Cultivating loyal customers through online customer communities: A psychological contract perspective. J. Bus. Res.
2019, 103, 34–44. [CrossRef]
2. Óskarsdóttir, M.; Bravo, C.; Verbeke, W.; Sarraute, C.; Baesens, B.; Vanthienen, J. Social network analytics for churn prediction in telco:
Model building, evaluation and network architecture. Expert. Syst. Appl. 2017, 85, 204–220.
3. Roberts, J.H. Developing new rules for new markets. J. Acad. Market. Sci. 2000, 8, 31–44. [CrossRef]
4. Reichheld, F.F.; Sasser, W.E. Zero defections: Quality comes to services. Harv. Bus. Rev. 1990, 68, 105–111.
5. Jones, T.O.; Sasser, W.E., Jr. Why satisfied customers defect. IEEE Eng. Manag. Rev. 1998, 26, 16–26. [CrossRef]
6. Nie, G.; Rowe, W.; Zhang, L.; Tian, Y.; Shi, Y. Credit card churn forecasting by logistic regression and decision tree. Expert. Syst.
Appl. 2011, 38, 15273–15285. [CrossRef]
7. Gordini, N.; Veglio, V. Customers churn prediction and marketing retention strategies: An application of support vector machines
based on the AUC parameter-selection technique in B2B e-commerce industry. Ind. Market. Manag. 2017, 62, 100–107. [CrossRef]
8. Zorn, S.; Jarvis, W.; Bellman, S. Attitudinal perspectives for predicting churn. J. Res. Interact. Mark. 2010, 4, 157–169. [CrossRef]
9. Datta, P.; Masand, B.; Mani, D.R.; Li, B. Automated cellular modeling and prediction on a large scale. Artif. Intell. Rev. 2000, 14,
485–502. [CrossRef]
10. Jain, H.; Khunteta, A.; Srivastava, S. Churn prediction in telecommunication using logistic regression and logit boost. Procedia
Comput. Sci. 2020, 167, 101–112. [CrossRef]
11. Coussement, K.; Lessmann, S.; Verstraeten, G. A comparative analysis of data preparation algorithms for customer churn
prediction: A case study in the telecommunication industry. Decis. Support Syst. 2017, 95, 27–36. [CrossRef]
12. Masand, B.; Datta, P.; Mani, D.R.; Li, B. CHAMP: A prototype for automated cellular churn prediction. Data Min. Knowl. Disc.
1999, 3, 219–225. [CrossRef]
13. Huang, Y.; Kechadi, T. An effective hybrid learning system for telecommunication churn prediction. Expert. Syst. Appl. 2013, 40,
5635–5647. [CrossRef]
14. Larivière, B.; Van den Poel, D. Investigating the role of product features in preventing customer churn, by using survival analysis
and choice modeling: The case of financial services. Expert. Syst. Appl. 2004, 27, 277–285. [CrossRef]
15. Zi˛eba, M.; Tomczak, S.K.; Tomczak, J.M. Ensemble boosted trees with synthetic features generation in application to bankruptcy
prediction. Expert. Syst. Appl. 2016, 58, 93–101. [CrossRef]
16. Kotler, P.; Keller, K. Marketing Management, 15th ed.; Pearson Education Ltd.: Edinburgh, UK, 2016; pp. 89–120.
ISBN 10: 1-292-0926-29.
17. Cao, L. In-depth behavior understanding and use: The behavior informatics approach. Inform. Sci. 2010, 180, 3067–3085.
[CrossRef]
18. Cao, L.; Yu, P.S. Behavior informatics: An informatics perspective for behavior studies. IEEE Intell. Inf. Bulletin. 2009, 10, 6–11.
19. Orsenigo, C.; Vercellis, C. Combining discrete SVM and fixed cardinality warping distances for multivariate time series classifica-
tion. Pattern Recogn. 2010, 43, 3787–3794. [CrossRef]
20. Eichinger, F.; Nauck, D.D.; Klawonn, F. Sequence mining for customer behaviour predictions in telecommunications. In
Proceedings of the Workshop on Practical Data Mining: Applications, Experiences and Challenges (ECML/PKDD), Berlin,
Germany, 22 September 2006; pp. 3–10.
21. Prinzie, A.; Van den Poel, D. Incorporating sequential information into traditional classification models by using an
element/position-sensitive SAM. Decis. Support Syst. 2006, 42, 508–526. [CrossRef]
22. Pınar, K.; Topcu, Y.I. Applying Bayesian Belief Network approach to customer churn analysis: A case study on the telecom
industry of Turkey. Expert. Syst. Appl. 2011, 38, 7151–7157.
23. Renjith, S. An integrated framework to recommend personalized retention actions to control B2C E-commerce customer churn.
Intl. J. Eng. Trends Technol. 2015, 27, 152–157. [CrossRef]
24. Caignya, A.D.; Coussementa, K.; Bockb, K.W.D. A new hybrid classification algorithm for customer churn prediction based on
logistic regression and decision trees. Eur. J. Oper. Res. 2018, 269, 760–772. [CrossRef]
25. Neslin, S.A.; Gupta, S.; Kamakura, W.; Lu, J.; Mason, C.H. Defection detection: Measuring and understanding the predictive
accuracy of customer churn models. J. Mark. Res. 2006, 43, 204–211. [CrossRef]
26. Zhang, Y. A Customer Churn Alarm Model based on the C5.0 Decision Tree-Taking the Postal Short Message as an Example. Stat.
Inf. Forum. 2015, 30, 89–94.
27. Farquad, M.A.H.; Ravi, V.; Raju, S.B. Churn prediction using comprehensible support vector machine: An analytical CRM
application. Appl. Soft. Comput. 2014, 19, 31–40. [CrossRef]
28. Tian, L.; Qiu, H.; Zheng, L. Telecom churn prediction modeling and application based on neural network. Comput. Appl. 2007, 27,
2294–2297.
29. Yu, R.; An, X.; Jin, B.; Shi, J.; Move, O.A.; Liu, Y. Particle classification optimization-based BP network for telecommunication
customer churn prediction. Neural Comput. Appl. 2018, 2, 707–720. [CrossRef]
30. Wu, X.; Meng, S. E-commerce Customer Churn Prediction based on Customer Segmentation and AdaBoost. In Proceedings of the
International Conference on Service Systems and Service Management (ICSSSM), Kunming, China, 24–26 June 2016.
31. Ji, H.; Ni, F.; Liu, J. Prediction of telecom customer churn based on XGB-BFS feature selection algorithm. Comput. Technol. Dev.
2021, 31, 21–25.
32. Ahmed, A.A.Q.; Maheswari, D. An enhanced ensemble classifier for telecom churn prediction using cost based uplift modeling.
Intl. J. Inf. Technol. 2019, 11, 381–391.
33. Ying, W.Y. Research on the LDA boosting in customer churn prediction. J. Appl. Stat. Manag. 2010, 29, 400–408.
34. Zhang, W.; Yang, S.; Liu, T.T.; School, M. Customer churn prediction in mobile communication enterprises based on CART and
Boosting algorithm. Chin. J. Manag. Sci. 2014, 22, 90–96.
35. Wu, J.; Shi, L.; Lin, W.P.; Tsai, S.B.; Xu, G.S. An empirical study on customer segmentation by purchase behaviors using a RFM
model and K-means algorithm. Math. Probl. Eng. 2020, 2020, 1–7. [CrossRef]
36. Wu, J.; Shi, L.; Yang, L.P.; Niu, X.X.; Li, Y.Y.; Cui, X.D.; Tsai, S.B.; Zhang, Y.B. User Value Identification Based on Improved RFM
Model and K-Means++ Algorithm for Complex Data Analysis. Wirel Commun. Mob.Com. 2021, 9982484, 1–8. [CrossRef]
37. Li, Y.; Chu, X.Q.; Tian, D.; Feng, J.Y.; Mu, W.S. Customer segmentation using K-means clustering and the adaptive. Appl. Soft
Comput. 2021, 113, 107924. [CrossRef]
38. Christy, A.J.; Umamakeswari, A.; Priyatharsini, L.; Neyaa, A. RFM ranking-An effective approach to customer segmentation. J.
King. Saud. Univ. Sci. 2021, 33, 1251–1257. [CrossRef]
39. Abbasimehr, H.; Bahrini, A. An analytical framework based on the recency, frequency, and monetary model and time series
clustering techniques for dynamic segmentation. Expert. Syst. Appl. 2021, 192, 116373. [CrossRef]
40. Hosseini, M.; Shajari, S.; Akbarabadi, M. Identifying multi-channel value co-creator groups in the banking industry. J. Retail.
Consum. Serv. 2022, 5, 102312. [CrossRef]
41. Alboukaey, N.; Joukhadar, A.; Ghneim, N. Dynamic behavior based churn prediction in mobile telecom. Expert. Syst. Appl. 2020,
162, 113779. [CrossRef]
42. Zhou, J.; Zhai, L.L.; Pantelous, A.A. Market Segmentation Using High-dimensional Sparse Consumers Data. Expert. Syst. Appl.
2020, 145, 113136. [CrossRef]
43. Li, M.; Wang, Q.W.; Shen, Y.Z.; Zhu, T.Y. Customer relationship management analysis of outpatients in a Chinese infectious
disease hospital using drug-proportion recency-frequency-monetary model. Int. J. Med. Inform. 2021, 147, 104373. [CrossRef]
44. Vapnik, V.N. The Nature of Statistical Learning Theory, 2nd ed.; Springer: New York, NY, USA, 2000; pp. 138–141. ISBN 0387987800.
45. Vapnik, V.N. Statistical Learning Theory; Wiley-Interscience, John Wiley & Sons, Inc.: New York, NY, USA, 1998; pp. 375–383. ISBN 0417030031.
46. Schölkopf, B.; Smola, A.J. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond; The MIT Press:
Cambridge, MA, USA; London, UK, 2002; pp. 189–222. ISBN 0262194759.
47. Lee, S.; Lee, H.; Abbeel, P.; Andrew, Y.N. Efficient L1 regularized logistic regression. In Proceedings of the 21st National
Conference on Artificial Intelligence (AAAI-06), Boston, MA, USA, 16 July 2006; Volume 1, pp. 401–408.
48. Minka, T.P. Algorithms for Maximum-Likelihood Logistic Regression; Carnegie Mellon University Research Showcase: Technical
Report (Mathematics); IEEE: Pittsburgh, PA, USA, 2003; pp. 1–15. [CrossRef]
49. Alibaba Cloud Tianchi Data Sets. Available online: https://tianchi.aliyun.com/datase (accessed on 17 March 2021).
50. Cao, L. Behavior Informatics and Analytics: Let Behavior Talk. In Proceedings of the IEEE International Conference on Data Mining
Workshops (ICDM), Pisa, Italy, 15–19 December 2008; pp. 15–19.
51. Stolfo, S.J.; Hershkop, S.; Hu, C.W.; Li, W.J.; Nimeskern, O.; Wang, K. Behavior-based modeling and its application to Email
analysis. ACM T. Internet. Appl. 2006, 6, 187–221. [CrossRef]
52. Pham, D.T.; Dimov, S.S.; Nguyen, C.D. Selection of K in K-means clustering. Mech. Eng. Sci. 2004, 219, 103–119. [CrossRef]
53. Chen, N.; Chen, A.; Zhou, L. An Effective Clustering Algorithm in Large Transaction Databases. J. Sw. 2001, 12, 476–484.
54. Verbeke, W.; Dejaeger, K.; Martens, D.; Hur, J.; Baesens, B. New insights into churn prediction in the telecommunication sector: A
profit driven data mining approach. Eur. J. Oper. Res. 2012, 218, 211–229. [CrossRef]
55. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [CrossRef]
56. Breiman, L. Bagging predictors. Mach. Learn. 1996, 24, 123–140. [CrossRef]
57. Goldstein, B.A.; Polley, E.C.; Briggs, F.B.S. Random Forests for Genetic Association Studies. Stat. Appl. Genet. Mol. 2011, 10, 32.
[CrossRef]
58. Drummond, C.; Holte, R.C. C4. 5, class imbalance, and cost sensitivity: Why under-sampling beats over-sampling. In Proceedings
of Workshop on Learning from Imbalanced Datasets II, ICML, Washington, DC, USA, 21 August 2003.
59. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell.
Res. 2002, 16, 321–357. [CrossRef]
60. Provost, F. Analysis and visualization of classifier performance: Comparison under imprecise class and cost distributions. In
Proceedings of the International Conference on knowledge Discovery and Data Mining (KDD), San Diego, CA, USA, 15–18
August 1999; pp. 43–48.
61. Fan, X.; Ke, T. Enhanced maximum AUC linear classifier. In Proceedings of the International Conference on Fuzzy Systems and
Knowledge Discovery (FSKD), Yantai, China, 10–12 August 2010; pp. 1540–1544.
62. Brito, P.Q.; Carlos, S.; Sérgio, A.; Mote, A.; Byvoet, M. Customer segmentation in a large database of an online customized fashion
business. Robot. Cim-int. Manuf. 2015, 36, 93–100. [CrossRef]
63. Sturm, B.L. Classification accuracy is not enough. J. Intell. Inf. Syst. 2013, 41, 371–406. [CrossRef]
64. Ma, S.; Huang, J. Regularized ROC Method for Disease Classification and Biomarker Selection with Microarray Data. Bioinformatics
2005, 21, 4356–4362. [CrossRef] [PubMed]
65. Song, X.; Ma, S. Penalized Variable Selection with U-Estimates. J. Nonparametr. Stat. 2010, 22, 499–515. [CrossRef] [PubMed]
66. Chang, H.H.; Tsay, S.F. Integrating of SOM and K-mean in data mining clustering: An empirical study of CRM and profitability
evaluation. J. Inform. Manag. 2004, 11, 161–203.
67. Rachid, A.D.; Abdellah, A.; Belaid, B.; Rachid, L. Clustering Prediction Techniques in Defining and Predicting Customers
Defection: The Case of E-Commerce Context. Int. J. Elect. Comput. Eng. 2018, 8, 2367–2383. [CrossRef]
68. Chen, K.; Hu, Y.H.; Hsieh, Y.C. Predicting customer churn from valuable B2B customers in the logistics industry: A case study.
Inf. Syst. E-Bus. Manage. 2015, 13, 475–494. [CrossRef]
69. Buckinx, W.; Van den Poel, D. Customer base analysis: Partial defection of behaviourally loyal clients in a non-contractual FMCG
retail setting. Eur. J. Oper. Res. 2005, 164, 252–268. [CrossRef]
70. Migueis, V.L.; Van den Poel, D.; Camanho, A.S.; Cunha, J.F. Modeling partial customer churn: On the value of first product-
category purchase sequences. Expert. Syst. Appl. 2012, 39, 11250–11256. [CrossRef]
71. Miguéis, V.L.; Camanho, A.; Cunha, J.F. Customer attrition in retailing: An application of Multivariate Adaptive Regression
Splines. Expert. Syst. Appl. 2013, 40, 6225–6232. [CrossRef]
72. Ascarza, E. Retention Futility: Targeting High Risk Customers Might Be Ineffective. J. Mark. Res. 2018, 55, 80–98. [CrossRef]
73. Caigny, D.A.; Coussement, K.; Verbeke, W.; Idbenjra, K.; Phan, M. Uplift modeling and its implications for B2B customer churn
prediction: A segmentation-based modeling approach. Ind. Market. Manag. 2021, 99, 28–39. [CrossRef]
74. Kim, S.; Shin, K.; Park, K. An application of support vector machines for customer churn analysis: Credit card case. In Proceedings
of the First international conference on Advances in Natural Computation (ICNC), Changsha, China, 27–29 August 2005;
pp. 636–647.
75. Coussement, K.; Van den Poel, D. Churn prediction in subscription services: An application of support vector machines while
comparing two parameter-selection techniques. Expert. Syst. Appl. 2008, 34, 313–327. [CrossRef]
76. Kim, H.S.; Sohn, S.Y. Support vector machines for default prediction of SMEs based on technology credit. Eur. J. Oper. Res. 2010,
201, 838–846. [CrossRef]
77. Schaeffer, S.E.; Sanchez, S.V.R. Forecasting client retention-A machine-learning approach. J. Retail. Consum. Serv. 2020, 52, 101918.
[CrossRef]
78. Gattermann-Itschert, T.; Thonemann, U.W. How training on multiple time slices improves performance in churn prediction. Eur.
J. Oper. Res. 2021, 295, 664–674. [CrossRef]
79. Sood, A.; Kumar, V. Analyzing client profitability across diffusion segments for a continuous innovation. J. Mark. Res. 2017, 54,
932–951. [CrossRef]
80. Duan, Y.; Edwards, J.S.; Dwivedi, Y.K. Artificial intelligence for decision making in the era of big data-Evolution, challenges and
research agenda. Int. J. Inform. Manag. 2019, 48, 63–71. [CrossRef]
81. Dwivedi, Y.K.; Hughes, L.; Ismagilova, E.; Aarts, G.; Coombs, C.; Crick, T.; Duan, Y.; Dwived, R.; Edwards, J.; Eirug, A.; et al.
Artificial intelligence (AI): Multidisciplinary perspectives on emerging challenges, opportunities, and agenda for research, practice
and policy. Int. J. Inform. Manag. 2021, 57, 101994. [CrossRef]