0% found this document useful (0 votes)
19 views18 pages

Jtaer 17 00024

Uploaded by

Sida Sama
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
19 views18 pages

Jtaer 17 00024

Uploaded by

Sida Sama
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 18

Article

B2C E-Commerce Customer Churn Prediction Based on


K-Means and SVM
Xiancheng Xiahou * and Yoshio Harada

Faculty of Business Management, Osaka Sangyo University, Daitou 574-8530, Japan;


ryo_hrd@ge.osaka-sandai.ac.jp
* Correspondence: kakou.osu@hotmail.com

Abstract: Customer churn prediction is very important for e-commerce enterprises to formulate
effective customer retention measures and implement successful marketing strategies. According to
the characteristics of longitudinal timelines and multidimensional data variables of B2C e-commerce
customers’ shopping behaviors, this paper proposes a loss prediction model based on the combination
of k-means customer segmentation and support vector machine (SVM) prediction. The method
divides customers into three categories and determines the core customer groups. The support vector
machine and logistic regression were compared to predict customer churn. The results show that each
prediction index after customer segmentation was significantly improved, which proves that k-means
clustering segmentation is necessary. The accuracy of the SVM prediction was higher than that of
the logistic regression prediction. These research results have significance for customer relationship
management of B2C e-commerce enterprises.

Keywords: B2C e-commerce customer; k-means; logistic regression; SVM; customer churn



Citation: Xiahou, X.; Harada, Y. B2C 1. Introduction
E-Commerce Customer Churn
Customers are one of the most important assets of an enterprise, and they play a
Prediction Based on K-Means and
very important role in improving the market competitiveness and performance of the
SVM. J. Theor. Appl. Electron. Commer.
enterprise [1]. Amid fierce market competition, customers can easily choose among
Res. 2022, 17, 458–475. https://
numerous products or service providers [2]. Studies show that the cost of developing
doi.org/10.3390/jtaer17020024
a new customer is often higher than the cost of retaining an old customer [3]. If an
Academic Editor: Eduardo enterprise maintains a good relationship with customers for a long time, it will gain
Álvarez-Miranda more profits from existing customers. If the customer retention rate increases by 5%,
Received: 9 January 2022
the net present value of the enterprise will increase by 25–95% [4]. When the customer
Accepted: 30 March 2022
churn rate is reduced by 5%, the average profit margin of an enterprise will increase
Published: 6 April 2022
by 25–85% [5,6]. In order to maintain market advantages, it has become important for
enterprises to determine how to make use of existing customer resources and avoid the
Publisher’s Note: MDPI stays neutral
loss of existing customers [7].
with regard to jurisdictional claims in
Customer churn prediction techniques can be used to identify customers that may
published maps and institutional affil-
be lost. Then, marketing strategies can be improved according to the forecast results.
iations.
Retention of existing customers can effectively prevent performance loss [8]. Datta et al.
proposed a framework for customer churn management that comprised data acquisi-
tion, business understanding, feature selection, a construction model, and a validation
Copyright: © 2022 by the authors.
model [9], this literature described CHAMP (CHurn Analysis, Modeling, and Prediction),
Licensee MDPI, Basel, Switzerland. an automated system, and offered specific guiding methods for researching customer
This article is an open access article churn management, which is significant. Jain et al. showed that in the highly competitive
distributed under the terms and telecom market, companies must analyze corporate behaviors to determine customer
conditions of the Creative Commons loss and take effective measures to retain existing customers [10]. Research on churn pre-
Attribution (CC BY) license (https:// diction was carried out in the telecom, banking, retail, and other industries. For example,
creativecommons.org/licenses/by/ in the prediction of telecom customer churn, Coussement et al. examined the customers’
4.0/). calling behavior, customer and operator’s interaction behavior, package subscription,

J. Theor. Appl. Electron. Commer. Res. 2022, 17, 458–475. https://doi.org/10.3390/jtaer17020024 https://www.mdpi.com/journal/jtaer
J. Theor. Appl. Electron. Commer. Res. 2022, 17 459

account information, calling information, and demographic characteristics [11]. Loss


prediction was performed through algorithms, such as the logit model, simple linear
regression, k-nearest neighbors, decision tree, and artificial neural network [11–13]. For
the customer churn prediction, credit card information, transaction information, and
abnormal usage information have been used to predict customer churn through the logit
model and decision tree algorithm [14,15]. Online shopping is a common consumer shop-
ping mode. E-commerce enterprises constantly launch new products; market conditions
are constantly changing, and many e-commerce enterprises face the serious problem of
customer loss [11].
However, most of the studies on customer churn prediction are concentrated in the
telecom, banking, retail, and other industries, and there are few studies on customer
churn prediction in B2C e-commerce. Customers in the telecommunications and bank-
ing industries need to sign contracts with enterprises, which means the customers are
contractually bound. These enterprises can accurately identify customer churn. Online
shopping customers do not need to sign contracts with enterprises. The shopping behavior
of non-contractual customers is complicated, and the loss state is vague. E-commerce
enterprises often find it difficult to judge when and why their customers are lost and
whether it is worth retaining them. In addition, the shopping behavior dataset of B2C
e-commerce customers contains two types of data. The first type of data is the demographic
data of customer groups, such as age, gender, address, contact information, nationality,
and social class. E-commerce enterprises can easily obtain the data directly from their
database. The other kind of data is customer shopping behavior data, including shopping
time, purchase preparation, shopping intention, and attitude toward products [16], this
literature (Kotler and Keller) points out the research methods and direction for researchers
studying customer relationship management and plays an important role in classifying
data attributes. The behavioral data of B2C e-commerce customers are characterized by
longitudinal timeliness and multidimensional variables, and such data are usually stored
independently in the business database managed by enterprises [17,18]. B2C shopping
websites have an extensive variety and quantity of customer data, including customer
behavior information, such as shopping time, clicking on product pages, collecting goods,
adding to a shopping cart, and shopping frequency. These information variables may have
a better ability to predict customer churn. In the previous works on forecasting customer
churn in e-commerce, longitudinal behavior data and longitudinal timeliness of customers
are often ignored [19–21].
E-commerce enterprise managers can use big data and cloud computing to analyze
and model consumer behavior data by extracting all kinds of information as well as car-
rying out customer churn prediction research. This paper builds on existing research and
uses B2C e-enterprise, non-contractual customers, as the research object to analyze data
and information on shopping behavior. The k-means algorithm was used for customer
segmentation, and the random forest was used to select variable characteristics. Finally, a
machine learning algorithm was used to construct an erosion prediction model, and the
empirical results are discussed. This paper aims to provide B2C e-commerce enterprises
with a complete solution for customer segmentation and churn prediction and help enter-
prise managers realize customer relationship management that utilizes artificial intelligence
technologies. The ultimate goal is to formulate reasonable and effective customer retention
strategies and reduce operating costs for enterprises.
The rest of this paper is organized into five parts. The next section is the literature
review. Section 3 describes the research methods, including the introduction of the basic
principles of support vector machine (SVM) and logistic regression (LR). Section 4 presents
the empirical methodology, including data pretreatment, customer segmentation based on
k-means, feature selection based on random forest, and prediction evaluation indicators.
Section 5 presents the experimental results and discussion. Section 6 is the conclusion.
J. Theor. Appl. Electron. Commer. Res. 2022, 17 460

2. Literature Review
The forecasting methods of customer churn can be summarized into three types:
forecasting methods based on traditional statistical analysis, prediction methods based
on machine learning, and prediction methods based on combinatorial classifiers. The
prediction methods based on traditional statistical analysis mainly include linear discrim-
inant analysis, the naive Bayesian model, cluster analysis, and logistic regression. For
example, Pınar et al. used a naive Bayes classifier to predict customer churn of a tele-
com company in 2011. Their results showed that the average call duration of customers
was strongly correlated with customer churn [22]. Renjith et al. predicted e-commerce
customer churn by logistic regression and proposed a personalized customer retention
strategy using machine learning [23]. Caignya et al. combined logistic regression and a
decision tree to predict customer churn in the telecom industry [24]. In these studies, tradi-
tional statistical analysis methods were used for prediction, and the prediction model had
strong interpretability. However, these methods have a limitation while dealing with big
data and multidimensional variable data; the prediction performance was not obvious in
these cases.
Prediction methods based on machine learning mainly include decision trees, the
support vector machine, and artificial neural networks, among others. For example,
Neslin et al. stated that decision trees are widely used in practical customer churn prediction.
The decision tree algorithm can be applied as the basic model of loss prediction [25].
Zhang et al. used the C5.0 decision tree to predict the loss of postal short message service
of telecom enterprises. The results showed that the C5.0 decision tree prediction model
had high accuracy [26]. Farquad et al. predicted the churn of bank credit card customers
and proposed a hybrid method to extract rules from the support vector machine [27].
Gordini et al. predicted the loss of B2B e-commerce customers, and the results showed
that the support vector machine had a good prediction performance in processing noisy,
unbalanced, and non-linear B2B e-commerce data [7]. Tian et al. predicted the churn
of telecom customers, extracted appropriate variables from the original data with a two-
layer neural network and proposed a churn prediction model based on an artificial neural
network. The results showed that the prediction effect of this method was better than
that of the decision tree and naive Bayes classifier [28]. Yu et al. studied the prediction
of telecom customer churn, used iterative particle classification optimization and particle
fitness calculation to train the prediction model, and proposed a BP neural network based
on particle classification optimization. The results showed that their method improved the
accuracy of churn prediction [29].
The prediction methods based on combinatorial classifiers integrate several weak
classifiers and form a strong classifier. Commonly used combinative classifier methods
include AdaBoost, XGBoost, and random forest. For example, Wu et al. studied the e-
commerce customer churn prediction problem by balancing positive and negative class
data by adjusting the sampling ratio, reducing the size of the dataset, and improving the
classification accuracy of the classifier by combining the AdaBoost algorithm. The results
showed that Adaboost had a good prediction effect [30]. For datasets with time charac-
teristics, such as that of telecom industry customers, Ji et al. proposed a hybrid feature
selection algorithm based on XGBoost. The algorithm chose features from two angles to
select the most important characteristic to predict customer churn while removing redun-
dant features. The experimental results showed that this method had a good prediction
performance [31]. Ahmed et al. studied the prediction of customer churn in the telecom in-
dustry and proposed a prediction model based on a combinatorial heuristic algorithm [32].
Ying et al. performed an in-depth study on the duality of bank customers and adopted
integrated LDA and boosting methods to predict customer churn and achieved good pre-
diction results [33]. Zhang et al. used CART and the adaptive boosting integrated model to
predict telecom customer churn. The results showed that their method had high prediction
accuracy [34].
In summary, researchers have used various forecasting methods to conduct in-depth
research on the churn of contractual customers in the telecom industry, banking industry,
J. Theor. Appl. Electron. Commer. Res. 2022, 17 461
and B2B e-commerce enterprises. Previous works have also discussed the advantages of
various methods, thus making valuable contributions to the research on churn prediction
of contractual customers. Customer loss of B2C e-commerce enterprises is concerning.
SuchIncustomers’
summary, researchers
shopping have used various
behavior forecasting methods
is multidimensional, to conduct
and their in-depth
shopping intentions
research on the churn of contractual customers in the telecom industry, banking industry,
and tendencies are personalized. Therefore, using the characteristics of customer data, this
and B2B e-commerce enterprises. Previous works have also discussed the advantages of
work studies the loss of non-contractual customers of B2C e-commerce enterprises.
various methods, thus making valuable contributions to the research on churn prediction
of contractual customers. Customer loss of B2C e-commerce enterprises is concerning.
3. Research
Such Methods
customers’ shopping behavior is multidimensional, and their shopping intentions
and tendencies
E-commerce are personalized. Therefore,
customers have using thebehaviors:
two possible characteristics of customer
customer churndata,
andthis
customer
work studies the
non-churn. loss of non-contractual
Therefore, customer churncustomers of B2C is
prediction e-commerce
a binary enterprises.
classification problem.
Online shopping has its own characteristics in terms of shopping time and behavior. Tak-
3. Research Methods
ing shopping time and behavioral tendency into full consideration may play an important
E-commerce customers have two possible behaviors: customer churn and customer
role in judging customer loss. In this research, there are two important steps- customer
non-churn. Therefore, customer churn prediction is a binary classification problem. Online
segmentation
shopping has its and
own churn prediction.
characteristics inThe
termsclustering
of shoppingalgorithm
time and and prediction
behavior. model that
Taking
are expected
shopping timetoandbe behavioral
adopted are important
tendency intotasks that must be may
full consideration determined. The chosen clus-
play an important
role in judging customer loss. In this research, there are two important steps- customer In the
tering algorithm is dependent on the type of data and the purpose of clustering.
literature onand
segmentation customer segmentation
churn prediction. from thealgorithm
The clustering last five and
years, the customer
prediction model thatsegmentation
are
expected
method to viabek-means
adopted are hasimportant tasks that
been adopted in must be determined.
the e-commerce The chosen
[35,36], retailclustering
[37,38], finance
algorithm
[39,40], andis dependent on the type of
telecommunications data and the
industries purpose
[41,42]. Thisofalgorithm
clustering.canIn the
helpliterature
process large-
on
scale data and exhibits simple calculation and high operation efficiency, andmethod
customer segmentation from the last five years, the customer segmentation thus has been
via k-means has been adopted in the e-commerce [35,36], retail [37,38], finance [39,40],
widely adopted in other industries [43]. In terms of prediction models, comparisons
and telecommunications industries [41,42]. This algorithm can help process large-scale
among prediction effects of different models for most literature are made by using a uni-
data and exhibits simple calculation and high operation efficiency, and thus has been
fied data
widely set toindetermine
adopted other industries the optimal one.ofInprediction
[43]. In terms the literature
models, oncomparisons
churn prediction
among [7], by
comparing prediction models, such as LR, SVM, Neural Network,
prediction effects of different models for most literature are made by using a unified data Decision Trees, and
Random
set Forest,the
to determine the research
optimal one.results show thatonthe
In the literature SVM
churn model exhibited
prediction a positive pre-
[7], by comparing
diction performance
prediction models, suchwith as LR, a SVM,
quickNeural
training speed.Decision
Network, Therefore, considering
Trees, and Random theForest,
obvious ad-
the research results show that the SVM model exhibited a positive
vantages of the k-means and SVM algorithms, we chose the k-means and SVM as the main prediction performance
with a quick training
algorithms speed. and
in this paper Therefore,
madeconsidering
a comparative the obvious
analysisadvantages
throughofLR. the Onk-means
the basis of
and SVM algorithms, we chose the k-means and SVM as the
pre-processed data, this study first uses the k-means algorithm for the clustering main algorithms in thissubdivi-
paper and made a comparative analysis through LR. On the basis of pre-processed data,
sion of customers and then uses the LR algorithm and SVM algorithm to establish predic-
this study first uses the k-means algorithm for the clustering subdivision of customers
tion models. The prediction accuracy and effectiveness of these two prediction models are
and then uses the LR algorithm and SVM algorithm to establish prediction models. The
investigated,
prediction and and
accuracy a comparative
effectivenessanalysis
of these twois conducted. The research
prediction models process isand
are investigated, shown in
aFigure 1.
comparative analysis is conducted. The research process is shown in Figure 1.
Thefollowing
The following is is a brief
a brief introduction
introduction toalgorithms
to the the algorithms
of SVM ofand
SVM and LR.
LR.

Figure1.1.The
Figure Theprocess
processof of
thethe
SVM andand
SVM LR models for customer
LR models churn churn
for customer prediction.
prediction.
J. Theor. Appl. Electron. Commer. Res. 2022, 17 462

3.1. Support Vector Machines


Support vector machines are linear classifiers [44–46] that classify data by solving
quadratic optimization problems to establish the optimal separation plane between datasets.
SVM can also be applied to nonlinear classification problems by kernel function transfor-
mation. SVM transforms the original problem into a more solvable dual problem through
the Lagrange multiplier method, which is mathematically expressed as:
m
1   
L(w, b, α) = k w k2 + ∑ a i 1 − y i W T x i + b (1)
2 i =1

Take partial derivatives of w and b, and set them to 0, yielding:


m
W= ∑ ai xi yi = 0 (2)
i =1

m
∑ ai yi = 0 (3)
i =1

Substitute Equations (2) and (3) into Equation (1), and finally convert the original
problem into the following objective function (4) for the solution:
m
1 m m
max ∑ ai −
2 i∑ ∑ ai a j yi y j xi x j s.t. αi ≥ 0, i = 1, 2, . . . , m (4)
i =1 =1 j =1

SVM can simulate the nonlinear decision boundary well with the kernel function and
also control overfitting.
The advantage of SVM is that it can improve the generalization ability and deal with
high-dimensional data problems.

3.2. Logistic Regression


Logistic regression is a classical classification method [47,48]. It predicts the probability
of the categories of the dataset of unknown categories through the dataset of the existing
category labels. It can be expressed as P(Y|X) in the form of a conditional probability
distribution. Take dichotomies as an example, where X is an n-dimensional vector and Y is
0 or 1. The mathematical expression of the prediction result is:

exp(wx + b)
P (Y = 1 | X ) = (5)
1 + exp(wx + b)

1
P (Y = 0 | X ) = (6)
1 + exp(wx + b)

where the real values of the linear regression model z = WT x + b are transformed into values
between [0, 1] by the sigmoid function. In other words, the result of a certain sample X
calculated by Equation (5) or (6) is the probability that the sample point X belongs to a
certain class. We can categorize them by setting thresholds. Generally, values calculated by
the sigmoid function are classified as category 1 if they are greater than or equal to 0.5 and
category 0 if they are not. The advantage of LR is that the results are easy to interpret and
applicable to both continuous and categorical variables.

4. Empirical Study
4.1. Data Preparation
The original dataset was the dataset published by the Alibaba Cloud Tianchi
platform [49] for scientific research and big data competitions. This dataset contains
all the behavior data of 987,994 random users who have shopping activities between
J. Theor. Appl. Electron. Commer. Res. 2022, 17 463

23 November 2017 and 4 December 2017. The data set includes five categories of
indicators, namely User ID, commodity ID, Item ID, Behavior Type, and Timestamp.
Among them, there are four types of behaviors: PV (page view of an item’s detail
page, equivalent to an item click), Buy (purchase an item), Cart (add an item to the
shopping cart), and Fav (favorite an item). The original data were preprocessed and
the time of shopping behavior was divided into two stages: the first six days as the
observation period and the last six days as the verification period. A customer who
bought more than once during the observation period and bought again more than
once during the verification period is defined as a non-lost customer and is represented
by 0. Customers who purchased more than once during the observation period and 0
times during the verification period were defined as lost customers and represented
by 1. In the R statistical language, customers were first grouped according to User
ID, then the purchase times of each customer in the observation period and verifi-
cation period were calculated, and the customers that met the screening conditions
were retained. Finally, 95,388 pieces of data of 8156 customers were retained, among
which 7576 customers were lost, accounting for 92.8%. There were 580 non-lost cus-
tomers, accounting for 7.2%. Customer data was unbalanced, so the data imbalance
was addressed.

4.2. Data Preprocessing


Data preprocessing is a key step in customer churn prediction. First, the R language
function as. POSIxCT was used to convert the timestamp of each piece of data in the
original data into “year, month, day” and “hour, minute, second” in accordance with the
time format. As the behavioral intention is implied in customer shopping transaction
data and behavior may play an important role in the emergence of problems, behavioral
intention as a potential factor is weakened or even ignored in traditional customer behavior
analysis [50,51]. Therefore, to further study the intention of customers’ shopping behaviors
in different time periods, this work divided the time when shopping behaviors occurred.
Our definitions were 00:00 to 06:00 is Daybreak; 06:00 to 12:00 is AM; 12:00 to 18:00 is PM;
and 18:00 to 00:00 is Night. Then, the shopping behavior of each customer in these four
time periods was counted.
The behavioral data in the dataset are PV number, Buy number, Cart number, and
Fav number. The behavioral data were further subdivided, and the data types were finally
sorted, with 17 types of variables. Specifically, the items were Categories, Daybreak PV,
Daybreak Buy, Daybreak Cart, Daybreak Fav, AM PV, AM Buy, AM Cart, AM Fav, PM PV,
PM Buy, PM Cart, PM Fav, Night PV, Night Buy, Night Cart, and Night Fav.

4.3. K-Means and Customer Segmentation


One of the tasks of customer churn management is to divide customer groups into
various types according to their shopping behaviors, such as valuable core customers,
which are loyal customers and easy-to-lose customers. To formulate targeted marketing
strategies for customers, enterprises must subdivide customers. In this study, k-means was
used for customer segmentation. As a classical algorithm of unsupervised clustering, k-
means is similar to a fully automatic classification. The more similar the cluster is, the better
the clustering effect will be. The algorithm is simple, easy to implement, and has good
clustering performance, so it is widely used [52,53]. The variables of k-means clustering
were the 17 variables mentioned above. For the given unlabeled sample data set, the cluster
number K was determined in advance so that the samples in the cluster were distributed
together as closely as possible and the distance between clusters was as large as possible.
By enumerating K from 2 to 8, we obtained the relation between the contour coefficient
and K (Figure 2).
JTAER 2022, 17, FOR PEER REVIEW 7
J. Theor. Appl. Electron. Commer. Res. 2022, 17 464

2. Relationship
Figure 2.
Figure Relationshipbetween
betweenthethe
contour coefficient
contour and K.
coefficient and K.

As can be seen in Figure 2, we clustered data into three categories and observed the
As can be seen in Figure 2, we clustered data into three categories and observed the
number of lost and non-lost samples of each cluster customer group. The clustering results
number
are shownofin
lost and1.non-lost samples of each cluster customer group. The clustering results
Table
are shown in Table 1.
Table 1. Results of K-means classification.
Table 1. Results of K-means classification.
Churn and Number of Actual Number of
Customer Type
Non-Churn
Churn and Customer
Number Types
of Actual Clustered
Number Customers
of Clustered
Customer Type 0
Cluster I Non-Churn Customer 484Types Customers
4935
1 4451
0 484
Cluster I 0 83 4935
Cluster II 1 1 44512614 2697
0 0 83 13
Cluster
Cluster II
III 5242697
1 1 2614511
0
(0: Churn customer; 1: Non-churn customer). 13
Cluster III 524
As can be seen in Table 11, there were 4935 customers
511 in Cluster I, among which non-
(0: Churn
churn customer;
customers 1: Non-churn
were customer).
484, accounting for 9.8% of the total number of Cluster I customers.
There were 2697 customers in Cluster II, of which 83 were non-churn customers, accounting
As of
for 3.1% canthebetotal
seen in Table
number 1, there
of Cluster were 4935 There
II customers. customers in Cluster
were 524 customersI, in
among which
Cluster III non-
churn customers
customers, of which were 484,non-churn
13 were accounting for 9.8%accounting
customers, of the total
fornumber of Cluster
2.5% of the I customers.
total number
There
of Cluster III customers. The customer churn rate of Cluster I was 90.2%, that of Clusteraccount-
were 2697 customers in Cluster II, of which 83 were non-churn customers,
ing for 3.1%was
II customers of the total
96.9%, andnumber of Cluster
that of Cluster II customers.
III customers There
was 97.5%. were 524
Therefore, customers
Cluster I in
customers
Cluster IIIwere the coreof
customers, customer
which 13groups
weretonon-churn
be focused on.
customers, accounting for 2.5% of the
total number of Cluster III customers. The customer churn rate of Cluster I was 90.2%, that
4.4. Random Forest and Feature Selection
of Cluster II customers was 96.9%, and that of Cluster III customers was 97.5%. Therefore,
The customer churn data set contained many customer features, and not all variables
Cluster I customers were the core customer groups to be focused on.
were conducive to churn prediction performance. Excessive redundancy and irrelevant
variables in the dataset may hinder the predictive performance of the model [54]. Therefore,
4.4. Random
feature Forest
selection wasandcarried
Feature Selection
out next. Random forest is an effective feature selection
The customer
algorithm with highchurn data setaccuracy,
classification contained goodmany customer
robustness to features,
noise andand not all
outliers, andvariables
strongconducive
were generalization abilityprediction
to churn [55]. Therefore, random Excessive
performance. forest has redundancy
been widely used in
and irrelevant
business management,
variables in the dataseteconomics, finance,
may hinder thebiological
predictive sciences, and otherof
performance fields. The number
the model [54]. There-
of variables in this study was high at 17, so the random forest algorithm was used
fore, feature selection was carried out next. Random forest is an effective feature selection to select
feature variables. The key problem in feature selection is how to select the optimal number
algorithm with high classification accuracy, good robustness to noise and outliers, and
of features (M). The out-of-bag error (OOB error) was used to determine the number of
strong
featuresgeneralization
[56]. When eachability [55].
tree was Therefore,
constructed usingrandom forestforest,
the random has been widely
different used in busi-
bootstrap
ness
samples were used for the training set. Due to the characteristics of sampling, the The
management, economics, finance, biological sciences, and other fields. OOBnumber
of variables
error in this study
was calculated. was high results
The calculation at 17, so
of the
the random
OOB error forest algorithm
are shown was
in Table 2. used to select
feature variables. The key problem in feature selection is how to select the optimal number
of features (M). The out-of-bag error (OOB error) was used to determine the number of
features [56]. When each tree was constructed using the random forest, different bootstrap
samples were used for the training set. Due to the characteristics of sampling, the OOB
J. Theor. Appl. Electron. Commer. Res. 2022, 17 465

Table 2. OOB error.

Number of
2 3 4 5 6 7 8 9 10 11
Features
oob error 0.069 0.080 0.081 0.083 0.089 0.097 0.098 0.099 0.104 0.104

By changing the number of randomly selected features each time, the difference of
out-of-bag error rate was found to be very small, and the influence of feature number M
was not large. When the number of features selected was 4 each time, the OOB error was
relatively small. So, the random forest with the feature number M set to 4 was established
and the importance of the variables was output, as shown in Table 3.

Table 3. Importance of random forest variable selection.

Variable Mean Decrease Gini Mean Decrease Accuracy


Night Buy 135.2431 23.645784
PM Buy 120.5565 26.333866
Night PV 114.7848 −2.905496
PM PV 110.5329 −5.03892
AM PV 102.2129 −6.711931
AM Buy 87.24348 21.823613
Categories 68.46213 16.58795
Daybreak PV 47.48254 −2.350555
Night Cart 41.98615 −3.251536
Daybreak Buy 37.84351 12.998789
PM Cart 31.62181 −2.414922
AM Cart 25.06281 −1.886356
Night Fav 15.16813 −4.411997
PM Fav 13.76148 −0.308831
AM Fav 11.68928 −2.350264
Daybreak Cart 10.74105 0.3483816
Daybreak Fav 6.106746 1.8525728

The Gini index in the random forest can be used to judge the importance of features,
and the importance of each variable can be judged by calculating the value of the Gini
index. The larger the Gini value is, the stronger the importance of the variable is [57].
According to the results in Table 3, four variables were selected as the variables of loss
prediction, namely, “Night Buy,” “PM Buy,” “Night PV,” and “PM PV”.

4.5. Data Balancing


Since there was an imbalance between the number of non-churn customers and churn
customers (580 non-churn, 7576 churn), the data imbalance was remedied. One of the
most commonly used techniques for processing unbalanced data is sampling, such as
over-sampling and under-sampling [58]. In this experiment, the oversampling rate was 1:1,
and the dataset was balanced by SMOTE [59]. The balanced dataset is shown in Table 4.

Table 4. The balanced dataset.

Dataset Name Churn Non-Churn Positive:Negative


Sample dataset 7576 580 1:13
SMOT balance 3788 3788 1:1
Cluster I 4451 484 1:9.2
SMOT balance 2225 2225 1:1
Cluster II 2614 83 1:31.5
SMOT balance 1307 1307 1:1
Cluster III 511 13 1:39.3
SMOT balance 256 256 1:1
4.6. Evaluation Metrics
The evaluation methods were mainly through the generation of the confusion mat
and on this basis, the accuracy, recall rate, and accuracy were calculated. These three
J. Theor. Appl. Electron. Commer.dicators
Res. 2022, 17were used to measure the prediction effect of the model. After drawing 466 the
ceiver operating characteristic curve, the area under the receiver operating curve (AU
was used for comprehensive evaluation [60,61].
Accuracy
4.6. isMetrics
Evaluation the proportion between the positive and negative sample size and
total sample size that methods
The evaluation is accurate
werein the overall
mainly throughprediction. It of
the generation mainly measures
the confusion the accur
matrix,
of the overall predictions of the model.
and on this basis, the accuracy, recall rate, and accuracy were calculated. These three
indicators were used to measure the prediction effect of the model. After drawing the
receiver operating characteristic curve, the area under  TN
TPthe receiver operating curve (AUC)
Accuracy=
was used for comprehensive evaluation [60,61].TP  FN  FP  TN
Recall is the isproportion
Accuracy of correctly
the proportion predicted
between the positive
positive and sample
negative samplesize
sizeto thethe
and actual nu
total sample size that is accurate in the overall prediction. It mainly measures the accuracy
ber of positive samples, and it reflects the coverage of the model.
of the overall predictions of the model.
TP
Recall=
Accuracy =
TP +TP  FN
TN
(7)
TP + FN + FP + TN
Precision is the proportion of the number of positive samples correctly predicted
the number ofispositive
Recall samples,
the proportion mainly
of correctly indicating
predicted positivethe accuracy
sample of the
size to the positive
actual number sample
of positive samples, and it reflects the coverage of the model.
TP
Precision=
TP TP  FP
Recall = (8)
TP (true positive) and TN (true negative) TP + FN indicate how many predictions were c
rect. Precision is the proportion of the number of positive samples correctly predicted to
FP (false of
the number positive) and FN
positive samples, (false
mainly negative)
indicating indicateof how
the accuracy manysamples.
the positive predictions w
wrong. TP
Precision = (9)
TP + FP
5. Results and Analysis
TP (true positive) and TN (true negative) indicate how many predictions were correct.
AfterFP the
(falsedata imbalance
positive) processing
and FN (false negative) was completed,
indicate the balanced
how many predictions were data
wrong.were in
into the SVM and LR models for prediction. Several iterations were performed until c
5. Results and Analysis
vergence was achieved. Ten-fold cross-validation was used to divide the data into
After the data imbalance processing was completed, the balanced data were input
parts, nine
into theof which
SVM and were used for
LR models as the trainingSeveral
prediction. set and one as the
iterations weretest set. Theuntil
performed average va
of the predictedwas
convergence experimental results
achieved. Ten-fold obtained from
cross-validation was10 experiments
used to divide the was used10as the f
data into
evaluation result
parts, nine of the
of which wereSVM
usedand
as theLR models.
training set and one as the test set. The average value
The SVM and LR models were applied
of the predicted experimental results obtained to 10
from the test set data
experiments to obtain
was used the confus
as the final
evaluation result of the SVM and LR models.
matrix. Tables 5 and 6, respectively, show the confusion matrices before and after sub
The SVM and LR models were applied to the test set data to obtain the confusion ma-
vision,
trix. Tables 5 and 6,2respectively,
and Figures and 3, respectively, show the
show the confusion ROCbefore
matrices curvesandpredicted by the SVM
after subdivision,
LR models before
and Figures and
2 and after subdivision.
3, respectively, show the ROC curves predicted by the SVM and LR
models before and after subdivision.

(a) (b) (c)


Figure 3. The ROC curve of the SVM after segmentation. (a) Cluster I; (b) Cluster II; (c) Cluster III.
J. Theor. Appl. Electron. Commer. Res. 2022, 17 467

Table 5. SVM confusion matrix before segmentation.

Predicted
Predicted Predicted Accuracy Recall Precision AUC
Positive (0) Negative (1)
Actual
753 1
positive (0)
Actual 0.9081 0.9990 0.8175 0.942
138 620
negative (1)

Table 6. LR confusion matrix before segmentation.

Predicted
Predicted Predicted Accuracy Recall Precision AUC
Positive (0) Negative (1)
Actual
753 1
positive (0)
Actual 0.9065 0.9984 0.8149 0.938
140 618
negative (1)

5.1. Customer Segmentation Analysis


Customer segmentation can help enterprises improve product marketing plans and
re-formulate communication strategies with core customer groups. It matches products
with customer preferences, which helps to re-plan products and speed up delivery [62].
In this paper, k-means were used to subdivide the sample data into three types. Table 1
summarizes the distribution of the number of customers of these three types. Most of the
customers were concentrated in Cluster I. The number of Cluster I customers was 4935,
accounting for 60.5% of the number of customers. The customer churn rate of Cluster I was
90.2%. Compared with the customers of Cluster II and Cluster III, the non-churn rate of
this kind of customer was higher by 9.8%, and the number of customers of Cluster III was
the least. With only 524 customers, Cluster III also had the highest attrition rate. This result
means that Cluster I customers may be the core customers of enterprises, who should be
paid more attention to while formulating marketing strategies and predicting customer
churn measures. It also indicates the effectiveness of the k-means algorithm in customer
segmentation. These insights are valuable for data analysis and customer churn modeling
of B2C e-commerce enterprises.

5.2. Forecast Model Performance


We conducted comparative experiments on the SVM and LR models. Accuracy, recall,
and precision values for each category were calculated according to the confusion matrix
to evaluate the performance of the three categories in the dataset. Tables 7 and 8 show the
experimental results of the SVM and LR model predictions, respectively, after customer
segmentation. It can be seen from Tables 7 and 8 that the prediction accuracy of the
SVM model for the three types of customers was higher than that of the LR model, and
the prediction effect of the SVM model was good. However, to confirm the prediction
quality, accuracy alone is sometimes misleading [63]. Therefore, when evaluating the
prediction performance of the model, not only accuracy but also recall and precision should
be observed, and the performance of the prediction model should be comprehensively
determined according to the three performance indicators: accuracy, recall, and precision.
In the experimental results of this paper, on average, the three indicators of the SVM model
after customer segmentation were 0.9156, 0.9721, and 0.861, and those of the LR model
were 0.9066, 0.9498, and 0.8533. Therefore, the prediction performance of the SVM model
was better than that of the LR model.
J. Theor. Appl. Electron. Commer. Res. 2022, 17 468

Table 7. SVM confusion matrix after segmentation.

Predicted
Predicted Predicted Accuracy Recall Precision AUC
Positive (0) Negative (1)
Actual
11 1
positive (0)
Cluster I 0.9256 0.9184 0.9298 0.992
Actual
4 48
negative (1)
Actual
107 0
positive (0)
Cluster II 0.9158 0.9982 0.8820 0.966
Actual
31 231
negative (1)
Actual
635 0
positive (0)
Cluster III 0.9053 0.9998 0.7706 0.927
Actual
102 343
negative (1)
Avg. 0.9156 0.9721 0.8608

Table 8. LR confusion matrix after segmentation.

Predicted
Predicted Predicted Accuracy Recall Precision AUC
Positive (0) Negative (1)
Actual
10 2
positive (0)
Cluster I Actual 0.9050 0.8496 0.9176 0.963
4 47
negative (1)
Actual
107 0
positive (0)
Cluster II Actual 0.9098 1 0.8728 0.955
33 228
negative (1)
Actual
635 0
positive (0)
Cluster III Actual 0.9050 1 0.7697 0.904
102 343
negative (1)
Avg. 0.9066 0.9498 0.8533

The generalization ability of a prediction model is an important aspect of whether the


prediction performance of the model is satisfactory. Therefore, we used the ROC and AUC
to evaluate the generalization ability of the model. ROC can easily detect the influence
of arbitrary thresholds on the generalization performance of the learner [64,65]. Figure 3
shows the ROC corresponding to the SVM model after subdivision. It can be seen from
Figure 3 that when the threshold values of Cluster I, Cluster II, and Cluster III were 0.27,
0.16, and 0.071, the sensitivity was 0.98, 0.913, and 0.786. The specificity was 1.0, 1.0, and
0.983, and the AUC was 0.992, 0.966, and 0.927. Similarly, the performance indicators of the
LR model can be seen in Figure 4. These experimental data prove that the SVM model has
good generalization ability and good prediction performance. We recommend using the
SVM model for customer churn predictions in B2C e-commerce analyses.
AER 2022, 17, FOR PEER REVIEW
J. Theor. Appl. Electron. Commer. Res. 2022, 17 469

(a) (b) (c)


Figure 4. The
Figure ROC
4. The ROCcurve
curve of theLR
of the LRafter
after segmentation.
segmentation. (a) Cluster
(a) Cluster I; (b)II;Cluster
I; (b) Cluster II; III.
(c) Cluster (c) Cluster II
6. Discussion
6. Discussion
As mentioned in the first part of this paper, the behavioral data of B2C e-commerce
As mentioned
customers in the first
is characterized part of
by vertical this paper,
timeliness and athe behavioral
variety of behavioraldatainformation
of B2C e-comme
customers is characterized by vertical timeliness and a variety of behavioral
(PV, Buy, Cart, and Fav). We take those shopping behaviors recorded with time periods informa
(segments) as an important variable characteristic and segment the shopping time into
(PV, Buy, Cart, and Fav). We take those shopping behaviors recorded with time peri
four periods (Daybreak, AM, PM, and Night), which is a new opinion of this paper.
(segments)
Among the as customer
an important variable
segmentation characteristic
models, the classical and RFM segment
model isthe shopping
popular. The time
fourrecency
periods in the model takes the time information as the variable, i.e., the time variable.ofTothis pa
(Daybreak, AM, PM, and Night), which is a new opinion
Amongstudythethecustomer
impact timesegmentation
variables have models,
on customer thesegmentation
classical RFM and model is popular. The
churn prediction,
cency in the model takes the time information as the variable, i.e., the on
Chang et al. [66] extended the time variable and offered an LRFM model based time
RFM. variable
L is defined as the number of time periods (such as days). The research results show that
study the impact time variables have on customer segmentation and churn predict
the number of time periods (L) is an important variable for customer churn prediction
Changandet canal.
be[66]
usedextended
to evaluatethe time variable
customer loyalty. Rachidand offered
et al. [67] an LRFM
studied the model
LRFM model based on R
L is using
defined as the
a data number
set from onlineofretailers
time periods (such
(electronics, as days).
fashion, The research
household appliances, results
and show
the number
children’sofproducts)
time periods (L)the
and took is an important
number of timevariable
periods as forancustomer
important churn
variable prediction
for
churn prediction. The results show that customer shopping behaviors in different time
can be used to evaluate customer loyalty. Rachid et al. [67] studied the LRFM model us
periods (days) are quite different. In the literature of Wu et al. [30], the shopping periods
a data set from online
of e-commerce customersretailers (electronics,
were divided fashion,
into three periods, household appliances,
i.e., day, night, and childr
and late night
products) and took
(not segmented intothe number
“hours”). The ofresults
time periods
show thatas thean important
customer churnvariable for churn pre
rates in different
tion. The results show that customer shopping behaviors in different time periods
periods are different. Alboukaey et al. [41] proposed an RF-Daily model, in which the (da
are quite different. In the literature of Wu et al. [30], the shopping periods of e-comme
time variable was defined as daily dynamic behavior (not monthly dynamic behavior).
The daily dynamic behavior and other variables form the multivariate time series. The
customers were divided into three periods, i.e., day, night, and late night (not segmen
results show that the churn prediction effect of the model is good, and the daily model
into is“hours”).
significantlyThebetter
resultsthanshow that theone.
the monthly customer
Chen etchurnal. [68]rates
also in different
extended the periods
RFM are
ferent.
modelAlboukaey
and proposed et al.an[41]
LRFMP proposed
model, in anwhich
RF-Daily
the Timemodel,
Variable in Lwhich the time
represents the time variable
defined as daily dynamic behavior (not monthly dynamic behavior). The daily dyna
between transactions. The results show that the time between transactions has a great
impact on customer churn.
behavior and other variables form the multivariate time series. The results show that
The time (period) variable described in this paper is divided into “hours”. In the
churn prediction
prediction data set, effect of the
excluding themodel
variableisofgood, andthe
Categories, thedatadaily model
variables is significantly
in this paper are be
thancomposed
the monthlyof time one. Chen
variables et al. [68]
(Daybreak, AM, PM alsoandextended
Night) andthe fourRFM model
behavior and(PV,
variables proposed
LRFMP model,
Buy, Cart, and in which
Fav). the Time
Our model can beVariable L represents
called RF-PBCF, the timeresearch
and the empirical between shows transactio
The results show that the time between transactions has a great impact on customer chu
that the Night and PM variables have a great impact on churn predictions. It can be seen
from Table 3 that the importance of Night Buy and PM Buy variables ranks as the top two.
Theresults
Our time are (period) variable
fundamentally described
consistent within this
the paper
results of theis divided into “hours”.
aforementioned literature, In the p
diction datathe
namely, set, excluding
shopping time the variable
variable is keyoftoCategories,
churn prediction, the datawhichvariables
embodiesin this paper
a new
composed
value forofthis time variables
paper. Marketing(Daybreak,
managers are AM, PM andinterested
particularly Night) and fourcustomers
in when behavior varia
(PV, Buy, Cart, and Fav). Our model can be called RF-PBCF, and the empirical
shop because this time variable segmented into hours can provide operable information for resea
shows that the Night and PM variables have a great impact on churn predictions. It
be seen from Table 3 that the importance of Night Buy and PM Buy variables ranks as
top two. Our results are fundamentally consistent with the results of the aforementio
literature, namely, the shopping time variable is key to churn prediction, which embod
J. Theor. Appl. Electron. Commer. Res. 2022, 17 470

them. They can then provide real-time and customized product promotions for customers
during these time periods according to this operable information for the sake of continuous
customer retention.
The literature on B2C e-commerce customer relationship management primarily
focuses on researching customer churn prediction modeling [69–71]. One of the pur-
poses of customer churn prediction for enterprises is to evaluate the effectiveness of
customer retention measures and conduct targeted commodity promotion activities
for retained customers. These marketing strategies should be operable, or enterprises
should not only determine target customers according to the customer churn tendency
but evaluate the practical effects of customer retention measures according to the practi-
cal situation [72]. This paper adopts the research method Classification before Prediction.
The customers are first segmented into three types according to the k-means algorithm.
After the type of customers is made clear, customer churn can then be predicted. The
recent literature [73] emphasizes the importance of customer segmentation, which places
customer segmentation as the first step, and is thus fundamentally consistent with our
research method (Classification before Prediction). For e-commerce enterprises, our
research method holds practical significance. For example, when an enterprise deter-
mines that Cluster I is the key customer group to be focused on, it can make a churn
prediction for Cluster I customers through an appropriate algorithm, judge the churn
tendency of Cluster I customers according to the prediction and evaluation indicators
and then formulate appropriate marketing strategies. If the enterprise cannot judge
such customers in time, no effective customer retention strategies will be carried out,
and, accordingly, part of the Cluster I customers may be subject to churn. Therefore,
the research method utilized in this paper fully takes into account customer retention.
Our research results reflect practicality, which embodies another important value of
this paper.
LR and SVM are compared to judge the prediction performance of the two models,
which is consistent with the research methods and results of other literature. The
prediction performance of SVM is better than that of LR [74–76]. When evaluating
the prediction performance of a model, undoubtedly several performance indicators
are always used. Our research adopts three indicators, namely Accuracy, Recall, and
Precision, which are basically the same as those evaluation indicators mentioned in
other literature [7,77,78]. However, we believe that the customer data of e-commerce
enterprises is unique. Enterprises often change product information or upload a variety
of evaluation information about consumption for the sake of customer retention, which
is quite different from financial and telecommunication customer information. The
training and testing times of various prediction models are quite disparate when dealing
with customer data. As a matter of fact, when we use the SVM model to train customer
data, the data training time for SVM is significantly reduced than that for LR. To retain
customers, e-commerce enterprises need to predict customers in real time. Therefore,
the operational efficiency of the models should be considered, which is mentioned in
only a few pieces of literature [57].

6.1. Theoretical Implications


Our research results, to some extent, are enlightening for model development in
B2C e-commerce customer churn predictions and customer relationship management
in enterprises. We first made some contributions to the existing literature on customer
segmentation and churn prediction models in the B2C e-commerce industry. In this paper,
four variables that directly affect consumer shopping behaviors were selected based
on various information from current online shopping websites as our data variables,
and a new customer churn prediction model is thus developed. The variables of this
model comprehensively reflect consumer shopping behavior and specific shopping
times. New opinions have been proposed based on the methods for selecting variables,
segmenting customers, and realizing variable quantity dimension reduction under the
J. Theor. Appl. Electron. Commer. Res. 2022, 17 471

B2C e-commerce environment. Although a large number of shopping websites exist, not
all variables have the same impact on customer churn. The previous researches [68] are
a basis for this paper, helps us continue to expand the research scope and shows that
shopping time (time period) is a key variable for prediction. Traditional R and F variables
are not key for predictions. For customers who are most likely to choose other companies
for consumption, the “Buy” and “PV” within the Night and PM periods are the most
relevant variables for customer churn. Previous research on customer churn [69–71]
has focused on the prediction performance of models. Although we have a limited
understanding of the reasons for customer churn in the B2C e-commerce industry and
enterprises, studying the impact of time variables (P, B, C and F) in this context helps
prove the generalizability of the traditional RFM model. In addition, this paper, based on
the research method segmentation before prediction, clarifies how to predict customer
churn more accurately in the e-commerce industry, which is another achievement of
this paper.

6.2. Implications for Practice


From a practical point of view, our research results are also significant. Developing a
customer churn prediction model that meets the practical requirements of enterprises is
obviously helpful for enterprise customer relationship management. Enterprises can have
an insight into the causes of customer churn in accordance with the importance of con-
sumption characteristics. Our research is helpful for B2C e-commerce marketing managers
in optimizing enterprise marketing strategies and retaining customers through commodity
promotion activities based on the results of the churn model (rather than the traditional
churn prediction model) proposed in this paper. Traditional churn prediction models
cannot fully achieve the business goal of maximizing customer retention, namely, how to
minimize customer churn through the commodity recommendation system. Traditional
churn prediction models focus more on the overall results, namely, whether customers
will churn, but ignore the factors of customer retention in practical situations. For an
enterprise, these may cause some problems in that online shopping customers usually have
their own individual consumption habits and intentions. If the enterprise cannot adapt to
customer consumption habits, a transfer of customers with consumption intention to other
competitive companies will be caused. Traditional models often ignore the correlation
between the number of churned customers and the retained ones to exclude some of them
with shopping intention from retention activities. For example, some customers habitually,
such as shopping at night, however, commodity promotion activities are not held at night.
This problem can be solved by accurately identifying customer types and improving the
customer churn model, i.e., the customer segmentation method and prediction model
mentioned in this paper. Gattermann-Itschert et al. [78] studied the correlation between
the number of churned customers and retention activities through a field experiment and
proved the effectiveness of customer retention activities based on the churn prediction
with practical evidence. The model proposed in this paper can help enterprises lock target
customers easily. Enterprises can reduce not only customer churn but also marketing ex-
penditures. The research results show that the combination of our customer segmentation
and the prediction model is a feasible scheme for B2C e-commerce enterprises to deal with
customer churn.
On the other hand, we support the analysis teams of enterprises adopting a predic-
tion method that combines customer segmentation and churn prediction. Our research
shows that the performances of these customer segmentation methods and prediction
algorithms are similar to that of other methods, including the SVMauc prediction model [7],
segmentation-based modeling approach [73], and multi-slicing technique [78]. The existing
marketing literature detail that k-means customer segmentation is a valuable tool in all
cases because there will always be some customer groups with different shopping behav-
iors. For example, the literature [79] shows that the customer churn rate of the time-based
customer segmentation method is quite different from that of enterprise profitability (e.g.,
J. Theor. Appl. Electron. Commer. Res. 2022, 17 472

Money). Our research supports “segmentation first” as the first step in predictions, which
is of practical guiding significance for enterprise customer retention. B2C e-commerce
companies should understand that it is not impossible to fully utilize machine learning
tools (technologies) in the customer recommendation system. Marketing managers need to
constantly learn the application methods for various machine learning tools (technologies).
The method mentioned in this paper is reliable and operable and allows an enterprise to
judge which customers are liable to leave in time. Our research shows that marketing
managers can easily optimize their customer retention strategies and product promotion
activities by seeking target customer groups. Considering the operation efficiency of the
model and the ability inherent in enterprise technical teams, the fruits of our method are
consistent with previous literature [80,81]. The technical teams and managers at companies
should not only consider prediction technologies, but also the ability and knowledge of
the personnel or managers using machine learning technologies. In short, for enterprises,
cooperation between people and technologies is always necessary for data processing and
customer segmentation or model optimization and performance evaluation.

7. Conclusions
Customer churn predictions are very important in e-commerce. To maintain market
competitiveness, B2C enterprises should make full use of machine learning in customer
relationship management to predict the potential loss of customers and devise new market-
ing strategies and customer retention measures according to the prediction results. This
will help establish efficient and accurate loss prediction for e-commerce enterprises.
This paper used customer behavior data of a B2C e-commerce enterprise to test the
predictive ability of the SVM and LR models. To evaluate the prediction performance of the
two models, the k-means algorithm was first used for clustering subdivision to classify into
three types of customers, and then predictions were made for these three types of customers.
Accuracy, recall, precision, and AUC were calculated. There were two motivations for
our research. The first purpose was to study the effectiveness of customer segmentation
and the prediction effect of the model before and after customer segmentation according
to the longitudinal timeliness and multivariate variables of customer shopping behavior.
The experimental results prove that the customer segmentation of each prediction index
has a significant improvement. Therefore, k-means clustering segmentation is necessary.
The second purpose was to compare the effects of the LR model’s prediction based on
traditional statistics and the SVM’s prediction based on machine learning. The results
prove that the accuracy of the SVM model prediction was higher than that of the LR model
prediction. These research results have significance for customer relationship management
of B2C e-commerce enterprises.
The results of this study also have some limitations. We used a real data set containing
987,994 customers under the B2C environment, namely, the selection of data is limited to
a certain extent. Ideally, the research results should be verified by several data sets. The
results of customer segmentation greatly impact the prediction performance of the model.
In terms of the method, this paper only uses the k-means algorithm to segment and divide
customer types, which may have limitations, because if two or more segmentation methods
are compared, more convincing results may be obtained. In addition, we only use a small
number of predictive variables, which makes the promotion of our results limited, because
a lot of shopping information is presented on B2C websites, and some of it may be ignored.
Future research can be conducted from several aspects. First, we can collect and
compare the customer behavioral data from several companies to further enhance the
generalizability of our model. Second, considering customer retention is a durative task
of the enterprises, it is necessary to continuously predict and evaluate the churn. It is
crucial to carry out a segmentation considering the customer value and build a prediction
model with value variables, in that all prediction modeling works and customer retention
activities of enterprises aim at profit first.
J. Theor. Appl. Electron. Commer. Res. 2022, 17 473

Author Contributions: Conceptualization, X.X. and Y.H.; methodology, X.X. and Y.H.; software, X.X.;
validation, X.X. and Y.H.; formal analysis, X.X. and Y.H.; investigation, X.X. and Y.H.; resources, X.X.;
data curation, X.X.; writing—original draft preparation, X.X. and Y.H.; writing—review and editing,
X.X. and Y.H.; visualization, X.X. All authors have read and agreed to the published version of
the manuscript.
Funding: This research received no external funding.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Informed consent was obtained from all subjects involved in
the study.
Data Availability Statement: Not applicable.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Bi, Q.Q. Cultivating loyal customers through online customer communities: A psychological contract perspective. J. Bus. Res.
2019, 103, 34–44. [CrossRef]
2. Maria, O.; Bravo, C.; Verbeke, W.; Sarraute, C.; Baesens, B.; Vanthienen, J. Social network analytics for churn prediction in telco:
Model building, evaluation and network architecture. Expert. Syst. Appl. 2017, 85, 204–220.
3. Roberts, J.H. Developing new rules for new markets. J. Acad. Market. Sci. 2000, 8, 31–44. [CrossRef]
4. Reichheld, F.F.; Sasser, W.E. Zero defeofions: Quoliiy comes to services. Harvard. Bus. Rev. 1990, 68, 105–111.
5. Jones, T.O.; Sasser, W.E., Jr. Why satisfied customers defect. IEEE Eng. Manag. Rev. 1998, 26, 16–26. [CrossRef]
6. Nie, G.; Rowe, W.; Zhang, L.; Tian, Y.; Shi, Y. Credit card chum forecasting by logistic regression and decision tree. Expert. Syst.
Appl. 2011, 38, 15273–15285. [CrossRef]
7. Gordini, N.; Veglio, V. Customers churn prediction and marketing retention strategies: An application of support vector machines
based on the AUC parameter-selection technique in B2B e-commerce industry. Ind. Market. Manag. 2017, 62, 100–107. [CrossRef]
8. Zorn, S.; Jarvis, W.; Bellman, S. Attitudinal perspectives for predicting churn. J. Res. Interact. Mark. 2010, 4, 157–169. [CrossRef]
9. Datta, P.; Masand, B.; Mani, D.R.; Li, B. Automated cellular modeling and prediction on a large scale. Artif. Intell. Rev. 2000, 14,
485–502. [CrossRef]
10. Jain, H.; Khunteta, A.; Srivastava, S. Churn prediction in telecommunication using logistic regression and logit boost. Procedia
Comput. Sci. 2020, 167, 101–112. [CrossRef]
11. Coussement, K.; Lessmann, S.; Verstraeten, G. A comparative analysis of data preparation algorithms for customer churn
prediction: A case study in the telecommunication industry. Decis. Support Syst. 2017, 95, 27–36. [CrossRef]
12. Masand, B.; Datta, P.; Mani, D.R.; Li, B. CHAMP: A prototype for automated cellular churn prediction. Data Min. Knowl. Disc.
1999, 3, 219–225. [CrossRef]
13. Huang, Y.; Kechadi, T. An effective hybrid learning system for telecommunication churn prediction. Expert. Syst. Appl. 2013, 40,
5635–5647. [CrossRef]
14. Larivière, B.; Van den Poel, D. Investigating the role of product features in preventing customer churn, by using survival analysis
and choice modeling: The case of financial services. Expert. Syst. Appl. 2004, 27, 277–285. [CrossRef]
15. Zi˛eba, M.; Tomczak, S.K.; Tomczak, J.M. Ensemble boosted trees with synthetic features generation in application to bankruptcy
prediction. Expert. Syst. Appl. 2016, 58, 93–101. [CrossRef]
16. Kotler, P.; Keller, K. Marketing Management, 15th ed.; Pearson Education Ltd.: Edinburgh, UK, 2016; pp. 89–120.
ISBN 10: 1-292-0926-29.
17. Cao, L. In-depth behavior understanding and use: The behavior informatics approach. Inform. Sci. 2010, 180, 3067–3085.
[CrossRef]
18. Cao, L.; Yu, P.S. Behavior informatics: An informatics perspective for behavior studies. IEEE Intell. Inf. Bulletin. 2009, 10, 6–11.
19. Orsenigo, C.; Vercellis, C. Combining discrete SVM and fixed cardinality warping distances for multivariate time series classifica-
tion. Pattern Recogn. 2010, 43, 3787–3794. [CrossRef]
20. Eichinger, F.; Nauck, D.D.; Klawonn, F. Sequence mining for customer behaviour predictions in telecommunications. In
Proceedings of the Workshop on Practical Data Mining: Applications, Experiences and Challenges (ECML/PKDD), Berlin,
Germany, 22 September 2006; pp. 3–10.
21. Prinzie, A.; Van den Poel, D. Incorporating sequential information into traditional classification models by using an
element/position-sensitive SAM. Decis. Support Syst. 2006, 42, 508–526. [CrossRef]
22. Pınar, K.; Topcu, Y.I. Applying Bayesian Belief Network approach to customer chum analysis: A case study on the telecom
industry of Turkey. Expert. Syst. Appl. 2011, 38, 7151–7157.
23. Renjith, S. An integrated framework to recommend personalized retention actions to control B2C E-commerce customer churn.
Intl. J. Eng. Trends Technol. 2015, 27, 152–157. [CrossRef]
J. Theor. Appl. Electron. Commer. Res. 2022, 17 474

24. Caignya, A.D.; Coussementa, K.; Bockb, K.W.D. A new hybrid classification algorithm for customer churn prediction based on
logistic regression and decision trees. Eur. J. Oper. Res. 2018, 269, 760–772. [CrossRef]
25. Neslin, S.A.; Gupta, S.; Kamakura, W.; Lu, J.; Mason, C.H. Defection detection: Measuring and understanding the predictive
accuracy of customer churn models. J. Mark. Res. 2006, 43, 204–211. [CrossRef]
26. Zhang, Y. A Customer Churn Alarm Model based on the C5.0 Decision Tree-Taking the Postal Short Message as an Example. Stat.
Inf. Forum. 2015, 30, 89–94.
27. Farquad, M.A.H.; Ravi, V.; Raju, S.B. Churn prediction using comprehensible support vector machine: An analytical CRM
application. Appl. Soft. Comput. 2014, 19, 31–40. [CrossRef]
28. Tian, L.; Qiu, H.; Zheng, L. Telecom chum prediction modeling and application based on neural network. Comput. Appl. 2007, 27,
2294–2297.
29. Yu, R.; An, X.; Jin, B.; Shi, J.; Move, O.A.; Liu, Y. Particle classification optimization-based BP network for telecommunication
customer churn prediction. Neural Comput. Appl. 2018, 2, 707–720. [CrossRef]
30. Wu, X.; Meng, S. E-commerce Customer Churn Prediction based on Customer Segmentation and AdaBoost. In Proceedings of the
International Conference on Service Systems and Service Management (ICSSSM), Kunming, China, 24–26 June 2016.
31. Ji, H.; Ni, F.; Liu, J. Prediction of telecom customer churn based on XGB-BFS feature selection algorithm. Comput. Technol. Dev.
2021, 31, 21–25.
32. Ahmed, A.A.Q.; Maheswari, D. An enhanced ensemble classifier for telecom churn prediction using cost based uplift modeling.
Intl. J. Inf. Technol. 2019, 11, 381–391.
33. Ying, W.Y. Research on the LDA boosting in customer churn prediction. J. Appl. Stat. Manag. 2010, 29, 400–408.
34. Zhang, W.; Yang, S.; Liu, T.T.; School, M. Customer churn prediction in mobile communication enterprises based on CART and
Boosting algorithm. Chin. J. Manag. Sci. 2014, 22, 90–96.
35. Wu, J.; Shi, L.; Lin, W.P.; Tsai, S.B.; Xu, G.S. An empirical study on customer segmentation by purchase behaviors using a RFM
model and K-means algorithm. Math. Probl. Eng. 2020, 2020, 1–7. [CrossRef]
36. Wu, J.; Shi, L.; Yang, L.P.; Niu, X.X.; Li, Y.Y.; Cui, X.D.; Tsai, S.B.; Zhang, Y.B. User Value Identification Based on Improved RFM
Model and K-Means++ Algorithm for Complex Data Analysis. Wirel Commun. Mob.Com. 2021, 9982484, 1–8. [CrossRef]
37. Li, Y.; Chu, X.Q.; Tian, D.; Feng, J.Y.; Mu, W.S. Customer segmentation using K-means clustering and the adaptive. Appl. Soft
Comput. 2021, 113, 107924. [CrossRef]
38. Christy, A.J.; Umamakeswari, A.; Priyatharsini, L.; Neyaa, A. RFM ranking-An effective approach to customer segmentation. J.
King. Saud. Univ. Sci. 2021, 33, 1251–1257. [CrossRef]
39. Abbasimehr, H.; Bahrini, A. An analytical framework based on the recency, frequency, and monetary model and time series
clustering techniques for dynamic segmentation. Expert. Syst. Appl. 2021, 192, 116373. [CrossRef]
40. Hosseini, M.; Shajari, S.; Akbarabadi, M. Identifying multi-channel value co-creator groups in the banking industry. J. Retail.
Consum. Serv. 2022, 5, 102312. [CrossRef]
41. Alboukaey, N.; Joukhadar, A.; Ghneim, N. Dynamic behavior based churn prediction in mobile telecom. Expert. Syst. Appl. 2020,
162, 113779. [CrossRef]
42. Zhou, J.; Zhai, L.L.; Pantelous, A.A. Market Segmentation Using High-dimensional Sparse Consumers Data. Expert. Syst. Appl.
2020, 145, 113136. [CrossRef]
43. Li, M.; Wang, Q.W.; Shen, Y.Z.; Zhu, T.Y. Customer relationship management analysis of outpatients in a Chinese infectious
disease hospital using drug-proportion recency-frequency-monetary model. Int. J. Med. Inform. 2021, 147, 104373. [CrossRef]
44. Vapnik, V.N. The Nature of Statistic Learning Theory, 2nd ed.; Springer: New York, NY, USA, 2000; pp. 138–141. ISBN 0387987800.
45. Vapnik, V.N. Statistic Learning Theory; Wiley-Interscience: John & Wiley, Inc.: New York, NY, USA, 1998; pp. 375–383.
ISBN 0417030031.
46. Scholkopf, B.; Smolla, A. Learning with kernels-Support Vector Machines, Regularization, Optimization, and Beyond; The MIT Press:
Cambridge, MA, USA; London, UK, 2002; pp. 189–222. ISBN 0262194759.
47. Lee, S.; Lee, H.; Abbeel, P.; Andrew, Y.N. Efficient L1 regularized logistic regression. In Proceedings of the 21st National
Conference on Artificial Intelligence (AAAI-06), Boston, MA, USA, 16 July 2006; Volume 1, pp. 401–408.
48. Minka, T.P. Algorithms for Maximum-Likelihood Logistic Regression; Carnegie Mellon University Research Showcase: Technical
Report (Mathematics); IEEE: Pittsburgh, PA, USA, 2003; pp. 1–15. [CrossRef]
49. Alibaba Cloud Tianchi Data Sets. Available online: https://tianchi.aliyun.com/datase (accessed on 17 March 2021).
50. Cao, L. Behavior Informatics and Analytics: Let Behavior Talk. In Proceedings of IEE International Conference on Data Mining
Workshops (ICDM), Pisa, Italy, 15–19 December 2008; pp. 15–19.
51. Stolfo, S.J.; Hershkop, S.; Hu, C.W.; Li, W.J.; Nimeskern, O.; Wang, K. Behavior-based modeling and its application to Email
analysis. ACM T. Internet. Appl. 2006, 6, 187–221. [CrossRef]
52. Pham, D.T.; Dimov, S.S.; Nguyen, C.D. Selection of K in K-means clustering. Mech. Eng. Sci. 2004, 219, 103–119. [CrossRef]
53. Chen, N.; Chen, A.; Zhou, L. An Effective Clustering Algorithm in Large Transaction Databases. J. Sw. 2001, 12, 476–484.
54. Verbeke, W.; Dejaeger, K.; Martens, D.; Hur, J.; Baesens, B. New insights into churn prediction in the telecommunication sector: A
profit driven data mining approach. Eur. J. Oper. Res. 2012, 218, 211–229. [CrossRef]
55. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [CrossRef]
56. Breiman, L. Bagging predictors. Mach. Learn. 1996, 24, 123–140. [CrossRef]
J. Theor. Appl. Electron. Commer. Res. 2022, 17 475

57. Goldstein, B.A.; Polley, E.C.; Briggs, F.B.S. Random Forests for Genetic Association Studies. Stat. Appl. Genet. Mol. 2011, 10, 32.
[CrossRef]
58. Drummond, C.; Holte, R.C. C4. 5, class imbalance, and cost sensitivity: Why under-sampling beats over-sampling. In Proceedings
of Workshop on Learning from Imbalanced Datasets II, ICML, Washington, DC, USA, 21 August 2003.
59. Chawla, N.V.; Bowyer, K.W.; Hall, L.Q.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell.
Res. 2002, 16, 321–357. [CrossRef]
60. Provost, F. Analysis and visualization of classifier performance: Comparison under imprecise class and cost distributions. In
Proceedings of the International Conference on knowledge Discovery and Data Mining (KDD), San Diego, CA, USA, 15–18
August 1999; pp. 43–48.
61. Fan, X.; Ke, T. Enhanced maximum AUC linear classifier. In Proceedings of the International Conference on Fuzzy Systems and
Knowledge Discovery (FSKD), Yantai, China, 10–12 August 2010; pp. 1540–1544.
62. Brito, P.Q.; Carlos, S.; Sérgio, A.; Mote, A.; Byvoet, M. Customer segmentation in a large database of an online customized fashion
business. Robot. Cim-int. Manuf. 2015, 36, 93–100. [CrossRef]
63. Sturm, A.; Bob, L. Classification accuracy is not enough. J. Intell. Inf. Syst. 2013, 41, 371–406. [CrossRef]
64. Ma, S.; Huang, J. Regularized ROC Method for Disease Classification and Biomarker Selection with Microarray Data. Bioinformatics
2005, 21, 4356–4362. [CrossRef] [PubMed]
65. Song, X.; Ma, S. Penalized Variable Selection with U-Estimates. J. Nonparametr. Stat. 2010, 22, 499–515. [CrossRef] [PubMed]
66. Chang, H.H.; Tsay, S.F. Integrating of SOM and K-mean in data mining clustering: An empirical study of CRM and profitability
evaluation. J. Inform. Manag. 2004, 11, 161–203.
67. Rachid, A.D.; Abdellah, A.; Belaid, B.; Rachid, L. Clustering Prediction Techniques in Defining and Predicting Customers
Defection: The Case of E-Commerce Context. Int. J. Elect. Comput. Eng. 2018, 8, 2367–2383. [CrossRef]
68. Chen, K.; Hu, Y.H.; Hsieh, Y.C. Predicting customer churn from valuable B2B customers in the logistics industry: A case study.
Inf. Syst. E-Bus. Manage. 2015, 13, 475–494. [CrossRef]
69. Buckinx, W.; Van den Poel, D. Customer base analysis: Partial defection of behaviourally loyal clients in a non-contractual FMCG
retail setting. Eur. J. Oper. Res. 2005, 164, 252–268. [CrossRef]
70. Migueis, V.L.; Van den Poel, D.; Camanho, A.S.; Cunha, J.F. Modeling partial customer churn: On the value of first product-
category purchase sequences. Expert. Syst. Appl. 2012, 39, 11250–11256. [CrossRef]
71. Miguéis, V.L.; Camanho, A.; Cunha, J.F. Customer attrition in retailing: An application of Multivariate Adaptive Regression
Splines. Expert. Syst. Appl. 2013, 40, 6225–6232. [CrossRef]
72. Ascarza, E. Retention Futility: Targeting High Risk Customers Might Be Ineffective. J. Mark. Res. 2018, 55, 80–98. [CrossRef]
73. Caigny, D.A.; Coussement, K.; Verbeke, W.; Idbenjra, K.; Phan, M. Uplift modeling and its implications for B2B customer churn
prediction: A segmentation-based modeling approach. Ind. Market. Manag. 2021, 99, 28–39. [CrossRef]
74. Kim, S.; Shin, K.; Park, K. An application of support vector machines for customer churn analysis: Credit card case. In Proceedings
of the First international conference on Advances in Natural Computation (ICNC), Changsha, China, 27–29 August 2005;
pp. 636–647.
75. Coussement, K.; Van den Poel, D. Churn prediction in subscription services: An application of support vector machines while
comparing two parameter-selection techniques. Expert. Syst. Appl. 2008, 34, 313–327. [CrossRef]
76. Kim, H.S.; Sohn, S.Y. Support vector machines for default prediction of SMEs based on technology credit. Eur. J. Oper. Res. 2010,
201, 838–846. [CrossRef]
77. Schaeffer, S.E.; Sanchez, S.V.R. Forecasting client retention-A machine-learning approach. J. Retail. Consum. Serv. 2020, 52, 101918.
[CrossRef]
78. Gattermann-Itschert, T.; Thonemann, U.W. How training on multiple time slices improves performance in churn prediction. Eur.
J. Oper. Res. 2021, 295, 664–674. [CrossRef]
79. Sood, A.; Kumar, V. Analyzing client profitability across diffusion segments for a continuous innovation. J. Mark. Res. 2017, 54,
932–951. [CrossRef]
80. Duan, Y.; Edwards, J.S.; Dwivedi, Y.K. Artificial intelligence for decision making in the era of big data-Evolution, challenges and
research agenda. Int. J. Inform. Manag. 2019, 48, 63–71. [CrossRef]
81. Dwivedi, Y.K.; Hughes, L.; Ismagilova, E.; Aarts, G.; Coombs, C.; Crick, T.; Duan, Y.; Dwived, R.; Edwards, J.; Eirug, A.; et al.
Artificial intelligence (AI): Multidisciplinary perspectives on emerging challenges, opportunities, and agenda for research, practice
and policy. Int. J. Inform. Manag. 2021, 57, 101994. [CrossRef]

You might also like