Customer Segmentation in E-Commerce
¹Vani Ashok, ²Rahul R Kamath, ³Adithya RK, ⁴Supreeth Singh, ⁵Ajay Bhati
¹Assistant Professor, ²B.E. Student, ³B.E. Student, ⁴B.E. Student, ⁵B.E. Student
Dept. of Computer Science and Engineering,
JSS Science and Technology University, Mysuru, India
Abstract: Customer segmentation is the process of dividing customers into groups based on common characteristics so that companies can target each group efficiently. With new businesses emerging every day, it has become significantly more important for established businesses to apply marketing strategies in order to stay in the market as competition increases. This project attempts to classify customers into different categories using unsupervised learning, train supervised learning classifiers on the resulting data, and select the best classifier based on accuracy. Several machine learning libraries available in Python are used to perform the different operations. KNN, Random Forest, and Gradient Boost are the supervised learning algorithms used, and testing accuracies of 68.29, 75.50, and 75.77 were achieved, respectively.
Keywords: Segmentation, KNN, Random Forest, Gradient Boost
1. INTRODUCTION
With the growth of the market, it has become mandatory for established businesses to apply marketing strategies to stay in the market due to increasing competition. The number of consumers is growing significantly every day, and it has become challenging for businesses to cater to the requirements of each and every customer. Hence, customer segmentation is used, which is the process of dividing customers into groups with respect to common characteristics so that companies can target each group efficiently. It allows companies to see what the purchasers are actually buying, which can prompt businesses to serve their customers better, leading to customer satisfaction. It also allows businesses to find out who their target customers are and improve their marketing tactics to generate more revenue from them.
2. LITERATURE SURVEY
Li, Zeying [1] has proposed a method in which a retail supermarket was taken as the research object and data mining methods were used to segment the retail enterprise's customers. Association rules obtained using the Apriori algorithm were then applied to the different groups of customers to derive rules about customer characteristics and perform customer characteristic analysis efficiently. Finally, the author gave some recommendations for the supermarket's marketing and management work, which helped in understanding it in detail. Data mining was used efficiently to handle the large amount of historical and current data and to extract potential, useful, and valuable information from the database for the retail stores, which helps to target customers.
Wang, Zhenyu, Yi Zuo, Tieshan Li, C. L. Philip Chen, and Katsutoshi Yada [2] have analyzed customer segmentation based on the Broad Learning System, which provides an alternative view of learning to deep structures. Firstly, in addition to customer purchasing behavior, RFID (Radio Frequency Identification) data was also included, which can accurately represent consumers' in-store behavior. Secondly, the paper used the Broad Learning System (BLS) to analyze consumer segmentation; BLS is an efficient and effective machine learning technique for classification tasks. Thirdly, the customer behavior data used in the paper was collected from a real-world supermarket in Japan. Customer segmentation was treated as a multi-label classification problem based on both POS data and RFID data.
Kansal, Tushar, Suraj Bahuguna, Vishal Singh, and Tanupriya Choudhury [3] performed customer segmentation using K-means clustering. A Python program was developed and trained by applying a standard scaler to a dataset with two features and 200 training samples taken from a local retail shop. The two features are the customers' average spending and their average number of visits to the shop per year. By applying clustering, five customer segments were formed, labelled Careless, Careful, Standard, Target, and Sensible customers. On applying mean-shift clustering, however, the authors obtained two new clusters, labelled High buyers and frequent visitors, and High buyers and occasional visitors.
Bhade Kalyani, Vedanti Gulalkari, Nidhi Harwani, and Sudhir N. Dhage [4] have proposed a systematic approach for targeting customers and providing maximum profit to organizations. An important initial step was to analyze the sales data acquired from the purchase history and determine the parameters with the maximum correlation. Based on the respective clusters, proper resources can be assigned to profitable customers using machine learning algorithms. K-means clustering was used for customer segmentation, and Singular Value Decomposition was used to provide appropriate recommendations to the customers. The paper also deals with the drawbacks of recommender systems, such as sparsity and the cold-start problem, and how they can be overcome.
3. RESEARCH METHODOLOGY
Fig 1: System architecture for customer segmentation.
Fig 1 shows the system architecture for customer segmentation. The dataset is first passed to the pre-processing stage, where duplicate and null rows are dropped; cancelled orders that may be present in the dataset are removed as well, and the columns are converted to the datatypes required for pre-processing. The refined dataset is then used to classify the customers into different categories by unsupervised learning using the K-means algorithm. The resulting dataset, which maps each customer to a category, is used to train and test supervised learning models such as Random Forest, K-Nearest Neighbors, and Gradient Boost.
3.1 Data Preprocessing:
The dataset was obtained from the Machine Learning Repository of the University of California, Irvine [5]. Initially, the dataset had 541,909 rows and 8 columns (Invoice No, Stock Code, Description, Quantity, Invoice Date, Unit Price, Customer ID, Country). It was observed that almost 25% of the Customer ID entries were NULL; after removing all NULL values, the data frame had 406,829 rows and 8 columns. There were also 5,225 duplicate rows in the data frame, which were dropped. The dataset covers 37 different countries, and a choropleth map showing the orders per country was plotted. It was observed that 16.47% of the orders were cancelled. Some entries had negative values for quantity, so it was assumed that whenever an order is cancelled, an entry with the same Customer ID and Description is added with a negative quantity, and such entries were located. The dataset also contained discount entries, which were discarded. Even then, some negative-quantity entries remained, which proved the assumption wrong: the dataset contains transactions from 2010 onwards, and the remaining entries correspond to orders that were placed before 2010 but cancelled after 2010.
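A minimal pandas sketch of these pre-processing steps is shown below. It assumes the column names of the UCI Online Retail file [5]; treating invoice numbers starting with 'C' as cancellations and the stock code 'D' as discount entries are assumptions about the dataset rather than the authors' exact rules.

```python
import pandas as pd

# Load the UCI Online Retail dataset [5] (file name assumed).
df = pd.read_excel("Online Retail.xlsx")          # 541,909 rows, 8 columns

# Drop rows with a missing Customer ID (~25% of the rows), then duplicates.
df = df.dropna(subset=["CustomerID"])             # ~406,829 rows remain
df = df.drop_duplicates()                         # removes the ~5,225 duplicate rows

# Convert columns to the datatypes needed downstream.
df["InvoiceDate"] = pd.to_datetime(df["InvoiceDate"])
df["CustomerID"] = df["CustomerID"].astype(int)

# Cancelled orders: invoice numbers starting with 'C' and negative quantities
# (assumed convention of this dataset).
cancelled = df["InvoiceNo"].astype(str).str.startswith("C")
print("cancelled share:", cancelled.mean())       # the paper reports ~16.47%
df = df[~cancelled]

# Discount entries (stock code 'D', assumed) are discarded as well.
df = df[df["StockCode"].astype(str) != "D"]
```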
3.2 Classifying customers using unsupervised learning:
A basket was created, a basket price was computed for every transaction, and a pie chart for the different price ranges was plotted. It was seen that ~65% of purchases have prices in excess of £200. In the next step, an analysis of product categories was performed. A function was defined that takes the data frame as input and analyses the content of the Description column. First, it extracts the root of each word using a stemmer function and stores the words associated with each root in a dictionary (key, value). It then counts the number of times each root appears in the data frame. When several words are listed for the same root, the keyword associated with that root is taken to be the shortest word (e.g. if runner and running are both associated with the root run, runner is selected as it is shorter). A subplot showing the number of occurrences of every keyword was plotted. The keywords were used to create groups of products, and a matrix X was defined with products along one axis and keywords along the other; an entry in the matrix is 1 if the description of the corresponding product contains the keyword. The products were then grouped into classes using K-means, and the silhouette score was calculated, which indicates how well the clusters have been formed: a score near +1 indicates well-separated clusters, while a score near -1 indicates overlapping clusters. It was found that 5 clusters is the best choice in this case, as the increase in silhouette score beyond 5 clusters was not significant.
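The keyword extraction and product clustering described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the use of NLTK's PorterStemmer, the keyword_matrix helper, and the range of cluster counts tried are assumptions.

```python
import numpy as np
from nltk.stem import PorterStemmer
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def keyword_matrix(descriptions):
    """Binary product-by-keyword matrix built from the Description strings."""
    stemmer = PorterStemmer()
    root_to_words = {}                              # root -> words sharing that root
    for text in descriptions:
        for word in str(text).lower().split():
            root_to_words.setdefault(stemmer.stem(word), set()).add(word)
    # The keyword for a root is the shortest word associated with it.
    keywords = {root: min(words, key=len) for root, words in root_to_words.items()}
    roots = sorted(root_to_words)
    X = np.zeros((len(descriptions), len(roots)), dtype=int)
    for i, text in enumerate(descriptions):
        stems = {stemmer.stem(w) for w in str(text).lower().split()}
        for j, root in enumerate(roots):
            if root in stems:
                X[i, j] = 1                         # description i contains keyword j
    return X, [keywords[r] for r in roots]

# df is the pre-processed data frame from the sketch in Section 3.1.
descriptions = df["Description"].drop_duplicates().tolist()
X, keywords = keyword_matrix(descriptions)

# Pick the number of product clusters by comparing silhouette scores.
for k in range(3, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, round(silhouette_score(X, labels), 3))
```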
Fig 2: PCA projection of the feature data from a higher-dimensional space to a lower-dimensional space.
In Fig 2, a graph was plotted with the principal components along the x-axis and the explained variance along the y-axis. The explained variance, which is the variance captured by the individual components, increases with the number of principal components. Hence, as the number of principal components increases, the information loss decreases. The mean, sum, count, minimum, and maximum for each customer, together with the percentage of each customer's spending allocated to each category, were calculated.
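A hedged scikit-learn sketch of this explained-variance analysis is given below; customer_matrix is an assumed name for the per-customer feature table described above, not the authors' variable.

```python
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# customer_matrix: one row per customer with the aggregated features described
# above (mean, sum, count, min, max and the per-category percentages) -- assumed name.
pca = PCA().fit(customer_matrix)

# Cumulative explained variance against the number of components (cf. Fig 2):
# the more components are kept, the less information is lost.
plt.plot(range(1, pca.n_components_ + 1),
         pca.explained_variance_ratio_.cumsum(), marker="o")
plt.xlabel("Number of principal components")
plt.ylabel("Cumulative explained variance")
plt.show()
```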
StandardScaler was used to standardize the features by removing the mean and scaling to unit variance, and a matrix of standardized values was generated. In practice, before creating the clusters of customers, it is useful to define a basis of smaller dimension that describes the matrix. In this case, this basis was used to create a representation of the different clusters and thus verify the quality of the separation between the groups. Therefore, PCA was performed beforehand.
From Fig 3, it was concluded that many clusters are well separated while some overlap. At this point, clusters of clients were created from the standardized matrix defined earlier, and using the K-means algorithm a silhouette score of 0.218 was obtained.
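The standardization, customer clustering, and PCA projection can be sketched as follows; customer_matrix is the assumed per-customer feature table from above, and the number of clusters (11) is taken from the customer categories reported later in the paper.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Standardize the per-customer features (zero mean, unit variance).
scaled = StandardScaler().fit_transform(customer_matrix)

# Cluster the customers; the paper reports 11 customer categories and a
# silhouette score of 0.218 for this step.
customer_labels = KMeans(n_clusters=11, n_init=10, random_state=0).fit_predict(scaled)
print("silhouette:", round(silhouette_score(scaled, customer_labels), 3))

# Project onto the first two principal components to inspect how well the
# clusters are separated (cf. Fig 3).
coords = PCA(n_components=2).fit_transform(scaled)
plt.scatter(coords[:, 0], coords[:, 1], c=customer_labels, s=5)
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.show()
```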
3.3 Steps in supervised learning:
A model was built by feeding the obtained data to various supervised learning classifiers; the accuracy of each classifier was obtained and the results were compared. Generally, whether a given model is optimal is determined by looking at its F1 score, precision, recall, and accuracy (for classification), or its coefficient of determination (R2) and error (for regression). However, real-world data is often distributed somewhat non-uniformly, meaning that the fitted model is likely to perform better on some sections of the data than on others. A cross-validation scores visualizer enables us to visually explore these variations in performance across different cross-validation strategies. The training score indicates how well the model fits the training data.
The three supervised classifiers used are Random Forest, K-Nearest Neighbors, and Gradient Boost. A random forest is an estimator that fits a number of decision tree classifiers on different sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting. The sub-sample size is controlled using the max_samples parameter if bootstrap=True (default); otherwise the whole dataset is used to build each tree [6]. In k-NN classification, the output is a class membership: an object is classified by a majority vote of its neighbors, being assigned to the class most common among its k nearest neighbors [7]. Gradient Boost (GB) builds a model in a forward stage-wise fashion and allows the optimization of arbitrary differentiable loss functions. In each stage, regression trees are fit on the negative gradient of the binomial or multinomial deviance loss function; binary classification is a special case in which only a single regression tree is induced [8]. The three classifiers Random Forest, K-NN, and Gradient Boost resulted in testing accuracies of 75.50, 68.29, and 75.77, respectively. A confusion matrix is a table that gives information about the performance of a classification model on validation data for which the true values are known. The accuracy of the results is as expected. Nevertheless, when the different classes were defined, there was an imbalance in size between the classes; in particular, one class contains around 40% of the clients. Hence, a confusion matrix is needed.
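A minimal sketch of this supervised step is shown below. The train/test split ratio, random seeds, and default hyper-parameters are assumptions; only the choice of the three classifiers and k=5 for KNN comes from the paper.

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# scaled / customer_labels come from the clustering sketch in Section 3.2.
X_train, X_test, y_train, y_test = train_test_split(
    scaled, customer_labels, test_size=0.2, random_state=0, stratify=customer_labels)

classifiers = {
    "K-Nearest Neighbors (k=5)": KNeighborsClassifier(n_neighbors=5),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Gradient Boost": GradientBoostingClassifier(random_state=0),
}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    print(name,
          "train:", round(accuracy_score(y_train, clf.predict(X_train)), 4),
          "test:", round(accuracy_score(y_test, clf.predict(X_test)), 4))
```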
4. RESULTS
Fig 4 shows the learning curve for Random Forest. From the learning curve, it can be concluded that both the training score and the cross-validation score increase initially and then reach a constant value. Accuracies of 90.30 and 75.50 were achieved for the training and testing of Random Forest, respectively.
Fig 4: Learning curve for Random Forest.
Fig 5 shows the learning curve for Gradient Boost. From the learning curve, it can be concluded that the training score decreases as the number of samples increases, while the cross-validation score increases with the number of samples. Accuracies of 90.17 and 75.77 were achieved for the training and testing of Gradient Boost, respectively.
Fig 5: Learning curve for Gradient Boosting
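Learning curves such as those in Fig 4 and Fig 5 can be produced with scikit-learn's learning_curve. The sketch below reuses the scaled features and cluster labels from the earlier sketches; the train-size grid and fold count are assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import learning_curve

# scaled / customer_labels come from the clustering sketch in Section 3.2.
sizes, train_scores, cv_scores = learning_curve(
    GradientBoostingClassifier(random_state=0), scaled, customer_labels,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5)

plt.plot(sizes, train_scores.mean(axis=1), marker="o", label="Training score")
plt.plot(sizes, cv_scores.mean(axis=1), marker="o", label="Cross-validation score")
plt.xlabel("Number of training samples")
plt.ylabel("Accuracy")
plt.legend()
plt.show()
```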
Table 1: Training and testing accuracy
CLASSIFIER TRAINING ACCURACY TESTING ACCURACY
K-Nearest Neighbors (k=5) 81.44 68.29
Random Forest 90.30 75.50
Gradient Boost 90.17 75.77
Table 1 shows the training and testing accuracy for the three classifiers. Good accuracies of 75.50 and 75.77 were achieved for Random Forest and Gradient Boost, respectively, whereas KNN achieved an accuracy of 68.29, which is comparatively low.
Fig 6 is the confusion matrix for the 11 customer categories. From the confusion matrix, the cases where the predicted class matched the true class, and the cases where it did not, can be identified. The entries along the diagonal represent the number of correct predictions, and the other entries represent the number of incorrect predictions. Most of the entries are correctly predicted.
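A confusion matrix like Fig 6 can be produced as sketched below, reusing the test split and the fitted classifiers from the sketch in Section 3.3; choosing Gradient Boost for the display is an assumption.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

# Rows are the true customer categories, columns the predicted ones; the
# diagonal holds the correctly classified customers (cf. Fig 6).
best_clf = classifiers["Gradient Boost"]      # fitted in the sketch of Section 3.3
cm = confusion_matrix(y_test, best_clf.predict(X_test))
ConfusionMatrixDisplay(confusion_matrix=cm).plot()
plt.show()
```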
5. CONCLUSION
The proposed work aims to classify online e-commerce customers into various categories based on characteristics such as the amount they spend, the type of products they buy, and how frequently they purchase. Initially, the products were classified into 5 categories using K-means; using this result along with other characteristics such as amount spent and purchase frequency, the online e-commerce customers were classified into 11 categories using an unsupervised method, i.e. K-means clustering. To measure the quality of the clustering, the silhouette score was used, and a score of 0.02 was obtained. PCA was used to obtain insights about the clusters. Different classifiers were then trained using the data obtained so far. The three classifiers Random Forest, K-NN, and Gradient Boost resulted in accuracies of 75.50, 68.29, and 75.77, respectively. The accuracies of Random Forest and Gradient Boost are almost the same, and from their learning curves it can be verified that neither of them overfits or underfits. Hence, both classifiers are equally good for the given dataset.
REFERENCES
[1] Li, Zeying. "Research on customer segmentation in retailing based on clustering model." In 2011 International Conference on Computer Science and Service System (CSSS), pp. 3437-3440. IEEE, 2011.
[2] Wang, Zhenyu, Yi Zuo, Tieshan Li, CL Philip Chen, and Katsutoshi Yada. "Analysis of Customer Segmentation Based on
Broad Learning System." In 2019 International Conference on Security, Pattern Analysis, and Cybernetics (SPAC), pp. 75-80.
IEEE, 2019.
[3] Kansal, Tushar, Suraj Bahuguna, Vishal Singh, and Tanupriya Choudhury. "Customer segmentation using K-means clustering."
In 2018 international conference on computational techniques, electronics and mechanical systems (CTEMS), pp. 135-139. IEEE,
2018.
[4] Bhade, Kalyani, Vedanti Gulalkari, Nidhi Harwani, and Sudhir N. Dhage. "A systematic approach to customer segmentation and buyer targeting for profit maximization." In 2018 9th International Conference on Computing, Communication and Networking Technologies (ICCCNT), pp. 1-6. IEEE, 2018.
[5] https://archive.ics.uci.edu/ml/datasets/Online+Retail
[6] https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
[7] https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm
[8] https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html