CUSTOMER SEGMENTATION USING MACHINE
LEARNING
Project Report submitted in partial fulfilment of the requirements for the course of
Project Management
( MGT 1034 )
submitted by
KRITHIKKA JAYAMURTHI(20BEE1053)
NAGAVINDHYA(20BEE1022)
FALL SEMESTER 2022-2023
DECLARATION
We hereby declare that the project entitled “Customer Segmentation Using Machine
Learning”, submitted by us in partial fulfillment of the requirements for the course
“Project Management - MGT1034”, is a record of bonafide work carried out by us under
the course faculty, Dr. Meenakshi Sankaran. We further declare that the work reported
in this project has not been submitted and will not be submitted, either in part or in
full, for any other course of this institute or of any other institute or university.
Signature
KRITHIKKA JAYAMURTHI(20BEE1053)
Signature
NAGAVINDHYA(20BEE1022)
CERTIFICATE
The project report entitled “Customer segmentation using machine learning” is prepared
and submitted by Krithikka Jayamurthi (Register number: 20BEE1053) and
Nagavindhya (Register number: 20BEE1022). It has been found satisfactory in terms
of scope, quality and presentation as partial fulfillment of the course “Project
Management - MGT1034” in VIT University, Chennai Campus, India.
(Name & Signature of the Course Faculty)
ACKNOWLEDGEMENT
The success and final outcome of this project required a lot of guidance and assistance
from many people, and I am extremely fortunate to have received such help throughout
the project. Whatever I have done is only due to such guidance and assistance, and I
would not forget to thank them.
I take this opportunity to thank all those who extended a helping hand in carrying out
this project. I would like to express my special gratitude to my teacher for helping me
learn project management. Their guidance has helped me understand project management
easily and effectively.
I owe my profound gratitude to the Dean and the H.O.D. for allowing me to do this
project. I extend my heartfelt thanks to my parents and friends for their support in this
experimental endeavor of mine.
ABSTRACT
Online business is now very popular, and online marketing is essential to attract and
retain customers. Treating all customers alike and targeting them with the same
marketing strategy is inefficient, and it can annoy customers by neglecting their
individuality. Customer segmentation has therefore become a popular and efficient
solution to this problem. Customer segmentation is defined as dividing a company’s
customers on the basis of demographic (age, gender, marital status) and behavioural
(types of products ordered, annual income) aspects. Demographic characteristics do not
emphasize the individuality of a customer, because people in the same age group may
have different interests. Behavioural aspects are a better basis for customer
segmentation because they focus on individuality. Based on surveys completed by
customers, we can segregate customers by their interests and fulfil their needs
accordingly. Customers can be classified based on geographic, demographic, behavioural
and psychological parameters. With the geographic parameter, customers are segmented
by location: if a customer is located in the same city as the shop, the owner can
deliver the order faster than to a customer who is far away. With the demographic
parameter, customers can be segmented by attributes such as gender so that marketing
can be targeted; for example, women's accessories can be displayed in the online shop
and in notifications to female customers. With the behavioural parameter, a customer's
order history is analysed and products are suggested based on previous orders. With the
psychological parameter, customers can be segregated based on their attitudes, which
are captured in surveys.
Machine learning methodologies are a great tool for analysing customer data and finding
insights and patterns. Artificially intelligent models are powerful tools for
decision-makers. They can identify customer segments precisely, which is much harder to
do manually or with conventional analytical methods. One very common machine learning
algorithm that is suitable for customer segmentation problems is the k-means clustering
algorithm. There are other clustering algorithms as well, such as DBSCAN,
Agglomerative Clustering and BIRCH. Unlike supervised learning algorithms, k-means
clustering is an unsupervised machine learning algorithm, used when the data is
unlabelled, as our customer segmentation data is. The algorithm discovers groups
(clusters) in the data, where the number of clusters is represented by the K value.
Customers have different needs. A one-size-fits-all approach to business will generally
result in less engagement, lower click-through rates, and ultimately fewer sales.
Customer segmentation is the cure for this problem. Finding an optimal number of unique
customer groups helps us understand how customers differ and give them exactly what
they want. Customer segmentation improves customer experience and boosts company
revenue. That is why segmentation is a must for anyone who wants to surpass their
competitors and gain more customers, and machine learning is a good way to do it.
Introduction
Customer segmentation simply means grouping your customers according to various
characteristics (for example grouping customers by age).
It’s a way for organizations to understand their customers. Knowing the differences
between customer groups, it’s easier to make strategic decisions regarding product
growth and marketing.
The opportunities to segment are endless and depend mainly on how much customer
data you have at your disposal. Starting from basic criteria like gender, hobby, or age,
it goes all the way to things like “time spent on website X” or “time since user opened
our app”.
There are different methodologies for customer segmentation, and they depend on four
types of parameters:
Geographic customer segmentation is very simple: it’s all about the user’s location. This
can be implemented in various ways. You can group by country, state, city, or zip code.
Demographic segmentation is related to the structure, size, and movements of customers
over space and time. Many companies use gender differences to create and market
products. Parental status is another important feature. You can obtain data like this from
customer surveys.
Behavioral customer segmentation is based on past observed behaviors of customers that
can be used to predict future actions. For example, brands that customers purchase, or
moments when they buy the most. The behavioral aspect of customer segmentation not
only tries to understand reasons for purchase but also how those reasons change
throughout the year.
Psychological segmentation of customers generally deals with things like personality
traits, attitudes, or beliefs. This data is obtained using customer surveys, and it can be
used to gauge customer sentiment.
Nature of the project (Industrial / Developmental)
Ours is a developmental project.
The goal of development projects is to help improve people's lives through skills
training and other livelihood programs. Development organizations prepare and
implement development projects and work to strengthen the capabilities of local
institutions and promote community self-reliance in a sustainable way.
Importance of the project in hand
Online shopping is now very popular in India: more than 1 percent of India’s
population shops online, and a wide range of websites offer online shopping to
fulfil customer needs in an attractive and profitable manner. When lockdown came to
India in 2020, online shopping reached its peak and gained more customers in all
respects.
However, as more online shopping platforms are developed, competition becomes
heavy, and many of them try to win more customers for profitable growth. Customer
segmentation can be used to attract new customers and retain existing ones by
understanding their needs and fulfilling them.
Customer segmentation plays an important role in improving the market efficiency of
the company. We can divide a large customer base into small groups and target
customers based on the amount of money they spend and how much they use the
e-platform for shopping, so that the company does not waste money unnecessarily and
focuses on frequent customers.
As online shopping is done across the globe, even foreign customers will be
interested in buying from Indian websites. Customer segmentation therefore plays a
vital role in expanding the network across the globe by meeting the demand of foreign
customers for the Indian products they are interested in, so that the currency
conversion rate would be high.
Customer segmentation plays a crucial role in designing a product or service
according to the needs and benefits of people, which can increase the demand for that
product or service. If we do not design according to the customer, the business will
run into losses.
Challenges
Market segmentation is incredibly useful, though it comes with some challenges:
cost, customers who belong to more than one segment, precise segmentation, giving
importance to the right segment, and integrating the segmentation with the company.
It is a costly procedure because we invest in different parts of a campaign aimed at
a single market. People's choices differ with their environment, so some customers
fall into more than one segment, which has to be dealt with while preparing the data
set so that we get precise data. One should start with stakeholder interviews to
secure buy-in for the project.
Beneficiaries
By using customer segmentation we can retain existing customers, for example by
suggesting a product they bought on a previous visit, or a similar product, with a
discount so that they shop with our service again. For example, if a girl buys a
salwar kameez online, after one month we can suggest the same type of salwar kameez
or a similar dress such as a kurti, and offer a discount (like 10%).
By offering products that suit the economic status of frequent customers, such as
products sold at a low price and great quality, most customers will go for them.
Because of this the company can gain new customers, compete well in the market, and
establish its brand, so that customers mention the brand name more often than the
product or service itself. For example, Fevicol is a glue that expanded its market in
India in such a way that most customers say “Fevicol” instead of “glue”.
Fig 1 explains the benefits of customer segmentation in an app. It shows that by
using customer segmentation a company can bring more customers into its market: the
increase in the open rate for apps is 85 percent, as segmentation leaves curiosity in
the customer's mind about offers and festive seasons. It increases customer
satisfaction, as we segregate customers into small groups and meet their demands on
time in terms of price, quality, quantity, delivery time, discounts, offers, etc. The
increase in returning website traffic is 38 percent, which is somewhat low given that
customer benefits reach 95 percent and their issues are resolved on time. The decrease
in the unsubscribe rate is 38 percent, as customers want to see offers or discounts
sent by the app through mail: they get most of the products of their interest and do
not want to miss them.
Uniqueness of the Proposed Project
Machine learning is very useful for understanding the patterns and insights in a
given data set. Decision makers prefer to use artificially intelligent models. There
are many machine learning algorithms, but we used the k-means clustering algorithm.
Customer segmentation has generally been done manually, which is not precise, so we
use an algorithm that is well known and easy to understand.
Statement of Project (Sop)
Company Name: Spectrum software solution
Project Managers: Nagavindhya
Team mates: Krithikka Jayamurthi
Date of project proposal: 1st November 2022
Starting date of the project: 1st December 2022
Ending date of the project: 2nd June 2023
Cost estimated for the project: Rs 40 Lakhs (approx.)
Project scope: This project can be used to segregate customers based on criteria
such as birthday, type of phone used, location, and gender, so that the company can
gain more profit than with its regular methods of attracting customers.
TEFR
Projected future sales revenue
Fig 2: A certain company X in the UK earns 83% of its revenue from the UK itself.
Fig 3: Monthly revenue of the online retail company. One can observe a rise in revenue
from November 2010, followed by November 2011, and a rise in August.
Quality and quantity of the raw materials
Plant Siting
Location & Infrastructure
Requirement of manpower and its costing
One can observe a rise in revenue from November 2010, followed by November 2011, and
a rise in August. This is because of Data Wrangling, Feature Engineering,
Building Machine Learning Models, and Selecting the Model.
Quality and quantity of the raw materials
Customer ID – This is the id of a customer for a particular business.
Products Purchased – This feature represents the number of products purchased
by a customer in a year.
Complaints – This column value indicates the number of complaints made by the
customer in the last year
Money Spent – This column value indicates
Fig4: sample customer data
Location & Infrastructure
Bangalore, India; San Francisco, United States
Requirement of manpower and its costing
ML engineers, a technical team, and an HR team are required.
As an example:
To find each company’s number of employees: 3 minutes per data point x 100 customers
= approximately 5 hours
To find each company’s revenue: 4 minutes per data point x 100 customers =
approximately 6.7 hours
Either task can be completed by an intern for approximately $75 to $133, assuming they
earn between $15 and $20 per hour.
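The arithmetic behind this manpower estimate can be sketched in a few lines of Python. This is only an illustration: the function name `collection_cost` and its parameters are assumptions made for this sketch and are not part of the project code. The input figures (minutes per data point, 100 customers, $15 to $20 per hour) are the ones quoted in the example.

```python
def collection_cost(minutes_per_point, n_customers, hourly_rate):
    """Return (hours, cost) for manually collecting one data point per customer."""
    hours = minutes_per_point * n_customers / 60
    return hours, hours * hourly_rate


# Cheapest case: the 3-minute task at the lower hourly rate
employees_hours, employees_low = collection_cost(3, 100, 15)
# Most expensive case: the 4-minute task at the higher hourly rate
revenue_hours, revenue_high = collection_cost(4, 100, 20)

print(f"Employee counts: {employees_hours:.1f} h")
print(f"Revenues: {revenue_hours:.1f} h")
print(f"Cost range: ${employees_low:.0f} to ${revenue_high:.0f}")
```

With these inputs the first task comes to 5 hours, and the second to 400 minutes, which is roughly 6.7 hours.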
Project Starting and Completion Duration, Time
Starting date: 1.12.2022
Completion date: 2.6.2023
Time: 6 to 7 months approximately
Total Project Cost Estimation
The estimate covers the different groups of people or organizations the
enterprise aims to reach and serve. This includes users who might not
generate revenue but who are necessary for the business model to
work.
Different models exist for cost estimation; we use one of them, the
learning curve model.
For example, learning curve cost estimating is based on the assumption
that as a particular task is repeated, the operator systematically becomes quicker
at performing the task. In particular, the model is based on the assumption that
the time required to complete the task for production unit 2x is a fixed
percentage of the time required for production unit x, for all positive integers x.
The learning curve slope indicates "how fast" learning occurs. For example, a
learning curve rate of 70% represents much faster learning than a rate of 90%. If
an operator exhibits learning on a certain task at a rate of 70%, the time required
to complete production unit 50, for example, is only 70% of the time required to
complete unit 25.
Let $b$ be the learning curve exponent:
$b = \log(\text{learning curve rate in decimal form}) / \log 2$
Then the time estimate for unit $N$ ($N = 1, 2, \ldots$) is
$T_N = T_1 N^{b}$
where $T_1$ is the time required for unit 1.
As an example: the learning curve rate is 70% and the operator’s time for the
first unit is 65 seconds. What is the operator’s time for the 50th unit?
$b = \log(0.70) / \log 2 \approx -0.5146$
$T_{50} = T_1 \cdot 50^{b} = 65 \cdot 50^{-0.5146} \approx 8.68$ seconds
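The learning curve calculation can be checked numerically with a short Python sketch. The function name `learning_curve_time` is an assumption made for illustration. The inputs are the ones from the worked example: a 70% learning rate, 65 seconds for the first unit, and the 50th unit as the target.

```python
import math


def learning_curve_time(t1, rate, n):
    """Time for unit n, given first-unit time t1 and a learning rate such as 0.70."""
    b = math.log(rate) / math.log(2.0)  # learning curve exponent (negative when rate < 1)
    return t1 * n ** b


t50 = learning_curve_time(65, 0.70, 50)
t25 = learning_curve_time(65, 0.70, 25)

print(f"Unit 50: {t50:.2f} s")
# Doubling the unit number should scale the time by the learning rate
print(f"Ratio unit 50 / unit 25: {t50 / t25:.2f}")
```

The ratio printed on the last line is a quick sanity check on the model's defining assumption: the time for unit 2x is a fixed percentage (here 70%) of the time for unit x.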
Vendors Details
1. Name of the company: Amrita Fashions
Name of the HR Manager: V Ananth
Address: Plot no 23, Sarojini Naidu street, New Delhi-1
Account No: 234510849001
Phone no: 9840223657
2. Name of the company : AB electronics
Name of HR Manager: R.Arun
Address: Plot no 17, Lakshmi road, Visakhapatnam-108
Account No:109238376459
Phone no: 9812873476
3. Name of the company: Ravi Garments
Name of the HR Manager: Karthik G
Address: Plot no 5, Amman street, Thirupur-78
Account No:23874567102
Phone no:9218376654
Work breakdown Structure
A work breakdown structure (WBS) is a scope management process that is entirely
deliverable-oriented. It is based on the order of tasks that must be completed to
eventually arrive at the final product. The work breakdown structure aims to keep all
project members on task and focused on the project's purpose.
WBS Dictionary
Project Network Diagram (Phase Wise / Design Phase)
Fig4:use case diagram
Description of Dependencies
Machine learning methodologies are a great tool for analyzing customer data
and finding insights and patterns. Artificially intelligent models are powerful
tools for decision-makers. They can precisely identify customer segments,
which is much harder to do manually or with conventional analytical
methods.
There are many machine learning algorithms, each suitable for a specific type
of problem. One very common machine learning algorithm that’s suitable for
customer segmentation problems is the k-means clustering algorithm. There
are other clustering algorithms as well such as DBSCAN, Agglomerative
Clustering, and BIRCH, etc.
K-Means clustering is an efficient machine learning algorithm to solve data
clustering problems. It’s an unsupervised algorithm that’s quite suitable for
solving customer segmentation problems. Before we move on, let’s quickly
explore two key concepts
We first import the required libraries:
import pandas as pd
import numpy as np
from sklearn.cluster import KMeans
import plotly.express as px
import plotly.graph_objects as go
import matplotlib.pyplot as plt
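The customer data is collected in a spreadsheet and stored as a CSV file, which we load with pandas:
customersdata = pd.read_csv("customers-data.csv")
After the model has been fitted, we extract the cluster centers:
# kmeans_model_new is assumed to be a KMeans model already fitted on log-transformed (np.log1p) features
cluster_centers = kmeans_model_new.cluster_centers_
# Undo the log transform so the centers are expressed in the original units
data = np.expm1(cluster_centers)
# Place the original-scale and log-scale centers side by side
points = np.append(data, cluster_centers, axis=1)
points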
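To show how these lines could fit together end to end, here is a minimal, self-contained sketch. It rests on several assumptions that are not confirmed by the report: the data is synthetic rather than read from customers-data.csv, the column names are modelled loosely on the features listed earlier (products purchased, complaints, money spent), the features are log-transformed with `np.log1p` before clustering (which is what the later `np.expm1` call suggests), and K = 3 is chosen arbitrarily.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical stand-in for customers-data.csv: synthetic values, not real customers
rng = np.random.default_rng(0)
customersdata = pd.DataFrame({
    "products_purchased": rng.integers(1, 60, size=200),
    "complaints": rng.integers(0, 10, size=200),
    "money_spent": rng.gamma(shape=2.0, scale=500.0, size=200),
})

# log1p compresses skewed, non-negative features before clustering (assumed step)
log_features = np.log1p(customersdata.to_numpy(dtype=float))

# K = 3 is an arbitrary choice for illustration
kmeans_model_new = KMeans(n_clusters=3, n_init=10, random_state=0)
customersdata["cluster"] = kmeans_model_new.fit_predict(log_features)

cluster_centers = kmeans_model_new.cluster_centers_   # centers in log space
data = np.expm1(cluster_centers)                       # centers in original units
points = np.append(data, cluster_centers, axis=1)      # one row per cluster

print(points.shape)
print(customersdata["cluster"].value_counts().sort_index())
```

Each row of `points` describes one cluster: the first three columns give the center in the original units and the last three give the same center in log space.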
Events (Estimation Time, Money, Materials, Machines, Software /
Hardware , Human Energy Resource’s Includes Full Time, Part
Time, Outsourcing Labourers If Any)
Different kinds of technical teams are required for analyzing the segmentation
(an ML team, a design team, etc.), along with different software with built-in
libraries; the data has to be updated and customized. (For example, Plotly is a
Python library used for graphing, statistics, plotting and analytics.)
An initial period of 6 months is needed for setup. Outsourcing is required
for updating the technology in future.
Activities
Customer segmentation plays an important role in improving business revenue, so we
segregate customers based on their birthday, the type of phone used, their locality,
their area of interest, and their gender.
Birthdays are collected when customers sign up on the website/app, so that one month
before their birthday we can show them offers/discounts on dresses or accessories
and they can complete their birthday purchases on the website/app with a free home
delivery option.
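The birthday rule described above can be sketched with Python's standard `datetime` module. The customer IDs, birth dates, the reference date, and the 30-day window are all hypothetical values chosen for illustration, and treating a 29 February birthday as 28 February in non-leap years is an assumption of this sketch.

```python
from datetime import date


def days_until_birthday(birth_date, today):
    """Days from `today` to the customer's next birthday (0 if it is today)."""
    def birthday_in(year):
        try:
            return birth_date.replace(year=year)
        except ValueError:
            # 29 February in a non-leap year: treat 28 February as the birthday
            return date(year, 2, 28)

    next_birthday = birthday_in(today.year)
    if next_birthday < today:
        # This year's birthday has already passed, so look at next year
        next_birthday = birthday_in(today.year + 1)
    return (next_birthday - today).days


# Hypothetical customers, for illustration only
customers = {
    "C001": date(1998, 12, 20),
    "C002": date(2001, 3, 5),
    "C003": date(1995, 1, 10),
}
today = date(2022, 12, 15)

# Customers whose birthday falls within the next 30 days receive the offer
birthday_segment = [cid for cid, born in customers.items()
                    if days_until_birthday(born, today) <= 30]
print(birthday_segment)
```

Computing the next birthday rather than comparing months directly keeps the rule correct across the year boundary, for example a January birthday checked in mid-December.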
Customers are also segregated based on the mobile phone they use. If a person uses
an iPhone or a OnePlus, we can offer them more expensive products or items that can
be bought in installments. If a person uses an Android phone, we can offer products
based on the price of that phone. In this way we offer products matched to the price
of the customer's mobile so that we may get frequent orders.
We can segregate customers based on their locality. If a person and the shop from
which they ordered are in the same place, priority is given so that they get the
order as fast as expected. If someone is located in a hilly or remote area, we have
to analyse the climatic conditions there so that we can deliver the product
accordingly. Segregating customers by gender is the most common approach at present;
for example, female customers are shown offers on cosmetics, women's clothing, etc.
Customer segmentation is also done based on area of interest. From the first few
orders we can understand a customer's interests and recommend products based on
their previous orders.
K-Means clustering is an efficient machine learning algorithm to solve data
clustering problems. It’s an unsupervised algorithm that’s quite suitable for solving
customer segmentation problems. Unsupervised machine learning is quite different
from supervised machine learning. It’s a special kind of machine learning algorithm
that discovers patterns in the dataset from unlabelled data.
Unsupervised machine learning algorithms can group data points based on similar
attributes in the dataset. One of the main types of unsupervised models is clustering
models.
Unlike supervised learning algorithms, K-means clustering is an unsupervised
machine learning algorithm. This algorithm is used when we have unlabelled data.
Unlabelled data means input data without categories or groups provided. Our
customer segmentation data is like this for this problem.
The algorithm discovers groups (clusters) in the data, where the number of clusters
is represented by the K value. The algorithm acts iteratively to assign each input
data point to one of K clusters, as per the features provided. All of this makes
k-means quite suitable for the customer segmentation problem.
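The iterative assignment described above can be illustrated with a small NumPy sketch of plain k-means (Lloyd's algorithm). This is not the project's implementation, which uses scikit-learn's `KMeans`; the function name `kmeans`, the random initialisation from data points, and the synthetic two-feature data are all assumptions made for illustration.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means (Lloyd's algorithm): returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center
        distances = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: each center moves to the mean of its assigned points
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # centers have stopped moving
        centers = new_centers
    return labels, centers


# Hypothetical data: two well-separated synthetic groups with two features each
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
               rng.normal(5.0, 0.5, size=(50, 2))])
labels, centers = kmeans(X, k=2)
print(np.round(centers, 2))
```

The two steps inside the loop are the whole algorithm: assign every point to its nearest center, then move each center to the mean of its points, and repeat until nothing changes.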
Risk and Uncertainties Expected
The analysed data can change at any time because customers' interests change with
their environment, so it is often risky to stick to the same model and methods. New
approaches and technology have to be adopted and the tools updated to keep users
satisfied. Promotion problems and heavy investments can also carry a lot of risk,
and stock and storage problems need to be accounted for.
Contingency Plan
In order to complete this project in a short period of time, we can add new team
members and assign them additional work to complete the task, with additional
salaries provided for them. We can also exclude some old features from the project
in order to complete it on time.
Project Management Software Used
Jupyter software
Jupyter is a project with goals to develop open-source software, open standards, and
services for interactive computing across multiple programming languages.
Safety and Security Measures and Facilities Your Project Provides
We have developed the software in such a way that the data will not get corrupted,
and we have also installed an antivirus system to protect our systems against
malware. Workers are given full security, such as a cab for travelling to the office
and back home.
Outcomes / Takeaway / Knowledge / Skills You Gained, Acquired or
Learnt From This Project
We learned a lot about ML, customer segmentation, various objects,
and analysing customer data and preferences.
Conclusion
It’s not wise to serve all customers with the same product model, email, text message
campaign, or ad. Customers have different needs. A one-size-fits-all approach to
business will generally result in less engagement, lower click-through rates, and
ultimately fewer sales. Customer segmentation is the cure for this problem.
Finding an optimal number of unique customer groups will help you understand how
your customers differ, and help you give them exactly what they want. Customer
segmentation improves customer experience and boosts company revenue. That’s
why segmentation is a must if you want to surpass your competitors and get more
customers. Doing it with machine learning is definitely the right way to go.