Zara
2.a) Businesses can use “Clustering algorithms” for market segmentation. Explain the concept of clustering in data analytics and identify two more potential business use cases of clustering. (3 marks)
- self descriptive
Introduction
When a population or set of data points is clustered, the data points in the same group are
more similar to one another than the data points in different groups. To put it another way,
the goal is to sort people into groups based on shared characteristics.
Let's use an example to make this concrete. If you are the manager of a rental store, you may want to learn more about your customers' tastes in order to grow your business. Could you look at the specifics of every individual customer and devise a customized business plan for each one? Certainly not. But you can group your customers into, say, 10 categories and design a separate strategy for the customers in each category. This is what "clustering" means.
2. Types of Clustering
Broadly speaking, clustering can be divided into two subgroups:
Hard Clustering: In hard clustering, each data point either belongs to a cluster completely or not at all. In the example above, each customer is put into exactly one of the 10 groups.
Soft Clustering: In soft clustering, instead of placing each data point into a single cluster, a probability or likelihood of that data point belonging to each cluster is assigned. In the same scenario, each customer is assigned a probability of belonging to each of the store's 10 clusters.
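As a small sketch of this difference, scikit-learn's KMeans produces hard labels while a Gaussian mixture model gives soft cluster-membership probabilities. The customer data below is randomly generated purely for illustration:

# Hypothetical sketch: hard vs soft cluster assignments on random "customer" data
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(42)
X = rng.rand(100, 2)              # 100 made-up customers described by 2 features

# Hard clustering: each customer gets exactly one of the 10 cluster labels
hard_labels = KMeans(n_clusters=10, random_state=42, n_init=10).fit_predict(X)

# Soft clustering: each customer gets a probability for each of the 10 clusters
soft_probs = GaussianMixture(n_components=10, random_state=42).fit(X).predict_proba(X)
print(hard_labels[:5])            # one label per customer
print(soft_probs[:5].round(2))    # one row of 10 probabilities per customer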
3. Types of clustering algorithms
Since the task of clustering is subjective, there are many ways to achieve it. Every methodology follows a different set of rules for defining 'similarity' among data points. In fact, more than 100 clustering algorithms are known, but only a few are used popularly. Let's look at them in detail:
Connectivity models: These are built on the idea that data points closer together in data space are more similar to each other than data points further apart. There are two ways to approach these models. In the first, all data points start in separate clusters, which are then aggregated as the distance between them decreases. In the second, all data points start in a single cluster, which is then partitioned as the distance between them grows. The choice of distance function is subjective. While these models are easy to interpret, they do not scale well to large datasets. Hierarchical clustering and its variants are examples of these models.
Centroid models: These are iterative clustering algorithms in which similarity is derived from the closeness of a data point to the centroid of a cluster. The K-Means clustering algorithm is the best-known example. Because the number of clusters needed at the end must be specified in advance, prior knowledge of the dataset is important. These models run iteratively in search of a good solution.
Distribution models: These clustering models are based on how likely it is that all data points in a cluster belong to the same distribution (for example, a Normal or Gaussian distribution). Models of this type often suffer from overfitting. A well-known example is the Expectation-Maximization algorithm, which uses multivariate normal distributions.
Density models: These models search the data space for regions of varying density of data points. They isolate the different density regions and assign the data points within the same region to the same cluster. Popular examples of density models are DBSCAN and OPTICS.
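As an illustrative sketch (not part of the assignment dataset), DBSCAN can separate two dense, non-spherical regions like this:

# Illustrative sketch: DBSCAN groups points that lie in dense regions
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)  # two dense, crescent-shaped regions
labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)
print(set(labels))   # cluster ids; -1 marks points treated as noise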
4. K Means Clustering
K means is an iterative clustering algorithm that converges to a locally optimal partition (a local optimum of the within-cluster sum of squares). This algorithm works in these 5 steps:
1. Specify the desired number of clusters K: let us choose K = 2 for these 5 data points in 2-D space.
2. Randomly assign each data point to a cluster: let's assign three points to cluster 1, shown in red, and two points to cluster 2, shown in grey.
3. Compute the cluster centroids: the centroid of the data points in the red cluster is shown by a red cross and that of the grey cluster by a grey cross.
4. Re-assign each point to the closest cluster centroid: note that the data point at the bottom, although initially in the red cluster, is closer to the centroid of the grey cluster. Thus, we re-assign that data point to the grey cluster.
5. Re-compute the cluster centroids: finally, re-compute the centroids for both clusters, and repeat steps 4 and 5 until no further re-assignment happens.
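A minimal sketch of these five steps using scikit-learn's KMeans; the exact coordinates of the 5 points are not reproduced in the report, so the values below are assumptions:

# Sketch: K = 2 clusters on five made-up 2-D points
import numpy as np
from sklearn.cluster import KMeans

points = np.array([[1, 1], [1.5, 2], [3, 4], [5, 7], [3.5, 5]])  # 5 hypothetical points in 2-D

# Step 1: choose K = 2; steps 2-5 (assign, compute centroids, re-assign, re-compute)
# are carried out internally by the iterative fit
km = KMeans(n_clusters=2, random_state=0, n_init=10).fit(points)
print(km.labels_)           # final cluster assignment of each point
print(km.cluster_centers_)  # final centroids after convergence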
5. Hierarchical Clustering
As the name suggests, hierarchical clustering is an algorithm that creates a hierarchy of
clusters. To begin, each data point is assigned to its own cluster in this algorithm. The two
clusters that are the closest to each other are then combined into a single cluster. When there
is only one cluster left, this method comes to an end.
Initially, we have 25 data points, each of which is assigned to a distinct cluster. The two
clusters that are the closest to each other are then combined to form a single cluster at the top.
The distance between two clusters in the data space is represented by the height of the
clusters that are merged in the dendrogram.
The dendrogram can be used to determine the number of clusters that best represent the
various groups. The optimal number of clusters is the number of vertical lines in the
dendrogram cut by a horizontal line that may cross the maximum vertical distance without
overlapping a cluster.
The red horizontal line in the dendrogram below covers the maximum vertical distance AB in
the example above, hence the optimal number of clusters is 4.
Two important things that you should know about hierarchical clustering are:
This algorithm has been implemented above using bottom up approach. It is also possible to
follow top-down approach starting with all data points assigned in the same cluster and
recursively performing splits till each data point is assigned a separate cluster.
The decision of merging two clusters is taken on the basis of closeness of these clusters.
There are multiple metrics for deciding the closeness of two clusters :
Euclidean distance: ||a-b||₂ = √(Σᵢ (aᵢ - bᵢ)²)
Squared Euclidean distance: ||a-b||₂² = Σᵢ (aᵢ - bᵢ)²
Manhattan distance: ||a-b||₁ = Σᵢ |aᵢ - bᵢ|
Maximum distance: ||a-b||∞ = maxᵢ |aᵢ - bᵢ|
Mahalanobis distance: √((a-b)ᵀ S⁻¹ (a-b)), where S is the covariance matrix
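As a quick sketch, these closeness metrics can be computed with SciPy for two example vectors; the vectors and the identity inverse-covariance matrix used for Mahalanobis are made up:

# Sketch: the distance metrics above for two example vectors a and b
import numpy as np
from scipy.spatial import distance

a, b = np.array([1.0, 2.0, 3.0]), np.array([2.0, 4.0, 6.0])
print(distance.euclidean(a, b))      # Euclidean distance
print(distance.sqeuclidean(a, b))    # squared Euclidean distance
print(distance.cityblock(a, b))      # Manhattan distance
print(distance.chebyshev(a, b))      # maximum (Chebyshev) distance
# Mahalanobis needs the inverse covariance matrix S^-1 of the data; identity is a placeholder here
print(distance.mahalanobis(a, b, np.eye(3)))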
6. Difference between K Means and Hierarchical clustering
Hierarchical clustering cannot handle big data well, but K Means clustering can. This is because the time complexity of K Means is linear, i.e. O(n), while that of hierarchical clustering is quadratic, i.e. O(n²).
In K Means clustering, since we start with a random choice of clusters, the results produced by running the algorithm multiple times might differ, whereas results are reproducible in hierarchical clustering.
K Means is found to work well when the shape of the clusters is hyper-spherical (like a circle in 2-D or a sphere in 3-D).
K Means clustering requires prior knowledge of K, i.e. the number of clusters you want to divide your data into. In hierarchical clustering, by contrast, you can stop at whatever number of clusters you find appropriate by interpreting the dendrogram.
7. Applications of Clustering
Clustering has a large no. of applications spread across various domains. Some of the most
popular applications of clustering are:
Recommendation engines
Market segmentation
Social network analysis
Search result grouping
Medical imaging
Image segmentation
Anomaly detection
library('randomForest')
library('Metrics')
# set random seed
set.seed(101)
# loading dataset
data<-read.csv("train.csv",stringsAsFactors= T)
# checking dimensions of data
dim(data)
## [1] 3000 101
# specifying outcome variable as factor
data$Y<-as.factor(data$Y)
# dividing the dataset into train and test
train<-data[1:2000,]
test<-data[2001:3000,]
# applying randomForest
model_rf<-randomForest(Y~.,data=train)
preds<-predict(object=model_rf,test[,-101])
table(preds)
## preds
## -1 1
## 453 547
# checking model performance (AUC)
auc(preds,test$Y)
## [1] 0.4522703
So, the AUC we get is 0.45. Now let's create five clusters based on the values of the independent variables using k-means clustering and reapply random forest.
all<-rbind(train,test)
preds2<-predict(object=model_rf,test[,-101])
table(preds2)
## preds2
## -1 1
## 548 452
auc(preds2,test$Y)
## [1] 0.5345908
In the above example, even though the final performance is still modest, clustering has given our model a significant boost, from an AUC of 0.45 to slightly above 0.53.
This shows that clustering can indeed be helpful for supervised machine learning tasks.
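The R snippets above report the before-and-after scores; as a hedged sketch, the same idea (derive k-means clusters from the independent variables and add the cluster id as an extra feature before re-training) could be expressed in Python with scikit-learn. All data and variable names below are synthetic placeholders, not the assignment dataset:

# Sketch: a cluster id as an engineered feature for a random forest classifier
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=3000, n_features=100, random_state=101)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1000, random_state=101)

# derive 5 clusters from the independent variables and append the cluster id as a new column
km = KMeans(n_clusters=5, random_state=101, n_init=10).fit(np.vstack([X_train, X_test]))
X_train_c = np.column_stack([X_train, km.predict(X_train)])
X_test_c = np.column_stack([X_test, km.predict(X_test)])

rf = RandomForestClassifier(random_state=101).fit(X_train_c, y_train)
print(roc_auc_score(y_test, rf.predict_proba(X_test_c)[:, 1]))  # AUC with the cluster feature added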
End Notes
In this article, we have discussed the various ways of performing clustering. It finds applications for unsupervised learning in a large number of domains. You also saw how you can improve the accuracy of a supervised machine learning algorithm by using clustering. Although clustering is easy to implement, you need to take care of some important aspects, such as treating outliers in your data and making sure each cluster has a sufficient population.
2.b) Compare and contrast hierarchical clustering with k-means clustering
- self descriptive
k-means is a cluster analysis approach that uses a predetermined number of clusters; the value of 'K' must be specified in advance.
Hierarchical clustering, also known as hierarchical cluster analysis (HCA), is a method of cluster analysis that attempts to build a hierarchy of clusters without a predefined number of clusters.
Main differences between K means and Hierarchical Clustering are:
K-means clustering: To locate mutually exclusive clusters of spherical shape, the approach uses a pre-specified number of clusters.
Hierarchical clustering: Hierarchical methods can be either divisive or agglomerative.

K-means clustering: It is necessary to have an idea of the number of clusters you want to divide your data into in order to perform the clustering.
Hierarchical clustering: Using the dendrogram to analyse the clusters, one can stop the hierarchical clustering process at any number of clusters that seems acceptable.

K-means clustering: One can use the median or the mean as a cluster centre to represent each cluster.
Hierarchical clustering: Agglomerative methods begin with 'n' clusters and merge them gradually until only one cluster remains.

K-means clustering: For really large datasets, the method is more efficient in terms of computation.
Hierarchical clustering: A divisive method works in the other direction, starting with one large cluster that is split step by step. Hierarchical approaches are extremely beneficial when the goal is to arrange the clusters in a natural order.

K-means clustering: Since we start with a random choice of clusters, the results produced by running the algorithm many times may differ.
Hierarchical clustering: In hierarchical clustering, the results are reproducible.

K-means clustering: K-means is simply a division of the set of data objects into non-overlapping subsets (clusters) such that each data object is in exactly one subset.
Hierarchical clustering: A hierarchical clustering is a set of nested clusters arranged as a tree.

K-means clustering: K-means works best when the clusters are hyper-spherical in structure (like a circle in 2-D or a sphere in 3-D).
Hierarchical clustering: Hierarchical clustering does not work as well as k-means when the shape of the clusters is hyper-spherical.
Data Preprocessing
Data preprocessing is the process of transforming raw data into an understandable format. It
is also an important step in data mining as we cannot work with raw data. The quality of the
data should be checked before applying machine learning or data mining algorithms.
Why is data preprocessing important?
Preprocessing of data is mainly about ensuring data quality and preparing the data so that it can be analysed efficiently. The main tasks involved include the following:
Dimensionality reduction: This process is necessary for real-world applications because the data size is big. It reduces the number of random variables or attributes so that the dimensionality of the dataset is lowered, combining and merging attributes without losing the data's original characteristics. This also reduces storage space and computation time. When data is highly dimensional, a problem called the "Curse of Dimensionality" occurs.
Numerosity reduction: In this method, the representation of the data is made smaller by reducing its volume, without any loss of data.
Data compression: Representing the data in a compressed form is called data compression. The compression can be lossless or lossy: when there is no loss of information during compression it is called lossless compression, whereas lossy compression reduces information but removes only the unnecessary information.
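As an illustrative sketch of dimensionality reduction (not tied to the assignment dataset), PCA can combine many attributes into a handful of components:

# Sketch: reduce 50 attributes to 10 principal components
import numpy as np
from sklearn.decomposition import PCA

X = np.random.RandomState(0).rand(200, 50)          # 200 rows with 50 attributes (made up)
X_reduced = PCA(n_components=10).fit_transform(X)   # keep 10 combined components
print(X.shape, X_reduced.shape)                     # (200, 50) -> (200, 10)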
Data Transformation:
The change made in the format or the structure of the data is called data transformation. This
step can be simple or complex based on the requirements. There are some methods in data
transformation.
Smoothing: With the help of algorithms, we can remove noise from the dataset, which helps bring out its important features. Smoothing also makes it easier to detect even small changes that help in prediction.
Aggregation: In this method, the data is stored and presented in the form of a summary. Data coming from multiple sources is integrated for the data analysis description. This is an important step, since the accuracy of the data depends on the quantity and quality of the data; when both are good, the results are more relevant.
Discretization: The continuous data here is split into intervals. Discretization reduces the
data size. For example, rather than specifying the class time, we can set an interval like (3
pm-5 pm, 6 pm-8 pm).
Normalization: This is the method of scaling the data so that it can be represented in a smaller range, for example from -1.0 to 1.0.
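A tiny sketch of min-max normalization into the range -1.0 to 1.0, using made-up values:

# Sketch: scale a column into [-1, 1]
import numpy as np

x = np.array([120.0, 250.0, 75.0, 310.0])                 # hypothetical raw values
x_scaled = 2 * (x - x.min()) / (x.max() - x.min()) - 1    # min-max normalization to [-1, 1]
print(x_scaled)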
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
# load the raw dataset
dataset = pd.read_csv('Datasets.csv')
print(dataset)
# st_x is a StandardScaler fitted on the training features before transforming the test set
st_x = StandardScaler()
x_train = st_x.fit_transform(x_train)
x_test = st_x.transform(x_test)
2.d) Perform hierarchical clustering on the given dataset and produce a dendrogram
- Provide relevant well commented code snippets, details of used package(s) and method(s) along with appropriate visualisation in your report.
Introduction
In any business, it is critical to have a grasp on how customers behave. In the last year, my
chief marketing officer approached me and said, "Can you tell me which current customers
should we target for our new product?"
For me, that was a steep learning curve. As a data scientist, I rapidly learned the importance
of segmenting clients in order to adapt and design targeted strategies for my firm. Using
clustering proved to be a lifesaver here!
Problems like client segmentation can be surprisingly difficult to solve because we don't have a specific target in mind. This is a different kind of learning, in which we must discover patterns and structures without any predetermined goal. As a data scientist, it's both tough and exhilarating.
Table of Contents
Supervised vs Unsupervised Learning
Why Hierarchical Clustering?
What is Hierarchical Clustering?
Types of Hierarchical Clustering
Agglomerative Hierarchical Clustering
Divisive Hierarchical Clustering
Steps to perform Hierarchical Clustering
How to Choose the Number of Clusters in Hierarchical Clustering?
Solving a Wholesale Customer Segmentation Problem using Hierarchical Clustering
The dependent or target variable is y, and the independent variables are represented by X.
The target variable is referred to as a dependent variable because it is linked to X. The term
"supervised learning" refers to the fact that we train our model using the independent
variables under the supervision of the target variable.
Our goal when training the model is to create a function that maps the independent variables to the desired target. Once the model has been trained, new sets of observations can be fed into it, and the model will predict the target. In a nutshell, this is supervised learning.
There may be times when we don't know what we're trying to forecast. The term
"unsupervised learning" refers to situations in which there is no clear target variable.
Problems like these solely deal with independent variables and no dependent factors.
We try to divide the entire data into a set of groups in these cases. These groups are known as
clusters and the process of making these clusters is known as clustering.
But there are certain challenges with K-means. It always tries to make clusters of the same
size. Also, we have to decide the number of clusters at the beginning of the algorithm.
Ideally, we would not know how many clusters we should have at the beginning of the algorithm, and hence this is a challenge with K-means.
Suppose we have the points below and we want to cluster them into groups:
Now, based on the similarity of these clusters, we can combine the most similar clusters
together and repeat this process until only a single cluster is left:
We are essentially building a hierarchy of clusters. That’s why this algorithm is called
hierarchical clustering. I will discuss how to decide the number of clusters in a later section.
For now, let’s look at the different types of hierarchical clustering.
Then, at each iteration, we merge the closest pair of clusters and repeat this step until only a
single cluster is left:
We are merging (or adding) the clusters at each step, right? Hence, this type of clustering is
also known as additive hierarchical clustering.
Divisive Hierarchical Clustering
Divisive hierarchical clustering, on the other hand, works the other way around. Instead of starting with n separate clusters, we start with a single cluster and assign all the points to it.
Then, at each iteration, we split off the farthest point in the cluster and repeat this process until each cluster contains only a single point:
Steps to Perform Hierarchical Clustering
Hierarchical clustering combines the most comparable points or clusters, and we are well
aware of this. Now, how do we determine which points are similar and which are different?
It's one of the most pressing issues in clustering!
To calculate similarity, we take the distance between the clusters' centroids; the points or clusters with the smallest distance between them are considered the most similar and are merged. This is therefore a distance-based algorithm (since we are calculating the distances between the clusters).
A proximity matrix is a key concept in hierarchical clustering. The distances between the
points are saved here. This matrix and the processes to execute hierarchical clustering can be
better understood with an example.
Let's say a teacher wishes to create separate groups for her students. Each student's grade on
an assignment has been tabulated, and she'd like to divide the students into groups depending
on their grades. There's no set number of groups to form here. A supervised learning problem
cannot be solved since the teacher does not know which pupils should be assigned to which
group. As a result, we'll try to perform hierarchical clustering and divide the pupils into
several subgroups.
Let’s take a sample of 5 students:
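The marks table and proximity matrices are shown only as figures in the original write-up, so the sketch below uses made-up marks for the 5 students and SciPy to build the proximity (distance) matrix and the dendrogram:

# Sketch: proximity matrix and dendrogram for 5 hypothetical assignment marks
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster import hierarchy
import matplotlib.pyplot as plt

marks = np.array([[10], [7], [28], [20], [35]])                 # assumed single-assignment scores
print(squareform(pdist(marks)))                                 # 5x5 proximity matrix of pairwise distances
hierarchy.dendrogram(hierarchy.linkage(marks, method='ward'))   # merge the closest clusters step by step
plt.show()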
On the x-axis, we have the dataset's samples, and on the y-axis, we have the distance. A join
will be made in this dendrogram every time two clusters are merged, and the height of that
join will be proportional to how far apart these points are. Let's create a dendrogram for the
sake of demonstration:
Spend a few seconds contemplating the dendrogram. Samples 1 and 2 were merged first, and the distance between them was 3 (refer to the first proximity matrix in the previous section).
Consider the following data and plot it in a dendrogram:
As you can see, sample 1 and sample 2 are merged here. The distance between these samples is represented by the height of the vertical line joining them on the graph. Dendrograms can be created by plotting each step in which groups of data are combined.
The stages of hierarchical clustering are easily discernible from the dendrogram: the further apart the merged clusters are, the longer the vertical lines joining them.
We can now set a threshold distance and draw a horizontal line (generally, we try to set the threshold so that it cuts the tallest vertical line). Let's set the threshold at 12 and draw a horizontal line through it:
The number of clusters formed is the number of vertical lines intersected by this horizontal line. In the example above, the red line intersects two vertical lines, so there will be two clusters: one cluster will contain samples (1, 2, 4) and the other will contain samples (3, 5).
Hierarchical Clustering uses a dendrogram to determine the number of clusters. In the
following section, we'll use hierarchical clustering to help you better comprehend the topics
we've covered in this post, so stay tuned!
The x-axis contains the samples and y-axis represents the distance between these samples. The
vertical line with maximum distance is the blue line and hence we can decide a threshold of 6 and cut
the dendrogram:
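The variable data_scaled used in the snippets below is not created anywhere in the code shown; presumably the wholesale customers file was read and normalised along these lines (the file name and the use of sklearn's normalize are assumptions), after which the dendrogram code follows:

# Assumed preparation step for the wholesale customer segmentation problem
import pandas as pd
import scipy.cluster.hierarchy as shc
import matplotlib.pyplot as plt
from sklearn.preprocessing import normalize

data = pd.read_csv('Wholesale customers data.csv')                   # assumed file name
data_scaled = pd.DataFrame(normalize(data), columns=data.columns)    # scale rows so no single column dominates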
plt.figure(figsize=(10, 7))
plt.title("Dendrograms")
# build the linkage matrix with Ward's method and draw the dendrogram
dend = shc.dendrogram(shc.linkage(data_scaled, method='ward'))
# horizontal line at the chosen threshold of 6, which cuts two vertical lines
plt.axhline(y=6, color='r', linestyle='--')
# apply agglomerative clustering with the 2 clusters read off the dendrogram
from sklearn.cluster import AgglomerativeClustering
cluster = AgglomerativeClustering(n_clusters=2, affinity='euclidean', linkage='ward')
cluster.fit_predict(data_scaled)
We can see the values of 0s and 1s in the output since we defined 2 clusters. 0 represents the
points that belong to the first cluster and 1 represents points in the second cluster. Let’s now
visualize the two clusters:
plt.figure(figsize=(10, 7))
plt.scatter(data_scaled['Milk'], data_scaled['Grocery'], c=cluster.labels_)
2.e) Using the cluster dendrogram output of task 2.c and any other suitable algorithmic procedure(s), argue what is the ideal number of clusters to be made.
- Provide relevant well commented code snippets, details of used package(s) and method(s) along with appropriate visualisation and/or output in your report.
A fundamental step for any unsupervised algorithm is to determine the optimal number of
clusters into which the data may be clustered. The Elbow Method is one of the most popular
methods to determine this optimal value of k.
We now demonstrate the given method using the K-Means clustering technique using the
Sklearn library of python.
Step 1: Importing the required libraries
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

Step 2: Building the models and computing distortion and inertia for each k (X is the feature matrix of the dataset being clustered)
distortions = []
inertias = []
mapping1 = {}
mapping2 = {}
K = range(1, 10)
for k in K:
    # Building and fitting the model
    kmeanModel = KMeans(n_clusters=k).fit(X)
    # distortion: mean distance of each point to its nearest cluster centre
    distortions.append(sum(np.min(cdist(X, kmeanModel.cluster_centers_, 'euclidean'), axis=1)) / X.shape[0])
    # inertia: within-cluster sum of squared distances reported by the model
    inertias.append(kmeanModel.inertia_)
    mapping1[k] = distortions[-1]
    mapping2[k] = inertias[-1]
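To visualise the elbow, the distortion values collected in the loop above can be plotted against K (matplotlib is the only additional package assumed):

import matplotlib.pyplot as plt

# plot distortion against the number of clusters; the 'elbow' marks the optimal K
plt.plot(K, distortions, 'bx-')
plt.xlabel('Values of K')
plt.ylabel('Distortion')
plt.title('The Elbow Method using Distortion')
plt.show()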
To identify the best number of clusters, we choose the "elbow" value of k, i.e. the point after which the distortion and inertia start decreasing in a roughly linear fashion. Based on the provided data, we can therefore say that the ideal number of clusters is 3.
Variations in the value of k result in the following clustered data (plots shown for k = 1, 2, 3 and 4).
2.f) Suppose your marketing department requires you to segment the given dataset into 4 clusters, perform the clustering as required and produce a segment profile for each cluster with appropriate visualisations and descriptions.
- Provide all important and well commented code snippets, column averages for each segment, appropriate visualisation(s) in your report and write a segment profile (for all 4 segments) in clear business language. State any underlying assumptions made towards writing these segment profiles.
The separation of a market into distinct client groups with comparable characteristics is
known as customer segmentation. Unmet client needs can be discovered through the use of
customer segmentation. With the information provided above, businesses may create
products and services that stand out from the crowd.
Customer groups can be divided in a variety of ways, with the most prevalent being:
Demographic information, such as gender, age, familial and marital status, income,
education, and occupation.
Geographical information, which differs depending on the scope of the company. For
localized businesses, this info might pertain to specific towns or counties. For larger
companies, it might mean a customer’s city, state, or even country of residence.
Psychographics, such as social class, lifestyle, and personality traits.
Behavioral data, such as spending and consumption habits, product/service usage, and
desired benefits.
Advantages of Customer Segmentation
Determine appropriate product pricing.
Develop customized marketing campaigns.
Design an optimal distribution strategy.
Choose specific product features for deployment.
Prioritize new product development efforts.
The Challenge
As the owner of a grocery mall, you are privy to basic information on your clients, such as
their Customer ID (as well as their gender, annual income, and spending score). For the
marketing team, you want to obtain a sense of who your target clients are so that they can
plan accordingly.
K Means Clustering Algorithm
1. Specify the number of clusters K.
2. Initialize centroids by first shuffling the dataset and then randomly selecting K data points for the centroids without replacement.
3. Assign each data point to its closest centroid and re-compute each centroid as the mean of the points assigned to it.
4. Keep iterating until there is no change to the centroids, i.e. the assignment of data points to clusters isn't changing.
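Below is a minimal from-scratch sketch of these steps, for illustration only; the analysis later in this report uses scikit-learn's KMeans, and empty clusters are not handled here:

# Sketch: the K-means loop written out explicitly
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.RandomState(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]   # random initial centroids, chosen without replacement
    for _ in range(n_iter):
        # assign every point to its closest centroid
        labels = np.argmin(np.linalg.norm(X[:, None] - centroids[None, :], axis=2), axis=1)
        # re-compute each centroid as the mean of its assigned points (empty clusters not handled in this sketch)
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):          # stop when the centroids no longer change
            break
        centroids = new_centroids
    return labels, centroids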
Data
This project is a part of the Mall Customer Segmentation Data competition held on Kaggle.
The dataset can be downloaded from the kaggle website which can be found here.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv("../input/customer-segmentation-tutorial-in-python/Mall_Customers.csv")
df.head()
I dropped the CustomerID column, as it does not seem relevant to the context, and plotted the age distribution of the customers.
df.drop(["CustomerID"], axis = 1, inplace=True)
plt.figure(figsize=(10,6))
plt.title("Ages Frequency")
sns.axes_style("dark")
sns.violinplot(y=df["Age"])
plt.show()
Next, I drew box plots of the spending score and annual income to better visualize their distribution ranges. The range of the spending score is clearly wider than the annual income range.
plt.figure(figsize=(15,6))
plt.subplot(1,2,1)
sns.boxplot(y=df["Spending Score (1-100)"], color="red")
plt.subplot(1,2,2)
sns.boxplot(y=df["Annual Income (k$)"])
plt.show()
I then checked the male and female population distribution by drawing a bar graph. There are noticeably more women than men in the dataset.
genders = df.Gender.value_counts()
sns.set_style("darkgrid")
plt.figure(figsize=(10,4))
sns.barplot(x=genders.index, y=genders.values)
plt.show()
I made a bar plot to check the distribution of the number of customers in each age group. Clearly the 26-35 age group outweighs every other age group.
age18_25 = df.Age[(df.Age <= 25) & (df.Age >= 18)]
age26_35 = df.Age[(df.Age <= 35) & (df.Age >= 26)]
age36_45 = df.Age[(df.Age <= 45) & (df.Age >= 36)]
age46_55 = df.Age[(df.Age <= 55) & (df.Age >= 46)]
age55above = df.Age[df.Age >= 56]
x = ["18-25","26-35","36-45","46-55","55+"]
y=[len(age18_25.values),len(age26_35.values),len(age36_45.values),len(age46_55.values),len(age55above.values)]
plt.figure(figsize=(15,6))
sns.barplot(x=x, y=y, palette="rocket")
plt.title("Number of Customer and Ages")
plt.xlabel("Age")
plt.ylabel("Number of Customer")
plt.show()
I made a bar graph to show how many clients fall into each spending-score band. The largest group of our clients has a spending score between 41 and 60.
ss1_20 = df["Spending Score (1-100)"][(df["Spending Score (1-100)"] >= 1) &
(df["Spending Score (1-100)"] <= 20)]
ss21_40 = df["Spending Score (1-100)"][(df["Spending Score (1-100)"] >= 21) &
(df["Spending Score (1-100)"] <= 40)]
ss41_60 = df["Spending Score (1-100)"][(df["Spending Score (1-100)"] >= 41) &
(df["Spending Score (1-100)"] <= 60)]
ss61_80 = df["Spending Score (1-100)"][(df["Spending Score (1-100)"] >= 61) &
(df["Spending Score (1-100)"] <= 80)]
ss81_100 = df["Spending Score (1-100)"][(df["Spending Score (1-100)"] >= 81) &
(df["Spending Score (1-100)"] <= 100)]
ssx = ["1-20", "21-40", "41-60", "61-80", "81-100"]
ssy = [len(ss1_20.values), len(ss21_40.values), len(ss41_60.values), len(ss61_80.values),
len(ss81_100.values)]
plt.figure(figsize=(15,6))
sns.barplot(x=ssx, y=ssy, palette="nipy_spectral_r")
plt.title("Spending Scores")
plt.xlabel("Score")
plt.ylabel("Number of Customer Having the Score")
plt.show()
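The lists aix and aiy used in the next snippet are not defined in the code shown; a plausible reconstruction, with the annual income bin edges (in k$) being an assumption, is:

# Assumed annual-income bins, mirroring the spending-score bins above
ai0_30 = df["Annual Income (k$)"][(df["Annual Income (k$)"] >= 0) & (df["Annual Income (k$)"] <= 30)]
ai31_60 = df["Annual Income (k$)"][(df["Annual Income (k$)"] >= 31) & (df["Annual Income (k$)"] <= 60)]
ai61_90 = df["Annual Income (k$)"][(df["Annual Income (k$)"] >= 61) & (df["Annual Income (k$)"] <= 90)]
ai91_120 = df["Annual Income (k$)"][(df["Annual Income (k$)"] >= 91) & (df["Annual Income (k$)"] <= 120)]
ai121_150 = df["Annual Income (k$)"][(df["Annual Income (k$)"] >= 121) & (df["Annual Income (k$)"] <= 150)]
aix = ["0-30", "31-60", "61-90", "91-120", "121-150"]
aiy = [len(ai0_30.values), len(ai31_60.values), len(ai61_90.values), len(ai91_120.values), len(ai121_150.values)]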
plt.figure(figsize=(15,6))
sns.barplot(x=aix, y=aiy, palette="Set2")
plt.title("Annual Incomes")
plt.xlabel("Income")
plt.ylabel("Number of Customer")
plt.show()
I plotted the Within-Cluster Sum of Squares (WCSS) against the number of clusters (the K value) to figure out the optimal number of clusters. WCSS measures the sum of squared distances of observations from their cluster centroids, WCSS = Σᵢ (Xᵢ - Yᵢ)², where Yᵢ is the centroid for observation Xᵢ. Increasing the number of clusters always lowers WCSS; in the limiting case each data point becomes its own cluster centroid and WCSS reaches zero, so we look for the point where adding more clusters stops giving a substantial reduction.
The Elbow Method
The steps can be summarized as follows: run K-means for a range of K values, record the WCSS for each K, plot WCSS against K, and pick the K at the "elbow" of the curve, i.e. the point after which the WCSS stops dropping sharply.
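A short sketch of how this could be computed on the mall dataset; KMeans is imported from sklearn.cluster, while plt and df are the objects already created above:

from sklearn.cluster import KMeans

# compute WCSS (inertia) for k = 1..10 on the numeric columns Age, Annual Income and Spending Score
wcss = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(df.iloc[:, 1:])
    wcss.append(km.inertia_)
plt.plot(range(1, 11), wcss, 'bx-')
plt.xlabel("Number of clusters (K)")
plt.ylabel("WCSS")
plt.show()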
Finally, I produced a 3D plot of the customers' age, annual income and spending score to show how the segments separate. In the 3D plot, the data points are divided into five categories, each of which is represented by a different colour.
from sklearn.cluster import KMeans
from mpl_toolkits.mplot3d import Axes3D

# fit K-means with 5 clusters on Age, Annual Income and Spending Score and store the labels
km = KMeans(n_clusters=5)
clusters = km.fit_predict(df.iloc[:,1:])
df["label"] = clusters

# 3D scatter plot of the segments, one colour per cluster label
fig = plt.figure(figsize=(20,10))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(df.Age[df.label == 0], df["Annual Income (k$)"][df.label == 0], df["Spending Score (1-100)"][df.label == 0], c='blue', s=60)
ax.scatter(df.Age[df.label == 1], df["Annual Income (k$)"][df.label == 1], df["Spending Score (1-100)"][df.label == 1], c='red', s=60)
ax.scatter(df.Age[df.label == 2], df["Annual Income (k$)"][df.label == 2], df["Spending Score (1-100)"][df.label == 2], c='green', s=60)
ax.scatter(df.Age[df.label == 3], df["Annual Income (k$)"][df.label == 3], df["Spending Score (1-100)"][df.label == 3], c='orange', s=60)
ax.scatter(df.Age[df.label == 4], df["Annual Income (k$)"][df.label == 4], df["Spending Score (1-100)"][df.label == 4], c='purple', s=60)
ax.view_init(30, 185)
plt.xlabel("Age")
plt.ylabel("Annual Income (k$)")
ax.set_zlabel('Spending Score (1-100)')
plt.show()
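The task brief also asks for column averages for each segment; with the label column created above, they could be produced along these lines:

# mean of each numeric column per cluster, plus the size of every segment
print(df.groupby("label")[["Age", "Annual Income (k$)", "Spending Score (1-100)"]].mean())
print(df["label"].value_counts())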
Results
Conclusions
One of the most commonly used clustering methods, K-means clustering (K-means), is
frequently the initial step in addressing clustering assignments. The purpose of K means is to
separate data points into discrete, non-overlapping clusters for subsequent analysis and
interpretation. One of the most common uses of K means clustering is customer
segmentation, which may be used to boost a company's profits by better knowing its
customers.