Lecture week 10
Unsupervised Learning: Clustering
Unsupervised Learning
Unsupervised learning differs from supervised learning in that no target variable exists.
All variables are treated as inputs, and therefore unsupervised learning is
used to find patterns in the data. This allows the data to be reduced and
segmented into its representative classes.
Unsupervised Learning
Clustering: Grouping similar data points together. For example, clustering
customers into different segments based on their purchasing behavior.
Dimensionality reduction: Reducing the number of variables in the data
while retaining essential information. Techniques like Principal Component
Analysis (PCA) are used for this purpose.
Anomaly detection: Identifying unusual or rare data points that do not
conform to the general pattern of the data.
Clustering
Cluster: a collection of data objects
Similar to one another within the same cluster
Dissimilar to the objects in other clusters
Cluster analysis
Finding similarities between data according to the characteristics found in the data
and grouping similar data objects into clusters
Cluster Analysis
• Finding groups of objects such that the objects in a
group will be similar (or related) to one another and
different from (or unrelated to) the objects in other
groups
Intra-cluster distances are minimized; inter-cluster distances are maximized.
Clustering
Unsupervised learning: no predefined classes
Typical applications
As a stand-alone tool to get insight into data distribution
As a preprocessing step for other algorithms
Applications of Clustering
Customer Segmentation in Marketing (helps marketers discover distinct groups in
their customer base, and then use this knowledge to develop targeted marketing
programs)
Image Compression
Recommendation Systems
Document Classification
Anomaly Detection in Fraud Detection
Clustering Applications
Land use
Identification of areas of similar
land use in an earth observation
database
Insurance
Identifying groups of motor
insurance policy holders with a
high average claim cost
Clustering Applications
City-planning
Identifying groups of houses
according to their house type,
value, and geographical location
Earthquake studies
Observed earthquake epicenters
should be clustered along
continent faults
Notion of a Cluster Can Be Ambiguous
Figure: the same set of points can plausibly be grouped into two, four, or six clusters; how many clusters there are is ambiguous.
Quality: What Is Good Clustering?
A good clustering method will produce high quality
clusters with
high intra-class similarity
low inter-class similarity
The quality of a clustering result depends on both the
similarity measure used by the method and its
implementation
The quality of a clustering method is also measured by
its ability to discover some or all of the hidden patterns
Types of Clustering
Types of Clustering
Partitional Clustering
A division of data objects into non-overlapping subsets (clusters)
such that each data object is in exactly one subset
Hierarchical clustering
A set of nested clusters organized as a hierarchical tree
Partitional Clustering
Figure: the original points and a partitional clustering of them.
Hierarchical Clustering
Figure: a traditional hierarchical clustering of points p1 to p4 with its dendrogram, and a non-traditional hierarchical clustering of the same points with its dendrogram.
Hierarchical clustering
Figure: points A to E grouped into a hierarchy of nested clusters.
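A minimal sketch of building such a nested clustering and its dendrogram in Python, assuming SciPy and Matplotlib are available (the five points and their A to E labels are illustrative, not the lecture's exact data):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# Five illustrative 2-D points; only the A-E labels echo the figure above,
# the coordinates themselves are made up.
X = np.array([[0.0, 0.0], [0.4, 0.3], [2.0, 1.0], [2.3, 1.2], [5.0, 4.0]])

Z = linkage(X, method='single')               # build the nested cluster tree
dendrogram(Z, labels=['A', 'B', 'C', 'D', 'E'])
plt.title('Dendrogram of nested clusters')
plt.show()
```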
Measure the Quality of Clustering
Dissimilarity/Similarity metric: Similarity is expressed in terms
of a distance function, typically metric: d(i, j)
There is a separate “quality” function that measures the
“goodness” of a cluster.
The definitions of distance functions are usually very different
for interval-scaled, boolean, categorical, ordinal, and ratio variables.
Weights should be associated with different variables based on
applications and data semantics.
It is hard to define “similar enough” or “good enough”
the answer is typically highly subjective.
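For interval-scaled attributes, the distance function d(i, j) is typically Euclidean or Manhattan distance; a small illustrative computation (the two points are made up):

```python
import numpy as np

# Two data objects described by interval-scaled attributes (values are illustrative)
i = np.array([1.0, 1.5])
j = np.array([3.0, 6.0])

d_euclidean = np.linalg.norm(i - j)   # sqrt((1-3)^2 + (1.5-6)^2) ~= 4.92
d_manhattan = np.sum(np.abs(i - j))   # |1-3| + |1.5-6| = 6.5
print(d_euclidean, d_manhattan)
```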
The K-Means Clustering Method
Given k, the k-means algorithm is implemented in
four steps:
Partition objects into k nonempty subsets
Compute seed points as the centroids of the
clusters of the current partition (the centroid is
the center, i.e., mean point, of the cluster)
Assign each object to the cluster with the nearest seed point
Go back to Step 2; stop when there are no new assignments
The K-Means Clustering Method (Cont'd)
Example (K = 2):
Figure: arbitrarily choose K objects as the initial cluster centers; assign each object to the most similar center; update the cluster means; reassign the objects and update the means again, repeating until the assignments no longer change.
The K-Means Algorithm
1. Choose a value for K, the total number of clusters to be
determined.
2. Choose K instances within the dataset at random. These are
the initial cluster centers.
3. Use simple Euclidean distance to assign the remaining
instances to their closest cluster center.
4. Use the instances in each cluster to calculate a new mean for each cluster.
5. If the new mean values are identical to the mean values of the previous iteration, the process terminates.
6. Otherwise, use the new means as cluster centers and repeat
steps 3-5.
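A minimal NumPy sketch of these six steps (the function and variable names are illustrative, and the code is not an optimized implementation):

```python
import numpy as np

def k_means(X, k, max_iters=100, seed=None):
    """Follow steps 1-6 above: random initial centers, assign, update, repeat."""
    rng = np.random.default_rng(seed)
    # Step 2: choose K instances from the data set at random as initial centers
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # Step 3: assign every instance to its closest center (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: recompute each cluster's mean (keep the old center if a cluster is empty)
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        # Step 5: stop when the means no longer change
        if np.allclose(new_centers, centers):
            break
        centers = new_centers                 # Step 6: repeat with the new means
    return centers, labels

# Example: two clusters on a small made-up data set
centers, labels = k_means(np.array([[1.0, 1.5], [1, 4.5], [2, 1.5],
                                    [2, 3.5], [3, 2.5], [3, 6.0]]), k=2, seed=0)
print(centers, labels)
```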
How to Initialize Centroids
Random Points in Feature Space
Random Points From Data Set
Look For Dense Regions of Space
Space them uniformly around the feature space
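As an illustration, scikit-learn's KMeans exposes random selection from the data set and the k-means++ scheme (which spreads the initial centroids apart) through its init parameter; the data below is a random placeholder:

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(100, 2))   # placeholder data

# init='random' picks K observations from the data set at random;
# init='k-means++' spreads the initial centroids apart before the first iteration.
km_random = KMeans(n_clusters=3, init='random', n_init=10, random_state=0).fit(X)
km_plus = KMeans(n_clusters=3, init='k-means++', n_init=10, random_state=0).fit(X)
print(km_random.inertia_, km_plus.inertia_)
```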
Working of the K-Means Algorithm

Instance #:   1     2     3     4     5     6
X:            1     1     2     2     3     3
Y:            1.5   4.5   1.5   3.5   2.5   6.0

Let's pick Instances #1 and #3 as the initial centroids.

              Distance to Centroid 1   Distance to Centroid 2
Instance #2   3.00                     3.16
Instance #4   2.24                     2.00
Instance #5   2.24                     1.41
Instance #6   4.92                     4.61

New centroids are (1, 3) and (2.5, 3.375).
Centroid Selection
Solution (1st iteration): K1 => instance #1 (1, 1.5), K2 => instance #3 (2, 1.5)

#   X   Y     Distance to C1   Distance to C2   Cluster Assigned
1   1   1.5   0                -                C1
2   1   4.5   3.00             3.16             C1
3   2   1.5   -                0                C2
4   2   3.5   2.24             2.00             C2
5   3   2.5   2.24             1.41             C2
6   3   6.0   4.92             4.61             C2

Clusters: C1 = {1, 2}, C2 = {3, 4, 5, 6}

Mean calculation for C1: X = (1 + 1)/2 = 1, Y = (1.5 + 4.5)/2 = 3, so new C1 = (1, 3)
Mean calculation for C2: X = (2 + 2 + 3 + 3)/4 = 2.5, Y = (1.5 + 3.5 + 2.5 + 6)/4 = 3.375, so new C2 = (2.5, 3.375)
Centroid Selection
Solution (2nd iteration): C1 = (1, 3), C2 = (2.5, 3.375)

#   X   Y     Distance to C1   Distance to C2   Cluster Assigned
1   1   1.5   1.50             2.40             C1
2   1   4.5   1.50             1.875            C1
3   2   1.5   1.802            1.94             C1
4   2   3.5   1.118            0.51             C2
5   3   2.5   2.06             1.007            C2
6   3   6.0   3.60             2.67             C2

Clusters: C1 = {1, 2, 3}, C2 = {4, 5, 6}

Mean calculation: new C1 = (1.33, 2.5), new C2 = (2.667, 4)
Centroid Selection
Solution (3rd iteration): C1 = (1.33, 2.5), C2 = (2.667, 4)

#   X   Y     Distance to C1   Distance to C2   Cluster Assigned
1   1   1.5   1.05             3.00             C1
2   1   4.5   2.02             1.74             C2
3   2   1.5   1.203            2.587            C1
4   2   3.5   1.203            0.833            C2
5   3   2.5   1.67             1.536            C2
6   3   6.0   3.87             2.02             C2

Clusters: C1 = {1, 3}, C2 = {2, 4, 5, 6}

Mean calculation: new C1 = (1.5, 1.5), new C2 = (2.25, 4.125)
Practice Exercise
Data points (X, Y): (1, 2), (2, 3), (3, 4), (8, 7), (9, 8), (10, 9)
K = 2; take point #1 (1, 2) as centroid 1 and point #4 (8, 7) as centroid 2.

#   X    Y   Distance to C1   Distance to C2   Cluster Assigned
1   1    2
2   2    3
3   3    4
4   8    7
5   9    8
6   10   9

Complete the table, iterate until the centroids no longer change, and compute the SSE of the final clustering.
For k = 3, apply the k-means algorithm to the following data:
Instance X-Coordinate Y-Coordinate
1 4 7
2 -1 3
3 2 -2
4 5 1
5 7 6
6 3 8
7 -5 -2
8 3 6
9 0 -3
10 -3 -1
11 4 2
12 6 0
K-Means Clustering – Details
Initial centroids are often chosen randomly.
Clusters produced vary from one run to another.
The centroid is (typically) the mean of the points in the cluster.
‘Closeness’ is measured by Euclidean distance, Cosine similarity,
Correlation, etc.
K-means will converge for common similarity measures mentioned
above.
Most of the convergence happens in the first few iterations.
Often the stopping condition is changed to ‘Until relatively few points change
clusters’
Outlier removal and feature normalization are important data pre-
processing steps before applying K-Means.
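A small sketch of normalizing features before clustering, assuming scikit-learn is available (the data is made up to show features on very different scales):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Made-up data with features on very different scales (e.g. age vs. income)
X = np.array([[25, 40_000], [32, 52_000], [47, 150_000],
              [51, 160_000], [23, 38_000], [45, 145_000]])

# Normalize features so that no single feature dominates the Euclidean distance
X_scaled = StandardScaler().fit_transform(X)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_scaled)
print(labels)
```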
Two Different K-means Clusterings
Figure: the same set of original points clustered two ways by K-means: an optimal clustering and a sub-optimal clustering.
Importance of Choosing Initial Centroids
Figure: the cluster assignments produced by K-means over iterations 1 to 6 for one choice of initial centroids.
Importance of Choosing Initial Centroids …
Figure: the cluster assignments over iterations 1 to 5.
Importance of Choosing Initial Centroids …
Figure: a different choice of initial centroids, shown over iterations 1 to 5, leads K-means to a different final clustering.
Evaluating K-means Clusters
Most common measure is Sum of Squared Error (SSE)
For each point, the error is the distance to the centroid of its cluster; to get the SSE, we square these errors and sum them.
SSE = \sum_{i=1}^{K} \sum_{x \in C_i} \mathrm{dist}^2(m_i, x)
where x is a data point in cluster C_i and m_i is the representative point for cluster C_i; one can show that m_i corresponds to the center (mean) of the cluster.
Given two clusters, we can choose the one with the smallest error
One easy way to reduce SSE is to increase K, the number of clusters
A good clustering with smaller K can have a lower SSE than a poor clustering with
higher K
SSE
SSE is the total sum of the squared Euclidean distances between each data
point and the centroid of the cluster it is assigned to.
It quantifies how well the data points fit their respective clusters.
A lower SSE indicates that the points are closer to their centroids, suggesting
that the clustering is better.
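A small sketch of computing the SSE with scikit-learn, which reports it as the inertia_ attribute (reusing the six points from the earlier worked example):

```python
import numpy as np
from sklearn.cluster import KMeans

# The six points from the earlier worked example
X = np.array([[1, 1.5], [1, 4.5], [2, 1.5], [2, 3.5], [3, 2.5], [3, 6.0]])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# scikit-learn reports the SSE as `inertia_`; recompute it by hand to check
sse = sum(np.sum((X[km.labels_ == j] - center) ** 2)
          for j, center in enumerate(km.cluster_centers_))
print(km.inertia_, sse)   # the two numbers should agree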
Determining the Right K
Elbow Method:
Plot a line chart of the SSE for a range of values of k. If the line chart looks like an
arm, then the "elbow" of the arm marks the best value of k.
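A sketch of the elbow method using scikit-learn and Matplotlib (the data here is a random placeholder; with real data the curve normally bends at a meaningful k):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(300, 2))   # placeholder data

ks = range(1, 11)
sse = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_ for k in ks]

plt.plot(list(ks), sse, marker='o')
plt.xlabel('k (number of clusters)')
plt.ylabel('SSE')
plt.title('Elbow method')
plt.show()   # pick the k where the curve bends and flattens out (the "elbow")
```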
Limitations of K-means
K-means has problems when clusters are of differing
Sizes
Densities
Non-globular shapes
K-means has problems when the data contains outliers.
Limitations of K-means: Non-globular Shapes
Figure: the original points (two non-globular clusters) versus the result of K-means with 2 clusters.
Overcoming K-means Limitations
Figure: the original points versus the K-means clusters.
Differing Sizes
Overcoming Differing Size
Differing Density
Overcoming Differing Density
Summary K-Means Method
Strengths of K-Means:
Fast and Scalable: K-means can handle large datasets efficiently, especially with optimizations like Mini-Batch K-
means.
Easy to Understand and Implement: The algorithm’s concept is straightforward, making it a good starting point for
clustering tasks.
Works Well for Simple Data: It performs well when clusters are well-separated, have similar sizes, and exhibit
spherical shapes.
Key Limitations:
Choosing the Right k: You need to specify the number of clusters in advance, which can be tricky without domain
knowledge or appropriate techniques like the elbow method or silhouette score.
Sensitive to Initial Centroids: Poor initialization of centroids can lead to suboptimal clustering. The K-means++
initialization method can mitigate this issue.
Sensitive to Outliers: K-means can be easily influenced by outliers because it relies on the mean to define cluster
centroids. K-medoids or other robust algorithms may be better for handling outliers.
Assumes Equal Cluster Sizes, Densities, and Spherical Shapes: K-means assumes that clusters are spherical, of equal
size, and have similar densities. It may struggle with complex, non-spherical, or unevenly sized clusters.
When to Use K-Means:
When clusters are roughly spherical and have similar sizes and densities.
When you have a good idea of the number of clusters (k) or can estimate it
using methods like the elbow method.
When speed and scalability are important, especially with large datasets.
Pre-processing and Post-processing
Pre-processing
Normalize data
Eliminate outliers
Post-processing
Eliminate small clusters that may represent outliers
Split ‘loose’ clusters, i.e., clusters with relatively high SSE
Merge clusters that are ‘close’ and that have relatively low SSE
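A small sketch of one such post-processing check, computing each cluster's size and SSE to spot very small clusters (possible outliers) and "loose" clusters with a high SSE (the data is a random placeholder):

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(200, 2))   # placeholder data
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

# Very small clusters may just be outliers worth eliminating; clusters with a
# disproportionately large SSE are 'loose' candidates for splitting.
for j, center in enumerate(km.cluster_centers_):
    members = X[km.labels_ == j]
    print(f"cluster {j}: size={len(members)}, SSE={np.sum((members - center) ** 2):.2f}")
```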