UNIT 4: Unsupervised Learning
Unsupervised Learning
• Unsupervised learning is a type of machine learning where the algorithm learns patterns and
structures from input data without explicit supervision or labeled responses.
• In unsupervised learning, the algorithm is given a dataset consisting of features but not the
corresponding target labels. The goal is to explore the inherent structure of the data, identify
patterns, and extract meaningful insights.
• Unsupervised Learning is the training of a machine using information that is neither
classified nor labelled, which allows the algorithm to act on that information without
guidance.
TYPES OF UNSUPERVISED LEARNING
CLUSTERING:
❖ Clustering is a method of grouping objects into clusters such that objects with the most
similarities remain in the same group and have few or no similarities with the objects of
another group.
❖ Cluster analysis finds the commonalities between the data objects and categorizes them
as per the presence and absence of those commonalities.
ASSOCIATION:
❖ An association rule is an unsupervised learning method which is used for finding the
relationships between variables in a large database.
❖ It determines the sets of items that occur together in the dataset. Association rules make
marketing strategy more effective: for example, people who buy item X (say, bread)
also tend to purchase item Y (butter/jam). A typical example of association rules is
Market Basket Analysis.
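As a small illustration of market basket analysis, the sketch below assumes the mlxtend library (its TransactionEncoder, apriori, and association_rules helpers) and uses a tiny, made-up set of transactions:

```python
# Minimal market-basket sketch; the transactions are illustrative data and
# mlxtend is an assumed dependency (pip install mlxtend).
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [
    ["bread", "butter", "jam"],
    ["bread", "butter"],
    ["bread", "milk"],
    ["butter", "jam"],
]

# One-hot encode the transactions into a boolean item matrix
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

# Frequent itemsets with support >= 50%, then rules such as {bread} -> {butter}
frequent = apriori(onehot, min_support=0.5, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```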
Difference Between Supervised & Unsupervised Learning
Supervised Learning | Unsupervised Learning
Supervised learning algorithms are trained using labelled data. | Unsupervised learning algorithms are trained using unlabelled data.
A supervised learning model predicts the output. | An unsupervised learning model finds the hidden patterns in data.
The goal of supervised learning is to train the model so that it can predict the output when it is given new data. | The goal of unsupervised learning is to find the hidden patterns and useful insights from the unknown dataset.
Supervised learning can be categorized into Classification and Regression problems. | Unsupervised learning can be classified into Clustering and Association problems.
A supervised learning model produces an accurate result. | An unsupervised learning model may give a less accurate result compared to supervised learning.
It includes various algorithms such as Linear Regression, Logistic Regression, Support Vector Machine, Multi-class Classification, Decision Tree, Bayesian Logic, etc. | It includes various algorithms such as Clustering, KNN, and the Apriori algorithm.
Supervised learning needs supervision to train the model. | Unsupervised learning does not need any supervision to train the model.
Supervised learning is not close to true Artificial Intelligence, as we first train the model for each data point and only then can it predict the correct output. | Unsupervised learning is closer to true Artificial Intelligence, as it learns in the same way a child learns daily routine things from experience.
Applications of Unsupervised Learning
GENOME ANALYSIS: These Algorithms can analyze genetic data to uncover patterns and
links for genetic research.
IMAGE AND TEXT CLUSTERING: It can automatically group similar
images and texts, aiding in tasks like image organization, document clustering,
or content recommendation.
CUSTOMER SEGMENTATION: Customers can be grouped based on their purchase
behavior, allowing organizations to customize marketing efforts.
SEMANTIC CLUSTERING: It organizes all the responses with the same meaning into
clusters to ensure that customers quickly and easily get the information they need.
ANOMALY DETECTION: Unsupervised learning is used to identify data points, events,
and/or observations that deviate from a dataset's normal behavior.
MARKET BASKET ANALYSIS: Past purchase behavior coupled with unsupervised
learning can be used to help businesses discover data trends that they could use to develop
effective cross-selling strategies.
CLUSTERING
Clustering, in unsupervised machine learning, is the process of grouping unlabeled data into
clusters based on their similarities.
Broadly, this technique is applied to group data based on the different patterns, such as
similarities or differences, that our machine model finds.
Clustering aims at forming groups of homogeneous data points from a heterogeneous
dataset. It evaluates similarity based on a metric like Euclidean distance, Cosine
similarity, Manhattan distance, etc., and then groups the points with the highest similarity
scores together.
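For a concrete feel for the similarity metrics mentioned above, here is a minimal sketch using SciPy's distance helpers on two arbitrary points:

```python
# Small sketch of common distance/similarity metrics; the two points are
# arbitrary illustrative vectors.
from scipy.spatial import distance

a = [185, 72]
b = [170, 56]

print("Euclidean:", distance.euclidean(a, b))        # straight-line distance
print("Manhattan:", distance.cityblock(a, b))        # sum of absolute differences
print("Cosine similarity:", 1 - distance.cosine(a, b))  # 1 minus cosine distance
```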
Types of Clustering
PARTITIONING-BASED CLUSTERING: These methods divide the data set into distinct groups.
They aim to create partitions such that objects within each partition are as similar as possible.
• K-means Clustering: Partitions data into k clusters based on centroid minimization.
• K-medoids Clustering: Similar to k-means but uses medoids (most centrally located
points) instead of centroids.
HIERARCHICAL CLUSTERING: These methods create a hierarchy of clusters that can be
represented in a tree structure (dendrogram).
• Agglomerative (Bottom-Up): Starts with each object in its own cluster and merges the
closest pairs of clusters iteratively.
• Divisive (Top-Down): Starts with all objects in one cluster and splits the cluster iteratively.
DENSITY-BASED CLUSTERING: These methods find clusters based on the density of points
in the data space.
• DBSCAN (Density-Based Spatial Clustering of Applications with Noise): Finds clusters
based on the density of data points, capable of finding arbitrary-shaped clusters and dealing
with noise.
• OPTICS (Ordering Points to Identify the Clustering Structure): An extension of
DBSCAN that works better with varied densities.
GRID BASED CLUSTERING: These methods involve dividing the data space into a finite
number of cells that form a grid structure.
• STING (Statistical Information Grid): Uses a multi-resolution grid data structure for
efficient clustering.
• CLIQUE (Clustering In QUEst): Identifies dense regions in subspaces of high-
dimensional data and merges them to form clusters.
APPLICATIONS OF CLUSTERING:
MARKETING: Clustering can be used for marketing reasons to characterize and discover client
segments.
BIOLOGY: It can be used for classification among different species of plants and animals.
IMAGE PROCESSING: It can be used to group similar images together, classify images based on
content and identify patterns in image data.
FRAUD DETECTION: Clustering is used to identify suspicious patterns or anomalies in financial
transactions, thereby helping in fraud detection.
CYBERSECURITY: It is used to group similar patterns of network traffic or system behavior,
which can help in detecting and preventing cyber attacks.
CLIMATE ANALYSIS: It is used for grouping comparable patterns of climate data, such as
temperature, precipitation, and wind, in order to better understand climate change and its influence
on the environment.
MEDICAL DIAGNOSIS: It is used to group patients with similar symptoms or diseases, which
helps in making accurate diagnoses and identifying effective treatments.
FINANCE: Clustering is a technique used to identify market groupings based on customer behavior,
find patterns in stock market data, and assess risk in investment portfolios.
K-MEANS CLUSTERING
➢ K-means clustering is a very popular clustering algorithm which is applied when we have a
dataset with unknown labels.
➢ The goal is to find certain groups based on some kind of similarity in the data with the
number of groups represented by K.
➢ This algorithm is generally used in areas like market segmentation, customer
segmentation, etc. But, it can also be used to segment different objects in the images on the
basis of the pixel values.
How K-Means Clustering Works?
Here’s how it works:
1. Initialization: Start by randomly selecting K points from the dataset. These points will
act as the initial cluster centroids.
2. Assignment: For each data point in the dataset, calculate the distance between that
point and each of the K centroids. Assign the data point to the cluster whose centroid
is closest to it. This step effectively forms K clusters.
3. Update centroids: Once all data points have been assigned to clusters, recalculate the
centroids of the clusters by taking the mean of all data points assigned to each cluster.
4. Repeat: Repeat steps 2 and 3 until convergence. Convergence occurs when the
centroids no longer change significantly or when a specified number of iterations is
reached.
5. Final Result: Once convergence is achieved, the algorithm outputs the final cluster
centroids and the assignment of each data point to a cluster .
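A minimal from-scratch sketch of these steps (assuming a NumPy array X of shape (n_samples, n_features) and that no cluster ever becomes empty during the iterations) might look like this:

```python
# Bare-bones K-Means following the steps above: random init, assign, update,
# repeat until the centroids stop moving.
import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: pick k random points from the dataset as initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # Step 2: assign each point to its nearest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of its assigned points
        # (assumes every cluster keeps at least one point)
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 4: stop when the centroids no longer change
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels
```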
Problem on K-Means Clustering
Q. Apply K(=2)-Means algorithm over the data (185, 72), (170, 56), (168, 60),
(179,68), (182,72), (188,77) up to two iterations and show the clusters. Initially
choose first two objects as initial centroids.
Solution:
Given: the number of clusters to be created (K) = 2, say c1 and c2, and the number of iterations = 2.
The given data points can be represented in tabular form as:

Point | x | y
A1 | 185 | 72
A2 | 170 | 56
A3 | 168 | 60
A4 | 179 | 68
A5 | 182 | 72
A6 | 188 | 77

Also, taking the first two objects as the initial centroids:
Centroid for first cluster c1 = (185, 72)
Centroid for second cluster c2 = (170, 56)
Iteration 1: Now calculating similarity using the Euclidean distance measure,
d(a, b) = √((a1 − b1)² + (a2 − b2)²), and representing the distances in tabular form:

Point | Distance from c1 = (185, 72) | Distance from c2 = (170, 56) | Assigned cluster
A1 (185, 72) | 0.00 | 21.93 | c1
A2 (170, 56) | 21.93 | 0.00 | c2
A3 (168, 60) | 20.81 | 4.47 | c2
A4 (179, 68) | 7.21 | 15.00 | c1
A5 (182, 72) | 3.00 | 20.00 | c1
A6 (188, 77) | 5.83 | 27.66 | c1

The resulting clusters after the first iteration are:
c1 = {A1, A4, A5, A6} and c2 = {A2, A3}
Iteration 2: Now calculating the centroid of each cluster as the mean of its data points:
c1 = ((185 + 179 + 182 + 188)/4, (72 + 68 + 72 + 77)/4) = (183.5, 72.25)
c2 = ((170 + 168)/2, (56 + 60)/2) = (169, 58)

Now, again calculating the Euclidean distances and representing them in tabular form:

Point | Distance from c1 = (183.5, 72.25) | Distance from c2 = (169, 58) | Assigned cluster
A1 (185, 72) | 1.52 | 21.26 | c1
A2 (170, 56) | 21.13 | 2.24 | c2
A3 (168, 60) | 19.76 | 2.24 | c2
A4 (179, 68) | 6.19 | 14.14 | c1
A5 (182, 72) | 1.52 | 19.10 | c1
A6 (188, 77) | 6.54 | 26.87 | c1

The resulting clusters after the second iteration are:
c1 = {A1, A4, A5, A6} and c2 = {A2, A3}
As we were asked to perform only two iterations, the numerical ends here.
Since the clustering does not change after the second iteration, we could terminate the
iterations at this point even if the question had not said so.
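As a cross-check, the worked example can be reproduced with scikit-learn's KMeans by passing the first two objects as the initial centroids; the labels and centres noted in the comments are the expected values from the hand calculation above:

```python
# Reproducing the worked example with scikit-learn, using the first two points
# as the initial centroids as the question specifies.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[185, 72], [170, 56], [168, 60], [179, 68], [182, 72], [188, 77]])
init = np.array([[185, 72], [170, 56]])

km = KMeans(n_clusters=2, init=init, n_init=1, max_iter=2).fit(X)
print(km.labels_)           # expected: [0 1 1 0 0 0]  (cluster 0 = c1, cluster 1 = c2)
print(km.cluster_centers_)  # expected: [[183.5, 72.25], [169.0, 58.0]]
```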
Limitations of K Means Clustering
• Sensitivity to Initial Conditions: K-means is sensitive to initial conditions.
The algorithm randomly initializes the cluster centroids at the beginning, and
the final clustering results can vary depending on these initial positions.
Different initializations can lead to different local optima, resulting in different
clustering outcomes. This makes the K-means algorithm less reliable and
reproducible.
• Difficulty in Determining K: One of the drawbacks of the K-means
algorithm is that we have to set the number of clusters (k) in advance.
Choosing an incorrect number of clusters can lead to inaccurate results
(a common heuristic, the elbow method, is sketched after this list).
• K-means requires us to specify the number of clusters a priori.
• K-Means is sensitive towards outliers. Outliers can skew the clusters in K-
Means to a very large extent.
• K-Means forms spherical clusters only. It fails when the data is not spherical (e.g.,
oblong or elliptical) or is of arbitrary shape.
• Inability to Handle Categorical Data: When categorical data is used with the
K-means algorithm, it requires converting the categories into numerical values,
such as using one-hot encoding. One shortcoming of using one-hot encoding is
that it treats each feature independently and can degrade performance since it
can significantly increase data dimensionality.
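To illustrate the difficulty of choosing K mentioned in the list above, here is a sketch of the elbow method on a synthetic blob dataset; make_blobs and the range of k values are illustrative choices:

```python
# Elbow method sketch: plot inertia (within-cluster sum of squares) for several
# values of k and look for the "bend" in the curve.
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)  # toy data

inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in range(1, 9)]

plt.plot(range(1, 9), inertias, marker="o")
plt.xlabel("number of clusters k")
plt.ylabel("inertia")
plt.title("Elbow method: look for the bend in the curve")
plt.show()
```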
CLUSTERING FOR IMAGE SEGMENTATION
Image Segmentation is the process of partitioning an image into multiple regions based on the
characteristics of the pixels in the original image. Clustering is a technique to group similar entities
and label them. Thus, for image segmentation using clustering, we can cluster similar pixels using
a clustering algorithm and group a particular cluster pixel as a single segment.
K-Means clustering for Image Segmentation
K-Means is a partitioning type of clustering where we do not have the labels of pixels/data
points beforehand. The number of clusters or groups formed is taken as K. These clusters are
derived based on some similar or common characteristics between the pixels or data points.
The steps involved in K-Means clustering are
• Select a particular value of K
• A feature is taken for each pixel, such as its RGB value.
• Similar pixels are grouped by using distance measures such as the Euclidean distance.
• K-Means is then run, assigning each pixel to the nearest cluster centre.
• Generally, a threshold is defined; if the calculated distance falls within it, the pixel is
grouped into that cluster.
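Putting the steps above together, here is a sketch of K-Means image segmentation with scikit-learn and Pillow; "photo.jpg" and k = 4 are placeholder choices:

```python
# K-Means image segmentation sketch: each pixel's RGB value is a feature, and
# every pixel is replaced by the centre of the cluster it is assigned to.
import numpy as np
from PIL import Image
from sklearn.cluster import KMeans

img = np.asarray(Image.open("photo.jpg").convert("RGB"))   # placeholder image path
pixels = img.reshape(-1, 3).astype(float)                   # (n_pixels, 3) feature matrix

k = 4                                                        # chosen number of segments
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)

# Replace each pixel by its cluster centre and restore the image shape
segmented = km.cluster_centers_[km.labels_].astype(np.uint8).reshape(img.shape)
Image.fromarray(segmented).save("segmented.jpg")
```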
Clustering for Preprocessing:
Since a clustering process considers all the attributes of the dataset and reduces the information
to a cluster, which is really another attribute (i.e., the ID of the cluster to which a record
belongs), clustering can be used as a data compression technique. The output of clustering is
the cluster ID for each record, and it can be used as an input variable for other data science tasks.
Hence, clustering can be employed as a preprocessing technique for other data science processes.
In general, clustering can be used for two types of preprocessing:
1. Clustering to reduce dimensionality: In an n-dimensional dataset (n number of attributes),
the computational complexity is proportional to the number of dimensions or "n." With
clustering, n-dimensional attributes can be converted or reduced to one categorical
attribute—"Cluster ID." This reduces the complexity, although there will be some loss of
information because of the dimensionality reduction to one single attribute.
2. Clustering for object reduction: Assume that the number of customers for an organization
is in the millions and the number of cluster groups is 100. For each of these 100 cluster
groups, one “poster child” customer can be identified that represents the typical
characteristics of all the customers in that cluster group. The poster child customer can be
an actual customer or a fictional customer. The prototype of a cluster is the most common
representation of all the customers in a group. Reducing millions of customer records to
100 prototype records provides an obvious benefit. In some applications, instead of
processing millions of records, just the prototypes can be processed for further
classification or regression tasks. This greatly reduces the record count and the dataset can
be made appropriate for classification by algorithms like k-nearest neighbor (k-NN) where
computation complexity depends on the number of records.
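A rough sketch of both preprocessing ideas, using a synthetic customer matrix and scikit-learn's KMeans (the 100-cluster figure mirrors the example above; the data size and attribute count are arbitrary stand-ins):

```python
# (1) Dimensionality reduction: add the cluster ID as one categorical attribute.
# (2) Object reduction: keep 100 prototype records (cluster centres) instead of
#     processing every customer record.
import numpy as np
from sklearn.cluster import KMeans

customers = np.random.rand(100_000, 8)        # stand-in for a large customer table

km = KMeans(n_clusters=100, n_init=3, random_state=0).fit(customers)

cluster_id = km.labels_.reshape(-1, 1)        # one "Cluster ID" column per record
prototypes = km.cluster_centers_              # 100 prototype records, one per group

print(cluster_id.shape, prototypes.shape)
```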
Using Clustering for semi supervised learning
Using clustering for Semi-supervised learning can offer several benefits over purely
supervised or unsupervised learning methods. First, it can leverage the large amount of unlabeled
data that is often available in real-world scenarios, and use it to enhance the accuracy and
generalization of the model. Second, it can reduce the cost and effort of labeling data manually,
which can be tedious, subjective, or error-prone. Third, it can discover hidden patterns or structures
in the data that may not be obvious or captured by the existing labels.
How does clustering work with semi-supervised learning?
Semi-supervised learning with clustering can be implemented in various ways, depending on
the type of data, the clustering algorithm, and the supervised learning model. A common approach
is to first apply a clustering algorithm to the unlabelled data, and then combine the labelled and
unlabeled data, using the cluster labels as pseudo-labels for the unlabelled data. Next, a supervised
learning model is trained on the combined data, with the original labels and pseudo-labels serving
as target variables. Finally, the model can be evaluated on a test set, and optionally refined or
adjusted.
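A minimal sketch of this pseudo-labelling recipe, using scikit-learn's digits dataset, KMeans, and LogisticRegression purely as illustrative choices (the 10% labelled split and the majority-label transfer rule are assumptions of the sketch):

```python
# Cluster the unlabelled portion, transfer each cluster's majority true label
# onto its members as a pseudo-label, then train a supervised model on the
# combined labelled + pseudo-labelled data.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

X, y = load_digits(return_X_y=True)
X_lab, X_unlab, y_lab, y_unlab = train_test_split(X, y, train_size=0.1, random_state=0)

km = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X_unlab)

# Pseudo-label each cluster with the majority label of the labelled points
# that fall into it (fall back to 0 if no labelled point lands in a cluster).
lab_clusters = km.predict(X_lab)
pseudo = np.empty(len(X_unlab), dtype=int)
for c in range(10):
    mask = lab_clusters == c
    majority = np.bincount(y_lab[mask]).argmax() if mask.any() else 0
    pseudo[km.labels_ == c] = majority

clf = LogisticRegression(max_iter=1000).fit(
    np.vstack([X_lab, X_unlab]), np.concatenate([y_lab, pseudo]))
print("true labels used:", len(y_lab), "| pseudo-labels used:", len(pseudo))
```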
Some examples of semi-supervised learning with clustering
Semi-supervised learning with clustering has been applied to various domains and
tasks in AI, such as natural language processing for text classification, sentiment analysis,
topic modelling, or named entity recognition. For example, one can cluster unlabelled
documents based on their word embeddings and use the cluster labels as pseudo-labels to
train a text classifier. This same approach can be used for computer vision tasks like image
classification, object detection, face recognition, or semantic segmentation. Clustering can
also be used in bioinformatics for gene expression analysis, protein function prediction, or
disease diagnosis. For instance, one can cluster unlabelled gene expression profiles based on
their similarity and use the generated labels as pseudo-labels to train a gene classifier.
DBSCAN
DBSCAN is the abbreviation for Density-Based Spatial Clustering of Applications with Noise. It
is an unsupervised clustering algorithm. DBSCAN clustering can work with clusters of any size
from huge amounts of data and can work with datasets containing a significant amount of noise.
It is basically based on the criteria of a minimum number of points within a region.
What is DBSCAN Algorithm?
DBSCAN algorithm can cluster densely grouped points efficiently into one cluster. It can identify
local density in the data points among large datasets. DBSCAN can very effectively handle
outliers. An advantage of DBSCAN over the K-means algorithm is that the number of centroids
need not be known beforehand in the case of DBSCAN.
The DBSCAN algorithm depends upon two parameters, epsilon and minPoints.
Epsilon is defined as the radius around each data point within which the density is considered.
minPoints is the minimum number of points required within that radius for the data point to
become a core point.
In higher dimensions, this epsilon "circle" simply becomes a hypersphere.
Working of DBSCAN Algorithm
In the DBSCAN algorithm, a circle with radius epsilon is drawn around each data point, and the
data point is classified as a Core Point, Border Point, or Noise Point. A data point is classified
as a core point if it has at least minPoints data points within its epsilon radius. If it has fewer
than minPoints points but lies within the epsilon radius of a core point, it is known as a Border
Point, and if it is neither a core point nor within reach of one, it is considered a Noise Point.
Let us understand the working through an example.
In the above figure, we can see that point A has no points inside its epsilon (ε) radius; hence it is a
Noise Point. Point B has minPoints (= 4) points within its epsilon radius, thus it is a Core
Point. A third point has fewer than minPoints points within its epsilon radius but lies in the
neighbourhood of a core point, hence it is a Border Point.
Steps Involved in DBSCAN Algorithm.
• First, all the points within the epsilon radius of each point are found, and the core points are
identified as those with a number of neighbours greater than or equal to minPoints.
• Next, for each core point, if not assigned to a particular cluster, a new cluster is created for
it.
• All the densely connected points related to the core point are found and assigned to the same
cluster. Two points are called densely connected points if they have a neighbor point that
has both the points within epsilon distance.
• Then all the points in the data are iterated, and the points that do not belong to any cluster
are marked as noise.
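A short sketch of these steps with scikit-learn's DBSCAN on a crescent-shaped toy dataset; eps and min_samples (scikit-learn's name for minPoints) are illustrative values:

```python
# DBSCAN sketch on a non-spherical toy dataset; noise points are labelled -1.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

db = DBSCAN(eps=0.2, min_samples=5).fit(X)

labels = db.labels_                                       # -1 marks noise points
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("clusters found:", n_clusters, "| noise points:", np.sum(labels == -1))
```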
Advantages of the DBSCAN Algorithm
• DBSCAN does not require the number of centroids to be known beforehand as in the case with the K-
Means Algorithm.
• It can find clusters with any shape.
• It can also locate clusters that are not connected to any other group or cluster, and it can work
well with noisy datasets.
• It is robust to outliers.
Disadvantages of the DBSCAN Algorithm
• It does not work well with datasets that have varying densities.
• Cannot be employed with multiprocessing as it cannot be partitioned.
• Cannot find the right cluster if the dataset is sparse.
• It is sensitive to parameters epsilon and minPoints
Applications of DBSCAN
• It is used in satellite imagery.
• Used in X-ray crystallography.
• Anomaly detection in temperature data.
Other Clustering algorithms
Hierarchical clustering is another unsupervised learning algorithm that is used to group together
the unlabeled data points having similar characteristics.
Hierarchical clustering is a technique for grouping data points based on similarities. The process
involves finding the two data points closest to each other and combining the two most similar ones.
After repeating this process until all data points are grouped into clusters, the end result is a
hierarchical tree of related groups known as a dendrogram
There are two major types of approaches:
• Agglomerative clustering: Start with each data point in its own cluster and then
aggregate (merge) the closest clusters step by step.
• Divisive clustering: Start with all the data points combined in a single cluster and divide
it into smaller clusters step by step as the distance between them increases.
Agglomerative Clustering
Agglomerative clustering is a bottom-up approach. It starts by treating each
individual data point as a single cluster; clusters are then merged continuously based on similarity
until one big cluster containing all objects is formed. It is good at identifying small clusters.
The steps for agglomerative clustering are as follows:
1. Compute the proximity matrix using a distance metric.
2. Use a linkage function to group objects into a hierarchical cluster tree based on the
computed distance matrix from the above step.
3. Data points with close proximity are merged together to form a cluster.
4. Repeat steps 2 and 3 until a single cluster remains.
The pictorial representation of the above steps would be:
In the above figure,
• The data points 1,2,...6 are assigned to each individual cluster.
• After calculating the proximity matrix, based on the similarity the points 2,3 and 4,5
are merged together to form clusters.
• Again, the proximity matrix is computed and clusters with points 4,5 and 6 are
merged together.
• And again, the proximity matrix is computed, then the clusters with points 4,5,6 and
2,3 are merged together to form a cluster.
• As a final step, the remaining clusters are merged together to form a single cluster.
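A compact sketch of agglomerative clustering with scikit-learn; the six 2-D points below are made up so that points 2,3 and 4,5 are the closest pairs, loosely mirroring the walk-through above:

```python
# Agglomerative clustering sketch on six toy points; single linkage merges the
# closest pair of clusters at each step.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[0.0, 0.0],    # point 1
              [4.0, 4.0],    # point 2
              [4.2, 4.1],    # point 3
              [8.0, 8.0],    # point 4
              [8.1, 8.2],    # point 5
              [7.5, 8.5]])   # point 6

agg = AgglomerativeClustering(n_clusters=2, linkage="single").fit(X)
print(agg.labels_)   # cluster membership of points 1..6
```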
Proximity Matrix and Linkage
The proximity matrix is a matrix consisting of the distance between each pair of data
points. The distance is computed by a distance function. Euclidean distance is one of the
most commonly used distance functions.
Such a proximity matrix consists of n points x1, x2, ..., xn, where d(xi, xj) represents the
distance between the points xi and xj.
In order to group the data points in a cluster, a linkage function is used where the values in
the proximity matrix are taken and the data points are grouped based on similarity. The
newly formed clusters are linked to each other until they form a single cluster containing all
the data points.
The most common linkage methods are as follows:
• Complete linkage: The maximum of all pairwise distances between elements in
each pair of clusters is used to measure the distance between two clusters.
• Single linkage: The minimum of all pairwise distances between elements in each
pair of clusters is used to measure the distance between two clusters.
• Average linkage: The average of all pairwise distances between elements in each
pair of clusters is used to measure the distance between two clusters.
• Centroid linkage: The distance between the two clusters' centroids is considered
before merging.
• Ward’s Method: It uses squared error to compute the similarity of the two clusters
for merging.
Dendrogram Charts
The way to find the optimal number of clusters in hierarchical clustering is to use a
dendrogram chart.
The chart is shown here:
From the above chart we can visualize the hierarchical technique. So how do we find the
optimal number of clusters from the chart?
To find it, draw a horizontal line where there is no overlap with the vertical lines of the bars.
The number of bars below the line without any overlap is the optimal number of clusters.
Refer to the figure below for a clear illustration.
From the above figure, we have three bars below the horizontal line, so the optimal number
of clusters is three. Also, recall that the Iris dataset used for this chart has three classes, and we
obtained the same number from the chart.
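The linkage matrix and dendrogram described above can be produced with SciPy; the six toy points and the "ward" method here are illustrative, and any of the linkage methods listed earlier ("single", "complete", "average", "centroid") can be swapped in:

```python
# Build a hierarchical merge history (linkage matrix) and draw its dendrogram.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

X = np.array([[0.0, 0.0], [4.0, 4.0], [4.2, 4.1],
              [8.0, 8.0], [8.1, 8.2], [7.5, 8.5]])

Z = linkage(X, method="ward")   # merge history used to draw the dendrogram
dendrogram(Z)
plt.title("Cut the dendrogram with a horizontal line to choose the cluster count")
plt.show()
```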
Divisive Clustering
Divisive clustering works just the opposite of agglomerative clustering. It starts by
considering all the data points into a big single cluster and later on splitting them into
smaller heterogeneous clusters continuously until all data points are in their own cluster.
Thus, they are good at identifying large clusters. It follows a top-down approach and is
more efficient than agglomerative clustering. But, due to its complexity in implementation,
it doesn’t have any predefined implementation in any of the major machine learning
frameworks.
Steps in Divisive Clustering
1. Consider all the data points as a single cluster.
2. Split the cluster using any flat-clustering method, say K-Means.
3. Choose the best cluster among the resulting clusters to split further: choose the one that
has the largest Sum of Squared Error (SSE).
4. Repeat steps 2 and 3 until each data point forms its own cluster (or the desired number of
clusters is reached).
In the above figure,
• The data points 1,2,...6 are initially assigned to one large cluster.
• After calculating the proximity matrix, based on the dissimilarity the points are split
up into separate clusters.
• The proximity matrix is again computed until each point is assigned to an individual
cluster.
The proximity matrix and linkage function follow the same procedure as in agglomerative
clustering. As divisive clustering is not used in many places, there is no predefined
class/function for it in the common Python libraries; a rough sketch is given below.
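Following the steps listed above, a bisecting-style sketch of divisive clustering can be written with NumPy and scikit-learn's KMeans; the random toy data, the SSE-based split choice, and the target of four clusters are illustrative assumptions:

```python
# Rough divisive (bisecting) sketch: repeatedly pick the cluster with the
# largest SSE and split it in two with K-Means until enough clusters exist.
import numpy as np
from sklearn.cluster import KMeans

def sse(points):
    # Sum of squared distances of the points to their own mean
    return float(((points - points.mean(axis=0)) ** 2).sum())

def divisive(X, n_clusters):
    clusters = [X]                                        # start: one big cluster
    while len(clusters) < n_clusters:
        worst = max(range(len(clusters)), key=lambda i: sse(clusters[i]))
        target = clusters.pop(worst)                      # cluster with the largest SSE
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(target)
        clusters += [target[labels == 0], target[labels == 1]]
    return clusters

parts = divisive(np.random.rand(200, 2), 4)
print([len(p) for p in parts])   # sizes of the four resulting clusters
```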