Hierarchical Clustering

The document explains Hierarchical clustering, which groups data into a tree of clusters through a series of merges, and is represented by a Dendrogram. It also discusses Divisive Hierarchical clustering, the opposite approach that starts with one cluster and separates data points iteratively. Additionally, it introduces DBSCAN, a density-based clustering algorithm that identifies arbitrary-shaped clusters and handles noise, detailing its key parameters, eps and MinPts.


A hierarchical clustering method works by grouping data into a tree of clusters. Agglomerative hierarchical clustering begins by treating every data point as a separate cluster and then repeatedly executes two steps:
1. Identify the two clusters that are closest together, and
2. Merge those two most similar clusters.
These steps continue until all the clusters have been merged into one.

In hierarchical clustering, the aim is to produce a hierarchical series of nested clusters. A diagram called a dendrogram (a tree-like diagram that records the sequence of merges or splits) graphically represents this hierarchy. It is an inverted tree that describes the order in which points are merged (bottom-up view) or clusters are split apart (top-down view).

Let's say we have six data points A, B, C, D, E, and F.

• Step-1: Treat each of the six points as a single cluster and calculate the distance from each cluster to all the other clusters.
• Step-2: Merge the most similar clusters into a single cluster. Suppose cluster (B) and cluster (C) are very similar to each other, so we merge them; similarly, clusters (D) and (E) are merged. We are left with the clusters [(A), (BC), (DE), (F)].
• Step-3: Recalculate the proximities according to the algorithm and merge the two nearest clusters, (DE) and (F), to form a new cluster, giving [(A), (BC), (DEF)].
• Step-4: Repeating the same process, the clusters (DEF) and (BC) are now the most similar and are merged, leaving [(A), (BCDEF)].
• Step-5: Finally, the two remaining clusters are merged into a single cluster, [(ABCDEF)].
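The same bottom-up procedure can be run with SciPy's clustering utilities. Below is a minimal sketch: the 2-D coordinates for A through F are made up for illustration, and single linkage (nearest-neighbor distance) is just one of several merge criteria SciPy supports.

```python
# Minimal sketch of agglomerative clustering with SciPy.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# Hypothetical coordinates for the six points A-F (illustrative only).
points = np.array([
    [0.0, 9.0],   # A
    [1.0, 1.0],   # B
    [1.2, 0.8],   # C
    [5.0, 5.0],   # D
    [5.2, 4.8],   # E
    [6.0, 5.5],   # F
])
labels = ["A", "B", "C", "D", "E", "F"]

# linkage() performs the bottom-up merges: at each step it joins the
# two closest clusters ("single" = nearest-neighbor distance).
Z = linkage(points, method="single")

# The dendrogram is the inverted tree recording the merge order.
dendrogram(Z, labels=labels)
plt.title("Agglomerative clustering of A-F")
plt.show()
```

With coordinates like these, B and C merge first, then D and E, mirroring the step-by-step walkthrough above.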

Divisive Hierarchical clustering


Divisive hierarchical clustering is precisely the opposite of agglomerative hierarchical clustering. In divisive hierarchical clustering, we start with all of the data points in a single cluster and, in every iteration, split off the data points that are least similar to the rest of their cluster. In the end, we are left with N clusters, one per data point.
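Classic top-down hierarchical clustering is rarely packaged directly in common libraries. The sketch below assumes scikit-learn's BisectingKMeans (available in scikit-learn 1.1 and later), which follows the same divisive idea: start with one cluster and repeatedly split it. The generated blobs are illustrative.

```python
# Minimal divisive-style sketch using bisecting K-Means, a top-down
# strategy: begin with one cluster and repeatedly bisect until the
# requested number of clusters remains.
import numpy as np
from sklearn.cluster import BisectingKMeans

rng = np.random.default_rng(0)
# Hypothetical data: three loose 2-D blobs.
X = np.vstack([
    rng.normal(loc=(0, 0), scale=0.5, size=(20, 2)),
    rng.normal(loc=(5, 5), scale=0.5, size=(20, 2)),
    rng.normal(loc=(0, 5), scale=0.5, size=(20, 2)),
])

# Each iteration selects one existing cluster and splits it in two.
model = BisectingKMeans(n_clusters=3, random_state=0)
labels = model.fit_predict(X)
print(labels[:10])
```

Note that bisecting K-Means stops at a chosen number of clusters rather than continuing all the way down to N singleton clusters, but the top-down splitting logic is the same.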

DBSCAN Clustering

DBSCAN is a density-based clustering algorithm that groups data points that are closely packed together and marks outliers as noise, based on their density in the feature space. It identifies clusters as dense regions in the data space separated by areas of lower density. Unlike K-Means or hierarchical clustering, which assume clusters are compact and spherical, DBSCAN performs well on real-world data irregularities such as:
• Arbitrary-shaped clusters: clusters can take any shape, not just circular or convex.
• Noise and outliers: it effectively identifies and handles noise points without assigning them to any cluster.

[Figure: DBSCAN clustering vs. K-Means and hierarchical clustering]
The figure above shows the same data set under three clustering algorithms: K-Means and hierarchical clustering recover compact, spherical clusters with varying tolerance to noise, while DBSCAN recovers arbitrary-shaped clusters and explicitly labels noise points.
Key Parameters in DBSCAN
1. eps: This defines the radius of the neighborhood around a data point. If the distance between two points is less than or equal to eps, they are considered neighbors. A common method to determine eps is to analyze the k-distance graph (see the sketch after this list). Choosing the right eps is important:
• If eps is too small, most points will be classified as noise.
• If eps is too large, clusters may merge and the algorithm may fail to distinguish between them.
2. MinPts: This is the minimum number of points required within the eps radius of a point for it to count as part of a dense region. A general rule of thumb is to set MinPts >= D + 1, where D is the number of dimensions in the dataset; in most cases a minimum value of MinPts = 3 is recommended.
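The sketch below puts both parameters to work with scikit-learn's DBSCAN, including the k-distance graph heuristic for choosing eps mentioned above. The two-moons data and the particular eps and MinPts values are illustrative assumptions, not prescriptions.

```python
# Minimal DBSCAN sketch: pick eps from a k-distance graph, then cluster.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons
from sklearn.neighbors import NearestNeighbors

# Two interleaving half-moons: arbitrary-shaped clusters that K-Means
# would struggle with.
X, _ = make_moons(n_samples=300, noise=0.08, random_state=42)

# k-distance graph: sort each point's distance to its k-th nearest
# neighbor; a pronounced "elbow" in this curve suggests a workable eps.
k = 4  # a common choice of MinPts for 2-D data (illustrative)
dists, _ = NearestNeighbors(n_neighbors=k).fit(X).kneighbors(X)
plt.plot(np.sort(dists[:, -1]))
plt.ylabel(f"distance to {k}th nearest neighbor")
plt.show()

# min_samples is scikit-learn's name for MinPts; the label -1 marks noise.
labels = DBSCAN(eps=0.2, min_samples=k).fit_predict(X)
print("clusters:", len(set(labels) - {-1}),
      "| noise points:", int((labels == -1).sum()))
```

On this data, DBSCAN recovers the two moon-shaped clusters and flags the stray points as noise, which is exactly the behavior the two bullet points above describe.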
