MODULE 5
HIERARCHICAL CLUSTERING
      ALGORITHMS
       HIERARCHICAL CLUSTERING ALGORITHMS
Hierarchical Clustering is a method of cluster analysis that builds a hierarchy of
clusters
A hierarchy of clusters is an organized tree-like structure that shows how data points
are progressively grouped into larger and larger clusters
This structure allows us to explore data at
different levels — from individual points to one
big super-cluster
Therefore hierarchical clustering builds clusters
step by step by merging smaller clusters into
bigger ones
1.Bottom-up                         1.Top-up
2.Start with each point as its      2.Start with all points in one
  own cluster and merge the           cluster and recursively split
  closest clusters step by step       them to get individual data
  until a single final cluster is     points
  formed
AGGLOMERATIVE CLUSTERING METHOD
   SINGLE LINKAGE CLUSTERING (MINIMUM LINKAGE)
Definition:
    In single linkage, the distance between two clusters is defined as the shortest distance
   between any pair of points, one from each cluster
                  Here,
                  · DSL is the single linkage distance
                  · Ci , Cj are clusters
                  · d(a,b) is the distance between the elements a and b
SINGLE LINKAGE CLUSTERING (MINIMUM LINKAGE)
       Objects          X           Y
         A              4           3
         B              1           4
         C              2           1
         D              3           8
         E              6           9
         F              5           1
  SINGLE LINKAGE CLUSTERING (MINIMUM LINKAGE)
Step 1: Compute the Distance or Proximity Matrix
   Find the Euclidean Distance between each pairs of points
 SINGLE LINKAGE CLUSTERING (MINIMUM LINKAGE)
Objects        X        Y
          x1       y1
  A            4        3                       2             2
                               d(A,B) = √ (x2-x1) + (y2-y1)
          x2       y2
  B            1        4
  C            2        1   Substituting the values of x and y
  D            3        8                    2       2
                             d(A,B)= √ (1-4) + (4-3) = 3.16
  E            6        9
  F            5        1
 SINGLE LINKAGE CLUSTERING (MINIMUM LINKAGE)
                                      2            2
Objects   X   Y
                  d(A,C)= √ (2-4) + (1-3) = 2.822
                                  2       2
                  d(A,D)= √ (3-4) + (8-3) = 5.09
  A       4   3                   2       2
                   d(A,E)= √ (6-4) + (9-3) = 6.32
  B       1   4                   2       2
                   d(A,F)= √ (5-4) + (1-3) = 2.23
  C       2   1
                  Similarly find
  D       3   8
                  d(B,C), d(B,C), d(B,C), d(B,C)
  E       6   9   d(C,D), d(C,E), d(C,F)
  F       5   1
                  d(E,F)
         DISTANCE MATIRX
    A     B      C      D      E      F
A   0    3.16   2.82   5.09   6.32   2.23
B   --    0     3.16   4.47   7.07    5
C   --    --     0     7.07   8.94    3
D   --    --     --     0     3.16   7.28
E   --    --     --     --     0     8.06
F   --    --     --     --     --     0
  SINGLE LINKAGE CLUSTERING (MINIMUM LINKAGE)
Step 2: Merging the two closest members
    Here the minimum value is 2.23 and hence we combine A and F
    Now, form clusters of elements corresponding to the minimum value and
    update the distance matrix
                                                        D             E
                                              B
                                                            A
                                                  C               F
SINGLE LINKAGE CLUSTERING (MINIMUM LINKAGE)
        A      B      C      D      E      F
  A     0     3.16   2.82   5.09   6.32   2.23
  B     --     0     3.16   4.47   7.07    5
  C     --     --     0     7.07   8.94    3
  D     --     --     --     0     3.16   7.28
  E     --     --     --     --     0     8.06
  F     --     --     --     --     --     0
    SINGLE LINKAGE CLUSTERING (MINIMUM LINKAGE)
     A     B      C      D      E      F             (A,F)    B      C      D      E
A     0   3.16   2.82   5.09   6.32   2.23   (A,F)    0      3.16   2.82   5.09   6.32
B    --    0     3.16   4.47   7.07    5      B       --      0     3.16   4.47   7.07
C    --    --     0     7.07   8.94    3      C       --      --     0     7.07   8.94
D    --    --     --     0     3.16   7.28    D       --      --     --     0     3.16
E    --    --     --     --     0     8.06    E       --      --     --     --     0
F    --    --     --     --     --     0              Cluster Formed = (A,F)
        SINGLE LINKAGE CLUSTERING (MINIMUM LINKAGE)
         (A,F)    B      C      D      E
                                                                 E
(A,F)     0      3.16   2.82   5.09   6.32           D
 B        --      0     3.16   4.47   7.07
                                             B
 C        --      --     0     7.07   8.94
 D        --      --     --     0     3.16               (A,F)
                                                 C
 E        --      --     --     --     0
 SINGLE LINKAGE CLUSTERING (MINIMUM LINKAGE)
        (A,F)    B      C      D      E
                                                      (A,F),C    B      D      E
(A,F)    0      3.16   2.82   5.09   6.32
                                            (A,F),C     0       3.16   5.09   6.32
 B       --      0     3.16   4.47   7.07
                                              B         --       0     4.47   7.07
 C       --      --     0     7.07   8.94
                                              D         --       --     0     3.16
 D       --      --     --     0     3.16
                                              E         --       --     --     0
 E       --      --     --     --     0
                                                  Cluster Formed = {(A,F), C }
   SINGLE LINKAGE CLUSTERING (MINIMUM LINKAGE)
          (A,F),C    B      D      E
                                                 D
                                                        E
(A,F),C     0       3.16   5.09   6.32
  B         --       0     4.47   7.07   B
  D         --       --     0     3.16
                                             (A,F), C
  E         --       --     --     0
   SINGLE LINKAGE CLUSTERING (MINIMUM LINKAGE)
          (A,F),C        B      D          E                      [{(A,F),C}, B]   (D,E)
                    2
(A,F),C     0           3.16   5.09       6.32   [{(A,F),C}, B]         0          4.47
  B         --           0     4.47       7.07      (D,E)              --           0
                                      1
  D         --           --     0         3.16   Cluster Formed = [{(A,F), C }, B]
                                                            and (D, E)
  E         --           --     --         0
                                                             Final Cluster
                                                      { [{(A,F), C }, B] , (D, E) }
DENDROGRAM FOR { [{(A,F), C }, B] , (D, E) }
                         5
     1                           3
 A       F       C   B       D       E