Stat401 ch6
Cluster Analysis
• Cluster analysis classifies observations into a small number of
  homogeneous groups (called clusters) based on observed variables.
  - Cluster analysis partitions observations into mutually exclusive groups.
      e.g.,  Cluster A: {1, 2, 4, …}     Cluster B: {3, 5, 6, …}
   [Figure: dendrogram of agglomerative hierarchical clustering for five objects
    A-E over stages 1-5: C and E merge, then A and D; B joins (C, E); finally
    (A, D) and (B, C, E) fuse into the single cluster (A, B, C, D, E). Reading
    the stages left to right corresponds to agglomeration.]
                 6.2. Agglomerative Hierarchical Clustering Techniques
• An agglomerative hierarchical clustering procedure produces a series of
  partitions of the data, P_n, P_{n-1}, …, P_1.
  - The first, P_n, consists of n single-member clusters.
  - The last, P_1, consists of a single group containing all n observations.
• Illustrative example
  - A dissimilarity matrix for five individuals:
               1    2    3    4    5
         1     0
         2     2    0
    D =  3     6    5    0
         4    10    9    4    0
         5     9    8    5    3    0
          6.2.1. Measuring inter-cluster dissimilarity
• Everitt (2010) illustrates three common measures of inter-cluster dissimilarity:
  - Single linkage: the distance between the closest pair of observations, one
    from each cluster.
  - Complete linkage: the distance between the farthest pair of observations,
    one from each cluster.
  - Group average: the average of all pairwise distances between observations
    in the two clusters.
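The three measures can be sketched in a few lines of Python (the helper names are illustrative, not from the text; `d` is any pairwise distance function on individual observations):

```python
def single_linkage_dist(A, B, d):
    """Nearest neighbour: smallest distance over all pairs (a in A, b in B)."""
    return min(d(a, b) for a in A for b in B)

def complete_linkage_dist(A, B, d):
    """Furthest neighbour: largest distance over all pairs."""
    return max(d(a, b) for a in A for b in B)

def group_average_dist(A, B, d):
    """Mean of all |A|*|B| pairwise distances."""
    return sum(d(a, b) for a in A for b in B) / (len(A) * len(B))

# Demo with the five-individual dissimilarity matrix from Section 6.2:
D = {(1, 2): 2, (1, 3): 6, (1, 4): 10, (1, 5): 9, (2, 3): 5,
     (2, 4): 9, (2, 5): 8, (3, 4): 4, (3, 5): 5, (4, 5): 3}
d = lambda i, j: D[(min(i, j), max(i, j))]

print(single_linkage_dist({1, 2}, {4, 5}, d))    # 8
print(complete_linkage_dist({1, 2}, {4, 5}, d))  # 10
print(group_average_dist({1, 2}, {4, 5}, d))     # 9.0
```

Note that single linkage uses only one extreme pair, while the group average uses every pair, which makes it less sensitive to a single unusually close observation.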
           6.2.2. Illustrative example of the application of single linkage
• Stage 1: Since d12 is the smallest entry in the matrix D, individuals 1 and 2
  are merged to form a cluster.
• Stage 2:
  - The distances between this group and the three remaining individuals, 3, 4,
    and 5, are obtained as follows:
         d(12)3 = min(d13, d23) = d23 = 5,
         d(12)4 = min(d14, d24) = d24 = 9,
         d(12)5 = min(d15, d25) = d25 = 8.
  - Form a new distance matrix D1:

                (12)    3    4    5
         (12)    0
    D1 =  3      5      0
          4      9      4    0
          5      8      5    3    0
  - Since d45 is the smallest entry in D1, individuals 4 and 5 are merged to form
    a second cluster.
• Stage 3:
  - The distances between (45) and the remaining (12) and 3 are obtained as
    follows:
         d(12)(45) = min(d14, d15, d24, d25) = d25 = 8,
         d3(45) = min(d34, d35) = d34 = 4.
  - Form a new distance matrix D2:

                (12)    3   (45)
         (12)    0
    D2 =  3      5      0
         (45)    8      4    0
  - Since d3(45) is the smallest entry in D2, individual 3 is merged with the (45)
    cluster.
• Stage 4:
  - Fusion of the two remaining groups takes place to form a single group
    containing all five individuals.
• The partitions produced at each stage are:

       Stage    Groups
       P5       [1], [2], [3], [4], [5]
       P4       [12], [3], [4], [5]
       P3       [12], [3], [45]
       P2       [12], [345]
       P1       [12345]
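The whole single-linkage procedure above can be sketched from scratch on the five-individual dissimilarity matrix (a toy implementation for illustration; real analyses would use a library such as scipy.cluster.hierarchy):

```python
# Dissimilarity matrix from Section 6.2, stored as upper-triangle entries.
D = {
    (1, 2): 2, (1, 3): 6, (1, 4): 10, (1, 5): 9,
    (2, 3): 5, (2, 4): 9, (2, 5): 8,
    (3, 4): 4, (3, 5): 5, (4, 5): 3,
}

def dist(i, j):
    """Look up the dissimilarity between individuals i and j."""
    return D[(i, j)] if (i, j) in D else D[(j, i)]

def single_linkage(points):
    """Repeatedly merge the two closest clusters; record each fusion."""
    clusters = [frozenset([p]) for p in points]
    history = []
    while len(clusters) > 1:
        # Single linkage: inter-cluster distance = min over all cross pairs.
        a, b = min(
            ((x, y) for idx, x in enumerate(clusters) for y in clusters[idx + 1:]),
            key=lambda ab: min(dist(i, j) for i in ab[0] for j in ab[1]),
        )
        d = min(dist(i, j) for i in a for j in b)
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        history.append((sorted(a), sorted(b), d))
    return history

for a, b, d in single_linkage([1, 2, 3, 4, 5]):
    print(a, "+", b, "merged at distance", d)
```

The recorded fusions reproduce Stages 1-4: (1, 2) at distance 2, (4, 5) at 3, then 3 joins (4, 5) at 4, and finally (1, 2) joins (3, 4, 5) at 5.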
• We are usually not interested in the complete hierarchy but only in one or two
  partitions obtained from it.
  - Determining an appropriate number of groups is not straightforward.
  - In hierarchical clustering, partitions are obtained by ‘cutting’ a dendrogram
    at a chosen height.
• For multivariate data, a commonly used criterion considers three matrices
  that can be calculated for each partition of the data into g groups:

      T = \sum_{i=1}^{g} \sum_{j=1}^{n_i} (x_{ij} - \bar{x})(x_{ij} - \bar{x})',

      W = \sum_{i=1}^{g} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)(x_{ij} - \bar{x}_i)',

      B = \sum_{i=1}^{g} n_i (\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})',
   where x_{ij} is the vector of variable values for the jth observation in the ith group,
         \bar{x} is the mean vector of all n observations,
         \bar{x}_i is the mean vector of the observations in group i,
         and n_i is the number of observations in group i.
• Note that T = W + B,
  where T represents total dispersion, W represents within-group dispersion, and
  B represents between-group dispersion.
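The decomposition T = W + B can be checked numerically. The sketch below uses a single variable (p = 1), so the matrices reduce to scalar sums of squares; the two groups of values are made up purely for the demonstration:

```python
# Two illustrative groups of one-dimensional observations (made-up values).
groups = [[1.0, 2.0, 3.0], [6.0, 7.0, 8.0, 9.0]]
allx = [v for g in groups for v in g]
grand = sum(allx) / len(allx)          # grand mean of all n observations

# Total dispersion: squared deviations from the grand mean.
T = sum((v - grand) ** 2 for v in allx)

# Within-group and between-group dispersion.
W = 0.0
B = 0.0
for g in groups:
    m = sum(g) / len(g)                # group mean
    W += sum((v - m) ** 2 for v in g)
    B += len(g) * (m - grand) ** 2

print(W)                               # 7.0 (within-group sum of squares)
print(abs(T - (W + B)) < 1e-9)         # True: the decomposition holds
```

The same identity holds term-by-term for each entry of the p × p matrices in the multivariate case.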
                    6.3. Optimization Methods
  Written element-wise, the within-group dispersion matrix W is

      W = \begin{pmatrix}
      \sum_{i=1}^{g}\sum_{j=1}^{n_i} (x_{ij1}-\bar{x}_{i1})^2 & \sum_{i=1}^{g}\sum_{j=1}^{n_i} (x_{ij1}-\bar{x}_{i1})(x_{ij2}-\bar{x}_{i2}) & \cdots & \sum_{i=1}^{g}\sum_{j=1}^{n_i} (x_{ij1}-\bar{x}_{i1})(x_{ijp}-\bar{x}_{ip}) \\
      & \sum_{i=1}^{g}\sum_{j=1}^{n_i} (x_{ij2}-\bar{x}_{i2})^2 & \cdots & \sum_{i=1}^{g}\sum_{j=1}^{n_i} (x_{ij2}-\bar{x}_{i2})(x_{ijp}-\bar{x}_{ip}) \\
      & & \ddots & \vdots \\
      & & & \sum_{i=1}^{g}\sum_{j=1}^{n_i} (x_{ijp}-\bar{x}_{ip})^2
      \end{pmatrix}
• The K-means method uses the criterion of minimizing tr(W).
• K-means method:
  (1) Partition the individuals into K initial clusters.
  (2) Proceed through the list of individuals, assigning each individual to the
      cluster whose centroid (mean) is nearest.
      - Distance is usually computed using the Euclidean distance with either
        standardized or unstandardized observations.
  (3) Recalculate the centroid for the cluster receiving the new individual and
      for the cluster losing the individual.
      - If an individual is moved from the initial configuration, the cluster
        centroids (means) must be updated before proceeding.
  (4) Repeat Steps (2) and (3) until no more reassignments take place.
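The steps above can be sketched in Python. This is a batch ("Lloyd"-style) variant that recomputes centroids after a full pass through the list, rather than after each individual move as in Step (3); the function and variable names are illustrative, not from the text:

```python
import random

def kmeans(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step (1): arbitrary initial partition into k clusters.
    labels = [i % k for i in range(len(points))]
    rng.shuffle(labels)
    cents = []
    for _ in range(max_iter):
        # Centroid (mean) of each current cluster.
        cents = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if not members:  # guard: reseed an empty cluster with a random point
                members = [points[rng.randrange(len(points))]]
            cents.append(tuple(sum(v) / len(members) for v in zip(*members)))
        # Step (2): assign each individual to its nearest centroid
        # (squared Euclidean distance).
        new = [min(range(k),
                   key=lambda c: sum((x - m) ** 2 for x, m in zip(p, cents[c])))
               for p in points]
        # Step (4): stop when no reassignment takes place.
        if new == labels:
            break
        labels = new  # Step (3): centroids are refreshed on the next pass.
    return labels, cents
```

For the four-individual example below, `kmeans([(5, 3), (-1, 1), (1, -2), (-3, -2)], 2)` converges to a labeling in which every individual is assigned to its nearest cluster centroid; which locally optimal partition it finds depends on the random initial partition.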
• Note that the Euclidean distance between two arbitrary points P and Q
  with coordinates P = (x_1, x_2, …, x_p) and Q = (y_1, y_2, …, y_p) is

      d(P, Q) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \cdots + (x_p - y_p)^2}.
• Example. K-means method with K = 2

       Individual    X1    X2
           1          5     3
           2         -1     1
           3          1    -2
           4         -3    -2

  (1) Divide the four individuals into 2 clusters.
   - Arbitrarily partition the individuals into two clusters, such as (12) and (34).
   - Compute the coordinates of the cluster centroids:

                         Coordinates of centroid
         Cluster          X1                 X2
          (12)      {5+(-1)}/2 = 2       (3+1)/2 = 2
          (34)      {1+(-3)}/2 = -1   {-2+(-2)}/2 = -2
• Example. K-means method with K = 2 (continued)
  (2) Compute the squared distances:

         d_{1(12)}^2 = (5-2)^2 + (3-2)^2 = 10,
         d_{1(34)}^2 = (5-(-1))^2 + (3-(-2))^2 = 61.

   - Since individual 1 is closer to cluster (12) than to cluster (34),
     it is not reassigned.

         d_{2(12)}^2 = (-1-2)^2 + (1-2)^2 = 10,
         d_{2(34)}^2 = (-1-(-1))^2 + (1-(-2))^2 = 9.
  - Since individual 2 is closer to cluster (34) than to cluster (12),
    it is reassigned to cluster (34).
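The centroid and squared-distance computations in this example can be reproduced directly:

```python
# Coordinates of the four individuals from the K = 2 example.
pts = {1: (5, 3), 2: (-1, 1), 3: (1, -2), 4: (-3, -2)}

def centroid(ids):
    """Coordinate-wise mean of the listed individuals."""
    xs = [pts[i] for i in ids]
    return tuple(sum(c) / len(xs) for c in zip(*xs))

def sqdist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

c12, c34 = centroid([1, 2]), centroid([3, 4])
print(c12, c34)                                  # (2.0, 2.0) (-1.0, -2.0)
print(sqdist(pts[1], c12), sqdist(pts[1], c34))  # 10.0 61.0
print(sqdist(pts[2], c12), sqdist(pts[2], c34))  # 10.0 9.0
```

The printed values match the worked example: individual 1 stays in (12), while individual 2 is nearer the (34) centroid and is reassigned.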
• K-means method:
  - Rather than starting with a partition of all individuals into K preliminary
    groups in Step (1), one could specify K initial centroids and then proceed to
    Step (2).
  - The final assignment of individuals to clusters can depend on the initial
    partition or the initial selection of centroids.
  - To avoid biases, randomly partition the individuals into initial groups or
    randomly select initial centroids.
  - To check the stability of the clustering, it is desirable to rerun the algorithm
    with a new initial partition.
  - If two or more initial centroids inadvertently lie within a single cluster,
    the resulting clusters will be poorly differentiated.
  - The existence of an outlier might produce at least one group with very
    dispersed individuals.
  - Even if the population is known to consist of K groups, data from the rarest
    group may not appear in the sample; forcing the data into K groups would
    lead to nonsensical clusters.
  - It is recommended not to fix the number of clusters, K, in advance.
  - When a single run of the algorithm requires the user to specify K, it is a
    good idea to rerun the algorithm for several choices of K.
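As a toy illustration of comparing several choices of K, tr(W), the within-group sum of squares, can be computed for hand-picked partitions of the four-individual example (the partitions below are illustrative, not produced by the algorithm); tr(W) necessarily shrinks as K grows, so it is used for comparison rather than as an absolute target:

```python
# The four individuals from the K-means example, indexed 0-3.
pts = [(5, 3), (-1, 1), (1, -2), (-3, -2)]

def tr_W(partition):
    """tr(W): within-group sum of squares for a partition (list of index lists)."""
    total = 0.0
    for ids in partition:
        m = [sum(pts[i][d] for i in ids) / len(ids) for d in (0, 1)]
        total += sum((pts[i][d] - m[d]) ** 2 for i in ids for d in (0, 1))
    return total

# Hand-picked partitions for K = 1, 2, 3.
for part in ([[0, 1, 2, 3]], [[0], [1, 2, 3]], [[0], [1], [2, 3]]):
    print(len(part), tr_W(part))   # 1 53.0 / 2 14.0 / 3 8.0
```

The drop from K = 1 to K = 2 is much larger than from K = 2 to K = 3, the kind of "elbow" pattern often used informally when choosing K.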