UPGMA (continued)
Final Step:
Pitfalls in UPGMA
The UPGMA clustering method is very sensitive to unequal
evolutionary rates. This means that when one of the OTUs has
incorporated more mutations over time than the other OTUs, one may
end up with a tree that has the wrong topology.
UPGMA clustering recovers the correct tree only if the data are ultrametric.
Ultrametric distances are those that satisfy the
three-point condition.
What is the three-point condition?
For any three taxa A, B and C: distAC <= max (distAB, distBC). In words: the two
greatest of the three distances are equal. Put differently, UPGMA assumes that the
evolutionary rate is the same for all branches. If this assumption of rate constancy
among lineages does not hold, UPGMA may give an erroneous topology.
Suppose you have the following tree:
Since the divergence of A and B, B has accumulated mutations at a much
higher rate than A. The three-point condition is violated!
e.g. distBD <= max (distBA, distAD) becomes 10 <= max (5, 7) = 7, which is false.
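The check above can be sketched in a few lines of Python. This is only an illustration: the function name `violates_three_point` and the tolerance parameter are hypothetical, and the three distances are the ones quoted in the text (distBA = 5, distAD = 7, distBD = 10).

```python
def violates_three_point(d_xy, d_xz, d_yz, tol=1e-9):
    """Return True if three pairwise distances break the three-point condition.

    The condition holds when the two greatest of the three distances are
    equal (within a small tolerance, a hypothetical parameter).
    """
    largest, second, _ = sorted((d_xy, d_xz, d_yz), reverse=True)
    return largest - second > tol


# Distances for taxa A, B, D as quoted in the text.
print(violates_three_point(5, 7, 10))  # True: 10 > max(5, 7)
```

A triple such as (4, 6, 6) passes the check, because its two greatest distances are equal.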
The reconstruction of the evolutionary history uses the following distance matrix:
Correct Topology
Neighbour Joining Method
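Step 1: For each OTU we calculate the net divergence r(i), the sum of its
distances to all other OTUs (here N = 6 OTUs):
r(A) = 5 + 4 + 7 + 6 + 8 = 30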
r(B) = 42
r(C) = 32
r(D) = 38
r(E) = 34
r(F) = 44
Step 2: Now we calculate a new distance matrix M, using for each pair of OTUs
i and j the formula M(ij) = d(ij) - [r(i) + r(j)] / (N - 2). For example:
M(AB) = d(AB) - [r(A) + r(B)] / (N - 2) = 5 - (30 + 42) / 4 = -13
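As a rough sketch, the net divergences and the matrix M can be computed as follows. The distance matrix itself is not reproduced in the text, so the matrix `D` below is an assumption: it was chosen to reproduce the quoted row sums r(A) to r(F) and the value M(AB) = -13. The function names are hypothetical.

```python
LABELS = ["A", "B", "C", "D", "E", "F"]

# ASSUMED symmetric distance matrix (not shown in the text); it is
# consistent with the quoted sums r(A)=30, r(B)=42, ..., r(F)=44.
D = [
    [0, 5, 4, 7, 6, 8],
    [5, 0, 7, 10, 9, 11],
    [4, 7, 0, 7, 6, 8],
    [7, 10, 7, 0, 5, 9],
    [6, 9, 6, 5, 0, 8],
    [8, 11, 8, 9, 8, 0],
]


def net_divergence(dist):
    """r(i): sum of the distances from OTU i to every other OTU."""
    return [sum(row) for row in dist]


def rate_corrected(dist):
    """M(ij) = d(ij) - [r(i) + r(j)] / (N - 2) for every pair i < j."""
    n = len(dist)
    r = net_divergence(dist)
    return {
        (i, j): dist[i][j] - (r[i] + r[j]) / (n - 2)
        for i in range(n)
        for j in range(i + 1, n)
    }


r = net_divergence(D)
M = rate_corrected(D)
print(dict(zip(LABELS, r)))
print(M[(0, 1)], M[(3, 4)])  # M(AB) and M(DE)
```

Under this assumed matrix, the pairs (A, B) and (D, E) share the smallest value of M.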
Now we start with a star tree:
Step 3: Now we choose as neighbors the two OTUs for which M(ij) is the
smallest. Here two pairs tie for the smallest value: A and B, and D and E.
Let's take A and B as neighbors and form a new node called U. Now we calculate
the branch lengths from the internal node U to the external OTUs A and B:
S(AU) = d(AB) / 2 + [r(A) - r(B)] / [2(N - 2)] = 5/2 + (30 - 42)/8 = 1
S(BU) = d(AB) - S(AU) = 5 - 1 = 4
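The two branch-length formulas translate directly into code. The sketch below uses only values given in the text (d(AB) = 5, r(A) = 30, r(B) = 42, N = 6); the function name `branch_lengths` is hypothetical.

```python
def branch_lengths(d_ab, r_a, r_b, n):
    """Branch lengths from the new node U to the joined OTUs A and B."""
    s_au = d_ab / 2 + (r_a - r_b) / (2 * (n - 2))
    s_bu = d_ab - s_au
    return s_au, s_bu


s_au, s_bu = branch_lengths(d_ab=5, r_a=30, r_b=42, n=6)
print(s_au, s_bu)  # 1.0 4.0
```

The OTU with the larger net divergence (here B) receives the longer branch, which is how the method accommodates unequal rates.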
Step 4: Now we define new distances
from U to each other terminal node:
d(CU) = [d(AC) + d(BC) - d(AB)] / 2 = 3
d(DU) = [d(AD) + d(BD) - d(AB)] / 2 = 6
d(EU) = [d(AE) + d(BE) - d(AB)] / 2 = 5
d(FU) = [d(AF) + d(BF) - d(AB)] / 2 = 7
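A small sketch of this update is given below. The distances from A are read off the sum for r(A), assuming its terms are listed in the order B, C, D, E, F. The distances from B are an assumption (only d(BD) = 10 appears explicitly in the text); they were chosen to match r(B) = 42 and the four results above. The function name is hypothetical.

```python
def distance_to_new_node(d_ak, d_bk, d_ab):
    """d(kU) = [d(Ak) + d(Bk) - d(AB)] / 2 for a remaining OTU k."""
    return (d_ak + d_bk - d_ab) / 2


d_ab = 5
d_a = {"C": 4, "D": 7, "E": 6, "F": 8}    # from the r(A) sum (order assumed)
d_b = {"C": 7, "D": 10, "E": 9, "F": 11}  # ASSUMED, consistent with r(B) = 42

d_u = {k: distance_to_new_node(d_a[k], d_b[k], d_ab) for k in d_a}
print(d_u)  # {'C': 3.0, 'D': 6.0, 'E': 5.0, 'F': 7.0}
```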
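The entire procedure is then repeated starting at step 1, with U replacing A and B
as a single OTU (so N is reduced by one, to 5), until the tree is fully resolved.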
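To show how the four steps repeat, here is a minimal sketch of the whole loop. It is not a reference implementation: the distance matrix is the same assumed matrix that reproduces the quoted r values, the node names U1, U2, ... are hypothetical, and ties in M are broken simply by taking the first pair encountered.

```python
from itertools import combinations

LABELS = ["A", "B", "C", "D", "E", "F"]
# ASSUMED distance matrix (not shown in the text), consistent with r(A)..r(F).
D = [
    [0, 5, 4, 7, 6, 8],
    [5, 0, 7, 10, 9, 11],
    [4, 7, 0, 7, 6, 8],
    [7, 10, 7, 0, 5, 9],
    [6, 9, 6, 5, 0, 8],
    [8, 11, 8, 9, 8, 0],
]


def neighbor_joining(labels, matrix):
    """Return (joins, last_edge); each join is (x, y, new_node, s_x, s_y)."""
    nodes = list(labels)
    # Store distances under unordered pairs so lookups are symmetric.
    d = {frozenset((a, b)): matrix[i][j]
         for (i, a), (j, b) in combinations(enumerate(labels), 2)}
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        # Step 1: net divergence of every current node.
        r = {x: sum(d[frozenset((x, y))] for y in nodes if y != x)
             for x in nodes}

        # Step 2: rate-corrected distance M for a pair of nodes.
        def m(pair):
            x, y = pair
            return d[frozenset(pair)] - (r[x] + r[y]) / (n - 2)

        # Step 3: join the pair with the smallest M (first one on ties).
        x, y = min(combinations(nodes, 2), key=m)
        d_xy = d[frozenset((x, y))]
        s_x = d_xy / 2 + (r[x] - r[y]) / (2 * (n - 2))
        s_y = d_xy - s_x
        u = f"U{len(joins) + 1}"
        # Step 4: distances from the new node to every remaining node.
        for k in nodes:
            if k not in (x, y):
                d[frozenset((k, u))] = (
                    d[frozenset((x, k))] + d[frozenset((y, k))] - d_xy) / 2
        nodes = [k for k in nodes if k not in (x, y)] + [u]
        joins.append((x, y, u, s_x, s_y))
    # The last two nodes are connected by a single branch.
    x, y = nodes
    return joins, (x, y, d[frozenset((x, y))])


joins, last_edge = neighbor_joining(LABELS, D)
for join in joins:
    print(join)
print(last_edge)
```

With six OTUs the loop performs four joins before two nodes remain, and the first join reproduces the A and B step worked through above.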