0% found this document useful (0 votes)
19 views2 pages

Solutions Hfman

The document presents a proof of the optimality of Huffman codes, detailing an algorithm (HUF) that constructs an optimal binary tree for a given alphabet and frequency function. It includes lemmas that demonstrate the properties of optimal trees, particularly that the two symbols with the lowest frequencies must be siblings. The proof is structured through induction on the size of the alphabet, confirming that the algorithm produces the minimum average weighted path length for the tree.

Uploaded by

rayrup123
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
19 views2 pages

Solutions Hfman

The document presents a proof of the optimality of Huffman codes, detailing an algorithm (HUF) that constructs an optimal binary tree for a given alphabet and frequency function. It includes lemmas that demonstrate the properties of optimal trees, particularly that the two symbols with the lowest frequencies must be siblings. The proof is structured through induction on the size of the alphabet, confirming that the algorithm produces the minimum average weighted path length for the tree.

Uploaded by

rayrup123
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 2

Proof of Optimality of Huffman Codes

CSC373 Spring 2009

1 Problem
P are given an alphabet A and a frequency function f : A → (0, 1) such that
You
x f (x) = 1. Find a binary tree T with |A| leaves (each leaf corresponding to
a unique symbol) that minimizes
X
ABL(T ) = f (x)depth(x)
leaves of T

Such a tree is called optimal.

2 Algorithm
HUF(A, f )
If |A| = 1 then return a single vertex.
Let w and y be the symbols with the lowest frequencies.
Let A0 = A \ {w, y} + {z}.
Let f 0 (x) = f (x) for all x ∈ A0 \ {z}, and let f 0 (z) = f (w) + f (y).
T 0 = HUF(A0 , f 0 ).
Create T from T 0 by adding w and y as children of z.
return T

3 Proof
Lemma 1 Let T be a tree for some f and A, and let y and w be two leaves. Let
T 0 be the tree obtained from T by swapping y and w. Then ABL(T 0 )−ABL(T ) =
(f (y) − f (w))(depth(w, T ) − depth(y, T )).

Proof

ABL(T 0 ) − ABL(T ) = f (y)depth(w, T ) + f (w)depth(y, T ) − f (w)depth(w, T ) − f (y)depth(y, T )


= f (y)(depth(w, T ) − depth(y, T )) + f (w)(depth(y, T ) − depth(w, T ))
= (f (y) − f (w))(depth(w, T ) − depth(y, T ))

Lemma 2 There exists an optimal tree such that the two symbols with the lowest
frequencies are siblings.

Proof Let T be an optimal tree. Let w and y be two symbols with the lowest
frequencies. If there is more than one symbol that has the lowest frequency, then

1
take two that have the biggest depth. If w and y are siblings, then we are done.
Otherwise, suppose without loss of generality, that depth(w, T ) ≥ depth(y, T ).
We have three cases:

• w has a sibling z. Let T 0 be the tree created from T by swapping z and


y, and thus making w and y siblings. By applying Lemma 1, we get that
ABL(T 0 ) ≤ ABL(T ). Since T is optimal, there cannot be another tree
with a smaller cost, and so ABL(T 0 ) = ABL(T ). Thus T 0 is also optimal.

• w is an only child. Create T 0 by removing w’s leaf and assigning w to


its old parent. T 0 is cheaper than T , contradiction the optimality of T .
Hence, this case is not possible.

• There exists a node z at a depth bigger then w. Create T 0 by swapping


w and z. By our choice of w, f (w) < f (z), so, applying Lemma 1, we
have that T 0 is cheaper than T , a contradiction. Hence, this case is not
possible.

Theorem 3 The algorithm HUF(A, f ) computes an optimal tree for frequencies


f and alphabet A.

Proof The proof is by induction on the size of the alphabet. The induction
hypothesis is that for all A with |A| = n and for all frequencies f , HUF(A, f )
computes the optimal tree.
In the base case (n = 1), the tree is only one vertex and the cost is zero,
which is the smallest possible.
For the general case, assume that the induction hypothesis holds for n − 1.
That is, T 0 is optimal for A0 and f 0 . First, let us show the following:
X
ABL(T ) = ( f (x)depth(x, T )) + f (w)depth(w, T ) + f (y)depth(y, T )
x∈A\{w,y}
X
=( f (x)depth(x, T )) + (f (w) + f (y))(depth(z, T 0 ) + 1)
x∈A\{w,y}
X
=( f (x)depth(x, T )) + f 0 (z)depth(z, T 0 ) + f (w) + f (y)
x∈A\{w,y}
X
=( f 0 (x)depth(x, T 0 )) + f (w) + f (y)
x∈A0
= ABL(T 0 ) + f (w) + f (y)

Now, assume for the sake of contradiction that T is not optimal, and let Z be
an optimal tree that has w and y as siblings (this exists by the above lemma).
Let Z 0 be the tree obtained from Z by removing w and y. We can view Z 0 as
a tree for the alphabet A0 and frequency function f 0 . We can then repeat the
calculation above and get ABL(Z) = ABL(Z 0 ) + f (w) + f (y). So, ABL(T 0 ) =
ABL(T )−f (w)−f (y) > ABL(Z)−f (w)−f (y) = ABL(Z 0 ). Since T 0 is optimal
for A0 and f 0 , this is a contradiction.

You might also like