COS597D: Information Theory in Computer Science September 21, 2011
Lecture 2
Lecturer: Mark Braverman Scribe: Mark Braverman∗
In the last lecture, we introduced entropy H(X) and conditional entropy H(X|Y), and showed how they
are related via the chain rule. We also proved the inequality H(X1, . . . , Xn) ≤ H(X1) + · · · + H(Xn), and
showed that H(X|Y) ≤ H(X), with equality if and only if X and Y are independent. By a similar argument,
one can show the more general statement:
Lemma 1. H(X|Y Z) ≤ H(X|Y ).
In this lecture, we will derive one more useful inequality and then give some examples of applying entropy
in combinatorics. We start with some simple examples.
1 Warm-up examples
Example 2. Prove that for any deterministic function g(y), H(X|Y ) ≤ H(X|g(Y )).
Solution We have
H(X|Y ) = H(X|Y g(Y )) ≤ H(X|g(Y )).
Here the equality holds because the value of Y completely determines the value of g(Y ). The inequality is
an application of Lemma 1.
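As a quick numerical sanity check of Example 2, here is a minimal Python sketch. The joint distribution and the coarsening g(y) = y mod 2 are arbitrary illustrative choices, not anything from the lecture.

```python
from math import log2
from collections import defaultdict

# A small joint distribution p(x, y) on {0,1} x {0,1,2,3}, chosen arbitrarily.
p = {(0, 0): 0.10, (0, 1): 0.15, (0, 2): 0.05, (0, 3): 0.20,
     (1, 0): 0.05, (1, 1): 0.10, (1, 2): 0.25, (1, 3): 0.10}

def cond_entropy(joint):
    # H(X|Y) = sum_{x,y} p(x,y) * log( p(y) / p(x,y) ), for a dict {(x, y): prob}.
    py = defaultdict(float)
    for (_, y), q in joint.items():
        py[y] += q
    return sum(q * log2(py[y] / q) for (x, y), q in joint.items() if q > 0)

def g(y):
    return y % 2                      # an arbitrary deterministic function of Y

pg = defaultdict(float)               # joint distribution of (X, g(Y))
for (x, y), q in p.items():
    pg[(x, g(y))] += q

# H(X|Y) <= H(X|g(Y)), as in Example 2.
print(cond_entropy(p), "<=", cond_entropy(dict(pg)))
```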
Example 3. Consider a bin with n balls of various colors. In one experiment k ≤ n balls are taken out
with replacement. In another the balls are taken out without replacement. In which case does the resulting
sequence have higher entropy?
Solution One quick way to get the answer (but not the proof) is to test the question for very small
numbers. Suppose n = k = 2 and the bin contains one red and one blue ball. Then the possible outcomes
of the first experiment are RR, RB, BR, and BB – all with equal probabilities. Hence the entropy of the
first experiment is 2 bits. The possible outcomes of the second experiment are RB and BR, and hence the
entropy is only 1 bit. Thus we should try to prove that the first experiment has the higher entropy.
Denote the random variables representing the colors of the balls in the first experiment by X1 , . . . , Xk
and in the second experiment by Y1 , . . . , Yk . The first experiment is run with replacements, and hence all
the Xi ’s are independent and identically distributed. Thus we have:
H(X1 . . . Xk ) = H(X1 )+H(X2 |X1 )+. . .+H(Xk |X1 . . . Xk−1 ) = H(X1 )+H(X2 )+. . .+H(Xk ) = k ·H(X1 ),
where the first equality is just the Chain Rule, and the second equality follows from the fact that the variables
are independent.
Next, we observe that the distribution of the i-th ball drawn in the second experiment is the same
as the original distribution of the colors in the bag, which is the same as the distribution of X1 . Thus
H(Yi ) = H(X1 ) for all i. Finally, we get
H(Y1 . . . Yk ) = H(Y1 ) + H(Y2 |Y1 ) + . . . + H(Yk |Y1 . . . Yk−1 ) ≤ H(Y1 ) + H(Y2 ) + . . . + H(Yk ) = k · H(X1 ),
which shows that the second experiment has the lower entropy. Note that equality holds only when the Yi ’s
are independent, which would only happen if all the balls are of the same color, and thus both entropies are
0!
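Here is a small Python check of this conclusion for a hypothetical bin with two red balls and one blue ball and k = 2 draws; the without-replacement entropy is computed by brute-force enumeration of the equally likely orderings.

```python
from math import log2
from itertools import permutations
from collections import Counter

balls = ["R", "R", "B"]          # hypothetical bin: 2 red, 1 blue
k = 2                            # number of balls drawn

def entropy(dist):
    return sum(-p * log2(p) for p in dist.values() if p > 0)

# With replacement: the draws are i.i.d., so the entropy is k * H(one draw).
one_draw = {c: v / len(balls) for c, v in Counter(balls).items()}
h_with = k * entropy(one_draw)

# Without replacement: enumerate all orderings of the balls (each equally
# likely) and keep the colors of the first k draws.
seqs = Counter(perm[:k] for perm in permutations(balls))
total = sum(seqs.values())
h_without = entropy({s: c / total for s, c in seqs.items()})

print(h_with, ">=", h_without)   # approximately 1.84 >= 1.58
```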
∗ Based on lecture notes by Anup Rao and Kevin Zatloukal
Example 4. Let X and Y be two (not necessarily independent) random variables. Let Z be a random
variable that is obtained by tossing a fair coin, and setting Z = X if the coin comes up heads and Z = Y if
the coin comes up tails. What can we say about H(Z) in terms of H(X) and H(Y )?
Solution We claim that
(H(X) + H(Y ))/2 ≤ H(Z) ≤ (H(X) + H(Y ))/2 + 1.
These bounds are tight. To see this, consider the case when X = 0 and Y = 1 are two constant
random variables, so that H(X) = H(Y) = 0. We then have Z ∼ B_{1/2} distributed as a fair coin with
H(Z) = 1 = (H(X) + H(Y ))/2 + 1. At the same time, if X and Y are any i.i.d. (independent, identically
distributed) random variables, then Z will have the same distribution as X and Y and thus H(Z) = H(X) =
H(Y ) = (H(X) + H(Y ))/2.
To prove the inequalities denote by S the outcome of the coin toss when we select whether Z = X or
Z = Y . Then we have
H(Z) ≥ H(Z|S) = (1/2)·H(Z|S = 0) + (1/2)·H(Z|S = 1) = (1/2)·H(X) + (1/2)·H(Y).
For the other direction, note that
H(Z) ≤ H(ZS) = H(S) + H(Z|S) = 1 + (1/2)·H(X) + (1/2)·H(Y).
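The following Python sketch checks both bounds numerically. The marginal distributions pX and pY are arbitrary illustrative choices, and the coin is assumed independent of (X, Y), so the distribution of Z is simply the average of the two marginals.

```python
from math import log2

def entropy(dist):
    return sum(-p * log2(p) for p in dist.values() if p > 0)

# Hypothetical marginal distributions of X and Y over {0, 1, 2}.
pX = {0: 0.7, 1: 0.2, 2: 0.1}
pY = {0: 0.1, 1: 0.1, 2: 0.8}

# Z = X with prob 1/2 and Z = Y with prob 1/2 (coin independent of X and Y),
# so the distribution of Z is the average of the two marginals.
pZ = {v: 0.5 * pX.get(v, 0) + 0.5 * pY.get(v, 0) for v in set(pX) | set(pY)}

lower = (entropy(pX) + entropy(pY)) / 2
print(lower, "<=", entropy(pZ), "<=", lower + 1)
```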
2 One more inequality
We show that the uniform distribution has the highest entropy. In fact, we will see that a distribution has
the maximum entropy if and only if it is uniform.
Lemma 5. Let 𝒳 be the support of X. Then H(X) ≤ log |𝒳|.
Proof We write H(X) as an expectation and then apply Jensen's inequality (from Lecture 1) to the
concave function log t, evaluated at the random variable 1/p(X):

H(X) = \sum_{x∈𝒳} p(x) log(1/p(x)) ≤ log ( \sum_{x∈𝒳} p(x)·(1/p(x)) ) = log ( \sum_{x∈𝒳} 1 ) = log |𝒳|.
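For the "if" part of the equality claim at the beginning of this section: if X is uniform on 𝒳, then p(x) = 1/|𝒳| for every x, and

H(X) = \sum_{x∈𝒳} (1/|𝒳|) · log |𝒳| = log |𝒳|,

so the bound of Lemma 5 is attained exactly by the uniform distribution.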
3 Applications
3.1 Bounding the Binomial Tail
Suppose 2^n + 1 people have each watched a subset of n movies. Since there are only 2^n possible subsets
of these movies, there must be two people that have watched exactly the same subset by the pigeonhole
principle.
We shall show how to argue something similar in less trivial cases where the pigeonhole principle does
not apply. This first example will show one way to do that using information theory.
Suppose 2^n people have each watched a subset of 2n movies and every person has watched at least 90%
of the movies. If the number of possible subsets meeting this constraint is less than 2^n, then we must have
two people who have watched exactly the same subset of movies, as before. The following result will give us
what we need.
Lemma 6. If k ≤ n/2, then \sum_{i=0}^{k} \binom{n}{i} ≤ 2^{nH(k/n)}.

We would like to compute \sum_{i=0.9(2n)}^{2n} \binom{2n}{i}. Since \binom{2n}{i} = \binom{2n}{2n−i}, this sum is equal to
\sum_{i=0}^{0.1(2n)} \binom{2n}{i} ≤ 2^{2nH(1/10)} < 2^n, since we can compute H(0.1) < 0.469 < 0.5.
It remains only to prove the lemma.
Proof Let X1 X2 · · · Xn be a uniformly random string sampled from the set of n-bit strings of weight
at most k. Thus, H(X1 X2 · · · Xn ) = log \sum_{i=0}^{k} \binom{n}{i}. Further, we have that Pr[Xi = 1] = E[Xi ]. By
symmetry, this probability is the same for every i, and hence equal to (1/n) \sum_{j=1}^{n} E[Xj ] = (1/n) E[\sum_{j=1}^{n} Xj ] ≤ k/n,
where we have used linearity of expectation. We can relate this to the entropy using the fact that H(p)
is monotone increasing for 0 ≤ p ≤ 1/2: since Pr[Xi = 1] ≤ k/n ≤ 1/2, we get H(Xi ) ≤ H(k/n). Finally,
applying our inequality from last time, we see that H(X1 X2 · · · Xn ) ≤ H(X1 ) + · · · + H(Xn ) ≤ nH(k/n).
Thus, we have shown that log \sum_{i=0}^{k} \binom{n}{i} ≤ nH(k/n), proving the lemma.
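A brief Python check of the lemma, and of the constant H(0.1) used in the movie example; the parameter values below are arbitrary.

```python
from math import comb, log2

def H(p):
    # Binary entropy in bits.
    return 0.0 if p in (0, 1) else -p * log2(p) - (1 - p) * log2(1 - p)

# Lemma 6: sum_{i=0}^{k} C(n, i) <= 2^{n H(k/n)} whenever k <= n/2.
for n in (10, 20, 50):
    for k in range(n // 2 + 1):
        assert sum(comb(n, i) for i in range(k + 1)) <= 2 ** (n * H(k / n))

# The constant used in the movie example: H(0.1) < 0.469 < 0.5.
print(H(0.1))  # approximately 0.46899
```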
3.2 Triangles and Vees [KR10]
Let G = (V, E) be a directed graph. We say that vertices (x, y, z) (not necessarily distinct) form a triangle
if {(x, y), (y, z), (z, x)} ⊂ E. Similarly, we say they form a vee if {(x, y), (x, z)} ⊂ E. Let T be the number
of triangles in the graph, and V be the number of vees.
We are interested in the relationship between V and T . From any particular triangle, we can get one vee
from each edge, say (x, y), by repeating the second vertex: (x, y, y) is a vee. If the vertices of a triangle are
distinct, the number of vees in the triangle is equal to the number of triangles contributed by the vertices
of the triangle, since the three cyclic permutations of (x, y, z) are distinct triangles. However, the same edge
could be used in many different triangles, so that this simple counting argument does not tell us anything
about the relationship between V and T . We shall use an information theory based argument to show:
Lemma 7. In any directed graph, V ≥ T .
Proof Let (X, Y, Z) be a uniformly random triangle. Then by the chain rule, log(T ) = H(X, Y, Z) =
H(X) + H(Y |X) + H(Z|X, Y ). We will construct a distribution on vees with entropy at least log T , which,
together with Lemma 5, implies that log V ≥ log T and hence V ≥ T .
Since conditioning can only reduce entropy, we have that H(X, Y, Z) ≤ H(X) + H(Y |X) + H(Z|Y ). Now
observe that by symmetry, the joint distribution of Y, Z is exactly the same as that of X, Y . Thus we can
simplify the bound to log T ≤ H(X) + 2H(Y |X).
Sample a random vee, (A, B, C) with the distribution
q(a, b, c) = Pr[X = a] · Pr[Y = b|X = a] · Pr[Y = c|X = a].
In words, we sample the first vertex with the same distribution as X, and then sample two independent copies
of Y to use as the second and third vertices. Observe that if q(a, b, c) > 0, then (a, b, c) must be a vee, so this
is a distribution on vees. On the other hand, the entropy of this distribution is H(A)+H(B|A)+H(C|AB) =
H(A) + H(B|A) + H(C|A) = H(X) + 2H(Y |X), which is at least log T as required.
The lemma is tight. Consider a graph with 3n vertices partitioned into three sets A, B, C with |A| =
|B| = |C| = n. Suppose that the edges are {(a, b) | a ∈ A, b ∈ B} and similarly for B, C and C, A. For each
triple (a, b, c) with a ∈ A, b ∈ B, c ∈ C, we get three triangles – (a, b, c), (b, c, a), and (c, a, b) – so T = 3n^3.
On the other hand, each vertex a ∈ A is involved in n^2 vees of the form (a, b1 , b2 ), and similarly for B, C. So V = 3n^3.
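A brute-force Python check of Lemma 7 on small random directed graphs, together with the tight tripartite example above; the graph size and edge density are arbitrary choices.

```python
import random
from itertools import product

def count_triangles_and_vees(n, edges):
    # Ordered triples with vertices not necessarily distinct, as defined above.
    T = sum((x, y) in edges and (y, z) in edges and (z, x) in edges
            for x, y, z in product(range(n), repeat=3))
    V = sum((x, y) in edges and (x, z) in edges
            for x, y, z in product(range(n), repeat=3))
    return T, V

# Random directed graphs (no self-loops): V >= T should always hold.
random.seed(0)
for _ in range(100):
    n = 5
    edges = {(u, v) for u in range(n) for v in range(n)
             if u != v and random.random() < 0.4}
    T, V = count_triangles_and_vees(n, edges)
    assert V >= T, (T, V)

# The tight example from the text: three parts of size n with a cyclic edge pattern.
n = 3
A, B, C = range(0, n), range(n, 2 * n), range(2 * n, 3 * n)
tri = ({(a, b) for a in A for b in B} | {(b, c) for b in B for c in C}
       | {(c, a) for c in C for a in A})
T, V = count_triangles_and_vees(3 * n, tri)
assert T == V == 3 * n ** 3
print("V >= T held on all samples; tight example gives T = V =", T)
```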
3.3 Counting Perfect Matchings [Rad97]
Suppose that we have a bipartite graph G = (A∪B, E), with |A| = |B| = n. Let A = [n]. A perfect matching
is a subset of E that is incident to every vertex exactly once. Hence, it is a bijection between the sets A
and B. How many possible perfect matchings can there be in a given graph? If we let d_v be the degree of
vertex v, then a trivial bound on the number of perfect matchings is \prod_{i∈A} d_i. A tighter bound was proved
by Brégman:

Theorem 8 (Brégman). The number of perfect matchings is at most \prod_{i∈A} (d_i!)^{1/d_i}.

To see that this is tight, consider the complete bipartite graph. Any bijection can be chosen, and the
number of bijections is the number of permutations of n letters, which is n!. In this case, the bound of the
theorem is \prod_{i=1}^{n} (n!)^{1/n} = (n!)^{(1/n)·n} = n!.
We give a simple proof of this theorem using information theory, due to Radhakrishnan [Rad97].
Proof Let ρ be a uniformly random perfect matching, and for simplicity, assume that A = {1, 2, . . . , n}.
Write ρ(i) for the neighbor of i under the matching ρ. Given a permutation τ, we write τ(≤ i) to denote the
concatenation τ(i), τ(i − 1), . . . , τ(1).
Then, using the chain rule, the fact that conditioning only reduces entropy, and the fact that the entropy
of a variable is at most the logarithm of the size of its support,
H(ρ) = \sum_{i=1}^{n} H(ρ(i) | ρ(≤ i−1)) ≤ \sum_{i=1}^{n} H(ρ(i)) ≤ \sum_{i=1}^{n} log d_i = log \prod_{i=1}^{n} d_i.
This proves that the number of matchings is at most \prod_{i=1}^{n} d_i. Can we improve the proof? In computing
H(ρ(i) | ρ(≤ i−1)), we are losing too much by throwing away all the previous values.
To improve the bound, let us start by symmetrizing over the order in which we condition the individual
values of ρ. If π is any permutation, then conditioning in the order π(1), π(2), . . . , π(n) shows that
H(ρ) = \sum_{i=1}^{n} H(ρπ(i) | ρπ(≤ i−1)), where ρπ(i) denotes ρ(π(i)). Since this is true for any choice of π,
we can take the expectation over a uniformly random choice of π without changing the value. Let L be a
uniformly random index in {1, 2, . . . , n}. Then,

H(ρπ(L) | L, π, ρπ(≤ L−1)) = (1/n) \sum_{i=1}^{n} H(ρπ(i) | π, ρπ(≤ i−1)) = (1/n)·H(ρ).
Now we rewrite this quantity according to the contribution of each vertex in A:
H(ρπ(L) | L, π, ρπ(≤ L−1)) = \sum_{i=1}^{n} Pr[π(L) = i] · E_{π, L s.t. π(L)=i} [ H(ρπ(L) | L, π, ρπ(≤ L−1)) ]

= \sum_{i=1}^{n} (1/n) · E_{π, L s.t. π(L)=i} [ H(ρπ(L) | L, π, ρπ(≤ L−1)) ].
Consider any fixed perfect matching ρ. We are interested in the number of possible choices for ρπ(L)
conditioned on π(L) = i, after ρπ(1), . . . , ρπ(L − 1) have been revealed. Let a_1, a_2, . . . , a_{d_i−1} be such that
the set of neighbors of i in the graph is exactly {ρ(a_1), ρ(a_2), . . . , ρ(a_{d_i−1}), ρ(i)}. The pair π, L in the
expectation can be sampled by first sampling a uniformly random permutation π and then setting L so that
π(L) = i. Thus, the ordering of {a_1, a_2, . . . , a_{d_i−1}, i} induced by π is uniformly random, and

|{ρ(a_1), ρ(a_2), . . . , ρ(a_{d_i−1})} ∩ {ρπ(L − 1), . . . , ρπ(1)}| = |{a_1, a_2, . . . , a_{d_i−1}} ∩ {π(L − 1), . . . , π(1)}|

is equally likely to be 0, 1, 2, . . . , d_i − 1. The number of available choices for ρ(π(L)) is thus equally likely to
be bounded by 1, 2, . . . , d_i. This allows us to bound
(1/n)·H(ρ) = (1/n) \sum_{i=1}^{n} E_{π, L s.t. π(L)=i} [ H(ρπ(L) | L, π, ρπ(≤ L−1)) ]

≤ (1/n) \sum_{i=1}^{n} \sum_{j=1}^{d_i} (1/d_i) log j = (1/n) \sum_{i=1}^{n} log ( (d_i!)^{1/d_i} ) = (1/n) log \prod_{i∈A} (d_i!)^{1/d_i},

which proves that the number of perfect matchings is at most \prod_{i∈A} (d_i!)^{1/d_i}.
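As a sanity check, the following Python sketch brute-forces the number of perfect matchings in small random bipartite graphs and compares it with both the trivial bound \prod_{i∈A} d_i and Brégman's bound; the graph size and edge probability are arbitrary.

```python
import random
from itertools import permutations
from math import factorial, prod

def count_perfect_matchings(n, edges):
    # Brute force: a perfect matching is a bijection A -> B using only edges of G.
    return sum(all((i, sigma[i]) in edges for i in range(n))
               for sigma in permutations(range(n)))

random.seed(1)
for _ in range(50):
    n = 5
    edges = {(a, b) for a in range(n) for b in range(n) if random.random() < 0.6}
    deg = [sum((a, b) in edges for b in range(n)) for a in range(n)]
    if any(d == 0 for d in deg):
        continue                       # no perfect matching can exist; skip
    matchings = count_perfect_matchings(n, edges)
    assert matchings <= prod(deg)                                          # trivial bound
    assert matchings <= prod(factorial(d) ** (1 / d) for d in deg) + 1e-9  # Bregman
print("both bounds held on all samples")
```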
References
[KR10] Swastik Kopparty and Benjamin Rossman. The homomorphism domination exponent. Technical
report, arXiv, April 14, 2010.

[Rad97] Jaikumar Radhakrishnan. An entropy proof of Bregman's theorem. J. Comb. Theory, Ser. A,
77(1):161–164, 1997.