
Problem sheet 1, Information Theory, MT 2022

Designed for the first tutorial class

Question 1 We are given a deck of n cards in order 1, 2, · · · , n. Then a randomly chosen card
is removed and placed at a random position in the deck. What is the entropy of the resulting
deck of cards?

Answer 1 The removed card is chosen uniformly among the n cards, and it is re-inserted uniformly at one of the n possible positions. So there are n² equally likely actions, each with probability 1/n², and several actions can result in the same outcome, as follows:

(a) The original order results whenever the picked card is returned to its original position; there are n such actions, so the probability of the original order is n/n² = 1/n;

(b) a swap of two adjacent cards results from exactly two different actions (either of the two cards can be the one that is moved). There are n − 1 such results, each with probability 2/n²;

(c) a card is moved at least two positions away from its original place: each such action gives a distinct outcome, so there are n² − n − 2(n − 1) = n² − 3n + 2 possible results, each with probability 1/n².

So there are 1 + (n − 1) + (n² − 3n + 2) different results with probabilities as above, and (with logarithms in base 2) the entropy is
\[
\begin{aligned}
H(\text{“deck”}) &= \frac{1}{n}\log(n) + (n-1)\,\frac{2}{n^2}\log\!\big(n^2/2\big) + (n^2-3n+2)\,\frac{1}{n^2}\log\!\big(n^2\big) \\
&= \frac{1}{n^2}\Big( n\log(n) + 2(n-1)\big(2\log(n)-1\big) + (n^2-3n+2)\,2\log(n) \Big) \\
&= \frac{2n-1}{n}\log(n) - \frac{2n-2}{n^2}.
\end{aligned}
\]
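The closed form can be sanity-checked numerically. Below is a small Python sketch (not part of the original sheet) that enumerates all n² equally likely pick-and-insert actions for small n, computes the entropy of the resulting deck directly, and compares it with the formula above; base-2 logarithms are assumed, matching the sheet.

```python
# Brute-force check of Question 1: enumerate all n^2 equally likely
# "remove card i, re-insert at position j" actions and compute the
# entropy of the distribution over resulting decks.
from collections import Counter
from math import log2

def shuffle_entropy(n):
    counts = Counter()
    for i in range(n):           # index of the card that is removed
        for j in range(n):       # index at which it is re-inserted
            deck = list(range(n))
            card = deck.pop(i)
            deck.insert(j, card)
            counts[tuple(deck)] += 1
    total = n * n
    return -sum((c / total) * log2(c / total) for c in counts.values())

for n in range(2, 8):
    exact = (2 * n - 1) / n * log2(n) - (2 * n - 2) / n**2
    print(n, round(shuffle_entropy(n), 6), round(exact, 6))
```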

Question 2 (Pooling inequalities) Let a ≥ 0, b ≥ 0 be given with a + b > 0. Show that

\[
-(a+b)\log(a+b) \;\le\; -a\log(a) - b\log(b) \;\le\; -(a+b)\log\!\Big(\frac{a+b}{2}\Big),
\]
and that the first inequality becomes an equality iff ab = 0, the second inequality becomes an
equality iff a = b.

Answer 2 Denote p = a/(a + b). Dividing all three terms by a + b and then adding log(a + b) to each, the inequalities are equivalent to
\[
0 \;\le\; -p\log(p) - (1-p)\log(1-p) \;\le\; -\log\!\Big(\frac{1}{2}\Big) = 1,
\]
which is exactly the basic property that the binary entropy H(p) satisfies 0 ≤ H(p) ≤ 1. Moreover, H(p) = 0 iff p ∈ {0, 1}, i.e. iff ab = 0, and H(p) = 1 iff p = 1/2, i.e. iff a = b, which gives the two equality cases.
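As a quick illustration (not part of the original solution), the following Python sketch checks the pooling inequalities on random pairs (a, b), using the convention 0 log 0 = 0 and base-2 logarithms as in the rest of the sheet.

```python
# Numerical spot-check of the pooling inequalities on random non-negative pairs.
import random
from math import log2

def nlogn(t):
    # t * log2(t), with the convention 0 * log 0 = 0.
    return 0.0 if t == 0 else t * log2(t)

random.seed(0)
for _ in range(10_000):
    a, b = random.random(), random.random()
    left = -nlogn(a + b)
    middle = -nlogn(a) - nlogn(b)
    right = -(a + b) * log2((a + b) / 2)
    assert left <= middle + 1e-12 and middle <= right + 1e-12, (a, b)
print("pooling inequalities hold on all sampled pairs")
```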

Question 3 Let X, Y, Z be discrete random variables. Prove or provide a counterexample to the following statements:

(a) H(X) = H(42X);

(b) H(X|Y ) ≥ H(X|Y, Z);

(c) H(X, Y ) = H(X) + H(Y ).

Answer 3 (a) is true: f(x) = 42x is a bijection, and entropy is unchanged under a bijective relabelling of the values, so H(42X) = H(X).

(b) is true: H(X|Y) − H(X|Y, Z) = I(X; Z|Y) ≥ 0, since conditional mutual information is non-negative; intuitively, conditioning on additional information can only reduce the remaining uncertainty about X.

(c) is wrong in general: by the chain rule, H(X, Y) = H(X) + H(Y|X), and H(Y|X) = H(Y) if and only if X and Y are independent. An easy counterexample is Y = X with H(X) > 0, for which H(X, Y) = H(X, X) = H(X) < H(X) + H(Y).
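For a concrete instance of the counterexample in (c), here is a small Python sketch (an illustration, not part of the original sheet) with X uniform on {0, 1} and Y = X, so that H(X, Y) = 1 bit while H(X) + H(Y) = 2 bits.

```python
# Numerical illustration of Answer 3(c): Y = X makes H(X,Y) < H(X) + H(Y).
from math import log2

def entropy(pmf):
    # Shannon entropy in bits of a dict mapping outcome -> probability.
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

p_x = {0: 0.5, 1: 0.5}                      # X uniform on {0, 1}
p_xy = {(x, x): p for x, p in p_x.items()}  # joint law of (X, Y) when Y = X

print("H(X)        =", entropy(p_x))        # 1.0
print("H(X,Y)      =", entropy(p_xy))       # 1.0, since Y is a function of X
print("H(X) + H(Y) =", 2 * entropy(p_x))    # 2.0 > H(X,Y)
```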

Question 4 Does there exist a discrete random variable X with a distribution such that
H(X) = +∞? If so, describe it as explicitly as possible.

Answer 4 Obviously, H(X) < +∞ whenever the image space is finite, so we take the image space to be the natural numbers n ≥ 2. Here is an explicit example:
\[
P(X = n) = \frac{c}{n\log^2(n)}, \qquad n \ge 2, \qquad \text{with } c = \Big(\sum_{n\ge 2}\frac{1}{n\log^2(n)}\Big)^{-1} > 0
\]
(the normalising series converges). Then
\[
H(X) = \sum_{n\ge 2} \frac{c}{n\log^2(n)}\Big[\log\!\big(n\log^2(n)\big) - \log(c)\Big]
     = \sum_{n\ge 2} \Big[\frac{c}{n\log(n)} + \frac{2c\log(\log(n))}{n\log^2(n)}\Big] - \log(c) = +\infty,
\]
since log(log(n)) → +∞ (so the middle terms are eventually non-negative) and \(\sum_{n\ge 2}\frac{1}{n\log(n)} = +\infty\).
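The divergence is very slow (the partial sums grow roughly like log log N), which the following Python sketch illustrates. It is only an illustration: the normalising constant is computed from a truncated sum, an approximation that plays no role in the proof above.

```python
# Partial sums of -sum p_n log2(p_n) for p_n proportional to 1/(n log2(n)^2).
# They keep growing, consistent with H(X) = +infinity, but only very slowly.
from math import log2

N = 2_000_000
weights = [1.0 / (n * log2(n) ** 2) for n in range(2, N)]
c = 1.0 / sum(weights)          # normalising constant, truncated at N (approximation)

partial = 0.0
checkpoints = {10**3, 10**4, 10**5, 10**6}
for n, w in enumerate(weights, start=2):
    p = c * w
    partial += -p * log2(p)
    if n in checkpoints:
        print(f"partial entropy up to n = {n}: {partial:.3f} bits")
```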

Question 5 Let X be a finite set, f a real-valued function f : X → R, and fix α ∈ R. We
want to maximise the entropy H(X) of a random variable X taking values in X subject to the
constraint
E[f (X)] ≤ α. (1)
Denote by U a uniformly distributed random variable over X. Prove the following statements about the optimal solutions of the maximisation.
(a) If α ∈ [E[f(U)], max_{x∈X} f(x)], then the entropy is maximised subject to (1) by the uniformly distributed random variable U.

(b) If f is non-constant and α ∈ [min_{x∈X} f(x), E[f(U)]], then the entropy is maximised subject to (1) by the random variable Z given by
\[
P(Z = x) = \frac{e^{\lambda f(x)}}{\sum_{y\in X} e^{\lambda f(y)}} \qquad \text{for } x \in X,
\]
where λ < 0 is chosen such that E[f(Z)] = α.

(c) (Optional) Prove that under the assumptions of (b), the choice for λ is unique and we
have λ < 0.

Answer 5 (a) Since the uniform distribution achieves the maximal entropy without any constraint, we just need to verify that U satisfies the constraint (1), which is obvious because E[f(U)] ≤ α by assumption.

(b) Recall Gibbs' inequality: for any pmfs p and q,
\[
-\sum_{x\in X} p(x)\log(p(x)) \;\le\; -\sum_{x\in X} p(x)\log(q(x)).
\]

We therefore try to write E[f(X)] in the form \(\frac{1}{\lambda}\sum_{x\in X} p(x)\log(q(x)) + \frac{c(\lambda)}{\lambda}\) for some constant λ < 0, a constant c(λ) and a pmf q, with p(·) the pmf of X. For this we require
\[
\lambda f(x) = \log(q(x)) + c(\lambda) \;\Longleftrightarrow\; e^{\lambda f(x)}\, e^{-c(\lambda)} = q(x).
\]

With the fact that q is a pmf, we have
\[
c(\lambda) = \log\!\Big(\sum_{x\in X} e^{\lambda f(x)}\Big), \qquad \text{and hence} \qquad q(x) = \frac{e^{\lambda f(x)}}{\sum_{y\in X} e^{\lambda f(y)}}.
\]

So for any λ < 0, define the pmf \(q(x) := \frac{e^{\lambda f(x)}}{\sum_{y\in X} e^{\lambda f(y)}}\); then
\[
E[f(X)] = \frac{1}{\lambda}\sum_{x\in X} p(x)\log(q(x)) + \frac{c(\lambda)}{\lambda}.
\]
With λ < 0, the constraint E[f(X)] ≤ α is equivalent to \(-\sum_{x\in X} p(x)\log(q(x)) \le -\lambda\alpha + c(\lambda)\). Hence \(H(X) \le -\sum_{x\in X} p(x)\log(q(x)) \le -\lambda\alpha + c(\lambda)\), which means −λα + c(λ) is an upper bound on the entropy for any λ < 0, and both inequalities become equalities (so the upper bound is achieved)

iff p(x) = q(x), i.e.
\[
P(X = x) = q(x) = \frac{e^{\lambda f(x)}}{\sum_{y\in X} e^{\lambda f(y)}} \qquad \text{and} \qquad \alpha = \sum_{x\in X} p(x) f(x) = \sum_{x\in X} f(x)\,\frac{e^{\lambda f(x)}}{\sum_{y\in X} e^{\lambda f(y)}}.
\]
The existence of a λ < 0 such that \(E[f(X)] = \sum_{x\in X} q(x) f(x) = \alpha\) is established in part (c).
(c) Denote \(g(\lambda) := \sum_{x\in X} f(x)\,\frac{e^{\lambda f(x)}}{\sum_{y\in X} e^{\lambda f(y)}}\). Then g is a differentiable function with

\[
g'(\lambda) = \sum_{x\in X} f(x)^2\,\frac{e^{\lambda f(x)}}{\sum_{y\in X} e^{\lambda f(y)}} \;-\; \sum_{x\in X} f(x)\,\frac{e^{\lambda f(x)}}{\big(\sum_{y\in X} e^{\lambda f(y)}\big)^2}\,\sum_{y\in X} f(y)\,e^{\lambda f(y)}
= E[f(Z)^2] - \big(E[f(Z)]\big)^2,
\]
where Z is distributed according to the pmf q above, so g'(λ) is the variance of f(Z).

Since f is not constant, g'(λ) > 0, which means g is a strictly increasing and continuous function. Furthermore, g(0) = E[f(U)] and g(λ) → min_{x∈X} f(x) as λ → −∞. So g(λ) = α ∈ (min_{x∈X} f(x), E[f(U)]) admits a unique solution λ < 0.
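Parts (b) and (c) translate directly into a numerical recipe: since g is strictly increasing with g(−∞) = min f and g(0) = E[f(U)], the value of λ can be found by bisection. The Python sketch below does this for a small hypothetical alphabet and score function f (both are assumptions chosen purely for illustration, not data from the sheet).

```python
# Solve g(lambda) = alpha by bisection and report the resulting Gibbs distribution.
from math import exp, log2

xs = [0, 1, 2, 3]                      # hypothetical alphabet (assumption)
f = {0: 0.0, 1: 1.0, 2: 2.0, 3: 5.0}   # hypothetical score function (assumption)
alpha = 1.0                            # target in (min f, E[f(U)]) = (0, 2)

def gibbs(lam):
    # q_lambda(x) = exp(lam * f(x)) / sum_y exp(lam * f(y))
    z = sum(exp(lam * f[x]) for x in xs)
    return {x: exp(lam * f[x]) / z for x in xs}

def g(lam):
    # g(lambda) = E[f(Z_lambda)] under the Gibbs distribution.
    q = gibbs(lam)
    return sum(q[x] * f[x] for x in xs)

# g is strictly increasing with g(-inf) = min f < alpha < E[f(U)] = g(0),
# so a bracket [lo, 0] with lo very negative works for bisection.
lo, hi = -50.0, 0.0
for _ in range(200):
    mid = (lo + hi) / 2
    if g(mid) < alpha:
        lo = mid
    else:
        hi = mid
lam = (lo + hi) / 2
q = gibbs(lam)
entropy = -sum(p * log2(p) for p in q.values())
print(f"lambda = {lam:.4f}, E[f(Z)] = {g(lam):.4f}, H(Z) = {entropy:.4f} bits")
```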

Question 6 (A revision of the strong law of large numbers (SLLN) from probability theory; treat this question as a reference.) Let X be a real-valued random variable.

(a) Assume additionally that X is non-negative. Show that for every x > 0, we have

\[
P(X \ge x) \;\le\; \frac{E[X]}{x}.
\]

(b) Let X be a random variable with mean µ and variance σ². Show that for every ε > 0,
\[
P(|X - \mu| > \varepsilon) \;\le\; \frac{\sigma^2}{\varepsilon^2}.
\]

(c) Let (X_n)_{n≥1} be a sequence of i.i.d. random variables with mean µ and variance σ². Show that \(\frac{1}{m}\sum_{n=1}^{m} X_n\) converges to µ in probability, i.e., for every ε > 0,
\[
\lim_{m\to+\infty} P\!\left(\left|\frac{1}{m}\sum_{n=1}^{m} X_n - \mu\right| > \varepsilon\right) = 0.
\]
This is a weak version of the SLLN. It can be strengthened by the Borel–Cantelli lemma to the often-used version: \(P\big(\lim_{m\to+\infty} \frac{1}{m}\sum_{n=1}^{m} X_n = \mu\big) = 1\).

Answer 6 (a) \(E[X] = E[X\,1_{X\ge x}] + E[X\,1_{X< x}] \ge E[X\,1_{X\ge x}] \ge E[x\,1_{X\ge x}] = x\,P(X \ge x)\), so dividing by x gives the inequality.
(b) Similarly to part (a), for any random variable Y and constant ε > 0 we have \(P(|Y| > \varepsilon) \le P(Y^2 \ge \varepsilon^2) \le \frac{E[Y^2]}{\varepsilon^2}\). Applying this with Y = X − µ, for which E[Y²] = σ², gives the inequality in the question.

(c) For any integer m, denote \(Y_m = \frac{1}{m}\sum_{n=1}^{m} X_n - \mu\); then \(E[Y_m] = 0\) and \(\mathrm{Var}(Y_m) = \frac{\sigma^2}{m}\). Hence
\[
P(|Y_m| > \varepsilon) \;\le\; \frac{\sigma^2}{m\varepsilon^2} \xrightarrow{\;m\to+\infty\;} 0.
\]
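The Chebyshev bound in (c) can be seen with a quick simulation. The Python sketch below (an illustration, not part of the sheet) uses i.i.d. Uniform(0, 1) samples, an arbitrary choice with µ = 1/2 and σ² = 1/12, and compares the empirical probability P(|Y_m| > ε) with the bound σ²/(mε²).

```python
# Weak law of large numbers / Chebyshev bound, by simulation.
import random

mu, sigma2, eps, trials = 0.5, 1.0 / 12.0, 0.05, 2000
random.seed(0)

for m in (10, 100, 1000):
    bad = 0
    for _ in range(trials):
        sample_mean = sum(random.random() for _ in range(m)) / m
        if abs(sample_mean - mu) > eps:
            bad += 1
    bound = sigma2 / (m * eps * eps)
    print(f"m={m:5d}: empirical P(|Ym| > eps) = {bad / trials:.3f}, "
          f"Chebyshev bound = {bound:.3f}")
```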

Question 7 (Optional) Partition the interval [0, 1] into n disjoint sub-intervals of length p1 , · · · , pn .
Let X1 , X2 , · · · be i.i.d. random variables, uniformly distributed on [0, 1], and Zm (i) be the
number of the X1 , · · · , Xm that lie in the ith interval of the partition. Show that the random
variables
\[
R_m = \prod_{i=1}^{n} p_i^{Z_m(i)} \qquad \text{satisfy} \qquad \frac{1}{m}\log(R_m) \xrightarrow{\;m\to+\infty\;} \sum_{i=1}^{n} p_i\log(p_i) \quad \text{with probability 1.}
\]

Answer 7 Denote by I_i the i-th subinterval. By the definition of Z_m(i), we have \(Z_m(i) = \sum_{j=1}^{m} 1_{X_j\in I_i}\), and by the (strong) law of large numbers,
\[
P\!\left(\lim_{m\to+\infty} \frac{\sum_{j=1}^{m} 1_{X_j\in I_i}}{m} = p_i\right) = 1 \qquad \text{for each } i.
\]

It is easy to see that, with probability 1,
\[
\frac{1}{m}\log(R_m) = \frac{1}{m}\sum_{i=1}^{n} Z_m(i)\log(p_i) = \sum_{i=1}^{n} \log(p_i)\,\frac{\sum_{j=1}^{m} 1_{X_j\in I_i}}{m} \xrightarrow{\;m\to+\infty\;} \sum_{i=1}^{n} p_i\log(p_i),
\]
since the finitely many probability-one convergence events above can be intersected.
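The almost-sure convergence can also be observed empirically. The following Python sketch (an illustration, not part of the original solution) uses an arbitrary partition p = (0.1, 0.2, 0.3, 0.4), counts how many uniform samples land in each sub-interval, and tracks (1/m) log₂(R_m) against Σᵢ pᵢ log₂(pᵢ).

```python
# Simulation of (1/m) log2(R_m) converging to sum_i p_i log2(p_i).
import random
from math import log2
from bisect import bisect_right

p = [0.1, 0.2, 0.3, 0.4]                        # sub-interval lengths (sum to 1)
edges = [sum(p[:k]) for k in range(1, len(p))]  # right endpoints of the first n-1 intervals
target = sum(pi * log2(pi) for pi in p)

random.seed(1)
counts = [0] * len(p)       # counts[i] plays the role of Z_m(i)
m = 0
for checkpoint in (10**2, 10**3, 10**4, 10**5):
    while m < checkpoint:
        x = random.random()
        counts[bisect_right(edges, x)] += 1     # index of the interval containing x
        m += 1
    value = sum(z * log2(pi) for z, pi in zip(counts, p)) / m
    print(f"m={m:6d}: (1/m) log2(R_m) = {value:.4f}   target = {target:.4f}")
```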
