Information Theory and Coding
Quantitative measure of information
Cédric RICHARD
Université Côte d’Azur
Self-information
Information content
Let A be an event with non-zero probability P (A).
The greater the uncertainty of A, the larger the information h(A) provided by the
realization of A. This can be expressed as follows:
h(A) = f(1/P(A)).
Function f (·) must satisfy the following properties:
f(·) is an increasing function over ℝ+
the information provided by a sure event is zero: lim_{p→1} f(p) = 0
the information provided by two independent events adds up: f(p_1 · p_2) = f(p_1) + f(p_2)
This leads us to use the logarithmic function for f(·).
1
Self-information
Information content
Lemma 1. The function f(p) = − log_b p is the only one that is positive, continuous over (0, 1], and satisfies f(p_1 · p_2) = f(p_1) + f(p_2).
Proof. The proof consists of the following steps:
1. f(p^n) = n f(p) for any positive integer n, by repeated application of the additivity property
2. f(p^{1/n}) = (1/n) f(p), obtained by replacing p with p^{1/n} in the previous equality
3. f(p^{m/n}) = (m/n) f(p), by combining the two previous equalities
4. f(p^q) = q f(p), where q is any positive rational number
5. f(p^r) = lim_{n→+∞} f(p^{q_n}) = lim_{n→+∞} q_n f(p) = r f(p) for any real r > 0, where (q_n) is a sequence of rationals converging to r; this holds because the rationals are dense in the reals and f is continuous
Let p and q be in (0, 1). One can write p = q^{log_q p}, which yields:
f(p) = f(q^{log_q p}) = f(q) log_q p.
Since f(q) > 0 and log_q p = ln p / ln q with ln q < 0, this is a positive multiple of − ln p. We finally arrive at f(p) = − log_b p for some base b.
2
Self-information
Information content
Definition 1. Let (Ω, 𝒜, P) be a probability space, and A an event of 𝒜 with
non-zero probability P(A). The information content of A is defined as:
h(A) = − log P (A).
Unit. The unit of h(A) depends on the base chosen for the logarithm.
log2 : Shannon, bit (binary unit)
loge : logon, nat (natural unit)
log10 : Hartley, decit (decimal unit)
Vocabulary. h(·) represents the uncertainty of A, or its information content.
3
Self-information
Information content
Information content or uncertainty: h(A) = − log_b P(A)
[Figure: plot of h(A) versus P(A); h(A) decreases from +∞ as P(A) → 0, equals 1 at P(A) = 1/b, and vanishes at P(A) = 1.]
4
Self-information
Information content
Example 1. Consider a binary source S ∈ {0, 1} with P(0) = P(1) = 0.5. The
information content conveyed by each binary symbol is h(1/2) = log 2, namely
1 bit, or 1 shannon.
Example 2. Consider a source S that randomly selects symbols s_i among 16
equally likely symbols {s_0, ..., s_15}. The information content conveyed by each
symbol is log 16 = 4 Sh.
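These values can be checked numerically. A minimal Python sketch (the function name self_information is ours, not part of the course):

    import math

    def self_information(p, base=2):
        # h(A) = -log_b P(A); base 2 gives shannons (bits), e gives nats, 10 gives decits
        return -math.log(p, base)

    print(self_information(0.5))          # Example 1: 1.0 Sh per binary symbol
    print(self_information(1/16))         # Example 2: 4.0 Sh per symbol
    print(self_information(0.5, math.e))  # the same event in nats: ln 2 ≈ 0.693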
Remark. The bit in Computer Science (binary digit ) and the bit in Information
Theory (binary unit ) do not refer to the same concept.
5
Self-information
Conditional information content
Self-information extends to a pair of events A and B. Note that P(A, B) = P(A) P(B|A).
We get:
h(A, B) = − log P (A, B) = − log P (A) − log P (B|A)
Note that − log P (B|A) is the information content of B that is not provided by A.
Definition 2. Conditional information content of B given A is defined as:
h(B|A) = − log P (B|A),
that is: h(B|A) = h(A, B) − h(A).
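A small numerical check, with hypothetical values P(A) = 0.5 and P(B|A) = 0.25 (sketch):

    from math import log2

    P_A = 0.5           # assumed value for illustration
    P_B_given_A = 0.25  # assumed value for illustration
    P_AB = P_A * P_B_given_A

    h_A = -log2(P_A)                  # 1 Sh
    h_AB = -log2(P_AB)                # 3 Sh
    h_B_given_A = -log2(P_B_given_A)  # 2 Sh

    # h(B|A) = h(A, B) - h(A)
    print(h_B_given_A, h_AB - h_A)    # both equal 2.0 Sh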
Exercise. Analyze and interpret the following cases: A ⊂ B, A = B, A ∩ B = ∅.
6
Self-information
Mutual information content
The definition of conditional information leads directly to another definition, that
of mutual information, which measures information shared by two events.
Definition 3. We call mutual information of A and B the following quantity:
i(A, B) = h(A) − h(A|B) = h(B) − h(B|A).
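A quick numerical check of this symmetry on hypothetical probabilities (sketch):

    from math import log2

    def h(p):
        # self-information of an event of probability p, in Sh
        return -log2(p)

    # assumed probabilities for illustration; note P(A ∩ B) = 0.3
    P_A, P_B, P_AB = 0.5, 0.4, 0.3

    i_1 = h(P_A) - h(P_AB / P_B)   # h(A) - h(A|B)
    i_2 = h(P_B) - h(P_AB / P_A)   # h(B) - h(B|A)
    print(i_1, i_2)                # both ≈ 0.585 Sh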
Exercise. Analyze and interpret the following cases: A ⊂ B, A = B, A ∩ B = ∅.
7
Entropy of a random variable
Definition
Consider a memoryless stochastic source S with alphabet {s1 , . . . , sn }. Let pi be
the probability P (S = si ).
The entropy of S is the average amount of information produced by S:
H(S) = E{h(S)} = − Σ_{i=1}^{n} p_i log p_i.
Definition 4. Let X be a random variable that takes its values in {x1 , . . . , xn }.
Entropy of X is defined as follows:
H(X) = − Σ_{i=1}^{n} P(X = x_i) log P(X = x_i).
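A minimal Python transcription of this definition (the function name entropy is ours):

    from math import log2

    def entropy(probs):
        # H(X) = -sum_i p_i log2 p_i, with the convention 0 log 0 = 0
        return -sum(p * log2(p) for p in probs if p > 0)

    print(entropy([0.5, 0.5]))        # 1.0 Sh
    print(entropy([1/16] * 16))       # 4.0 Sh
    print(entropy([0.7, 0.2, 0.1]))   # ≈ 1.157 Sh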
8
Entropy of a random variable
Example of a binary random variable
The entropy of a binary random variable X, with p = P(X = 1), is given by:
H(X) = −p log p − (1 − p) log(1 − p) ≜ H_2(p).
H_2(p) is called the binary entropy function.
[Figure: binary entropy H_2(p) (Sh/symbol) versus probability p; H_2 vanishes at p = 0 and p = 1 and reaches its maximum of 1 Sh at p = 0.5.]
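A short sketch evaluating H_2(p) and checking that it peaks at p = 0.5:

    from math import log2

    def H2(p):
        # binary entropy function, in Sh/symbol
        if p in (0.0, 1.0):
            return 0.0
        return -p * log2(p) - (1 - p) * log2(1 - p)

    for p in (0.1, 0.5, 0.9):
        print(p, round(H2(p), 4))   # 0.469, 1.0, 0.469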
9
Entropy of a random variable
Notation and preliminary properties
Lemma 2 (Gibbs' inequality). Consider two discrete probability distributions with
mass functions (p_1, ..., p_n) and (q_1, ..., q_n). We have:
Σ_{i=1}^{n} p_i log(q_i / p_i) ≤ 0.
Equality is achieved when p_i = q_i for all i.
Proof. The proof is carried out in the case of the natural logarithm. Observe
that ln x ≤ x − 1, with equality for x = 1. Let x = q_i / p_i. We have:
Σ_{i=1}^{n} p_i ln(q_i / p_i) ≤ Σ_{i=1}^{n} p_i (q_i / p_i − 1) = 1 − 1 = 0.
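A numerical check of Gibbs' inequality on two arbitrary, hypothetical distributions (sketch):

    from math import log

    # two hypothetical probability mass functions on 4 symbols
    p = [0.4, 0.3, 0.2, 0.1]
    q = [0.25, 0.25, 0.25, 0.25]

    gibbs = sum(pi * log(qi / pi) for pi, qi in zip(p, q))
    print(gibbs)                                # ≈ -0.107, negative as predicted
    print(sum(pi * log(pi / pi) for pi in p))   # 0.0 when q = p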
10
Entropy of a random variable
Notation and preliminary properties
Graphical checking of inequality ln x ≤ x − 1
[Figure: graphs of y = x − 1 and y = ln x; the line lies above the logarithm everywhere, with equality at x = 1.]
11
Entropy of a random variable
Properties
Property 1. The entropy satisfies the following inequality:
H_n(p_1, ..., p_n) ≤ log n.
Equality is achieved by the uniform distribution, that is, p_i = 1/n for all i.
Proof. Based on Gibbs' inequality, we set q_i = 1/n.
Uncertainty about the outcome of an experiment is maximum when all possible
outcomes are equiprobable.
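A quick check of Property 1 on a hypothetical 8-symbol source (sketch):

    from math import log2

    def entropy(probs):
        return -sum(p * log2(p) for p in probs if p > 0)

    n = 8
    skewed = [0.5, 0.2, 0.1, 0.1, 0.05, 0.02, 0.02, 0.01]
    uniform = [1 / n] * n

    print(entropy(skewed), "<=", log2(n))   # ≈ 2.14 <= 3.0
    print(entropy(uniform))                 # exactly 3.0 Sh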
12
Entropy of a random variable
Properties
Property 2. The entropy increases as the number of possible outcomes increases.
Proof. Let X be a discrete random variable with values in {x_1, ..., x_n} and
probabilities (p_1, ..., p_n), respectively. Consider that state x_k is split into two
substates x_{k1} and x_{k2}, with non-zero probabilities p_{k1} and p_{k2} such that p_k = p_{k1} + p_{k2}.
The entropy of the resulting random variable X′ is given by:
H(X′) = H(X) + p_k log p_k − p_{k1} log p_{k1} − p_{k2} log p_{k2}
= H(X) + p_{k1} (log p_k − log p_{k1}) + p_{k2} (log p_k − log p_{k2}).
The logarithmic function being strictly increasing, we have log p_k > log p_{ki}. This
implies H(X′) > H(X).
Interpretation. Second law of thermodynamics
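A numerical check that splitting one state increases the entropy (hypothetical distributions):

    from math import log2

    def entropy(probs):
        return -sum(p * log2(p) for p in probs if p > 0)

    X  = [0.5, 0.3, 0.2]         # original distribution
    Xp = [0.5, 0.3, 0.15, 0.05]  # state of probability 0.2 split into 0.15 + 0.05

    print(entropy(X), "<", entropy(Xp))   # ≈ 1.485 < ≈ 1.648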
13
Entropy of a random variable
Properties
Property 3. The entropy Hn is a concave function of p1 , . . . , pn .
Proof. Consider 2 discrete probability distributions (p1 , . . . , pn ) and (q1 , . . . , qn ).
We need to prove that, for every λ in [0, 1], we have:
Hn (λp1 + (1 − λ)q1 , . . . , λpn + (1 − λ)qn ) ≥ λHn (p1 , . . . , pn ) + (1 − λ)Hn (q1 , . . . , qn ).
By setting f (x) = −x log x, we can write:
H_n(λp_1 + (1 − λ)q_1, ..., λp_n + (1 − λ)q_n) = Σ_{i=1}^{n} f(λp_i + (1 − λ)q_i).
The result is a direct consequence of the concavity of f (·) and Jensen’s inequality.
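A spot check of the concavity inequality for λ = 0.5 and two hypothetical distributions (sketch):

    from math import log2

    def entropy(probs):
        return -sum(p * log2(p) for p in probs if p > 0)

    p = [0.9, 0.05, 0.05]
    q = [0.1, 0.45, 0.45]
    lam = 0.5
    mix = [lam * pi + (1 - lam) * qi for pi, qi in zip(p, q)]

    lhs = entropy(mix)                               # entropy of the mixture
    rhs = lam * entropy(p) + (1 - lam) * entropy(q)  # mixture of the entropies
    print(lhs, ">=", rhs)                            # 1.5 >= ≈ 0.97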
14
Entropy of a random variable
Properties
Graphical checking of the concavity of f (x) = −x log x
[Figure: graph of f(x) = −x log x, showing its concavity.]
15
Entropy of a random variable
Properties
Concavity of Hn can be generalized to any number m of distributions.
Property 4. Given a finite set of discrete probability distributions {(q_{1j}, ..., q_{nj})}_{j=1}^{m}, the following inequality is satisfied:
H_n( Σ_{j=1}^{m} λ_j q_{1j}, ..., Σ_{j=1}^{m} λ_j q_{nj} ) ≥ Σ_{j=1}^{m} λ_j H_n(q_{1j}, ..., q_{nj}),
where {λ_j}_{j=1}^{m} is any set of constants in [0, 1] such that Σ_{j=1}^{m} λ_j = 1.
Proof. As in the previous case, the proof of this inequality is based on
the concavity of f(x) = −x log x and Jensen's inequality.
16
Pair of random variables
Joint entropy
Definition 5. Let X and Y be two random variables with values in {x1 , . . . , xn }
and {y1 , . . . , ym }, respectively. The joint entropy of X and Y is defined as:
H(X, Y) ≜ − Σ_{i=1}^{n} Σ_{j=1}^{m} P(X = x_i, Y = y_j) log P(X = x_i, Y = y_j).
The joint entropy is symmetric: H(X, Y ) = H(Y, X)
Example. Case of two independent random variables
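A sketch of this definition on hypothetical joint tables, including the independent case where H(X, Y) = H(X) + H(Y):

    from math import log2

    def joint_entropy(P):
        # P is a 2-D table of joint probabilities P(X = x_i, Y = y_j)
        return -sum(p * log2(p) for row in P for p in row if p > 0)

    # independent case: P(x, y) = P(x) P(y), with P(X) = (0.5, 0.5), P(Y) = (0.25, 0.75)
    P_indep = [[0.5 * 0.25, 0.5 * 0.75],
               [0.5 * 0.25, 0.5 * 0.75]]
    print(joint_entropy(P_indep))   # ≈ 1.811 = H(X) + H(Y) = 1 + 0.811

    # dependent case
    P_dep = [[0.4, 0.1],
             [0.1, 0.4]]
    print(joint_entropy(P_dep))     # ≈ 1.722 < 2 = H(X) + H(Y)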
17
Pair of random variables
Conditional entropy
Definition 6. Let X and Y be two random variables with values in {x1 , . . . , xn }
and {y1 , . . . , ym }, respectively. The conditional entropy of X given Y = yj is:
H(X|Y = y_j) ≜ − Σ_{i=1}^{n} P(X = x_i | Y = y_j) log P(X = x_i | Y = y_j).
H(X|Y = yj ) is the amount of information needed to describe the outcome of X
given that we know that Y = yj .
Definition 7. The conditional entropy of X given Y is defined as:
H(X|Y) ≜ Σ_{j=1}^{m} P(Y = y_j) H(X|Y = y_j).
Example. Case of two independent random variables
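A sketch of Definitions 6 and 7 on a hypothetical joint table:

    from math import log2

    # joint probabilities P(X = x_i, Y = y_j); rows index x, columns index y
    P = [[0.4, 0.1],
         [0.1, 0.4]]
    P_Y = [sum(P[i][j] for i in range(2)) for j in range(2)]   # marginal of Y

    def H_X_given_yj(j):
        # H(X | Y = y_j) = -sum_i P(x_i | y_j) log2 P(x_i | y_j)
        cond = [P[i][j] / P_Y[j] for i in range(2)]
        return -sum(p * log2(p) for p in cond if p > 0)

    H_X_given_Y = sum(P_Y[j] * H_X_given_yj(j) for j in range(2))
    print(H_X_given_yj(0), H_X_given_Y)   # ≈ 0.722 for this y_j, ≈ 0.722 overall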
18
Pair of random variables
Relations between entropies
H(X, Y ) = H(X) + H(Y |X) = H(Y ) + H(X|Y ).
These equalities can be obtained by first writing:
log P (X = x, Y = y) = log P (X = x|Y = y) + log P (Y = y),
and then taking the expectation of each member.
Property 5 (chain rule). The joint entropy of n random variables can be evaluated
using the following chain rule:
H(X_1, ..., X_n) = Σ_{i=1}^{n} H(X_i | X_1, ..., X_{i−1}).
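Reusing the joint table from the previous sketch, we can check H(X, Y) = H(Y) + H(X|Y), the two-variable case of the chain rule:

    from math import log2

    P = [[0.4, 0.1],   # hypothetical joint probabilities P(X = x_i, Y = y_j)
         [0.1, 0.4]]
    P_Y = [sum(P[i][j] for i in range(2)) for j in range(2)]

    H_XY = -sum(p * log2(p) for row in P for p in row if p > 0)
    H_Y = -sum(p * log2(p) for p in P_Y if p > 0)
    # H(X|Y) computed directly from the conditional distributions
    H_X_given_Y = -sum(P[i][j] * log2(P[i][j] / P_Y[j])
                       for i in range(2) for j in range(2) if P[i][j] > 0)

    print(round(H_XY, 4), round(H_Y + H_X_given_Y, 4))   # both ≈ 1.7219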
19
Pair of random variables
Relations between entropies
Each term of H(X, Y ) = H(X) + H(Y |X) = H(Y ) + H(X|Y ) is non-negative. We can
conclude that:
H(X) ≤ H(X, Y )
H(Y ) ≤ H(X, Y )
20
Pair of random variables
Relations between entropies
From the generalized concavity of the entropy, setting qij = P (X = xi |Y = yj ) and
λj = P (Y = yj ), we get the following inequality:
H(X|Y ) ≤ H(X)
Conditioning a random variable reduces its entropy. Without proof, this can be
generalized as follows:
Property 6 (entropy decrease with conditioning). The entropy of a random
variable decreases with successive conditionings, namely,
H(X1 |X2 , . . . , Xn ) ≤ . . . ≤ H(X1 |X2 , X3 ) ≤ H(X1 |X2 ) ≤ H(X1 ),
where X1 , . . . , Xn denote n discrete random variables.
21
Pair of random variables
Relations between entropies
Let X and Y be two random variables with values in {x_1, ..., x_n}
and {y_1, ..., y_m}, respectively. We have:
0 ≤ H(X|Y ) ≤ H(X) ≤ H(X, Y ) ≤ H(X) + H(Y ) ≤ 2H(X, Y ).
22
Pair of random variables
Mutual information
Definition 8. The mutual information of two random variables X and Y is
defined as follows:
I(X, Y) ≜ H(X) − H(X|Y)
or, equivalently,
I(X, Y) = Σ_{i=1}^{n} Σ_{j=1}^{m} P(X = x_i, Y = y_j) log [ P(X = x_i, Y = y_j) / (P(X = x_i) P(Y = y_j)) ].
The mutual information quantifies the amount of information obtained about one
random variable through observing the other random variable.
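A sketch checking that both expressions above coincide on a hypothetical joint table:

    from math import log2

    P = [[0.4, 0.1],   # hypothetical joint probabilities P(X = x_i, Y = y_j)
         [0.1, 0.4]]
    P_X = [sum(row) for row in P]                                # marginal of X
    P_Y = [sum(P[i][j] for i in range(2)) for j in range(2)]     # marginal of Y

    def H(probs):
        return -sum(p * log2(p) for p in probs if p > 0)

    H_XY = -sum(p * log2(p) for row in P for p in row if p > 0)
    I_1 = H(P_X) + H(P_Y) - H_XY     # equals H(X) - H(X|Y), since H(X|Y) = H(X,Y) - H(Y)
    I_2 = sum(P[i][j] * log2(P[i][j] / (P_X[i] * P_Y[j]))
              for i in range(2) for j in range(2) if P[i][j] > 0)

    print(round(I_1, 4), round(I_2, 4))   # both ≈ 0.2781 Sh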
Exercise. Case of two independent random variables
23
Pair of random variables
Mutual information
In order to give a different interpretation of mutual information, the following
definition is recalled beforehand.
Definition 9. We call the Kullback-Leibler distance between two distributions P_1
and P_2, here assumed to be discrete, the following quantity:
d(P_1, P_2) = Σ_{x∈X(Ω)} P_1(X = x) log [ P_1(X = x) / P_2(X = x) ].
The mutual information corresponds to the Kullback-Leibler distance between the
joint distribution of X and Y and the product of their marginal distributions: I(X, Y) = d(P_{X,Y}, P_X P_Y).
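A check that I(X, Y) coincides with d(P_{X,Y}, P_X P_Y) on the joint table used previously (sketch):

    from math import log2

    def kl(P1, P2):
        # d(P1, P2) = sum_x P1(x) log2( P1(x) / P2(x) )
        return sum(p1 * log2(p1 / p2) for p1, p2 in zip(P1, P2) if p1 > 0)

    # hypothetical joint distribution of (X, Y), flattened over the 4 pairs
    P_joint = [0.4, 0.1, 0.1, 0.4]
    # product of the marginals P(X) = (0.5, 0.5) and P(Y) = (0.5, 0.5)
    P_prod = [0.25, 0.25, 0.25, 0.25]

    print(kl(P_joint, P_prod))   # ≈ 0.278 Sh, equal to the I(X, Y) found previously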
24
Pair of random variables
Venn diagram
A Venn diagram can be used to illustrate relationships among measures of
information: entropy, joint entropy, conditional entropy and mutual information.
[Venn diagram: two overlapping sets representing H(X) and H(Y). The part of H(X) outside the overlap is H(X|Y), the part of H(Y) outside the overlap is H(Y|X), the overlap is I(X, Y), and the union of the two sets is H(X, Y).]
25