Chapter 6: An Introduction to Information Theory
CO6019: Digital Communications
Instructor: Jia-Chin Lin
Department of Communication Engineering, National Central University
Email: jiachin@ieee.org
Outline
• Mathematical Models for Information Sources
• A Logarithmic Measure Of Information
• Lossless Coding of Information Sources
• Lossy Data Compression
• Channel Model and Channel Capacity
Mathematical Models for Information Sources
• A discrete source emits letters from a finite alphabet {x_1, x_2, \ldots, x_L}. Each letter x_k occurs with a given probability p_k,

      p_k = P[X = x_k], \qquad 1 \le k \le L

  where

      \sum_{k=1}^{L} p_k = 1

• An analog source has an output waveform x(t) that is a sample function of a stochastic process X(t). If X(t) is band-limited to W Hz, the sampling theorem may be used to represent it as (see the numerical sketch below)

      X(t) = \sum_{n=-\infty}^{\infty} X\!\left(\frac{n}{2W}\right) \mathrm{sinc}\!\left[2W\left(t - \frac{n}{2W}\right)\right]
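The sampling-theorem representation above can be checked numerically. Below is a minimal Python sketch, not from the slides: the bandwidth W, the band-limited test signal, and the truncation of the infinite sum to finitely many samples are all illustrative assumptions.

    import numpy as np

    W = 4.0                                    # assumed bandwidth in Hz
    fs = 2 * W                                 # Nyquist rate, 2W samples/s
    n = np.arange(-200, 201)                   # truncate the infinite sum to a finite window
    samples = np.cos(2 * np.pi * 3.0 * n / fs) # X(n/2W) for a 3 Hz test tone (inside the band)

    t = np.linspace(-2.0, 2.0, 1001)
    # X(t) = sum_n X(n/2W) * sinc[2W(t - n/2W)];  np.sinc(u) = sin(pi u)/(pi u)
    x_rec = samples @ np.sinc(fs * t[None, :] - n[:, None])

    x_true = np.cos(2 * np.pi * 3.0 * t)
    print(np.max(np.abs(x_rec - x_true)))      # small reconstruction error on this interval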
A Logarithmic Measure Of Information
• I(x; y) is the mutual information between the events X = x and Y = y,

      I(x; y) = \log \frac{P[x \mid y]}{P[x]}

• The average mutual information between the random variables X and Y is defined as the average of I(x; y) (a numerical check is given below):

      I(X; Y) = \sum_{x \in X} \sum_{y \in Y} P[X = x, Y = y] \, I(x; y)
              = \sum_{x \in X} \sum_{y \in Y} P[X = x, Y = y] \log \frac{P[x \mid y]}{P[x]}
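As a concrete check of this definition, the double sum can be evaluated for a small joint PMF. A minimal Python sketch follows; the 2x2 joint distribution is an illustrative assumption, not from the slides.

    import numpy as np

    # Joint PMF P[X = x, Y = y] (rows index x, columns index y)
    P_xy = np.array([[0.30, 0.20],
                     [0.10, 0.40]])

    P_x = P_xy.sum(axis=1)                     # marginal P[X = x]
    P_y = P_xy.sum(axis=0)                     # marginal P[Y = y]
    P_x_given_y = P_xy / P_y[None, :]          # conditional P[x | y]

    # I(X;Y) = sum_x sum_y P[x, y] log2( P[x|y] / P[x] )
    I = np.sum(P_xy * np.log2(P_x_given_y / P_x[:, None]))
    print(I)                                   # about 0.12 bits for this joint PMF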
• I(X; Y) = I(Y; X)
• I(X; Y) \ge 0, with equality if and only if X and Y are independent
• I(X; Y) \le \min\{\log |X|, \log |Y|\}, where |X| and |Y| denote the sizes of the alphabets
• When the random variables X and Y are statistically independent, P[x \mid y] = P[x], and therefore I(X; Y) = 0
• When the occurrence of the event Y = y uniquely determines the event X = x, P[x \mid y] = 1 and

      I(x; y) = \log \frac{1}{P[X = x]} = -\log P[X = x]

  which is called the self-information of the event X = x. In this case the average mutual information becomes

      I(X; Y) = -\sum_{x \in X} \sum_{y \in Y} P[X = x, Y = y] \log P[X = x]
              = -\sum_{x \in X} P[X = x] \log P[X = x]

  This quantity is defined as the entropy of X:

      H(X) = -\sum_{x \in X} P[X = x] \log P[X = x]
These definitions lead to the following properties:

      0 \le H(X) \le \log |X|                             (1)
      I(X; X) = H(X)                                      (2)
      I(X; Y) \le \min\{H(X), H(Y)\}                      (3)
      \text{If } Y = g(X), \text{ then } H(Y) \le H(X)    (4)
Joint and Conditional Entropy
• The joint entropy of X and Y is

      H(X, Y) = -\sum_{(x, y) \in X \times Y} P[X = x, Y = y] \log P[X = x, Y = y]

• When the value of the random variable X is known to be x, the conditional entropy of Y is

      H(Y \mid X = x) = -\sum_{y \in Y} P[Y = y \mid X = x] \log P[Y = y \mid X = x]

• Averaging over X gives the conditional entropy of Y given X,

      H(Y \mid X) = \sum_{x \in X} P[X = x] \, H(Y \mid X = x)
                  = -\sum_{(x, y) \in X \times Y} P[X = x, Y = y] \log P[Y = y \mid X = x]

• Then the chain rule follows (verified numerically in the sketch below):

      H(X, Y) = H(X) + H(Y \mid X)
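A minimal Python sketch verifying the chain rule for a small joint PMF; the joint distribution below is an illustrative assumption.

    import numpy as np

    P_xy = np.array([[0.25, 0.15],
                     [0.05, 0.55]])            # assumed joint PMF P[X = x, Y = y]

    def entropy(p):
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    H_xy = entropy(P_xy.ravel())               # H(X, Y)
    P_x = P_xy.sum(axis=1)                     # marginal P[X = x]
    H_x = entropy(P_x)                         # H(X)
    # H(Y|X) = sum_x P[X = x] H(Y | X = x)
    H_y_given_x = np.sum(P_x * np.array([entropy(row / row.sum()) for row in P_xy]))
    print(H_xy, H_x + H_y_given_x)             # the two printed values agree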
• More generally, the joint entropy of X_1, X_2, \ldots, X_n is

      H(X_1, X_2, \ldots, X_n) = -\sum_{x_1, x_2, \ldots, x_n} P[X_1 = x_1, \ldots, X_n = x_n] \log P[X_1 = x_1, \ldots, X_n = x_n]
                               = H(X_1) + H(X_2 \mid X_1) + H(X_3 \mid X_1, X_2) + \cdots + H(X_n \mid X_1, X_2, \ldots, X_{n-1})
                               \le \sum_{i=1}^{n} H(X_i)

  with equality if and only if the X_i are statistically independent; when the X_i are also identically distributed, \sum_{i=1}^{n} H(X_i) = n H(X)
Lossless Coding of Information Sources
• Let x = (x_1, x_2, \ldots, x_n) be a typical output sequence of length n from a memoryless source with letters a_i and probabilities p_i, i = 1, \ldots, N. Since the number of occurrences of a_i in x is roughly n p_i and the source is memoryless,

      \log P[\mathbf{X} = \mathbf{x}] \approx \log \prod_{i=1}^{N} p_i^{\,n p_i} = \sum_{i=1}^{N} n p_i \log p_i = -n H(X)

• Hence,

      P[\mathbf{X} = \mathbf{x}] \approx 2^{-n H(X)}

• This states that all typical sequences have roughly the same probability, and this common probability is 2^{-n H(X)} (see the numerical sketch below)
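A minimal Python sketch illustrating the typical-sequence property: for a long sequence drawn from a memoryless source, -(1/n) log2 P[X = x] concentrates around H(X). The three-letter source and the sequence length are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    p = np.array([0.5, 0.3, 0.2])              # assumed letter probabilities
    H = -np.sum(p * np.log2(p))                # source entropy, about 1.485 bits/letter

    n = 10_000
    x = rng.choice(len(p), size=n, p=p)        # one length-n output of the memoryless source
    log_prob = np.sum(np.log2(p[x]))           # log2 P[X = x] is a sum of per-letter terms
    print(-log_prob / n, H)                    # the two values are close: P[X = x] ≈ 2^(-n H(X))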
Discrete Stationary Sources
• The entropy of a block of random variables X_1, X_2, \ldots, X_k is

      H(X_1 X_2 \cdots X_k) = \sum_{i=1}^{k} H(X_i \mid X_1 X_2 \cdots X_{i-1})

• The entropy per letter for the k-symbol block is

      H_k(X) = \frac{1}{k} H(X_1 X_2 \cdots X_k)

• As k \to \infty, we define

      H_\infty(X) \triangleq \lim_{k \to \infty} H_k(X) = \lim_{k \to \infty} \frac{1}{k} H(X_1 X_2 \cdots X_k)

• As shown in the following derivation, H_\infty(X) can equivalently be expressed as

      H_\infty(X) = \lim_{k \to \infty} H(X_k \mid X_1 X_2 \cdots X_{k-1})
• We first show that, for k \ge 2,

      H(X_k \mid X_1 X_2 \cdots X_{k-1}) \le H(X_{k-1} \mid X_1 X_2 \cdots X_{k-2})

• From our previous result that conditioning on a random variable cannot increase entropy, we have

      H(X_k \mid X_1 X_2 \cdots X_{k-1}) \le H(X_k \mid X_2 X_3 \cdots X_{k-1})

  and, by stationarity (shifting every index down by one),

      H(X_k \mid X_2 X_3 \cdots X_{k-1}) = H(X_{k-1} \mid X_1 X_2 \cdots X_{k-2})

• This result demonstrates that H(X_k \mid X_1 X_2 \cdots X_{k-1}) is a nonincreasing sequence in k
• Since H_k(X) is the average of k conditional entropies, each at least as large as the last one, we also have

      H_k(X) \ge H(X_k \mid X_1 X_2 \cdots X_{k-1})

• We now define the entropy rate of the source in terms of this conditional entropy
• From the definition of H_k(X),

      H_k(X) = \frac{1}{k} \left[ H(X_1 X_2 \cdots X_{k-1}) + H(X_k \mid X_1 \cdots X_{k-1}) \right]
             = \frac{1}{k} \left[ (k-1) H_{k-1}(X) + H(X_k \mid X_1 \cdots X_{k-1}) \right]
             \le \frac{k-1}{k} H_{k-1}(X) + \frac{1}{k} H_k(X)

  where the last step uses H(X_k \mid X_1 \cdots X_{k-1}) \le H_k(X)

• Rearranging gives the result

      H_k(X) \le H_{k-1}(X)

• Since H_k(X) and the conditional entropy H(X_k \mid X_1 \cdots X_{k-1}) are both nonnegative and nonincreasing with k, both limits must exist. Express H_{k+j}(X) as

      H_{k+j}(X) = \frac{1}{k+j} H(X_1 X_2 \cdots X_{k-1})
                 + \frac{1}{k+j} \left[ H(X_k \mid X_1 X_2 \cdots X_{k-1}) + H(X_{k+1} \mid X_1 \cdots X_k) + \cdots + H(X_{k+j} \mid X_1 \cdots X_{k+j-1}) \right]
• Since the conditional entropies are nonincreasing, each of the j + 1 conditional terms is at most H(X_k \mid X_1 X_2 \cdots X_{k-1}), so

      H_{k+j}(X) \le \frac{1}{k+j} H(X_1 X_2 \cdots X_{k-1}) + \frac{j+1}{k+j} H(X_k \mid X_1 X_2 \cdots X_{k-1})

• As j \to \infty, the first term vanishes and

      H_\infty(X) \le H(X_k \mid X_1 X_2 \cdots X_{k-1})

• This holds for every k, so it remains valid as k \to \infty:

      H_\infty(X) \le \lim_{k \to \infty} H(X_k \mid X_1 X_2 \cdots X_{k-1})

• On the other hand, from the earlier result H_k(X) \ge H(X_k \mid X_1 X_2 \cdots X_{k-1}), letting k \to \infty gives

      H_\infty(X) \ge \lim_{k \to \infty} H(X_k \mid X_1 X_2 \cdots X_{k-1})

• Therefore the entropy rate of a discrete stationary source can be defined equivalently as (illustrated numerically below)

      H_\infty(X) = \lim_{k \to \infty} H(X_k \mid X_1, X_2, \ldots, X_{k-1}) = \lim_{k \to \infty} \frac{1}{k} H(X_1, X_2, \ldots, X_k)
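For a concrete illustration, consider a stationary first-order Markov source, for which H(X_k | X_1 ⋯ X_{k-1}) = H(X_2 | X_1) for all k ≥ 2, so the entropy rate equals H(X_2 | X_1). A minimal Python sketch follows; the two-state transition matrix is an illustrative assumption.

    import numpy as np

    T = np.array([[0.9, 0.1],                  # assumed transition matrix P[X_{k+1} = j | X_k = i]
                  [0.4, 0.6]])
    pi = np.array([0.8, 0.2])                  # stationary distribution (satisfies pi = pi T)

    def entropy(p):
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    # Entropy rate of the stationary Markov source: H_inf = sum_i pi_i * H(row i of T)
    H_inf = np.sum(pi * np.array([entropy(row) for row in T]))

    # For a Markov source, H(X_1 ... X_k) = H(X_1) + (k - 1) H_inf, so
    # H_k(X) = [H(X_1) + (k - 1) H_inf] / k decreases monotonically toward H_inf
    for k in (1, 2, 5, 20, 100):
        print(k, (entropy(pi) + (k - 1) * H_inf) / k)
    print("H_inf =", H_inf)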
Huffman Coding
• Huffman (1952) devised a variable-length encoding algorithm based on the source letter probabilities P(x_i), i = 1, 2, \ldots, L (a sketch of the construction is given below)
• This algorithm is optimum in the sense that the average number of binary digits required to represent the source symbols is a minimum, subject to the constraint that the code words satisfy the prefix condition, which allows the received sequence to be uniquely and instantaneously decoded
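A minimal Python sketch of the Huffman construction using a binary heap; the example letter probabilities are an illustrative assumption, and ties may be broken differently than in a hand-worked example.

    import heapq

    def huffman_code(probs):
        # probs: dict mapping source letters to probabilities; returns a binary prefix code
        heap = [(p, i, [sym]) for i, (sym, p) in enumerate(probs.items())]
        heapq.heapify(heap)
        code = {sym: "" for sym in probs}
        counter = len(heap)                        # tie-breaker so heap tuples stay comparable
        while len(heap) > 1:
            p1, _, syms1 = heapq.heappop(heap)     # the two least probable nodes
            p2, _, syms2 = heapq.heappop(heap)
            for s in syms1:
                code[s] = "0" + code[s]            # prepend one bit to every letter in each subtree
            for s in syms2:
                code[s] = "1" + code[s]
            heapq.heappush(heap, (p1 + p2, counter, syms1 + syms2))
            counter += 1
        return code

    probs = {"a": 0.45, "b": 0.25, "c": 0.15, "d": 0.10, "e": 0.05}
    code = huffman_code(probs)
    avg_len = sum(probs[s] * len(code[s]) for s in probs)
    print(code, avg_len)   # average codeword length is at least, and close to, the source entropy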
[Figures: worked examples of Huffman code construction]
Entropy and Mutual Information for Continuous Random Variables
• The mutual information between continuous random variables X and Y is

      I(X; Y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} p(x) p(y \mid x) \log \frac{p(y \mid x) p(x)}{p(x) p(y)} \, dx \, dy

• Properties
  – I(X; Y) = I(Y; X)
  – I(X; Y) \ge 0
  – I(X; Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X)
• The conditional differential entropy is

      H(X \mid Y) = -\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} p(x, y) \log p(x \mid y) \, dx \, dy
Suppose the random variable X is discrete with possible outcomes x_i, i = 1, 2, \ldots, n, and Y is continuous. When X and Y are statistically dependent, the marginal pdf of Y can be expressed as

      p(y) = \sum_{i=1}^{n} p(y \mid x_i) P[x_i]

The mutual information provided about the event X = x_i by the occurrence of the event Y = y is

      I(x_i; y) = \log \frac{p(y \mid x_i) P[x_i]}{p(y) P[x_i]} = \log \frac{p(y \mid x_i)}{p(y)}
Then the average mutual information between X and Y is

      I(X; Y) = \sum_{i=1}^{n} \int_{-\infty}^{\infty} p(y \mid x_i) P[x_i] \log \frac{p(y \mid x_i)}{p(y)} \, dy
Example 6.4-1. Suppose that X is a discrete random variable with two equally probable outcomes x_1 = A and x_2 = -A. Let the conditional pdfs p(y \mid x_i), i = 1, 2, be Gaussian with mean x_i and variance \sigma^2:

      p(y \mid A) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left[ -\frac{(y - A)^2}{2\sigma^2} \right]

      p(y \mid -A) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left[ -\frac{(y + A)^2}{2\sigma^2} \right]

The average mutual information is

      I(X; Y) = \frac{1}{2} \int_{-\infty}^{\infty} \left[ p(y \mid A) \log \frac{p(y \mid A)}{p(y)} + p(y \mid -A) \log \frac{p(y \mid -A)}{p(y)} \right] dy

where

      p(y) = \frac{1}{2} \left[ p(y \mid A) + p(y \mid -A) \right]
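The integral in Example 6.4-1 has no simple closed form, but it is straightforward to evaluate numerically. A minimal Python sketch follows; the values of A and sigma are illustrative assumptions.

    import numpy as np

    A, sigma = 1.0, 0.7                        # assumed amplitude and noise standard deviation
    y = np.linspace(-12, 12, 200_001)          # fine grid; the integrand is negligible outside
    dy = y[1] - y[0]

    def gauss(y, m, s):
        return np.exp(-(y - m) ** 2 / (2 * s ** 2)) / (np.sqrt(2 * np.pi) * s)

    p_pos = gauss(y, +A, sigma)                # p(y |  A)
    p_neg = gauss(y, -A, sigma)                # p(y | -A)
    p_y = 0.5 * (p_pos + p_neg)

    # I(X;Y) = (1/2) * integral of [ p(y|A) log2 p(y|A)/p(y) + p(y|-A) log2 p(y|-A)/p(y) ] dy
    integrand = 0.5 * (p_pos * np.log2(p_pos / p_y) + p_neg * np.log2(p_neg / p_y))
    print(np.sum(integrand) * dy)              # between 0 and 1 bit; approaches 1 as A/sigma grows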
Channel Model and Channel Capacity
A general communication channel is described in terms of its set of possible inputs and outputs:

      X: the input alphabet
      Y: the output alphabet

together with the conditional probability P[y_1, y_2, \ldots, y_n \mid x_1, x_2, \ldots, x_n]. A channel is called memoryless if

      P[\mathbf{y} \mid \mathbf{x}] = \prod_{i=1}^{n} P[y_i \mid x_i] \quad \text{for all } n

In other words, a channel is memoryless if the output at time i depends only on the input at time i.
Binary Symmetric Channel
A binary symmetric channel (BSC) with crossover probability p is characterized by the transition probabilities

      P[Y = 0 \mid X = 1] = P[Y = 1 \mid X = 0] = p
      P[Y = 1 \mid X = 1] = P[Y = 0 \mid X = 0] = 1 - p
Discrete Memoryless Channel
The BSC is a special case of a more general discrete-input, discrete-output channel. The discrete memoryless channel (DMC) is a channel model in which the input and output alphabets X and Y are discrete sets and the channel is memoryless.
AWGN Waveform Channel
We consider a channel model in which both the input and the output are waveforms. Assume the channel has a given bandwidth W, with ideal frequency response C(f) = 1 within the frequency range [-W, +W]. Let

      x(t): the band-limited input
      y(t): the corresponding output
      n(t): additive white Gaussian noise with power spectral density N_0/2

Then

      y(t) = x(t) + n(t)
The channel input is subject to a power constraint of the form

      E[X^2(t)] \le P

which for ergodic inputs results in an input power constraint of the form

      \lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} x^2(t) \, dt \le P
We can express x(t), y(t), and n(t) in the form

      x(t) = \sum_{j} x_j \phi_j(t), \qquad n(t) = \sum_{j} n_j \phi_j(t), \qquad y(t) = \sum_{j} y_j \phi_j(t)

where \{x_j\}, \{n_j\}, and \{y_j\} are the sets of coefficients in the corresponding expansions, e.g.,

      y_j = \int_{-\infty}^{\infty} y(t) \phi_j(t) \, dt = \int_{-\infty}^{\infty} [x(t) + n(t)] \phi_j(t) \, dt = x_j + n_j
Since the n_j's are i.i.d. zero-mean Gaussian random variables with variance N_0/2, it follows that

      p(y_j \mid x_j) = \frac{1}{\sqrt{\pi N_0}} \exp\!\left[ -\frac{(y_j - x_j)^2}{N_0} \right], \qquad j = 1, 2, \ldots

and, by the independence of the n_j's,

      p(y_1, y_2, \ldots, y_N \mid x_1, x_2, \ldots, x_N) = \prod_{j=1}^{N} p(y_j \mid x_j)

for any N.
The power constraint on the input waveform can be written as

      \lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} x^2(t) \, dt = \lim_{T \to \infty} \frac{1}{T} \sum_{j=1}^{2WT} x_j^2
                                                                     = \lim_{T \to \infty} \frac{1}{T} \cdot 2WT \, E[X^2]
                                                                     = 2W E[X^2] \le P

We conclude that in the discrete-time channel model we have

      E[X^2] \le \frac{P}{2W}
Channel Capacity
• The entropy and the rate-distortion function provide the minimum rates required for compression of a discrete memoryless source, subject to the condition that it can be recovered either losslessly or to within a prescribed distortion.
• We now introduce a third fundamental quantity, called channel capacity, that provides the maximum rate at which reliable communication over a channel is possible.
For an arbitrary DMC the capacity is given by

      C = \max_{\mathbf{p}} I(X; Y)

where the maximization is over all PMFs of the form \mathbf{p} = (p_1, p_2, \ldots, p_{|X|}) on the input alphabet X. The p_i's naturally satisfy the constraints

      p_i \ge 0, \qquad i = 1, 2, \ldots, |X|

      \sum_{i=1}^{|X|} p_i = 1

The units of C are bits per transmission, or bits per channel use (a numerical sketch of this maximization is given below).
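One standard way to carry out this maximization numerically is the Blahut-Arimoto algorithm, which alternates between updating the posterior distribution of the input and the input PMF itself. The slides do not cover this algorithm; the sketch below, including the example transition matrix, is an illustrative assumption.

    import numpy as np

    def blahut_arimoto(W, iters=200):
        # W[i, j] = P[Y = y_j | X = x_i]; each row of W sums to 1
        nx = W.shape[0]
        p = np.full(nx, 1.0 / nx)                   # start from the uniform input PMF
        for _ in range(iters):
            q = W * p[:, None]                      # joint P[x, y]
            q /= q.sum(axis=0, keepdims=True)       # posterior P[x | y]
            # update p(x) proportional to exp( sum_y P[y|x] log P[x|y] )
            logp = np.sum(W * np.log(q + 1e-300), axis=1)
            p = np.exp(logp - logp.max())
            p /= p.sum()
        # mutual information (in bits) achieved by the final input PMF
        py = p @ W
        I = np.sum(p[:, None] * W * np.log2((W + 1e-300) / (py[None, :] + 1e-300)))
        return I, p

    # BSC with crossover probability 0.1: capacity should be about 1 - H(0.1) ≈ 0.531 bit
    W = np.array([[0.9, 0.1],
                  [0.1, 0.9]])
    C, p_opt = blahut_arimoto(W)
    print(C, p_opt)                                 # ≈ 0.531, optimal input PMF ≈ [0.5, 0.5]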
Example 6.5-1. For a BSC with crossover probability p, the symmetry of the channel implies that the capacity is achieved by a uniform input distribution, i.e., P[X = 0] = P[X = 1] = 1/2. The resulting maximum mutual information is

      C = 1 + p \log_2 p + (1 - p) \log_2 (1 - p) = 1 - H(p)

where H(p) = -p \log_2 p - (1 - p) \log_2 (1 - p) is the binary entropy function.
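The closed form C = 1 - H(p) can be checked against a direct maximization of I(X; Y) over the input PMF. A minimal Python sketch follows; the crossover probability value is an illustrative assumption.

    import numpy as np

    def binary_entropy(p):
        # H(p) = -p log2 p - (1 - p) log2 (1 - p), with H(0) = H(1) = 0
        p = np.clip(p, 1e-12, 1 - 1e-12)
        return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

    def bsc_mutual_information(q, p):
        # I(X;Y) = H(Y) - H(Y|X) for input PMF (q, 1-q) on a BSC with crossover probability p
        r = q * (1 - p) + (1 - q) * p              # P[Y = 0]
        return binary_entropy(r) - binary_entropy(p)

    p = 0.11
    qs = np.linspace(0, 1, 1001)
    I = bsc_mutual_information(qs, p)
    print(qs[np.argmax(I)], I.max())               # maximizer ≈ 0.5
    print(1 - binary_entropy(p))                   # closed-form capacity 1 - H(p), same value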
Channel Capacity with Input Power Constraint
The waveform channel model is equivalent to 2W uses per second of a discrete-time AWGN channel with input power constraint P/(2W) and noise variance \sigma^2 = N_0/2. The capacity of this discrete-time channel is

      C = \frac{1}{2} \log_2 \left( 1 + \frac{P/(2W)}{N_0/2} \right) = \frac{1}{2} \log_2 \left( 1 + \frac{P}{N_0 W} \right) \quad \text{bits/channel use}
The capacity of the continuous-time channel is given by

      C = 2W \times \frac{1}{2} \log_2 \left( 1 + \frac{P}{N_0 W} \right) = W \log_2 \left( 1 + \frac{P}{N_0 W} \right) \quad \text{bits/s}

To see how the capacity behaves as W \to \infty, we use the relation \ln(1 + x) \to x as x \to 0:

      C_\infty = \lim_{W \to \infty} W \log_2 \left( 1 + \frac{P}{N_0 W} \right) = \frac{P}{N_0} \log_2 e
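A minimal Python sketch illustrating the wideband limit; the values of P and N_0 are illustrative assumptions.

    import numpy as np

    P, N0 = 1.0, 1e-2                              # assumed signal power and noise spectral density
    C_inf = (P / N0) * np.log2(np.e)               # limiting capacity, about 144.3 bits/s here

    for W in (10.0, 100.0, 1e3, 1e4, 1e5):
        C = W * np.log2(1 + P / (N0 * W))
        print(W, C)                                # increases with W and saturates near C_inf
    print("C_inf =", C_inf)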