Eleven

Information Theory and Source Coding

11.1 INTRODUCTION
Information theory provides a quantitative measure of the information contained in message signals and allows
us to determine the capacity of a communication system to transfer this information from source to destination.
In this chapter we briefly explore some basic ideas involved in information theory and source coding.
Thus, a mathematical measure of information should be a function of the probability of the outcome
and should satisfy the following axioms:
1. Information should be proportional to the uncertainty of an outcome.
2. Information contained in independent outcomes should add.
The base-2 logarithm can be computed from natural or common logarithms by

log_2 a = (ln a)/(ln 2) = (log_10 a)/(log_10 2)      (11.6)
2. Average Information or Entropy  In a practical communication system, we usually transmit long sequences of symbols from an information source. Thus, we are more interested in the average information that a source produces than in the information content of a single symbol.
The mean value of I(x_i) over the alphabet of source X with m different symbols is given by

H(X) = E[I(x_i)] = -Σ_{i=1}^{m} P(x_i) log_2 P(x_i)   b/symbol      (11.7)
The quantity H(X) is called the entropy of source X. It is a measure of the average information content
per source symbol. The source entropy H(X) can be considered as the average amount of uncertainty
within source X that is resolved by use of the alphabet.
Note that for a binary source X that generates independent symbols 0 and 1 with equal probability, the source entropy H(X) is

H(X) = -(1/2) log_2 (1/2) - (1/2) log_2 (1/2) = 1 b/symbol      (11.8)
The source entropy satisfies

0 ≤ H(X) ≤ log_2 m      (11.9)

where m is the size of the alphabet of X. The lower bound corresponds to no uncertainty, which occurs when some P(x_j) = 1 while P(x_i) = 0 for i ≠ j, so X emits the same symbol x_j all the time. The upper bound corresponds to the maximum uncertainty, which occurs when P(x_i) = 1/m for all i, that is, when all symbols are equally likely to be emitted by X.
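As a quick numerical illustration of Eqs. (11.7) through (11.9), the short Python sketch below (ours, not part of the original text; the function name entropy is our own) computes H(X) for an arbitrary probability assignment and checks the bounds.

```python
import math

def entropy(probs):
    """H(X) = -sum P(x_i) log2 P(x_i) of a discrete memoryless source, Eq. (11.7)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Binary source with equally likely symbols: H(X) = 1 b/symbol, Eq. (11.8)
print(entropy([0.5, 0.5]))                     # 1.0

# A skewed four-symbol source stays within the bounds of Eq. (11.9)
p = [0.5, 0.25, 0.125, 0.125]
print(entropy(p))                              # 1.75 b/symbol
print(0 <= entropy(p) <= math.log2(len(p)))    # True
```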
3. Information Rate  If the time rate at which source X emits symbols is r (symbols/s), the information rate R of the source is given by

R = rH(X)   b/s      (11.10)
The channel is said to be memoryless when the current output depends on only the current input and not on any of the previous inputs.

Fig. 11.1  Discrete memoryless channel
A diagram of a DMC with m inputs and n outputs is illustrated in Fig. 11.1. The input X consists
of input symbols x_1, x_2, ..., x_m. The a priori probabilities of these source symbols P(x_i) are assumed to
be known. The output Y consists of output symbols y_1, y_2, ..., y_n. Each possible input-to-output path is
indicated along with a conditional probability P(y_j|x_i), where P(y_j|x_i) is the conditional probability of
obtaining output y_j given that the input is x_i, and is called a channel transition probability.
B. Channel Matrix

A channel is completely specified by the complete set of transition probabilities. Accordingly, the channel
of Fig. 11.1 is often specified by the matrix of transition probabilities [P(Y|X)], given by

[P(Y|X)] = | P(y_1|x_1)  P(y_2|x_1)  ...  P(y_n|x_1) |
           | P(y_1|x_2)  P(y_2|x_2)  ...  P(y_n|x_2) |
           |   ...          ...              ...     |
           | P(y_1|x_m)  P(y_2|x_m)  ...  P(y_n|x_m) |      (11.11)

The matrix [P(Y|X)] is called the channel matrix. Since each input to the channel results in some
output, each row of the channel matrix must sum to unity, that is,

Σ_{j=1}^{n} P(y_j|x_i) = 1   for all i      (11.12)

If the input probabilities P(x_i) are represented by the row matrix

[P(X)] = [P(x_1)  P(x_2)  ...  P(x_m)]      (11.13)

and the output probabilities P(y_j) are represented by the row matrix

[P(Y)] = [P(y_1)  P(y_2)  ...  P(y_n)]      (11.14)

then

[P(Y)] = [P(X)][P(Y|X)]      (11.15)

If [P(X)] is represented as a diagonal matrix

[P(X)]_d = diag[P(x_1), P(x_2), ..., P(x_m)]      (11.16)

then

[P(X, Y)] = [P(X)]_d [P(Y|X)]      (11.17)

where the (i, j) element of [P(X, Y)] is the joint probability P(x_i, y_j) of transmitting x_i and receiving y_j; [P(X, Y)] is known as the joint probability matrix.
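A minimal numerical sketch of Eqs. (11.15) and (11.17) follows (our own illustrative channel matrix and input probabilities, not values from the text): the output probabilities are the input row matrix times the channel matrix, and the joint probability matrix scales each row of [P(Y|X)] by the corresponding input probability.

```python
import numpy as np

# Hypothetical channel matrix [P(Y|X)]: rows are inputs, columns outputs; rows sum to 1 (Eq. 11.12)
P_Y_given_X = np.array([[0.9, 0.1],
                        [0.2, 0.8]])
P_X = np.array([0.5, 0.5])                   # input probabilities [P(X)]

P_Y = P_X @ P_Y_given_X                      # Eq. (11.15): [P(Y)] = [P(X)][P(Y|X)]
P_XY = np.diag(P_X) @ P_Y_given_X            # Eq. (11.17): [P(X,Y)] = [P(X)]_d [P(Y|X)]

print(P_Y)         # [0.55 0.45]
print(P_XY)        # [[0.45 0.05], [0.1 0.4]]
print(P_XY.sum())  # 1.0
```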
C. Special Channels

1. Lossless Channel  A channel described by a channel matrix with only one nonzero element in each column is called a lossless channel. An example of a lossless channel is shown in Fig. 11.2, and the corresponding channel matrix is shown in Eq. (11.18):

[P(Y|X)] = | 3/4  1/4   0    0   0 |
           |  0    0   1/3  2/3  0 |      (11.18)
           |  0    0    0    0   1 |

Fig. 11.2  Lossless channel
It can be shown that in the lossless channel no source information is lost in transmission. [See Eq. (11.35)
and Solved Problem 11.10.]
2. Deterministic Channel  A channel described by a channel matrix with only one nonzero
element in each row is called a deterministic channel. An example of a deterministic channel is shown in
Fig. 11.3, and the corresponding channel matrix is shown in Eq. (11.19):

[P(Y|X)] = | 1  0  0 |
           | 1  0  0 |
           | 0  1  0 |      (11.19)
           | 0  1  0 |
           | 0  0  1 |

Note that since each row has only one nonzero element, this element must be unity by Eq. (11.12). Thus, when a given source symbol is sent in the deterministic channel, it is clear which output symbol will be received.

Fig. 11.3  Deterministic channel
3. Noiseless Channel  A channel is called noiseless if it is both lossless and deterministic. A noiseless channel is shown in Fig. 11.4. The channel matrix has only one element in each row and in each column, and this element is unity. Note that the input and output alphabets are of the same size; that is, m = n for the noiseless channel.
4. Binary Symmetric Channel  The binary symmetric channel (BSC) is defined by the channel
diagram shown in Fig. 11.5, and its channel matrix is given by

[P(Y|X)] = | 1-p   p  |      (11.20)
           |  p   1-p |

The channel has two inputs (x_1 = 0, x_2 = 1) and two outputs (y_1 = 0, y_2 = 1). The channel is symmetric because the probability of receiving a 1 if a 0 is sent is the same as the probability of receiving a 0 if a 1 is sent. This common transition probability is denoted by p. (See Supplementary Problem 11.1.)
Fig. 11.4  Noiseless channel        Fig. 11.5  Binary symmetric channel
In analogy with the source entropy H(X) of Eq. (11.21), the entropy of the channel output is

H(Y) = -Σ_{j=1}^{n} P(y_j) log_2 P(y_j)      (11.22)
These entropies can be interpreted as follows: H(X) is the average uncertainty of the channel input,
and H(Y) is the average uncertainty of the channel output. The conditional entropy H(X|Y) is a
measure of the average uncertainty remaining about the channel input after the channel output has been
observed. H(X|Y) is sometimes called the equivocation of X with respect to Y. The conditional entropy
H(Y|X) is the average uncertainty of the channel output given that X was transmitted. The joint
entropy H(X, Y) is the average uncertainty of the communication channel as a whole.
Two useful relationships among the above various entropies are
H(X, Y) = H(X|Y) + H(Y)      (11.26)
H(X, Y) = H(Y|X) + H(X)      (11.27)
B. Mutual Information
The mutual information I(X; Y) of a channel is defined by
I(X; Y) = H(X) - H(X|Y)   b/symbol      (11.28)

Since H(X) represents the uncertainty about the channel input before the channel output is observed
and H(X|Y) represents the uncertainty about the channel input after the channel output is observed, the
mutual information I(X; Y) represents the uncertainty about the channel input that is resolved by observing the channel output.

Properties of I(X; Y):
1. I(X; Y) = I(Y; X)      (11.29)
2. I(X; Y) ≥ 0      (11.30)
3. I(X; Y) = H(Y) - H(Y|X)      (11.31)
4. I(X; Y) = H(X) + H(Y) - H(X, Y)      (11.32)
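The identities (11.28) through (11.32) are easy to check numerically. The sketch below is ours (helper names and the joint matrix are our own; the joint matrix is the one produced by the earlier channel-matrix sketch): it computes H(X), H(Y), and H(X, Y) and verifies that I(X; Y) = H(X) + H(Y) - H(X, Y) equals H(X) - H(X|Y).

```python
import numpy as np

def H(p):
    """Entropy of a probability array, in bits."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# Hypothetical joint probability matrix P(x_i, y_j); rows: inputs, columns: outputs
P_XY = np.array([[0.45, 0.05],
                 [0.10, 0.40]])

P_X = P_XY.sum(axis=1)              # marginal P(x_i)
P_Y = P_XY.sum(axis=0)              # marginal P(y_j)

I_a = H(P_X) + H(P_Y) - H(P_XY)     # Eq. (11.32)
H_X_given_Y = H(P_XY) - H(P_Y)      # from Eq. (11.26)
I_b = H(P_X) - H_X_given_Y          # Eq. (11.28)

print(round(I_a, 6), round(I_b, 6)) # equal, and nonnegative (Eq. 11.30)
```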
1. Lossless Channel  For a lossless channel, H(X|Y) = 0 and I(X; Y) = H(X); no source information is lost in transmission. The channel capacity per symbol is therefore

C_s = max_{P(x_i)} H(X) = log_2 m      (11.36)
2. Deterministic Channel  For a deterministic channel, H(Y|X) = 0 for all input distributions P(x_i), and

I(X; Y) = H(Y)      (11.37)

Thus, the information transfer is equal to the output entropy. The channel capacity per symbol is

C_s = max_{P(x_i)} H(Y) = log_2 n      (11.38)
3. Noiseless Channel  Since a noiseless channel is both lossless and deterministic, we have

I(X; Y) = H(X) = H(Y)      (11.39)

and the channel capacity per symbol is

C_s = log_2 m = log_2 n      (11.40)
4. Binary Symmetric Channel  For the BSC of Fig. 11.5, the mutual information is (Solved Problem 11.16)

I(X; Y) = H(Y) + p log_2 p + (1 - p) log_2 (1 - p)      (11.41)

and the channel capacity per symbol is

C_s = 1 + p log_2 p + (1 - p) log_2 (1 - p)      (11.42)
A. Differential Entropy
The average amount of information per sample value of x(t) is measured by

H(X) = -∫_{-∞}^{∞} f_X(x) log_2 f_X(x) dx   b/sample      (11.43)

The quantity H(X) defined by Eq. (11.43) is known as the differential entropy of X.
The parameter L represents the average number of bits per source symbol used in the source coding
process.
The code efficiency η is defined as

η = L_min / L      (11.50)

where L_min is the minimum possible value of L. When η approaches unity, the code is said to be efficient.

The code redundancy γ is defined as

γ = 1 - η      (11.51)
C. Classification of Codes
Classification of codes is best illustrated by an example. Consider Table 11.1, where a source of size 4 has been encoded in binary codes with symbols 0 and 1.

1. Fixed-Length Codes  A fixed-length code is one whose code word length is fixed. Codes 1 and 2 of Table 11.1 are fixed-length codes with length 2.

2. Variable-Length Codes  A variable-length code is one whose code word length is not fixed. All codes of Table 11.1 except codes 1 and 2 are variable-length codes.

3. Distinct Codes  A code is distinct if each code word is distinguishable from other code words. All codes of Table 11.1 except code 1 are distinct codes; notice that code 1 assigns the same code word to two different source symbols.
4. Prefix-Free Codes  A code in which no code word can be formed by adding code symbols to
another code word is called a prefix-free code. Thus, in a prefix-free code no code word is a prefix of
another. Codes 2, 4, and 6 of Table 11.1 are prefix-free codes.

5. Uniquely Decodable Codes  A distinct code is uniquely decodable if the original source
sequence can be reconstructed perfectly from the encoded binary sequence. Note that code 3 of Table
11.1 is not a uniquely decodable code: for example, the binary sequence 1001 may correspond to
more than one source sequence. A sufficient condition to ensure that a code is uniquely decodable
is that no code word is a prefix of another. Thus, the prefix-free codes 2, 4, and 6 are uniquely decodable codes. Note that the prefix-free condition is not a necessary condition for unique decodability. For
example, code 5 of Table 11.1 does not satisfy the prefix-free condition, and yet it is uniquely decodable,
since the bit 0 indicates the beginning of each code word of the code.
6. Instantaneous Codes  A uniquely decodable code is called an instantaneous code if the end
of any code word is recognizable without examining subsequent code symbols. Instantaneous codes
have the property previously mentioned that no code word is a prefix of another code word. For this
reason, prefix-free codes are sometimes called instantaneous codes.

7. Optimal Codes  A code is said to be optimal if it is instantaneous and has minimum average
length L for a given source with a given probability assignment for the source symbols.
D. Kraft Inequality

Let X be a DMS with alphabet {x_i} (i = 1, 2, ..., m). Assume that the length of the assigned binary code
word corresponding to x_i is n_i.
A necessary and sufficient condition for the existence of an instantaneous binary code is

K = Σ_{i=1}^{m} 2^{-n_i} ≤ 1      (11.54)

which is known as the Kraft inequality.
Note that the Kraft inequality assures us of the existence of an instantaneously decodable code with
code word lengths that satisfy the inequality. But it does not show us how to obtain these code words,
nor does it say that any code that satisfies the inequality is automatically uniquely decodable (Solved
Problem 11.27).
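The Kraft sum of Eq. (11.54) is easy to check by machine; the small helper below is ours. The three length sets used as examples are the code word lengths of codes A, B, and D in Table 11.6 of Solved Problem 11.27.

```python
def kraft_sum(lengths):
    """K = sum of 2^(-n_i) over the binary code word lengths n_i, Eq. (11.54)."""
    return sum(2.0 ** (-n) for n in lengths)

print(kraft_sum([2, 2, 2, 2]))   # 1.0   -> an instantaneous code exists
print(kraft_sum([1, 2, 2, 3]))   # 1.125 -> violates the Kraft inequality
print(kraft_sum([1, 3, 3, 3]))   # 0.875 -> satisfies it with K < 1
```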
A. Shannon-Fano Coding
An efficient code can be obtained by the following simple procedure, known as the Shannon-Fano algorithm:
1. List the source symbols in order of decreasing probability.
2. Partition the set into two sets that are as close to equiprobable as possible, and assign 0 to the upper set and 1 to the lower set.
3. Continue this process, each time partitioning the sets with as nearly equal probabilities as possible, until further partitioning is not possible.
An example of Shannon-Fano encoding is shown in Table 11.2. Note that in Shannon-Fano
encoding, ambiguity may arise in the choice of approximately equiprobable sets. (See Solved
Problem 11.33.)
For the example of Table 11.2,

L = 2.38 b/symbol
η = H(X)/L = 0.99
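The three-step procedure can be sketched in Python as follows (an illustrative implementation of our own, not the book's; the split point is chosen to make the two subsets as nearly equiprobable as possible):

```python
def shannon_fano(symbols):
    """symbols: list of (name, probability) pairs. Returns {name: code string}."""
    symbols = sorted(symbols, key=lambda sp: sp[1], reverse=True)   # step 1
    codes = {name: "" for name, _ in symbols}

    def split(group):
        if len(group) <= 1:
            return
        total = sum(p for _, p in group)
        running, best_k, best_diff = 0.0, 1, float("inf")
        for k in range(1, len(group)):          # step 2: most nearly equiprobable split
            running += group[k - 1][1]
            diff = abs(2 * running - total)
            if diff < best_diff:
                best_k, best_diff = k, diff
        upper, lower = group[:best_k], group[best_k:]
        for name, _ in upper:
            codes[name] += "0"
        for name, _ in lower:
            codes[name] += "1"
        split(upper)                            # step 3: repeat on each subset
        split(lower)

    split(symbols)
    return codes

# A four-symbol source with probabilities 1/2, 1/4, 1/8, 1/8
print(shannon_fano([("x1", 0.5), ("x2", 0.25), ("x3", 0.125), ("x4", 0.125)]))
# {'x1': '0', 'x2': '10', 'x3': '110', 'x4': '111'}
```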
B. Huffman Encoding

In general, Huffman encoding results in an optimum code. Thus, it is the code that has the highest efficiency (Solved Problem 11.34). The Huffman encoding procedure is as follows:
1. List the source symbols in order of decreasing probability.
2. Combine the two symbols having the lowest probabilities into a single symbol whose probability is the sum of the two, and reorder the resulting list. Repeat this reduction until only two probabilities remain.
3. Start encoding with the last reduction by assigning 0 to one of the two remaining probabilities and 1 to the other; then work backward through the reductions, appending 0 to the code word of one of the two probabilities combined at each step and 1 to the other, retaining the digits already assigned.
For an example of Huffman encoding,

L = 2.38 b/symbol
η = 0.99
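As an illustration of the procedure above, here is a compact Huffman encoder (our own sketch, using a heap-based formulation rather than the book's tabular reductions; all names are ours):

```python
import heapq
from itertools import count

def huffman(symbols):
    """symbols: dict {name: probability}. Returns {name: binary code string}."""
    tiebreak = count()   # keeps heap comparisons well defined for equal probabilities
    heap = [(p, next(tiebreak), {name: ""}) for name, p in symbols.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, codes1 = heapq.heappop(heap)   # two smallest probabilities
        p2, _, codes2 = heapq.heappop(heap)
        for name in codes1:                   # prepend 0 to one branch, 1 to the other
            codes1[name] = "0" + codes1[name]
        for name in codes2:
            codes2[name] = "1" + codes2[name]
        heapq.heappush(heap, (p1 + p2, next(tiebreak), {**codes1, **codes2}))
    return heap[0][2]

probs = {"x1": 0.4, "x2": 0.19, "x3": 0.16, "x4": 0.15, "x5": 0.1}
codes = huffman(probs)
L = sum(probs[s] * len(codes[s]) for s in probs)
print(codes)
print(round(L, 2))   # 2.2 b/symbol, shorter than the 2.25 of the Shannon-Fano code (Solved Problem 11.34)
```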
Measure of Information
A DMS X has four symbols x_1, x_2, x_3, and x_4, with P(x_4) = 0.1.
(a) Calculate H(X).
(b) Find the amount of information contained in the messages x_1x_2x_1x_3 and x_4x_3x_3x_2, and compare with the H(X) obtained in part (a).
For a binary source X with symbol probabilities α and 1 - α, find the value of α that maximizes the entropy H(X).

Let P(x_1) = α and P(x_2) = 1 - α. Then

H(X) = -α log_2 α - (1 - α) log_2 (1 - α)      (11.55)

dH(X)/dα = d/dα [-α log_2 α - (1 - α) log_2 (1 - α)]

Using the relation

d/dx (log_b y) = (log_b e)(1/y)(dy/dx)

we obtain

dH(X)/dα = -log_2 α + log_2 (1 - α) = log_2 [(1 - α)/α]

Setting dH(X)/dα = 0 gives (1 - α)/α = 1, that is, α = 1/2, for which H(X) attains its maximum value of 1 b/symbol.
11.4 Verify Eq. (11.9), that is, 0 ≤ H(X) ≤ log_2 m.

Since 0 ≤ P(x_i) ≤ 1, we have 1/P(x_i) ≥ 1 and log_2 [1/P(x_i)] ≥ 0. Then it follows that

P(x_i) log_2 [1/P(x_i)] ≥ 0

so H(X) = Σ_{i=1}^{m} P(x_i) log_2 [1/P(x_i)] ≥ 0, and P(x_i) log_2 [1/P(x_i)] = 0 if and only if P(x_i) = 0 or 1.

Next, consider two probability distributions {p_i = P(x_i)} and {q_i} on the alphabet of X, with Σ_{i=1}^{m} p_i = 1 and Σ_{i=1}^{m} q_i = 1. Then

Σ_{i=1}^{m} p_i log_2 (q_i/p_i) = (log_2 e) Σ_{i=1}^{m} p_i ln (q_i/p_i)
                               ≤ (log_2 e) Σ_{i=1}^{m} p_i (q_i/p_i - 1)
                               = (log_2 e) (Σ_{i=1}^{m} q_i - Σ_{i=1}^{m} p_i) = 0

by using Eq. (11.58), that is, ln α ≤ α - 1. Thus,

Σ_{i=1}^{m} p_i log_2 (q_i/p_i) ≤ 0      (11.61)

Setting

q_i = 1/m,   i = 1, 2, ..., m      (11.62)

we obtain

Σ_{i=1}^{m} p_i log_2 (q_i/p_i) = Σ_{i=1}^{m} p_i log_2 [1/(m p_i)]
                               = -Σ_{i=1}^{m} p_i log_2 p_i - (log_2 m) Σ_{i=1}^{m} p_i      (11.63)
                               = H(X) - log_2 m ≤ 0

Hence, H(X) ≤ log_2 m, and the equality holds only if the symbols of X are equiprobable, as in Eq. (11.62).
11.5 A high-resolution black-and-white TV picture consists of about 2 x 10^6 picture elements and
16 different brightness levels. Pictures are repeated at the rate of 32 per second. All picture
elements are assumed to be independent, and all levels have equal likelihood of occurrence.
Calculate the average rate of information conveyed by this TV picture source.

H(X) = -Σ_{i=1}^{16} (1/16) log_2 (1/16) = 4 b/element

r = 2(10^6)(32) = 64(10^6) elements/s

Hence, by Eq. (11.10)

R = rH(X) = 64(10^6)(4) = 256(10^6) b/s = 256 Mb/s
11.6 Consider a telegraph source having two symbols, dot and dash. The dot duration is 0.2 s. The dash
duration is 3 times the dot duration. The probability of the dot's occurring is twice that of the dash,
and the time between symbols is 0.2 s. Calculate the information rate of the telegraph source.

P(dot) = 2P(dash) and P(dot) + P(dash) = 3P(dash) = 1

Thus, P(dash) = 1/3 and P(dot) = 2/3.

By Eq. (11.7)

H(X) = -P(dot) log_2 P(dot) - P(dash) log_2 P(dash)
     = 0.667(0.585) + 0.333(1.585)
     = 0.92 b/symbol

t_dot = 0.2 s    t_dash = 0.6 s    t_space = 0.2 s

The average time per symbol is

T_s = P(dot) t_dot + P(dash) t_dash + t_space = 0.5333 s

and the average symbol rate is

r = 1/T_s = 1.875 symbols/s

Thus, the average information rate of the telegraph source is

R = rH(X) = 1.875(0.92) = 1.725 b/s
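The arithmetic of Solved Problem 11.6 can be checked with a few lines of Python (the script is ours):

```python
import math

P_dot, P_dash = 2/3, 1/3
H = -(P_dot * math.log2(P_dot) + P_dash * math.log2(P_dash))   # Eq. (11.7)

t_dot, t_dash, t_space = 0.2, 0.6, 0.2
T_s = P_dot * t_dot + P_dash * t_dash + t_space   # average time per symbol, seconds
r = 1 / T_s                                       # average symbol rate

print(round(H, 3), round(r, 3), round(r * H, 3))  # ~0.918 b/symbol, 1.875 symbols/s, ~1.72 b/s
```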
(a) Using Eq. (11.11), we see that the channel matrix is given by

[P(Y|X)] = | 0.9  0.1 |
           | 0.2  0.8 |
11.8 Two binary channels of Solved Problem 11.7 are connected in cascade, as shown in Fig. 11.8.

Fig. 11.8

(a) Find the overall channel matrix of the resultant channel, and draw the resultant equivalent channel diagram.
(b) Find P(z_1) and P(z_2) when P(x_1) = P(x_2) = 0.5.

(a) By Eq. (11.15)

[P(Y)] = [P(X)][P(Y|X)]
[P(Z)] = [P(Y)][P(Z|Y)] = [P(X)][P(Y|X)][P(Z|Y)] = [P(X)][P(Z|X)]

Thus, from Fig. 11.8,

[P(Z|X)] = [P(Y|X)][P(Z|Y)] = | 0.9  0.1 | | 0.9  0.1 | = | 0.83  0.17 |
                              | 0.2  0.8 | | 0.2  0.8 |   | 0.34  0.66 |

The resultant equivalent channel diagram is shown in Fig. 11.9.

(b) [P(Z)] = [P(X)][P(Z|X)] = [0.5  0.5] | 0.83  0.17 | = [0.585  0.415]
                                         | 0.34  0.66 |

Hence, P(z_1) = 0.585 and P(z_2) = 0.415.
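The cascade computation of Solved Problem 11.8 reduces to two matrix products; a small NumPy check (ours):

```python
import numpy as np

P_Y_given_X = np.array([[0.9, 0.1],
                        [0.2, 0.8]])          # first binary channel
P_Z_given_Y = P_Y_given_X                     # identical second channel

P_Z_given_X = P_Y_given_X @ P_Z_given_Y       # overall channel matrix [P(Z|X)]
P_X = np.array([0.5, 0.5])
P_Z = P_X @ P_Z_given_X                       # Eq. (11.15) applied to the cascade

print(P_Z_given_X)   # [[0.83 0.17], [0.34 0.66]]
print(P_Z)           # [0.585 0.415]
```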
11.9 A channel has the following channel matrix:

[P(Y|X)] = | 1-p   p    0  |      (11.64)
           |  0    p   1-p |

(a) Draw the channel diagram.
(b) If the source has equally likely outputs, compute the probabilities associated with the channel
outputs for p = 0.2.

(a) The channel diagram is shown in Fig. 11.10. Note that the channel represented by Eq. (11.64)
(see Fig. 11.10) is known as the binary erasure channel. The binary erasure channel has two
inputs x_1 = 0 and x_2 = 1 and three outputs y_1 = 0, y_2 = e, and y_3 = 1, where e indicates an
erasure; that is, the output is in doubt, and it should be erased.

(b) By Eq. (11.15)

[P(Y)] = [0.5  0.5] | 0.8  0.2   0  | = [0.4  0.2  0.4]
                    |  0   0.2  0.8 |

Fig. 11.10  Binary erasure channel
Mutual Information
For the noiseless channel of Fig. 11.4, the transition probabilities are

P(y_j|x_i) = { 1,   i = j
             { 0,   i ≠ j      (11.70)

Hence,

P(x_i, y_j) = P(y_j|x_i)P(x_i) = { P(x_i),   i = j
                                 { 0,        i ≠ j      (11.71)

and

P(y_j) = Σ_{i=1}^{m} P(x_i, y_j) = P(x_j)      (11.72)

Thus, by Eqs. (11.7) and (11.72),

H(Y) = -Σ_{j=1}^{m} P(y_j) log_2 P(y_j) = -Σ_{i=1}^{m} P(x_i) log_2 P(x_i) = H(X)

Next, by Eqs. (11.24), (11.70), and (11.71),

H(Y|X) = -Σ_{j=1}^{m} Σ_{i=1}^{m} P(x_i, y_j) log_2 P(y_j|x_i) = -Σ_{i=1}^{m} P(x_i) log_2 P(y_i|x_i) = -Σ_{i=1}^{m} P(x_i) log_2 1 = 0

Note that all the terms in the inner summation with i ≠ j are zero because P(x_i, y_j) = 0, and the remaining terms are of the form P(x_i) log_2 1 = 0. Hence I(X; Y) = H(Y) - H(Y|X) = H(Y) = H(X), verifying Eq. (11.39).
11.13 Show that the mutual information I(X; Y) of the channel described by Eq. (11.11), with the
input probabilities P(x_i), i = 1, 2, ..., m, and the output probabilities P(y_j), j = 1, 2, ..., n, can be
expressed as

I(X; Y) = Σ_{i=1}^{m} Σ_{j=1}^{n} P(x_i, y_j) log_2 [P(x_i|y_j)/P(x_i)]      (11.73)

By Eq. (11.28)

I(X; Y) = H(X) - H(X|Y)

Using Eqs. (11.21) and (11.23), and noting that Σ_{j=1}^{n} P(x_i, y_j) = P(x_i), we obtain

I(X; Y) = -Σ_{i=1}^{m} P(x_i) log_2 P(x_i) + Σ_{i=1}^{m} Σ_{j=1}^{n} P(x_i, y_j) log_2 P(x_i|y_j)
        = Σ_{i=1}^{m} Σ_{j=1}^{n} P(x_i, y_j) log_2 [P(x_i|y_j)/P(x_i)]
Since P(x_i|y_j)/P(x_i) = P(x_i, y_j)/[P(x_i)P(y_j)] is symmetric in X and Y, Eq. (11.73) also gives

I(X; Y) = I(Y; X)

verifying Eq. (11.29).

To verify Eq. (11.30), that is, I(X; Y) ≥ 0: from Eq. (11.73), and using the relation log (a/b) = -log (b/a), we have

-I(X; Y) = Σ_{i=1}^{m} Σ_{j=1}^{n} P(x_i, y_j) log_2 [P(x_i)P(y_j)/P(x_i, y_j)]

Using the inequality (11.59), that is, ln α ≤ α - 1, we have

-I(X; Y) ≤ (1/ln 2) Σ_{i=1}^{m} Σ_{j=1}^{n} P(x_i, y_j) [P(x_i)P(y_j)/P(x_i, y_j) - 1]      (11.77)

or

-I(X; Y) ≤ (1/ln 2) [Σ_{i=1}^{m} Σ_{j=1}^{n} P(x_i)P(y_j) - Σ_{i=1}^{m} Σ_{j=1}^{n} P(x_i, y_j)]

Since

Σ_{i=1}^{m} Σ_{j=1}^{n} P(x_i)P(y_j) = Σ_{i=1}^{m} P(x_i) Σ_{j=1}^{n} P(y_j) = (1)(1) = 1

and Σ_{i=1}^{m} Σ_{j=1}^{n} P(x_i, y_j) = 1, it follows that

-I(X; Y) ≤ 0   or   I(X; Y) ≥ 0
For the BSC of Fig. 11.11 with P(x_1) = α and P(x_2) = 1 - α, the joint probability matrix is, by Eq. (11.17),

[P(X, Y)] = | α    0  | | 1-p   p  | = | α(1-p)      αp        |
            | 0   1-α | |  p   1-p |   | (1-α)p   (1-α)(1-p)   |

For α = 0.5 and p = 0.1, by Eq. (11.15),

[P(Y)] = [0.5  0.5] | 0.9  0.1 | = [0.5  0.5]
                    | 0.1  0.9 |

Thus, P(y_1) = P(y_2) = 0.5.

For α = 0.5 and p = 0.5, [P(Y)] = [0.5  0.5] again, and by Eq. (11.22)

H(Y) = -0.5 log_2 0.5 - 0.5 log_2 0.5 = 1

while

p log_2 p + (1 - p) log_2 (1 - p) = 0.5 log_2 0.5 + 0.5 log_2 0.5 = -1

Thus, I(X; Y) = 1 - 1 = 0

Note that in this case (p = 0.5) no information is being transmitted at all. An equally acceptable decision could be made by dispensing with the channel entirely and "flipping a coin" at
the receiver. When I(X; Y) = 0, the channel is said to be useless.
Channel Capacity

For a lossless channel, H(X|Y) = 0 and I(X; Y) = H(X), so the channel capacity per symbol is

C_s = max_{P(x_i)} I(X; Y) = max_{P(x_i)} H(X) = log_2 m
11.19 Find the channel capacity of the binary erasure channel of Fig. 11.12 (Solved Problem 11.9).

Fig. 11.12  (inputs x_1 and x_2 with P(x_1) = α and P(x_2) = 1 - α)

By Eq. (11.15)

[P(Y)] = [α  1-α] | 1-p   p    0  | = [α(1-p)   p   (1-α)(1-p)]
                  |  0    p   1-p |
       = [P(y_1)  P(y_2)  P(y_3)]

By Eq. (11.17)

[P(X, Y)] = | α    0  | | 1-p   p    0  | = | α(1-p)     αp         0         |
            | 0   1-α | |  0    p   1-p |   |   0      (1-α)p   (1-α)(1-p)    |

          = | P(x_1, y_1)  P(x_1, y_2)  P(x_1, y_3) |
            | P(x_2, y_1)  P(x_2, y_2)  P(x_2, y_3) |

In addition, from Eqs. (11.22) and (11.24) we can calculate

H(Y) = -Σ_{j=1}^{3} P(y_j) log_2 P(y_j)
11.20 Find the differential entropy H(X) of the uniformly distributed random variable X with probability density function

f_X(x) = 1/a   for 0 < x < a
       = 0     otherwise

By Eq. (11.43)

H(X) = -∫_{-∞}^{∞} f_X(x) log_2 f_X(x) dx = -∫_0^a (1/a) log_2 (1/a) dx = log_2 a
11.21 With the differential entropy of a random variable X defined by Eq. (11.43), that is,
H(X) = -∫_{-∞}^{∞} f_X(x) log_2 f_X(x) dx, find the probability density function f_X(x) for which H(X) is maximum when the mean μ and the variance σ² of X are fixed.

From Eqs. (6.37b) and (6.75), f_X(x) must satisfy the following two conditions:

∫_{-∞}^{∞} f_X(x) dx = 1      (11.86)

∫_{-∞}^{∞} (x - μ)² f_X(x) dx = σ²      (11.87)

where μ is the mean of X and σ² is its variance. Since the problem is the maximization of H(X)
under the constraints of Eqs. (11.86) and (11.87), we use the method of Lagrange multipliers.
First, we form the function

G[f_X] = ∫_{-∞}^{∞} [-f_X(x) log_2 f_X(x) + λ_1 f_X(x) + λ_2 (x - μ)² f_X(x)] dx

where the parameters λ_1 and λ_2 are the Lagrange multipliers. Then the maximization of H(X) requires
that

∂G/∂f_X = -log_2 f_X(x) - log_2 e + λ_1 + λ_2 (x - μ)² = 0

Hence, we obtain

f_X(x) = exp[-1 + (λ_1 + λ_2 (x - μ)²)/log_2 e]      (11.90)

In view of the constraints of Eqs. (11.86) and (11.87), it is required that λ_2 < 0. Let

exp(-1 + λ_1/log_2 e) = a   and   λ_2/log_2 e = -b²

Then Eq. (11.90) can be rewritten as

f_X(x) = a e^{-b²(x-μ)²}      (11.91)

Substituting Eq. (11.91) into Eqs. (11.86) and (11.87), we obtain

∫_{-∞}^{∞} a e^{-b²(x-μ)²} dx = a√π/b = 1      (11.92)

∫_{-∞}^{∞} a (x - μ)² e^{-b²(x-μ)²} dx = a√π/(2b³) = σ²      (11.93)

Solving for a and b², we get

a = 1/(√(2π) σ)   and   b² = 1/(2σ²)

Substituting these values into Eq. (11.91), we see that the desired f_X(x) is given by

f_X(x) = [1/(√(2π) σ)] e^{-(x-μ)²/(2σ²)}

which is the probability density function of a gaussian random variable with mean μ and variance σ².
11.22 Show that the channel capacity of an ideal AWGN channel with infinite bandwidth is given by

C_∞ = (1/ln 2)(S/η) ≈ 1.44 S/η      (11.95)

where S is the average signal power and η/2 is the two-sided power spectral density of the additive white gaussian noise.

From Eq. (11.48) with noise power N = ηB,

C = B log_2 [1 + S/(ηB)]

As B → ∞, using lim_{x→0} (1/x) ln (1 + x) = 1 with x = S/(ηB),

C_∞ = lim_{B→∞} B log_2 [1 + S/(ηB)] = (S/η) log_2 e = (1/ln 2)(S/η) ≈ 1.44 S/η

Note that Eq. (11.95) can be used to estimate upper limits on the performance of any practical
communication system whose transmission channel can be approximated by the AWGN channel.
11.23 Consider an AWGN channel with 4-kHz bandwidth and noise power spectral density
η/2 = 10^{-12} W/Hz. The signal power required at the receiver is 0.1 mW. Calculate the capacity
of this channel.

B = 4000 Hz    S = 0.1(10^{-3}) W
N = ηB = 2(10^{-12})(4000) = 8(10^{-9}) W

Thus,

S/N = 0.1(10^{-3}) / 8(10^{-9}) = 1.25(10^4)

And by Eq. (11.48)

C = B log_2 (1 + S/N) = 4000 log_2 [1 + 1.25(10^4)] = 54.44(10^3) b/s
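Equation (11.48) is easy to evaluate numerically; the helper below (ours) reproduces the result of Solved Problem 11.23 and of Supplementary Problem 11.13.

```python
import math

def awgn_capacity(B, snr):
    """Shannon capacity C = B log2(1 + S/N) of an AWGN channel, Eq. (11.48)."""
    return B * math.log2(1 + snr)

# Solved Problem 11.23: B = 4 kHz, eta/2 = 1e-12 W/Hz, S = 0.1 mW
B, eta, S = 4000.0, 2e-12, 0.1e-3
N = eta * B                                       # noise power in the band
print(round(awgn_capacity(B, S / N)))             # ~54439 b/s (about 54.44 kb/s)

# Supplementary Problem 11.13: B = 1 MHz, S/N = 40 dB
print(awgn_capacity(1e6, 10 ** (40 / 10)) / 1e6)  # ~13.29 Mb/s
```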
11.24 An analog signal of bandwidth 4 kHz is sampled at 1.25 times the Nyquist rate, and each sample is quantized into one of 256 equally likely levels. Assume that successive samples are statistically independent.
(a) Find the information rate of this source.
(b) Can the output of this source be transmitted without error over an AWGN channel with a bandwidth of 10 kHz and an S/N ratio of 100 (20 dB)?
(c) Find the S/N ratio required for error-free transmission for part (b).
(d) Find the bandwidth required for an AWGN channel for error-free transmission of the output of this source if the S/N ratio is 100 (20 dB).

(a) f_M = 4(10^3) Hz
Nyquist rate = 2f_M = 8(10^3) samples/s
r = 1.25 x 8(10^3) = 10^4 samples/s
H(X) = log_2 256 = 8 b/sample
By Eq. (11.10), the information rate is R = rH(X) = 10^4(8) = 8(10^4) b/s

(b) By Eq. (11.48)

C = B log_2 (1 + S/N) = 10^4 log_2 (1 + 10^2) = 66.6(10^3) b/s

Since R > C, error-free transmission is not possible.

(c) The required S/N ratio can be found from

C = 10^4 log_2 (1 + S/N) ≥ 8(10^4)

or

log_2 (1 + S/N) ≥ 8,   that is,   S/N ≥ 2^8 - 1 = 255 = 24.1 dB

Thus, the required S/N ratio must be greater than or equal to 24.1 dB for error-free transmission.

(d) The required bandwidth B can be found from

C = B log_2 (1 + 100) ≥ 8(10^4)

or

B ≥ 8(10^4)/log_2 (1 + 100) = 1.2(10^4) Hz = 12 kHz

and the required bandwidth of the channel must be greater than or equal to 12 kHz.
Source Coding
11.25 Consider a DMS X with two symbols x_1 and x_2, with P(x_1) = 0.9 and P(x_2) = 0.1. Symbols x_1 and x_2 are encoded as follows (Table 11.4):

Table 11.4
x_i    P(x_i)    Code
x_1     0.9       0
x_2     0.1       1

Find the efficiency η and the redundancy γ of this code.

By Eq. (11.49) the average code length L per symbol is

L = Σ_{i=1}^{2} P(x_i) n_i = 0.9(1) + 0.1(1) = 1 b

By Eq. (11.7)

H(X) = -Σ_{i=1}^{2} P(x_i) log_2 P(x_i) = -0.9 log_2 0.9 - 0.1 log_2 0.1 = 0.469 b/symbol

Thus, by Eq. (11.53) the code efficiency η is

η = H(X)/L = 0.469 = 46.9%

By Eq. (11.51) the code redundancy γ is

γ = 1 - η = 0.531 = 53.1%
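The efficiency and redundancy calculations of Solved Problems 11.25 and 11.26 can be verified with a short script (ours; the code word lengths are those of Tables 11.4 and 11.5):

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def code_stats(probs, lengths):
    """Average length L (Eq. 11.49), efficiency eta = H(X)/L, redundancy gamma = 1 - eta."""
    L = sum(p * n for p, n in zip(probs, lengths))
    eta = entropy(probs) / L
    return L, eta, 1 - eta

# Solved Problem 11.25: one bit per symbol
print(code_stats([0.9, 0.1], [1, 1]))                       # (1.0, ~0.469, ~0.531)

# Solved Problem 11.26: second-order extension, code lengths 1, 2, 3, 3
print(code_stats([0.81, 0.09, 0.09, 0.01], [1, 2, 3, 3]))   # (1.29, ~0.727, ~0.273)
```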
11.26 The second-order extension of the DMS X of Solved Problem 11.25, denoted by X², is formed by taking the source symbols two at a time. The coding of this extension is shown in Table 11.5. Find the efficiency η and the redundancy γ of this extension code.

Table 11.5
a_i               P(a_i)    Code
a_1 = x_1x_1       0.81      0
a_2 = x_1x_2       0.09      10
a_3 = x_2x_1       0.09      110
a_4 = x_2x_2       0.01      111

The average code length is

L = Σ_{i=1}^{4} P(a_i) n_i = 0.81(1) + 0.09(2) + 0.09(3) + 0.01(3) = 1.29 b/symbol

and

H(X²) = -0.81 log_2 0.81 - 0.09 log_2 0.09 - 0.09 log_2 0.09 - 0.01 log_2 0.01 = 0.938 b/symbol

Therefore, the code efficiency η is

η = H(X²)/L = 0.938/1.29 = 0.727 = 72.7%

and the code redundancy γ is

γ = 1 - η = 0.273 = 27.3%

Note that H(X²) = 2H(X).
11.27 Consider a DMS X with symbols x_i, i = 1, 2, 3, 4. Table 11.6 lists four possible binary codes.
(a) Show that all codes except code B satisfy the Kraft inequality.
(b) Show that codes A and D are uniquely decodable but codes B and C are not uniquely decodable.

Table 11.6
x_i    Code A    Code B    Code C    Code D
x_1      00        0         0         0
x_2      01        10        11        100
x_3      10        11        100       110
x_4      11        110       110       111

(a) From Eq. (11.54) we obtain the following:

For code A:  n_1 = n_2 = n_3 = n_4 = 2
K = Σ_{i=1}^{4} 2^{-n_i} = 1/4 + 1/4 + 1/4 + 1/4 = 1 ≤ 1

For code B:  n_1 = 1, n_2 = n_3 = 2, n_4 = 3
K = Σ_{i=1}^{4} 2^{-n_i} = 1/2 + 1/4 + 1/4 + 1/8 = 9/8 > 1

For code C:  n_1 = 1, n_2 = 2, n_3 = n_4 = 3
K = Σ_{i=1}^{4} 2^{-n_i} = 1/2 + 1/4 + 1/8 + 1/8 = 1 ≤ 1

For code D:  n_1 = 1, n_2 = n_3 = n_4 = 3
K = Σ_{i=1}^{4} 2^{-n_i} = 1/2 + 1/8 + 1/8 + 1/8 = 7/8 ≤ 1

Thus, all codes except code B satisfy the Kraft inequality.
11.28 Show that the average code word length L defined by Eq. (11.49) satisfies L ≥ H(X).

Define the quantities

Q_i = 2^{-n_i}/K      (11.97)

where

K = Σ_{i=1}^{m} 2^{-n_i}      (11.98)

Then

Σ_{i=1}^{m} Q_i = (1/K) Σ_{i=1}^{m} 2^{-n_i} = 1      (11.99)

Applying the inequality (11.61) to the distributions {P_i} and {Q_i},

Σ_{i=1}^{m} P_i log_2 (Q_i/P_i) = -Σ_{i=1}^{m} P_i log_2 P_i - Σ_{i=1}^{m} P_i n_i - (log_2 K) Σ_{i=1}^{m} P_i      (11.100)
                               = H(X) - L - log_2 K ≤ 0

From the Kraft inequality (11.54) we have

log_2 K ≤ 0      (11.101)

Thus, L ≥ H(X) - log_2 K ≥ H(X), with equality holding when K = 1 and P_i = Q_i = 2^{-n_i}.
11.29 Let X be a DMS with symbols x_i and corresponding probabilities P(x_i) = P_i, i = 1, 2, ..., m.
Show that for the optimum source encoding we require that

K = Σ_{i=1}^{m} 2^{-n_i} = 1      (11.103)

and

n_i = log_2 (1/P_i) = I_i      (11.104)

where n_i is the length of the code word corresponding to x_i and I_i is the information content of x_i.

From the result of Solved Problem 11.28, the optimum source encoding with L = H(X) requires
K = 1 and P_i = Q_i. Thus, by Eqs. (11.98) and (11.97)

K = Σ_{i=1}^{m} 2^{-n_i} = 1      (11.105)

and

P_i = Q_i = 2^{-n_i},   that is,   n_i = -log_2 P_i = log_2 (1/P_i) = I_i
11.30 Consider a DMS X with symbols x_i and corresponding probabilities P(x_i) = P_i, i = 1, 2, ..., m, and let the code word length n_i assigned to x_i be the integer satisfying

log_2 (1/P_i) ≤ n_i < log_2 (1/P_i) + 1      (11.107)

Show that this relationship satisfies the Kraft inequality (11.54), and find the bound on K in
Eq. (11.54).

From Eq. (11.107),

log_2 P_i ≥ -n_i > log_2 P_i - 1

so that

2^{log_2 P_i} ≥ 2^{-n_i} > 2^{log_2 P_i - 1}

or

P_i ≥ 2^{-n_i} > (1/2) P_i      (11.109)

Thus,

Σ_{i=1}^{m} P_i ≥ Σ_{i=1}^{m} 2^{-n_i} > (1/2) Σ_{i=1}^{m} P_i      (11.110)

or

1 ≥ Σ_{i=1}^{m} 2^{-n_i} > 1/2      (11.111)

which indicates that the Kraft inequality (11.54) is satisfied, and the bound on K is

1/2 < K ≤ 1
11.31 Consider a DMS X with symbols x_i and corresponding probabilities P(x_i) = P_i, i = 1, 2, ..., m.
Show that a code constructed in agreement with Eq. (11.107) will satisfy the following relation:

H(X) ≤ L < H(X) + 1      (11.113)

where H(X) is the source entropy and L is the average code word length.
11.32 A DMS X has four symbols x_1, x_2, x_3, and x_4 with probabilities P(x_1) = 1/2, P(x_2) = 1/4, and P(x_3) = P(x_4) = 1/8. Construct a Shannon-Fano code for X, and calculate its efficiency.

The Shannon-Fano code is constructed as follows:

x_i    P(x_i)    Code
x_1     1/2       0
x_2     1/4       10
x_3     1/8       110
x_4     1/8       111

I(x_1) = -log_2 (1/2) = 1 = n_1        I(x_2) = -log_2 (1/4) = 2 = n_2
I(x_3) = -log_2 (1/8) = 3 = n_3        I(x_4) = -log_2 (1/8) = 3 = n_4

Hence

H(X) = Σ_{i=1}^{4} P(x_i) I(x_i) = 1.75 b/symbol        L = Σ_{i=1}^{4} P(x_i) n_i = 1.75 b/symbol

η = H(X)/L = 1 = 100%
Table 11.9

Since the average code word length is the same as that for the Shannon-Fano code, the efficiency
is also the same.

Table 11.10  (a DMS with five symbols x_1 to x_5, each with probability 0.2)
11.34 A DMS X has five symbols x_1, x_2, x_3, x_4, and x_5 with P(x_1) = 0.4, P(x_2) = 0.19, P(x_3) = 0.16,
P(x_4) = 0.15, and P(x_5) = 0.1.
(a) Construct a Shannon-Fano code for X, and calculate the efficiency of the code.
(b) Repeat for the Huffman code and compare the results.

(a) The Shannon-Fano code is constructed as follows (see Table 11.11):

Table 11.11
x_i    P(x_i)    Code
x_1     0.4       00
x_2     0.19      01
x_3     0.16      10
x_4     0.15      110
x_5     0.1       111

H(X) = Σ_{i=1}^{5} P(x_i) log_2 [1/P(x_i)] = 2.15 b/symbol

L = Σ_{i=1}^{5} P(x_i) n_i = 0.4(2) + 0.19(2) + 0.16(2) + 0.15(3) + 0.1(3) = 2.25 b/symbol

η = H(X)/L = 2.15/2.25 = 0.956 = 95.6%
(b) The Huffman code is constructed as shown in Table 11.12:

Table 11.12
x_i    P(x_i)    Code
x_1     0.4       1
x_2     0.19      000
x_3     0.16      001
x_4     0.15      010
x_5     0.1       011

The average code word length of the Huffman code is

L = 0.4(1) + 0.19(3) + 0.16(3) + 0.15(3) + 0.1(3) = 2.2 b/symbol

and its efficiency is η = H(X)/L = 2.15/2.2 = 0.977 = 97.7%. The average code word length of the Huffman code is shorter than that of the Shannon-Fano code,
and thus the efficiency is higher than that of the Shannon-Fano code.
11.1 Consider a source X that produces five symbols with probabilities 1/2, 1/4, 1/8, 1/16, and 1/16. Determine the source entropy H(X).
[Ans. 1.875 b/symbol]

11.2 Calculate the average information content in the English language, assuming that each of the 26 characters in the alphabet occurs with equal probability.
[Ans. 4.7 b/character]

11.3 Two BSCs are connected in cascade, as shown in Fig. 11.13.
(a) Find the channel matrix of the resultant channel.
(b) Find P(z_1) and P(z_2) if P(x_1) = 0.6 and P(x_2) = 0.4.
[Ans. (a) [0.62  0.38; 0.38  0.62]; (b) P(z_1) = 0.524, P(z_2) = 0.476]

11.4 Consider the DMC shown in Fig. 11.14.
(a) Find the output probabilities if P(x_1) = 1/2 and P(x_2) = P(x_3) = 1/4.
(b) Find the output entropy H(Y).
[Ans. (a) P(y_1) = 7/24, P(y_2) = 17/48, and P(y_3) = 17/48; (b) 1.58 b/symbol]

11.5 Verify Eq. (11.32), that is, I(X; Y) = H(X) + H(Y) - H(X, Y).
[Hint: Use Eqs. (11.28) and (11.26).]

11.6 Show that H(X, Y) ≤ H(X) + H(Y), with equality if and only if X and Y are independent.

11.7 Show that for a deterministic channel, H(Y|X) = 0.
[Hint: Use Eq. (11.24), and note that for a deterministic channel the P(y_j|x_i) are either 0 or 1.]

11.8 Consider a channel with an input X and an output Y. Show that if X and Y are statistically independent, then H(X|Y) = H(X) and I(X; Y) = 0.
[Hint: Use Eqs. (6.51) and (6.52) in Eqs. (11.23) and (11.28).]

11.9 A channel is described by the following channel matrix:

| 1/2  1/2  0 |
|  0    0   1 |

(a) Draw the channel diagram.
(b) Find the channel capacity.
[Ans. (a) See Fig. 11.15. (b) 1 b/symbol]

11.10 Let X be a random variable with probability density function f_X(x), and let Y = aX + b, where a and b are constants. Find H(Y) in terms of H(X).
[Ans. H(Y) = H(X) + log_2 a]

11.11 Find the differential entropy H(X) of a gaussian random variable X with zero mean and variance σ_X².
[Ans. H(X) = (1/2) log_2 (2πeσ_X²)]

11.12 Consider an AWGN channel described by Eq. (11.46), that is, Y = X + n, where X and Y are the channel input and output, respectively, and n is an additive white gaussian noise with zero mean and variance σ_n². Find the average mutual information I(X; Y) when the channel input X is gaussian with zero mean and variance σ_X².
[Ans. I(X; Y) = (1/2) log_2 (1 + σ_X²/σ_n²)]

11.13 Calculate the capacity of an AWGN channel with a bandwidth of 1 MHz and an S/N ratio of 40 dB.
[Ans. 13.29 Mb/s]

11.14 Consider a DMS X with m equiprobable symbols x_i, i = 1, 2, ..., m.
(a) Show that the use of a fixed-length code for the representation of x_i is most efficient.
(b) Let n_0 be the fixed code word length. Show that if n_0 = log_2 m, then the code efficiency is 100 percent.
[Hint: Use Eqs. (11.49) and (11.52).]

11.15 Construct a Huffman code for the DMS X of Solved Problem 11.32, and show that the code is an optimum code.
[Ans. Symbols: x_1 x_2 x_3 x_4;  Code: 0 10 110 111]

11.16 A DMS X has five symbols x_1, x_2, x_3, x_4, and x_5 with respective probabilities 0.2, 0.15, 0.05, 0.1, and 0.5.
(a) Construct a Shannon-Fano code for X, and calculate the code efficiency.
(b) Repeat (a) for the Huffman code.
[Ans. (a) Symbols: x_1 x_2 x_3 x_4 x_5;  Code: 10 110 1111 1110 0;  code efficiency η = 98.6 percent.
 (b) Symbols: x_1 x_2 x_3 x_4 x_5;  Code: 11 100 1011 1010 0;  code efficiency η = 98.6 percent.]

11.17 Show that the Kraft inequality is satisfied by the codes of Solved Problem 11.33.
[Hint: Use Eq. (11.54).]