
",Clh .tof,,.

,Elgvgn
Information Theory
and Source Coding

11.1 INTRODUCTION
Information theory provides a quantitative measure of the information contained in message signals and allows
us to determine the capacity of a communication system to transfer this information from source to destination.
In this chapter we briefly explore some basic ideas involved in information theory and source coding.

11.2 MEASURE OF INFORMATION


A. Information Sources
An information source is an object that produces an event, the outcome of which is selected at random
according to a probability distribution. A practical source in a communication system is a device that pro-
duces messages, and it can be either analog or discrete. In this chapter we deal mainly with the discrete
sources, since analog sources can be transformed to discrete sources through the use of sampling and
quantization techniques, described in Chap. 5. A discrete information source is a source that has only a
finite set of symbols as possible outputs. The set of source symbols is called the source alphabet, and the
elements of the set are called symbols or letters.
Information sources can be classified as having memory or being memoryless. A source with memory
is one for which a current symbol depends on the previous symbols. A memoryless source is one for
which each symbol produced is independent of the previous symbols.
A discrete memoryless source (DMS) can be characterized by the list of the symbols, the probability
assignment to these symbols, and the specification of the rate of generating these symbols by the source.

B. Information Content of a Discrete Memoryless Source


The amount of information contained in an event is closely related to its uncertainty. Messages containing
knowledge of high probability of occurrence convey relatively little information. We note that if an
event is certain (that is, the event occurs with probability 1), it conveys zero information.

Thus, a mathematical measure of information should be a function of the probability of the outcome
and should satisfy the following axioms:
1. Information should be proportional to the uncertainty of an outcome.
2. Information contained in independent outcomes should add.

1. Information Content of a Symbol  Consider a DMS, denoted by X, with alphabet {x_1, x_2, ..., x_m}.
The information content of a symbol x_i, denoted by I(x_i), is defined by

    I(x_i) = \log_b \frac{1}{P(x_i)} = -\log_b P(x_i)                                   (11.1)

where P(x_i) is the probability of occurrence of symbol x_i. Note that I(x_i) satisfies the following properties:

    I(x_i) = 0                       for P(x_i) = 1                                     (11.2)
    I(x_i) \ge 0                                                                        (11.3)
    I(x_i) > I(x_j)                  if P(x_i) < P(x_j)                                 (11.4)
    I(x_i x_j) = I(x_i) + I(x_j)     if x_i and x_j are independent                     (11.5)

The unit of I(x_i) is the bit (binary unit) if b = 2, the Hartley or decit if b = 10, and the nat (natural unit)
if b = e. It is standard to use b = 2. Here the unit bit (abbreviated "b") is a measure of information content
and is not to be confused with the term bit meaning "binary digit." The conversion of these units to other
units can be achieved by the following relationship:

    \log_2 a = \frac{\ln a}{\ln 2} = \frac{\log_{10} a}{\log_{10} 2}                    (11.6)
2. Average Information or Entropy  In a practical communication system, we usually transmit
long sequences of symbols from an information source. Thus, we are more interested in the average
information that a source produces than in the information content of a single symbol.
The mean value of I(x_i) over the alphabet of source X with m different symbols is given by

    H(X) = E[I(x_i)] = \sum_{i=1}^{m} P(x_i) I(x_i)
         = -\sum_{i=1}^{m} P(x_i) \log_2 P(x_i)   b/symbol                              (11.7)

The quantity H(X) is called the entropy of source X. It is a measure of the average information content
per source symbol. The source entropy H(X) can be considered as the average amount of uncertainty
within source X that is resolved by use of the alphabet.
Note that for a binary source X that generates independent symbols 0 and 1 with equal probability, the
source entropy H(X) is

    H(X) = -\frac{1}{2}\log_2 \frac{1}{2} - \frac{1}{2}\log_2 \frac{1}{2} = 1   b/symbol     (11.8)

The source entropy H(X) satisfies the following relation:

    0 \le H(X) \le \log_2 m                                                             (11.9)

where m is the size (number of symbols) of the alphabet of source X (Solved Problem 11.4). The lower bound
corresponds to no uncertainty, which occurs when one symbol has probability P(x_i) = 1 while P(x_j) = 0 for
j \ne i, so X emits the same symbol x_i all the time. The upper bound corresponds to the maximum uncertainty,
which occurs when P(x_i) = 1/m for all i, that is, when all symbols are equally likely to be emitted by X.

3. Information Rate  If the time rate at which source X emits symbols is r (symbols/s), the information
rate R of the source is given by

    R = rH(X)   b/s                                                                     (11.10)
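As an illustrative sketch of Eqs. (11.7) and (11.10), the short Python fragment below computes the entropy of a DMS from its symbol probabilities and the corresponding information rate; the probabilities used are those of Solved Problem 11.2, while the symbol rate r = 1000 symbols/s is an assumed value chosen only for the example.

import math

def entropy(probs):
    # Entropy H(X) in bits per symbol, Eq. (11.7); zero-probability symbols contribute nothing.
    return -sum(p * math.log2(p) for p in probs if p > 0)

def information_rate(probs, r):
    # Information rate R = r * H(X) in bits per second, Eq. (11.10).
    return r * entropy(probs)

probs = [0.4, 0.3, 0.2, 0.1]          # source of Solved Problem 11.2
print(entropy(probs))                  # about 1.85 b/symbol
print(information_rate(probs, 1000))   # about 1846 b/s for the assumed r = 1000 symbols/s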

11.3 DISCRETE MEMORYLESS CHANNELS


A. Channel Representation
A communication channel is the path or medium through which the symbols flow to the receiver.
A discrete memoryless channel (DMC) is a statistical model with an input X and an output Y
(Fig. 11.1). During each unit of time (signaling interval), the channel accepts an input
symbol from X, and in response it generates an output symbol from Y. The channel is "discrete"
when the alphabets of X and Y are both finite. It is "memoryless" when the current output depends
on only the current input and not on any of the previous inputs.

[Fig. 11.1  Discrete memoryless channel]

A diagram of a DMC with m inputs and n outputs is illustrated in Fig. 11.1. The input X consists
of input symbols x_1, x_2, ..., x_m. The a priori probabilities of these source symbols P(x_i) are assumed to
be known. The output Y consists of output symbols y_1, y_2, ..., y_n. Each possible input-to-output path is
indicated along with a conditional probability P(y_j|x_i), where P(y_j|x_i) is the conditional probability of
obtaining output y_j given that the input is x_i, and is called a channel transition probability.

B. Channel Matrix
A channel is completely specified by the complete set of transition probabilities. Accordingly, the channel
of Fig. 11.1 is often specified by the matrix of transition probabilities [P(Y|X)], given by

    [P(Y|X)] = \begin{bmatrix}
        P(y_1|x_1) & P(y_2|x_1) & \cdots & P(y_n|x_1) \\
        P(y_1|x_2) & P(y_2|x_2) & \cdots & P(y_n|x_2) \\
        \vdots     & \vdots     &        & \vdots     \\
        P(y_1|x_m) & P(y_2|x_m) & \cdots & P(y_n|x_m)
    \end{bmatrix}                                                                       (11.11)

The matrix [P(Y|X)] is called the channel matrix. Since each input to the channel results in some
output, each row of the channel matrix must sum to unity, that is,

    \sum_{j=1}^{n} P(y_j|x_i) = 1   for all i                                           (11.12)

Now, if the input probabilities P(X) are represented by the row matrix

    [P(X)] = [P(x_1)  P(x_2)  \cdots  P(x_m)]                                           (11.13)

and the output probabilities P(Y) are represented by the row matrix

    [P(Y)] = [P(y_1)  P(y_2)  \cdots  P(y_n)]                                           (11.14)

then

    [P(Y)] = [P(X)][P(Y|X)]                                                             (11.15)

If P(X) is represented as a diagonal matrix

    [P(X)]_d = \begin{bmatrix}
        P(x_1) & 0      & \cdots & 0      \\
        0      & P(x_2) & \cdots & 0      \\
        \vdots & \vdots &        & \vdots \\
        0      & 0      & \cdots & P(x_m)
    \end{bmatrix}                                                                       (11.16)

then

    [P(X, Y)] = [P(X)]_d [P(Y|X)]                                                       (11.17)

where the (i, j) element of matrix [P(X, Y)] has the form P(x_i, y_j). The matrix [P(X, Y)] is known as the joint
probability matrix, and the element P(x_i, y_j) is the joint probability of transmitting x_i and receiving y_j.
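To make Eqs. (11.15)–(11.17) concrete, the following NumPy sketch evaluates them for the binary channel of Solved Problem 11.7 (channel matrix with rows [0.9, 0.1] and [0.2, 0.8], equiprobable inputs); it reproduces the values P(y_1) = 0.55, P(y_2) = 0.45 and the joint probability matrix obtained there.

import numpy as np

# Channel matrix [P(Y|X)]; rows are inputs, columns are outputs (Solved Problem 11.7).
P_Y_given_X = np.array([[0.9, 0.1],
                        [0.2, 0.8]])

P_X = np.array([0.5, 0.5])           # input probabilities [P(X)]

P_Y = P_X @ P_Y_given_X              # Eq. (11.15): [P(Y)] = [P(X)][P(Y|X)]
P_XY = np.diag(P_X) @ P_Y_given_X    # Eq. (11.17): joint probability matrix [P(X, Y)]

print(P_Y)     # [0.55 0.45]
print(P_XY)    # [[0.45 0.05]
               #  [0.10 0.40]]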

C. Special Channels
1. Lossless Channel  A channel described by a channel matrix with only one nonzero element in each
column is called a lossless channel. An example of a lossless channel is shown in Fig. 11.2, and the
corresponding channel matrix is shown in Eq. (11.18).

    [P(Y|X)] = \begin{bmatrix}
        3/4 & 1/4 & 0   & 0   & 0 \\
        0   & 0   & 1/3 & 2/3 & 0 \\
        0   & 0   & 0   & 0   & 1
    \end{bmatrix}                                                                       (11.18)

[Fig. 11.2  Lossless channel]

It can be shown that in the lossless channel no source information is lost in transmission. [See Eq. (11.35)
and Solved Problem 11.10.]
2. Deterministic Channel  A channel described by a channel matrix with only one nonzero
element in each row is called a deterministic channel. An example of a deterministic channel is shown in
Fig. 11.3, and the corresponding channel matrix is shown in Eq. (11.19).

    [P(Y|X)] = \begin{bmatrix}
        1 & 0 & 0 \\
        1 & 0 & 0 \\
        0 & 1 & 0 \\
        0 & 1 & 0 \\
        0 & 0 & 1
    \end{bmatrix}                                                                       (11.19)

Note that since each row has only one nonzero element, this element must be unity by Eq. (11.12). Thus,
when a given source symbol is sent in the deterministic channel, it is clear which output symbol will be
received.

[Fig. 11.3  Deterministic channel]

3. Noiseless Channel  A channel is called noiseless if it is both lossless and deterministic. A noiseless
channel is shown in Fig. 11.4. The channel matrix has only one element in each row and in each column,
and this element is unity. Note that the input and output alphabets are of the same size; that is, m = n for
the noiseless channel.

4. Binary Symmetric Channel  The binary symmetric channel (BSC) is defined by the channel
diagram shown in Fig. 11.5, and its channel matrix is given by

    [P(Y|X)] = \begin{bmatrix}
        1-p & p   \\
        p   & 1-p
    \end{bmatrix}                                                                       (11.20)

The channel has two inputs (x_1 = 0, x_2 = 1) and two outputs (y_1 = 0, y_2 = 1). The channel is symmetric
because the probability of receiving a 1 if a 0 is sent is the same as the probability of receiving a 0 if a
1 is sent. This common transition probability is denoted by p. (See Supplementary Problem 11.1.)

[Fig. 11.4  Noiseless channel]        [Fig. 11.5  Binary symmetric channel]

11.4 MUTUAL INFORMATION


A. Conditional and Joint Entropies
Using the input probabilities P(x_i), output probabilities P(y_j), transition probabilities P(y_j|x_i), and joint
probabilities P(x_i, y_j), we can define the following various entropy functions for a channel with m inputs
and n outputs:

    H(X) = -\sum_{i=1}^{m} P(x_i) \log_2 P(x_i)                                         (11.21)

    H(Y) = -\sum_{j=1}^{n} P(y_j) \log_2 P(y_j)                                         (11.22)

    H(X|Y) = -\sum_{j=1}^{n} \sum_{i=1}^{m} P(x_i, y_j) \log_2 P(x_i|y_j)               (11.23)

    H(Y|X) = -\sum_{j=1}^{n} \sum_{i=1}^{m} P(x_i, y_j) \log_2 P(y_j|x_i)               (11.24)

    H(X, Y) = -\sum_{j=1}^{n} \sum_{i=1}^{m} P(x_i, y_j) \log_2 P(x_i, y_j)             (11.25)

These entropies can be interpreted as follows: H(X) is the average uncertainty of the channel input,
and H(Y) is the average uncertainty of the channel output. The conditional entropy H(X|Y) is a
measure of the average uncertainty remaining about the channel input after the channel output has been
observed. H(X|Y) is sometimes called the equivocation of X with respect to Y. The conditional entropy
H(Y|X) is the average uncertainty of the channel output given that X was transmitted. The joint
entropy H(X, Y) is the average uncertainty of the communication channel as a whole.
Two useful relationships among the above various entropies are

    H(X, Y) = H(X|Y) + H(Y)                                                             (11.26)
    H(X, Y) = H(Y|X) + H(X)                                                             (11.27)
B. Mutual Information
The mutual information I(X; Y) of a channel is defined by

    I(X; Y) = H(X) - H(X|Y)   b/symbol                                                  (11.28)

Since H(X) represents the uncertainty about the channel input before the channel output is observed
and H(X|Y) represents the uncertainty about the channel input after the channel output is observed, the
mutual information I(X; Y) represents the uncertainty about the channel input that is resolved by observing
the channel output.
Properties of I(X; Y):

    1. I(X; Y) = I(Y; X)                                                                (11.29)
    2. I(X; Y) \ge 0                                                                    (11.30)
    3. I(X; Y) = H(Y) - H(Y|X)                                                          (11.31)
    4. I(X; Y) = H(X) + H(Y) - H(X, Y)                                                  (11.32)
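The definitions in Eqs. (11.21)–(11.28) translate directly into a few lines of NumPy. The sketch below computes the marginal, conditional, and joint entropies and the mutual information from a joint probability matrix; it is checked here against the BSC of Solved Problem 11.16(b) (equiprobable inputs, p = 0.1), for which I(X; Y) is about 0.531 b/symbol.

import numpy as np

def channel_entropies(P_XY):
    # Return H(X), H(Y), H(X|Y), H(Y|X), H(X,Y), I(X;Y) from a joint matrix P_XY.
    def H(p):
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    P_X = P_XY.sum(axis=1)           # input marginal
    P_Y = P_XY.sum(axis=0)           # output marginal
    HX, HY, HXY = H(P_X), H(P_Y), H(P_XY.flatten())
    H_X_given_Y = HXY - HY           # from Eq. (11.26)
    H_Y_given_X = HXY - HX           # from Eq. (11.27)
    I = HX - H_X_given_Y             # Eq. (11.28)
    return HX, HY, H_X_given_Y, H_Y_given_X, HXY, I

# BSC with P(x1) = P(x2) = 0.5 and p = 0.1, as in Solved Problem 11.16(b).
p, a = 0.1, 0.5
P_XY = np.array([[a * (1 - p), a * p],
                 [(1 - a) * p, (1 - a) * (1 - p)]])
print(channel_entropies(P_XY)[-1])   # about 0.531 b/symbol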

11.5 CHANNEL CAPACITY


A. Channel Capacity per Symbol C_s

The channel capacity per symbol of a DMC is defined as

    C_s = \max_{\{P(x_i)\}} I(X; Y)   b/symbol                                          (11.33)

where the maximization is over all possible input probability distributions {P(x_i)} on X. Note that
the channel capacity C_s is a function of only the channel transition probabilities that define the
channel.

B. Channel Capacity per Second C

If r symbols are being transmitted per second, then the maximum rate of transmission of information per
second is rC_s. This is the channel capacity per second and is denoted by C (b/s):

    C = rC_s   b/s                                                                      (11.34)

C. Capacities of Special Channels

1. Lossless Channel  For a lossless channel, H(X|Y) = 0 (Solved Problem 11.10) and

    I(X; Y) = H(X)                                                                      (11.35)

Thus, the mutual information (information transfer) is equal to the input (source) entropy, and no
source information is lost in transmission. Consequently, the channel capacity per symbol is

    C_s = \max_{\{P(x_i)\}} H(X) = \log_2 m                                             (11.36)

where m is the number of symbols in X.

2. Deterministic Channel  For a deterministic channel, H(Y|X) = 0 for all input distributions
P(x_i), and

    I(X; Y) = H(Y)                                                                      (11.37)

Thus, the information transfer is equal to the output entropy. The channel capacity per symbol is

    C_s = \max_{\{P(x_i)\}} H(Y) = \log_2 n                                             (11.38)

where n is the number of symbols in Y.

3. Noiseless Channel  Since a noiseless channel is both lossless and deterministic, we have

    I(X; Y) = H(X) = H(Y)                                                               (11.39)

and the channel capacity per symbol is

    C_s = \log_2 m = \log_2 n                                                           (11.40)

4. Binary Symmetric Channel  For the BSC of Fig. 11.5, the mutual information is (Solved
Problem 11.16)

    I(X; Y) = H(Y) + p \log_2 p + (1 - p) \log_2 (1 - p)                                (11.41)

and the channel capacity per symbol is

    C_s = 1 + p \log_2 p + (1 - p) \log_2 (1 - p)                                       (11.42)

11.6 ADDITIVE WHITE GAUSSIAN NOISE CHANNEL


In a continuous channel an information source produces a continuous signal x(t). The set of possible
signals is considered as an ensemble of waveforms generated by some ergodic random process. It is further
assumed that x(t) has a finite bandwidth so that x(t) is completely characterized by its periodic sample
values. Thus, at any sampling instant, the collection of possible sample values constitutes a continuous
random variable X described by its probability density function f_X(x).

A. Differential Entropy
The average amount of information per sample value of x(t) is measured by

    H(X) = -\int_{-\infty}^{\infty} f_X(x) \log_2 f_X(x)\, dx   b/sample                (11.43)

The entropy H(X) defined by Eq. (11.43) is known as the differential entropy of X.
The average mutual information in a continuous channel is defined (by analogy with the discrete case) as

    I(X; Y) = H(X) - H(X|Y)
or
    I(X; Y) = H(Y) - H(Y|X)

where

    H(Y) = -\int_{-\infty}^{\infty} f_Y(y) \log_2 f_Y(y)\, dy                           (11.44)

    H(X|Y) = -\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{XY}(x, y) \log_2 f_X(x|y)\, dx\, dy    (11.45a)

    H(Y|X) = -\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{XY}(x, y) \log_2 f_Y(y|x)\, dx\, dy    (11.45b)

B. Additive White Gaussian Noise Channel

In an additive white gaussian noise (AWGN) channel, the channel output Y is given by

    Y = X + n                                                                           (11.46)

where X is the channel input and n is an additive band-limited white gaussian noise with zero mean and
variance \sigma^2.
The capacity C_s of an AWGN channel is given by

    C_s = \max_{f_X(x)} I(X; Y) = \frac{1}{2} \log_2 \left(1 + \frac{S}{N}\right)   b/sample   (11.47)

where S/N is the signal-to-noise ratio at the channel output. If the channel bandwidth B Hz is fixed, then
the output y(t) is also a band-limited signal completely characterized by its periodic sample values taken
at the Nyquist rate 2B samples/s. Then the capacity C (b/s) of the AWGN channel is given by

    C = 2BC_s = B \log_2 \left(1 + \frac{S}{N}\right)   b/s                             (11.48)

Equation (11.48) is known as the Shannon-Hartley law.
The Shannon-Hartley law underscores the fundamental role of bandwidth and signal-to-noise ratio
in communication. It also shows that we can exchange increased bandwidth for decreased signal power
(Solved Problem 11.24) for a system with given capacity C.
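A quick numerical check of the Shannon-Hartley law (11.48), using the figures of Solved Problem 11.23 (B = 4 kHz, noise power spectral density η/2 = 10⁻¹² W/Hz, received signal power S = 0.1 mW):

import math

def awgn_capacity(B, S, N):
    # Shannon-Hartley law, Eq. (11.48): capacity in b/s.
    return B * math.log2(1 + S / N)

B = 4000                   # bandwidth in Hz
S = 0.1e-3                 # signal power in W
N = 2 * 1e-12 * B          # noise power N = eta * B, with eta/2 = 1e-12 W/Hz
print(awgn_capacity(B, S, N))   # about 54.4e3 b/s, as found in Solved Problem 11.23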

11.7 SOURCE CODING


A conversion of the output of a DMS into a sequence of binary symbols (binary code word) is called source
coding. The device that performs this conversion is called the source encoder (Fig. 11.6).

[Fig. 11.6  Source coding]

An objective of source coding is to minimize the average bit rate required for representation of the source
by reducing the redundancy of the information source.

A. Code Length and Code Efficiency


Let X be a DMS with finite entropy H(X) and an alphabet {x_1, ..., x_m} with corresponding probabilities
of occurrence P(x_i) (i = 1, ..., m). Let the binary code word assigned to symbol x_i by the encoder have
length n_i, measured in bits. The length of a code word is the number of binary digits in the code word.
The average code word length L, per source symbol, is given by

    L = \sum_{i=1}^{m} P(x_i) n_i                                                       (11.49)

The parameter L represents the average number of bits per source symbol used in the source coding
process.
The code efficiency \eta is defined as

    \eta = \frac{L_{\min}}{L}                                                           (11.50)

where L_{\min} is the minimum possible value of L. When \eta approaches unity, the code is said to be
efficient.
The code redundancy \gamma is defined as

    \gamma = 1 - \eta                                                                   (11.51)

B. Source Coding Theorem

The source coding theorem states that for a DMS X with entropy H(X), the average code word length L
per symbol is bounded as

    L \ge H(X)                                                                          (11.52)

and further, L can be made as close to H(X) as desired for some suitably chosen code.
Thus, with L_{\min} = H(X), the code efficiency can be rewritten as

    \eta = \frac{H(X)}{L}                                                               (11.53)
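The quantities in Eqs. (11.49)–(11.53) can be computed directly. The sketch below does so for the two-symbol code of Solved Problem 11.25 (P(x_1) = 0.9, P(x_2) = 0.1, one-bit code words), reproducing the efficiency of about 46.9 percent found there.

import math

def avg_length(probs, lengths):
    # Average code word length L, Eq. (11.49).
    return sum(p * n for p, n in zip(probs, lengths))

def efficiency(probs, lengths):
    # Code efficiency eta = H(X)/L, Eq. (11.53).
    H = -sum(p * math.log2(p) for p in probs if p > 0)
    return H / avg_length(probs, lengths)

probs = [0.9, 0.1]
lengths = [1, 1]                      # code words '0' and '1'
print(avg_length(probs, lengths))     # 1.0 b/symbol
print(efficiency(probs, lengths))     # about 0.469, i.e., 46.9 percent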

C. Classification of Codes
Classification of codes is best illustrated by an example. Consider Table 11.1, where a source of size 4 has
been encoded in binary codes with symbols 0 and 1.

Table 11.1  Binary Codes

    x_i    Code 1   Code 2   Code 3   Code 4   Code 5   Code 6
    x_1    00       00       0        0        0        1
    x_2    01       01       1        10       01       01
    x_3    00       10       00       110      011      001
    x_4    11       11       11       111      0111     0001

1. Fixed-Length Codes  A fixed-length code is one whose code word length is fixed. Code 1 and
code 2 of Table 11.1 are fixed-length codes with length 2.

2. Variable-Length Codes  A variable-length code is one whose code word length is not fixed. All
codes of Table 11.1 except codes 1 and 2 are variable-length codes.

3. Distinct Codes  A code is distinct if each code word is distinguishable from other code words.
All codes of Table 11.1 except code 1 are distinct codes; notice the codes for x_1 and x_3.

4. Prefix-Free Codes  A code in which no code word can be formed by adding code symbols to
another code word is called a prefix-free code. Thus, in a prefix-free code no code word is a prefix of
another. Codes 2, 4, and 6 of Table 11.1 are prefix-free codes.

5. Uniquely Decodable Codes  A distinct code is uniquely decodable if the original source
sequence can be reconstructed perfectly from the encoded binary sequence. Note that code 3 of Table
11.1 is not a uniquely decodable code. For example, the binary sequence 1001 may correspond to
the source sequence x_2 x_3 x_2 or x_2 x_1 x_1 x_2. A sufficient condition to ensure that a code is uniquely
decodable is that no code word is a prefix of another. Thus, the prefix-free codes 2, 4, and 6 are uniquely
decodable codes. Note that the prefix-free condition is not a necessary condition for unique decodability.
For example, code 5 of Table 11.1 does not satisfy the prefix-free condition, and yet it is uniquely decodable
since the bit 0 indicates the beginning of each code word of the code.

6. Instantaneous Codes  A uniquely decodable code is called an instantaneous code if the end
of any code word is recognizable without examining subsequent code symbols. The instantaneous codes
have the property previously mentioned that no code word is a prefix of another code word. For this
reason, prefix-free codes are sometimes called instantaneous codes.

7. Optimal Codes  A code is said to be optimal if it is instantaneous and has minimum average
length L for a given source with a given probability assignment for the source symbols.

D. Kraft Inequality
Let X be a DMS with alphabet {x_i} (i = 1, 2, ..., m). Assume that the length of the assigned binary code
word corresponding to x_i is n_i.
A necessary and sufficient condition for the existence of an instantaneous binary code is

    K = \sum_{i=1}^{m} 2^{-n_i} \le 1                                                   (11.54)

which is known as the Kraft inequality.
Note that the Kraft inequality assures us of the existence of an instantaneously decodable code with
code word lengths that satisfy the inequality. But it does not show us how to obtain these code words,
nor does it say that any code that satisfies the inequality is automatically uniquely decodable (Solved
Problem 11.27).
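As a sketch of how Eq. (11.54) is applied, the following function computes K from a list of code word lengths; the sample inputs are the four codes of Table 11.6 in Solved Problem 11.27, for which K = 1, 9/8, 1, and 7/8, respectively.

def kraft_sum(lengths):
    # K = sum of 2**(-n_i) over all code word lengths, Eq. (11.54).
    return sum(2.0 ** -n for n in lengths)

codes = {                      # code word lengths of the codes in Table 11.6
    "A": [2, 2, 2, 2],
    "B": [1, 2, 2, 3],
    "C": [1, 2, 3, 3],
    "D": [1, 3, 3, 3],
}
for name, lengths in codes.items():
    K = kraft_sum(lengths)
    print(name, K, "satisfies Kraft" if K <= 1 else "violates Kraft")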

11.8 ENTROPY CODING


The design of a variable-length code such that its average code word length approaches the entropy of the
DMS is often referred to as entropy coding. In this section we present two examples of entropy coding.

A. Shannon-Fano Coding
An efficient code can be obtained by the following simple procedure, known as the Shannon-Fano
algorithm:
1. List the source symbols in order of decreasing probability.
2. Partition the set into two sets that are as close to equiprobable as possible, and assign 0 to the upper
   set and 1 to the lower set.
3. Continue this process, each time partitioning the sets with as nearly equal probabilities as possible,
   until further partitioning is not possible.
An example of Shannon-Fano encoding is shown in Table 11.2. Note that in Shannon-Fano
encoding the ambiguity may arise in the choice of approximately equiprobable sets. (See Solved
Problem 11.33.)

Table 11.2  Shannon-Fano Encoding

    x_i    P(x_i)   Step 1   Step 2   Step 3   Step 4   Code
    x_1    0.30     0        0                          00
    x_2    0.25     0        1                          01
    x_3    0.20     1        0                          10
    x_4    0.12     1        1        0                 110
    x_5    0.08     1        1        1        0        1110
    x_6    0.05     1        1        1        1        1111

    H(X) = 2.36 b/symbol
    L = 2.38 b/symbol
    \eta = H(X)/L = 0.99
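The partitioning steps of Table 11.2 can be automated. The recursive sketch below is one minimal realization of the procedure; the rule it uses to break ties between equally balanced splits is a choice of this sketch (as noted above, the partition is not always unique). It reproduces the code words of Table 11.2.

def shannon_fano(symbols):
    # symbols: list of (symbol, probability) pairs sorted by decreasing probability.
    # Returns a dict mapping each symbol to its binary code word.
    if len(symbols) == 1:
        return {symbols[0][0]: ""}
    # Find the split that makes the two groups as close to equiprobable as possible.
    total = sum(p for _, p in symbols)
    running, best_k, best_diff = 0.0, 1, float("inf")
    for k in range(1, len(symbols)):
        running += symbols[k - 1][1]
        diff = abs(2 * running - total)
        if diff < best_diff:
            best_diff, best_k = diff, k
    upper = shannon_fano(symbols[:best_k])   # upper set gets prefix 0
    lower = shannon_fano(symbols[best_k:])   # lower set gets prefix 1
    code = {s: "0" + c for s, c in upper.items()}
    code.update({s: "1" + c for s, c in lower.items()})
    return code

source = [("x1", 0.30), ("x2", 0.25), ("x3", 0.20),
          ("x4", 0.12), ("x5", 0.08), ("x6", 0.05)]
print(shannon_fano(source))
# {'x1': '00', 'x2': '01', 'x3': '10', 'x4': '110', 'x5': '1110', 'x6': '1111'}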

B. Huffman Encoding
In general, Huffman encoding results in an optimum code. Thus, it is the code that has the highest
efficiency (Solved Problem 11.34). The Huffman encoding procedure is as follows:

1. List the source symbols in order of decreasing probability.
2. Combine the probabilities of the two symbols having the lowest probabilities, and reorder the
   resultant probabilities; this step is called reduction 1. The same procedure is repeated until there
   are two ordered probabilities remaining.
3. Start encoding with the last reduction, which consists of exactly two ordered probabilities. Assign
   0 as the first digit in the code words for all the source symbols associated with the first probability;
   assign 1 to the second probability.
4. Now go back and assign 0 and 1 to the second digit for the two probabilities that were combined
   in the previous reduction step, retaining all assignments made in Step 3.
5. Keep regressing this way until the first column is reached.
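A compact way to realize this procedure programmatically is with a priority queue; the sketch below builds a Huffman code for the same six-symbol source as Table 11.2 and yields an average length of 2.38 b/symbol. The particular 0/1 labeling within each merge is arbitrary here, so individual code words may differ from those of Table 11.3 while the code word lengths and average length agree.

import heapq

def huffman(probs):
    # Huffman coding: probs is a dict {symbol: probability}.
    # Returns {symbol: code word}. A one-symbol source gets the code '0'.
    if len(probs) == 1:
        return {next(iter(probs)): "0"}
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)    # two least probable groups
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, count, merged))
        count += 1
    return heap[0][2]

source = {"x1": 0.30, "x2": 0.25, "x3": 0.20, "x4": 0.12, "x5": 0.08, "x6": 0.05}
code = huffman(source)
L = sum(source[s] * len(w) for s, w in code.items())
print(code)
print(L)    # 2.38 b/symbol, as in Table 11.3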

An example of Huffman encoding is shown in Table 11.3.

[Table 11.3  Huffman encoding: successive reductions of the source of Table 11.2 and the resulting code words]

    H(X) = 2.36 b/symbol
    L = 2.38 b/symbol
    \eta = 0.99

SOLVED PROBLEMS

Measure of Information

11.1 Verify Eq. (11.5), that is, I(x_i x_j) = I(x_i) + I(x_j) if x_i and x_j are independent.

If x_i and x_j are independent, then by Eq. (6.21)

    P(x_i x_j) = P(x_i) P(x_j)

By Eq. (11.1)

    I(x_i x_j) = \log \frac{1}{P(x_i x_j)} = \log \frac{1}{P(x_i) P(x_j)}
               = \log \frac{1}{P(x_i)} + \log \frac{1}{P(x_j)} = I(x_i) + I(x_j)

11.2 A DMS X has four symbols x_1, x_2, x_3, x_4 with probabilities P(x_1) = 0.4, P(x_2) = 0.3,
P(x_3) = 0.2, P(x_4) = 0.1.
(a) Calculate H(X).
(b) Find the amount of information contained in the messages x_1 x_2 x_1 x_3 and x_4 x_3 x_3 x_2, and
    compare with the H(X) obtained in part (a).

(a) H(X) = -\sum_{i=1}^{4} P(x_i) \log_2 P(x_i)
         = -0.4 \log_2 0.4 - 0.3 \log_2 0.3 - 0.2 \log_2 0.2 - 0.1 \log_2 0.1
         = 1.85 b/symbol

(b) P(x_1 x_2 x_1 x_3) = (0.4)(0.3)(0.4)(0.2) = 0.0096
    I(x_1 x_2 x_1 x_3) = -\log_2 0.0096 = 6.70 b
    Thus, I(x_1 x_2 x_1 x_3) < 7.4 [= 4H(X)] b

    P(x_4 x_3 x_3 x_2) = (0.1)(0.2)^2 (0.3) = 0.0012
    I(x_4 x_3 x_3 x_2) = -\log_2 0.0012 = 9.70 b
    Thus, I(x_4 x_3 x_3 x_2) > 7.4 [= 4H(X)] b

11.3 Consider a binary source X with two symbols x_1 and x_2. Show that H(X) is maximum when
both x_1 and x_2 are equiprobable.

Let P(x_1) = \alpha, so P(x_2) = 1 - \alpha.

    H(X) = -\alpha \log_2 \alpha - (1 - \alpha) \log_2 (1 - \alpha)                     (11.55)

    \frac{dH(X)}{d\alpha} = \frac{d}{d\alpha}\left[-\alpha \log_2 \alpha - (1 - \alpha) \log_2 (1 - \alpha)\right]

Using the relation

    \frac{d}{dx} \log_b y = \frac{1}{y} \log_b e \,\frac{dy}{dx}

we obtain

    \frac{dH(X)}{d\alpha} = -\log_2 \alpha + \log_2 (1 - \alpha) = \log_2 \frac{1 - \alpha}{\alpha}

The maximum value of H(X) requires that

    \frac{dH(X)}{d\alpha} = 0

that is,

    \frac{1 - \alpha}{\alpha} = 1  \Rightarrow  \alpha = \frac{1}{2}

Note that H(X) = 0 when \alpha = 0 or 1. When P(x_1) = P(x_2) = 1/2, H(X) is maximum and is given by

    H(X) = \frac{1}{2} \log_2 2 + \frac{1}{2} \log_2 2 = 1   b/symbol                   (11.56)
11.4 Verify Eq. (11.9), that is,

    0 \le H(X) \le \log_2 m

where m is the size of the alphabet of X.

Proof of the lower bound: Since 0 \le P(x_i) \le 1,

    \frac{1}{P(x_i)} \ge 1   and   \log_2 \frac{1}{P(x_i)} \ge 0

Then it follows that

    P(x_i) \log_2 \frac{1}{P(x_i)} \ge 0

Thus,

    H(X) = \sum_{i=1}^{m} P(x_i) \log_2 \frac{1}{P(x_i)} \ge 0                          (11.57)

Next, we note that

    P(x_i) \log_2 \frac{1}{P(x_i)} = 0

if and only if P(x_i) = 0 or 1. Since

    \sum_{i=1}^{m} P(x_i) = 1

when P(x_i) = 1, then P(x_j) = 0 for j \ne i. Thus, only in this case, H(X) = 0.

Proof of the upper bound: Consider two probability distributions {P(x_i) = P_i} and {Q(x_i) = Q_i}
on the alphabet {x_i}, i = 1, 2, ..., m, such that

    \sum_{i=1}^{m} P_i = 1   and   \sum_{i=1}^{m} Q_i = 1                               (11.58)

Using Eq. (11.6), we have

    \sum_{i=1}^{m} P_i \log_2 \frac{Q_i}{P_i} = \frac{1}{\ln 2} \sum_{i=1}^{m} P_i \ln \frac{Q_i}{P_i}

Next, using the inequality

    \ln \alpha \le \alpha - 1,      \alpha > 0                                          (11.59)

and noting that the equality holds only if \alpha = 1, we get

    \sum_{i=1}^{m} P_i \ln \frac{Q_i}{P_i} \le \sum_{i=1}^{m} P_i \left(\frac{Q_i}{P_i} - 1\right)
        = \sum_{i=1}^{m} (Q_i - P_i) = \sum_{i=1}^{m} Q_i - \sum_{i=1}^{m} P_i = 0      (11.60)

by using Eq. (11.58). Thus,

    \sum_{i=1}^{m} P_i \log_2 \frac{Q_i}{P_i} \le 0                                     (11.61)

where the equality holds only if Q_i = P_i for all i. Setting

    Q_i = \frac{1}{m},      i = 1, 2, ..., m                                            (11.62)

we obtain

    \sum_{i=1}^{m} P_i \log_2 \frac{Q_i}{P_i} = \sum_{i=1}^{m} P_i \log_2 \frac{1}{m} - \sum_{i=1}^{m} P_i \log_2 P_i
        = -\log_2 m + H(X) \le 0                                                        (11.63)

Hence,

    H(X) \le \log_2 m

and the equality holds only if the symbols in X are equiprobable, as in Eq. (11.62).

11.5 A high-resolution black-and-white TV picture consists of about 2 × 10^6 picture elements and
16 different brightness levels. Pictures are repeated at the rate of 32 per second. All picture
elements are assumed to be independent, and all levels have equal likelihood of occurrence.
Calculate the average rate of information conveyed by this TV picture source.

    H(X) = -\sum_{i=1}^{16} \frac{1}{16} \log_2 \frac{1}{16} = 4   b/element

    r = 2(10^6)(32) = 64(10^6) elements/s

Hence, by Eq. (11.10)

    R = rH(X) = 64(10^6)(4) = 256(10^6) b/s = 256 Mb/s

11.6 Consider a telegraph source having two symbols, dot and dash. The dot duration is 0.2 s. The dash
duration is 3 times the dot duration. The probability of the dot's occurring is twice that of the dash,
and the time between symbols is 0.2 s. Calculate the information rate of the telegraph source.

    P(dot) = 2P(dash)   and   P(dot) + P(dash) = 1

Thus,

    P(dot) = \frac{2}{3}   and   P(dash) = \frac{1}{3}

By Eq. (11.7)

    H(X) = -P(dot) \log_2 P(dot) - P(dash) \log_2 P(dash)
         = 0.667(0.585) + 0.333(1.585)
         = 0.92 b/symbol

    t_{dot} = 0.2 s     t_{dash} = 0.6 s     t_{space} = 0.2 s

Thus, the average time per symbol is

    T_s = P(dot) t_{dot} + P(dash) t_{dash} + t_{space} = 0.5333 s/symbol

and the average symbol rate is

    r = \frac{1}{T_s} = 1.875 symbols/s

Thus, the average information rate of the telegraph source is

    R = rH(X) = 1.875(0.92) = 1.725 b/s

Discrete Memoryless Channels

11.7 Consider the binary channel shown in Fig. 11.7 (see Solved Problem 6.14).

[Fig. 11.7  Binary channel with transition probabilities P(y_1|x_1) = 0.9, P(y_2|x_1) = 0.1,
P(y_1|x_2) = 0.2, P(y_2|x_2) = 0.8]

(a) Find the channel matrix of the channel.
(b) Find P(y_1) and P(y_2) when P(x_1) = P(x_2) = 0.5.
(c) Find the joint probabilities P(x_1, y_2) and P(x_2, y_1) when P(x_1) = P(x_2) = 0.5.

(a) Using Eq. (11.11), we see the channel matrix is given by

    [P(Y|X)] = \begin{bmatrix} P(y_1|x_1) & P(y_2|x_1) \\ P(y_1|x_2) & P(y_2|x_2) \end{bmatrix}
             = \begin{bmatrix} 0.9 & 0.1 \\ 0.2 & 0.8 \end{bmatrix}

(b) Using Eqs. (11.13), (11.14), and (11.15), we obtain

    [P(Y)] = [P(X)][P(Y|X)]
           = [0.5  0.5] \begin{bmatrix} 0.9 & 0.1 \\ 0.2 & 0.8 \end{bmatrix}
           = [0.55  0.45] = [P(y_1)  P(y_2)]

Hence, P(y_1) = 0.55 and P(y_2) = 0.45.

(c) Using Eqs. (11.16) and (11.17), we obtain

    [P(X, Y)] = [P(X)]_d [P(Y|X)]
              = \begin{bmatrix} 0.5 & 0 \\ 0 & 0.5 \end{bmatrix} \begin{bmatrix} 0.9 & 0.1 \\ 0.2 & 0.8 \end{bmatrix}
              = \begin{bmatrix} 0.45 & 0.05 \\ 0.10 & 0.40 \end{bmatrix}
              = \begin{bmatrix} P(x_1, y_1) & P(x_1, y_2) \\ P(x_2, y_1) & P(x_2, y_2) \end{bmatrix}

Hence, P(x_1, y_2) = 0.05 and P(x_2, y_1) = 0.1.

11.8 Two binary channels of Solved Problem 11.7 are connected in cascade, as shown in Fig. 11.8.

[Fig. 11.8  Cascade of two identical binary channels]

(a) Find the overall channel matrix of the resultant channel, and draw the resultant equivalent
    channel diagram.
(b) Find P(z_1) and P(z_2) when P(x_1) = P(x_2) = 0.5.

(a) By Eq. (11.15)

    [P(Y)] = [P(X)][P(Y|X)]
    [P(Z)] = [P(Y)][P(Z|Y)] = [P(X)][P(Y|X)][P(Z|Y)] = [P(X)][P(Z|X)]

Thus, from Fig. 11.8

    [P(Z|X)] = [P(Y|X)][P(Z|Y)]
             = \begin{bmatrix} 0.9 & 0.1 \\ 0.2 & 0.8 \end{bmatrix} \begin{bmatrix} 0.9 & 0.1 \\ 0.2 & 0.8 \end{bmatrix}
             = \begin{bmatrix} 0.83 & 0.17 \\ 0.34 & 0.66 \end{bmatrix}

The resultant equivalent channel diagram is shown in Fig. 11.9.

(b) [P(Z)] = [P(X)][P(Z|X)]
           = [0.5  0.5] \begin{bmatrix} 0.83 & 0.17 \\ 0.34 & 0.66 \end{bmatrix}
           = [0.585  0.415]

Hence, P(z_1) = 0.585 and P(z_2) = 0.415.
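The cascade computation in Solved Problem 11.8 is simply repeated matrix multiplication; a short NumPy sketch reproducing the overall channel matrix and the output probabilities:

import numpy as np

P1 = np.array([[0.9, 0.1],
               [0.2, 0.8]])        # first binary channel, [P(Y|X)]
P2 = P1.copy()                     # identical second channel, [P(Z|Y)]

P_Z_given_X = P1 @ P2              # overall channel matrix [P(Z|X)]
P_Z = np.array([0.5, 0.5]) @ P_Z_given_X

print(P_Z_given_X)                 # [[0.83 0.17]
                                   #  [0.34 0.66]]
print(P_Z)                         # [0.585 0.415]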

11.9 A channel has the following channel matrix:

    [P(Y|X)] = \begin{bmatrix} 1-p & p & 0 \\ 0 & p & 1-p \end{bmatrix}                 (11.64)

(a) Draw the channel diagram.
(b) If the source has equally likely outputs, compute the probabilities associated with the channel
    outputs for p = 0.2.

(a) The channel diagram is shown in Fig. 11.10. Note that the channel represented by Eq. (11.64)
    (see Fig. 11.10) is known as the binary erasure channel. The binary erasure channel has two
    inputs x_1 = 0 and x_2 = 1 and three outputs y_1 = 0, y_2 = e, and y_3 = 1, where e indicates an
    erasure; that is, the output is in doubt, and it should be erased.

[Fig. 11.10  Binary erasure channel]

(b) By Eq. (11.15)

    [P(Y)] = [0.5  0.5] \begin{bmatrix} 0.8 & 0.2 & 0 \\ 0 & 0.2 & 0.8 \end{bmatrix}
           = [0.4  0.2  0.4]

Thus, P(y_1) = 0.4, P(y_2) = 0.2, and P(y_3) = 0.4.
Mutual Information

11.10 For a lossless channel, show that

    H(X|Y) = 0                                                                          (11.65)

When we observe the output y_j in a lossless channel (Fig. 11.2), it is clear which x_i was transmitted,
that is,

    P(x_i|y_j) = 0 or 1                                                                 (11.66)

Now by Eq. (11.23)

    H(X|Y) = -\sum_{j=1}^{n} \sum_{i=1}^{m} P(x_i, y_j) \log_2 P(x_i|y_j)
           = -\sum_{j=1}^{n} P(y_j) \sum_{i=1}^{m} P(x_i|y_j) \log_2 P(x_i|y_j)         (11.67)

Note that all the terms in the inner summation are zero because they are in the form of 1 × log_2 1
or 0 × log_2 0. Hence, we conclude that for a lossless channel

    H(X|Y) = 0
11.11 Consider a noiseless channel with m input symbols and m output symbols (Fig. 11.4). Show that

    H(X) = H(Y)                                                                         (11.68)
and
    H(Y|X) = 0                                                                          (11.69)

For a noiseless channel the transition probabilities are

    P(y_j|x_i) = \begin{cases} 1 & i = j \\ 0 & i \ne j \end{cases}                     (11.70)

Hence,

    P(x_i, y_j) = P(y_j|x_i) P(x_i) = \begin{cases} P(x_i) & i = j \\ 0 & i \ne j \end{cases}    (11.71)

and

    P(y_j) = \sum_{i=1}^{m} P(x_i, y_j) = P(x_j)                                        (11.72)

Thus, by Eqs. (11.7) and (11.72)

    H(Y) = -\sum_{j=1}^{m} P(y_j) \log_2 P(y_j)
         = -\sum_{i=1}^{m} P(x_i) \log_2 P(x_i) = H(X)

Next, by Eqs. (11.24), (11.70), and (11.71)

    H(Y|X) = -\sum_{j=1}^{m} \sum_{i=1}^{m} P(x_i, y_j) \log_2 P(y_j|x_i)
           = -\sum_{i=1}^{m} P(x_i) \log_2 P(y_i|x_i)
           = -\sum_{i=1}^{m} P(x_i) \log_2 1 = 0

11.12 Verify Eq. (11.26), that is,

    H(X, Y) = H(X|Y) + H(Y)

From Eqs. (6.16) and (6.24)

    P(x_i, y_j) = P(x_i|y_j) P(y_j)
and
    \sum_{i=1}^{m} P(x_i, y_j) = P(y_j)

So by Eq. (11.25) and using Eqs. (11.22) and (11.23), we have

    H(X, Y) = -\sum_{j=1}^{n} \sum_{i=1}^{m} P(x_i, y_j) \log_2 P(x_i, y_j)
            = -\sum_{j=1}^{n} \sum_{i=1}^{m} P(x_i, y_j) \log_2 [P(x_i|y_j) P(y_j)]
            = -\sum_{j=1}^{n} \sum_{i=1}^{m} P(x_i, y_j) \log_2 P(x_i|y_j)
              - \sum_{j=1}^{n} \left[\sum_{i=1}^{m} P(x_i, y_j)\right] \log_2 P(y_j)
            = H(X|Y) - \sum_{j=1}^{n} P(y_j) \log_2 P(y_j)
            = H(X|Y) + H(Y)

11.13 Show that the mutual information I(X; Y) of the channel described by Eq. (11.11), with the
input probabilities P(x_i), i = 1, 2, ..., m, and the output probabilities P(y_j), j = 1, 2, ..., n, can be
expressed as

    I(X; Y) = \sum_{i=1}^{m} \sum_{j=1}^{n} P(x_i, y_j) \log_2 \frac{P(x_i|y_j)}{P(x_i)}    (11.73)

By Eq. (11.28)

    I(X; Y) = H(X) - H(X|Y)

Using Eqs. (11.21) and (11.23), we obtain

    I(X; Y) = -\sum_{i=1}^{m} P(x_i) \log_2 P(x_i) + \sum_{i=1}^{m} \sum_{j=1}^{n} P(x_i, y_j) \log_2 P(x_i|y_j)
            = -\sum_{i=1}^{m} \left[\sum_{j=1}^{n} P(x_i, y_j)\right] \log_2 P(x_i)
              + \sum_{i=1}^{m} \sum_{j=1}^{n} P(x_i, y_j) \log_2 P(x_i|y_j)
            = \sum_{i=1}^{m} \sum_{j=1}^{n} P(x_i, y_j) \left[\log_2 P(x_i|y_j) - \log_2 P(x_i)\right]
            = \sum_{i=1}^{m} \sum_{j=1}^{n} P(x_i, y_j) \log_2 \frac{P(x_i|y_j)}{P(x_i)}

11.14 Verify Eq. (11.29), that is,

    I(X; Y) = I(Y; X)

Using Eq. (11.73), we express I(Y; X) as

    I(Y; X) = \sum_{j=1}^{n} \sum_{i=1}^{m} P(y_j, x_i) \log_2 \frac{P(y_j|x_i)}{P(y_j)}    (11.74)

Now, by Eq. (6.19)

    P(y_j, x_i) = P(x_i, y_j)
and
    \frac{P(y_j|x_i)}{P(y_j)} = \frac{P(x_i|y_j)}{P(x_i)}

Thus, comparing Eq. (11.74) with Eq. (11.73), we conclude that

    I(X; Y) = I(Y; X)

11.15 Verify Eq. (11.30), that is,

    I(X; Y) \ge 0

From Eq. (11.73) and using the relation log(a/b) = -log(b/a), we have

    -I(X; Y) = \sum_{i=1}^{m} \sum_{j=1}^{n} P(x_i, y_j) \log_2 \frac{P(x_i)}{P(x_i|y_j)}    (11.75)

Using Bayes' rule [Eq. (6.19)], we have

    \frac{P(x_i)}{P(x_i|y_j)} = \frac{P(x_i) P(y_j)}{P(x_i, y_j)}

Then by using Eq. (11.6), Eq. (11.75) can be rewritten as

    -I(X; Y) = \frac{1}{\ln 2} \sum_{i=1}^{m} \sum_{j=1}^{n} P(x_i, y_j) \ln \frac{P(x_i) P(y_j)}{P(x_i, y_j)}    (11.76)

Using the inequality (11.59), that is,

    \ln \alpha \le \alpha - 1

we have

    -I(X; Y) \le \frac{1}{\ln 2} \sum_{i=1}^{m} \sum_{j=1}^{n} P(x_i, y_j) \left[\frac{P(x_i) P(y_j)}{P(x_i, y_j)} - 1\right]
             = \frac{1}{\ln 2} \left[\sum_{i=1}^{m} \sum_{j=1}^{n} P(x_i) P(y_j)
               - \sum_{i=1}^{m} \sum_{j=1}^{n} P(x_i, y_j)\right]                       (11.77)

Since

    \sum_{i=1}^{m} \sum_{j=1}^{n} P(x_i) P(y_j) = \sum_{i=1}^{m} P(x_i) \sum_{j=1}^{n} P(y_j) = (1)(1) = 1
and
    \sum_{i=1}^{m} \sum_{j=1}^{n} P(x_i, y_j) = 1

Equation (11.77) reduces to

    -I(X; Y) \le 0   or   I(X; Y) \ge 0

11.16 Consider a BSC (Fig. 11.5) with P(x_1) = \alpha.

(a) Show that the mutual information I(X; Y) is given by

    I(X; Y) = H(Y) + p \log_2 p + (1 - p) \log_2 (1 - p)                                (11.78)

(b) Calculate I(X; Y) for \alpha = 0.5 and p = 0.1.
(c) Repeat (b) for \alpha = 0.5 and p = 0.5, and comment on the result.

Figure 11.11 shows the diagram of the BSC with associated input probabilities.

[Fig. 11.11  BSC with input probabilities P(x_1) = \alpha and P(x_2) = 1 - \alpha]

(a) Using Eqs. (11.16), (11.17), and (11.20), we have

    [P(X, Y)] = \begin{bmatrix} \alpha & 0 \\ 0 & 1-\alpha \end{bmatrix}
                \begin{bmatrix} 1-p & p \\ p & 1-p \end{bmatrix}
              = \begin{bmatrix} \alpha(1-p) & \alpha p \\ (1-\alpha)p & (1-\alpha)(1-p) \end{bmatrix}
              = \begin{bmatrix} P(x_1, y_1) & P(x_1, y_2) \\ P(x_2, y_1) & P(x_2, y_2) \end{bmatrix}

Then by Eq. (11.24)

    H(Y|X) = -P(x_1, y_1) \log_2 P(y_1|x_1) - P(x_1, y_2) \log_2 P(y_2|x_1)
             - P(x_2, y_1) \log_2 P(y_1|x_2) - P(x_2, y_2) \log_2 P(y_2|x_2)
           = -\alpha(1-p) \log_2 (1-p) - \alpha p \log_2 p
             - (1-\alpha)p \log_2 p - (1-\alpha)(1-p) \log_2 (1-p)
           = -p \log_2 p - (1-p) \log_2 (1-p)                                           (11.79)

Hence, by Eq. (11.31)

    I(X; Y) = H(Y) - H(Y|X) = H(Y) + p \log_2 p + (1-p) \log_2 (1-p)

(b) When \alpha = 0.5 and p = 0.1, by Eq. (11.15)

    [P(Y)] = [0.5  0.5] \begin{bmatrix} 0.9 & 0.1 \\ 0.1 & 0.9 \end{bmatrix} = [0.5  0.5]

Thus, P(y_1) = P(y_2) = 0.5.
By Eq. (11.22)

    H(Y) = -P(y_1) \log_2 P(y_1) - P(y_2) \log_2 P(y_2)
         = -0.5 \log_2 0.5 - 0.5 \log_2 0.5 = 1

    p \log_2 p + (1-p) \log_2 (1-p) = 0.1 \log_2 0.1 + 0.9 \log_2 0.9 = -0.469

Thus, I(X; Y) = 1 - 0.469 = 0.531.

(c) When \alpha = 0.5 and p = 0.5,

    [P(Y)] = [0.5  0.5] \begin{bmatrix} 0.5 & 0.5 \\ 0.5 & 0.5 \end{bmatrix} = [0.5  0.5]

    H(Y) = 1

    p \log_2 p + (1-p) \log_2 (1-p) = 0.5 \log_2 0.5 + 0.5 \log_2 0.5 = -1

Thus, I(X; Y) = 1 - 1 = 0.
Note that in this case (p = 0.5) no information is being transmitted at all. An equally acceptable
decision could be made by dispensing with the channel entirely and "flipping a coin" at
the receiver. When I(X; Y) = 0, the channel is said to be useless.

Channel Capacity

11.17 Verify Eq. (11.36), that is,

    C_s = \log_2 m

where C_s is the channel capacity of a lossless channel and m is the number of symbols in X.

For a lossless channel [Eq. (11.65), Solved Problem 11.10]

    H(X|Y) = 0

Then by Eq. (11.28)

    I(X; Y) = H(X) - H(X|Y) = H(X)                                                      (11.80)

Hence, by Eqs. (11.33) and (11.9)

    C_s = \max_{\{P(x_i)\}} I(X; Y) = \max_{\{P(x_i)\}} H(X) = \log_2 m

11.18 Verify Eq. (11.42), that is,

    C_s = 1 + p \log_2 p + (1 - p) \log_2 (1 - p)

where C_s is the channel capacity of a BSC (Fig. 11.5).

By Eq. (11.78) (Solved Problem 11.16) the mutual information I(X; Y) of a BSC is given by

    I(X; Y) = H(Y) + p \log_2 p + (1 - p) \log_2 (1 - p)

which is maximum when H(Y) is maximum. Since the channel output is binary, H(Y) is maximum
when each output has a probability of 0.5, which is achieved for equally likely inputs [Eq. (11.9)]. For
this case H(Y) = 1, and the channel capacity is

    C_s = \max_{\{P(x_i)\}} I(X; Y) = 1 + p \log_2 p + (1 - p) \log_2 (1 - p)

11.19 Find the channel capacity of the binary erasure channel of Fig. 11.12 (Solved Problem 11.9).

Let P(x_1) = \alpha. Then P(x_2) = 1 - \alpha. By Eq. (11.64)

    [P(Y|X)] = \begin{bmatrix} 1-p & p & 0 \\ 0 & p & 1-p \end{bmatrix}
             = \begin{bmatrix} P(y_1|x_1) & P(y_2|x_1) & P(y_3|x_1) \\ P(y_1|x_2) & P(y_2|x_2) & P(y_3|x_2) \end{bmatrix}

[Fig. 11.12  Binary erasure channel with P(x_1) = \alpha and P(x_2) = 1 - \alpha]

By Eq. (11.15)

    [P(Y)] = [\alpha  1-\alpha] \begin{bmatrix} 1-p & p & 0 \\ 0 & p & 1-p \end{bmatrix}
           = [\alpha(1-p)   p   (1-\alpha)(1-p)]
           = [P(y_1)  P(y_2)  P(y_3)]

By Eq. (11.17)

    [P(X, Y)] = \begin{bmatrix} \alpha & 0 \\ 0 & 1-\alpha \end{bmatrix}
                \begin{bmatrix} 1-p & p & 0 \\ 0 & p & 1-p \end{bmatrix}
              = \begin{bmatrix} \alpha(1-p) & \alpha p & 0 \\ 0 & (1-\alpha)p & (1-\alpha)(1-p) \end{bmatrix}
              = \begin{bmatrix} P(x_1, y_1) & P(x_1, y_2) & P(x_1, y_3) \\ P(x_2, y_1) & P(x_2, y_2) & P(x_2, y_3) \end{bmatrix}

In addition, from Eqs. (11.22) and (11.24) we can calculate

    H(Y) = -\sum_{j=1}^{3} P(y_j) \log_2 P(y_j)
         = -\alpha(1-p) \log_2 [\alpha(1-p)] - p \log_2 p - (1-\alpha)(1-p) \log_2 [(1-\alpha)(1-p)]
         = (1-p)[-\alpha \log_2 \alpha - (1-\alpha) \log_2 (1-\alpha)]
           - p \log_2 p - (1-p) \log_2 (1-p)                                            (11.81)

    H(Y|X) = -\sum_{j=1}^{3} \sum_{i=1}^{2} P(x_i, y_j) \log_2 P(y_j|x_i)
           = -\alpha(1-p) \log_2 (1-p) - \alpha p \log_2 p
             - (1-\alpha)p \log_2 p - (1-\alpha)(1-p) \log_2 (1-p)
           = -p \log_2 p - (1-p) \log_2 (1-p)                                           (11.82)

Thus, by Eqs. (11.31) and (11.55)

    I(X; Y) = H(Y) - H(Y|X)
            = (1-p)[-\alpha \log_2 \alpha - (1-\alpha) \log_2 (1-\alpha)] = (1-p) H(X)  (11.83)

And by Eqs. (11.33) and (11.56)

    C_s = \max_{\{P(x_i)\}} I(X; Y) = \max_{\{P(x_i)\}} (1-p) H(X)
        = (1-p) \max_{\{P(x_i)\}} H(X) = 1 - p                                          (11.84)

Additive White Gaussian Noise Channel

11.20 Find the differential entropy H(X) of the uniformly distributed random variable X with
probability density function

    f_X(x) = \begin{cases} 1/a & 0 \le x \le a \\ 0 & \text{otherwise} \end{cases}

for (a) a = 1, (b) a = 2, and (c) a = 1/2.

By Eq. (11.43)

    H(X) = -\int_{-\infty}^{\infty} f_X(x) \log_2 f_X(x)\, dx
         = -\int_{0}^{a} \frac{1}{a} \log_2 \frac{1}{a}\, dx = \log_2 a                 (11.85)

(a) a = 1:    H(X) = \log_2 1 = 0
(b) a = 2:    H(X) = \log_2 2 = 1
(c) a = 1/2:  H(X) = \log_2 (1/2) = -\log_2 2 = -1

Note that the differential entropy H(X) is not an absolute measure of information.

11.21 With the differential entropy of a random variable X defined by Eq. (11.43), find the probability
density function f_X(x) for which H(X) is maximum when the variance of X is fixed.

From Eqs. (6.37b) and (6.75), f_X(x) must satisfy the following two conditions:

    \int_{-\infty}^{\infty} f_X(x)\, dx = 1                                             (11.86)

    \int_{-\infty}^{\infty} (x - \mu)^2 f_X(x)\, dx = \sigma^2                          (11.87)

where \mu is the mean of X and \sigma^2 is its variance. Since the problem is the maximization of H(X)
under the constraints of Eqs. (11.86) and (11.87), we use the method of Lagrange multipliers.
First, we form the function

    G[f_X(x), \lambda_1, \lambda_2] = H(X) + \lambda_1 \left[\int_{-\infty}^{\infty} f_X(x)\, dx - 1\right]
        + \lambda_2 \left[\int_{-\infty}^{\infty} (x - \mu)^2 f_X(x)\, dx - \sigma^2\right]
      = \int_{-\infty}^{\infty} \left[-f_X(x) \log_2 f_X(x) + \lambda_1 f_X(x) + \lambda_2 (x - \mu)^2 f_X(x)\right] dx
        - \lambda_1 - \lambda_2 \sigma^2                                                (11.88)

where the parameters \lambda_1 and \lambda_2 are the Lagrange multipliers. Then the maximization of H(X)
requires that

    \frac{\partial G}{\partial f_X(x)} = -\log_2 f_X(x) - \log_2 e + \lambda_1 + \lambda_2 (x - \mu)^2 = 0   (11.89)

Thus,

    \log_2 f_X(x) = -\log_2 e + \lambda_1 + \lambda_2 (x - \mu)^2

Hence, we obtain

    f_X(x) = \exp\!\left[-1 + \frac{\lambda_1}{\log_2 e} + \frac{\lambda_2}{\log_2 e}(x - \mu)^2\right]      (11.90)

In view of the constraints of Eqs. (11.86) and (11.87), it is required that \lambda_2 < 0. Let

    a = \exp\!\left(-1 + \frac{\lambda_1}{\log_2 e}\right)   and   \frac{\lambda_2}{\log_2 e} = -b^2

Then Eq. (11.90) can be rewritten as

    f_X(x) = a\, e^{-b^2 (x - \mu)^2}                                                   (11.91)

Substituting Eq. (11.91) into Eqs. (11.86) and (11.87), we obtain

    \int_{-\infty}^{\infty} a\, e^{-b^2 (x - \mu)^2} dx = \frac{a\sqrt{\pi}}{b} = 1     (11.92)

    \int_{-\infty}^{\infty} a (x - \mu)^2 e^{-b^2 (x - \mu)^2} dx = \frac{a\sqrt{\pi}}{2b^3} = \sigma^2      (11.93)

Solving Eqs. (11.92) and (11.93) for a and b^2, we obtain

    a = \frac{1}{\sqrt{2\pi}\,\sigma}   and   b^2 = \frac{1}{2\sigma^2}

Substituting these values into Eq. (11.91), we see that the desired f_X(x) is given by

    f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^2/(2\sigma^2)}                 (11.94)

which is the probability density function of a gaussian random variable X of mean \mu and variance
\sigma^2 [Eq. (6.91)].

11.22 Show that the channel capacity of an ideal AWGN channel with infinite bandwidth is given by

    C_\infty = \lim_{B \to \infty} C = \frac{1}{\ln 2} \frac{S}{\eta} \approx 1.44 \frac{S}{\eta}   b/s   (11.95)

where S is the average signal power and \eta/2 is the power spectral density of the white gaussian
noise.

From Eq. (8.6) the noise power N is given by N = \eta B. Thus, by Eq. (11.48)

    C = B \log_2 \left(1 + \frac{S}{\eta B}\right)

Let S/(\eta B) = \lambda. Then

    C = \frac{S}{\eta \lambda} \log_2 (1 + \lambda) = \frac{S}{\eta} \frac{1}{\ln 2} \frac{\ln(1 + \lambda)}{\lambda}   (11.96)

Now

    C_\infty = \lim_{B \to \infty} B \log_2 \left(1 + \frac{S}{\eta B}\right)
             = \frac{S}{\eta} \frac{1}{\ln 2} \lim_{\lambda \to 0} \frac{\ln(1 + \lambda)}{\lambda}

Since \lim_{\lambda \to 0} [\ln(1 + \lambda)]/\lambda = 1, we obtain

    C_\infty = \frac{1}{\ln 2} \frac{S}{\eta} \approx 1.44 \frac{S}{\eta}

Note that Eq. (11.95) can be used to estimate upper limits on the performance of any practical
communication system whose transmission channel can be approximated by the AWGN channel.

11.23 Consider an AWGN channel with 4-kHz bandwidth and the noise power spectral density
\eta/2 = 10^{-12} W/Hz. The signal power required at the receiver is 0.1 mW. Calculate the capacity
of this channel.

    B = 4000 Hz     S = 0.1(10^{-3}) W
    N = \eta B = 2(10^{-12})(4000) = 8(10^{-9}) W

Thus,

    \frac{S}{N} = \frac{0.1(10^{-3})}{8(10^{-9})} = 1.25(10^4)

And by Eq. (11.48)

    C = B \log_2 \left(1 + \frac{S}{N}\right)
      = 4000 \log_2 [1 + 1.25(10^4)] = 54.44(10^3) b/s

11.24 An analog signal having 4-kHz bandwidth is sampled at 1.25 times the Nyquist rate, and each
sample is quantized into one of 256 equally likely levels. Assume that the successive samples are
statistically independent.

(a) What is the information rate of this source?
(b) Can the output of this source be transmitted without error over an AWGN channel with a
    bandwidth of 10 kHz and an S/N ratio of 20 dB?
(c) Find the S/N ratio required for error-free transmission for part (b).
(d) Find the bandwidth required for an AWGN channel for error-free transmission of the output
    of this source if the S/N ratio is 20 dB.

(a) f_M = 4(10^3) Hz
    Nyquist rate = 2 f_M = 8(10^3) samples/s
    r = 1.25 \times 8(10^3) = 10^4 samples/s
    H(X) = \log_2 256 = 8 b/sample

By Eq. (11.10) the information rate R of the source is

    R = rH(X) = 10^4 (8) b/s = 80 kb/s

(b) By Eq. (11.48)

    C = B \log_2 \left(1 + \frac{S}{N}\right) = 10^4 \log_2 (1 + 10^2) = 66.6(10^3) b/s

Since R > C, error-free transmission is not possible.

(c) The required S/N ratio can be found by

    C = 10^4 \log_2 \left(1 + \frac{S}{N}\right) \ge 8(10^4)
or
    \log_2 \left(1 + \frac{S}{N}\right) \ge 8  \Rightarrow  \frac{S}{N} \ge 255 \ (= 24.1 dB)

Thus, the required S/N ratio must be greater than or equal to 24.1 dB for error-free transmission.

(d) The required bandwidth B can be found by

    C = B \log_2 (1 + 100) \ge 8(10^4)
or
    B \ge \frac{8(10^4)}{\log_2 (1 + 100)} = 1.2(10^4) Hz = 12 kHz

and the required bandwidth of the channel must be greater than or equal to 12 kHz.

Source Coding

11.25 Consider a DMS X with two symbols x_1 and x_2 with P(x_1) = 0.9 and P(x_2) = 0.1.
Symbols x_1 and x_2 are encoded as follows (Table 11.4):

    Table 11.4
    x_i    P(x_i)    Code
    x_1    0.9       0
    x_2    0.1       1

Find the efficiency \eta and the redundancy \gamma of this code.

By Eq. (11.49) the average code length L per symbol is

    L = \sum_{i=1}^{2} P(x_i) n_i = (0.9)(1) + (0.1)(1) = 1 b

By Eq. (11.7)

    H(X) = -\sum_{i=1}^{2} P(x_i) \log_2 P(x_i)
         = -0.9 \log_2 0.9 - 0.1 \log_2 0.1 = 0.469 b/symbol

Thus, by Eq. (11.53) the code efficiency \eta is

    \eta = \frac{H(X)}{L} = 0.469 = 46.9%

By Eq. (11.51) the code redundancy \gamma is

    \gamma = 1 - \eta = 0.531 = 53.1%

11.26 The second-order extension of the DMS X of Solved Problem 11.25, denoted by X^2, is formed
by taking the source symbols two at a time. The coding of this extension is shown in Table 11.5.
Find the efficiency \eta and the redundancy \gamma of this extension code.

    Table 11.5
    a_i             P(a_i)    Code
    a_1 = x_1 x_1   0.81      0
    a_2 = x_1 x_2   0.09      10
    a_3 = x_2 x_1   0.09      110
    a_4 = x_2 x_2   0.01      111

    L = \sum_{i=1}^{4} P(a_i) n_i
      = 0.81(1) + 0.09(2) + 0.09(3) + 0.01(3) = 1.29 b/symbol

The entropy of the second-order extension of X, H(X^2), is given by

    H(X^2) = -\sum_{i=1}^{4} P(a_i) \log_2 P(a_i)
           = -0.81 \log_2 0.81 - 0.09 \log_2 0.09 - 0.09 \log_2 0.09 - 0.01 \log_2 0.01
           = 0.938 b/symbol

Therefore, the code efficiency \eta is

    \eta = \frac{H(X^2)}{L} = \frac{0.938}{1.29} = 0.727 = 72.7%

and the code redundancy \gamma is

    \gamma = 1 - \eta = 0.273 = 27.3%

Note that H(X^2) = 2H(X).

11.27 Consider a DMS X with symbols x_i, i = 1, 2, 3, 4. Table 11.6 lists four possible binary codes.

(a) Show that all codes except code B satisfy the Kraft inequality.
(b) Show that codes A and D are uniquely decodable but codes B and C are not uniquely decodable.

    Table 11.6
    x_i    Code A   Code B   Code C   Code D
    x_1    00       0        0        0
    x_2    01       10       11       100
    x_3    10       11       100      110
    x_4    11       110      110      111

(a) From Eq. (11.54) we obtain the following:

For code A:   n_1 = n_2 = n_3 = n_4 = 2

    K = \sum_{i=1}^{4} 2^{-n_i} = \frac{1}{4} + \frac{1}{4} + \frac{1}{4} + \frac{1}{4} = 1

For code B:   n_1 = 1,  n_2 = n_3 = 2,  n_4 = 3

    K = \sum_{i=1}^{4} 2^{-n_i} = \frac{1}{2} + \frac{1}{4} + \frac{1}{4} + \frac{1}{8} = \frac{9}{8} > 1

For code C:   n_1 = 1,  n_2 = 2,  n_3 = n_4 = 3

    K = \sum_{i=1}^{4} 2^{-n_i} = \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{8} = 1

For code D:   n_1 = 1,  n_2 = n_3 = n_4 = 3

    K = \sum_{i=1}^{4} 2^{-n_i} = \frac{1}{2} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8} = \frac{7}{8} < 1

All codes except code B satisfy the Kraft inequality.

(b) Codes A and D are prefix-free codes. They are therefore uniquely decodable. Code B does not
satisfy the Kraft inequality, and it is not uniquely decodable. Although code C does satisfy
the Kraft inequality, it is not uniquely decodable. This can be seen by the following example:
given the binary sequence 0110110, this sequence may correspond to the source sequence
x_1 x_2 x_1 x_4 or x_1 x_4 x_4.

11.28 Verify Eq. (11.52), that is,

    L \ge H(X)

where L is the average code word length per symbol and H(X) is the source entropy.

From Eq. (11.61) (Solved Problem 11.4), we have

    \sum_{i=1}^{m} P_i \log_2 \frac{Q_i}{P_i} \le 0

where the equality holds only if Q_i = P_i. Let

    Q_i = \frac{2^{-n_i}}{K}                                                            (11.97)

where

    K = \sum_{i=1}^{m} 2^{-n_i}                                                         (11.98)

which is defined in Eq. (11.54). Then

    \sum_{i=1}^{m} Q_i = \frac{1}{K} \sum_{i=1}^{m} 2^{-n_i} = 1                        (11.99)

and

    \sum_{i=1}^{m} P_i \log_2 \frac{Q_i}{P_i} = \sum_{i=1}^{m} P_i \left[\log_2 \frac{1}{P_i} - n_i - \log_2 K\right]
        = -\sum_{i=1}^{m} P_i \log_2 P_i - \sum_{i=1}^{m} P_i n_i - (\log_2 K) \sum_{i=1}^{m} P_i   (11.100)
        = H(X) - L - \log_2 K \le 0

From the Kraft inequality (11.54) we have

    \log_2 K \le 0                                                                      (11.101)

Thus,

    H(X) - L \le \log_2 K \le 0                                                         (11.102)
or
    L \ge H(X)

The equality holds when K = 1 and P_i = Q_i.

11.29 Let X be a DMS with symbols x_i and corresponding probabilities P(x_i) = P_i, i = 1, 2, ..., m.
Show that for the optimum source encoding we require that

    K = \sum_{i=1}^{m} 2^{-n_i} = 1                                                     (11.103)
and
    n_i = \log_2 \frac{1}{P_i} = I_i                                                    (11.104)

where n_i is the length of the code word corresponding to x_i and I_i is the information content of x_i.

From the result of Solved Problem 11.28, the optimum source encoding with L = H(X) requires
K = 1 and P_i = Q_i. Thus, by Eqs. (11.98) and (11.97)

    K = \sum_{i=1}^{m} 2^{-n_i} = 1                                                     (11.105)
and
    P_i = Q_i = 2^{-n_i}                                                                (11.106)

Hence,

    n_i = -\log_2 P_i = \log_2 \frac{1}{P_i} = I_i

Note that Eq. (11.104) implies the following commonsense principle: symbols that occur with high
probability should be assigned shorter code words than symbols that occur with low probability.

11.30 Consider a DMS X with symbols x_i and corresponding probabilities P(x_i) = P_i, i = 1, 2, ..., m.
Let n_i be the length of the code word for x_i such that

    \log_2 \frac{1}{P_i} \le n_i < \log_2 \frac{1}{P_i} + 1                             (11.107)

Show that this relationship satisfies the Kraft inequality (11.54), and find the bound on K in
Eq. (11.54).

Equation (11.107) can be rewritten as

    -\log_2 P_i \le n_i < -\log_2 P_i + 1                                               (11.108)
or
    \log_2 P_i \ge -n_i > \log_2 P_i - 1

Then

    2^{\log_2 P_i} \ge 2^{-n_i} > 2^{\log_2 P_i - 1}
or
    P_i \ge 2^{-n_i} > \frac{1}{2} P_i                                                  (11.109)

Thus,

    \sum_{i=1}^{m} P_i \ge \sum_{i=1}^{m} 2^{-n_i} > \frac{1}{2} \sum_{i=1}^{m} P_i     (11.110)
or
    1 \ge K > \frac{1}{2}                                                               (11.111)

which indicates that the Kraft inequality (11.54) is satisfied, and the bound on K is

    \frac{1}{2} < K \le 1                                                               (11.112)

11.31 Consider a DMS X with symbols x_i and corresponding probabilities P(x_i) = P_i, i = 1, 2, ..., m.
Show that a code constructed in agreement with Eq. (11.107) will satisfy the following relation:

    H(X) \le L < H(X) + 1                                                               (11.113)

where H(X) is the source entropy and L is the average code word length.

Multiplying Eq. (11.108) by P_i and summing over i yields

    -\sum_{i=1}^{m} P_i \log_2 P_i \le \sum_{i=1}^{m} n_i P_i < \sum_{i=1}^{m} P_i (-\log_2 P_i + 1)   (11.114)

Now

    \sum_{i=1}^{m} P_i (-\log_2 P_i + 1) = -\sum_{i=1}^{m} P_i \log_2 P_i + \sum_{i=1}^{m} P_i = H(X) + 1

Thus, Eq. (11.114) reduces to

    H(X) \le L < H(X) + 1
Entropy Coding

11.32 A DMS X has four symbols x_1, x_2, x_3, and x_4 with P(x_1) = 1/2, P(x_2) = 1/4, and
P(x_3) = P(x_4) = 1/8. Construct a Shannon-Fano code for X; show that this code has the optimum
property that n_i = I(x_i) and that the code efficiency is 100 percent.

The Shannon-Fano code is constructed as follows (see Table 11.7):

    Table 11.7
    x_i    P(x_i)   Step 1   Step 2   Step 3   Code
    x_1    1/2      0                          0
    x_2    1/4      1        0                 10
    x_3    1/8      1        1        0        110
    x_4    1/8      1        1        1        111

    I(x_1) = -\log_2 \frac{1}{2} = 1 = n_1        I(x_2) = -\log_2 \frac{1}{4} = 2 = n_2
    I(x_3) = -\log_2 \frac{1}{8} = 3 = n_3        I(x_4) = -\log_2 \frac{1}{8} = 3 = n_4

    H(X) = \sum_{i=1}^{4} P(x_i) I(x_i) = \frac{1}{2}(1) + \frac{1}{4}(2) + \frac{1}{8}(3) + \frac{1}{8}(3) = 1.75

    L = \sum_{i=1}^{4} P(x_i) n_i = \frac{1}{2}(1) + \frac{1}{4}(2) + \frac{1}{8}(3) + \frac{1}{8}(3) = 1.75

    \eta = \frac{H(X)}{L} = 1 = 100%
11.33 A DMS X has five equally likely symbols.

(a) Construct a Shannon-Fano code for X, and calculate the efficiency of the code.
(b) Construct another Shannon-Fano code and compare the results.
(c) Repeat for the Huffman code and compare the results.

(a) A Shannon-Fano code [by choosing two approximately equiprobable (0.4 versus 0.6) sets] is
constructed as follows (see Table 11.8):

    Table 11.8
    x_i    P(x_i)   Step 1   Step 2   Step 3   Code
    x_1    0.2      0        0                 00
    x_2    0.2      0        1                 01
    x_3    0.2      1        0                 10
    x_4    0.2      1        1        0        110
    x_5    0.2      1        1        1        111

    H(X) = -\sum_{i=1}^{5} P(x_i) \log_2 P(x_i) = 5(-0.2 \log_2 0.2) = 2.32

    L = \sum_{i=1}^{5} P(x_i) n_i = 0.2(2 + 2 + 2 + 3 + 3) = 2.4

The efficiency \eta is

    \eta = \frac{H(X)}{L} = \frac{2.32}{2.4} = 0.967 = 96.7%

(b) Another Shannon-Fano code [by choosing another two approximately equiprobable (0.6 versus
0.4) sets] is constructed as follows (see Table 11.9):

    Table 11.9
    x_i    P(x_i)   Step 1   Step 2   Step 3   Code
    x_1    0.2      0        0                 00
    x_2    0.2      0        1        0        010
    x_3    0.2      0        1        1        011
    x_4    0.2      1        0                 10
    x_5    0.2      1        1                 11

    L = \sum_{i=1}^{5} P(x_i) n_i = 0.2(2 + 3 + 3 + 2 + 2) = 2.4

Since the average code word length is the same as that for the code of part (a), the efficiency
is the same.

(c) The Huffman code is constructed as follows (see Table 11.10):

[Table 11.10  Huffman encoding of the five equally likely symbols; the resulting code word lengths
are 2, 3, 3, 2, and 2]

    L = \sum_{i=1}^{5} P(x_i) n_i = 0.2(2 + 3 + 3 + 2 + 2) = 2.4

Since the average code word length is the same as that for the Shannon-Fano code, the efficiency
is also the same.

11.34 A DMS X has five symbols x_1, x_2, x_3, x_4, and x_5 with P(x_1) = 0.4, P(x_2) = 0.19, P(x_3) = 0.16,
P(x_4) = 0.15, and P(x_5) = 0.1.

(a) Construct a Shannon-Fano code for X, and calculate the efficiency of the code.
(b) Repeat for the Huffman code and compare the results.

(a) The Shannon-Fano code is constructed as follows (see Table 11.11):

    Table 11.11
    x_i    P(x_i)   Step 1   Step 2   Step 3   Code
    x_1    0.4      0        0                 00
    x_2    0.19     0        1                 01
    x_3    0.16     1        0                 10
    x_4    0.15     1        1        0        110
    x_5    0.1      1        1        1        111

    H(X) = -\sum_{i=1}^{5} P(x_i) \log_2 P(x_i)
         = -0.4 \log_2 0.4 - 0.19 \log_2 0.19 - 0.16 \log_2 0.16 - 0.15 \log_2 0.15 - 0.1 \log_2 0.1
         = 2.15

    L = \sum_{i=1}^{5} P(x_i) n_i
      = 0.4(2) + 0.19(2) + 0.16(2) + 0.15(3) + 0.1(3) = 2.25

    \eta = \frac{H(X)}{L} = \frac{2.15}{2.25} = 0.956 = 95.6%

(b) The Huffman code is constructed as follows (see Table 11.12):

[Table 11.12  Huffman encoding: x_1 receives a 1-bit code word and x_2, x_3, x_4, x_5 receive 3-bit
code words]

    L = 0.4(1) + (0.19 + 0.16 + 0.15 + 0.1)(3) = 2.2

    \eta = \frac{H(X)}{L} = \frac{2.15}{2.2} = 0.977 = 97.7%

The average code word length of the Huffman code is shorter than that of the Shannon-Fano code,
and thus the efficiency is higher than that of the Shannon-Fano code.

SUPPLEMENTARY PROBLEMS

11.1 Consider a source X that produces five symbols with probabilities 1/2, 1/4, 1/8, 1/16, and 1/16.
Determine the source entropy H(X).
[Ans. 1.875 b/symbol]

11.2 Calculate the average information content in the English language, assuming that each of the
26 characters in the alphabet occurs with equal probability.
[Ans. 4.7 b/character]

11.3 Two BSCs are connected in cascade, as shown in Fig. 11.13.

[Fig. 11.13  Cascade of two BSCs]

(a) Find the channel matrix of the resultant channel.
(b) Find P(z_1) and P(z_2) if P(x_1) = 0.6 and P(x_2) = 0.4.
[Ans. (a) \begin{bmatrix} 0.62 & 0.38 \\ 0.38 & 0.62 \end{bmatrix}   (b) P(z_1) = 0.524, P(z_2) = 0.476]

11.4 Consider the DMC shown in Fig. 11.14.

[Fig. 11.14  Discrete memoryless channel with three inputs and three outputs]

(a) Find the output probabilities if P(x_1) = 1/2 and P(x_2) = P(x_3) = 1/4.
(b) Find the output entropy H(Y).
[Ans. (a) P(y_1) = 7/24, P(y_2) = 17/48, and P(y_3) = 17/48   (b) 1.58 b/symbol]

11.5 Verify Eq. (11.32), that is, I(X; Y) = H(X) + H(Y) - H(X, Y).
[Hint: Use Eqs. (11.28) and (11.26).]

11.6 Show that H(X, Y) \le H(X) + H(Y), with equality if and only if X and Y are independent.
[Hint: Use Eqs. (11.30) and (11.32).]

11.7 Show that for a deterministic channel H(Y|X) = 0.
[Hint: Use Eq. (11.24), and note that for a deterministic channel the P(y_j|x_i) are either 0 or 1.]

11.8 Consider a channel with an input X and an output Y. Show that if X and Y are statistically
independent, then H(X|Y) = H(X) and I(X; Y) = 0.
[Hint: Use Eqs. (6.51) and (6.52) in Eqs. (11.23) and (11.28).]

11.9 A channel is described by the following channel matrix:

    \begin{bmatrix} 1/2 & 1/2 & 0 \\ 0 & 0 & 1 \end{bmatrix}

(a) Draw the channel diagram.
(b) Find the channel capacity.
[Ans. (a) See Fig. 11.15.   (b) 1 b/symbol]

[Fig. 11.15  Channel diagram for Supplementary Problem 11.9]

11.10 Let X be a random variable with probability density function f_X(x), and let Y = aX + b, where
a and b are constants. Find H(Y) in terms of H(X).
[Ans. H(Y) = H(X) + \log_2 a]

11.11 Find the differential entropy H(X) of a gaussian random variable X with zero mean and
variance \sigma_X^2.
[Ans. H(X) = \frac{1}{2} \log_2 (2\pi e \sigma_X^2)]

11.12 Consider an AWGN channel described by Eq. (11.46), that is,

    Y = X + n

where X and Y are the channel input and output, respectively, and n is an additive white gaussian
noise with zero mean and variance \sigma_n^2. Find the average mutual information I(X; Y) when the
channel input X is gaussian with zero mean and variance \sigma_X^2.
[Ans. I(X; Y) = \frac{1}{2} \log_2 \left(1 + \frac{\sigma_X^2}{\sigma_n^2}\right)]

11.13 Calculate the capacity of an AWGN channel with a bandwidth of 1 MHz and an S/N ratio of 40 dB.
[Ans. 13.29 Mb/s]

11.14 Consider a DMS X with m equiprobable symbols x_i, i = 1, 2, ..., m.
(a) Show that the use of a fixed-length code for the representation of x_i is most efficient.
(b) Let n_0 be the fixed code word length. Show that if n_0 = \log_2 m, then the code efficiency is
    100 percent.
[Hint: Use Eqs. (11.49) and (11.52).]

11.15 Construct a Huffman code for the DMS X of Solved Problem 11.32, and show that the code is
an optimum code.
[Ans. Symbols:  x_1  x_2  x_3  x_4
      Code:     0    10   110  111]

11.16 A DMS X has five symbols x_1, x_2, x_3, x_4, and x_5 with respective probabilities 0.2, 0.15, 0.05,
0.1, and 0.5.
(a) Construct a Shannon-Fano code for X, and calculate the code efficiency.
(b) Repeat (a) for the Huffman code.
[Ans. (a) Symbols:  x_1  x_2   x_3    x_4    x_5
          Code:     10   110   1111   1110   0
          Code efficiency \eta = 98.6 percent.
      (b) Symbols:  x_1  x_2   x_3    x_4    x_5
          Code:     11   100   1011   1010   0
          Code efficiency \eta = 98.6 percent.]

11.17 Show that the Kraft inequality is satisfied by the codes of Solved Problem 11.33.
[Hint: Use Eq. (11.54).]
