UNIT - I
INFORMATION THEORY
SYLLABUS
Discrete Memoryless Source, Information, Entropy, Mutual Information - Discrete Memoryless Channels - Binary Symmetric Channel, Channel Capacity - Hartley-Shannon Law - Source Coding Theorem - Shannon-Fano & Huffman Codes.
Information theory is known as the mathematical theory of communication; it deals only with the mathematical modeling and analysis of a communication system, rather than with physical sources and physical channels.
Information theory provides,
* The minimum number of bits per symbol required to fully represent the source;
* The maximum rate at which reliable communication can take place over the channel.
1.1. UNCERTAINTY
"Uncertainty refers to situations involving imperfect or unknown information."
Uncertainty applies to predictions of future events, to physical measurements that have already been made, or to the unknown.
It is the lack of certainty, a state of limited knowledge where it is impossible to exactly describe the existing state, a future outcome, or more than one possible outcome.
Example:
The following examples relate to uncertainty (or) surprise.
Sun Rises in the East
Here the uncertainty is zero, because there is no surprise in the statement; the probability of occurrence of the sun rising in the east is always 1.
Sun does not Rise in the East
Here the uncertainty is high because there is maximum surprise, and hence maximum information, as the event is not possible.
Consider a source whose output is modeled as a discrete random variable S, which takes on symbols from a fixed finite alphabet,
    S = {s₀, s₁, ..., s_{K-1}}                                  ...(1.1)
with probabilities,
    P(S = s_k) = p_k,   k = 0, 1, ..., K-1                      ...(1.2)
The set of probabilities must satisfy the condition,
    Σ_{k=0}^{K-1} p_k = 1                                       ...(1.3)
Consider the event S = s_k, describing the emission of symbol s_k by the source with probability p_k, as
    P(S = s_k) = p_k                                            ...(1.4)
If the probability p_k = 1 and p_j = 0 for all j ≠ k, then there is no 'surprise' and therefore no 'information' when symbol s_k is emitted.
If, on the other hand, the source symbols occur with different probabilities, and the probability p_k is low, then there is more 'surprise' and therefore more 'information' when symbol s_k is emitted by the source than when symbol s_i, i ≠ k, with higher probability is emitted.
UNCERTAINTY - SURPRISE - INFORMATION
Uncertainty, surprise and information are all related.
* Before the event S = s_k occurs, there is an amount of uncertainty.
* When the event S = s_k occurs, there is an amount of surprise.
* After the occurrence of the event S = s_k, there is a gain in the amount of information.
1.2. INFORMATION
The amount of information is related to the inverse of the probability of occurrence.
Definition
The amount of information gained after observing the event S = s_k, which occurs with probability p_k, is
    I(s_k) = log(1/p_k)                                         ...(1.5)
By using the logarithm to base 2, the resulting unit of information is called the bit,
    I(s_k) = log₂(1/p_k),   k = 0, 1, ..., K-1                  ...(1.6)
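As a quick illustration, the self-information of a symbol can be computed directly from its probability. The short Python sketch below (the function name is illustrative, not from the text) evaluates I(s_k) = log₂(1/p_k) for a few probabilities.

```python
import math

def self_information(p: float) -> float:
    """Information gained (in bits) when a symbol of probability p occurs."""
    return math.log2(1.0 / p)

# A certain event carries no information; rarer events carry more.
for p in (1.0, 0.5, 0.25, 0.125):
    print(f"p = {p:<6} I = {self_information(p):.3f} bits")
# p = 1.0    I = 0.000 bits
# p = 0.5    I = 1.000 bits
# p = 0.25   I = 2.000 bits
# p = 0.125  I = 3.000 bits
```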
1.2.1. PROPERTIES OF INFORMATION
Property 1
If we are absolutely certain of the outcome of an event, even before it occurs, there is no information gained.
    I(s_k) = 0   for p_k = 1                                    ...(1.7)
Also, if the receiver already knows the message being transmitted, the amount of information carried is zero.
Property 2
The occurrence of an event S = s_k either provides some or no information, but never brings about a loss of information.
    I(s_k) ≥ 0   for 0 ≤ p_k ≤ 1                                ...(1.8)
Property 3
The less probable an event is, the more information we gain when it occurs.
i.e. If there is more uncertainty about the message, the information carried is also more.
    I(s_k) > I(s_i)   for p_k < p_i                             ...(1.9)
    ln x ≤ x - 1,   x ≥ 0                                       ...(1.14)
where e is the base of the natural logarithm.
By using this inequality in equation (1.13), we get
    Σ_{k=0}^{K-1} p_k log₂(q_k/p_k) ≤ (1/log_e 2) Σ_{k=0}^{K-1} p_k (q_k/p_k - 1)
                                    = (1/log_e 2) (Σ_{k=0}^{K-1} q_k - Σ_{k=0}^{K-1} p_k) = 0
i.e.,
    Σ_{k=0}^{K-1} p_k log₂(q_k/p_k) ≤ 0                         ...(1.15)
where the equality holds if q_k = p_k for all k.
Suppose
    q_k = 1/K,   k = 0, 1, ..., K-1                             ...(1.16)
which corresponds to an alphabet S with equiprobable symbols. The entropy of a discrete memoryless source with such a characterisation equals
    Σ_{k=0}^{K-1} q_k log₂(1/q_k) = log₂ K                      ...(1.17)
Substituting equation (1.16) in (1.15), we get
    Σ_{k=0}^{K-1} p_k log₂(1/p_k) ≤ log₂ K                      ...(1.18)
Equivalently, the entropy of a discrete memoryless source with an arbitrary probability distribution for the symbols of its alphabet S is bounded as
    H(S) ≤ log₂ K                                               ...(1.19)
The inequality holds with equality only if the symbols of the alphabet S are equiprobable.
Example 1.7 | Entropy of a Binary Memoryless Source: Consider a binary source for which symbol 0 occurs with probability p₀ and symbol 1 with probability p₁ = 1 - p₀.
Solution:
We assume that the source is memoryless, so that successive symbols emitted by the source are statistically independent.
The entropy of such a source equals,
    H(S) = -p₀ log₂ p₀ - p₁ log₂ p₁
         = -p₀ log₂ p₀ - (1 - p₀) log₂ (1 - p₀) bits            ...(1.20)
* When p₀ = 0, the entropy H(S) = 0. This follows from the fact that x log x → 0 as x → 0.
* When p₀ = 1, the entropy H(S) = 0.
* The entropy H(S) attains its maximum value, H_max = 1 bit, when p₁ = p₀ = 1/2, i.e. symbols 1 and 0 are equally probable.
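The behaviour described in Example 1.7 is that of the binary entropy function. A minimal Python sketch (the function name is illustrative) that evaluates equation (1.20) and checks the three cases above:

```python
import math

def binary_entropy(p0: float) -> float:
    """Entropy H(S) of a binary memoryless source, equation (1.20), in bits."""
    if p0 in (0.0, 1.0):          # x*log2(x) -> 0 as x -> 0
        return 0.0
    p1 = 1.0 - p0
    return -p0 * math.log2(p0) - p1 * math.log2(p1)

print(binary_entropy(0.0))   # 0.0 (no uncertainty)
print(binary_entropy(1.0))   # 0.0 (no uncertainty)
print(binary_entropy(0.5))   # 1.0 (maximum, equally likely symbols)
```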
Extension of a Discrete Memoryless Source
Consider blocks rather than individual symbols, with each block consisting of 'n' successive source symbols.
Such blocks may be viewed as being produced by an extended source with a source alphabet Sⁿ that has Kⁿ distinct blocks, where K is the number of distinct symbols in the source alphabet S of the original source.
In the case of a discrete memoryless source, the source symbols are statistically independent.
Hence, the probability of a source symbol in Sⁿ is equal to the product of the probabilities of the source symbols in S constituting that particular source symbol in Sⁿ.
The entropy of the extended source is equal to n times H(S), the entropy of the original source,
    H(Sⁿ) = n H(S)                                              ...(1.21)
Example 1.8 | Consider a discrete memoryless source with source alphabet S = {s₀, s₁, s₂} with probabilities p₀ = 1/4, p₁ = 1/4, p₂ = 1/2.
Solution:
    p(s₀) = p₀ = 1/4
    p(s₁) = p₁ = 1/4
    p(s₂) = p₂ = 1/2
    H(S) = p₀ log₂(1/p₀) + p₁ log₂(1/p₁) + p₂ log₂(1/p₂)
         = (1/4) log₂ 4 + (1/4) log₂ 4 + (1/2) log₂ 2 = 3/2 bits
Example 1.9 | Prove that if there are 'M' number of equally likely messages, then the entropy of the source is log₂ M.
Solution:
The probability of each message is p_k = 1/M.
Entropy is given by,
    H(S) = Σ_{k} p_k log₂(1/p_k)
         = p₁ log₂(1/p₁) + p₂ log₂(1/p₂) + ... + p_M log₂(1/p_M)
         = (1/M) log₂ M + (1/M) log₂ M + ... + (1/M) log₂ M
By adding all the M terms we get,
    H(S) = M × (1/M) log₂ M = log₂ M
1.4. INFORMATION RATE
Information rate 'R' is the average number of bits of information per second,
    R = r H(S)
where r is the rate at which messages are generated and H(S) is the entropy of the source.
    R = (r in messages/second) × (H(S) in bits/message) = bits/second
Example | An analog signal is bandlimited to B Hz and sampled at the Nyquist rate. The samples are quantized into 4 levels. Each level represents one message; thus there are 4 messages. The probabilities of occurrence of these 4 levels are p₁ = p₄ = 1/8 and p₂ = p₃ = 3/8. Find the information rate of the source.
                                                               (Nov/Dec 2011)
Solution:
Given:
    p₁ = 1/8, p₂ = 3/8, p₃ = 3/8, p₄ = 1/8
Information rate R = r H(S), where r is the message rate.
We know that the signal is sampled at the Nyquist rate.
The Nyquist rate for a signal bandlimited to B Hz is,
    Nyquist rate = 2B samples/sec
Every sample generates one message, so r = 2B messages per second.
Entropy H(S):
    H(S) = p₁ log₂(1/p₁) + p₂ log₂(1/p₂) + p₃ log₂(1/p₃) + p₄ log₂(1/p₄)
         = (1/8) log₂ 8 + (3/8) log₂ (8/3) + (3/8) log₂ (8/3) + (1/8) log₂ 8
    H(S) = 1.8 bits/message
Information rate:
    R = r H(S) = 2B × 1.8
    R = 3.6 B bits/sec
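The entropy and information-rate arithmetic in this example can be reproduced in a few lines of Python; a minimal sketch, assuming B = 1000 Hz purely for concreteness:

```python
import math

def entropy(probs):
    """Entropy in bits/message of a discrete memoryless source."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

probs = [1/8, 3/8, 3/8, 1/8]   # quantizer level probabilities
B = 1000                       # assumed bandwidth in Hz (illustrative value)
r = 2 * B                      # Nyquist rate: messages per second

H = entropy(probs)             # about 1.8 bits/message
R = r * H                      # information rate in bits/second
print(f"H(S) = {H:.3f} bits/message, R = {R:.1f} bits/sec")   # ~3600 bits/sec
```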
1.5. SOURCE CODING THEOREM (Shannon’s First Theorem)
The process of efficiently representing the data generated by a discrete source is called source encoding. For the source encoder to be efficient, we require knowledge of the statistics of the source.
"If some source symbols are known to be more probable than others, then we may exploit this feature in the generation of a source code by assigning short code words to frequent source symbols and long code words to rare source symbols."
Such a code is referred to as a variable-length code.
Morse Code
The Morse code is an example of a variable-length code.
In Morse code, the letters of the alphabet and the numerals are encoded into streams of marks and spaces, denoted as dots '.' and dashes '-' respectively.
To develop an efficient source code, the encoder must satisfy two functional requirements:
* The code words produced by the encoder are in binary form.
* The source code is uniquely decodable, so that the original source sequence can be reconstructed perfectly from the encoded binary sequence.
Consider the encoding process shown below.
    s_k ──> [Source encoder] ──> b_k (binary sequence)
    Fig. Source encoding
A discrete memoryless source whose output s_k is converted by the source encoder into a stream of 0's and 1's denoted by b_k.
The source has an alphabet with K different symbols, and the kth symbol s_k occurs with probability p_k, k = 0, 1, ..., K-1.
Let the binary code word assigned to symbol s_k by the encoder have length l_k, measured in bits. The average code-word length L̄ of the source encoder is,
    L̄ = Σ_{k=0}^{K-1} p_k l_k
The parameter L̄ represents the average number of bits per source symbol used in the source encoding process.
Let L_min denote the minimum possible value of L̄. The coding efficiency of the source encoder is then,
    η = L_min / L̄
with L̄ ≥ L_min and η ≤ 1.
The source encoder is said to be efficient when η approaches unity.
Statement
Given a discrete memoryless source of entropy H(S), the average code-word length L̄ for any source encoding is bounded as
    L̄ ≥ H(S)
The entropy H(S) represents a fundamental limit on the average number of bits per source symbol necessary to represent a discrete memoryless source, in that L̄ can be made as small as, but no smaller than, the entropy H(S).
Thus, with L_min = H(S), we may rewrite the efficiency of a source encoder in terms of the entropy H(S) as
    η = H(S) / L̄
Code Redundancy
Code redundancy is the measure of redundancy of bits in the encoded message sequence.
    Redundancy γ = 1 - code efficiency = 1 - η
Code Variance
Variance is the measure of variability in code-word length.
The variance of the code is given as,
    σ² = Σ_{k=0}^{N-1} p_k (n_k - L̄)²
Here σ² is the variance of the code,
    p_k is the probability of the kth symbol,
    n_k is the number of bits assigned to the kth symbol,
    L̄ is the average code-word length,
    N is the number of symbols.
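These four quantities (average length, efficiency, redundancy and variance) are easy to compute once a code is fixed. A minimal Python sketch, assuming an illustrative set of probabilities and code-word lengths:

```python
import math

def code_statistics(probs, lengths):
    """Average length, efficiency, redundancy and variance of a source code."""
    L_bar = sum(p * n for p, n in zip(probs, lengths))          # average code-word length
    H = sum(p * math.log2(1.0 / p) for p in probs if p > 0)     # source entropy
    eta = H / L_bar                                             # coding efficiency
    gamma = 1.0 - eta                                           # redundancy
    var = sum(p * (n - L_bar) ** 2 for p, n in zip(probs, lengths))
    return L_bar, eta, gamma, var

# Illustrative probabilities and the lengths of the codewords assigned to them.
probs = [0.4, 0.2, 0.2, 0.1, 0.1]
lengths = [2, 2, 2, 3, 3]
L_bar, eta, gamma, var = code_statistics(probs, lengths)
print(f"L = {L_bar:.2f}, efficiency = {eta:.4f}, redundancy = {gamma:.4f}, variance = {var:.2f}")
```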
1.6. PREFIX CODING
Consider a discrete memoryless source with source alphabet {s₀, s₁, ..., s_{K-1}} and source statistics {p₀, p₁, ..., p_{K-1}}.
The output of the code has to be uniquely decodable.
1.6.1. PREFIX CONDITION
Let the code word assigned to source symbol s_k be denoted by {m_{k1}, m_{k2}, ..., m_{kn}}, where the individual elements m_{k1}, ..., m_{kn} are 0's and 1's and n is the code-word length.
The initial part of the code word is represented by the elements m_{k1}, ..., m_{ki} for some i ≤ n. Any sequence made up of the initial part of the code word is called a prefix of the code word.
Example | Consider the three source codes given in the table below and identify which of them are prefix codes.

    Symbol    Probability    Code I    Code II    Code III
    s₀        0.5            0         0          0
    s₁        0.25           1         10         01
    s₂        0.125          00        110        011
    s₃        0.125          11        111        0111
Solution:
From the given table:
Code I
Code I is not a prefix code, since the bit 0, the codeword for s₀, is a prefix of 00, the codeword for s₂.
Note |
A prefix code is a code in which no codeword is the prefix of any other codeword.
Code II
Code II is a prefix code because it satisfies the prefix condition.
Code III
Like Code I, Code III is not a prefix code, since 011, the codeword for s₂, is a prefix of 0111, the codeword for s₃.
Decision Tree
A decision tree is a graphical drawing of the code words in a particular source code. It is equivalent to a source decoder.
Source Decoder
The source decoder simply starts at the beginning of the sequence and decodes one code word at a time.
    Fig. 1.3. Decision tree for Code II (initial state and four terminal states s₀, s₁, s₂, s₃)
* The tree has an initial state and four terminal states corresponding to the source symbols s₀, s₁, s₂ and s₃.
* The decoder always starts in the initial state.
* The first received bit moves the decoder to the terminal state s₀ if it is 0, or else to a second decision point if it is 1.
* The second bit moves the decoder one step further down the tree, to terminal state s₁ if it is 0, or else to a third decision point if it is 1, and so on.
* Once each terminal state emits its symbol, the decoder is reset to its initial state.
For example, the encoded sequence 1011111000... is readily decoded as the source sequence s₁ s₃ s₂ s₀ s₀ ...
    10 | 111 | 110 | 0 | 0
    s₁   s₃    s₂    s₀  s₀
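The decision-tree decoding of Code II can be mimicked with a small Python sketch; the dictionary below holds the Code II codewords from the table above, and the loop walks the bit stream exactly as the decoder walks the tree.

```python
# Code II from the prefix-code example: a prefix-free code, so decoding is unambiguous.
CODE_II = {"0": "s0", "10": "s1", "110": "s2", "111": "s3"}

def decode_prefix(bits: str, codebook: dict) -> list:
    """Decode a bit string one codeword at a time (decision-tree decoding)."""
    symbols, buffer = [], ""
    for bit in bits:
        buffer += bit                     # walk one branch of the tree
        if buffer in codebook:            # reached a terminal state
            symbols.append(codebook[buffer])
            buffer = ""                   # reset to the initial state
    return symbols

print(decode_prefix("1011111000", CODE_II))   # ['s1', 's3', 's2', 's0', 's0']
```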
1.6.2. KRAFT-McMILLAN INEQUALITY
If a prefix code has been constructed for a discrete memoryless source with source alphabet {s₀, s₁, ..., s_{K-1}} and source statistics {p₀, p₁, ..., p_{K-1}}, and the code word for symbol s_k has length l_k, k = 0, 1, ..., K-1, then the code-word lengths of the code satisfy a certain inequality known as the Kraft-McMillan inequality,
    Σ_{k=0}^{K-1} 2^{-l_k} ≤ 1
where the factor r = 2 refers to the radix (number of symbols) in the binary alphabet.
"If the code-word lengths of a code for a discrete memoryless source satisfy the Kraft-McMillan inequality, then a prefix code with these code-word lengths can be constructed."
Proof
Consider the quantity
    (Σ_{k=0}^{K-1} r^{-l_k})^n
where n is a positive integer and r = 2.
By expanding the right-hand side of the above expression, we obtain a sum of terms of the form r^{-i}.
Let the smallest of the code-word lengths l_k be unity and the largest be l. Then the integer i may take on values extending from n to nl.
Let N_i denote the number of terms of the form r^{-i}; we may then rewrite the expression as
    (Σ_{k=0}^{K-1} r^{-l_k})^n = Σ_{i=n}^{nl} N_i r^{-i}
N_i is also the number of code sequences of length i. Hence, if the code is uniquely decodable, N_i cannot be greater than r^i, i.e. N_i ≤ r^i.
Accordingly, we have
    (Σ_{k=0}^{K-1} r^{-l_k})^n ≤ Σ_{i=n}^{nl} r^i r^{-i} = nl - n + 1 ≤ nl   for all n
Taking the nth root of both sides of the inequality, we get
    Σ_{k=0}^{K-1} r^{-l_k} ≤ (nl)^{1/n}   for all n
Letting n → ∞, (nl)^{1/n} → 1, and we get the inequality given above,
    Σ_{k=0}^{K-1} r^{-l_k} ≤ 1
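The Kraft-McMillan inequality is easy to check numerically for a given set of code-word lengths. A minimal Python sketch using the code-word lengths of Code I, Code II and Code III from the earlier prefix-code example:

```python
def kraft_sum(lengths, r: int = 2) -> float:
    """Left-hand side of the Kraft-McMillan inequality for radix r."""
    return sum(r ** (-l) for l in lengths)

# Code-word lengths from the earlier example.
for name, lengths in [("Code I", [1, 1, 2, 2]),
                      ("Code II", [1, 2, 3, 3]),
                      ("Code III", [1, 2, 3, 4])]:
    s = kraft_sum(lengths)
    print(f"{name}: sum = {s:.4f} -> {'satisfies' if s <= 1 else 'violates'} the inequality")

# Note: Code III satisfies the inequality yet is not itself a prefix code;
# the inequality only guarantees that SOME prefix code with those lengths exists.
```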
The average code-word length of a prefix code is bounded as follows,
    H(S) ≤ L̄ < H(S) + 1
1.7. SHANNON-FANO CODING
Example | A discrete memoryless source has six symbols A, B, C, D, E, F with probabilities 0.23, 0.20, 0.11, 0.09, 0.15 and 0.22 respectively. Find the entropy of the source and construct a Shannon-Fano code for it.
Solution:
1. Entropy
    H(S) = Σ p_k log₂(1/p_k)
         = 0.23 log₂(1/0.23) + 0.20 log₂(1/0.20) + 0.11 log₂(1/0.11)
           + 0.09 log₂(1/0.09) + 0.15 log₂(1/0.15) + 0.22 log₂(1/0.22)
         ≈ 2.51 bits/symbol
2. Shannon-Fano Code
The symbols are listed in order of decreasing probability and repeatedly partitioned into nearly equiprobable sets, assigning 0 to the upper set and 1 to the lower set.

    Symbol    Probability    Code word    Length
    A         0.23           00           2
    F         0.22           01           2
    B         0.20           10           2
    E         0.15           110          3
    C         0.11           1110         4
    D         0.09           1111         4

Average Codeword Length
    L̄ = Σ p_k l_k
      = 0.23(2) + 0.22(2) + 0.20(2) + 0.15(3) + 0.11(4) + 0.09(4)
      = 2.55 bits/symbol
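The partitioning procedure can be sketched in Python. The recursive function below is an illustration of the method, not the textbook's program: it sorts the symbols by probability and splits each group at the point that best balances the two halves, appending 0 to the upper half and 1 to the lower half. With the probabilities of this example it reproduces the code words in the table above.

```python
def shannon_fano(symbols):
    """symbols: list of (name, probability). Returns dict name -> codeword."""
    codes = {name: "" for name, _ in symbols}

    def split(group):
        if len(group) <= 1:
            return
        total = sum(p for _, p in group)
        running, best_i, best_diff = 0.0, 1, float("inf")
        for i in range(1, len(group)):          # find the most balanced split point
            running += group[i - 1][1]
            diff = abs(2 * running - total)
            if diff < best_diff:
                best_diff, best_i = diff, i
        upper, lower = group[:best_i], group[best_i:]
        for name, _ in upper:
            codes[name] += "0"
        for name, _ in lower:
            codes[name] += "1"
        split(upper)
        split(lower)

    split(sorted(symbols, key=lambda s: s[1], reverse=True))
    return codes

src = [("A", 0.23), ("B", 0.20), ("C", 0.11), ("D", 0.09), ("E", 0.15), ("F", 0.22)]
print(shannon_fano(src))
# {'A': '00', 'B': '10', 'C': '1110', 'D': '1111', 'E': '110', 'F': '01'}
```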
1.8. HUFFMAN CODING
The Huffman code is a source code whose average word length approaches the fundamental limit set by the entropy of a discrete memoryless source.
The algorithm used to synthesize the code replaces the prescribed set of source statistics of a discrete memoryless source with a simpler one.
This reduction process is continued in a step-by-step manner until we are left with a final set of source statistics of only two symbols, for which (0, 1) is an optimal code.
Algorithm
1. The source symbols are listed in order of decreasing probability.
2. The two source symbols of lowest probability are assigned a '0' and a '1'.
3. These two source symbols are regarded as being combined into a new source symbol with probability equal to the sum of the two original probabilities.
4. The list of source symbols, and therefore source statistics, is thereby reduced in size by one. The probability of the new symbol is placed in the list in accordance with its value.
5. The procedure is repeated until we are left with a final list of source statistics (symbols) of only two, for which a 0 and a 1 are assigned.
6. The code for each source symbol is found by working backward and tracing the sequence of 0's and 1's assigned to that symbol as well as its successors.
A runnable sketch of this procedure is given below.
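A minimal Python sketch of the Huffman procedure using the standard-library heap (heapq). The symbol names and probabilities below are the ones used in Example 1.16 that follows; this is only an illustration of the algorithm above, not the textbook's own program, and its tie-breaking may differ from the hand-worked tables (the average length is the same).

```python
import heapq

def huffman(probs):
    """probs: dict symbol -> probability. Returns dict symbol -> codeword."""
    # Each heap entry: (probability, tie-breaker, {symbol: partial codeword}).
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p0, _, codes0 = heapq.heappop(heap)   # two lowest-probability entries
        p1, _, codes1 = heapq.heappop(heap)
        for sym in codes0:                    # prepend a bit as we combine
            codes0[sym] = "0" + codes0[sym]
        for sym in codes1:
            codes1[sym] = "1" + codes1[sym]
        counter += 1
        heapq.heappush(heap, (p0 + p1, counter, {**codes0, **codes1}))
    return heap[0][2]

probs = {"s0": 0.4, "s1": 0.2, "s2": 0.2, "s3": 0.1, "s4": 0.1}
code = huffman(probs)
L_bar = sum(probs[s] * len(c) for s, c in code.items())
print(code)
print(f"average codeword length = {L_bar:.2f} bits/symbol")   # 2.20 bits/symbol
```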
Example 1.16 | The five source symbols of the alphabet of a discrete memoryless source have probabilities s₀ = 0.4, s₁ = 0.2, s₂ = 0.2, s₃ = 0.1, s₄ = 0.1. Find the average codeword length and entropy using the Huffman algorithm.
Solution:
Source symbols are listed in order of decreasing probabilities.
Huffman Coding Table (Stage I to Stage IV): the probabilities are combined step by step as
    {0.4, 0.2, 0.2, 0.1, 0.1} → {0.4, 0.2, 0.2, 0.2} → {0.4, 0.4, 0.2} → {0.6, 0.4}
Encoding Table

    Symbol    Probability    Codeword (traced from Stage IV to Stage I)    Number of digits
    s₀        p₀ = 0.4       1                                             1
    s₁        p₁ = 0.2       01                                            2
    s₂        p₂ = 0.2       000                                           3
    s₃        p₃ = 0.1       0010                                          4
    s₄        p₄ = 0.1       0011                                          4

Average Codeword Length
    L̄ = Σ (probability of symbol) × (number of digits in codeword)
      = 0.4(1) + 0.2(2) + 0.2(3) + 0.1(4) + 0.1(4)
    L̄ = 2.2 binary digits/symbol
Entropy
    H(S) = p₀ log₂(1/p₀) + p₁ log₂(1/p₁) + p₂ log₂(1/p₂) + p₃ log₂(1/p₃) + p₄ log₂(1/p₄)
         = 0.4 log₂(1/0.4) + 0.2 log₂(1/0.2) + 0.2 log₂(1/0.2) + 0.1 log₂(1/0.1) + 0.1 log₂(1/0.1)
         = 0.52877 + 0.46439 + 0.46439 + 0.33219 + 0.33219
    H(S) = 2.122 bits/symbol
Example | A DMS has five symbols s₀, s₁, s₂, s₃, s₄ characterised by the probability distribution 0.4, 0.2, 0.1, 0.2 and 0.1 respectively.
Evaluate two distinct variable-length Huffman codes for the source to illustrate the non-uniqueness of the Huffman technique. Calculate the code efficiency and the variance of the ensemble as defined by
    σ² = Σ_{k=0}^{N-1} p_k (n_k - L̄)²
where p_k and n_k are the probability and the length of the codeword respectively for symbol s_k, and L̄ is the average length. Conclude on the result.
                                                               (Nov/Dec 2009, Apr/May 2011)
Solution:
Here the Huffman codes are evaluated in two distinct ways, illustrating the non-uniqueness of the Huffman technique:
1. Placing the combined probability as high as possible.
2. Placing the combined probability as low as possible.
Entropy
The entropy of the source is the same for both codes.
    H(S) = Σ_{k=0}^{4} p_k log₂(1/p_k)
         = 0.4 log₂(1/0.4) + 0.2 log₂(1/0.2) + 0.1 log₂(1/0.1) + 0.2 log₂(1/0.2) + 0.1 log₂(1/0.1)
    H(S) = 2.122 bits/symbol
1. Placing the Combined Symbol as High as Possible
Huffman Coding Table
Source symbols are listed in order of decreasing probabilities; at each stage the two lowest probabilities are combined and the new probability is placed as high as possible in the list:
    {0.4, 0.2, 0.2, 0.1, 0.1} → {0.4, 0.2, 0.2, 0.2} → {0.4, 0.4, 0.2} → {0.6, 0.4}

    Symbol    Probability    Codeword obtained by tracing    Number of bits
    s₀        0.4            00                              2
    s₁        0.2            10                              2
    s₂        0.1            010                             3
    s₃        0.2            11                              2
    s₄        0.1            011                             3
Average Codeword Length
    L̄ = Σ_{k=0}^{4} p_k n_k = 0.4(2) + 0.2(2) + 0.1(3) + 0.2(2) + 0.1(3) = 2.2 bits/symbol
Code Variance
    σ² = Σ_{k=0}^{4} p_k (n_k - L̄)²
       = 0.4(2 - 2.2)² + 0.2(2 - 2.2)² + 0.1(3 - 2.2)² + 0.2(2 - 2.2)² + 0.1(3 - 2.2)²
    σ² = 0.16
Code Efficiency
    η = H(S)/L̄ = 2.122/2.2 = 0.9645, i.e. η % = 96.45%
2. Placing the Combined Symbol as Low as Possible
Huffman Coding Table (Fig. 1.6): at each stage the two lowest probabilities are combined and the new probability is placed as low as possible in the list:
    {0.4, 0.2, 0.2, 0.1, 0.1} → {0.4, 0.2, 0.2, 0.2} → {0.4, 0.4, 0.2} → {0.6, 0.4}
    Symbol    Probability    Codeword obtained by tracing    Number of bits
    s₀        0.4            1                               1
    s₁        0.2            01                              2
    s₃        0.2            000                             3
    s₂        0.1            0010                            4
    s₄        0.1            0011                            4
Average Codeword Length
    L̄ = 0.4(1) + 0.2(2) + 0.2(3) + 0.1(4) + 0.1(4)
    L̄ = 2.2 bits/symbol
Code Variance
    σ² = Σ_{k=0}^{4} p_k (n_k - L̄)²
       = 0.4(1 - 2.2)² + 0.2(2 - 2.2)² + 0.2(3 - 2.2)² + 0.1(4 - 2.2)² + 0.1(4 - 2.2)²
    σ² = 1.36
Code Efficiency
    η = H(S)/L̄ = 2.122/2.2 = 0.9645 = 96.45%
Conclusion: both codes have the same average length and efficiency, but the variance is much smaller (0.16 against 1.36) when the combined symbol is placed as high as possible, so that procedure is preferred.
Example | A DMS has seven symbols s₁, s₂, ..., s₇ with probabilities of occurrence as shown below.

    Symbol         s₁      s₂       s₃      s₄      s₅      s₆      s₇
    Probability    0.25    0.0625   0.25    0.125   0.125   0.125   0.0625

Generate the Huffman code with minimum code variance. Determine the code efficiency and the variance of the code, and comment on the code efficiency.
                                                               (Apr/May 2004)
Solution:
Source symbols are listed in the order of decreasing probabilities.
Huffman Coding Table: the probabilities are combined step by step, placing the combined symbol as high as possible:
    {0.25, 0.25, 0.125, 0.125, 0.125, 0.0625, 0.0625}
  → {0.25, 0.25, 0.125, 0.125, 0.125, 0.125}
  → {0.25, 0.25, 0.25, 0.125, 0.125}
  → {0.25, 0.25, 0.25, 0.25}
  → {0.5, 0.25, 0.25}
  → {0.5, 0.5}
    Symbol    Probability    Codeword obtained by tracing    Number of bits
    s₁        0.25           10                              2
    s₃        0.25           11                              2
    s₄        0.125          001                             3
    s₅        0.125          010                             3
    s₆        0.125          011                             3
    s₂        0.0625         0000                            4
    s₇        0.0625         0001                            4
Average Codeword Length
    L̄ = Σ p_k n_k
      = 0.25(2) + 0.25(2) + 0.125(3) + 0.125(3) + 0.125(3) + 0.0625(4) + 0.0625(4)
    L̄ = 2.625 bits/symbol
Entropy
    H(S) = Σ_{k=1}^{7} p_k log₂(1/p_k)
         = 0.25 log₂(1/0.25) + 0.25 log₂(1/0.25) + 0.125 log₂(1/0.125) + 0.125 log₂(1/0.125)
           + 0.125 log₂(1/0.125) + 0.0625 log₂(1/0.0625) + 0.0625 log₂(1/0.0625)
    H(S) = 2.625 bits/symbol
Code Efficiency
    η = H(S)/L̄ = 2.625/2.625 = 1, i.e. η % = 100%
Code Variance
    σ² = Σ_{k=1}^{7} p_k (n_k - L̄)²
       = 0.25(2 - 2.625)² + 0.25(2 - 2.625)² + 0.125(3 - 2.625)² + 0.125(3 - 2.625)²
         + 0.125(3 - 2.625)² + 0.0625(4 - 2.625)² + 0.0625(4 - 2.625)²
    σ² = 0.4844
Since all the probabilities are negative integer powers of 2, the Huffman code achieves 100% efficiency (L̄ = H(S)).
Example | A discrete memoryless source has an alphabet of five symbols with their probabilities for its output as given here,
    X = [X₁, X₂, X₃, X₄, X₅]
    P(X) = [0.45, 0.15, 0.15, 0.1, 0.15]
Compute two different Huffman codes for this source. For these two codes find
    (i) the average codeword length,
    (ii) the variance of the codeword length over the ensemble of source symbols.
                                                               (Apr/May 2005)
Solution:
Placing the combined symbol as high as possible and as low as possible both produce the same result here.
Huffman Coding Table: the probabilities are combined step by step as
    {0.45, 0.15, 0.15, 0.15, 0.1} → {0.45, 0.25, 0.15, 0.15} → {0.45, 0.30, 0.25} → {0.55, 0.45}
    Symbol    Probability    Codeword obtained by tracing    Number of bits
    X₁        0.45           1                               1
    X₂        0.15           000                             3
    X₃        0.15           001                             3
    X₄        0.1            010                             3
    X₅        0.15           011                             3
Average Codeword Length
    L̄ = Σ_{k=1}^{5} p_k n_k = 0.45(1) + 0.15(3) + 0.15(3) + 0.1(3) + 0.15(3) = 2.1 bits/symbol
Code Variance
    σ² = Σ_{k=1}^{5} p_k (n_k - L̄)²
       = 0.45(1 - 2.1)² + 0.15(3 - 2.1)² + 0.15(3 - 2.1)² + 0.1(3 - 2.1)² + 0.15(3 - 2.1)²
    σ² = 0.99 ≈ 1
Example | Draw a Huffman code tree for the following input sequence: "AAA BB CCC DA".
                                                               (Apr/May 2014)
Solution:
The given sequence is AAA BB CCC DA.
There are 10 symbols in total. Here, A occurs 4 times out of 10,
    p(A) = 4/10 = 0.4
Similarly, B occurs 2 times out of 10,
    p(B) = 2/10 = 0.2
C occurs 3 times out of 10,
    p(C) = 3/10 = 0.3
D occurs 1 time out of 10,
    p(D) = 1/10 = 0.1

    Symbol         A      B      C      D
    Probability    0.4    0.2    0.3    0.1

Huffman Coding Table
Source symbols are listed in order of decreasing probability; the probabilities are combined step by step as
    {0.4, 0.3, 0.2, 0.1} → {0.4, 0.3, 0.3} → {0.6, 0.4}
Using this table we can draw the Huffman code tree.
The resulting code is D = 001, C = 01, B = 000, A = 1.
Drawbacks of Huffman Coding
* It requires knowledge of the probabilistic model of the source, but knowing the source statistics in advance is not possible at all times.
* When modeling text, the storage requirement prevents the Huffman code from capturing the higher-order relationships between words.
* The coding algorithm does not use fixed-length codes, therefore it is not suitable for synchronous transmission.
* When applied to ordinary English text, it achieves a compaction of only about 43%.
1.9. DISCRETE MEMORYLESS CHANNELS
A communication channel may be defined as the path or medium through which symbols flow to the receiver end. A Discrete Memoryless Channel (DMC) is a statistical model with an input X and an output Y, as shown in Figure 1.7. Both X and Y are random variables.
    X ──> [ p(y_k | x_j) ] ──> Y
    Fig. 1.7. Discrete memoryless channel
During each unit of time (the signaling interval), the channel accepts an input symbol from X, and in response it emits an output symbol from Y. The channel is said to be "discrete" when both alphabets X and Y have finite sizes. It is said to be "memoryless" when the current output symbol depends only on the current input symbol and not on any of the previous ones.
The channel is described in terms of an input alphabet
    X = {x₀, x₁, ..., x_{J-1}}                                  ...(1)
an output alphabet,
    Y = {y₀, y₁, ..., y_{K-1}}                                  ...(2)
and a set of transition probabilities
    p(y_k | x_j) = P(Y = y_k | X = x_j)   for all j and k       ...(3)
We have
    0 ≤ p(y_k | x_j) ≤ 1   for all j and k
and the probability of the output symbol y_k is
    p(y_k) = Σ_{j=0}^{J-1} p(y_k | x_j) p(x_j)
    H(X | Y = y_k) = Σ_{j=0}^{J-1} p(x_j | y_k) log₂ [1 / p(x_j | y_k)]
This quantity is itself a random variable that takes on the values H(X | Y = y₀), ..., H(X | Y = y_{K-1}) with probabilities p(y₀), ..., p(y_{K-1}) respectively.
The mean value of H(X | Y = y_k) over the output alphabet Y is
    H(X | Y) = Σ_{k=0}^{K-1} H(X | Y = y_k) p(y_k)
             = Σ_{k=0}^{K-1} Σ_{j=0}^{J-1} p(x_j | y_k) p(y_k) log₂ [1 / p(x_j | y_k)]
             = Σ_{k=0}^{K-1} Σ_{j=0}^{J-1} p(x_j, y_k) log₂ [1 / p(x_j | y_k)]
The quantity H(X|Y) is called a conditional entropy. It represents the amount of uncertainty remaining about the channel input after the channel output has been observed.
Now, considering both the uncertainty conditions (before and after applying the inputs), we see that the difference, i.e. H(X) - H(X|Y), must represent the uncertainty about the channel input that is resolved by observing the channel output. This is called the Mutual Information of the channel.
Denoting the mutual information as I(X; Y), we can write the whole thing in an equation as follows,
    I(X; Y) = H(X) - H(X | Y)
Hence, this is the equational representation of mutual information.
Properties of Mutual Information
The mutual information I(X; Y) has the following important properties.
Property 1
The mutual information of a channel is symmetric, i.e.
    I(X; Y) = I(Y; X)
Here the mutual information I(X; Y) is a measure of the uncertainty about the channel input that is resolved by observing the channel output, and the mutual information I(Y; X) is a measure of the uncertainty about the channel output that is resolved by sending the channel input.
Proof:
    H(X) = Σ_{j=0}^{J-1} p(x_j) log₂ [1 / p(x_j)]
         = Σ_{j=0}^{J-1} p(x_j) log₂ [1 / p(x_j)] Σ_{k=0}^{K-1} p(y_k | x_j)
         = Σ_{j=0}^{J-1} Σ_{k=0}^{K-1} p(y_k | x_j) p(x_j) log₂ [1 / p(x_j)]
         = Σ_{j=0}^{J-1} Σ_{k=0}^{K-1} p(x_j, y_k) log₂ [1 / p(x_j)]
We know that
    H(X | Y) = Σ_{j=0}^{J-1} Σ_{k=0}^{K-1} p(x_j, y_k) log₂ [1 / p(x_j | y_k)]
and
    I(X; Y) = H(X) - H(X | Y)
Substituting H(X) and H(X | Y) and then combining terms, we get
    I(X; Y) = Σ_{j=0}^{J-1} Σ_{k=0}^{K-1} p(x_j, y_k) log₂ [p(x_j | y_k) / p(x_j)]        ...(7)
From Bayes' rule for conditional probabilities, we have
    p(x_j | y_k) / p(x_j) = p(y_k | x_j) / p(y_k)                                          ...(8)
Hence, substituting equation (8) into equation (7) and interchanging the order of summation, we may write
    I(X; Y) = Σ_{k=0}^{K-1} Σ_{j=0}^{J-1} p(x_j, y_k) log₂ [p(y_k | x_j) / p(y_k)] = I(Y; X)   ...(9)
Hence the proof.
Property 2
The mutual information is always non-negative,
    I(X; Y) ≥ 0                                                 ...(10)
Proof
To prove this property, we first note that
    p(x_j | y_k) = p(x_j, y_k) / p(y_k)                         ...(11)
Hence, substituting equation (11) in equation (7), we may express the mutual information of the channel as
    I(X; Y) = Σ_{j=0}^{J-1} Σ_{k=0}^{K-1} p(x_j, y_k) log₂ [p(x_j, y_k) / (p(x_j) p(y_k))]   ...(12)
Next, a direct application of the fundamental inequality yields the desired result,
    I(X; Y) ≥ 0
with equality if and only if
    p(x_j, y_k) = p(x_j) p(y_k)   for all j and k               ...(13)
This property states that we cannot lose information, on the average, by observing the output of a channel. Moreover, the average mutual information is zero if, and only if, the input and output symbols of the channel are statistically independent.
Property 3
The mutual information of a channel may be expressed in terms of the entropy of the channel output as
    I(X; Y) = H(Y) - H(Y | X)                                   ...(14)
where H(Y | X) is a conditional entropy.
Proof:
This property follows directly from the defining equation for the mutual information of a channel and Property 1.
Property 4
The mutual information of a channel is related to the joint entropy of the channel input and channel output by
    I(X; Y) = H(X) + H(Y) - H(X, Y)                             ...(15)
where the joint entropy H(X, Y) is defined by
    H(X, Y) = Σ_{j=0}^{J-1} Σ_{k=0}^{K-1} p(x_j, y_k) log₂ [1 / p(x_j, y_k)]   ...(16)
Proof
To prove this property, we first rewrite the definition of the joint entropy H(X, Y) as
    H(X, Y) = Σ_{j=0}^{J-1} Σ_{k=0}^{K-1} p(x_j, y_k) log₂ [p(x_j) p(y_k) / p(x_j, y_k)]
              + Σ_{j=0}^{J-1} Σ_{k=0}^{K-1} p(x_j, y_k) log₂ [1 / (p(x_j) p(y_k))]       ...(17)
The first double summation term on the right-hand side of equation (17) is recognized as the negative of the mutual information of the channel, I(X; Y), previously given in equation (12). As for the second summation term, we manipulate it as follows,
    Σ_{j=0}^{J-1} Σ_{k=0}^{K-1} p(x_j, y_k) log₂ [1 / (p(x_j) p(y_k))]
        = Σ_{j=0}^{J-1} log₂ [1 / p(x_j)] Σ_{k=0}^{K-1} p(x_j, y_k) + Σ_{k=0}^{K-1} log₂ [1 / p(y_k)] Σ_{j=0}^{J-1} p(x_j, y_k)
        = Σ_{j=0}^{J-1} p(x_j) log₂ [1 / p(x_j)] + Σ_{k=0}^{K-1} p(y_k) log₂ [1 / p(y_k)]
        = H(X) + H(Y)                                           ...(18)
Accordingly, using equations (12) and (18) in equation (17), we get the result
    H(X, Y) = -I(X; Y) + H(X) + H(Y)
By rearranging this equation, we get
    I(X; Y) = H(X) + H(Y) - H(X, Y)
Hence, the proof.
The mutual information of a channel can be visualized using the following diagram.
    Fig. 1.8. Illustrating the relations among various channel parameters
The entropy of the channel input X is represented by the circle on the left. The entropy of the channel output Y is represented by the circle on the right. The mutual information of the channel is represented by the overlap between these two circles.
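The relations I(X;Y) = H(X) + H(Y) - H(X,Y) = H(X) - H(X|Y) can be verified numerically from a joint probability table. A minimal Python sketch (the joint distribution below is an illustrative example, not taken from the text):

```python
import math

def entropy(probs):
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

def mutual_information(joint):
    """joint: 2-D list of p(x_j, y_k). Returns I(X;Y) = H(X) + H(Y) - H(X,Y) in bits."""
    px = [sum(row) for row in joint]                      # marginal of X
    py = [sum(col) for col in zip(*joint)]                # marginal of Y
    hxy = entropy([p for row in joint for p in row])      # joint entropy H(X,Y)
    return entropy(px) + entropy(py) - hxy

# Illustrative joint distribution of a binary-input, binary-output channel.
joint = [[0.45, 0.05],
         [0.05, 0.45]]
print(f"I(X;Y) = {mutual_information(joint):.4f} bits")   # about 0.531 bits
```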
information by the channelWe have so far discussed mutual information, The maximum averag
IE muy
ation in an instant of a signaling interval, when transmitted by a
Y a diser,
inform,
bilities of the rate of maximum reliable transmy
memoryless,channel, the probal
of data, can be understood as the channel capacity. It is denoted by C any
i
measured in bits per channel use.
Consider a discrete memoryless channel with input alphabet X, output alphet
Y and transition probabilities p(s |%))-
‘The mutual information of the channel is defined as
Jer Ke POs)
WX;¥) = Z pa y0 | DO)
yao ke0 PV
Pty Ii) = PO lpPGp
Jel
pO) = PU lx) Pe)
j=0
| information of a chante
e three equations we see that, the mutual
which the channel is uset
ly on the channel, but also on the way in
)} is independent of the channel. We c=
From thest
depends not onl
‘The input probal
then maximize the average ™
to {p(x}.
Hence, we define the channel capacity of a discrete memoryless channel as the maximum average mutual information I(X; Y) in any single use of the channel, where the maximization is over all possible input probability distributions {p(x_j)} on X.
The channel capacity is commonly denoted by C; thus
    C = max_{p(x_j)} I(X; Y)
1.12. CHANNEL CODING THEOREM OR SHANNON'S SECOND THEOREM
The noise present in a channel creates unwanted errors between the input and output sequences of a digital communication system. The error probability should be very low, nearly ≤ 10⁻⁶, for reliable communication. In order to achieve such a high level of performance, we may have to resort to the use of channel coding.
Channel coding in a communication system introduces redundancy with control, so as to improve the reliability of the system. Source coding, in contrast, reduces redundancy to improve the efficiency of the system.
Channel coding consists of two parts of action:
1. Mapping the incoming data sequence into a channel input sequence.
2. Inverse mapping the channel output sequence into an output data sequence.
The final target is that the overall effect of the channel noise should be minimized.
The mapping is done in the transmitter with the help of an encoder, whereas the inverse mapping is done by the decoder in the receiver.
    Source → Channel encoder → Discrete memoryless channel → Channel decoder → Destination
               (Transmitter)            (Noise)                 (Receiver)
    Fig. 1.9. Block diagram of a digital communication system
Consider block codes, in which the message sequence is divided into sequential blocks of k bits long, and each k-bit block is mapped into an n-bit block, where n ≥ k.
The number of redundant bits added by the encoder to each transmitted block is n - k bits. The ratio k/n is called the code rate.
Fig, 19. Block diagram of digital communication systemDigital Communicge
i
k
Code rate, r= “>, always <1
_ No. of message bits in a block
~ No. of bits in code word
The accurate reconstruction of the original source sequence at the destination requires the average probability of symbol error to be arbitrarily low.
* Consider a discrete memoryless source with an alphabet S and entropy H(S) bits per source symbol, the source emitting symbols once every T_s seconds.
  Average information rate of the source = H(S)/T_s bits/sec.
  The decoder delivers decoded symbols to the destination from the source alphabet S at the same rate of one symbol every T_s seconds.
* The discrete memoryless channel has a channel capacity equal to C bits per use of the channel and is used once every T_c seconds.
  Channel capacity per unit time = C/T_c bits/sec.
  This represents the maximum rate of information transfer over the channel.
The channel coding theorem for a discrete memoryless channel is stated in two parts as follows.
(i) If
        H(S)/T_s ≤ C/T_c
    there exists a coding scheme for which the source output can be transmitted over the channel and reconstructed with an arbitrarily small probability of error.
(ii) If
        H(S)/T_s > C/T_c
    it is not possible to transmit information over the channel and reconstruct it with an arbitrarily small probability of error.
Hence, the maximum rate of transmission for reliable, error-free messages over a discrete memoryless channel is equal to the critical rate given by the channel capacity. This is called the "Channel Coding Theorem".
It is important to note the following points:
* It does not show us how to construct a good code.
* It does not give a precise result for the probability of symbol error after decoding the channel output.
Binary Symmetric Channel
A binary symmetric channel is a common communications channel model used in coding theory and information theory. In this model, a transmitter wishes to send a bit (a zero or a one) and the receiver receives a bit.
It is assumed that the bit is usually transmitted correctly, but that it will be "flipped" with a small probability. This channel is used frequently in information theory because it is one of the simplest channels to analyse.
    Fig. 1.10. Binary symmetric channel
A binary symmetric channel with crossover probability p is a channel with binary input and binary output and probability of error p; that is, if X is the transmitted random variable and Y is the received random variable, then the channel is characterized by the conditional probabilities,
    P(Y = 0 | X = 0) = 1 - p
    P(Y = 0 | X = 1) = p
    P(Y = 1 | X = 0) = p
    P(Y = 1 | X = 1) = 1 - p
It is assumed that 0 ≤ p ≤ 1/2. If p > 1/2, then the receiver can swap the output (interpret 1 when it sees 0 and vice versa) and obtain an equivalent channel with crossover probability 1 - p ≤ 1/2.
This channel is often used by theorists because it is one of the simplest noisy channels to analyse. Many problems in communication theory can be reduced to a binary symmetric channel.
Conversely, being able to transmit effectively over the binary symmetric channel can give rise to solutions for more complicated channels.
Channel Capacity for the Binary Symmetric Channel
By symmetry, the capacity of the binary symmetric channel is achieved with channel input probabilities p(x₀) = p(x₁) = 1/2, where x₀ and x₁ are each 0 or 1. Hence
    C = I(X; Y) with p(x₀) = p(x₁) = 1/2
From the binary symmetric channel diagram we have
    p(y₀ | x₁) = p(y₁ | x₀) = p
and
    p(y₀ | x₀) = p(y₁ | x₁) = 1 - p
Therefore, substituting these channel transition probabilities into the channel capacity equation with J = K = 2, and then setting the input probabilities p(x_j) = 1/2, in accordance with
    H(p₀) = -p₀ log₂ p₀ - (1 - p₀) log₂ (1 - p₀)                ...(4)
we find that the capacity of the binary symmetric channel is
    C = 1 + p log₂ p + (1 - p) log₂ (1 - p)                     ...(5)
Based on equation (4) we may define the entropy function
    H(p) = p log₂(1/p) + (1 - p) log₂(1/(1 - p))                ...(6)
Hence, we may rewrite equation (5) as
    C = 1 - H(p)                                                ...(7)
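Equation (7) is simple to evaluate numerically. A minimal Python sketch of C = 1 - H(p) for a few crossover probabilities, confirming the limiting cases discussed next:

```python
import math

def bsc_capacity(p: float) -> float:
    """Capacity C = 1 - H(p) of a binary symmetric channel, in bits per channel use."""
    if p in (0.0, 1.0):
        return 1.0                                  # noise-free (or deterministic inversion)
    H = p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))
    return 1.0 - H

for p in (0.0, 0.1, 0.25, 0.5):
    print(f"p = {p:<5}  C = {bsc_capacity(p):.4f} bits/channel use")
# p = 0.0    C = 1.0000  (noise-free channel)
# p = 0.1    C = 0.5310
# p = 0.25   C = 0.1887
# p = 0.5    C = 0.0000  (channel conveys no information)
```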
The channel capacity C varies with the transition probability p as shown in the following figure.
    Fig. 1.11. Variation of the channel capacity of a binary symmetric channel with transition probability p
1. When the channel is noise-free, permitting us to set p = 0, the channel capacity C attains its maximum value of one bit per channel use, which is exactly the information in each channel input. At this value of p, the entropy function H(p) attains its minimum value of zero.
2. When the channel is noisy, producing a conditional probability of error p = 1/2, the channel capacity C attains its minimum value of zero, whereas the entropy function H(p) attains its maximum value of unity.
Application of the Channel Coding Theorem to Binary Symmetric Channels
Consider a discrete memoryless source that emits equally likely binary symbols once every T_s seconds, with the source entropy equal to one bit per source symbol. The information rate of the source is (1/T_s) bits per second.
The source sequence is applied to a channel encoder with code rate r. The encoder produces a symbol once every T_c seconds. Hence, the channel capacity per unit time is (C/T_c) bits per second, where C is determined by the prescribed channel transition probability p in accordance with
    H(p) = p log₂(1/p) + (1 - p) log₂(1/(1 - p))
    C = 1 - H(p)
Accordingly, the channel coding theorem implies that if
    1/T_s ≤ C/T_c                                               ...(9)
the probability of error can be made arbitrarily low by the use of a suitable encoding scheme. But the ratio T_c/T_s equals the code rate of the encoder,
    r = T_c/T_s                                                 ...(10)
so the condition becomes r ≤ C.
Hence, from the channel coding theorem, we may state that for a binary symmetric channel with transition probability p = 10⁻² (for which C = 0.9192), for any ε > 0 and code rate r ≤ 0.9192, there exists a code of large enough length n and rate r, and an appropriate decoding algorithm, such that when the code is used on the given channel, the average probability of decoding error is less than ε.
    Fig. 1.12. Illustrating the significance of the channel coding theorem: average probability of error of repetition codes plotted against code rate, relative to the channel capacity boundary.
Example 2:
Consider a simple coding scheme that involves the use of a repetition code, in which each bit of the message is repeated several times.
Let each bit (0 or 1) be repeated n times, where n = 2m + 1 is an odd integer.
For example, for n = 3, we transmit 0 and 1 as 000 and 111, respectively.
If, in a block of received bits, the number of 0's exceeds the number of 1's, the decoder decides in favor of a 0; otherwise it decides in favor of a 1.
Hence, an error occurs when m + 1 or more bits out of n = 2m + 1 bits are received incorrectly. Because of the assumed symmetric nature of the channel, the average probability of error, P_e, is independent of the a priori probabilities of 0 and 1.
We find P_e as
    P_e = Σ_{i=m+1}^{n} (n choose i) p^i (1 - p)^{n-i}
where p is the transition probability of the channel.
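These error probabilities follow directly from the binomial sum above. A minimal Python sketch, assuming a channel transition probability p = 10⁻² (the value consistent with the r = 1 entry of Table 1.1 below):

```python
from math import comb

def repetition_error_prob(n: int, p: float) -> float:
    """P_e for an n-bit repetition code (n = 2m+1) over a BSC with transition probability p."""
    m = (n - 1) // 2
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(m + 1, n + 1))

p = 1e-2
for n in (1, 3, 5, 7, 9, 11):
    print(f"r = 1/{n:<2}  P_e = {repetition_error_prob(n, p):.2e}")
# r = 1/1   P_e = 1.00e-02
# r = 1/3   P_e = 2.98e-04
# r = 1/5   P_e = 9.85e-06
# r = 1/7   P_e = 3.42e-07
```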
Table 1.1. Average probability of error of the repetition code (for transition probability p = 10⁻²)

    Code rate, r = 1/n    Average probability of error, P_e
    1                     10⁻²
    1/3                   3 × 10⁻⁴
    1/5                   10⁻⁵
    1/7                   4 × 10⁻⁷
    1/9                   10⁻⁸
    1/11                  5 × 10⁻¹⁰
1.13. CHANNEL CAPACITY THEOREM
Consider a zero-mean stationary process X(t) that is band-limited to B hertz. Let X_k, k = 1, 2, ..., n denote the continuous random variables obtained by uniform sampling of the process X(t) at the Nyquist rate of 2B samples per second.
These samples are transmitted in T seconds over a noisy channel, also band-limited to B hertz. Hence, the number of samples, n, is given by
    n = 2BT                                                     ...(1)
We refer to X_k as a sample of the transmitted signal. The channel output is perturbed by additive white Gaussian noise of zero mean and power spectral density N₀/2. The noise is band-limited to B hertz.
Let the continuous random variables Y_k, k = 1, 2, ..., n denote samples of the received signal, as shown by
    Y_k = X_k + N_k,   k = 1, 2, ..., n                         ...(2)
The noise sample N_k is Gaussian with zero mean and variance given by
    σ² = N₀B                                                    ...(3)
We assume that the samples Y_k, k = 1, 2, ..., n are statistically independent.
A channel for which the noise and the received signal are as described above is called a discrete-time, memoryless Gaussian channel.
    Fig. 1.13. Model of the discrete-time, memoryless Gaussian channel (Y_k = X_k + N_k)
The transmitter is power limited, and we assign a cost to each channel input to make meaningful statements about the channel. The cost is defined as
    E[X_k²] = P,   k = 1, 2, ..., n                             ...(4)
where P is the average transmitted power. We define the channel capacity as
    C = max { I(X_k ; Y_k) : E[X_k²] = P }                      ...(5)
where I(X_k ; Y_k) is the average mutual information between a sample of the transmitted signal, X_k, and the corresponding sample of the received signal, Y_k. The maximization in the above equation is performed with respect to f_{X_k}(x), the probability density function of X_k.
The average mutual information I(X_k ; Y_k) can be expressed in one of several equivalent forms; among them,
    I(X_k ; Y_k) = h(Y_k) - h(Y_k | X_k)                        ...(6)
Since X_k and N_k are independent random variables and their sum equals Y_k, as in
    Y_k = X_k + N_k,   k = 1, 2, ..., n
we find that the conditional differential entropy of Y_k, given X_k, is equal to the differential entropy of N_k,
    h(Y_k | X_k) = h(N_k)                                       ...(7)
Substituting equation (7) into equation (6), we get
    I(X_k ; Y_k) = h(Y_k) - h(N_k)                              ...(8)
Since h(N_k) is independent of the distribution of X_k, maximizing I(X_k ; Y_k) requires maximizing h(Y_k), the differential entropy of the sample Y_k of the received signal.
For h(Y_k) to be maximum, Y_k has to be a Gaussian random variable, i.e. the samples of the received signal represent a noise-like process.
Since N_k is Gaussian by assumption, the sample X_k of the transmitted signal must be Gaussian too. The maximization specified in equation (5) is attained by choosing the samples of the transmitted signal from a noise-like process of average power P. So we can reformulate equation (5) as
    C = I(X_k ; Y_k)   with X_k Gaussian and E[X_k²] = P        ...(9)
For the evaluation of the channel capacity C we proceed in three stages:
1. The variance of the sample Y_k of the received signal equals P + σ². Hence, the differential entropy of Y_k is
       h(Y_k) = (1/2) log₂ [2πe (P + σ²)]                       ...(10)
2. The variance of the noise sample N_k equals σ². Hence, the differential entropy of N_k is
       h(N_k) = (1/2) log₂ (2πe σ²)                             ...(11)
3. By substituting equations (10) and (11) into equation (9), we get the desired result,
       C = (1/2) log₂ (1 + P/σ²)   bits/channel use             ...(12)
With the channel used n times for the transmission of n samples of the process X(t) in T seconds, the channel capacity per unit time is (n/T) times the result in equation (12). Since the number n equals 2BT and σ² = N₀B, the channel capacity per unit time is
    C = B log₂ (1 + P/(N₀B))   bits/s                           ...(13)
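Equation (13) is straightforward to evaluate numerically. A minimal Python sketch with illustrative numbers (a 3 kHz telephone-type channel at 30 dB signal-to-noise ratio, both values assumed here purely for the example):

```python
import math

def shannon_capacity(B: float, snr: float) -> float:
    """Channel capacity C = B * log2(1 + SNR) in bits/s, where SNR = P / (N0 * B)."""
    return B * math.log2(1.0 + snr)

B = 3000.0                 # bandwidth in Hz (assumed for illustration)
snr_db = 30.0              # signal-to-noise ratio in dB (assumed)
snr = 10 ** (snr_db / 10)  # convert dB to a linear power ratio

print(f"C = {shannon_capacity(B, snr):.0f} bits/s")   # about 29,900 bits/s
```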
Statement
Shannon's channel capacity theorem is also referred to as the Shannon-Hartley law. Hartley showed that the amount of information that can be transmitted over a given channel is proportional to the product of the channel bandwidth and the time of operation.
The capacity of a channel of bandwidth B hertz, disturbed by additive white Gaussian noise of power spectral density N₀/2 and limited in bandwidth to B, is given by
    C = B log₂ (1 + P/(N₀B))   bits/s
where P is the average transmitted power.
The channel capacity theorem is one of the most remarkable results of information theory, captured in a single formula.
The channel capacity theorem highlights most vividly the interplay between three key system parameters:
(i) channel bandwidth,
(ii) average transmitted power, and
(iii) noise power spectral density at the channel output.
The theorem implies that, for a given average transmitted power P and channel bandwidth B, we can transmit information at the rate of C bits per second with arbitrarily small probability of error by employing sufficiently complex encoding systems.
It is not possible to transmit at a rate higher than C bits per second by any encoding system without a definite probability of error.
Hence, the channel capacity theorem defines the fundamental limit on the rate of error-free transmission for a power-limited, band-limited Gaussian channel. To approach this limit, the transmitted signal must have statistical properties approximating those of white Gaussian noise.
Ideal System
We define an ideal system as one that transmits data at a bit rate R_b equal to the channel capacity C. We may then express the average transmitted power as
    P = E_b C                                                   ...(1)
where E_b is the transmitted energy per bit. Accordingly, the ideal system is defined by the equation
    C/B = log₂ (1 + (E_b/N₀)(C/B))                              ...(2)
Equivalently, we may define the energy per bit to noise power spectral density ratio E_b/N₀ in terms of the bandwidth efficiency, C/B, for the ideal system as
    E_b/N₀ = (2^{C/B} - 1) / (C/B)                              ...(3)
This equation is displayed as the curve labeled "capacity boundary" in the figure below.
A plot of the bandwidth efficiency R_b/B versus E_b/N₀ is called the bandwidth efficiency diagram. Based on this diagram, we can make the following observations.
1. For infinite bandwidth, the ratio E_b/N₀ approaches the limiting value
       lim_{B→∞} (E_b/N₀) = ln 2 = 0.693
   This value is called the Shannon limit. Expressed in decibels, it equals -1.6 dB. The corresponding limiting value of the channel capacity is
       C_∞ = (P/N₀) log₂ e
   Fig. Bandwidth-efficiency diagram: the capacity boundary (for which R_b = C) separates the region for which R_b < C from the region for which R_b > C; the latter region is shown shaded.
2. The diagram highlights the potential trade-offs among E_b/N₀, R_b/B and the probability of symbol error P_e. We may view movement of the operating point along a horizontal line as trading P_e versus E_b/N₀ for a fixed R_b/B. On the other hand, we may view movement of the operating point along a vertical line as trading P_e versus R_b/B for a fixed E_b/N₀.
The Shannon limit may also be defined in terms of the E_b/N₀ required by the ideal system for error-free transmission to be possible. For the ideal system,
    lim_{C/B→0} (E_b/N₀) = lim_{C/B→0} (2^{C/B} - 1)/(C/B) = log_e 2 = ln 2
Maximum Differential Entropy for a Specified Variance
The desired form for f_X(x) is described by
    f_X(x) = (1/√(2πσ²)) exp(-(x - μ)²/(2σ²))
which is recognized as the probability density of a Gaussian random variable X of mean μ and variance σ².
The differential entropy of a continuous random variable X is
    h(X) = ∫ f_X(x) log₂ [1/f_X(x)] dx
Evaluating this integral for the Gaussian density gives
    h(X) = (1/2) log₂ (2πe σ²)
We may summarize the results of this example as follows.
1. For a given variance σ², the Gaussian random variable has the largest differential entropy attainable by any random variable, i.e. if X is a Gaussian random variable and Y is any other random variable with the same mean and variance, then
       h(X) ≥ h(Y)
   where the equality holds if, and only if, Y = X.
2. The entropy of a Gaussian random variable X is uniquely determined by the variance of X (i.e. it is independent of the mean of X).
TWO MARKS QUESTIONS WITH ANSWERS
1. What is information?
Information can be defined as the inverse of the probability of occurrence,
    I(s_k) = log₂(1/p_k)
2. Give two properties of information.
* Information must be non-negative.
* If the probability is less, the information is more, and if the probability is more, the information is less.
3. What is entropy?
Entropy can be defined as the average amount of information per source symbol,
    H(S) = Σ_{k=0}^{K-1} p_k log₂(1/p_k)
H(S) is called the entropy of a discrete memoryless source with source alphabet S.
4. Give two properties of entropy.
* H(S) = 0 if and only if the probability p_k = 1 for some k, and the remaining probabilities in the set are all zero. This lower bound on entropy corresponds to no uncertainty.
* H(S) = log₂ K if and only if p_k = 1/K for all k, i.e. all the symbols in the alphabet S are equiprobable. This upper bound on entropy corresponds to maximum uncertainty.
5. What is a discrete source?
If a source emits symbols S = {s₀, s₁, ..., s_{K-1}} from a fixed finite alphabet, then the source is said to be a discrete source.
6. Define information rate.
Information rate 'R' is the average number of bits of information per second,
    R = r H(S)
where r is the rate at which messages are generated and H(S) is the entropy of the source.
7. What is a prefix code?
In a prefix code, no codeword is the prefix of any other codeword. It is a variable-length code; the binary digits are assigned to the messages as per their probabilities of occurrence.
8. Define mutual information.
The mutual information I(X; Y) of a channel is defined by
    I(X; Y) = H(X) - H(X|Y)   bits/symbol
where H(X) is the entropy of the source and H(X|Y) is the conditional entropy of X given Y.
9. State any four properties of mutual information.
* The mutual information of a channel is symmetric: I(X; Y) = I(Y; X).
* The mutual information is always non-negative: I(X; Y) ≥ 0.
* The mutual information of a channel may be expressed in terms of the entropy of the channel output as I(X; Y) = H(Y) - H(Y|X).
* The mutual information of a channel is related to the joint entropy of the channel input and channel output by I(X; Y) = H(X) + H(Y) - H(X, Y).
10. Define the significance of the entropy H(X|Y) of a communication system, where X is the transmitter and Y is the receiver.
H(X|Y) is called the conditional entropy. It represents the uncertainty of X when Y is known. In other words, H(X|Y) is an average measure of the uncertainty in X after Y is received.
H(X|Y) represents the information lost in the noisy channel.
11. An event has six possible outcomes with probabilities 1/2, 1/4, 1/8, 1/16, 1/32, 1/32. Find the entropy of the system.
    p₁ = 1/2, p₂ = 1/4, p₃ = 1/8, p₄ = 1/16, p₅ = 1/32, p₆ = 1/32
    H(S) = Σ p_k log₂(1/p_k)
         = (1/2) log₂ 2 + (1/4) log₂ 4 + (1/8) log₂ 8 + (1/16) log₂ 16 + (1/32) log₂ 32 + (1/32) log₂ 32
         = 0.5 + 0.5 + 0.375 + 0.25 + 0.15625 + 0.15625
         = 1.9375 bits/outcome
12. When is the average information delivered by a source of alphabet size 2 maximum?
The average information is maximum when the two messages are equally likely, i.e. p₁ = p₂ = 1/2. Then the maximum average information is
    H(S) = (1/2) log₂ 2 + (1/2) log₂ 2 = 1 bit/message
13. Name the source coding techniques.
1. Prefix coding
2. Shannon-Fano coding
3. Huffman coding
14. Write the expression for code redundancy in terms of entropy.
Code redundancy is the measure of redundancy of bits in the encoded message sequence.
    Redundancy γ = 1 - code efficiency = 1 - H(S)/L̄
15. Is the information of a continuous system non-negative? If so, why?
Yes, the information of a continuous system is non-negative, since I(X; Y) ≥ 0 is one of its properties.
16. State Shannon's first theorem.
Shannon's First Theorem (source coding theorem): Given a discrete memoryless source of entropy H(S), the average code-word length L̄ for any source encoding is bounded as
    L̄ ≥ H(S)
The entropy H(S) represents a fundamental limit on the average number of bits per source symbol.
17. What is data compaction?
Data compaction is used to remove redundant information so that the decoder reconstructs the original data with no loss of information.
18. What is a decision tree? Where is it used?
The decision tree is a tree that has an initial state and terminal states corresponding to the source symbols s₀, s₁, ..., s_{K-1}. Once each terminal state emits its symbol, the decoder is reset to its initial state.
The decision tree is used for the decoding operation of prefix codes.
19. What is an instantaneous code?
If no codeword is a prefix of any other codeword, then the code is said to be an instantaneous code.
e.g. 0, 10, 110, 1110
20. What is a uniquely decipherable code?
If no codeword is a combination of any other codewords, then the code is said to be a uniquely decipherable code.
e.g. 101, 1001
21. How is the encoding operation performed in Lempel-Ziv coding using a binary sequence?
The encoding operation is performed by parsing the source data stream into segments that are the shortest subsequences not encountered previously.
22. What is a discrete channel?
The channel is said to be discrete when both the alphabets have finite sizes.
23. What is a memoryless channel?
The channel is said to be memoryless when the current output symbol depends only on the current input symbol and not on any of the previous choices.
24. Define a discrete memoryless channel.
For a discrete memoryless channel, the input and output are both discrete random variables. The current output depends only upon the current input for such a channel.
25. What is the important property while using the conditional probability P(X|Y)?
The sum of all the elements along the column side should be equal to 1.
26. What is the important property while using the conditional probability P(Y|X)?
The sum of all the elements along the row side should be equal to 1.
27. Define channel capacity.
We define the channel capacity of a discrete memoryless channel as the maximum average mutual information I(X; Y) in any single use of the channel, where the maximization is over all possible input probability distributions.
source with ana)
s. Let discre'
‘ete memory
Letad
produce symbols at every
capacity C and used once every le
1
nitled
source output can be trans
cheme for which s pe
ability of error.
a coding
ith the small proba
There exists
constructed W
over the channel and can be 1
ng) | &
~ T,
aling at the cr! ical rate.
Then, the s id to be
29. Explain Shannon-Fano coding.
An efficient code can be obtained by the following simple procedure, known as the Shannon-Fano algorithm.
Algorithm:
Step 1: List the source symbols in order of decreasing probability.
Step 2: Partition the set into two sets that are as close to equiprobable as possible, and assign 0 to the upper set and 1 to the lower set.
Step 3: Continue this process, each time partitioning the sets with as nearly equal probabilities as possible, until further partitioning is not possible.
30. Define Huffman coding.
The Huffman code is a source code whose average word length approaches the fundamental limit set by the entropy of a discrete memoryless source.
The algorithm used to synthesize the code replaces the prescribed set of source statistics of a discrete memoryless source with a simpler one.
REVIEW QUESTIONS
1. Define mutual information. Find the relation between the mutual information and the joint entropy of the channel input and channel output. Discuss the important properties of mutual information.
2. Encode the source symbols with the following set of probabilities using Huffman coding: m = {0.4, 0.2, 0.12, 0.08, 0.08, 0.04}
3. Explain in detail the Huffman coding algorithm and compare it with the other types of coding.
4. Explain the properties of entropy with suitable examples. Explain the entropy of a binary memoryless source.
5. Discuss the source coding theorem, give the advantages and disadvantages of channel coding in detail, and discuss data compaction.
6. Derive the channel capacity theorem.
7. Explain the information capacity theorem.
8. Explain Shannon-Fano coding.
9. Explain the discrete memoryless channels.