INFORMATION CODING
TECHNIQUES
UNIT I
 Information Entropy Fundamentals
 Uncertainty, Information and Entropy – Source Coding
 Theorem – Huffman Coding – Shannon-Fano Coding –
 Discrete Memoryless Channels – Channel Capacity –
 Channel Coding Theorem – Channel Capacity Theorem.
 Objective:
 In this unit, students will acquire knowledge about
 information and entropy, source coding, Huffman
 coding, channel capacity and the channel coding theorem.
 Introduction:
 Communication theory deals with systems for transmitting
  information from one point to another.
   Information theory was born with the discovery of the
  fundamental laws of data compression and transmission.
What is “Information Theory”?
“Information Theory” answers two fundamental questions in
   communication theory:
• What is the ultimate data compression?
   (the entropy H)
• What is the ultimate transmission rate of communication?
   (the channel capacity C)
It provides the most basic theoretical foundations of
   communication theory.
Entropy:
 It is defined in terms of the probabilistic behavior of a
  source of information.
 It is a measure of the average information content per
  source symbol.
Capacity:
 The intrinsic ability of a channel to convey information.
 Related to the noise characteristics of the channel.
In information theory, if the entropy of the source is less than
 the capacity of the channel, then error-free communication
 over the channel can be achieved.
Uncertainty, Information & Entropy:
   Information can be described in three ways:
 Uncertainty (before the process)
 Surprise (during the process)
 Information (after the process)
The source output is modeled as a discrete
random variable S, which takes on symbols
 from a fixed finite alphabet
  S = { s0, s1, ……, sK-1 }
with probabilities
 p(S = sk) = pk,  k = 0, 1, …, K-1.
This set of probabilities must satisfy the condition
   ∑ pk = 1,  where the sum runs over k = 0, 1, …, K-1.
Case 1:
Consider the event S = sk, the emission of symbol sk by the source
  with probability pk.
     If pk = 1 and pi = 0 for all i ≠ k, then there is no surprise and
  therefore no information.
Case 2:
 If the source symbols occur with different probabilities and
  the probability pk is low, then there is more surprise, and
  therefore more information, when symbol sk is emitted.
 The amount of information is related to the inverse of the
  probability of occurrence:
      I(sk) = log(1/pk)
Properties:
1. I(sk) = 0 for pk = 1
    The outcome of the event is known before it occurs; therefore no
    information is gained.
2. I(sk) ≥ 0 for 0 ≤ pk ≤ 1
    The occurrence of an event S = sk either provides some or no
    information but never brings about a loss of information.
3. I(sk) > I(si) for pk < pi
    That is, the less probable an event is, the more information we gain
    when it occurs.
4. I(sk si) = I(sk) + I(si) if sk and si are statistically independent.
When the logarithm is taken to base 2, the unit of information is called the bit:
          I(sk) = log2(1/pk)
                = -log2(pk),  for k = 0, 1, …, K-1
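A minimal Python sketch of this definition (the function name is my own, for illustration):

```python
import math

def self_information(p):
    """Self-information I(s) = log2(1/p) in bits for a symbol of probability p."""
    return math.log2(1.0 / p)

print(self_information(1.0))    # 0.0 bits: a certain symbol carries no information
print(self_information(0.5))    # 1.0 bit
print(self_information(0.125))  # 3.0 bits: the rarer the symbol, the more information
```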
Entropy
  It is a measure of average information content per source symbol.
 Denoted by H(S):
  H(S) = E[I(sk)]
       = ∑ pk I(sk)
       = ∑ pk log2(1/pk),  where the sums run over k = 0, 1, …, K-1
 The quantity H(S) is called the entropy of a discrete memoryless
 source with source alphabet S.
Example:
Consider a discrete memoryless source with source alphabet
S = { s0, s1, s2 } and respective probabilities p0 = 1/4, p1 = 1/4,
p2 = 1/2. Find the entropy of the source.
The entropy formula is
   H(S) = ∑ pk log2(1/pk),  with the sum over k = 0, 1, 2
        = p0 log2(1/p0) + p1 log2(1/p1) + p2 log2(1/p2)
        = (1/4) log2(4) + (1/4) log2(4) + (1/2) log2(2)
        = 0.5 + 0.5 + 0.5
        = 1.5 bits.
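A short Python sketch of the entropy formula, reproducing the 1.5-bit result above (the function name is illustrative):

```python
import math

def entropy(probs):
    """Entropy H(S) = sum over k of pk * log2(1/pk), in bits per source symbol."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# The worked example above: p0 = 1/4, p1 = 1/4, p2 = 1/2
print(entropy([0.25, 0.25, 0.5]))   # 1.5 bits
```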
Properties of Entropy:
  The entropy H(S) of such a source is bounded as follows:
                 0 ≤ H(S) ≤ log2 K
where K is the radix (number of symbols) of the source alphabet.
1. H(S) = 0 if and only if pk = 1 for some k and pj = 0 for all j ≠ k.
   This lower bound on entropy corresponds to no uncertainty.
2. H(S) = log2 K if and only if pk = 1/K for all k (equiprobable symbols).
   This upper bound on entropy corresponds to maximum
   uncertainty.
Entropy of a binary memoryless source:
 Consider a binary source for which symbol 0 occurs with
  probability p0 & symbol 1 with probability p1= 1- p0.
 Symbols emitted by the sources are statistically independent.
 The entropy of the source is
     H(S) = -p0 log2(p0) - p1 log2(p1)
          = -p0 log2(p0) - (1 - p0) log2(1 - p0) bits.
We write H(p0) = -p0 log2(p0) - (1 - p0) log2(1 - p0),
where H(p0) is called the entropy function.
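A small Python sketch of the entropy function H(p0) of a binary memoryless source (the function name is my own):

```python
import math

def binary_entropy(p0):
    """Entropy function H(p0) = -p0*log2(p0) - (1-p0)*log2(1-p0), in bits."""
    if p0 in (0.0, 1.0):      # no uncertainty at the endpoints
        return 0.0
    p1 = 1.0 - p0
    return -p0 * math.log2(p0) - p1 * math.log2(p1)

print(binary_entropy(0.5))   # 1.0 bit: maximum uncertainty
print(binary_entropy(0.1))   # about 0.469 bits
```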
Extension of a Discrete Memoryless Source:
 The probability of a source symbol in Yⁿ is equal to the
  product of the probabilities of the n source symbols in Y
  constituting that particular source symbol in Yⁿ.
 H(Yⁿ), the entropy of the extended source, is equal to n times
 H(Y), the entropy of the original source:
  H(Yⁿ) = nH(Y).
Problem: entropy of an extended source (see the numerical sketch below).
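As a small numerical illustration (my own example, assuming a binary source with p0 = 1/4), the sketch below checks that H(Y²) = 2H(Y):

```python
import math
from itertools import product

def entropy(probs):
    """H = sum of pk * log2(1/pk), in bits."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# Hypothetical binary source: p0 = 1/4, p1 = 3/4
base = [0.25, 0.75]

# Second-order extension: each extended symbol is a pair of independent source
# symbols, so its probability is the product of the two individual probabilities.
extended = [p * q for p, q in product(base, base)]

print(entropy(base))       # H(Y)   ~ 0.811 bits
print(entropy(extended))   # H(Y^2) ~ 1.623 bits = 2 * H(Y)
```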
Source Coding Theorem:
 A basic problem in communication is the efficient representation
  of the data generated by a source.
 The process by which the representation is accomplished is
  called source encoding.
 The device that performs the representation is called the source
  encoder.
a. Frequently used source symbols – assign short code words
b. Rarely used source symbols – assign long code words
 We refer to such a source code as a variable-length code.
 Morse code – an example of a variable-length code
              - letters of the alphabet and numerals are encoded into
  marks (.) and spaces (_)
             - For E: .
             - For Q: _ _ . _
The source output sk is converted by the source encoder into a block of 0s and 1s,
denoted by the code word bk.
We define the average code word length L̄ as
                    L̄ = ∑ pk lk,  where lk is the length of code word bk and the
                    sum runs over k = 0, 1, …, K-1
The parameter L̄ represents the average number of bits per source symbol.
If Lmin denotes the minimum possible value of L̄,
then the coding efficiency of the source encoder is
                    η = Lmin / L̄
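A short Python sketch of the average code word length and coding efficiency (the numbers are my own illustration, chosen so that Lmin = H(S)):

```python
def average_length(probs, lengths):
    """Average code word length L = sum of pk * lk, in bits per source symbol."""
    return sum(p * l for p, l in zip(probs, lengths))

# Illustrative source and code word lengths (my own numbers)
probs   = [0.5, 0.25, 0.125, 0.125]
lengths = [1, 2, 3, 3]

L_bar = average_length(probs, lengths)      # 1.75 bits per symbol
# For this source H(S) = 1.75 bits as well, so Lmin = 1.75 and the
# coding efficiency is eta = Lmin / L_bar = 1.0
print(L_bar, 1.75 / L_bar)
```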
Data Compaction
 Data compaction (also known as lossless data compression)
  means that we remove redundant information
 from the signal prior to transmission.
   Basically this is achieved by assigning short descriptions
    to the most frequent outcomes of the source output and
    longer descriptions to the less frequent ones.
 Source-coding schemes that are used in data
  compaction include prefix coding, Huffman coding and
  Lempel-Ziv coding.
Prefix coding
Prefix of a code word: any sequence made up of the initial part of
  the code word.
Prefix code: a code in which no code word is the prefix of any
other code word.
The code word lengths of a prefix code satisfy the Kraft–McMillan inequality
  ∑ 2^(-lk) ≤ 1,  where the sum runs over k = 0, 1, …, K-1
and 2 refers to the radix (number of symbols) of the binary alphabet.
The Kraft–McMillan inequality does not tell us that a source code is a prefix code.
It is merely a condition on the code word lengths of the code and not on the code
words themselves.
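A quick Python check of the Kraft–McMillan inequality for a set of code word lengths (a sketch, with illustrative lengths):

```python
def satisfies_kraft(lengths):
    """Check the Kraft-McMillan inequality: sum over k of 2^(-lk) <= 1."""
    return sum(2.0 ** -l for l in lengths) <= 1.0

print(satisfies_kraft([1, 2, 3, 3]))   # True:  a prefix code with these lengths exists
print(satisfies_kraft([1, 1, 2]))      # False: no uniquely decodable code has these lengths
```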
Instantaneous codes:
An important feature of prefix coding is that it is always uniquely decodable.
Prefix codes can also be referred to as instantaneous codes, meaning that each
code word can be decoded immediately, without reference to later code symbols.
Huffman coding
In Huffman coding, each symbol of a given alphabet is assigned a sequence of bits
   according to the symbol probability.
1. List the source symbols (with their probabilities or frequencies) in decreasing
     order of probability.
2. Combine the two symbols of lowest probability into a single new symbol whose
     probability is the sum of the two; assign a 0 to one branch and a 1 to the other.
3. Reinsert the combined symbol into the list, keeping the list sorted in decreasing
     order of probability.
4. Repeat steps 2 and 3 until only one symbol (the root of the code tree, with
     probability 1) remains.
5. The code word for each source symbol is obtained by reading off the assigned
     bits along the path from the root of the tree to that symbol.
Example: consider five symbols with probabilities 0.4, 0.2, 0.2, 0.1 and 0.1.
The resulting Huffman code words have lengths 2, 2, 2, 3 and 3, so
      L̄ = (0.4 + 0.2 + 0.2) × 2 + (0.1 + 0.1) × 3 = 2.2 bits per symbol
      H(S) = 0.4 log2(1/0.4) + 0.2 log2(1/0.2) + 0.2 log2(1/0.2)
             + 0.1 log2(1/0.1) + 0.1 log2(1/0.1) ≈ 2.12193 bits
      η = H(S) / L̄ = 2.12193 / 2.2 ≈ 0.96
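A compact Python sketch of the Huffman procedure (my own implementation, using a heap of partial code tables), applied to the example above:

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Build a Huffman code for a dict {symbol: probability}; returns {symbol: bit string}."""
    tie = count()   # tie-breaker so the heap never has to compare the code tables
    heap = [(p, next(tie), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, codes1 = heapq.heappop(heap)   # the two least probable entries
        p2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (p1 + p2, next(tie), merged))
    return heap[0][2]

# The worked example above: five symbols with probabilities 0.4, 0.2, 0.2, 0.1, 0.1
probs = {"s0": 0.4, "s1": 0.2, "s2": 0.2, "s3": 0.1, "s4": 0.1}
codes = huffman_code(probs)
avg_len = sum(probs[s] * len(codes[s]) for s in probs)
print(codes)     # one valid Huffman code; exact bit patterns depend on tie-breaking
print(avg_len)   # 2.2 bits per symbol
```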
Shannon-Fano Coding:
Procedure for the Shannon-Fano algorithm:
1. List the source symbols in order of decreasing probability.
2. Partition this ensemble into two almost equiprobable groups.
3. Assign 0 to one group and 1 to the other group. These form the
   starting code symbols of the code words.
4. Repeat steps 2 and 3 on each of the subgroups until each
   subgroup contains only one source symbol, to determine
   the succeeding code symbols of the code words.
5. For convenience, a code tree may be constructed and the codes
   read off directly.
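A recursive Python sketch of the Shannon-Fano procedure (my own implementation; the input is assumed to be sorted in decreasing order of probability):

```python
def shannon_fano(symbols):
    """Shannon-Fano coding for a list of (symbol, probability) pairs sorted in
    decreasing order of probability. Returns {symbol: bit string}."""
    if len(symbols) == 1:
        return {symbols[0][0]: ""}
    total = sum(p for _, p in symbols)
    # Find the split that makes the two groups as close to equiprobable as possible
    running, split, best = 0.0, 1, float("inf")
    for i in range(1, len(symbols)):
        running += symbols[i - 1][1]
        diff = abs(running - (total - running))
        if diff < best:
            best, split = diff, i
    codes = {s: "0" + c for s, c in shannon_fano(symbols[:split]).items()}
    codes.update({s: "1" + c for s, c in shannon_fano(symbols[split:]).items()})
    return codes

print(shannon_fano([("s0", 0.4), ("s1", 0.2), ("s2", 0.2), ("s3", 0.1), ("s4", 0.1)]))
# e.g. {'s0': '0', 's1': '10', 's2': '110', 's3': '1110', 's4': '1111'}
```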
Lempel-Ziv Coding
 In Lempel-Ziv coding, no knowledge of the source symbol
  probabilities is needed, which is in fact the most common
  situation in practice.
 The LZ algorithm parses the source data stream into segments
  that are the shortest subsequences not encountered
  previously.
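A small Python sketch of this parsing step (my own illustration of the "shortest subsequence not encountered previously" rule; full Lempel-Ziv coding would then transmit pointer/bit pairs for these phrases):

```python
def lz_parse(data):
    """Parse a bit string into the shortest subsequences not encountered previously."""
    phrases, seen, current = [], set(), ""
    for bit in data:
        current += bit
        if current not in seen:   # shortest new subsequence found: emit it as a phrase
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:                   # possible incomplete phrase at the end of the stream
        phrases.append(current)
    return phrases

print(lz_parse("000101110010100101"))
# ['0', '00', '1', '01', '11', '001', '010', '0101']
```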
Discrete Memoryless Channels:
 A discrete memoryless channel is a statistical model with an
  input X and an output Y, where Y is a noisy version of X (both
  are random variables).
 In each time slot the channel accepts an input symbol X
  selected from a given alphabet.
 We can describe the channel by a channel matrix that relates
  the channel inputs to the channel outputs through a set of
  transition probabilities.
Transition Probabilities
p(yk | xj) = p(Y = yk | X = xj), for all j and k.
A convenient way of describing a discrete memoryless channel is to arrange the
various transition probabilities of the channel in the form of a matrix, with
p(yk | xj) as the element in row j and column k.
The resulting J-by-K matrix P is called the channel matrix or transition matrix.
The joint probability distribution of the random variables X and Y
  is given by
        p(xj, yk) = p(X = xj, Y = yk)
                  = p(yk | xj) p(xj)
The marginal probability distribution of the output random
  variable Y is obtained by averaging out the dependence of
  p(xj, yk) on xj, as shown by
        p(yk) = ∑ p(yk | xj) p(xj),  with the sum over j = 0, 1, …, J-1
The probabilities p(xj) for j = 0, 1, …, J-1 are known as the a priori
  probabilities of the various input symbols.
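A brief Python sketch of this marginalisation (the channel matrix and a priori probabilities below are my own illustrative numbers):

```python
def output_distribution(channel, priors):
    """p(yk) = sum over j of p(yk|xj) * p(xj).
    channel[j][k] holds the transition probability p(yk|xj)."""
    J, K = len(channel), len(channel[0])
    return [sum(priors[j] * channel[j][k] for j in range(J)) for k in range(K)]

# Illustrative 2-input, 2-output channel: rows are the conditional distributions p(y|x)
channel = [[0.9, 0.1],
           [0.2, 0.8]]
priors  = [0.5, 0.5]    # a priori input probabilities p(x0), p(x1)
print(output_distribution(channel, priors))   # [0.55, 0.45]
```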
Binary Symmetric Channel
The channel has two input symbols (x0 = 0, x1 = 1) and
  two output symbols (y0 = 0, y1 =1 )
The channel is symmetric because the probability of
  receiving a 1 if a 0 is sent is the same as the probability
  of receiving a 0 if a 1 is sent .
The conditional probability of error is denoted by p, so the transition
probabilities of the binary symmetric channel are
     p(y0 | x0) = p(y1 | x1) = 1 - p  and  p(y1 | x0) = p(y0 | x1) = p.
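A minimal sketch of the binary symmetric channel as a channel matrix (the crossover probability 0.1 is my own example value):

```python
def bsc_matrix(p):
    """Channel matrix of a binary symmetric channel with crossover probability p."""
    return [[1 - p, p],        # p(y0|x0), p(y1|x0)
            [p, 1 - p]]        # p(y0|x1), p(y1|x1)

print(bsc_matrix(0.1))   # [[0.9, 0.1], [0.1, 0.9]]
```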
 Mutual information
  Mutual information is defined using the conditional entropy of the
  channel input X, selected from a known alphabet.
    Conditional entropy means the uncertainty remaining
     about the channel input after the channel output has
     been observed.
 Mutual information has several properties:
    it is symmetric: I(X;Y) = I(Y;X)
    it is always nonnegative
    it is related to the joint entropy of the channel input and
     channel output
 The mutual information of the channel is
     I(X;Y) = H(Y) - H(Y|X)
where H(Y) is the entropy of the channel output and H(Y|X) is the
conditional entropy of the channel output given the channel input.
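A Python sketch computing I(X;Y) = H(Y) - H(Y|X) for a discrete memoryless channel (my own helper functions; the binary symmetric channel with crossover probability 0.1 is used as the example):

```python
import math

def H(probs):
    """Entropy in bits of a probability distribution."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

def mutual_information(channel, priors):
    """I(X;Y) = H(Y) - H(Y|X); channel[j][k] = p(yk|xj), priors[j] = p(xj)."""
    J, K = len(channel), len(channel[0])
    p_y = [sum(priors[j] * channel[j][k] for j in range(J)) for k in range(K)]
    h_y_given_x = sum(priors[j] * H(channel[j]) for j in range(J))
    return H(p_y) - h_y_given_x

# Binary symmetric channel with crossover probability 0.1, equiprobable inputs
bsc = [[0.9, 0.1], [0.1, 0.9]]
print(mutual_information(bsc, [0.5, 0.5]))   # about 0.531 bits per channel use
```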
Channel Capacity
 The capacity of a channel is defined as the intrinsic ability of the
 channel to convey information.
 In terms of mutual information, the channel capacity of a discrete
 memoryless channel is the maximum average mutual information
 I(X;Y) in any single use of the channel, where the maximization is
 over all possible input probability distributions { p(xj) }.
 It is denoted by C:
                      C = max I(X;Y), maximized over { p(xj) }
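A brute-force Python sketch of this maximization for the binary symmetric channel (my own illustration; for this channel the maximum occurs at equiprobable inputs and equals 1 - H(p)):

```python
import math

def H(probs):
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

def mutual_information(channel, priors):
    J, K = len(channel), len(channel[0])
    p_y = [sum(priors[j] * channel[j][k] for j in range(J)) for k in range(K)]
    return H(p_y) - sum(priors[j] * H(channel[j]) for j in range(J))

# Sweep the input distribution of a BSC (crossover 0.1) and keep the maximum I(X;Y)
bsc = [[0.9, 0.1], [0.1, 0.9]]
capacity = max(mutual_information(bsc, [q, 1 - q]) for q in (i / 1000 for i in range(1, 1000)))
print(capacity)   # about 0.531 bits per channel use, i.e. 1 - H(0.1)
```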
Channel Coding theorem:
 Channel coding consists of mapping the incoming
  data sequence into a channel input sequence and, via the
  inverse mapping, the channel output sequence back into an
  output data sequence.
   The mapping operations are performed by encoders and
    decoders.
   Reliable (error-free) transmission is possible provided that
    H(S)/Ts ≤ C/Tc, where C/Tc is called the critical rate.
 Source → Channel Encoder → Channel → Channel Decoder → Destination
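A tiny numerical check of the condition H(S)/Ts ≤ C/Tc (all numbers are my own illustrative values):

```python
# A source emitting H(S) = 1.5 bits of information every Ts seconds, sent over a
# channel with capacity C bits per channel use and Tc seconds per channel use.
H_s, T_s = 1.5, 1e-3          # bits per source symbol, seconds per source symbol
C, T_c   = 0.531, 0.25e-3     # bits per channel use, seconds per channel use

source_rate   = H_s / T_s     # 1500 bits/s of information produced by the source
critical_rate = C / T_c       # 2124 bits/s that the channel can carry reliably
print(source_rate <= critical_rate)   # True -> error-free transmission is possible
```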
Information Capacity Theorem (Shannon's third
  theorem):
 Consider a continuous channel of bandwidth B hertz, perturbed
  by additive white Gaussian noise of power spectral density N0/2
  and limited in bandwidth to B. Its information capacity is
       C = B log2(1 + P/(N0 B)) bits per second,
 where P is the average transmitted signal power.
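A one-function Python sketch of this capacity formula (the bandwidth, noise density and signal power below are my own example values, chosen to give a signal-to-noise ratio of 1000, i.e. 30 dB):

```python
import math

def shannon_capacity(bandwidth_hz, signal_power, noise_density):
    """Information capacity C = B * log2(1 + P / (N0 * B)), in bits per second."""
    return bandwidth_hz * math.log2(1.0 + signal_power / (noise_density * bandwidth_hz))

B  = 3000.0                 # channel bandwidth in hertz
N0 = 1e-9                   # noise power spectral density, watts per hertz
P  = 1000 * N0 * B          # signal power chosen so that P / (N0 * B) = 1000
print(shannon_capacity(B, P, N0))   # about 29,900 bits per second
```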