Introduction to
Information Theory
What is Information Theory?
Information theory is a branch of mathematics and computer science that deals with the
quantification, storage, transmission, and processing of information. It was first introduced
by Claude Shannon in his groundbreaking paper "A Mathematical Theory of
Communication" in 1948.
At its core, information theory is concerned with measuring the amount of information
contained in a signal, message, or data set, and identifying the optimal methods for
transmitting or processing that information in the most efficient and reliable manner
possible. Key concepts in information theory include entropy, channel capacity, coding
theory, and error correction.
Applications of information theory can be found in a wide range of fields, including
telecommunications, cryptography, data compression, artificial intelligence, and genetics,
among others. The theory has played a significant role in the development of modern
communication technologies, such as digital telecommunications and the internet, and
continues to be an active area of research and innovation.
Why is Information Theory Important?
Communication Technology: The theory provides the foundation for modern
communication technologies, such as digital telecommunications, wireless
communication, and the internet. It has helped to develop efficient and reliable methods
for transmitting and processing information across a variety of channels.
Data Compression: Information theory has also led to the development of efficient data
compression techniques that allow us to store and transmit large amounts of data more
efficiently. This is especially important in applications where storage or transmission
resources are limited.
Cryptography: Information theory has contributed to the development of modern
cryptography, which is used to secure communication channels and protect sensitive
information from unauthorized access.
Artificial Intelligence: The theory has also been instrumental in the development of
artificial intelligence and machine learning algorithms, which rely on the efficient
processing and manipulation of large amounts of data.
Overall, information theory is a fundamental theory that underlies many aspects of modern
technology and scientific research. Its applications are wide-ranging and have contributed
to significant advancements in numerous fields.
Information Theory
Information is a term with many meanings depending on context: meaning,
knowledge, instruction, communication etc.
Information is a quality of a message from a sender to one or more receivers.
The quantity of information in a message is related to its uncertainty.
Information theory is a branch of applied mathematics and electrical engineering
involving the quantification of information. Information theory was developed by
Claude E. Shannon and W. Weaver to find fundamental limits on compressing and
reliably communicating data.
Model of a Digital Communication System
Information Source (message, e.g. English symbols) → Encoder (coding, e.g. English to a 0,1 sequence) → Communication Channel (can have noise or distortion) → Decoder (decoding, e.g. 0,1 sequence back to English) → Destination
Shannon Wants to…
Shannon wants to find a way for “reliably” transmitting data
throughout the channel at “maximal” possible rate.
In his 1948 paper he built a rich theory for the problem of reliable
communication, now called “Information Theory” or “Shannon Theory” in his honor.
Information Source → Coding → Communication Channel → Decoding → Destination
For example: maximizing the speed of ADSL at your home.
Shannon’s Vision
Data → Source Encoding → Channel Encoding → Channel → Channel Decoding → Source Decoding → User
Example 1: Disk Storage
Data → Zip → Add CRC → Channel → Verify CRC → Unzip → User
In terms of Information Theory Terminology
Zip = Source Encoding (Data Compression)
Unzip = Source Decoding (Data Decompression)
Add CRC = Channel Encoding (Error Protection)
Verify CRC = Channel Decoding (Error Correction)
Example: VCD and DVD
Movie → MPEG Encoder → RS Encoding → CD/DVD → RS Decoding → MPEG Decoder → TV
RS stands for Reed-Solomon Code.
Example: Cellular Phone
Speech Encoding → CC Encoding → Channel (GSM/CDMA) → CC Decoding → Speech Decoding
CC stands for Convolutional Code.
Example: WLAN IEEE 802.11b
Data → Zip → CC Encoding → Channel (IEEE 802.11b) → CC Decoding → Unzip → User
CC stands for Convolutional Code.
Shannon Theory
The original 1948 Shannon Theory contains:
1. Measurement of Information
2. Source Coding Theory
3. Channel Coding Theory
Measurement of Information
Shannon’s first question is
“How to measure information
in terms of bits?”
All events are probabilistic!
Using Probability Theory, Shannon showed that there is only
one way to measure information in terms of number of bits:
Shannon showed:
“To reliably store the information generated by some random
source X, you need H(X) bits on average for each outcome.”
H(X) = − Σx p(x) log2 p(x)
This is called the entropy function.
Where H(X) is the entropy of a random variable X, and p(x) is the probability of X
taking on a particular value x. The summation is taken over all possible values of X.
For example
Tossing a die:
Outcomes are 1, 2, 3, 4, 5, 6
Each occurs with probability 1/6
The information provided by tossing a die is
H = − Σi=1..6 p(i) log2 p(i) = Σi=1..6 (1/6) log2 6 = log2 6 ≈ 2.585 bits
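As a quick check, here is a minimal Python sketch (not from the original slides; the function name `entropy` is just illustrative) that evaluates the entropy formula for a fair six-sided die:

```python
import math

def entropy(probs):
    """Shannon entropy H(X) = -sum p(x) * log2 p(x), in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Fair six-sided die: each outcome has probability 1/6.
die = [1/6] * 6
print(entropy(die))  # ~2.585 bits per outcome
```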
Meaning:
If I toss a die 1,000,000 times and record
the values from each trial
1,3,4,6,2,5,2,4,5,2,4,5,6,1,….
In principle, I need 3 bits for storing each
outcome, since 3 bits can represent up to 8 distinct values. So I need
3,000,000 bits for storing the information.
Using an ASCII representation, a computer
needs 8 bits = 1 byte for storing each
outcome.
The resulting file has size 8,000,000 bits.
But Shannon said:
You only need 2.585 bits for storing
each outcome.
So, the file can be compressed to a size of
2.585 × 1,000,000 = 2,585,000 bits
The optimal compression ratio is:
2,585,000 / 8,000,000 ≈ 0.3231 = 32.31%
Let’s Do Some Test!
File              Size             Compression Ratio
No compression    8,000,000 bits   100%
Shannon           2,585,000 bits   32.31%
Winzip            2,930,736 bits   36.63%
WinRAR            2,859,336 bits   35.74%
Follow-up
Later, in 1952, David Huffman, while he was a graduate student at
MIT, presented a systematic method to achieve the optimal
compression ratio guaranteed by Shannon. The coding
technique is therefore called the “Huffman code” in honor of his
achievement. Huffman codes are used in nearly every
application that involves the compression and transmission of
digital data, such as fax machines, modems, computer
networks, and high-definition television (HDTV), to name a few.
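To make the construction concrete, here is a minimal Python sketch of Huffman coding, assuming the symbol probabilities are known in advance; it is an illustrative implementation, not the method as presented in the slides:

```python
import heapq

def huffman_code(freqs):
    """Build a Huffman code for a dict of symbol -> probability.
    Returns a dict of symbol -> bit string (a prefix code)."""
    # Each heap entry: (weight, tie_breaker, {symbol: code_so_far})
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)  # two least probable subtrees
        w2, _, c2 = heapq.heappop(heap)
        # Prepend '0' to one subtree's codewords and '1' to the other's.
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, count, merged))
        count += 1
    return heap[0][2]

# Example: a source with unequal symbol probabilities.
code = huffman_code({"a": 0.5, "b": 0.25, "c": 0.15, "d": 0.10})
print(code)  # e.g. {'a': '0', 'b': '10', 'c': '111', 'd': '110'}
```

More probable symbols receive shorter codewords, which is how the average length approaches the entropy bound.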
The Simplest Case: Computer Network
Communication over a computer network, e.g., the Internet.
The major channel impairment here is packet loss.
Binary erasure channel
The binary erasure channel is a communication channel model used to describe the
transmission of information over a noisy channel. In this channel, each transmitted bit can
either be received correctly or be "erased" (lost), with some probability.
Formally, let's assume we have a binary input signal consisting of 0's and 1's. The output
of the binary erasure channel is also a binary signal, but with the possibility that some bits
may be erased. Specifically, if the input bit is transmitted successfully, the output will be
the same as the input bit. If the bit is erased, the output will be marked with a special symbol
(e.g., "?", "X", or "e") to indicate that the bit was lost.
The probability of erasure is denoted by a parameter p, which represents the likelihood that
a given bit will be erased. For example, if p=0.1, then 10% of the bits transmitted through
the channel will be lost.
The binary erasure channel is often used as a simplified model of real-world
communication channels that are subject to errors and losses, such as wireless or satellite
communication. By analyzing the performance of different coding and decoding schemes
under the binary erasure channel model, researchers can develop strategies to improve the
reliability of communication systems in practice.
Binary erasure channel
In coding theory and information theory, a binary erasure channel (BEC) is a communications channel
model. A transmitter sends a bit (a zero or a one), and the receiver either receives the bit correctly, or with
some probability Pe receives a message that the bit was not received ("erased").
• The channel model for the binary erasure channel shows a mapping from channel input X to channel output Y (with a known erasure symbol ?). The probability of erasure is Pe.
• A binary erasure channel with erasure probability Pe is a channel with binary input, ternary output, and probability of erasure Pe. That is, let X be the transmitted random variable with alphabet {0,1}, and let Y be the received variable with alphabet {0,1,e}, where e is the erasure symbol. Then the channel is characterized by the conditional probabilities:
P(Y = 0 | X = 0) = 1 − Pe,  P(Y = e | X = 0) = Pe,
P(Y = 1 | X = 1) = 1 − Pe,  P(Y = e | X = 1) = Pe.
The capacity of the BEC with erasure probability Pe is 1 − Pe.
• Once a binary symbol is erased,
it cannot be recovered…
Ex:
Say Alice (X) sends 0,1,0,1,0,0 to Bob (Y).
But the network was so poor that Bob only received
0,?,0,?,0,0
So Bob asked Alice to send again.
Only this time he received 0,?,?,1,0,0
and Bob goes CRAZY!
What can Alice do?
What if Alice sends
0000,1111,0000,1111,0000,0000
repeating each transmission four times!
What Good Can This Serve?
Now Alice sends 0000,1111,0000,1111,0000,0000.
The only case in which Bob cannot read one of Alice's symbols is when all four of its copies are erased, for example
????,1111,0000,1111,0000,0000
But this happens with probability p^4.
Thus, if the original network has a packet loss rate of p = 0.25, then by repeating each symbol 4 times the resulting system has a packet loss rate of p^4 = 0.00390625.
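A small Python simulation (illustrative only; it assumes the erasure probability p = 0.25 and the repetition factor 4 from this example) can confirm these numbers:

```python
import random

def erase(bits, p):
    """Pass bits through a binary erasure channel: each bit is lost with probability p."""
    return [b if random.random() > p else None for b in bits]

def send_with_repetition(bit, p, r=4):
    """Send one bit r times; it is unrecoverable only if all r copies are erased."""
    received = erase([bit] * r, p)
    return any(b is not None for b in received)

random.seed(0)
p, trials = 0.25, 200_000
lost = sum(not send_with_repetition(0, p) for _ in range(trials))
print(lost / trials)   # close to p**4 = 0.00390625
```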
But if the data rate in the original network is 8 Mbits per second (Alice sends to Bob over a channel with p = 0.25), then with 4× repetition Alice can only transmit at 2 Mbps, although the resulting system has packet loss rate p^4 = 0.00390625.
Is repetition the best Alice can do?
Shannon’s Channel Coding Theorem
Shannon answered:
“Give me a channel and I can compute a quantity
called the capacity, C, for that channel. Then reliable
communication is possible only if your data rate
stays below C.”
Shannon means, in this example (Alice sends to Bob at 8 Mbps over a channel with p = 0.25):
He calculated the channel capacity
C = 1 − p = 0.75
and there exists a coding scheme such that Alice can communicate with Bob as reliably as over a loss-free channel (p = 0), at an effective rate of
8 × (1 − p) = 6 Mbps
But With 50 Years of Hard Work
We have discovered a lot of good codes:
Hamming codes
Convolutional codes
Concatenated codes
Low-density parity-check (LDPC) codes
Reed-Muller codes
Reed-Solomon codes
BCH codes
Finite Geometry codes
Cyclic codes
Golay codes
Goppa codes
Algebraic Geometry codes
Turbo codes
Zig-Zag codes
Accumulate codes and Product-accumulate codes
…
We now come very close to the dream Shannon had 50
years ago!
Capacity of a System
The bit rate of a system increases with an
increase in the number of signal levels we
use to denote a symbol.
A symbol can consist of a single bit or n bits.
The number of signal levels is then 2^n.
As the number of levels goes up, the spacing
between levels decreases, increasing the
probability of an error occurring in the
presence of transmission impairments.
Nyquist Theorem
Nyquist gives the upper bound for the bit
rate of a transmission system by calculating
the bit rate directly from the number of bits
in a symbol (or signal levels) and the
bandwidth of the system (assuming 2
symbols per cycle and the first harmonic).
The Nyquist theorem states that for a noiseless
channel:
C = 2 B log2 L
C = capacity in bps
B = bandwidth in Hz
L = number of signal levels (L = 2^n for n bits per symbol)
Example
Consider a noiseless channel with a bandwidth of 3000 Hz transmitting a signal with
two signal levels. The maximum bit rate can be calculated as
Bit rate = 2 B log2 L = 2 × 3000 × log2 2 = 6000 bps
Consider a noiseless channel with a bandwidth of 3000 Hz transmitting a signal with
four signal levels. The maximum bit rate can be calculated as
Bit rate = 2 × 3000 × log2 4 = 12,000 bps
Example
We need to send 265 kbps over a noiseless channel with a bandwidth of 20 kHz.
How many signal levels do we need?
Solution
We can use the Nyquist formula as shown:
265,000 = 2 × 20,000 × log2 L  →  log2 L = 6.625  →  L = 2^6.625 ≈ 98.7 levels
Since this result is not a power of 2, we need to either increase the number of
levels or reduce the bit rate. If we have 128 levels, the bit rate is 280 kbps. If we
have 64 levels, the bit rate is 240 kbps.
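A short Python sketch of the same Nyquist calculation (illustrative; the numbers are those of this example):

```python
import math

def nyquist_bit_rate(bandwidth_hz, levels):
    """Nyquist bit rate for a noiseless channel: 2 * B * log2(L)."""
    return 2 * bandwidth_hz * math.log2(levels)

# How many levels are needed for 265 kbps over a 20 kHz channel?
target_bps, B = 265_000, 20_000
levels_needed = 2 ** (target_bps / (2 * B))
print(levels_needed)                 # ~98.7 -> not a power of 2
print(nyquist_bit_rate(B, 128))      # 280000.0 bps with 128 levels
print(nyquist_bit_rate(B, 64))       # 240000.0 bps with 64 levels
```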
Shannon’s Theorem
Shannon’s theorem gives the capacity of a system in the
presence of noise.
C = B log2(1 + SNR)
where C is the channel capacity in bps, B is the channel bandwidth in hertz, and SNR is the
ratio of the signal power to the noise power.
• The channel capacity, C, increases as the available bandwidth increases and as the signal
to noise ratio increases (improves).
• This expression applies to information in any format and to both analogue and data
communications, but its application is most common in data communications.
• The channel capacity theorem is one of the most important results of information theory.
In a single formula it highlights the interplay between 3 key system parameters:
– channel bandwidth,
– average transmitted or received signal power,
– noise power at the channel output.
• For a given average transmitted power S and channel bandwidth B, we can transmit
information at rates up to C bits/s with arbitrarily small probability of error by employing
sufficiently complex coding systems. It is not possible to transmit at a rate higher than C bits/s
by any coding system without a definite probability of error. Hence the channel capacity
theorem defines the fundamental limit on the rate of error-free transmission for a
power-limited, band-limited channel.
Example
Consider an extremely noisy channel in which the value of the signal-to-noise ratio
is almost zero. In other words, the noise is so strong that the signal is faint. For
this channel the capacity C is calculated as
C = B log2(1 + SNR) = B log2(1 + 0) = B × 0 = 0
This means that the capacity of this channel is zero regardless of the bandwidth.
In other words, we cannot receive any data through this channel.
Example
We can calculate the theoretical highest bit rate of a regular telephone line. A
telephone line normally has a bandwidth of 3000 Hz. The signal-to-noise ratio is
usually 3162. For this channel the capacity is calculated as
C = B log2(1 + SNR) = 3000 × log2(1 + 3162) ≈ 3000 × 11.62 = 34,860 bps
This means that the highest bit rate for a telephone line is 34.86 kbps. If we want
to send data faster than this, we can either increase the bandwidth of the line or
improve the signal-to-noise ratio.
Example
The signal-to-noise ratio is often given in decibels. Assume that SNRdB = 36 and
the channel bandwidth is 2 MHz. The theoretical channel capacity can be
calculated as
SNR = 10^(SNRdB/10) = 10^3.6 ≈ 3981
C = B log2(1 + SNR) = 2 × 10^6 × log2(3982) ≈ 2 × 10^6 × 11.96 ≈ 24 Mbps
Example
For practical purposes, when the SNR is very high, we can assume that SNR + 1 is almost
the same as SNR. In these cases, the theoretical channel capacity can be simplified to
C ≈ B × log2 SNR ≈ B × (SNRdB / 3)
For example, we can calculate the theoretical capacity of the previous example as
C ≈ 2 MHz × (36 / 3) = 24 Mbps
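A small Python sketch (illustrative; it reuses the numbers from the examples above) comparing the exact Shannon formula, the dB conversion, and the high-SNR approximation:

```python
import math

def shannon_capacity(bandwidth_hz, snr):
    """Shannon capacity C = B * log2(1 + SNR), in bits per second."""
    return bandwidth_hz * math.log2(1 + snr)

B = 2e6                       # 2 MHz bandwidth
snr_db = 36
snr = 10 ** (snr_db / 10)     # convert dB to a power ratio (~3981)

print(shannon_capacity(B, snr))   # ~23.9 Mbps (exact formula)
print(B * snr_db / 3)             # ~24.0 Mbps (high-SNR approximation)
```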
Example
We have a channel with a 1-MHz bandwidth. The SNR for this channel is 63. What are the
appropriate bit rate and signal level?
Solution
First, we use the Shannon formula to find the upper limit.
C = B log2(1 + SNR) = 10^6 × log2(1 + 63) = 10^6 × log2 64 = 6 Mbps
The Shannon formula gives us 6 Mbps, the upper limit. For better performance we choose
something lower, 4 Mbps, for example. Then we use the Nyquist formula to find the number
of signal levels:
4 Mbps = 2 × 1 MHz × log2 L  →  log2 L = 2  →  L = 4
Note: The Shannon capacity gives us the upper limit; the
Nyquist formula tells us how many signal levels we need.
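The same two-step procedure, as an illustrative Python sketch using the numbers from the example above:

```python
import math

B, snr = 1e6, 63                       # 1 MHz bandwidth, SNR = 63

upper_limit = B * math.log2(1 + snr)   # Shannon: 6 Mbps upper bound
chosen_rate = 4e6                      # pick something below the limit
levels = 2 ** (chosen_rate / (2 * B))  # Nyquist: levels needed for that rate

print(upper_limit, chosen_rate, levels)   # 6000000.0 4000000.0 4.0
```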
Hw.
A telephone network has a bandwidth of 3.4 kHz.
(a) Calculate the capacity of the channel for a signal-to-noise
ratio of 30 dB.
(b) Calculate the minimum signal-to-noise ratio required for
information transmission through the channel at the rate
of 4800 bits/s.
(c) Calculate the minimum signal-to-noise ratio required for
information transmission through the channel at the rate
of 9600 bits/s.
Capacity versus Bandwidth
It appears from the expression
C = B log2(1 + SNR)
that as the bandwidth increases the capacity should increase proportionately. But this
does not happen, because increasing the bandwidth, B, also increases the noise power
N = N0B, giving:
C = B log2(1 + S / (N0B))
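A short Python sketch (illustrative; the values of S and N0 are arbitrary assumptions) showing that the capacity levels off as the bandwidth grows rather than increasing proportionately:

```python
import math

S, N0 = 1.0, 1e-3             # signal power and noise power spectral density (arbitrary units)

def capacity(B):
    """C = B * log2(1 + S / (N0 * B))."""
    return B * math.log2(1 + S / (N0 * B))

for B in [1e3, 1e4, 1e5, 1e6]:
    print(B, capacity(B))     # capacity grows, but levels off

print((S / N0) * math.log2(math.e))   # limiting value as B -> infinity, ~1443 bits/s
```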