Information Measure
• What is information?
• What is communication?
Even if we think we know what information and communication are, that is not the same
as defining them in a mathematical context. As an example one can consider an electronic
copy of Claude Shannon's paper in pdf format. The one considered here has the file size
2.2MB. It contains a lot of information about the subject, but how can the information in
the file be measured? One way to look at it is to compress the file as much as possible.
The compressed size could then be a measure of the amount of information required to describe the
file. In zip format the same file has the size 713kB. So if we should quantify the amount
of information in the paper, the pdf version contains at least 1.5MB that is not necessary
for the pure information in the text. Is then the number 713kB a measure of the contained
information? From a mathematical point of view, we will see that it is definitely closer to
the truth. However, from a philosophical point of view, it is not certain. We can compare
this text with a text of the same size containing only randomly chosen letters. If they
have the same file size, do they contain the same amount of information? These semantic
doubts are not considered in the mathematical model. Instead, the question answered is
the amount of data needed to describe the text.
4.1 Information
In his paper Shannon set up a mathematical theory for information and communication,
based on probability theory. He gave a quantitative measure of the amount of informa-
tion stored in a variable and gave limits of how much information can be transmitted
from one place to another over a given communication channel.
However, already twenty years earlier, in 1928, Hartley stated that a symbol can contain
information only if it has multiple choices [8]. That is, the symbol must be a random
variable. Hartley argued that if one symbol, X, has L alternatives, a vector of n independent
such symbols, (X1, . . . , Xn), has L^n alternatives. To form a measure of information,
one must notice that if the symbol X has the information I, then the vector should have
the information nI. The conclusion of this was that an appropriate information measure
should be based on the logarithm of the number of alternatives,

    I_H(X) = log L

In that way

    I_H(X1, . . . , Xn) = log L^n = n log L
EXAMPLE 4.1 Consider the outcome of a throw with a fair dice. It has 6 alternatives,
and hence, the information according to Hartley is¹

    I_H(X) = log 6 ≈ 2.585 bits
In this example Hartley's information measure makes sense, since it is the number of bits
needed to point out one of the six alternatives. But there can be other situations where it
runs into problems, like in the next example.
EXAMPLE 4.2 Let the variable X be the outcome of a flip with a counterfeit coin, with the
probabilities P(X = Head) = p and P(X = Tail) = 1 − p. According to Hartley the information
is

    I_H(X) = log 2 = 1 bit
In the case when the outcomes from the coin flip are equally likely, i.e. p = 1/2, the measure
is intuitive, since one bit is the amount of data needed to describe the outcome. If we
instead consider the case when p is very small, and flip the coin several times in a row,
our intuition does not say the same. Since p is small we would expect most
of the outcomes to be Tail and only a small fraction to be Head. The normal outcome
in a series of flips would be to have mostly Tail, meaning there is not much information
in this outcome. On the other hand, in the rare cases when Head occurs there would be
much more information. So, even if Hartley's information measure was groundbreaking
at its time, it did not consider the probability distribution of the experiment.
When Shannon introduced his measure 20 years later, he based the information quantity
on the probability describing how two random variables are related. The information
gained about the outcome of one variable by observing another can be viewed as the
ratio between the conditional and the unconditional probability for that outcome. Still,
as Hartley stated, the logarithm is an important part of the function.
    ¹Hartley did not specify the base of the logarithm. Using the binary base, the information measure has
the unit bits. In this way it specifies the number of bits required to distinguish the alternatives.
DEFINITION 4.1 The mutual information between event A and event B, denoted I(A; B),
is

    I(A; B) = log2 P(A|B) / P(A),

where it is assumed P(A) ≠ 0 and P(B) ≠ 0.   □
If nothing else is stated, we will assume the logarithmic base 2 to achieve the unit bit
(binary digit). This unit was first used in Shannon's paper, but it is also stated there that it was
John W. Tukey who coined the expression. To derive the binary logarithmic function use

    x = a^(log_a x) = b^(log_b x) = a^(log_a b · log_b x)

where the last equality follows from b = a^(log_a b). This leads to

    log_a x = log_a b · log_b x   ⇒   log_b x = log_a x / log_a b

Especially, it is convenient to use

    log2 x = ln x / ln 2 = log10 x / log10 2

It is also worth noting that log2 e = 1/ln 2. In Matlab the command log2(n) computes the
binary logarithm.
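As a small numerical check, the same change of base can be evaluated in Python (a sketch only; Python is used here instead of Matlab, and the variable names are purely illustrative):

import math

x = 6.0

# Binary logarithm computed directly and via the change of base formula
direct = math.log2(x)
via_ln = math.log(x) / math.log(2)          # log2(x) = ln(x) / ln(2)
via_log10 = math.log10(x) / math.log10(2)   # log2(x) = log10(x) / log10(2)

print(direct, via_ln, via_log10)            # all three give 2.585 (= log2 6)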
EXAMPLE 4.3 The outcome of a dice is reflected by the two random variables

    X = Number of pips
    Y = Odd or even number

The information gained about the event X = 3 from the event Y = Odd is

    I(X = 3; Y = Odd) = log P(X = 3|Y = Odd) / P(X = 3) = log (1/3) / (1/6) = log 2 = 1 bit

In other words, by knowing that the number of pips is odd, which splits the set of outcomes
in half, we gain one bit of information about the event that the number is 3.
EXAMPLE 4.4 [cont'd] The information gained from the event X = 3 about the event Y = Odd
is

    I(Y = Odd; X = 3) = log P(Y = Odd|X = 3) / P(Y = Odd) = log 1 / (1/2) = log 2 = 1 bit

The knowledge about X = 3 gives us full knowledge about the outcome of Y, which is a
binary choice with two equally sized parts. To specify one of the two outcomes of Y,
one bit is required.
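Both calculations follow directly from Definition 4.1 and can be reproduced numerically. A minimal Python sketch (the function name is only for illustration):

import math

def event_information(p_a_given_b, p_a):
    # I(A; B) = log2( P(A|B) / P(A) ), as in Definition 4.1
    return math.log2(p_a_given_b / p_a)

# Example 4.3: information about the event X = 3 from the event Y = Odd
print(event_information(1/3, 1/6))   # 1 bit

# Example 4.4: information about the event Y = Odd from the event X = 3
print(event_information(1.0, 1/2))   # 1 bit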
The mutual information between two events can range from −∞ up to min(−log P(A), −log P(B)).
To see this, consider the variations of P(A|B), which is a value between 0 and 1. Since
the logarithm is a strictly increasing function the two end cases give the bounds on the
information,

    P(A|B) = 0   ⇒   I(A; B) = log 0/P(A) = log 0 = −∞

    P(A|B) = 1   ⇒   I(A; B) = log 1/P(A) = − log P(A)

Similarly, by letting P(B|A) = 1 the information is I(A; B) = − log P(B). If P(A) and
P(B) are not equal, there are two upper bounds, where the minimum should be used.
Notice that since 0 ≤ P(A) ≤ 1 the value − log P(A) is non-negative. If I(A; B) = 0 the events
A and B are statistically independent since it implies

    P(A|B)/P(A) = 1   ⇒   P(A|B) = P(A)
To get a measure for the information related to event A, consider the mutual information
between A and A. That is, the amount of information gained about the event by
observing the same event. This quantity is called the self information,

    I(A) = I(A; A) = log P(A|A)/P(A) = log 1/P(A) = − log P(A)

That is, − log P(A) is the amount of information needed to determine that the event
A has occurred. The self information is always a non-negative quantity, and as long as the
outcome is not deterministic, i.e. as long as P(A) < 1, it is strictly positive.
4.2 Entropy
The above quantities deal with information related to specific events. An interesting
measure is then the average information required to determine the outcome of a random
variable. This is directly achieved from the expected value of the self information. We
get the following important definition: the entropy of a random variable X with probability
function p(x) is

    H(X) = E[− log p(X)] = − Σ_x p(x) log p(x)
In the derivations, we use the convention that 0 log 0 = 0, which follows from the corresponding
limit value. It is sometimes convenient to use the notation

    H(p1, p2, . . . , pL) = − Σ_{i=1}^{L} p_i log p_i

when considering the entropy function for a probability distribution given as a vector
p = (p1, p2, . . . , pL).
The entropy is the amount of information needed to determine the outcome of a random
variable. Thus, it can also be interpreted as the uncertainty of the outcome. Since the self
information is non-negative so is its average,
H(X) ≥ 0 (4.3)
In many cases a random variable describes a binary choice, e.g. when considering flip-
ping of coins. The entropy function for this case is so widely used that it often has a
definition of its own.
DEFINITION 4.4 The binary entropy function for the probability p is defined as

    h(p) = −p log p − (1 − p) log(1 − p)   □
The binary entropy function has its maximum h(1/2) = 1. In Figure 4.1 a plot of the
function is shown. Here the maximum value is depicted. It can also be seen that the binary
entropy function is symmetric in p, i.e. h(p) = h(1 − p), which is also clear directly from
the definition. In the case of a coin flip the uncertainty is maximised for a fair coin, i.e.
P(Head) = p = 1/2. If p increases, a natural initial guess is that the outcome will
be Head, and the uncertainty decreases. Similarly, if p decreases the uncertainty should
also decrease. At the end points, where p = 0 or p = 1, the outcome is known and the
uncertainty is zero, corresponding to h(0) = h(1) = 0.
[Figure 4.1: The binary entropy function h(p), with maximum h(1/2) = 1.]
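The function is simple to evaluate numerically. A minimal Python sketch, using the convention 0 log 0 = 0:

import math

def h(p):
    # Binary entropy h(p) = -p log2(p) - (1 - p) log2(1 - p), with 0 log 0 = 0
    total = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            total -= q * math.log2(q)
    return total

print(h(0.5))           # 1.0, the maximum
print(h(0.0), h(1.0))   # 0.0 0.0, deterministic outcomes
print(h(0.1), h(0.9))   # equal values, since h(p) = h(1 - p)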
Let Y be the outcome of a dice with a small weight in it, such that the probabilities are
P(Y = 1) = 0, P(Y = 6) = 1/3 and P(Y = i) = 1/6, i = 2, 3, 4, 5. Then, the corresponding
uncertainty of the outcome is

    H(Y) = − (1/3) log (1/3) − 4 · (1/6) log (1/6) = 2/3 + log 3 ≈ 2.2516 bit

Again, since we expect the outcome to be 6, there is little information in this outcome. In
total the uncertainty of the outcome has decreased compared to the fair dice.
The definition of the entropy is also valid for vectorised random variables, such as (X, Y)
with the joint probability function p(x, y).

DEFINITION 4.5 The joint entropy for a pair of random variables with the joint distribution
p(x, y) is

    H(X, Y) = E_{XY}[− log p(X, Y)] = − Σ_{x,y} p(x, y) log p(x, y)

Similarly, in the general case with an n-dimensional vector X = (X1, . . . , Xn), the joint
entropy function is

    H(X1, . . . , Xn) = E_X[− log p(X)] = − Σ_x p(x) log p(x)
EXAMPLE 4.6 Let X and Y be the outcomes from two independent fair dice. Then the
joint probability is P(X = x, Y = y) = 1/36 and the joint entropy

    H(X, Y) = − Σ_{x,y} (1/36) log (1/36) = log 36 = 2 log 6 ≈ 5.1699

We conclude that the uncertainty of the outcome of two dice is twice the uncertainty of
one dice.
Let Z be the sum of the dice, Z = X + Y. The probabilities are shown in the following table

    Z:      2     3     4     5     6     7     8     9     10    11    12
    P(Z):  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
The entropy of Z is

    H(Z) = H(1/36, 2/36, 3/36, 4/36, 5/36, 6/36, 5/36, 4/36, 3/36, 2/36, 1/36)
         = −2 (1/36) log (1/36) − 2 (2/36) log (2/36) − 2 (3/36) log (3/36)
           − 2 (4/36) log (4/36) − 2 (5/36) log (5/36) − (6/36) log (6/36)
         = 23/18 + (5/3) log 3 − (5/18) log 5 ≈ 3.2744
The uncertainty of the sum of the dice is less than the outcomes of the individual dice.
This makes sense, since several outcomes of the pair X, Y give the same sum Z.
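The two entropies above are easy to verify numerically. A short Python sketch (the helper function is not from the text):

import math

def entropy(probs):
    # H = -sum p*log2(p); terms with p = 0 are skipped (0 log 0 = 0)
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Distribution of Z = X + Y for two fair dice, Z = 2, 3, ..., 12
p_z = [k / 36 for k in (1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1)]

print(entropy([1 / 36] * 36))   # log2(36) = 5.1699 bits for the pair (X, Y)
print(entropy(p_z))             # approximately 3.2744 bits for the sum Z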
In (4.3) it was seen that the entropy function is a non-negative function. To achieve an
upper bound we will be helped by the following inequality, often called the IT-inequality,

    log r ≤ (r − 1) log e,   r > 0,   with equality if and only if r = 1

To show it, note that the curves ln r and r − 1 coincide at r = 1, and that the derivative of
ln r is 1/r while the derivative of r − 1 is 1.
Hence, for 0 < r < 1 the curve for ln r is steeper than r − 1 and for r > 1 it is flatter. So in
all cases but r = 1, the curve for ln r must be strictly lower than r − 1. That is, we have
shown ln r ≤ r − 1 with equality if and only if r = 1. Rewriting into the binary logarithm
completes the proof.   □
[Figure: the curves y = ln(r) and y = r − 1, which touch at r = 1.]
From the previous examples we would guess that the maximum value of the entropy
would occur when the outcomes have equal probabilities. In that case a random variable
X with L outcomes, {x1, . . . , xL}, has the probabilities P(X = xi) = 1/L. The entropy is

    H(X) = − Σ_i (1/L) log (1/L) = log L
This is in fact the maximum value of the entropy. For a random variable X with L outcomes,

    0 ≤ H(X) ≤ log L

with equality to the left if and only if there exists some i where p(xi) = 1, and with
equality to the right if and only if p(xi) = 1/L for all i = 1, 2, . . . , L.   □
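As a numerical illustration of the bound, the sketch below evaluates the entropy of a few distributions over L = 4 outcomes (the distributions are chosen only as examples) and compares with log 4 = 2 bits:

import math

def entropy(probs):
    # H = -sum p*log2(p), with the convention 0 log 0 = 0
    return -sum(p * math.log2(p) for p in probs if p > 0)

L = 4
for probs in ([0.25, 0.25, 0.25, 0.25],    # uniform: H = log2(4) = 2
              [0.5, 0.25, 0.125, 0.125],   # H = 1.75
              [1.0, 0.0, 0.0, 0.0]):       # deterministic: H = 0
    print(entropy(probs), "<=", math.log2(L))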
To cover also dependencies between random variables the conditional entropy needs to be
considered. This is defined as the average self information for one variable conditioned
on another, that is E[− log p(X|Y)]. Since there are two random variables the average
needs to be taken over the joint probability distribution.

DEFINITION 4.6 The conditional entropy of the random variable X, conditioned on Y, is

    H(X|Y) = E_{XY}[− log p(X|Y)] = − Σ_{x,y} p(x, y) log p(x|y)

where p(x, y) and p(x|y) are the joint and conditional probability functions, respectively.   □
Using the chain rule for probabilities, p(x, y) = p(x|y)p(y), the formula can be rewritten
as

    H(X|Y) = − Σ_x Σ_y p(x, y) log p(x|y) = − Σ_x Σ_y p(x|y)p(y) log p(x|y)
           = Σ_y p(y) ( − Σ_x p(x|y) log p(x|y) ) = Σ_y p(y) H(X|Y = y)
EXAMPLE 4.7 The joint distribution of the random variables X and Y is given by

    p(x, y)    Y = 0    Y = 1
    X = 0        0       1/8
    X = 1       3/4      1/8

The marginal distributions of X and Y can be derived as p(x) = Σ_y p(x, y) and p(y) = Σ_x p(x, y), giving

    p(X = 0) = 1/8,  p(X = 1) = 7/8        p(Y = 0) = 3/4,  p(Y = 1) = 1/4
The conditional probabilities of X given Y become

    P(x|y)     Y = 0    Y = 1
    X = 0        0       1/2
    X = 1        1       1/2
Then

    H(X|Y = 0) = h(0) = 0
    H(X|Y = 1) = h(1/2) = 1

Putting things together, the conditional entropy becomes

    H(X|Y) = H(X|Y = 0)P(Y = 0) + H(X|Y = 1)P(Y = 1) = (3/4) h(0) + (1/4) h(1/2) = 1/4
The chain rule for probabilities can be used also to achieve a corresponding chain rule for
entropies,

    H(X, Y) = − Σ_{x,y} p(x, y) log p(x, y) = − Σ_{x,y} p(x, y) log p(x|y)p(y)
            = − Σ_{x,y} p(x, y) log p(x|y) − Σ_y p(y) log p(y)
            = H(X|Y) + H(Y)                                                      (4.5)

Rewriting the result we get H(X|Y) = H(X, Y) − H(Y). That is, the conditional entropy
is the difference between the uncertainty of the pair (X, Y) and the information gained by
observing Y. A more general version of (4.5) can be stated as the chain rule for entropies
in the following theorem.

THEOREM 4.3 (Chain rule for entropies) For the random variables X1, . . . , Xn,

    H(X1, . . . , Xn) = Σ_{i=1}^{n} H(Xi|X1, . . . , Xi−1)   □
EXAMPLE 4.8 [Cont'd from Example 4.7] The joint entropy can alternatively be derived
as

    H(X, Y) = H(X|Y) + H(Y) = 1/4 + h(1/4) = 9/4 − (3/4) log 3 ≈ 1.0613
4.3 Mutual Information

The entropy was obtained by averaging the self information for a random variable. Similarly,
the average mutual information between the random variables X and Y can be
defined as follows.
DEFINITION 4.7 The mutual information between the random variables X and Y is defined
as

    I(X; Y) = E_{XY}[ log p(X, Y) / (p(X)p(Y)) ] = Σ_{x,y} p(x, y) log p(x, y) / (p(x)p(y))          (4.6)
Utilizing that p(x, y) = p(x)p(y|x) = p(y)p(x|y) the function can also be written as the
ratio between the conditional and the unconditional probabilities,

    I(X; Y) = E_{XY}[ log p(X|Y) / p(X) ] = E_{XY}[ log p(Y|X) / p(Y) ]          (4.7)

The mutual information describes how strongly two variables are connected. From the
definition it is clear that it is a symmetric measure,

    I(X; Y) = I(Y; X)
Breaking up the logarithm in the definition it is possible to derive the mutual information
from the entropies as

    I(X; Y) = E_{XY}[ log p(X, Y) / (p(X)p(Y)) ]
            = E_{XY}[ log p(X, Y) − log p(X) − log p(Y) ]
            = E_{XY}[ log p(X, Y) ] − E_X[ log p(X) ] − E_Y[ log p(Y) ]
            = H(X) + H(Y) − H(X, Y)
            = H(X) − H(X|Y)
            = H(Y) − H(Y|X)

where the last two equalities follow from the chain rule of entropies.
EXAMPLE 4.9 [Cont'd from Example 4.7] The mutual information can be derived as

    I(X; Y) = H(X) − H(X|Y) = h(1/8) − 1/4 = 11/4 − (7/8) log 7 ≈ 0.2936
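The quantities in Examples 4.7–4.9 can also be checked numerically from the joint probability table. A minimal Python sketch (the helper names are not from the text):

import math

def entropy(probs):
    # H = -sum p*log2(p), with 0 log 0 = 0
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Joint distribution p(x, y) from Example 4.7, indexed as p[x][y]
p = [[0.0, 1 / 8],
     [3 / 4, 1 / 8]]

p_x = [sum(row) for row in p]             # marginal of X: (1/8, 7/8)
p_y = [sum(col) for col in zip(*p)]       # marginal of Y: (3/4, 1/4)

H_xy = entropy([p[x][y] for x in (0, 1) for y in (0, 1)])
H_x, H_y = entropy(p_x), entropy(p_y)

print(H_xy)               # joint entropy H(X,Y), approximately 1.0613
print(H_xy - H_y)         # conditional entropy H(X|Y) = 0.25
print(H_x + H_y - H_xy)   # mutual information I(X;Y), approximately 0.2936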
The mutual information between two random variables X and Y can be affected by observing
a third variable Z. This is reflected in the conditional mutual information.

DEFINITION 4.8 The conditional mutual information between the random variables X and
Y, when Z is observed, is defined as

    I(X; Y|Z) = E_{XYZ}[ log p(X, Y|Z) / (p(X|Z)p(Y|Z)) ]
              = Σ_{x,y,z} p(x, y, z) log p(x, y|z) / (p(x|z)p(y|z))          (4.8)

Similar to the unconditional case the conditional mutual information can be derived from
the entropies as

    I(X; Y|Z) = H(X|Z) + H(Y|Z) − H(X, Y|Z) = H(X|Z) − H(X|Y, Z) = H(Y|Z) − H(Y|X, Z)
Both the entropy and the mutual information are important measures of information. The
entropy states how much information is needed to determine the outcome of a random
variable. It will be shown later that this is a limit on how many bits are needed on average
to describe the variable. In other words, this is a limit on how much a symbol can be
compressed without any data being lost. The mutual information, on the other hand,
describes the amount of information gained about the variable X by observing the
variable Y.
To get more knowledge about these quantities we introduce the relative entropy. It was
first considered by Kullback and Leibler in 1951 [18].

DEFINITION 4.9 Given two probability distributions p(x) and q(x) for the same sample
set X. The relative entropy, or Kullback-Leibler divergence, is defined as

    D(p||q) = E_p[ log p(X)/q(X) ] = Σ_x p(x) log p(x)/q(x)
EXAMPLE 4.10 Consider a binary random variable, X ∈ {0, 1}, where we set up two
distributions. First we assume that the values are equally probable, p(0) = p(1) = 1/2, and
second we let q(0) = 1/4 and q(1) = 3/4. The relative entropy from p to q is then

    D(p||q) = (1/2) log (1/2)/(1/4) + (1/2) log (1/2)/(3/4) = 1 − (1/2) log 3 ≈ 0.2075
On the other hand, if we consider the relative entropy from q to p we get

    D(q||p) = (1/4) log (1/4)/(1/2) + (3/4) log (3/4)/(1/2) = (3/4) log 3 − 1 ≈ 0.1887
That is, the relative entropy is not a symmetric measure. Furthermore, in Section 4.3.2
it will be shown that the triangle inequality, in general, does not hold for the relative
entropy.
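A short Python sketch of the two computations, illustrating the asymmetry (the helper function is only for illustration):

import math

def kl_divergence(p, q):
    # D(p||q) = sum p(x) * log2( p(x) / q(x) ); terms with p(x) = 0 contribute 0
    return sum(px * math.log2(px / qx) for px, qx in zip(p, q) if px > 0)

p = [1 / 2, 1 / 2]   # uniform distribution on {0, 1}
q = [1 / 4, 3 / 4]   # skewed distribution on {0, 1}

print(kl_divergence(p, q))   # approximately 0.2075
print(kl_divergence(q, p))   # approximately 0.1887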
Sometimes the relative entropy is referred to as a distance from one distribution to another.
When dealing with optimal source coding, we will see that there is a natural reason
for this. However, in a mathematical meaning a distance should be seen as a metric. That
is, if a function g(x, y) is a metric then it should hold that

    g(x, y) ≥ 0, with equality if and only if x = y
    g(x, y) = g(y, x)
    g(x, z) ≤ g(x, y) + g(y, z)

A divergence does not have to fulfill the symmetry and triangle inequality criteria,
and indeed the relative entropy does not. In Theorem 4.4 it will be stated that the relative
entropy is non-negative.
In the next example the Poisson distribution is considered. Assume that the result from
a random experiment is Poisson distributed with intensity λ. Then if the experimentally
estimated intensity is λ0, the example shows the relative entropy between the true distribution
and the estimate. It will later be shown that this value reflects the penalty in compression
rate due to the estimation mismatch.
EXAMPLE 4.11 Let the true distribution be Poisson with intensity λ, i.e.

    p(k) = λ^k e^{−λ} / k!,    k = 0, 1, 2, . . .

Then compare this distribution with another Poisson distribution with the parameter λ0,
i.e.

    p0(k) = λ0^k e^{−λ0} / k!,    k = 0, 1, 2, . . .
The relative entropy from p(k) to p0(k) is then

    D(p||p0) = Σ_k (λ^k e^{−λ} / k!) log [ (λ^k e^{−λ} / k!) / (λ0^k e^{−λ0} / k!) ]
             = Σ_k (λ^k e^{−λ} / k!) ( k log (λ/λ0) + λ0 log e − λ log e )
             = log (λ/λ0) Σ_k k (λ^k e^{−λ} / k!) + (λ0 − λ) log e Σ_k (λ^k e^{−λ} / k!)
             = λ log (λ/λ0) + (λ0 − λ)/ln 2

where in the last equality it is used that E[k] = λ, Σ_k p(k) = 1 and log e = 1/ln 2.
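The closed form can be checked against a direct, truncated evaluation of the sum. A minimal Python sketch, with the intensities chosen only as an example:

import math

def poisson_pmf(k, lam):
    # p(k) = lam^k * exp(-lam) / k!
    return lam ** k * math.exp(-lam) / math.factorial(k)

def kl_poisson_numeric(lam, lam0, k_max=100):
    # Truncated sum of p(k) * log2( p(k) / p0(k) ); the tail beyond k_max is negligible here
    total = 0.0
    for k in range(k_max + 1):
        total += poisson_pmf(k, lam) * math.log2(poisson_pmf(k, lam) / poisson_pmf(k, lam0))
    return total

def kl_poisson_closed(lam, lam0):
    # lam * log2(lam / lam0) + (lam0 - lam) / ln 2
    return lam * math.log2(lam / lam0) + (lam0 - lam) / math.log(2)

lam, lam0 = 3.0, 4.0
print(kl_poisson_numeric(lam, lam0))   # approximately 0.1976
print(kl_poisson_closed(lam, lam0))    # the same value from the closed form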
The relative entropy was introduced as a generalised information measure. The mutual
information can be expressed as a special case of the relative entropy as

    I(X; Y) = E_{XY}[ log p(X, Y) / (p(X)p(Y)) ] = D( p(x, y) || p(x)p(y) )

The mutual information is the information divergence from the joint distribution to the
independent case, i.e. the information divergence describes the relation between X and
Y.
Another aspect of the relative entropy to consider is the relationship with the entropy
function. Consider a random variable with the possible outcomes in the set X with cardinality
|X| = L and probability distribution p(x), x ∈ X. Let u(x) = 1/L be the uniform
distribution on the same set. Then

    D(p||u) = Σ_x p(x) log p(x)/(1/L) = log L − H(X)                    (4.9)

where we see that the relative entropy from p(x) to u(x) is the difference between the
maximum value of the entropy and the entropy based on the true distribution. Since the
maximum value is achieved by the uniform distribution, the relative entropy is a measure
of how much p(x) diverges from the uniform distribution.
The relative entropy can be shown to never take negative values. Here the IT-inequality is
used to show this fact,

    −D(p||q) = − Σ_x p(x) log p(x)/q(x) = Σ_x p(x) log q(x)/p(x)
             ≤ Σ_x p(x) ( q(x)/p(x) − 1 ) log e
             = ( Σ_x q(x) − Σ_x p(x) ) log e = (1 − 1) log e = 0

with equality if and only if q(x)/p(x) = 1, i.e. when p(x) = q(x), for all x. The result is
expressed in the next theorem.
THEOREM 4.4 Given two probability distributions p(x) and q(x) for the same sample set
X. Then the relative entropy is non-negative,

    D(p||q) ≥ 0

with equality if and only if p(x) = q(x) for all x.   □
Since the mutual information can be expressed as the relative entropy, the following
corollary follows immediately.

COROLLARY 4.5 The mutual information for two random variables, X and Y, is a non-negative
function,

    I(X; Y) ≥ 0

with equality if and only if X and Y are independent.   □
THEOREM 4.6 The conditional entropy is upper bounded by the unconditional entropy,
i.e.

    H(X|Y) ≤ H(X)

with equality if and only if X and Y are independent.   □
Intuitively, this means that the uncertainty will, on average, not increase by observing side
information. If the two variables are independent, the uncertainty will not be affected.
Using this result together with the chain rule for entropy, Theorem 4.3, we get the
next result,

    H(X1, . . . , Xn) ≤ Σ_{i=1}^{n} H(Xi)

That is, the uncertainty is minimised when considering a random vector as a whole,
instead of as individual variables. In other words, we should take the relationships between the
variables into account when minimising the uncertainty.
In Definition 3.4 the terminology of convex functions was introduced. In areas like opti-
misation they have important properties since there are no local minima. In this section
the convexity of the information measures will be investigated. First, the relative entropy
will be shown to be convex. With this as a tool the entropy can be shown to be concave
and the convexity of mutual information investigated.
Our previous definition of a convex function is stated for one-dimensional functions.
Therefore, we start with a generalisation of the definition. A straightforward way is to
say that a multidimensional function is convex if it is convex along all straight lines in its
domain. For the two-dimensional case the function surface resembles a bowl. Comparing with Figure 3.3,
the two-dimensional argument λ(x1, y1) + (1 − λ)(x2, y2), for 0 ≤ λ ≤ 1, describes a
straight line between the points (x1, y1) and (x2, y2) in the argument plane. The equation
for this line can be rewritten as the coordinates (λx1 + (1 − λ)x2, λy1 + (1 − λ)y2). Considering
the two-dimensional function g(x, y), the values corresponding to the endpoints
are z1 = g(x1, y1) and z2 = g(x2, y2). Then λ(x1, y1, z1) + (1 − λ)(x2, y2, z2), 0 ≤ λ ≤ 1,
describes a straight line in the three-dimensional space between the end points (x1, y1, z1)
and (x2, y2, z2). If the function value along the argument line, g(λ(x1, y1) + (1 − λ)(x2, y2)),
never exceeds the corresponding value on the line, λg(x1, y1) + (1 − λ)g(x2, y2), the function
g(x, y) is a convex function. That is, g(x, y) is convex over the region A if

    g( λ(x1, y1) + (1 − λ)(x2, y2) ) ≤ λ g(x1, y1) + (1 − λ) g(x2, y2)

for all λ such that 0 ≤ λ ≤ 1 and all (x1, y1), (x2, y2) ∈ A. Here A denotes a two-dimensional
convex region, i.e. a straight line between two points in the region should
never be outside the region. The corresponding regions considered in this text are easily
verified to satisfy this criterion. The above reasoning for convexity of functions can easily
be generalised to n-dimensional functions.
The relative entropy is a two-dimensional function in the probability pair (p, q), and can
thus be checked for convexity. Then, consider the four probability distributions p1(x),
p2(x), q1(x) and q2(x) over the same sample space X. For λ between 0 and 1 two new
distributions can be formed as

    pλ(x) = λp1(x) + (1 − λ)p2(x)
    qλ(x) = λq1(x) + (1 − λ)q2(x)

For these,

    D(pλ||qλ) ≤ λ D(p1||q1) + (1 − λ) D(p2||q2)

where the inequality is a direct application of the log-sum inequality in Theorem 3.7.
Hence, the relative entropy D(p||q) is a convex function in the pair (p, q).
From (4.9) the entropy can be expressed as H_p(X) = log L − D(p||u) where u is the uniform
distribution. Again using pλ(x) = λp1(x) + (1 − λ)p2(x) we get

    H_{pλ}(X) = log L − D( λp1 + (1 − λ)p2 || u )
              ≥ log L − λ D(p1||u) − (1 − λ) D(p2||u)
              = λ( log L − D(p1||u) ) + (1 − λ)( log L − D(p2||u) )
              = λ H_{p1}(X) + (1 − λ) H_{p2}(X)

where the inequality follows from the convexity of the relative entropy. That is, the entropy
H(X) is a concave function in the probability distribution p(x).
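The concavity can be illustrated numerically. A small Python sketch with two arbitrarily chosen distributions over three outcomes:

import math

def entropy(probs):
    # H = -sum p*log2(p), with 0 log 0 = 0
    return -sum(p * math.log2(p) for p in probs if p > 0)

p1 = [0.7, 0.2, 0.1]
p2 = [0.1, 0.3, 0.6]

for lam in (0.0, 0.25, 0.5, 0.75, 1.0):
    p_lam = [lam * a + (1 - lam) * b for a, b in zip(p1, p2)]
    mixed = entropy(p_lam)                               # H(lam*p1 + (1-lam)*p2)
    combo = lam * entropy(p1) + (1 - lam) * entropy(p2)  # lam*H(p1) + (1-lam)*H(p2)
    print(lam, mixed >= combo - 1e-12)                   # True for every lam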
When it comes to the mutual information it can be written as I(X; Y) = H(Y) − H(Y|X).
Hence, it consists of two parts that need to be treated separately. The first case to consider
is two distributions on X, p1(x) and p2(x), while the conditional probability on Y,
p(y|x), is fixed. Then, again we form pλ(x) = λp1(x) + (1 − λ)p2(x). The corresponding
unconditional probability on Y then becomes

    pλ(y) = Σ_x pλ(x)p(y|x) = λ Σ_x p1(x)p(y|x) + (1 − λ) Σ_x p2(x)p(y|x) = λp1(y) + (1 − λ)p2(y)

and hence

    H_{pλ}(Y) ≥ λ H_{p1}(Y) + (1 − λ) H_{p2}(Y)

since the entropy is concave. On the other hand, the conditional entropy is

    H_{pλ}(Y|X) = − Σ_{x,y} pλ(x)p(y|x) log p(y|x)
                = −λ Σ_{x,y} p1(x)p(y|x) log p(y|x) − (1 − λ) Σ_{x,y} p2(x)p(y|x) log p(y|x)
                = λ H_{p1}(Y|X) + (1 − λ) H_{p2}(Y|X)

so it is linear in p(x). Combining the two parts,

    I_{pλ}(X; Y) = H_{pλ}(Y) − H_{pλ}(Y|X) ≥ λ I_{p1}(X; Y) + (1 − λ) I_{p2}(X; Y)

That is, for fixed p(y|x) the mutual information I(X; Y) is concave in p(x).
Similarly, if p(x) is fixed and we consider two distributions on the conditional probability,
p1(y|x) and p2(y|x), we introduce

    pλ(y|x) = λp1(y|x) + (1 − λ)p2(y|x)

and

    pλ(y) = Σ_x p(x)pλ(y|x) = λp1(y) + (1 − λ)p2(y)

where pi(x, y) = p(x)pi(y|x) and pi(y) = Σ_x p(x)pi(y|x). Then by writing the mutual
information as the relative entropy and using its convexity we get

    I_{pλ}(X; Y) = D( pλ(x, y) || p(x)pλ(y) )
                 ≤ λ D( p1(x, y) || p(x)p1(y) ) + (1 − λ) D( p2(x, y) || p(x)p2(y) )
                 = λ I_{p1}(X; Y) + (1 − λ) I_{p2}(X; Y)
Hence, the mutual information is convex in p(y|x). Summarising, the mutual information
I(X; Y) is a concave function in p(x) for fixed p(y|x), and a convex function in p(y|x) for
fixed p(x).
To be done
4.4 Entropy of Sequences

In the previous section the information measures were defined for random variables. Often
it is desirable to also consider random processes, where the variables in a sequence are statistically
dependent. Then, to generalise the entropy measure, complete sequences must
be considered. In this section first a famous result on data processing will be derived.
Then the entropy rate will be defined, which is the corresponding entropy measure for
random processes.
For the first part, consider a Markov chain with three variables X, Y and Z. Their de-
pendencies are described in Figure 4.3. The process A transforms X into Y and process
B transforms Y into Z. These processes are very general and can for example represent
pre-processing, post-processing or transmission of data.
[Figure 4.3: The Markov chain X → Y → Z; process A transforms X into Y and process B transforms Y into Z.]
The assumed Markov property gives that X and Z are independent when conditioned
on Y, i.e.

    p(x, z|y) = p(x|y)p(z|x, y) = p(x|y)p(z|y)

where the second equality follows from the Markov condition. Then the mutual information
between the end points can be derived and bounded in two ways,

    I(X; Z) = H(X) − H(X|Z) ≤ H(X) − H(X|Y, Z) = H(X) − H(X|Y) = I(X; Y)

and

    I(X; Z) = H(Z) − H(Z|X) ≤ H(Z) − H(Z|X, Y) = H(Z) − H(Z|Y) = I(Y; Z)

where the inequalities follow since conditioning cannot increase the entropy, and the last
equality in each line follows from the Markov condition.
LEMMA 4.11 (DATA PROCESSING LEMMA) Let the random variables X, Y and Z form
a Markov chain, X → Y → Z. Then

    I(X; Z) ≤ I(X; Y)
    I(X; Z) ≤ I(Y; Z)
An interpretation of the lemma can be viewed in the following way. Assume first that X
is transformed into Y by process A. This can for example be a transmission of data over
a channel distorting the signals (e.g. wired or wireless communication, or writing and
reading of a CD, DVD or flash memory). The aim of the receiver is then to get as much
information as possible about X by observing Y. It is common to perform post-processing, which in
this model is represented by process B. The data processing lemma states that the information
about X gained by viewing Z cannot exceed the information about X
gained by viewing Y. In other words, information about X will not increase by post-processing,
it can only decrease. In practice, however, post-processing is often used to transform the
information into another representation where the information is more easily accessible for
interpretation. For example, it is easier to understand an image when viewed on a screen
than it is from the received data.

Similarly, process A can represent pre-processing and then process B the transmission.
Then, the data processing lemma states that the information cannot increase by the processing.
Still, in practice it is common to use pre-processing in communication systems
to transform data into appropriate representations. Summarising, the lemma states that
the information cannot increase by either pre- or post-processing. The information can
only be destroyed by the processing.
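As a numerical illustration of the lemma (the setup below is chosen only as an example), let X be a fair bit, let process A be a binary symmetric channel with crossover probability 0.1 giving Y, and let process B be a second binary symmetric channel with crossover probability 0.2 giving Z. A Python sketch verifying both bounds:

import math

def entropy(probs):
    # H = -sum p*log2(p), with 0 log 0 = 0
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    # I(X;Y) = H(X) + H(Y) - H(X,Y), computed from a 2x2 joint probability table
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    return entropy(p_x) + entropy(p_y) - entropy([p for row in joint for p in row])

def bsc_joint(p_in, eps):
    # Joint distribution of (input, output) for a binary symmetric channel BSC(eps)
    return [[p_in[a] * ((1 - eps) if b == a else eps) for b in (0, 1)] for a in (0, 1)]

p_x = [0.5, 0.5]                  # X is a fair bit
joint_xy = bsc_joint(p_x, 0.1)    # process A: X -> Y
p_y = [sum(col) for col in zip(*joint_xy)]
joint_yz = bsc_joint(p_y, 0.2)    # process B: Y -> Z

# Joint of (X, Z): p(x, z) = sum over y of p(x, y) * p(z|y)
joint_xz = [[sum(joint_xy[x][y] * ((1 - 0.2) if z == y else 0.2) for y in (0, 1))
             for z in (0, 1)] for x in (0, 1)]

print(mutual_information(joint_xy))   # I(X;Y), approximately 0.531
print(mutual_information(joint_yz))   # I(Y;Z), approximately 0.278
print(mutual_information(joint_xz))   # I(X;Z), approximately 0.173, below both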
Next, we turn to a more general description of information measures for
sequences. In general there is a dependency between the symbols in a sequence, which
can be modelled by a random process. In this section two natural generalisations
of the entropy function will be introduced. It will be shown that these two definitions are
in fact equivalent. This measure can in many cases be used and interpreted in the same
way for a random process as the entropy for random variables.
A natural way to define the entropy per symbol for a sequence is by treating the sequence
as a multi-dimensional random variable and averaging over the number of symbols. As
the length of the sequence tends to infinity we get the following definition of the entropy rate,

    H∞(X) = lim_{n→∞} (1/n) H(X1 X2 . . . Xn)                    (4.10)

To see that this is a natural generalisation of the entropy function, consider first the case
where the variables in the sequence are independent, as in the next example.
EXAMPLE 4.12 Consider a sequence of i.i.d. random variables with entropy H(X). Then
the entropy rate equals the entropy function since

    H∞(X) = lim_{n→∞} (1/n) H(X1 . . . Xn) = lim_{n→∞} (1/n) Σ_i H(Xi|X1 . . . Xi−1)
          = lim_{n→∞} (1/n) Σ_i H(Xi) = lim_{n→∞} H(X) (1/n) Σ_i 1 = H(X)
An alternative definition of the entropy rate for a random process would be to consider
the entropy of the nth variable in the sequence, conditioned on all the previous ones. As
n → ∞ we get the following definition,

    H(X|X∞) = lim_{n→∞} H(Xn|X1 . . . Xn−1)
To see how the two definitions relate, rewrite the entropy with the chain rule,

    (1/n) H(X1 . . . Xn) = (1/n) Σ_i H(Xi|X1 . . . Xi−1)

The right hand side is the arithmetic mean of the terms H(Xi|X1 . . . Xi−1). Since these
terms approach H(X|X∞), so does their arithmetic mean as n → ∞. Hence, asymptotically as the length
of the sequence grows to infinity the two definitions for the entropy rate are equal. This
important result is stated as a theorem.
THEOREM 4.12 The entropy rate and the alternative entropy rate are equivalent, i.e.

    H∞(X) = H(X|X∞)

In the continuation we will mostly use the notation in the first definition.
For a stationary process, the conditional entropy satisfies

    H(Xn|X1 . . . Xn−1) ≤ H(Xn|X2 . . . Xn−1) = H(Xn−1|X1 . . . Xn−2)

where the inequality holds since conditioning cannot increase the entropy, and the last equality
follows from stationarity. Here it is seen that H(Xn|X1 . . . Xn−1) is a decreasing function
in n. An upper bound for the entropy rate is then found from the first term in the sequence,

    H∞(X) = lim_{n→∞} H(Xn|X1 . . . Xn−1) ≤ H(X1) = H(X)
Finally, since the entropy is a non-negative function we can state the following theorem.

THEOREM 4.13 For a stationary random process the entropy rate is bounded by

    0 ≤ H∞(X) ≤ H(X) ≤ log |X|   □
In Figure 4.4 the relation between log |X|, H(X), H(Xn|X1 . . . Xn−1) and H∞(X) is shown
as a function of n. One natural conclusion is that the uncertainty of the sequence is less if
there is a dependency between the symbols.
So far the entropy rate has been treated for the class of stationary random processes.
If the theory is limited to the often used Markov chains, it is possible to be more specific
in the derivation of the entropy rate. From the unit memory property and stationarity
of a Markov process, the conditional entropy can be written as H(Xn|X1 . . . Xn−1) =
H(Xn|Xn−1). Then, the entropy rate is

    H∞(X) = lim_{n→∞} H(Xn|X1 . . . Xn−1) = lim_{n→∞} H(Xn|Xn−1) = H(X2|X1)
          = − Σ_{i,j} P(X1 = xi, X2 = xj) log P(X2 = xj|X1 = xi)
          = − Σ_i P(X1 = xi) Σ_j P(X2 = xj|X1 = xi) log P(X2 = xj|X1 = xi)
          = Σ_i H(X2|X1 = xi) P(X1 = xi)                                        (4.12)

where

    H(X2|X1 = xi) = − Σ_j P(X2 = xj|X1 = xi) log P(X2 = xj|X1 = xi)
In (4.12) the transition probability is given by the state transition matrix for the Markov
chain,

    P = [ pij ] = [ P(X2 = xj|X1 = xi) ],   xi, xj ∈ X

and the stationary distribution by πi = P(X1 = xi). Hence, we get the following theorem.
THEOREM 4.14 For a stationary Markov chain with stationary distribution π and transition
matrix P = [pij], the entropy rate can be derived as

    H∞(X) = Σ_i πi H(X2|X1 = xi)

where

    H(X2|X1 = xi) = − Σ_j pij log pij
EXAMPLE 4.13 The Markov chain shown in Figure 3.4 has the state transition matrix

    P = [ 1/3  2/3   0
          1/4   0   3/4
          1/2  1/2   0  ]

and the stationary distribution

    π = ( 15/43  16/43  12/43 )
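A sketch of how the entropy rate of this chain can be evaluated numerically with Theorem 4.14; the stationary distribution is recomputed here by simply iterating the chain, only as a check:

import math

def entropy(probs):
    # H = -sum p*log2(p), with 0 log 0 = 0
    return -sum(p * math.log2(p) for p in probs if p > 0)

# State transition matrix of the Markov chain in Example 4.13
P = [[1/3, 2/3, 0.0],
     [1/4, 0.0, 3/4],
     [1/2, 1/2, 0.0]]

# Stationary distribution, found by iterating pi <- pi * P from a uniform start
pi = [1/3, 1/3, 1/3]
for _ in range(200):
    pi = [sum(pi[i] * P[i][j] for i in range(3)) for j in range(3)]

print(pi)   # approaches (15/43, 16/43, 12/43) = (0.3488..., 0.3721..., 0.2791...)

# Entropy rate according to Theorem 4.14: H_inf = sum_i pi_i * H(X2 | X1 = x_i)
H_rate = sum(pi[i] * entropy(P[i]) for i in range(3))
print(H_rate)   # approximately 0.9013 bits per symbol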
To be done