7.4 - Bayesian Estimation - 2
Basic Theory
The General Method
Suppose again that we have an observable random variable X for an experiment, taking values in a set S. Suppose
also that the distribution of X depends on a parameter θ taking values in a parameter space T. Of course, our data variable
X is almost always vector-valued, so that typically S ⊆ R^n for some n ∈ N_+. Depending on the nature of the sample
space S, the distribution of X may be discrete or continuous. The parameter θ may also be vector-valued, so that
typically T ⊆ R^k for some k ∈ N_+.
In Bayesian analysis, named for the famous Thomas Bayes, we model the deterministic, but unknown parameter θ with a
random variable Θ that has a specified distribution on the parameter space T . Depending on the nature of the
parameter space, this distribution may also be either discrete or continuous. It is called the prior distribution of Θ and is
intended to reflect our knowledge of the parameter θ , before we gather data. After observing X = x ∈ S , we then use
Bayes' theorem to compute the conditional distribution of Θ given X = x. This distribution is called the posterior
distribution of Θ, and is an updated distribution, given the information in the data. Here is the mathematical
description, stated in terms of probability density functions.
Suppose that the prior distribution of Θ on T has probability density function h, and that given Θ = θ ∈ T, the
conditional probability density function of X on S is f(⋅ ∣ θ). Then the probability density function of the posterior
distribution of Θ given X = x ∈ S is
h(θ ∣ x) = h(θ) f(x ∣ θ) / f(x),  θ ∈ T  (7.4.1)
where the function in the denominator is defined as follows, in the discrete and continuous cases, respectively:
f(x) = ∑_{θ∈T} h(θ) f(x ∣ θ),  x ∈ S
f(x) = ∫_T h(θ) f(x ∣ θ) dθ,  x ∈ S
Proof
For x ∈ S , note that f (x) is simply the normalizing constant for the function θ ↦ h(θ)f (x ∣ θ) . It may not be necessary
to explicitly compute f (x), if one can recognize the functional form of θ ↦ h(θ)f (x ∣ θ) as that of a known
distribution. This will indeed be the case in several of the examples explored below.
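As a concrete illustration of (7.4.1), here is a minimal Python sketch (NumPy assumed; the grid, prior, and data are arbitrary illustrative choices) that applies Bayes' theorem on a discrete grid of parameter values, where the normalizing constant f(x) is just a sum over the grid.

```python
import numpy as np

# Minimal sketch: posterior of a Bernoulli success probability on a discrete grid.
# The grid plays the role of the parameter space T; `prior` is the pdf h on the grid.
theta = np.linspace(0.01, 0.99, 99)        # grid of candidate values of theta
prior = np.ones_like(theta) / theta.size   # uniform (non-informative) prior h(theta)

x = np.array([1, 0, 1, 1, 0, 1, 1, 1])     # observed Bernoulli data X = x

# Likelihood f(x | theta) at each grid point: product of Bernoulli densities.
likelihood = theta ** x.sum() * (1 - theta) ** (x.size - x.sum())

# Bayes' theorem: posterior is prior times likelihood, divided by the
# normalizing constant f(x) = sum over the grid of h(theta) f(x | theta).
unnormalized = prior * likelihood
posterior = unnormalized / unnormalized.sum()

print("posterior mean:", np.sum(theta * posterior))
```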
If the parameter space T has finite measure c (counting measure in the discrete case or Lebesgue measure in the
continuous case), then one possible prior distribution is the uniform distribution on T , with probability density
function h(θ) = 1/c for θ ∈ T . This distribution reflects no prior knowledge about the parameter, and so is called the
non-informative prior distribution.
Random Samples
Of course, an important and essential special case occurs when X = (X_1, X_2, …, X_n) is a random sample of size n
from the distribution of a basic variable X. Specifically, suppose that X takes values in a set R and has probability
density function g(⋅ ∣ θ) for a given θ ∈ T. In this case, S = R^n and the probability density function f(⋅ ∣ θ) of X given
θ is
f(x ∣ θ) = g(x_1 ∣ θ) g(x_2 ∣ θ) ⋯ g(x_n ∣ θ),  x = (x_1, x_2, …, x_n) ∈ S
Real Parameters
Suppose that θ is a real-valued parameter, so that T ⊆ R. Here is our main definition.
The Bayesian estimator of θ based on X is U = E(Θ ∣ X).
1. If Θ has a discrete distribution on T then E(Θ ∣ X) = ∑_{θ∈T} θ h(θ ∣ X)
2. If Θ has a continuous distribution on T then E(Θ ∣ X) = ∫_T θ h(θ ∣ X) dθ
Recall that E(Θ ∣ X) is a function of X and, among all functions of X , is closest to Θ in the mean square sense. Of
course, once we collect the data and observe X = x, the Bayesian estimate of θ is E(Θ ∣ X = x). As always, the term
estimator refers to a random variable, before the data are collected, and the term estimate refers to an observed value of
the random variable after the data are collected. The definitions of bias and mean square error are as before, but now
conditioned on Θ = θ ∈ T .
As before, bias(U ∣ θ) = E(U ∣ θ) − θ and mse(U ∣ θ) = var(U ∣ θ) + bias²(U ∣ θ). Suppose now that we observe the
data sequentially, and let U_n denote the Bayesian estimator of θ based on the sample X_n = (X_1, X_2, …, X_n) for each
n ∈ N_+. Again, the most common case is when we are sampling from a distribution, so that
the sequence is independent and identically distributed (given θ ). We have the natural asymptotic properties that we
have seen before.
Often we cannot construct unbiased Bayesian estimators, but we do hope that our estimators are at least asymptotically
unbiased and consistent. It turns out that the sequence of Bayesian estimators (U_n : n ∈ N_+) is a martingale. The theory of
martingales provides some powerful tools for studying these estimators.
From the Bayesian perspective, the posterior distribution of Θ given the data X = x is of primary importance. Point
estimates of θ derived from this distribution are of secondary importance. In particular, the mean square error function
u ↦ E[(Θ − u)² ∣ X = x], minimized as we have noted at E(Θ ∣ X = x), is not the only loss function that can be used.
(Although it's the only one that we consider.) Another possible loss function, among many, is the mean absolute error
function u ↦ E(|Θ − u| ∣ X = x), which we know is minimized at the median(s) of the posterior distribution.
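To see the two loss functions side by side, here is a small Python sketch (illustrative only; the discrete posterior used here is an arbitrary choice) that checks numerically that the expected squared error is minimized near the posterior mean, while the expected absolute error is minimized near a posterior median.

```python
import numpy as np

# Minimal sketch: compare squared-error and absolute-error loss on a discrete posterior.
theta = np.linspace(0.01, 0.99, 99)
posterior = theta ** 6 * (1 - theta) ** 2          # unnormalized, beta(7, 3)-shaped posterior
posterior /= posterior.sum()

candidates = np.linspace(0.01, 0.99, 99)           # candidate point estimates u
sq_loss = [(posterior * (theta - u) ** 2).sum() for u in candidates]
abs_loss = [(posterior * np.abs(theta - u)).sum() for u in candidates]

post_mean = (theta * posterior).sum()
post_median = theta[np.searchsorted(posterior.cumsum(), 0.5)]

print("argmin squared loss:", candidates[np.argmin(sq_loss)], "posterior mean:", post_mean)
print("argmin absolute loss:", candidates[np.argmin(abs_loss)], "posterior median:", post_median)
```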
Conjugate Families
Often, the prior distribution of Θ is itself a member of a parametric family, with the parameters specified to reflect our
prior knowledge of θ . In many important special cases, the parametric family can be chosen so that the posterior
distribution of Θ given X = x belongs to the same family for each x ∈ S . In such a case, the family of distributions of
Θ is said to be conjugate to the family of distributions of X . Conjugate families are nice from a computational point of
view, since we can often compute the posterior distribution through a simple formula involving the parameters of the
family, without having to use Bayes' theorem directly. Similarly, in the case that the parameter is real valued, we can
often compute the Bayesian estimator through a simple formula involving the parameters of the conjugate family.
Special Distributions
The Bernoulli Distribution
Suppose that X = (X_1, X_2, …) is a sequence of independent variables, each having the Bernoulli distribution with
unknown success parameter p ∈ (0, 1). In short, X is a sequence of Bernoulli trials, given p. In the usual language of
reliability, X_i = 1 means success on trial i and X_i = 0 means failure on trial i. Recall that given p, the Bernoulli
distribution has probability density function g(x ∣ p) = p^x (1 − p)^{1−x} for x ∈ {0, 1}.
Note that the number of successes in the first n trials is Y_n = ∑_{i=1}^n X_i. Given p, the random variable Y_n has the binomial
distribution with parameters n and p.
Suppose now that we model p with a random variable P having a prior beta distribution with left parameter a ∈ (0, ∞)
and right parameter b ∈ (0, ∞), where a and b are chosen to reflect our prior knowledge of p. Thus the prior distribution
of P has probability density function
h(p) = [1 / B(a, b)] p^{a−1} (1 − p)^{b−1},  p ∈ (0, 1)  (7.4.7)
and has mean a/(a + b). For example, if we know nothing about p, we might let a = b = 1, so that the prior
distribution is uniform on the parameter space (0, 1) (the non-informative prior). On the other hand, if we believe that
p is about 2/3, we might let a = 4 and b = 2, so that the prior distribution is unimodal, with mean 2/3. As a random
process, the sequence X with p randomized by P , is known as the beta-Bernoulli process, and is very interesting on its
own, outside of the context of Bayesian estimation.
The posterior distribution of P given X_n = (X_1, X_2, …, X_n) is beta with left parameter a + Y_n and right parameter b + (n − Y_n).
Proof
Thus, the beta distribution is conjugate to the Bernoulli distribution. Note also that the posterior distribution depends
on the data vector X_n only through the number of successes Y_n. This is true because Y_n is a sufficient statistic for p. In
particular, note that the left beta parameter is increased by the number of successes Y_n and the right beta parameter is
increased by the number of failures n − Y_n. The Bayesian estimator of p based on X_n is
U_n = (a + Y_n) / (a + b + n)  (7.4.10)
Proof
In the beta coin experiment, set n = 20 and p = 0.3, and set a = 4 and b = 2. Run the simulation 100 times and note
the estimate of p and the shape and location of the posterior probability density function of p on each run.
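The beta coin experiment is an interactive app, but the conjugate update itself is easy to reproduce. The following Python sketch (NumPy assumed; the random seed and the use of a single simulated run are illustrative choices) uses the settings above and computes the posterior parameters and the estimate U_n = (a + Y_n)/(a + b + n).

```python
import numpy as np

# Minimal sketch of one run of the beta coin experiment: n = 20, p = 0.3, a = 4, b = 2.
rng = np.random.default_rng(0)
n, p, a, b = 20, 0.3, 4.0, 2.0

x = rng.binomial(1, p, size=n)     # the Bernoulli trials
y = x.sum()                        # number of successes Y_n

# Conjugate update: posterior of P is beta(a + Y_n, b + (n - Y_n)).
post_a, post_b = a + y, b + (n - y)
u_n = post_a / (post_a + post_b)   # Bayesian estimate U_n = (a + Y_n)/(a + b + n)

print("successes:", y, "posterior: beta(", post_a, ",", post_b, ")", "estimate U_n:", u_n)
```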
For n ∈ N_+,
bias(U_n ∣ p) = [a(1 − p) − b p] / (a + b + n),  p ∈ (0, 1)  (7.4.11)
Note also that we cannot choose a and b to make Un unbiased, since such a choice would involve the true value of p,
which we do not know.
In the beta coin experiment, vary the parameters and note the change in the bias. Now set n = 20 and p = 0.8, and
set a = 2 and b = 6. Run the simulation 1000 times. Note the estimate of p and the shape and location of the
posterior probability density function of p on each update. Compare the empirical bias to the true bias.
For n ∈ N_+,
mse(U_n ∣ p) = {p[n − 2a(a + b)] + p²[(a + b)² − n] + a²} / (a + b + n)²,  p ∈ (0, 1)  (7.4.13)
The sequence (U_n : n ∈ N_+) is mean-square consistent.
Proof
In the beta coin experiment, vary the parameters and note the change in the mean square error. Now set n = 10
and p = 0.7, and set a = b = 1 . Run the simulation 1000 times. Note the estimate of p and the shape and location of
the posterior probability density function of p on each update. Compare the empirical mean square error to the true
mean square error.
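The comparison asked for in this exercise can also be made in code. The sketch below (NumPy assumed; the number of runs and the seed are arbitrary illustrative choices) estimates the bias and mean square error of U_n by simulation with the same settings and compares them with the closed-form expressions (7.4.11) and (7.4.13).

```python
import numpy as np

# Minimal sketch: Monte Carlo check of the bias and mean square error formulas.
rng = np.random.default_rng(1)
n, p, a, b = 10, 0.7, 1.0, 1.0
runs = 100_000

y = rng.binomial(n, p, size=runs)          # Y_n for each simulated run
u = (a + y) / (a + b + n)                  # Bayesian estimates U_n

emp_bias, emp_mse = (u - p).mean(), ((u - p) ** 2).mean()
true_bias = (a * (1 - p) - b * p) / (a + b + n)
true_mse = (p * (n - 2 * a * (a + b)) + p ** 2 * ((a + b) ** 2 - n) + a ** 2) / (a + b + n) ** 2

print("bias:", emp_bias, "vs", true_bias)
print("mse: ", emp_mse, "vs", true_mse)
```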
Interestingly, we can choose a and b so that U_n has mean square error that is independent of the unknown parameter p:
if a = b = √n/2, then
mse(U_n ∣ p) = n / [4(n + √n)²],  p ∈ (0, 1)  (7.4.16)
In the beta coin experiment, set n = 36 and a = b = 3 . Vary p and note that the mean square error does not change.
Now set p = 0.8 and run the simulation 1000 times. Note the estimate of p and the shape and location of the
posterior probability density function on each update. Compare the empirical bias and mean square error to the
true values.
Recall that the method of moments estimator and the maximum likelihood estimator of p (on the interval (0, 1)) is the
sample mean (the proportion of successes):
M_n = Y_n / n = (1/n) ∑_{i=1}^n X_i  (7.4.17)
This estimator is unbiased and has mean square error p(1 − p)/n. To see the connection between the estimators, note
from (6) that
U_n = [(a + b)/(a + b + n)] · [a/(a + b)] + [n/(a + b + n)] · M_n  (7.4.18)
So U_n is a weighted average of a/(a + b) (the mean of the prior distribution) and M_n (the maximum likelihood
estimator).
Suppose now that the parameter space is {1/2, 1}. This setup corresponds to the tossing of a coin that is either fair or two-headed, but we
don't know which. We model p with a random variable P that has the prior probability density function h given by
h(1) = a, h(1/2) = 1 − a, where a ∈ (0, 1) is chosen to reflect our prior knowledge of the probability that the coin is
two-headed. If we are completely ignorant, we might let a = 1/2 (the non-informative prior). If we think the coin is
more likely to be fair, we would choose a closer to 0.
The posterior probability density function of P given X_n = (X_1, X_2, …, X_n) is given by
1. h(1 ∣ X_n) = 2^n a / [2^n a + (1 − a)] if Y_n = n, and h(1 ∣ X_n) = 0 if Y_n < n
2. h(1/2 ∣ X_n) = (1 − a) / [2^n a + (1 − a)] if Y_n = n, and h(1/2 ∣ X_n) = 1 if Y_n < n
Proof
Now let
p_n = [2^{n+1} a + (1 − a)] / [2^{n+1} a + 2(1 − a)]  (7.4.23)
The Bayesian estimator U_n of p is given by
1. U_n = p_n if Y_n = n
2. U_n = 1/2 if Y_n < n
Proof
If we observe Y_n < n then U_n gives the correct answer 1/2. This certainly makes sense since we know that we do not
have the two-headed coin. On the other hand, if we observe Y_n = n then we are not certain which coin we have, and
the Bayesian estimate p_n is not even in the parameter space! But note that p_n → 1 as n → ∞ exponentially fast. Next
let's compute the bias and mean-square error for a given p ∈ {1/2, 1}.
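Here is a small Python sketch (NumPy assumed; the number of tosses, the prior weight a, and the seed are arbitrary illustrative choices) of the fair-versus-two-headed coin computation, using the posterior density and the estimator described above.

```python
import numpy as np

# Minimal sketch: posterior and Bayesian estimate for the fair vs. two-headed coin.
rng = np.random.default_rng(2)
n, a = 8, 0.5                       # n tosses; prior probability a that the coin is two-headed
true_p = 0.5                        # the coin actually used (1/2 = fair, 1 = two-headed)

y = rng.binomial(n, true_p)         # number of heads Y_n
if y < n:
    post_two_headed = 0.0           # a single tail rules out the two-headed coin
    estimate = 0.5
else:
    post_two_headed = 2 ** n * a / (2 ** n * a + (1 - a))
    estimate = (2 ** (n + 1) * a + (1 - a)) / (2 ** (n + 1) * a + 2 * (1 - a))   # p_n

print("heads:", y, "P(p = 1 | data):", post_two_headed, "estimate U_n:", estimate)
```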
For n ∈ N_+,
1. bias(U_n ∣ 1) = p_n − 1
2. bias(U_n ∣ 1/2) = (1/2)^n (p_n − 1/2)
Thus U_n is negatively biased given p = 1 and positively biased given p = 1/2; in both cases the bias depends on a
through p_n and converges to 0 as n → ∞.
For n ∈ N_+,
1. mse(U_n ∣ 1) = (p_n − 1)²
2. mse(U_n ∣ 1/2) = (1/2)^n (p_n − 1/2)²
The Geometric Distribution
Suppose that X = (X_1, X_2, …) is a sequence of independent random variables, each having the geometric distribution
on N_+ with unknown success parameter p ∈ (0, 1). Recall that these variables can be interpreted as the number of trials
between successive successes in a sequence of Bernoulli trials. Given p, the geometric distribution has probability
density function
g(x ∣ p) = p(1 − p)^{x−1},  x ∈ N_+  (7.4.29)
Once again for n ∈ N_+, let Y_n = ∑_{i=1}^n X_i. In this setting, Y_n is the trial number of the nth success, and given p, has the
negative binomial distribution with parameters n and p. As before, suppose that we model p with a random variable P
having a prior beta distribution with left parameter a ∈ (0, ∞) and right parameter b ∈ (0, ∞).
The posterior distribution of P given X_n = (X_1, X_2, …, X_n) is beta with left parameter a + n and right parameter
b + (Y_n − n).
Proof
Thus, the beta distribution is conjugate to the geometric distribution. Moreover, note that in the posterior beta
distribution, the left parameter is increased by the number of successes n while the right parameter is increased by the
number of failures Y_n − n, just as in the Bernoulli model. In particular, the posterior left parameter is deterministic and
depends on the data only through the sample size n.
The Bayesian estimator of p based on X_n is
V_n = (a + n) / (a + b + Y_n)  (7.4.32)
Proof
Recall that the method of moments estimator of p, and the maximum likelihood estimator of p on the interval (0, 1), are
both W_n = 1/M_n = n/Y_n. To see the connection between the estimators, note from (19) that
1/V_n = [a/(a + n)] · [(a + b)/a] + [n/(a + n)] · (1/W_n)  (7.4.33)
So 1/V_n (the reciprocal of the Bayesian estimator) is a weighted average of (a + b)/a (the reciprocal of the mean of the
prior distribution) and 1/W_n (the reciprocal of the maximum likelihood estimator).
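As with the Bernoulli model, the conjugate update is simple to carry out in code. The Python sketch below (NumPy assumed; the parameter values and seed are arbitrary illustrative choices) simulates geometric data and computes the posterior parameters, the Bayesian estimator V_n, and the maximum likelihood estimator W_n.

```python
import numpy as np

# Minimal sketch: beta prior with geometric data.
rng = np.random.default_rng(3)
p, n, a, b = 0.3, 50, 2.0, 2.0

x = rng.geometric(p, size=n)            # geometric samples on {1, 2, ...}
y = x.sum()                             # Y_n, the trial number of the nth success

post_a, post_b = a + n, b + (y - n)     # posterior is beta(a + n, b + (Y_n - n))
v_n = (a + n) / (a + b + y)             # Bayesian estimator V_n = (a + n)/(a + b + Y_n)
w_n = n / y                             # maximum likelihood estimator W_n = n / Y_n

print("posterior: beta(", post_a, ",", post_b, ")", "V_n:", v_n, "W_n:", w_n)
```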
The Poisson Distribution
Suppose that X = (X_1, X_2, …) is a sequence of independent random variables, each having the Poisson distribution with
unknown parameter λ ∈ (0, ∞). Recall that the Poisson distribution is often used to model the number of “random points” in a
region of time or space and is studied in more detail in the chapter on the Poisson Process. The distribution is named
for the inimitable Simeon Poisson and given λ, has probability density function
g(x ∣ λ) = e^{−λ} λ^x / x!,  x ∈ N  (7.4.34)
Once again, for n ∈ N_+, let Y_n = ∑_{i=1}^n X_i. Given λ, the random variable Y_n also has a Poisson distribution, but with
parameter nλ.
Suppose now that we model λ with a random variable Λ having a prior gamma distribution with shape parameter
k ∈ (0, ∞) and rate parameter r ∈ (0, ∞). As usual, k and r are chosen to reflect our prior knowledge of λ. Thus the
prior probability density function of Λ is h(λ) = [r^k / Γ(k)] λ^{k−1} e^{−rλ} for λ ∈ (0, ∞), and the mean is k/r. The scale
parameter of the gamma distribution is b = 1/r, but the formulas will work out nicer if we use the rate parameter.
The posterior distribution of Λ given X_n = (X_1, X_2, …, X_n) is gamma with shape parameter k + Y_n and rate
parameter r + n.
Proof
It follows that the gamma distribution is conjugate to the Poisson distribution. Note that the posterior rate parameter is
deterministic and depends on the data only through the sample size n.
The Bayesian estimator of λ based on X_n is V_n = (k + Y_n) / (r + n).
Proof
Since V_n is a linear function of Y_n, and we know the distribution of Y_n given λ ∈ (0, ∞), we can compute the bias and
mean square error of V_n directly.
For n ∈ N_+,
bias(V_n ∣ λ) = (k − rλ) / (r + n),  λ ∈ (0, ∞)  (7.4.40)
Note that, as before, we cannot choose k and r to make V_n unbiased, without knowledge of λ.
For n ∈ N_+,
mse(V_n ∣ λ) = [nλ + (k − rλ)²] / (r + n)²,  λ ∈ (0, ∞)  (7.4.42)
Recall that the method of moments estimator of λ and the maximum likelihood estimator of λ on the interval (0, ∞)
are both M_n = Y_n/n, the sample mean. This estimator is unbiased and has mean square error λ/n. To see the
connection between the estimators, note that
V_n = [r/(r + n)] · (k/r) + [n/(r + n)] · M_n
So V_n is a weighted average of k/r (the mean of the prior distribution) and M_n (the maximum likelihood estimator).
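The gamma-Poisson update and the weighted-average identity can be checked directly; here is a minimal Python sketch (NumPy assumed; the parameter values and seed are arbitrary illustrative choices).

```python
import numpy as np

# Minimal sketch: gamma prior with Poisson data, and the weighted-average identity.
rng = np.random.default_rng(4)
lam, n, k, r = 2.5, 40, 3.0, 1.0

x = rng.poisson(lam, size=n)
y = x.sum()                                       # Y_n

post_shape, post_rate = k + y, r + n              # posterior is gamma(k + Y_n, r + n)
v_n = post_shape / post_rate                      # Bayesian estimator V_n
m_n = y / n                                       # sample mean (the MLE)

# V_n is a weighted average of the prior mean k/r and the MLE M_n.
check = (r / (r + n)) * (k / r) + (n / (r + n)) * m_n
print("V_n:", v_n, "weighted average:", check)
```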
The Normal Distribution
Suppose that X = (X_1, X_2, …) is a sequence of independent random variables, each having the normal distribution
with unknown mean μ ∈ R but known variance σ² ∈ (0, ∞). Of course, the normal distribution plays an especially
important role in statistics, in part because of the central limit theorem. The normal distribution is widely used to
model physical quantities subject to numerous small, random errors. In many statistical applications, the variance of
the normal distribution is more stable than the mean, so the assumption that the variance is known is not entirely
artificial. Recall that the normal probability density function (given μ) is
g(x ∣ μ) = [1 / (σ √(2π))] exp[−(1/2) ((x − μ)/σ)²],  x ∈ R  (7.4.45)
Again, for n ∈ N_+ let Y_n = ∑_{i=1}^n X_i. Recall that Y_n also has a normal distribution (given μ) but with mean nμ and
variance nσ².
Suppose now that μ is modeled by a random variable Ψ that has a prior normal distribution with mean a ∈ R and
variance b² ∈ (0, ∞). As usual, a and b are chosen to reflect our prior knowledge of μ. An interesting special case is
when we take b = σ, so that the variance of the prior distribution of Ψ is the same as the variance of the underlying
sampling distribution.
The posterior distribution of Ψ given X_n = (X_1, X_2, …, X_n) is normal with mean (Y_n b² + a σ²) / (n b² + σ²) and
variance σ² b² / (n b² + σ²).
Proof
Therefore, the normal distribution is conjugate to the normal distribution with unknown mean and known variance.
Note that the posterior variance is deterministic, and depends on the data only through the sample size n. In the special
case that b = σ, the posterior distribution of Ψ given X_n is normal with mean (Y_n + a)/(n + 1) and variance
σ²/(n + 1).
The Bayesian estimator of μ based on X_n is U_n = (Y_n b² + a σ²) / (n b² + σ²).
Proof
Note that U_n = (Y_n + a)/(n + 1) in the special case that b = σ.
For n ∈ N_+,
bias(U_n ∣ μ) = σ²(a − μ) / (σ² + n b²),  μ ∈ R  (7.4.56)
When b = σ, bias(U_n ∣ μ) = (a − μ)/(n + 1).
For n ∈ N_+,
mse(U_n ∣ μ) = [n σ² b⁴ + σ⁴ (a − μ)²] / (σ² + n b²)²,  μ ∈ R  (7.4.58)
When b = σ, mse(U_n ∣ μ) = [nσ² + (a − μ)²]/(n + 1)². Recall that the method of moments estimator of μ and the
maximum likelihood estimator of μ on R are both M_n = Y_n/n, the sample mean. This estimator is unbiased and has
mean square error var(M_n) = σ²/n. To see the connection between the estimators, note from (25) that
U_n = [σ²/(n b² + σ²)] · a + [n b²/(n b² + σ²)] · M_n  (7.4.60)
So U_n is a weighted average of a (the mean of the prior distribution) and M_n (the maximum likelihood estimator).
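Here is a minimal Python sketch of the normal-normal update (NumPy assumed; the parameter values and seed are arbitrary illustrative choices), which also verifies the weighted-average form (7.4.60) of U_n.

```python
import numpy as np

# Minimal sketch: normal prior for the mean, with known variance sigma^2.
rng = np.random.default_rng(5)
mu, sigma, n = 10.0, 2.0, 25          # true mean, known std deviation, sample size
a, b = 8.0, 3.0                       # prior mean and prior std deviation of Psi

x = rng.normal(mu, sigma, size=n)
y = x.sum()                           # Y_n
m_n = y / n                           # sample mean (the MLE)

# Posterior of Psi is normal with the mean and variance given above.
post_var = sigma ** 2 * b ** 2 / (n * b ** 2 + sigma ** 2)
u_n = (y * b ** 2 + a * sigma ** 2) / (n * b ** 2 + sigma ** 2)   # posterior mean = U_n

# U_n as a weighted average of the prior mean a and the sample mean M_n, as in (7.4.60).
check = (sigma ** 2 / (n * b ** 2 + sigma ** 2)) * a + (n * b ** 2 / (n * b ** 2 + sigma ** 2)) * m_n
print("U_n:", u_n, "weighted average:", check, "posterior sd:", post_var ** 0.5)
```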
The Beta Distribution
Suppose that X = (X_1, X_2, …) is a sequence of independent random variables, each having the beta distribution with
unknown left shape parameter a ∈ (0, ∞) and right shape parameter b = 1. The beta distribution is widely used to
model random proportions and probabilities and other variables that take values in bounded intervals (scaled to take
values in (0, 1)). Recall that the probability density function (given a) is
g(x ∣ a) = a x^{a−1},  x ∈ (0, 1)  (7.4.61)
Suppose now that a is modeled by a random variable A that has a prior gamma distribution with shape parameter
k ∈ (0, ∞) and rate parameter r ∈ (0, ∞). As usual, k and r are chosen to reflect our prior knowledge of a. Thus the
mean of the prior distribution of A is k/r.
The posterior distribution of A given X_n = (X_1, X_2, …, X_n) is gamma, with shape parameter k + n and rate
parameter r − ln(X_1 X_2 ⋯ X_n).
Proof
Thus, the gamma distribution is conjugate to the beta distribution with unknown left parameter and right parameter 1.
Note that the posterior shape parameter is deterministic and depends on the data only through the sample size n.
The Bayesian estimator of a based on X_n is
U_n = (k + n) / [r − ln(X_1 X_2 ⋯ X_n)]  (7.4.65)
Proof
Given the complicated structure, the bias and mean square error of U_n given a ∈ (0, ∞) would be difficult to compute
explicitly. Recall that the maximum likelihood estimator of a is W_n = −n / ln(X_1 X_2 ⋯ X_n). To see the connection
between the estimators, note that
1/U_n = [k/(k + n)] · (r/k) + [n/(k + n)] · (1/W_n)
So 1/U_n (the reciprocal of the Bayesian estimator) is a weighted average of r/k (the reciprocal of the mean of the prior
distribution) and 1/W_n (the reciprocal of the maximum likelihood estimator).
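The update for the beta model with right parameter 1 involves only the sample size and the log of the product of the data. The Python sketch below (NumPy assumed; the parameter values and seed are arbitrary illustrative choices) computes U_n and the maximum likelihood estimator W_n from simulated data.

```python
import numpy as np

# Minimal sketch: gamma prior for the left shape parameter of a beta(a, 1) distribution.
rng = np.random.default_rng(6)
a_true, n, k, r = 3.0, 60, 2.0, 1.0

x = rng.beta(a_true, 1.0, size=n)            # beta(a, 1) samples in (0, 1)
log_prod = np.log(x).sum()                   # ln(X_1 X_2 ... X_n), which is negative

post_shape, post_rate = k + n, r - log_prod  # posterior is gamma(k + n, r - ln(X_1 ... X_n))
u_n = post_shape / post_rate                 # Bayesian estimator U_n
w_n = -n / log_prod                          # maximum likelihood estimator W_n

print("U_n:", u_n, "W_n:", w_n)
```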
The Pareto Distribution
Suppose that X = (X_1, X_2, …) is a sequence of independent random variables, each having the Pareto distribution
with unknown shape parameter a ∈ (0, ∞) and scale parameter b = 1. The Pareto distribution is used to model certain
financial variables and other variables with heavy-tailed distributions, and is named for Vilfredo Pareto. Recall that the
probability density function (given a) is
g(x ∣ a) = a / x^{a+1},  x ∈ [1, ∞)  (7.4.67)
Suppose now that a is modeled by a random variable A that has a prior gamma distribution with shape parameter
k ∈ (0, ∞) and rate parameter r ∈ (0, ∞). As usual, k and r are chosen to reflect our prior knowledge of a. Thus the
mean of the prior distribution of A is k/r.
The posterior distribution of A given X_n = (X_1, X_2, …, X_n) is gamma with shape parameter k + n and rate
parameter r + ln(X_1 X_2 ⋯ X_n).
Proof
Thus, the gamma distribution is conjugate to the Pareto distribution with unknown shape parameter. Note that the
posterior shape parameter is deterministic and depends on the data only through the sample size n. The Bayesian
estimator of a based on X_n is
U_n = (k + n) / [r + ln(X_1 X_2 ⋯ X_n)]  (7.4.71)
Proof
Given the complicated structure, the bias and mean square error of U_n given a ∈ (0, ∞) would be difficult to compute
explicitly. Recall that the maximum likelihood estimator of a is W_n = n / ln(X_1 X_2 ⋯ X_n). To see the connection
between the estimators, note that
1/U_n = [k/(k + n)] · (r/k) + [n/(k + n)] · (1/W_n)
So 1/U_n (the reciprocal of the Bayesian estimator) is a weighted average of r/k (the reciprocal of the mean of the prior
distribution) and 1/W_n (the reciprocal of the maximum likelihood estimator).
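The Pareto case works the same way. The Python sketch below (NumPy assumed; note that NumPy's pareto sampler returns values on [0, ∞), so 1 is added to obtain support [1, ∞); the parameter values and seed are arbitrary illustrative choices) computes U_n and checks the weighted-average identity for 1/U_n.

```python
import numpy as np

# Minimal sketch: gamma prior for the shape parameter of a Pareto distribution with scale 1.
rng = np.random.default_rng(7)
a_true, n, k, r = 2.0, 80, 2.0, 1.0

x = 1.0 + rng.pareto(a_true, size=n)        # shift so that the support is [1, inf)
log_prod = np.log(x).sum()                  # ln(X_1 X_2 ... X_n), positive here since X_i >= 1

u_n = (k + n) / (r + log_prod)              # Bayesian estimator U_n
w_n = n / log_prod                          # maximum likelihood estimator W_n

# The reciprocal of U_n is a weighted average of r/k and 1/W_n.
check = (k / (k + n)) * (r / k) + (n / (k + n)) * (1.0 / w_n)
print("1/U_n:", 1.0 / u_n, "weighted average:", check)
```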
This page titled 7.4: Bayesian Estimation is shared under a CC BY 2.0 license and was authored, remixed, and/or curated by Kyle
Siegrist (Random Services) via source content that was edited to the style and standards of the LibreTexts platform.