Bayesian statistics
So far we have thought of probabilities as the long-run
“success frequency”: #successes / #trials → P(success).
In Bayesian statistics probabilities are subjective!
Examples
* Probability that two companies merge
* Probability that a stock goes up
* Probability that it rains tomorrow
We typically want to make inferences about a parameter θ, for
example µ, σ² or π. How is this done using subjective
probabilities?
1 lecture 8
Bayesian idea: We describe our “knowledge” about the
parameter of interest, θ, in terms of a distribution π(θ). This
is known as the prior distribution (or just prior), as it
describes the situation before we see any data.
Example: Assume θ is the probability of success.
Prior distributions describing what value we think θ has:
Posterior
Let x denote our data. The conditional distribution of θ
given data x is denoted the posterior distribution:
$$\pi(\theta \mid x) = \frac{f(x \mid \theta)\,\pi(\theta)}{g(x)}$$
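Here f(x|θ) specifies how the data are distributed conditional on θ,
and g(x) is the marginal distribution of the data, which normalises
the posterior.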
Example:
Let x denote the number of successes in n trials.
Conditional on θ, x follows a binomial distribution:
$$f(x \mid \theta) = \binom{n}{x}\,\theta^{x}(1-\theta)^{n-x}$$
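As a rough illustration of how the posterior formula and the binomial likelihood fit together, the sketch below approximates π(θ | x) on a grid of θ values. The function names, the grid size and the flat prior are assumptions made for this example only; they are not part of the lecture.

```python
from math import comb


def binomial_likelihood(x, n, theta):
    """f(x | theta): probability of x successes in n trials."""
    return comb(n, x) * theta**x * (1 - theta) ** (n - x)


def grid_posterior(x, n, prior, grid_size=1001):
    """Approximate pi(theta | x) on an evenly spaced grid over [0, 1]."""
    step = 1.0 / (grid_size - 1)
    thetas = [i * step for i in range(grid_size)]
    # Numerator of Bayes' formula: likelihood times prior at each grid point.
    unnormalised = [binomial_likelihood(x, n, t) * prior(t) for t in thetas]
    # g(x) is approximated by a Riemann sum over the grid.
    g_x = sum(unnormalised) * step
    posterior = [u / g_x for u in unnormalised]
    return thetas, posterior, step


def uniform_prior(theta):
    """Assumed flat prior on [0, 1] (a hypothetical choice for illustration)."""
    return 1.0


thetas, posterior, step = grid_posterior(x=3, n=10, prior=uniform_prior)
posterior_mean = sum(t * p for t, p in zip(thetas, posterior)) * step
posterior_mode = thetas[posterior.index(max(posterior))]
print(f"posterior mean ~ {posterior_mean:.3f}, posterior mode ~ {posterior_mode:.3f}")
```

With a flat prior the posterior is proportional to the likelihood, so the mode should sit at x/n = 0.3, while the mean is pulled slightly towards 0.5.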
Posterior – some data
We now observe n = 10 experiments with x = 3 successes, i.e.
x/n = 0.3.
Posterior distributions – our “knowledge” after having seen
data.
Shaded area: Prior distribution
Solid line: Posterior distribution
Notice that the posteriors are moving towards 0.3.
Posterior – some data
We now observe n = 100 experiments with x = 30 successes,
i.e. x/n = 0.3.
Posterior distributions – our “knowledge” after having seen
data.
Shaded area: Prior distribution
Solid line: Posterior distribution
Notice that the posteriors are almost identical: with this much
data, the choice of prior has little influence.
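The effect of the sample size can be checked numerically. The sketch below assumes Beta priors, for which the posterior is again a Beta distribution (see the mathematical details), and compares posterior means for n = 10 and n = 100. The three priors are hypothetical choices made for illustration; they are not necessarily the ones shown in the figures.

```python
def beta_posterior(alpha, beta, x, n):
    """Parameters of the Beta posterior after x successes in n trials."""
    return alpha + x, beta + n - x


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical priors (alpha, beta), chosen only for illustration.
priors = {
    "flat": (1, 1),
    "favours small theta": (2, 8),
    "favours large theta": (8, 2),
}

for n, x in [(10, 3), (100, 30)]:
    for name, (a, b) in priors.items():
        a_post, b_post = beta_posterior(a, b, x, n)
        mean = beta_mean(a_post, b_post)
        print(f"n={n:3d}  prior: {name:20s}  posterior mean = {mean:.3f}")
```

Under these assumed priors the posterior means are spread widely for n = 10 but lie close to 0.3 for n = 100, which matches what the plots show.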
Mathematical details
The prior is given by a so-called Beta distribution with
parameters α > 0 and β > 0:
$$\pi(\theta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,\theta^{\alpha-1}(1-\theta)^{\beta-1} \quad \text{for } 0 \le \theta \le 1$$
The posterior then becomes
$$\pi(\theta \mid x) = \frac{\Gamma(\alpha+\beta+n)}{\Gamma(\alpha+x)\,\Gamma(\beta+n-x)}\,\theta^{\alpha+x-1}(1-\theta)^{\beta+n-x-1}$$
a Beta distribution with parameters α+x and β+n−x.
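The step from prior to posterior can be sketched as follows: multiply the binomial likelihood by the Beta prior and drop every factor that does not depend on θ (the binomial coefficient, the Gamma functions and g(x)).

```latex
\begin{aligned}
\pi(\theta \mid x)
  &\propto f(x \mid \theta)\,\pi(\theta) \\
  &\propto \theta^{x}(1-\theta)^{n-x}\,\theta^{\alpha-1}(1-\theta)^{\beta-1} \\
  &= \theta^{\alpha+x-1}(1-\theta)^{\beta+n-x-1}
\end{aligned}
```

The last line is the θ-dependent part of a Beta density with parameters α + x and β + n − x, so the normalising constant must be Γ(α + β + n) / (Γ(α + x) Γ(β + n − x)).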