Lecture 2. Estimation, bias, and mean squared error
2. Estimation and bias
2.1. Estimators
Suppose that X1, ..., Xn are iid, each with pdf/pmf fX(x | θ), θ unknown.
We aim to estimate θ by a statistic, i.e. by a function T of the data.
If X = x = (x1, ..., xn) then our estimate is θ̂ = T(x) (does not involve θ).
Then T (X) is our estimator of θ, and is a rv since it inherits random
fluctuations from those of X.
The distribution of T = T (X) is called its sampling distribution.
Example
Let X1 , . . . , Xn be iid N(µ, 1).
A possible estimator for µ is $T(X) = \frac{1}{n} \sum_{i=1}^{n} X_i$.
For any particular observed sample x, our estimate is $T(x) = \frac{1}{n} \sum_{i=1}^{n} x_i$.
We have T (X) ∼ N(µ, 1/n).
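The claim that the sample mean has sampling distribution N(µ, 1/n) can be checked by simulation. The following is a minimal sketch, not part of the original notes: the function name `sample_mean_draws`, the seed, and the values of `mu`, `n` and `reps` are all illustrative assumptions.

```python
import random
import statistics


def sample_mean_draws(mu, n, reps, seed=0):
    """Draw `reps` realisations of T(X) = (1/n) * sum(X_i), with X_i iid N(mu, 1)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [
        statistics.fmean(rng.gauss(mu, 1.0) for _ in range(n))
        for _ in range(reps)
    ]


# Illustrative (assumed) settings: true mean 2, samples of size 25.
mu, n, reps = 2.0, 25, 20000
draws = sample_mean_draws(mu, n, reps)

# The empirical mean of T should be close to mu, and its variance close to 1/n.
print("mean of T:    ", statistics.fmean(draws))
print("variance of T:", statistics.pvariance(draws), "(1/n =", 1 / n, ")")
```

Each entry of `draws` is one realisation of the estimator, so the list as a whole approximates the sampling distribution of T.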
2.2. Bias
If θ̂ = T(X) is an estimator of θ, then the bias of θ̂ is the difference between its
expectation and the 'true' value, i.e.
$$\mathrm{bias}(\hat\theta) = E_\theta(\hat\theta) - \theta.$$
An estimator T(X) is unbiased for θ if Eθ T(X) = θ for all θ; otherwise it is biased.
In the above example, Eµ(T) = µ, so T is unbiased for µ.
[Notation note: when a parameter subscript is used with an expectation or
variance, it refers to the parameter that is being conditioned on, i.e. the
expectation or variance will be a function of the subscript.]
2.3. Mean squared error
Recall that an estimator T is a function of the data, and hence is a random
quantity. Roughly, we prefer estimators whose sampling distributions "cluster
more closely" around the true value of θ, whatever that value might be.
Definition 2.1
The mean squared error (mse) of an estimator θ̂ is $E_\theta(\hat\theta - \theta)^2$.
For an unbiased estimator, the mse is just the variance. In general
$$
\begin{aligned}
E_\theta(\hat\theta - \theta)^2 &= E_\theta(\hat\theta - E_\theta\hat\theta + E_\theta\hat\theta - \theta)^2 \\
&= E_\theta(\hat\theta - E_\theta\hat\theta)^2 + \left[E_\theta(\hat\theta) - \theta\right]^2 + 2\left[E_\theta(\hat\theta) - \theta\right] E_\theta\left[\hat\theta - E_\theta\hat\theta\right] \\
&= \mathrm{var}_\theta(\hat\theta) + \mathrm{bias}^2(\hat\theta),
\end{aligned}
$$
where bias(θ̂) = Eθ(θ̂) − θ.
[NB: sometimes it can be preferable to have a biased estimator with a low
variance - this is sometimes known as the 'bias-variance tradeoff'.]
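The decomposition mse = variance + bias² can be illustrated numerically. The sketch below is an assumption-laden illustration rather than anything from the notes: it takes a deliberately biased estimator, c times the sample mean of iid N(µ, 1) data with a hypothetical shrinkage factor `c`, and computes the three quantities over simulated draws. The helper name `mse_decomposition` and all numerical settings are invented for the example. When all three averages are taken over the same draws, the identity holds exactly (up to rounding), because the empirical cross term is zero, just as its expectation is in the derivation.

```python
import random
import statistics


def mse_decomposition(estimates, theta):
    """Return (mse, variance, bias) computed over a list of simulated estimates."""
    mean_est = statistics.fmean(estimates)
    bias = mean_est - theta
    variance = statistics.fmean((t - mean_est) ** 2 for t in estimates)
    mse = statistics.fmean((t - theta) ** 2 for t in estimates)
    return mse, variance, bias


rng = random.Random(1)       # fixed seed for reproducibility
mu, n, reps = 1.0, 10, 5000  # illustrative (assumed) settings
c = 0.8                      # hypothetical shrinkage factor; makes the estimator biased

# Each estimate is c * (sample mean of n iid N(mu, 1) observations).
estimates = [
    c * statistics.fmean(rng.gauss(mu, 1.0) for _ in range(n))
    for _ in range(reps)
]

mse, variance, bias = mse_decomposition(estimates, mu)
print("mse:              ", mse)
print("variance + bias^2:", variance + bias ** 2)
print("bias:             ", bias, "(theory: (c - 1) * mu =", (c - 1) * mu, ")")
```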
2.4. Example: Alternative estimators for Binomial mean
Suppose X ∼ Binomial(n, θ), and we want to estimate θ.
The standard estimator is $T_U = X/n$, which is unbiased since
$E_\theta(T_U) = n\theta/n = \theta$.
$T_U$ has variance $\mathrm{var}_\theta(T_U) = \mathrm{var}_\theta(X)/n^2 = \theta(1-\theta)/n$.
Consider an alternative estimator
$$T_B = \frac{X+1}{n+2} = w\,\frac{X}{n} + (1-w)\,\frac{1}{2}, \quad \text{where } w = \frac{n}{n+2}.$$
$T_B$ is a weighted average of $X/n$ and $\frac{1}{2}$.
e.g. if X is 8 successes out of 10 trials, we would estimate the underlying
success probability as $T_B(8) = 9/12 = 0.75$, rather than 0.8.
Then
$$E_\theta(T_B) - \theta = \frac{n\theta+1}{n+2} - \theta = (1-w)\left(\frac{1}{2} - \theta\right),$$
and so it is biased. Its variance is
$$\mathrm{var}_\theta(T_B) = \frac{\mathrm{var}_\theta(X)}{(n+2)^2} = w^2\,\theta(1-\theta)/n.$$
Now
$$\mathrm{mse}(T_U) = \mathrm{var}_\theta(T_U) + \mathrm{bias}^2(T_U) = \theta(1-\theta)/n,$$
$$\mathrm{mse}(T_B) = \mathrm{var}_\theta(T_B) + \mathrm{bias}^2(T_B) = w^2\,\theta(1-\theta)/n + (1-w)^2\left(\frac{1}{2} - \theta\right)^2.$$
So the biased estimator has smaller MSE in much of the range of θ.
$T_B$ may be preferable if we do not think θ is near 0 or 1.
So our prior judgement about θ might affect our choice of estimator.
We will see more of this when we come to Bayesian methods.
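To see where the biased estimator wins, the two mse formulas can be evaluated on a grid of θ values. This is a minimal sketch: the function names, the choice n = 10 and the grid spacing of 0.01 are illustrative assumptions. Only the formulas themselves come from the example above.

```python
def mse_unbiased(theta, n):
    """mse of T_U = X / n: theta * (1 - theta) / n."""
    return theta * (1 - theta) / n


def mse_biased(theta, n):
    """mse of T_B = (X + 1) / (n + 2): variance plus squared bias."""
    w = n / (n + 2)
    variance = w ** 2 * theta * (1 - theta) / n
    bias = (1 - w) * (0.5 - theta)
    return variance + bias ** 2


n = 10                                  # illustrative (assumed) number of trials
grid = [i / 100 for i in range(101)]    # theta = 0.00, 0.01, ..., 1.00

# Grid points at which the biased estimator has strictly smaller mse.
better = [th for th in grid if mse_biased(th, n) < mse_unbiased(th, n)]
print(f"n = {n}: T_B has smaller mse for theta between {min(better)} and {max(better)}")

# The worked example from the text: 8 successes out of 10 trials.
print("T_U(8) =", 8 / 10, " T_B(8) =", (8 + 1) / (10 + 2))
```

Rerunning with larger n shows the two curves approaching each other, since w tends to 1 and the two estimators become nearly identical.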
2.5. Why unbiasedness is not necessarily so great
Suppose X ∼ Poisson(λ), and for some reason (which escapes me for the
moment), you want to estimate $\theta = [P(X = 0)]^2 = e^{-2\lambda}$.
Then any unbiased estimator T(X) must satisfy $E_\theta(T(X)) = \theta$, or equivalently
$$E_\lambda(T(X)) = e^{-\lambda} \sum_{x=0}^{\infty} T(x)\,\frac{\lambda^x}{x!} = e^{-2\lambda}.$$
The only function T that can satisfy this equation is $T(X) = (-1)^X$
[multiplying through by $e^{\lambda}$ gives $\sum_{x} T(x)\,\lambda^x/x! = e^{-\lambda} = \sum_{x} (-1)^x \lambda^x/x!$, and the coefficients of the two power series must match].
Thus the only unbiased estimator estimates $e^{-2\lambda}$ to be 1 if X is even, −1 if X is
odd.
This is not sensible.
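That T(X) = (−1)^X really is unbiased for e^{−2λ} can be confirmed numerically by summing the series for its expectation. The sketch below is illustrative only: the function name `expectation_of_T`, the truncation at 100 terms and the test values of λ are assumptions made for the example.

```python
import math


def expectation_of_T(lam, terms=100):
    """E_lambda[(-1)^X] for X ~ Poisson(lam), via a truncated series."""
    total = 0.0
    pmf = math.exp(-lam)          # P(X = 0)
    for x in range(terms):
        total += (-1) ** x * pmf  # T(x) * P(X = x)
        pmf *= lam / (x + 1)      # P(X = x + 1) from P(X = x)
    return total


# Compare the series with the target e^{-2 lambda} for a few assumed rates.
for lam in (0.5, 1.0, 3.0):
    print(lam, expectation_of_T(lam), math.exp(-2 * lam))
```

The expectation matches the target, yet every individual estimate is either 1 or −1, while the quantity being estimated always lies strictly between 0 and 1.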