Problem Set 2 Solution
DEPARTMENT OF STATISTICS
Question 1: a) Denoting T = ∑_{i=1}^n X_i, you can factorise L(X, µ) with h(X) = exp(−∑_{i=1}^n x_i²/2) and

g(T, µ) = exp(−nµ²/2) exp(Tµ) · 1/(√(2π))^n.
b) Denoting T = ∑_{i=1}^n X_i², you can factorise L(X, σ²) with

h(X) = 1,  g(T, σ²) = exp(−T/(2σ²)) · 1/(√(2π) σ)^n.
c) For the U(θ, θ + 1) sample we have

L(X, θ) = ∏_{i=1}^n I_(θ,θ+1)(x_i) = I_(θ,θ+1)(x_(n)) I_(θ,θ+1)(x_(1)) = I_(x_(n)−1, ∞)(θ) I_(−∞, x_(1))(θ).

Hence T = (X_(1), X_(n)) can be taken as a sufficient vector-statistic.
d) Denoting T = ∑_{i=1}^n X_i, you can factorise L(X, λ) with

g(T, λ) = exp(−nλ) λ^T  and  h(X) = 1/∏_{i=1}^n x_i!.

Since ∑_{i=1}^n X_i ∼ Po(nλ), the conditional probability of the sample given T = t can be shown to be equal to

t! / (n^t ∏_{i=1}^n x_i!)

and it does not depend on λ. Hence T = ∑_{i=1}^n X_i is sufficient according to the original definition of sufficiency.
Question 2: For S = X1 + X2 + X3 we already know that it is sufficient (n = 3 is a special case of the general case considered at the lecture). To show that T = X1 X2 + X3 is not sufficient, it suffices to show that, say, f_(X1,X2,X3|T=1)(0, 0, 1|1) does depend on p. You can see that

f_(X1,X2,X3|T=1)(0, 0, 1|1) = P(X1 = 0 ∩ X2 = 0 ∩ X3 = 1 ∩ T = 1) / P(T = 1)
= (1 − p)² p / (3p²(1 − p) + p(1 − p)²)
= (1 − p)/(1 + 2p).

Hence T = X1 X2 + X3 is not sufficient for p.
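As a quick sanity check (not part of the required solution), you can enumerate the eight outcomes of (X1, X2, X3) and confirm numerically that this conditional probability equals (1 − p)/(1 + 2p) and therefore varies with p. The values of p below are arbitrary illustrations.

```python
# Enumeration check: P(X = (0,0,1) | T = 1) for T = X1*X2 + X3 with
# Bernoulli(p) components should equal (1 - p)/(1 + 2p), so it depends on p.
from itertools import product

def conditional_prob(p):
    num = den = 0.0
    for x in product([0, 1], repeat=3):
        prob = 1.0
        for xi in x:
            prob *= p if xi == 1 else 1 - p   # P(X1=x1, X2=x2, X3=x3)
        if x[0] * x[1] + x[2] == 1:           # event {T = 1}
            den += prob
            if x == (0, 0, 1):
                num += prob
    return num / den

for p in (0.2, 0.5, 0.8):
    print(p, conditional_prob(p), (1 - p) / (1 + 2 * p))
```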
Question 3: We will show that T1 = X1 + X2 is sufficient but T2 = X1 X2 is not sufficient. By a direct check we have

P(X1 = 0 ∩ X2 = 0 | X1 + X2 = 0) = 1,
P(X1 = 1 ∩ X2 = 0 | X1 + X2 = 0) = P(X1 = 1 ∩ X2 = 1 | X1 + X2 = 0) = P(X1 = 0 ∩ X2 = 1 | X1 + X2 = 0) = 0,
P(X1 = 1 ∩ X2 = 0 | X1 + X2 = 1) = [θ(4 − θ)/12] / [θ(4 − θ)/6] = 1/2 = P(X1 = 0 ∩ X2 = 1 | X1 + X2 = 1),
P(X1 = 0 ∩ X2 = 0 | X1 + X2 = 1) = 0 = P(X1 = 1 ∩ X2 = 1 | X1 + X2 = 1),
P(X1 = 1 ∩ X2 = 1 | X1 + X2 = 2) = [θ(θ − 1)/12] / [θ(θ − 1)/12] = 1,
P(X1 = 0 ∩ X2 = 1 | X1 + X2 = 2) = P(X1 = 1 ∩ X2 = 0 | X1 + X2 = 2) = 0,
P(X1 = 0 ∩ X2 = 0 | X1 + X2 = 2) = 0,

and we see that in all possible cases the conditional distribution does not involve the parameter θ. However, for T2 = X1 X2 we can see, by following the same pattern, that

P(X1 = 1 ∩ X2 = 0 | X1 X2 = 0) = (4θ − θ²)/(θ − θ² + 12).

This clearly depends on θ, hence T2 is not sufficient.
Question 4: The conditional probability P (X = x|X1 = x1 ) is the probability P (X2 = x2 ∩ · · · ∩ Xn = xn )
and it depends on p since for each i we have
In the second case, the conditional distribution explicitly involves θ, hence T = X1 + X2 cannot be sufficient for θ.
Question 6: The solution is similar to Question 4 above; we leave it as an exercise for you.
Question 7: a) The ratio takes the form

L(x, λ)/L(y, λ) = λ^{∑_{i=1}^n x_i − ∑_{i=1}^n y_i} · ∏_{i=1}^n y_i! / ∏_{i=1}^n x_i!,

and this does not depend on λ if and only if ∑_{i=1}^n x_i = ∑_{i=1}^n y_i. Hence T = ∑_{i=1}^n X_i is minimal sufficient.
d) We have

L(x, θ)/L(y, θ) = I_(x_(n), ∞)(θ) / I_(y_(n), ∞)(θ).

This has to be considered as a function of θ for fixed x_(n) and y_(n). Assume that x_(n) ≠ y_(n) and, to be specific, let x_(n) > y_(n) first. Then the ratio L(x, θ)/L(y, θ) is:
– not defined if θ ≤ y_(n),
– equal to zero when θ ∈ (y_(n), x_(n)],
– equal to one when θ > x_(n).
In other words, the ratio's value depends on the position of θ on the real axis, that is, it is a function of θ. A similar conclusion is reached if x_(n) < y_(n) (do it yourself). Hence the ratio does not depend on θ if and only if x_(n) = y_(n). This implies that T = X_(n) is minimal sufficient.
e) T = (X_(1), X_(n)) is minimal sufficient. We know from 1c) that L(x, θ) depends on the sample via x_(1) and x_(n) only. If x = (x_1, . . . , x_n) and y = (y_1, . . . , y_n) are such that either x_(1) ≠ y_(1) or x_(n) ≠ y_(n) (or both), then L(x, θ)/L(y, θ) will have different values in different intervals, that is, it will depend on θ. For this not to happen, x_(1) = y_(1) and x_(n) = y_(n) must hold.
b) Since

L(x, θ) = 1/(6θ⁴)^n · (∏_{i=1}^n x_i³) e^{−(∑_{i=1}^n x_i)/θ},

we can factorise with h(x) = ∏_{i=1}^n x_i³ and g(t, θ) = 1/(6θ⁴)^n · e^{−t/θ}, where t = ∑_{i=1}^n x_i.
Question 9, 10: Left for you as exercises. I have treated the location case for the Cauchy family in the lectures; the scale case is along the same lines.
Question 11: Parts (a) to (d) we went through during the lectures. For part (e) look at the score representa-
tion.
Question 12: Take τ̂ = I_{X1=0 ∩ X2=0}(X). Then we have that E(τ̂) = e^{−2λ} (that is, τ̂ is unbiased for τ(λ) = e^{−2λ}). Then the UMVUE would be

E(τ̂ | ∑_{i=1}^n X_i = t) = 1 · P(τ̂ = 1 | ∑_{i=1}^n X_i = t).

We know that ∑_{i=1}^n X_i ∼ Po(nλ). The unbiased estimate is

a(t) = P(X1 = 0 ∩ X2 = 0 ∩ ∑_{i=1}^n X_i = t) / P(∑_{i=1}^n X_i = t)
     = P(X1 = 0 ∩ X2 = 0 ∩ ∑_{i=3}^n X_i = t) / P(∑_{i=1}^n X_i = t)
     = (n − 2)^t / n^t
     = (1 − 2/n)^t.
We can check directly that this estimator is unbiased for τ(λ) (although this is not necessary: we have stated a general theorem that Rao-Blackwellization preserves the unbiasedness property). I have included the calculations below just as an additional exercise:

E a(T) = ∑_{t=0}^∞ (1 − 2/n)^t e^{−nλ} (nλ)^t / t! = e^{−nλ} ∑_{t=0}^∞ [λ(n − 2)]^t / t! = e^{−2λ}.
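If you want to verify this numerically, a short Monte Carlo sketch along the following lines reproduces E a(T) ≈ e^{−2λ}; the sample size n, the rate λ and the seed are arbitrary choices.

```python
# Monte Carlo check: a(T) = (1 - 2/n)^T with T = sum(X_i), X_i ~ Poisson(lam),
# should have mean e^{-2*lam}.
import numpy as np

rng = np.random.default_rng(0)
n, lam, reps = 10, 1.5, 200_000
T = rng.poisson(lam, size=(reps, n)).sum(axis=1)
print(np.mean((1 - 2 / n) ** T))   # Monte Carlo estimate of E a(T)
print(np.exp(-2 * lam))            # target value e^{-2*lam}
```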
Question 13: The joint density is f(x, y) = 8xy for 0 < y < x < 1, so the marginal density of X is f_X(x) = 4x³, 0 < x < 1, and

f_{Y|X}(y|x) = 8xy / (4x³) = 2y/x²  if 0 < y < x, 0 < x < 1 (and zero else),

a(x) = E(Y | X = x) = ∫_0^x y f_{Y|X}(y|x) dy = 2x/3,  0 < x < 1,

E(a(X)) = ∫_0^1 a(x) f_X(x) dx = ∫_0^1 (2x/3) · 4x³ dx = 8/15,

E(Y) = 4 ∫_0^1 y(y − y³) dy = 8/15.

Similarly, E a²(X) = 8/27, so Var(a(X)) = 8/27 − (8/15)² = 8/675. Also

E(Y²) = 1/3,  Var(Y) = 11/225,
and we see directly that indeed Var(a(X)) < Var(Y ) holds.
Again, note that the fact that conditioning reduces the variance was proved quite generally in the lectures. In this problem we are just checking that Var(a(X)) < Var(Y) holds in a particular example.
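The same comparison can be reproduced by simulation; the sketch below samples from f(x, y) = 8xy by inverse-CDF sampling (sample size and seed are arbitrary).

```python
# Simulation check: under f(x, y) = 8xy on 0 < y < x < 1 we expect
# Var(E(Y|X)) = 8/675 and Var(Y) = 11/225.
import numpy as np

rng = np.random.default_rng(1)
reps = 500_000
x = rng.uniform(size=reps) ** 0.25        # X has density 4x^3 on (0, 1)
y = x * np.sqrt(rng.uniform(size=reps))   # Y | X = x has density 2y/x^2 on (0, x)
a = 2 * x / 3                             # a(X) = E(Y | X)
print(np.var(a), 8 / 675)
print(np.var(y), 11 / 225)
```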
Question 14: Steps:
a) T = ∑_{i=1}^n X_i is complete and sufficient for θ.
b) If τ̂ = X1 X2 then E τ̂ = θ² (that is, τ̂ is unbiased for θ²).
c) a(t) = E(τ̂ | T = t) = · · · = t(t − 1)/(n(n − 1)), which is the UMVUE.
We can also check directly the unbiasedness of this estimator:

E(a(T)) = E[X̄ (n X̄/(n − 1) − 1/(n − 1))]
        = n/(n − 1) · E(X̄²) − E(X̄)/(n − 1)
        = n/(n − 1) · [Var(X̄) + (E(X̄))²] − θ/(n − 1)
        = n/(n − 1) · (θ(1 − θ)/n + θ²) − θ/(n − 1)
        = θ².
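A short simulation (parameter values chosen arbitrarily) confirms the unbiasedness of a(T) for θ².

```python
# Monte Carlo check: a(T) = T(T-1)/(n(n-1)) with T = sum(X_i),
# X_i ~ Bernoulli(theta), should have mean theta^2.
import numpy as np

rng = np.random.default_rng(2)
n, theta, reps = 8, 0.3, 200_000
T = rng.binomial(1, theta, size=(reps, n)).sum(axis=1)
print(np.mean(T * (T - 1) / (n * (n - 1))))  # ~ theta^2
print(theta ** 2)
```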
Question 15: Since f(x; θ) is a one-parameter exponential family with d(x) = x, using our general statement from the lecture we can claim that T = ∑_{i=1}^n X_i is complete and minimal sufficient for θ. We also know that for this distribution E(X1) = θ, Var(X1) = θ² holds. Let us calculate:

E(X̄²) = Var(X̄) + (E(X̄))² = Var(X1)/n + (E X1)² = (n + 1)/n · θ² ≠ θ².

After bias-correction, by Lehmann-Scheffé's theorem:

n X̄²/(n + 1) = T²/(n(n + 1))

is unbiased for θ², and since T is complete and sufficient, we conclude that T²/(n(n + 1)) is the UMVUE for θ².
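Again, a quick Monte Carlo sketch (with arbitrary θ and n) can be used to confirm that T²/(n(n + 1)) has mean θ².

```python
# Monte Carlo check: for X_i exponential with mean theta and T = sum(X_i),
# the estimator T^2 / (n(n+1)) should have mean theta^2.
import numpy as np

rng = np.random.default_rng(3)
n, theta, reps = 6, 2.0, 200_000
T = rng.exponential(theta, size=(reps, n)).sum(axis=1)  # scale = mean = theta
print(np.mean(T ** 2 / (n * (n + 1))))  # ~ theta^2
print(theta ** 2)
```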
Question 16: a) The density of T = X_(n) is

f_T(t) = n t^{n−1} / θⁿ,  0 < t < θ.

Hence E(T²) = n θ²/(n + 2), so T1 = (n + 2)/n · T² is an unbiased estimator of θ². By Lehmann-Scheffé,

(n + 2)/n · T²

is the UMVUE.
Its variance:

E[((n + 2)/n · T²)²] − θ⁴ = ((n + 2)/n)² E T⁴ − θ⁴
                          = ((n + 2)/n)² ∫_0^θ n t^{n+3}/θⁿ dt − θ⁴
                          = θ⁴ [(n + 2)²/(n(n + 4)) − 1]
                          = 4θ⁴ / (n(n + 4)).
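Both the unbiasedness and this variance formula are easy to check by simulation; in the sketch below the values of n and θ are arbitrary.

```python
# Monte Carlo check: for U(0, theta) samples and T = max(X_i), the estimator
# (n+2)/n * T^2 should have mean theta^2 and variance 4 theta^4 / (n(n+4)).
import numpy as np

rng = np.random.default_rng(4)
n, theta, reps = 5, 2.0, 400_000
T = rng.uniform(0, theta, size=(reps, n)).max(axis=1)
est = (n + 2) / n * T ** 2
print(np.mean(est), theta ** 2)
print(np.var(est), 4 * theta ** 4 / (n * (n + 4)))
```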
b) Similar to a). Here (n − 1)/n · T is the UMVUE; its variance is 1/(n(n − 2)θ²).
Question 17: This is a more difficult (*) question. It is meant to challenge the interested students.
a) The density f(t; θ) in 7a) is also called the Gamma(n, θ) density. To show the result, we
could use convolution. Reminder: the convolution formula for the density of the sum of two
independent random variables X, Y :
f_{X+Y}(t) = ∫_{−∞}^∞ f_X(x) f_Y(t − x) dx.

In particular, if the random variables are non-negative, the above formula simplifies to:

f_{X+Y}(t) = ∫_0^t f_X(x) f_Y(t − x) dx,  if t > 0 (and 0 elsewhere).

Applying it for the two non-negative random variables in our case, we get:

f_{X1+X2}(t) = ∫_0^t θ² e^{−θx} e^{−tθ+θx} dx = ∫_0^t θ² e^{−tθ} dx = θ² t e^{−tθ},
which means that for n = 2 the claim is proved (note that Γ(2) = 1). We apply induction to show the general case. Assume that the formula is true for T = ∑_{i=1}^k X_i; we want to show that it is then true for k + 1. Applying the convolution formula to ∑_{i=1}^{k+1} X_i = ∑_{i=1}^k X_i + X_{k+1}, we get:

f_{∑_{i=1}^{k+1} X_i}(t) = t^k θ^{k+1} e^{−θt} / Γ(k + 1),

that is, the claim is true for k + 1.
Note: It is possible to give an alternative proof by using the moment generating functions
approach. Try it if you feel familiar enough with moment generating functions.
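For a numerical illustration of the claim, the sketch below compares simulated sums of exponentials with the Gamma(n, θ) distribution via a Kolmogorov-Smirnov test; scipy is used only for the reference cdf, and the chosen n and θ are arbitrary.

```python
# Simulation check: the sum of n i.i.d. exponential(rate theta) variables
# should follow a Gamma(n, theta) law, with density
# theta^n t^{n-1} e^{-theta t} / Gamma(n).
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
n, theta, reps = 4, 1.5, 100_000
T = rng.exponential(1 / theta, size=(reps, n)).sum(axis=1)
# KS distance against the Gamma(shape=n, loc=0, scale=1/theta) cdf
print(stats.kstest(T, "gamma", args=(n, 0, 1 / theta)))
```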
b) Consider the estimator τ̂ = I_{X1>k}(X). Then, E(τ̂) = 1 · P(X1 > k) = ∫_k^∞ θ e^{−θx} dx = e^{−kθ}.
c) Let T = ∑_{i=1}^n X_i. Consider, for small enough ∆x1:

f_{X1|T}(x1|t) ∆x1 = f_{X1,T}(x1, t) ∆x1 ∆t / (f_T(t) ∆t)
  ≈ P[x1 < X1 < x1 + ∆x1; t < ∑_{i=1}^n X_i < t + ∆t] / [ (1/Γ(n)) θⁿ t^{n−1} e^{−θt} ∆t ]
  ≈ P[x1 < X1 < x1 + ∆x1; t − x1 < ∑_{i=2}^n X_i < t − x1 + ∆t] / [ (1/Γ(n)) θⁿ t^{n−1} e^{−θt} ∆t ]
  ≈ P(x1 < X1 < x1 + ∆x1) P(t − x1 < ∑_{i=2}^n X_i < t − x1 + ∆t) / [ (1/Γ(n)) θⁿ t^{n−1} e^{−θt} ∆t ]
  ≈ θ e^{−θx1} · (1/Γ(n − 1)) θ^{n−1} (t − x1)^{n−2} e^{−θ(t−x1)} ∆x1 ∆t / [ (1/Γ(n)) θⁿ t^{n−1} e^{−θt} ∆t ]
  = (n − 1) (t − x1)^{n−2} / t^{n−1} · ∆x1.

Going to the limit as ∆x1 tends to zero, we get

f_{X1|T}(x1|t) = (n − 1)/t · (1 − x1/t)^{n−2},  0 < x1 < t < ∞.
Now we can find the UMVUE. It will be:

E(I_(k,∞)(X1) | T = t) = ∫_k^∞ f_{X1|T}(x1|t) dx1 = (n − 1)/t^{n−1} ∫_k^t (t − x1)^{n−2} dx1 = ((t − k)/t)^{n−1}.

That is,

((T − k)/T)^{n−1} I_(k,∞)(T)

with T = ∑_{i=1}^n X_i is the UMVUE of e^{−kθ}.
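A Monte Carlo sketch (with arbitrary n, θ and k) confirms that this estimator has mean e^{−kθ}.

```python
# Monte Carlo check: with X_i exponential with rate theta and T = sum(X_i),
# the estimator ((T - k)/T)^(n-1) * 1{T > k} should have mean e^{-k*theta}.
import numpy as np

rng = np.random.default_rng(5)
n, theta, k, reps = 5, 1.2, 0.8, 400_000
T = rng.exponential(1 / theta, size=(reps, n)).sum(axis=1)
est = np.where(T > k, ((T - k) / T) ** (n - 1), 0.0)
print(np.mean(est), np.exp(-k * theta))
```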
Question 18: The restriction θ ∈ (0, 1/5) makes sure that the probabilities calculated as a function of θ indeed
belong to [0, 1]. Let Eθ h(X) = 0 for all θ ∈ (0, 1/5). This means:
for all θ ∈ (0, 1/5). A nonzero polynomial of degree 3 has at most three roots, so the coefficients in front of each power of the 3rd-order polynomial in θ must all be equal to zero. Hence h(3) = 0 =⇒ h(1) − h(3) = 0 =⇒ h(1) = 0 =⇒ 2h(0) + h(2) = 0. The latter relationship does not necessarily imply that both h(0) = 0 and h(2) = 0 must hold. Hence the family of distributions is not complete.
Question 19: Parts 19a), 19b), 19c) were treated in lecture and are complete. We consider 19d) here. We
have to show that T = X(n) is complete. We know that the density of T is
f_T(t) = n t^{n−1} / θⁿ,  0 < t < θ (and 0 else).

Let E_θ g(T) = 0 for all θ > 0. This implies:

∫_0^θ g(t) n t^{n−1}/θⁿ dt = 0 = (1/θⁿ) ∫_0^θ g(t) n t^{n−1} dt

for all θ > 0 must hold. Since 1/θⁿ ≠ 0 we get ∫_0^θ g(t) n t^{n−1} dt = 0 for all θ > 0. Differentiating both sides with respect to θ we get

n g(θ) θ^{n−1} = 0

for all θ > 0. This implies g(θ) = 0 for all θ > 0. This also means P_θ(g(T) = 0) = 1. In particular, this result implies that S = (n + 1)/n · X_(n) is the UMVUE of τ(θ) = θ in this model, since E_θ S = θ holds (see previous lectures) and S is a function of the sufficient and complete statistic.
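A quick simulation check of E_θ S = θ (the values of n and θ are arbitrary):

```python
# Monte Carlo check: S = (n+1)/n * max(X_i) should have mean theta
# for U(0, theta) samples.
import numpy as np

rng = np.random.default_rng(6)
n, theta, reps = 7, 3.0, 300_000
S = (n + 1) / n * rng.uniform(0, theta, size=(reps, n)).max(axis=1)
print(np.mean(S), theta)
```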
Question 20: The likelihood is

L(X, Y; µ1, σ1², µ2, σ2²) = 1/((√(2π))^n σ1^{n1} σ2^{n2}) · exp{ −(1/2) ∑_{i=1}^{n1} (x_i − µ1)²/σ1² − (1/2) ∑_{i=1}^{n2} (y_i − µ2)²/σ2² }

(where n = n1 + n2), and the log-likelihood is

ln L = −n ln(√(2π)) − n1 ln σ1 − n2 ln σ2 − (1/2) ∑_{i=1}^{n1} (x_i − µ1)²/σ1² − (1/2) ∑_{i=1}^{n2} (y_i − µ2)²/σ2².
Maximising with respect to µ1 and µ2 delivers

µ̂1 = X̄_{n1}  and  µ̂2 = Ȳ_{n2}

for the MLE. Using the transformation invariance property, we get θ̂ = X̄_{n1} − Ȳ_{n2} for the maximum likelihood estimator of θ. Further:

Var(θ̂) = Var(X̄_{n1}) + Var(Ȳ_{n2}) = σ1²/n1 + σ2²/(n − n1) = f(n1).

To find the minimum, we set the derivative with respect to n1 equal to zero and solve the resulting equation. This gives σ1/σ2 = n1/n2. In other words, the sample sizes must be proportional to the standard deviations. In particular, if n is fixed, we get n1 = σ1/(σ1 + σ2) · n.
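The closed-form allocation can be checked numerically by minimising f(n1) over a fine grid of (real-valued) n1; the standard deviations and n below are arbitrary.

```python
# Numerical check: minimising f(n1) = s1^2/n1 + s2^2/(n - n1) over n1 in (0, n)
# should reproduce n1 = s1 * n / (s1 + s2).
import numpy as np

s1, s2, n = 2.0, 3.0, 100
n1 = np.linspace(1e-3, n - 1e-3, 100_000)
f = s1 ** 2 / n1 + s2 ** 2 / (n - n1)
print(n1[np.argmin(f)])       # numerical minimiser
print(s1 * n / (s1 + s2))     # closed-form answer: 40.0
```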
Question 21: i) The likelihood is

L(X; θ) = θⁿ (∏_{i=1}^n x_i^{−2}) I_[θ,∞)(x_(1)).
We consider L as a function of θ after the sample has been substituted. When θ moves along the positive half-axis, this function first grows monotonically (while θ moves between 0 and x_(1)) and then drops to zero (for θ > x_(1)), since the indicator becomes equal to zero. Hence L is a discontinuous function of θ and its maximum is attained at x_(1). This means that θ̂_mle = X_(1).
ii) Using the factorisation criterion, we see that X_(1) is sufficient. It is also minimal sufficient due to dimension considerations. The minimal sufficiency can also be shown by directly examining the ratio L(X; θ)/L(Y; θ).
Question 22: a) The likelihood is

L(X; θ) = θⁿ (∏_{i=1}^n x_i)^{θ−1}

with log-likelihood

ln L(X; θ) = n ln θ + (θ − 1) ∑_{i=1}^n ln x_i.

The score equation

∂/∂θ ln L = n/θ + ∑_{i=1}^n ln x_i = 0

gives the root

θ̂ = θ̂_mle = −n / ∑_{i=1}^n ln x_i.

Then, using the transformation invariance property, we get the MLE of τ(θ):

τ̂ = θ̂ / (θ̂ + 1).
b) We have that

√n (θ̂ − θ) →_d N(0, 1/I_{X1}(θ)).

We need to find I_{X1}(θ). To this end, we take:

ln f(x; θ) = ln θ + (θ − 1) ln x;
∂/∂θ ln f(x; θ) = 1/θ + ln x;
∂²/∂θ² ln f(x; θ) = −1/θ².

This means that I_{X1}(θ) = 1/θ² and

√n (θ̂ − θ) →_d N(0, θ²).
Since τ(θ) = θ/(θ + 1), by applying the delta method we get

√n (τ̂ − τ) →_d N(0, θ²/(1 + θ)⁴).
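The asymptotic variance can be illustrated by simulation; in the sketch below the X_i are generated by inverse-CDF sampling from f(x; θ) = θx^{θ−1} on (0, 1), and the values of n, θ and the number of replications are arbitrary.

```python
# Simulation sketch: theta_hat = -n / sum(log x_i) is the MLE; the sample
# variance of sqrt(n) * (tau_hat - tau) should be close to theta^2/(1+theta)^4.
import numpy as np

rng = np.random.default_rng(7)
n, theta, reps = 200, 2.0, 20_000
x = rng.uniform(size=(reps, n)) ** (1 / theta)   # inverse CDF: F(x) = x^theta
theta_hat = -n / np.log(x).sum(axis=1)
tau_hat = theta_hat / (theta_hat + 1)
tau = theta / (theta + 1)
print(np.var(np.sqrt(n) * (tau_hat - tau)))      # ~ theta^2 / (1 + theta)^4
print(theta ** 2 / (1 + theta) ** 4)
```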
c) According to the factorisation criterion, ∏_{i=1}^n X_i is sufficient (also, ∑_{i=1}^n ln X_i is sufficient). Since the density belongs to a one-parameter exponential family, we have completeness as well.
The statistic T = ∑_{i=1}^n X_i is not sufficient. Consider, for example, 0 < t < 1, n = 2, T = X1 + X2. Using the convolution formula (see previous tutorial sheet) we have:

f_{X1+X2}(t) = ∫_0^t θ² x^{θ−1} (t − x)^{θ−1} dx = θ² t^{2θ−1} B(θ, θ),

so that

f_{(X1,X2)|T}(x1, x2|t) = θ² (x1 x2)^{θ−1} / (t^{2θ−1} θ² B(θ, θ))

(if x1 + x2 = t and, of course, zero elsewhere). Hence the conditional density of the sample given the value of the statistic does depend on the parameter.
d) Looking at

∂/∂θ ln L = −n ( (−∑_{i=1}^n ln x_i)/n − 1/θ ),

we see that the CRLB will be attained for 1/θ. This means that 1/θ can be estimated by the UMVUE

T = −(∑_{i=1}^n ln X_i)/n.

The attainable bound is easily seen to be 1/(nθ²).
Question 23: With

f(x; µ, σ²) = 1/(√(2π) σ) · e^{−(x−µ)²/(2σ²)}

we have

∂/∂σ² ln f = −1/(2σ²) + (x − µ)²/(2σ⁴),
∂²/∂(σ²)² ln f = 1/(2σ⁴) − (x − µ)²/σ⁶.

Taking −E(. . .) in the last equation gives I_{X1}(σ²) = 1/(2σ⁴). Hence:

√n (σ̂² − σ²) →_d N(0, 2σ⁴).
Question 24: a) i) The MLE of λ is X̄, hence the MLE of τ(λ) = 1/λ would be τ̂ = 1/X̄.
ii) Since P(X̄ = 0) > 0, we get that even the first moment is infinite (not to mention the second) and there is no finite variance.
iii) The delta method gives us:

√n (1/X̄ − 1/λ) →_d N(0, (1/λ⁴) I_{X1}(λ)^{−1}) = N(0, 1/λ³).
b) i) For h(λ) = √λ the delta method gives

√n (√X̄ − √λ) →_d N(0, 1/4).

(Since the asymptotic variance becomes constant (= 1/4) and does not depend on the parameter, we call the transformation h(λ) = √λ a variance stabilising transformation.)
ii) Hence √X̄ ± z_{α/2}/(2√n) would be the confidence interval for √λ, and

( (√X̄ − z_{α/2}/(2√n))² , (√X̄ + z_{α/2}/(2√n))² )

is the corresponding interval for λ.
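As an illustration (not required for the solution), the coverage of the squared interval can be checked by simulation; the values of n, λ and the nominal 95% level (z = 1.96) below are arbitrary choices.

```python
# Coverage check: the interval
# ((sqrt(Xbar) - z/(2 sqrt(n)))^2, (sqrt(Xbar) + z/(2 sqrt(n)))^2)
# should cover lambda roughly 95% of the time for z = 1.96.
import numpy as np

rng = np.random.default_rng(8)
n, lam, reps, z = 50, 2.0, 100_000, 1.96
xbar = rng.poisson(lam, size=(reps, n)).mean(axis=1)
lo = (np.sqrt(xbar) - z / (2 * np.sqrt(n))) ** 2
hi = (np.sqrt(xbar) + z / (2 * np.sqrt(n))) ** 2
print(np.mean((lo < lam) & (lam < hi)))   # empirical coverage, ~0.95
```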