Lecture 39: The method of moments
The method of moments is the oldest method of deriving point estimators.
It almost always produces asymptotically unbiased estimators, although they may not be the best estimators.
Consider a parametric problem where $X_1, ..., X_n$ are i.i.d. random variables from $P_\theta$, $\theta \in \Theta \subset \mathcal{R}^k$, and $E|X_1|^k < \infty$.
Let $\mu_j = EX_1^j$ be the $j$th moment of $P_\theta$ and let
$$\hat{\mu}_j = \frac{1}{n} \sum_{i=1}^n X_i^j$$
be the $j$th sample moment, which is an unbiased estimator of $\mu_j$, $j = 1, ..., k$.
Typically,
$$\mu_j = h_j(\theta), \quad j = 1, ..., k, \qquad (1)$$
for some functions $h_j$ on $\mathcal{R}^k$.
By substituting $\mu_j$'s on the left-hand side of (1) by the sample moments $\hat{\mu}_j$, we obtain a moment estimator $\hat{\theta}$, i.e., $\hat{\theta}$ satisfies
$$\hat{\mu}_j = h_j(\hat{\theta}), \quad j = 1, ..., k,$$
which is a sample analogue of (1).
This method of deriving estimators is called the method of moments.
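As a small illustration (my sketch, not part of the original notes), the sample moments $\hat{\mu}_j$ on which the whole method rests are straightforward to compute:

```python
# Minimal sketch: the jth sample moment mu_hat_j = (1/n) * sum_i X_i^j.
# The method of moments equates these to the population moments h_j(theta)
# and solves the resulting equations for theta.
def sample_moments(xs, k):
    """Return [mu_hat_1, ..., mu_hat_k] for the data xs."""
    n = len(xs)
    return [sum(x ** j for x in xs) / n for j in range(1, k + 1)]

print(sample_moments([1.0, 2.0, 3.0, 4.0], 2))  # [2.5, 7.5]
```

The examples below all reduce to solving $h(\theta) = \hat{\mu}$ for these quantities.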
An  important  statistical  principle,  the  substitution  principle, is  applied  in  this  method.
Let $\mu = (\mu_1, ..., \mu_k)$ and $h = (h_1, ..., h_k)$.
Then $\hat{\mu} = h(\hat{\theta})$.
If the inverse function $h^{-1}$ exists, then the unique moment estimator of $\theta$ is $\hat{\theta} = h^{-1}(\hat{\mu})$.
When $h^{-1}$ does not exist (i.e., $h$ is not one-to-one), any solution of $\hat{\mu} = h(\hat{\theta})$ is a moment estimator of $\theta$;
if possible, we always choose a solution $\hat{\theta}$ in the parameter space $\Theta$.
In some cases, however, a moment estimator does not exist (see Exercise 111).
Assume that $\hat{\theta} = g(\hat{\mu})$ for a function $g$.
If $h^{-1}$ exists, then $g = h^{-1}$.
If $g$ is continuous at $\mu = (\mu_1, ..., \mu_k)$, then $\hat{\theta}$ is strongly consistent for $\theta$, since $\hat{\mu}_j \to_{a.s.} \mu_j$ by the SLLN.
If $g$ is differentiable at $\mu$ and $E|X_1|^{2k} < \infty$, then $\hat{\theta}$ is asymptotically normal, by the CLT and Theorem 1.12, and
$$\mathrm{amse}_{\hat{\theta}}(\theta) = n^{-1} [\nabla g(\mu)]^\tau V_\mu \nabla g(\mu),$$
where $V_\mu$ is a $k \times k$ matrix whose $(i,j)$th element is $\mu_{i+j} - \mu_i \mu_j$.
Furthermore, the $n^{-1}$ order asymptotic bias of $\hat{\theta}$ is
$$(2n)^{-1} \mathrm{tr}\left( \nabla^2 g(\mu) V_\mu \right).$$
Example 3.24. Let $X_1, ..., X_n$ be i.i.d. from a population $P$ indexed by the parameter $\theta = (\mu, \sigma^2)$, where $\mu = EX_1 \in \mathcal{R}$ and $\sigma^2 = \mathrm{Var}(X_1) \in (0, \infty)$.
This includes cases such as the family of normal distributions, double exponential distributions, or logistic distributions (Table 1.2, page 20).
Since $EX_1 = \mu$ and $EX_1^2 = \mathrm{Var}(X_1) + (EX_1)^2 = \sigma^2 + \mu^2$, setting $\hat{\mu}_1 = \mu$ and $\hat{\mu}_2 = \sigma^2 + \mu^2$ we obtain the moment estimator
$$\hat{\theta} = \left( \bar{X}, \; \frac{1}{n} \sum_{i=1}^n (X_i - \bar{X})^2 \right) = \left( \bar{X}, \; \frac{n-1}{n} S^2 \right).$$
Note that $\bar{X}$ is unbiased, but $\frac{n-1}{n} S^2$ is not.
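The estimator in this example can be sketched in a few lines of Python (my implementation of the formulas above, not code from the notes):

```python
# Sketch of Example 3.24: the moment estimator of theta = (mu, sigma^2)
# is (X_bar, (n-1)/n * S^2), i.e. the sample mean and the sample
# variance with divisor n (biased) rather than n - 1.
def moment_estimator(xs):
    n = len(xs)
    xbar = sum(xs) / n
    sigma2_hat = sum((x - xbar) ** 2 for x in xs) / n  # (n-1)/n * S^2
    return xbar, sigma2_hat

print(moment_estimator([1.0, 2.0, 3.0, 4.0]))  # (2.5, 1.25)
```

Note the divisor $n$: this is exactly why the second component is biased, while $S^2$ (divisor $n-1$) is unbiased.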
If $X_i$ is normal, then $\hat{\theta}$ is sufficient and is nearly the same as an optimal estimator such as the UMVUE.
On the other hand, if $X_i$ is from a double exponential or logistic distribution, then $\hat{\theta}$ is not sufficient and can often be improved.
Consider now the estimation of $\sigma^2$ when we know that $\mu = 0$.
Obviously we cannot use the equation $\hat{\mu}_1 = \mu$ to solve the problem.
Using $\hat{\mu}_2 = \mu_2 = \sigma^2$, we obtain the moment estimator $\hat{\sigma}^2 = \hat{\mu}_2 = n^{-1} \sum_{i=1}^n X_i^2$.
This is still a good estimator when $X_i$ is normal, but is not a function of the sufficient statistic when $X_i$ is from a double exponential distribution.
For the double exponential case one can argue that we should first make a transformation $Y_i = |X_i|$ and then obtain the moment estimator based on the transformed data.
The moment estimator of $\sigma^2$ based on the transformed data is $\bar{Y}^2 = \left( n^{-1} \sum_{i=1}^n |X_i| \right)^2$, which is sufficient for $\sigma^2$.
Note that this estimator can also be obtained based on absolute moment equations.
Example 3.25. Let $X_1, ..., X_n$ be i.i.d. from the uniform distribution on $(\theta_1, \theta_2)$, $-\infty < \theta_1 < \theta_2 < \infty$.
Note that
$$EX_1 = (\theta_1 + \theta_2)/2$$
and
$$EX_1^2 = (\theta_1^2 + \theta_2^2 + \theta_1 \theta_2)/3.$$
Setting $\hat{\mu}_1 = EX_1$ and $\hat{\mu}_2 = EX_1^2$ and substituting $\theta_1$ in the second equation by $2\hat{\mu}_1 - \theta_2$ (the first equation), we obtain that
$$(2\hat{\mu}_1 - \theta_2)^2 + \theta_2^2 + (2\hat{\mu}_1 - \theta_2)\theta_2 = 3\hat{\mu}_2,$$
which is the same as
$$(\theta_2 - \hat{\mu}_1)^2 = 3(\hat{\mu}_2 - \hat{\mu}_1^2).$$
Since $\theta_2 > EX_1$, we obtain that
$$\hat{\theta}_2 = \hat{\mu}_1 + \sqrt{3(\hat{\mu}_2 - \hat{\mu}_1^2)} = \bar{X} + \sqrt{\tfrac{3(n-1)}{n} S^2}$$
and
$$\hat{\theta}_1 = \hat{\mu}_1 - \sqrt{3(\hat{\mu}_2 - \hat{\mu}_1^2)} = \bar{X} - \sqrt{\tfrac{3(n-1)}{n} S^2}.$$
These estimators are not functions of the sufficient and complete statistic $(X_{(1)}, X_{(n)})$.
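A minimal sketch of these two estimators (my code, implementing the closed forms of this example):

```python
import math

# Sketch of Example 3.25: moment estimators of (theta_1, theta_2) for the
# uniform distribution on (theta_1, theta_2):
#   theta_hat_{1,2} = mu_hat_1 -/+ sqrt(3 * (mu_hat_2 - mu_hat_1^2)).
def uniform_moment_estimators(xs):
    n = len(xs)
    m1 = sum(xs) / n
    m2 = sum(x * x for x in xs) / n
    half = math.sqrt(3.0 * (m2 - m1 * m1))
    return m1 - half, m1 + half

lo, hi = uniform_moment_estimators([0.1, 0.4, 0.5, 0.9])
```

The two estimates are symmetric about $\bar{X}$; as noted above, unlike estimators based on $(X_{(1)}, X_{(n)})$, the interval $(\hat{\theta}_1, \hat{\theta}_2)$ need not even contain all the observations.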
Example 3.26. Let $X_1, ..., X_n$ be i.i.d. from the binomial distribution $Bi(p, k)$ with unknown parameters $k \in \{1, 2, ...\}$ and $p \in (0, 1)$.
Since
$$EX_1 = kp$$
and
$$EX_1^2 = kp(1-p) + k^2 p^2,$$
we obtain the moment estimators
$$\hat{p} = (\hat{\mu}_1 + \hat{\mu}_1^2 - \hat{\mu}_2)/\hat{\mu}_1 = 1 - \frac{n-1}{n} S^2/\bar{X}$$
and
$$\hat{k} = \hat{\mu}_1^2/(\hat{\mu}_1 + \hat{\mu}_1^2 - \hat{\mu}_2) = \bar{X} \Big/ \left( 1 - \frac{n-1}{n} S^2/\bar{X} \right).$$
The estimator $\hat{p}$ is in the range $(0, 1)$.
But $\hat{k}$ may not be an integer.
It can be improved by an estimator that is $\hat{k}$ rounded to the nearest positive integer.
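The formulas in this example can be sketched as follows (my code; the quantity $\hat{\mu}_1 + \hat{\mu}_1^2 - \hat{\mu}_2$ estimates $kp^2$, which is what makes both closed forms work):

```python
# Sketch of Example 3.26: moment estimators of (k, p) for Bi(p, k).
def binomial_moment_estimators(xs):
    n = len(xs)
    m1 = sum(xs) / n
    m2 = sum(x * x for x in xs) / n
    kp2 = m1 + m1 * m1 - m2        # estimates k * p^2
    p_hat = kp2 / m1               # = 1 - (n-1)/n * S^2 / X_bar
    k_hat = m1 * m1 / kp2          # may not be an integer
    return k_hat, p_hat

print(binomial_moment_estimators([2, 2, 2, 2]))  # (2.0, 1.0)
```

As noted above, $\hat{k}$ is not an integer in general and would be rounded to the nearest positive integer in practice.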
Example 3.27. Suppose that $X_1, ..., X_n$ are i.i.d. from the Pareto distribution $Pa(a, \theta)$ with unknown $a > 0$ and $\theta > 2$ (Table 1.2, page 20).
Note that
$$EX_1 = \theta a/(\theta - 1)$$
and
$$EX_1^2 = \theta a^2/(\theta - 2).$$
From the moment equation,
$$\frac{(\theta-1)^2}{\theta(\theta-2)} = \hat{\mu}_2/\hat{\mu}_1^2.$$
Note that
$$\frac{(\theta-1)^2}{\theta(\theta-2)} - 1 = \frac{1}{\theta(\theta-2)}.$$
Hence
$$\theta(\theta - 2) = \hat{\mu}_1^2/(\hat{\mu}_2 - \hat{\mu}_1^2).$$
Since $\theta > 2$, there is a unique solution in the parameter space:
$$\hat{\theta} = 1 + \sqrt{1 + \hat{\mu}_1^2/(\hat{\mu}_2 - \hat{\mu}_1^2)} = 1 + \sqrt{1 + \tfrac{n}{n-1} \bar{X}^2/S^2}$$
and
$$\hat{a} = \frac{\hat{\mu}_1 (\hat{\theta} - 1)}{\hat{\theta}} = \bar{X} \sqrt{1 + \tfrac{n}{n-1} \bar{X}^2/S^2} \Big/ \left( 1 + \sqrt{1 + \tfrac{n}{n-1} \bar{X}^2/S^2} \right).$$
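These closed forms translate directly into code; a hedged sketch (my implementation, not from the notes):

```python
import math

# Sketch of Example 3.27: moment estimators of (a, theta) for Pa(a, theta),
# theta > 2, via the identity
#   theta * (theta - 2) = mu_hat_1^2 / (mu_hat_2 - mu_hat_1^2),
# whose unique root with theta > 2 is theta = 1 + sqrt(1 + rhs).
def pareto_moment_estimators(xs):
    n = len(xs)
    m1 = sum(xs) / n
    m2 = sum(x * x for x in xs) / n
    theta_hat = 1.0 + math.sqrt(1.0 + m1 * m1 / (m2 - m1 * m1))
    a_hat = m1 * (theta_hat - 1.0) / theta_hat  # from EX_1 = theta*a/(theta-1)
    return a_hat, theta_hat
```

By construction, $\hat{\theta}$ always exceeds 2, so the estimate stays in the parameter space.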
Exercise 108. Let $X_1, ..., X_n$ be a random sample from the following discrete distribution:
$$P(X_1 = 1) = \frac{2(1-\theta)}{2-\theta}, \quad P(X_1 = 2) = \frac{\theta}{2-\theta},$$
where $\theta \in (0, 1)$ is unknown.
Note that
$$EX_1 = \frac{2(1-\theta)}{2-\theta} + \frac{2\theta}{2-\theta} = \frac{2}{2-\theta}.$$
Hence, a moment estimator of $\theta$ is $\hat{\theta} = 2(1 - \bar{X}^{-1})$, where $\bar{X}$ is the sample mean.
Note that
$$\mathrm{Var}(X_1) = \frac{2(1-\theta)}{2-\theta} + \frac{4\theta}{2-\theta} - \frac{4}{(2-\theta)^2} = \frac{2\theta(1-\theta)}{(2-\theta)^2},$$
$$\theta = 2(1 - \mu_1^{-1}) = g(\mu_1), \quad g'(\mu_1) = 2/\mu_1^2 = 2/[2/(2-\theta)]^2 = (2-\theta)^2/2.$$
By the central limit theorem and the $\delta$-method,
$$\sqrt{n}(\hat{\theta} - \theta) \to_d N\left( 0, \; \frac{\theta(1-\theta)(2-\theta)^2}{2} \right).$$
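A quick numerical sanity check of $\hat{\theta} = 2(1 - \bar{X}^{-1})$ (my sketch, not from the notes): when the sample mean equals the population mean $2/(2-\theta)$, the estimator recovers $\theta$ exactly.

```python
# Sketch for Exercise 108: theta_hat = 2 * (1 - 1/X_bar).
def theta_hat(xs):
    xbar = sum(xs) / len(xs)
    return 2.0 * (1.0 - 1.0 / xbar)

# For theta = 0.4, EX_1 = 2/(2 - 0.4) = 1.25; a sample with mean 1.25:
print(theta_hat([1, 1, 1, 2]))  # approximately 0.4
```

This is just the substitution principle in miniature: plug the sample moment into the inverse of the population-moment map.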
The method of moments can also be applied to nonparametric problems.
Consider, for example, the estimation of the central moments
$$c_j = E(X_1 - \mu_1)^j, \quad j = 2, ..., k.$$
Since
$$c_j = \sum_{t=0}^{j} \binom{j}{t} (-\mu_1)^t \mu_{j-t},$$
the moment estimator of $c_j$ is
$$\hat{c}_j = \sum_{t=0}^{j} \binom{j}{t} (-\bar{X})^t \hat{\mu}_{j-t},$$
where $\hat{\mu}_0 = 1$.
It can be shown (exercise) that
$$\hat{c}_j = \frac{1}{n} \sum_{i=1}^n (X_i - \bar{X})^j, \quad j = 2, ..., k, \qquad (2)$$
which are sample central moments.
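The identity behind (2) can be checked numerically; a sketch (my code) comparing the binomial-expansion form of $\hat{c}_j$ with the direct sample central moment:

```python
from math import comb

# Moment estimator of c_j via the binomial expansion in mu_hat_0, ..., mu_hat_j
def c_hat_expansion(xs, j):
    n = len(xs)
    xbar = sum(xs) / n
    mu_hat = [1.0] + [sum(x ** t for x in xs) / n for t in range(1, j + 1)]
    return sum(comb(j, t) * (-xbar) ** t * mu_hat[j - t] for t in range(j + 1))

# Direct sample central moment, as in Eq. (2)
def c_hat_direct(xs, j):
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** j for x in xs) / n

xs = [0.5, 1.0, 2.0, 4.5]
print(abs(c_hat_expansion(xs, 3) - c_hat_direct(xs, 3)) < 1e-9)  # True
```

The two forms agree up to floating-point rounding, which is the content of the exercise.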
From the SLLN, $\hat{c}_j$'s are strongly consistent.
If $E|X_1|^{2k} < \infty$, then
$$\sqrt{n}(\hat{c}_2 - c_2, ..., \hat{c}_k - c_k) \to_d N_{k-1}(0, D) \qquad (3)$$
where the $(i,j)$th element of the $(k-1) \times (k-1)$ matrix $D$ is
$$c_{i+j+2} - c_{i+1} c_{j+1} - (i+1) c_i c_{j+2} - (j+1) c_{i+2} c_j + (i+1)(j+1) c_i c_j c_2.$$