Stat 200: Introduction to Statistical Inference Autumn 2018/19
Lecture 3: The method of moments
Lecturer: Art B. Owen October 2
Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications. They
are meant as a memory aid for students who took stat 200 at Stanford University. They may be distributed
outside this class only with the permission of the instructor. Also, Stanford University holds the copyright.
Abstract
These notes are mnemonics about what was covered in class. They don’t replace being present or
reading the book. Reading ahead in the book is very effective.
In reviewing probability we emphasized moments. Turns out there’s a method for using them.
3.1 Preamble
Now we do statistics. I used the example of an MD working in a neighborhood who sees cholesterol numbers
X1 , X2 , . . . , Xn for n patients. That is a fine record of the past but what does it tell about future patients?
We let Y_i = 1_{X_i > 240} be an indicator variable for a patient with a worrisome cholesterol level.
Here is how we usually work in statistics. We consider first that the data we got are a random sample from
some distribution F . We suppose next that F belongs to a known parametric family such as the normal
distributions N(µ, σ²) with PDF

    f(x; µ, σ) = (1/(σ√(2π))) exp(−(1/2)((x − µ)/σ)²),    x ∈ R,

or Poisson with probability mass function (PMF) p(x; λ) = e^{−λ}λ^x/x! for x = 0, 1, 2, . . . . In general X ∼ F
with PDF f(x; θ) (PMFs are similar).
The past data are independent and identically distributed (IID) from this distribution. So are the future
values and they’re independent of the past ones. The only thing we don’t know is θ. If we knew θ we would
know the distribution of these random variables completely.
At a high level, here are the tasks this framework brings:
1. find some estimate of θ of the form θ̂ = T(X_1, . . . , X_n), so that θ̂ is a function of the data (called a ‘statistic’). It is therefore subject to the laws of probability.
2. what good (or bad) properties does our estimate have? Mainly, in what ways is it close to the true θ?
3. given two estimates can we decide which is better?
4. in this framework, do we think that some particular value θ0 is compatible with the data we have
seen? We will then test a hypothesis that θ = θ0 . Often we set things up so that θ0 = 0 is meaningful
scientifically and then we want to test whether the true θ could really be 0.
5. we may also want to test whether the data are compatible with our chosen distribution family f (x; θ).
These are ‘goodness of fit tests’. Maybe we were wrong about that assumption.
6. We often want a confidence interval. Let L(X1 , . . . , Xn ) and U (X1 , . . . , Xn ) be two statistics. They
form a 99% confidence interval for θ if
    Pr(L(X_1, . . . , X_n) ≤ θ ≤ U(X_1, . . . , X_n)) = 0.99.
Those are the main tasks we will look at: estimation, testing, and forming confidence intervals. We usually
work with IID Xi . Some of the methods and problems we consider can extend beyond the case of IID Xi .
In this class (covered in the prior lecture notes) we reviewed Rice chapter 5, the law of large numbers (LLN),
and the central limit theorem (CLT).
3.2 Method of moments
Let’s think of our friend the MD with Y_i ∈ {0, 1} for i = 1, . . . , n. Since the cholesterol levels X_i are IID, the Y_i
are IID Bern(p) for some unknown p. Suppose that of n = 200 patients there were 15 with high cholesterol.
I.e., ∑_{i=1}^n Y_i = 15.
In the method of moments we find E(Y; p) under our parametric model and equate it to Ȳ = (1/n)∑_{i=1}^n Y_i.
We estimate that the population mean equals the sample mean. The Bern(p) distribution has mean p. The
data have mean 15/200 = 0.075. Equating them gives us p̂ = .075 as our estimate of p. This almost seems
too easy. We have 7.5% high cholesterol in the sample so we guess it is 7.5% in the population. Then again
without other information, why would one estimate it to be say 7% or 8% when the sample proportion was
7.5%?
In this case, np̂ ∼ Bin(n, p). So we almost know the exact distribution of p̂. The sticking point that we will
have to get around later is that the distribution of our estimate p̂ depends on the true p which we don’t
know. We face many seemingly circular arguments like this in statistics.
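A minimal numerical sketch of this estimate (hypothetical data constructed to match the counts above, 15 out of n = 200):

import numpy as np

# Hypothetical indicator data matching the example: 15 of n = 200 patients
# have Y_i = 1 (cholesterol above 240) and the rest have Y_i = 0.
n = 200
y = np.zeros(n)
y[:15] = 1.0

# Method of moments: set the model mean E(Y; p) = p equal to the sample mean.
p_hat = y.mean()
print(p_hat)   # 0.075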
3.2.1 Normal data
Now suppose that Xi ∼ N (µ, σ 2 ). Then E(X) = µ so we use µ̂ = X̄ as before. But we still need an estimate
of σ 2 . We get that by using two moments, solving
    E(X) = X̄   and   E(X²) = \overline{X²}.

Here \overline{X²} = (1/n) ∑_{i=1}^n X_i² and more generally \overline{X^r} = (1/n) ∑_{i=1}^n X_i^r. Rice’s notation is µ_r = E(X^r) and
µ̂_r = \overline{X^r}. The new equation is E(X²) = µ² + σ². Solving the two moment equations gives µ̂ = X̄ as before
and

    σ̂² = (1/n) ∑_{i=1}^n (X_i − X̄)²,   i.e.,   σ̂ = √((1/n) ∑_{i=1}^n (X_i − X̄)²).
Many students will already be familiar with a different estimator
    s² = (1/(n − 1)) ∑_{i=1}^n (X_i − X̄)².
This will come up for us later.
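A short sketch of these moment estimates on simulated data (the choice of µ = 220 and σ = 30 below is arbitrary, purely for illustration):

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=220.0, scale=30.0, size=200)   # simulated cholesterol-like data

# Method of moments: solve E(X) = Xbar and E(X^2) = mean of the X_i^2.
mu_hat = x.mean()
sigma2_hat = np.mean(x**2) - mu_hat**2            # equals np.mean((x - mu_hat)**2)

# The familiar alternative with the n - 1 divisor.
s2 = x.var(ddof=1)

print(mu_hat, sigma2_hat, s2)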
If X and the X_i ∼ N(µ, σ²) then

    Pr(X > 240) = Pr((X − µ)/σ > (240 − µ)/σ) = Pr(N(0, 1) > (240 − µ)/σ) = 1 − Φ((240 − µ)/σ)
where Φ is the CDF of the N (0, 1) distribution. The MD could also estimate the fraction of high cholesterol
patients by 1 − Φ((240 − µ̂)/σ̂). Later we will look at criteria for choosing from among two or more different
estimators.
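Continuing the simulated-data sketch above (same hypothetical data), the plug-in estimate looks like this:

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=220.0, scale=30.0, size=200)   # the same simulated data as before

mu_hat = x.mean()
sigma_hat = np.sqrt(np.mean((x - mu_hat) ** 2))   # method of moments sigma-hat

# Plug the estimates into 1 - Phi((240 - mu)/sigma).
prob_high = 1.0 - norm.cdf((240.0 - mu_hat) / sigma_hat)
print(prob_high)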
3.2.2 Gamma data
The Gamma distribution with shape α > 0 is denoted Gam(α) (in these notes). It has PDF f(x; α) =
x^{α−1}e^{−x}/Γ(α) for 0 < x < ∞, where Γ is the Gamma function (see lecture 2). If Y ∼ Gam(α) and X = Y/λ
for λ > 0 then X has the Gam(α, λ) distribution. This is the definition of Gam(α, λ). The original is then
Gam(α, 1).
Your probability background should enable you to do these two things:
1. show that the PDF of Gam(α, λ) is f(x; α, λ) = λ f(λx; α), and
2. evaluate this PDF.
In class we worked out that the method of moments estimates are
    α̂ = X̄²/σ̂²   and   λ̂ = X̄/σ̂²,

where σ̂² is the same as for the normal case.
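A sketch of these Gamma moment estimates on simulated data (the true α = 2 and λ = 0.5 are arbitrary illustrative values; note that numpy's gamma sampler is parametrized by a scale equal to 1/λ):

import numpy as np

rng = np.random.default_rng(1)
alpha_true, lam_true = 2.0, 0.5
x = rng.gamma(shape=alpha_true, scale=1.0 / lam_true, size=10_000)

xbar = x.mean()
sigma2_hat = np.mean((x - xbar) ** 2)   # the same sigma-hat squared as in the normal case

alpha_hat = xbar**2 / sigma2_hat
lam_hat = xbar / sigma2_hat
print(alpha_hat, lam_hat)               # should be roughly 2 and 0.5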
3.3 How well does M.O.M. work?
Let θ be the true parameter value and θ̂ be our estimate of it. When we want to keep track of the sample
size n we write θ̂ as θ̂n . A very mild requirement is that θ̂n should converge to θ as n → ∞. An estimator
that cannot get the right answer on unbounded sample sizes is problematic. We want Pr(|θ̂_n − θ| > ε) → 0.
We want that for all ε > 0 and for all θ too. We have a set Θ (the capital Greek letter theta) containing all possible
values of θ. Now our estimator is consistent if

    lim_{n→∞} Pr(|θ̂_n − θ| > ε; θ) = 0,   for all ε > 0 and all θ ∈ Θ.
Our Θ is the set of all possible values that θ could take. Often Θ = R^r for some r ≥ 1. Other times only
certain values of θ are possible. For N(µ, σ²),

    Θ = {(µ, σ) | µ ∈ R, σ > 0},

and the Gamma family has θ = (α, λ) ∈ Θ where

    Θ = {(α, λ) | 0 < α < ∞, 0 < λ < ∞}.
Later Rice uses Θ (the capital of the Greek letter θ) to denote a random variable that takes the value θ. We will
keep the two usages separate.
If θ = E(X) and θ̂n = X̄ based on n IID Xi then θ̂n is automatically consistent by the law of large numbers.
That was easy.
Now suppose that θ = g(E(X)) and θ̂_n = g(X̄). If g is continuous at E(X) then for any ε > 0 there exists a
δ > 0 such that |X̄ − E(X)| < δ implies that |g(X̄) − g(E(X))| < ε. This is from the definition of continuity.
That means

    Pr(|θ̂_n − θ| > ε) = Pr(|g(X̄) − g(E(X))| > ε) ≤ Pr(|X̄ − E(X)| ≥ δ) → 0

by the LLN. So θ̂_n is consistent when g is continuous at E(X). If g is continuous everywhere then the
moment estimator θ̂_n is consistent for every θ ∈ Θ.
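A small simulation illustrating this consistency, with an arbitrary illustrative choice g(t) = t² and exponential data (neither comes from the notes):

import numpy as np

rng = np.random.default_rng(2)
g = lambda t: t**2           # a continuous g, chosen only for illustration
mu = 3.0                     # E(X); the true theta is g(mu) = 9
theta = g(mu)

for n in (10, 100, 10_000, 1_000_000):
    xbar = rng.exponential(scale=mu, size=n).mean()
    print(n, abs(g(xbar) - theta))   # the error tends to shrink as n grows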
3.3.1 Delta method variance
We will use a Taylor expansion in order to apply the CLT to the method of moments. Rice uses a Taylor
expansion on the method of maximum likelihood so we might as well add that in for the method of moments
too (to be consistent).
If θ̂ = X̄ then a CLT for X̄ immediately gives one for θ̂. The same would happen for a linear function
θ̂ = a + bX̄: if X̄ ≈ N(µ, σ²/n) then a + bX̄ ≈ N(a + bµ, b²σ²/n).
If X̄ is the average of n IID random variables having mean µ and variance σ 2 then X̄ converges to µ = E(X)
(by the LLN). Let’s make a Taylor approximation to g(X̄) at µ:
    g(X̄) = g(µ) + (X̄ − µ)g′(µ) + (1/2)(X̄ − µ)²g″(µ) + · · · .
Therefore
    g(X̄) − g(µ) = (X̄ − µ)g′(µ) + (1/2)(X̄ − µ)²g″(µ) + · · · .
Now E((X̄ − µ)²) = σ²/n so the typical size of |X̄ − µ| is about σ/√n. Higher powers of |X̄ − µ| are then
relatively negligible. So we can work with the approximations

    g(X̄) − g(µ) ≈ (X̄ − µ)g′(µ) ≈ N(0, σ²g′(µ)²/n).
We wrote an infinite Taylor expansion but we could also terminate it using g″(µ*) where µ* is somewhere
between X̄ and µ. A more advanced course would take more care about conditions on g than we do here.
There is one situation where the argument above goes wrong. If g′(µ) = 0 then the first term (X̄ − µ)g′(µ)
is no longer dominant. We would then have to find the lowest order derivative of g that is not zero at µ.
Our delta method approximation is then

    θ̂_n = g(X̄) ≈ N(θ, σ²g′(µ)²/n)

if the X_i are independent with mean µ and variance σ², and θ = g(E(X)) = g(µ) for a function g that is smooth
and has g′(µ) ≠ 0.
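A sketch checking this approximation by simulation: compare the empirical variance of g(X̄) over many repeated samples with σ²g′(µ)²/n. The choice g(t) = log t (so g′(µ) = 1/µ) and the numbers below are arbitrary, for illustration only.

import numpy as np

rng = np.random.default_rng(3)
mu, sigma, n, reps = 5.0, 2.0, 500, 20_000

# reps independent samples of size n, one xbar per sample
xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

g = np.log                          # g(t) = log t, so g'(mu) = 1/mu
empirical_var = g(xbar).var()
delta_var = sigma**2 * (1.0 / mu) ** 2 / n

print(empirical_var, delta_var)     # the two should nearly agree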
3.3.2 Problems and advantages of moments
Maybe θ has r = 3 parameters in it. Then we form and solve the equations E(X^k) = \overline{X^k} for k = 1, 2, 3. We
could be out of luck if E(X³) = ∞. Then the method of moments would not deliver estimates for us.
We could have X with a PDF f (x; θ) where θ = E(X) is known to satisfy θ > 0. Yet we might get θ̂ = X̄ < 0.
In some settings we can get a negative estimate σ̂² for a variance. We probably knew that σ̂² would not be
exactly right, but getting σ̂² < 0 is still embarrassing.
We might have a parameter that must be an integer. Suppose your puppy got 0 or 1 or 2 copies of a certain
gene from its parents. Call that value θ. Now you get some random variables that are (say) Poi(10 × θ).
After a bit of algebra you find that the method of moments estimate is θ̂ = X̄/10, but it will typically not be exactly 0, 1, or 2.
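A sketch of this example (the true value θ = 1 and the ad hoc rounding step are illustrative assumptions, not part of the method itself):

import numpy as np

rng = np.random.default_rng(4)
theta_true = 1                                   # true number of gene copies, in {0, 1, 2}
x = rng.poisson(lam=10 * theta_true, size=50)

theta_hat = x.mean() / 10                        # e.g. 0.97: almost never exactly 0, 1 or 2
theta_rounded = min((0, 1, 2), key=lambda t: abs(t - theta_hat))
print(theta_hat, theta_rounded)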
Now some advantages. Suppose that n = 10^11. Then θ̂ will require a lot of summing. You can however
spread those sums over tens of thousands of computers all running in parallel (if you have them). More
complicated estimates can be harder to parallelize.
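A toy sketch of that point: the sums behind the moment estimates can be accumulated chunk by chunk and then combined, so each chunk could live on a different machine (here they are just array slices on one machine):

import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=1_000_000)

# Each chunk only has to report (sum, sum of squares, count) back to a central node.
chunks = np.array_split(x, 100)
s1 = sum(c.sum() for c in chunks)
s2 = sum((c**2).sum() for c in chunks)
n = sum(c.size for c in chunks)

mu_hat = s1 / n
sigma2_hat = s2 / n - mu_hat**2
print(mu_hat, sigma2_hat)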
A second advantage. Suppose that you really want θ = E(X) or Var(X) or both and you made an estimate
assuming that the X_i have a PDF of the parametric form f(x; θ). If you were wrong about that f, your X̄ will
still be consistent for E(X).