Te 517
TESIS
Presenta: Georges Bucyibaruta
Abstract
The demand for reliable small area estimates derived from survey data has increased greatly
in recent years due to, among other things, their growing use in formulating policies and
programs, allocation of government funds, regional planning, small area business decisions
and other applications.
Traditional area-specific (direct) estimates may not provide acceptable precision for small areas, because sample sizes are rarely large enough in many small areas of interest. This makes it necessary to borrow information across related areas through indirect estimation based on models, using auxiliary information such as recent census data and current administrative data. Methods based on models are now widely accepted. Popular techniques for small area estimation use explicit statistical models to indirectly estimate the small area parameters of interest.
In this thesis, a brief review of the theory of Linear and Generalized Linear Mixed Models is given, with point estimation focusing on Restricted Maximum Likelihood. Bootstrap methods for two-level models are used for the construction of confidence intervals. Model-based approaches for small area estimation are discussed under a finite population framework. For responses in the Exponential family, we discuss the use of the Monte Carlo Expectation-Maximization algorithm to deal with problematic marginal likelihoods.
Dedication
Acknowledgements
Throughout the period of my study, I have had cause to be grateful for the advice, support
and understanding of many people. In particular I would like to express my boundless gratitude to my supervisor Dr. Rogelio Ramos Quiroga for helping me get into the problem of small area estimation and for kindly directing this work. He has guided me in my research through his profound knowledge, insightful thinking and thoughtful suggestions, as well as his unflagging enthusiasm. I really appreciate the enormous amount of time and patience he has spent on me.
This work has also benefited greatly from the advice, comments and observations of Dr.
Miguel Nakamura Savoy and Dr. Enrique Raúl Villa Diharce.
I gratefully acknowledge my admission to CIMAT and the financial support from the Secretaría de Relaciones Exteriores (SRE), México, which made this achievement possible.
Finally, but certainly not least, I want to remember my entire family in Rwanda, whose unlimited support, prayers and encouragement carried me through times of despondency and threatening despair.
Contents

Abstract
Dedication
Acknowledgements

1 Introduction
1.1 Background
1.2 Structure of the Thesis
1.3 Main Contribution of the Thesis

3.3.2 Restricted Maximum Likelihood (REML) Estimation
3.4 Confidence Intervals for Fixed Effects
3.4.1 Normal Confidence Intervals
3.5 Bootstrapping Two-Level Models
3.5.1 Bootstrap
3.5.2 Parametric Bootstrap
3.5.3 Bootstrap Confidence Intervals
3.6 Prediction of Random Effects
3.6.1 Best Linear Unbiased Predictor (BLUP)
3.6.2 Mixed Model Equations
3.6.3 Empirical Best Linear Unbiased Predictor (EBLUP)
List of Figures

5.1 Simulated values from the Metropolis algorithm with Normal density proposal and conditional density f(u|y) target.
List of Tables

4.1 Survey and Satellite Data for corn and soybeans in 12 Iowa Counties.
4.2 The estimated parameters for corn.
4.3 The estimated parameters for soybeans.
4.4 Normal Confidence Interval for corn.
4.5 Normal Confidence Interval for soybeans.
4.6 Bootstrap Confidence Interval for corn.
4.7 Bootstrap Confidence Interval for soybeans.
4.8 Predicted Mean Hectares of corn in 12 Iowa Counties.
4.9 Predicted Mean Hectares of soybeans in 12 Iowa Counties.
Chapter 1
Introduction
Small Area Estimation (Rao, 2003) tackles the important statistical problem of providing reliable estimates of a variable of interest in a set of small geographical areas. The main problem is that it is almost impossible to measure the value of the target variable for all the individuals in the areas of interest; hence, a survey is conducted to obtain a representative sample.
Surveys are usually designed to provide reliable estimates of finite population parameters of interest for large domains, where the large domains can refer to geographical regions such as states, counties and municipalities, or to demographic groups such as age, race or gender. The information collected is used to produce direct estimates of the target variable that rely only on the survey design and the sampled data. Unfortunately, sampling from all areas can be expensive in resources and time. Domain estimators that are computed using only the sample data from the domain are known as direct estimators. Even though direct estimators have several desirable design-based properties, direct estimates often lack precision when domain sample sizes are small. In some cases there might be a need for estimates for domains that were not considered during planning or design, which leads to the possibility that the sample size of a domain could be small or even zero. Direct estimators of small area parameters may have large variance and sometimes cannot be calculated at all due to lack of observations (Rao, 2003, Chapter 1; Montanari, Ranalli and Vicarelli, 2010).
To produce estimates for small areas with an adequate level of precision often requires
indirect estimators that use auxiliary data through a linking model in order to borrow
strength and increase the effective sample size. Regression models are often used in this
context to provide estimates for the non-sampled areas. The model is fitted on data from
the sample and used together with additional information from auxiliary data to compute
estimates for non-sampled areas. This method is often known as synthetic estimation (González, 1973).
Small area estimation (SAE) comprises the class of techniques for estimating the parameters of interest for a subpopulation that is considered to be small. In this context, a small area can be seen as a small geographical region or a demographic group whose sample size is too small to deliver direct estimates of adequate precision (Rao, 2003, Chapter 1). The overriding question of small area estimation is how to obtain reliable estimates when, as mentioned in the previous paragraph, budget and other constraints usually prevent the allocation of sufficiently large samples to each of the small areas, so that the sample data contain too few observations (even zero in some cases) for statistical inference of adequate precision.
1.1 Background
The use of small area statistical methods dates back to as early as the eleventh century in England and the seventeenth century in Canada; in those cases, the methods were based on either censuses or administrative records (Brackstone, 1987). In recent years, the statistical techniques for small area estimation have been a focus for various authors in the context of sample surveys, and there is an ever-growing demand, in both the public and private sectors, for reliable estimates of small area populations of all types. Among the main reasons for this demand are administrative planning, disease mapping and the determination of state funding allocations. Examples of small area estimation applications include the prediction of the area under corn and soybeans in counties of Iowa (Battese et al., 1988) and the production of estimates of income and poverty measures for various population subgroups for states, counties and school districts (Rao, 2003, Chapter 1).
Mixed models with unit level auxiliary data have been used for small area estimation by a good number of authors. Battese, Harter and Fuller (1988) use a linear mixed model to predict the area planted with corn and soybeans in Iowa counties. In this case, the areas under corn and soybeans are considered to be small because the sample sizes obtained were too small for direct estimation. Datta and Ghosh (1991) introduce the hierarchical Bayes predictor for general mixed models. Larsen (2003) compares estimators for proportions based on two unit level models: a single model with no area level covariates and a model using the area level information. Malec (2005) proposes Bayesian small area estimates
for means of binary responses using a multivariate binomial (multinomial) model. Jiang (2007) reviews the classical inferential approach for linear and generalized linear mixed models and discusses prediction for a function of fixed and random effects. Montanari, Ranalli and Vicarelli (2010) consider unit level linear mixed models and logistic mixed models for binary response variables with fully known auxiliary information. Vizcaino, Cortina and Morales Gonzalez (2011) derive small area estimators for labor force indicators in Galicia using a multinomial logit mixed model.
ture of these models, we show that the direct ML and REML methods are limited due to
the complexity of the (marginal) likelihood function and we suggest the Monte Carlo EM
as an alternative method of parameter estimation that avoids the complexities of direct ML methods.
Chapter 2

Approaches to Small Area Estimation

2.1 Introduction
In this chapter, we briefly describe some approaches to small area estimation. In general we can distinguish two types of small area estimators: direct and indirect. Direct small area estimators are based on the survey design, while indirect estimators are mainly based on models and techniques such as implicit model-based approaches, which include the synthetic and composite estimators known as traditional indirect estimators, and explicit models, which are categorized as area level and unit level models and are based on mixed model methodologies (Rao, 2003, Chapter 5).

In what follows we briefly discuss direct small area estimators, implicit models (synthetic and composite estimators) and explicit small area models (the unit level model and the area level model).
A synthetic estimator is a reliable direct estimator for a large area, covering several small areas, that is used as an estimator for a small area under the assumption that the small area has the same characteristics as the larger area (González, 1973; Rao, 2003, Chapter 4).
According to Rao (2003), to balance the potential bias of a synthetic estimator, say Ŷis, against the instability of the direct estimator, say Ŷid, we consider a convex combination of the two, known as the composite estimator:

Ŷic = ωi Ŷid + (1 − ωi) Ŷis,

for a suitably chosen weight ωi (0 ≤ ωi ≤ 1), where c, s and d stand for composite, synthetic and direct, respectively.
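As a small numerical illustration of this convex combination (a sketch of ours, not code from the thesis; all values are hypothetical):

```python
# Composite estimator: a convex combination of a direct and a synthetic
# estimate, Y_ic = w_i * Y_id + (1 - w_i) * Y_is.  The numbers below are
# hypothetical, chosen only to illustrate the computation.

def composite_estimate(y_direct, y_synthetic, weight):
    """Return the composite estimate for one small area."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * y_direct + (1.0 - weight) * y_synthetic

# A larger weight is typically given to the direct estimator when the
# area sample size is large (its variance is then small).
print(composite_estimate(120.0, 100.0, 0.75))  # 0.75*120 + 0.25*100 = 115.0
```

With ωi = 1 the composite estimator reduces to the direct estimator, and with ωi = 0 to the synthetic estimator.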
2.4 Small Area Models

The basic area level model assumes that the small area parameter of interest ηi is related to area-specific auxiliary data xi through the linear model

ηi = xiᵗβ + υi,  i = 1, 2, . . . , m,  (2.1)

where the υi are area-specific random effects. The area level model further assumes that there exists a direct survey estimator η̂i for the small area parameter ηi such that

η̂i = ηi + εi,  i = 1, 2, . . . , m,  (2.2)

where εi is the sampling error associated with the direct estimator η̂i, with the assumptions that the εi are independent normal random variables with mean E(εi) = 0 and variance Var(εi) = τi. Combining these two equations yields the area level linear mixed model

η̂i = xiᵗβ + υi + εi,  i = 1, 2, . . . , m.  (2.3)
The basic unit level model is a linear mixed model known as the nested error linear regression model. This type of model can be represented by the equation

yij = xijᵗβ + υi + eij,  j = 1, . . . , ni,  i = 1, . . . , m.  (2.4)
The nested error unit level regression model (2.4) was first used to model county crop areas
in North-Central Iowa, USA (Battese et al., 1988).
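The structure of such a unit level model can be illustrated with a short simulation; this is our own sketch with hypothetical parameter values, not the Battese et al. data:

```python
import numpy as np

# Simulate data from a nested error unit level model
#   y_ij = beta0 + beta1 * x_ij + v_i + e_ij,
# with v_i ~ N(0, sigma_v^2) (area effects) and e_ij ~ N(0, sigma_e^2).
# All parameter values here are hypothetical, for illustration only.
rng = np.random.default_rng(0)
m, n_i = 12, 5                      # 12 small areas, 5 units each
beta0, beta1 = 50.0, 2.0
sigma_v, sigma_e = 3.0, 1.0

area = np.repeat(np.arange(m), n_i)         # area label of each unit
x = rng.uniform(0.0, 10.0, size=m * n_i)    # unit level auxiliary variable
v = rng.normal(0.0, sigma_v, size=m)        # random area effects
y = beta0 + beta1 * x + v[area] + rng.normal(0.0, sigma_e, size=m * n_i)

# Units in the same area share v_i, so they are positively correlated:
# Corr(y_ij, y_ik) = sigma_v^2 / (sigma_v^2 + sigma_e^2) = 0.9 here.
print(y.shape)  # (60,)
```

The shared area effect υi is what induces the intra-area correlation that the mixed model exploits.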
The explicit linking models provide significant improvements in techniques for indirect esti-
mation. Based on mixed model methodology, these techniques incorporate random effects
into the model. The random effects account for the between-area variation that cannot be explained by the included auxiliary variables. Most small area models can be defined as either an area-level model or a unit-level model. Area-level models relate small-area direct estimators to area-specific auxiliary data. Unit-level models relate the unit values of a study variable to unit-specific auxiliary data.
In small area estimation, the focus is on bias and variance. For a small sample size, the unbiasedness of a direct estimator may be of no practical value because of its large variance. Model-based estimators may sometimes be biased, but they have the advantage of smaller variances compared to design-based estimators (Rao, 2003, Chapter 5).
Chapter 3

Linear Mixed Models (LMMs)
3.1 Introduction
In this chapter we present an overview of the linear mixed model. It is not intended to be an all-encompassing exposition of the subject; the rationale is to briefly explore the methods for parameter estimation used throughout the thesis. In particular, it serves as an introduction to the models that will be used in Chapter 4. Model-based small area estimation largely employs linear mixed models involving random area effects, with the auxiliary variables introduced in the fixed part of the model as covariates.

A linear mixed model (LMM) can be viewed as a parametric linear model for clustered or longitudinal data that defines the relationship between a continuous dependent variable and various predictor variables (covariates) (Brady et al., 2007). The term mixed refers to the presence of both fixed and random effects. The random effects, which are associated with one or more random factors, are used directly to model the random variation in the dependent variable at the different levels of the data.
Generally, clustered data are data sets obtained by measuring the variable of interest on different individuals grouped in clusters. Longitudinal data are data sets in which the dependent variable is measured several times over time for each individual.
LMMs can be considered as hierarchical models with at least two levels of data. Therefore, clustered and longitudinal data are referred to as hierarchical data sets, since the observations can be placed into the levels of a hierarchy in the data. Considering clustered or longitudinal data sets as multilevel data sets with at least two levels, we have the following. Level 1 denotes observations at the most detailed level of the data. In a clustered data set, Level 1 represents the units of analysis (or subjects) in the study. In a longitudinal data set, Level 1 represents the repeated measures made on the same unit of analysis. Level 2 represents the next level of the hierarchy. In clustered data sets, Level 2 observations represent clusters of units. For longitudinal data sets, Level 2 represents the units of analysis (Brady et al., 2007; McCulloch and Searle, 2001; Kubokawa, 2009).
This chapter covers the following topics: linear mixed model formulation, point estimation of parameters, confidence intervals for fixed parameters, bootstrapping two-level models, and prediction of random effects.
There are different ways to represent and classify LMMs, which can be expressed at the individual level (Level 1) or at Level 2, as well as in matrix notation. The common assumption is normality of all random effects; in this case the LMM is called a Gaussian mixed model (Brady et al., 2007; Jiang, 2007).

In this section we present some alternative representations of LMMs, namely the general matrix specification, the hierarchical linear model and the marginal linear model, which help to write an explicit expression for the likelihood function.
Following Brady et al. (2007), most of the components of an LMM can be written in a simple and general form at the level of an individual observation (Level 1) in the context of a clustered two-level data set. By convention, the index j refers to Level 1 units and the index i to Level 2 units.
Elements along the main diagonal of the D matrix represent the variances of each random
effect in ui , and the off-diagonal elements represent the covariances between two corre-
sponding random effects. Since there are q random effects in the model associated with
the ith cluster, D is a q × q matrix that is symmetric and positive definite, and is defined
as follows:
D = Var(ui) :=

    | Var(ui1)        Cov(ui1, ui2)   ...   Cov(ui1, uiq) |
    | Cov(ui1, ui2)   Var(ui2)        ...   Cov(ui2, uiq) |
    |      ...             ...        ...        ...      |
    | Cov(ui1, uiq)   Cov(ui2, uiq)   ...   Var(uiq)      |
The variance and covariance components of the D matrix can be treated as parameters
stored in a vector denoted by θD with elements and dimension depending on the structure
of the matrix D.
In contrast to the standard linear model, the residuals associated with observations within the same cluster in an LMM can be correlated. We assume that the ni residuals in the vector ei for a given cluster i are random variables that follow a multivariate normal distribution with mean vector 0 and a positive definite symmetric covariance matrix Σi,
which is defined in the following general form:
Σi = Var(ei) :=

    | Var(ei1)          Cov(ei1, ei2)     ...   Cov(ei1, ei,ni) |
    | Cov(ei2, ei1)     Var(ei2)          ...   Cov(ei2, ei,ni) |
    |       ...               ...         ...         ...       |
    | Cov(ei,ni, ei1)   Cov(ei,ni, ei2)   ...   Var(ei,ni)      |
As in the case of the matrix D, the variance and covariance components of the Σi matrix are stored in a vector denoted by θΣ. Combining both cases, we define the vector θ, used in subsequent sections, which collects all the covariance parameters contained in the vectors θD and θΣ. Often
the parameters in θ are simply variance and covariance components, but sometimes they
may have other interpretations such as the autoregressive parameter and moving average
component which are used in time series settings (e.g. Box and Jenkins, 1970) or the sill
and range in geostatistical applications (e.g. Cressie, 1991).
For example, for q = 2 random effects, the unstructured matrix D and the corresponding parameter vector θD are

D = Var(ui) = | σ²u1     σu1,u2 |
              | σu1,u2   σ²u2   | ,

θD = (σ²u1, σu1,u2, σ²u2)ᵗ.
Constraints are often imposed on the structure of D. The most commonly used structure is the variance components (or diagonal) structure, in which each random effect in ui has its own variance and all covariances in D are defined to be zero. In general, the θD vector for the variance components structure requires q covariance parameters, defining the variances on the diagonal of the D matrix. For example, in an LMM having two random effects associated with the ith subject, a variance components D matrix has the following form:

D = Var(ui) = | σ²u1   0    |
              | 0      σ²u2 | .
The simplest structure for the Σi matrix is the diagonal structure, Σi = σ²Ini, which implies one parameter in θΣi, defining the constant variance at each time point:

θΣi = σ².
There also exists another structure for the covariance matrix Σi, called the compound symmetry structure. Its general form for each subject i is

Σi = Var(ei) :=

    | σ² + σ1²   σ1²        ...   σ1²       |
    | σ1²        σ² + σ1²   ...   σ1²       |
    |    ...         ...    ...     ...     |
    | σ1²        σ1²        ...   σ² + σ1²  |

In this case there are two parameters in the θΣi vector that define the variances and covariances in the Σi matrix:

θΣi = (σ², σ1²)ᵗ.
The other structure commonly used for the Σi matrix is the first-order autoregressive structure, denoted AR(1), whose general form is

Σi = Var(ei) :=

    | σ²            σ²ρ           ...   σ²ρ^(ni−1) |
    | σ²ρ           σ²            ...   σ²ρ^(ni−2) |
    |    ...           ...        ...       ...    |
    | σ²ρ^(ni−1)    σ²ρ^(ni−2)    ...   σ²         |

The AR(1) structure has only two parameters in the θ vector that define all the variances and covariances in the Σi matrix, a variance parameter (σ² > 0) and a correlation parameter (−1 ≤ ρ ≤ 1):

θΣi = (σ², ρ)ᵗ.
Note that the AR(1) structure is frequently used to fit models for equally spaced longitudinal observations on the same units of analysis. Under this structure, observations that are closer to each other in time exhibit higher correlation than observations farther apart in time. In any given analysis, the most appropriate structure for the Σi matrix is determined by the observed data and by knowledge about the relationships between observations on an individual subject or cluster.
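The three Σi structures above can be constructed directly; the following is a small numpy sketch of ours (the function names are not from any library):

```python
import numpy as np

# Build the three Sigma_i structures described above for a cluster with
# n_i observations: diagonal, compound symmetry, and AR(1).

def sigma_diagonal(n_i, sigma2):
    # Constant variance, no within-cluster residual correlation.
    return sigma2 * np.eye(n_i)

def sigma_compound_symmetry(n_i, sigma2, sigma1_2):
    # sigma2 + sigma1_2 on the diagonal, sigma1_2 everywhere else.
    return sigma2 * np.eye(n_i) + sigma1_2 * np.ones((n_i, n_i))

def sigma_ar1(n_i, sigma2, rho):
    # The (j, k) entry is sigma2 * rho^|j - k|.
    idx = np.arange(n_i)
    return sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])

S = sigma_ar1(4, sigma2=2.0, rho=0.5)
print(S[0])  # first row: 2.0, 1.0, 0.5, 0.25 -- correlation decays with lag
```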
We now consider the general matrix specification of an LMM for a given cluster i,
Yi = Xi β + Zi ui + ei i = 1, . . . , m. (3.2)
where

Xi := (xi1, . . . , xi,ni)ᵗ ∈ R^(ni×p),   Zi := (zi1, . . . , zi,ni)ᵗ ∈ R^(ni×q),   Yi := (yi1, . . . , yi,ni)ᵗ ∈ R^ni.
Y = Xβ + Zu + e, (3.3)
where

    | u |             (     | G         0mq×n | )
    | e |  ∼  Nmq+n  ( 0 ,  | 0n×mq     R     | ) .
Y |u ∼ Nn (Xβ + Zu, R)
u ∼ Nmq (0, G) . (3.4)
The marginal linear model differs from the LMM in the presence or absence of random effects: marginal models do not allow cluster-specific effects, since no random effects are specified. The following is the marginal model specification implied by an LMM.
Y = Xβ + ε∗ , (3.5)
where

    ε* := Zu + e = [ Z   In×n ] | u |  =:  A | u | ,
                                | e |        | e |

with

    ε* ∼ Nn(0, V),

where

    V = A | G  0 | Aᵗ = [ Z  In×n ] | G  0 | | Zᵗ   |
          | 0  R |                  | 0  R | | In×n |

      = [ ZG  R ] | Zᵗ   |
                  | In×n |

      = ZGZᵗ + R

        | Z1 D Z1ᵗ + Σ1              0            |     | V1       0  |
      = |               ...                       |  =  |    ...      | .
        | 0                 Zm D Zmᵗ + Σm         |     | 0        Vm |
The implied marginal model defines the marginal distribution of the Y vector:
Y ∼ Nn (Xβ, V ) .
The implied marginal model has the advantage that it allows the estimation of the fixed model parameters (fixed effects and variance components) of an LMM to be carried out without reference to the random effects.
In the context of the LMM, we construct the likelihood function of β and θ by referring to the marginal distribution of the dependent variable Y defined in (3.5). The corresponding multivariate normal probability density function, f(Y |β, θ), is

f(Y |β, θ) = (2π)^(−n/2) |V|^(−1/2) exp{ −½ (Y − Xβ)ᵗ V⁻¹ (Y − Xβ) }.   (3.6)
Recall that the elements of the V matrix are functions of the covariance parameters in θ.
So the log likelihood is

l(β, θ) = −½ (Y − Xβ)ᵗ V⁻¹ (Y − Xβ) − ½ ln(|V|) − (n/2) ln(2π).   (3.7)
β̂ = (Xᵗ V⁻¹ X)⁻¹ Xᵗ V⁻¹ Y,   (3.8)

which is also known as the Generalized Least Squares (GLS) estimator when it is assumed that Y has a normal distribution.
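Computationally, (3.8) is best evaluated with linear solves rather than explicit matrix inverses; the following numpy sketch (ours, with simulated data and a known diagonal V) illustrates this:

```python
import numpy as np

# GLS estimator (3.8): beta_hat = (X^t V^{-1} X)^{-1} X^t V^{-1} Y,
# computed with linear solves instead of forming V^{-1} explicitly.
rng = np.random.default_rng(1)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + slope
beta_true = np.array([1.0, 2.0])

# A known heteroscedastic V, diagonal here purely for illustration.
V = np.diag(rng.uniform(0.5, 2.0, size=n))
Y = X @ beta_true + rng.multivariate_normal(np.zeros(n), V)

Vinv_X = np.linalg.solve(V, X)          # V^{-1} X
Vinv_Y = np.linalg.solve(V, Y)          # V^{-1} Y
beta_hat = np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_Y)
print(beta_hat.shape)  # (2,)
```

The GLS residuals satisfy the normal equations Xᵗ V⁻¹ (Y − Xβ̂) = 0, which provides a direct numerical check.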
To estimate V(θ), we put (3.9) into (3.7) to obtain the profile log likelihood

lp(θ) = l(β̃(θ), θ) = −½ (Y − Xβ̃(θ))ᵗ V(θ)⁻¹ (Y − Xβ̃(θ)) − ½ ln(|V(θ)|) − (n/2) ln(2π).

The profile log likelihood is then maximized to find θ̂ML.
Harville (1977) notes that, in the estimation of variance parameters, the ML method suffers from some well-known problems; in particular, it tends to give biased variance estimates. For the ordinary linear model, where m = 1, V = σ²I and q = 0, the ML
estimator of the single variance component σ 2 has expectation
E(σ̂²ML) = E{ (Y − Xβ̂)ᵗ (Y − Xβ̂) / n }
        = (1/n) E{ Yᵗ (In − X(XᵗX)⁻¹Xᵗ) Y }
        = (1/n) σ² tr(In − X(XᵗX)⁻¹Xᵗ) + (1/n) βᵗ Xᵗ (In − X(XᵗX)⁻¹Xᵗ) X β
        = ((n − r)/n) σ²,
where r is the rank of the matrix X; the second term vanishes because (In − X(XᵗX)⁻¹Xᵗ)X = 0. The bias is therefore −σ²r/n, which can be significant when the number of degrees of freedom n − r is small.
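This downward bias is easy to verify by simulation; the following sketch (ours, with arbitrary settings n = 20, r = 4, σ² = 9) checks that the Monte Carlo mean of σ̂²ML is close to ((n − r)/n)σ² = 7.2:

```python
import numpy as np

# Monte Carlo check that the ML variance estimator RSS/n has expectation
# ((n - r)/n) * sigma^2 in the ordinary linear model.  Settings are
# arbitrary illustrative choices.
rng = np.random.default_rng(2)
n, r, sigma2 = 20, 4, 9.0
X = rng.normal(size=(n, r))             # full column rank, so rank(X) = r
H = X @ np.linalg.inv(X.T @ X) @ X.T    # hat matrix

est = []
for _ in range(4000):
    e = rng.normal(0.0, np.sqrt(sigma2), size=n)
    resid = e - H @ e                   # residuals (beta cancels out)
    est.append(resid @ resid / n)       # ML estimator of sigma^2

print(np.mean(est))   # close to (n - r)/n * sigma2 = 7.2, not 9.0
```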
et al., 1992, Pinheiro and Bates, 2000, Verbeke and Molenberghs, 2000, Diggle et al., 2002).
The REML estimates of θ are based on optimization of the REML log-likelihood function. The idea is to first remove the effect of the fixed variables: the residuals are uncorrelated with all the fixed variables in the model, their distribution is normal, and it no longer depends on the estimates of the fixed effects but only on the variance components. Harville (1977) considered a joint likelihood function of (β, θ); by integrating out β, he calculated a marginal likelihood function depending only on θ (the restricted likelihood):
lR(θ) = ln ∫ L(β, θ) dβ,   (3.10)

where

∫ L(β, θ) dβ = |V(θ)|^(−1/2) (2π)^(−n/2) ∫ exp{ −½ (Y − Xβ)ᵗ V(θ)⁻¹ (Y − Xβ) } dβ.   (3.11)
Now consider the quadratic form in the exponent of (3.11). Define

A(θ) := Xᵗ V(θ)⁻¹ X   and   B(θ) := A(θ)⁻¹ Xᵗ V(θ)⁻¹,

so that β̃(θ) = B(θ)Y. Note that

(Y − Xβ)ᵗ V(θ)⁻¹ (Y − Xβ) = (β − B(θ)Y)ᵗ A(θ) (β − B(θ)Y) + Yᵗ [V(θ)⁻¹ − B(θ)ᵗ A(θ) B(θ)] Y.

Therefore we have

∫ L(β, θ) dβ = |V(θ)|^(−1/2) (2π)^(−n/2) exp{ −½ Yᵗ [V(θ)⁻¹ − B(θ)ᵗ A(θ) B(θ)] Y }
               × ∫ exp{ −½ (β − B(θ)Y)ᵗ A(θ) (β − B(θ)Y) } dβ

             = |V(θ)|^(−1/2) (2π)^(−n/2) exp{ −½ Yᵗ [V(θ)⁻¹ − B(θ)ᵗ A(θ) B(θ)] Y } (2π)^(p/2) |A(θ)|^(−1/2).
Now

(Y − Xβ̃(θ))ᵗ V(θ)⁻¹ (Y − Xβ̃(θ))
   = Yᵗ V(θ)⁻¹ Y − 2Yᵗ V(θ)⁻¹ X β̃(θ) + β̃(θ)ᵗ Xᵗ V(θ)⁻¹ X β̃(θ)
   = Yᵗ V(θ)⁻¹ Y − 2Yᵗ V(θ)⁻¹ X β̃(θ) + β̃(θ)ᵗ A(θ) β̃(θ)
   = Yᵗ V(θ)⁻¹ Y − 2Yᵗ B(θ)ᵗ A(θ) B(θ) Y + Yᵗ B(θ)ᵗ A(θ) B(θ) Y
   = Yᵗ V(θ)⁻¹ Y − Yᵗ B(θ)ᵗ A(θ) B(θ) Y,

so that Yᵗ [V(θ)⁻¹ − B(θ)ᵗ A(θ) B(θ)] Y = (Y − Xβ̃(θ))ᵗ V(θ)⁻¹ (Y − Xβ̃(θ)).
where σ̂²i = [(Xᵗ V̂REML⁻¹ X)⁻¹]ii is considered as an estimate of Var(β̂i). Therefore

β̂i ± z(1−α/2) √{ [(Xᵗ V̂REML⁻¹ X)⁻¹]ii }.   (3.14)
Note that (3.14) is an approximate confidence interval, though a very useful one in a wide variety of situations (Jiang, 2007). We will use the bootstrap to calculate better approximate confidence intervals, where we will be concerned with the computation of bootstrap standard errors (Efron and Tibshirani, 1993).
2. From each of the B samples, estimate the parameter θ, thereby obtaining B estimates θ*b.

3. Calculate θ*(.) = (1/B) Σ_{b=1}^{B} θ*b and var̂(θ*) = (1/(B−1)) Σ_{b=1}^{B} (θ*b − θ*(.))², the mean and the variance of θ* respectively.

4. The bias is Biaŝ_B = Biaŝ(θ*) = θ*(.) − θ̂.

5. The bias-corrected estimate of θ is θ̂_B = θ̂ − Biaŝ_B = 2θ̂ − θ*(.).
1. Draw one entire Level-2 unit (Yi , Xi , Zi ), containing ni Level-1 cases, with replace-
ment,
2. From this Level-2 unit, draw a bootstrap sample (Yi∗ , Xi∗ , Zi∗ ) of size ni with replace-
ment,
5. Repeat steps 1-4 B times and compute bias-corrected estimates and bootstrap standard errors.
The bootstrap has proved to be an approach that yields consistent estimates of the bias and standard errors of model parameters. Bootstrap estimation has been recommended for multilevel models, since it gives satisfactory results in small sample cases under minimal assumptions. The application of the bootstrap to multilevel models is not direct, because it depends on the nature of the problem. During a resampling process, the hierarchical data structure must be respected, since the observations exhibit intra-class dependency, and several adaptations are needed in order to deal properly with this dependency. Here we will consider a two-level model and discuss one bootstrap approach: the parametric bootstrap. Among the assumptions the parametric bootstrap is based on are the following: the explanatory variables are considered fixed, and both the model (specification) and the distribution(s) are assumed to be correct (Efron, 1982; Hall, 1992; Van der Leeden, Busing and Meijer, 1997).
For the level-2 residuals, contained in the vector u, we use the N(0, Ĝ) distribution. Let β̂ be the REML estimator of β. The (re)sampling procedure is then as follows (Van der Leeden, Busing and Meijer, 1997):
1. Draw a vector u* from a multivariate normal distribution with mean zero and covariance matrix Ĝ.

2. Draw a vector e* from a multivariate normal distribution with mean zero and covariance matrix R̂.
3. Generate the bootstrap samples Y ∗ from Y ∗ = X β̂ + Zu∗ + e∗ .
4. Compute estimates for all parameters of the two-level model.
5. Repeat steps 1-4 B times and compute bias-corrected estimates and bootstrap stan-
dard errors.
Note that in this sampling process the covariates are assumed to be fixed.
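The five steps above can be sketched as follows for a random intercept model; this is our own illustration, in which the "fitted" values stand in for REML estimates and the refitting in step 4 is simplified to OLS for β only:

```python
import numpy as np

# Parametric bootstrap for a two-level model (steps 1-5 above), sketched
# for a random intercept model Y = X beta + Z u + e, with estimated
# G_hat = sigma_v^2 I_m and R_hat = sigma_e^2 I_n.  The "fitted" values
# below are hypothetical stand-ins for REML estimates.
rng = np.random.default_rng(3)
m, n_i = 10, 4
n = m * n_i
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = np.kron(np.eye(m), np.ones((n_i, 1)))   # maps area effects to units
beta_hat = np.array([5.0, 1.5])
sigma_v2_hat, sigma_e2_hat = 2.0, 1.0

def one_bootstrap_sample():
    u_star = rng.normal(0.0, np.sqrt(sigma_v2_hat), size=m)   # step 1
    e_star = rng.normal(0.0, np.sqrt(sigma_e2_hat), size=n)   # step 2
    return X @ beta_hat + Z @ u_star + e_star                 # step 3

# Step 4 would refit the whole two-level model to each Y*; here we refit
# beta by OLS as a simple placeholder, and step 5 repeats this B times.
B = 200
boot_betas = np.array([np.linalg.lstsq(X, one_bootstrap_sample(),
                                       rcond=None)[0] for _ in range(B)])
print(boot_betas.mean(axis=0))   # centered near beta_hat
```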
in which sê_B(θ̂) is the bootstrap estimator of the standard deviation of θ̂. Alternatively, one might use

[ θ̂_B − z(1−α/2) sê_B(θ̂),  θ̂_B + z(1−α/2) sê_B(θ̂) ],   (3.18)

where θ̂_B is the bootstrap bias-corrected estimator of θ (Efron and Tibshirani, 1993).
This is the best predictor of u, and being a linear function of Y it is also the best linear predictor (BLP) of u. In practice the unknown β in (3.19) is replaced with its estimator β̂ = β̂GLS, which is the BLUE of β, yielding the best linear unbiased predictor (BLUP) of u:
Unbiasedness here means that the random variable u and its predictor ũ have the same expected value. Mathematically, the BLUP estimates of the fixed parameters and random effects are defined as the solution of Henderson's simultaneous equations, which are sometimes called the mixed model equations (Robinson, 1991).
Since Y |u ∼ Nn (Xβ + Zu, R) and u ∼ Nq (0, G), the joint density of Y and u is
f(Y, u) = f(Y |u) f(u)
        = (2π)^(−n/2) |R|^(−1/2) exp{ −½ (Y − Xβ − Zu)ᵗ R⁻¹ (Y − Xβ − Zu) }
          × (2π)^(−q/2) |G|^(−1/2) exp{ −½ uᵗ G⁻¹ u }
        = exp{ −½ [ (Y − Xβ − Zu)ᵗ R⁻¹ (Y − Xβ − Zu) + uᵗ G⁻¹ u ] } / ( (2π)^((n+q)/2) |R|^(1/2) |G|^(1/2) ).
To maximize the density f(Y, u), we calculate the partial derivatives of

ln f(Y, u) = −((n + q)/2) ln(2π) − ½ ln|R| − ½ ln|G| − ½ [ (Y − Xβ − Zu)ᵗ R⁻¹ (Y − Xβ − Zu) + uᵗ G⁻¹ u ]
with respect to β and u:
∂ ln f(Y, u)/∂β = Xᵗ R⁻¹ (Y − Xβ − Zu),
∂ ln f(Y, u)/∂u = Zᵗ R⁻¹ (Y − Xβ − Zu) − G⁻¹ u.
Setting these to zero yields the equations

Xᵗ R⁻¹ X β + Xᵗ R⁻¹ Z u = Xᵗ R⁻¹ Y
Zᵗ R⁻¹ X β + (Zᵗ R⁻¹ Z + G⁻¹) u = Zᵗ R⁻¹ Y,

which are written in matrix form as

| Xᵗ R⁻¹ X    Xᵗ R⁻¹ Z        | | β |     | Xᵗ R⁻¹ Y |
| Zᵗ R⁻¹ X    Zᵗ R⁻¹ Z + G⁻¹  | | u |  =  | Zᵗ R⁻¹ Y | .   (3.22)
These are Henderson's mixed model equations. Solving them produces β̂GLS and ũ. A practical advantage of the mixed model equations is their computational convenience: unlike in (3.20), there is no need to invert the n × n covariance matrix V. The inverses of the q × q matrix G and the n × n matrix R are needed instead, but they are often easy to compute, since q is usually not that large and R is usually diagonal.
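The equivalence between the mixed model equations and GLS can be checked numerically; in this sketch of ours, G and R are taken as known:

```python
import numpy as np

# Solve Henderson's mixed model equations (3.22) and check that the
# resulting beta equals the GLS estimator computed from V = Z G Z^t + R.
rng = np.random.default_rng(4)
m, n_i = 5, 3
n, q = m * n_i, m
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = np.kron(np.eye(m), np.ones((n_i, 1)))   # random intercept design
G = 2.0 * np.eye(q)                 # Var(u), taken as known here
R = 1.0 * np.eye(n)                 # Var(e)
Y = rng.normal(size=n)

Rinv, Ginv = np.linalg.inv(R), np.linalg.inv(G)
# Left-hand and right-hand sides of (3.22)
lhs = np.block([[X.T @ Rinv @ X, X.T @ Rinv @ Z],
                [Z.T @ Rinv @ X, Z.T @ Rinv @ Z + Ginv]])
rhs = np.concatenate([X.T @ Rinv @ Y, Z.T @ Rinv @ Y])
sol = np.linalg.solve(lhs, rhs)
beta_h, u_blup = sol[:2], sol[2:]

# GLS via the n x n matrix V, for comparison
V = Z @ G @ Z.T + R
Vinv = np.linalg.inv(V)
beta_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ Y)
print(np.allclose(beta_h, beta_gls))  # True
```

The BLUP from the mixed model equations also agrees with the familiar form ũ = G Zᵗ V⁻¹ (Y − Xβ̂).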
The predictor û is called the empirical best linear unbiased predictor (EBLUP) of u, the word empirical referring to the fact that the values of G and V have been estimated from the observed data. The estimator β̂ is now the GLS estimator with V replaced by its estimate (i.e. β̂ is typically β̂REML or β̂ML), and it is sometimes called the empirical BLUE of β. Both β̂ and û can be obtained by solving Henderson's mixed model equations with V, G and R substituted by the corresponding estimates.
Considering the model at cluster level (level 2), model (3.2), the EBLUP for the conditional
expectation E(Yi |ui ) = Xi β + Zi ui is given by
Expression (3.23) may be rewritten as a weighted average combining the information from cluster (or subject) i only and information from the population (auxiliary information). The EBLUP for Xiβ + Ziui may thus be viewed as borrowing strength across clusters to get the best prediction for cluster i: when the sample from a region is small, the information from that region alone is weak, and the regional estimate needs to be strengthened with supplementary global data. The smaller the regional sample, the more weight the global information gets in the estimation (Kubokawa, 2009; McCulloch and Searle, 2001).
This connection between linear mixed models and small area estimation is discussed in the
following chapter.
Chapter 4

Small Area Estimation with Linear Mixed Models
4.1 Introduction
In this chapter we illustrate how to apply the theory on linear mixed models and the related
EBLUP theory toward the estimation of a small area characteristic of interest. To give an
idea about the context, we start with the following discussion.
Suppose that the population U of size N is divided into M disjoint areas and that the
survey estimates are available for m, m ≤ M , of the areas.
Let
U = U1 ∪ U2 . . . ∪ Ui ∪ . . . ∪ UM
and
N = N1 + N2 + . . . + Ni + . . . + NM
where Ni is the size of area population Ui , i = 1, 2, . . . , M. Assume then that a random
sample s of size n is drawn from U and
s = s1 ∪ s2 . . . ∪ si ∪ . . . ∪ sm
and
n = n1 + n2 + . . . + ni + . . . + nm .
Furthermore, each area population Ui divides into the sample si and the remainder ri =
Ui − si . It is possible that for some areas the sample size ni is zero so that ri = Ui .
To illustrate these concepts, we consider the following example from Fuller (2009), which concerns the prediction of wind erosion in Iowa for the year 2002: 44 counties reported data on wind erosion, and the author included 4 counties with no observations for
30 CHAPTER 4. SMALL AREA ESTIMATION WITH LINEAR MIXED MODEL
the purpose of illustration. Each county is divided into segments, and the segments are the
primary sampling units of the survey. The soils of Iowa have been mapped, so population
values for a number of soil characteristics are available. The mean of the soil erodibility
index for each county is the auxiliary data in the small area model.
The data set includes variables such as county, totalsegments, samplesegments,
erodibility, and Y. The variable county records an identification number for each county,
totalsegments records the total number of segments in the county, and samplesegments
records the sampled number of segments in the county. The variable erodibility records a
standardized population mean of the soil erodibility index for each county, and Y records
the survey sample mean (direct estimate) of a variable that is related to wind erosion
for each county. The first 44 observations contain the observed data, and the last 4 obser-
vations contain the hypothetical counties for which there were no observations in the survey.
Relating this example to the general description above, the following specifications are given:
ȳi = θi + ei
θi = x̄ti β + υi ,
i.e.
ȳi = x̄ti β + υi + ei ,
4.2. NESTED ERROR REGRESSION MODEL 31
with x̄_i^t = [1, 0.1(r̄_{1i} − 59)], i = 1, 2, . . . , 48, where υ_i is the area effect and e_i is the sampling error. Since the mean of y for area i is assumed to be the sum of x̄_i^t β and υ_i, the fixed and random parts respectively, this model is also called a mixed model (see Chapter 3).
The model presented here is the area level model, where the quantity of interest is the mean, but the nature of the data may vary across contexts and problems. The model can be defined in terms of primary sampling units (unit level) or in area level terms, as in this example.
This chapter covers the statistical framework specific to small area estimation. We present the corresponding linear mixed model in Section 4.2. In Section 4.3 we discuss prediction of small area quantities of interest, and we close the chapter with an example from the literature.
The nested error regression model is
$$ y_{ij} = x_{ij}^t \beta + u_i + e_{ij}, \qquad j = 1, \dots, N_i, \; i = 1, \dots, m. \qquad (4.1) $$
Here y_ij is the response of unit j in area i, x_ij is the corresponding vector of auxiliary
variables, β is the vector of fixed parameters, ui is the random effect of area i and eij the
random individual error term. The area effects ui are assumed independent with zero mean
and variance σu2 . Similarly, the errors eij are independent with zero mean and variance
σe2 . In addition, the ui ’s and the eij ’s are assumed mutually independent. Under these
assumptions,
E(y_ij) = x_ij^t β and Var(y_ij) = σ_u^2 + σ_e^2, with Cov(y_ij, y_ij′) = σ_u^2 for j ≠ j′.
We also make the usual assumption that both ui and eij are normally distributed. Let Yi
denote the ni × 1 vector of the ni elements in area Ui , i.e.
$$ Y_i = (y_{i1}, y_{i2}, \dots, y_{in_i})^t_{\,(n_i \times 1)}. $$
Now the model in the matrix form, for the population Ui of area i is
Yi = Xi β + Zi ui + ei , (4.2)
where
$$ X_i = (x_{i1}, x_{i2}, \dots, x_{in_i})^t_{\,(n_i \times p)}. $$
With this notation, E(Y_i) = X_i β and Var(Y_i) = V_i = J_{n_i} σ_u^2 + I_{n_i} σ_e^2, where J_{n_i} = 1_{n_i} ⊗ 1_{n_i}^t is the square matrix of ones and ⊗ denotes the Kronecker product.
Using the usual mixed model notation we can write Vi = Zi Gi Zit + Ri where Gi = σu2 and
Ri = σe2 Ini . If we further combine the area vectors Yi into one response vector
$$ Y = (Y_1^t, Y_2^t, \dots, Y_m^t)^t_{\,(n \times 1)}, $$
we have the model in the form (3.3) of the general linear mixed model. The model is now
4.3. BLU AND EBLU PREDICTORS 33
Y = Xβ + Zu + e, with
$$
X = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_m \end{pmatrix}_{(n \times p)}, \quad
Z = \begin{pmatrix} 1_{n_1} & 0 & \cdots & 0 \\ 0 & 1_{n_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1_{n_m} \end{pmatrix}_{(n \times m)}, \quad
u = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_m \end{pmatrix}_{(m \times 1)}, \quad
e = \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_m \end{pmatrix}_{(n \times 1)}.
$$
Units (or observations) coming from different areas i and i′ are uncorrelated, so the covariance matrix V of the response vector Y is block-diagonal:
$$
\mathrm{Var}(Y) = V = \begin{pmatrix} V_1 & 0 & \cdots & 0 \\ 0 & V_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & V_m \end{pmatrix}_{(n \times n)} = Z G Z^t + R, \quad \text{where } G = \sigma_u^2 I_m \text{ and } R = \sigma_e^2 I_n.
$$
Now suppose that the interest is in estimating the population total of the variable of interest for area i,
$$ Y_i = \sum_{j \in U_i} y_{ij}. $$
Following Royall (1976), let s denote a sample of size n from a finite population U and
let r denote the nonsampled remainder of U so that U = s ∪ r. Correspondingly, let ys
be the vector of observations in the sample and yr the rest of y. We have the following
decomposition of the linear quantity of interest h(y) = c^t y:
$$ y = \begin{pmatrix} y_s \\ y_r \end{pmatrix}, \quad c = \begin{pmatrix} c_s \\ c_r \end{pmatrix}, \quad c^t y = c_s^t y_s + c_r^t y_r. $$
Note that the first term cts ys is observed from the sample, whereas the second term must
be estimated (or predicted, in the frequentist terminology, since it is a function of random
variables, not a fixed parameter). Thus, estimating h(y), or predicting h(Y ), is essentially
predicting the value ctr yr of the unobserved random variable ctr Yr .
Assume that, under the model ξ,
$$ E_\xi(Y) = X\beta, \qquad \mathrm{Var}_\xi(Y) = V, $$
with V partitioned according to the sample and the remainder as
$$ V = \begin{pmatrix} V_s & V_{sr} \\ V_{rs} & V_r \end{pmatrix}, $$
$$
V = \begin{pmatrix} V_s & V_{sr} \\ V_{rs} & V_r \end{pmatrix}
= \begin{pmatrix} Z_s G Z_s^t & Z_s G Z_r^t \\ Z_r G Z_s^t & Z_r G Z_r^t \end{pmatrix}
+ \begin{pmatrix} \sigma_e^2 I_n & 0 \\ 0 & \sigma_e^2 I_{N-n} \end{pmatrix}.
$$
The following theorem from Royall (1976) gives the best linear unbiased predictor (BLUP)
of ct Y as well as its error variance under the general linear model, in the case of finite pop-
ulation and serves as a general basis of the BLUP approach to small area estimation with
unit level models. In this context, the best linear unbiased predictor means an unbiased
predictor which is linear in observed data (Ys ) and has a minimum error variance among
all linear unbiased predictors.
where
β̂ = (Xst Vs−1 Xs )−1 Xst Vs−1 ys
is the generalized least squares (GLS) estimator of β.
Note that the GLS estimator is also the best linear unbiased estimator (BLUE) of β, that
is, it has the minimum variance among linear unbiased estimators (McCulloch and Searle,
2001).
Proof:
As we are interested in predicting the part which is not observed, the information needed
will come from observed variables (sample vector ys ). Considering the decomposition of
the population quantity of interest, h(Y) = c^t Y = c_s^t Y_s + c_r^t Y_r, the random term is c_r^t Y_r. Now suppose that the predictor of this term is a linear function of the data, i.e. the predictor of c_r^t Y_r is of the form a^t Y_s, where a is a vector to be specified. In
addition the estimator ĥ(Y ) is assumed to be unbiased. Now, define
the last expression is obtained from the expression of V and the fact that E(Y ) = Xβ.
Now the BLUP(c^t Y) is found by minimizing E_ξ[(a^t Y_s − c_r^t Y_r)^2] with respect to a under the constraint of model unbiasedness,
$$ X_s \lambda = V_s a - V_{sr} c_r, \qquad (4.5) $$
and then
$$ a = V_s^{-1} (V_{sr} c_r + X_s \lambda). \qquad (4.6) $$
Multiplying (4.5) by X_s^t V_s^{-1} yields X_s^t V_s^{-1} X_s λ = X_s^t a − X_s^t V_s^{-1} V_{sr} c_r; using the unbiasedness constraint X_s^t a = X_r^t c_r and solving for λ, we get
$$ \lambda = (X_s^t V_s^{-1} X_s)^{-1} (X_r^t - X_s^t V_s^{-1} V_{sr}) c_r, $$
then after substituting this expression in (4.6) and making some simplifications, we get
a = Vs−1 [Vsr − Xs (Xst Vs−1 Xs )−1 (Xst Vs−1 Vsr − Xrt )]cr . (4.7)
BLUP(ct Y ) = cts Ys + (Vs−1 [Vsr − Xs (Xst Vs−1 Xs )−1 (Xst Vs−1 Vsr − Xrt )]cr )t Ys
= cts Ys + ctr [Vsr − Xs (Xst Vs−1 Xs )−1 (Xst Vs−1 Vsr − Xrt )]t Vs−1 Ys
= cts Ys + ctr [Vsrt Vs−1 Ys − (Xst Vs−1 Vsr − Xrt )t (Xst Vs−1 Xs )−1 Xst Vs−1 Ys ]
= cts Ys + ctr [Vsrt Vs−1 Ys − (Xst Vs−1 Vsr − Xrt )t β̂]
= c_s^t Y_s + c_r^t [X_r β̂ + V_{sr}^t V_s^{-1} (Y_s − X_s β̂)].
The expression of error variance is found in the same way by inserting (4.7) into (4.4) •
The area total can be written as Y_i = c_i^t y, where c_i is an N × 1 vector of N_i ones and N − N_i zeros, such that the ones correspond to those y_ij of y which belong to area i. The same partition applied to c_i yields
$$ c_i = \begin{pmatrix} c_{is} \\ c_{ir} \end{pmatrix}, $$
where cis picks the units in the sample si from area i and cir picks those in the remainder
ri . Now the area total to be estimated is
Yi = ctis ys + ctir yr
and the general prediction theorem can be applied directly. This gives
$$ \tilde{Y}_{i,BLUP} = c_{is}^t y_s + c_{ir}^t X_r \hat\beta + c_{ir}^t Z_r [G Z_s^t V_s^{-1} (y_s - X_s \hat\beta)] = c_{is}^t y_s + (l_i^t \hat\beta + m_i^t \tilde{u}), $$
where l_i^t = c_{ir}^t X_r, m_i^t = c_{ir}^t Z_r, and ũ is the BLUP of u according to (3.20). Furthermore, l_i^t β̂ + m_i^t ũ is the BLUP of the linear combination l_i^t β + m_i^t u (Rao, 2003).
Note that c_{is}^t y_s = Σ_{j∈s_i} y_ij, l_i^t = c_{ir}^t X_r = Σ_{j∈r_i} x_ij^t, and m_i^t = c_{ir}^t Z_r is a row vector with N_i − n_i in entry i and zeros elsewhere. The BLUP is now
$$ \tilde{Y}_{i,BLUP} = \sum_{j \in s_i} y_{ij} + \Big( \sum_{j \in r_i} x_{ij}^t \Big) \hat\beta + (N_i - n_i)\tilde{u}_i, $$
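To make the last display concrete, the following Python sketch (hypothetical data; β̂ and ũ_i are taken as given rather than estimated) computes the BLUP of an area total as the observed sample sum plus the model prediction for the N_i − n_i non-sampled units:

```python
import numpy as np

def area_total_blup(y_s, X_r, beta_hat, u_tilde_i):
    """BLUP of the area-i total: sum of the sampled y's plus the regression
    prediction (including the predicted area effect) for each non-sampled unit."""
    n_i = len(y_s)
    N_i = n_i + X_r.shape[0]
    return y_s.sum() + X_r.sum(axis=0) @ beta_hat + (N_i - n_i) * u_tilde_i

# Hypothetical area: 3 sampled units, 7 non-sampled units, p = 2 covariates.
rng = np.random.default_rng(1)
y_s = rng.normal(loc=5.0, size=3)
X_r = np.column_stack([np.ones(7), rng.normal(size=7)])
total = area_total_blup(y_s, X_r, beta_hat=np.array([4.0, 0.5]), u_tilde_i=0.3)
```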
Specifically, Battese et al. (1988) specified a linear regression model for the relationship
between the reported hectares of corn and soybeans within sample segments in the Enumer-
ative Survey and the corresponding auxiliary data in order to determine the areas under
corn and soybeans. A nested error model, which defines a correlation structure among reported crop hectares within the counties, was used; from it, the mean hectares of the crop per segment in a county was defined as the conditional mean of reported hectares, given the satellite determinations (auxiliary data) and the realized (random) county effect.
The Battese et al. (1988) data, showing the June Enumerative Survey and the satellite data, are presented in Table 4.1. The data show, specifically:
(i) the number of sample segments and the total number of segments in each county;
(ii) the number of hectares of corn and soybeans for each sample segment (as reported in the June Enumerative Survey);
(iii) the number of pixels classified as corn and soybeans for each sample segment.
According to Battese et al. (1988), the model construction is such that the reported crop hectares for corn (or soybeans) in sample segments within counties are expressed as a function of the satellite data for those sample segments. The unit level model is given by
$$ y_{ij} = \beta_0 + \beta_1 x_{1ij} + \beta_2 x_{2ij} + v_i + e_{ij}, \qquad (4.8) $$
where i is the subscript for county (i = 1, 2, . . . , m = 12), j is the subscript for a segment
within a given county (j = 1, 2, . . . , ni , where ni is the number of sample segments in the
ith county), yij is the number of hectares of corn (or soybeans) in the j th segment of the
ith county, as reported in the June Enumerative Survey, x1ij and x2ij are the number of
pixels classified as corn and soybeans, respectively, in the j th segment of the ith county,
β0 ,β1 , and β2 are unknown parameters, vi is the ith county effect and eij is the random
error associated with the j th segment within the ith county. In addition the random effects,
vi (i = 1, 2, . . . , m = 12), are assumed to be iid N (0, σv2 ) random variables independent
of the random errors, eij (j = 1, 2, . . . , ni ; i = 1, 2, . . . , m) which are assumed to be i.i.d
N (0, σe2 ) random variables.
Next, the sample mean for the hectares of corn (or soybeans) per segment in the i-th county is given by ȳ_{i·} = n_i^{-1} Σ_{j=1}^{n_i} y_ij, while the sample mean numbers of pixels of corn and soybeans are x̄_{1i·} = n_i^{-1} Σ_{j=1}^{n_i} x_{1ij} and x̄_{2i·} = n_i^{-1} Σ_{j=1}^{n_i} x_{2ij}, respectively. Thus the model for this mean, ȳ_{i·}, is given by
$$ \bar{y}_{i\cdot} = \beta_0 + \beta_1 \bar{x}_{1i\cdot} + \beta_2 \bar{x}_{2i\cdot} + \upsilon_i + \bar{e}_{i\cdot}, $$
where ē_{i·} = n_i^{-1} Σ_{j=1}^{n_i} e_ij.
The population mean hectares of corn (or soybeans) per segment in the ith county is
defined as the conditional mean of the hectares of corn (or soybeans) per segment, given
Table 4.1: Survey and Satellite Data for corn and soybeans in 12 Iowa Counties.
County | No. of segments (Sample, County) | Reported hectares (Corn, Soybeans) | No. of pixels in sample segments (Corn, Soybeans) | Mean no. of pixels per segment (Corn, Soybeans)
Cerro Gordo 1 545 165.76 8.09 374 55 295.29 189.70
Hamilton 1 566 96.32 106.03 209 218 300.40 196.65
Worth 1 394 76.08 103.60 253 250 289.60 205.28
Humboldt 2 424 185.35 6.47 432 96 290.74 220.22
116.43 63.82 367 178
Franklin 3 564 162.08 43.50 361 137 318.21 188.06
152.04 71.43 288 206
161.75 42.49 369 165
Pocahontas 3 570 92.88 105.26 206 218 257.17 247.13
149.94 76.49 316 221
64.75 174.34 145 338
Winnebago 3 402 127.07 95.67 355 128 291.77 185.37
133.55 76.57 295 147
77.70 93.48 223 204
Wright 3 567 206.39 37.84 459 77 301.26 221.36
108.33 131.12 290 217
118.17 124.44 307 258
Webster 4 687 99.96 144.15 252 303 262.17 247.09
140.43 103.60 293 221
98.95 88.59 206 222
131.04 115.58 302 274
Hancock 5 569 114.12 99.15 313 190 314.28 196.66
100.60 124.56 246 270
127.88 110.88 353 172
116.90 109.14 271 228
87.41 143.66 237 297
Kossuth 5 965 93.48 91.05 221 167 298.65 204.61
121 132.33 369 191
109.91 143.14 343 249
122.66 104.13 342 182
104.21 118.57 294 179
Hardin 5 556 88.59 102.59 220 262 325.99 177.05
165.35 69.28 355 160
104 99.15 261 221
88.63 143.66 187 345
153.70 94.49 350 190
the realized county effect υ_i and the values of the satellite data. Under the assumptions of the model (4.8), Battese et al. (1988) presented this mean as
$$ \bar{y}_{i(p)} = \beta_0 + \beta_1 \bar{x}_{1i(p)} + \beta_2 \bar{x}_{2i(p)} + \upsilon_i, \qquad (4.11) $$
where x̄_{1i(p)} and x̄_{2i(p)} are the population mean numbers of pixels of corn and soybeans per segment in the i-th county.
Let Y_i represent the column vector of the reported hectares of the given crop for the n_i sample segments in the i-th county, Y_i = (y_{i1}, y_{i2}, . . . , y_{in_i})^t. Furthermore, consider Y as the column vector of the reported hectares of the crop for the sample segments in the m counties, Y = (Y_1^t, Y_2^t, . . . , Y_m^t)^t. Thus the model (4.8), expressed in matrix notation, in marginal form, is
marginal form, is
Y = Xβ + υ ∗ , (4.9)
where the row of X that corresponds to the element y_ij in Y is x_ij^t = (1, x_{1ij}, x_{2ij}) and β = (β_0, β_1, β_2)^t. The random vector υ* in (4.9) combines random effects and random errors, i.e. υ*_ij = υ_i + e_ij. The covariance matrix for the random vector υ* in (4.9) is given by
$$ \mathrm{Var}(\upsilon^*) = V = \mathrm{diag}(V_1, V_2, \dots, V_m), \qquad (4.10) $$
where
$$ V_i = J_i \sigma_\upsilon^2 + I_i \sigma_e^2, $$
with Ji the square matrix of order ni with every element equal to 1 and Ii the identity
matrix of order ni (Battese et al., 1988).
In matrix notation, the mean crop hectares per segment in the i-th area is expressed as
$$ \bar{x}_{i(p)}^t \beta + \upsilon_i, $$
where x̄_{i(p)} = N_i^{-1} Σ_{j=1}^{N_i} x_ij = (1, x̄_{1i(p)}, x̄_{2i(p)})^t.
The best predictor of υ_i is the conditional expectation of υ_i given the sample mean ῡ*_{i·} = n_i^{-1} Σ_{j=1}^{n_i} υ*_ij (Henderson, 1975). Because υ_i and ῡ*_{i·} have a joint bivariate normal distribution, this conditional expectation, considered as the best predictor, has the expression
$$ E(\upsilon_i \mid \bar{\upsilon}^*_{i\cdot}) = g_i \, \bar{\upsilon}^*_{i\cdot}, \qquad (4.12) $$
where g_i = m_i^{-1} σ_υ^2 and m_i = σ_υ^2 + n_i^{-1} σ_e^2.
Considering the variances components συ2 and σe2 to be known, the β parameters of the
model can be estimated by the generalized least squares estimator
β̃ = (X t V −1 X)−1 X t V −1 Y, (4.13)
which is equivalent to the ML estimator under the normal distribution of Y. Then, following (4.12), a possible predictor for the i-th county effect, υ_i, is
$$ \tilde{\upsilon}_i = g_i (\bar{y}_{i\cdot} - \bar{x}_{i\cdot}^t \tilde{\beta}), \qquad (4.14) $$
which, according to Harville (1985), is the best linear unbiased predictor of υ_i.
According to Section 4.3, the predictor for the finite population mean crop hectares per segment in the i-th county is given by
$$ N_i^{-1} \left[ \sum_{j=1}^{n_i} y_{ij} + \sum_{j=n_i+1}^{N_i} (x_{ij}^t \tilde\beta + \tilde{\upsilon}_i) \right], $$
where the unobserved y_ij are replaced by the model predictions. This expression approaches the predictor (4.15) as the sampling rate decreases. Because the sampling rates are small in this application, it is approximated by the predictor (4.15), which is one of several predictors that have been suggested for the small area problem (Fay and Herriot, 1979).
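The role of g_i as a shrinkage weight can be sketched numerically. In the following Python example (hypothetical values; β̃ and the variance components are taken as known), the county residual is shrunk heavily toward zero when only one segment is sampled, while the direct county information dominates as n_i grows:

```python
import numpy as np

def county_mean_predictor(ybar_i, xbar_i, xbar_ip, beta_tilde,
                          sigma_u2, sigma_e2, n_i):
    """Predictor of the county mean: population-level regression fit plus the
    shrunken county residual, with g_i = sigma_u^2 / (sigma_u^2 + sigma_e^2/n_i)."""
    g_i = sigma_u2 / (sigma_u2 + sigma_e2 / n_i)
    v_tilde = g_i * (ybar_i - xbar_i @ beta_tilde)   # predicted county effect
    return xbar_ip @ beta_tilde + v_tilde

beta = np.array([10.0, 0.4])                  # hypothetical fitted coefficients
xbar_s = np.array([1.0, 300.0])               # county sample mean covariates
xbar_p = np.array([1.0, 295.0])               # county population mean covariates
p_small = county_mean_predictor(140.0, xbar_s, xbar_p, beta, 60.0, 150.0, n_i=1)
p_large = county_mean_predictor(140.0, xbar_s, xbar_p, beta, 60.0, 150.0, n_i=50)
```

With these numbers the county sample mean exceeds its regression fit, so the predictor with n_i = 50 sits closer to the direct county evidence than the one with n_i = 1.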
Hence combining (4.14) and (4.15), a feasible predictor for the mean crop area (4.11) in
county i is
where ĝ_i, the approximately unbiased estimator for g_i suggested by Fuller and Harter (1987), is such that
$$
\begin{aligned}
\hat{g}_i &= 1 - \hat{h}_i, \\
\hat{h}_i &= [\hat{m}_i + \hat{k}_i + (n_i^{-1} - c)^2 \hat{w}_i]^{-1} [n_i^{-1}\hat\sigma_e^2 + (n_i^{-1} - c) n_i^{-1} \hat{w}_i], \\
\hat{m}_i &= \hat{m}_{\cdot\cdot} + (n_i^{-1} - c)\hat\sigma_e^2, \\
\hat{w}_i &= 2 d_e^{-1} \hat{m}_i^{-1} \hat\sigma_e^4, \\
\hat{k}_i &= 2\hat\sigma_e^2 (\ddot\sigma_{ff} + n_i^{-1})^{-1} \Big[\sum_{i=1}^m n_i b_i\Big]^{-2} \Big[\sum_{i=1}^m n_i^2 b_i (\ddot\sigma_{ff} + n_i^{-1})^2\Big], \\
\ddot\sigma_{ff} &= \max[0,\; (m-5)^{-1}(m-3)\hat\sigma_e^{-2}\hat{m}_{\cdot\cdot} - c], \\
b_i &= 1 - 2 n_i \bar{x}_{i\cdot}(X^tX)^{-1}\bar{x}_{i\cdot}^t + \bar{x}_{i\cdot}(X^tX)^{-1}\Big(\sum_{i=1}^m n_i^2 \bar{x}_{i\cdot}^t \bar{x}_{i\cdot}\Big)(X^tX)^{-1}\bar{x}_{i\cdot}^t, \\
d_i &= n_i^{-1}[1 - n_i \bar{x}_{i\cdot}(X^tX)^{-1}\bar{x}_{i\cdot}^t], \\
\hat{m}_{\cdot\cdot} &= \Big(\sum_{i=1}^m n_i b_i\Big)^{-1}\Big(\sum_{i=1}^m n_i u_i\Big), \qquad u_i = \bar{y}_{i\cdot} - \bar{x}_{i\cdot}(X^tX)^{-1}X^tY, \\
c &= \Big(\sum_{i=1}^m n_i b_i\Big)^{-1}\Big(\sum_{i=1}^m n_i d_i\Big), \qquad d_e = \sum_{i=1}^m (n_i - 1) - 2.
\end{aligned}
$$
Under the assumptions that the random effects and error effects are normally distributed, it follows from (4.9) and (4.10) that Y is normally distributed with mean Xβ and variance V.
Now, considering the models for corn and soybeans separately, we first estimate the variance components using the restricted maximum likelihood approach presented previously; then, applying (3.13), we find the restricted maximum likelihood estimators of the β parameters. The estimators are given in Tables 4.2 and 4.3, respectively. The data used in this example were taken from Battese et al. (1988) and are presented in Table 4.1.
Note that the variance components estimates σ̂_u² and σ̂_e² do not have explicit expressions; they were obtained by applying (3.12) numerically. The confidence intervals for the fixed effects are found using (3.14). To find the bootstrap intervals for corn and soybeans, we follow the resampling procedure of Section 3.5.2 with B = 1000 bootstrap replications and then apply (3.16). The results are presented in Tables 4.4 and 4.5 for the normal confidence intervals, and in Tables 4.6 and 4.7 for the bootstrap confidence intervals, for corn and soybeans respectively.
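The parametric bootstrap just described can be sketched as follows. This Python example is a simplified version (hypothetical simulated data, variance components treated as known, and the GLS estimator used for β rather than the full REML fit):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical one-way nested error layout: m = 12 areas, 4 units each.
m, n_per = 12, 4
n = m * n_per
areas = np.repeat(np.arange(m), n_per)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
sigma_u2, sigma_e2 = 1.0, 2.0
# Block covariance: sigma_u^2 within the same area, plus sigma_e^2 on the diagonal.
V = sigma_u2 * (areas[:, None] == areas[None, :]) + sigma_e2 * np.eye(n)

def gls(y):
    ViX = np.linalg.solve(V, X)
    return np.linalg.solve(X.T @ ViX, ViX.T @ y)

def simulate(beta):
    u = rng.normal(scale=np.sqrt(sigma_u2), size=m)
    e = rng.normal(scale=np.sqrt(sigma_e2), size=n)
    return X @ beta + u[areas] + e

y = simulate(np.array([2.0, 1.0]))
beta_hat = gls(y)

# Parametric bootstrap: re-simulate B data sets from the fitted model,
# re-estimate beta each time, and take percentiles for the slope interval.
B = 1000
boot = np.array([gls(simulate(beta_hat)) for _ in range(B)])
lo, hi = np.percentile(boot[:, 1], [2.5, 97.5])
```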
The predictor for the mean crop area (4.11) in county i is calculated using (4.16) and the
results are presented in tables 4.8 and 4.9 for hectares of corn and soybeans respectively.
This example shows the potential of small area estimation methods. They are designed to combine two types of information: on the one hand, reliable (but expensive) survey information on crop land use, available for only a few sample segments; on the other hand, cheap satellite auxiliary information on all the segments of the counties under study. By "borrowing strength" from the auxiliary information of other segments, we have been able to find the predicted mean hectares of corn (or soybeans) in those 12 Iowa counties.
Chapter 5

Generalized Linear Mixed Models (GLMMs)

5.1 Introduction
The linear mixed models that we reviewed in Chapter 3 are generally used in situations where the observations are continuous and where the relationship between the mean of the dependent variable and the covariates can be modeled as a linear function. However, in some cases the observations are correlated as well as discrete or categorical, and the assumptions of linearity and independence become questionable given the nature of the dependent variable. In such cases linear mixed models do not apply, and extensions are needed to handle observations that are simultaneously correlated and discrete or categorical. Such extensions are covered by the generalized linear mixed models (GLMMs) (Jiang, 2007).
The generalized linear mixed model is the most frequently used random effects model in the
context of discrete repeated measurements. Generalized linear mixed models are extensions
of generalized linear models that allow for additional components of variability due to un-
observable effects. Typically, the unobserved effects are modeled by the inclusion of random
effects in the linear predictor of the generalized linear model (McCulloch and Searle, 2001).
The structure of the chapter is as follows. In Section 5.2 we present the generalized linear mixed model with responses distributed according to a member of the Exponential family. Parameter estimation is discussed in Section 5.3, where we present a simulation-based version of the EM algorithm, the Monte Carlo EM algorithm.
48 CHAPTER 5. GENERALIZED LINEAR MIXED MODELS (GLMMS)
Note that the relationship between the mean of the random variable of interest and the covariates is not linear; instead, a transformation of the mean is modeled as a linear function of both the fixed and random factors.
where η_ij is the linear predictor and g(·) is a known function, called the link function (since it links the conditional mean of y_ij to the linear form of the predictors), with x_ij and z_ij vectors of known covariate values, β a vector of unknown fixed regression coefficients, and φ a scale parameter. To this specification we have added u_i, the random effects vector. Note that we are using µ_ij here to denote the conditional mean
of yij given ui , not the unconditional mean. To complete the specification we assign a
distribution to the random effects:
Let the random variable of interest, conditional on the random effects, have a Bernoulli distribution with probability of success p_ij. This means that, given the random effects u_1, u_2, . . . , u_{m_1} and v_1, v_2, . . . , v_{m_2}, the binary responses y_ij are conditionally independent with
$$ p_{ij} = P(y_{ij} = 1 \mid u_i, v_j) = \frac{\exp(\mu + u_i + v_j)}{1 + \exp(\mu + u_i + v_j)}. $$
After some manipulations the corresponding log-likelihood under this model has the fol-
lowing expression:
$$
\begin{aligned}
l(\theta \mid y) = {} & -\frac{m_1 + m_2}{2}\log(2\pi) - \frac{m_1}{2}\log(\sigma_1^2) - \frac{m_2}{2}\log(\sigma_2^2) + \mu y_{\cdot\cdot} \\
& + \log \int \cdots \int \prod_{i=1}^{m_1}\prod_{j=1}^{m_2} \{1 + \exp(\mu + u_i + v_j)\}^{-1} \\
& \times \exp\Big(\sum_{i=1}^{m_1} u_i y_{i\cdot} + \sum_{j=1}^{m_2} v_j y_{\cdot j} - \frac{1}{2\sigma_1^2}\sum_{i=1}^{m_1} u_i^2 - \frac{1}{2\sigma_2^2}\sum_{j=1}^{m_2} v_j^2\Big)\, du_1 \cdots du_{m_1}\, dv_1 \cdots dv_{m_2}.
\end{aligned}
$$
The above integral has no explicit expression and cannot be simplified. This example shows the limitation of likelihood-based inference in GLMMs. To solve, or at least to sidestep, the computational difficulties, an alternative approach is the Monte Carlo Expectation-Maximization (Monte Carlo EM) algorithm (McCulloch, 1997), which we present below. Before discussing the Monte Carlo EM approach, we give a brief overview of the EM algorithm.
The key idea of the EM algorithm is to work with the complete data, a combination of the observed data and unobserved random variables (here, the random effects). The EM algorithm is an iterative routine requiring two primary calculations in each iteration: computation of a particular conditional expectation of the log-likelihood (E-step) and maximization of this expectation over the relevant parameters (M-step).
Let y = (y1 , y2 , . . . , ym )t be the observed data vector and, conditional on the random ef-
fects, u, assume that the elements of y are independent and drawn from a distribution that
belongs to the exponential family. Furthermore, assume a distribution for u depending on
parameters that are elements of D.
Here η_i = x_i^t β + z_i^t u, where x_i^t and z_i^t are known vectors of covariates. The marginal
likelihood corresponding to (5.4) is given by
m
Z Y
L(β, τ, D|y) = fyi |u (yi |u)fu (u|D)du, (5.5)
i=1
which usually cannot be evaluated in closed form: as mentioned above, the integral rarely has an explicit expression, and its dimension equals the number of levels of the random factors u. Such integrals may be evaluated numerically. The goal is to develop an algorithm to calculate fully parametric maximum likelihood (ML) estimates based on the likelihood (5.5).
The EM algorithm works on the augmented log-likelihood to obtain the ML estimates of the parameters of the likelihood (5.5). Now consider the class of models (5.4); to set up the EM algorithm, we take the random effects, u, to be the missing data.
5.3. PARAMETER ESTIMATION 51
The complete data, w, is then w = (y, u), and the complete-data log-likelihood is given by
$$ \ln L_w = \sum_i \ln f_{y_i|u}(y_i \mid u, \beta, \tau) + \ln f_u(u \mid D). \qquad (5.6) $$
1. Choose starting values β^(0), τ^(0) and D^(0). Set r = 0.
2. Calculate:
(a) β^(r+1) and τ^(r+1), which maximize E[ln f_{y|u}(y|u, β, τ) | y].
(b) D^(r+1), which maximizes E[ln f_u(u|D) | y].
(c) Set r = r + 1.
3. If convergence is achieved, declare β^(r+1), τ^(r+1) and D^(r+1) to be the maximum likelihood estimates (MLEs); otherwise, return to step 2.
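For a model where the E-step is available in closed form, the steps above can be sketched directly. The following Python example (a hypothetical one-way normal model y_ij = µ + u_i + e_ij with simulated data, chosen for illustration because u_i | y is normal with known mean and variance) alternates the closed-form E-step with the M-step updates:

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated data: mu = 2, sigma_u^2 = 1, sigma_e^2 = 0.5 (hypothetical values).
m, n_per = 40, 5
y = (2.0 + rng.normal(scale=1.0, size=(m, 1))
         + rng.normal(scale=np.sqrt(0.5), size=(m, n_per)))

mu, su2, se2 = 0.0, 1.0, 1.0                       # starting values
for _ in range(200):
    # E-step: u_i | y is normal, with closed-form posterior mean and variance.
    post_var = 1.0 / (1.0 / su2 + n_per / se2)
    Eu = post_var * (n_per / se2) * (y.mean(axis=1) - mu)
    # M-step: maximize the expected complete-data log-likelihood.
    mu = (y - Eu[:, None]).mean()
    su2 = (Eu**2).mean() + post_var                # average of E[u_i^2 | y]
    se2 = ((y - mu - Eu[:, None])**2).mean() + post_var
```

After convergence the three parameters sit near the maximum likelihood estimates for the simulated data.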
Following McCulloch (1997), let y = (y_1, y_2, . . . , y_m)^t denote the observed data with distribution f(y|ϕ) characterized by the s-vector of parameters ϕ = (β, τ, D). Consider u = (u_1, u_2, . . . , u_m)^t to be the set of latent variables.
The EM algorithm thus works on the augmented log-likelihood ln f (y, u|ϕ) to obtain the
ML estimates of ϕ over the distribution f (y|ϕ) where it is assumed that
Z
f (y|ϕ) = f (y, u|ϕ)du.
More specifically, the EM algorithm iterates between the calculation of the expected complete-data log-likelihood
$$ Q(\varphi \mid \hat\varphi^{(r)}) = E_{\hat\varphi^{(r)}}[\ln f(y, u \mid \varphi) \mid y] \qquad (5.7) $$
and the maximization of Q(ϕ|ϕ̂^(r)) over ϕ, where the maximizing value of ϕ is denoted by ϕ̂^(r+1) and ϕ̂^(r) denotes the maximizer at the r-th iteration.
In particular,
$$ E_{\hat\varphi^{(r)}}(\ln f(y, u \mid \varphi) \mid y) = \int \ln f(y, u \mid \varphi) \, f(u \mid y, \hat\varphi^{(r)}) \, du, $$
where f(u|y, ϕ) is the conditional distribution of the latent variables given the observed data and ϕ. In the cases where the E-step involves an integration that is difficult to evaluate analytically, we may estimate this expectation from Monte Carlo simulations.
From this perspective, the conditional distribution of u|y cannot be found in closed form; consequently, the expectations in 2a or 2b cannot be computed in closed form for the model (5.4). In addition, the marginal density f_y, an integral of the type (5.5), is not straightforward to calculate due to the dimension of the vector u. However, it is possible to generate random observations from the conditional distribution of u|y using a Metropolis-Hastings algorithm without specifying f_y, so the expectation steps can be approximated by Monte Carlo (McCulloch, 1997).
If we obtain a sample u_1^{(r)}, . . . , u_N^{(r)} from the distribution f(u|y, ϕ̂^(r)), this expectation may be estimated by the Monte Carlo sum
$$ Q(\varphi \mid \hat\varphi^{(r)}) = \frac{1}{N} \sum_{k=1}^{N} \ln f(y, u_k^{(r)} \mid \varphi). \qquad (5.8) $$
The Metropolis step is incorporated into the EM algorithm to give an MCEM algorithm
as follows (McCulloch, 1997):
1. Choose starting values β (0) ,τ (0) and D(0) . Set r = 0.
2. Generate N values u(1) , u(2) , . . . , u(N ) , from fu|y (u|y, β (r) , D(r) ) using the Metropolis
algorithm described previously:
(a) Choose β^(r+1) and τ^(r+1) to maximize a Monte Carlo estimate of E[ln f_{y|u}(y|u, β, τ)|y]; that is, maximize (1/N) Σ_{k=1}^N ln f_{y|u}(y|u^(k), β, τ).
(b) Choose D^(r+1) to maximize (1/N) Σ_{k=1}^N ln f_u(u^(k)|D).
(c) Set r = r + 1.
3. If convergence is achieved, β^(r+1), τ^(r+1) and D^(r+1) are declared to be the maximum likelihood estimates (MLEs); otherwise, return to step 2.
According to McCulloch (1997), the main advantage of the Monte Carlo EM algorithm is that it can be applied in situations where direct application of the EM algorithm is either very difficult or impossible.
We consider here a data set generated using the parameter values proposed by Booth and Hobert (1999): β = 5, σ² = 1/2, n_i = 15, m = 10 and x_ij = j/15 for each i, j.
A version of the complete data log-likelihood is given by
i \ j | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1 1 0 0 0 0 1 1 0 1 1 1 1 1 1 1
2 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1
3 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1
4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
5 0 1 1 1 1 1 1 1 1 1 0 1 1 1 1
6 0 0 0 1 0 1 1 1 0 1 1 1 1 1 1
7 0 1 0 0 1 1 1 1 1 1 1 1 1 1 1
8 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
9 1 0 0 1 1 0 1 1 1 1 1 1 1 1 1
10 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
$$ l_w(\beta, \sigma, y, u) = -\frac{m}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^m u_i^2 + \sum_{i=1}^m \sum_{j=1}^{n_i} [y_{ij}(\beta x_{ij} + u_i) - \ln(1 + e^{\beta x_{ij} + u_i})]. \qquad (5.12) $$
To apply the EM algorithm in this problem we would need to compute the (conditional)
expectation of lw with respect to the density
$$ h(u \mid y; \theta) \propto \exp\Big( \sum_{i=1}^m \sum_{j=1}^{n_i} [y_{ij}(\beta x_{ij} + u_i) - \ln(1 + e^{\beta x_{ij} + u_i})] - \frac{1}{2\sigma^2}\sum_{i=1}^m u_i^2 \Big). \qquad (5.13) $$
It is easy to see that the resulting integral is intractable. Thus we consider a Monte Carlo EM algorithm, simulating observations from the distribution given by (5.13).
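A compact Python sketch of this MCEM scheme for the logit-normal model follows (simulated data with the Booth and Hobert values; the tuning constants — proposal scale, chain length, number of EM iterations — are illustrative choices, not those of the original references):

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulated data with the Booth and Hobert (1999) values:
# m = 10, n_i = 15, x_ij = j/15, beta = 5, sigma^2 = 1/2.
m, ni = 10, 15
x = np.tile(np.arange(1, ni + 1) / ni, (m, 1))
u_true = rng.normal(scale=np.sqrt(0.5), size=m)
y = rng.binomial(1, 1 / (1 + np.exp(-(5.0 * x + u_true[:, None]))))

def log_joint(u, beta, s2):
    """ln f(y, u) as a function of u (one entry per area), up to a constant."""
    eta = beta * x + u[:, None]
    return (y * eta - np.log1p(np.exp(eta))).sum(axis=1) - u**2 / (2 * s2)

beta, s2 = 1.0, 1.0                               # starting values
for _ in range(20):
    # E-step: Metropolis random walk targeting f(u | y), run per area.
    u, draws = np.zeros(m), []
    ll = log_joint(u, beta, s2)
    for t in range(1200):
        prop = u + rng.normal(scale=0.7, size=m)
        ll_prop = log_joint(prop, beta, s2)
        acc = np.log(rng.uniform(size=m)) < ll_prop - ll
        u, ll = np.where(acc, prop, u), np.where(acc, ll_prop, ll)
        if t >= 300:                              # discard burn-in
            draws.append(u.copy())
    U = np.array(draws)                           # Monte Carlo sample, (N, m)

    # M-step: sigma^2 from the Monte Carlo average of u_i^2; beta by a few
    # Newton steps on the Monte Carlo estimate of E[ln f(y | u, beta) | y].
    s2 = (U**2).mean()
    for _ in range(5):
        p = 1 / (1 + np.exp(-(beta * x[None] + U[:, :, None])))
        grad = (x[None] * (y[None] - p)).sum() / len(U)
        hess = -(x[None]**2 * p * (1 - p)).sum() / len(U)
        beta -= grad / hess
```

With the small m used here the estimates are noisy, but β and σ² settle near the region of the values used to generate the data.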
In this chapter we presented the generalized linear mixed models and pointed out the potential complications due to the necessary integration with respect to the random effects (unobserved data). An alternative solution to this problem was discussed by presenting the Monte Carlo EM algorithm, which relies on simulation to approximate the expectation in its E-step. The next chapter uses GLMMs in the context of small area estimation.
Figure 5.1: Values simulated with the Metropolis algorithm, using a Normal proposal density and the conditional density f(u|y) as target.
Chapter 6

Small Area Estimation under Mixed Effects Logistic Models
6.1 Introduction
In this chapter, we consider a subclass of generalized linear mixed models known as nested
models with a binary response. This is basically a subgroup of the models we discussed in
Chapter 5.
Chandra, Chambers and Salvati (2009) assert that model-based methodologies allow the construction of efficient estimators by using a suitable model for borrowing strength, especially under the limitation of a small sample size. In the small area estimation context, each small area is characterized by its area-specific effect, while the auxiliary information may cover all the areas of interest. Hence, explicit linking models are used, based on random area-specific effects that account for between-area variation beyond what is explained by the auxiliary variables included in the model (Chandra et al. (2009), Rao (2003) and Chandra and Chambers (2009)).
When the response variable is continuous, the most common inference tool is the empirical best linear unbiased predictor (EBLUP) approach under the linear mixed model (LMM), which we discussed in Chapter 3. Chandra et al. (2009) considered the situation where the variable of interest is binary and the related small area estimates are needed. They used the empirical best predictor (EBP) under a generalized linear mixed model (GLMM) with logistic link function, since the EBLUP is problematic when the variable of interest is binary and the sample size is small (Rao (2003), Chandra et al. (2009)).
This chapter describes a real case of discrete observations where small area techniques can
58CHAPTER 6. SMALL AREA ESTIMATION UNDER MIXED EFFECTS LOGISTIC MODELS
be applied, and shows how the parameters of interest can be estimated when the variables of interest are discrete (proportions).
6.2.1 Problem
We present an example of a real case, adopted from the article by Ghosh, Natarajan,
Stroud and Carlin (1998). Their main objective was to estimate the proportion of workers
who have experienced any negative impact of exposure to health hazards in the workplace
for every one of the 15 × 2 × 2 = 60 groups cross-classified by 15 geographic regions and
the 4 demographic categories. The data presented in this article comes from a 1991 sample
of all persons in 15 geographic regions of Canada. The researchers sought a response to
the question “Have you experienced any negative impact of exposure to health hazards in
the workplace?”. In this survey, the responses were grouped into four categories: (1) yes,
(2) no, (3) not exposed, and (4) not applicable or not stated. However, in this thesis we
adopted only two classifications: (1) yes and (2) no, since for simplicity, we are interested
in having a binary response. For each region, workers were classified by age (≤ 40 or >
40) and sex (male or female). In the next subsection, we show how to find the probability
of exposure to negative health hazards according to a given worker’s classification using
small area models.
where x_ij is a vector of region by age-sex group level covariates and u_i is the random area effect; the fixed part can be expressed as
xtij β = µ + γa + γs + γas ,
where µ is the general effect, γa is the main effect due to the age, γs is the main effect due
to the sex, and γas is the interaction effect of the age and sex.
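As an illustration of this decomposition, here is a minimal Python sketch of the design rows for the 2 × 2 age-by-sex classification; the function name and the treatment (corner-point) coding below are our illustrative choices, not taken from the survey.

```python
import numpy as np

# Treatment coding for the 2x2 factorial: columns correspond to
# [mu, gamma_a (age > 40), gamma_s (female), gamma_as (interaction)].
def design_row(age_over_40: bool, female: bool) -> np.ndarray:
    a, s = int(age_over_40), int(female)
    return np.array([1, a, s, a * s])

# One row per age-sex group; stacking them gives a full-rank 4x4 design.
rows = [design_row(a, s) for a in (False, True) for s in (False, True)]
X = np.vstack(rows)
assert np.linalg.matrix_rank(X) == 4  # all four effects are identifiable
```

With this coding, \mu is the baseline cell and each \gamma is a contrast against it; other codings (e.g. sum-to-zero) would parameterize the same fixed part differently.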
The aim is to make inference about the small area quantity of interest (population proportion)
in a specific area i, i.e.

p_i = \frac{1}{N_i}\sum_{j\in U_i} y_{ij} = \frac{1}{N_i}\Big(\sum_{j\in s_i} y_{ij} + \sum_{j\in r_i} y_{ij}\Big).
To make the above inference about the population proportion p_i, we define p_{ij} as the
probability that the j-th unit within area i has the attribute of interest, i.e.
p_{ij} = P(y_{ij} = 1 | u_i), where u_i denotes the random area effect for small area i,
assumed to be normally distributed with mean zero and variance \sigma_u^2. The u_i's are assumed
to be independent, and the y_{ij}'s are assumed conditionally independent given the u_i's.
Therefore, a population model for the binary data is the GLMM with logistic link function,
given by

\mathrm{logit}\{p_{ij}\} = \ln\frac{p_{ij}}{1 - p_{ij}} = \eta_{ij} = x_{ij}^{t}\beta + u_i. \qquad (6.2)
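To fix ideas, here is a small simulation sketch of the population model (6.2); the numbers of areas and units, the fixed effects, the variance and the single binary covariate below are entirely hypothetical and are not the Canadian survey data.

```python
import numpy as np

rng = np.random.default_rng(0)

def expit(eta):
    # inverse logit: maps the linear predictor to a probability
    return 1.0 / (1.0 + np.exp(-eta))

M, N_i = 15, 200                          # areas and units per area (made up)
beta = np.array([-1.0, 0.5])              # hypothetical fixed effects
sigma_u = 0.4                             # hypothetical area-effect sd

u = rng.normal(0.0, sigma_u, size=M)      # random area effects u_i ~ N(0, sigma_u^2)
x = rng.binomial(1, 0.5, size=(M, N_i))   # a single binary covariate per unit
eta = beta[0] + beta[1] * x + u[:, None]  # linear predictor eta_ij = x'beta + u_i
p = expit(eta)                            # success probabilities p_ij
y = rng.binomial(1, p)                    # binary responses y_ij | u_i ~ Bernoulli(p_ij)
```

Given such a population, the estimation problem of the following sections is to recover \beta and \sigma_u from a sample and to predict the non-sampled y_{ij}.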
Rao (2003) highlights that, in small area estimation, the model (6.2) can be conveniently
expressed at the population level, as presented next. We define y_U as the vector of responses
with elements y_{ij}, X_U as the known design matrix with rows x_{ij}^{t}, Z_U = block diag(1_{N_i})
as the N × M matrix of area indicators, where 1_{N_i} is a column vector of ones of size N_i,
u = (u_1, u_2, . . . , u_M)^t, and \eta_U as the vector of linear predictors \eta_{ij} given by (6.1). Let
g(.) be a link function and let \mu = E(y_U | u). According to eq. (5.2), we have the linear form

g(\mu) = \eta_U = X_U\beta + Z_U u. \qquad (6.3)
The expression (6.3) then defines a GLMM if the elements of y_U given u are independent
and belong to the Exponential family of distributions. Assume the vector of random
area effects u has mean 0 and variance G(\sigma) = \sigma^2 I_M, where I_M is the identity matrix of
order M. The link function g(.) in the case of binary data is typically the logit function,
and the relationship between y_U (response) and \eta_U is expressed through a function h(.)
defined by E(y_U | u) = h(\eta_U) (Jiang (2007), McCulloch and Searle (2001)).
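The area-indicator matrix Z_U = block diag(1_{N_i}) can be built directly; a minimal sketch with made-up area sizes:

```python
import numpy as np

# M = 3 areas with illustrative sizes N_i = 2, 3, 1, so N = 6 units total.
sizes = [2, 3, 1]
N, M = sum(sizes), len(sizes)

Z = np.zeros((N, M))
row = 0
for i, n_i in enumerate(sizes):
    Z[row:row + n_i, i] = 1.0   # block of ones: units of area i load on u_i
    row += n_i

# Each row has exactly one 1, marking which area the unit belongs to,
# so (Z @ u)_j equals the area effect u_i of unit j's area.
```

This block-diagonal structure is what makes Z_U u add the same random effect u_i to every unit in area i.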
Suppose we are interested in predicting a small area quantity of interest through a linear
combination of the response, \psi = a_U y_U, where a_U = diag(a_i^t) is an M × N matrix and
a_i^t = (a_{i1}, a_{i2}, . . . , a_{iN}) is a vector of known elements. To estimate the population
proportion p_i for small area i, we consider the case where a_i^t takes the value 1/N_i for each
population unit in area i and zero elsewhere.
To express the different components of the model with respect to the sampled and
non-sampled parts, we decompose the vector y_U so that its first n elements correspond to the
sampled units, and then partition a_U, y_U, \eta_U, X_U and Z_U according to the sampled and
non-sampled units:

a_U = \begin{pmatrix} a_s \\ a_r \end{pmatrix}, \quad
y_U = \begin{pmatrix} y_s \\ y_r \end{pmatrix}, \quad
\eta_U = \begin{pmatrix} \eta_s \\ \eta_r \end{pmatrix}, \quad
X_U = \begin{pmatrix} X_s \\ X_r \end{pmatrix}, \quad
Z_U = \begin{pmatrix} Z_s \\ Z_r \end{pmatrix},
where s denotes components defined by the n sample units and r indicates components
defined by the remaining N − n non-sample units. Therefore, we have E(y_s | u) = h(\eta_s)
and E(y_r | u) = h(\eta_r), where h(.) is g^{-1}(.). The quantity of interest \psi = a_U y_U can now
be represented as

\psi = a_U y_U = a_s y_s + a_r y_r. \qquad (6.4)

The first term on the right-hand side of (6.4) is known from the sample, whereas the second
term, which depends on the non-sample values y_r with E(y_r | u) = h(X_r\beta + Z_r u), is
unknown and must be predicted by fitting the model (6.3) to the sample data.
Let y_s = \{y_{sij}\} and y_r = \{y_{rij}\} denote the vector of sample values and the vector of
non-sample values of the binary survey variable y, respectively. The parameter of interest p_i
for each small area can then be obtained by predicting each element y_{rij}. Since we need
to fit a GLMM, and since it is difficult to find an explicit expression for the marginal
likelihood function (see chapter 5), the parameter estimation is achieved via the Monte
Carlo EM technique. This gives the best linear unbiased estimator (BLUE) for \beta and the
best linear unbiased predictor (BLUP) for u when the value of \sigma is assumed known. Using
(6.3) we obtain the BLUP-type estimator of \psi. Using the estimated value \hat{\sigma} of \sigma leads to
the empirical BLUE \hat{\beta} for \beta and the empirical BLUP \hat{u} for u, and thus to the empirical
BLUP-type estimator of \psi, which is given by

\hat{\psi} = a_s y_s + a_r h(X_r\hat{\beta} + Z_r\hat{u}). \qquad (6.5)
Using (6.4), the empirical best predictor (EBP) of the small area i proportion p_i is then

\hat{p}_i = \frac{1}{N_i}\Big(\sum_{j\in s_i} y_{ij} + \sum_{j\in r_i} \hat{\mu}_{ij}\Big), \qquad (6.6)

where

\hat{\mu}_{ij} = \frac{\exp(\hat{\eta}_{ij})}{1 + \exp(\hat{\eta}_{ij})} = \hat{p}_{ij},
\qquad \hat{\eta}_{ij} = x_{ij}^{t}\hat{\beta} + \hat{u}_i.
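A minimal numerical sketch of (6.6) for a single area, assuming the model has already been fitted; the sample values, design rows and estimates below are placeholders, not output of the Monte Carlo EM fit.

```python
import numpy as np

def expit(eta):
    return 1.0 / (1.0 + np.exp(-eta))

def ebp_proportion(y_sampled, X_nonsampled, beta_hat, u_hat_i):
    """EBP (6.6): (sum of sampled y + sum of predicted mu_hat) / N_i."""
    mu_hat = expit(X_nonsampled @ beta_hat + u_hat_i)  # predicted p_ij, j in r_i
    N_i = len(y_sampled) + len(X_nonsampled)           # area population size
    return (np.sum(y_sampled) + np.sum(mu_hat)) / N_i

# Placeholder inputs: 3 sampled units, 7 non-sampled units, intercept-only fit.
y_s = np.array([1, 0, 1])
X_r = np.zeros((7, 2)); X_r[:, 0] = 1.0
beta_hat = np.array([0.0, 0.0])

p_hat = ebp_proportion(y_s, X_r, beta_hat, u_hat_i=0.0)
# with eta_hat = 0, each mu_hat is 0.5, so p_hat = (2 + 7 * 0.5) / 10 = 0.55
```

The sampled units contribute their observed values directly; only the non-sampled units are replaced by model predictions, which is what distinguishes the EBP from a purely synthetic estimator.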
Equation (6.6) gives the population proportion estimate that we set out to obtain
(Chandra, Chambers and Salvati (2009); Saei and Chambers (2003)).
Chapter 7. Concluding Remarks and Future Research
In this work we have considered model-based approaches to estimating small area target
quantities. We employed statistical models that “borrow strength” in estimating the mean
hectares of corn (or soybeans) in 12 Iowa counties, where the small areas of interest had
sample sizes varying from 1 to 3. Two types of information were considered: survey
information on the amount of land used for crops, available only for a few sampled
segments, and satellite information on all the segments of the counties under study, which
constituted the auxiliary information.
The linear mixed model was used to estimate the regression coefficients and the variance
components using Maximum Likelihood (ML) and Restricted Maximum Likelihood (REML)
estimation. However, the ML estimators of the variance components were found to be
biased, so we recommend using the REML method instead. Furthermore, we calculated
confidence intervals for the fixed effects, both normal confidence intervals and bootstrap
normal confidence intervals.
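The bootstrap normal interval idea can be sketched on a toy resampling problem (not the Iowa data; the sample, the estimator and the 1.96 quantile for an approximate 95% interval are illustrative choices).

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data and point estimate: the mean of a simulated sample stands in
# for a fixed-effect estimate.
data = rng.normal(5.0, 2.0, size=100)
theta_hat = data.mean()

# Nonparametric bootstrap: resample with replacement, re-estimate B times.
B = 1000
boot = np.array([rng.choice(data, size=data.size, replace=True).mean()
                 for _ in range(B)])
se_boot = boot.std(ddof=1)             # bootstrap standard error

# Bootstrap normal interval: estimate +/- z * bootstrap standard error.
z = 1.96                               # approx. 97.5% standard normal quantile
ci = (theta_hat - z * se_boot, theta_hat + z * se_boot)
```

For two-level models, as in Van der Leeden, Busing and Meijer (1997), the resampling step is adapted to the hierarchical structure (e.g. resampling areas, or residuals at both levels), but the normal-interval construction is the same.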
Under GLMMs, due to the complexity of the (marginal) likelihood function, the use
of ML and REML estimates was limited by the lack of explicit expressions, so we adopted
the Monte Carlo EM algorithm for maximum likelihood fitting of generalized linear mixed
models, which uses simulation to construct Monte Carlo approximations in the E-step. In
each E-step, the Metropolis algorithm was used. The resulting parameter estimates were
found to be consistent with the model parameters used in the simulation. It is also
worth noting that the Metropolis algorithm converged quickly.
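A sketch of the kind of Metropolis step used within the Monte Carlo E-step, here sampling a single area effect u_i given binary data at fixed \beta and \sigma_u; the data, proposal scale, chain length and burn-in below are all illustrative choices, not the settings used in the thesis.

```python
import numpy as np

rng = np.random.default_rng(2)

def expit(eta):
    return 1.0 / (1.0 + np.exp(-eta))

def log_target(u, y, eta_fixed, sigma_u):
    # log p(y | u) + log p(u) up to a constant: Bernoulli log-likelihood
    # plus the N(0, sigma_u^2) prior on the area effect.
    p = expit(eta_fixed + u)
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log1p(-p))
    return loglik - 0.5 * (u / sigma_u) ** 2

y = rng.binomial(1, 0.6, size=50)      # toy binary data for one area
eta_fixed, sigma_u = 0.0, 1.0          # fixed part and sd, held fixed in the E-step

u, draws = 0.0, []
for _ in range(2000):
    prop = u + rng.normal(0.0, 0.5)    # random-walk proposal
    log_ratio = (log_target(prop, y, eta_fixed, sigma_u)
                 - log_target(u, y, eta_fixed, sigma_u))
    if np.log(rng.uniform()) < log_ratio:
        u = prop                       # accept the move
    draws.append(u)

post_mean = np.mean(draws[500:])       # conditional mean of u_i after burn-in
```

In the full MCEM algorithm such draws approximate the conditional expectations needed in the E-step, and the M-step then updates \beta and \sigma_u given those draws.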
This work confined attention to the framework of mixed models with univariate normal
random area-specific effects. However, in real life this assumption may not always be
justified. It would be a rewarding topic for future work to assess the performance of these
models under multivariate normal random area effects.
Other recommendations on future research include:
• To consider the application of generalized linear mixed models in the context of small
area estimation using real data.
• To set up and study methods of small area estimation under LMMs or GLMMs with
temporal and spatial effects.
Bibliography
[1] Battese, G.E., Harter, R.M. and Fuller, W.A. (1988). An error component model
for prediction of county crop areas using survey and satellite data, Journal of the
American Statistical Association, 83, 28-36.
[2] Box, G.E.P. and Jenkins, G.M. (1970). Time Series Analysis: Forecasting and Control,
San Francisco: Holden-Day.
[3] Brackstone, G.J. (1987). Small area data: Policy issues and technical challenges. In
Small Area Statistics (R. Platek, J.N.K. Rao, C.E. Särndal and M.P. Singh, eds.),
3-20. Wiley, New York.
[4] West, B.T., Welch, K.B. and Gałecki, A.T. (2007). Linear Mixed Models: A Practical
Guide Using Statistical Software. Chapman and Hall/CRC, Taylor and Francis Group,
New York.
[5] Breslow, N.E. and Clayton, D.G. (1993). Approximate inference in generalized linear
mixed models, Journal of the American Statistical Association, 88, pp. 9-25.
[6] Chandra, H., Chambers, R. and Salvati, N. (2009). Small area estimation of propor-
tions in business surveys, Centre for Statistical and Survey Methodology, The Univer-
sity of Wollongong, Working Paper.
[7] Cressie, N.A. (1991). Geostatistical analysis of spatial data. Spatial Statistics and Dig-
ital Image Analysis. Washington, D.C.: National Academy Press.
[8] Datta, G.S. and Ghosh, M. (1991). Bayesian prediction in linear models: applications
to small area estimation, Annals of Statistics, 19, 1748-1770.
[9] Datta, G.S., Kubokawa, T. and Rao, J.N.K. (2002). Estimation of MSE in Small
Area Estimation, Technical Report, Department of Statistics, University of Georgia,
Athens.
[10] Diggle, P., Heagerty, P., Liang, K. and Zeger, S. (2002). Analysis of Longitudinal Data,
2nd ed., Oxford University Press, New York.
[11] Efron, B. and Tibshirani, R.J. (1993). An Introduction to the Bootstrap, Chapman
and Hall, New York.
[12] Efron, B. (1982). The Jackknife, the Bootstrap and Resampling Plans, Philadelphia:
SIAM.
[13] Erciulescu, A.L. and Fuller, W.A. (2013). Small area prediction of the mean of a bino-
mial random variable, JSM Proceedings. Survey Research Methods Section. Alexan-
dria, VA: American Statistical Association, 855-863.
[14] Fay, R.E. and Herriot, R.A. (1979). Estimation of income from small places: An appli-
cation of James-Stein procedures to census data, Journal of the American Statistical
Association, 74, 269-277.
[15] Fuller,W.A. (2009). Sampling Statistics, John Wiley and Sons, Hoboken, New Jersey.
[16] Ghosh, M., Natarajan, K., Stroud, T.W.F. and Carlin, B.P. (1998). Generalized linear
models for small area estimation, Journal of the American Statistical Association, 93,
273-282.
[17] Gonzalez, M.E. (1973). Use and evaluation of synthetic estimates, Proceedings of the
Social Statistics Section, American Statistical Association, pp. 33-36.
[18] Hall, P. (1992). The Bootstrap and Edgeworth Expansion. New York: Springer.
[19] Harville, D.A. (1977). Maximum likelihood approaches to variance component esti-
mation and to related problems. Journal of the American Statistical Association, 72,
320-340.
[20] Harville, D.A. (1985). Decomposition of prediction error, Journal of the American
Statistical Association, 80, 132-138.
[21] Henderson, C.R. (1975). Best linear unbiased estimation and prediction under a
selection model. Biometrics, 31, 423-447.
[22] Jiang, J. (2007). Linear and Generalized Linear Mixed Models and their Applications,
Springer, New York.
[23] Kubokawa, T. (2009). A review of linear mixed models and small area estimation,
University of Tokyo, CIRJE Discussion Paper. http// www.e.u.tokyo.ac.
[24] Larsen, M.D. (2003). Estimation of small-area proportions using covariates and survey
data, Journal of Statistical Planning and Inference, 112, 89-98.
[25] Longford, N.T. (2005). Missing Data and Small Area Estimation. Springer, New York.
[26] Malec, D. (2005). Small area estimation from the American Community Survey using a
hierarchical logistic model of persons and housing units, Journal of Official Statistics,
21, 3, 411-432.
[27] McCulloch, C.E. (1997). Maximum likelihood algorithms for generalized linear mixed
models, Journal of the American Statistical Association, 92,162-170.
[28] McCulloch, C.E. and Searle, S.R. (2001). Generalized, Linear, and Mixed Models, New
York: Wiley.
[30] Pinheiro, J.C. and Bates, D.M. (2000). Mixed-Effects Models in S and S-PLUS.
Springer.
[32] Robinson, G.K. (1991). That BLUP is a good thing: The estimation of random effects,
Statistical Science, 6, 15-32.
[33] Royall, R.M. (1976). The linear least-squares prediction approach to two-stage sam-
pling, Journal of the American Statistical Association, Vol. 71, No. 355, pp. 657-664.
[34] Saei, A. and Chambers, R. (2003). Small area estimation under linear and generalized
linear mixed models with time and area effects. Methodology Working Paper M03/15,
University of Southampton, United Kingdom.
[35] Searle, S.R., Casella, G. and McCulloch, C.E. (1992). Variance Components, John
Wiley and Sons, New York.
[36] Van der Leeden, R., Busing, F.M.T.A. and Meijer, E. (1997). Bootstrap methods
for two-level models. Technical Report PRM 97-04, Leiden University, Department of
Psychology, Leiden.
[37] Verbeke, G. and Molenberghs, G. (2000). Linear Mixed Models for Longitudinal Data,
Springer-Verlag, Berlin.