
Centro de Investigación en Matemáticas, A.C.

STATISTICAL MODELS FOR SMALL AREA


ESTIMATION

TESIS

Que para obtener el Grado de:

Maestro en Ciencias con Orientación en Probabilidad y


Estadı́stica

P R E S E N T A:

Georges Bucyibaruta

Director:

Dr. Rogelio Ramos Quiroga

Guanajuato, Gto, México, Noviembre de 2014


Integrantes del Jurado.

Presidente: Dr. Miguel Nakamura Savoy (CIMAT).

Secretario: Dr. Enrique Raúl Villa Diharce (CIMAT).

Vocal y Director de la Tesis: Dr. Rogelio Ramos Quiroga (CIMAT).

Asesor:

Dr. Rogelio Ramos Quiroga

Sustentante:

Georges Bucyibaruta

Abstract
The demand for reliable small area estimates derived from survey data has increased greatly
in recent years due to, among other things, their growing use in formulating policies and
programs, allocation of government funds, regional planning, small area business decisions
and other applications.

Traditional area-specific (direct) estimates may not provide acceptable precision for small
areas because sample sizes are rarely large enough in many small areas of interest. This
makes it necessary to borrow information across related areas through indirect estimation
based on models, using auxiliary information such as recent census data and current ad-
ministrative data. Methods based on models are now widely accepted. Popular techniques
for small area estimation use explicit statistical models, to indirectly estimate the small
area parameters of interest.

In this thesis, a brief review of the theory of Linear and Generalized Linear Mixed Models is
given, with point estimation focusing on Restricted Maximum Likelihood. Bootstrap methods for two-level models are used for the construction of confidence intervals. Model-based
approaches for small area estimation are discussed under a finite population framework.
For responses in the Exponential family we discuss the use of the Monte Carlo Expectation
and Maximization algorithm to deal with problematic marginal likelihoods.

Dedication

This Thesis is dedicated


To
God Almighty,
my mother,
my sisters and brothers,
and the memory of my father.

Acknowledgements
Throughout the period of my study, I have had cause to be grateful for the advice, support
and understanding of many people. In particular I would like to express my boundless
gratitude to my supervisor Dr. Rogelio Ramos Quiroga for helping me get into the prob-
lem of small area estimation and kindly direct this work. He has guided me in my research
through the use of his profound knowledge, insightful thinking, thoughtful suggestions as
well as his unflagging enthusiasm. I really appreciate the enormous amount of time and
patience he has spent on me.
This work has also benefited greatly from the advice, comments and observations of Dr.
Miguel Nakamura Savoy and Dr. Enrique Raúl Villa Diharce.

As well as the people mentioned above, my special thanks go to the members of the department of probability and statistics of CIMAT for the interesting lectures they offered to us and their consistent support and encouragement throughout my study. A mention must also go to my wonderful classmates, current and former students, and the CIMAT community in general, who provided a pleasant atmosphere and useful distraction during my stay in Guanajuato.

I gratefully acknowledge the admission to CIMAT and financial support from Secretaria
de Relaciones Exteriores (SRE), México, with which it has been possible to realize this
achievement.

Finally but certainly not least, I want to remember my entire family in Rwanda whose
unlimited support, prayers, and encouragement carried me through times of despondency
and threatening despair.

My sincere thanks to all.


Table of Contents

Abstract iii

Dedication v

Acknowledgements vii

1 Introduction 1
1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Structure of the Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Main Contribution of the Thesis . . . . . . . . . . . . . . . . . . . . . . . . 3

2 Approaches to Small Area Estimation 5


2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 Direct Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.3 Indirect Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.3.1 Synthetic Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.3.2 Composite Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.4 Small Area Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.4.2 Basic Area Level Model . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.4.3 Basic Unit Level Model . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.5 Comparison Between Model-based and Design-based Approaches . . . . . 8

3 Linear Mixed Models (LMMs) 11


3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.2 Model Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.2.1 General Specification of Model . . . . . . . . . . . . . . . . . . . . . 12
3.2.2 General Matrix Specification . . . . . . . . . . . . . . . . . . . . . . 16
3.2.3 Hierarchical Linear Model (HLM) Specification of the LMM . . . . . 17
3.2.4 Marginal Linear Model . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.3 Estimation in Linear Mixed Models . . . . . . . . . . . . . . . . . . . . . . . 19
3.3.1 Maximum Likelihood (ML) Estimation . . . . . . . . . . . . . . . . . 19

3.3.2 Restricted Maximum Likelihood (REML) Estimation . . . . . . . . 20
3.4 Confidence Intervals for Fixed Effects . . . . . . . . . . . . . . . . . . . . . 22
3.4.1 Normal Confidence Intervals . . . . . . . . . . . . . . . . . . . . . . 23
3.5 Bootstrapping Two-Level Models . . . . . . . . . . . . . . . . . . . . . . . . 23
3.5.1 Bootstrap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.5.2 Parametric Bootstrap . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.5.3 Bootstrap Confidence Intervals . . . . . . . . . . . . . . . . . . . . . 25
3.6 Prediction of Random Effects . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.6.1 Best Linear Unbiased Predictor (BLUP) . . . . . . . . . . . . . . . . 26
3.6.2 Mixed Model Equations . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.6.3 Empirical Best Linear Unbiased Predictor (EBLUP) . . . . . . . . . 28

4 Small Area Estimation with Linear Mixed Model 29


4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.2 Nested Error Regression Model . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.3 BLU and EBLU Predictors . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.4 Illustration Using Data from Battese et al. (1988) . . . . . . . . . . . . . . 38
4.4.1 Fitting the Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.4.2 Estimation and Prediction . . . . . . . . . . . . . . . . . . . . . . . . 41

5 Generalized Linear Mixed Models (GLMMs) 47


5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.2 Model Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
5.3 Parameter Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
5.3.1 Monte Carlo EM Algorithm . . . . . . . . . . . . . . . . . . . . . . . 51

6 Small Area Estimation Under Mixed Effects Logistic Models 57


6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
6.2 Description of a Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
6.2.1 Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
6.2.2 Proposed Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
6.3 Small Area Estimation of Proportions . . . . . . . . . . . . . . . . . . . . . 59
6.3.1 The Empirical Best Predictor for the Small Areas . . . . . . . . . . 59

7 Concluding Remarks and Future Research 63

List of Figures

5.1 Samples from the Metropolis algorithm with a Normal proposal density and
the conditional density f (u|y) as target. . . . . . . . . . . . . . . . . . . . . 56

List of Tables

4.1 Survey and Satellite Data for corn and soybeans in 12 Iowa Counties. . . . 40
4.2 The estimated parameters for corn. . . . . . . . . . . . . . . . . . . . . . . . 44
4.3 The estimated parameters for soybeans. . . . . . . . . . . . . . . . . . . . . 44
4.4 Normal Confidence Interval for corn. . . . . . . . . . . . . . . . . . . . . . . 44
4.5 Normal Confidence Interval for soybeans. . . . . . . . . . . . . . . . . . . . 45
4.6 Bootstrap Confidence Interval for corn. . . . . . . . . . . . . . . . . . . . . . 45
4.7 Bootstrap Confidence Interval for soybeans. . . . . . . . . . . . . . . . . . . 46
4.8 Predicted Mean Hectares of corn in 12 Iowa Counties. . . . . . . . . . . . . 46
4.9 Predicted Mean Hectares of soybean in 12 Iowa Counties. . . . . . . . . . . 46

5.1 Simulated data for the logit-normal model . . . . . . . . . . . . . . . . . . . 54

Chapter 1

Introduction

Small Area Estimation (Rao, 2003) tackles the important statistical problem of providing
reliable estimates of a variable of interest in a set of small geographical areas. The main
problem is that it is almost impossible to measure the value of the target variable for all
the individuals in the areas of interest and, hence, a survey is conducted to obtain a rep-
resentative sample.

Surveys are usually designed to provide reliable estimates of finite population parameters
of interest for large domains in which the large domains can refer to geographical regions
such as states, counties, municipalities or demographic groups such as age, race, or gender.
The information collected is used to produce some direct estimate of the target variable
that relies only on the survey design and the sampled data. Unfortunately, sampling from
all areas can be expensive in resources and time. Domain estimators that are computed
using only the sample data from the domain are known as direct estimators. Even if di-
rect estimators have several desirable design-based properties, direct estimates often lack
precision when domain sample sizes are small. In some cases there might be a need for
estimates of domains which are not considered during planning or design, and this leads
to the possibility that the sample size of the domains could be small and even zero. Di-
rect estimators of small area parameters may have large variance and sometimes cannot
be calculated due to lack of observations (Rao, 2003, Chapter 1; Montanari, Ranalli and Vicarelli, 2010).

To produce estimates for small areas with an adequate level of precision often requires
indirect estimators that use auxiliary data through a linking model in order to borrow
strength and increase the effective sample size. Regression models are often used in this
context to provide estimates for the non-sampled areas. The model is fitted on data from
the sample and used together with additional information from auxiliary data to compute
estimates for non-sampled areas. This method is often known as synthetic estimation (González, 1973).
Small area estimation (SAE) involves the class of techniques to estimate the parameters of
interest from a subpopulation that is considered to be small. In this context, a small area
can be seen as a small geographical region or a demographic group which has a sample
size that is too small for delivering direct estimates of adequate precision (Rao, 2003, Chapter 1). The overriding question of small area estimation is how to obtain reliable
estimates when the sample data contain too few observations (even zero in some cases)
for statistical inference of adequate precision due to the fact that the budget and other
constraints usually prevent the allocation of sufficiently large samples to each of the small
areas as we mentioned in the previous paragraph.

1.1 Background
The use of small area statistical methods dates back as early as the eleventh century in England and the seventeenth century in Canada; in these cases, the estimates were based either on censuses or on administrative records (Brackstone, 1987). In recent years, the statistical techniques for small area estimation have been a focus for various authors in the context of sample surveys, and there is an ever-growing demand for reliable
estimates of small area populations of all types (both public and private sectors). Some
of the main reasons for that demand are administrative planning, disease mapping, and the determination of state funding allocations. Examples of small area estimation applications include the prediction of the area under corn and soybeans in
counties of Iowa (Battese et al., 1988), the production of estimates of income and poverty
measures for various population subgroups for states, counties and school districts (Rao, 2003, Chapter 1).

Traditional methods of indirect estimation mentioned above provide a link to related small areas through supplementary data and make specific allowance for between-area variation; in particular, the mixed model involves random area-specific effects that account for between-area variation.

Mixed models with unit level auxiliary data have been used for small area estimation by a
good number of authors. Battese, Harter and Fuller (1988) use a linear mixed model to pre-
dict the area planted with corn and soybeans in Iowa counties. In this case, the area under
corn and soybeans are considered to be small because the sample sizes obtained were too
small for direct estimation. Datta and Ghosh (1991) introduce the hierarchical Bayes pre-
dictor for general mixed models. Larsen (2003) compares estimators for proportions based
on two unit level models, a single model with no area level model covariates and a model
using the area level information. Malec (2005) proposes Bayesian small area estimates
for means of binary responses using a multivariate binomial (multinomial) model. Jiang
(2007) reviews the classical inferential approach for linear and generalized linear mixed
models and discusses the prediction for a function of fixed and random effects. Montanari,
Ranalli, and Vicarelli (2010) consider unit level linear mixed models and logistic mixed
models, for binary response variables and fully known auxiliary information. Vizcaino, Cortina, and Morales Gonzalez (2011) derive small area estimators for labor force indicators in Galicia using a multinomial logit mixed model.

1.2 Structure of the Thesis


In Section 1.1 we introduced the idea of small area estimation and highlighted some con-
cepts of small area estimation applied to a wide variety of situations. The work presented
in this thesis can be divided into two broad categories, the first dealing with a linear mixed
model approach with emphasis on small area estimation. The second category involves the
use of generalized linear mixed models in small area estimation. The rest of this thesis is
structured in the following manner. Chapter 2 gives an overview of approaches to small
area estimation. In Chapter 3, the general theory on linear mixed models is introduced,
two approaches for parameter estimation are considered: Point estimation and confidence
interval estimation. Furthermore, the theory of prediction is applied to find the Empirical
Best Linear Unbiased Predictor (EBLUP) of the random effects. In Chapter 4, the theory
on linear mixed models and the related EBLUP theory revised in Chapter 3 is applied to
the estimation of small area quantities of interest using the nested error regression model
based on the general theorem of prediction. In Chapter 5, the class of generalized linear
mixed models is introduced and its various aspects such as model formulation, estimation
and the Monte Carlo Expectation Maximization (EM) algorithm are presented. Chapter
6 looks at the Small Area Estimation under mixed effects logistic models. The application
and contextualization of the generalized linear mixed models and the related Empirical
Best Predictor (EBP) is implemented in the context of small area estimation. Finally,
general conclusions and suggestions for future research are presented in Chapter 7.

1.3 Main Contribution of the Thesis


The overall objective of this work is to review the various statistical models and then
consequently implement their applications in the small area estimation context. In doing
the review, we considered the Linear Mixed Models (LMM) and the Generalized Linear
Mixed Models (GLMM). Firstly, we review the theory of the Linear Mixed Model and its applications to small area estimation under the normality assumption, using the MLE and REML methods, and we further explain the derivation of the Best Linear Unbiased Predictor and the Empirical Best Linear Unbiased Predictor. Secondly, we contextualize the
Generalized Linear Mixed Models approach in small area estimation with emphasis on
making inferences on the model parameters when applied to count data. Given the structure of these models, we show that the direct ML and REML methods are limited due to the complexity of the (marginal) likelihood function, and we suggest the Monte Carlo EM as an alternative method of parameter estimation that avoids the complexities of direct ML methods.
Chapter 2

Approaches to Small Area


Estimation

2.1 Introduction
In this chapter, we briefly describe some approaches to small area estimation. In general
we can differentiate two types of small area estimators: direct and indirect estimators. The
direct small area estimator is based on survey design, while indirect estimators are mainly
based on different models and techniques such as implicit model based approaches that
include synthetic and composite estimators known as traditional indirect estimators, and
explicit models which are categorized as area level and unit level models and are based on
mixed model methodologies (Rao, 2003, Chapter 5).

In this chapter we briefly discuss direct small area estimators, implicit models (synthetic and composite estimators), and explicit small area models (unit level and area level models).

2.2 Direct Estimation


Following the definition given by Rao (2003), a small area estimator is direct when it uses
the sample values of study variables from the specified area only. In general, the direct
estimators are unbiased estimators, however, due to the small sample size, such estimators
might have unacceptably large standard errors. Within this class of estimators is the expansion estimator. Assume that the quantity of interest is the total Y and there is no
auxiliary information. The expansion estimator of Y is
Ŷ = Σi∈s wi yi ,

where wi are the design weights and s is the selected sample.
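To fix ideas, the expansion estimator can be sketched in a few lines of Python; the function name and the toy sample below are illustrative, not taken from any particular survey.

```python
import numpy as np

def expansion_estimator(y, w):
    """Expansion estimator of a total: Y_hat = sum over the sample s of w_i * y_i."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    return float(np.sum(w * y))

# Toy sample: 4 sampled units, each drawn with inclusion
# probability 1/10, so each design weight is 10.
y_sample = [3.0, 5.0, 2.0, 4.0]
weights = [10.0, 10.0, 10.0, 10.0]
print(expansion_estimator(y_sample, weights))  # 140.0
```

Each sampled value is scaled up by its design weight, so the estimator "expands" the sample to the population level.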

2.3 Indirect Estimation


Indirect or model-based small area estimators rely on statistical models to provide estimates
for all small areas. Once the model is chosen, its parameters are estimated using the data
obtained in the survey. An important issue in indirect small area estimation is that auxiliary
information or covariates are needed.

2.3.1 Synthetic Estimation


The synthetic estimator is an example of an estimator which can be considered either model-based or design-based (model-assisted). In both cases the specified linear relationship
between y (study variable) and the auxiliary variables, described with the parameter β
(vector of regression coefficients) plays an important role. In the design-based approach
we assume no explicit model, but the more correlated y is with the auxiliaries the more
efficient is the estimator. The name synthetic estimator comes from the fact that these
estimators borrow strength by synthesizing data from many different areas.

A synthetic estimator is a reliable direct estimator for a large area, covering various small areas, which is used as an estimator for a small area under the assumption that the small area has the same characteristics as the larger area (González, 1973; Rao, 2003, Chapter 4).

2.3.2 Composite Estimation


As mentioned above, as the sample size in a small area increases, a direct estimator becomes more desirable than a synthetic estimator; conversely, when area level sample sizes are relatively small, the synthetic estimator outperforms the traditional direct estimator. Synthetic estimators are strongly influenced by information from the other areas; thus they may have small variance but a large bias when the hypothesis of homogeneity is not satisfied.

According to Rao (2003), to avoid the potential bias of a synthetic estimator, say Ŷis , and the instability of the direct estimator, say Ŷid , we consider a convex combination of both, known as the composite estimator:

Ŷic = ωi Ŷis + (1 − ωi )Ŷid ,

for a suitably chosen weight ωi (0 ≤ ωi ≤ 1), where c, s, and d stand for composite, synthetic, and direct, respectively.
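The convex combination above is trivial to compute once the two component estimates are available; the following Python sketch uses hypothetical values for illustration.

```python
def composite_estimator(y_synthetic, y_direct, omega):
    """Composite estimator: convex combination of a synthetic and a
    direct estimator, Y_c = omega * Y_s + (1 - omega) * Y_d."""
    if not 0.0 <= omega <= 1.0:
        raise ValueError("omega must lie in [0, 1]")
    return omega * y_synthetic + (1.0 - omega) * y_direct

# omega = 1 recovers the synthetic estimator, omega = 0 the direct one.
print(composite_estimator(120.0, 150.0, 0.5))  # 135.0
```

In practice ωi would be chosen to trade off the bias of the synthetic component against the variance of the direct one, for instance by minimizing an estimated mean squared error.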

2.4 Small Area Models


2.4.1 Introduction
Traditional methods of indirect estimation, mentioned above, are based on implicit models
(synthetic and composite). We now turn to explicit linking models which provide significant
improvements in techniques for indirect estimation. Based on mixed model methodology,
these techniques incorporate random effects into the model. The random effects account
for the between-area variation that cannot be explained by including auxiliary variables.
Most small area models can be defined as an area-level model or a unit-level model (Rao, 2003, Chapter 5).

2.4.2 Basic Area Level Model


The area level model relates the small area information on the response variable to area-
specific auxiliary variables. One of the most widely used area level models for small area
estimation was proposed by Fay and Herriot (1979). According to the Fay-Herriot model,
a basic area level model assumes that the small area parameter of interest ηi is related to
area-specific auxiliary data xi through a linear model

ηi = xti β + υi , i = 1, 2, . . . , m, (2.1)

where m is the number of small areas, β = (β1 , β2 , . . . , βp )t is a p × 1 vector of regression


coefficients, and the υi ’s are area-specific random effects assumed to be independent and
identically distributed (iid ) with E(υi ) = 0 and Var(υi ) = συ2 , model expectation and
model variance respectively. Normality of the random effects υi is also often assumed, but it is possible to make robust inferences by relaxing the normality assumption (Rao, 2003).

The area level model assumes that there exists a direct survey estimator η̂i for the small
area parameter ηi such that

η̂i = ηi + εi , i = 1, 2, . . . , m, (2.2)

where εi is the sampling error associated with the direct estimator η̂i , with the assumption that the εi ’s are independent normal random variables with mean E(εi ) = 0 and Var(εi ) = τi . Combining these two equations yields the area level linear mixed model

η̂i = xti β + υi + εi . (2.3)
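The structure of the area level model can be illustrated by simulation. The Python sketch below (all sizes and parameter values are hypothetical) generates the true area parameters from the linking model (2.1) and the direct estimators from the sampling model (2.2).

```python
import numpy as np

# Hypothetical sizes and parameter values, for illustration only.
rng = np.random.default_rng(0)
m = 50                                 # number of small areas
beta = np.array([1.0, 2.0])            # regression coefficients
sigma2_v = 0.5                         # Var(v_i), area effect variance
tau = np.full(m, 1.0)                  # known sampling variances tau_i

X = np.column_stack([np.ones(m), rng.normal(size=m)])  # area covariates x_i
v = rng.normal(0.0, np.sqrt(sigma2_v), size=m)         # area effects v_i
eps = rng.normal(0.0, np.sqrt(tau))                    # sampling errors eps_i

eta = X @ beta + v          # linking model (2.1)
eta_hat = eta + eps         # sampling model (2.2); combined, this is (2.3)
```

Fitting the Fay-Herriot model then amounts to recovering β and συ2 from (X, η̂, τ) alone, since the true ηi are never observed.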

2.4.3 Basic Unit Level Model


Unit level models relate the unit values of the study variable to unit-specific auxiliary
variables. These variables are related to the unit level values of response through a linear

mixed model known as nested error linear regression model. This type of model can be
represented by the following equation

yij = xtij β + νi + εij , (2.4)

where yij is the response of unit j, j = 1, 2, . . . , ni , in area i, i = 1, 2, . . . , m, xij is the vector of auxiliary variables, β is the vector of regression parameters, νi is the random effect of area i, and εij is the individual unit error term. The area effects νi are assumed independent with mean zero and variance σν2 . The errors εij are independent with mean zero and variance σε2 . In addition, the νi ’s and εij ’s are assumed to be independent.

The nested error unit level regression model (2.4) was first used to model county crop areas
in North-Central Iowa, USA (Battese et al., 1988).
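Data generated by model (2.4) can be sketched as follows in Python; the parameter values are hypothetical and, for simplicity, the design is balanced (the same ni in every area).

```python
import numpy as np

# Hypothetical parameter values; balanced design for simplicity.
rng = np.random.default_rng(1)
m, n_i = 12, 5                         # areas and sampled units per area
beta = np.array([10.0, 0.5])           # regression parameters
sigma2_nu, sigma2_e = 4.0, 1.0         # Var(nu_i) and Var(eps_ij)

data = []
for i in range(m):
    nu_i = rng.normal(0.0, np.sqrt(sigma2_nu))           # area random effect
    for j in range(n_i):
        x_ij = np.array([1.0, rng.uniform(0.0, 100.0)])  # intercept + covariate
        e_ij = rng.normal(0.0, np.sqrt(sigma2_e))        # unit level error
        y_ij = x_ij @ beta + nu_i + e_ij                 # model (2.4)
        data.append((i, j, float(y_ij)))
```

Note that all units in area i share the same draw νi, which induces the within-area correlation that distinguishes this model from ordinary regression.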

2.5 Comparison Between Model-based and Design-based Approaches
In this Chapter, we have looked at some of the design-based and model-based approaches
for small area estimation that have been described in the literature. In practice, the direct design-based estimators have been widely favored because they are approximately design unbiased. The traditional indirect estimators, such as synthetic and
composite estimators, are based on implicit linking models. Synthetic estimators for small
areas are derived from direct estimators for a large area that covers several small areas
under the assumption that the small areas have the same characteristics as the large area.
Composite estimators are basically weighted averages of direct estimators and synthetic
estimators. Both synthetic and composite estimators can yield estimates that provide
higher precision compared to direct estimators. However, both types of estimators share
a common tendency to be design-biased, and the design bias does not necessarily decrease
as the sample size increases.

The explicit linking models provide significant improvements in techniques for indirect esti-
mation. Based on mixed model methodology, these techniques incorporate random effects
into the model. The random effects account for the between-area variation that cannot
be explained by including auxiliary variables. Most small area models can be defined as
an area-level model, or a unit-level model. Area-level models relate small-area direct esti-
mators to area-specific auxiliary data. Unit-level models relate the unit values of a study
variable to unit-specific auxiliary data.

In small area estimation, the focus is on bias and variance. For a small sample size, the unbiasedness of a direct estimator may be of no practical value due to its large variance.

Model-based estimators may be sometimes biased, but they have the advantage of small
variances compared to the design-based estimators (Rao, 2003, Chapter 5).
Chapter 3

Linear Mixed Models (LMMs)

3.1 Introduction
In this chapter we present an overview of the linear mixed model. It is not intended to
be an all-encompassing exposition on the subject; the rationale is to briefly explore the
methods for parameter estimation used throughout the thesis. In particular, it serves as
an introduction to the models that will be used in Chapter 4. Model-based small area estimation largely employs linear mixed models involving random area effects. The
auxiliary variables are introduced in the fixed part of the model as covariates.

A linear mixed model (LMM) can be viewed as a parametric linear model for clustered, or
longitudinal data that defines the relationships between a continuous dependent variable
and various predictor variables (covariates) (Brady et al., 2007). The term mixed refers to the presence of both fixed and random effects. The random effects, which are associated with one or more random factors, capture the random variation in the dependent variable at the different levels of the data.

Generally, clustered data are defined as data sets obtained after measuring the variable
of interest on different individuals grouped in a cluster. Longitudinal data are viewed as data sets in which the dependent variable (observed variable) is measured several times
over time for each individual.

LMMs can be considered as hierarchical models with at least two levels of data. There-
fore, clustered, and longitudinal data are referred to as hierarchical data sets, since the
observations can be placed into levels of a hierarchy in the data. Considering clustered, or
longitudinal data sets as multilevel data sets with at least two levels, we have the following:
Level 1 denotes observations at the most detailed level of the data. In a clustered data set,
Level 1 represents the units of analysis (or subjects) in the study. In a longitudinal data


set, Level 1 represents the repeated measures made on the same unit of analysis. Level 2
represents the next level of the hierarchy. In clustered data sets, Level 2 observations rep-
resent clusters of units. For longitudinal data sets, Level 2 represents the units of analysis
(Brady et al., 2007, McCulloch and Searle, 2001, Kubokawa, 2009).

This chapter will cover the following topics: linear mixed model formulation, point estimation of parameters, confidence intervals for fixed parameters, bootstrapping two-level models, and prediction of random effects.

3.2 Model Formulation

There exist different ways to represent and classify LMMs, which can be expressed at the individual level (Level 1) or at Level 2, as well as in matrix notation. The common assumption is that of normality for all random effects; in this case the LMM is called a Gaussian mixed model (Brady et al., 2007, Jiang, 2007).

In this section we present some alternative representations of LMMs, such as the general matrix specification, the hierarchical linear model, and the marginal linear model, which help to write an explicit expression for the likelihood function.

3.2.1 General Specification of Model

Following Brady et al. (2007), a simple and general form shows how most of the components of an LMM can be written at the level of an individual observation (Level 1) in the context of a clustered two-level data set. By convention, the index j refers to Level 1 units and the index i to Level 2 units.

yij = xtij β + ztij ui + eij ,    i = 1, . . . , m;   j = 1, . . . , ni , (3.1)

with xtij β the fixed part of the model and ztij ui and eij the random parts,

where:

yij = response of j th member of cluster i


m = number of clusters
ni = size of cluster i
xij = covariate vector of j th member of cluster i for fixed effects, ∈ Rp
β = fixed effects parameter, ∈ Rp
zij = covariate vector of j th member of cluster i for random effects, ∈ Rq
ui = random effect parameter, ∈ Rq ,

with assumptions that

ui ∼ Nq (0, D), D ∈ Rq×q ,

ei = (ei1 , . . . , eini )t ∼ Nni (0, Σi ), Σi ∈ Rni ×ni ,

where u1 , . . . , um , e1 , . . . , em are assumed independent, Σi is the covariance matrix of the error vector ei in cluster i, and D is the covariance matrix of the random effects ui .
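Under these assumptions, a clustered data set can be simulated from model (3.1) by drawing ui ∼ Nq(0, D) and ei ∼ Nni(0, Σi) for each cluster. The Python sketch below uses hypothetical dimensions and a random-intercept design (q = 1).

```python
import numpy as np

# Hypothetical dimensions: m clusters of size n_i, two fixed effects,
# q = 1 random effect (a random intercept).
rng = np.random.default_rng(2)
m, n_i, q = 10, 4, 1
beta = np.array([1.0, -0.5])           # fixed effects
D = np.array([[0.8]])                  # q x q covariance of u_i
Sigma_i = np.eye(n_i)                  # n_i x n_i covariance of e_i

y = []
for i in range(m):
    X_i = np.column_stack([np.ones(n_i), rng.normal(size=n_i)])
    Z_i = np.ones((n_i, q))                                # random intercept design
    u_i = rng.multivariate_normal(np.zeros(q), D)          # u_i ~ N_q(0, D)
    e_i = rng.multivariate_normal(np.zeros(n_i), Sigma_i)  # e_i ~ N_{n_i}(0, Sigma_i)
    y.append(X_i @ beta + Z_i @ u_i + e_i)                 # cluster form of (3.1)
y = np.concatenate(y)
```

Stacking the clusters in this way leads naturally to the general matrix specification of the model discussed below.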

Elements along the main diagonal of the D matrix represent the variances of each random
effect in ui , and the off-diagonal elements represent the covariances between two corre-
sponding random effects. Since there are q random effects in the model associated with
the ith cluster, D is a q × q matrix that is symmetric and positive definite, and is defined
as follows:
 
D = Var(ui ) :=

   Var(ui1 )         Cov(ui1 , ui2 )   . . .   Cov(ui1 , uiq )
   Cov(ui1 , ui2 )   Var(ui2 )         . . .   Cov(ui2 , uiq )
   ..                ..                ..      ..
   Cov(ui1 , uiq )   Cov(ui2 , uiq )   . . .   Var(uiq )

The variance and covariance components of the D matrix can be treated as parameters
stored in a vector denoted by θD with elements and dimension depending on the structure
of the matrix D.

In contrast to the standard linear model, in an LMM the residuals associated with observations
in the same cluster can be correlated. We assume that the ni residuals in the vector ei for a
given cluster i are random variables that follow a multivariate normal

distribution with a mean vector 0 and a positive definite symmetric covariance matrix Σi
which is defined in the following general form:

Σi = Var(ei) := [ Var(ei1)         Cov(ei1, ei2)    . . .   Cov(ei1, eini)
                  Cov(ei2, ei1)    Var(ei2)         . . .   Cov(ei2, eini)
                     . . .            . . .         . . .      . . .
                  Cov(eini, ei1)   Cov(eini, ei2)   . . .   Var(eini)       ].

In the same way as the case of matrix D, the variance and covariance components of the
Σi matrix are stored in a vector denoted by θΣ .

Considering both cases, we define the vector θ, to be used in subsequent sections, which
combines all the covariance parameters contained in the vectors θD and θΣ. Often
the parameters in θ are simply variance and covariance components, but sometimes they
may have other interpretations such as the autoregressive parameter and moving average
component which are used in time series settings (e.g. Box and Jenkins, 1970) or the sill
and range in geostatistical applications (e.g. Cressie, 1991).

Covariance Structures for the D Matrix


A D matrix is defined to be unstructured if there are no constraints on the values of its
elements beyond positive definiteness and symmetry. The symmetry of the q × q matrix D
implies that the θD vector has q(q + 1)/2 parameters. As an example of an unstructured D
matrix, in the case of an LMM having two random effects associated with the ith cluster,
we consider the following structure:

D = Var(ui) = [ σ²u1      σu1,u2
                σu1,u2    σ²u2   ].

In this case, the vector θD contains three covariance parameters:

θD = ( σ²u1, σu1,u2, σ²u2 )^t.

Among the structures that impose constraints on D, a very commonly used one is the variance
components (or diagonal) structure, in which each random effect in ui has its own variance
and all covariances in D are defined to be zero. In general, the θD vector for the variance
components structure requires q covariance parameters, defining the variances on the diagonal
of the D matrix. For example, in an LMM having two random

effects associated with the ith subject, a variance components D matrix has the following
form:

D = Var(ui) = [ σ²u1   0
                0      σ²u2 ].

In this case, the vector θD contains two parameters:

θD = ( σ²u1, σ²u2 )^t.

Covariance Structures for the Σi Matrix


The simplest covariance matrix structure for Σi is the diagonal one, in which the residuals
associated with observations in the same cluster are assumed to be uncorrelated and to
have equal variance. The diagonal Σi matrix for each cluster i has the following structure:

Σi = Var(ei) := [ σ²   0    . . .   0
                  0    σ²   . . .   0
                  ...
                  0    0    . . .   σ² ].

Here we see that the diagonal structure implies one parameter in θΣi , which defines the
constant variance at each time point:

θΣi = σ 2 .

Another covariance structure for the Σi matrix is the compound symmetry structure. Its
general form for each subject i is as follows:

Σi = Var(ei) := [ σ² + σ1²   σ1²        . . .   σ1²
                  σ1²        σ² + σ1²   . . .   σ1²
                  ...
                  σ1²        σ1²        . . .   σ² + σ1² ].

In this case there are two parameters in the θΣi vector that define the variances and
covariances in the Σi matrix:

θΣi = ( σ², σ1² )^t.

Another commonly used covariance structure for the Σi matrix is the first-order autoregressive
structure, denoted AR(1), whose general form is:

Σi = Var(ei) := [ σ²            σ²ρ           . . .   σ²ρ^(ni−1)
                  σ²ρ           σ²            . . .   σ²ρ^(ni−2)
                  ...
                  σ²ρ^(ni−1)    σ²ρ^(ni−2)    . . .   σ²          ].

The AR(1) structure has only two parameters in the θΣi vector that define all the variances
and covariances in the Σi matrix: a variance parameter (σ² > 0) and a correlation parameter
(−1 ≤ ρ ≤ 1),

θΣi = ( σ², ρ )^t.

Note that the AR(1) structure is frequently used when the data are equally spaced longitudinal
observations on the same units of analysis. Under this structure, observations that are closer
to each other in time exhibit higher correlation than observations farther apart in time. In
any given analysis, the most appropriate structure for the Σi matrix is determined from the
observed data and from knowledge about the relationships between observations on an
individual subject or cluster.
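These covariance structures are straightforward to generate numerically. The following sketch (Python with numpy assumed; the helper names are ours, not from any library) builds the diagonal, compound symmetry and AR(1) forms of Σi for a given cluster size:

```python
import numpy as np

def sigma_diagonal(n_i, sigma2):
    # theta = (sigma^2,): equal variances, no within-cluster correlation
    return sigma2 * np.eye(n_i)

def sigma_compound_symmetry(n_i, sigma2, sigma1_sq):
    # theta = (sigma^2, sigma_1^2): diagonal sigma^2 + sigma_1^2, off-diagonal sigma_1^2
    return sigma2 * np.eye(n_i) + sigma1_sq * np.ones((n_i, n_i))

def sigma_ar1(n_i, sigma2, rho):
    # theta = (sigma^2, rho): element (j, k) equals sigma^2 * rho^{|j - k|}
    idx = np.arange(n_i)
    return sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])

S = sigma_ar1(4, sigma2=2.0, rho=0.5)
# first row: 2.0, 1.0, 0.5, 0.25 -- correlation decays with the time lag
```

Each function returns a symmetric positive definite ni × ni matrix parameterized exactly by the θΣi vectors described above.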

3.2.2 General Matrix Specification

We now consider the general matrix specification of an LMM for a given cluster i,

Yi = Xi β + Zi ui + ei i = 1, . . . , m. (3.2)

where

Xi := (xi1, . . . , xini)^t ∈ R^{ni×p},   Zi := (zi1, . . . , zini)^t ∈ R^{ni×q},   Yi := (yi1, . . . , yini)^t ∈ R^{ni}.

The matrix formulation for an LMM is of the form

Y := (Y1^t, . . . , Ym^t)^t ∈ R^n,   X := (X1^t, . . . , Xm^t)^t ∈ R^{n×p},   β ∈ R^p,

G := blockdiag(D, . . . , D) ∈ R^{mq×mq},   Z := blockdiag(Z1, . . . , Zm) ∈ R^{n×mq},

u := (u1^t, . . . , um^t)^t ∈ R^{mq},   e := (e1^t, . . . , em^t)^t ∈ R^n,   R := blockdiag(Σ1, . . . , Σm) ∈ R^{n×n},

with n = n1 + . . . + nm, where blockdiag(·) denotes the block-diagonal matrix with its
arguments on the diagonal and zero blocks (e.g. 0ni×q) elsewhere.

with this, the Linear Mixed Model eq.(3.2) can be rewritten as

Y = Xβ + Zu + e, (3.3)

where

(u^t, e^t)^t ∼ N_{mq+n}( 0, blockdiag(G, R) ),

that is, u ∼ N_{mq}(0, G) and e ∼ N_n(0, R) are independent.
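The stacked matrices above can be assembled directly. A minimal sketch, assuming numpy, with illustrative values of m, q and the cluster sizes (the `block_diag` helper is ours):

```python
import numpy as np

def block_diag(blocks):
    # place each block on the diagonal of a zero matrix
    rows = sum(b.shape[0] for b in blocks)
    cols = sum(b.shape[1] for b in blocks)
    out = np.zeros((rows, cols))
    r0 = c0 = 0
    for b in blocks:
        out[r0:r0 + b.shape[0], c0:c0 + b.shape[1]] = b
        r0 += b.shape[0]
        c0 += b.shape[1]
    return out

rng = np.random.default_rng(0)
m, q, p = 3, 2, 2                         # clusters, random effects, fixed effects
n_i = [4, 5, 3]                           # cluster sizes
n = sum(n_i)

D = np.array([[1.0, 0.3], [0.3, 0.5]])    # common random-effects covariance
Sigma = [0.8 * np.eye(k) for k in n_i]    # diagonal residual covariance per cluster

X = rng.standard_normal((n, p))
Z_blocks = [rng.standard_normal((k, q)) for k in n_i]
Z = block_diag(Z_blocks)                  # n x mq, block diagonal
G = block_diag([D] * m)                   # mq x mq
R = block_diag(Sigma)                     # n x n
V = Z @ G @ Z.T + R                       # marginal covariance of Y (Section 3.2.4)
```

Because Z, G and R are all block diagonal, V inherits the block-diagonal structure blockdiag(V1, . . . , Vm) derived in Section 3.2.4.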

3.2.3 Hierarchical Linear Model (HLM) Specification of the LMM


It is often convenient to express an LMM in terms of an explicitly defined hierarchy of
simpler models, which correspond to the levels of a clustered or longitudinal data set.
When LMMs are specified in such a way, they are often referred to as hierarchical linear
models (HLMs), or multilevel models (MLMs). The HLM form of an LMM is equivalent
to the general LMM introduced in (3.2), and may be implemented for any LMM. Here
we present a general form for the HLM specification of LMMs as a two level hierarchical
model.

Y |u ∼ Nn (Xβ + Zu, R)
u ∼ Nmq (0, G) . (3.4)

3.2.4 Marginal Linear Model


In Subsection 3.2.1 we defined a general form for the LMM, where the random effects are
specified to explain the between-cluster variation. LMMs are referred to as cluster-specific
models since they allow cluster-specific inference.

The marginal linear model differs from the LMM in that it contains no random effects.
Because random effects are not specified, marginal models do not allow cluster-specific
inference. The following is the marginal model specification implied by an LMM.

The marginal model in matrix form is

Y = Xβ + ε∗ , (3.5)

where

ε* := Zu + e = ( Z   In×n ) (u^t, e^t)^t =: A (u^t, e^t)^t,

so that ε* ∼ Nn(0, V),

where

V = A blockdiag(G, R) A^t = ( Z   In×n ) [ G   0 ; 0   R ] ( Z^t ; In×n )
  = ( ZG   R ) ( Z^t ; In×n )
  = Z G Z^t + R
  = blockdiag( Z1 D Z1^t + Σ1, . . . , Zm D Zm^t + Σm )
  = blockdiag( V1, . . . , Vm ).

The implied marginal model defines the marginal distribution of the Y vector:

Y ∼ Nn (Xβ, V ) .

The implied marginal model has the advantage that the fixed model parameters (fixed effects
and covariance parameters) of an LMM can be estimated without explicit reference to the
random effects.

3.3 Estimation in Linear Mixed Models


In the LMM, we estimate the fixed-effect parameters, β, and the covariance parameters
(i.e., θD and θΣi for the D and Σi matrices, respectively). In this section, we discuss
maximum likelihood (ML) and restricted maximum likelihood (REML) estimation, which
are methods commonly used to estimate these parameters.

3.3.1 Maximum Likelihood (ML) Estimation


In general, maximum likelihood (ML) estimation is a method of obtaining estimates of un-
known parameters by optimizing a likelihood function. To apply ML estimation, we first
construct the likelihood as a function of the parameters in the specified model, based on
distributional assumptions. The maximum likelihood estimates (MLEs) of the parameters
are the values of the arguments that maximize the likelihood function (i.e., the values of
the parameters that make the observed values of the dependent variable most likely, given
the distributional assumptions).

In the context of the LMM, we construct the likelihood function of β and θ by referring to
the marginal distribution of the dependent variable Y defined in (3.5). The corresponding
multivariate normal probability density function, f(Y | β, θ), is

f(Y | β, θ) = (2π)^{−n/2} |V|^{−1/2} exp{ −(1/2) (Y − Xβ)^t V^{−1} (Y − Xβ) }.   (3.6)

Recall that the elements of the V matrix are functions of the covariance parameters in θ,
so that the log-likelihood is

l(β, θ) = −(1/2) (Y − Xβ)^t V^{−1} (Y − Xβ) − (1/2) ln(|V|) − (n/2) ln(2π).   (3.7)

Case 1: Assume θ is Known


We consider a special case of ML estimation for LMMs, in which we assume that θ, and
as a result the matrix V , is known. In this case, the only parameters that we estimate are
the fixed effects, β. The log-likelihood function, l(β, θ), thus becomes a function of β only.
Then we differentiate with respect to β:

∂l/∂β = X^t V^{−1} (Y − Xβ) = X^t V^{−1} Y − X^t V^{−1} X β;

setting this to zero, we get

X^t V^{−1} Y = X^t V^{−1} X β.

Solving this equation, we obtain the maximum likelihood estimator

β̂ = (X^t V^{−1} X)^{−1} X^t V^{−1} Y,   (3.8)

which is also known as the generalized least squares (GLS) estimator of β; the MLE coincides
with the GLS estimator under the normality assumption for Y.
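A direct numerical transcription of (3.8), assuming numpy and a simulated data set with a known, non-spherical V (the AR(1)-type correlation is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
beta = np.array([1.0, 2.0, -0.5])

# a known, non-spherical V: AR(1)-type correlation along the index
idx = np.arange(n)
V = 0.9 ** np.abs(idx[:, None] - idx[None, :])

L = np.linalg.cholesky(V)
Y = X @ beta + L @ rng.standard_normal(n)      # Y ~ N(X beta, V)

Vinv = np.linalg.inv(V)
beta_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ Y)   # equation (3.8)
```

The solution satisfies the weighted normal equations X^t V^{−1}(Y − X β̂) = 0, which is a useful numerical sanity check.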

Case 2: Assume θ is not Known


In this case the previous relation (3.8) becomes

β̃(θ) = (X t V (θ)−1 X)−1 X t V (θ)−1 Y. (3.9)

To estimate θ, we substitute (3.9) into (3.7) to obtain the profile log-likelihood

lp(θ) = l(β̃(θ), θ) = −(1/2) (Y − Xβ̃(θ))^t V(θ)^{−1} (Y − Xβ̃(θ)) − (1/2) ln(|V(θ)|) − (n/2) ln(2π).

The profile log-likelihood is then maximized to find θ̂ML.

Harville (1977) asserts that the ML method suffers from some well-known problems in the
estimation of variance parameters; one of them is that it tends to give biased variance
estimates. For the ordinary linear model where m = 1, V = σ²I and q = 0, the ML
estimator of the single variance component σ 2 has expectation
E(σ̂²ML) = E{ (Y − Xβ̂)^t (Y − Xβ̂) / n }
        = (1/n) E{ Y^t (In − X(X^t X)^{−1} X^t) Y }
        = (1/n) σ² tr(In − X(X^t X)^{−1} X^t) + (1/n) β^t X^t (In − X(X^t X)^{−1} X^t) X β
        = ((n − r)/n) σ²,
where r is the rank of the matrix X (the second term vanishes since (In − X(X^t X)^{−1} X^t)X = 0).
So the bias is −σ² r/n, which can be significant when the number of degrees of freedom
n − r is small.
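This bias can be checked numerically: the trace identity tr(In − X(X^t X)^{−1}X^t) = n − r is exact, and a small simulation (numpy assumed; design, sample size and β are arbitrary illustrative choices) shows that the average ML estimate falls below σ²:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 20, 4
X = rng.standard_normal((n, p))            # full-column-rank design, so r = p
H = X @ np.linalg.solve(X.T @ X, X.T)      # hat matrix X (X^t X)^{-1} X^t
r = np.linalg.matrix_rank(X)
trace = np.trace(np.eye(n) - H)            # equals n - r

# simulate: the average ML estimate is close to (n - r)/n * sigma^2, not sigma^2
B, sigma2 = 2000, 1.0
beta = np.ones(p)
est = np.empty(B)
for b in range(B):
    Y = X @ beta + rng.standard_normal(n) * np.sqrt(sigma2)
    resid = Y - H @ Y                      # residuals (I - H) Y
    est[b] = resid @ resid / n             # ML estimator of sigma^2
```

With n = 20 and r = 4 the expected value of the ML estimator is 0.8 σ², a 20% downward bias.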

At least to some extent, these problems of ML estimation can be avoided by adopting
another likelihood-based procedure called residual ML or restricted ML (REML).

3.3.2 Restricted Maximum Likelihood (REML) Estimation


The REML method is based on the likelihood principle and has the same merits as the ML
method, such as consistency, efficiency and asymptotic normality. The REML method is now
a widely preferred approach for estimating variance parameters in mixed models (Searle

et al., 1992, Pinheiro and Bates, 2000, Verbeke and Molenberghs, 2000, Diggle et al., 2002).
The REML estimates of θ are based on optimization of the REML log-likelihood function.
The idea is first to remove the effect of the fixed effects: the residuals are uncorrelated
with all the fixed covariates in the model and are normally distributed, but their distribution
no longer depends on the estimates of the fixed effects; it depends only on the variance
components. Harville (1977) considered a joint likelihood function of (β, θ) and, by
integrating β out of the joint likelihood, obtained a marginal likelihood function depending
only on θ (the restricted likelihood):
lR(θ) = ln( ∫ L(β, θ) dβ ),   (3.10)

where

∫ L(β, θ) dβ = |V(θ)|^{−1/2} (2π)^{−n/2} ∫ exp{ −(1/2) (Y − Xβ)^t V(θ)^{−1} (Y − Xβ) } dβ.   (3.11)

Now consider

(Y − Xβ)^t V(θ)^{−1} (Y − Xβ) = β^t X^t V(θ)^{−1} X β − 2 Y^t V(θ)^{−1} X β + Y^t V(θ)^{−1} Y,

where we abbreviate A(θ) := X^t V(θ)^{−1} X.

Define

B(θ) := A(θ)^{−1} X^t V(θ)^{−1}.

Adding and subtracting Y^t B(θ)^t A(θ) B(θ) Y, we get

(Y − Xβ)^t V(θ)^{−1} (Y − Xβ) = (β − B(θ)Y)^t A(θ) (β − B(θ)Y) + Y^t V(θ)^{−1} Y − Y^t B(θ)^t A(θ) B(θ) Y.

Note that

B(θ)^t A(θ) = V(θ)^{−1} X A(θ)^{−1} A(θ) = V(θ)^{−1} X.

Therefore we have

∫ L(β, θ) dβ = |V(θ)|^{−1/2} (2π)^{−n/2} exp{ −(1/2) ( Y^t V(θ)^{−1} Y − Y^t B(θ)^t A(θ) B(θ) Y ) }
               × ∫ exp{ −(1/2) (β − B(θ)Y)^t A(θ) (β − B(θ)Y) } dβ
             = |V(θ)|^{−1/2} (2π)^{−n/2} exp{ −(1/2) ( Y^t V(θ)^{−1} Y − Y^t B(θ)^t A(θ) B(θ) Y ) } (2π)^{p/2} |A(θ)|^{−1/2}.

Now

(Y − Xβ̃(θ))^t V(θ)^{−1} (Y − Xβ̃(θ)) = Y^t V(θ)^{−1} Y − 2 Y^t V(θ)^{−1} X β̃(θ) + β̃(θ)^t X^t V(θ)^{−1} X β̃(θ)
                                     = Y^t V(θ)^{−1} Y − 2 Y^t V(θ)^{−1} X β̃(θ) + β̃(θ)^t A(θ) β̃(θ)
                                     = Y^t V(θ)^{−1} Y − 2 Y^t B(θ)^t A(θ) B(θ) Y + Y^t B(θ)^t A(θ) B(θ) Y
                                     = Y^t V(θ)^{−1} Y − Y^t B(θ)^t A(θ) B(θ) Y.

Here we used

β̃(θ) = (X^t V(θ)^{−1} X)^{−1} X^t V(θ)^{−1} Y = A(θ)^{−1} X^t V(θ)^{−1} Y = B(θ) Y

and

B(θ)^t A(θ) B(θ) = V(θ)^{−1} X A(θ)^{−1} A(θ) B(θ) = V(θ)^{−1} X B(θ).
Therefore it follows from (3.11) that

∫ L(β, θ) dβ = |V(θ)|^{−1/2} (2π)^{−n/2} exp{ −(1/2) (Y − Xβ̃(θ))^t V(θ)^{−1} (Y − Xβ̃(θ)) } (2π)^{p/2} |A(θ)|^{−1/2}.
Then

lR(θ) = ln( ∫ L(β, θ) dβ ) = −(1/2) ln(|V(θ)|) − (1/2) (Y − Xβ̃(θ))^t V(θ)^{−1} (Y − Xβ̃(θ)) − (1/2) ln(|A(θ)|) + c
      = lp(θ) − (1/2) ln(|A(θ)|) + c.
Therefore the restricted ML (REML) estimator of θ is given by θ̂REML, which maximizes

lR(θ) = lp(θ) − (1/2) ln(|X^t V(θ)^{−1} X|).   (3.12)

Substituting the estimator V̂REML = V(θ̂REML) into the GLS formula (3.8) gives the
REML estimator of β:

β̂REML = (X^t V̂REML^{−1} X)^{−1} X^t V̂REML^{−1} Y.   (3.13)
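The contrast between the profile likelihood and (3.12) can be seen in the simplest case V = σ² In, where the maximizers are available in closed form: RSS/n for ML and RSS/(n − r) for REML. A sketch using a grid search, assuming numpy (all sizes illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
Y = X @ np.array([1.0, 0.5, -1.0]) + rng.standard_normal(n)

beta_ols = np.linalg.solve(X.T @ X, X.T @ Y)   # GLS reduces to OLS when V = s2 * I
rss = float(np.sum((Y - X @ beta_ols) ** 2))

def l_profile(s2):
    # profile log-likelihood for V = s2 * I_n, up to an additive constant
    return -0.5 * rss / s2 - 0.5 * n * np.log(s2)

def l_reml(s2):
    # restricted log-likelihood: l_p(s2) - 0.5 * ln |X^t V(s2)^{-1} X|
    sign, logdet = np.linalg.slogdet(X.T @ X / s2)
    return l_profile(s2) - 0.5 * logdet

grid = np.linspace(0.2, 3.0, 20000)
s2_ml = grid[np.argmax([l_profile(s) for s in grid])]
s2_reml = grid[np.argmax([l_reml(s) for s in grid])]
# s2_ml is close to rss/n, while s2_reml is close to rss/(n - p) > rss/n
```

The extra −(1/2) ln|X^t V(θ)^{−1} X| term in (3.12) is exactly what shifts the maximizer from the biased divisor n to the degrees-of-freedom divisor n − r.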

3.4 Confidence Intervals for Fixed Effects


In general, standard errors are often used to assign approximate confidence intervals to
a parameter θ of interest. A confidence interval is often more useful than a point estimate
θ̂ alone: taken together, the point estimate and the confidence interval say what the best
guess for θ is, and how far in error that guess might reasonably be. In this section we
consider how to find confidence intervals for the fixed effects under the above distributional
assumptions for linear mixed models (Gaussian mixed models).

3.4.1 Normal Confidence Intervals


Under the assumption that Y ∼ N(Xβ, V(θ)), the estimator β̂REML is asymptotically
normal with mean β and approximate variance-covariance matrix

Var(β̂) = (X^t V̂REML^{−1} X)^{−1},

where σ̂i² = [(X^t V̂REML^{−1} X)^{−1}]ii is taken as an estimate of Var(β̂i). Therefore

β̂i ± z_{1−α/2} √( [(X^t V̂REML^{−1} X)^{−1}]ii )   (3.14)

is an approximate 100(1 − α)% confidence interval for βi.

Note that (3.14) is an approximate confidence interval, though a very useful one in a wide
variety of situations (Jiang, 2007). We will use the bootstrap to calculate better approximate
confidence intervals, focusing on the computation of bootstrap standard errors (Efron and
Tibshirani, 1993).

3.5 Bootstrapping Two-Level Models


3.5.1 Bootstrap
The bootstrap is a simulation-based technique for estimating the bias and the variance (and
consequently the standard error) of an estimator under minimal assumptions. In this section
we briefly discuss the general idea of the bootstrap and its application to two-level models.
Bootstrap estimates can be computed via parametric and nonparametric approaches. In
the nonparametric bootstrap, estimates are computed by generating a large number of new
samples from the original sample. The parametric bootstrap generates bootstrap observations
from the fitted model, using the estimated parameter values.

A generic implementation of the nonparametric bootstrap, assuming a sample {z1, z2, . . . , zn},
is as follows:

1. Draw B bootstrap samples {z*b1, z*b2, . . . , z*bn}, b = 1, 2, . . . , B, with replacement from
{z1, z2, . . . , zn}.

2. From each of the B samples, estimate the parameter θ, thereby obtaining B estimates θ*b.

3. Calculate θ*(.) = (1/B) Σ_{b=1}^B θ*b and var̂(θ*) = (1/(B − 1)) Σ_{b=1}^B (θ*b − θ*(.))²,
the bootstrap mean and variance of θ*, respectively.

4. The bias is Biaŝ_B = Biaŝ(θ*) = θ*(.) − θ̂.

5. The bias-corrected estimate of θ is θ̂B = θ̂ − Biaŝ_B = 2θ̂ − θ*(.).

In the case of the two-level model

yij = xij^t β + zij^t ui + eij,   i = 1, . . . , m;   j = 1, . . . , ni,   (3.15)

the nonparametric bootstrap (known as the cases bootstrap) can be obtained by resampling
entire cases as follows:

1. Draw one entire Level-2 unit (Yi , Xi , Zi ), containing ni Level-1 cases, with replace-
ment,

2. From this Level-2 unit, draw a bootstrap sample (Yi∗ , Xi∗ , Zi∗ ) of size ni with replace-
ment,

3. Repeat steps 1 and 2 m times,

4. Compute all parameter estimates for the two-level model.

5. Repeat steps 1-4 B times and compute bias-corrected estimates and bootstrap standard
errors.

The bootstrap has proved to be an approach that derives consistent estimates of bias and
standard errors of model parameters. Bootstrap estimation has been recommended for
multilevel models since it gives satisfactory results in small-sample cases under minimal
assumptions. The application of the bootstrap to multilevel models is not immediate,
because it depends on the nature of the problem. During the resampling process, the
hierarchical data structure must be respected, since observations within a cluster are
dependent, and several adaptations are needed in order to deal properly with this intra-class
dependency. Here we consider a two-level model and discuss one bootstrap approach: the
parametric bootstrap. Among the assumptions on which the parametric bootstrap is based
are the following: the explanatory variables are considered fixed, and both the model
(specification) and the distribution(s) are assumed to be correct (Efron, 1982; Hall, 1992;
Van der Leeden, Busing and Meijer, 1997).

3.5.2 Parametric Bootstrap


The bootstrap samples are generated using the parametrically estimated distribution func-
tion of the data. In the two-level model considered here, two of these distribution functions
are involved. For the level-1 residuals e, we use the N (0, R̂) distribution function, and for

the level-2 residuals, contained in the vector u, we use the N (0, Ĝ) distribution function.

Let β̂ be the REML estimator of β. The (re)sampling procedure is as follows (Van der
Leeden, Busing and Meijer, 1997):

1. Draw a vector u* from a multivariate normal distribution with mean zero and covariance
matrix Ĝ.

2. Draw a vector e* from a multivariate normal distribution with mean zero and covariance
matrix R̂.

3. Generate the bootstrap sample Y* from Y* = Xβ̂ + Zu* + e*.
4. Compute estimates for all parameters of the two-level model.
5. Repeat steps 1-4 B times and compute bias-corrected estimates and bootstrap stan-
dard errors.
Note that in this sampling process the covariates are assumed to be fixed.
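A minimal sketch of this procedure for a random-intercept two-level model, assuming numpy. The "REML estimates" below are simply fixed illustrative values, and step 4 is simplified to re-estimating β by GLS with V̂ held fixed:

```python
import numpy as np

rng = np.random.default_rng(5)
m, k = 10, 6                                  # m clusters of k observations
n = m * k
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
Z = np.kron(np.eye(m), np.ones((k, 1)))       # random-intercept design

# treated here as the REML estimates from a previous fit (illustrative values)
beta_hat = np.array([2.0, -1.0])
s2_u, s2_e = 0.5, 1.0
V_hat = s2_u * Z @ Z.T + s2_e * np.eye(n)
Vinv = np.linalg.inv(V_hat)

B = 500
beta_star = np.empty((B, 2))
for b in range(B):
    u_star = rng.normal(0.0, np.sqrt(s2_u), m)      # step 1: u* ~ N(0, G_hat)
    e_star = rng.normal(0.0, np.sqrt(s2_e), n)      # step 2: e* ~ N(0, R_hat)
    Y_star = X @ beta_hat + Z @ u_star + e_star     # step 3: bootstrap responses
    beta_star[b] = np.linalg.solve(X.T @ Vinv @ X,  # step 4 (simplified): GLS for beta
                                   X.T @ Vinv @ Y_star)

se_boot = beta_star.std(axis=0, ddof=1)             # step 5: bootstrap standard errors
```

Because the data are generated from the assumed model, the bootstrap standard errors should agree with the analytic ones from (X^t V̂^{−1} X)^{−1}, up to Monte Carlo error.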

3.5.3 Bootstrap Confidence Intervals


In many cases the bootstrap is used only for bias correction and the computation of standard
errors, and in typical applications the standard errors are used to calculate an approximate
confidence interval for a parameter θ of interest. The computation of confidence intervals is
itself among the important and nontrivial applications of the bootstrap. Different types of
bootstrap confidence intervals exist, but here we discuss only one type, for a typical
parameter θ with true value θ0, and we consider only two-sided intervals. The intended
nominal coverage of the confidence interval will be denoted by 1 − α, so the probability
that the interval contains the true parameter value should be approximately 1 − α (Efron
and Tibshirani, 1993).

Bootstrap Normal confidence Interval


If the assumptions of the model, including the normality assumptions, hold, then the
estimators are asymptotically normally distributed with a certain variance, derived from
the likelihood function and we have
√n (θ̂ − θ0) →d N(0, ψ).

The distribution of θ̂ − θ0 can be approximated by the normal distribution with mean
zero and variance ψ̂/n, where ψ̂ is a consistent estimator of ψ derived from the likelihood
function. The usual confidence interval for θ0 is therefore

[ θ̂ + z_{α/2} ŝe(θ̂),   θ̂ + z_{1−α/2} ŝe(θ̂) ]   (3.16)
where ŝe(θ̂) = √(ψ̂/n) is the estimator of the asymptotic standard deviation of θ̂. Even
if the random terms in the model are not normally distributed, under mild regularity
conditions the estimators are still asymptotically normally distributed. In that case, ŝe(θ̂)
may not be a consistent estimator of the standard deviation of the estimators of the variance
components, although it is still consistent for the fixed parameters. Replacing ŝe(θ̂) in
(3.16) by a bootstrap estimator results in an approximate confidence interval, the bootstrap
normal confidence interval, which is better than the normal confidence interval:

[ θ̂ + z_{α/2} ŝeB(θ̂),   θ̂ + z_{1−α/2} ŝeB(θ̂) ]   (3.17)

in which ŝeB(θ̂) is the bootstrap estimator of the standard deviation of θ̂. Alternatively,
one might use

[ θ̂B + z_{α/2} ŝeB(θ̂),   θ̂B + z_{1−α/2} ŝeB(θ̂) ]   (3.18)

where θ̂B is the bootstrap bias-corrected estimator of θ (Efron and Tibshirani, 1993).
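A sketch of intervals (3.17) and (3.18) for the mean of a sample, assuming numpy (the data are illustrative, and z_{0.975} is hard-coded):

```python
import numpy as np

rng = np.random.default_rng(6)
z = rng.normal(loc=5.0, scale=2.0, size=40)
theta_hat = z.mean()

B = 1000
theta_star = np.array([rng.choice(z, z.size, replace=True).mean() for _ in range(B)])
se_B = theta_star.std(ddof=1)                  # bootstrap standard error
theta_bc = 2.0 * theta_hat - theta_star.mean() # bias-corrected estimate

z_crit = 1.959963984540054                     # z_{1 - alpha/2} for alpha = 0.05
ci = (theta_hat - z_crit * se_B, theta_hat + z_crit * se_B)      # interval (3.17)
ci_bc = (theta_bc - z_crit * se_B, theta_bc + z_crit * se_B)     # interval (3.18)
```

Both intervals have the same width; they differ only in being centered at θ̂ or at the bias-corrected θ̂B.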

3.6 Prediction of Random Effects


3.6.1 Best Linear Unbiased Predictor (BLUP)
Technically speaking, the random effects u in model (3.3) are not model parameters like
β and σ. However, as Pinheiro and Bates (2000) point out, they behave in a way like
parameters, and since they are unobservable there often is interest in obtaining estimates of
their values. The best predictor (BP) of u, in the sense that it minimizes the mean squared
prediction error, is the conditional mean ũ = BP(u) = E(u | Y). The normality assumptions
for the model considered in Section 3.2 imply that u and Y have a joint multivariate normal
distribution
( u )                ( ( 0  )   ( G    GZ^t ) )
( Y )  ∼  N_{mq+n}  ( ( Xβ ) ,  ( ZG   V    ) ),   (3.19)
and under the normal theory the conditional mean of u given Y is

E(u|Y ) = E(u) + cov(u, Y )[var(Y )]−1 (Y − E(Y )) = GZ t V −1 (Y − Xβ). (3.20)

This is the best predictor of u, and being a linear function of Y it also is the best linear
predictor (BLP) of u. In practice the unknown β in (3.19) is replaced with its estimator
β̂ = β̂GLS , which is the BLUE of β yielding the best linear unbiased predictor (BLUP) of
u:

ũ = BLU P (u) = GZ t V −1 (Y − X β̂). (3.21)



The unbiasedness here means that the random variable u and its predictor ũ have the same
expected value. Mathematically, the BLUP estimates of the fixed parameters and random
effects are defined as the solution of Henderson's simultaneous equations, sometimes called
the mixed model equations (Robinson, 1991).

3.6.2 Mixed Model Equations


Henderson (in Henderson et al. 1959) introduced a set of equations, solutions of which give
simultaneously the GLS estimator of β and the BLUP of u. The equations are derived by
maximizing the joint density of Y and u with respect to β and u.

Since Y | u ∼ Nn(Xβ + Zu, R) and u ∼ Nq(0, G), the joint density of Y and u is

f(Y, u) = f(Y | u) f(u)
        = (2π)^{−n/2} |R|^{−1/2} exp{ −(1/2) (Y − Xβ − Zu)^t R^{−1} (Y − Xβ − Zu) }
          × (2π)^{−q/2} |G|^{−1/2} exp{ −(1/2) u^t G^{−1} u }
        = exp{ −(1/2) [ (Y − Xβ − Zu)^t R^{−1} (Y − Xβ − Zu) + u^t G^{−1} u ] } / ( (2π)^{(n+q)/2} |R|^{1/2} |G|^{1/2} ).
To maximize the density f(Y, u), calculate the partial derivatives of

ln(f(Y, u)) = −((n + q)/2) ln(2π) − (1/2) ln|R| − (1/2) ln|G|
              − (1/2) [ (Y − Xβ − Zu)^t R^{−1} (Y − Xβ − Zu) + u^t G^{−1} u ]

with respect to β and u:

∂ ln(f(Y, u))/∂β = X^t R^{−1} (Y − Xβ − Zu),
∂ ln(f(Y, u))/∂u = Z^t R^{−1} (Y − Xβ − Zu) − G^{−1} u.
Setting these to zero yields the equations

X^t R^{−1} X β + X^t R^{−1} Z u = X^t R^{−1} Y,
Z^t R^{−1} X β + (Z^t R^{−1} Z + G^{−1}) u = Z^t R^{−1} Y,

which are written in matrix form as

[ X^t R^{−1} X     X^t R^{−1} Z          ] ( β )     ( X^t R^{−1} Y )
[ Z^t R^{−1} X     Z^t R^{−1} Z + G^{−1} ] ( u )  =  ( Z^t R^{−1} Y ).   (3.22)

These are Henderson’s mixed model equations. Solving them produces the β̂GLS and the
ũ. A practical advantage of the mixed model equations is their computational convenience,
because, unlike in (3.20), there is no need for inverting the n × n covariance matrix V . The
inverses of q × q matrix G and n × n matrix R are needed instead, but they are often easy
to compute: q is usually not that large and R is usually diagonal.
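The equivalence between the solution of (3.22) and the GLS/BLUP formulas can be verified numerically. A sketch for a random-intercept model, assuming numpy (all sizes and variance values illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
m, k, p = 4, 5, 2
n = m * k
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
Z = np.kron(np.eye(m), np.ones((k, 1)))       # random-intercept design
G = 0.7 * np.eye(m)                           # Var(u)
R = 1.2 * np.eye(n)                           # Var(e)
V = Z @ G @ Z.T + R
Y = (X @ np.array([1.0, 0.5])
     + Z @ rng.normal(0.0, np.sqrt(0.7), m)
     + rng.normal(0.0, np.sqrt(1.2), n))

Rinv, Ginv, Vinv = np.linalg.inv(R), np.linalg.inv(G), np.linalg.inv(V)

# assemble and solve the mixed model equations (3.22)
lhs = np.block([[X.T @ Rinv @ X, X.T @ Rinv @ Z],
                [Z.T @ Rinv @ X, Z.T @ Rinv @ Z + Ginv]])
rhs = np.concatenate([X.T @ Rinv @ Y, Z.T @ Rinv @ Y])
sol = np.linalg.solve(lhs, rhs)
beta_mme, u_mme = sol[:p], sol[p:]

# the same quantities from the GLS and BLUP formulas
beta_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ Y)
u_blup = G @ Z.T @ Vinv @ (Y - X @ beta_gls)
```

Note the computational point made in the text: the mixed model equations only require R^{−1} (diagonal here) and G^{−1}, never the n × n inverse V^{−1}, yet they return exactly β̂GLS and ũ.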

3.6.3 Empirical Best Linear Unbiased Predictor (EBLUP)


Usually the covariance matrices V, G and R are unknown. Then, in predicting u by the
BLUP formula (3.21), they are replaced with their REML or ML estimates to yield
û = EBLUP(u) = ĜZ t V̂ −1 (Y − X β̂). (3.23)

The predictor û is called empirical best linear unbiased predictor (EBLUP) of u, the word
empirical referring to the fact that the values of G and V have been obtained from the
observed data. The estimator β̂ is now the GLS estimator, where V is replaced with its
estimate (i.e. β̂ is typically β̂REML or β̂ML ), and it is sometimes called empirical BLUE of
β. Both β̂ and û can be obtained by solving the Henderson’s mixed model equations where
V , G and R are substituted by the corresponding estimates.

Considering the model at the cluster level (Level 2), model (3.2), the EBLUP for the
conditional expectation E(Yi | ui) = Xiβ + Ziui is given by

Xiβ̂ + Zi D̂ Zi^t V̂i^{−1} (Yi − Xiβ̂),   with V̂i = Zi D̂ Zi^t + Σ̂i.   (3.24)

The expression (3.24) may be rewritten as a weighted average combining the information
from cluster (or subject) i only and information from the population (auxiliary information).
The EBLUP for Xiβ + Ziui may thus be viewed as borrowing strength across clusters to
get the best prediction for cluster i: when the data from an area are scarce, the information
from that area alone is weak, and the area estimate needs to be strengthened with
supplementary global information. The smaller the area sample is, the more weight the
global information gets in the estimation (Kubokawa (2009), McCulloch and Searle (2001)).

This connection between linear mixed models and small area estimation is discussed in the
following chapter.
Chapter 4

Small Area Estimation with Linear Mixed Model

4.1 Introduction
In this chapter we illustrate how to apply the theory on linear mixed models and the related
EBLUP theory toward the estimation of a small area characteristic of interest. To give an
idea about the context, we start with the following discussion.

Suppose that the population U of size N is divided into M disjoint areas and that the
survey estimates are available for m, m ≤ M , of the areas.
Let
U = U1 ∪ U2 . . . ∪ Ui ∪ . . . ∪ UM
and
N = N1 + N2 + . . . + Ni + . . . + NM
where Ni is the size of area population Ui , i = 1, 2, . . . , M. Assume then that a random
sample s of size n is drawn from U and
s = s1 ∪ s2 . . . ∪ si ∪ . . . ∪ sm
and
n = n1 + n2 + . . . + ni + . . . + nm .
Furthermore, each area population Ui divides into the sample si and the remainder ri =
Ui − si . It is possible that for some areas the sample size ni is zero so that ri = Ui .

To illustrate these concepts, we consider the following example from Fuller (2009), which
concerns the prediction of wind erosion in Iowa for the year 2002: 44 counties reported
data on wind erosion, and the author included 4 counties with no observations for the
purpose of illustration. Each county is divided into segments, and the segments are the
primary sampling units of the survey. The soils of Iowa have been mapped, so population
values for a number of soil characteristics are available. The mean of the soil erodibility
index for each county is the auxiliary data in the small area model.

The data set includes variables such as county, totalsegments, samplesegments,
erodibility, and Y. The variable county records an identification number for each county,
totalsegments records the total number of segments in the county, and samplesegments
records the sampled number of segments in the county. The variable erodibility records a
standardized population mean of the soil erodibility index for each county, and Y records
the survey sample mean (direct estimate) of a variable that is related to wind erosion
for each county. The first 44 observations contain the observed data, and the last 4 obser-
vations contain the hypothetical counties for which there were no observations in the survey.

Relating this example to the general description, the following specifications are given:

U : all segments in all counties under study


M : number of counties under the study
N : the population (total) number of segments in all counties
Ui : all segments in the county i
Ni : the population (total) number of segments in the county (small area) i
m : number of counties that have been sampled
n : the sample number of segments in all counties
ni : the sample number of segments in small area i
Y : wind erosion in all counties
ȳi : the estimate (direct estimate) of the mean of segments that have been affected
by the wind erosion for the area i
x̄i : a vector of known means of auxiliary variables for the area i
r̄1i : the population mean erodibility index for county i

The model proposed by the author is

ȳi = θi + ei,
θi = x̄i^t β + υi,

i.e.

ȳi = x̄i^t β + υi + ei,

with x̄i^t = [1, 0.1(r̄1i − 59)], i = 1, 2, . . . , 48, where υi is the area effect, ei is the sampling error,

υi ∼ iid N (0, συ2 )


ei ∼ iid N (0, σe2 ),

and υi and ei are independent.

Because the mean of ȳ for area i is assumed to be the sum of x̄i^t β and υi, the fixed and
random parts respectively, this model is also called a mixed model (see Chapter 3).

The model presented here is an area-level model in which the quantity of interest is the
mean, but the nature of the data may vary across contexts and problems. The model can
be defined in terms of primary sampling units or in area-level terms, as in this example.
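For this kind of area-level model with known variances, the best predictor of θi is a weighted average γi ȳi + (1 − γi) x̄i^t β̂ with γi = συ²/(συ² + σei²), which anticipates the EBLUP discussion of Chapter 3 applied to areas. A sketch, assuming numpy and treating συ² and the sampling variances as known (all values illustrative, not Fuller's data):

```python
import numpy as np

rng = np.random.default_rng(8)
m = 12
x = np.column_stack([np.ones(m), rng.standard_normal(m)])
beta = np.array([3.0, 1.0])
s2_v = 0.4                                    # area-effect variance (treated as known)
s2_e = 2.0 / np.linspace(2, 30, m)            # sampling variances, decreasing in n_i

theta = x @ beta + rng.normal(0.0, np.sqrt(s2_v), m)   # true area means
ybar = theta + rng.normal(0.0, np.sqrt(s2_e))          # direct estimates

# weighted least squares fit of beta under Var(ybar_i) = s2_v + s2_e[i]
w = 1.0 / (s2_v + s2_e)
beta_hat = np.linalg.solve(x.T @ (w[:, None] * x), x.T @ (w * ybar))

gamma = s2_v / (s2_v + s2_e)                  # weight on the direct estimate
theta_pred = gamma * ybar + (1.0 - gamma) * (x @ beta_hat)
# the larger s2_e[i] (smaller area sample), the smaller gamma[i]:
# more weight goes to the synthetic part x_i^t beta_hat
```

The monotone behavior of γi captures the "borrowing strength" idea: areas with large sampling variance lean more heavily on the regression prediction.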

This chapter covers the statistical framework specific to small area estimation. We present
the corresponding linear mixed model in Section 4.2. In Section 4.3 we discuss prediction
of small area quantities of interest, and we close the chapter with an example from the
literature.

4.2 Nested Error Regression Model


We consider a linear mixed model commonly called the nested-error regression model with
a single random effect, ui , for the ith small area, or small group:

yij = xtij β + ui + eij . (4.1)

Here yij is the response of unit j in area i, xij is the corresponding vector of auxiliary
variables, β is the vector of fixed parameters, ui is the random effect of area i and eij the
random individual error term. The area effects ui are assumed independent with zero mean
and variance σu2 . Similarly, the errors eij are independent with zero mean and variance
σe2 . In addition, the ui ’s and the eij ’s are assumed mutually independent. Under these
assumptions,

E(yij ) = xtij β,

and

Var(yij ) = σu2 + σe2 .



We also make the usual assumption that both ui and eij are normally distributed. Let Yi
denote the ni × 1 vector of the ni elements in area Ui, i.e.

Yi = (yi1, yi2, . . . , yini)^t.

Now the model in the matrix form, for the population Ui of area i is

Yi = Xi β + Zi ui + ei , (4.2)

where

Xi = (xi1, xi2, . . . , xini)^t ∈ R^{ni×p},

Zi = 1ni, the ni × 1 unit vector, ui is scalar, and

ei = (ei1, ei2, . . . , eini)^t ∈ R^{ni}.

With this notation E(Yi) = Xiβ and Var(Yi) = Vi = Jni σu² + Ini σe², where Jni = 1ni 1ni^t
is the ni × ni square matrix of ones.

Using the usual mixed model notation we can write Vi = Zi Gi Zit + Ri , where Gi = σu2 and
Ri = σe2 Ini . If we further stack the area vectors into one response vector
Y = (Y1t , Y2t , . . . , Ymt )t of length n = n1 + · · · + nm , we have the model in the form (3.3)
of the general linear mixed model:

Y = Xβ + Zu + e,

where X = (X1t , X2t , . . . , Xmt )t is the n × p matrix of stacked area design matrices,
Z = diag(1n1 , 1n2 , . . . , 1nm ) is the n × m block-diagonal design matrix of the random
effects, u = (u1 , u2 , . . . , um )t and e = (et1 , et2 , . . . , etm )t .

Units (or observations) coming from different areas i and i′ are uncorrelated, so the
covariance matrix V of the response vector Y is block-diagonal:

Var(Y ) = V = diag(V1 , V2 , . . . , Vm ) = ZGZ t + R,

where G = σu2 Im and R = σe2 In .
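The two equivalent forms of V can be checked numerically. In this sketch (illustrative sizes and variance components) we assemble the block-diagonal matrix from the Vi blocks and compare it with ZGZ t + R.

```python
import numpy as np

sigma_u2, sigma_e2 = 4.0, 9.0          # hypothetical variance components
n_sizes = [2, 3, 4]                    # area sample sizes n_1..n_m
n, m = sum(n_sizes), len(n_sizes)

# V_i = J_{n_i} sigma_u^2 + I_{n_i} sigma_e^2, placed on the diagonal of V.
V = np.zeros((n, n))
pos = 0
for k in n_sizes:
    V[pos:pos + k, pos:pos + k] = sigma_u2 * np.ones((k, k)) + sigma_e2 * np.eye(k)
    pos += k

# Equivalent mixed-model form: V = Z G Z^t + R, G = sigma_u^2 I_m, R = sigma_e^2 I_n.
Z = np.zeros((n, m))
pos = 0
for i, k in enumerate(n_sizes):
    Z[pos:pos + k, i] = 1.0            # Z stacks the vectors 1_{n_i}
    pos += k
V2 = Z @ (sigma_u2 * np.eye(m)) @ Z.T + sigma_e2 * np.eye(n)
print(np.allclose(V, V2))  # → True
```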

4.3 BLU and EBLU Predictors


In this section we use information at the unit (or individual) level to find the estimator
(or predictor) of the variable of interest. We assume that the target is an area total, and
we suppose that y = (y1 , y2 , . . . , yN )t is a realization of a random variable Y . Let a model
ξ characterize the probability distribution of Y . The focus is to estimate the value of a
population quantity of interest, which can be seen as a function h(y) of y, typically a
linear combination ct y. If c = 1, where 1 is a vector of ones, then h(y) is the population
total, and if c = 1/N , then h(y) is the population mean.

Now suppose that the interest is to estimate the population total of the variable of interest
for area i:

Yi = Σ_{j∈Ui} yij .

Following Royall (1976), let s denote a sample of size n from a finite population U and
let r denote the nonsampled remainder of U so that U = s ∪ r. Correspondingly, let ys
be the vector of observations in the sample and yr the rest of y. We have the following
decomposition:

y = (yst , yrt )t ,   c = (cts , ctr )t .

The population quantity to be estimated is now

h(y) = cts ys + ctr yr

a realization of the random variable

h(Y ) = cts Ys + ctr Yr .

Note that the first term cts ys is observed from the sample, whereas the second term must
be estimated (or predicted, in the frequentist terminology, since it is a function of random
variables, not a fixed parameter). Thus, estimating h(y), or predicting h(Y ), is essentially
predicting the value ctr yr of the unobserved random variable ctr Yr .

Let us consider ξ to be a linear mixed model such that

Eξ (Y ) = Xβ and Varξ (Y ) = V.

In accordance with the partition U = s ∪ r we can arrange X and V so that

X =
[ Xs ]
[ Xr ]

and

V =
[ Vs   Vsr ]
[ Vrs  Vr  ],

where X contains auxiliary variables, β is a vector of unknown parameters and V is an
arbitrary positive definite covariance matrix. The vectors y and e, as well as the matrices
X and Z, can be partitioned into sample parts ys , es , Xs and Zs (of n rows) and remainder
parts yr , er , Xr and Zr (of N − n rows). Then the nested error regression model, in
matrix form, becomes

[ ys ]   [ Xs ]       [ Zs ]      [ es ]
[ yr ] = [ Xr ] β  +  [ Zr ] u +  [ er ].

The corresponding partition of the covariance matrix of y is

    [ Vs   Vsr ]   [ Zs GZst   Zs GZrt ]   [ σe2 In    0         ]
V = [ Vrs  Vr  ] = [ Zr GZst   Zr GZrt ] + [ 0         σe2 IN −n ].

The following theorem from Royall (1976) gives the best linear unbiased predictor (BLUP)
of ct Y , as well as its error variance, under the general linear model in the finite population
case, and serves as the general basis of the BLUP approach to small area estimation with
unit level models. In this context, the best linear unbiased predictor is an unbiased
predictor which is linear in the observed data Ys and has minimum error variance among
all linear unbiased predictors.

The general prediction theorem.


Under the model ξ for a finite population U the best linear model-unbiased predictor of
h(Y ) = ct Y is

BLUP(ct Y ) = cts ys + ctr [Xr β̂ + Vrs Vs−1 (ys − Xs β̂)], (4.3)

and the error variance is

Varξ [BLUP(ct Y ) − ct Y ] = ctr (Vr − Vrs Vs−1 Vsr )cr
                          + ctr (Xr − Vrs Vs−1 Xs )(Xst Vs−1 Xs )−1 (Xr − Vrs Vs−1 Xs )t cr ,

where
β̂ = (Xst Vs−1 Xs )−1 Xst Vs−1 ys
is the generalized least squares (GLS) estimator of β.

Note that the GLS estimator is also the best linear unbiased estimator (BLUE) of β, that
is, it has the minimum variance among linear unbiased estimators (McCulloch and Searle,
2001).
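As a numeric sanity check of the theorem, the sketch below (a hypothetical finite population with an exchangeable covariance structure; all sizes and values are illustrative) applies formula (4.3) repeatedly to simulated populations and verifies that the average prediction error of the BLUP of the population total is near zero, i.e. model-unbiasedness.

```python
import numpy as np

rng = np.random.default_rng(1)
N, n = 30, 12                            # population and sample sizes (illustrative)
X = np.column_stack([np.ones(N), rng.normal(size=N)])
V = 0.5 * np.ones((N, N)) + np.eye(N)    # a simple positive definite covariance

s, r = np.arange(n), np.arange(n, N)     # sampled and remainder indices
Xs, Xr = X[s], X[r]
Vs, Vrs = V[np.ix_(s, s)], V[np.ix_(r, s)]
Vs_inv = np.linalg.inv(Vs)

c = np.ones(N)                           # c = 1 targets the population total
cs, cr = c[s], c[r]
beta = np.array([1.0, 2.0])              # hypothetical fixed effects

# Repeatedly draw populations from the model and predict c'Y via (4.3);
# the average prediction error should be near zero (model-unbiasedness).
errs = []
for _ in range(2000):
    y = X @ beta + rng.multivariate_normal(np.zeros(N), V)
    ys = y[s]
    beta_hat = np.linalg.solve(Xs.T @ Vs_inv @ Xs, Xs.T @ Vs_inv @ ys)  # GLS
    blup = cs @ ys + cr @ (Xr @ beta_hat + Vrs @ Vs_inv @ (ys - Xs @ beta_hat))
    errs.append(blup - c @ y)
print(np.mean(errs))                     # near 0 on average
```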

Proof:
As we are interested in predicting the part which is not observed, the information needed
will come from the observed variables (the sample vector ys ). Considering the decomposition
of the population quantity of interest

h(Y ) = cts Ys + ctr Yr ,

the random term is ctr Yr . Now suppose the predictor of this term is linear in the data,
i.e. the predictor of ctr Yr is of the form at Ys , where a is a vector to be specified. In
addition the predictor ĥ(Y ) is assumed to be unbiased. Now consider

Eξ (at Ys − ctr Yr )2 = Var(at Ys − ctr Yr ) + (Eξ (at Ys − ctr Yr ))2
                     = at Vs a + ctr Vr cr − 2at Vsr cr + [(at Xs − ctr Xr )β]2 , (4.4)

where the last expression is obtained from the expression of V and the fact that E(Y ) = Xβ.

Now the BLUP(ct Y ) is found by minimizing Eξ (at Ys − ctr Yr )2 with respect to a under
the constraint of model unbiasedness,

Eξ (at Ys − ctr Yr ) = (at Xs − ctr Xr )β = 0,

for all β, which is equivalent to

at Xs − ctr Xr = 0.

Let us consider the following Lagrangian function

L(a, λ) = at Vs a − 2at Vsr cr + 2(at Xs − ctr Xr )λ,

where λ is the vector of Lagrange multipliers. The first partial derivative with respect to
a is

∂L(a, λ)/∂a = 2Vs a − 2Vsr cr + 2Xs λ;

equating to zero yields

Xs λ = Vsr cr − Vs a, (4.5)

and then

a = Vs−1 (Vsr cr − Xs λ). (4.6)

Multiplying (4.5) by Xst Vs−1 yields Xst Vs−1 Xs λ = Xst Vs−1 Vsr cr − Xst a; using the
unbiasedness constraint at Xs = ctr Xr , so that Xst a = Xrt cr , and solving for λ, we get

λ = (Xst Vs−1 Xs )−1 (Xst Vs−1 Vsr − Xrt )cr ,



then after substituting this expression in (4.6) and simplifying, we get

a = Vs−1 [Vsr − Xs (Xst Vs−1 Xs )−1 (Xst Vs−1 Vsr − Xrt )]cr . (4.7)

Recall that ĥ(Y ) = cts Ys + at Ys . Substituting a with (4.7) yields

BLUP(ct Y ) = cts Ys + (Vs−1 [Vsr − Xs (Xst Vs−1 Xs )−1 (Xst Vs−1 Vsr − Xrt )]cr )t Ys
            = cts Ys + ctr [Vsr − Xs (Xst Vs−1 Xs )−1 (Xst Vs−1 Vsr − Xrt )]t Vs−1 Ys
            = cts Ys + ctr [Vsrt Vs−1 Ys − (Xst Vs−1 Vsr − Xrt )t (Xst Vs−1 Xs )−1 Xst Vs−1 Ys ]
            = cts Ys + ctr [Vsrt Vs−1 Ys − (Xst Vs−1 Vsr − Xrt )t β̂]
            = cts Ys + ctr [Xr β̂ + Vsrt Vs−1 (Ys − Xs β̂)],

where β̂ = (Xst Vs−1 Xs )−1 Xst Vs−1 Ys .

The expression for the error variance is found in the same way, by inserting (4.7) into (4.4). •

According to this decomposition, the area total Yi = cti y can be written as

Yi = Σ_{j∈Ui} yij = Σ_{j∈si} yij + Σ_{j∈ri} yij ,

where ci is an N × 1 vector of Ni ones and N − Ni zeros, such that the ones correspond to
those yij of y which belong to area i. The same partition applied to ci yields

ci = (ctis , ctir )t ,

where cis picks the units in the sample si from area i and cir picks those in the remainder
ri . Now the area total to be estimated is

Yi = ctis ys + ctir yr

and the general prediction theorem can be applied directly. This gives

Ỹi,BLUP = ctis ys + ctir [Xr β̂ + Vrs Vs−1 (ys − Xs β̂)],

where the GLS estimate


β̂ = (Xst Vs−1 Xs )−1 Xst Vs−1 ys
is calculated over the whole sample data s. The covariance matrix V is assumed known here.

Considering the covariance matrix decomposition where Vrs = Zr GZst , we have

Ỹi,BLUP = ctis ys + ctir Xr β̂ + ctir Zr [GZst Vs−1 (ys − Xs β̂)] = ctis ys + lti β̂ + mti ũ,

where lti = ctir Xr , mti = ctir Zr and ũ is the BLUP of u according to (3.20). Furthermore,
lti β̂ + mti ũ is the BLUP of the linear combination lti β + mti u (Rao, 2003).

Note that ctis ys = Σ_{j∈si} yij , lti = ctir Xr = Σ_{j∈ri} xtij , and mti = ctir Zr is a row
vector with Ni − ni in entry i and zeros elsewhere. The BLUP is now

Ỹi,BLUP = Σ_{j∈si} yij + (Σ_{j∈ri} xtij )β̂ + (Ni − ni )ũi .

As mentioned in Chapter 3, in practical situations the components of the covariance
matrix V need to be estimated, since they are not known; one commonly used approach is
the REML method. Replacing V by its estimator in the GLS expression, we get the
REML-based estimator of β, and the EBLUP of Yi under REML estimation is

Ŷi,EBLUP = Σ_{j∈si} yij + (Σ_{j∈ri} xtij )β̂REML + (Ni − ni )ûi ,

where ûi is the EBLUP of ui obtained from (3.22).
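A compact numerical sketch of this predictor follows (hypothetical populations; all sizes and parameter values are illustrative). For simplicity the variance components are treated as known, which gives the BLUP; plugging in REML estimates of σu2 and σe2 in their place would give the EBLUP.

```python
import numpy as np

rng = np.random.default_rng(2)
m = 6
N_pop = np.array([40, 50, 30, 60, 45, 35])   # area population sizes (illustrative)
n_smp = np.array([5, 8, 4, 10, 6, 5])        # area sample sizes
beta = np.array([10.0, 2.0])                 # hypothetical fixed effects
s_u2, s_e2 = 4.0, 9.0                        # variance components, treated as known

# Generate finite populations; the first n_smp[i] units form the sample.
pops = []
for i in range(m):
    X = np.column_stack([np.ones(N_pop[i]), rng.normal(size=N_pop[i])])
    u_i = rng.normal(0.0, np.sqrt(s_u2))
    y = X @ beta + u_i + rng.normal(0.0, np.sqrt(s_e2), size=N_pop[i])
    pops.append((X, y))

# GLS estimate of beta from the pooled sample, using the block structure of V.
XtVX, XtVy = np.zeros((2, 2)), np.zeros(2)
for i, (X, y) in enumerate(pops):
    k = n_smp[i]
    Vi_inv = np.linalg.inv(s_u2 * np.ones((k, k)) + s_e2 * np.eye(k))
    XtVX += X[:k].T @ Vi_inv @ X[:k]
    XtVy += X[:k].T @ Vi_inv @ y[:k]
beta_hat = np.linalg.solve(XtVX, XtVy)

gamma = s_u2 / (s_u2 + s_e2 / n_smp)         # per-area shrinkage factor
pred_tot, true_tot = [], []
for i, (X, y) in enumerate(pops):
    k = n_smp[i]
    u_hat = gamma[i] * (y[:k].mean() - X[:k].mean(axis=0) @ beta_hat)  # BLUP of u_i
    pred_tot.append(y[:k].sum() + X[k:].sum(axis=0) @ beta_hat + (N_pop[i] - k) * u_hat)
    true_tot.append(y.sum())
print(np.round(pred_tot, 1))   # predicted area totals
print(np.round(true_tot, 1))   # realized area totals
```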

4.4 Illustration Using Data from Battese et al. (1988)


In this section we use the data provided in Battese, Harter and Fuller (1988). They
considered data for 12 Iowa counties, obtained from the 1978 June Enumerative Survey of
the U.S. Department of Agriculture, together with data obtained from land observatory
satellites (LANDSAT) during the 1978 growing season. They used an error components
model for the prediction of county crop areas.

Specifically, Battese et al. (1988) specified a linear regression model for the relationship
between the reported hectares of corn and soybeans within sample segments in the
Enumerative Survey and the corresponding auxiliary satellite data. A nested error model,
which defines a correlation structure among reported crop hectares within the counties,
was used. The mean hectares of the crop per segment in a county was then defined as the
conditional mean of reported hectares, given the satellite determinations (auxiliary data)
and the realized (random) county effect. The Battese et al. (1988) data, showing the
June Enumerative Survey and the satellite data, are presented in Table 4.1. The data
show, specifically:

(i) The number of segments in each county,



(ii) The number of hectares of corn and soybeans for each sample segment (as reported
in the June Enumerative Survey),

(iii) The number of pixels classified as corn and soybeans for each sample segment.

4.4.1 Fitting the Model


In this section we fit the model proposed by Battese et al. (1988). One difference with
respect to their approach is that, in this thesis, we employ the restricted maximum
likelihood (REML) method for estimating the model parameters, as reviewed in Chapter 3,
whereas Battese et al. (1988) used a modified version of SUPER CARP. Furthermore, we
calculate normal-theory confidence intervals and bootstrap intervals for the fixed parameters.

According to Battese et al. (1988), the model construction is such that the reported crop
hectares for corn (or soybeans) in sample segments within counties are expressed as a
function of the satellite data for those sample segments. The unit level model is given by

yij = β0 + β1 x1ij + β2 x2ij + υi + eij , (4.8)

where i is the subscript for county (i = 1, 2, . . . , m = 12), j is the subscript for a segment
within a given county (j = 1, 2, . . . , ni , where ni is the number of sample segments in the
ith county), yij is the number of hectares of corn (or soybeans) in the j th segment of the
ith county, as reported in the June Enumerative Survey, x1ij and x2ij are the number of
pixels classified as corn and soybeans, respectively, in the j th segment of the ith county,
β0 , β1 , and β2 are unknown parameters, υi is the ith county effect and eij is the random
error associated with the j th segment within the ith county. In addition, the random effects
υi (i = 1, 2, . . . , m = 12) are assumed to be i.i.d. N (0, σv2 ) random variables, independent
of the random errors eij (j = 1, 2, . . . , ni ; i = 1, 2, . . . , m), which are assumed to be i.i.d.
N (0, σe2 ) random variables.

Next, the sample mean of the hectares of corn (or soybeans) per segment in the ith county
is ȳi. = (1/ni )Σ_{j=1}^{ni} yij , while the sample mean numbers of pixels of corn and
soybeans are x̄1i. = (1/ni )Σ_{j=1}^{ni} x1ij and x̄2i. = (1/ni )Σ_{j=1}^{ni} x2ij ,
respectively. Thus the model for this mean ȳi. is

ȳi. = β0 + β1 x̄1i. + β2 x̄2i. + υi + ēi. ,

where ēi. = (1/ni )Σ_{j=1}^{ni} eij .

The population mean hectares of corn (or soybeans) per segment in the ith county is
defined as the conditional mean of the hectares of corn (or soybeans) per segment, given

Table 4.1: Survey and Satellite Data for corn and soybeans in 12 Iowa Counties.
Columns: county; number of segments (sample, county); reported hectares (corn, soybeans);
number of pixels in sample segments (corn, soybeans); mean number of pixels per segment (corn, soybeans).
County Sample County Corn Soybeans Corn Soybeans Corn Soybeans
Cerro Gordo 1 545 165.76 8.09 374 55 295.29 189.70
Hamilton 1 566 96.32 106.03 209 218 300.40 196.65
Worth 1 394 76.08 103.60 253 250 289.60 205.28
Humboldt 2 424 185.35 6.47 432 96 290.74 220.22
116.43 63.82 367 178
Franklin 3 564 162.08 43.50 361 137 318.21 188.06
152.04 71.43 288 206
161.75 42.49 369 165
Pocahontas 3 570 92.88 105.26 206 218 257.17 247.13
149.94 76.49 316 221
64.75 174.34 145 338
Winnebago 3 402 127.07 95.67 355 128 291.77 185.37
133.55 76.57 295 147
77.70 93.48 223 204
Wright 3 567 206.39 37.84 459 77 301.26 221.36
108.33 131.12 290 217
118.17 124.44 307 258
Webster 4 687 99.96 144.15 252 303 262.17 247.09
140.43 103.60 293 221
98.95 88.59 206 222
131.04 115.58 302 274
Hancock 5 569 114.12 99.15 313 190 314.28 196.66
100.60 124.56 246 270
127.88 110.88 353 172
116.90 109.14 271 228
87.41 143.66 237 297
Kossuth 5 965 93.48 91.05 221 167 298.65 204.61
121 132.33 369 191
109.91 143.14 343 249
122.66 104.13 342 182
104.21 118.57 294 179
Hardin 5 556 88.59 102.59 220 262 325.99 177.05
165.35 69.28 355 160
104 99.15 261 221
88.63 143.66 187 345
153.70 94.49 350 190

the realized county effect υi and the values of the satellite data. Under the assumptions of
model (4.8), Battese et al. (1988) presented this mean as

ȳi(p) = β0 + β1 x̄1i(p) + β2 x̄2i(p) + υi ,

where x̄1i(p) = (1/Ni )Σ_{j=1}^{Ni} x1ij and x̄2i(p) = (1/Ni )Σ_{j=1}^{Ni} x2ij are the
population mean numbers of pixels classified as corn and soybeans per segment, respectively,
in the ith county, and Ni is the total number of segments in the ith county. The population
mean pixel values x̄1i(p) and x̄2i(p) are known, since the numbers of pixels of corn and
soybeans are available from the satellite classifications for all segments in the ith county
(Battese et al., 1988).

Let Yi represent the column vector of the reported hectares of the given crop for the ni
sample segments in the ith county, Yi = (yi1 , yi2 , . . . , yini )t , and let Y be the column
vector of the reported hectares of the crop for the sample segments in the m counties,
Y = (Y1t , Y2t , . . . , Ymt )t . Then model (4.8), expressed in matrix notation, in marginal
form, is

Y = Xβ + υ ∗ , (4.9)

where the row of X that corresponds to the element yij in Y is xtij = (1, x1ij , x2ij ) and
β = (β0 , β1 , β2 )t . The random vector υ ∗ in (4.9) combines random effects and random
errors, i.e. υij∗ = υi + eij . The covariance matrix for the random vector υ ∗ in (4.9) is given by

E(υ ∗ υ ∗t ) = V = block diag(V1 , V2 , . . . , Vm ), (4.10)

where
Vi = Ji συ2 + Ii σe2
with Ji the square matrix of order ni with every element equal to 1 and Ii the identity
matrix of order ni (Battese et al., 1988).

In matrix notation, the mean crop hectares per segment in the ith area are expressed as

ȳi(p) = x̄i(p) β + υi , (4.11)

where x̄i(p) = (1/Ni )Σ_{j=1}^{Ni} xtij = (1, x̄1i(p) , x̄2i(p) ).

4.4.2 Estimation and Prediction


The main focus is to predict the mean crop area (4.11) for the ith county. If the realized
errors υij∗ were known, the best predictor of υi would be the conditional expectation of
υi given the sample mean ῡi.∗ , where ῡi.∗ = (1/ni )Σ_{j=1}^{ni} υij∗ (Henderson, 1975).
This conditional expectation, which is the best predictor, has the expression

E(υi |ῡi.∗ ) = ῡi.∗ gi , (4.12)

where gi = συ2 /mi and mi = συ2 + σe2 /ni . These results follow from the fact that υi and
ῡi.∗ have a joint bivariate normal distribution.
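The shrinkage factor gi = συ2 /(συ2 + σe2 /ni ) in (4.12) moves toward 1 as ni grows, so areas with more sample segments lean more on their own data. A small sketch using the corn variance component estimates reported later in Table 4.2:

```python
import numpy as np

# Variance component estimates for corn from Table 4.2.
sigma_v2, sigma_e2 = 140.02, 147.23
n = np.arange(1, 6)                        # area sample sizes n_i = 1..5, as in the data
g = sigma_v2 / (sigma_v2 + sigma_e2 / n)   # g_i = sigma_v^2 / (sigma_v^2 + sigma_e^2 / n_i)
print(np.round(g, 3))                      # increases with n_i, bounded by 1
```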

Considering the variance components συ2 and σe2 to be known, the β parameters of the
model can be estimated by the generalized least squares estimator

β̃ = (X t V −1 X)−1 X t V −1 Y, (4.13)

which is equivalent to the ML estimator under the normal distribution of Y . Then, following
(4.12), a possible predictor of the ith county effect υi is

υ̃i = υ̃i.∗ gi , (4.14)

where υ̃i.∗ = (1/ni )Σ_{j=1}^{ni} υ̃ij∗ and υ̃ij∗ = yij − xtij β̃. The corresponding predictor
ỹi of the county mean crop area per segment (4.11) is

ỹi = x̄i(p) β̃ + υ̃i , (4.15)

which, according to Harville (1985), is the best linear unbiased predictor of ȳi(p) .

Following Section 4.3, the predictor of the finite population mean crop hectares per segment
in the ith county is given by

Ni−1 [ Σ_{j=1}^{ni} yij + Σ_{j=ni +1}^{Ni} (xtij β̃ + υ̃i ) ],

where the unobserved yij are replaced by the model predictions. It approaches the predictor
(4.15) as the sampling rate decreases. Because the sampling rates are small in this
application, this expression is approximated by the predictor (4.15), which is one of several
predictors that have been suggested for the small area problem (Fay and Herriot, 1979).

Hence combining (4.14) and (4.15), a feasible predictor for the mean crop area (4.11) in
county i is

ŷi = x̄i(p) β̂ + (ȳi. − x̄i. β̂)ĝi , (4.16)



where ĝi is the approximately unbiased estimator of gi suggested by Fuller and
Harter (1987), defined by

ĝi = 1 − ĥi ,
ĥi = [m̂i + k̂i + (n−1i − c)2 ŵi ]−1 [n−1i σ̂e2 + (n−1i − c)n−1i ŵi ],
m̂i = m̂.. + (n−1i − c)σ̂e2 ,
ŵi = 2d−1e m̂i σ̂e4 ,
k̂i = 2σ̂e2 [ Σ_{i=1}^{m} ni bi (σ̈f f + n−1i )−1 ]−2 [ Σ_{i=1}^{m} n2i bi (σ̈f f + n−1i )−2 ],
σ̈f f = max[0, (m − 5)−1 (m − 3)σ̂e−2 m̂.. − c],
bi = 1 − 2ni x̄i. (X t X)−1 x̄ti. + x̄i. (X t X)−1 ( Σ_{i=1}^{m} n2i x̄ti. x̄i. )(X t X)−1 x̄ti. ,
di = n−1i [1 − ni x̄i. (X t X)−1 x̄ti. ],
m̂.. = ( Σ_{i=1}^{m} ni bi )−1 ( Σ_{i=1}^{m} ni u2i ),
ui = ȳi. − x̄i. (X t X)−1 X t Y,
c = ( Σ_{i=1}^{m} ni bi )−1 ( Σ_{i=1}^{m} ni di ),
de = Σ_{i=1}^{m} (ni − 1) − 2,

where de = 22 and m = 12 is the number of counties in this application.

Under the assumption that the random effects and the random errors are normally
distributed, it follows from (4.9) and (4.10) that Y is normally distributed with mean Xβ
and variance V .

Now, considering the models for corn and soybeans separately, we first estimate the
variance components using the restricted maximum likelihood approach presented
previously, and then, after applying (3.13), we find the restricted maximum likelihood
estimators of the β parameters. The estimators are given in Tables 4.2 and 4.3, respectively.
The data used in this example were taken from Battese et al. (1988) and are presented in
Table 4.1.

Table 4.2: The estimated parameters for corn.


Parameters Estimates
σv2 140.02
σe2 147.23
β0 51
β1 0.329
β2 −0.134

Table 4.3: The estimated parameters for soybeans.


Parameters Estimates
σv2 247.53
σe2 190.45
β0 −16
β1 0.028
β2 0.494

Note that the variance component estimates σ̂u2 and σ̂e2 do not have explicit expressions;
they were obtained by applying (3.12).

The confidence intervals for the fixed effects are found using (3.14). To find the bootstrap
intervals for corn and soybeans, we follow the resampling procedure of Section 3.5.2 with
B = 1000 bootstrap replications, and then apply (3.16). The results are presented in
Tables 4.4 and 4.5 for the normal confidence intervals, and in Tables 4.6 and 4.7 for the
bootstrap confidence intervals, for corn and soybeans respectively.

The predictor for the mean crop area (4.11) in county i is calculated using (4.16) and the
results are presented in Tables 4.8 and 4.9 for hectares of corn and soybeans respectively.

Table 4.4: Normal Confidence Interval for corn.


Parameters Estimates Lower bound Upper bound
β0 β̂0 = 51 34.536 67.464
β1 β̂1 = 0.329 0.295 0.363
β2 β̂2 = −0.134 −0.171 −0.0968

Table 4.5: Normal Confidence Interval for soybeans.


Parameters Estimates Lower bound Upper bound
β0 β̂0 = −16 −35.04 3.04
β1 β̂1 = 0.028 −0.011 0.067
β2 β̂2 = 0.494 0.4509 0.537

Table 4.6: Bootstrap Confidence Interval for corn.


Parameters Lower bound Upper bound
β0 36.767 65.709
β1 0.2999 0.3575
β2 −0.1667 −0.1024

This example shows the potential of small area estimation methods, which are designed
to combine two types of information: on one hand, (expensive) reliable survey information
on crop land use, available for only a few sample segments; on the other hand, (cheap)
satellite auxiliary information on all the segments of the counties under study. By
“borrowing strength” from the auxiliary information on the other segments, we have been
able to find the predicted mean hectares of corn (or soybeans) in those 12 Iowa counties.

Table 4.7: Bootstrap Confidence Interval for soybeans.


Parameters Lower bound Upper bound
β0 −33.916 2.736
β1 −0.0104 0.065
β2 0.4514 0.534

Table 4.8: Predicted Mean Hectares of corn in 12 Iowa Counties.


county sample segments Predicted hectares
Cerro Gordo 1 122.6
Hamilton 1 123.2
Worth 1 119.1
Humboldt 2 117.2
Franklin 3 129.9
Pocahontas 3 102.0
Winnebago 3 122.2
Wright 3 120.2
Webster 4 103.6
Hancock 5 127.7
Kossuth 5 122.0
Hardin 5 134.4

Table 4.9: Predicted Mean Hectares of soybeans in 12 Iowa Counties.


county sample segments Predicted hectares
Cerro Gordo 1 83.1
Hamilton 1 91.3
Worth 1 91.2
Humboldt 2 95.1
Franklin 3 80.6
Pocahontas 3 113.4
Winnebago 3 87.4
Wright 3 104.5
Webster 4 112.5
Hancock 5 93.3
Kossuth 5 99.6
Hardin 5 79.3
Chapter 5

Generalized Linear Mixed Models (GLMMs)

5.1 Introduction

The linear mixed models reviewed in Chapter 3 are generally used in situations where the
observations are continuous, and they assume that the relationship between the mean of
the dependent variable and the covariates can be modeled as a linear function. In some
cases, however, the observations are correlated as well as discrete or categorical. The
assumptions of a linear relationship and independence are then questionable due to the
nature of the dependent variable, and linear mixed models do not apply. Extensions of
linear mixed models are needed to handle cases in which the observations are both
correlated and discrete or categorical. Such extensions are covered by the generalized
linear mixed models (GLMMs) (Jiang, 2007).

The generalized linear mixed model is the most frequently used random effects model in
the context of discrete repeated measurements. Generalized linear mixed models are
extensions of generalized linear models that allow for additional components of variability
due to unobservable effects. Typically, the unobserved effects are modeled by the inclusion
of random effects in the linear predictor of the generalized linear model (McCulloch and
Searle, 2001).

The structure of the chapter is as follows: in Section 5.2 we present the generalized linear
mixed model with responses distributed according to a member of the exponential family.
Parameter estimation is discussed in Section 5.3, where we present a simulation-based
version of the Monte Carlo EM algorithm.


5.2 Model Formulation


As in Chapter 3, two considerations define a generalized linear mixed model: (i) conditional
independence (given the random effects) and a conditional distribution for the random
variable of interest, and (ii) the distribution of the random effects. Let yij be the j th
outcome measured for cluster (subject) i, i = 1, 2, . . . , m, j = 1, 2, . . . , ni , and let yi be the
vector of all measurements available for cluster i. It is assumed that, conditionally on the
random effects ui , drawn independently from N (0, D), the outcomes yij are independent
with densities from the exponential family of the form

yij |ui ∼ indep. fYij |ui (yij |ui ),

fYij |ui (yij |ui ) = exp{[yij θij − b(θij )]/φ − c(yij , φ)}. (5.1)

Note that the relationship between the mean of the random variable of interest and the
covariates need not be linear; instead, a transformation of the mean is modeled linearly
in both the fixed and the random effects:

E[yij |ui ] = µij ,

ηij = g(µij ) = xtij β + ztij ui , (5.2)

where ηij is the linear predictor and g(·) is a known function, called the link function
(since it ties together the conditional mean of yij and the linear form of the predictors),
with xij and zij vectors of known covariate values, β a vector of unknown fixed regression
coefficients, and φ a scale parameter. To this specification we have added the random
effects vector ui . Note that we are using µij here to denote the conditional mean of yij
given ui , not the unconditional mean. To complete the specification we assign a distribution
to the random effects:
distribution to the random effects:

ui ∼ fui (ui |D). (5.3)
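A simulation sketch of this specification for a Bernoulli response with a logit link and a random intercept (all sizes and parameter values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)
m, k = 100, 20                           # clusters and observations per cluster
beta = np.array([-0.5, 1.0])             # hypothetical fixed effects
D = 0.8                                  # random-intercept variance

cluster = np.repeat(np.arange(m), k)
x = np.column_stack([np.ones(m * k), rng.normal(size=m * k)])
u = rng.normal(0.0, np.sqrt(D), size=m)  # u_i ~ N(0, D)

eta = x @ beta + u[cluster]              # linear predictor eta_ij = x_ij' beta + z_ij' u_i
mu = 1.0 / (1.0 + np.exp(-eta))          # logit link: mu_ij = g^{-1}(eta_ij)
y = rng.binomial(1, mu)                  # y_ij | u_i ~ Bernoulli(mu_ij)
print(y.mean())
```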

5.3 Parameter Estimation


The common approach to estimating model parameters is based on the likelihood function.
Unlike in linear mixed models, the likelihood under a GLMM does not, in general, have a
closed-form expression, because it may involve high-dimensional integrals that cannot be
evaluated analytically (Jiang, 2007). For this reason, likelihood-based inference in GLMMs
is still challenging. To illustrate these difficulties, let us consider the following example
taken from Jiang (2007).

Let the random variable of interest conditional on the random effects have a Bernoulli
distribution with the probability of success pij . This means that, given the random effects
u1 , u2 , . . . , um1 and v1 , v2 , . . . , vm2 , binary responses yij are conditionally independent with

pij = P (yij = 1|u, v)


logit(pij ) = µ + ui + vj ,

where µ is an unknown parameter, u = (ui )1≤i≤m1 , and v = (vj )1≤j≤m2 . In addition,


random effects are independent and ui ∼ N (0, σ12 ), vj ∼ N (0, σ22 ), where the variances σ12
and σ22 are unknown. Hence the unknown parameters of the model that need to be
estimated are θ = (µ, σ12 , σ22 )t . We have:
f (y|u, v) = Π_{i=1}^{m1} Π_{j=1}^{m2} f (yij |ui , vj )
           = Π_{i=1}^{m1} Π_{j=1}^{m2} exp{yij (µ + ui + vj )} / [1 + exp(µ + ui + vj )],

f (u) = (2πσ12 )−m1 /2 exp{ −(1/2σ12 ) Σ_{i=1}^{m1} u2i },

f (v) = (2πσ22 )−m2 /2 exp{ −(1/2σ22 ) Σ_{j=1}^{m2} vj2 }.

The (marginal) likelihood function under this model is of the form

L(θ|y) = ∫ · · · ∫ f (y|u, v)f (u)f (v) du1 . . . dum1 dv1 . . . dvm2 .

After some manipulation, the corresponding log-likelihood has the expression

l(θ|y) = −[(m1 + m2 )/2] log(2π) − (m1 /2) log(σ12 ) − (m2 /2) log(σ22 ) + µy..
       + log ∫ · · · ∫ [ Π_{i=1}^{m1} Π_{j=1}^{m2} {1 + exp(µ + ui + vj )}−1 ] ×
         exp{ Σ_{i=1}^{m1} ui yi. + Σ_{j=1}^{m2} vj y.j − (1/2σ12 ) Σ_{i=1}^{m1} u2i
              − (1/2σ22 ) Σ_{j=1}^{m2} vj2 } du1 . . . dum1 dv1 . . . dvm2 ,

where y.. = Σ_{i=1}^{m1} Σ_{j=1}^{m2} yij , yi. = Σ_{j=1}^{m2} yij and y.j = Σ_{i=1}^{m1} yij .

The integral above has no explicit expression and cannot be simplified further.
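For a single cluster the corresponding integral is one-dimensional, and a crude Monte Carlo average over draws of u already approximates it. This sketch uses a simple random-intercept logit with hypothetical values, not the crossed model above, and compares the Monte Carlo estimate with a deterministic quadrature reference.

```python
import numpy as np

rng = np.random.default_rng(5)
mu, sigma = 0.3, 1.0                     # hypothetical parameter values
y = np.array([1, 0, 1, 1, 0])            # one cluster's binary responses

# Cluster likelihood L = E_u[ prod_j p_j^{y_j} (1 - p_j)^{1 - y_j} ] with
# u ~ N(0, sigma^2) and logit(p_j) = mu + u: a 1-D version of the integral.
u = rng.normal(0.0, sigma, size=200_000)
p = 1.0 / (1.0 + np.exp(-(mu + u[:, None])))
L_mc = np.prod(p**y * (1 - p)**(1 - y), axis=1).mean()   # Monte Carlo estimate

# Deterministic check: trapezoidal quadrature on a fine grid.
grid = np.linspace(-8.0, 8.0, 4001)
pg = 1.0 / (1.0 + np.exp(-(mu + grid[:, None])))
integrand = np.prod(pg**y * (1 - pg)**(1 - y), axis=1) \
    * np.exp(-grid**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
L_quad = np.sum((integrand[1:] + integrand[:-1]) / 2 * np.diff(grid))
print(L_mc, L_quad)                      # the two estimates should agree closely
```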

This example shows the limitations of likelihood-based inference in GLMMs. To overcome,
or at least sidestep, these computational difficulties, an alternative approach is the Monte
Carlo Expectation-Maximization (Monte Carlo EM) algorithm (McCulloch, 1997), which
we present below. Before discussing the Monte Carlo EM approach, we give a brief
overview of the EM algorithm.

The key idea of the EM algorithm is to work with the complete data, a combination of the
observed data and unobserved random variables (here, the random effects). The EM
algorithm is an iterative routine requiring two primary calculations in each iteration:
computation of a particular conditional expectation of the log-likelihood (E-step) and
maximization of this expectation over the relevant parameters (M-step).

Let y = (y1 , y2 , . . . , ym )t be the observed data vector and, conditional on the random
effects u, assume that the elements of y are independent and drawn from a distribution
that belongs to the exponential family. Furthermore, assume a distribution for u depending
on parameters that are elements of D.

Consider a second-level model:

fyi |u (yi |u) = exp{[yi θi − b(θi )]/τ − c(yi , τ )},
u ∼ fu (u|D). (5.4)

Here ηi = xti β + zit u, where xi and zi are known vectors of covariates. The marginal
likelihood corresponding to (5.4) is given by

L(β, τ, D|y) = ∫ Π_{i=1}^{m} fyi |u (yi |u) fu (u|D) du, (5.5)

which usually cannot be evaluated in closed form: as mentioned above, the integral has no
explicit expression, and its dimension equals the number of levels of the random factor u.
Such integrals may be evaluated numerically. The goal is to develop an algorithm to
calculate fully parametric maximum likelihood (ML) estimates based on the likelihood (5.5).

The EM algorithm thus works on the augmented log-likelihood to obtain the ML estimates
of the parameters over the likelihood (5.5). Now consider the class of models (5.4), where
to set up the EM algorithm, we consider the random effects, u, to be the missing data.

The complete data, w, is then w = (y, u), and the complete-data log-likelihood is given by

ln Lw = Σi ln fyi |u (yi |u, β, τ ) + ln fu (u|D). (5.6)

The EM algorithm then takes the following form:

1. Choose starting values β (0) , τ (0) and D(0) . Set r = 0.

2. Calculate (with expectations evaluated under β (r) , τ (r) and D(r) )

(a) β (r+1) and τ (r+1) , which maximizes E[ln fy|u (y|u, β, τ )|y].
(b) D(r+1) which maximizes E[ln fu (u|D)|y].
(c) r = r + 1.

3. If convergence is achieved, declare β (r+1) , τ (r+1) and D(r+1) to be the maximum
likelihood estimates (MLEs); otherwise, return to step 2.
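The scheme above can be illustrated on a model where both steps are available in closed form: a one-way random effects normal model yij = µ + ui + eij with σe2 known, used here as a simplified stand-in for the GLMM case (starting values and parameter values are illustrative).

```python
import numpy as np

rng = np.random.default_rng(6)
m, k = 300, 5                               # clusters and observations per cluster
mu_true, s_u2_true, s_e2 = 2.0, 1.5, 1.0    # sigma_e^2 treated as known
u = rng.normal(0.0, np.sqrt(s_u2_true), size=m)
y = mu_true + u[:, None] + rng.normal(0.0, np.sqrt(s_e2), size=(m, k))
ybar = y.mean(axis=1)

mu, s_u2 = 0.0, 1.0                         # step 1: starting values
for _ in range(200):
    # E-step: u_i | y is normal with mean g*(ybar_i - mu) and variance v,
    # so the required conditional moments are available in closed form.
    g = s_u2 / (s_u2 + s_e2 / k)
    v = g * s_e2 / k
    Eu = g * (ybar - mu)
    Eu2 = Eu**2 + v
    # M-step: maximize the expected complete-data log-likelihood.
    mu = (y - Eu[:, None]).mean()
    s_u2 = Eu2.mean()
print(round(mu, 2), round(s_u2, 2))         # should be near the true values
```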

According to McCulloch (1997), in situations where the E-step is analytically troublesome,
we may approximate it by Monte Carlo simulation; note that this E-step is an expectation
over the latent variable u. Below we present the implementation of the Monte Carlo EM
algorithm (Wei and Tanner, 1990), in which simulation methods are used to evaluate the
intractable integral in the E-step. This method uses simulated random samples from the
exact conditional distribution of the random effects vector u given the data y, obtained
via Metropolis-Hastings sampling, using the marginal distribution of u as the candidate.

5.3.1 Monte Carlo EM Algorithm


The Monte Carlo EM (MCEM) algorithm, introduced by Wei and Tanner (1990), is a
modification of the EM algorithm in which the expectation in the E-step is computed
numerically through Monte Carlo simulation. The Monte Carlo estimate presents a
tractable solution to problems where the E-step is not available in closed form
(McCulloch, 1997; Jiang, 2007).

Following McCulloch (1997), let y = (y1 , y2 , . . . , ym )t denote the observed data, with
distribution f (y|ϕ) characterized by the s-vector of parameters ϕ = (β, τ, D). Consider
u = (u1 , u2 , . . . , um )t to be the set of latent variables.
The EM algorithm thus works on the augmented log-likelihood ln f (y, u|ϕ) to obtain the
ML estimates of ϕ over the distribution f (y|ϕ) where it is assumed that
Z
f (y|ϕ) = f (y, u|ϕ)du.
52 CHAPTER 5. GENERALIZED LINEAR MIXED MODELS (GLMMS)

More specifically, the EM algorithm iterates between calculation of the expected
complete-data log-likelihood

Q(ϕ|ϕ̂(r) ) = Eϕ̂(r) (ln f (y, u|ϕ)|y) (5.7)

and the maximization of Q(ϕ|ϕ̂(r) ) over ϕ, where the maximizing value of ϕ is denoted by
ϕ̂(r+1) and ϕ̂(r) denotes the maximizer at the rth iteration.
In particular,

$$E_{\hat{\phi}^{(r)}}\left(\ln f(y, u \mid \phi) \mid y\right) = \int \ln f(y, u \mid \phi)\, f(u \mid y, \hat{\phi}^{(r)})\, du,$$

where f(u|y, ϕ̂^(r)) is the conditional distribution of the latent variables given the observed data and ϕ̂^(r). In cases where the E-step involves an integration that is difficult to evaluate analytically, we may estimate the quantity (5.7) by Monte Carlo simulation.

For the model (5.4), the conditional distribution of u|y cannot be found in closed form, so the expectations in steps 2a and 2b cannot be computed exactly. In addition, the marginal density f_y, an integral of the type (5.5), is not straightforward to calculate because of the dimension of the vector u. It is, however, possible to generate random observations from the conditional distribution of u|y by using a Metropolis-Hastings algorithm that does not require specifying f_y, and hence to approximate the expectation steps by Monte Carlo (McCulloch, 1997).

If we obtain a sample u_1^(r), …, u_N^(r) from the distribution f(u|y, ϕ̂^(r)), this expectation may be estimated by the Monte Carlo sum

$$Q(\phi \mid \hat{\phi}^{(r)}) \approx \frac{1}{N}\sum_{k=1}^{N} \ln f(y, u_k^{(r)} \mid \phi). \qquad (5.8)$$

To establish the Metropolis-Hastings algorithm, we first specify the candidate distribution, h_u(u), from which new observations are generated, and then the acceptance function giving the probability of accepting the new value. Letting u denote the previous draw from the conditional distribution of u|y, we generate a new value u_k^* for the k-th component of u from the candidate distribution, so that u^* = (u_1, …, u_{k−1}, u_k^*, u_{k+1}, …, u_m). We accept u^* as the new value with probability α; otherwise, we retain u. Here α is given by

$$\alpha = \min\left\{1,\; \frac{f_{u|y}(u^* \mid y, \beta, \tau, D)\, h_u(u)}{f_{u|y}(u \mid y, \beta, \tau, D)\, h_u(u^*)}\right\}. \qquad (5.9)$$
5.3. PARAMETER ESTIMATION 53

If we choose h_u = f_u, the marginal density of u, the Metropolis-Hastings algorithm reduces to what we will call the Metropolis algorithm, and the ratio in (5.9) simplifies: the factors f_u(u^*|D) and f_u(u|D) cancel, giving

$$\frac{f_{u|y}(u^* \mid y, \beta, \tau, D)\, h_u(u)}{f_{u|y}(u \mid y, \beta, \tau, D)\, h_u(u^*)}
= \frac{\prod_{i=1}^{m} f_{y_i|u}(y_i \mid u^*, \beta, \tau)\, f_u(u^* \mid D)\, f_u(u \mid D)}{\prod_{i=1}^{m} f_{y_i|u}(y_i \mid u, \beta, \tau)\, f_u(u \mid D)\, f_u(u^* \mid D)}
= \frac{\prod_{i=1}^{m} f_{y_i|u}(y_i \mid u^*, \beta, \tau)}{\prod_{i=1}^{m} f_{y_i|u}(y_i \mid u, \beta, \tau)}, \qquad (5.10)$$

so the acceptance probability depends only on a conditional likelihood ratio.
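A variable-at-a-time sweep with candidate h_u = f_u can be sketched as follows for a logit-normal model (a hypothetical Python sketch; the function names and simulated data are our own assumptions). By (5.10) the acceptance probability reduces to a conditional likelihood ratio, and since the groups are conditionally independent, only the terms for the updated component enter:

```python
import numpy as np

def bernoulli_loglik(u_i, y_i, x_i, beta):
    """sum_j ln f(y_ij | u_i) for one group under the logit link."""
    eta = beta * x_i + u_i
    return np.sum(y_i * eta - np.log1p(np.exp(eta)))

def metropolis_sweep(u, y, x, beta, sigma2, rng):
    """One variable-at-a-time sweep with candidate h_u = f_u = N(0, sigma2).
    By (5.10) the acceptance probability is a likelihood ratio, and only
    the terms for the updated component u_i are affected."""
    u = u.copy()
    for i in range(len(u)):
        u_star = rng.normal(0.0, np.sqrt(sigma2))
        log_alpha = (bernoulli_loglik(u_star, y[i], x[i], beta)
                     - bernoulli_loglik(u[i], y[i], x[i], beta))
        if np.log(rng.uniform()) < log_alpha:
            u[i] = u_star
    return u

# Illustration on simulated data (model and values are our assumptions):
rng = np.random.default_rng(1)
m, ni, beta, sigma2 = 10, 15, 5.0, 0.5
x = np.tile(np.arange(1, ni + 1) / ni, (m, 1))
u_true = rng.normal(0.0, np.sqrt(sigma2), m)
y = (rng.uniform(size=(m, ni))
     < 1.0 / (1.0 + np.exp(-(beta * x + u_true[:, None])))).astype(float)

u = np.zeros(m)
for _ in range(200):                  # 200 sweeps through u_1, ..., u_m
    u = metropolis_sweep(u, y, x, beta, sigma2, rng)
```

After a burn-in period, successive sweeps yield (dependent) draws from the conditional distribution of u|y.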

The Metropolis step is incorporated into the EM algorithm to give an MCEM algorithm as follows (McCulloch, 1997):

1. Choose starting values β^(0), τ^(0) and D^(0). Set r = 0.

2. Generate N values u^(1), u^(2), …, u^(N) from f_{u|y}(u|y, β^(r), τ^(r), D^(r)) using the Metropolis algorithm described previously:

   (a) choose β^(r+1) and τ^(r+1) to maximize a Monte Carlo estimate of E[ln f_{y|u}(y|u, β, τ)|y], that is, (1/N) Σ_{k=1}^N ln f_{y|u}(y|u^(k), β, τ);
   (b) choose D^(r+1) to maximize (1/N) Σ_{k=1}^N ln f_u(u^(k)|D);
   (c) set r = r + 1.

3. If convergence is achieved, declare β^(r+1), τ^(r+1) and D^(r+1) to be the maximum likelihood estimates (MLEs); otherwise, return to step 2.

According to McCulloch (1997), the main advantage of the Monte Carlo EM algorithm is that it can be applied in situations where direct application of the EM algorithm is either very difficult or impossible.

Illustration using a Logit-Normal Model


We apply the MCEM algorithm to fit a simple logit-normal model from McCulloch (1997), which retains the generalized linear mixed model structure. Let y = {y_ij : j = 1, …, n_i, i = 1, …, m} denote a set of binary response variables; again one can think of y_ij as the j-th response for the i-th subject. Let x_ij be a covariate (or vector of covariates) associated with the (i, j)-th observation. Conditional on the random effects U = u, the responses are independent Bernoulli(p_ij):

$$y_{ij} \mid u_i \sim \mathrm{Ber}(p_{ij}), \qquad u_i \sim N(0, \sigma^2), \qquad \log\left(\frac{p_{ij}}{1 - p_{ij}}\right) = \beta x_{ij} + u_i.$$

Let u_1, u_2, …, u_m be independent and identically distributed as Normal(0, σ²). The marginal likelihood of y is then

$$L(\beta, \sigma^2; y) \propto (\sigma^2)^{-m/2} \int_{\mathbb{R}^m} \exp\left\{ \sum_{i=1}^{m}\sum_{j=1}^{n_i}\left[ y_{ij}(\beta x_{ij} + u_i) - \ln\!\left(1 + e^{\beta x_{ij} + u_i}\right) \right] - \frac{1}{2\sigma^2}\sum_{i=1}^{m} u_i^2 \right\} du. \qquad (5.11)$$

This expression is obtained by writing the Bernoulli probabilities in exponential form and integrating the joint likelihood of (y, u) over the random effects.
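Because the u_i are independent, the m-dimensional integral in (5.11) factorizes into m one-dimensional integrals, each of which can be evaluated numerically, for instance with Gauss-Hermite quadrature. The following sketch illustrates this (the simulated data and the use of quadrature are our own assumptions; the thesis itself uses Monte Carlo rather than quadrature):

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated logit-normal data (our assumption): beta = 5, sigma^2 = 1/2,
# m = 10 groups of n_i = 15 Bernoulli responses, x_ij = j/15.
m, ni, beta0, sigma2_0 = 10, 15, 5.0, 0.5
x = np.tile(np.arange(1, ni + 1) / ni, (m, 1))
u = rng.normal(0.0, np.sqrt(sigma2_0), m)
y = (rng.uniform(size=(m, ni))
     < 1.0 / (1.0 + np.exp(-(beta0 * x + u[:, None])))).astype(float)

def marginal_loglik(beta, sigma2, y, x, n_nodes=30):
    """ln L(beta, sigma2; y): the integral (5.11) factorizes over groups,
    so each term is a 1-D integral handled by Gauss-Hermite quadrature."""
    t, w = np.polynomial.hermite.hermgauss(n_nodes)
    u_nodes = np.sqrt(2.0 * sigma2) * t    # change of variables u = sqrt(2*sigma2)*t
    weights = w / np.sqrt(np.pi)           # absorbs the normal density constant
    total = 0.0
    for i in range(y.shape[0]):
        eta = beta * x[i][:, None] + u_nodes[None, :]          # (ni, n_nodes)
        log_f = np.sum(y[i][:, None] * eta - np.log1p(np.exp(eta)), axis=0)
        total += np.log(np.sum(weights * np.exp(log_f)))
    return total

ll = marginal_loglik(5.0, 0.5, y, x)
```

Increasing the number of quadrature nodes leaves the value essentially unchanged, confirming that the one-dimensional integrals are well approximated.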

We consider here a data set generated using the parameter values proposed by Booth and Hobert (1999): β = 5, σ² = 1/2, n_i = 15, m = 10 and x_ij = j/15 for each i, j. The simulated responses are shown in Table 5.1.

```
 i\j   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
  1    1  0  0  0  0  1  1  0  1  1  1  1  1  1  1
  2    0  1  1  1  1  1  1  1  1  1  1  1  1  1  1
  3    0  1  0  1  1  1  1  1  1  1  1  1  1  1  1
  4    1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
  5    0  1  1  1  1  1  1  1  1  1  0  1  1  1  1
  6    0  0  0  1  0  1  1  1  0  1  1  1  1  1  1
  7    0  1  0  0  1  1  1  1  1  1  1  1  1  1  1
  8    1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
  9    1  0  0  1  1  0  1  1  1  1  1  1  1  1  1
 10    1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
```

Table 5.1: Simulated data for the logit-normal model

A version of the complete-data log-likelihood is given by

$$l_w(\beta, \sigma^2; y, u) = -\frac{m}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{m} u_i^2 + \sum_{i=1}^{m}\sum_{j=1}^{n_i}\left[ y_{ij}(\beta x_{ij} + u_i) - \ln\!\left(1 + e^{\beta x_{ij} + u_i}\right) \right]. \qquad (5.12)$$
To apply the EM algorithm in this problem we would need to compute the conditional expectation of l_w with respect to the density

$$h(u \mid y; \theta) \propto \exp\left\{ \sum_{i=1}^{m}\sum_{j=1}^{n_i}\left[ y_{ij}(\beta x_{ij} + u_i) - \ln\!\left(1 + e^{\beta x_{ij} + u_i}\right) \right] - \frac{1}{2\sigma^2}\sum_{i=1}^{m} u_i^2 \right\}. \qquad (5.13)$$

It is easy to see that the resulting integral is intractable. Thus we consider a Monte Carlo EM algorithm, simulating observations from the distribution given by (5.13). We apply a variable-at-a-time Metropolis-Hastings independence sampler with Normal(0, σ²) proposals. The starting values were (β^(0), (σ²)^(0)) = (2, 1). After applying the Metropolis algorithm at each E-step of EM we obtained the parameter estimates β̂ = 5.2 and σ̂² = 1.1.
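The fitting procedure just described can be sketched end to end. The following is a minimal, hypothetical implementation (our own simulated data and tuning choices, not the thesis code): the E-step draws u from (5.13) with the variable-at-a-time Metropolis sampler, and the M-step maximizes the Monte Carlo Q of (5.8), with a closed-form update for σ² and Newton steps for β:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated data with the Booth and Hobert (1999) values:
# beta = 5, sigma^2 = 1/2, m = 10 groups, n_i = 15, x_ij = j/15.
m, ni = 10, 15
beta_true, sigma2_true = 5.0, 0.5
x = np.tile(np.arange(1, ni + 1) / ni, (m, 1))
u_true = rng.normal(0.0, np.sqrt(sigma2_true), m)
y = (rng.uniform(size=(m, ni))
     < 1.0 / (1.0 + np.exp(-(beta_true * x + u_true[:, None])))).astype(float)

def group_loglik(u_i, y_i, x_i, beta):
    """sum_j [y_ij*eta - ln(1 + e^eta)] for one group, eta = beta*x + u_i."""
    eta = beta * x_i + u_i
    return np.sum(y_i * eta - np.log1p(np.exp(eta)))

def metropolis_samples(beta, sigma2, N, burn=50):
    """E-step: N (dependent) draws of u from f(u|y) via variable-at-a-time
    Metropolis-Hastings with independence candidate N(0, sigma2)."""
    u = np.zeros(m)
    draws = np.empty((N, m))
    for k in range(burn + N):
        for i in range(m):
            u_star = rng.normal(0.0, np.sqrt(sigma2))
            log_alpha = (group_loglik(u_star, y[i], x[i], beta)
                         - group_loglik(u[i], y[i], x[i], beta))
            if np.log(rng.uniform()) < log_alpha:
                u[i] = u_star
        if k >= burn:
            draws[k - burn] = u
    return draws

beta, sigma2 = 2.0, 1.0                 # starting values (beta^(0), (sigma^2)^(0))
for r in range(30):                     # EM iterations
    U = metropolis_samples(beta, sigma2, N=100)
    sigma2 = np.mean(U ** 2)            # M-step for sigma^2 (closed form)
    for _ in range(10):                 # Newton steps for beta on the MC Q
        eta = beta * x[None] + U[:, :, None]          # shape (N, m, ni)
        p = 1.0 / (1.0 + np.exp(-eta))
        grad = np.mean(np.sum(x[None] * (y[None] - p), axis=(1, 2)))
        hess = -np.mean(np.sum(x[None] ** 2 * p * (1.0 - p), axis=(1, 2)))
        beta -= grad / hess

print(beta, sigma2)   # final estimates; values fluctuate with the MC draws
```

Because the E-step is stochastic, the estimates wander around the maximum likelihood values rather than converging exactly; in practice one would increase N as the iterations proceed.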

Figure 5.1 shows the convergence of the Metropolis algorithm.

In this chapter we presented generalized linear mixed models and pointed out the complications that arise from the necessary integration with respect to the random effects (the unobserved data). A solution to this problem was discussed by presenting the Monte Carlo EM algorithm, which relies on simulation to approximate the expectations in the E-step. The next chapter uses GLMMs in the context of small area estimation.

[Trace plot omitted: Metropolis draws plotted against iteration number, 0 to 1e+05.]

Figure 5.1: Simulated values from the Metropolis algorithm with a Normal proposal density and the conditional density f(u|y) as target.
Chapter 6

Small Area Estimation Under Mixed Effects Logistic Models

6.1 Introduction
In this chapter, we consider a subclass of generalized linear mixed models known as nested models with a binary response, a special case of the models discussed in Chapter 5.

Chandra, Chambers and Salvati (2009) assert that model-based methodologies allow the construction of efficient estimators by using a suitable model to borrow strength, which is especially valuable when sample sizes are small. In the small area estimation context, each small area is characterized by its area-specific effect, while the auxiliary information may cover all the areas of interest. Hence one uses explicit linking models with random area-specific effects that account for between-area variation beyond that explained by the auxiliary variables included in the model (Chandra et al. (2009), Rao (2003), Chandra and Chambers (2009)).

When the response variable is continuous, the most common inference tool is the empirical best linear unbiased predictor (EBLUP) under the linear mixed model (LMM), which we discussed in Chapter 3. Chandra et al. (2009) considered the situation where the variable of interest is binary but small area estimates are still needed. They used the empirical best predictor (EBP) under a generalized linear mixed model (GLMM) with a logistic link function, since the EBLUP is problematic when the variable of interest is binary and the sample size is small (Rao (2003), Chandra et al. (2009)).

This chapter describes a real case of discrete observations where small area techniques can be applied, and shows how the parameters of interest can be estimated when the variables of interest are discrete (proportions).

6.2 Description of a Case Study


To illustrate the use of small area estimation, in this section we present the model used in a study on occupational health in Canada.

6.2.1 Problem
We present a real example, adapted from the article by Ghosh, Natarajan, Stroud and Carlin (1998). Their main objective was to estimate the proportion of workers who had experienced any negative impact of exposure to health hazards in the workplace, for each of the 15 × 2 × 2 = 60 groups cross-classified by 15 geographic regions and 4 demographic (age-sex) categories. The data presented in that article come from a 1991 sample of persons in 15 geographic regions of Canada. The researchers sought a response to the question "Have you experienced any negative impact of exposure to health hazards in the workplace?". In the survey, the responses were grouped into four categories: (1) yes, (2) no, (3) not exposed, and (4) not applicable or not stated. In this thesis, however, we keep only the two classifications (1) yes and (2) no, since for simplicity we are interested in a binary response. Within each region, workers were classified by age (≤ 40 or > 40) and sex (male or female). In the next subsection, we show how to estimate the probability of a negative impact of exposure to health hazards for a worker in a given classification using small area models.

6.2.2 Proposed Model


Following the methodology of Chandra, Chambers and Salvati (2009), let p_ij denote the probability that individual j within the i-th region by age-sex category has experienced a negative impact of exposure to health hazards in the workplace; y_ij and n_i denote the response of sampled worker j and the number of individuals sampled in category i, respectively. A population model for this type of data arises when the sample counts are assumed to be conditionally distributed as independent binomial variables, with the probabilities p_ij following the linear logistic mixed model

$$\mathrm{logit}(p_{ij}) = \ln\!\left(\frac{p_{ij}}{1 - p_{ij}}\right) = x_{ij}^{t}\beta + u_i = \eta_{ij}, \qquad (6.1)$$

where x_ij is a vector of region by age-sex group level covariates and u_i is the random category effect. The fixed part can be expressed as

$$x_{ij}^{t}\beta = \mu + \gamma_a + \gamma_s + \gamma_{as},$$

where µ is the general effect, γ_a is the main effect of age, γ_s is the main effect of sex, and γ_as is the age-sex interaction effect.
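As a concrete illustration of this parameterization, the fixed-effects design for the four age-sex categories can be built with treatment (dummy) coding (a hypothetical sketch; the coding scheme is our own assumption):

```python
import numpy as np

# Hypothetical sketch: treatment-coded fixed-effects design for
# mu + gamma_a + gamma_s + gamma_as over the four age-sex categories
# (rows: <=40/female, <=40/male, >40/female, >40/male).
age = np.array([0.0, 0.0, 1.0, 1.0])   # 0: age <= 40, 1: age > 40
sex = np.array([0.0, 1.0, 0.0, 1.0])   # 0: female,   1: male
X = np.column_stack([np.ones(4), age, sex, age * sex])
```

The four columns correspond to µ, γ_a, γ_s and γ_as, and the matrix has full rank, so all four effects are identifiable under this coding.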

6.3 Small Area Estimation of Proportions


In this section we describe estimators for small area quantities based on generalized linear
mixed models. In particular, we focus on a binary response variable with the aim of
estimating the population proportions for the variable of interest in a given small area.

6.3.1 The Empirical Best Predictor for the Small Areas


When the response data are non-normal, generalized linear mixed models (GLMMs) are the preferred approach for developing indirect estimates for small areas. Such indirect estimators of small area quantities under GLMMs are usually referred to as empirical best predictors (EBPs).

Now consider U, a finite population of size N, partitioned into M non-overlapping sub-groups (or small areas) U_i, each of size N_i, i = 1, 2, …, M, such that N = Σ_{i=1}^M N_i and U = U_1 ∪ U_2 ∪ … ∪ U_M. Let i index the small areas and j the units within them, where y_ij is the survey variable of interest (a binary variable in this case), known for sampled units, and x_ij is the vector of auxiliary variables (including the intercept), known for the whole population. Denote by s_i and r_i the sample (of size n_i) and non-sample (of size N_i − n_i) in small area i, respectively (Chandra et al., 2009).

The aim is to make inference about the small area quantity of interest (the population proportion) in a specific area i, i.e.

$$p_i = \frac{1}{N_i}\sum_{j \in U_i} y_{ij} = \frac{1}{N_i}\left( \sum_{j \in s_i} y_{ij} + \sum_{j \in r_i} y_{ij} \right).$$

To make the above inference about the population proportion p_i, we define p_ij as the probability that the j-th unit within area i has the attribute of interest, and let u_i denote the random area effect for small area i, assumed to be normally distributed with mean zero and variance σ_u². Assuming the u_i are independent,

$$y_{ij} \mid u_i \sim \mathrm{Bin}(1, p_{ij}).$$



Therefore, a population model for the binary data is the GLMM with logistic link function,

$$\mathrm{logit}(p_{ij}) = \ln\!\left(\frac{p_{ij}}{1 - p_{ij}}\right) = \eta_{ij} = x_{ij}^{t}\beta + u_i, \qquad (6.2)$$

with j = 1, 2, …, N_i, i = 1, 2, …, M, where β is the vector of regression parameters (Chandra et al., 2009).

Rao (2003) highlights that in small area estimation the model (6.2) can be conveniently expressed at the population level, as follows. Define y_U as the vector of responses with elements y_ij, X_U as a known design matrix with rows x_ij^t, Z_U = block diag(1_{N_i}) as the N × M matrix of area indicators, where 1_{N_i} is a column vector of ones of size N_i, u = (u_1, u_2, …, u_M)^t, and η_U as the vector of linear predictors η_ij given by (6.1). Let g(.) be a link function such that g(µ) = g(E(y_U|u)). As in eq. (5.2), we have the linear form

$$g(\mu) = \eta_U = X_U \beta + Z_U u. \qquad (6.3)$$

The expression (6.3) defines a linear predictor for a GLMM if the y_U given u are independent and belong to the exponential family of distributions. Assume the vector of random area effects u has mean 0 and variance G(σ) = σ² I_M, where I_M is the identity matrix of order M. The link function g(.) in the case of binary data is typically the logit function, and the relationship between y_U (the response) and η_U is expressed through a function h(.) defined by E(y_U|u) = h(η_U) (Jiang (2007), McCulloch and Searle (2001)).

Suppose we are interested in predicting a small area quantity of interest through a linear combination of the response, ψ = a_U y_U, where a_U is an M × N matrix with rows a_i^t = (a_i1, a_i2, …, a_iN) of known elements. To estimate the population proportion p_i for small area i, we take a_i^t to be the population vector with value 1/N_i for each population unit in area i and zero elsewhere.

To express the different components of the model in terms of the sampled and non-sampled parts, we decompose the vector y_U so that its first n elements correspond to the sampled units, and partition a_U, y_U, η_U, X_U and Z_U accordingly:

$$a_U = (a_s, a_r), \quad y_U = \begin{pmatrix} y_s \\ y_r \end{pmatrix}, \quad \eta_U = \begin{pmatrix} \eta_s \\ \eta_r \end{pmatrix}, \quad X_U = \begin{pmatrix} X_s \\ X_r \end{pmatrix}, \quad Z_U = \begin{pmatrix} Z_s \\ Z_r \end{pmatrix},$$

where s denotes components defined by the n sampled units and r those defined by the remaining N − n non-sampled units. Therefore, E(y_s|u) = h(η_s) and E(y_r|u) = h(η_r), where h(.) = g^{−1}(.). The quantity of interest ψ = a_U y_U can then be written as

$$\psi = a_s y_s + a_r y_r = a_s y_s + a_r h(X_r \beta + Z_r u). \qquad (6.4)$$

The first term on the right-hand side of (6.4) is known from the sample, whereas the second term, which depends on the non-sampled values y_r, is unknown and must be predicted by fitting the model (6.3) to the sample data.

Let y_s = {y_sij} and y_r = {y_rij} denote the vectors of sample and non-sample values of the binary survey variable y, respectively. The parameter of interest p_i for each small area can then be obtained by predicting each element y_rij. Since we need to fit a GLMM, and since the marginal likelihood has no explicit expression (see Chapter 5), parameter estimation is achieved via the Monte Carlo EM technique. This gives the best linear unbiased estimator (BLUE) for β and the best linear unbiased predictor (BLUP) for u when the value of σ is assumed known; using (6.3) we then obtain the BLUP-type estimator of ψ. Using the estimated value σ̂ of σ leads to the empirical BLUE β̂ for β and the empirical BLUP û for u, and thus the empirical BLUP-type estimator of ψ, which is given by

$$\hat{\psi} = a_s y_s + a_r h(X_r \hat{\beta} + Z_r \hat{u}). \qquad (6.5)$$

Using (6.5), the empirical best predictor (EBP) of the proportion p_i for small area i is

$$\hat{p}_i = \frac{1}{N_i}\left( \sum_{j \in s_i} y_{ij} + \sum_{j \in r_i} \hat{\mu}_{ij} \right), \qquad (6.6)$$

where

$$\hat{\mu}_{ij} = \frac{\exp(\hat{\eta}_{ij})}{1 + \exp(\hat{\eta}_{ij})} = \hat{p}_{ij}, \qquad \hat{\eta}_{ij} = x_{ij}^{t}\hat{\beta} + \hat{u}_i.$$

Equation (6.6) gives the population proportion estimate that we set out to obtain (Chandra, Chambers and Salvati (2009), Saei and Chambers (2003)).
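A direct translation of (6.6) into code might look as follows (a hypothetical sketch with made-up numbers; the variable names and values are our own, for illustration only):

```python
import numpy as np

def ebp_proportion(y_s, X_r, beta_hat, u_hat_i, N_i):
    """Empirical best predictor (6.6) for area i: observed sample responses
    plus predicted probabilities for the N_i - n_i non-sampled units."""
    eta_r = X_r @ beta_hat + u_hat_i           # eta_ij = x_ij^t beta + u_i
    mu_r = 1.0 / (1.0 + np.exp(-eta_r))        # h(eta) = expit(eta)
    return (np.sum(y_s) + np.sum(mu_r)) / N_i

# Made-up numbers for illustration only:
y_s = np.array([1.0, 0.0, 1.0])                # n_i = 3 sampled units
X_r = np.array([[1.0, 0.3],                    # covariates of the
                [1.0, 0.7]])                   # N_i - n_i = 2 non-sampled units
p_hat = ebp_proportion(y_s, X_r, np.array([-0.5, 1.2]), u_hat_i=0.4, N_i=5)
```

Since each predicted µ̂_ij lies in (0, 1), the resulting p̂_i is always a valid proportion.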

As mentioned in Chapter 5, fitting a GLMM involves evaluating a likelihood function that has no closed-form analytical expression. Several approximations to this likelihood and approximate maximum likelihood estimators have been proposed in the literature (Jiang (2007), McCulloch and Searle (2001)). In particular, the penalized quasi-likelihood (PQL) approach is a popular estimation procedure for the GLMM, based on a linear approximation to the non-normal response variable, which is then treated as approximately normally distributed. This approach is reliably convergent but tends to underestimate the variance components as well as the fixed effects coefficients (Breslow and Clayton, 1993). Saei and Chambers (2003) described an iterative procedure to obtain the maximum penalized quasi-likelihood (MPQL) estimates of β and u for given σ; at convergence, the MPQL estimate of ψ is obtained by substituting the converged values of β and u. In practice, however, the variance component parameter σ is unknown and must be estimated from the sample data. One disadvantage of the MPQL approach is that the variance component estimator is biased, which limits its use in practice (Saei and Chambers, 2003). To address this shortcoming, the parameter estimates can be obtained using the ideas presented in Chapter 5, where it was shown that incorporating a Metropolis-Hastings step allows the construction of an MCEM algorithm for maximum likelihood estimation in GLMMs (Jiang (2007), McCulloch and Searle (2001), Saei and Chambers (2003)).
Chapter 7

Concluding Remarks and Future Research

In this work we have considered model-based approaches to estimating small area target quantities. We employed statistical models that "borrow strength" to estimate the mean hectares of corn (or soybeans) in 12 Iowa counties, where the small areas of interest had sample sizes that varied from 1 to 3. Two types of information were used: survey information on the amount of land used for crops, available only for a few sampled segments, and satellite information on all the segments of the counties under study, which constituted the auxiliary information.

The linear mixed model was used to estimate the regression coefficients and the variance components by maximum likelihood (ML) and restricted maximum likelihood (REML). Since the ML estimator of the variance components was found to be biased, we recommend using the REML method for their estimation. Furthermore, we calculated confidence intervals for the fixed effects, both normal-theory and bootstrap normal confidence intervals.

Under GLMMs, due to the complexity of the (marginal) likelihood function, the use of ML and REML estimates was limited by the lack of explicit expressions, so we adopted the Monte Carlo EM algorithm for maximum likelihood fitting of generalized linear mixed models, which uses simulation to construct Monte Carlo approximations at the E-step. At each E-step, the Metropolis algorithm was used. The resulting parameter estimates were found to be consistent with the model parameters used in the simulation. It is also worth noting that the Metropolis algorithm converged quite quickly.

This work confined attention to the framework of mixed models with univariate normal random area-specific effects. In real applications, however, this assumption may not always be justified. A rewarding topic for future work would be to assess the performance of these models under multivariate normal random area effects.
Other recommendations for future research include:

• To apply generalized linear mixed models in the small area estimation context using real data.

• To set up and study methods of small area estimation under LMMs or GLMMs with temporal and spatial effects.
Bibliography

[1] Battese, G.E., Harter, R.M. and Fuller, W.A. (1988). An error component model
for prediction of county crop areas using survey and satellite data, Journal of the
American Statistical Association, 83, 28-36.

[2] Box, G.E.P and Jenkins, G.M (1970). Time Series Analysis Forecasting and Control,
San Francisco: Holden-Day, Inc.

[3] Brackstone, G.J. (1987). Small area data: Policy issues and technical challenges. In
Small Area Statistics (R. Platek, Rao J. N.K , C. E. Sarndal and M. P. Singh eds.)
3-20. Wiley, New York.

[4] Brady, T.W., Welch, K.B. and Andrzej, T.G. (2007). Linear Mixed Models. A Practical
Guide Using Statistical Software. Chapman and Hall, Taylor and Francis Group, New
York.

[5] Breslow, N.E. and Clayton, D.G. (1993). Approximate inference in generalized linear
mixed models, Journal of the American Statistical Association, 88, pp. 9-25.

[6] Chandra, H., Chambers, R. and Salvati, N. (2009). Small area estimation of propor-
tions in business surveys, Centre for Statistical and Survey Methodology, The Univer-
sity of Wollongong, Working Paper.

[7] Cressie, N.A. (1991). Geostatistical analysis of spatial data. Spatial Statistics and Dig-
ital Image Analysis. Washington, D.C.: National Academy Press.

[8] Datta, G.S. and Ghosh, M. (1991). Bayesian prediction in linear models: applications
to small area estimation, Annals of Statistics, 19, 1748-1770.

[9] Datta, G.S., Kubokawa, T. and Rao, J.N.K. (2002). Estimation of MSE in Small
Area Estimation, Technical Report, Department of Statistics, University of Georgia,
Athens.

[10] Diggle, P., Heagerty, P., Liang, K. and Zeger, S. (2002). Analysis of Longitudinal Data,
2nd ed., Oxford University Press, New York.


[11] Efron, B. and Tibshirani, R.J. (1993). An Introduction to the Bootstrap, Chapman
and Hall, New York.

[12] Efron, B. (1982). The Jackknife, the Bootstrap and Other Resampling Plans, Philadelphia: SIAM.

[13] Erciulescu, A.L. and Fuller, W.A. (2013). Small area prediction of the mean of a bino-
mial random variable, JSM Proceedings. Survey Research Methods Section. Alexan-
dria, VA: American Statistical Association, 855-863.

[14] Fay, R.E. and Herriot, R.A. (1979). Estimation of income from small places: An appli-
cation of James-Stein procedures to census data, Journal of the American Statistical
Association, 74, 269-277.

[15] Fuller,W.A. (2009). Sampling Statistics, John Wiley and Sons, Hoboken, New Jersey.

[16] Ghosh, M., Natarajan, K., Stroud, T.W.F. and Carlin, B.P. (1998). Generalized linear
models for small area estimation, Journal of American Statistical Association, 93, 273-
282.

[17] Gonzalez, M.E. (1973). Use and evaluation of synthetic estimates, Proceedings of the
Social Statistics Section, American Statistical Association, pp. 33-36.

[18] Hall, P. (1992). The Bootstrap and Edgeworth Expansion. New York: Springer.

[19] Harville, D.A. (1977). Maximum likelihood approaches to variance component esti-
mation and to related problems. Journal of the American Statistical Association, 72,
320-340.

[20] Harville, D.A. (1985). Decomposition of prediction error, Journal of the American
Statistical Association, 80, 132-138.

[21] Henderson, C.R. (1975). Best linear unbiased estimation and prediction under a selection model. Biometrics, 31, 423-447.

[22] Jiang, J. (2007). Linear and Generalized Linear Mixed Models and their Applications,
Springer, New York.

[23] Kubokawa, T. (2009). A review of linear mixed models and small area estimation,
University of Tokyo, CIRJE Discussion Paper. http// www.e.u.tokyo.ac.

[24] Larsen, M.D. (2003). Estimation of small-area proportions using covariates and survey
data, Journal of Statistical Planning and Inference, 112, 89-98.

[25] Longford, N.T. (2005). Missing Data and Small Area Estimation. Springer, New York.

[26] Malec, D. (2005). Small area estimation from the american community survey using a
hierarchical logistic model of persons and housing units, Journal of Official Statistics,
21, 3, 411-432.

[27] McCulloch, C.E. (1997). Maximum likelihood algorithms for generalized linear mixed
models, Journal of the American Statistical Association, 92,162-170.

[28] McCulloch, C.E. and Searle, S.R. (2001). Generalized, Linear, and Mixed Models, New
York: Wiley.

[29] Montanari, G.E., Ranalli, G.M. and Vicarelli, C. (2010). A comparison of small area estimators of counts aligned with direct higher level estimates, Scientific Meeting of the Italian Statistical Society. http://homes.stat.unipd.it/mgri/SIS2010/Program/contributedpaper/678-1393-1-DR.pdf.

[30] Pinheiro, J.C. and Bates, D.M. (2000). Mixed-Effects Models in S and S-PLUS.
Springer.

[31] Rao, J.N.K. (2003). Small Area Estimation, Wiley Interscience .

[32] Robinson, G.K. (1991). That BLUP is a good thing: The estimation of random effects,
Statistical Science, 6, 15-32.

[33] Royall, R.M. (1976). The linear least-squares prediction approach to two-stage sam-
pling, Journal of the American Statistical Association, Vol. 71, No. 355, pp. 657-664.

[34] Saei, A. and Chambers, R. (2003). Small area estimation under linear and generalized linear mixed models with time and area effects. Methodology Working Paper M03/15, University of Southampton, United Kingdom.

[35] Searle, S.R., Casella, G. and McCulloch, C.E. (1992). Variance Components, John
Wiley and Sons, New York.

[36] Van der Leeden, R., Busing, F.M.T.A. and Meijer, E. (1997). Bootstrap methods
for two-level models. Technical Report PRM 97-04, Leiden University, Department of
Psychology, Leiden.

[37] Verbeke, G. and Molenberghs, G. (2000). Linear Mixed Models for Longitudinal Data,
Springer-Verlag, Berlin.

[38] Vizcaino, L.E., Cortina-Lombardia, M.J. and Morales-Gonzalez, D. (2011). Multinomial-based small area estimation of labour force indicators in Galicia, X Congreso Galego de Estatística e Investigación de Operacións, Pontevedra.
