To cite this article: Bradley Efron (1994), "Missing Data, Imputation, and the Bootstrap," Journal of the American Statistical Association, 89:426, 463-475.
Missing Data, Imputation, and the Bootstrap
Bradley EFRON*
Missing data refers to a class of problems made difficult by the absence of some portions of a familiar data structure. For example,
a regression problem might have some missing values in the predictor vectors. This article concerns nonparametric approaches to
assessing the accuracy of an estimator in a missing data situation. Three main topics are discussed: bootstrap methods for missing
data, these methods' relationship to the theory of multiple imputation, and computationally efficient ways of executing them. The
simplest form of nonparametric bootstrap confidence interval turns out to give convenient and accurate answers. There are interesting
practical and theoretical differences between bootstrap methods and the multiple imputation approach, as well as some useful
similarities.
KEY WORDS: Bayesian bootstrap; Bootstrap confidence intervals; Data augmentation; Ignorable nonresponse; Nonparametric
MLE.
... 610.3, giving a bootstrap bias estimate −22.9 = 610.3 − θ̂. The number of bootstrap samples, 2,200, is ten times that ...
Table 2. Approximate Nonparametric Confidence Limits for the Maximum Eigenvalue θ, (1.1), Given the Observed Data o in Table 1

Confidence level α:        .025  .050  .100  .160   .840   .900   .950   .975
1. BCa:                     341   379   429   478    966  1,059  1,164  1,253
2. ABC:                     340   379   430   476    946  1,046  1,172  1,295
3. Full-mechanism BCa:      349   387   439   490    970  1,074  1,213  1,300
4. ABC for MLE:             289   353   409   458  1,014  1,135  1,307  1,474
5. Multiple imputation:     345   382   428   468    864    946  1,063  1,177

NOTE: Row 1: nonparametric BCa method based on the 2,200 bootstrap replications of Figure 1. Row 2: an analytic approximation to row 1, called ABC, requiring no Monte Carlo replications. Row 3: a more elaborate bootstrap confidence method discussed in Section 2. Row 4: ABC limits for θ based on the normal-theory MLE θ̂ instead of the best-fit imputation (1.3). Row 5: multiple imputation, or data augmentation, limits. The five methods are explained in Sections 2, 3, and 5.
confidence intervals are not necessarily better confidence intervals, but in this case the normal theory MLE seems a little unrobust; see the end of Section 5.

Based on ideas suggested by the EM algorithm, Rubin proposed a theory of multiple imputation for assessing the variability of an estimator θ̂ obtained in a missing data situation. Some good references are Rubin (1987), Rubin and Schenker (1986), Tanner (1991), and others listed in Section 4. Tanner and Wong (1987) gave a neat computational description of the ideas involved, using the term data augmentation. The multiple imputation or data augmentation approach, described in Section 4, is quite different from the bootstrap approach of Table 2. It is based on Bayesian rather than frequentist theory. Nevertheless, Section 5 shows that bootstrap methods can be useful for implementing data augmentation. Row 5 of Table 2 refers to approximate confidence limits based on a data augmentation scheme. Section 6 briefly summarizes the advantages and disadvantages of the different approaches.

This article concerns nonparametric error estimates for missing data problems. There is an important practical difference between the parametric and nonparametric situations. It is natural in a parametric framework to estimate θ by its MLE, θ̂. We know that θ̂ has asymptotically optimal properties for estimating θ, in terms of bias, variance, or more general confidence statements. The only problem is to make such statements on the basis of a partially missing data set.

The choice of an estimator θ̂ is usually not so clear in a nonparametric setting. In our maximum eigenvalue problem, we might have considered three different estimators: (1) the best-fit estimator (1.2)-(1.5); (2) the Buck estimator (Buck 1960; Little 1983), in which x̂_ij in (1.3) is replaced by a linear regression predictor based on the complete cases and the elements of Σ̂ in (1.4) are augmented by the addition of residual covariances (to add variability back into the imputed values x̂_ij); and (3) the normal-theory MLE.

The best-fit estimator could easily be biased downward. The normal-theory estimate, which is nearly optimal in a normal sampling framework, seems to be overly variable for this data set. The Buck estimator is appealing here, being of intermediate complexity between the best-fit and normal-theory estimators. We could test the appeal by a nonparametric bootstrap analysis like that in Figure 1. The analysis might show that the Buck estimator is no more variable than the best-fit estimator, in which case it would be preferable in terms of smaller bias.

The point is this: In nonparametric situations it is useful to assess the statistical properties of a variety of estimators. The nonparametric bootstrap has the advantage of applying in a simple way to any estimator θ̂. Of course, this does not obviate the need to sensibly select θ̂. It does mean that the statistician can choose θ̂ using the usual variety of exploratory tools, including experience and intuition, with the assurance of being able to obtain reasonable estimates of θ̂'s variability.

The nonparametric and parametric situations converge when we have categorical data. In this case there is a nonparametric MLE θ̂, which is asymptotically optimal in terms of bias, variance, and so on. Now it is possible to directly compare the nonparametric bootstrap with multiple imputation. This comparison is made in Section 3.

2. NONPARAMETRIC BOOTSTRAP METHODS

This section discusses the logic of the simple nonparametric bootstrap method that produced Figure 1. For comparison, we also describe a different nonparametric bootstrap for missing data problems. The different structure of the two bootstraps manifests itself in what we need to assume about the missing data mechanism. Near the end of the section we briefly describe the BCa system of approximate confidence intervals used in Table 2.

Figure 2 diagrams a missing data problem and its nonparametric bootstrap analysis. F is a population of units X_j,

F = {X_j, j = 1, 2, ..., N},

with N possibly infinite. A missing data mechanism, say O_j = c(X_j), results in a population G of partially concealed objects

G = {O_j = c(X_j), j = 1, 2, ..., N}.

In the example of Section 1, X_j is a vector of five scores for one student and O_j is the same vector with some or perhaps none of the numerical values concealed by question marks. We wish to infer the value of a parameter of the population F,

θ_F = s(F).

In our example s(F) is the maximum eigenvalue of the covariance matrix corresponding to F. A random sample of size n is obtained from G, o = (o_1, o_2, ..., o_n), as on the left side of Table 1. The nonparametric inference step estimates G by the empirical distribution Ĝ of o,
[Figure 2 diagram: F --conceal--> G --sample--> o --infer--> Ĝ --bootstrap sample--> o* --infer--> Ĝ*, with θ_F = s(F), θ = t(G), θ̂ = t(Ĝ), and θ̂* = t(Ĝ*).]
Figure 2. Diagram of Nonparametric Bootstrap Applied to a Missing Data Problem. The individual members of a population of interest F are partially concealed according to a missing-data mechanism, giving a population G of partially concealed objects; o is a random sample of size n from G; Ĝ is the empirical distribution corresponding to o; and statistic θ̂ = t(Ĝ) estimates parameter θ = t(G), which is intended to be a good approximation to the actual parameter of interest θ_F = s(F). The bootstrap sampling and inference procedures duplicate those actually used, giving bootstrap replications θ̂* = t(Ĝ*). The BCa and ABC methods, described later, give good approximate confidence intervals for θ based on the bootstrap replications.
Ĝ: probability 1/n on o_i for i = 1, 2, ..., n. (2.4)

... likelihood function. In other words, we assume that each o_i has ...
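The resampling loop of Figure 2 is easy to express in code. Below is a minimal sketch in R, the modern dialect of the S language used for the paper's Appendix program. The estimator shown, column-mean imputation followed by the maximum eigenvalue of the covariance matrix, is a simplified stand-in for the best-fit calculations (1.2)-(1.5); any estimator θ̂ = t(Ĝ) could be substituted.

```r
# Sketch of the nonparametric bootstrap of Figure 2.  `o` is an
# n x p data matrix whose concealed entries are coded NA.
# theta.hat() is a simplified stand-in for t(Ghat): column-mean
# imputation followed by the maximum eigenvalue of the covariance
# matrix, in place of the best-fit calculations (1.2)-(1.5).
theta.hat <- function(o) {
  x <- o
  for (j in 1:ncol(x)) {
    mj <- mean(x[, j], na.rm = TRUE)     # placeholder imputation for (1.3)
    x[is.na(x[, j]), j] <- mj
  }
  max(eigen(cov(x), symmetric = TRUE)$values)
}

boot.missing <- function(o, B = 2200) {
  n <- nrow(o)
  replicate(B, {
    ostar <- o[sample(n, n, replace = TRUE), , drop = FALSE]  # resample rows,
    theta.hat(ostar)                                          # question marks and all
  })
}
```

Crude percentile limits would then be quantile(boot.missing(o), c(.025, .975)); the BCa and ABC methods described in this section refine such endpoints.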
[Figure 3 diagram: actual line F --sample--> x --conceal--> o --infer--> F̂; bootstrap line F̂ --sample--> x* --conceal--> o* --infer--> F̂*, with θ = s(F), θ̂ = s(F̂), and θ̂* = s(F̂*).]
Figure 3. Full-Mechanism Bootstrap. x is a random sample from the population of interest F; members of x are partially concealed by the missing-data mechanism to give o; F̂ is an estimate of F based on o; and the parameter of interest, θ = s(F), is estimated by θ̂ = s(F̂). The bootstrap sampling, missing-data, and inference procedures are supposed to duplicate those that actually occurred. This requires specification of the missing-data mechanism.
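For contrast, here is a matching sketch of the full-mechanism loop of Figure 3, reusing theta.hat() from the sketch above. The concealment mechanism must be specified: hiding each entry independently with probability p.miss is a purely illustrative assumption, not the paper's mechanism.

```r
# Full-mechanism bootstrap of Figure 3, reusing theta.hat() above.
# Fhat is represented by a completed data matrix `xhat`, and the
# concealment mechanism is assumed known; hiding each entry
# independently with probability p.miss is purely illustrative.
conceal <- function(x, p.miss = 0.1) {
  hide <- matrix(runif(length(x)) < p.miss, nrow(x))
  x[hide] <- NA
  x
}

boot.full <- function(xhat, B = 2200, p.miss = 0.1) {
  n <- nrow(xhat)
  replicate(B, {
    xstar <- xhat[sample(n, n, replace = TRUE), , drop = FALSE]  # x* from Fhat
    ostar <- conceal(xstar, p.miss)       # o* from the missing-data mechanism
    theta.hat(ostar)                      # infer as before
  })
}
```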
... order errors made by the standard intervals (see Efron 1987, sec. 2).

Formula (2.12) looks complicated, but it is easy to apply. The hard part is computing the 2,000 bootstrap replications necessary to give the formula sufficient accuracy. The ABC algorithm of Section 5 uses analytic approximations in place of Monte Carlo replications. Typically it requires only a few percent as much computational effort as BCa.

Remark A. The full-mechanism and nonparametric bootstrap methods are identical in the important special case where we observe censored data from a survival analysis (see Efron 1981a).

Remark B. The basic idea of the nonparametric bootstrap of Figure 2 is to consider the o_i, including the missing values, as randomly sampled points from a population G. This idea has been used before in the sample survey literature, though typically with methods like balanced repeated replications or the jackknife, rather than the bootstrap. Fay (1986, sec. 4) used the jackknife on a complicated categorical data problem with missing data. Greenwood's formula for the variance of a survival curve is a delta method version of the same idea. Meng and Rubin (1989) suggested the possibility of a bootstrap approach to missing-data problems. Laird and Louis's (1987) "Type I" and "Type II-III" bootstraps are examples, respectively, of the nonparametric and full-mechanism methods.

Remark C. The nonparametric bootstrap of Figure 2 can also be applied to K-sample problems

(G_1, G_2, ..., G_K) → (o_1, o_2, ..., o_K), (2.14)

where the o_k are independent random samples of size n_k obtained from populations G_k, the parameter of interest being θ = t(G_1, G_2, ..., G_K). The empirical distributions Ĝ_k corresponding to the o_k give independent bootstrap samples o*_k of size n_k, from which we calculate the Ĝ*_k and finally θ̂* = t(Ĝ*_1, Ĝ*_2, ..., Ĝ*_K). The BCa intervals are calculated from (2.12), (2.13) as before, with the acceleration a obtained as in Remark J of Section 5.

Remark D. DiCiccio and Efron (1992) showed that the BCa and ABC intervals are second-order accurate if the data set is obtained by random sampling from a multiparameter exponential family and θ is a smooth function of the expectation vector of the family. A discretization argument applies this result to the situation in Figure 2. We suppose that the sample space of the o_i can be discretized to a finite number of outcomes, say L of them. For the examples in Table 1, each of the five coordinates of an o_i vector takes its value in the 102-element set {?, 0, 1, 2, ..., 100}, so we can take L = 102^5. The multiparameter exponential family referred to above is the L-category family of multinomial distributions. See the comments on finite sample spaces of Efron (1987, sec. 8).

3. MULTIPLE IMPUTATION

Best-fit imputation, as illustrated on the right side of Table 1, conveys a false sense of accuracy if the imputed values are interpreted as ordinary observations. Rubin (1978, 1987) proposed drawing multiple random imputations of the missing data rather than a single best-fit imputation. Variability of results between the randomly imputed data sets can then be used to assess the true accuracy of an estimate θ̂. The variability calculation is carried out by means of a Bayesian updating scheme, quite different in concept from the bootstrap method of Section 2. This section briefly reviews multiple imputation, following the development of Tanner and Wong (1987). Section 4 presents a third bootstrap method for missing data, based on multiple imputation. At first we will use parametric notation, which is more natural in the Bayesian framework. Then we will discuss categorical data, for which it is easier to make a nonparametric comparison of multiple imputation with the bootstrap.

3.1 Data Augmentation

Let o indicate the observed data set and let x indicate any complete data set consonant with o. In the example of Table 1, x could be any 22 × 5 matrix of numbers agreeing with o at all of its numerical entries. The actual complete data set x giving rise to o, which would have been observed if there were no missing data, is assumed to be sampled from a parametric family with density function f_η(x), with η being an unknown p-dimensional parameter vector. Starting with a prior density ξ_0(η) on η, Bayes's theorem would give hypothetical posterior density ξ(η | x) if x were observed and, more concretely, the actual posterior density ξ(η | o) having observed o. A standard probability calculation relates ξ(η | o) to ξ(η | x):

ξ(η | o) = ∫ ξ(η | x) f(x | o) dx, (3.1)

where f(x | o) is the predictive density of x given o, the conditional density integrating out η,

f(x | o) = ∫ f_η(x | o) ξ(η | o) dη. (3.2)

The integral in (3.1) is taken over all x consonant with o.

Result (3.1), the data augmentation identity, can be stated as follows. The posterior density of η, given the observed data o, is the average posterior density of η based on a complete data set x. The average is taken over the predictive density of x given o. In a typical missing-data problem, computing ξ(η | x) is easy but computing ξ(η | o) is difficult. If we can sample from the predictive density f(x | o), then (3.1) gives a practical way of approximating ξ(η | o):

ξ̂(η | o) = (1/M) Σ_{m=1}^{M} ξ(η | x^(m)), (3.3)

where x^(1), x^(2), ..., x^(M) are the multiple imputations, that is, independent draws from f(x | o). This argument has a circular look, because we need to know ξ(η | o) to calculate f(x | o) in (3.2). Tanner and Wong (1987) investigated an iterative algorithm related to Gibbs sampling for actually carrying out (3.3). Noniterative approximations are available, as discussed later.
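A toy numerical version of the average (3.3) may help fix ideas. The model is an assumption made only for this sketch: x_i ~ N(η, 1) with a flat prior, so that the complete-data posterior ξ(η | x) is N(x̄, 1/n); draw.imputation() is a hypothetical stand-in for sampling from the predictive density f(x | o).

```r
# Toy version of the data augmentation average (3.3).  Assumed model
# (not from the paper): x_i ~ N(eta, 1) with a flat prior, so that the
# complete-data posterior xi(eta | x) is N(mean(x), 1/n).
# draw.imputation(o) is a user-supplied stand-in for a draw from the
# predictive density f(x | o).
posterior.mixture <- function(o, draw.imputation, M = 50,
                              eta.grid = seq(-3, 3, length.out = 201)) {
  dens <- matrix(0, M, length(eta.grid))
  for (m in 1:M) {
    x <- draw.imputation(o)              # x^(m), an imputed complete data set
    dens[m, ] <- dnorm(eta.grid, mean(x), 1 / sqrt(length(x)))
  }
  colMeans(dens)                         # xi.hat(eta | o) of formula (3.3)
}
```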
Most often, inferences are desired for some real-valued function (or functions) of η,

θ = s(η), (3.4)

like the maximum eigenvalue in Section 1 rather than for the entire vector η. The marginal posterior densities of θ, say π(θ | o) and π(θ | x), are related by a marginalized version of (3.1),

π(θ | o) = ∫ π(θ | x) f(x | o) dx, (3.5)

with f(x | o) still being defined by (3.2).

The most obvious difficulty in applying (3.3)-(3.5) is the generation of imputations x^(m) from the predictive density f(x | o). A simple approach, called "poor man's data augmentation" by Wei and Tanner (1990), is to sample the x^(m) from f_η̂(x | o), with η set equal to the MLE η̂:

x^(m) ~ f_η̂(x | o), independently for m = 1, 2, ..., M. (3.6)

This could also be called a conditional parametric bootstrap sample. In many situations (3.6) is quite satisfactory, though it can underestimate variability if there is too much missing data.

A better approximation to the predictive density is often used. Each imputation is drawn from its own bootstrapped choice of the parameter vector η: ...

... O(k) is a subset of 𝒳, so that observing O_i = o_i means that the unobserved X_i = x_i lies in the subset o_i. In what follows we will let δ(x, o) indicate whether or not x is among the values contained in o:

δ(x, o) = 1 if x ∈ o, = 0 if x ∉ o (x ∈ 𝒳, o ∈ 𝒪). (3.8)

As a simple example, suppose that a human population is categorized by sex and handedness. There are L = 4 original population values:

X(1) = (male, left), X(2) = (male, right), X(3) = (female, left), and X(4) = (female, right). (3.9)

If there are difficulties in ascertaining handedness, we will need K = 6 states in 𝒪: O(k) = X(k) for k = 1, 2, 3, 4 and the two additional states

O(5) = (male, ?) and O(6) = (female, ?). (3.10)

In this case O(6) corresponds to {X(3), X(4)}, so that observing O_i = O(6) means that x_i is either (female, left) or (female, right).

The missing-data mechanism can be described by the conditional probability density of O_i given X_i, say

c(o | x) = prob{O_i = o | X_i = x} ...
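The sex-by-handedness example of (3.8)-(3.10) is small enough to write out directly; the state labels below mirror the text, and delta() is the indicator (3.8).

```r
# The L = 4 complete states (3.9) and K = 6 observable states (3.10).
X.states <- c("male.left", "male.right", "female.left", "female.right")
O.states <- list(male.left    = "male.left",
                 male.right   = "male.right",
                 female.left  = "female.left",
                 female.right = "female.right",
                 male.q       = c("male.left", "male.right"),
                 female.q     = c("female.left", "female.right"))

# delta(x, o) of (3.8): is complete value x among the values in state o?
delta <- function(x, o) as.numeric(x %in% O.states[[o]])

delta("female.left", "female.q")   # 1: O(6) = (female, ?) contains X(3)
delta("male.left",   "female.q")   # 0
```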
ĝ_k = #{o_i = O(k)}/n. (3.16)

The MLE f̂ of f is obtained from an empirical version of (3.15),

f̂ = ĝ P_f̂^T. (3.17)

This is the self-consistency property of the MLE as explained by Dempster et al. (1977).

The trouble with using (3.17) to find the MLE f̂ is that D_f depends on the missing-data density c(o | x), which is usually unknown. This is where ignorability comes in. Let p_f(x | o) indicate the "obvious" conditional density of x given o:

p_f(x | o) = δ(x, o) f_x / Σ_{x'} δ(x', o) f_{x'}. (3.18)

p_f(x | o) puts probabilities proportional to f_x on each x in o. Suppose that the selection mechanism of o_i given x_i was predetermined in the following sense: First a disjoint partition 𝒫 of 𝒳 was defined, then x_i was selected according to density f, and finally o_i was chosen to be that member of 𝒫 into which x_i fell. In this case p_f(x_i | o_i) would equal the actual conditional density d_f(x_i | o_i).

Ignorability implies that we can ignore the missing-data mechanism and set d_f(x | o) equal to p_f(x | o) (see Rubin 1976, sec. 7). Letting P_f be the L × K matrix (p_f(x | o)), (3.15) becomes f = g P_f^T, and its empirical version takes on the more practical form

f̂ = ĝ P_f̂^T. (3.19)

We will write f̂ = MLE(ĝ) to indicate the mapping from ĝ to f̂ implied by (3.19), ignoring questions of uniqueness and existence.

The nonparametric MLE for a parameter θ = s(f) is

θ̂ = t(ĝ) = s(MLE(ĝ)). (3.20)

If (3.20) is the function t used in Figure 2, then we know we are estimating the correct parameter, because θ_F = s(f) = t(g) = θ as in (2.7). Also, θ̂ is asymptotically efficient. This choice of t puts the nonparametric bootstrap on the same footing as multiple imputation.

Figure 4 is a comparison of the nonparametric bootstrap with multiple imputation in the nonparametric categorical data problem. The top line shows the computational steps involved in the nonparametric bootstrap. The resampling step o → o* is the simple bootstrap defined at (1.7). Here o and o* could just as well be labeled Ĝ and Ĝ* as in Figure 2, or equivalently, ĝ and ĝ*. The bootstrap replication θ̂* = t(Ĝ*) in Figure 2 is given here by θ̂* = s(MLE(ĝ*)) as in (3.20).

The first three steps of the multiple imputation algorithm implement the approximate Bayesian bootstrap (3.7). Conditional resampling is done according to the conditional densities (3.18),

x_i** | o_i ~ p_f̂*(· | o_i) (i = 1, 2, ..., n). (3.21)

Having selected f̂*, the sampling in (3.21) is conditionally independent for i = 1, 2, ..., n. The completed data set x** = (x_1**, x_2**, ..., x_n**) is what we called x^(m) in (3.7). The double-star notation emphasizes the two levels of bootstrap sampling involved.

A basic assumption of the multiple imputation approach is that inferences for θ would be easy to make if there were no missing data. We assume in Figure 4 that given x**, a completed data set, it is easy to calculate a posterior density ξ(f | x**) and marginalize ξ to the appropriate posterior density π(θ | x**) for θ. In contrast, the nonparametric bootstrap uses the replications θ̂* to directly make inferences about θ. The crucial marginalization step, from f to θ, is handled automatically by the bootstrap confidence algorithm BCa or ABC.

Marginalization is a major difficulty in applying Bayesian methods to high-dimensional problems, even without missing data. In genuine Bayesian situations there can be no argument with inferences based on the posterior density π(θ | o). However multiple imputation is often applied in an objectivist framework, beginning, perhaps implicitly, with some form of uninformative prior. This can be tricky ground, where an apparently innocuous prior on the full parameter vector leads to unexpected biases for θ (see Berger and Bernardo 1991 and Tibshirani 1989). Section 4 concerns an easy and accurate marginalization technique, designed to simplify the use of multiple imputation.

... multiple imputation can be applied to parametric problems and can incorporate Bayesian information (Lazzeroni, Schenker, and Taylor 1990). It more graphically conveys the effect of the missing data, as will be seen in Figure 5. The two methods are compared further in Section 6.

Remark F. Theoretically we could use the nonparametric MLE (3.20) to estimate θ in the maximum eigenvalue problem, by discretizing the data in Table 1 as we did in Remark D. This is not at all practical, given only n = 22 points in a five-dimensional space. Ad hoc estimators like those in Section 1 arise in nonparametric problems because of the impracticality of full nonparametric maximum likelihood estimation.
[Figure 4 diagram. Nonparametric bootstrap: o --resample--> o* --MLE--> f̂* --s--> θ̂*. Multiple imputation: o --resample--> o* --MLE--> f̂* --conditional resample--> x** --Bayes--> ξ(f | x**) --marginalize--> π(θ | x**).]
Figure 4. Comparison of Nonparametric Bootstrap With Multiple Imputation for Categorical Data. Resample indicates nonparametric bootstrap sample (1.7); MLE is the nonparametric maximum likelihood estimator for f, (3.19); the parameter of interest is θ = s(f); and conditional resamples are obtained by the approximate Bayesian bootstrap method (3.21). Multiple imputation assumes that given the completed data set x**, it is easy to compute the Bayes posterior density ξ(f | x**) for f and then marginalize ξ to the posterior density π(θ | x**) for θ.
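The bottom row of Figure 4 can be sketched as one function, reusing mle.g() and the indicator matrix D from the sketches above; obs codes the observed states as integers 1, ..., K. Each call produces one completed data set x** by the approximate Bayesian bootstrap.

```r
# One multiple-imputation draw x** by the approximate Bayesian
# bootstrap (3.7)/(3.21).  `obs` is the vector of observed states
# o_1, ..., o_n coded 1..K; D and mle.g() are as sketched above.
impute.abb <- function(obs, D, K = ncol(D)) {
  n <- length(obs)
  gstar <- tabulate(sample(obs, n, replace = TRUE), nbins = K) / n  # ghat*
  fstar <- mle.g(gstar, D)                                          # f* = MLE(ghat*)
  xstar2 <- integer(n)
  for (i in 1:n) {                       # x_i** | o_i ~ p_{f*}( . | o_i), (3.21)
    w <- D[, obs[i]] * fstar             # unnormalized p_{f*}(x | o_i), (3.18)
    xstar2[i] <- sample(nrow(D), 1, prob = w)
  }
  xstar2                                 # the completed data set x**
}
```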
Figure 5. Multiple-Imputation Bootstrap for the Maximum Eigenvalue Problem of Section 1. The left panel shows ABC confidence densities π̂(θ | x^(m)) for 25 imputed data sets x^(m), m = 1, 2, ..., 25. In the right panel, the solid line is π̄(θ | o), (4.3), the average of π̂(θ | x^(m)) for m = 1, 2, ..., 50, and the dashed line is π̂_nonpar(θ | o), the ABC density for the nonparametric bootstrap of Section 2; π̄(θ | o) has a shorter upper tail than π̂_nonpar(θ | o).
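The solid curve in the right panel is produced by plain averaging, formula (4.3) in the text below. A sketch follows; abc.density() is a hypothetical stand-in for the ABC confidence density π̂_t(θ | x^(m)), using a normal curve with a placeholder standard error rather than the actual ABC construction of Section 5.

```r
# Averaging confidence densities, formula (4.3).  abc.density() is a
# hypothetical stand-in: a real implementation would return the ABC
# confidence density pi.hat(theta | x); a normal curve with a
# placeholder standard error is assumed here purely for illustration.
abc.density <- function(x, theta.grid) {
  est <- theta.hat(x)                    # point estimate on the imputed data
  se  <- 0.1 * abs(est)                  # placeholder for a delta-method SE
  dnorm(theta.grid, est, se)
}

avg.conf.density <- function(imputations, theta.grid) {
  dens <- sapply(imputations, abc.density, theta.grid = theta.grid)
  rowMeans(dens)                         # pi.bar(theta | o), formula (4.3)
}
```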
... proximate system of confidence intervals for θ based on data x. The confidence density for θ given x is defined to be

π̂_t(θ | x) = dα/dθ, evaluated at θ = θ̂_x(α). (4.2)

This density assigns probability .01 to θ lying between the .90 and .91 confidence limits, and so on. By definition, the 100αth percentile of π̂_t(θ | x) is θ̂_x(α), so π̂_t(θ | x) is just another way of describing the function θ̂_x(α). But the confidence density is convenient for use in (4.1), giving

π̄_t(θ | o) = (1/M) Σ_{m=1}^{M} π̂_t(θ | x^(m)). (4.3)

Confidence densities can be thought of as a way to automatically marginalize a high-dimensional posterior distribution to a single parameter of interest. If θ̂_x(α) represents a second-order accurate confidence interval endpoint, then ...

... panel is the average of all 50 densities: π̄_t(θ | o), (4.3). Row 5 of Table 2 comprises the appropriate percentiles of π̄_t(θ | o).

The ABC method can also be used as a computationally efficient way to implement the nonparametric bootstrap of Section 2. The dashed curve in the right panel of Figure 5 is π̂_nonpar(θ | o), the ABC confidence density appropriate to the nonparametric bootstrap. Row 2 of Table 2 comprises the percentiles of π̂_nonpar(θ | o).

In this case the multiple imputation confidence limits are too short in the upper tail. Section 5 traces the difficulty to an overly influential student in Table 1, combined with the normal-theory imputations.

5. THE ABC ALGORITHM

The main disadvantage of the BCa method is the large number of bootstrap replications required. This computational burden can often be avoided by using analytical expansions in place of the bootstrap Monte Carlo replications.
DiCiccio and Efron (1992) developed an algorithm called ABC, standing for "approximate bootstrap confidence" intervals, that uses numerical second derivatives to accurately approximate the endpoints of the BCa intervals. The development in that paper is mainly for parametric exponential family problems. Here the algorithm is adapted to nonparametric problems, actually simplifying the calculations. The algorithm is given as an S function in the Appendix.

Given a bootstrap sample o* = (o*_1, o*_2, ..., o*_n), (1.7), let P*_i indicate the proportion of times o_i is represented in o*:

P*_i = #{o*_j = o_i}/n (i = 1, 2, ..., n). (5.1)

With the data o fixed, we can think of a bootstrap replication θ̂* = t(Ĝ*) in Figure 2 as a function of the resampling vector P* = (P*_1, P*_2, ..., P*_n), say θ̂* = T(P*). The resampling vector takes its value in the simplex

S_n = {P: P_i ≥ 0, Σ P_i = 1}. (5.2)

The resampled statistic θ̂* = T(P*) can be thought of as a function on the simplex, forming a resampling surface over S_n, as in figure 6.1 of Efron (1982). The geometry of the resampling surface determines the bootstrap confidence intervals for θ. In the BCa method the surface is explored by evaluating T(P*) for some 2,000 random choices of P*. The ABC algorithm approximates the BCa interval endpoints by exploring the local geometry of the resampling surface, its slopes and curvatures, near the central point of the simplex P^0 = 1/n. This is done using numerical derivatives instead of Monte Carlo, enormously reducing the computational burden. This tactic fails for unsmooth statistics like the sample median, but it has worked well for a large number of examples in DiCiccio and Efron (1992), and also here for the maximum eigenvalue problem. The ABC intervals were proven to be second-order accurate for smooth statistics by DiCiccio and Efron.

Statistical error estimates based on derivatives are familiar from delta method or influence function calculations. For example, the nonparametric delta method estimate of standard error is

σ̂ = [Σ_{i=1}^{n} t̂_i² / n²]^{1/2}. (5.3)

The vector t̂ = (t̂_1, t̂_2, ..., t̂_n) is the empirical influence ...

... quickly computes â en route to the confidence interval limits. The algorithm uses an analytic approximation for z_0 that is often more accurate than (2.13) even for very large numbers of bootstrap replications. In the maximum eigenvalue analysis of Figure 1, the algorithm gave (ẑ_0, â) = (.190, .099).

In addition to the MLE θ̂, the standard intervals θ̂ ± z^(α)σ̂ require only the calculation of σ̂, often taken to be (5.3). The ABC intervals require three more constants, (a, z_0, c_q). Besides the acceleration a and the bias correction z_0, we need the quadratic coefficient c_q,

... (5.6)

Computationally the ABC algorithm is only slightly more ambitious than a delta method analysis of standard error and bias for θ̂. It gives considerably more information, though, in the form of second-order accurate approximate confidence limits for θ. The definitions of a, z_0, and c_q were motivated and explained by DiCiccio and Efron (1992). The Appendix presents a nonparametric version of the ABC algorithm, written in the language S of Becker, Chambers, and Wilks (1988).

The ABC endpoints require a total of 2n + 2 + k recomputations of T(P), with k being the number of endpoints desired. This amounts to 54 recomputations in rows 2 or 4 of Table 2, compared to some 2,000 recomputations for BCa. The number can be further reduced by grouping the data points o_i, say into pairs.

The ABC algorithm requires that the statistic of interest be expressed in the resampling form θ̂* = T(P*). In the maximum eigenvalue example, calculations (1.2)-(1.5) are carried through with weight P*_i on o_i, rather than weight 1/n. We minimize Σ_i Σ_j P*_i [o_ij − (μ + α_i + β_j)]² rather than (1.2), impute x̂*_ij = μ̂* + α̂*_i + β̂*_j for the missing elements of o*, and calculate the weighted covariance matrix

Σ̂* = Σ_{i=1}^{n} P*_i (x̂*_i − μ̄*)(x̂*_i − μ̄*)^T (μ̄* = Σ_{i=1}^{n} P*_i x̂*_i). (5.7)
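The resampling-form machinery (5.1)-(5.3) can be sketched directly: T(P) recomputes the statistic with weight P_i on o_i, and the empirical influence components t̂_i are numerical directional derivatives of T at P^0 = 1/n. The weighted maximum eigenvalue below is a simplified stand-in for the full weighted calculations (5.7), and it assumes a completed data matrix x.

```r
# Numerical empirical influence components and delta-method SE (5.3).
# TP(P, x) recomputes the statistic with weight P_i on row i; here a
# hypothetical weighted version of the maximum eigenvalue statistic,
# assuming a completed (imputed) n x p data matrix x.
TP <- function(P, x) {
  mu <- colSums(P * x)                   # weighted mean vector
  S  <- t(x) %*% (P * x) - outer(mu, mu) # weighted covariance as in (5.7)
  max(eigen(S, symmetric = TRUE)$values)
}

influence.se <- function(x, eps = 0.001) {
  n  <- nrow(x)
  P0 <- rep(1 / n, n)                    # center of the simplex, P0 = 1/n
  t0 <- TP(P0, x)
  that <- sapply(1:n, function(i) {      # directional derivative toward e_i
    Pi <- (1 - eps) * P0
    Pi[i] <- Pi[i] + eps
    (TP(Pi, x) - t0) / eps
  })
  sqrt(sum(that^2) / n^2)                # nonparametric delta-method SE (5.3)
}
```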
... expectation vector μ̂*(m) and covariance matrix Σ̂*(m), say

Σ̂*(m) = Σ_{i=1}^{n} P*_i (x_i^(m) − μ̂*(m))(x_i^(m) − μ̂*(m))^T. ...

Table 2 shows that the multiple imputation intervals are somewhat too short in the upper direction. The multiple imputation standard error estimate (Rubin and Schenker 1986), which does not involve (5.9), is similarly small:

σ̂_mult = [35,150 + 2,897]^{1/2} = 195.1, (5.12)

compared to the direct delta method estimate σ̂ = 220.0, obtained by applying (5.3) to T(P*) defined from (5.7).

... π̂_t(θ | o); in fact, it was slightly longer-tailed to the right; σ̂(x^(m)) now averaged 226.7. ...

... where P*_k = (P*_{k1}, P*_{k2}, ..., P*_{kn_k}) / Σ_{j=1}^{n_k} P*_{kj}. (5.17)

Then it can be shown that applying the one-sample abc algorithm of the Appendix to S(P*) gives exactly the same confidence intervals as applying the appropriate K-sample abc program to (5.15). The acceleration a required for the BCa intervals is obtained by applying (5.4), (5.5) to S(P*).
There is no gold standard by which to judge Table 2, but the multiple imputation intervals are even 10% shorter than the intervals based on complete data for the 22 students (Efron 1992a, table 1). (The complete data intervals are about ...

6. SUMMARY

Three bootstrap methods for missing-data problems have been presented: nonparametric, full mechanism, and multiple imputation. Here is a brief summary of their advantages and drawbacks.
... of definitional bias and can even be used to assess the definitional bias in estimators like (1.2)-(1.5). There is no equivalent of the ABC algorithm for reducing the computational burden. Nor is there a simple formula like (5.5) for the constant a used in the BCa method (though using â based on (5.5) seems to give reasonable results).

The full-mechanism bootstrap requires knowledge of the concealment mechanism x → o in Figure 3. But it is sometimes of considerable interest, and even necessary, to model the concealment mechanism (see Rubin 1987, chap. 6), in which case this is a less serious disadvantage.

Multiple Imputation Bootstrap. The basic data augmentation identity (3.1) is ideal for handling missing data problems for which there is a genuine Bayes prior. Its application to confidence intervals by way of confidence densities (4.3), (5.9) is computationally straightforward once the problem of sampling from the predictive density f(x | o) is solved. Here we require knowing the conditional density of x given o, but not of o given x as with the full-mechanism bootstrap. Sampling methods like (3.6) or (3.7) are reasonable surrogates for the predictive density. However the maximum eigenvalue example suggests that (4.3), (5.9) may be uncomfortably vulnerable to failures in the parametric assumptions.

The multiple imputation bootstrap can be applied to parametric problems and to arbitrarily complicated data structures. Each multiple imputation x^(m) uses exactly the same set of observed data, with only the imputed numbers varying, so that the θ̂^(m) are better conditioned on o. The method fits in well with the EM algorithm, which is often used to find MLE's in missing data situations. Results like Figure 5 give an assessment of how much the missing data is affecting our answer. Asymptotic properties of the multiple imputation bootstrap, like second-order accuracy, have not yet been investigated.

[Received July 1992. Revised June 1993.]

APPENDIX: A NONPARAMETRIC ABC PROGRAM

The program abcnon, written in the language S of Becker et al. (1988), evaluates the ABC intervals described in Section 5; tt(P) is the resampling function T(P*).

REFERENCES

Becker, R. A., Chambers, J. M., and Wilks, A. R. (1988), The New S Language, Pacific Grove, CA: Wadsworth and Brooks/Cole.
Berger, J. O., and Bernardo, J. M. (1991), "On the Development of the Reference Prior Method," in Proceedings of the 4th Valencia International Meeting on Bayesian Statistics.
Buck, S. F. (1960), "A Method of Estimation of Missing Values in Multivariate Data Suitable for Use With an Electronic Computer," Journal of the Royal Statistical Society, Ser. B, 22, 302-306.
Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977), "Maximum Likelihood Estimation From Incomplete Data Via the EM Algorithm" (with discussion), Journal of the Royal Statistical Society, Ser. B, 39, 1-38.
DiCiccio, T. J., and Efron, B. (1992), "More Accurate Confidence Intervals in Exponential Families," Biometrika, 79.
Efron, B. (1981a), "Censored Data and the Bootstrap," Journal of the American Statistical Association, 76, 312-319.
——— (1981b), "Nonparametric Estimates of Standard Error: The Jackknife, the Bootstrap, and Other Resampling Methods," Biometrika, 68, 589-599.
——— (1982), "The Jackknife, the Bootstrap, and Other Resampling Plans," SIAM CBMS-NSF Monograph, 38.
——— (1987), "Better Bootstrap Confidence Intervals and Bootstrap Approximations," Journal of the American Statistical Association, 82, 171-185.
——— (1992a), "Jackknife-After-Bootstrap Standard Errors and Influence Functions," Journal of the Royal Statistical Society, Ser. B, 54, 83-127.
——— (1992b), "Six Questions Raised by the Bootstrap," in Bootstrap Proceedings Volume, ed. R. LePage, New York: John Wiley.
——— (1993), "Bayes and Likelihood Calculations From Confidence Intervals," Biometrika, 80, 3-26.
Efron, B., and Stein, C. (1981), "The Jackknife Estimate of Variance," The Annals of Statistics, 9, 586-596.
Efron, B., and Tibshirani, R. J. (1986), "Bootstrap Methods for Standard Errors, Confidence Intervals, and Other Measures of Statistical Accuracy," Statistical Science, 1, 54-77.
Fay, R. E. (1986), "Causal Models for Patterns of Nonresponse," Journal of the American Statistical Association, 81, 354-365.
Heitjan, D. F., and Little, R. J. A. (1991), "Multiple Imputation for the Fatal Accident Reporting System," Applied Statistics, 40, 13-29.
Laird, N. M., and Louis, T. A. (1987), "Empirical Bayes Confidence Intervals Based on Bootstrap Samples" (with discussion), Journal of the American Statistical Association, 82, 739-757.
Lazzeroni, L. C., Schenker, N., and Taylor, J. M. G. (1990), "Robustness of Multiple-Imputation Techniques to Model Misspecification," in Proceedings of the Survey Research Methods Section, American Statistical Association.
Little, R. J. A. (1983), "The Ignorable Case," in Incomplete Data in Sample Surveys, Vol. 2, Part VI, New York: Academic Press, pp. 341-382.
Little, R. J. A., and Rubin, D. B. (1987), Statistical Analysis With Missing Data, New York: John Wiley.
Louis, T. A. (1982), "Finding the Observed Information Matrix When Using the EM Algorithm," Journal of the Royal Statistical Society, Ser. B, 44, 226-233.
Mardia, K. V., Kent, J. T., and Bibby, J. M. (1979), Multivariate Analysis, New York: Academic Press.
Rubin, D. B. (1976), "Inference and Missing Data," Biometrika, 63, 581-592.
——— (1978), "Multiple Imputations in Sample Surveys - A Phenomenological Bayesian Approach to Nonresponse," Proceedings of the Survey Research Methods Section, American Statistical Association, pp. 20-34.
——— (1981), "The Bayesian Bootstrap," The Annals of Statistics, 9, 130-134.
——— (1987), Multiple Imputation for Nonresponse in Surveys, New York: John Wiley.
Rubin, D. B., and Schenker, N. (1986), "Multiple Imputation for Interval Estimation From Simple Random Samples With Ignorable Nonresponse," Journal of the American Statistical Association, 81, 366-374.
Tanner, M. A. (1991), Tools for Statistical Inference - Observed Data and Data Augmentation Schemes, New York: Springer-Verlag.
Tanner, M. A., and Wong, W. H. (1987), "The Calculation of Posterior Distributions by Data Augmentation," Journal of the American Statistical Association, 82, 805-811.
Tibshirani, R. (1989), "Noninformative Priors for One Parameter of Many," Biometrika, 76, 604-608.
Wei, G. C. G., and Tanner, M. A. (1990), "A Monte Carlo Implementation of the EM Algorithm and the Poor Man's Data Augmentation Algorithms," Journal of the American Statistical Association, 85, 699-704.