Kriging

Based in part on the article "Kriging" by Paul Switzer, which appeared in the Encyclopedia of Environmetrics.

Introduction

Kriging, at its most fundamental level, is an interpolation method used to convert partial observations of a spatial field to predictions of that field at unobserved locations. In geostatistics, the field may represent the permeability of soil and the observations may come from physical measurements in a well-bore. In another example, of particular importance for environmental problems, the field may represent the concentration of pollutants in a contaminated site and the observations may come from remote sensing equipment. In this article we give a short account of kriging under some basic modeling assumptions and observation scenarios. We give a number of equivalent derivations of the kriging prediction, each shedding light on a particular aspect of kriging. Taken as a whole, these derivations provide a broader perspective than any single derivation of kriging can provide.

Kriging is named after the South African mining engineer D. G. Krige, who during the 1950s developed statistical techniques for predicting ore-grade distributions from ground samples [1]. The term kriging was coined by Matheron [2], one of the most influential contributors to the subject. By now, kriging has become a predominant prediction tool in spatial statistics (see Ref. 3 for a historical account) and can be shown to be related to spline interpolation [4], generalized least squares (GLS) regression, Wiener filtering, and objective analysis in meteorology [5]. Book-length accounts of kriging can be found in Refs 2, 5–13.

One of the fundamental assumptions used in kriging is that the spatial field of interest is a realization of a random field or a stochastic process. The randomness can be a consequence of some physical mechanism generating the spatial field (e.g., a diffusion process driven by random forcing) or may simply be used to account for uncertainty. Either way, the random field model is used to define the kriging predictor as the optimal linear unbiased predictor, where optimality and unbiasedness are defined through the random field model.

In this article, we explore two alternatives for characterizing the kriging prediction: using GLS regression and splines. Using GLS regression is arguably the cleanest and easiest to interpret formula for kriging. In contrast, the spline characterization is useful for relating all three characterizations. The spline characterization also allows one to easily generalize from the point-wise predictions discussed in this article to the prediction of other linear functionals (integrals or derivatives, for example).

The Basic Model

In this section, we present the basic modeling assumptions which provide the main ingredients for kriging and serve as the basis for the generalizations discussed in the section titled Extensions to the Basic Model. We start by letting Y(x) denote the spatial field of interest, where x ranges over ℝ^d. The basic kriging model stipulates the following form for Y:

    Y(x) = ∑_{p=1}^{N} β_p f_p(x) + Z(x)    (1)

where Z is a centered (i.e., mean zero) correlated random field; the functions f_p are known basis functions; and the β_p are unknown coefficients. We assume the field Y is observed at spatial locations x_1, ..., x_n with some additive noise corrupting the observations. In particular, our observations constitute n measurements y_1, ..., y_n, where each y_k has the form

    y_k = Y(x_k) + σ ε_k

where ε_1, ..., ε_n are independent mean zero random variables (also independent of Y) with standard deviation 1. The goal is then to predict Y(x_0) at some spatial location x_0 ∈ ℝ^d. Kriging produces such a prediction, denoted Ŷ(x_0), which is a linear function of the data y_1, ..., y_n.

The additive errors σε_k constitute what is called a nugget effect. The nugget effect can model instrumental noise or the presence of a microscale process, possibly discontinuous, which has correlation length scales much smaller than Z. The nomenclature comes from the geostatistics literature (see Ref. 8), which uses a nugget effect to model the presence of small nuggets which contribute a spatially uncorrelated (or
nearly so) discontinuous field added to the larger-scale concentration variations modeled with Y. When there is no nugget effect (i.e., σ = 0), kriging becomes an exact interpolator.

To fix notation for the remainder of the article, we let K(x, y) ≡ cov(Z(x), Z(y)) denote the covariance function of Z. Let y ≡ (y_1, ..., y_n)^t denote the vector of observations. Let β ≡ (β_1, ..., β_N)^t denote the vector of unknown coefficients and X = [f_p(x_k)]_{k,p} denote the matrix of covariates at the observations, where the rows index the observations and the columns index the covariates. Also let x ≡ (f_1(x_0), ..., f_N(x_0)) be the row vector of the covariates at the prediction location x_0. Finally, the matrix Σ = [K(x_i, x_j)]_{i,j} + σ²I denotes the covariance matrix of the observation vector y, where I is the identity matrix, and Σ_0 = (K(x_0, x_1), ..., K(x_0, x_n)) denotes the row vector of covariances of Y(x_0) with the observations in y.

Kriging with Generalized Least Squares Regression

The most intuitive definition of kriging is easiest to derive when additionally assuming both Z and the observation errors ε_1, ..., ε_n are Gaussian. In actuality, Gaussianity is unnecessary, but it makes the exposition somewhat cleaner. Under the Gaussian assumption one has the familiar regression characterization of the observations: y ∼ N[Xβ, Σ] (i.e., y is a Gaussian vector with mean Xβ and covariance matrix Σ). Even more is true: by Gaussianity of Y, the joint distribution of y and Y(x_0) is given by

    ( Y(x_0) )        ( xβ )   ( K(x_0, x_0)   Σ_0 )
    (   y    )  ∼  N[ ( Xβ ) , (    Σ_0^t       Σ  ) ]    (2)

If β were known, one would simply use the conditional expectation E(Y(x_0) | y) to predict Y(x_0), which can be computed as follows:

    E(Y(x_0) | y) = Σ_0 Σ^{−1} (y − Xβ) + xβ    (3)

However, the kriging model (1) assumes β is unknown. To account for this uncertainty it is natural to use the regression form of the observations, y ∼ N[Xβ, Σ], to estimate β. Indeed, the GLS estimate of β is given by

    β̂ = (X^t Σ^{−1} X)^{−1} X^t Σ^{−1} y

Now, by replacing β in (3) with the GLS estimate β̂, one obtains the following kriging prediction Ŷ(x_0) (even if Gaussianity fails to hold):

    Ŷ(x_0) = Σ_0 Σ^{−1} (y − Xβ̂) + xβ̂    (4)

This is arguably the easiest way to construct the kriging prediction, and it makes clear how one can extend the basic kriging model to more complex observation scenarios and predictions. For example, if one observes linear functionals of Y, then the basic structure of (2) does not change. The only difference is a change in the design matrix X and the covariance matrices Σ and Σ_0. Once these changes are made, (4) gives the appropriate kriging estimate for these generalizations.

In the simplified case when β is known, the kriging estimate Ŷ(x_0) generated from (3) is often referred to as simple kriging. When N = 1 and the covariate f_1(x) is a constant, Ŷ(x_0) from (4) is referred to as ordinary kriging. Finally, the prediction Ŷ(x_0) with no simplifying assumptions on the size of N or the covariates f_p is generally referred to as universal kriging.

Kriging as a Best Linear Unbiased Predictor

Our second characterization of kriging is arguably the most popular. It defines kriging as optimal linear unbiased prediction. One clear advantage of this viewpoint is that it makes clear why one should use kriging: it is unbiased and has the smallest expected (squared) prediction error among all linear unbiased estimates of Y(x_0).

To derive this characterization, first consider the set of linear combinations of the data, λ^t y = ∑_{k=1}^{n} λ_k y_k, where λ ≡ (λ_1, ..., λ_n)^t. One of these linear combinations is the kriging prediction Ŷ(x_0). To isolate which one, first restrict λ by requiring that λ^t y and Y(x_0) have the same expected value, irrespective of the unknown coefficient vector β. We call this condition unbiasedness. To enforce this condition, notice that Eλ^t y = λ^t Xβ and EY(x_0) = xβ. Therefore it is clear that λ^t X = x (or equivalently X^t λ = x^t) is a sufficient condition for unbiasedness. Of course, the vector x^t must be a member of the span of the columns of X^t for such a vector λ to exist. Assuming there exists at least one such λ, we can isolate the optimal unbiased linear predictor—the kriging predictor—by minimizing the mean square prediction
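The GLS recipe above (form Σ, Σ_0, and X, compute the GLS estimate β̂, then plug into (4)) translates directly into code. The following Python sketch is illustrative rather than canonical: the squared-exponential covariance, the single constant covariate (i.e., ordinary kriging), and the function name `kriging_predict` are all assumptions chosen for this example, not constructs from the article.

```python
import numpy as np

def kriging_predict(x0, xs, y, sigma, cov, basis):
    """Universal kriging prediction of Y(x0) from noisy observations y
    at locations xs, via the GLS formulas (3)-(4):
        beta_hat = (X' S^-1 X)^-1 X' S^-1 y
        Y_hat(x0) = S0 S^-1 (y - X beta_hat) + x beta_hat
    cov(a, b) plays the role of K(x, y), basis(a) returns the covariate
    vector (f_1(a), ..., f_N(a)), and sigma is the nugget standard deviation."""
    n = len(xs)
    X = np.array([basis(xk) for xk in xs])                 # n x N design matrix
    S = np.array([[cov(xi, xj) for xj in xs] for xi in xs]) \
        + sigma**2 * np.eye(n)                             # Sigma = [K(x_i,x_j)] + sigma^2 I
    S0 = np.array([cov(x0, xk) for xk in xs])              # row vector Sigma_0
    x = basis(x0)                                          # covariates at x0
    Si_y = np.linalg.solve(S, y)                           # Sigma^{-1} y (no explicit inverse)
    Si_X = np.linalg.solve(S, X)                           # Sigma^{-1} X
    beta_hat = np.linalg.solve(X.T @ Si_X, X.T @ Si_y)     # GLS estimate of beta
    return S0 @ np.linalg.solve(S, y - X @ beta_hat) + x @ beta_hat

# Example (hypothetical data): 1-d field, constant covariate => ordinary kriging
cov = lambda a, b: np.exp(-(a - b) ** 2)                   # assumed Gaussian covariance
basis = lambda a: np.array([1.0])                          # f_1(x) = 1
xs = np.array([0.0, 0.5, 1.0, 1.5])
y = np.array([1.0, 2.0, 0.5, 1.2])

yhat = kriging_predict(0.7, xs, y, sigma=0.0, cov=cov, basis=basis)
```

With sigma = 0 the predictor reproduces each observed value exactly, consistent with the remark in the section titled The Basic Model that kriging without a nugget effect is an exact interpolator.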
function K(x, y) (see Refs 7 and 8). To be explicit, when computing the kriging prediction in (4), (6), or (7) and the mean square prediction error (5), one can replace all occurrences of K(x, y) with the function −½ var(Z(x) − Z(y)) and obtain the same quantity. This is important since multiple covariance functions may have the same variogram. For example, the two covariance functions (1 − |x − y|)_+ and |x| + |y| − |x − y| both have the same variogram when x, y ∈ (0, 1). The advantage, then, is that one does not need to spend time deciding between two competing covariance models if they have the same variogram, for they yield the same predictions and mean square prediction errors. Another advantage is that variograms must satisfy a somewhat less stringent requirement (they are conditionally negative definite) than the positive definite condition required of covariance functions (see Section 2.3.6 in Ref. 8). The ubiquity of the variogram in the kriging literature can therefore be partially understood through the simple fact that it is common for kriging models to have at least one of the covariates be a constant function.

The simplification provided by the variogram can be generalized further. In particular, suppose the covariates {f_p(x): p = 1, ..., N} contain all monomials with degree less than or equal to some nonnegative integer k_0. In addition, suppose there exists a function G(x, y) which is related to K(x, y) as follows:

    K(x, y) = G(x, y) + ∑_{0≤|i|≤k_0} a_i(y) x^i + ∑_{0≤|i|≤k_0} b_i(x) y^i + ∑_{0≤|i|+|j|≤2k_0} c_{i,j} x^i y^j    (8)

where i, j are d-dimensional multi-indices; a_i(y) and b_i(x) are arbitrary functions; and c_{i,j} is an arbitrary real number for each i, j. Then to compute (4), (5), (6), or (7) one can simply replace all occurrences of K(x, y) with G(x, y) to obtain the same quantity. A derivation is beyond the scope of this article; however, we remark that the natural way to show this fact is through the kriging characterization given by (6) along with the unbiasedness condition X^t λ = x^t. Any function G(x, y) which satisfies (8) for some positive definite function K(x, y) is said to be a generalized covariance function of order k_0 [7, 12, 14]. A detailed study of the theory of generalized covariance functions can be found in Refs 7 and 14.

Although the form of (8) seems complicated, there are rich classes of functions G(x, y) which satisfy (8). For example, any covariance function K(x, y) can be written as in (8) with k_0 = 0 when G(x, y) is set to −½ var(Z(x) − Z(y)). This recovers the aforementioned fact that if one of the covariates f_p(x) is constant then the kriging prediction only depends on the variogram. Another important class of generalized covariance functions is indexed by a single parameter α > 0 and given by

    G_α(x, y) = (−1)^{1+⌊α/2⌋} |x − y|^α                when α/2 ∉ ℤ
    G_α(x, y) = (−1)^{1+α/2} |x − y|^α log |x − y|      when α/2 ∈ ℤ

where ℤ denotes the set of integers. It can be shown that G_α(x, y) is a generalized covariance function of order k_0 = ⌊α/2⌋ (see Sections 4.5.5 and 4.5.7 in Ref. 7). Therefore, one can use G_α(x, y) in place of K(x, y) in any of the formulas (4), (5), (6), or (7) so long as the set of covariates {f_p(x): p = 1, ..., N} contains all monomials of order less than or equal to ⌊α/2⌋. Note that when 0 < α < 2 the function G_α(x, y) corresponds to a generalized covariance function of order k_0 = 0 for the fractional Brownian covariance K(x, y) = |x|^α + |y|^α − |x − y|^α. Another important case occurs when d = 2, α = 2, and the set of covariates {f_p(x): p = 1, ..., N} is exactly the set of monomials of order less than or equal to k_0 = 1. In this case, kriging with the generalized covariance G_2(x, y) yields the classic thin-plate spline smoother (see Section 2.4 of Ref. 4).

One advantage provided by the decomposition given in (8) is that one need not worry about modeling a_j(y), b_j(x), or c_{i,j} when the set of covariates {f_p(x): p = 1, ..., N} contains all monomials of order less than or equal to k_0, since they have no impact on prediction or prediction mean square error. A particularly cogent example is the invariance of kriging to adding a random, mean zero, polynomial of order k_0 to Z. If the random coefficients have finite second moments, then the only effect on the covariance structure of Z is through a_j(y), b_j(x), and c_{i,j} in (8). Therefore, it is unnecessary to model any such additive random polynomial in Z since it is inconsequential to kriging (so long
as the covariates f_p contain a sufficient number of monomials).

Estimating the Covariance Structure

Throughout the above exposition it has been assumed that the covariance function K(x, y) and the standard deviation σ are both known. In practice, however, this is almost never the case. There is a multitude of different techniques for estimating K and σ. Examples include variogram model fitting, spectral density estimation, maximum likelihood (or REML) estimation, Bayesian techniques, and cross validation. In this section, we present a short discussion of two popular techniques: REML estimation and cross validation.

If K(x, y) is known up to some parameter vector θ, then maximum likelihood estimation (MLE) based on the data y provides a natural approach for estimating θ and σ. One of the difficulties with the MLE, in this case, is the presence of the additional (unknown) coefficients β in the data model y ∼ N[Xβ, Σ_{θ,σ}], where Σ_{θ,σ} denotes the covariance matrix of y as it depends on the unknown parameters θ and σ. REML estimation circumvents this problem by restricting attention to linear combinations of the data which do not depend on β. For example, assume there exists a matrix M whose rows form linearly independent vectors orthogonal to the columns of X. Notice this implies that MX = 0 and therefore My ∼ N[0, M Σ_{θ,σ} M^t], which does not depend on β. The REML estimate is now defined as the maximizer of the likelihood for My over θ and σ. REML estimation is particularly well suited for the generalized covariance functions discussed in the section titled The Variogram and Generalized Covariance Functions. In particular, if the covariates {f_p(x): p = 1, ..., N} contain all monomials with degree less than or equal to some nonnegative integer k_0 and K(x, y) is decomposed as in (8), then the covariance matrix M Σ_{θ,σ} M^t only depends on G(x, y). This fact, along with the discussion presented in the section titled The Variogram and Generalized Covariance Functions, provides a unified methodology for estimation and kriging within the class of generalized covariance functions.

Cross validation is another approach which can be used to estimate σ and a covariance parameter θ. Like REML, cross validation only requires modeling the generalized covariance functions when the covariates are sufficiently rich. The simplest form of cross validation divides the data into a test set and a training set. The training set is used to predict (with kriging) the test set values. Prediction squared error is measured and then minimized to find the optimal values of θ and σ. This is the basic cross validation technique (with many generalizations). Since kriging only depends on the generalized covariance function G(x, y) (when the covariates contain enough monomials), the corresponding cross validation estimates of θ and σ only depend on G(x, y). We mention a minor simplification that occurs when performing cross validation which can reduce the number of kriging parameters by one. When the covariance structure of Z is known up to a proportionality constant, so that cov(Z(x), Z(y)) = θ² K_0(x, y), for example, then the kriging prediction only depends on σ² and θ² through the ratio σ²/θ². In particular, one can simply replace K(x, y) with K_0(x, y) and σ² with σ²/θ² in (4), (6), or (7) to obtain the same value. In doing so, the number of parameters which need to be estimated is reduced by one. Note that a derivation of this fact follows simply from the GLS characterization of kriging given in (4).

Extensions to the Basic Model

The kriging methodology presented here constitutes only a small portion of what appears in the current literature on kriging. In this section, we discuss some of the generalizations of the basic kriging model presented in the section titled The Basic Model, with the understanding that this is far from an exhaustive review. We loosely characterize generalizations as those which preserve some facet of linearity and those which do not.

There is a whole class of kriging problems devoted to linear generalizations of the basic kriging methodology. For example, one can generalize the stipulation, given in the section titled The Basic Model, that the observations and predictions are based on point-wise observations of Y to the case where one observes or predicts linear functionals of Y. For example, in block kriging the problem is to predict a block average |B|^{−1} ∫_B Y(x) dx, where B is some d-dimensional region and |B| is its volume. In another situation, the observations or predictions are derivatives of Y. Another linear generalization, called cokriging, deals
with the multidimensional setting where Y is a random vector field, mapping ℝ^d to ℝ^m for some m > 1. Many of the results found in the sections titled Introduction and Estimating the Covariance Structure extend easily to these examples.

In contrast to the linear case, some generalizations introduce nonlinearity into the kriging methodology. Consider, for example, the situation where the observations or predictions are nonlinear functions of Y (indicators, maximum values, etc.). Another example postulates nonlinearity in the mean function of Y. Yet another considers predictors Ŷ(x_0) which are nonlinear functions of the data y_1, ..., y_n. This is often used when the data are highly non-Gaussian, so that restricting to linear combinations of the data may result in poor performance. Two cogent examples are robust kriging and Bayesian kriging. In robust kriging, one attempts to mitigate the effects of outliers or highly skewed/non-Gaussian distributions. In Bayesian kriging, one incorporates a prior distribution on the mean function of Y(x), which can produce predictors which are nonlinear functions of the data y_1, ..., y_n.

References

[1] Krige, D. (1951). A statistical approach to some basic mine valuation problems on the Witwatersrand, Journal of the Chemical, Metallurgical and Mining Society of South Africa 52, 119–139.
[2] Matheron, G. (1963). Principles of geostatistics, Economic Geology 58, 1246–1266.
[3] Cressie, N. (1990). The origins of kriging, Mathematical Geology 22, 239–252.
[4] Wahba, G. (1990). Spline Models for Observational Data, SIAM, Philadelphia.
[5] Gandin, L. S. (1965). Objective Analysis of Meteorological Fields, Israel Program for Scientific Translations, Jerusalem.
[6] Armstrong, M. (1998). Basic Linear Geostatistics, Springer-Verlag, Berlin.
[7] Chilès, J.-P. & Delfiner, P. (1999). Geostatistics: Modeling Spatial Uncertainty, John Wiley & Sons, New York.
[8] Cressie, N. (1991). Statistics for Spatial Data, John Wiley & Sons, New York.
[9] Goovaerts, P. (1997). Geostatistics for Natural Resources Evaluation, Oxford University Press, Oxford.
[10] Kitanidis, P. K. (1997). Introduction to Geostatistics: Applications to Hydrogeology, Cambridge University Press, Cambridge.
[11] Olea, R. (1999). Geostatistics for Engineers and Earth Scientists, Kluwer, Dordrecht.
[12] Stein, M. (1999). Interpolation of Spatial Data: Some Theory for Kriging, Springer, New York.
[13] Wackernagel, H. (1998). Multivariate Geostatistics, 2nd Edition, Springer-Verlag, Berlin.
[14] Matheron, G. (1973). The intrinsic random functions and their applications, Advances in Applied Probability 5, 439–468.

(See also Kriging, asymptotic theory; Covariogram, examples of; Kriging for functional data; Multivariate kriging; Random field, Gaussian; Random fields; Screening effect)

ETHAN B. ANDERES