Application of Copulas in Geostatistics

Claus P. Haslauer, Jing Li, and András Bárdossy

Abstract This paper demonstrates how empirical copulas can be used to describe
and model spatial dependence structures of real-world environmental datasets in the
purest form and how such a copula model can be employed as the underlying structure for interpolation and associated uncertainty estimates.
Using copulas, the dependence of multivariate distributions is modelled by the
joint cumulative distribution of the variables using uniform marginal distribution
functions. The uniform marginal distributions are the effect of transforming the
marginal distributions monotonically by using the ranks of the variables. Due to
the uniform marginal distributions, copulas express the dependence structure of the
variables independent of the variables' marginal distributions, which means that copulas display interdependence between variables in its purest form. This property also
means that marginal distributions of the original data have no influence on the spatial dependence structure and cannot cover up parts of the spatial dependence
structure. Additionally, differences in the degree of dependence between different
quantiles of the variables are readily identified by the shape of the contours of an
empirical copula density.
Regarding the quantification of uncertainties, copulas offer a significant advantage: the full distribution function of the interpolated parameter at every interpolation point is available. The magnitude of uncertainty does not depend on the density
of the observation network only, but also on the magnitude of the measurements as
well as on the gradient of the magnitude of the measurements. This means that, for the
same configuration of the observation network, the confidence intervals obtained when
interpolating two events with very similar marginal distributions can look significantly
different.

C.P. Haslauer


Department of Hydrology and Geohydrology, Institute of Hydraulic Engineering,
University of Stuttgart, Germany
e-mail: claus.haslauer@iws.uni-stuttgart.de
J. Li and A. Bárdossy
Institut für Wasserbau, Pfaffenwaldring 61, 70569 Stuttgart, Germany
e-mail: Andras.Bardossy@iws.uni-stuttgart.de

P.M. Atkinson and C.D. Lloyd (eds.), geoENV VII Geostatistics for Environmental
Applications, Quantitative Geology and Geostatistics 16,
© Springer Science+Business Media B.V. 2010
DOI 10.1007/978-90-481-2322-3_34

1 Introduction
Generally, the workflow of spatial analysis is to first evaluate the spatial dependence
structure of measured data. In a second step, a stochastic model is employed to
mathematically describe the empirical spatial dependence structure. This theoretical
model is subsequently used for interpolation.
Different disadvantages of traditional geostatistical methods have been recognized in the past, most notably the fact that different percentile values can
have different degrees of dependence which cannot be expressed with traditional
Gaussian two-point geostatistics (Journel and Alabert, 1989). Additionally, several
assumptions have to be made when dealing with spatially distributed data. The most
basic assumption of any geostatistical analysis is that the set of measured parameter values $z_1, \ldots, z_n$ is a realization of a random function. At every location there
are never enough measurements to determine the characteristics of the distribution
function of the parameter, and hence the treatment of measurements as realizations
of a random function is necessary. This random function is assumed to be identical
at every location.
Furthermore, in traditional geostatistics, second order stationarity is assumed,
implying that the two-point covariance exists and depends only on the separation
vector $h$ of those two points. The assumptions when using copulas for spatial analysis are more restrictive than when using traditional geostatistics, because when using
copulas strong stationarity is assumed: the multivariate distribution function is taken
to be translation invariant.
These more restrictive assumptions require more effort but also offer advantages:
1. The marginal distribution, which might distort the dependence structure, is filtered out using copulas. Thus the pure dependence structure of spatially distributed data can be obtained, and this structure is identical, no matter what the
marginal distributions of the measured data might be. Frequently applied monotonic data
transformations (e.g. taking the natural logarithm) do not influence a copula.
2. Different percentile values can have different degrees of dependence. For example, high values might exhibit a strong spatial dependence, low values a weak
spatial dependence, and values of a different quantile range yet another degree of
dependence.
3. At each location where interpolation is carried out, the full conditional distribution function of the interpolated value can be estimated. The shape of this
distribution function is not only dependent on the geometry of the measurement
network, but also on the values of the measurements. These factors allow for an
improved uncertainty quantification of the interpolation.
4. When using copulas as the underlying model for simulation, then values of similar magnitude are simulated to be neighbors.
5. A full stochastic model is the backbone for geostatistical analysis with copulas.
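
The first advantage in the list above can be checked with a few lines of code. The following is a minimal sketch, assuming a rank-based empirical distribution function with the plotting position (rank - 0.5)/n and a hypothetical log-normal sample; neither choice comes from the paper. It shows that the values entering a copula are unchanged when the data are log-transformed.

```python
import math
import random


def empirical_cdf_values(z):
    """Rank-based empirical distribution values in (0, 1); ties are not handled."""
    order = sorted(range(len(z)), key=lambda i: z[i])
    u = [0.0] * len(z)
    for rank, i in enumerate(order, start=1):
        # Plotting position (rank - 0.5) / n is an assumption of this sketch.
        u[i] = (rank - 0.5) / len(z)
    return u


rng = random.Random(0)
# Hypothetical, strongly skewed "measurements".
z = [rng.lognormvariate(0.0, 1.0) for _ in range(200)]

u_raw = empirical_cdf_values(z)
u_log = empirical_cdf_values([math.log(v) for v in z])

# The logarithm is strictly increasing, so the ranks (and hence the
# copula-space values) are the same for the raw and transformed data.
same_ranks = u_raw == u_log
print(same_ranks)
```

Because only ranks enter the copula, any strictly increasing transformation leaves the empirical copula untouched, whereas a variogram computed from `z` and from `log(z)` would in general differ.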

2 Methods
This section explains the steps necessary to analyze spatially distributed data using copulas, to estimate the parameters of a theoretical copula function, and to
interpolate.

2.1 Using Copulas for Spatial Analysis


Any multivariate distribution $F(t_1, \ldots, t_n)$ can be represented with a copula (Sklar, 1959):

$$F(t_1, \ldots, t_n) = C\bigl(F_{t_1}(t_1), \ldots, F_{t_n}(t_n)\bigr), \tag{1}$$

where $F_{t_i}(t_i)$ represents the $i$-th one-dimensional marginal distribution of the multivariate distribution.
Assuming that $C$ is continuous, then the copula density $c(u_1, \ldots, u_n)$ can be
written as

$$c(u_1, \ldots, u_n) = \frac{\partial^n C(u_1, \ldots, u_n)}{\partial u_1 \cdots \partial u_n}. \tag{2}$$
A bivariate copula expresses a symmetrical dependence with respect to the minor
axis $u_2 = 1 - u_1$ of the unit square, if

$$c(u_1, u_2) = c(1 - u_1, 1 - u_2). \tag{3}$$

A Gaussian copula is fully symmetrical; a family of non-Gaussian copulas representing non-symmetrical dependence was introduced in Bárdossy (2006) and
Bárdossy and Li (2008).
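
The symmetry condition in Equation (3) can be verified numerically for the Gaussian case. The sketch below uses the standard closed form of the bivariate Gaussian copula density; the function name and the test values (0.1, 0.3, correlation 0.6) are illustrative choices, not taken from the paper.

```python
import math
from statistics import NormalDist


def gaussian_copula_density(u1, u2, rho):
    """Bivariate Gaussian copula density c(u1, u2) for correlation rho."""
    y1 = NormalDist().inv_cdf(u1)  # normal scores of the uniform values
    y2 = NormalDist().inv_cdf(u2)
    det = 1.0 - rho * rho
    expo = -(rho * rho * (y1 * y1 + y2 * y2) - 2.0 * rho * y1 * y2) / (2.0 * det)
    return math.exp(expo) / math.sqrt(det)


# Equation (3): reflecting a point about the minor axis leaves the density unchanged.
c_a = gaussian_copula_density(0.1, 0.3, rho=0.6)
c_b = gaussian_copula_density(1.0 - 0.1, 1.0 - 0.3, rho=0.6)
print(c_a, c_b)
```

Reflecting (u1, u2) to (1 - u1, 1 - u2) flips the sign of both normal scores, which leaves the quadratic form in the exponent unchanged. This is why a Gaussian copula cannot express stronger dependence for high values than for low values.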
Empirical copulas can be used to describe the spatial variability. For this purpose,
several assumptions are required (Bárdossy and Li, 2008):
1. Similar to the variogram or covariance functions, the bivariate spatial copula of the random variable $Z(x)$ corresponding to two locations separated by the vector $h$ is assumed to depend only on $h$. The marginal distribution of $Z(x)$ is supposed to be the same everywhere.
2. The parameterization of the copula should enable any $n$-dimensional copula corresponding to any selected $n$ points to reflect their spatial configurations.
3. The parameterization of the copula should allow arbitrarily strong dependence.
Gaussian copulas and certain non-Gaussian copulas (as shown by Bárdossy and
Li (2008)) fulfill these conditions. Further details on the theory of copulas can be
found in the books by Joe (1997) and Nelsen (1999). Details on using copulas with
spatially distributed data are given by Bárdossy (2006).

2.2 Empirical Bivariate Copulas


Empirical bivariate spatial copulas describe the dependence
structure between random variables independent of marginal distributions. Such
empirical copulas can be evaluated for different directions and different angles
between pairs of points, and they give insights into the form and the quality of
the spatial dependence structure of a field of spatially distributed values. Empirical
bivariate spatial copulas can be assessed from measured data $z(x_1), \ldots, z(x_n)$ by
first calculating the empirical distribution function $F_n(z)$. Using this distribution
function for any given vector $h$, the set of pairs $S(h)$, consisting of distribution
function values corresponding to the parameter at locations separated by the
vector $h$, can be calculated:

$$S(h) = \bigl\{ \bigl(F_n(z(x_i)), F_n(z(x_j))\bigr) \mid (x_i - x_j \approx h) \text{ or } (x_j - x_i \approx h) \bigr\}. \tag{4}$$

$S(h)$ is thus a set of points in the unit square. Note that $S(h)$ is by definition symmetrical regarding the major axis $u_1 = u_2$ of the unit square, namely, if
$(u_1, u_2) \in S(h)$, then $(u_2, u_1) \in S(h)$.
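
The construction of $S(h)$ in Equation (4) can be sketched as follows. This is an illustration under stated assumptions: two-dimensional coordinates, a hypothetical tolerance `tol` that decides when a separation counts as approximately $h$, the plotting position (rank - 0.5)/n for $F_n$, and synthetic values on a 5 x 5 grid.

```python
import math
import random


def empirical_cdf_values(z):
    """Rank-based empirical distribution values F_n(z) in (0, 1); ties not handled."""
    order = sorted(range(len(z)), key=lambda i: z[i])
    u = [0.0] * len(z)
    for rank, i in enumerate(order, start=1):
        u[i] = (rank - 0.5) / len(z)
    return u


def pair_set(coords, z, h, tol):
    """Return S(h) as a list of (u_i, u_j) pairs for 2-D coordinates.

    A pair qualifies when x_i - x_j lies within `tol` of the vector h;
    the mirrored pair is added so that S(h) is symmetric about u1 = u2.
    """
    u = empirical_cdf_values(z)
    pairs = []
    for i, (xi, yi) in enumerate(coords):
        for j, (xj, yj) in enumerate(coords):
            if i != j and math.hypot(xi - xj - h[0], yi - yj - h[1]) <= tol:
                pairs.append((u[i], u[j]))  # x_i - x_j is approximately h
                pairs.append((u[j], u[i]))  # mirrored entry
    return pairs


# Synthetic example: values on a regular 5 x 5 grid, separation of one grid step in x.
coords = [(float(x), float(y)) for x in range(5) for y in range(5)]
rng = random.Random(1)
z = [rng.gauss(0.0, 1.0) for _ in coords]
S = pair_set(coords, z, h=(1.0, 0.0), tol=0.1)
print(len(S))
```

A two-dimensional histogram of the points in `S` over the unit square then gives the empirical copula density for that separation vector.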
Empirical bivariate copula densities for pairs of points separated by $h$ are not a
prerequisite for fitting a theoretical copula to measurements. They are a way to express and visualize spatial dependence structures. On such density plots,
locations associated with low measurements are plotted close to the origin, and
points where the measured value is high are plotted far from the origin. If the empirical copula density for a certain quantile is high, then there are many pairs of points
separated by the given distance which have the corresponding quantile values. On
Fig. 1, an example of a copula density plot, high copula densities are indicated by
dark shading.
As an alternative to dealing with multiple plots of empirical bivariate copula
densities, two scalar measures can be derived from the empirical copula space:
1. The rank correlation function Rank representing the degree of the spatial dependence (Equation 5).
2. A measure for the symmetry of the empirical copula density function representing for which range of quantiles the density is strongest (Sym, Equation 6).
High positive symmetry values indicate strong dependence for high quantiles,
high negative symmetry values indicate strong dependence for low quantiles. A
Gaussian type dependence would have zero symmetry.
Each of these measures is calculated for a given magnitude and/or angle of
anisotropy of the separation vector $h$. The number of pairs of points for each $h$
is denoted by $n(h)$.

$$\mathrm{Rank}(h) = \frac{12}{n(h)} \sum_{x_i - x_j \approx h} \left(F_n(z(x_i)) - \frac{1}{2}\right) \left(F_n(z(x_j)) - \frac{1}{2}\right) \tag{5}$$

$$\mathrm{Sym}(h) = \frac{1}{n(h)} \sum_{x_i - x_j \approx h} \left[ \left(F_n(z(x_i)) - \frac{1}{2}\right)^2 \left(F_n(z(x_j)) - \frac{1}{2}\right) + \left(F_n(z(x_i)) - \frac{1}{2}\right) \left(F_n(z(x_j)) - \frac{1}{2}\right)^2 \right] \tag{6}$$
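
Both scalar measures are simple averages over the pairs in $S(h)$. The sketch below assumes the pairs are already available as a list of (u_i, u_j) tuples; the function name and the two small test sets are illustrative only.

```python
def rank_and_sym(pairs):
    """Rank(h) and Sym(h) from a list of (u_i, u_j) pairs with values in (0, 1)."""
    n = len(pairs)
    # Equation (5): 12 times the mean product of the centered values.
    rank = 12.0 / n * sum((a - 0.5) * (b - 0.5) for a, b in pairs)
    # Equation (6): mean of the two mixed third-order terms.
    sym = sum(
        (a - 0.5) ** 2 * (b - 0.5) + (a - 0.5) * (b - 0.5) ** 2 for a, b in pairs
    ) / n
    return rank, sym


# Perfect dependence spread evenly over all quantiles: Rank close to 1, Sym close to 0.
n = 1000
comonotone = [((k - 0.5) / n, (k - 0.5) / n) for k in range(1, n + 1)]
rank_c, sym_c = rank_and_sym(comonotone)

# Hand-made pairs that agree closely for high quantiles only: Sym is positive.
upper = [(0.9, 0.9), (0.8, 0.8), (0.2, 0.4), (0.4, 0.2)]
rank_u, sym_u = rank_and_sym(upper)
print(rank_c, sym_c, rank_u, sym_u)
```

The factor 12 normalizes the first measure so that perfect dependence gives a value of 1, since the variance of a uniform variable on (0, 1) is 1/12.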

2.3 Parameter Estimation


The parameterization of a copula model for the description of spatial dependence
is not a trivial task. As shown in Section 2.2, the calculation of spatial copulas is
not based on independent samples (as observations are accounted for in a number of
pairs). Hence fitting a copula model directly to empirical copulas is not appropriate, and instead Bárdossy and Li (2008) proposed a more rigorous approach.
In this approach, the observation set is divided into subsets of arbitrary sizes. The
likelihood of the parameter vector for each subset is given by the copula density of the observations in this subset. The optimal parameter set is the one that
maximizes the product of the likelihoods of the individual subsets.
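
The subset likelihood idea can be sketched in code. This is not the authors' implementation: it assumes a Gaussian copula with a hypothetical exponential correlation function exp(-d / range), random subsets of ten points, synthetic data, and a coarse grid search in place of a proper optimizer.

```python
import math
import random
from statistics import NormalDist


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def gaussian_copula_logdensity(u, corr):
    """log c(u) = -0.5 * (log|R| + y^T (R^-1 - I) y), y being the normal scores of u."""
    y = [NormalDist().inv_cdf(ui) for ui in u]
    low = cholesky(corr)
    w = []  # forward substitution: solve low * w = y, so that w^T w = y^T R^-1 y
    for i in range(len(y)):
        w.append((y[i] - sum(low[i][k] * w[k] for k in range(i))) / low[i][i])
    logdet = 2.0 * sum(math.log(low[i][i]) for i in range(len(y)))
    quad = sum(wi * wi for wi in w) - sum(yi * yi for yi in y)
    return -0.5 * (logdet + quad)


def exp_corr(points, corr_range):
    """Exponential correlation matrix (an assumed model, not from the paper)."""
    return [[math.exp(-math.dist(p, q) / corr_range) for q in points] for p in points]


def subset_loglik(coords, u, subsets, corr_range):
    """Sum of subset log-densities, i.e. the log of the product of subset likelihoods."""
    total = 0.0
    for idx in subsets:
        pts = [coords[i] for i in idx]
        total += gaussian_copula_logdensity([u[i] for i in idx], exp_corr(pts, corr_range))
    return total


# Synthetic field with a true correlation range of 2.0 on a 10 x 10 domain.
rng = random.Random(0)
n = 60
coords = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(n)]
low = cholesky(exp_corr(coords, 2.0))
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
field = [sum(low[i][k] * noise[k] for k in range(i + 1)) for i in range(n)]

# Rank transform to copula space, then split into six random subsets of ten points.
order = sorted(range(n), key=lambda i: field[i])
u = [0.0] * n
for rank, i in enumerate(order, start=1):
    u[i] = (rank - 0.5) / n
shuffled = list(range(n))
rng.shuffle(shuffled)
subsets = [shuffled[k:k + 10] for k in range(0, n, 10)]

candidates = [0.5, 1.0, 2.0, 4.0, 8.0]
logliks = {r: subset_loglik(coords, u, subsets, r) for r in candidates}
best_range = max(logliks, key=logliks.get)
print(best_range, logliks[best_range])
```

Working with subsets keeps each density evaluation low-dimensional, and summing log-densities is numerically safer than multiplying the likelihoods directly. How the subsets should be chosen in practice is described in Bárdossy and Li (2008), not in this sketch.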

2.4 Interpolation Using Copulas


The typical goal of an interpolation method is to estimate a random variable at an unsampled location $x_0$. In Section 3.2, results are discussed using two precipitation
events as examples; this section describes the interpolation algorithm:
1. The observation network consists of $n$ locations $x_1, \ldots, x_n$. At each location there are observations available, $z_1, \ldots, z_n$, which are transformed to $u_1, \ldots, u_n$ by $F(z_i) = u_i$.
2. In the neighborhood of $x_0$, $m$ observation points are selected.
3. The copula density value corresponding to those $m$ locations and their observation values is calculated: $c_m(u_1, \ldots, u_m)$.
4. For the point $x_0$, for a quantile $v$, the $(m+1)$-dimensional copula density $c_{m+1}(u_1, \ldots, u_m, v)$ is calculated.
5. The density function corresponding to $x_0$ conditioned on the $m$ observations in the vicinity is calculated:

$$c^*(v) = c(v \mid u_1, \ldots, u_m) = \frac{c_{m+1}(u_1, \ldots, u_m, v)}{c_m(u_1, \ldots, u_m)} \tag{7}$$

6. The conditional copula $C^*$ is calculated from its density $c^*$.
7. The conditional distribution $C^*$ at $x_0$ is transformed back into the space of the measurement values, where

$$F_{x_0}(z) = P(Z(x_0) \le z) = P(U(x_0) \le F_z(z)) = C^*(F_z(z)). \tag{8}$$

Copula-based interpolation offers choices for the estimated value: the summed
observations weighted by the conditional densities, or the observed value corresponding to the 50% conditional distribution value (comparable to the median). The
length of the interval between two quantiles can serve as a confidence interval of the estimate
(for example, the length of $Q_{80} - Q_{20}$ as a 60% confidence interval).
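
Steps 3 to 6 of the algorithm can be illustrated for one target point. The sketch below is a simplification under stated assumptions: a Gaussian copula stands in for the theoretical copula, the correlation model exp(-d / range) and its range of 2.0 are hypothetical, three neighbours are used, and the conditional copula is integrated with a simple midpoint rule.

```python
import math
from statistics import NormalDist


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def gaussian_copula_density(u, corr):
    """Gaussian copula density at u for correlation matrix corr (assumed model)."""
    y = [NormalDist().inv_cdf(ui) for ui in u]
    low = cholesky(corr)
    w = []  # forward substitution: solve low * w = y
    for i in range(len(y)):
        w.append((y[i] - sum(low[i][k] * w[k] for k in range(i))) / low[i][i])
    logdet = 2.0 * sum(math.log(low[i][i]) for i in range(len(y)))
    quad = sum(wi * wi for wi in w) - sum(yi * yi for yi in y)
    return math.exp(-0.5 * (logdet + quad))


def conditional_copula(u_obs, pts_obs, pt_target, corr_range, n_grid=400):
    """Return c*(v) at cell midpoints and C*(v) at the right cell edges."""
    pts = list(pts_obs) + [pt_target]
    corr_full = [[math.exp(-math.dist(p, q) / corr_range) for q in pts] for p in pts]
    m = len(u_obs)
    corr_obs = [row[:m] for row in corr_full[:m]]
    c_m = gaussian_copula_density(u_obs, corr_obs)          # step 3
    dv = 1.0 / n_grid
    c_star, C_star, acc = [], [], 0.0
    for k in range(n_grid):
        v = (k + 0.5) * dv
        c = gaussian_copula_density(list(u_obs) + [v], corr_full) / c_m  # steps 4 and 5
        acc += c * dv                                        # step 6, midpoint rule
        c_star.append(c)
        C_star.append(acc)
    return c_star, C_star


def conditional_quantile(C_star, q):
    """Smallest right cell edge v at which C*(v) reaches q."""
    for k, value in enumerate(C_star):
        if value >= q:
            return (k + 1) / len(C_star)
    return 1.0


# Three neighbours with high values (in copula space) around the target at the origin.
u_obs = [0.8, 0.7, 0.9]
pts_obs = [(1.0, 0.0), (0.0, 1.0), (-1.0, -1.0)]
c_star, C_star = conditional_copula(u_obs, pts_obs, (0.0, 0.0), corr_range=2.0)
q20, q50, q80 = (conditional_quantile(C_star, q) for q in (0.2, 0.5, 0.8))
interval_length = q80 - q20
print(q20, q50, q80, interval_length)
```

The quantiles here are still in copula space. Step 7 maps them back to measurement values through the inverse of the marginal distribution, so the interval length reported in the paper is the difference of the back-transformed quantiles.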

3 Results
3.1 Analyzing Spatial Dependence Using Copulas
Plots of empirical copula densities are shown for the geological parameter hydraulic
conductivity (Fig. 1a), for the geohydrological parameter pH (Fig. 1b), as well as for
the meteorological parameter precipitation for two precipitation events in the Neckar
catchment in 1982 (Fig. 1c) and in 1992 (Fig. 1d). For the precipitation events, a
monitoring network comprising 950 stations in the German part of the Rhine catchment was available for this study. In all three cases, the empirical copula density
plots are not symmetric as defined in Equation (3), and hence the spatial dependence structure does not follow a Gaussian distribution function. The plots also show
that, for a given process, the dependence structure is different for different quantile
values, indicated by the degree of shading in different areas of the unit square.
The shape of the empirical copula density functions is very similar for both
precipitation events, whereas traditional variograms are quite different, since the marginal distributions for the two events are different. This might be an indication that empirical
copulas represent the physical structure of a given process, without the influence of
marginal distribution functions.

3.2 Interpolation and Associated Uncertainty Estimates


In this section, the two precipitation events for which an empirical bivariate copula
was shown on Fig. 1c and d are used to illustrate results of spatial interpolation and
uncertainty estimates using copulas. For each event the same theoretical copula was
fitted using the method described in Section 2.3.
This theoretical copula was subsequently used to interpolate mean precipitation
intensities, shown on Fig. 2a and b. Estimates were interpolated on an equidistantly
spaced grid for a total of 32,000 points. Each interpolation estimate was conditioned
on 12 surrounding measurement values. The advantage of being able to calculate
the full conditional distribution function of the estimate at each interpolation point
is illustrated on Fig. 2c and d, which show the length of the 60% confidence interval,
calculated by subtracting the 20% quantile from the 80% quantile.

[Figure 1: empirical copula density plots on the unit square, both axes from 0 to 1, density shading scale from 0 to 20]

Fig. 1 Empirical copula density plots for: (a) a hydraulic conductivity field along Borden's cross-section AA (Sudicky, 1986) for 0.05 m vertical spacing, (b) the groundwater quality parameter
pH based on the monitoring network in the province of Baden-Württemberg, Germany (Bárdossy,
2006), and (c), (d) two precipitation events in the river Neckar catchment, based on 950 stations,
for a separation of 5 km
Generally, in areas where the measured precipitation is high (as indicated by
the shading of the dots representing the measurement locations) the uncertainty is
low, and vice versa. Additionally, the copula method takes the homogeneity of the
interpolated field into account: in areas where the gradient of the measurements
is high, the uncertainty of the interpolation is also high. The circle shaped area of
high uncertainty to the east of the bow of the river Neckar in 1992 corresponds
to a confined area of high precipitation intensities, whereas the other area of high
precipitation intensities in 1992, to the west of the river Neckar is a more continuous,
larger area, and hence the uncertainties of the interpolation are smaller in that area.

Fig. 2 Maps of precipitation in the Neckar (blue line) catchment (black line). Measurement locations of precipitation intensity are shown as coloured dots, the colour representing the magnitude
of the precipitation intensity. Precipitation intensities are plotted for an event in 1982 on panel (a),
for an event in 1992 on panel (b). The corresponding 60% confidence intervals are shown on panels
(c) and (d). Panels (e) and (f) show the Ordinary Kriging standard errors (OK StdEr)

[Fig. 2 legend: classes of precipitation intensity, of Q80-Q20 interval length, and of Ordinary Kriging standard error for the 1982 and 1992 events; all units in 0.1 mm]

It is important to stress that the shape of the contours, for the same observation network at two different events, is different when using copulas. Fig. 2e
and f show Ordinary Kriging prediction standard error maps for the two events.
The shadings of both maps have a very similar geometry due to a nearly identical
semivariogram. However, the shape of the confidence intervals of interpolation using copulas is significantly different despite the fact that the same parameters for
the theoretical copula were used for both events. The bull's-eye effect is much
less pronounced when using copulas compared to Kriging, but the effect is still recognizable: near a location where a measurement is available, the uncertainty of the
interpolated value is small. However, this location may happen to lie in
an area where the gradient of the measurement values is high, causing large uncertainties and resulting in an overall medium-range uncertainty.

4 Conclusion
Copulas offer the possibility to describe and model non-Gaussian dependence
structures. Such non-Gaussian dependence structures become evident when analyzing real world datasets with empirical bivariate copulas and the associated scalar
measures Rank and Symmetry presented in this paper. A complete stochastic
model is the backbone of the copula based geostatistical workflow whose use for
interpolation was demonstrated. The same model could be used for simulation purposes. Compared to traditional geostatistical tools, the copula approach takes both
the spatial configuration and the magnitude of the measurements into consideration
when modelling the spatial dependence structure. The full estimation uncertainty
(e.g. confidence intervals) can be obtained, because using copulas provides the
full conditional distribution, which can prove to be beneficial for risk assessment.
The possibility to express heterogeneous uncertainty can be important for value-dependent observation strategies, for example in observation network design.
References
Bárdossy A (2006) Copula-based geostatistical models for groundwater quality parameters. Water
Resour Res 42(W11416) doi:10.1029/2005WR004754
Bárdossy A, Li J (2008) Geostatistical interpolation using copulas. Water Resour Res 44(W07412)
doi:10.1029/2007WR006115
Joe H (1997) Multivariate models and dependence concepts. Number 73 in Monographs on
Statistics and Applied Probability. Chapman & Hall/CRC, London
Journel AG, Alabert F (1989) Non-Gaussian data expansion in the earth sciences. Terra Nova 1
Nelsen RB (1999) An introduction to copulas. Lecture notes in statistics, volume 139. Springer, New York
Sklar M (1959) Fonctions de répartition à n dimensions et leurs marges. Publ Inst Stat Paris
8:229–231
Sudicky EA (1986) A natural gradient experiment on solute transport in a sand aquifer: Spatial
variability of hydraulic conductivity and its role in the dispersion process. Water Resour Res
22(13):2069–2082
