Delta Method
If √n(Tₙ − θ) ⇝ T for some variable T, then we expect that √n(φ(Tₙ) − φ(θ)) ⇝ φ′(θ)T.
In particular, if √n(Tₙ − θ) is asymptotically normal N(0, σ²), then we expect that
√n(φ(Tₙ) − φ(θ)) is asymptotically normal N(0, φ′(θ)²σ²). This is proved in greater
generality in the following theorem.
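As a quick numerical illustration (ours, not part of the text), the heuristic above can be checked by simulation. Here Tₙ is the mean of n exponential variables, θ = 1, σ = 1, and φ(x) = x², so φ′(θ) = 2; all names are our own choices:

```python
# Simulation sketch (illustrative): check that sqrt(n)*(phi(Tn) - phi(theta))
# has standard deviation close to |phi'(theta)|*sigma when sqrt(n)*(Tn - theta)
# is approximately N(0, sigma^2).
import numpy as np

rng = np.random.default_rng(0)
n, reps = 500, 10000
theta, sigma = 1.0, 1.0                 # mean and sd of the Exp(1) law
phi = lambda x: x ** 2                  # phi'(theta) = 2*theta = 2

Tn = rng.exponential(1.0, size=(reps, n)).mean(axis=1)
Z = np.sqrt(n) * (phi(Tn) - phi(theta))
print(round(Z.std(), 2))                # close to |phi'(theta)|*sigma = 2
```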
In the preceding paragraph it is silently understood that Tₙ is real-valued, but we are more
interested in considering statistics φ(Tₙ) that are formed out of several more basic statistics.
Consider the situation that Tₙ = (Tₙ,₁, ..., Tₙ,ₖ) is vector-valued, and that φ: ℝᵏ ↦ ℝᵐ is
a given function defined at least on a neighbourhood of θ. Recall that φ is differentiable at
θ if there exists a linear map (matrix) φ′_θ: ℝᵏ ↦ ℝᵐ such that

φ(θ + h) − φ(θ) = φ′_θ(h) + o(‖h‖),  h → 0.

All the expressions in this equation are vectors of length m, and ‖h‖ is the Euclidean
norm. The linear map h ↦ φ′_θ(h) is sometimes called a "total derivative," as opposed to
partial derivatives. A sufficient condition for φ to be (totally) differentiable is that all partial
derivatives ∂φⱼ(x)/∂xᵢ exist for x in a neighborhood of θ and are continuous at θ. (Just
existence of the partial derivatives is not enough.) In any case, the total derivative is found
from the partial derivatives. If φ is differentiable, then it is partially differentiable, and the
derivative map h ↦ φ′_θ(h) is matrix multiplication by the (m × k) matrix of partial
derivatives (∂φᵢ(θ)/∂xⱼ)ᵢ,ⱼ.
3.1 Theorem. Let φ: 𝔻_φ ⊂ ℝᵏ ↦ ℝᵐ be a map defined on a subset of ℝᵏ and dif-
ferentiable at θ. Let Tₙ be random vectors taking their values in the domain of φ. If
rₙ(Tₙ − θ) ⇝ T for numbers rₙ → ∞, then rₙ(φ(Tₙ) − φ(θ)) ⇝ φ′_θ(T). Moreover, the
difference between rₙ(φ(Tₙ) − φ(θ)) and φ′_θ(rₙ(Tₙ − θ)) converges to zero in probability.
Proof. Because the sequence rₙ(Tₙ − θ) converges in distribution, it is uniformly tight and
Tₙ − θ converges to zero in probability. By the differentiability of φ the remainder function
R(h) = φ(θ + h) − φ(θ) − φ′_θ(h) satisfies R(h) = o(‖h‖) as h → 0. Lemma 2.12 allows
us to replace the fixed h by a random sequence and gives

φ(Tₙ) − φ(θ) − φ′_θ(Tₙ − θ) = R(Tₙ − θ) = o_P(‖Tₙ − θ‖) = o_P(1/rₙ).

Therefore rₙ(φ(Tₙ) − φ(θ)) − φ′_θ(rₙ(Tₙ − θ)) converges to zero in probability, which is the
second assertion; the first assertion follows by the continuous-mapping theorem, because the
linear map φ′_θ is continuous. ∎
3.2 Example (Sample variance). The sample variance of a sample X₁, ..., Xₙ is
S² = n⁻¹ Σᵢ₌₁ⁿ (Xᵢ − X̄)², which can be written as φ(X̄, m₂) for the second sample moment
m₂ = n⁻¹ Σᵢ₌₁ⁿ Xᵢ² and the map φ(x, y) = y − x². With αₖ = EX₁ᵏ, the central limit
theorem gives

√n(X̄ − α₁, m₂ − α₂)ᵀ ⇝ N₂( 0, ( α₂ − α₁²    α₃ − α₁α₂
                                  α₃ − α₁α₂   α₄ − α₂² ) ).

The map φ is differentiable at the point θ = (α₁, α₂)ᵀ, with derivative φ′_{(α₁,α₂)} = (−2α₁, 1).
Thus if the vector (T₁, T₂)ᵀ possesses the normal distribution in the last display, then

√n(S² − σ²) ⇝ −2α₁T₁ + T₂,

which is normally distributed with mean zero and variance μ₄ − σ⁴, for μ₄ = E(X₁ − α₁)⁴
the fourth central moment.
In view of Slutsky's lemma, the same result is valid for the unbiased version n/(n − 1)·S²
of the sample variance, because √n(n/(n − 1) − 1) → 0. □
3.3 Example (Level of the chi-square test). As an application of the preceding example,
consider the chi-square test for testing the variance. Normal theory prescribes to reject the null
hypothesis H₀: μ₂ ≤ 1 for values of nS² exceeding the upper α point χ²_{n−1,α} of the χ²_{n−1}
distribution. If the observations are sampled from a normal distribution, then the test has
exactly level α. Is this still approximately the case if the underlying distribution is not
normal? Unfortunately, the answer is negative.
For large values of n, this can be seen with the help of the preceding result. The central
limit theorem and the preceding example yield the two statements

(χ²_{n−1} − (n − 1)) / √(2n − 2) ⇝ N(0, 1),   √n(S²/μ₂ − 1) ⇝ N(0, κ + 2),

where κ = μ₄/μ₂² − 3 is the kurtosis of the underlying distribution. The first statement
implies that (χ²_{n−1,α} − (n − 1))/√(2n − 2) converges to the upper α point z_α of the standard
normal distribution. Thus the level of the chi-square test satisfies
P_{μ₂=1}( nS² > χ²_{n−1,α} ) = P( √n(S²/μ₂ − 1) > (χ²_{n−1,α} − n)/√n ) → 1 − Φ( z_α √2 / √(κ + 2) ).
The asymptotic level reduces to 1 − Φ(z_α) = α if and only if the kurtosis of the underlying
distribution is 0. This is the case for normal distributions. On the other hand, heavy-tailed
distributions have a much larger kurtosis. If the kurtosis of the underlying distribution is
"close to" infinity, then the asymptotic level is close to 1 − Φ(0) = 1/2. We conclude that
the level of the chi-square test is nonrobust against departures from normality that affect the
value of the kurtosis. At least this is true if the critical values of the test are taken from
the chi-square distribution with (n − 1) degrees of freedom. If, instead, we would use a
normal approximation to the distribution of √n(S²/μ₂ − 1), the problem would not arise,
provided the asymptotic variance κ + 2 is estimated accurately. Table 3.1 gives the level
for two distributions with slightly heavier tails than the normal distribution. □

Table 3.1.

  Law                            Level
  Laplace                        0.12
  0.95 N(0, 1) + 0.05 N(0, 9)    0.12
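The nonrobustness can also be checked by direct simulation. The following sketch (our construction, not from the text) draws unit-variance Laplace samples, for which the kurtosis is κ = 3, and estimates the rejection probability of the nominal 5% test:

```python
# Monte Carlo sketch (illustrative): level of the chi-square variance test
# under a Laplace law scaled to unit variance (kurtosis kappa = 3), n = 20.
import numpy as np

rng = np.random.default_rng(1)
n, reps = 20, 20000
crit = 30.1435                          # upper 0.05 point of chi-square(19)

# Laplace(0, b) has variance 2*b^2, so b = 1/sqrt(2) gives variance 1.
x = rng.laplace(0.0, 1.0 / np.sqrt(2.0), size=(reps, n))
S2 = x.var(axis=1)                      # 1/n normalization, as in the text
level = float(np.mean(n * S2 > crit))
print(round(level, 2))                  # well above the nominal 0.05
```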
In the preceding example the asymptotic distribution of √n(S² − σ²) was obtained by the
delta method. Actually, it can also and more easily be derived by a direct expansion. Write

√n(S² − σ²) = √n( n⁻¹ Σᵢ₌₁ⁿ ((Xᵢ − μ)² − σ²) ) − √n(X̄ − μ)².

The second term converges to zero in probability; the first term is asymptotically normal
by the central limit theorem. The whole expression is asymptotically normal by Slutsky's
lemma.
Thus it is not always a good idea to apply general theorems. However, in many examples
the delta method is a good way to package the mechanics of Taylor expansions in a
transparent way.
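As a sanity check (ours), the limit variance μ₄ − σ⁴ implied by the first term of the expansion can be compared with a simulated variance of √n(S² − σ²). For the Exp(1) law, μ₄ = 9 and σ² = 1, so the limit variance is 8:

```python
# Simulation sketch (illustrative): the variance of sqrt(n)*(S^2 - sigma^2)
# should approach mu_4 - sigma^4; for Exp(1), mu_4 = 9 and sigma^2 = 1.
import numpy as np

rng = np.random.default_rng(2)
n, reps = 1000, 5000
x = rng.exponential(1.0, size=(reps, n))
Z = np.sqrt(n) * (x.var(axis=1) - 1.0)  # S^2 with 1/n normalization
print(round(Z.var(), 1))                # close to mu_4 - sigma^4 = 8
```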
3.4 Example. Consider the joint limit distribution of the sample variance S² and the
t-statistic X̄/S. Again for the limit distribution it does not make a difference whether we
use a factor n or n − 1 to standardize S². For simplicity we use n. Then (S², X̄/S) can be
written as φ(X̄, m₂), for m₂ = n⁻¹ Σᵢ₌₁ⁿ Xᵢ² the second sample moment and the map
φ: ℝ² ↦ ℝ² given by

φ(x, y) = ( y − x², x/(y − x²)^{1/2} ).

The joint limit distribution of √n(X̄ − α₁, m₂ − α₂) is derived in the preceding example. The
map φ is differentiable at θ = (α₁, α₂) provided σ² = α₂ − α₁² is positive, with derivative
the matrix with rows (−2α₁, 1) and (α₂/σ³, −α₁/(2σ³)).
It follows that the sequence √n(S² − σ², X̄/S − α₁/σ) is asymptotically bivariate normally
distributed, with zero mean and covariance matrix φ′_θ Σ (φ′_θ)ᵀ, for Σ the asymptotic
covariance matrix of √n(X̄ − α₁, m₂ − α₂).
3.5 Example (Skewness). The sample skewness of a sample X₁, ..., Xₙ is defined as

lₙ = n⁻¹ Σᵢ₌₁ⁿ (Xᵢ − X̄)³ / S³.

It can be written as φ(X̄, m₂, m₃) for the sample moments mₖ = n⁻¹ Σᵢ₌₁ⁿ Xᵢᵏ and a fixed
function φ. The sequence √n(X̄ − α₁, m₂ − α₂, m₃ − α₃) is asymptotically mean-zero normal by the
central limit theorem, provided EX₁⁶ is finite. The value φ(α₁, α₂, α₃) is exactly the popu-
lation skewness. The function φ is differentiable at the point (α₁, α₂, α₃) and application of
the delta method is straightforward. We can save work by noting that the sample skewness
is location and scale invariant. With Yᵢ = (Xᵢ − α₁)/σ, the skewness can also be written as
φ(m₁(Y), m₂(Y), m₃(Y)), for mₖ(Y) = n⁻¹ Σᵢ₌₁ⁿ Yᵢᵏ. With λ = μ₃/σ³ denoting the skewness
of the underlying distribution, the moments of the Yᵢ satisfy

√n( m₁(Y), m₂(Y) − 1, m₃(Y) − λ )ᵀ ⇝ N( 0, ( 1        λ            κ + 3
                                              λ        κ + 2        μ₅/σ⁵ − λ
                                              κ + 3    μ₅/σ⁵ − λ    μ₆/σ⁶ − λ² ) ).

The derivative of φ at the point (0, 1, λ) equals (−3, −3λ/2, 1). Hence, if T possesses the
normal distribution in the display, then √n(lₙ − λ) is asymptotically normally distributed with
mean zero and variance equal to var(−3T₁ − 3λT₂/2 + T₃). If the underlying distribution
is normal, then λ = μ₅ = 0, κ = 0, and μ₆/σ⁶ = 15. In that case the sample skewness is
asymptotically N(0, 6)-distributed.
An approximate level α test for normality based on the sample skewness could be to
reject normality if √n|lₙ| > √6 z_{α/2}. Table 3.2 gives the level of this test for different
values of n. □
Table 3.2.

  n     Level
  10    0.02
  20    0.03
  30    0.03
  50    0.05
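The table's qualitative pattern, a level creeping up toward the nominal α as n grows, can be reproduced by simulation; the sketch below (our construction) uses α = 0.05:

```python
# Simulation sketch (illustrative): level of the test that rejects normality
# when sqrt(n)*|l_n| > sqrt(6)*z_{alpha/2}, for normal samples, alpha = 0.05.
import numpy as np

rng = np.random.default_rng(3)
z = 1.959964                            # upper 0.025 point of N(0, 1)

def sample_skewness(x):
    m = x.mean(axis=1, keepdims=True)
    s2 = ((x - m) ** 2).mean(axis=1)
    return ((x - m) ** 3).mean(axis=1) / s2 ** 1.5

levels = {}
for n in (10, 50, 500):
    x = rng.standard_normal((10000, n))
    ln = sample_skewness(x)
    levels[n] = float(np.mean(np.sqrt(n) * np.abs(ln) > np.sqrt(6.0) * z))
print(levels)                           # well below 0.05 for small n
```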
For φ chosen such that φ′(θ)σ(θ) ≡ 1, the asymptotic variance is constant and finding an
asymptotic confidence interval for η = φ(θ) is easy. The solution

φ(θ) = ∫ (1/σ(θ)) dθ

is a variance-stabilizing transformation. For the sample correlation coefficient rₙ of a sample
from a bivariate normal distribution, for instance, √n(rₙ − ρ) converges in distribution to
N(0, (1 − ρ²)²). It does not work very well to base an asymptotic confidence interval directly
on this result.
Figure 3.1. Histogram of 1000 sample correlation coefficients, based on 1000 independent
samples of the bivariate normal distribution with correlation 0.6, and histogram of the
arctanh of these values.
The transformation

φ(ρ) = ∫ (1/(1 − ρ²)) dρ = ½ log((1 + ρ)/(1 − ρ)) = arctanh ρ

is variance stabilizing. Thus, the sequence √n(arctanh rₙ − arctanh ρ) converges to a
standard normal distribution for every ρ. This leads to the asymptotic confidence interval
for the correlation coefficient ρ given by

( tanh(arctanh rₙ − z_α/√n), tanh(arctanh rₙ + z_α/√n) ).
Table 3.3 gives an indication of the accuracy of this interval. Besides stabilizing the
variance, the arctanh transformation has the benefit of symmetrizing the distribution of the
sample correlation coefficient (which is perhaps of greater importance), as can be seen in
Figure 3.1.
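The interval is easy to use in practice. The sketch below (our code; z is chosen as the upper 0.025 point, giving a nominal 95% interval) estimates its coverage for bivariate normal data with ρ = 0.6:

```python
# Simulation sketch (illustrative): coverage of the interval
# (tanh(arctanh r - z/sqrt(n)), tanh(arctanh r + z/sqrt(n)))
# for bivariate normal samples with correlation rho = 0.6.
import numpy as np

rng = np.random.default_rng(4)
rho, n, z, reps = 0.6, 50, 1.959964, 5000
cov = [[1.0, rho], [rho, 1.0]]

hits = 0
for _ in range(reps):
    x = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    r = np.corrcoef(x[:, 0], x[:, 1])[0, 1]
    lo = np.tanh(np.arctanh(r) - z / np.sqrt(n))
    hi = np.tanh(np.arctanh(r) + z / np.sqrt(n))
    hits += (lo < rho < hi)
coverage = hits / reps
print(round(coverage, 3))               # near the nominal 0.95
```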
In the one-dimensional case, a Taylor expansion applied to a statistic Tₙ has the form

φ(Tₙ) = φ(θ) + (Tₙ − θ)φ′(θ) + ½(Tₙ − θ)²φ″(θ) + ⋯ .

Usually the linear term (Tₙ − θ)φ′(θ) is of higher order than the remainder, and thus
determines the order at which φ(Tₙ) − φ(θ) converges to zero: the same order as Tₙ − θ.
Then the approach of the preceding section gives the limit distribution of φ(Tₙ) − φ(θ). If
φ′(θ) = 0, this approach is still valid but not of much interest, because the resulting limit
distribution is degenerate at zero. Then it is more informative to multiply the difference
φ(Tₙ) − φ(θ) by a higher rate and obtain a nondegenerate limit distribution. Looking at
the Taylor expansion, we see that the linear term disappears if φ′(θ) = 0, and we expect
that the quadratic term determines the limit behavior of φ(Tₙ).
3.7 Example. Suppose that √n X̄ converges weakly to a standard normal distribution.
Because the derivative of x ↦ cos x is zero at x = 0, the standard delta method of the
preceding section yields that √n(cos X̄ − cos 0) converges weakly to 0. It should be
concluded that √n is not the right norming rate for the random sequence cos X̄ − 1. A
more informative statement is that −2n(cos X̄ − 1) converges in distribution to a chi-square
distribution with one degree of freedom. The explanation is that, by a Taylor expansion of
the cosine at zero,

−2n(cos X̄ − 1) = n X̄² + n o_P(X̄²) = (√n X̄)² + o_P(1) ⇝ χ²₁.
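A numerical check (ours) of the χ²₁ limit for standard normal data:

```python
# Simulation sketch (illustrative): -2n*(cos(Xbar) - 1) = (sqrt(n)*Xbar)^2 + o_P(1)
# should look chi-square with 1 degree of freedom.
import numpy as np

rng = np.random.default_rng(5)
n, reps = 500, 10000
xbar = rng.standard_normal((reps, n)).mean(axis=1)
W = -2.0 * n * (np.cos(xbar) - 1.0)

# chi^2_1 has mean 1 and 0.95 quantile 3.8415
print(round(W.mean(), 2), round(float(np.mean(W <= 3.8415)), 3))
```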
If the sequences Tₙ,ᵢ − θᵢ are of different order, then it may happen, for instance, that the
linear part involving Tₙ,ᵢ − θᵢ is of the same order as the quadratic part involving (Tₙ,ⱼ − θⱼ)².
Thus, it is necessary to determine carefully the rate of all terms in the expansion, and to
rearrange these in decreasing order of magnitude, before neglecting the "remainder."
Several applications of Slutsky's lemma and the delta method yield as limit in law the vector
φ′_θ(T + h) − φ′_θ(h) = φ′_θ(T), if T is the limit in distribution of √n(Tₙ − θₙ). For θₙ → θ
at a slower rate, this argument does not work. However, the same result is true under a
slightly stronger differentiability assumption on φ.
3.8 Theorem. Let φ: ℝᵏ ↦ ℝᵐ be a map defined and continuously differentiable in
a neighborhood of θ. Let Tₙ be random vectors taking their values in the domain of
φ. If rₙ(Tₙ − θₙ) ⇝ T for vectors θₙ → θ and numbers rₙ → ∞, then rₙ(φ(Tₙ) −
φ(θₙ)) ⇝ φ′_θ(T). Moreover, the difference between rₙ(φ(Tₙ) − φ(θₙ)) and φ′_θ(rₙ(Tₙ − θₙ))
converges to zero in probability.
Proof. It suffices to prove the last assertion. Because convergence in probability to zero
of vectors is equivalent to convergence to zero of the components separately, it is no loss
of generality to assume that φ is real-valued. For 0 ≤ t ≤ 1 and fixed h, define gₙ(t) =
φ(θₙ + th). For sufficiently large n and sufficiently small h, both θₙ and θₙ + h are in a
ball around θ inside the neighborhood on which φ is differentiable. Then gₙ: [0, 1] ↦ ℝ is
continuously differentiable with derivative gₙ′(t) = φ′_{θₙ+th}(h). By the mean-value theorem,
gₙ(1) − gₙ(0) = gₙ′(ξₙ) for some 0 ≤ ξₙ ≤ 1. In other words,

Rₙ(h) := φ(θₙ + h) − φ(θₙ) − φ′_θ(h) = φ′_{θₙ+ξₙh}(h) − φ′_θ(h).

By the continuity of the map θ ↦ φ′_θ, there exists for every ε > 0 a δ > 0 such that
‖φ′_{θ′}(h) − φ′_θ(h)‖ ≤ ε‖h‖ for every ‖θ′ − θ‖ < δ and every h. For sufficiently large n and
‖h‖ < δ/2, the vectors θₙ + ξₙh are within distance δ of θ, so that the norm ‖Rₙ(h)‖ of the
right side of the preceding display is bounded by ε‖h‖. Thus, for any η > 0,

P( rₙ‖Rₙ(Tₙ − θₙ)‖ > η ) ≤ P( ‖Tₙ − θₙ‖ ≥ δ/2 ) + P( rₙ‖Tₙ − θₙ‖ ε > η ).

The first term converges to zero as n → ∞. The second term can be made arbitrarily small
by choosing ε small. ∎
*3.5 Moments
So far we have discussed the stability of convergence in distribution under transformations.
We can pose the same problem regarding moments: Can an expansion for the moments of
φ(Tₙ) − φ(θ) be derived from a similar expansion for the moments of Tₙ − θ? In principle
the answer is affirmative, but unlike in the distributional case, in which a simple derivative
of φ is enough, global regularity conditions on φ are needed to argue that the remainder
terms are negligible.
One possible approach is to apply the distributional delta method first, thus yielding the
qualitative asymptotic behavior. Next, the convergence of the moments of φ(Tₙ) − φ(θ)
(or a remainder term) is a matter of uniform integrability, in view of Lemma 2.20. If
φ is uniformly Lipschitz, then this uniform integrability follows from the corresponding
uniform integrability of Tₙ − θ. If φ has an unbounded derivative, then the connection
between moments of φ(Tₙ) − φ(θ) and Tₙ − θ is harder to make, in general.
Notes
The Delta method belongs to the folklore of statistics. It is not entirely trivial; proofs are
sometimes based on the mean-value theorem and then require continuous differentiability in
a neighborhood. A generalization to functions on infinite-dimensional spaces is discussed
in Chapter 20.
PROBLEMS
1. Find the joint limit distribution of (√n(X̄ − μ), √n(S² − σ²)) if X̄ and S² are based on a sample
of size n from a distribution with finite fourth moment. Under what condition on the underlying
distribution are √n(X̄ − μ) and √n(S² − σ²) asymptotically independent?
2. Find the asymptotic distribution of √n(r − ρ) if r is the correlation coefficient of a sample of n
bivariate vectors with finite fourth moments. (This is quite a bit of work. It helps to assume that
the mean and the variance are equal to 0 and 1, respectively.)
3. Investigate the asymptotic robustness of the level of the t-test for testing the mean that rejects
H₀: μ ≤ 0 if √n X̄/S is larger than the upper α quantile of the t_{n−1} distribution.
4. Find the limit distribution of the sample kurtosis kₙ = n⁻¹ Σᵢ₌₁ⁿ (Xᵢ − X̄)⁴/S⁴ − 3, and design an
asymptotic level α test for normality based on kₙ. (Warning: At least 500 observations are needed
to make the normal approximation work in this case.)
5. Design an asymptotic level α test for normality based on the sample skewness and kurtosis jointly.
6. Let X₁, ..., Xₙ be i.i.d. with expectation μ and variance 1. Find constants aₙ and bₙ such that
aₙ(X̄ₙ² − bₙ) converges in distribution if μ = 0 or μ ≠ 0.
7. Let X₁, ..., Xₙ be a random sample from the Poisson distribution with mean θ. Find a variance-
stabilizing transformation for the sample mean, and construct a confidence interval for θ based on
this.
8. Let X₁, ..., Xₙ be i.i.d. with expectation 1 and finite variance. Find the limit distribution of
√n(X̄ₙ⁻¹ − 1). If the random variables are sampled from a density f that is bounded and strictly
positive in a neighborhood of zero, show that E|X̄ₙ⁻¹| = ∞ for every n. (The density of X̄ₙ is
bounded away from zero in a neighborhood of zero for every n.)