
Metrika (2000) 52: 237-252

© Springer-Verlag 2000

Diagnostic smooth tests of fit

Bernhard Klar
Institut für Math. Stochastik, Universität Karlsruhe, Englerstr. 2, D-76128 Karlsruhe, Germany.
email: Bernhard.Klar@math.uni-karlsruhe.de

Received: September 1999

Abstract. Smooth tests are frequently used for testing the goodness of fit of a parametric family of distributions. One reason for the popularity of the smooth tests is the diagnostic properties commonly attributed to them. In recent years, however, it has been realized that these tests are strictly non-diagnostic when used conventionally. The paper examines how the smooth test statistics must be rescaled in order to obtain procedures having diagnostic properties at least for large sample sizes.

Key words: Smooth test, goodness of fit, components, exponential families, bivariate Poisson distribution

1 Introduction

Smooth tests, originally introduced by Jerzy Neyman (1937) to assess the goodness of fit for the uniform distribution, have been extended by several authors to test the hypothesis that an unknown distribution belongs to a parametric family $\{P_\theta\}$ (Thomas and Pierce (1979), Bargal and Thomas (1983), Rayner and Best (1986, 1988, 1989, 1995), Koziol (1986, 1987)). Smooth test statistics usually have the form
$$\hat{\Psi}^2_{n,k} = \sum_{j=s+1}^{s+k} \hat{U}^2_{n,j} \quad \text{with} \quad \hat{U}_{n,j} = \frac{1}{\sqrt{n}} \sum_{i=1}^n h_j(X_i, \hat{\theta}_n). \qquad (1)$$
Here, $X_1, X_2, \ldots, X_n$ are independent observations on a random variable $X$ with unknown distribution $P_\theta$, $\hat{\theta}_n$ is a suitable estimator of the $s$-dimensional parameter (vector) $\theta$, and $\{h_0(\cdot,\theta) \equiv 1,\ h_1(\cdot,\theta),\ h_2(\cdot,\theta), \ldots\}$ are orthonormal polynomials with respect to $P_\theta$, that is,
$$\int h_j(\cdot,\theta)\, h_l(\cdot,\theta)\, dP_\theta = \delta_{jl} \quad (0 \le j, l \le s+k), \qquad (2)$$
where $\delta_{jl}$ denotes Kronecker's delta. Since the polynomial $h_j(\cdot,\theta)$ is of degree $j$ ($j \ge 1$), the $\hat{U}_{n,j}$ are called components of degree $j$. Note, however, that Thomas and Pierce (1979) and Bargal and Thomas (1983) used powers of the distribution function instead of orthonormal functions.
Putting $l = 0$ in (2) yields
$$\int h_j(\cdot,\theta)\, dP_\theta = E_\theta[h_j(X,\theta)] = 0$$
for $j \ge 1$. This equation describes a relation between the first $j$ moments of $X$ which is valid under $P_\theta$.

In case of the Poisson distribution with parameter $\theta$, the orthogonal system is given by the Poisson-Charlier polynomials (see Chihara (1978), Section VI.1); in particular,
$$h_1(x,\theta) = \frac{x-\theta}{\sqrt{\theta}}, \qquad h_2(x,\theta) = \frac{(x-\theta)^2 - x}{\sqrt{2}\,\theta}.$$
For $j = 1$, the equation $E_\theta[h_j(X,\theta)] = 0$ means that the expected value of the Poisson distribution equals $\theta$, whereas, for $j = 2$, it stands for the equality of expectation and variance. Since, putting $\hat{\theta}_n = \bar{X}_n = n^{-1} \sum_{j=1}^n X_j$,
$$\sum_{i=1}^n h_1(X_i, \hat{\theta}_n) = \frac{1}{\sqrt{\hat{\theta}_n}} \left( \sum_{i=1}^n X_i - n\bar{X}_n \right) = 0,$$
the first component $\hat{U}_{n,1}$ is zero; hence $s = 1$ in (1). The reason for this is that the usual estimator of $\theta$ is the method of moments estimator of the mean. Furthermore,
$$\hat{\Psi}^2_{n,1} = \hat{U}^2_{n,2} = \frac{1}{2n}\,(D_n - n)^2,$$
where $D_n = \sum_{j=1}^n (X_j - \bar{X}_n)^2 / \bar{X}_n$ denotes Fisher's index of dispersion. Thus, the smooth test for Poissonity based on the first nonzero component $\hat{U}_{n,2}$ and the dispersion test are equivalent.
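This equivalence is easy to check numerically. The following sketch (assuming NumPy is available; the variable names are ours, not the paper's) evaluates both the squared second-degree component and the rescaled dispersion statistic on a simulated Poisson sample:

```python
import numpy as np

# Numerical check: the squared degree-2 component equals (D_n - n)^2 / (2n)
rng = np.random.default_rng(0)
x = rng.poisson(lam=3.0, size=200).astype(float)
n = len(x)
theta = x.mean()  # moment (= maximum likelihood) estimator of the parameter

# orthonormal Poisson-Charlier polynomial of degree 2, evaluated at the sample
h2 = ((x - theta) ** 2 - x) / (np.sqrt(2.0) * theta)
U2 = h2.sum() / np.sqrt(n)  # component of degree 2

D = ((x - x.mean()) ** 2).sum() / x.mean()  # Fisher's index of dispersion
dispersion_stat = (D - n) ** 2 / (2.0 * n)
```

Both quantities agree up to floating-point rounding, as the algebra above predicts.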
The diagnostic properties commonly attributed to the smooth tests are one reason for their popularity: it is assumed that such tests, in case of rejection of a hypothesis $H_0$, provide some kind of directed diagnosis regarding the kind of departure from $H_0$ of the underlying distribution. The dispersion test, for example, examines the ratio of variance to expected value, which equals one in case of the Poisson distribution. If the ratio of empirical variance to arithmetic mean is not sufficiently close to one, the hypothesis of Poissonity is rejected (at a given significance level). Furthermore, it is believed that the hypothetical model has been rejected because the variance of the underlying true distribution differs from its expectation. Henze and Klar (1996) showed that the conventional use of the dispersion test does not justify this diagnosis since, on one hand, the conclusion refers to a nonparametric population parameter but, on the other hand, it is drawn from the distributional behaviour of the test statistic under the `narrow' parametric hypothesis of Poissonity. In Henze (1997) and Henze and Klar (1996), it was also shown how to modify this and similar tests to obtain procedures having diagnostic properties at least for large sample sizes. The correct approach is to formulate a nonparametric hypothesis that is tailored to the desired kind of directed diagnosis.

In case of the dispersion test, this hypothesis is the equality of expected value and variance. The rejection of this nonparametric hypothesis at a certain significance level leads to the conclusion that the variance of the underlying distribution differs from its expected value.
It is the aim of this paper to generalize these results to smooth tests of arbitrary order. To this end, the joint limit distribution of components is derived in Section 2 not only under the parametric hypothesis, but also under a suitable nonparametric class of distributions. Under the parametric hypothesis, and if the maximum likelihood estimation method is used, the usual representation of the asymptotic distribution of components as limit law of score statistics is regained. In the nonparametric class of distributions, however, the appropriate estimation method is the method of moments, and we show how the components have to be rescaled in order to obtain test statistics which are asymptotically distribution-free within this class.

Section 3 treats smooth tests for certain exponential families for which rescaling is particularly easy. This applies to all cases treated in Rayner and Best (1989).

Since the assertion about the limiting distribution of components in Section 2 remains valid without the assumption that the polynomials $h_j$ are orthogonal, diagnostic smooth tests can be constructed even when no orthogonal system exists. As an example, smooth tests of fit for bivariate Poissonity are discussed in Section 4.

2 The joint limit distribution of components

Let $\{P_\theta : \theta \in \Theta\}$ be a parametric class of probability measures on the Borel sets of $\mathbb{R}$, where $\Theta$ is an open subset of $\mathbb{R}^s$. Suppose that the distributions $P_\theta$ have common support and that they are distinct, i.e. the mapping $\theta \mapsto P_\theta$ is one-to-one. Assume further that
$$\int_{-\infty}^{\infty} |x|^k \, dF_\theta(x) < \infty \quad (k = 1, 2, \ldots),$$
where $F_\theta : \mathbb{R} \to [0,1]$ denotes the distribution function of $P_\theta$. Then, the $k$th moment is given by
$$m_k = \int_{-\infty}^{\infty} x^k \, dF_\theta(x) \quad (k = 1, 2, \ldots).$$
If the set $S(F_\theta) = \{x \mid F_\theta(x+\varepsilon) - F_\theta(x-\varepsilon) > 0 \text{ for each } \varepsilon > 0\}$ of growth points of $F_\theta$ is infinite, there is a unique system of orthonormal polynomials $\{h_0(\cdot,\theta) \equiv 1,\ h_1(\cdot,\theta),\ h_2(\cdot,\theta), \ldots\}$ with respect to $P_\theta$ (see, e.g., Chihara (1978), Theorem I.3.3 and the corollary after Theorem I.2.2).
Suppose $X, X_1, \ldots, X_n$ are independent observations on a random variable $X$ with (unknown) distribution $P$. For testing the composite hypothesis
$$H_0 : P \in \{P_\theta : \theta \in \Theta\},$$
a smooth test of order $k$ pertaining to $\{P_\theta\}$ can be used. This test rejects $H_0$ for large values of
$$\hat{\Psi}^2_{n,k} = \sum_{j=k_0+1}^{k_0+k} \hat{U}^2_{n,j}, \qquad (3)$$
where $0 \le k_0 \le s$ and
$$\hat{U}_{n,j} = \frac{1}{\sqrt{n}} \sum_{i=1}^n h_j(X_i, \hat{\theta}_n).$$
The value of $k_0$ in (3) depends on the estimation method: it is the largest integer for which $\hat{U}_{n,1} = \cdots = \hat{U}_{n,k_0} = 0$. In many examples, $k_0$ equals the dimension $s$ of the parameter vector. The estimator $\hat{\theta}_n$ of $\theta$ is assumed to satisfy the following regularity condition under $H_0$:

(R1) There exists a measurable function $l : \mathbb{R} \times \Theta \to \mathbb{R}^s$ such that
$$\sqrt{n}\,(\hat{\theta}_n - \theta) = \frac{1}{\sqrt{n}} \sum_{i=1}^n l(X_i, \theta) + o_{P_\theta}(1),$$
where $E_\theta[l(X,\theta)] = 0$ and $E_\theta[\|l(X,\theta)\|^2] < \infty$.

The first result establishes the joint asymptotic distribution of the random vector $\hat{U}_n = (\hat{U}_{n,k_0+1}, \ldots, \hat{U}_{n,k_0+k})'$ under $H_0$, where the prime denotes transpose. Thereby, $h(x,\theta) = (h_{k_0+1}(x,\theta), \ldots, h_{k_0+k}(x,\theta))'$, and $\nabla_\theta h(x,\theta)$ is the $(k \times s)$-matrix with entries $\partial h_i(x,\theta)/\partial \theta_j$, where $\theta = (\theta_1, \ldots, \theta_s)'$.

2.1 Theorem. a) Suppose that the coefficients $a_{ji}(\theta)$ in $h_j(x,\theta) = \sum_{i=0}^j a_{ji}(\theta)\, x^i$, $k_0+1 \le j \le k_0+k$, have a continuous derivative with respect to $\theta$. Let $\hat{\theta}_n$ satisfy (R1). Then the limiting distribution of $\hat{U}_n$ under $P_\theta$ is the $k$-variate normal $N(0, \Sigma)$ with covariance matrix
$$\Sigma = E_\theta[v(X,\theta)\, v(X,\theta)'],$$
where $v(x,\theta) = h(x,\theta) + E_\theta[\nabla_\theta h(X,\theta)]\, l(x,\theta)$.

b) Additionally, suppose that $P_\theta$ has a density $f(\cdot,\theta)$ with respect to a $\sigma$-finite measure $\mu$, and that, for fixed $x \in \mathbb{R}$, the derivative of $f(x,\theta)$ with respect to $\theta$ is continuous. Furthermore, assume that, for $k_0+1 \le j \le k_0+k$, the integral of $h_j(\cdot,\theta)\, f(\cdot,\theta)$ with respect to $\mu$ may be differentiated under the integral. Then
$$E_\theta[\nabla_\theta h(X,\theta)] = -C_\theta,$$
where $C_\theta$ is the $(k \times s)$-matrix with entries
$$c_{ij} = E_\theta\!\left[ h_i(X,\theta)\, \frac{\partial \log f(X,\theta)}{\partial \theta_j} \right].$$

c) If $\hat{\theta}_n$ is the maximum likelihood estimator (MLE) of $\theta$, then, under the assumptions of b),
$$\Sigma = I_k - C_\theta\, [I(\theta)]^{-1}\, C_\theta',$$
where $I_k$ denotes the identity matrix of order $k$, and $I(\theta)$ is the Fisher information matrix with entries
$$I_{jk}(\theta) = E_\theta\!\left[ \frac{\partial}{\partial \theta_j} \log f(X,\theta) \cdot \frac{\partial}{\partial \theta_k} \log f(X,\theta) \right].$$
p P n
Proof: Let Un; j g† ˆ 1= n† iˆ1 hj Xi ; g† so that U^n; j ˆ Un; j Q^n †. By the
mean value theorem,

Un; j Q^n † ˆ Un; j Q† ‡ Q^n Q† 0 ‰`g Un; j g†jgˆQ  Š;

where Q  is between Q and Q^n . Using Sluzky's lemma and (R1),


p
Un; j Q^n † ˆ Un; j Q† ‡ n Q^n Q† 0 E‰`g hj X ; g†jgˆQ Š ‡ oPQ 1†

1 X n
ˆ p ‰hj Xi ; Q† ‡ l Xi ; Q† 0 E‰`g hj X ; g†jgˆQ Š ‡ oPQ 1†:
n iˆ1

Since hj X ; Q† and l X ; Q† are centered under PQ , the central limit theorem


yields assertion a). Under the standing assumption, b) follows from

q q
0ˆ ‰EQ hi X ; Q††Š ˆ hi x; Q† f x; Q†m dx†
qQj qQj

q hi x; Q† f x; Q††
ˆ m dx†
qQj

qhi x; Q† q log f x; Q†
ˆ f x; Q†m dx† ‡ hi x; Q† f x; Q†m dx†
qQj qQj
   
qhi X ; Q† q log f X ; Q†
ˆ EQ ‡ EQ hi X ; Q† :
qQj qQj

If a maximum likelihood estimator satis®es (R1), the function l  ; Q† takes


the form l x; Q† ˆ ‰I Q†Š 1  `Q log f x; Q†. Plugging this and the matrix CQ from
b) into the covariance matrix S, and noting that EQ ‰h X ; Q†h X ; Q† 0 Š ˆ Ik by
orthonormality of the hj , assertion c) follows. 9

Remarks:

1. Part a) of Theorem 2.1 is the multivariate version of the assertion in Henze and Klar (1996) concerning a single component. The proof shows that parts a) and b) remain valid without the assumption that the polynomials $h_j$ are orthogonal. Moreover, the $X_i$ may be multivariate random vectors as well (see Section 4).

2. Smooth tests of fit are commonly introduced as score tests which are asymptotically optimal against certain parametric alternatives, similar to the likelihood ratio and the Wald test (Rayner and Best (1989), Chapter 3). This property only holds if the MLE is used. The general distribution theory of score tests then yields the asymptotic distribution given in part c) of Theorem 2.1 (see, e.g., Rayner and Best (1989), p. 80). However, in many cases, for example when testing for normality, it is not possible to define families of smooth alternatives (see Mardia and Kent (1991), p. 356; Kallenberg et al. (1997), p. 45). Hence, this approach is of limited applicability.

The limiting joint normal distribution of the components $\hat{U}_{n,j}$ may be degenerate. In particular, some components may be zero due to the method of estimation. The following lemma shows that this is always the case when using moment estimators.

In the following, we assume that $\theta$ is a unique function of the moments $m_1, \ldots, m_s$. Letting $\hat{m}_l = (1/n) \sum_{i=1}^n X_i^l$ denote the $l$th empirical moment and writing $m_l(\theta) = E_\theta[X^l]$, $l \ge 1$, the moment estimator $\tilde{\theta}_n$ is defined by $m_l(\tilde{\theta}_n) = \hat{m}_l$ for $l = 1, \ldots, s$.

2.2 Lemma. Let $\tilde{\theta}_n$ denote the moment estimator of $\theta$. If $\tilde{\theta}_n \in \Theta$, then
$$\hat{U}_{n,j} = 0, \quad j = 1, \ldots, s.$$

Proof: Write $h_j(x,\theta) = \sum_{l=0}^j a_l(\theta)\, x^l$. Since $E_\theta[h_j(X,\theta)] = 0$ for $j \ge 1$,
$$\sum_{l=0}^j a_l(\theta)\, m_l(\theta) = 0, \quad \theta \in \Theta.$$
Assuming $\tilde{\theta}_n \in \Theta$, it follows that, for $j = 1, \ldots, s$,
$$\frac{1}{\sqrt{n}}\, \hat{U}_{n,j} = \frac{1}{n} \sum_{i=1}^n h_j(X_i, \tilde{\theta}_n) = \sum_{l=0}^j a_l(\tilde{\theta}_n)\, \hat{m}_l = \sum_{l=0}^j a_l(\tilde{\theta}_n)\, m_l(\tilde{\theta}_n) = 0. \quad \Box$$
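Lemma 2.2 can be illustrated numerically in the Poisson case ($s = 1$): plugging the moment estimator into the first-degree polynomial makes the first component vanish identically, not just approximately. A minimal sketch (assuming NumPy; names ours):

```python
import numpy as np

# Illustration of Lemma 2.2: with the moment estimator plugged in, every
# component of degree <= s vanishes (here s = 1, Poisson family)
rng = np.random.default_rng(1)
x = rng.poisson(lam=2.5, size=100).astype(float)
n = len(x)

theta = x.mean()  # moment estimator: matches the first empirical moment
h1 = (x - theta) / np.sqrt(theta)  # orthonormal Charlier polynomial, degree 1
U1 = h1.sum() / np.sqrt(n)         # first component: identically zero
```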

If a smooth test is conducted solely as a goodness-of-fit test for the parametric family $\{P_\theta\}$, other estimation methods may be used instead of the (nonparametric) method of moments. Then, for example, the first component $\hat{U}_{n,1}$, which is proportional to $\sqrt{n}\,(\bar{X}_n - m_1(\hat{\theta}_n))$, examines whether $m_1(\hat{\theta}_n)$ is a `reasonable' estimator of the mean. In general, this will not be the case if $P \notin \{P_\theta\}$. A large or small value of $\hat{U}_{n,1}$ indicates that the parametric model may not be appropriate. If the aim is to use the test as a `diagnostic' procedure, the moment estimator of $\theta$ has to be used, as will be explained in the following.

The equations $E_\theta[h_j(X,\theta)] = 0$, $j = 1, \ldots, s$, determine the moments $m_1, \ldots, m_s$, which depend on $\theta_1, \ldots, \theta_s$. Conversely, $\theta$ is a unique function of $m_1, \ldots, m_s$ by assumption. On the set
$$\mathcal{P} := \left\{ P : \int |x|^j\, P(dx) < \infty,\ j = 1, \ldots, s;\ \theta = \theta(m_1, \ldots, m_s) \in \Theta \right\},$$
we define a functional $d : \mathcal{P} \to \Theta$ by $d(P) = \theta(m_1, \ldots, m_s)$ with $m_1, \ldots, m_s$ being the moments of $P$. Note that
$$E_P[h_j(X, d(P))] = 0, \quad j = 1, \ldots, s, \qquad (4)$$
for each $P \in \mathcal{P}$. To establish convergence in distribution of $\hat{U}_n$ for $P \notin \{P_\theta\}$, a suitable requirement is that the estimator $\hat{\theta}_n$ of $\theta = d(P)$ satisfies condition (R1). In general, this will be the case for the moment estimator, but not for an estimator especially tailored to the parametric model $P_\theta$. Hence, we assume in the following that the moment estimator $\tilde{\theta}_n = \tilde{\theta}_n(\hat{m}_1, \ldots, \hat{m}_s)$ is used, which implies $\hat{U}_{n,j} = 0$, $j = 1, \ldots, s$, by Lemma 2.2. Consequently, the test statistic $\hat{\Psi}^2_{n,k}$ consists of the components $\hat{U}_{n,s+1}, \ldots, \hat{U}_{n,s+k}$, i.e. $k_0 = s$. Suppose $\tilde{\theta}_n$ satisfies (R1). Further, let
$$T_j(P) := \int h_j(\cdot, d(P))\, dP, \qquad (5)$$
and define a nonparametric class of distributions by
$$\mathcal{P}_0 := \{ P \in \mathcal{P} : T_{s+1}(P) = \cdots = T_{s+k}(P) = 0,\ E_P[X^{2(s+k)}] < \infty \}.$$
Obviously, $\mathcal{P}_0$ includes $\{P_\theta\}$: the moments $m_{s+1}, \ldots, m_{s+k}$ of each distribution in $\mathcal{P}_0$ and each distribution $P_\theta$ satisfy the same relations. The assertion of Theorem 2.1 a) carries over to the whole family $\mathcal{P}_0$, throughout replacing $\theta$ by $d(P)$ (cf. Section 3 in Henze and Klar (1996)). Note that the covariance matrix $\Sigma$ in 2.1 a) depends on $P \in \mathcal{P}_0$.

A directed smooth test aims at making the diagnosis that, in case of rejection of the hypothesis, there are deviations in the moments $m_{s+1}, \ldots, m_{s+k}$ from the corresponding moments of the parametric model. Actually, this means testing the hypothesis $\tilde{H}_0 : P \in \mathcal{P}_0$ against $\tilde{H}_1 : P \notin \mathcal{P}_0$. Such a directed diagnosis, however, cannot be achieved by the statistic $\hat{\Psi}^2_{n,k}$ in (3), since $\Sigma$ in 2.1 a), and hence the asymptotic distribution of $\hat{\Psi}^2_{n,k}$, depends on the underlying $P \in \mathcal{P}_0$. To obtain a statistic which is asymptotically distribution-free over the class $\mathcal{P}_0$, $\hat{\Psi}^2_{n,k}$ has to be rescaled appropriately. The next theorem, which is in the setup and notation of Theorem 2.1, shows that such a rescaling can be done quite easily if $\hat{\Psi}^2_{n,k}$ has a limiting $\chi^2$-distribution under $H_0$. This applies to all cases treated in Rayner and Best (1989).

2.3 Theorem. a) Suppose that, under $H_0$, the covariance matrix $\Sigma$ in 2.1 is the identity matrix, and, hence, $\hat{\Psi}^2_{n,k}$ has a limiting chi-squared distribution with $k$ degrees of freedom under $H_0$. Further, let the support of $P \in \mathcal{P}_0$ consist of at least $s+k+1$ elements. Then $\Sigma_P = E_P[h(X, d(P))\, h(X, d(P))']$ is nonsingular, and $\hat{U}_n'\, \Sigma_P^{-1}\, \hat{U}_n$ has a limiting $\chi^2_k$-distribution under $P$ as well.

b) If $n \ge s+k+1$ and if the distribution function of $P$ is continuous, then the $(k \times k)$-matrix
$$\hat{H}_n = \frac{1}{n} \sum_{i=1}^n h(X_i, \hat{\theta}_n)\, h(X_i, \hat{\theta}_n)'$$
is nonsingular with probability 1. Moreover, the rescaled statistic
$$\tilde{\Psi}^2_{n,k} := \hat{U}_n'\, \hat{H}_n^{-1}\, \hat{U}_n$$
has a limiting $\chi^2_k$-distribution under $P$.

Proof: By assumption, $\Sigma = I_k$ under $H_0$. In view of 2.1 a), all entries in $E_\theta[\nabla_\theta h(X,\theta)]$ are zero. Now, $\partial h_r(x,\theta)/\partial \theta_j$ is a polynomial of degree $r$ with respect to $x$:
$$\frac{\partial h_r(x,\theta)}{\partial \theta_j} = \sum_{i=0}^r \frac{\partial a_i(\theta)}{\partial \theta_j}\, x^i = \sum_{i=0}^r c_i(\theta)\, h_i(x,\theta).$$
Since $E_\theta[h_0(X,\theta)] = 1$ and $E_\theta[h_i(X,\theta)] = 0$ ($i \ge 1$), we obtain $c_0(\theta) = 0$ for each $\theta \in \Theta$. In view of (4) and the definition of $\mathcal{P}_0$, this yields $E_P[\nabla_\theta h(X,\theta)|_{\theta = d(P)}] = 0$ for each $P \in \mathcal{P}_0$. Hence, the covariance matrix under $P$ is $\Sigma_P = E_P[h(X, d(P))\, h(X, d(P))']$. Using the assumption on the support of $P$, we obtain, for $c_{s+1}, \ldots, c_{s+k}, \gamma \in \mathbb{R}$,
$$P\!\left( \sum_{j=s+1}^{s+k} c_j\, h_j(X,\theta) = \gamma \right) = P\!\left( \sum_{j=0}^{s+k} c_j'\, X^j = \gamma \right) < 1,$$
where $c_0', \ldots, c_{s+k}' \in \mathbb{R}$ are determined by $c_{s+1}, \ldots, c_{s+k}$. Therefore, the matrix $\Sigma_P$ is nonsingular. Using well-known results, assertion a) follows.

To prove b), note that $\hat{H}_n$ is the empirical version of $\Sigma_P$. Hence, the statement about the distribution of $\tilde{\Psi}^2_{n,k}$ follows from a) if $\hat{H}_n$ is nonsingular with probability one. To this end, let $n \ge s+k+1$ and assume that $X_1, \ldots, X_{s+k+1}$ take different values. Adding the $s+1$ rows $(X_1^j, \ldots, X_{s+k+1}^j)$, $j = 0, \ldots, s$, to the $(k \times (s+k+1))$-matrix with rows
$$(h_{s+j}(X_1, \hat{\theta}_n), \ldots, h_{s+j}(X_{s+k+1}, \hat{\theta}_n)), \quad j = 1, \ldots, k,$$
yields a matrix with $s+k+1$ rows which can be transformed into a (nonsingular) Vandermonde matrix by elementary row operations. Hence, the original matrix is of rank $k$. It follows that $\hat{H}_n$ is nonsingular. □

Remark: If $P$ is a discrete distribution, $\hat{H}_n$ is nonsingular with probability smaller than one. Since the elements of $\hat{H}_n$ converge to the elements of $\Sigma_P$ almost surely, and the determinant of $\hat{H}_n$ depends continuously on these elements, there exists almost surely an integer $n_0$ such that $\hat{H}_n$ is nonsingular for each $n \ge n_0$. Hence, the assertion about the asymptotic distribution of $\tilde{\Psi}^2_{n,k}$ remains valid for discrete distributions satisfying the assumptions of part a).
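Computationally, the rescaled statistic of Theorem 2.3 b) only requires the matrix of polynomial values at the sample. The helper below is a sketch under the assumption that the nonzero components have already been evaluated (function and variable names are ours); it is demonstrated on the one-component Poisson case:

```python
import numpy as np

def rescaled_smooth_stat(h_vals):
    """Rescaled statistic of Theorem 2.3 b): U_n' H_n^{-1} U_n.

    h_vals is the (n, k) array whose (i, j) entry is the value of the
    polynomial of degree s+1+j at X_i, i.e. the terms of the nonzero
    components.
    """
    n = h_vals.shape[0]
    U = h_vals.sum(axis=0) / np.sqrt(n)   # vector of components
    Hn = h_vals.T @ h_vals / n            # empirical covariance matrix H_n
    return float(U @ np.linalg.solve(Hn, U))

# one-component Poisson example: rescaling the dispersion component
rng = np.random.default_rng(2)
x = rng.poisson(lam=4.0, size=500).astype(float)
theta = x.mean()
h2 = ((x - theta) ** 2 - x) / (np.sqrt(2.0) * theta)
stat = rescaled_smooth_stat(h2[:, None])
```

For $k = 1$ the quadratic form reduces to the squared component divided by the empirical second moment of the polynomial values, which provides a simple cross-check.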

3 Smooth tests in exponential families

In this section, we consider a large class of continuous and discrete parametric families of distributions with the property that the pertaining statistics $\hat{\Psi}^2_{n,k}$ have a limiting chi-squared distribution as required in Theorem 2.3. To this end, let $P_\theta$ be an $s$-parameter exponential family with density
$$p_\theta(x) = C(\theta)\, \exp\!\left( \sum_{j=1}^s \zeta_j(\theta)\, t_j(x) \right) h(x)$$
with respect to some $\sigma$-finite measure $\mu$, where $1, t_1(x), \ldots, t_s(x)$ are affinely independent with probability one. We make the additional assumption that the $t_j$ are polynomials of degree less than or equal to $s$. Then, we may assume without loss of generality that $t_j(x) = x^j$ for $1 \le j \le s$.

Each of the parametric families considered in Rayner and Best (1989) belongs to this class of exponential families: in the continuous case the normal and the exponential distribution (or, more generally, the Gamma distribution with known shape parameter); in the discrete case the Poisson, the binomial and the geometric distribution.

If the parameter $\theta$ is one-dimensional, the exponential family is called linear exponential. In the discrete case, these distributions are also termed modified power series distributions (MPSD), since their probability mass functions have the representation
$$p_\theta(x) = \frac{a_x\, [u(\theta)]^x}{h(\theta)}, \quad a_x \ge 0,$$
where $u(\theta)$ is a positive (and usually differentiable) function and $h$ is given by $h(\theta) = \sum_x a_x\, [u(\theta)]^x$. Besides the discrete distributions listed above, the negative binomial, the logarithmic and the Lagrange distribution are MPSD; moreover, each truncated MPSD, like the positive Poisson distribution, again is a modified power series distribution (see Johnson, Kotz and Kemp (1992)).
In exponential families, the maximum likelihood equations are
$$E_\theta[t_j(X)] = \frac{1}{n} \sum_{i=1}^n t_j(x_i), \quad j = 1, \ldots, s$$
(see, e.g., Lehmann (1983), p. 439). In particular, if $t_j(x) = x^j$, we obtain $E_\theta[X^j] = \sum_{i=1}^n x_i^j / n$, i.e. the maximum likelihood equations and the equations defining the method of moments estimators coincide. By Lemma 2.2, the first $s$ components are zero. Since
$$\nabla_\theta[\log p_\theta(x)] = \nabla_\theta[\log C(\theta)] + \sum_{j=1}^s t_j(x)\, \nabla_\theta[\zeta_j(\theta)], \qquad (6)$$
$\nabla_\theta[\log p_\theta(x)]$ is a polynomial with respect to $x$ of degree $s$; hence
$$E_\theta[h_k(X,\theta)\, \nabla_\theta \log p_\theta(X)] = 0, \quad k > s, \qquad (7)$$
where the $h_k(\cdot,\theta)$ denote the orthogonal polynomials of degree $k$. In view of 2.1 c), the next result follows.
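The orthogonality relation (7) can be verified directly in the Poisson family, where the score $x/\theta - 1$ is a polynomial of degree $s = 1$ and is therefore orthogonal to the degree-2 Charlier polynomial. A minimal numerical check (names ours; the expectation is truncated far into the negligible tail):

```python
import math

# Check of (7) for the Poisson family (s = 1)
theta = 2.0

def pmf(x):
    return math.exp(-theta) * theta ** x / math.factorial(x)

def score(x):
    return x / theta - 1.0  # derivative of log p_theta(x) w.r.t. theta

def h2(x):
    return ((x - theta) ** 2 - x) / (math.sqrt(2.0) * theta)

# E_theta[h_2(X) * score(X)] = 0 by orthogonality
inner = sum(h2(x) * score(x) * pmf(x) for x in range(60))
```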

3.1 Theorem. Let $\{P_\theta\}$ be an $s$-parameter exponential family satisfying the assumptions of 2.1 c), and assume that the functions $t_j(x)$ are polynomials of degree less than or equal to $s$. If the maximum likelihood estimator of $\theta$ is used, $\hat{\Psi}^2_{n,k} = \sum_{j=s+1}^{s+k} \hat{U}^2_{n,j}$ has a limiting chi-squared distribution with $k$ degrees of freedom.

Consequently, Theorem 2.3 can be used to modify the smooth test of fit in order to have (at least asymptotically) diagnostic properties: if the $k$ moments under consideration differ strongly from the moments of the hypothetical distribution, the modified test rejects the hypothesis for `sufficiently large' sample size. Here, `sufficiently large' depends on the underlying distribution, which influences the speed of convergence to the limiting distribution.

In general, the necessary modifications are more complicated for parametric families which do not belong to the class described above, as can be seen from Theorem 2.1. Examples are the logistic, the Laplace and the Gumbel distribution, or the Gamma distribution with unknown shape parameter, mentioned in Boulerice and Ducharme (1995).

3.2 Example. As an important example, we consider the smooth test of fit for the normal distribution, based on the first two nonzero components. Obviously, the normal distribution satisfies the assumptions of Theorem 3.1. The orthonormal polynomials with respect to the unit normal distribution are the normalized Hermite polynomials; the first four of these are
$$H_1(x) = x, \qquad H_2(x) = (x^2 - 1)/\sqrt{2},$$
$$H_3(x) = (x^3 - 3x)/\sqrt{6}, \qquad H_4(x) = (x^4 - 6x^2 + 3)/\sqrt{24}.$$
As a consequence, $h_j(x; \mu, \sigma) = H_j((x - \mu)/\sigma)$ ($j \ge 1$) are orthonormal with respect to the normal distribution $N(\mu, \sigma^2)$; the matrix $\hat{H}_n$ in Theorem 2.3 has the entries
$$\hat{H}_{jk} = \frac{1}{n} \sum_{i=1}^n h_j(X_i; \bar{X}_n, \hat{\sigma}_n)\, h_k(X_i; \bar{X}_n, \hat{\sigma}_n), \quad j, k \in \{3, 4\},$$
where $\bar{X}_n$ and $\hat{\sigma}_n$ denote the arithmetic mean and the empirical standard deviation, respectively.
In the class
$$\mathcal{P}_0 = \{ P \in \mathcal{P} : T_3(P) = T_4(P) = 0,\ E_P[X^8] < \infty \},$$
the statistic
$$\tilde{\Psi}^2_{n,2} = \frac{\hat{U}^2_{n,3}\, \hat{H}_{44} - 2\, \hat{U}_{n,3}\, \hat{U}_{n,4}\, \hat{H}_{34} + \hat{U}^2_{n,4}\, \hat{H}_{33}}{\hat{H}_{33}\, \hat{H}_{44} - \hat{H}_{34}^2}$$
has a limiting $\chi^2_2$-distribution. The test based on $\tilde{\Psi}^2_{n,2}$ has diagnostic properties: if the skewness and/or the kurtosis of the underlying distribution differ from the corresponding values ($= 0$ and $3$) of the normal distribution, the nonparametric hypothesis $\tilde{H}_0 : P \in \mathcal{P}_0$ will be rejected, at least for sufficiently large sample sizes. If $\alpha \in (0,1)$ and $\chi^2_{2,1-\alpha}$ denotes the $(1-\alpha)$-quantile of the $\chi^2_2$-distribution, then
$$\lim_{n \to \infty} P(\tilde{\Psi}^2_{n,2} > \chi^2_{2,1-\alpha}) = 1. \qquad (8)$$
This fact can be seen as follows. If $P \notin \mathcal{P}_0$, a Taylor expansion as in the proof of Theorem 2.1 may be used to derive an asymptotic normal distribution for $\hat{U}_{n,j} - \sqrt{n}\, E h_j(X; \bar{X}_n, \hat{\sigma}_n)$, $j = 3, 4$ (assuming $E h_j^2(X; \bar{X}_n, \hat{\sigma}_n) < \infty$). Since $P \notin \mathcal{P}_0$, $E h_3(X; \bar{X}_n, \hat{\sigma}_n) \ne 0$ or $E h_4(X; \bar{X}_n, \hat{\sigma}_n) \ne 0$ and, hence, $|\hat{U}_{n,3}| \to \infty$ or $|\hat{U}_{n,4}| \to \infty$ in probability. A similar reasoning as in the proof of 2.3 b) yields the positive definiteness of $\hat{H}_n$ for large $n$ under $P$. Hence, the consistency result (8) follows.
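The rescaled normality statistic of this example is short enough to sketch in full (assuming NumPy; the function name is ours, and the standardization uses the maximum likelihood estimates as in the example):

```python
import numpy as np

def rescaled_normality_stat(x):
    """Rescaled smooth test statistic of Example 3.2 (sketch)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    z = (x - x.mean()) / x.std()  # standardize with mean and empirical sd
    # orthonormal Hermite polynomials of degree 3 and 4
    h3 = (z ** 3 - 3.0 * z) / np.sqrt(6.0)
    h4 = (z ** 4 - 6.0 * z ** 2 + 3.0) / np.sqrt(24.0)
    U3 = h3.sum() / np.sqrt(n)
    U4 = h4.sum() / np.sqrt(n)
    H33, H44, H34 = (h3 * h3).mean(), (h4 * h4).mean(), (h3 * h4).mean()
    # quadratic form U' H^{-1} U; asymptotically chi-squared(2) over P_0
    return (U3 ** 2 * H44 - 2.0 * U3 * U4 * H34 + U4 ** 2 * H33) / (
        H33 * H44 - H34 ** 2)

rng = np.random.default_rng(3)
stat = rescaled_normality_stat(rng.normal(size=1000))
```

Since the denominator is the determinant of a Gram matrix, the statistic is nonnegative whenever that matrix is nonsingular.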
To get an impression of the speed of convergence of $\tilde{\Psi}^2_{n,2}$ to the asymptotic distribution if $\tilde{H}_0$ holds, we conducted a simulation study for the nominal level $\alpha = 0.1$ and several distributions from $\mathcal{P}_0$. These distributions were taken to be $N(0,1)$, mixtures $p\, N(0,1) + (1-p)\, U(-a,a)$ of a standard normal distribution and a uniform distribution on $(-a,a)$ (denoted by $NU(p,a)$), and mixtures of a $t$-distribution with $\nu$ degrees of freedom and a uniform distribution on $(-a,a)$ with equal mixing proportions (denoted by $tU(\nu,a)$). The parameters $(p,a)$ and $(\nu,a)$ are determined in such a way that the resulting distributions are in $\mathcal{P}_0$.

Table 1 shows the percentage of 10000 Monte Carlo samples declared significant by the tests based on $\hat{\Psi}^2_{n,2}$ and $\tilde{\Psi}^2_{n,2}$ for the sample sizes $n = 50$, 200, 1000 and 10000. Since the convergence of both statistics to their limit distributions is slow even if the underlying distribution is normal, we used empirical critical values (instead of $\chi^2_{2,0.9}$) to ensure that the statistics maintain the nominal level closely under normality. Note that $\hat{\Psi}^2_{n,2}$ coincides with the statistic of Bowman and Shenton (1975). Therefore, the results of Gastwirth and Owens (1977) can be used to determine the limit law of $\hat{\Psi}^2_{n,2}$. If the underlying distribution $P$ belongs to $\mathcal{P}_0$, one obtains
$$\hat{\Psi}^2_{n,2} \xrightarrow{D} \gamma_1 N_1^2 + \gamma_2 N_2^2,$$
where $N_1$ and $N_2$ are independent unit normal random variables, and $\xrightarrow{D}$ denotes weak convergence. The weights are given by
$$\gamma_1 = (E[\tilde{X}^6] - 9)/6, \qquad \gamma_2 = (E[\tilde{X}^8] - 12\, E[\tilde{X}^6] + 99)/24,$$


Table 1. Percentage of 10000 Monte Carlo samples declared significant by the tests based on $\hat{\Psi}^2_{n,2}$ and $\tilde{\Psi}^2_{n,2}$ for several distributions from $\mathcal{P}_0$ ($\alpha = 0.1$)

Distribution      γ1    γ2   |  Ψ̂²: n=50   200   1000  10000  |  Ψ̃²: n=50   200   1000  10000
N(0,1)            1.0   1.0  |       10.6  10.2  10.1  10.1   |       10.2  10.1  10.0  10.2
NU(0.7, 3.506)    .59   .36  |        5.2   2.3   1.2   0.8   |        8.1   4.7   6.2   9.4
NU(0.5, 5.331)    .41   .40  |        4.8   1.3   0.5   0.4   |       11.2   5.9   6.7   9.1
NU(0.7, 1.307)    1.2   1.4  |       11.9  15.1  16.3  17.0   |       15.5  14.9  12.3  10.7
NU(0.5, 1.258)    1.4   1.9  |       14.9  20.3  22.4  25.1   |       21.9  20.5  14.6  11.6
NU(0.3, 1.180)    1.8   3.0  |       20.0  31.5  35.5  37.3   |       32.2  28.7  18.9  11.5
tU(9, 5.892)      .41   .45  |        4.8   1.5   0.7   0.5   |       10.9   6.0   6.3  10.0
tU(85, 1.306)     1.5   2.2  |       14.9  22.9  26.3  28.4   |       22.9  23.7  16.6  11.1
tU(20, 1.494)     1.8   4.2  |       15.7  28.2  37.0  42.7   |       26.5  29.3  22.3  13.8

where $\tilde{X} = (X - EX)/(\operatorname{Var} X)^{1/2}$ is the standardized random variable and $X$ is distributed according to $P$. The values of $\gamma_1$ and $\gamma_2$ for the different distributions are also given in Table 1 to explain the results for $\hat{\Psi}^2_{n,2}$ for large $n$.

The results of the simulations confirm the theoretical findings: for distributions $P \in \mathcal{P}_0$, the empirical size of the test based on $\tilde{\Psi}^2_{n,2}$ approaches its asymptotic value 0.1 for increasing sample sizes, whereas the proportion of the Monte Carlo samples declared significant by the test based on $\hat{\Psi}^2_{n,2}$ tends to a limit between 0 and 1 that depends on $\gamma_1$ and $\gamma_2$. However, it should be noted that the convergence of the empirical size of the test based on $\tilde{\Psi}^2_{n,2}$ is quite slow for long-tailed distributions.
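As a sanity check on the weight formulas, plugging in the standard normal moments $E[\tilde{X}^6] = 15$ and $E[\tilde{X}^8] = 105$ recovers $\gamma_1 = \gamma_2 = 1$, i.e. the $\chi^2_2$ limit:

```python
# Weights of the Gastwirth-Owens limit law, evaluated for a standard normal
m6, m8 = 15.0, 105.0  # sixth and eighth moments of the standard normal
g1 = (m6 - 9.0) / 6.0
g2 = (m8 - 12.0 * m6 + 99.0) / 24.0
```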

4 Diagnostic smooth tests of fit for bivariate Poissonity

Similarly to the univariate Poisson distribution, the bivariate Poisson distribution is an important discrete distribution in bivariate settings. A detailed description is given in Kocherlakota and Kocherlakota (1992). The bivariate Poisson distribution is a 3-parameter family with probability mass function
$$f(r, s; \theta) = e^{-(\lambda_1 + \lambda_2 - \lambda_3)} \sum_{i=0}^{\min(r,s)} \frac{(\lambda_1 - \lambda_3)^{r-i}\, (\lambda_2 - \lambda_3)^{s-i}\, \lambda_3^i}{(r-i)!\, (s-i)!\, i!},$$
where $\theta = (\lambda_1, \lambda_2, \lambda_3)$. Often, the alternative parameterization $\lambda_1^* = \lambda_1 - \lambda_3$ and $\lambda_2^* = \lambda_2 - \lambda_3$ is used.

If the random vector $(X, Y)$ has a bivariate Poisson distribution, $X$ and $Y$ have (univariate) Poisson distributions with parameters $\lambda_1$ and $\lambda_2$, respectively. The factorial moments $\mu_{[r,s]} = E[X^{[r]}\, Y^{[s]}]$, where $x^{[r]} = x(x-1)(x-2) \cdots (x-r+1)$, are given by
$$\mu_{[r,s]} = \lambda_1^r\, \lambda_2^s \sum_{i=0}^{\min(r,s)} \binom{r}{i} \binom{s}{i}\, i!\, \left( \frac{\lambda_3}{\lambda_1 \lambda_2} \right)^{\!i}.$$
Hence, $E[X] = \lambda_1$, $E[Y] = \lambda_2$ and $\operatorname{Cov}(X, Y) = \lambda_3$. Higher non-central moments $\mu_{r,s}' = E[X^r Y^s]$ ($r + s \le 3$) are given by
$$\mu_{2,0}' = \lambda_1^2 + \lambda_1, \qquad \mu_{3,0}' = \lambda_1^3 + 3\lambda_1^2 + \lambda_1,$$
$$\mu_{2,1}' = \lambda_1^2 \lambda_2 + 2\lambda_1 \lambda_3 + \lambda_3 + \lambda_1 \lambda_2;$$
similar equations hold for $\mu_{0,2}'$, $\mu_{0,3}'$ and $\mu_{1,2}'$. Therefore, the corresponding central moments $\mu_{r,s} = E[(X - EX)^r (Y - EY)^s]$ are $\mu_{2,0} = \mu_{3,0} = \lambda_1$, $\mu_{0,2} = \mu_{0,3} = \lambda_2$ and $\mu_{2,1} = \mu_{1,2} = \lambda_3$.
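A direct implementation of the probability mass function above (function name ours) can be used to verify the moment formulas numerically, e.g. that the mass sums to one and that $E[X] = \lambda_1$:

```python
import math

def bipois_pmf(r, s, lam1, lam2, lam3):
    """pmf of the bivariate Poisson distribution in the (lam1, lam2, lam3)
    parameterization (marginal means and covariance)."""
    acc = 0.0
    for i in range(min(r, s) + 1):
        acc += ((lam1 - lam3) ** (r - i) * (lam2 - lam3) ** (s - i)
                * lam3 ** i
                / (math.factorial(r - i) * math.factorial(s - i)
                   * math.factorial(i)))
    return math.exp(-(lam1 + lam2 - lam3)) * acc

# total mass and marginal mean E[X] = lam1, computed over a truncated grid
grid = [(r, s) for r in range(40) for s in range(40)]
total = sum(bipois_pmf(r, s, 2.0, 1.5, 0.5) for r, s in grid)
mean_x = sum(r * bipois_pmf(r, s, 2.0, 1.5, 0.5) for r, s in grid)
```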


As in the previous sections, we want to construct smooth tests with diagnostic properties, but it is in general not possible to define a complete system of orthogonal polynomials with respect to a multivariate distribution (an exception is the multivariate normal distribution). However, appropriate bivariate polynomials can be used to detect deviations of the moments of the underlying distribution from the corresponding moments of the bivariate Poisson distribution. The assertion about the limiting distribution of components in Theorem 2.1 a) remains valid (see the remark after 2.1), but not the remainder of the theorem, since the proof makes use of the orthogonality relations. Instead of restating the general expressions, we consider two examples which are also examined in Rayner and Best (1995).

4.1 Example. In the work of Rayner and Best (1995), smooth tests of fit are derived as score tests with respect to certain alternatives; but again, there is the problem that the `smooth alternatives' do not exist. Rayner and Best (1995) use the maximum likelihood method. The MLEs of $\lambda_1$ and $\lambda_2$ are given by $\hat{\lambda}_1 = \bar{X}$ and $\hat{\lambda}_2 = \bar{Y}$, respectively. The MLE $\hat{\lambda}_3$ of $\lambda_3$ differs from the empirical covariance $S_{XY} = n^{-1} \sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})$ (i.e. from the corresponding estimator derived by the method of moments) and has to be computed by an iterative technique (see Kocherlakota and Kocherlakota (1992), Section 4.7).

Therefore, a test can be based on $h_1(x, y; \theta) = (x - \lambda_1)(y - \lambda_2) = (x - E[X])(y - E[Y])$. The appropriately centered component
$$\hat{U}_{n,1} = \frac{1}{\sqrt{n}} \sum_{i=1}^n \left( (X_i - \hat{\lambda}_1)(Y_i - \hat{\lambda}_2) - \hat{\lambda}_3 \right) = \sqrt{n}\,(S_{XY} - \hat{\lambda}_3)$$
has a limiting normal distribution with expectation 0 under the hypothesis of bivariate Poissonity. Computing the variance of $\hat{U}_{n,1}$, normalizing and squaring yields a test statistic which has a limiting $\chi^2_1$-distribution (Rayner and Best (1995), Section 3).

Obviously, the smooth test which rejects the hypothesis of bivariate Poissonity for large values of $\hat{U}^2_{n,1}$ is not a test having diagnostic properties: a significant test result does not necessarily mean that the covariance structure of the data at hand is incompatible with that of a bivariate Poisson distribution at a given significance level (since $\lambda_3$ can take any positive value). It rather means that the MLE differs significantly from the nonparametric moment estimator, indicating that the hypothetical parametric model may not be appropriate.

A test which aims at detecting differences in certain higher moments has to use the moment estimator (see the remark after Lemma 2.2). In this case, the test based on $\hat{U}_{n,1}$ makes no sense, since the test statistic is always zero.

4.2 Example. By estimating the three parameters of the bivariate Poisson
distribution by the method of moments, we can construct a test which is able
to detect deviations of the variances of the marginal distributions from those
of the hypothetical model. The corresponding test statistic consists of all
nonvanishing components of degree less than or equal to two. A reasonable
choice are the polynomials $h_1(x, y; \vartheta) = (x - \lambda_1)^2$ and $h_2(x, y; \vartheta) = (y - \lambda_2)^2$.
Since $E[h_1(X, Y; \vartheta)] = \mathrm{Var}(X) = \lambda_1$, the first centered component is

$$\hat{U}_{n,1} = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \big( (X_i - \hat{\lambda}_1)^2 - \hat{\lambda}_1 \big) = \sqrt{n}\,(S_{XX} - \hat{\lambda}_1);$$

analogously, the second component is $\hat{U}_{n,2} = \sqrt{n}\,(S_{YY} - \hat{\lambda}_2)$.
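The two components can be computed directly from the data; the following is a minimal sketch (the function name is illustrative, not from the paper), with empirical variances taken with divisor $n$ and the sample means as moment estimators of $\lambda_1$, $\lambda_2$:

```python
import numpy as np

def variance_components(x, y):
    """Centered components of Example 4.2:
    U_{n,1} = sqrt(n)(S_XX - lambda1_hat),
    U_{n,2} = sqrt(n)(S_YY - lambda2_hat).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    lam1, lam2 = x.mean(), y.mean()
    s_xx = ((x - lam1) ** 2).mean()   # S_XX, divisor n
    s_yy = ((y - lam2) ** 2).mean()   # S_YY, divisor n
    return np.sqrt(n) * (s_xx - lam1), np.sqrt(n) * (s_yy - lam2)
```

For $x = (0,1,2,3)$, $y = (1,1,1,1)$ this gives $\hat{U}_{n,1} = 2(1.25 - 1.5) = -0.5$ and $\hat{U}_{n,2} = 2(0 - 1) = -2$.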
It is not difficult to verify that condition (R1) on page 5 is satisfied with
$l_1(x, y; \vartheta) = x - \lambda_1$, $l_2(x, y; \vartheta) = y - \lambda_2$ and $l_3(x, y; \vartheta) = (x - \lambda_1)(y - \lambda_2) - \lambda_3$.
Since $E[\partial h_1 / \partial \lambda_1] = -1$ and $E[\partial h_1 / \partial \lambda_2] = E[\partial h_1 / \partial \lambda_3] = 0$, the first
entry in the covariance matrix $\Sigma$ in Theorem 2.1 a) is

$$s_{11} = E\big[ \big( (X - \lambda_1)^2 - \lambda_1 - (X - \lambda_1) \big)^2 \big]
        = \mu_{4,0} - 2\mu_{3,0} - \lambda_1^2 + \lambda_1 .$$

In a similar way, we get

$$\Sigma = \begin{pmatrix}
\mu_{4,0} - 2\mu_{3,0} - \lambda_1^2 + \lambda_1 & \mu_{2,2} - \mu_{2,1} - \mu_{1,2} + \lambda_3 - \lambda_1 \lambda_2 \\
\cdot & \mu_{0,4} - 2\mu_{0,3} - \lambda_2^2 + \lambda_2
\end{pmatrix} .$$

Replacing all higher moments in $\Sigma$ with the corresponding empirical quanti-
ties and computing $\hat{U}_n' \hat{\Sigma}^{-1} \hat{U}_n$ yields the following assertion.
4.3 Theorem. The test statistic $\hat{C}_n^2$ given by the equation

$$\frac{1}{n} \Big\{ (\hat{\mu}_{4,0} - 2\hat{\mu}_{3,0} - \hat{\lambda}_1^2 + \hat{\lambda}_1)(\hat{\mu}_{0,4} - 2\hat{\mu}_{0,3} - \hat{\lambda}_2^2 + \hat{\lambda}_2)
 - (\hat{\mu}_{2,2} - \hat{\mu}_{2,1} - \hat{\mu}_{1,2} + S_{XY} - \hat{\lambda}_1 \hat{\lambda}_2)^2 \Big\}\, \hat{C}_n^2$$
$$= (\hat{\mu}_{0,4} - 2\hat{\mu}_{0,3} - \hat{\lambda}_2^2 + \hat{\lambda}_2)(S_{XX} - \hat{\lambda}_1)^2
 - 2 (\hat{\mu}_{2,2} - \hat{\mu}_{2,1} - \hat{\mu}_{1,2} + S_{XY} - \hat{\lambda}_1 \hat{\lambda}_2)(S_{XX} - \hat{\lambda}_1)(S_{YY} - \hat{\lambda}_2)
 + (\hat{\mu}_{4,0} - 2\hat{\mu}_{3,0} - \hat{\lambda}_1^2 + \hat{\lambda}_1)(S_{YY} - \hat{\lambda}_2)^2$$

has a limiting chi-squared distribution with two degrees of freedom in the class
of all bivariate discrete distributions having the property that the expectation of
each marginal distribution equals its variance.
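The statistic of Theorem 4.3 can equivalently be evaluated as the quadratic form $\hat{U}_n' \hat{\Sigma}^{-1} \hat{U}_n$, which is algebraically the same as solving the displayed equation when the $2 \times 2$ matrix $\hat{\Sigma}$ is invertible. The following sketch (function and variable names are my own, not the paper's) assembles $\hat{\Sigma}$ from empirical central moments and the moment estimators:

```python
import numpy as np

def diagnostic_variance_test(x, y):
    """C_n^2 of Theorem 4.3 as U_n' Sigma_hat^{-1} U_n.

    Limiting distribution: chi^2 with 2 degrees of freedom within the
    nonparametric class where each marginal mean equals its variance.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    lam1, lam2 = x.mean(), y.mean()           # moment estimators
    dx, dy = x - lam1, y - lam2
    mu = lambda r, s: np.mean(dx**r * dy**s)  # empirical central moment mu_{r,s}
    s_xx, s_yy, s_xy = mu(2, 0), mu(0, 2), mu(1, 1)
    # Entries of Sigma_hat, higher moments replaced by empirical quantities
    s11 = mu(4, 0) - 2 * mu(3, 0) - lam1**2 + lam1
    s22 = mu(0, 4) - 2 * mu(0, 3) - lam2**2 + lam2
    s12 = mu(2, 2) - mu(2, 1) - mu(1, 2) + s_xy - lam1 * lam2
    u = np.sqrt(n) * np.array([s_xx - lam1, s_yy - lam2])
    sigma = np.array([[s11, s12], [s12, s22]])
    return float(u @ np.linalg.solve(sigma, u))
```

At level 0.05 the hypothesis would be rejected if the statistic exceeds the $\chi^2_2$ quantile, approximately 5.991.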

Remark: Using the moments of the bivariate Poisson distribution, the cova-
riance matrix, after simplification, takes the form

$$\Sigma_0 = \begin{pmatrix}
\mu_{4,0} - \lambda_1^2 - \lambda_1 & \mu_{2,2} - \lambda_1 \lambda_2 - \lambda_3 \\
\cdot & \mu_{0,4} - \lambda_2^2 - \lambda_2
\end{pmatrix} .$$

This is the covariance matrix given in Rayner and Best (1995), Section 4. The
corresponding statistic has a limiting $\chi^2_2$-distribution under the hypothesis of
bivariate Poissonity, but not within the wider nonparametric class of distri-
butions defined above.
The last example showed how tests with diagnostic properties can be con-
structed in multivariate settings where no orthogonal system is available. A
general objection against the use of moment estimators is their low efficiency
compared with other estimators; in the case of the bivariate Poisson distribution,
the moment estimator $S_{XY}$ of $\lambda_3$ is not efficient for large values of the corre-
lation coefficient $\rho$ (see Kocherlakota and Kocherlakota (1992), p. 108). This
does not affect the testing procedure; but possibly one would prefer to con-
tinue working with an efficient estimator. Yet this argument does not meet
the intention of a diagnostic test. The aim of a directed test is to examine
whether the data coincide with a certain (simple) model in important charac-
teristics (the first moments). If this is the case, the theoretical model is used
even if it is not the `true' underlying distribution. The term `efficiency', how-
ever, is meaningful only within the parametric model.
Similarly, the statement that diagnostic tests do not always have good
power as tests of fit is directed at the parametric model. However, diagnostic
tests are not new goodness-of-fit statistics for testing the parametric model;
rather, they aim to test a different (nonparametric) hypothesis. Therefore, com-
parisons with the power of goodness-of-fit tests for the parametric model are
of limited meaning.

Acknowledgements. This work is based on a part of the author's doctoral thesis at the University
of Karlsruhe, written under the supervision of Professor Norbert Henze, whose guidance is
gratefully appreciated.

References

Bargal AI, Thomas DR (1983) Smooth goodness of fit tests for the Weibull distribution with
singly censored data. Commun. Statist.-Theory Meth. 12:1431–1447
Boulerice B, Ducharme GR (1995) A note on smooth tests of goodness of fit for location-scale
families. Biometrika 82:437–438
Bowman KO, Shenton BR (1975) Omnibus test contours for departures from normality based on
$\sqrt{b_1}$ and $b_2$. Biometrika 62:243–250
Chihara TS (1978) An introduction to orthogonal polynomials. Gordon and Breach, New York
Gastwirth JL, Owens MEB (1977) On classical tests of normality. Biometrika 64:135–139
Henze N (1997) Do components of smooth tests of fit have diagnostic properties? Metrika 45:121–130
Henze N, Klar B (1996) Properly rescaled components of smooth tests of fit are diagnostic. Aus-
tral. Journ. Statist. 38:61–74
Johnson NL, Kotz S, Kemp AW (1992) Univariate discrete distributions. J. Wiley & Sons, New
York
Kallenberg WCM, Ledwina T, Rafajlowicz E (1997) Testing bivariate independence and nor-
mality. Sankhyā Ser. A 59:42–59
Kocherlakota S, Kocherlakota K (1992) Bivariate discrete distributions. Marcel Dekker, New
York
Koziol JA (1986) Assessing multivariate normality: a compendium. Commun. Statist.-Theory
Meth. 15:2763–2783
Koziol JA (1987) An alternative formulation of Neyman's smooth goodness of fit tests under
composite alternatives. Metrika 34:17–24
Lehmann EL (1983) Theory of point estimation. J. Wiley & Sons, New York
Mardia KV, Kent JT (1991) Rao score tests for goodness of fit and independence. Biometrika
78:355–363
Neyman J (1937) Smooth test for goodness of fit. Skandinavisk Aktuarietidskrift 20:149–199
Rayner JCW, Best DJ (1986) Neyman-type smooth tests for location-scale families. Biometrika
73:437–446
Rayner JCW, Best DJ (1988) Smooth tests of goodness of fit for regular distributions. Commun.
Statist.-Theory Meth. 17:3235–3267
Rayner JCW, Best DJ (1989) Smooth tests of goodness of fit. Oxford University Press, New York
Rayner JCW, Best DJ (1995) Smooth tests for the bivariate Poisson distribution. Austral. J.
Statist. 37:233–245
Thomas DR, Pierce DA (1979) Neyman's smooth goodness-of-fit test when the hypothesis is
composite. J. Amer. Statist. Ass. 74:441–445
