Dornik (1994)
Dornik (1994)
Multivariate Normality
By JURGEN A. DOORNIKy and HENRIK HANSEN
Nueld College, Oxford University of Copenhagen, Denmark
November 24, 1994
SUMMARY
We suggest an easy to use version of the omnibus test for normality using skewness
and kurtosis based on Shenton and Bowman (1977) which controls well for size. A
multivariate version is introduced. Size and power are investigated in comparison
with four other tests for multivariate normality. The alternative hypothesis in the
power simulations is the whole Johnson system of distributions.
1. INTRODUCTION
Testing for normality is a common procedure in much applied work and many tests
have been proposed, see for example the overviews in Mardia (1980), D'Agostino (1982)
and Small (1985). The need for testing normality in a multivariate setting is discussed,
inter alia, by Gnanadesikan (1977, x5.4.2), Cox and Small (1978) and recently Cox and
Wermuth (1994).
A test frequently used is the sum of squares of the standardized sample skewness
and kurtosis, which is asymptotically distributed as a 2 -variate. For small samples the
transformation to approximate 2 as given in Bowman and Shenton (1975) is cumber-
some, and many computer programs only report the asymptotic version of the test.
The next section bases a more convenient test statistic on work by Shenton and
Bowman (1977) (a possibility already mentioned by Pearson, D'Agostino and Bowman,
1977), who give the kurtosis a conditional gamma distribution. The test statistic uses
transformed skewness and kurtosis, and is simple to implement. A multivariate version of
the test is introduced in x3. Size and power comparisons with several other multivariate
tests are given in x5 and x6.
All results in this paper are based on random samples. In practice, however, the
tests will also be applied to regression residuals and residuals from time series models.
yAddress for correspondence: Nueld College, Oxford OX1 1NF, UK
1
an omnibus test for normality 2
Limited numerical results in Pierce and Gray (1982), Pierce (1985) and Lutkepohl and
Schneider (1989) suggest that the size of the tests is largely una ected, but there could
be some loss of power.
x = n1
X
n
x ; m = n1
X
n
p
(x , x) ; b1 = m332 and b2 = m42 :
i
i =1
i i
i=1
i
m2 =
m2
Bowman and Shenton (1975) consider the test ( f denotes `asymptotically dis-
a
p
unsuitable except in very large samples. The statistics b and b are not independently
1 2
distributed (although uncorrelated), and the sample kurtosis especially approaches nor-
mality
p very slowly. They proceed to derive a test based on approximating the distribution
of b1 and b2 by the Johnson system (see e.g. Kendall, Stuart and Ord, 1987, x6.27{36,
or Pearsonpand Hartley, 1972, x18), assuming independence. The Johnson S approxi- U
The test considered here derives from Shenton and Bowman (1977), p who give b2
(conditional on b2 > 1 + b1 ) a gamma distribution. The distribution of b1 is still based
on Johnson S . Let z1 and z2 denote the transformed skewness and kurtosis, where the
U
transformation creates statistics which are much closer to standard normal. The test
]
statistic is ( denotes `approximately distributed as'):
] (2) :
app
E = z12 + z22
p app
2
The transformation for the skewness is as in D'Agostino (1970); the kurtosis is trans-
formed from a gamma distribution to 2 , which is then translated into standard normal
using the Wilson-Hilferty cubed root transformation. The precise form of z1 and z2 is
given in the appendix, together with a numerical example. The advantage of this statis-
tic is that is easy to implement and requires only tables of the 2 distribution. Note
that the formulae break down for n 7.
an omnibus test for normality 3
with sample mean and covariance X = n,1 (X1 + : : : + X ) and S = n,1 X 0X where
n
X 0 = (X1 , X;
: : :; X , X ).
n
Create a matrix with the reciprocals of the standard deviation on the diagonal:
V = diag S11,1 2; : : :; S ,1 2 ;
=
pp
=
with = diag (1; : : :; ), the matrix with the eigenvalues of C on the diagonal. The
n
Using population values for C and V , a multivariate normal can thus be transformed
into independent standard normals; using sample values this is only approximately so.
We may now compute univariate p skewness
p and kurtosis of each transformed n{vector
of observations. De ning B1 = ( b11; : : :; b1 ), B20 = (b21; : : :; b2 ) and as a p-vector
0 p p
where Z10 = (z11 ; : : :; z1 ) and Z20 = (z21; : : :; z2 ) are determined by (1) and (2) given
p p
compute marginal skewness and kurtosis coecients, Small (1980) took the opposite
route: weight the marginal skewness and kurtosis coecients of the raw variables by
their approximate correlations. Let B1 and B2 denote the vectors of sample skewness
and kurtosis of the original data, which has sample correlation matrix C = (c ). Small ij
(1980) used the Bowman and Shenton (1975) correction, but here we use x2 to compute
the transformed skewness and kurtosis of the original data vectors obtaining Z1 and Z2,
and:
Q = Z10 U1,1 Z1 + Z20 U2,1Z2
p 2 (2p) ; ]
app
The next test is that proposed by Mardia (1970). Create the n n matrix:
,1X 0;
D = (d ) = XS
ij
1p
n2 i=1 j =1
ij 2p
n i=1
ii
]
!
z + A1 (p)=n , A2 (p)=n
2 2
Z = p
2
(1) :
fB1(p)=n , B2(p)=n2g 21
p app
The original paper shows how to obtain A and B . To obtain a correct size for the test
i i
subtract it.
Finally, we consider the multivariate Shapiro-Wilk test as proposed by Royston
(1983). This combines the univariate tests (see Royston, 1982) creating a statistic H
which has an approximate 2 distribution with non-integer degrees of freedom e, where
1 e p. For convenience we translate this into a 2 (1) using the Wilson-Hilferty
approximation:
] (1) :
( )
w= H 13 , 1 + 2 9e 21 ; W = w2 2
e 9e 2 p app
an omnibus test for normality 5
of normality, using 2 (2p) critical values. The reported simulations were done in Ox
(a C++ like matrix language), using the internal random number generator. The com-
putations were repeated in Gauss, resulting in minor descrepancies owing to a di erent
random number generator. Each table is based on common randomp numbers. We report
the approximate Monte Carlo standard error as MCSE = fq (1 , q )=M g where M is
the number of replications and q the tail probability of interest.
Table 1 contrasts E with its asymptotic version (E ) and the form suggested by
p
a
p
Anscombe and Glynn (1983), labelled E . The latter uses the Pearson type V distri-
V
p
bution for the kurtosis and a Johnson S for the skewness. Table 1 shows that this
U
does not control the size as well as E . Figure 1 illustrates the bene ts of taking the
p
moderately well in the bivariate case. Table 3 is for the bivariate normal case, with
various correlation coecients. It shows that Q is more sensitive to than E . Both
p p
6. POWER COMPARISONS
For the power simulations we compute rejection frequencies at 5% under the alternative
distribution, using a sample size of 50 observations. The critical values for E , Z and
p p
W are taken from the 2 table; the 5% critical values for M are found by simulation:
p p
Q very closely resembles that of E and is omitted. We see from Figs. 2{4 that E has
p p p
an omnibus test for normality 6
the best power properties. Z has virtually no power against any of these alternatives.
p
Z focuses only on the elements of D which form the kurtosis component of M , so low
p p
power against skewed alternatives is expected, however, power in the kurtosis dimension
is absent too. Figure 5 shows the di erence of power between E and W , and con-
p p
rms an established result for the univariate case, namely that the Shapiro-Wilk test
has higher power against platykurtic distributions, and lower power against leptokurtic
distributions. Or more precisely: Shapiro-Wilk has higher power against alternatives in
the bimodal area of the Johnson S system, but lower against S . The graphs for the
B U
P30 simulations are not presented, but give the same overall picture: all powers have
improved, but the relative ranking remains the same.
We experimented with an alternative form of the proposed test statistic E using p
only one degree of freedom for the kurtosis component: the p 2 components P
of the
kurtosisPare added up, giving (non-integer) degrees of freedom = pa + c b1 for i
replace the and in (2), given in the appendix). This alternative test results in
somewhat better size properties, but a loss of power in low kurtosis directions (a property
shared by M which also has only one kurtosis component). This form could be useful
p
plete lack of power of the Q and W tests against a non-normal bivariate distribution
p p
with normal marginals. It is interesting to note that the test suggested by Malkovich and
A (1973), which consists of that linear combination of the variables which maximizes
skewness or kurtosis, will also fail in this situation. The W statistic is an alternative
p
form of the multivariate Shapiro-Wilk test, where the data are rst transformed to ap-
proximate independent normality as for E ; then the p univariate Shapiro-Wilk tests are
p
has correct size and good power properties. However, E could potentially be used in
p
conjunction with M .p
ACKNOWLEDGEMENTS
We wish to thank Sir David Cox, David Hendry, Chris Orme and Neil Shephard for
helpful comments. Financial support from the UK Economic and Social Science Research
Council is gratefully acknowledged by Jurgen Doornik.
an omnibus test for normality 7
APPENDIX
p
The transformation for the skewness b1 into z1 is as in D'Agostino (1970):
27n , 70)(n + 1)(n + 3) ;
= 3((nn ,+2)(
2
n + 5)(n + 7)(n + 9)
! 2 = ,1 + f2( , 1)g 2 ;
1
= n p1 o 1 ; (1)
log( ! 2) 2
p
y = b1 2
! 2
, 1
1
(n + 1)(n + 3) 2
;
6(n , 2)
, 1
z1 = log y + y 2 + 1 2 :
The kurtosis b2 is transformed from a gamma distribution to 2 , which is then translated
into standard normal z2 using the Wilson-Hilferty cubed root transformation:
= (n , 3)(n + 1)(n2 + 15n , 4);
a = (n , 2)(n + 5)(n + 7)(n2 + 27n , 70) ;
6
c = (n , 7)(n + 5)(n + 7)(n2 + 2n , 5) ;
6
k = (n + 5)(n + 7)(n3 + 37n2 + 11n , 313) ; (2)
12
= a + b1c;
= (b2 , 1 , b1)2k;
z2 = 13 , 1 + 1 (9 ) 12 :
2 9
We base a numerical example on the iris data of Fisher (1936), which was used for
the same purpose by Small (1980) and Fang and Wang (1994, example 6.5). Using the 50
observations for Iris Setosa on sepal length, sepal width, petal length and petal width,
we nd for each individual variable: skewness (0.11645, 0.039921, 0.10318, 1.2159) and
kurtosis (2.6542, 3.7442, 3.8046, 4.4343). Applying the transformation to approximate
normality discussed in x3 yields skewness B10 = (0.19965, ,0.17132, 0.15837, 1.1610) and
kurtosis B20 = (2.8221, 4.1994, 3.9722, 4.5793). Application of (1) and (2) then yields
Z10 = (0.63839, ,0.54876, 0.50762, 3.1862) and Z20 = (0.24687, 2.5423, 2.2381, ,1.3278).
We see that after rotation, skewness is a problem in one dimension, whereas excess
kurtosis is evident in two dimensions. The resulting test statistics are: E = 24:4145,
p
than 1%, whereas the Z value corresponds to the 29 per cent point, and M to 10%
p p
(based on simulation results). In this case the non-normality of petal width seems to be
the main factor, as without that variable non-normality is not rejected at the 5% level,
with test-statistics of 10.8174 (9%), 8.74419 (19%), 12.9066 (20%) and 1.06251 (30%)
respectively. The same conclusion was reached by Royston (1983).
an omnibus test for normality 8
REFERENCES
Anscombe, G. J. and Glynn, D. F. (1983). Distribution of the kurtosis statistic b2 for
normal samples. Biometrika 70, 227{234.
Baringhaus, L., Danschke, R. and Henze, N. (1989). Recent and classical tests for
normality | A comparative study. Comm. Statist. B 18, 363{379.
Bowman, K. O. and Shenton, p L. R. (1975). Omnibus test contours for departures from
normality based on b1 and b2. Biometrika 62, 243{250.
Cook, R. D. and Johnson, M. E. (1986). Generalized Burr-Pareto-logistic distributions
with applications to a uranium exploration data set. Technometrics 28, 123{131.
Cox, D. R. and Small, N. J. H. (1978). Testing multivariate normality. Biometrika 65,
263{272.
Cox, D. R. and Wermuth, N. (1994). Tests of linearity, multivariate normality and the
adequacy of linear scores. Appl. Statist. 43, 347{355.
D'Agostino, R. B. (1970). Transformation to normality of the null distribution of g1.
Biometrika 57, 679{681.
|| (1982). Departures from normality, testing for. In Kotz, S., Johnson, N. L. and
Read, C. B. (eds.), Encyclopedia of Statistical Sciences, vol. 2. Amsterdam: North-
Holland.
Fang, K.-T. and Wang, Y. (1994). Number-theoretic Methods in Statistics. London:
Chapman and Hall.
Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. An-
nals of Eugenics, 7, 179{188. Reprinted in R.A. Fisher (1950). Contributions to
Mathematical Sciences. London: John Wiley and Sons.
Gnanadesikan, R. (1977). Methods for Statistical Data Analysis of Multivariate Obser-
vations. London: John Wiley and Sons.
Kendall, M. G., Stuart, A. and Ord, J. K. (1987). Advanced Theory of Statistics 5th
edn., vol. 1. London: Charles Grin and Co.
Lutkepohl, H. and Schneider, W. (1989). Testing for normality of autoregressive time
series. Comp. Statist. Quart. 2, 151{168.
Malkovich, J. F. and A , A. A. (1973). Testing for normality of autoregressive time
series. J. Am. Statist. Ass. 68, 176{179.
Mardia, K. V. (1970). Measures of multivariate skewness and kurtosis. Biometrika 57,
519{530.
|| (1980). Tests of univariate and multivariate normality. In Krishnaiah, P. R. (ed.),
Handbook of Statistics, vol. 1, ch. 9. Amsterdam: North-Holland.
Mudholkar, G. S., McDermott, M. and Srivastava, D. K. (1992). A test of p{variate
normality. Biometrika 79, 850{854.
Pearson, E. S., D'Agostino, R. B. and Bowman, K. O. (1977). Test for departure from
normality: Comparison of powers. Biometrika 64, 231{246.
an omnibus test for normality 9
Pearson, E. S. and Hartley, H. O. (1972). Biometrika Tables for Statisticians, vol. II.
Cambridge: Cambridge University Press.
Pierce, D. A. (1985). Testing normality of errors in regression models. Biometrika 72,
293{297.
Pierce, D. A. and Gray, R. J. (1982). Testing normality of errors in regression models.
Biometrika 69, 233{236.
Royston, J. P. (1982). An extension of Shapiro and Wilk's W test for normality to large
samples. Appl. Statist. 31, 115{124.
|| (1983). Some techniques for assessing multivariate normality based on the Shapiro{
Wilk W . Appl. Statist. 32, 121{133.
Shenton,
pb L.andR.band Bowman, K. O. (1977). A bivariate model for the distribution of
1 2 . J. Am. Statist. Ass. 72, 206{211.
E a
p
E
V
p
Ep
Ep Qp Z
p Wp
E p Qp W p
.1 ,1 .955 .139 .993 .544 .027 .912 1 ,1 .053 .054 .248 .064 .046 .062
.1 ,.5 .954 .145 .993 .573 .028 .906 1 ,.5 .070 .052 .238 .066 .046 .069
.1 0 .953 .153 .990 .597 .026 .906 1 0 .085 .053 .251 .065 .044 .083
.1 .5 .952 .162 .991 .621 .025 .905 1 .5 .105 .052 .268 .068 .046 .099
.1 1 .955 .179 .991 .643 .026 .911 1 1 .134 .048 .288 .073 .043 .116
.2 ,1 .774 .068 .941 .260 .039 .685 2 ,1 .041 .052 .144 .058 .041 .051
.2 ,.5 .780 .070 .940 .285 .035 .692 2 ,.5 .042 .052 .133 .060 .040 .050
.2 0 .788 .075 .935 .315 .032 .700 2 0 .047 .050 .132 .059 .041 .052
.2 .5 .799 .084 .936 .349 .029 .715 2 .5 .057 .051 .142 .064 .043 .058
.2 1 .815 .095 .938 .387 .028 .731 2 1 .073 .052 .146 .064 .041 .065
.5 ,1 .212 .053 .543 .084 .049 .184 10 ,1 .058 .048 .053 .050 .039 .056
.5 ,.5 .242 .054 .542 .088 .048 .197 10 ,.5 .047 .047 .055 .050 .039 .053
.5 0 .272 .055 .552 .106 .047 .218 10 .0 .045 .048 .057 .050 .037 .049
.5 .5 .311 .056 .577 .124 .046 .251 10 .5 .050 .051 .061 .053 .037 .051
.5 1 .369 .057 .611 .144 .043 .296 10 1 .062 .053 .057 .053 .039 .053
an omnibus test for normality 12
Fig. 1. Cross-plots of transformed skewness and kurtosis for 5000 samples of sizes n =
20; 50; 100 from the standard normal.
an omnibus test for normality 13
Fig. 2. P12 for E : power against two standard normal marginals and the Johnson
p
Fig. 3. P12 for M : power against two standard normal marginals and the Johnson
p
Fig. 4. P12 for Z : power against two standard normal marginals and the Johnson
p
Fig. 5. P12 for W , E : di erence in power against two standard normal marginals and
p p