
Composites as Factors: canonical variables revisited

working paper, Theo K. Dijkstra, June-22-2013

In this note we specify a factor model with observables only. Its parameters are estimated in a natural way by generalized canonical variables programs. The paper is meant to contribute to the ongoing discussion about the merits of Partial Least Squares.

0. Introduction

Canonical variables attempt to capture in low dimensions the information that separate sets of measured variables contain about each other. There need not be an underlying model, no specification of a hypothetical generator, partly random, partly structural, of the observations. The researcher could simply be involved in an exploratory analysis of a high-dimensional dataset, using a partition of the observed variables that is neither definitive nor completely arbitrary. Canonical variables can be of great help there.

But there are also situations where one is willing to entertain a path diagram that maps possible dependencies between concepts as measured or defined in a formative way. Observed variables, 'indicators', determine the content of the concepts. One of the goals of the analysis is to assign weights to the indicators for the concepts, to construct 'indices' or 'linear composites' that honor their interrelationships as well as possible. So we have a 'measurement triplet': a conceptual structure, as embodied in the path diagram, the specified indicators, and their composites. We will adopt a modeling principle from PLS¹: all information between the blocks is conveyed by the composites. The next section specifies the implied model; it is similar to yet clearly distinct from a classic factor model. In section two we will show that classical generalized canonical variable estimators are capable of retrieving the underlying parameters from a knowledge of the covariance/correlation matrix. This is true also for PLS' mode B, as well as mode A (with a simple modification). When applied to a sample covariance/correlation matrix the resulting estimators are consistent and asymptotically normal, under standard conditions (a Gaussian distribution is allowed of course but not required). The third section outlines possible ways of testing the composite factor model, using overall goodness-of-fit criteria and 'local tests'. Section four concludes and suggests further extensions of the model.

¹ In PLS this is seen as a basic principle in soft modeling. See e.g. Wold's chapter on 'Soft Modeling: The Basic Design and Some Extensions' in K. G. Jöreskog & H. Wold (1982), (eds.), Systems under indirect observation, part II, North-Holland, Amsterdam.

1. The composites factor model

Suppose we have a zero mean random vector $x$, decomposable into $N$ sub-vectors $x_1, x_2, \ldots, x_N$, with $\Sigma$, the covariance/correlation matrix² of $x$, decomposed conformably:

$$
\Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} & \cdots & \Sigma_{1N} \\ & \Sigma_{22} & \cdots & \Sigma_{2N} \\ & & \ddots & \vdots \\ & & & \Sigma_{NN} \end{bmatrix}. \tag{1}
$$

Here each $\Sigma_{ii} := E\, x_i x_i^\top$ is assumed to be positive definite, and for all $i \neq j$ we have

$$
\Sigma_{ij} := E\, x_i x_j^\top = \rho_{ij}\, \Sigma_{ii} w_i w_j^\top \Sigma_{jj} \tag{2}
$$

where each weight vector $w_i$ is normalized: $\operatorname{var}(w_i^\top x_i) = w_i^\top \Sigma_{ii} w_i = 1$. We will take $R := (\rho_{ij})$ positive definite. This, together with all $\Sigma_{ii} > 0$, entails that $\Sigma$ itself is positive definite. Note that because of the normalization $w_i^\top \Sigma_{ij} w_j = \operatorname{corr}(w_i^\top x_i, w_j^\top x_j) = \rho_{ij}$. In addition, $\operatorname{corr}(x_i, w_i^\top x_i) = \Sigma_{ii} w_i$. So according to the model the 'cross-covariances' $\Sigma_{ij}$ are completely determined by the correlations between the linear composites and their correlations with their direct indicators. In fact we have

$$
\Sigma_{ij} = \operatorname{corr}(w_i^\top x_i, w_j^\top x_j) \cdot \operatorname{corr}(x_i, w_i^\top x_i) \cdot \operatorname{corr}(x_j, w_j^\top x_j)^\top. \tag{3}
$$

There are serious restrictions on the ranks of the off-diagonal matrices. In particular, every sub-matrix of the form $[\Sigma_{i,i+1}, \ldots, \Sigma_{iN}]$ has rank one, and similarly for 'columns'.

If we define for every block a vector of 'measurement errors'

$$
\varepsilon_i := x_i - \Sigma_{ii} w_i w_i^\top x_i \tag{4}
$$

(perhaps a better name would be 'redundancy vector' since it separates the information from $x_i$ that is uncorrelated with its composite), it is easily verified that they possess the following properties (with $i \neq j$):

$$
E(\varepsilon_i\, w_i^\top x_i) = 0 \tag{5}
$$

$$
E(\varepsilon_i\, w_j^\top x_j) = 0 \tag{6}
$$

$$
E\, \varepsilon_i \varepsilon_j^\top = 0. \tag{7}
$$

² Generally we will assume that all indicators are standardized, with a zero mean and unit standard deviation.
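For concreteness, the construction in (1)–(2) and the properties (5)–(7) can be checked numerically. The following sketch (plain NumPy; the block covariances, weights and composite correlations are illustrative values chosen for this sketch, not values from the paper) assembles $\Sigma$ for $N = 3$ and verifies the rank-one structure of the off-diagonal blocks and the orthogonality properties of the error vectors:

```python
import numpy as np

# Three blocks of two standardized indicators each; the block covariances
# and composite correlations are illustrative choices, not the paper's.
blocks = [np.array([[1.0, 0.3], [0.3, 1.0]]),
          np.array([[1.0, 0.3], [0.3, 1.0]]),
          np.array([[1.0, 0.5], [0.5, 1.0]])]   # Sigma_ii, all p.d.
rho = np.array([[1.0, 0.5, 0.4],
                [0.5, 1.0, 0.3],
                [0.4, 0.3, 1.0]])               # R = (rho_ij), p.d.

# Normalize each weight vector: var(w_i' x_i) = w_i' Sigma_ii w_i = 1.
w = [np.ones(2) / np.sqrt(np.ones(2) @ Sii @ np.ones(2)) for Sii in blocks]

# Assemble Sigma per (2): Sigma_ij = rho_ij * Sigma_ii w_i w_j' Sigma_jj.
Sigma = np.zeros((6, 6))
for i in range(3):
    for j in range(3):
        Sigma[2*i:2*i+2, 2*j:2*j+2] = (blocks[i] if i == j else
            rho[i, j] * np.outer(blocks[i] @ w[i], blocks[j] @ w[j]))

# Off-diagonal blocks have rank one; Sigma itself is positive definite.
assert np.linalg.matrix_rank(Sigma[0:2, 2:4]) == 1
assert np.all(np.linalg.eigvalsh(Sigma) > 0)

# Properties (5)-(7): eps_i = (I - Sigma_ii w_i w_i') x_i = P_i x_i, so
# Cov(eps_i, w_j'x_j) = P_i Sigma_ij w_j and E eps_i eps_j' = P_i Sigma_ij P_j'.
P = [np.eye(2) - np.outer(blocks[i] @ w[i], w[i]) for i in range(3)]
assert np.allclose(P[0] @ Sigma[0:2, 0:2] @ w[0], 0)     # (5)
assert np.allclose(P[0] @ Sigma[0:2, 2:4] @ w[1], 0)     # (6)
assert np.allclose(P[0] @ Sigma[0:2, 2:4] @ P[1].T, 0)   # (7)
print("composite factor model construction verified")
```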

In words: the measurement vectors are uncorrelated with every composite, and uncorrelated between blocks. This is rather reminiscent of a classical factor model, in particular the 'basic design' in PLS, but it is certainly not identical with it. The factor model adds $N$ unobserved latent variables to $x$, one for each $x_i$. These latent variables replace the composites. The ensuing covariance matrix has the same rank-restrictions as $\Sigma$ on the off-diagonal sub-matrices. But usually the matrices on the diagonal satisfy additional restrictions, due to zero-correlation assumptions concerning the measurement errors within the blocks. At least some zero-correlations are needed to identify the loadings. A typical, strong assumption is that the covariance matrices of the measurement vectors are diagonal. This is not possible for $\varepsilon_i$ since clearly $w_i^\top \varepsilon_i$ is identically zero: so the elements of $\varepsilon_i$ can only be uncorrelated when they are all zero with probability one, and $\Sigma_{ii}$, in that case with rank one, would not be invertible. It may sometimes be possible to rescale the vectors, by multiplying $\Sigma_{ii} w_i$ by a scalar $c_i$, such that $(\rho_{ij}/(c_i c_j))$ and the 'error covariance matrices' $\Sigma_{ii} - c_i^2\, \Sigma_{ii} w_i w_i^\top \Sigma_{ii}$ are positive definite with at least some off-diagonal zeros. Then one could re-interpret $\Sigma$ as the covariance matrix for a traditional factor model, at the expense of introducing $N$ additional unobserved variables. But this does not seem necessary³, and Occam cum suis would advise against it.
So far we have not restricted the values of the $\rho_{ij}$; they could be anything provided $R = (\rho_{ij}) > 0$. In simulation studies one may wish to work with fixed values for the regression coefficients and the multiple correlation coefficient. For three blocks e.g. one could want particular values for $\beta_1$, $\beta_2$ and $\rho^2_{3\cdot 12}$ respectively. Since

$$
\beta_1 + \rho_{12}\beta_2 = \rho_{13} \tag{8}
$$

$$
\rho_{12}\beta_1 + \beta_2 = \rho_{23} \tag{9}
$$

and

$$
\beta_1^2 + \beta_2^2 + 2\beta_1\beta_2\rho_{12} = \rho^2_{3\cdot 12} \tag{10}
$$

we can solve the last equation for $\rho_{12}$, and then use the previous two for $\rho_{13}$ and $\rho_{23}$. It remains to check for positive definiteness that $\rho_{12}^2 < 1$ and

$$
\rho_{12}^2 + \rho_{13}^2 + \rho_{23}^2 - 2\rho_{12}\rho_{13}\rho_{23} < 1. \tag{11}
$$
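In code the solution of (8)–(10) is immediate. The sketch below uses illustrative values $\beta_1 = 0.3$, $\beta_2 = 0.5$, $\rho^2_{3\cdot 12} = 0.49$ (my own choices, not the paper's) and checks the positive-definiteness conditions as well as the implied regression:

```python
import numpy as np

beta1, beta2, R2 = 0.3, 0.5, 0.49   # chosen coefficients and multiple R^2

# Solve (10) for rho_12, then (8) and (9) for rho_13 and rho_23.
rho12 = (R2 - beta1**2 - beta2**2) / (2 * beta1 * beta2)
rho13 = beta1 + rho12 * beta2
rho23 = rho12 * beta1 + beta2

# Positive definiteness checks, including condition (11).
assert rho12**2 < 1
assert rho12**2 + rho13**2 + rho23**2 - 2 * rho12 * rho13 * rho23 < 1

R = np.array([[1, rho12, rho13], [rho12, 1, rho23], [rho13, rho23, 1]])
assert np.all(np.linalg.eigvalsh(R) > 0)

# Consistency check: regressing the third composite on the first two indeed
# returns the coefficients (beta1, beta2) and multiple R^2 equal to R2.
b = np.linalg.solve(R[:2, :2], R[:2, 2])
assert np.allclose(b, [beta1, beta2])
assert np.isclose(R[:2, 2] @ b, R2)
print(rho12, rho13, rho23)
```

With these inputs the implied correlations come out as $\rho_{12} = 0.5$, $\rho_{13} = 0.55$ and $\rho_{23} = 0.65$.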
The weight vectors are perfectly free (apart from the normalization of course), as well as the (p.d.) covariance matrices $\Sigma_{ii}$. So if we generate a random vector $x$ with $\Sigma$ as specified, we know that the composites obey the required regression equation

$$
w_3^\top x_3 = \beta_1 w_1^\top x_1 + \beta_2 w_2^\top x_2 + \varepsilon_{3\cdot 12} \tag{12}
$$

where the implied residual $\varepsilon_{3\cdot 12}$ is by construction uncorrelated with the explanatory composites, and the multiple $R^2$ equals $\rho^2_{3\cdot 12}$. Note that the best (least squares) predictor $\hat{x}_3$ of $x_3$ in terms of $x_1$ and $x_2$ is simply:

$$
\hat{x}_3 = \Sigma_{33} w_3 \left( \beta_1 w_1^\top x_1 + \beta_2 w_2^\top x_2 \right) \tag{13}
$$

$$
\phantom{\hat{x}_3} = \Sigma_{33} w_3 \begin{bmatrix} \beta_1 w_1^\top & \beta_2 w_2^\top \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}. \tag{14}
$$

A regression prediction of an indicator of the third block based on $x_1$ and $x_2$ is the product of its correlation with the third composite and the prediction of the latter in terms of the other composites. The regression matrix has rank one.

³ The additional factor model structure on $\Sigma$ is not necessarily a bad thing. Predictions based on estimators incorporating the constraints, with the implied reduction of the number of parameters, even when or perhaps just because they are incorrect, might well be more accurate than those not using them.
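The rank-one structure of the regression matrix can be verified directly. In the sketch below identical block covariances and weights are used purely for brevity (illustrative values, not the paper's; $\rho_{13}$ and $\rho_{23}$ are chosen consistent with (8)–(9)):

```python
import numpy as np

# Illustrative Sigma with identical block covariances and weights.
Sii = np.array([[1.0, 0.3], [0.3, 1.0]])
rho = np.array([[1.0, 0.5, 0.55], [0.5, 1.0, 0.65], [0.55, 0.65, 1.0]])
w = np.ones(2) / np.sqrt(np.ones(2) @ Sii @ np.ones(2))
L = Sii @ w                                 # loading vector Sigma_ii w_i
Sigma = np.kron(rho, np.outer(L, L))        # off-diagonal blocks per (2)
for i in range(3):
    Sigma[2*i:2*i+2, 2*i:2*i+2] = Sii       # diagonal blocks

# Coefficients of the regression of the third composite on the first two.
beta = np.linalg.solve(rho[:2, :2], rho[:2, 2])

# Best linear predictor of x_3 from (x_1, x_2) ...
B = Sigma[4:6, 0:4] @ np.linalg.inv(Sigma[0:4, 0:4])

# ... coincides with the rank-one matrix in (13)-(14).
B_model = np.outer(L, np.concatenate([beta[0] * w, beta[1] * w]))
assert np.allclose(B, B_model)
assert np.linalg.matrix_rank(B) == 1
print("rank-one regression matrix confirmed")
```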
Models with rank-restrictions on sub-matrices of $(\rho_{ij})$, as is the case for interdependent simultaneous equation systems, will require some more work, but no more than for traditional factor models.

2. Generalized canonical variables

Optimization programs designed for the extraction of canonical variables appear to have a natural role here. The special case of two blocks, treated extensively in the standard multivariate literature⁴, is the least interesting, since all alternative programs are then identical. Here we will discuss a number of the generalizations as developed for three and more blocks⁵, and show that they are all capable of retrieving the parameters from $\Sigma$. The implication, provable with some standard asymptotic arguments, is that when applied to $S$ they will all produce CAN-estimators when $S$ is CAN for $\Sigma$. If the model is incorrect, so that $\operatorname{plim}(S) \neq \Sigma$, they will tend to different probability limits, and their sample differences can be used for testing (section 3). For notational ease we will use a composites factor model with three blocks, $N = 3$, but that is no limitation.

Now consider the covariance matrix, denoted by $\Sigma(v)$, of $v_1^\top x_1$, $v_2^\top x_2$ and $v_3^\top x_3$, where each $v_i$ is normalized ($\operatorname{var}(v_i^\top x_i) = 1$). So

$$
\Sigma(v) := \begin{bmatrix} 1 & v_1^\top \Sigma_{12} v_2 & v_1^\top \Sigma_{13} v_3 \\ v_1^\top \Sigma_{12} v_2 & 1 & v_2^\top \Sigma_{23} v_3 \\ v_1^\top \Sigma_{13} v_3 & v_2^\top \Sigma_{23} v_3 & 1 \end{bmatrix}. \tag{15}
$$

⁴ See e.g. chapter twelve from T. W. Anderson (1984). Introduction to Multivariate Statistical Analysis, Wiley, New York.
⁵ The reference here is J. R. Kettenring (1971). Canonical analysis of several sets of variables, Biometrika, 58, 3, 433-451. See also chapter three from R. Gnanadesikan (1997). Methods for Statistical Data Analysis of Multivariate Observations, Wiley, New York.

Canonical variables are composites whose correlation matrix has 'maximum distance' to the identity matrix of the same size. They are 'collectively maximally correlated'. The term is clearly ambiguous for more than two blocks. One program that would seem natural is to maximize with respect to $v$, writing $\rho_{ij}(v) := v_i^\top \Sigma_{ij} v_j$,

$$
z(v) := \operatorname{abs}(\rho_{12}(v)) + \operatorname{abs}(\rho_{13}(v)) + \operatorname{abs}(\rho_{23}(v)) \tag{16}
$$

subject to the usual normalizations. Since

$$
\operatorname{abs}(\rho_{ij}(v)) = \operatorname{abs}(\rho_{ij}) \cdot \operatorname{abs}(v_i^\top \Sigma_{ii} w_i) \cdot \operatorname{abs}(v_j^\top \Sigma_{jj} w_j) \tag{17}
$$

we know, thanks to Cauchy-Schwarz, that

$$
\operatorname{abs}(v_i^\top \Sigma_{ii} w_i) = \operatorname{abs}\bigl( v_i^\top \Sigma_{ii}^{1/2} \cdot \Sigma_{ii}^{1/2} w_i \bigr) \leq \bigl( v_i^\top \Sigma_{ii} v_i \bigr)^{1/2} \bigl( w_i^\top \Sigma_{ii} w_i \bigr)^{1/2} \tag{18}
$$

$$
= \sqrt{ v_i^\top \Sigma_{ii} v_i \cdot w_i^\top \Sigma_{ii} w_i } = 1 \tag{19}
$$

with equality if and only if $v_i = w_i$ (ignoring irrelevant sign differences). Observe that the upper bound can be reached for $v_i = w_i$ in all terms in which $v_i$ appears, so maximization of the sum of the absolute correlations gives $w$. A numerical, iterative routine⁶ suggests itself by noting that the optimal $v_1$ satisfies the first order condition

$$
0 = \operatorname{sgn}(\rho_{12}(v))\, \Sigma_{12} v_2 + \operatorname{sgn}(\rho_{13}(v))\, \Sigma_{13} v_3 - l_1 \Sigma_{11} v_1 \tag{20}
$$

where $l_1$ is a Lagrange multiplier (for the normalization), and two other quite similar equations hold for $v_2$ and $v_3$. So with arbitrary starting vectors one could solve the equations recursively for $v_1$, $v_2$ and $v_3$ respectively, updating them after each full round or at the first opportunity, until they settle down at the optimal value. Note that each update of $v_1$ is obtainable by a regression of a 'sign-weighted sum'

$$
\operatorname{sgn}(\rho_{12}(v))\, v_2^\top x_2 + \operatorname{sgn}(\rho_{13}(v))\, v_3^\top x_3 \tag{21}
$$

⁶ With $\Sigma$ one does not really need an iterative routine of course: $\Sigma_{ij} = \rho_{ij} \Sigma_{ii} w_i w_j^\top \Sigma_{jj}$ can be solved directly for the weights (and the correlation). But in case we just have $S$, an algorithm comes in handy.

on $x_1$, and analogously for the other weights. This happens to be the classical form of PLS' mode B⁷. For $\Sigma$ we do not need many iterations, to put it mildly: the update of $v_1$ is already $w_1$, as straightforward algebra will easily show. And similarly for the other weight vectors. In other words, we have in essentially just one iteration a fixed point for the mode B equations that is precisely $w$.
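The one-sweep convergence on $\Sigma$ can be illustrated numerically. The sketch below (again with illustrative parameter values of my own) implements the sign-weighted mode B update and verifies that a single sweep from arbitrary starting vectors already returns $w$, up to sign:

```python
import numpy as np

# Illustrative Sigma satisfying the composite factor model.
Sii = np.array([[1.0, 0.3], [0.3, 1.0]])
rho = np.array([[1.0, 0.5, 0.4], [0.5, 1.0, 0.3], [0.4, 0.3, 1.0]])
w = np.ones(2) / np.sqrt(np.ones(2) @ Sii @ np.ones(2))
L = Sii @ w
Sigma = np.kron(rho, np.outer(L, L))
for i in range(3):
    Sigma[2*i:2*i+2, 2*i:2*i+2] = Sii

def blk(i, j):
    return Sigma[2*i:2*i+2, 2*j:2*j+2]

def mode_b_update(v, i):
    """Regress the sign-weighted sum (21) of the other composites on x_i."""
    target = np.zeros(2)
    for j in range(3):
        if j != i:
            target += np.sign(v[i] @ blk(i, j) @ v[j]) * blk(i, j) @ v[j]
    vi = np.linalg.solve(blk(i, i), target)      # regression coefficients
    return vi / np.sqrt(vi @ blk(i, i) @ vi)     # renormalize

# Arbitrary normalized starting vectors.
rng = np.random.default_rng(0)
v = [vi / np.sqrt(vi @ Sii @ vi) for vi in rng.standard_normal((3, 2))]

# On Sigma itself a single update per block already lands on w (up to sign).
v = [mode_b_update(v, i) for i in range(3)]
for vi in v:
    assert np.allclose(np.abs(vi), np.abs(w))
print("mode B fixed point reached in one sweep")
```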
If we use the correlations themselves in the recursions instead of just their signs, we regress the 'correlation weighted sum'

$$
\rho_{12}(v)\, v_2^\top x_2 + \rho_{13}(v)\, v_3^\top x_3 \tag{22}
$$

on $x_1$ (and analogously for the other weights), and end up with weights that maximize

$$
z(v) := \rho_{12}(v)^2 + \rho_{13}(v)^2 + \rho_{23}(v)^2, \tag{23}
$$

the simple sum of the squared correlations. Again, with the same argument, the optimal value is $w$.

Observe that for this $z(v)$ we have

$$
\operatorname{tr}\, \Sigma(v)^2 = 2 z(v) + 3 = \sum_{i=1}^{3} \lambda_i^2 \tag{24}
$$

where $\lambda_i := \lambda_i(\Sigma(v))$ is the $i$th eigenvalue of $\Sigma(v)$. We can take other functions of the eigenvalues, in order to maximize the difference between $\Sigma(v)$ and the identity matrix of the same order. Kettenring (1971) discusses a number of alternatives. One of them minimizes $\prod_i \lambda_i$, the determinant of $\Sigma(v)$, also known as the generalized variance, first suggested by Steel (1951)⁸. The program is called 'genvar'. Since $\sum_i \lambda_i$ is always $N$ (three in this case) for every choice of $v$, genvar tends to make the eigenvalues as diverse as possible (as opposed to the identity matrix where they are all equal to one).
Note that the determinant of $\Sigma(v)$ equals $\left(1 - \rho_{23}(v)^2\right)$ times

$$
1 - \begin{bmatrix} \rho_{12}(v) & \rho_{13}(v) \end{bmatrix} \begin{bmatrix} 1 & \rho_{23}(v) \\ \rho_{23}(v) & 1 \end{bmatrix}^{-1} \begin{bmatrix} \rho_{12}(v) \\ \rho_{13}(v) \end{bmatrix} \tag{25}
$$

$$
= 1 - \left( v_1^\top \Sigma_{11} w_1 \right)^2 \begin{bmatrix} \rho_{12}\, v_2^\top \Sigma_{22} w_2 & \rho_{13}\, v_3^\top \Sigma_{33} w_3 \end{bmatrix} \begin{bmatrix} 1 & \rho_{23}(v) \\ \rho_{23}(v) & 1 \end{bmatrix}^{-1} \begin{bmatrix} \rho_{12}\, v_2^\top \Sigma_{22} w_2 \\ \rho_{13}\, v_3^\top \Sigma_{33} w_3 \end{bmatrix}
$$

where the last quadratic form does not involve $v_1$, and we have with the usual argument that genvar produces $w$ also. See Kettenring (1971) for an appropriate iterative routine (this involves the calculation of ordinary canonical variables of $x_i$ and the $(N-1)$-vector consisting of the composites of the other blocks).

⁷ See chapter two of T. K. Dijkstra (1981). Latent variables in linear stochastic models (PhD thesis). Second edition 1985, Sociometric Research Foundation, Amsterdam.
⁸ R. G. D. Steel (1951). Minimum generalized variance for a set of linear functions. Annals of Mathematical Statistics, 22, 456-460.
Another program is 'maxvar', which maximizes the largest eigenvalue. For every $v$ one can calculate the linear combination of the corresponding composites that best predicts or explains them: the first principal component of $\Sigma(v)$. No other set is as well explained by its first principal component as the maxvar composites. There is an explicit solution here; no iterative routine is needed, if one views the calculation of eigenvectors as non-iterative, see Kettenring (1971) for details⁹. One can show again that the optimal $v$ equals $w$, although this requires a bit more work than for genvar (due to the additional detail needed to describe the solution).
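A sketch of the explicit solution is given below; the within-block whitening construction is my reading of Kettenring's procedure, not a quotation from it. The leading eigenvector of the within-block-whitened covariance matrix, mapped back through $\Sigma_{ii}^{-1/2}$ and renormalized, yields the maxvar weights, and under the composite factor model these coincide with $w$ (parameter values again illustrative):

```python
import numpy as np

# Illustrative Sigma satisfying the composite factor model.
Sii = np.array([[1.0, 0.3], [0.3, 1.0]])
rho = np.array([[1.0, 0.5, 0.4], [0.5, 1.0, 0.3], [0.4, 0.3, 1.0]])
w = np.ones(2) / np.sqrt(np.ones(2) @ Sii @ np.ones(2))
L = Sii @ w
Sigma = np.kron(rho, np.outer(L, L))
for i in range(3):
    Sigma[2*i:2*i+2, 2*i:2*i+2] = Sii

def inv_sqrt(A):
    val, vec = np.linalg.eigh(A)
    return vec @ np.diag(val ** -0.5) @ vec.T

# Whiten within blocks: Rtilde has off-diagonal blocks
# Sigma_ii^{-1/2} Sigma_ij Sigma_jj^{-1/2} and identity diagonal blocks.
T = np.kron(np.eye(3), inv_sqrt(Sii))
Rtilde = T @ Sigma @ T

# Leading eigenvector, mapped back block by block and renormalized.
val, vec = np.linalg.eigh(Rtilde)     # eigenvalues in ascending order
a = vec[:, -1]
v = []
for i in range(3):
    vi = inv_sqrt(Sii) @ a[2*i:2*i+2]
    v.append(vi / np.sqrt(vi @ Sii @ vi))

# Under the composite factor model the maxvar weights coincide with w.
for vi in v:
    assert np.allclose(np.abs(vi), np.abs(w))
print("maxvar recovers w")
```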
As one may have expected, there is also 'minvar', the program aimed at minimizing the smallest eigenvalue (Kettenring (1971)). The result is a set of composites with the property that no other set is 'as close to linear dependency' as the minvar set. We also have an explicit solution, and $w$ is optimal again.

As indicated in the introduction, PLS' mode A can be used to retrieve $w$ as well. Start with any $v$, and regress $x_1$ on the sign-weighted sum (instead of the other way around, as with mode B). This yields a vector proportional to

$$
\operatorname{sgn}(\rho_{12}(v))\, \Sigma_{12} v_2 + \operatorname{sgn}(\rho_{13}(v))\, \Sigma_{13} v_3. \tag{26}
$$

Inserting the expressions for $\Sigma_{12}$ and $\Sigma_{13}$ in terms of $w$ yields quickly that the new update for $v_1$ is proportional to $\Sigma_{11} w_1$. The other mode A regressions yield vectors proportional to $\Sigma_{22} w_2$ and $\Sigma_{33} w_3$ respectively. So we have a fixed point of the PLS mode A equations, that can be trivially transformed into $w$.
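The mode A argument can be checked in a few lines (illustrative parameter values of my own): the regression coefficient vector of $x_1$ on the sign-weighted sum is proportional to (26), and substituting the model structure shows it is a multiple of $\Sigma_{11} w_1$, so the cross product with the loading vector vanishes.

```python
import numpy as np

# Illustrative Sigma satisfying the composite factor model.
Sii = np.array([[1.0, 0.3], [0.3, 1.0]])
rho = np.array([[1.0, 0.5, 0.4], [0.5, 1.0, 0.3], [0.4, 0.3, 1.0]])
w = np.ones(2) / np.sqrt(np.ones(2) @ Sii @ np.ones(2))
L = Sii @ w                                # loading vector Sigma_11 w_1
Sigma = np.kron(rho, np.outer(L, L))
for i in range(3):
    Sigma[2*i:2*i+2, 2*i:2*i+2] = Sii

rng = np.random.default_rng(3)
v = [vi / np.sqrt(vi @ Sii @ vi) for vi in rng.standard_normal((3, 2))]

# Mode A: regressing x_1 on the sign-weighted sum of the other composites
# gives a coefficient vector proportional to (26).
target = np.zeros(2)
for j in (1, 2):
    S1j = Sigma[0:2, 2*j:2*j+2]
    target += np.sign(v[0] @ S1j @ v[j]) * S1j @ v[j]

# The update is proportional to Sigma_11 w_1: the 2x2 cross product vanishes.
assert np.isclose(target[0] * L[1] - target[1] * L[0], 0)
print("mode A update proportional to Sigma_11 w_1")
```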

A remark about the PLS-algorithms in this section: they were applied to path diagrams with links between all composites, whereas many path diagrams in practice will have at least some composites which are not connected directly. As long as the $\Sigma_{ij}$ have the rank one product structure, that is irrelevant when we work with $\Sigma$. With $S$ it is an empirical matter: it may be, as seems to be backed up by some experience, that leaving out composites in the sign-weighted sum that are only weakly connected with its corresponding composite helps reduce the sampling variability in the weight estimators.

A challenge, not for $\Sigma$ but for $S$, is that one might have specified a simultaneous equation system for the composites, which leads to restrictions on their correlation matrix. We could in principle employ them in the determination of the weights, and so enhance the statistical stability. The numerical issues remain to be resolved for arbitrary setups¹⁰.

⁹ This is true when applied to $S$ as well: methods other than maxvar and minvar will for $S$ require more than just one iteration (and all programs produce different results).

3. Testing the composites factor model

Here we sketch four more or less related approaches to test the appropri-
ateness or usefulness of the model. In practice one might perhaps want to
deploy all of them.

3.1 Testing rank restrictions on sub-matrices using standard two-block canon-


ical variables.
The covariance matrix of any sub-vector of xi with any choice form the other
indicators has rank one. Therefore one could use any of the methods de-
veloped for restricted rank testing, using standard canonical variables. A
possible objection could be that the tests are probably sensitive to devia-
tions from the Gaussian distribution, but jackkni…ng or boostrapping might
help to alleviate this. Another issue is the fact that we get many tests that
are also correlated, so that simultaneous testing techniques based on Bonfer-
roni, or more modern approaches are required11 .

3.2 Exploiting the difference between different estimators.
We noted that a number of generalized canonical variable programs yield identical results when applied to a $\Sigma$ satisfying the composites factor model. But we expect to get different results when this is not the case. So, when using $S$, one might want to check whether the differences between, say, PLS and maxvar (or any other couple of methods), are too big for comfort. The scale on which to measure this could be based on the probability (as estimated by the bootstrap) of obtaining a larger 'difference' than actually observed¹².
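One minimal concrete instance of such a comparison (my own construction, not a procedure from the paper): under the model, the ordinary canonical correlation of blocks 1 and 2 and the correlation implied by the explicit maxvar weights both estimate $|\rho_{12}|$, so one can bootstrap their difference and inspect its interval. The naive resampling below sidesteps the issue raised in footnote 12 and is only a sketch:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative population satisfying the model, and one sample of size 500.
Sii = np.array([[1.0, 0.3], [0.3, 1.0]])
rho = np.array([[1.0, 0.5, 0.4], [0.5, 1.0, 0.3], [0.4, 0.3, 1.0]])
w = np.ones(2) / np.sqrt(np.ones(2) @ Sii @ np.ones(2))
L = Sii @ w
Sigma = np.kron(rho, np.outer(L, L))
for i in range(3):
    Sigma[2*i:2*i+2, 2*i:2*i+2] = Sii
X = rng.multivariate_normal(np.zeros(6), Sigma, size=500)

def inv_sqrt(A):
    val, vec = np.linalg.eigh(A)
    return vec @ np.diag(val ** -0.5) @ vec.T

def cancorr12(S):
    """Largest ordinary canonical correlation of blocks 1 and 2."""
    B = inv_sqrt(S[0:2, 0:2]) @ S[0:2, 2:4] @ inv_sqrt(S[2:4, 2:4])
    return np.linalg.svd(B, compute_uv=False)[0]

def maxvar_r12(S):
    """|rho_12| implied by the explicit maxvar weights."""
    T = np.zeros((6, 6))
    for i in range(3):
        T[2*i:2*i+2, 2*i:2*i+2] = inv_sqrt(S[2*i:2*i+2, 2*i:2*i+2])
    val, vec = np.linalg.eigh(T @ S @ T)
    a = vec[:, -1]
    v = [T[2*i:2*i+2, 2*i:2*i+2] @ a[2*i:2*i+2] for i in range(3)]
    v = [vi / np.sqrt(vi @ S[2*i:2*i+2, 2*i:2*i+2] @ vi)
         for i, vi in enumerate(v)]
    return abs(v[0] @ S[0:2, 2:4] @ v[1])

def diff(data):
    S = np.cov(data, rowvar=False)
    return cancorr12(S) - maxvar_r12(S)

# Bootstrap the difference; a large systematic difference would be a warning.
d_obs = diff(X)
d_boot = np.array([diff(X[rng.integers(0, len(X), len(X))])
                   for _ in range(200)])
lo, hi = np.percentile(d_boot, [2.5, 97.5])
print(f"difference {d_obs:.4f}, bootstrap 95% interval [{lo:.4f}, {hi:.4f}]")
```

Since the model holds here, the difference hovers near zero; under misspecification the two programs tend to different limits and the interval would drift away from zero.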

3.3 Prediction tests, via cross-validation.
The path diagram might naturally indicate composites that are most relevant for prediction, as in the context of Technology Acceptance Models¹³ e.g. So
¹⁰ See also T. K. Dijkstra (2010). Latent Variables and Indices: Herman Wold's Basic Design and Partial Least Squares. Chapter 1 in V. Esposito Vinzi, W. W. Chin, J. Henseler, H. Wang (eds.): Handbook of Partial Least Squares. Springer.
¹¹ See e.g. chapter 34 from A. DasGupta (2008). Asymptotic Theory of Statistics and Probability. Springer.
¹² It is not clear, to me, whether bootstrap samples should be generated from the original sample data or from 'model transformed' sample data (which means that the covariance matrix of the transformed data satisfies the composites factor model).
¹³ See for this model F. D. Davis (1989). Perceived Usefulness, Perceived Ease of Use, and User Acceptance of Information Technology. MIS Quarterly, 13, 319-340.
it would seem to make sense to test whether the model's rank restrictions can help improve predictions of certain selected composites. As noted before, the result will not only reflect model adequacy but also the statistical phenomenon that the imposition of structure, even when strictly unwarranted, can help in prediction. So it would reflect also the sample size.

3.4 Global goodness-of-fit tests.
In SEM we test the model by assessing the probability value of a $\chi^2$-type distance measure between the sample covariance matrix $S$ and a matrix $\hat{\Sigma}$ that satisfies the model. Popular measures are $\frac12 \operatorname{tr}\bigl[ \bigl( S^{-1} ( S - \hat{\Sigma} ) \bigr)^2 \bigr]$, and $\operatorname{tr}\bigl( S \hat{\Sigma}^{-1} \bigr) + \log \det \bigl( S^{-1} \hat{\Sigma} \bigr)$ minus the number of indicators. They belong to a large class of distances, all expressible in terms of a suitable function $f$:

$$
\sum_k f\bigl( \lambda_k ( S^{-1} \hat{\Sigma} ) \bigr). \tag{27}
$$

Here $\lambda_k ( S^{-1} \hat{\Sigma} )$ is the $k$th eigenvalue of its argument, and $f$ is essentially a smooth real function defined on positive real numbers, with a unique global minimum of zero at the argument value 1. For the examples referred to we have $f(\lambda) = \frac12 (1-\lambda)^2$ and $f(\lambda) = \lambda^{-1} + \log(\lambda) - 1$ respectively. Another example is $f(\lambda) = \frac12 (\log \lambda)^2$. The idea is that when the model fits perfectly $S^{-1}\hat{\Sigma}$ is the identity matrix and all its eigenvalues equal one, and conversely. This class of distances was first analyzed by Swain¹⁴ (1975). Distance measures outside of this class are those induced by WLS with general fourth-order-moments-based weight matrices, but also the simple $\operatorname{tr}\bigl[ ( S - \hat{\Sigma} )^2 \bigr]$. We can take any of these measures, calculate its value and use the bootstrap to estimate the corresponding probability value. A probably highly redundant reminder: the observation vectors ought to be pre-multiplied by $\hat{\Sigma}^{1/2} S^{-1/2}$ (to ensure that their empirical distribution has a covariance matrix that agrees with the model) before the bootstrap is implemented. For $\hat{\Sigma}$ one could take $\hat{\Sigma}_{ii} := S_{ii}$ and for $i \neq j$

$$
\hat{\Sigma}_{ij} := \bigl( \hat{w}_i^\top S_{ij} \hat{w}_j \bigr)\, S_{ii} \hat{w}_i \hat{w}_j^\top S_{jj}. \tag{28}
$$
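The distances (27), the model-conforming $\hat{\Sigma}$ of (28), and the pre-multiplication step can be sketched as follows. The true weights are plugged in for $\hat{w}_i$ purely for illustration (in practice they would come from one of the programs of section 2), and all parameter values are again my own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)

# Population Sigma satisfying the model (illustrative values) and a sample.
Sii = np.array([[1.0, 0.3], [0.3, 1.0]])
rho = np.array([[1.0, 0.5, 0.4], [0.5, 1.0, 0.3], [0.4, 0.3, 1.0]])
w0 = np.ones(2) / np.sqrt(np.ones(2) @ Sii @ np.ones(2))
Ld = Sii @ w0
Sigma = np.kron(rho, np.outer(Ld, Ld))
for i in range(3):
    Sigma[2*i:2*i+2, 2*i:2*i+2] = Sii
X = rng.multivariate_normal(np.zeros(6), Sigma, size=1000)
S = np.cov(X, rowvar=False)

# Estimated weights: the true w_i, purely for illustration.
w_hat = [w0, w0, w0]

# Sigma-hat per (28): diagonal blocks from S, rank-one off-diagonal blocks.
Sigma_hat = S.copy()
for i in range(3):
    for j in range(3):
        if i != j:
            Sij = S[2*i:2*i+2, 2*j:2*j+2]
            r = w_hat[i] @ Sij @ w_hat[j]
            Li = S[2*i:2*i+2, 2*i:2*i+2] @ w_hat[i]
            Lj = S[2*j:2*j+2, 2*j:2*j+2] @ w_hat[j]
            Sigma_hat[2*i:2*i+2, 2*j:2*j+2] = r * np.outer(Li, Lj)

# Distances (27) from the eigenvalues of S^{-1} Sigma-hat.
lam = np.linalg.eigvals(np.linalg.solve(S, Sigma_hat)).real
d_ls = np.sum(0.5 * (1 - lam) ** 2)        # f(l) = (1 - l)^2 / 2
d_ml = np.sum(1 / lam + np.log(lam) - 1)   # f(l) = 1/l + log(l) - 1
d_log = np.sum(0.5 * np.log(lam) ** 2)     # f(l) = (log l)^2 / 2

# Pre-multiplication by Sigma-hat^{1/2} S^{-1/2}: afterwards the empirical
# covariance of the transformed data agrees with the model, as required
# before bootstrapping.
def mat_pow(A, p):
    val, vec = np.linalg.eigh(A)
    return vec @ np.diag(val ** p) @ vec.T

Xt = X @ (mat_pow(Sigma_hat, 0.5) @ mat_pow(S, -0.5)).T
assert np.allclose(np.cov(Xt, rowvar=False), Sigma_hat)
print(d_ls, d_ml, d_log)
```

All three distances are nonnegative and vanish exactly when $S^{-1}\hat{\Sigma}$ equals the identity.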
4. Conclusion

¹⁴ A. J. Swain (1975). A class of factor analysis estimation procedures with common asymptotic sampling properties. Psychometrika, 40, 315-335. See also T. K. Dijkstra (1990). Some properties of estimated scale invariant covariance structures. Psychometrika, 55, 327-336.

Although the discussion of various generalized canonical variables programs could have been misleading for general readers, those readers familiar with PLS will not have failed to note what the paper is really all about: the specification of a model, in the classical sense of a restriction on (aspects of) the distribution function of the observables, that fits the PLS-algorithms and the underlying modeling approach as 'naturally as possible'. The traditional way to expound PLS is to introduce a particular factor model with unobservable latent variables, the so-called 'basic design', and a set of alternating least squares algorithms that produce composites, which are taken as 'proxies' for the latent variables¹⁵. The relationships between the former represent an inevitable distortion of the relationships between the latter (consistency-at-large notwithstanding), and there is an inevitable stream of papers bemoaning this (the author's PhD thesis included). There are also constructive contributions (the author's PhD thesis and some recent work included) that correct the 'inconsistencies' in a simple way, and an additional 'simple' adaptation gives asymptotically efficient estimators on a par with ML¹⁶. But they all take as their point of departure the latent variables factor model, where PLS is by necessity an approximation. In the present paper we turn the tables as it were and define a composites factor model with observables only: now the latent variables factor models are approximations, and a perfect fit is generally not possible (but no doubt there is also a consistency-at-large property).

Joking aside, the present paper should not be construed as an attempt to heat up a discussion that seems to be getting ever more personal¹⁷. An instrumentalist approach seems to be called for, aimed at testing the usefulness of the alternative approaches, and cataloguing with an open mind in what fields, and in which stages of development, and for which types of data they are most appropriate. Personally I would emphasize 'prediction' but I am willing to concede that understanding has some uses as well.

The ideas in this paper certainly need further analysis and development. A case in point is the incorporation in a user friendly way of the information embodied in simultaneous equation systems in the determination of the weights. Another, perhaps even more pressing challenge is to develop higher order stages, as in principal components and canonical variables analysis

¹⁵ See H. Wold (1982). Soft Modeling: The Basic Design and Some Extensions. Chapter 1 in K. G. Jöreskog & H. Wold (1982), eds. Systems under indirect observation, part II. North-Holland, Amsterdam.
¹⁶ See Wenjing Huang (2013). PLSe: Efficient Estimators and Tests for Partial Least Squares. PhD thesis UCLA. A paper version, together with P. Bentler and the present author, is in preparation.
¹⁷ References not included.

proper. The approach in this paper would be the first stage. For the next stage one would dig deeper, and extract path diagrams of other dimensions of the concepts under scrutiny. It would seem to be important to specify the model in such a way that the algorithms as described, when applied to $S$, return the first stage, even when there are more 'layers of meaning', instead of some mixture of composites at various dimensions. It is not evident, to the present author, whether and how this can be done.

Acknowledgement.¹⁸

¹⁸ A recent review of a paper stimulated me to write this note. Proper acknowledgement will follow when that paper is published.
