
UNIT 18 FACTOR ANALYSIS

Structure
18.0 Objectives
18.1 Introduction
18.2 Uses and Classifications
18.2.1 Understanding the Causes
18.2.2 Assumptions
18.2.3 Factor Loadings Matrix
18.2.4 Communalities
18.2.5 Number of Factors
18.2.6 Varimax Rotation and Simple Structure Concepts
18.3 Let Us Sum Up
18.4 Key Words
18.5 Some Useful Books/References
18.6 Answers/Hints to Check Your Progress Exercises
18.7 Exercise

18.0 OBJECTIVES

After going through this Unit you will be able to:

• explain the underlying patterns in the interrelationships between variables;

• estimate factors or latent variables from a data set; and

• reduce the dimensionality of a large number of variables to a fewer number of factors.

18.1 INTRODUCTION

Factor analysis (FA) explains variability among observed random variables in terms
of fewer unobserved random variables called factors. The observed variables are
expressed in terms of linear combinations of the factors, plus "error" terms. Factor
analysis originated in psychometrics, and is used in social sciences, marketing,
product management, operations research, and other applied sciences that deal with
large quantities of data.
Factor analysis is applied to a set of variables to discover coherent subsets that are
relatively independent of one another. Variables, correlated with each other and
independent of other subsets of variables are combined into factors. Factors, which
are generated, are thought to be representative of the underlying processes that have
created the correlations among variables.
FA can be exploratory in nature; it is used as a tool in attempts to reduce a large set of variables to a more meaningful, smaller set of variables. As FA is sensitive to the magnitude of correlations, robust comparisons must be made to ensure the quality of the analysis.
Accordingly, FA is sensitive to outliers, missing data, and poor correlations between variables due to poorly distributed variables. As a result, data transformations have a large impact upon FA.
18.2 USES AND CLASSIFICATIONS
Factor analysis is integrated in structural equation modeling (SEM), helping create the latent variables modeled by SEM. However, factor analysis can be, and often is, used on a standalone basis for similar purposes:

• To select a subset of variables from a larger set, based on which original variables have the highest correlations with the principal component factors.

• To create a set of factors to be treated as uncorrelated variables as one approach to handling multicollinearity in such procedures as multiple regression.

• To validate a scale or index by demonstrating that its constituent items load on the same factor, and to drop proposed scale items which cross-load on more than one factor.

• To establish that multiple tests measure the same factor, thereby giving justification for administering fewer tests.

• To identify clusters of cases and/or outliers.

• To determine network groups by determining which sets of people cluster together (using Q-mode factor analysis).
Let us consider this example. Suppose a mother finds various bumps and shapes under a blanket at the bottom of a bed. When one shape moves toward the top of the bed, all the other bumps and shapes also move toward the top, making the mother infer that what is under the blanket is a single thing, most likely her child. Factor analysis takes as input a number of measures and tests, analogous to the bumps and shapes. Those that move together are considered to be a single factor. That is, in factor analysis the researcher is assuming that there is a "child" out there in the form of an underlying factor, and he or she takes simultaneous movement (correlation) as evidence of its existence. If correlation is spurious for some reason, this inference will be mistaken, of course, so it is important when conducting factor analysis that possible variables which might introduce spuriousness, such as anteceding causes, be included in the analysis and taken into account.¹
Factor analysis makes many of the same assumptions as multiple regression: linear relationships, interval or near-interval data, untruncated variables, proper specification (relevant variables included, extraneous ones excluded), lack of high multicollinearity, and multivariate normality for purposes of significance testing. Eventually, it generates a table in which the rows are the observed raw indicator variables and the columns are the factors or latent variables, which explain as much of the variance in these variables as possible. The cells in this table are factor loadings, and the meaning of the factors must be induced from seeing which variables are most heavily loaded on which factors. This inferential labeling process can be fraught with subjectivity, as diverse researchers impute different labels.²

Of the various types of factor analyses, principal components analysis (PCA) is the most common. However, principal axis factoring (PAF), also called common factor analysis, is preferred for purposes of confirmatory factor analysis in structural equation modeling.

¹ http://ww2.chass.ncsu.edu/garson/pa765/factor.htm
² http://ww2.chass.ncsu.edu/garson/pa765/factor.htm

18.2.1 Understanding the Causes

Many statistical methods are used to study the relation between independent and dependent variables. Factor analysis is different; it is used to study the patterns of relationship among many dependent variables, with the goal of discovering something about the nature of the independent variables that affect them, even though those independent variables were not measured directly. Thus, answers obtained by factor analysis are necessarily more hypothetical and tentative than is true when independent variables are observed directly. The inferred independent variables are called factors. A typical factor analysis suggests answers to four major questions:
1) How many different factors are needed to explain the pattern of relationships among these variables?

2) What is the nature of those factors?

3) How well do the hypothesized factors explain the observed data?

4) How much purely random or unique variance does each observed variable include?
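These four questions can be made concrete in code. The following is a minimal sketch, not part of the original unit, using scikit-learn's FactorAnalysis on simulated data; the two-factor structure and all names here are our own assumptions, chosen only for illustration.

```python
# A minimal illustrative sketch: simulated data driven by two hidden factors.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
true_factors = rng.normal(size=(200, 2))          # the unobserved "children"
true_loadings = np.array([[0.9, 0.0], [0.8, 0.1], [0.7, 0.0],
                          [0.0, 0.9], [0.1, 0.8], [0.0, 0.7]])
X = true_factors @ true_loadings.T + 0.3 * rng.normal(size=(200, 6))

fa = FactorAnalysis(n_components=2).fit(X)  # Q1: number of factors (chosen here)
print(fa.components_)       # Q2: nature of the factors, read from the loadings
print(fa.score(X))          # Q3: average log-likelihood, a rough measure of fit
print(fa.noise_variance_)   # Q4: unique variance of each observed variable
```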

18.2.2 Assumptions

There is a simple list of fundamental assumptions that underlie factor analysis and distinguish it from principal component analysis (even if they share a lot of common mathematical machinery).

1) The correlations and covariances that exist between m variables are a result of p underlying, mutually uncorrelated factors. Usually p is less than m.

2) Usually p is known in advance. The number of factors hidden in the data set is one of the pieces of a priori knowledge that is brought to the table to solve the factor analysis problem.

3) The rank of a matrix and the number of eigenvectors are interrelated; the eigenvalues are the squares of the non-zero singular values. The eigenvalues are ordered by the amount of variance accounted for.

Factor analysis starts with the basic principal component approach, but differs in two important ways. First of all, factor analysis is always done with standardized data. This implies that we want the individual variables to have equal weight in their influence on the underlying variance-covariance structure. In addition, this requirement is necessary for us to be able to convert the principal component vectors into factors. Secondly, the eigenvectors must be computed in such a way that they are normalized, i.e., of unit length or orthonormal.
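Both requirements can be expressed in a few lines. Here is a minimal sketch in Python with NumPy (the language choice is ours; the unit itself later uses SPSS) that standardizes a data matrix X and extracts orthonormal eigenvectors of its correlation matrix, ordered by the variance they account for.

```python
import numpy as np

def standardized_eigen(X):
    """Standardize X (rows = observations) and eigendecompose the
    resulting correlation matrix; eigenvectors come out orthonormal."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(Z, rowvar=False)         # correlation matrix of the data
    eigvals, eigvecs = np.linalg.eigh(R)     # eigh: symmetric-matrix routine
    order = np.argsort(eigvals)[::-1]        # order by variance accounted for
    return eigvals[order], eigvecs[:, order]
```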

18.2.3 Factor Loadings Matrix

As we stated above, we start factor analysis with principal component analysis, but we quickly diverge as we apply the a priori knowledge we brought to the problem. This knowledge may be of the form that we "know" how many factors there should be, or it may be more of the nature that allows our experience and intuition about the data to guide us as to how many factors there should be. As before, with PCA, we can take the eigenvectors (of unit length) and weight them with the square root of the corresponding eigenvalue:

A" = UIJ/Z' ...( 18.1)

Here A" represents a matrix such as:


I I1 111 ... p
XI Lll L12 L I ~ ... LlP
x2 tZ1L22 L21 ...
..
\;here the m variables X run down the side and thepfactors go across the top and th6 Factor Analysis
Id represent the Loudiilys of each variable on ind6idual factors. When p=m you have
the same thing as PCA.
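As a sketch of equation (18.1), the loadings are just the eigenvectors scaled column-wise by the square roots of their eigenvalues (continuing from the previous sketch; the function name is our own):

```python
import numpy as np

def loadings_matrix(eigvals, eigvecs, p):
    """Equation (18.1): weight each unit-length eigenvector by the square
    root of its eigenvalue, keeping the first p factors as columns."""
    return eigvecs[:, :p] * np.sqrt(eigvals[:p])
```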

18.2.4 Communalities

The communalities h_j² represent the fraction of the total variance accounted for of variable j. By calculating the communalities we can keep track of how much of the original variance that was contained in variable j is still being accounted for by the number of factors we have retained. As a consequence, when p = m, h_j² = 1, always, so long as the data was standardized first. The communalities are calculated in the following fashion:

h_j² = Σ_{k=1}^{p} l_jk²

One can think of this as summing the squares of the factor loadings horizontally across the factor loadings matrix.
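In code this is a one-line row operation on the loadings matrix (a sketch under the same assumptions as the earlier snippets):

```python
import numpy as np

def communalities(A):
    """h_j^2 for each variable j: squared loadings summed across each row.
    When p = m and the data were standardized, every entry equals 1."""
    return (A ** 2).sum(axis=1)
```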

18.2.5 Number of Factors

Above we make several references to the fact that if p = m (i.e., the number of factors equals the number of variables) then factor analysis is no different than PCA with standardized variables. But of course, in factor analysis you want p << m, and so the question remains: how do you decide which factors to keep? When doing factor analysis it helps to keep the results in the following organization:

Factor    Eigenvalue    % Total Var    Cumulative % Var

At first the entries can come from U (the raw eigenvectors), but later you can use the entries from A* (the factor loadings) as the analysis continues and the number of factors decreases. In this manner, you can keep track of the number of factors you are dealing with, how much of the original total variance is being accounted for, where the variables are loading on the individual factors, and the communalities of each individual variable. This will help you see how your choices in the number of factors kept have affected these measures of performance.
As to the question of how you decide, unfortunately there is no hard and fast rule as to how many factors to keep. One rule of thumb is to keep all of the factors whose eigenvalue is greater than one, provided you started with standardized data. If you get a lot of factors with eigenvalues greater than one, then you might have to face the likelihood that the factor theory approach isn't applicable to your problem, at least in the way you have presented it. Typically the more successful factor analyses have been those where a "few" factors account for most of the variance. See the discussion of simple structure concepts in the next section.
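A minimal sketch of this bookkeeping, including the eigenvalue-greater-than-one rule of thumb (the function name and printed layout are our own):

```python
import numpy as np

def factor_table(eigvals):
    """Print the Factor / Eigenvalue / % Total Var / Cumulative % Var table
    and return the number of factors the eigenvalue > 1 rule would keep."""
    pct = 100 * eigvals / eigvals.sum()  # the sum equals m for standardized data
    cum = np.cumsum(pct)
    print("Factor  Eigenvalue  % Total Var  Cumulative % Var")
    for i, (ev, p, c) in enumerate(zip(eigvals, pct, cum), start=1):
        print(f"{i:>6}  {ev:10.3f}  {p:11.1f}  {c:16.1f}")
    return int((eigvals > 1).sum())
```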

18.2.6 Varimax Rotation and Simple Structure Concepts

After you have chosen the few factors you wish to keep in your analysis, you can "improve" the fit of this reduced dimensionality coordinate system to your data by a technique known as factor rotation. Even though the number of factors may have reduced the dimensionality of your problem, the factors may not be easy to interpret. Factor rotation allows you to reorganize the loadings onto rotated factors. This is accomplished by maximizing the variance of the loadings on the factors. For each (kth) factor we can compute:

V_k = (1/m) Σ_{j=1}^{m} (a_jk² / h_j²)² − [(1/m) Σ_{j=1}^{m} (a_jk² / h_j²)]²

where p is the number of retained factors, m is the number of original variables, a_jk is the loading of variable j on factor k, and h_j² is the communality of the jth variable. Using this expression for the variance of the loadings on the kth factor, one maximizes the following:

V = Σ_{k=1}^{p} V_k

This is an iterative process where you rotate two factors at a time, holding the others constant, until the increase in the overall variance V drops below a preset value. This is the heart of the Kaiser varimax orthogonal rotation.

The various factor rotation methods have, as a guiding principle, the simple structure concepts. That is to say, the results, after rotation, should have become simple in their appearance. To put it another way, these simple structure concepts should be considered when trying to determine whether or not a given factor rotation has clarified the underlying structure of the data. Five simple structure precepts have been put forth by Thurstone:

1) There should be at least one zero in each row of the factor loadings matrix.

2) There should be at least k zeros in each column of the factor matrix, where k is the number of factors extracted.

3) For every pair of factors, some variables (in the R-mode) should have high loadings on one and near-zero loadings on the other.

4) For every pair of factors, several variables should have small loadings on both factors.

5) For every pair of factors, only a few variables should have non-vanishing loadings on both.

In the real world, however, it is rare for all five of these rules to be satisfied with orthogonal factor rotation.
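For readers who want to experiment, below is a compact varimax sketch in Python. It uses the common SVD-based formulation rather than the two-factors-at-a-time iteration described above, and for brevity it omits the Kaiser normalization (dividing each row of loadings by its communality); both simplifications are our own.

```python
import numpy as np

def varimax(A, tol=1e-6, max_iter=100):
    """Rotate a loadings matrix A (m variables x p factors) so that the
    variance of the squared loadings on each factor is maximized."""
    m, p = A.shape
    R = np.eye(p)                 # accumulated orthogonal rotation
    d = 0.0
    for _ in range(max_iter):
        L = A @ R
        # Gradient of the varimax criterion with respect to the rotation
        B = A.T @ (L ** 3 - L @ np.diag((L ** 2).sum(axis=0)) / m)
        u, s, vt = np.linalg.svd(B)
        R = u @ vt                # project back onto orthogonal matrices
        if s.sum() - d < tol:     # stop when the criterion stops improving
            break
        d = s.sum()
    return A @ R                  # the rotated loadings
```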
Check Your Progress 1

1) What is meant by a factor in the context of factor analysis?

...............................................................................................

2) Sketch the three basic matrices involved in the factor analysis procedure: input data matrix, correlation matrix, and factor matrix.

...............................................................................................

3) Discuss the meaning of a factor loading. What are its maximum and minimum values? Explain.

...............................................................................................

4) What is accomplished by rotating a factor-loading matrix?

...............................................................................................

18.3 LET US SUM UP

Factor analysis (FA) and principal component analysis (PCA) are amongst the oldest of the multivariate statistical methods of data reduction. These are methods for producing a small number of constructed variables, derived from the larger number of variables originally collected. The idea is to produce a small number of derived variables that are uncorrelated and that account for most of the variation in the original data set. The main reason that we might want to reduce the number of variables in this way is that it helps us understand the underlying structure of the data.

Principal component analysis does not have an underlying statistical model. It is just a mathematical technique and, as such, is used in other statistical analyses that are driven by models, for example, factor analysis. The emphasis in factor analysis is on identification of underlying factors that might explain the variability in a large and complex data set. Factor analysis is a two-stage process, while PCA is the most commonly used method for the first stage, the extraction of an initial solution. Thus, the mathematical technique of PCA underlies other multivariate statistical methods.

18.4 KEY WORDS

Communality : It shows the variance of an observed variable accounted for by the common factors in an orthogonal factor model. It is equivalent to the sum of the squared factor loadings.

Common Factor : It implies the unmeasured (or hypothetical) underlying variable which is the source of variation in at least two observed variables under consideration.

Eigenvalue (or characteristic root) : It is a mathematical property of a matrix; used in relation to the decomposition of a covariance matrix, both as a criterion for determining the number of factors to extract and as a measure of variance accounted for by a given dimension.

Eigenvector : It is a vector associated with its respective eigenvalue; obtained in the process of initial factoring; when these vectors are appropriately standardized, they become factor loadings.

Factor Extraction : It is the initial stage of factor analysis in which the covariance matrix is resolved into a smaller number of underlying factors or components.

Factors : It implies hypothesized, unmeasured, and underlying variables which are presumed to be the sources of the observed variables; often divided into unique and common factors.

Factor Loading : It is a general term referring to a coefficient in a factor pattern or structure matrix.

Factor Pattern Matrix : It refers to a matrix of coefficients where the columns usually refer to common factors and the rows to the observed variables; elements of the matrix represent regression weights for the common factors where an observed variable is assumed to be a linear combination of the factors; for an orthogonal solution, the pattern matrix is equivalent to correlations between factors and variables.

Orthogonal Factors : It indicates the factors that are not correlated with each other; factors obtained through orthogonal rotation.

Orthogonal Rotation : It refers to the operation through which a simple structure is sought under the restriction that factors be orthogonal (or uncorrelated); factors obtained through this rotation are by definition uncorrelated.

Principal Axis Factoring : It is a method of initial factoring in which the adjusted correlation matrix is decomposed hierarchically; a principal axis factor analysis with iterated communalities leads to a least-squares solution of initial factoring.

Principal Components : It reflects linear combinations of observed variables, possessing properties such as being orthogonal to each other, with the first principal component representing the largest amount of variance in the data, the second representing the second largest, and so on; often considered variants of common factors, but more accurately they are contrasted with common factors, which are hypothetical.

Scree Test : It is a rule-of-thumb criterion for determining the number of significant factors to retain; it is based on the graph of roots (eigenvalues); claimed to be appropriate in handling disturbances due to minor (unarticulated) factors.

Varimax : It refers to a method of orthogonal rotation which simplifies the factor structure by maximizing the variance of a column of the pattern matrix.

18.5 SOME USEFUL BOOKS/REFERENCES

Kachigan, Sam K., 1991, Multivariate Statistical Analysis: A Conceptual Introduction.

Kim, Jae-On and Charles W. Mueller, 1978, Factor Analysis: Statistical Methods and Practical Issues, Sage Publications.

18.6 ANSWERS/HINTS TO CHECK YOUR PROGRESS EXERCISES

Check Your Progress 1

1) See Section 18.2 and answer.

2) See Section 18.2 and answer.

3) See Sub-section 18.2.3 and answer.

4) See Sub-section 18.2.6 and answer.

18.7 EXERCISE

The following table provides data on average hours worked, wages, and related characteristics of 35 demographic groups. Using an appropriate statistical package (SPSS, Stata, etc.) run a factor analysis and interpret the factors.

HRS = Average hours worked during the year
RATE = Average hourly wage (dollars)
ERSP = Average yearly earnings of a spouse (dollars)
ERNO = Average yearly earnings of other family members (dollars)
NEIN = Average yearly non-earned income (dollars)
ASSET = Average family asset holdings (bank account, etc.) (dollars)
AGE = Average age of respondent
DEP = Average number of dependents
SCHOOL = Average highest grade of school completed

HRS   RATE   ERSP  ERNO  NEIN  ASSET  AGE   DEP    SCHOOL

2157  2.905  1121  291   380   7250   38.5  2.340  10.5
2174  2.970  1128  301   398   7744   39.3  2.335  10.5
2062  2.350  1214  326   185   3068   40.1  2.851  8.9
2111  2.511  1203  49    117   1632   22.4  1.159  11.5
2134  2.791  1013  594   730   12710  57.7  1.229  8.8
2185  3.040  1135  287   382   7706   38.6  2.602  10.7
2210  3.222  1100  295   474   9338   39.0  2.187  11.2
2105  2.493  1180  310   255   4730   39.9  2.616  9.3
2267  2.838  1298  252   431   8317   38.9  2.074  11.1
2205  2.356  885   264   373   6789   38.8  2.662  9.5
2121  2.922  1251  328   312   5907   39.8  2.287  10.3
2109  2.499  1207  347   271   5069   39.7  3.193  8.9
2108  2.796  1036  300   259   4614   38.2  2.040  9.2
2047  2.453  1213  297   139   1987   40.3  2.545  9.1
2174  3.582  1141  414   498   10239  40.0  2.064  11.7
2067  2.909  1805  290   239   4439   39.1  2.301  10.5
2159  2.511  1075  289   308   5621   39.3  2.486  9.5
2257  2.516  1093  176   392   7293   37.9  2.042  10.1
1985  1.423  553   381   146   1866   40.6  3.833  6.6
2184  3.636  1091  291   560   11240  39.1  2.328  11.6
2064  2.983  1327  331   296   5653   39.8  2.208  10.2
2051  2.573  1194  279   172   2806   40.0  2.362  9.1
2127  3.262  1226  314   408   8042   39.5  2.259  10.8
2102  3.234  1188  414   352   7557   39.8  2.019  10.7
2098  2.280  973   364   272   4400   40.6  2.661  8.4
2042  2.304  1085  328   140   1739   41.8  2.444  8.2
2181  2.912  1072  304   383   7340   39.0  2.337  10.2
2186  3.015  1122  30    352   7292   37.2  2.046  10.9
2188  3.010  990   366   374   7325   38.4  2.847  10.6
2077  1.901  350   209   95    1370   37.4  4.158  8.2
2196  3.009  947   294   342   6888   37.5  3.047  10.6
2093  1.899  342   311   120   1425   37.5  4.512  8.1
2173  2.959  1116  296   387   7625   39.2  2.342  10.5
2179  2.971  1128  312   397   7779   39.4  2.341  10.5
2200  2.980  1126  204   393   7885   39.2  2.341  10.6

Source: Greenberg, D.H. and M. Kosters, 1970, Income Guarantees and the Working Poor, The Rand Corporation, R-579-OEO, December.

Hints:

How to Run Factor Analysis in SPSS

After opening the SPSS (whatever be the version) window:

Step 1 : Go to "Analyze" in the menu given on the top
Step 2 : Click on "Data Reduction"
Step 3 : Click on "Factor"
Step 4 : Select all the variables, and place them in the blank space
Step 5 : Click on "Extraction", select "Principal axis factoring" and press "Continue"
Step 6 : Click on "OK".
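For readers without SPSS, here is a hedged Python alternative (the file name greenberg_kosters.csv is hypothetical; save the table above under any name, with the nine column headers in the first row):

```python
import pandas as pd
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("greenberg_kosters.csv")   # hypothetical file name
Z = StandardScaler().fit_transform(df)      # factor analysis on standardized data
fa = FactorAnalysis(n_components=2).fit(Z)  # the number of factors is yours to justify
loadings = pd.DataFrame(fa.components_.T, index=df.columns,
                        columns=["Factor 1", "Factor 2"])
print(loadings.round(3))                    # read off which variables load together
```

The loadings printed this way play the same role as the factor matrix produced by the SPSS steps above.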
