Lecture 10
Factor Analysis
  ➢ The essential purpose of factor analysis is to describe, if possible, the
    covariance relationships among many variables in terms of a few
    underlying, but unobservable, random quantities called factors.
  ➢ Basically, the factor model is motivated by the following argument: Suppose
    variables can be grouped by their correlations.
     ✓ That is, suppose all variables within a particular group are highly
         correlated among themselves but have relatively small correlations
         with variables in a different group.
  ➢ Then it is conceivable that each group of variables represents a single
    underlying construct, or factor, that is responsible for the observed
    correlations.
  ➢ For example, correlations from the group of test scores in classics, French,
    English, mathematics, and music collected by Spearman suggested an
    underlying "intelligence" factor. A second group of variables, representing
    physical-fitness scores, if available, might correspond to another factor. It
    is this type of structure that factor analysis seeks to confirm.
  ➢ Factor analysis can be considered an extension of principal component
    analysis.
     ✓ Both can be viewed as attempts to approximate the covariance matrix Σ.
     ✓ However, the approximation based on the factor analysis model is more
        elaborate.
Orthogonal Factor Model with m Common Factors
$$\underset{(p\times 1)}{X} \;=\; \underset{(p\times 1)}{\mu} \;+\; \underset{(p\times m)}{L}\,\underset{(m\times 1)}{F} \;+\; \underset{(p\times 1)}{\varepsilon}$$
where
    $\mu_i$ = mean of variable $i$
    $\varepsilon_i$ = $i$th specific factor
    $F_j$ = $j$th common factor
    $l_{ij}$ = loading of the $i$th variable on the $j$th factor
The unobservable random vectors F and ε satisfy the following conditions:
    F and ε are independent,
    E(F) = 0, Cov(F) = I,
    E(ε) = 0, Cov(ε) = Ψ, where Ψ is a diagonal matrix.
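These assumptions directly yield the covariance structure listed next; for reference, a short derivation:
$$\operatorname{Cov}(X) = E[(X-\mu)(X-\mu)'] = E[(LF+\varepsilon)(LF+\varepsilon)'] = L\operatorname{Cov}(F)L' + \operatorname{Cov}(\varepsilon) = LL' + \Psi,$$
where the cross terms vanish because $E(F\varepsilon') = E(F)E(\varepsilon') = 0$ by independence and zero means; similarly, $\operatorname{Cov}(X, F) = E[(X-\mu)F'] = L\,E(FF') = L$.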
Covariance Structure for the Orthogonal Factor Model
   1. 𝐶𝑜𝑣(𝑋) = 𝐿𝐿′ + 𝜳
      or
               $$\mathrm{Var}(X_i) = l_{i1}^2 + \cdots + l_{im}^2 + \Psi_i$$
               $$\mathrm{Cov}(X_i, X_k) = l_{i1}l_{k1} + \cdots + l_{im}l_{km}$$
   2. 𝐶𝑜𝑣(𝑋, 𝐹) = 𝐿
      or
                𝐶𝑜𝑣(𝑋𝑖 , 𝐹𝑗 ) = 𝑙𝑖𝑗
   ➢ The portion of the variance of the ith variable contributed by the m common
     factors is called the ith communality.
   ➢ The portion of 𝑽𝒂𝒓 (𝑿𝒊 ) = 𝝈𝒊𝒊 due to the specific factor is often called the
     uniqueness, or specific variance.
   ➢ Denoting the ith communality by $h_i^2$, we have
$$\underbrace{\sigma_{ii}}_{\mathrm{Var}(X_i)} \;=\; \underbrace{l_{i1}^2 + l_{i2}^2 + \cdots + l_{im}^2}_{\text{communality}} \;+\; \underbrace{\Psi_i}_{\text{specific variance}}$$
      or
$$h_i^2 = l_{i1}^2 + l_{i2}^2 + \cdots + l_{im}^2$$
      and
$$\sigma_{ii} = h_i^2 + \Psi_i, \qquad i = 1, 2, \ldots, p.$$
The ith communality is the sum of squares of the loadings of the ith variable on the
m common factors.
Example: (Verify the relation Σ = LL′ + Ψ for two factors) Consider the
covariance matrix
$$\Sigma = \begin{bmatrix} 19 & 30 & 2 & 12 \\ 30 & 57 & 5 & 23 \\ 2 & 5 & 38 & 47 \\ 12 & 23 & 47 & 68 \end{bmatrix}$$
The equality
$$\begin{bmatrix} 19 & 30 & 2 & 12 \\ 30 & 57 & 5 & 23 \\ 2 & 5 & 38 & 47 \\ 12 & 23 & 47 & 68 \end{bmatrix} = \begin{bmatrix} 4 & 1 \\ 7 & 2 \\ -1 & 6 \\ 1 & 8 \end{bmatrix}\begin{bmatrix} 4 & 7 & -1 & 1 \\ 1 & 2 & 6 & 8 \end{bmatrix} + \begin{bmatrix} 2 & 0 & 0 & 0 \\ 0 & 4 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix}$$
or
$$\Sigma = LL' + \Psi$$
may be verified by matrix algebra. Therefore, Σ has the structure produced by an
m = 2 orthogonal factor model. Since
$$L = \begin{bmatrix} l_{11} & l_{12} \\ l_{21} & l_{22} \\ l_{31} & l_{32} \\ l_{41} & l_{42} \end{bmatrix} = \begin{bmatrix} 4 & 1 \\ 7 & 2 \\ -1 & 6 \\ 1 & 8 \end{bmatrix}$$
$$\Psi = \begin{bmatrix} \psi_1 & 0 & 0 & 0 \\ 0 & \psi_2 & 0 & 0 \\ 0 & 0 & \psi_3 & 0 \\ 0 & 0 & 0 & \psi_4 \end{bmatrix} = \begin{bmatrix} 2 & 0 & 0 & 0 \\ 0 & 4 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix}$$
the communality of $X_1$ is
$$h_1^2 = l_{11}^2 + l_{12}^2 = 4^2 + 1^2 = 17$$
and the variance of $X_1$ can be decomposed as
$$\sigma_{11} = (l_{11}^2 + l_{12}^2) + \psi_1 = h_1^2 + \psi_1$$
$$\underbrace{19}_{\text{variance}} \;=\; \underbrace{4^2 + 1^2}_{\text{communality}} \;+\; \underbrace{2}_{\text{specific variance}}$$
A similar breakdown occurs for the other variables.
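The same check can be carried out numerically; below is a minimal numpy sketch verifying the factorization above (the array names are ours, chosen for illustration):

```python
import numpy as np

# Covariance matrix, loadings, and specific variances from the example above.
Sigma = np.array([[19, 30,  2, 12],
                  [30, 57,  5, 23],
                  [ 2,  5, 38, 47],
                  [12, 23, 47, 68]], dtype=float)
L = np.array([[ 4, 1],
              [ 7, 2],
              [-1, 6],
              [ 1, 8]], dtype=float)
Psi = np.diag([2.0, 4.0, 1.0, 3.0])

print(np.allclose(L @ L.T + Psi, Sigma))  # True: Sigma = LL' + Psi
print(L[0] @ L[0])                        # 17.0: communality of X1
```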
Methods of Estimation
  ➢ The two most popular methods of parameter estimation are
      (i)   the Principal Component (and the related principal factor) method,
      (ii) the Maximum Likelihood Method.
  ➢ The solution from either can be rotated in order to simplify the
    interpretation of factors.
Principal Component Solution of the Factor Model
  ➢ The principal component factor analysis of the sample covariance matrix S
    is specified in terms of its eigenvalue-eigenvector pairs
    $(\hat{\lambda}_1, \hat{e}_1), (\hat{\lambda}_2, \hat{e}_2), \ldots, (\hat{\lambda}_p, \hat{e}_p)$, where $\hat{\lambda}_1 \ge \hat{\lambda}_2 \ge \cdots \ge \hat{\lambda}_p$.
  ➢ Let m < p be the number of common factors.
  ➢ Then the matrix of estimated factor loadings $\{\tilde{l}_{ij}\}$ is given by
    $$\tilde{L} = \left[\sqrt{\hat{\lambda}_1}\,\hat{e}_1 \;\vdots\; \sqrt{\hat{\lambda}_2}\,\hat{e}_2 \;\vdots\; \cdots \;\vdots\; \sqrt{\hat{\lambda}_m}\,\hat{e}_m\right] \qquad (1)$$
  ➢ The estimated specific variances are provided by the diagonal elements of the
    matrix $S - \tilde{L}\tilde{L}'$, so
    $$\tilde{\Psi} = \begin{bmatrix} \tilde{\psi}_1 & 0 & \cdots & 0 \\ 0 & \tilde{\psi}_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \tilde{\psi}_p \end{bmatrix} \quad \text{with} \quad \tilde{\psi}_i = s_{ii} - \sum_{j=1}^{m} \tilde{l}_{ij}^2 \qquad (2)$$
  ➢ Communalities are estimated as
    $$\tilde{h}_i^2 = \tilde{l}_{i1}^2 + \tilde{l}_{i2}^2 + \cdots + \tilde{l}_{im}^2 \qquad (3)$$
  ➢ The principal component factor analysis of the sample correlation matrix is
    obtained by starting with R in place of S.
  ➢ Ideally, the contributions of the first few factors to the sample variances of
    the variables should be large.
          ✓ The contribution to the sample variance $s_{ii}$ from the first common
            factor is $\tilde{l}_{i1}^2$. The contribution to the total sample variance,
            $s_{11} + s_{22} + \cdots + s_{pp} = \mathrm{tr}(S)$, from the first common
            factor is then
            $$\tilde{l}_{11}^2 + \tilde{l}_{21}^2 + \cdots + \tilde{l}_{p1}^2 = \left(\sqrt{\hat{\lambda}_1}\,\hat{e}_1\right)'\left(\sqrt{\hat{\lambda}_1}\,\hat{e}_1\right) = \hat{\lambda}_1$$
            since the eigenvector $\hat{e}_1$ has unit length.
In general,
$$\begin{pmatrix} \text{Proportion of total} \\ \text{sample variance} \\ \text{due to } j\text{th factor} \end{pmatrix} = \begin{cases} \dfrac{\hat{\lambda}_j}{s_{11} + s_{22} + \cdots + s_{pp}} & \text{for a factor analysis of } S \\[1.5ex] \dfrac{\hat{\lambda}_j}{p} & \text{for a factor analysis of } R \end{cases}$$
   ➢ The criterion above is frequently used as a heuristic device for determining
     the appropriate number of common factors.
   ➢ The number of common factors retained in the model is increased until a
     "suitable proportion" of the total sample variance has been explained; a
     minimal implementation sketch of (1)-(3) follows this list.
Example: (Factor analysis of consumer-preference data) In a consumer-preference
study, a random sample of customers were asked to rate several attributes of a new
product. The responses, on a 7-point semantic differential scale, were tabulated and
the attribute correlation matrix constructed. The correlation matrix is presented next:
               Attribute (variable)           1     2     3     4     5
               1. Taste                     1.00   .02   .96   .42   .01
               2. Good buy for money         .02  1.00   .13   .71   .85
               3. Flavor                     .96   .13  1.00   .50   .11
               4. Suitable for snack         .42   .71   .50  1.00   .79
               5. Provides lots of energy    .01   .85   .11   .79  1.00
   ➢ It is clear from the large entries in the correlation matrix that variables 1
     and 3 and variables 2 and 5 form groups.
   ➢ Variable 4 is "closer" to the (2, 5) group than to the (1, 3) group.
     ➢ Given these results and the small number of variables, we might expect that
       the apparent linear relationships between the variables can be explained in
       terms of, at most, two or three common factors.
     ➢ The first two eigenvalues, 𝜆̂1 = 2.85 and 𝜆̂2 = 1.81, of R are the only
       eigenvalues greater than unity.
      ➢ Moreover, m = 2 common factors will account for a cumulative proportion
        $$\frac{\hat{\lambda}_1 + \hat{\lambda}_2}{p} = \frac{2.85 + 1.81}{5} = .93$$
        of the total (standardized) sample variance. The estimated factor loadings,
        communalities, and specific variances, obtained using (1), (2), and (3), are
        given in the following Table.
Variable                        Estimated factor loadings        Communalities    Specific variances
                                $\tilde{l}_{ij} = \sqrt{\hat{\lambda}_j}\,\hat{e}_{ij}$    $\tilde{h}_i^2$    $\tilde{\psi}_i = 1 - \tilde{h}_i^2$
                                   F1       F2
1. Taste                          .56      .82                        .98              .02
2. Good buy for money             .78     -.53                        .88              .12
3. Flavor                         .65      .75                        .98              .02
4. Suitable for snack             .94     -.10                        .89              .11
5. Provides lots of energy        .80     -.54                        .93              .07
   Eigenvalues                   2.85     1.81
   Cumulative proportion of      .571     .932
   total (standardized)
   sample variance
Now
$$\tilde{L}\tilde{L}' + \tilde{\Psi} = \begin{bmatrix} .56 & .82 \\ .78 & -.53 \\ .65 & .75 \\ .94 & -.10 \\ .80 & -.54 \end{bmatrix}\begin{bmatrix} .56 & .78 & .65 & .94 & .80 \\ .82 & -.53 & .75 & -.10 & -.54 \end{bmatrix} + \begin{bmatrix} .02 & 0 & 0 & 0 & 0 \\ 0 & .12 & 0 & 0 & 0 \\ 0 & 0 & .02 & 0 & 0 \\ 0 & 0 & 0 & .11 & 0 \\ 0 & 0 & 0 & 0 & .07 \end{bmatrix} = \begin{bmatrix} 1.00 & .01 & .97 & .44 & .00 \\ & 1.00 & .11 & .79 & .91 \\ & & 1.00 & .53 & .11 \\ & & & 1.00 & .81 \\ & & & & 1.00 \end{bmatrix}$$
nearly reproduces the correlation matrix R. Thus, on a purely descriptive basis, we
would judge a two-factor model with the factor loadings displayed in the Table as
providing a good fit to the data. The communalities (.98, .88, .98, .89, .93) indicate
that the two factors account for a large percentage of the sample variance of each
variable.
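These results can be reproduced numerically; below is a minimal numpy sketch (note that the sign of a loading column may be flipped relative to the Table, since eigenvectors are determined only up to sign):

```python
import numpy as np

# Correlation matrix R from the consumer-preference example.
R = np.array([[1.00, 0.02, 0.96, 0.42, 0.01],
              [0.02, 1.00, 0.13, 0.71, 0.85],
              [0.96, 0.13, 1.00, 0.50, 0.11],
              [0.42, 0.71, 0.50, 1.00, 0.79],
              [0.01, 0.85, 0.11, 0.79, 1.00]])

vals, vecs = np.linalg.eigh(R)              # eigenvalues in ascending order
idx = np.argsort(vals)[::-1][:2]            # two largest: about 2.85 and 1.81
L = vecs[:, idx] * np.sqrt(vals[idx])       # loadings, per (1)
h2 = (L**2).sum(axis=1)                     # communalities, per (3)
psi = 1 - h2                                # specific variances, per (2); s_ii = 1 for R

print(np.round(vals[idx], 2))               # eigenvalues
print(round(vals[idx].sum() / 5, 2))        # cumulative proportion, about .93
print(np.round(L @ L.T + np.diag(psi), 2))  # nearly reproduces R
```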