BRIEFINGS IN BIOINFORMATICS. Vol. 8. No. 1. 32–44. doi:10.1093/bib/bbl016
Advance Access publication May 26, 2006
Partial least squares: a versatile tool
for the analysis of high-dimensional
genomic data
Anne-Laure Boulesteix and Korbinian Strimmer
Abstract
Partial least squares (PLS) is an efficient statistical regression technique that is highly suited for the analysis
of genomic and proteomic data. In this article, we review both the theory underlying PLS as well as a host of
bioinformatics applications of PLS. In particular, we provide a systematic comparison of the PLS approaches
currently employed, and discuss analysis problems as diverse as tumor classification from transcriptome data,
identification of relevant genes, survival analysis and modeling of gene networks and transcription factor activities.
Keywords: partial least squares (PLS); high-dimensional genomic data; gene expression; classification; dimension reduction
INTRODUCTION
In the last few years, multivariate statistical methods for the analysis of high-dimensional genomic data have been the subject of numerous publications in statistics, machine learning, bioinformatics and biology. A challenging problem connected with these data is that they typically contain many more variables (p, genes and features) than observations (n, gene chips and time points). For instance, it is not uncommon to collect expression data for 20 000 genes using only 10–20 microarrays. Since many traditional multivariate methods are not applicable in this case, predicting, e.g. the survival time or the tumor class of a patient with such high-dimensional data is a difficult and challenging task that requires special techniques such as variable selection or dimension reduction.

In this article, we survey the application of partial least squares (PLS), a powerful yet comparatively little-known approach for analyzing high-dimensional data, to problems in bioinformatics and genomics. The PLS method was first developed by Herman Wold in the 1960s and 1970s to address problems in econometric path modeling, and was subsequently adopted by his son Svante Wold (and many others) in the 1980s for regression problems in chemometric and spectrometric modeling. Early references on path modeling are, e.g. Wold [1–3]. One of the first applications of PLS to regression is Wold et al. [4]. Two recent studies [5, 6] describe these early developments and provide a detailed chronological overview. PLS is still a highly active research area from a theoretical point of view; see for instance [7] for recent developments on the connections of PLS with Krylov subspaces and conjugate gradients. PLS started to attract the attention of statisticians only about 15 years ago—see e.g. [8–11]. This was mainly due to the ability of PLS to work very well for data with very small sample sizes and a large number of parameters. Thus, it is only natural that in the last few years this methodology has been successfully applied to problems in genomics and proteomics.

PLS methods are in general characterized by high computational and statistical efficiency. They also offer great flexibility and versatility in terms of the analysis problems that may be addressed.
Corresponding author. Anne-Laure Boulesteix, Department of Medical Statistics and Epidemiology, Technical
University of Munich, Ismaningerstrasse 22, D-81675 Munich, Germany. Tel: +49 89 4140-4347; Fax: +49 89 4140-4840;
E-mail: anne-laure.boulesteix@tum.de
Anne-Laure Boulesteix is a post-doctoral researcher and consultant in biostatistics at the Technical University of Munich.
She received her PhD in statistics in 2005 from the University of Munich, and is generally interested in computational statistics and
high-dimensional multivariate data analysis.
Korbinian Strimmer is heading the ‘Information Theory and Bioinformatics’ group at the Department of Statistics of the University
of Munich. His research focuses on statistical learning procedures, complex networks and statistical genomics.
© The Author 2006. Published by Oxford University Press. For Permissions, please email: journals.permissions@oxfordjournals.org
However, the literature on PLS is very diverse because of the existence of a large number of algorithmic variants of PLS, which makes it difficult to understand the principles underlying the method. It is the aim of this article to fill this gap by, firstly, providing a systematic overview of the available PLS methods and, secondly, reviewing the broad range of their applications to genome data.

The remainder of the article is structured as follows. In 'Methodological Foundations of Partial Least Squares' section, we summarize the main methodological aspects of PLS regression. In 'Applications of Partial Least Squares to High-dimensional Genomic Data' section, various applications of PLS regression to microarray studies are reviewed. 'Outlook and Generalizations of PLS' section is devoted to PLS-based methods that are especially designed for particular types of response variables (for instance, survival time or categorical outcome) and to their practical use in microarray data analysis. A recapitulation of the notations and abbreviations that are used throughout the manuscript can be found in the appendix.

METHODOLOGICAL FOUNDATIONS OF PARTIAL LEAST SQUARES
In this section, we provide an introduction into the mathematics of PLS. In a nutshell, PLS is a dimension reduction approach that is coupled with a regression model. Unlike in similar approaches such as principal component regression, the latent components obtained by PLS are chosen with the response variable of the regression kept in mind.

PLS regression
Suppose we want to predict q continuous response variables Y_1, ..., Y_q using p continuous predictor variables X_1, ..., X_p. The available data sample consisting of n observations is denoted as (x'_i, y'_i)_{i=1,...,n}, where x'_i and y'_i denote the ith observation of the predictor and response variables, respectively. The prime denotes uncentered raw data, as in [9]; its removal indicates subtraction of the sample average, i.e.

x_i = x'_i - \frac{1}{n} \sum_{s=1}^{n} x'_s, \qquad y_i = y'_i - \frac{1}{n} \sum_{s=1}^{n} y'_s.

The x_i = (x_{i1}, ..., x_{ip})^T are collected in the n × p matrix X. Similarly, Y is the n × q matrix containing the y_i = (y_{i1}, ..., y_{iq})^T:

X = \begin{pmatrix} x_1^T \\ \vdots \\ x_n^T \end{pmatrix} \qquad \text{and} \qquad Y = \begin{pmatrix} y_1^T \\ \vdots \\ y_n^T \end{pmatrix}.

When n < p, the usual regression tools such as classical linear regression, which is often denoted as ordinary least squares (OLS), cannot be applied, since the p × p covariance matrix X^T X (which can have a maximum rank of n − 1) is singular. In contrast, PLS may be applied also to cases in which n < p. PLS regression is based on the basic latent component decomposition:

Y = T Q^T + F,   (1)
X = T P^T + E,   (2)

where T is an n × c matrix giving the latent components for the n observations, P (of size p × c) and Q (of size q × c) are matrices of coefficients, and E (of size n × p) and F (of size n × q) are matrices of random errors. Note that if the given matrices T, P and Q satisfy Equations (1) and (2), then so do T* = TM, P* = P(M^{-1})^T and Q* = Q(M^{-1})^T for any non-singular c × c matrix M. Thus, the space spanned by the columns of T is more important than the columns of T themselves.

PLS as well as principal component regression and reduced rank regression can all be seen as methods to construct a matrix of latent components T as a linear transformation of X:

T = X W,   (3)

where W is a p × c matrix of weights. In the remainder of the article, the columns of W and T are denoted as w_i = (w_{1i}, ..., w_{pi})^T and t_i = (t_{1i}, ..., t_{ni})^T, respectively, for i = 1, ..., c. For a fixed matrix W, the random variables obtained by forming the corresponding linear transformations of X_1, ..., X_p are denoted as T_1, ..., T_c:

T_1 = w_{11} X_1 + ... + w_{p1} X_p,
...
T_c = w_{1c} X_1 + ... + w_{pc} X_p.

The latent components are then used for prediction in place of the original variables: once T is
constructed, Q^T is obtained as the least squares solution of Equation (1):

Q^T = (T^T T)^{-1} T^T Y.

Finally, the matrix B of regression coefficients for the model Y = XB + F is given as

B = W Q^T = W (T^T T)^{-1} T^T Y,

and the fitted response matrix Ŷ may be written as

Ŷ = T (T^T T)^{-1} T^T Y.

If we have a new (uncentered) raw observation x'_0, the prediction ŷ'_0 of the response is given by

ŷ'_0 = \frac{1}{n} \sum_{i=1}^{n} y'_i + B^T \left( x'_0 - \frac{1}{n} \sum_{i=1}^{n} x'_i \right).

In PLS, dimension reduction and regression are performed simultaneously, i.e. PLS outputs the matrix of regression coefficients B as well as the matrices W, T, P and Q, and hence the term PLS regression. In the PLS literature, the columns of T are often denoted as 'latent variables' or 'scores'. In this study, we prefer the term 'latent components', since in PLS the columns of T are rather the result of a matrix decomposition than observations of underlying random variables. P and Q are often denoted as 'X-loadings' and 'Y-loadings', respectively.
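The algebra above translates directly into a few lines of code. The following R sketch (function names such as pls_fit and pls_predict are ours and purely illustrative) computes T, B and a prediction for a new raw observation, assuming that the centered matrices X and Y and a weight matrix W have already been obtained by some PLS algorithm; how W is chosen is precisely what distinguishes the PLS variants discussed below.

pls_fit <- function(X, Y, W) {
  # X: centered n x p matrix, Y: centered n x q matrix, W: p x c weight matrix
  Tmat <- X %*% W                                # Equation (3): T = XW
  Qt   <- solve(t(Tmat) %*% Tmat, t(Tmat) %*% Y) # Q^T = (T^T T)^{-1} T^T Y
  B    <- W %*% Qt                               # B = W Q^T
  list(T = Tmat, B = B)
}

pls_predict <- function(B, x0_raw, X_raw, Y_raw) {
  # Prediction for an uncentered observation x'_0:
  # y-hat'_0 = mean(y') + B^T (x'_0 - mean(x'))
  colMeans(Y_raw) + t(B) %*% (x0_raw - colMeans(X_raw))
}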
The basic idea of the PLS method is that the response Y should be taken into account when constructing the components T. More precisely, the components are defined such that they have high covariance with the response, as outlined in 'Univariate response' and 'Multivariate response' sections. That is why PLS is called a supervised method, in contrast to, e.g. principal component analysis (PCA), which does not use the response for the construction of the new components. This feature explains why PLS usually performs better than PCA in prediction problems.

The characterization of the various PLS regression approaches might be done at four different levels:

• the objective function maximized by the W matrix,
• the W matrix itself,
• the obtained matrix of regression coefficients B and
• the algorithm used to compute W.

These four different levels are connected as follows:

• The same W matrix can maximize several objective functions. But a given objective function is generally satisfied by only one W matrix (and its opposite −W).
• There might be several algorithms that output the same W matrix.
• A given W matrix leads to only one possible matrix of regression coefficients. But two different matrices W and W* can lead to the same regression coefficients if there exists an invertible c × c matrix M such that W* = WM. Note that, although W and W* lead to the same prediction, they do not necessarily satisfy the same objective function.

Univariate response
In this section, the case of univariate response variables (q = 1) is considered. Thus, Y is an n × 1 matrix, i.e. a vector of length n. Y_1 is denoted as Y in the present section. For a fixed weight vector w_i = (w_{1i}, ..., w_{pi})^T, the sample covariance between the response variable Y and the random variable T_i = w_{1i} X_1 + ... + w_{pi} X_p can be computed as

\widehat{COV}(Y, T_i) = \frac{1}{n} w_i^T X^T Y,

since the matrices X and Y contain the centered data. Similarly, for the sample variance of the random variable T_i, we have

\widehat{VAR}(T_i) = \frac{1}{n} w_i^T X^T X w_i = \frac{1}{n} t_i^T t_i,

and for the sample covariance between T_i and T_j (i ≠ j, i, j = 1, ..., c),

\widehat{COV}(T_i, T_j) = \frac{1}{n} w_i^T X^T X w_j = \frac{1}{n} t_i^T t_j.

In PLS univariate regression, there is only one commonly adopted objective function. The columns w_1, ..., w_c of the p × c weight matrix W are defined such that the squared sample covariance between Y and the latent components is maximal under the condition that the latent components are mutually empirically uncorrelated. Moreover, the vectors w_1, ..., w_c are constrained to be of unit length.

Objective function 1: Univariate PLS (PLS1)
For i = 1, ..., c,

w_i = \mathrm{argmax}_w \; w^T X^T Y Y^T X w,
subject to w_i^T w_i = 1 and t_i^T t_j = w_i^T X^T X w_j = 0 for j = 1, ..., i − 1, where c is the number of latent components fixed by the user. The maximal number of such latent components that have non-zero covariance with Y is c_max = min(n − 1, p). The weight vectors w_1, ..., w_c can be computed sequentially via a simple and fast non-iterative algorithm given, e.g. in [12] and denoted as 'algorithm with orthogonal scores', because the matrix T^T T is diagonal. Martens and Naes [12] also give another algorithm, denoted as 'algorithm with orthogonal loadings', which outputs a different W matrix. Using this algorithm, one obtains orthogonal loadings instead of orthogonal latent components (P^T P is diagonal but not T^T T). It can be shown [8] that the resulting regression coefficients in matrix B are the same with both algorithms. Since orthogonal latent components are easier to interpret than orthogonal loadings, the first algorithm is almost always preferred in the literature. Some statistical aspects of PLS1 regression are discussed by, e.g. [9–11]. From a practical point of view, the objective function of PLS1 can be interpreted as follows. From objective function 1, it is clear that the components constructed in PLS1 have maximal covariance with the response and thus have high predictive power. Moreover, they are not redundant, since they are mutually uncorrelated. The case of a multivariate response (q > 1) is presented in the following section.
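As an illustration of the sequential construction, here is a hedged R sketch of a PLS1 fit with orthogonal scores, in the spirit of the deflation algorithm referenced above [12]; it is not a drop-in replacement for the routines in the pls or pls.pcr packages. Note that each w_i is defined with respect to the successively deflated predictor matrix; the components t_i themselves span the PLS subspace.

pls1 <- function(X, y, c) {
  # X: centered n x p matrix, y: centered n-vector, c: number of components
  n <- nrow(X); p <- ncol(X)
  W <- matrix(0, p, c); Tmat <- matrix(0, n, c)
  Xd <- X                                   # working (deflated) copy of X
  for (i in 1:c) {
    w  <- crossprod(Xd, y)                  # direction maximizing the squared covariance
    w  <- w / sqrt(sum(w^2))                # unit-length weight vector
    ti <- Xd %*% w                          # i-th latent component
    Xd <- Xd - ti %*% crossprod(ti, Xd) / sum(ti^2)  # deflate: remove the t_i direction
    W[, i] <- w; Tmat[, i] <- ti
  }
  list(W = W, T = Tmat)                     # T^T T is diagonal (orthogonal scores)
}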
Multivariate response
The case of a multivariate response is more difficult to handle, since one has to find latent components which explain all the responses Y_1, ..., Y_q simultaneously. There are two main variants of multivariate PLS regression. The first variant is usually denoted as PLS2, in contrast to the univariate method PLS1, or simply PLS. To avoid misunderstandings, we use the term PLS2. The W matrix corresponding to PLS2 may be obtained via several algorithms. The most well-known are the Nonlinear Iterative Partial Least Squares (NIPALS) algorithm and the Kernel-PLS algorithm, which are implemented in the R packages pls and pls.pcr. Recently, ter Braak and de Jong [13] discovered that PLS2 maximizes the same expression as Statistically Inspired Modification of PLS (SIMPLS), but with different and less intuitive constraints.

Objective function 2: PLS2
For i = 1, ..., c,

w_i = \mathrm{argmax}_w \; w^T X^T Y Y^T X w,

subject to w_i^T (I_p − W W^+) w_i = 1 and t_i^T t_j = w_i^T X^T X w_j = 0 for j = 1, ..., i − 1, where I_p denotes the p × p identity matrix and W^+ is the unique Moore–Penrose inverse of W.

The second important variant of multivariate PLS regression is SIMPLS, which was first introduced by de Jong [14]. In contrast to PLS2, SIMPLS was first formulated as an optimality problem; algorithms were then developed to solve this optimality problem.

Objective function 3: SIMPLS
For i = 1, ..., c,

w_i = \mathrm{argmax}_w \; w^T X^T Y Y^T X w,

subject to w_i^T w_i = 1 and t_i^T t_j = w_i^T X^T X w_j = 0 for j = 1, ..., i − 1.

The term w^T X^T Y Y^T X w which is maximized by both PLS2 and SIMPLS is the same as in the univariate case. In the case of a multivariate response (q > 1), it can be reformulated as the sum of the squared empirical covariances between T and Y_1, ..., Y_q:

w^T X^T Y Y^T X w = \left( (Xw)^T Y \right) \left( (Xw)^T Y \right)^T = n^2 \sum_{j=1}^{q} \widehat{COV}(T, Y_j)^2,

where T is the random variable corresponding to the latent component t = Xw. Note that SIMPLS can be seen as a generalization of univariate PLS to multivariate response variables, because it has the same criterion w^T X^T Y Y^T X w and the same constraints. Another equivalent objective function for SIMPLS is often found in the literature, which involves weight vectors for both the response variables and the predictor variables. Based on this formulation, it becomes clear that PLS is connected to classical canonical correlation analysis (CCA). The main difference between the two approaches is that PLS does not maximize correlations but covariances. Thus, PLS does not require the inversion of a p × p covariance matrix, in contrast to CCA. This feature makes it appropriate for the analysis of high-dimensional data. It can be shown using results from linear algebra [15] that objective functions 3 and 4 are equivalent.

Objective function 4: SIMPLS (equivalent formulation)
For i = 1, ..., c,

(w_i, u_i) = \mathrm{argmax}_{w, u} \; w^T X^T Y u,
subject to w_i^T w_i = u_i^T u_i = 1 and t_i^T t_j = w_i^T X^T X w_j = 0, for j = 1, ..., i − 1.

As for PLS2, there exist several algorithms that solve the optimality problem of SIMPLS. One of them is implemented in the function simpls from the R package pls.pcr. A particularity of the R function simpls is that it returns unit-length scores instead of unit-length weights (as one would expect when considering objective function 3). By transforming the weights to have unit length, one obtains weights satisfying objective function 3. A user-friendly version of SIMPLS implementing this transformation can be found in the R package plsgenomics [16].
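The covariance reformulation above is easily verified numerically. The following R snippet (simulated data, all names illustrative) checks that w^T X^T Y Y^T X w equals n^2 times the sum of squared empirical covariances between t = Xw and the columns of Y:

set.seed(1)
n <- 30; p <- 100; q <- 3
X <- scale(matrix(rnorm(n * p), n, p), scale = FALSE)  # centered predictors
Y <- scale(matrix(rnorm(n * q), n, q), scale = FALSE)  # centered responses
w <- rnorm(p)
lhs <- drop(t(w) %*% t(X) %*% Y %*% t(Y) %*% X %*% w)
t1 <- drop(X %*% w)                                    # latent component t = Xw
covs <- colSums(t1 * Y) / n                            # empirical cov(T, Y_j), j = 1..q
rhs <- n^2 * sum(covs^2)
all.equal(lhs, rhs)                                    # TRUE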
APPLICATIONS OF PARTIAL LEAST SQUARES TO HIGH-DIMENSIONAL GENOMIC DATA

Regression problems
Any genomic analysis that incorporates a regression model may profit from the application of PLS. Some important recent examples are briefly reviewed in this section.

• A straightforward application of univariate PLS regression to expression data from the yeast Saccharomyces cerevisiae can be found in [17]. In this study, some handpicked gene expression levels are regressed against the expression levels of other genes using PLS1 with different numbers of latent components. The magnitude of the obtained regression coefficients is interpreted in terms of the interaction strength between genes.
• PLS regression has also been successfully applied to missing value imputation in microarray data by Bras and Menezes [18]. In this approach, the missing values are imputed by PLS regression using all the genes with observed values as predictors. Another reference on PLS imputation in the context of microarray data is Nguyen et al. [19].
• Huang et al. [20] use PLS regression for a prediction purpose. The aim is to model a continuous variable (LVAD support time) using p gene expression levels as predictors. LVAD stands for 'left ventricular assist device', a mechanical substitution therapy for heart failure patients waiting for transplantation. Although PLS regression can handle a very large number of predictors and can thus be applied to this problem without adaptation, Huang et al. [20] suggest a penalized version of PLS regression (PPLS), which eliminates genes with poor prediction power. Their method is based on the shrinkage of the p regression coefficients obtained by PLS regression. After the shrinkage procedure, a number of genes (depending on the shrinkage parameter λ) no longer contribute to the model. Huang et al. [20] suggest using cross-validation for the selection of both the shrinkage parameter λ and the number c of latent components used to produce the regression coefficients.
• PLS regression is used by Johansson et al. [21] to identify periodically expressed genes. Johansson et al. [21] construct a virtual response Y that represents cyclic behavior with the same periodicity as the cell cycle. The genes that contribute significantly to the PLS regression model are then interpreted as cell-cycle regulated.
• Applications of PLS multivariate regression to other types of data include the prediction of transcription factor activities from a combined analysis of gene expression data and chromatin immunoprecipitation (ChIP) data, as proposed by Boulesteix and Strimmer [16] (see the sketch after this list). The transcription of genes is regulated by DNA binding proteins, which are known as transcription factors. An issue of interest for biologists is the estimation of the activity levels of these transcription factors. The available data material includes microarray data for the potential target genes under different experimental conditions, and 'connectivity' data (e.g. ChIP data) giving the amount of interaction between the transcription factors and the considered genes. Boulesteix and Strimmer [16] assume as the relationship between microarray data and connectivity data the linear structure Y = A + XB + F, where Y is the n × q matrix containing the expression levels of n genes (rows) in q conditions (columns), X is the n × p matrix containing the connectivity information for the n genes (rows) and p transcription factors (columns), A is an n × q matrix corresponding to the intercepts and F is an n × q error matrix. The p × q matrix B corresponds to the activity levels of the p transcription factors in the q considered conditions. Thus, the estimation of the transcription factor activities can be formulated as a simple regression problem that is solved in [16] by employing the SIMPLS method. Using PLS in this context allows one not only to extract information
on the transcription factor activities, but also to identify coherent 'meta-factors' corresponding to the different latent components.
• Other applications of PLS to regression problems in genomic data analysis include, e.g. the prediction of protein structure (e.g. the helix or strand content) using high-dimensional sequence data [22].
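To make the transcription factor setting concrete, here is a hedged R sketch of the activity estimation just described. It is a simplification, not the exact SIMPLS procedure of [16]: the weight directions are taken from a singular value decomposition of X^T Y, which coincides with SIMPLS only for the first component. It reuses the pls_fit helper from the 'PLS regression' section, and all names are illustrative.

tf_activities <- function(Xconn, Yexpr, c = 2) {
  # Xconn: n x p connectivity matrix (genes x transcription factors)
  # Yexpr: n x q expression matrix (genes x conditions); requires c <= q
  X <- scale(Xconn, scale = FALSE)      # centering absorbs the intercept matrix A
  Y <- scale(Yexpr, scale = FALSE)
  W <- svd(crossprod(X, Y))$u[, 1:c, drop = FALSE]  # p x c weight matrix
  pls_fit(X, Y, W)$B                    # p x q matrix of estimated TF activities
}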
                                                              expression levels of 9605 genes in 60 tumor cell
Classification problems                                       lines of eight different types (leukemia, non-
The example above considered only the case of                 small-cell lung, colon, melanoma, ovarian, breast,
continuous response variables Y. In many studies,             central nervous system and renal).
however, the response to be predicted is categorical.        Other classification studies based on PLS regres-
                                                                                                                        Downloaded from https://academic.oup.com/bib/article/8/1/32/265330 by guest on 12 January 2025
In other words, Y may take only one of K possible             sion can be found in [32–36]. A similar approach
unordered values Y ¼ 0, . . ., K  1. For instance,           based on PLS regression to perform classification
Y could be the tumor type of a particular cancer              in the context of meta-analysis is suggested in [37].
patient. If Y is multicategorical (K > 2), it has to be
transformed before PLS dimension reduction.                    There exists another route to classification using
A simple transformation method consists to convert          partial least squares, first proposed by Nguyen and
Y into K  1 random variables Y1, . . ., YK  1 defined     Rocke [38, 39] and further studied by Boulesteix
as follows:                                                 [40] and compared with other dimension reduction
                                                            techniques in [41]. This approach first employs PLS
                  Yj ¼1 if Y ¼ j,                           as a dimension reduction method and subsequently
                     ¼0 otherwise:                          uses the PLS latent components as predictors in a
                                                            classical discrimination method (e.g. logistic regres-
   Using this transformation, it can be shown that
                                                            sion, linear or quadratic discriminant analysis).
multivariate PLS dimension reduction (almost) leads
                                                            To apply this method, one has to choose (i) the
to the same components as PCA performed on the
                                                            number of latent components to be extracted in the
between-group sample covariance matrix. A collec-
                                                            dimension reduction step and (ii) the classification
tion of properties on this topic as well as mathe-
                                                            method to be used for the classification step.
matical proofs are given in [23]. These properties can
                                                               In Nguyen and Rocke [38, 39], three classifica-
be seen as a justification of PLS dimension reduction
                                                            tion methods are studied: logistic regression, linear
with categorical variables. Recently, many research-
                                                            discriminant analysis and quadratic discriminant
ers have considered the PLS methods for
                                                            analysis. In [40], the only investigated classification
classification:
                                                            method is linear discriminant analysis. Generally, linear
 In two independent comparative studies by Man             discriminant analysis (LDA) turns out to yield the
  et al. [24] and Huang et al. [25], classification based   best classification performance, whereas quadratic
  on PLS regression is reported to lead to high             discriminant analysis gives worse results. In the
  prediction accuracy.                                      extensive comparison study performed by
 PLS classification analysis for binary response has       Boulesteix [40], which included many currently
  been investigated by Huang and Pan [26] for               employed methods, PLSþLDA turns out to range
  leukemia [27] and colon cancer data [28]. Each            among the best classification procedures for all the
  observation is assigned to one of the two classes         eight studied cancer data sets. According to this
  0 or 1, depending on the continuous prediction.           study, the most successful other methods are the
  Huang and Pan [26] suggest to determine the best          nearest centroids approach by Tibshirani et al. [42]
  number of latent components by leave-one-out              and the support vector machines.
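The two-step procedure described above lends itself to a compact implementation. Below is a hedged R sketch of PLS dimension reduction followed by LDA, reusing the SVD shortcut from the transcription factor sketch; plsgenomics provides a full implementation of this classifier (function pls.lda), whereas this standalone version only illustrates the idea. With the SVD shortcut used here, at most K − 1 components can be extracted; a proper SIMPLS or PLS2 fit does not have this restriction.

library(MASS)  # for lda()

dummy_code <- function(y) {
  # y: factor with K levels -> centered n x (K-1) indicator matrix
  K <- nlevels(y)
  Y <- sapply(1:(K - 1), function(j) as.numeric(y == levels(y)[j]))
  scale(Y, scale = FALSE)
}

pls_lda <- function(X, y, Xnew, c = 2) {
  Xc <- scale(X, scale = FALSE)
  Y  <- dummy_code(y)
  W  <- svd(crossprod(Xc, Y))$u[, 1:c, drop = FALSE]  # c <= K - 1 here
  Tmat <- Xc %*% W                                    # latent components (training)
  Tnew <- scale(Xnew, center = colMeans(X), scale = FALSE) %*% W
  predict(lda(Tmat, grouping = y), Tnew)$class        # LDA on the components
}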
Feature selection
An issue that is tightly connected with the prediction of a clinical outcome is the identification of genes whose expression levels are associated with
the considered outcome. For instance, a physician might want to find out which genes have different expression levels in tumor tissues and normal tissues. The selection of relevant genes is important both for biologists who aim to understand the function of genes and the cell processes, and for statisticians who want to apply statistical methods which can handle a restricted number of variables.

In the case of PLS1 dimension reduction (see 'Univariate response' section) applied to binary classification problems (see 'Classification problems' section), the weight vector w_1 = (w_{11}, ..., w_{p1})^T defining the first latent component may be used to order the p genes in terms of their relevance for the classification problem [40]. Let F_j denote the F-statistic used in analysis of variance and computed from X for gene j as

F_j = (n - 2) \, \frac{\sum_{k=0}^{1} \sum_{i: y_i = k} (\bar{x}_{kj} - \bar{x}_j)^2}{\sum_{k=0}^{1} \sum_{i: y_i = k} (x_{ij} - \bar{x}_{kj})^2},

where

\bar{x}_j = \frac{1}{n} \sum_{i=1}^{n} x_{ij} = 0

and

\bar{x}_{kj} = \frac{1}{n_k} \sum_{i: y_i = k} x_{ij},

with n_k denoting the number of observations from class k in the sample. F_j is often used as a selection criterion to order genes in terms of their relevance for the classification problem. Boulesteix [40] proves that F_j is a monotonic transformation of the squared weight coefficient w_{j1}^2 of PLS1 if the columns of the predictor matrix X have been preliminarily scaled to unit variance. Thus, the ordering of the genes obtained from the weight vector w_1 is equivalent to the ordering obtained using the F-statistic, which is one of the most common ordering criteria in microarray data analysis. This shows that PLS dimension reduction and variable selection are in fact two tightly related procedures, and it also indicates that PLS methods take more information into account than usual univariate gene selection procedures, since they often involve more than one latent component. Similar results might also be obtained in the framework of regression.

A gene selection approach based on several PLS latent components is applied to gene expression data by Musumarra et al. [30, 43]. It is based on all the weight vectors w_1, ..., w_c and implemented in the software package SIMCA. The 'variable influence' VIN_{aj} of gene j for the ath PLS component is defined as a function of w_{aj}^2 and the proportion of the sum of squares explained by the ath latent component. Finally, the genes are ordered according to their 'variable importance in the projection' VIP_j, which is defined for each gene j as the sum of the VIN_{aj} over the c PLS latent components. An advantage of this approach is that it captures information on the single genes from all the PLS latent components included in the analysis. Thus, it can also discover non-linear patterns which the F-statistic would fail to detect. A major drawback of the VIP index is its lack of theoretical background. One might investigate its connections to the matrix of regression coefficients.
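The equivalence between the w_1-ordering and the F-statistic ordering discussed above is easy to check numerically. In the following R snippet (simulated data, names illustrative), the ordering of genes by the squared first PLS1 weight, computed after scaling the columns of X to unit variance, coincides with the ordering by the F-statistic:

set.seed(2)
n <- 40; p <- 200
X <- matrix(rnorm(n * p), n, p)
y <- rep(0:1, each = n / 2)                 # binary class labels
Xs <- scale(X)                              # centered, unit-variance columns
w1 <- crossprod(Xs, y - mean(y))            # first PLS1 weight direction (unnormalized)
Fj <- apply(X, 2, function(x) summary(aov(x ~ factor(y)))[[1]][["F value"]][1])
all(rank(-w1^2) == rank(-Fj))               # TRUE: identical gene orderings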
Survival analysis
Another issue of interest in the statistical analysis of gene expression data is the prediction of the survival time Y of diseased patients using their gene expression profiles. In this context, survival data are usually denoted as a triple (t, δ, x), where:

• t is a continuous variable, usually called failure time, which equals the time to death Y if δ = 1 or the time to censoring if δ = 0,
• δ is a binary variable, which equals 1 if the death of the patient was observed before censoring and 0 if the patient was still alive at the end of the study,
• x = (X_1, ..., X_p)^T is a vector of p continuous gene expression levels which are considered as predictor variables.

Standard approaches to predict survival times using continuous predictors, such as the proportional hazards regression model (PH model) by Cox [44], may not be applied directly if n < p. Various approaches based on the clustering of genes or observations have been proposed, with the inconvenience that the results depend on the chosen clustering algorithm. PLS-based survival analysis is another important family of methods for survival analysis with many predictors.

Nguyen and Rocke [45] suggest a two-stage method that (i) performs univariate PLS with the failure time as response variable and X_1, ..., X_p as
predictors, and (ii) uses the obtained first latent components as predictors in classical PH regression. They apply their approach to lymphoma data [46], giving the survival time and expression levels of 5622 genes for 40 lymphoma patients, and to breast cancer data [47], giving the survival time and expression levels of 3846 genes for 49 breast cancer patients. In this two-step procedure, dimension reduction and prediction using PH regression are performed successively. The specificity of the failure time is not taken into account during the dimension reduction stage: both time to death and time to censoring are treated as the same continuous variable in the dimension reduction step, which is a severe drawback if censoring is non-negligible. Improvements of this approach are proposed in [48–50]. These approaches combine the construction of the successive PLS latent components with PH regression, but in different ways. They are reviewed in 'Outlook and Generalizations of PLS' section, which deals with PLS-based methods for special response variables.
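A hedged R sketch of this two-stage strategy (using the pls1 helper from the 'Univariate response' section and the survival package; all other names illustrative) makes the structure, and the weakness just mentioned, explicit: the censoring indicator enters only in the second stage.

library(survival)

pls_cox <- function(X, time, status, c = 2) {
  # Stage (i): univariate PLS on the failure time, censoring ignored
  Xc  <- scale(X, scale = FALSE)
  fit <- pls1(Xc, time - mean(time), c)
  # Stage (ii): Cox proportional hazards model on the latent components
  dat <- as.data.frame(fit$T)
  coxph(Surv(time, status) ~ ., data = dat)
}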
Available software
There are currently four R packages that implement partial least squares approaches:

• plsgenomics (http://cran.r-project.org/src/contrib/Descriptions/plsgenomics.html)
  This package implements PLS regression (using the function simpls from the pls.pcr package) with user-friendly features such as the choice of the number of components. It also implements the classification method PLS+LDA presented in 'Classification problems' section and discussed by Nguyen and Rocke [38, 39] and Boulesteix [40], as well as the ridge PLS method [51] mentioned in 'PLS and generalized linear models' section.
• pls.pcr (http://cran.r-project.org/src/contrib/Descriptions/pls.pcr.html)
  This package implements the two main variants of multivariate PLS regression, SIMPLS and PLS2, as well as PCR.
• pls (http://cran.r-project.org/src/contrib/Descriptions/pls.html)
  This package is an extension of the earlier package pls.pcr including, e.g. various plot functions and a formula interface.
• gpls (http://cran.r-project.org/src/contrib/Descriptions/gpls.html)
  This package implements the classification method using generalized PLS [52] mentioned in 'PLS and generalized linear models' section.
• plss (http://www.math.univ-montp2.fr/durand/ProgramSources.html)
  These programs implement PLS regression based on spline transformations of the predictors [53]. They work only under R for Windows.

Other software
• Classification with PLS regression (PLS-DA, where DA stands for discriminant analysis) is implemented in the software tool SIMCA (http://www.umetrics.com/default.asp/pagename/software_simcap/c/3/).
• The SAS procedure PLS implements several dimension reduction methods such as PCR, Reduced Rank Regression (RRR) and PLS. The two main versions of multivariate PLS (SIMPLS and PLS2) are available. For PLS2, one may specify the algorithmic variant as an option, for instance NIPALS. (http://support.sas.com/rnd/app/da/new/dapls.html)
• The PLS Toolbox (by Eigenvector Research Incorporated) for use with MATLAB (http://software.eigenvector.com/toolbox/3_5/index.html) includes a wide range of methods for multivariate statistical analysis, some of which are based on PLS regression. In particular, it includes the function plsda, which performs classification (class prediction) based on SIMPLS or PLS2 regression.
• The software tool Unscrambler (http://www.camo.com/rt/Products/Unscrambler/unscrambler.html) also implements univariate (PLS1) and multivariate (PLS2) regression as well as PLS-DA.

OUTLOOK AND GENERALIZATIONS OF PLS
So far, we have considered applications of PLS regression to various biological problems. However, applying a regression method designed for continuous responses to categorical responses, or
performing dimension reduction with survival data without taking censoring into account, is unappealing, although it is reported to give good results in many cases. In this section, we review methods that use the principle of PLS regression but adapt it to handle special types of responses such as survival time or categorical outcome. These methods can be divided into two categories. In the first category of methods, the structure of the univariate PLS regression algorithm remains unchanged, but the coefficients used to construct the latent components are modified. In the second category of methods, the PLS algorithm is embedded into a complex generalized regression procedure. Both approaches can be applied to, e.g. survival analysis and classification. In the following section, we consider only the univariate case, i.e. Y is an n × 1 matrix (an n-vector).

Modification of the latent components in PLS regression
Let us consider objective function 1. Some calculation using the Lagrange multiplier method yields

t_1 = X X^T Y / \| X^T Y \|.

In the most usual PLS1 algorithm, the latent components t_2, ..., t_c are built sequentially in a similar way as t_1, except that X and Y are replaced by deflated matrices. With t_1^T = (t_{11}, ..., t_{n1}) and x_{ij} denoting the element of X at row i and column j, simple transformations lead to

t_{i1} \propto \sum_{j=1}^{p} \widehat{COV}(Y, X_j) \, x_{ij} \propto \sum_{j=1}^{p} \widehat{VAR}(X_j) \, \hat{\beta}_j \, x_{ij},

where \hat{\beta}_j is the least squares regression coefficient obtained by regressing Y against X_j. The subsequent components t_2, ..., t_c may be expressed in a similar way using deflated matrices. Several studies are based on the idea that \hat{\beta}_j is not an optimal choice when Y is a binary or survival variable. Li and Gui [50] suggest replacing \hat{\beta}_j by the regression coefficient of X_j obtained via Cox regression analysis, thus taking the specificity of the response variable Y into account. For the construction of t_1, Y is regressed against X_j. For the construction of t_j, j > 1, Y is regressed against X_j and the j − 1 first latent components. A similar approach is proposed by Bastien [54] and studied from a methodological point of view in [55]. The idea to replace a linear regression coefficient by a Cox regression coefficient also inspired another method denoted as 'MPLS': Nguyen [48] gives a different non-sequential expression of the PLS1 latent components t_1, ..., t_c involving eigenvectors of the matrices X^T X and X X^T (see [56] for details). This complex expression also contains a linear regression coefficient, which Nguyen [48] replaces by a Cox regression coefficient. The same approach is also used in the context of binary classification [56] and denoted as 'PLSM2'.

A related approach denoted as PLS logistic regression is used in [57] to map complex trait genes using gene expression data. In this setting, the response is a categorical genetic trait and the latent components t_2, ..., t_c are constructed based on the regression coefficients estimated from a logistic regression model. Perez-Enciso et al. [57] demonstrate the potential of this approach based on an extensive simulation study.
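The closed form for t_1 given above can be checked numerically; the snippet also illustrates the covariance-weighted representation whose coefficients Li and Gui [50] replace by Cox regression coefficients (R, simulated data, names illustrative):

set.seed(3)
n <- 25; p <- 80
X <- scale(matrix(rnorm(n * p), n, p), scale = FALSE)  # centered predictors
y <- rnorm(n); y <- y - mean(y)                        # centered response
s <- crossprod(X, y)                                   # X^T Y
t1 <- X %*% s / sqrt(sum(s^2))                         # t_1 = X X^T Y / ||X^T Y||
t1_alt <- X %*% (s / n)                                # sum_j cov(Y, X_j) x_ij
all.equal(drop(cor(t1, t1_alt)), 1)                    # TRUE: same direction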
PLS and generalized linear models
Marx [58] proposes an extension of the concept of PLS regression into the framework of generalized linear models. This approach, which is denoted as iteratively reweighted partial least squares (IRPLS or IRWPLS), embeds the univariate PLS regression algorithm into the iterative steps of the usual Iteratively Reweighted Least Squares algorithm [59] for generalized linear models, resulting in two nested loops. The loops are iterated a fixed number of times or until a convergence criterion is reached. This apparently appealing approach has a major drawback in practical microarray data analysis: convergence is never reached if X is full row-rank, which is most often the case in high-dimensional microarray data with n ≪ p [51]. The IRPLS method, as well as a few adaptations overcoming the convergence problem, have been applied both to survival analysis and classification. Binary classification is one of the most common applications of generalized linear models and of Marx's IRPLS algorithm. To our knowledge, the IRPLS algorithm has never been applied directly to classification with microarray data. However, it has inspired at least two recent papers on the generalization of PLS regression to categorical response variables.

The first approach is proposed by Ding and Gentleman [52] and can be seen as an adaptation of Marx's IRPLS method which solves the problem of separation. As already mentioned in 'Classification
problems’ section, infinite parameter estimates can       In this article we have reviewed the PLS approach to
occur in binary logistic regression when the              regression and dimension reduction that is perfectly
two classes are completely or quasi-completely            suited for analysing this kind of data.
separated [60]. Firth [61] suggests a procedure to           Specifically, PLS has several advantages over many
remove the first-order term of the asymptotic bias        competing approaches:
of maximum likelihood estimates in Generalized
Linear Models (GLMs). The procedure is based on a          It automatically performs variable selection.
modified score function which, when applied to             It can be applied to a diverse set of tasks, including
logistic regression, guarantees finite estimates [62].      classification, survival analysis and modeling of
The binary classification method obtained by using          transcription factors activities.
the Firth’s modified score function in place of the        It is statistically very efficient.
usual score function in the IRPLS algorithm is             Moreover, it is computationally very fast, which
denoted as IRWPLSF by Ding and Gentleman [52].              renders it practical for application to large data sets.
                                                                                                                                  Downloaded from https://academic.oup.com/bib/article/8/1/32/265330 by guest on 12 January 2025
They also propose a generalization of the method to
multicategorical response variables, which is based on       As outlined in ‘Application of Partial Least
the multinomial logit model and denoted as                Squares to High-dimensional Genomic Data’ and
MIRWPLSF. The IRWPLSF and MIRWPLSF are                    ‘Outlook and Generalizations of PLS’ sections of
reported to achieve a slightly better classification      this review, at present most reported applications
performance than usual classification methods such as     of the PLS method to genomic data focus on the
nearest neighbors or SVM on the colon cancer data         analysis of microarray data from gene expression
[28] and on the NCI cancer data [31]. The second          experiments. The key advantages that characterize
approach to modify Marx’s IRPLS is suggested in           the PLS methodology are versatility and flexibility.
[51]: the procedure embeds a PLS step into ridge          On the one hand, it can be directly applied to
penalty logistic regression and might also be general-    various types of data of any dimensions for different
ized to multicategorical responses. This method is        prediction or imputation problems. On the other
applied with success to the colon cancer data [28], the   hand, PLS algorithms adapt easily to a broad range
leukemia data [27] and the prostate cancer data [63].     of questions and thus serve as a flexible basis for
Another classical application of generalized linear models and IRPLS is survival analysis. As suggested in [64], Park et al. [49] transform the failure time problem into a generalized linear regression problem with a logarithmic link function. They propose to use the IRPLS estimation method for generalized linear regression [58]. In contrast to the two-stage scheme developed in [45], this method takes censoring explicitly into account. The number of latent components is chosen via a cross-validation procedure, which suggests using c = 1 for the lung cancer data set [65]. According to Park et al. [49], convergence is achieved in a few steps. However, this property seems to be controversial: lack of convergence is invoked as a drawback of the method in the more recent paper by Li and Gui [50].
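The component-selection step can be made concrete with a small sketch: a minimal numpy implementation of PLS1 together with K-fold cross-validation over the number of components c. The toy data form an uncensored regression problem, so the sketch illustrates only the cross-validation logic, not the censored-survival machinery of Park et al. [49]; all function names are our own.

```python
import numpy as np

def pls1_fit(X, y, c):
    """PLS1 for centered X (n x p) and centered y (n,): returns the p-vector
    of regression coefficients B = W (P^T W)^{-1} q using c components."""
    Xj = X.copy()
    W, P, q = [], [], []
    for _ in range(c):
        w = Xj.T @ y
        w /= np.linalg.norm(w)          # weight vector w_j
        t = Xj @ w                      # latent component t_j
        tt = t @ t
        P.append(Xj.T @ t / tt)         # X-loading p_j
        q.append(t @ y / tt)            # y-loading q_j
        W.append(w)
        Xj = Xj - np.outer(t, P[-1])    # deflate X
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)

def cv_mse(X, y, c, folds=5, seed=0):
    """K-fold cross-validated mean squared error for a given number c of
    latent components (centering is redone inside each fold)."""
    idx = np.random.default_rng(seed).permutation(len(y))
    errs = []
    for f in range(folds):
        test = idx[f::folds]
        train = np.setdiff1d(idx, test)
        mx, my = X[train].mean(0), y[train].mean()
        B = pls1_fit(X[train] - mx, y[train] - my, c)
        pred = (X[test] - mx) @ B + my
        errs.append(np.mean((y[test] - pred) ** 2))
    return float(np.mean(errs))

# toy 'small n, large p' regression: pick the c with the smallest CV error
rng = np.random.default_rng(1)
X = rng.standard_normal((40, 200))
y = X[:, :5].sum(axis=1) + 0.5 * rng.standard_normal(40)
for c in range(1, 6):
    print(c, round(cv_mse(X, y, c), 3))
```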
CONCLUSIONS
The microarray ‘revolution’ has led to an enormous increase in the availability of high-dimensional biomedical data. Classical multivariate methods are not applicable to these ‘small n, large p’ data sets. In this article we have reviewed the PLS approach to regression and dimension reduction, which is perfectly suited for analysing this kind of data.
Specifically, PLS has several advantages over many competing approaches:
- It automatically performs variable selection.
- It can be applied to a diverse set of tasks, including classification, survival analysis and modeling of transcription factor activities.
- It is statistically very efficient.
- Moreover, it is computationally very fast, which renders it practical for application to large data sets.
As outlined in the ‘Application of Partial Least Squares to High-dimensional Genomic Data’ and ‘Outlook and Generalizations of PLS’ sections of this review, at present most reported applications of the PLS method to genomic data focus on the analysis of microarray data from gene expression experiments. The key advantages that characterize the PLS methodology are versatility and flexibility. On the one hand, it can be directly applied to various types of data of any dimension for different prediction or imputation problems. On the other hand, PLS algorithms adapt easily to a broad range of questions and thus serve as a flexible basis for the development of novel tools for the analysis of biological data. In short, we expect that with the advent of proteomics data, e.g. from mass spectrometric experiments, PLS will in the future also play a major role in the analysis of many other kinds of high-dimensional omics data.

Key Points
- PLS is an efficient statistical prediction tool that is especially appropriate for small-sample data with many (possibly correlated) variables.
- PLS is fast, easy to implement and does not necessitate any preliminary feature selection.
- The problems that may be addressed by the PLS method are very diverse and include, e.g. tumor diagnosis, survival analysis and modeling of regulatory networks.

References
1. Wold H. Estimation of principal components and related models by iterative least squares. In: Krishnaiah PR (ed). Multivariate Analysis. New York: Academic Press, 1966;391–420.
2. Wold H. Nonlinear Iterative Partial Least Squares (NIPALS) modeling: some current developments. In: Krishnaiah PR (ed). Multivariate Analysis. New York: Academic Press, 1973;383–407.
3. Wold H. Path models with latent variables: the NIPALS approach. In: Blalock HM (ed). Quantitative Sociology: International Perspectives on Mathematical and Statistical Model Building. New York: Academic Press, 1975.
4. Wold S, Ruhe A, Wold H, et al. The collinearity problem in linear regression. The partial least squares (PLS) approach to generalized inverses. SIAM J Sci Stat Comput 1984;5:735–43.
5. Martens H. Reliable and relevant modeling of real world data: a personal account of the development of PLS regression. Chemom Intell Lab Syst 2001;58:85–95.
6. Wold S. Personal memories of the early PLS development. Chemom Intell Lab Syst 2001;58:83–4.
7. Phatak A, de Hoog F. Exploiting the connection between PLSR, Lanczos, and conjugate gradients: alternative proofs of some properties of PLSR. J Chemom 2002;16:361–7.
8. Helland I. On the structure of partial least squares regression. Comm Stat Simul Comp 1988;17:581–607.
9. Stone M, Brooks RJ. Continuum regression: cross-validated sequentially constructed prediction embracing ordinary least squares, partial least squares and principal component regression. J Roy Stat Soc B 1990;52:237–69.
10. Frank IE, Friedman JH. A statistical view of some chemometrics regression tools. Technometrics 1993;35:109–35.
11. Garthwaite PH. An interpretation of partial least squares. J Am Stat Assoc 1994;89:122–7.
12. Martens H, Naes T. Multivariate Calibration. New York: Wiley, 1989.
13. ter Braak CJF, de Jong S. The objective function of partial least squares regression. J Chemom 1998;12:41–54.
14. de Jong S. SIMPLS: an alternative approach to partial least squares regression. Chemom Intell Lab Syst 1993;18:251–63.
15. Rao CR. Linear Statistical Inference and its Applications. New York: Wiley, 1993.
16. Boulesteix AL, Strimmer K. Predicting transcription factor activities from combined analysis of microarray and ChIP data: a partial least squares approach. Theor Biol Med Model 2005;2:23.
17. Datta S. Exploring the relationships in gene expressions: a partial least squares approach. Gene Expression 2001;9:257–64.
18. Bras LP, Menezes JC. Dealing with gene expression missing data. IEE Syst Biol 2006;153:105–19.
19. Nguyen DV, Wang N, Carroll RJ. Evaluation of missing value estimation for microarray data. J Data Sci 2004;2:347–70.
20. Huang X, Pan W, Park S, et al. Modeling the relationship between LVAD support time and gene expression changes in the human heart by penalized partial least squares. Bioinformatics 2004;20:888–94.
21. Johansson D, Lindgren P, Berglund A. A multivariate approach applied to microarray data for identification of genes with cell cycle-coupled transcription. Bioinformatics 2003;19:467–73.
22. Clementi M, Clementi S, Cruciani G, et al. Robust multivariate statistics and the prediction of protein secondary structure content. Protein Eng 1997;10:747–9.
23. Barker M, Rayens W. Partial least squares for discrimination. J Chemom 2003;17:166–73.
24. Man MZ, Dyson G, Johnson K, et al. Evaluating methods for classifying expression data. J Biopharm Stat 2004;14:1065–84.
25. Huang X, Pan W, Grindle S, et al. A comparative study of discriminating human heart failure etiology using gene expression profiles. BMC Bioinformatics 2005;6:205.
26. Huang X, Pan W. Linear regression and two-class classification with gene expression data. Bioinformatics 2003;19:2072–8.
27. Golub TR, Slonim DK, Tamayo P, et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 1999;286:531–7.
28. Alon U, Barkai N, Notterman DA, et al. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc Natl Acad Sci 1999;96:6745–50.
29. Perez-Enciso M, Tenenhaus M. Prediction of clinical outcome with microarray data: a partial least squares approach. Hum Genet 2003;112:581–92.
30. Musumarra G, Barresi V, Condorelli DF, et al. Potentialities of multivariate approaches in genome-based cancer research: identification of candidate genes for new diagnostics by PLS discriminant analysis. J Chemom 2004;18:125–32.
31. Ross DT, Scherf U, Eisen MB, et al. Systematic variation in gene expression patterns in human cancer cell lines. Nat Genet 2000;24:227–34.
32. Alaiya AA, Franzen B, Hagman A, et al. Classification of human ovarian tumors using multivariate data analysis of polypeptide expression patterns. Int J Cancer 2000;86:731–6.
33. Musumarra G, Condorelli DF, Scire S, et al. Shortcuts in genome-scale cancer pharmacology research from multivariate analysis of the National Cancer Institute gene expression data base. Biochem Pharmacol 2001;62:547–53.
34. Cho JH, Lee D, Park JH, et al. Optimal approach for classification of acute leukemia subtypes based on gene expression data. Biotechnol Prog 2002;18:847–54.
35. Tan Y, Shi L, Tong W, et al. Multi-class tumor classification by discriminant partial least squares using microarray gene expression data and assessment of classification models. Comput Biol Chem 2004;28:235–44.
36. Modlich O, Prisack HB, Munnes M, et al. Predictors of primary breast cancers responsiveness to preoperative epirubicin/cyclophosphamide-based chemotherapy: translation of microarray data into clinically useful predictive signatures. J Transl Med 2005;3:32.
37. Huang X, Pan W, Han X, et al. Borrowing information from relevant microarray studies for sample classification using weighted partial least squares. Comput Biol Chem 2005;29:204–11.
38. Nguyen DV, Rocke D. Tumor classification by partial least squares using microarray gene expression data. Bioinformatics 2002;18:39–50.
39. Nguyen DV, Rocke D. Multi-class cancer classification via partial least squares with gene expression profiles. Bioinformatics 2002;18:1216–26.
40. Boulesteix AL. PLS dimension reduction for classification with high-dimensional microarray data. Stat Appl Genet Mol Biol 2004;3:33.
41. Dai JJ, Lieu L, Rocke D. Dimension reduction for classification with gene expression data. Stat Appl Genet Mol Biol 2006;5:6.
42. Tibshirani R, Hastie T, Narasimhan B, et al. Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc Natl Acad Sci 2002;99:6567–72.
43. Musumarra G, Barresi V, Condorelli DF, et al. A bioinformatics approach to the identification of candidate genes for the development of new cancer diagnostics. Biol Chem 2003;384:321–7.
44. Cox DR. Regression models and life-tables (with discussion). J Roy Stat Soc B 1972;34:187–220.
45. Nguyen DV, Rocke D. Partial least squares proportional hazards regression for application to DNA microarray survival data. Bioinformatics 2002;18:1625–32.
46. Alizadeh AA, Eisen MB, Davis RE, et al. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 2000;403:503–11.
47. Sorlie T, Perou CM, Tibshirani R, et al. Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc Natl Acad Sci 2001;98:10869–74.
48. Nguyen DV. Partial least squares dimension reduction for microarray gene expression data with a censored response. Math Biosci 2005;193:119–37.
49. Park PJ, Tian L, Kohane IS. Linking gene expression data with patient survival times using partial least squares. Bioinformatics 2002;18(Suppl 1):S120–7.
50. Li H, Gui J. Partial Cox regression analysis for high-dimensional microarray gene expression data. Bioinformatics 2004;20:208–15.
51. Fort G, Lambert-Lacroix S. Classification using partial least squares with penalized logistic regression. Bioinformatics 2005;21:1104–11.
52. Ding B, Gentleman R. Classification using generalized partial least squares. J Comput Graph Stat 2005;14:280–98.
53. Durand JF. Local polynomial additive regression through PLS and splines: PLSS. Chemom Intell Lab Syst 2001;58:235–46.
54. Bastien P. PLS-Cox model: application to gene expression data. In: Proceedings COMPSTAT’04. Heidelberg: Physica-Verlag, 2004;655–62.
55. Bastien P, Esposito-Vinzi V, Tenenhaus M. PLS generalized linear regression. Comput Stat Data Anal 2005;48:17–46.
56. Nguyen DV, Rocke D. On partial least squares dimension reduction for microarray-based classification: a simulation study. Comput Stat Data Anal 2004;46:407–25.
57. Perez-Enciso M, Toro MA, Tenenhaus M, et al. Combining gene expression and molecular marker information for mapping complex trait genes: a simulation study. Genetics 2003;164:1597–606.
58. Marx BD. Iteratively reweighted partial least squares estimation for generalized linear regression. Technometrics 1996;38:374–81.
59. Green PJ. Iteratively reweighted least squares for maximum likelihood estimation and some robust and resistant alternatives. J Roy Stat Soc B 1984;46:149–92.
60. Albert A, Anderson JA. On the existence of maximum likelihood estimates in logistic regression models. Biometrika 1984;71:1–10.
61. Firth D. Bias reduction of maximum likelihood estimates. Biometrika 1993;80:27–38.
62. Heinze G, Schemper M. A solution to the problem of separation in logistic regression. Stat Med 2002;21:2409–19.
63. Singh D, Febbo PG, Ross K, et al. Gene expression correlates of clinical prostate cancer behaviour. Cancer Cell 2002;1:203–9.
64. Whitehead J. Fitting Cox’s regression model to survival data using GLIM. J Roy Stat Soc C 1980;29:268–75.
65. Bhattacharjee A, Richards WG, Staunton J, et al. Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc Natl Acad Sci 2001;98:13790–5.
APPENDIX
List of abbreviations
Term | Signification | Introduced in section
PLS1 | Univariate PLS | Univariate response
PLS2 | Multivariate PLS (first variant) | Multivariate response
SIMPLS | Multivariate PLS (second variant) | Multivariate response
OLS | Ordinary least squares |
PCR | Principal component regression |
PCA | Principal component analysis |
RRR | Reduced rank regression |
PLS+LDA | Two-step classification procedure consisting of PLS dimension reduction and LDA | Classification problems
IRPLS | Marx's iteratively reweighted PLS | PLS and generalized linear models
X = (xij), i = 1, ..., n, j = 1, ..., p | n × p matrix of predictors | PLS regression
Y = (yij), i = 1, ..., n, j = 1, ..., q | n × q response matrix | PLS regression
X1, ..., Xp | Uncentered predictor variables (random variables) | PLS regression
Y1, ..., Yq | Uncentered response variables (random variables) | PLS regression
(x′i, y′i), i = 1, ..., n | Uncentered sample | PLS regression
(xi, yi), i = 1, ..., n | Centered sample | PLS regression
wj = (w1j, ..., wpj)^T | Weight vector defining the j-th latent component | PLS regression
tj = (t1j, ..., tnj)^T | j-th latent component | PLS regression
T = [t1, ..., tc] | n × c matrix of latent components | PLS regression
W = [w1, ..., wc] | p × c matrix of weights | PLS regression
Tj, j = 1, ..., c | (Uncentered) random variable corresponding to tj | PLS regression
P | p × c matrix of X-loadings | PLS regression
Q | q × c matrix of Y-loadings | PLS regression
E | n × p error matrix | PLS regression
F | n × q error matrix | PLS regression
B | p × q matrix of regression coefficients | PLS regression
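As a self-contained illustration of this notation, the following numpy sketch (our own, not code from any of the references) runs NIPALS on centered toy data and returns matrices with exactly the dimensions listed above: W (p × c), T (n × c), P (p × c), Q (q × c) and B (p × q).

```python
import numpy as np

def pls2(X, Y, c, n_iter=100):
    """NIPALS for multivariate Y; X and Y are assumed centered."""
    n, p = X.shape
    q_dim = Y.shape[1]
    Xj, Yj = X.copy(), Y.copy()
    W = np.zeros((p, c)); T = np.zeros((n, c))
    P = np.zeros((p, c)); Q = np.zeros((q_dim, c))
    for j in range(c):
        u = Yj[:, 0].copy()                     # starting Y-score
        for _ in range(n_iter):                 # inner NIPALS iterations
            w = Xj.T @ u
            w /= np.linalg.norm(w)              # weight vector w_j
            t = Xj @ w                          # latent component t_j
            qj = Yj.T @ t / (t @ t)             # Y-loading q_j
            u = Yj @ qj / (qj @ qj)
        pj = Xj.T @ t / (t @ t)                 # X-loading p_j
        Xj -= np.outer(t, pj)                   # so that X = T P^T + E
        Yj -= np.outer(t, qj)                   # so that Y = T Q^T + F
        W[:, j], T[:, j], P[:, j], Q[:, j] = w, t, pj, qj
    B = W @ np.linalg.solve(P.T @ W, Q.T)       # p x q coefficient matrix
    return W, T, P, Q, B

rng = np.random.default_rng(2)
X = rng.standard_normal((20, 100)); X -= X.mean(0)   # n = 20, p = 100
Y = rng.standard_normal((20, 3));   Y -= Y.mean(0)   # q = 3
W, T, P, Q, B = pls2(X, Y, c=4)
print(T.shape, W.shape, P.shape, Q.shape, B.shape)   # (20,4) (100,4) (100,4) (3,4) (100,3)
```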