Review

Article history:
Received 30 November 2009
Received in revised form 21 March 2010
Accepted 23 March 2010
Available online 30 March 2010

Keywords: Near-infrared spectroscopy; Chemometrics; Wavelength; Variable selection

Abstract

Near-infrared (NIR) spectroscopy has increasingly been adopted as an analytical tool in various fields, such as the petrochemical, pharmaceutical, environmental, clinical, agricultural, food and biomedical sectors, during the past 15 years. A NIR spectrum of a sample is typically measured by modern scanning instruments at hundreds of equally spaced wavelengths. The large number of spectral variables in most data sets encountered in NIR spectral chemometrics often renders the prediction of a dependent variable unreliable. Recently, considerable effort has been directed towards developing and evaluating different procedures that objectively identify variables which contribute useful information and/or eliminate variables containing mostly noise. This review focuses on variable selection methods in NIR spectroscopy. The methods covered include classical approaches, such as the manual approach (knowledge based selection) and "univariate" and "sequential" selection methods; sophisticated methods, such as the successive projections algorithm (SPA) and uninformative variable elimination (UVE); elaborate search-based strategies, such as simulated annealing (SA), artificial neural networks (ANN) and genetic algorithms (GAs); and interval-based algorithms, such as interval partial least squares (iPLS), moving window PLS and iterative PLS. Wavelength selection with B-splines, Kalman filtering, Fisher's weights and Bayesian approaches is also mentioned. Finally, the websites of some variable selection software and toolboxes for non-commercial use are given.
Contents

1. Introduction
2. The importance of variable selection in NIR spectroscopy
   2.1. Chemical basis
   2.2. Physical basis
   2.3. Statistical and multivariate calibration
   2.4. Instrument and industrial requirements
3. A brief review of regression methods
   3.1. Calibration and validation
   3.2. Multivariate linear regression (MLR), principal component regression (PCR) and partial least squares regression (PLSR)
      3.2.1. Multiple linear regression (MLR)
      3.2.2. Principal component regression (PCR) and partial least squares regression (PLSR)
4. Variables selection methods
   4.1. Manual approaches – knowledge based selection
   4.2. Variable selection by single-term linear regression and multi-term regression
      4.2.1. Selection by single-term linear regression and the correlation coefficient
      4.2.2. Selection by multi-term regression
   4.3. Successive projections algorithm (SPA) and uninformative variable elimination (UVE)
      4.3.1. Successive projections algorithm
      4.3.2. Uninformative variable elimination
      4.3.3. UVE–SPA method
   4.4. Simulated annealing (SA), artificial neural networks (ANN) and genetic algorithm (GA)
1. Introduction

In recent years, near-infrared (NIR) spectroscopy has gained wide acceptance in different fields by virtue of its advantages over other analytical techniques, the most salient of which is its ability to record spectra for solid and liquid samples without any pretreatment. This characteristic makes it especially attractive for the straightforward, speedy characterization of natural and synthetic products. Cost savings from NIR measurements, related to improved control and product quality, are often achieved, and NIR can provide results significantly faster than traditional laboratory analysis. In batch processes, NIR allows several quality estimates to be performed within a manufacturing cycle, as opposed to a single end-of-batch analysis. It can therefore reveal potential problems early in the process and prompt corrective actions, which may be particularly advantageous where safety is a factor; intrinsically safe measurement probes and fiber optics add to this safety advantage. NIR spectroscopy has increasingly been adopted as an analytical tool in a variety of fields during the past 15 years, for example in the petrochemical [1,2], pharmaceutical [3,4], environmental [5,6], clinical [7–9], agricultural [6,10–12], food [13] and biomedical [14] sectors.

Typically, modern NIR analysis involves the rapid acquisition of a large number of absorbance values over a selected spectral range. The information contained in the spectral curve is then used to predict the chemical composition of the sample by extracting the appropriate variables of interest. Generally, NIR spectroscopy is used in combination with multivariate techniques for qualitative or quantitative analysis. The large number of spectral variables in most data sets encountered in spectral chemometrics often renders the prediction of a dependent variable complicated; however, by the use of suitable projection or selection techniques the problem may be minimised. Selection and projection methods differ in several aspects [15].

Projection methods, for example partial least squares (PLS) and principal component regression (PCR), are generally applicable and do not presuppose any bias or weights on the principal axes. Projection calibration models are straightforward, and the model calculations can be performed quickly by commercially available software packages. Earlier PCR and PLS full-spectrum methods did not feature preliminary selection, but introduced latent variables comprised of combinations of the original features. Even where prediction properties are good, they usually suffer from the fact that the latent variables are hardly interpretable in terms of the original features (wavelengths in the case of infrared spectra). Multivariate calibration models such as partial least squares (PLS) regression have been developed for quantitative analysis of spectral data because of their ability to reduce the impact of common problems such as collinearity, band overlaps and interactions. However, even with such sophisticated chemometric tools as PLS, the influence of data that do not contain critical information can severely corrupt the resulting calibration models, because not all variables or regions are equally important for the modeling; some of them, like noise areas, may even be harmful. Data projection onto an abstract factor space reduces the error but does not eliminate it entirely; the error is partially projected onto the new data space, often confounding the model. Therefore, removal of the variables in which noise dominates over the relevant information often leads to better accuracy and performance of the analytical methods.

In contrast, selection methods are based on the principle of choosing a small number of variables from the original set, which also provides easier interpretation. Variable selection in multivariate analysis is a very important step, because the removal of non-informative variables produces better prediction and simpler models. It has been shown that the predictive ability can be increased, and the complexity of the model reduced, by a judicious pre-selection of wavelengths. It is now widely accepted that a well-performed variable selection can result in models having greater predictive ability [15].

Variable or feature selection, also called "frequency" or "wavelength" selection when applied to spectroscopic data, is a critical step in data analysis, as it allows interactive improvement of the quality of the data during the calibration procedure. The goal of frequency selection is to identify a subset of spectral frequencies that produce the smallest possible errors when used to perform operations such as making quantitative determinations or discriminating between dissimilar samples. Recently, considerable effort has been directed toward developing and evaluating different procedures that objectively identify variables that contribute useful information and/or eliminate variables containing mostly noise. Classically, this selection is made from basic knowledge about the spectroscopic properties of the sample – knowledge based selection [16] – but it has been shown that there are mathematical strategies for variable selection that are more efficient.

From a conceptual point of view, a variable selection procedure includes first the choice of a relevance measure and, second, the choice of a search algorithm to perform optimization. The relevance measure aims at evaluating the influence of a particular subset of X-variables on the dependent variable, y. Concerning the search algorithm, stochastic algorithms are employed in applications such as spectroscopic multivariate calibration. This approach is usually called computer aided variable selection, an important pre-processing procedure in chemometrics that is widely used to improve the performance of various multivariate methods and algorithms, such as regression methods, factor analysis and curve resolution. Multivariate approaches can exploit all variables and effectively extract the necessary information in the analysis. Computer aided variable selection is also important in industry for several reasons. Variable selection can improve model performance, provide robust models that may be readily transferred, and allow non-expert users to build reliable models with only limited expert intervention. Furthermore, computer aided selection of variables may be the only approach for some models, for example when predicting a physical property from spectral data. Exploiting state-of-the-art theories and techniques of the late 20th and the 21st centuries has enabled tremendous progress in NIR spectroscopy.

There are a multitude of approaches available for variable selection. These may be categorized as follows. First, "univariate" approaches select those variables that have the greatest correlation with the response; these dominated the early period of NIR wavelength selection. Secondly, "sequential" approaches rank variables in order and pair the variables in a forward or backward progression; a more sophisticated variant iterates the progression to reassess previous selections. An inherent problem with these approaches is that only a very small part of the experimental domain is explored. These methods were used from the middle of the 1970s to the middle of the 1990s. Thirdly, since the 1990s, "multivariate" methods of variable selection have been introduced, for example interactive variable selection, uninformative variable elimination (UVE), interval PLS (iPLS), significance tests of model parameters, and the use of genetic algorithms (GAs).

This review emphasizes variable selection methods in NIR spectroscopy, and is organized as follows. The importance of variable selection in NIR spectroscopy is discussed in Section 2 from different viewpoints. Section 3 gives a brief review of the global calibration methods PCR and PLS and the most common method, MLR, because these methods are frequently used as relevance measures in variable selection. Variable selection methods are discussed in Section 4. Classical approaches, such as the manual approach (knowledge based selection) and the "univariate" and "sequential" selection methods, are introduced in the first and second parts of Section 4. The third part of Section 4 discusses the relatively sophisticated methods, the successive projections algorithm (SPA) and uninformative variable elimination (UVE). The fourth part of Section 4 focuses on elaborate search-based strategies, such as simulated annealing (SA), artificial neural networks (ANN) and genetic algorithms (GAs). Interval partial least squares (iPLS), including moving window PLS and iterative PLS, is discussed in the fifth part of Section 4. The last part of Section 4 introduces some other selection methods, such as B-splines and Kalman filtering. Finally, the websites of some variable selection software and toolboxes for non-commercial use are given in Section 5. The review ends with a brief summary.

2. The importance of variable selection in NIR spectroscopy

There is much literature on the importance of variable selection in NIR spectroscopy. Here, the different aspects of variable selection are summarized.

2.1. Chemical basis

NIR spectroscopy involves energy transfer between light and matter. The spectral features of samples in the near-infrared (1000–2500 nm) region are associated with the vibrational modes of functional groups. Organic matter present in samples has distinct spectral fingerprints in the NIR region because of the relatively strong absorption of overtones and combination modes of several functional groups, such as C–H (aliphatic), C–H (aromatic), C–O (carboxyl), O–H (hydroxyl) and N–H (amine and amide), usually present in organic compounds. Restriction of the data set to the wavelengths of the second overtones of the vibration bands of CH, CH2 and CH3 bonds, and the exclusion of the OH vibration bands (water and sugars), has also been shown to improve a model [17]. Organic molecules have specific absorption patterns in the near-infrared region that can report the chemical composition of the material being analyzed. The functional group effect is by far the most dominant of all effects in the NIR spectrum. Fig. 1 shows the NIR correlation chart. The chart simply summarises the most prominent effects, those of the functional groups, and offers a very useful reference for both experienced and inexperienced users of NIR technology. However, because of the complicated nature of NIR spectra (neighbour group effects, hydrogen bonding, crystallinity, phase separation, thermal and mechanical effects, etc.), most NIR band assignments were not made from fundamental studies of simple molecules, but rather from empirical NIR method development. In other cases, it was possible to estimate band positions of a functional group in the NIR region from the known band positions of the same functional group in the IR spectrum. It is important to note that the band positions represented in the chart are only approximate and were compiled from a limited amount of experimental data. Despite many limitations, such charts serve as useful quick references for NIR users [12].

Fig. 1. Overtone and combination NIR band assignments (from Bruker GmbH, Bremen, Germany).
2.2. Physical basis

If knowledge were available about the relation between external variables and the spectral intensities, many NIR chemometric problems might have a satisfactory explanation. Usually, however, no physical model is available for estimating the influence of external variations on the spectral variables. As a result, variable selection techniques need to be used to select a spectral subset.

As in other spectrophotometric techniques, the origins of some non-linearities in NIR spectroscopy are well known (e.g., failure of Beer's law at high analyte concentrations, non-linearity in the detector response, drifts in the light source); beyond these, however, the NIR technique is subject to deviations arising from the process and from the measurements it provides. One of the most significant deviations when operating in the reflectance mode is due to the fact that the Kubelka–Munk transformation is linear only under specific conditions. The most frequent alternative (viz. the variation of the reciprocal reflectance, log(1/R), against the concentration) is linear in most cases. However, proportionality between the two parameters depends on absorptivity and scattering, the latter of which varies non-linearly with particle size. This "extrinsic" non-linearity can be corrected to a great extent by using various mathematical signal treatments such as multiplicative scatter correction (MSC), the standard normal variate (SNV) or derivative absorption spectra.
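For illustration, minimal NumPy sketches of the SNV and MSC corrections mentioned above might look as follows; the array spectra (rows = samples, columns = wavelengths) and the function names are placeholders, not part of the original methods literature:

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row)."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, keepdims=True)
    return (spectra - mean) / std

def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference spectrum
    (by default the mean spectrum of the data set)."""
    if reference is None:
        reference = spectra.mean(axis=0)
    corrected = np.empty_like(spectra)
    for i, s in enumerate(spectra):
        # Fit s = a + b * reference by least squares, then invert the fit.
        b, a = np.polyfit(reference, s, deg=1)
        corrected[i] = (s - a) / b
    return corrected
```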
One other source of non-linearity is related to the chemical nature of the target analytical parameter. The different types of interaction a given functional group can undergo under external ambient conditions (pressure, temperature, etc.) can shift absorption bands and result in intrinsic non-linearity that cannot be corrected by spectral pretreatment and calls for special tools.

Correct selection of variables, in order to gather a small subset with a decreased sensitivity to non-linearity or to discard those wavelengths most markedly contributing to it, suffices in some cases. The process is often labour-intensive and time-consuming, but can be expedited by using a variable selection method (e.g., a genetic or stepwise selection algorithm). Furthermore, variable selection is a very important step in any successful development of a calibration model. Careful choice of spectral pre-processing and wavelength selection can eliminate temperature dependency [18,19]. Automatic wavelength selection [20,21] may give valuable information about which factors are important in creating a successful discrimination.

2.3. Statistical and multivariate calibration

Viewed from a statistical or data analysis perspective, the main difficulty in such problems is to cope with the collinearity between spectral variables: not only are consecutive variables in a spectrum highly correlated by nature but, in addition, real applications usually concern databases with a low number of known spectra and a high number of spectral variables. Any method built on the original spectral variables is thus ill-posed, making feature (spectral variable) selection and/or projection necessary.

Viewed from a multivariate calibration perspective, variable selection attempts to identify and remove the variables that penalize the performance of a model because they are useless, noisy, redundant or correlated by chance. Variable selection procedures are of particular interest when dealing with spectroscopic data. Indeed, the number of variables is potentially very large with regard to the number of samples available for a regression model. Therefore, in MLR, wavelength selection is a necessary part of the procedure for building the calibration model. Usually, this dimensionality problem is circumvented using methods such as partial least squares (PLS) regression, but the calculated PLS latent variables may also be affected by redundancies or the presence of irrelevant variables; wavelength selection is then a way of improving precision. Consequently, improvements in prediction performance and in model characteristics (more robust models) can be expected from variable selection. Furthermore, understanding and interpretation of the chemical process under investigation should be facilitated by paying particular attention to the relevant spectroscopic variables.

2.4. Instrument and industrial requirements

Instrumental spectra used for chemometric analysis are often too unwieldy to model, as many of the inputs do not contain important information. When using analytical calibrations, it is important to identify and minimise all possible sources of error and ensure the best possible estimator. To improve the quality of on-line monitoring processes, it is desirable to obtain as many spectra as possible in a given period of time. Nevertheless, hardware limitations may make it impossible to acquire more than a certain number of spectra in a given period. Wavelength selection can be a good way to limit this problem, since it decreases the size, and consequently the acquisition time, of each recorded spectrum. Wavelength selection results can, for instance, be used to select the most suitable filters for on-line applications of NIR. When developing mission-critical regression models intended for routine usage, e.g., industrial usage, even a subtle improvement in performance is important. Wavelength selection [22–24] is important for reliable classification of analytes by NIR spectroscopy and chemometric models.

3. A brief review of regression methods

The commonly used chemometric methods for the analysis of NIR spectra can be divided into three main groups of techniques. (i) Mathematical pretreatments, which enhance the information sought in the study and decrease the influence of side information contained in the spectra; spectral pre-processing is considered well known and is not described in this text. The classical pretreatments are normalization, derivatives and smoothing; for more details, readers are referred to the textbooks [25–27]. (ii) Qualitative analysis, i.e., classification of samples according to their NIR spectra. NIR identifications are based on pattern recognition methods, and there are many unsupervised and supervised classification techniques. Roggo et al. [4] give a brief description of the chemometric classification methods and an overview of the pharmaceutical applications in the field of qualitative analyses, especially identification and qualification of raw and final materials. The classification methods are not described in this text; readers interested in this field are referred to the articles [26,28–37]. (iii) Regression methods, which are used to link the spectrum to quantifiable properties of the samples. This quantitative part is described in the present section.
3.1. Calibration and validation

In spectroscopy, the goal of calibration is to replace the slow, expensive measurement of the property of interest, y, by a spectroscopic feature that is cheaper or faster, but still sufficiently accurate. For any spectroscopic technique, such as NIR spectroscopy, multivariate calibration (MVC) is defined as "A process for creating a model 'f' that relates sample properties 'y' to the intensities or absorbance 'X' at more than one wavelength or frequency of a set of known reference samples" [38]. Fig. 2 is the flow diagram of the calibration and validation process [39]. A linear relationship between 'X' and 'y' is generally accepted by scientists in this area. Accordingly, linear MVC (LMVC) models are used, such as multiple linear regression (MLR), principal component regression (PCR) and partial least squares regression (PLSR). The development of the regression model comprises the following three stages [38]:

(i) The calibration model is built and validated using a training set (X0, y0) and a validation set (X1, y1); the result is an error of validation having an associated standard error of validation (SEV), which is used to configure the model.
(ii) Both (X0, y0) and (X1, y1) are used to compute the standard error of calibration (SEC) of the model.
(iii) An independent test set (Xp, yp) is used to evaluate the model's performance with an indicator criterion, namely the error of prediction, where the standard error of prediction (SEP) is utilized.

Generally, the first and second steps are merged using the cross-validation technique (e.g., the leave-one-out (LOO) method, contiguous blocks, randomization or the bootstrap), so the standard error of calibration (SEC) and the standard error of validation (SEV) are computed simultaneously. In this case, the spectra X and the related sample properties "y" are split into calibration and prediction subsets. The calibration data usually comprise between 50 and 75% of the total data set and include the smallest and largest "y", with the remaining data partitioned randomly into the calibration and prediction sets. The efficiency of a model approximation for a set of calibration and prediction samples can be reported as the standard error of calibration (SEC), the root mean square error of cross-validation (RMSECV), the correlation coefficient (r) and the standard error of prediction (SEP). These coefficients are computed as follows:

SEC = \sqrt{\frac{1}{I_c - 1 - h} \sum_{i=1}^{I_c} (\hat{y}_i - y_i)^2}   (1)

SEP = \sqrt{\frac{1}{I_p - 1} \sum_{k=1}^{I_p} (\hat{y}_k - y_k)^2}   (2)

RMSECV = \sqrt{\frac{1}{I_c} \sum_{i=1}^{I_c} (\hat{y}_i - y_i)^2}   (3)

r^2 = 1 - \frac{\sum_{g=1}^{n} (\hat{y}_g - y_g)^2}{\sum_{g=1}^{n} (y_g - \bar{y})^2}   (4)
where ŷi and ŷk denote the estimated values of the ith observation in the calibration set and the kth observation in the prediction set, yi and yk are the corresponding measured values, Ic and Ip are the numbers of observations in the calibration and prediction sets, and h is the number of independent variables in the regression. To evaluate the error of each calibration model, the leave-one-out root mean square error of cross-validation (RMSECV) is used. Leave-one-out cross-validation is performed by first defining the number of latent variables; next, one sample is removed from the total for validation (prediction) and the calibration model is built with the remaining samples. The procedure is repeated for all samples and the RMSECV is calculated. ŷg and yg denote the estimated and measured values of the gth observation in the data sets (including calibration, prediction and cross-validation sets), and ȳ denotes the mean of the measured values in the data set. The basic relationships to notice are that the SEC decreases as r increases, r is always larger in absolute value than r², 0 ≤ r² ≤ 1, and 0 ≤ RMSEC.
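Eqs. (1)–(4) translate directly into code. A minimal NumPy sketch (function and argument names are illustrative) is:

```python
import numpy as np

def sec(y_cal, y_cal_pred, h):
    """Standard error of calibration, Eq. (1); h is the number of
    independent variables (terms) in the regression."""
    ic = len(y_cal)
    return np.sqrt(np.sum((y_cal_pred - y_cal) ** 2) / (ic - 1 - h))

def sep(y_val, y_val_pred):
    """Standard error of prediction, Eq. (2)."""
    ip = len(y_val)
    return np.sqrt(np.sum((y_val_pred - y_val) ** 2) / (ip - 1))

def rmsecv(y_cal, y_cv_pred):
    """Root mean square error of cross-validation, Eq. (3);
    y_cv_pred holds the leave-one-out predictions."""
    return np.sqrt(np.mean((y_cv_pred - y_cal) ** 2))

def r_squared(y, y_pred):
    """Coefficient of determination, Eq. (4)."""
    ss_res = np.sum((y_pred - y) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```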
3.2. Multivariate linear regression (MLR), principal component regression (PCR) and partial least squares regression (PLSR)

Multivariate linear regression (MLR), principal component regression (PCR) and partial least squares regression (PLSR) are the three common multivariate methods used in the calibration of NIR spectroscopy data. These three methods are also used in many of the selection approaches discussed later.

In fact, all three methods have a common point in that they model the data using a linear least squares fitting technique. This means that they build linear models between an independent matrix X (spectral data) and a dependent matrix y, and estimate the regression coefficient matrix using least squares fitting.

3.2.1. Multiple linear regression (MLR)

Multiple linear regression (MLR) [27] can be characterized as a technique for solving a number of simultaneous equations. In a multi-component system determined simultaneously, the analysis can be described by measuring m variables xj for a variable y, with the main aim of creating a linear relationship between them. This can be represented mathematically as

y = b0 + b1x1 + b2x2 + b3x3 + · · · + bmxm + e   (5)

Multi-linear regression (MLR) is the oldest of the presented methods and, with the improvement of computational power, is less and less used in applications. This regression establishes a link between a reduced number of wavelengths (or wavenumbers) and a property of the samples. The prediction yj of the sought property can then be described by the formula:

y_j = b_0 + \sum_{i=1}^{k} b_i x_i + e_{i,j}   (6)

where bi is the computed coefficient, xi the absorbance at each considered wavelength and ei,j the error. Each wavelength is studied one after the other and correlated with the studied property; the selection is based on the predictive ability of the wavelength. The three modes of selection are forward, backward and stepwise. When the correlation reaches a value fixed by the operator, the wavelength is kept as part of the model calibration wavelengths. The model is then computed between this set of calibration wavelengths and the reference values of the studied property.

It should also be noted that when using MLR there is no consistent solution when more variables than samples are present, as an infinite number of solutions exist; this ultimately leads to weakness within the system. The other situation, i.e., when there are more samples than variables, leads to an overdetermined system, which does not allow an exact solution for the coefficients.
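As a small numeric illustration of Eq. (5): with more samples than selected wavelengths, the MLR coefficients can be estimated by ordinary least squares. The data below are synthetic placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_selected = 50, 4                   # more samples than variables
X = rng.normal(size=(n_samples, n_selected))    # absorbances at 4 chosen wavelengths
b_true = np.array([0.8, -0.5, 0.3, 1.2])
y = 2.0 + X @ b_true + 0.05 * rng.normal(size=n_samples)

# Least squares fit of y = b0 + sum(bi * xi) + e, as in Eq. (5)
X1 = np.column_stack([np.ones(n_samples), X])   # prepend an intercept column
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
print("b0 =", coef[0], " b1..bm =", coef[1:])
```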
3.2.2. Principal component regression (PCR) and partial least squares regression (PLSR)

Among the different regression methods available for multivariate calibration, the factor analysis-based methods, including partial least squares (PLS) regression and principal component regression (PCR), have received considerable attention in the chemometrics literature [30,40–46]. PLS and PCR can be used directly on ill-conditioned data by extracting latent variables (factors); the number of latent variables is lower than the number of objects. These techniques are powerful multivariate statistical tools that have been successfully and widely applied to the quantitative analysis of spectroscopic data, both because of their ability to overcome problems common to such data (collinearity, band overlaps and interactions) and because of the ease of their implementation due to the availability of software. Here, only a brief introduction to PCR and PLS is given, as the techniques are routinely used.

Principal component regression (PCR) is a widely used regression model for data having a large degree of covariance in the independent (predictor) variables, or where ill-conditioned matrices are present. Instead of regressing the concentrations of a measurement system onto the original measured variables (the spectrum), PCR implements a PCA decomposition of the spectral data X before regressing the concentration information onto the principal component scores [47,48]. Vectors of small magnitude are omitted to avoid the collinearity problem: PCR eliminates the lower-ranked principal components, which in turn reduces the noise (error) present within the system.

Partial least squares regression is related to both principal component regression (PCR) and multiple linear regression (MLR). PCR aims to find the factors that capture most of the variance within the data before regression onto the concentration variables, whereas MLR seeks a single factor that correlates both the data and their concentrations. PLS attempts to maximise the covariance, thus capturing the variance and correlating the data together. As PLS searches for the factor space most congruent to both matrices, its predictions can be far superior to those of PCR [49].

The PCR and PLS techniques share many similarities, and the theoretical relationships between them have been covered extensively in the literature [30,40,41,43,46,50,51]. PLS and PCR both decompose the data into spectral loadings and scores prior to model building with the aid of these new variables. In PCR, the data decomposition is done using only spectral information, while PLS employs spectral and concentration data. Historically, PCR predates PLS. However, since its introduction, PLS appears by most accounts to have become the method of choice among chemists. On the other hand, in their literature survey, Wentzell and Montoto [52] surprisingly found only a few cases indicating that PLS gave better results than PCR, and a greater number of studies indicating no real difference in performance ([50–52] and references therein). In addition, by generic simulation of complex mixtures, Wentzell and Montoto concluded that, in all of the simulations carried out except when artificial constraints were placed on the number of latent variables retained, no significant differences were observed in the prediction errors reported by PCR and PLS [52]. PLS almost always required fewer latent variables than PCR, but this did not appear to influence predictive ability. This statement has also been confirmed by others [40,45,50,51].

However, global models such as PLS implicitly endeavour to include the variation due to external effects in the model, in much the same way as unknown chemical interferences can be included in an inverse calibration model. Provided the interfering variation is present in the calibration set, an inverse calibration model can, in the ideal case of additivity and linearity, easily correct for the variation due to unknown interferences. It is assumed in global calibration models that new sources of spectral variation can be modeled by including a limited number of additional PLS factors. Owing to the increase in the calibration model's dimensionality, it becomes necessary to measure a large number of samples under the changed conditions in order to obtain a good estimate of the additional parameters. When highly non-linear effects are present in the spectra, many additional PLS factors [53] are necessary to model the spectral differences, and occasionally it is not possible to model them at all.
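The PCR/PLS contrast described above can be sketched with scikit-learn, assuming a hypothetical data set: PCR as a PCA-then-regression pipeline, PLS via PLSRegression. The data and component counts are placeholders:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 200))                 # 60 spectra, 200 wavelengths
y = X[:, 40:60].mean(axis=1) + 0.1 * rng.normal(size=60)

pcr = make_pipeline(PCA(n_components=5), LinearRegression())
pls = PLSRegression(n_components=5)

for name, model in [("PCR", pcr), ("PLS", pls)]:
    score = cross_val_score(model, X, y, cv=5,
                            scoring="neg_root_mean_squared_error")
    print(name, "cross-validated RMSE ~", -score.mean())
```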
Fig. 3. Variables removed manually: (a) apple NIR spectra; (b) spectra with data points removed at the lower and higher regions (R is the relative reflectance).

4. Variables selection methods

4.1. Manual approaches – knowledge based selection

For manual approaches, one possibility is to remove the variables that have poor informational quality. In many studies [23,54,55], owing to the insensitivity of the NIR instrument detector, some data points in the lower and higher regions were omitted from the spectral data sets. Fig. 3(a) [56] shows apple spectra collected by a NIR instrument; the data points in the lower and higher regions were cut from the spectral data sets before regression because of their poor signal-to-noise ratio (S/N). Fig. 3(b) [56] shows the selected spectral interval. Manual deletion of variables suffers from two main flaws: (i) there is uncertainty that exactly the same section of the data will be removed between data sets, and (ii) the removed sections may not be optimal from the point of view of the model (i.e., parts of a spectrum may not look information-rich to the eye but may contain useful information for the model). When using this manual approach, there is a tendency to remove sections that contain either high noise or low detector response. However, such an approach can prove counter-productive in terms of robust model building. For example, information in the background noise can be extremely useful for establishing a robust calibration model, since noise-free spectra often carry a large source of predictive error due to collinearity between neighbouring wavelengths in a single peak. The presence of a high degree of collinearity between variables in a model will tend to drive the matrix towards singularity, and this in turn will have a large influence on the coefficients generated.

Selection of a reference wavelength is based on: (i) the peak absorbance of the component to be determined, such as one of the functional groups in Fig. 1; (ii) the peak absorbance of a component whose concentration is highly correlated with that of the component to be determined; and (iii) part of a difference or quotient expression that serves to normalize the spectra to one level of scatter, particle size, temperature, etc. This would typically be the approach taken by the spectroscopist. Manual selection suffers in the following respects: (i) it requires experience and a good understanding of NIR spectroscopy, as many biomaterial NIR spectra are too complicated to interpret, and (ii) the relationship between absorption in the near-infrared (NIR) spectral region and the target analytical parameter is frequently non-linear in nature. The origin of the non-linearity can be varied and difficult to identify; in some cases, the relationship between absorption and the analytical parameter of interest is intrinsically non-linear owing to the chemical nature of the sample or analytes concerned [57].

4.2. Variable selection by single-term linear regression and multi-term regression

4.2.1. Selection by single-term linear regression and the correlation coefficient

This section gives a brief introduction to the concepts involved in single-term linear regression, the statistical procedure that answers the following question: "Given a set of data with one independent variable X and one dependent variable Y, and the corresponding scatter-plot of Y against X, what is the straight line that best fits the data?" The answer is the straight line with the equation:

Ŷ = a + bX   (7)

where Ŷ is an approximation to Y, and a and b are constants. The best fitting line is called the regression of Y on X, or Y regressed against X. The regression constant a is called the constant term, and the regression constant b is called the regression coefficient. The vertical distance from a data point to this line is the residual or regression error for that point, and the standard deviation of all the residuals is the SEC (or SEP). The correlation coefficient (r), which is related to the SEC, lies in the range [−1, 1].

In developing a calibration model using single-term linear regression, when one does not yet know the best wavelength to use, one normally finds the r value at every available wavelength. The wavelength giving the highest r value is then used for the calibration and subsequent validation [58].

In practice, however, this simple approach seldom gives an adequate SEC, and a more complex calibration is usually needed. One way to improve the correlation is to let X be the difference between the log(1/R) values at two different wavelengths (R is the relative reflectance). The two wavelengths can be found by an iterative process. First, the single wavelength giving the best correlation is found; then, a second wavelength is found so that the difference between the log(1/R) values at the first and second wavelengths gives the best correlation. The first wavelength is then replaced with a third wavelength whose difference with the second gives the best correlation, and so on until the process converges, i.e., until each wavelength of the pair gives the highest correlation given the other. Note that such an iterative procedure does not necessarily produce the pair of wavelengths whose difference provides the highest correlation overall; it only provides the pair producing the converged correlation. The same process can be used with quotients (A/B) instead of differences, and with quotients of differences ((A − B)/(C − D)) [39]. In the last case, there are various ways of iterating the process when selecting the four wavelengths, and the various methods do not all yield the same choice of wavelengths.
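A minimal sketch of this search (illustrative names; the matrix A holds log(1/R) values, rows = samples) might be:

```python
import numpy as np

def best_single_wavelength(A, y):
    """Correlation r at every wavelength; return the index of the
    wavelength with the largest |r| and the full r vector."""
    r = np.array([np.corrcoef(A[:, j], y)[0, 1] for j in range(A.shape[1])])
    return int(np.argmax(np.abs(r))), r

def iterate_pair(A, y, j_start, max_iter=20):
    """Iterative pair search on differences of log(1/R) columns: fix one
    wavelength, find its best partner, then fix the partner and
    re-optimize, until the pair stops changing (a local optimum)."""
    pair = [j_start, None]
    for _ in range(max_iter):
        fixed = pair[0]
        corr = [abs(np.corrcoef(A[:, k] - A[:, fixed], y)[0, 1])
                if k != fixed else -1.0 for k in range(A.shape[1])]
        best = int(np.argmax(corr))
        if best == pair[1]:   # converged: each member is optimal given the other
            break
        pair = [best, fixed]
    return pair
```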
If several wavelengths do not give an acceptable result, multi-term regression approaches should be used, as discussed next.

4.2.2. Selection by multi-term regression

Multi-term regression (usually multiple linear regression (MLR), as shown in formula (5)) uses the information at a number of wavelengths to isolate the effect of a single absorber and to normalize the baseline. There are various ways of choosing the wavelengths to use in multi-term linear regression [39]: (i) the step-up or forward procedure picks the wavelength giving the best single-term calibration as the first independent variable, then finds the best wavelength to add as a second variable in a two-term regression, and so on until some stopping criterion is met; (ii) the step-down or backward procedure starts with a multi-term linear regression using all available wavelengths and eliminates variables by some criterion; (iii) the all-possible-combinations procedure tests all possible linear regressions on all subsets of available wavelengths and reports the subset giving the lowest SEC – this procedure is usually limited to subsets containing only two or three wavelengths; and (iv) there are also combinations of these methods. For example, the all-possible-combinations method can select two or three wavelengths, and the step-up method can then be used to add wavelengths. Alternatively, each step of the step-up method can be followed by one step of the step-down method, to determine wavelengths that can be safely eliminated when a new wavelength is added. This last method is called the stepwise method, and is the one most commonly referenced in the literature. The detailed algorithm is as follows. In stepwise multiple linear regression (MLR-step) [28], original variables are selected iteratively according to their correlation with the target property y. For a selected variable xi, a regression coefficient bi is determined and tested for significance using a t-test at a critical level (such as α = 5%). If the coefficient is found to be significant, the variable is retained and another variable xj is selected according to its partial correlation with the residuals obtained from the model built with xi. This procedure is called forward selection. The significance of the two regression coefficients bi and bj associated with the two retained variables is then tested again, and non-significant terms are eliminated from the equation (backward elimination). Forward selection and backward elimination are repeated alternately until no significant improvement of the model fit can be achieved by including more variables and all regression terms already selected are significant. To reduce the risk of over-fitting due to retaining too many variables, a procedure based on leave-one-out cross-validation (LOOCV) followed by a randomisation test is applied to test different sets of variables for significant differences in prediction.

The backward, forward and stepwise selection methods can be performed in a short time by commercially available software packages.
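A simplified sketch of such a forward procedure with a t-test on the newly added coefficient follows; the names and the stopping rule are illustrative, not the exact MLR-step algorithm of [28]:

```python
import numpy as np
from scipy import stats

def forward_select(X, y, alpha=0.05, max_vars=10):
    """Forward stepwise MLR sketch: greedily add the wavelength most
    correlated with the current residuals, and keep it only if its
    coefficient passes a t-test at level alpha."""
    n = len(y)
    selected = []
    residual = y - y.mean()
    while len(selected) < max_vars:
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        corr = [abs(np.corrcoef(X[:, j], residual)[0, 1]) for j in candidates]
        j_new = candidates[int(np.argmax(corr))]
        trial = selected + [j_new]
        X1 = np.column_stack([np.ones(n), X[:, trial]])
        coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
        resid = y - X1 @ coef
        dof = n - X1.shape[1]
        s2 = resid @ resid / dof
        cov = s2 * np.linalg.inv(X1.T @ X1)
        t_new = coef[-1] / np.sqrt(cov[-1, -1])
        if 2 * stats.t.sf(abs(t_new), dof) > alpha:
            break                       # new term not significant: stop
        selected, residual = trial, resid
    return selected
```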
There are two main flaws with these types of procedures that cause them to perform inconsistently across data having different noise character: (i) though the stepwise selection methods are simple and efficient, they depend upon an ordering or ranking of the variables, which often makes them sensitive to noise distributions; (ii) because variables are usually ranked according to some criterion, points on a single peak are commonly chosen together. In particular, if one spectral region shows much higher correlation than others, many points within this area will be tested before any points in other regions are considered. Neighbouring points often contain much of the same information (collinearity), and when they are added consecutively in a stepwise procedure, this may decrease prediction accuracy.

To overcome these drawbacks, chemical information, such as the correlation between spectra and composition, should be considered in the selection process, rather than depending on an optimization procedure that relies solely on model performance. McShane [59] described a fast stepwise algorithm that uses multiple ranking chains to identify several spectral regions correlated with known sample properties. The multiple-chain approach allows the generation of a final ranking vector that moves quickly away from the initial selection point, testing several areas exhibiting correlation between spectra and composition early in the stepping procedure [59]. There have been many studies devoted to this problem; for a comprehensive review see [60].

4.3. Successive projections algorithm (SPA) and uninformative variable elimination (UVE)

Employing the full spectral region does not always yield optimal results, as it may include regions that contain more noise than relevant information. Uninformative variable elimination (UVE), proposed by Centner et al. [61], has been used to solve such problems and improve the quality of the models. Multiple linear regression (MLR) models are simpler and easier to interpret, but they are strongly affected by collinearity between variables. The successive projections algorithm (SPA), proposed as a variable selection strategy by Araújo et al. [62], has the advantage of finding a small representative set of spectral variables with a minimum level of collinearity.

4.3.1. Successive projections algorithm

The successive projections algorithm (SPA) is a forward variable selection technique designed to minimize collinearity problems in multiple linear regression (MLR). SPA employs simple projection operations in a vector space to obtain subsets of variables with minimal collinearity. The principle of variable selection by SPA is that the next variable selected is the one, among all remaining variables, that has the maximum projection value on the sub-space orthogonal to the previously selected variable. A graphical user interface for SPA is available at www.ele.ita.br/kawakami/spa/. The SPA steps are described below for a given initial wavelength k(0); the total number of wavelengths in the spectrum is J and the desired number of variables is N.

(i) Before the first iteration (n = 1), let x_j = jth column of X_cal, j = 1, . . ., J.
(ii) Let S be the set of wavelengths that have not been selected yet, i.e., S = {j : 1 ≤ j ≤ J and j ∉ {k(0), . . ., k(n − 1)}}.
(iii) Calculate the projection of x_j on the sub-space orthogonal to x_{k(n−1)}:
P x_j = x_j − (x_j^T x_{k(n−1)}) x_{k(n−1)} (x_{k(n−1)}^T x_{k(n−1)})^{−1}, for all j ∈ S,
where P is the projection operator.
(iv) Let k(n) = arg max_{j ∈ S} ||P x_j||.
(v) Let x_j = P x_j, j ∈ S.
(vi) Let n = n + 1. If n < N, go back to step (ii).

End: the resulting wavelengths are {k(n); n = 0, . . ., N − 1}.
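A compact NumPy sketch of these steps (illustrative; it deflates the remaining columns against the last selected one and picks the largest residual norm) is:

```python
import numpy as np

def spa(Xcal, k0, n_select):
    """Successive projections algorithm: starting from column k0,
    repeatedly pick the remaining column with the largest norm after
    projection onto the sub-space orthogonal to the last selected column."""
    X = Xcal.astype(float).copy()
    J = X.shape[1]
    chain = [k0]
    for _ in range(n_select - 1):
        xk = X[:, chain[-1]]
        remaining = [j for j in range(J) if j not in chain]
        for j in remaining:
            # Step (iii): remove the component of column j along xk.
            X[:, j] = X[:, j] - (X[:, j] @ xk) / (xk @ xk) * xk
        norms = [np.linalg.norm(X[:, j]) for j in remaining]
        chain.append(remaining[int(np.argmax(norms))])   # step (iv)
    return chain
```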
For a detailed description of SPA, see Ref. [62]; the main procedure is summarized here. First, the maximum number of variables N to be selected is set and a start vector is chosen in a space of J dimensions (where J is the number of original variables). Subsequently, in the orthogonal sub-space, the vector with the largest projection is selected and becomes the new starting vector. The choice of the orthogonal sub-space at each iteration is made in order to select only the non-collinear variables. The optimal initial variable and number of variables can be determined on the basis of the smallest root mean square error of validation (RMSEV) of the MLR calibration on the validation set.
In terms of prediction ability, SPA-MLR models have been shown to be comparable to, or better than, full-spectrum PLS/PCR models in a number of applications, including UV–vis [62] and NIR [43] spectrometry. Good results involving the use of SPA together with wavelet regression have also been reported [63]. Furthermore, SPA has been favourably compared with the genetic algorithm [62], which is a popular tool for variable selection in multivariate calibration and will be discussed later. Moreover, the selected variables can be used as the inputs of MLR, PLS and LS-SVM models [64].

SPA employs simple projection operations to select variables with a minimum of collinearity; however, variables selected by SPA may have a low signal-to-noise ratio (S/N) or be insufficient for multivariate calibration, which can affect the precision of the model prediction.

4.3.2. Uninformative variable elimination

In the manual approach, the uninformative sections are subjectively removed on the basis of either high noise or low detector response. To address this, the uninformative variable elimination method (UVE-PLS) was developed to eliminate uninformative variables in the calibration of NIR data [33,65,66]. Artificial random variables are added to the data as a reference, so that variables which play a less important role in the model than the random variables are eliminated. Several versions of UVE-PLS were described in Ref. [61]. Here we introduce one simple UVE-PLS method.

In linear models, the prediction ŷ is computed with Eq. (5). A regression coefficient vector b = [b1, . . ., bp] is calculated through a leave-one-out validation. Because each coefficient bj represents the contribution of the corresponding variable to the established model, the reliability of each variable j can be quantitatively measured by the stability, defined as [65,61]:

s_j = mean(b_j)/std(b_j), j = 1, . . ., p   (8)

where mean(bj) and std(bj) are the mean and standard deviation of the regression coefficients of variable j. Clearly, when the mean value of bj is large and the standard deviation of bj is small, the stability value is large; the larger the stability, the more important the corresponding variable. Variables whose stability is less than a threshold should be treated as uninformative and eliminated.

In order to estimate a suitable cutoff threshold, an artificial random variable matrix N (n × p) with very small amplitude (e.g., 10⁻¹¹) is added to the original data and its stability computed. Any variable whose stability is less than that of the random variables should be regarded as uninformative and eliminated. In practice, the cutoff threshold is generally defined by:

cutoff = k × max(abs(s_noise))   (9)

where k is an arbitrary value, e.g., 0.7 or 0.9 [61].

Fig. 4 [56] shows the plot of the s value for experimental and artificial random variables; the cutoff level at max(abs(s_artif)) is indicated by the dashed line.

Fig. 4. Plot of s for experimental and artificial random variables. The cutoff level at max(abs(s_artif)) is indicated by the dashed line.

UVE is thus a method of variable selection based on stability analysis of the regression coefficients (b). The main steps of UVE can be summarized as follows:

(i) PLS regression is performed on the instrumental response data (X) and property values (y) of the calibration set, and the optimal number of PLS factors is determined.
(ii) A noise matrix of the same size as the X matrix is generated, whose elements are random numbers in the interval 0.0–1.0. The elements are multiplied by a small constant to make their influence on the model negligible.
(iii) The noise matrix is appended to the original matrix X to form an extended matrix having twice as many variables.
(iv) PLS models are constructed on the extended matrix and y in a leave-one-out cross-validation manner. This leads to a matrix of b values with as many rows as samples and one column for each variable, both original and random.
(v) The s value of each variable is calculated as the average of the b values of each column divided by the standard deviation of that column.
(vi) The cutoff value is set as the maximum absolute value of s among the random variables. Every original variable with an equal or lower absolute value of s is assumed to be noise only and is eliminated.
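A condensed sketch of these steps, using scikit-learn's PLSRegression and, for simplicity, k = 1 in Eq. (9) (names and default values are illustrative):

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import LeaveOneOut

def uve_pls(X, y, n_components, noise_amp=1e-10, rng=None):
    """UVE-PLS sketch: append a small random matrix to X (steps ii-iii),
    collect PLS coefficients over leave-one-out resampling (step iv),
    compute the stability s (step v, Eq. (8)) and keep only original
    variables whose |s| exceeds the noise cutoff (step vi, Eq. (9))."""
    rng = rng or np.random.default_rng(0)
    n, p = X.shape
    Xext = np.hstack([X, noise_amp * rng.random((n, p))])
    B = []
    for train, _ in LeaveOneOut().split(Xext):
        pls = PLSRegression(n_components=n_components)
        pls.fit(Xext[train], y[train])
        B.append(pls.coef_.ravel())
    B = np.array(B)
    s = B.mean(axis=0) / B.std(axis=0)
    cutoff = np.abs(s[p:]).max()          # k = 1 in Eq. (9)
    return np.where(np.abs(s[:p]) > cutoff)[0]   # retained variable indices
```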
Z. Xiaobo et al. / Analytica Chimica Acta 667 (2010) 14–32 23
Sample and/or measurement abnormalities (outliers), as detected by PLS inner relation plots, should generally be removed prior to the application of iPLS.

Models based upon the various intervals (Xinterval) usually need a different number of PLS components than do full-spectrum models in order to capture the relevant variation in y. This is caused by the variable amount of y-correlated information carried by the interval variables (the larger the spectral interval, the greater the number of substances that are likely to absorb/interfere) and is also related to the noise/interference carried by the variables. However, the selected model dimension has to be common to all the local models in order to make a comparison possible. In order to favor the "best" spectral region, it is natural to let the simplest interval model (i.e., the one with the smallest number of PLS components) guide the selection of the model dimension. A fair comparison of the global and local models requires that the global and local model dimensions be selected separately. Fig. 8 [55] shows the most common result after processing with iPLS: several interval models surpass the full-spectrum model, and the number 12 interval model shows the best results.

The results from using iPLS are comparable to those of the other effective methods tested, but the main advantage of iPLS is its graphical output, which gives an overview of the spectral data and displays interesting spectral areas that could be selected. A minimal sketch of the basic iPLS loop is given below.
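As an illustration of the basic iPLS loop, here is a minimal sketch in Python (assuming numpy and scikit-learn; the helper name rmsecv, the equidistant splitting and the fixed number of latent variables are illustrative assumptions; the rmsecv helper is reused by the later sketches in this section):

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_predict

def rmsecv(X, y, n_components):
    """Leave-one-out RMSECV of a PLS model built on the given variables."""
    k = min(n_components, X.shape[1])
    yhat = cross_val_predict(PLSRegression(n_components=k), X, y, cv=X.shape[0])
    return float(np.sqrt(np.mean((np.ravel(yhat) - np.ravel(y)) ** 2)))

def ipls(X, y, n_intervals=20, n_components=5):
    """Basic iPLS: split the spectrum into equidistant intervals, build a
    local PLS model on each, and return every interval's RMSECV together
    with the full-spectrum RMSECV for comparison."""
    bounds = np.linspace(0, X.shape[1], n_intervals + 1, dtype=int)
    local = [rmsecv(X[:, lo:hi], y, n_components)
             for lo, hi in zip(bounds[:-1], bounds[1:])]
    return local, rmsecv(X, y, n_components)
```

Plotting the local scores as bars against the full-spectrum score reproduces the kind of overview plot described above; in practice the number of components would be tuned per the model-dimension discussion rather than fixed.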
4.5.2. Expansion methods for interval partial least squares
In order to select the more informative regions and to optimize the results, many methods have been developed that expand on iPLS. Selection methods such as stepwise, synergy and genetic algorithms have been used to combine different intervals. These methods include backward/forward iPLS (BiPLS/FiPLS), synergy iPLS (SiPLS) and genetic algorithm iPLS (GAiPLS). Moving window partial least squares regression (MWPLSR) also expands on iPLS by performing repeated PLS regressions within a window moving across all variables, which thoroughly assesses the potential variable ranges of a given size. As an additional refinement, changeable size moving window partial least squares (CSMWPLS) allows regions selected by MWPLSR to be systematically modified in size to optimize the results, offering a further improvement. Also, an inversion of the moving window methods allows for the direct elimination of uninformative wavelength intervals.
4.5.2.1. Simple optimization of the best interval from equidistant interval partial least squares. There is only a minimal probability of hitting the optimal interval with equidistant subdivisions. An optimal interval might instead be found by carrying out small adjustments of the interval limits. Fig. 9 shows how the optimization algorithm is generally performed. It consists of the following steps: (i) interval shift; (ii) changes in interval width: two-sided (symmetrical), one-sided (asymmetrical, left) or one-sided (asymmetrical, right). Each step is initiated with the optimal interval limits from the previous step. The interval limits are changed one variable at a time and evaluated by the RMSECV obtained by applying PLS regression to the interval; this approach works in practice but could be done more elegantly.

Starting wavelength (SW), ending wavelength (EW) and wavelength interval (WI) are the three spectral parameters optimized to obtain the best results by the optimization method mentioned above. A sketch of such a boundary search is given below.
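By way of illustration, the following is a minimal sketch of this kind of interval-limit refinement: a greedy coordinate search over the start and end points, scored with the rmsecv helper defined in the iPLS sketch above. The single-step move logic is an assumed simplification, not the exact published procedure:

```python
def refine_interval(X, y, lo, hi, n_components=5, max_iter=50):
    """Greedy refinement of the interval [lo, hi): repeatedly try moving
    each limit one variable left or right, keep any change that lowers the
    interval's RMSECV, and stop when no single-step move helps."""
    best = rmsecv(X[:, lo:hi], y, n_components)
    for _ in range(max_iter):
        improved = False
        for new_lo, new_hi in ((lo - 1, hi), (lo + 1, hi),
                               (lo, hi - 1), (lo, hi + 1)):
            if 0 <= new_lo < new_hi <= X.shape[1]:
                score = rmsecv(X[:, new_lo:new_hi], y, n_components)
                if score < best:
                    best, lo, hi, improved = score, new_lo, new_hi, True
        if not improved:
            break
    return lo, hi, best
```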
4.5.2.2. Backward interval partial least squares (BiPLS) and forward interval partial least squares (FiPLS). BiPLS and FiPLS [55] are the iPLS algorithm combined with backward and forward selection methods, respectively. Fig. 10(a) and (b) show these two algorithms.

The backward iPLS (BiPLS) algorithm proceeds as follows: as in the interval PLS model, the data set is split into a given number of intervals, but now PLS models are calculated with each interval left out in sequence, i.e., if one chooses 40 intervals, then each model is based on 39 intervals, leaving out one interval at a time. The first interval to be omitted is the one that gives the poorest performing model with respect to RMSECV or RMSEP (root mean square error of cross-validation/prediction). This procedure is continued until one interval remains.

The forward iPLS (FiPLS) algorithm described in this paper is the inverse of BiPLS, in the manner of a forward regression model. As in the interval PLS model, the data set is split into a given number of intervals, but now the PLS models are built from successively improving intervals with respect to the RMSECV measure, i.e., if one chooses 40 intervals, then the first model is based on the single interval that gives the best performing model, the second model adds the next best interval, and so on. A minimal sketch of the backward elimination loop is given below.
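The following is a minimal sketch of the BiPLS elimination loop, again reusing the rmsecv helper from the iPLS sketch. Treating the interval to discard at each round as the one whose removal leaves the lowest-RMSECV model is an assumption about the intended backward-elimination step:

```python
import numpy as np

def bipls(X, y, n_intervals=20, n_components=5):
    """BiPLS sketch: starting from all intervals, repeatedly drop the
    interval whose removal yields the lowest RMSECV on the remaining
    variables, until a single interval is left."""
    bounds = np.linspace(0, X.shape[1], n_intervals + 1, dtype=int)
    intervals = {i: np.arange(lo, hi)
                 for i, (lo, hi) in enumerate(zip(bounds[:-1], bounds[1:]))}
    history = []                                  # (dropped interval, score)
    while len(intervals) > 1:
        trials = {}
        for i in intervals:
            cols = np.concatenate([v for j, v in intervals.items() if j != i])
            trials[i] = rmsecv(X[:, cols], y, n_components)
        drop = min(trials, key=trials.get)        # cheapest interval to lose
        history.append((drop, trials[drop]))
        del intervals[drop]
    return history, list(intervals)               # removal order + survivor
```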
4.5.2.3. Synergy interval partial least squares (SiPLS) and genetic algorithm interval partial least squares (GA-iPLS). Synergy interval PLS (SiPLS) is an all-possible-interval-combinations procedure: PLS models are tested on all subsets of intervals, and the subset giving the lowest RMSECV or RMSEP is reported. The computation time can be long, depending on the number of intervals and the selected number of intervals to combine. The procedure is as follows: first, the data set is subdivided into a number of intervals (variable-wise) and, secondly, all possible PLS model combinations of two, three or four intervals are calculated. A sketch of this combination search is given at the end of this subsection.

Literature sources are the GAPLS algorithm described by Leardi et al. [84,85], the iPLS algorithm described by Nørgaard et al. [94], and the GA-iPLS algorithm developed by Xiaobo [54]. The GA algorithm is used to select intervals, and the iPLS algorithm is used as the regression model. Fig. 10(c) shows the algorithm.

First, the data set is split into N intervals (variable-wise), and PLS models for each interval are calculated, with the results presented in a single plot.

Secondly, a GA is used to select several wavelength intervals, as described by Leardi et al. [84]. However, on this occasion, the selection variables are intervals, and the PLS model combinations of these selected intervals follow the iPLS algorithm described by Nørgaard et al. [94].

The final interval selection is the PLS model combination of those intervals that gives the best performing model with respect to the RMSECV measure.

This algorithm takes advantage of both genetic algorithms (GAs) and iPLS. It generally improves the prediction capabilities of PLS modeling.

One of the main advantages of this method is the possibility of representing a local regression model in a graphical display, focusing the choice on better intervals and permitting a comparison among the interval models and the full-spectrum model. The method is intended to give an overview of the data and can be helpful in interpretation. The interval PLS (iPLS) software may be downloaded from the website of the Royal Veterinary and Agricultural University of Denmark.
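Here is a minimal sketch of the SiPLS combination search: an exhaustive evaluation over all combinations of a chosen size, scored with the rmsecv helper from the iPLS sketch (the parameter names and defaults are illustrative):

```python
from itertools import combinations
import numpy as np

def sipls(X, y, n_intervals=20, n_combine=2, n_components=5):
    """SiPLS sketch: evaluate a PLS model on every combination of
    n_combine intervals and return the lowest-RMSECV combination."""
    bounds = np.linspace(0, X.shape[1], n_intervals + 1, dtype=int)
    cols = [np.arange(lo, hi) for lo, hi in zip(bounds[:-1], bounds[1:])]
    best_score, best_combo = np.inf, None
    for combo in combinations(range(n_intervals), n_combine):
        score = rmsecv(X[:, np.concatenate([cols[i] for i in combo])],
                       y, n_components)
        if score < best_score:
            best_score, best_combo = score, combo
    return best_combo, best_score
```

The number of fits grows combinatorially with n_combine, which is the long-computation-time caveat noted above.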
4.5.2.4. Moving window partial least squares (MWPLS), changing size moving window partial least squares (CSMWPLS) and searching combination moving window partial least squares (SCMWPLS). Moving window (MW) wavelength selection [40] is a strategy for obtaining informative spectral regions which produce better prediction results. In the changing size moving window algorithm (CSMW), windows of different sizes are scanned over the whole spectral range.

In moving window partial least squares regression (MWPLS) [90], a spectral window commencing at the ith spectral channel and terminating at the (i + H − 1)th spectral channel is built. Here, H is the window size.
The spectra obtained in the spectral window form a submatrix Xi (N × H) containing the ith to the (i + H − 1)th columns of the calibration matrix X. PLS models with varied numbers of PLS components can then be built to relate the spectra in the window to the analytes of interest.

The window is moved through the whole spectrum. At each position, PLS models with varying numbers of PLS components are built for the calibration of the analytes, and the RMSECV (root mean square error of cross-validation) or SEC values are calculated with these PLS models and plotted as a function of the position of the window. A figure containing such residual lines provides information about the informative regions, which are the regions where the residual lines show low values of RMSECV or SEC. MWPLSR can thus provide informative regions and the approximate numbers of latent variables. The informative regions can be used to construct improved, though not optimized, prediction PLS models compared with the whole spectral region. A minimal sketch of this window scan is given below.
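The core MWPLS scan reduces to a short loop; the following is a minimal sketch (the window size H and the reuse of the rmsecv helper from the iPLS sketch are illustrative assumptions):

```python
import numpy as np

def mwpls(X, y, H=15, n_components=5):
    """MWPLS sketch: slide a window of size H across the spectrum, fit a
    PLS model on each submatrix X[:, i:i+H], and return the error curve as
    a function of window position; low values flag informative regions."""
    p = X.shape[1]
    return np.array([rmsecv(X[:, i:i + H], y, n_components)
                     for i in range(p - H + 1)])
```

Plotting the returned curve against the window start index reproduces the residual-line plot described above.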
Changeable size moving window partial least squares (CSMWPLS) is a method to optimize an informative region, i.e., to search for an optimized sub-region within a selected informative region. The basic idea of CSMWPLS [95], for a given informative region with p spectral points, is to change the moving window size H from 1 to p. A moving window is moved from the first spectral point to the (p − H + 1)th point over the informative region, collecting all possible sub-windows for every window size. Fig. 11(a) explains this algorithm [95]. When H = 1, moving the window from the first to the last point will collect all possible sub-windows of window size 1. Similarly, for the other values of H, all sub-windows of size H may be obtained. Therefore, this algorithm considers all possible spectral intervals (sub-windows or sub-regions) within the range of the informative region. For every window, a PLS model with a selected number of LVs is constructed, and the root mean square error of calibration (SEC) is calculated. Comparing the values of SEC for all sub-regions, the sub-region with the smallest value of SEC is considered the optimized spectral interval.

The objective of searching combination moving window partial least squares (SCMWPLS) [96,90] is to search for either the optimized combination of informative regions or an optimized individual informative region. Fig. 11(b) explains this algorithm [95]. First, MWPLSR is performed to locate the informative regions. Subsequently, SCMWPLS starts the process from the first informative region. This informative region is optimized by changing the moving window size H from 1 to p. A moving window is moved from the first spectral point to the (p − H + 1)th point over the informative region, collecting all possible sub-windows for every window size. A PLS model with a reasonable number of PLS components selected by cross-validation is built, and the RMSEC is calculated for every window obtained. The sub-region with the smallest value of RMSEC among all sub-regions is considered the optimized sub-region and is named the base-region. In the next step, all possible sub-regions for every window size are found in the second informative region. Then, each of these sub-regions is in turn combined with the base-region, a PLS model is built, and its RMSEC calculated. Next, the new base-region is chosen as the combination with the smallest value of RMSEC. The same procedure is repeated until the last informative region is reached. Finally, the last base-region is considered the optimized combination. A sketch of this base-region search is given below.
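The following is a minimal sketch of the SCMWPLS base-region search (regions is assumed to be the list of (lo, hi) informative regions located by MWPLSR; for brevity, the leave-one-out rmsecv helper from the iPLS sketch stands in for the RMSEC criterion used in the original description):

```python
import numpy as np

def scmwpls(X, y, regions, n_components=5):
    """SCMWPLS sketch: optimize the first informative region over all of
    its sub-windows, then, region by region, append the sub-window that
    gives the lowest error for the combined model (the base-region)."""
    def subwindows(lo, hi):
        return [np.arange(a, a + H)
                for H in range(1, hi - lo + 1)
                for a in range(lo, hi - H + 1)]

    # optimize the first informative region on its own
    base = min(subwindows(*regions[0]),
               key=lambda w: rmsecv(X[:, w], y, n_components))
    # greedily extend the base-region with the best sub-window of each region
    for lo, hi in regions[1:]:
        base = min((np.concatenate([base, w]) for w in subwindows(lo, hi)),
                   key=lambda cols: rmsecv(X[:, cols], y, n_components))
    return base   # column indices of the optimized combination
```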
Kasemsumran et al. [90] proposed a modified changeable size moving window partial least squares (MCSMWPLSR). The major difference is the addition of a wavelength-interval changing step. The steps of MCSMW can be written as follows:

(i) Selecting a fixed size wavelength window having a width designated by W.
(ii) Selecting a wavelength interval (WI) between the sensors; i.e., WI = 2 means that, within the selected window, wavelengths number 1, 3, 5 and so on are considered in the modeling. This determines the number of wavelengths (NW) in the selected window (NW = W/WI). Here, WI was varied between 1 and 10.
(iii) Applying the desired regression method, i.e., PLS, to the absorbance data in the selected window, determining the optimum number of factors and calculating the model performances by cross-validation.
(iv) Scanning the selected window with the specified WI through the whole spectral region, by changing the starting and ending wavelengths (SW and EW, respectively), and calculating the model performances for each sub-region as described in the previous step.
(v) Going back to step (i) to change the window width.

In the case of each analyte of interest, for every window, a regression model (PLS) with a selected number of factors is made, and the root mean square error of cross-validation (RMSECV) is calculated. Comparing the values of RMSECV for all sub-regions, the sub-region with the smallest value of RMSECV is considered the optimized spectral interval.

Comparison with SCMWPLS reveals that the above steps, except step (ii), are also used in CSMW. Therefore, the main difference between CSMW and MCSMW lies in step (ii), where the intervals between the wavelengths are varied; by removing step (ii), the above procedure describes the steps required for CSMW. On the other hand, the main difference between MWPLSR and CSMWPLSR lies in step (v), in which the algorithm is repeated with changing window widths. Thus, by removing steps (ii) and (v) from the steps of MCSMW, the procedure for MWPLSR is obtained.

There are two points to note. First, researchers can manually select their intervals in the spectra; manual selection can also draw on the interval selection methods. Secondly, the wavelengths selected with this procedure constitute a set of descriptor variables, which can eventually be fed to different regression techniques (such as MLR), including non-linear methods such as neural networks, support vector machines (SVM), etc.

4.5.3. Interval selection based on other methods
Recently, wavelength selection procedures for the multivariate factor-based methods of hybrid linear analysis (HLA) [42,97,98] and interactive variable selection for PLS (IVS-PLS) [82,91,99–101] have been discussed in the literature.

Wavelength selection by HLA involves the calculation of net analyte signal regression plots (NASRP) from HLA, combined with a moving window strategy. The main concept of HLA is to obtain a limited number of factors of a data matrix from which the contribution of the analyte of interest has been removed; it is therefore based on the net analyte signal (NAS) calculation. The first significant factors of the HLA data matrix (from which the contribution of a given analyte has been removed) are used to search for the minimum error indicator (EI). HLA uses fewer factors than the partial least squares (PLS) method and is simpler to adapt to the NASRP methodology [42].

Iterative PLS (iPLS) is a variable selection method designed to start with a small number of variables/windows and to subsequently add new variables/windows to, or remove original ones from, the data set, provided this improves the model. The method consists of four steps [101]:

(i) The original variables/windows are selected randomly.
(ii) An ordinary PLS calculation, using the selected wavelengths, is made and the model is evaluated using cross-validation.
(iii) The variable/window to be added to or withdrawn from the model is chosen randomly, and a new PLS model is built and evaluated by means of cross-validation.
(iv) If the new cross-validation value (root mean square error of cross-validation, RMSECV) is lower than the original, the new set of variables replaces the original one. If the new cross-validation value is higher, the original set of variables is retained. A minimal sketch of this add/remove loop is given after Table 1.

The packages cited in Table 1 are freely available software, and the authors would like to express their gratitude to the developers.

Table 1
Free processing toolboxes for NIR spectroscopy.

PLS Toolbox | MLR, PLS, PCR and many pre-processing methods | Eigenvector Research, Inc., 3905 West Eaglerock Drive, Wenatchee, WA 98801, www.eigenvector.com
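The following is a minimal sketch of this four-step add/remove loop: random toggling of single variables with acceptance only on improvement. The iteration budget, the starting-subset size and the reuse of the rmsecv helper from the iPLS sketch are illustrative assumptions:

```python
import numpy as np

def iterative_selection(X, y, n_start=10, n_iter=500, n_components=5, seed=0):
    """Steps (i)-(iv): start from a random subset of variables, then
    repeatedly toggle one randomly chosen variable in or out, keeping the
    change only when it lowers the cross-validated error."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    selected = set(int(i) for i in rng.choice(p, n_start, replace=False))
    best = rmsecv(X[:, sorted(selected)], y, n_components)   # steps (i)-(ii)
    for _ in range(n_iter):
        trial = set(selected)
        trial ^= {int(rng.integers(p))}      # step (iii): add or withdraw
        if len(trial) < 2:
            continue                         # keep at least two variables
        score = rmsecv(X[:, sorted(trial)], y, n_components)
        if score < best:                     # step (iv): keep only if better
            selected, best = trial, score
    return sorted(selected), best
```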
6. Summary

Firstly, methods based on mutual information (MI), which measure the information based on both spectral data and sample properties data, suffer from the following drawbacks. (i) The MI estimation becomes difficult as the number of selected variables grows: in a forward procedure, the estimation is faced with the problem of dimensionality, making the estimation of the MI with the last selected feature much more difficult than with the first. (ii) The low number of spectra usually available for learning makes the results of the selection highly dependent on the data set: a small change in the data can lead to different variable selection sets, making interpretation difficult. (iii) Even though the estimation of the mutual information is less demanding in terms of computation time than the construction of a nonlinear model, the large number of initial variables results in high computation times for the selection. Secondly, search-based methods provide a promising way to extend state-of-the-art spectral analysis to nonlinear methodologies; genetic algorithms (GA) offer an interesting, flexible and widely used approach to wavelength selection. However, a problem inherent to all search-based methods is a tendency to yield wavelength selection instabilities under sample additions or subtractions, owing to the susceptibility of the region selection to random noise. Thirdly, interval PLS (iPLS), including the moving window strategy (MWS), is very good at locating the wavelength regions of the main component contributions, and the constraints placed on the interval width and number avoid the need for testing large numbers of combinations while still providing an exhaustive search pattern. Perhaps an alternative approach, one that avoids the potential wavelength selection instability pitfall and provides a simpler graphic representation amenable to spectroscopic interpretation, is worth exploring.

Variable selection techniques consist of selecting the particular variables related to the response. Generally, variable selection aims to identify a subset of wavelengths that produces the smallest possible error, and its benefits are twofold. Many studies have shown that PLSR and PCR methods perform better when wavelength selection is applied. However, this is not always the case because, when selecting the most correlated wavelengths, one might eliminate those that correct for the influence of interfering compounds or factors. Indeed, a variable that is completely useless by itself can provide a significant improvement in performance when taken in combination with others. Nevertheless, variable selection provides faster, more cost-effective predictors.

Acknowledgements

The authors gratefully acknowledge the financial support provided by the foundations of the NSFC (Grant no. 6091079), the Chinese 863 Program (Grant nos. 2008AA10Z208, 2008AA10Z204), the Postdoctoral Foundation of China (20070411024, 0601003C) and the talent foundation of Jiangsu University. Dr. Zou Xiaobo thanks Dr. Jianshe Chen (University of Leeds) for advice and encouragement, and the many researchers who have offered stimulating work in this field.

References

[1] A. Murugesan, C. Umarani, T.R. Chinnusamy, M. Krishnan, R. Subramanian, N. Neduzchezhain, Renewable and Sustainable Energy Reviews 13 (2009) 825–834.
[2] L.C. Meher, D. Vidya Sagar, S.N. Naik, Renewable and Sustainable Energy Reviews 10 (2006) 248–268.
[3] C. Gendrin, Y. Roggo, C. Spiegel, C. Collet, European Journal of Pharmaceutics and Biopharmaceutics 68 (2008) 828–837.
[4] Y. Roggo, P. Chalus, L. Maurer, C. Lema-Martinez, A. Edmond, N. Jent, Journal of Pharmaceutical and Biomedical Analysis 44 (2007) 683–700.
[5] J. Nyström, E. Dahlquist, Fuel 83 (2004) 773–779.
[6] K.D. Shepherd, M.G. Walsh, Journal of Near Infrared Spectroscopy 15 (2007) 1–19.
[7] S.J. Erickson, A. Godavarty, Medical Engineering & Physics 31 (2009) 495–509.
[8] J.D. Caplan, S. Waxman, R.W. Nesto, J.E. Muller, Journal of the American College of Cardiology 47 (2006) C92–C96.
[9] A. Sakudo, Y. Suganuma, T. Kobayashi, T. Onodera, K. Ikuta, Biochemical and Biophysical Research Communications 341 (2006) 279–284.
[10] C. Connolly, Sensor Review 25 (2005) 192–194.
[11] G.P. Moreda, J. Ortiz-Cañavate, F.J. García-Ramos, M. Ruiz-Altisent, Journal of Food Engineering 92 (2009) 119–136.
[12] C.E. Miller, in: P. Williams, K. Norris (Eds.), Near-Infrared Technology in the Agricultural and Food Industries, American Society of Cereal Chemists, St. Paul, Minnesota, 2001, pp. 19–37.
[13] R. Karoui, J. De Baerdemaeker, Food Chemistry 102 (2007) 621–640.
[14] S. Landau, T. Glasser, L. Dvash, Small Ruminant Research 61 (2006) 1–11.
[15] N. Boaz, R.C. Ronald, Journal of Chemometrics 19 (2005) 107–118.
[16] H. Namkung, Y. Lee, H. Chung, Analytica Chimica Acta 606 (2008) 50–56.
[17] R.C. Schneider, K.-A. Kovar, Forensic Science International 134 (2003) 187–195.
[18] C.B. Zachariassen, J. Larsen, F. van den Berg, S.B. Engelsen, Chemometrics and Intelligent Laboratory Systems 76 (2005) 149–161.
[19] T.M. Baye, T.C. Pearson, A.M. Settles, Journal of Cereal Science 43 (2006) 236–243.
[20] D.D. Archibald, D.E. Akin, Vibrational Spectroscopy 23 (2000) 169–180.
[21] L.O. Rodrigues, J.L. Marques, J.P. Cardoso, J.C. Menezes, Chemometrics and Intelligent Laboratory Systems 75 (2005) 101–108.
[22] K. Krämer, S. Ebel, Analytica Chimica Acta 420 (2000) 155–161.
[23] H. Sato, M. Kiguchi, F. Kawaguchi, A. Maki, NeuroImage 21 (2004) 1554–1562.
[24] M. Casale, M.-J. Sáiz Abajo, J.-M. González-Sáiz, C. Pizarro, M. Forina, Analytica Chimica Acta 557 (2006) 360–366.
[25] D.-L. Massart, B.G.M. Vandeginste, S.N. Deming, Y. Michotte, L. Kaufman, Chemometrics: A Textbook, Elsevier, Amsterdam, 1988.
[26] T. Naes, T. Isaksson, T. Fearn, T. Davis, A User-Friendly Guide to Multivariate Calibration and Classification, NIR Publications, Chichester, UK, 2002.
[27] J.J. Workman Jr., in: D.A. Burns, E.W. Ciurczak (Eds.), Handbook of Near-Infrared Analysis, Marcel Dekker, Inc., New York, 1992, pp. 274–276.
[28] M. Forina, S. Lanteri, M. Casale, M.C. Cerrato Oliveros, Chemometrics and Intelligent Laboratory Systems 87 (2007) 252–261.
[29] B.L. Becker, D.P. Lusch, J. Qi, Remote Sensing of Environment 108 (2007) 111–120.
[30] U.G. Indahl, N.S. Sahni, B. Kirkhus, T. Næs, Chemometrics and Intelligent Laboratory Systems 49 (1999) 19–31.
[31] P.J. de Groot, G.J. Postma, W.J. Melssen, L.M.C. Buydens, Analytica Chimica Acta 392 (1999) 67–75.
[32] Q. Guo, W. Wu, D.-L. Massart, Analytica Chimica Acta 382 (1999) 87–103.
[33] W. Wu, Q. Guo, D. Jouan-Rimbaud, D.-L. Massart, Chemometrics and Intelligent Laboratory Systems 45 (1999) 39–53.
[34] W. Wu, S.C. Rutan, A. Baldovin, D.-L. Massart, Analytica Chimica Acta 335 (1996) 11–22.
[35] W. Wu, D.-L. Massart, Chemometrics and Intelligent Laboratory Systems 35 (1996) 127–135.
[36] W. Wu, B. Walczak, D.-L. Massart, S. Heuerding, F. Erni, I.R. Last, K.A. Prebble, Chemometrics and Intelligent Laboratory Systems 33 (1996) 35–46.
[37] W. Wu, Y. Mallet, B. Walczak, W. Penninckx, D.-L. Massart, S. Heuerding, F. Erni, Analytica Chimica Acta 329 (1996) 257–265.
[38] M. Zeaiter, J.M. Roger, V. Bellon-Maurel, Trends in Analytical Chemistry 24 (2005) 437–445.
[39] R.H. William, in: P. Williams, K. Norris (Eds.), Near-Infrared Technology in the Agricultural and Food Industries, American Society of Cereal Chemists, St. Paul, Minnesota, 2001, pp. 39–58.
[40] B. Hemmateenejad, M. Akhond, F. Samari, Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy 67 (2007) 958–965.
[41] M. Khanmohammadi, M.A. Karimi, K. Ghasemi, M. Jabbari, A.B. Garmarudi, Talanta 72 (2007) 620–625.
[42] B. Hemmateenejad, R. Ghavami, R. Miri, M. Shamsipur, Talanta 68 (2006) 1222–1229.
[43] R.K.H. Galvão, M. Fernanda Pimentel, M.C.U. Araújo, T. Yoneyama, V. Visani, Analytica Chimica Acta 443 (2001) 107–115.
[44] G.A. Bakken, T.P. Houghton, J.H. Kalivas, Chemometrics and Intelligent Laboratory Systems 45 (1999) 225–239.
[45] W. Wu, R. Manne, Chemometrics and Intelligent Laboratory Systems 51 (2000) 145–161.
[46] L. Pasti, D. Jouan-Rimbaud, D.-L. Massart, O.E.D. Noord, Analytica Chimica Acta 364 (1998) 253–263.
[47] T. Fearn, in: J.M. Chalmers, P.R. Griffiths (Eds.), Handbook of Vibrational Spectroscopy, vol. 3, Wiley, Chichester, 2002, pp. 2086–2093.
[48] H. Martens, T. Naes, Multivariate Calibration, Wiley, Chichester, UK, 1989.
[49] D. Jouan-Rimbaud, B. Walczak, D.-L. Massart, I.R. Last, K.A. Prebble, Analytica Chimica Acta 304 (1995) 285–295.
[50] A. Donachie, A.D. Walmsley, S.J. Haswell, Analytica Chimica Acta 378 (1999) 235–243.
[51] E. Vigneau, D. Bertrand, E.M. Qannari, Chemometrics and Intelligent Laboratory Systems 35 (1996) 231–238.
[52] P.D. Wentzell, L. Vega Montoto, Chemometrics and Intelligent Laboratory Systems 65 (2003) 257–279.
[53] S. Wold, J. Trygg, A. Berglund, H. Antti, Chemometrics and Intelligent Laboratory Systems 58 (2001) 131–150.
[54] Z. Xiaobo, Z. Jiewen, H. Xingyi, L. Yanxiao, Chemometrics and Intelligent Laboratory Systems 87 (2007) 43–51.
[55] X. Zou, J. Zhao, Y. Li, Vibrational Spectroscopy 44 (2007) 220–227.
[56] X. Zou, Apple's quality inspection technology based on fusion of machine vision, electronic nose and NIR spectroscopy, Zhenjiang, China, 2005.
[57] E. Bertran, M. Blanco, S. Maspoch, M.C. Ortiz, M.S. Sánchez, L.A. Sarabia, Chemometrics and Intelligent Laboratory Systems 49 (1999) 215–224.
[58] S.D. Frans, J.M. Harris, Analytical Chemistry 57 (1985) 2680–2684.
[59] C.H. Spiegelman, M.J. McShane, M.J. Goetz, M. Motamedi, Q.L. Yue, G.L. Cote, Analytical Chemistry 70 (1998) 35–44.
[60] M.L. Thompson, International Statistical Review 46 (1978) 1–19.
[61] V. Centner, D.-L. Massart, Analytical Chemistry 68 (1996) 3851–3858.
[62] M.C.U. Araújo, T.C.B. Saldanha, R.K.H. Galvão, T. Yoneyama, H.C. Chame, V. Visani, Chemometrics and Intelligent Laboratory Systems 57 (2001) 65–73.
[63] M.J.C. Pontes, J. Cortez, R.K.H. Galvão, C. Pasquini, M.C.U. Araújo, R.M. Coelho, M.K. Chiba, M.F. de Abreu, B.E. Madari, Analytica Chimica Acta 642 (2009) 12–18.
[64] F. Liu, Y. Jiang, Y. He, Analytica Chimica Acta 635 (2009) 45–52.
[65] W. Cai, Y. Li, X. Shao, Chemometrics and Intelligent Laboratory Systems 90 (2008) 188–194.
[66] S. Ye, D. Wang, S. Min, Chemometrics and Intelligent Laboratory Systems 91 (2008) 194–199.
[67] S. Kirkpatrick, M.P. Vecchi, Science 220 (1983) 671–680.
[68] H. Swierenga, P.J. de Groot, A.P. de Weijer, M.W.J. Derksen, L.M.C. Buydens, Chemometrics and Intelligent Laboratory Systems 41 (1998) 237–248.
[69] J.H. Kalivas, N. Roberts, J.M. Sutter, Analytical Chemistry 61 (1989) 2024–2030.
[70] H. Swierenga, F. Wülfert, O.E. de Noord, A.P. de Weijer, A.K. Smilde, L.M.C. Buydens, Analytica Chimica Acta 411 (2000) 121–135.
[71] J.R.M. Smits, W.J. Melssen, L.M.C. Buydens, G. Kateman, Chemometrics and Intelligent Laboratory Systems 22 (1994) 165–189.
[72] S. Robert, A. Mure-Ravaud, S. Thiria, M. Yacoub, F. Badran, Optics Communications 238 (2004) 215–228.
[73] Z. Boger, Analytica Chimica Acta 490 (2003) 31–40.
[74] V.G. Franco, J.C. Perin, V.E. Mantovani, H.C. Goicoechea, Talanta 68 (2006) 1005–1012.
[75] R. Todeschini, D. Galvagni, J.L. Vilchez, M. del Olmo, N. Navas, Trends in Analytical Chemistry 18 (1999) 93–98.
[76] L.F. Capitan-Vallvey, N. Navas, M. del Olmo, V. Consonni, R. Todeschini, Talanta 52 (2000) 1069–1079.
[77] C.B. Lucasius, M.L.M. Beckers, G. Kateman, Analytica Chimica Acta 286 (1994) 135–153.
[78] O. Polgár, M. Fried, T. Lohner, I. Bársony, Surface Science 457 (2000) 157–177.
[79] R.K.H. Galvão, M.C.U. Araújo, M.D.N. Martins, G.E. José, M.J.C. Pontes, E.C. Silva, T.C.B. Saldanha, Chemometrics and Intelligent Laboratory Systems 81 (2006) 60–67.
[80] P.A. da Costa Filho, Analytica Chimica Acta 631 (2009) 206–211.
[81] S. Gourvenec, X. Capron, D.-L. Massart, Analytica Chimica Acta 519 (2004) 11–21.
[82] H. Abdollahi, L. Bagheri, Analytica Chimica Acta 514 (2004) 211–218.
[83] J. Ghasemi, A. Niazi, R. Leardi, Talanta 59 (2003) 311–317.
[84] R. Leardi, in: Data Handling in Science and Technology, Elsevier, 2003, pp. 169–196.
[85] R. Leardi, M.B. Seasholtz, R.J. Pell, Analytica Chimica Acta 461 (2002) 189–200.
[86] A. Durand, O. Devos, C. Ruckebusch, J.P. Huvenne, Analytica Chimica Acta 595 (2007) 72–79.
[87] I.M. Baskir, A.V. Drozd, Chemometrics and Intelligent Laboratory Systems 66 (2003) 89–91.
[88] L. Stordrange, T. Rajalahti, F.O. Libnau, Chemometrics and Intelligent Laboratory Systems 70 (2004) 137–145.
[89] L. Nørgaard, M.T. Hahn, L.B. Knudsen, I.A. Farhat, S.B. Engelsen, International Dairy Journal 15 (2005) 1261–1270.
[90] S. Kasemsumran, Y.P. Du, K. Maruo, Y. Ozaki, Chemometrics and Intelligent Laboratory Systems 82 (2006) 97–103.
[91] A. Bogomolov, M. Hachey, Chemometrics and Intelligent Laboratory Systems 88 (2007) 132–142.
[92] J.A. Cramer, K.E. Kramer, K.J. Johnson, R.E. Morris, S.L. Rose-Pehrsson, Chemometrics and Intelligent Laboratory Systems 92 (2008) 13–21.
[93] A.F.C. Pereira, M.J.C. Pontes, F.F.G. Neto, S.R.B. Santos, R.K.H. Galvão, M.C.U. Araújo, Food Research International 41 (2008) 341–348.
[94] A.S.L. Nørgaard, J. Wagner, J.P. Nielsen, L. Munck, S.B. Engelsen, Applied Spectroscopy 54 (2000) 413–419.
[95] Y.P. Du, Y.Z. Liang, J.H. Jiang, R.J. Berry, Y. Ozaki, Analytica Chimica Acta 501 (2004) 183–191.
[96] Y. Zheng, X. Lai, S.W. Bruun, H. Ipsen, J.N. Larsen, H. Løwenstein, I. Søndergaard, S. Jacobsen, Journal of Pharmaceutical and Biomedical Analysis 46 (2008) 592–596.
[97] M. Kompany-Zareh, S. Mirzaei, Analytica Chimica Acta 526 (2004) 83–94.
[98] H.C. Goicoechea, A.C. Olivieri, Talanta 49 (1999) 793–800.
[99] I. Esteban-Díez, J.-M. González-Sáiz, C. Pizarro, Analytica Chimica Acta 525 (2004) 171–182.
[100] D. Chen, W. Cai, X. Shao, Chemometrics and Intelligent Laboratory Systems 87 (2007) 312–318.
[101] C. Abrahamsson, J. Johansson, A. Sparen, F. Lindgren, Chemometrics and Intelligent Laboratory Systems 69 (2003) 3–12.
[102] F. Rossi, D. Francois, V. Wertz, M. Meurens, M. Verleysen, Chemometrics and Intelligent Laboratory Systems 86 (2007) 208–218.
[103] J. Luypaert, S. Heuerding, Y.V. Heyden, D.-L. Massart, Journal of Pharmaceutical and Biomedical Analysis 36 (2004) 495–503.
[104] M. Vannucci, N. Sha, P.J. Brown, Chemometrics and Intelligent Laboratory Systems 77 (2005) 139–148.
[105] J.A. Panford, J.M. deMan, Journal of the American Oil Chemists' Society 67 (1990) 473–482.
[106] C.W. Brown, P.F. Lynch, R.J. Obremski, D.S. Lavery, Analytical Chemistry 54 (1982) 1472–1479.
[107] A. Hoskuldsson, Chemometrics and Intelligent Laboratory Systems 55 (2001) 23–38.

Glossary

MVC: multivariate calibration
LMVC: linear multivariate calibration
PLS: partial least squares
MLR: multiple linear regression
PCR: principal components regression
SEC: standard error of calibration
RMSECV: root mean square error of cross-validation
r: correlation coefficient
SEP: standard error of prediction
LOOCV: leave-one-out cross-validation
SPA: successive projections algorithm
UVE: uninformative variable elimination
SA: simulated annealing
ANN: artificial neural networks
GA: genetic algorithm
iPLS: interval partial least squares
BP-ANN: back-propagation artificial neural networks
K-ANN: Kohonen artificial neural network
BiPLS: backward iPLS
FiPLS: forward iPLS
SiPLS: synergy iPLS
GAiPLS: genetic algorithm iPLS
MWPLSR: moving window partial least squares regression
CSMWPLS: changeable size moving window partial least squares