
Analytica Chimica Acta 667 (2010) 14–32

Contents lists available at ScienceDirect

Analytica Chimica Acta

journal homepage: www.elsevier.com/locate/aca

Review

Variables selection methods in near-infrared spectroscopy

Zou Xiaobo a,∗, Zhao Jiewen a, Malcolm J.W. Povey b, Mel Holmes b, Mao Hanpin a

a School of Food and Biological Engineering, Jiangsu University, Zhenjiang 212013, China
b Department of Food Science and Nutrition, Leeds University, Leeds LS2 9JT, United Kingdom

ARTICLE INFO

Article history:
Received 30 November 2009
Received in revised form 21 March 2010
Accepted 23 March 2010
Available online 30 March 2010

Keywords:
Near-infrared spectroscopy
Chemometrics
Wavelength
Variable selection

ABSTRACT

Near-infrared (NIR) spectroscopy has increasingly been adopted as an analytical tool in various fields, such as the petrochemical, pharmaceutical, environmental, clinical, agricultural, food and biomedical sectors, during the past 15 years. A NIR spectrum of a sample is typically measured by modern scanning instruments at hundreds of equally spaced wavelengths. The large number of spectral variables in most data sets encountered in NIR spectral chemometrics often renders the prediction of a dependent variable unreliable. Recently, considerable effort has been directed towards developing and evaluating different procedures that objectively identify variables which contribute useful information and/or eliminate variables containing mostly noise. This review focuses on variable selection methods in NIR spectroscopy. These include classical approaches, such as the manual approach (knowledge based selection) and "Univariate" and "Sequential" selection methods; sophisticated methods such as the successive projections algorithm (SPA) and uninformative variable elimination (UVE); elaborate search-based strategies such as simulated annealing (SA), artificial neural networks (ANN) and genetic algorithms (GAs); and interval-based algorithms such as interval partial least squares (iPLS), windows PLS and iterative PLS. Wavelength selection with B-splines, Kalman filtering, Fisher's weights and Bayesian approaches is also mentioned. Finally, the websites of some variable selection software and toolboxes for non-commercial use are given.

© 2010 Elsevier B.V. All rights reserved.

Contents

1. Introduction
2. The importance of variable selection in NIR spectroscopy
   2.1. Chemical basis
   2.2. Physical basis
   2.3. Statistical and multivariate calibration
   2.4. Instrument and industrial requirements
3. A brief review of regression methods
   3.1. Calibration and validation
   3.2. Multivariate linear regression (MLR), principal component regression (PCR) and partial least squares regression (PLSR)
        3.2.1. Multiple linear regression (MLR)
        3.2.2. Principal component regression (PCR) and partial least squares regression (PLSR)
4. Variables selection methods
   4.1. Manual approaches – knowledge based selection
   4.2. Variable selection by single-term linear regression and multi-term regression
        4.2.1. Selection by single-term linear regression and the correlation coefficient
        4.2.2. Selection by multi-term regression
   4.3. Successive projections algorithm (SPA) and uninformative variable elimination (UVE)
        4.3.1. Successive projections algorithm
        4.3.2. Uninformative variable elimination
        4.3.3. UVE–SPA method
   4.4. Simulated annealing (SA), artificial neural networks (ANN) and genetic algorithms (GA)

        4.4.1. Simulated annealing (SA)
        4.4.2. Artificial neural networks
        4.4.3. Genetic algorithm
   4.5. Interval selection method
        4.5.1. Interval partial least squares (iPLS)
        4.5.2. Expansion methods for interval partial least squares
        4.5.3. Interval selection based on other methods
   4.6. Other wavelength selection methods
5. Software of wavelength selection methods
6. Summary
Acknowledgements
References

∗ Corresponding author. Tel.: +86 511 8780174.
E-mail address: zou xiaobo@ujs.edu.cn (Z. Xiaobo).

0003-2670/$ – see front matter © 2010 Elsevier B.V. All rights reserved.
doi:10.1016/j.aca.2010.03.048

1. Introduction

In recent years, near-infrared (NIR) spectroscopy has gained wide acceptance in different fields by virtue of its advantages over other analytical techniques, the most salient of which is its ability to record spectra for solid and liquid samples without any pretreatment. This characteristic makes it especially attractive for straightforward, speedy characterization of natural and synthetic products. The cost savings of NIR measurements, related to improved control and product quality, are often realised, and NIR can provide results significantly faster than traditional laboratory analysis. In batch processes, NIR allows several quality estimates to be performed within a manufacturing cycle, as opposed to a single end-of-batch analysis. Therefore, it can reveal potential problems early in the process and prompt corrective actions; this may have particular advantages where safety is a factor. Safety aspects can also be counted among its advantages, owing to intrinsically safe measurement probes and fiber optics. NIR spectroscopy has increasingly been adopted as an analytical tool in a variety of fields during the past 15 years, for example in the petrochemical [1,2], pharmaceutical [3,4], environmental [5,6], clinical [7–9], agricultural [6,10–12], food [13] and biomedical [14] sectors.

Typically, modern NIR analysis involves the rapid acquisition of a large number of absorbance values over a selected spectral range. The information contained in the spectral curve is then used to predict the chemical composition of the sample by extracting the appropriate variables of interest. Generally, NIR spectroscopy is used in combination with multivariate techniques for qualitative or quantitative analysis. The large number of spectral variables in most data sets encountered in spectral chemometrics often renders the prediction of a dependent variable complicated; however, by the use of suitable projection or selection techniques the problem may be minimised. Selection and projection methods differ in several aspects [15].

Projection methods, for example partial least squares (PLS) and principal component regression (PCR), are generally applicable and do not presuppose any bias or weights on the principal axes. Moreover, projection calibration models are straightforward, and the model calculations can be performed quickly by commercially available software packages. Earlier PCR and PLS full-spectrum methods did not feature preliminary selection, but introduce latent variables comprised of combinations of the original features. Even where prediction properties are good, they usually suffer from the fact that the latent variables are hardly interpretable in terms of the original features (wavelengths in the case of infrared spectra). Furthermore, multivariate calibration models such as partial least squares (PLS) regression have been developed for the quantitative analysis of spectral data because of their ability to reduce the impact of common problems such as collinearity, band overlaps, and interactions. However, even with such sophisticated chemometric tools as PLS, the influence of data that do not contain critical information can severely corrupt the resulting calibration models, because not all variables or regions are equally important for the modeling; some of them, like noise areas, may even be harmful. Data projection onto an abstract factor space reduces the error but does not eliminate it entirely; it is partially projected onto the new data space, often confounding the model. Therefore, removal of the variables in which the noise dominates over the relevant information often leads to better accuracy and performance of the analytical methods.

In contrast, selection methods are based on the principle of choosing a small number of variables from the original set, which provides easier interpretation. Variable selection in multivariate analysis is a very important step, because the removal of non-informative variables produces better prediction and simpler models. It has been shown that the predictive ability can be increased, and the complexity of the model reduced, by a judicious pre-selection of wavelengths. It is now widely accepted that a well-performed variable selection can result in models having a greater predictive ability [15].

Variable or feature selection, also called "frequency" or "wavelength" selection when applied to spectroscopic data, is a critical step in data analysis, as it allows interactive improvement of the quality of data during the calibration procedure. The goal of frequency selection is to identify a subset of spectral frequencies that produce the smallest possible errors when used to perform operations such as making quantitative determinations or discriminating between dissimilar samples. Recently, considerable effort has been directed toward developing and evaluating different procedures that objectively identify variables that contribute useful information and/or eliminate variables containing mostly noise. Classically, this selection is made from basic knowledge about the spectroscopic properties of the sample (knowledge based selection [16]), but it has been shown that there are mathematical strategies for variable selection that are more efficient.

From a conceptual point of view, a variable selection procedure includes first the choice of a relevance measure and, second, the choice of a search algorithm to perform the optimization. The relevance measure aims at evaluating the influence of a particular subset of X-variables on the dependent variables, y. Concerning the search algorithm, stochastic algorithms are often employed in applications such as spectroscopic multivariate calibration. This approach is usually called computer aided variable selection. Computer aided variable selection is an important pre-processing procedure in chemometrics, widely used to improve the performance of various multivariate methods and algorithms, such as regression methods, factor analysis, and curve resolution. Multivariate approaches can exploit all variables and effectively extract the necessary information in the analysis. Computer aided variable selection is also important in industry for several reasons. Variable selection can improve model performance, provide robust models that may be readily transferred, and allow non-expert users to build reliable models with only limited expert intervention.
Fig. 1. Overtone and combination NIR band assignment (from Bruker GmbH, Bremen, Germany).

Furthermore, computer aided selection of variables may be the only approach for some models, for example when predicting a physical property from spectral data. Exploiting state-of-the-art theories and techniques of the late 20th and the 21st centuries has enabled tremendous progress in NIR spectroscopy.

There are a multitude of approaches available for variable selection. These may be categorized as follows. First, "Univariate" approaches select those variables that have the greatest correlation with the response; these dominated the early period of NIR wavelength selection. Secondly, "Sequential" approaches rank variables in order and pair the variables in a forward or backward progression; a more sophisticated variant iterates the progression to reassess previous selections. An inherent problem with these approaches is that only a very small part of the experimental domain is explored. These methods were used from the middle of the 1970s to the middle of the 1990s. Thirdly, since the 1990s, "multivariate" methods of variable selection have been introduced, for example interactive variable selection, uninformative variable elimination (UVE), interval PLS (iPLS), significance tests of model parameters, and the use of genetic algorithms (GAs).

This review emphasizes variable selection methods in NIR spectroscopy, and is organized as follows. The importance of variable selection in NIR spectroscopy is given in Section 2 from different viewpoints. Section 3 gives a brief review of the global calibration methods PCR and PLS and the most common method MLR, because these methods are frequently used as relevance measures in variable selection. Variable selection methods are discussed in Section 4. Some classical approaches, such as the manual approach (knowledge based selection) and "Univariate" and "Sequential" selection methods, are introduced in the first and second parts of Section 4. The third part of Section 4 discusses the relatively sophisticated methods, the successive projections algorithm (SPA) and uninformative variable elimination (UVE). The fourth part of Section 4 focuses on elaborate search-based strategies, such as simulated annealing (SA), artificial neural networks (ANN) and genetic algorithms (GAs). Interval partial least squares (iPLS), including moving windows PLS and iterative PLS, is discussed in the fifth part of Section 4. The last part of Section 4 introduces some other selection methods, such as B-spline and Kalman filter approaches. Finally, the websites of some variable selection software and toolboxes for non-commercial use are given in Section 5. The review ends with a brief summary.

2. The importance of variable selection in NIR spectroscopy

There is much literature about the importance of variable selection in NIR spectroscopy. Here, the different aspects of variable selection are summarized.

2.1. Chemical basis

NIR spectroscopy involves energy transfer between light and matter. The spectral features of samples in the near-infrared (1000–2500 nm) spectral region are associated with the vibrational modes of functional groups. Organic matter present in samples has distinct spectral fingerprints in the NIR region because of the relatively strong absorption of the overtone and combination modes of several functional groups, such as C–H (aliphatic), C–H (aromatic), C–O (carboxyl), O–H (hydroxyl) and N–H (amine and amide), usually present in organic compounds. Restriction of the data set to the wavelengths of the second overtones of the vibration bands of CH, CH2 and CH3 bonds and the exclusion of the OH vibration bands (water and sugars, see above) has also been shown to improve the model [17]. Organic molecules have specific absorption patterns in the near-infrared region that can report the chemical composition of the material being analyzed.
The functional group effect is by far the most dominant of all effects in the NIR spectrum. Fig. 1 shows the NIR correlation chart. The chart simply summarises the most prominent effects, those of the functional groups, and offers a very useful reference for both experienced and inexperienced users of NIR technology. However, because of the complicated nature of NIR spectra (neighbour-group effects, hydrogen bonding, crystallinity, phase separation, thermal and mechanical effects, etc.), most NIR band assignments were not made from fundamental studies of simple molecules, but rather from empirical NIR method development. In other cases, it was possible to estimate band positions of a functional group in the NIR region from known band positions of the same functional group in the IR spectrum. It is important to note that the band positions represented in the chart are only approximate and were compiled from a limited amount of experimental data. Despite many limitations, the charts should serve as useful quick references for NIR users [12].

2.2. Physical basis

If knowledge were available about the relation between external variables and the spectral intensities, many NIR chemometric problems might have a satisfactory explanation. Usually, however, no physical model is available for estimating the influence of external variations on the spectral variables. As a result, variable selection techniques need to be used to select a spectral subset.

As in other spectrophotometric techniques, the origins of some non-linearities in NIR spectroscopy are well known (e.g., failure of Beer's law at high analyte concentrations, non-linearity in the detector response, drifts in the light source); unlike others, however, the NIR technique is subject to deviations arising from the process and from the measurements it provides. One of the most significant deviations when operating in the reflectance mode is due to the fact that the Kubelka–Munk transformation is linear only under specific conditions. The most frequent alternative (viz. the variation of the reciprocal reflectance, log(1/R), against the concentration) is linear in most cases. However, the proportionality between the two parameters depends on absorptivity and scattering, the latter of which varies non-linearly with particle size. This "extrinsic" non-linearity can be corrected to a great extent by using various mathematical signal treatments such as multiplicative scatter correction (MSC), the standard normal variate (SNV) or derivative absorption spectra.

One other source of non-linearity is related to the chemical nature of the target analytical parameter. The different types of interaction a given functional group can undergo under external ambient conditions (pressure, temperature, etc.) can shift absorption bands and result in intrinsic non-linearity that cannot be corrected by spectral pretreatment and calls for special tools.

Correct selection of variables, in order to gather a small subset with a decreased sensitivity to non-linearity or to discard those wavelengths most markedly contributing to it, suffices in some cases. Occasionally, the process is labour-intensive and time-consuming, but it can be expedited by using a variable selection method (e.g., a genetic or stepwise selection algorithm). Furthermore, variable selection is one very important step in any successful development of a calibration model. Careful choice of spectral pre-processing and wavelength selection can eliminate temperature dependency [18,19]. Automatic wavelength selection [20,21] may give valuable information about what factors are important in creating a successful discrimination.

2.3. Statistical and multivariate calibration

Viewed from a statistical or data analysis perspective, the main difficulty in such problems is to cope with the collinearity between spectral variables: not only are consecutive variables in a spectrum highly correlated by nature, but in addition, real applications usually concern databases with a low number of known spectra and a high number of spectral variables. Any method built on the original spectral variables is thus ill-posed, making feature (spectral variable) selection and/or projection necessary.

Viewed from a multivariate calibration perspective, variable selection attempts to identify and remove the variables that penalize the performance of a model because they are useless, noisy, redundant, or correlated by chance. Variable selection procedures are of particular interest when dealing with spectroscopic data. Indeed, the number of variables is potentially very large with regard to the number of samples at disposal for a regression model. Therefore, in MLR, wavelength selection is a necessary part of the procedure for building the calibration model. Usually, this dimensionality problem is circumvented using methods such as partial least squares (PLS) regression. But the PLS latent variables calculated may also be affected by redundancies or the presence of irrelevant variables; wavelength selection is then a way of improving precision. Consequently, improvements in prediction performance and in model characteristics (more robust models) can be expected from variable selection. Furthermore, understanding and interpretation of the chemical process under investigation should be facilitated by paying particular attention to the relevant spectroscopic variables.

2.4. Instrument and industrial requirements

Instrumentation spectra used for chemometric analysis are often too unwieldy to model, as many of the inputs do not contain important information. When using analytical calibrations, it is important to identify and minimise all possible sources of error and ensure the best possible estimator. To improve the quality of on-line monitoring processes, it is informative to obtain as many spectra as possible in a given period of time. Nevertheless, hardware limitations may mean that it is not possible to acquire more than a certain number of spectra in a given period. Wavelength selection can be a good way to limit this problem, since it decreases the size of the selection and consequently the acquisition time of each recorded spectrum. Wavelength selection results can, for instance, be used to select the most suitable filters for on-line applications of NIR. When developing mission-critical regression models intended for routine usage, e.g., industrial usage, even a subtle improvement in performance is important. Wavelength selection [22–24] is important for reliable classification of analytes by NIR spectroscopy and chemometric models.

3. A brief review of regression methods

The commonly used chemometric methods for the analysis of NIR spectra can be divided into three main groups of techniques. (i) Mathematical pretreatments, to enhance the information of interest and decrease the influence of the side information contained in the spectra. Spectral pre-processing is considered well known and is not described in this text; the classical pretreatments are normalizations, derivatives and smoothing. For more details, readers are referred to the textbooks [25,26] and [27]. (ii) Qualitative analysis, meaning classification of samples according to their NIR spectra. NIR identifications are based on pattern recognition methods, and there are many unsupervised and supervised classification techniques. Roggo et al. [4] give a brief description of the chemometric classification methods and an overview of the pharmaceutical applications in the field of qualitative analyses, especially identification and qualification of raw and final materials. The classification methods will not be described in this text; readers interested in this field may refer to the articles [26,28–37].
(iii) Regression methods, used to link the spectrum to quantifiable properties of the samples. This quantitative part of the review is described in this section.

3.1. Calibration and validation

In spectroscopy, the goal of calibration is to replace the slow, expensive measurement of the property of interest, y, by a spectroscopic feature that is cheaper or faster, but still sufficiently accurate. For any spectroscopic technique, such as NIR spectroscopy, multivariate calibration (MVC) is defined as "A process for creating a model 'f' that relates sample properties 'y' to the intensities or absorbance 'X' at more than one wavelength or frequency of a set of known reference samples" [38]. Fig. 2 is the flow diagram of the calibration and validation process [39].

Fig. 2. Flow diagram of calibration and validation process.

The use in NIR reflectance spectroscopy of a linear relationship between apparent absorbance and concentration is widely accepted by scientists in this area. Accordingly, linear MVC (LMVC) models are used, such as multiple linear regression (MLR), principal component regression (PCR) and partial least squares regression (PLSR). The development of the regression model comprises the following three stages [38]:

(i) The calibration model is built and validated using a training set (X0, y0) and a validation set (X1, y1); the result is an error of validation having an associated standard error of validation (SEV), which is used to configure the model.
(ii) Both (X0, y0) and (X1, y1) are used to compute the standard error of calibration (SEC) of the model.
(iii) An independent test set (Xp, yp) is used to evaluate the model's performance with an indicator criterion, namely the error of prediction, where the standard error of prediction (SEP) is utilized.

Generally, the first and second steps are merged together using the cross-validation technique (e.g., the leave-one-out (LOO) method, contiguous blocks, randomization or the bootstrap), so the standard error of calibration (SEC) and the standard error of validation (SEV) are computed simultaneously. In this case, the spectra X and the related sample properties y are split into calibration and prediction subsets. The calibration data usually comprise between 50 and 75% of the total data set and include the smallest and largest y, with the remaining data partitioned randomly into the calibration and prediction sets. The efficiency of a model approximation for a set of calibration and prediction samples can be reported as the standard error of calibration (SEC), the root mean square error of cross-validation (RMSECV), the correlation coefficient (r) and the standard error of prediction (SEP). These coefficients are computed as follows:

    SEC = sqrt[ (1/(Ic − 1 − h)) · Σ(i=1..Ic) (ŷi − yi)² ]    (1)

    SEP = sqrt[ (1/(Ip − 1)) · Σ(k=1..Ip) (ŷk − yk)² ]    (2)

    RMSECV = sqrt[ (1/Ic) · Σ(i=1..Ic) (ŷi − yi)² ]    (3)

    r² = 1 − Σ(g=1..n) (ŷg − yg)² / Σ(g=1..n) (yg − ȳ)²    (4)

where ŷi and ŷk denote the estimated values of the ith observation in the calibration set and the kth observation in the prediction set, yi and yk are the corresponding measured values, Ic and Ip are the numbers of observations in the calibration and prediction sets, and h is the number of independent variables in the regression. To evaluate the error of each calibration model, the leave-one-out root mean square error of cross-validation (RMSECV) is used. Leave-one-out cross-validation is performed by first defining the number of latent variables. Next, one sample is removed from the total set for validation (prediction); then the calibration model is built with the remaining samples. The procedure is repeated for all samples and a RMSECV is calculated. yg and ŷg denote the measured and estimated values of the gth observation in the data sets (including calibration, prediction and cross-validation sets), and ȳ denotes the mean of the measured values in the data set. The basic relationships to notice are that the SEC decreases as r increases, r is always larger in absolute value than r², 0 ≤ r² ≤ 1, and 0 ≤ RMSECV.
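To make Eqs. (1)–(4) concrete, the sketch below shows how the four figures of merit can be computed. It is an illustration added in editing (Python/numpy), not code from the review, and the array names are assumptions.

    import numpy as np

    def sec(y_true, y_pred, h):
        # Standard error of calibration, Eq. (1); h = number of independent variables
        Ic = len(y_true)
        return np.sqrt(np.sum((y_pred - y_true) ** 2) / (Ic - 1 - h))

    def sep(y_true, y_pred):
        # Standard error of prediction, Eq. (2)
        Ip = len(y_true)
        return np.sqrt(np.sum((y_pred - y_true) ** 2) / (Ip - 1))

    def rmsecv(y_true, y_pred_cv):
        # Root mean square error of cross-validation, Eq. (3);
        # y_pred_cv holds the leave-one-out prediction for each sample
        return np.sqrt(np.mean((y_pred_cv - y_true) ** 2))

    def r2(y_true, y_pred):
        # Squared correlation coefficient, Eq. (4)
        return 1.0 - np.sum((y_pred - y_true) ** 2) / np.sum((y_true - np.mean(y_true)) ** 2)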
3.2. Multivariate linear regression (MLR), principal component regression (PCR) and partial least squares regression (PLSR)

Multivariate linear regression (MLR), principal component regression (PCR) and partial least squares regression (PLSR) are the three common multivariate methods used in the calibration of NIR spectroscopy data. Furthermore, these three methods are also used in many of the selection approaches discussed later. In fact, all three methods have a common point in that all of them model data using a linear least squares fitting technique. This means that they build linear models between an independent matrix X (spectral data) and a dependent matrix y and estimate the regression coefficient matrix using least squares fitting techniques.

3.2.1. Multiple linear regression (MLR)

Multiple linear regression (MLR) [27] can be characterized as a technique for solving a number of simultaneous equations. In a multi-component system which is determined simultaneously, the analysis can be described by measuring m variables xj for a variable y, with the main aim of creating a linear relationship between them. This can be represented mathematically as

    y = b0 + b1x1 + b2x2 + b3x3 + · · · + bmxm + e    (5)

Multi-linear regression (MLR) is the oldest of the presented methods and is less and less used in applications due to the improvement of computation power. This regression allows establishing a link between a reduced number of wavelengths (or wavenumbers) and a property of the samples. The prediction yj of the sought property can then be described with the formula:

    yj = b0 + Σ(i=1..k) bi xi + ei,j    (6)

where bi is the computed coefficient, xi the absorbance at each considered wavelength and ei,j is the error. Each wavelength is studied one after the other and correlated with the studied property. The selection is based on the predictive ability of the wavelength. The three modes of selection are: forward, backward, and stepwise. When the correlation reaches a value fixed by the operator, the wavelength is kept as part of the model calibration wavelengths. The model is then computed between this set of calibration wavelengths and the reference values of the studied property.

It should also be noted that when using MLR, there is no consistent solution available when more variables than samples are present, as an infinite number of solutions exist; this ultimately leads to weakness within the system. The other situation, i.e., when there are more samples than variables, leads to an overdetermined system, which does not allow an exact solution for the coefficients.

3.2.2. Principal component regression (PCR) and partial least squares regression (PLSR)

Among the different regression methods available for multivariate calibration, the factor analysis-based methods, including partial least squares (PLS) regression and principal component regression (PCR), have received considerable attention in the chemometrics literature [30,40–46]. PLS and PCR can be used directly for ill-conditioned data by extracting the latent variables (factors). The number of latent variables is lower than the number of objects. These techniques are powerful multivariate statistical tools that have been successfully and widely applied to the quantitative analysis of spectroscopic data because of their ability to overcome problems common to such data (collinearity, band overlaps and interactions) and the ease of their implementation due to the availability of software. Here, only a brief introduction to PCR and PLS is given, as the techniques are routinely used.

Principal component regression (PCR) is a widely used regression model for data having a large degree of covariance in the independent or predictor variables, or where ill-conditioned matrices are present. Instead of regressing the concentrations of a measurement system onto the original measured spectral variables, PCR implements a PCA decomposition of the spectral data X before regressing the concentration information onto the principal component scores [47,48]. Some vectors having small magnitude are omitted to avoid the collinearity problem: PCR eliminates the lower-ranked principal components, which in turn reduces the noise (error) present within the system.

Partial least squares regression is related to both principal component regression (PCR) and multiple linear regression (MLR). PCR aims to find the factors which capture most of the variance within the data before regression onto the concentration variables, whereas MLR seeks a single factor that correlates both the data and their concentrations. PLS attempts to maximise the covariance, thus capturing the variance and correlating the data together. As PLS searches for the factor space most congruent to both matrices, its predictions are far superior to PCR [49].

PCR and PLS share many similarities, and the theoretical relationships between them have been covered extensively in the literature [30,40,41,43,46,50,51]. PLS and PCR perform data decomposition into spectral loadings and scores prior to model building with the aid of these new variables. In PCR, the data decomposition is done using only spectral information, while PLS employs spectral and concentration data. Historically, PCR predates PLS. However, since its introduction, PLS appears by most accounts to have become the method of choice among chemists. On the other hand, in the literature survey made by Wentzell and Vega Montoto [52], they surprisingly found a few cases which indicated that PLS gave better results than PCR, and a greater number of studies indicating no real difference in performance ([50–52] and references therein). In addition, by generic simulation of complex mixtures, Wentzell and Vega Montoto concluded that in all of the simulations carried out, except when artificial constraints were placed on the number of latent variables retained, no significant differences were observed in the prediction errors reported by PCR and PLS [52]. PLS almost always required fewer latent variables than PCR, but this did not appear to influence predictive ability. This statement has also been confirmed by others [40,45,50,51].

However, global models, such as PLS, implicitly endeavour to include the variation due to external effects in the model, in much the same way as unknown chemical interferences can be included in an inverse calibration model. Provided the interfering variation is present in the calibration set, an inverse calibration model can, in the ideal case of additivity and linearity, easily correct for the variation due to unknown interferences. It is assumed in global calibration models that the new sources of spectral variation can be modeled by including a limited number of additional PLS factors. Owing to the increase in the calibration model's dimensionality, it becomes necessary to measure a large number of samples under changed conditions in order to make a good estimation of the additional parameters. When highly non-linear effects are present in the spectra, many additional PLS [53] factors are necessary to model the spectral differences, and occasionally it is not possible to model these spectral differences.
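The contrast between the two decompositions can be made explicit in code. The sketch below (scikit-learn; an editorial illustration, not the authors' software) fits PCR as PCA followed by least squares on the scores, and fits PLS directly:

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LinearRegression
    from sklearn.cross_decomposition import PLSRegression

    def fit_pcr(X, y, n_components):
        # PCR: decompose the spectra with PCA (spectral information only),
        # then regress y on the principal component scores
        pca = PCA(n_components=n_components).fit(X)
        reg = LinearRegression().fit(pca.transform(X), y)
        return pca, reg

    def fit_pls(X, y, n_components):
        # PLS: latent variables are built from X and y jointly,
        # maximising the covariance between the scores and the property
        return PLSRegression(n_components=n_components).fit(X, y)

Prediction with the PCR pair is reg.predict(pca.transform(X_new)); the PLS model predicts directly with model.predict(X_new).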
Fig. 3. Variables removed manually. (a) Apple NIR spectra; (b) some data points removed at the lower and higher parts; R is the relative reflectance.

4. Variables selection methods

4.1. Manual approaches – knowledge based selection

For manual approaches, one possibility is to remove the variables that have poor informational quality. In many studies [23,54,55], due to the insensitivity of the NIR instrument detector, some data points at the lower and higher regions were omitted from the spectral data sets. Fig. 3(a) [56] shows apple spectra collected by a NIR instrument. The data points at the lower and higher regions were cut from the spectral data sets before regression due to a low signal-to-noise ratio (S/N). Fig. 3(b) [56] shows the selected spectral interval. Manual deletion of variables suffers from two main flaws: (i) there is uncertainty that exactly the same section of the data will be removed between data sets, and (ii) removed sections may not be optimal from the point of view of the model (i.e., parts of a spectrum may not look to the eye to be information rich, but for the model they contain useful information). Therefore, when using this manual approach, there is a tendency to remove sections that contain either high noise or low detector response. However, such an approach can prove to be counter-productive in terms of robust model building. For example, information in the background noise can be extremely useful for establishing a robust calibration model, as noise-free spectra often carry a large source of predictive error due to collinearity between neighbouring wavelengths in a single peak. The presence of a high degree of collinearity between variables in a model will tend to push the matrix towards singularity, and this in turn will have a large influence on the coefficients generated.

Selection of a reference wavelength is based on: (i) the peak absorbance of the component to be determined, such as one of the functional groups in Fig. 1; (ii) the peak absorbance of a component whose concentration is highly correlated with that of the component to be determined; and (iii) part of a difference or quotient expression that serves to normalize the spectra to one level of scatter, particle size, temperature, etc. This would typically be the approach taken by the spectroscopist. Manual selection suffers in the following respects: (i) it needs experience and a good understanding of NIR spectroscopy, as many biomaterial NIR spectra are too complicated to understand, and (ii) the relationship between absorption in the near-infrared (NIR) spectral region and the target analytical parameter is frequently non-linear in nature. The origin of the non-linearity can be varied and difficult to identify. In some cases, the relationship between absorption and the analytical parameter of interest is intrinsically non-linear owing to the chemical nature of the sample or analytes concerned [57].

4.2. Variable selection by single-term linear regression and multi-term regression

4.2.1. Selection by single-term linear regression and the correlation coefficient

This section gives a brief introduction to the concepts involved in single-term linear regression, the statistical procedure that answers the following question: "Given a set of data with one independent variable X and one dependent variable Y and the corresponding scatter-plot of Y against X, what is the straight line that best fits the data?" The answer is the straight line with the equation:

    Y′ = a + bX    (7)

where Y′ is an approximation to Y, and a and b are constants. The best fitting line is called the regression of Y on X, or Y regressed against X. The regression constant a is called the constant term, and the regression constant b is called the regression coefficient. The vertical distance from a data point to this line is the residual or regression error for that point, and the standard deviation of all the residuals is the SEC (or SEP). The correlation coefficient (r), which is related to the SEC, lies in the range [−1, 1].

In developing a calibration model using single-term linear regression, when one does not yet know the best wavelength to use, one normally finds the r value at every available wavelength. The wavelength giving the highest r value is then used for the calibration and subsequent validation [58].

However, in practice this simple approach seldom gives an adequate SEC, and a more complex calibration is usually needed. One way to improve the correlation is to let X be the difference between log(1/R) values at two different wavelengths (R is the relative reflectance). The two wavelengths can be found by an iterative process. First, the single wavelength giving the best correlation is found; then, a second wavelength is found so that the difference between log(1/R) values at the first and second wavelengths gives the best correlation. The first wavelength is then replaced with a third wavelength whose difference with the second gives the best correlation, and so on until the process converges. Note that such an iterative procedure does not necessarily produce the pair of wavelengths whose difference provides the highest possible correlation; it only provides the pair of wavelengths producing the converged correlation. The same process can be used with quotients (A/B) instead of differences, and with quotients of differences ((A − B)/(C − D)) [39]. In the last case, there are various ways of iterating the process when selecting the four wavelengths. The various methods do not all yield the same choice of wavelengths.
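A minimal sketch of the single-wavelength search just described (Python/numpy; added for illustration, with assumed variable names): compute r between the property and the absorbance at every wavelength and keep the best one.

    import numpy as np

    def best_single_wavelength(X, y):
        # X: (n_samples, n_wavelengths) log(1/R) matrix; y: (n_samples,) property
        Xc = X - X.mean(axis=0)
        yc = y - y.mean()
        # Pearson r of y against each wavelength (column)
        r = (Xc * yc[:, None]).sum(axis=0) / (
            np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum()))
        j = int(np.argmax(np.abs(r)))
        return j, r[j]

The iterative two-wavelength scheme can reuse this routine by replacing the columns of X with differences or quotients of log(1/R) pairs.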
If several wavelengths do not give an acceptable result, multi-term regression approaches should be used, as discussed next.

4.2.2. Selection by multi-term regression

Multi-term regression (usually multiple linear regression (MLR), as shown in formula (5)) uses the information at a number of wavelengths to isolate the effect of a single absorber and to normalize the baseline. There are various ways of choosing the wavelengths to use in multi-term linear regression [39]. These are: (i) the step-up or forward procedure, which picks the wavelength giving the best single-term calibration as the first independent variable, and then finds the best wavelength to add as a second variable in a two-term regression, and so on until some stopping criterion is met. (ii) The step-down or backward procedure, which starts with a multi-term linear regression using all available wavelengths and eliminates variables by some criterion. (iii) The all-possible-combinations procedure, which tests all possible linear regressions on all subsets of available wavelengths and reports the subset giving the lowest SEC; this procedure is usually limited to subsets containing only two or three wavelengths. (iv) There are also combinations of these methods. For example, the all-possible-combinations method can select two or three wavelengths, and then the step-up method can be used to add wavelengths. Alternatively, each step in the step-up method can be followed by one step of the step-down method, to determine wavelengths that can be safely eliminated when a new wavelength is added. This method is called the stepwise method, and is the most commonly referenced in the literature. The detailed algorithm is as follows. In stepwise multiple linear regression (MLR-step) [28], original variables are selected iteratively according to their correlation with the target property y. For a selected variable xi, a regression coefficient bi is determined and tested for significance using a t-test at a critical level α (such as α = 5%). If the coefficient is found to be significant, the variable is retained and another variable xj is selected according to its partial correlation with the residuals obtained from the model built with xi. This procedure is called forward selection. The significance of the two regression coefficients bi and bj associated with the two retained variables is then tested again, and the non-significant terms are eliminated from the equation (backward elimination). Forward selection and backward elimination are alternately repeated until no significant improvement of the model fit can be achieved by including more variables and all regression terms already selected are significant. In order to reduce the risk of over-fitting due to retaining too many variables, a procedure based on LOOCV followed by a randomisation test is applied to test different sets of variables for significant differences in prediction.

The backward, forward and stepwise selection methods can be performed in a short time by commercially available software packages.

There are two main flaws with these types of procedures, causing them to perform inconsistently across data having different noise character. (i) Though the stepwise selection methods are simple and efficient, they depend upon an ordering or ranking of the variables, which often makes them sensitive to noise distributions. (ii) Because variables are usually ranked according to some criterion, points on a single peak are commonly chosen together. In particular, if one spectral region contains much higher correlation than others, many points within this area will be tested before any points in other regions are considered. Neighbouring points often contain much of the same information (collinearity), and when they are added consecutively in a stepwise procedure, this may decrease prediction accuracy.

To overcome these drawbacks, chemical information, such as correlation between spectra and composition, should be considered in the selection process, rather than depending upon an optimization procedure that relies solely on model performance.
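For illustration, here is a bare-bones version of the forward step (Python/numpy; this sketch omits the t-test and backward-elimination refinements described above):

    import numpy as np

    def forward_select(X, y, n_vars):
        # At each step, add the wavelength most correlated with the residuals
        # of the current MLR model, then refit the model
        n, J = X.shape
        selected = []
        residual = y - y.mean()
        coef = None
        for _ in range(n_vars):
            corr = [0.0 if j in selected else abs(np.corrcoef(X[:, j], residual)[0, 1])
                    for j in range(J)]
            selected.append(int(np.argmax(corr)))
            A = np.column_stack([np.ones(n), X[:, selected]])  # intercept + chosen terms
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            residual = y - A @ coef
        return selected, coef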
McShane [59] described a fast stepwise algorithm that uses multiple ranking chains to identify several spectral regions correlated with known sample properties. The multiple-chain approach allows the generation of a final ranking vector that moves quickly away from the initial selection point, testing several areas exhibiting correlation between spectra and composition early in the stepping procedure [59]. There have been many studies devoted to this problem; for a comprehensive review see [60].

4.3. Successive projections algorithm (SPA) and uninformative variable elimination (UVE)

Employing the full spectral region does not always yield optimal results, as it may include regions which comprise more noise than relevant information. Therefore, uninformative variable elimination (UVE), proposed by Centner et al. [61], has been used to solve such problems and improve the quality of the models. Multiple linear regression (MLR) models are simpler and easier to interpret, but they are strongly affected by collinearity between variables. The successive projections algorithm (SPA), proposed as a variable selection strategy by Araújo et al. [62], illustrates the advantage of finding a small representative set of spectral variables with a minimum level of collinearity.

4.3.1. Successive projections algorithm

The successive projections algorithm (SPA) is a variable selection technique designed to minimize collinearity problems in multiple linear regression (MLR). SPA employs simple projection operations in a vector space to obtain subsets of variables with minimal collinearity, and is a forward variable selection algorithm for multivariate calibration. The principle of variable selection by SPA is that the new variable selected is the one, among all the remaining variables, which has the maximum projection value on the subspace orthogonal to the previously selected variable. A graphical user interface for SPA is available at www.ele.ita.br/kawakami/spa/. The SPA steps are described below for a given initial wavelength k(0). The total number of wavelengths in the spectrum is J and the desired number of variables is N.

(i) Before the first iteration (n = 1), let xj = jth column of Xcal; j = 1, . . ., J.
(ii) Let S be the set of wavelengths which have not been selected yet. That is, S = {j such that 1 ≤ j ≤ J and j ∉ {k(0), . . ., k(n − 1)}}.
(iii) Calculate the projection of xj on the subspace orthogonal to xk(n−1) as
     P xj = xj − (xjT xk(n−1)) xk(n−1) (xk(n−1)T xk(n−1))−1, for all j ∈ S,
     where P is the projection operator.
(iv) Let k(n) = arg max(||P xj||, j ∈ S).
(v) Let xj = P xj, j ∈ S.
(vi) Let n = n + 1. If n < N, go back to step (ii).

End: the resulting wavelengths are {k(n); n = 0, . . ., N − 1}.

For a detailed description of SPA see Ref. [62]; the main procedures are summarized here. First, the maximum number of variables N to be selected is set before a start vector is chosen in a space of J dimensions (where J is the number of original variables). Subsequently, in an orthogonal subspace, the vector with the largest projection is selected and becomes the new starting vector. The choice of the orthogonal subspace at each iteration is made in order to select only the non-collinear variables. The optimal initial variable and number of variables can be determined on the basis of the smallest root mean square error of validation (RMSEV) using the validation set of the MLR calibration.
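Steps (i)–(vi) translate almost directly into code. The following numpy sketch is an editorial illustration (not the authors' implementation), with Xcal as the calibration matrix holding one column per wavelength:

    import numpy as np

    def spa(Xcal, k0, N):
        # Successive projections algorithm: starting from wavelength k0,
        # repeatedly select the column with the largest norm after projection
        # onto the subspace orthogonal to the previously selected column
        X = Xcal.astype(float).copy()
        selected = [k0]
        for _ in range(N - 1):
            xk = X[:, selected[-1]]
            # P xj = xj - (xj.T xk) xk (xk.T xk)^-1, applied to every column at once
            coeffs = (X.T @ xk) / (xk @ xk)
            X = X - np.outer(xk, coeffs)
            norms = np.linalg.norm(X, axis=0)
            norms[selected] = -1.0          # never re-select a wavelength
            selected.append(int(np.argmax(norms)))
        return selected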

Fig. 4. Plot of s for experimental and artificial random variables. The cutoff level at max(abs(sartif. )) is indicated by the dashed line.

In terms of prediction ability, SPA-MLR models have been shown to be comparable to, or better than, full-spectrum PLS/PCR models in a number of applications, including UV–vis [62] and NIR [43] spectrometry. Good results involving the use of SPA together with wavelet regression have also been reported [63]. Furthermore, SPA has been favourably compared with the genetic algorithm [62], which is a popular tool for variable selection in multivariate calibration and will be discussed later. Moreover, the selected variables can be used as the inputs of MLR, PLS and LS-SVM models [64].

SPA employs simple projection operations to select variables with a minimum of collinearity; however, variables selected by SPA may have a low signal-to-noise ratio (S/N) or be insufficient for multivariate calibration, and this can affect the precision of the model prediction.

4.3.2. Uninformative variable elimination

In the manual approach, the uninformative sections are subjectively removed on the basis of either high noise or low detector response. To address this, the uninformative variable elimination method (UVE-PLS) was developed to eliminate uninformative variables in the calibration of NIR data [33,65,66]. Artificial random variables are added to the data as a reference so that those variables which play a less important role in the model than the random variables are eliminated. Several versions of UVE-PLS were described in Ref. [61]. Here we introduce one simple UVE-PLS method.

In linear models, the prediction ŷ is computed with Eq. (5). A regression coefficient vector b = [b1, . . ., bp] is calculated through a leave-one-out validation. Because each coefficient bj represents the contribution of the corresponding variable to the established model, the reliability of each variable j can be quantitatively measured by the stability, defined as [65,61]:

    sj = mean(bj) / std(bj),   j = 1, . . ., p    (8)

where mean(bj) and std(bj) are the mean and standard deviation of the regression coefficients of variable j. It is clear that, when the mean value of bj is large and the standard deviation of bj is small, the stability value is large. Therefore, the larger the stability, the more important the corresponding variable is. The variables whose stability is less than a threshold should be treated as uninformative and be eliminated.

In order to estimate a suitable cutoff threshold, an artificial random variable matrix N (n × p) with very small amplitude (e.g., 10^−11) is added to the original data to compute its stability. Any variable whose stability is less than that of the random variables should be regarded as uninformative and be eliminated. In practice, the cutoff threshold is generally defined by:

    cutoff = k × max(abs(snoise))    (9)

where k is an arbitrary value, e.g., 0.7 or 0.9 [61]. Fig. 4 [56] shows the plot of the s value for experimental and artificial random variables; the cutoff level at max(abs(sartif.)) is indicated by the dashed line.

UVE is thus a method of variable selection based on stability analysis of the regression coefficients (b). The main steps of UVE can be summarized as follows:

(i) First, PLS regression is performed on the instrumental response data (X) and property values (y) of the calibration set, and the optimal number of PLS factors is determined.
(ii) A noise matrix of the same size as the X matrix is generated, whose elements are random numbers in the interval 0.0–1.0. The elements are multiplied by a small constant to make their influence on the model negligible.
(iii) The noise matrix is appended to the original matrix X to form an extended matrix having twice as many variables.
(iv) PLS models are constructed based on the extended matrix and y in a leave-one-out cross-validation manner. This leads to a matrix of b values with as many rows as samples and one column for each variable, both original and random.
(v) The s value of each variable is calculated as the average of the b values of each column divided by the standard deviation of that column.
(vi) The cutoff value is set as the maximum absolute value of s among the random variables. Every original variable with an equal or lower absolute value of s is assumed to be noise only and is eliminated.
Z. Xiaobo et al. / Analytica Chimica Acta 667 (2010) 14–32 23

The obvious advantage of the UVE–SPA lies in two aspects


compared with direct SPA: (i) the first advantage is to make the
association of variables and property closer; (ii) the number of
variables required to be sought by SPA is reduced.

4.4. Simulated annealing (SA), artificial neural networks ANN)


and genetic algorithm (GA)

4.4.1. Simulated annealing (SA)


SA is a probabilistic global optimization technique, based on the
physical annealing process of solids, put forward by Kerkpatrick
et al. [67]. In contrast to deterministic optimization techniques
(e.g., simplex optimization), probabilistic optimization techniques
allow acceptance of an inferior solution during optimization [68].
Consequently, probabilistic optimization techniques have the abil-
ity to traverse local optimums and to find the global optimal
solution. Hence, the SA algorithm has been widely applied to
many optimization problems. More detailed description of SA
can be found in the literatures [67,69,70]. Fig. 5 illustrates the
flowchart of SA. A graphical user interface for SA is available at:
http://neo.lcc.uma.es/∼software/mallba/sa.php.
In simulated annealing, a problem starts with an initial solution
which is iteratively modified subject to some control parameter T,
this is analogous with temperature. As the parameter T is reduced,
the convergence criterion becomes increasingly difficult to satisfy.
Finally, if T is lowered sufficiently, no further changes in the solution
space are possible. To avoid being frozen at a local optimum, the SA
algorithm moves slowly through the solution space. This controlled
improvement of the objective value is accomplished by accepting
non-improving moves with a certain probability that decreases as
the algorithm progresses. This criterion is a Boltzman’s probability
distribution (Metropolis criterion) as a function of temperature T:
 −F 
p(F) = exp (10)
T
where

F = F(v ) − F(vi ) (11)

where F is the objective function, F is the increment of objective


function, vi is current values, and v is a randomly generated new
solution in the neighborhood of vi .
Fig. 5. The flowchart of simulated annealing (http://www.heatonresearch.com). For NIR spectroscopy wavelength selection, the SA solution is
represented as a numerical string containing k values (integers)
Based on this definition of noise, UVE can eliminate non- representing the variables to be selected from the whole spectral
informative variables. Employing the variables selected by UVE range of N variables. These k variables are selected from the cal-
for modeling can avoid model over-fitting and usually improve ibration set spectra, and in combination with reference values of
its predictive ability. However, latent variables are still required the corresponding samples, a PLS model (or PCR, MLR and other
to be employed for modeling because the number of the variables models) with a predefined number of factors is calculated. One
selected is still large. restriction of these values is that it is not possible to select two of
the same variables in the same string [31]. The number of possible
combinations is
4.3.3. UVE–SPA method
A successive projection algorithm combined with uninforma- N
tive variable elimination called UVE–SPA method is proposed by Ye Nc = (12)
k
et al. [66] for spectral variable selection, in which SPA is employed
for variable selection and then UVE discards uninformative vari- where the number of variables (k) need to be 1 higher than the
ables. Compared with direct SPA, fewer variables were selected by number of factors in the PLS model.
the SPA–UVE method for a MLR calibration model having higher Subsequently, the same k variables are selected from the spectra
precision of prediction. standardization set (spectra measured under changed circum-
Ye et al. [66] applied the method to two sets of NIR data for stances) reference parameters are predicted using the calibration
analysis of nicotine in tobacco lamina and active pharmaceuti- set and the variable subset from the standardization set. On the
cal ingredient (API) in a single tablet, respectively. MLR models basis of these prediction results, an error value is calculated. This
were developed employing the original instrument response data error value comprises the predictive ability of the model at the stan-
of spectral variables selected by UVE–SPA, and the property param- dard temperature and the predictive ability of the model when it
eters of interest could be predicted accurately using raw spectral is used at different temperatures. The goal of SA is to minimize the
data without any pretreatment. error value; which implies that the prediction error of the model is
24 Z. Xiaobo et al. / Analytica Chimica Acta 667 (2010) 14–32

minimized at all temperatures. In order to find the proper k value,


various SA runs are performed using different values for k.
Swierenga et al. [68] demonstrated that variable selection by
simulated annealing (SA) enhances the model’s robustness with
respect to model transfer and also improves its predictive ability.
Furthermore, this method is usually combined with artificial neural
networks (ANN) and genetic algorithms (GA), because it enables the
process to traverse local minima and achieve the global minimum.

4.4.2. Artificial neural networks


Artificial neural networks (ANN) were originally designed to
mimic the function of the human brain. They consist of a number
of simple processing units (or neurons) linked by weighted modi-
fiable interconnections. ANN has been developed for quantitative
analysis of samples during the last decade. Compared with MLR,
ANN is a more flexible modeling methodology, since both linear Fig. 6. Kohonen SOM topology (http://dlist.sir.arizona.edu).
and non-linear functions can be used (or combined) in the pro-
cessing units. This allows more complex relationships between a
high-dimensional descriptor space and the given retention data,
and may lead to better predictive power of the resulting ANN calibration model into ANN. Therefore, the later method, that is the
model compared with MLR. However, the major disadvantage of causal index analysis method, is discussed here.
ANN is also directly related to the complex model infrastructure. First, the PCA-CG algorithm [73] is used to determine the num-
Compared to MLR, it suffers from the perception of being a “black ber of neurons in hidden layer. The number of neurons in the
box” heuristic tool. ANN models generally are more difficult to hidden layer is selected in advance to represent the intrinsic dimen-
interpret. Furthermore, to get robust calibrations by ANN, the num- sionality of the dataset. Inputs that are linearly correlated do not
ber of samples must be higher than the number of weights to be contribute independent information and thus the information con-
estimated, which normally mplies the uses of a large number of cal- tent of the correlated inputs can be captured in lower dimensions.
ibration samples. Before building ANN, methods for reducing the The amount of information lost in the projection of the input data
input dimensionality by mathematical pre-processing (fast Fourier into a lower-dimensional space is quantified by the fraction of
transform, principal component analysis, and Variance analysis) the variance of the original data not represented in the reduced
are often used. space. Typically, it specifies as many hidden neurons as dimensions
ANN models are seldom directly used in the case of NIR spec- needed to capture 70–90% of the variance of the input dataset. This
tral analysis and variables selection as large-scale ANN models are immediately places limits on the number of hidden neurons in the
required. Furthermore, large-scale ANN models take a long time initial network. From experience, the suggested number of hidden
to train, if at all possible; they have tendency to become trapped neurons is already very close to the optimal size (typically about
in local minima so repeated re-training with random initial con- five hidden neurons) before the network reduction procedures are
nection weights is needed. There is a necessity for a method to applied. Thus, most of the effort is directed at reducing the num-
estimate the number of intermediate neurons, as it is known that ber of network inputs, since the number of outputs is fixed by the
too small or too large number of these neurons degrades the repre- requirements of the model.
sentation of the data in the model. The ANN model is suspected Secondly, calculate the relative contribution of the inputs to the
of being over-fitted, as the ratio of available training examples variance of the hidden layer neurons in the trained network. The
to the number of connection weights is usually considered too calculation algorithm is described in Ref. [73].
small and that the ANN model interpretation is too difficult. Large- Thirdly, determine the importance of network inputs. Inputs
scale system modeling by non-ANN techniques requires time- and that make relatively small contributions to the variance at the input
resource-intensive effort domain knowledge and expertise. of the hidden layer serve only as biasing constants and can be
Knowledge extraction from the trained ANN models is impor- replaced by fixed biases at the hidden layer neurons. When analyz-
tant for gaining the confidence and acceptance by the users. Two ing a given input, its collective effect on all hidden layer neurons
kinds of ANN, back-propagation artificial neural networks (BP- is of interest, rather than the effects on individual hidden neurons,
ANN) and Kohonen artificial neural network (K-ANN), have been since removing the input de-couples it simultaneously from all hid-
used to select variables in NIR spectroscopy. den neurons. The relative contribution of each input to the total
variance at the input to each hidden neuron is calculated when pre-
4.4.2.1. Back-propagation artificial neural networks (BP-ANN). BP- senting the whole training data. Small contribution values indicate
ANN is the most common neural network. It generally consists of an that either this input does not change much, or that the training of
input layer, one or more hidden intermediate layers and one output the ANN has allocated small connection weights to scale the input.
layer. A neuron is a processing unit which transforms, by an activa- Applying these algorithms to large-scale ANN modeling has
tion function, input into output data. It is a feed-forward network been successful. The number of training periods needed achieve
and combines a back-propagation algorithm which is used to train a low modeling error was usually small, rarely more than a cou-
the network according to a learning rule [35,71–75]. One method of ple of hundred. Even with a relatively small number of examples,
variable extraction is the finding of an optimal set of inputs that can the validation model error was also low. There are advantages in
successfully predict, or classify, the desired outputs. In the case of identifying and removing the less-relevant inputs even if a large
spectral analysis, this translates to identifying those wavelengths ANN is well trained: better modeling accuracy; knowledge gain in
that contain most of the important information. Another method identifying the important inputs; larger ratio of examples to ANN
is the causal index (CI) analysis of the trained ANN model to obtain connection weights (thus a better generalization capacity); easier
a quasi-quantitative estimate of the direction and magnitude of model understanding. Thus, better ANN models were trained with
the influence of each ANN input on each ANN output. The former the reduced input set, and this process was repeated until the model
method is like SPA and MLR selection except for a change in the accuracy was degraded.
Z. Xiaobo et al. / Analytica Chimica Acta 667 (2010) 14–32 25

Once the ANN model is trained, the analysis of its connection


weights can easily identify the important inputs. Repeating the pro-
cess of training the ANN model with the reduced input set and the
selection of the most relevant inputs can proceed until a quasi-
optimal, small, set of inputs is identified. Boger used this method
[73] finding the minimal set of wavelengths in benchmark diesel
fuel NIR spectra. Causal index calculation can analyze the influ-
ence of each of selected wavelengths on the predicted property.
Some of the resulting minimal sets are not unique, depending on
the ANN architecture used in the training. However, the accuracy of
the resulting ANN models is usually better, and more robust, than
the large initial ANN model.

4.4.2.2. Kohonen artificial neural network (K-ANN). The self-


organizing map (SOM) invented by Teuvo Kohonen performs a form
of unsupervised learning [75]. A set of artificial neurons learn to
map points in an input space to coordinates in an output space
as shown in Fig. 6. The input space can have different dimensions
and topology from the output space and the SOM will attempt to
preserve these features.
Kohonen artificial neural network (K-ANN) is used as a tool
for wavelength selection based on its clustering capability. K-ANN
[75] are used to select optimal sets of wavelengths for PLS cali-
bration of mixtures with stray overlapping and uses a clustering
method where the objects are distributed over a topological map
(usually a squared map of n × n cells). The net is composed of
n × n × p weights, where p is the number of input variables and
each p-dimensional vector is a neuron. During training, n objects
are presented to the net – one at a time – a fixed number of times
(epochs); each object is assigned to the cell for which the distance
between the object vector and the neuron is minimum; the weights
of the cell where an object is assigned and the topologically nearest
cells are modified in such a way as to reproduce the object profile.
When the net is trained, similar objects fall in the same cell or in
Fig. 7. The flowchart of genetic algorithm operation.
proximate cells. The maximum and minimum correction parame-
ters for the weights were chosen as 0.5 and 0.05, respectively. At
the end of the net training, similar wavelengths fall within the same
neuron, i.e., carry the same information. To select the set of wave- ment an automated wavelength selection procedure for use in
lengths, it is assumed that the wavelength closest to each neuron building multivariate calibration models and is a suitable method
centroid is the most representative of all the wavelengths within for selecting wavelengths for PLS, MLR, ANN, etc. It is capable of
the same neuron [76]. calibrating mixtures with almost identical spectra without loss of
Before commencing training, all K-ANN weights were initialized prediction capacity using the spectrophotometric method. Many
by randomization in the range 0.2–0.8 and the iteration number studies have demonstrated the importance of the validation step in
was fixed at 500 epochs in all cases. GA wavelength selection for multiple linear regression in order to
Todeschini et al. [75] developed three K-ANN nets with 10 × 10, avoid random correlation and the selection of irrelevant variables
8 × 8 and 5 × 5 neurons to select the wavelengths. Three sets of 73, [43,46,77,79–81].
52 and 24 wavelengths were selected as representatives of all 151 Genetic algorithm optimization combined with partial least
wavelengths of the full-spectrum. PLS models were constructed for squares regression (GA-PLS) is the most used method in NIR
each component, after data centering, for each set of wavelengths spectroscopic data sets, which combines the advantages of GA
selected by the net. By selecting an adequate set of wavelengths, and PLS. The GA could find optimal values for several disparate
a PLS model with better predictive ability than those constructed variables associated with the calibration model, also the PLS
with the full-spectrum can be obtained. Wavelength selection procedure could be integrated into the objective function driv-
even made it possible to reduce the number of standard samples ing the optimization. The GA-PLS exhibits superiority over other
needed for the resolution of a very highly overlapping chemical applied multivariate methods due to the wavelength selection
system. in the PLS calibration using a genetic algorithm without loss
of prediction capacity, and furthermore, provides useful infor-
4.4.3. Genetic algorithm mation about the chemical system. These results are verified in
The selection of variables for multivariate calibration can be many literatures [82–86]. The flowchart of the GA implementation
considered an optimization problem. Genetic algorithm (GA) is a with PLS is shown in Fig. 7. The GA consists of four basic steps,
popular heuristic optimization technique that employs a proba- where steps (ii)–(iv) are repeated until the termination criterion is
bilistic, non-local search process inspired by Darwin’s theory of reached:
natural selection. Genetic algorithms are currently popular in many
fields and have been successfully applied to frequency selection (i) To allow easy mathematical treatment of a chromosome,
problems, in which GA manipulates binary strings called chro- a coding is necessary. This is solved by representing each
mosomes that contain genes that encode experimental factors or variable/window (gene) with a binary code in a vector (chro-
variables [77,78]. Usually, a genetic algorithm (GA) is used to imple- mosome) with one cell for each variable/window. The original
26 Z. Xiaobo et al. / Analytica Chimica Acta 667 (2010) 14–32

chromosome is then perturbed randomly to make a number of


chromosomes, the initial population.
(ii) For each chromosome, the response associated with the corre-
sponding experimental conditions is evaluated. This is done by
making a PLS model for each chromosome. The model is then
evaluated by cross-validation in order to get a numeric value
describing the quality of the model. This value is known as the
fitness value and is the criterion for guiding the GA to the global
optimum.
(iii) The reproduction step creates a new population that can be
considered as the next generation. The new generation of
chromosomes is made up by recombination of the original
chromosomes. This recombination is made by single-point
crossover, which is based on two parent chromosomes that are
cut in two pieces, each at a randomly chosen point. They are
then crossed and put together again to form two children chro-
mosomes that will replace the parent chromosomes in the new
Fig. 8. Cross-validated prediction errors (RMSECV) for 40 interval models (bars) and
generation. The chromosomes with a high fitness value have a
full-spectrum model (dotted line) versus interval number for 1–10 latent variables
higher probability to reproduce than a chromosome with the in the localized models and three latent variables for the global model, for apples
target to improve the overall fitness of the population. SSC termination. Dotted line is RMSECV (3 LV’s) for global model/italic numbers are
(iv) Mutations are necessary to overcome some problems that may optimal LVs in interval mode.
occur. The most essential problem to be solved is if a variable is
not selected from any of the original chromosomes, as it would
least squares (iPLS), in which the data are subdivided into non-
never be selected in the coming generations if mutations were
overlapping sections that each undergo a separate PLS modeling to
not present. A mutation is simply an inversion of a gene in a
determine the most useful variable range [18,54,55,87–93].
chromosome. The mutation rate, is user defined and often set
The interval selection algorithm has analogies with molecu-
to between 0.001 and 0.01.
lar chemistry: The general features of molecular spectra are of
continuous bands rather than discrete responses. These continu-
The algorithm is repeated until the termination condition is
ous bands are actually composed of many discrete responses that
fulfilled. The termination condition is based on a convergence crite-
are not resolvable using standard laboratory instruments. The dis-
rion, where the algorithm is terminated when a certain percentage
crete transitions, in close wavelength proximity, are due to the
of the chromosomes are identical.
rotational–vibrational transitions from specific molecular features
GAs applied to PLS have been shown to be very efficient opti-
that are associated with particular molecules. The goal of vari-
mization procedures. They have been applied on many spectral data
able selection for multivariate calibration is the prediction of the
sets and are shown to provide better results than full-spectrum
level of specific molecular components and, thus, it is reasonable
approaches [83,85,86]. GAs have the advantage of exploring fairly
to believe a variable selection algorithm operating on molecular
well the space of all possible subsets in a large but reasonable time,
spectra would select regions of the spectrum rather than discrete
much less than that required by the study of all possible subsets.
wavelengths.
Moreover, GAs offer the choice between a number of possible opti-
The adjacent-variable wavelength selection methods are bet-
mal or near-optimal subsets. For more detail, one can see [84].
ter suited for spectroscopic NIR data sets for the same reasons
However, GAs do have significant drawbacks. First, they tend
that these methods initially appeared in the literature, i.e., spec-
to be extremely slow compared to the simple stepwise methods.
troscopic signals can resist strict orthogonal decomposition and,
Secondly, they present a tremendous configuration challenge to the
adjacent spectral data are naturally continuous and, consequently,
user because of the numerous adjustable factors that affect the out-
covariant. These factors become very important when producing
come of GAs. Fitness function, convergence criteria, mutation rate,
long-term predictive models from the complex NIR data.
crossover scheme, the number of chromosomes considered, initial
population, and the number of generations through which the pro-
cess is allowed to proceed, directly influence the result. Therefore, 4.5.1. Interval partial least squares (iPLS)
judicious selection of these parameters is critical. While GAs are By deselecting portions of the entire spectral data set containing
extremely useful for some applications, the speed and configura- only trivial amounts of relevant information, wavelength selection
tion problems limit the circumstances under which they may be has been generally shown to improve the capabilities of PLS mod-
applied successfully and require a considerable level of expertise eling. Interval partial least squares regression (iPLS) is a method
on the part of the user. of graphically oriented approach for local regression modeling of
spectral data, and was developed by Nørgaard et al. [94]. This tech-
4.5. Interval selection method nique can provide an overall picture of the relevant information
in different spectral subdivisions, thereby focusing on important
Two basic strategies for wavelength selection are the selection spectral regions and removing interference from other regions. The
of the most utilitarian wavelength variables possible regardless of sensitivity of the PLS algorithm to noisy variables is highlighted by
location and, the selection of variables which maintain continuity the informative iPLS plots.
in the original variable domain. The former strategy is commonly Interval PLS models are developed on spectral subintervals of
performed using genetic algorithms (GA) or simulated annealing equal width, and the prediction performance of these local mod-
(SA), with other approaches including artificial neural networks els and the global (full-spectrum) model may be compared. The
(ANN), Bayesian latent variable modeling, etc. The latter strategy comparison is mainly based on the validation parameter RMSECV
employs techniques implicitly or explicitly defining “windows” or (root mean squared error of cross-validation), but other parame-
“intervals” of the data, in order to maintain a continuous vari- ters such as r2 (squared correlation coefficient), slope, and offset are
able selection. The most straightforward example is interval partial also evaluated to ensure a comprehensive model overview. Sam-
Z. Xiaobo et al. / Analytica Chimica Acta 667 (2010) 14–32 27

ple and/or measurement abnormalities (outliers) as detected by of intervals, but now PLS models are calculated with each inter-
PLS inner relation plots should generally be removed prior to the val left out in a sequence, i.e., if one chooses 40 intervals then
application of iPLS. each model is based on 39 intervals leaving out one interval at
Models based upon the various intervals (Xinterval ) usually need a time. The first omitted interval gives the poorest performing
a different number of PLS components than do full-spectrum mod- model with respect to RMSECV or RMSEP (root mean square error
els in order to capture the relevant variation in y. This condition of cross-validation/prediction). This procedure is continued until
is caused by the variable amount of y-correlated information car- one interval remains.
ried by the interval variables (the larger the spectral interval, the The forward iPLS algorithm described in this paper is an inverse
greater the number of substances that are likely to absorb/interfere) evolution of BiPLS like the forward regression model. As in the
and, is also related to the noise/interference carried by the vari- interval PLS model, the data set is split into a given number of inter-
ables. However, the selected model dimension has to be common vals, but now the PLS models are built using successively improving
to all the local models in order to make a comparison possible. intervals with respect to RMSECV measure, i.e., if one chooses 40
In order to favor the “best” spectral region, it is natural to let the intervals, then, the first model is based on one interval which has
simplest interval model (i.e., the one with the smallest number of the best performing model the second model with the next interval
PLS components) guide the selection of the model dimension. A and so on.
fair comparison of the global and local models requires that the
global and local model dimensions be selected separately. Fig. 8 4.5.2.3. Synergy interval partial least squares (SiPLS) and genetic algo-
[55] shows the most common result after processing of iPLS. From rithm interval partial least squares (GA-iPLS). Synergy interval PLS
Fig. 8, several interval models surpass the full-spectrum model and (SiPLS) is an all-possible-interval-combinations procedure tests
the number 12 interval model shows the best results. based on all possible PLS on all subsets of intervals and reports
The results from using iPLS are comparable the other effective the subset of sets giving the lowest RMSECV or RMSEP. The compu-
methods tested, but the main advantage of using iPLS is the graph- tation time can be long depending on the number of intervals and
ical output giving an overview of the spectra data and in displaying the selected number of intervals to combine. The procedures are as
interesting spectral areas which could be selected. follows: first, the data set is subdivided into a number of intervals
(variable-wise) and secondly, all possible PLS model combinations
4.5.2. Expansion methods for interval partial least squares of two, three or four intervals are calculated.
In order to selection the more informative regions and to opti- Literature sources are the GAPLS algorithm described by Leardi
mize results, many methods were developed as expanding iPLS. et al. [84,85], the iPLS algorithm described Nørgaard et al. [94], and
some selection methods such as stepwise, synergy and genetic the GA-iPLS algorithm developed by Xiaobo [54]. The GA algorithm
algorithm were used to combine different intervals. These meth- was used to select intervals and the iPLS algorithm was used as a
ods are including backward/forward iPLS (BiPLS/FiPLS), synergy regression model. Fig. 10(c) shows the algorithm.
iPLS (SiPLS) and genetic algorithm iPLS (GAiPLS). Moving window First, the data set is split into N intervals (variable-wise), and PLS
partial least squares regression (MWPLSR) also expands on iPLS models for each interval are calculated with the results presented
by performing repeated PLS regressions within a window moving in a single plot.
across all variables. This thoroughly assesses the potential vari- Secondly, a GA was used to select several wavelength inter-
able range selection for a given size. As an additional refinement, vals as described by Leardi et al. [84]. However, on this occasion,
changeable size moving window partial least squares (CSMWPLS) the selection variables are intervals and PLS model combinations
allows regions selected by MWPLSR to be systematically modified of these selected intervals by iPLS algorithm are described by
in size to optimize results offering a further improvement. Also, Nørgaard et al. [94].
an inversion of the moving window methods allows for the direct The final intervals selection are the PLS model combinations of
elimination of uninformative wavelength intervals. those intervals that give the best performance model with respect
to RMSECV measure.
4.5.2.1. Simple optimization of the best interval from equidistant This algorithm takes advantages of genetic algorithms (GAs)
interval partial least squares. There is a minimal probability for and iPLS. It generally improves the prediction capabilities of PLS
hitting the optimal interval with the equidistant subdivisions. An modeling.
optimal interval might be found by carrying out small adjustments One of the main advantages of this method is the possibility to
in the interval limits. Fig. 9 shows the optimization algorithm gen- represent a local regression model in a graphical display, focus-
erally performed. It consists of the following steps: (i) interval shift; ing on a choice of better intervals and permitting a comparison
(ii) changes in interval width: two-sided (symmetrical), one-sided among interval models and the full-spectrum model. This method is
(asymmetrical, left), or one-sided (asymmetrical, right). Each step intended to give an overview of the data and can be helpful in inter-
is initiated with the optimal interval limits from the previous step. pretation. The software of interval PLS (iPLS) may be downloaded
The interval limits are changed one variable at a time and evalu- from the website of Royal Veterinary and Agricultural University of
ated by the RMSECV provided by application of PLS regression to the Denmark.
interval; this approach works in practice but could be done more
elegantly. 4.5.2.4. Moving window partial least squares (MWPLS), changing
Starting wavelength (SW), ending wavelength (EW) and wave- size moving window partial least squares (CSMWPLS) and search-
length interval (WI) are three spectral parameters that were ing combination moving window partial least squares (SCMWPLS).
optimized to obtain the best results by the optimization method Moving window (MW) wavelength selection [40] is a strategy to
mentioned above. obtain informative spectral regions which produce better predic-
tion results. In changing size moving window algorithm (CSMW),
4.5.2.2. Backward interval partial least squares (BiPLS) and Forward windows having different sizes are scanned over the whole spectral
interval partial least squares (FiPLS). BiPLS and FiPLS [55] are the iPLS range.
algorithm combined with forward and backward selection meth- In Moving window partial least squares regression (MWPLS)
ods. Fig. 10(a) and (b) show these two algorithms. [90], a spectral window commencing at the ith spectral channel
The backward iPLS (BiPLS) algorithm proceeds as follows: as in and teerminating at the (i + H − 1)th spectral channel is built. Here,
the interval PLS model the data set is split into a given number H is the window size. The spectra obtained in the spectral window is
28 Z. Xiaobo et al. / Analytica Chimica Acta 667 (2010) 14–32

Fig. 9. Simple optimization of the interval for iPLS.

a submatrix Xi (N × H) containing the ith to the (i + H − 1)th columns this algorithm [95]. When H = 1, moving the window from the first
of the calibration matrix X. PLS models with varied numbers of PLS to the end point will collect all possible sub-windows with the win-
components can then be built to relate the spectra in the window dow size of 1. Similarly, in other cases of H, all sub-windows with
to the analytes of interests. the size of H may be obtained. Therefore, this algorithm considers
The window is moved through the whole spectra. At each posi- all possible spectral intervals (sub-window or sub-regions) in the
tion, PLS models with varying PLS components are built for the range of the informative region. For every window, a PLS model
calibration of the analytes, and the RMSECV (root mean square error with a selected LVs number is constructed, and root mean square
of cross-validation) or SEC are calculated with these PLS models error of calibration (SEC) is calculated. Comparing values of SEC for
and, plotted as a function of the position of the window. A figure all sub-regions, the sub-region with the smallest value of SEC is
containing such residual lines is plotted, this provides the infor- considered as the optimized spectral interval.
mation about informative regions, where the residue lines show The objective of searching combination moving window par-
low values of RMSECV or SEC. MWPLSR can provide informative tial least squares (SCMWPLS) [96,90] is to search for either the
regions and the approximate latent variable numbers. The informa- optimized combination of informative regions or an optimized
tive regions can construct improved, but not optimized prediction individual informative region. Fig. 11(b) explains this algorithm
PLS models, comparing with the whole spectral region. [95]. First, MWPLSR is performed to locate the informative regions.
Changeable size moving window partial least squares (CSMW- Subsequently, SCMWPLS starts the process from the first infor-
PLS) is a method to optimize an informative region, i.e., to search mative region. This informative region is optimized by changing
for an optimized sub-region in a selected informative region. The the moving window size H from 1 to p. A moving window is
basic idea of CSMWPLS [95], for a given informative region with moved from the first spectral point to the (p − H + 1)th point over
p spectral points, is to change the moving window size w from 1 the informative region and collects all possible sub-windows for
to p. A moving window is moved from the first spectral point to every window size. A PLS model with a reasonable PLS component
the (p − H + 1)th point over the informative region and to collect all selected by cross-validation is built and RMSEC is calculated for
possible sub-windows for every window size. Fig. 11(a) explains every window obtained. The sub-region with the smallest value

Fig. 10. (a–c) Combination of intervals by FiPLS, BiPLS and GAiPLS.


Z. Xiaobo et al. / Analytica Chimica Acta 667 (2010) 14–32 29

Fig. 11. (a and b) Scheme for explanation of CSMWPLS and SCMWPLS.

of RMSEC among all sub-regions is considered as the optimized wavelengths are varied. Thus, by removing step 2, the following
sub-region and named as the base-region. In the next step, all pos- procedure will describe the steps required for CSMW. On the other
sible sub-regions for every window size are found out in the second hand, the main difference between MWPLSR and CSMWPLSR is in
informative region. Then, all these sub-regions are combined to the step 5. In this step, the algorithm is repeated with changing win-
base-region and a PLS model built by calculating its RMSEC. Next, dows intervals. Thus, by removing the steps 2 and 5 from the steps
a new base-region is chosen with the smallest value of RMSEC. of MCSMW, the procedure for MWPLSR may be obtained.
The same procedure as above is repeated until the last informa- There are two points to note. First, researchers can manu-
tive region is reached. Finally, the last base-region is considered as ally select their intervals in the spectra. Manual selection can
the optimized combination. also include the interval selection methods. Secondly, the wave-
Kasemsumran et al. [90] proposed and modified changeable size length selected with this procedure constitutes a set of descriptor
moving window partial least squares (MCSMWPLSR). The major variables, which can eventually be fed to different regression tech-
difference is the exertion of a wavelength interval changing step. niques (such as MLR), and including non-linear methods, such as
The steps for the MCSMW can be written as follows: neural networks, support vector machine (SVM), etc.

(i) Selecting a fixed size wavelength window having width desig- 4.5.3. Interval selection based on other methods
nated by W. Recently, wavelength selection procedures for the multivariate
(ii) Selecting a wavelength interval (WI) between the sensors; i.e., factor based methods of hybrid linear analysis (HLA) [42,97,98] and
WI = 2 means that in the selected window the wavelength interactive variable selection for PLS (IVS-PLS) [82,91,99–101] have
number 1, 3, 5 and so on are considered in the modeling. been discussed in many literatures.
This number indicates the number of wavelengths (NW) in the Wavelength selection by HLA involves the calculation of net ana-
selected window (NW = W/WI). Here, WI was varied between lyte signal regression plots (NASRP) from HLA, combined with a
1 and 10. moving window strategy. The main concept of HLA is to obtain a
(iii) Applying the desired regression method, i.e., PLS on the limited number of factors of a data matrix in which the contribution
absorbance data in the selected window, determining the of the analyte of interest has been removed, and, is therefore based
optimum number of factors and calculating the models per- on net analyte signal (NAS) calculation. The first significant factors
formances by cross-validation. of the HLA data matrix (from which the contribution of a given ana-
(iv) Scanning the selected window with specified WI through lyte has been removed) are used to search for the minimum error
the whole spectral region, by changing starting and ending indicator (EI). HLA uses less factors than the partial least squares
wavelength (SW and EW, respectively), and calculating model (PLS) method, and is simpler to adapt to the NASRP methodology
performances for each sub-region as described in the previous [42].
step. Iterative PLS (iPLS) is a variable selection method that is
(v) Go to step 1 for changing the windows width. designed to start with a small number of variables/windows and
subsequently add new variables/windows to or remove original
In the case of each analyte of interest, for every window, a regres- ones from the data set provided it improves the model. The method
sion model (PLS) with a selected number of factors is made, and used consists of four steps [101]:
root mean square error of cross-validation (RMSECV) is calculated.
Comparing values of RMSECV for all sub-regions, the sub-region (i) The original variables/windows are selected randomly.
with the smallest value of RMSECV is considered as the optimized (ii) An ordinary PLS calculation, using the selected wavelengths, is
spectral interval. made and the model is evaluated using cross-validation.
Referring to SCMWPLS reveals that the above steps, except step (iii) The variable/window to be added or withdrawn from the
2, are also used in CSMW. Therefore, the main difference between model is chosen randomly and a new PLS model is built and
CSMW and MCSMW is in step 2, where the intervals between the evaluated by means of cross-validation.
30 Z. Xiaobo et al. / Analytica Chimica Acta 667 (2010) 14–32

Table 1
Free processing toolbox for NIR spectroscopy.

Toolbox name Method Address and Website

ChemoAC Toolbox PLS, PCR, MLR, ANN, UVE http://minf.vub.ac.be/∼fabi/

iToolbox iPLS, BiPLS, FiPLS, SiPLS, mwPLS http://www.models.kvl.dk/source/iToolbox/index.asp

GAPLS Toolbox Genetic algorithm-PLS http://www.models.kvl.dk/source/GAPLS/index.asp

PLS Toolbox MLR, PLS, PCR and many pre-processing methods Eigenvector Research, Inc.
3905 West Eaglerock Drive
Wenatchee, WA 98801
www.eigenvector.com

SPA-Toolbox SPA http://www.ele.ita.br/∼kawakami/spa/

LS-SVMlab Support vector machine (SVM) http://www.esat.kuleuven.ac.be/sista/lssvmlab/

General SA algorithm Simulated annealing (SA) http://www.mathworks.com/matlabcentral/fileexchange/10548

(iv) If the new cross-validation value (root mean square error of ages cited in Table 1 are free available software and the authors
cross-validation, RMSECV) is lower than the original, the new would like to express their gratitude to the developers.
set of variables replaces the original. If the new cross-validation
value is higher than the original, the original set of variables is
retained. 6. Summary

Although NIR instrumentation produces large volumes of data,


The algorithm is terminated when every variable/window has
it often, as we have described, requires careful and sophisticated
been tested once without offering any improvements.
processing in order to extract information. Chemometrics meth-
ods have been found to be very useful for extracting information
4.6. Other wavelength selection methods from NIR spectra and there is great interest for using the NIR
technology for measurements of phenomena of different analytes
In 2007, Rossi et al. [102] presented a fast selection of NIR [25,27,47,59]. However, there is still a great need for new meth-
spectral variables with B-spline compression. This implemented ods that can handle data from these modern types of instruments.
a forward–backward procedure as applied to the coefficients of a Therefore, it is important to appraise the current methods in order
B-spline representation of the spectra. The criterion used in the to identify areas where improvements may be made. Applica-
forward–backward procedure uses the mutual information to find tion of the mentioned methods to multi-component spectroscopic
nonlinear dependencies between variables, as opposed to the gen- analysis usually requires spectral variable selection for building
erally used correlation. Each spectrum is described by a reduced well-fitted models. Training the multivariate calibration methods
set of new variables each one related to a range of frequencies. with selected spectral variables, rather than full-spectrum region,
The set of new variables being much smaller than the original set allows the informative part of the spectrum to be modeled. This
of spectral variables resulting with a significantly reduced com- is related to the variation of concentration of the analyte, con-
putation load; this renders the selection procedure feasible even sequently, other parts of the spectrum which are related to the
when the spectra contain a thousand or more spectral variables. variation of concentration of other analytes and/or background
Due to the localization properties of the B-splines, the new vari- variations will be discarded. Hence, the performance of multivari-
ables remain interpretable as they correspond to sub-ranges of the ate calibration models will be enhanced. This paper thus focuses on
original wavelength interval. the so-called wavelength selection: the wavelengths are selected
The spline representation allows interpretation of the results as prior the use of any prediction model.
groups of consecutive spectral variables will be selected. Moreover, Wavelength selection is an old and yet ever-growing research
B-spline compression allows us to reduce both the feature selection field in chemometrics, the literature on variable selection
running time and, to increase the quality of the prediction results is very large. it is supported by both practical experiments
compared with the same nonlinear procedure applied directly to [49,58,59,69,105,106] and theoretical research and indicates that
the original spectral variables. wavelength selection is necessary for multivariate spectroscopic
Wu et al. used Kalman filtering [34], Fisher’s weights [33,35,103] calibration [15,59]. There have been many studies devoted to
and Bayesian [104] as a feature selection method. Kalman filtering, this problem; for a comprehensive review one can see refer
Fisher’s weights and Bayesian are the classical statistic methods. [25–27,47,60,107].
Detailed explanations and applications of these methods in NIR Several approaches have been discussed in this article for selec-
spectral selection were given in the literatures mentioned above. tion of optimal set of spectral wavelength variables or regions
These methods successfully selected the wavelengths which lead for multivariate calibration such as multi-linear regression (MLR),
to model giving very good results. generalized simulating annealing (SA), genetic algorithms (GA),
artificial noise introduction in PLS modeling (UVE-PLS), wavelet
5. Software of wavelength selection methods transform (WT), successive projections algorithm (SPA), interval
selection strategy (iPLS), etc.
In Table 1, we list the software packages (mainly in Matlab) used These algorithms work in different ways and they have been
in the non-commercial studies. The software package from NIR developed for different applications. It is therefore difficult to get
spectroscopy company and combined with the NIR spectra equip- an overview of which algorithm is best suited for a particular type
ment, such as OPUS software version 4.2 also from Bruker Gmbh of data. It requires the user to select them in practice.
(Bremen, Germany), OMNIC series from Thermo Nicolet company, There are some difficulties in the selection methods. First, the
and the software of TQ Analyst, will not be discussed here. The pack- direct selection of variables by mutual information (MI), that is
Z. Xiaobo et al. / Analytica Chimica Acta 667 (2010) 14–32 31

the information based on both spectral data and sample proper- [8] J.D. Caplan, S. Waxman, R.W. Nesto, J.E. Muller, Journal of the American College
ties data, suffer from the following drawbacks. (i) The MI estimation of Cardiology 47 (2006) C92–C96.
[9] A. Sakudo, Y. Suganuma, T. Kobayashi, T. Onodera, K. Ikuta, Biochemical and
becomes difficult as the number of selected variables grows. Indeed Biophysical Research Communications 341 (2006) 279–284.
in a forward procedure, the estimation is faced with the problem [10] C. Connolly, Sensor Review 25 (2005) 192–194.
of dimensionality, and making the estimation of the MI with the [11] G.P. Moreda, J. Ortiz-Cañavate, F.J. García-Ramos, M. Ruiz-Altisent, Journal of
Food Engineering 92 (2009) 119–136.
last selected feature much more difficult than with the first. (ii) [12] C.E. Miller, in: P. Williams, K. Norris (Eds.), Near-Infrared Technology in the
The low number of spectra usually available for learning makes the Agricultural and Food Industries, American Society of Cereal Chemists, St.
results of the selection highly dependent on the data set: a small Paul, Minnesota, 2001, pp. 19–37.
[13] R. Karoui, J. De Baerdemaeker, Food Chemistry 102 (2007) 621–640.
change in the data can lead to different variable selection sets, mak- [14] S. Landau, T. Glasser, L. Dvash, Small Ruminant Research 61 (2006) 1–11.
ing interpretation difficult. (iii) Even though the estimation of the [15] N. Boaz, R.C. Ronald, Journal of Chemometrics 19 (2005) 107–118.
mutual information is less demanding in terms of computation time [16] H. Namkung, Y. Lee, H. Chung, Analytica Chimica Acta 606 (2008) 50–56.
[17] R.C. Schneider, K.-A. Kovar, Forensic Science International 134 (2003)
than the construction of a nonlinear model, the large number of
187–195.
initial variables results in high computation times for the selec- [18] C.B. Zachariassen, J. Larsen, F. van den Berg, S.B. Engelsen, Chemometrics and
tion. Secondly, search-based methods provide a promising way Intelligent Laboratory Systems 76 (2005) 149–161.
to extend state-of-the-art spectral analysis to nonlinear method- [19] T.M. Baye, T.C. Pearson, A.M. Settles, Journal of Cereal Science 43 (2006)
236–243.
ologies; genetic algorithms (GA) offer an interesting, flexible and [20] D.D. Archibald, D.E. Akin, Vibrational Spectroscopy 23 (2000) 169–180.
widely used wavelength variable selection. However, a problem [21] L.O. Rodrigues, J.L. Marques, J.P. Cardoso, J.C. Menezes, Chemometrics and
inherent to all search-based methods is a tendency to yield wave- Intelligent Laboratory Systems 75 (2005) 101–108.
[22] K. Krämer, S. Ebel, Analytica Chimica Acta 420 (2000) 155–161.
length selection instabilities relative to sample data additions or [23] H. Sato, M. Kiguchi, F. Kawaguchi, A. Maki, NeuroImage 21 (2004) 1554–1562.
subtractions, which is due to the susceptibility of the region selec- [24] M. Casale, M.-J. Sáiz Abajo, J.-M. González-Sáiz, C. Pizarro, M. Forina, Analytica
tion to random noise. Thirdly, interval PLS (iPLS), including moving Chimica Acta 557 (2006) 360–366.
[25] D.-L. Massart, B.G.M. Vandeginste, S.N. Deming, Y. Michotte, L. Kaufman,
windows strategy (MWS) are very good at locating wavelength Chemometrics: A Textbook, Elsevier, Amsterdam, 1988.
regions of the main component contributions. However, the con- [26] T. Naes, T. Isaksson, T. Fearn, T. Davis, A User-Friendly Guide to Multivariate
straints placed on the interval width and number avoid the need Calibration and Classification, NIR Publications, Christerer, UK, 2002.
[27] J.J. Workman Jr., in: D.A. Burns, E.W. Ciurczak (Eds.), Handbook of Near-
for testing large numbers of combinations, while still providing an Infrared Analysis, Marcel Dekker, Inc., New York, 1992, pp. 274–276.
exhaustive search pattern. Perhaps, an alternative approach that [28] M. Forina, S. Lanteri, M. Casale, M.C. Cerrato Oliveros, Chemometrics and
avoids the potential wavelength selection instability pitfall and, Intelligent Laboratory Systems 87 (2007) 252–261.
[29] B.L. Becker, D.P. Lusch, J. Qi, Remote Sensing of Environment 108 (2007)
provides a simpler graphic representation amenable to spectro-
111–120.
scopic interpretation is worth exploring. [30] U.G. Indahl, N.S. Sahni, B. Kirkhus, T. Næs, Chemometrics and Intelligent Lab-
Variable selection techniques consist of selecting particular vari- oratory Systems 49 (1999) 19–31.
ables related to the response. Generally, variable selection aims to [31] P.J. de Groot, G.J. Postma, W.J. Melssen, L.M.C. Buydens, Analytica Chimica
Acta 392 (1999) 67–75.
identify a subset of wavelengths that produces the smallest pos- [32] Q. Guo, W. Wu, D.-L. Massart, Analytica Chimica Acta 382 (1999) 87–103.
sible error. The benefits of variable selection are twofold. Many [33] W. Wu, Q. Guo, D. Jouan-Rimbaud, D.-L. Massart, Chemometrics and Intelli-
literatures have shown that PLSR and PCR methods perform better gent Laboratory Systems 45 (1999) 39–53.
[34] W. Wu, S.C. Rutan, A. Baldovin, D.-L. Massart, Analytica Chimica Acta 335
when wavelength selection is applied. However, this is not always (1996) 11–22.
the case because, when selecting the most correlated wavelengths, [35] W. Wu, D.-L. Massart, Chemometrics and Intelligent Laboratory Systems 35
one might eliminate those that correct for the influence of inter- (1996) 127–135.
[36] W. Wu, B. Walczak, D.-L. Massart, S. Heuerding, F. Erni, I.R. Last, K.A. Prebble,
fering compounds or factors. Indeed, a variable that is completely Chemometrics and Intelligent Laboratory Systems 33 (1996) 35–46.
useless by itself can provide a significant improvement in per- [37] W. Wu, Y. Mallet, B. Walczak, W. Penninckx, D.-L. Massart, S. Heuerding, F.
formance when taken in combination with others. Nevertheless, Erni, Analytica Chimica Acta 329 (1996) 257–265.
[38] M. Zeaiter, J.M. Roger, V. Bellon-Maurel, Trends in Analytical Chemistry 24
variable selection provides faster, more cost-effective predictors. (2005) 437–445.
[39] R.H. William, in: P. Williams, K. Norris (Eds.), Near-Infrared Technology in
the Agricultural and Food Industries, American Society of Cereal Chemists, St.
Acknowledgements Paul, Minnesota, 2001, pp. 39–58.
[40] B. Hemmateenejad, M. Akhond, F. Samari, Spectrochimica Acta Part A: Molec-
ular and Biomolecular Spectroscopy 67 (2007) 958–965.
The authors gratefully acknowledge the financial support pro-
[41] M. Khanmohammadi, M.A. Karimi, K. Ghasemi, M. Jabbari, A.B. Garmarudi,
vided by the foundations of NSFC (Grant no. 6091079), Chinese 863 Talanta 72 (2007) 620–625.
Program (Grant nos. 2008AA10Z208, 2008AA10Z204), the Postdoc- [42] B. Hemmateenejad, R. Ghavami, R. Miri, M. Shamsipur, Talanta 68 (2006)
1222–1229.
toral Foundation of China (20070411024, 0601003C) and the talent
[43] R.K.H. Galvão, M. Fernanda Pimentel, M.C.U. Araújo, T. Yoneyama, V. Visani,
foundation of Jiangsu University. Dr. Zou Xiabo thanks Dr. Jianshe Analytica Chimica Acta 443 (2001) 107–115.
Chen (University of Leeds) for advice and encouragement, and to [44] G.A. Bakken, T.P. Houghton, J.H. Kalivas, Chemometrics and Intelligent Labo-
the many researchers whom have offered the stimulating works in ratory Systems 45 (1999) 225–239.
[45] W. Wu, R. Manne, Chemometrics and Intelligent Laboratory Systems 51
this field. (2000) 145–161.
Glossary

MVC: multivariate calibration
LMVC: linear multivariate calibration
PLS: partial least squares
MLR: multiple linear regression
PCR: principal components regression
SEC: standard error of calibration
RMSECV: root mean square error of cross-validation
r: correlation coefficient
SEP: standard error of prediction
LOOCV: leave-one-out cross-validation
SPA: successive projections algorithm
UVE: uninformative variable elimination
SA: simulated annealing
ANN: artificial neural networks
GA: genetic algorithm
iPLS: interval partial least squares
BP-ANN: back-propagation artificial neural networks
K-ANN: Kohonen artificial neural network
BiPLS: backward iPLS
FiPLS: forward iPLS
SiPLS: synergy iPLS
GAiPLS: genetic algorithm iPLS
MWPLSR: moving window partial least squares regression
CSMWPLS: changeable size moving window partial least squares
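For reference, the standard chemometric definitions of two of the error measures listed above are reproduced here (a convenience sketch of the usual formulas, not text from the original article). With n calibration samples, \hat{y}_{(i)} denoting the prediction of sample i by a model built with that sample left out, and an independent prediction set of n_p samples:

\mathrm{RMSECV} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_{(i)}\right)^2}

\mathrm{SEP} = \sqrt{\frac{1}{n_p - 1}\sum_{i=1}^{n_p}\left(y_i - \hat{y}_i - \mathrm{bias}\right)^2}, \qquad \mathrm{bias} = \frac{1}{n_p}\sum_{i=1}^{n_p}\left(y_i - \hat{y}_i\right)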