
No. 2014-013

RESPONSE SURFACE METHODOLOGY

By

Jack P.C. Kleijnen

14 February, 2014

ISSN 0924-7815
ISSN 2213-9532


Response Surface Methodology


Jack P.C. Kleijnen
Tilburg University (e-mail: Kleijnen@TilburgUniversity.edu)


Abstract This chapter first summarizes Response Surface Methodology (RSM), which started with Box and Wilson's article in 1951 on RSM for real, non-simulated systems. RSM is a stepwise heuristic that uses first-order polynomials to approximate the response surface locally. An estimated polynomial metamodel gives an estimated local gradient, which RSM uses in steepest ascent (or descent) to decide on the next local experiment. When RSM approaches the optimum, the latest first-order polynomial is replaced by a second-order polynomial. The fitted second-order polynomial enables the estimation of the optimum. Furthermore, this chapter focuses on simulated systems, which may violate the assumptions of constant variance and independence. The chapter also summarizes a variant of RSM that is proven to converge to the true optimum, under specific conditions. The chapter presents an adapted steepest ascent that is scale-independent. Moreover, the chapter generalizes RSM to multiple random responses, selecting one response as the goal variable and the other responses as the constrained variables. This generalized RSM is combined with mathematical programming to estimate a better search direction than the steepest ascent direction. To test whether the estimated solution is indeed optimal, bootstrapping may be used. Finally, the chapter discusses robust optimization of the decision variables, while accounting for uncertainties in the environmental variables.

Keywords: simulation; optimization; regression; robustness; risk

JEL: C0, C1, C9

1 Introduction

RSM is one of the more popular methods in simulation optimization. Originally, RSM was developed for the optimization of real (physical) systems; see the classic article by Box and Wilson (1951), the overview of RSM during the period 1966-1988 by Myers et al. (1989), the recent third edition of the popular handbook by Myers et al. (2009), and recent RSM software on the Web. RSM in simulation was discussed in the monograph by Kleijnen (1975). One of the first case studies on RSM in simulation is Van den Bogaard and Kleijnen (1977), who simulated a computer center with two servers and three priority classes (namely, small, medium, and large jobs), for which they estimated the 90% quantiles of the waiting times per class, for different class limits, and applied RSM to find the optimal class limits. RSM in simulation is also discussed by Barton and Meckesheimer (2006), Bartz-Beielstein (2006), Law (2014), and Rosen et al. (2008). Google gave more than two million results for the term "Response Surface Methodology" on February 4, 2014. Unfortunately, RSM (unlike some other search heuristics) has not yet been implemented as an add-on to commercial simulation software.
RSM treats the simulation model as a black box; i.e., RSM observes the input/output (I/O) of the simulation model, but not the internal variables and specific functions implied by the simulation's computer modules. RSM is a sequential heuristic; i.e., it uses a sequence of local experiments that may lead to the optimum input combination. Note that an input combination is also called a point or a scenario. RSM uses design of experiments (DOE) and the concomitant linear regression analysis. Though RSM is only a heuristic, it has gained a good track record, as we shall indicate in the next sections. Moreover, practitioners may not be interested in convergence proofs, because realistic experiments take so much time that large sample sizes are impossible; practitioners may be more interested in finding better solutions than the current one (a French expression claims that "the best is the enemy of the better").

More specifically, RSM uses a sequence of local metamodels (approximations) that are first-order polynomials in the inputs; once the optimum seems close, RSM augments the latest first-order polynomial to a second-order polynomial. Notice that terminology may be misleading, because many authors use the term "response surface" instead of "metamodel"; an example is Rikards and Auzins (2002). Note that RSM may be combined with white-box methods that give estimated gradients, which may improve RSM; see Qu and Fu (2012).
We focus on discrete-event simulation, though RSM has also been applied to deterministic simulation and the simulation of sets of random nonlinear difference equations. By definition, random simulation (including discrete-event simulation) uses pseudorandom numbers (PRN). We do not discuss deterministic simulation with numeric noise.

Furthermore, we assume that the inputs are continuous variables. We focus on simulation that has a mean (expected value) as "the" response (output). Nevertheless, we have also applied RSM to outputs that are quantiles; see again Van den Bogaard and Kleijnen (1977). The response may also be a probability (e.g., the overflow probability in a queueing system), which we may formulate as the expected value of a binary variable, using the indicator function.
The optimum solution for the decision variables may turn out to be inferior when ignoring uncertainties in the non-controllable environmental variables; i.e., these uncertainties create a risk. Besides "robust optimization" originated by Taguchi for the design of products, Ben-Tal developed robust optimization in mathematical programming with uncertain coefficients; see Taguchi (1987) and the update in Myers et al. (2009, pp. 483-555), and Ben-Tal and Nemirovski (1998) and the update in Yanikoglu et al. (2013). We shall discuss both approaches.

We assume that RSM starts after the important factors (inputs) and their experimental area have been identified; i.e., before RSM starts we may need to use factor screening to identify the really important factors among the many conceivably important factors. A recent survey of various screening methods is Shi et al. (2014). However, Chang et al. (2014) combine RSM with screening in a single method (called STRONG-S); we point out that RSM without a preceding screening phase may imply the simulation of extremely many combinations of simulation inputs, as we shall see in Section 2.
The remainder of this chapter is organized as follows. Section 2 summarizes classic RSM, developed for real-life experimentation. Section 3 adapts this RSM to the needs of simulation. Section 4 presents the adapted steepest descent (ASD) search direction, developed by Kleijnen et al. (2004). Section 5 summarizes GRSM for simulation with multiple responses, developed by Angün et al. (2009). Section 6 summarizes a procedure for testing whether an estimated optimum is truly optimal, using the Karush-Kuhn-Tucker (KKT) conditions, developed by Bettonvil et al. (2009). Section 7 discusses robust optimization. Section 8 presents conclusions. The chapter ends with many references, enabling further study of details. This chapter updates and extends Kleijnen (2008). Paragraphs starting with "Note:" may be skipped in a first reading.

2 RSM: Basics

As we have already said, the basics of RSM were developed in real-life experiments for the simplest type of optimization problems; namely, minimize the expected value of a single output, with continuous inputs and without any constraints:

$$\min_{\mathbf{z}} E(w_0 \mid \mathbf{z}) \qquad (1)$$

where $E(w_0 \mid \mathbf{z})$ is the goal or objective output, which is to be minimized through the choice of the input combinations $\mathbf{z} = (z_1, \ldots, z_k)'$, where $z_j$ ($j = 1, \ldots, k$) denotes the $j$th "original" input; i.e., the inputs are not standardized such that they lie between $-1$ and $+1$.
Note: We use $z$ for the original input and $x$ for the standardized input; RSM is scale dependent, as we shall see. For the output we use $w$, which may be a mnemonic referring to "waiting time", which is an output of many simulation models.
RSM has the following characteristics, which we shall detail next.
(i) RSM is an optimization heuristic that tries to estimate the input combination that minimizes a given goal function; see again (1). Because RSM is a heuristic, success is not guaranteed.
(ii) RSM is a stepwise (multi-stage) method; see the steps below.
(iii) In each step, RSM fits a local first-order polynomial regression (meta)model, except for the last step, in which RSM fits a second-order polynomial.
(iv) To fit (estimate, calibrate) these first-order polynomials, RSM uses I/O data obtained through so-called resolution-III (R-III) designs; for the second-order polynomial, RSM uses a central composite design (CCD); we shall define these R-III designs and CCD later in this section.
(v) Each step, except the last one, selects the direction for changing the inputs through the gradient implied by the first-order polynomial fitted in that step. This gradient is used in the mathematical (not statistical) technique of steepest descent, or steepest ascent in case the output is to be maximized.
(vi) In the final step, RSM takes the derivatives of the locally fitted second-order polynomial to estimate the optimum input combination. RSM may also apply the mathematical technique of canonical analysis to this polynomial, to examine the shape of the optimal subregion: does that region have a unique minimum, a saddle point, or a ridge with stationary points?
More specifically, RSM consists of the following steps; also see Figure 1 in Section 5, which gives an example with a single goal function and two constrained random outputs; these constraints vanish in classic RSM.

RSM must be initialized by the analysts, who select a starting point. They may select the input combination currently used in practice if the system already exists. Otherwise, they should use intuition and prior knowledge, as in many other heuristics.

For the neighborhood of this starting point, RSM explores the I/O behavior of the observed system. RSM approximates this behavior through a local first-order polynomial, as Taylor's series expansion suggests, augmented with additive white noise $e$:

$$y = \beta_0 + \sum_{j=1}^{k} \beta_j z_j + e. \qquad (2)$$

We denote the regression parameters by $\boldsymbol{\beta} = (\beta_0, \beta_1, \ldots, \beta_k)'$, with the intercept $\beta_0$ and the $k$ first-order or "main" effects $\beta_j$ ($j = 1, \ldots, k$). White noise means that $e$ is normally, independently, and identically distributed (NIID) with zero mean and a constant variance (say) $\sigma^2$ in the local experimental area: $e \sim NIID(0, \sigma^2)$. When the next step moves to a new local area, the variance may change. Obviously, (2) is a linear regression model with white noise, so least squares (LS) gives the best linear unbiased estimator (BLUE) of $\boldsymbol{\beta}$:

$$\hat{\boldsymbol{\beta}} = (\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\mathbf{w} \qquad (3)$$

where $\mathbf{Z}$ denotes the $N \times (k+1)$ matrix that is determined by the R-III design and the $m_i$ replications of point $i$, and $\mathbf{w}$ denotes the vector with the $N$ outputs; obviously, $N = \sum_{i=1}^{n} m_i$, so $\mathbf{Z}$ has $m_i$ identical rows, where the first element of each row is 1 and corresponds with the intercept $\beta_0$.

Combi. i    1    2    3    4 = 1.2    5 = 1.3    6 = 2.3    7 = 1.2.3
   1        -    -    -       +          +          +           -
   2        +    -    -       -          -          +           +
   3        -    +    -       -          +          -           +
   4        +    +    -       +          -          -           -
   5        -    -    +       +          -          -           +
   6        +    -    +       -          +          -           -
   7        -    +    +       -          -          +           -
   8        +    +    +       +          +          +           +

Table 1  A fractional factorial design for seven factors
To select the $n$ input combinations needed to estimate these $k+1$ effects, RSM uses a R-III design with $n \geq k + 1$. In classic R-III designs this $n$ is a multiple of four; e.g., if $k = 7$, then Table 1 gives a $2^{7-4}$ fractional factorial design in the standardized inputs $x_j$, so - means $-1$ and + means $+1$; the last column has the so-called generator $7 = 1.2.3$, which means $x_{i;7} = x_{i;1} x_{i;2} x_{i;3}$; the generators in the columns 4, 5, and 6 are defined analogously. The design is "balanced"; i.e., each column has $n/2$ values equal to $-1$. Moreover, all pairs of columns are orthogonal. If $n$ is not a power of 2, then the R-III design is called a Plackett-Burman design, and is tabulated in many publications and provided by DOE software; e.g., Kleijnen (2014) gives the design for $k = 11$ inputs and $n = 12$ combinations.
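To make this design construction concrete, the following minimal sketch (in Python; not part of the original chapter) builds the $2^{7-4}$ design of Table 1 from its generators and verifies balance and orthogonality of the standardized columns; the row order may differ from Table 1.

import numpy as np
from itertools import product

# Full 2^3 design in the standardized base factors x1, x2, x3 (coded -1/+1).
base = np.array(list(product([-1, 1], repeat=3)))
x1, x2, x3 = base[:, 0], base[:, 1], base[:, 2]

# Generators of Table 1: x4 = x1*x2, x5 = x1*x3, x6 = x2*x3, x7 = x1*x2*x3.
design = np.column_stack([x1, x2, x3, x1 * x2, x1 * x3, x2 * x3, x1 * x2 * x3])

print(design)              # n = 8 combinations for k = 7 standardized inputs
print(design.sum(axis=0))  # balance: each column sums to zero
print(design.T @ design)   # orthogonality: all off-diagonal elements are zero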
Unfortunately, there are no general guidelines for determining the appropriate size of the local area in each step; again, intuition and prior knowledge are important. Chang et al. (2013), however, decide on the size of the local area using a so-called trust region; also see Section 3.

To decide on the next subarea, RSM follows the steepest descent path; e.g., if the estimated first-order effects are such that $\hat{\beta}_1 \gg \hat{\beta}_2 > 0$, then RSM decreases input $z_1$ much more than input $z_2$. Unfortunately, the steepest-descent method is scale dependent; i.e., linear transformations of the inputs affect the search direction; see Myers et al. (2009). In Section 4 we shall present a scale-independent variant.

The steepest descent technique does not quantify the step size along its path. The analysts may therefore try some intuitively selected value for the step size. If that value yields an output that is significantly higher instead of lower, then they may reduce the step size. Otherwise, they take one more step in the current steepest descent direction. In Section 5 we shall present a more sophisticated mathematical procedure for selecting the step size.
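A minimal sketch of one such RSM step may clarify the mechanics: fit the first-order polynomial (2) by LS as in (3), and move in the steepest descent direction with an intuitively chosen step size. The function simulate below is a hypothetical stand-in for the black-box simulation model; it is not part of the original chapter.

import numpy as np

rng = np.random.default_rng(1)

def simulate(z):
    # Hypothetical noisy output to be minimized; stands in for the simulation model.
    return (z[0] - 3.0) ** 2 + 2.0 * (z[1] - 1.0) ** 2 + rng.normal(scale=0.1)

def fit_first_order(Z, w):
    # LS estimate of (beta_0, beta_1, ..., beta_k); see equation (3).
    X = np.column_stack([np.ones(len(Z)), Z])
    beta = np.linalg.lstsq(X, w, rcond=None)[0]
    return beta[0], beta[1:]          # intercept, estimated local gradient

center, half_width = np.array([0.0, 0.0]), 0.5
local = center + half_width * np.array([[-1, -1], [1, -1], [-1, 1], [1, 1]])
w = np.array([simulate(z) for z in local])

intercept, grad = fit_first_order(local, w)
step = 1.0                            # intuitively selected step size
new_center = center - step * grad / np.linalg.norm(grad)   # steepest descent move
print(grad, new_center)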


After a number of steps along the steepest descent path, the output will increase instead of decrease: the first-order polynomial is only a local approximation of the true I/O function. When such deterioration occurs, RSM observes the $n > k$ combinations specified by a R-III design centered around the best point found so far; i.e., RSM may use the same design as in step 2 (see Table 1), but translate the standardized inputs $x_j$ into different values for the original inputs $z_j$. The best combination found so far may be one of the corner points of the design; see again Figure 1. Next, RSM estimates the first-order effects in the new local polynomial approximation, using LS; see (3). And so the search continues.

It is intuitively clear that the plane implied by the most recent local first-order polynomial cannot adequately represent a hill top when searching to maximize a function or, equivalently, minimize (1). So in the neighborhood of the optimum, a first-order polynomial shows serious lack of fit. A popular and simple diagnostic measure is the coefficient of determination $R^2$. A related diagnostic uses an F statistic to test whether all estimated first-order effects, and hence the gradient, are zero. These diagnostics are given by all modern regression software packages. Instead of these diagnostics, RSM might use "cross-validation". More details, including references, are given by Kleijnen (2014).
If the most recently fitted first-order polynomial turns out to be inadequate, then RSM fits a second-order polynomial:

$$y = \beta_0 + \sum_{j=1}^{k} \beta_j z_j + \sum_{j=1}^{k} \sum_{j'=j}^{k} \beta_{j;j'} z_j z_{j'} + e. \qquad (4)$$

To estimate the $q = 1 + 2k + k(k-1)/2$ coefficients of this polynomial, RSM uses a CCD with $n > q$ combinations. More precisely, a CCD starts with a resolution-V (R-V) design, which by definition gives unbiased estimators of the intercept, the $k$ first-order effects, and the $k(k-1)/2$ cross-products or two-factor interactions in (4). For example, if $k = 7$, then the $2^{7-4}$ design in Table 1 is replaced by a $2^{7-1}$ design with the generator $7 = 1.2.3.4.5.6$, so 64 combinations are used to estimate 29 effects. A CCD augments this R-V design such that the purely quadratic effects $\beta_{j;j}$ can also be estimated. More specifically, a CCD adds the central point and $2k$ axial points that form a star design, where, for the standardized factors, the central point is $(0, \ldots, 0)'$ and the "positive" axial point for factor $j$ is $x_j = c$ with all other $(k-1)$ factors fixed at the center, so $x_{j'} = 0$ with $j' = 1, \ldots, k$ and $j' \neq j$; the "negative" axial point for factor $j$ is $x_j = -c$ and $x_{j'} = 0$. Kleijnen (2014) gives a detailed discussion of CCDs, including more efficient so-called "saturated" designs; by definition, the latter designs have $n = q$. Using this fitted second-order polynomial, RSM estimates the optimal values of the inputs by straightforward differentiation or by more sophisticated canonical analysis; see Myers et al. (2009).
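The final RSM step can be sketched as follows (a hedged illustration for $k = 2$, with a hypothetical simulate function standing in for the simulation model): build a CCD, fit the second-order polynomial (4) by LS, and estimate the optimum by differentiation, i.e., as the stationary point $-\frac{1}{2}\mathbf{B}^{-1}\mathbf{b}$ of the fitted surface, where $\mathbf{b}$ collects the fitted first-order effects and $\mathbf{B}$ is the symmetric matrix of fitted second-order effects (symbols defined only for this sketch).

import numpy as np

rng = np.random.default_rng(2)

def simulate(z):
    # Hypothetical noisy output to be minimized.
    return (z[0] - 3.0) ** 2 + 2.0 * (z[1] - 1.0) ** 2 + rng.normal(scale=0.1)

# CCD in standardized inputs: 2^2 factorial + centre point + 2k axial points.
c = np.sqrt(2.0)                                   # a common choice for the axial distance
factorial = np.array([[-1, -1], [1, -1], [-1, 1], [1, 1]], dtype=float)
axial = np.array([[c, 0], [-c, 0], [0, c], [0, -c]])
ccd = np.vstack([factorial, [[0.0, 0.0]], axial])

center, half_width = np.array([2.5, 0.5]), 1.0     # current local area (original inputs)
Z = center + half_width * ccd
w = np.array([simulate(z) for z in Z])

# Regression matrix for y = b0 + b1 z1 + b2 z2 + b11 z1^2 + b22 z2^2 + b12 z1 z2.
X = np.column_stack([np.ones(len(Z)), Z[:, 0], Z[:, 1],
                     Z[:, 0] ** 2, Z[:, 1] ** 2, Z[:, 0] * Z[:, 1]])
b0, b1, b2, b11, b22, b12 = np.linalg.lstsq(X, w, rcond=None)[0]

b = np.array([b1, b2])
B = np.array([[b11, b12 / 2.0], [b12 / 2.0, b22]])
z_opt = -0.5 * np.linalg.solve(B, b)               # stationary point of the fitted surface
print(z_opt)                                       # close to the true optimum (3, 1)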


If time permits, then RSM may try to escape from a possible local minimum and restart the search from a different initial local area, which brings RSM back to its initial step.

We recommend not to eliminate inputs that have non-significant effects in a first-order polynomial fitted within a local experimental area: these inputs may have significant effects in a next experimental area. A related issue is the determination of the number of replications. Indeed, determining whether the signal-to-noise ratio is big enough is a moot issue in metamodeling. Recently, this issue is examined in Kriging metamodeling by Ankenman et al. (2010). For the time being, we recommend estimating the true mean response for a given input combination such that a relative precision of (say) 10% has a 90% probability, using the method detailed in Law (2014).
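A minimal sketch of such a sequential rule (our reading of the relative-precision procedure described in Law 2014; details may differ) keeps adding replications until the half-width of the 90% confidence interval is at most 10% of the estimated mean, using the corrected relative precision $\gamma/(1+\gamma)$; simulate_once is a hypothetical stand-in for one replication.

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def simulate_once():
    # Hypothetical single replication of the simulation output.
    return rng.normal(loc=10.0, scale=2.0)

def replicate_until_precise(gamma=0.10, alpha=0.10, m0=10, m_max=10_000):
    w = [simulate_once() for _ in range(m0)]
    while len(w) < m_max:
        m, mean, sd = len(w), np.mean(w), np.std(w, ddof=1)
        half_width = stats.t.ppf(1 - alpha / 2, m - 1) * sd / np.sqrt(m)
        if half_width / abs(mean) <= gamma / (1 + gamma):
            return mean, half_width, m
        w.append(simulate_once())
    return np.mean(w), half_width, len(w)

print(replicate_until_precise())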
The Taylor series argument suggests that a higher-order polynomial is more accurate than a lower-order polynomial. A statistical counterargument, however, is that overfitting gives less accurate estimators of the polynomial coefficients. Consequently, the higher-order polynomial may give a predictor $\hat{y}$ with lower bias but higher variance, such that its mean squared error (MSE) increases. Moreover, a higher-order polynomial requires the simulation of more input combinations.

3 RSM in simulation

We consider the following two characteristics of simulation experiments:
(i) The constant variance assumption does not hold.
(ii) The independence assumption does not hold if common random numbers (CRN) are applied.

Many simulation experiments concern queueing systems; examples are supply chains and telecommunication networks. The simplest queueing model is the so-called M/M/1 model, which has one server and exponential arrival and service times, so it is a Markov chain. It is well known that as the traffic rate of the M/M/1 model increases, the mean steady-state waiting time increases and the variance increases even more; consequently, the assumption of a constant variance does not hold.

CRN are often applied in simulation experiments, because it is the default option in simulation software such as Arena; moreover, CRN are a simple and intuitive variance reduction technique that gives more accurate estimators of the regression parameters, except for the intercept $\beta_0$. The outputs of all input combinations that use CRN are statistically dependent or correlated. CRN are related to "blocking" in real-life experiments. In simulation experiments, we may use blocking when combining CRN and antithetic random numbers through the so-called Schruben-Margolin strategy; this strategy is detailed by Chih (2013).
The preceding two characteristics imply that ordinary LS (OLS) does not give the BLUE. Weighted LS (WLS) does give the BLUE, but assumes known response variances. Generalized LS (GLS) gives the BLUE, but assumes known response variances and covariances. We therefore recommend the following simple estimator, which is detailed by Kleijnen (2015): assuming a constant number of replications $m$ (as is usual with CRN), we compute the OLS estimate per replication, replacing $\mathbf{w}$ in (3) by $\mathbf{w}_r$ to get $\hat{\boldsymbol{\beta}}_r$ ($r = 1, \ldots, m$). Replication $r$ then gives an estimate of the steepest descent direction if a first-order polynomial is used, or of the optimum input combination if a second-order polynomial is used. Together, the $m$ replications give an estimate of the accuracy of this estimated direction or optimum; if the analysts find this accuracy too low, then they may simulate additional replications. We do not yet have any experience with this simple sequential approach.
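A minimal sketch of this per-replication estimator (our illustration, with hypothetical data): compute the OLS estimate column by column for the $m$ replications, and use the spread of the resulting steepest-descent directions as an accuracy measure.

import numpy as np

def ls_per_replication(X, W):
    # X: N x (1+k) regression matrix; W: N x m matrix, one column per replication.
    beta_r = np.linalg.lstsq(X, W, rcond=None)[0]    # (1+k) x m: one estimate per replication
    grads = -beta_r[1:, :]                           # steepest-descent directions (drop intercept)
    return grads.mean(axis=1), grads.std(axis=1, ddof=1) / np.sqrt(W.shape[1])

# Hypothetical data: n = 4 local points, k = 2 inputs, m = 5 replications.
rng = np.random.default_rng(4)
Z = np.array([[-1, -1], [1, -1], [-1, 1], [1, 1]], dtype=float)
X = np.column_stack([np.ones(4), Z])
W = 5.0 - 2.0 * Z[:, [0]] + 1.0 * Z[:, [1]] + rng.normal(scale=0.3, size=(4, 5))

mean_direction, std_error = ls_per_replication(X, W)
print(mean_direction, std_error)   # if the standard errors are too large, add replications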
Actually, if we have $m_i > 1$ ($i = 1, \ldots, n$) replications, then we can further explore the statistical properties of the LS estimator of $\boldsymbol{\beta}$ through bootstrapping. The classic textbook on bootstrapping in general is Efron and Tibshirani (1993); many more references are given by Kleijnen (2014). To explain distribution-free bootstrapping in RSM, we first consider the case of no CRN. We resample, with replacement, the $m_i$ simulation outputs $w_{i;r}$ ($r = 1, \ldots, m_i$) for input combination $i$ ($i = 1, \ldots, n$), and obtain the bootstrapped simulation outputs $w^*_{i;r}$, where the superscript $*$ is the general symbol for bootstrapped observations. Resampling for each input combination $i$, we obtain the bootstrapped I/O data $(\mathbf{Z}, \mathbf{w}^*)$ with the $N$-dimensional vector $\mathbf{w}^* = (w^*_{1;1}, \ldots, w^*_{1;m_1}, \ldots, w^*_{n;1}, \ldots, w^*_{n;m_n})'$ (so $N = \sum_{i=1}^{n} m_i$) and the "original" $N \times q$ matrix of input combinations $\mathbf{Z}$. Next we consider the case of CRN. The $n$ elements of $\mathbf{w}_r = (w_{1;r}, \ldots, w_{n;r})'$ ($r = 1, \ldots, m$) are then not identically and independently distributed (IID), so we resample the $m$ IID vectors $\mathbf{w}_r$. This case also gives $(\mathbf{Z}, \mathbf{w}^*)$. In both cases (CRN or no CRN) we use $(\mathbf{Z}, \mathbf{w}^*)$ to compute the bootstrapped regression parameter estimator $\hat{\boldsymbol{\beta}}^*$, replacing $\mathbf{w}$ in (3) by $\mathbf{w}^*$. This bootstrapping we repeat (say) $B$ times; popular choices are $B$ = 100 or $B$ = 1,000. This bootstrap sample gives $\hat{\boldsymbol{\beta}}^*_b$ ($b = 1, \ldots, B$), where we replace $\mathbf{w}$ in (3) by $\mathbf{w}^*_b$. Sorting these $\hat{\boldsymbol{\beta}}^*_b$ in ascending order gives the estimated density function (EDF) of $\hat{\boldsymbol{\beta}}^*$. We can also use $\hat{\boldsymbol{\beta}}^*_b$ to derive the corresponding estimated steepest ascent direction and optimum.
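A minimal sketch of this distribution-free bootstrap (our illustration with hypothetical data): without CRN, resample the outputs per input combination; with CRN, resample the $m$ IID replication vectors as a whole. The sketch works with per-combination averages, which for equal $m_i$ gives the same OLS estimate as the full $N$-dimensional vector.

import numpy as np

rng = np.random.default_rng(5)

def bootstrap_beta(X, W, use_crn, B=1000):
    # X: n x q matrix (one row per combination); W: n x m replication matrix.
    n, m = W.shape
    betas = []
    for _ in range(B):
        if use_crn:
            w_star = W[:, rng.integers(m, size=m)].mean(axis=1)   # resample whole columns
        else:
            idx = rng.integers(m, size=(n, m))                    # resample per combination
            w_star = np.take_along_axis(W, idx, axis=1).mean(axis=1)
        betas.append(np.linalg.lstsq(X, w_star, rcond=None)[0])
    return np.sort(np.array(betas), axis=0)     # sorted bootstrap estimates per coefficient

# Hypothetical I/O data: n = 4 points, k = 2 inputs, m = 8 replications.
Z = np.array([[-1, -1], [1, -1], [-1, 1], [1, 1]], dtype=float)
X = np.column_stack([np.ones(4), Z])
W = 5.0 - 2.0 * Z[:, [0]] + 1.0 * Z[:, [1]] + rng.normal(scale=0.3, size=(4, 8))

edf = bootstrap_beta(X, W, use_crn=True)
print(edf[25, :], edf[975, :])     # e.g., a 95% bootstrap interval per coefficient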
Note: Instead of distribution-free bootstrapping we can apply parametric bootstrapping, which assumes a specific type of distribution; e.g., a Gaussian distribution (also see Section 6). Parametric bootstrapping may be attractive if $m_i$ is small and no CRN are used; e.g., the $n$ means and $n$ variances can be estimated if the weak condition $m_i > 1$ holds. If CRN are used, then the covariance matrix $\mathrm{cov}(\mathbf{w}_r, \mathbf{w}_{r'})$ with $r, r' = 1, \ldots, m$ needs to be estimated; this estimation requires $m > n$, as proven by Dykstra (1970). So parametric bootstrapping may require fewer replications, but it may violate the assumed distribution for the simulation outputs.
Chang et al. (2013) develop STRONG, which is a completely automated variant of RSM combined with so-called trust regions. They prove that STRONG converges to the true optimum. Originally, trust regions were developed by Conn et al. (2000) for deterministic nonlinear optimization. The trust region is a subregion in which the objective function is approximated; if an adequate approximation is found within the trust region, then the region is expanded; else the region is contracted. In STRONG these trust regions replace the "local" regions of classic RSM. Chang et al. derive tests to decide whether trust regions should be expanded or shrunken in the various steps of STRONG, and how much these areas should change. If necessary, the trust region is small and a second-order polynomial is used. Next, Chang et al. (2014) combine STRONG with screening, and call this STRONG-S; they apply this method to several test functions with multiple local minima. Contrary to the Taylor series argument, STRONG may have a relatively large trust region that does not require a second-order polynomial metamodel but only a first-order polynomial metamodel.

4 Adapted steepest descent (ASD)

So-called adapted steepest descent (ASD) accounts for the covariances between the elements of the estimated gradient $\hat{\boldsymbol{\beta}}_{-0} = (\hat{\beta}_1, \ldots, \hat{\beta}_k)'$, where the subscript $-0$ means that the intercept $\hat{\beta}_0$ of the estimated first-order polynomial vanishes in the estimated gradient; i.e., $\hat{\boldsymbol{\beta}} = (\hat{\beta}_0, \hat{\boldsymbol{\beta}}'_{-0})'$ with $\hat{\boldsymbol{\beta}}$ defined in (3). Obviously, the white-noise assumption implies

$$\mathrm{cov}(\hat{\boldsymbol{\beta}}) = \sigma_w^2 (\mathbf{Z}'\mathbf{Z})^{-1} = \sigma_w^2 \begin{pmatrix} a & \mathbf{b}' \\ \mathbf{b} & \mathbf{C} \end{pmatrix} \qquad (5)$$

where
$\sigma_w^2$ denotes the variance of the simulation output $w$;
$\mathbf{Z}$ is the $N \times (1+k)$ matrix of explanatory regression variables, including the column with $N$ one's;
$N = \sum_{i=1}^{n} m_i$ is the total number of simulation runs;
$n$ is the number of different simulated input combinations;
$m_i$ is the number of IID replications for combination $i$;
$a$ is a scalar;
$\mathbf{b}$ is a $k$-dimensional vector;
$\mathbf{C}$ is a $k \times k$ matrix such that $\mathrm{cov}(\hat{\boldsymbol{\beta}}_{-0}) = \sigma_w^2 \mathbf{C}$.

We estimate the variance $\sigma_w^2$ in (5) through the mean squared residuals (MSR):

$$\hat{\sigma}_w^2 = \frac{\sum_{i=1}^{n}\sum_{r=1}^{m_i} (w_{i;r} - \hat{y}_i)^2}{N - (k+1)} \qquad (6)$$

where $\hat{y}_i = \mathbf{z}_i'\hat{\boldsymbol{\beta}}$; also see Kleijnen (2015).
We can easily prove that the predictor variance $\mathrm{var}(\hat{y} \mid \mathbf{z})$ increases as $\mathbf{z}$, the point to be predicted, moves further away from the local area where the gradient is estimated. The point with the minimum predictor variance is $-\mathbf{C}^{-1}\mathbf{b}$, where $\mathbf{C}$ and $\mathbf{b}$ were defined below (5). ASD means that the new point to be simulated is

$$\mathbf{d} = -\mathbf{C}^{-1}\mathbf{b} - \lambda \mathbf{C}^{-1}\hat{\boldsymbol{\beta}}_{-0} \qquad (7)$$

where
$-\mathbf{C}^{-1}\mathbf{b}$ is the point where the local search starts; namely, the point with minimum local variance;
$\lambda$ is the step size;
$-\mathbf{C}^{-1}\hat{\boldsymbol{\beta}}_{-0}$ is the classic steepest descent direction $-\hat{\boldsymbol{\beta}}_{-0}$ adapted for $\mathrm{cov}(\hat{\boldsymbol{\beta}}_{-0})$.
Accounting for $\mathrm{cov}(\hat{\boldsymbol{\beta}}_{-0})$ gives a scale-independent search direction. Kleijnen et al. (2004, 2006) present experimental results showing that ASD performs "better" than classic steepest descent; i.e., the angle between the search direction based on the true $\boldsymbol{\beta}_{-0}$ and the search direction estimated in ASD is smaller; e.g., in one example this angle reduces from 89.87 degrees for classic steepest descent to 1.83 degrees for ASD.
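A minimal sketch (not the authors' code) of one ASD step, implementing (5) through (7): estimate $\sigma_w^2$ by the MSR, partition $(\mathbf{Z}'\mathbf{Z})^{-1}$ into the scalar $a$, the vector $\mathbf{b}$, and the matrix $\mathbf{C}$, and move from the minimum-variance point along the adapted direction. All data below are hypothetical.

import numpy as np

def asd_step(X, w, lam):
    # X: N x (1+k) regression matrix; w: N outputs; lam: step size.
    N, q = X.shape
    beta = np.linalg.lstsq(X, w, rcond=None)[0]
    resid = w - X @ beta
    sigma2 = resid @ resid / (N - q)                 # MSR, equation (6)

    ZtZ_inv = np.linalg.inv(X.T @ X)
    b = ZtZ_inv[1:, 0]                               # covariances of intercept with effects
    C = ZtZ_inv[1:, 1:]                              # cov(beta_-0) = sigma2 * C, equation (5)

    start = -np.linalg.solve(C, b)                   # minimum predictor-variance point
    direction = -np.linalg.solve(C, beta[1:])        # steepest descent adapted for cov
    return start + lam * direction, sigma2

# Hypothetical local I/O data: k = 2 inputs, two replications of a 2^2 design.
rng = np.random.default_rng(6)
Z = np.tile(np.array([[-1, -1], [1, -1], [-1, 1], [1, 1]], dtype=float), (2, 1))
X = np.column_stack([np.ones(len(Z)), Z])
w = 5.0 + 2.0 * Z[:, 0] - 1.0 * Z[:, 1] + rng.normal(scale=0.2, size=len(Z))

print(asd_step(X, w, lam=1.0))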

5 Multiple responses: Generalized RSM

In practice, simulation models have multiple responses (multivariate output); e.g., many realistic inventory models require the inventory system to result in a minimum service rate or fill rate (because the out-of-stock costs are hard to quantify) and to minimize the sum of all the other inventory costs. Simulation software facilitates the collection of multiple outputs. There are several approaches to solve the resulting issues; see the survey by Rosen et al. (2008). The RSM literature also offers several approaches for such situations; see the surveys in Angün (2004), Khuri (1996), and Ng et al. (2007).

A novel heuristic is Generalized RSM (GRSM), which combines the estimated gradients of RSM with a search procedure from mathematical programming. GRSM selects one response as the goal variable and the other responses as constrained variables; moreover, the deterministic input variables may also be subjected to constraints. GRSM generalizes the steepest descent search direction through the affine scaling search direction, borrowing ideas from interior point methods (a variation on Karmarkar's algorithm) in mathematical programming; see Barnes (1986). This search direction moves faster to the optimum than steepest descent, because the former avoids creeping along the boundary of the feasible area, which is determined by the constraints on the random outputs and the deterministic inputs. Moreover, this search tries to stay inside the feasible area, so the simulation program does not crash. Finally, this search direction is scale independent; see Angün et al. (2009).
Formally, GRSM extends the classic RSM problem in (1) to the following constrained nonlinear random optimization problem:

$$\min_{\mathbf{z}} E(w_0 \mid \mathbf{z}) \qquad (8)$$

such that the other (say) $(r-1)$ random outputs satisfy the constraints

$$E(w_{h'} \mid \mathbf{z}) \geq a_{h'} \quad \text{with } h' = 1, \ldots, r-1, \qquad (9)$$

and the $k$ deterministic inputs $z_j$ satisfy the box constraints

$$l_j \leq z_j \leq u_j \quad \text{with } j = 1, \ldots, k. \qquad (10)$$

An example is an inventory simulation, in which the sum of the expected inventory carrying costs and ordering costs should be minimized while the expected service percentage should be at least (say) 90%, so $a_1 = 0.9$ in (9); both the reorder quantity $z_1 = Q$ and the reorder level $z_2 = s$ should be non-negative, so $z_1 \geq 0$ and $z_2 \geq 0$ in (10). Note that more complicated input constraints than (10), namely linear budget constraints, feature in a call-center simulation by Kelton et al. (2007); input constraints resulting from output constraints are discussed in Ng et al. (2007).
Analogously to the first steps of classic RSM based on (2), GRSM locally approximates the multivariate I/O function by $r$ univariate first-order polynomials augmented with white noise:

$$\mathbf{y}_h = \mathbf{Z}\boldsymbol{\beta}_h + \mathbf{e}_h \quad \text{with } h = 0, \ldots, r-1. \qquad (11)$$

Like RSM, GRSM assumes that locally the white noise assumption holds for (11), so the BLUEs are the following LS estimators:

$$\hat{\boldsymbol{\beta}}_h = (\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\mathbf{w}_h \quad \text{with } h = 0, \ldots, r-1. \qquad (12)$$

The vector $\hat{\boldsymbol{\beta}}_0$ (the LS estimator of the first-order polynomial approximation of the goal function) and the goal function (8) result in

$$\min_{\mathbf{z}} \hat{\boldsymbol{\beta}}'_{0;-0}\mathbf{z} \qquad (13)$$

where $\hat{\boldsymbol{\beta}}_{0;-0} = (\hat{\beta}_{0;1}, \ldots, \hat{\beta}_{0;k})'$ is the LS estimate of the local gradient of the goal function. The $(r-1)$ estimates $\hat{\boldsymbol{\beta}}_{h'}$ in (12) combined with the original output constraints (9) give

$$\hat{\boldsymbol{\beta}}'_{h';-0}\mathbf{z} \geq c_{h'} \quad \text{with } h' = 1, \ldots, r-1 \qquad (14)$$

where $\hat{\boldsymbol{\beta}}_{h';-0} = (\hat{\beta}_{h';1}, \ldots, \hat{\beta}_{h';k})'$ is the estimated local gradient of constraint function $h'$, and $c_{h'} = a_{h'} - \hat{\beta}_{h';0}$ is the modified right-hand side of this constraint function. The box constraints (10) remain unchanged.

Now the $k$-dimensional vectors $\hat{\boldsymbol{\beta}}_{h';-0}$ ($h' = 1, \ldots, r-1$) in (14) are collected in the $(r-1) \times k$ matrix (say) $\mathbf{B}$. Likewise, we collect the $(r-1)$ elements $c_{h'}$ in the vector $\mathbf{c}$. We define $\mathbf{l}$ as the vector with the $k$ elements $l_j$, and $\mathbf{u}$ as the vector with the elements $u_j$. Finally, we introduce the non-negative slack vectors $\mathbf{s}$, $\mathbf{r}$, and $\mathbf{v}$, to get the following equivalent problem formulation:

$$\begin{array}{ll} \text{minimize} & \hat{\boldsymbol{\beta}}'_{0;-0}\mathbf{z} \\ \text{subject to} & \mathbf{B}\mathbf{z} - \mathbf{s} = \mathbf{c} \\ & \mathbf{z} + \mathbf{r} = \mathbf{u} \\ & \mathbf{z} - \mathbf{v} = \mathbf{l}. \end{array} \qquad (15)$$


This optimization problem is linear in the inputs $\mathbf{z}$. GRSM uses this problem to derive the following search direction:

$$\mathbf{d} = -\left(\mathbf{B}'\mathbf{S}^{-2}\mathbf{B} + \mathbf{R}^{-2} + \mathbf{V}^{-2}\right)^{-1} \hat{\boldsymbol{\beta}}_{0;-0} \qquad (16)$$

where $\mathbf{S}$, $\mathbf{R}$, and $\mathbf{V}$ are diagonal matrices with as main-diagonal elements the current estimated slack vectors $\mathbf{s}$, $\mathbf{r}$, and $\mathbf{v}$ in (15). Note that $\hat{\boldsymbol{\beta}}_{0;-0}$ in (16) is the estimated classic steepest ascent direction. As the value of a slack variable in (16) decreases (so the corresponding constraint gets tighter), the GRSM search direction deviates more from the steepest descent direction. Possible singularity of the various matrices in (16) is discussed by Angün (2004).

Following the path defined by (16), GRSM must decide on the step size (say) $\lambda$ along this path, and chooses

$$\lambda = 0.8 \min_{h'} \left[ \frac{c_{h'} - \hat{\boldsymbol{\beta}}'_{h';-0}\mathbf{z}_c}{\hat{\boldsymbol{\beta}}'_{h';-0}\mathbf{d}} \right] \qquad (17)$$

where the factor 0.8 is chosen to decrease the probability that the local metamodel (14) is misleading when applied globally; $\mathbf{z}_c$ denotes the current input combination, so the new combination becomes $\mathbf{z}_c + \lambda\mathbf{d}$. The box constraints (10) for the deterministic inputs hold globally, so it is easy to check whether the new combination $\mathbf{z}_c + \lambda\mathbf{d}$ violates these constraints.
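A minimal sketch (hedged; all numbers are hypothetical) of the search direction (16) and the maximum step size (17); as an assumption of the sketch, the minimum in (17) is taken only over the constraints that the step actually approaches.

import numpy as np

def grsm_direction(g0, Bmat, s, r, v):
    # Search direction (16): deviates more from steepest descent as slacks get tighter.
    S2, R2, V2 = np.diag(1.0 / s**2), np.diag(1.0 / r**2), np.diag(1.0 / v**2)
    return -np.linalg.solve(Bmat.T @ S2 @ Bmat + R2 + V2, g0)

def grsm_step_size(Bmat, c, zc, d):
    # Step size (17): 0.8 times the smallest step that makes an output constraint binding;
    # assumption: only constraints the step approaches (denominator < 0) are considered.
    num, den = c - Bmat @ zc, Bmat @ d
    ratios = [n / q for n, q in zip(num, den) if q < 0]
    return 0.8 * min(ratios) if ratios else 1.0

# Hypothetical local estimates: k = 2 inputs, r - 1 = 2 constrained outputs.
g0 = np.array([2.0, 1.0])                  # estimated gradient of the goal output
Bmat = np.array([[1.0, 0.5], [-0.5, 1.0]]) # estimated gradients of the constrained outputs
zc, c = np.array([1.0, 1.0]), np.array([1.1, -1.0])
l, u = np.zeros(2), np.array([3.0, 3.0])
s, r, v = Bmat @ zc - c, u - zc, zc - l    # current slack vectors as in (15)

d = grsm_direction(g0, Bmat, s, r, v)
print(d, grsm_step_size(Bmat, c, zc, d))   # new point: zc + lambda * d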
Analogously to classic RSM, GRSM proceeds stepwise. After each step along the search path, GRSM tests the following two hypotheses, denoted by the superscripts (1) and (2):

1. Pessimistic null-hypothesis: $w_0(\mathbf{z}_c + \lambda\mathbf{d})$ (the output of the new combination) is no improvement over $w_0(\mathbf{z}_c)$ (the output of the old combination):

$$H_0^{(1)}: E[w_0(\mathbf{z}_c + \lambda\mathbf{d})] \geq E[w_0(\mathbf{z}_c)]. \qquad (18)$$

2. This step is feasible; i.e., $w_{h'}(\mathbf{z}_c + \lambda\mathbf{d})$ satisfies the $(r-1)$ constraints in (9):

$$H_0^{(2)}: E[w_{h'}(\mathbf{z}_c + \lambda\mathbf{d})] \geq a_{h'} \quad \text{with } h' = 1, \ldots, r-1. \qquad (19)$$

To test these hypotheses, we may apply the following simple statistical procedures (more complicated parametric bootstrapping is used by Angün 2004, permitting non-normality and testing the relative improvement $w_0(\mathbf{z}_c + \lambda\mathbf{d})/w_0(\mathbf{z}_c)$ and the relative slacks $s_{h'}(\mathbf{z}_c + \lambda\mathbf{d})/s_{h'}(\mathbf{z}_c)$). To test (18), we apply the paired Student $t_{m-1}$ statistic if CRN are used; we reject the hypothesis if significant improvement is observed. To test (19), we again apply a $t_{m-1}$ statistic; because we test multiple hypotheses, we apply Bonferroni's inequality, so we divide the classic $\alpha$ value by the number of tests $(r-1)$.
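A minimal sketch of these two tests (our illustration using scipy; the one-sided p-value handling and the Bonferroni division by $r-1$ follow the description above, and all data are hypothetical):

import numpy as np
from scipy import stats

def test_improvement(w0_old, w0_new, alpha=0.05):
    # One-sided paired t test of H0^(1): E[w0(new)] >= E[w0(old)] (no improvement).
    t, p_two = stats.ttest_rel(w0_new, w0_old)
    p_one = p_two / 2 if t < 0 else 1 - p_two / 2
    return p_one < alpha                              # True: significant improvement

def test_feasibility(w_new, a, alpha=0.05):
    # Test H0^(2): E[w_h'(new)] >= a_h' for each of the r-1 constraints (Bonferroni).
    r_minus_1 = len(a)
    feasible = True
    for wh, ah in zip(w_new, a):
        t, p_two = stats.ttest_1samp(wh, ah)
        p_one = p_two / 2 if t < 0 else 1 - p_two / 2 # evidence of a violated constraint
        if p_one < alpha / r_minus_1:                 # significantly below the bound a_h'
            feasible = False
    return feasible

rng = np.random.default_rng(7)
w_old, w_new = rng.normal(10, 1, 20), rng.normal(9, 1, 20)
print(test_improvement(w_old, w_new), test_feasibility([rng.normal(0.95, 0.02, 20)], [0.90]))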
Actually, a better solution may lie somewhere between $\mathbf{z}_c$ (the old combination) and $\mathbf{z}_c + \lambda\mathbf{d}$ (the new combination at the maximum step size). Therefore GRSM uses binary search; i.e., it simulates a combination that lies halfway between these two combinations, and is still on the search path. This halving of the step size may be applied several times.

Next, GRSM proceeds analogously to classic RSM; i.e., around the best combination found so far, GRSM selects a new local area. Again a R-III design specifies the new simulation input combinations, and $r$ first-order polynomials are fitted, which gives a new search direction, etc. Note that we might use the $m$ replications $\hat{\boldsymbol{\beta}}_r$ to estimate the accuracy of the search direction; to test the accuracy of the estimated optimum, we present a test in the next section.
Figure 1 illustrates GRSM. This figure shows two inputs; see the two axes labeled $z_1$ and $z_2$. The goal function is to be minimized; the figure also shows two contour plots or iso-cost functions: $E(w_0) = a_{0;1}$ and $E(w_0) = a_{0;2}$ with $a_{0;2} < a_{0;1}$. There are two constrained random outputs; see $E(w_1) = a_1$ and $E(w_2) = a_2$. The search starts in the lower-right local area with a $2^2$ design; see the four elongated points. Together with the replications that are not shown, the I/O data give a search direction; see the arrow leaving from point (0). The maximum step size along this path takes the search from point (0) to point (1). The binary search takes the search back to point (2), and next to point (3). Because the best point so far turns out to be point (1), the $2^2$ design is used to select four points to be simulated in this local area; this point is one of the four points. This design gives a new search direction, which indeed avoids the boundary. The maximum step size now takes the search to point (4). The binary search takes the search back to point (5), and next to point (6). Because the best point so far is now point (4), the $2^2$ design is simulated in the local area with this point as one of its points. A new search direction is estimated, etc.

Angün (2004) applies GRSM to two more examples; namely, Bashyam and Fu (1998)'s inventory simulation with a service-level constraint, so the solution is unknown, and a test function with a known solution. Her results for these examples are encouraging, as GRSM found solutions that were both feasible and gave much lower values for the goal functions.

6 Testing an estimated optimum in GRSM: KKT conditions

Obviously, it is uncertain whether the optimum estimated by a heuristic such as GRSM is close enough to the true optimum. In deterministic nonlinear mathematical programming, the first-order necessary optimality conditions are known as the Karush-Kuhn-Tucker (KKT) conditions; see Gill et al. (2000). First we present the basic idea behind these conditions; next, we explain how to test these conditions in simulation.

Figure 2 illustrates the same type of problem as the one in Figure 1. More precisely, Figure 2 shows a goal function $E(w_0)$ with the three contour plots that correspond with the values 66, 76, and 96; also see (8). The figure also shows two constrained simulation outputs, namely $E(w_1) \geq 4$ and $E(w_2) \geq 9$; also see (9).
Fig. 1  GRSM illustration with two inputs, two contour plots for the goal output, two constraints for the other outputs, three local areas, three search directions, and six steps in these directions

Fig. 2  A constrained nonlinear random optimization problem: three contour plots with goal values 66, 76, and 96; two other outputs with lower bounds 4 and 9; optimal point A; points B and C on bound 9; point D on bound 4; local gradients at A through D for the goal function and the binding constraint, perpendicular to local tangent lines for the binding constraint


The figure displays the boundaries of the feasible area, which is determined by $E(w_1) = 4$ and $E(w_2) = 9$. The optimum combination is point A. The points B and C lie on the boundary $E(w_2) = 9$; point D lies on the boundary $E(w_1) = 4$. Obviously, the points A and D lie far away from each other. The figure also displays the local gradients at these four points for the goal function and for the binding constraint; i.e., the constraint with a zero slack value in (9). These gradients are perpendicular to the local tangent lines; those lines are shown only for the binding constraint, not for the goal function.
Let $\mathbf{z}^o$ denote a local minimizer for the deterministic variant of our problem. The KKT conditions for $\mathbf{z}^o$ are then

$$\boldsymbol{\beta}_{0;-0} = \sum_{h \in A(\mathbf{z}^o)} \lambda_h^o \boldsymbol{\beta}_{h;-0}, \qquad \lambda_h^o \geq 0, \qquad h \in A(\mathbf{z}^o) \qquad (20)$$

where $\boldsymbol{\beta}_{0;-0}$ denotes the $k$-dimensional vector with the gradient of the goal function, as we saw in (13); $A(\mathbf{z}^o)$ is the index set with the indices of the constraints that are binding at $\mathbf{z}^o$; $\lambda_h^o$ is the Lagrangian multiplier for binding constraint $h$; $\boldsymbol{\beta}_{h;-0}$ is the gradient of the output in that binding constraint; Figure 2 has only one binding constraint at each of the points A through D. Altogether, (20) implies that the gradient of the objective is a nonnegative linear combination of the gradients of the binding constraints, at $\mathbf{z}^o$. Figure 2 shows that A satisfies (20); B has two gradients that point in different but similar directions, and so does C, whereas D has two gradients that point in completely different directions.

Note: If the optimum occurs inside the feasible area, then there are no binding constraints, so the KKT conditions reduce to the condition that the goal gradient is zero. Classic RSM includes tests for a zero gradient estimated from a second-order polynomial; see Section 2.
In simulation we must estimate the gradients; moreover, to check which constraints are binding, we must estimate the slacks of the constraints. This estimation changes the KKT conditions into a problem of nonlinear statistics. Angün (2004) presents an asymptotic test, but Bettonvil et al. (2009) derive a small-sample bootstrap test. We present the latter test, because it is simpler and it allows expensive simulation. Readers not interested in the technical details of this bootstrapping should skip the remainder of this section.

As in classic RSM, we assume locally constant (co)variances for each of the $r$ simulation outputs (when moving to a new local area, the (co)variances may change; e.g., the points A through D in Figure 2 do not have the same variance for the goal output). LS per univariate simulation output gives $\hat{\boldsymbol{\beta}}_h$ ($h = 0, 1, \ldots, r-1$) defined in (12). These estimators have the following estimated covariance matrix:

$$\widehat{\mathrm{cov}}(\hat{\boldsymbol{\beta}}_h, \hat{\boldsymbol{\beta}}_{h'}) = \widehat{\mathrm{cov}}(w_h, w_{h'}) \otimes (\mathbf{Z}'\mathbf{Z})^{-1} \quad (h, h' = 0, \ldots, r-1) \qquad (21)$$

where $\otimes$ denotes the Kronecker product and $\widehat{\mathrm{cov}}(w_h, w_{h'})$ is an $r \times r$ matrix with the classic estimators of the (co)variances based on the $m$ replications at the local center:

$$\widehat{\mathrm{cov}}(w_h, w_{h'}) = (\hat{\sigma}_{h;h'}) = \left( \frac{1}{m-1} \sum_{l=1}^{m} (w_{h;l} - \bar{w}_h)(w_{h';l} - \bar{w}_{h'}) \right). \qquad (22)$$

The Kronecker product implies that $\widehat{\mathrm{cov}}(\hat{\boldsymbol{\beta}}_h, \hat{\boldsymbol{\beta}}_{h'})$ is an $rq \times rq$ matrix, with $q$ denoting the number of regression parameters (e.g., $q = 1 + k$ in a first-order polynomial); this matrix is formed from the $r \times r$ matrix $\widehat{\mathrm{cov}}(w_h, w_{h'})$ by multiplying each of its elements by the entire $q \times q$ matrix $(\mathbf{Z}'\mathbf{Z})^{-1}$ (e.g., $\mathbf{Z}$ is an $N \times (1+k)$ matrix in equation 5). The matrix $\widehat{\mathrm{cov}}(w_h, w_{h'})$ is singular if $m \leq r$; e.g., the case study in Kleijnen (1993) has $r = 2$ response types and $k = 14$ inputs, so $m \geq 3$ replications of the center point are required. Of course, the higher $m$ is, the higher is the power of the tests that use these replications. Bettonvil et al. (2009) do not consider cases where all $n$ local points are replicated and CRN are possibly used; these cases require further research.

In classic RSM we assume that the output is Gaussian, and now we assume that the $r$-variate simulation output is multivariate Gaussian. We use the center point to test whether a constraint is binding in the current local area, because this point is more representative of the local behavior than the points of the R-III design. (To save simulation runs, we should start a local experiment at its center point, including replications; if it turns out that either no constraint is binding or at least one constraint is violated, then we need not test the other hypotheses, so we do not simulate the remainder of the local design.) Actually, we test the following three null-hypotheses, denoted by the superscripts (1) through (3):
1. The current solution is feasible and at least one constraint is binding; see (9):

$$H_0^{(1)}: E(w_{h'} \mid \mathbf{d} = \mathbf{0}) = a_{h'} \quad \text{with } h' = 1, \ldots, r-1 \qquad (23)$$

where $\mathbf{d} = \mathbf{0}$ corresponds with the center of the local area expressed in the standardized inputs.

2. The expected value of the estimated goal gradient may be expressed as the expected value of a linear combination of the estimated gradients of the simulation outputs in the binding constraints; i.e., in (20) we replace the deterministic quantities by their random estimators:

$$H_0^{(2)}: E(\hat{\boldsymbol{\beta}}_{0;-0}) = E\left( \sum_{h \in A(\mathbf{z}^o)} \hat{\lambda}_h \hat{\boldsymbol{\beta}}_{h;-0} \right). \qquad (24)$$

3. The Lagrangian multipliers in (24) are non-negative:

$$H_0^{(3)}: E(\hat{\boldsymbol{\lambda}}) \geq \mathbf{0}. \qquad (25)$$


Each of these three hypotheses requires multiple tests, so we apply Bonferroni's inequality. Moreover, we test these three hypotheses sequentially, so it is hard to control the final type-I and type-II error probabilities (classic RSM has the same type of problem, but has nevertheless acquired a track record in practice).

Sub 1: To test (23), we use the classic t statistic:

$$t_{m-1}^{(h')} = \frac{\bar{w}_{h'}(\mathbf{d} = \mathbf{0}) - a_{h'}}{\sqrt{\hat{\sigma}_{h';h'}/m}} \quad \text{with } h' = 1, \ldots, r-1 \qquad (26)$$

where both the numerator and the denominator use the $m$ replications at the local center point; see (22). This t statistic may give the following three results:
(i) The statistic is significantly positive; i.e., the constraint for output $h'$ is not binding. If none of the $(r-1)$ constraints is binding, then the optimal solution is not yet found, assuming that at the optimum at least one constraint is binding; otherwise, classic RSM applies. The search for better solutions continues (see again Section 5).
(ii) The statistic is significantly negative; i.e., the current local area does not give feasible solutions, so the optimal solution is not yet found. The search should back up into the feasible area.
(iii) The statistic is non-significant; i.e., the current local area gives feasible solutions, and the constraint for output $h'$ is binding. The index of this gradient is then included in $A(\mathbf{z}^o)$; see (24). And the KKT test proceeds as follows.
Sub 2 and 3: To estimate the linear combination in (24), we apply LS with as explanatory variables the estimated gradients of the (say) $J$ binding constraints; these explanatory variables are now random. We collect these gradients in the $k \times J$ matrix $\hat{\mathbf{B}}_{J;-0}$. The parameters estimated through LS are $\hat{\boldsymbol{\lambda}}$. Let $\hat{\hat{\boldsymbol{\beta}}}_{0;-0}$ denote the LS estimator of the goal gradient:

$$\hat{\hat{\boldsymbol{\beta}}}_{0;-0} = \hat{\mathbf{B}}_{J;-0}(\hat{\mathbf{B}}'_{J;-0}\hat{\mathbf{B}}_{J;-0})^{-1}\hat{\mathbf{B}}'_{J;-0}\hat{\boldsymbol{\beta}}_{0;-0} = \hat{\mathbf{B}}_{J;-0}\hat{\boldsymbol{\lambda}} \qquad (27)$$

with $\hat{\boldsymbol{\lambda}} = (\hat{\mathbf{B}}'_{J;-0}\hat{\mathbf{B}}_{J;-0})^{-1}\hat{\mathbf{B}}'_{J;-0}\hat{\boldsymbol{\beta}}_{0;-0}$. To quantify the validity of this linear approximation, we use the $k$-dimensional vector of its residuals

$$\hat{\mathbf{e}}(\hat{\hat{\boldsymbol{\beta}}}_{0;-0}) = \hat{\hat{\boldsymbol{\beta}}}_{0;-0} - \hat{\boldsymbol{\beta}}_{0;-0}. \qquad (28)$$

Hypothesis (24) implies $E(\hat{\mathbf{e}}(\hat{\hat{\boldsymbol{\beta}}}_{0;-0})) = \mathbf{0}$. This hypothesis involves a product of multivariates, so standard tests do not apply and we use bootstrapping. We do not apply distribution-free bootstrapping, because in expensive simulation only the center point is replicated a few times. Instead, we apply parametric bootstrapping; i.e., we assume a Gaussian distribution (like in classic RSM), and we estimate its parameters from the simulation's I/O data. This bootstrapping consists of the following four steps, where the superscript $*$ denotes a bootstrapped value:


(i) Use the Monte Carlo method to sample $\mathrm{vec}(\hat{\boldsymbol{\beta}}^*_{0;-0}, \hat{\mathbf{B}}^*_{J;-0})$, which is a $(k + kJ)$-dimensional vector formed by stapling (stacking) the $k$-dimensional goal gradient vector and the $J$ $k$-dimensional vectors of the $k \times J$ matrix $\hat{\mathbf{B}}_{J;-0}$:

$$\mathrm{vec}(\hat{\boldsymbol{\beta}}^*_{0;-0}, \hat{\mathbf{B}}^*_{J;-0}) \sim N\left(\mathrm{vec}(\hat{\boldsymbol{\beta}}_{0;-0}, \hat{\mathbf{B}}_{J;-0}),\ \widehat{\mathrm{cov}}(\mathrm{vec}(\hat{\boldsymbol{\beta}}_{0;-0}, \hat{\mathbf{B}}_{J;-0}))\right) \qquad (29)$$

where $\widehat{\mathrm{cov}}(\mathrm{vec}(\hat{\boldsymbol{\beta}}_{0;-0}, \hat{\mathbf{B}}_{J;-0}))$ is the $(k + kJ) \times (k + kJ)$ matrix computed through (21).

(ii) Use the bootstrap values resulting from Step (i) to compute the LS estimate of the bootstrapped goal gradient, using the bootstrapped gradients of the binding constraints as explanatory variables; i.e., use (27), adding the superscript $*$ to all random variables, resulting in $\hat{\hat{\boldsymbol{\beta}}}^*_{0;-0}$ and $\hat{\boldsymbol{\lambda}}^*$.

(iii) Use $\hat{\hat{\boldsymbol{\beta}}}^*_{0;-0}$ from Step (ii) and $\hat{\boldsymbol{\beta}}^*_{0;-0}$ from Step (i) to compute the bootstrap residual $\hat{\mathbf{e}}^*(\hat{\hat{\boldsymbol{\beta}}}^*_{0;-0}) = \hat{\hat{\boldsymbol{\beta}}}^*_{0;-0} - \hat{\boldsymbol{\beta}}^*_{0;-0}$, analogous to (28). Determine whether any of the bootstrapped Lagrangian multipliers $\hat{\boldsymbol{\lambda}}^*$ found in Step (ii) is negative; i.e., augment a counter (say) $c^*$ with the value 1 if this event occurs.

(iv) Repeat the preceding three steps (say) 1000 times. This bootstrap sample gives the EDF of $\hat{e}^*(\hat{\hat{\beta}}^*_{0;-0;j})$, which denotes the bootstrapped residuals per input $j$ ($j = 1, \ldots, k$), and the final value of the counter $c^*$. Reject the hypothesis in (24) if this EDF implies a two-sided $(1 - \alpha/(2k))$ confidence interval that does not cover the value 0, where the factor $k$ is explained by Bonferroni's inequality. Reject (25) if the fraction $c^*/1000$ is significantly higher than 50%; if the true Lagrangian multiplier is only "slightly" larger than zero, then "nearly" 50% of the bootstrapped values is negative. To test the latter fraction, we approximate the binomial distribution through the normal distribution with mean 0.50 and variance $(0.50 \times 0.50)/1000 = 0.00025$. A code sketch of these four steps is given below.
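The following minimal sketch (not the authors' code; all inputs are hypothetical) implements the four bootstrap steps for given estimates of the goal gradient, the binding-constraint gradients, and their joint covariance matrix of the type computed through (21).

import numpy as np

rng = np.random.default_rng(8)

def kkt_bootstrap(g0_hat, BJ_hat, cov_vec, B=1000, alpha=0.10):
    # g0_hat: estimated goal gradient (k); BJ_hat: k x J matrix of binding-constraint gradients;
    # cov_vec: (k + kJ) x (k + kJ) covariance matrix of vec(g0_hat, BJ_hat).
    k, J = BJ_hat.shape
    mean_vec = np.concatenate([g0_hat, BJ_hat.ravel(order="F")])   # stack column by column
    residuals, n_negative = [], 0
    for _ in range(B):
        draw = rng.multivariate_normal(mean_vec, cov_vec)           # step (i)
        g0_star, BJ_star = draw[:k], draw[k:].reshape((k, J), order="F")
        lam_star, *_ = np.linalg.lstsq(BJ_star, g0_star, rcond=None)  # step (ii)
        residuals.append(BJ_star @ lam_star - g0_star)                # step (iii)
        n_negative += np.any(lam_star < 0)
    residuals = np.sort(np.array(residuals), axis=0)                  # step (iv): EDFs per input
    lo = residuals[int(B * alpha / (2 * k)), :]        # Bonferroni-corrected two-sided interval
    hi = residuals[int(B * (1 - alpha / (2 * k))) - 1, :]
    linear_combination_ok = np.all((lo <= 0) & (0 <= hi))
    # Reject non-negativity of the multipliers if the negative fraction clearly exceeds 50%.
    z = (n_negative / B - 0.5) / np.sqrt(0.25 / B)
    multipliers_ok = z <= 1.645                                       # one-sided 5% normal test
    return linear_combination_ok, multipliers_ok

# Hypothetical estimates for k = 2 inputs and J = 1 binding constraint.
g0_hat = np.array([2.0, 1.0])
BJ_hat = np.array([[1.8], [1.1]])
cov_vec = 0.05 * np.eye(4)
print(kkt_bootstrap(g0_hat, BJ_hat, cov_vec))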
The numerical examples in Bettonvil et al. (2009) give encouraging results; i.e., the classic t test for zero slacks performs as expected, and the new bootstrap tests give observed type-I error rates close to the prespecified (nominal) rates, while the type-II error rate decreases as the tested input combination is farther away from the true optimum (see A through D in Figure 2).

7 Robust optimization

Taguchi emphasizes that in practice some inputs of a product are under complete control, whereas other inputs are not; e.g., the design of a car engine is completely controlled by the engineers, but the driving style is not. Consequently, a design that allows some flexibility in its use is "better"; e.g., a car optimized only for the race circuit does not perform well on the highways. Likewise, in simulation the optimum solution may be completely wrong when ignoring uncertainties in some inputs; e.g., the nominally optimal decision on the inventory control limits s (reorder level) and S (order-up-to level) may be completely wrong, because this solution ignores the uncertainty in the parameters assumed for the random demand and delivery time distributions. In Section 7.1, we first explain Taguchi's approach, updating and extending Dellino et al. (2010); also see Dellino et al. (2012). In Section 7.2, we briefly discuss Ben-Tal's approach, summarizing Yanikoglu et al. (2013).

7.1 Taguchi’s robust optimization

We use Taguchi's view of the world, distinguishing between two types of inputs:
(i) decision variables, denoted by (say) $d_j$ ($j = 1, \ldots, k$), so $\mathbf{d} = (d_1, \ldots, d_k)'$, and
(ii) environmental or noise factors $e_g$ ($g = 1, \ldots, c$), so $\mathbf{e} = (e_1, \ldots, e_c)'$.

Taguchi assumes a single output, which we denote by $w$. He focuses on the mean $\mu_w$ and the variance $\sigma_w^2$ of this output, caused by the noise factors $\mathbf{e}$. However, we do not use Taguchi's scalar loss function such as the signal-to-noise or mean-to-variance ratio $\mu_w/\sigma_w^2$; see Myers et al. (2009, pp. 486-488). Instead we use both $\mu_w$ and $\sigma_w^2$ to characterize the statistical distribution of the output, and we try to solve the following problem:

$$\min_{\mathbf{d}} \mu_w \quad \text{such that } \sigma_w^2 \leq T \qquad (30)$$

where $T$ is some threshold; also see Myers et al. (2009, pp. 488-495). We also refer to the surveys on robust optimization by Beyer and Sendhoff (2007) and Park et al. (2006).
Taguchi's worldview has been very successful in production engineering, but statisticians have seriously criticized his statistical techniques; see the panel report by Nair (1992). Therefore Myers et al. (2009, pp. 502-506) combine Taguchi's worldview with RSM. We adapt their RSM; i.e., we assume that $\mathbf{e}$ has the mean $\boldsymbol{\mu}_e$ and the covariance matrix $\boldsymbol{\Sigma}_e$, whereas Myers et al. assume a constant variance $\sigma_e^2$, so $\boldsymbol{\Sigma}_e = \sigma_e^2 \mathbf{I}$. To find a robust solution, Myers et al. superimpose contour plots for the mean and variance of the output, whereas we use more general and flexible mathematical programming. This mathematical programming, however, requires specification of threshold values like $T$ in (30); managers may find it hard to select specific values, so we may try different values and estimate the corresponding Pareto-optimal efficiency frontier. To estimate the variability of this Pareto frontier resulting from the various estimators, we may use bootstrapping (also see our bootstrapping in the preceding section). For details on our adaptation of Myers et al.'s approach we refer to Dellino et al. (2010).
More precisely, Myers et al. fit a second-order polynomial in $\mathbf{d}$, which is to be optimized. To model possible effects of $\mathbf{e}$, they fit a first-order polynomial in $\mathbf{e}$. They also estimate "control-by-noise" two-factor interactions. Altogether, they fit the following "incomplete" second-order polynomial:

$$y = \beta_0 + \sum_{j=1}^{k} \beta_j d_j + \sum_{j=1}^{k}\sum_{j'=j}^{k} \beta_{j;j'} d_j d_{j'} + \sum_{g=1}^{c} \gamma_g e_g + \sum_{j=1}^{k}\sum_{g=1}^{c} \delta_{j;g} d_j e_g + \epsilon = \beta_0 + \boldsymbol{\beta}'\mathbf{d} + \mathbf{d}'\mathbf{B}\mathbf{d} + \boldsymbol{\gamma}'\mathbf{e} + \mathbf{d}'\boldsymbol{\Delta}\mathbf{e} + \epsilon \qquad (31)$$

where we now use the symbol $\epsilon$ (instead of $e$) to denote the regression residual, with $E(\epsilon) = 0$ if this metamodel has no lack-of-fit and with constant variance $\sigma_{\epsilon}^2$; furthermore, $\boldsymbol{\beta} = (\beta_1, \ldots, \beta_k)'$, $\mathbf{B}$ denotes the $k \times k$ symmetric matrix with main-diagonal elements $\beta_{j;j}$ and off-diagonal elements $\beta_{j;j'}/2$, $\boldsymbol{\gamma} = (\gamma_1, \ldots, \gamma_c)'$, and $\boldsymbol{\Delta}$ denotes the $k \times c$ matrix with the interactions $\delta_{j;g}$.
Obviously, (31) implies the following regression predictor for the true mean $\mu_w$:

$$\mu_y = \beta_0 + \boldsymbol{\beta}'\mathbf{d} + \mathbf{d}'\mathbf{B}\mathbf{d} + \boldsymbol{\gamma}'\boldsymbol{\mu}_e + \mathbf{d}'\boldsymbol{\Delta}\boldsymbol{\mu}_e. \qquad (32)$$

And the regression predictor for the true variance $\sigma_w^2$ is

$$\sigma_y^2 = (\boldsymbol{\gamma} + \boldsymbol{\Delta}'\mathbf{d})' \boldsymbol{\Sigma}_e (\boldsymbol{\gamma} + \boldsymbol{\Delta}'\mathbf{d}) + \sigma_{\epsilon}^2 = \mathbf{l}'\boldsymbol{\Sigma}_e\mathbf{l} + \sigma_{\epsilon}^2 \qquad (33)$$

where $\mathbf{l} = (\boldsymbol{\gamma} + \boldsymbol{\Delta}'\mathbf{d}) = (\partial y/\partial e_1, \ldots, \partial y/\partial e_c)'$, so $\mathbf{l}$ is the gradient with respect to $\mathbf{e}$. So, the larger the gradient's elements are, the larger $\sigma_y^2$ is, which stands to reason. Furthermore, if there are no control-by-noise interactions, so $\boldsymbol{\Delta} = \mathbf{0}$, then we cannot control $\sigma_y^2$ through $\mathbf{d}$.
To enable estimation of the regression parameters in (31), we use a crossed design; i.e., we combine the design for $\mathbf{d}$ and the design for $\mathbf{e}$, as is usual in a Taguchian approach. To estimate the optimal $\mathbf{d}$, we use a CCD; see the discussion below (4). For the first-order polynomial in $\mathbf{e}$, we use a R-III design; see the example in Table 1. The combination of these two designs obviously enables the estimation of the two-factor interactions $\delta_{j;g}$. Note that Myers et al. (2009) and Dellino et al. (2010) also discuss designs that are more efficient than crossed designs.
To use linear regression analysis for the estimation of the $q$ parameters in (say) $\boldsymbol{\zeta} = (\beta_0, \ldots, \delta_{k;c})'$ in (31), we reformulate (31) as

$$y = \boldsymbol{\zeta}'\mathbf{x} + \epsilon \qquad (34)$$

where $\mathbf{x}$ is defined in the obvious way; e.g., the element corresponding with $\beta_{1;2}$ (the interaction between $d_1$ and $d_2$) is $d_1 d_2$. Note that (34) is linear in $\boldsymbol{\zeta}$, but (31) is not linear in $\mathbf{d}$. Then (34) gives the LS estimator

$$\hat{\boldsymbol{\zeta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{w} \qquad (35)$$

where $\mathbf{X}$ is the $N \times q$ matrix of explanatory variables, with $N = \sum_{i=1}^{n} m_i$ and $n$ denoting the number of different simulated combinations of $\mathbf{d}$ and $\mathbf{e}$; $\mathbf{w}$ consists of the $N$ simulation outputs. Obviously, the covariance matrix of $\hat{\boldsymbol{\zeta}}$ is

$$\mathrm{cov}(\hat{\boldsymbol{\zeta}}) = (\mathbf{X}'\mathbf{X})^{-1}\sigma_w^2. \qquad (36)$$

The RSM metamodel (31) implies that $\sigma_w^2$ equals $\sigma_{\epsilon}^2$. This variance is estimated by

$$MSR = \frac{(\hat{\mathbf{y}} - \mathbf{w})'(\hat{\mathbf{y}} - \mathbf{w})}{N - q} \qquad (37)$$

where $\hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol{\zeta}}$; also see (6).

To estimate $\mu_y$, we simply plug $\hat{\boldsymbol{\zeta}}$ defined by (35) into the right-hand side of (32), where $\mathbf{d}$ and $\boldsymbol{\mu}_e$ are known. To estimate $\sigma_y^2$, we also plug $\hat{\boldsymbol{\zeta}}$ into (33), where $\boldsymbol{\Sigma}_e$ is known. Note that (33) involves products of unknown parameters, so it implies a nonlinear estimator $\hat{\sigma}_y^2$; plugged-in estimators certainly create bias, but we ignore this bias.
Our final goal is to solve (30). We solve this constrained minimization problem through a mathematical programming solver; e.g., Matlab's "fmincon", but a different solver might be used; see Gill et al. (2000). This gives estimates of the robust decision variables and the corresponding mean and variance.
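A minimal sketch of this final optimization step (hedged; the chapter mentions Matlab's "fmincon", and scipy.optimize is used here as a stand-in, with hypothetical plug-in estimates of the type obtained from (35)): minimize the estimated mean (32) subject to the estimated variance (33) not exceeding the threshold T, as in (30).

import numpy as np
from scipy.optimize import NonlinearConstraint, minimize

# Hypothetical plug-in estimates for k = 2 decision variables and c = 2 noise factors.
beta0, beta = 10.0, np.array([1.0, -2.0])
Bmat = np.array([[0.5, 0.1], [0.1, 0.8]])
gamma, delta = np.array([0.3, 0.6]), np.array([[0.2, -0.1], [0.4, 0.0]])
mu_e, Sigma_e = np.zeros(2), np.diag([1.0, 0.5])

def mean_y(d):
    # Predictor (32) for the mean output.
    return beta0 + beta @ d + d @ Bmat @ d + (gamma + delta.T @ d) @ mu_e

def var_y(d):
    # Predictor (33) for the output variance (ignoring the residual variance).
    l = gamma + delta.T @ d
    return l @ Sigma_e @ l

T = 0.4                                              # threshold on the variance in (30)
constraint = NonlinearConstraint(var_y, -np.inf, T)
result = minimize(mean_y, x0=np.zeros(2), constraints=[constraint], bounds=[(-2, 2)] * 2)
print(result.x, mean_y(result.x), var_y(result.x))   # robust d with its mean and variance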
Dellino et al. (2010) give the example of the economic order quantity (EOQ) for an environment with a demand rate that is uncertain but has a known distribution. This example demonstrates that if management prefers low variability of inventory costs, then they must pay a price; i.e., the expected costs increase. Furthermore, the classic EOQ assuming a known fixed demand rate and the robust EOQ do differ. More examples are referenced by Yanikoglu et al. (2013).

7.2 Ben-Tal’s robust optimization

Ben-Tal emphasizes that the nominal solution, which ignores the uncertainty in $\mathbf{e}$, may easily violate the constraints in the given mathematical programming problem, so he derives a robust solution that gives a slightly worse value for the goal variable but increases the probability of satisfying the constraints. The mathematical challenge is to develop a computationally tractable "robust counterpart" of the original mathematical programming problem. Therefore he proposes a robust solution that is "immune" to variations within the uncertainty set. Yanikoglu et al. (2013) derive a specific uncertainty set for $p$, where $p$ denotes the unknown density function of $\mathbf{e}$, which is compatible with the historical data on $\mathbf{e}$. Technically, $p$ belongs to this set with confidence $1 - \alpha$, assuming some phi-divergence measure such as the well-known chi-square distance. In this chapter we do not present the mathematical details of the derivation of tractable robust counterparts, but refer to Yanikoglu et al.

Note that Taguchians assume a specific distribution for $\mathbf{e}$, which implies $\boldsymbol{\mu}_e$ and $\boldsymbol{\Sigma}_e$ in (32) and (33). This distribution may be estimated from historical data; Yanikoglu et al., however, develop an approach that uses only the original observed data on $\mathbf{e}$. Several numerical examples demonstrate the effectiveness of this novel combination of Taguchi's and Ben-Tal's approaches.


Yanıkoğlu et al. (2013) also present Myers et al. (2009, p. 512)'s real-life television example, and Shi (2011)'s simulated distribution-center example. In this chapter we focus on simulation, so we discuss only Shi's example. He has five decision variables (e.g., number of forklifts) and two environmental variables (e.g., delay probabilities of suppliers); the response is the total throughput. Shi fits an incomplete second-order polynomial; see (31). Yanıkoğlu et al. (2013) replace (30) by the following related problem:
$$\min_{\mathbf{d}}\ \sigma_w^2 \text{ such that } \mu_w\leq T. \tag{38}$$
These two examples again demonstrate that in general robust solutions have
better worst-case performance and also better average performance.
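Compared with the earlier sketch for (30), solving (38) merely swaps the roles of the two estimated responses; a minimal variation, again with hypothetical stand-ins for the metamodels and the threshold, is:

```python
import numpy as np
from scipy.optimize import minimize

def predictors(d):
    # Stand-in for the estimated mean and variance metamodels; see the earlier sketch.
    mu_y = (d[0] - 1.0) ** 2 + (d[1] + 0.5) ** 2 + 5.0
    sigma2_y = 0.2 + 0.5 * (d[0] + d[1]) ** 2
    return mu_y, sigma2_y

T = 6.0  # hypothetical threshold on the estimated mean

res = minimize(
    fun=lambda d: predictors(d)[1],      # minimize the estimated variance, as in (38)
    x0=np.zeros(2),
    method="SLSQP",
    bounds=[(-2.0, 2.0), (-2.0, 2.0)],
    constraints=[{"type": "ineq", "fun": lambda d: T - predictors(d)[0]}],
)
print(res.x, predictors(res.x))
```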

8 Conclusions

In this chapter, we started with the basics of classic RSM, which minimizes
the expected value of a single response variable in real-life experiments. Next
we considered simulation experiments. We then added the adapted steepest
descent search direction, which improves classic steepest descent. We also
summarized GRSM for simulation with multivariate responses, assuming
that one response is to be minimized while all the other responses and de-
terministic inputs must meet given constraints. Furthermore, we presented
a bootstrap procedure for testing whether the KKT conditions hold for the
estimated optimum. Finally, we considered robust optimization.
Future research may study the selection of the required number of repli-
cations, and the use of replications to estimate the accuracy of the resulting
estimated search direction or optimum. Bootstrapping may solve this prob-
lem, but it needs more research. Numerical evaluation of the adapted steepest descent method would benefit from more applications in practice. There
is also a need for more research on the KKT testing procedure when all local
points (not only the center) are replicated and CRN are used; more practi-
cal applications are also needed. In Taguchian robust optimization through
RSM we may vary the threshold values, which gives different optimal solutions and a corresponding Pareto frontier; bootstrapping this frontier might enable management to make the final compromise decision, but more research and applications are needed.

References

Angün, M.E. (2004), Black box simulation optimization: generalized response surface methodology. CentER Dissertation series, Tilburg University, Tilburg, the Netherlands
Angün, E., D. den Hertog, G. Gürkan, and J.P.C. Kleijnen (2009), Re-
sponse surface methodology with stochastic constraints for expensive simu-
lation. Journal of the Operational Research Society, 60, (6), 735–746


Ankenman, B., Nelson, B., and Staum, J., (2010), Stochastic Kriging for
Simulation Metamodeling, Operations Research, 58, No. 2, pp. 371-382
Barnes, E. R. (1986), A variation on Karmarkar’s algorithm for solving
linear programming problems. Mathematical Programming, 36, pp. 174-182
Barton, R.R. and M. Meckesheimer (2006), Metamodel-based simulation
optimization. Handbooks in Operations Research and Management Science,
Volume 13, Elsevier/North Holland, pp. 535-574
Bartz-Beielstein, T. (2006), Experimental research in evolutionary com-
putation; the new experimentalism. Springer, Berlin
Bashyam, S. and M. C. Fu (1998), Optimization of (s, S) inventory
systems with random lead times and a service level constraint. Management
Science, 44, pp. 243-256
Ben-Tal, A. and A. Nemirovski (1998), Robust convex optimization.
Mathematics of Operations Research, 23, no. 4, pp. 769-805
Ben-Tal, A. and A. Nemirovski (2008), Selected topics in robust convex
optimization, Mathematical Programming, 112, no. 1, pp. 125–158
Bettonvil, B.W.M., E. del Castillo, and J.P.C. Kleijnen (2009). Statisti-
cal testing of optimality conditions in multiresponse simulation-based opti-
mization. European Journal of Operational Research, 199(2), 448-458.
Beyer, H. and B. Sendhoff (2007), Robust optimization – a comprehen-
sive survey. Computer Methods in Applied Mechanics and Engineering, 196,
nos. 33-34, pp. 3190–3218
Box, G.E.P. and K.B. Wilson (1951), On the experimental attainment
of optimum conditions. Journal Royal Statistical Society, Series B, 13, no.
1, pp. 1-38
Chang, K-H., J. Hong, and H. Wan (2013), Stochastic Trust-Region
Response-Surface Method (STRONG) – A New Response-Surface Frame-
work for Simulation Optimization, INFORMS Journal on Computing, 25,
no. 2, pp. 230-243
Chang, K.-H. M-K Li and H. Wan (2014), Combining STRONG with
Screening Designs for Large-Scale Simulation Optimization, IIE Transac-
tions, accepted.
Chih, M. (2013), A more accurate second-order polynomial metamodel
using a pseudo-random number assignment strategy. Journal of the Opera-
tional Research Society, 64, pp. 198-207
Conn, A.R., N.I.M. Gould, and P.L. Toint (2000), Trust-Region Methods.
SIAM
Dellino, G., Kleijnen, J.P.C., & Meloni, C. (2010). Robust optimization
in simulation: Taguchi and Response Surface Methodology. International
Journal of Production Economics, 125(1), 52-59
Dellino, G., Kleijnen, J.P.C., & Meloni, C. (2012). Robust optimization
in simulation: Taguchi and Krige combined. Informs Journal on Computing,
24(3), 471-484
Dykstra, R.L. (1970), Establishing the positive definiteness of the sample
covariance matrix. The Annals of Mathematical Statistics, 41, no. 6, pp.
2153–2154


Efron, B. and R.J. Tibshirani (1993), An introduction to the bootstrap. Chapman & Hall, New York
Gill, P. E., W. Murray, and M. H. Wright (2000). Practical optimization,
12th edition. Academic Press, London
Kelton, W.D., R.P. Sadowski, and D.T. Sturrock (2007), Simulation with
Arena; fourth edition. McGraw-Hill, Boston
Khuri, A. I. (1996), Multiresponse surface methodology. Handbook of
Statistics, volume 13, edited by S. Ghosh and C. R. Rao, Elsevier, Amster-
dam
Kleijnen, J.P.C. (1975). Statistical techniques in simulation, Part II. New
York: Dekker
Kleijnen, J.P.C. (1993), Simulation and optimization in production plan-
ning: a case study. Decision Support Systems, 9, pp. 269-280
Kleijnen, J.P.C. (2008). Response surface methodology for constrained
simulation optimization: An overview. Simulation Modelling Practice and
Theory, 16, 50-64
Kleijnen, J.P.C. (2015), Design and Analysis of Simulation Experiments;
Second Edition, Springer
Kleijnen, J.P.C., D. den Hertog, and E. Angün (2004), Response surface
methodology’s steepest ascent and step size revisited. European Journal of
Operational Research, 159, pp. 121-131
Kleijnen, J.P.C., D. den Hertog and E. Angün (2006), Response surface
methodology’s steepest ascent and step size revisited: correction. European
Journal of Operational Research, 170, pp. 664-666
Law, A.M. (2014), Simulation modeling and analysis; fifth edition. McGraw-
Hill, Boston
Myers, R.H., A.I. Khuri, and W.H. Carter (1989), Response surface
methodology: 1966-1988. Technometrics, 31, no. 2, pp. 137-157
Myers, R.H., D.C. Montgomery, and C.M. Anderson-Cook (2009), Re-
sponse surface methodology: process and product optimization using designed
experiments; third edition. Wiley, New York
Nair, V.N., editor (1992), Taguchi’s parameter design: a panel discussion.
Technometrics, 34, no. 2, pp. 127-161
Ng, S.H., K. Xu, and W.K. Wong (2007), Optimization of multiple re-
sponse surfaces with secondary constraints for improving a radiography in-
spection process. Quality Engineering, 19, no. 1, 53-65
Park, G-J., T-H. Lee, K.H. Lee, and K-H. Hwang (2006), Robust design:
an overview. AIAA Journal, 44, no. 1, pp. 181–191
Qu, H. and M.C. Fu (2012), On direct gradient enhanced simulation metamodels. Proceedings of the 2012 Winter Simula-
tion Conference, C. Laroque, J. Himmelspach, R. Pasupathy, O. Rose, and
A. M. Uhrmacher, eds.
Rikards, R. and J. Auzins (2002), Response surface method for solution
of structural identification problems. Fourth International Conference on
Inverse Problems in Engineering, Rio de Janeiro, Brazil


Rosen, S.C., C.M. Harmonosky, and M.T. Traband (2008), Optimization of systems with multiple performance measures via simulation: survey and
recommendations. Computers & Industrial Engineering, 54, no. 2, pp. 327-
339
Shi, W. (2011), Design of pre-enhanced cross-docking distribution cen-
ter under supply uncertainty: RSM robust optimization method. Working
Paper, Huazhong University of Science & Technology, China
Shi, W., Kleijnen, J.P.C., & Liu, Z. (2014). Factor Screening For Simu-
lation With Multiple Responses: Sequential Bifurcation. European Journal
of Operational Research, accepted
Taguchi, G. (1987), System of experimental designs, volumes 1 and 2.
UNIPUB/ Krauss International, White Plains, New York
Van den Bogaard, W. and Kleijnen, J.P.C. (1977). Minimizing waiting
times using priority classes: A case study in response surface methodology.
Discussion Paper FEW 77.056 (see
http://publications.uvt.nl/repository/kleijnen/publications.html)
Yanıkoğlu, I., D. den Hertog, and J.P.C. Kleijnen (2013), Adjustable Robust
Parameter Design with Unknown Distributions. CentER Discussion Paper

Acknowledgment: I thank Michael Fu for inviting me to write a chapter on RSM for this handbook and for his many comments on earlier versions.
