Corner Lot PDF
Abstract
The goal is to determine if the new product will function when produced under the extremes of long-term fabrication variation, or whether it will have insufficient design margin, necessitating circuit redesign. Achieving this goal requires producing a so-called corner lot that consists of skew chips, i.e., chips whose key performance parameters are expected to be around certain targeted extreme values. These skew chips will be extensively tested to determine if their functions meet product specifications. However, due to the large variation in the fabrication process, few skewed chips can be guaranteed in a produced corner lot, and this is a common frustration in current industrial practice. Despite being a popular research area, variation reduction is a long-term effort that cannot serve the immediate need of new product development. This paper tackles the problem from a different avenue by treating process variation as given and instead identifying a design strategy that guarantees production of a good corner lot robust to the variation. We develop a mathematical formulation for this problem, investigate the theoretical properties and practical implications of this formulation, and further propose several optimal criteria and a corresponding design search algorithm.
1. Introduction
New product development in semiconductor manufacturing culminates in the completion of the design process and transfer of tooling to the fabrication plant to produce initial lots of the new product. Chips in these initial lots are extensively tested, but they are exposed only to short-term variation rather than to long-term fabrication variation. Since the product development time course is much shorter (months) than the lifetime of the product (10 years or more), variation in the initial lots of the new product will be much smaller than the long-term fabrication variation. This poses a risk that the product may not perform well in the long run. A common practice to mitigate this risk is to produce a corner lot (Weste and Harris, 2011; Automotive Electronics Council,
2013), which is a lot whose recipe is manipulated to try to achieve extreme values of
the long-term fabrication variation for key performance parameters of the product,
such as leakage current, circuit frequency, and operating voltage (May and Spanos,
2006).
Chips in the corner lot with performance parameters within a small tolerance
around the targeted extreme values are called skew chips. These chips will be
extensively tested and analyzed to see if they can still meet product specifications.
If they do not, the new product may be considered to lack sufficient design margin.
This will result in certain circuits being redesigned. Both the product division, who
developed the product, and the fabrication factory, which will manufacture it, have an
interest in the corner lot because a new product that is robust to long-term fabrication
variation will be cheaper and easier to produce, benefitting both the product division and the fabrication factory.
To produce a corner lot, the common industrial practice is to make sure the mean of the key performance parameter of all the chips in the corner lot is equal to the targeted extreme value (Nakagawa et al., 1999; Wang et al., 2004; Gough, 2014).
Precise control of the mean of a lot is not hard in semiconductor manufacturing, because the process recipe can be adjusted by the fabrication engineers. However, there is no control over the variance of the key performance parameter, because semiconductor fabrication involves hundreds of process steps. Each step has its own inherent variation.
The step-wise variation accumulates and propagates, leading to large variance in the
key performance parameter of the chips at the end of the fabrication line. As a
consequence, when a corner lot is being produced according to the process recipe,
the variation will act as a random shock that moves the key performance parameter
of the chips away from the targeted value. Therefore, it is a common frustration in
the current industrial practice that few skew chips can be guaranteed in a produced
corner lot. This creates tremendous difficulty for subsequent design evaluation and
product characterization. When this happens, additional corner lots will have to be
produced, which will increase the lead time and cost of new product development.
Existing research on process variation reduction in semiconductor manufacturing can be grouped into several sub-areas, reviewed below.
Wafer defect detection and classification: Recognizing the defect patterns is the first step toward process variation reduction. Research in this sub-area generally develops methods to detect and classify different defect types. For example, Wang et al. (2006) proposed an approach to distinguish three types of defect patterns. Yuan and Kuo (2007) proposed a model-based approach to recognizing defect patterns. Bao et al. (2014) proposed to decompose the thickness variation of wafers into macro- and micro-scale variations modelled as a cubic curve and a first-order intrinsic Gaussian Markov random field.
Process monitoring: In semiconductor manufacturing, the process and data have
some unique characteristics that drive the development of new control charts. For
example, Yeh et al. (2005) proposed a multivariate EWMA control chart for
monitoring the critical dimensions of dies sampled from different sites on individual
wafers. Zou et al. (2007) proposed a multivariate EWMA control chart to monitor
the profiles of the DRIE process and detect changes. Zou et al. (2008) relaxed the
Fault root cause diagnosis: To enable root cause diagnosis, one needs to build
a model between the quality variables and process parameters such that quality
problems can be traced back to the process. Fenner et al. (2009) proposed a
Bayesian parallel site model to link wafer deposition uniformity measures to key
process parameters. Jin and Liu (2013) proposed the use of piecewise linear regression
manufacturing, and then link quality variables at each stage with upstream quality,
process, and material variables. Yu and Qin (2009) tackled the fault diagnosis of
important tool for stabilizing process output, reducing variation, and improving
quality. Recent developments include a batch EWMA controller that considers both
batch information and feedback quality information (Wang and Han, 2013), a Smith-
EWMA controller that ensures stability in the presence of serious material delay (Jin
and Tsung, 2009), and variable EWMA controllers for drifted processes (Tseng et al.,
2007; Tseng et al., 2008). In addition to R2R control, Jin and Shi (2012) developed
focus: we treat process variation as given and aim to produce corner lots robust to the variation. Our study is related to but different from robust parameter design (Myers et al., 1992; Myers et al., 2016). Here, “robustness” means producing corner lots that contain a guaranteed number of skew chips. The basic idea of our proposed approach is that, since process variation is treated as given, robustness can be achieved by spreading out the mean. That is, instead of setting the mean of a key performance parameter of all the chips in the corner lot to the same value (i.e., the targeted extreme value), we can set the means of some chips to be below and others to be above the extreme value. Compared with variation reduction, which is a relatively long-term effort, this study serves an immediate need in new product development. However, despite the practical value of this study goal, there is neither a rigorous formulation of this problem nor practical algorithms to guide the search for the best mean-spread-out strategy.
The rest of the paper is organized as follows: Section 2 presents the detailed problem formulation; Section 3 investigates the theoretical properties of the formulation and their practical implications; Section 4 proposes two optimal criteria to guide the search for the best mean-spread-out strategy, called the “optimal design”, and a design search algorithm; Section 5 presents the application; Section 6 is the conclusion. Note that the “optimal design” in the context of this paper refers to the best mean-spread-out strategy described above. Also, we would like to point out that this paper focuses on development of the optimal design, not on how to generate corner lots according to the optimal design after it is obtained; practical aspects of the latter are discussed in Appendix B.
2. Problem formulation
2.1 Nomenclature
Xijk: Performance parameter (e.g., circuit frequency) for the kth chip on the jth wafer in the ith lot
Li, Wij, Mijk: Variance components of Xijk from the lot, wafer, and within-wafer site levels
ε: A pre-defined tolerance
2.2 Formulation
Semiconductor fabrication consists of hundreds of process steps that transform a bare silicon wafer into one containing hundreds of chips (i.e., integrated circuits).
In the final step, key performance parameters of the chips such as leakage current,
circuit frequency, and operating voltage, will be tested to see how well the chips work.
There are three variation sources for performance parameters of the chips:
• Lot-to-lot variation: a lot is the smallest batch of wafers that will be operated on
at a process step. Once a lot is started at a process step, all of the wafers in that
• Wafer-to-wafer variation: at each process step, the equipment can either process
the wafers in a lot all together (i.e., the so-called batch mode) or sequentially.
Both strategies induce wafer-to-wafer variation. In the batch mode, the wafer-
• Within-wafer site-to-site variation: each chip has a spatial position on the wafer.
Different equipment will induce variation across the surface of the wafer in different spatial patterns.
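Considering the structure of the three variation sources, we can model a performance parameter as

$$X_{ijk} = \mu + L_i + W_{ij} + M_{ijk}, \tag{1}$$

where µ is the mean, and Li ∼ N(0, σL²), Wij ∼ N(0, σW²), and Mijk ∼ N(0, σM²) are independent. Model (1) is the well-known variance components model that has been commonly used to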
model semiconductor processes in the literature (Diebold, 2001; Drain, 1997; Reda
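• We focus on corner lots. As discussed in the Introduction, just one corner lot is of interest, so we fix i = 1.
• The overall mean µ in (1) is replaced by a wafer-specific mean µj (j = 1, · · · , m). µj is a decision variable in our case, i.e., it can be set by the fabrication engineers to achieve a desired value, and how to set µj to achieve robustness in corner lot generation is the focus of this paper.
Considering these modifications, (1) is changed to

$$X_{1jk} = \mu_j + L_1 + W_{1j} + M_{1jk}, \quad j = 1,\cdots,m,\; k = 1,\cdots,n, \tag{2}$$

where m is the number of wafers in the corner lot and n is the number of chips on each wafer. Model (2) is used in the rest of this paper.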
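Furthermore, recall that the purpose of a corner lot is to have the performance parameters of its chips close to a targeted extreme value, denoted by µ0. µ0 is an “extreme value” in the sense that it is usually far from the nominal value of the performance parameter µnominal, e.g., µ0 = µnominal + 3σ. The chips in the corner lot that have performance parameters within a small tolerance ε around the targeted extreme value µ0 are successful chips, also known as skew chips. Mathematically, we can define a skew chip in the following way: I[X1jk] = 1 if X1jk is a skew chip in the corner lot and I[X1jk] = 0 otherwise, i.e.,

$$I[X_{1jk}] = \begin{cases} 1, & \mu_0 - \epsilon \le X_{1jk} \le \mu_0 + \epsilon,\\ 0, & \text{otherwise}. \end{cases} \tag{3}$$

In this paper, we focus on two-sided tolerance limits as in (3) and leave one-sided limits (i.e., performance parameters that are the-larger-the-better or the-smaller-the-better) for future research. The expected proportion of skew chips in the corner lot, conditional on the lot effect L1, is defined as

$$f(L_1;\mu_1,\dots,\mu_m) \triangleq E\left[\frac{\sum_{j=1}^{m}\sum_{k=1}^{n} I[X_{1jk}]}{mn}\,\middle|\,L_1\right] = \frac{\sum_{j=1}^{m}\sum_{k=1}^{n} E[I[X_{1jk}]\mid L_1]}{mn} = \frac{\sum_{j=1}^{m}\sum_{k=1}^{n} P(\mu_0-\epsilon \le X_{1jk} \le \mu_0+\epsilon \mid L_1)}{mn}. \tag{4}$$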
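If the corner lot had been produced, then L1 would have been observed, i.e., L1 = l1. Because, given L1 = l1, X1jk ∼ N(µj + l1, σW² + σM²) under (2) for every chip k on wafer j, (4) would become

$$f(l_1;\mu_1,\dots,\mu_m) = \frac{1}{m}\sum_{j=1}^{m}\left[\phi\left(\frac{\mu_0+\epsilon-(\mu_j+l_1)}{\sqrt{\sigma_W^2+\sigma_M^2}}\right)-\phi\left(\frac{\mu_0-\epsilon-(\mu_j+l_1)}{\sqrt{\sigma_W^2+\sigma_M^2}}\right)\right], \tag{5}$$

where φ(·) is the cumulative distribution function of the standard normal distribution. However, we are designing the corner lot generation in this paper, i.e., L1 is not yet observed at this stage. Therefore, the l1 in (5) should be replaced by the random variable L1, i.e.,

$$f(L_1;\mu_1,\dots,\mu_m) = \frac{1}{m}\sum_{j=1}^{m}\left[\phi\left(\frac{\mu_0+\epsilon-(\mu_j+L_1)}{\sqrt{\sigma_W^2+\sigma_M^2}}\right)-\phi\left(\frac{\mu_0-\epsilon-(\mu_j+L_1)}{\sqrt{\sigma_W^2+\sigma_M^2}}\right)\right]. \tag{6}$$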
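To make (5) concrete, the sketch below evaluates the expected proportion of skew chips for a given lot effect l1 and a given design. It assumes SciPy's `scipy.stats.norm.cdf` for φ(·); the function name `skew_fraction_given_lot` and all numeric values are illustrative, not taken from the text.

```python
import numpy as np
from scipy.stats import norm


def skew_fraction_given_lot(l1, mus, mu0, eps, sigma_w, sigma_m):
    """Expected proportion of skew chips given L1 = l1, following (5).

    mus are the wafer means mu_1, ..., mu_m; sigma_w and sigma_m are the
    wafer-to-wafer and within-wafer standard deviations.
    """
    mus = np.asarray(mus, dtype=float)
    s = np.sqrt(sigma_w**2 + sigma_m**2)  # conditional sd of X_1jk given L1
    upper = norm.cdf((mu0 + eps - (mus + l1)) / s)
    lower = norm.cdf((mu0 - eps - (mus + l1)) / s)
    return float(np.mean(upper - lower))  # average over the m wafers


# Illustrative target design: all 24 wafer means at mu0.
mu0 = 3.0
print(skew_fraction_given_lot(0.0, [mu0] * 24, mu0, eps=0.5, sigma_w=0.6, sigma_m=0.8))
print(skew_fraction_given_lot(1.0, [mu0] * 24, mu0, eps=0.5, sigma_w=0.6, sigma_m=0.8))
```

Replacing the fixed l1 by random draws of L1 turns this function into a Monte Carlo sampler for (6).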
For a fixed design in corner lot generation, i.e., when µ1, · · · , µm are given, f(L1; µ1, · · · , µm) is a random variable whose distribution is induced by that of L1. Figure 1 shows its histogram under a target design, which is a design used in current industrial practice that sets all the µj equal to µ0. Note that we are only able to show the histogram using a Monte Carlo simulation, but not the probability density function, because the distribution of f(L1; µ0, · · · , µ0) does not have a closed form.
Figure 1 sheds some light on why the target design can be unsatisfactory. For one
corner lot, we get one realization for f (L1 ; µ0 , · · · , µ0 ), which has a certain (non-zero)
probability of being a very small number. For example, f (L1 ; µ0 , · · · , µ0 ) has 3.1%,
7%, and 11.4% probabilities of being less than 0.05, 0.10, and 0.15, respectively. This
means that very few skew chips may be generated in the corner lot, making design evaluation and product characterization difficult.
To compare with the target design, we show the histogram of f(L1; µ∗1, · · · , µ∗m) in Figure 2 for an optimal design, i.e., the design under the maximum-single-lot-probability criterion proposed in Section 4. Compared with Figure 1, f(L1; µ∗1, · · · , µ∗m) has only 0.05%, 0.30%, and 0.92% probabilities of being less than 0.05, 0.10, and 0.15, respectively. In particular, there is zero probability for f(L1; µ∗1, · · · , µ∗m) to be zero, while this probability is non-zero for f(L1; µ0, · · · , µ0).
Also, f(L1; µ∗1, · · · , µ∗m) clearly has a much smaller variance than f(L1; µ0, · · · , µ0). All these imply that the optimal design is more likely to produce a guaranteed number of skew chips, a highly favorable property for corner lot generation in practice. Therefore, the research question we will need to tackle is the following: what design, in terms of the setting for (µ1, · · · , µm)^T, will lead to a favorable shape of the distribution of f(L1; µ1, · · · , µm)? However, fully characterizing this distribution requires an infinite number of parameters. To provide a tractable solution, we focus on studying the mean and variance of the distribution of f(L1; µ1, · · · , µm) in this paper (Section 3), which further leads to two proposed optimal criteria and an algorithm for searching for the optimal designs (Section 4).
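The histograms in Figures 1 and 2 come from Monte Carlo simulation. A minimal sketch of such a simulation is below. It assumes units in which σW² + σM² = 1 with wafer means measured relative to µ0 (so (6) reduces to an average of φ(ε − µj − L1) − φ(−ε − µj − L1)), the lot-to-lot ratio 0.316 estimated in Section 5, and a hypothetical tolerance ε = 0.344 and spread δ = 1.0; the settings behind the figures are not given here, so it does not reproduce them.

```python
import numpy as np
from scipy.stats import norm


def f_lot(l1, mus, eps):
    """Vectorized f(L1; mu_1, ..., mu_m) from (6).

    Units: sigma_W^2 + sigma_M^2 = 1, wafer means measured relative to mu0.
    Returns one proportion per value in l1.
    """
    l1 = np.asarray(l1, dtype=float)[:, None]   # shape (N, 1)
    mus = np.asarray(mus, dtype=float)[None, :]  # shape (1, m)
    probs = norm.cdf(eps - mus - l1) - norm.cdf(-eps - mus - l1)
    return probs.mean(axis=1)                    # average over wafers


rng = np.random.default_rng(2024)
sigma_l2 = 0.316        # lot-to-lot variance ratio (Section 5 point estimate)
eps = 0.344             # hypothetical tolerance in these units
delta = 1.0             # hypothetical spread for a two-level design
m = 24
l1 = rng.normal(0.0, np.sqrt(sigma_l2), size=100_000)  # draws of L1

target = f_lot(l1, np.zeros(m), eps)
spread = f_lot(l1, np.r_[np.full(m // 2, -delta), np.full(m // 2, delta)], eps)

for name, f in (("target", target), ("spread", spread)):
    below = ", ".join(f"P(f<{a:.2f})={np.mean(f < a):.3f}" for a in (0.05, 0.10, 0.15))
    print(f"{name}: mean={f.mean():.3f}, var={f.var():.2e}, {below}")
```

With these hypothetical values, the spread design should trade a somewhat lower mean for a much smaller variance, which is the pattern the two figures illustrate; `np.histogram(target)` gives the corresponding histogram counts.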
Let E(f(L1; µ1, · · · , µm)) and Var(f(L1; µ1, · · · , µm)) denote the mean and variance of f(L1; µ1, · · · , µm), respectively.
In this section, we will first show that the target design, i.e., (µ1, · · · , µm)^T = (µ0, · · · , µ0)^T, maximizes E(f(L1; µ1, · · · , µm)) (Theorem 1). That is, among all possible designs, the target design achieves the highest mean proportion of skew chips in a corner lot. This seems to suggest that the target design is favorable. However, the mean only reflects the proportion of skew chips in a corner lot in the long term. For a single corner lot that is to be produced, the proportion of skew chips in the lot is also heavily influenced by the variance, i.e., Var(f(L1; µ1, · · · , µm)). Even though the target design has the highest mean proportion of skew chips, if the variance is large, a single corner lot produced under the target design may still have a chance of including zero or few skew chips. This is well demonstrated by the example in Figure 1. Therefore, we also study the variance in this section. Specifically, Theorem 2 proves that the target design does not minimize Var(f(L1; µ1, · · · , µm)), i.e., there are other designs with smaller variances.
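Theorem 2 When m > 1 and

$$\tilde\epsilon^2 \le \frac{-20a_2+\sqrt{400a_2^2-840a_1a_3}}{14a_3(1+\tilde\sigma_L^2)},$$

where

$$\begin{aligned}
a_1 &= -(1+2\tilde\sigma_L^2)(1+\tilde\sigma_L^2)^5 + (1+2\tilde\sigma_L^2)^{5/2}(1+\tilde\sigma_L^2)^2,\\
a_2 &= -(1+2\tilde\sigma_L^2)^{5/2}(1+\tilde\sigma_L^2) + \left(\tfrac{3}{2}\tilde\sigma_L^4 + 2\tilde\sigma_L^2 + 1 + \tfrac{3}{8m}\right)(1+\tilde\sigma_L^2)^4, \text{ and}\\
a_3 &= (1+2\tilde\sigma_L^2)^{5/2},
\end{aligned}$$

the target design does not minimize Var(f(L1; µ1, · · · , µm)).

Here, m is the number of wafers in a corner lot, which is typically greater than one. As for the condition on ε̃, Section 5 studies a broad range of semiconductor products for all of which this condition is met.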
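Because the condition in Theorem 2 depends only on σ̃L² and m, it can be checked quickly for a given product. The sketch below evaluates a1, a2, a3 and the resulting upper bound on ε̃; the function name `theorem2_bound` is hypothetical and m = 24 is only an illustrative lot size.

```python
import math


def theorem2_bound(sigma_l2, m):
    """Largest standardized tolerance eps~ allowed by the condition in Theorem 2.

    sigma_l2 is sigma~_L^2 and m is the number of wafers in the corner lot.
    Returns (bound, (a1, a2, a3)).
    """
    p = 1.0 + sigma_l2          # (1 + sigma~_L^2)
    q = 1.0 + 2.0 * sigma_l2    # (1 + 2 sigma~_L^2)
    a1 = -q * p**5 + q**2.5 * p**2
    a2 = -q**2.5 * p + (1.5 * sigma_l2**2 + 2.0 * sigma_l2 + 1.0 + 3.0 / (8.0 * m)) * p**4
    a3 = q**2.5
    # Positive root in x = eps^2 of 7*a3*x^2 + 20*a2*x + 30*a1 = 0, divided by
    # (1 + sigma~_L^2) to move from the check-eps scale to the tilde-eps scale.
    bound_sq = (-20.0 * a2 + math.sqrt(400.0 * a2**2 - 840.0 * a1 * a3)) / (14.0 * a3 * p)
    return math.sqrt(bound_sq), (a1, a2, a3)


for s2 in (0.2, 1.0, 2.0):
    bound, _ = theorem2_bound(s2, m=24)
    print(f"sigma~_L^2 = {s2}: condition holds for eps~ <= {bound:.3f}")
```

For m = 24, the bound computed this way exceeds 0.5 at σ̃L² = 0.2, 1.0, and 2.0, which is in line with the statement that the condition holds for the product range considered in Section 5; other values can be checked in the same way.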
Theorems 1 and 2 imply that the target design maximizes the mean of the proportion of skew chips in a corner lot, but it does not minimize the variance of the proportion. This suggests that we can potentially search for a design that has a smaller variance with some acceptable sacrifice of the mean. In practice, a design with a smaller variance is more likely to guarantee some skew chips in a single corner lot. To search for such a design, we need to identify the directions in the design space along which the variance is decreasing. Corollary 2.1 presents the variance-decreasing directions, as a result of Theorem 2. Please see the proofs of Theorems 1 and 2 in Appendix A.
Corollary 2.1 Let (1, · · · , 1)^T be a vector consisting of m ones. Under the conditions of Theorem 2, (i) any direction orthogonal to (1, · · · , 1)^T is a variance-decreasing direction at the target design, and (ii) the variance decreases at the same rate along all of these directions.
Corollary 2.1 (ii) suggests that among the variance-decreasing directions identified in (i), there is no mathematical preference because the variance along any of these directions decreases at the same rate. However, from the engineering perspective, some of these directions are preferable to others.
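Specifically, to save cost and reduce the risk of errors, a variance-decreasing direction that needs minimum process adjustments is desirable. This leads to choosing the direction (1/√m)(−1, · · · , −1, 1, · · · , 1)^T (when there is an even number of wafers in the corner lot). Here, (−1, · · · , −1, 1, · · · , 1)^T is a vector for which the first m/2 elements are −1 and the remaining m/2 elements are 1, and the factor 1/√m normalizes it to unit length. With this direction, there are only two different settings for the m wafers, i.e., µ0 − δ and µ0 + δ. Taking this into consideration, we can replace the µj in (6) with µ0 − δ for j = 1, · · · , m/2 and with µ0 + δ for j = m/2 + 1, · · · , m, which gives

$$\begin{aligned}
f(L_1;\delta) &= \frac{1}{2}\left[\phi\left(\frac{\epsilon+\delta-L_1}{\sqrt{\sigma_W^2+\sigma_M^2}}\right)-\phi\left(\frac{-\epsilon+\delta-L_1}{\sqrt{\sigma_W^2+\sigma_M^2}}\right)\right]\\
&\quad+\frac{1}{2}\left[\phi\left(\frac{\epsilon-\delta-L_1}{\sqrt{\sigma_W^2+\sigma_M^2}}\right)-\phi\left(\frac{-\epsilon-\delta-L_1}{\sqrt{\sigma_W^2+\sigma_M^2}}\right)\right].
\end{aligned}\tag{7}$$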
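When there is an odd number of wafers in the corner lot, the direction (1/√m)(−1, · · · , −1, 0, 1, · · · , 1)^T can be chosen. Here, (−1, · · · , −1, 0, 1, · · · , 1)^T is a vector for which the first (m − 1)/2 elements are −1, the last (m − 1)/2 elements are 1, and the middle element is 0. With this direction, there are three different settings for the m wafers, i.e., µ0 − δ, µ0, and µ0 + δ. Replacing the µj in (6) accordingly gives

$$\begin{aligned}
f(L_1;\delta) &= \frac{m-1}{2m}\left[\phi\left(\frac{\epsilon+\delta-L_1}{\sqrt{\sigma_W^2+\sigma_M^2}}\right)-\phi\left(\frac{-\epsilon+\delta-L_1}{\sqrt{\sigma_W^2+\sigma_M^2}}\right)\right]\\
&\quad+\frac{m-1}{2m}\left[\phi\left(\frac{\epsilon-\delta-L_1}{\sqrt{\sigma_W^2+\sigma_M^2}}\right)-\phi\left(\frac{-\epsilon-\delta-L_1}{\sqrt{\sigma_W^2+\sigma_M^2}}\right)\right]\\
&\quad+\frac{1}{m}\left[\phi\left(\frac{\epsilon-L_1}{\sqrt{\sigma_W^2+\sigma_M^2}}\right)-\phi\left(\frac{-\epsilon-L_1}{\sqrt{\sigma_W^2+\sigma_M^2}}\right)\right].
\end{aligned}\tag{8}$$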
We call δ the design parameter. When δ = 0, the corresponding design is the target design (i.e., the design with µj = µ0 for all j). As δ moves away from zero, the design will have a decreasing Var(f(L1; δ)) accompanied by a decreasing E(f(L1; δ)).
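The two- and three-level designs behind (7) and (8) can be generated as below. This is a sketch; `spread_design` is a hypothetical helper name. The final check confirms that the deviations from µ0 sum to zero, i.e., the design moves along a direction orthogonal to (1, · · · , 1)^T, as Corollary 2.1 requires.

```python
import numpy as np


def spread_design(m, mu0, delta):
    """Hypothetical helper: wafer means for the designs behind (7) and (8).

    Even m: m/2 wafers at mu0 - delta and m/2 at mu0 + delta.
    Odd m: (m-1)/2 at mu0 - delta, one at mu0, (m-1)/2 at mu0 + delta.
    """
    half = m // 2
    middle = [mu0] if m % 2 else []
    return np.array([mu0 - delta] * half + middle + [mu0 + delta] * half, dtype=float)


even = spread_design(24, mu0=0.0, delta=1.2)
odd = spread_design(5, mu0=0.0, delta=1.2)
print(odd)
# Deviations from mu0 sum to zero, i.e. they are orthogonal to (1, ..., 1)^T.
print(np.dot(even - 0.0, np.ones(24)), np.dot(odd - 0.0, np.ones(5)))
```

Plugging the returned vector into (6) for a given δ reproduces (7) for even m and (8) for odd m.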
Therefore, the research question boils down to identifying the value for δ that
corresponds to an optimal design. To answer this question, we first need to define the
optimal criterion and then develop an algorithm that searches for a design satisfying
the optimal criterion (the optimal design), which will be presented in the next section.
Suppose first that only one corner lot is allowed to be produced due to resource or time constraints.
Since skew chips in the corner lot will be used for design evaluation and product
characterization, sufficient skew chips need to be produced. That is, there is usually
a requirement on the proportion of skew chips in the corner lot. Denote the required
proportion by α. Then, we would like to find a design that maximizes the probability
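for a single corner lot to contain at least α proportion of skew chips, i.e.,

$$\delta^{(i)*} = \arg\max_{\delta}\, P\big(f(L_1;\delta) \ge \alpha\big). \tag{9}$$

We refer to (9) as the maximum-single-lot-probability criterion.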
Note that this criterion maximizes the probability of a corner lot having at least α proportion of skew chips, but it is still possible for a single corner lot to contain less than α proportion of skew chips.
When this happens, the decision can be to take whatever skew chips have been produced for design evaluation and product characterization, if the resources and/or production timeline are so tight that no more corner lots can be afforded. Alternatively, if having a required number of skew chips is more important and outweighs the concerns of more resources spent and possible delay in the production schedule, the decision can be to produce more corner lots until the required number of skew chips is met. That is, if the first corner lot does not contain at least α proportion of skew chips, a second corner lot will be produced. In general,
corner lots have to be continuously generated until at least α proportion of skew chips
Then, the expected production cost for the needed corner lots is: c × EL1 (M (δ, α)),
and identically distributed with f (L1 ; δ). Without loss of generality, we assume unit
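cost, i.e., c = 1, in the remainder of this paper. Then, the optimal design is the one that minimizes the expected number of corner lots, i.e.,

$$\delta^{(ii)*} = \arg\min_{\delta}\, E_{L_1}\big(M(\delta,\alpha)\big). \tag{10}$$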
The optimization problems in (9) and (10) have no analytical forms for the objective functions, no constraints, and only one decision variable δ. Considering these properties, numerical search methods are appropriate for solving them. We use Brent's method, which combines golden section search and successive parabolic interpolation, is widely implemented in numerical software, and converges at a superlinear rate (Brent, 2013). More detailed steps for solving the optimizations in (9) and (10)
are presented by a flow chart in Figure 3. Finally, we want to point out that both
of skewed chips. The latter would be ideal, but not possible because the proportion
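A minimal sketch of the one-dimensional search for criterion (9) is below, written in the standardized form (11) of Section 5. It assumes SciPy's `minimize_scalar(method="bounded")`, which is a bounded variant of Brent's method, a search interval δ̃ ∈ [0, 3], and a hypothetical tolerance ε̃ = 0.3. The probability P(f ≥ α) is computed on a fine grid over L̃1 rather than by simulation so that the objective is deterministic; the function names are illustrative.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm


def f_two_level(l1, delta, eps_t, sigma_l2):
    """Standardized f(L1~; delta~) from (11) for the two-level design."""
    e = eps_t * np.sqrt(1.0 + sigma_l2)  # eps~ * sqrt(1 + sigma~_L^2)
    low_wafers = norm.cdf(e + delta - l1) - norm.cdf(-e + delta - l1)
    high_wafers = norm.cdf(e - delta - l1) - norm.cdf(-e - delta - l1)
    return 0.5 * (low_wafers + high_wafers)


def prob_at_least(delta, alpha, eps_t, sigma_l2, n_grid=20001):
    """P(f(L1~; delta~) >= alpha) for L1~ ~ N(0, sigma~_L^2), on a fine grid."""
    sd = np.sqrt(sigma_l2)
    l1 = np.linspace(-8.0 * sd, 8.0 * sd, n_grid)
    weights = norm.pdf(l1, scale=sd) * (l1[1] - l1[0])  # density times grid step
    hit = f_two_level(l1, delta, eps_t, sigma_l2) >= alpha
    return float(np.sum(weights * hit))


alpha, eps_t, sigma_l2 = 0.15, 0.3, 0.316  # eps_t is hypothetical
result = minimize_scalar(lambda d: -prob_at_least(d, alpha, eps_t, sigma_l2),
                         bounds=(0.0, 3.0), method="bounded")
print(f"delta* = {result.x:.3f}, P(f >= alpha) = {-result.fun:.4f}")
print(f"target design: P(f >= alpha) = {prob_at_least(0.0, alpha, eps_t, sigma_l2):.4f}")
```

Because the indicator makes the grid objective piecewise constant in δ̃, a finer grid or a smoothed objective may be needed for the search to settle precisely; criterion (10) would reuse the same one-dimensional search with a different objective.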
5. Application
In this section, we present the results of applying the proposed optimal criteria and design search algorithm to corner lot generation, where the key performance parameter is the circuit frequency. Standardized notations are used for clarity of the presentation.
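Note that it is customary in industry to present the tolerance in units of the total process standard deviation, i.e., ε̃ = ε/√(σL² + σW² + σM²). With the standardized notation σ̃L² = σL²/(σW² + σM²), δ̃ = δ/√(σW² + σM²), and L̃1 = L1/√(σW² + σM²) ∼ N(0, σ̃L²), (7) becomes

$$\begin{aligned}
f(\tilde L_1;\tilde\delta) &= \frac{1}{2}\left[\phi\left(\tilde\epsilon\sqrt{1+\tilde\sigma_L^2}+\tilde\delta-\tilde L_1\right)-\phi\left(-\tilde\epsilon\sqrt{1+\tilde\sigma_L^2}+\tilde\delta-\tilde L_1\right)\right]\\
&\quad+\frac{1}{2}\left[\phi\left(\tilde\epsilon\sqrt{1+\tilde\sigma_L^2}-\tilde\delta-\tilde L_1\right)-\phi\left(-\tilde\epsilon\sqrt{1+\tilde\sigma_L^2}-\tilde\delta-\tilde L_1\right)\right],
\end{aligned}\tag{11}$$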
and the optimal criteria in (9) and (10) become (13) and (14), respectively.
We start out by focusing on one type of product and illustrating how to search for
the optimal design for corner lot generation. For this product, the product division
requires α = 0.15. A corner lot consists of 24 wafers and 800 chips. α = 0.15
adequate number for product characterization and design evaluation. The tolerance
to estimate the variance components. This requires a large amount of historical data
from the same distribution in order to ensure the quality of statistical estimation. To
this end, we retrieve historical data from the old generations of this (same) product
on circuit frequency for 13076487 chips contained on 9425 wafers in 400 lots. Among the methods for estimating variance components, the best approach that has been recommended in the literature and is also widely used in industry is the ANOVA method (Jenson, 2002). Using the ANOVA method, we obtain point estimates for the variance components. To account for the estimation uncertainty, we can employ the sampling distributions

$$\frac{df_L\,\hat\sigma_L^2}{\sigma_L^2}\sim\chi^2_{df_L},\qquad \frac{df_W\,\hat\sigma_W^2}{\sigma_W^2}\sim\chi^2_{df_W},\qquad \frac{df_M\,\hat\sigma_M^2}{\sigma_M^2}\sim\chi^2_{df_M},$$

where dfL = 399, dfW = 9424, and dfM = 13076487, and obtain confidence intervals. Specifically, the 95% confidence intervals for the variance components are
intervals are very narrow due to the large sample size, suggesting that estimation
uncertainty is not much of a concern. Therefore, we decide to use the point estimates to conduct an initial search for the optimal design, i.e., $\hat{\tilde\sigma}_L^2 = \hat\sigma_L^2/(\hat\sigma_W^2+\hat\sigma_M^2) = 0.316$.
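The estimation steps above can be sketched as follows. This is a minimal illustration, assuming a balanced layout (equal numbers of wafers per lot and chips per wafer), for which the ANOVA estimators equate each mean square to its expectation; the historical data in the text are not balanced (9425 wafers over 400 lots), so unbalanced ANOVA formulas would be needed in practice. The `chi2_ci` helper applies the χ² pivot stated above; the simulated data and function names are hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def nested_anova_vc(x):
    """ANOVA estimates for a balanced lot / wafer / chip layout.

    x has shape (a, b, n): a lots, b wafers per lot, n chips per wafer.
    Returns (s2_lot, s2_wafer, s2_chip).
    """
    a, b, n = x.shape
    grand = x.mean()
    lot_means = x.mean(axis=(1, 2))            # shape (a,)
    wafer_means = x.mean(axis=2)               # shape (a, b)
    ms_lot = b * n * np.sum((lot_means - grand) ** 2) / (a - 1)
    ms_wafer = n * np.sum((wafer_means - lot_means[:, None]) ** 2) / (a * (b - 1))
    ms_chip = np.sum((x - wafer_means[:, :, None]) ** 2) / (a * b * (n - 1))
    # Equate mean squares to their expectations and solve for the components.
    s2_chip = ms_chip
    s2_wafer = (ms_wafer - ms_chip) / n
    s2_lot = (ms_lot - ms_wafer) / (b * n)
    return s2_lot, s2_wafer, s2_chip


def chi2_ci(s2_hat, df, level=0.95):
    """Interval for sigma^2 from the pivot df * s2_hat / sigma^2 ~ chi^2_df."""
    tail = (1.0 - level) / 2.0
    return df * s2_hat / chi2.ppf(1.0 - tail, df), df * s2_hat / chi2.ppf(tail, df)


# Hypothetical balanced data simulated from model (1); not the data in the text.
rng = np.random.default_rng(7)
a, b, n = 200, 10, 50
x = (10.0
     + rng.normal(0.0, np.sqrt(0.3), size=(a, 1, 1))    # lot effects L_i
     + rng.normal(0.0, np.sqrt(0.5), size=(a, b, 1))    # wafer effects W_ij
     + rng.normal(0.0, 1.0, size=(a, b, n)))            # chip effects M_ijk
s2_l, s2_w, s2_m = nested_anova_vc(x)
print(s2_l, s2_w, s2_m, s2_l / (s2_w + s2_m))  # last value plays the role of sigma~_L^2
print(chi2_ci(s2_m, a * b * (n - 1)))
```

With degrees of freedom in the millions, as for dfM in the text, the interval from `chi2_ci` is very narrow, which matches the observation that estimation uncertainty is not much of a concern.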
Given all the above information as input, we first choose the maximum-single-lot-probability criterion. The design search algorithm finds the optimal design parameter to be $\hat{\tilde\delta}^{(i)*} = 1.24$. Under this optimal design, the probability for a single lot to consist of at least 0.15 proportion of skew chips is P(f(L̃1; 1.24) > 0.15) = 1.000. Furthermore, to evaluate how this finding is impacted by the variance component estimation uncertainty, we repeatedly draw the variance components from their sampling distributions, run the design search algorithm for each draw, and obtain an empirical distribution for the optimal design parameter δ̃(i)∗. Based on the empirical distribution, we can obtain a 95% confidence interval for δ̃(i)∗, which is δ̃(i)∗ ∈ [0.95, 1.47]. This confidence interval is very narrow, indicating that the uncertainty in finding the optimal design parameter that is introduced by the variance component estimation is small.
Next, we choose the criterion that minimizes the expected production cost. The design search algorithm finds the optimal design parameter to be $\hat{\tilde\delta}^{(ii)*} = 1.35$. Under this optimal design, the expected production cost (i.e., the expected number of corner lots to be produced) is E_{L̃1}(M(1.35, 0.15)) = 1.000. This means that one corner lot is expected to be sufficient. To evaluate the impact of the variance component estimation uncertainty, we use the same sampling approach as the
one previously used for the maximum-single-lot-probability criterion and obtain a 95%
confidence interval for δ̃ (ii)∗ , which is δ̃ (ii)∗ ∈ [1.23, 1.53]. This confidence interval is
also very narrow. Note that the confidence intervals for the optimal design parameters under the two optimal criteria overlap, which suggests that the two parameters are not statistically different. This explains why the optimal designs under the two criteria both suggest that a single corner lot is sufficient for satisfying the requirement of
components and tolerances. Typical semiconductor products have a tolerance ranging from 0.1 to 0.5 of the total process standard deviation. Using the standardized notation in (11), this means ε̃ ∈ [0.1, 0.5]. Also, according to (11), the individual variance components do not matter; only the ratio of the lot-to-lot variance to the sum of the wafer-to-wafer and within-wafer variances, i.e., σ̃L², matters. Typical products have σ̃L² ∈ [0.2, 2], which is the range we focus on in this study. Each combination of ε̃ and σ̃L² represents a type of product. For each type of product, we run the design search algorithm according to
the two proposed optimal criteria. Results associated with the maximum-single-lot-
probability criterion are shown in Tables 1-3. Specifically, Table 1 shows the optimal
design parameter δ̃ (i)∗ including a point estimate and a 95% confidence interval. Table
2 shows the probability for a single lot to consist of at least 0.15 proportion of skew
chips under the optimal design, i.e., P (f (L̃1 ; δ̃ (i)∗ ) > 0.15) including a point estimate
and a 95% confidence interval. Table 3 shows the percentage of increase in this
probability by comparing the optimal design and the target design with a p value
1) Table 1 shows that when ε̃ = 0.1, the optimal design parameter under the
to generate a corner lot with at least 0.15 proportion of skew chips. This is because
the tolerance, ε̃, is so small that it is very difficult for a chip to fall within this tolerance limit and be qualified as a skew chip. When the tolerance gets larger, i.e.,
2) At ε̃ = 0.2 and σ̃L² ∈ [0.2, 1.4], the optimal design parameter is zero, meaning that the optimal design is found to be the target design. At other settings of ε̃ and σ̃L², the optimal design is different from the target design. A general observation on the optimal design parameters in Table 1 is that at a fixed ε̃, δ̃(i)∗ increases as σ̃L²
increases. This makes sense because a larger σ̃L2 means a larger lot-to-lot variation,
in which case the optimal design would need to spread the means of the wafers in
the corner lot to a greater extent in order to protect against the larger variation. Also, at a fixed σ̃L², δ̃(i)∗ generally increases as ε̃ increases. The reason is that a larger tolerance makes it easier for a chip to be qualified as a skew chip, and therefore the optimal design could spread the means more to allow more chips inside the tolerance limits.
3) Focusing on products with ε̃ ≥ 0.3, i.e., when the optimal design is different
from the target design, Table 2 shows a high probability (0.885-1.000) for a single
lot to consist of at least 0.15 proportion of skew chips under the optimal design.
Table 3 shows that in terms of this probability, the optimal design has a statistically
significant improvement over the target design (p < 0.001). Greater improvement is
seen for products with larger lot-to-lot variation, i.e., σ̃L². For products with ε̃ = 0.2
and σ̃L2 ∈ [1.6, 2.0], although the optimal design still has a statistically significant
improvement over the target design (Table 3), the optimal design is unable to generate
a single lot with at least 0.15 proportion of skew chips with a high probability. This
is a joint effect of the small tolerance that makes it difficult for a chip to be qualified
as a skew chip and the large lot-to-lot variation that makes corner lot generation
the optimal design parameter is zero, meaning that the optimal design is found to
be the target design. Also, at a fixed ε̃, δ̃(ii)∗ increases as σ̃L² increases; at a fixed σ̃L², δ̃(ii)∗ generally increases as ε̃ increases. These trends are similar to those observed for the maximum-single-lot-probability criterion, and the same explanations apply.
5) Table 5 shows that for products with ε̃ ≥ 0.3, the expected number of corner
lots to be produced is very close to one. Table 6 shows that the optimal design has
a statistically significant improvement over the target design (p < 0.001) in terms
of reducing the expected number of corner lots. Greater improvement is seen for products with larger lot-to-lot variation, i.e., σ̃L².
6) Comparing the results under the two optimal criteria, we find similar trends in
terms of how the optimal design parameter varies with respect to σ̃L² and ε̃ (Tables 1
and 4). Comparing Tables 2 and 5, we can see that the higher the probability for a
single lot to consist of at least 0.15 proportion of skew chips (Table 2), the smaller the
expected number of corner lots that need to be produced (Table 5). This correlation
is intuitive and makes sense. Also, we observe significant improvement of the optimal
design compared with the target design over different products under both criteria
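(Tables 3 and 6). Despite the consistency of the results by the two criteria, they suit different situations: the maximum-single-lot-probability criterion is more appropriate if the resources and/or production timeline are so tight that only one corner lot can be produced, whereas the criterion that minimizes the expected production cost is more appropriate if having the required number of skew chips is more important, so that corner lots must be continuously produced until the required number is met. The former criterion is usually adopted for relatively more mature products, while the latter is adopted for new products with potentially high risk and high return.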
6. Conclusion
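In this paper, we formulated the problem of generating corner lots that are robust to process variation, investigated the theoretical properties of the formulation and its practical implications, and further developed a practical algorithm to identify the optimal design under two proposed optimal criteria. We also presented an application that demonstrated consistent improvement of the optimal design compared with the target design.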
key performance parameters of the chips in corner lot generation.
Appendix A
Proof of Theorem 1
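$$\begin{aligned}
E(f(L_1;\mu_1,\dots,\mu_m)) &= E\left[\frac{\sum_{j=1}^{m}\sum_{k=1}^{n}E[I[X_{1jk}]\mid L_1]}{mn}\right] = \frac{1}{m}\sum_{j=1}^{m}E[I[X_{1jk}]]\\
&= \frac{1}{m}\sum_{j=1}^{m}P(\mu_0-\epsilon\le X_{1jk}\le\mu_0+\epsilon).
\end{aligned}\tag{A-1}$$

Under (2), X1jk ∼ N(µj, σ²) with σ² = σL² + σW² + σM², so

$$E(f(L_1;\mu_1,\dots,\mu_m)) = \frac{1}{m}\sum_{j=1}^{m}\int_{\mu_0-\epsilon}^{\mu_0+\epsilon}\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left\{-\frac{(x-\mu_j)^2}{2\sigma^2}\right\}dx. \tag{A-2}$$

Based on the property of normal distributions, it is clear that (A-2) is maximized when µj = µ0 for j = 1, · · · , m: each integral is the probability that a normal random variable with mean µj and fixed variance σ² falls in the interval [µ0 − ε, µ0 + ε], and this probability is largest when the interval is centered at the mean. This proves Theorem 1.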
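Theorem 1 can also be checked numerically from (A-2). The sketch below evaluates E(f) for the target design and for randomly perturbed designs, using illustrative values µ0 = 0, ε = 0.3, σ = 1, and m = 24 that are not taken from the text; `expected_skew_fraction` is a hypothetical name.

```python
import numpy as np
from scipy.stats import norm


def expected_skew_fraction(mus, mu0, eps, sigma):
    """E(f) from (A-2), where X_1jk ~ N(mu_j, sigma^2) marginally."""
    mus = np.asarray(mus, dtype=float)
    return float(np.mean(norm.cdf((mu0 + eps - mus) / sigma)
                         - norm.cdf((mu0 - eps - mus) / sigma)))


mu0, eps, sigma, m = 0.0, 0.3, 1.0, 24   # illustrative values
target = expected_skew_fraction(np.full(m, mu0), mu0, eps, sigma)

rng = np.random.default_rng(1)
perturbed = [expected_skew_fraction(mu0 + rng.normal(0.0, 0.5, m), mu0, eps, sigma)
             for _ in range(1000)]
print(target, max(perturbed))  # the target design should give the larger value
```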
Proof of Theorem 2
The basic concept of the proof is to demonstrate that the target design, (µ1, · · · , µm)^T = (µ0, · · · , µ0)^T, is either a local maximum or a saddle point of Var(f(L1; µ1, · · · , µm)). Being a local maximum or saddle point naturally means that a design with a smaller variance than the target design exists.
Briefly, the proof consists of the following three steps: (i) we derive the first-order partial derivatives of Var(f(L1; µ1, · · · , µm)) and further show that they are equal to zero at the target design. (ii) (i) implies that the target design is a stationary point, i.e., either a local extremum or a saddle point of Var(f(L1; µ1, · · · , µm)). To determine which, we derive the second-order partial derivatives and further the Hessian matrix at the target design. The derivation shows that, under some condition, the Hessian matrix can either have both positive and negative eigenvalues (meaning that the target design is a saddle point) or all negative eigenvalues (meaning that the target design is a local maximum). In both cases, there is a design with a smaller variance than the target design. (iii) Finally, we derive the condition in (ii) for the Hessian matrix to have negative eigenvalues.
In what follows, we will present the detailed derivation in each of the three steps.
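We first introduce the standardized notation

$$\check\epsilon = \frac{\epsilon}{\sqrt{\sigma_W^2+\sigma_M^2}},\qquad \tilde\mu_j = \frac{\mu_j-\mu_0}{\sqrt{\sigma_W^2+\sigma_M^2}},\qquad \tilde L_1 = \frac{L_1}{\sqrt{\sigma_W^2+\sigma_M^2}} \sim N(0,\tilde\sigma_L^2),\qquad \tilde\sigma_L^2 = \frac{\sigma_L^2}{\sigma_W^2+\sigma_M^2}.$$

Then, the target design is (µ̃1, · · · , µ̃m)^T = (0, · · · , 0)^T. Using the new notation, (6) becomes (A-3):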
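$$f(L_1;\tilde\mu_1,\cdots,\tilde\mu_m) = \frac{1}{m}\sum_{j=1}^{m}\left[\phi(\tilde\mu_j+\tilde L_1+\check\epsilon)-\phi(\tilde\mu_j+\tilde L_1-\check\epsilon)\right] = \frac{1}{m}\sum_{j=1}^{m}\int_{-\check\epsilon}^{\check\epsilon}\frac{1}{\sqrt{2\pi}}\exp\left\{-\frac{(x-\tilde\mu_j-\tilde L_1)^2}{2}\right\}dx. \tag{A-3}$$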
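For notational simplicity, we will reuse µj, L1, σL², and ε to denote µ̃j, L̃1, σ̃L², and ε̌ in the subsequent derivation. Then,

$$\begin{aligned}
\mathrm{Var}(f(L_1;\mu_1,\cdots,\mu_m)) &= \mathrm{Var}\left(\frac{1}{m}\sum_{j=1}^{m}\int_{-\epsilon}^{\epsilon}\frac{1}{\sqrt{2\pi}}\exp\left\{-\frac{(x-\mu_j-L_1)^2}{2}\right\}dx\right)\\
&= \frac{1}{m^2}\sum_{j=1}^{m}\mathrm{Var}\left(\int_{-\epsilon}^{\epsilon}\frac{1}{\sqrt{2\pi}}\exp\left\{-\frac{(x-\mu_j-L_1)^2}{2}\right\}dx\right)\\
&\quad+\frac{2}{m^2}\sum_{i<j}\mathrm{Cov}\left(\int_{-\epsilon}^{\epsilon}\frac{1}{\sqrt{2\pi}}\exp\left\{-\frac{(x-\mu_i-L_1)^2}{2}\right\}dx,\ \int_{-\epsilon}^{\epsilon}\frac{1}{\sqrt{2\pi}}\exp\left\{-\frac{(x-\mu_j-L_1)^2}{2}\right\}dx\right)\\
&\triangleq \frac{1}{m^2}\sum_{j=1}^{m}g(\mu_j)+\frac{2}{m^2}\sum_{i<j}h(\mu_i,\mu_j).
\end{aligned}\tag{A-4}$$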
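Here,

$$g(\mu_j) = \mathrm{Var}\left(\int_{-\epsilon}^{\epsilon}\frac{1}{\sqrt{2\pi}}\exp\left\{-\frac{(x-\mu_j-L_1)^2}{2}\right\}dx\right),$$

$$h(\mu_i,\mu_j) = \mathrm{Cov}\left(\int_{-\epsilon}^{\epsilon}\frac{1}{\sqrt{2\pi}}\exp\left\{-\frac{(x-\mu_i-L_1)^2}{2}\right\}dx,\ \int_{-\epsilon}^{\epsilon}\frac{1}{\sqrt{2\pi}}\exp\left\{-\frac{(x-\mu_j-L_1)^2}{2}\right\}dx\right).$$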
(i) Derive the first-order partial derivatives of Var(f(L1; µ1, · · · , µm)) at the target design
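According to (A-4),

$$\begin{aligned}
\frac{\partial g(\mu_j)}{\partial\mu_j} &= \frac{\partial}{\partial\mu_j}\mathrm{Var}\left(\int_{-\epsilon}^{\epsilon}\frac{1}{\sqrt{2\pi}}\exp\left\{-\frac{(x-\mu_j-L_1)^2}{2}\right\}dx\right)\\
&= \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}\,\sigma_L}\exp\left\{-\frac{L_1^2}{2\sigma_L^2}\right\}\int_{-\epsilon}^{\epsilon}\exp\left\{-\frac{(x-\mu_j-L_1)^2}{2}\right\}dx\\
&\qquad\times\left(\exp\left\{-\frac{(-\mu_j-\epsilon-L_1)^2}{2}\right\}-\exp\left\{-\frac{(-\mu_j+\epsilon-L_1)^2}{2}\right\}\right)dL_1\\
&\quad-\frac{1}{\pi}\int_{-\epsilon}^{\epsilon}\frac{1}{1+\sigma_L^2}\exp\left\{-\frac{(x-\mu_j)^2}{2(1+\sigma_L^2)}\right\}\left(\exp\left\{-\frac{(\mu_j+\epsilon)^2}{2(1+\sigma_L^2)}\right\}-\exp\left\{-\frac{(\mu_j-\epsilon)^2}{2(1+\sigma_L^2)}\right\}\right)dx\\
&= (A-B)-(C-D),
\end{aligned}$$

where

$$\begin{aligned}
A &= \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}\,\sigma_L}\exp\left\{-\frac{L_1^2}{2\sigma_L^2}\right\}\int_{-\epsilon}^{\epsilon}\exp\left\{-\frac{(x-\mu_j-L_1)^2}{2}\right\}dx\,\exp\left\{-\frac{(-\mu_j-\epsilon-L_1)^2}{2}\right\}dL_1\\
&= \frac{1}{\pi\sqrt{1+2\sigma_L^2}}\int_{-\epsilon}^{\epsilon}\exp\left\{-\frac{(x-\mu_j)^2+(\mu_j+\epsilon)^2}{2}\right\}\exp\left\{\frac{\sigma_L^2(x-2\mu_j-\epsilon)^2}{2(1+2\sigma_L^2)}\right\}dx,\\
B &= \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}\,\sigma_L}\exp\left\{-\frac{L_1^2}{2\sigma_L^2}\right\}\int_{-\epsilon}^{\epsilon}\exp\left\{-\frac{(x-\mu_j-L_1)^2}{2}\right\}dx\,\exp\left\{-\frac{(-\mu_j+\epsilon-L_1)^2}{2}\right\}dL_1\\
&= \frac{1}{\pi\sqrt{1+2\sigma_L^2}}\int_{-\epsilon}^{\epsilon}\exp\left\{-\frac{(x-\mu_j)^2+(\mu_j-\epsilon)^2}{2}\right\}\exp\left\{\frac{\sigma_L^2(x-2\mu_j+\epsilon)^2}{2(1+2\sigma_L^2)}\right\}dx,\\
C &= \frac{1}{\pi}\int_{-\epsilon}^{\epsilon}\frac{1}{1+\sigma_L^2}\exp\left\{-\frac{(x-\mu_j)^2}{2(1+\sigma_L^2)}\right\}\exp\left\{-\frac{(\mu_j+\epsilon)^2}{2(1+\sigma_L^2)}\right\}dx,\\
D &= \frac{1}{\pi}\int_{-\epsilon}^{\epsilon}\frac{1}{1+\sigma_L^2}\exp\left\{-\frac{(x-\mu_j)^2}{2(1+\sigma_L^2)}\right\}\exp\left\{-\frac{(\mu_j-\epsilon)^2}{2(1+\sigma_L^2)}\right\}dx.
\end{aligned}$$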
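Similarly, differentiating h(µi, µj) with respect to its first argument gives

$$\begin{aligned}
\frac{\partial h(\mu_i,\mu_j)}{\partial\mu_i} &= \frac{1}{2\pi}\int_{-\epsilon}^{\epsilon}\frac{1}{\sqrt{1+2\sigma_L^2}}\exp\left\{-\frac{(1+\sigma_L^2)(\epsilon+\mu_i)^2+(1+\sigma_L^2)(x-\mu_j)^2+2\sigma_L^2(\epsilon+\mu_i)(x-\mu_j)}{2(1+2\sigma_L^2)}\right\}dx\\
&\quad-\frac{1}{2\pi}\int_{-\epsilon}^{\epsilon}\frac{1}{\sqrt{1+2\sigma_L^2}}\exp\left\{-\frac{(1+\sigma_L^2)(\epsilon-\mu_i)^2+(1+\sigma_L^2)(x+\mu_j)^2+2\sigma_L^2(\epsilon-\mu_i)(x+\mu_j)}{2(1+2\sigma_L^2)}\right\}dx\\
&\quad+\frac{1}{2\pi}\int_{-\epsilon}^{\epsilon}\frac{1}{1+\sigma_L^2}\exp\left\{-\frac{(x-\mu_j)^2}{2(1+\sigma_L^2)}\right\}\exp\left\{-\frac{(\epsilon-\mu_i)^2}{2(1+\sigma_L^2)}\right\}dx\\
&\quad-\frac{1}{2\pi}\int_{-\epsilon}^{\epsilon}\frac{1}{1+\sigma_L^2}\exp\left\{-\frac{(x-\mu_j)^2}{2(1+\sigma_L^2)}\right\}\exp\left\{-\frac{(\epsilon+\mu_i)^2}{2(1+\sigma_L^2)}\right\}dx.
\end{aligned}$$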
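At the target design, i.e., µ1 = · · · = µm = 0, we have A = B (by the substitution x → −x), C = D, and the first two and the last two terms of ∂h(µi, µj)/∂µi cancel pairwise; by symmetry, the same holds for the derivative with respect to µj. This shows that the first-order partial derivatives of Var(f(L1; µ1, · · · , µm)) at the target design are all zero.

(ii) Derive the second-order partial derivatives of Var(f(L1; µ1, · · · , µm)) at the target design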
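$$\begin{aligned}
\left.\frac{\partial^2 g(\mu_j)}{\partial\mu_j^2}\right|_{\mu_j=0} &= \frac{1}{\pi}\int_{-\epsilon}^{\epsilon}\exp\left\{-\frac{x^2+\epsilon^2}{2}\right\}\exp\left\{\frac{\sigma_L^2(x-\epsilon)^2}{2(1+2\sigma_L^2)}\right\}\frac{2(x-\epsilon)}{(1+2\sigma_L^2)^{3/2}}dx + \frac{1}{\pi}\int_{-\epsilon}^{\epsilon}\frac{2\epsilon}{(1+\sigma_L^2)^2}\exp\left\{-\frac{x^2+\epsilon^2}{2(1+\sigma_L^2)}\right\}dx\\
&= \frac{1}{\pi}\int_{-\epsilon}^{\epsilon}s(x)\psi(x)\,dx,
\end{aligned}\tag{A-6}$$

where

$$\psi(x) = \exp\left\{-\frac{x^2+\epsilon^2}{2(1+\sigma_L^2)}\right\},\qquad s(x) = \frac{2(x-\epsilon)}{(1+2\sigma_L^2)^{3/2}}\exp\left\{-\frac{x^2+\epsilon^2}{2}+\frac{\sigma_L^2(x-\epsilon)^2}{2(1+2\sigma_L^2)}+\frac{x^2+\epsilon^2}{2(1+\sigma_L^2)}\right\}+\frac{2\epsilon}{(1+\sigma_L^2)^2}.$$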
Furthermore,
and
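$$\left.\frac{\partial^2 h(\mu_i,\mu_j)}{\partial\mu_i\,\partial\mu_j}\right|_{\mu_i=0,\mu_j=0} = \frac{1}{\pi\sqrt{1+2\sigma_L^2}}\left(\exp\left\{-\frac{\epsilon^2}{1+2\sigma_L^2}\right\}-\exp\{-\epsilon^2\}\right). \tag{A-8}$$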
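Because a closed form such as (A-8) is easy to get wrong, a numerical cross-check may help. The sketch below computes h(µi, µj) in the standardized notation of (A-3) by Gauss-Hermite quadrature over L1 (NumPy's `hermegauss`, which uses the probabilists' weight exp(−z²/2)), takes a central finite-difference mixed derivative at the target design, and compares it with (A-8). The values ε = 0.4, σL² = 0.5, and step d = 10⁻³ are illustrative, and the helper names are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss
from scipy.stats import norm

NODES, WEIGHTS = hermegauss(80)  # probabilists' Hermite: weight exp(-z^2 / 2)


def lot_mean(fun, sigma_l2):
    """E[fun(L1)] for L1 ~ N(0, sigma_l2) by Gauss-Hermite quadrature."""
    return np.sum(WEIGHTS * fun(np.sqrt(sigma_l2) * NODES)) / np.sqrt(2.0 * np.pi)


def wafer_fraction(mu, l1, eps):
    """Standardized per-wafer term of (A-3): phi(mu + L1 + eps) - phi(mu + L1 - eps)."""
    return norm.cdf(mu + l1 + eps) - norm.cdf(mu + l1 - eps)


def h(mu_i, mu_j, eps, sigma_l2):
    """h(mu_i, mu_j): covariance of two wafer terms, as defined after (A-4)."""
    e_ij = lot_mean(lambda l: wafer_fraction(mu_i, l, eps) * wafer_fraction(mu_j, l, eps), sigma_l2)
    e_i = lot_mean(lambda l: wafer_fraction(mu_i, l, eps), sigma_l2)
    e_j = lot_mean(lambda l: wafer_fraction(mu_j, l, eps), sigma_l2)
    return e_ij - e_i * e_j


eps, sigma_l2, d = 0.4, 0.5, 1e-3   # illustrative values
numeric = (h(d, d, eps, sigma_l2) - h(d, -d, eps, sigma_l2)
           - h(-d, d, eps, sigma_l2) + h(-d, -d, eps, sigma_l2)) / (4.0 * d * d)
closed = (np.exp(-eps**2 / (1 + 2 * sigma_l2)) - np.exp(-eps**2)) / (np.pi * np.sqrt(1 + 2 * sigma_l2))
print(numeric, closed)
```

The same quadrature-plus-finite-difference approach can be pointed at g(µj) to cross-check (A-6).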
Using the results in (A-6),(A-7), and (A-8), the Hessian matrix for V ar(f (L1 ; µ1 , · · · , µm ))
at µi = 0, i = 1, · · · , m is:
a c c ··· c
c a c ··· c
··· c , (A-9)
c c a
.. .. .. . . ..
. . . . .
c c c ··· a
24
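where
$$a = \frac{1}{m^2}\left.\frac{\partial^2 g(\mu_j)}{\partial\mu_j^2}\right|_{\mu_j=0} + \frac{2(m-1)}{m^2}\left.\frac{\partial^2 h(\mu_i,\mu_j)}{\partial\mu_j^2}\right|_{\mu_i=0,\mu_j=0},$$
$$c = \frac{2}{\pi\sqrt{1+2\sigma_L^2}\,m^2}\left(\exp\left\{-\frac{\epsilon^2}{1+2\sigma_L^2}\right\}-\exp\{-\epsilon^2\}\right).$$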
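A matrix of the form (A-9) has a simple eigenstructure: $(1,\ldots,1)^T$ is an eigenvector with eigenvalue $a+(m-1)c$, and every vector whose components sum to zero is an eigenvector with eigenvalue $a-c$ (multiplicity $m-1$). Consequently $\mu^T H\mu = a\|\mu\|^2 + c\left[(\sum_i\mu_i)^2-\|\mu\|^2\right]$, which equals $(a-c)\|\mu\|^2$ when $\sum_i\mu_i=0$. The sketch below illustrates this with plain arithmetic; the values $a=-1$, $c=0.5$, $m=4$ are made up and not derived from the paper, and the helpers `hessian`, `matvec`, `classify`, and `quadratic_form` are hypothetical. The classification follows the standard second-derivative test and needs $m\ge 2$.

```python
def hessian(a, c, m):
    """m x m matrix with a on the diagonal and c elsewhere, as in (A-9)."""
    return [[a if i == j else c for j in range(m)] for i in range(m)]


def matvec(H, v):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(h_ij * v_j for h_ij, v_j in zip(row, v)) for row in H]


def classify(a, c, m):
    """Second-derivative test at a stationary point whose Hessian has the form (A-9)."""
    lam1, lam2 = a + (m - 1) * c, a - c  # lam2 has multiplicity m - 1
    if lam1 == 0 or lam2 == 0:
        return "inconclusive"
    if lam1 < 0 and lam2 < 0:
        return "local maximum"
    if lam1 > 0 and lam2 > 0:
        return "local minimum"
    return "saddle point"


def quadratic_form(H, mu):
    """mu^T H mu."""
    return sum(m_i * hv_i for m_i, hv_i in zip(mu, matvec(H, mu)))


# Illustrative (made-up) values, not taken from the paper.
a, c, m = -1.0, 0.5, 4
H = hessian(a, c, m)
print(matvec(H, [1.0] * m))               # equals (a + (m - 1)c) * (1, ..., 1)
print(matvec(H, [1.0, -1.0, 0.0, 0.0]))   # equals (a - c) * (1, -1, 0, 0)
print(classify(a, c, m))                  # "saddle point": lambda_1 = 0.5 > 0, lambda_2 = -1.5 < 0
mu = [0.3, -0.1, -0.4, 0.2]               # components sum to zero
print(quadratic_form(H, mu), (a - c) * sum(x * x for x in mu))
```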
By the second-derivative test, if $\lambda_1 > 0$ and $\lambda_2 < 0$, then $(0,\ldots,0)^T$ is a saddle point of $\mathrm{Var}(f(L_1;\mu_1,\ldots,\mu_m))$; if $\lambda_1 < 0$ and $\lambda_2 < 0$, then $(0,\ldots,0)^T$ is a local maximum of $\mathrm{Var}(f(L_1;\mu_1,\ldots,\mu_m))$. Next, we will derive a sufficient
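$$a - c \le \frac{2\epsilon^2}{15m\pi(1+2\sigma_L^2)^{5/2}(1+\sigma_L^2)^4}\left(30a_1 + 20a_2\epsilon^2 + 7a_3\epsilon^4\right),$$
where
$$a_1 = -(1+2\sigma_L^2)(1+\sigma_L^2)^5 + (1+2\sigma_L^2)^{5/2}(1+\sigma_L^2)^2,$$
$$a_2 = -(1+2\sigma_L^2)^{5/2}(1+\sigma_L^2) + \left(\frac{3}{2}\sigma_L^4 + 2\sigma_L^2 + 1 + \frac{3}{8m}\right)(1+\sigma_L^2)^4,$$
$$a_3 = (1+2\sigma_L^2)^{5/2}.$$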
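Since $\sigma_L^2 > 0$, it is easy to verify that $a_1 < 0$, $a_2 > 0$, and $a_3 > 0$. Because the factor multiplying $(30a_1 + 20a_2\epsilon^2 + 7a_3\epsilon^4)$ in the bound on $a-c$ is positive, $\lambda_2 = a - c < 0$ holds whenever $30a_1 + 20a_2\epsilon^2 + 7a_3\epsilon^4 < 0$. Viewed as a quadratic in $\epsilon^2$, this expression has exactly one positive root, since $a_1 < 0$ and $a_3 > 0$ make the product of its roots negative. Thus, a sufficient condition for $\lambda_2 < 0$ is
$$\epsilon^2 \le \frac{-20a_2 + \sqrt{400a_2^2 - 840a_1a_3}}{14a_3}. \qquad \text{(A-10)}$$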
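The sign claims for $a_1$, $a_2$, $a_3$ and the root in (A-10) can be checked numerically. This is a minimal sketch, assuming $m=24$ (the value used in Figures 1 and 2) and a few illustrative values of $\sigma_L^2$; the helpers `coefficients`, `eps2_bound`, and `bracket` are hypothetical names.

```python
import math


def coefficients(s2L, m):
    """a1, a2, a3 as defined above, for sigma_L^2 = s2L and m chip groups."""
    k, s = 1 + 2 * s2L, 1 + s2L
    a1 = -k * s ** 5 + k ** 2.5 * s ** 2
    a2 = -k ** 2.5 * s + (1.5 * s2L ** 2 + 2 * s2L + 1 + 3 / (8 * m)) * s ** 4
    a3 = k ** 2.5
    return a1, a2, a3


def eps2_bound(s2L, m):
    """Positive root of 7 a3 t^2 + 20 a2 t + 30 a1 = 0, i.e. the bound in (A-10)."""
    a1, a2, a3 = coefficients(s2L, m)
    return (-20 * a2 + math.sqrt(400 * a2 ** 2 - 840 * a1 * a3)) / (14 * a3)


def bracket(t, s2L, m):
    """30 a1 + 20 a2 t + 7 a3 t^2 with t = eps^2."""
    a1, a2, a3 = coefficients(s2L, m)
    return 30 * a1 + 20 * a2 * t + 7 * a3 * t ** 2


for s2L in (0.1, 0.5, 1.0, 6.0):
    a1, a2, a3 = coefficients(s2L, m=24)
    t_star = eps2_bound(s2L, m=24)
    # Signs claimed in the text; the bracket is negative for 0 <= t < t_star.
    print(s2L, a1 < 0, a2 > 0, a3 > 0, t_star, bracket(0.5 * t_star, s2L, 24) < 0)
```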
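Substituting the original notation back, we obtain
$$\check\epsilon^2 \le \frac{-20a_2 + \sqrt{400a_2^2 - 840a_1a_3}}{14a_3}, \qquad \text{(A-11)}$$
where
$$a_1 = -(1+2\tilde\sigma_L^2)(1+\tilde\sigma_L^2)^5 + (1+2\tilde\sigma_L^2)^{5/2}(1+\tilde\sigma_L^2)^2,$$
$$a_2 = -(1+2\tilde\sigma_L^2)^{5/2}(1+\tilde\sigma_L^2) + \left(\frac{3}{2}\tilde\sigma_L^4 + 2\tilde\sigma_L^2 + 1 + \frac{3}{8m}\right)(1+\tilde\sigma_L^2)^4,$$
$$a_3 = (1+2\tilde\sigma_L^2)^{5/2}.$$
Also note that
$$\tilde\epsilon^2 = \frac{\check\epsilon^2}{1+\tilde\sigma_L^2},$$
so (A-11) becomes
$$\tilde\epsilon^2 \le \frac{-20a_2 + \sqrt{400a_2^2 - 840a_1a_3}}{14a_3(1+\tilde\sigma_L^2)}. \qquad \text{(A-12)}$$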
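To screen a product described by $(\tilde\epsilon, \tilde\sigma_L^2)$ against (A-12), one can evaluate the right-hand side directly. The sketch below is illustrative only: it assumes $m = 24$ chip groups (the value used in Figures 1 and 2), and `eps_check_bound`, `eps_tilde_bound`, and `condition_holds` are hypothetical helper names. Because (A-12) is only sufficient, a product that fails the check is not thereby shown to have $\lambda_2 \ge 0$.

```python
import math


def coefficients(s2, m):
    """a1, a2, a3 of (A-11), written in terms of sigma-tilde_L^2 = s2."""
    k, s = 1 + 2 * s2, 1 + s2
    a1 = -k * s ** 5 + k ** 2.5 * s ** 2
    a2 = -k ** 2.5 * s + (1.5 * s2 ** 2 + 2 * s2 + 1 + 3 / (8 * m)) * s ** 4
    a3 = k ** 2.5
    return a1, a2, a3


def eps_check_bound(s2, m):
    """Right-hand side of (A-11): upper bound on eps-check^2."""
    a1, a2, a3 = coefficients(s2, m)
    return (-20 * a2 + math.sqrt(400 * a2 ** 2 - 840 * a1 * a3)) / (14 * a3)


def eps_tilde_bound(s2, m):
    """Right-hand side of (A-12), using eps-tilde^2 = eps-check^2 / (1 + s2)."""
    return eps_check_bound(s2, m) / (1 + s2)


def condition_holds(eps_tilde, s2, m):
    """True if (eps_tilde, s2) satisfies the sufficient condition (A-12)."""
    return eps_tilde ** 2 <= eps_tilde_bound(s2, m)


for s2 in (0.5, 1.0, 2.0, 6.0):
    # Largest eps-tilde that (A-12) certifies for this lot-to-lot variance.
    print(s2, math.sqrt(eps_tilde_bound(s2, m=24)))
```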
The proof of Theorem 2 shows that the Hessian matrix in (A-9) at the target design has one eigenvalue $\lambda_1 = a + (m-1)c$ and $m-1$ other identical eigenvalues, i.e.,
Here, $H$ is the Hessian matrix in (A-9). In (A-13), the first-order term is zero because $\left.\partial\,\mathrm{Var}(f(L_1;\mu_1,\ldots,\mu_m))/\partial\mu_i\right|_{\mu=0} = 0$, according to the derivation of Theorem 2. The second-
$$\mu^T H\mu = \|\mu\|^2\left(\sum_{i=2}^{m}\tau_i e_i\right)^T H\sum_{i=2}^{m}\tau_i e_i = \|\mu\|^2\sum_{i=2}^{m}\tau_i^2 e_i^T H e_i = \|\mu\|^2(a-c) < 0, \qquad \text{(A-14)}$$
where $\|\mu\|^2 = \sum_{i=1}^{m}\mu_i^2$. Therefore, $\mathrm{Var}(f(L_1;\mu_1,\ldots,\mu_m)) < \mathrm{Var}(f(L_1;0,\ldots,0))$.
This proves (i) in Corollary 2.1. Moreover, because the m − 1 eigenvalues are the
Appendix B
We now discuss practical aspects of how corner lots are generated according to the optimal design. After the product division receives a
process recipe so as to produce half of the wafers with average chip performance (e.g., circuit frequency) equal to $\mu_0 + \delta^*$ and the other half equal to $\mu_0 - \delta^*$ in each corner lot.
mature manufacturing process in the sense that there are well-established empirical
and physical models (Gray et al., 2009) for guiding the manipulation of recipes to
To achieve a desired average circuit frequency for the chips on a wafer, e.g., to
N-type and P-type MOSFET device currents, $I_{DN}$ and $I_{DP}$, i.e., $\mu = f(I_{DN}, I_{DP})$.
Using this model, we can identify the specific $I_{DN}^*$ and $I_{DP}^*$ that achieve the desired $\mu_0 + \delta^*$. Next, to decide which process parameter settings lead to $I_{DN}^*$ and $I_{DP}^*$, two well-known physical models are available, i.e., $I_{DN} = \frac{1}{2}\mu_n C_{ox}\frac{W}{L}(V_{GS} - V_{TN})^2$ and $I_{DP} = \frac{1}{2}\mu_p C_{ox}\frac{W}{L}(V_{SG} - V_{TP})^2$. Here, $\mu_n$ and $\mu_p$ are the electron (N) and hole (P) carrier
type and P-type transistors. In theory, any of these process parameters can be modulated to achieve the desired $I_{DN}^*$ and $I_{DP}^*$. In practice, some are easier to modulate than others, and which parameter(s) to modulate for a particular product is known from process design. There are also detailed recipes and guidelines on how to adjust
design knowledge exists to ensure that the average chip performance recommended by the optimal design can be achieved in each corner lot. This can be done with high precision, except when the adjustment required on certain equipment is finer than the smallest physically possible adjustment; such fine adjustments are rarely needed in practice. Despite the high precision of recipe manipulation, it would still be of practical interest to study how small deviations of the recipe from the desired level affect the achievement of the optimal criteria, which is a direction we would like to pursue in future research.
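Because the prefactor of the square-law model is the same for the nominal and the adjusted recipe, the ratio of drain currents depends only on the gate overdrive, $I^*/I_{nom} = \left((V_{GS}-V_T^*)/(V_{GS}-V_{T,nom})\right)^2$. The sketch below uses this to back out the threshold voltage that would produce a target current. It assumes that only the threshold voltage is modulated (carrier mobility, $C_{ox}$, $W/L$, and $V_{GS}$ held fixed), which is one illustrative choice of parameter to modulate and not the paper's procedure; the helper name `threshold_for_target_current` and all numbers are made up. For the P-type device, $V_{SG}$ and $V_{TP}$ take the place of $V_{GS}$ and $V_{TN}$.

```python
import math


def threshold_for_target_current(i_target, i_nominal, v_gs, v_t_nominal):
    """Hypothetical helper: threshold voltage that moves the saturation drain current
    from i_nominal to i_target under the square-law model, with mobility, C_ox, W/L and
    V_GS held fixed, so that the prefactor cancels in the current ratio."""
    overdrive = v_gs - v_t_nominal
    if overdrive <= 0 or i_target <= 0 or i_nominal <= 0:
        raise ValueError("expects a device in saturation with positive currents")
    return v_gs - overdrive * math.sqrt(i_target / i_nominal)


# Illustrative numbers only: raise the N-type drive current by 10% at V_GS = 1.0 V, V_TN = 0.4 V.
v_t_new = threshold_for_target_current(1.1, 1.0, 1.0, 0.4)
print(round(v_t_new, 4))  # 0.3707
```

Lowering the threshold voltage increases the overdrive and hence the current, which is why the target of $+10\%$ yields a threshold below the nominal 0.4 V.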
References
[2] Brent, R.P., Algorithms for Minimization without Derivatives, Dover Publications, New York (2013).
[3] Bao, L., Wang, K., Jin, R., A hierarchical model for characterising spatial wafer variations, International Journal of Production Research, Vol. 52, No. 6, 1827-1842 (2014).
[4] Diebold, A.C., Handbook of Silicon Semiconductor Metrology, CRC Press (2001).
[5] Drain, D.C., Statistical Methods for Industrial Process Control, CRC Press (1997).
[6] Gough, A.M., Semiconductor Sample Generation Experimental Designs Robust to Random Process Shocks, Technical report (2014).
[7] Gray, P.R., Hurst, P.J., Lewis, S.H., Meyer, R.G., Analysis and Design of Analog Integrated Circuits, 5th Edition, Wiley (2009).
[8] Fenner, J.S., Jeong, Y.S., Jeong, M.K., Lu, J.C., A Bayesian parallel site methodology with an application to uniformity modeling in semiconductor manufacturing, IIE Transactions, Vol. 41, No. 9, 754-763 (2009).
[10] Jin, R., Shi, J., Reconfigured piecewise linear regression tree for multistage manufacturing process control, IIE Transactions, Vol. 44, No. 4, 249-261 (2012).
[11] Jin, R., Liu, K., Multimode Variation Modeling and Process Monitoring for Serial-Parallel Multistage Manufacturing Processes, IIE Transactions, Vol. 45, No. 6, 617-629 (2013).
[12] Jin, M., Tsung, F., Smith-EWMA run-to-run control schemes for a process with measurement delay, IIE Transactions, Vol. 41, No. 4, 346-358 (2009).
[14] Montgomery, D.C., Introduction to Statistical Quality Control, 7th Edition, John Wiley & Sons (2012).
[15] Montgomery, D.C., Design and Analysis of Experiments, 8th Edition, John Wiley & Sons (2012).
[16] Myers, R.H., Khuri, A.I., Vining, G., Response surface alternatives to the Taguchi robust parameter design approach, The American Statistician, Vol. 46, No. 2, 131-139 (1992).
[18] Nakagawa, O.S., Chang, N., Lin, S., et al., Circuit impact and skew-corner analysis of stochastic process variation in global interconnect, IEEE International Interconnect Technology Conference, 230-232 (1999).
[19] Reda, S., Nassif, S.R., Accurate Spatial Estimation and Decomposition Techniques for Variability Characterization, IEEE Transactions on Semiconductor Manufacturing, Vol. 23, No. 3, 345-357 (2010).
[20] Tseng, S.T., Tsung, F., Liu, P.Y., Variable EWMA run-to-run controller for drifted processes, IIE Transactions, Vol. 39, No. 3, 291-301 (2007).
[21] Tseng, S.T., Jou, B.Y., Liao, C.H., Adaptive variable EWMA controller for drifted processes, IIE Transactions, Vol. 42, No. 4, 247-259 (2010).
[22] Wang, C.H., Kuo, W., Bensmail, H., Detection and classification of defect patterns on semiconductor wafers, IIE Transactions, Vol. 38, No. 12, 1059-1068 (2006).
[23] Wang, D.T., McNall, W.A., Statistical Model based ASIC Skew Selection Method, IEEE Workshop on Microelectronics and Electron Devices, 64-66 (2004).
[24] Wang, K., Han, K., A batch-based run-to-run process control scheme for semiconductor manufacturing, IIE Transactions, Vol. 45, No. 6, 658-669 (2013).
[25] Weste, N.H.E., Harris, D.M., CMOS VLSI Design: A Circuits and Systems Perspective, 4th Edition, Addison-Wesley (2011).
[27] Yeh, A.B., Huwang, L., Wu, C.W., A multivariate EWMA control chart for monitoring process variability with individual observations, IIE Transactions, Vol. 37, No. 11, 1023-1035 (2005).
[28] Yuan, T., Kuo, W., A model-based clustering approach to the recognition of the spatial defect patterns produced during semiconductor fabrication, IIE Transactions, Vol. 40, No. 2, 93-101 (2007).
[29] Yu, J., Qin, S.J., Variance component analysis based fault diagnosis of multi-layer overlay lithography processes, IIE Transactions, Vol. 41, No. 9, 764-775 (2009).
[30] Zou, C., Tsung, F., Wang, Z., Monitoring General Linear Profiles Using Multivariate Exponentially Weighted Moving Average Schemes, Technometrics, Vol. 49, No. 4 (2007).
[31] Zou, C., Tsung, F., Wang, Z., Monitoring Profiles Based on Nonparametric Regression Methods, Technometrics, Vol. 50, No. 4, 512-526 (2008).
Table 1: The optimal design parameter $\tilde\delta^{(i)*}$ (point estimate and 95% confidence interval) under the maximum-single-lot-probability criterion for different combinations of $\tilde\epsilon$ and $\tilde\sigma_L^2$ (i.e., different products).
Table 2: The probability that a single lot consists of a proportion of at least 0.15 skew chips under the optimal design in Table 1, i.e., $P(f(\tilde L_1;\tilde\delta^{(i)*}) > 0.15)$ (point estimate and 95% confidence interval).
Table 3: The percentage improvement of the optimal design over the target design, i.e., $[P(f(\tilde L_1;\tilde\delta^{(i)*}) > 0.15) - P(f(\tilde L_1;0) > 0.15)]/P(f(\tilde L_1;0) > 0.15)$, with the p-value indicating the statistical significance of the improvement.
Table 4: The optimal design parameter $\tilde\delta^{(ii)*}$ (point estimate and 95% confidence interval) under the minimum-expected-cost criterion for different combinations of $\tilde\epsilon$ and $\tilde\sigma_L^2$ (i.e., different products).
Table 5: The expected production cost of corner lots under the optimal design in Table 4, i.e., $E_{\tilde L_1}(M(\tilde\delta^{(ii)*}, 0.15))$ (point estimate and 95% confidence interval).
Table 6: The percentage reduction in expected production cost of the optimal design compared with the target design, i.e., $[E_{\tilde L_1}(M(0, 0.15)) - E_{\tilde L_1}(M(\tilde\delta^{(ii)*}, 0.15))]/E_{\tilde L_1}(M(0, 0.15))$, with the p-value indicating the statistical significance of the reduction.
Figure 1: Histogram of $f(L_1;\mu_1,\ldots,\mu_m)$ under the target design $(\mu_1,\ldots,\mu_m)^T = (\mu_0,\ldots,\mu_0)^T$, with $m = 24$, $\sigma_W^2 = 2$, $\sigma_M^2 = 3$, $\sigma_L^2 = 6$, $\epsilon = 0.5\sigma$, and $\mu_0 = \mu_{nominal} + 3\sigma$.
Figure 2: Histogram of $f(L_1;\mu_1,\ldots,\mu_m)$ under an optimal design, i.e., the design under the maximum-single-lot-probability criterion proposed in Section 4: $(\mu_1,\ldots,\mu_m)^T = (\mu_1^*,\ldots,\mu_m^*)^T$, with $m = 24$, $\sigma_W^2 = 2$, $\sigma_M^2 = 3$, $\sigma_L^2 = 6$, $\epsilon = 0.5\sigma$, $\mu_0 = \mu_{nominal} + 3\sigma$, $\mu_i^* = \mu_0 - 3.89$ for
Figure 3: Steps of the optimal design search algorithm.