Corner Lot PDF

This document discusses generating robust semiconductor corner lots that can withstand process variation. It proposes a new approach of spreading out the means of key performance parameters rather than targeting a single extreme value. This treats process variation as fixed and aims to guarantee producing corner lots with a minimum number of skew chips. The document presents the first mathematical formulation of this problem and investigates theoretical properties to develop optimal design criteria and search algorithms. It demonstrates universal improvement over traditional industry practices.

Uploaded by

sureshchattu
Copyright
© © All Rights Reserved

Semiconductor Corner Lot Generation Robust to Process Variation:

Modeling and Analysis

Abstract

Product characterization is an important phase in developing new semiconductors.

The goal is to determine if the new product will function when produced under the

extreme edge of fabrication variation; if not, the product might be considered to

have insufficient design margin, necessitating circuit redesign. Achieving this goal

requires producing a so-called corner lot that consists of skew chips, i.e., chips whose

key performance parameters are expected to be around certain targeted extreme

values. These skew chips will be extensively tested to determine if their functions

still meet specifications. However, due to extensive variation in the fabrication

process, few skewed chips can be guaranteed in a produced corner lot, and this is

a long-standing frustration in the semiconductor industry. One approach to produce

a satisfactory corner lot is through variation reduction of the fabrication process.

Despite being a popular research area, variation reduction is a long-term effort that

involves both technical and managerial considerations. We approach this problem

from a different avenue by treating process variation as given and instead identifying a

design strategy that guarantees production of a good corner lot robust to the variation.

Specifically, we propose a first-of-its-kind rigorous mathematical formulation of

this problem, investigate the theoretical properties and practical implications of this

formulation, and further propose several optimal criteria and a corresponding design

search algorithm. Applications for a broad range of semiconductor products are

presented to demonstrate the universal improvement of the proposed optimal design

compared with the traditional design used in current industrial practice.

Keywords: semiconductor manufacturing, product characterization, variation

reduction

1. Introduction

A key milestone in the development of a new integrated circuit product is

completion of the design process and transfer of tooling to the fabrication plant to

produce initial lots of the new product. Chips in these initial lots are extensively

tested and analyzed to determine if the design meets specifications. Product

characterization is an important step in determining if the new product will be robust

to long-term fabrication variation. Since the product development time course is much

shorter (months) than the lifetime of the product (10 years or more), variation in the

initial lots of the new product will be much smaller than the long-term fabrication

variation. This poses a risk that the product may not perform well in the long

term. When performing product characterization, a longstanding industry practice

is to produce a corner lot (Weste and Harris, 2011; Automotive Electronics Council,

2013), which is a lot whose recipe is manipulated to try to achieve extreme values of

the long-term fabrication variation for key performance parameters of the product,

such as leakage current, circuit frequency, and operating voltage (May and Spanos,

2006).

Chips in the corner lot with performance parameters within a small tolerance

around the targeted extreme values are called skew chips. These chips will be

extensively tested and analyzed to see if they can still meet product specifications.

If they do not, the new product may be considered to lack sufficient design margin.

This will result in certain circuits being redesigned. Both the product division, which

developed the product, and the fabrication factory, which will manufacture it, have an

interest in the corner lot because a new product that is robust to long-term fabrication

variation will be cheaper and easier to produce, benefitting both the product division

and fabrication factory.

To produce a corner lot, the common industrial practice is to make sure the mean

of the key performance parameter of all the chips in the corner lot is equal to the

targeted extreme value (Nakagawa et al., 1999; Wang et al., 2004; Gough, 2014).

Precise control of the mean of a lot is not hard in semiconductor manufacturing

because this industry is mature, with well-established process recipes for engineers

to follow. However, there is no control over the variance of the key performance

parameter. The variance is typically large because a semiconductor fabrication line

involves hundreds of process steps. Each step has its own inherent variation.

The step-wise variation accumulates and propagates, leading to large variance in the

key performance parameter of the chips at the end of the fabrication line. As a

consequence, when a corner lot is being produced according to the process recipe,

the variation will act as a random shock that moves the key performance parameter

of the chips away from the targeted value. Therefore, it is a common frustration in

the current industrial practice that few skew chips can be guaranteed in a produced

corner lot. This creates tremendous difficulty for subsequent design evaluation and

product characterization. When this happens, additional corner lots will have to be

produced, which will increase the lead time and cost of new product development.

One approach to produce a good corner lot is through variation reduction of

the fabrication process. There is abundant existing work in variation reduction in

semiconductor manufacturing, which can be categorized into four major sub-areas.

Next, we briefly review each sub-area:

Wafer defect detection and classification: Recognizing the defect patterns is the

first step toward process variation reduction. Research in this sub-area generally

develops pattern recognition and statistical models to detect a specific defect or

classify different defect types. For example, Wang et al. (2006) proposed an approach

comprising a spatial filter, a classification module, and an estimation module to

distinguish three types of defect patterns. Yuan and Kuo (2007) proposed a model-

based clustering algorithm for spatial defect recognition on semiconductor wafers.

Bao et al. (2014) proposed to decompose the thickness variation of wafers into macro

and micro-scale variations modelled as a cubic curve and first-order intrinsic Gaussian

Markov random field, respectively.

Process monitoring: In semiconductor manufacturing, the process and data have

some unique characteristics that drive the development of new control charts. For

example, Yeh et al. (2005) proposed a multivariate EWMA control chart for

monitoring the critical dimensions of dies sampled from different sites on individual

wafers. Zou et al. (2007) proposed a multivariate EWMA control chart to monitor

the profiles of the DRIE process and detect changes. Zou et al. (2008) relaxed the

linear assumption of the previous approach and proposed a non-linear, non-parametric

method for profile monitoring.

Fault root cause diagnosis: To enable root cause diagnosis, one needs to build

a model between the quality variables and process parameters such that quality

problems can be traced back to the process. Fenner et al. (2009) proposed a

Bayesian parallel site model to link wafer deposition uniformity measures to key

process parameters. Jin and Liu (2013) proposed the use of piecewise linear regression

trees to identify multiple variation propagation modes in multistage semiconductor

manufacturing, and then link quality variables at each stage with upstream quality,

process, and material variables. Yu and Qin (2009) tackled the fault diagnosis of

multi-layer overlay lithography processes using a multistage state space model.

Automatic control: In semiconductor manufacturing, run-to-run (R2R) process control is an

important tool for stabilizing process output, reducing variation, and improving

quality. Recent developments include a batch EWMA controller that considers both

batch information and feedback quality information (Wang and Han, 2013), a Smith-

EWMA controller that ensures stability in the presence of serious material delay (Jin

and Tsung, 2009), and variable EWMA controllers for drifted processes (Tseng et al.,

2007; Tseng et al., 2008). In addition to R2R control, Jin and Shi (2012) developed

multistage feedforward control methods in semiconductor manufacturing.

Although it is an effective approach, variation reduction is a long-term effort that

involves thorough process investigation, data analytics, and interventions complicated

by technical and non-technical considerations. The present study has a different

focus: we treat process variation as given and aim to produce corner lots robust

to the variation. Our study is related to, but different from, robust parameter design

(Myers et al., 1992; Myers et al., 2016). Here, “robustness” means producing corner

lots that contain a guaranteed number of skew chips. The basic idea of our proposed

approach is that since process variation is treated as fixed, the robustness can be

achieved by spreading out the mean. That is, instead of setting the mean of a key

performance parameter of all the chips in the corner lot to be the same value (i.e., the

targeted extreme value), we can set the means of some chips to be below and others

to be above the extreme value. Compared with variation reduction, which is a relatively

long-term effort, this study serves an immediate need in new product development.

However, despite the practical value of this study goal, there is neither a rigorous

mathematical formulation of the problem nor any theoretical investigations or

practical algorithms to guide the search for the best mean-spread-out strategy to

achieve robustness. This study aims to bridge the existing gap.

The rest of the paper is organized as follows: Section 2 presents the detailed

mathematical formulation of the problem; Section 3 studies theoretical properties

of the formulation and their practical implications; Section 4 proposes two optimal

criteria to guide the search for the best mean-spread-out strategy, called the “optimal

design”, and a design search algorithm; Section 5 presents the application; Section

6 is the conclusion. Note that the “optimal design” in the context of this paper has

a different meaning from that in experimental designs (Montgomery, 2012). Finally,

we would like to point out that this paper focuses on development of the optimal

design, not on how to generate corner lots according to the optimal design after it

is developed. The latter, nevertheless, is briefly discussed in Appendix B for readers

who are interested in the implementation.

2. Problem formulation

2.1 Nomenclature

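$X_{ijk}$    Performance parameter (e.g., circuit frequency) for the $k$th chip on the $j$th wafer in the $i$th lot

$\mu_j$    Mean performance parameter of the $j$th wafer

$L_i, W_{ij}, M_{ijk}$    Variance components of $X_{ijk}$ from the lot, wafer, and chip levels: $L_i \sim N(0, \sigma_L^2)$, $W_{ij} \sim N(0, \sigma_W^2)$, $M_{ijk} \sim N(0, \sigma_M^2)$

$\sigma^2$    Total variance of $X_{ijk}$, $\sigma^2 = \sigma_L^2 + \sigma_W^2 + \sigma_M^2$

$\epsilon$    A pre-defined tolerance

$m$    Number of wafers in a corner lot

$n$    Number of chips on a wafer

$\delta$    Corner lot design parameter

$\tilde{\epsilon}, \tilde{\delta}, \tilde{\sigma}_L^2, \tilde{L}_1$    Standardized parameters: $\tilde{\epsilon} = \frac{\epsilon}{\sqrt{\sigma_W^2 + \sigma_M^2 + \sigma_L^2}}$, $\tilde{\delta} = \frac{\delta}{\sqrt{\sigma_W^2 + \sigma_M^2}}$, $\tilde{\sigma}_L^2 = \frac{\sigma_L^2}{\sigma_W^2 + \sigma_M^2}$, $\tilde{L}_1 \sim N(0, \tilde{\sigma}_L^2)$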

2.2 Formulation

A semiconductor fabrication line consists of hundreds of process steps that turn

a bare silicon wafer into one containing hundreds of chips (i.e., integrated circuits).

In the final step, key performance parameters of the chips such as leakage current,

circuit frequency, and operating voltage, will be tested to see how well the chips work.

There are three variation sources for performance parameters of the chips:

• Lot-to-lot variation: a lot is the smallest batch of wafers that will be operated on

at a process step. Once a lot is started at a process step, all of the wafers in that

lot will be processed. This processing mechanism induces lot-to-lot variation.

• Wafer-to-wafer variation: at each process step, the equipment can either process

the wafers in a lot all together (i.e., the so-called batch mode) or sequentially.

Both strategies induce wafer-to-wafer variation. In the batch mode, the wafer-

to-wafer variation is due to each wafer being in a different position in the

equipment. In the sequential mode, it is due to tool variation over time.

• Within-wafer site-to-site variation: each chip has a spatial position on the wafer.

Different equipment will induce variation across the surface of wafer in different

patterns. This creates within-wafer variation.

Considering the structure of the three variation sources, we can model a perfor-

mance parameter of a chip as follows:

\[
X_{ijk} = \mu + L_i + W_{ij} + M_{ijk}. \tag{1}
\]

Equation (1) is the well-known variance components model that has been commonly used to

model semiconductor processes in the literature (Diebold, 2001; Drain, 1997; Reda

and Nassif, 2010; Yashchin, 1994).

In our case, (1) needs two modifications:

• We focus on corner lots. As discussed in the Introduction, just one corner lot

is typically produced due to cost and scheduling considerations. Therefore, we

change the lot index i to 1 in (1).

• µ is allowed to vary for different wafers, so µ is replaced by µj . In fact, µj is a

decision variable in our case, i.e., it can be set by the fabrication engineers to

achieve a desired value, and how to set µj to achieve robustness in corner lot

generation is the research question we want to address in this paper.

Considering these modifications, (1) is changed to (2), which is used in the rest of

this paper.

\[
X_{1jk} = \mu_j + L_1 + W_{1j} + M_{1jk}. \tag{2}
\]

Furthermore, recall that the purpose of a corner lot is to have the performance

parameter of its chips achieve a targeted extreme value µ0 , which is “extreme

value” in the sense that it is usually far from the nominal value of the performance

parameter µnominal , e.g., µ0 = µnominal + 3σ. The chips in the corner lot that have

performance parameters within a small tolerance around the targeted extreme value

$\mu_0$ are successful chips, also known as skew chips. Mathematically, we can define a

skew chip in the following way: $I_{[X_{1jk}]} = 1$ if $X_{1jk}$ is a skew chip in the corner lot and

$I_{[X_{1jk}]} = 0$ otherwise, i.e.,

\[
I_{[X_{1jk}]} =
\begin{cases}
1, & \mu_0 - \epsilon \le X_{1jk} \le \mu_0 + \epsilon, \\
0, & \text{otherwise}.
\end{cases} \tag{3}
\]
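To make model (2) and the indicator in (3) concrete, the following sketch simulates one corner lot and counts its skew chips. The function name, the seed, and the 100 chips per wafer are illustrative assumptions, and the variance components are borrowed from the numerical example used later in this section; this is not the authors' code.

```python
import random


def simulate_skew_proportion(mu, mu0, eps, var_l, var_w, var_m, n_chips, rng):
    """Simulate one corner lot under model (2); return the proportion of skew chips."""
    lot_effect = rng.gauss(0.0, var_l ** 0.5)            # L_1, shared by every wafer
    skew = 0
    for mu_j in mu:                                      # one entry per wafer
        wafer_effect = rng.gauss(0.0, var_w ** 0.5)      # W_1j
        for _ in range(n_chips):
            chip_effect = rng.gauss(0.0, var_m ** 0.5)   # M_1jk
            x = mu_j + lot_effect + wafer_effect + chip_effect
            if mu0 - eps <= x <= mu0 + eps:              # indicator (3)
                skew += 1
    return skew / (len(mu) * n_chips)


# Illustrative values (assumptions): target design with 24 wafers, mu0 set to 0.
rng = random.Random(0)
mu0 = 0.0
sigma = (6.0 + 2.0 + 3.0) ** 0.5
proportions = [
    simulate_skew_proportion([mu0] * 24, mu0, 0.5 * sigma, 6.0, 2.0, 3.0, 100, rng)
    for _ in range(5)
]
print(proportions)
```

Each simulated lot draws a fresh lot effect, so the printed proportions can differ noticeably from lot to lot even though the design is unchanged.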

In this paper, we focus on performance parameters with two-sided tolerance limits

and leave one-sided limits (i.e., performance parameters that are the-larger-the-better

or the-smaller-the-better) for future investigation. Then, the expected proportion of

skew chips in the corner lot is:

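\[
\begin{aligned}
f(L_1; \mu_1, \ldots, \mu_m)
&\triangleq E\!\left[ \frac{\sum_{j=1}^{m}\sum_{k=1}^{n} I_{[X_{1jk}]}}{mn} \,\middle|\, L_1 \right]
= \frac{\sum_{j=1}^{m}\sum_{k=1}^{n} E\!\left[ I_{[X_{1jk}]} \mid L_1 \right]}{mn} \\
&= \frac{\sum_{j=1}^{m}\sum_{k=1}^{n} P(\mu_0 - \epsilon \le X_{1jk} \le \mu_0 + \epsilon \mid L_1)}{mn}.
\end{aligned} \tag{4}
\]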

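If the corner lot had been produced, then $L_1$ would have been observed, i.e.,

$L_1 = l_1$, where $l_1$ denotes a realization of $L_1$. Then, the conditional distribution of $X_{1jk} \mid L_1$

is $N(\mu_j + l_1, \sigma_W^2 + \sigma_M^2)$, and (4) becomes:

\[
\frac{1}{m} \sum_{j=1}^{m} \left[ \phi\!\left( \frac{\mu_0 + \epsilon - (\mu_j + l_1)}{\sqrt{\sigma_W^2 + \sigma_M^2}} \right) - \phi\!\left( \frac{\mu_0 - \epsilon - (\mu_j + l_1)}{\sqrt{\sigma_W^2 + \sigma_M^2}} \right) \right], \tag{5}
\]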

where φ(·) is the cumulative distribution function of the standard normal distribution.
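The step from (4) to (5) can be spelled out as follows. This is a sketch that uses only model (2) and the normality assumptions of the nomenclature; $\phi$ is the standard normal cumulative distribution function, as in the text.

```latex
% Conditional on L_1 = l_1, X_{1jk} ~ N(mu_j + l_1, sigma_W^2 + sigma_M^2), so
\begin{aligned}
P(\mu_0 - \epsilon \le X_{1jk} \le \mu_0 + \epsilon \mid L_1 = l_1)
  &= \phi\!\left( \frac{\mu_0 + \epsilon - (\mu_j + l_1)}{\sqrt{\sigma_W^2 + \sigma_M^2}} \right)
   - \phi\!\left( \frac{\mu_0 - \epsilon - (\mu_j + l_1)}{\sqrt{\sigma_W^2 + \sigma_M^2}} \right)
   =: p_j(l_1). \\
% p_j(l_1) does not depend on the chip index k, so the inner sum contributes a factor n:
\frac{1}{mn} \sum_{j=1}^{m} \sum_{k=1}^{n} p_j(l_1)
  &= \frac{1}{mn} \sum_{j=1}^{m} n \, p_j(l_1)
   = \frac{1}{m} \sum_{j=1}^{m} p_j(l_1).
\end{aligned}
```

This is why the number of chips per wafer, $n$, no longer appears in (5).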

However, we are designing the corner lot generation in this paper, i.e., L1 is not yet

observed at this stage. Therefore, the l1 in (5) should be replaced by the random

variable L1 , i.e.,

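\[
f(L_1; \mu_1, \ldots, \mu_m) = \frac{1}{m} \sum_{j=1}^{m} \left[ \phi\!\left( \frac{\mu_0 + \epsilon - (\mu_j + L_1)}{\sqrt{\sigma_W^2 + \sigma_M^2}} \right) - \phi\!\left( \frac{\mu_0 - \epsilon - (\mu_j + L_1)}{\sqrt{\sigma_W^2 + \sigma_M^2}} \right) \right]. \tag{6}
\]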
For a fixed design in corner lot generation, i.e., µ1 , · · · , µm are given, f (L1 ; µ1 , · · · , µm )

is a random variable because it is a function of the random variable L1 . For example,

a target design is a design used in the current industrial practice that sets all the

µj ’s to the targeted extreme value µ0 , i.e., (µ1 , · · · , µm )T = (µ0 , · · · , µ0 )T . Under the

target design, the histogram of $f(L_1; \mu_0, \cdots, \mu_0)$ for a semiconductor process with

$m = 24$, $\mu_0 = \mu_{nominal} + 3\sigma$, $\epsilon = 0.5\sigma$, $\sigma_L^2 = 6$, $\sigma_W^2 = 2$, and $\sigma_M^2 = 3$ is shown in Figure

1. Note that we are only able to show the histogram using a Monte Carlo simulation

but not the probability density function because the distribution of f (L1 ; µ0 , · · · , µ0 )

does not follow any known parametric distribution.

Figure 1 sheds some light on why the target design can be unsatisfactory. For one

corner lot, we get one realization for f (L1 ; µ0 , · · · , µ0 ), which has a certain (non-zero)

probability of being a very small number. For example, f (L1 ; µ0 , · · · , µ0 ) has 3.1%,

7%, and 11.4% probabilities of being less than 0.05, 0.10, and 0.15, respectively. This

means that very few skew chips may be generated in the corner lot, making it difficult

for the subsequent design evaluation and product characterization.
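The kind of Monte Carlo experiment behind Figure 1 can be sketched as below. This is a minimal illustration, not the authors' simulation: the function names, the seed, and the 20,000 draws are assumptions, while the variance components and tolerance are those quoted above. Under the target design every wafer shares the same mean, so (6) reduces to a single bracket and $m$ drops out.

```python
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF, written as phi in the text


def f_target(l1, eps, var_w, var_m):
    """Equation (6) with every mu_j equal to mu_0 (the target design)."""
    s = (var_w + var_m) ** 0.5
    return PHI((eps - l1) / s) - PHI((-eps - l1) / s)


def tail_probabilities(var_l, var_w, var_m, eps_in_sigma, thresholds, n_draws, seed):
    """Estimate P(f < t) for each threshold t by sampling the lot effect L_1."""
    rng = random.Random(seed)
    eps = eps_in_sigma * (var_l + var_w + var_m) ** 0.5
    draws = [
        f_target(rng.gauss(0.0, var_l ** 0.5), eps, var_w, var_m)
        for _ in range(n_draws)
    ]
    return [sum(d < t for d in draws) / n_draws for t in thresholds]


probs = tail_probabilities(6.0, 2.0, 3.0, 0.5, [0.05, 0.10, 0.15], 20000, seed=1)
print(probs)
```

The estimates should fall in the neighborhood of the probabilities quoted above, with differences attributable to Monte Carlo error and simulation details that the text does not specify.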

To compare with the target design, we show the histogram of f (L1 ; µ∗1 , · · · , µ∗m )

in Figure 2 for an optimal design, i.e., the design under the maximum-single-lot-

probability criterion proposed in Section 4. Compared with the target design in

Figure 1, f (L1 ; µ∗1 , · · · , µ∗m ) has only 0.05%, 0.30%, and 0.92% probabilities of being

less than 0.05, 0.10, and 0.15, respectively. In particular, there is zero probability for

f (L1 ; µ∗1 , · · · , µ∗m ) to be zero while this probability is non-zero for f (L1 ; µ0 , · · · , µ0 ).

Also, it is quite obvious that f (L1 ; µ∗1 , · · · , µ∗m ) has a much smaller variance than

f (L1 ; µ0 , · · · , µ0 ). All these imply that the optimal design is more likely to produce a

guaranteed number of skew chips, a highly favorable property for corner lot generation

in practice. Therefore, the research question we will need to tackle is the following:

what design in terms of the setting for (µ1 , · · · , µm )T will lead to a favorable shape

for the distribution of f (L1 ; µ1 , · · · , µm )? This is an extremely challenging question

because the distribution of f (L1 ; µ1 , · · · , µm ) is non-parametric; thus theoretically,

to fully characterize this distribution requires an infinite number of parameters. To

provide a tractable solution, we focus on studying the mean and variance of the

distribution of f (L1 ; µ1 , · · · , µm ) in this paper (Section 3), which further leads to two

proposed optimal criteria and an algorithm for searching the optimal designs that

satisfy the industrial need (Section 4).

3. Theoretical Properties and Practical Implications

Let E(f (L1 ; µ1 , · · · , µm )) and V ar(f (L1 ; µ1 , · · · , µm )) denote the mean and

variance of f (L1 ; µ1 , · · · , µm ), respectively. The mean E(f (L1 ; µ1 , · · · , µm )) and the

variance V ar(f (L1 ; µ1 , · · · , µm )) are functions of the design parameters (µ1 , · · · , µm )T .

In this section, we will first show that the target design, i.e.,(µ1 , · · · , µm )T =

(µ0 , · · · , µ0 )T , maximizes E(f (L1 ; µ1 , · · · , µm )) (Theorem 1). This implies that

among all possible designs, the target design achieves the highest mean proportion of

skew chips in a corner lot. This seems to suggest that the target design is favorable.

However, the mean only reflects the proportion of skew chips in a corner lot in the long

term. For a single corner lot that is to be produced, the proportion of skew chips

in the lot is also heavily influenced by the variance, i.e., V ar(f (L1 ; µ1 , · · · , µm )).

Even though the target design has the highest mean proportion of skew chips, if the

variance is large, a single corner lot produced under the target design may still have a

chance of including zero or few skew chips. This is well demonstrated by the example

in Figure 1. Therefore, we further study the property of V ar(f (L1 ; µ1 , · · · , µm )) in

this section. Specifically, Theorem 2 proves that the target design does not minimize

V ar(f (L1 ; µ1 , · · · , µm )), i.e., there are other designs with smaller variances.

Theorem 1 The target design, i.e., $(\mu_1, \cdots, \mu_m)^T = (\mu_0, \cdots, \mu_0)^T$, is the global

maximum solution for $E(f(L_1; \mu_1, \cdots, \mu_m))$, i.e.,

\[
(\mu_0, \cdots, \mu_0)^T = \operatorname*{argmax}_{\mu_1, \cdots, \mu_m} E(f(L_1; \mu_1, \cdots, \mu_m)).
\]
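One informal way to see why Theorem 1 is plausible is sketched below. This is an illustration only; the authors' proof is in the Appendix and may proceed differently. Here $\phi'$ denotes the standard normal density (the derivative of $\phi$), $d_j = \mu_j - \mu_0$, and $g$ is a helper function introduced for this sketch.

```latex
% Averaging (6) over L_1: unconditionally X_{1jk} ~ N(mu_j, sigma^2), so
E\big(f(L_1; \mu_1, \cdots, \mu_m)\big) = \frac{1}{m} \sum_{j=1}^{m} g(d_j),
\qquad
g(d) = \phi\!\left( \frac{\epsilon - d}{\sigma} \right) - \phi\!\left( \frac{-\epsilon - d}{\sigma} \right).

% Differentiating, and using the symmetry of the density:
g'(d) = \frac{1}{\sigma} \left[ \phi'\!\left( \frac{\epsilon + d}{\sigma} \right) - \phi'\!\left( \frac{\epsilon - d}{\sigma} \right) \right].

% For d > 0, |eps + d| > |eps - d|, so g'(d) < 0; for d < 0 the inequality reverses.
% Hence each g(d_j) is maximized at d_j = 0, i.e., at mu_j = mu_0.
```

Because the sum separates across wafers, maximizing each term separately maximizes the mean.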


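Theorem 2 When $m > 1$ and

\[
\tilde{\epsilon}^2 \le \frac{-20a_2 + \sqrt{400a_2^2 - 840a_1a_3}}{14a_3(1 + \tilde{\sigma}_L^2)},
\]

where

\[
\begin{aligned}
a_1 &= -(1 + 2\tilde{\sigma}_L^2)(1 + \tilde{\sigma}_L^2)^5 + (1 + 2\tilde{\sigma}_L^2)^{5/2}(1 + \tilde{\sigma}_L^2)^2, \\
a_2 &= -(1 + 2\tilde{\sigma}_L^2)^{5/2}(1 + \tilde{\sigma}_L^2) + \left( \tfrac{3}{2}\tilde{\sigma}_L^4 + 2\tilde{\sigma}_L^2 + 1 + \tfrac{3}{8m} \right)(1 + \tilde{\sigma}_L^2)^4, \text{ and} \\
a_3 &= (1 + 2\tilde{\sigma}_L^2)^{5/2},
\end{aligned}
\]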

there exists a design (µ∗1 , · · · , µ∗m )T , such that

V ar(f (L1 ; µ∗1 , · · · , µ∗m )) < V ar(f (L1 ; µ0 , · · · , µ0 )).

Here we discuss the practical validity of two conditions in Theorem 2: m is the

number of wafers in a corner lot, which is typically greater than one. The condition

on $\tilde{\epsilon}$ is also easily satisfied, as demonstrated in the Application section that includes

a broad range of semiconductor products for all of which this condition is met.
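The condition on the tolerance can be checked numerically. The sketch below transcribes the bound of Theorem 2 as read from the theorem statement and evaluates it for $\tilde{\sigma}_L^2 = 0.316$, $m = 24$, and $\tilde{\epsilon} = 0.5$, which are the values of the first product in the Application section. The function name is an assumption, and the transcription should be checked against the original typeset theorem before being relied on.

```python
def theorem2_bound(s, m):
    """Upper bound on eps_tilde**2 in Theorem 2; s stands for sigma_tilde_L^2."""
    a1 = -(1 + 2 * s) * (1 + s) ** 5 + (1 + 2 * s) ** 2.5 * (1 + s) ** 2
    a2 = (-(1 + 2 * s) ** 2.5 * (1 + s)
          + (1.5 * s ** 2 + 2 * s + 1 + 3 / (8 * m)) * (1 + s) ** 4)
    a3 = (1 + 2 * s) ** 2.5
    discriminant = 400 * a2 ** 2 - 840 * a1 * a3
    return (-20 * a2 + discriminant ** 0.5) / (14 * a3 * (1 + s))


bound = theorem2_bound(0.316, 24)
eps_tilde = 0.5
print(bound, eps_tilde ** 2 <= bound)
```

As a sanity check on the transcription, the bound collapses to zero when there is no lot-to-lot variation ($\tilde{\sigma}_L^2 = 0$). In that case $f$ is no longer random, so no design can strictly reduce its variance.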

Theorems 1 and 2 imply that the target design maximizes the mean of the

proportion of skew chips in a corner lot, but it does not minimize the variance of

the proportion. This suggests that we can potentially search for a design that has a

smaller variance with some acceptable sacrifice of the mean. In practice, a design with

a smaller variance is desirable because it provides assurance for producing at least

some skew chips in a single corner lot. To search for such a design, we need to identify

the direction in the m-dimensional space of the design parameters (µ1 , · · · , µm )T ,

along which the variance is decreasing. Corollary 2.1 presents the variance-decreasing

directions, as a result from Theorem 2. Please see the proofs of Theorems 1 and 2

and Corollary 2.1 in the Appendix.

Corollary 2.1 Let $(1, \cdots, 1)^T$ be a vector consisting of $m$ ones. (i) Any direction

orthogonal to $\frac{1}{\sqrt{m}}(1, \cdots, 1)^T$ is a decreasing direction for $Var(f(L_1; \mu_1, \cdots, \mu_m))$ at

$(\mu_1, \cdots, \mu_m)^T = (\mu_0, \cdots, \mu_0)^T$. (ii) Furthermore, $Var(f(L_1; \mu_1, \cdots, \mu_m))$ along any

of the variance-decreasing directions in (i) decreases at the same rate.

Corollary 2.1 (ii) suggests that among the variance-decreasing directions identified

in (i), there is no mathematical preference because the variance along any of these

directions decreases at the same rate. However, from the engineering perspective,

practical considerations must be taken into account when choosing a direction.

Specifically, to save cost and reduce the chance of error, a variance-decreasing direction

that needs minimum process adjustments is desirable. This leads to choosing the

direction of $\frac{1}{\sqrt{m}}(-1, \cdots, -1, 1, \cdots, 1)^T$ when $m$ is an even number (i.e., there is an

even number of wafers in the corner lot). Here, $\frac{1}{\sqrt{m}}(-1, \cdots, -1, 1, \cdots, 1)^T$ is a vector

for which the first $m/2$ elements are $-1$ and the remaining $m/2$ elements are $1$. With

this direction, there are only two different settings for the $m$ wafers, i.e., $\mu_0 - \delta$ and

$\mu_0 + \delta$. Therefore, process adjustment is minimized. Factoring in this engineering

consideration, we can replace the $\mu_j$ in (6) with $\mu_0 - \delta$ for $j = 1, \cdots, m/2$ and with

$\mu_0 + \delta$ for $j = m/2 + 1, \cdots, m$. Then, (6) becomes:

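\[
\begin{aligned}
f(L_1; \delta) ={}& \frac{1}{2} \left[ \phi\!\left( \frac{\epsilon + \delta - L_1}{\sqrt{\sigma_W^2 + \sigma_M^2}} \right) - \phi\!\left( \frac{-\epsilon + \delta - L_1}{\sqrt{\sigma_W^2 + \sigma_M^2}} \right) \right] \\
&+ \frac{1}{2} \left[ \phi\!\left( \frac{\epsilon - \delta - L_1}{\sqrt{\sigma_W^2 + \sigma_M^2}} \right) - \phi\!\left( \frac{-\epsilon - \delta - L_1}{\sqrt{\sigma_W^2 + \sigma_M^2}} \right) \right].
\end{aligned} \tag{7}
\]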

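Likewise, when $m$ is an odd number, the direction of $\frac{1}{\sqrt{m}}(-1, \cdots, -1, 0, 1, \cdots, 1)^T$

can be chosen. $\frac{1}{\sqrt{m}}(-1, \cdots, -1, 0, 1, \cdots, 1)^T$ is a vector for which the first $(m - 1)/2$

elements are $-1$, the last $(m - 1)/2$ elements are $1$, and the middle element is $0$. With this

direction, there are three different settings for the $m$ wafers, i.e., $\mu_0 - \delta$, $\mu_0$, and $\mu_0 + \delta$.

Then, (6) becomes:

\[
\begin{aligned}
f(L_1; \delta) ={}& \frac{m-1}{2m} \left[ \phi\!\left( \frac{\epsilon + \delta - L_1}{\sqrt{\sigma_W^2 + \sigma_M^2}} \right) - \phi\!\left( \frac{-\epsilon + \delta - L_1}{\sqrt{\sigma_W^2 + \sigma_M^2}} \right) \right] \\
&+ \frac{m-1}{2m} \left[ \phi\!\left( \frac{\epsilon - \delta - L_1}{\sqrt{\sigma_W^2 + \sigma_M^2}} \right) - \phi\!\left( \frac{-\epsilon - \delta - L_1}{\sqrt{\sigma_W^2 + \sigma_M^2}} \right) \right] \\
&+ \frac{1}{m} \left[ \phi\!\left( \frac{\epsilon - L_1}{\sqrt{\sigma_W^2 + \sigma_M^2}} \right) - \phi\!\left( \frac{-\epsilon - L_1}{\sqrt{\sigma_W^2 + \sigma_M^2}} \right) \right].
\end{aligned} \tag{8}
\]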
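The wafer-level mean settings implied by the even and odd cases can be written as a small helper. This is a sketch; the name `design_means` and the example numbers are illustrative assumptions.

```python
def design_means(mu0, delta, m):
    """Wafer means for the mean-spread-out design.

    Even m: half the wafers at mu0 - delta and half at mu0 + delta.
    Odd m: the same split plus one middle wafer left at mu0.
    """
    half = m // 2
    middle = [mu0] if m % 2 == 1 else []
    return [mu0 - delta] * half + middle + [mu0 + delta] * half


even_design = design_means(10.0, 1.5, 4)
odd_design = design_means(10.0, 1.5, 5)
print(even_design)
print(odd_design)
```

In both cases the deviations from `mu0` sum to zero, so the move away from the target design is orthogonal to the all-ones vector, as Corollary 2.1 (i) requires.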

Summarizing the results of this section, we can conclude that δ is a design

parameter. When δ = 0, the corresponding design is the target design (i.e., the design

used in the current industrial common practice). As δ increases, the corresponding

design will have a decreasing V ar(f (L1 ; δ)) accompanied by a decreasing E(f (L1 ; δ)).

Therefore, the research question boils down to identifying the value for δ that

corresponds to an optimal design. To answer this question, we first need to define the

optimal criterion and then develop an algorithm that searches for a design satisfying

the optimal criterion (the optimal design), which will be presented in the next section.
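The trade-off between mean and variance summarized above can be checked numerically for the even-$m$ case in (7). The sketch below integrates over the lot effect on a simple grid. The function names, the grid settings, and the choice $\delta = 2.5$ are assumptions made for illustration, and the variance components are those of the Figure 1 example.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF, written as phi in the text


def f_two_level(l1, delta, eps, s):
    """Equation (7): half the wafers at mu0 - delta, half at mu0 + delta."""
    shifted_down = PHI((eps + delta - l1) / s) - PHI((-eps + delta - l1) / s)
    shifted_up = PHI((eps - delta - l1) / s) - PHI((-eps - delta - l1) / s)
    return 0.5 * shifted_down + 0.5 * shifted_up


def mean_and_variance(delta, var_l, var_w, var_m, eps_in_sigma, n_grid=4001):
    """Mean and variance of f(L1; delta), integrating over L1 ~ N(0, var_l) on a grid."""
    sd_l = var_l ** 0.5
    s = (var_w + var_m) ** 0.5
    eps = eps_in_sigma * (var_l + var_w + var_m) ** 0.5
    lot = NormalDist(0.0, sd_l)
    lo = -8.0 * sd_l
    step = 16.0 * sd_l / (n_grid - 1)
    total = first = second = 0.0
    for i in range(n_grid):
        l1 = lo + i * step
        weight = lot.pdf(l1) * step          # Riemann-sum weight
        value = f_two_level(l1, delta, eps, s)
        total += weight
        first += weight * value
        second += weight * value * value
    mean = first / total
    return mean, second / total - mean ** 2


mean_target, var_target = mean_and_variance(0.0, 6.0, 2.0, 3.0, 0.5)
mean_spread, var_spread = mean_and_variance(2.5, 6.0, 2.0, 3.0, 0.5)
print(mean_target, var_target)
print(mean_spread, var_spread)
```

For these inputs, spreading the means lowers both the mean and the variance of the skew-chip proportion, which is the trade-off the optimal criteria are meant to resolve.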

4. Optimal Criteria and Optimal Design Search

We propose two optimal criteria:

(i) Maximum-single-lot-probability criterion. In industry, it is common that

only one corner lot is allowed to be produced due to resource or time constraints.

Since skew chips in the corner lot will be used for design evaluation and product

characterization, sufficient skew chips need to be produced. That is, there is usually

a requirement on the proportion of skew chips in the corner lot. Denote the required

proportion by α. Then, we would like to find a design that maximizes the probability

for a single corner lot to contain at least α proportion of skew chips, i.e.,

\[
\delta^{(i)*} = \operatorname*{argmax}_{\delta} P(f(L_1; \delta) > \alpha). \tag{9}
\]

Note that this criterion maximizes the probability of a corner lot having at least

α proportion of skew chips. As long as this maximum probability is not one, it is

still possible for a single corner lot to contain less than α proportion of skew chips.

When this happens, the decision can be to take whatever skew chips that have been

produced for design evaluation and product characterization, if the resource or/and

production timeline is so tight that no more corner lots can be afforded. Alternatively,

if having a required number of skew chips is more important and it outweighs the

concerns of more resource spent and possible delay in the production schedule, the

decision can be to produce more corner lots until the required number of skew chips

is reached. This motivates the second criterion presented as follows.

(ii) Minimum-expected-cost criterion. When a single corner lot fails to produce at

least α proportion of skew chips, a second corner lot will be produced. In general,

corner lots have to be continuously generated until at least α proportion of skew chips

is accumulated. Each corner lot generation is associated with a production cost, c.

Then, the expected production cost for the needed corner lots is $c \times E_{L_1}(M(\delta, \alpha))$, where

$M(\delta, \alpha) = \min\{n : Y_1 + Y_2 + \cdots + Y_n \ge \alpha\}$ and $Y_1, Y_2, \cdots, Y_n$ are independently

and identically distributed as $f(L_1; \delta)$. Without loss of generality, we assume unit

cost, i.e., $c = 1$, in the remainder of this paper. Then, the optimal design is one for

which the $\delta$ minimizes the expected production cost, i.e.,

\[
\delta^{(ii)*} = \operatorname*{argmin}_{\delta} E_{L_1}(M(\delta, \alpha)). \tag{10}
\]
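The quantity being minimized under the minimum-expected-cost criterion can be estimated by simulation. The sketch below draws lots one at a time and accumulates their skew-chip proportions until the total reaches α, following the definition of M(δ, α) with unit cost. The function names, the seed, the number of replications, and the parameter values (Figure 1 variance components, α = 0.15, δ = 2.5) are illustrative assumptions.

```python
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF, written as phi in the text


def f_two_level(l1, delta, eps, s):
    """Equation (7): half the wafers at mu0 - delta, half at mu0 + delta."""
    shifted_down = PHI((eps + delta - l1) / s) - PHI((-eps + delta - l1) / s)
    shifted_up = PHI((eps - delta - l1) / s) - PHI((-eps - delta - l1) / s)
    return 0.5 * shifted_down + 0.5 * shifted_up


def expected_lots(delta, alpha, var_l, var_w, var_m, eps_in_sigma, n_rep, seed):
    """Monte Carlo estimate of E[M(delta, alpha)] with unit cost per lot."""
    rng = random.Random(seed)
    sd_l = var_l ** 0.5
    s = (var_w + var_m) ** 0.5
    eps = eps_in_sigma * (var_l + var_w + var_m) ** 0.5
    total_lots = 0
    for _ in range(n_rep):
        accumulated = 0.0
        while accumulated < alpha:           # keep producing lots until alpha is reached
            accumulated += f_two_level(rng.gauss(0.0, sd_l), delta, eps, s)
            total_lots += 1
    return total_lots / n_rep


lots_target = expected_lots(0.0, 0.15, 6.0, 2.0, 3.0, 0.5, 20000, seed=7)
lots_spread = expected_lots(2.5, 0.15, 6.0, 2.0, 3.0, 0.5, 20000, seed=7)
print(lots_target, lots_spread)
```

For these inputs the spread-out design needs fewer lots on average than the target design, although both estimates stay close to one.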

The optimization problems in (9) and (10) have no analytical form for the objective

function, no constraints, and only one decision variable, δ. Given these properties,

numerical search methods are appropriate for solving them. We adopt a popular search

algorithm that combines golden section search with successive parabolic interpolation

and is implemented in R. This algorithm is guaranteed to converge at a superlinear

rate (Brent, 2013). More detailed steps for solving the optimizations in (9) and (10)

are presented as a flow chart in Figure 3. Finally, we want to point out that neither

criterion is designed to “guarantee” a certain pre-defined proportion or number

of skew chips. Such a guarantee would be ideal but is not possible because the proportion

of skew chips in a corner lot is a random variable.
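A one-dimensional search of the kind described above can be sketched as follows for the maximum-single-lot-probability criterion in (9). This is a simplified stand-in, not the procedure of Figure 3: it uses plain golden-section search without the parabolic interpolation step, and it evaluates the probability on a deterministic grid over the lot effect. The parameter values (Figure 1 variance components, α = 0.15, search interval [0, 8]) and all function names are assumptions.

```python
from statistics import NormalDist

PHI = NormalDist().cdf
GOLDEN = (5 ** 0.5 - 1) / 2                  # about 0.618
VAR_L, VAR_W, VAR_M = 6.0, 2.0, 3.0          # illustrative variance components
S = (VAR_W + VAR_M) ** 0.5
EPS = 0.5 * (VAR_L + VAR_W + VAR_M) ** 0.5   # tolerance of 0.5 sigma


def f_two_level(l1, delta, eps, s):
    """Equation (7): half the wafers at mu0 - delta, half at mu0 + delta."""
    shifted_down = PHI((eps + delta - l1) / s) - PHI((-eps + delta - l1) / s)
    shifted_up = PHI((eps - delta - l1) / s) - PHI((-eps - delta - l1) / s)
    return 0.5 * shifted_down + 0.5 * shifted_up


def prob_at_least(delta, alpha, var_l, eps, s, n_grid=2001):
    """Grid approximation of P(f(L1; delta) > alpha) for L1 ~ N(0, var_l)."""
    sd_l = var_l ** 0.5
    lot = NormalDist(0.0, sd_l)
    lo = -6.0 * sd_l
    step = 12.0 * sd_l / (n_grid - 1)
    hit = total = 0.0
    for i in range(n_grid):
        l1 = lo + i * step
        weight = lot.pdf(l1)
        total += weight
        if f_two_level(l1, delta, eps, s) > alpha:
            hit += weight
    return hit / total


def golden_section_max(objective, lo, hi, tol=1e-3):
    """Plain golden-section search; returns the best point it evaluated."""
    best = [objective(lo), lo]

    def evaluate(x):
        value = objective(x)
        if value > best[0]:
            best[0], best[1] = value, x
        return value

    x1 = hi - GOLDEN * (hi - lo)
    x2 = lo + GOLDEN * (hi - lo)
    v1, v2 = evaluate(x1), evaluate(x2)
    while hi - lo > tol:
        if v1 < v2:                          # keep the right part [x1, hi]
            lo, x1, v1 = x1, x2, v2
            x2 = lo + GOLDEN * (hi - lo)
            v2 = evaluate(x2)
        else:                                # keep the left part [lo, x2]
            hi, x2, v2 = x2, x1, v1
            x1 = hi - GOLDEN * (hi - lo)
            v1 = evaluate(x1)
    return best[1], best[0]


best_delta, best_p = golden_section_max(
    lambda d: prob_at_least(d, 0.15, VAR_L, EPS, S), 0.0, 8.0
)
print(best_delta, best_p)
```

Golden-section search assumes the objective has a single peak on the search interval, and the grid-based probability is only piecewise constant in δ. The returned value is therefore an approximate maximizer and need not match results obtained with the algorithm used in the paper.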

5. Application

In this section, we present the results of applying the proposed optimal criteria

and associated design search algorithm to a broad range of semiconductor products.

We focus on one important performance parameter of semiconductor chips, which is

the circuit frequency. Standardized notations are used for clarity of the presentation.

Note that it is customary in industry to present the tolerance in units of the total

standard deviation. Then, (7) becomes

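\[
\begin{aligned}
f(\tilde{L}_1; \tilde{\delta}) ={}& \frac{1}{2} \left[ \phi\!\left( \tilde{\epsilon}\sqrt{1 + \tilde{\sigma}_L^2} + \tilde{\delta} - \tilde{L}_1 \right) - \phi\!\left( -\tilde{\epsilon}\sqrt{1 + \tilde{\sigma}_L^2} + \tilde{\delta} - \tilde{L}_1 \right) \right] \\
&+ \frac{1}{2} \left[ \phi\!\left( \tilde{\epsilon}\sqrt{1 + \tilde{\sigma}_L^2} - \tilde{\delta} - \tilde{L}_1 \right) - \phi\!\left( -\tilde{\epsilon}\sqrt{1 + \tilde{\sigma}_L^2} - \tilde{\delta} - \tilde{L}_1 \right) \right],
\end{aligned} \tag{11}
\]

and the optimal criteria in (9) and (10) become (12) and (13), respectively.

\[
\tilde{\delta}^{(i)*} = \operatorname*{argmax}_{\tilde{\delta}} P(f(\tilde{L}_1; \tilde{\delta}) > \alpha). \tag{12}
\]

\[
\tilde{\delta}^{(ii)*} = \operatorname*{argmin}_{\tilde{\delta}} E_{\tilde{L}_1}(M(\tilde{\delta}, \alpha)). \tag{13}
\]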
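The passage from (7) to (11) follows from dividing every argument of $\phi$ by $\sqrt{\sigma_W^2 + \sigma_M^2}$ and applying the standardized definitions in the nomenclature. A sketch of the substitution, in which the scaling $\tilde{L}_1 = L_1 / \sqrt{\sigma_W^2 + \sigma_M^2}$ is inferred from $\tilde{L}_1 \sim N(0, \tilde{\sigma}_L^2)$:

```latex
\begin{aligned}
\frac{\epsilon}{\sqrt{\sigma_W^2 + \sigma_M^2}}
  &= \tilde{\epsilon} \, \frac{\sqrt{\sigma_L^2 + \sigma_W^2 + \sigma_M^2}}{\sqrt{\sigma_W^2 + \sigma_M^2}}
   = \tilde{\epsilon} \sqrt{1 + \tilde{\sigma}_L^2}, \\
\frac{\delta}{\sqrt{\sigma_W^2 + \sigma_M^2}} &= \tilde{\delta}, \qquad
\frac{L_1}{\sqrt{\sigma_W^2 + \sigma_M^2}} = \tilde{L}_1 \sim N(0, \tilde{\sigma}_L^2), \\
\frac{\epsilon + \delta - L_1}{\sqrt{\sigma_W^2 + \sigma_M^2}}
  &= \tilde{\epsilon} \sqrt{1 + \tilde{\sigma}_L^2} + \tilde{\delta} - \tilde{L}_1 .
\end{aligned}
```

The other three arguments in (7) transform in the same way, which gives the four terms of (11).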

We start out by focusing on one type of product and illustrating how to search for

the optimal design for corner lot generation. For this product, the product division

requires α = 0.15. A corner lot consists of 24 wafers with 800 chips each. α = 0.15

translates into 0.15 × 24 × 800 = 2880 skew chips, which is considered to be an

adequate number for product characterization and design evaluation. The tolerance

of this product is 0.5σ, which is given by engineering design. Furthermore, we need

to estimate the variance components. This requires a large amount of historical data

from the same distribution in order to ensure the quality of statistical estimation. To

this end, we retrieve historical data from the old generations of this (same) product

from our industry collaborator. Because semiconductor manufacturing has high

production volume, a large quantity of data is available, which includes measurements

on circuit frequency for 13076487 chips contained on 9425 wafers in 400 lots. Variance

component estimation in semiconductor processes has been a well-studied topic and

the best approach that has been recommended in the literature and also widely used

in industry is the ANOVA method (Jensen, 2002). Using the ANOVA method, we
obtain the estimates $\hat{\sigma}_L^2 = 0.24$, $\hat{\sigma}_W^2 = 0.15$, and $\hat{\sigma}_M^2 = 0.61$. Note that these are point
estimates for the variance components. To account for the estimation uncertainty, we
can employ the sampling distributions $df_L\,\hat{\sigma}_L^2/\sigma_L^2 \sim \chi^2_{df_L}$, $df_W\,\hat{\sigma}_W^2/\sigma_W^2 \sim \chi^2_{df_W}$, and
$df_M\,\hat{\sigma}_M^2/\sigma_M^2 \sim \chi^2_{df_M}$, where $df_L = 399$, $df_W = 9424$, and $df_M = 13076487$, and get confidence interval

estimation. Specifically, the 95% confidence intervals for the variance components are

$\sigma_L^2 \in [0.210, 0.277]$, $\sigma_W^2 \in [0.146, 0.154]$, and $\sigma_M^2 \in [0.6095, 0.6104]$. These confidence

intervals are very narrow due to the large sample size, suggesting that estimation

uncertainty is not much of a concern. Therefore, we decide to use the point estimates
to conduct an initial search for the optimal design, i.e., $\hat{\tilde{\sigma}}_L^2 = \hat{\sigma}_L^2/(\hat{\sigma}_W^2+\hat{\sigma}_M^2) = 0.316$.
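The estimation and interval steps above can be sketched as follows. This is an illustration under simplifying assumptions, not the exact procedure applied to the historical data: the ANOVA estimator below assumes a balanced layout (the same number of wafers per lot and chips per wafer), and the chi-square quantile uses the Wilson-Hilferty approximation, which is adequate for the large degrees of freedom involved. All function names are hypothetical.

```python
from statistics import NormalDist, fmean


def nested_anova_components(lots):
    """Method-of-moments estimates for a balanced lot > wafer > chip layout.

    `lots` is a list of lots, each a list of wafers, each a list of chip values.
    Returns (var_lot, var_wafer, var_within).
    """
    a, b, n = len(lots), len(lots[0]), len(lots[0][0])
    wafer_means = [[fmean(w) for w in lot] for lot in lots]
    lot_means = [fmean(ms) for ms in wafer_means]
    grand = fmean(lot_means)
    ss_lot = b * n * sum((m - grand) ** 2 for m in lot_means)
    ss_wafer = n * sum((wm - lm) ** 2
                       for ms, lm in zip(wafer_means, lot_means) for wm in ms)
    ss_within = sum((y - wm) ** 2
                    for lot, ms in zip(lots, wafer_means)
                    for w, wm in zip(lot, ms) for y in w)
    ms_lot = ss_lot / (a - 1)
    ms_wafer = ss_wafer / (a * (b - 1))
    ms_within = ss_within / (a * b * (n - 1))
    return ((ms_lot - ms_wafer) / (b * n), (ms_wafer - ms_within) / n, ms_within)


def chi2_quantile(p, df):
    """Wilson-Hilferty approximation to the chi-square quantile (large df)."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * c ** 0.5) ** 3


def variance_ci(var_hat, df, level=0.95):
    """Interval implied by df * var_hat / sigma^2 ~ chi-square(df)."""
    tail = (1.0 - level) / 2.0
    return (df * var_hat / chi2_quantile(1.0 - tail, df),
            df * var_hat / chi2_quantile(tail, df))
```

For example, `variance_ci(0.24, 399)` reproduces the lot-to-lot interval quoted above to three decimals, and `0.24 / (0.15 + 0.61)` gives the standardized ratio 0.316.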

Given all the above information as input, we first choose the maximum-single-lot-probability
criterion. The design search algorithm finds the optimal design
parameter to be $\hat{\tilde{\delta}}^{(i)*} = 1.24$. Under this optimal design, the probability for a
single lot to consist of at least 0.15 proportion of skew chips is $P(f(\tilde{L}_1; 1.24) > 0.15) = 1.000$.
Furthermore, to evaluate how this finding is impacted by the variance
component estimation uncertainty, we generate 100 random samples for $\sigma_L^2$, $\sigma_W^2$, and
$\sigma_M^2$ by sampling from $df_L\,\hat{\sigma}_L^2/\chi^2_{df_L}$, $df_W\,\hat{\sigma}_W^2/\chi^2_{df_W}$, and $df_M\,\hat{\sigma}_M^2/\chi^2_{df_M}$, respectively. This allows us to

obtain an empirical distribution for the optimal design parameter δ̃ (i)∗ . Based on

the empirical distribution, we can obtain a 95% confidence interval for δ̃ (i)∗ , which

is δ̃ (i)∗ ∈ [0.95, 1.47]. This confidence interval is very narrow, indicating that the

uncertainty in finding the optimal design parameter that is introduced by the variance

component estimation uncertainty is not much of a concern.
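The sampling step described above can be sketched as follows, under the assumption that each variance component is drawn independently as df·σ̂²/χ²_df. Only the resulting standardized ratio σ̃L² is computed here; in the full analysis the design search would be rerun for every draw to obtain the empirical distribution of the optimal design parameter. The function names and the fixed seed are illustrative assumptions.

```python
import math
import random


def sample_variance(var_hat, df, rng):
    """Draw one plausible variance as df * var_hat / chi-square(df)."""
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square(df) is Gamma(df/2, scale 2)
    return df * var_hat / chi2


def sample_lot_ratio(estimates, dfs, n_draws=100, seed=0):
    """Sample the standardized ratio sigma_L^2 / (sigma_W^2 + sigma_M^2)."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_draws):
        var_l, var_w, var_m = (sample_variance(v, d, rng)
                               for v, d in zip(estimates, dfs))
        ratios.append(var_l / (var_w + var_m))
    return ratios


def percentile_interval(values, level=0.95):
    """Empirical interval taken from the sorted sample."""
    ordered = sorted(values)
    tail = (1.0 - level) / 2.0
    lo = ordered[int(math.floor(tail * (len(ordered) - 1)))]
    hi = ordered[int(math.ceil((1.0 - tail) * (len(ordered) - 1)))]
    return lo, hi


ratios = sample_lot_ratio((0.24, 0.15, 0.61), (399, 9424, 13076487))
interval = percentile_interval(ratios)
```

With only 100 draws the interval endpoints are themselves noisy, so a larger `n_draws` may be preferable when the search is cheap to rerun.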

Alternatively, we can choose the minimum-expected-cost criterion, under which

the design search algorithm finds the optimal design to be $\hat{\tilde{\delta}}^{(ii)*} = 1.35$. Under this
optimal design, the expected production cost (i.e., the expected number of corner
lots to be produced) is $E_{\tilde{L}_1}(M(1.35, 0.15)) = 1.000$. This means that one corner lot is

expected to be sufficient. Furthermore, to evaluate how this finding is impacted by the

variance component estimation uncertainty, we use the same sampling approach as the

one previously used for the maximum-single-lot-probability criterion and obtain a 95%

confidence interval for δ̃ (ii)∗ , which is δ̃ (ii)∗ ∈ [1.23, 1.53]. This confidence interval is

also very narrow. Note that the confidence intervals for the optimal design parameters

under the two optimal criteria overlap, which implies that the two parameters have

no statistical difference. This explains why the optimal designs under the two criteria

both suggest that a single corner lot is sufficient for satisfying the requirement of

producing at least 0.15 proportion of skew chips in the corner lot.

Furthermore, we would like to repeat the above analysis on a variety of other

semiconductor products. The products are different in terms of their variance
components and tolerances. Typical semiconductor products have a tolerance ranging
from 0.1 to 0.5 of the total process standard deviation. Using the standardized
notation in (11), this means $\tilde{\epsilon} \in [0.1, 0.5]$. Also, according to (11), the individual variance
components do not matter; only the ratio of the lot-to-lot variance to the sum
of the wafer-to-wafer and within-wafer variances, i.e., $\tilde{\sigma}_L^2$, does. Typical semiconductor
products have $\tilde{\sigma}_L^2 \in [0.2, 2]$, which is the range we focus on in this study. Each
combination of $\tilde{\epsilon} \in [0.1, 0.5]$ and $\tilde{\sigma}_L^2 \in [0.2, 2]$ corresponds to a different type of

product. For each type of product, we run the design search algorithm according to

the two proposed optimal criteria. Results associated with the maximum-single-lot-

probability criterion are shown in Tables 1-3. Specifically, Table 1 shows the optimal

design parameter δ̃ (i)∗ including a point estimate and a 95% confidence interval. Table

2 shows the probability for a single lot to consist of at least 0.15 proportion of skew

chips under the optimal design, i.e., P (f (L̃1 ; δ̃ (i)∗ ) > 0.15) including a point estimate

and a 95% confidence interval. Table 3 shows the percentage of increase in this

probability obtained by comparing the optimal design with the target design, with a p value
indicating the statistical significance of this comparison. Likewise, results associated
with the minimum-expected-cost criterion are shown in Tables 4-6. We summarize
the observations on Tables 1-6 as follows:

1) Table 1 shows that when $\tilde{\epsilon} = 0.1$, the optimal design parameter under the
maximum-single-lot-probability criterion is not available (NA), i.e., no design exists
to generate a corner lot with at least 0.15 proportion of skew chips. This is because
the tolerance, $\tilde{\epsilon}$, is so small that it is very difficult for a chip to fall within the
tolerance limits and qualify as a skew chip. When the tolerance gets larger, i.e.,
$\tilde{\epsilon} \in [0.2, 0.5]$, the optimal design parameter exists.

2) At $\tilde{\epsilon} = 0.2$ and $\tilde{\sigma}_L^2 \in [0.2, 1.4]$, the optimal design parameter is zero, meaning
that the optimal design is found to be the target design. At other settings of $\tilde{\epsilon}$ and
$\tilde{\sigma}_L^2$, the optimal design is different from the target design. A general observation on
the optimal design parameters in Table 1 is that at a fixed $\tilde{\epsilon}$, $\tilde{\delta}^{(i)*}$ increases as $\tilde{\sigma}_L^2$
increases. This makes sense because a larger $\tilde{\sigma}_L^2$ means a larger lot-to-lot variation,
in which case the optimal design would need to spread the means of the wafers in
the corner lot to a greater extent in order to protect against the larger variation.
Also, at a fixed $\tilde{\sigma}_L^2$, $\tilde{\delta}^{(i)*}$ generally increases as $\tilde{\epsilon}$ increases. A possible explanation is
that a larger tolerance makes it easier for a chip to be qualified as a skew chip, and
therefore the optimal design could spread the means more to allow more chips inside
the tolerance limits.

3) Focusing on products with $\tilde{\epsilon} \geq 0.3$, i.e., when the optimal design is different
from the target design, Table 2 shows a high probability (0.885-1.000) for a single
lot to consist of at least 0.15 proportion of skew chips under the optimal design.
Table 3 shows that in terms of this probability, the optimal design has a statistically
significant improvement over the target design (p < 0.001). Greater improvement is
seen for products with larger lot-to-lot variation, i.e., $\tilde{\sigma}_L^2$. For products with $\tilde{\epsilon} = 0.2$
and $\tilde{\sigma}_L^2 \in [1.6, 2.0]$, although the optimal design still has a statistically significant
improvement over the target design (Table 3), the optimal design is unable to generate
a single lot with at least 0.15 proportion of skew chips with a high probability. This
is a joint effect of the small tolerance that makes it difficult for a chip to be qualified
as a skew chip and the large lot-to-lot variation that makes corner lot generation
inherently more difficult.

4) Under the minimum-expected-cost criterion, Table 4 shows that when $\tilde{\epsilon}$ is small,
the optimal design parameter is zero, meaning that the optimal design is found to
be the target design. Also, at a fixed $\tilde{\epsilon}$, $\tilde{\delta}^{(ii)*}$ increases as $\tilde{\sigma}_L^2$ increases; at a fixed $\tilde{\sigma}_L^2$,
$\tilde{\delta}^{(ii)*}$ generally increases as $\tilde{\epsilon}$ increases. These trends are similar to those observed for
Table 1 under the maximum-single-lot-probability criterion and the same explanations
apply.

5) Table 5 shows that for products with $\tilde{\epsilon} \geq 0.3$, the expected number of corner
lots to be produced is very close to one. Table 6 shows that the optimal design has
a statistically significant improvement over the target design (p < 0.001) in terms
of reducing the expected number of corner lots. Greater improvement is seen for
products with larger lot-to-lot variation, i.e., $\tilde{\sigma}_L^2$.

6) Comparing the results under the two optimal criteria, we find similar trends in

terms of how the optimal design parameter varies with respect to $\tilde{\sigma}_L^2$ and $\tilde{\epsilon}$ (Tables 1

and 4). Comparing Tables 2 and 5, we can see that the higher the probability for a

single lot to consist of at least 0.15 proportion of skew chips (Table 2), the smaller the

expected number of corner lots that need to be produced (Table 5). This correlation

is intuitive and makes sense. Also, we observe significant improvement of the optimal

design compared with the target design over different products under both criteria

(Tables 3 and 6). Despite the consistency of the results by the two criteria, they

each have a unique value in practice. The maximum-single-lot-probability criterion

is more appropriate if the resources and/or production timeline are so tight that only

a single corner lot can be afforded. The minimum-expected-cost criterion is more

appropriate if having a required number of skew chips for product characterization is

more important so that corner lots must be continuously produced until the required

number is met. The former criterion is usually adopted for relatively more mature

products while the latter for new products with potentially high risk and high return.

6. Conclusion

Corner lot generation is an important task in the product characterization of new

semiconductors. However, the current industrial practice is primarily empirically

based and ineffective. In this paper, we provided a first-of-its-kind rigorous

mathematical formulation for corner lot generation, investigated the theoretical

properties of the formulation and its practical implications, and further developed

a practical algorithm to identify the optimal design under two proposed optimal

criteria. Applications on a broad range of semiconductor products were presented

that demonstrated universal improvement of the optimal design compared with the

target design across various products. An immediate future research direction is to

extend the proposed methodology to additional dimensions, i.e., to consider multiple

key performance parameters of the chips in corner lot generation.

Appendix A

Proof of Theorem 1

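This proof uses the original definition of $f(L_1;\mu_1,\ldots,\mu_m)$ in (4), i.e.,

\[
\begin{aligned}
E(f(L_1;\mu_1,\ldots,\mu_m)) &= E\Big[\frac{1}{mn}\sum_{j=1}^{m}\sum_{k=1}^{n} E\big[I[X_{1jk}] \mid L_1\big]\Big] = \frac{1}{m}\sum_{j=1}^{m} E\big[I[X_{1jk}]\big] \\
&= \frac{1}{m}\sum_{j=1}^{m} P(\mu_0-\epsilon \le X_{1jk} \le \mu_0+\epsilon).
\end{aligned}
\tag{A-1}
\]

The distribution of $X_{1jk}$ according to (2) is $X_{1jk} \sim N(\mu_j, \sigma^2)$, where $\sigma^2 = \sigma_L^2 + \sigma_W^2 + \sigma_M^2$. Therefore, (A-1) becomes:

\[
E(f(L_1;\mu_1,\ldots,\mu_m)) = \frac{1}{m}\sum_{j=1}^{m}\int_{\mu_0-\epsilon}^{\mu_0+\epsilon}\frac{1}{\sqrt{2\pi}\,\sigma}\exp\Big\{-\frac{(x-\mu_j)^2}{2\sigma^2}\Big\}dx.
\tag{A-2}
\]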

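Each summand in (A-2) is the probability that a normal random variable with mean $\mu_j$
falls in an interval centered at $\mu_0$. Because the normal density is symmetric and unimodal,
this probability is largest when $\mu_j = \mu_0$, so (A-2) is maximized when
$(\mu_1,\cdots,\mu_m)^T = (\mu_0,\cdots,\mu_0)^T$. Consequently, $E(f(L_1;\mu_1,\ldots,\mu_m))$ is maximized at
the target design.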
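As a small numerical illustration of this result, the sketch below evaluates the average in (A-1) for a design expressed through the offsets $\mu_j - \mu_0$. The function name and the example values of $\epsilon$ and $\sigma$ are assumptions chosen only for illustration.

```python
from statistics import NormalDist


def expected_skew_fraction(offsets, eps, sigma):
    """E f for wafer means mu_j = mu_0 + offsets[j], following (A-1)."""
    probs = []
    for off in offsets:
        chip = NormalDist(off, sigma)  # distribution of X - mu_0 on wafer j
        probs.append(chip.cdf(eps) - chip.cdf(-eps))
    return sum(probs) / len(probs)


# Target design versus a design that spreads the wafer means by +/- 1.
target = expected_skew_fraction([0.0] * 24, eps=1.0, sigma=2.0)
spread = expected_skew_fraction([-1.0] * 12 + [1.0] * 12, eps=1.0, sigma=2.0)
```

Here `target` exceeds `spread`, in line with Theorem 1: spreading the means lowers the expected proportion of skew chips, which is the price paid for the reduction in variance addressed by Theorem 2.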

Proof of Theorem 2

The basic concept of the proof is to demonstrate that the target design,
i.e., $(\mu_1,\cdots,\mu_m)^T = (\mu_0,\cdots,\mu_0)^T$, is a local maximum or a saddle point for
$Var(f(L_1;\mu_1,\cdots,\mu_m))$ when $\epsilon$ is below an upper bound. Being a local maximum
or saddle point naturally means that a design with a smaller variance than the target
design exists (i.e., the statement in Theorem 2).

Briefly, the proof consists of the following three steps: (i) We derive the first-order
partial derivatives of $Var(f(L_1;\mu_1,\cdots,\mu_m))$ with respect to $\mu_i$, $i = 1,\cdots,m$, and
show that they are equal to zero at the target design. (ii) Step (i) implies that
the target design is either a local extremum or a saddle point for $Var(f(L_1;\mu_1,\cdots,\mu_m))$.
To determine which, we derive the second-order partial derivatives and the Hessian
matrix at the target design. The derivation shows that, under some condition, the
Hessian matrix has either both positive and negative eigenvalues (meaning that
the target design is a saddle point) or all negative eigenvalues (meaning that the
target design is a local maximum). In both cases, there is a design with a smaller
variance than the target design. (iii) Finally, we derive the condition in (ii) for the
two cases to hold, which turns out to be an upper bound on $\epsilon$.

In what follows, we will present the detailed derivation in each of the three steps.

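For notational simplicity, let

\[
\check{\epsilon} = \frac{\epsilon}{\sqrt{\sigma_W^2+\sigma_M^2}}, \qquad \tilde{\mu}_j = \frac{\mu_j-\mu_0}{\sqrt{\sigma_W^2+\sigma_M^2}}, \qquad \tilde{L}_1 \sim N(0,\tilde{\sigma}_L^2).
\]

Then, the target design is $(\tilde{\mu}_1,\cdots,\tilde{\mu}_m)^T = (0,\cdots,0)^T$. Using the new notations, (6)
becomes (A-3):

\[
\begin{aligned}
f(L_1;\tilde{\mu}_1,\cdots,\tilde{\mu}_m) &= \frac{1}{m}\sum_{j=1}^{m}\Big[\phi(\tilde{\mu}_j+\tilde{L}_1+\check{\epsilon})-\phi(\tilde{\mu}_j+\tilde{L}_1-\check{\epsilon})\Big] \\
&= \frac{1}{m}\sum_{j=1}^{m}\int_{-\check{\epsilon}}^{\check{\epsilon}}\frac{1}{\sqrt{2\pi}}\exp\Big\{-\frac{(x-\tilde{\mu}_j-\tilde{L}_1)^2}{2}\Big\}dx.
\end{aligned}
\tag{A-3}
\]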

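For notational simplicity, we will reuse $\mu_j$, $L_1$, $\sigma_L^2$, $\epsilon$ for $\tilde{\mu}_j$, $\tilde{L}_1$, $\tilde{\sigma}_L^2$, $\check{\epsilon}$ in the subsequent
derivations. Then $Var(f(L_1;\mu_1,\cdots,\mu_m))$ can be written as

\[
\begin{aligned}
Var(f(L_1;\mu_1,\cdots,\mu_m)) &= Var\Big(\frac{1}{m}\sum_{j=1}^{m}\int_{-\epsilon}^{\epsilon}\frac{1}{\sqrt{2\pi}}\exp\Big\{-\frac{(x-\mu_j-L_1)^2}{2}\Big\}dx\Big) \\
&= \frac{1}{m^2}\sum_{j=1}^{m}Var\Big(\int_{-\epsilon}^{\epsilon}\frac{1}{\sqrt{2\pi}}\exp\Big\{-\frac{(x-\mu_j-L_1)^2}{2}\Big\}dx\Big) \\
&\quad+\frac{2}{m^2}\sum_{i<j}Cov\Big(\int_{-\epsilon}^{\epsilon}\frac{1}{\sqrt{2\pi}}\exp\Big\{-\frac{(x-\mu_i-L_1)^2}{2}\Big\}dx,\ \int_{-\epsilon}^{\epsilon}\frac{1}{\sqrt{2\pi}}\exp\Big\{-\frac{(x-\mu_j-L_1)^2}{2}\Big\}dx\Big) \\
&\triangleq \frac{1}{m^2}\sum_{j=1}^{m}g(\mu_j)+\frac{2}{m^2}\sum_{i<j}h(\mu_i,\mu_j).
\end{aligned}
\tag{A-4}
\]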
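Here,

\[
g(\mu_j) = Var\Big(\int_{-\epsilon}^{\epsilon}\frac{1}{\sqrt{2\pi}}\exp\Big\{-\frac{(x-\mu_j-L_1)^2}{2}\Big\}dx\Big),
\]

and is therefore a function of $\mu_j$.

\[
h(\mu_i,\mu_j) = Cov\Big(\int_{-\epsilon}^{\epsilon}\frac{1}{\sqrt{2\pi}}\exp\Big\{-\frac{(x-\mu_i-L_1)^2}{2}\Big\}dx,\ \int_{-\epsilon}^{\epsilon}\frac{1}{\sqrt{2\pi}}\exp\Big\{-\frac{(x-\mu_j-L_1)^2}{2}\Big\}dx\Big),
\]

and is therefore a function of $\mu_i$ and $\mu_j$.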

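(i) Derive the first-order partial derivatives of $Var(f(L_1;\mu_1,\cdots,\mu_m))$ at the
target design

According to (A-4),

\[
\frac{\partial Var(f(L_1;\mu_1,\cdots,\mu_m))}{\partial\mu_j} = \frac{1}{m^2}\frac{\partial g(\mu_j)}{\partial\mu_j}+\frac{2}{m^2}\sum_{i\neq j}\frac{\partial h(\mu_i,\mu_j)}{\partial\mu_j}.
\tag{A-5}
\]

Next, we derive $\frac{\partial g(\mu_j)}{\partial\mu_j}$ and $\frac{\partial h(\mu_i,\mu_j)}{\partial\mu_j}$.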

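\[
\begin{aligned}
\frac{\partial g(\mu_j)}{\partial\mu_j}
&= \frac{\partial}{\partial\mu_j}Var\Big(\int_{-\epsilon}^{\epsilon}\frac{1}{\sqrt{2\pi}}\exp\Big\{-\frac{(x-\mu_j-L_1)^2}{2}\Big\}dx\Big) \\
&= \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}\,\sigma_L}\exp\Big\{-\frac{L_1^2}{2\sigma_L^2}\Big\}\int_{-\epsilon}^{\epsilon}\exp\Big\{-\frac{(x-\mu_j-L_1)^2}{2}\Big\}dx \\
&\qquad\times\Big(\exp\Big\{-\frac{(-\mu_j-\epsilon-L_1)^2}{2}\Big\}-\exp\Big\{-\frac{(-\mu_j+\epsilon-L_1)^2}{2}\Big\}\Big)dL_1 \\
&\quad-\frac{1}{\pi}\int_{-\epsilon}^{\epsilon}\frac{1}{1+\sigma_L^2}\exp\Big\{-\frac{(x-\mu_j)^2}{2(1+\sigma_L^2)}\Big\}\Big(\exp\Big\{-\frac{(\mu_j+\epsilon)^2}{2(1+\sigma_L^2)}\Big\}-\exp\Big\{-\frac{(\mu_j-\epsilon)^2}{2(1+\sigma_L^2)}\Big\}\Big)dx \\
&= (A-B)-(C-D),
\end{aligned}
\]

where

\[
\begin{aligned}
A &= \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}\,\sigma_L}\exp\Big\{-\frac{L_1^2}{2\sigma_L^2}\Big\}\int_{-\epsilon}^{\epsilon}\exp\Big\{-\frac{(x-\mu_j-L_1)^2}{2}\Big\}dx\,\exp\Big\{-\frac{(-\mu_j-\epsilon-L_1)^2}{2}\Big\}dL_1 \\
&= \frac{1}{\pi}\int_{-\epsilon}^{\epsilon}\frac{1}{\sqrt{1+2\sigma_L^2}}\exp\Big\{-\frac{(x-\mu_j)^2+(\mu_j+\epsilon)^2}{2}\Big\}\exp\Big\{\frac{\sigma_L^2(x-2\mu_j-\epsilon)^2}{2(1+2\sigma_L^2)}\Big\}dx,
\end{aligned}
\]

\[
\begin{aligned}
B &= \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}\,\sigma_L}\exp\Big\{-\frac{L_1^2}{2\sigma_L^2}\Big\}\int_{-\epsilon}^{\epsilon}\exp\Big\{-\frac{(x-\mu_j-L_1)^2}{2}\Big\}dx\,\exp\Big\{-\frac{(-\mu_j+\epsilon-L_1)^2}{2}\Big\}dL_1 \\
&= \frac{1}{\pi}\int_{-\epsilon}^{\epsilon}\frac{1}{\sqrt{1+2\sigma_L^2}}\exp\Big\{-\frac{(x-\mu_j)^2+(\mu_j-\epsilon)^2}{2}\Big\}\exp\Big\{\frac{\sigma_L^2(x-2\mu_j+\epsilon)^2}{2(1+2\sigma_L^2)}\Big\}dx,
\end{aligned}
\]

\[
C = \int_{-\epsilon}^{\epsilon}\frac{1}{\pi}\frac{1}{1+\sigma_L^2}\exp\Big\{-\frac{(x-\mu_j)^2}{2(1+\sigma_L^2)}\Big\}\exp\Big\{-\frac{(\mu_j+\epsilon)^2}{2(1+\sigma_L^2)}\Big\}dx,
\]

\[
D = \int_{-\epsilon}^{\epsilon}\frac{1}{\pi}\frac{1}{1+\sigma_L^2}\exp\Big\{-\frac{(x-\mu_j)^2}{2(1+\sigma_L^2)}\Big\}\exp\Big\{-\frac{(\mu_j-\epsilon)^2}{2(1+\sigma_L^2)}\Big\}dx.
\]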
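The closed forms of A and B rest on a standard Gaussian integral obtained by completing the square in $L_1$. As a sketch, with $a$ and $b$ introduced here only as shorthand for the two shifted arguments ($a = x-\mu_j$, and $b = -\mu_j-\epsilon$ for A or $b = -\mu_j+\epsilon$ for B):

```latex
\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}\,\sigma_L}
  \exp\Big\{-\frac{L_1^2}{2\sigma_L^2}-\frac{(a-L_1)^2}{2}-\frac{(b-L_1)^2}{2}\Big\}\,dL_1
  = \frac{1}{\sqrt{1+2\sigma_L^2}}
    \exp\Big\{-\frac{a^2+b^2}{2}+\frac{\sigma_L^2\,(a+b)^2}{2(1+2\sigma_L^2)}\Big\}.
```

Substituting the two choices of $b$ gives $a+b = x-2\mu_j\mp\epsilon$ and $b^2 = (\mu_j\pm\epsilon)^2$, which are the terms appearing in the integrands of A and B.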

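\[
\begin{aligned}
\frac{\partial h(\mu_i,\mu_j)}{\partial\mu_j}
={}& \frac{1}{2\pi}\int_{-\epsilon}^{\epsilon}\frac{1}{\sqrt{1+2\sigma_L^2}}\exp\Big\{-\frac{(1+\sigma_L^2)(\epsilon+\mu_i)^2+(1+\sigma_L^2)(x-\mu_j)^2+2\sigma_L^2(\epsilon+\mu_i)(x-\mu_j)}{2(1+2\sigma_L^2)}\Big\}dx \\
&-\frac{1}{2\pi}\int_{-\epsilon}^{\epsilon}\frac{1}{\sqrt{1+2\sigma_L^2}}\exp\Big\{-\frac{(1+\sigma_L^2)(\epsilon-\mu_i)^2+(1+\sigma_L^2)(x+\mu_j)^2+2\sigma_L^2(\epsilon-\mu_i)(x+\mu_j)}{2(1+2\sigma_L^2)}\Big\}dx \\
&+\frac{1}{2\pi}\int_{-\epsilon}^{\epsilon}\frac{1}{1+\sigma_L^2}\exp\Big\{-\frac{(x-\mu_j)^2}{2(1+\sigma_L^2)}\Big\}\exp\Big\{-\frac{(\epsilon-\mu_i)^2}{2(1+\sigma_L^2)}\Big\}dx \\
&-\frac{1}{2\pi}\int_{-\epsilon}^{\epsilon}\frac{1}{1+\sigma_L^2}\exp\Big\{-\frac{(x-\mu_j)^2}{2(1+\sigma_L^2)}\Big\}\exp\Big\{-\frac{(\epsilon+\mu_i)^2}{2(1+\sigma_L^2)}\Big\}dx.
\end{aligned}
\]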

Furthermore, it can be shown that $(A-B)-(C-D) = 0$ when $\mu_j = 0$. Therefore,
$\frac{\partial g(\mu_j)}{\partial\mu_j}\big|_{\mu_j=0} = 0$. It can also be shown that $\frac{\partial h(\mu_i,\mu_j)}{\partial\mu_j}\big|_{\mu_i=0,\mu_j=0} = 0$. These results imply
that the first-order partial derivatives of $Var(f(L_1;\mu_1,\cdots,\mu_m))$ are zero at the target design,
which is $(\mu_1,\cdots,\mu_m)^T = (0,\cdots,0)^T$ in the standardized notation.

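(ii) Derive the second-order partial derivatives and Hessian matrix of
$Var(f(L_1;\mu_1,\cdots,\mu_m))$ at the target design

\[
\begin{aligned}
\frac{\partial^2 g(\mu_j)}{\partial\mu_j^2}\Big|_{\mu_j=0}
&= \frac{1}{\pi}\int_{-\epsilon}^{\epsilon}\exp\Big\{-\frac{x^2+\epsilon^2}{2}\Big\}\exp\Big\{\frac{\sigma_L^2(x-\epsilon)^2}{2(1+2\sigma_L^2)}\Big\}\frac{2(x-\epsilon)}{(1+2\sigma_L^2)^{3/2}}dx \\
&\quad+\frac{1}{\pi}\int_{-\epsilon}^{\epsilon}\frac{2\epsilon}{(1+\sigma_L^2)^2}\exp\Big\{-\frac{x^2+\epsilon^2}{2(1+\sigma_L^2)}\Big\}dx \\
&= \frac{1}{\pi}\int_{-\epsilon}^{\epsilon}s(x)\psi(x)dx,
\end{aligned}
\tag{A-6}
\]

where

\[
s(x) = \exp\Big\{-\frac{\sigma_L^4(x^2+\epsilon^2)+2(\sigma_L^2+\sigma_L^4)\epsilon x}{2(1+2\sigma_L^2)(1+\sigma_L^2)}\Big\}\frac{2(x-\epsilon)}{(1+2\sigma_L^2)^{3/2}}+\frac{2\epsilon}{(1+\sigma_L^2)^2},
\]

\[
\psi(x) = \exp\Big\{-\frac{x^2+\epsilon^2}{2(1+\sigma_L^2)}\Big\}.
\]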

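Furthermore,

\[
\begin{aligned}
\frac{\partial^2 h(\mu_i,\mu_j)}{\partial\mu_j^2}\Big|_{\mu_i=0,\mu_j=0}
&= -\frac{1}{\pi}\int_{-\epsilon}^{\epsilon}\exp\Big\{-\frac{(1+\sigma_L^2)(x^2+\epsilon^2)+2\sigma_L^2\epsilon x}{2(1+2\sigma_L^2)}\Big\}\frac{\sigma_L^2x+(1+\sigma_L^2)\epsilon}{(1+2\sigma_L^2)^{3/2}}dx \\
&\quad+\frac{1}{\pi}\int_{-\epsilon}^{\epsilon}\frac{\epsilon}{(1+\sigma_L^2)^2}\exp\Big\{-\frac{x^2+\epsilon^2}{2(1+\sigma_L^2)}\Big\}dx,
\end{aligned}
\tag{A-7}
\]

and

\[
\frac{\partial^2 h(\mu_i,\mu_j)}{\partial\mu_i\partial\mu_j}\Big|_{\mu_i=0,\mu_j=0} = \frac{1}{\pi\sqrt{1+2\sigma_L^2}}\Big(\exp\Big\{-\frac{\epsilon^2}{1+2\sigma_L^2}\Big\}-\exp\{-\epsilon^2\}\Big).
\tag{A-8}
\]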

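Using the results in (A-6), (A-7), and (A-8), the Hessian matrix for $Var(f(L_1;\mu_1,\cdots,\mu_m))$
at $\mu_i = 0$, $i = 1,\cdots,m$ is:

\[
\begin{pmatrix}
a & c & c & \cdots & c \\
c & a & c & \cdots & c \\
c & c & a & \cdots & c \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
c & c & c & \cdots & a
\end{pmatrix},
\tag{A-9}
\]

where

\[
a = \frac{1}{m^2}\frac{\partial^2 g(\mu_j)}{\partial\mu_j^2}\Big|_{\mu_j=0}+\frac{2(m-1)}{m^2}\frac{\partial^2 h(\mu_i,\mu_j)}{\partial\mu_j^2}\Big|_{\mu_i=0,\mu_j=0},
\]

\[
c = \frac{2}{\pi\sqrt{1+2\sigma_L^2}\,m^2}\Big(\exp\Big\{-\frac{\epsilon^2}{1+2\sigma_L^2}\Big\}-\exp\{-\epsilon^2\}\Big).
\]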

The eigenvalues of the Hessian matrix are λ1 = a+(m−1)c, λ2 = · · · = λm = a−c.

According to the properties of a Hessian matrix, if λ1 > 0 and λ2 < 0, then (0, · · · , 0)T

is a saddle point for V ar(f (L1 ; µ1 , · · · , µm )); if λ1 < 0 and λ2 < 0, then (0, · · · , 0)T is

a local maximum point for V ar(f (L1 ; µ1 , · · · , µm )). Next, we will derive a sufficient

condition for λ2 = a − c < 0.
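The eigenvalue structure used here is a general property of matrices with a constant diagonal and constant off-diagonal entries. A minimal numerical check, with purely illustrative values of $m$, $a$, and $c$ (not values derived from any product in this paper), is:

```python
def compound_symmetric(m, a, c):
    """m x m matrix with a on the diagonal and c elsewhere, as in (A-9)."""
    return [[a if i == j else c for j in range(m)] for i in range(m)]


def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(x * v for x, v in zip(row, vector)) for row in matrix]


m, a, c = 4, -0.5, 0.1  # illustrative values only
hessian = compound_symmetric(m, a, c)

ones = [1.0] * m                          # all wafer means move together
contrast = [1.0, -1.0] + [0.0] * (m - 2)  # one direction orthogonal to `ones`

lambda_1 = a + (m - 1) * c  # eigenvalue for `ones`
lambda_2 = a - c            # eigenvalue for every vector orthogonal to `ones`

image_ones = matvec(hessian, ones)
image_contrast = matvec(hessian, contrast)
```

The products satisfy `image_ones = lambda_1 * ones` and `image_contrast = lambda_2 * contrast`, so moving wafer means in opposite directions probes the eigenvalue $a-c$, which is the one bounded in step (iii).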

(iii) Derive the condition on $\epsilon$

Note that $e^{-x} \geq 1-x$ and $e^{-x} \leq 1-x+\frac{x^2}{2}$ for all $x \geq 0$. Applying these bounds to all
exponential functions, we have

\[
a-c \leq \frac{2\epsilon^2}{15m\pi}\,\frac{1}{(1+2\sigma_L^2)^{5/2}(1+\sigma_L^2)^4}\,\big(30a_1+20a_2\epsilon^2+7a_3\epsilon^4\big),
\]

where

\[
a_1 = -(1+2\sigma_L^2)(1+\sigma_L^2)^5+(1+2\sigma_L^2)^{5/2}(1+\sigma_L^2)^2,
\]

\[
a_2 = -(1+2\sigma_L^2)^{5/2}(1+\sigma_L^2)+\Big(\frac{3}{2}\sigma_L^4+2\sigma_L^2+1+\frac{3}{8m}\Big)(1+\sigma_L^2)^4,
\]

\[
a_3 = (1+2\sigma_L^2)^{5/2}.
\]

Since $\sigma_L^2 > 0$, it is obvious that $a_1 < 0$, $a_2 > 0$, $a_3 > 0$. To make $\lambda_2 = a-c < 0$,
it suffices to ensure that $30a_1+20a_2\epsilon^2+7a_3\epsilon^4 < 0$. Thus, a sufficient condition for
$30a_1+20a_2\epsilon^2+7a_3\epsilon^4 < 0$ is

\[
\epsilon^2 \leq \frac{-20a_2+\sqrt{400a_2^2-840a_1a_3}}{14a_3}.
\tag{A-10}
\]
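By replacing all notations back, we have

\[
\check{\epsilon}^2 \leq \frac{-20a_2+\sqrt{400a_2^2-840a_1a_3}}{14a_3},
\tag{A-11}
\]

where

\[
a_1 = -(1+2\tilde{\sigma}_L^2)(1+\tilde{\sigma}_L^2)^5+(1+2\tilde{\sigma}_L^2)^{5/2}(1+\tilde{\sigma}_L^2)^2,
\]

\[
a_2 = -(1+2\tilde{\sigma}_L^2)^{5/2}(1+\tilde{\sigma}_L^2)+\Big(\frac{3}{2}\tilde{\sigma}_L^4+2\tilde{\sigma}_L^2+1+\frac{3}{8m}\Big)(1+\tilde{\sigma}_L^2)^4,
\]

\[
a_3 = (1+2\tilde{\sigma}_L^2)^{5/2}.
\]

Also note that $\tilde{\epsilon}^2 = \frac{1}{1+\tilde{\sigma}_L^2}\check{\epsilon}^2$; then (A-11) becomes

\[
\tilde{\epsilon}^2 \leq \frac{-20a_2+\sqrt{400a_2^2-840a_1a_3}}{14a_3(1+\tilde{\sigma}_L^2)},
\tag{A-12}
\]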

which is the condition in Theorem 2.
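To see how restrictive this condition is for a given product, the bound can be evaluated directly. The sketch below simply transcribes $a_1$, $a_2$, $a_3$ and (A-12) as stated above; the function name is hypothetical, and the inputs in the usage note are the standardized ratio from the case study and the 24 wafers of one corner lot.

```python
import math


def tolerance_bound(var_l, m):
    """Upper bound on the standardized tolerance implied by (A-12).

    var_l is the standardized lot-to-lot variance and m the number of wafers.
    """
    s = 1.0 + var_l
    t = 1.0 + 2.0 * var_l
    a1 = -t * s ** 5 + t ** 2.5 * s ** 2
    a2 = (-t ** 2.5 * s
          + (1.5 * var_l ** 2 + 2.0 * var_l + 1.0 + 3.0 / (8.0 * m)) * s ** 4)
    a3 = t ** 2.5
    # Right-hand side of (A-11): bound on the squared tolerance scaled by
    # sqrt(sigma_W^2 + sigma_M^2).
    check_sq = (-20.0 * a2 + math.sqrt(400.0 * a2 ** 2 - 840.0 * a1 * a3)) / (14.0 * a3)
    # (A-12): rescale to units of the total standard deviation.
    return math.sqrt(check_sq / s)
```

For instance, `tolerance_bound(0.316, 24)` evaluates to roughly 0.64, which is above the tolerance of 0.5 used in the case study, so the sufficient condition holds for that product.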

Proof of Corollary 2.1:

The proof of Theorem 2 shows that the Hessian matrix in (A-9) at the target
design has one eigenvalue $\lambda_1 = a+(m-1)c$ and $m-1$ other identical eigenvalues, i.e.,
$\lambda_2 = \cdots = \lambda_m = a-c$. Furthermore, it can be derived that the eigenvector associated
with $\lambda_1$ is $e_1 = \frac{1}{\sqrt{m}}(1,\cdots,1)^T$. Let $e_2,\cdots,e_m$ be orthonormal eigenvectors associated with
eigenvalues $\lambda_2,\cdots,\lambda_m$, respectively. Then, any vector of unit length orthogonal to
$e_1$ can be expressed as $x = \sum_{i=2}^{m}\tau_ie_i$, where $\tau_i$, $i = 2,\cdots,m$ satisfy $\sum_{i=2}^{m}\tau_i^2 = 1$. Let
$\mu = (\mu_1,\cdots,\mu_m)^T$ be a design in a small neighborhood of the target design along the
direction of $x$. Then, we can write the Taylor expansion of $Var(f(L_1;\mu_1,\cdots,\mu_m))$
with respect to the target design as:

\[
Var(f(L_1;\mu_1,\cdots,\mu_m)) \approx Var(f(L_1;0,\cdots,0))+\sum_{i=1}^{m}\mu_i\cdot\frac{\partial Var(f(L_1;\mu_1,\cdots,\mu_m))}{\partial\mu_i}\Big|_{\mu=0}+\frac{1}{2}\mu^TH\mu.
\tag{A-13}
\]
Here, $H$ is the Hessian matrix in (A-9). In (A-13), the first-order term is zero because
$\frac{\partial Var(f(L_1;\mu_1,\cdots,\mu_m))}{\partial\mu_i}\big|_{\mu=0} = 0$ according to the proof of Theorem 2. The quadratic form
$\mu^TH\mu$ in the second-order term can be further derived as follows:

\[
\mu^TH\mu = \|\mu\|^2\Big(\sum_{i=2}^{m}\tau_ie_i\Big)^TH\sum_{i=2}^{m}\tau_ie_i = \|\mu\|^2\sum_{i=2}^{m}\tau_i^2e_i^THe_i = \|\mu\|^2(a-c) < 0,
\tag{A-14}
\]

where $\|\mu\|^2 = \sum_{i=1}^{m}\mu_i^2$. Therefore, $Var(f(L_1;\mu_1,\cdots,\mu_m)) < Var(f(L_1;0,\cdots,0))$.
This proves (i) in Corollary 2.1. Moreover, because the $m-1$ eigenvalues are the
same, this naturally implies that (ii) in Corollary 2.1 holds.

Appendix B

Corner Lot Generation According to the Optimal Design

We would like to discuss practical aspects related to how corner lots will be

generated according to the optimal design. After the product division receives a

recommended optimal design parameter, δ ∗ , the engineers need to manipulate the

process recipe so as to produce half of the wafers with average chip performance (e.g.,

circuit frequency) equal to µ0 +δ ∗ and the other half equal to µ0 −δ ∗ in each corner lot.

Recall that µ0 is an engineering-defined extreme value of the performance parameter

for the purpose of product characterization and design evaluation. Semiconductor
manufacturing is a mature process in the sense that there are well-established empirical
and physical models (Gray et al., 2009) for guiding the manipulation of recipes to
achieve the desired average chip performance. Take circuit frequency as an example.
To achieve a desired average circuit frequency for the chips on a wafer, e.g., to
make $\mu = \mu_0+\delta^*$, there is a well-known empirical model that links $\mu$ with the
N-type and P-type MOSFET device currents, $I_{DN}$ and $I_{DP}$, i.e., $\mu = f(I_{DN}, I_{DP})$.
Using this model, we can identify the specific $I_{DN}^*$ and $I_{DP}^*$ that help achieve the
desired $\mu_0+\delta^*$. Next, to decide what process parameter settings can lead to $I_{DN}^*$
and $I_{DP}^*$, two well-known physical models exist, i.e., $I_{DN} = \mu_nC_{ox}\frac{W}{L}(V_{GS}-V_{TN})^2$
and $I_{DP} = \mu_pC_{ox}\frac{W}{L}(V_{SG}-V_{TP})^2$. $\mu_n$ and $\mu_p$ are the electron (N) and hole (P) carrier
mobilities. $L$ is the transistor length. $V_{TN}$ and $V_{TP}$ are the threshold voltages of the N-type
and P-type transistors. In theory, all these process parameters can be modulated
to achieve the desired $I_{DN}^*$ and $I_{DP}^*$. In practice, some may be easier to modulate

than others and which process parameter(s) to modulate for each particular product is

known from process design. There are also detailed recipes/guidance on how to adjust

the process parameters. For example, adjustment on L can be achieved by adjusting

exposure energy in lithography or the etch time in plasma etch. Adjustment on

VT N and VT P can be achieved by adjusting the implant dose in ion implantation. In

summary, a combination of mature empirical/physical models and process/product

design knowledge exists to make sure average chip performance in each corner lot can

be achieved as recommended by the optimal design. This can also be achieved with

high precision, except when the amount of adjustment needed on certain equipment is

so fine that it is even smaller than the smallest adjustment that is physically possible.

However, such fine adjustment is rarely needed in practice. Despite the high precision
of recipe manipulation, it would still be of great practical interest to study how small
deviations in the recipe from the desired level would impact the level of achievement on
the optimal criteria, which is a future research direction we would like to pursue.
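As an illustration of how the physical model can be inverted for one process parameter, the sketch below solves the square-law expression quoted above for the threshold voltage that yields a target drain current. It is not a recipe from process design: the function names and all numerical values are hypothetical, and the device is assumed to be on (gate voltage above threshold).

```python
import math


def drain_current(mobility, c_ox, width, length, v_gate, v_threshold):
    """Square-law current as written in the text: mu * C_ox * (W / L) * (V - V_T)^2."""
    return mobility * c_ox * (width / length) * (v_gate - v_threshold) ** 2


def threshold_for_current(target_current, mobility, c_ox, width, length, v_gate):
    """Threshold voltage giving target_current, assuming v_gate > v_threshold."""
    overdrive = math.sqrt(target_current * length / (mobility * c_ox * width))
    return v_gate - overdrive


# Hypothetical device values, chosen only to exercise the round trip.
current = drain_current(0.04, 0.01, 1e-6, 1e-7, v_gate=1.0, v_threshold=0.3)
recovered = threshold_for_current(current, 0.04, 0.01, 1e-6, 1e-7, v_gate=1.0)
```

The same inversion could be written for the transistor length $L$ instead, which corresponds to the lithography and etch adjustments mentioned above.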

References

[1] Automotive Electronics Council, Component Technical Committee, Guideline for
Characterization of Integrated Circuits, February 18, 2013.

[2] Brent, R.P., Algorithms for minimization without derivatives, Dover publications,
New York (2013).

[3] Bao, L., Wang, K., Jin, R., A hierarchical model for characterising spatial wafer
variations, International Journal of Production Research, Vol. 52, No. 6, 1827-1842
(2014).

[4] Diebold, A.C., Handbook of Silicon Semiconductor Metrology, CRC Press (2001).

[5] Drain, David C., Statistical Methods for Industrial Process Control, CRC Press
(1997).
[6] Gough, A.M., Semiconductor Sample Generation Experimental Designs Robust
to Random Process Shocks, Technical report (2014).

[7] Gray, Paul R., Hurst, Paul J., Lewis, Stephen H., Meyer, Robert G., Analysis and
Design of Analog Integrated Circuits, fifth edition, Wiley (2009).

[8] Fenner, J.S., Jeong, Y.S., Jeong, M.K., Lu, J.C. A Bayesian parallel site
methodology with an application to uniformity modeling in semiconductor
manufacturing, IIE Transactions, Vol. 41, Issue 9, 754-763 (2009).

[9] Jensen, C.R., Variance Component Calculations: Common Methods and
Misapplications in the Semiconductor Industry, Quality Engineering, Vol. 14, Issue
4, 647-657 (2002).

[10] Jin, R. and Shi, J., Reconfigured piecewise linear regression tree for
multistage manufacturing process control, IIE Transactions, Vol. 44, Issue 4, 249-261
(2012).

[11] Jin, R., Liu, K., Multimode Variation Modeling and Process Monitoring for
Serial-Parallel Multistage Manufacturing Processes, IIE Transactions, Vol. 45, Issue
6, 617-629 (2013).

[12] Jin, M., Tsung, F., Smith-EWMA run-to-run control schemes for a process with
measurement delay, IIE Transactions, Vol. 41, Issue 4, 346-358 (2009).

[13] May, Gary S., Costas J. Spanos, Fundamentals of semiconductor manufacturing
and process control, John Wiley & Sons, Inc. (2006).

[14] Montgomery, D.C., Introduction to Statistical Quality Control, 7th Edition, John
Wiley & Sons, Inc. (2012).

[15] Montgomery, D.C., Design and Analysis of Experiments, 8th Edition, John Wiley
& Sons, Inc. (2012).

[16] Myers, R.H., Khuri, A.I., Vining G., Response surface alternatives to the Taguchi
robust parameter design approach, The American Statistician, Vol. 46, Issue 9,
131-139 (1992).

[17] Myers, R.H., Montgomery, D.C., Anderson-Cook, C.M., Response surface
methodology: process and product optimization using designed experiments, John
Wiley & Sons (2016).

[18] Nakagawa, O.S., Chang, N., Lin. S., et al., Circuit impact and skew-corner
analysis of stochastic process variation in global interconnect, IEEE International
Conference in Interconnect Technology, 230-232 (1999).

[19] Reda, S., Nassif, S.R., Accurate Spatial Estimation and Decomposition
Techniques for Variability Characterization, IEEE Transactions on Semiconductor
Manufacturing, Vol. 23, No. 3, 345-357 (2010).

[20] Tseng, S.T., Tsung, F., Liu, P.Y., Variable EWMA run-to-run controller for
drifted processes, IIE Transactions, Vol. 39, Issue 3, 291-301 (2007).
[21] Tseng, S.T., Jou, B.Y., Liao, C.H., Adaptive variable EWMA controller for
drifted processes, IIE Transactions, Vol. 42, Issue 4, 247-259 (2010).

[22] Wang, C.H., Kuo, W., Bensmail, H., Detection and classification of defect
patterns on semiconductor wafers, IIE Transactions, Vol. 38, Issue 12, 1059-1068
(2006).

[23] Wang, D.T., McNall, W.A., Statistical Model based ASIC Skew Selection
Method, IEEE Workshop on Microelectronics and Electron Devices, 64-66 (2004).

[24] Wang, K., Han, K., A batch-based run-to-run process control scheme for
semiconductor manufacturing, IIE Transactions, Vol. 45, Issue 6, 658-669 (2013).

[25] Weste, Neil H.E., Harris, D.M., CMOS VLSI Design: A Circuits and Systems
Perspective, 4th Edition, Addison-Wesley (2011).

[26] Yashchin, Emmanuel, Monitoring Variance Components, Technometrics, Vol. 36,
No. 4, 379-393 (1994).

[27] Yeh, Arthur B., Huwang, L., Wu, C.W., A multivariate EWMA control chart
for monitoring process variability with individual observations, IIE Transactions,
Vol. 37, Issue 11, 1023-1035 (2005).

[28] Yuan, T., Kuo, W., A model-based clustering approach to the recognition
of the spatial defect patterns produced during semiconductor fabrication, IIE
Transactions, Vol. 40, Issue 2, 93-101 (2007).

[29] Yu, J., Qin, S.J., Variance component analysis based fault diagnosis of multi-
layer overlay lithography processes, IIE Transactions, Vol. 41, Issue 9, 764-775
(2009).

[30] Zou, C., Tsung, F., Wang, Z., Monitoring General Linear Profiles Using
Multivariate Exponentially Weighted Moving Average Schemes, Technometrics,
Vol. 49, No.4 (2007).

[31] Zou, C., Tsung, F., Wang, Z., Monitoring Profiles Based on Nonparametric
Regression Methods, Technometrics, Vol. 50, No. 4, 512-526 (2008).

Table 1: The optimal design parameter $\tilde{\delta}^{(i)*}$ (point estimate and 95% confidence interval)
under the maximum-single-lot-probability criterion for different combinations of $\tilde{\epsilon}$ and $\tilde{\sigma}_L^2$
(i.e., different products).

Table 2: The probability for a single lot to consist of at least 0.15 proportion of skew chips
under the optimal design in Table 1, i.e., P (f (L̃1 ; δ̃ (i)∗ ) > 0.15) (point estimate and 95%
confidence interval).

Table 3: The percentage of improvement of the optimal design compared with the target
design, i.e., [P (f (L̃1 ; δ̃ (i)∗ ) > 0.15) − P (f (L̃1 ; 0) > 0.15)]/P (f (L̃1 ; 0) > 0.15), with p value
indicating statistical significance of the improvement.

Table 4: The optimal design parameter $\tilde{\delta}^{(ii)*}$ (point estimate and 95% confidence interval)
under the minimum-expected-cost criterion for different combinations of $\tilde{\epsilon}$ and $\tilde{\sigma}_L^2$ (i.e.,
different products).
Table 5: The expected production cost of corner lots under the optimal design in Table 4,
i.e., EL̃1 (M (δ̃ (ii)∗ , 0.15)) (point estimate and 95% CI).

Table 6: The percentage of reduction in expected production cost of the optimal design
compared with the target design, i.e.,
[EL̃1 (M (0, 0.15)) − EL̃1 (M (δ̃ (ii)∗ , 0.15))]/EL̃1 (M (0, 0.15)), with p value indicating
statistical significance of the reduction.

Figure 1: Histogram of $f(L_1;\mu_1,\cdots,\mu_m)$ under the target design
$(\mu_1,\cdots,\mu_m)^T = (\mu_0,\cdots,\mu_0)^T$.
$m = 24$, $\sigma_W^2 = 2$, $\sigma_M^2 = 3$, $\sigma_L^2 = 6$, $\epsilon = 0.5\sigma$, $\mu_0 = \mu_{nominal}+3\sigma$.

Figure 2: Histogram of $f(L_1;\mu_1,\cdots,\mu_m)$ under an optimal design, i.e., the design under
the maximum-single-lot-probability criterion proposed in Section 4.
$(\mu_1,\cdots,\mu_m)^T = (\mu_1^*,\cdots,\mu_m^*)^T$,
$m = 24$, $\sigma_W^2 = 2$, $\sigma_M^2 = 3$, $\sigma_L^2 = 6$, $\epsilon = 0.5\sigma$, $\mu_0 = \mu_{nominal}+3\sigma$, $\mu_i^* = \mu_0-3.89$ for
$i = 1,\cdots,12$, $\mu_i^* = \mu_0+3.89$ for $i = 13,\cdots,24$.
Figure 3: Steps of the optimal design search algorithm.
