194 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 46, NO. 1, JANUARY 2011

A 45 nm Resilient Microprocessor Core for Dynamic Variation Tolerance
Keith A. Bowman, Member, IEEE, James W. Tschanz, Member, IEEE, Shih-Lien L. Lu, Senior Member, IEEE,
Paolo A. Aseron, Muhammad M. Khellah, Member, IEEE, Arijit Raychowdhury, Member, IEEE,
Bibiche M. Geuskens, Member, IEEE, Carlos Tokunaga, Member, IEEE, Chris B. Wilkerson,
Tanay Karnik, Senior Member, IEEE, and Vivek K. De, Senior Member, IEEE

Abstract—A 45 nm microprocessor core integrates resilient error-detection and recovery circuits to mitigate the clock frequency ($F_{CLK}$) guardbands for dynamic parameter variations to improve throughput and energy efficiency. The core supports two distinct error-detection designs, allowing a direct comparison of the relative trade-offs. The first design embeds error-detection sequential (EDS) circuits in critical paths to detect late timing transitions. In addition to reducing the $F_{CLK}$ guardbands for dynamic variations, the embedded EDS design can exploit path-activation rates to operate the microprocessor faster than infrequently-activated critical paths. The second error-detection design offers a less-intrusive approach for dynamic timing-error detection by placing a tunable replica circuit (TRC) per pipeline stage to monitor worst-case delays. Although the TRCs require a delay guardband to ensure the TRC delay is always slower than critical-path delays, the TRC design captures most of the benefits from the embedded EDS design with less implementation overhead. Furthermore, while core min-delay constraints limit the potential benefits of the embedded EDS design, a salient advantage of the TRC design is the ability to detect a wider range of dynamic delay variation, as demonstrated through low supply voltage ($V_{CC}$) measurements. Both error-detection designs interface with error-recovery techniques, enabling the detection and correction of timing errors from fast-changing variations such as high-frequency $V_{CC}$ droops.

The microprocessor core also supports two separate error-recovery techniques to guarantee correct execution even if dynamic variations persist. The first technique requires clock control to replay errant instructions at $\frac{1}{2}F_{CLK}$. In comparison, the second technique is a new multiple-issue instruction replay design that corrects errant instructions with a lower performance penalty and without requiring clock control. Silicon measurements demonstrate that resilient circuits enable a 41% throughput gain at equal energy or a 22% energy reduction at equal throughput, as compared to a conventional design when executing a benchmark program with a 10% $V_{CC}$ droop. In addition, the microprocessor includes a new adaptive clock control circuit that interfaces with the resilient circuits and a phase-locked loop (PLL) to track recovery cycles and adapt to persistent errors by dynamically changing $F_{CLK}$ for maximum efficiency.

Index Terms—Resilient microprocessor, resilient design, resilient circuit, dynamic variation, timing error, error detection, error-detection sequential circuit, tunable replica circuit, error correction, error recovery, multiple-issue instruction replay, variation tolerance, adaptive circuit, adaptive clocking.

Manuscript received July 28, 2010; revised October 01, 2010; accepted October 14, 2010. Date of publication December 03, 2010; date of current version December 27, 2010. This paper was approved by Associate Editor Bevan Baas.
The authors are with Intel Corporation, Hillsboro, OR 97124 USA (e-mail: keith.a.bowman@intel.com).
Digital Object Identifier 10.1109/JSSC.2010.2089657
0018-9200/$26.00 © 2010 IEEE

I. INTRODUCTION

Variability in device and circuit parameters adversely affects the performance and energy efficiency of microprocessors across all market segments, ranging from the small embedded core in a system-on-chip (SoC) to large multi-core servers. A dynamic parameter variation occurs in time during the microprocessor operation, resulting from environmental and workload changes. Examples of dynamic variations include supply voltage ($V_{CC}$) droops, temperature changes, and transistor aging degradation. $V_{CC}$ droops result from abrupt changes in switching activity, inducing large current transients in the power delivery system. The droop magnitude and duration depend on the interaction of capacitive and inductive parasitics at the board, package, and die levels with changes in current demand [1]. $V_{CC}$ droops contain high-frequency (i.e., fast changing) and low-frequency (i.e., slow changing) components and occur locally and globally across the die. Temperature variations occur at a relatively slow time scale with local hot spots on the die, depending on environmental and workload conditions as well as the heat-removal capability of the package. Transistor aging slowly degrades the drive current over time as a function of gate bias and temperature conditions.

Conventional microprocessor designs build in clock frequency ($F_{CLK}$) guardbands to ensure correct functionality in the presence of worst-case dynamic variations. Consequently, these inflexible designs cannot exploit the opportunities for higher performance by increasing $F_{CLK}$ or lower energy by reducing $V_{CC}$ during favorable operating conditions and lack of aging degradation. Since most systems usually operate at nominal conditions where worst-case scenarios rarely occur, the necessary guardbands for these infrequent dynamic variations severely limit the performance and energy efficiency of conventional designs.

On-die variation sensors coupled with adaptive circuit techniques have been demonstrated to adjust $F_{CLK}$, $V_{CC}$, or body bias in response to slow-changing $V_{CC}$, temperature, and aging variations [2]–[4]. Since these designs require time to detect and respond to dynamic variations to avoid actual timing violations, these circuit techniques reduce the $F_{CLK}$ guardbands for slow-changing global variations, resulting in higher average $F_{CLK}$. Alternatively, the average $F_{CLK}$ benefits may be converted to lower average energy by decreasing $V_{CC}$. A disadvantage of on-die sensors and adaptive circuits is the inability to respond to fast-changing variations such as high-frequency $V_{CC}$ droops or local path-level variations. Thus, $F_{CLK}$ guardbands for fast-changing variations are still required.
In contrast to sensors and adaptive circuits that avoid timing errors, a resilient design contains error-detection and recovery capabilities [5]–[12] to maintain correct system functionality while in the presence of internal errors. Resilient circuits enable the microprocessor to operate at a higher $F_{CLK}$ as compared to a conventional design. When a dynamic parameter variation induces a timing error, the resilient circuits detect and correct the error. The key advantage of a resilient design over sensors and adaptive circuits is the relaxed response-time constraint. As long as the resilient design prevents the timing error from corrupting the architectural state of the microprocessor, error correction can occur over multiple clock cycles. Thus, resilient circuits detect and correct timing errors from both fast- and slow-changing variations. Although error correction requires additional clock cycles, the gains from mitigating the $F_{CLK}$ guardbands for infrequent dynamic variations far outweigh the recovery overhead, resulting in higher overall throughput [11]. Alternatively, the throughput benefit can be traded off for lower energy by reducing $V_{CC}$. The disadvantage of resiliency is the design complexity of the error-detection and recovery circuits.

This paper presents a 45 nm resilient microprocessor core that mitigates the $F_{CLK}$ guardbands for dynamic variations to maximize throughput or energy efficiency [13]. Section II provides an overview of the microprocessor design, including the resilient core and the adaptive clock control to dynamically adjust $F_{CLK}$ based on the operating environment for maximum efficiency. Sections III and IV describe two separate error-detection designs and two separate error-recovery techniques, respectively, allowing a direct comparison of the relative trade-offs. Section V gives a detailed description of the design methodology for integrating the error-detection and recovery circuits into the microprocessor. Section VI provides the testing infrastructure for compiling and executing benchmark programs. Section VII presents the silicon measurements, highlighting the advantages and disadvantages of the two error-detection designs and the two error-recovery techniques. Section VIII concludes by summarizing the key results and insights.

II. MICROPROCESSOR DESIGN OVERVIEW

The microprocessor implementation allows a comparison of resilient and conventional designs, including an analysis of resiliency overheads and silicon measurements of throughput and energy while executing benchmark programs. As illustrated in Fig. 1, the research microprocessor consists of a resilient core with an error-control unit (ECU), a 16 KB instruction cache, a 16 KB data cache, a register file (RF), and a clock generator with adaptive clock control. The microprocessor also contains on-die noise injectors to induce $V_{CC}$ droop events. Microprocessor features are programmed through an IEEE 1149.1 JTAG scan controller. Subsections A–E describe each of these components. The micrograph and characteristics are given in Fig. 2. The microprocessor is manufactured in a 45 nm logic technology [14] on a 4.4 × 3.1 mm packaged die.

Fig. 1. Resilient microprocessor block diagram. Error-detection circuits are integrated into the first five pipeline stages. Errors are pipelined to the write-back (WB) stage to invalidate errant instructions and to the error-control unit for recovery. Adaptive clock control monitors the recovery rate to dynamically change clock frequency ($F_{CLK}$) during a persistent variation.

Fig. 2. Resilient microprocessor micrograph and characteristics.

A. Resilient Core Design

The research microprocessor core is based on the open-source, synthesizable, LEON-3 design [15], where the 32-bit, RISC, in-order pipeline is modified to incorporate the resiliency features. As described in Fig. 1, the seven-stage pipeline consists of instruction fetch (IF), decode (DE), register access (RA), execute (EX), memory (MEM), exception (X), and write-back (WB) stages. The core only supports integer operations since the floating-point unit (FPU) and hardware multiplier are omitted. Data is written into the register file at the WB stage, and the register file is accessed at the RA stage. Data cache writes, which occur in the MEM stage, are locally buffered for one cycle to ensure the instruction is valid before committing the write.

The modified core integrates resilient error-detection and recovery features. Error-detection circuits protect the first five pipeline stages (IF, DE, RA, EX, and MEM) by detecting late timing transitions. As discussed further in Section V, the X and WB stages are designed with additional timing guardband to ensure dynamic-variation timing failures do not occur in these two stages. If a dynamic variation induces a timing failure in any of the first five pipeline stages, the error-detection circuits identify the error and generate a single pipeline-error signal (e.g., for the DE stage). This error signal is pipelined to the WB stage to invalidate the errant instruction and to the ECU for error recovery. At the WB stage, control logic also prevents subsequent instructions from corrupting the architectural state. The scan-programmable ECU implements two distinct error-recovery techniques based on replaying the errant instruction. If the errant instruction executes correctly during the replay, the instruction commits data to the architectural state, and then subsequent instructions continue normal operation.
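The recovery flow described above can be sketched as a small behavioral model: an instruction flagged by the pipeline-error signal is invalidated at write-back, nothing is committed for that attempt, and the instruction is replayed until it completes without an error. This is only a minimal sketch under stated assumptions. The names `run_with_replay`, `flagged`, and `max_attempts` are hypothetical, the model views one instruction at a time, and it does not represent either of the two ECU recovery techniques in detail.

```python
def run_with_replay(program, flagged, max_attempts=4):
    """Commit instructions in program order, replaying flagged attempts.

    program:  list of instruction names in program order (hypothetical).
    flagged:  set of (index, attempt) pairs for which the error-detection
              circuits raise the pipeline-error signal (an assumption used
              to inject errors into the model).
    Returns (committed, replays).
    """
    committed = []
    replays = 0
    for index, instruction in enumerate(program):
        for attempt in range(max_attempts):
            if (index, attempt) in flagged:
                # The error reaches write-back: the attempt is invalidated,
                # nothing is committed, and the instruction is replayed.
                # Later instructions wait, so they cannot corrupt state.
                replays += 1
                continue
            # Error-free attempt: commit to the architectural state.
            committed.append(instruction)
            break
        else:
            # Every allowed attempt was flagged (persistent variation).
            raise RuntimeError(
                f"{instruction} still failing after {max_attempts} attempts"
            )
    return committed, replays
```

For example, `run_with_replay(["ld", "add", "st"], {(1, 0), (1, 1)})` commits all three instructions in order after replaying `add` twice. The `max_attempts` bound is an artifact of the sketch; the text instead relies on the recovery techniques and adaptive clock control to handle persistent variations.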
B. Instruction and Data Caches and Register File

The instruction and data caches are 16 KB each. The cache memory cell is based on the 8T L1 cache cell from the 45 nm Intel® Core™ i7 microprocessor [16]. The register file contains 40 entries and three ports, supporting one write and two read operations per cycle. The core and memory structures are on separate supply voltages with level shifters in between to ensure the minimum operating voltage for the caches and register file does not limit the core functionality at low voltages. In addition, separate supply voltages allow independent testing of the core and memory.

C. Clock Generator with Adaptive Clock Control

The clock generator contains a phase-locked loop (PLL) based on the PLL in the 45 nm Intel® Core™ i7 microprocessor [17]. In order to quickly generate the $\frac{1}{2}F_{CLK}$ signal for one of the error-recovery techniques, a clock-divider circuit skips alternate clock pulses to reduce $F_{CLK}$ in half without requiring PLL relock [11]. The ECU enables either the full-frequency or half-frequency clock signal. This clock signal then drives a scan-tunable duty-cycle control circuit [11] to maintain a constant high-phase delay for the core clock at both high and low $F_{CLK}$ values, providing min-delay protection for the embedded error-detection sequential circuits as discussed further in Section III.A.

An adaptive clock control circuit interfaces between the ECU and the PLL to track recovery cycles and adjust the PLL divide ratio to dynamically adapt to slow-changing variations. The adaptive clock controller consists of two counters and a finite-state machine. The counters are based on a cascaded design to ensure path delays are not timing critical. The adaptive clock controller receives two inputs from the ECU: (i) Replay signal and (ii) Half-frequency signal. When replaying an errant instruction, the replay signal is logically-high for the duration of the replay. If the ECU replays an errant instruction at $\frac{1}{2}F_{CLK}$, the half-frequency signal is also logically-high for the duration of the replay. The adaptive clock controller counts the number of replay cycles by incrementing a counter for every clock cycle that the replay signal is logically-high. If the half-frequency signal is also logically-high, this replay counter increments by two, which accounts for the 2X replay penalty for each half-frequency clock cycle. The replay counter accumulates the number of recovery cycles over a programmable sampling period (e.g., 1 ms), which is monitored with a separate counter. The adaptive clock controller compares the output of the replay counter to a set of programmable thresholds. As discussed further in Section VII, an optimum recovery rate exists to maximize the throughput of a resilient design. The upper and lower thresholds are based on the optimum recovery rate. The adaptive clock controller only initiates an $F_{CLK}$ change if the number of recovery cycles either exceeds the upper threshold or remains below the lower threshold for two consecutive sampling periods, which provides a long-duration measurement of the current operating environment. If an $F_{CLK}$ change is desired, the adaptive clock controller changes the PLL divide ratio. After the PLL relocks to the new divide ratio, the adaptive clock controller monitors the recovery cycles at the new $F_{CLK}$ value to ensure optimal operation. Maximum, minimum, and intermediate $F_{CLK}$ values are scan programmable. This adaptive clock controller enables the microprocessor to adjust to the operating environment to maximize throughput.

D. Noise Injector Circuits

Programmable noise injector circuits are inserted at multiple locations in the microprocessor to generate $V_{CC}$ droops as observed during the normal operation of a larger microprocessor. Noise injector circuits consist of scan-configurable current-sink transistors activated by an external noise clock. By programming the external noise clock and the number of current-sink transistors, the timing of the $V_{CC}$ droop as well as the droop magnitude and frequency are well-controlled. An on-die dynamic variation monitor [18] provides a cycle-accurate measurement of the droop magnitude and frequency to guide the noise injector settings.

E. JTAG Scan Controller

An on-die JTAG scan controller coordinates the programming of all scan-enabled features in the microprocessor. Separate JTAG scan chains are provided for the core, the instruction and data caches, and the clock generator. The on-die JTAG controller interfaces with a host computer, which runs custom testing software written in the Perl and C programming languages. The testing software and JTAG scan controller load the binary for the C-compiled benchmarks and the program input data into the instruction and data caches, respectively, for execution. During program execution, data is written to the register file and data cache. After the program completes, the register file and data cache contents are scanned out through the JTAG scan controller and testing software to validate program functionality.

III. ERROR-DETECTION CIRCUITS

This section presents two separate designs for timing-error detection: (i) Embedded error-detection sequential (EDS) circuit and (ii) Tunable replica circuit (TRC). Both error-detection designs interface with error recovery to mitigate the $F_{CLK}$ guardbands for dynamic variations. Scan bits can mask EDS and/or TRC error signals, allowing separate testing of either technique and for a conventional design without error detection.

A. Embedded Error-Detection Sequential (EDS) Circuit

Fig. 3 describes the concept of timing-error detection for dynamic variation tolerance. Fig. 3(a) represents a conventional design, consisting of a critical path with driving and receiving flip-flops (FF). In Fig. 3(b), conceptual timing diagrams illustrate the arrival times of the input data (D) to the receiving FF during worst-case dynamic variations and nominal conditions. In the presence of worst-case dynamic variations, the input data to the receiving FF must arrive a setup time prior to the rising clock edge to guarantee correct functionality. In comparison, the input data for the same path arrives much earlier during nominal conditions. The difference between the input data arrival times for these two cases represents the effective timing guardband required for dynamic variations. A resilient design is created by replacing the receiving FF of the conventional design with an EDS circuit [11] as described in Fig. 3(c).
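The double-sampling behavior of the EDS circuit in Fig. 3(c) can be illustrated with a small behavioral model: a shadow flip-flop samples the input on the rising clock edge, a datapath latch holds the value present at the falling edge, and an XOR of the two flags a late transition. The sketch below is illustrative only. The function and field names, the use of 0/1 integers, and the idealized zero setup and hold times are assumptions, not part of the design.

```python
from dataclasses import dataclass


@dataclass
class DstbSample:
    flip_flop: int  # value captured by the shadow FF on the rising edge
    latch: int      # value held by the datapath latch after the falling edge
    error: int      # XOR of the two outputs (1 = late transition detected)


def dstb_sample(old_value, new_value, arrival_time, high_phase):
    """Model one clock cycle of a double-sampling EDS circuit.

    arrival_time is measured from the rising clock edge (negative means the
    data arrived before the edge). high_phase is the width of the high clock
    phase, i.e. the error-detection window. Setup and hold times are taken
    as zero for simplicity (an assumption of this sketch).
    """
    if arrival_time >= high_phase:
        # Data arriving after the falling edge is outside the window and
        # cannot be detected or captured by this circuit.
        raise ValueError("transition falls outside the error-detection window")
    # The shadow FF only sees the new value if it arrived before the edge.
    flip_flop = new_value if arrival_time < 0 else old_value
    # The latch is transparent during the high phase, so it still captures
    # data that arrives late but before the falling edge.
    latch = new_value
    return DstbSample(flip_flop, latch, flip_flop ^ latch)


def pipeline_error(stage_errors):
    """OR together the error signals of all EDS circuits in one stage."""
    return int(any(stage_errors))
```

With a 0-to-1 transition, an arrival before the rising edge gives matching outputs and no error, while an arrival inside the high phase gives a correct latch output together with a raised error flag. That mirrors the statement that the datapath still receives the late data while recovery is triggered.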
Fig. 3. (a) Conventional design and (b) conceptual timing diagrams for worst-case dynamic variations and nominal conditions. (c) Resilient design employing a double-sampling with time-borrowing (DSTB) error-detection sequential (EDS) circuit and (d) conceptual timing diagram for late arriving input data. For the resilient design, CLK is duty-cycle controlled to satisfy min-delay requirements.

In Fig. 3(d), a conceptual timing diagram illustrates the EDS circuit operation when the input data arrives late. The EDS circuit in Fig. 3(c) is a double-sampling with a time-borrowing latch (DSTB) design [11]. The shadow FF and datapath latch sample the input data on the rising and falling clock edges, respectively. An XOR logic gate compares the latch and FF outputs to generate the error signal (ERROR). If the input data transitions late as described in Fig. 3(d), latch and FF outputs differ, resulting in a logically-high error signal. The error signals from each EDS circuit in a pipeline stage are inputs to an OR logic tree to generate the single pipeline-error signal. The EDS circuit only detects late timing transitions during the high clock phase. During the low clock phase, latch and FF outputs remain at constant logic values. The propagation delay through the OR logic tree is designed to be less than the delay of the low clock phase to guarantee proper error detection. As described in Section II.A, the pipeline-error signal propagates to the WB stage to invalidate the errant instruction and to the ECU for error recovery. By detecting and correcting late arriving data, the resilient design reduces the timing guardband for infrequent dynamic variations, enabling a higher $F_{CLK}$ as compared to a conventional design.

A critical issue for some previous EDS circuits [5]–[9] is the susceptibility to datapath metastability when the input data arrives close to a rising clock edge, resulting in the possibility of undetected errors. For the DSTB EDS circuit, the datapath latch operates as a pulsed latch, thus eliminating datapath metastability during a rising clock edge. Although datapath metastability is removed, the shadow FF output can become metastable during a rising clock edge. In contrast to the datapath, the error path does not fan-out and behaves similarly to a traditional synchronizer circuit, thus drastically simplifying the metastability problem. For a microprocessor design, the mean time between failures (MTBF) from error-signal metastability is over ten orders of magnitude larger than the MTBF targets for radiation-induced soft errors [11].

Although the DSTB design employs a datapath latch, path timing constraints are still based on a FF design with an error-detection window as illustrated in Fig. 3. The purpose of the transparency window in the datapath latch is to eliminate datapath metastability while detecting timing errors. When input data arrives late, the DSTB design generates an error signal even though the input data traverses to the latch output. The error signal ensures that late arriving data from the path in the current pipeline stage does not affect the maximum path delay (max-delay) constraint for adjoining fan-out paths in subsequent pipeline stages. If ample max-delay margin is available for the adjoining paths in the subsequent pipeline stage, then a pulsed latch may replace the DSTB EDS circuit at the current pipeline stage. This would enable traditional time borrowing between the path in the current pipeline stage and the adjoining paths in the subsequent pipeline stage.

For the DSTB EDS circuit, the high clock phase defines the error-detection window ($T_W$) as illustrated in Fig. 3(d). The max-delay constraint in the presence of worst-case dynamic conditions for max-delay is defined as

$$D_{MAX} \le T_{CYCLE} + T_W - T_{SETUP\text{-}FALL} \tag{1}$$

$D_{MAX}$ is the maximum path delay, including the clock-to-output delay of the driving sequential circuit and the clock skew and jitter delays, $T_{CYCLE}$ is the cycle time ($1/F_{CLK}$), and $T_{SETUP\text{-}FALL}$ is the datapath latch setup time based on the falling clock edge. The minimum path delay (min-delay) constraint during worst-case dynamic conditions for min-delay is calculated as

$$D_{MIN} \ge T_W + T_{HOLD\text{-}FALL} \tag{2}$$

$D_{MIN}$ is the minimum path delay, accounting for the clock-to-output delay of the driving sequential circuit and the clock skew and jitter delays, and $T_{HOLD\text{-}FALL}$ is the hold time based on the falling clock edge. The max-delay and min-delay constraints in (1)–(2) only apply to paths with an EDS circuit as the receiving sequential circuit. For a target $T_W$, min-delay requirements are satisfied in pre-silicon design by buffer insertion and sizing. As
$T_W$ increases, the number of buffers increases, leading to larger area and power. From (1) and (2), the fundamental trade-off in the DSTB EDS circuit is max-delay versus min-delay. As $T_W$ increases, $T_{CYCLE}$ may decrease to enable a higher $F_{CLK}$ while satisfying the max-delay constraint in (1) at the cost of a larger min-delay penalty in (2). For microprocessors with deep pipelines (i.e., small number of logic stages between sequential circuits), this trade-off may not be advantageous due to the stringent min-delay requirements. In recent technology generations, however, the microarchitecture for microprocessors has moved towards shallow pipelines (i.e., large number of logic stages between sequential circuits) to improve energy efficiency [19], [20]. Microprocessors with shallow pipelines greatly relax the min-delay requirements as compared to a deep-pipeline design, enabling a more effective trade-off of max-delay improvement for min-delay penalty.

To ensure protection from min-delay violations, the high clock phase (i.e., $T_W$) is tuned at post-silicon with a duty-cycle control circuit. The duty-cycle control circuit maintains a constant high-phase delay for the clock at low and high $F_{CLK}$ values. The duty cycle is calibrated at the highest $V_{CC}$ and lowest temperature specifications to provide a worst-case measurement for min-delay. At these conditions, the maximum error-detection window is measured via functional testing (i.e., running programs). As a trade-off to reduce the calibration time at the cost of less potential benefits, the $T_W$ tuning can target a delay equal to this maximum window minus a guardband, where the guardband satisfies min-delay constraints across die-to-die and within-die process variations.

In previous work, a variety of EDS circuits have been proposed [5]–[11]. The transition detector with time borrowing (TDTB) is the lowest clock energy EDS circuit known [11]. The TDTB circuit, however, is a complex design since the dynamic transition detector is sensitive to within-die process variations. DSTB is the lowest clock energy static-CMOS EDS circuit known [11]. In comparison to TDTB, DSTB allows for a simpler implementation at the cost of higher clocking energy. Both TDTB and DSTB eliminate datapath metastability [11]. For these reasons, the DSTB circuit is chosen as the embedded EDS circuit for the resilient microprocessor core.

Fig. 4 provides the schematic of the actual EDS circuit implementation, where a scan-enabled latch precedes the datapath latch in the DSTB design from Fig. 3(c). If the scan-input mode signal is logically-low, the initial latch remains transparent and the circuit logically operates as the DSTB design in Fig. 3(c). A logically-high mode signal disables the EDS circuit, where the error signal is ignored and the two datapath latches behave as a standard master-slave FF. Since the two datapath latches are designed with an equal CLK-to-Q delay and setup time as compared to a standard FF library cell with equivalent output drive strength, the configurable design in Fig. 4 allows for a direct comparison between a resilient design with embedded EDS circuits and a conventional design with FFs. Moreover, the mode signal assists silicon debug where groups of EDS circuits per pipeline stage are either enabled or disabled for critical-path analysis. All non-critical paths in the core and memory controller use standard FF library cells as the receiving sequential circuit. Only the receiving sequential circuits for paths with critical or near critical timing are converted to the configurable EDS circuits in Fig. 4. In comparison to a FF, the DSTB EDS circuit provides an error-detection capability at a cost in power and area. These overheads are quantified at the microprocessor level in Section V.

Fig. 4. Implementation of the error-detection sequential (EDS) circuit. Scan-configured mode signal enables either an EDS circuit (mode = 0), where the initial datapath latch remains transparent, or a traditional master-slave flip-flop (mode = 1), where ERROR is ignored.

B. Tunable Replica Circuit (TRC)

In comparison to the embedded EDS design, the tunable replica circuit (TRC) design is a less-intrusive error-detection technique [12] that does not affect critical-path timing. As described in Fig. 5(a), the TRC consists of a toggle FF and a scan-configurable buffer delay chain. The toggle FF switches the input to the buffer delay chain every cycle. The TRC output drives an EDS circuit to detect timing failures due to dynamic variations. As illustrated in Fig. 5(b), a TRC with an EDS circuit is placed adjacent to each pipeline stage in the first five stages. At test time, the TRC delays are calibrated to track critical-path delays per pipeline stage. The TRC and the pipeline stage use the same local $V_{CC}$ and clock, enabling the TRC to detect $V_{CC}$ droops at fine granularity and to capture clock-to-data correlations per pipeline stage [12]. If a dynamic variation induces a late timing transition in the TRC, the EDS circuit generates an error signal, which represents the single pipeline-error signal as discussed in Section II.A. Although an actual timing error may not have occurred in the pipeline if the critical paths are not activated, this design inherently assumes a critical-path error did occur and initiates recovery. As described for the embedded EDS circuits, the single pipeline-error signal propagates to the WB stage to prevent the potentially errant instruction from committing data to the architectural state and to the ECU to enable recovery.

The key insight of the TRC design is the integration with error recovery. Previous designs with on-die sensors and adaptive circuits (e.g., canary-based designs) [2]–[4] must detect the path-delay change, communicate the delay change to the adaptive circuit, and respond by adjusting the operating environment (e.g., $F_{CLK}$ or $V_{CC}$) to avoid an actual timing error. The communication and response-time constraints prohibit these designs from detecting and responding to a sudden increase in path delay due to fast-changing variations such as a high-frequency $V_{CC}$ droop. On-die sensors and adaptive circuits primarily reduce the
FCLK guardband for slow-changing variations only, where an FCLK guardband for fast-changing variations is still required. In contrast, the TRC with error recovery eliminates the communication and response-time constraints imposed on canary-based techniques, thus mitigating the FCLK guardbands for both fast- and slow-changing variations. From this perspective, the TRC calibration only requires the TRC to always fail if any critical path fails in the pipeline due to a dynamic variation. In guaranteeing this constraint, the TRC delay is tuned slower than the critical-path delays, hence replacing large delay guardbands for dynamic variations with a much smaller TRC delay guardband.

Fig. 5. (a) Tunable replica circuit (TRC) with an EDS circuit. (b) TRC design integrates error recovery to detect and correct timing errors from fast-changing variations such as high-frequency voltage droops.

The TRC delays are calibrated at nominal VCC and the highest temperature specification. At these conditions with the TRCs disabled, the microprocessor maximum clock frequency (FMAX) is measured via functional testing. Next, the core executes a no-operation (NOP) program at an FCLK slightly less than FMAX, which is referred to as the calibration FCLK. Scan bit settings enable one TRC and disable the other four TRCs in the core. As illustrated in Fig. 1, the core error signal, which interfaces with the WB pipeline stage and the ECU, is driven off-chip. By observing the core error signal, the TRC delay is tuned to the corresponding cycle time. The delay calibration is then repeated for each TRC in the core. Separately tuning each TRC mitigates the delay variations between critical paths and the TRC due to within-die process variations [21] and allows the TRC to detect VCC droops at fine granularity and capture clock-to-data correlations per pipeline stage. For microprocessor designs with individual pipeline stages spread over a large area, additional TRCs would improve the accuracy of monitoring critical-path delays at a cost of longer calibration time. As a trade-off to reduce the calibration time, only the last TRC in the pipeline (i.e., TRC in MEM stage) requires tuning while disabling the other four TRCs during operation, resulting in a larger TRC delay guardband, and consequently, less potential benefits. TRC settings are validated with functional testing while injecting VCC droops as described in Section II.D. Since transistor delay is more sensitive to VCC than interconnect delay, the TRC contains a minimum amount of interconnect to ensure the TRC delay degradation from a VCC droop is either larger or nearly equal to the delay degradation of critical paths in the core. Therefore, the TRC delay, which is adjusted slower than the critical-path delays at nominal VCC, should remain slower than the critical-path delays during a VCC droop. By calibrating the TRC delay at the highest temperature specification, the TRC remains slower than the critical paths as temperature reduces, where the interconnect delay improves faster than the transistor delay at a VCC of 1.0 V. From silicon measurements at 1.0 V, the TRC frequency change tracks the microprocessor FMAX change to within 0.5% from 90 °C to 30 °C. At the cost of additional test time, the guardband between the TRC delay and critical-path delay is further reduced by repeating the calibration steps with a higher calibration FCLK while continuing to validate the TRC settings with functional testing.

C. Advantages and Disadvantages of EDS and TRC Designs

TABLE I
ADVANTAGES AND DISADVANTAGES OF EDS AND TRC ERROR-DETECTION DESIGNS

Table I lists the key trade-offs between the embedded EDS and TRC designs. The EDS design detects critical-path timing failures for fast and slow as well as long-range and local dynamic variations. In contrast, the TRC design cannot detect path-specific or highly-localized dynamic variations (e.g., delay push-out from cross-coupling capacitance or multiple-inputs switching). Although transistor aging degradation affects the individual transistors in a path depending on the gate voltage and temperature conditions, a separate DC-stressed TRC with a periodically-toggled input can track the worst-case delay of aging and recovery for critical paths and clocks, while capturing the effects of power cycling and sleep modes [12]. As discussed earlier, the TRC requires a delay guardband to ensure the TRC delay is always slower than critical-path delays, thus preventing the possibility of exploiting path-activation rates for higher

Fig. 6. Multiple-issue (MI) instruction replay example with N = 3: After flushing the pipeline, issue the errant instruction N times; N-1 issues are replica instructions to set up pipeline registers and do not affect the architecture state; the Nth issue is a valid instruction and is allowed to commit data to the architectural state.
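
The issue sequence in the Fig. 6 caption can be sketched as a small behavioral model. This is only an illustration under stated assumptions, not the ECU logic: the function names and the `valid_issue_ok` callback are hypothetical, and the fallback of eight issues is taken from the seven-replica case described for the second replay in Section IV.B.

```python
def issue_sequence(n):
    """Commit-enable bit for each of the n issues of the errant instruction:
    0 for the n-1 replica issues, 1 for the final valid issue."""
    if n < 1:
        raise ValueError("need at least one issue")
    return [0] * (n - 1) + [1]


def replay(n, valid_issue_ok, fallback_n=8):
    """Model one error recovery at an unchanged clock frequency.

    valid_issue_ok(n) is a hypothetical callback reporting whether the
    valid (n-th) issue executed without a timing error. Errors on replica
    issues are ignored. Returns the commit-enable bits of every issue made,
    including a second replay with fallback_n issues when the first fails.
    """
    issued = issue_sequence(n)
    if not valid_issue_ok(n):
        # Too few replicas to settle the register inputs: replay again
        # with one replica per pipeline stage (assumed seven stages).
        issued += issue_sequence(fallback_n)
    return issued


print(issue_sequence(3))  # [0, 0, 1]
```

In this model a replay with N = 3 that succeeds costs three issues, while a failed N = 2 attempt costs two issues plus the eight-issue fallback.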

performance as with embedded EDS circuits. Furthermore, the TRC design may initiate an error recovery when an actual error did not occur, resulting in unnecessary recovery cycles. In comparison to the EDS design, the TRC design significantly reduces the design complexity overhead. In particular, the TRC design does not affect the min-delay paths in the core, has lower clocking energy, and does not require a duty-cycle control circuit. Moreover, since the core min-delay constraints limit the error-detection window for the EDS design, and consequently, the maximum potential benefits as described in (1)–(2), the TRC design provides a larger error-detection window to detect a wider range of dynamic delay variation. Both designs require post-silicon calibration, which affects testing costs.

IV. ERROR-RECOVERY TECHNIQUES

This section describes two separate techniques for error recovery: (i) Instruction replay at 1/2 FCLK and (ii) Multiple-issue instruction replay at FCLK. The core issues an instruction and the corresponding program counter (PC) value at the IF pipeline stage. The PC value then propagates down the pipeline with the instruction. The ECU locally stores the PC of an errant instruction to perform error recovery. Since the original core pipeline already sends the PC to most of the pipeline stages for exception handling, the additional overhead for propagating the PC to the remaining pipeline stages is low. Both error-recovery techniques replay errant instructions, which is similar to the approach for recovering from a branch misprediction. The error-detection circuits prevent the errant instruction from corrupting the architectural state of the microprocessor. Prior to replaying the errant instruction, the ECU initially flushes the pipeline to resolve any complex bypass register issues. After flushing the pipeline, the ECU reissues the errant instruction. If the replayed instruction executes without an error, control logic allows the instruction to commit data to the architectural state, and then subsequent instructions continue normal operation. If an error occurs for the replayed instruction, then the ECU replays the errant instruction again. The error-recovery design and corresponding algorithm settings are programmed in the ECU through scan. When testing a conventional design without error correction, ECU scan bits disable the error-recovery circuits.

A. Instruction Replay at 1/2 FCLK

In this error-recovery design, FCLK is halved while replaying the errant instruction [10], [11]. As illustrated in Fig. 1 and described in Section II.C, the PLL drives a clock-divider circuit to generate the 1/2 FCLK signal. When the ECU initiates an error recovery, the ECU signals the clock generator to reduce FCLK by half, while the duty-cycle control circuit maintains a constant high-phase delay for the clock to provide min-delay protection for the embedded EDS circuits. This design allows fast clock control without requiring PLL relock. As described earlier, the ECU flushes the pipeline and then reissues the errant instruction. Reducing FCLK by half ensures the replayed instruction executes correctly even if dynamic variations persist. After the replayed instruction finishes, the ECU signals the clock generator to resume at the target FCLK. Since FCLK is halved for all of the recovery cycles, the number of actual and effective recovery cycles per error is 14 and 28, respectively.

B. Multiple-Issue Instruction Replay at FCLK

The motivation for the multiple-issue instruction replay design is to guarantee correct execution for the replayed instruction without changing FCLK. As illustrated in the example in Fig. 6, the instruction replay starts after the detected error reaches the WB stage. After flushing the pipeline, the ECU issues the errant instruction multiple (N) times without changing FCLK. The first N-1 issues are replica instructions, which do not affect the architecture state. The Nth issue is a valid instruction, which is allowed to commit data to the architectural state. The replica instructions flow through the pipeline to set up the register input nodes for the valid instruction. Any error that occurs in the execution of these replica instructions is ignored and, if the number of replica instructions is sufficient, the register inputs for each pipeline stage statically settle to the correct value, allowing the valid instruction to execute correctly. For the example in Fig. 6, the case of N = 3 corresponds to two replica instructions and one valid instruction. The number of recovery cycles equals [...]. If the ECU issues an insufficient number of replica instructions such that an error occurs during the execution of the valid instruction, then the ECU replays the errant instruction a second time with N = 8 (i.e., seven replica instructions and one valid instruction). With N = 8, the number

of replica instructions equals the number of pipeline stages to ensure the register inputs for each pipeline stage are set to the appropriate value, thus guaranteeing correct execution of the valid instruction.

In implementing the multiple-issue replay, an additional bit is added to the microprocessor pipeline to denote whether an instruction is allowed to commit data to the architectural state. The ECU sets this bit to a logic-low for all replica instructions and to a logic-high for the valid instruction. Since this error-recovery design relies on setting up path nodes, this technique is directly applicable to static-CMOS circuit designs and would not correct timing errors in dynamic logic circuits.

V. DESIGN METHODOLOGY

The integration of resilient error-detection and correction circuits into a microprocessor core requires two additional steps beyond the typical design flow. First, the design is separated into two categories: (i) Recoverable circuits and (ii) Unrecoverable circuits. Error recovery for some paths in the design is too expensive to implement. For these unrecoverable circuits, extra timing margin is added during design and timing analysis to prevent these circuits from being susceptible to dynamic-variation timing errors. For the error-detection designs in Section III, examples of unrecoverable circuits for the core pipeline in Fig. 1 include any operations in the X or WB pipeline stages. When an error occurs in the core pipeline, the resilient design must prevent the erroneous data from corrupting the architectural state of the microprocessor. As described in Section III and illustrated in Fig. 3(d), the timing-error detection for a path in a given clock cycle occurs during the next cycle. As an example, an error in the DE stage is not identified until the corresponding errant instruction has already started execution in the RA stage. Thus, the error-detection latency prevents these circuits from protecting the X or WB stages since an error in either of these stages would be identified after erroneous data had already started writing to the register file. For this reason, the X and WB stages are designed with additional timing margin to ensure dynamic-variation timing failures do not occur in these two stages. From the original core design, the paths in these two stages are not timing critical, resulting in a low overhead for applying the extra timing guardband.

Second, the recoverable circuits are further subdivided into critical and non-critical paths for the embedded EDS circuits. After timing analysis, paths with the least timing margin are classified as critical, and consequently, could limit the core performance under worst-case dynamic variations. An EDS circuit replaces the receiving FF for these critical paths to detect potential timing errors from dynamic variations. Non-critical paths have sufficient timing margin and should not limit performance even with worst-case dynamic variations. An EDS circuit does not replace the receiving FF for non-critical paths. This second step is unnecessary when only considering the less-intrusive TRC error-detection design. Rather, post-silicon tuning guarantees that the TRC detects any critical-path error in the pipeline.

As described in Fig. 7, these two additional steps are inserted into a standard register-transfer-level (RTL) to layout synthesis flow. The flow consists of RTL synthesis, timing analysis, and automatic place and route (APR) with extraction and timing convergence. The design methodology starts with the structural RTL of the microprocessor core, consisting of both VHDL and Verilog code. Next, the FFs in the core are manually separated into two lists: (i) Receiving FFs for recoverable paths and (ii) Receiving FFs for unrecoverable paths. The RTL is then updated with these two lists of FFs. As described earlier, additional timing margin is enforced on receiving FFs for unrecoverable paths to ensure correct timing even in the presence of dynamic variations. These FFs map to a unique timing model, which contains extra setup-time margin as illustrated in Fig. 8(a). At this point in the design flow, the receiving FFs for recoverable paths use standard library FFs as provided in Fig. 8(b). The updated RTL is run through the synthesis and timing analysis flow, including the physical compiler for floor-plan generation. The FFs are appropriately sized during synthesis for timing analysis while maintaining the distinction between recoverable and unrecoverable paths. Static-timing analysis generates a timing report specifying all of the critical paths.

Fig. 7. Design methodology for integrating the resilient error-detection and correction circuits into a standard microprocessor synthesis flow.

After timing analysis, the timing report specifies the minimum timing margin for each receiving FF to separate the recoverable paths into critical and non-critical. The non-critical receiving FFs should not limit performance even under worst-case variations, so these sequential circuits remain as standard library FFs. Next, EDS circuits replace the critical receiving FFs. In assigning the EDS circuits, the critical FFs are separated into three timing buckets in order of criticality. Bucket A represents the most timing-critical FFs in the design. Bucket B FFs contain better timing margins than bucket A, although these paths could potentially fail under worst-case dynamic variations. In addition, path reordering after APR and manufacturing could result in these sequential circuits becoming more critical. Bucket C FFs are significantly less critical than the FFs in either bucket A or bucket B. Since these paths could potentially limit performance under the most severe dynamic-variation events,

Fig. 8. Illustration of timing constraints for various sequential designs. (a) Unrecoverable paths apply an additional setup-time margin on the standard flip-flop. (b) Recoverable non-critical paths retain standard flip-flop timing. (c) Recoverable critical paths insert EDS circuits as the receiving sequential circuit with the setup-time margin based on the shadow FF and the hold-time margin based on the error-detection window.

TABLE II
MICROPROCESSOR CORE AREA AND POWER OVERHEADS WITH VCC = 1.0 V FOR EDS AND TRC DESIGNS

bucket C FFs provide a safety option to ensure critical-path coverage. For each pipeline stage, each EDS circuit in a particular timing bucket receives the same scan-enable mode signal as described in Fig. 4. As an example, the same scan bit enables all bucket A EDS circuits in the EX stage. With five pipeline stages containing embedded EDS circuits with three bucket options, the microprocessor core supports fifteen scan bits for enabling/disabling EDS circuits. Grouping the EDS circuits into multiple buckets enhances the observability during speed-path silicon debug by isolating a failing path to a particular pipeline stage and timing bucket. Moreover, this approach provides an option for disabling EDS circuits in post-silicon in case of a min-delay violation.
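
The stage-and-bucket grouping above can be illustrated with a short sketch of how fifteen enable bits might be indexed. The stage list (IF, DE, RA, EX, MEM) is an assumption based on the stages named elsewhere in the text with X and WB excluded, and the bit ordering and function names are hypothetical; the paper does not specify the scan-chain layout.

```python
from itertools import product

# Assumed set of the five stages that carry embedded EDS circuits.
STAGES = ("IF", "DE", "RA", "EX", "MEM")
BUCKETS = ("A", "B", "C")

# Hypothetical bit numbering: stage-major, bucket-minor.
SCAN_BIT = {key: i for i, key in enumerate(product(STAGES, BUCKETS))}


def scan_word(enabled):
    """Pack an iterable of (stage, bucket) pairs into a 15-bit enable word."""
    word = 0
    for key in enabled:
        word |= 1 << SCAN_BIT[key]
    return word


def enabled_groups(word):
    """Unpack an enable word back into its (stage, bucket) pairs."""
    return [key for key, bit in SCAN_BIT.items() if word & (1 << bit)]


# Enable only the bucket A circuits in the EX stage, as in the text's example.
print(bin(scan_word([("EX", "A")])))
```

During speed-path debug, enabling one group at a time in this way would narrow a failing path to a single stage and bucket, which is the observability benefit the paragraph describes.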
The RTL is now updated again with the new EDS circuit assignments. Although EDS circuits contain a datapath latch, the setup-time margin is based on the shadow FF. Since the transparency window of the latch defines the error-detection window, traditional time-borrowing is not allowed and FF-based timing is maintained as discussed in Section III.A and illustrated in Fig. 8(c). The EDS circuit requires a longer hold-time margin based on the target error-detection window. The target error-detection window is designed as a specific fraction of the target cycle time, which determines the maximum potential benefits for the EDS design. The updated RTL is re-synthesized to resize logic gates in both critical and non-critical paths to minimize power for specific cycle-time and error-detection-window targets. After running timing analysis again, the timing report is verified to ensure every unrecoverable path contains sufficient max-delay margin, every recoverable critical path is assigned an EDS circuit, and min-delay margins are satisfied. If there is a discrepancy in the timing report, this portion of the design flow is repeated. Once the design is validated with the timing report, the standard APR flow is performed.

In comparison to the EDS design, the TRC design would only require the first step of separating the core into recoverable and unrecoverable circuits. For the TRC design only, every unrecoverable path must contain sufficient max-delay margin, while all recoverable paths (i.e., non-critical and critical) would use standard library FFs and min-delay constraints would remain identical to a conventional design. The APR design flow generally places the TRC with an EDS circuit close to the desired pipeline stage to reduce error-signal routing. When needed, the APR flow provides an option for placing the TRCs at fixed coordinates in the design.

Table II lists the area and power overheads for the resilient error-detection and correction circuits. For the embedded EDS design, 12% of the core sequential circuits are converted to EDS circuits, resulting in a 2.2% area penalty. The area and power overheads for satisfying min-delay violations with EDS circuits are small due to the shallow-pipeline architecture. These overheads account for EDS circuits in all three timing buckets. The area penalty for the TRC design is 0.8%. The total area overheads for EDS and TRC designs are 3.8% and 2.2%, respectively, including a 1.4% area increase for the ECU and clock control. The total power overhead is less than 1% for the EDS or TRC designs. The area and power overheads for the ECU and clock control circuits are expected to amortize further for a larger core design. Although a power overhead exists when comparing a resilient design to a conventional design at equal VCC and FCLK, the resilient design enables significant performance or energy efficiency benefits from mitigating FCLK guardbands for dynamic variations as discussed further in Section VII.

VI. TESTING METHODOLOGY

The 45 nm resilient microprocessor mounts on a 478-pin flip-chip ball-grid-array (FC-BGA) package, which is socketed

in a custom testing board as shown in Fig. 9. The testing board interfaces with a logic analyzer and an oscilloscope for silicon debug as well as a host computer with C and Perl-based testing software that communicates with the on-die JTAG scan controller for configuration and program execution. After compiling programs, the software and JTAG scan controller load the binary into the instruction cache and the input data into the data cache. After resetting the microprocessor, program execution starts from the reset address. During program execution, data writes to the register file and data cache. After the program finishes, the contents of the register file and data cache are scanned out via JTAG scan and testing software to verify proper functionality. In addition, general-purpose memory-mapped input and output data ports support external device control and visibility into program execution.

Fig. 9. Resilient microprocessor die package and testing board.

Since the resilient microprocessor core is based on a public-domain design [15], assemblers and compilers are available. The only additional compiler configurations are disabling the FPU and hardware multiplier instructions and setting the appropriate memory locations for the reset address, stack pointers, and data locations. Original debug code is written in assembly language to target specific features in the microprocessor. Advanced benchmark programs are written in the C programming language.

Although many different programs have been successfully compiled and executed on the resilient microprocessor, the three benchmarks in Table III are evaluated to measure the benefits of resilient design in Section VII. The three benchmarks consist of an edgedetect image-processing algorithm, a linkedlist pointer-following sorting routine, and a bubble data-sorting program based on the bubble-sort algorithm. These three benchmarks exercise a variety of common microprocessor instructions. In particular, the edgedetect program, which is used for the majority of the measurement results in Section VII, possesses the property that almost any processing error in the core produces an error in the final output image, thus reducing the probability of error masking.

TABLE III
THREE BENCHMARKS TO MEASURE THE BENEFITS OF THE RESILIENT MICROPROCESSOR CORE

In Fig. 10, the resilient microprocessor demonstrates the ability to detect and correct timing errors while executing the edgedetect benchmark in conjunction with I/O code to send and receive images over a universal serial bus (USB) connection to a host computer. Fig. 10(a) shows the original input bitmap image. The microprocessor is set at an FCLK of 1.5 GHz and a VCC of 1.0 V. When the resilient EDS and multiple-issue replay circuits are enabled, the edgedetect program executes correctly to generate the expected output image as provided in Fig. 10(b). During this measurement, an error counter monitors the number of corrected errors as the resilient core detects and corrects more than one million errors per second while maintaining 100% correct output. In Fig. 10(c), the resilient circuits are disabled halfway through the image processing, resulting in erroneous output for the bottom half of the image. Thus, the core resiliency features allow correct operation at an FCLK that is impossible for the conventional design.

Fig. 10. Demonstration of resilient microprocessor while executing the edgedetect benchmark at an FCLK of 1.5 GHz and a VCC of 1.0 V. (a) Input bitmap image. (b) Correct output of edge-detected image with resilient circuits enabled. (c) Output of edge-detected image when resilient circuits are disabled halfway through the image processing.

VII. MEASUREMENT RESULTS

The microprocessor core without error detection and correction (i.e., conventional design) executes the edgedetect benchmark at an FMAX of 1.45 GHz at 1.0 V and consumes 135 mW of power. When a dynamic parameter variation in the form of a 10% VCC droop is injected during program execution, the FMAX reduces to 1.26 GHz, corresponding to a normalized throughput of one, as described in Fig. 11. As illustrated by the shaded region in Fig. 11, the difference between these two values represents the FCLK guardband for a 10% VCC droop in the conventional design. EDS and TRC designs are separately measured by enabling the appropriate error-detection

circuit and the error-recovery technique that replays instructions at 1/2 FCLK. Throughput increases linearly as FCLK increases with no errors. Once errors are detected and corrected with either the EDS or TRC designs, instructions per cycle (IPC) reduce as a function of the recovery rate. As FCLK increases, the number of timing errors increases, corresponding to a higher recovery rate. For the EDS design, throughput gains continue as FCLK increases throughout the entire FCLK guardband region, where the recovery rate remains low. Once FCLK increases beyond 1.45 GHz for the EDS design, timing failures occur even at nominal conditions, resulting in a sharp increase in recovery rate. Since the slowest paths are infrequently activated during the edgedetect benchmark, throughput continues to increase for higher FCLK values. The maximum normalized throughput of 1.16 corresponds to an optimal FCLK and recovery rate of 1.46 GHz and 0.25%, respectively. Pushing FCLK beyond this optimum reduces throughput since the IPC reduction from a larger recovery rate outweighs the FCLK gains. In Fig. 11, the resilient EDS design enables a 16% throughput benefit over the conventional design by eliminating the FCLK guardband for a 10% VCC droop and by exploiting the activation rates for critical paths. In comparison, the resilient TRC design achieves a 12% throughput gain at an FCLK of 1.42 GHz and recovery rate of 0.15% by mitigating most of the FCLK guardband. The TRC design provides a smaller throughput advantage than the EDS design at 1.0 V for two reasons: (i) The TRC design requires a delay guardband to ensure the TRC always fails if any critical path in the pipeline stage fails; consequently, the slowest path limits the TRC performance. (ii) The TRC design results in unnecessary recovery cycles since a dynamic variation may induce a TRC failure while an actual timing error in the pipeline does not occur if the critical paths are not activated. These measurements demonstrate the ability of resilient circuits to improve throughput by reducing the impact of a high-frequency VCC droop on FCLK. The inclusion of additional dynamic-variation sources (e.g., temperature change) would further increase the FCLK guardband for the conventional design, resulting in larger potential benefits for the resilient designs [11].

Fig. 11. Measured throughput (TP), as normalized to the conventional maximum TP, and recovery cycles, as a percentage of total cycles, versus clock frequency (FCLK) for the edgedetect benchmark.

In Fig. 12, the throughput benefits for EDS and TRC designs relative to a conventional design are measured across the three benchmarks in Table III at 1.0 V. Throughput gains for the EDS design range from 15% to 20%, demonstrating the different activation rates for critical paths among these three programs. Since the TRC cannot exploit path-activation rates, the throughput benefit for the TRC design remains at 12% across all three benchmarks.

Fig. 12. Measured throughput gain of EDS and TRC designs relative to a conventional design for the applications in Table III.

Fig. 13. Measured throughput gain of EDS and TRC designs relative to a conventional design for the edgedetect benchmark versus supply voltage.

Fig. 13 elucidates a key distinction between the EDS and TRC designs. In Fig. 13, the throughput gain as compared to a conventional design is measured for EDS and TRC designs across VCC while executing the edgedetect benchmark with a 10% VCC droop. For each VCC, the clock duty cycle and TRC delays are calibrated for EDS and TRC designs, respectively, as described in Section III. As VCC reduces, the path-delay sensitivity to VCC amplifies, resulting in a larger FCLK guardband and higher potential benefits for resilient circuits. From Fig. 13, the EDS design throughput gain increases from 16% at 1.0 V to 28% at 0.8 V and then saturates at 28% from 0.8 V to 0.6 V. Although the EDS design provides a larger benefit than the TRC

design at 1.0 V, the core min-delay constraints limit the maximum error-detection window for EDS circuits, and the corresponding maximum potential throughput gain, as described in (1)–(2). At 0.8 V and 0.6 V, the dynamic delay variation exceeds the maximum error-detection window for EDS circuits. Consequently, the EDS design can only mitigate a portion of the FCLK guardband at 0.8 V and 0.6 V, resulting in a throughput benefit based on the maximum error-detection window. Since the throughput gain for the EDS design remains constant from 0.8 V to 0.6 V, the maximum error-detection window for EDS circuits as a percentage of the minimum cycle time for the conventional design also remains constant. In contrast, the core min-delay constraints do not limit the error-detection window for the TRC design, allowing the TRC design to capture a wider range of dynamic delay variation. From Fig. 13, the TRC design enables throughput benefits of 12%, 30%, and 51% at 1.0 V, 0.8 V, and 0.6 V, respectively, thus highlighting the opportunity of providing larger benefits than the EDS design at lower VCC values. In Fig. 13 at 0.8 V and 0.6 V, the throughput gains for the EDS and TRC designs are primarily independent of the benchmark. As described in Figs. 11 and 12 at 1.0 V, the TRC design cannot benefit from infrequently activated critical paths. For the EDS design at 0.8 V and 0.6 V, the larger dynamic delay variation consumes the entire error-detection window for EDS circuits, thus preventing the possibility of exploiting path-activation rates across different programs.

In comparing the two error-recovery techniques in Section IV, the average number of recovery cycles per error is measured in Fig. 14 for the instruction replay at 1/2 FCLK and the multiple-issue (MI) instruction replay at FCLK with the number of issues (N) ranging from two to eight (N-1 replica instructions and one valid instruction). As described in Section IV.B for the multiple-issue replay design with a small N, an error may occur during the execution of the Nth issued instruction if an insufficient number of replica instructions are issued. In this scenario, the errant instruction is replayed a second time with N = 8 to guarantee correct operation. Silicon measurements are collected while executing the edgedetect benchmark with the EDS design, a 10% VCC droop, a VCC of 1.0 V, and an FCLK of 1.46 GHz, which corresponds to the maximum throughput in Fig. 11. Measured performance for the core pipeline demonstrates that issuing only one replica instruction incurs the least number of recovery cycles per error, resulting in a 46% reduction as compared to replaying at 1/2 FCLK. While the reduced number of recovery cycles per error improves performance, the salient advantage of the multiple-issue instruction replay is correcting errant instructions without requiring clock control.

Fig. 14. Measured average recovery cycles per error for the edgedetect benchmark for instruction replay at 1/2 FCLK and multiple-issue (MI) instruction replay at FCLK with the number of issues (N) ranging from 2 to 8 (N-1 replica instructions and 1 valid instruction).

In Fig. 15, the total energy to execute the edgedetect benchmark with a 10% VCC droop is measured at 0.6 V, 0.8 V, and 1.0 V, and then plotted across the measured throughput data in Fig. 13 for the same VCC values. The change in total energy and throughput correspond to a change in VCC (i.e., higher total energy and throughput correspond to higher VCC). For a given VCC, the EDS and TRC designs provide larger throughput and smaller energy as compared to the conventional design. The total energy reduction directly results from executing the program faster, which decreases leakage energy. In comparing the EDS and TRC designs to a conventional design, silicon measurements demonstrate that resilient circuits enable a 41% throughput gain at equal energy or a 22% energy reduction at equal throughput.

Fig. 15. Measured total energy consumption versus throughput for the edgedetect benchmark.

Persistent parameter variations, such as longer-term VCC droops, temperature changes, or transistor aging, can result in long bursts of timing errors that degrade throughput. For these

Fig. 16. Demonstration of adaptive clock control to dynamically optimize clock frequency (FCLK) based on recovery cycles for maximum efficiency. Recovery cycle count is accumulated and compared to a set of thresholds per sampling period. During a persistent variation, the recovery cycle count exceeds the upper threshold for two consecutive sampling periods, resulting in a lower FCLK for the duration of the variation.
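
The threshold scheme in the Fig. 16 caption can be sketched as a small state machine. This is a behavioral illustration only: the class name, the list of selectable frequencies, and the threshold values are hypothetical, while the rule of two consecutive sampling periods, and the matching step back up when the count stays below the lower threshold, follow the description in Section VII.

```python
class AdaptiveClock:
    """Behavioral sketch of recovery-cycle-driven frequency selection."""

    def __init__(self, freqs_ghz, start, lower, upper, consecutive=2):
        self.freqs = sorted(freqs_ghz)   # hypothetical selectable FCLK steps
        self.idx = self.freqs.index(start)
        self.lower, self.upper = lower, upper
        self.consecutive = consecutive
        self.above = 0   # consecutive periods above the upper threshold
        self.below = 0   # consecutive periods below the lower threshold

    @property
    def fclk(self):
        return self.freqs[self.idx]

    def end_of_period(self, recovery_cycles):
        """Feed the recovery-cycle count accumulated over one sampling period."""
        self.above = self.above + 1 if recovery_cycles > self.upper else 0
        self.below = self.below + 1 if recovery_cycles < self.lower else 0
        if self.above >= self.consecutive and self.idx > 0:
            self.idx -= 1                # too many recoveries: slow down
            self.above = self.below = 0
        elif self.below >= self.consecutive and self.idx < len(self.freqs) - 1:
            self.idx += 1                # few recoveries: speed back up
            self.above = self.below = 0
        return self.fclk


# Illustrative values only: two noisy periods lower FCLK, two quiet ones restore it.
clk = AdaptiveClock([1.26, 1.36, 1.46], start=1.46, lower=10, upper=100)
trace = [clk.end_of_period(count) for count in (50, 500, 500, 50, 5, 5)]
print(trace)  # [1.46, 1.46, 1.36, 1.36, 1.36, 1.46]
```

Requiring consecutive periods before acting keeps a single burst of errors from triggering a frequency change, which matters when each change involves a PLL relock.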

types of dynamic variations, the core resiliency features guide an adaptive clock controller
to dynamically change FCLK. Silicon measurements demonstrate this capability in Fig. 16.
As discussed in Section II.C, counters in the adaptive clock control circuit track the number
of recovery cycles over a programmable sampling period and compare to a set of thresholds.
As highlighted in Fig. 11, the maximum throughput directly corresponds to an optimum
recovery rate, which determines the upper and lower threshold values. After encountering the
persistent variation in Fig. 16, the number of recovery cycles is greater than the upper
threshold for two consecutive sampling periods. Consequently, the adaptive clock controller
changes the PLL divide ratio to reduce FCLK. After the PLL relocks, the adaptive clock
controller monitors the recovery cycles at the lower FCLK value. During the persistent
variation, the lower FCLK value is optimal for maximum throughput. After nominal conditions
are restored, the number of recovery cycles is less than the lower threshold for two
consecutive sampling periods, resulting in an FCLK increase.

In a conventional design, changing the PLL divide ratio requires a pipeline hold while the
PLL relocks. In contrast, the resilient error-detection and recovery circuits allow the
microprocessor core to continue operation during PLL relock, where timing errors due to short
clock cycles are detected and corrected. This requires a sufficiently large error-detection
window for either EDS or TRC designs to detect timing violations from dynamic variations and
short cycles during PLL relock. Combining error-detection and recovery circuits with dynamic
adaptation enables the microprocessor to adapt to the operating environment to deliver
maximum efficiency.

VIII. CONCLUSION

A 45 nm microprocessor core employs resilient error-detection and recovery circuits to
improve performance and energy efficiency by mitigating the clock frequency (FCLK)
guardbands for dynamic parameter variations. The core integrates two separate designs for
error detection. The first design embeds error-detection sequential (EDS) circuits into actual
critical paths to identify late timing transitions. In addition to reducing the FCLK
guardbands for dynamic variations, the EDS design can enable the microprocessor to operate
faster than infrequently-activated critical paths during nominal conditions. The second
design places a tunable replica circuit (TRC) per pipeline stage to monitor critical-path
delays. Although the TRCs require a delay guardband to ensure the TRC delay is always slower
than critical-path delays, the TRC design captures most of the benefits from the embedded EDS
design with less implementation overhead. In contrast to the embedded EDS design where core
min-delay constraints limit the error-detection window, and corresponding potential benefits,
the TRC design is independent of core min-delay constraints, resulting in a wider
error-detection window and higher potential benefits as demonstrated with measurements at low
supply voltages. The combination of either error-detection design with error recovery enables
the detection and correction of timing errors from fast-changing variations (e.g.,
high-frequency VCC droops).

The microprocessor core also integrates two techniques for error recovery to guarantee
correct execution even if dynamic variations persist. The first technique replays errant
instructions at . In comparison, the second technique introduces a multiple-issue instruction
replay design to correct errant instructions with a lower performance penalty and without
requiring clock control. This recovery technique issues the errant instruction multiple
times. The first N − 1 issues are replica instructions, which do not affect the architectural
state. The Nth issue is a valid instruction, which is allowed to commit data to the
architectural state. The replica instructions set up the register input nodes for each
pipeline stage, allowing the valid instruction to execute correctly.

A description of the design methodology for integrating the error-detection and recovery
circuits into a microprocessor core clarifies the necessary steps beyond a standard design
flow. Furthermore, discussions of the post-silicon calibration for EDS
and TRC designs provide insight into the trade-off between potential benefits and testing
cost. Silicon measurements from the 45 nm microprocessor demonstrate that resilient circuits
enable a 41% throughput benefit at iso-energy or a 22% energy reduction at iso-throughput, as
compared to a conventional design when executing a benchmark program with a 10% VCC droop.
In addition, the resilient circuits in the microprocessor core guide a new adaptive clock
control circuit that tracks recovery cycles and adapts to persistent errors by changing FCLK.
The combination of error-detection and recovery circuits with dynamic adaptation enables the
microprocessor to adapt to the operating environment to deliver maximum efficiency.

ACKNOWLEDGMENT

The authors express sincere appreciation to Ken Ikeda and Pavan Karidi for mask design,
Saurabh Dighe, Jason Howard, Greg Ruhl, David Jenkins, and David Finan for design assistance,
Trang Nguyen for lab support, and Nitin Borkar and Greg Taylor for encouragement and support.

REFERENCES

[1] A. Muhtaroglu, G. Taylor, and T. R. Arabi, “On-die droop detector for analog sensing of power supply noise,” IEEE J. Solid-State Circuits, pp. 651–660, Apr. 2004.
[2] T. Fischer, J. Desai, B. Doyle, S. Naffziger, and B. Patella, “A 90-nm variable frequency clock system for a power-managed Itanium architecture processor,” IEEE J. Solid-State Circuits, pp. 218–228, Jan. 2006.
[3] R. McGowen et al., “Power and temperature control on a 90-nm Itanium family processor,” IEEE J. Solid-State Circuits, pp. 229–237, Jan. 2006.
[4] J. Tschanz et al., “Adaptive frequency and biasing techniques for tolerance to dynamic temperature-voltage variations and aging,” in IEEE ISSCC Dig. Tech. Papers, Feb. 2007, pp. 292–293.
[5] P. Franco and E. J. McCluskey, “Delay testing of digital circuits by output waveform analysis,” in Proc. IEEE Int. Test Conf., Oct. 1991, pp. 798–807.
[6] P. Franco and E. J. McCluskey, “On-line testing of digital circuits,” in Proc. IEEE VLSI Test Symp., Apr. 1994, pp. 167–173.
[7] M. Nicolaidis, “Time redundancy based soft-error tolerance to rescue nanometer technologies,” in Proc. IEEE VLSI Test Symp., Apr. 1999, pp. 86–94.
[8] D. Ernst et al., “Razor: A low-power pipeline based on circuit-level timing speculation,” in Proc. IEEE/ACM Int. Symp. Microarchitecture (MICRO-36), Dec. 2003, pp. 7–18.
[9] S. Das et al., “A self-tuning DVS processor using delay-error detection and correction,” IEEE J. Solid-State Circuits, pp. 792–804, Apr. 2006.
[10] S. Das et al., “Razor II: In situ error detection and correction for PVT and SER tolerance,” IEEE J. Solid-State Circuits, pp. 32–48, Jan. 2009.
[11] K. A. Bowman et al., “Energy-efficient and metastability-immune resilient circuits for dynamic variation tolerance,” IEEE J. Solid-State Circuits, pp. 49–63, Jan. 2009.
[12] J. Tschanz et al., “Tunable replica circuits and adaptive voltage-frequency techniques for dynamic voltage, temperature, and aging variation tolerance,” in IEEE Symp. VLSI Circuits Dig., Jun. 2009, pp. 112–113.
[13] J. Tschanz et al., “A 45 nm resilient and adaptive microprocessor core for dynamic variation tolerance,” in IEEE ISSCC Dig. Tech. Papers, Feb. 2010, pp. 282–283.
[14] K. Mistry et al., “A 45 nm logic technology with high-k+metal gate transistors, strained silicon, 9 Cu interconnect layers, 193 nm dry patterning, and 100% Pb-free packaging,” in IEEE IEDM Tech. Dig., Dec. 2007, pp. 247–250.
[15] LEON-3 [Online]. Available: http://www.gaisler.com/cms/index.php?option=com_content&task=section&id=4&Itemid=33
[16] R. Kumar and G. Hinton, “A family of 45 nm IA processors,” in IEEE ISSCC Dig. Tech. Papers, Feb. 2009, pp. 58–59.
[17] N. Kurd, P. Mosalikanti, M. Neidengard, J. Douglas, and R. Kumar, “Next generation Intel® Core™ micro-architecture (Nehalem) clocking,” IEEE J. Solid-State Circuits, pp. 1121–1129, Apr. 2009.
[18] K. Bowman et al., “Dynamic variation monitor for measuring the impact of voltage droops on microprocessor clock frequency,” in Proc. IEEE CICC, Sep. 2010, no. 17-1.
[19] V. Srinivasan et al., “Optimizing pipelines for power and performance,” in Proc. IEEE/ACM Int. Symp. Microarchitecture (MICRO-35), Nov. 2002, pp. 333–344.
[20] A. Hartstein and T. R. Puzak, “The optimum pipeline depth considering both power and performance,” ACM Trans. Arch. Code Opt. (TACO), pp. 369–388, Dec. 2004.
[21] K. A. Bowman, S. G. Duvall, and J. D. Meindl, “Impact of die-to-die and within-die parameter fluctuations on the maximum clock frequency distribution for gigascale integration,” IEEE J. Solid-State Circuits, pp. 183–190, Feb. 2002.

Keith A. Bowman (S’97–M’02) received the B.S. degree in electrical engineering from North
Carolina State University, Raleigh, NC, in 1994 and the M.S. and Ph.D. degrees in electrical
engineering from the Georgia Institute of Technology, Atlanta, GA, in 1995 and 2001,
respectively.
He is currently a Staff Research Scientist in the Circuit Research Lab (CRL) at Intel
Corporation, Hillsboro, OR. From 2001 to 2004, he worked as a Senior Computer-Aided Design
(CAD) Engineer in the Technology-CAD Division at Intel, Hillsboro, to develop and support
statistical-based models, methodologies, and software tools to predict microprocessor
performance and power variability. Since joining CRL in 2004, his research has focused on the
development of circuit design solutions to mitigate the impact of parameter variations on
microprocessor performance and power. He has published over 50 technical papers in refereed
conferences and journals and presented 15 tutorials on variation-tolerant circuit designs.

James W. Tschanz (M’99) received the B.S. degree in computer engineering and the M.S. degree
in electrical engineering from the University of Illinois at Urbana-Champaign in 1997 and
1999, respectively.
Since 1999, he has been a circuits researcher with the Intel Circuit Research Lab, Hillsboro,
OR. He also taught VLSI design for seven years as an adjunct faculty member at the Oregon
Graduate Institute, Beaverton, OR. His research interests include low-power digital circuits,
design techniques, and methods for tolerating parameter variations. He holds 41 issued
patents in those areas.

Shih-Lien L. Lu (M’89–SM’10) received the B.S. degree in EECS from the University of
California at Berkeley in 1980, and the M.S. and Ph.D. degrees in CSE from the University of
California at Los Angeles (UCLA) in 1984 and 1991, respectively.
He worked on the MOSIS project at USC/ISI which provides the research and education community
VLSI fabrication services from 1984 to 1991 and served on the faculty of the ECE Department
at Oregon State University (OSU) from 1991 to 1999. While at OSU, he received the College of
Engineering Carter Award for outstanding and inspirational teaching in 1995 and the College
of Engineering Engelbrecht Young Faculty Award in 1996. Currently, he is a Principal
Researcher and leads a research group on microarchitecture in Intel Labs.

Paolo A. Aseron received the B.S. degree in computer engineering from the University of the
Philippines in 2001.
He has been with Intel Labs, Hillsboro, OR, since 2006. Prior to joining Intel, he worked for
Canon on Systems-on-a-Chip platform development from 2001 to 2003. His interests include
high-performance low-power architecture and circuits, memory, and integrated power delivery.

Muhammad M. Khellah (M’99) received the Ph.D. in electrical and computer engineering from the
University of Waterloo, Ontario, Canada, in 1999.
He is a Research Scientist at Intel Labs, Hillsboro, OR, where he does research on low-power
circuits. He first joined Intel in 1999 and was involved in the design of L1/L2 SRAM caches
for the P3 and P4 microprocessor products. He has published about 70 technical papers in
refereed international conferences and journals and has 60 patents granted, and 10 pending.
Dr. Khellah is a regular reviewer for JSSC, TCAD, TVLSI, and TCAS I and II. He currently
serves on the technical program committees of the IEEE CICC and the IEEE ISLPED.

Arijit Raychowdhury (S’00–M’07) received the Ph.D. degree in electrical and computer
engineering from Purdue University, West Lafayette, IN, in 2007.
He is currently a research scientist in the Circuits Research Lab, Intel Corporation,
Hillsboro, OR. Previously he worked as an Analog Circuit Designer with Texas Instruments
Inc., India (2002 to 2003) and as a summer intern with Intel Corporation (2005 and 2006). His
research interests include low power and high performance digital circuit design, design of
on-chip sensors, and memory.
Dr. Raychowdhury has received academic excellence awards in 1997, 2000, and 2001, the
Meissner Fellowship from Purdue University in 2002, the Intel Ph.D. Fellowship Award in 2005,
and the Dimitri N. Chorafas Award for the best doctoral thesis in 2007. He received the Best
Paper Awards at the IEEE Nanotechnology Conference 2003, ISLPED 2006. He has served on the
Technical Program Committee of ICCAD, VLSI Conference, and ISQED.

Bibiche M. Geuskens (M’07) received the B.S. degree in electrical engineering from the Vrije
Universiteit Brussel, Brussels, Belgium, in 1992 and the M.S. and Ph.D. degrees in electrical
engineering from Rensselaer Polytechnic Institute, Troy, NY, in 1993 and 1997, respectively.
In 2006, she joined the Circuit Research Lab (CRL) at Intel Corporation, Hillsboro, OR, as a
Staff Research Scientist. From 1999 to 2006, she worked as a Staff/Senior Component Design
Engineer in the Memory Design Unit of the Microprocessor Design Division at Intel, Hillsboro.
She was responsible for the design, implementation and validation of numerous circuit blocks.
Since joining CRL in 2006, her research has focused on the development of low power circuit
design techniques for on-chip memories. Her current research interests include CMOS biosensor
design applications and on-chip power delivery circuit solutions.

Carlos Tokunaga (S’98–M’08) received the B.S. degree in electronics engineering from the
University of Los Andes, Bogota, Colombia, in 2001, and the M.S. and Ph.D. degrees in
electrical engineering from the University of Michigan, Ann Arbor, in 2005 and 2008,
respectively.
He is currently a Research Scientist at the Circuit Research Lab, Intel, Hillsboro, OR. His
research interests include VLSI design with particular emphasis on energy-efficient resilient
circuits and security based circuit design.

Chris B. Wilkerson graduated from Carnegie Mellon University with his masters in 1996.
He has published a number of papers on a number of microarchitectural topics including value
prediction, branch prediction, cache organization, runahead, and advanced speculative
execution. Recently, he has focused on low-power design including microarchitectural
mechanisms to enable low voltage operation for microprocessors.

Tanay Karnik (M’88–SM’04) received the Ph.D. in computer engineering from the University of
Illinois at Urbana-Champaign in 1995.
He is a Principal Engineer and Program Director in Intel Lab’s Academic Research Office. His
research interests are in the areas of variation tolerance, power delivery, soft errors and
physical design. He has published over 45 technical papers, and has 44 issued and 33 pending
patents in these areas. He received an Intel Achievement Award for the pioneering work on
integrated power delivery. He has presented several invited talks and tutorials, and has
served on five Ph.D. students’ committees.
Dr. Karnik was a member of ISSCC, DAC, ICCAD, ICICDT and ISQED program committees and JSSC,
TCAD, TVLSI, TCAS review committees. He was the General Chair of ASQED’10, ISQED’08, ISQED’09
and ICICDT’08. He is an ISQED Fellow and has been a Guest Editor for JSSC.

Vivek K. De (SM’07) received the Bachelor’s degree in electrical engineering from the Indian
Institute of Technology, Madras, India, in 1985 and the Master’s degree in electrical
engineering from Duke University, Durham, NC, in 1986. He received the Ph.D. degree in
electrical engineering from Rensselaer Polytechnic Institute, Troy, NY, in 1992.
He is an Intel Fellow and Director of Circuit Technology Research in Intel Labs, Hillsboro,
OR. He joined Intel in 1996 as a staff engineer in the Circuits Research Lab (CRL) in
Hillsboro. Since that time he has led research teams in CRL focused on developing advanced
circuits and design techniques for low-power and high-performance processors. In his current
role, he provides strategic direction for future circuit technologies and is responsible for
aligning CRL’s circuit research with technology scaling challenges. Prior to joining Intel,
he was engaged in semiconductor devices and circuits research at Rensselaer Polytechnic
Institute and Georgia Institute of Technology, and was a visiting researcher at Texas
Instruments.
Dr. De has published 167 technical papers in refereed conferences and journals, and 6 book
chapters on low power circuits. He holds 154 patents, with 40 more patents filed (pending).
He received an Intel Achievement Award for his contributions to a novel integrated voltage
regulator technology.
