08s Cpe633 Chap1

This document provides an overview of fault tolerance and redundancy in computer systems. It discusses fault classification including transient and permanent faults. It also describes four main types of redundancy: hardware, information, time, and functional redundancy. Finally, it outlines basic measures used to evaluate fault tolerance such as mean time to failure, mean time between failures, availability, and reliability.

CPE 633

Chapter 1 - Preliminaries

Dr. Rhonda Kay Gaede

UAH
Electrical and Computer Engineering

UAH Chapter 1 CPE 633

Motivation

• Computers are everywhere.


• Computers are used in _______________
and _________________ applications.
• Computer systems (______________ and
____________) are incredibly __________.
• With complexity comes a propensity
for ____________.
• Two approaches:
– ________________________
– ____________________________________________

1.1 Fault Classification

• Definitions
– A fault (or failure) can be either a
___________________ or a ___________________
– An error is a manifestation of the fault.
• Examples
– Output of adder circuit ______________
– sin(x) computation really ___________
• Fault effects can ____________.
• To limit this spread, designers
incorporate fault containment zones.

• These containment zones are __________ that reduce the chance
that an effect can spread.
– ___________________________________________________.
• Hardware faults can be:
– ________________
– _____________
– __________________
• Hardware faults are __________ or
_______________.

1.2 Types of Redundancy

• All of fault tolerance is an exercise in _____________ and ____________
_________________ – the property of ___________________________ than is
minimally necessary.
• Four forms of redundancy: __________,
_____________, _________, ______________
• Hardware redundancy is provided by
___________________________ in the
design to _________ or _________ errors.
– It can be ________, _________ or __________.
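One common way extra hardware masks errors is to run several copies of a module and vote on their outputs (triple modular redundancy). The sketch below is a hypothetical illustration, not the course's own example: `majority_vote`, `good_adder`, and `faulty_adder` are made-up names, and the faulty adder models an assumed stuck-at-1 fault in its least significant output bit.

```python
from collections import Counter
from typing import Hashable, Sequence


def majority_vote(outputs: Sequence[Hashable]) -> Hashable:
    """Return the value reported by a strict majority of redundant modules.

    Raises ValueError when no strict majority exists: the voter can then
    flag the disagreement (error detected) but cannot mask it.
    """
    if not outputs:
        raise ValueError("no module outputs to vote on")
    value, count = Counter(outputs).most_common(1)[0]
    if 2 * count <= len(outputs):
        raise ValueError("no strict majority: error detected but not masked")
    return value


def good_adder(a: int, b: int) -> int:
    return a + b


def faulty_adder(a: int, b: int) -> int:
    # Hypothetical fault model: least significant output bit stuck at 1.
    return (a + b) | 1


# Three redundant copies, one of them faulty: 2 + 2 should be 4.
modules = [good_adder, good_adder, faulty_adder]
outputs = [m(2, 2) for m in modules]  # [4, 4, 5]
result = majority_vote(outputs)
print(result)  # 4
```

With three modules a single faulty copy is outvoted, so its error is masked. If two copies fail the same way the voter returns the wrong value, so the scheme relies on failures being independent and on at most one module being faulty at a time.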

• The best-known form of _____________ redundancy, _________________ and
_______________ coding, is widely used in ___________________________.
• ___________________ and ________________ codes are also used to protect
data communicated over _________ channels (channels subject to many
__________ failures). ______________ upon detection of an error is
________ redundancy.
• _______________ redundancy leads to hardware _____________.
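As a minimal example of error-detecting coding, a single even-parity bit appended to a data word lets the receiver detect any odd number of flipped bits. The function names below are hypothetical, and single parity is simply the smallest such code, not necessarily the one the course has in mind.

```python
def add_even_parity(bits: list[int]) -> list[int]:
    """Append one check bit so the codeword holds an even number of 1s."""
    return bits + [sum(bits) % 2]


def parity_ok(codeword: list[int]) -> bool:
    """False means an odd number of bits flipped (error detected)."""
    return sum(codeword) % 2 == 0


data = [1, 0, 1, 1]
codeword = add_even_parity(data)   # [1, 0, 1, 1, 1]
print(parity_ok(codeword))         # True

one_flip = codeword.copy()
one_flip[2] ^= 1                   # single-bit error
print(parity_ok(one_flip))         # False: detected

two_flips = one_flip.copy()
two_flips[0] ^= 1                  # second error
print(parity_ok(two_flips))        # True: double error goes undetected
```

The undetected double error shows why a single check bit only detects errors; codes with more check bits are needed to detect more errors or to correct them.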

1.3 Basic Measures of Fault Tolerance

• What does it mean to make machines more __________________?
– We need __________
• Traditional Measures
– Reliability, R(t), is the probability that the system has been up
continuously in the time interval [0,t]. It is suitable for applications
in which even a momentary disruption can prove costly.
• Mean Time To Failure (MTTF)
• Mean Time Between Failures (MTBF)
• Mean Time To Repair (MTTR)
• MTBF = MTTF + MTTR

– Availability, A(t), is the average fraction of time over the interval
[0,t] that the system is up.
$$A = \lim_{t \to \infty} A(t)$$

$$A = \frac{\mathrm{MTTF}}{\mathrm{MTBF}} = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}}$$

– Point availability, A_p(t), is the probability that the system is up
at the single time instant t.
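The traditional measures can be tied together numerically. MTTF is the expected time to first failure and equals the integral of R(t) over [0, ∞); MTBF adds the repair time MTTR; and A(t) should approach MTTF/(MTTF + MTTR) as t grows. The sketch below rests on two assumptions the slides do not make: a constant failure rate λ, so that R(t) = e^(-λt) and MTTF should come out near 1/λ, and, for A(t), a deterministic system that alternates up periods of length MTTF with repair periods of length MTTR. All function names and numbers are hypothetical.

```python
import math
from typing import Callable


def reliability_exponential(failure_rate: float) -> Callable[[float], float]:
    """R(t) = exp(-failure_rate * t), ASSUMING a constant failure rate."""
    return lambda t: math.exp(-failure_rate * t)


def mttf_numeric(reliability: Callable[[float], float],
                 horizon: float, steps: int) -> float:
    """MTTF = integral of R(t) over [0, inf), approximated with the trapezoid
    rule on [0, horizon]; R(horizon) should be negligible."""
    h = horizon / steps
    total = 0.5 * (reliability(0.0) + reliability(horizon))
    for k in range(1, steps):
        total += reliability(k * h)
    return total * h


def interval_availability(t: float, up: float, down: float) -> float:
    """A(t): fraction of [0, t] spent up, ASSUMING the system starts up and
    then alternates fixed up periods (`up`) and repair periods (`down`)."""
    if t <= 0:
        raise ValueError("t must be positive")
    full_cycles, remainder = divmod(t, up + down)
    uptime = full_cycles * up + min(remainder, up)
    return uptime / t


failure_rate = 1e-3   # hypothetical: 0.001 failures per hour
mttr = 1.0            # hypothetical: 1 hour to repair

mttf = mttf_numeric(reliability_exponential(failure_rate),
                    horizon=20 / failure_rate, steps=20_000)
mtbf = mttf + mttr
availability = mttf / mtbf          # A = MTTF / MTBF = MTTF / (MTTF + MTTR)

print(f"MTTF ~ {mttf:.2f} h, MTBF ~ {mtbf:.2f} h, A ~ {availability:.5f}")
for t in (500.0, 1_500.0, 1_000_000.0):
    # A(t) approaches MTTF / (MTTF + MTTR) as t grows.
    print(t, interval_availability(t, up=mttf, down=mttr))
```

Real failure and repair times are random, so the fixed up/down cycle only illustrates the limit A = lim A(t); it is not a model of any particular system.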

• All this is nice as long as we know what "up" means.
– Some cases are simple, _________________ for example.
– Other cases are less clear: what if ______________________
____________________________________________?
– Many systems have ________________ states.
• Extension of traditional measures to _____________
___________________________________ of a system with n
processors:

$$ACC = \sum_{i=1}^{n} c_i P_i(t)$$

• c_i is the computational capacity of a system with i operational processors
• P_i(t) is the probability that exactly i processors are operational at time t
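Evaluating the ACC sum requires values for c_i and P_i(t). The sketch below ASSUMES that the n processors fail independently and that each is still up at time t with the same probability r = R(t), so P_i(t) is binomial; the capacity function is a hypothetical input. With linear capacity c_i = i the sum reduces to n·r, which serves as a sanity check.

```python
import math
from typing import Callable


def prob_exactly_up(n: int, i: int, r: float) -> float:
    """P_i(t): probability that exactly i of n processors are up at time t,
    ASSUMING independent failures and per-processor reliability r = R(t)."""
    return math.comb(n, i) * r**i * (1 - r) ** (n - i)


def average_computational_capacity(n: int, r: float,
                                   capacity: Callable[[int], float]) -> float:
    """ACC = sum_{i=1}^{n} c_i * P_i(t), with c_i = capacity(i)."""
    return sum(capacity(i) * prob_exactly_up(n, i, r) for i in range(1, n + 1))


n, r = 4, 0.9   # hypothetical: 4 processors, each up with probability 0.9

# Linear capacity: i working processors deliver i units of work.
linear = average_computational_capacity(n, r, lambda i: float(i))

# Hypothetical sublinear capacity, e.g. due to communication overhead.
sublinear = average_computational_capacity(n, r, lambda i: i ** 0.8)

print(linear, sublinear)
```

Swapping in a different capacity function is how the same formula captures systems whose throughput does not grow in proportion to the number of working processors.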

• Network Measures
– Classical node and link connectivity – the minimum
number of nodes and links that have to fail
before the network becomes disconnected.
– Average node-to-node distance
– Maximum node-to-node distance (diameter)
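For small topologies these network measures can be computed by brute force. The sketch below reads them in the usual graph-theoretic way (node or link connectivity as the fewest node or link failures that disconnect the network; average and maximum shortest-path hop counts, the maximum being the diameter). It assumes an undirected graph stored as an adjacency dict and uses a hypothetical six-node ring as input. Exhaustive search grows exponentially with network size, so this is for illustration only.

```python
from collections import deque
from itertools import combinations

Graph = dict[int, set[int]]  # undirected: v in graph[u] iff u in graph[v]


def bfs_distances(graph: Graph, source: int,
                  removed_nodes: frozenset = frozenset(),
                  removed_links: frozenset = frozenset()) -> dict[int, int]:
    """Hop counts from source, skipping failed nodes and links."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v in dist or v in removed_nodes or frozenset((u, v)) in removed_links:
                continue
            dist[v] = dist[u] + 1
            queue.append(v)
    return dist


def is_connected(graph: Graph, removed_nodes: frozenset = frozenset(),
                 removed_links: frozenset = frozenset()) -> bool:
    alive = [u for u in graph if u not in removed_nodes]
    if len(alive) <= 1:
        return True
    reached = bfs_distances(graph, alive[0], removed_nodes, removed_links)
    return len(reached) == len(alive)


def node_connectivity(graph: Graph) -> int:
    """Fewest node failures that disconnect the surviving nodes."""
    nodes = list(graph)
    for k in range(len(nodes) - 1):
        for cut in combinations(nodes, k):
            if not is_connected(graph, removed_nodes=frozenset(cut)):
                return k
    return len(nodes) - 1  # complete graph: conventionally n - 1


def link_connectivity(graph: Graph) -> int:
    """Fewest link failures that disconnect the network."""
    links = list({frozenset((u, v)) for u in graph for v in graph[u]})
    for k in range(len(links) + 1):
        for cut in combinations(links, k):
            if not is_connected(graph, removed_links=frozenset(cut)):
                return k
    return len(links)  # reached only for graphs with fewer than two nodes


def distance_stats(graph: Graph) -> tuple[float, int]:
    """(average, maximum) hop count over all node pairs; assumes connected."""
    hops = [d for u in graph
            for v, d in bfs_distances(graph, u).items() if v != u]
    return sum(hops) / len(hops), max(hops)


ring = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}  # hypothetical 6-node ring
print(node_connectivity(ring), link_connectivity(ring), distance_stats(ring))
```

For the six-node ring both connectivities are 2 (any two failures can split the ring) and the diameter is 3 hops.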
