EET 493
Design for Reliability and
Testability of Digital Systems
Instructor: Dr. Mihaela Radu
The World of Dependability
“If anything can go wrong, it will”
Murphy’s law
Dependability
Dependability =
•The property of a system such that reliance can justifiably be placed on the service it delivers.
•The quality of the service that a particular system provides.
•The ability of a system to deliver its intended level of service to its users.
Dependability
EN: dependability
DE: Zuverlässigkeit, Verlässlichkeit
FR: sûreté de fonctionnement
IT: garanzia di funzionamento
JP:
GR: axionistia
ES: dependencia
RO: fiabil
NL: “vertrouwbaarheid”
Dependability Tree (author Prasad ’96)
Dependability
  Attributes/features: reliability, availability, safety, confidentiality, testability, maintainability
  Means/ways: fault prevention, fault tolerance, fault removal, fault forecasting
  Impairments: faults, errors, failures
System Design
(Requirements)
Determine which Dependability attribute
primarily needs to be improved for a
specific application.
“Different goals for different applications”
Early computer systems were based on unreliable components
ENIAC, with 17.5K vacuum tubes and 1000s of other electrical elements, failed once every 2 days (avg. down time = minutes)
The first theoretical work in dependable (fault-tolerant, FT) computing is credited to John von Neumann. In 1952, he presented a series of lectures on the use of replicated logic modules to improve a system’s reliability. He later wrote an article entitled:
“Probabilistic Logics and the Synthesis of Reliable Organisms from Unreliable Components”
In this article he presented the concept of majority voting and analyzed the impact that such arrangements could have on the probability of a system producing erroneous results.
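The effect of majority voting can be sketched numerically. The example below assumes triple modular redundancy (three identical, independent modules feeding a perfect voter), which is one common form of replicated logic. The function names and the sample module reliabilities are illustrative and are not taken from the lecture.

```python
from itertools import product


def majority(a: int, b: int, c: int) -> int:
    """Return the value produced by at least two of the three modules."""
    return 1 if (a + b + c) >= 2 else 0


def tmr_reliability(r: float) -> float:
    """Probability that at least 2 of 3 independent modules are correct.

    Assumes a perfect voter and the same reliability r for every module.
    """
    return 3 * r**2 - 2 * r**3


def tmr_reliability_by_enumeration(r: float) -> float:
    """Same quantity, obtained by summing over all good/bad module states."""
    total = 0.0
    for states in product([True, False], repeat=3):  # True = module is correct
        prob = 1.0
        for ok in states:
            prob *= r if ok else (1.0 - r)
        if sum(states) >= 2:  # the voter masks a single faulty module
            total += prob
    return total


if __name__ == "__main__":
    for r in (0.5, 0.9, 0.99):
        print(f"module R = {r:.2f} -> voted R = {tmr_reliability(r):.6f}")
```

Under these assumptions a module reliability of 0.9 gives a voted reliability of 0.972, while at 0.5 voting gives no gain; below 0.5 the voted arrangement is worse than a single module.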
APPLICATIONS
Critical-computation applications
(also known as Safety Critical)
Critical to human safety
Aircraft flight control systems,
Space shuttle
Requirements/example
Aircraft flight control navigation systems:
Reliability = 0.999 999 9 (seven nines) at the end of a 3-hour period (mission time)
Critical to the environment
Industrial control systems for a chemical plant, nuclear
plants, etc.
APPLICATIONS
High Availability Applications
Users want to have a high probability of receiving service
when it is requested.
Transaction processing
ATM: < 10 hours/year unavailable
airline reservation: < 1 min/day unavailable
Telecommunication systems: < 5 min./year unavailable
APPLICATIONS
Long-Life Applications
Unmanned space flights and satellites; repair is
impossible or prohibitively expensive.
Examples: Mariner, Explorer and Voyager missions,
satellites
A typical requirement of a long-life application is a 95% probability (0.95) of being operational at the end of the mission (e.g. 10 years).
The system may be degraded or reconfigured before the end of the mission time (if operator interaction is possible).
New emergent applications
Health industry,
Automotive industry,
Industrial control systems and production lines,
Banking, E-commerce,
Wired and wireless networked applications
Distributed, networked systems (reliability and security
are the major concerns)
Dependability Tree
Dependability
  Attributes: Reliability, Availability, Safety, Confidentiality, Testability, Maintainability
  Means: fault prevention, fault tolerance, fault removal, fault forecasting
  Impairments: faults, errors, failures
Reliability
R(t) = probability that the system conforms to its specifications during the period of time [t0, t], given that it was OK at t0.
Reliability R(t) is the probability that the system produces correct output.
It is the conditional probability that a system performs correctly throughout an interval of time [t0, t], given that the system was performing correctly at time t0.
Reliability
We need systems with high reliability if:
no temporary deviation from specifications is allowed (aircraft, heart pacemakers):
aircraft: R(some hours) = 0.999 999 9 (seven nines)
no repair is possible:
satellite, spacecraft: R(some years) = 0.95
Reliability Function R(t)
Exponential Failure Law
$R(t) = e^{-\lambda t}$
Where: λ is the constant failure rate
[Figure: plot of R versus time, vertical axis from 0 to 1]
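The exponential failure law can be turned around to ask what constant failure rate a requirement implies. The sketch below uses the long-life requirement quoted earlier (0.95 probability of being operational after 10 years) as its example. The function names are illustrative, and a year is taken as 8760 hours, which ignores leap years.

```python
import math


def reliability(failure_rate: float, t: float) -> float:
    """R(t) = exp(-lambda * t) for a constant failure rate lambda."""
    return math.exp(-failure_rate * t)


def max_failure_rate(target_reliability: float, mission_time: float) -> float:
    """Largest constant failure rate that still gives R(mission_time) >= target."""
    return -math.log(target_reliability) / mission_time


if __name__ == "__main__":
    hours_per_year = 8760.0            # 365 days * 24 h, leap years ignored
    mission = 10 * hours_per_year      # 10-year mission, in hours
    lam = max_failure_rate(0.95, mission)
    print(f"lambda <= {lam:.3e} failures/hour")
    print(f"check: R(10 years) = {reliability(lam, mission):.4f}")
```

With these numbers the allowed failure rate comes out at roughly 5.9e-7 failures per hour, assuming the whole system behaves as a single unit with a constant failure rate.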
Availability
A(t) = probability that the system conforms to its specifications at time t.
Availability A(t) is a function of time.
A(t) is defined as the probability that a system is operating correctly and is available to perform its functions at the instant of time t.
It depends on the failure rate λ and the repair rate μ, which are related to MTTF (mean time to failure) and MTTR (mean time to repair).
Percentage of time that a system is available:
A = MTTF/(MTTF+MTTR)
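The ratio A = MTTF/(MTTF+MTTR) can be checked with a short calculation. In this sketch the function name and both sets of MTTF/MTTR values are hypothetical; they were chosen only so that a rarely failing system and a frequently failing but quickly repaired system have the same MTTF-to-MTTR ratio.

```python
def steady_state_availability(mttf: float, mttr: float) -> float:
    """A = MTTF / (MTTF + MTTR); both arguments in the same time unit."""
    return mttf / (mttf + mttr)


if __name__ == "__main__":
    # Hypothetical system 1: fails rarely, slow to repair (hours).
    a_rare = steady_state_availability(mttf=10_000.0, mttr=10.0)
    # Hypothetical system 2: fails often, repaired quickly (hours).
    a_fast = steady_state_availability(mttf=100.0, mttr=0.1)
    print(f"rare failures, slow repair : A = {a_rare:.6f}")
    print(f"frequent failures, fast fix: A = {a_fast:.6f}")
```

Both cases come out near 0.999, which shows that availability alone does not say how often a system fails, only what fraction of time it is up.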
Availability
High availability of a system means:
high reliability, or
failing often but being repaired fast
Steady-state availability (Ass): can be expressed as downtime per year
Example: Ass = 90% (36.5 days/year of downtime)
Examples
Transaction processing
ATM: Ass = 0.999 (< 10 hours/year unavailable)
Embedded
telecom: Ass = 0.99999 (< 5 min./year unavailable)
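The link between a steady-state availability figure and downtime per year is simple arithmetic, sketched below. The function names are illustrative, and a year is taken as 365 days. Three nines (0.999) is used for the ATM case and five nines (0.99999) for the telecom case because those values match the "< 10 hours/year" and "< 5 min./year" bounds.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def downtime_minutes_per_year(availability: float) -> float:
    """Expected unavailable time per year for a given steady-state availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def availability_from_downtime(minutes_per_year: float) -> float:
    """Steady-state availability implied by a yearly downtime budget."""
    return 1.0 - minutes_per_year / MINUTES_PER_YEAR


if __name__ == "__main__":
    for a in (0.90, 0.999, 0.99999):
        minutes = downtime_minutes_per_year(a)
        print(f"Ass = {a}: {minutes:10.2f} min/year "
              f"= {minutes / 60:8.2f} h/year = {minutes / 1440:6.2f} days/year")
```

Under this convention 0.90 corresponds to 36.5 days per year, 0.999 to about 8.76 hours per year, and 0.99999 to about 5.3 minutes per year.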
Safety
S(t) = probability that a system either conforms to its specifications or has stopped in a safe way at time t.
Safety can be improved through the incorporation of
features that provide the fail-safe operation of the
system. For example, if a fault occurs, a mechanism
must be provided to detect the existence of the fault and
to prevent the fault from resulting in an undesired
response from the control system.
In essence, safety is the ability (or a measure) of a
system to be fail-safe.
Safety is a probability that safety actions will occur.
Safety requirements for aircraft
Reliability versus Safety
Reliability | Safety
Provide correct service | Avoidance of hazards
Incorporates functional specifications | Incorporates non-functional specifications
No severe consequences when it fails | May have catastrophic consequences when it fails
Different costs of failure
Different methodologies for system construction
Safety ≠ reliability
Maintainability
M(t) = probability that the system is back to specifications at time t, given that it failed at t0.
How fast can the system be repaired once it has failed?
Restoration process:
Locating the problem. Physically repairing the system. Bringing it back to its operational condition.
Automatic diagnosis
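A common way to put numbers on M(t) is to assume a constant repair rate μ, which gives M(t) = 1 - exp(-μt) with MTTR = 1/μ. That assumption is not stated in the lecture for M(t); it is used here only as a sketch, and the 2-hour MTTR is a hypothetical value.

```python
import math


def maintainability(repair_rate: float, t: float) -> float:
    """M(t) = 1 - exp(-mu * t), assuming a constant repair rate mu."""
    return 1.0 - math.exp(-repair_rate * t)


if __name__ == "__main__":
    mttr_hours = 2.0           # hypothetical mean time to repair
    mu = 1.0 / mttr_hours      # constant repair rate, repairs per hour
    for t in (1.0, 2.0, 8.0):
        print(f"M({t:.0f} h) = {maintainability(mu, t):.3f}")
```

Under this model the probability of completing a repair within one MTTR is about 0.63, so MTTR alone should not be read as a guaranteed repair time.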
Testability (related to Maintainability)
Testability: the ability to test the characteristics of the system
Example: Design for Testability of Digital Systems: BIST….
Design for Testability
Testability: a design characteristic that influences the various costs associated with testing.
Design for testability techniques are design efforts specifically employed to ensure that a device is testable.
Attributes: Controllability and Observability
Controllability is the ability to establish a specific signal value at each node in a circuit by setting values on the circuit’s inputs.
Observability is the ability to determine the signal value at any node in a circuit by controlling the circuit’s inputs and observing its outputs.
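Controllability and observability can be illustrated on a toy circuit. The sketch below assumes a hypothetical circuit y = (a AND b) OR c with one internal node n = a AND b, and uses the single stuck-at fault model; neither the circuit nor the fault model comes from the lecture. A test vector detects the fault only if it both sets n to the opposite of the stuck value (controllability) and lets that value reach the output y (observability).

```python
from itertools import product
from typing import Optional


def circuit(a: int, b: int, c: int, n_stuck_at: Optional[int] = None) -> int:
    """Toy circuit y = (a AND b) OR c, with an optional stuck-at fault on node n."""
    n = a & b                  # internal node, not directly visible at a pin
    if n_stuck_at is not None:
        n = n_stuck_at         # inject the fault: n is forced to 0 or 1
    return n | c


def detecting_vectors(stuck_value: int) -> list[tuple[int, int, int]]:
    """Input vectors (a, b, c) whose output differs between good and faulty circuit."""
    return [
        v for v in product((0, 1), repeat=3)
        if circuit(*v) != circuit(*v, n_stuck_at=stuck_value)
    ]


if __name__ == "__main__":
    print("n stuck-at-0 detected by:", detecting_vectors(0))
    print("n stuck-at-1 detected by:", detecting_vectors(1))
```

In this circuit, node n is observable only when c = 0, because c = 1 forces the output to 1 regardless of n. The stuck-at-0 fault is therefore detected by a single vector, (1, 1, 0), which is the kind of limitation that design-for-testability techniques aim to reduce.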
Other Attributes of Dependability
Confidentiality (security)
Non-occurrence of unauthorised disclosure of
information
Integrity
Non-occurrence of improper alterations of
information
Conclusions:
Dependable Systems
Fault, Error, Failures
Reliability, Availability, Safety
Fault Tolerance
Testing and testability