Dependability
- Qualitative term for the ability of the system to perform properly
- Encapsulates reliability, availability, safety, maintainability, performability, testability
2011 A.W. Krings
Page: 1
CS449/549 Fault-Tolerant Systems
Sequence 2
Reliability - Unreliability
- R(t) is the probability that the system performs as specified, without interruption, over the entire interval [0,t]
- R(t) is conditioned on the system being operational at time t=0
- Unreliability F(t) is the probability that the system fails at any time in the interval [0,t]
- F(t) = 1 - R(t)
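As a minimal sketch of these definitions, the following Python estimates R(t) and F(t) from a set of observed failure times. The function name and the sample failure times are hypothetical, chosen only for illustration, and every unit is assumed to be operational at t = 0.

```python
def empirical_reliability(failure_times, t):
    """Fraction of units that ran without failure over the whole interval [0, t]."""
    survivors = sum(1 for ft in failure_times if ft > t)
    return survivors / len(failure_times)


# Hypothetical failure times (in hours) of five identical units.
failure_times = [120.0, 340.0, 560.0, 800.0, 1500.0]

t = 500.0
R = empirical_reliability(failure_times, t)  # share of units with no failure in [0, t]
F = 1.0 - R                                  # unreliability: failed somewhere in [0, t]
print(f"R({t}) = {R:.2f}, F({t}) = {F:.2f}")
```

With these sample values, three of the five units survive past t = 500, so the estimate is R = 0.60 and F = 0.40.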
Reliability - Unreliability
- Time t can be very long, e.g. years in the case of space applications
- Notation: $0.9_i = 0.99\ldots9$, a decimal with i 9s
- This notation is often used for reliability
  - e.g. an unreliability of $Q(t) = 10^{-x}$ gives $R(t) = 0.9_x = 1 - 10^{-x}$
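The "number of 9s" notation can be sketched as a pair of conversions between x and R = 1 - 10^(-x). Both function names are hypothetical helpers, not part of the course material.

```python
import math


def reliability_from_nines(x):
    """R = 1 - 10^(-x), i.e. a decimal written with x nines."""
    return 1.0 - 10.0 ** (-x)


def nines_from_reliability(r):
    """Inverse mapping: x such that r = 1 - 10^(-x) (may be fractional)."""
    return -math.log10(1.0 - r)


print(reliability_from_nines(5))      # reliability written with five 9s
print(nines_from_reliability(0.999))  # close to 3, up to floating-point rounding
```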
Safety S(t)
- S(t) is the probability that the system does not fail in the interval [0,t] in such a manner as to cause unacceptable damage or other catastrophic effects
- Safety is a measure of the fail-safe capability of the system
  - a system can be unreliable, yet safe
  - bias towards safe failure
  - e.g. duplex system (detector)
  - e.g. babbling driver (not safe)
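The duplex example can be sketched as two copies of a module whose outputs are compared: on a mismatch the system withholds its output and goes to a safe state, so it fails (it is not reliable) but it fails safely. This is only an illustration under stated assumptions: the names `duplex`, `good` and `faulty` are hypothetical, the comparator itself is assumed fault-free, and two copies failing identically would go undetected.

```python
def duplex(module_a, module_b, x):
    """Run both modules on x; return (output, ok). On disagreement, fail safe."""
    out_a, out_b = module_a(x), module_b(x)
    if out_a != out_b:
        return None, False   # mismatch detected: withhold output, go to safe state
    return out_a, True


def good(x):
    return 2 * x


def faulty(x):
    return 2 * x + 1         # hypothetical faulty copy of the same module


print(duplex(good, good, 3))    # copies agree, output is delivered
print(duplex(good, faulty, 3))  # copies disagree, no (wrong) output is delivered
```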
Availability A(t)
- A(t) is the probability that the system is up and running correctly at time t
- This is different from reliability
  - Reliability considers the interval [0,t]
  - Availability considers an instant of time
- Examples:
  - transaction processing systems, e.g. reservation systems
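The interval-versus-instant distinction can be illustrated on a single observed run of a system that fails once and is repaired. A(t) and R(t) are probabilities over many such runs; this sketch only shows which event each one counts. The helper names and the outage times are hypothetical.

```python
def up_at(down_intervals, t):
    """Availability-style check: is the system up at the instant t?"""
    return all(not (start <= t < end) for start, end in down_intervals)


def up_throughout(down_intervals, t):
    """Reliability-style check: has the system been up over all of [0, t]?"""
    return all(start > t for start, _ in down_intervals)


# Hypothetical run: the system is down from t = 10 until it is repaired at t = 12.
outage = [(10.0, 12.0)]

print(up_at(outage, 20.0))          # True: repaired, so it is up at t = 20
print(up_throughout(outage, 20.0))  # False: it was interrupted within [0, 20]
```

A repaired system therefore counts towards availability at t = 20 but not towards reliability over [0, 20].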
Performability
- P(L,t) is the probability that the system performance will be at or above some level L at time t
- Measure of the likelihood that some subset of the functions is performed correctly
- This differs from reliability, which dictates that all functions are performed correctly
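As a rough illustration of P(L,t), assume a system of n identical, independent units, each working at time t with probability p, and take the performance level to be the number of working units. These modeling assumptions and the function name are not from the slides; they only make the definition concrete.

```python
from math import comb


def performability(n, p, level):
    """P(at least `level` of n independent units work), each with probability p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(level, n + 1))


n, p = 4, 0.9                          # hypothetical: 4 units, each up with prob. 0.9
all_units = performability(n, p, n)    # level n: every unit must work
at_least_3 = performability(n, p, 3)   # level 3: one failed unit is tolerated

print(f"all {n} units: {all_units:.4f}")
print(f"at least 3:   {at_least_3:.4f}")
```

Requiring every unit gives 0.6561, while accepting level 3 gives 0.9477, which shows how tolerating a reduced level raises the probability of meeting it.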
Graceful Degradation
- The ability of a system to automatically decrease its level of performance to compensate for hardware failures and software errors
Maintainability
- M(t) is the probability that a failed system will be restored within a specified period of time t
- Restoration process
  - locating the problem, e.g. via diagnostics
  - physically repairing the system
  - bringing the system back to its operational condition
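One common way to make M(t) concrete is to assume a constant repair rate mu, which gives M(t) = 1 - exp(-mu t). The slide does not state a repair-time distribution, so the exponential model, the function name and the rate below are all assumptions made for this sketch.

```python
import math


def maintainability(mu, t):
    """M(t) = 1 - exp(-mu * t) under an assumed constant repair rate mu."""
    return 1.0 - math.exp(-mu * t)


mu = 0.5  # hypothetical repair rate: 0.5 repairs per hour (mean repair time 2 h)
for t in (1.0, 2.0, 8.0):
    print(f"M({t}) = {maintainability(mu, t):.3f}")
```

Under this assumption the probability of completing the restoration grows from about 0.39 within one hour to about 0.98 within eight hours.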
Fault - Error - Failure
- Fault = physical defect or flaw occurring in some component (hardware or software)
- Error = incorrect behavior caused by a fault
  - manifestation of a fault
- Failure = inability of the system to perform its specified service
Fault - Error - Failure
- Fault
  - physical universe
  - e.g. bit stuck-at
- Error
  - informational universe
  - e.g. incorrect data to ALU
- Failure
  - external universe
  - e.g. system crash, incorrect bank balance

Note: the presence of a fault does not ensure that an error will occur, e.g. memory stuck-at-0
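The stuck-at-0 note can be sketched with a one-bit memory cell that always reads back 0. The fault is present in both cases below, but it only manifests as an error when the intended value is 1. The class and function names are hypothetical.

```python
class StuckAtZeroCell:
    """One-bit memory cell with a permanent stuck-at-0 fault."""

    def write(self, bit):
        pass  # the fault: whatever is written, the cell keeps holding 0

    def read(self):
        return 0


def is_error(intended_bit):
    """True if reading the cell back yields a value other than the one written."""
    cell = StuckAtZeroCell()
    cell.write(intended_bit)
    return cell.read() != intended_bit


print(is_error(0))  # False: fault present, but the stored value happens to be correct
print(is_error(1))  # True: the fault now manifests as an error
```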
Characteristics of faults
- Cause
  - specification errors
    - very dangerous generic fault
  - implementation
    - very hard to formally verify
  - random component faults
    - random, not manufacturing defects
  - external disturbance
    - noise, EMP, radiation
    - much like random component faults
Characteristics of faults
- Origin
  - software or hardware
  - don't care, except:
    - hardware can be analog
    - indeterminate voltage level
Characteristics of faults
- Duration
  - permanent fault
    - once a component fails, it never works correctly again
    - easiest to diagnose
  - transient fault
    - occurs one time only
    - 10 times as likely as a permanent fault
  - intermittent fault
    - recurring
    - may appear to be transient (if long period)
    - hard and expensive to detect
Avoidance - Masking - Tolerance
from Johnson 1989, Fig 2.12
[Figure: Specification Mistakes, Implementation Mistakes, External Disturbances and Component Defects lead to Software Faults and Hardware Faults, then to Errors, then to System Failure; the diagram is annotated with the techniques Fault Avoidance, Fault Masking and Fault Tolerance]