Reliability
Besides this, our technical systems are increasingly put to use in hostile
environments; they have to be suitable for a wider variety of conditions.
Just think of applications in the process industry (heat, humidity, chemical
substances) or mobile applications in aircraft, ships, and vehicles (mechanical
vibrations, shocks, poorly defined power supply voltages, high
electromagnetic interference levels).
The socio-ethical aspects of products with too low a reliability must also
not be underestimated. Such low-reliability disposable products lead to
a waste of labour, energy, and raw materials that are becoming more and
more scarce.
1.3 DEFINITION
Most definitions of reliability met with in the literature contain the
following four elements:
1. Probability
2. Adequate performance
3. Time
4. Operating and environmental conditions.
The true reliability is never exactly known, but numerical estimates quite
close to this value can be obtained by the use of statistical methods and
probability calculations. How close the statistically estimated reliability
comes to the true reliability depends on the amount of testing, the
completeness of field service reporting all successes and failures, and other
essential data. For the statistical evaluation of an equipment, it
has to be operated and its performance observed for a specified time
under actual operating conditions in the field or under well-simulated
conditions in a laboratory. Criteria of what is considered adequate
performance have to be spelled out exactly for each case, in advance.
analysis begins with the definition of an undesirable event and traces this
event down through the system to identify basic causes. In systems
parlance, the FMEA is a bottom-up procedure while the FTA is a top-down
technique.
2. System Complexity
3. Poor Maintenance
5. Human Reliability
[Figure: (a) a performance parameter y(t) fluctuating between limits y_max and y_min over time; (b) v(t) versus time with a failure threshold v_f.]
First, there are the failures which occur early in the life of a component.
They are called early failures. Some examples of early failures are:
Many of these early failures can be prevented by improving the control over
the manufacturing process. Sometimes, improvements in design or materials
are required to increase the tolerance for these manufacturing deviations,
but fundamentally these failures reflect the manufacturability of the component
or product and the control of the manufacturing processes. Consequently,
these early failures would show up during:
Secondly, there are failures caused by wearout of parts. These
occur in an equipment only if it is not properly maintained, or not maintained
at all. Wearout failures are due primarily to deterioration of the design strength
of the device as a consequence of operation and exposure to environmental
fluctuations. Deterioration results from a number of familiar chemical and
physical phenomena:
* Corrosion or oxidation
* Insulation breakdown or leakage
* Ionic migration of metals in vacuum or on surfaces
* Frictional wear or fatigue
* Shrinkage and cracking in plastics
Third, there are so-called chance failures which neither good debugging
techniques nor the best maintenance practices can eliminate. These failures
Reliability Fundamentals 13
If we plot the curve of the failure rate against the lifetime T of a very large
sample of a homogeneous component population, we obtain the failure rate
graph shown in Fig 1.3. At the time T = 0 we place in operation a very
large number of new components of one kind. This population will initially
exhibit a high failure rate if it contains some proportion of substandard,
weak specimens. As these weak components fail one by one, the failure
rate decreases comparatively rapidly during the so-called burn-in or debugging
period, and stabilizes to an approximately constant value at the time T_b
when the weak components have died out. The component population, after
having been burned in or debugged, reaches its lowest failure rate level
which is approximately constant. This period of life is called the useful life
period and it is in this period that the exponential law is a good approximation.
14 Reliability Engineering
[Fig 1.3 shows the bathtub-shaped failure rate curve: a burn-in period up to T_b, a useful life period of constant chance failures, and a wearout region thereafter, plotted against operating life T (age).]
Fig. 1.3 Component failure rate as a function of age.
If the chance failure rate is very small in the useful life period, the mean
time between failures can reach hundreds of thousands or even millions of
hours. Naturally, if a component is known to have a mean time between
failures of say 100,000 hours (or a failure rate of 0.00001) that certainly
does not mean that it can be used in operation for 100,000 hours.
The mean time between failures tells us how reliable the component IS In
its useful life period, and such information is of utmost importance. A
component with a mean time between failures of 100,000 hours will have a
reliability of 0.9999 or 99.99 percent for any 10-hour operating period.
Further, if we operate 100,000 components of this quality for 1 hour, we
would expect only one to fail. Equally, we would expect only one failure if
we operate 10,000 components under the same conditions for 10 hours, or
1000 components for 100 hours, or 100 components for 1000 hours.
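These expectations follow directly from the constant-failure-rate model: the reliability over a period t is exp(-λt), and the expected number of failures in N components operated for t hours each is λ·N·t. A quick check, using the figures from the text:

```python
import math

lam = 0.00001          # failure rate per hour (MTBF m = 1/lam = 100,000 hr)

# Reliability over any 10-hour operating period in the useful life
r10 = math.exp(-lam * 10)
print(round(r10, 4))   # 0.9999

# Expected failures = lam * (number of units) * (hours each): 1.0 in each case
for units, hours in [(100000, 1), (10000, 10), (1000, 100), (100, 1000)]:
    print(units, hours, lam * units * hours)
```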
The golden rule of reliability is, therefore: Replace components as they fail
within the useful life of the components, and replace each component
preventively, even if it has not failed, not later than when it has reached the
end of its useful life. The burn-in procedure is an absolute must for missiles,
rockets, and space systems in which no component replacements are
possible once the vehicle takes off and where the failure of any single
component can cause the loss of the system. Component burn-in before
assembly followed by a debugging procedure of the system is, therefore,
another golden rule of reliability.
In this formula λ is a constant called the failure rate, and t is the operating
time. The failure rate must be expressed in the same time units as the time t,
usually in hours; however, it may be better to use cycles or miles in some
cases. The reliability R is then the probability that the device, which has a
constant failure rate λ, will not fail in the given operating time t.
This reliability formula is correct for all properly debugged devices which are
not subject to early failures, and which have not yet suffered any degree
of wearout damage or performance degradation because of their age.
The probability that the device will not fail in its entire useful life period of
1000 hours is
We often use the reciprocal value of the failure rate, which is called the
mean time between failures, m. The mean time between failures, abbreviated
MTBF, can be measured directly in hours. By definition, in the exponential
case, the mean time between failures, or MTBF, is

m = 1/λ (1.2)
When plotting this function, with Reliability values on the ordinate and the
corresponding time values on the abscissa, we obtain a curve which is often
referred to as the survival characteristic and is shown in Fig 1.4.
There are a few points on this curve which are easy to remember and which
help greatly in rough prediction work. For an operating time t = m, the
device has a probability of only 36.8 percent (approximately 37 percent)
of surviving. For t = m/10, the curve shows a reliability of R = 0.9; for
t = m/100, the reliability is R = 0.99; for t = m/1000, it is 0.999.
[Fig 1.4: (a) the survival characteristic R(t) = exp(-t/m), falling from 1.0 through 0.368 at t = m; (b) an expanded view of the early part of the curve, showing R = 0.99 at t = m/100, 0.95 at m/20, and 0.90 at m/10.]
For fast reliability calculations, we can use a Nomogram as shown in Fig 1.5.
If we know any two of the following three parameters, the third can be
directly read on the straight line joining the first two.
Example 1.1
Solution
λ = 0.0001/hr
Therefore, m = 1/λ = 10,000 hr
If a fixed number N0 of components are tested, there will be, after a time t,
Ns(t) components which survive the test and Nf(t) components which fail.
Therefore, N0 = Ns(t) + Nf(t) is constant throughout the test. The reliability,
expressed as a fraction by the probability definition at any time t during the
test, is:

R(t) = Ns(t)/N0

In the same way, we can also define the probability of failure Q (called the
unreliability) as

Q(t) = Nf(t)/N0 (1.7)
Rearranging,
components will fail out of these Ns(t) components. When we now divide
both sides of the equation (1.9) by Ns(t), we obtain the rate of failure or the
instantaneous probability of failure per one component, which we call the
failure rate:
which is the most general expression for the failure rate, because it applies
to exponential as well as non-exponential distributions. In the general case, λ
is a function of the operating time t, for both R and dR/dt are functions of t.
Only in one case will the equation yield a constant, and that is when failures
occur exponentially at random intervals in time. By rearrangement and
integration of the above equation, we obtain the general formula for
reliability,

λ(t)dt = -dR(t)/R(t)

or, ln R(t) = -∫₀ᵗ λ(t) dt

Solving for R(t) and knowing that at t = 0, R(t) = 1, we obtain

R(t) = exp[-∫₀ᵗ λ(t) dt] (1.12)
So far in this derivation, we have made no assumption regarding the nature
of failure rate and therefore it can be any variable and integrable function
of the time t. Consequently, in the equation (1.12), R(t) mathematically
describes reliability in a most general way and applies to all possible kinds of
failure distributions.
When we specify that the failure rate is constant in the above equation, the
exponent becomes

-∫₀ᵗ λ(t) dt = -λt

and the known reliability formula for constant failure rate results,

R(t) = exp(-λt)
Example 1.3:
Compute the failure density, failure rate, reliability, and unreliability
functions.
The computation of failure density and failure rate is shown in Table 1.4.
Similarly the computation of reliability and unreliability function is shown
in Table 1.5. These results are also shown in Fig 1.8. As shown, we can
compute R(t) for this example using the formula R(t) = Ns(ti)/N0 at each
value of ti and connecting these points by a set of straight lines. In the data
analysis one usually finds it convenient to work with the λ(t) curve and deduce
the reliability and density functions theoretically. For example, in this
illustration, we can see that the hazard rate can be modeled as a constant.
***
Table 1.4: Computation of failure density and failure rate

Time interval (hours) | Failure density f(t)  | Failure rate λ(t)
0-8                   | 1/(10 x 8) = 0.0125   | 1/(10 x 8) = 0.0125
8-20                  | 1/(10 x 12) = 0.0083  | 1/(9 x 12) = 0.0093
20-34                 | 1/(10 x 14) = 0.0071  | 1/(8 x 14) = 0.0089
34-46                 | 1/(10 x 12) = 0.0083  | 1/(7 x 12) = 0.0119
46-63                 | 1/(10 x 17) = 0.0059  | 1/(6 x 17) = 0.0098
63-86                 | 1/(10 x 23) = 0.0043  | 1/(5 x 23) = 0.0087
86-111                | 1/(10 x 25) = 0.0040  | 1/(4 x 25) = 0.0100
111-141               | 1/(10 x 30) = 0.0033  | 1/(3 x 30) = 0.0111
141-186               | 1/(10 x 45) = 0.0022  | 1/(2 x 45) = 0.0111
186-266               | 1/(10 x 80) = 0.0013  | 1/(1 x 80) = 0.0125
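The entries in Table 1.4 can be reproduced from the ten recorded failure times: over each interval, the failure density is Δn/(N0·Δt) and the failure rate is Δn/(Ns·Δt), where Ns is the number of survivors entering the interval. A sketch, assuming one failure ends each interval:

```python
# Failure times of the 10 components (hours), one failure per interval
times = [8, 20, 34, 46, 63, 86, 111, 141, 186, 266]
N0 = len(times)

prev = 0
for i, t in enumerate(times):
    dt = t - prev
    survivors = N0 - i            # components alive entering the interval
    f = 1 / (N0 * dt)             # failure density over the interval
    z = 1 / (survivors * dt)      # failure (hazard) rate over the interval
    print(f"{prev}-{t}: f = {f:.4f}, z = {z:.4f}")   # first row: 0.0125, 0.0125
    prev = t
```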
[Fig 1.8: (a) failure density f(t), (b) failure rate λ(t), (c) reliability R(t), and (d) unreliability Q(t), each plotted against time.]
That means that 1/Ns(t) and dNf(t)/dt must vary so that their product remains
constant throughout the entire test. A simple way to measure a
constant failure rate is to keep the number of components in the test
constant by immediately replacing the failed components with good ones.
The number of live components Ns(t) is then equal to N0 throughout the
test. Therefore, 1/Ns(t) = 1/N0 is constant, and dNf(t)/dt in this test must
also be constant if the failure rate is to be constant. But dNf(t)/dt will be
constant only if the total number of failed components Nf(t), counted from
the beginning of the test, increases linearly with time. If Nf components have
failed in time t at a constant rate, the number of components failing per unit
time becomes Nf/t, and in this test we can substitute Nf/t for dNf(t)/dt and
1/N0 for 1/Ns(t). Therefore,
λ = Nf/(N0 t) (1.29)
Thus, we need to count only the number of failures Nf and the straight hours
of operation t. The constant failure rate is then the number of failures
divided by the product of the test time t and the number of components in test,
which is kept continuously at N0. This product N0 t is the number of unit-
hours accumulated during the test. Of course, this procedure for determining
the failure rate can be applied only if λ is constant.
If only one equipment (N0 = 1) is tested but is repairable so that the test can
continue after each failure, the failure rate becomes λ = Nf/t, where the unit-
hours t amount to the straight test time.
Example 1.4:
Consider another example wherein the time scale is now divided into equally
spaced intervals called class intervals. The data is tabulated in the Table 1.6
in class intervals of 1000 hours. Compute the failure density and failure rate
functions.
Table 1.6: Data for Example 1.4

Time interval (hours) | Failures in the interval
0000 - 1000           | 59
1001 - 2000           | 24
2001 - 3000           | 29
3001 - 4000           | 30
4001 - 5000           | 17
5001 - 6000           | 13
Solution:
It can be seen that the failure rate in this case can be approximated by a
linearly increasing time function.
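The trend can be checked from Table 1.6. With N0 = 172 components on test and class intervals of Δt = 1000 hr, the failure rate in each interval is Δn/(Ns·Δt), where Ns is the number of survivors entering the interval; a sketch:

```python
failures = [59, 24, 29, 30, 17, 13]   # failures per 1000-hr class interval
N0 = sum(failures)                    # 172 components on test
dt = 1000.0

survivors = N0
for i, n in enumerate(failures):
    z = n / (survivors * dt)          # failure rate over the interval
    print(f"{i * 1000}-{(i + 1) * 1000} hr: z = {z:.2e}/hr")
    survivors -= n
```

Beyond the first (early-failure) interval, the computed rates rise roughly linearly, as the solution states.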
Example 1.5 :
A sample of 100 electric bulbs was put on test for 1500 hrs. During this
period, 20 bulbs failed at 840, 861, 901, 939, 993, 1060, 1100, 1137,
1184, 1200, 1225, 1251, 1270, 1296, 1314, 1348, 1362, 1389, 1421,
and 1473 hours. Assuming a constant failure rate, determine its value.
Solution:
In this case,
Nf = 20
Unit-hours accumulated = 840 + 861 + 901 + 939 + 993 + 1060 + 1100 + 1137
+ 1184 + 1200 + 1225 + 1251 + 1270 + 1296 + 1314 + 1348 + 1362 + 1389
+ 1421 + 1473 + 80(1500) = 143,564 hrs.
Therefore, λ = 20/143,564 = 1.39 x 10⁻⁴/hr.
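The unit-hour bookkeeping can be verified directly; each failed bulb contributes its hours to failure and each of the 80 surviving bulbs contributes the full 1500 test hours:

```python
fail_times = [840, 861, 901, 939, 993, 1060, 1100, 1137, 1184, 1200,
              1225, 1251, 1270, 1296, 1314, 1348, 1362, 1389, 1421, 1473]

unit_hours = sum(fail_times) + 80 * 1500   # failed bulbs + 80 survivors
lam = len(fail_times) / unit_hours         # constant failure rate estimate

print(unit_hours)        # 143564
print(f"{lam:.3e}/hr")   # 1.393e-04/hr
```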
***
Reliability Mathematics 51
[Figure: (a) the density function f(x); (b) the distribution function F(x).]
about continuous-time and discrete-state models) we must first define all the
mutually exclusive states of the system. For example, in a system composed
of a single non-repairable element x1 there are two possible states: s0 = x1,
the element is good, and s1 = x1', the element is bad. The states of the
system at t = 0 are called the initial states, and those representing a final or
equilibrium state are called final states. The set of Markov state equations
describes the probabilistic transitions from the initial to the final states.
P0(t + Δt) = [1 - z(t)Δt] P0(t) (2.37)

P1(t + Δt) = z(t)Δt P0(t) + P1(t) (2.38)

In the limit Δt → 0, these become

dP0(t)/dt = -z(t) P0(t) (2.39)

dP1(t)/dt = z(t) P0(t) (2.40)

with the solutions

P0(t) = exp[-∫₀ᵗ z(τ)dτ] (2.41)

and

P1(t) = 1 - exp[-∫₀ᵗ z(τ)dτ] (2.42)
Of course, a formal solution of the second equation is not necessary,
since it is possible to recognize at the outset that

P0(t) + P1(t) = 1 (2.43)

The role played by the initial conditions is clearly evident. If there is a fifty-
fifty chance that the system is good at t = 0, then P0(0) = 1/2, and

P0(t) = (1/2) exp[-∫₀ᵗ z(τ)dτ] (2.44)
It is often easier to characterize Markov models by a graph composed of
nodes representing system states and branches labeled with transition
probabilities. Such a Markov graph for the problem described above is given
in Fig 2.10. Note that the sum of transition probabilities for the branches
leaving each node must be unity. Treating the nodes as signal sources and
the transition probabilities as transmission coefficients, we can write
difference equations by inspection. Thus, the probability of being at any
node at time t + Δt is the sum of all signals arriving at that node. All other
nodes are considered probability sources at time t, and all transition
probabilities serve as transmission gains. A simple algorithm for writing the
differential equations by inspection is to equate the derivative of the
probability at any node to the sum of the transmissions coming into the
node. Any unity gain factors of the self-loops must first be set to zero, and
the Δt factors are dropped from the branch gains.
[Fig 2.10: two nodes, s0 and s1; the self-loop on s0 has gain 1 - z(t)Δt, the branch from s0 to s1 has gain z(t)Δt, and the self-loop on s1 has gain 1.]
Fig. 2.10 Markov graph for a single nonrepairable element
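The difference-equation view of Fig 2.10 can be exercised directly. A minimal sketch, assuming a constant hazard z(t) = λ (an illustrative value, not from the text), iterating the node equations and checking against the closed-form solution (2.41):

```python
import math

lam, dt, T = 0.5, 0.001, 2.0   # constant hazard z(t)=lam, step, horizon (assumed)
p0, p1 = 1.0, 0.0              # initial state: element good

for _ in range(int(T / dt)):
    # signals arriving at each node per the Markov graph of Fig 2.10
    p0, p1 = p0 * (1 - lam * dt), p1 + p0 * lam * dt

print(round(p0, 3), round(math.exp(-lam * T), 3))   # 0.368 0.368
```

The iteration conserves total probability (p0 + p1 = 1 at every step), mirroring equation (2.43).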
(2.44)
(2.45)
(2.46)
(2.47)
[Branch gains such as z(t)Δt and 1 - z(t)Δt label the graph.]
Fig. 2.11 Markov graph for two distinct nonrepairable elements.
dP0(t)/dt = -[z01(t) + z02(t)] P0(t) (2.48a)

dP1(t)/dt = -[z13(t)] P1(t) + [z01(t)] P0(t) (2.48b)

dP2(t)/dt = -[z23(t)] P2(t) + [z02(t)] P0(t) (2.48c)

dP3(t)/dt = [z13(t)] P1(t) + [z23(t)] P2(t) (2.48d)
The initial conditions associated with this set of equations are P0(0), P1(0),
P2(0), and P3(0). These equations, of course, could have been written by
inspection using the algorithm previously stated.
It is difficult to solve these equations for a general hazard function z(t), but
if the hazards are specified, the solution is quite simple. If all the hazards are
constant, z01(t) = λ1, z02(t) = λ2, z13(t) = λ3, and z23(t) = λ4.
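With constant hazards, the four-state equations (2.48) integrate easily. A minimal Euler sketch with illustrative values for λ1 through λ4 (assumed, not from the text), confirming that total probability is conserved:

```python
l1, l2, l3, l4 = 0.2, 0.3, 0.4, 0.1   # z01, z02, z13, z23 (assumed values)
p = [1.0, 0.0, 0.0, 0.0]              # P0(0) = 1: both elements good
dt, T = 0.001, 5.0

for _ in range(int(T / dt)):
    d0 = -(l1 + l2) * p[0]                    # eq (2.48a)
    d1 = -l3 * p[1] + l1 * p[0]               # eq (2.48b)
    d2 = -l4 * p[2] + l2 * p[0]               # eq (2.48c)
    d3 = l3 * p[1] + l4 * p[2]                # eq (2.48d)
    p = [p[i] + dt * d for i, d in enumerate([d0, d1, d2, d3])]

print(round(sum(p), 6))   # 1.0 -- the state probabilities always sum to one
```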
3
RELIABILITY ANALYSIS OF
SERIES PARALLEL SYSTEMS
3.1 INTRODUCTION
Component reliabilities are derived from tests which yield information about
failure rates. The actual value of this failure rate can be obtained only by
means of statistical procedures because of the two main factors which
govern the probability of survival of a component:
Once we have the right figures for the reliabilities of the components in a
system, or good estimates of these figures, we can then perform very exact
calculations of system reliability even when the system is the most complex
combination of components conceivable. The exactness of our results does
not hinge on the probability calculations because these are perfectly
accurate; rather, it hinges on the exactness of the reliability data of the
components. In system reliability calculations for Series-Parallel Systems we
need use only the basic rules of the probability calculus.
4. The state of each element and of the entire network is either good
(operating) or bad (failed).
Two blocks in a block diagram are shown in series if the failure of either of
them results in system failure. In a series block diagram of many blocks,
such as Fig 3.1, all the blocks must operate successfully
for system success. Similarly, two blocks are shown in parallel in the block
diagram if the success of either of them results in system success. In a
parallel block diagram of many blocks, such as Fig 3.2, successful operation
of any one or more blocks ensures system success. A block diagram in
which both the above connections are used is termed a Series-Parallel Block
Diagram.
[Fig 3.1 A Series Block Diagram (blocks x1, x2, ..., xn between In and Out)]
[Fig 3.2 A Parallel Block Diagram]
[Fig 3.3 A k-out-of-m Block Diagram (at least k needed)]
to pass the required current. Such a block diagram cannot be recognised
without a description inscribed on it, as in Fig 3.3. Series and parallel
reliability block diagrams can be described as special cases of this type, with
k equal to m and to unity respectively.
R(t) = Π pi(t), i = 1, ..., n (3.4)

and

R(t) = exp[-t Σ λi], i = 1, ..., n (3.5)
Therefore, the reliability law for the whole system is still exponential. Also,
for series systems with constant failure rate components the system failure
rate is the sum of failure rates of individual components i.e.,
Reliability Analysis of Series Parallel Systems 63
λs = Σ λi, i = 1, ..., n (3.6)

and the MTBF of the system is related to the MTBFs of the individual components
by

ms = 1/Σ(1/mi), i = 1, ..., n (3.7)
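These series-system relations can be sketched numerically; the component failure rates below are illustrative assumptions, not figures from the text:

```python
import math

lams = [2e-5, 5e-5, 3e-5]    # component failure rates per hour (assumed)
t = 1000.0

lam_s = sum(lams)                                     # eq (3.6): rates add
R_prod = math.prod(math.exp(-l * t) for l in lams)    # eq (3.4): product law
R_sys = math.exp(-lam_s * t)                          # eq (3.5)

ms_components = [1 / l for l in lams]                 # component MTBFs, mi = 1/lam_i
m_s = 1 / sum(1 / m for m in ms_components)           # eq (3.7)

print(abs(R_prod - R_sys) < 1e-12)    # True: (3.4) and (3.5) agree
print(abs(m_s - 1 / lam_s) < 1e-6)    # True: (3.7) gives ms = 1/lam_s
```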
Example 3.1
Solution
This sum is the expected hourly failure rate λs of the whole circuit. The
estimated reliability of the circuit is then
R(t) = exp(-0.0001t)
This does not mean that the circuit could be expected to operate without
failure for 10,000 hours. We know from the exponential function that its
***
It may be noted that the component failure rate figures apply to definite
operating stress conditions, for instance, to operation at rated voltage,
current, and temperature, and at a predicted level of mechanical stresses such
as shock and vibration. Failure rates usually change radically with changes
in the stress levels. If a capacitor is operated at only half of its rated voltage,
its failure rate may drop to 1/30th of the failure rate at full rated voltage
operation.
Thus, when designing the circuits and their packaging, the circuit designer
should always keep two things in mind:
1. Do not overstress the components, but operate them well below their
rated values, including temperature. Provide good packaging against
shock and vibration, but remember that in tightly packaged
equipment without adequate heatsinks, extremely high operating
temperatures may develop which can kill all reliability efforts.
It may be observed that the time t used above is the system operating time.
Only when a component operates continuously in the system will the
component's operating time be equal to the system's operating time. In
general, when a component operates on the average for t1 hours in t system
operating hours, it assumes in the system's time scale a failure rate of

λ(t1/t) (3.8)

system is operating. If the component has a failure rate of λ' when operating
and λ'' when de-energized, and it operates for t1 hours every t hours of
system operation, the system will see this component behaving with an
average failure rate of

[t1 λ' + (t - t1) λ'']/t (3.10)

But if this component also has a time dependent failure rate of λ' while
energized, and a failure rate of λ'' when de-energized (with the system still
operating), the component assumes in the system time scale a failure rate of
Example 3.2
An electric bulb has a failure rate of 0.0002/hr when glowing and
0.00002/hr when not glowing. At the instant of switching ON, the failure
rate is estimated to be 0.0005/switching. What is the average failure rate of
the bulb if, on the average, it is switched 6 times every day and remains ON
for a total of 8 hrs a day?
Solution
Here,
t = 24 hrs
t1 = 8 hrs
λ' = 0.0002/hr
λ'' = 0.00002/hr
λc = 0.0005/switching
c = 6 switchings/day
switching it off when not needed. (We have not discussed the question of
energy consumption here, which may force the other decision on us.)
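The computation itself can be sketched as follows; spreading the per-switching hazard over the day as c·λc/t is our reading of how equation (3.10) extends to include the switching term:

```python
t, t1 = 24.0, 8.0                    # system hours per day, ON hours per day
lam_on, lam_off = 0.0002, 0.00002    # /hr glowing, /hr not glowing
lam_c, c = 0.0005, 6                 # per switching, switchings per day

# eq (3.10) extended with a per-switching term (our assumed decomposition)
lam_avg = (t1 * lam_on + (t - t1) * lam_off + c * lam_c) / t
print(f"{lam_avg:.6f}/hr")   # 0.000205/hr
```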
***
In case the components in a series system are identical and independent,
each with reliability p or unreliability q,

R = pⁿ = (1 - q)ⁿ (3.12)

R ≈ 1 - nq (3.13)
Example 3.3
Solution
R ≈ 1 - nq
or, 0.99 = 1 - 10q
or, q = 0.001
Hence, p = 0.999
Using the exact relation, R = p¹⁰
or, p¹⁰ = 0.99
p = (0.99)^0.1 = 0.99900.
We can thus see that the difference between the exact and approximate
calculations is negligible, and hence the approximate relation is
frequently used in practical design. In simple words, it means that the
system unreliability is the product of the component unreliability and the
number of components in the system.
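The closeness of the two calculations in Example 3.3 can be checked directly:

```python
n, R_target = 10, 0.99

q_approx = (1 - R_target) / n    # from R ~ 1 - n*q, eq (3.13)
p_exact = R_target ** (1 / n)    # from R = p**n, eq (3.12)

print(round(q_approx, 6))        # 0.001 -> p ~ 0.999
print(f"{p_exact:.5f}")          # 0.99900
```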
***
If Pr(Ei') = qi and Pr(Ei) = pi, the time dependent reliability function is

R(t) = 1 - Π qi(t), i = 1, ..., m (3.17)

= 1 - Π (1 - pi(t)), i = 1, ..., m (3.18)

In case of identical components,

R = 1 - [1 - p(t)]ᵐ (3.19)

Q = [q(t)]ᵐ (3.20)

ms = ∫₀^∞ {1 - [1 - exp(-λt)]ᵐ} dt (3.22)

It can be easily derived now that:

ms = (1/λ) Σ (1/i), i = 1, ..., m (3.23)
For large values of m, equation (3.23) reduces approximately to ms ≈ (1/λ)(ln m + 0.577).
When two components with the failure rates λ1 and λ2 operate in parallel,
the reliability Rp of this parallel system is given by

Rp = exp(-λ1 t) + exp(-λ2 t) - exp[-(λ1 + λ2)t] (3.25)

For two identical components, Qp = Q1 Q2 = [1 - exp(-λt)]²

The reliability is

Rp = 1 - [1 - exp(-λt)]² (3.31)
Example 3.4
Solution
Rp = 1 - Π (1 - pi), i = 1, ..., m
In such systems, we have to apply the product law of reliability and product
law of unreliability repeatedly for reliability analysis of the systems. This is
best clarified with the help of some examples:
Example 3.5
[Fig 3.4: series-parallel block diagram with block A (0.98), parallel blocks B and C (0.92 each), and blocks D and E (0.98 each).]
Solution
(0.98)(0.9936) =0.9737
***
Example 3.6
Three generators, one with a capacity of 100 kw and the other two with a
capacity of 50 kw each are connected in parallel. Draw the reliability logic
diagram if the required load is:
(i) 100 kw (ii) 150 kw
Solution
The reliability logic diagram for case (i) is drawn as shown in Fig 3.5(a)
because in this case either the one 100 kw generator or both 50 kw
generators must function. Similarly, the logic diagram for case (ii) is drawn
as shown in Fig 3.5(b), as in this case the 100 kw generator must function
and, of the remaining two, any one must function.
[Fig 3.5: reliability logic diagrams built from the 100 kw and 50 kw generator blocks for the two load cases.]

R1 = r + r² - r³
R2 = r[2r - r²]
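Both formulas can be confirmed by enumerating the 2³ generator states (the 100 kw unit G and the 50 kw units A and B, each up with probability r):

```python
from itertools import product

def system_reliability(r, required):
    """Probability that the available generating capacity meets the load."""
    total = 0.0
    for g, a, b in product([0, 1], repeat=3):     # 1 = generator up
        cap = 100 * g + 50 * a + 50 * b
        if cap >= required:
            total += ((r if g else 1 - r) *
                      (r if a else 1 - r) *
                      (r if b else 1 - r))
    return total

r = 0.9
print(abs(system_reliability(r, 100) - (r + r**2 - r**3)) < 1e-9)   # True: R1
print(abs(system_reliability(r, 150) - r * (2 * r - r**2)) < 1e-9)  # True: R2
```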
***
3.5.1 Redundancy at Component Level
The pertinent question here is, at what level should the components be
duplicated, i.e., at component level, subsystem level or system level? We
will explain this with the help of an example. Consider the two
configurations as given in Fig 3.6.
[Fig 3.6 contrasts configuration (a), duplication at the subsystem level, with configuration (b), duplication at the component level.]
Fig 3.6: Redundancy at Component Level
Let the reliability of each component be r. The reliability of the system (Rs) in
the case of configuration 3.6(a) can be expressed as Rs = rⁿ(2 - rⁿ), while
that of configuration 3.6(b) is Rs' = rⁿ(2 - r)ⁿ. Hence,

Rs'/Rs = (2 - r)ⁿ/(2 - rⁿ)

It can be shown that the ratio Rs' : Rs is greater than unity for r < 1. Hence,
the configuration 3.6(b) would always provide higher reliability. Thus, as a
generalisation, it can be said that components duplicated in the system
at the component level give higher system reliability than if duplicated at the
subsystem level (here each set is considered as a subsystem). In general, it
should be borne in mind that redundancy should be provided at the
component level unless there are some overriding reasons or
constraints from the design point of view.
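The comparison can be verified numerically, under the stated assumption of n identical components of reliability r arranged as in Fig 3.6:

```python
def subsystem_level(r, n):
    """Config 3.6(a): two series chains of n components, in parallel."""
    return 1 - (1 - r**n) ** 2      # = r**n * (2 - r**n)

def component_level(r, n):
    """Config 3.6(b): n parallel pairs of components, in series."""
    return (1 - (1 - r) ** 2) ** n  # = r**n * (2 - r)**n

r, n = 0.9, 4
Rs, Rs_prime = subsystem_level(r, n), component_level(r, n)
print(round(Rs, 4), round(Rs_prime, 4))   # 0.8817 0.9606
print(Rs_prime > Rs)                      # True for 0 < r < 1
```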
R = Σ mCi pⁱ (1 - p)^(m-i), i = k, ..., m (3.34)

and

ms = (1/λ) Σ (1/i), i = k, ..., m (3.36)
If the components are not identical but have different reliabilities, the
calculations become more complicated.
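For identical, independent components, equation (3.34) is a direct binomial sum; a sketch:

```python
from math import comb

def k_out_of_m(k, m, p):
    """Eq (3.34): probability that at least k of m components work."""
    return sum(comb(m, i) * p**i * (1 - p) ** (m - i) for i in range(k, m + 1))

p = 0.9
print(round(k_out_of_m(2, 3, p), 4))   # 0.972 (2-out-of-3)
print(round(k_out_of_m(3, 3, p), 4))   # 0.729, the series case p**3
print(round(k_out_of_m(1, 3, p), 4))   # 0.999, the parallel case 1-(1-p)**3
```

Setting k = m recovers the series block diagram and k = 1 the parallel one, as noted under Fig 3.3.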
Example 3.7
Solution
Qs = Pr{j ≥ k} = Σ mCj (ps)ʲ (1 - ps)^(m-j), j = k, ..., m (3.51)

Again using the rare-event approximation that ps << 1, we may approximate
this expression by

Qs ≈ mCk (ps)ᵏ (3.52)
From Eqs.(3.50) and (3.52) the trade-off between fail-to-danger and spurious
operation is seen. The fail-safe unreliability is decreased by increasing k and
the fail-to-danger unreliability is decreased by increasing m-k.
In this expression the term exp(-λt)·1 represents the probability that no
failure will occur, the term exp(-λt)·(λt) represents the probability that
exactly one failure will occur, exp(-λt)(λt)²/2! represents the probability that
exactly two failures will occur, etc. Therefore, the probability that two or one
or no failures will occur, i.e. the probability that not more than two failures
will occur, equals:
ms = ∫₀^∞ Rs dt = 1/λ + 1/λ = 2/λ (3.54)

For a stand-by system of three units which have the same failure rate,
where one unit is operating and the other two are standing by to take over the
operation in succession, ms = 3/λ. In general, for n stand-by units backing
one operating unit,

ms = (n + 1)/λ (3.58)
The stand-by arrangements are somewhat more reliable than parallel operating
units and have a considerably longer mean time between
failures. However, these advantages are easily lost when the reliability of
the sensing-switching device Rss is less than 100 percent, which is more often
the case. Taking this into consideration, and when the circuits are
arranged so that the reliability of the operating unit is not affected by the
unreliability of the sensing-switching device, we obtain for a system in
which one stand-by unit is backing up one operating unit:
It is the exception rather than the rule that the failure rates of the stand-by
units are equal to those of the operating unit. For instance, a hydraulic
actuator will be backed up by an electrical actuator, and there may be even
a third stand-by unit, pneumatic or mechanical. In such cases, the failure
rates of the stand-by units will not be equal and the formulae which we
derived above will no longer apply.
1. A succeeds up to time t, or
2. A fails at time t1 < t and B operates from t1 to t.
The first term of this equation represents the probability that element A
will succeed until time t. The second term, excluding the outside integral, is
the density function for A failing exactly at t1 and B succeeding for the
remaining (t - t1) hours. Since t1 can range from 0 to t, t1 is integrated over
that range.
For the exponential case, where the element failure rates are λa and λb,

R(t) = exp(-λa t) + [λa/(λb - λa)][exp(-λa t) - exp(-λb t)]

and ms = 1/λa + 1/λb (3.62)
It can be shown that it does not matter whether the more reliable element
Example 3.9
Solution
The reader may observe the appreciable decrease in the values of reliability
and MTBF caused by the imperfect nature of the sensing and switching-over
device.
***
3.8.1 Types of Standby Redundancy
1. Cold Standby
2. Tepid Standby
In this case, the value of the standby component changes progressively. For
example, components having rubber parts deteriorate over time and
ultimately affect the reliability of standby component.
3. Hot Standby
The standby component in this case, fails without being operated because of
a limited shelf life. For example, batteries will fail even in standby due to
some chemical reactions.
4. Sliding Standby
[Figure: sliding standby arrangement.]
It may be noted that sliding standby components may have more than one
component in standby depending upon the reliability requirement.
In this case, an Automatic Fault Locator (AFL) is provided with the main
system which accomplishes the function of locating the faulty component,
disconnecting it and connecting the standby component. AFL's are generally
provided in automatic and highly complex systems. The sliding standby
redundancy having AFL is shown in Fig 3.10.
[Fig 3.10: Sliding standby redundancy with an Automatic Fault Locator (AFL).]
8.1 INTRODUCTION
From time to time, statistics are generated which emphasize the costliness
of maintenance actions. While estimates of actual costs vary, they
invariably reflect the immensity of maintenance expenditures. According to
one source, approximately 800,000 military and civilian technicians in U.S.A.
are directly concerned with maintenance. Another source states that for a
sample of four equipments in each of three classes (radar, communication,
and navigation) the yearly support cost is 0.6, 12, and 6 times, respectively,
the cost of the original equipment. Such figures clearly indicate the need
for continually improved maintenance techniques.
and therefore,

Pr(T ≤ t) = ∫₀ᵗ μ exp(-μt) dt = 1 - exp(-μt) (8.5)
[Figure: the maintainability function M(t) = 1 - exp(-μt) plotted against time, approaching 1 - 1/e at t = 1/μ.]
The expected value of repair-time is called the mean time to repair (MTTR)
and is given by

MTTR = ∫₀^∞ t g(t) dt = ∫₀^∞ μ t exp(-μt) dt = 1/μ (8.7)
[Markov graph: state 0 (up) and state 1 (down); the transition from state 0 to 1 has gain λΔt, from state 1 to 0 has gain μΔt, with self-loops 1 - λΔt and 1 - μΔt.]
State 0 denotes that no failure has occurred and state 1 denotes that one
failure has occurred (i.e. the component is down). If the component has not
failed at time t, then the probability that it will fail in the time
interval (t, t + Δt) is equal to λΔt. On the other hand, if the component is in
state 1 (failed state), then the probability that the component will return to
state 0 is equal to μΔt.
From the Markov graph, it can be seen that the probability that the
component will be in state 0 at time t + Δt is

P0(t + Δt) = (1 - λΔt) P0(t) + μΔt P1(t)

which leads, in the limit Δt → 0, to

dP0(t)/dt = -λ P0(t) + μ P1(t) (8.10a)

dP1(t)/dt = λ P0(t) - μ P1(t) (8.10b)

At time t = 0, P0(0) = 1 and P1(0) = 0, giving

P0(t) = μ/(λ + μ) + [λ/(λ + μ)] exp[-(λ + μ)t] (8.11a)

P1(t) = λ/(λ + μ) - [λ/(λ + μ)] exp[-(λ + μ)t] (8.11b)
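Equations (8.11) can be checked by integrating the two-state model numerically; the λ and μ values below are illustrative assumptions:

```python
import math

lam, mu = 0.01, 0.1     # failure and repair rates per hour (assumed)
dt, T = 0.01, 500.0
p0, p1 = 1.0, 0.0       # start in the up state: P0(0) = 1

for _ in range(int(T / dt)):
    p0, p1 = (p0 + dt * (-lam * p0 + mu * p1),
              p1 + dt * (lam * p0 - mu * p1))

closed_form = mu / (lam + mu) + (lam / (lam + mu)) * math.exp(-(lam + mu) * T)
print(round(p0, 4), round(closed_form, 4))   # 0.9091 0.9091
```

Both converge to the steady-state availability μ/(λ + μ), anticipating equation (8.14).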
[Fig 8.6: (a) availability of the unit versus normalized time; (b) average history of the output of the unit, alternating between up-time U and down-time D over the cycle time Tc.]
A = (1/λ) / (1/λ + 1/μ) (8.14)
Here, 1/λ is the mean time between failures (MTBF). It may be noted that
this has been defined as the mean time to failure (MTTF) in the case of non-
repairable components. 1/μ is the mean repair time or mean time to repair
(MTTR). Fig 8.6(b) characterizes the expected or mean behaviour of the
component. U represents the mean up-time (MTBF) and D represents the
mean down-time (MTTR). Tc is known as the cycle time. Here,

U = 1/λ
D = 1/μ
The steady-state availability is a number greater than zero and less than
one. It is equal to zero when no repair is performed (μ = 0) and equal to one
when the equipment does not fail (λ = 0). Normally, 1/μ is much smaller than
1/λ, and therefore the availability can be approximated as

A ≈ 1 - λ/μ
The number of failures per unit time is called the frequency of failures.
This is given by

f = 1/(U + D) = 1/Tc

The availability, transition rates (λ and μ) and mean cycle time can be
related as follows:

A = U/(U + D) = fU = f/λ        (8.18)
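These relations can be wrapped in small helpers; a sketch in which the function names are assumptions, applied to the up-time and down-time figures that appear in Example 8.1:

```python
def availability(mtbf, mttr):
    """Steady-state availability A = U/(U + D), Eqs. (8.14) and (8.18)."""
    return mtbf / (mtbf + mttr)

def failure_frequency(mtbf, mttr):
    """Failures per unit time, f = 1/(U + D) = 1/Tc."""
    return 1.0 / (mtbf + mttr)

# 500 hours of up-time for every 55 hours of down-time
print(round(availability(500, 55), 2))       # 0.9
print(round(failure_frequency(500, 55), 5))  # 0.0018 failures per hour
```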
Example 8.1
Solution
Availability = 500/(500 + 55) = 500/555 = 0.90
The automobile would be available 90% of the time.
***
Example 8.2
Solution
R(t) = exp(-λt)
Therefore, the required value works out to 0.98.
(Markov graph for the two-unit system, with self-transition probabilities 1 - λΔt and 1 - μΔt.)
The following set of differential equations can be obtained from the state-
probability equations:

dP0(t)/dt = -2λP0(t) + μP1(t)

dP1(t)/dt = 2λP0(t) - (λ + μ)P1(t)        (8.21)
R(t) = [s1 exp(s2t) - s2 exp(s1t)] / (s1 - s2)        (8.22)

where s1 and s2 are the roots of s² + (3λ + μ)s + 2λ² = 0.
The mean time to first system failure (MTFF) is another system parameter useful
for the analysis of system effectiveness when repairs are performed. This
parameter is often referred to as the mean time between failures (MTBF) as
the system states alternate between good and bad continuously due to
repair.
MTFF = ∫₀^∞ R(t) dt = ∫₀^∞ [s1 exp(s2t) - s2 exp(s1t)] / (s1 - s2) dt
     = -(s1 + s2)/(s1s2) = (3λ + μ)/(2λ²)        (8.24)
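Using the closed form (3λ + μ)/(2λ²), which is consistent with the μ = 0 limit quoted next, a small sketch (the function name and rate values are illustrative):

```python
def mtff_two_unit_parallel(lam, mu):
    """MTFF = (3*lam + mu) / (2*lam**2) for a maintained two-unit
    parallel system, Eq. (8.24)."""
    return (3 * lam + mu) / (2 * lam ** 2)

lam = 0.01  # illustrative failure rate (per hour)
print(round(mtff_two_unit_parallel(lam, 0.0), 6))  # 150.0 = 3/(2*lam), no repair
print(mtff_two_unit_parallel(lam, 0.5) > mtff_two_unit_parallel(lam, 0.0))  # True
```

As expected, allowing repair (μ > 0) lengthens the mean time to first system failure.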
For μ = 0, we get MTFF = 3/(2λ), which is the mean time to failure of a two-
unit non-maintained parallel system. Similar expressions can be obtained for a
standby two-unit system.
(Markov graph for the two-unit parallel system; solving the steady-state equations gives Eq. (8.28).)

Therefore,

A(∞) = 1 - λ²/(λ² + 2λμ + μ²) = 1 - [λ/(λ + μ)]²        (8.29)
Example 8.3
time per fault of 20 hours. What is the mean availability of the system?
Solution
Hence,
A1 = [μ1/(μ1 + λ1)] = [0.02/(0.02 + 9×10⁻⁴)] = 0.9569
λ2 = 15×10⁻⁴/hr
Hence,
A2 = [μ2/(μ2 + λ2)] = [0.05/(0.05 + 15×10⁻⁴)] = 0.9709
Hence, the system availability for two transmitters in parallel is given by:
A = 1 - (1 - A1)(1 - A2)
  = 1 - (1 - 0.9569)(1 - 0.9709)
  = 1 - 0.0431 × 0.0291 = 0.9987
***
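The arithmetic of Example 8.3 can be reproduced with two small helpers (the function names are illustrative):

```python
def unit_availability(lam, mu):
    """Steady-state availability of one unit: A = mu/(mu + lam)."""
    return mu / (mu + lam)

def parallel_availability(avails):
    """A parallel group is down only when every unit is down:
    A = 1 - product of the individual unavailabilities."""
    q = 1.0
    for a in avails:
        q *= 1.0 - a
    return 1.0 - q

a1 = unit_availability(9e-4, 0.02)   # transmitter 1
a2 = unit_availability(15e-4, 0.05)  # transmitter 2
print(round(a1, 4), round(a2, 4))                 # 0.9569 0.9709
print(round(parallel_availability([a1, a2]), 4))  # 0.9987
```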
8.7 PREVENTIVE MAINTENANCE
12.1 INTRODUCTION
Reliability costs can be divided into five categories as shown in fig. 12.1.
Economics of Reliability Engineering 273
Classification I

This classification includes all those costs associated with internal failures:
the costs of materials, components, products, and other items which do not
satisfy quality requirements. Furthermore, these are costs which occur before
the delivery of the product to the buyer. These costs are associated with
items such as the following:
1. Scrap
2. Failure analysis studies
3. Testing
4. In-house components and materials failures
5. Corrective measures
Classification II
1. Evaluating suppliers
2. Calibrating and certifying inspection and test devices and
instruments.
3. Receiving inspection
4. Reviewing designs
5. Training personnel
6. Collecting quality-related data
7. Coordinating plans and programs
8. Implementing and maintaining sampling plans
9. Preparing reliability demonstration plans
Classification III
Classification IV
Classification V
This category includes costs associated with detection and appraisal. The
principal components of such costs are as follows:
1. Cost of testing
2. Cost of inspection (i.e., in-process, source, receiving, shipping,
and so on)
3. Cost of auditing
products will increase reliability design costs and internal failure costs.
However, after some time internal failure costs will start decreasing.
External costs such as transportation do not depend on reliability, but
installation, commissioning, and maintenance costs will decline with an
increase in reliability.
(Figure: cost versus reliability, showing the total cost, failure cost, manufacturing cost, and operating cost curves.)
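The qualitative trade-off described above can be sketched with a toy cost model; every functional form and constant below is an assumption for illustration, not taken from the text. Failure cost falls with reliability while manufacturing cost rises, so total cost has an interior minimum:

```python
def total_cost(r, k_fail=100.0, k_mfg=10.0):
    """Toy model: failure cost falls linearly with reliability r, while
    manufacturing cost grows as r approaches 1. Constants are illustrative."""
    assert 0.0 < r < 1.0
    return k_fail * (1.0 - r) + k_mfg / (1.0 - r)

# Scan a grid of reliabilities to locate the minimum-total-cost point.
best_cost, best_r = min((total_cost(k / 1000), k / 1000) for k in range(1, 1000))
print(round(best_r, 3))  # 0.684 -> an interior optimum, as the curve suggests
```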
13.1 INTRODUCTION
The managing of reliability and quality control areas under the impact of
today's organized world competition is a highly complex and challenging
task. Management's reliability and quality control ingenuity in surmounting
the technological developments required for plant equipment, process
controls, and manufactured hardware requires a close working relationship
between all producer- and user-organization elements concerned.
The techniques and applications of reliability and quality control are rapidly
advancing and changing on an international basis. Industry views the use
of higher performance and reliability standards as scientific management
tools for securing major advantage over their competition. The application of
these modern sciences to military equipment, space systems, and
commercial products offers both challenge and opportunity to those
responsible for organization effectiveness. The use of intensified reliability
and quality programs as a means to improving product designs, proving
hardware capability, and reducing costs offers far reaching opportunity for
innovations in organization and methods.
1. Maximize output,
2. Optimize reliability,
3. Minimize waste,
4. Maximize customer satisfaction and reputation,
5. Optimize job satisfaction, and
6. Minimize discontent.
All concerned should participate in deciding specific objectives and agree on
the ways and means of achieving them. The management-by-objectives approach
places greater emphasis on the importance of the basic decisions made
during the design and development cycle in terms of reliability and how well
the product satisfies the needs for which it is intended.
1. Clearly understandable,
2. Unambiguous, and
3. Realistic in terms of resources available.
6. Maintenance policy:
Management must provide the controls needed to assure that all quality
attributes affecting reliability, maintainability, safety, and cost comply with
commitments and satisfy the customer's requirements. Tersely stated,
management must have well-planned policies, effective program planning,
timely scheduling, and technical training. Management must clearly state and
support its objectives and policies for accomplishing the product quality and
reliability and assign responsibility for accomplishment to appropriate
functions throughout the organization.
(Organization chart: President or Plant General Manager at the top.)
Management must recognize and choose the type of persons that are needed
to fill the key positions in the reliability and quality control organization.
Management must know that these selected people will be able to work
closely with and motivate others to accomplish their respective tasks.
Top management philosophy establishes the climate for employee
motivation throughout the enterprise.
Responsibility for costs within the reliability and quality control organizations
can be most effectively accomplished when specific, capable individuals are
charged with coordinating all matters relating to cost analysis and budget
control. However, the assignment of coordination responsibility to these
individuals must not be allowed to detract from the duty of each member of
the reliability and quality control organization to maintain a high level of cost
effectiveness.
The cost control function within the reliability and quality control
organization is most frequently located within the Quality Control
Administrative Group, the Quality Control Systems Group, or the Quality
Control Engineering Group. Regardless of which group is given the
responsibility, the director of reliability and quality control and his
department managers must maintain very close and continuing
communications with the responsible individuals. Timely analysis of trends
should be provided, along with frequent decisions and guidance.
The reliability and quality control management team has value to the total
organization that is related directly to its favourable impact on product
The abrupt deemphasis of cost-plus-fixed-fee military contracting has
focused attention upon the incentive contract as a means for assuring
effective management interest in achieving product reliability and
maintenance commitments. With this medium, a specified scale of incentive,
and sometimes penalty, is applied as a factor in the total contract price.
Penalty scales are usually applied at lower rates than incentive scales and
may be omitted in competitive fixed-price contracts.
Every product merits an analysis of the total tasks to be performed within the
allowed costs. The estimation of costs for every function must be quite
close to the final actual costs of the specific function if effective results are
to be achieved. It is apparent that the general readjustment (usually arbitrary
cuts) of budgetary estimates by top management will be in those areas
where the departmental estimates and accounting reports of past
performance on similar programs are in obvious disagreement.
Cost estimation of the equipment and facilities required for standards and
calibration, process control, inspection and test is another essential task
for reliability and quality control engineers. Applicable staff and line
personnel should be given the opportunity to take part in the planning of all
equipment and facilities expansion, retirement, or replacement.
To control cost in the quality and reliability programs, careful long range
planning must be exercised by management. This planning must be
accomplished by those to whom top management has delegated the
responsibility and who will be held accountable for the implementation of the
plans. The controlling of these long range plans at the time of
implementation is one of the basic principles of cost control.
objectives of the consumer and the company. At the top management level,
the matrix technique is useful in determining the organisation structure
based upon the responsibilities delegated to each department and as a
basis for penetrating new market areas. In all cases, the effectiveness
of the management process is directly related to profitability through
consumer assurance that product performance and quality are maximized
within the negotiated cost structure.
A study of programs determines the need for an operational analysis, since the
interface relations between the sections for each contract would have to be
established during the proposal stage. Each new program is placed in the
organization after a decision has been made as to the need for establishing
it as a project. Several factors are considered and the methodology of
decision theory is applied. The following factors are considered the most
heavily weighted:
1. Customer Requirement
2. Special Requirements
3. Schedule
6. Manpower Availability
The program requirements for specialized manpower are such that this
factor is considered. This objective is not heavily weighted since it is
related to attainment of other objectives.
The management function then utilizes this tool for planning and action in
the performance of its activities. The organization matrix provides
management with a mechanism for expeditious action and efficient
departmental control, commensurate with the company's products and
philosophies.