1 Fault Classification (ebook)
               chp03               1
What are computer faults? Before answering,
we need to conceptualise faults in computer
systems.
• To better understand fault tolerance in a
  computer system, we need the fundamental
  concepts of computer faults, errors and failures
   – Analogous to studying how electric current is generated in terms of the flow of
      electrons
• Here we need two models: the physical-logical-effective
  machines model and the fault -> error -> failure model.
     Fault-Error-Failure Three-Machine Model
• Physical Machine
  – system’s physical parts and software code
• Logical Machine
  – system’s logical part at the circuitry/gate level,
    where programs are executed and data manipulated
• Effective Machine
  – system’s universe, in which the external effect of
    an error is exhibited
     Faults -> Errors -> Failures
• A fault: that part of the system’s physical state or
  software code which malfunctions (per the system
  spec), leading to a logical error in the system (this may
  not be immediate, i.e. a latent fault)
• An error: that part of the system’s logical state
  which is liable to lead to a failure (this may not be
  immediate, i.e. a latent error)
• A system failure: that state of delivered service not
  complying with the specified service
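
A minimal sketch of the fault -> error -> failure chain, including the latent cases. It assumes a one-bit memory cell with a stuck-at fault as the physical fault; the names StuckAtCell and classify, and the stuck-at example itself, are illustrative assumptions and not part of the lecture material.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StuckAtCell:
    """One-bit memory cell; `stuck_at` models a physical fault (assumed example)."""
    stuck_at: Optional[int] = None  # None means the cell is fault-free
    value: int = 0

    def write(self, bit: int) -> None:
        # A stuck-at fault overrides whatever value is written.
        self.value = bit if self.stuck_at is None else self.stuck_at


def classify(cell: StuckAtCell, written: int, output_used: bool) -> str:
    """Return the furthest stage reached in the fault -> error -> failure chain."""
    cell.write(written)
    if cell.stuck_at is None:
        return "no fault"
    if cell.value == written:
        return "latent fault"   # fault present, logical state still correct
    if not output_used:
        return "latent error"   # logical state wrong, service not yet affected
    return "failure"            # delivered service deviates from the spec


print(classify(StuckAtCell(stuck_at=0), written=0, output_used=True))
print(classify(StuckAtCell(stuck_at=0), written=1, output_used=False))
print(classify(StuckAtCell(stuck_at=0), written=1, output_used=True))
```

In this sketch the same physical fault stays latent until a 1 is written, and the resulting error stays latent until the stored bit is actually used in the delivered service.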
                     Pondering:
• We know a computer failure is caused by a computer error
  that is in turn caused by a computer fault. But what are the
  possible causes of a computer fault? See next slide…
   What are the causes/origin of
        computer faults?
• specification mistakes
   – incorrect algorithms, incorrectly specified requirements (timing, power,
     environmental)
• implementation mistakes
   – poor design, software coding mistakes
• component defects
   – manufacturing imperfections, random device defects
   – component wear-out
• external factors
   – radiation, lightning, operator mistakes
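
The four origins above can be written down as a small lookup, sketched here under the assumption that each observed cause is tagged with a short text label. The names FaultOrigin, EXAMPLES and origin_of are hypothetical; the example causes are the ones listed on this slide.

```python
from enum import Enum


class FaultOrigin(Enum):
    SPECIFICATION = "specification mistake"
    IMPLEMENTATION = "implementation mistake"
    COMPONENT = "component defect"
    EXTERNAL = "external factor"


# Example causes from the slide; the text labels used as keys are assumptions.
EXAMPLES = {
    "incorrect algorithm": FaultOrigin.SPECIFICATION,
    "incorrectly specified requirement": FaultOrigin.SPECIFICATION,
    "poor design": FaultOrigin.IMPLEMENTATION,
    "software coding mistake": FaultOrigin.IMPLEMENTATION,
    "manufacturing imperfection": FaultOrigin.COMPONENT,
    "random device defect": FaultOrigin.COMPONENT,
    "component wear-out": FaultOrigin.COMPONENT,
    "radiation": FaultOrigin.EXTERNAL,
    "lightning": FaultOrigin.EXTERNAL,
    "operator mistake": FaultOrigin.EXTERNAL,
}


def origin_of(cause: str) -> FaultOrigin:
    """Look up the origin category for a known example cause (case-insensitive)."""
    return EXAMPLES[cause.strip().lower()]


print(origin_of("Lightning").value)
print(origin_of("poor design").value)
```

A real fault log would need richer labels than these; the table only mirrors the slide's own examples.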
How could we classify faults
 according to their temporal
  nature? … next lecture…