Memory Testing
Memory Cells Per Chip
                            1
                 Test Time in Seconds
 (Memory Size n Bits, Memory Cycle Time 60ns)
Size            Number of Test Algorithm Operations
  n         n       n X log2n      n3/2           n2
  1 Mb     0.06         1.26       64.5         18.3 hr
  4 Mb     0.25         5.54      515.4        293.2 hr
 16 Mb     1.01        24.16      1.2 hr      4691.3 hr
 64 Mb     4.03        104.7      9.2 hr      75060.0 hr
256 Mb    16.11        451.0     73.3 hr     1200959.9 hr
  1 Gb    64.43       1932.8    586.4 hr    19215358.4 hr
  2 Gb    128.9       3994.4    1658.6 hr   76861433.7 hr
                       Fault Types
       – Permanent
          • System is broken and stays broken the
            same way indefinitely
       – Transient
          • Fault temporarily affects the system
            behavior, and then the system reverts to the
            good machine
          • Time dependency, caused by environmental
            condition
       – Intermittent
          • Sometimes causes a failure, sometimes
            does not
                                                       4
                                                            2
               Failure Mechanisms
• Permanent faults:
  – Missing/Added Electrical Connection
  – Broken Component (IC mask defect or silicon-
    to-metal connection)
  – Burnt-out Chip Wire
  – Corroded connection between chip & package
  – Chip logic error (Pentium division bug)
• Transient Faults:
  –   Cosmic Ray
  –   An a particle (ionized Helium atom)
  –   Air pollution (causes wire short/open)
  –   Humidity (temporary short)
  –   Temperature (temporary logic error)
  –   Pressure (temporary wire open/short)
  –   Vibration (temporary wire open)
  –   Power Supply Fluctuation (logic error)
  –   Electromagnetic Interference (coupling)
  –   Static Electrical Discharge (change state)
  –   Ground Loop (misinterpreted logic value)
                                                       3
• Intermittent Faults:
  – Loose Connections
  – Aging Components (changed logic delays)
  – Hazards and Races in critical timing paths (bad
    design)
  – Resistor, Capacitor, Inductor variances (timing
    faults)
  – Physical Irregularities (narrow wire -- high
    resistance)
  – Electrical Noise (memory state changes)
      Physical Failure Mechanisms
• Corrosion
• Electromigration
• Bonding Deterioration
  – Au package wires interdiffuse with Al chip pads
• Ionic Contamination
  – Na+ diffuses through package and into FET gate oxide
• Alloying
  – Al migrates from metal layers into Si substrate
• Radiation and Cosmic Rays
  – 8 MeV, collides with Si lattice, generates n-p pairs, causes
    soft memory error
                                                                   4
               Memory Test Levels
Chip,
Array, &
Board
              March Test Notation
    r -- Read a memory location
    w -- Write a memory location
    r0 -- Read a 0 from a memory location
    r1 -- Read a 1 from a memory location
    w0 -- Write a 0 to a memory location
    w1 -- Write a 1 to a memory location
      -- Write a 1 to a cell containing 0
      -- Write a 0 to a cell containing 1
                                            10
                                                 5
    -- Complement the cell contents
   -- Increasing memory addressing
   -- Decreasing memory addressing
   -- Either increasing or decreasing
                                                                11
                 MATS+ March Test
M0: { March element            (w0) }
   for cell := 0 to n - 1 (or any other order) do
      write 0 to A [cell];
                                              O(n) complexity
M1: { March element (r0, w1) }
                                              NPSFs cannot be detected
   for cell := 0 to n - 1 do
      read A [cell]; { Expected value = 0}
      write 1 to A [cell];
M2: {March element (r1, w0) }
   for cell := n – 1 down to 0 do
       read A [cell]; { Expected value = 1 }
       write 0 to A [cell];
                                                                12
                                                                         6
                  Fault Modeling
• Behavioral (black-box) Model
   – State machine modeling all memory content combinations
   – Intractable
• Functional (gray-box) Model
   – Frequently used
• Logic Gate Model
   – Not used
   – Inadequately models transistors & capacitors
• Electrical Model
   – Very expensive
• Geometrical Model
   – Layout Model
   – Used with Inductive Fault Analysis
                                                        13
         Detailed Functional Model
                                                        14
                                                              7
       Simplified Functional Model
                                               15
            Functional Faults
        Fault
SAF     Stuck-at fault
TF      Transition fault
CF      Coupling fault
NPSF    Neighborhood Pattern Sensitive fault
                                               16
                                                    8
                Stuck-at Faults
• Manifestation:
  – The logic value of a cell or line is always at 0 or 1.
• Necessary condition for detection:
  – For each cell, a 0 and a 1 must be read.
                                                      17
          Stuck-at Faults (contd.)
                                                      18
                                                             9
             Transition Faults
• It is a special case of stuck-at fault, in
  which a cell fails to make a 0  1, or a 1 
  0 transition.
• Up transition fault:
  – Denoted as: < ↑ / 0 >
• Down transition fault:
  – Denoted as: < ↓ / 1 >
                                               19
        Transition Faults (contd.)
• Necessary condition to detect and locate all
  transition faults:
  – Each cell must undergo a ↑ transition and a ↓
    transition, and be read after each, before
    undergoing any further transitions.
                                               20
                                                    10
        Transition Faults (contd.)
            Up transition fault < ↑ / 0 >
                                                   21
              Coupling Faults
• Coupling Fault (CF):
  – Transition in memory bit j causes unwanted
    change in memory bit i.
• 2-Coupling Fault:
  – Involves 2 cells; special case of k-coupling fault.
  – Must restrict k to make the fault model practical.
     • For example, NPSF (to be discussed later).
• Inversion and Idempotent CF’s
  – Special cases of 2-coupling faults.
                                                   22
                                                          11
    Inversion Coupling Faults (CFin)
• A ↑ or ↓ transition in cell j inverts the
  contents of cell i.
  – Cell i is said to be coupled to cell j.
• Two possible CFin’s are:
  – < ↑ ;  > and < ↓ ; >
• Necessary condition for detection:
  – For all cells that are coupled, each should be read
    after a series of possible CFin’s may have occurred
    (due to writing into the coupling cells), and the
    number of coupled cell transitions must be odd (to
    prevent the CFin’s from masking each other).
                                                  23
                  CFin (contd.)
• Theorem:
  – Not all linked CFin’s are detected by March
    tests.
                                                  24
                                                          12
CFin: Good Machine State
                             25
CFin: Faulty Machine State
                             26
                                  13
  Idempotent Coupling Faults (CFid)
• A ↑ or ↓ transition in cell j sets cell i to 0 or 1.
                 ↑; 0>, <↑
   – Denoted as <↑       ↑;1>, <↓
                                ↓;0>, <↓
                                       ↓;1>
• Necessary condition for detection:
   – For all cells that are coupled, each should be read
     after a series of possible CFid’s may have occurred
     (due to writing into the coupling cells), such that
     the sensitized CFid’s do not mask each other.
• Asymmetric CFid:
   – Coupled cell only does ↑ or ↓.
• SymmetricCFid:
   – Coupled cell does both due to fault.
                                                  27
        CFid: Faulty Machine State
                                                  28
                                                           14
   Dynamic Coupling Faults (CFdyn)
• Read or write in cell of 1 word forces cell
  in a different word to 0 or 1.
• <r0 | w0 ; 0>, <r0 | w0 ; 1>,
  < r1 | w1 ; 0>, and <r1 | w1; 1>
  – ‘|’ denotes “OR” of two operations
• More general than CFid, because a CFdyn
  can be sensitized by any read or write
  operation.
                                                            29
                Bridging Faults
• Short circuit between 2 or more cells or lines.
• 0 or 1 state of coupling cell, rather than
  coupling cell transition, causes coupled cell
  change.
• Bidirectional fault -- i affects j, j affects i
• AND Bridging Faults (ABF):
  – < 0,0 / 0,0 >, <0,1 / 0,0 >, <1,0 / 0,0>, <1,1 / 1,1>
• OR Bridging Faults (OBF):
  – < 0,0 / 0,0 >, <0,1 / 1,1 >, <1,0 / 1,1>, <1,1 / 1,1>
                                                            30
                                                                 15
     Address Decoder Faults (ADFs)
• Address decoding error assumptions:
   – Decoder does not become sequential
   – Same behavior during both read & write
• Multiple ADFs must be tested for
• Decoders have CMOS stuck-open faults
                                                   31
•   Theorem:
    – A March test satisfying the following
      conditions detects all address decoder faults:
      a) (rx, ………, wx)
      b) (rx, ………, wx)
      where the dots indicate any number of read or
      write operations.
                                                   32
                                                        16
            Reduced Functional Faults
Fault       Functional fault
SAF     a   Cell stuck
SAF     b   Driver stuck
SAF     c   Read/write line stuck
SAF     d   Chip-select line stuck
SAF     e   Data line stuck
SAF     f   Open circuit in data line
CF      g   Short circuit between data lines
CF      h   Crosstalk between data lines
AF      i   Address line stuck
AF      j   Open circuit in address line
AF      k   Shorts between address lines
AF      l   Open circuit in decoder
AF      m   Wrong address access
AF      n   Multiple simultaneous address access
TF      o   Cell can be set to 0 (1) but not to 1 (0)
NPSF    p   Pattern sensitive cell interaction
                                                            33
Functional RAM Testing with March Tests
 • March Tests can detect AFs -- NPSF
   Tests Cannot
 • Conditions for AF detection:
    – It must read the value x from cell 0, write x’ to
      cell 0; read x from cell 1, write x’ to cell 1; for
      the entire memory.
    – It must read the value x’ from cell n-1, write x
      to cell n-1; read x’ from cell n-2, write x to cell
      n-2; for the entire memory.
 • In the following March tests, addressing
   orders can be interchanged.
                                                            34
                                                                 17
             Irredundant March Tests
 Algorithm    Description
   MATS       { (w0); (r0, w1); (r1) }
  MATS+       { (w0); (r0, w1); (r1, w0) }
  MATS++      { (w0); (r0, w1); (r1, w0, r0) }
 MARCH X      { (w0); (r0, w1); (r1, w0); (r0) }
  MARCH       { (w0); (r0, w1); (r1, w0);
    C--          (r0, w1); (r1, w0); (r0) }
 MARCH A      { (w0); (r0, w1, w0, w1); (r1, w0, w1);
                 (r1, w0, w1, w0); (r0, w1, w0) }
 MARCH Y      { (w0); (r0, w1, r1); (r1, w0, r0); (r0) }
 MARCH B      { (w0); (r0, w1, r1, w0, r0, w1);
                   (r1, w0, w1); (r1, w0, w1, w0);
              (r0, w1, w0) }
                                                      35
      Irredundant March Test Summary
Algorithm    SAF    AF    TF    CF CF CF          Linked
                                in id dyn         Faults
MATS         All   Some
MATS+        All    All
MATS++       All    All   All
MARCH X      All    All   All   All
MARCH C--    All    All   All   All All All
MARCH A      All    All   All   All                Some
MARCH Y      All    All   All   All                Some
MARCH B      All    All   All   All                Some
                                                      36
                                                           18
        March Test Complexity
         Algorithm      Complexity
           MATS             4n
          MATS+             5n
          MATS++            6n
         MARCH X            6n
         MARCH C--         10n
         MARCH A           15n
         MARCH Y            8n
         MARCH B           17n
                                                 37
MATS+ Example:: Cell (2,1) SA0 Fault
  MATS+:
  { M0: (w0); M1:   (r0, w1); M2:   (r1, w0) }
                                                 38
                                                      19
  MATS+ Example:: Cell (2, 1) SA1 Fault
     MATS+:
     { M0: (w0); M1:     (r0, w1); M2:   (r1, w0) }
                                                      39
        MATS+ Example:: Multiple AF
• Cell (2,1) is not addressable
• Address (2,1) maps into (3,1) & vice versa
• Can’t write (2,1), read (2,1) gives random #
      MATS+:
      { M0: (w0); M1:    (r0, w1); M2:   (r1), w0 }
                                                      40
                                                           20
           Pattern Sensitive Faults
• Base Cell – cell under test
• Neighborhood -- Immediate cluster of cells whose
  pattern makes base cell fail
• Deleted Neighborhood – Neighborhood without
  the base cell
• NPSF -- Neighborhood Pattern Sensitive Fault
  (Most General k-coupling fault)
   – ANPSF -- Active Neighborhood Pattern Sensitive Fault
   – PNPSF -- Passive Neighborhood PSF
   – SNPSF -- Static Neighborhood Pattern Sensitive Fault
                                                            41
    Neighborhood Pattern Sensitive
           Coupling Faults
• Cell i’s ability to change influenced by all other
  memory cell contents, which may be a 0/1 pattern
  or a transition pattern.
• Testing assumes read operations are fault free
                                                            42
                                                                 21
                  Type 1 Active NPSF
• Active: Base cell changes when one deleted
  neighborhood cell transitions
• Condition for detection & location: Each base cell
  must be read in state 0 and state 1, for all possible
  deleted neighborhood pattern changes.
• C i,j <d0, d1, d3, d4 ; b>
• C i,j <0, ↓ , 1, 1; 0> and C i,j <0, ↓ , 1, 1;  >
                                                          43
                 Type 2 Active NPSF
 • Used when diagonal couplings are significant,
   and do not necessarily cause horizontal/vertical
   coupling.
                                                          44
                                                               22
                 Passive NPSF
• Passive:
    – A certain neighborhood pattern prevents the
      base cell from changing
• Condition for detection and location:
    – Each base cell must be written and read in
      state 0 and in state 1, for all deleted
      neighborhood pattern changes.
•    ↑/ 0 (↓
           ↓ /1) -- Base cell fault effect
    indicating that base cannot change
                                                    45
                  Static NPSF
• Static:
    – Base cell forced into a particular state when
      deleted neighborhood contains particular pattern.
• Differs from active -- need not have a
  transition to sensitive SNPSF
• Condition for detection and location:
    – Apply all 0 and 1 combinations to k-cell
      neighborhood, and verify that each base cell was
      written.
• Ci,j < 0, 1, 0, 1; - / 0> and
  Ci,j < 0, 1, 0, 1; - / 1>
                                                    46
                                                          23
                      Number of NPSFs
• Active Neighborhood Pattern Sensitive Faults (ANPSF)
   – Base cell 0 and 1
      • ↑ and ↓ transitions in k – 1 cells
          – All 0-1 patterns in k – 2 cells
   – 2 (k – 1) 2 × 2k – 2 = (k – 1) 2k patterns
• Passive Neighborhood Pattern Sensitive Faults (PNPSF)
   – Base cell ↑ and ↓ transition
      • All 0-1 patterns in k – 1 cells
   – 2 × 2k – 1 = 2k patterns
• Total APNPSF patterns = (k – 1) 2k + 2k = k 2k
• Static Neighborhood Patterns (SNP) = 2k
   Sequencing Neighborhood Patterns with
              Minimal Writes
                End    100                    110
                                                          k=4
Start
    000                010                    101         111
    Deleted
 neighborhood          001                    011
    patterns
                             Hamiltonian path for SNPSF
                             Eulerian path for ANPSF
                                                                24
• Euler Path:
  – It is a path in a graph that traverses every edge
    of the graph exactly once.
• Hamiltonian Path:
  – It is a path that starts with some vertex of a
    graph, and traverses every vertex exactly
    once.
                                                            49
           Hamiltonian Path, k = 5
                                       1011
                                                     1111
                                1001
                                              1101
           0011
                         0111          1010
                                                     1110
    0001
                  0101          1000
                                              1100
           0010
                         0110
    0000
                  0100
                                                                 25
           Tiling for Type-1 Neighborhood
1      2    3    0      0   1      2      3      0     0     1        2
0      4    0     1     2   3      4      0      1     2     3        4
0      1    2    3      4   0      1      2    3       4     0        0
2VLSI3 Test:4 Lecture
                 0    1     2     3       4    0       1     2        3
        15alt
4    0      1    2    3     4     0       1      2     3     4        0
1      2    3    4      0   1      2      3    4       0     1        2
      4                           4                    2     3        4
0           0     1     2   3             0      1
                Total Patterns for n Cells
           Fault type           Without tiling       With tiling
    Static neighborhood
    patterns sensitive                n × 2k           n × 2k / k
    faults (SNPSF)
    Active and passive
    neighborhood pattern
                                  n × k × 2k         n × k × 2k / k
    sensitive faults
    (APNPSF)
                k = neighborhood size = 5 or 9
                                                                          26
       Type 1 Tiling Neighborhoods
• Write changes k different neighborhoods.
• Tiling Method: Cover all memory with non-
  overlapping neighborhoods.
                                                  53
             Two Group Method
 • Only for Type-1 neighborhoods.
 • Use checkerboard pattern, cell is simultaneously a
   base cell in group 1, and a deleted neighborhood cell
   in 2.
                                                  54
                                                           27
               NPSF Fault Detection
              and Location Algorithm
1. write base-cells with 0;
2. loop
      apply a pattern; { it could change the base-
         cell from 0 to 1. }
      read base-cell;
   endloop;
3. write base-cells with 1;
4. loop
      apply a pattern; { it could change the base-
         cell from 1 to 0. }
      read base-cell;
   endloop;
                                                         55
  NPSF Testing Algorithm Summary
       •   A: active, P: passive, S: static
       •   D: Detects Faults, L: Locates Faults
                    Fault Fault Coverage            Oper-
  Algorithm         Loca-          NPSF             ation
                    tion? SAF TF A P S             Count
TDANPSF1G             No   L      D                163.5 n
TLAPNPSF1G           Yes   L   L L L L             195.5 n
TLAPNPSF2T           Yes   L   L L L               5122 n
TLAPNPSF1T           Yes   L   L L L                194 n
 TLSNPSF1G           Yes   L           L            43.5 n
 TLSNPSF1T           Yes   L           L            39.2 n
 TLSNPSF2T           Yes   L           L          569.78 n
TDSNPSF1G             No   L           D          36.125 n
                                                         56
                                                              28
    NPSF Testing Algorithms
 Algorithm       Neigh-   Method    k
                  bor-
                  hood
TDANPSF1G        Type-1   2 Group   5
TLAPNPSF1G       Type-1   2 Group   5
TLAPNPSF2T       Type-2    Tiling   9
TLAPNPSF1T       Type-1    Tiling   5
 TLSNPSF1G       Type-1   2 Group   5
 TLSNPSF1T       Type-1    Tiling   5
 TLSNPSF2T       Type-2    Tiling   9
TDSNPSF1G        Type-1   2 Group   5
                                        57
             Fault Hierarchy
                                        58
                                             29
        Memory Testing Summary
• Multiple fault models are essential
• Combination of tests is essential:
  –   March -- SRAM and DRAM
  –   NPSF -- DRAM
  –   DC Parametric -- Both
  –   AC Parametric -- Both
                                        59
                                             30