Mbist Repair

The document discusses memory repair and diagnosis techniques, highlighting the importance of identifying defects in memory systems as technology scales down. It details the processes of diagnosis using BIST circuitry, the classification of memories into repairable and non-repairable, and the methods for mapping failures to specific memory locations. Additionally, it outlines the repair flow, types of repairs (hard and soft), and the use of fuses for permanent repair solutions.

Uploaded by

amenafrt96
Copyright
© © All Rights Reserved

Memory Repair & Diagnosis

Devadathan R
Introduction

• As process nodes shrink and the number of embedded memories keeps growing, the likelihood of defects in
memories increases.
• Repair is one of the most popular techniques for memory yield improvement.
• With respect to the repair mechanism, memories can be classified as repairable and non-repairable.
• To achieve a certain manufacturing yield, in addition to diagnosis support, it is also beneficial to introduce self-repair
features comprising redundant memory cells.
• If the memory BIST circuitry can identify the exact defect locations, work out the repair solution and apply it, the scheme
is called self-repair.

Diagnosis : usually referred to as the BIRA circuitry


Diagnosis analyses the failures reported by the memory BIST controller during test execution to determine whether the memory
is repairable, which memory and which address location failed, the failing-bit map, etc. If the memory is repairable and an error
is detected, it also determines the repair solution for the memory (additional feature).

Repair : usually referred to as the BISR circuitry


The repair stage takes the repair information from the diagnosis stage and, depending on the defect type, performs the repair
on the memory. This results in the faulty locations being switched to redundant locations.
Diagnosis

• Using memory BIST to test embedded memories provides significant advantage over the direct pin access
test methods for PASS/FAIL testing. However, in most cases, it is important to identify the source of physical
failure in the memory. This is referred to as the Diagnosis process.
• StopOnErrorLimit : the memory BIST controller is equipped with an error count register that is scan-initialized
before each run
The following diagnostic levels are available :

• Memory-Only Level
Identifies the failing memory only
Is useful for small memories, where no root cause analysis is required

• Memory Address Level


Identifies the failing address in a memory
Provides limited failing address mapping
Is useful for applications where soft repair based on address mapping is used

• Real Time Bit Line Diagnosis


Identifies the failing column/row in a failing memory
Is useful for offline repair

• Offline Bit-Mapping
Mapping the failure to bits in memory
Useful when detailed root cause analysis is to be performed for yield improvements. Note that in this case, diagnostic time is not relevant

• Online (Real Time) Bit-Mapping


Maps the failure to bits in memory
Provides short diagnostic time; however, it requires significant tester memory and might require specific hardware to be implemented on the chip
Generating an Offline Memory Bit-Map

• Objective :
Generate a bit map of the memory for all failing bits
Detailed root cause analysis

To achieve the Offline Bit-Mapping diagnostics level, you MUST ensure the following:
• The Setup chain is operational
• The BitSlice is set to 1 (full parallel testing), which means that a memory with N data bits requires N comparators. You
achieve this by using the BitSliceWidth property of the ETPlanner configuration file
• (For the compare status approach only.) The CompStat property of the ETPlanner configuration file is set to Yes or
SharedWithGo.
• (For the compare status approach only.) The CMP_STAT/GO port is connected to a chip output. The chip output is
specified using the CompStatPort property of the ETPlanner configuration file.
• (Mandatory for the compare status approach; optional for the Stop-On-Nth-Error approach.) The CompStatMux
property of the ETPlanner configuration file is set to Yes.
• (For the Stop-On-Nth-Error approach only.) The StopOnErrorLimit property of the ETPlanner configuration file is set to a
value sufficiently large to accommodate defects generating a large number of miscompares such as bit line shorts. See the
“Stop-On-Nth-Error Approach” section for a detailed description of the factors to take into consideration when specifying
StopOnErrorLimit.
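
The checklist above can be sketched as a small checker. This is only an illustration: the property names are the ones listed above, but holding them in a Python dict, the approach labels "compare_status" and "stop_on_nth_error", and the function name are assumptions made for this sketch and are not ETPlanner syntax. The "Setup chain is operational" item is not a property and is not checked here.

```python
def check_offline_bitmap_settings(props, approach):
    """Return a list of checklist violations (an empty list means all checks pass).

    props    : dict of property name -> value (hypothetical representation)
    approach : "compare_status" or "stop_on_nth_error" (labels assumed here)
    """
    problems = []

    # Full parallel testing: one comparator per data bit.
    if props.get("BitSliceWidth") != 1:
        problems.append("BitSliceWidth must be 1")

    if approach == "compare_status":
        if props.get("CompStat") not in ("Yes", "SharedWithGo"):
            problems.append("CompStat must be Yes or SharedWithGo")
        if not props.get("CompStatPort"):
            problems.append("CompStatPort must name a chip output")
        if props.get("CompStatMux") != "Yes":
            problems.append("CompStatMux must be Yes")
    elif approach == "stop_on_nth_error":
        # The text only says "sufficiently large", so this sketch checks
        # that a positive limit is present and nothing more.
        limit = props.get("StopOnErrorLimit")
        if not isinstance(limit, int) or limit < 1:
            problems.append("StopOnErrorLimit must be set")
    else:
        raise ValueError("unknown approach: " + repr(approach))

    return problems
```

For example, a dict with BitSliceWidth 1, CompStat "Yes", a non-empty CompStatPort and CompStatMux "Yes" passes the compare status checks and returns an empty list.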
Diagnosis and identification of fault location and bit position
The procedure is as follows :
• Find the memory library of the memory in question (the one in which the failure was detected)
• Read the following from the memory library (example: a Synopsys memory named *_uhd2prf128x64m2b1w)
1. Memory type is RF (Register File) & it is 2 port
2. 128 is the number of locations (depth) & 64 is the data width in bits
3. Repairable or not
4. m2 is column muxing 2
5. b1 is bank 1 ( with multiple banks, the bank bits become part of the address map )
6. Logical address map & PhysicalAddressMap :
Address[5:0] = r[3] r[2] r[1] r[0] c[1] c[0] from LogicalAddressMap

If rowAddress[0]: ~ r[0] in PhysicalAddressMap ( example for simplification )


this means that bit 0 of the physical row address is the inverse of logical row address bit r[0]
• Find the failure details in the log and use items 3, 4, 5 & 6 above to identify the actual failing location

Note: The address mapping in the figure is just an example; it is not the correct address mapping for this specific type of memory
Note: The names of the method and algorithm are client-specific, so they are not given here. Both are the same for a given client,
and you will get to know them once you start working.
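
Items 2, 4 and 5 can be read straight from the memory name. A minimal sketch, assuming the name always ends in the pattern <depth>x<width>m<mux>b<banks> as in the examples in this deck; the function name is hypothetical, other vendors or compilers may use a different naming scheme, and the memory library remains the authoritative source.

```python
import re

# Assumed pattern, inferred from names such as *_uhd2prf128x64m2b1w.
_NAME_RE = re.compile(r"(?P<depth>\d+)x(?P<width>\d+)m(?P<mux>\d+)b(?P<banks>\d+)")


def parse_memory_name(name):
    """Extract depth, data width, column mux and bank count from a memory name."""
    match = _NAME_RE.search(name)
    if match is None:
        raise ValueError("name does not follow the assumed pattern: " + name)
    fields = {key: int(value) for key, value in match.groupdict().items()}
    # Number of address bits needed to reach every location (ceil(log2(depth))).
    fields["address_bits"] = (fields["depth"] - 1).bit_length()
    return fields


info = parse_memory_name("x_uhd2prf128x64m2b1w")
print(info["depth"], info["width"], info["mux"], info["banks"], info["address_bits"])
```

The address_bits field is a quick sanity check against the address map in the library: a depth of 2048 gives 11 bits, which matches an Address[10:0] map.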
Bit mapping from failure log
• During simulation, you want to check whether the diagnosis is working correctly.
• You create a force file and inject an error.
• You want to confirm that the diagnosis process reports the same address and bit position in the memory where you injected the
error

• In force file : force <hier>.mem_core_array[31][17] 0


This forces bit 17 of location 31 permanently to 0. The memory is *hdspram2048x128m4b8w :
1. Single port SRAM
2. 2048 depth & 128 data bit width
3. Repairable
4. column muxing 4
5. bank 8
6. Address mapping(actual from memory library):
ColumnAddress[1:0] : Address[1:0]
BankAddress[2:0] : Address[6:4]
RowAddress[1:0] : Address[3:2]
RowAddress[5:2] : Address[10:7]
So, Logical address [10:0] = { r[5], r[4], r[3], r[2], b[2], b[1], b[0], r[1], r[0], c[1], c[0] }
Physical address [10:0] = { r[5], r[4], r[3], r[2], b[2], b[1], b[0], r[1], r[0], c[1], c[0] }
• Now consider the log report
The report says that the X, Y and Z addresses are failing.
X is the row address, Y is the column address and Z is the bank address.
From log : the failing bits are Z[2], Z[1], Z[0], X[1], X[0] and Y[0], along with GO_ID[3].
{ r[5], r[4], r[3], r[2], b[2], b[1], b[0], r[1], r[0], c[1], c[0] } = { X[5], X[4], X[3], X[2], Z[2], Z[1], Z[0], X[1], X[0], Y[1], Y[0] }

Write out X[5], X[4], X[3], X[2], Z[2], Z[1], Z[0], X[1], X[0], Y[1], Y[0] and put 1 in every position where an error was
reported.
So the address is
X[5], X[4], X[3], X[2], Z[2], Z[1], Z[0], X[1], X[0], Y[1], Y[0] = 00001111101

Remove the column address positions Y[1] & Y[0] and take the remaining bits as the address.
Remaining bits : 000011111
11111 = 31, so the physical / logical address is 31.
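
The address reconstruction above can be sketched in a few lines. This assumes the address map is given MSB first as a list of bit names and that column bits are the ones whose names start with "Y"; the function name and the list representation are illustrative and not part of any tool.

```python
def failing_address(address_map, failing_bits, column_prefix="Y"):
    """Rebuild the failing address from the bit names reported in the log.

    address_map  : bit names, MSB first, e.g. ["X[5]", ..., "Y[0]"]
    failing_bits : set of bit names reported as failing
    Returns (full_bit_string, word_address, column).
    """
    # Put 1 at every reported position and 0 everywhere else.
    full = "".join("1" if name in failing_bits else "0" for name in address_map)
    # Drop the column bits to get the word address; keep them separately.
    word = "".join(b for name, b in zip(address_map, full)
                   if not name.startswith(column_prefix))
    col = "".join(b for name, b in zip(address_map, full)
                  if name.startswith(column_prefix))
    return full, int(word, 2), (int(col, 2) if col else 0)


amap = ["X[5]", "X[4]", "X[3]", "X[2]", "Z[2]", "Z[1]", "Z[0]",
        "X[1]", "X[0]", "Y[1]", "Y[0]"]
log = {"Z[2]", "Z[1]", "Z[0]", "X[1]", "X[0]", "Y[0]"}
print(failing_address(amap, log))  # ('00001111101', 31, 1)
```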
Now the bit position / data bit

Column muxing comes into play here.


The total number of columns is 4, so Y[1] Y[0] can take the 4 values 00, 01, 10, 11.
From the failure log, Y[1]Y[0] = 01 and GO_ID is 3.

The data bits are distributed across the columns as follows :

In column 0 (Y[1]Y[0]=00) : 0,4,8,12,16…


In column 1 (01) : 1,5,9,13,17…
In column 2 (10) : 2,6,10,14,18…
In column 3 (11) : 3,7,11,15,19…

Take column 01, because Y[1]Y[0] = 01, and position 3 within that column (counting from 0), because GO_ID is 3.
Position 3 in the column is bit 13 (1,5,9,13,17). The error was actually forced at bit 17. The difference arises because the first 4 bits are used as spares.
This offset of 4 was common to all repairable memories in this setup.
13+4 = 17th bit

So this test and diagnosis identify address location 31, bit position 17, as faulty, which matches the injected error.
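
The bit-position arithmetic above can be written as one expression: column + column mux × GO_ID + spare offset. A minimal sketch follows; the function names are hypothetical, and the spare offset of 4 is the value observed for this particular memory, not a general rule.

```python
def column_bits(column, column_mux, width):
    """Data bits that land in the given column, e.g. 1, 5, 9, 13, 17 for column 1 of 4."""
    return list(range(column, width, column_mux))


def failing_data_bit(column, go_id, column_mux, spare_offset=0):
    """Map (column, GO_ID) from the failure log to a data bit position.

    spare_offset is the number of leading bits used as spares
    (4 in the example above, 0 for a memory without repair).
    """
    return column + column_mux * go_id + spare_offset


print(column_bits(1, 4, 20))                        # [1, 5, 9, 13, 17]
print(failing_data_bit(1, 3, 4, spare_offset=4))    # 17
```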
One more similar case where there is no repair :
• Memory *_hd2prf24x256m1b2w
• Step4: column mux 1 ( column division is not there )
• Step5: 2 banks
• Step6( with respect to mem lib) :
BankAddress[0] : Address[2]
RowAddress[1:0] : Address[1:0]
RowAddress[3:2] : Address[4:3]
The logical and physical address maps are the same here.
Address[4:0] = r[3], r[2], b[0], r[1], r[0]

Forced at the position [12][8]


The log file reports : X[2] & Z[0]

Convert it to an address = r[3], r[2], b[0], r[1], r[0] = X[3], X[2], Z[0], X[1], X[0] = 01100
There is no column address bit, so nothing needs to be removed. 01100 is 12, which is the address we forced.

As there is no column muxing, all bits sit in a single column, so the bit positions run 0,1,2,3,4,5,6,7,8,9,10…etc
The failure log reports bit position 8. No repair is available here either, so no spare offset applies and the final bit position is 8.

So [12][8] is found to be the faulty location, the same location where we injected the error
Repair flow
Load design

• Loading the design is no different from the normal MBIST insertion flow


• If repairable memories are present,
set_dft_specification_requirements -memory_test on automatically detects them from the memory
libraries and takes care of creating the BISR & BIRA logic.

• UPF/CPF must be specified if there are multiple power domains or if separate BISR chains are needed per power domain.
Otherwise all BISR chains are treated as being in the same, or the always-on (AON), power domain.
• DEF files for placement information
• DftSpecification : RepairOptions
The BIRA placement location should be specified
• The MemoryBISR wrapper needs to be specified
DRC
Generate DFT Spec
• Create the DFT specification using : create_dft_specification
• For Sub-block/Physical-block : the MemoryBisrWrapper is specified

• For Top/ Chip level : more information is needed, such as the fusebox location, ijtag host_node, interface model etc.

• process_dft_specification inserts the BAP, BISR and BIRA IP for a sub-block. At the top level it also creates the TAP & BISR
controller
ICL extraction

• There is no difference from the standard flow.


• Verifies the proper connectivity of the ICL modules that were inserted when the DFT specification was processed.
Patterns specification
• The verification of repairable memories involves extra tasks that are not required for memories without self repair. This is
specified in the pattern spec.

• Sub_block :
The BISR chain must be accessible, and both reading and writing the chain are verified.
For memory BIST patterns (normal patterns) : an extra step is added at the start to clear the BISR registers

>> Patterns(MemoryBisr_BisrChainAccess)
-TestStep(BisrChainAccess_write)
-TestStep(BisrChainAccess_read)
>> Patterns(MemoryBist_P1)
-TestStep(clear_bisr)

• TOP level
The block-level connections between the BIRA & BISR registers, and between the BISR registers & the memory repair ports, are
expected to have been verified already.
So, the top-level TB verifies the connections between the BISR registers, BISR controller, TAP & fusebox.
The top level verifies FuseBox access, BISR chain access & the autonomous modes of operation of the BISR controller via the TAP

>> Patterns(MemoryBisr_TapAccessMode)
-TestStep(FuseBoxProgram)
-TestStep(FuseBoxRead)

>> Patterns(MemoryBisr_AutonomousMode)
-TestStep(BisrLoadReset) //// initialize the BISR chain and calculate its leafs
-TestStep(BisrChainAccess) //// executes a BISR chain access
-TestStep(SelfFuseBoxProgram) //// the BISR controller can rotate the BISR chain, compress its content & write
the content to the fuse box

-TestStep(VerifyFuseBox) //// verifies the compressed contents of the fusebox by comparing them with the initial patterns that
were loaded into the BISR chain

>> Patterns(MemoryBist_P1)
-TestStep(clear_bisr) //// to initialize all the flip-flops in the BISR chain without calculating its leafs
-TestStep(run_time_program) //// specifies that the memory bist controllers in the bist step using the scan inserted prior
Validate patterns / simulation

• Validating the patterns is no different from the standard flow


• When the BISR logic detects a fault, it starts the repair process by itself
• In the waveform you can see the sequence : normal testing process + BISR rotation + normal testing process
Comments

• While the BIST is in normal testing, all JTAG/IJTAG signals communicate with the controller
• Once self-repair starts, only the BISR signals communicate with the controller until the failure is fixed
• In the waveform window, the error signal at the interface level goes to 1 when the fault is detected. It goes
low once the BISR repair starts and remains low if the repair succeeds and the normal test is run again after
the repair. Also check the MBISTPG_GO signal to confirm that the repair completed
successfully.
• If the error still shows after repair, the likely reasons are more errors than available spare
locations, or an inefficient repair
• In normal / repair simulation, look for the signal <block_name>_repair_status at the
interface to check the repair status.
• <block_name>_repair_status[1:0]
00 - No Repair Required
01 - Repair Required
1x - Not Repairable
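
When post-processing simulation results, the 2-bit encoding above can be decoded as follows. This is a small sketch; the function name is hypothetical and the value is assumed to be the sampled <block_name>_repair_status[1:0] as an integer.

```python
def decode_repair_status(status):
    """Decode a sampled repair_status[1:0] value (0..3) into its meaning."""
    if not 0 <= status <= 3:
        raise ValueError("repair_status is a 2-bit value")
    if status & 0b10:          # 1x : the upper bit alone decides
        return "Not Repairable"
    if status & 0b01:          # 01
        return "Repair Required"
    return "No Repair Required"  # 00


for value in range(4):
    print(format(value, "02b"), decode_repair_status(value))
```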
Types of repair
• With respect to programmability/Reconfiguration
Soft repair
Hard repair

• With respect to redundant / spare locations


Row redundancy / Row repair
Column redundancy / Column repair
Combination of both row and column

❖ Repair analysis approach and types


On-chip
Off-chip
Hard repair:

• In this approach, repair instructions are stored permanently within the die through the programming of
fuses.
• The two common fuse types are laser and electrical.
• Laser fuses are programmed by cutting a metal link, while electrical fuses (eFuses) are typically one-time
programmable or flash memory elements and are programmed using an elevated voltage level.
• eFuse usage is growing rapidly as they are generally smaller than laser fuses and they do not require special
equipment or a different test insertion to be programmed.
• For this last reason, eFuses are also associated with the Self-Repair approaches described in a later
section.

Advantage:
Short repair setup time

Disadvantage:
One time repair
Specific technology is required
Soft repair:
• In this approach, repair instructions are stored in volatile memory, typically in scan registers, at each power up of the
device.
• Soft repair has the advantage of being able to address defects that may arise over time as new repair instructions can be
created and stored throughout the life of the device.
• This provides higher long term availability and reliability.
• Because the repair instructions are not permanently stored within the device, they have to be either stored somewhere
external to the device (somewhere in the system) or they have to be generated on-the-fly at power-up.
• Storing the repair instructions in the system can be daunting from a logistics point of view as the repair instructions for
typically many different memories within many different devices have to be properly managed.
• For this reason, soft repair is almost exclusively associated with a BIRA mechanism to calculate repair instructions on-chip
at power up.

Advantages:
Multi time repair
Low design overhead

Disadvantage:
Some latent defects cannot be repaired
Long repair setup time
Row redundancy
• One or more spare rows are added per memory.
• In the case of several spare rows, some redundancy schemes force all
rows to be allocated as a contiguous block while others allow each row
to be allocated separately.
• It is rare to have more than two spare rows within a memory.

Advantages:
• Cheapest repair method
• The BIST overhead is the lowest, as a serial test interface between the
BIST controller and memory can be used.
• A serial interface only requires one comparator per word rather than
one per bit (I/O).
• The amount of BIRA logic is also low and varies only slightly with the
memory size as only the most significant bits (MSBs) of the row
address bits are logged.
Disadvantages:
• Has a slight impact on performance as the setup time on the
address inputs is slightly increased.
• Bit level diagnostics are not possible if the serial interface is used.
Column redundancy
• One or more single columns are added.
• Each redundant element can repair one failing column within any
memory.

Advantages:
• This has the least effect on memory performance as there is no
impact on address decoding.
Disadvantages:
• It precludes the use of a serial interface between the BIST controller
and the memory as a comparator per bit (I/O) is needed.
• The area cost is a function of the number of I/Os so that even a
small memory can require a large amount of repair circuitry.
• The BIRA circuitry required to encode the failing I/O number is
relatively big and slow.
• This may reduce the maximum frequency at which the BIST and
BIRA can operate.
Combination of both

• It is very rare to have a combination of both row and column redundancy


• One or more spare rows as well as one or more spare columns are added per memory
• The number of spare rows or spare columns rarely exceeds two

Advantages:
• Provides the highest repair success rate for a given number of spares.
• Having spares in both dimensions not only improves the ability to cover a random distribution of defects, but also
improves the ability to cover defect mechanisms that affect an entire word (e.g. word line fault) or entire column
(e.g. bit line fault).
Disadvantages:
• Very expensive from both a memory overhead as well as from a BIRA overhead point of view.
• Can only be justified for very large memories and generally for less mature processes.
Repairing of a combination of row and column redundancy
Repair Analysis:
• Repair rate analysis
• Repair rate can be defined as the ratio of the number of repaired memories to the number of defective memories.
• This component of the repair process consists of determining which of a memory’s defective sections (typically rows
or columns) must be replaced with available spares.
• Repair analysis can be performed in the two ways listed below.
• On Chip
• Off Chip

Off-Chip Approach:
• In the off-chip approach, all memory failures are logged on the tester and the resulting fail data is post-processed
offline.
• A significant drawback of the off-chip approach is that logging all of the fail data off-chip results in a large increase in
test time.
• Because of this, the majority of today’s repair approaches use an on-chip repair analysis capability.

On-Chip Approach:
• On-chip repair analysis is often referred to as BIRA, for Built-In Repair Analysis.
• With BIRA, absolutely no fail data needs to be logged externally as the BIRA circuitry or engine analyzes the fail data
coming out of an associated BIST controller on the fly.
• By the end of the memory test, the BIRA engine has determined the spare element allocation necessary to repair the
chip.
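
The on-the-fly idea can be illustrated with a deliberately simplified model. The sketch below assumes row-only redundancy and a stream of failing row addresses arriving from the BIST controller; the function name and the return format are hypothetical, and real BIRA engines (especially with combined row and column spares) use more elaborate allocation logic.

```python
def allocate_spare_rows(failing_rows, num_spare_rows):
    """Allocate spare rows as failures stream in, without storing a fail log.

    failing_rows   : iterable of failing row addresses, in the order reported
    num_spare_rows : number of spare rows available in the memory
    Returns (repairable, allocated_rows).
    """
    allocated = []
    for row in failing_rows:
        if row in allocated:
            continue                  # this row is already mapped to a spare
        if len(allocated) == num_spare_rows:
            return False, allocated   # more failing rows than spares
        allocated.append(row)
    return True, allocated


print(allocate_spare_rows([5, 5, 9], num_spare_rows=2))   # (True, [5, 9])
print(allocate_spare_rows([1, 2, 3], num_spare_rows=2))   # (False, [1, 2])
```

Only the allocated row addresses are kept, which is why the amount of state stays small compared with logging every miscompare off-chip.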
Built in Self Repair Architecture Description:

• A self-repair solution, referred to as BISR (Built-In Self-Repair), is one where both the repair analysis and
repair delivery are performed on-chip.
• A BISR solution consists of the combined BIRA and soft repair capabilities.
• One disadvantage of this approach however is that since the repair instructions are calculated once at
power-up, they may not take into account defects that only manifest themselves under specific
operating conditions such as high temperature.
• For this reason, more advanced BISR solutions now incorporate a combination of both soft and hard
repair capabilities.
• Hard repair is used to store repair instructions determined during manufacturing test and soft repair
is then used at each power up to address any new defects.
• On-chip management of a centralized programmable fuse pool is performed by a fuse controller.
• This controller together with one or more BIST controllers perform all necessary activities for testing
and repairing memories.
• In this architecture, the BIST interfaces to memories containing redundancy are equipped with a BIRA
engine to analyze failures and generate any necessary repair instructions in the form of fuse data.
• A dedicated chip-wide (BISR) scan chain is used by the fuse controller to transfer fuse data to and
from the eFuse array and the various memories.
• This scan chain contains a BISR register for each memory with redundancy.
• The operation of this BISR architecture is described in detail as follows
Steps 1 to 7 are the actual procedure of memory self-repair with the help of the BIRA, the BISR registers
and the BISR controller
