Mentorpaper
WHITEPAPER
TECHNOLOGY AREA
www.mentor.com
Divide and Conquer: Hierarchical DFT for SoC Designs
INTRODUCTION
Large System on Chip (SoC) designs present many challenges to all design disciplines, including design-for-
test (DFT). By taking a divide-and-conquer approach to test, significant savings in tool runtime and memory
consumption can be realized. This whitepaper describes the basic components of a hierarchical DFT
methodology, the benefits that it provides, and the tool automation that is available through Mentor’s Tessent
tool suite.
With hierarchical DFT, once a core design is complete its DFT is complete as well, and it includes
a set of patterns that can be used to test the core regardless of how it gets integrated into an SoC. Cores can
be tested individually, in groups, or all together; whatever best suits the test plan and available pin resources
in the SoC. Interconnect between cores and chip-level glue logic are then tested separately and the coverage
for all test modes is combined into a single comprehensive coverage report.
cycles and create dependencies between cores that make the cores harder to reuse. It also still requires that
the full-chip netlist be completed before patterns can be generated. The same can be said for any approach
that puts compression logic at the chip-level in order to drive multiple cores with many scan chains. This
arrangement also has a negative impact on compression logic because of the resulting very high chain-to-
channel ratio.
The wrapper chain is a key concept and is the foundation upon which the hierarchical DFT methodology is
built. When testing the contents of the core, otherwise known as Internal mode, the wrapper chain is
configured to launch values into the core and capture responses coming out, as illustrated in Figure 1.
The External mode of the core reconfigures the wrapper chain to launch values from the outputs to test chip-
level glue logic and interconnect while inputs capture the responses, as shown in Figure 2.
Figure 1: Wrapper chain configuration for Internal test mode.
Figure 2: Wrapper chain configuration for External test mode.
The wrapper chain therefore delineates between the logic tested in Internal mode and the logic tested in
External mode. In addition to the wrapper chains, it is also necessary to insert all scan chains, compression
logic, and test control logic to make it “DFT complete.”
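The two wrapper chain configurations can be summarized in a small model. The following is a minimal sketch only: the names Mode, Side, and wrapper_role are hypothetical and are not part of any Tessent interface. It simply encodes which side of the wrapper launches and which side captures in each mode, as described above.

```python
from enum import Enum


class Mode(Enum):
    INTERNAL = "internal"  # tests the logic inside the core
    EXTERNAL = "external"  # tests chip-level glue logic and interconnect


class Side(Enum):
    INPUT = "input"    # wrapper cell on a core input
    OUTPUT = "output"  # wrapper cell on a core output


def wrapper_role(side: Side, mode: Mode) -> str:
    """Return 'launch' or 'capture' for a wrapper cell in a given test mode."""
    if mode is Mode.INTERNAL:
        # Input cells drive values into the core; output cells observe responses.
        return "launch" if side is Side.INPUT else "capture"
    # External mode: output cells drive chip-level logic; input cells observe it.
    return "launch" if side is Side.OUTPUT else "capture"


if __name__ == "__main__":
    for mode in Mode:
        for side in Side:
            print(f"{mode.value:8s} {side.value:6s} -> {wrapper_role(side, mode)}")
```

The same wrapper cells serve both modes; only their role changes, which is why the wrapper chain forms the boundary between the logic covered by each mode.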
at the core level that can subsequently be retargeted to the chip level. The key assumption is that the core is
wrapped. Any other core-based DFT methodology that does not include wrapper chains cannot produce
retargetable patterns without seriously compromising test coverage at the core boundaries and chip-level
glue logic. Once core-level patterns have been generated, they should also be verified at the core level, much
as functional and physical verification are. The outputs from this step in the flow are a set of
retargetable patterns and a fault list containing all the coverage information for that pattern set.
loaded design and eliminates the need to regenerate patterns at the chip level to test the cores. A single set of
retargeted patterns is saved for each Internal mode configuration required. The example in Figure 4 shows
how Internal mode 1 groups Cores 1 and 2, whose patterns are retargeted and merged.
Cores 3 and 4 are retargeted and merged in Internal mode 2.
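The grouping of cores into Internal mode configurations can be illustrated with a short sketch. This is an assumption-laden model, not the Tessent implementation: CorePatterns and merge_group are hypothetical names, each pattern is treated as an opaque string, and merging is assumed to mean applying one pattern per core in parallel, padding the shorter pattern set with a don't-care load.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CorePatterns:
    core: str
    patterns: list[str]  # one opaque scan-load string per core-level pattern


def merge_group(cores: list[CorePatterns], pad: str = "X") -> list[dict[str, str]]:
    """Merge per-core pattern sets so that the cores in a group are tested together.

    Assumption: merged pattern i carries pattern i of every core in the group;
    cores that have run out of patterns receive the pad value.
    """
    if not cores:
        return []
    length = max(len(c.patterns) for c in cores)
    merged = []
    for i in range(length):
        merged.append(
            {c.core: (c.patterns[i] if i < len(c.patterns) else pad) for c in cores}
        )
    return merged


if __name__ == "__main__":
    # Hypothetical data mirroring the grouping described for Figure 4.
    internal_mode_1 = merge_group([
        CorePatterns("core1", ["p0", "p1", "p2"]),
        CorePatterns("core2", ["q0", "q1"]),
    ])
    internal_mode_2 = merge_group([
        CorePatterns("core3", ["r0"]),
        CorePatterns("core4", ["s0", "s1"]),
    ])
    print(len(internal_mode_1), len(internal_mode_2))
```

Under this model the merged set is only as long as the longest core-level pattern set in the group, which is one reason grouping cores with similar pattern counts would make good use of tester time.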
Once all cores have been tested in Internal mode, all that’s
left is to test the interconnect and glue logic between the
cores and calculate the final chip-level test coverage
number. First, the chip-level netlist is loaded along with the
graybox of each core. The External mode configuration
accesses all the wrapper chains of the cores as well as any
chip-level scan chains and ATPG is run, as shown in Figure 5.
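The final chip-level coverage calculation can be sketched as a union of the faults detected in each test mode. This is a simplified illustration with hypothetical names and fault identifiers; it assumes every fault is identified by a unique chip-level name and ignores fault classes (such as untestable faults) that a real coverage report would distinguish.

```python
from __future__ import annotations


def combined_coverage(all_faults: set[str],
                      detected_by_mode: dict[str, set[str]]) -> float:
    """Return the fraction of faults detected by at least one test mode."""
    if not all_faults:
        return 0.0
    # A fault counts once, no matter how many modes detect it.
    detected = set().union(*detected_by_mode.values()) & all_faults
    return len(detected) / len(all_faults)


if __name__ == "__main__":
    # Hypothetical fault list and per-mode detections.
    faults = {"f1", "f2", "f3", "f4", "f5"}
    detected = {
        "internal_mode_1": {"f1", "f2"},
        "internal_mode_2": {"f3"},
        "external_mode": {"f3", "f4"},
    }
    print(f"{combined_coverage(faults, detected):.0%}")
```

Because the wrapper chains divide the design between Internal and External modes, overlap between the per-mode fault lists should be small, but taking the union keeps the calculation correct even where a fault is detected in more than one mode.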
required for hierarchical DFT. However, dedicated wrapper cells are not usually an ideal solution because they
add area to the design and add delay to the functional path. This added area and delay problem has been a
barrier to wider adoption of hierarchical DFT.
All of these situations must be identified and handled by the wrapper insertion process. The resulting wrapper
chains and scan chains might look like Figure 6. All the logic inside these wrapper chains is tested as part of
Internal mode while any logic outside the wrapper chains is tested in External mode. Notice also that
the input wrapper chains, core chains and output wrapper chains each have their own scan shift enable signal. That is
necessary in order to achieve at-speed test coverage at the boundary of the core.
What needs to be included in the graybox is primarily the wrapper chains and any combinational logic that
sits between the wrapper chains and the primary inputs/outputs of the core. This is very similar to an interface
logic model that might be used for static timing analysis (STA) in which the path between an I/O port and a
register must be analyzed. What is not included in an STA model would be internal feedback paths that also
drive clouds of logic interfacing to the I/Os. There may also be some control logic needed to put the core into
its test mode or possibly needed by other cores or chip-level logic. The typical reduction factor from the full
netlist size to the graybox is in the range of 10x to 20x, with some designs outside that range. This reduction is
dependent on how much logic must be traced through in order to identify shared wrapper cells. Tessent
TestKompress includes the ability to generate the graybox netlist.
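The idea behind graybox extraction can be sketched as a fan-in trace that stops at sequential cells. The code below is a toy model under stated assumptions, not the algorithm used by Tessent TestKompress: the netlist format, the cell kinds, and the function names are all hypothetical, and test control logic is not modeled.

```python
from __future__ import annotations

# Hypothetical netlist format: cell name -> (kind, list of fan-in cell names).
# kind is one of: "in_port", "out_port", "comb", "in_wrapper", "out_wrapper", "flop".
SEQUENTIAL = {"in_wrapper", "out_wrapper", "flop"}


def backward_cone(netlist: dict[str, tuple[str, list[str]]], start: str) -> set[str]:
    """Trace fan-in from start, keeping but not tracing through sequential cells."""
    kept: set[str] = set()
    stack = [start]
    while stack:
        name = stack.pop()
        if name in kept:
            continue
        kept.add(name)
        kind, fanin = netlist[name]
        if name != start and kind in SEQUENTIAL:
            continue  # boundary register: keep it, stop tracing here
        stack.extend(fanin)
    return kept


def extract_graybox(netlist: dict[str, tuple[str, list[str]]]) -> set[str]:
    """Keep wrapper cells plus the logic between them and the core's ports."""
    keep: set[str] = set()
    for name, (kind, _) in netlist.items():
        if kind in ("out_port", "in_wrapper"):
            keep |= backward_cone(netlist, name)
        elif kind == "out_wrapper":
            keep.add(name)
    return keep


if __name__ == "__main__":
    core = {
        "a": ("in_port", []),
        "buf_a": ("comb", ["a"]),
        "iw0": ("in_wrapper", ["buf_a"]),
        "g1": ("comb", ["iw0"]),   # core-internal logic
        "f1": ("flop", ["g1"]),    # core-internal scan flop
        "g2": ("comb", ["f1"]),    # core-internal logic
        "ow0": ("out_wrapper", ["g2"]),
        "inv_z": ("comb", ["ow0"]),
        "z": ("out_port", ["inv_z"]),
    }
    print(sorted(extract_graybox(core)))
```

In this toy core the internal cells g1, f1, and g2 are dropped, while the ports, wrapper cells, and boundary logic are kept. The more logic that sits between the ports and the wrapper cells, the larger the kept set becomes.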
Clock control and the placement of that control logic in the design hierarchy are major considerations when
implementing pattern retargeting. The ideal solution is to have clock control logic that is programmable by
scan data located inside the core. Because the programming of the capture clock on a per pattern basis is part
of the scan data, clocking information is completely self-contained in the pattern set and is not dependent on
external clock sources. This makes it possible to merge the pattern sets of multiple cores without creating
clocking conflicts. Once clock sources are located outside the core (and are presumably shared by other cores)
you can only merge patterns for one clock at a time. Otherwise the cores being merged may require different
commonly sourced clocks for any given pattern, which would result in capture errors in one or both of the
cores. While still retargetable, the resulting patterns are less efficient than they would be if the clock control
were inside the core. Tessent TestKompress includes all of the pattern retargeting functionality described.
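The efficiency difference between the two clocking arrangements can be shown with a simplified count. This sketch rests on assumptions that are not from the tool: each pattern is assumed to need exactly one capture clock, two patterns can be applied together only if they need the same shared external clock, and the function names and clock names are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def merged_count_external_clocks(clocks_a: list[str], clocks_b: list[str]) -> int:
    """Merged pattern count when both cores share external capture clocks.

    Each list entry names the shared clock one pattern needs. Patterns can be
    paired only within the same clock, so each clock is merged separately.
    """
    a, b = Counter(clocks_a), Counter(clocks_b)
    return sum(max(a[c], b[c]) for c in set(a) | set(b))


def merged_count_internal_clocks(clocks_a: list[str], clocks_b: list[str]) -> int:
    """Merged pattern count when clock selection is carried in each core's scan data.

    Any pattern of one core can be paired with any pattern of the other.
    """
    return max(len(clocks_a), len(clocks_b))


if __name__ == "__main__":
    core_a = ["clk1", "clk1", "clk2"]
    core_b = ["clk2", "clk2", "clk1"]
    print(merged_count_external_clocks(core_a, core_b))
    print(merged_count_internal_clocks(core_a, core_b))
```

With these hypothetical pattern sets, merging one clock at a time yields four merged patterns, while in-core clock control yields three.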
Users who are employing this methodology with pattern retargeting are seeing ATPG runtimes reduced by 5x
or more. As important as that reduction is, the drop in cumulative ATPG runtime does not capture the
full benefit. What’s more important is when in the design schedule
ATPG occurs. With retargeting, the patterns for all the cores can be done far in advance of the completion of
the chip-level netlist. That means as soon as the chip netlist is complete, the core-level patterns already exist
and can be retargeted in just minutes instead of taking days to generate patterns from the chip level at a
critical point in the schedule. If there is a late ECO to one of the cores, then you only need to rerun ATPG for
that one core and then retarget. Generating scan patterns is no longer a gating item as you get close to
tapeout. The verification of those patterns is also considerably simplified because it is done primarily at the
core level at the time the core is completed.
The memory footprint reduction is highly design-dependent, but it’s not unusual to see a 10x reduction.
Whatever memory is required for your largest core represents the most memory you’ll need for the entire
chip. The chip-level netlist for External mode is typically very small because it comprises mostly graybox
models, which are usually 1-5% the size of the full core netlist. This memory reduction is advantageous in
two ways. First, it opens up quite a few more multi-processor machines to work on ATPG
that would otherwise have all their memory consumed before all the CPUs can be used. This means you can
take better advantage of distributed and multi-threaded processing without running into memory limitations.
Second, you no longer have to compete for the biggest machines with other design disciplines like physical
design and physical verification, which usually require lots of memory.
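The memory argument can be made concrete with a back-of-the-envelope model. This is a rough sketch only: it assumes ATPG memory scales linearly with loaded netlist size, the sizes are hypothetical values in arbitrary units, and graybox_fraction is an illustrative parameter rather than a measured figure.

```python
from __future__ import annotations


def flat_peak(core_sizes: list[float], top_size: float) -> float:
    """Flat ATPG loads every full core netlist plus the top-level logic at once."""
    return sum(core_sizes) + top_size


def hierarchical_peak(core_sizes: list[float], top_size: float,
                      graybox_fraction: float) -> float:
    """Hierarchical ATPG loads one core at a time, or the top level with grayboxes."""
    largest_core_run = max(core_sizes, default=0.0)
    external_run = top_size + sum(size * graybox_fraction for size in core_sizes)
    return max(largest_core_run, external_run)


if __name__ == "__main__":
    cores = [40, 30, 20, 10]  # hypothetical core netlist sizes
    top = 5                   # hypothetical chip-level glue logic size
    print(flat_peak(cores, top))
    print(hierarchical_peak(cores, top, graybox_fraction=0.05))
```

In this model the peak requirement is set by the largest single run, which is the largest core unless the External mode netlist happens to exceed it.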
A less intuitive advantage of hierarchical DFT is that pattern count (and consequently test time) is often
reduced by as much as 2x. This can be attributed to the fact that the limited scan channel resources no longer
need to be divided across the entire chip. Instead, by breaking up the testing into smaller pieces, all of those
resources can be dedicated to testing the individual cores thereby improving efficiency.
Diagnosis also benefits from hierarchical DFT with pattern retargeting capabilities. Being able to map chip
failures back to the core level allows you to run diagnosis at the core level rather than on the full chip. As with
ATPG, the runtime is reduced dramatically.
CONCLUSION
With some up-front design effort and planning, the biggest challenges of testing large SoCs can be addressed
with a hierarchical DFT methodology. Implementation of the methodology is greatly assisted by the
automation now available in the Tessent tool suite for the most important design tasks.