
SoC Emulation—Bursting Into Its Prime
Bernard Murphy—SemiWiki
Daniel Nenni—SemiWiki
Mentor Emulation Team—Mentor Graphics, a Siemens Business

A Semiwiki.com Project


SoC Emulation—Bursting Into Its Prime

©2017 by Daniel Nenni and Bernard Murphy

All rights reserved. No part of this work covered by the copyright


herein may be reproduced, transmitted, stored, or used in any form
or by any means graphic, electronic, or mechanical, including but
not limited to photocopying, recording, scanning, taping, digitizing,
web distribution, information networks, or information storage and
retrieval systems, except as permitted under Section 107 or 108 of
the 1976 US Copyright Act, without the prior written permission of
the publisher.

Published by SemiWiki LLC


Danville, CA

Although the authors and publisher have made every effort to


ensure the accuracy and completeness of information contained in
this book, we assume no responsibility for errors, inaccuracies,
omissions, or any inconsistency herein.

First printing: January 2018


Printed in the United States of America

Edited by:
Mentor Emulation Team—Mentor Graphics, a Siemens Business

Storage Market Case Study:


Ben Whitehead, Storage Product Specialist, Paul Morrison, Solutions
Specialist and Shakeel Jeeawoody, Emulation Strategic Alliances,
Mentor, a Siemens Business


Contents
Introduction – Why Emulation 1
Verification and Validation 2
The Performance Wall for Simulation 3
The Beginnings of Emulation in Hardware Design 5
Chapter 1 – ICE is Nice 7
Arrays of FPGAs 8
The Good, the Bad and the Not So Bad 10
Emulation Versus FPGA Prototyping 12
Chapter 2 – Three Architectures 14
Processor-Based Emulation 15
Emulation Based on Custom FPGAs 17
Emulation Based on Commercial FPGAs 20
Timing in the Three Architectures 22
Chapter 3 - Emulation Muscles onto Simulation Turf 24
Speeding Up the Test Environment 24
Multi-User Support 26
A Change in Approach to Debug 29
Chapter 4 - Accelerating Software-Based Verification 31
The Rise of Software in Electronics 31
Why Software is Important in Verification 32
Emulation and Hardware/Software Co-verification 33
All Software isn’t Equally Important to V&V 34
Debugging in Hardware/Software Co-verification 35
Chapter 5 – Beyond Traditional Verification 36
Performance Modeling 36
Power Modeling 36
Test Verification 37
Deterministic ICE 38
Complex Load Modeling 39
Chapter 6 – The Role of Emulation in Design Today 40
Virtual Prototyping and Emulation 40
Simulation and Emulation 41
Emulation and FPGA Prototyping 44
The Outlook for Emulation 45
Storage Market and Emulation Case Study 46
State of Storage 47
Current leading HDD and SSD Storage Technologies 47
HDD Controllers and Associated Challenges 49

SSD Controllers and Associated Challenges 51
Typical Verification Flow/Methodology 52
Pre-silicon Verification 52
Post-silicon Verification 53
Gaps in Current Methodology 53
Simulation and FPGA Prototyping Methodologies 53
Is the Verification Gap Getting Better? 54
Is increasing Firmware helping current methodologies? 56
Is Hardware Emulation a Viable Option? 56
Hardware Emulation in the flow 56
In-Circuit-Emulation (ICE) Mode 57
Virtual Emulation Mode 61
Implementing an SSD Controller in Mentor Graphics Veloce 64
Creating a Verification or Validation Environment 64
Running the Tests 65
Debugging 66
A/B Testing 66
Conclusion 67
References 69


Introduction – Why Emulation


The modern electronic systems enabling smartphones, smart
cars, smart everything are so amazingly capable because we
have learned to compress incredible capability into tiny chips
the size of a fingernail, more than we were able to pack into a
large room not much more than 50 years ago.

But this amazing capability comes with a cost. Anything we


can build, we can build wrong, which means we must also test
what we build to find (and fix) any potential problems. In the
early days, we only had to check that our room-sized computer
could do arithmetic on smallish numbers correctly and move
data between punched cards or paper tape, simple magnetic-
core memory and teletypes. In fairness, there were plenty of
other mechanical and electrical reliability problems to address
but once solved, testing that the computer functioned correctly
was not especially challenging.

Modern semiconductor engineering has largely solved the


reliability problems, but now we hold in our hands systems a
billion times more functionally complex and therefore
considerably harder to test. Now accurate testing has in many
ways become vastly more challenging. As consumers, we
expect these marvels to function perfectly when making phone
calls, browsing the Internet and so on and we’ll happily switch
to another supplier if we’re not satisfied. More importantly we
expect these devices to be safe and secure and keep us safe
and secure. And all of this must be satisfied in a small device,
often running on a tiny battery that you might re-charge a few
times a week. Test is not an area where we want to hear our
device was “reasonably well tested”.

Verification and Validation


Testing that these complex chips will function correctly is
called verification (did I build what I was told to build) and
validation (does it do the job it’s supposed to do, often not
quite the same thing), which together are usually called V&V.
Chip manufacturing has become so incredibly expensive and
time-consuming that, to the greatest extent possible, simple
economics demand extensive V&V before the chip is built.
Which brings us to simulation and emulation.

These days, any complex engineering objective is modeled


extensively before it is built, from subsystems (automatic
transmissions, fuel injection, jet engines, turbines) to full
systems (cars, aircraft, power plants). We must do this
because trial-and-error design (build it, then test it, then re-
build it and re-test it, …) would be unreasonably expensive
and slow. So we build models and “simulate” those models
operating under many different conditions of stress,
temperature, airflow and other conditions before we commit to
manufacturing.

The same applies to chip design; we build a model of the chip
and simulate it under many different conditions – user
interaction, data traffic, internet activity, video and audio
activity and all the other functions that might be active.
Fortunately, in chip design the model we create to eventually
build the chip is also the model we can use to drive V&V.

Simulation is typically software-based and, for semiconductor


design, is a large program that models the primitive logic
functions making up the design, together with the interactions
between those functions, responding to models of external and
internal data traffic. This “bottom-up” simulation,
mimicking the behavior of the full system as the interaction of
all those primitive functions and stimuli, has been the mainstay
of V&V for digital chips for many decades.

The Performance Wall for Simulation


Software-based simulation has many advantages. At least in
theory it’s easy to change the simulation model and stimulus;
all you must do is change the software description of the
design. And it’s very easy to see exactly what’s going on
inside the model, which is important when it comes to
debugging why some piece of design functionality isn’t working
correctly. But there is a significant drawback – it’s not very fast
and it becomes even slower as the design size increases.

This wasn’t so much of a problem when “big” in chip design


meant a few million logic gates and some memory. Simulator
builders found clever ways to make the software run faster and
run many simulations in parallel. But complexity has been
growing exponentially in graphics chips, networking chips, the
application processors at the heart of smart phones and in
many more large systems. There can be a billion or more
primitive logic functions in such a device, a scale which would
reduce a software simulation model to a crawl.

Unfortunately, that’s only one part of the problem. The size


and complexity of the testing required for V&V has grown even
faster. We now must consider internet and radio traffic, audio
and video streams and increasingly complex internal
processing in multiple functions at the heart of the chip; these
must be modeled with enough accuracy and across enough
representative use-cases to give high confidence in the
completeness of V&V.

Meanwhile, active power management on these devices has


sections of a chip frequently switching between full speed and
running slower or turning off, either to conserve battery power
or to keep temperature within reasonable bounds. This
behavior is controlled by a combination of software and
hardware interactions, and therefore you need to simulate both
hardware and software together, across all those functional
use-cases, to have high confidence in the correctness of the
design.

You could imagine that even more cleverness could have kept
the game going for a bit longer for software-based simulators,
but it was already clear many years ago that a new approach
would become essential for large-scale designs and testing.
That’s where emulation comes in.


The Beginnings of Emulation in Hardware Design


Emulation in the larger (non-chip) world generally means
making one operating system (or a process on that OS)
behave like another, purely as a software process. Software
development platforms for mobile phones are (in part)
emulators in this sense. In the chip design world, emulation
has developed a slightly different meaning; what we want to
emulate is a chip (hardware), rather than an OS (software), but
otherwise the concepts are similar.

Android emulator on a PC.

But we don’t just want to make the emulator behave like the
hardware – we also want it to run much faster than a software-
based simulator. There’s a general principle in the electronics
world – if you want to make a software function run faster, you
convert it into hardware. In software, you must fetch
instructions from memory, execute them and store results
back in memory, all constrained to run one stage per clock
cycle (in the simplest case). On a chip, you can skip the fetch
and store parts (because that’s built into the logic model) and
greatly simplify the execute part; you can even combine
executes from multiple steps into one clock cycle. And you can
run as many stages in parallel as practical. In short, hardware
can run many orders of magnitude faster than equivalent
software, so this should be a good way to accelerate V&V.

But hardware is expensive to build, so you want to build


something that will accelerate V&V for a lot of different kinds of
design. In 1983 IBM announced they had built the Yorktown
Simulation Engine (YSE)1. YSE was a custom multiple-
instruction, multiple-data (MIMD) machine designed specifically to
replace the TEGAS2 software simulator. This demonstrated
multiple orders of magnitude improvement in performance
over TEGAS, validating the principle. A slightly later model,
also from IBM and called the Engineering Verification Engine
(EVE)3, further improved on the earlier machine. Daisy Systems
and Valid Logic, among others, also introduced specialized
accelerators around the same time.

Powerful though they were, these systems still suffered from


two problems: (a) by building on the architecture of a software
simulator, they lost performance they might have had in an
implementation more closely mirroring the real logic on the
chip and (b) they were still constrained by very low
performance in getting stimulus into the hardware and
returning the results from the simulation. Both these functions
were generally too complex to be accelerated by the same
hardware, so still had to run outside the accelerator at the
speed of software running on conventional computers. And
since the accelerated design model and the software test
model needed to communicate and therefore had to
synchronize frequently, net gain in performance was much
lower than had been hoped.

Chapter 1 – ICE is Nice


To recap, early simulation accelerators suffered from two
problems: native simulation speeds were still much slower
than real chips and slowed down even further in connecting to
external software testbenches.

Emerging in the early to mid-1980s, field-programmable gate-


array (FPGA) technology4 provided an excellent solution to the
first problem. With this technology, circuit netlists up to a
certain size could be mapped onto an FPGA in a form not too
different from that implemented on a real chip, though
interspersed with circuit overhead the device required to
support programmability (and re-programmability) to any
arbitrary digital circuit. This improvement allowed circuit
models to run at speeds on the order of hundreds of kHz, still much
slower than the ultimate chip being modeled, which would run
at many MHz, but much faster than earlier software or
accelerated software simulators (~ 10-100Hz at a few million
gates5 and dropping lower as design size increased).

The second problem – slow performance of software-based


testbenches and monitoring software – turned out to be an
opportunity in disguise. Building software-based tests to
accurately model interaction of the design with external
devices like disk-drives and networks was becoming
increasingly difficult. And testing comprehensively across
many possible interactions at these speeds was clearly
impossible.

The opportunity was the realization that perhaps it was better


to throw out software tests and instead use the real electronics
to provide stimulus to the design model, and to consume
outputs. You build the real electronic system, but use the
FPGA-based model of the design in place of the chip you are
currently designing. And since the system is running at
hardware speed, there is no slowdown caused by trying to
synchronize with verification software. This method of
modeling is called In-Circuit Emulation or ICE.

The advantages of ICE are significant:

• You run at speeds much higher than software simulation,
  so you can cover a much broader range of tests.
• The stimulus and responses you are dealing with are
  realistic for systems in which the chip will be used,
  raising confidence in completeness of V&V.
• Asynchronous behavior in the system, which often
  exposes bugs in a design, is handled naturally.

In fact, for all these reasons, ICE modeling is a very popular


use-mode for emulation today. So, verification problem
solved? Of course, it’s never quite that easy. The FPGA-based
solution comes with its own challenges.

Arrays of FPGAs
Early FPGAs could model circuits of several thousand
gates, rather smaller than a typical digital chip design. The
technology has advanced rapidly, to the point that devices
come pre-integrated with millions of configurable logic
cells, DSP slices, many megabytes of memory, fast
transceivers and much more. But digital design continues to
grow too, with the result that a large design often must be
mapped onto more than one FPGA and those FPGAs must be
connected on a board, that board then representing the full
model for the design.

This isn’t quite as simple as it sounds. You must divide the


design into pieces each of which will fit in an FPGA; this is
called partitioning. You can’t make this decision arbitrarily
because (a) signals needing to run from one FPGA to another,
through package pins and board traces, will run much
slower than on-chip connections, which is going to mess up
the timing unless you divide logic in the right places, and (b)
FPGAs have a limited number of package pins so you’re also
constrained on how many signals can cross between FPGAs.

FPGA pin limitations were and remain a severe problem,


particularly around areas of high interconnectivity between
function blocks partitioned to different FPGAs. According to an
empirical observation known as Rent’s rule6, the more logic
you push into a block, the more external connections it will
require. This significantly restricted effective utilization of
available logic in each FPGA. One way to mitigate this
problem, familiar to readers who understand bus-based design
concepts, and introduced commercially by Quickturn
(subsequently Cadence), was to introduce interconnect chips7
between the FPGAs; this provided an extra degree of freedom
in managing connectivity in partitioning8.

Another approach called VirtualWire9, developed by Virtual


Machine Works (subsequently acquired by IKOS, which was
then acquired by Mentor Graphics), pipelined multiple logical
signals from inside an FPGA onto a single package pin, in
effect reducing high connectivity between packages through
time multiplexing.
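
To put rough numbers on the pin problem, the sketch below (a
C++ illustration of my own, using typical textbook constants
rather than figures for any particular FPGA or emulator)
applies Rent's rule to estimate the boundary signals a
partition needs, and then the time-multiplexing ratio required
to fit them through a fixed number of package pins.

    #include <cmath>
    #include <cstdio>

    // Illustrative only: Rent's rule estimate of the signals a logic
    // partition needs at its boundary, and the time-multiplexing ratio
    // required to squeeze them through a fixed number of FPGA pins.
    // The constants are typical textbook values, not vendor data.
    int main() {
        const double gates     = 5.0e6;  // gates mapped into one FPGA partition
        const double t         = 3.5;    // Rent coefficient: avg terminals per gate
        const double p         = 0.6;    // Rent exponent, ~0.5-0.75 for random logic
        const int    phys_pins = 1200;   // usable signal pins on the package

        // Rent's rule: external terminals T = t * G^p
        const double needed_signals = t * std::pow(gates, p);

        // Logical signals that must share each physical pin
        // (VirtualWire-style time multiplexing)
        const int mux_ratio =
            static_cast<int>(std::ceil(needed_signals / phys_pins));

        std::printf("boundary signals (Rent estimate): %.0f\n", needed_signals);
        std::printf("time-multiplexing ratio per pin : %d\n", mux_ratio);
        return 0;
    }

With these numbers the estimate comes out to tens of thousands
of boundary signals and a multiplexing ratio in the tens, which
is why pipelining many logical signals over each physical pin
matters so much.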

There were other challenges, particularly on-chip clock timing
where clock routing could be unpredictable, but through FPGA
and software technology advances most of these problems
have been reduced to manageable levels, if not entirely
eliminated. Growing FPGA capacity alone has helped
significantly10.

The Good, the Bad and the Not So Bad


ICE remains a popular use mode for emulation for good
reason. This is still a good way to verify functionality in a very
realistic environment without needing to first build the chip.
That said, ICE comes with its share of challenges beyond
those mentioned earlier.

First, when you decide to build a custom chip, you likely do so


in part because custom design will enable you to offer the
highest-possible performance product. But when it comes to
modeling the design, an ICE model, since it is emulating
behavior, will inevitably run somewhat slower than at least
some of the surrounding devices to which it must connect.
Managing this lower performance requires synchronization to
adapt the speed of those devices to the emulation model and
vice-versa. This is accomplished using speed-bridges, each
designed for a specific interface.

For example, if you are designing a component which must


connect to a PCI interface, you will need to use a speed bridge
to connect your emulation to the other end of that connection –
perhaps a microprocessor. The microprocessor will run at full
speed; when it sends a request to the emulation model, the
bridge will forward the request and will then repeatedly stall
the microprocessor with re-try responses until the emulation
model is ready to respond. So speed bridges reduce net
system performance and, since they are extra hardware, add
to the overall cost of the emulation solution.
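
As a rough illustration of that retry handshake, here is a
small C++ sketch. It is purely conceptual: the structure names
and the 50-cycle speed ratio are invented for the example, and
it is not a description of any vendor's bridge.

    #include <cstdio>

    // Conceptual model of a speed bridge: the fast initiator keeps
    // re-issuing its request and the bridge answers "retry" until the
    // slower emulated DUT has a response ready.
    enum class Reply { Retry, Data };

    struct EmulatedDut {
        int cycles_needed = 50;              // made-up fast/slow ratio
        bool ready(int elapsed) const { return elapsed >= cycles_needed; }
        int  respond() const { return 0xCAFE; }   // dummy read data
    };

    Reply bridge_poll(const EmulatedDut& dut, int elapsed, int* data) {
        if (!dut.ready(elapsed)) return Reply::Retry;  // stall the initiator
        *data = dut.respond();
        return Reply::Data;
    }

    int main() {
        EmulatedDut dut;
        int data = 0, bus_cycles = 0;
        // The full-speed processor retries its read until the DUT catches up.
        while (bridge_poll(dut, bus_cycles, &data) == Reply::Retry)
            ++bus_cycles;
        std::printf("read completed after %d bus cycles, data=0x%X\n",
                    bus_cycles, data);
        return 0;
    }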

ICE setups can suffer from reliability problems. Cabling


between the emulation socket on the full-system board and the
emulator can lead to electrical, EMI and mechanical problems
which add debug time to your verification objective. But when
a solution is otherwise ideal for your needs, you find solutions
to these problems. V&V with realistic traffic can be a very
compelling advantage, cables notwithstanding. This is
particularly important when software models for external traffic
are not available or when you feel they do not sufficiently
cover a wide range of real traffic.

One last issue. Initially, ICE setups were single-purpose,


single-user resources. This was partly a function of limitations
in what the emulator itself could do (it was designed as a
single-user, single objective machine) and partly because
connections, possibly through speed bridges, to a system
board were necessarily unique to that setup. Emulators are
also expensive. Along with speed bridges, setup and
maintenance, this made ICE sufficiently expensive and time-
consuming that it was used only by customers with deep
pockets, on the largest and most complex designs.


Emulation Versus FPGA Prototyping


What I have described so far may sound just like FPGA
prototyping (see for example the SemiWiki book
“Prototypical”11). There’s a reason for that. On the family tree
of verification methods, FPGA prototyping and emulation
solutions started in the same place, but then evolved into
different solutions, albeit in some cases with similar
architectures.

Prototyping has focused, unsurprisingly, on providing the most


effective platforms for early system and software development,
assuming a quite stable (late-stage) design architecture.
Prototyping performance is expected to be as high as possible
to support reasonable turn-times for system / software
development, so considerable time must be spent setting up
and optimizing the prototype before it is ready for use.
Emulation on the other hand has focused more on earlier
stage chip-design development which requires supporting
quick turn-around for design changes and detailed debug.

To give a sense of these differences, an emulation may run at


~1-2MHz where a prototype can run at 5-10MHz or even
higher, given sufficient hand-tuning. On the other hand,
emulation runs will typically take less than a day to compile,
whereas a prototype may take from 1-3 months to set up. Also
debug in prototyping systems is typically limited to a small
number of pre-determined signals, whereas debug probing in
emulators is essentially unlimited.

In short, emulators and prototyping systems today solve


different though related problems. Emulation is ideal for
verifying and debugging evolving designs, but not for
supporting heavy-duty software development. Prototyping is
much more helpful to support the application software
development task and is generally more cost-effective for that
task, but is less useful for debugging hardware problems and
is not a very effective platform when the chip design is
changing rapidly. Vendors who support both provide integrated
flows which greatly simplify switching back and forth between
these platforms for full system V&V12.


Chapter 2 – Three Architectures


For in-circuit emulation, emulators were clearly the only game
in town, but software simulation continued to dominate the
V&V process in all other areas. However, simulators struggled
on full designs and large subsystems: even on relatively small
designs (<3M gates) they might reach only ~10-100Hz real-time
performance, completely impractical for testing systems over
millions or billions of cycles, and they became even slower as
design size increased. When verification teams saw what was
possible in emulation, it was natural for them to wonder if they
also could benefit.

An obvious approach to speed up simulation - to massively


parallelize - has demonstrated about an order of magnitude
improvement in speed13 14. While impressive and certainly of
value for simulation-centric jobs, these solutions only
appeared relatively recently and remain too slow for the many
big V&V tasks that have become commonplace.

ICE emulation has the raw performance but simulation needs


demanded a lot of work on the underlying architecture, which
consequently evolved in three directions: processor-based,
custom FPGA-based and commercial FPGA-based. Each approach has
strengths and of course some weaknesses. We’ll look at these
below.


Leading-edge emulation hardware includes several components,
such as chassis, boards, chips, power supplies, and fans.

Processor-Based Emulation
While emulation started with FPGA-based architectures, this is
not the only possible approach. Processor-based architectures
have also enjoyed considerable success, but you shouldn’t
confuse processors in this context with standard CPUs. These
processors are custom chips, each of which is an array of
small interconnected CPUs. The original concept and early
versions were developed by IBM, building on earlier
architectures. This was then licensed by Quickturn and
introduced commercially in the late 1990s. Cadence
developed this architecture into what we now know as the
Palladium platform15.

The basic architecture is quite easy to understand16. (I should


add that what follows is based on a 2005 reference. I am sure
that many features have evolved since then.) Each CPU can
evaluate Boolean expressions. When the design is compiled,
all expressions are reduced to 4-input forms to simplify the
hardware architecture. Evaluation scheduling is determined
according to familiar leveling and other algorithms, with an
obvious goal to reduce the number of time steps required
through as much parallelism as possible. You might think of
this approach as a little like native-compiled simulation but
supercharged with massive parallelism.

There are additional details in compilation. Combinational


loops must be broken (by inserting a flop) and tristate busses
must be mapped to two-state “equivalents”, both cases being
simple to handle. Memories can be mapped and/or
synthesized for register banks for more efficient
implementation, and interfaces to external hardware (such as
devices on an ICE board) must be mapped.
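
To show what that compilation target looks like in miniature,
here is a toy C++ model of the idea (my own illustration, not
Palladium internals): every gate becomes a 4-input Boolean op
stored as a 16-entry truth table, and ops are evaluated level
by level, so everything within a level could run in parallel
across the processor array.

    #include <array>
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    struct Op {
        std::array<int, 4> in;     // indices of up to four source nets
        uint16_t           lut;    // 16-bit truth table of the 4-input function
        int                out;    // destination net index
        int                level;  // evaluation step assigned by the compiler
    };

    void evaluate_cycle(const std::vector<Op>& ops, std::vector<int>& nets,
                        int levels) {
        for (int lvl = 0; lvl < levels; ++lvl)      // sequential time steps
            for (const Op& op : ops)                // in hardware: parallel CPUs
                if (op.level == lvl) {
                    int idx = (nets[op.in[0]] << 0) | (nets[op.in[1]] << 1) |
                              (nets[op.in[2]] << 2) | (nets[op.in[3]] << 3);
                    nets[op.out] = (op.lut >> idx) & 1;
                }
    }

    int main() {
        // One op computing net4 = (net0 & net1) | net2; net3 is tied to 0.
        std::vector<Op> ops = { { {0, 1, 2, 3}, 0xF8F8, 4, 0 } };
        std::vector<int> nets = {1, 0, 1, 0, 0};    // a=1, b=0, c=1
        evaluate_cycle(ops, nets, 1);
        std::printf("out = %d\n", nets[4]);         // prints 1
        return 0;
    }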

Of course, the whole emulator isn’t on one chip. A full system


is built from racks of boards, each board including multiple
multi-chip modules (MCMs), each MCM carrying multiple die,
each of which hosts multiple processors. Part of the job of the
compiler is to optimally distribute analysis across these many
processors in the total system.

Compilation speed is a strength for this approach, allowing for


large (~1B gates) designs to compile from a single host in a
day. Industry reports tend to show platforms of this type having
an edge in compile times over other approaches.

Run speeds are in the 1-2MHz range which is comparable to


the custom FPGA approach, though not as fast as the
commercial FPGA approach, which is closer to FPGA
prototyping (and therefore typically has slower compile times).

Debug accessibility is also strong. All CPU outputs are
accessible and can be stored in a trace-buffer over some
period to provide good visibility in debug. Debug data can also
be streamed out at high speeds. I should note that debug for
all emulator vendors is generally viewed as a post-process (I’ll
discuss this later in this book).

Emulation Based on Custom FPGAs17


Mentor had been experimenting with their own FPGA-based
emulator, but in the mid-1990s switched to an architecture
based on their acquisition of Meta Systems, who had built a
custom reconfigurable device designed specifically to
accelerate emulation. I should note that Mentor prefers to call
this a custom reconfigurable or reprogrammable device rather
than an FPGA because it really is a special-purpose and
custom design specifically architected for emulation to
overcome many of the disadvantages of general-purpose
FPGAs. However, there is no widely-understood name for this
type of device so I shall simply refer to it (where appropriate)
as a custom FPGA.

In this architecture, primitive logic elements (Lookup Tables or


LUTs) are organized with routing in a clever hierarchically self-
similar fashion, this being designed, together with specialized
place and route software, to minimize congestion in routing
between elements and to ensure balanced timing in
connections.

I mentioned earlier utilization limits in using conventional


FPGAs for emulation. The timing problem arises because
FPGA platforms are built with the expectation that the designer
will fine-tune timing, at least to some extent, and that they are
not (necessarily) concerned with exactly matching the cycle-
level timing of a reference design. This is OK when you are
prepared to spend days or even weeks optimizing the design,
but not when you want a quick and accurate compile to start
emulation runs. The Meta Systems architecture addresses
both the congestion and balanced timing problems.

These improvements in support of emulation come at a cost,


having a somewhat lower LUT count per unit area than a
general-purpose FPGA of the same size, so it wouldn't be the right
solution for general-purpose devices, but it is ideal for this
application.

The custom reconfigurable or reprogrammable device improves


routing over commercial FPGAs.

The second innovation came from IKOS, acquired by Mentor


in 2002. Part of the objective was to time-multiplex signals
between custom FPGAs (to overcome package pin count
limitations), but it actually does more than that. It first compiles
the design into a giant virtual custom FPGA, then maps it
through multiple stages of timing re-synthesis into the array of
actual custom FPGAs available on the emulator. At the same
time, memory re-synthesis maps memory into devices on the
board rather than within the custom FPGAs. From a compile
point of view, this whole process is more deterministic than for
mapping onto commercial FPGAs, and is typically quite a bit
faster than for systems based on such FPGAs. The
architecture also leads to higher utilization per custom FPGA,
somewhat offsetting the lower device count per unit area.

Thanks to the custom architecture, compilation speed seems


in practice to be very good, and in the same general range as
for the processor-based approach. As for all architectures,
large designs must be partitioned; also after mapping to
devices, per-device sub-designs must be physically
implemented on each custom FPGA; this task is typically
farmed out to parallel computation on multiple PCs.

Run speed is in the same 1-2MHz range as for the processor-


based approach, again not as fast as commercial FPGA,
which gets its speed (and slower setup times) by being a
scaled-back version of FPGA prototyping.

Debug accessibility is also strong since this is built-in to the


Meta Systems architecture. Internal node visibility is
comparable to the processor-based approach, also stored in a
trace-buffer over a selected time-window and debug data can
be streamed out at high speeds in support of debug and other
applications.


Emulation Based on Commercial FPGAs18


Emulation started with this approach, but the most popular
platform that continues to use this architecture was first
introduced in 2003 by EVE (subsequently acquired by Synopsys and
now marketed as the ZeBu server19). As with other
architectures, a big design won’t fit into a single device, which
means the design must be partitioned, though commercial
FPGAs can achieve similar capacities to other approaches
with a smaller number of devices, leading to smaller emulation
boxes, with shorter interconnect and therefore faster run-time
performance20. As for the other architectures, good partitioning
is essential to getting good run-times.

When the emulator is built on commercial FPGAs, as is the


case in the Synopsys ZeBu system, use of partial crossbar
devices mitigates timing-balancing problems due to inter-chip
interconnect by controlling interconnectivity between FPGAs
through a bus-like structure.

A second complexity is that each piece of logic targeted to a


partition must be synthesized, placed and routed within that
FPGA. Emulation vendors whose solutions are built on
commercial FPGAs provide scripts to run these steps
automatically, but the only certain way to ensure push-button
simplicity is to aim for sufficiently low utilization per device that
the logic in the partition is guaranteed to fit and meet timing
constraints without need for manual intervention. This works,
but the capacity per FPGA for these systems tends to be lower
than you might expect from the advertised capacity of the
devices, unless you manually intervene during setup, which
can significantly slow compile times.

Also, even when push-button, FPGA synthesis, place and
route is designed to generate high-quality mapping onto
complex underlying architectures, which takes time. (I should
note here that Synopsys uses its own software for partitioning
and synthesis rather than the FPGA vendor software, so we
should expect this step is optimized for their application.) As
for custom FPGAs, this mapping for each partition can be
implemented separately, so all of them can be run in parallel
on multiple PCs. Even with parallelization, this leads to
compile times for commercial FPGA-based solutions which
can significantly trail those for other architectures.

An advantage of the commercial FPGA approach is run-time


performance. Recent industry data seems to indicate run-
times as much as 5 times faster than for the other
architectures, thanks to more effective gate-packing /
utilization per unit area; this in turn allows for implementation
on a smaller number of FPGAs, and therefore you can expect
smaller interconnect delays between devices since a smaller
number of devices can be closer together on a board. If long
regression runs dominate the bulk of your testing (where a
long setup time is not so important), this can be a real
advantage. These systems can also have an advantage in
capacity over custom FPGA approaches, though it is not
always clear how much effort must be put into compile to reap
this benefit21.

An important downside for commercial FPGA platforms is in


debug. Unsurprisingly, FPGA vendors don’t assign significant
value to debug access to all the internals of a design; their
market goals are functionality and performance in the end-
application. Some (SRAM-based) FPGAs have a path to
access every node, primarily for bitstream programming, but
this is generally too slow for debug purposes. Alternatively, it is
possible to add debug paths to any node which are then
compiled into the design, but these decisions must be made
before compilation and can have noticeable impact on
emulation performance. For these reasons, debuggability
remains a weaker point for systems based on commercial
FPGA devices.

Timing in the Three Architectures


Almost all digital design is synchronous, which means that
logic evaluations starting at a given time are constrained to
complete within a clock cycle, so the results of those evaluations
are all ready when you want to start the next round of
evaluations. While most designs today use multiple clocks,
each part of a design is controlled by just one of those clocks
at any given time. Correct operation of the whole design is
very dependent on carefully balancing logic evaluation against
clock cycles and carefully managing the distribution of clocks
around the chip, so that clock latency (delays in arrival of a
clock at any given part of the circuit) is balanced across the
whole device.

In emulation, the same clock balancing must apply not just


within each emulator device but across the whole system,
requiring careful management of clock generation and
distribution through boards, racks and even between cabinets
when large enough designs must be modeled.

When it comes to signal timing, it might at first seem that


FPGA architectures would have an advantage over processor
architectures because the FPGA designs look more like real
circuits. But after mapping design gates into lookup tables and
mapping clocks onto the pre-designed clock distribution
system of an FPGA, resemblance between the FPGA
implementation and the ultimate design implementation is
limited. Additionally, each architecture effectively models the
multiple clock frequencies in the real design based on
modifications to a common clock for the emulator. So any
apparent advantage one architecture may have over another
in this area is largely in the eye of the beholder.

A second potential problem in timing can arise for data signals


split across partitions. Even when optimal partitions are found,
that does not guarantee that high-speed signals will not need
to cross between chips, board or cabinets. Added delays in
such signals, through package pins and board-level traces
between chips, may force a need for synchronization to keep
other signals in lock-step, which can dramatically drag down
effective performance for the emulator.

Managing system-level delays effectively can be challenging


since interconnect delays at the chip, board and rack levels
can be dramatically different. Minimizing these differences
requires use of the same interconnect technologies common in
advanced server platforms, such as optical and Infiniband
interfaces22. At this level, emulation technology is starting to
share advances most often seen in advanced datacenters.

Chapter 3 – Emulation Muscles onto Simulation Turf
Architectural advances in emulation paved the way for full-
chip and large-subsystem V&V engineers to seriously consider
emulation in addressing their modeling needs. The primary
problems they needed to have addressed to consider
emulation as a worthy alternative to simulation were:

• Make setup a push-button process
• Make the stimulus-generation process run at or near
  the speed of the emulated design
• Provide debug access comparable to that offered by a
  software simulator, with minimal impact on
  performance
• Make the solution multi-user (given an appropriate
  license, many users can launch simulation runs at the
  same time; the same should be possible for emulation)

The setup and debug needs were handled through differing


approaches in the 3 architectures. Accelerating the test
environment and making these systems truly multi-user
became the next major areas of development.

Speeding Up the Test Environment


The model for the design, typically known as the device under
test (DUT), emulates quickly in principle but faces significant
slowdown when communicating with a software-based
testbench. Those tests run on a conventional computer and,
since they typically communicate with the DUT through
hundreds of signals that must synchronize between the
(comparatively slow) software test and the
emulated DUT model, emulator speed could be dragged down
close to software simulation speeds. ICE solved this problem
by connecting the emulation model directly to real hardware,
but that approach would often be too difficult to set up and too
inflexible for simulation users (though exceptions are starting
to be supported in methods called “In-Circuit Simulation
Acceleration”).

Another solution was to move the testbench to the emulator, in


which case everything was running at emulation speed.
Sometimes called targetless acceleration, this approach is
frequently a part of the solution, but it isn’t a universal solution.
What goes on an emulator must be mapped into low-level
functions and therefore must be synthesizable, which limits the
complexity of stimulus generation and of assertions, when
thinking of debug. Put another way, you move testbench
components to the emulator that you could equally implement
in hardware. Another consideration may be that modern
testbenches can be significantly larger than the DUT (in lines
of code). Even if they could be ported to the emulator, the cost to
emulate the full testbench plus DUT could be prohibitive.

Still, targetless acceleration covers a useful subset of testing


needs – synthesizable IP, synthesizable verification IP and
synthesizable assertions (such as OVL assertions) are
obvious examples. But loops with complex bounds, complex
assertions, class-based structures and software-based tests –
these must stay outside the emulator.

In this same vein, emulation vendors now support common


verification IP, with some limitations, on the emulation
platform; this is accomplished using proven design IP
equivalents in synthesizable model form. Since a lot of V&V,
for example performance testing, will typically replace many IP
with VIP models, this is a very effective way to move major
chunks of functionality to the emulation platform23.

Another breakthrough, first introduced by IKOS, is transaction-


based interfacing. Instead of communicating between
testbench and emulator through bit- or word-level signals, this
method communicates through transactions or protocol
packets as bundled groups of signals24. In the testbench
running on a PC, a high-level model (in C/C++, SystemVerilog
or similar languages) can send and receive these abstracted
packets directly. This function on the host computer is mirrored
on the emulator by a synthesizable bus functional model which
can convert packets to signals and vice-versa, all at emulation
speeds. The real benefit here comes from both sides being
able to stream transaction packets as they become ready
rather than stalling the emulator for each signal/word update;
this greatly reduces impact on overall performance.

Multi-User Support
All these improvements were critically important to make
emulation effective in simulation use-modes but emulator unit
prices (in the millions of dollars) would have severely limited
adoption if systems remained single-user, as they were in the
very early days of emulation. Fully aware of that limitation,
IKOS (subsequently acquired by Mentor in 2002) offered multi-
user support by 199825; within a similar timeframe Quickturn
(subsequently acquired by Cadence in 1998) announced
support for multiple users26. EVE (subsequently acquired by
Synopsys in 2012) announced support in 200927.


But this capability was limited. Usage in ICE mode still tied up
a machine and even when not running in ICE mode, multi-user
still didn’t look like the virtual capability that we have come to
expect from public and private cloud services with support for
remote access, queuing, round-the-clock utilization and
scalable growth in capacity. Stepping up to these requirements
was essential for emulation to become an effective alternative to software-
based simulation which could already run on scalable server
farms and private clouds.

Cadence28 and Mentor29 now support true virtualization


platforms to meet these needs. Users can login from any
location to launch jobs and download results for debug. These
systems support job prioritization, queuing and scheduling and
aim to maximize utilization and throughput, just as you would
expect for conventional jobs in a server/cloud environment.

Since ICE remains an important use-mode, it must continue to


be supported in virtualization. This is accomplished through at
least a couple of methods. Cadence provides an emulation
development kit30 which is a rack-mountable system providing
extension to external devices through any number of ports.
Through this interface, you can add external devices to your
virtualized emulation. The connectivity to external devices is
also virtualized – external devices in this kind of setup can be
shared between different jobs running on the emulator.


Mentor VirtuaLAB co-modeling.

Mentor VirtuaLAB31 offers a software-based solution, still


connecting to real traffic but here through the host system.
VirtuaLAB splits the peripheral between a design IP running on
the emulator and corresponding software stack/application
running on the host. Since everything is virtualized in this
model, again you can run ICE modeling from anywhere in the
world.

There’s debate about the pros and cons of hardware-based


and software-based modeling of external devices; each seems
to have merits in differing contexts. This is a topic I have
touched on in a SemiWiki blog32.


A Change in Approach to Debug33


The great thing about emulators is that they run very quickly.
But they aren’t optimal for interactive debug. In debug, you
want to set triggers and breakpoints. When you hit a
breakpoint, you want to poke around to understand why that
trigger tripped, then you want to trace back to figure out what
caused those events until you hopefully get back to a root-
cause for the problem.

But this takes time. A verification engineer doesn’t think at


MHz speeds and even if they could, all that searching around
until they find the “ah-ha!” bug takes a lot of discovery. You
really don’t want to have an emulator spinning its very
expensive wheels while the engineer is doing all this thinking.
So, by far the most common approach to debug in emulation is
to dump all the debug output to a big file, then do offline
analysis in a (software-based) debugger, which doesn’t
consume utilization dollars from your department budget as
you search.

This raises the obvious question of “how do I know what to


dump, or when to dump it, if I don’t yet know I have a bug?”
The answer comes in two parts: assertions and trace-buffers.

Assertion-based verification has been common for some time


in software development yet started to appear in hardware
design more recently, particularly thanks to formal verification
methods. The basic idea is quite simple. Rather than waiting
for something to go wrong then trying to figure out why, you
create multiple assertions about things you believe should
always be true at certain critical places in the design. For
example, you might create an assertion that a certain buffer
should never overflow. You know that if it does, design
behavior may be bad, or at least unpredictable.
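
For the buffer example, the intent of such an assertion,
expressed here as a C++ check on a simple reference model
(names invented; in a real flow it would be a synthesizable
SVA or OVL assertion bound into the RTL), is simply this:

    #include <cassert>
    #include <cstdio>

    struct FifoModel {
        static constexpr int kDepth = 16;
        int occupancy = 0;

        void push() {
            // The assertion: a push must never arrive while the FIFO is
            // full. In emulation it sits passively and only fires a
            // trigger (and a trace-buffer dump) if ever violated.
            assert(occupancy < kDepth && "FIFO overflow: push while full");
            ++occupancy;
        }
        void pop() {
            assert(occupancy > 0 && "FIFO underflow: pop while empty");
            --occupancy;
        }
    };

    int main() {
        FifoModel fifo;
        for (int i = 0; i < 10; ++i) fifo.push();   // stays below kDepth
        for (int i = 0; i < 10; ++i) fifo.pop();
        std::printf("final occupancy: %d\n", fifo.occupancy);
        return 0;
    }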

These assertions can be written in a form which can be


directly compiled into the emulator. And they can be (and
should be) scattered all over the design, each asserting some
expected behavior in critical areas. A design sporting
assertions like these is a powerful aid to V&V. In emulation,
these assertions typically have negligible impact on emulation
performance; they simply stand to the side waiting for the
possibility that they might be triggered.

On such a trigger (and/or at times of your choosing), trace
buffers in the emulator provide state information over a past
time window
and can be dumped to support subsequent debug. Of course,
there are bounds on what you can trace and for how long, but
these seem to be quite generous in modern emulators. You
can also checkpoint and, if you find you hadn’t traced quite
enough signals to support your debug exploration, you can
restart emulation from a checkpoint, avoiding the need to
restart from scratch.

The result of all this is that again, instead of waiting for


something bad to happen then trying to figure out why (which
is very difficult if you can’t do interactive debug), you preload
the design with assertions on expected behavior. If any of
them trigger, you already have a head-start on a much earlier
root-cause for that bad thing that would eventually have
happened, and you have big debug traces to dig back into why
the assertion failed. All of which you can do in offline debug.
Incidentally, this isn’t just the right way to do debug in an
emulator – it’s a better way to do debug in general.

Chapter 4 – Accelerating Software-Based Verification
The Rise of Software in Electronics34
Back in the mists of time, chips were either computers, such
as the microprocessors (uPs) at the heart of your PC, or
microcontrollers (uCs, a scaled-down version of a
microprocessor), or they were something more dedicated and
hardwired. All the intelligence, driven by software, was in the
uP/uC and all the specialized work (perhaps controlling a disk
drive or a terminal) was handled by chips needing little or no
software. In design, modelling interaction with software was
limited to those few uP/uC devices.

But that all changed with the emergence of embedded


computers, particularly those from ARM35. Small processors
sitting on a chip can supervise complex functionality and move
data around, allowing complex systems to be integrated onto a
single chip where previously they would be implemented with
separate components on a PC board. Since this integration
dramatically reduces the cost and size of the system,
increases reliability and performance and reduces power
consumption, access to these capabilities created a stampede
towards integration with increasingly sophisticated software
control.

But now modelling software with the hardware cannot be


limited to testing a few uP/uC devices. Virtually every large
chip contains at least one, and often multiple, embedded computers,
so every one of those chip designs must be verified with
software.


Why Software is Important in Verification36


When we start to think of new electronic products, we don’t
think in terms of hardware or software. We think instead about
the functionality we want and how we can maximize the
appeal and profitability of that product by maximizing usability
and minimizing constraints and cost.

Doing this effectively requires careful tradeoffs between what


is implemented in software versus hardware. Some common
principles have emerged in SoC design to guide these
choices. For example, hardware is faster than software, so you
want to implement lots of functional blocks in hardware. But
software is preferable when you need to support evolving
functionality - apps obviously, but also software in support of
embedded needs, such as managing the protocol part of
communication. Sometimes you need both. In managing
power consumption in a complex product like a smartphone,
some power management is controlled by requests from the
operating system and some is controlled directly by the
hardware. But what ultimately matters for the product, and
what requires V&V, is the functionality of the total system. So
here’s one reason you have to verify software and hardware
together.

Another consideration is the way we manage flows of data


around the chip. Data can flow in and out from USB, memory
and touch screen interfaces and within the device from CPUs,
GPUs, DSPs and other functions. As you are looking at a
video on the web, listening to an MP3 and connecting to a
computer through a USB connection, these functions are
competing for attention; this could lead to severe performance
problems or
even crashes if your design is unprepared to handle these
levels of concurrent traffic.

But it would be impossible to design for any possible level of


load on the system, independent of software and external data
activity. A realistic design must make tradeoffs, so you have to
verify against whatever you consider would be reasonable
loads. Which again means you must verify the hardware
together with software.

Emulation and Hardware/Software Co-verification


Handling co-verification is one place that software-based
simulation must surrender. The software part of a test could in
principle run on a virtual model on a host computer,
communicating with the rest of the emulation model through a
transaction-based interface. But performance would be
completely unworkable. These tests often need to run millions
to billions of cycles. At an effective speed of say 10Hz, even a
million cycles would take more than a day. Booting an OS
could take years!
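
(To put rough numbers on that arithmetic: a million cycles at
10Hz is 10^5 seconds, a little over a day, and an OS boot
needing on the order of 10^9 to 10^10 cycles would take
roughly three to thirty years at the same rate.)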

Emulation is a much more workable solution (also with the


software part of the test running on a host computer), as has
been demonstrated by Linux, Android and iOS boots in a
matter of hours. Of course, FPGA-based prototyping will be
even faster, but emulation is often preferred when hardware
debug access is most important.

While emulation is a good solution here, it too has limits and


that means care must be taken when considering what
software should be included in the emulation-based testing.


All Software isn’t Equally Important to V&V


In an ideal world you would run all software, from low-level
drivers which provide direct control of hardware functions,
up through the operating system (OS, perhaps Linux), the user
interface (often Android or iOS) to all the applications that
might run on the device (mapping, music, phone, email,
gaming, ..). But this level of testing would make for
impractically long run times in emulation. In practice, given
agreed interfaces and software testing at the application level,
these apps and even lower-levels of software often don’t have
to be included in hardware V&V.

It is practical to boot an OS and even the user interface in


emulation, both of which are important for validating the
interaction between hardware and software. For example,
Linux and Android can boot in a small number of hours. This
can be a little slow for heavy-duty regression testing so it might
(late in design) get an assist from FPGA prototyping with a
transition to emulation for more detailed debug, or in earlier
stages run in “big-job” regressions only after all other tests
have passed.

But there’s a lot of important software that can be tested


frequently in regressions, generally known as “bare-metal”
software which is very closely coupled to the hardware. This
includes firmware and drivers to connect to each of the
addressable functions in the device, such as the GPU and the
USB interface as well as support functions like debug and
power management. This may be complemented by stripped-
down variants of the OS and user interface, with added
instrumentation to support more detailed V&V in areas of
interest to this design.

Debugging in Hardware/Software Co-verification37 38 39


Debug in software-driven verification necessarily becomes
more complex. Now you need to be able to chase problems
through both software and hardware to identify root causes. All
the emulation providers have invested significantly in providing
combined hardware and software debugger capabilities to
simplify this analysis. This includes methods to view simulation
waveforms and transactions, along with memory, registers and
stacks, the ability to set breakpoints and triggers, and all the other
features you would expect in both hardware and software
debuggers. The days of debug using monitor statements and
simulation logs are far behind us - well-designed debuggers
have become fundamental to effective debug in
hardware/software co-verification.


Chapter 5 – Beyond Traditional Verification


Once you have fast emulation for functional modeling, new
opportunities emerge. Some of these are extensions of
analyses which are possible, if much slower, with traditional
simulation. Others are capabilities which would have been
impractical in simulation but with much higher performance
become feasible. We’ll explore a few here.

Performance Modeling40 41
The goal in performance modeling is to determine that the
system performs well across a wide range of system use-
cases and particularly under heavy loads competing for
resources. Doing this sort of analysis is feasible to a limited
extent using software-based simulation but requires that many
of the components in the design be modeled by verification IP
or bus-functional models to reduce the size of the simulation.
The danger in that approach is that use of abstracted models
may hide potential problems in interaction between fully
implemented IPs. Verifying performance with a full model in
emulation reduces the chance that you’ll miss unusual
implementation interactions which may drag performance
down in corner-case usage.

Power Modeling42 43 44
Power has become at least as important a product differentiator as performance and, in many cases, has become more important. In earlier times, you could put best/worst/typical power values for each block together in a spreadsheet, along with crude approximations of typical use-cases, to get an overall estimate of power consumption. But that doesn't cut it anymore. One problem with this approach is that it can only give insight into average power. Peak power is also very important: it has an impact on localized heating and voltage drop in power rails, both of which can cause nominally safe timing paths to fail. Peak power can also reduce reliability through electromigration in insufficiently robust power (or signal) routing.

The ideal way to model power is to do so dynamically, so you can see both averages and peaks over a wide variety of use cases. Again, emulation is the best way to do this, but there's a wrinkle. Power must be estimated by summing switching and interconnect power for each node toggle in the design, together with leakage power for each “on” node in the design, where the scaling factors for each of these are pulled from library files. The calculation is not complex, but it needs to be performed across each node on each clock tick, requiring state access to each node in the emulation on each cycle, which would, in a crude implementation, slow emulation significantly.

In all solutions I have seen, power is computed offline rather than in the emulator (all those floating-point multiply/add computations would be far too slow to couple directly to emulation). Vendors who support these flows have streaming methods to output the required data to this computation, while minimizing impact on emulation performance.
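
As a concrete, if toy, illustration of what that offline calculation looks like, here is a minimal sketch in Python. The library numbers, node names, clock frequency and trace format are invented for illustration only; real flows read scaling factors from library files and receive toggle/state data from the emulator in vendor-specific streaming formats.

    # Toy offline power estimate from a streamed toggle/state trace.
    LIB = {
        "cpu.alu.n1":  {"e_toggle_pj": 0.8, "p_leak_uw": 0.02},
        "cpu.alu.n2":  {"e_toggle_pj": 1.1, "p_leak_uw": 0.03},
        "gpu.core.n7": {"e_toggle_pj": 2.4, "p_leak_uw": 0.05},
    }
    CLOCK_PERIOD_NS = 2.0  # assumed 500 MHz clock

    def cycle_power_uw(toggled, on_nodes):
        """Power for one clock cycle, in microwatts."""
        # Dynamic: energy of every toggle, spread over one clock period
        # (pJ per ns is mW, so scale by 1e3 to get uW).
        dyn = sum(LIB[n]["e_toggle_pj"] for n in toggled) / CLOCK_PERIOD_NS * 1e3
        # Leakage: every node that is powered on leaks for the whole cycle.
        leak = sum(LIB[n]["p_leak_uw"] for n in on_nodes)
        return dyn + leak

    # A three-cycle trace: (nodes that toggled, nodes that are "on").
    trace = [
        ({"cpu.alu.n1"}, {"cpu.alu.n1", "cpu.alu.n2"}),
        ({"cpu.alu.n1", "cpu.alu.n2", "gpu.core.n7"}, set(LIB)),
        (set(), {"gpu.core.n7"}),
    ]
    per_cycle = [cycle_power_uw(t, on) for t, on in trace]
    print("average uW:", sum(per_cycle) / len(per_cycle))
    print("peak uW:   ", max(per_cycle))

Because the per-cycle values are kept, the same data gives both the average power and the peaks that matter for voltage drop and thermal analysis.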

Test Verification45
Design for Test (DFT) has become a very complex part of modern designs. You have scan test, built-in self-test (BIST) and test decompression logic. This very complex logic, woven in among all the other logic in the design, must be verified, just as you must verify the normal (mission) functional mode of the design. Cycling through scan testing, compression and the complex and lengthy sequences implicit in BIST methods has already become unreasonably time-consuming in software-based simulation, as indicated by the growing popularity of static approaches to test-logic verification. But static methods can only provide limited coverage. Dynamic verification is still required, just as it is for functional verification; emulation can accelerate this objective by factors of thousands or more, making functional verification of DFT logic a realistic option even for large designs.

Deterministic ICE46
The value of ICE is in dealing with realistic traffic, but that traffic isn't necessarily deterministic, so if you find a bug, it may be difficult to trace back and pin down the root cause. Running in a mode which also captures traffic provides a way to enable deterministic replay.

In this mode, emulation internal state and external interface input are captured during the course of a normal ICE run. Deterministic ICE support then makes it easy to replay a run based on the captured data. On a replay, you always see exactly the same data seen in the real ICE run, so debug is predictable, whereas simply rerunning the ICE run may not reproduce a bug if it was data- or environment-dependent. You get all the advantages of ICE in catching these rare problems, with all the advantages of determinism in debug to isolate the root cause of a problem.
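
A minimal sketch of the record/replay idea is shown below. This is purely illustrative Python with hypothetical class names; real deterministic-ICE support captures interface traffic and internal state inside the emulator and its run-time software, not in a testbench layer like this.

    import random

    class Recorder:
        """Capture the stimulus seen on an external interface, cycle by cycle."""
        def __init__(self, live_source):
            self.live_source = live_source   # nondeterministic ICE traffic
            self.log = []                    # captured stimulus

        def next_stimulus(self):
            value = self.live_source()
            self.log.append(value)
            return value

    class Replayer:
        """Re-drive exactly the captured stimulus on a later debug run."""
        def __init__(self, log):
            self._it = iter(log)

        def next_stimulus(self):
            return next(self._it)

    def run_dut(stim_source, cycles):
        # Stand-in for the emulated design: its response depends only on stimulus.
        return [stim_source.next_stimulus() ^ 0xA5 for _ in range(cycles)]

    live = Recorder(lambda: random.randint(0, 255))   # "real" ICE traffic
    first_run = run_dut(live, 8)

    replay = Replayer(live.log)                       # deterministic replay
    second_run = run_dut(replay, 8)
    assert first_run == second_run                    # same behavior every time

The point of the sketch is the contract: once the stimulus is captured, every replay reproduces the original run exactly, so a rare, data-dependent bug can be revisited as many times as debug requires.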


Complex Load Modeling47


There are cases where you really need to model realistic loads, but even ICE modeling would be too complex. When you're building a big network switch with 128 or more ports, and you need to model realistic voice, video, data and wireless traffic in multiple protocols (and software-defined networking) at variable bandwidths over all those ports, setting up an ICE environment would be, if perhaps not impossible, at least extremely expensive and challenging.

In fact, testing big switches in realistic environments is such a big problem that companies have emerged to provide software solutions to model those environments for the express purpose of testing networking solutions. There are several companies in this class (known as network emulation software providers). One such company is Ixia; Mentor has partnered with Ixia to connect their IxNetwork Virtual Edition, through the Veloce Virtual Network App, to Veloce Emulation48.


Chapter 6 – The Role of Emulation in Design Today
From time to time, debate surges on whether simulation’s days
are over, to be replaced by a combination of emulation and
static/formal analysis49. The topic is popular because it ignites
entertaining debates around the pros and cons of different
methods of modeling. But industry feedback consistently
supports a view that multiple different tools for verification are
essential and are likely to remain so for the foreseeable future.

It might seem that we are too easily accepting a confusing mix of tools and methodologies. Surely if we could reduce this set to one, or at most two, tools, then costs, training and overall efficiency in V&V could be optimized? In fact, the different needs at different stages of architecture and design seem impossible to reconcile into one or two tools. It is worth understanding why, and why emulation providers work hard to provide seamless interoperability between these different tools and flows in support of V&V.

Virtual Prototyping and Emulation


Virtual prototyping may be new to some readers. Imperas50
provides one popular option; EDA vendors also have offerings
in this space. Toolsets like this are most commonly used for
embedded software design and development when the
hardware platform is not yet available. They start with
instruction-accurate models of the underlying hardware,
sufficiently accurate that software running on top of the model
cannot tell it is not running on the real system, but the model is
heavily abstracted to support running at high performance.


Frequently these systems use Just-In-Time (JIT) translation to model the processor while executing the software load, which is why they can't afford to model much detail in the hardware.

Virtual prototypes can run OS and application software at near real-time speed, which is obviously much more effective for software development than the ~1-2MHz speed typical of emulation, or even the speeds offered by FPGA prototyping. And virtual models can be ready for use very early in design planning, unlike FPGA prototypes. But since these prototypes have very limited understanding of the detailed hardware architecture, they provide little useful feedback on how the real hardware will interact with the software.

A hybrid model can bridge the gap51. One application is to accelerate adaptation of earlier generations of firmware and OS to new hardware, though it is not always clear how widely this early prototyping use-model is adopted in practice, largely because software teams often lack cycles to plan for the next design while they're busy wrapping up work for the last one. A much more actively used approach is to support software-based testing of hardware, where software stacks for embedded CPUs run on the virtual model, linked to an emulation of the rest of the hardware.

Simulation and Emulation

It might seem natural that simulation should eventually be replaced by emulation. After all, each is predominantly valuable during design implementation (between architecture design and tapeout), and emulation runs orders of magnitude faster than simulation. But industry veterans in V&V think differently. They see these solutions as having complementary strengths, only some of which can be consolidated into one solution.

Let's start with performance. Emulation has a huge advantage in run-time, which makes it essential for modeling complete SoC-scale designs and for running software-based verification. Simulation, even accelerated simulation, cannot compete at this level. Speed is also very important when running regressions over significant banks of compliance/compatibility test-suites, a need particularly common for microprocessor, GPU and other similar systems. And emulation can connect to real-world devices (through speed-bridges) in ICE mode, a level of verification accuracy that is typically difficult for simulation.

But emulation does not model Z (high-impedance) or X (unknown) states. Modeling these states is often important to correctly analyze bi-directional logic and tristate busses, or to detect state ambiguities which may arise from inadequate reset logic. Emulators are based on 2-state logic (0 and 1), just like real circuits; extending emulation to handle these extra states would demand significantly greater hardware capacity, making the cost very unattractive.

Emulators also can't model detailed timing, which remains important for some analyses, particularly in checking certain aspects of gate-level timing. Another significant limitation is that emulation cannot handle mixed-signal (analog and digital) modeling. Since most large SoCs contain some level of analog interfacing, mixed-signal simulation will continue to be essential in some very important cases. One example is the need to verify training mechanisms for DDR interfaces.


Additionally, simulation is still viewed as easier to use, more flexible and, in the early to mid-stages of design, more cost-effective than emulation. This is certainly true for small blocks but also applies in full system modeling when taking a widely-used layered approach to V&V. Compiling for emulation takes a while (a day for a large design), so it's inefficient to use that platform to find bugs you could find and correct more quickly in simulation. Basic issues such as incorrectly connected clocks and resets, incorrectly set address maps in the bus, or incorrectly connected request and grant signals can be found quickly in simulation. That leaves emulation to do what it does best - finding the hard, subtle bugs that come up in complex interactions between the software, hardware and realistic external devices.

Some platforms provide methods to “hot-swap” back and forth between simulation and emulation, so you can use the speed of emulation to get to important areas for test, then use the simulator to dive down into more flexible debug.

Given these complementary strengths, an increasingly popular idea is to leverage both in a mixed approach commonly called simulation acceleration or in-circuit acceleration, where simulation acts as the master and emulation acts as a slave that accelerates some component(s) in the design or provides more realistic modeling through ICE interfaces52 53. The performance difference between the two platforms must be managed, typically through a transaction-based modeling technique such as SCE-MI54. Here, instead of communicating signal changes between the two platforms, you bundle and communicate multi-cycle transactions55, an option now supported on most simulation and emulation platforms.
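
The flavor of that bundling can be sketched in a few lines of Python. This is an analogy only, not the SCE-MI API: instead of one message per signal change per cycle, the testbench side packs a whole multi-cycle operation into a single message, and a transactor on the emulator side unrolls it back into cycle-level activity.

    # Signal-level style: one message per signal change per cycle (slow link).
    def drive_signal_level(send, addr, data_words):
        for i, word in enumerate(data_words):
            send({"cycle": i, "valid": 1, "addr": addr + 4 * i, "wdata": word})

    # Transaction-level style: one message for the whole burst (fast link).
    def drive_transaction_level(send, addr, data_words):
        send({"kind": "WRITE_BURST", "addr": addr, "data": list(data_words)})

    def transactor_unroll(txn):
        """Emulator-side transactor expands the burst back into cycle activity."""
        return [
            {"cycle": i, "valid": 1, "addr": txn["addr"] + 4 * i, "wdata": w}
            for i, w in enumerate(txn["data"])
        ]

    sent = []
    drive_transaction_level(sent.append, 0x1000, [1, 2, 3, 4])
    assert len(sent) == 1                        # one link crossing, not four
    assert len(transactor_unroll(sent[0])) == 4  # same cycle-level behavior

The win is in the number of crossings of the slow simulator-emulator link: one per transaction rather than one (or more) per cycle.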


I should add that I have also heard rumblings of co-modeling with mixed-signal simulation, further underlining the value of these hybrid approaches.

Emulation and FPGA Prototyping


FPGA prototyping is a late-stage option for design. Even in the best cases, you can be looking at several weeks to a month to set up a prototype, not something you want to be doing when the design is evolving rapidly. A prototype of this kind is primarily valuable in the relatively late stages of system and software development, and primarily for software validation rather than for detailed hardware verification.

For V&V of the hardware design (especially driven by software testing), emulation will dominate, thanks to faster compile times and superior internal visibility for debug. Emulation is also a multi-user resource, especially in virtualized configurations, which is a must-have to support large verification teams, whereas an FPGA prototype is intrinsically limited to a single user at any one time and so necessarily has more restricted usage.

But there's a very useful hybrid mode of operation in which an FPGA prototype is used to get quickly to an interesting point at which to start detailed hardware debug. From that point, a snapshot of memory and other important state can be transferred to an emulator, where analysis can continue with the greater internal visibility needed for that debug.

The value of this approach rests on a common observation - most of the important areas to test (in the hardware) are found after you've got through setup/boot, especially problems in performance and unexpected corner-case behaviors. So a fast, low-visibility path to get to a useful place to debug is not much of a compromise.

Indeed, throughout verification flows, hybrid operation as described in this chapter is becoming an essential way to transition between more abstract and more detailed models, between software-driven and simulation-driven testing, and between slower and faster platforms.

The Outlook for Emulation


Clearly emulation isn’t going to obsolete other forms of
verification, but it doesn’t need to. Adoption continues to grow,
as has become apparent in the growth of reported revenues
for this segment56 and in customer pressure on vendors to
support virtual operation. Growth seems to be predominantly
in mid- to late-stage hardware design, which fits with
expectations that reasonably stable designs are not so
dependent on ultra-fast compile times and that you can get
more value out of multiple verification runs before needing to
load an updated/fixed design.

Hardware emulation is here to stay. It seems reasonable to expect continued advances in capacity, performance and capability, and increasingly strong and transparent links, not only to other tools in the chip verification flow, but also to more system development and verification tools, as Mentor has already demonstrated with their link to Ixia. Other verification methods will continue to be important in a complete V&V flow for hardware design, and close interoperability between all these methods will be increasingly important to streamline verification.


Storage Market and Emulation Case Study

Shakeel Jeeawoody, Emulation Strategic Alliances, Mentor Graphics

For most of us, it is hard to imagine a world without storage capabilities. Storage devices are a must-have for retaining business-critical documents and family pictures. Over the years, storage media have evolved from magnetic tape to floppy disk, hard disk, CD, DAT, DVD and Compact Flash. In fact, the storage evolution is still happening and the pace of innovation just keeps getting faster.

The evolution of data storage.

A better grasp of the dynamics of this market comes from looking at how the technology has evolved and how its use is growing. Initially, storage was used for backups. Now it is, to a large degree, moving toward frequent and unstructured data storage, processing and retrieval. This results in a completely new class of performance, reliability and capacity requirements. Additionally, the skyrocketing cost of managing storage systems that comply with legislation for secure information handling has expanded the services business model.

The market demands that huge amounts of data and information be stored securely and be accessible anywhere and anytime. This requirement is driving the adoption of key technologies and use models. The capacity, size and performance of Solid State Drives (SSDs) make them a very interesting technology for future use. Moreover, the cloud is making storage more convenient and easier to use. Non-Volatile Memory Express (NVMe) over Ethernet or Fibre Channel is becoming a leading solution for connecting appliances to servers.

Current research into new media continues with holographic, memristor and even DNA storage. Scientists can already encode text, images, video and even operating systems in DNA. It unlocks enormous potential for video streaming while retaining compactness and durability, and the researchers believe that DNA will neither degrade over time nor become obsolete.

Having said that, we are far from having these advanced technologies in production.

The question is how to solve today's challenges with the existing set of tools.

State of Storage
Current leading HDD and SSD Storage Technologies
According to GSMAintelligence.com, newly created digital data is doubling every two years. This means increasing amounts of storage must be available at the same pace. As reported by Statista, Hard Disk Drives (HDDs) continue to be a dominant source of bits shipped, but SSDs are on the growth curve. However, hardware challenges for both HDD and SSD are substantial – and increasing.

Manufacturing hard drives demands a very high investment, in the hundreds of millions of dollars, for clean rooms, ultra-precise robotics and experienced employees. At one time, there were dozens of companies competing. Now there are three: Seagate, Toshiba and Western Digital. Entering this business is prohibitively expensive, and for many reasons, the future of the market is difficult to predict. In addition, all of the patents are in the hands of current manufacturers.

Creating an SSD has fewer financial barriers to entry than manufacturing a magnetic hard disk drive. Many SSD companies combine flash chips with a controller of their own design, or one acquired from an outside company - an achievable business model for a wide variety of companies with limited resources. The key differentiator is not the media (flash), which is available from several sources (Intel/Micron, Samsung, Toshiba/Western Digital, SK Hynix and Powerchip).


SSD controller market – key players. Courtesy: Mentor

The lynchpin component is the controller. Each controller requires an algorithm and firmware (FW) to manage the complexities of writing and reading the various types of flash. This media is changing rapidly: NAND, 3D NAND, 3D XPoint and other future technologies.

Competition is fierce, with a battleground looking much like the one in the HDD industry prior to the consolidation that resulted in the big three.

HDD Controllers and Associated Verification Challenges
Hard disk controllers are complex in their own way, with mixed-signal electronics, the usual power and performance constraints, and the difficulty of integrating and debugging 3rd party IP, such as the write/read channel and digital back-end with the preamp and servo mechanism.


Typical HDD controller SoC verification considerations and challenges. Courtesy: Mentor

Verification engineers must ensure that the complete system works together, with firmware, to claim that a design is verified (see illustration below).

Hard disk controllers are complex with mixed-signal electronics, the usual power and performance constraints, and difficulty integrating and debugging third party IP. Courtesy: Mentor.


SSD Controllers and Associated Verification Challenges

As the industry moves to SSD, the controller faces a surprisingly high number of completely different challenges. Performance of the back-end NAND channels can now saturate a PCIe bus, which was never the case with a spinning disk. This, in turn, requires accurate architectural modeling to ensure that power and performance trade-off decisions meet requirements.

SSD controllers add complexities all their own that need to be managed. Courtesy: Mentor

Managing NAND, in all its various types, requires complex wear leveling, table management and garbage collection, in addition to all the interface requirements of a hard drive - security, compression and error correction code (ECC).
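
To make the housekeeping concrete, the toy sketch below shows the flavor of two of these algorithms - wear-aware block selection and a greedy garbage-collection victim choice - stripped of all real-world complications. The data structures and policies are illustrative only, not those of any particular controller.

    # Toy flash-translation-layer policies; real FTLs are far more involved.
    blocks = {
        0: {"erase_count": 12, "valid_pages": 30, "invalid_pages": 34},
        1: {"erase_count":  3, "valid_pages": 44, "invalid_pages": 20},
        2: {"erase_count":  7, "valid_pages": 10, "invalid_pages": 54},
    }

    def pick_block_for_write(blocks):
        # Wear leveling: prefer the block that has been erased least often.
        return min(blocks, key=lambda b: blocks[b]["erase_count"])

    def pick_gc_victim(blocks):
        # Greedy garbage collection: reclaim the block with the most stale
        # pages (fewest valid pages to copy elsewhere before erasing).
        return max(blocks, key=lambda b: blocks[b]["invalid_pages"])

    print("write to block", pick_block_for_write(blocks))    # block 1
    print("garbage-collect block", pick_gc_victim(blocks))   # block 2

Even in this stripped-down form, it is clear why these policies interact with host traffic and why their behavior only shows up under realistic, sustained workloads.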


System verification complexity has hit the wall. Courtesy: Mentor

Typical Verification Flow/Methodology


Pre-silicon Verification
Verification engineers typically take a bottom-up approach.
They sequentially create block level, sub-module level, module
level and finally system-level verification. This approach, at
least to the module level, works well when done in-house
where teams are located near each other.

As designs get bigger and complexity grows, pieces of the design are purchased from 3rd party vendors, which leads to inevitable integration issues. Furthermore, system-level simulation is no longer a viable alternative: it takes too long and, as such, is not effective.

In some cases, FPGA prototyping for hardware verification and early software development is a tempting solution, but it means time needs to be budgeted to get to a working board. FPGA prototyping, with some care, can ensure proper functionality; however, schedule delays are possible due to partitioning and debugging issues that can turn into a nightmare. Too often, the verification that should be done early enough to enable design changes and trade-off decisions is pushed later in the schedule, where the FPGA acts as a catchall. The later a bug is identified, the more expensive it is to fix. Finding an architectural problem too late can be a project killer.

Post-silicon Verification
When the chip is back, engineers usually test the design based on the type of adapter being used - PCIe, SATA or SAS.

Testing scenario in post-silicon phase. Courtesy: Mentor

They eventually connect to the end-customer library for final validation.

Gaps in Current Methodology

Simulation and FPGA Prototyping Methodologies
Simulation has full visibility into design registers and nets, but as design size increases, it becomes prohibitive to do full system-level simulation. Verification engineers note that it takes too long, that it is tough to create corner cases in a reasonable timeframe due to lack of stimulus, and that it is difficult to integrate FW into the design and have it behave as expected.

On the other hand, FPGA prototyping, when used for validation, can exercise the full system. Unfortunately, FPGA prototyping has limited visibility, making it very hard to debug design issues. Additionally, the FPGA board connects to external hosts but cannot run at full speed. The FPGA prototype is good for testing a backed-up datapath (because it runs slowly), but it cannot verify full-speed connections.

Additionally, there are no easy ways to measure real performance until the system-on-chip (SoC) is in the complete system with real firmware. In the meantime, verification engineers can estimate performance, but they habitually miss something.

To check for optimal characteristics based on the shipping configurations, engineers choose to implement A/B testing, or split testing. This means running verification for different NAND parts, different drive sizes, different configurations and different connectivity. This is possible, but challenging, with FPGA prototypes, and close to impossible with simulation.

Is the Verification Gap Getting Better?
Actually, no. But what causes this?

The nature of flash technologies creates some interesting challenges that need to be managed *hours* after the drive is first powered up, after one or more drive-fills.


NAND-based SSS performance states for 8 devices (RND 4KiB writes). Courtesy: Mentor

This new reality of drive performance makes simulating a complete system nearly impossible with traditional methods. It is usually only done for the first time with actual drive hardware and firmware, or with models that pre-load a possible drive state to create interesting test cases. This can lead to some unpleasant surprises the first time the drive integration is done.

To solve this, it is important to do performance testing and A/B configuration testing as early as possible. This determines whether the proposed architecture and design live up to their promise.

The option to measure the SSD controller's ability to do garbage collection while concurrently writing and reading gives a better indication of real-world performance than the usual watered-down estimation methods of most system-level SSD pre-silicon tests.

Power optimization, security and compression are more important than ever because the need for secure storage, using less power, is imperative in the data center.

Is increasing Firmware (FW) helping current methodologies?
As firmware increases in size and scope, FW development and verification need to start earlier, typically happening concurrently with HW design.

Firmware increasing in size and scope. Courtesy: Mentor


Verification engineers are adapting to this complexity by changing
the flow:
1) Start FW development concurrent with HW development to find
bugs prior to tapeout.
2) Create a plan to test FW with the actual HW (in FPGA or
emulation) prior to tapeout, to speed up both FW and HW
development and testing. Assume both will be in development at
one time, and plan accordingly for debug.
3) Plan for a certain level of design maturity and system-level
testing prior to tapeout, otherwise one WILL spin the chip.

Is Hardware Emulation a Viable Option?


Hardware Emulation in the flow
Clearly, hardware emulation is a viable option for many parts
of the verification flow.


Three foundations of storage verification. Courtesy: Mentor

In general, simulation gives full visibility, lets engineers easily force error conditions and verify design blocks, and provides visibility into bugs found in the lab. On the other hand, FPGA prototyping allows for faster and more extensive testing, allows controller connections to external hardware used in or by the drive, and allows firmware test development with real hardware, but it has limited visibility for debug and is much less flexible.

Emulation spans the gap between simulation and FPGA prototyping: it is faster than simulation, provides more visibility than the FPGA, allows controller connections to external hardware used in or by the drive, runs real firmware, creates confidence before the FPGA prototype is available, and enables the same setup for both pre- and post-silicon verification. In addition, hardware emulation works in concert with FPGA prototyping for more effective debug and root-cause analysis.

In-Circuit-Emulation (ICE) Mode


In ICE mode, an emulator connects with physical interface
devices that allow the execution and debug of embedded
systems.

ICE-mode emulator setup with physical interface devices. Courtesy: Mentor

The debugger adaptor enables processor debug of the firmware (rather than just of a model) and supports use-case validation. The host speed adapter (e.g. PCIe, SATA or SAS) allows a connection to host testers while reusing existing test scripts. This setup enables designers to develop, test and debug the full test suite against the SoC. Verification engineers find SoC bugs prior to tapeout, reducing the number of tapeouts required to ship product.

In ICE mode, an emulator setup for a storage controller might look like this:


Emulation setup for a storage controller.

There is physical cabling from the emulator to external hardware, in this case the Veloce iSolve PCIe speed adapter, which translates real-world host traffic to emulator speeds. An external daughter-card populated with the NAND devices targeted by the controller is connected, allowing testing with the latest NAND flash. NOR and DRAM memory models are connected, along with a physical JTAG connection allowing software debug and step-through of FW code. The bulk of the design under test (DUT) is in RTL and still contained within the emulator. With an ICE approach, similar to FPGA prototyping, the system being verified is no longer a one-to-one representation of the final system, as the emulator cannot run as fast as the peripherals connected to it.

Some limitations of this approach should be obvious. We have to create a daughter card for the NAND device, and that device needs to be available.

Clocking the system is now a complex process, and stopping clocks to external devices can have negative and unintended consequences. Many times these consequences keep the external devices from operating properly.

Limitations of speed adapters prevent running the host and the device at the same speed, possibly hiding some timing issues. They also make it extremely difficult to inject errors, as the speed adapter does not pass most errors generated from the host to the device.

However, the major limitation comes down to the pain of configuring the system to target different media. A single SSD controller is designed to support multiple SSD capacities by adding or removing NAND channels, or by selecting different NAND chips, which will have different numbers of chip selects, planes and blocks, and possibly even different pages per block and page sizes. Enabling testing support for all of the identified configurations requires, at a minimum, an external card with socketed support for different NAND chips, plus someone to physically swap out NAND chips when switching to a different configuration. This testing is difficult if not impossible to automate, slowing down testing of different configurations. Additionally, controller development is more likely a single project set into a roadmap of multiple controllers, where supporting the current controller and planning for the next is key. That requires support for multiple generations of NAND flash, which further complicates ICE-mode testing.
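
The combinatorics behind that are easier to see when the NAND geometry is written down as a parameter set. The sketch below uses hypothetical field names; it captures the kind of configuration that has to be swapped physically in ICE mode but is just data in a virtual setup.

    from dataclasses import dataclass

    @dataclass
    class NandConfig:
        channels: int          # back-end NAND channels on the controller
        chip_selects: int      # dies per channel
        planes: int            # planes per die
        blocks: int            # blocks per plane
        pages_per_block: int
        page_size_bytes: int

        def capacity_bytes(self):
            return (self.channels * self.chip_selects * self.planes *
                    self.blocks * self.pages_per_block * self.page_size_bytes)

    # Two of the many configurations a single controller may have to support.
    small = NandConfig(channels=4,  chip_selects=2, planes=2,
                       blocks=1024, pages_per_block=256, page_size_bytes=16384)
    large = NandConfig(channels=16, chip_selects=4, planes=4,
                       blocks=2048, pages_per_block=512, page_size_bytes=16384)

    for cfg in (small, large):
        print(f"{cfg.channels}-channel build: {cfg.capacity_bytes() / 2**30:.0f} GiB raw")

Every field in this structure is something that, in ICE mode, implies a different physical part or daughter card; in a virtual flow it is just another compile-time parameter.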

The last big problem is that firmware has become a major part of the functionality delivered in an SSD. The whole system needs to work together to qualify as a product, so testing just the controller without the associated firmware leaves big holes in the verification and validation methodology. It is clear that firmware has become the focus of SSD success, and it has also taken the lead in the number of engineers required and in schedule time. The hardware and firmware schedules can no longer be serialized if a project is to be competitive. Instead, firmware development needs to start when the hardware is still lacking maturity, and both will need to be debugged together as a system.

Fortunately, there is a way to address each of these problems, which brings us to a virtual environment.

Virtual Emulation Mode
In general, a virtual solution replaces a physical implementation with a model. A purely virtual solution only uses models - no cabling from the emulator to a physical device.

Some companies have had notable success with an approach that virtualizes the parts of the controller that are well understood at an interface level, which allows for much greater flexibility in making design and architectural changes - all while providing greater visibility into the DUT.

If the host design is PCIe/NVMe, for example, the interface itself is standardized and well-known. If we could stimulate that interface in system simulation with something configurable enough to hit corner cases, yet simple enough to make bring-up doable in a very short time-frame (without writing an entire testbench to re-invent the wheel), then that would cover a major portion of the controller testing.

At the same time, the NAND interfaces (both Toggle and ONFI) are well-known, but the underlying 3D NAND technology and device physics are highly complex, and probably still under development if your controller is forward-looking. That means the target device probably does not even exist, and there is only an early specification. However, if a model exists for that device, the same process done on the host interface can be done with the NAND interface: simply drop in a model replacement.

Remember the quote from George Box: “All models are wrong, some are useful.” Accepting that the model does not represent the device hardware exactly, the question remains: is the model good enough? To answer it, some empirical data is useful. One company started their production firmware development at the same time as the hardware. They used Veloce soft models to emulate the NAND devices, and they found that firmware that passed on the emulator (using soft models for NAND) had first-pass success when run on the real chip. By design, the DUT on the emulator is identical to the chip.

The fact is, a well-designed model speeds development time and moves firmware integration so early in the project that any model differences with the actual NAND device, or with the physical host, are trivial compared to the time lost implementing a physical ICE-based system.

Illustrating this now-virtualized environment, we have a new picture of the system:


Emulation Deployment on Veloce2. Courtesy: Mentor

Using virtual mode and testing a virtual system also allows the
emulator to be part of a data-center use model, enabling
engineers to run simulations from their desk and share the
emulator with multiple users at the same time.

Firmware development in a virtual environment can start at the same time as design creation. Traditional storage firmware development and testing starts in earnest when the silicon is in the lab, but successful companies have proven that software-driven design flows let firmware work start with hardware definition. When this happens, the overall design time and the time-to-market shrink.

The greatest advantage of virtualizing your verification is flexibility. Hard drive controller development rarely attempted to target multiple variants of spinning media; however, that is exactly what SSD controllers must do. The ability to re-configure a design to target a completely new NAND device, and to get accurate performance data prior to silicon, gives controller teams the advantage.


Implementing an SSD Controller in Veloce™ (Mentor Graphics' Hardware Emulator)

Creating a Verification or Validation Environment
Several steps need to be followed to convert an existing environment for testing an SSD controller on Veloce (see the deployment diagram above), or to create a new environment for enhanced testing.

The host interface most likely requires the most modification to get an environment running. It is best to use a host VirtuaLAB solution, which connects into a QEMU environment and allows the user to run existing applications that may already be available. This can include test scripts, performance measurement applications, plus any other host-related exerciser scripts. It is highly recommended to reuse the existing test scripts, created for previous products, that are part of most SSD drive developers' regression suites. Users can also run off-the-shelf performance measurement software to measure performance and identify bottlenecks within the design. All of this creates a better first tapeout and reduces the likelihood, or number, of subsequent controller spins.
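
As an example of the kind of host-side exerciser that can be reused unchanged, a simple block-level write/read/verify script with a rough throughput figure might look like the sketch below. The device path and sizes are placeholders; in a VirtuaLAB/QEMU setup the same kind of script runs inside the guest against the emulated controller instead of lab hardware.

    import os, time

    DEV = "/dev/nvme0n1"   # placeholder: inside the QEMU guest the emulated
                           # drive appears as an ordinary block device; point
                           # this at a scratch device or file - it overwrites data
    BLOCK = 4096
    COUNT = 1024

    def write_read_check(path):
        pattern = bytes(range(256)) * (BLOCK // 256)
        t0 = time.time()
        fd = os.open(path, os.O_RDWR | os.O_SYNC)
        try:
            for i in range(COUNT):
                os.pwrite(fd, pattern, i * BLOCK)       # sequential writes
            for i in range(COUNT):
                data = os.pread(fd, BLOCK, i * BLOCK)   # read back and compare
                assert data == pattern, f"miscompare at block {i}"
        finally:
            os.close(fd)
        mb = COUNT * BLOCK / 1e6
        return mb / (time.time() - t0)   # rough MB/s for the combined pass

    if __name__ == "__main__":
        print(f"{write_read_check(DEV):.1f} MB/s")

The same script, unmodified, can later be pointed at the FPGA prototype or at first silicon, which is exactly the pre/post-silicon reuse the case study describes.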

Host interface VTL designs are also available if emulation of the SSD controller is being used to enhance an existing verification environment. These are similar to existing VIP used within the verification flow, although typically they have a subset of features needed to communicate with a synthesized design.

Replacing the NAND memory with a model is the next highest priority. Given that SSDs are typically sized from several hundred gigabytes to several terabytes, finding enough physical memory to implement the full drive is challenging, to say the least. We recommend using FlexMem memory, a model that runs on a server connected to the emulator, with dynamic allocation of the NAND memory as it is used, along with a cache in the emulator hardware to speed up performance. Sparse and full hardware memory models are also available, although each has restrictions not found in the FlexMem version.
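
FlexMem itself is product-specific, but the general idea of demand-allocated sparse storage with a small cache of recently-touched pages can be sketched as follows. This is illustrative Python only, not the actual implementation, and it assumes accesses do not cross a page boundary.

    from collections import OrderedDict

    class SparseNandModel:
        """Toy demand-allocated NAND backing store with a tiny LRU cache.
        Pages are allocated only on first access."""

        def __init__(self, page_size=16384, cache_pages=4):
            self.page_size = page_size
            self.backing = {}            # page number -> bytearray, on demand
            self.cache = OrderedDict()   # hot pages, mimicking emulator-side cache
            self.cache_pages = cache_pages

        def _page(self, page_no):
            if page_no in self.cache:
                self.cache.move_to_end(page_no)          # refresh LRU position
                return self.cache[page_no]
            page = self.backing.setdefault(page_no, bytearray(self.page_size))
            self.cache[page_no] = page
            if len(self.cache) > self.cache_pages:
                self.cache.popitem(last=False)           # evict least recently used
            return page

        def write(self, addr, data):
            page = self._page(addr // self.page_size)
            off = addr % self.page_size
            page[off:off + len(data)] = data

        def read(self, addr, length):
            page = self._page(addr // self.page_size)
            off = addr % self.page_size
            return bytes(page[off:off + length])

    mem = SparseNandModel()
    mem.write(10 * 16384 + 100, b"hello")
    assert mem.read(10 * 16384 + 100, 5) == b"hello"
    assert len(mem.backing) == 1   # only one page of a notionally huge array exists

The design point is that a terabyte-class address space costs nothing until the test actually touches it, which is what makes full-capacity drive configurations practical in emulation.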

If the environment is going to be used to validate the entire design, including FW, a virtual JTAG host needs to be connected to the processor for debug and trace support. We recommend running Codelink® as well, to support quick FW debug.

DRAM and NOR memories also need to be replaced with models. Since these two memories are typically much smaller than the NAND array, hardware models that live on the emulator are best used. DRAM DFI models are also available; these connect to many DRAM controllers and remove the implementation and debug time required to get a working PHY into the emulator, while being guaranteed to work with the Mentor-supplied DDR DRAM model.
Running the Tests
Once the design is ported to the emulator, the user can run a set of tests to check out the controller design. Many testcases should already exist, from the verification environment (used with a VTL host front end) to customer-based validation tests that can check out the design in a full system (used with a VirtuaLAB host front end). While not as fast as real hardware in the lab, these tests will run significantly faster than in a simulation-based verification environment. Many tests not even considered before, because of runtime, are now possible, running in a fraction of the time while still providing full visibility. FW can also be loaded and run on the emulator, allowing testing of a production design long before that design is available in the lab. This environment supports development and debug of HW designs, FW designs, validation test scripts and customer test scripts, all prior to tapeout, reducing time-to-market as well as increasing the likelihood of working first-pass silicon.

Debugging
The emulator offers multiple debug methods to speed finding the root cause of bugs. While all signals within a design can be captured in a waveform, Veloce also supports a capture mode where only the signals of interest are captured, speeding up run time and finding the bug sooner. Codelink provides a way for FW engineers to run a test, capture the results, and then replay the test while stepping forward and backward to isolate and fix a bug. Capture and replay is also supported from a HW perspective: Veloce captures the results of a test run quickly on the emulator, then downloads them to a server to re-run and debug while freeing up the emulator for other uses.

A/B Testing
Testing different SSD configurations, specifically the amount and configuration of NAND memory connected to the controller, can be challenging in a typical lab environment. At a minimum, the existing memory must be replaced with the new configuration; in the worst case, a new PCB is created and parts are soldered onto the board. This also assumes that the physical parts exist and work, which is not a given in cutting-edge development: the NAND chips may be being developed concurrently with the controller design and may not even be available for prototype testing. The Veloce NAND memory model solves those problems. Even if a NAND chip is not available, a specification typically is. A NAND model is created based on that specification and used for pre-tapeout testing of the controller to ensure that it works as expected. If a feature in the NAND chip changes, the model is easily updated to match the new feature and the testing is re-run.

Most if not all controllers are designed to support multiple sizes and configurations of memory, including the number of channels, the number and size of blocks and pages, the number of planes, plus multiple other configuration options. Testing all of these possible configurations is much easier with an emulator. Instead of having to replace chips and possibly create new printed circuit boards, a different top-level file that instantiates the new configuration is created and re-compiled, and a new set of tests is run with that configuration. This makes A/B testing with different configurations and different optimizations much easier and faster to run, allowing the controller design team to make trade-offs in their design and update their architecture much sooner if an unexpected hole in performance or support is discovered.
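
Because a configuration change in the virtual flow is just a different set of top-level parameters, the sweep can be scripted. The sketch below is only a schematic driver: the parameter names, the defines-file format and the compile/run commands are placeholders for whatever the project's real build flow uses.

    import subprocess

    # Hypothetical configurations to compare (A/B or broader sweeps).
    CONFIGS = {
        "A_8ch":  {"CHANNELS": 8,  "CHIP_SELECTS": 2, "PAGE_BYTES": 16384},
        "B_16ch": {"CHANNELS": 16, "CHIP_SELECTS": 4, "PAGE_BYTES": 16384},
    }

    def write_params(name, params):
        # Emit a defines file that a parameterized top-level wrapper could read.
        with open(f"params_{name}.vh", "w") as f:
            for key, value in params.items():
                f.write(f"`define {key} {value}\n")

    def run_config(name, params, dry_run=True):
        write_params(name, params)
        # Placeholders: substitute the project's real compile and regression commands.
        compile_cmd = ["make", "compile", f"PARAMS=params_{name}.vh"]
        test_cmd = ["make", "regress", f"CONFIG={name}"]
        if dry_run:
            print("would run:", compile_cmd, "then", test_cmd)
        else:
            subprocess.run(compile_cmd, check=True)
            subprocess.run(test_cmd, check=True)

    for name, params in CONFIGS.items():
        run_config(name, params)

Swapping a physical daughter card becomes a loop iteration, which is what makes broad A/B comparisons practical before silicon.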

Conclusion
SSDs are fundamentally different from traditional spinning hard drives, and verification methodology must evolve with the unique challenges of using NAND flash as a storage medium. These differences also reveal new opportunities to use more flexible and powerful tools, including a virtual host machine driving PCIe traffic and an entire NAND configuration built with the flexibility of soft models. This frees up the emulator to run multiple users in parallel, creating efficiencies not possible in ICE mode. Moreover, Veloce's save-and-restore capability is another feature that designers appreciate, since it allows them to free the emulator while they debug a previous run.

As storage technology and use models continue to evolve, so do the verification tools needed to solve today's challenges. The Veloce emulator is well suited to address these challenges, as experienced by the leading storage companies who are using it today in production environments.


References

1 http://mesl.ucsd.edu/gupta/cse291-fpga/Readings/YSE86.pdf
2 http://semiengineering.com/kc/knowledge_center/A-brief-history-of-logic-simulation/12
3 https://www.computer.org/csdl/proceedings/dac/1988/0864/00/00014761.pdf
4 http://www.cis.upenn.edu/~lee/06cse480/lec-fpga.pdf
5 e.g. http://www.deepchip.com/items/0522-02.html (graph near end of article)
6 https://en.wikipedia.org/wiki/Rent%27s_rule
7 http://ramp.eecs.berkeley.edu/Publications/RAMP2010_MButts20Aug (Slides, 8-25-2010).pptx
8 https://verificationacademy.com/verification-horizons/march-2015-volume-11-issue-1/Hardware-Emulation-Three-Decades-of-Evolution
9 http://www.uccs.edu/~gtumbush/4211/Logic Emulation with Virtual Wires.pdf
10 http://www.deepchip.com/items/0522-01.html
11 https://www.amazon.com/Prototypical-Emergence-FPGA-Based-Prototyping-Design/dp/1533391610/ref=sr_1_1?ie=UTF8&qid=1463942374&sr=8-1&keywords=prototypical
12 https://www.semiwiki.com/forum/content/5740-software-driven-verification-drives-tight-links-between-emulation-prototyping.html
13 https://www.cadence.com/content/cadence-www/global/en_US/home/company/newsroom/press-releases/pr-ir/2016/cadence-completes-acquisition-of-rocketick-technologies.html
14 https://www.synopsys.com/cgi-bin/verification/dsdla/pdfr1.cgi?file=vcs-fgp-wp.pdf
15 https://www.cadence.com/content/cadence-www/global/en_US/home/tools/system-design-and-verification/acceleration-and-emulation/palladium-z1.html
16 http://www.eetasia.com/ARTICLES/2005NOV/B/2005NOV01_PL_EDA_TA.pdf?SOURCES=DOWNLOAD
17 http://electronicdesign.com/fpgas/what-s-difference-between-fpga-and-custom-silicon-emulators
18 http://electronicdesign.com/fpgas/what-s-difference-between-fpga-and-custom-silicon-emulators
19 http://www.synopsys.com/Tools/Verification/hardware-verification/emulation/Pages/zebu-server-asic-emulator.aspx
20 https://verificationacademy.com/verification-horizons/november-2015-volume-11-issue-3/hardware-emulation-three-decades-of-evolution-part-iii (Eve/Synopsys section)
21 https://verificationacademy.com/verification-horizons/november-2015-volume-11-issue-3/hardware-emulation-three-decades-of-evolution-part-iii
22 https://www.semiwiki.com/forum/content/5198-r-evolution-hardware-based-simulation-acceleration.html
23 https://www.mentor.com/company/news/mentor-adds-arm-amba-5-ahb-verification-ip, http://ip.cadence.com/ipportfolio/verification-ip/accelerated-vip
24 http://electronicdesign.com/eda/transaction-based-verification-and-emulation-combine-multi-megahertz-verification-performance
25 http://www.eetimes.com/document.asp?doc_id=1212471
26 http://www.thefreelibrary.com/Quickturn+Announces+Palladium,+the+Most+Advanced+Simulation...-a075609988
27 http://embedded-computing.com/news/eves-offers-multi-user-capability-2/
28 https://www.cadence.com/content/cadence-www/global/en_US/home/tools/system-design-and-verification/acceleration-and-emulation/palladium-z1.html?CMP=pr111615_PalladiumZ1
29 http://verificationhorizons.verificationacademy.com/volume-8_issue-2/articles/stream/virtualization-delivers-total-verification-soc-hardware-software-interfaces_vh-v8-i2.pdf
30 https://www.cadence.com/content/dam/cadence-www/global/en_US/documents/tools/system-design-verification/palladium-emulation-development-kit-ds.pdf
31 https://www.mentor.com/products/fv/emulation-systems/virtual-devices
32 https://www.semiwiki.com/forum/content/6711-rise-transaction-based-emulation.html
33 http://s3.mentor.com/public_documents/whitepaper/resources/mentorpaper_81009.pdf
34 http://www.deepchip.com/items/0522-01.html
35 “Mobile Unleashed”, Daniel Nenni and Don Dingee, SemiWiki, December 2015.
36 http://embedded-computing.com/guest-blogs/hardware-and-software-grow-ever-closer/
37 https://www.mentor.com/products/fv/codelink/
38 https://www.cadence.com/content/cadence-www/global/en_US/home/tools/system-design-and-verification/software-driven-verification/indago-embedded-sw-debug-app.html
39 https://www.synopsys.com/Tools/Verification/debug/Pages/verdi-hw-sw-ds.aspx
40 http://www.eetimes.com/author.asp?section_id=36&doc_id=1330092
41 https://verificationacademy.com/verification-horizons/november-2015-volume-11-issue-3/hardware-emulation-three-decades-of-evolution-part-iii
42 https://www.cadence.com/content/cadence-www/global/en_US/home/tools/system-design-and-verification/acceleration-and-emulation/palladium-dynamic-power.html
43 https://www.mentor.com/products/fv/emulation-systems/veloce-power-application
44 http://www.synopsys.com/Tools/Verification/hardware-verification/emulation/Pages/zebu-server-asic-emulator.aspx
45 http://semimd.com/blog/2016/02/25/design-for-testability-dft-verified-with-hardware-emulation/
46 http://www.electronicsweekly.com/news/app-based-emulators-go-beyond-rtl-verification-2016-06/
47 https://www.semiwiki.com/forum/content/5742-ecosystem-partnership-effective-network-hardware-design.html
48 https://www.mentor.com/products/fv/techpubs/download?id=97874&contactid=1&PC=L&c=2016_09_05_veloce_ixia_de-risk_network_wp
49 https://dvcon.org/sites/dvcon.org/files/files/2016/Panel-Emulation-Static-Verification-Will-Replace-Simulation.mp3
50 http://www.imperas.com/why-use-virtual-platforms
51 https://www.cadence.com/content/dam/cadence-www/global/en_US/documents/tools/system-design-verification/palladium-xp-ii-wp.pdf
52 https://community.cadence.com/cadence_blogs_8/b/sd/archive/2012/05/16/debug-breakthroughs-enabled-by-in-circuit-acceleration
53 https://www.mentor.com/products/fv/resources/overview/testbench-considerations-for-maximizing-the-speed-of-simulation-acceleration-with-a-hardware-emulator-5b79adaa-0634-41c2-8533-ac88fe5df86b
54 http://accellera.org/downloads/standards/sce-mi
55 https://www.semiwiki.com/forum/content/6711-rise-transaction-based-emulation.html
56 https://dvcon.org/sites/dvcon.org/files/images/2016/DVCon-2016-FINAL%20handout-min.pdf
