Module – 5 ROUTING:
Global Routing: Goals and objectives, Global Routing Methods, Global routing between
blocks, Back-annotation.
Detailed Routing: Goals and objectives, Measurement of Channel Density,
Left-Edge Algorithm, Area-Routing Algorithms,
Multilevel routing, Timing – Driven detailed routing,
Final routing steps, Special Routing, Circuit extraction and DRC.
Introduction
Once the designer has
Floorplanned a chip
The logic cells within the flexible blocks have been placed
Time to make the connections by routing the chip.
This is still a hard problem that is made easier by dividing it into smaller
problems.
Routing is usually split into
Global routing followed by detailed routing .
•221
•The starting point of floorplaning and placement steps for
the viterbi decoder
•- collection of standard cells with no room set aside yet •222
routing.
for
The starting point of floorplaning and
placement steps for the viterbi decoder
• Small boxes that look like bricks - outlines of the standard cells.
• Largest standard cells, at the bottom of the display (labeled dfctnb)
- 188 D flipflops.
• '+' symbols -drawing origins of the standard cells—for the D flip-flops
they are shifted to the left and below the logic cell bottom left-hand
corner.
• Large box surrounding all the logic cells - estimated chip size.
• (This is a screen shot from Cadence Cell Ensemble.)
•223
The viterbi decoder after floorplanning
•224
•FIGURE 17.1 The core of the Viterbi decoder chip after placement (a screen shot from
Cadence Cell Ensemble) •225
•FIGURE 17.2 The core of the Viterbi decoder chip after the completion of global and detailed
routing (a screen shot from Cadence Cell Ensemble). This chip uses two-level metal. Although you
cannot see the difference, m1 runs in the horizontal direction and m2 in the vertical direction. •226
Global Routing
• The details of global routing differ slightly between
– cell-based ASICs, gate arrays, and FPGAs, but the principles are the
same.
• A global router does not make any connections, it just plans them.
• Global route the whole chip (or large pieces if it is a large chip) before detail
routing the whole chip (or the pieces).
•227
Goals and Objectives
• Input to routing
– Floorplan that includes the locations of all the fixed and flexible blocks;
– Placement information for flexible blocks;
• Locations of all the logic cells.
• Goal of global routing
– To provide complete instructions to the detailed router on where to
route every net.
• Objectives of global routing
– Minimize the total interconnect length.
– Maximize the probability that the detailed router can complete
the routing.
– Minimize the critical path delay.
•228
Measurement of Interconnect Delay
• After placement, the logic cell positions are fixed and the global router can afford to use
better estimates of the interconnect delay.
• To illustrate one method, we shall use the Elmore constant to estimate the interconnect
delay for the circuit shown in Figure 17.3 .
•FIGURE 17.3 Measuring the delay of a net. (a) A simple circuit with an inverter A driving a
net with a fanout of two. Voltages V 1 , V 2 , V 3 , and V 4 are the voltages at intermediate
points along the net. (b) The layout showing the net segments (pieces of interconnect).
(c) The RC model with each segment replaced by a capacitance and resistance. The ideal •229
switch and pull-down resistance R pd model the inverter A.
The problem is to find the voltages at the inputs to logic cells B and C taking
into account the parasitic resistance and capacitance of the metal interconnect.
Figure 17.3 (c) models logic cell A as an ideal switch with a pull-down
resistance equal to R pd and models the metal interconnect using resistors and
capacitors for each segment of the interconnect.
•The Elmore constant for node 4 (labeled V 4 ) in the network
shown in Figure 17.3 (c) is
4
ζ4 = ΣR k 4 C k (17.1)
k=1
= R 14 C 1 + R 24 C 2 + R 34 C 3 + R 44 C 4 ,
•where,
•230
In Eq. 17.2 notice that R 24 = R pd + R 1 (and not R pd + R 1 + R 2 ) because
R 1 is the resistance to V 0 (ground) shared by node 2 and node 4.
Suppose we have the following parameters (from the generic 0.5 m m CMOS
process, G5) for the layout shown in Figure 17.3 (b):
• m2 resistance is 50 m Ω /square.
• m2 capacitance (for a minimum-width line) is 0.2 pFmm –1 .
• 4X inverter delay is 0.02 ns + 0.5 CLns ( C L is in picofarads).
• Delay is measured using 0.35/0.65 output trip points.
• m2 minimum width is 3 λ = 0.9 µm.
• 1X inverter input capacitance is 0.02 pF (a standard load).
First we need to find the pull-down resistance, Rpd , of the 4X inverter. If we
model the gate with a linear pull-down resistor, Rpd , driving a load CL , the
output waveform is exp – t /( CLRpd ) (normalized to 1V).
The output reaches 63 percent of its final value when t = CLRpd , because
exp (–1) = 0.63. Then,because the delay is measured with a 0.65 trip point, the
constant 0.5 nspF –1 0.5kW is very close to the equivalent pull-down
resistance. Thus, Rpd = 500 Ω .
•231
•m2 resistance is 50 m Ω square.
•m2 capacitance (for a minimum-width
line) is 0.2 pFmm –1 .
•4X inverter delay is 0.02 ns + 0.5 CLns (
C L is in picofarads).
•Delay is measured using 0.35/0.65
output trip points.
•m2 minimum width is 3 λ = 0.9 µm.
•1X inverter input capacitance is 0.02
pF (a standard load).
•232
• R1= R2 = 6 Ω
• R3=56 Ω
• R4=112 Ω
• C 1=0.02 pF
• C 2 =0.04 pF
• C 3=0.2 pF
• C 4=0.42 pF
Now we can calculate the path resistance, Rki, values (notice that Rki = Rki):
R14 = 500 Ω + 6 Ω =506 Ω
R24 = 500 Ω + 6 Ω =506 Ω
R34 =500 Ω + 6 Ω + 56 Ω =562 Ω
R44 =500 Ω + 6 Ω + 56 Ω + 112 Ω =674 Ω (17.5)
•233
Finally, we can calculate Elmore’s constants for node 4 and node 2 as follows :
ζD4 = R 14 C 1 + R 24 C 2 + R 34 C 3 + R 44 C 4 (17.6)
= (506)(0.02) + (506)(0.04)
+ (562)(0.2) + (674)(0.42)
= 425 ps .
ζD2 = R 12 C 1 + R 22 C 2 + R 32 C 3 + R 42 C 4 (17.7)
= ( R pd + R 1 )( C 1 + C 3 + C 4 )
+ ( R pd + R 1 + R 2 ) C 2
= (500 + 6 + 6)(0.04)
+ (500 + 6)(0.02 + 0.2 + 0.2)
•and ζD4 – ζD2 = (425 – 344) = 81 ps.
= 344 ps .
•A lumped-delay model neglects the effects of interconnect
resistance and simply sums all the node capacitances (the
lumped capacitance ) as follows:
• ζD = R pd ( C 1 + C 2 + C 3 + C 4 ) (17.8)
•234
• = (500) (0.02 + 0.04 + 0.2 + 0.42)
Measurement of delay
The delay of the inverter can be assigned as follows:
– 20 ps (the intrinsic delay, 0.02 ns, due to the cell output
capacitance),
– 340 ps (due to the pull-down resistance and the output
capacitance),
– 4 ps (due to the interconnect from A to B), (ζD2- ζD )
– 85 ps (due to the interconnect from A to C) (ζD4- ζD ).
•235
Measurement of Interconnect Delay (contd.,)
• Even using the Elmore constant we still made the following assumptions in
estimating the path delays:
• A step-function waveform drives the net.
• The delay is measured from when the gate input changes.
• The delay is equal to the time constant of an exponential waveform
that approximates the actual output waveform.
• The interconnect is modeled by discrete resistance and capacitance
elements.
• The global router could use more sophisticated estimates that remove some
of these assumptions, but there is a limit to the accuracy with which delay
can be estimated during global routing
• When the global router attempts to minimize interconnect delay, there is
an important difference between a path and a net.
• The path that minimizes the delay between two terminals on a net is not
necessarily the same as the path that minimizes the total path length of
the net.
•236
Global Routing Methods
• Many of the methods used in global routing are based on the solutions to the
tree on a graph problem.
• sequential routing :
One approach to global routing takes each net in turn and calculates
the shortest path using tree on graph algorithms—with the added
restriction of using the available channels.
Disadvantage:
• As a sequential routing algorithm proceeds, some channels will
become more congested since they hold more interconnects than
others.
• In the case of FPGAs and channeled gate arrays, the channels have a
fixed channel capacity and can only hold a certain number of
interconnects.
•237
Global Routing Methods (contd.,)
• There are two different ways that a global router normally handles this problem.
1.Order independent Routing
2.Order dependent Routing
• Order-independent routing, a global router proceeds by routing each net, ignoring
how crowded the channels are. Whether a particular net is processed first or last does
not matter, the channel assignment will be the same.
• Order-independent routing, after all the interconnects are assigned to channels, the
global router returns to those channels that are the most crowded and reassigns some
interconnects to other, less crowded, channels.
• order dependent :A global router can consider the number of interconnects already
placed in various channels as it proceeds. In this case the global routing is order
dependent —the routing is still sequential, but now the order of processing the nets will
affect the results.
• Iterative improvement or simulated annealing may be applied to the solutions found
from both order-dependent and order-independent algorithms.
•238
Global Routing Methods (contd.,)
• Hierarchical routing handles all nets at a particular level at once.
• Rather than handling all of the nets on the chip at the same time, the global-
routing problem is made more tractable by dividing the chip area into levels of
hierarchy.
• By considering only one level of hierarchy at a time the size of the problem is
reduced at each level.
• There are two ways to traverse the levels of hierarchy.
• top-down approach :- Starting at the whole chip, or highest level, and
proceeding down to the logic cells is the.
• The bottom-up approach starts at the lowest level of hierarchy and globally
routes the smallest areas first.
•239
Global Routing
• There are two types of areas to global route:
– between blocks
– inside the flexible blocks
•240
Global Routing Between Blocks
•FIGURE 17.4 Global routing for a cell-based ASIC formulated
as a graph problem. (a) A cell-based ASIC with numbered
channels. (b) The channels form the edges of a graph. (c) The
channel-intersection graph. Each channel corresponds to an
edge on a graph whose weight corresponds to the channel
length. •241
Global Routing Between Blocks
( contd.,)
•FIGURE 17.5 Finding paths in global routing. (a) A cell-based ASIC showing a single net
with a fanout of four (five terminals). We have to order the numbered channels to complete
the interconnect path for terminals A1 through F1. (b) The terminals are projected to the
center of the nearest channel, forming a graph. A minimum-length tree for the net that uses
the channels and takes into account the channel capacities. (c) The minimum-length tree
does not necessarily correspond to minimum delay. If we wish to minimize the delay
from terminal A1 to D1, a different tree might be better. •242
Global Routing Between Blocks
( contd.,)
• Global routing is very similar for cell-based ASICs and gate arrays, but there
is a very important difference between the types of channels in these
ASICs.
• In channeled gate-arrays and FPGAs the size, number, and location of
channels are fixed.
• Advantage - the global router can allocate as many interconnects to each
channel as it likes, since that space is committed anyway.
• Disadvantage - there is a maximum number of interconnects that each
channel can hold.
• If the global router needs more room, even in just one channel on the whole
chip, the designer has to repeat the placement-and-routing steps and try again
(or use a bigger chip).
•243
Global Routing Inside Flexible Blocks
•FIGURE 17.6 Gate-array global routing. (a) A small gate array. (b) An enlarged view of the routing. The
top channel uses three rows of gate-array base cells; the other channels use only one. (c) A further
enlarged view showing how the routing in the channels connects to the logic cells. (d) One of the logic
cells, an inverter. (e) There are seven horizontal wiring tracks available in one row of gate-array base •244
cells—the channel capacity is thus 7
Global Routing Inside Flexible Blocks (contd.,)
•FIGURE 17.7 The gate-array inverter from Figure 17.6
d. (a) An oxide-isolated gate-array base cell, showing
the diffusion and polysilicon layers. (b) The metal and
contact layers for the inverter in a 2LM (two-level
metal) process. (c) The router’s view of the cell in a 3LM•245
Global Routing Inside Flexible Blocks
FIGURE 17.8 Global routing a gate array. (a) A single global-routing cell (GRC or routing bin) containing 2-by-4
gate-array base cells. For this choice of routing bin the maximum horizontal track capacity is 14, the maximum
vertical track capacity is 12. The routing bin labeled C3 contains three logic cells, two of which have feedthroughs
marked 'f'. This results in the edge capacities shown. (b) A view of the top left-hand corner of the gate array
showing 28 routing bins. The global router uses the edge capacities to find a sequence of routing bins to connect
the nets.
•246
Timing-DrivenMethods
• As in timing-driven placement, there are two main approaches to timing-driven routing:
– net-based and path-based.
• Path-based methods are more sophisticated.
For example, if there is a critical path from logic cell A to B to C, the global router
may increase the delay due to the interconnect between logic cells A and B if it
can reduce the delay between logic cells B and C.
• Placement and global routing tools may or may not use the same algorithm to
estimate net delay. If these tools are from different companies, the algorithms are
probably different.
• The algorithms must be compatible, however. There is no use performing placement to
minimize predicted delay if the global router uses completely different
measurement methods.
• Companies that produce floorplanning and placement tools make sure that the
output is compatible with different routing tools—often to the extent of using different
algorithms to target different routers.
•247
Back-annotation
• The global router can give not just an estimate of the total net
length (which was all we knew at the placement stage), but the
resistance and capacitance of each path in each net. This RC
information is used to calculate net delays.
• Back-annotate this net delay information
– to the synthesis tool for in-place optimization or
– to a timing verifier to make sure there are no timing surprises.
• Differences in timing predictions at this point arise due to the
different ways in which the placement algorithms estimate the
paths and the way the global router actually builds the paths.
•248
Detailed Routing
Goal:
• The goal of detailed routing is to complete all the connections between logic
cells.
Objectives:
• The most common objective is to minimize one or more of the following:
– The total interconnect length and area
– The number of layer changes that the connections have to make
– The delay of critical paths
• Minimizing the number of layer changes corresponds to minimizing the
number of vias that add parasitic resistance and capacitance to a connection.
•249
Measurement of Channel Density
Definition of Local and Global channel density
• Maximum local density of channel is Global density
• Channel density is less than or equal to Channel capacity.
•250
Left-edge algorithm
The left-edge algorithm ( LEA ) is the basis for several routing algorithms [
Hashimoto and Stevens, 1971].
The LEA applies to two-layer channel routing, using one layer for the trunks and the
other layer for the branches.
For example, m1 may be used in the horizontal direction and m2 in the vertical
direction.
The LEA proceeds as follows:
1. Sort the nets according to the leftmost edges of the net’s horizontal
segment.
2. Assign the first net on the list to the first free track.
3. Assign the next net on the list, which will fit, to the track.
4. Repeat this process from step 3 until no more nets will fit in the current
track.
5. Repeat steps 2–4 until all nets have been assigned to tracks.
6. Connect the net segments to the top and bottom of the channel.
•251
Left-edge algorithm
•252
Left-edge algorithm
•253
Constraints and Routing Graphs
• Two terminals that are in the same column in a channel create a
vertical constraint .
• Overlap between the trunks of nets is called horizontal constraint.
•254
Dog-Leg router
• A dogleg router removes the restriction that each net can use only one
track or trunk.
•255
Area Routing Algorithm- Lee-Maze algorithm
[For general shaped areas]
• Finds a path from source (X) to target (Y) by emitting a wave from both
the source and the target at the same time.
• Successive outward moves are marked in each bin.
• Once the target is reached, the path is found by backtracking (if there is a
choice of bins with equal labeled values, choose the bin that avoids changing
direction). (The original form of the Lee algorithm uses a single wave.)
•256
Hightower or line search-Area routing algorithm
[For general shaped areas]
• 1. Extend lines from both the source and target toward each
other.
•2. When an extended line, known as an escape line , meets
an obstacle, choose a point on the escape line from which to
project another escape line at right angles to the old one. This
•257
point is the escape point .
Special routing- CLK routing
• Gate arrays normally use a clock spine (a regular grid), eliminating the need
for special routing.
• The clock distribution grid is designed at the same time as the gate-array
base to ensure a minimum clock skew and minimum clock latency—given
power dissipation and clock buffer area limitations.
• Cell-based ASICs may use either a clock spine, a clock tree, or a hybrid
approach.
• Figure shows how a clock router may minimize clock skew in a clock spine
by making the path lengths, and thus net delays, to every leaf node equal—
using jogs in the interconnect paths if necessary.
• More sophisticated clock routers perform clocktree synthesis
(automatically choosing the depth and structure of the clock tree) and
clock-buffer insertion (equalizing the delay to the leaf nodes by balancing
•258
interconnect delays and buffer delays).
Special routing- CLK routing
FIGURE: Clock routing. (a) A clock network for the cellbased ASIC
(b) Equalizing the interconnect segments between CLK and all
destinations (by including jogs if necessary) minimizes clock
skew.
•259
Special routing- Power routing
• Power bus width
• Each of the power buses has to be sized according to the current it
will carry.
• Too much current in a power bus can lead to a failure through a
mechanism known as electromigration.
• The required power-bus widths can be estimated automatically
from library information, from a separate power simulation tool, or
by entering the power-bus widths to the routing software by
hand.
• Many routers use a default power-bus width so that it is quite easy
to complete routing of an ASIC without even knowing about this
•260
problem.
Special routing- Power routing
• Gate-Array ASIC
• Gate arrays normally use a regular power grid as part of
the gate-array base.
• The gate-array logic cells contain two fixed-width power
buses inside the cell, running horizontally on m1.
• The horizontal m1 power buses are then strapped in a
vertical direction by m2 buses, which run vertically
across the chip.
•261
Special routing- Power routing
• Cell-based ASIC
• Standard cells are constructed in a similar fashion to gate-array cells, with
power buses running horizontally in m1 at the top and bottom of each
cell.
• A row of standard cells uses end-cap cells that connect to the VDD and VSS
power buses placed by the power router.
• Power routing of cell-based ASICs may include the option to include
vertical m2 straps at a specified intervals.
• In a three-level metal process, power routing is similar to two-level metal
ASICs. Power buses inside the logic cells are still normally run on m1.
Using HVH routing it would be possible to run the power buses on m3 and
drop vias all the way down to m1 when power is required in the cells. •262
Circuit Extraction
• After detailed routing is complete, the exact length and position of each
interconnect for every net is known.
• Now the parasitic capacitance and resistance associated with each
interconnect, via, and contact can be calculated.
• This data is generated by a circuit-extraction tool in one of the formats.
• standard parasitic format ( SPF )
• The standard parasitic format ( SPF ) describes interconnect delay
and loading due to parasitic resistance and capacitance.
• There are three different forms of SPF:
– Two of them ( regular SPF and reduced SPF ) contain the same
information, but in different formats, and model the behavior of
interconnect;
•263
– Third form of SPF ( detailed SPF ) describes the actual parasitic
Circuit Extraction
• The load at the output of gate A is represented by one of three models: lumped-C,
lumped- RC, or PI segment.
Figure: The regular and reduced standard parasitic format (SPF) models for
interconnect. (a) An example of an interconnect network with fanout. The driving-point
admittance of the interconnect network is Y ( s ). (b) The SPF model of the interconnect.
(c) The lumped-capacitance interconnect model. (d) The lumped-RC interconnect
model. (e) The PI segment interconnect model .
The values of C , R , C 1 , and C 2 are calculated so that Y 1 ( s ), Y 2 ( s ), and Y 3 ( s ) are the
first-, second-, and third-order Taylor-series approximations to Y ( s ).
•264
Circuit Extraction
The key features of regular and reduced SPF are as follows:
• The loading effect of a net as seen by the driving gate is represented by
choosing one of three different RC networks: lumped-C, lumped-RC, or PI
segment (selected when generating the SPF) [ O’Brien and Savarino, 1989].
• The pin-to-pin delays of each path in the net are modeled by a simple RC
delay (one for each path). This can be the Elmore constant for each path, but it
need not be.
• The reduced SPF ( RSPF) contains the same information as regular SPF,
but uses the SPICE format.
• Detailed SPF:
• The detailed SPF ( DSPF) shows the resistance and capacitance of each
segment in a net, again in a SPICE format. There are no models or
assumptions on calculating the net delays in this format.
•265
Design-Rule Check ( DRC )
• ASIC designers perform two major checks before fabrication.
• DRC:
• The first check is a design-rule check ( DRC ) to ensure that nothing
has gone wrong in the process of assembling the logic cells and
routing.
• The DRC may be performed at two levels.
• Phantom-Level DRC:
• The first level of DRC is a phantom-level DRC , which checks for
shorts, spacing violations, or other design-rule problems between
logic cells.
• This is principally a check of the detailed router.
• If the real library-cell layouts (sometimes called hard layout ) can be
accessed, we can instantiate the phantom cells and perform a
•266
second-level DRC at the transistor level.
Design-Rule Check ( DRC )
• Dracula check:
• This is principally a check of the correctness of the library cells.
• Normally the ASIC vendor will perform this check using its own
software as a type of incoming inspection.
• The Cadence Dracula software is one de facto standard in this area,
and you will often hear reference to a Dracula deck that consists of
the Dracula code describing an ASIC vendor’s design rules.
• Sometimes ASIC vendors will give their Dracula decks to customers so
that the customers can perform the DRCs themselves.
•267
Design-Rule Check ( DRC )
• Layout Vs Schematic check:
• To ensure that what is about to be committed to silicon
is what is really wanted.
• An electrical schematic is extracted from the physical
layout and compared to the netlist.
• This closes a loop between the logical and physical design
processes and ensures that both are the same.
• The LVS check is not as straightforward as it may sound,
however. •268
Design-Rule Check ( DRC )
• Problems in LVS check:
• The first problem is transistor-level netlist for a large ASIC forms an
enormous graph.
• LVS software essentially has to match this graph against a reference
graph that describes the design.
• Ensuring that every node corresponds exactly to a corresponding
element in the schematic (or HDL code) is a very difficult task.
• The first step is normally to match certain key nodes (such as the
power supplies, inputs, and outputs), but the process can very quickly
become bogged down in the thousands of mismatch errors that are
inevitably generated initially.
•269
Design-Rule Check ( DRC )
• Problems in LVS check:
• The second problem with an LVS check is creating a true reference.
• The starting point may be HDL code or a schematic.
• Logic synthesis, test insertion, clock-tree synthesis, logical-to-physical
pad mapping, and several other design steps each modify the
netlist.
• The reference netlist may not be what we wish to fabricate.
• In this case designers increasingly resort to formal verification that
extracts a Boolean description of the function of the layout and
compare that to a known good HDL description.
•270