MODULE 2
Floorplanning: Goals and objectives, Measurement of delay in Floorplanning, Floorplanning tools,
Channel definition, I/O and Power planning and clock planning.
Placement: Goals and Objectives, Min-cut Placement algorithm, Iterative Placement Improvement,
Timing-driven placement methods, Physical Design Flow.
Routing: Global Routing: Goals and objectives. Global Routing methods, Global routing between blocks,
Back annotation.
2.1 FLOORPLANNING GOALS & OBJECTIVES
➢ The input to a floorplanning tool is a hierarchical netlist that describes the interconnection
   of the blocks (RAM, ROM, ALU, cache controller, and so on); the logic cells (NAND, NOR, D
   flip-flop, and so on) within the blocks; and the logic cell connectors (the terms terminals, pins,
   and ports all mean the same thing as connectors).
➢ The netlist is a logical description of the ASIC; the floorplan is a physical description of an
   ASIC. Floorplanning is thus a mapping between the logical description (the netlist) and the
   physical description (the floorplan).
➢ The goals of floorplanning are to:
    •   Arrange the blocks on a chip,
    •   Decide the location of the I/O pads,
    •   Decide the location and number of the power pads,
    •   Decide the type of power distribution, and
     •   Minimize delay.
➢ Interconnect delay cannot be measured exactly until after routing, when the interconnect
    resistance and capacitance are known. However, the total length of the interconnect and the
    total capacitance can be estimated.
➢ The interconnect length can be estimated by collecting statistics from previously routed chips
    and analyzing the results. From these statistics, tables are created that predict the interconnect
    capacitance as a function of net fanout and block size.
➢ A floorplanning tool can then use these predicted-capacitance tables (also known as
    interconnect-load tables or wire-load tables).
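➢ As a rough illustration, a wire-load table can be modeled as a lookup keyed by block size and
    net fanout. The following minimal Python sketch uses purely hypothetical capacitance values;
    real tables come from the statistical analysis described above.

    # Hypothetical wire-load table: predicted net capacitance (pF) indexed by
    # block size (gate count) and net fanout. Values are illustrative only.
    WIRE_LOAD = {
        5_000:  {1: 0.02, 2: 0.03, 4: 0.05},   # 5 k-gate block
        25_000: {1: 0.04, 2: 0.06, 4: 0.09},   # 25 k-gate block
    }

    def predicted_cap(block_gates, fanout):
        """Look up the predicted net capacitance, choosing the nearest tabulated
        block size and the largest tabulated fanout not exceeding the request."""
        size = min(WIRE_LOAD, key=lambda s: abs(s - block_gates))
        table = WIRE_LOAD[size]
        fo = max((f for f in table if f <= fanout), default=min(table))
        return table[fo]

    print(predicted_cap(20_000, 3))   # falls into the 25 k-gate table, FO = 2 entry: 0.06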
➢ Figure 2.1 shows how to derive and use wire-load tables and illustrates the following facts:
➢ The distributions for FO > 1 are more symmetrical and flatter than for FO = 1.
➢ The wire-load tables can only contain one number, for example the average net capacitance,
   for any one distribution. A tool may use a predicted capacitance for which we know 90 percent
   of the nets will have less than the estimated capacitance.
➢ The statistical analysis for blocks with different sizes needs to be repeated. For example, a
   net with a FO = 1 in a 25 k-gate block will have a different (larger) average length than if the
   net were in a 5 k-gate block.
➢ The statistics depend on the shape (aspect ratio) of the block (usually the statistics are only
   calculated for square blocks).
➢ The statistics will also depend on the type of netlist. For example, the distributions will be
    different for a netlist synthesized with a minimum logic-delay constraint (which tends to
    generate large numbers of two-input NAND gates) than for a netlist synthesized with
    minimum-area constraints.
➢ Figure 2.2 shows that the worst-case interconnect delay increases because we do not
   decrease chip size as we scale down feature size. One way to measure the worst-case delay
   uses an interconnect that completely crosses the chip, a coast-to-coast interconnect.
2.2 FLOORPLANNING TOOLS
➢ We can force logic cells to be placed in selected flexible blocks by seeding. We choose seed
    cells by name; for example, ram_control* would force all logic cells whose names start with
    ram_control to be placed in one flexible block. The special symbol, usually '*', is a wildcard
    symbol.
➢ Seeding may be hard or soft. A hard seed is fixed and not allowed to move during the
    remaining floorplanning and placement steps.
➢ A soft seed is an initial suggestion only and can be altered if necessary by the floorplanner.
➢ We may also use seed connectors within flexible blocks, forcing certain nets to appear in a
    specified order or location at the boundary of a flexible block.
➢ Figure 2.3 (c) shows how we can move the blocks in a floorplanning tool to minimize routing
    congestion and improve the floorplan.
➢   Figure 2.3(d) shows the updated display with the reduced congestion after the changes.
➢   We need to control the aspect ratio of our floorplan because we have to fit the chip into the
    die cavity (a fixed-size hole, usually square) inside a package.
➢   Figure 2.4(a) shows the initial floorplan with a 2:1.5 die aspect ratio.
➢   (b) shows altering the floorplan to give a 1:1 chip aspect ratio.
➢   (c) A trial floorplan with a congestion map. Blocks A and C have been placed so that we know
    the terminal positions in the channels. Shading indicates the ratio of channel density to the
    channel capacity. Dark areas show regions that cannot be routed because the channel
    congestion exceeds the estimated capacity.
➢   (d) Resizing flexible blocks A and C alleviates congestion.
➢ Figure 2.4 (a) - (c) show how we can rearrange our chip to achieve a square aspect ratio.
➢ Figure 2.4 (c) also shows a congestion map, another form of routability display. There is no
     standard measure of routability. Generally the interconnect channels (or wiring channels)
    have a certain channel capacity; that is, they can handle only a fixed number of
    interconnects.
➢ One measure of congestion is the difference between the number of interconnects that we
    actually need, called the channel density, and the channel capacity.
➢ Another measure, shown in Figure 2.4 (c), uses the ratio of channel density to the channel
     capacity. With practice, we can create a good initial placement by floorplanning with the aid
     of such a pictorial congestion display.
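➢ A minimal sketch (with hypothetical channel names and numbers) of this congestion measure,
    reporting the ratio of channel density to channel capacity for each channel:

    # Hypothetical channels: name -> (density = interconnects needed, capacity).
    channels = {"A": (18, 20), "B": (25, 20), "C": (10, 20)}

    for name, (density, capacity) in channels.items():
        ratio = density / capacity
        status = "unroutable" if ratio > 1.0 else "ok"
        print(f"channel {name}: density/capacity = {ratio:.2f} ({status})")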
2.3 CHANNEL DEFINITION
➢ FIGURE 2.5 shows Routing a T-junction between two channels in two-level metal. The dots
   represent logic cell pins.
➢ Routing channel A (the stem of the T) first allows us to adjust the width of channel B (Fig 2.5a).
➢ If we route channel B first (the top of the T), this fixes the width of channel A. We have to
   route the stem of a T-junction before we route the top (Fig 2.5b).
                              Fig 2.5 Routing a T junction between channels
Fig 2.6 Defining the channel routing order for a slicing floorplan using a slicing tree. A sequence
   of cuts successively slices the chip until only circuit blocks are left; the slicing tree
   corresponding to the sequence of cuts gives the order in which to route the channels: 4, 3, 2,
   and finally 1.
➢ Figure 2.6 shows a floorplan of a chip containing several blocks. Suppose we cut along the
   block boundaries slicing the chip into two pieces (Figure 2.6 a). Then suppose we can slice
   each of these pieces into two. If we can continue in this fashion until all the blocks are
   separated, then we have a slicing floorplan (Figure 2.6b).
➢ Figure 2.6(c) shows how the sequence we use to slice the chip defines a hierarchy of the
   blocks. Reversing the slicing order ensures that we route the stems of all the channel
   T-junctions first.
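➢ A minimal sketch of this ordering rule, assuming the slicing tree is stored as nested tuples in
    which each internal node carries its cut (channel) number; routing the channels in reverse cut
    order routes the stem of every T-junction before its top.

    # Slicing tree: internal node = (cut_number, left, right); a leaf is a block name.
    # Cut 1 is the first (outermost) cut; larger numbers are later, inner cuts.
    tree = (1,
            (2, "A", (4, "B", "C")),
            (3, "D", "E"))

    def cut_numbers(node):
        """Collect every cut number appearing in the slicing tree."""
        if isinstance(node, str):              # leaf: a circuit block
            return []
        cut, left, right = node
        return [cut] + cut_numbers(left) + cut_numbers(right)

    routing_order = sorted(cut_numbers(tree), reverse=True)
    print(routing_order)                       # [4, 3, 2, 1]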
➢ Figure 2.7 shows a floorplan that is not a slicing structure: it contains a cyclic constraint that
    prevents channel routing. We cannot route any of the channels in this floorplan without
    routing all of the other channels first.
➢ There are two solutions to this problem. One solution is to move the blocks until we obtain a
   slicing floorplan. The other solution is to allow the use of L-shaped, rather than rectangular,
    channels (or areas with fixed connectors on all sides, called switch boxes). We need an area-based
    router rather than a channel router to route L-shaped regions or switch boxes.
➢ Fig 2.7(c) This floorplan may be sliced (with initial cuts 1 or 2) and has no cyclic constraints,
   but it is inefficient in area use and will be very difficult to route.
                                    Figure 2.7 Cyclic constraints.
➢ Figure 2.8 (a) displays the floorplan of the ASIC. We can remove the cyclic constraint by
   moving the blocks again, but this increases the chip size. Figure 2.8 (b) shows an alternative
   solution. We merge the flexible standard cell areas A and C. We can do this by selective
   flattening of the netlist. Sometimes flattening can reduce the routing area because routing
   between blocks is usually less efficient than routing inside the row-based blocks. Figure 2.8(b)
shows the channel definition and routing order for our chip.
2.4 I/O AND POWER PLANNING
➢ On a pad-limited die we use tall, thin pad-limited pads, which maximize the number of pads
   we can fit around the edge of the chip; the number of pads determines the die size.
➢ On a core-limited die we use short, wide core-limited pads. The core logic determines the die
   size.
➢ Figure 2.9(c) shows how we can use both types of pad (pad-limited and core-limited) to make
    the aspect ratio of the die different from that of the core, for example to obtain a square die.
➢ Special power pads are used for the positive supply, or VDD, power buses (or power rails) and
    for the ground or negative supply, VSS or GND.
➢ Depending on the type of package and how the foundry attaches the silicon die to the chip
   cavity in the chip carrier, there may be an electrical connection between the chip carrier and
   the die substrate.
➢ Usually, the die is cemented in the chip cavity with a conductive epoxy, making an electrical
   connection between substrate and the package cavity in the chip carrier. If we make an
   electrical connection between the substrate and a chip pad, or to a package pin, it must be
   to VDD (n-type substrate) or VSS (p-type substrate). This substrate connection (for the whole
   chip) employs a down bond (or drop bond) to the carrier. We have several options:
    •   We can dedicate one (or more) chip pad(s) to down bond to the chip carrier.
     •   We can make a connection from a chip pad to the lead frame and down bond from the
         chip pad to the chip carrier.
    •   We can make a connection from a chip pad to the lead frame and down bond from the
        lead frame.
    •   We can down bond from the lead frame without using a chip pad.
    •   We can leave the substrate and/or chip carrier unconnected.
➢ Depending on the package design, the type and positioning of down bonds may be fixed. This
   means we need to fix the position of the chip pad for down bonding using a pad seed.
➢ A double bond connects two pads to one chip-carrier finger and one package pin. A multiple-
   signal pad or pad group is a set of pads. For example, an oscillator pad usually comprises a
   set of two adjacent pads that we connect to an external crystal. The oscillator circuit and the
   two signal pads form a single logic cell. Another common example is a clock pad.
➢ Some foundries allow a special form of corner pad (normal pads are edge pads) that squeezes
    two pads into the area at the corners of a chip using a special two-pad corner cell, to help
    meet bond-wire angle design rules (Figure 2.10 b and c).
➢ To reduce the series resistive and inductive impedance of power supply networks, it is normal
    to use multiple VDD and VSS pads. This is particularly important with the simultaneously
    switching outputs (SSOs) that occur when driving buses off-chip. Depending on the
    technology it may be necessary to provide dedicated VDD and VSS pads for every few SSOs.
➢ These dedicated VDD/VSS pads must follow groups of output pads as they are seeded or
    planned on the floorplan. With some chip packages this can become difficult because design
    rules limit the location of package pins that may be used for supplies.
➢ Using a pad mapping we translate the logical pad in a netlist to a physical pad from a pad
   library. The handling of I/O pads can become quite complex; there are several nonobvious
   factors that must be considered when generating a pad ring:
    •   Ideally we would only need to design library pad cells for one orientation. For example,
        an edge pad for the south side of the chip, and a corner pad for the southeast corner. We
        could then generate other orientations by rotation and flipping (mirroring). Some ASIC
        vendors will not allow rotation or mirroring of logic cells in the mask file. To avoid these
       problems, we may need to have separate horizontal, vertical, left-handed, and right-
       handed pad cells in the library with appropriate logical to physical pad mappings.
   •   If we mix pad-limited and core-limited edge pads in the same pad ring, this complicates
       the design of corner pads. Usually, the two types of edge pad cannot abut. In this case a
       corner pad also becomes a pad-format changer, or hybrid corner pad.
   •   In single-supply chips we have one VDD net and one VSS net, both global power nets. It
       is also possible to use mixed power supplies (for example, 3.3 V and 5 V) or multiple
       power supplies (digital VDD, analog VDD).
➢ Figure 2.10 (a) and (b) are magnified views of the southeast corner of our example chip and
   show the different types of I/O cells. Figure 2.10 (c) shows a stagger-bond arrangement using
   two rows of I/O pads.
➢ In an MGA (masked gate array) the pad spacing and I/O-cell spacing are fixed—each pad
   occupies a fixed pad slot (or pad site ). This means that the properties of the pad I/O are also
   fixed but, if we need to, we can parallel adjacent output cells to increase the drive.
➢ To increase flexibility further the I/O cells can use a separation, the I/O-cell pitch, that is
    smaller than the pad pitch.
➢ For example, three 4 mA driver cells can occupy two pad slots. Then we can use two 4 mA
   output cells in parallel to drive one pad, forming an 8 mA output pad as shown in Figure 2.11.
   This arrangement also means the I/O pad cells can be changed without changing the base
   array. This is useful as bonding techniques improve and the pads can be moved closer
   together.
➢ Fig 2.11(a) Cell-based ASICs may contain pad cells of different sizes and widths. (b) A corner
   of a gate-array base. (c) A gate-array base with different I/O cell and pad pitches.
➢ Figure 2.12 shows two possible power distribution schemes. The long direction of a
   rectangular channel is the channel spine. Some automatic routers may require that metal
   lines parallel to a channel spine use a preferred layer (either m1, m2, or m3).
➢ Since we can have both horizontal and vertical channels, we may have the situation shown
   in Figure 2.12, where we have to decide whether to use a preferred layer or the preferred
   direction for some channels.
➢ Fig 2.12 (a) Power distributed using m1 for VSS and m2 for VDD. This helps minimize the
   number of vias and layer crossings needed but causes problems in the routing channels.
➢ (b) In this floorplan m1 is run parallel to the longest side of all channels, the channel spine.
   This can make automatic routing easier but may increase the number of vias and layer
   crossings. (c) An expanded view of part of a channel. If power runs on different layers along
    the spine of a channel, this forces signals to change layers. (d) A closeup of VDD & VSS buses
   as they cross. Changing layers requires a large number of via contacts to reduce resistance.
                                Figure 2.12 Power distribution
2.5 CLOCK PLANNING
➢ Figure 2.13 (a) shows a clock spine routing scheme with all clock pins driven directly from
   the clock driver.
➢ MGAs and FPGAs often use this fish bone type of clock distribution scheme. Figure 2.13(b)
   shows a clock spine for a cell-based ASIC.
➢ Figure 2.13(c) shows the clock-driver cell, often part of a special clock-pad cell.
➢ Figure 2.13(d) illustrates clock skew and clock latency. Since all clocked elements are driven
   from one net with a clock spine, skew is caused by differing interconnect lengths and loads.
➢ If the clock-driver delay is much larger than the interconnect delays, a clock spine achieves
   minimum skew but with long latency.
                                 Figure 2.13 Clock distribution.
  (a) A clock spine for a gate array. (b) A clock spine for a cell-based ASIC (typical chips have
  thousands of clock nets). (c) A clock spine is usually driven from one or more clock-driver
  cells. Delay in the driver cell is a function of the number of stages and the ratio of output to
input capacitance for each stage (taper). (d) Clock latency and clock skew.
➢ Clock skew represents a fraction of the clock period that we cannot use for computation. A
   clock skew of 500 ps with a 200 MHz clock means that we waste 500 ps of every 5 ns clock
   cycle, or 10 percent of performance.
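➢ The arithmetic behind that 10 percent figure, as a quick check:

    clock_freq = 200e6              # 200 MHz clock
    period = 1 / clock_freq         # 5 ns clock period
    skew = 500e-12                  # 500 ps clock skew
    print(f"lost fraction of the cycle = {skew / period:.0%}")   # 10%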
➢ Latency can cause a similar loss of performance at the system level when we need to
    resynchronize the output signals with a master system clock.
➢ We can design a tree of clock buffers so that the taper of each stage is e ≈ 2.7 by using a
    fanout of three at each node, as shown in Figure 2.14(a) and (b).
➢ The clock tree, shown in Figure 2.14 (c), uses the same number of stages as a clock spine,
   but with a lower peak current for the inverter buffers. Figure 2.14 (c) illustrates that we now
   have another problem—we need to balance the delays through the tree carefully to minimize
    clock skew.
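➢ As a rough sketch (not taken from the text above), with a fanout of three at every node the
    number of buffer stages needed to reach N clocked elements grows as the logarithm of N to
    base 3:

    import math

    def clock_tree_stages(num_loads, fanout=3):
        """Stages in a balanced clock tree with the given fanout at each node."""
        return math.ceil(math.log(num_loads, fanout))

    print(clock_tree_stages(2000))   # 7 stages, since 3**7 = 2187 >= 2000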
                                   Figure 2.14 A clock tree.
(a) Minimum delay is achieved when the taper of successive stages is about 3. (b) Using a fanout
of three at successive nodes. (c) A clock tree for the cell-based ASIC of Figure 2.3(b). We have to
balance the clock arrival times at all of the leaf nodes to minimize clock skew.
➢ Designing a clock tree that balances the rise and fall times at the leaf nodes has the beneficial
   side-effect of minimizing the effect of hot-electron wearout . This problem occurs when an
   electron gains enough energy to become “hot” and jump out of the channel into the gate oxide.
➢ The trapped electrons change the threshold voltage of the device and this alters the delay of
   the buffers. As the buffer delays change with time, this introduces unpredictable skew. The
    problem is worst when the n-channel device is carrying maximum current with a high voltage
    across the channel—this occurs during the rise- and fall-time transitions. Balancing the rise
   and fall times in each buffer means that they all wear out at the same rate, minimizing any
   additional skew.
➢ A phase-locked loop ( PLL ) is an electronic flywheel that locks in frequency to an input clock
   signal. The input and output frequencies may differ in phase, however. This means that we
   can, for example, drive a clock network with a PLL in such a way that the output of the clock
   network is locked in phase to the incoming clock, thus eliminating the latency of the clock
    network.
2b: PLACEMENT
➢ After completing a floorplan we can begin placement of the logic cells within the flexible
   blocks. Placement is much more suited to automation than floorplanning.
➢ After we complete floorplanning and placement, we can predict both intrablock and interblock
   capacitances. This allows us to return to logic synthesis with more accurate estimates of the
   capacitive loads that each logic cell must drive.
The goal of a placement tool is to arrange all the logic cells within the flexible blocks on a chip.
Ideally, the objectives of the placement step are to
   •   Guarantee the router can complete the routing step
   •   Minimize all the critical net delays
   •   Make the chip as dense as possible
We may also have the following additional objectives:
   •   Minimize power dissipation
   •   Minimize cross talk between signals
Current placement tools use more specific and achievable criteria. The most commonly used
placement objectives are one or more of the following:
   •   Minimize the total estimated interconnect length
   •   Meet the timing requirements for critical nets
   •   Minimize the interconnect congestion
Each of these objectives in some way represents a compromise.
➢ There are two classes of placement algorithms commonly used in commercial CAD tools:
   constructive placement and iterative placement improvement.
   The most commonly used methods are variations on the min-cut algorithm. The other
   commonly used constructive placement algorithm is the eigenvalue method. As in system
   partitioning, placement usually starts with a constructed solution and then improves it using
   an iterative algorithm. In most tools we can specify the locations and relative placements of
   certain critical logic cells as seed placements.
➢ The min-cut placement method uses successive application of partitioning. The following
   steps are shown in Figure 2.1:
   1. Cut the placement area into two pieces.
   2. Swap the logic cells to minimize the cut cost.
   3. Repeat the process from step 1, cutting smaller pieces until all the logic cells are placed.
                                Figure 2.1 Min-cut placement.
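➢ A minimal sketch of the min-cut steps listed above, using a toy netlist and a simple greedy
    swap in place of a full Kernighan–Lin pass; the data format and cost function are assumptions
    made only for illustration.

    # Toy min-cut placement: nets is a list of sets of cell names.
    def cut_cost(nets, left):
        """Number of nets crossing the cut (cells split between the two sides)."""
        left = set(left)
        return sum(1 for net in nets if net & left and net - left)

    def bisect(cells, nets):
        """Steps 1 and 2: cut the cell list in half, then swap cells to reduce the cut cost."""
        half = len(cells) // 2
        left, right = cells[:half], cells[half:]
        improved = True
        while improved:
            improved = False
            for i, a in enumerate(left):
                for j, b in enumerate(right):
                    trial_l = left[:i] + [b] + left[i + 1:]
                    trial_r = right[:j] + [a] + right[j + 1:]
                    if cut_cost(nets, trial_l) < cut_cost(nets, left):
                        left, right = trial_l, trial_r
                        improved = True
                        break
                if improved:
                    break
        return left, right

    def min_cut_place(cells, nets, region=(0, 0, 1, 1), places=None):
        """Step 3: recurse, assigning each cell a smaller and smaller region (bin)."""
        places = {} if places is None else places
        if len(cells) <= 1:
            for c in cells:
                places[c] = region
            return places
        left, right = bisect(list(cells), nets)
        x0, y0, x1, y1 = region
        if (x1 - x0) >= (y1 - y0):             # cut the longer side of the region
            halves = ((x0, y0, (x0 + x1) / 2, y1), ((x0 + x1) / 2, y0, x1, y1))
        else:
            halves = ((x0, y0, x1, (y0 + y1) / 2), (x0, (y0 + y1) / 2, x1, y1))
        min_cut_place(left, nets, halves[0], places)
        min_cut_place(right, nets, halves[1], places)
        return places

    nets = [{"A", "B"}, {"B", "C"}, {"C", "D"}, {"A", "D"}]
    print(min_cut_place(["A", "B", "C", "D"], nets))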
➢ Usually we divide the placement area into bins. The size of a bin can vary, from a bin size
   equal to the base cell (for a gate array) to a bin size that would hold several logic cells. We
   can start with a large bin size, to get a rough placement, and then reduce the bin size to get
   a final placement.
An iterative placement improvement algorithm takes an existing placement and tries to improve
it by moving the logic cells. There are two parts to the algorithm:
    •   The selection criteria that decides which logic cells to try moving.
    •   The measurement criteria that decides whether to move the selected cells.
There are several interchange or iterative exchange methods that differ in their selection and
measurement criteria:
    •   pairwise interchange,
    •   force-directed interchange,
    •   force-directed relaxation, and
    •   force-directed pairwise relaxation.
➢ All of these methods usually consider only pairs of logic cells to be exchanged. A source logic
   cell is picked for trial exchange with a destination logic cell.
➢ The most widely used methods use group migration, especially the Kernighan–Lin algorithm.
   The pairwise-interchange algorithm is similar to the interchange algorithm used for iterative
   improvement in the system partitioning step:
    1. Select the source logic cell at random.
    2. Try all the other logic cells in turn as the destination logic cell.
    3. Use any of the measurement methods we have discussed to decide on whether to accept
       the interchange.
    4. The process repeats from step 1, selecting each logic cell in turn as a source logic cell.
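➢ A minimal sketch of these steps on a toy placement; the grid coordinates, netlist, and the
    total-wirelength measurement criterion are assumptions made only for illustration.

    import random

    # placement: cell -> (x, y) slot; nets: two-terminal connections for simplicity.
    placement = {"A": (0, 0), "B": (1, 0), "C": (0, 1), "D": (1, 1)}
    nets = [("A", "D"), ("B", "C"), ("A", "B")]

    def total_length(pl):
        """Sum of Manhattan lengths over all connections (the measurement criterion)."""
        return sum(abs(pl[u][0] - pl[v][0]) + abs(pl[u][1] - pl[v][1]) for u, v in nets)

    def pairwise_interchange(pl, passes=20):
        cells = list(pl)
        for _ in range(passes):
            src = random.choice(cells)                  # step 1: random source cell
            for dst in cells:                           # step 2: try each destination
                if dst == src:
                    continue
                before = total_length(pl)
                pl[src], pl[dst] = pl[dst], pl[src]     # trial interchange
                if total_length(pl) >= before:          # step 3: keep only improvements
                    pl[src], pl[dst] = pl[dst], pl[src]     # undo the swap
        return pl

    pairwise_interchange(placement)
    print(total_length(placement), placement)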
➢ Figure 2.2 (a) and (b) show how we can extend pairwise interchange to swap more than two
   logic cells at a time. If we swap l logic cells at a time and find a locally optimum solution, we
   say that solution is l-optimum.
➢ The neighborhood exchange algorithm is a modification to pairwise interchange that
    considers only destination logic cells in a neighborhood—cells within a certain distance, e,
   of the source logic cell. Limiting the search area for the destination logic cell to the e-
   neighborhood reduces the search time.
➢ Figure 2.2 (c) and (d) show the one- and two-neighborhoods (based on Manhattan distance)
   for a logic cell.
➢ (a) Swapping the source logic cell with a destination logic cell in pairwise interchange.
➢ (b) Sometimes we have to swap more than two logic cells at a time to reach an optimum
   placement, but this is expensive in computation time.
➢ Limiting the search to neighborhoods reduces the search time. Logic cells within a distance
   e of a logic cell form an e-neighborhood. (c) A one-neighborhood. (d) A two-neighborhood.
➢ Neighborhoods are also used in some of the force-directed placement methods.
➢ Imagine identical springs connecting all the logic cells we wish to place. The number of
   springs is equal to the number of connections between logic cells. The effect of the springs is
   to pull connected logic cells together. The more highly connected the logic cells, the stronger
   the pull of the springs. The force on a logic cell i due to logic cell j is given by Hooke’s law ,
   which says the force of a spring is proportional to its extension:
F_ij = c_ij x_ij
➢ The vector component x_ij is directed from the center of logic cell i to the center of logic cell j.
    The vector magnitude is calculated as either the Euclidean or Manhattan distance between
    the logic cell centers. The c_ij form the connectivity or cost matrix (the matrix element c_ij is
    the number of connections between logic cell i and logic cell j). Fig 2.3 illustrates the force-
    directed placement algorithm.
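➢ A minimal sketch, with hypothetical cell centers and connectivity entries, of the force
    computation above; c_ij counts connections and x_ij points from cell i toward cell j.

    # Cell centers and connectivity matrix entries (both hypothetical, c is symmetric).
    pos = {"A": (0.0, 0.0), "B": (2.0, 0.0), "I": (2.0, 2.0)}
    c = {("A", "B"): 1, ("A", "I"): 2, ("B", "I"): 1}

    def connections(i, j):
        return c.get((i, j), c.get((j, i), 0))

    def force_on(i):
        """Total spring force on cell i: sum over j of c_ij * x_ij (Hooke's law)."""
        fx = fy = 0.0
        for j in pos:
            if j != i:
                cij = connections(i, j)
                fx += cij * (pos[j][0] - pos[i][0])
                fy += cij * (pos[j][1] - pos[i][1])
        return fx, fy

    print(force_on("A"))   # (6.0, 4.0): cell A is pulled hardest toward I (two connections)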
                                              Figure 2.3 Force-directed placement.
➢ Fig 2.3 (a) A network with nine logic cells. (b) We make a grid (one logic cell per bin). (c) Forces
   are calculated as if springs were attached to the centers of each logic cell for each connection.
   The two nets connecting logic cells A and I correspond to two springs.
➢ In the definition of connectivity it was pointed out that the network graph does not accurately
   model connections for nets with more than two terminals. Nets such as clock nets, power
   nets, and global reset lines have a huge number of terminals.
➢ Without external forces to counteract the pull of the springs between logic cells, the network
    will collapse to a single point as it settles. An important part of force-directed placement is
    therefore fixing some of the logic cells in position; normally the I/O pads or other external
    connections act as anchor points or fixed seeds.
➢ In timing-driven placement we know the arrival times at the primary inputs for
     each signal. We also know the required times for the primary outputs—the points in time
    at which we want the signals to be valid.
➢ We can work forward from the primary inputs and backward from the primary outputs to
    determine arrival and required times at each input pin for each net. The difference between
    the required and arrival times at each input pin is the slack time (the time we have to spare).
➢ The zero-slack algorithm adds delay to each net until the slacks are zero, as shown
    in Figure 2.5 (b). The net delays can then be converted to weights or constraints in the
    placement. Notice that we have assumed that all the gates on a net switch at the same time
    so that the net delay can be placed at the output of the gate driving the net.
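➢ A minimal sketch of this slack bookkeeping on a two-gate chain (delays and times are
    hypothetical): arrival times propagate forward, required times propagate backward, and the
    slack at each point is the difference.

    # Chain: primary input -> G1 -> G2 -> primary output.  Times in ns (hypothetical).
    gate_delay = {"G1": 2.0, "G2": 3.0}
    arrival_at_input = 0.0
    required_at_output = 10.0

    arrival = {}                                          # forward pass
    arrival["G1"] = arrival_at_input + gate_delay["G1"]   # 2.0
    arrival["G2"] = arrival["G1"] + gate_delay["G2"]      # 5.0

    required = {}                                         # backward pass
    required["G2"] = required_at_output                   # 10.0
    required["G1"] = required["G2"] - gate_delay["G2"]    # 7.0

    slack = {g: required[g] - arrival[g] for g in gate_delay}
    print(slack)   # {'G1': 5.0, 'G2': 5.0}; the zero-slack algorithm would add net
                   # delays along this path until these slacks reach zero.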
➢ An important point to remember is that adjusting the net weight, even for every net on a chip,
    does not theoretically make the placement algorithms any more complex—we have to deal
    with the numbers anyway.
➢   The practical problem, however, is getting the weight information for each net (usually in the
    form of timing constraints) from a synthesis tool or timing verifier. These files can easily be
     very large, and it is difficult to handle this quantity of
     information. Usually we still do not compute a routing tree but use simple approximations to
    the total net length (such as the half-perimeter measure) and then use this to estimate a net
    delay (the same to each pin on a net). It is not until the routing step that we can make
    accurate estimates of the actual interconnect delays.
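➢ A minimal sketch of the half-perimeter measure mentioned above, with hypothetical pin
    coordinates:

    def half_perimeter(pins):
        """Half the perimeter of the smallest bounding box enclosing all pins of a net."""
        xs = [x for x, _ in pins]
        ys = [y for _, y in pins]
        return (max(xs) - min(xs)) + (max(ys) - min(ys))

    net_pins = [(1, 1), (4, 2), (2, 5)]     # hypothetical pin locations on a grid
    print(half_perimeter(net_pins))         # (4 - 1) + (5 - 1) = 7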
                              Figure 2.5 The zero-slack algorithm.
  (a) The circuit with no net delays. (b) The zero-slack algorithm adds net delays (at the
  outputs of each gate, equivalent to increasing the gate delay) to reduce the slack times to
  zero.
PHYSICAL DESIGN FLOW
1. Design entry. The input is a logical description of the ASIC with no physical information.
2. Initial synthesis. The initial synthesis contains little or no information on any interconnect
   loading. The output of the synthesis tool is the input to the floorplanner.
3. Initial floorplan. From the initial floorplan, interblock capacitances are input to the synthesis
   tool as load constraints and intrablock capacitances are input as wire-load tables.
4. Synthesis with load constraints. At this point the synthesis tool is able to resynthesize the
   logic based on estimates of the interconnect capacitance each gate is driving. The synthesis
   tool produces a forward annotation file to constrain path delays in the placement step.
5. Timing-driven placement. After placement using constraints from the synthesis tool, the
   location of every logic cell on the chip is fixed and accurate estimates of interconnect delay
   can be passed back to the synthesis tool.
6. Synthesis with in-place optimization (IPO). The synthesis tool changes the drive strength
   of gates based on the accurate interconnect delay estimates from the floorplanner without
   altering the netlist structure.
7. Detailed placement. The placement information is ready to be input to the routing step.
                                            2c: ROUTING
➢ Once the designer has floorplanned a chip and the logic cells within the flexible blocks have
   been placed, it is time to make the connections by routing the chip. This is still a hard problem
   that is made easier by dividing it into smaller problems. Routing is usually split into global
   routing followed by detailed routing.
➢ The details of global routing differ slightly between cell-based ASICs, gate arrays, and FPGAs,
   but the principles are the same in each case. A global router does not make any connections,
   it just plans them. We typically global route the whole chip (or large pieces if it is a large chip)
   before detail routing the whole chip (or the pieces). There are two types of areas to global
   route: inside the flexible blocks and between blocks (the Viterbi decoder, although a cell-
   based ASIC, only involved the global routing of one large flexible block).
➢ The input to the global router is a floorplan that includes the locations of all the fixed and
   flexible blocks; the placement information for flexible blocks; and the locations of all the logic
   cells. The goal of global routing is to provide complete instructions to the detailed router on
   where to route every net. The objectives of global routing are one or more of the following:
• Maximize the probability that the detailed router can complete the routing.
• Minimize the total interconnect length.
• Minimize the delay of critical paths.
➢ During placement, the interconnect length was estimated as if nets could be routed on a
   rectangular grid, since at this point nets have not been assigned to the channels, but the
   global router must use the wiring channels and find the actual path. Often the global router
   needs to find a path that minimizes the delay between two terminals.
➢ Global routing cannot use the interconnect length approximations such as the half perimeter
   measure that were used in placement. One approach to global routing takes each net in turn
   and calculates the shortest path using tree on graph algorithms with the added restriction of
   using the available channels. This process is known as sequential routing.
➢ As a sequential routing algorithm proceeds, some channels will become more congested since
    they hold more interconnects than others. In the case of FPGAs and channeled gate
    arrays, the channels have a fixed channel capacity and can only hold a certain number of
    interconnects.
➢ There are two different ways that a global router handles this problem. Using order-
    independent routing, a global router proceeds by routing each net ignoring how crowded the
    channels are. Whether a particular net is processed first or last does not matter; the channel
    assignment will be the same.
➢ In order-independent routing, after all the interconnects are assigned to channels, the global
    router returns to those channels that are the most crowded and reassigns some interconnects
    to other, less crowded channels.
➢ Alternatively, a global router can consider the number of interconnects already placed in
    various channels as it proceeds. In this case the global routing is order-dependent; the
    routing is still sequential, but now the order of processing the nets will affect the results.
➢ In contrast to sequential global routing methods, which handle nets one at a time,
    hierarchical routing handles all nets at a particular level at once. Rather than handling all of
    the nets on the chip at the same time, the global routing problem is made more tractable by
    dividing the chip area into levels of hierarchy.
➢ By considering only one level of hierarchy at a time, the size of the problem is reduced at each
    level. There are two ways to traverse the levels of hierarchy: the top-down approach starts at
    the whole chip, or highest level, and proceeds down to the logic cells; the bottom-up
    approach starts at the lowest level of hierarchy and globally routes the smallest areas first.
2.2.1 GLOBAL ROUTING BETWEEN BLOCKS
➢ Figure 2.1 illustrates the global routing problem for a cell-based ASIC. Each edge in the
    channel-intersection graph in figure 2.1(c) represents a channel. The global router is
   restricted to using these channels. The weight of each edge in the graph corresponds to the
   length of the channel. The global router plans a path for each interconnect using this graph.
Figure 2.1 Global routing for a cell-based ASIC formulated as a graph problem.
(a) A cell-based ASIC with numbered channels. (b) The channels form the edges of a graph
(c) The channel-intersection graph. Each channel corresponds to an edge on a graph whose
weight corresponds to the channel length.
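➢ A minimal sketch of planning one interconnect on such a channel-intersection graph using
    Dijkstra's shortest-path algorithm; the node names and channel lengths are hypothetical.

    import heapq

    # Channel-intersection graph: node -> {neighbor: channel length}.
    graph = {
        "n1": {"n2": 4, "n3": 2},
        "n2": {"n1": 4, "n4": 5},
        "n3": {"n1": 2, "n4": 8},
        "n4": {"n2": 5, "n3": 8},
    }

    def shortest_channel_path(src, dst):
        """Plan the shortest sequence of channel intersections from src to dst."""
        dist, prev = {src: 0}, {}
        heap = [(0, src)]
        while heap:
            d, u = heapq.heappop(heap)
            if u == dst:
                break
            if d > dist.get(u, float("inf")):
                continue
            for v, w in graph[u].items():
                if d + w < dist.get(v, float("inf")):
                    dist[v], prev[v] = d + w, u
                    heapq.heappush(heap, (d + w, v))
        path = [dst]
        while path[-1] != src:
            path.append(prev[path[-1]])
        return list(reversed(path)), dist[dst]

    print(shortest_channel_path("n1", "n4"))   # (['n1', 'n2', 'n4'], 9)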
➢ Figure 2.2 shows an example of global routing for a net with five terminals, labelled A1
    through F1, for the cell-based ASIC shown in figure 2.1. If a designer wishes to use minimum
    total interconnect path length as an objective, the global router finds the minimum-length tree
    shown in Figure 2.2(b).
                             Figure 2.2 Finding paths in global routing
    (a) A cell-based ASIC showing a single net with a fanout of four (five terminals). We have to
    order the numbered channels to complete the interconnect path for terminals A1 through F1.
    (b) The terminals are projected to the center of the nearest channel, forming a graph. A
    minimum-length tree for the net uses the channels and takes into account the channel
    capacities.
    (c) The minimum-length tree does not necessarily correspond to minimum delay. If we wish
    to minimize the delay from terminal A1 to D1, a different tree might be better.
➢ Global routing is very similar for cell-based ASICs and gate arrays, but there is a very
    important difference between the types of channels in these ASICs.
➢ The size of the channels in sea-of-gates arrays, channelless gate arrays, and cell-based ASICs
    can be varied to make sure there is enough space to complete the wiring.
➢ In channeled gate arrays and FPGAs the size, number, and location of channels are fixed.
➢ In that case the global router can allocate as many interconnects to each channel as it likes,
    since that space is committed anyway, but there is a maximum number of interconnects that
    each channel can hold.
➢ If the global router needs more room, even in just one channel on the whole chip, the designer
    has to repeat the placement and routing steps and try again (or use a bigger chip).
➢ Back-annotation: After global routing is complete it is possible to accurately predict the
    length of each interconnect in every net after detailed routing. The global router can
    give us not just an estimate of the total net length but also the resistance and capacitance of
    each path in each net. This RC information is used to calculate net delays.
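➢ A minimal sketch (a lumped-RC approximation with hypothetical values) of turning the
    back-annotated resistance and capacitance of one path into a net delay:

    # Back-annotated parasitics for one path of a net (all values hypothetical).
    r_per_mm = 50.0          # interconnect resistance, ohm per mm
    c_per_mm = 0.2e-12       # interconnect capacitance, F per mm
    length_mm = 3.0          # routed length reported by the global router
    c_load = 0.05e-12        # input capacitance of the driven gate, F

    R = r_per_mm * length_mm
    C = c_per_mm * length_mm + c_load
    delay = 0.69 * R * C     # simple single-time-constant (lumped RC) estimate
    print(f"estimated net delay = {delay * 1e12:.1f} ps")   # about 67 ps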
➢ We can back-annotate this net delay information to the synthesis tool for in-place
    optimization, or to a timing verifier to make sure there are no timing surprises. Differences
    in timing predictions at this point arise due to the different ways in which the placement
    algorithms estimate the paths and the way the global router actually builds the paths.