Chap 8
Chap 8
                                                                  Original Authors:
                                     Andrew B. Kahng, Jens Lienig, Igor L. Markov, Jin Hu
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 8: Timing Closure 1
                                                                                                                           Lienig
   Chapter 8 – Timing Closure
                                                                                              © KLMH
8.1     Introduction
8.2     Timing Analysis and Performance Constraints
        8.2.1 Static Timing Analysis
        8.2.2 Delay Budgeting with the Zero-Slack Algorithm
8.3 Timing-Driven Placement
        8.3.1 Net-Based Techniques
        8.3.2 Embedding STA into Linear Programs for Placement
8.4 Timing-Driven Routing
        8.4.1 The Bounded-Radius, Bounded-Cost Algorithm
        8.4.2 Prim-Dijkstra Tradeoff
        8.4.3 Minimization of Source-to-Sink Delay
8.5 Physical Synthesis
        8.5.1 Gate Sizing
        8.5.2 Buffering
        8.5.3 Netlist Restructuring
8.6 Performance-Driven Design Flow
8.7 Conclusions
VLSI Physical Design: From Graph Partitioning to Timing Closure   Chapter 8: Timing Closure   2
                                                                                                       Lienig
   8.1               Introduction
                                                                                                                  © KLMH
                         System Specification
                                                                      Partitioning
                         Architectural Design
  ENTITY test is
   port a: in bit;
 end ENTITY test;
                          Functional Design                          Chip Planning
                          and Logic Design
                            Physical Design
                                                                  Clock Tree Synthesis
                          Physical Verification
     DRC                      and Signoff
     LVS                                                             Signal Routing
     ERC
                              Fabrication
Timing Closure
Chip
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 8: Timing Closure 3
                                                                                                                           Lienig
    8.1           Introduction
                                                                                              © KLMH
•   IC layout must satisfy geometric constraints, electrical constraints,
    power & thermal constraints as well as timing constraints
    − Setup (long-path) constraints
    − Hold (short-path) constraints
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 8: Timing Closure 4
                                                                                                       Lienig
    8.1           Introduction
                                                                                              © KLMH
Components of timing closure covered in this lecture:
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 8: Timing Closure 5
                                                                                                       Lienig
    8.1           Introduction
                                                                                                       © KLMH
•   Timing optimization engines must estimate circuit delays quickly and accurately
    to improve circuit timing
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 8: Timing Closure 6
                                                                                                                Lienig
    8.1           Introduction
                                                                                              © KLMH
•   Timing closure is the process of satisfying timing constraints
    through layout optimizations and netlist modifications
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 8: Timing Closure 7
                                                                                                       Lienig
   8.2            Timing Analysis and Performance Constraints
                                                                                              © KLMH
8.1     Introduction
8.2     Timing Analysis and Performance Constraints
        8.2.1 Static Timing Analysis
        8.2.2 Delay Budgeting with the Zero-Slack Algorithm
8.3 Timing-Driven Placement
        8.3.1 Net-Based Techniques
        8.3.2 Embedding STA into Linear Programs for Placement
8.4 Timing-Driven Routing
        8.4.1 The Bounded-Radius, Bounded-Cost Algorithm
        8.4.2 Prim-Dijkstra Tradeoff
        8.4.3 Minimization of Source-to-Sink Delay
8.5 Physical Synthesis
        8.5.1 Gate Sizing
        8.5.2 Buffering
        8.5.3 Netlist Restructuring
8.6 Performance-Driven Design Flow
8.7 Conclusions
VLSI Physical Design: From Graph Partitioning to Timing Closure   Chapter 8: Timing Closure   8
                                                                                                       Lienig
   8.2            Timing Analysis and Performance Constraints
                                                                                                                  © KLMH
Sequential circuit, “unrolled” in time
Clock
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 8: Timing Closure 9
                                                                                                                                   Lienig
    8.2           Timing Analysis and Performance Constraints
                                                                                               © KLMH
•   Main delay concerns in sequential circuits
    − Gate delays are due to gate transitions
    − Wire delays are due to signal propagation along wires
    − Clock skew is due to the difference in time the sequential elements activate
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 8: Timing Closure 10
                                                                                                        Lienig
    8.2.1         Static Timing Analysis
                                                                                               © KLMH
•   STA: assume worst-case scenario where every gate transitions
⇒ Negative slack at any output means the circuit does not meet timing
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 8: Timing Closure 11
                                                                                                        Lienig
   8.2.1          Static Timing Analysis
                                                                                                                                © KLMH
Combinational circuit as DAG
           a               (0.15)
                                                     y (2)          (0.2)
                                                                                    w (2)         (0.2)            f
                                           (0.1)
           b       (0.1) x (1)                      (0.3)                  (0.25)
                                                                   z (2)
           c                              (0.1)
                                                                                                                                                 Lienig
   8.2.1          Static Timing Analysis
                                                                                                                                   © KLMH
Compute AATs at each node:
where FI(v) is the fanin nodes, and t(u,v) is the delay between u and v
(AATs of inputs are given)
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 8: Timing Closure 13
                                                                                                                                                    Lienig
   8.2.1          Static Timing Analysis
                                                                                                                                    © KLMH
Compute RATs at each node:
where FO(v) are the fanout nodes, and t(u,v) is the delay between u and v
(RATs of outputs are given)
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 8: Timing Closure 14
                                                                                                                                                     Lienig
   8.2.1          Static Timing Analysis
                                                                                                                              © KLMH
Compute slacks at each node:
                                                                                                                                               Lienig
    8.2.2         Delay Budgeting with the Zero-Slack Algorithm
                                                                                               © KLMH
•   Establish timing budgets for nets
    − Gate and wire delays must be optimized during timing driven layout design
    − Wire delays depend on wire lengths
    − Wire lengths are not known until after placement and routing
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 8: Timing Closure 16
                                                                                                        Lienig
      8.2.2       Delay Budgeting with the Zero-Slack Algorithm
                                                                                                             © KLMH
Input: timing graph G(V,E)
Output: timing budgets TB for each v ∈ V
1. do
2.    (AAT,RAT,slack) = STA(G)
3.    foreach (vi ∈ V)
4.       TB[vi] = DELAY(vi) + DELAY(ei)
5.    slackmin = ∞
6.    foreach (v ∈ V)
7.       if ((slack[v] < slackmin) and (slack[v] > 0))
8.            slackmin = slack[v]
9.            vmin = v
10. if (slackmin ≠ ∞)
11.      path = vmin
12.      ADD_TO_FRONT(path,BACKWARD_PATH(vmin,G))
13.      ADD_TO_BACK(path,FORWARD_PATH(vmin,G))
14.      s = slackmin / |path|
15.      for (i = 1 to |path|)
16.           node = path[i]                                      // evenly distribute
17.           TB[node] = TB[node] + s                             // slack along path
18. while (slackmin ≠ ∞)
VLSI Physical Design: From Graph Partitioning to Timing Closure                 Chapter 8: Timing Closure   17
                                                                                                                      Lienig
     8.2.2        Delay Budgeting with the Zero-Slack Algorithm
                                                                                                         © KLMH
Forward Path Search (FORWARD_PATH(vmin,G))
Input: node vmin with minimum slack slackmin, timing graph G
Output: maximal downstream path path from vmin such that no node v ∈ V affects
        the slack of path
1. path = vmin
2. do
3.   flag = false
4.   node = LAST_ELEMENT(path)
5.   foreach (fanout node fo of node)
6.      if ((RAT[fo] == RAT[node] + TB[fo]) and (AAT[fo] == AAT[node] + TB[fo]))
7.            ADD_TO_BACK(path,fo)
8.            flag = true
9.            break
10. while (flag == true)
11. REMOVE_FIRST_ELEMENT(path)                                                         // remove vmin
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 8: Timing Closure 18
                                                                                                                  Lienig
     8.2.2        Delay Budgeting with the Zero-Slack Algorithm
                                                                                                             © KLMH
Backward Path Search (BACKWARD_PATH(vmin,G))
Input: node vmin with minimum slack slackmin, timing graph G
Output: maximal upstream path path from vmin such that no node v ∈ V affects the
        slack of path
1.    path = vmin
2.    do
3.      flag = false
4.      node = FIRST_ELEMENT(path)
5.      foreach (fanin node fi of node)
6.             if ((RAT[fi] == RAT[node] – TB[fi]) and (AAT[fi] == AAT[node] – TB[fi]))
7.                             ADD_TO_FRONT(path,fi)
8.                            flag = true
9.                            break
10. while (flag == true)
11. REMOVE_LAST_ELEMENT(path)                                                              // remove vmin
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 8: Timing Closure 19
                                                                                                                      Lienig
         8.2.2         Delay Budgeting with the Zero-Slack Algorithm Example
                                                                                                                        © KLMH
     •   Example: Use the zero-slack algorithm to distribute slack
     •   Format: <AAT, Slack, RAT>, [timing budget]
                                                                                              O1: <13,4,17>
     <1,4,5> [0]                                                                              O2: <6,8,14>
I1                                        <3,4,7> [0]
                                2
I2
     <0,5,5> [0]
                                                                       <7,4,11> [0]                    <13,4,17> [0]
                                                                   4                                                   O1
I3                                                                                               6
     <1,6,7> [0]
                                                                                                      <6,8,14> [0]
                                                                   3                         0                         O2
I4                                                                     <6,5,11> [0]
     <3,5,8> [0]
     VLSI Physical Design: From Graph Partitioning to Timing Closure                  Chapter 8: Timing Closure        20
                                                                                                                                 Lienig
         8.2.2         Delay Budgeting with the Zero-Slack Algorithm Example
                                                                                                                        © KLMH
     •   Example: Use the zero-slack algorithm to distribute slack
     •   Format: <AAT, Slack, RAT>, [timing budget]
     •   Find the path with the minimum nonzero slack
                                                                                              O1: <13,4,17>
     <1,4,5> [0]                                                                              O2: <6,8,14>
I1                                        <3,4,7> [0]
                                2
I2
     <0,5,5> [0]
                                                                       <7,4,11> [0]                    <13,4,17> [0]
                                                                   4                                                   O1
I3                                                                                               6
     <1,6,7> [0]
                                                                                                      <6,8,14> [0]
                                                                   3                         0                         O2
I4                                                                     <6,5,11> [0]
     <3,5,8> [0]
     VLSI Physical Design: From Graph Partitioning to Timing Closure                  Chapter 8: Timing Closure        21
                                                                                                                                 Lienig
         8.2.2         Delay Budgeting with the Zero-Slack Algorithm Example
                                                                                                                        © KLMH
     •   Example: Use the zero-slack algorithm to distribute slack
     •   Format: <AAT, Slack, RAT>, [timing budget]
     •   Find the path with the minimum slack
     •   Distribute the slacks and update the timing budgets
                                                                                              O1: <17,0,17>
     <1,0,1> [1]                                                                              O2: <6,8,14>
I1                                        <3,0,4> [1]
                                2
I2
     <0,2,2> [0]
                                                                       <9,0,9> [1]                     <16,0,16> [1]
                                                                   4                                                   O1
I3                                                                                               6
     <1,4,5> [0]
                                                                                                      <6,8,14> [0]
                                                                   3                         0                         O2
I4                                                                     <6,4,10> [0]
     <3,4,7> [0]
     VLSI Physical Design: From Graph Partitioning to Timing Closure                  Chapter 8: Timing Closure        22
                                                                                                                                 Lienig
         8.2.2         Delay Budgeting with the Zero-Slack Algorithm Example
                                                                                                                        © KLMH
     •   Example: Use the zero-slack algorithm to distribute slack
     •   Format: <AAT, Slack, RAT>, [timing budget]
     •   Find the path with the minimum slack
     •   Distribute the slacks and update the timing budgets
                                                                                              O1: <17,0,17>
     <1,0,1> [1]                                                                              O2: <6,8,14>
I1                                        <4,0,4> [1]
                                2
I2
     <0,0,0> [2]
                                                                       <9,0,9> [1]                     <16,0,16> [1]
                                                                   4                                                   O1
I3                                                                                               6
     <1,4,5> [0]
                                                                                                      <6,8,14> [0]
                                                                   3                         0                         O2
I4                                                                     <6,4,10> [0]
     <3,4,7> [0]
     VLSI Physical Design: From Graph Partitioning to Timing Closure                  Chapter 8: Timing Closure        23
                                                                                                                                 Lienig
         8.2.2         Delay Budgeting with the Zero-Slack Algorithm Example
                                                                                                                       © KLMH
     •   Example: Use the zero-slack algorithm to distribute slack
     •   Format: <AAT, Slack, RAT>, [timing budget]
     •   Find the path with the minimum slack
     •   Distribute the slacks and update the timing budgets
                                                                                             O1: <16,0,16>
     <1,0,1> [1]                                                                             O2: <6,8,14>
I1                                        <4,0,4> [1]
                                2
I2
     <0,0,0> [2]
                                                                       <9,0,9> [1]                    <16,0,16> [1]
                                                                   4                                                  O1
I3                                                                                              6
     <1,2,3> [2]
                                                                                                     <6,8,14> [0]
                                                                   3                        0                         O2
I4                                                                     <6,2,8> [2]
     <3,2,5> [0]
     VLSI Physical Design: From Graph Partitioning to Timing Closure                 Chapter 8: Timing Closure        24
                                                                                                                                Lienig
         8.2.2         Delay Budgeting with the Zero-Slack Algorithm Example
                                                                                                                       © KLMH
     •   Example: Use the zero-slack algorithm to distribute slack
     •   Format: <AAT, Slack, RAT>, [timing budget]
     •   Find the path with the minimum slack
     •   Distribute the slacks and update the timing budgets
                                                                                             O1: <17,0,17>
     <1,0,1> [1]                                                                             O2: <10,4,14>
I1                                        <4,0,4> [1]
                                2
I2
     <0,0,0> [2]
                                                                       <9,0,9> [1]                    <16,0,16> [1]
                                                                   4                                                  O1
I3                                                                                              6
     <1,0,1> [3]
                                                                                                    <10,4,14> [0]
                                                                   3                        0                         O2
I4                                                                     <7,0,7> [3]
     <3,1,4> [0]
     VLSI Physical Design: From Graph Partitioning to Timing Closure                 Chapter 8: Timing Closure        25
                                                                                                                                Lienig
         8.2.2         Delay Budgeting with the Zero-Slack Algorithm Example
                                                                                                                       © KLMH
     •   Example: Use the zero-slack algorithm to distribute slack
     •   Format: <AAT, Slack, RAT>, [timing budget]
     •   Find the path with the minimum slack
     •   Distribute the slacks and update the timing budgets
                                                                                             O1: <17,0,17>
     <1,0,1> [1]                                                                             O2: <10,4,14>
I1                                        <4,0,4> [1]
                                2
I2
     <0,0,0> [2]
                                                                       <9,0,9> [1]                    <16,0,16> [1]
                                                                   4                                                  O1
I3                                                                                              6
     <1,0,1> [3]
                                                                                                    <10,4,14> [4]
                                                                   3                        0                         O2
I4                                                                     <7,0,7> [3]
     <3,0,3> [1]
     VLSI Physical Design: From Graph Partitioning to Timing Closure                 Chapter 8: Timing Closure        26
                                                                                                                                Lienig
   8.3             Timing-Driven Placement
                                                                                               © KLMH
8.1     Introduction
8.2     Timing Analysis and Performance Constraints
        8.2.1 Static Timing Analysis
        8.2.2 Delay Budgeting with the Zero-Slack Algorithm
8.3 Timing-Driven Placement
        8.3.1 Net-Based Techniques
        8.3.2 Embedding STA into Linear Programs for Placement
8.4 Timing-Driven Routing
        8.4.1 The Bounded-Radius, Bounded-Cost Algorithm
        8.4.2 Prim-Dijkstra Tradeoff
        8.4.3 Minimization of Source-to-Sink Delay
8.5 Physical Synthesis
        8.5.1 Gate Sizing
        8.5.2 Buffering
        8.5.3 Netlist Restructuring
8.6 Performance-Driven Design Flow
8.7 Conclusions
VLSI Physical Design: From Graph Partitioning to Timing Closure   Chapter 8: Timing Closure   27
                                                                                                        Lienig
    8.3            Timing-Driven Placement
                                                                                                   © KLMH
•   Timing-driven placement optimizes circuit delay
    to satisfy timing constraints
•   Let T be the set of all timing endpoints
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 8: Timing Closure 28
                                                                                                            Lienig
    8.3.1         Net-Based Techniques
                                                                                                            © KLMH
•   Net weights are added to each net – placer optimizes weighted wirelength
                                 ω1 if slack > 0
    − Discrete net weights: w =                  where ω1 > 0, ω2 > 0, and ω2 > ω1
                                 ω
                                 2  if slack ≤ 0
                                                                  α
                                      slack 
    − Continuous net weights: w = 1 −                               where t is the longest path delay
                                        t                           and α is a criticality exponent
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 8: Timing Closure 29
                                                                                                                     Lienig
    8.3.1         Net-Based Techniques
                                                                                                               © KLMH
•   Dynamic net weights: (re)computed during placement
                                    1
                                     2 (υ k −1 + 1) if among the top 3% of critical nets
    − Update net criticality: υ k = 
                                      1
                                     υ k −1         otherwise
                                    2
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 8: Timing Closure 30
                                                                                                                        Lienig
    8.3.2         Embedding STA into Linear Programs for Placement
                                                                                               © KLMH
•   Construct a set of constraints for timing-driven placement
    − Physical constraints define locations of cells
    − Timing constraints define slack requirements
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 8: Timing Closure 31
                                                                                                        Lienig
    8.3.2         Embedding STA into Linear Programs for Placement
                                                                                               © KLMH
•   For physical constraints, let:
    − xv and yv be the center of cell v ∈ V
− δx(v,e) and δy(v,e) be pin offsets from xv and yv for v’s pin connected to e
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 8: Timing Closure 32
                                                                                                        Lienig
    8.3.2         Embedding STA into Linear Programs for Placement
                                                                                                      © KLMH
•   Then, for all v ∈ Ve:
                                             left(e) ≤ xv + δ x (v, e)
                                           right(e) ≥ xv + δ x (v, e)
                                        bottom(e) ≤ yv + δ y (v, e)
                                             top(e) ≥ yv + δ y (v, e)
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 8: Timing Closure 33
                                                                                                               Lienig
    8.3.2         Embedding STA into Linear Programs for Placement
                                                                                               © KLMH
•   For timing constraints, let
    − tGATE(vi,vo) be the gate delay from an input pin vi to the output pin vo for cell v
− tNET(e,uo,vi) be net e’s delay from cell u’s output pin uo to cell v’s input pin vi
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 8: Timing Closure 34
                                                                                                        Lienig
    8.3.2         Embedding STA into Linear Programs for Placement
                                                                                                           © KLMH
•   For every input pin vi of cell v :
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 8: Timing Closure 35
                                                                                                                    Lienig
    8.3.2         Embedding STA into Linear Programs for Placement
                                                                                                         © KLMH
•   Optimize for total negative slack:
                                      max :            ∑ slack (τ
                                                τ p ∈Pins ( τ ), τ∈Τ
                                                                       p)
max : WNS
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 8: Timing Closure 36
                                                                                                                  Lienig
   8.4            Timing-Driven Routing
                                                                                               © KLMH
8.1     Introduction
8.2     Timing Analysis and Performance Constraints
        8.2.1 Static Timing Analysis
        8.2.2 Delay Budgeting with the Zero-Slack Algorithm
8.3 Timing-Driven Placement
        8.3.1 Net-Based Techniques
        8.3.2 Embedding STA into Linear Programs for Placement
8.4 Timing-Driven Routing
        8.4.1 The Bounded-Radius, Bounded-Cost Algorithm
        8.4.2 Prim-Dijkstra Tradeoff
        8.4.3 Minimization of Source-to-Sink Delay
8.5 Physical Synthesis
        8.5.1 Gate Sizing
        8.5.2 Buffering
        8.5.3 Netlist Restructuring
8.6 Performance-Driven Design Flow
8.7 Conclusions
VLSI Physical Design: From Graph Partitioning to Timing Closure   Chapter 8: Timing Closure   37
                                                                                                        Lienig
    8.4           Timing-Driven Routing
                                                                                               © KLMH
•   Timing-driven routing seeks to minimize:
    − Maximum sink delay: delay from the source to any sink in a net
    − Total wirelength: routed length of the net
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 8: Timing Closure 38
                                                                                                        Lienig
    8.4           Timing-Driven Routing
                                                                                               © KLMH
•   For any spanning tree T over G, let:
    − radius(T) be the length of the longest source-sink path in T
    − cost(T) be the total edge weight of T
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 8: Timing Closure 39
                                                                                                        Lienig
   8.4             Timing-Driven Routing
                                                                                                                © KLMH
                                                         2             3                2
      2
                                                    3                               3
               6                                                                              6
                           7
 5                                                  5                               5
              s0                                                  s0                            s0
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 8: Timing Closure 40
                                                                                                                                 Lienig
    8.4.1         The Bounded-Radius, Bounded-Cost Algorithm
                                                                                                   © KLMH
•   Trades off radius for cost by setting upper bounds on both
                                                                                                            Lienig
    8.4.2         Prim-Dijkstra Tradeoff
                                                                                                                   © KLMH
•   Prim-Dijkstra Tradeoff based on Prim’s algorithm and Dijkstra’s algorithm
• From the set of sinks S, iteratively add sink s based on different cost function
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 8: Timing Closure 42
                                                                                                                            Lienig
   8.4.2          Prim-Dijkstra Tradeoff
                                                                                                              © KLMH
                      4                                              4
                             7
                                                                                   11
                                 8                                            8
                                                7                                             7
9 s0 9 s0
                    radius(T) = 19                                radius(T) = 15
                     cost(T) = 35                                  cost(T) = 39
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 8: Timing Closure 43
                                                                                                                               Lienig
    8.4.3         Minimization of Source-to-Sink Delay
                                                                                                   © KLMH
•   Iteratively forms a tree by adding sinks, and optimizes for critical sink(s)
                                          ∑ α(i) ⋅ t (s , s )
                                           i =1
                                                            0     i
where α(i) are sink criticalities for sinks si, and t(s0,si) is the delay from s0 to si
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 8: Timing Closure 44
                                                                                                            Lienig
    8.4.3         Minimization of Source-to-Sink Delay
                                                                                               © KLMH
•   In the critical-sink Steiner tree problem, construct a
    minimum-cost Steiner tree T for all sinks except for the most critical sink sc
    − H1: the shortest possible wire that can join sc to T, so long as the path
          from s0 to sc is the shortest possible total length
    − HBest: try all shortest connections from sc to edges in T and from sc to s0.
             Perform timing analysis on each of these trees and pick the one
             with the lowest delay at sc
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 8: Timing Closure 45
                                                                                                        Lienig
   8.5            Physical Synthesis
                                                                                               © KLMH
8.1     Introduction
8.2     Timing Analysis and Performance Constraints
        8.2.1 Static Timing Analysis
        8.2.2 Delay Budgeting with the Zero-Slack Algorithm
8.3 Timing-Driven Placement
        8.3.1 Net-Based Techniques
        8.3.2 Embedding STA into Linear Programs for Placement
8.4 Timing-Driven Routing
        8.4.1 The Bounded-Radius, Bounded-Cost Algorithm
        8.4.2 Prim-Dijkstra Tradeoff
        8.4.3 Minimization of Source-to-Sink Delay
8.5 Physical Synthesis
        8.5.1 Gate Sizing
        8.5.2 Buffering
        8.5.3 Netlist Restructuring
8.6 Performance-Driven Design Flow
8.7 Conclusions
VLSI Physical Design: From Graph Partitioning to Timing Closure   Chapter 8: Timing Closure   46
                                                                                                        Lienig
    8.5           Physical Synthesis
                                                                                               © KLMH
•   Physical synthesis is a collection of timing optimizations to fix negative slack
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 8: Timing Closure 47
                                                                                                        Lienig
    8.5.1         Gate Sizing
                                                                                                                  © KLMH
•   Let a gate v have 3 sizes A, B, C, where:                           size (vC ) > size (v B ) > size (v A )
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 8: Timing Closure 48
                                                                                                                           Lienig
             8.5.1      Gate Sizing
                                                                                                                              © KLMH
    •        Let a gate v have 3 sizes A, B, C, where:                   size (vC ) > size (v B ) > size (v A )
                                                                                              40
             40
                                                                                35
             35                                                                               33
                                                                        30      30
             30
                                                             25         27                                       A
                                             24                                               28
Delay (ps)
             25              23                                         26      27
                                             21                                                                  B
                                                             24
             20              18                                                                                  C
                                             20
             15
                             15
             10
                                                                                                                                               Lienig
       8.5.1          Gate Sizing
                                                                                                                    © KLMH
                                                                      d   C(d) = 1.5
                                 a
                                              v                       e   C(e) = 1.0
                                 b
                                                                      f   C(f) = 0.5
t(vA) = 40 t(vC) = 28
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 8: Timing Closure 50
                                                                                                                             Lienig
    8.5.2         Buffering
                                                                                               © KLMH
•   Buffer: a series of two serially-connected inverters
•   Improve delays by
    − speeding up the circuit or serving as delay elements
    − changing transition times
    − shielding capacitive load
•   Drawbacks:
    − Increased area usage
    − Increased power consumption
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 8: Timing Closure 51
                                                                                                        Lienig
       8.5.2          Buffering
                                                                                                                               © KLMH
                                                                                              d C(d) = 1
                                   d C(d) = 1
                                   e C(e) = 1                                                 e C(e) = 1
a                                                                     a                                           f C(f) = 1
           vB                       f C(f) = 1                                vB
b                                                                     b                                           g C(g) = 1
                                   g C(g) = 1                                                 y
    C(vB) = 5 fF                   h C(h) = 1                             C(vB) = 3 fF                            h C(h) = 1
t(vB) = 45 ps t(vB) = 33 ps
                                                                                                      C(y) = 3 fF
                                                                                           t(y) = t(vB) + t(y) = 66 ps
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 8: Timing Closure 52
                                                                                                                                        Lienig
    8.5.3         Netlist Restructuring
                                                                                               © KLMH
•   Netlist restructuring only changes existing gates, does not change functionality
•   Changes include
    − Cloning: duplicating gates
    − Redesign of fanin or fanout tree: changing the topology of gates
    − Swapping communicative pins: changing the connections
    − Gate decomposition: e.g., changing AND-OR to NAND-NAND
    − Boolean restructuring: e.g., applying Boolean laws to change circuit gates
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 8: Timing Closure 53
                                                                                                        Lienig
    8.5.3         Netlist Restructuring
                                                                                                                                    © KLMH
•   Cloning can reduce fanout capacitance
                                      d    C(d) = 1                   a                                    d C(d) = 1
                                      e    C(e) = 1                                vA                      e C(e) = 1
a                                                                     b
            vB                        f    C(f) = 1                                                        f    C(f) = 1
b
                                      g    C(g) = 1                                vB                      g C(g) = 1
                                      h    C(h) = 1                                                        h C(h) = 1
                                                                                                                                                     Lienig
     8.5.3          Netlist Restructuring
                                                                                                                           © KLMH
  Redesigning the fanin tree can change AATs
                                                                                                                                            Lienig
   8.5.3          Netlist Restructuring
                                                                                                               © KLMH
Redesigning fanout trees can change delays on specific paths
                  path1
                                                                         path1
                  y1 (1)
                                         (1)                                                    (1)
   (1)                                                            (1)
y2 (1) y2 (1)
path2 path2
                                                                                                                                Lienig
     8.5.3          Netlist Restructuring
                                                                                                                         © KLMH
 Swapping commutative pins can change the final delay
                                                                                                                                          Lienig
   8.5.3          Netlist Restructuring
                                                                                                       © KLMH
Gate decomposition can change the general structure of the circuit
                                                                                                                        Lienig
        8.5.3         Netlist Restructuring
                                                                                                                              © KLMH
   Boolean restructuring uses laws or properties,
   e.g., distributive law, to change circuit topology
                                               (a + b)(a + c) = a + bc
a <4>                                                                 a <4>
                       (1)                                                                          (1)      x <5>
b <1>                                                                 b <1>
                                      (1)           x <6>                           (1)
                                                                      c <2>
                       (1)
c <2>
                                      (1)           y <6>                           (1)
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 8: Timing Closure 59
                                                                                                                                               Lienig
   8.6            Performance-Driven Design Flow
                                                                                               © KLMH
8.1     Introduction
8.2     Timing Analysis and Performance Constraints
        8.2.1 Static Timing Analysis
        8.2.2 Delay Budgeting with the Zero-Slack Algorithm
8.3 Timing-Driven Placement
        8.3.1 Net-Based Techniques
        8.3.2 Embedding STA into Linear Programs for Placement
8.4 Timing-Driven Routing
        8.4.1 The Bounded-Radius, Bounded-Cost Algorithm
        8.4.2 Prim-Dijkstra Tradeoff
        8.4.3 Minimization of Source-to-Sink Delay
8.5 Physical Synthesis
        8.5.1 Gate Sizing
        8.5.2 Buffering
        8.5.3 Netlist Restructuring
8.6 Performance-Driven Design Flow
8.7 Conclusions
VLSI Physical Design: From Graph Partitioning to Timing Closure   Chapter 8: Timing Closure   60
                                                                                                        Lienig
   8.6            Performance-Driven Design Flow
                                                                                               © KLMH
Baseline Physical Design Flow
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 8: Timing Closure 61
                                                                                                        Lienig
   8.6                                 Performance-Driven Design Flow
                                                                                                                                     © KLMH
                                                                  Floorplanning Example
                           Digital-to-Analog Converter
      Analog Processing                                    Video         Video
                              Analog-to-Digital and        Pre/Post-
                                                          processing Codec DSP    Embedded
                                                         Control + DSP           Controller for  Protocol
                                                                                  Dataplane     Processing
                                                           Audio         Audio    Processing
                                                           Pre/Post-     Codec
                                                          processing
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 8: Timing Closure 62
                                                                                                                                                      Lienig
   8.6            Performance-Driven Design Flow
                                                                                                       © KLMH
                                       Global Placement Example
                                                                                                                        Lienig
   8.6            Performance-Driven Design Flow
                                                                                                       © KLMH
                                Clock Network Synthesis Example
                                                                                                                        Lienig
   8.6            Performance-Driven Design Flow
                                                                                                       © KLMH
                              Global Routing Congestion Example
                                                                                                                        Lienig
       8.6            Performance-Driven Design Flow
                                                                                                                                © KLMH
                                   Chip Planning and Logic Design
Performance-Driven                                                                         Block Shaping, Sizing
                                                    I/O Placement
                                                                                              and Placement
   Chip Planning
                                                                                     With Optional Net Weights
                                              Performance-Driven
                                                                                         Single Global Net Routes
   Block-Level                                  Trial Synthesis and                           and Buffering
 Delay Budgeting                                   Floorplanning
                                                                                 fails
                                                                                                   RTL
Logic Synthesis and                                                                          Timing Estimation
                                                   Power Planning
Technology Mapping
                                                                                                              passes
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 8: Timing Closure 66
                                                                                                                                                 Lienig
           8.6            Performance-Driven Design Flow
                                                                                                                           © KLMH
                                   Block-level or Top-level Global Placement
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 8: Timing Closure 67
                                                                                                                                            Lienig
           8.6            Performance-Driven Design Flow
                                                                                                                          © KLMH
                                                      Physical Synthesis
                                                                  AND
fails         Static                                                                         Redesign of Fanin
         Timing Analysis                                      Gate Sizing
                                                                                             and Fanout Trees
                      passes
                                                                Routing
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 8: Timing Closure 68
                                                                                                                                           Lienig
     8.6            Performance-Driven Design Flow
                                                                                                                       © KLMH
                                                          Routing
                                                             fails
                                                                                         Timing-driven
                                                         Static
Timing-Driven Routing                               Timing Analysis              Legalization + Congestion-
                                   passes
                                                                                 Driven Detailed Placement
Sign-off
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 8: Timing Closure 69
                                                                                                                                        Lienig
       8.6            Performance-Driven Design Flow
                                                                                                                           © KLMH
                                                        Sign-off
    ECO Placement and Routing                                  fails
          fails                                                                  Design Rule Checking
                                                  Manufacturability,
Static Timing Analysis                           Electrical, Reliability           Layout vs. Schematic
                              passes
                                                     Verification
                                                                                          Antenna Effects
                                                               passes
                                                                                       Electrical Rule Checking
                                                  Mask Generation
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 8: Timing Closure 70
                                                                                                                                            Lienig
    Summary of Chapter 8 – Timing Constraints and Timing Analysis
                                                                                               © KLMH
•   Circuit delay is measured on signal paths
    − From primary inputs to sequential elements; from sequentials to primary outputs
    − From sequentials to sequentials
•   Timing constraints
    − Actual arrival times (AATs) at primary inputs and output pins of sequentials
    − Required arrival times (RATs) at primary outputs and input pins of sequentials
• Time budgeting: divides prescribed circuit delay into net delay bounds
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 8: Timing Closure 71
                                                                                                        Lienig
    Summary of Chapter 8 – Timing-Driven Placement
                                                                                               © KLMH
•   Gate/cell locations affect wire lengths, which affect net delays
•   Timing-driven placement optimizes gate/cell locations to improve timing
    − Interacts with timing analysis to identify critical nets, then biases placement opt.
    − Must keep total wirelength low too, otherwise routing will fail
    − Timing optimization may increase routing congestion
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 8: Timing Closure 72
                                                                                                        Lienig
    Summary of Chapter 8 – Timing-Driven Routing
                                                                                               © KLMH
•   Timing-driven routing has several aspects
    − Individual nets: trading longer wires for shorter source-to-sink paths
    − Coupling capacitance and signal integrity: parallel wires act as capacitors
      and can slow-down/speed-up signal transitions
    − Full-netlist optimization: prioritize the nets that should be optimized first
•   Full-netlist optimization
    − Run trial routing, then run timing analysis to identify critical nets
    − Then adjust accordingly, repeat until convergence
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 8: Timing Closure 73
                                                                                                        Lienig
    Summary of Chapter 8 – Physical Synthesis
                                                                                               © KLMH
•   Traditionally, place-and-route have been performed after the netlist is known
•   However, fixing gate sizes and net topologies early
    does not account for placement-aware timing analysis
    − Gate locations and net routes are not available
•   Physical synthesis uses information from trial placement to modify the netlist
•   Net buffering: splits a net into smaller (approx. equal length) segments
    − A long net has high capacitance, the driver may be too weak
VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 8: Timing Closure 74
Lienig