ELEX 7660 : Digital System Design
2018 Winter Term
Static Timing Analysis
This lecture describes how static timing analysis is used to ensure timing constraints for a digital design are met.
After this lecture you should understand the terms defined in this lecture and compute setup and hold slack from a delay-
annotated schematic.
Introduction
Timing “constraints” are requirements such as the clock rate or setup and hold specifications. Meeting these constraints is often as difficult as ensuring a design is logically correct.

Reliable operation requires that the designer correctly specify the timing constraints and modify the design until they are met.

Propagation Delays
Propagation delay, 𝑡PD, is the delay from a change at an input to the change at the output of a combinational logic circuit. The clock-to-output delay, 𝑡CO, is the delay from the rising edge at a flip-flop clock input to the change at the Q output.

[Figure: combinational logic with inputs x0, x1 and output Y, and a D flip-flop with input D, clock and output Q. Timing diagrams show Y changing 𝑡PD after X, and Q changing 𝑡CO after the clock edge.]

These delays are caused by the time required to charge the parasitic capacitances of transistors and interconnects.

In the timing diagrams above the parallel lines indicate times during which a signal is held at a high or low level. The crossing lines indicate the times at which the signal changes.

Metastability, Setup and Hold Times
Consider the following implementation of an edge-triggered D flip-flop:

[Figure: D flip-flop built from two transparent latches. Each latch is a clock-controlled multiplexer whose output (QM for the first latch, Q for the second) is fed back to one of its inputs.]

When the clock input is 0, the output of the first multiplexer follows the input – the latch is “transparent”. When the clock input is 1, the output level is fed back to the input and held at that level¹.

If the latch output is at the logic threshold voltage when the clock changes from 0 to 1, then the multiplexer might not be able to decide whether to feed back a 0 or a 1. The multiplexer output could remain at an invalid level for much longer than 𝑡CO. This behaviour is called “metastability” and can result in unreliable circuit operation.

[Figure: timing diagram of clock, latch input and latch output; after the clock edge the output remains at an undetermined (“?”) level.]

To avoid metastability we must ensure the voltage at the latch input is at a valid level long enough to drive the latch output to a valid voltage level. The time required for this is called the “setup” time, 𝑡SU.

The input level must also be held at the correct level until the multiplexer has switched off completely. This is typically a much shorter time – often zero – and is called the “hold” time, 𝑡H.

¹ This is a “master-slave” flip-flop. The second, “slave,” latch holds the previously latched value when the clock is 0.
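The mux-based latches described above can be sketched as a zero-delay behavioural model in Python. This is an illustration under stated assumptions, not a circuit-level model: the function names mux_latch and master_slave_dff are hypothetical, both latches are assumed to start at 0, and each (clock, d) sample stands for a settled input level. Because the model has no delays it cannot show metastability or setup and hold violations; it only shows why two mux-based latches together behave as a rising-edge-triggered flip-flop.

```python
def mux_latch(hold: int, d: int, q_prev: int) -> int:
    """Transparent latch built from a 2:1 multiplexer (zero delay).

    hold == 0: the mux selects d, so the output follows the input.
    hold == 1: the mux selects its own previous output (feedback),
    so the stored level is held.
    """
    return q_prev if hold else d


def master_slave_dff(samples):
    """Return Q after each (clock, d) sample of a master-slave D flip-flop.

    The master latch holds while clock == 1 and the slave latch holds
    while clock == 0, so Q can only change when clock goes from 0 to 1.
    Both latches are assumed to start at 0.
    """
    qm = 0  # master latch output (QM)
    q = 0   # slave latch output (Q)
    outputs = []
    for clock, d in samples:
        qm = mux_latch(clock, d, qm)      # master: transparent when clock is 0
        q = mux_latch(1 - clock, qm, q)   # slave: transparent when clock is 1
        outputs.append(q)
    return outputs


# D changes while the clock is high (samples 3 and 6) are ignored;
# Q only updates at the rising edges (samples 2 and 5).
samples = [(0, 1), (1, 1), (1, 0), (0, 0), (1, 0), (1, 1), (0, 1)]
print(master_slave_dff(samples))  # [0, 1, 1, 1, 0, 0, 0]
```

The slave latch is driven by the inverted clock (1 - clock), which is what makes the pair edge-triggered: at no time are both latches transparent.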
lec8.tex                                                           1                                                2018-03-19 23:34
Synchronous Design
To avoid metastability almost all digital circuits are “synchronous.” These circuits are composed of edge-triggered flip-flops with combinational logic between their outputs and inputs:

[Figure: the Q0 output of one flip-flop drives combinational logic whose output drives the D1 input of a second flip-flop; both flip-flops are driven by the same clock.]

By ensuring that the propagation delays through the combinational logic meet the setup and hold requirements, we can avoid metastable behaviour.

The timing diagram below shows the relationship between the clock edges and the valid times at the inputs and outputs of each flip-flop:

[Figure: timing diagram of clock and Q0 or D1. Q0 changes 𝑡CO after the launch edge, D1 becomes valid 𝑡PD later, and more than 𝑡SU before the latch edge.]

Q changes 𝑡CO after the rising clock edge. 𝑡PD later, the D input of the right flip-flop will have a valid (and correct) logic level. This must happen at least 𝑡SU before the next rising edge of the clock. This level must also be held for at least 𝑡H after that clock edge before it changes.

The diagram above identifies two clock edges, the “launch” and “latch” edges. In this example the edges are separated by the clock period. However, the clocks may arrive at different times due to different interconnect delays. This is known as “clock skew.” It’s also possible that the two clocks have different frequencies or latch on the falling edge.

This setup time is often called a “library” or “micro” setup time to distinguish it from the chip I/O setup and hold times.

Static Timing Analysis

Timing Paths
To avoid metastability we must compare the propagation delay along the data path to the propagation delay along the clock path as shown below:

[Figure: the data arrival path runs from the clock through the launching flip-flop (Q0) and the combinational logic to D1; the clock arrival path runs from the clock through a delay to the clock input of the latching flip-flop.]

The time from the launch edge to the data arrival at the D flip-flop input is called the data arrival time. The time at which the latch clock edge arrives is called the clock arrival time. The delays summed to obtain these times include interconnect delays, 𝑡CO and 𝑡PD.

Timing Netlists
It is important to note that the only timing paths that need to be analyzed are those that start at a clock input (or a chip input pin, called an “input port”) and end at the D input of a flip-flop (or a chip output port).

However, there may be more than one path from a clock to a particular D input. For example, consider the half-adder shown in Figure 1. The numbers next to input pins are the delays from that input to the output, including the interconnect delay to that input².

² These are not the real values; I’m using round numbers to make the arithmetic easier.

The data structure used for STA is a directed (acyclic) graph where each node represents a pin. Edges represent delays and are labelled with the propagation delay (including both gate and interconnect). In the following graph each node represents an output and the values on the edges represent the delays:

[Timing graph of the half-adder; each edge is labelled with its delay:]
    clock → a           11
    clock → b            7
    a → carry_next       9
    b → carry_next      10
    carry_next → carry   6
    a → t2               5
    b → t2               5
    a → t1               7
    b → t1               4
    t2 → sum_next        4
    t1 → sum_next        3
    sum_next → sum       6

In this example there is only one clock, named clock, and so all paths start at clock. There are four flip-flops but we will limit our analysis to carry and sum for now.

The sums of the delays along the data paths, working from output to input, are:
[Figure placeholder: input registers a (from a_in) and b (from b_in) feed gates carry_next, t1, t2 and sum_next; carry_next and sum_next drive registers carry~reg0 and sum~reg0, whose outputs are carry and sum. All registers are clocked by clock; the numbers on the pins are delays.]
Figure 1: Delay-annotated half-adder schematic.
carry.D (6) + carry_next (9) + a.clk (11) = 26
carry.D (6) + carry_next (10) + b.clk (7) = 23
sum.D (6) + sum_next (4) + t2 (5) + a.clk (11) = 26
sum.D (6) + sum_next (4) + t2 (5) + b.clk (7) = 22
sum.D (6) + sum_next (3) + t1 (7) + a.clk (11) = 27

Exercise 1: What is the remaining path and delay? What are the clock path delays?

For the purposes of timing analysis we only need to find the minimum-delay and maximum-delay paths from each clock to each D input and from each clock to each clock input. For the carry flip-flop the data path delays are 23 (min) and 26 (max). For the sum flip-flop these are 20 (min) and 27 (max). This reduces the graph to:

    clock → carry.D     23/26
    clock → sum.D       20/27
    clock → carry.clk   24/24
    clock → sum.clk     23/23

where the pairs of numbers are the minimum and maximum delays on each path.

STA Algorithm
A static timing analyzer finds the timing paths and min/max delays from a delay-annotated netlist, computes the time difference between clock and data arrival times and checks that the corresponding setup and hold requirements are met.

The minimum clock arrival time and maximum data arrival time are used when computing the setup time:

    𝑡SU = 𝑡clock arrival (min) − 𝑡data arrival (max)

and the minimum data arrival and maximum clock arrival times are used when computing the hold time:

    𝑡H = 𝑡data arrival (min) − 𝑡clock arrival (max)

The setup and hold times on each timing path are then compared to the required setup and hold times. The difference is called the “slack.” A positive slack means the requirement is exceeded.

Since each clock has many launch and latch edges, the STA must pick an appropriate pair. The rule is to use the latch edge immediately following the launch edge when computing the setup time, and the latch edge at or immediately before the launch edge when computing the hold time.

Exercise 2: Use the numbers in the graph above to compute the setup time slack for carry if the clock period is 10 ns.

The following screen capture from TimeQuest, the Intel FPGA STA tool, shows the clock and data waveforms used to compute the setup time along the path from the clock input to the carry flip-flop. Note the two slightly different clock delays and the term “Data Required Time,” which includes the setup time.
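The STA steps described in this section (find the paths, compute min/max arrival times, compare clock and data arrival) can be sketched in Python on the half-adder's timing graph. This is a minimal sketch under several assumptions: the names EDGES, arrival, setup_slack and hold_slack are hypothetical; the graph uses the round-number delays from the text, treated here as ns; the clock-path delays 24 and 23 are modelled as single edges; and, for a single clock, the setup latch edge is one period after the launch edge while the hold latch edge is the launch edge itself. Because the graph is acyclic, the min and max arrival times at a node follow from its predecessors' arrival times plus the edge delays.

```python
from functools import lru_cache

# Timing graph of the half-adder (Figure 1) as (from, to) -> delay.
# Delays are the round numbers from the text, treated as ns (an assumption).
EDGES = {
    ("clock", "a"): 11, ("clock", "b"): 7,
    ("a", "carry_next"): 9, ("b", "carry_next"): 10,
    ("a", "t2"): 5, ("b", "t2"): 5,
    ("a", "t1"): 7, ("b", "t1"): 4,
    ("t2", "sum_next"): 4, ("t1", "sum_next"): 3,
    ("carry_next", "carry.D"): 6, ("sum_next", "sum.D"): 6,
    # Clock arrival paths to the two capturing flip-flops.
    ("clock", "carry.clk"): 24, ("clock", "sum.clk"): 23,
}

# For each node, the list of (predecessor, edge delay) pairs.
PREDECESSORS = {}
for (src, dst), delay in EDGES.items():
    PREDECESSORS.setdefault(dst, []).append((src, delay))


@lru_cache(maxsize=None)
def arrival(node):
    """Return (min, max) arrival time at node, measured from the clock source."""
    if node == "clock":
        return (0, 0)
    earliest, latest = [], []
    for src, delay in PREDECESSORS[node]:
        lo, hi = arrival(src)
        earliest.append(lo + delay)
        latest.append(hi + delay)
    return (min(earliest), max(latest))


def setup_slack(flop, period, t_su_required):
    """Available setup time minus the required setup time.

    The latch edge is one period after the launch edge, so the earliest
    clock arrival is period + (min clock path delay).
    """
    data_max = arrival(flop + ".D")[1]
    clock_min = period + arrival(flop + ".clk")[0]
    return (clock_min - data_max) - t_su_required


def hold_slack(flop, t_h_required):
    """Available hold time minus the required hold time.

    For a single clock the hold latch edge is the launch edge itself.
    """
    data_min = arrival(flop + ".D")[0]
    clock_max = arrival(flop + ".clk")[1]
    return (data_min - clock_max) - t_h_required


# carry.D (23, 26), sum.D (20, 27), carry.clk (24, 24), sum.clk (23, 23)
for endpoint in ("carry.D", "sum.D", "carry.clk", "sum.clk"):
    print(endpoint, arrival(endpoint))

# Hypothetical 1 ns required setup time, 10 ns clock period.
print(setup_slack("sum", period=10, t_su_required=1))  # 5
```

Running it reproduces the min/max pairs of the reduced graph. The required setup and hold times passed to setup_slack and hold_slack are hypothetical values, since the text does not give them.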
The following screen capture shows how the data arrival time is computed by adding up the various propagation delays along the path:

In this example the clock period is 10 ns and the setup slack on this path is about 9 ns.

Closing Timing
“Closing” timing is the process of iterating a design until all paths have positive slack. There are various options when a design does not meet its timing requirements:

  • ask the EDA software to spend more time (effort) optimizing the layout and routing
  • use a larger or faster device or process – this makes it easier to optimize place and route (PAR)
  • modify the design to speed up critical timing paths. This might mean having more logic in parallel or dividing up the computation into more clock cycles.
  • relax the design constraints (e.g. reduce the clock rate)

The choice will depend on the project requirements and available resources.

Asynchronous Clocks and Inputs
If all clocks are derived from the same source clock (e.g. through clock division or using a PLL), the time relationships between clocks remain constant and it’s possible to verify that timing constraints will be met.

However, if two clocks are physically independent then this is not possible – the setup and hold timing requirements of flip-flops with asynchronous inputs are bound to be violated at some point. Even though it’s not possible to do timing analysis on asynchronous signals, it is possible to determine how often timing violations happen when signals cross clock “domains” and what the consequences are. This topic will be covered in more detail later.

Timing Simulations
A timing-annotated netlist can be used by a simulator to run simulations that take delays into account. During the simulation the simulator can check that the setup and hold requirements of each flip-flop are met.

The advantage of this “dynamic” timing analysis is that the verification results are independent of, and can serve as a check on, user-provided timing constraints. The disadvantage is that the simulation may not cover all possible events. Timing simulations can be time-consuming for large designs and are primarily used for ASIC “sign-off.”
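The setup and hold check that a timing simulator performs on each flip-flop can be sketched as a scan over a simulated waveform. In this Python sketch the function name check_setup_hold, its arguments and the example times are hypothetical; it assumes the clock-edge and D-change times for one flip-flop have already been extracted from a timing simulation. HDL simulators perform the equivalent check during the simulation itself using built-in timing checks (for example, Verilog's $setup and $hold).

```python
from bisect import bisect_left


def check_setup_hold(clock_edges, d_changes, t_su, t_h):
    """Return (kind, change_time, edge_time) for each setup/hold violation.

    clock_edges: sorted times of the active clock edges at the flip-flop.
    d_changes:   times at which the flip-flop's D input changes.
    A change in (edge - t_su, edge] violates setup for that edge;
    a change in (edge, edge + t_h) violates hold for that edge.
    """
    violations = []
    for t in d_changes:
        i = bisect_left(clock_edges, t)  # index of first edge at or after t
        if i < len(clock_edges) and clock_edges[i] - t < t_su:
            violations.append(("setup", t, clock_edges[i]))
        if i > 0 and t - clock_edges[i - 1] < t_h:
            violations.append(("hold", t, clock_edges[i - 1]))
    return violations


# Hypothetical trace: 10 ns clock, t_su = 2 ns, t_h = 0.5 ns.
edges = [0, 10, 20, 30]
changes = [5, 9, 20.2, 26]
print(check_setup_hold(edges, changes, t_su=2, t_h=0.5))
# [('setup', 9, 10), ('hold', 20.2, 20)]
```

Only the nearest clock edge on each side of a D change can be violated first, which is why a binary search over the sorted edge times is enough. Note that, as the text says, such a check only finds violations for events the simulation actually exercises.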
PVT and Corners
The propagation delays on one die will depend on the temperature and supply voltage. There will also be differences between dies due to process variation. STA should be repeated using delays for the expected “PVT” (Process, Voltage, Temperature) extremes. The PVT combination that results in the maximum or minimum delays is called a “corner.”
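Repeating the analysis at each corner can be sketched in Python as follows. This sketch rests on a simplifying assumption that real STA tools do not make: each corner is modelled by scaling every typical-corner delay by a single factor, whereas tools use delay data characterized separately for each PVT corner. The scale factors, path delays, clock period and required setup/hold times are all hypothetical, and the required times are held fixed across corners for simplicity. It illustrates why the slow (maximum-delay) corner tends to limit setup slack while the fast (minimum-delay) corner limits hold slack.

```python
# Hypothetical delay scale factors relative to the typical corner.
CORNERS = {"slow": 1.3, "typical": 1.0, "fast": 0.7}

# Hypothetical single-clock path; typical-corner times in ns.
PERIOD = 10.0
DATA_MIN, DATA_MAX = 5.0, 7.0   # min/max data arrival time
CLOCK_DELAY = 1.0               # clock arrival at the capturing flip-flop
T_SU_REQ, T_H_REQ = 1.0, 0.5    # required setup/hold, assumed corner-independent


def corner_slacks(scale):
    """Return (setup slack, hold slack) with every path delay multiplied by scale."""
    data_min, data_max = DATA_MIN * scale, DATA_MAX * scale
    clock = CLOCK_DELAY * scale
    setup = (PERIOD + clock - data_max) - T_SU_REQ  # latch edge one period later
    hold = (data_min - clock) - T_H_REQ             # latch edge = launch edge
    return setup, hold


results = {name: corner_slacks(scale) for name, scale in CORNERS.items()}
for name, (setup, hold) in results.items():
    print(f"{name:8s} setup slack {setup:5.2f} ns   hold slack {hold:5.2f} ns")

# Timing must be met at every corner, so report the worst case of each check.
worst_setup = min(results, key=lambda name: results[name][0])
worst_hold = min(results, key=lambda name: results[name][1])
print("setup-limiting corner:", worst_setup)  # slow
print("hold-limiting corner:", worst_hold)    # fast
```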