VLSI Design I
CMOS Sequential Logic
                           Clocking Strategies
           Today’s handouts:
              (1) Lecture Slides
                                            MicroLab, VLSI-10 (1/21)
JMM v1.2
                                   Sequential Logic
    Use #1: Get better utilization from
    idle combinational logic blocks.
    Pipeline the system so that new
    computations start before the old ones
    complete. Add registers to keep
    computations separate.
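            [Figure: 8-bit operands A and B feed a block marked "x" that produces C.]
            Use #2: Convert parallel operations
            to a sequence of (faster, smaller)
            serial operations.
            [Figure: operands A and B are fed 1 bit at a time into a "+" block that produces C; the 8-bit words are converted at input and output.]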
           Use #3: Need to process a
           sequence of inputs and want to
           reuse the same hardware (finite
           state machine).
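Uses #2 and #3 can be illustrated together with a bit-serial adder: one full adder is reused on every clock tick, and a 1-bit carry register holds the state between ticks. The sketch below is a behavioral model only; the name `serial_add` and the LSB-first bit order are assumptions made for this example.

```python
def serial_add(a: int, b: int, width: int = 8) -> int:
    """Add two unsigned integers one bit per clock tick, LSB first."""
    carry = 0    # state held in a 1-bit register between ticks
    result = 0
    for i in range(width):
        bit_a = (a >> i) & 1
        bit_b = (b >> i) & 1
        total = bit_a + bit_b + carry    # the single, reused full adder
        result |= (total & 1) << i       # sum bit shifted into the output word
        carry = total >> 1               # next state of the carry register
    return result                        # final carry dropped: wraps mod 2**width


print(serial_add(100, 27))  # 127
```

The serial version needs `width` clock ticks per addition but only one full adder instead of eight.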
                         Latches and Flip-Flops
            level sensitive latch (inputs D and G, output Q):
                Q follows D while G is active; Q stable otherwise
            edge sensitive flip-flop (inputs D and clk, output Q):
                Q takes value from D at the clock edge; Q stable otherwise
 A static latch will hold data while G is inactive, however long
 that may be. A dynamic latch will hold data while G is
 inactive, but only “for a while”, after which the saved value
 may decay.
                Do static latches dissipate static power?
                How long is “for a while”?
                Which one should I use?
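The difference between the two elements can be sketched with a small behavioral model. It assumes an active-high G and a rising-edge-triggered flip-flop, and it models ideal static storage (no decay); the class names are made up for this example.

```python
class Latch:
    """Level-sensitive latch: transparent while G is 1, holds while G is 0."""

    def __init__(self):
        self.q = 0

    def step(self, d: int, g: int) -> int:
        if g:
            self.q = d          # Q follows D
        return self.q           # otherwise Q is stable


class FlipFlop:
    """Edge-sensitive flip-flop: samples D only on a rising clock edge."""

    def __init__(self):
        self.q = 0
        self._prev_clk = 0

    def step(self, d: int, clk: int) -> int:
        if clk and not self._prev_clk:
            self.q = d          # Q takes its value from D at the edge
        self._prev_clk = clk
        return self.q


clk = [0, 1, 1, 1, 0, 0, 1, 1]
d = [1, 1, 0, 1, 0, 0, 0, 1]
latch, ff = Latch(), FlipFlop()
latch_q = [latch.step(di, ci) for di, ci in zip(d, clk)]
ff_q = [ff.step(di, ci) for di, ci in zip(d, clk)]
print(latch_q)  # [0, 1, 0, 1, 1, 1, 0, 1]
print(ff_q)     # [0, 1, 1, 1, 1, 1, 0, 0]
```

While the clock is high the latch output tracks every change on D, whereas the flip-flop output changes only at the two rising edges.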
                   Latch Timing Constraints #1
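            [Figure: latch a, combinational logic CLa, latch b, combinational logic CLb and a
             further latch in series, with CLK driving the G inputs. The timing diagram marks
             t1a and t2b against the hold (H) and setup (S) windows of CLK.]
            Do I have to check ALL these constraints?
                t1a = tmqa + tmda > thb
                t1b = tmqb + tmdb > tha
                t2a = tqa + tda < tc0 - tsb
                t2b = tqb + tdb < tc1 - tsa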
               th = hold time
               ts = setup time
               tm = min delay from invalid input to invalid output
               td = max delay from valid input to valid output for comb. logic
               tq = max delay from G to Q
               tc0 = low period of clock cycle tc (tc1 = high period)
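The four constraints can be checked mechanically for one pair of cascaded latches. The sketch below is a minimal illustration: the `LatchStage` container, its field names and all numbers (in ns) are hypothetical, and `tmq` is read as the minimum G-to-Q delay.

```python
from dataclasses import dataclass


@dataclass
class LatchStage:
    tmq: float  # min delay G -> Q of the latch
    tq: float   # max delay G -> Q of the latch
    tmd: float  # min delay of the combinational logic after the latch
    td: float   # max delay of the combinational logic after the latch
    ts: float   # setup time of the latch
    th: float   # hold time of the latch


def check_latch_pair(a: LatchStage, b: LatchStage, tc0: float, tc1: float) -> dict:
    """Evaluate the two hold (t1) and two setup (t2) constraints."""
    return {
        "t1a": a.tmq + a.tmd > b.th,
        "t1b": b.tmq + b.tmd > a.th,
        "t2a": a.tq + a.td < tc0 - b.ts,
        "t2b": b.tq + b.td < tc1 - a.ts,
    }


# Hypothetical numbers in ns.
a = LatchStage(tmq=0.2, tq=0.5, tmd=0.3, td=3.0, ts=0.3, th=0.2)
b = LatchStage(tmq=0.2, tq=0.5, tmd=0.1, td=4.5, ts=0.3, th=0.2)
checks = check_latch_pair(a, b, tc0=5.0, tc1=5.0)
print(checks)  # t2b fails: 0.5 + 4.5 is not below 5.0 - 0.3
```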
             Latch Timing Constraints #2
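            [Figure: timing diagram as on the previous slide, marking t1a and t2b against
             the hold (H) and setup (S) windows of CLK.]
                t1a = tmqa + tmda > thb
                t1b = tmqb + tmdb > tha
                t2a = tqa + tda < tc0 - tsb
                t2b = tqb + tdb < tc1 - tsa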
  Questions for latch-based designs:
           • how much time for useful work (i.e. for combinational logic
             delay)?
                           tda + tdb < tc - 2(ts + tq)
           • what is the maximal clock frequency?
           • does it help to guarantee a minimum tm, for example, by requiring
             a minimum number of gates in each cloud?
           • Suppose the maximum clock skew is tSKEW. How does that affect
             the equations above? Clock skew measures the difference in
             arrival of CLK at two cascaded latches (not necessarily any two
             latches!).
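The useful-work bound follows from adding the two setup constraints. The derivation below assumes that both latches share the same ts and tq and that tc = tc0 + tc1, which matches the definitions given with the constraints.

```latex
\begin{align*}
t_{qa} + t_{da} &< t_{c0} - t_{sb} \\
t_{qb} + t_{db} &< t_{c1} - t_{sa} \\
\Rightarrow\quad t_{da} + t_{db}
  &< (t_{c0} + t_{c1}) - (t_{sa} + t_{sb}) - (t_{qa} + t_{qb}) \\
  &= t_c - 2\,(t_s + t_q)
\end{align*}
```

Read the other way round, the cycle time must exceed tda + tdb + 2(ts + tq), and the maximal clock frequency is the reciprocal of that minimum cycle time.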
                                   Static Latches
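           Basic idea:
           [Figure: selector with inputs labeled 0 and 1, controlled by CLK; D enters on input 1 and the output Q is fed back around a loop.]
             • Need gain around this loop to make latch static.
             • Want storage node to be isolated from whatever user does to Q.
             • Would like fast CLK-to-Q, small setup and zero hold times.
           Obvious implementation:
           [Figure: transistor-level latch with input D, clocks CLK and CLKN, and output Q.]
             • Good! Input goes only to FET gates.
             • Oops… feedback not isolated from Q. Could add additional
               output inverters...
             • Should we buffer CLK 0, 1 or 2 times?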
                             Latch Timing
             [Figure: latch circuit clocked by CLK, with points labeled 1 and 2.]
             setup time = how long D input has to be stable
             before CLK transition.
             hold time = how long D input has to be stable
             after CLK transition.
             [Figure: CLK waveform with the setup window ts before the transition and the hold window th after it.]
           So, what node should we use to measure
           setup and hold times? And what should we measure?
           Other time of interest: CLK-to-Q
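The two definitions describe a window around the CLK transition in which D must not change. A minimal sketch of that check follows; the function name and the example times (in ns) are hypothetical.

```python
def violates_setup_hold(d_changes, clk_edge, ts, th):
    """Return the D transition times inside (clk_edge - ts, clk_edge + th),
    i.e. the times at which D changes although it has to be stable."""
    return [t for t in d_changes if clk_edge - ts < t < clk_edge + th]


bad = violates_setup_hold([7.0, 9.8, 10.1, 12.0], clk_edge=10.0, ts=0.3, th=0.2)
print(bad)  # [9.8, 10.1]
```

The transition at 9.8 is a setup violation and the one at 10.1 is a hold violation; the other two are far enough from the edge.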
                                Dynamic Latches
           Suppose in the interest of speed we were
           willing to give up the “static guarantee”
           and take our chances with dynamic latches,
           i.e., remove feedback path...
           [Figure: dynamic latch with input D, clock CLK and output Q. Annotations:]
             • Can combine other logic with inverter.
             • Eliminate when Q fanout is small (1).
             • Local or global clock inverter?
            Can we do without the CLK inverter too?
            DEC did without on 21064 but put it back in for 21164.
           [Figure: two latch variants, one clocked by CLKN and CLK, the other by CLK only.]
                  Delete the PFET driven by CLKN and then add
                  NFET driven by CLK in Q’s pulldown path to
                  handle what happens when D goes from 1 to 0.
                   Single-Phase Clocked Systems
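           RTL #1:
           [Figure: chain of edge-triggered registers (D, Q, clk), all driven by a single CLK.]
           latch #2:
           [Figure: chain of latches (D, Q, G), all driven by a single CLK.]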
           Simplest clocking methodology is to use a single clock in conjunction
           with a register. Clocks are generated with global clock buffers.
           CLK and its complement CLKN are generated locally.
           [Figure: clk-in is buffered to produce clk and its complement; buffers necessary for large loads.]
                             Clock Skew
           [Figure: chain of registers (D, Q, clk) whose clock inputs receive CLK through successive delay elements.]
      • if a clock net is heavily loaded, there might be a race
         between clock and data -> clock skew
      • special attention has to be paid when designing the clock
         tree. CAD tools are able to design balanced clock trees.
      • two methods to avoid clock skew:
           [Figure: two register chains with a delayed clock. In the first, a latch sits between the registers; in the second, CLK enters from the far end of the chain.]
                  Two-Phase Clocked Systems
           [Figure: chain of latches (D, Q, G) with G inputs driven by PHI1 and PHI2; the
            waveforms phi1 and phi2 are “non-overlapping two phase clocks”.]
    • a problem in single-phase clocked systems is the
       generation and distribution of nearly perfect overlapping
       clocks.
    • in two-phase clocked systems this is solved by
       non-overlapping clocks
    • non-overlapping clocks can be generated with latch
       structures
           [Figure: clk drives two “≥1” gates whose outputs are phi1 and phi2.]
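One common latch structure for this purpose is a pair of cross-coupled NOR gates. The sketch below assumes that the “≥1” gates in the figure are NORs wired this way, which the slide text does not confirm, and it uses a crude model with one time step of delay per gate and an ideal clk inverter.

```python
def nor(a: int, b: int) -> int:
    return int(not (a or b))


def two_phase(clk_wave):
    """Cross-coupled NOR generator, one time step of delay per gate."""
    phi1 = phi2 = 0
    out = []
    for clk in clk_wave:
        # Both gates evaluate on the previous step's outputs (unit gate delay).
        phi1, phi2 = nor(clk, phi2), nor(1 - clk, phi1)
        out.append((phi1, phi2))
    return out


waves = two_phase([0] * 4 + [1] * 4 + [0] * 4)
print(waves)
```

In this model phi1 is high while clk is low, phi2 is high while clk is high, and every clk transition is followed by one step in which both phases are low, so the two phases never overlap.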
                         Clock Distribution
           Two main techniques for clock distribution exist:
           • a single large buffer (see Alpha processor)
           • a distributed clock tree approach
           [Figure: clk fans out through buffer stages to a stack of n-bit datapath slices; delays have to match between stages.]
           • there is no such thing as a design-free clocking
             strategy in today’s high-performance processes
           • clock buffers should be surrounded by power pads
             due to their large power consumption
           [Figure: clk pad placed between vdd and gnd pads; a clk driver distributes clk across the chip.]
           Phase Locked Loop Clock Technique
             Phase locked loops (PLL) are used to generate
             internal clocks on chips for two main reasons:
           • to synchronize the internal clock of a chip with an
             external clock
           • to operate the internal clock at a higher rate than
             the external clock input
           [Figure: two chips compared, one without and one with a PLL at the clock input. Labels: clock route delay dclk, data-out delay dclk+dpad; waveforms for clock, dclk and data out.]
                           Flip-flops (registers)
      Using alternating positive and negative dynamic latches with
      a single clock gives great speed and small area, but…
               • lots of worries about clock skew
               • must balance logic delays to minimize wastage
               • need latch size checks (check optimizations!)
      What about those of us who don’t have buildings full of
      engineers to sweat the details? Use D-flip-flops and
      address all the problems once!
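           [Figure: D flip-flop built from a master latch and a slave latch in series, both gated from CLK, shown next to the flip-flop symbol; waveforms for CLK, D and Q.]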
                    Flip-flop Implementations
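           Obvious implementation:
           [Figure: transistor-level flip-flop with input D, clock CLK and output Q.]
           Use “jamb” latches to lighten CLK load:
           [Figure: flip-flop with input D, clock CLK and output Q.]
           “Weak” feedback inverters (long n and p) get overridden.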
                         Flip-Flop Timing
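           [Figure: two registers (D, Q, clk) with combinational logic CLa between them, both clocked by CLK; the timing diagram marks t1 and t2 relative to the CLK edges.]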
                            t1 = tmq + tma > th
                          t2 = tq + tda < tc - ts
    Questions for register-based designs:
           • how much time for useful work (i.e. for combinational logic
             delay)?
           • does it help to guarantee a minimum tm? How about designing
             registers so that
                                        tmq > th?
           • Suppose the maximum clock skew is tSKEW. How does that affect
             the equations above?
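The two register constraints translate directly into a hold check and a lower bound on the clock period. The sketch below also accepts a skew term; charging tSKEW against both margins is a worst-case assumption made here, not something the slide states, and all numbers (in ns) are hypothetical.

```python
def min_clock_period(tq, tda, ts, tskew=0.0):
    """Lower bound on tc from t2 = tq + tda < tc - ts.
    Assumption: worst-case skew shortens the usable cycle by tskew."""
    return tq + tda + ts + tskew


def hold_ok(tmq, tma, th, tskew=0.0):
    """Check t1 = tmq + tma > th.
    Assumption: worst-case skew adds tskew to the required hold margin."""
    return tmq + tma > th + tskew


tc_min = min_clock_period(tq=0.5, tda=6.0, ts=0.3)
print(f"tc > {tc_min:.1f} ns")                     # tc > 6.8 ns
print(hold_ok(tmq=0.2, tma=0.1, th=0.2))             # True
print(hold_ok(tmq=0.2, tma=0.1, th=0.2, tskew=0.2))  # False
```

In this example the hold check passes without skew but fails once 0.2 ns of skew is charged against it, which is why a guaranteed minimum delay (or tmq > th with margin) is attractive.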
                       Dynamic Flip-Flops
                            I’ll have the Christer Svensson
                            special please!
           [Figure: dynamic flip-flop circuit with input D, clock CLK, internal nodes 1 and 2, and output QN.]
           CLK is low:
             • node 1 follows not(D)
             • node 2 pulled up
             • QN is “floating” with its old value
           CLK is high:
             • node 2 = “0” if node 1 = “1”,
                     otherwise it stays “1”
                     ⇒ node 2 = not(node 1) shortly after CLK↑
             • QN = not(node 2) ⇒ stable soon after CLK↑
             • node 1 can be pulled down if D goes to “0” (capacitive
                     coupling), but node 2 won’t change!
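The node-by-node description can be turned into a small behavioral model. This is a logic-level sketch only: node 1 is simply held while CLK is high (the capacitive coupling is ignored), the initial node values are arbitrary, and the class name is made up for this example.

```python
class DynamicFlipFlop:
    """Logic-level model of the behavior listed above."""

    def __init__(self):
        self.node1 = 1
        self.node2 = 1
        self.qn = 0              # arbitrary initial value

    def step(self, d: int, clk: int) -> int:
        if clk == 0:
            self.node1 = 1 - d   # node 1 follows not(D)
            self.node2 = 1       # node 2 pulled up
            # QN floats and keeps its old value
        else:
            if self.node1 == 1:
                self.node2 = 0   # otherwise node 2 stays 1
            self.qn = 1 - self.node2
            # node 1 is modeled as held while CLK is high (simplification)
        return self.qn


ff = DynamicFlipFlop()
stimulus = [(1, 0), (1, 1), (0, 1), (0, 0), (0, 1), (1, 1), (1, 0), (1, 1)]
qn_trace = [ff.step(d, clk) for d, clk in stimulus]
print(qn_trace)  # [0, 0, 0, 0, 1, 1, 1, 0]
```

QN takes the value not(D) sampled at each rising CLK edge and ignores changes of D while CLK stays high.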
                      Static Timing Analysis
             Do I have to check ALL the constraints?
             Yup, for every pair of connected register/latches AND for all
             possible data values!
           We need a CAD tool: static timing analyzer. Here’s how
           it works:
           Step 1: “Level-ize” all signal nodes.
                Start by assigning all register outputs and top-level inputs a
                level of 0. For all other gates: level_OUTPUT =
                max(level_INPUT) + 1.
           Step 2: Compute min/max signal delays.
                For each successive node level, compute min and max time for
                all nodes on that level (see next slide for details). This is a
                “data independent” computation. Might need case analysis to
                avoid false paths.
           Step 3: Check setup and hold constraints
                Use min times of register inputs to check hold time. Use max
                times and tCLK to check setup time or use max time + tSETUP
                to determine min tCLK.
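The three steps can be sketched in a few lines. This is a toy analyzer under stated assumptions: each gate has one fixed min and max delay, register outputs and top-level inputs arrive at time 0, and the netlist, node names and delay values (in ns) are hypothetical. It does no case analysis, so false paths are not excluded.

```python
def analyze(gates, sources):
    """gates: {output: (inputs, tmin, tmax)}; sources: level-0 nodes.
    Returns {node: (level, earliest, latest)}."""
    info = {s: (0, 0.0, 0.0) for s in sources}
    pending = dict(gates)
    while pending:
        # A gate is ready once all of its inputs have been timed.
        ready = [o for o, (ins, _, _) in pending.items()
                 if all(i in info for i in ins)]
        if not ready:
            raise ValueError("combinational loop or undriven input")
        for out in ready:
            ins, tmin, tmax = pending.pop(out)
            level = max(info[i][0] for i in ins) + 1         # step 1
            earliest = min(info[i][1] for i in ins) + tmin   # step 2 (min)
            latest = max(info[i][2] for i in ins) + tmax     # step 2 (max)
            info[out] = (level, earliest, latest)
    return info


gates = {
    "n1": (["a", "b"], 1.0, 2.0),
    "n2": (["n1", "c"], 0.5, 1.5),
    "d_in": (["n2", "a"], 1.0, 3.0),   # feeds a register's D input
}
info = analyze(gates, sources=["a", "b", "c"])
level, earliest, latest = info["d_in"]

# Step 3 with hypothetical register parameters.
t_hold, t_setup = 0.2, 0.3
hold_met = earliest > t_hold
min_tclk = latest + t_setup
print(level, earliest, latest, hold_met, min_tclk)
```

For this netlist the register input sits at level 3 with arrival times between 1.0 and 6.5, so the hold check passes and the minimum tCLK comes out at about 6.8.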
                       Stage Delay Computation
           Look at each gate and use knowledge of input timing and rise/fall
           timing to compute earliest and latest time output could change for
           both rising and falling output transitions.
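           [Figure: clocked stage with input IN, clocks CLK and CLKN, internal nodes 1 and 2, and output OUT, drawn next to equivalent RC networks between VDD or GND and the capacitances C1, C2 and COUT.]
           D↑ ⇒ OUT↓ (capacitances C1 and COUT):
               min ⇒ node 1 = 0V, fast
               max ⇒ node 1 = VDD, slow
           D↓ ⇒ OUT↑ (capacitances C2 and COUT):
               min ⇒ node 2 = VDD, fast
               max ⇒ node 2 = 0V, slow
           Other transitions:
               CLK↑, CLK↓, CLKN↑, CLKN↓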
           Use Penfield-Rubinstein model to compute
           td,in-out = sum(Ri*Ci) over all nodes “i” in the stage, where Ri is
           total “effective resistance” to power rail and Ci is non-zero if node
           capacitor needs to be charged/discharged. Multiply by derating
           factor to account for rise/fall time of input.
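The sum of Ri*Ci is easy to evaluate by hand or in code. The sketch below uses a hypothetical pulldown path of two series NFETs with 10 kΩ effective resistance each, an internal node capacitance of 5 fF and an output capacitance of 20 fF; none of these values come from the slides.

```python
def stage_delay(nodes, derating=1.0):
    """nodes: list of (R_to_rail_ohm, C_farad, needs_charge) per node.
    Returns derating * sum(Ri * Ci) over nodes that must be (dis)charged."""
    return derating * sum(r * c for r, c, active in nodes if active)


R_FET = 10e3                   # hypothetical effective resistance per NFET
C1, COUT = 5e-15, 20e-15       # hypothetical node capacitances

# Max delay: the internal node starts at VDD and must be discharged too.
t_max = stage_delay([(R_FET, C1, True), (2 * R_FET, COUT, True)])
# Min delay: the internal node is already at 0V, so its Ci counts as zero.
t_min = stage_delay([(R_FET, C1, False), (2 * R_FET, COUT, True)])

print(f"min {t_min * 1e9:.2f} ns, max {t_max * 1e9:.2f} ns")  # min 0.40 ns, max 0.45 ns
```

The spread between the two results is the reason the analyzer tracks both an earliest and a latest output time for each transition.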
                              Coming Up...
           Next topic…
             Finite state machines: state diagrams, state
             minimization, state assignment, logic and PLA
             implementations.
           Readings for next time…
             Weste:
               • Sections 5.5 thru 5.5.6 (latch, FF)
               • 5.5.8 thru 5.5.11 (clock strategy)
               • 5.5.15 and 5.5.16 (clock strategy)
           Self-study…
              Weste:
               • PLL section 9.3.5.3
                         Exercises: VLSI-10
           Ex vlsi10.1 (difficulty: easy): calculate peak current
              and power consumption of a 100MHz clock driver
              with rise and fall times of 1ns driving 30k register
              bits at 100fF each with Vdd=3.3V
           Result: Ipeak=9.9A, Pd=2.18 Watt
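The peak current can be reproduced with I = C·dV/dt over the 1 ns edge. For the power, the sketch below uses the common estimate P = C·Vdd²·f, which is an assumption here: it gives about 3.27 W, not the 2.18 W quoted above, and the slide does not state which formula or assumptions lead to its figure.

```python
n_bits = 30_000      # register bits driven by the clock
c_bit = 100e-15      # load per bit, in farads
vdd = 3.3            # supply voltage, in volts
f_clk = 100e6        # clock frequency, in hertz
t_edge = 1e-9        # rise/fall time, in seconds

c_total = n_bits * c_bit              # 3 nF total clock load
i_peak = c_total * vdd / t_edge       # I = C * dV/dt
p_dyn = c_total * vdd ** 2 * f_clk    # assumed estimate: P = C * Vdd^2 * f

print(f"Ipeak = {i_peak:.1f} A, P = {p_dyn:.2f} W")  # Ipeak = 9.9 A, P = 3.27 W
```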