Clock Distribution Network Design


Custom Clock Distribution Network Design

Fundamentals

Fuding Ge

Clock Uncertainty
• Jitter (temporal, dynamic uncertainty due to the PLL; generally
not considered in the design)
• Skew (spatial, static uncertainty)
– Systematic: uneven load, routing, …
– Statistical: process variation, temperature, noise, power
supply, …
• Total clock inaccuracy (jitter and skew) is generally
about 10%

The goal of a clock distribution network is to deliver the clock
signal with minimum skew at a reasonable cost in
power, area and delay.
8/23/2001 Fuding Ge 2
Skew Source

• Non-uniform load
• Temperature Gradient
• Threshold Voltage Fluctuation
• Transistor channel length Tolerance
• Gate oxide thickness Tolerance
• Wire thickness variation

Design Target

• Delay (power-area-delay trade-off)
• Skew (as small as possible)
• Duty cycle (keep constant: rising-edge delay = falling-edge
delay, which is not the same as rise time = fall time)
• Slew rate (a faster slew rate means more power, more noise and
less delay; the slew rate should be kept reasonably fast)

Noise Reduction

• Shield the clock signal path:
– Minimum-width VCC and VSS lines on each side of the
clock routing metal line (same metal layer).
– Minimize the signal lines crossing over the clock line, and
shield against them (different metal layers).
• Decoupling capacitor:
– Size of the decoupling capacitor = 3 to 5 times the load
capacitance for each buffer (inverter).

Matching

• The goal is to reduce statistical skew.
• Use the same number of inversions for every branch.
• Use the same gate type for clock gating (all NAND or all NOR).
• Use inverters of the same size where possible.
• Keep all transistors in the same orientation.
• Place clock choppers close to the clock signal
destination (usually bistable elements).
• Keep the average polysilicon density in a region
nearly constant.

Layout Issue: Matching

• Mirror: sometimes one block is laid out by mirroring another
block. Make sure that no mismatch occurs when a transistor is
flipped: break it into an even number of fingers.

[Figure: transistor layout with terminals labelled S(d), G and d(s)]

Equivalent Pin Ordering (Low Power Idea)

• Transitions that involve transistors closer to the output
node have less delay and consume less energy.
– Always connect the clock signal to the transistors closer to
the output node.
– Example: in a NAND gate, connect the clock to pin A.

Fanout, Delay and Power

• Fanout = 3 to 4 for minimum delay and power
• Fanout = 7 to 8 for minimum power and reasonable delay
• Fanout ≤ 2: both power and delay increase
dramatically.
• The last driver usually needs a fast slew rate to reduce skew
(fanout = 3 is good)

FO4 Inverter Delay
[Figure: chain of inverters sized 1x, 4x, 16x and 64x, with the measured delay marked]

• The FO4 (fan-out-of-4) inverter delay is a process-independent
unit of delay.
• Dividing the delay of a circuit by the FO4 delay gives a
metric that is fairly stable over process, voltage and temperature.
• Cycle time can be expressed in FO4 inverter delays to
show how aggressive the design is.

FO4 Inverter Delay

[Figure: FO4 delay versus inverter size, in multiples of the 1x inverter, for the 1x, 4x, 16x, 64x chain]

This figure shows the FO4 delay versus the transistor
size in a 0.13 µm CMOS technology.

Interconnect Modeling Parameters

• Width and length of the wire, and which metal layer it is on
• The metal layers above and below
• Spacing between the interconnect and its neighboring lines in
the same metal level
• The Miller coefficient models the line-to-line capacitance

Line-to-line capacitance is typically 60-80% of the total
wire capacitance for a minimum-width wire at minimum pitch
in a fully occupied wiring environment.

Interconnect Modeling
[Figure: π model (series R with C/2 at each end) and T model (R/2 and R/2 in series with C at the midpoint)]

It is recommended to model a long wire with 4 π segments
in simulation and with one π segment for hand calculation.

[Figure: original circuit (inverter driving a wire of resistance Rw into the load CL) and its RC model (Rinv, Cdiff, Rw, C/2, C/2, CL)]

The Elmore delay of the model is:

$$t_d = R_{inv}(C_{diff} + C/2) + (R_{inv} + R_w)(C/2 + C_L)$$
$$t_d = R_{inv}(C_{diff} + C_L + C) + R_w C/2 + R_w C_L$$

1st term: inverter driving its own diffusion, the load and the wire
capacitance;
2nd term: quadratic delay of the wire self-loading;
3rd term: extra delay contributed by the wire resistance discharging
the load capacitance.

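The two forms of the Elmore delay expression above can be checked against each other numerically. The following is a minimal sketch; the function names and the component values are illustrative assumptions, not taken from the slides.

```python
import math

def pi_model_delay(r_inv, c_diff, r_w, c_w, c_load):
    """Elmore delay of an inverter driving one pi segment and a load.

    r_inv, c_diff: driver resistance and diffusion capacitance
    r_w, c_w:      total wire resistance and capacitance
    c_load:        load capacitance at the far end
    """
    near_end = r_inv * (c_diff + c_w / 2.0)
    far_end = (r_inv + r_w) * (c_w / 2.0 + c_load)
    return near_end + far_end

def pi_model_terms(r_inv, c_diff, r_w, c_w, c_load):
    """The same delay split into the three terms discussed on the slide."""
    driver_term = r_inv * (c_diff + c_load + c_w)  # driver sees every capacitance
    wire_self_term = r_w * c_w / 2.0               # grows with length squared
    wire_load_term = r_w * c_load                  # wire resistance into the load
    return driver_term, wire_self_term, wire_load_term

# Illustrative values only (ohms and farads), chosen for the example.
EXAMPLE = dict(r_inv=1000.0, c_diff=2e-15, r_w=200.0, c_w=100e-15, c_load=10e-15)
total = pi_model_delay(**EXAMPLE)
terms = pi_model_terms(**EXAMPLE)
print(total, terms)
```

Splitting the delay this way makes it easy to see which term dominates for a given driver and wire.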
Interconnect Delay and Power VS Width
[Figure: delay and current versus interconnect width; space = 1 µm, length = 1000 µm]

For a given driver size and wire spacing, there is an interconnect width
at which the delay is minimum. The reason is that as the
width increases, the resistance decreases while the capacitance
increases.

Minimum Interconnect Delay VS Inverter Repeater Size

[Figure: delay curves for repeater sizes 14/5, 28/10, 56/20 and 112/40; space = 1 µm, length = 1000 µm]

The best interconnect width depends on the driver size.

Interconnect Delay VS Wire Spacing

Increasing the inter-wire spacing reduces the cross-coupling delay.

When trying to minimize interconnect delay, it is typically better to increase the wire spacing
first, rather than the width. This matters more because metal layers scale non-uniformly: the
thickness of the metal layers is not being scaled down as fast as the width and length dimensions.

Interconnect Delay VS Wire Pitch

For the same pitch, increasing the width (and therefore
decreasing the space) tends to increase the interconnect delay,
as shown in Fig. c above. Therefore, for a given pitch, it is
better to increase the wire space than the wire width.

Reference: J. Yim, S. Bae, and C. Kyung, “A Floorplan-based Planning Methodology for Power and Clock Distribution in
ASICs”, Proc. Design Automation Conference, pp. 766-71, 1999

Elmore Delay
Equation:

$$D_i = \sum_{k \in P_i} R_k C_k$$

[Figure: example RC tree driven from the input through Rdev, with series resistors R, node capacitors C1 to C4 and output loads Co1, Co2, Co3]

Example: the delay from the input to Co1 is:

$$D_{Co1} = (R_{dev} + R)C_1 + (R_{dev} + 2R)(C_3 + C_{o2} + C_4 + C_{o3}) + (R_{dev} + 3R)C_2 + (R_{dev} + 4R)C_{o1}$$

The Elmore delay of a branched network can be described as the
RC delay of the unique path from input to output, where all side
branches are replaced by their lumped capacitance.

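The rule above can be sketched in code: each capacitor contributes its value times the resistance it shares with the path from the input to the target node. The tree below is an assumption reconstructed from the delay formula (the original figure did not survive extraction), and the node names and component values are hypothetical.

```python
import math

def elmore_delay(parent, branch_r, node_c, target):
    """Elmore delay from the root of an RC tree to `target`.

    parent:   node -> upstream node (the root has no entry)
    branch_r: node -> resistance of the branch from its parent
    node_c:   node -> capacitance to ground at that node
    """
    def path_to_root(node):
        nodes = set()
        while node is not None:
            nodes.add(node)
            node = parent.get(node)
        return nodes

    target_path = path_to_root(target)
    delay = 0.0
    for node, cap in node_c.items():
        # Resistance common to the root->node and root->target paths.
        shared = path_to_root(node) & target_path
        delay += cap * sum(branch_r.get(n, 0.0) for n in shared)
    return delay

# Assumed topology: in -Rdev- a -R- n1(C1) -R- b, then three branches from b:
#   b -R- n2(C2) -R- o1(Co1),  b -R- n3(C3) -R- o2(Co2),  b -R- n4(C4) -R- o3(Co3)
R_DEV, R = 100.0, 10.0
C1, C2, C3, C4 = 1e-15, 2e-15, 3e-15, 4e-15
CO1, CO2, CO3 = 5e-15, 6e-15, 7e-15
parent = {"a": "in", "n1": "a", "b": "n1", "n2": "b", "o1": "n2",
          "n3": "b", "o2": "n3", "n4": "b", "o3": "n4"}
branch_r = {node: R for node in parent}
branch_r["a"] = R_DEV
node_c = {"n1": C1, "n2": C2, "n3": C3, "n4": C4,
          "o1": CO1, "o2": CO2, "o3": CO3}

delay_o1 = elmore_delay(parent, branch_r, node_c, "o1")
print(delay_o1)
```

With this topology the function reproduces the hand-written expression for the delay to Co1 term by term.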
Interconnect Delay Modeling Using Elmore Delay

Assume the total resistance and capacitance of the interconnect are R and C
respectively. We model the wire as n segments of an RC network, then let n go to
infinity to represent the distributed nature of the wire.
Using the Elmore delay model:

[Figure: RC ladder from in to out with n sections, each with series resistance R/n and shunt capacitance C/n]

$$D = \frac{C}{n}\left(\frac{R}{n} + \frac{2R}{n} + \cdots + \frac{nR}{n}\right) = \frac{n(n+1)}{2n^2} RC$$

$$\lim_{n \to \infty} \frac{n(n+1)}{2n^2} RC = \frac{1}{2} RC$$

The delay of a distributed RC line is equal to the
delay of a lumped RC with half the resistance.

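The limit can also be seen numerically. This short sketch evaluates the Elmore sum for an n-section ladder as a multiple of RC; the function name is an illustrative choice.

```python
def ladder_delay_factor(n):
    """Elmore delay of an n-section RC ladder, as a multiple of R*C.

    Section k has capacitance C/n and sees upstream resistance k*R/n,
    so the sum is (1 + 2 + ... + n) / n**2 = n(n+1) / (2 n**2).
    """
    return sum(range(1, n + 1)) / (n * n)

for sections in (1, 4, 10, 100, 1000):
    print(sections, ladder_delay_factor(sections))
```

A single lumped section gives a factor of 1, and the factor approaches 1/2 as the number of sections grows, which is why a handful of π segments is already a reasonable model in simulation.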
RC Delay and Rise (Fall) Time Constant
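[Figure: RC circuit with Vin driving Vout through R, and C from Vout to ground; Vin steps from 0 to Vdd and Vout(t) rises exponentially]

$$V_{out}(t) = V_{dd}\left(1 - e^{-t/RC}\right)$$

When Vout is 50% of Vdd:

$$V_{dd}\left(1 - e^{-t/RC}\right) = \tfrac{1}{2} V_{dd}$$
$$t_d = RC \ln 2 = 0.69\,RC$$

The rise time from 10% Vdd to 90% Vdd is:

$$t_r = RC \ln 0.9 - RC \ln 0.1 = RC \ln 9 = 2.2\,RC$$
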
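Both constants follow from inverting the step response. The sketch below does that inversion for an arbitrary fraction of Vdd; the function names are illustrative.

```python
import math

def crossing_time(rc, fraction):
    """Time at which Vout reaches `fraction` of Vdd for a step input.

    Solves Vdd * (1 - exp(-t / rc)) = fraction * Vdd for t.
    """
    return -rc * math.log(1.0 - fraction)

def delay_50(rc):
    """50% propagation delay of a single RC stage."""
    return crossing_time(rc, 0.5)

def rise_time_10_90(rc):
    """10% to 90% rise time of a single RC stage."""
    return crossing_time(rc, 0.9) - crossing_time(rc, 0.1)

# With RC normalized to 1, the results are the multipliers of RC.
print(delay_50(1.0), rise_time_10_90(1.0))
```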
Delay of Long Wire

[Figure: delay of a long wire; W = 2.5 µm, s = 1 µm]

The delay of a long wire is proportional to the square
of its length, because both the resistance and the
capacitance increase linearly with length.

How to Minimize Long Wire Delay ?

• One way to minimize the delay of a long wire is to break
the wire into segments and insert repeaters between the
segments. In this way the delay can be made linear in the
length.
• How many segments, and how large should the repeaters be?
• We answer these questions step by step.

Device Modeling

[Figure: driver model with input capacitance Ci, on resistance Rt and parasitic capacitance Cp, driving a load RL, CL]

• We model the driver as shown above: Ci is the input
capacitance (gate capacitance), Rt is the on resistance and
Cp is the parasitic capacitance from source to drain. The RtCp
product is the intrinsic delay of the MOS transistor, which
is the delay of an inverter driving its own gate (about 12 ps
for a 0.35 µm channel length).
• When driving an RC network, the 50% delay for a step
input is:

$$t_d = 0.69\,[R_t C_p + (R_t + R_L) C_L]$$

Logic Effort: General Gate Delay Model

• Logic effort Le is the ratio of the input capacitance of a gate to the
input capacitance of a normal-skew inverter with the same drive
strength. It describes the relative ability of a gate topology to deliver
current (defined to be 1 for an inverter).
• Electrical effort (fanout) Ee is the ratio of output to input
capacitance, CL/Ci.
• t0 is the intrinsic delay of the normal-skew inverter, and p = Cp/Ci.

We have:

$$t_d = t_0 L_e (E_e + p), \quad \text{where } t_0 = 0.69\,R_t C_i$$

t0 is a process technology parameter, independent of the
size of the inverter.

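As a quick illustration of the gate delay model, the sketch below evaluates the slide's formula for an inverter and a NAND gate at a fan-out of 4. The value t0 = 7 ps is the figure the deck quotes for a 0.13 µm technology; the parasitic ratio p = 1 is an assumption made only for this example.

```python
import math

def gate_delay(t0, logic_effort, electrical_effort, parasitic):
    """Gate delay from the slide's model: td = t0 * Le * (Ee + p)."""
    return t0 * logic_effort * (electrical_effort + parasitic)

T0_PS = 7.0   # intrinsic delay quoted in the deck for 0.13 um
P = 1.0       # assumed Cp/Ci ratio, not given in the slides

fo4_inverter_ps = gate_delay(T0_PS, 1.0, 4.0, P)
fo4_nand_ps = gate_delay(T0_PS, 4.0 / 3.0, 4.0, P)
print(fo4_inverter_ps, fo4_nand_ps)
```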
Logic Efforts of Some Gates
[Figure: transistor-level schematics with device sizes for the four gates listed below]

• Inverter: Le = 1
• NAND: Le = 4/3
• NOR: Le = 5/3
• AOI: Le = 2 for inputs A and C; 5/3 for input B

$L_e = (C_{gate} R_{gate}) / (C_{inv} R_{inv})$. One finds Le either by making the input capacitance of
the gate equal to that of the inverter and comparing the resistances, or by making the
resistances equal and comparing the capacitances.

Long Wire Delay Driven by Inverter Repeaters
• Source/drain diffusion capacitance is negligible.
• Each inverter repeater is n times the size of a unit
inverter whose on resistance and input capacitance are Rt
and Ci respectively, so its on resistance is Rt/n and its input
capacitance is nCi.
• The resistance and capacitance per unit length of the wire
are Rw and Cw respectively, and the wire length is L.
• The long wire is broken into S segments.
The delay through the entire wire, using the Elmore delay
equation, is:

$$t_w = S\left[\left(\frac{R_t}{n} + \frac{R_w L}{2S}\right)\frac{C_w L}{S} + \left(\frac{R_t}{n} + \frac{R_w L}{S}\right) n C_i\right]$$

Optimal Inverter Size and Number of Segments
From $\partial t_w / \partial n = 0$ and $\partial t_w / \partial S = 0$ we have:

$$n = \sqrt{\frac{C_w R_t}{C_i R_w}} = \sqrt{\frac{C_w t_0}{C_i^2 R_w}}$$

$$S = L\sqrt{\frac{C_w R_w}{2 C_i R_t}} = L\sqrt{\frac{C_w R_w}{2 t_0}}$$

Under these conditions, the inverter delay is equal to the wire RC delay, which is $R_t C_i$.

Optimal segment length:

$$l_s = \sqrt{\frac{C_i R_t}{C_w R_w / 2}} = \sqrt{\frac{2 t_0}{C_w R_w}}$$

Wire delay:

$$t_w = L\sqrt{C_w R_w C_i R_t}\,(2 + \sqrt{2}) = 3.41\,L\sqrt{C_w R_w C_i R_t}$$

Note that CiRt is the intrinsic delay of the inverter driver. The optimal inverter size is
independent of wire length; it is only a function of the physical parameters of the wire and
the transistors.

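The closed-form optimum can be checked by evaluating the repeated-wire delay directly. This is a minimal sketch: the delay function is the Elmore expression for S segments with no extra prefactor, and the process numbers (Rt, Ci, Rw, Cw, L) are made-up values chosen only to give round results.

```python
import math

def repeated_wire_delay(n, s, length, rt, ci, rw, cw):
    """Elmore delay of a wire split into s segments with size-n inverter repeaters."""
    seg_r = rw * length / s   # wire resistance per segment
    seg_c = cw * length / s   # wire capacitance per segment
    return s * ((rt / n + seg_r / 2.0) * seg_c + (rt / n + seg_r) * n * ci)

def optimal_repeaters(length, rt, ci, rw, cw):
    """Closed-form repeater size n and segment count S from the slide."""
    n = math.sqrt(cw * rt / (ci * rw))
    s = length * math.sqrt(cw * rw / (2.0 * ci * rt))
    return n, s

# Hypothetical parameters: unit inverter 10 kohm / 2 fF,
# wire 0.1 ohm/um and 0.2 fF/um, length 10 mm (expressed in um).
RT, CI, RW, CW, LENGTH = 10e3, 2e-15, 0.1, 0.2e-15, 10000.0

n_opt, s_opt = optimal_repeaters(LENGTH, RT, CI, RW, CW)
t_opt = repeated_wire_delay(n_opt, s_opt, LENGTH, RT, CI, RW, CW)
t_closed = (2.0 + math.sqrt(2.0)) * LENGTH * math.sqrt(CW * RW * CI * RT)
print(n_opt, s_opt, t_opt, t_closed)
```

In practice S has to be rounded to an integer, so the achievable delay is slightly above the closed-form minimum.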
Inverter Driver Characterization: t0
For an inverter, the logic effort Le is 1, so we have:

$$t_d = t_0 (E_e + p) = 0.69\,R_t C_i (E_e + p)$$

We see that the delay is a linear function of fan-out, and the
intrinsic delay t0 is the slope.

These results are for a 0.13 µm technology; t0 is only about 7 ps.

Inverter Driver Characterization: Ci
The input capacitance of the ideal inverter can be measured using
the following test circuit. The midpoint delay from the input node
to node X is 0.693RCi.

[Figure: measurement circuit, with the input driving node X through resistor R and the inverter (Inv) connected at X]

This figure shows the simulation
results: R = 100 Ω and the load is 100
1x inverters.

[Figure: simulated waveforms versus time (ns)]

The result should be compared with $C_i = \varepsilon_{ox} W L / T_{ox}$, where εox is the
permittivity of the gate oxide (about 35 aF/µm for SiO2).

Optimal for Delay-Power Product

The issues with speed-optimal design are large driver size,
high power dissipation and noise generation.
It has been shown that when the inverter size is scaled
down by a factor of 0.7 from the speed-optimal size and
the repeating length is enlarged by 1.43 (= 1/0.7), the
delay-power product reaches its minimum.

Reference: Hongjiang Song, “A theoretical design basis for low power and small area VLSI interconnect repeaters”,

Buffer Repeater Optimal for Speed
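[Figure: wire of length L driven by buffer repeaters, each made of an inverter sized nx followed by an inverter sized knx]

The delay of the interconnect is minimized by choosing
the driver sizes n and k and the number of segments S. The delay is:

$$t_w = S\left[\left(\frac{R_t}{nk} + \frac{R_w L}{2S}\right)\frac{C_w L}{S} + \left(\frac{R_t}{nk} + \frac{R_w L}{S}\right) n C_i + k R_t C_i\right]$$

Taking the partial derivatives with respect to k, n and S,
we get:

$$k = \sqrt{1 + \frac{C_w L}{n S C_i}}, \qquad n = \sqrt{\frac{C_w R_t}{k C_i R_w}}, \qquad S = L\sqrt{\frac{C_w R_w}{2\left(k + \frac{1}{k}\right) C_i R_t}}$$
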
Buffer Repeater Size

$$k = \sqrt{2 + \sqrt{5}} = 2.06$$

The ratio of the inverter sizes is independent of the wire and
transistor characteristics.

$$t_w = 3.64\,L\sqrt{C_w R_w C_i R_t}$$

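The value of k can be reproduced with a few lines of arithmetic. In this sketch the delay coefficient is obtained by substituting the optimal n and S back into the buffer delay expression, which is a derivation assumed here rather than one shown on the slides; it evaluates to about 3.65, in line with the quoted 3.64.

```python
import math

def buffer_size_ratio():
    """Size ratio k between the two inverters of a buffer repeater."""
    return math.sqrt(2.0 + math.sqrt(5.0))

def buffer_delay_coefficient(k):
    """Multiplier of L*sqrt(Cw*Rw*Ci*Rt) at the optimal n and S for a given k.

    Assumed derivation: the two 1/n and n terms each give 1/sqrt(k), and
    the two S-dependent terms each give sqrt((k + 1/k) / 2).
    """
    return 2.0 / math.sqrt(k) + math.sqrt(2.0 * (k + 1.0 / k))

K = buffer_size_ratio()
BUFFER_COEFF = buffer_delay_coefficient(K)
INVERTER_COEFF = 2.0 + math.sqrt(2.0)   # inverter-repeater result, 3.41
print(K, BUFFER_COEFF, BUFFER_COEFF / INVERTER_COEFF)
```

The ratio of the two coefficients is the delay penalty of buffer repeaters relative to inverter repeaters, roughly 7%.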
Inverter or Buffer Repeater ?

Buffer repeaters have no polarity problem and allow a longer
repeat length.
But:
The delay ratio is 3.64/3.41 = 1.067, a 7% delay penalty for the buffer
repeater;
The area penalty is a factor of about 1.33;
The power is about 1.17 times that of the inverter repeater.

Critical Signals VS Non-critical Signals

• Critical signals: signals that need to be synchronous
with signals going off the chip
– Skew must be small (global skew)
• Non-critical signals: signals that are synchronous inside the chip
– Signals between different blocks
– Skew can be larger than for the critical signals (local skew)

Clock Distribution Topologies

[Figure: H-tree, serpentine and grid clock distribution topologies]

H-Tree: Non-uniform load distribution leads to skew. Best wiring efficiency.
Poor automatic clock routing.

Serpentine: Uses a great amount of wiring resources. Easy to implement with uneven loads.

A mesh distribution reduces the skew by tying the
outputs of the drivers together in a 2-dimensional
mesh. It can range anywhere from a global grid mesh to
a localized net at the end of an H-tree.

Grid: Relatively independent of the actual distribution of clock load, very robust.
Excellent for automatic routing. Skew can be very small. Large static current
(so large power).

Hierarchical Clock Distribution Scheme Example

Reference: J. Yim, S. Bae, and C. Kyung, “A Floorplan-based Planning Methodology for Power and Clock Distribution in
ASICs”, Proc. Design Automation Conference, pp. 766-71, 1999

Rules of Thumb

• Distribute a single global clock, then derive the multiple
phases locally, near where they are needed.
• The number of buffering stages in the clock system should
be minimized to reduce the skew introduced by process
variation.
• Clock buffer stages should be scattered across the chip to
avoid large RC effects by reducing interconnect lengths.
• An unbalanced clock buffering system is inevitable; clock
skew is typically about 5%.
• Enough decoupling capacitance should be used to reduce
VCC droop and ground bounce.

Error Analysis

• Assume the clock tree has n stages of inversion, and every
stage can introduce $s_i$ ps of skew. The standard deviation is:

$$\sigma = \sqrt{\sum_{i=1}^{n} s_i^2}$$

Assume $s_i$ = 5 ps:
If n = 6, σ = 12 ps
If n = 8, σ = 14 ps

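The two numbers above follow from a root-sum-square of the per-stage skews. A small sketch, with an illustrative function name:

```python
import math

def skew_sigma(stage_skews_ps):
    """Root-sum-square of independent per-stage skew contributions, in ps."""
    return math.sqrt(sum(s * s for s in stage_skews_ps))

sigma_6 = skew_sigma([5.0] * 6)   # six stages of 5 ps each
sigma_8 = skew_sigma([5.0] * 8)   # eight stages of 5 ps each
print(round(sigma_6), round(sigma_8))
```

Because the contributions add in quadrature, the total grows with the square root of the number of stages, which is why fewer buffering stages help.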
