14/2/2004
Interconnect-Power Dissipation
in a Microprocessor
N. Magen, A. Kolodny, U. Weiser, N. Shamir
Intel Corporation
Technion - Israel Institute of Technology
Interconnect-Power Definition
Interconnect power is the dynamic power consumed
by switching interconnect capacitance
How much power is consumed by interconnections?
What are the trends for future generations?
How can interconnect power be reduced?
[Figure: 0.13 um cross-section. Source: Intel]
Background
Power is becoming a major design issue
Scope: Dynamic power, the majority of
power
$P = \sum_i AF_i \, C_i \, V^2 f$
This work focuses on the capacitance term
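As a rough illustration of the dynamic power formula above, the following Python sketch sums the per-node contributions AF_i * C_i * V^2 * f. The function name, the node values, the supply voltage, and the frequency are hypothetical placeholders, not data from the case study.

```python
def dynamic_power(nodes, vdd, freq):
    """Sum AF_i * C_i * Vdd^2 * f over all nodes.

    nodes: iterable of (activity_factor, capacitance_in_farads) pairs.
    """
    return sum(af * cap for af, cap in nodes) * vdd ** 2 * freq


# Hypothetical example values (assumptions, not measured data).
example_nodes = [(0.05, 10e-15), (0.10, 20e-15), (0.50, 5e-15)]
power_watts = dynamic_power(example_nodes, vdd=1.0, freq=1e9)
print(f"{power_watts * 1e6:.2f} uW")
```

Because V and f are common to all nodes, reducing the switched capacitance sum is the only per-net lever, which is why the work concentrates on the capacitance term.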
Outline
Research methodology
Interconnect Power Analysis
Power-Aware Router Experiment
Interconnect Power Prediction
Summary
Case study
Low-power, state-of-the-art µ-processor
Dynamic switching power analysis
Interconnect attributes:
Length
Capacitance
Fan Out (FO)
Hierarchy data
Net type
Activity factors (AF)
Miscellaneous.
Interconnect Length Model
Net model
Total wire length
Stitched across hierarchies
Summed over repeaters
[Figure: net model. Legend: wire, diffusion, gate]
Activity Factors Generation
Power test vectors generation
(worst case for high power, unit stressing)
RTL full-chip simulation
(yields statistics for the blocks' primary inputs: activity, probability)
Monte-Carlo based block inputs generation
(based on the RTL statistics)
Transistor level simulation - per block
(Unit delay, tuning for glitches)
Per node activity factor
Source: Intel Pentium M Processor Power Estimation, Budgeting, Optimization, and Validation, ITJ 2003
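The Monte-Carlo input generation step can be pictured with a small Python sketch. It is not the method used in the cited flow: it assumes a two-state Markov chain per input, with a target signal probability and a target toggle rate per cycle, and all names and parameter values are hypothetical.

```python
import random


def generate_input_stream(prob_one, activity, n_cycles, rng):
    """Generate a 0/1 stream with a target signal probability and toggle rate.

    Uses a two-state Markov chain whose stationary P(1) is prob_one and whose
    per-cycle toggle probability is activity.
    """
    if not 0.0 < prob_one < 1.0:
        raise ValueError("prob_one must be strictly between 0 and 1")
    if activity > 2 * min(prob_one, 1 - prob_one):
        raise ValueError("activity too high for this signal probability")
    p_rise = activity / (2 * (1 - prob_one))  # P(0 -> 1)
    p_fall = activity / (2 * prob_one)        # P(1 -> 0)
    bit = 1 if rng.random() < prob_one else 0
    stream = [bit]
    for _ in range(n_cycles - 1):
        flip = rng.random() < (p_fall if bit else p_rise)
        bit ^= int(flip)
        stream.append(bit)
    return stream


def measure(stream):
    """Return (signal probability, toggles per cycle) of a 0/1 stream."""
    toggles = sum(a != b for a, b in zip(stream, stream[1:]))
    return sum(stream) / len(stream), toggles / (len(stream) - 1)


# Hypothetical RTL statistics for one block input (assumed values).
rng = random.Random(0)
stream = generate_input_stream(prob_one=0.3, activity=0.2,
                               n_cycles=100_000, rng=rng)
print(measure(stream))
```

The measured pair should land close to the requested (0.3, 0.2), so a generated stream reproduces the RTL-level statistics it was drawn from.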
Outline
Research methodology
Interconnect Power Analysis
Power-Aware Router Experiment
Interconnect Power Prediction
Summary
Interconnect Length Distribution
[Figure: Number of nets vs. net length [um], logarithmic axes. Series: Pentium 0.5 [um], Pentium MMX 0.35 [um], Pentium Pro 0.5 [um], Pentium II 0.35 [um], Pentium II 0.25 [um], Pentium III 0.18 [um], Low Power Processor 0.13 [um]]
Source: Shekhar Y. Borkar, CRL - Intel
Interconnect Length Distribution
[Figure: Nets vs. Net Length, log-log scale. Y axis: number of nets; X axis: length [um]. Series: local, global, total. Global clock not included.]
Exponential decrease with length
Total Dynamic Power
[Figure: Total Power vs. Net Length. Y axis: normalized dynamic power; X axis: length [um]. Series: local, global, and total, for interconnect and for total dynamic power. Peak 1 and Peak 2 are marked. Global clock not included.]
Local nets = 66% of total dynamic power. Nets: 390k, Cap: 10 [nF], FO: 2, AF: 0.0485
Global nets = 34% of total dynamic power. Nets: 75k, Cap: 20 [nF], FO: 20, AF: 0.055
Total Dynamic Power Breakdown
Global clock included
Interconnect: 51%
Gate: 34%
Diffusion: 15%
Power Breakdown by Net Types
Global clock included
Net type | Interconnect power (interconnect only) | Total power (gate, diffusion, and interconnect)
Global signals | 34% | 21%
Local signals | 27% | 37%
Global clock | 19% | 13%
Local clock | 20% | 29%
Interconnect Power Breakdown
Interconnect consumes 50% of dynamic power
Clock power ~40% (of Interconnect and total)
90% of power consumed by 10% of nets
Interconnect design is NOT power-aware !
Predictive model can project the interconnect power.
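The "90% of power in 10% of nets" observation is a cumulative-share statistic. A minimal Python sketch of how such a figure could be computed from per-net power values follows. The function name and the sample data are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def net_share_for_power(net_powers, power_fraction=0.9):
    """Smallest fraction of nets whose combined power reaches power_fraction."""
    ranked = sorted(net_powers, reverse=True)
    target = power_fraction * sum(ranked)
    running = 0.0
    for count, power in enumerate(ranked, start=1):
        running += power
        if running >= target:
            return count / len(ranked)
    return 1.0


# Hypothetical per-net powers: one hot net and nine quiet ones.
example = [91.0] + [1.0] * 9
print(net_share_for_power(example))
```

The same ranking also yields the "top n% power-consuming nets" that a power-aware router would treat first.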
[Figure: stacked bars. Interconnect power: global signals 34%, global clock 19%, local signals 27%, local clock 20%. Total power: interconnect 51%, gate 34%, diffusion 15%.]
Outline
Research methodology
Interconnect Power Analysis
Power-Aware Router Experiment
Interconnect Power Prediction
Summary
Experiment - Power-Aware Router
Routing experiment optimizing processor blocks
Local nodes (clock and signals) consume 66% of dynamic power
10% of nets consume 90% of power
Min. spanning trees can save over 20% Interconnect power
Routing with spacing can save up to 40% Interconnect power
Small blocks local clock network
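To make the minimum-spanning-tree point concrete, here is a small Python sketch that compares the wire length of a rectilinear minimum spanning tree (RMST) with a naive star from the driver. It uses Prim's algorithm on Manhattan distances. The pin coordinates are hypothetical, and the sketch says nothing about how the experiment's router was actually implemented.

```python
def manhattan(a, b):
    """Manhattan (rectilinear) distance between two (x, y) points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def rmst_length(pins):
    """Total length of a rectilinear minimum spanning tree (Prim's algorithm)."""
    if len(pins) < 2:
        return 0
    # best[i] = shortest Manhattan distance from pin i to the tree built so far
    best = {i: manhattan(pins[0], pins[i]) for i in range(1, len(pins))}
    total = 0
    while best:
        nearest = min(best, key=best.get)
        total += best.pop(nearest)
        for i in best:
            best[i] = min(best[i], manhattan(pins[nearest], pins[i]))
    return total


# Hypothetical pin coordinates in um; the first pin is the driver.
pins = [(0, 0), (10, 0), (10, 10), (0, 10)]
star_length = sum(manhattan(pins[0], p) for p in pins[1:])
print(rmst_length(pins), star_length)
```

Since wire capacitance grows with length, a shorter tree on a high-activity net translates directly into less switched capacitance.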
Power-Aware Router Flow
Flow:
1. Power grid routing
2. Clock tree routing, with spacing
3. Routing of the top n% power-consuming signal nets
4. Global and detailed routing of the un-routed nets (timing and congestion driven)
5. All nets routed? If no: power-aware rip-up and re-route, then check again. If yes: finish, followed by downsizing.
Notes:
Clock tree: high FO, long lines, very active
Avoiding congestion
Rip-up: not high-power nets
Results - Power Saving
[Figure: Dynamic power saving (0% to 60%) for Blocks A to E and their average, split into router power saving and driver downsizing saving]
Average saving results: 14.3% for ASIC blocks (1)
(1) Estimated based on clock interconnect power
Outline
Research methodology
Interconnect Power Analysis
Power-Aware Router Experiment
Interconnect Power Prediction
Summary
Future of Interconnect Power
[Figure: Dynamic power breakdown (gate, diffusion, interconnect) vs. technology generation [um], from 0.15 to 0.022]
Source: ITRS 2001 Edition, adapted data
Interconnect power grows to 65%-80% within 5 years!
(using optimistic interconnect scaling)
Interconnect Power Prediction
Interconnect length projection: the number of nets vs. unit length
[Figure: Number of nets (normalized) vs. length [um]. Series: measured, modified Davis model. Annotations: upper local bound, lower global bound.]
Dynamic power breakdown: the average dynamic power breakdown
[Figure: Gate, diffusion, and interconnect share of dynamic power for local, intermediate, and global nets]
Interconnect Power Model
Multiplying the number of interconnects by the power breakdowns gives:
[Figure: Projected dynamic power vs. net length. Y axis: power (normalized); X axis: length [um]. Series: measured power, projection.]
The power model matches the processor power distribution!
Outline
Research methodology
Interconnect Power Analysis
Power-Aware Router Experiment
Interconnect Power Prediction
Summary
Summary
Interconnect is 50% of the dynamic power of processors, and
getting worse.
Clock consumes 40% of interconnect power.
Clock interconnect spacing is suggested
Interconnect power is the sum of contributions from nearly all net lengths and types.
Interconnect power-aware design is recommended
Router level Interconnect power reduction addresses all
Interconnect power has strong dependency on the hierarchy
Per Hierarchy analysis and optimization algorithms
Future Research
1. Interconnect Power characterization and prediction
2. Investigate Interconnect power reduction techniques:
Interconnect-Spacing for power
Interconnect Power-Aware physical design
Aspect Ratio optimization for power
Architectural communication reduction
Questions ?
BACKUP-Slides
Processor Case Study
Analysis subject: Processor, 0.13 [um]
77 million transistors, die size of 88 [mm^2]
Data sources (AF, Capacitance, Length)
Excluded: L2 cache, global clock, analog units
Global Communication
Global power is important
Global power is mostly IC
For higher power benchmarks, global power is higher
G-clock excluded
[Figure: Global Power % vs. Test Power. Y axis: global power percent (0% to 35%); X axis: total power [uW]. Series: %Global, with a polynomial trend line.]
Benchmark Selection
High power test benchmarks
Worst case design
Suitable for: thermal design, power grid design
Average power is a fraction of peak power
Unit stressing benchmarks
Averaging of all high power benchmarks
High node coverage
Interconnect Power Implications
Interconnect power can be reduced by
minimizing switched capacitance:
Fabrication process (wire parameters)
Power-driven physical design
Logic optimization for power
Architectural interconnect optimization
Interconnect Capacitance
Side-cap is increasing: 70% to 80%
[Figure: wire cross-section across Layers 1 to 3, labeling self-cap., side-cap., V-cap, and H-cap]
[Figure: Global capacitance breakdown (V-cap, H-cap) vs. technology generation [um], from 0.15 to 0.022]
Source: ITRS 2001 Edition, adapted data
The majority of interconnect capacitance is side-capacitance!
Fabrication Process
Aspect Ratio (AR)
Interconnect AR = Thickness / Width
Low AR = low interconnect power
Low AR = high resistance
Frequency Modeling
Local: average gate, average IC
Global: optimally buffered global IC
[Figure: distributed Rint/Cint ladder models for a local path and for a global path split into Lcrit segments]
Aspect Ratio Trade offs
Power depends on cap.
Frequency:
Local: gates and IC cap.
Global: mostly IC RC
Per layer AR optimization!
Scaling => more power saving, less frequency loss
[Figure: Frequency and Power vs. Relative AR (50% to 150%). Series: dynamic power, frequency - local (local path speed), frequency - global (global path speed).]
Aspect Ratio optimization can save over 10% of dynamic power!
Physical Design - Spacing
Spacing can save up to 40%
About 30% is with double space
Spacing advantages: scaling, frequency, reliability, noise, easy to modify
[Figure: 0.13 [um] global IC capacitance vs. spacing. Y axis: relative capacitance; X axis: spacing (min. space, 2X, 3X, 4X).]
Wire spacing can save up to 20% of the dynamic power!
Spacing calculation
Back-of-the-envelope estimation:
10% of interconnect => 90% of power
2X spacing = extra 20% wiring
Global clock not spaced (inductance)
Global clock is 20% of interconnect power
Saving: 30% of (90% - 20%) = about 20% of interconnect power
Interconnect is 50% of dynamic power => 10% power saving
Expected 20% with downsizing
Minor losses: congestion
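The estimate can be replayed in a few lines of Python. All inputs are the rounded figures quoted on the slide, and the variable names are introduced here for illustration only.

```python
# All inputs are the rounded figures quoted on the slide.
hot_net_power_share = 0.90      # power carried by the top 10% of nets
global_clock_share = 0.20       # global clock, left unspaced
cap_saving_double_space = 0.30  # capacitance saved at 2X spacing
interconnect_share = 0.50       # interconnect share of dynamic power

interconnect_saving = cap_saving_double_space * (
    hot_net_power_share - global_clock_share
)
dynamic_saving = interconnect_saving * interconnect_share
print(f"interconnect power saving: {interconnect_saving:.0%}")
print(f"dynamic power saving: {dynamic_saving:.1%}")
```

The exact products are 21% and 10.5%, which the slide rounds to 20% and 10%.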
µ-Architecture - CMP
Comparing two scaling methods, by IC power.
IC - predicted by Rent
L2 - identical, minor
Clock - identical!
Same average AF.
[Figure: Gen. 1 (P, L2) scaled to Gen. 2 either as a uniprocessor (P`, L2`) or as a CMP (P P, L2`)]
Result: ~5% less dynamic power for CMP
Power critical vs. Timing critical
[Figure: Accumulated power (0% to 100%) vs. slack [ps], with the timing-critical region marked]
Outline
Research methodology
Interconnect Power Analysis
Future Trends Analysis
Interconnect Power Implications
Summary
Interconnect Length Prediction
Technology projections - ITRS
Interconnect length predictions:
ITRS model: 1/3 of the routing space
- most optimistic
Davis model:
o Rent's rule based
o Predicts the number of nets as a function of
the number of gates and complexity factors
Models calibrated based on the case study
Rent's parameters
Rent's rule: $T = k N^r$
T = # of I/O terminals (pins)
N = # of gates
k = avg. I/Os per gate
r = Rent's exponent
r can be 0 < r < 1, but commonly 0.5 (simple) < r < 0.75 (complex)
[Figure: a block of N gates with T terminals]
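Rent's rule is a one-line power law, sketched below in Python. The block size and the values of k and r are hypothetical, picked only so the result is easy to check by hand.

```python
def rent_terminals(num_gates, k, r):
    """Rent's rule: T = k * N**r."""
    if not 0.0 < r < 1.0:
        raise ValueError("Rent's exponent must satisfy 0 < r < 1")
    return k * num_gates ** r


# Hypothetical block: 10,000 gates, k = 4 I/Os per gate, r = 0.5.
print(rent_terminals(10_000, k=4.0, r=0.5))
```

With these assumed values the block needs about 400 terminals; a higher exponent, as in more complex logic, makes the terminal count grow faster with block size.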
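Donath's length estimation model
For the i-th level:
There are $4^i$ blocks.
For each block there are $k \left(\frac{N}{4^i}\right)^r$ terminals.
Assuming two-terminal nets: $\frac{k}{2} \left(\frac{N}{4^i}\right)^r$ nets.
The nets of the (i-1)-th level must be subtracted.
Nets for level i: $n_i = 4^i \frac{k}{2} \left(\frac{N}{4^i}\right)^r - 4^{i-1} \frac{k}{2} \left(\frac{N}{4^{i-1}}\right)^r = 4^i \frac{k}{2} \left(\frac{N}{4^i}\right)^r \left(1 - 4^{r-1}\right)$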
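The per-level net count can be checked numerically. The Python sketch below evaluates both the difference form (level i minus level i-1) and the factored closed form, under the two-terminal-net assumption stated on the slide. The parameter values are hypothetical.

```python
def terminals_total(level, num_gates, k, r):
    """Total terminals over all 4**level blocks: 4^i * k * (N / 4^i)^r."""
    blocks = 4 ** level
    return blocks * k * (num_gates / blocks) ** r


def nets_at_level(level, num_gates, k, r):
    """Two-terminal nets introduced at a level (difference form)."""
    return 0.5 * (terminals_total(level, num_gates, k, r)
                  - terminals_total(level - 1, num_gates, k, r))


def nets_at_level_closed(level, num_gates, k, r):
    """Closed form: (k/2) * N^r * 4^(i(1-r)) * (1 - 4^(r-1))."""
    return (0.5 * k * num_gates ** r
            * 4 ** (level * (1 - r)) * (1 - 4 ** (r - 1)))


# Hypothetical parameters (assumptions): N = 4096 gates, k = 4, r = 0.6.
for level in range(1, 4):
    print(level,
          nets_at_level(level, 4096, 4.0, 0.6),
          nets_at_level_closed(level, 4096, 4.0, 0.6))
```

Both columns agree up to floating-point rounding, which confirms the factoring of the common term.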
Average interconnection length
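The wires can be of two types, A and D. With $\lambda$ the side length of a block:
$L_A = \frac{1}{\lambda^4} \sum_{i_A=1}^{\lambda} \sum_{j_A=1}^{\lambda} \sum_{i_B=1}^{\lambda} \sum_{j_B=1}^{\lambda} \left[ \lambda + |i_A - i_B| + j_B - j_A \right] = \frac{4}{3}\lambda - \frac{1}{3\lambda}$
$L_D = \frac{1}{\lambda^4} \sum_{i_A=1}^{\lambda} \sum_{j_A=1}^{\lambda} \sum_{i_B=1}^{\lambda} \sum_{j_B=1}^{\lambda} \left[ 2\lambda + i_A + j_A - i_B - j_B \right] = 2\lambda$
The average: $r_i = \frac{14}{9}\lambda - \frac{2}{9\lambda}$
Overall: $R = \frac{\sum_{i=1}^{I} n_i r_i}{\sum_{i=1}^{I} n_i}$ equals
$R = \frac{2}{9} \left[ 7\,\frac{N^{r-0.5} - 1}{4^{r-0.5} - 1} - \frac{1 - N^{r-1.5}}{1 - 4^{r-1.5}} \right] \frac{1 - 4^{r-1}}{1 - N^{r-1}}$
Taken from a SLIP 2001 tutorial by Dirk Stroobandt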
Davis Model
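From Rent's rule: $T = k N^p$
IDF (interconnect density function):
Region I, $1 \le l < \sqrt{N}$: $i(l) = \frac{\alpha k}{2}\,\Gamma \left( \frac{l^3}{3} - 2\sqrt{N}\,l^2 + 2Nl \right) l^{2p-4}$
Region II, $\sqrt{N} \le l < 2\sqrt{N}$: $i(l) = \frac{\alpha k}{6}\,\Gamma \left( 2\sqrt{N} - l \right)^3 l^{2p-4}$
Where: $\alpha = \frac{FO}{FO + 1}$, $\Gamma = \frac{2N \left( 1 - N^{p-1} \right)}{-N^p\,\frac{1 + 2p - 2^{2p-1}}{p(2p-1)(p-1)(2p-3)} - \frac{1}{6p} + \frac{2\sqrt{N}}{2p-1} - \frac{N}{p-1}}$
Interconnect total number and length:
Nets: $I_{total} = \int_1^{2\sqrt{N}} i(l)\,dl$
Length: $L_{total} = \int_1^{2\sqrt{N}} l\,i(l)\,dl$
Multipoint length: $L_{multi\_terminal} = \chi L_{total}$, where $\chi = \frac{4}{FO + 3}$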
Davis Model - extension
Constant factor favors shorter nets.
A short P2P net has a higher chance of being part of a multipoint net.
Correction factor:
$\text{multi-terminal factor}(l) = \frac{\text{number of point-to-point nets shorter than } l}{\text{total point-to-point nets}}$
Length: $L_{multi\text{-}terminal} \propto \int l \cdot i(l) \cdot \text{multi-terminal factor}(l)\,dl$, with an FO-dependent prefactor
[Figure: Number of nets vs. length [um], log-log. Series: measured (extracted), Davis model, extended Davis model.]
RMST - Example
Total Dynamic Power
Global clock not included
Local nets = 66%
Global nets = 34%
[Figure: Total Power vs. Net Length. Y axis: power (normalized); X axis: length [um]. Series: total dynamic power, interconnect power.]
Local and Global IC
Local and Global IC are different:
Number by length breakdown
IC breakdown: cap and power
Fan out
Metal usage
AF is similar
[Figure: Local power breakdown (IC, diffusion, gate) vs. net length [um], from 4.16 to 83850]
[Figure: Global power breakdown (IC, diffusion, gate) vs. net length [um], from 4.16 to 83850]
Benchmarks Comparison
[Figure: Global dynamic power vs. length [um]. Series: high power tests, benchmarks.]
High power tests show similar behavior to average SPEC!
Interconnect Peaks
[Figure: Total wire length vs. length [um]. Series: measured, Davis model.]
[Figure: Average gate sizing (relative sizing) vs. length [um]]
ITRS Power Trends
The interconnect power reduction that the ITRS power projection shows
for 2006-2007 is based on:
1. Aggressive voltage reduction
2. Low-k dielectric improvements
Device capacitance increases by 30% (trend: -15%)
The combined effect:
Interconnect power reduction (relative to voltage)
Device power remains constant
Dynamic power - ITRS trend
[Figure: Dynamic power projection. Y axis: power (normalized) [W]; X axis: technology generation [um] (1/2 min pitch), from 0.15 to 0.022. Series: IC, diffusion, gate.]
The black curve is the ITRS maximum heat removal capability
Power-Aware Flow
The reduced IC cap allows for driver downsizing
On average it reduced the dynamic power by 1.4x the IC power saving
Downsizing is timing verified
Cell downsizing reduced the total area and leakage by 0.4%
No signal spacing was applied
Over 30% unused metal
Post-layout optimizations are possible
Flow:
1. Placement
2. Power-aware routing
3. RC extraction
4. Timing-driven driver upsizing
5. Timing analysis, power analysis
6. All slacks positive? If no: iterate. If yes: continue.
7. Power-driven, timing-constrained driver downsizing
8. Sizing modified? If yes: iterate. If no: finish.
FUBS description
A: medium, randomly picked
B: small, highest clock power
C: small, good potential
D: medium, good potential
E: worse than average
Block Name | Block A | Block B | Block C | Block D | Block E | AVERAGE
Area [um^2] | 138801.6 | 101274.6 | 65816.1 | 164229.1 | 59766.3 | 209537.8
Devices | 14574 | 8644 | 7618 | 18194 | 6109 | 16675
Inactive Nodes | 63.66% | 98.78% | 82.36% | 39.22% | 35.38% | 52.94%
Power [uW] | 17170.22 | 251.15 | 1786.76 | 11811.90 | 6757.11 | 15373.86
RMST potential power saving | 14.3% | 17% | 22% | 29% | 4.1% | 17%
Clock cap. | 11.25% | 2.59% | 12.75% | 13.16% | 3.27% | 8.01%
Clock power | 72.10% | 99.99% | 96.46% | 94.99% | 33.84% | 60.47%
IC cap. | 34.00% | 27.70% | 38.14% | 36.05% | 29.86% | 34.67%
IC power | 28.89% | 59.54% | 46.74% | 48.62% | 40.65% | 36.83%
Clock IC power | 20.19% | 59.54% | 45.48% | 46.26% | 16.87% | 23.87%
Clock IC length | 1.71% | 2.34% | 2.05% | 2.09% | 0.74% | 3.85%
Relative Capacitance per Length Unit | 82.23% | 113.15% | 87.46% | 83.74% | 85.97% | 88.46%
Miller Factor - Power
[Figure: coupling capacitor C between two drivers, (R1, V1) and (R2, V2)]
Opposite direction switching:
The current: $I_c = \frac{dQ}{dt} = \frac{d(C V_c)}{dt} = C \frac{dV_c}{dt}$
Energy: $E_c = \int_0^T I_c V_{dd}\,dt = C \int_0^T \frac{dV_c}{dt} V_{dd}\,dt = C V_{dd} \int_{-V_{dd}}^{V_{dd}} dv_c = 2 C V_{dd}^2$
That is 4 times a single switching energy.
Decoupling by Miller factor of 2.
Same direction switching => no current.
Decoupling by Miller factor of 0.
Average case: a Miller factor of 1 is suitable for power (average-case sum metric).
Routing Model
Via blockage: Layer multiplier = (1 - blocking fraction)
Router efficiency: 0.6
Power grid: 20% of routing
Clock grid: 10% of top tier
More accurate than ITRS 2001.
[Figure: low layer pitch vs. high layer pitch]