14/2/2004

Interconnect-Power Dissipation
in a Microprocessor

N. Magen, A. Kolodny, U. Weiser, N. Shamir


Intel corporation
Technion - Israel Institute of Technology

Interconnect-Power Definition
Interconnect-Power is dynamic power consumption
due to interconnect capacitance switching
How much power is consumed by interconnections?
What are the trends for future generations?
How can the interconnect power be reduced?

[Figure: 0.13 µm cross-section; source: Intel]

Background
Power is becoming a major design issue
Scope: dynamic power, the majority of power
$P = \sum_i AF_i \, C_i \, V^2 f$
This work focuses on the capacitance term
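
As a minimal sketch of the dynamic power expression above, the snippet below sums AF_i * C_i over a list of nets and multiplies by V^2 and f. The function name, the three nets, the supply voltage, and the frequency are illustrative assumptions, not data from the case study.

```python
def dynamic_power(nets, vdd, freq_hz):
    """Return P = sum_i(AF_i * C_i) * Vdd^2 * f in watts.

    nets: iterable of (activity_factor, capacitance_in_farads) pairs.
    """
    switched_cap = sum(af * cap for af, cap in nets)
    return switched_cap * vdd ** 2 * freq_hz


# Hypothetical nets: (activity factor, capacitance in farads).
example_nets = [(0.05, 10e-15), (0.5, 20e-15), (0.1, 5e-15)]
power_w = dynamic_power(example_nets, vdd=1.0, freq_hz=1e9)
print(f"{power_w * 1e6:.2f} uW")
```

Because V and f are shared by all nets, only the per-net product AF_i * C_i decides which nets dominate, which is why the analysis concentrates on capacitance.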

Outline

Research methodology
Interconnect Power Analysis
Power-Aware Router Experiment
Interconnect Power Prediction
Summary

Case study
Low-power, state-of-the-art µ-processor
Dynamic switching power analysis
Interconnect attributes:

Length
Capacitance
Fan Out (FO)
Hierarchy data
Net type
Activity factors (AF)
Miscellaneous.

Interconnect Length Model


Net model
Total wire length
Stitched across hierarchies
Summed over repeaters

[Figure: net model showing wire, diffusion, and gate capacitance]

Activity Factors Generation


1. Power test vectors generation (worst case for high power, unit stressing)
2. RTL full-chip simulation (results in block primary inputs: activity, probability)
3. Monte-Carlo based block inputs generation (based on the RTL statistics)
4. Transistor-level simulation, per block (unit delay, tuning for glitches)
5. Per-node activity factor


Source: Intel Pentium M Processor Power Estimation, Budgeting, Optimization, and Validation, ITJ 2003
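
To make the last step of the flow concrete, here is a small sketch of how a per-node activity factor and signal probability could be read off a simulated waveform. It assumes one sample per clock cycle and defines the activity factor as toggles per cycle boundary. Both the definition and the sample waveform are assumptions for illustration, not the flow used in the case study.

```python
def activity_factor(samples):
    """Fraction of cycle boundaries at which the node changes value."""
    if len(samples) < 2:
        return 0.0
    toggles = sum(1 for prev, cur in zip(samples, samples[1:]) if prev != cur)
    return toggles / (len(samples) - 1)


def signal_probability(samples):
    """Fraction of cycles in which the node is at logic 1."""
    return sum(samples) / len(samples) if samples else 0.0


# Hypothetical node waveform, one logic value per clock cycle.
waveform = [0, 0, 1, 1, 1, 0, 0, 0, 0]
print(activity_factor(waveform), signal_probability(waveform))
```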


Interconnect Length Distribution


[Figure: number of nets vs. net length [µm], log-log scale, for Pentium 0.5 µm, Pentium MMX 0.35 µm, Pentium Pro 0.5 µm, Pentium II 0.35 µm, Pentium II 0.25 µm, Pentium III 0.18 µm, and the low-power processor at 0.13 µm. Source: Shekhar Y. Borkar, CRL, Intel]

Interconnect Length Distribution


Nets vs. net length (log-log scale)
Exponential decrease with length
Global clock not included
[Figure: number of nets vs. length [µm]; curves: local, global, total]

Total Dynamic Power


Total power vs. net length
Global clock not included
Total dynamic power: local nets = 66%, global nets = 34%
Two peaks are annotated: one with Nets: 390k, Cap: 10 [nF], FO: 2, AF: 0.0485; the other with Nets: 75k, Cap: 20 [nF], FO: 20, AF: 0.055
[Figure: normalized dynamic power (0-100) vs. length [µm]; curves: local, global, total, interconnect]

Total Dynamic Power Breakdown


Global clock included

Gate: 34%
Interconnect: 51%
Diffusion: 15%

Power Breakdown by Net Types


Global clock included

Net type          Interconnect power     Total power
                  (interconnect only)    (gate, diffusion and interconnect)
Global signals    34%                    21%
Local signals     27%                    37%
Global clock      19%                    13%
Local clock       20%                    29%

Interconnect Power Breakdown


Interconnect consumes 50% of dynamic power
Clock power ~40% (of Interconnect and total)
90% of power consumed by 10% of nets
Interconnect design is NOT power-aware!
A predictive model can project the interconnect power.
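
The "90% of power consumed by 10% of nets" observation is a concentration measure. A hedged sketch of how such a figure could be computed from per-net power numbers follows. The function name and the ten-net example are hypothetical and only illustrate the calculation.

```python
def power_share_of_top_nets(net_powers, top_fraction):
    """Share of total power consumed by the most power-hungry fraction of nets."""
    if not net_powers:
        return 0.0
    ranked = sorted(net_powers, reverse=True)
    count = max(1, round(len(ranked) * top_fraction))
    total = sum(ranked)
    return sum(ranked[:count]) / total if total else 0.0


# Hypothetical block: one hot net and nine quiet nets (arbitrary units).
powers = [90.0] + [10.0 / 9] * 9
share = power_share_of_top_nets(powers, 0.10)
print(f"top 10% of nets consume {share:.0%} of the power")
```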
[Figure: stacked bars. Interconnect power: global signals 34%, global clock 19%, local signals 27%, local clock 20%. Total power: interconnect 51%, gate 34%, diffusion 15%]


Experiment - Power-Aware Router


Routing experiment optimizing processor blocks
Local nodes (clock and signals) consume 66% of dynamic power
10% of nets consume 90% of power
Minimum spanning trees can save over 20% of interconnect power
Routing with spacing can save up to 40% of interconnect power

[Figure: local clock network of a small block]
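
To illustrate why tree topology matters for wire length, the sketch below compares the Manhattan length of a minimum spanning tree (Prim's algorithm) with a naive star that wires every sink directly to the source. The pin coordinates are hypothetical, and using wire length as a proxy for switched capacitance is an assumption. The router used in the experiment is not described at this level of detail.

```python
def manhattan(a, b):
    """Rectilinear distance between two (x, y) pins."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def mst_length(pins):
    """Total Manhattan length of a minimum spanning tree over the pins (Prim)."""
    if len(pins) < 2:
        return 0
    # best[i] is the cheapest known connection from pin i to the growing tree.
    best = {i: manhattan(pins[0], pins[i]) for i in range(1, len(pins))}
    total = 0
    while best:
        nearest = min(best, key=best.get)
        total += best.pop(nearest)
        for i in best:
            best[i] = min(best[i], manhattan(pins[nearest], pins[i]))
    return total


def star_length(pins):
    """Length when every sink is wired separately to the first pin."""
    return sum(manhattan(pins[0], p) for p in pins[1:])


# Hypothetical clock net: one driver and three sinks in a column.
pins = [(0, 0), (10, 0), (10, 5), (10, 10)]
print(mst_length(pins), star_length(pins))
```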


Power-Aware Router Flow


1. Power grid routing
2. Clock tree routing, with spacing (clock tree: high FO, long lines, very active)
3. Routing of the top n% power-consuming signal nets (avoiding congestion)
4. Global and detailed routing of the un-routed nets (timing and congestion driven)
5. All nets routed?
No: power-aware rip-up and re-route (rip-up: not high-power nets)
Yes: finish, followed by downsizing

Results - Power Saving


[Figure: dynamic power saving (0%-60%) for Blocks A-E and their average, split into router saving and driver-downsizing saving]
Average saving results: 14.3% for ASIC blocks (1)
(1) Estimated based on clock interconnect power


Future of Interconnect Power


[Figure: dynamic power breakdown (gate, diffusion, interconnect; 0%-100%) vs. technology generation [µm], from 0.15 to 0.022. Source: ITRS 2001 Edition, adapted data]
Interconnect power grows to 65%-80% within 5 years (using optimistic interconnect scaling)!

Interconnect Power Prediction

Interconnect length projection
The number of nets vs. unit length
Modified Davis model
[Figure: number of nets (normalized) vs. length [µm]; measured data and model, with the upper local bound and lower global bound marked]

Dynamic power breakdown
The dynamic power average breakdown
[Figure: interconnect, diffusion, and gate shares of power (0%-100%) for local, intermediate, and global nets]

Interconnect Power Model


Multiplying the number of interconnects by the power breakdowns gives the projected dynamic power vs. net length.
[Figure: normalized power vs. length [µm]; measured power and projection]
The power model matches the processor power distribution!


Summary

Interconnect is 50% of the dynamic power of processors, and getting worse.
Clock consumes 40% of interconnect power.
Clock interconnect spacing is suggested
Interconnect power is the sum of nearly all net lengths and types.
Interconnect power-aware design is recommended
Router-level interconnect power reduction addresses all
Interconnect power has a strong dependency on the hierarchy
Per-hierarchy analysis and optimization algorithms

Future Research
1. Interconnect Power characterization and prediction
2. Investigate Interconnect power reduction techniques:
Interconnect-Spacing for power
Interconnect Power-Aware physical design
Aspect Ratio optimization for power
Architectural communication reduction


Questions ?


BACKUP-Slides


Processor Case Study

Analysis subject: Processor, 0.13 [µm]
77 million transistors, die size of 88 [mm²]
Data sources (AF, Capacitance, Length)
Excluded: L2 cache, global clock, analog units

Global Communication
Global power is important
Global power is mostly IC
For higher-power benchmarks, global power is higher
G-clock excluded
[Figure: global power percentage (0%-35%) vs. total test power [uW], with a polynomial trend line]

Benchmark Selection
High power test benchmarks
Worst case design
Suitable for: thermal design, power grid design
Average power is a fraction of peak power

Unit stressing benchmarks
Averaging of all high power benchmarks
High node coverage

Interconnect Power Implications


Interconnect power can be reduced by minimizing switched capacitance:
Fabrication process (wire parameters)
Power-driven physical design
Logic optimization for power
Architectural interconnect optimization

Interconnect Capacitance
Side-cap is increasing: 70% to 80%
[Figure: cross-section of metal layers 1-3 showing self-cap. and side-cap., with vertical (V-cap) and horizontal (H-cap) components]
[Figure: global capacitance breakdown (V-cap, H-cap; 0%-100%) vs. technology generation [µm], from 0.15 to 0.022. Source: ITRS 2001 Edition, adapted data]
The majority of interconnect capacitance is side-capacitance!

Fabrication Process
Aspect Ratio (AR)
Interconnect AR = Thickness / Width
Low AR = low interconnect power
Low AR = high resistance
Frequency modeling
Local: average gate, average IC
Global: optimally buffered global IC
[Figure: distributed Rint/Cint ladder models for a local path and for a global path buffered every Lcrit]

Aspect Ratio Trade offs


Frequency and power vs. relative AR
Power depends on cap.
Frequency:
Local: gates and IC cap.
Global: mostly IC RC
Per-layer AR optimization!
Scaling => more power saving, less frequency loss
[Figure: dynamic power, local path speed, and global path speed (60%-120%) vs. relative AR (50%-150%)]
Aspect ratio optimization can save over 10% of dynamic power!

Physical Design - Spacing


0.13 [µm] global IC cap. vs. spacing
Spacing can save up to 40%
About 30% with double spacing
Spacing advantages: scaling, frequency, reliability, noise, easy to modify
[Figure: relative global capacitance (0%-120%) vs. spacing (1 to 4.5, relative to min. space; 2X, 3X, and 4X marked)]
Wire spacing can save up to 20% of the dynamic power!

Spacing calculation
Back-of-the-envelope estimation:
10% of interconnect => 90% of power
2X spacing = extra 20% wiring
Global clock not spaced (inductance)
Global clock is 20% of interconnect power
Save: 30% of (90% - 20%) ≈ 20%
Interconnect is 50% => 10% power save
Expected 20% with downsizing
Minor losses - congestion
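
The estimate above can be replayed as a few lines of arithmetic. The sketch below uses only the percentages quoted on the slide; the function and parameter names are illustrative assumptions.

```python
def spacing_saving(top_net_power_share, global_clock_share,
                   cap_saving_double_space, interconnect_share):
    """Return (interconnect power saving, dynamic power saving) as fractions."""
    # Only the high-power nets are spaced, and the global clock is left alone.
    spaced_share = top_net_power_share - global_clock_share
    interconnect_saving = cap_saving_double_space * spaced_share
    dynamic_saving = interconnect_saving * interconnect_share
    return interconnect_saving, dynamic_saving


ic_saving, dyn_saving = spacing_saving(
    top_net_power_share=0.90,      # 10% of nets carry 90% of the power
    global_clock_share=0.20,       # global clock is not spaced
    cap_saving_double_space=0.30,  # about 30% less capacitance at 2X spacing
    interconnect_share=0.50,       # interconnect is about half of dynamic power
)
print(f"interconnect: {ic_saving:.1%}, dynamic: {dyn_saving:.1%}")
```

The product 0.30 * 0.70 is 21%, which the slide rounds to 20%, and half of that gives the roughly 10% dynamic power saving.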


µ-Architecture - CMP
Comparing two scaling methods, by IC power.
IC - predicted by Rent
L2 - identical, minor
Clock - identical!
Same average AF.
Result: ~5% less dynamic power for CMP
[Figure: Gen. 1 (P, L2) scaled to Gen. 2 either as a uniprocessor (P`, L2`) or as a CMP (P P, L2`)]

Power critical vs. timing critical
[Figure: accumulated power (0%-100%) vs. slack [ps], with the timing-critical region marked]

Outline

Research methodology
Interconnect Power Analysis
Future Trends Analysis
Interconnect Power Implications
Summary


Interconnect Length Prediction


Technology projections - ITRS
Interconnect length predictions:
ITRS model: 1/3 of the routing space - most optimistic
Davis model:
o Rent's rule based
o Predicts number of nets as a function of the number of gates and complexity factors
Models calibrated based on the case study

Rent's parameters
Rent's rule: $T = k N^r$
T = # of I/O terminals (pins)
N = # of gates
k = avg. I/Os per gate
r = Rent's exponent; can be 0 < r < 1, but commonly (simple) 0.5 < r < 0.75 (complex)
[Figure: block of N gates with T terminals]
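
A short sketch of Rent's rule as stated above, evaluated at the two ends of the common exponent range. The gate count and the value of k are hypothetical.

```python
def rent_terminals(n_gates, k, r):
    """Rent's rule: number of I/O terminals T = k * N^r."""
    return k * n_gates ** r


# Hypothetical block of 10,000 gates with k = 4 I/Os per gate.
for exponent in (0.5, 0.75):
    terminals = rent_terminals(10_000, k=4, r=exponent)
    print(f"r = {exponent}: T = {terminals:.0f}")
```

The exponent dominates: moving r from 0.5 to 0.75 raises the terminal count of the same block by a factor of N^0.25.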


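Donath's length estimation model

For the i-th level:
There are $4^i$ blocks
For each block there are $k \left(\frac{N}{4^i}\right)^r$ terminals
Assuming two-terminal nets: $\frac{k}{2} \left(\frac{N}{4^i}\right)^r$ nets
The nets of the (i-1)-th level must be subtracted.
Nets for level i: $n_i = 4^i \frac{k}{2} \left(\frac{N}{4^i}\right)^r - 4^{i-1} \frac{k}{2} \left(\frac{N}{4^{i-1}}\right)^r = 4^i \frac{k}{2} \left(\frac{N}{4^i}\right)^r \left(1 - 4^{r-1}\right)$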
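
The per-level net count can be checked numerically. The sketch below evaluates both the subtraction form and the factored form for a few levels; the parameter values are hypothetical.

```python
def nets_at_level_by_subtraction(i, n_gates, k, r):
    """Nets counted at level i minus those already counted at level i-1."""
    current = 4 ** i * (k / 2) * (n_gates / 4 ** i) ** r
    previous = 4 ** (i - 1) * (k / 2) * (n_gates / 4 ** (i - 1)) ** r
    return current - previous


def nets_at_level(i, n_gates, k, r):
    """Factored form: n_i = 4^i * (k/2) * (N/4^i)^r * (1 - 4^(r-1))."""
    return 4 ** i * (k / 2) * (n_gates / 4 ** i) ** r * (1 - 4 ** (r - 1))


# Hypothetical design: 4096 gates, k = 4, r = 0.6.
for level in range(1, 6):
    a = nets_at_level_by_subtraction(level, 4096, 4.0, 0.6)
    b = nets_at_level(level, 4096, 4.0, 0.6)
    print(level, round(a, 3), round(b, 3))
```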


Average interconnection length


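The wires can be of two types, A and D (λ is the side length of a block at the level considered).
$L_A = \frac{1}{\lambda^4} \sum_{i_A=1}^{\lambda} \sum_{j_A=1}^{\lambda} \sum_{i_B=1}^{\lambda} \sum_{j_B=1}^{\lambda} \left[ \lambda + i_A - i_B + |j_B - j_A| \right] = \frac{4}{3}\lambda - \frac{1}{3\lambda}$
$L_D = \frac{1}{\lambda^4} \sum_{i_A=1}^{\lambda} \sum_{j_A=1}^{\lambda} \sum_{i_B=1}^{\lambda} \sum_{j_B=1}^{\lambda} \left[ 2\lambda + i_A + j_A - i_B - j_B \right] = 2\lambda$
The average: $r_i = \frac{14}{9}\lambda - \frac{2}{9}\frac{1}{\lambda}$
Overall: $R = \frac{\sum_{i=1}^{I} n_i r_i}{\sum_{i=1}^{I} n_i}$ equals $\frac{2}{9} \left( 7\,\frac{N^{r-0.5} - 1}{4^{r-0.5} - 1} - \frac{1 - N^{r-1.5}}{1 - 4^{r-1.5}} \right) \frac{1 - 4^{r-1}}{1 - N^{r-1}}$
Taken from a SLIP 2001 tutorial by Dirk Stroobandt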
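
As a sanity check on the closed forms for the two wire types, the following sketch brute-forces both four-fold sums for small blocks and evaluates the overall average length R. The side-length symbol `lam` and the sample values are assumptions for illustration. The closed form for R is singular at r = 0.5 and r = 1, so those values must be avoided.

```python
from itertools import product


def adjacent_length(lam):
    """Brute-force average length between cells of two side-by-side blocks."""
    cells = range(1, lam + 1)
    total = sum(
        lam + i_a - i_b + abs(j_b - j_a)
        for i_a, j_a, i_b, j_b in product(cells, repeat=4)
    )
    return total / lam ** 4


def diagonal_length(lam):
    """Brute-force average length between cells of two diagonal blocks."""
    cells = range(1, lam + 1)
    total = sum(
        2 * lam + i_a + j_a - i_b - j_b
        for i_a, j_a, i_b, j_b in product(cells, repeat=4)
    )
    return total / lam ** 4


def donath_average_length(n_gates, r):
    """Closed-form overall average length R (r must differ from 0.5 and 1)."""
    first = 7 * (n_gates ** (r - 0.5) - 1) / (4 ** (r - 0.5) - 1)
    second = (1 - n_gates ** (r - 1.5)) / (1 - 4 ** (r - 1.5))
    scale = (1 - 4 ** (r - 1)) / (1 - n_gates ** (r - 1))
    return 2 / 9 * (first - second) * scale


for lam in (1, 2, 3):
    print(lam, adjacent_length(lam), 4 * lam / 3 - 1 / (3 * lam), diagonal_length(lam))
print(donath_average_length(4, 0.6))
```

For N = 4 every ratio in the closed form equals 1, so R reduces to (2/9)(7 - 1) = 4/3, the same value the per-level average gives for unit-sized blocks.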


Davis Model
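From Rent's rule: $T = k N^p$
IDF (interconnect density function):
$i(l) = \frac{\alpha k}{2}\,\Gamma \left( \frac{l^3}{3} - 2\sqrt{N}\,l^2 + 2N l \right) l^{2p-4}$ for $1 \le l < \sqrt{N}$
$i(l) = \frac{\alpha k}{6}\,\Gamma \left( 2\sqrt{N} - l \right)^3 l^{2p-4}$ for $\sqrt{N} \le l < 2\sqrt{N}$
Where: $\alpha = \frac{FO}{FO + 1}$, $\Gamma = \frac{2N\left(1 - N^{p-1}\right)}{-N^p \frac{1 + 2p - 2^{2p-1}}{p(2p-1)(p-1)(2p-3)} - \frac{1}{6p} + \frac{2\sqrt{N}}{2p-1} - \frac{N}{p-1}}$

Interconnect total number and length:
Nets: $I_{total} = \int_1^{2\sqrt{N}} i(l)\,dl$
Length: $L_{total} = \int_1^{2\sqrt{N}} l\,i(l)\,dl$
Multipoint length: $L_{multi\_terminal} = \chi\,L_{total}$ where $\chi = \frac{4}{FO + 3}$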
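
A hedged numerical sketch of the interconnect density function follows. It builds i(l) from the two-region expression, integrates it with a simple midpoint rule, and compares the total net count with αkN(1 - N^(p-1)), which is what the normalization Γ yields when the density is integrated analytically. All parameter values are hypothetical, and p must avoid 0.5, 1, and 1.5, where the closed form is singular.

```python
import math


def make_davis_density(n_gates, p, k, fan_out):
    """Return the function i(l) for the given model parameters."""
    sqrt_n = math.sqrt(n_gates)
    alpha = fan_out / (fan_out + 1)
    denominator = (
        -n_gates ** p * (1 + 2 * p - 2 ** (2 * p - 1))
        / (p * (2 * p - 1) * (p - 1) * (2 * p - 3))
        - 1 / (6 * p)
        + 2 * sqrt_n / (2 * p - 1)
        - n_gates / (p - 1)
    )
    gamma = 2 * n_gates * (1 - n_gates ** (p - 1)) / denominator

    def density(length):
        if 1 <= length < sqrt_n:
            shape = length ** 3 / 3 - 2 * sqrt_n * length ** 2 + 2 * n_gates * length
            return alpha * k / 2 * gamma * shape * length ** (2 * p - 4)
        if sqrt_n <= length <= 2 * sqrt_n:
            return alpha * k / 6 * gamma * (2 * sqrt_n - length) ** 3 * length ** (2 * p - 4)
        return 0.0

    return density


def integrate(func, lower, upper, steps=200_000):
    """Midpoint-rule integral, adequate for a sketch."""
    width = (upper - lower) / steps
    return width * sum(func(lower + (i + 0.5) * width) for i in range(steps))


# Hypothetical parameters, not taken from the case study.
N_GATES, RENT_P, RENT_K, FAN_OUT = 10_000, 0.6, 4.0, 3.0
density = make_davis_density(N_GATES, RENT_P, RENT_K, FAN_OUT)
upper = 2 * math.sqrt(N_GATES)
total_nets = integrate(density, 1, upper)
total_length = integrate(lambda l: l * density(l), 1, upper)
multi_terminal_length = 4 / (FAN_OUT + 3) * total_length
expected_nets = FAN_OUT / (FAN_OUT + 1) * RENT_K * N_GATES * (1 - N_GATES ** (RENT_P - 1))
print(total_nets, expected_nets, multi_terminal_length)
```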


Davis Model - extension


Constant factor favors shorter nets.
A short P2P net has a higher chance of being part of a multipoint net.
Correction factor: $\text{multi-terminal factor}(l) = \frac{\text{number of point-to-point nets shorter than } l}{\text{total point-to-point nets}} = \frac{1}{I_{total}} \int_1^{l} i(x)\,dx$
Length: $L_{multi\text{-}terminal} \propto \int l \cdot i(l) \cdot \text{multi-terminal factor}(l)\,dl$, with an FO-dependent scale factor
[Figure: number of nets vs. length [µm], log-log scale; measured (extracted) data, Davis model, and extended Davis model]

RMST - Example


Total Dynamic Power


Total power vs. net length
Global clock not included
Total dynamic power: local nets = 66%, global nets = 34%
[Figure: normalized power vs. length [µm]; curves: total and total interconnect]

Local and Global IC


Local and global IC are different:
Number by length breakdown
IC breakdown, cap and power
Fan out
Metal usage
AF is similar
[Figure: local power breakdown (IC, diffusion, gate; 0%-100%) vs. net length [µm], from 4.16 to 83850]
[Figure: global power breakdown (IC, diffusion, gate; 0%-100%) vs. net length [µm], from 4.16 to 83850]

Benchmarks Comparison
Global dynamic power vs. length
[Figure: power vs. length [µm]; curves: high power tests and benchmarks]
High power tests show similar behavior to average SPEC!

Interconnect Peaks
[Figure: total wire length vs. length [µm]; measured data and Davis model]
[Figure: average gate sizing (relative sizing, 0 to 1.8) vs. length [µm]]

ITRS Power Trends


The interconnect power reduction that the ITRS power projection shows for 2006-2007 is based on:
1. Aggressive voltage reduction
2. Low-k dielectric improvements
Device capacitance increases by 30% (trend: -15%)
The combined effect:
Interconnect power reduction (relative to voltage)
Device power remains constant

Dynamic power - ITRS trend


[Figure: dynamic power projection; normalized power [W] (0-600) for IC, diffusion, and gate vs. technology generation (1/2 min pitch) [µm], from 0.15 to 0.022]
The black curve is the ITRS maximum heat removal capability

Power-Aware Flow
The reduced IC cap allows for driver downsizing
On average it reduced the dynamic power by 1.4x the IC power saving
Downsizing is timing verified
Cell downsizing reduced the total area and leakage by 0.4%
No signal spacing was applied - over 30% unused metal
Post-layout optimizations are possible

Flow:
1. Placement
2. Power-aware routing
3. RC extraction
4. Timing analysis, power analysis
5. All slacks positive? No: timing-driven driver upsizing. Yes: power-driven, timing-constrained driver downsizing
6. Sizing modified? Yes: iterate again. No: finish

FUBS description
A: medium, randomly picked
B: small, highest clock power
C: small, good potential
D: medium, good potential
E: worse than average

Block name                              Block A    Block B    Block C    Block D    Block E    AVERAGE
Area [µm²]                              138801.6   101274.6   65816.1    164229.1   59766.3    209537.8
Devices                                 14574      8644       7618       18194      6109       16675
Inactive nodes                          63.66%     98.78%     82.36%     39.22%     35.38%     52.94%
Power [uW]                              17170.22   251.15     1786.76    11811.90   6757.11    15373.86
RMST potential power saving             14.3%      17%        22%        29%        4.1%       17%
Clock cap.                              11.25%     2.59%      12.75%     13.16%     3.27%      8.01%
Clock power                             72.10%     99.99%     96.46%     94.99%     33.84%     60.47%
IC cap.                                 34.00%     27.70%     38.14%     36.05%     29.86%     34.67%
IC power                                28.89%     59.54%     46.74%     48.62%     40.65%     36.83%
Clock IC power                          20.19%     59.54%     45.48%     46.26%     16.87%     23.87%
Clock IC length                         1.71%      2.34%      2.05%      2.09%      0.74%      3.85%
Relative capacitance per length unit    82.23%     113.15%    87.46%     83.74%     85.97%     88.46%

Miller Factor - Power

[Figure: two drivers (R1, V1) and (R2, V2) coupled through capacitance C]
Opposite-direction switching:
The current: $I_c = \frac{dQ}{dt} = \frac{d(C V_c)}{dt} = C \frac{dV_c}{dt}$
Energy: $E_c = \int_0^T I_c V_{dd}\,dt = C \int_0^T \frac{dV_c}{dt} V_{dd}\,dt = C V_{dd} \int_{-V_{dd}}^{V_{dd}} dV_c = 2 C V_{dd}^2$
That is 4 times a single switching energy.
Decoupling by Miller factor of 2.
Same-direction switching => no current.
Decoupling by Miller factor of 0.
Average case: Miller factor of 1, suitable for power (an average-case, sum metric).
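
The energy integral can be replayed numerically. The sketch below steps the coupling-capacitor voltage along a linear ramp and accumulates C * dV_c * Vdd per step, following the energy expression on the slide. The linear ramp, the 1 fF capacitance, and the 1 V supply are assumptions for illustration. It takes the single switching energy to be C * Vdd^2 / 2, which is the value the slide's factor of 4 implies.

```python
def supply_energy(c_farads, vdd, v_start, v_end, steps=1000):
    """Numerically integrate E = int(I_c * Vdd dt) with I_c = C * dV_c/dt.

    V_c moves linearly from v_start to v_end, so each time step
    contributes C * dV_c * Vdd regardless of the ramp duration.
    """
    dv = (v_end - v_start) / steps
    return sum(c_farads * dv * vdd for _ in range(steps))


C, VDD = 1e-15, 1.0  # hypothetical 1 fF coupling capacitance, 1 V supply

opposite = supply_energy(C, VDD, -VDD, VDD)  # neighbours switch in opposite directions
same = supply_energy(C, VDD, 0.0, 0.0)       # neighbours switch together, V_c unchanged
single = 0.5 * C * VDD ** 2                  # single switching energy, C * Vdd^2 / 2

print(opposite / single, same)
```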


Routing Model

Via blockage: Layer multiplier = (1 - blocking fraction)
Router efficiency: 0.6
Power grid: 20% of routing
Clock grid: 10% of top tier
More accurate than ITRS 2001.
[Figure: low layer pitch vs. high layer pitch]