Embedded Systems
9. Low Power Design
Lothar Thiele
Swiss Federal Institute of Technology
9-1
Computer Engineering and Networks Laboratory
Contents of Course
1. Embedded Systems Introduction
2. Software Introduction
3. Real-Time Models
4. Periodic/Aperiodic Tasks
5. Resource Sharing
6. Real-Time OS
7. System Components
8. Communication
9. Low Power Design
10. Models
11. Architecture Synthesis
12. Model Based Design
[Figure labels: Software and Programming; Processing and Communication; Hardware]
Topics
- General Remarks
- Power and Energy
- Basic Techniques
  - Parallelism
  - VLIW (parallelism and reduced overhead)
  - Dynamic Voltage Scaling
  - Dynamic Power Management
Power and Energy Consumption
Need for efficiency (power and energy):
- "Power is considered as the most important constraint in embedded systems." [in: L. Eggermont (ed): Embedded Systems Roadmap 2002, STW]
- "Power demands are increasing rapidly, yet battery capacity cannot keep up." [in Ditzel et al.: Power-Aware Architecting for data-dominated applications, 2007, Springer]
Implementation Alternatives
Ordered from highest flexibility (top) to highest performance and power efficiency (bottom):
- General-purpose processors
- Application-specific instruction set processors (ASIPs)
  - Microcontrollers
  - DSPs (digital signal processors)
- Programmable hardware
  - FPGAs (field-programmable gate arrays)
- Application-specific integrated circuits (ASICs)
The Power/Flexibility Conflict
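[Figure: energy efficiency in operations per Watt (MOPS/mW, logarithmic axis from 0.01 to 10) versus technology feature size (1.0, 0.5, 0.25, 0.13, 0.07 µm); curve labels include "DSP-ASIPs", "µPs", and "poor design techniques".]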
Necessary to optimize HW and SW. Use heterogeneous architectures. Apply specialization techniques.
[H. de Man, Keynote, DATE02; T. Claasen, ISSCC99]
Energy Efficiency
Hugo De Man, IMEC, Philips, 2007
Power and Energy are Related
[Figure: power P plotted over time t; the energy E is the area under the curve, $E = \int P(t)\,dt$.]
In many cases, faster execution also means less energy, but the opposite may be true if power has to be increased to allow faster execution.
Low Power vs. Low Energy
Minimizing the power consumption is important for
- the design of the power supply
- the design of voltage regulators
- the dimensioning of interconnect
- cooling (short-term cooling)
  - high cost (estimated to be rising at $1 to $3 per Watt for heat dissipation [Skadron et al., ISCA 2003])
  - limited space
Minimizing the energy consumption is important due to
- restricted availability of energy (mobile systems)
- limited battery capacities (only slowly improving)
- very high costs of energy (solar panels, in space)
- long lifetimes, low temperatures
Power Consumption of a CMOS Gate
[Figure: CMOS gate with its current paths; the leakage path is annotated "subthreshold and gate-oxide leakage".]
- Ileak: leakage current
- Iint: short circuit current
- Isw: switching current
Power Consumption of CMOS Processors
Main sources:
- Dynamic power consumption: charging and discharging capacitors
- Short circuit power consumption: short circuit path between supply rails during switching
- Leakage: leaking diodes and transistors; becomes one of the major factors due to shrinking feature sizes in semiconductor technology
Dynamic Voltage Scaling (DVS)
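Power consumption of CMOS circuits (ignoring leakage):
$P \sim \alpha \, C_L \, V_{dd}^2 \, f$
where $V_{dd}$: supply voltage, $\alpha$: switching activity, $C_L$: load capacity, $f$: clock frequency

Delay for CMOS circuits:
$\tau \sim C_L \, \frac{V_{dd}}{(V_{dd} - V_T)^2}$
where $V_{dd}$: supply voltage, $V_T$: threshold voltage (with $V_T \ll V_{dd}$)

Decreasing Vdd reduces P quadratically (f constant). The gate delay increases only reciprocally. Maximal frequency fmax decreases linearly.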
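The scaling statements above can be checked numerically. The following is a minimal sketch that assumes the common first-order CMOS model $P = \alpha C_L V_{dd}^2 f$ and $\tau = k \, C_L V_{dd} / (V_{dd} - V_T)^2$; the constant `k`, the function names, and all numeric values are illustrative assumptions and are not taken from the slides.

```python
def dynamic_power(alpha, c_load, vdd, f):
    """First-order dynamic power of a CMOS circuit (leakage ignored)."""
    return alpha * c_load * vdd ** 2 * f


def gate_delay(c_load, vdd, vt, k=1.0):
    """First-order gate delay; k is an assumed technology constant."""
    return k * c_load * vdd / (vdd - vt) ** 2


# Illustrative values only (assumptions): Vt = 0 models the case Vt << Vdd.
ALPHA, C_LOAD, VT, FREQ = 0.5, 1e-12, 0.0, 100e6

# Halving Vdd at constant frequency: power drops to one quarter.
power_ratio = (dynamic_power(ALPHA, C_LOAD, 1.0, FREQ)
               / dynamic_power(ALPHA, C_LOAD, 2.0, FREQ))

# Halving Vdd doubles the gate delay, so fmax is halved.
delay_ratio = gate_delay(C_LOAD, 1.0, VT) / gate_delay(C_LOAD, 2.0, VT)

print(power_ratio, delay_ratio)
```

With a non-negligible threshold voltage the delay grows faster than reciprocally, which is why the linear fmax relation only holds for $V_T \ll V_{dd}$.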
Potential for Energy Optimization: DVS
Saving energy for a given task, with $E \sim \alpha \, C_L \, V_{dd}^2 \cdot \#\text{cycles}$:
- Reduce the supply voltage Vdd
- Reduce switching activity α
- Reduce the load capacitance CL
- Reduce the number of cycles #cycles
Example: Voltage Scaling
[Courtesy, Yasuura, 2000]
[Figure: voltage scaling example, plotted against Vdd.]
Power Supply Gating
Power gating is one of the most effective ways of minimizing static power consumption (leakage):
- Cut off the power supply to inactive units/components
- This reduces leakage
Use of Parallelism
[Figure: a single unit operating at Vdd and fmax is replaced by two parallel units, each operating at Vdd/2 and fmax/2.]
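The benefit of the parallel arrangement follows from the dynamic power relation $P \sim V_{dd}^2 f$. The sketch below uses normalized units and ignores leakage as well as the overhead for distributing and merging data, both of which are simplifying assumptions; the function name is hypothetical.

```python
def relative_power(vdd, f, units=1):
    """Dynamic power in normalized units, P ~ units * Vdd^2 * f."""
    return units * vdd ** 2 * f


# One unit at full voltage and full frequency (normalized to 1).
single = relative_power(vdd=1.0, f=1.0)

# Two units, each at half the voltage and half the frequency.
parallel = relative_power(vdd=0.5, f=0.5, units=2)

# Both variants deliver the same number of operations per time unit.
throughput_single = 1 * 1.0
throughput_parallel = 2 * 0.5

print(parallel / single)  # fraction of the original power at equal throughput
```

Under these assumptions the two half-speed units need one quarter of the power, and therefore one quarter of the energy per operation. The same arithmetic applies to a two-stage pipeline whose stages run at Vdd/2 and fmax/2.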
Use of Pipelining
[Figure: a single unit operating at Vdd and fmax is replaced by a two-stage pipeline, each stage operating at Vdd/2 and fmax/2.]
New ideas help
[Figure: Pentium versus Crusoe, running the same multimedia application.]
As published by Transmeta [www.transmeta.com]
VLIW Architectures
- Large degree of parallelism
  - many computational units, (deeply) pipelined
- Simple hardware architecture
  - explicit parallelism (parallel instruction set)
  - parallelization is done offline (compiler)
Transmeta is a typical VLIW Architecture
Transmeta
[Figure: Transmeta architecture; the only surviving label is "VLIW".]
Spatial vs. Dynamic Voltage Management
Spatial voltage management: not all components require the same performance.
- Slow module: 1.3 V, 50 MHz
- Standard modules: 1.8 V, 100 MHz
- Busy module: 3.3 V, 200 MHz
Dynamic voltage management: required performance may change over time.
- Normal mode: 1.3 V, 50 MHz
- Busy mode: 3.3 V, 200 MHz
Example: INTEL Xscale
OS should schedule distribution of the energy budget.
From Intel's web site
DVS Example: a) Complete task ASAP
Task that needs to execute $10^9$ cycles within 25 seconds.
$E_a = 10^9 \times 40 \times 10^{-9}\,\mathrm{J} = 40\,\mathrm{J}$
DVS Example: b) Two voltages
$E_b = 750 \cdot 10^6 \times 40 \times 10^{-9}\,\mathrm{J} + 250 \cdot 10^6 \times 10 \times 10^{-9}\,\mathrm{J} = 32.5\,\mathrm{J}$
DVS Example: c) Optimal Voltage
$E_c = 10^9 \times 25 \times 10^{-9}\,\mathrm{J} = 25\,\mathrm{J}$
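The three strategies a), b), and c) can be compared in a few lines. The energy per cycle (40, 25, and 10 nJ) is taken from the calculations above; the clock frequencies attached to each operating point (50, 40, and 25 MHz) and the point names are assumptions chosen to be consistent with the 25 s deadline, since the original table is not part of this text.

```python
from fractions import Fraction

# name -> (energy per cycle [nJ], assumed clock frequency [MHz])
POINTS = {
    "high": (40, 50),
    "mid": (25, 40),
    "low": (10, 25),
}


def run(plan):
    """plan: list of (operating point, cycles); returns (energy [J], time [s])."""
    energy = Fraction(0)
    time = Fraction(0)
    for name, cycles in plan:
        nanojoule, megahertz = POINTS[name]
        energy += Fraction(cycles * nanojoule, 10 ** 9)
        time += Fraction(cycles, megahertz * 10 ** 6)
    return energy, time


case_a = run([("high", 10 ** 9)])                                # finish ASAP
case_b = run([("high", 750 * 10 ** 6), ("low", 250 * 10 ** 6)])  # two voltages
case_c = run([("mid", 10 ** 9)])                                 # one voltage

for label, (energy, time) in zip("abc", (case_a, case_b, case_c)):
    print(label, float(energy), "J in", float(time), "s")
```

Under the assumed frequencies, all three plans meet the deadline, and the single intermediate operating point needs the least energy.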
DVS: Optimal Strategy
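[Figure: supply voltage Vdd over time t with the voltage levels x, y, and z and the time marks T a and T; the corresponding power levels are P(x), P(y), and P(z).]
Execute task in fixed time T with variable voltage Vdd(t):
- gate delay: $\tau \sim 1 / V_{dd}$
- execution rate: $f(t) \sim V_{dd}(t)$
- invariant: $\int V_{dd}(t)\,dt = \text{const.}$
Case A: execute at voltage x for T a time units and at voltage y for (1-a) T time units; energy consumption T ( P(x) a + P(y) (1-a) ).
Case B: execute at voltage z = a x + (1-a) y for T time units; energy consumption T P(z).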
DVS: Optimal Strategy
Dynamic power is a convex function of Vdd.
[Figure: convex curve P(Vdd); at z = a x + (1-a) y the chord value P(x) a + P(y) (1-a) lies above P(z).]
If possible, running at a constant frequency (voltage) minimizes the energy consumption for dynamic voltage scaling:
case A is always worse if the power consumption is a convex function of the supply voltage, since then $P(z) \le a\,P(x) + (1-a)\,P(y)$.
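The comparison of case A and case B can be tried out numerically. The sketch below assumes the power model $P(V_{dd}) = V_{dd}^3$ in normalized units, which follows from $P \sim V_{dd}^2 f$ with $f \sim V_{dd}$; the function names and the sample values are illustrative assumptions.

```python
def power(v):
    """Normalized dynamic power with f ~ Vdd, so P ~ Vdd^2 * f ~ Vdd^3."""
    return v ** 3


def energy_case_a(x, y, a, T):
    """Voltage x for a*T time units, then voltage y for (1-a)*T time units."""
    return T * (a * power(x) + (1 - a) * power(y))


def energy_case_b(x, y, a, T):
    """Constant voltage z = a*x + (1-a)*y for T time units (same work done)."""
    z = a * x + (1 - a) * y
    return T * power(z)


T = 10.0
samples = [(x, y, a)
           for x in (1.0, 2.0, 3.0)
           for y in (0.5, 1.5)
           for a in (0.25, 0.5, 0.75)]

# Case B never needs more energy than case A for a convex power function.
constant_never_worse = all(
    energy_case_b(x, y, a, T) <= energy_case_a(x, y, a, T) + 1e-12
    for x, y, a in samples
)

print(energy_case_a(2.0, 1.0, 0.5, T), energy_case_b(2.0, 1.0, 0.5, T))
```

Both cases keep $\int V_{dd}(t)\,dt$ equal, so they complete the same amount of work; only the energy differs.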
DVS: Offline Scheduling on One Processor
Let us model a set of independent tasks as follows. We suppose that a task $v_i \in V$
- requires $c_i$ computation time at normalized processor frequency 1
- arrives at time $a_i$
- has (absolute) deadline constraint $d_i$
How do we schedule these tasks such that all of them finish no later than their deadlines and the energy consumption is minimized?
YDS algorithm, from "A Scheduling Model for Reduced CPU Energy", Frances Yao, Alan Demers, and Scott Shenker, FOCS 1995.
If possible, running at a constant frequency (voltage) minimizes the energy consumption for dynamic voltage scaling.
YDS Algorithm for Offline Scheduling
Example task set, given as (ai, di, ci):

Task | ai | di | ci
v1   |  3 |  6 |  5
v2   |  2 |  6 |  3
v3   |  0 |  8 |  2
v4   |  6 | 14 |  6
v5   | 10 | 14 |  6
v6   | 11 | 17 |  2
v7   | 12 | 17 |  2

[Figure: the seven tasks drawn as intervals [ai, di] on a time axis from 0 to 17.]
Define the intensity G([z, z']) in some time interval [z, z']: the accumulated execution time of all tasks that have arrival and deadline in [z, z'], relative to the length of the interval z' - z:
$G([z, z']) = \frac{\sum_{v_i :\, a_i \ge z,\ d_i \le z'} c_i}{z' - z}$
YDS Algorithm for Offline Scheduling
Step 1: Execute jobs in the interval with the highest intensity by using the earliest-deadline first schedule and running at the intensity as the frequency.
Tasks (ai, di, ci): 3,6,5  2,6,3  0,8,2  6,14,6  10,14,6  11,17,2  12,17,2
- G([0,6]) = (5+3)/6 = 8/6
- G([0,8]) = (5+3+2)/(8-0) = 10/8
- G([0,14]) = (5+3+2+6+6)/14 = 11/7
- G([0,17]) = (5+3+2+6+6+2+2)/17 = 26/17
- G([2,6]) = (5+3)/(6-2) = 2
- G([2,14]) = (5+3+6+6)/(14-2) = 5/3
- G([2,17]) = (5+3+6+6+2+2)/15 = 24/15
- G([3,6]) = 5/3
- G([3,14]) = (5+6+6)/(14-3) = 17/11
- G([3,17]) = (5+6+6+2+2)/14 = 21/14
- G([6,14]) = 12/(14-6) = 12/8
- G([6,17]) = (6+6+2+2)/(17-6) = 16/11
- G([10,14]) = 6/4
- G([10,17]) = 10/7
- G([11,17]) = 4/6
- G([12,17]) = 2/5
The highest intensity is G([2,6]) = 2.
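The intensities of the example can be recomputed mechanically. The following sketch enumerates all intervals that start at an arrival time and end at a deadline and picks the one with the highest intensity; the function names are hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# (arrival a_i, deadline d_i, computation c_i) for v1..v7 from the example.
TASKS = [(3, 6, 5), (2, 6, 3), (0, 8, 2), (6, 14, 6),
         (10, 14, 6), (11, 17, 2), (12, 17, 2)]


def intensity(tasks, z, z_end):
    """Work of all tasks with arrival and deadline inside [z, z_end],
    divided by the interval length."""
    work = sum(c for a, d, c in tasks if a >= z and d <= z_end)
    return Fraction(work, z_end - z)


def critical_interval(tasks):
    """Return the interval [z, z_end] with the highest intensity."""
    starts = sorted({a for a, _, _ in tasks})
    ends = sorted({d for _, d, _ in tasks})
    candidates = [(z, z_end) for z in starts for z_end in ends if z_end > z]
    return max(candidates, key=lambda iv: intensity(tasks, iv[0], iv[1]))


best = critical_interval(TASKS)
print(best, intensity(TASKS, *best))
```

Only arrival times and deadlines need to be considered as interval borders, which keeps the search quadratic in the number of tasks.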
[Figure: result of Step 1: v2 and v1 are scheduled in the critical interval [2,6] at frequency 2.]
YDS Algorithm for Offline Scheduling
Step 2: Adjust the arrival times and deadlines by excluding the possibility to execute at the previous critical intervals.
Revised task parameters (ai, di, ci) after removing the critical interval [2,6]:
- v3: 0,8,2 becomes 0,4,2
- v4: 6,14,6 becomes 2,10,6
- v5: 10,14,6 becomes 6,10,6
- v6: 11,17,2 becomes 7,13,2
- v7: 12,17,2 becomes 8,13,2
[Figure: the remaining tasks v3 to v7 on the compressed time axis.]
YDS Algorithm for Offline Scheduling
Step 3: Run the algorithm for the revised input again
Revised input (ai, di, ci): 0,4,2  2,10,6  6,10,6  7,13,2  8,13,2
- G([0,4]) = 2/4
- G([0,10]) = 14/10
- G([0,13]) = 18/13
- G([2,10]) = 12/8
- G([2,13]) = 16/11
- G([6,10]) = 6/4
- G([6,13]) = 10/7
- G([7,13]) = 4/6
- G([8,13]) = 2/5
The highest intensity is 12/8 = 1.5, reached on [2,10] (and on [6,10]); v4 and v5 run at frequency 1.5.
YDS Algorithm for Offline Scheduling
Step 3: Run the algorithm for the revised input again.
Revised input (ai, di, ci) after removing [2,10]: v3: 0,2,2; v6: 2,5,2; v7: 2,5,2. The critical interval is now [2,5] with intensity 4/3; v3 remains and runs at frequency 1.
Step 4: Put pieces together.
[Figure: final schedule over time 0 to 17: v3 at frequency 1, then v2 and v1 at 2, then v4 and v5 at 1.5, then v6 and v7 at 4/3.]

Task      | v1 | v2 | v3 | v4  | v5  | v6  | v7
Frequency | 2  | 2  | 1  | 1.5 | 1.5 | 4/3 | 4/3
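Steps 1 to 4 can be put together in a short program. This is a minimal sketch of the procedure as described on the slides, not a reference implementation of the published algorithm: it repeatedly finds the interval with the highest intensity, fixes the speed of the tasks inside it, and compresses the time axis for the remaining tasks. The function names are hypothetical, and ties between intervals are broken by taking the first one found.

```python
from fractions import Fraction

TASKS = {  # name -> (arrival, deadline, computation), from the example
    "v1": (3, 6, 5), "v2": (2, 6, 3), "v3": (0, 8, 2), "v4": (6, 14, 6),
    "v5": (10, 14, 6), "v6": (11, 17, 2), "v7": (12, 17, 2),
}


def yds(tasks):
    """Return a dict mapping each task name to its execution frequency."""
    tasks = dict(tasks)
    speeds = {}
    while tasks:
        # Step 1: find the critical interval (highest intensity).
        starts = sorted({a for a, _, _ in tasks.values()})
        ends = sorted({d for _, d, _ in tasks.values()})
        best = None
        for z in starts:
            for z_end in ends:
                if z_end <= z:
                    continue
                work = sum(c for a, d, c in tasks.values()
                           if a >= z and d <= z_end)
                g = Fraction(work, z_end - z)
                if best is None or g > best[0]:
                    best = (g, z, z_end)
        g, z, z_end = best

        def shift(t):
            """Step 2: cut the critical interval out of the time axis."""
            if t <= z:
                return t
            if t >= z_end:
                return t - (z_end - z)
            return z

        # Tasks inside the critical interval run at its intensity;
        # all others get adjusted arrival times and deadlines.
        remaining = {}
        for name, (a, d, c) in tasks.items():
            if a >= z and d <= z_end:
                speeds[name] = g
            else:
                remaining[name] = (shift(a), shift(d), c)
        tasks = remaining  # Step 3: repeat on the revised input
    return speeds


speeds = yds(TASKS)
for name in sorted(speeds):
    print(name, speeds[name])
```

Within each critical interval the selected tasks are then executed in earliest-deadline-first order at the computed frequency; the sketch only reports the frequencies.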
DVS: Online Scheduling on One Processor
[Figure: online schedule, frequency (1 to 3) over time (0 to 17).]
Tasks (ai, di, ci): v1: 3,6,5; v2: 2,6,3; v3: 0,8,2; v4: 6,14,6; v5: 10,14,6; v6: 11,17,2; v7: 12,17,2
Continuously update to the best schedule for all arrived tasks:
- Time 0: task v3 is executed at 2/8.
- Time 2: task v2 arrives. G([2,6]) = 3/4, G([2,8]) = 4.5/6 = 3/4 => execute v2 at 3/4.
- Time 3: task v1 arrives. G([3,6]) = (5+3-3/4)/3 = 29/12, G([3,8]) < G([3,6]) => execute v2 and v1 at 29/12.
- Time 6: task v4 arrives. G([6,8]) = 1.5/2, G([6,14]) = 7.5/8 => execute v3 and v4 at 15/16.
- Time 10: task v5 arrives. G([10,14]) = 39/16 => execute v4 and v5 at 39/16.
- Time 11 and time 12: the arrival of v6 and v7 does not change the critical interval.
- Time 14: G([14,17]) = 4/3 => execute v6 and v7 at 4/3.
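The online decisions can be replayed with a small simulation. The sketch below assumes the re-planning rule described above: at every arrival (and at the end of a critical interval) the speed is set to the highest intensity over the remaining work of all arrived tasks, and the work is consumed in earliest-deadline-first order. The function name and the tie-breaking between equal intensities are assumptions.

```python
from fractions import Fraction

TASKS = {  # name -> (arrival, deadline, computation), from the example
    "v1": (3, 6, 5), "v2": (2, 6, 3), "v3": (0, 8, 2), "v4": (6, 14, 6),
    "v5": (10, 14, 6), "v6": (11, 17, 2), "v7": (12, 17, 2),
}


def online_speeds(tasks):
    """Return the list of (time, speed) decisions of the online schedule."""
    arrivals = sorted({a for a, _, _ in tasks.values()})
    remaining = {}   # name -> remaining work of tasks that have arrived
    decisions = []
    t = Fraction(arrivals[0])
    while True:
        for name, (a, _, c) in tasks.items():
            if a == t:
                remaining[name] = Fraction(c)
        pending = [n for n, work in remaining.items() if work > 0]
        next_arrival = min((a for a in arrivals if a > t), default=None)
        if not pending:
            if next_arrival is None:
                return decisions
            t = next_arrival          # idle until the next task arrives
            continue
        # Highest intensity over all intervals [t, d] of pending deadlines.
        speed, crit_deadline = max(
            (sum(remaining[n] for n in pending if tasks[n][1] <= d) / (d - t), d)
            for d in {tasks[n][1] for n in pending}
        )
        decisions.append((t, speed))
        t_next = crit_deadline if next_arrival is None else min(crit_deadline, next_arrival)
        # Consume the work budget in earliest-deadline-first order.
        budget = speed * (t_next - t)
        for n in sorted(pending, key=lambda n: (tasks[n][1], tasks[n][0])):
            done = min(remaining[n], budget)
            remaining[n] -= done
            budget -= done
        t = t_next


decisions = online_speeds(TASKS)
for time, speed in decisions:
    print("t =", time, "speed =", speed)
```

Compared with the offline schedule, the online schedule has to run faster later (for example at 29/12 and 39/16) because early work was executed slowly before the later tasks were known.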
Remarks on YDS Algorithm
Offline
- The algorithm guarantees the minimal energy consumption while satisfying the timing constraints.
- The time complexity is $O(N^3)$, where N is the number of tasks in V:
  - finding the critical interval can be done in $O(N^2)$
  - the number of iterations is at most N
- Exercise: for periodic real-time tasks with deadline = period, running at constant speed with 100% utilization under EDF has minimum energy consumption while satisfying the timing constraints.
Online
- Compared to the optimal offline solution, the online schedule uses at most 27 times the minimal energy consumption.
Dynamic Power Management (DPM)
Dynamic power management tries to assign optimal power-saving states.
Requires hardware support.
Example: StrongARM SA1100
- RUN (400 mW): operational
- IDLE (50 mW): a SW routine may stop the CPU when not in use, while monitoring interrupts
- SLEEP (160 µW): shutdown of on-chip activity
State transitions (delay, energy):
- RUN to IDLE: 10 µs, 4 µJ
- IDLE to RUN: 10 µs, 4 µJ
- RUN to SLEEP: 90 µs, 36 µJ
- IDLE to SLEEP: 90 µs, 5 µJ
- SLEEP to RUN: 160 ms, 64 mJ
Reduce Power According to Workload
[Figure: application states (busy, waiting, busy) above the corresponding power states (run, shut down, sleep, wake up, run).]
- Tsd: shutdown delay
- Tbs: time before shutdown
- Twu: wakeup delay
Desired: shut down only during long idle times. There is a tradeoff between savings and overhead.
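The tradeoff between savings and overhead can be made concrete with a break-even calculation. The sketch below uses the StrongARM SA1100 figures from the example, read here as 50 mW in IDLE, 160 µW in SLEEP, 90 µs and 5 µJ for entering SLEEP from IDLE, and 160 ms and 64 mJ for waking up; these readings, the function names, and the assumption that the wake-up must finish within the idle period are all assumptions of this sketch.

```python
P_IDLE = 50e-3                 # power in IDLE [W]
P_SLEEP = 160e-6               # power in SLEEP [W]
T_DOWN, E_DOWN = 90e-6, 5e-6   # IDLE -> SLEEP: delay [s], energy [J]
T_UP, E_UP = 160e-3, 64e-3     # SLEEP -> RUN: delay [s], energy [J]


def energy_stay_idle(t_idle):
    """Energy if the processor simply stays in IDLE for t_idle seconds."""
    return P_IDLE * t_idle


def energy_sleep(t_idle):
    """Energy if the idle period is spent shutting down, sleeping, waking up."""
    t_asleep = t_idle - T_DOWN - T_UP
    if t_asleep < 0:
        raise ValueError("idle period shorter than the transition delays")
    return E_DOWN + E_UP + P_SLEEP * t_asleep


def break_even_time():
    """Idle time at which sleeping starts to pay off.

    Solves P_IDLE * t = E_DOWN + E_UP + P_SLEEP * (t - T_DOWN - T_UP) for t.
    """
    t = (E_DOWN + E_UP - P_SLEEP * (T_DOWN + T_UP)) / (P_IDLE - P_SLEEP)
    return max(t, T_DOWN + T_UP)


t_be = break_even_time()
print("break-even idle time [s]:", t_be)
```

Idle periods shorter than the break-even time are better spent in IDLE; the difficulty in practice is that the length of the idle period is not known in advance.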
The Challenge
Questions:
- When to go to a power-saving state?
- Is an idle period long enough for shutdown?
Answering them requires predicting the future.
Combining DVFS and DPM
Critical frequency (voltage) under DVS: running at any frequency/voltage lower than this frequency is not worthwhile for execution.
[Figure: power over time for two ways of executing a task: stretching it with voltage and frequency scaling, or running it and then sleeping; the area under each curve is the energy for executing the task. The critical voltage is marked.]
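Why a critical frequency exists can be illustrated with a simple power model. The sketch below assumes $P(f) = P_{static} + k f^3$, where the constant part is consumed whenever the processor is not asleep; this model, the parameter values, and the function names are assumptions and are not taken from the slides. The energy per cycle is then $P(f)/f$, which is minimal at $f_{crit} = (P_{static} / (2k))^{1/3}$.

```python
def energy_per_cycle(f, p_static, k):
    """Energy per cycle for the assumed model P(f) = p_static + k * f**3."""
    return (p_static + k * f ** 3) / f


def critical_frequency(p_static, k):
    """Frequency minimizing the energy per cycle.

    d/df (p_static / f + k * f**2) = 0  gives  f**3 = p_static / (2 * k).
    """
    return (p_static / (2 * k)) ** (1 / 3)


# Illustrative, normalized parameters (assumptions).
P_STATIC, K = 2.0, 1.0

f_crit = critical_frequency(P_STATIC, K)
print(f_crit,
      energy_per_cycle(0.5 * f_crit, P_STATIC, K),
      energy_per_cycle(f_crit, P_STATIC, K),
      energy_per_cycle(2.0 * f_crit, P_STATIC, K))
```

Below the critical frequency the static part dominates, so it is cheaper to run at the critical frequency and sleep for the rest of the time, provided the sleep overhead is small enough.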
Procrastination Schedule
[Figure: two schedules, frequency (1 to 3) over time (0 to 17), labelled "YDS algorithm, rounded up" and "procrastinate scheduling"; critical frequency: 1.5.]
Tasks (ai, di, ci): 3,6,5  2,6,3  0,8,1  7,14,2  10,14,2  13,17,2  15,17,2
Execute by using voltages higher than or equal to the critical voltage only:
- apply the YDS algorithm
- round up voltages lower than the critical voltage
Procrastinate the execution of tasks to aggregate enough time for sleeping:
- try to reduce the number of times to turn on/off
- sleep as long as possible