Chapter 1
Computer Abstractions and
Technology
Introduction
Types of computers?
Different types:
Desktop: general purpose, variety of software
Subject to cost/performance tradeoff
Servers:
Network based
High capacity, performance, reliability
Range from small servers to building-sized
Embedded devices:
Hidden as components of systems
Stringent power/performance/cost constraints
Different uses: automobiles, graphics, finance, genomics…
Different manufacturers: Intel, Apple, IBM, Microsoft, Sun…
Different underlying technologies and different costs!
Chapter Outline
How are programs in a high-level language (C or Java) translated into the
hardware's language, and how does the hardware execute them?
What is the interface between S/W and H/W, and how does S/W
instruct the H/W to perform needed functions?
What determines the performance of a program, and how can a
programmer improve performance?
What techniques can be used by hardware designers to improve
performance?
Why learn this stuff?
You want to build software people use (need performance)
You need to make a purchasing decision or offer “expert” advice
Both Hardware and Software affect performance:
Algorithm determines number of source-level statements and I/O
operations executed
Language/Compiler/Architecture determine # of machine
instructions for each source statement (Chapter 2 and 3)
Processor/Memory determine how fast instructions are executed
(Chapter 5, 6, and 7)
I/O system (hardware and OS) determines how fast I/O is performed
What is a computer?
5 Components:
input (mouse, keyboard)
output (display, printer)
memory (disk drives, DRAM, SRAM, CD)
the processor (datapath and control): our primary focus
Implemented using millions of transistors
Impossible to understand by looking at each transistor
(Figure: the processor chip mounted on a PC board)
Below Your Program
• Application software
– Written in high-level language
– hardware in a computer can only execute extremely
simple low-level instructions
– To go from a complex application to the simple
instructions involves several layers of software
that interpret or translate high-level operations
into simple computer instructions
• System software
– Compiler: translates HLL code to machine code
– Operating System: service code
• Handling input/output
• Managing memory and storage
• Scheduling tasks & sharing resources
• Hardware
Levels of Program Code
High-level language
Level of abstraction closer
to problem domain
Provides for productivity
and portability
Assembly language
Textual representation of
instructions
Hardware representation
Binary digits (bits)
Encoded instructions and
data
The role of the compiler
The compiler translates a HLL program into the machine language for the
given Instruction Set Architecture (ISA)
Compilers allow software developers to work at the HLL level without
worrying about low-level details of the underlying machine
The compiler writer’s first responsibility is to ensure that the machine
language program
Exactly matches the functionality of the HLL program
Exactly conforms to the ISA specification
Compiler product differences include
Speed of code execution on the hardware
Code density (reduces memory requirements)
Compilation speed (how long from HLL to machine code)
Debugging capabilities
Instruction Set Architecture (ISA)
A very important abstraction
Interface (boundary) between hardware and low-level software
Another definition: a protocol that defines how a computing machine appears to
a machine language programmer or compiler
The ISA describes the (1) memory model, (2) instruction formats, types, and
modes, and (3) operand registers, types, and data addressing. Instruction types
include arithmetic, logical, data transfer, and flow control. Instruction modes
include kernel and user modes.
Advantage: different implementations of the same architecture
Modern instruction set architectures:
IA-32 (x86 architecture), PowerPC, MIPS, SPARC, ARM, RISC-V, and others
• Two main approaches of ISA:
– RISC (Reduced Instruction Set Computer) architecture
– CISC (Complex Instruction Set Computer) architecture.
RISC ISA Characteristics
• All operations (add, sub, div, …) on data apply to data in registers
• The only operations that affect memory are load and store
operations that move data from memory to a register or to
memory from a register, respectively;
• A small number of memory addressing modes;
• The instruction formats are few in number, with all instructions
typically being the same size.
These simple properties lead to dramatic simplifications in the
implementation of advanced pipelining techniques, which is
why RISC architecture instruction sets were designed this way.
RISC architecture goals are ease of implementation (with emphasis on concepts
such as advanced pipelining) and compatibility with highly optimized compilers.
CISC Architecture
CISC – Complex (and Powerful) Instruction Set Computer
CISC characteristic:
• Powerful (complex) instructions: each instruction can perform more than
one task
• Variable-length instructions
• Powerful addressing modes
Question: What is the dominant CISC architecture today?
Answer: Intel IA-32 architecture
where IA-32 (short for "Intel Architecture, 32-bit") is the 32-bit version of
the x86 instruction set architecture, designed by Intel and first implemented
in the 80386 microprocessor in 1985.
Intel 64 Architecture refers to systems based on IA-32 architecture processors
that have 64-bit architectural extensions (for example, the Intel Core 2
processor family). The equivalent extension, named AMD64, was introduced by
AMD in 2000.
Instruction Set Architecture
(Figure: the instruction set is the layer between software and hardware)
What is Computer Architecture?
Computer Architecture =
Instruction Set Architecture
+
Machine Organization
What is Computer Architecture?
(Figure: layers of abstraction, from application down to layout)
Application
Operating System
Compiler, Firmware
Instruction Set Architecture
Instruction Set Processor, I/O system
Datapath & Control
Digital Design
Circuit Design
Layout
The ISA and computer hardware
The designer of computer hardware (CPU, caches, MM, and I/O) must first
ensure that the hardware correctly executes the machine code specified in
the ISA
Hardware product differences include
Performance (emphasis of this course)
Power dissipation (a huge issue today!)
Cost (die size, package pin count, cooling costs)
Reliability, availability, serviceability
Ability to upgrade
Historical Perspective
1946: The first general-purpose electronic computer, ENIAC, at the
University of Pennsylvania (18,000 vacuum tubes)
Decade of 70’s (Microprocessors)
Programmable Controllers, Single Chip Microprocessors
Personal Computers
Decade of 80’s (RISC Architecture)
Instruction Pipelining, Fast Cache Memories
Compiler Optimizations
Decade of 90’s (Instruction Level Parallelism)
Superscalar Processors, Instruction Level Parallelism (ILP),
Aggressive Code Scheduling, Out of Order Execution
Decade of 2000’s (Multi-core processors)
Thread Level Parallelism (TLP), Low Cost Supercomputing
Technology => Dramatic Change
Processor
2X in performance every 1.5 years; 1000X
performance in last decade (Moore’s Law)
Main Memory
DRAM capacity: 2x / 2 years; 1000X size
in last decade
Cost/bit: improves about 25% per year
Disk
capacity: > 2X in size every 1.5 years
Cost/bit: improves about 60% per year
This increase in transistor count for an
integrated circuit is popularly known as
Moore’s law
which states that transistor capacity doubles
every 18–24 months.
Chapter 1 — Computer Abstractions and Technology — 22
Technology Trends
Electronics technology continues to
evolve
Increased capacity and
performance
Reduced cost
(Figure: growth in DRAM capacity since 1977)
Relative performance per unit cost of technologies used in computers over time:
Year  Technology                   Relative performance/cost
1951  Vacuum tube                  1
1965  Transistor                   35
1975  Integrated circuit (IC)      900
1995  Very large scale IC (VLSI)   2,400,000
      (hundreds to millions of transistors)
2005  Ultra large scale IC         6,200,000,000
      (millions of transistors on a single silicon chip)
§1.7 Real Stuff: The AMD Opteron X4
Manufacturing ICs
Yield: proportion of working dies (chips) per wafer
AMD Opteron X2 Wafer
X2: 300mm wafer, 117 chips, 90nm technology
X4: 45nm(transistor size) technology
Performance
When we say one computer has better
performance than another, what do we mean?
If you were running a program on two different
desktop computers, you’d say that the faster one
is the desktop computer that gets the job done
first.
If you were running a datacenter that had
several servers running jobs submitted by many
users, you’d say that the faster computer was
the one that completed the most jobs during a
day.
Response Time and Throughput
As an individual computer user, you are interested in reducing
response time, which is defined as the
time between the start and completion of a task; it is also
referred to as execution time
Datacenter managers are often interested in increasing throughput
or bandwidth—
It is the total amount of work done in a given time.
e.g., tasks/transactions/… per hour
How are response time and throughput affected by
Replacing the processor with a faster version?
Adding more computers?
We’ll focus on response time for now…
Relative Performance
Performance = 1 / Execution Time
We say "machine X is n times faster than machine Y" if
Performance_X / Performance_Y = Execution Time_Y / Execution Time_X = n
• Also, if we have two computers X and Y and the performance of X is greater
than the performance of Y, then the execution time on X is less than the
execution time on Y.
Example: time taken to run a program
10s on A, 15s on B
Execution TimeB / Execution TimeA
= 15s / 10s = 1.5
So A is 1.5 times faster than B
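The speedup calculation above can be sketched in a few lines of Python (a hypothetical helper, not part of any course material):

```python
def speedup(exec_time_slow, exec_time_fast):
    """Return n such that the faster machine is n times faster.

    Uses Performance = 1 / Execution Time, so
    Perf_fast / Perf_slow = exec_time_slow / exec_time_fast.
    """
    return exec_time_slow / exec_time_fast

# Program takes 10 s on machine A and 15 s on machine B:
n = speedup(15, 10)
print(n)  # 1.5 -> A is 1.5 times faster than B
```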
Measuring Execution Time
Elapsed time
Total response time, including all aspects
Processing, I/O, OS overhead, idle time
Determines system performance
CPU time
is the time the CPU spends computing for this task and does not
include time spent waiting for I/O or running other programs.
CPU time = user CPU time +system CPU time
Where user CPU time = the CPU time spent in the program
And system CPU time = is the CPU time spent in the operating
system performing tasks on behalf of the program
The execution time used to measure performance is the user CPU time
CPU Clocking
Operation of digital hardware governed by a
constant-rate clock
(Figure: clock signal; within each clock period the hardware performs data
transfer and computation, then updates state)
Clock period: duration of a clock cycle
e.g., 250 ps = 0.25 ns = 250×10⁻¹² s
Clock frequency (rate): cycles per second
e.g., 4.0 GHz = 4000 MHz = 4.0×10⁹ Hz
Note: Clock frequency = 1 / Clock period
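The reciprocal relation between clock period and clock rate can be checked numerically (an illustrative sketch):

```python
# Clock rate = 1 / clock period (and vice versa)
period_s = 250e-12       # 250 ps
rate_hz = 1 / period_s   # ~4.0 GHz
print(rate_hz)
```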
CPU Time
CPU Time = CPU Clock Cycles × Clock Cycle Time
         = CPU Clock Cycles / Clock Rate
Performance improved by
Reducing number of clock cycles (good
algorithm or hardware design)
Increasing clock rate (good technology)
Hardware designer must often trade off clock
rate against cycle count
CPU Time Example
Computer A: 2GHz clock, 10s CPU time
Designing Computer B
Aim for 6s CPU time
Can do faster clock, but causes 1.2 × clock cycles
How fast must Computer B clock be?
Clock Rate_B = Clock Cycles_B / CPU Time_B = (1.2 × Clock Cycles_A) / 6s
Clock Cycles_A = CPU Time_A × Clock Rate_A = 10s × 2GHz = 20×10⁹
Clock Rate_B = (1.2 × 20×10⁹) / 6s = (24×10⁹) / 6s = 4GHz
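The worked example above can be reproduced as a short Python sketch:

```python
# Computer A: 2 GHz clock, 10 s CPU time. Computer B targets 6 s CPU time
# but needs 1.2x as many clock cycles. How fast must B's clock be?
cpu_time_a = 10.0        # seconds
clock_rate_a = 2e9       # 2 GHz
clock_cycles_a = cpu_time_a * clock_rate_a   # 20e9 cycles

clock_cycles_b = 1.2 * clock_cycles_a        # faster clock costs 1.2x cycles
cpu_time_b = 6.0                             # target CPU time
clock_rate_b = clock_cycles_b / cpu_time_b   # ~4e9 -> 4 GHz
print(clock_rate_b)
```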
Instruction Count and CPI
Clock Cycles = Instruction Count × average Cycles per Instruction (CPI)
CPU Time = Instruction Count × average CPI × Clock Cycle Time
         = Instruction Count × average CPI / Clock Rate
Instruction Count for a program
Determined by program, ISA and compiler
Average cycles per instruction
Determined by CPU hardware
If different instructions have different CPI
Average CPI affected by instruction mix
CPI Example
Computer A: Cycle Time = 250ps, CPI = 2.0
Computer B: Cycle Time = 500ps, CPI = 1.2
Same ISA, and for the same program
Which is faster, and by how much?
CPU Time_A = Instruction Count × CPI_A × Cycle Time_A
           = I × 2.0 × 250ps = I × 500ps    → A is faster…
CPU Time_B = Instruction Count × CPI_B × Cycle Time_B
           = I × 1.2 × 500ps = I × 600ps
CPU Time_B / CPU Time_A = (I × 600ps) / (I × 500ps) = 1.2    → …by this much
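A minimal Python sketch of the same comparison; the instruction count I cancels, so only CPI and cycle time matter:

```python
# Same program, same ISA; only CPI and cycle time differ.
cpi_a, cycle_time_a = 2.0, 250e-12   # Computer A
cpi_b, cycle_time_b = 1.2, 500e-12   # Computer B

# CPU Time = I * CPI * cycle time; I cancels in the ratio.
time_per_instr_a = cpi_a * cycle_time_a   # 500 ps per instruction
time_per_instr_b = cpi_b * cycle_time_b   # 600 ps per instruction
ratio = time_per_instr_b / time_per_instr_a   # ~1.2 -> A is 1.2x faster
print(ratio)
```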
CPI in More Detail
If different instruction classes take different
numbers of cycles
Clock Cycles = Σ (i=1..n) CPI_i × Instruction Count_i

average CPI = Clock Cycles / Instruction Count
            = Σ (i=1..n) CPI_i × (Instruction Count_i / Instruction Count)
              where Instruction Count_i / Instruction Count is the relative
              frequency of instruction class i
CPI Example
A compiler designer is trying to decide between two code sequences for a particular
computer. Each code sequence contains instructions in classes A, B, and C. The hardware
designers have supplied the following. What is the average CPI for each code sequence?
Class A B C
CPI for class 1 2 3
IC in sequence 1 2 1 2
IC in sequence 2 4 1 1
Sequence 1: IC = 5
Clock Cycles = 2×1 + 1×2 + 2×3 = 10
Avg. CPI = 10/5 = 2.0
Sequence 2: IC = 6
Clock Cycles = 4×1 + 1×2 + 1×3 = 9
Avg. CPI = 9/6 = 1.5
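The weighted-CPI calculation can be sketched in Python (class names and dictionaries are illustrative, not from the slides):

```python
# CPI per instruction class, from the table above.
cpi = {"A": 1, "B": 2, "C": 3}

def avg_cpi(instruction_counts):
    """Average CPI = total clock cycles / total instruction count."""
    cycles = sum(cpi[cls] * n for cls, n in instruction_counts.items())
    total = sum(instruction_counts.values())
    return cycles / total

seq1 = {"A": 2, "B": 1, "C": 2}
seq2 = {"A": 4, "B": 1, "C": 1}
print(avg_cpi(seq1))  # 2.0
print(avg_cpi(seq2))  # 1.5
```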
Example continued
If the computer’s clock rate for the previous example is 100 MHz, decide which
code sequence is faster.
Solution:
Exe time for code sequence 1 = IC × avg CPI × CT
  = 5 × 2.0 / (100 × 10⁶) = 10 × 10⁻⁸ sec
Exe time for code sequence 2 = IC × avg CPI × CT
  = 6 × 1.5 / (100 × 10⁶) = 9 × 10⁻⁸ sec
Since the exe time for code sequence 2 is less than that for code sequence 1,
code sequence 2 is faster.
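The two execution times work out as follows in a short sketch:

```python
# Execution time = IC * avg CPI / clock rate
clock_rate = 100e6  # 100 MHz

t1 = 5 * 2.0 / clock_rate   # sequence 1: 10e-8 s
t2 = 6 * 1.5 / clock_rate   # sequence 2:  9e-8 s
print(t1, t2)               # sequence 2 is faster
```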
Performance Summary
The BIG Picture
CPU Time = Instructions/Program × Clock cycles/Instruction × Seconds/Clock cycle
Performance depends on
Algorithm: affects IC, possibly avg CPI
Programming language: affects IC, avg CPI
Compiler: affects IC, avg CPI
Instruction set architecture: affects IC, CPI, Tc (clock cycle time)
Power Trends
Both clock rate and power increased rapidly for decades,
and then flattened off recently.
The reason they grew together is that they are correlated,
and the reason for their recent slowing is that we have run
into the practical power limit for cooling commodity
microprocessors.
The dominant technology for integrated circuits is called
CMOS (complementary metal oxide semiconductor).
For CMOS, the primary source of power dissipation
is so-called dynamic power
that is, power that is consumed during switching.
The dynamic power dissipation depends
on the:
capacitive loading of each transistor (called the fanout),
the voltage applied,
and the frequency at which the transistor is
switched (which is a function of the clock rate).
§1.5 The Power Wall
Power Trends
In CMOS IC technology
Power = Capacitive load × Voltage² × Frequency
(over the past decades: voltage dropped 5V → 1V, frequency grew ×1000,
yet power grew only ×30)
Reducing Power
Suppose a new CPU has
85% of capacitive load of old CPU
15% voltage and 15% frequency reduction
P_new   (C_old × 0.85) × (V_old × 0.85)² × (F_old × 0.85)
----- = ------------------------------------------------- = 0.85⁴ ≈ 0.52
P_old              C_old × V_old² × F_old
The power wall
We can’t reduce voltage further
We can’t remove more heat
How else can we improve performance?
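The power ratio above follows directly from the dynamic-power formula; a quick numerical check:

```python
# P = capacitive load * voltage^2 * frequency.
# New CPU: 85% of the load, voltage and frequency each reduced by 15%.
scale_c, scale_v, scale_f = 0.85, 0.85, 0.85
power_ratio = scale_c * scale_v**2 * scale_f   # = 0.85**4
print(round(power_ratio, 2))  # 0.52
```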
Multiprocessors
Multicore microprocessors
More than one processor per chip
Requires explicitly parallel programming
Compare with instruction level parallelism
Hardware executes multiple instructions at once
Hidden from the programmer
Hard to do
Programming for performance
Load balancing
Optimizing communication and synchronization
SPEC CPU Benchmark
Programs used to measure performance
Supposedly typical of actual workload
Standard Performance Evaluation Corp (SPEC)
Develops benchmarks for CPU, I/O, Web, …
§1.8 Fallacies and Pitfalls
Pitfall: Amdahl’s Law
Improving an aspect of a computer and
expecting a proportional improvement in
overall performance
T_improved = T_affected / improvement factor + T_unaffected
Example: multiply accounts for 80s out of 100s total
How much improvement in multiply performance to get 5× overall?
20 = 80/n + 20  →  Can’t be done!
Corollary: make the common case fast
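Amdahl’s Law can be demonstrated with a small sketch; even an enormous speedup of multiply cannot push total time below the 20 s unaffected part:

```python
# Amdahl's Law: T_improved = T_affected / factor + T_unaffected
def improved_time(t_affected, t_unaffected, factor):
    return t_affected / factor + t_unaffected

# Multiply takes 80 s of a 100 s program. A 5x overall speedup would
# need a 20 s total, but the 20 s unaffected part alone already takes
# that long, so no multiply speedup suffices:
print(improved_time(80, 20, 4))     # 40.0 s with a 4x multiply speedup
print(improved_time(80, 20, 1e12))  # still just above 20 s
```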
Pitfall: MIPS as a Performance Metric
MIPS: Millions of Instructions Per Second
Doesn’t account for
Differences in ISAs between computers
Differences in complexity between instructions
MIPS = Instruction count / (Execution time × 10⁶)
     = Instruction count / ((Instruction count × CPI / Clock rate) × 10⁶)
     = Clock rate / (CPI × 10⁶)
Average CPI varies between programs on a given CPU
MIPS Example 1
A compiler designer is trying to decide between two code sequences
for a particular computer. Each code sequence contains instructions
in classes A, B, and C. The hardware designers have supplied the
following; the clock rate of the CPU is 100 MHz. Which code sequence is
faster according to MIPS?
Class A B C
CPI for class 1 2 3
IC in sequence 1 2 1 2
IC in sequence 2 4 1 1
Avg. CPI for sequence 1 = 10/5 = 2.0; Avg. CPI for sequence 2 = 9/6 = 1.5
MIPS1 = 100×10⁶ / (2.0 × 10⁶) = 50
MIPS2 = 100×10⁶ / (1.5 × 10⁶) ≈ 66.7
Since MIPS2 > MIPS1, code sequence 2 is faster according to MIPS
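The MIPS comparison can be reproduced with a short sketch:

```python
# MIPS = clock rate / (CPI * 10**6)
clock_rate = 100e6  # 100 MHz

mips1 = clock_rate / (2.0 * 1e6)   # sequence 1: 50 MIPS
mips2 = clock_rate / (1.5 * 1e6)   # sequence 2: ~66.7 MIPS
print(mips1, mips2)                # sequence 2 is faster by MIPS
```

Note that MIPS agrees with execution time here only because both sequences run on the same clock; in general the two metrics can disagree.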
MIPS Example 2