
Chapter 1

Computer Abstractions and Technology
Introduction
 Types of computers:
 Desktop: general purpose, runs a wide variety of software
 Subject to a cost/performance tradeoff
 Servers: network based
 High capacity, performance, and reliability
 Range from small servers to building-sized installations
 Embedded devices: hidden as components of larger systems
 Stringent power/performance/cost constraints
 Different uses: automobiles, graphics, finance, genomics…
 Different manufacturers: Intel, Apple, IBM, Microsoft, Sun…
 Different underlying technologies and different costs!
Chapter Outline
 How are programs in a high-level language (C or Java) translated into the
hardware's language, and how does the hardware execute them?
 What is the interface between S/W and H/W, and how does S/W instruct
the H/W to perform needed functions?
 What determines the performance of a program, and how can a
programmer improve performance?
 What techniques can hardware designers use to improve performance?
Why learn this stuff?
 You want to build software people use (they need performance)
 You need to make a purchasing decision or offer "expert" advice
 Both hardware and software affect performance:
 Algorithm determines the number of source-level statements and I/O
operations executed
 Language/compiler/architecture determine the number of machine
instructions for each source statement (Chapters 2 and 3)
 Processor/memory determine how fast instructions are executed
(Chapters 5, 6, and 7)
 I/O system (hardware and OS) determines how fast I/O is performed
What is a computer?
 5 components:
 Input (mouse, keyboard)
 Output (display, printer)
 Memory (disk drives, DRAM, SRAM, CD)
 The processor (datapath and control): our primary focus
 Implemented using millions of transistors
 Impossible to understand by looking at each transistor
(Figures: inside the processor chip; PC board)
Below Your Program

• Application software
– Written in a high-level language
– Hardware in a computer can only execute extremely
simple low-level instructions
– Going from a complex application to these simple
instructions involves several layers of software
that interpret or translate high-level operations
into simple computer instructions

• System software
– Compiler: translates HLL code to machine code
– Operating System: service code
• Handling input/output
• Managing memory and storage
• Scheduling tasks & sharing resources
• Hardware
Levels of Program Code
 High-level language
 Level of abstraction closer
to problem domain
 Provides for productivity
and portability
 Assembly language
 Textual representation of
instructions
 Hardware representation
 Binary digits (bits)
 Encoded instructions and
data
The role of the compiler
 The compiler translates a HLL program into the machine language of the
given Instruction Set Architecture (ISA)
 Compilers allow software developers to work at the HLL level without
worrying about low-level details of the underlying machine
 The compiler writer’s first responsibility is to ensure that the machine
language program
 Exactly matches the functionality of the HLL program
 Exactly conforms to the ISA specification
 Compiler product differences include
 Speed of code execution on the hardware
 Code density (reduces memory requirements)
 Compilation speed (how long from HLL to machine code)
 Debugging capabilities
Instruction Set Architecture (ISA)
 A very important abstraction
 Interface( boundary) between hardware and low-level software
 Another definition: a protocol that defines how a computing machine appears to
a machine language programmer or compiler
 The ISA describes (1) the memory model, (2) the instruction formats, types, and
modes, and (3) the operand registers, types, and data addressing. Instruction types
include arithmetic, logical, data transfer, and flow control. Instruction modes
include kernel and user instructions.
 Advantage: different implementations of the same architecture are possible
 Modern instruction set architectures:
IA-32 (x86 architecture), PowerPC, MIPS, SPARC, ARM, RISC-V, and others
• Two main approaches to ISA design:
– RISC (Reduced Instruction Set Computer) architecture
– CISC (Complex Instruction Set Computer) architecture


RISC ISA Characteristics
• All operations on data (add, sub, div, …) apply to data in registers
• The only operations that affect memory are load and store
operations, which move data from memory to a register or to
memory from a register, respectively
• A small number of memory addressing modes
• The instruction formats are few in number, with all instructions
typically being the same size
These simple properties lead to dramatic simplifications in the
implementation of advanced pipelining techniques, which is
why RISC instruction sets were designed this way.
RISC architecture goals are ease of implementation (with emphasis on concepts
such as advanced pipelining) and compatibility with highly optimized compilers.
CISC Architecture
CISC – Complex (and Powerful) Instruction Set Computer
CISC characteristics:
• Powerful (complex) instructions: each instruction can perform more than
one task
• Variable-length instructions
• Powerful addressing modes

Question: What is the dominant example today?
Answer: The Intel IA-32 architecture,
where IA-32 (short for "Intel Architecture, 32-bit") is the 32-bit version of
the x86 instruction set architecture, designed by Intel and first
implemented in the 80386 microprocessor in 1985.
Intel 64 Architecture refers to systems based on IA-32 architecture processors which
have 64-bit architectural extensions (for example, the Intel Core 2 processor family)
- the 64-bit extension, named AMD64, was introduced by AMD in 2000
Instruction Set Architecture

(Diagram: the instruction set sits between software above and hardware below)
What is Computer Architecture?

Computer Architecture =
Instruction Set Architecture + Machine Organization
What is Computer Architecture?

(Layer diagram, top to bottom:)
Application
Operating System
Compiler / Firmware
Instruction Set Architecture
Instruction Set Processor / I/O system
Datapath & Control
Digital Design
Circuit Design
Layout
The ISA and computer hardware
 The designer of computer hardware (CPU, caches, MM, and I/O) must first
ensure that the hardware correctly executes the machine code specified in
the ISA
 Hardware product differences include
 Performance (emphasis of this course)
 Power dissipation (a huge issue today!)
 Cost (die size, package pin count, cooling costs)
 Reliability, availability, serviceability
 Ability to upgrade
Historical Perspective
 1946: The first general-purpose electronic computer, ENIAC, at the
University of Pennsylvania (18,000 vacuum tubes)
 Decade of 70’s (Microprocessors)
Programmable Controllers, Single Chip Microprocessors
Personal Computers
 Decade of 80’s (RISC Architecture)
Instruction Pipelining, Fast Cache Memories
Compiler Optimizations
 Decade of 90’s (Instruction Level Parallelism)
Superscalar Processors, Instruction Level Parallelism (ILP),
Aggressive Code Scheduling, Out of Order Execution
 Decade of 2000’s (Multi-core processors)
Thread Level Parallelism (TLP), Low Cost Supercomputing
Technology => Dramatic Change
 Processor
 2X in performance every 1.5 years; 1000X
performance in last decade (Moore’s Law)
 Main Memory
 DRAM capacity: 2x / 2 years; 1000X size
in last decade
 Cost/bit: improves about 25% per year
 Disk
 capacity: > 2X in size every 1.5 years
 Cost/bit: improves about 60% per year
 This increase in transistor count for an integrated circuit is popularly
known as Moore's law, which states that transistor capacity doubles
every 18–24 months.

Chapter 1 — Computer Abstractions and Technology — 22


Technology Trends
 Electronics technology continues to
evolve
 Increased capacity and
performance
 Reduced cost
(Figure: growth in DRAM capacity since 1977)

Relative performance per unit cost of technologies used in computers over time:

Year  Technology                   Relative performance/cost
1951  Vacuum tube                  1
1965  Transistor                   35
1975  Integrated circuit (IC)      900
1995  Very large scale IC (VLSI)   2,400,000
      (hundreds of thousands to millions of transistors)
2005  Ultra large scale IC         6,200,000,000
      (millions of transistors on a single silicon chip)
§1.7 Real Stuff: The AMD Opteron X4
Manufacturing ICs

 Yield: proportion of working dies (chips) per wafer
AMD Opteron X2 Wafer
 X2: 300mm wafer, 117 chips, 90nm technology
 X4: 45nm (transistor size) technology
Performance
 When we say one computer has better
performance than another, what do we mean?
 If you were running a program on two different
desktop computers, you’d say that the faster one
is the desktop computer that gets the job done
first.
 If you were running a datacenter that had
several servers running jobs submitted by many
users, you’d say that the faster computer was
the one that completed the most jobs during a
day.
Response Time and Throughput
 As an individual computer user, you are interested in reducing
response time, which is defined as the time between the start and
completion of a task; it is also referred to as execution time
 Datacenter managers are often interested in increasing throughput
or bandwidth: the total amount of work done in a given time
 e.g., tasks/transactions/… per hour
 How are response time and throughput affected by
 Replacing the processor with a faster version?

 Adding more computers?

 We’ll focus on response time for now…

Relative Performance
 Performance = 1 / Execution Time
 We say "machine X is n times faster than machine Y" if

Performance_X / Performance_Y = Execution Time_Y / Execution Time_X = n

 Also, for two computers X and Y, if the performance of X is greater
than the performance of Y, then the execution time of X is less than
the execution time of Y
 Example: time taken to run a program
 10s on A, 15s on B
 Execution TimeB / Execution TimeA
= 15s / 10s = 1.5
 So A is 1.5 times faster than B
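The ratio above is easy to check in code. A minimal sketch (the helper name is hypothetical, not from the slides):

```python
def relative_performance(exec_time_x, exec_time_y):
    """Return n such that machine X is n times faster than machine Y.

    Performance = 1 / execution time, so
    Performance_X / Performance_Y = exec_time_y / exec_time_x.
    """
    return exec_time_y / exec_time_x

# Slide example: the program takes 10 s on A and 15 s on B
n = relative_performance(10.0, 15.0)
print(n)  # 1.5, i.e., A is 1.5 times faster than B
```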



Measuring Execution Time
 Elapsed time
 Total response time, including all aspects:
processing, I/O, OS overhead, idle time
 Determines system performance
 CPU time
is the time the CPU spends computing for the task; it does not
include time spent waiting for I/O or running other programs
 CPU time = user CPU time + system CPU time
where user CPU time = the CPU time spent in the program
and system CPU time = the CPU time spent in the operating
system performing tasks on behalf of the program
 The execution time used to measure performance is the user CPU time



CPU Clocking
 Operation of digital hardware is governed by a
constant-rate clock
(Figure: clock waveform showing the clock period; data transfer and
computation occur within a cycle, followed by a state update)
 Clock period: duration of a clock cycle
 e.g., 250ps = 0.25ns = 250×10^-12 s
 Clock frequency (rate): cycles per second
 e.g., 4.0GHz = 4000MHz = 4.0×10^9 Hz
 Note: Clock frequency = 1 / Clock period
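A quick numeric check of the period/frequency relationship, using the slide's values:

```python
# Clock frequency = 1 / clock period
period_s = 250e-12        # 250 ps clock period
freq_hz = 1.0 / period_s  # cycles per second
print(freq_hz / 1e9)      # ~4.0 (GHz)
```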
CPU Time
CPU Time = CPU Clock Cycles × Clock Cycle Time
         = CPU Clock Cycles / Clock Rate
 Performance improved by
 Reducing number of clock cycles (good
algorithm or hardware design)
 Increasing clock rate (good technology)
 Hardware designer must often trade off clock
rate against cycle count
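The two equivalent forms of the formula can be sketched as follows (helper names are hypothetical):

```python
def cpu_time_from_cycle_time(clock_cycles, clock_cycle_time_s):
    # CPU Time = CPU clock cycles × clock cycle time
    return clock_cycles * clock_cycle_time_s

def cpu_time_from_rate(clock_cycles, clock_rate_hz):
    # CPU Time = CPU clock cycles / clock rate
    return clock_cycles / clock_rate_hz

# The two forms agree: e.g., 20×10^9 cycles at 2 GHz (cycle time 0.5 ns)
print(cpu_time_from_cycle_time(20e9, 0.5e-9))  # ~10 seconds
print(cpu_time_from_rate(20e9, 2e9))           # ~10 seconds
```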



CPU Time Example
 Computer A: 2GHz clock, 10s CPU time
 Designing Computer B
 Aim for 6s CPU time
 Can do faster clock, but causes 1.2 × clock cycles
 How fast must Computer B clock be?
Clock Rate_B = Clock Cycles_B / CPU Time_B = 1.2 × Clock Cycles_A / 6s
Clock Cycles_A = CPU Time_A × Clock Rate_A = 10s × 2GHz = 20×10^9
Clock Rate_B = (1.2 × 20×10^9) / 6s = (24×10^9) / 6s = 4GHz
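The derivation above can be checked numerically, a sketch using the slide's numbers:

```python
# Computer A: 2 GHz clock, 10 s CPU time
cycles_a = 10 * 2e9              # = 20×10^9 cycles
# Computer B executes 1.2× as many cycles and must finish in 6 s
rate_b_hz = 1.2 * cycles_a / 6   # required clock rate for B
print(rate_b_hz / 1e9)           # ~4.0 (GHz)
```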
Instruction Count and CPI
Clock Cycles = Instruction Count × Average Cycles per Instruction (CPI)
CPU Time = Instruction Count × Average CPI × Clock Cycle Time
         = Instruction Count × Average CPI / Clock Rate
 Instruction Count for a program
 Determined by program, ISA and compiler
 Average cycles per instruction
 Determined by CPU hardware
 If different instructions have different CPI

Average CPI affected by instruction mix



CPI Example
 Computer A: Cycle Time = 250ps, CPI = 2.0
 Computer B: Cycle Time = 500ps, CPI = 1.2
 Same ISA, and for the same program
 Which is faster, and by how much?
CPU Time_A = Instruction Count × CPI_A × Cycle Time_A
           = I × 2.0 × 250ps = I × 500ps    (A is faster…)
CPU Time_B = Instruction Count × CPI_B × Cycle Time_B
           = I × 1.2 × 500ps = I × 600ps
CPU Time_B / CPU Time_A = (I × 600ps) / (I × 500ps) = 1.2    (…by this much)
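A numeric sketch of this comparison; I is an arbitrary instruction count, and the ratio does not depend on it:

```python
def cpu_time(instr_count, avg_cpi, cycle_time_s):
    # CPU Time = instruction count × CPI × clock cycle time
    return instr_count * avg_cpi * cycle_time_s

I = 1e9  # arbitrary; both machines run the same program
t_a = cpu_time(I, 2.0, 250e-12)  # Computer A
t_b = cpu_time(I, 1.2, 500e-12)  # Computer B
print(t_b / t_a)  # ~1.2, so A is about 1.2 times faster
```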
CPI in More Detail
 If different instruction classes take different
numbers of cycles:

Clock Cycles = Σ(i=1 to n) (CPI_i × Instruction Count_i)

 Average CPI:

CPI = Clock Cycles / Instruction Count
    = Σ(i=1 to n) CPI_i × (Instruction Count_i / Instruction Count)

where Instruction Count_i / Instruction Count is the relative frequency
of instruction class i



CPI Example
 A compiler designer is trying to decide between two code sequences for a particular
computer. Each code sequence contains instructions in classes A, B, and C. The hardware
designers have supplied the following. What is the average CPI for each code sequence?

Class             A  B  C
CPI for class     1  2  3
IC in sequence 1  2  1  2
IC in sequence 2  4  1  1

 Sequence 1: IC = 5
 Clock Cycles = 2×1 + 1×2 + 2×3 = 10
 Avg. CPI = 10/5 = 2.0
 Sequence 2: IC = 6
 Clock Cycles = 4×1 + 1×2 + 1×3 = 9
 Avg. CPI = 9/6 = 1.5
Example continued
 If the computer's clock rate for the previous
example is 100 MHz, decide which code
sequence is faster.
 Solution:
 Exe time for code sequence 1 = IC × avg CPI × CT
= 5 × 2.0 / (100×10^6) = 10×10^-8 s
 Exe time for code sequence 2 = IC × avg CPI × CT
= 6 × 1.5 / (100×10^6) = 9×10^-8 s
Since the exe time for code sequence 2 < the exe time for code
sequence 1, code sequence 2 is faster
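The two-sequence calculation (average CPI and execution time) can be sketched together; the dictionary names are hypothetical:

```python
cpi_per_class = {"A": 1, "B": 2, "C": 3}
seq1 = {"A": 2, "B": 1, "C": 2}  # instruction counts per class
seq2 = {"A": 4, "B": 1, "C": 1}
clock_rate_hz = 100e6            # 100 MHz

def avg_cpi(seq):
    # Weighted average: total cycles / total instruction count
    cycles = sum(cpi_per_class[c] * n for c, n in seq.items())
    return cycles / sum(seq.values())

def exe_time(seq):
    # Exe time = IC × avg CPI / clock rate
    return sum(seq.values()) * avg_cpi(seq) / clock_rate_hz

print(avg_cpi(seq1), exe_time(seq1))  # 2.0, ~1.0e-07 s
print(avg_cpi(seq2), exe_time(seq2))  # 1.5, ~9.0e-08 s -> sequence 2 is faster
```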
Performance Summary
The BIG Picture

CPU Time = (Instructions / Program) × (Clock cycles / Instruction) × (Seconds / Clock cycle)

 Performance depends on
 Algorithm: affects IC, possibly avg CPI
 Programming language: affects IC, avg CPI
 Compiler: affects IC, avg CPI
 Instruction set architecture: affects IC, CPI, Tc



Power Trends
 Both clock rate and power increased rapidly for decades,
and then flattened off recently.
 The reason they grew together is that they are correlated,
 and the reason for their recent slowing is that we have run
into the practical power limit for cooling commodity
microprocessors.
 The dominant technology for integrated circuits is called
CMOS (complementary metal oxide semiconductor).
 For CMOS, the primary source of power dissipation is so-called
dynamic power: power consumed during switching

 Dynamic power dissipation depends on:
 the capacitive loading of each transistor (called the fanout),
 the voltage applied,
 and the frequency at which the transistor is
switched (a function of the clock rate)



§1.5 The Power Wall
Power Trends
 In CMOS IC technology:

Power = Capacitive load × Voltage² × Frequency

 Over the last ~30 years: frequency grew about ×1000 and power about ×30,
while supply voltage dropped from 5V to 1V



Reducing Power
 Suppose a new CPU has
 85% of capacitive load of old CPU
 15% voltage and 15% frequency reduction
Pnew Cold 0.85 (Vold 0.85) 2 Fold 0.85 4
 2
0.85 0.52
Pold Cold Vold Fold
 The power wall
 We can’t reduce voltage further
 We can’t remove more heat
 How else can we improve performance?
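The power-wall arithmetic is easy to reproduce; only the scale factors matter, since C, V, and F cancel in the ratio:

```python
# Relative dynamic power: P ∝ capacitive load × voltage² × frequency.
# New CPU: 85% of the load, 15% voltage reduction, 15% frequency reduction.
c_scale, v_scale, f_scale = 0.85, 0.85, 0.85
p_new_over_p_old = c_scale * (v_scale ** 2) * f_scale
print(p_new_over_p_old)  # 0.85**4, about 0.52
```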
Multiprocessors
 Multicore microprocessors
 More than one processor per chip
 Requires explicitly parallel programming
 Compare with instruction level parallelism

Hardware executes multiple instructions at once

Hidden from the programmer
 Hard to do

Programming for performance

Load balancing

Optimizing communication and synchronization



SPEC CPU Benchmark
 Programs used to measure performance

Supposedly typical of actual workload
 Standard Performance Evaluation Corp (SPEC)

Develops benchmarks for CPU, I/O, Web, …



§1.8 Fallacies and Pitfalls
Pitfall: Amdahl’s Law
 Improving an aspect of a computer and
expecting a proportional improvement in
overall performance
T_improved = T_affected / improvement factor + T_unaffected

 Example: multiply accounts for 80s out of a 100s total
 How much improvement in multiply performance is needed to
get 5× overall (i.e., 20s total)?

20 = 80/n + 20  →  Can't be done!
 Corollary: make the common case fast
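Amdahl's law is easy to explore numerically; the sketch below uses the slide's 80 s / 20 s split:

```python
def improved_time(t_affected, t_unaffected, factor):
    # Amdahl's law: T_improved = T_affected / improvement factor + T_unaffected
    return t_affected / factor + t_unaffected

# Multiply accounts for 80 s of a 100 s program; 5x overall would need 20 s total.
for factor in (2, 10, 100, 1e9):
    print(factor, improved_time(80, 20, factor))
# Even with an enormous speedup of the multiply, total time only approaches
# the 20 s of unaffected work, so 5x overall cannot be reached.
```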
Pitfall: MIPS as a Performance Metric
 MIPS: Millions of Instructions Per Second
 Doesn’t account for

Differences in ISAs between computers

Differences in complexity between instructions

MIPS = Instruction count / (Execution time × 10^6)
     = Instruction count / ((Instruction count × CPI / Clock rate) × 10^6)
     = Clock rate / (CPI × 10^6)
 Average CPI varies between programs on a given CPU

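The simplified form of the metric can be sketched as:

```python
def mips(clock_rate_hz, avg_cpi):
    # MIPS = clock rate / (CPI × 10^6); note it ignores instruction count
    # and ISA differences, which is why it can mislead.
    return clock_rate_hz / (avg_cpi * 1e6)

print(mips(100e6, 2.0))  # 50.0
print(mips(100e6, 1.5))  # ~66.7
```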


MIPS example 1
 A compiler designer is trying to decide between two code sequences
for a particular computer. Each code sequence contains instructions
in classes A, B, and C. The hardware designers have supplied the
following. The clock rate of the CPU is 100 MHz. Which code sequence
is faster according to MIPS?

Class             A  B  C
CPI for class     1  2  3
IC in sequence 1  2  1  2
IC in sequence 2  4  1  1

 Avg. CPI for sequence 1 = 10/5 = 2.0; Avg. CPI for sequence 2 = 9/6 = 1.5
 MIPS1 = 100×10^6 / (2.0 × 10^6) = 50
 MIPS2 = 100×10^6 / (1.5 × 10^6) ≈ 66.7
 MIPS2 > MIPS1, so code sequence 2 is faster
MIPS example 2

