
Computer Architecture

A Quantitative Approach, Fifth Edition

Chapter 1
Fundamentals of Quantitative
Design and Analysis

Copyright © 2012, Elsevier Inc. All rights reserved.

Contents
1. Introduction
2. Classes of computers
3. Trends in computer architecture
4. Parallelism
5. Power and energy
6. Chip fabrication costs
7. Benchmarks
8. Principles of computer design
9. Fallacies and pitfalls
10. Evolution of supercomputers
11. Problem solving


Introduction
Computer technology

▪ Performance improvements:
  ▪ Improvements in semiconductor technology
    ▪ Feature size, clock speed
  ▪ Improvements in computer architectures
    ▪ Enabled by HLL compilers, UNIX
    ▪ Led to RISC architectures

▪ Together have enabled:
  ▪ Lightweight computers
  ▪ Productivity-based managed/interpreted programming languages

Introduction
Single processor performance

[Figure: single-processor performance over time, annotated with "RISC" and "Move to multi-processor".]

Introduction
Current trends in architecture

▪ Cannot continue to leverage Instruction-Level Parallelism (ILP)
  ▪ Single-processor performance improvement ended in 2003. Why?

▪ New models for performance:
  ▪ Data-level parallelism (DLP)
  ▪ Thread-level parallelism (TLP)
  ▪ Request-level parallelism (RLP)

▪ The new models for performance require explicit restructuring of the application. No more free lunch for application developers!

Classes of Computers
1. Personal Mobile Device (PMD)
   ▪ e.g. smart phones, tablet computers
   ▪ Emphasis on energy efficiency and real-time performance
2. Desktop Computers
   ▪ Emphasis on price-performance
3. Servers
   ▪ Emphasis on availability, scalability, throughput
4. Clusters / Warehouse Scale Computers (WSCs)
   ▪ Used for “Software as a Service (SaaS)”
   ▪ Emphasis on availability and price-performance
   ▪ Sub-class: Supercomputers, emphasis: floating-point performance and fast internal networks
5. Embedded Computers
   ▪ Emphasis: price

Feature                       | Personal mobile device    | Desktop                             | Server                                        | Cluster / WSC                             | Embedded
Price of system ($)           | 100 - 1,000               | 300 - 2,500                         | 5,000 - 10,000,000                            | 100,000 - 200,000,000                     | 10 - 100,000
Price of processor ($)        | 10 - 100                  | 50 - 500                            | 200 - 2,000                                   | 50 - 250                                  | 0.01 - 100
Critical system design issues | Cost, energy, performance | Cost, energy, performance, graphics | Throughput, availability, scalability, energy | Price-performance, energy proportionality | Price, energy, performance

Classes of Computers
Parallelism
▪ Application parallelism:
  ▪ Data-Level Parallelism (DLP)
  ▪ Task-Level Parallelism (TLP)

▪ Architectural parallelism exploits application parallelism:
  ▪ Instruction-Level Parallelism (ILP) → pipelining, speculative execution.
  ▪ Vector architectures/Graphics Processing Units (GPUs) exploit DLP in SIMD architectures.
  ▪ Thread-Level Parallelism → DLP or TLP of interacting threads.
  ▪ Request-Level Parallelism → parallelism among decoupled tasks.

Classes of Computers
Michael Flynn’s taxonomy
▪ SISD - Single Instruction stream, Single Data stream
▪ SIMD - Single Instruction stream, Multiple Data streams
  ▪ Vector architectures
  ▪ Multimedia extensions
  ▪ Graphics processor units
▪ MIMD - Multiple Instruction streams, Multiple Data streams
  ▪ Tightly-coupled MIMD
  ▪ Loosely-coupled MIMD
▪ MISD - Multiple Instruction streams, Single Data stream
  ▪ No commercial implementation

Defining Computer Architecture
▪ Old view of computer architecture:
  ▪ Instruction Set Architecture (ISA) design
  ▪ Decisions regarding:
    ▪ registers, memory addressing,
    ▪ addressing modes,
    ▪ instruction operands,
    ▪ available operations,
    ▪ control flow instructions,
    ▪ instruction encoding.

▪ Real computer architecture:
  ▪ Specific requirements of the target machine
  ▪ Design to maximize performance within constraints: cost, power, and availability
  ▪ Includes ISA, microarchitecture, hardware

MIPS instruction format
▪ R-instructions → all data values are in registers.
    OPCODE rd,rs,rt        Example: add $s1, $s2, $s3
    rd: destination register
    rs, rt: source registers
▪ I-instructions → operate on an immediate value and a register value. Immediate values may be at most 16 bits long.
    OPCODE rs,rt,Imm
▪ J-instructions → used to transfer control.
    OPCODE label
▪ FR-instructions → similar to R-instructions but operating on floating-point values.
    OPCODE fmt,fs,ft,fd,funct
▪ FI-instructions → similar to I-instructions but operating on floating-point values.
    OPCODE fmt,ft,Imm

Figure 1.6 MIPS64 instruction set architecture formats. All instructions are 32 bits long. The R format is for integer
register-to-register operations, such as DADDU, DSUBU, and so on. The I format is for data transfers, branches, and
immediate instructions, such as LD, SD, BEQZ, and DADDIs. The J format is for jumps, the FR format for floating-point
operations, and the FI format for floating-point branches.
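
As a rough illustration of the R format, the sketch below packs the fields of `add $s1, $s2, $s3` into one 32-bit word. The field widths (6-bit opcode, three 5-bit register fields, 5-bit shift amount, 6-bit function code), the register numbers ($s1 = 17, $s2 = 18, $s3 = 19) and the function code 0x20 for `add` are taken from the standard MIPS32 encoding, not from the slides; the helper name `encode_r_type` is hypothetical.

```python
def encode_r_type(rs: int, rt: int, rd: int, funct: int,
                  shamt: int = 0, opcode: int = 0) -> int:
    """Pack R-format fields into one 32-bit instruction word."""
    fields = (("opcode", opcode, 6), ("rs", rs, 5), ("rt", rt, 5),
              ("rd", rd, 5), ("shamt", shamt, 5), ("funct", funct, 6))
    for name, value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name}={value} does not fit in {bits} bits")
    # Field layout, most significant bits first: opcode | rs | rt | rd | shamt | funct
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


# add $s1, $s2, $s3: rd = $s1 (17), rs = $s2 (18), rt = $s3 (19), funct = 0x20
word = encode_r_type(rs=18, rt=19, rd=17, funct=0x20)
print(f"{word:#010x}")  # 0x02538820
```

Note that the assembly operand order (rd, rs, rt) differs from the order of the fields inside the word (rs, rt, rd).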

Computer implementation

▪ Organization / microarchitecture → high-level aspects of computer design, including:
  ▪ Memory system
  ▪ Memory interconnect
  ▪ CPU
▪ Hardware → detailed logic design and the packaging technology.

Trends in Technology
Technology improvement rate per year
▪ Integrated circuit
  ▪ Transistor density: 35% (Moore’s law)
  ▪ Die size: 10-20%
  ▪ Integration overall: 40-55%
▪ DRAM capacity: 25-40% (slowing)
▪ Flash capacity: 50-60%
  ▪ 15-20X cheaper/bit than DRAM
▪ Magnetic disk: 40%
  ▪ 15-25X cheaper/bit than Flash
  ▪ 300-500X cheaper/bit than DRAM
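
Annual improvement rates compound, so it can help to convert them into doubling times. The sketch below does this for a few of the rates listed above, assuming each rate stays constant over the period; the function names are illustrative.

```python
import math


def doubling_time_years(annual_rate: float) -> float:
    """Years needed to double at a constant compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_rate)


def growth_factor(annual_rate: float, years: float) -> float:
    """Total improvement after `years` of compound growth."""
    return (1 + annual_rate) ** years


rates = [("Transistor density", 0.35), ("Magnetic disk", 0.40), ("Flash (low end)", 0.50)]
for label, rate in rates:
    print(f"{label}: doubles every {doubling_time_years(rate):.2f} years, "
          f"{growth_factor(rate, 10):.1f}x in 10 years")
```

At 35% per year, transistor density doubles roughly every 2.3 years and grows about 20x in a decade.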

Flash memory

▪ Flash memory: electronic non-volatile storage medium that can be electrically erased and reprogrammed.
▪ NAND flash memory
  ▪ May be written and read in blocks (or pages), which are generally much smaller than the entire device.
  ▪ Used in main memory, memory cards, USB flash drives, and solid-state drives for general storage and transfer of data.
▪ NOR flash memory
  ▪ Allows a single machine word (byte) to be written (to an erased location) or read independently.
  ▪ Allows true random access and therefore direct code execution.

DRAM – dynamic random-access memory
▪ Stores each bit in a separate capacitor within an integrated circuit. The capacitor can be either charged or discharged; these two states are taken to represent the two values of a bit, 0 and 1.
▪ Dynamic, as opposed to SRAM (static RAM) → needs to be periodically refreshed because capacitors leak charge.
▪ Structural simplicity: only one transistor and one capacitor are required per bit, compared to four or six transistors in SRAM. This allows DRAM to reach very high densities.
▪ Unlike flash memory, DRAM is volatile: it loses its data quickly when power is removed.

Trends in Technology
Evolution of bandwidth and latency

▪ Bandwidth or throughput → total work done in a given time
  ▪ improvement for processors → 10,000 - 25,000 times
  ▪ improvement for memory and disks → 300 - 1,200 times
▪ Latency or response time → time between start and completion of an operation
  ▪ improvement for processors → 30 - 80 times
  ▪ improvement for memory and disks → 6 - 8 times
▪ Processors have improved at a much faster rate than memory and disks!

Trends in Technology
Bandwidth and latency

Log-log plot of bandwidth and latency milestones


Trends in Technology
Feature size of transistors and wires

▪ Feature size → minimum size of a transistor or wire in the x or y dimension
  ▪ 10.0 microns in 1971
  ▪ 0.032 microns in 2011
▪ Transistor performance scales linearly with feature size.
▪ Wire delay does not improve with feature size!
▪ Integration density scales quadratically.

Moore’s Law

▪ The number of transistors in a dense integrated circuit doubles approximately every two years; a doubling period of 18 months is also often quoted.
▪ Named after Gordon E. Moore, co-founder of Intel Corporation, who described the trend in a 1965 paper.
▪ His prediction has proven to be accurate, and the law is now used in the semiconductor industry to guide long-term planning and to set targets for research and development.

Feature size of transistors and wires (cont’d)

Nature 479, 310–316 (17 November 2011)

Application: questions related to Moore’s law
(a) Based on Moore’s law, the number of transistors on a chip in 2015 should be how many times the number in 2005?
(b) In the 1990s the increase in clock rate mirrored this trend. Had the clock rate continued to climb at the same rate, how fast would the clock rate be in 2015?
(c) At the current rate of increase, what are the clock rates projected to be in 2015?
(d) What has limited the growth of the clock rate, and what are architects doing with the extra transistors to increase performance?
(e) The rate of growth of DRAM capacity has also slowed down. For 20 years it increased by 60%/year. It then dropped to 40%/year and is now in the range 25-40%/year. If this trend continues, what will this rate be in 2020?

Answers
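
The worked answers on this slide did not survive extraction. For question (a) alone, the result depends on which doubling period is assumed, so the sketch below evaluates two commonly quoted periods (24 and 18 months). Both periods and the function name are assumptions for illustration, not the slide's own answer.

```python
def transistor_growth(years: float, doubling_period_years: float) -> float:
    """Growth factor after `years` if the count doubles every `doubling_period_years`."""
    return 2 ** (years / doubling_period_years)


span = 2015 - 2005
print(transistor_growth(span, 2.0))            # 32.0
print(round(transistor_growth(span, 1.5), 1))  # 101.6
```

A 24-month doubling period gives a 32x increase over the decade, while an 18-month period gives roughly 100x, which shows how sensitive the projection is to that one assumption.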

Trends in Power and Energy
Power and energy
▪ Problem: Get power in, get power out
▪ Thermal Design Power (TDP)
  ▪ Characterizes sustained power consumption
  ▪ Used as target for power supply and cooling system
  ▪ Lower than peak power, higher than average power consumption
▪ Clock rate can be reduced dynamically to limit power consumption
▪ Energy per task is often a better measurement

Trends in Power and Energy
Dynamic energy and power
▪ Dynamic energy: energy to switch the transistor state
  ▪ Transistor switches from 0 → 1 or 1 → 0
  ▪ $\text{Energy}_{\text{dynamic}} = \tfrac{1}{2} \times \text{Capacitive load} \times \text{Voltage}^2$
▪ Dynamic power: power to switch the transistor state
  ▪ $\text{Power}_{\text{dynamic}} = \tfrac{1}{2} \times \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency switched}$
▪ Reducing clock rate reduces power, not energy
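
The two formulas can be checked numerically. The sketch below uses made-up values for the capacitive load, voltage and frequency (only the ratios matter) and compares two cases: halving the clock alone, and lowering voltage and frequency together by 15%, as DVFS-style scaling would.

```python
def dynamic_energy(capacitive_load: float, voltage: float) -> float:
    """Energy of one 0->1 or 1->0 transition: 1/2 * C * V^2."""
    return 0.5 * capacitive_load * voltage ** 2


def dynamic_power(capacitive_load: float, voltage: float, frequency: float) -> float:
    """Power when transitions happen `frequency` times per second."""
    return dynamic_energy(capacitive_load, voltage) * frequency


C, V, F = 1e-9, 1.0, 3.0e9  # hypothetical values: 1 nF, 1 V, 3 GHz

# Halving the clock alone: power halves, energy per transition is unchanged.
half_clock_power_ratio = dynamic_power(C, V, F / 2) / dynamic_power(C, V, F)

# Scaling voltage and frequency together by 0.85.
energy_ratio = dynamic_energy(C, 0.85 * V) / dynamic_energy(C, V)
power_ratio = dynamic_power(C, 0.85 * V, 0.85 * F) / dynamic_power(C, V, F)

print(f"half clock: power x{half_clock_power_ratio:.2f}")
print(f"15% lower V and f: energy x{energy_ratio:.2f}, power x{power_ratio:.2f}")
```

Energy falls with the square of the voltage (about 0.72x here) and power with roughly its cube when frequency is scaled along with it (about 0.61x), which is why voltage scaling is so much more effective than slowing the clock alone.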

Trends in Power and Energy
Processor power consumption

▪ Intel 80386 consumed ~2 W
▪ 3.3 GHz Intel Core i7 consumes 130 W
▪ Heat must be dissipated from a 1.5 x 1.5 cm chip
▪ This is the limit of what can be cooled by air
▪ Dramatic increase in power consumption!

Trends in Power and Energy
Techniques for reducing power

▪ Do nothing well
▪ Dynamic Voltage-Frequency Scaling (DVFS)
▪ Low power state for DRAM, disks
▪ Overclocking, turning off cores

Trends in Power and Energy
Power consumption
▪ Static power consumption:
  ▪ I x V (Static current x Voltage)
  ▪ Scales with number of transistors
▪ Power gating → technique used in integrated circuit design to reduce power consumption by shutting off the electric current to blocks of the circuit that are not in use.
▪ Clock gating → technique used in many synchronous circuits for reducing dynamic power dissipation. It saves power by adding more logic to a circuit to disable portions of the circuitry so that the flip-flops in them do not have to switch states. Switching states consumes power. With the clock gated, the switching power consumption goes to zero, and only leakage currents are incurred.

Trends in Cost

▪ Cost driven down by yield learning curve
  ▪ Yield → the ratio of the number of products that can be sold to the number of products that can be manufactured.
  ▪ The estimated typical cost of a modern fabrication plant for 300 mm (12 inch) wafers on a 0.13 µm process is $2-4 billion. The typical number of processing steps for a modern integrated circuit is more than 150. The typical production cycle time is over 6 weeks. Individual wafers cost multiple thousands of dollars. Given such huge investments, consistently high yield is necessary for faster time to profit.
▪ DRAM: price closely tracks cost
▪ Microprocessors: price depends on volume
  ▪ 10% less for each doubling of volume
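
The volume rule of thumb ("10% less for each doubling of volume") compounds per doubling. A minimal sketch, assuming the 10% figure applies uniformly at every doubling; the function name and the volumes are hypothetical.

```python
import math


def relative_cost(volume: float, base_volume: float, drop_per_doubling: float = 0.10) -> float:
    """Cost relative to the cost at `base_volume`, falling by a fixed fraction per doubling."""
    doublings = math.log2(volume / base_volume)
    return (1 - drop_per_doubling) ** doublings


for volume in (1, 2, 4, 8):
    print(f"{volume}x volume: relative cost {relative_cost(volume, 1):.3f}")
```

Three doublings (8x the volume) bring the cost down to 0.9^3, or about 73% of the original.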

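Trends in Cost
Integrated circuit cost
▪ Integrated circuit
  $\text{Cost of integrated circuit} = \dfrac{\text{Cost of die} + \text{Cost of testing die} + \text{Cost of packaging and final test}}{\text{Final test yield}}$
  $\text{Cost of die} = \dfrac{\text{Cost of wafer}}{\text{Dies per wafer} \times \text{Die yield}}$
  $\text{Dies per wafer} = \dfrac{\pi \times (\text{Wafer diameter}/2)^2}{\text{Die area}} - \dfrac{\pi \times \text{Wafer diameter}}{\sqrt{2 \times \text{Die area}}}$
▪ Bose-Einstein formula:
  $\text{Die yield} = \text{Wafer yield} \times \dfrac{1}{(1 + \text{Defects per unit area} \times \text{Die area})^N}$
▪ Defects per unit area = 0.016-0.057 defects per square cm (2010)
▪ N = process-complexity factor = 11.5-15.5 (40 nm, 2010)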

Intel i7 microprocessor

Left: floor plan of Core i7. Right: floor plan of the second core.

▪ QPI → Quick Path Interconnect

This 300 mm wafer contains 280 full Sandy Bridge dies, each 20.7 by 10.5 mm in a 32 nm process. (Sandy Bridge is Intel’s successor to Nehalem used in the Core i7.) At 216 mm², the formula for dies per wafer estimates 282. (Courtesy Intel.)
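
The 282-die estimate in the caption can be reproduced with the textbook's dies-per-wafer formula: wafer area divided by die area, minus a correction for partial dies lost along the circular edge. The sketch below assumes that formula; the function name is illustrative.

```python
import math


def dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> float:
    """Estimated dies per wafer: whole-wafer term minus an edge-loss term."""
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return wafer_area / die_area_mm2 - edge_loss


estimate = dies_per_wafer(300, 216)
print(round(estimate))  # 282
```

The estimate (about 281.9) is close to the 280 full dies counted on the actual wafer.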

Case study – chip fabrication costs

Chip        | Die size (mm²) | Estimated defect rate (per cm²) | Manufacturing size (nm) | Transistors (millions)
IBM Power 5 | 389            | 0.30                            | 130                     | 276
Sun Niagara | 380            | 0.75                            | 90                      | 279
AMD Opteron | 199            | 0.75                            | 90                      | 233

Problem

▪ a. What is the yield for IBM Power 5?
▪ b. Why does IBM Power 5 have a lower defect rate?
▪ Notes:
  ▪ We assume that the wafer yield is 100%, i.e., no wafers are bad.
  ▪ N is the process-complexity factor. For the 40 nm process it is in the range 11.5-15.5. For the 130 nm process we took N = 4.

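
Question (a) can be sketched with the Bose-Einstein yield model, using the Power 5 row of the case-study table (389 mm², 0.30 defects per cm²) together with the notes above (100% wafer yield, N = 4). This is one way to evaluate it, not necessarily the slide's own worked answer; other yield models give different values.

```python
def die_yield(defects_per_cm2: float, die_area_mm2: float, n: float,
              wafer_yield: float = 1.0) -> float:
    """Bose-Einstein model: wafer_yield / (1 + defect density * die area)^N."""
    die_area_cm2 = die_area_mm2 / 100.0  # defect density is given per cm^2
    return wafer_yield / (1.0 + defects_per_cm2 * die_area_cm2) ** n


power5 = die_yield(defects_per_cm2=0.30, die_area_mm2=389, n=4)
print(f"{power5:.3f}")  # 0.045
```

Under these assumptions only about 4.5% of Power 5 dies would be defect-free, which reflects how strongly the model penalizes large dies.
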
More questions
▪ A new facility uses a fabrication process identical to the one for the Power 5 and produces two chips from 300 mm wafers:
  ▪ Woods: 150 mm²; the profit is $20 per defect-free chip.
  ▪ Markon: 250 mm²; the profit is $25 per defect-free chip.
▪ How much profit can be made for (a) Woods; (b) Markon?
▪ (c) Which chip should be produced at the new facility?
▪ (d) If the demand is 50,000 Woods and 25,000 Markon chips/month and you can fabricate 150 wafers/month, how many wafers should be made for each chip?
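
A sketch of how (a) to (d) might be approached, combining the dies-per-wafer estimate with the Bose-Einstein yield model. It assumes the Power 5 process parameters (0.30 defects per cm², N = 4, 100% wafer yield) carry over to the new facility and that only whole dies count; the helper names are hypothetical.

```python
import math


def dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """Whole dies per wafer: wafer area / die area minus an edge-loss term."""
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return math.floor(wafer_area / die_area_mm2 - edge_loss)


def die_yield(defects_per_cm2: float, die_area_mm2: float, n: float) -> float:
    """Bose-Einstein yield model with 100% wafer yield."""
    return 1.0 / (1.0 + defects_per_cm2 * die_area_mm2 / 100.0) ** n


def profit_per_wafer(die_area_mm2: float, profit_per_chip: float) -> float:
    # Assumed from the Power 5 process: 0.30 defects/cm^2, N = 4, 300 mm wafers.
    good_dies = dies_per_wafer(300, die_area_mm2) * die_yield(0.30, die_area_mm2, 4)
    return good_dies * profit_per_chip


woods = profit_per_wafer(150, 20.0)
markon = profit_per_wafer(250, 25.0)
print(f"Woods:  ${woods:,.0f} per wafer")
print(f"Markon: ${markon:,.0f} per wafer")

# Wafers needed to meet the monthly Woods demand on its own.
woods_good_dies = woods / 20.0
print(f"Wafers for 50,000 Woods chips: {50_000 / woods_good_dies:.0f}")
```

Under these assumptions Woods earns more per wafer, and meeting the Woods demand alone would need far more than the 150 wafers available per month, which points toward allocating all wafers to Woods.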

▪ Module reliability
  ▪ Mean time to failure (MTTF)
  ▪ Mean time to repair (MTTR)
  ▪ Mean time between failures (MTBF) = MTTF + MTTR
  ▪ Availability = MTTF / MTBF
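
The availability definition translates directly into code. The MTTF and MTTR values below are hypothetical and chosen only to show the scale of the result.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Availability = MTTF / MTBF, where MTBF = MTTF + MTTR."""
    mtbf_hours = mttf_hours + mttr_hours
    return mttf_hours / mtbf_hours


# Hypothetical module: fails once per 1,000,000 hours on average, 24 hours to repair.
print(f"{availability(1_000_000, 24):.6f}")  # 0.999976
```

Because MTTR is usually tiny compared with MTTF, availability is typically quoted as a count of "nines" rather than as a raw fraction.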

Measuring Performance
▪ Typical performance metrics:
  ▪ Response time
  ▪ Throughput
▪ Speedup of X relative to Y
  ▪ $\text{Execution time}_Y / \text{Execution time}_X$
▪ Execution time
  ▪ Wall clock time: includes all system overheads
  ▪ CPU time: only computation time
▪ Benchmarks
  ▪ Kernels (e.g. matrix multiply)
  ▪ Toy programs (e.g. sorting)
  ▪ Synthetic benchmarks (e.g. Dhrystone)
  ▪ Benchmark suites (e.g. SPEC06fp, TPC-C)
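
The difference between wall-clock time and CPU time, and the speedup ratio, can be illustrated with Python's standard `time` module. In this sketch the `sleep` call stands in for system overhead such as waiting on I/O; the workload and the timings passed to `speedup` are made up.

```python
import time


def speedup(time_y: float, time_x: float) -> float:
    """Speedup of X relative to Y = execution time of Y / execution time of X."""
    return time_y / time_x


wall_start, cpu_start = time.perf_counter(), time.process_time()
total = sum(i * i for i in range(200_000))  # computation: counts toward CPU time
time.sleep(0.05)                            # waiting: counts toward wall-clock time only
wall = time.perf_counter() - wall_start
cpu = time.process_time() - cpu_start

print(f"wall clock: {wall:.3f} s, CPU: {cpu:.3f} s")
print(speedup(time_y=15.0, time_x=10.0))  # 1.5
```

The wall-clock figure includes the 50 ms spent sleeping, while the CPU figure does not, which is why the two metrics can diverge sharply on I/O-bound workloads.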

Evolution of benchmarks over time

▪ Of the 12 SPEC2006 integer programs, 9 are written in C, and the rest in C++.
▪ For the floating-point programs, the split is 6 in Fortran, 4 in C++, 3 in C, and 4 in mixed C and Fortran.

▪ SPEC2006 programs and the evolution of the SPEC benchmarks over time, with integer programs above the line and floating-point programs below the line. The figure shows all 70 of the programs in the 1989, 1992, 1995, 2000, and 2006 releases.
▪ The benchmark descriptions on the left are for SPEC2006 only and do not apply to earlier versions. Programs in the same row from different generations of SPEC are generally not related; for example, fpppp is not a CFD code like bwaves. Gcc is the senior citizen of the group. Only 3 integer programs and 3 floating-point programs survived three or more generations. Note that all the floating-point programs are new for SPEC2006.
▪ Although a few are carried over from generation to generation, the version of the program changes and either the input or the size of the benchmark is often changed to increase its running time and to avoid perturbation in measurement or domination of the execution time by some factor other than CPU time.

Figure 1.19 Power-performance of the three servers in Figure 1.18. Ssj_ops/watt values are on
the left axis, with the three columns associated with it, and watts are on the right axis, with
the three lines associated with it. The horizontal axis shows the target workload, as it varies
from 100% to Active Idle. The Intel-based R710 has the best ssj_ops/watt at each workload
level, and it also consumes the lowest power at each level.

Figure 1.20 Percentage of peak performance for four programs on four multiprocessors scaled
to 64 processors. The Earth Simulator and X1 are vector processors (see Chapter 4 and
Appendix G). Not only did they deliver a higher fraction of peak performance, but they also had
the highest peak performance and the lowest clock rates. Except for the Paratec program, the
Power 4 and Itanium 2 systems delivered between 5% and 10% of their peak. From Oliker et al.
[2004].

Principles
Principles of computer design
▪ Take Advantage of Parallelism
  ▪ e.g. multiple processors, disks, memory banks, pipelining, multiple functional units
▪ Principle of Locality
  ▪ Reuse of data and instructions
▪ Focus on the Common Case
  ▪ Amdahl’s Law
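
Amdahl's Law bounds the overall speedup by the fraction of execution time that an enhancement actually affects: overall speedup = 1 / ((1 - fraction enhanced) + fraction enhanced / speedup enhanced). A minimal sketch with illustrative numbers (an enhancement that is 10x faster but usable 40% of the time):

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when only part of the execution time benefits."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


print(round(amdahl_speedup(0.4, 10), 4))   # 1.5625
print(round(amdahl_speedup(0.4, 1e9), 4))  # 1.6667, the limit 1 / (1 - 0.4)
```

Even an arbitrarily fast enhancement cannot push the overall speedup past 1 / (1 - fraction enhanced), which is the quantitative reason to focus on the common case.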

Principles
The processor performance equation

Principles
Different instruction types have different CPIs
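
The equations for these two slides did not survive extraction. The standard processor performance equation is CPU time = instruction count x CPI x clock cycle time, and with several instruction classes the overall CPI is the frequency-weighted average of the per-class CPIs. The sketch below assumes those standard forms; the instruction mix, clock rate and function names are hypothetical.

```python
def average_cpi(mix: list[tuple[float, float]]) -> float:
    """Weighted CPI from (fraction of instructions, CPI) pairs; fractions should sum to 1."""
    return sum(fraction * cpi for fraction, cpi in mix)


def cpu_time(instruction_count: float, cpi: float, clock_cycle_time_s: float) -> float:
    """CPU time = instruction count x cycles per instruction x clock cycle time."""
    return instruction_count * cpi * clock_cycle_time_s


# Hypothetical mix: 25% floating-point instructions at CPI 4.0, 75% others at CPI 1.33.
mix = [(0.25, 4.0), (0.75, 1.33)]
cpi = average_cpi(mix)
seconds = cpu_time(instruction_count=1e9, cpi=cpi, clock_cycle_time_s=1 / 2e9)  # 2 GHz clock
print(f"CPI = {cpi:.4f}, CPU time = {seconds:.3f} s")
```

A small share of expensive instructions can dominate the average: here a quarter of the instructions contribute half of the cycles.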

Fallacies
▪ Multiprocessors are a silver bullet → to improve performance, replace a high-clock-rate single core with multiple lower-clock-rate, efficient cores. The burden is now on application developers to exploit parallelism.
▪ Increasing performance improves energy efficiency.
▪ Benchmarks remain valid indefinitely → almost 70% of the original kernels in SPEC2000 or earlier were dropped.
▪ Accuracy of reported MTTF → the MTTF of disks as currently reported is almost 140 years!
▪ Peak performance tracks observed performance → the fraction of peak performance achieved by different programs on the same processor varies widely.

Pitfalls

▪ Ignoring Amdahl’s law
  ▪ Optimizing a feature before measuring its usage.
▪ Dependability depends on the weakest link
▪ Fault detection can lower availability
  ▪ Some errors, e.g., an error in the branch predictor, could lower the performance but not the availability.

Supercomputers of the late 1960s - IBM 360/91

▪ Launched in January 1968. Installed at NASA Ames.
▪ Primary memory: up to 6 MB, interleaved 16 ways.
▪ Secondary memory: 300 MB (two IBM 2301 drums and two IBM 2314 disks).
▪ The CPU had five highly autonomous execution units:
  ▪ processor storage,
  ▪ storage bus control,
  ▪ instruction processor,
  ▪ fixed-point processor and
  ▪ floating-point processor.
▪ Only four floating-point registers.
▪ Tomasulo’s algorithm for register renaming, introduced in the 360/91, is used in many modern processors for exploiting Instruction-Level Parallelism (ILP).

Supercomputers of the late 1960s – CDC 7600
▪ Designed by Seymour Cray.
▪ RISC architecture with a 15-bit instruction word containing a six-bit operation code. Only 64 machine codes; no fixed-point arithmetic in the central processor.
▪ Pipelined execution: 10-word instruction stack. All addresses in the stack are fetched without waiting for the instruction field to be processed.
▪ Ten 60-bit read registers and ten 60-bit write registers, each with an address register.
▪ Clock rate 36.4 MHz (27.5 ns clock cycle). Could deliver about 10 MFLOPS on hand-compiled code, with a peak of 36 MFLOPS.
▪ 65 Kword primary memory; up to 512 Kword secondary memory.
▪ Cooled by liquid freon.

Massively parallel systems of the 90s

▪ Touchstone Delta: prototype developed by Intel in 1990
  ▪ Installed at Caltech for the Concurrent Supercomputer Consortium
  ▪ MIMD architecture with hypercube interconnect; wormhole routing.
  ▪ A node: i860 RISC chip, 60 MFLOPS peak, with 8-16 Mbytes of memory.
  ▪ Peak performance: 32 GFLOPS for a configuration of 484 nodes.
  ▪ LINPACK rating = 13.9 GFLOPS; SLALOM benchmark = 5750 patches.
  ▪ Significantly above the Moore curve
▪ The Paragon
  ▪ Production version of the Touchstone Delta
  ▪ Up to 4,000 nodes
  ▪ A light-weight kernel called SUNMOS, developed at Sandia National Laboratories, ran on the Paragon's compute processors