ELT3047 Computer Architecture
Lecture 1: Introduction
Hoang Gia Hung
Faculty of Electronics and Telecommunications
University of Engineering and Technology, VNU Hanoi
Curriculum overview
❑ VNU-UET’s computer engineering curriculum
➢ Related courses in the curriculum map:
✓ INT1008 Intro. to Prog., INT2210 DS & Algo., INT2290 C Prog.
✓ INT2241 Princ. of OS, INT3217 Sys. Prog., INT3402 Compilers
✓ ELT3047E Computer Architecture, ELT3048 Digital Design & Comp. Org.
✓ ELT2041 Digital Elec., ELT2040 Analog Elec., ELT2032 Semiconductor, ELT2030 Elec. Eng., EPN1095 Physics
Course information
❑ Lecturers
➢ Hoàng Gia Hưng, Dept. Elect. Comp. Eng. (R702-E3, 144 Xuan Thuy)
➢ Dương Minh Ngọc, Dept. Elect. Comp. Eng. (R702-E3, 144 Xuan Thuy)
➢ Appointment-based consultation
❑ Pre-requisites: INT1008/2290, ELT2041.
❑ Text book: David Patterson and John Hennessy, “Computer
Organization & Design: The Hardware/Software Interface” (5th
Ed.), Morgan Kaufmann Publishers, 2014.
❑ Grading:
➢ Quizzes/essays: 20%
➢ Midterm: 20%
➢ Final: 60%
❑ Some ground rules:
➢ Respect
➢ Honesty
➢ Proactivity
Old-time computers
❑ Mechanical computers
➢ Schickard (1623), Pascal (1642)
➢ Babbage (1823 – Difference Engine, 1834 – Analytical Engine)
❑ The Electronic Era (1946-1974)
➢ ENIAC (1943-1946)
➢ EDVAC (1944-1952) & the Von Neumann computer (1945)
➢ Mark I-IV (1944-1952): Harvard architecture
❑ Modern computers
➢ The PC Era (1975-2009)
➢ The Post-PC Era (2010-present)
Device trend
Modern computers
❑ Personal Mobile Device (PMD)
✓ e.g. smart phones, tablet computers
✓ Emphasis: energy efficiency & real-time
❑ Desktop Computing
✓ Emphasis on price-performance
❑ Servers
✓ Emphasis: availability, scalability, throughput
❑ Warehouse Scale Computers
✓ Used for “Software as a Service (SaaS)”
✓ Emphasis: availability, price-performance
❑ IoT/Embedded Computers
✓ Emphasis: price
What exactly is this course about?
❑ Course contents: you’ll learn what’s under the hood of a
modern computer
➢ How programs are translated into the machine language
✓ And how the hardware executes them
➢ The hardware/software interface
✓ How does software instruct the hardware to perform needed functions?
➢ What determines program performance
✓ And can a programmer improve the performance?
➢ How hardware designers improve performance
Below your program
❑ Application software
✓ Written in high-level language
❑ System software
✓ Compiler: translates HLL code to machine code
✓ Operating System: service code
▪ Handling input/output
▪ Managing memory and storage
▪ Scheduling tasks & sharing resources
❑ Hardware
✓ Electronic components organized in
accordance with a certain design
✓ Examples of principal components are:
Processor, memory, I/O controllers
Levels of Program Code
❑ High-level language
➢ Level of abstraction closer to problem
domain
➢ Provides for productivity and
portability
❑ Assembly language
➢ Human-readable format of
instructions
❑ Machine language
➢ Computer-readable format
➢ Binary digits (bits)
➢ Encoded instructions and data
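❖ To make the three levels concrete, here is a small sketch (not from the textbook): a one-line C function together with one plausible MIPS translation and its 32-bit machine encodings, shown as comments for illustration.
```c
/* Sketch: the same computation at three levels of code. The MIPS assembly and
 * machine encodings in the comments are one plausible translation, shown only
 * for illustration. */
#include <stdio.h>

int sum(int a, int b) {        /* high-level language (C) */
    return a + b;
}
/* Assembly language (MIPS):     Machine language (bits / hex):
 *   add $v0, $a0, $a1             00000000100001010001000000100000  (0x00851020)
 *   jr  $ra                       00000011111000000000000000001000  (0x03E00008)
 */
int main(void) {
    printf("%d\n", sum(2, 3));   /* 5 */
    return 0;
}
```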
The Five Classic Components of a (Von Neumann) Computer
❑ A central arithmetical unit capable of performing the elementary operations of arithmetic (Datapath)
❑ A central control unit capable of logical control of the device, i.e. proper sequencing of its operations (Control)
❑ A main memory, which stores both data and instructions (Memory)
➢ Stored-program concept
❑ Input units to transfer information from the outside recording medium (R) into its specific parts C (CA+CC) and M (Input)
❑ Output units to transfer information from C (CA+CC) and M to R (Output)
❖ The major components are interconnected through the system bus
How is a stored program executed?
❑ A program written in a HLL is a series of instructions, which are turned into binary numbers, just like data, and stored in memory.
➢ c.f. Harvard architecture
❑ To run or execute the stored program,
the processor fetches the instructions
from memory sequentially.
➢ The fetched instructions are then
decoded and executed by the digital
hardware.
➢ Large, complex program execution is a
series of memory reads and instruction
executions.
➢ Operation of the hardware is governed by a clock with a fixed clock period
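❖ A minimal sketch of the fetch–decode–execute cycle on a toy stored-program machine (the 2-byte instruction encoding is invented for illustration, not a real ISA):
```c
/* Sketch: a toy stored-program machine. Instructions and data share one
 * memory; the processor fetches, decodes and executes instructions in
 * sequence. The 3-instruction encoding here is invented for illustration. */
#include <stdio.h>
#include <stdint.h>

enum { HALT = 0, LOADI = 1, ADD = 2 };   /* opcodes of the toy ISA */

int main(void) {
    /* memory holds the program (encoded as numbers), just like data */
    /* format: opcode, operand                                       */
    uint8_t mem[] = { LOADI, 5,    /* acc = 5        */
                      ADD,   7,    /* acc = acc + 7  */
                      HALT,  0 };

    int pc = 0, acc = 0, running = 1;
    while (running) {
        uint8_t opcode  = mem[pc];        /* fetch                          */
        uint8_t operand = mem[pc + 1];
        pc += 2;                          /* next instruction (sequential)  */
        switch (opcode) {                 /* decode & execute               */
            case LOADI: acc = operand;  break;
            case ADD:   acc += operand; break;
            case HALT:  running = 0;    break;
        }
    }
    printf("acc = %d\n", acc);            /* 12 */
    return 0;
}
```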
Computer Architecture
❑ Objective: design a computer to meet
functional requirements as well as price,
power, and performance goals.
➢ inspired by the target market (PC/embedded/…)
➢ Use abstract layer representation to hide lower-
level detail → reduce design complexity
❑ Computer engineers are mainly concerned with:
➢ Instruction set architecture (ISA): what the
computer does (programmer’s view).
➢ Organization: how the ISA is implemented
(physical view of the computer)
➢ Hardware: implementation of the ISA on
specific logic design & packaging technology.
➢ In this course, we’ll focus on ISA & organization only.
Defining computer performance
❑ Response time
➢ Time between start and completion of a task (on a machine X), as observed by the end user: Performance_X = 1 / Execution time_X
❑ Throughput
➢ Total work done per unit time e.g., tasks/transactions/… per hour
❑ Response time vs. Throughput
➢ Decreasing execution time improves throughput: less time to run a task ⇒
more tasks can be executed
➢ Increasing throughput can also improve response time: even if the execution time of individual sequential tasks is unchanged, more tasks can be executed in parallel ⇒ less waiting time in the scheduling queue.
❑ In this course, we will primarily focus on response time
➢ Performance is relative: engineers want the future generation of computers (X) they are designing to be n times faster than the current generation (Y):
Performance_X / Performance_Y = Execution time_Y / Execution time_X = n
Measuring Execution Time
❑ Elapsed time = Total response time, including all aspects
➢ Processing, I/O, OS overhead, idle time
➢ Useful, but often not very good for designers (why?)
❑ Our focus: CPU time = time the CPU spent processing a job
➢ Does NOT count I/O time, other jobs’ shares
➢ Can be measured in seconds or number of CPU clock cycles
[Clock signal: each clock cycle (clock period) comprises data transfer & computation followed by a state update]
CPU Time = CPU Clock Cycles × Clock Cycle Time = CPU Clock Cycles / Clock Rate
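❖ A small sketch of the distinction above, assuming a standard C environment: clock() measures the CPU time of this process only, while the wall-clock difference approximates elapsed time (which also includes I/O, OS overhead and other jobs).
```c
/* Sketch (not from the textbook): measuring elapsed time vs. CPU time of a job. */
#include <stdio.h>
#include <time.h>

static double busy_work(long n) {
    double s = 0.0;
    for (long i = 1; i <= n; i++) s += 1.0 / (double)i;  /* keep the CPU busy */
    return s;
}

int main(void) {
    time_t  wall_start = time(NULL);     /* elapsed (wall-clock) time, seconds */
    clock_t cpu_start  = clock();        /* CPU time, in clock ticks           */

    double s = busy_work(200000000L);

    double cpu_s  = (double)(clock() - cpu_start) / CLOCKS_PER_SEC;
    double wall_s = difftime(time(NULL), wall_start);
    printf("result=%f  CPU time=%.2fs  elapsed time=%.0fs\n", s, cpu_s, wall_s);
    return 0;
}
```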
Example 1
❑ Given Computer A: 2GHz clock, 10s CPU time. Task: design
Computer B with following specs:
➢ Aim for 6s CPU time
➢ Can use a faster clock, but it causes 1.2 × the clock cycles
❑ How fast must Computer B clock be?
❑ Solution:
Clock Rate_B = Clock Cycles_B / CPU Time_B = (1.2 × Clock Cycles_A) / 6s
Clock Cycles_A = CPU Time_A × Clock Rate_A = 10s × 2GHz = 20 × 10^9
Clock Rate_B = (1.2 × 20 × 10^9) / 6s = (24 × 10^9) / 6s = 4GHz
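❖ The same calculation, scripted as a quick sketch with the numbers from the example:
```c
/* Sketch: clock rate needed by Computer B in Example 1. */
#include <stdio.h>

int main(void) {
    double clock_rate_a = 2e9;                        /* 2 GHz                          */
    double cpu_time_a   = 10.0;                       /* seconds                        */
    double cycles_a     = cpu_time_a * clock_rate_a;  /* 20e9 cycles                    */
    double cycles_b     = 1.2 * cycles_a;             /* faster clock costs 1.2x cycles */
    double cpu_time_b   = 6.0;                        /* target, seconds                */
    printf("Clock rate B = %.1f GHz\n", cycles_b / cpu_time_b / 1e9);   /* 4.0 GHz      */
    return 0;
}
```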
Instruction Count and CPI
❑ The instructions of a program are fetched & executed sequentially at a constant clock rate.
➢ #clock cycles depends on #instructions in a program (instruction count) &
#cycles required by each instruction (e.g. multiplication > addition)
➢ In general, #clock cycles = IC x CPI (average #cycles per instruction).
CPU time = IC × CPI × Clock period = (IC × CPI) / Clock rate
➢ CPI provides one way of comparing different implementations of the same ISA.
CPI in More Detail
❑ The average number of cycles per instruction
➢ It’s unrealistic to count #clock cycles for every instruction in a program.
❑ In practice, CPI depends on a wide variety of design details
➢ HW factors: the memory system and the processor structure;
➢ SW factors: the mix of instruction types executed in an application
❑ Each instruction class (e.g. ALU, MEM, …) has a different CPI
➢ If a program has n different classes of instructions with corresponding CPI_i and instruction count IC_i, then Clock cycles = Σ_{i=1..n} (CPI_i × IC_i).
➢ The (weighted average) CPI of the program is
CPI = Clock cycles / IC = Σ_{i=1..n} CPI_i × (IC_i / IC), where IC_i / IC is the relative frequency of class i.
Example 2
❑ Alternative compiled code sequences using instructions in classes A, B,
C. Which code sequence executes the most instructions? Which will be
faster? What is the CPI for each sequence?
Class A B C
CPI for class 1 2 3
IC in sequence 1 2 1 2
IC in sequence 2 4 1 1
❑ Sequence 1: IC = 5; Clock cycles = 2×1 + 1×2 + 2×3 = 10; Avg. CPI = 10/5 = 2.0
❑ Sequence 2: IC = 6; Clock cycles = 4×1 + 1×2 + 1×3 = 9; Avg. CPI = 9/6 = 1.5
➢ Sequence 2 is faster, even though it executes one extra instruction.
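❖ A quick sketch that reproduces the numbers of Example 2 using the weighted-CPI formula from the previous slide:
```c
/* Sketch: clock cycles and average CPI for the two code sequences in Example 2. */
#include <stdio.h>

int main(void) {
    int cpi[3]  = {1, 2, 3};             /* CPI of classes A, B, C         */
    int seq1[3] = {2, 1, 2};             /* instruction counts, sequence 1 */
    int seq2[3] = {4, 1, 1};             /* instruction counts, sequence 2 */
    int *seqs[2] = {seq1, seq2};

    for (int s = 0; s < 2; s++) {
        int ic = 0, cycles = 0;
        for (int i = 0; i < 3; i++) {
            ic     += seqs[s][i];
            cycles += seqs[s][i] * cpi[i];
        }
        printf("Sequence %d: IC=%d, cycles=%d, avg CPI=%.1f\n",
               s + 1, ic, cycles, (double)cycles / ic);
    }
    return 0;
}
```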
Performance summary
❑ The iron law of CPU performance
CPU time = IC × CPI × Clock period = (IC × CPI) / Clock rate
➢ Cycle time: reciprocal of the CPU frequency, provided by manufacturer
➢ Instruction count: measured by software tools (profiler) or hardware counters
➢ CPI: obtained by simulation or from hardware counters
❑ Performance depends on
➢ Hardware: affects clock rate (limited by the power wall!), CPI (parallelism)
➢ Algorithm: affects IC, possibly CPI
➢ Programming language: affects IC, CPI
➢ Compiler: affects IC, CPI
➢ Instruction set architecture: affects IC, CPI, Clock cycle time
Quiz 1
❑ Suppose we have two implementations of the same ISA for a
given program. Which one is faster and by how much?
Machine   Clock cycle time   CPI
A         250 ps             2.0
B         500 ps             1.2
❑ Solution:
CPU Time_A = Instruction Count × CPI_A × Cycle Time_A = I × 2.0 × 250ps = I × 500ps
CPU Time_B = Instruction Count × CPI_B × Cycle Time_B = I × 1.2 × 500ps = I × 600ps
CPU Time_B / CPU Time_A = (I × 600ps) / (I × 500ps) = 1.2 → A is faster by a factor of 1.2
Quiz 2
❑ Given: instruction mix of a program on a computer
Class_i    Freq_i    CPI_i
ALU        50%       1
Load       20%       5
Store      10%       3
Branch     20%       2
➢ What is the average CPI? What is the % of time used by each instruction class?
➢ How much faster would the machine be if a load took only 2 cycles? What if two ALU instructions could be executed at once?
❑ Solution:
➢ Average CPI = 0.5×1 + 0.2×5 + 0.1×3 + 0.2×2 = 2.2. Time percentages: 23%, 45%, 14%, 18%.
➢ If a load takes 2 cycles: old CPU time / new CPU time = old CPI / new CPI = 2.2 / 1.6 = 1.38
➢ If two ALU instructions could be executed at once: old CPI / new CPI = 2.2 / 1.95 = 1.13
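❖ A sketch reproducing the Quiz 2 results (the dual-ALU case is modelled as an effective ALU CPI of 0.5):
```c
/* Sketch: average CPI and speedups for the instruction mix in Quiz 2. */
#include <stdio.h>

int main(void) {
    double freq[4] = {0.5, 0.2, 0.1, 0.2};   /* ALU, Load, Store, Branch */
    double cpi[4]  = {1, 5, 3, 2};

    double avg = 0.0;
    for (int i = 0; i < 4; i++) avg += freq[i] * cpi[i];
    printf("Average CPI = %.2f\n", avg);                             /* 2.20 */

    /* Load improved to 2 cycles */
    double cpi_load2 = 0.5*1 + 0.2*2 + 0.1*3 + 0.2*2;
    printf("Speedup (load = 2 cycles): %.2f\n", avg / cpi_load2);    /* 1.38 */

    /* Two ALU instructions per cycle -> effective ALU CPI of 0.5 */
    double cpi_dual_alu = 0.5*0.5 + 0.2*5 + 0.1*3 + 0.2*2;
    printf("Speedup (dual ALU): %.2f\n", avg / cpi_dual_alu);        /* 1.13 */
    return 0;
}
```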
A brief review of binary numbers
❖ You’ll have to conduct a detailed review by yourselves after this
session.
Binary representations of integers
❑ Natural numbers: unsigned binary, with bits ordered from the most significant bit (MSB) down to the least significant bit (LSB)
❑ Negative numbers
➢ Sign-magnitude: one bit is used for the sign, the remaining represent the
magnitude of the number → several disadvantages.
➢ Two’s complement: the positive half (from 0 to 2^31 − 1, for 32-bit words) uses the same bit pattern as unsigned binary. The negative half begins with 1000…0000_2, representing −2^31, and ends with 1111…1111_2 = −1.
Binary number conversions
❑ Given an n-bit two’s complement number
x = −x_{n−1}·2^{n−1} + x_{n−2}·2^{n−2} + … + x_1·2^1 + x_0·2^0
➢ The leading bit is the sign bit (0 → +ve, 1 → −ve)
❑ Example: 1111 1111 1111 1111 1111 1111 1111 1100_2 = ?_10
1111 1111 1111 1111 1111 1111 1111 1100_2
= −1×2^31 + 1×2^30 + … + 1×2^2 + 0×2^1 + 0×2^0
= −2,147,483,648 + 2,147,483,644 = −4_10
❑ Example: −5_10 = ?_2 (two’s complement, using 4 bits)
1. Convert the magnitude to binary: 5_10 = 0101
2. Invert the bits: 1010
3. Add 1 to the LSB: 1010 + 1 = 1011_2
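❖ A sketch of the conversion formula above: interpret an n-bit pattern as a two's-complement value by giving the top bit a negative weight (the function name is just for illustration):
```c
#include <stdio.h>
#include <stdint.h>

/* Interpret the low n bits of 'bits' as an n-bit two's-complement number:
 * bit i contributes +2^i, except the top bit, which contributes -2^(n-1). */
long long twos_complement_value(uint32_t bits, int n) {
    long long value = 0;
    for (int i = 0; i < n - 1; i++)
        if (bits & (1u << i)) value += 1LL << i;
    if (bits & (1u << (n - 1))) value -= 1LL << (n - 1);
    return value;
}

int main(void) {
    printf("%lld\n", twos_complement_value(0xFFFFFFFCu, 32));  /* 1111...1100 -> -4 */
    printf("%lld\n", twos_complement_value(0xBu, 4));          /*        1011 -> -5 */
    return 0;
}
```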
Some useful shortcuts
❑ Sign extension
➢ How does computer convert a two’s complement number stored in 8 bit
format to its 16 bit equivalent?
❑ Negation
➢ Is there a quick way to negate a two’s complement binary number?
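❖ As a hint, both shortcuts are sketched below: sign extension replicates the sign bit into the new upper bits, and negation inverts all bits and adds 1 (the example values are chosen for illustration):
```c
/* Sketch: sign extension (8 -> 16 bits) and quick negation in two's complement. */
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint8_t  x8  = 0xFB;                                         /* 1111 1011 = -5 in 8 bits */
    uint16_t x16 = (x8 & 0x80) ? (uint16_t)(0xFF00 | x8) : x8;   /* replicate the sign bit   */
    printf("sign-extended: 0x%04X (%d)\n", (unsigned)x16, (int)(int16_t)x16);  /* 0xFFFB = -5 */

    uint8_t neg = (uint8_t)(~x8 + 1);                            /* invert all bits, add 1   */
    printf("negated:       0x%02X (%d)\n", (unsigned)neg, (int)(int8_t)neg);   /* 0x05 = +5   */
    return 0;
}
```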
Hexadecimal representation
❑ Binary numbers are written in long, tedious strings of 0s
and 1s
➢ Hexadecimal: a higher base that can be easily converted to binary
❑ Easy binary-hexadecimal conversion
Binary representation of fractions
❑ Binary point is implied
❑ Fixed point: the number of integer and fraction bits must be
agreed upon (fixed) beforehand.
➢ Example: What’s the binary representation of 6.7510 using 4 integer bits
and 4 fraction bits?
01101100, implying 0110.1100 = 2^2 + 2^1 + 2^−1 + 2^−2 = 6.75 (a short code sketch follows at the end of this slide)
❑ Floating point: binary point floats to the right of the MSB
➢ Similar to decimal scientific notation: −2340 = −2.34 × 10^3 (normalized, i.e. exactly one non-zero digit appears before the point), or −0.234 × 10^4 (not normalized)
➢ Normalized binary representation: ±1.xxxxxxx_2 × 2^yyyy → significand = ±1.xxxxxxx_2, and fraction = xxxxxxx_2. The exponent is also binary (exponent = yyyy_2), but the base subscript was dropped in the expression above for simplicity.
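❖ Returning to the fixed-point example above (6.75 with 4 integer and 4 fraction bits), a minimal sketch: the stored byte is the value scaled by 2^4, and the binary point is only implied.
```c
/* Sketch: 6.75 in a fixed-point format with 4 integer and 4 fraction bits. */
#include <stdio.h>
#include <stdint.h>

int main(void) {
    double  value = 6.75;
    uint8_t fixed = (uint8_t)(value * 16.0);            /* 6.75 * 2^4 = 108 = 0110 1100 */
    printf("stored bits = 0x%02X\n", (unsigned)fixed);  /* 0x6C                         */
    printf("recovered   = %.2f\n", fixed / 16.0);       /* divide by 2^4: 6.75          */
    return 0;
}
```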
IEEE 754 Floating-Point Format
❑ Format fields: S (1 bit) | Exponent (8 bits single, 11 bits double) | Fraction (23 bits single, 52 bits double)
x = (−1)^S × (1 + Fraction) × 2^(Exponent − Bias)
❑ S: sign bit (0 → non-negative, 1 → negative)
❑ Normalize significand: 1.0 ≤ |significand| < 2.0
➢ Always has a leading pre-binary-point 1 bit, so no need to represent it
explicitly (hidden bit)
➢ Significand is Fraction with the “1.” restored
❑ Exponent = actual exponent + Bias (excess representation)
➢ Ensures exponent is unsigned
➢ Single: Bias = 127; Double: Bias = 1023
❑ Example: What number is represented by the single-precision
float 11000000101000…00?
➢ S = 1, Fraction = 01000…00_2, Exponent = 10000001_2 = 129
➢ x = (−1)^1 × (1 + 0.01_2) × 2^(129 − 127) = (−1) × 1.25 × 2^2 = −5.0
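❖ A sketch that decodes the example bit pattern (0xC0A00000) field by field and cross-checks it by reinterpreting the same bits as a float; it assumes the platform's float is IEEE 754 single precision (link with -lm on some toolchains because of ldexp).
```c
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <math.h>

int main(void) {
    uint32_t bits = 0xC0A00000u;                 /* 1 10000001 0100...0   */
    uint32_t s    = bits >> 31;                  /* sign bit              */
    uint32_t e    = (bits >> 23) & 0xFFu;        /* biased exponent = 129 */
    uint32_t f    = bits & 0x7FFFFFu;            /* 23 fraction bits      */

    double significand = 1.0 + (double)f / (1 << 23);            /* restore hidden 1. */
    double value = (s ? -1.0 : 1.0) * significand * ldexp(1.0, (int)e - 127);
    printf("S=%u Exp=%u Frac=0x%06X  value=%.1f\n",
           (unsigned)s, (unsigned)e, (unsigned)f, value);         /* -5.0 */

    float x;                                     /* cross-check via reinterpretation */
    memcpy(&x, &bits, sizeof x);
    printf("reinterpreted float = %.1f\n", x);
    return 0;
}
```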
IEEE 754 Special Cases
Number Sign Exponent Fraction
0 X 00000000 00000000000000000000000
∞ 0 11111111 00000000000000000000000
-∞ 1 11111111 00000000000000000000000
NaN X 11111111 non-zero
IEEE 754 Ranges
❑ Recall the format: Fraction (23 bits single, 52 bits double), Exponent (8 bits single, 11 bits double), Bias (127 single, 1023 double)
x = (−1)^S × (1 + Fraction) × 2^(Exponent − Bias)
❑ Exponents 000…00 and 111…11 are reserved
❑ Single-precision range:
➢ Smallest value: Exponent = 00000001 (i.e. actual exponent = 1 − 127 = −126), Fraction = 000…00 → significand = 1.0 → the smallest values are ±1.0 × 2^−126 ≈ ±1.2 × 10^−38.
➢ Largest value: Exponent = 11111110 (i.e. actual exponent = 254 − 127 = +127), Fraction = 111…11 → significand ≈ 2.0 → the largest values are ±2.0 × 2^+127 ≈ ±3.4 × 10^+38.
❑ Double-precision range:
➢ Smallest value: Exponent = 00000000001 (i.e. actual exponent = 1 − 1023 = −1022), Fraction = 000…00 → significand = 1.0 → the smallest values are ±1.0 × 2^−1022 ≈ ±2.2 × 10^−308.
➢ Largest value: Exponent = 11111111110 (i.e. actual exponent = 2046 − 1023 = +1023), Fraction = 111…11 → significand ≈ 2.0 → the largest values are ±2.0 × 2^+1023 ≈ ±1.8 × 10^+308.
Example
❑ Convert –0.8125 to binary in single and double precision.
❑ Solution:
➢ Fraction bits can be obtained by repeated multiplication by 2 (stop when the fractional part is 0):
▪ 0.8125 × 2 = 1.625 → 1
▪ 0.625 × 2 = 1.25 → 1
▪ 0.25 × 2 = 0.5 → 0
▪ 0.5 × 2 = 1.0 → 1
➢ So 0.8125 = (0.1101)_2 = (1.101)_2 × 2^−1 (normalized)
➢ Exponent = –1 + Bias = 126 (single precision) and 1022 (double)
➢ Single precision: 1 | 01111110 | 10100000000000000000000 (= 10111111010100000000000000000000)
➢ Double precision: 1 | 01111111110 | 101000…000 (fraction = 101 followed by 49 zeros)
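❖ The bit patterns can be checked programmatically; a sketch that reinterprets −0.8125 as raw bits (assuming float/double are IEEE 754 single/double precision):
```c
/* Sketch: print the IEEE 754 bit patterns of -0.8125 in single and double
 * precision by reinterpreting the bytes. */
#include <stdio.h>
#include <string.h>
#include <stdint.h>

int main(void) {
    float  f = -0.8125f;
    double d = -0.8125;
    uint32_t fb;
    uint64_t db;
    memcpy(&fb, &f, sizeof fb);
    memcpy(&db, &d, sizeof db);

    printf("single: 0x%08X\n", fb);                          /* expected 0xBF500000         */
    printf("double: 0x%016llX\n", (unsigned long long)db);   /* expected 0xBFEA000000000000 */
    return 0;
}
```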
Underflow and denormal numbers
❑ Numbers with magnitude below 2^−126 are too small to be represented (underflow) → how do we extend the range towards 0?
❑ Denormalized: change the behavior of IEEE 754 numbers
when Exponent is 0 and Fraction ≠ 0
➢ The implicit “1.” before the fraction now becomes “0.” (not normalized)
➢ Ends up losing precision at small numbers but provides gradual underflow to 0
❑ Value of a denormalized number (S, 0, F)
➢ Single precision: (−1)^S × (0.F)_2 × 2^−126
➢ Double precision: (−1)^S × (0.F)_2 × 2^−1022
[Number line: −∞ | negative overflow (beyond −2^128) | normalized (−ve) | negative underflow: denormals between −2^−126 and 0 | 0 | denormals | normalized (+ve) | positive overflow (beyond 2^128) | +∞]
Summary
❑ Course overview
➢ You’ll learn the HW/SW interface in a modern (=Von Neumann) computer
❑ Von Neumann model
➢ Instructions are stored in a linear memory → can be modified like data.
➢ Instructions are fetched and executed in CPU sequentially.
❑ Computer architecture
➢ Represented in abstract layers to manage complexity
➢ ISA = what the computer does; Organization = how the ISA is implemented;
Realization = implementation of the ISA on specific integrated circuits.
❑ Computer performance
➢ Is the reciprocal of CPU time
➢ Also follows the classical CPU performance equation
❑ Next week: ISA design.