CS-506 ADVANCED COMPUTER
 SYSTEMS ARCHITECTURE
       Lecture# 02-03
       Gul Munir Ujjan
     Assistant Professor
 CISE Department, NEDUET
           Karachi
  Grading Plan
Assessment Type         Marks   Schedule (Week No.)
Midterm                 20      8
Quizzes                 12      5, 12, 14
Assignments             08      ---
Total Sessional Marks   40
Final Marks             60
TOTAL                   100
   BOOKS
Computer Architecture – A Quantitative Approach
                J.L. Hennessy & D.A. Patterson
                      Morgan Kaufmann
Computer Arithmetic: Algorithms and Hardware Design
                       Behrooz Parhami
Advanced Computer Architecture
                   Kai Hwang & Jotwani
                       McGraw Hill
   Architecture & Organization
Architecture refers to those attributes of a system that are
visible to the programmer
  Instruction set, number of bits used for data
  representation, I/O mechanisms, addressing
  techniques.
  e.g. Is there a multiply instruction?
Organization refers to how those features are implemented
  Control signals, interfaces, memory
  technology.
  e.g. Is there a hardware multiply unit, or is
  multiplication done by repeated addition?
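
The multiply example can be illustrated with a minimal Python sketch. Both function names are hypothetical, and the sketch assumes non-negative integer operands. The two functions offer the same architectural operation (multiply) but model two different organizations.

```python
def multiply_by_repeated_addition(a: int, b: int) -> int:
    """Model an organization with no multiply unit: add `a` to a total `b` times."""
    if b < 0:
        raise ValueError("this sketch assumes a non-negative multiplier")
    total = 0
    for _ in range(b):
        total += a
    return total


def multiply_with_hardware_unit(a: int, b: int) -> int:
    """Model an organization with a dedicated multiplier (Python's * stands in for it)."""
    return a * b


# The programmer sees the same result either way; only the organization differs.
print(multiply_by_repeated_addition(6, 7), multiply_with_hardware_unit(6, 7))
```

A program written against the multiply instruction does not need to know which of the two is used underneath.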
   Architecture & Organization
All members of the Intel x86 family share the same basic
architecture
The IBM System/370 family shares the same basic
architecture
This gives code compatibility
  At least backwards
  Organization differs between different
  versions
Structure & Function
Structure: The way in which the components
are interrelated.
Function: The operation of each individual
component as part of the structure.
Structure of a computer
[Figure: the computer at the top level, the CPU within the computer, and the control unit within the CPU]
   Function
All computer functions are:
  Data processing
  Data storage
  Data movement
  Control
BASIC COMPUTER FUNCTIONS
Function
   Both the structure and
   functioning of a computer
   are, in essence, simple. In
   general terms, there are
   only four basic functions
   that a computer can
   perform:
   Data processing: Data may
   take a wide variety of
   forms, and the range of
   processing requirements is
   broad. However, we shall
   see that there are only a few
   fundamental methods or
   types of data processing.
   Data storage: Even if the
   computer is processing data on
   the fly (i.e., data come in and get
   processed, and the results go out
   immediately), the computer must
   temporarily store at least those
   pieces of data that are being
   worked on at any given moment.
   Thus, there is at least a short-
   term data storage function.
   Equally important, the computer
   performs a long-term data
   storage function. Files of data
   are stored on the computer for
   subsequent retrieval and update.
   Data Movement: The
   computer’s operating
   environment consists of
   devices that serve as either
   sources or destinations of
   data. When data are received
   from or delivered to a device
   that is directly connected to
   the computer, the process is
   known as input–output (I/O),
   and the device is referred to
   as a peripheral. When data
   are moved over longer
   distances, to or from a remote
   device, the process is known
   as data communications.
   Control: Within the computer,
   a control unit manages the
   computer’s resources and
   orchestrates the performance
   of its functional parts in
   response to instructions.
BUS INTERCONNECTION
The bus was the dominant means of computer system
component interconnection for decades.
For general-purpose computers, it has gradually given way to
various point-to-point interconnection structures, which now
dominate computer system design.
A bus is a communication pathway connecting two or more
devices.
    BUS INTERCONNECTION
A key characteristic of a bus is that it is a shared transmission
medium. Multiple devices connect to the bus, and a signal transmitted
by any one device is available for reception by all other devices
attached to the bus.
If two devices transmit during the same time period, their signals will
overlap and become garbled. Thus, only one device at a time can
successfully transmit.
Typically, a bus consists of multiple communication pathways, or
lines. Each line is capable of transmitting signals representing binary
1 and binary 0.
For example, an 8-bit unit of data can be transmitted over eight bus
lines.
A bus that connects major computer components (processor, memory,
I/O) is called a system bus.
   DATA BUS
The data lines provide a path for moving data among
system modules. These lines, collectively, are called the data
bus.
The data bus may consist of 32, 64, 128, or even more
separate lines, the number of lines being referred to as the
width of the data bus. Because each line can carry only one
bit at a time, the number of lines determines how many bits
can be transferred at a time.
The width of the data bus is a key factor in determining
overall system performance. For example, if the data bus is
32 bits wide and each instruction is 64 bits long, then the
processor must access the memory module twice during
each instruction cycle.
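
The arithmetic behind this example can be sketched in Python. The function name is hypothetical, and the sketch assumes that every transfer uses the full bus width, so the count is simply the instruction length divided by the bus width, rounded up.

```python
def memory_accesses_per_instruction(instruction_bits: int, bus_width_bits: int) -> int:
    """Number of bus transfers needed to fetch one instruction (ceiling division)."""
    if instruction_bits <= 0 or bus_width_bits <= 0:
        raise ValueError("sizes must be positive")
    return -(-instruction_bits // bus_width_bits)


# A 64-bit instruction fetched over buses of different widths.
for width in (32, 64, 128):
    print(width, memory_accesses_per_instruction(64, width))
```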
    ADDRESS BUS
The address lines are used to designate the source or destination of the
data on the data bus. For example, if the processor wishes to read a
word (8, 16, or 32 bits) of data from memory, it puts the address of the
desired word on the address lines.
Clearly, the width of the address bus determines the maximum
possible memory capacity of the system.
 Furthermore, the address lines are generally also used to address I/O
ports.
Typically, the higher-order bits are used to select a particular module
on the bus, and the lower-order bits select a memory location or I/O
port within the module.
For example, on an 8-bit address bus, address 01111111 and below
might reference locations in a memory module (module 0) with 128
words of memory, and address 10000000 and above refer to devices
attached to an I/O module (module 1).
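
The 8-bit example above can be sketched in Python. The names `addressable_locations` and `decode` are hypothetical, and the sketch assumes exactly the split described: the highest-order bit selects the module and the lower seven bits select the location or port within it.

```python
ADDRESS_BITS = 8  # width of the address bus in the example above


def addressable_locations(address_bits: int) -> int:
    """Maximum number of distinct addresses an address bus of this width can carry."""
    return 2 ** address_bits


def decode(address: int) -> tuple[int, int]:
    """Split an 8-bit address into (module number, location within the module)."""
    if not 0 <= address < 2 ** ADDRESS_BITS:
        raise ValueError("address does not fit on the bus")
    module = address >> (ADDRESS_BITS - 1)              # highest-order bit
    offset = address & ((1 << (ADDRESS_BITS - 1)) - 1)  # lower seven bits
    return module, offset


print(addressable_locations(ADDRESS_BITS))
print(decode(0b01111111), decode(0b10000000))
```

Address 01111111 decodes to the last word of module 0 (memory), and 10000000 decodes to the first port of module 1 (I/O).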
   CONTROL BUS
The control lines are used to control the access to and the
use of the data and address lines. Because the data and
address lines are shared by all components, there must be a
means of controlling their use.
Control signals transmit both command and timing
information among system modules.
Timing signals indicate the validity of data and address
information.
Command signals specify operations to be performed.
 CONTROL BUS
Typical control lines include:
  Memory write (WR): causes data on the bus
  to be written into the addressed location.
  Memory read (RD): causes data from the
  addressed location to be placed on the bus.
  I/O write (IOW): causes data on the bus to be
  output to the addressed I/O port.
  I/O read (IOR): causes data from the addressed
  I/O port to be placed on the bus.
  Transfer ACK (ACK): indicates that data have been accepted
  from or placed on the bus.
  Bus request (REQ): indicates that a module needs to gain
  control of the bus.
  Bus grant (GNT): indicates that a requesting module has been
  granted control of the bus.
  Interrupt request (INTR): indicates that an interrupt is
  pending.
  Interrupt ACK (INTA): acknowledges that the pending
  interrupt has been recognized.
  Clock (CK): is used to synchronize operations.
  Reset (RST): initializes all modules.
Computer Architecture
  HISTORY OF COMPUTING
The generations
   First – Vacuum Tubes
   Second – Transistors
   Third – Integrated Circuits (IC)
   Later – Advancement in IC Technology
   Microelectronics
Literally - “small electronics”
A computer is made up of gates, memory cells and
interconnections
These can be manufactured on a semiconductor
  e.g. silicon wafer
    Moore’s Law
Increased density of components on chip
Gordon Moore – co-founder of Intel
   Number of transistors on a chip will double every year
Since the 1970s, development has slowed a little
   Number of transistors doubles every 18 months
Cost of a chip has remained almost unchanged
Higher packing density means shorter electrical paths, giving higher
performance
Smaller size gives increased flexibility
Reduced power and cooling requirements
Fewer interconnections increase reliability
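
The two doubling rates quoted above can be compared with a small Python sketch. The function name and the starting count of 1,000 transistors are illustrative assumptions, not figures from the lecture.

```python
def projected_transistors(initial_count: float, years: float,
                          doubling_period_years: float) -> float:
    """Transistor count after `years` if the count doubles every `doubling_period_years`."""
    return initial_count * 2 ** (years / doubling_period_years)


# Growth over three years under each rate, from a hypothetical 1,000 transistors.
print(projected_transistors(1000, 3, 1.0))   # doubling every year
print(projected_transistors(1000, 3, 1.5))   # doubling every 18 months
```

Over the same three years the count doubles three times at the yearly rate but only twice at the 18-month rate.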
Growth in CPU Transistor Count
   Semiconductor Memory
1970: Fairchild Semiconductor International, Inc. produces the
first relatively capacious semiconductor memory
Chip is about the size of a single core
  i.e. 1 bit of magnetic core storage
Holds 256 bits
Non-destructive read
Much faster than core
Capacity approximately doubles each year
   Intel
1971 - 4004
   First microprocessor
   All CPU components on a single chip
   4 bit
Followed in 1972 by 8008
   8 bit
   Both designed for specific applications
1974 - 8080
   Intel’s first general purpose microprocessor
  Speeding it up
Pipelining
On board cache
On board L1 & L2 cache
Branch prediction
Data flow analysis
Speculative execution
   Performance Balance
Processor speed increased
Memory capacity increased
Memory speed lags behind processor speed
Logic and Memory Performance Gap
   Solutions
Increase number of bits retrieved at one time
   Make DRAM “wider” rather than “deeper”
Change DRAM interface
   Cache
Reduce frequency of memory access
   More complex cache and cache on chip
Increase interconnection bandwidth
   High speed buses
   Hierarchy of buses
  THE FIRST GENERATION
The ENIAC (Electronic Numerical Integrator and
Computer), designed and constructed at the
University of Pennsylvania, was the world’s first
general purpose electronic digital computer.
   Weighed 30 tons
   Occupied 1500 square feet of floor space
   Contained more than 18,000 vacuum tubes
   Consumed 140 kilowatts of power when operational
   Was capable of 5000 additions per second
    VON NEUMANN ARCHITECTURE
A fundamental design approach first implemented in the IAS
computer is known as the stored-program concept. This idea is
usually attributed to the mathematician John von Neumann. Alan
Turing developed the idea at about the same time.
The first publication of the idea was in a 1945 proposal by von
Neumann for a new computer, the EDVAC (Electronic Discrete
Variable Computer).
In 1946, von Neumann and his colleagues began the design of a new
stored-program computer, referred to as the IAS computer, at the
Institute for Advanced Study in Princeton. The IAS computer,
although not completed until 1952, is the prototype of all subsequent
general-purpose computers.
COMPUTER COMPONENTS
Von Neumann (IAS) Architecture
   INSTRUCTION FETCH AND EXECUTE
Von Neumann’s IAS machine could fetch instructions stored in
memory and decode them.
Execution of an instruction begins after it is decoded.
INSTRUCTION CYCLE
   INSTRUCTION FETCH CYCLE
At the beginning of each instruction cycle, the processor
fetches an instruction from memory.
In a typical processor, a register called the program counter
(PC) holds the address of the instruction to be fetched next.
Unless told otherwise, the processor always increments the PC after
each instruction fetch so that it will fetch the next instruction in
sequence.
The fetched instruction is loaded into a register in the
processor known as the instruction register (IR).
    INSTRUCTION EXECUTE CYCLE
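The processor interprets the fetched instruction and performs the
required action, which falls into one of four categories:
Processor-memory: Data may be transferred from processor to
memory or from memory to processor.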
Processor-I/O: Data may be transferred to or from a peripheral
device by transferring between the processor and an I/O module.
Data processing: The processor may perform some arithmetic or logic
operation on data.
Control: An instruction may specify that the sequence of execution be
altered. For example, the processor may fetch an instruction from
location 149, which specifies that the next instruction be from location
182. The processor will remember this fact by setting the program
counter to 182. Thus, on the next fetch cycle, the instruction will be
fetched from location 182 rather than 150.
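
The fetch cycle and the control example above can be traced with a minimal Python sketch. The instruction encoding (tuples with the hypothetical opcodes NOP, JUMP and HALT), the function name `run`, and the memory contents are assumptions made for illustration. Only the PC and IR behaviour follows the slides.

```python
def run(memory, pc, max_steps=10):
    """Simulate fetch/execute and return the addresses fetched from, in order."""
    fetched_from = []
    for _ in range(max_steps):
        ir = memory[pc]            # fetch: load the instruction at PC into the IR
        fetched_from.append(pc)
        pc += 1                    # increment PC so the next fetch is in sequence
        opcode, operand = ir       # decode
        if opcode == "JUMP":       # control instruction: alter the sequence
            pc = operand
        elif opcode == "HALT":
            break
        elif opcode != "NOP":
            raise ValueError(f"unknown opcode: {opcode}")
    return fetched_from


# Location 149 holds an instruction saying the next instruction is at 182.
memory = {
    148: ("NOP", None),
    149: ("JUMP", 182),
    150: ("NOP", None),
    182: ("HALT", None),
}
print(run(memory, 148))
```

The trace shows location 182 being fetched directly after 149, and location 150 is never fetched.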
INSTRUCTION SET
Instruction-Set Architecture (ISA)
   Changing Definitions of Computer Architecture
Three Pillars of Computer Architecture: software, instruction set, hardware
No component can be treated in isolation from the others
1950s to 1960s:
  The focus of computer architecture courses
  was computer arithmetic
1970s to mid 1980s:
  The focus of computer architecture courses
  was instruction set design, the portion
  of the computer visible to the programmer and
  compiler writer
1990s to date:
  The focus of computer architecture courses
  is the design of the CPU, memory
  system, I/O system, and multiprocessors based
  on quantitative principles to achieve a good
  price-performance design, i.e., maximum
  performance at minimum price
   Instruction-Set Architecture as an interface
[Figure: the instruction set as the layer between software and hardware]
Our focus today will be the
Instruction Set Architecture (ISA),
which is the interface between
hardware and software.
It plays a vital role in
understanding computer
architecture from any of the
perspectives mentioned above.
The design of hardware and
software cannot begin
without first defining the ISA.
It describes the instruction word
format and identifies the memory
addressing for data manipulation
and control operations.
    What is an interface?
[Figure: several uses above a single interface, with implementations imp 1, imp 2 and imp 3 below it, succeeding one another over time]
A good interface:
  Lasts through many implementations (portability,
  compatibility)
  Is used in many different ways (generality)
  Provides convenient functionality to higher levels
  Permits an efficient implementation at lower levels.
   Taxonomy of Instruction Set
Major advances in computer architecture are
typically associated with landmark instruction set
designs – stack, accumulator, general purpose
register etc.
Basic Differentiator: The type of internal storage of
the operands.
Major Choices of ISA:
    Stack Architecture
    Accumulator Architecture
    General Purpose Register Architecture
      Register – Memory
      Register – Register (Load/Store)
      Memory – Memory (Obsolete)
    Stack Architecture
Both operands are implicitly at the top of the stack (TOS)
Thus, it is also referred to as a zero-address machine
An operand may be either an input (orange shade in the figure)
or a result from the ALU (red shade in the figure)
All operands are implicit (implied or inherent)
The first operand is removed from the stack and the second
operand is replaced by the result.
[Figure: processor containing the TOS and the ALU, with the rest of the stack in memory]
To execute: C=A+B
 The ADD instruction takes its operands implicitly from the
 top of the stack; the operands are placed on the stack
 using PUSH instructions
  PUSH A
  PUSH B
  ADD
  POP C
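
The four-instruction sequence above can be traced with a minimal Python sketch of a stack machine. The function name `run_stack_machine` and the sample values for A and B are assumptions made for illustration. Only PUSH, ADD and POP are modelled.

```python
def run_stack_machine(program, memory):
    """Execute PUSH/ADD/POP instructions and return the final memory contents."""
    stack = []
    for line in program:
        opcode, *operands = line.split()
        if opcode == "PUSH":                 # copy a memory operand onto the stack
            stack.append(memory[operands[0]])
        elif opcode == "ADD":                # zero-address: both operands are implicit
            first = stack.pop()              # the first operand is removed
            stack[-1] = stack[-1] + first    # the second is replaced by the result
        elif opcode == "POP":                # move the top of the stack to memory
            memory[operands[0]] = stack.pop()
        else:
            raise ValueError(f"unknown instruction: {line}")
    return memory


result = run_stack_machine(["PUSH A", "PUSH B", "ADD", "POP C"], {"A": 2, "B": 3, "C": 0})
print(result["C"])
```

Note that ADD names no operands at all, which is why the stack machine is called a zero-address machine.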
   Accumulator Architecture
An accumulator is a special register within the CPU that
serves both as the implicit source of one operand and as
the result destination for arithmetic and logic operations.
Thus, it accumulates or collects data and does not
serve as an address register at any time
A limited number of accumulators, usually only one, is used
[Figure: processor with accumulator and ALU, connected to memory]
The second operand is in memory; thus accumulator-based
machines are also called 1-address machines.
They are useful when memory is expensive or when only a
limited number of addressing modes is to be used.
To execute: C=A+B
 The ADD instruction takes operand A implicitly from the
 accumulator, where it was placed by a LOAD instruction;
 the second operand B is in memory at address B
  LOAD A
  ADD B
  STORE C
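
The same computation on an accumulator machine can be traced with a minimal Python sketch. The function name `run_accumulator_machine` and the sample values for A and B are assumptions made for illustration. Mnemonics are matched case-insensitively.

```python
def run_accumulator_machine(program, memory):
    """Execute LOAD/ADD/STORE instructions with a single implicit accumulator."""
    accumulator = 0
    for line in program:
        opcode, address = line.split()
        opcode = opcode.upper()
        if opcode == "LOAD":       # memory -> accumulator
            accumulator = memory[address]
        elif opcode == "ADD":      # accumulator is the implicit source and destination
            accumulator = accumulator + memory[address]
        elif opcode == "STORE":    # accumulator -> memory
            memory[address] = accumulator
        else:
            raise ValueError(f"unknown instruction: {line}")
    return memory


result = run_accumulator_machine(["LOAD A", "ADD B", "STORE C"], {"A": 2, "B": 3, "C": 0})
print(result["C"])
```

Every instruction names exactly one memory address, which is why the accumulator machine is called a 1-address machine.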
  General Purpose Register Architecture
Many general purpose registers are
available within the CPU, e.g. A, B, C, D.
Generally, these registers do not have
dedicated functions and can be used for a
variety of purposes: address, data and
control
A relatively small number of bits in the
instruction is needed to identify a register.
In addition to the GPRs, there are many
dedicated or special-purpose registers as
well, but many of them are not “visible” to
the programmer
A GPR architecture has explicit operands,
either in registers or in memory; thus there may
exist:
-    Register – Memory Architecture
-    Register – Register (Load/Store) Architecture
-    Memory – Memory Architecture (Obsolete)
   General Purpose Register Architecture
Register – Memory Architecture
One explicit operand is in a register and one in memory,
and the result goes into a register
The operand in memory is accessed directly
[Figure: processor with registers R1, R2, R3 and ALU, connected to memory]
To execute: C=A+B
 The ADD instruction has explicit operand A loaded in a
 register and operand B in memory, and the result goes
 into a register
  LOAD R1, A
  ADD R2, R1, B
  STORE R2, C
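
The register-memory sequence above can be traced with a minimal Python sketch. The function name `run_register_memory_machine`, the operand order (destination first) and the sample values for A and B are assumptions made for illustration.

```python
def run_register_memory_machine(program, memory):
    """Execute LOAD/ADD/STORE where ADD takes one register and one memory operand."""
    registers = {}
    for line in program:
        opcode, rest = line.split(maxsplit=1)
        operands = [part.strip() for part in rest.split(",")]
        opcode = opcode.upper()
        if opcode == "LOAD":        # LOAD Rd, addr
            rd, addr = operands
            registers[rd] = memory[addr]
        elif opcode == "ADD":       # ADD Rd, Rs, addr: second operand read from memory
            rd, rs, addr = operands
            registers[rd] = registers[rs] + memory[addr]
        elif opcode == "STORE":     # STORE Rs, addr
            rs, addr = operands
            memory[addr] = registers[rs]
        else:
            raise ValueError(f"unknown instruction: {line}")
    return registers, memory


program = ["LOAD R1, A", "ADD R2, R1, B", "STORE R2, C"]
registers, memory = run_register_memory_machine(program, {"A": 2, "B": 3, "C": 0})
print(registers["R2"], memory["C"])
```

Here the ADD instruction itself reads operand B directly from memory, so no separate load of B is needed.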
   General Purpose Register Architecture
Register – Register (Load/Store) Architecture
The explicit operands in memory are first loaded
into registers by Load instructions, and results are
transferred back to memory by Store instructions
[Figure: processor with registers R1, R2, R3 and ALU, connected to memory]
To execute: C=A+B
  The ADD instruction has explicit operands A and B
  loaded in registers
  LOAD R1, A
  LOAD R2, B
  ADD R3, R1, R2
  STORE R3, C
  Neither explicit operand is accessed from memory
  directly by the ADD instruction; the Memory –
  Memory Architecture, in which both would be, is
  obsolete
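
The load/store sequence above can be traced with a minimal Python sketch that also records which instructions touch memory. The function name `run_load_store_machine`, the operand order (destination first) and the sample values for A and B are assumptions made for illustration.

```python
def run_load_store_machine(program, memory):
    """Execute LOAD/ADD/STORE where ADD works on registers only.

    Returns the final memory and the list of opcodes that accessed memory.
    """
    registers = {}
    memory_accesses = []
    for line in program:
        opcode, rest = line.split(maxsplit=1)
        operands = [part.strip() for part in rest.split(",")]
        opcode = opcode.upper()
        if opcode == "LOAD":        # LOAD Rd, addr
            rd, addr = operands
            registers[rd] = memory[addr]
            memory_accesses.append(opcode)
        elif opcode == "ADD":       # ADD Rd, Rs1, Rs2: register operands only
            rd, rs1, rs2 = operands
            registers[rd] = registers[rs1] + registers[rs2]
        elif opcode == "STORE":     # STORE Rs, addr
            rs, addr = operands
            memory[addr] = registers[rs]
            memory_accesses.append(opcode)
        else:
            raise ValueError(f"unknown instruction: {line}")
    return memory, memory_accesses


program = ["LOAD R1, A", "LOAD R2, B", "ADD R3, R1, R2", "STORE R3, C"]
memory, accesses = run_load_store_machine(program, {"A": 2, "B": 3, "C": 0})
print(memory["C"], accesses)
```

Only the LOAD and STORE instructions appear in the access list; the ADD works entirely on registers.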