COMPUTER ARCHITECTURE
UNIT 3
                                  Processors
• The main brain, or engine, of the PC is the processor (sometimes called the
  microprocessor) or Central Processing Unit (CPU). The CPU performs the system’s
  calculating and processing.
• It is the most expensive single component in the system, costing up to four times
  as much as the motherboard. The invention is generally credited to Intel, which is
  why PC-compatible systems have traditionally used Intel or Intel-compatible processors.
• The processor acts like the conductor in an orchestra. It reads program instructions
  (commands) from main memory that tell it what it needs to do to accomplish the
  work that the user wants, and then executes them.
• The CPU is functionally divided into the control unit, internal registers and the
  Arithmetic and Logic unit.
                 The CPU and Program Execution
• The computer has to read and obey every program, including the operating system itself,
  one instruction at a time. The basic operation is the fetch-decode-execute cycle. It is the
  sequence whereby each instruction within a program is read into the CPU from program
  memory and then decoded and executed.
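• As an illustration only, the C sketch below models the fetch-decode-execute cycle for a
  hypothetical accumulator machine; the opcodes, the 16-bit instruction format and the tiny
  three-instruction program are assumptions made for the example, not any real instruction set.

    /* A toy fetch-decode-execute loop for a hypothetical accumulator machine.
       The 16-bit instruction format (opcode in the high byte, operand in the
       low byte) and the three-instruction program are illustrative only. */
    #include <stdint.h>
    #include <stdio.h>

    enum { OP_LOAD = 0, OP_ADD = 1, OP_HALT = 2 };    /* hypothetical opcodes */

    int main(void) {
        uint16_t memory[] = {
            (OP_LOAD << 8) | 5,     /* load the constant 5 into the accumulator */
            (OP_ADD  << 8) | 3,     /* add the constant 3                       */
            (OP_HALT << 8)
        };
        uint16_t pc = 0, acc = 0;

        for (;;) {
            uint16_t instr   = memory[pc++];     /* fetch   */
            uint8_t  opcode  = instr >> 8;       /* decode  */
            uint8_t  operand = instr & 0xFF;
            switch (opcode) {                    /* execute */
            case OP_LOAD: acc = operand;  break;
            case OP_ADD:  acc += operand; break;
            case OP_HALT: printf("acc = %d\n", acc); return 0;
            }
        }
    }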
• Throughout the history of computer development, the limiting factor has been dictated by
  the units involved in the fetch-decode-execute cycle: the memory, the bus, or the CPU. This
  affects both the design parameters for computer architecture and the selection of algorithms
  for problem solving. For example, ‘memory-intensive’ methods are chosen when memory is
  fast and cheap; otherwise ‘compute-intensive’ methods are chosen.
• DRAM chips are used in main memory but they are not as fast as the CPU.
• SRAM chips are faster than DRAM but cost more, so they are only used in small, fast
  buffers called memory caches. A memory cache helps reduce the main-memory
  access delay by holding copies of the current instructions and data.
• The processor is responsible for actually executing the instructions that make up
  programs and the operating system. Processors are made up of several building
  blocks: execution units, register files, and control logic.
• The execution units contain the hardware that executes instructions. This includes
  the hardware that fetches and decodes instructions, as well as the arithmetic logic
  units that perform the actual computation.
• Many processors contain separate execution units for integer and floating-point
  computations, because very different hardware is required to handle these two data
  types. Also, modern processors use multiple execution units to execute
  instructions in parallel to improve performance.
                             The Register file
• This is a small storage area for data that the processor is using. Values stored in the
  register file can be accessed more quickly than values stored in the memory system, and
  register files usually support multiple simultaneous accesses. This allows an
  operation to read all of its inputs from the register file at the same time, as in the
  sketch below.
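• A minimal C sketch of the register-file idea, with two read ports and one write port so that
  an operation can read both of its inputs at once; the 32 x 32-bit size and the function names
  are illustrative assumptions, not a specific design.

    /* Sketch of a register file with two read ports and one write port.
       The 32 x 32-bit size and the function names are assumptions. */
    #include <stdint.h>

    #define NUM_REGS 32
    static uint32_t regs[NUM_REGS];

    /* two read ports: both source operands are available in the same access */
    static void regfile_read2(unsigned ra, unsigned rb, uint32_t *va, uint32_t *vb) {
        *va = regs[ra];
        *vb = regs[rb];
    }

    /* one write port: the result of an operation goes back into the file */
    static void regfile_write(unsigned rd, uint32_t value) {
        regs[rd] = value;
    }

    int main(void) {
        uint32_t a, b;
        regfile_write(1, 40);
        regfile_write(2, 2);
        regfile_read2(1, 2, &a, &b);   /* an ADD reads both inputs at once */
        regfile_write(3, a + b);       /* and writes its result back       */
        return 0;
    }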
• The control logic controls the rest of the processor: It determines when
  instructions can be executed and what operations are required to execute each
  instruction.
• The main function of the microprocessor or the CPU is to accept data in the form
  of a program from input devices, process the data, output the result, and transfer
  the result either to memory or to an output device.
• All processors are organized into three major sections:
Arithmetic and Logic Unit (ALU)
Control Unit (CU)
Registers (internal memory)
• The function of the ALU is to perform arithmetic operations such as addition,
  subtraction, division and multiplication, and logical operations such as AND, OR and
  NOT.
• The function of the Control Unit is to control I/O devices, generate control signals
  to the other components of the computer such as the Read and Write signals and
  also perform instruction execution.
• Information is moved from memory into the registers and then passed to the ALU
  for logical and arithmetic operations. It should also be noted that the
  functions of the microprocessor and the CPU are the same.
• If the control unit, the registers and the ALU are all packaged into one integrated
  circuit (IC), it is referred to as a microprocessor; otherwise the unit is called a CPU.
                              Processor Design
• Processor design is typically divided into two subcategories: instruction set
  architecture and processor micro architecture. Instruction set architecture refers to
  the design of the set of operations that the processor executes and includes the
  choice of programming model, the number of registers, and decisions about how data
  is accessed.
• Processor micro architecture describes how instructions are implemented and
  includes factors such as how long it takes to execute instructions, how many
  instructions may be executed at one time, and how processor modules such as the
  register file are designed.
                            Working definition
• Any aspect of the processor that an assembly-language programmer needs to
  know about to write a correct program is part of the instruction set architecture,
  and any aspect that affects only performance, not correctness, is part of the micro
  architecture.
                       Instruction Set Architecture
• When most computer programming was done in assembly language, instruction set
  architecture was considered the most important part of computer architecture, because
  it determined how difficult it was to obtain optimal performance from the system.
• Over the years, instruction set architecture has become less significant, for several
  reasons. First, most programming is now done in high-level languages, so the
  programmer never interacts with the instruction set.
• Second, consumers have come to expect compatibility between different generations
  of a complex system, meaning that they expect programs that ran on their old system to
  run on their new system without changes.
• As a result, the instruction set of a new processor is often required to be the same as the
  instruction set of the company’s previous processor, sometimes with a few additional
  instructions [meaning that most of the design effort for a processor goes into
  improving the micro architecture to increase performance].
                             RISC VS. CISC
CISC-Complex Instruction Set Computers
• Generally require fewer instructions than RISC computers to perform a given
  computation, so a CISC computer will have higher performance than a RISC
  computer that executes instructions at the same rate.
• Programs written for CISC architectures tend to take less space in memory than the
  same programs written for a RISC architecture.
RISC – Reduced Instruction Set Computers
• The simple instruction sets of RISC architectures often allow them to be
  implemented at higher clock rates than CISC architectures, allowing them to
  execute more instructions in the same amount of time.
• RISC architectures are load-store architectures, meaning that only load and store
  instructions may access the memory system.
                                 Example
• In many CISC architectures, arithmetic and other instructions may read their
  inputs from or write their outputs to the memory system, instead of the register file.
 A CISC architecture might allow an ADD operation of the form:
 ADD (R1), (R2), (R3)
 where the parentheses around a register name indicate that the register contains
the address in memory where the operand can be found or the result should be
placed. Thus, the above ADD instruction tells the processor to add the value
contained in the memory location whose address is stored in R2 to the value
contained in the memory location whose address is stored in R3, and store the result
into memory at the address contained in R1.
• A RISC architecture, being a load-store architecture, has to perform the same ADD
  operation (assuming the appropriate memory addresses are present in R1, R2 and R3
  at the start of the instruction sequence) using separate loads and stores.
• It will require the following four instructions (a C equivalent is sketched after the
  sequence):
• LD R4, (R2)
• LD R5, (R3)
• ADD R6, R4, R5
• ST (R1), R6
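• For comparison, the C sketch below expresses what both instruction sequences compute; the
  function name is hypothetical, and the pointer parameters stand in for the registers R1, R2
  and R3 that hold the memory addresses.

    /* What both instruction sequences compute, written in C: R2 and R3 hold the
       addresses of the operands and R1 holds the address of the result. The
       comments mirror the RISC sequence above. */
    #include <stdint.h>

    void memory_to_memory_add(uint32_t *R1, const uint32_t *R2, const uint32_t *R3) {
        uint32_t R4 = *R2;        /* LD  R4, (R2)   */
        uint32_t R5 = *R3;        /* LD  R5, (R3)   */
        uint32_t R6 = R4 + R5;    /* ADD R6, R4, R5 */
        *R1 = R6;                 /* ST  (R1), R6   */
    }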
                           Addressing Modes
• An architecture’s addressing modes are the set of syntaxes and methods that
  instructions use to specify a memory address, either as the target address of a
  memory reference or as the address that a branch will jump to.
• Depending on the architecture, some of the addressing modes may only be
  available to some of the instructions that reference memory.
• Architectures that allow any instruction that references memory to use any
  addressing mode are described as orthogonal, because the choice of addressing
  mode is independent of the choice of instruction.
Register Addressing
Label Addressing
Register plus Immediate Addressing
                       Register Addressing:
• In register addressing, an instruction reads the value out of a
  register and uses that value as the address of the memory reference or
  branch target.
                              Label Addressing
• In label addressing, a branch instruction specifies its destination as a label that is
  placed on an instruction elsewhere in the program. [Most branch instructions do
  not explicitly contain their destination addresses. Instead, the assembler/linker
  translates the label into an offset (which can be either positive or negative) from
  the location of the branch instruction to the location of its target. In effect, the
  branch instruction tells the processor how far away the target instruction is
  located.]
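• A small C sketch of the offset idea described above; the function names are hypothetical and
  simply show that the stored value is the signed distance from the branch to its target.

    /* Sketch of the label-to-offset translation: the assembler/linker stores the
       signed distance from the branch to its target, and the processor adds that
       offset back to the branch address to find the target. Names are illustrative. */
    #include <stdint.h>

    int32_t label_to_offset(uint32_t branch_address, uint32_t target_address) {
        return (int32_t)(target_address - branch_address);   /* may be negative */
    }

    uint32_t branch_target(uint32_t branch_address, int32_t offset) {
        return branch_address + (uint32_t)offset;             /* what the CPU computes */
    }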
              Register plus Immediate Addressing
• In register plus immediate addressing, the value of a register is added to the immediate
  (constant) value specified in the instruction to generate a memory address.
• One problem with all addressing modes that compute their address rather than
  taking it straight from a register is that these addressing modes increase the
  execution time of instructions that use them, since the processor must perform a
  computation before the address can be sent to the memory system.
• In order to provide flexibility in addressing without increasing memory latency,
  some architectures provide post-incrementing addressing modes.
• These addressing modes read their address out of the specified register, send that
  address to the memory system, and then add the specified immediate to the value
  of the register
• The result is then written back to the register file. Because the address is sent
  directly from the register file to the memory system, these instructions execute
  more quickly than register-plus-immediate instructions, but they
  still reduce the number of instructions required to implement a program as compared
  to ISAs that only provide register addressing. (A sketch of both modes appears below.)
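• The C sketch below contrasts register-plus-immediate addressing with a post-incrementing
  mode; the function names, the 32-bit data width, and the use of C pointers to stand in for
  registers are assumptions made for the illustration.

    /* Sketch of the two addressing modes in C; pointers stand in for registers. */
    #include <stdint.h>

    /* Register plus immediate: the effective address (base + offset) must be
       computed before the request can be sent to the memory system. */
    uint32_t load_reg_plus_imm(const uint8_t *base, int32_t offset) {
        return *(const uint32_t *)(base + offset);
    }

    /* Post-increment: the address in the "register" goes straight to memory,
       and the register is updated with the step value afterwards. */
    uint32_t load_post_increment(const uint8_t **reg, int32_t step) {
        uint32_t value = *(const uint32_t *)(*reg);
        *reg += step;
        return value;
    }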
                 Multimedia Vector Instructions
• Many processor families have recently added multimedia vector instructions to
  their ISAs. These instructions are intended to improve performance on multimedia
  applications, such as video decompression and audio playback. The applications
  have several traits that make it possible to significantly improve their performance
  with a small number of new instructions:
• First, they perform the same sequence of operations on a large number of
  independent data objects, such as 8×8 blocks of compressed pixels. This trait is
  often described as data parallelism, because multiple data objects can be
  processed at the same time.
• Second, these applications operate on data that is much smaller than the 32-bit or
  64-bit data words found in most modern processors.
• Video pixels, which are described by 8-bit red, green, and blue color values, are
  an example of this. Each of the color values is generally computed independently,
  meaning that 24 bits of a 32-bit ALU are idle during the computation.
• Multimedia vector instructions treat the processor’s data word as a collection of
  smaller data objects. Thus, instead of operating on a single 32-bit quantity, the data word
  is treated as a collection of four 8-bit quantities or two 16-bit quantities, as sketched below.
• Most of the multimedia vector instruction sets can operate on longer data types,
  such as 64-bit or 128-bit quantities, allowing more operations to be done in
  parallel.
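• As a rough illustration of the packed-data idea, the C sketch below treats a 32-bit word as
  four independent 8-bit lanes; a real vector instruction performs all four additions in
  hardware in a single step, whereas the loop here only shows the data layout. The function
  name and example values are assumptions.

    /* Sketch of the packed-data idea: one 32-bit word is treated as four
       independent 8-bit lanes, and one call produces all four sums. */
    #include <stdint.h>
    #include <stdio.h>

    uint32_t packed_add_u8x4(uint32_t a, uint32_t b) {
        uint32_t result = 0;
        for (int lane = 0; lane < 4; lane++) {
            uint8_t x = (a >> (8 * lane)) & 0xFF;
            uint8_t y = (b >> (8 * lane)) & 0xFF;
            result |= (uint32_t)(uint8_t)(x + y) << (8 * lane);
        }
        return result;
    }

    int main(void) {
        /* four 8-bit components added at once */
        printf("0x%08X\n", (unsigned)packed_add_u8x4(0x01020304u, 0x10203040u)); /* 0x11223344 */
        return 0;
    }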
• Many multimedia vector instructions allow the option to operate in saturating
  arithmetic mode.
• In saturating arithmetic, computations that overflow the number of bits in their
  representation return the maximum value that the representation can represent, and
  computations that underflow return 0. For example, adding 0xAA and 0xBC in
  8-bit saturating arithmetic gives a result of 0xFF instead of 0x66.
• Saturating arithmetic is useful when it is desirable to have a computation be
  limited by its maximum value. [For example, increasing the amount of red in a pixel
  that is already extremely red should result in a pixel that has the maximum
  allowable amount of redness, instead of a pixel that has very little redness because
  the computation has wrapped around to a small value.]
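• A minimal C sketch of 8-bit saturating addition, reproducing the example above (0xAA + 0xBC
  clamps to 0xFF rather than wrapping to 0x66); the function name is illustrative.

    /* 8-bit saturating addition versus ordinary wrapping addition. */
    #include <stdint.h>
    #include <stdio.h>

    uint8_t sat_add_u8(uint8_t a, uint8_t b) {
        uint16_t sum = (uint16_t)a + (uint16_t)b;
        return (sum > 0xFF) ? 0xFF : (uint8_t)sum;    /* clamp at the maximum */
    }

    int main(void) {
        printf("saturating: 0x%02X\n", (unsigned)sat_add_u8(0xAA, 0xBC));  /* 0xFF */
        printf("wrapping:   0x%02X\n", (unsigned)(uint8_t)(0xAA + 0xBC));  /* 0x66 */
        return 0;
    }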
• When a multimedia vector instruction executes, it performs its computation in
  parallel on each of the smaller objects within its input word.
• Multimedia vector instructions can significantly improve a processor’s
  performance on data-parallel applications that operate on small data types by
  allowing multiple computations to be performed in parallel.
• The hardware required to implement multimedia vector operations is typically fairly
  simple, and much of the hardware used to implement a processor’s non-vector operations
  can be reused, making these operations attractive to computer architects who expect
  their processors to be used for data-parallel applications.
  Fixed-length vs. variable-length Instruction Encodings
• The instruction set architecture (ISA) encoding is the set of bits that is used to
  represent the instructions in the memory of the computer. Generally, we need an
  encoding that is both compact and requires little logic to decode [meaning it is
  simple for the processor to figure out which instruction is represented by a given
  bit pattern in the program]. Unfortunately, these two goals are somewhat in conflict.
                              Fixed-length
• Instruction set encodings use the same number of bits to encode each instruction
  in the ISA. Fixed-length encodings have the advantage that they are simple to
  decode, reducing both the amount of decode logic required and the latency of the
  decode logic. Also, a processor that uses a fixed-length ISA encoding can easily
  predict the location of the next instruction to be executed (assuming that the
  current instruction is not a branch). This makes it easier for the processor to
  pipeline multiple instructions.
                              Variable-length
• Instruction set encodings use different numbers of bits to encode the instructions
  in the ISA, depending on the number of inputs to the instruction, the addressing
  modes used, and other factors.
• Using a variable-length encoding, each instruction takes only as much space in
  memory as it requires, although many systems require that all instruction
  encodings be an integer number of bytes long.
• Using a variable-length instruction set can reduce the amount of space taken up by
  a program, but it greatly increases the complexity of the logic required to decode
  instructions, since parts of the instruction, such as the input operands, may be
  stored in different bit positions in different instructions.
• Also, the hardware cannot predict the location of the next instruction until the
  current instruction has been decoded enough to know how long it is (see the sketch
  below).
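• The C sketch below contrasts the two cases: with a fixed 4-byte encoding the next program
  counter is simply pc + 4, while with a variable-length encoding part of the instruction must
  be decoded just to find its length. The 4-byte width, the opcode-class idea and the length
  table are assumptions for the illustration, not any real encoding.

    /* Next-instruction address under the two encodings. */
    #include <stdint.h>

    uint32_t next_pc_fixed(uint32_t pc) {
        return pc + 4;                     /* length known without any decoding */
    }

    uint32_t next_pc_variable(uint32_t pc, const uint8_t *program) {
        static const uint8_t length_for_class[4] = { 1, 2, 4, 6 };  /* bytes */
        uint8_t op_class = program[pc] & 0x3;    /* must decode this much first */
        return pc + length_for_class[op_class];
    }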
• Given the pros and cons of fixed- and variable-length instruction encodings, fixed-
  length encodings are more common in recent architectures. Variable-length
  encodings are mainly used in architectures where there is a large variance between
  the amounts of space required for the shortest and the longest instructions in the
  ISA.
• Examples of this include stack-based architectures, because many operations do
  not specify their inputs, and CISC architectures, which often contain a few
  instructions that can take a large number of inputs.
                    Processor Micro architecture
• Processor micro architecture includes all of the details about how a processor is
  implemented. The ISA has a great deal of impact on the micro architecture. An
  ISA that contains only simple operations can be implemented using a simple,
  straightforward micro architecture, while an ISA containing complex operations
  requires a more complex micro architecture to implement.
                                    SUMMARY OF PROCESSOR DESIGN
                                                  
• The architecture is the internal build-up of the processor. There are two main types of
  architecture or technology used to design a CPU:
  CISC – Complex Instruction Set Computer
  RISC – Reduced Instruction Set Computer
                                CISC technology
• The CISC approach was adopted by Intel in 1978, starting with the 8086
  microprocessor chip. The 8086 was designed to process 16-bit data words and had no
  instructions for floating-point operations. Present-day Pentium processors work with
  32-bit and 64-bit words and can process floating-point instructions. Even so, Intel
  designed the Pentium processors in such a way that they can still execute programs
  written for the 8086 processor.
                       Characteristics of CISC
A large number of instructions
Many addressing modes
Variable-length instructions
Most instructions can access operands in main memory
The control unit is microprogrammed.
                                     RISC
• Until the mid-1990s, manufacturers were designing processors using the CISC
  technology, with large sets of instructions. Because of the setbacks of that
  technology, manufacturers decided to adopt the RISC technology, which executes
  instructions drawn from a much smaller, simpler instruction set.
               Characteristics of RISC technology
It requires few instructions
All instructions are of the same length
Most instructions are executed in one machine cycle
Control unit is hardwired
Few addressing modes
A large number of registers.                                          
                        Processor Specification.
Processors can be identified by two (2) main parameters:
How wide they are
How fast they are, based on their architecture.
• The speed of the processor is a fairly simple concept. It is measured in megahertz
  (MHz) or gigahertz (GHz), meaning millions or billions of cycles per second. The faster
  the better.
• The width of a processor is a little more complicated, because there are three main
  specifications in a processor that are expressed as a width:
 Data input/output bus
 Internal registers
 Memory address bus
                Processor Speed Markings and Motherboard Speed
• Another confusing factor when comparing processor performance is that virtually
  all modern processors since the 486 run at some multiple of the motherboard
  speed. For example, a Pentium II 333 runs at five times the motherboard
  speed of 66 MHz, while a Pentium II 400 runs at four times the motherboard speed of
  100 MHz. This multiplier also determines the clock speed of the system. Most
  modern Pentium motherboards have 3 to 4 speed settings.
• If you know the clock multiplier and the motherboard speed, you can work out the
  speed of the processor.
• Processor speed = clock multiplier x motherboard speed
• Example 1: a Pentium II 350 with a clock multiplier of 3.5x and a motherboard speed
  of 100 MHz: 3.5 x 100 = 350 MHz
• Example 2 (a short calculation for this case is sketched below):
 Clock multiplier = 4.5x
 Motherboard speed = 100 MHz
 Processor speed = 4.5 x 100 = 450 MHz
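• A short C sketch of the relation used in the worked examples; the multiplier and bus-speed
  values are taken from example 2 above.

    /* processor speed = clock multiplier x motherboard (bus) speed */
    #include <stdio.h>

    int main(void) {
        double multiplier      = 4.5;    /* example 2 above */
        double motherboard_mhz = 100.0;
        printf("processor speed = %.0f MHz\n", multiplier * motherboard_mhz); /* 450 */
        return 0;
    }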
                                   Databus.
• The most common way to describe a processor is by the width of the processor’s
  external data bus. This defines the number of data bits that can be moved into or
  out of the processor in one cycle.
• A bus is simply a series of connections that carry common signals. Data buses are
  bundles of wires (or pins) used to send and receive data.
• The more signals that can be sent at the same time, the more data that can be
  transmitted in a specific interval and therefore the faster the bus.
• A wider bus is like a highway with more lanes, which allows for greater
  throughput. Since data in a computer is sent as digital information, with each wire
  carrying 1 data bit during a given time interval, the more wires you have, the
  more individual bits you can send in the same time interval.
                                Internal Registers.
• The size of the internal registers indicates how much information the processor can
  operate on at one time and how it moves data around internally within the chip.
  The register size is essentially the internal data-bus size.
• A register is a holding cell within the processor; for example, the processor can add the
  numbers held in two different registers, storing the result in a third register. The register
  size determines the size of the data the processor can operate on.
• The register size also describes the type of software, commands and instructions a
  chip can run. That is, a processor with 32-bit internal registers can run 32-bit
  instructions that process 32-bit chunks of data, but a processor with 16-bit
  registers cannot.
• More advanced sixth-generation processors such as the Pentium Pro have as many as
  six (6) internal pipelines for executing instructions.
                                  Internal Cache
• Most processors have an integrated Level 1 (L1) cache controller with built-in
  full-core-speed cache memory.
• This cache basically is an area of very fast memory built into the processor and is used
  to hold some of the current working set of code and data.
• Cache memory can be accessed with no wait states because it can fully keep up with
  the speed of the processor core.
• Using cache memory reduces a traditional system bottleneck, because system RAM
  often is much slower than the CPU. This prevents the processor from having to wait for
  code and data from much slower main memory, thereby improving performance.
• If the data the processor wants is already in the internal cache, the CPU does not have
  to wait. If the data is not in the cache, the CPU must fetch it from the Level 2 cache or,
  via the system bus, from main memory directly, as sketched below.
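• As a rough sketch of the cache idea, the C code below keeps a small, fast array of recently
  used words so that repeated reads avoid the slow main memory; the direct-mapped
  organization, the line count and the array standing in for DRAM are illustrative
  assumptions, not how any particular processor’s L1 cache is built.

    /* A toy direct-mapped cache in front of a (simulated) slow main memory:
       a hit returns immediately, a miss fetches the word and keeps a copy. */
    #include <stdint.h>
    #include <stdbool.h>

    #define CACHE_LINES  64
    #define MEMORY_WORDS 4096

    struct cache_line { bool valid; uint32_t tag; uint32_t data; };

    static struct cache_line cache[CACHE_LINES];
    static uint32_t main_memory[MEMORY_WORDS];     /* stands in for slow DRAM */

    uint32_t cached_read(uint32_t word_address) {  /* word_address < MEMORY_WORDS */
        uint32_t index = word_address % CACHE_LINES;
        uint32_t tag   = word_address / CACHE_LINES;

        if (cache[index].valid && cache[index].tag == tag)
            return cache[index].data;                  /* hit: no wait states    */

        uint32_t value = main_memory[word_address];    /* miss: go out to memory */
        cache[index] = (struct cache_line){ true, tag, value };
        return value;
    }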
                                 Address bus.
• The address bus is the set of wires that carry the addressing information used to
  describe the memory location to which the data is being sent, or from which the
  data is being retrieved. As with the data bus, each wire in an address bus carries a
  single bit of information.
• This single bit is a single digit in the address. The more wires (digits) used in
  calculating these addresses, the greater the total number of address locations. The
  size (or width) of the address bus indicates the maximum amount of RAM that a
  chip can address, as the sketch below illustrates.
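• A short C sketch of the relationship stated above: N address lines can select 2^N distinct
  locations, so a 20-bit address bus reaches 1 MB and a 32-bit address bus reaches 4 GB.

    /* Addressable locations as a function of address-bus width. */
    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        unsigned widths[] = { 16, 20, 24, 32 };
        for (unsigned i = 0; i < sizeof widths / sizeof widths[0]; i++) {
            uint64_t locations = 1ULL << widths[i];
            printf("%2u address lines -> %llu addressable bytes\n",
                   widths[i], (unsigned long long)locations);
        }
        return 0;
    }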
                             Processor modes
• All Intel processors, from the 386 on up, can run in several modes. Processor modes
  refer to the various operating environments and affect the instructions and
  capabilities of the chip. The processor mode determines how the processor sees and
  manages the system’s memory and the tasks that use it.
• The 3 different modes of operation possible are
   Real mode
   Protected Mode
   Virtual Real Mode (Real within Protected Mode)
                                 Real Mode
• All software running in real mode must use only 16-bit instructions and live
  within the 20-bit (1 MB) memory architecture. Real mode supports only single-tasking
  software, which means that only one program can run at a time. There is no built-in
  protection to keep one program from overwriting another program or even the operating
  system in memory, so if more than one program is running, it is possible for one of
  them to bring the entire system down in a crash.
                             Protected mode.
• In protected mode, the chip can run an entirely new 32-bit instruction set, and
  software running in that mode is protected from overwriting other software in
  memory. Such protection helps make the system more crash-proof, as an errant
  program cannot very easily damage other programs or the operating system. In
  addition, a crashed program can easily be terminated.
           Virtual Real Mode (Real within Protected)
• Virtual real mode is essentially a 16-bit real mode environment that runs inside
  32-bit protected mode. For example, when you run a DOS prompt window inside Windows
  98, you have created a virtual real mode session.
• Because protected mode allows true multitasking, you can actually have several
  real mode sessions running, each with its own software running on a virtual PC.
  These can all run simultaneously, even while other 32-bit applications are running.
• Note that any program running in a virtual real mode window can access only up to
  1 MB of memory.
                               Processor Features
• Modern processors have several different features. The most notable are:
SMM (Power Management)
Superscalar execution
MMX Technology
Dynamic execution
Dual Independent Bus (DIB) architecture
                     SMM (Power Management)
• SMM is power-management circuitry added by Intel. This circuitry enables
  processors to conserve energy and lengthen battery life. It was introduced
  in the Intel 486SL processor, an enhanced version of the 486DX processor.
• This power-management feature was later made universal and incorporated into all
  Pentium and later processors.
• This feature set is called SMM, which stands for System Management Mode. The SMM
  circuitry is integrated into the physical chip but operates independently to
  control the processor’s power use based on its activity level.
• It also supports the suspend/resume feature that allows for the instant power on and
  power off used in laptops. These settings are normally controlled through the system
  BIOS.
                          Superscalar Execution
• This is a newer processor feature. It provides multiple internal instruction execution
  pipelines, which enable the processor to execute multiple instructions at the same
  time. This technology is usually associated with high-speed or high-output RISC
  chips. It is now a standard feature on newer PCs.
                           MMX Technology
• MMX Technology is named for Multi-Media eXtensions or Matrix Math
  eXtensions. It was introduced in the later Pentium processors to improve video
  compression/decompression, image manipulation, encryption and I/O processing,
  all of which are used in a variety of software today.
• MMX comprises two (2) main improvements. The first is very basic: a larger L1
  cache.
• The second is a set of 57 new commands or instructions and a new instruction
  capability called Single Instruction, Multiple Data (SIMD).
                           Dynamic Execution.
• It is an innovative combination of three processing techniques designed to help the
  processor manipulate data more efficiently: multiple branch prediction, dataflow
  analysis and speculative execution. Together they provide a more efficient way of
  manipulating data in a logical, ordered fashion, rather than simply processing a
  list of instructions in order.
• Dynamic Execution consists of the following
Multiple branch prediction – predicts the flow of the program through several
  branches using special algorithms. The processor can anticipate jumps or
  branches in the instruction flow and uses this to predict where the next instruction
  will be found in memory. This is possible because, while the processor is fetching
  instructions, it is also looking at instructions further ahead in the program.
Dataflow analysis – analyses and schedules instructions to be executed in an
  optimal sequence, independent of the original program order. The processor
  determines the optimal sequence for processing and executing instructions in
  the most efficient manner.
Speculative execution – increases performance by looking ahead of the
  program counter and executing instructions that are likely to be needed later.
  This technique essentially allows the processor to complete instructions in
  advance and then grab the already completed results when necessary.
DIB (Dual Independent Bus) architecture.
• It was created to improve processor bus bandwidth and performance. Having 2
  (dual) independent data I/O buses enables the processor to access data from either
  of its buses simultaneously and in parallel, rather than in a singular sequential
  manner.
• The second (backside) bus in a processor with DIB is used for the L2 cache, allowing it
  to run at a much greater speed than if it had to share the main processor bus.
• Two buses make up the dual independent bus architecture: the L2 cache bus and
  the processor-to-main memory or system bus.