Coa Unit 5

The document discusses two main computer architecture types: Reduced Instruction Set Architecture (RISC) and Complex Instruction Set Architecture (CISC), highlighting their characteristics and differences. It also covers parallel processing techniques, including pipelining and vector processing, explaining how these methods enhance computational speed and efficiency. Additionally, it describes the structure and operation of pipelines, the challenges they face, and the benefits of vector processing in handling large data sets.

UNIT V

RISC and CISC



Reduced Instruction Set Computer (RISC) –

The main idea behind RISC is to make the hardware simpler by using an instruction set composed of a few basic operations for loading, evaluating, and storing data: a load instruction loads data from memory, and a store instruction writes it back.

Complex Instruction Set Computer (CISC) –

The main idea behind CISC is that a single instruction performs all loading, evaluating, and storing operations. For example, a multiplication instruction may load its operands, evaluate the product, and store the result, hence the name "complex".
Both approaches try to increase CPU performance:

 RISC: Reduces the cycles per instruction at the cost of the number of instructions per program.

 CISC: Attempts to minimize the number of instructions per program at the cost of an increase in the number of cycles per instruction.

Earlier, when programming was done in assembly language, there was a need to make each instruction do more work, because assembly programming was tedious and error-prone; CISC architectures evolved to meet this need. With the rise of high-level languages, dependency on assembly decreased and RISC architectures prevailed.
Characteristics of RISC –

1. Simpler instructions, hence simple instruction decoding.

2. Instructions fit within one word.

3. An instruction takes a single clock cycle to execute.

4. A larger number of general-purpose registers.

5. Simple addressing modes.

6. Fewer data types.

7. Pipelining can be achieved easily.


Characteristics of CISC –

1. Complex instructions, hence complex instruction decoding.

2. Instructions are larger than one word.

3. An instruction may take more than a single clock cycle to execute.

4. A smaller number of general-purpose registers, since operations can be performed directly in memory.

5. Complex addressing modes.

6. More data types.

Example – Suppose we have to add two 8-bit numbers:

 CISC approach: There will be a single instruction for this, such as ADD, which performs the whole task.

 RISC approach: The programmer first writes load instructions to bring the data into registers, then applies a suitable operation, and finally stores the result in the desired location.

So the add operation is divided into parts (load, operate, store), due to which RISC programs are longer and require more memory, but need fewer transistors because the instructions are less complex.
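The contrast above can be sketched in Python. This is a toy model, not real ISA semantics: the memory/register dictionaries and the instruction names are assumptions for illustration only.

```python
memory = {"X": 7, "Y": 5, "Z": 0}
registers = {}

def cisc_add(dst, src1, src2):
    # CISC style: one instruction does load, evaluate, and store.
    memory[dst] = memory[src1] + memory[src2]

def risc_load(reg, addr):      # load a memory operand into a register
    registers[reg] = memory[addr]

def risc_add(dst, r1, r2):     # operate only on registers
    registers[dst] = registers[r1] + registers[r2]

def risc_store(addr, reg):     # store a register back to memory
    memory[addr] = registers[reg]

# CISC: a single ADD instruction does everything.
cisc_add("Z", "X", "Y")        # memory["Z"] is now 12

# RISC: the same work split into load / operate / store.
risc_load("R1", "X")
risc_load("R2", "Y")
risc_add("R3", "R1", "R2")
risc_store("Z", "R3")          # memory["Z"] is again 12
```

The RISC version takes four instructions where CISC takes one, which is exactly the code-size trade-off described above.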
Difference –

RISC                                            | CISC
------------------------------------------------|------------------------------------------------
Focus on software                               | Focus on hardware
Uses only a hardwired control unit              | Uses both hardwired and microprogrammed control units
Transistors are used for more registers         | Transistors are used for storing complex instructions
Fixed-size instructions                         | Variable-size instructions
Only register-to-register arithmetic operations | REG to REG, REG to MEM, or MEM to MEM operations
Requires more registers                         | Requires fewer registers
Code size is large                              | Code size is small
An instruction executes in a single clock cycle | An instruction takes more than one clock cycle
An instruction fits in one word                 | Instructions are larger than one word
PIPELINE AND VECTOR PROCESSING

Parallel processing:
• Parallel processing is a term used for a large class of techniques that are used to provide simultaneous data-processing tasks for the purpose of increasing the computational speed of a computer system.

 The system may have two or more ALUs so that it can execute two or more instructions at the same time.

 The system may have two or more processors operating concurrently.

 Parallelism can be achieved by having multiple functional units that perform the same or different operations simultaneously.

• Example of parallel processing:

– Multiple functional units: separate the execution unit into eight functional units operating in parallel.

 There are a variety of ways in which parallel processing can be classified:

 Internal organization of the processor

 Interconnection structure between processors

 Flow of information through the system

Architectural Classification:

– Flynn's classification

» Based on the multiplicity of Instruction Streams and Data Streams

» Instruction Stream

• Sequence of Instructions read from memory

» Data Stream

• Operations performed on the data in the processor

 SISD represents an organization containing a single control unit, a processor unit, and a memory unit. Instructions are executed sequentially, and the system may or may not have internal parallel-processing capabilities.

 SIMD represents an organization that includes many processing units under the
supervision of a common control unit.

 MISD structure is of only theoretical interest since no practical system has been
constructed using this organization.

 MIMD organization refers to a computer system capable of processing several programs at the same time.

The main difference between a multicomputer system and a multiprocessor system is that the multiprocessor system is controlled by one operating system that provides interaction between processors, and all the components of the system cooperate in the solution of a problem.

 Parallel processing can be discussed under the following topics:

 Pipeline Processing

 Vector Processing

 Array Processors
PIPELINING:

• Pipelining is a technique of decomposing a sequential process into suboperations, with each subprocess being executed in a special dedicated segment that operates concurrently with all other segments.

• Each segment performs the partial processing dictated by the way the task is partitioned.

• The result obtained from each segment is transferred to the next segment.

• The final result is obtained when data have passed through all segments.

• Suppose we have to perform the following task:

• Each suboperation is performed in a segment within the pipeline. Each segment has one or two registers and a combinational circuit.
OPERATIONS IN EACH PIPELINE STAGE:

• General Structure of a 4-Segment Pipeline

• Space-Time Diagram

The following diagram shows 6 tasks, T1 through T6, executed in 4 segments.
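The space-time diagram can also be generated programmatically. A minimal sketch, assuming tasks enter segment 1 on consecutive clock cycles and advance one segment per cycle:

```python
def space_time(k, n):
    # Returns the schedule: table[cycle][segment] = task number or None.
    total = k + n - 1                      # total clock cycles needed
    table = []
    for cycle in range(1, total + 1):
        row = []
        for seg in range(1, k + 1):
            task = cycle - seg + 1         # task occupying this segment now
            row.append(task if 1 <= task <= n else None)
        table.append(row)
    return table

# 4 segments, 6 tasks -> 4 + 6 - 1 = 9 clock cycles
for row in space_time(4, 6):
    print(row)
```

In cycle 1 only segment 1 is busy (task T1); by cycle 4 all segments are busy; in cycle 9 only segment 4 remains busy, finishing T6.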

PIPELINE SPEEDUP:

Consider the case where a k-segment pipeline is used to execute n tasks.

 n = 6 in the previous example
 k = 4 in the previous example

• Pipelined machine (k stages, n tasks):

 The first task T1 requires k clock cycles to complete, since there are k segments.

 The remaining n - 1 tasks each complete one clock cycle later, adding n - 1 clock cycles.

 Total clock cycles for n tasks = k + (n - 1) (9 in the previous example).

• Conventional machine (non-pipelined):

 Cycles to complete each task = k.

 For n tasks, nk clock cycles are required.

• Speedup (S):

 S = Non-pipelined time / Pipelined time

 For n tasks: S = nk / (k + n - 1)

 As n becomes much larger than k - 1, S approaches nk/n = k.

PIPELINE AND MULTIPLE FUNCTION UNITS:

Example:

- 4-stage pipeline

- 100 tasks to be executed

- 1 task in a non-pipelined system takes 4 clock cycles

Pipelined system: k + n - 1 = 4 + 99 = 103 clock cycles

Non-pipelined system: n * k = 100 * 4 = 400 clock cycles

Speedup: S = 400 / 103 = 3.88
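The arithmetic above is easy to check in code; a short sketch of the two formulas:

```python
def pipeline_cycles(k, n):
    # k-stage pipeline, n tasks: the first task needs k cycles,
    # and each remaining task completes one cycle later.
    return k + (n - 1)

def speedup(k, n):
    # Non-pipelined time (n * k) divided by pipelined time.
    return (n * k) / pipeline_cycles(k, n)

print(pipeline_cycles(4, 100))      # -> 103
print(round(speedup(4, 100), 2))    # -> 3.88
```

For very large n the speedup approaches k, as stated in the previous section: with k = 4 and a million tasks, the result is already within 0.01 of 4.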

Types of Pipelining:

• Arithmetic Pipeline

• Instruction Pipeline

ARITHMETIC PIPELINE:

 Pipelined arithmetic units are usually found in very high-speed computers.

 They are used to implement floating-point operations.

 We will now discuss the pipeline unit for floating-point addition and subtraction.

 The inputs to the floating-point adder pipeline are two normalized floating-point numbers, where A and B are the mantissas and a and b are the exponents.

 Floating-point addition and subtraction can be performed in four segments.

Floating-point adder:

[1] Compare the exponents

[2] Align the mantissas

[3] Add/subtract the mantissas

[4] Normalize the result

X = A x 10^a = 0.9504 x 10^3

Y = B x 10^b = 0.8200 x 10^2

1) Compare exponents:

3 - 2 = 1

2) Align mantissas:

X = 0.9504 x 10^3

Y = 0.0820 x 10^3

3) Add mantissas:

Z = 1.0324 x 10^3

4) Normalize result:

Z = 0.10324 x 10^4
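The four segments can be traced in code. A sketch assuming decimal (base-10) normalized mantissas in [0.1, 1), matching the worked example above:

```python
def fp_add(a_mant, a_exp, b_mant, b_exp):
    # Segment 1: compare exponents (keep the larger-exponent operand first).
    if a_exp < b_exp:
        a_mant, a_exp, b_mant, b_exp = b_mant, b_exp, a_mant, a_exp
    diff = a_exp - b_exp
    # Segment 2: align mantissas by shifting the smaller operand right.
    b_mant /= 10 ** diff
    # Segment 3: add the mantissas.
    z_mant = a_mant + b_mant
    z_exp = a_exp
    # Segment 4: normalize so the mantissa lies in [0.1, 1).
    while z_mant >= 1.0:
        z_mant /= 10
        z_exp += 1
    while z_mant and z_mant < 0.1:
        z_mant *= 10
        z_exp -= 1
    return round(z_mant, 6), z_exp

print(fp_add(0.9504, 3, 0.8200, 2))   # -> (0.10324, 4)
```

In a real pipeline each of the four segments works on a different pair of operands every clock cycle; this sequential function only shows what each segment computes.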
Instruction Pipeline:

 Pipeline processing can occur not only in the data stream but in the instruction stream as well.

 An instruction pipeline reads consecutive instructions from memory while previous instructions are being executed in other segments.

 This causes the instruction fetch and execute segments to overlap and perform simultaneous operations.

Four Segment CPU Pipeline:

 FI segment fetches the instruction.

 DA segment decodes the instruction and calculates the effective address.

 FO segment fetches the operand.

 EX segment executes the instruction.


INSTRUCTION CYCLE:

Pipeline processing can also occur in the instruction stream. An instruction pipeline reads consecutive instructions from memory while previous instructions are being executed in other segments.

Six Phases* in an Instruction Cycle

[1] Fetch an instruction from memory

[2] Decode the instruction


[3] Calculate the effective address of the operand

[4] Fetch the operands from memory

[5] Execute the operation

[6] Store the result in the proper place

* Some instructions skip some phases

* Effective address calculation can be done as part of the decoding phase

* Storing the operation result into a register is done automatically in the execution phase

==> 4-Stage Pipeline

[1] FI: Fetch an instruction from memory

[2] DA: Decode the instruction and calculate the effective address of the operand

[3] FO: Fetch the operand

[4] EX: Execute the operation

Pipeline Conflicts:

– There are 3 major difficulties:

1) Resource conflicts: memory is accessed by two segments at the same time. Most of these conflicts can be resolved by using separate instruction and data memories.

2) Data dependency: an instruction depends on the result of a previous instruction, but this result is not yet available. Example: an instruction with register-indirect addressing cannot proceed to fetch its operand if the previous instruction is still loading the address into the register.

3) Branch difficulties: branch and other instructions (interrupt, return, etc.) that change the value of the PC.

Handling Data Dependency:

 This problem can be solved in the following ways:

 Hardware interlocks: a circuit detects the conflict situation and delays the dependent instruction by enough clock cycles to resolve the conflict.

 Operand forwarding: special hardware detects the conflict and avoids it by routing the data through a special path between pipeline segments.

 Delayed load: the compiler detects the data conflict and reorders the instructions as necessary to delay the loading of the conflicting data, inserting no-operation instructions where needed.
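The delayed-load idea can be sketched as a small compiler pass. The tuple encoding (op, dest, sources) and the instruction names are assumptions for illustration; a real pass would also track multi-cycle load delays, not just the immediately preceding instruction.

```python
def insert_delay_slots(program):
    # Insert a NOP whenever an instruction reads a register that the
    # immediately preceding LOAD writes (the conflicting data is not
    # yet available in that pipeline cycle).
    out = []
    for instr in program:
        op, dest, srcs = instr
        if out:
            prev_op, prev_dest, _ = out[-1]
            if prev_op == "LOAD" and prev_dest in srcs:
                out.append(("NOP", None, ()))   # delay slot
        out.append(instr)
    return out

prog = [
    ("LOAD", "R1", ("A",)),
    ("LOAD", "R2", ("B",)),
    ("ADD",  "R3", ("R1", "R2")),   # needs R2, just loaded -> NOP inserted
    ("STORE", "C", ("R3",)),
]
fixed = insert_delay_slots(prog)
```

After the pass, the program becomes LOAD, LOAD, NOP, ADD, STORE: the NOP gives the second load one extra cycle to deliver R2.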

Handling of Branch Instructions:

 Prefetch the target instruction.

 Branch target buffer (BTB) included in the fetch segment of the pipeline.

 Branch prediction.

 Delayed branch.

RISC Pipeline:

 The simplicity of the instruction set is utilized to implement an instruction pipeline using a small number of suboperations, each executed in a single clock cycle.

Since all operations are performed in registers, there is no need for effective address calculation.

Three Segment Instruction Pipeline:

 I: Instruction Fetch

 A: ALU Operation

 E: Execute Instruction

Delayed Load:

Delayed Branch:

Let us consider a program having the following 5 instructions.

Vector Processing
Vector processing performs arithmetic operations on large arrays of integers or floating-point numbers. It operates on all the elements of the array in parallel, provided each pass is independent of the others.

Vector processing avoids the overhead of the loop-control mechanism that occurs in general-purpose computers.

In this section, we give a brief introduction to vector processing, its characteristics, vector instructions, and how the performance of vector processing can be enhanced. So let us start.

Content: Vector Processing in Computer Architecture


Introduction

1. Characteristics
2. Vector Instruction
3. Improving Performance
4. Key Takeaways

Introduction
We need computers that can quickly solve mathematical problems for us, including arithmetic operations on large arrays of integers or floating-point numbers. A general-purpose computer would use loops to operate on such an array. But for a large array, using a loop causes overhead for the processor.

To avoid the overhead of processing loops and to speed up the computation, some kind of parallelism must be introduced. Vector processing operates on the entire array in just one operation, i.e. it operates on the elements of the array in parallel. But vector processing is possible only if the operations performed in parallel are independent.

Look at the figure below and compare vector processing with general computer processing; you will notice the difference. In both blocks, the instructions add two arrays and store the result in a third array. Vector processing adds both arrays in parallel, avoiding the use of a loop.

Operating on multiple data items with just one instruction is called Single Instruction Multiple Data (SIMD), and such instructions are also termed vector instructions. The data for a vector instruction are stored in vector registers.

Each vector register is capable of storing several data elements at a time. The data elements held in a vector register are termed a vector operand. If there are n elements in a vector operand, then n is the length of the vector.
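The loop-versus-vector contrast can be mimicked in Python. Plain Python has no real SIMD hardware behind it; the one-line version below only models what a single vector ADD does over a whole vector operand, and the array values are made up for illustration.

```python
a = [20, 10, 5, 1]
b = [15, 35, 65, 10]

# General-purpose (scalar) approach: an explicit loop, with
# loop-control overhead on every iteration.
c_loop = []
for i in range(len(a)):
    c_loop.append(a[i] + b[i])

# Vector-style approach: conceptually one instruction, C = A + B,
# applied to every element pair at once.
c_vec = [x + y for x, y in zip(a, b)]

print(c_loop)   # -> [35, 45, 70, 11]
print(c_vec)    # -> [35, 45, 70, 11]
```

Both produce the same result; the point is that the vector form has no per-element branch and index bookkeeping, which is exactly the overhead vector hardware eliminates.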

Supercomputers evolved to deal with billions of floating-point operations per second, and they optimize numerical (vector) computations.

But along with vector processing, supercomputers are also capable of scalar processing. Later, the array processor was introduced; it deals particularly with vector processing and does not perform scalar processing.

Characteristics of Vector Processing


Each element of a vector operand is a scalar quantity, which can be an integer, a floating-point number, a logical value, or a character. Below we classify vector instructions into four types. Here, V represents vector operands and S represents scalar operands. In the figure below, O1 and O2 are unary operations and O3 and O4 are binary operations.

Most vector instructions are pipelined, since a vector instruction performs the same operation on different data sets repeatedly. Because pipelining has a start-up delay, longer vectors perform better.

The pipelined vector processors can be classified into two types based on where the operands are fetched from for vector processing. The two architectural classifications are memory-to-memory and register-to-register.

In a memory-to-memory vector processor, the operands for the instruction, the intermediate results, and the final result are all retrieved from main memory. TI-ASC, CDC STAR-100, and Cyber-205 use the memory-to-memory format for vector instructions.

In a register-to-register vector processor, the source operands, the intermediate results, and the final result are all held in vector or scalar registers. Cray-1 and Fujitsu VP-200 use the register-to-register format for vector instructions.

Vector Instruction
A vector instruction has the following fields:

1. Operation Code

The operation code indicates the operation to be performed by the instruction. It selects the functional unit for the specified operation or reconfigures a multifunction unit.

2. Base Address

The base address field refers to the memory location from which the operands are to be fetched or where the result is to be stored. The base address appears in memory-reference instructions. When the operands and the result are held in vector registers, the base address instead designates the vector register.
3. Address Increment

A vector operand has several data elements, and the address increment specifies how to reach the address of the next element in the operand. Some computers store the data elements consecutively in main memory, in which case the increment is always 1. Computers that do not store the data elements consecutively require a variable address increment.

4. Address Offset

The address offset is always specified relative to the base address. The effective memory address is calculated using the address offset.

5. Vector Length

Vector length specifies the number of elements in a vector operand. It identifies the termination of the vector instruction.
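The five fields can be collected into a small record. This encoding (the field names, the tiny address arithmetic) is a hypothetical illustration, not a real machine format:

```python
from dataclasses import dataclass

@dataclass
class VectorInstruction:
    opcode: str        # 1. operation to perform
    base_address: int  # 2. where the first operand element lives
    increment: int     # 3. address step between consecutive elements
    offset: int        # 4. added to the base address
    length: int        # 5. number of elements (terminates the instruction)

    def operand_addresses(self):
        # Effective addresses of every element of the vector operand.
        start = self.base_address + self.offset
        return [start + i * self.increment for i in range(self.length)]

vadd = VectorInstruction("VADD", base_address=100, increment=2,
                         offset=4, length=4)
print(vadd.operand_addresses())   # -> [104, 106, 108, 110]
```

With increment = 2 the elements are strided rather than consecutive, which is the case the address-increment field exists to handle.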

Improving Performance
In vector processing we come across two overheads: setup time and flushing time. When vector processing is pipelined, the time required to route the vector operands to the functional unit is called the setup time. The flushing time is the duration a vector instruction takes from its decoding until its first result comes out of the pipeline.

The vector length also affects the efficiency of processing, as a very long vector causes the overhead of subdividing it for processing.

To obtain better performance, optimized object code must be produced in order to utilize pipeline resources to the maximum.

1. Improving the vector instruction

We can improve vector instructions by reducing memory accesses and maximizing resource utilization.

2. Integrate the scalar instructions

Scalar instructions of the same type should be integrated as a batch, since this reduces the overhead of reconfiguring the pipeline again and again.

3. Algorithm

Choose an algorithm that works faster with vector pipelined processing.

4. Vectorizing Compiler

A vectorizing compiler must regenerate the parallelism expressed in the higher-level programming language. In advanced programming, four stages are identified in the development of parallelism. These are:

 Parallel algorithm (A)
 High-level language (L)
 Efficient object code (O)
 Target machine code (M)

The parameter in parentheses at each stage denotes the degree of parallelism. In the ideal situation, the parameters are expected to satisfy A ≥ L ≥ O ≥ M.

Key Takeaways

 Computers having vector instructions are vector processors.

 A vector processor's vector instructions operate on large arrays of integers, floating-point numbers, logical values, or characters, processing all elements in parallel. This is called vectorization.

 Vectorization is possible only if the operations performed in parallel are independent of each other.

 Operands of a vector instruction are stored in vector registers. A vector register stores several data elements at a time; this group of elements is called a vector operand.

 A vector operand consists of several scalar data elements.

 A vector instruction performs the same operation on different data sets. Hence, vector processors have a pipelined structure.

 Vector processing avoids the overhead caused by loops while operating on an array.

So this is how vector processing allows parallel operations on large arrays and speeds up processing.
