SRM Institute of Science and Technology
College of Engineering and Technology
School of Computing
Mode of Exam: OFFLINE
(Common to all Branches)
DEPARTMENT OF COMPUTING TECHNOLOGIES
SRM Nagar, Kattankulathur – 603203, Chengalpattu District, Tamilnadu
Academic Year: 2023-24 (ODD)
Test: CLAT3 Date: 8.11.2023
Course Code & Title: 21CSS201T / COA Duration: 100 minutes
Year & Sem: II & III SET C Max. Marks: 50
Course Articulation Matrix:
Course Learning Outcomes (CLO). At the end of this course, learners will be able to:
CO-4: Analyze concepts of parallelism and multi-core processors
CO-5: Classify the memory technologies, input-output systems and evaluate the performance of memory system
CLO   PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12
CO-4  H   -   -   -   -   -   -   -   -   -    -    -
CO-5  H   M   -   -   -   -   -   -   -   -    -    -
Part – A
Instructions: Answer all (10 x 1 = 10 Marks)
Q. No  Question  Marks  BL  CO  PO  PI Code
1. The function of Rin is to 1 1 4 1.3 1.3.1
a. load the data from bus to register
b. place the content of register onto the bus
c. fetch the data
d. execute the data
2 What is the reason for incrementing the PC by 4 in multi-bus 1 2 4 1.3 1.3.1
organization?
a. The word size is 8 bits
b. The word size is 16 bits
c. The word size is 32 bits
d. The word size is 64 bits
3 Y, temp and Z in single bus organization refer to 1 1 4 1.3 1.3.1
a. General purpose registers used by the processor
b. Temporary registers used by processor
c. Accumulator
d. Special purpose registers used by processor
4 How many buffers are required in the hardware organization for a 1 1 4 1.3 1.3.1
2-stage pipeline and a 4-stage pipeline?
a. 2, 1
b. 1, 3
c. 3, 1
d. 2, 4
5 If the state machine moves from LNT to LT in 2-state branch 1 1 4 1.3 1.3.1
prediction algorithm, then
a. Branch is taken
b. Branch is not taken
c. Branch is strongly taken
d. Branch is strongly not taken
6 What is the primary need for parallelism in computing? 1 1 5 1 2.1.2
a) Reduce power consumption
b) Increase performance
c) Decrease memory usage
d) Enhance security
7 Flynn's classification categorizes parallel systems based on what two 1 1 5 1 2.1.2
key parameters?
a) Processing power and memory size
b) Instruction stream and data stream
c) Program control and data flow
d) Data-level parallelism and task-level parallelism
8 What is the primary advantage of the Thumb instruction set in ARM 1 1 5 1 2.1.2
processors?
a) Greater computational power
b) Smaller code size
c) Enhanced multimedia capabilities
d) Improved memory management
9 Which of the following programming challenges is often associated 1 1 5 1 2.1.2
with MIMD architectures?
a) Limited processing power
b) Data synchronization and communication overhead
c) Lack of parallelism
d) Reduced memory capacity
10 Which of the following is a key feature of ARM5 architecture? 1 1 5 1 2.1.2
a. 64-bit instruction set
b. SIMD (Single Instruction, Multiple Data) support
c. Variable-length instruction encoding
d. Superscalar execution
Part – B
Instructions: Answer any 4 ( 4 x 4 = 16 Marks)
11 State the role of following control signals. MARin, MARout
Signal MARin controls the connection to the internal processor address bus and signal MARout controls
the connection to the memory address bus.
12 Analyze the advantages and disadvantages of hardwired and micro programmed control?
Advantages of Hardwired Control Unit:
1. It is fast, because control signals are generated directly by combinational circuits.
2. The delay in generating control signals depends only on the number of gates in the signal path.
3. It can be optimized to produce a fast mode of operation.
4. It is faster than a microprogrammed control unit.
5. It does not require control memory.
Disadvantages of Hardwired Control Unit:
1. The complexity of the design increases as more control signals have to be generated (more encoders and
decoders are needed).
2. Modifying the control signals is very difficult because it requires rearranging wires in the hardware
circuit.
3. Adding a new feature is difficult and complex.
4. It is difficult to test and to correct mistakes in the original design.
5. It is expensive.
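Advantages of Microprogrammed Control Unit:
1. Control signals are produced by microinstructions stored in control memory, so the design is more
systematic than a hardwired design.
2. It is flexible: control behaviour can be changed or extended by modifying the microprogram instead of
rewiring hardware.
3. It is comparatively easier to identify and fix errors than in a hardwired unit.
Disadvantages of Microprogrammed Control Unit:
1. It generally executes instructions at a slower speed, because each control word has to be fetched from
control memory.
2. It requires a control memory, which a hardwired unit does not need.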
13 Outline the two state diagram for dynamic branch prediction.
• State 1: LT: Branch is likely to be taken
• State 2: LNT: Branch is likely not to be taken
• 1. In state LNT, if the branch is taken, the machine moves to LT; otherwise it remains in state LNT.
• 2. In state LT, if the branch is taken, the machine remains in LT; otherwise it moves to LNT.
• 3. The branch is predicted as taken if the corresponding state machine is in state LT; otherwise it is
predicted as not taken.
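The two-state scheme can be sketched in a few lines of Python. This is only an illustrative model, not part of the expected answer: the names State, predict, next_state and run are hypothetical, and the sketch assumes the predictor starts in LNT and that the state simply follows the most recent branch outcome.

```python
from enum import Enum


class State(Enum):
    LNT = "likely not taken"
    LT = "likely taken"


def predict(state):
    """Predict 'taken' only when the machine is in state LT."""
    return state is State.LT


def next_state(state, taken):
    """Move to LT after a taken branch, to LNT after a not-taken branch."""
    return State.LT if taken else State.LNT


def run(outcomes, start=State.LNT):
    """Replay actual branch outcomes; return final state and correct predictions."""
    state = start
    correct = 0
    for taken in outcomes:
        if predict(state) == taken:
            correct += 1
        state = next_state(state, taken)
    return state, correct


if __name__ == "__main__":
    # Branch history: taken, taken, not taken, taken
    print(run([True, True, False, True]))
```

For the history taken, taken, not taken, taken, only the second prediction is correct, which shows how a single-bit history mispredicts twice around one changed outcome.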
14 Devise a data-level parallelism based algorithm for adding the elements of an array of size n using p
processors with the following properties:
i) It should be optimal in terms of speedup, i.e., the speedup should be as close to p as possible.
ii) It should be efficient in terms of communication overhead, i.e., the amount of communication
between processors should be minimized
iii) It should be scalable to large values of n and p.
Step 1:
Divide the array into p equal segments. This can be done by assigning each processor a segment of the array to
process
Step 2 :
Each processor adds the elements of its assigned segment. This is a local operation that does not require any
communication between processors
Step 3:
The processors communicate with each other to reduce the partial sums to a single processor. This can be done
using a variety of techniques, such as a tree reduction or a ring reduction
Step 4:
The processor that receives the final sum returns it to all other processors. This is a broadcast operation that can
be done efficiently using a variety of techniques, such as a tree broadcast or a butterfly broadcast.
Speedup: The speedup of this algorithm is close to p for large values of n (n much larger than p). Each
processor adds about n/p elements locally, and a tree reduction adds only about log2 p further steps, so the
work is divided evenly among the processors and little time is spent on communication.
Communication overhead: The communication overhead of this algorithm is low. The only communication that
occurs is during the reduction and broadcast operations. Each processor sends a single partial sum rather than
array elements, so the amount of data communicated does not depend on n, and with a tree reduction the number
of communication steps grows only as log2 p.
Scalability: This algorithm is scalable to large values of n and p. The algorithm can be implemented efficiently
on a wide range of parallel architectures, such as distributed memory systems, shared memory systems, and
GPU clusters.
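Steps 1 to 3 can be illustrated with a small sequential simulation in Python. It is a sketch under stated assumptions, not a real parallel program: the p processors are simulated by a loop, p is assumed to be at least 1, and the function names split_segments, tree_reduce and parallel_sum are hypothetical.

```python
def split_segments(data, p):
    """Step 1: divide the array into p nearly equal contiguous segments."""
    base, extra = divmod(len(data), p)
    segments, start = [], 0
    for i in range(p):
        size = base + (1 if i < extra else 0)
        segments.append(data[start:start + size])
        start += size
    return segments


def tree_reduce(partials):
    """Step 3: combine partial sums pairwise; return (total, number of steps)."""
    values = list(partials)
    steps, stride = 0, 1
    while stride < len(values):
        # In a real system, all additions within one step happen in parallel.
        for i in range(0, len(values) - stride, 2 * stride):
            values[i] += values[i + stride]
        stride *= 2
        steps += 1
    return values[0], steps


def parallel_sum(data, p):
    """Simulate the algorithm: local sums (step 2), then a tree reduction."""
    partials = [sum(segment) for segment in split_segments(data, p)]
    return tree_reduce(partials)


if __name__ == "__main__":
    total, steps = parallel_sum(list(range(1, 101)), 8)
    print(total, steps)
```

With 8 simulated processors the reduction finishes in 3 steps (log2 8), while each processor adds only 12 or 13 elements locally.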
15 Compare and contrast the memory unit performance of ARM5 and ARM7 architectures in terms of their
cache hierarchy, memory bus width, memory interfacing and instruction sets
Instruction Sets:
ARM5: ARM5 is an early iteration of the ARM architecture, introduced in the mid-1980s. It featured a 32-bit
RISC (Reduced Instruction Set Computer) architecture, which means it used a reduced and optimized set of
instructions to perform operations. ARM5 included 16 general-purpose registers, and its instruction set was
relatively simple compared to later versions
ARM7: ARM7 is a more advanced iteration of the ARM architecture, introduced in the early 1990s. It also used
a 32-bit RISC architecture and had 37 registers in total (31 general-purpose registers and 6 status
registers). ARM7 expanded the instruction set with additional features, including conditional execution,
load/store multiple, and a wider variety of addressing modes.
Performance of Memory Unit
Cache Hierarchy: ARM7 architecture features a more advanced cache hierarchy compared to ARM5. ARM7
typically has separate instruction and data caches with larger sizes, resulting in better cache performance. In
contrast, ARM5 may have smaller or unified caches, which can lead to more cache misses and slower memory
access.
Memory Bus Width: ARM7 architectures often have wider memory buses, allowing for faster data transfer
between the CPU and memory. This wider bus enables ARM7 to access data from memory more quickly than
ARM5, which may have a narrower memory bus.
Memory Interfacing: ARM7 architectures are designed to interface with a wider range of memory technologies,
including newer memory types like DDR and DDR2, with improved memory controller units. This capability
enables ARM7 to take full advantage of the latest memory technologies, resulting in faster memory unit
performance, while ARM5 may have limitations in this regard.
PART C
Instructions: Answer all (12 x 2 = 24 Marks)
16. A Aravind's semester-wise top 3 course marks are stored in 3 registers, namely R1, R2 and R3. Specify the
instruction for computing the summation of all marks and storing it in R1 itself. Write the complete set of
control sequences for the above operation.
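In the three-bus organization, buses A and B carry the two ALU operands and bus C carries the result, so
adding three registers takes two add steps, with the result written back to R1 each time.
1 PCout, R=B, MARin, Read, IncPC
2 WMFC
3 MDRoutB, R=B, IRin
4 R1outA, R2outB, SelectA, Add, R1in
5 R1outA, R3outB, SelectA, Add, R1in, End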
OR
16. B Explain with a scenario how unconditional branching can cause pipeline to stall with suitable example.
• Consider a sequence of instructions executed in a two-stage pipeline, where instructions I1 to I3 are
stored at consecutive memory addresses and instruction I2 is a branch instruction.
• When the branch is taken, the new PC value (the branch target) is not known until the end of I2.
• Meanwhile the next instruction, I3, is fetched even though it is not required.
• Hence it has to be flushed after the branch is taken, and a new set of instructions has to be fetched from
the branch address.
Branch penalty
• The time lost as the result of branch instruction
Reducing the penalty
• The branch penalties can be reduced by proper scheduling using compiler techniques.
• For longer pipeline, the branch penalty may be much higher
• Reducing the branch penalty requires branch target address to be computed earlier in the pipeline
• Instruction fetch unit must have dedicated hardware to identify a branch instruction and compute
branch target address as quickly as possible after an instruction is fetched
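The effect of the branch penalty on execution time can be estimated with a short Python sketch. It rests on simplifying assumptions that are not part of the answer above: every stage takes one cycle, there are no other hazards, and each taken branch discards the (stages - 1) instructions fetched behind it. The function name pipeline_cycles is hypothetical.

```python
def pipeline_cycles(num_instructions, taken_branches, stages=2):
    """Estimate total cycles for an idealized pipeline with branch penalties.

    Assumes one cycle per stage and that the branch target is known only
    when the branch leaves the last stage, so each taken branch wastes
    (stages - 1) cycles on instructions that must be flushed.
    """
    ideal = stages + (num_instructions - 1)
    penalty = taken_branches * (stages - 1)
    return ideal + penalty


if __name__ == "__main__":
    # I1, I2 (branch) and the target instruction: three useful instructions.
    print(pipeline_cycles(3, 0))             # no branch taken
    print(pipeline_cycles(3, 1))             # one taken branch, I3 flushed
    print(pipeline_cycles(3, 1, stages=4))   # longer pipeline, larger penalty
```

Under these assumptions the two-stage case loses one cycle per taken branch, while the four-stage case loses three, which matches the remark that longer pipelines have a higher branch penalty.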
17 A. Explain about Flynn’s Classification with an example.
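SISD: Single Instruction stream, Single Data stream. Example: a conventional uniprocessor.
SIMD: Single Instruction stream, Multiple Data streams. Example: vector and array processors.
MISD: Multiple Instruction streams, Single Data stream. Rarely implemented in practice.
MIMD: Multiple Instruction streams, Multiple Data streams. Example: multi-core processors and multiprocessor
systems.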
OR
17 B Consider a processor with 64 registers and an instruction set of size twelve. Each instruction has five
distinct fields, namely, opcode, two source register identifiers, one destination register identifier, and a
twelve-bit immediate value. Each instruction must be stored in memory in a byte-aligned fashion. If a
program has 100 instructions, how much amount of memory (in bytes) is consumed by the program text?
The instruction consists of an opcode and operands. With an instruction set of size 12, 4 bits are
required for the opcode (2^3 = 8 < 12 <= 2^4 = 16).
As there are 64 registers in total, 6 bits are required to identify a register (2^6 = 64).
As the instruction contains 3 register identifiers (2 source + 1 destination), 3 * 6 = 18 bits are required
for register identifiers.
12 bits are required for the immediate value, as given.
Total bits for an instruction = 4 + 18 + 12 = 34 bits
The instructions must be stored in a byte-aligned fashion. The nearest byte boundary
after 34 bits is at 40 bits (5 bytes).
Hence, for 100 instructions, the memory required is 5 * 100 = 500 bytes.
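The same arithmetic can be checked with a short Python sketch. The function names bits_needed and program_text_bytes are hypothetical, and the sketch assumes fixed-length instructions with each field encoded in the minimum number of bits.

```python
def bits_needed(count):
    """Minimum number of bits needed to distinguish `count` different values."""
    return max(1, (count - 1).bit_length())


def program_text_bytes(num_opcodes, num_registers, register_fields,
                       immediate_bits, num_instructions):
    """Return (bits per instruction, bytes per instruction, total bytes)."""
    opcode_bits = bits_needed(num_opcodes)          # 12 opcodes -> 4 bits
    register_bits = bits_needed(num_registers)      # 64 registers -> 6 bits
    total_bits = opcode_bits + register_fields * register_bits + immediate_bits
    bytes_per_instruction = (total_bits + 7) // 8   # round up to a whole byte
    return total_bits, bytes_per_instruction, bytes_per_instruction * num_instructions


if __name__ == "__main__":
    print(program_text_bytes(num_opcodes=12, num_registers=64,
                             register_fields=3, immediate_bits=12,
                             num_instructions=100))
```

Rounding up with (total_bits + 7) // 8 is what models the byte-alignment requirement: 34 bits do not fit in 4 bytes, so each instruction occupies 5 bytes.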