Long Questions (16 marks)
1. Describe a parallel algorithm to find the sum of all the integers in a list of integers. Explain
   how you would divide the work among multiple processors and how the results would be
   combined.
           The goal is to divide the work among multiple processors so that they concurrently
   compute partial sums of the array, and then combine these partial sums to produce the final
   result. The process combines divide-and-conquer with a reduction. The general algorithm for this
   procedure is as follows:
                   Input: Array A of size N, Number of processors P
                   Step 1: Divide the array into P parts (assume N is divisible by P)
                     chunk_size = N / P
                   Step 2: Assign each processor a chunk and compute its partial sum
                     for i = 0 to P-1 in parallel:
                         start_index = i * chunk_size
                         end_index = (i + 1) * chunk_size - 1
                         sum_i = sum(A[start_index : end_index]) // each processor computes one partial sum
                   Step 3: Gather all partial sums into an array S
                     S = [sum_0, sum_1, ..., sum_(P-1)]
                   Step 4: Sum the partial sums sequentially to obtain the final result
                     total_sum = sum(S)
                   Output: total_sum
           Let’s assume that we have an array A of 20 integers, and that 4 processors are
   available.
           A = [3, 7, 1, 5, 9, 2, 8, 4, 6, 10, 15, 12, 18, 11, 20, 17, 13, 14, 19, 16]
           To find the sum of all 20 integers in an array (A[20]) using a parallel algorithm, we can
   divide the work among multiple processors as follows:
      Divide the Work: Split the array of 20 integers into subsets based on the number of
       processors available. Each processor will handle a portion of the array.
      Calculate Partial Sums: Each processor computes the sum of its assigned subset of
       integers. For example, with 4 processors, the array A can be divided into 4 parts,
       each processor computing the sum of 5 integers.
      Combine Partial Results: After each processor has computed its partial sum, gather
       these partial sums into a list.
      Sequential Summation: Finally, sum up the list of partial sums sequentially to get the
       total sum of all 20 integers.
   (1) Divide the Work:
          Processor 1: Handles A[0] to A[4]
          Processor 2: Handles A[5] to A[9]
          Processor 3: Handles A[10] to A[14]
          Processor 4: Handles A[15] to A[19]
   (2) Calculate Partial Sums:
          Processor 1 computes: sum1 = 3 + 7 + 1 + 5 + 9 = 25
          Processor 2 computes: sum2 = 2 + 8 + 4 + 6 + 10 = 30
          Processor 3 computes: sum3 = 15 + 12 + 18 + 11 + 20 = 76
          Processor 4 computes: sum4 = 17 + 13 + 14 + 19 + 16 = 79
   (3) Combine Partial Results:
          Gather the partial sums into an array: S = [25, 30, 76, 79]
   (4) Sequential Summation:
          Sum up the elements of array S sequentially:
           25 + 30 = 55
           55 + 76 = 131
           131 + 79 = 210
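The four steps above can be sketched in runnable form. This is a minimal illustration, using a Python thread pool to stand in for the 4 processors; the chunking uses ceiling division so it also works when N is not divisible by P:

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_sum(A, P):
    """Sum A by splitting it into P chunks, summing each chunk in a
    separate worker, then reducing the partial sums sequentially."""
    chunk_size = (len(A) + P - 1) // P  # ceiling division handles N % P != 0
    chunks = [A[i * chunk_size:(i + 1) * chunk_size] for i in range(P)]
    with ThreadPoolExecutor(max_workers=P) as pool:
        partial_sums = list(pool.map(sum, chunks))  # Step 2: partial sums in parallel
    return sum(partial_sums)                        # Step 4: sequential reduction

A = [3, 7, 1, 5, 9, 2, 8, 4, 6, 10, 15, 12, 18, 11, 20, 17, 13, 14, 19, 16]
print(parallel_sum(A, 4))  # → 210
```

Note that Python threads are used here only to mirror the structure of the algorithm; on real multiprocessor hardware each chunk would run on a separate physical processor.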
       By dividing the work among processors, each processor independently computes a
smaller sum, which reduces the overall computational time compared to a single processor
handling all computations sequentially.
       After obtaining the partial sums from each processor, combining them into a single list is
straightforward and efficient.
       The final step, summing up the list of partial sums sequentially, ensures that all
processors' results are integrated to give the total sum of the original array. This approach
   optimizes the computation by leveraging parallel processing capabilities to handle larger data
   sets efficiently.
           Parallel algorithms are used to solve real-world problems.
   Real-World Applications:
   (1) Database Sorting: In large databases, where the dataset is enormous, parallel quicksort can
       be used to quickly sort data.
   (2) Distributed Systems: In cloud-based systems, where data is distributed across multiple
       machines, parallel quicksort allows sorting tasks to be distributed among available processors
       to achieve faster data retrieval and analysis.
   (3) Scientific Computing: In simulations and numerical analysis, large-scale matrix multiplications
       are often required. Parallel matrix multiplication helps in accelerating these simulations.
   (4) Machine Learning: Many machine learning algorithms, such as training neural networks,
       require operations involving large matrices (e.g., weight matrices, input-output matrices).
       Parallel matrix multiplication speeds up these operations, allowing faster model training.
2. Prove that a processor with k-stage pipeline can be at most k times faster than non-pipelined
   processor.
           Pipelined processor: Instructions are broken down into stages, and multiple instructions
   are processed concurrently, each in a different stage.
           Non-pipelined processor: Instructions are executed one at a time, completing all stages
   before the next instruction begins.
   Step 1: Execution Time in a Non-Pipelined Processor
   In a non-pipelined processor, each instruction is executed sequentially, meaning the next
   instruction starts only after the current one completes.
   - Let T be the time taken to execute one instruction.
   - For n instructions, the total execution time is:
   Tnon-pipelined = nT
   Step 2: Execution Time in a k-Stage Pipelined Processor
A k-stage pipeline splits the execution of each instruction into k stages, each taking T/k
time (so a single instruction still takes T in total). The pipeline operates as follows:
    1. The first instruction takes T time to complete (it must pass through all k stages,
       each taking T/k).
    2. After that, each subsequent instruction completes in T/k time, as the pipeline is filled.
    3. Thus, for n instructions, the total execution time is:
Tpipelined = T + (n−1)T/k
Simplifying:
Tpipelined = (n+k−1)T/k
For large n, the term k−1 becomes negligible, so we approximate:
Tpipelined ≈ nT/k
Step 3: Speedup Calculation
Speedup S is defined as the ratio of execution time in a non-pipelined processor to that in a
pipelined processor:
S = Tnon-pipelined / Tpipelined
Substituting the values:
S = nT / (nT/k) = k
         Thus, under ideal conditions, a k-stage pipeline can be at most k times faster than a
non-pipelined processor.
         However, in real-world scenarios, factors such as pipeline stalls, data hazards, and branch
mispredictions often reduce the actual speedup below this theoretical maximum.
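The bound can be checked numerically. The exact (not approximated) speedup is S = nk/(n+k−1); a small sketch evaluating it for a hypothetical 5-stage pipeline shows it approaching, but never exceeding, k:

```python
def pipeline_speedup(n, k, T=1.0):
    """Speedup of a k-stage pipeline over a non-pipelined processor for n
    instructions: S = nT / ((n + k - 1) * T / k) = nk / (n + k - 1)."""
    t_non_pipelined = n * T              # sequential: n instructions, T each
    t_pipelined = (n + k - 1) * (T / k)  # fill time, then one result per T/k
    return t_non_pipelined / t_pipelined

# Speedup grows with n but stays below k (here k = 5):
for n in (5, 100, 100000):
    print(n, round(pipeline_speedup(n, 5), 4))
```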
3. ARRAY PROCESSORS
         A synchronous array of parallel processors is called an array processor, which consists of
  multiple processing elements (PEs) under the supervision of one control unit (CU). An array
  processor can handle single instruction and multiple data (SIMD) streams. SIMD machines are
  specifically designed to perform vector computations over matrices and arrays of data.
          SIMD computers appear in two basic architectural organizations: array processors, using
   random-access memory, and associative processors, using content-addressable (or associative)
   memory.
   SIMD Computer Organization
         In general, an array processor assumes the configuration as shown in Figure. The
  processor is structured with N synchronized PEs, all of which are under the control of one CU.
   Each PE_i is essentially an ALU with attached working registers and local memory, PEM_i, which
  collectively form the storage space for the distributed data. The CU has its own main memory for
  the storage of programs. The operating system and user programs are executed under the control
  of the CU. The user programs are loaded into the CU memory from an external source. The
  function of CU is to decode all the instructions and determine where the decoded instructions
  should be executed. Scalar and control-type instructions are executed directly by the CU. Vector
  instructions are broadcast to the PEs for distributed execution to achieve spatial parallelism
  through duplicated arithmetic units.
         All the PEs perform the same function synchronously in a lock-step fashion under the
  command of the CU. Vector operands are distributed to the PEMs before parallel execution in
   the array of PEs. The distributed data can be loaded into the PEMs from an external source via
   the system data bus, or broadcast by the CU over the control bus before execution.
         An array processor is normally interfaced to a host computer through the control unit.
  The host computer is a general-purpose machine which serves as a coordinator of the entire
  system, consisting of the host system and the processor array. The functions of a host computer
  include resource management and peripheral and I/O supervision. The control unit of the
  processor array directly supervises the execution of programs.
4. The additional requirements of a multiprocessor operating system
          The additional requirements of a multiprocessor operating system are enumerated below
   (An operating system that fails to perform well in these respects tends to negate other advantages
   which are associated with multiprocessing):
   (1) Load balancing
          The operating system must utilize the resources efficiently. This is commonly expressed
   in terms of achieving a uniform balance of loads across the processors. The operating system
   should schedule the subtasks such that there are no idle resources including the processors.
   (2) Scheduling cooperating processes
          Parallel programs which consist of concurrently executing tasks must be scheduled such
   that collectively they are able to use the resources in the machine required to solve the given
   problem. Scheduling half the subtasks of one parallel program and half the subtasks of another
   parallel program which cannot cooperate can be wasteful. If a parallel program needs all its
   subtasks to be running and cooperating, then scheduling only half of the tasks can lead to
   starvation.
   (3) Graceful degradation in case of failure of one of the resources:
          Given that a parallel machine has multiple resources of the same kind, one must expect a
   higher degree of fault tolerance from it. Failure of one of its resources should not result in a
   catastrophic system crash. The operating system should be able to reschedule the task that had
   been running on the failed resource and continue the parallel program. Ideally, there should be
   only a fractional degradation in the performance of the parallel machine.
   (4) Communication schemes:
          Most parallel programs need to share data and intermediate results across subtasks during
   the processing towards the solution of the problem. To achieve effective cooperation among the
   subtasks of a parallel program, the operating system must provide adequate facilities for
   communication between tasks. These facilities vary depending on whether the machine is of a
   shared memory type or of a distributed memory type.
   (5) Synchronization mechanisms:
          To ensure integrity of the shared data across the subtasks of a parallel program,
   synchronization between tasks is required. The shared data must be accessed under mutual
  exclusion (implemented using a mechanism like semaphores). The tasks may need to wait till
  some state is reached across all the tasks of the parallel program. The operating systems need to
  provide signaling mechanisms for such synchronization requirements.
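As a concrete illustration of requirement (5), here is a minimal sketch (not from the source; standard Python threading is used as a stand-in) in which cooperating subtasks update shared data under mutual exclusion using a lock, which behaves like a binary semaphore:

```python
import threading

counter = 0              # shared data across the subtasks
lock = threading.Lock()  # mutual-exclusion mechanism (a binary semaphore)

def add_partial(value):
    global counter
    with lock:           # only one task may update the shared data at a time
        counter += value

# 100 cooperating subtasks of a "parallel program"
threads = [threading.Thread(target=add_partial, args=(v,)) for v in range(1, 101)]
for t in threads:
    t.start()
for t in threads:        # wait until every subtask has reached its final state
    t.join()
print(counter)  # → 5050
```

Without the lock, the read-modify-write on `counter` could interleave across tasks and lose updates, which is exactly the integrity problem synchronization mechanisms exist to prevent.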
5. P-RAM
          The random access machine (RAM), not to be confused with random access memory, is a
   model of a one-address computer. A RAM consists of a memory, a read-only input tape, a write-
  only output tape, and a program. The program is not stored in the memory and cannot be
  modified. The input tape consists of a sequence of integers. Every time an input value is read, the
  input head advances one square.
          The P-RAM (parallel RAM) consists of a control unit, global memory, and an unbounded
  set of processors, each with its own private memory. Every processor has a unique index; this
  value can be used to address the processor for passing signals and interrupts. All the processors
  share access to global memory which is unbounded in capacity. Execution of instructions can be
   started on any of the processors at any time, starting at any location of memory (in its local
   or in the global memory).
          A processor can cause the start of execution of some task on some other processor
  through the signalling mechanism. This signalling mechanism can be available through hardware
  or software, or both. This model represents the capabilities of an MIMD machine for parallel
  computations. Several simplifying assumptions are made while considering the development of
  algorithms for such a machine. They are:
  (1) There is no limit on the number of processors in the machine.
  (2) Any memory location is uniformly accessible from any processor. (Uniformly in terms of
      time required for the access.)
  (3) There is no limit on the amount of shared memory in the system.
  (4) Resource contention is absent. A processor does not have to wait for reading a memory
      location till the other processor has finished accessing the location.
  (5) The programs written on these machines are, in general, of type MIMD. Certain special cases
      such as SIMD may also be handled in such a framework.
         While mapping an algorithm developed on such a machine to real machines, some
  constraints are applied to modify the implementation aspects of the algorithm. Formally, such
  constraints are expressed in the following terms:
   (1) EREW (Exclusive Read Exclusive Write): Read or write conflicts are not allowed.
   (2) CREW (Concurrent Read Exclusive Write): Concurrent reading is allowed; however, write
       conflicts are not allowed.
   (3) CRCW (Concurrent Read Concurrent Write): Concurrent reading and concurrent writing are
       allowed. Such a model, in general, is of little practical value.
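To make the EREW constraint concrete, the sum of n numbers can be computed in O(log n) parallel rounds with no read or write conflicts: in every round each (simulated) processor touches a disjoint pair of memory cells. A minimal sequential sketch of this tree reduction:

```python
def erew_tree_sum(values):
    """Simulated EREW PRAM sum: in each round, processor i combines a
    disjoint pair of cells, so no cell is read or written by more than
    one processor in the same round (exclusive read, exclusive write)."""
    cells = list(values)
    stride = 1
    while stride < len(cells):
        # every iteration of this loop is one conflict-free parallel round
        for i in range(0, len(cells) - stride, 2 * stride):
            cells[i] += cells[i + stride]
        stride *= 2
    return cells[0]

print(erew_tree_sum(list(range(1, 21))))  # → 210
```

On a CREW machine the same reduction is legal as well; the EREW discipline simply shows that even the strictest model supports a logarithmic-time sum.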
6. The Configuration of OS (see the handout provided by Tr. YYA)
   Short Questions (8 marks)
1. What is a multiprocessor system, and how does it differ from a single-processor system?
          A multiprocessor system is a computer system with two or more processors that share a
   common physical memory and work together to execute tasks. These processors can execute
   multiple instructions simultaneously, making them suitable for parallel processing tasks.
           A single-processor system, on the other hand, contains only one processor that executes
   one instruction at a time. While it can multitask using scheduling techniques, it does not achieve
   true parallelism like a multiprocessor system.
   Differences:
     Aspect                  Multiprocessor System                    Single-Processor System
     Number of processors    Multiple                                 One
     Parallelism             True parallelism                         Simulated through multitasking
     Performance             Higher, due to parallel tasks            Limited by sequential execution
     Cost                    Higher                                   Lower
     Applications            High-performance computing, databases    Personal computers, basic tasks
   Example:
          A multiprocessor system: A supercomputer used for weather prediction (e.g., IBM
           Summit).
          A single-processor system: A basic laptop performing web browsing.
2. Explain how multiple functional units contribute to parallelism in sequential machines.
           Multiple functional units in a sequential machine enable the system to perform different
   operations (e.g., addition, multiplication) simultaneously, thereby increasing throughput. These
   units include Arithmetic Logic Units (ALUs), Floating-Point Units (FPUs), and other specialized
   processors.
   Functions:
          A CPU with multiple functional units divides tasks between these units.
          While one unit processes an addition operation, another performs multiplication or a
           memory fetch.
   Example:
          Consider a CPU executing two instructions:
              1. Addition (handled by the ALU).
              2. Data fetching from memory (handled by a Memory Management Unit).
          The tasks are performed simultaneously instead of sequentially, reducing execution time.
           Multiple functional units within a processor core enable parallelism in sequential
   machines by allowing different operations to be executed simultaneously on separate data
   streams, effectively overlapping the execution of instructions. This exploits instruction-level
   parallelism (ILP) within a single core: multiple units can work on different parts of a task at
   the same time, even though the instructions are issued sequentially.
   Example
           Imagine a processor with separate functional units for integer addition, floating-point
    multiplication, and logical operations.
           If an instruction sequence includes an addition, a multiplication, and a logical AND, these
   operations can be executed simultaneously on their respective units, significantly reducing the
   overall execution time compared to a single functional unit processing them sequentially.
3. Real-world applications (in algorithm).
4. Data Parallel Model
           The programming model for data-parallel SIMD processors is an extension of
   sequential programming. For example, Fortran 90 is specifically tailored for data parallelism.
   The language compiler for a specific machine must translate the source code into executable
   object code, which will exploit the parallelism in the hardware.
           Data-parallel programs require the use of pre-distributed data sets. Thus the choice of
   parallel data structure plays a significant role in data-parallel programming.
          Synchronization of data-parallel programs is done at compile-time rather than at run-
   time. Hardware optimization is enforced by the control unit to carry out lockstep execution of
   SIMD programs. The compiler must be aware of the underlying interconnection topology of PEs
   to generate optimal code for the array processor.
5. Parallel processing
          Parallel processing is the simultaneous execution of multiple tasks or instructions to
   improve computational speed and efficiency. It is widely used in high-performance computing,
   AI, and large-scale data processing. The types of parallelism encountered in computing include
   look-ahead, pipelining, vectorization, concurrency, data parallelism, partitioning, interleaving,
   overlapping, replication, time sharing, space sharing, multitasking, multiprogramming,
   multithreading, and distributed computing.
          The most well-known classification of computer architectures, Flynn's taxonomy, is as follows:
   (1) SISD: This is the conventional von Neumann computer. Only a single stream of instructions
       and a single stream of data is involved.
   (2) SIMD: Here we have only one instruction stream, but multiple data streams are active at a
       time. This is typically done by replicating arithmetic units in the CPU and allowing the
       different units to refer to different operands, but follow a common instruction. This model is
       known as data parallelism. Vector computers and Array processors are examples of this
       class.
   (3) MISD: This is the opposite of SIMD. By definition this means multiple instructions on a
       single stream of data. This appears counterintuitive, and hence many people believe that
       MISD is an impossible class. However, there are some who would place the systolic array
       architecture in this class.
(4) MIMD: The other end of the spectrum, of course, would be to let loose both the streams.
    There are multiple instructions and multiple data. An arbitrary combination of the instruction
    and data streams is possible. This is the most general model in Flynn's classification.