3.2 - Foster Methodology Ch2
Methodical Design
Methodology structures
1. Partitioning. The computation that is to be performed and the data operated on by this
computation are decomposed into small tasks. Practical issues such as the number of
processors in the target computer are ignored, and attention is focused on recognizing
opportunities for parallel execution.
2. Communication. The communication required to coordinate task execution is
determined, and appropriate communication structures and algorithms are defined.
3. Agglomeration. The task and communication structures are evaluated with respect to
performance requirements and implementation costs. If necessary, tasks are combined
into larger tasks to improve performance or to reduce development costs.
4. Mapping. Each task is assigned to a processor in a manner that attempts to satisfy the
competing goals of maximizing processor utilization and minimizing communication costs.
Mapping can be specified statically or determined at runtime by load-balancing
algorithms.
Partitioning
One of the first steps in designing a parallel program is to break the problem
into discrete "chunks" of work that can be distributed to multiple tasks.
This is known as decomposition or partitioning.
1. Partition the problem into concurrent sub-problems.
2. Solve each sub-problem independently.
3. Combine the partial solutions into a solution of the initial problem.
There are two basic ways to partition computational work among parallel tasks:
domain decomposition
functional decomposition.
Domain Decomposition
In this approach, the data associated with the problem are decomposed. Each parallel
task then works on a portion of the data.
Data partition
Basic idea:
Each array element can be recomputed independently of all others.
The degree of concurrency is huge and usually exceeds the number of available processing elements.
Combine individual elements into groups.
Associate each group with one processing element.
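A minimal sketch of this idea (my illustration, not from the slides): an array of SIZE elements is cut into contiguous blocks, and each processing element recomputes only its own block.

#include <stdio.h>

#define SIZE   1000               /* number of array elements (assumed)      */
#define NPROCS 4                  /* number of processing elements (assumed) */

double a[SIZE];

/* Recompute the block of elements owned by processing element 'id'.
   Each element depends only on its own index, so the blocks are independent. */
void update_block(int id)
{
    int chunk = SIZE / NPROCS;
    int lo = id * chunk;
    int hi = (id == NPROCS - 1) ? SIZE : lo + chunk;   /* last PE takes the remainder */
    for (int i = lo; i < hi; i++)
        a[i] = 2.0 * i;           /* stand-in for the real per-element computation */
}

int main(void)
{
    /* In a real program each call would run on its own processing element. */
    for (int id = 0; id < NPROCS; id++)
        update_block(id);
    printf("a[SIZE-1] = %f\n", a[SIZE - 1]);
    return 0;
}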
Domain Decomposition on Trees
Basic idea:
Data are distributed across the upper branches of the tree.
Each processing element holds a complete subtree.
Examples:
Search trees.
Problems:
The size and structure of the tree are not known in advance.
The tree must be kept balanced.
Domain Decomposition on Irregular Meshes
Basic idea:
Points are distributed irregularly.
Each point has some maximum number of neighbours.
New points may be introduced.
Problems:
Very irregular data structure.
Difficult to keep the load balanced among processing elements.
Difficult to find good distributions.
Functional Decomposition
The focus is on the computation that is to be performed rather than on the data
manipulated by the computation.
The problem is decomposed according to the work that must be done.
Each task performs a portion of the overall work.
Functional Decomposition
Example: a coupled climate model.
• The atmosphere model generates wind velocity data that are used by the ocean model.
• The ocean model generates sea surface temperature data that are used by the atmosphere model.
• And so on.
Characteristics:
A task is either solved sequentially or recursively split into subtasks (see the sketch after the examples below).
Subtasks are computed in parallel.
The task waits for its subtasks to finish.
The task then combines the partial results of its subtasks.
Examples:
Quick Sort.
All divide-and-conquer algorithms.
Combinatorial optimization.
Integer optimization.
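A minimal sketch of this task structure (my illustration, not from the slides), using a recursive array sum; the two recursive calls are the subtasks that could run in parallel:

#include <stdio.h>

#define CUTOFF 4   /* below this size the task is solved sequentially (assumed) */

/* Sum a[lo..hi) by divide and conquer. */
double dc_sum(const double *a, int lo, int hi)
{
    if (hi - lo <= CUTOFF) {              /* small task: solve sequentially    */
        double s = 0.0;
        for (int i = lo; i < hi; i++)
            s += a[i];
        return s;
    }
    int mid = lo + (hi - lo) / 2;         /* split into two subtasks           */
    double left  = dc_sum(a, lo, mid);    /* these two calls are independent   */
    double right = dc_sum(a, mid, hi);    /* and could be computed in parallel */
    return left + right;                  /* wait for both, combine results    */
}

int main(void)
{
    double a[16];
    for (int i = 0; i < 16; i++) a[i] = i;
    printf("sum = %f\n", dc_sum(a, 0, 16));   /* prints 120.000000 */
    return 0;
}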
Partitioning Design Checklist
Have you identified several alternative partitions? You can maximize flexibility
in subsequent design stages by considering alternatives now. Remember to
investigate both domain and functional decompositions.
Negative answers to these questions may suggest that we have a "bad" design. In this
situation it is risky simply to push ahead with implementation. Instead:
• Use performance evaluation techniques to determine whether the design meets
  our performance goals despite its apparent deficiencies.
• Revisit the problem specification. Particularly in science and engineering
  applications, where the problem to be solved may involve a simulation of a
  complex physical process, the approximations and numerical techniques used
  to develop the simulation can strongly influence the ease of parallel
  implementation.
In some cases, optimal sequential and parallel solutions to the same problem
may use quite different solution techniques.
Communication
In local communication, each task communicates with only a small set of other tasks
(its "neighbors"); in global communication, each task must communicate with many tasks.
Communication Design Checklist
Agglomeration
In the agglomeration stage, the task and communication structures defined in the first
two stages are revisited with a view to obtaining an algorithm that will execute
efficiently on some class of parallel computer.
Agglomeration. Increasing Granularity
If the number of communication partners per task is small, we can often reduce
both the number of communication operations and the total communication volume
by increasing the granularity of our partition, that is, by agglomerating several tasks
into one.
A task's communication requirements are proportional to the "surface" of the subdomain
it holds, while its computation requirements are proportional to the subdomain's "volume".
In a two-dimensional problem,
o the "surface" scales with the problem size (the linear extent of the subdomain), while
o the "volume" scales as the problem size squared.
Hence, the amount of communication performed per unit of computation (the
communication/computation ratio) decreases as task size increases.
This effect is often visible when a partition is obtained by using domain
decomposition techniques.
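As a concrete illustration (my numbers, assuming each agglomerated task owns a square n x n block of a 2-D grid and exchanges only its boundary points with neighbouring tasks each step):

\[
\text{computation per step} \propto n^{2}, \qquad
\text{communication per step} \propto 4n, \qquad
\frac{\text{communication}}{\text{computation}} \propto \frac{4n}{n^{2}} = \frac{4}{n} .
\]

Doubling the block edge n therefore halves the communication/computation ratio.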
Agglomeration Design Checklist
Mapping
Two strategies guide the mapping:
o place tasks that are able to execute concurrently on different processors, so as to enhance concurrency;
o place tasks that communicate frequently on the same processor, so as to increase locality.
These two strategies will sometimes conflict, in which case the design will involve tradeoffs.
Resource limitations may also restrict the number of tasks that can be placed on a single processor.
Mapping
The most complex problems are those in which either the number of tasks or
the amount of computation or communication per task changes dynamically
during program execution.
In the case of problems developed using domain decomposition techniques,
we may use a dynamic load-balancing strategy in which a load-balancing
algorithm is executed periodically to determine a new agglomeration and
mapping.
Load Balancing
Static Load Balancing
For maximum efficiency, domain decomposition should give equal work to each
processor. In building the wall, we can simply give each bricklayer a segment of
equal length.
But things can become much more complicated:
• What if some bricklayers are faster than others? (This is analogous to processors
that run at different speeds.)
Dynamic Load Balancing
How to Achieve Load Balance
Common Task Pool approach
Loop Scheduling Types
simple
partition the iterations evenly among all the available threads.
dynamic
gives each thread chunksize iterations.
Chunksize should be smaller than the number of total iterations divided by
the number of threads. The advantage of dynamic over simple is that dynamic
helps distribute the work more evenly than simple.
interleave
gives each thread chunksize iterations of the loop, which are then assigned to
the threads in an interleaved way.
gss (guided self-scheduling)
gives each processor a varying number of iterations of the loop. This is like
dynamic, but instead of a fixed chunksize, the chunks start large and shrink
toward the end of the loop.
Loops with triangular iteration spaces (for example, over triangular matrices) benefit from gss.
runtime
Tells the compiler that the real schedule type will be specified at run time.
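The schedule names above are compiler-specific; as a hedged analogue, the same policies exist in OpenMP as schedule(static), schedule(dynamic, chunk), schedule(guided, chunk) and schedule(runtime). The sketch below (OpenMP is my substitution, not what the slides describe) shows guided self-scheduling on a triangular loop and a runtime-selected schedule:

#include <stdio.h>
#include <omp.h>

#define N 1000

int main(void)
{
    static double work[N];

    /* "gss"-style scheduling: chunks start large and shrink, which suits
       loops whose iterations get costlier toward one end, e.g. triangular
       loop nests. */
    #pragma omp parallel for schedule(guided, 4)
    for (int i = 0; i < N; i++) {
        double s = 0.0;
        for (int j = 0; j <= i; j++)      /* triangular inner loop */
            s += i * 0.001 + j * 0.002;
        work[i] = s;
    }

    /* "runtime": the actual schedule is taken from OMP_SCHEDULE at run time. */
    #pragma omp parallel for schedule(runtime)
    for (int i = 0; i < N; i++)
        work[i] += 1.0;

    printf("work[N-1] = %f\n", work[N - 1]);
    return 0;
}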
Load balancing. MPMD – Multiple Program, Multiple Data
Characteristics:
Constant number of tasks.
Each task may execute a different program.
Tasks operate on pairwise disjoint data sets.
Load balancing. SPMD – Single Program, Multiple Data
Characteristics:
Constant number of tasks.
Each task executes the same program.
Tasks operate on pairwise disjoint data sets.
Tasks can identify themselves (e.g., via a task or processor id).
Dynamic Task Creation
(Figure: a parent process A dynamically creates child processes B and C, which in turn create further child processes D and E.)
Load balancing. Master – Worker balancing structure
Hierarchical Manager/Worker.
A variant of the manager/worker scheme divides workers into disjoint sets, each
with a submanager.
Workers request tasks from submanagers, which themselves communicate
periodically with the manager and with other submanagers to balance load
between the sets of processors for which they are responsible.
Decentralized Schemes.
There is no central manager: a separate task pool is maintained on each processor,
and idle workers request problems from other processors.
The task pool becomes a distributed data structure that is accessed by the different
tasks in an asynchronous fashion.
A variety of access policies can be defined: a worker may request work from a small
number of predefined "neighbors", or may select other processors at random.
Load balancing. Master – Worker balancing structure
Notice that while this manager will certainly be a bottleneck on large numbers of
processors, it will typically be accessed less frequently than will the manager in a
manager/worker scheduler and hence is a more scalable construct.
Mapping Design Checklist
The following checklist can serve as a basis for an informal evaluation of the
mapping design.
1. If considering an SPMD design for a complex problem, have you also
considered an algorithm based on dynamic task creation and deletion?
The latter approach can yield a simpler algorithm; however, performance
can be problematic.
2. If considering a design based on dynamic task creation and deletion, have you
also considered an SPMD algorithm?
An SPMD algorithm provides greater control over the scheduling of
communication and computation, but can be more complex.
3. If using a centralized load-balancing scheme, have you verified that the
manager will not become a bottleneck?
You may be able to reduce communication costs in these schemes by
passing pointers to tasks, rather than the tasks themselves, to the
manager.
4. If using a dynamic load-balancing scheme, have you evaluated the relative
costs of different strategies?
Be sure to include the implementation costs in your analysis.
Parallel Program Examples
Array Processing
do j = 1,n
do i = 1,n
a(i,j) = fcn(i,j)
end do
end do
Array Processing. Parallel Solution 1
Array Processing. Parallel Solution Implementation
if I am MASTER
initialize the array
send each WORKER info on part of array it owns
send each WORKER its portion of initial array
receive from each WORKER results
else if I am WORKER
receive from MASTER info on part of array I own
receive from MASTER my portion of initial array
# calculate my portion of array
do j = my first column, my last column
do i = 1,n
a(i,j) = fcn(i,j)
end do
end do
send MASTER results
endif
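A hedged MPI sketch of the exchange described above (MPI itself, the row-block layout, and the helper fcn() are my assumptions; the slides' pseudocode distributes columns, and assumes at least one worker with N divisible by the number of workers):

#include <stdio.h>
#include <mpi.h>

#define N 8                         /* array dimension (kept small for the sketch) */

static double a[N][N];

/* stand-in for fcn(i,j) in the slides */
static double fcn(int i, int j) { return i + 0.1 * j; }

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int workers = size - 1;                /* rank 0 acts as MASTER              */
    int rows = N / workers;                /* rows per worker (assumed to divide) */

    if (rank == 0) {                       /* MASTER */
        for (int w = 1; w <= workers; w++) /* send each worker its part of the initial array */
            MPI_Send(&a[(w - 1) * rows][0], rows * N, MPI_DOUBLE, w, 0, MPI_COMM_WORLD);
        for (int w = 1; w <= workers; w++) /* receive the results                 */
            MPI_Recv(&a[(w - 1) * rows][0], rows * N, MPI_DOUBLE, w, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("a[N-1][N-1] = %f\n", a[N - 1][N - 1]);
    } else {                               /* WORKER */
        int first = (rank - 1) * rows;
        MPI_Recv(&a[first][0], rows * N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        for (int i = first; i < first + rows; i++)   /* calculate my portion      */
            for (int j = 0; j < N; j++)
                a[i][j] = fcn(i, j);
        MPI_Send(&a[first][0], rows * N, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}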
Array Processing. Parallel Solution 2. Pool of Tasks
Static load balancing is not usually a major concern if all tasks are performing the
same amount of work on identical machines.
If you have a load balance problem (some tasks work faster than others), you
may benefit by using a "pool of tasks" scheme.
• A worker process repeatedly gets a task from the master, performs the computation,
and sends the result back (see the pseudocode below).
• Worker processes do not know before runtime which portion of the array they will
handle or how many tasks they will perform.
• Dynamic load balancing occurs at run time: the faster tasks get more work to do.
if I am MASTER
do until no more jobs
send to WORKER next job
receive results from WORKER
end do
tell WORKER no more jobs
else if I am WORKER
do until no more jobs
receive from MASTER next job
calculate array element: a(i,j) = fcn(i,j)
send results to MASTER
end do
endif
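A hedged MPI sketch of this master/worker loop (the tags, the use of MPI_ANY_SOURCE so the master replies to whichever worker just finished, and the stand-in job are my assumptions; it assumes at least one worker and NJOBS no smaller than the number of workers):

#include <stdio.h>
#include <mpi.h>

#define NJOBS    100                 /* total number of jobs (assumed) */
#define TAG_JOB  1
#define TAG_DONE 2
#define TAG_STOP 3

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {                                   /* MASTER owns the job pool      */
        MPI_Status st;
        double result;
        int next = 0;
        for (int w = 1; w < size; w++) {               /* one initial job per worker    */
            MPI_Send(&next, 1, MPI_INT, w, TAG_JOB, MPI_COMM_WORLD);
            next++;
        }
        for (int done = 0; done < NJOBS; done++) {
            MPI_Recv(&result, 1, MPI_DOUBLE, MPI_ANY_SOURCE, TAG_DONE,
                     MPI_COMM_WORLD, &st);             /* a result from some worker     */
            if (next < NJOBS) {                        /* give that worker the next job */
                MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_JOB, MPI_COMM_WORLD);
                next++;
            } else {                                   /* no more jobs: tell it to stop */
                MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_STOP, MPI_COMM_WORLD);
            }
        }
    } else {                                           /* WORKER */
        MPI_Status st;
        int job;
        for (;;) {
            MPI_Recv(&job, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == TAG_STOP) break;
            double result = 2.0 * job;                 /* stand-in for a(i,j) = fcn(i,j) */
            MPI_Send(&result, 1, MPI_DOUBLE, 0, TAG_DONE, MPI_COMM_WORLD);
        }
    }
    MPI_Finalize();
    return 0;
}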
Discussion:
In the above pool-of-tasks example, each task calculated an individual array
element as a job, so the granularity is very fine (the computation-to-communication
ratio is low).
Finely granular solutions incur more communication overhead in order to
reduce task idle time.
A better solution might be to distribute more work with each job. The
"right" amount of work is problem dependent.
Matrix Vector Multiplication.
matvec()
{
    int i, j;
    for (i = 0; i < m; i++)          /* accumulate row i of a times b into c */
        for (j = 0; j < n; j++)      /* (assumes c[] starts at zero)         */
            c[i] = c[i] + a[i][j] * b[j];
}
Matrix Vector Multiplication. Shared Memory Parallel Implementation
void matvec()
{
    int i, j, nprocs, myid;
    float tmp[max];
    nprocs = m_get_numprocs();
    myid = m_get_myid();
    for (i = 0; i < m; i++)                  /* zero this thread's private partial result  */
        tmp[i] = 0.0;
    for (j = myid; j < n; j = j + nprocs)    /* each thread takes every nprocs-th column   */
        for (i = 0; i < m; i++)
            tmp[i] = tmp[i] + a[i][j] * b[j];
    m_lock();                                /* add the partial result into the shared c   */
    for (i = 0; i < m; i++)
        c[i] = c[i] + tmp[i];
    m_unlock();
}
Matrix Multiplication. Serial Implementation
matmul()
{
int i, j, k;
for (i = 0; i < n; i++) {
for (j = 0; j < n; j++) {
c[i][j] = 0.0;
for (k = 0 ; k < n; k++) {
c[i][j] = c[i][j] + a[i][k] * b[k][j];
}
}
}
}
Matrix Multiplication. Parallel Implementation
void matmul()
{
int i, j, k;
int nprocs, iprocs, jprocs;
int my_id, i_id,j_id,ilb,iub,jlb,jub;
my_id = m_get_myid();
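    /* Hedged sketch of the remainder of this routine (not the original slide
       code): assume the declared iprocs x jprocs variables describe a 2-D
       block decomposition of the n x n result matrix. */
    nprocs = m_get_numprocs();
    iprocs = (nprocs >= 2) ? 2 : 1;      /* assumed processor grid: iprocs x jprocs   */
    jprocs = nprocs / iprocs;
    if (my_id >= iprocs * jprocs)        /* leftover threads (odd nprocs) do no work  */
        return;
    i_id = my_id / jprocs;               /* this thread's block row    */
    j_id = my_id % jprocs;               /* this thread's block column */
    ilb = i_id * (n / iprocs);           /* row bounds of my block     */
    iub = (i_id == iprocs - 1) ? n : ilb + n / iprocs;
    jlb = j_id * (n / jprocs);           /* column bounds of my block  */
    jub = (j_id == jprocs - 1) ? n : jlb + n / jprocs;
    for (i = ilb; i < iub; i++)
        for (j = jlb; j < jub; j++) {
            c[i][j] = 0.0;
            for (k = 0; k < n; k++)
                c[i][j] = c[i][j] + a[i][k] * b[k][j];
        }
}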
Quick Sort
• Quicksort has two steps:
o During the divide step, a sequence A[q..r] is partitioned into two nonempty
subsequences A[q..s] and A[s+1..r] such that each element of A[q..s] is <=
each element of A[s+1..r].
o During the conquer step, the two subsequences are sorted by applying
QUICKSORT recursively.
Quick Sort. Serial Algorithm
Pseudocode:

QUICKSORT(A, q, r)
begin
    if q < r then
        x = A[q];
        s = q;
        for i = q+1 to r
            if A[i] <= x then
                s = s + 1;
                swap(A[s], A[i]);
            end if
        end i-loop
        swap(A[q], A[s]);
        QUICKSORT(A, q, s);
        QUICKSORT(A, s+1, r);
    end if
end QUICKSORT

C implementation:

qsort(start, end)
{
    int x, i, j, temp;
    if (start == end) return;
    x = a[start]; i = start - 1; j = end + 1;
    while (i < j) {
        j--;
        while ((a[j] > x) && (j > start)) j--;
        i++;
        while ((x > a[i]) && (i < end)) i++;
        if (i < j) {
            temp = a[i]; a[i] = a[j]; a[j] = temp;
        }
    }
    qsort(start, j);
    qsort(j + 1, end);
}

main()
{
    qsort(0, N - 1);
}
Quick Sort. Parallel Implementation
#include <stdio.h>
#include <stdlib.h>
#define N 100
int a[N];
int start[N],end[N];
int head,tail,numdone;
int pivot(start,end)
int start,end;
{
int x,i,j,temp;
if (start==end) return(start);
x=a[start];
i=start-1;
j=end+1;
while (i<j) {
j--;
while ((a[j]>x)&&(j>start)) j--;
i++;
while ((x>a[i])&&(i<end)) i++;
if (i<j) {
temp=a[i];
a[i]=a[j];
a[j]=temp;
}
}
return(j);
}
parallel_qsort()
{
int l_start, l_end, l_pivot, l_flag, l_numdone, id;
id=m_get_myid();
l_numdone=0;
while (l_numdone<N) {
l_flag=0;
m_lock();
if (head<tail) {
l_flag=1;
l_start=start[head];
l_end=end[head];
head++;
printf("[%d] : (%d,%d)\n",id,l_start,l_end);
}
l_numdone=numdone;
m_unlock();
if (l_flag==1) {
l_pivot=pivot(l_start,l_end);
m_lock();
if (l_start<l_pivot) {
start[tail]=l_start;
end[tail]=l_pivot;
tail++;
} else {
numdone++;
}
if (l_pivot+1<l_end) {
start[tail]=l_pivot+1;
end[tail]=l_end;
tail++;
} else {
numdone++;
}
l_numdone=numdone;
m_unlock();
}
}
}
main()
{
int i;
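    /* Hedged sketch of the rest of main() (not the original slide code):
       fill a[] with test data, seed the shared pool with the single
       segment [0, N-1], and run parallel_qsort on every processor. */
    for (i = 0; i < N; i++)
        a[i] = rand() % 1000;
    start[0] = 0;
    end[0] = N - 1;
    head = 0;
    tail = 1;
    numdone = 0;
    m_fork(parallel_qsort);          /* m_fork as used elsewhere in these slides */
    m_kill_procs();
    for (i = 0; i < N; i++)
        printf("%d ", a[i]);
    printf("\n");
    return 0;
}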
Problem
• Consider a parallel algorithm for computing the value of π through the following
numerical integration:

      Pi = ∫₀¹ 4 / (1 + x²) dx

This integral can be evaluated by computing the area under the curve
f(x) = 4 / (1 + x²) from 0 to 1.
• With numerical integration using the rectangle (midpoint) rule for decomposition,
one divides the region from x = 0 to x = 1 into n intervals.
• The value of the function is evaluated at the midpoint of each interval
• The values are summed up and multiplied by the width of one interval.
Pi: Sequential Algorithm
pi()
{
h = 1.0 / n;
sum = 0.0;
for (i=0; i < n; i++) {
    x = h * (i + 0.5);           /* midpoint of the i-th interval */
    sum = sum + 4.0 / (1 + x * x);
}
pi = h * sum;
}
Pi: Sequential Algorithm
#include <stdio.h>
#include <stdlib.h>
double piece_of_pi(int);
Pi: Parallel Program 1. Shared Memory Parallel Implementation
#include <stdio.h>
int main()
{
    long n = 30000000, i;
    long double h = 0, pi = 0, x, sum;
    h = 1.0 / (long double)n;
#pragma parallel shared(pi) byvalue(h) local(x,i,sum)
    {
        sum = 0; x = 0;
#pragma pfor
        for (i = 0; i < n; i++)
        {
            x = (i + 0.5) * h;
            sum += (4.0 / (1.0 + x * x)) * h;
        }
#pragma critical
        pi += sum;
    }
    printf("pi = %.15Lf\n", pi);     /* print the result */
    return 0;
}
Pi: Parallel Program 2. Shared Memory Parallel Implementation
global_i = 0; pi = 0; h = 1.0 / n;
/* compute pi in parallel */
m_fork(computepi);
void computepi()
{
    int i;
    float sum, localpi, x;
    sum = 0.0;
    for (;;) {
        m_lock();                    /* atomically grab the next index from the shared counter */
        i = global_i;
        global_i = global_i + 1;
        m_unlock();
        if (i >= n) break;           /* all intervals have been handed out */
        x = h * (i + 0.5);           /* midpoint of interval i */
        sum = sum + 4.0 / (1 + x * x);
    }
    localpi = h * sum;
    m_lock();                        /* accumulate this thread's contribution */
    pi = pi + localpi;
    m_unlock();
}
Pi: Parallel Program 3. Shared Memory Parallel Implementation
#include <stdio.h>
#include <stdlib.h>
#include <ulocks.h>
#include <task.h>
#define MAX_PROCS 12
m_set_procs(num_procs);
m_fork(piece_of_pi,n,num_procs);
m_kill_procs();
for(i=0;i<num_procs;i++)
total_pi += part_pi[i];
return(0);
}
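A hedged reconstruction of the missing header of piece_of_pi(), consistent with the m_fork(piece_of_pi, n, num_procs) call above and with the body that follows (the parameter and local types are assumptions):

void piece_of_pi(int n, int num_procs)
{
    int i, my_id;
    int chunk, start, end;
    double x, width, mypiece = 0.0;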
my_id = m_get_myid();
chunk = n/num_procs;
start = my_id*chunk;
end = start+chunk;
if(my_id==(num_procs-1))
end=n;
width=1.0/n;
for(i=start;i<end;i++){
x = (i+0.5)*width;
mypiece += 4.0/(1.0+x*x);
}
part_pi[my_id] = mypiece*width; }
Pi: Solution 3.
for i := 1 to npoints
    xcoordinate := random(0,1) ; ycoordinate := random(0,1) ;
    if (xcoordinate, ycoordinate) inside circle then circle_count := circle_count + 1
endfor
PI := 4.0 * circle_count / npoints ;
Pi: Parallel Solution 3.
• Parallel strategy: break the loop into portions that can be executed by the
tasks.
• For the task of approximating PI:
o Each task executes its portion of the loop a number of times.
o Each task can do its work without requiring any information from the other tasks.
for i := 1 to num
    generate 2 random numbers between 0 and 1
    xcoordinate = random1 ; ycoordinate = random2
    if (xcoordinate, ycoordinate) inside circle
    then circle_count = circle_count + 1
    endif
endfor
if I am MASTER
    for i := 1 to num_tasks() - 1
        tmp := receive( i ) ;
        circle_count := circle_count + tmp ;
    endfor
    PI = 4.0 * circle_count/npoints ;
else
    send circle_count to MASTER ;
endif
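A hedged MPI rendering of this scheme (MPI itself, the use of MPI_Reduce to collect the counts instead of the explicit receive loop, and the constants are my substitutions):

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    const long npoints = 1000000;          /* total number of random points (assumed) */
    int rank, size;
    long my_count = 0, circle_count = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    long num = npoints / size;             /* this task's share of the loop        */
    srand(rank + 1);                       /* independent stream per task (crude)  */

    for (long i = 0; i < num; i++) {
        double x = (double)rand() / RAND_MAX;
        double y = (double)rand() / RAND_MAX;
        if (x * x + y * y <= 1.0)          /* point falls inside the quarter circle */
            my_count++;
    }

    /* combine all counts on the MASTER (rank 0) */
    MPI_Reduce(&my_count, &circle_count, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("PI ~= %f\n", 4.0 * (double)circle_count / (double)(num * size));
    MPI_Finalize();
    return 0;
}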