Distributed Process Management
I - 2011/2012 Distributed Process Management 1
Distributed Process Management
Process Migration
Transfer of sufficient amount of the state of a process from
one machine to another
The process executes on the target machine
I - 2011/201 Distributed Process Management 2
Distributed Process Management
Motivation
Load sharing
Move processes from heavily loaded to lightly load systems
Load can be balanced to improve overall performance
Communications performance
Processes that interact intensively can be moved to the same
node to reduce communications cost
May be better to move process to where the data reside when
the data is large
I - 2011/201 Distributed Process Management 3
Distributed Process Management
Availability
Long-running process may need to move because the machine it
is running on will be down
Utilizing special capabilities
Process can take advantage of unique hardware or software
capabilities
I - 2011/201 Distributed Process Management 4
Distributed Process Management
Initiation of Migration
Operating system
When goal is load balancing
Process
When goal is to reach a particular resource
I - 2011/201 Distributed Process Management 5
What is Migrated?
Must destroy the process on the source system and create it on the
target system
Process control block and any links must be moved
I - 2011/201 Distributed Process Management 6
Example of Process Migration: Before Migration
I - 2011/201 Distributed Process Management 7
Example of Process Migration: After Migration
I - 2011/201 Distributed Process Management 8
What is Migrated?
Eager (all):Transfer entire address space
No trace of process is left behind
If address space is large and if the process does not need most
of it, then this approach my be unnecessarily expensive
Precopy: Process continues to execute on the source node while the
address space is copied
Pages modified on the source during precopy operation have to
be copied a second time
Reduces the time that a process is frozen and cannot execute
during migration
I - 2011/201 Distributed Process Management 9
Eager (dirty): Transfer only that portion of the address space that is in
main memory and have been modified
Any additional blocks of the virtual address space are transferred
on demand
The source machine is involved throughout the life of the process
Copy-on-reference: Pages are only brought over on reference
Variation of eager (dirty)
Has lowest initial cost of process migration
I - 2011/201 Distributed Process Management 10
Flushing: Pages are cleared from main memory by flushing dirty
pages to disk
Relieves the source of holding any pages of the migrated process
in main memory
I - 2011/201 Distributed Process Management 11
Negotiation of Migration
Migration policy is responsibility of Starter utility
Starter utility is also responsible for long-term scheduling
and memory allocation
Decision to migrate must be reached jointly by two Starter
processes (one on the source and one on the destination)
I - 2011/201 Distributed Process Management 12
Negotiation of Process Migration
I - 2011/201 Distributed Process Management 13
Eviction
System evict a process that has been migrated to it
If a workstation is idle, process may have been migrated to
it
Once the workstation is active, it may be necessary to
evict the migrated processes to provide adequate
response time
I - 2011/201 Distributed Process Management 14
Distributed Global States
Operating system cannot know the current state of all
process in the distributed system
A process can only know the current state of all processes
on the local system
Remote processes only know state information that is
received by messages
These messages represent the state in the past
I - 2011/201 Distributed Process Management 15
Example
Bank account is distributed over two branches
The total amount in the account is the sum at each branch
At 3 PM the account balance is determined
Messages are sent to request the information
I - 2011/201 Distributed Process Management 16
If at the time of balance determination, the balance from
branch A is in transit to branch B
The result is a false reading
I - 2011/201 Distributed Process Management 17
All messages in transit must be examined at time of
observation
Total consists of balance at both branches and amount in
message
If clocks at the two branches are not perfectly synchronized
I - 2011/201 Distributed Process Management 18
Transfer amount at 3:01 from branch A
Amount arrives at branch B at 2:59
At 3:00 the amount is counted twice
I - 2011/201 Distributed Process Management 19
Some Terms
Channel
Exists between two processes if they exchange
messages
State
Sequence of messages that have been sent and
received along channels incident with the process
I - 2011/201 Distributed Process Management 20
Snapshot
Records the state of a process
Global state
The combined state of all processes
Distributed Snapshot
A collection of snapshots, one for each process
I - 2011/201 Distributed Process Management 21
Global State
I - 2011/201 Distributed Process Management 22
I - 2011/201 Distributed Process Management 23
Distributed Snapshot Algorithm
I - 2011/201 Distributed Process Management 24
Mutual Exclusion Requirements
Mutual exclusion must be enforced: only one process at a
time is allowed in its critical section
A process that hales in its noncritical section must do so
without interfering with other processes
It must not be possible for a process requiring access to a
critical section to be delayed indefinitely: no deadlock or
starvation
I - 2011/201 Distributed Process Management 25
When no process is in a critical section, any process that
requests entry to its critical section must be permitted to
enter without delay
No assumptions are made about relative process speeds
or number of processors
A process remains inside its critical section for a finite time
only
I - 2011/201 Distributed Process Management 26
Centralized Algorithm for Mutual Exclusion
One node is designated as the control node
This node control access to all shared objects
If control node fails, mutual exclusion breaks down
I - 2011/201 Distributed Process Management 27
I - 2011/201 Distributed Process Management 28
Distributed Algorithm
All nodes have equal amount of information, on average
Each node has only a partial picture of the total system and
must make decisions based on this information
All nodes bear equal responsibility for the final decision
All nodes expend equal effort, on average, in effecting a
final decision
I - 2011/201 Distributed Process Management 29
Failure of a node, in general, does not result in a total
system collapse
There exits no systemwide common clock with which to
regulate the time of events
I - 2011/201 Distributed Process Management 30
Ordering of Events
Events must be order to ensure mutual exclusion and
avoid deadlock
Clocks are not synchronized
Communication delays
State information for a process is not up to date
Need to consistently say that one event occurs before
another event
I - 2011/201 Distributed Process Management 31
Messages are sent when want to enter critical section and
when leaving critical section
Time-stamping
Orders events on a distributed system
System clock is not used
I - 2011/201 Distributed Process Management 32
Time-Stamping
Each system on the network maintains a counter which
functions as a clock
Each site has a numerical identifier
When a message is received, the receiving system sets is
counter to one more than the maximum of its current value
and the incoming time-stamp (counter)
I - 2011/201 Distributed Process Management 33
If two messages have the same time-stamp, they are
ordered by the number of their sites
For this method to work, each message is sent from one
process to all other processes
Ensures all sites have same ordering of messages
For mutual exclusion and deadlock all processes must
be aware of the situation
I - 2011/201 Distributed Process Management 34
I - 2011/201 Distributed Process Management 35
I - 2011/201 Distributed Process Management 36
I - 2011/201 Distributed Process Management 37
Token-Passing Approach
Pass a token among the participating processes
The token is an entity that at any time is held by one
process
The process holding the token may enter its critical section
without asking permission
When a process leaves its critical section, it passes the
token to another process
I - 2011/201 Distributed Process Management 38
Deadlock in Resource Allocation
Mutual exclusion
Hold and wait
No preemption
Circular wait
I - 2011/201 Distributed Process Management 39
Deadlock Prevention
Circular-wait condition can be prevented by defining a
linear ordering of resource types
Hold-and-wait condition can be prevented by requiring that
a process request all of its required resource at one time,
and blocking the process until all requests can be granted
simultaneously
I - 2011/201 Distributed Process Management 40
Deadlock Avoidance
Distributed deadlock avoidance is impractical
Every node must keep track of the global state of the
system
The process of checking for a safe global state must be
mutually exclusive
Checking for safe states involves considerable
processing overhead for a distributed system with a
large number of processes and resources
I - 2011/201 Distributed Process Management 41
Distributed Deadlock Detection
Each site only knows about its own resources
Deadlock may involve distributed resources
Centralized control – one site is responsible for deadlock
detection
Hierarchical control – lowest node above the nodes
involved in deadlock
Distributed control – all processes cooperate in the
deadlock detection function
I - 2011/201 Distributed Process Management 42
Deadlock in Message Communication
Mutual Waiting
Deadlock occurs in message communication when each of a
group of processes is waiting for a message from another
member of the group and there are no messages in transit
Unavailability of Message Buffers
Well known in packet-switching data networks
Example: buffer space for A is filled with packets destined for B.
The reverse is true at B.
I - 2011/201 Distributed Process Management 43
Unavailability of Message Buffers
For each node, the queue to the adjacent node in one direction is
full with packets destined for the next node beyond
I - 2011/201 Distributed Process Management 44
I - 2011/201 Distributed Process Management 45
I - 2011/201 Distributed Process Management 46
Structured Buffer Pool
I - 2011/201 Distributed Process Management 47
I - 2011/201 Distributed Process Management 48