Distributed Systems
Process Migration & Allocation
Paul Krzyzanowski pxk@cs.rutgers.edu ds@pk.org
Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 License.
Processor allocation
Easy with multiprocessor systems
Every processor has access to the same memory and resources
All processors pick jobs from a common run queue, so a process can be restarted on any available processor (see the run-queue sketch below)
Much more complex with multicomputer systems
No shared memory (usually)
Little or no shared state (file name space, open files, signals, ...)
Network latency
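To make the multiprocessor case above concrete, here is a minimal sketch (the worker and job names are illustrative, not part of the slides) of a common run queue that any processor can draw from, so a job does not care which CPU picks it up:

```python
import queue
import threading

# One run queue shared by all processors: any CPU can pick up any job.
run_queue = queue.Queue()

def cpu_worker(cpu_id):
    while True:
        job = run_queue.get()      # every processor draws from the same queue
        if job is None:            # sentinel: no more work for this CPU
            break
        job(cpu_id)                # the job does not care which CPU runs it

jobs = [lambda cpu, n=n: print(f"job {n} ran on cpu {cpu}") for n in range(8)]
for j in jobs:
    run_queue.put(j)

cpus = [threading.Thread(target=cpu_worker, args=(i,)) for i in range(4)]
for _ in cpus:
    run_queue.put(None)            # one sentinel per CPU
for c in cpus:
    c.start()
for c in cpus:
    c.join()
```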
Allocation or migration?
Migratory or nonmigratory?
Most environments are nonmigratory:
System decides where a process is born, or
User decides where a process is born
Migratory processes:
Move a process between machines during its lifetime
Can achieve better system-wide utilization of resources
Need transparency
Process must see the same environment on different computers
Same set of system calls & shared libraries

Non-migratory processes need:
File system name space
stdin, stdout, stderr

Migratory processes need:
File system name space
Open file descriptors (including stdin, stdout, stderr)
Signals
Shared-memory segments
Network connections (e.g., TCP sockets)
Semaphores, message queues
Synchronized clocks
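As a rough illustration only (the class and field names are mine, not from the slides), the state a migration mechanism would have to capture for a migratory process looks roughly like this:

```python
from dataclasses import dataclass, field

# Illustrative only: one field per item of per-process state listed above.
@dataclass
class MigratableProcessState:
    memory_image: bytes                                   # code, data, stack, heap
    file_name_space: str = "/"                            # file system name space
    open_files: dict = field(default_factory=dict)        # fd -> (path, offset, flags), incl. stdin/stdout/stderr
    pending_signals: list = field(default_factory=list)
    shared_memory_segments: list = field(default_factory=list)
    tcp_connections: list = field(default_factory=list)   # hard: peers still address the old host
    semaphores: list = field(default_factory=list)
    message_queues: list = field(default_factory=list)
    clock_offset_seconds: float = 0.0                     # meaningful only if clocks are synchronized
```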
Migration strategies
Move state
Keep state on original system: use RPC for system calls (see the sketch below)
Ignore state
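As a sketch of the "keep state on the original system" strategy, the example below uses Python's xmlrpc module to stand in for the RPC layer (the port, host name, and function names are invented for illustration): open-file state stays on the home machine, and the migrated process forwards its read/write system calls back to it.

```python
# Home machine: open-file state stays here; forwarded system calls are serviced locally.
from xmlrpc.server import SimpleXMLRPCServer

open_files = {}                      # fd -> Python file object (the state that never moves)

def remote_read(fd, nbytes):
    return open_files[fd].read(nbytes)

def remote_write(fd, data):
    return open_files[fd].write(data)

server = SimpleXMLRPCServer(("0.0.0.0", 9000), allow_none=True)
server.register_function(remote_read, "read")
server.register_function(remote_write, "write")
server.serve_forever()
```

```python
# Destination machine: the migrated process's stubs turn file system calls
# on old descriptors into remote calls back to the home node.
import xmlrpc.client

home = xmlrpc.client.ServerProxy("http://home-machine:9000", allow_none=True)

def read(fd, nbytes):
    return home.read(fd, nbytes)

def write(fd, data):
    return home.write(fd, data)
```

The trade-off is that every forwarded call pays a network round trip, and the migrated process remains dependent on the home machine staying up.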
Constructing process migration algorithms
Deterministic vs. heuristic
Centralized, hierarchical, or distributed
Optimal vs. suboptimal
Local or global information
Location policy
Up-down algorithm
Centralized coordinator maintains a usage table
Goal: provide a fair share of available compute power and do not allow any one user to monopolize the environment

When the system creates a process:
It decides whether the local system is too congested for local execution
If so, it sends a request to the central coordinator, asking for a processor

The coordinator keeps a score (points) per workstation:
+points while you are running jobs on other machines
-points while you have unsatisfied requests pending
If your points > 0, you are a net user of processing resources
When a processor becomes free, the coordinator grants the pending request from the workstation with the lowest score
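A toy sketch of the coordinator's bookkeeping under the rules above (the class, method names, and data structures are mine, not part of the algorithm's definition):

```python
class UpDownCoordinator:
    """Toy coordinator: tracks a score per workstation and grants the
    next free processor to the pending requester with the lowest score."""

    def __init__(self, workstations):
        self.score = {w: 0 for w in workstations}   # >0: net user, <0: net contributor
        self.pending = set()                        # workstations with unsatisfied requests

    def tick(self):
        """Called periodically: charge or credit each workstation."""
        for w in self.score:
            if self.running_remotely(w):
                self.score[w] += 1                  # +points while running jobs elsewhere
            if w in self.pending:
                self.score[w] -= 1                  # -points while a request goes unsatisfied

    def request_processor(self, workstation):
        self.pending.add(workstation)

    def grant_free_processor(self):
        if not self.pending:
            return None
        winner = min(self.pending, key=lambda w: self.score[w])   # lowest score wins
        self.pending.discard(winner)
        return winner

    def running_remotely(self, workstation):
        # Placeholder: a real coordinator would track which workstations
        # currently have jobs placed on other machines.
        return False
```

Because grant_free_processor() always serves the lowest score first, workstations that have mostly contributed cycles get served before heavy net users, which is how the fair-share goal is enforced.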
Hierarchical algorithm
Removes the central coordinator to provide greater scalability
Each group of workers (processors) gets a manager: a coordinator responsible for process allocation to its workers
Each manager keeps track of which of its workers are available for work (similar to the centralized algorithm)
If a manager does not have enough workers (CPU cycles), it passes the request up the hierarchy to its own manager
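A minimal sketch, assuming a simple tree of managers (class and method names are illustrative): each manager tries to satisfy a request from its own pool of idle workers and otherwise escalates it to its parent.

```python
class Manager:
    """Toy manager node in the hierarchy: allocates from its own workers
    if it can, otherwise escalates the request to its parent manager."""

    def __init__(self, workers, parent=None):
        self.free_workers = list(workers)   # workers with spare CPU cycles
        self.parent = parent                # next manager up the hierarchy

    def allocate(self, cpus_needed):
        if len(self.free_workers) >= cpus_needed:
            granted = self.free_workers[:cpus_needed]
            del self.free_workers[:cpus_needed]
            return granted
        if self.parent is not None:
            return self.parent.allocate(cpus_needed)   # pass the request upward
        return None                                    # hierarchy exhausted

# Example: a two-level hierarchy
top = Manager(workers=["w5", "w6", "w7", "w8"])
group = Manager(workers=["w1", "w2"], parent=top)
print(group.allocate(3))   # the group is short of workers, so its manager's manager serves it
```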
Distributed algorithms
Sender-initiated distributed heuristic
If a system needs help in running jobs:
Pick a machine at random
Send it a message: "Can you run my job?"
If it cannot, repeat (give up after n tries)
The algorithm has been shown to behave well and be stable
Problem: network load increases as system load increases
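A compact sketch of the sender-initiated probe loop (the send_probe callback and the load threshold are assumptions, not specified on the slide):

```python
import random

def sender_initiated(self_load, threshold, machines, send_probe, n_tries=3):
    """If this machine is overloaded, probe up to n_tries machines chosen at
    random and hand the job to the first one that accepts; otherwise keep it."""
    if self_load <= threshold:
        return "run locally"
    for _ in range(n_tries):
        target = random.choice(machines)
        if send_probe(target, "Can you run my job?"):   # True if the target accepts
            return f"migrate to {target}"
    return "run locally"    # give up after n tries
```

Note that the probes themselves are the problem the slide points out: they are sent precisely when the system is already heavily loaded.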
Receiver-initiated distributed heuristic
If a system is not loaded:
Pick a machine at random
Send it a message: "I have free cycles"
If it has no work to hand off, repeat (after n tries, sleep for a while and try again)
Creates heavy network load when systems are idle, but no extra load during critical (heavily loaded) times
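A matching sketch of the receiver-initiated side (current_load and ask_for_work are assumed callbacks): an underloaded machine advertises its free cycles and backs off when nobody has work to hand over.

```python
import random
import time

def receiver_initiated(current_load, low_watermark, machines, ask_for_work,
                       n_tries=3, backoff_seconds=5):
    """While this machine is underloaded, advertise free cycles to machines
    chosen at random; if nobody has work after n tries, sleep and retry."""
    while current_load() < low_watermark:
        for _ in range(n_tries):
            target = random.choice(machines)
            job = ask_for_work(target, "I have free cycles")
            if job is not None:
                return job               # got a job to run locally
        time.sleep(backoff_seconds)      # nobody had work: back off before retrying
    return None                          # no longer underloaded
```

Here the probing happens only on idle machines, which is why the extra network traffic does not show up at the loaded (critical) times.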
Migrating a Virtual Machine
Checkpoint an entire operating system
Restart it on another system
Does the checkpointed image contain a filesystem?
Easy if all file access is over the network or to a file system that migrates with the VM
Painful if file access goes through the host OS to the host's file system
The end.