CS9222 Advanced Operating
System
Unit – V
Dr.A.Kathirvel
Professor & Head/IT - VCEW
Unit - V
Structures – Design Issues – Threads – Process
Synchronization – Processor Scheduling – Memory
Management – Reliability / Fault Tolerance; Database
Operating Systems – Introduction – Concurrency
Control – Distributed Database Systems –
Concurrency Control Algorithms.
Motivation for Multiprocessors
Enhanced Performance -
Concurrent execution of tasks for increased
throughput (between processes)
Exploit Concurrency in Tasks (Parallelism
within process)
Fault Tolerance -
graceful degradation in the face of failures
Basic MP Architectures
Single Instruction Single Data (SISD) -
conventional uniprocessor designs.
Single Instruction Multiple Data (SIMD) -
Vector and Array Processors
Multiple Instruction Single Data (MISD) -
Not Implemented.
Multiple Instruction Multiple Data (MIMD)
- conventional MP designs
MIMD Classifications
Tightly Coupled System - all processors
share the same global memory and have
the same address spaces (Typical SMP
system).
Main memory for IPC and Synchronization.
Loosely Coupled System - memory is
partitioned and attached to each processor.
Hypercube, Clusters (Multi-Computer).
Message passing for IPC and synchronization.
MP Block Diagram
[Figure: MP block diagram. Each CPU has its own cache and MMU; the CPUs connect through an interconnection network to multiple main memory (MM) modules.]
Memory Access Schemes
• Uniform Memory Access (UMA)
– Centrally located
– All processors are equidistant (access times)
• Non-Uniform Memory Access (NUMA)
– physically partitioned but accessible by all
– processors have the same address space
• NO Remote Memory Access (NORMA)
– physically partitioned, not accessible by all
– processors have own address space
Other Details of MP
Interconnection technology
Bus
Cross-Bar switch
Multistage Interconnect Network
Caching - Cache Coherence Problem!
Write-update
Write-invalidate
bus snooping
MP OS Structure - 1
Separate Supervisor -
all processors have their own copy of the kernel.
Some share data for interaction
dedicated I/O devices and file systems
good fault tolerance
bad for concurrency
MP OS Structure - 2
• Master/Slave Configuration
– master monitors the status and assigns work to
other processors (slaves)
– Slaves are a schedulable pool of resources for
the master
– master can be bottleneck
– poor fault tolerance
MP OS Structure - 3
Symmetric Configuration - Most Flexible.
all processors are autonomous, treated equal
one copy of the kernel executed concurrently
across all processors
Synchronize access to shared data structures:
Lock entire OS - Floating Master
Mitigated by dividing OS into segments that normally
have little interaction
multithread kernel and control access to resources
(continuum)
MP Overview
Multiprocessor
  SIMD
  MIMD
    Shared Memory (tightly coupled)
      Master/Slave
      Symmetric (SMP)
    Distributed Memory (loosely coupled)
      Clusters
SMP OS Design Issues
Threads - effectiveness of parallelism depends
on performance of primitives used to express
and control concurrency.
Process Synchronization - disabling interrupts
is not sufficient.
Process Scheduling - efficient, policy controlled,
task scheduling (process/threads)
global versus per CPU scheduling
Task affinity for a particular CPU
resource accounting and intra-task thread
dependencies
SMP OS design issues - 2
Memory Management - complicated since
main memory is shared by possibly many
processors. Each processor must maintain its
own map tables for each process
cache coherence
memory access synchronization
balancing overhead with increased concurrency
Reliability and Fault Tolerance - degrade
gracefully in the event of failures
Typical SMP System
[Figure: four 500 MHz CPUs, each with its own cache and MMU, share a system/memory bus. On the bus sit main memory (50 ns), an interrupt controller, system functions (timer, BIOS, reset), and a bridge to the I/O subsystem (SCSI, Ethernet, video).]
Issues:
• Memory contention
• Limited bus bandwidth
• I/O contention
• Cache coherence
Typical I/O bus:
• 33 MHz / 32-bit (132 MB/s)
• 66 MHz / 64-bit (528 MB/s)
Some Definitions
Parallelism: degree to which a multiprocessor
application achieves parallel execution
Concurrency: Maximum parallelism an
application can achieve with unlimited
processors
System Concurrency: kernel recognizes multiple
threads of control in a program
User Concurrency: User space threads
(coroutines) provide a natural programming
model for concurrent applications. Concurrency is
not supported by the system.
Process and Threads
Process: encompasses
set of threads (computational entities)
collection of resources
Thread: Dynamic object representing an
execution path and computational state.
threads have their own computational state: PC,
stack, user registers and private data
Remaining resources are shared amongst threads
in a process
Threads
Effectiveness of parallel computing depends on
the performance of the primitives used to
express and control parallelism
Threads separate the notion of execution from
the Process abstraction
Useful for expressing the intrinsic concurrency
of a program regardless of resulting
performance
Three types: User threads, kernel threads and
Light Weight Processes (LWP)
User Level Threads
User level threads - supported by user level
(thread) library
Benefits:
no modifications required to kernel
flexible and low cost
Drawbacks:
cannot block without blocking the entire process
no parallelism (not recognized by kernel)
Kernel Level Threads
Kernel level threads - kernel directly supports
multiple threads of control in a process. Thread
is the basic scheduling entity
Benefits:
coordination between scheduling and
synchronization
less overhead than a process
suitable for parallel application
Drawbacks:
more expensive than user-level threads
generality leads to greater overhead
Light Weight Processes (LWP)
Kernel supported user thread
Each LWP is bound to one kernel thread.
a kernel thread may not be bound to an LWP
LWP is scheduled by kernel
User threads scheduled by library onto LWPs
Multiple LWPs per process
First Class threads (Psyche OS)
Thread operations in user space:
create, destroy, synch, context switch
kernel threads implement a virtual processor
Coarse grain in kernel - preemptive scheduling
Communication between kernel and threads library
shared data structures.
Software interrupts (user upcalls or signals), for example
for scheduling decisions and preemption warnings.
Kernel scheduler interface - allows dissimilar thread
packages to coordinate.
Scheduler Activations
An activation:
serves as execution context for running thread
notifies thread of kernel events (upcall)
space for kernel to save processor context of current
user thread when stopped by kernel
kernel is responsible for processor allocation =>
preemption by kernel.
Thread package responsible for scheduling
threads on available processors (activations)
Support for Threading
• BSD:
– process model only. 4.4 BSD enhancements.
• Solaris:provides
– user threads, kernel threads and LWPs
• Mach: supports
– kernel threads and tasks. Thread libraries provide
semantics of user threads, LWPs and kernel threads.
• Digital UNIX: extends Mach to provide the usual
UNIX semantics.
– Pthreads library.
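Since the Pthreads library closes the list above, here is a minimal sketch of its core operations, create and join; the worker function and thread count are illustrative:

/* Create two POSIX threads and wait for both. */
#include <pthread.h>
#include <stdio.h>

void *hello(void *arg) {
   printf("thread %ld running\n", (long)arg);
   return NULL;
}

int main(void) {
   pthread_t t[2];
   for (long i = 0; i < 2; i++)
      pthread_create(&t[i], NULL, hello, (void *)i);
   for (int i = 0; i < 2; i++)
      pthread_join(t[i], NULL);
   return 0;
}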
Process Synchronization:Motivation
A program that runs correctly when executed
sequentially may run incorrectly when executed
concurrently.
Concurrent access to shared data may result in
data inconsistency
Maintaining data consistency requires
mechanisms to ensure the orderly execution of
cooperating processes
Let’s look at an example: consumer-producer
problem.
Producer-Consumer Problem
count: the number of items in the buffer (initialized to 0)

Producer:
while (true) {
   /* produce an item and put it in nextProduced */
   while (count == BUFFER_SIZE)
      ; // do nothing
   buffer[in] = nextProduced;
   in = (in + 1) % BUFFER_SIZE;
   count++;
}

Consumer:
while (true) {
   while (count == 0)
      ; // do nothing
   nextConsumed = buffer[out];
   out = (out + 1) % BUFFER_SIZE;
   count--;
   // consume the item in nextConsumed
}

What can go wrong in concurrent execution?
Race Condition
count++ could be implemented as
register1 = count
register1 = register1 + 1
count = register1
count-- could be implemented as
register2 = count
register2 = register2 - 1
count = register2
Consider this execution interleaving with “count = 5” initially:
S0: producer execute register1 = count {register1 = 5}
S1: producer execute register1 = register1 + 1 {register1 = 6}
S2: consumer execute register2 = count {register2 = 5}
S3: consumer execute register2 = register2 - 1 {register2 = 4}
S4: producer execute count = register1 {count = 6 }
S5: consumer execute count = register2 {count = 4}
What are all possible values from concurrent execution?
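To make the race observable, here is a minimal POSIX-threads sketch (the iteration count is arbitrary) in which unsynchronized count++ and count-- interleave exactly as in steps S0-S5:

/* Two unsynchronized threads racing on a shared counter. */
#include <pthread.h>
#include <stdio.h>

long count = 0;                     /* shared and unprotected on purpose */

void *producer(void *arg) {
   for (int i = 0; i < 1000000; i++) count++;   /* load, add, store */
   return NULL;
}

void *consumer(void *arg) {
   for (int i = 0; i < 1000000; i++) count--;   /* load, sub, store */
   return NULL;
}

int main(void) {
   pthread_t p, c;
   pthread_create(&p, NULL, producer, NULL);
   pthread_create(&c, NULL, consumer, NULL);
   pthread_join(p, NULL);
   pthread_join(c, NULL);
   printf("count = %ld (rarely 0)\n", count);   /* interleaving decides */
   return 0;
}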
How to prevent race condition?
Define a critical section in each process for reading and
writing common variables. Make sure that only one
process can execute in its critical section at a time.

do {
   entry section
      critical section
   exit section
      remainder section
} while (TRUE);

What synchronization code should go into the entry and
exit sections to prevent the race condition?
Solution to Critical-Section
Problem
1. Mutual Exclusion - If process Pi is executing in its critical section, then no
other processes can be executing in their critical sections
2. Progress - If no process is executing in its critical section and there exist
some processes that wish to enter their critical section, then the
selection of the processes that will enter the critical section next cannot
be postponed indefinitely
3. Bounded Waiting - A bound must exist on the number of times that
other processes are allowed to enter their critical sections after a process
has made a request to enter its critical section and before that request is
granted
What is the difference between
Progress and Bounded Waiting?
Peterson’s Solution
Simple 2-process solution
Assume that the LOAD and STORE instructions are
atomic; that is, cannot be interrupted.
The two processes share two variables:
int turn;
boolean flag[2];
The variable turn indicates whose turn it is to enter
the critical section.
The flag array is used to indicate if a process is ready
to enter the critical section. flag[i] = true implies that
process Pi is ready!
Algorithm for Process Pi
while (true) {
   flag[i] = TRUE;                  // entry section
   turn = j;
   while (flag[j] && turn == j)
      ;
   // CRITICAL SECTION
   flag[i] = FALSE;                 // exit section
   // REMAINDER SECTION
}

Mutual exclusion: only one process enters the critical section at a time.
Proof sketch: can both processes pass the while loop (and enter the
critical section) at the same time?
Progress: selection of a waiting-to-enter-critical-section process does
not block. Proof sketch: can Pi wait at the while loop forever (after Pj
leaves the critical section)?
Bounded waiting: limited time in waiting for other processes.
Proof sketch: can Pj win the critical section twice while Pi waits?
Algorithm for Process Pi
Process Pi:
while (true) {
   flag[i] = TRUE;                  // entry section
   turn = j;
   while (flag[j] && turn == j);
   // CRITICAL SECTION
   flag[i] = FALSE;                 // exit section
   // REMAINDER SECTION
}

Process Pj:
while (true) {
   flag[j] = TRUE;                  // entry section
   turn = i;
   while (flag[i] && turn == i);
   // CRITICAL SECTION
   flag[j] = FALSE;                 // exit section
   // REMAINDER SECTION
}
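As a concrete illustration, here is a runnable sketch of Peterson's algorithm in C11. The sequentially consistent atomics stand in for the atomic LOAD and STORE the slide assumes; the shared counter, thread bodies, and iteration count are illustrative additions.

/* Peterson's algorithm for two threads, C11 atomics (seq_cst). */
#include <stdatomic.h>
#include <stdio.h>
#include <pthread.h>

atomic_bool flag[2];          /* flag[i]: Pi is ready to enter */
atomic_int turn;              /* whose turn it is to defer */
long counter = 0;             /* shared data guarded by the lock */

void enter(int i) {
   int j = 1 - i;
   atomic_store(&flag[i], true);
   atomic_store(&turn, j);                 /* let the other go first */
   while (atomic_load(&flag[j]) && atomic_load(&turn) == j)
      ;                                    /* busy-wait */
}

void leave(int i) {
   atomic_store(&flag[i], false);
}

void *worker(void *arg) {
   int i = (int)(long)arg;
   for (int k = 0; k < 100000; k++) {
      enter(i);
      counter++;                           /* critical section */
      leave(i);
   }
   return NULL;
}

int main(void) {
   pthread_t t0, t1;
   pthread_create(&t0, NULL, worker, (void *)0L);
   pthread_create(&t1, NULL, worker, (void *)1L);
   pthread_join(t0, NULL);
   pthread_join(t1, NULL);
   printf("counter = %ld (expect 200000)\n", counter);
   return 0;
}

Without the atomics, hardware that reorders the flag and turn accesses can let both threads enter at once, which is exactly why the atomic LOAD/STORE assumption matters.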
Synchronization Hardware
Many systems provide hardware support for critical section code
Uniprocessors – could disable interrupts
Currently running code would execute without preemption
Generally too inefficient on multiprocessor systems
Operating systems using this not broadly scalable
Modern machines provide special atomic hardware instructions
Atomic = non-interruptable
TestAndSet(target): atomically test a memory word and set its value
Swap(a, b): atomically swap the contents of two memory words
TestAndSet Instruction
• Definition:
boolean TestAndSet (boolean *target)
{
   boolean rv = *target;
   *target = TRUE;
   return rv;
}
Solution using TestAndSet
Shared boolean variable lock, initialized to FALSE.
Solution:
while (true) {
   while (TestAndSet(&lock))
      ;  /* do nothing */          // entry section
   // critical section
   lock = FALSE;                   // exit section
   // remainder section
}
Does it satisfy mutual exclusion?
How about progress and bounded waiting?
How to fix this?
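For reference, C11's atomic_flag provides exactly this test-and-set semantics, so the lock above can be written portably; a minimal sketch (function names are illustrative):

/* Spinlock built on C11 atomic_flag (hardware test-and-set). */
#include <stdatomic.h>

static atomic_flag lock_word = ATOMIC_FLAG_INIT;   /* starts clear (FALSE) */

void acquire(void) {
   while (atomic_flag_test_and_set(&lock_word))
      ;   /* spin until the old value was FALSE */
}

void release(void) {
   atomic_flag_clear(&lock_word);                  /* lock = FALSE */
}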
Bounded-Waiting TestAndSet
Shared variables:
boolean waiting[n];
boolean lock;   // initialized to FALSE

Solution:
do {
   waiting[i] = TRUE;                       // entry section
   while (waiting[i] && TestAndSet(&lock))
      ;
   waiting[i] = FALSE;
   // critical section
   j = (i + 1) % n;                         // exit section
   while ((j != i) && !waiting[j])
      j = (j + 1) % n;
   if (j == i) lock = FALSE;
   else waiting[j] = FALSE;
   // remainder section
} while (TRUE);

Mutual exclusion
Proof sketch: can two processes pass the while loop (and enter the
critical section) at the same time?
Bounded waiting: limited time in waiting for other processes.
What is waiting[] for? When is waiting[i] set to FALSE?
Proof sketch: how long does Pi wait until waiting[i] becomes FALSE?
Progress
Proof sketch: the exit section unblocks at least one process's waiting[]
entry or sets lock to FALSE.
Swap Instruction
• Definition:
void Swap (boolean *a, boolean *b)
{
   boolean temp = *a;
   *a = *b;
   *b = temp;
}
Solution using Swap
Shared Boolean variable lock initialized to FALSE; Each process
has a local Boolean variable key.
Solution:
while (true) {
   key = TRUE;                     // entry section
   while (key == TRUE)
      Swap(&lock, &key);
   // critical section
   lock = FALSE;                   // exit section
   // remainder section
}
Mutual exclusion? Progress and Bounded Waiting?
Notice a performance problem with Swap & TestAndSet
solutions?
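The Swap entry section maps directly onto C11's atomic_exchange, which atomically stores a new value and returns the old one; a minimal sketch (names are illustrative):

/* Swap-based lock via C11 atomic_exchange. */
#include <stdatomic.h>
#include <stdbool.h>

atomic_bool lock_word;   /* initialized to false */

void swap_acquire(void) {
   bool key = true;
   while (key == true)
      key = atomic_exchange(&lock_word, key);   /* swap lock and key */
}

void swap_release(void) {
   atomic_store(&lock_word, false);
}

Both this and the TestAndSet lock keep every waiting CPU spinning on the same shared lock word, so the atomic operations generate continuous memory-bus traffic - one answer to the performance question above.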
Processor Scheduling
PS: ready tasks are assigned to the processors so
that performance is maximized.
Because tasks cooperate and communicate through shared
variables or message passing, processor scheduling in a
multiprocessor system is a difficult problem.
PS is very critical to the performance of
multiprocessor systems because a naïve scheduler
can degrade performance substantially.
Issues in Processor Scheduling
3 major causes of performance degradation are
Preemption inside spinlock-controlled critical sections.
This situation occurs when a task is preempted inside a CS while there
are other tasks spinning on the lock to enter the same CS.
cache corruption
A big chunk of data needed by the previous task must be purged from the
cache and new data must be brought into the cache.
The miss ratio is very high after a processor switches to another task
(cache corruption).
context switching overheads
Execution of a large number of instructions to save and restore the
registers, to initialize the registers, to switch address spaces, etc.
Co-Scheduling of the Medusa OS
Co-scheduling - proposed by Ousterhout for the
Medusa OS on Cm*.
All runnable tasks of an application are scheduled
on the processors simultaneously.
Context switching occurs between applications rather
than between tasks of several different applications.
Problem: tasks waste resources in lock-spinning
while they wait for a preempted task to release
the critical section.
Smart Scheduling
Proposed by Zahorjan et al. - two nice features:
It avoids preempting a task when the task is inside its
CS.
It avoids rescheduling tasks that were busy-waiting at
the time of their preemption until the task that is
executing the corresponding CS releases it.
This eliminates the resource waste due to a processor
spinning on a lock.
It does nothing, however, to reduce the overhead due to
context switching or the performance degradation due to
cache corruption.
Scheduling in the NYU Ultracomputer
Proposed by Edler et al.; it combines the strategies of
the previous two scheduling techniques.
Tasks can be formed into groups and scheduled in
any of the following ways:
A task is scheduled or preempted in the normal manner.
All tasks in a group are scheduled or preempted
simultaneously.
Tasks in a group are never preempted.
Memory Management
The Mach Operating System
Virtual memory management of the Mach OS, developed at CMU
Design Issues
Portability
Data sharing
Protection
Efficiency
The Mach Kernel
Basic primitives necessary for building parallel and
distributed applications.
The Mach Kernel
[Figure: user processes run in user space on top of a software
emulation layer (4.3 BSD, System V, HP/UX, and other emulators);
the Mach microkernel runs beneath them in kernel space.]
The kernel manages five principal
abstractions:
1. Processes.
2. Threads.
3. Memory objects.
4. Ports.
5. Messages.
Process Management in Mach
[Figure: a Mach process consists of an address space containing its
threads, together with kernel-managed ports: a process port, a
bootstrap port, an exception port, and registered ports.]
Ports
The process port is used to communicate with the
kernel.
The bootstrap port is used for initialization when a
process starts up.
The exception port is used to report exceptions
caused by the process. Typical exceptions are division
by zero and execution of an illegal instruction.
The registered ports are normally used to provide a
way for the process to communicate with standard
system servers.
Ports
A process can be runnable or blocked.
If a process is runnable, those threads that are
also runnable can be scheduled and run.
If a process is blocked, its threads may not
run, no matter what state they are in.
Process Management Primitives
Create - Create a new process, inheriting certain properties
Terminate - Kill a specified process
Suspend - Increment the suspend counter
Resume - Decrement the suspend counter; if it is 0, unblock the process
Priority - Set the priority for current or future threads
Assign - Tell which processor new threads should run on
Info - Return information about execution time, memory usage, etc.
Threads - Return a list of the process' threads
Threads
Mach threads are managed by the kernel. Thread creation and destruction are
done by the kernel.
Fork - Create a new thread running the same code as the parent thread
Exit - Terminate the calling thread
Join - Suspend the caller until a specified thread exits
Detach - Announce that the thread will never be joined (waited for)
Yield - Give up the CPU voluntarily
Self - Return the calling thread's identity to it
Scheduling algorithm
When a thread blocks, exits, or uses up its quantum,
the CPU it is running on first looks at its local run
queue to see if there are any runnable threads.
If that queue is nonempty, it runs the highest-priority
thread, starting the search at the priority level given
by the hint.
If the local run queue is empty, the same algorithm is
applied to the global run queue. The global queue
must be locked first.
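A minimal sketch of this dispatch rule follows. The types, field names, and locking helper are modeled on the description above, not on Mach's actual sources; the mutexes are assumed already initialized.

/* Pick the next thread: local run queue first, then the global one. */
#include <pthread.h>
#include <stddef.h>

typedef struct thread { struct thread *next; } thread_t;

typedef struct {
   pthread_mutex_t mu;    /* global queues must be locked before use */
   int count;             /* number of runnable threads in the queue */
   int hint;              /* highest priority level that may be occupied */
   thread_t *queues[32];  /* one list per priority, 0 = high, 31 = low */
} run_queue_t;

thread_t *pick_thread(run_queue_t *local, run_queue_t *global) {
   run_queue_t *q = local->count ? local : global;
   if (q == global) pthread_mutex_lock(&q->mu);

   thread_t *t = NULL;
   for (int p = q->hint; p < 32 && q->count > 0; p++) {
      if (q->queues[p]) {              /* highest-priority nonempty level */
         t = q->queues[p];
         q->queues[p] = t->next;
         q->count--;
         q->hint = p;                  /* remember where we found one */
         break;
      }
   }
   if (q == global) pthread_mutex_unlock(&q->mu);
   return t;                           /* NULL means nothing runnable */
}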
Scheduling
[Figure: each processor set has its own global run queue: an array of
32 priority levels (0 = high, 31 = low) of runnable threads, a count
of threads in the queue (6 and 7 in the example), and a hint (2 and 4)
giving the highest priority level that may be occupied.]
Memory Management in Mach
Mach has a powerful, elaborate, and highly flexible memory
management system based on paging.
The code of Mach’s memory management is split into three
parts. The first part is the pmap module, which runs in the
kernel and is concerned with managing the MMU.
The second part, the machine-independent kernel code, is
concerned with processing page faults, managing address
maps, and replacing pages.
The third part of the memory management code runs as a
user process called a memory manager. It handles the logical
part of the memory management system, primarily
management of the backing store (disk).
Virtual Memory
The conceptual model of memory that Mach user
processes see is a large, linear virtual address space.
The address space is supported by paging.
A key concept relating to the use of virtual address
space is the memory object. A memory object can be
a page or a set of pages, but it can also be a file or
other, more specialized data structure.
An address space with allocated regions,
mapped objects, and unused addresses
File xyz region
Unused
Stack region
Unused
Data region
Unused
Text region
System calls for virtual address
space manipulation
Allocate - Make a region of virtual address space usable
Deallocate - Invalidate a region of virtual address space
Map - Map a memory object into the virtual address space
Copy - Make a copy of a region at another virtual address
Inherit - Set the inheritance attribute for a region
Read - Read data from another process' virtual address space
Write - Write data to another process' virtual address space
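On Mach-derived systems (modern macOS is assumed here), the Allocate and Deallocate calls survive almost unchanged as vm_allocate and vm_deallocate; a hedged sketch:

/* Allocate, then deallocate, one page of virtual address space. */
#include <mach/mach.h>
#include <stdio.h>

int main(void) {
   vm_address_t addr = 0;
   vm_size_t size = 4096;

   /* "Allocate": make a region of virtual address space usable */
   kern_return_t kr = vm_allocate(mach_task_self(), &addr, size,
                                  VM_FLAGS_ANYWHERE);
   if (kr != KERN_SUCCESS) return 1;
   printf("region allocated at %#lx\n", (unsigned long)addr);

   /* "Deallocate": invalidate the region */
   vm_deallocate(mach_task_self(), addr, size);
   return 0;
}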
Memory Sharing
[Figure: three processes map the same file into their address spaces,
so the mapped region is shared among all of them.]
Operation of Copy-on-Write
[Figure: after a copy-on-write fork, the prototype's and the child's
address spaces (pages 0-7) map the same physical pages; pages that are
conceptually writable (RW) are mapped read-only (RO) in both spaces so
that the first write traps.]
Operation of Copy-on-Write
[Figure: when the child writes page 7, the kernel copies it to a fresh
physical page (8) and maps the copy into the child's address space with
write permission; the remaining pages stay shared read-only.]
Advantages of Copy-on-write
1. some pages are read-only, so there is no
need to copy them.
2. other pages may never be referenced, so
they do not have to be copied.
3. still other pages may be writable, but the
child may deallocate them rather than using
them.
Disadvantages of Copy-on-write
1. the administration is more complicated.
2. requires multiple kernel traps, one for each
page that is ultimately written.
3. does not work over a network.
External Memory Managers
Each memory object that is mapped in a process’ address
space must have an external memory manager that controls
it. Different classes of memory objects are handled by
different memory managers.
Three ports are needed to do the job.
The object port is created by the memory manager and will
later be used by the kernel to inform the memory manager
about page faults and other events relating to the object.
The control port is created by the kernel itself so that the
memory manager can respond to these events.
The name port is used as a kind of name to identify the
object.
Distributed Shared Memory in Mach
The idea is to have a single, linear, virtual
address space that is shared among processes
running on computers that do not have any
physical shared memory. When a thread
references a page that it does not have, it
causes a page fault. Eventually, the page is
located and shipped to the faulting machine,
where it is installed so that the thread can
continue executing.
Communication in Mach
The basis of all communication in Mach is a kernel data
structure called a port.
When a thread in one process wants to communicate with a
thread in another process, the sending thread writes the
message to the port and the receiving thread takes it out.
Each port is protected to ensure that only authorized
processes can send to it and receive from it.
Ports support unidirectional communication. A port that can
be used to send a request from a client to a server cannot also
be used to send the reply back from the server to the client. A
second port is needed for the reply.
A Mach port
Message queue
Current message count
Maximum messages
Port set this port belongs to
Counts of outstanding capabilities
Capabilities to use for error reporting
Queue of threads blocked on this port
Pointer to the process holding the RECEIVE capability
Index of this port in the receiver’s capability list
Pointer to the kernel object
Miscellaneous items
Message passing via a port
[Figure: a sending thread writes a message into a port, which lives in
the kernel; the receiving thread then takes the message out of the port.]
Capabilities
[Figure: processes A and B each have a capability list maintained in the
kernel. A holds a capability with the RECEIVE right for port X; B holds a
capability with the SEND right for the same port. A capability is named
by its index in its process' list (entries 1-4, referring to ports X
and Y in the example).]
Primitives for Managing Ports
Allocate - Create a port and insert its capability in the capability list
Destroy - Destroy a port and remove its capability from the list
Deallocate - Remove a capability from the capability list
Extract_right - Extract the n-th capability from another process
Insert_right - Insert a capability in another process' capability list
Move_member - Move a capability into a capability set
Set_qlimit - Set the number of messages a port can hold
Sending and Receiving Messages
Mach_msg(&hdr, options, send_size, rcv_size, rcv_port, timeout, notify_port);
The first parameter, hdr, is a pointer to the message to be sent or to the place
where the incoming message is put, or both.
The second parameter, options, contains a bit specifying that a message is to be
sent, and another one specifying that a message is to be received. Another bit
enables a timeout, given by the timeout parameter. Other bits in options allow a
SEND that cannot complete immediately to return control anyway, with a status
report being sent to notify_port later.
The send_size and rcv_size parameters tell how large the outgoing message is and
how many bytes are available for storing the incoming message, respectively.
Rcv_port is used for receiving messages. It is the capability name of the port or
port set being listened to.
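A hedged sketch of one mach_msg() round trip, written against the headers of a modern Mach-derived system (macOS assumed); the message layout and port setup follow the common minimal pattern, and error handling is elided:

/* Send a trivial message to a port we own, then receive it back. */
#include <mach/mach.h>
#include <stdio.h>

typedef struct {
   mach_msg_header_t header;
   int payload;                          /* one inline data field */
} simple_msg_t;

int main(void) {
   mach_port_t port;
   mach_port_allocate(mach_task_self(), MACH_PORT_RIGHT_RECEIVE, &port);

   simple_msg_t msg = {0};
   msg.header.msgh_bits = MACH_MSGH_BITS(MACH_MSG_TYPE_MAKE_SEND, 0);
   msg.header.msgh_remote_port = port;           /* destination */
   msg.header.msgh_local_port = MACH_PORT_NULL;  /* no reply port */
   msg.header.msgh_size = sizeof(msg);
   msg.payload = 42;

   /* option bits select the operation: here a pure SEND */
   mach_msg(&msg.header, MACH_SEND_MSG, sizeof(msg), 0,
            MACH_PORT_NULL, MACH_MSG_TIMEOUT_NONE, MACH_PORT_NULL);

   struct {                              /* receive buffer needs room */
      simple_msg_t m;                    /* for the kernel trailer */
      mach_msg_trailer_t trailer;
   } rcv = {0};
   mach_msg(&rcv.m.header, MACH_RCV_MSG, 0, sizeof(rcv),
            port, MACH_MSG_TIMEOUT_NONE, MACH_PORT_NULL);
   printf("received payload %d\n", rcv.m.payload);
   return 0;
}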
The Mach message format
[Figure: the Mach message format. The header holds the destination
rights, reply rights, and a complex/simple flag; the message size; a
capability index for the destination port; and a capability index for
the reply port. A message kind and function code follow, but are not
examined by the kernel. The body is a sequence of descriptors, each
followed by its data field.]
Complex message field descriptor
[Figure: a complex message field descriptor. Its bit fields (widths 1,
1, 1, 1, 12, 8, 8) encode: whether out-of-line data is present (0) or
not (1); short form (0) or long form (1) descriptor; whether the sender
keeps (0) or deallocates (1) the out-of-line data; the number of items
in the data field; the data field size in bits; and the data field type
(bit, byte, unstructured word, 8/16/32-bit integer, character, 32
booleans, floating point, string, or capability).]
Reliability/Fault Tolerance: the
SEQUOIA System
Sequoia system - a fault-tolerant, tightly coupled
multiprocessor system.
Attains a high level of fault tolerance by
performing fault detection in hardware and
fault recovery in the OS.
Design Issues
Fault detection and isolation
Fault recovery
Efficiency
The Sequoia Architecture
Reliability/Fault Tolerance: the
SEQUOIA System
Fault detection
Error detecting codes
Comparison of duplicated operations
Protocol monitoring
Fault Recovery
Recovery from processor failures
Recovery from main memory failures
Recovery from I/O failures
Database Operating Systems
Database systems have traditionally been
implemented as applications on top of a
general-purpose OS.
Requirements of a DBOS:
Transaction management
Support for complex, persistent data
Buffer management
Concurrency Control
CC is the process of controlling concurrent access to a database to
ensure that the correctness of the database is maintained.
Database systems
Set of shared data objects that can be accessed by users.
Transactions
A transaction consists of a sequence of read, compute, and write
statements that refer to the data objects of a database.
Conflicts
Transactions conflict if they access the same data objects.
Transaction processing
A transaction is executed by executing its actions one by one
from the beginning to the end.
A concurrency control model of DBS
3 software modules
Transaction manager (TM)
Supervises the execution of a transaction
Scheduler
Responsible for enforcing concurrency control
Data manager (DM)
Manages access to the stored database
Distributed Database System
A distributed database is a database in which storage devices
are not all attached to a common processing unit such as the
CPU.
It may be stored in multiple computers, located in the same
physical location; or may be dispersed over a network of
interconnected computers.
Unlike parallel systems, in which the processors are tightly
coupled and constitute a single database system, a distributed
database system consists of loosely coupled sites that share
no physical components.
Model of Distributed Database System
Distributed Database System
Motivations: DDBS offers several advantages over a centralized
database system such as
Sharing
Higher system availability (reliability)
Improved performance
Easy expandability
Large databases
Transaction Processing Model
Serializability condition in DDBS
Data replication
Complications due to Data replication
Fully Replicated Database Systems
1. Enhanced reliability
2. Improved responsiveness
3. No directory management
4. Easier load balancing
Concurrency Control Algorithms
It controls the interleaving of conflicting actions of
transactions so that the integrity of a database is
maintained, i.e., their net effect is a serial execution.
Basic synchronization primitives
Locks
A transaction can request, hold or release the lock on a data
object.
lock a data object in 2 modes: exclusive and shared
Timestamps
Unique number is assigned to a transaction or a data object and is
chosen from a monotonically increasing sequence.
Commonly generated using Lamport’s scheme
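A minimal sketch of how such timestamps are commonly generated with Lamport's scheme; the counter-plus-site-id layout is a common convention assumed here, not taken from a specific system:

/* Lamport-style unique, monotonically increasing timestamps. */
#include <stdint.h>

static uint32_t local_clock = 0;   /* monotonically increasing counter */
static uint32_t site_id = 7;       /* unique id of this site (illustrative) */

/* Counter in the high bits, site id in the low bits: timestamps are
   unique across sites and respect local event order. */
uint64_t new_timestamp(void) {
   local_clock++;
   return ((uint64_t)local_clock << 32) | site_id;
}

/* On seeing a timestamp from another site, advance past it so later
   local timestamps remain larger. */
void observe_timestamp(uint64_t ts) {
   uint32_t their_clock = (uint32_t)(ts >> 32);
   if (their_clock > local_clock)
      local_clock = their_clock;
}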
Lock based algorithms
Static locking
Two Phase Locking (2PL) - see the sketch after this list
Problems with 2PL: the price for higher concurrency
2PL in DDBS
Timestamp-based locking
Conflict resolution: wait, restart, die, wound
Non-two-phase locking
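A minimal sketch of the two-phase discipline using POSIX mutexes; the Account type, transfer operation, and fixed lock order (which avoids deadlock) are illustrative assumptions, and the mutexes are assumed initialized:

/* 2PL: a growing phase that only acquires locks, then a shrinking
   phase that only releases them - no lock is acquired after the
   first release. */
#include <pthread.h>

typedef struct {
   pthread_mutex_t lock;
   long balance;
} Account;

void transfer(Account *a, Account *b, long amount) {
   Account *first = (a < b) ? a : b;    /* fixed global lock order */
   Account *second = (a < b) ? b : a;

   pthread_mutex_lock(&first->lock);    /* growing phase */
   pthread_mutex_lock(&second->lock);

   a->balance -= amount;                /* work done under all locks */
   b->balance += amount;

   pthread_mutex_unlock(&second->lock); /* shrinking phase */
   pthread_mutex_unlock(&first->lock);
}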
Timestamp Based Algorithms
Basic timestamp ordering algorithm - see the sketch after this list
Thomas Write Rule (TWR)
Multiversion timestamp ordering algorithm
Conservative timestamp ordering algorithm
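A minimal sketch of the basic timestamp-ordering checks; the per-object read/write timestamps and the "return -1 means abort and restart" convention are illustrative. Applying the Thomas Write Rule would instead silently skip an obsolete write rather than aborting:

/* Basic timestamp ordering on a single data object. */
#include <stdint.h>

typedef struct {
   uint64_t rts;    /* largest timestamp of any read performed */
   uint64_t wts;    /* largest timestamp of any write performed */
   long     value;
} Object;

/* Transaction with timestamp ts reads x; -1 means abort and restart. */
int to_read(Object *x, uint64_t ts, long *out) {
   if (ts < x->wts) return -1;        /* a younger write already happened */
   if (ts > x->rts) x->rts = ts;
   *out = x->value;
   return 0;
}

/* Transaction with timestamp ts writes v into x. */
int to_write(Object *x, uint64_t ts, long v) {
   if (ts < x->rts || ts < x->wts) return -1;   /* too late: abort */
   x->wts = ts;
   x->value = v;
   return 0;
}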
Thank U