Unit 4
MEMORY MANAGEMENT
Main Memory: Background – Swapping - Contiguous Memory Allocation – Segmentation – Paging - Structure
of the Page Table – Virtual Memory: Background - Demand Paging - Copy-on-Write - Page Replacement -
Allocation of Frames – Thrashing - Memory-Mapped Files - Allocating Kernel Memory.
Introduction to Memory
Memory Background
Role of OS in memory management
Role of Memory
• A typical instruction-execution cycle, for example, first fetches an
  instruction from memory.
• The instruction is then decoded and may cause operands to be fetched
  from memory.
• After the instruction has been executed on the operands, results are stored
  back in memory.
• Main memory and the registers built into the processor itself are the only
  general-purpose storage that the CPU can access directly.
• Registers that are built into the CPU are generally accessible within one
  cycle of the CPU clock.
Process and Address
• Each process has a separate memory space.
• Separate per-process memory space protects the processes from each other
  and is fundamental to having multiple processes loaded in memory for
  concurrent execution.
• To separate memory spaces, we need the ability to determine the range of
  legal addresses that the process may access and to ensure that the process
  can access only these legal addresses.
A base and a limit register define a logical address space.
• If the base register holds 300040 and the limit register holds 120900, then the
  program can legally access all addresses from 300040 through 420939 (inclusive).
• Any attempt by a program executing in user mode to
  access operating-system memory or users’ memory
  results in a trap to the operating system, which treats
  the attempt as a fatal error.
• This scheme prevents a user program from
  (accidentally or deliberately) modifying the code or
  data structures of either the operating system or other
  users.
Base and limit register
• The base and limit registers can be loaded only by the operating system,
  which uses a special privileged instruction.
• Since privileged instructions can be executed only in kernel mode, and
  since only the operating system executes in kernel mode, only the
  operating system can load the base and limit registers.
• This scheme allows the operating system to change the value of the
  registers but prevents user programs from changing the registers’ contents.
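As an illustration, the check the hardware performs on every user-mode memory reference can be sketched in C. This is a minimal software sketch, not real hardware; the register values and the trap() helper are hypothetical, reusing the 300040/120900 example above.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical protection registers, loadable only by the OS. */
static uint32_t base  = 300040;  /* start of the legal address range */
static uint32_t limit = 120900;  /* size of the legal address range  */

/* Stand-in for the hardware trap to the operating system. */
static void trap(const char *why) {
    fprintf(stderr, "trap to OS: %s\n", why);
    exit(EXIT_FAILURE);
}

/* Every user-mode memory reference is checked against base/limit. */
uint32_t check_address(uint32_t addr) {
    if (addr < base || addr >= base + limit)
        trap("addressing error");
    return addr;  /* legal addresses: 300040 through 420939 inclusive */
}

int main(void) {
    check_address(300040);  /* legal                  */
    check_address(420940);  /* one past the end: trap */
}
```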
Address Binding
• A program resides on a disk as a binary executable file.
• To be executed, the program must be brought into memory and placed
  within a process.
• Depending on the memory management in use, the process may be moved
  between disk and memory during its execution.
• The processes on the disk that are waiting to be brought into memory for
  execution form the input queue.
• The normal procedure is to select one of the processes in the input queue and
  to load that process into memory.
Address Binding (Cont..)
• Most systems allow a user process to reside in any part of the physical memory.
• Thus, although the address space of the computer may start at 00000, the first
  address of the user process need not be 00000.
• Addresses may be represented in different ways during these steps.
• Addresses in the source program are generally symbolic (such as the variable count).
• A compiler typically binds these symbolic addresses to relocatable addresses (such
  as “14 bytes from the beginning of this module”).
• The linkage editor or loader in turn binds the relocatable addresses to absolute
  addresses (such as 74014).
• Each binding is a mapping from one address space to another.
• The binding of instructions and data to memory addresses can be done at any
  of these steps: compile time, load time, or execution time.
Dynamic Linking and Shared Libraries
• Some operating systems support only static linking, in which system
  libraries are treated like any other object module and are combined by the
  loader into the binary program image.
• Dynamic linking, in contrast, is similar to dynamic loading. Here, though,
  linking, rather than loading, is postponed until execution time.
• With dynamic linking, a stub is included in the image for each library
  routine reference.
• The stub is a small piece of code that indicates how to locate the
  appropriate memory-resident library routine or how to load the library if
  the routine is not already present.
Types of Loading
  Dynamic Loading
• Until now, the entire program had to be in memory for a process to execute, so
  the size of a process was limited to the size of physical memory. To obtain
  better memory-space utilization, we can use dynamic loading.
• With dynamic loading, a routine is not loaded until it is called. All routines are
  kept on disk in a relocatable load format.
• The advantage of dynamic loading is that a routine is loaded only when it is
  needed.
• This method is particularly useful when large amounts of code are needed to
  handle infrequently occurring cases, such as error routines.
• It is the responsibility of the users to design their programs to take advantage of
  such a method. Operating systems may help the programmer, however, by
  providing library routines to implement dynamic loading.
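On POSIX systems, such library routines are the dlfcn functions. Below is a minimal sketch of loading a routine only when it is first needed; the library path libm.so.6 is a Linux-specific assumption.

```c
#include <dlfcn.h>
#include <stdio.h>

int main(void) {
    /* Load the library only at the point the routine is needed. */
    void *handle = dlopen("libm.so.6", RTLD_LAZY);
    if (!handle) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return 1;
    }

    /* Find the address of the routine inside the loaded library. */
    double (*cosine)(double) = (double (*)(double))dlsym(handle, "cos");
    if (!cosine) {
        fprintf(stderr, "dlsym: %s\n", dlerror());
        return 1;
    }

    printf("cos(0.0) = %f\n", cosine(0.0));
    dlclose(handle);
    return 0;
}
```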
Logical Versus Physical Address Space
• An address generated by the CPU is commonly referred to as a logical
  address (virtual address), whereas an address seen by the memory unit
  (that is, the one loaded into the memory-address register of the memory) is
  commonly referred to as a physical address.
• The set of all logical addresses generated by a program is a logical address
  space.
• The set of all physical addresses corresponding to these logical addresses
  is a physical address space.
• The compile-time and load-time address-binding methods generate
  identical logical and physical addresses.
• With execution-time binding, the logical (virtual) and physical addresses differ.
Run-time mapping: virtual address → MMU → physical address
Memory Management Unit (MMU)
• The run-time mapping from virtual to physical addresses is done by a
  hardware device called the memory-management unit (MMU).
• The base register is now called a relocation register.
• The value in the relocation register is added to every address generated by
  a user process at the time the address is sent to memory.
• The user program never sees the real physical addresses.
• We now have two different types of addresses: logical addresses (in the
  range 0 to max) and physical addresses (in the range R + 0 to R + max for a
  base value R).
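A minimal sketch of this relocation in C, assuming a base value R = 14000 (so logical address 346 maps to physical address 14346):

```c
#include <stdint.h>

/* Hypothetical relocation (base) register R, set by the OS. */
static const uint32_t relocation = 14000;

/* The MMU adds R to every address on its way to memory. */
uint32_t mmu_translate(uint32_t logical) {
    return relocation + logical;  /* physical = R + logical */
}
/* The user program works only with logical addresses 0..max and
   never sees the resulting physical addresses. */
```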
Swapping
• A process must be in memory to be executed.
• A process, however, can be swapped temporarily out of memory to a
  backing store and then brought back into memory for continued
  execution.
• Swapping makes it possible for the total physical address space of all
  processes to exceed the real physical memory of the system, thus
  increasing the degree of multiprogramming in a system.
• Standard swapping involves moving processes between main memory
  and a backing store.
Swapping of two process
What is a backing store?
• The backing store is commonly a fast disk.
• Large enough to accommodate copies of all memory images for all
  users, and it must provide direct access to these memory images.
• The system maintains a ready queue consisting of all processes
  whose memory images are on the backing store or in memory and are
  ready to run.
• Whenever the CPU scheduler decides to execute a process, it calls the
  dispatcher.
• The dispatcher checks to see whether the next process in the queue is
  in memory.
Standard swapping
• The major part of the swap time is transfer time.
• The total transfer time is directly proportional to the amount of
  memory swapped.
• If we want to swap a process, we must be sure that it is
  completely idle.
• Never swap a process with pending I/O, or execute I/O operations
  only into operating-system buffers.
• Standard swapping requires too much swapping time and provides too little
  execution time to be a reasonable memory-management solution.
Memory Allocation
• The main memory must accommodate both the operating system
  and the various user processes.
• We can place the operating system in either low memory or high
  memory.
• Since the interrupt vector is often in low memory, programmers
  usually place the operating system in low memory as well.
• In contiguous memory allocation, each process is contained in a
  single section of memory that is contiguous to the section
  containing the next process.
Memory Allocation (Cont..)
• One of the simplest methods for allocating memory is to divide memory
  into several fixed-sized partitions.
• In this multiple-partition method, when a partition is free, a process is
  selected from the input queue and is loaded into the free partition.
• When the process terminates, the partition becomes available for another
  process.
• In the variable-partition scheme, the operating system keeps a table
  indicating which parts of memory are available and which are occupied.
• Initially, all memory is available for user processes and is considered one
  large block of available memory, a hole.
Memory Allocation (Cont..)
• In general, as mentioned, the memory blocks available comprise a set of
  holes of various sizes scattered throughout memory.
• When a process arrives and needs memory, the system searches the set for
  a hole that is large enough for this process.
• If the hole is too large, it is split into two parts.
• One part is allocated to the arriving process; the other is returned to the set
  of holes.
• If the new hole is adjacent to other holes, these adjacent holes are merged
  to form one larger hole.
• At that point, the system may need to check whether there are processes
  waiting for memory and whether this newly freed and recombined memory
  could satisfy the demands of any of these waiting processes.
How to satisfy a request of size n from a
list of free holes?
• First fit. Allocate the first hole that is big enough. Searching can start
  either at the beginning of the set of holes or at the location where the
  previous first-fit search ended. We can stop searching as soon as we find a
  free hole that is large enough.
• Best fit. Allocate the smallest hole that is big enough. We must search the
  entire list, unless the list is ordered by size. This strategy produces the
  smallest leftover hole.
• Worst fit. Allocate the largest hole. Again, we must search the entire list,
  unless it is sorted by size. This strategy produces the largest leftover hole,
  which may be more useful than the smaller leftover hole from a best-fit
  approach.
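Below is a sketch of these strategies over a hypothetical linked list of free holes; first fit stops at the first match, while best fit must scan the whole (unsorted) list:

```c
#include <stddef.h>

/* A node in a hypothetical list of free holes. */
struct hole {
    size_t start;        /* starting address of the hole */
    size_t size;         /* size of the hole in bytes    */
    struct hole *next;
};

/* First fit: return the first hole large enough for n bytes. */
struct hole *first_fit(struct hole *list, size_t n) {
    for (struct hole *h = list; h != NULL; h = h->next)
        if (h->size >= n)
            return h;                 /* stop at the first match */
    return NULL;                      /* no hole is large enough */
}

/* Best fit: scan the entire list for the smallest adequate hole.
 * Worst fit is the same scan keeping the largest hole instead. */
struct hole *best_fit(struct hole *list, size_t n) {
    struct hole *best = NULL;
    for (struct hole *h = list; h != NULL; h = h->next)
        if (h->size >= n && (best == NULL || h->size < best->size))
            best = h;                 /* smallest leftover hole   */
    return best;
}
```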
External Fragmentation
• Both the first-fit and best-fit strategies for memory allocation suffer from
  external fragmentation.
• As processes are loaded and removed from memory, the free memory
  space is broken into little pieces.
• External fragmentation exists when there is enough total memory space to
  satisfy a request but the available spaces are not contiguous: storage is
  fragmented into a large number of small holes.
• In the worst case, we could have a block of free (or wasted) memory
  between every two processes. If all these small pieces of memory were in
  one big free block instead, we might be able to run several more processes.
Internal Fragmentation
• Statistical analysis of first fit, for instance, reveals that, even with some
  optimization, given N allocated blocks, another 0.5 N blocks will be lost to
  fragmentation. That is, one-third of memory may be unusable! This
  property is known as the 50-percent rule.
• Memory fragmentation can be internal as well as external.
• Consider a multiple-partition allocation scheme with a hole of 18,464
  bytes.
• Suppose that the next process requests 18,462 bytes.
• If we allocate exactly the requested block, we are left with a hole of 2 bytes.
  The overhead to keep track of this hole will be substantially larger than the
  hole itself.
Internal Fragmentation
• The general approach to avoiding this problem is to break the physical
  memory into fixed-sized blocks and allocate memory in units based on
  block size.
• With this approach, the memory allocated to a process may be slightly
  larger than the requested memory. The difference between these two
  numbers is internal fragmentation—unused memory that is internal to a
  partition.
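The block-rounding arithmetic can be made concrete with a short sketch; the 4,096-byte block size is an assumption for illustration:

```c
#include <stdio.h>

#define BLOCK_SIZE 4096u  /* hypothetical fixed block size */

int main(void) {
    unsigned request = 18462;  /* bytes requested by the process */
    unsigned blocks  = (request + BLOCK_SIZE - 1) / BLOCK_SIZE;  /* round up */
    unsigned granted = blocks * BLOCK_SIZE;

    /* Internal fragmentation is the slack inside the allocation. */
    printf("%u blocks, %u bytes granted, %u bytes internal fragmentation\n",
           blocks, granted, granted - request);  /* 5, 20480, 2018 */
    return 0;
}
```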
• One solution to the problem of external fragmentation
  is compaction
Compaction
• The goal is to shuffle the memory contents to place all free memory
  together in one large block.
• Compaction is not always possible, however.
• If relocation is static and is done at assembly or load time, compaction
  cannot be done.
• Compaction is possible only if relocation is dynamic and is done at
  execution time.
• If addresses are relocated dynamically, relocation requires only moving
  the program and data and then changing the base register to reflect the
  new base address.
Compaction (Cont..)
• The simplest compaction algorithm is to move all processes toward one
  end of memory; all holes move in the other direction, producing one large
  hole of available memory. This scheme can be expensive.
Solution to External Fragmentation
• Another possible solution to the external-fragmentation
  problem is to permit the logical address space of the processes
  to be noncontiguous, thus allowing a process to be allocated
  physical memory wherever such memory is available.
Two complementary techniques that achieve this solution are:
    1. Segmentation
    2. Paging
Segmentation – Programmer's View of Memory
• User’s view of memory is not the same as the actual physical memory.
• Segmentation is a memory-management scheme that supports this
  programmer view of memory.
Segmentation (cont..)
• Each segment has a name and a length. The addresses specify both the
  segment name and the offset within the segment.
• The programmer therefore specifies each address by two quantities: a
  segment name and an offset.
• For simplicity of implementation, segments are numbered and are referred
  to by a segment number, rather than by a segment name.
• Thus, a logical address consists of a two-tuple:
<segment-number, offset>.
Segments
A C compiler might create separate segments for the following:
1. The code
2. Global variables
3. The heap, from which memory is allocated
4. The stacks used by each thread
5. The standard C library
Segmentation Hardware
• Although the programmer can now refer to objects in the program by a
  two-dimensional address, the actual physical memory is still a
  one-dimensional sequence of bytes.
• Thus, we must define an implementation to map two-dimensional user-
  defined addresses into one-dimensional physical addresses.
• This mapping is effected by a segment table.
• Each entry in the segment table has a segment base and a segment limit.
• The segment base contains the starting physical address where the
  segment resides in memory, and the segment limit specifies the length of
  the segment.
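Below is a sketch of this lookup, using the base/limit pairs of the classic textbook example (segment 2 starts at 4300 with length 1100, so <2, 53> maps to 4353); the table contents are illustrative:

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* One segment-table entry: a segment base and a segment limit. */
struct seg_entry {
    uint32_t base;   /* starting physical address of the segment */
    uint32_t limit;  /* length of the segment                    */
};

/* Hypothetical segment table for one process. */
static struct seg_entry seg_table[] = {
    {1400, 1000},    /* segment 0 */
    {6300,  400},    /* segment 1 */
    {4300, 1100},    /* segment 2 */
};

/* Map a two-dimensional <s, d> address to a one-dimensional one. */
uint32_t translate(uint32_t s, uint32_t d) {
    if (d >= seg_table[s].limit) {   /* offset beyond segment end */
        fprintf(stderr, "trap: addressing error\n");
        exit(EXIT_FAILURE);
    }
    return seg_table[s].base + d;
}

int main(void) {
    printf("%u\n", translate(2, 53));  /* prints 4353 */
    return 0;
}
```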
Segmentation Hardware
Segmentation Example
Paging
• Paging avoids external fragmentation and the need for compaction
  whereas segmentation does not.
• The problem arises because, when code fragments or data residing in main
  memory need to be swapped out, space must be found on the backing
  store.
• Paging is implemented through cooperation between the operating system
  and the computer hardware.
Basic Method of Paging Implementation
• The basic method for implementing paging involves breaking physical
  memory into fixed-sized blocks called frames and breaking logical
  memory into blocks of the same size called pages.
• The backing store is divided into fixed-sized blocks that are the same size
  as the memory frames or clusters of multiple frames.
• Every address generated by the CPU is divided into two parts: a page
  number (p) and a page offset (d).
• The page number is used as an index into a page table.
• The page table contains the base address of each page in physical memory.
• This base address is combined with the page offset to define the physical
  memory address that is sent to the memory unit.
Paging Hardware
Paging Model
Paging
• The page size (like the frame size) is defined by the hardware.
• If the size of the logical address space is 2^m and the page size is 2^n bytes,
  then the high-order m − n bits of a logical address designate the page
  number, and the n low-order bits designate the page offset.
• p is an index into the page table and d is the displacement within the page.
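A sketch of this translation in C, assuming 4 KB pages (n = 12) and a small, illustrative page table:

```c
#include <stdint.h>

#define PAGE_BITS 12u                  /* n: page size 2^12 = 4096 bytes */
#define PAGE_SIZE (1u << PAGE_BITS)

/* Hypothetical page table: entry p holds the frame number of page p. */
static uint32_t page_table[16] = { 5, 6, 1, 2 };

uint32_t translate(uint32_t logical) {
    uint32_t p = logical >> PAGE_BITS;       /* high-order m − n bits */
    uint32_t d = logical & (PAGE_SIZE - 1);  /* low-order n bits      */
    return page_table[p] * PAGE_SIZE + d;    /* frame base + offset   */
}
```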
Multi-level Paging
• Most modern computer systems
  support a large logical address
  space. In such an environment, the
  page table itself becomes excessively
  large.
• One way is to use a two-level paging
  algorithm, in which the page table
  itself is also paged.
Multi-level address translation
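For example, with a 32-bit logical address, 4 KB pages, and a 10/10/12 bit split (a common textbook layout, assumed here), the address decomposes as follows:

```c
#include <stdint.h>

#define OFFSET_BITS 12u  /* d:  offset within the page               */
#define INNER_BITS  10u  /* p2: index into an inner page table       */
                         /* p1: remaining bits index the outer table */

void split(uint32_t logical, uint32_t *p1, uint32_t *p2, uint32_t *d) {
    *d  = logical & ((1u << OFFSET_BITS) - 1);
    *p2 = (logical >> OFFSET_BITS) & ((1u << INNER_BITS) - 1);
    *p1 = logical >> (OFFSET_BITS + INNER_BITS);
}
/* p1 selects an entry in the outer page table, which points to an
   inner page table; p2 selects the frame there; d is the offset. */
```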
Implementation of Paging
• The standard solution is to use a special, small, fast lookup hardware cache
  called a translation look-aside buffer (TLB).
• The TLB is associative, high-speed memory.
• When the associative memory is presented with an item, the item is compared
  with all keys simultaneously.
• If the item is found, the corresponding value field is returned.
• The search is fast; a TLB lookup in modern hardware is part of the instruction
  pipeline, essentially adding no performance penalty.
• To be able to execute the search within a pipeline step, however, the TLB must
  be kept small. It is typically between 32 and 1,024 entries in size.
Translation Lookaside Buffer (TLB)
• The TLB contains only a few of the page-table entries.
• When a logical address is generated by the CPU, its page number is presented to the TLB.
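Real TLB hardware compares the presented page number against all keys in parallel; a software sketch can only approximate that with a loop, as below. The table size and layout are illustrative:

```c
#include <stdbool.h>
#include <stdint.h>

/* One TLB entry: a (page-number, frame-number) key/value pair. */
struct tlb_entry {
    uint32_t page;
    uint32_t frame;
    bool     valid;
};

#define TLB_SIZE 64  /* TLBs are small: typically 32 to 1,024 entries */
static struct tlb_entry tlb[TLB_SIZE];

/* Return true (a hit) and the frame number if the page is cached. */
bool tlb_lookup(uint32_t page, uint32_t *frame) {
    for (int i = 0; i < TLB_SIZE; i++) {
        if (tlb[i].valid && tlb[i].page == page) {
            *frame = tlb[i].frame;  /* hit: value field returned */
            return true;
        }
    }
    return false;  /* miss: fall back to the page table in memory */
}
```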
EMAT (Effective Memory-Access Time)
• The percentage of times that the page number of interest is found in the
  TLB is called the hit ratio.
• Let t be the TLB lookup time, m the time for one access to physical memory,
  and h the hit ratio. A TLB hit costs t + m (one memory access), while a miss
  costs t + 2m (an extra access to read the page-table entry), so
  EMAT = h × (t + m) + (1 − h) × (t + 2m)
EMAT Calculation Example (TLB access time = 20 ns)
1.   An 80-percent hit ratio, for example, means that we find the desired page number in the
     TLB 80 percent of the time. If it takes 100 nanoseconds to access memory, then a
     mapped-memory access takes 20 + 100 = 120 nanoseconds when the page number is in
     the TLB, and 20 + 100 + 100 = 220 nanoseconds on a miss.
2.   EMAT = 0.80 × 120 + 0.20 × 220 = 140 nanoseconds.
Allocating Kernel Memory through Slab
• When the kernel requests memory for an object, the slab allocator returns
  the exact amount of memory required to represent the object.
Slab Allocation (Cont..)
The slab allocator provides two main benefits:
1. No memory is wasted due to fragmentation.
2. Memory requests can be satisfied quickly.
• The slab allocation scheme is thus particularly effective for managing
  memory when objects are frequently allocated and deallocated, as is often
  the case with requests from the kernel.
• Objects are created in advance and thus can be quickly allocated from the
  cache.
• When the kernel has finished with an object and releases it, it is marked as
  free and returned to its cache, thus making it immediately available for
  subsequent requests from the kernel.
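A sketch of how a kernel component uses the Linux slab interface (kmem_cache_create, kmem_cache_alloc, kmem_cache_free); it compiles only in a kernel build, and struct my_object is a hypothetical kernel object type:

```c
#include <linux/slab.h>  /* Linux kmem_cache_* slab API */

struct my_object {       /* hypothetical kernel object type */
    int  id;
    char name[32];
};

static struct kmem_cache *my_cache;

void slab_example(void) {
    /* One cache per object type; every object in it is the same size. */
    my_cache = kmem_cache_create("my_object", sizeof(struct my_object),
                                 0, SLAB_HWCACHE_ALIGN, NULL);

    /* Allocation returns exactly one pre-sized object, quickly. */
    struct my_object *obj = kmem_cache_alloc(my_cache, GFP_KERNEL);

    /* Release marks the object free in its slab for immediate reuse. */
    kmem_cache_free(my_cache, obj);

    kmem_cache_destroy(my_cache);
}
```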