Garbage Collection
slide 1
         Major Areas of Memory
● Static area
   • Fixed size, fixed content, allocated at compile time
   • Part of it holds things that do not change while the
     program runs, including:
      – the machine language program;
      – constants, including string constants.
   • That part of the static area is marked read-only, and
     your program is not allowed to change it.
   • A second part of the static area holds global variables,
     which can be changed by the programs, but that
     always occupy a fixed number of bytes.
   • The area is called static because its size does not
     change while the program runs.                         slide 2
Major Areas of Memory
● Run-time stack
  • fixed size, variable content (activation records)
  • Used for managing function calls and returns.
  • The run-time stack holds frames, where each frame
    holds information about one function call, including
    the function's variables, information about which
    function called it, and some information to help the
    program manage the run-time stack.
  • The run-time stack grows and shrinks as the program
    runs.
  • When a function is called, a frame is created for it in
    the run-time stack.
  • When a function returns, its frame is removed.          slide 3
Major Areas of Memory
● Heap
  • Fixed/Variable size, variable content
  • Dynamically allocated objects and data structures
     – Examples: ML reference cells, malloc in C, new in Java
     – The heap is another area of memory that can grow
       and shrink, but where you have direct control over
       when memory is allocated and when it is
       deallocated.
                                                                slide 4
Cells and Liveness
● Cell = data item in the heap
   • Cells are “pointed to” by pointers held in registers,
     stack, global/static memory, or in other heap cells
● Roots: registers, stack locations, global/static
  variables
● A cell is live if its address is held in a root or held
  by another live cell in the heap
                                                             slide 5
Garbage
● Garbage is a block of heap memory that cannot
  be accessed by the program
   • An allocated block of heap memory does not have a
     reference to it (cell is no longer “live”)
   • Another kind of memory error: a reference exists to a
     block of memory that is no longer allocated
● Garbage collection (GC) - automatic
  management of dynamically allocated storage
   • Reclaim unused heap blocks for later use by program
                                                             slide 6
Example of Garbage and Dangling
Reference
   class node {
       int value;    p = new node();
       node next;    q = new node();
   }                 q = p;
   node p, q;        delete p;
                                       slide 7
Why Garbage Collection?
● Today’s programs consume storage freely
   • 1GB laptops, 1-4GB deskops, 8-512GB servers
   • 64-bit address spaces (SPARC, Itanium, Opteron)
● … and mismanage it
   • Memory leaks, dangling references, null pointer
     dereference, heap fragmentation
   • Poor use of reference locality, resulting in high cache
     miss rates and/or excessive demand paging
● Explicit memory management breaks high-level
  programming abstraction
                                                               slide 8
GC and Programming Languages
● GC is not a language feature
● GC is a pragmatic concern for automatic and
  efficient heap management
   • Cooperative langs: Lisp, Scheme, Prolog, Smalltalk …
   • Uncooperative languages: C and C++
      – But garbage collection libraries have been built for C/C++
● Recent GC revival
   • Object-oriented languages: Modula-3, Java
      – In Java, runs as a low-priority thread; System.gc may be called
        by the program
   • Functional languages: ML and Haskell
                                                                     slide 9
The Perfect Garbage Collector
● No visible impact on program execution
● Works with any program and its data structures
   • For example, handles cyclic data structures
● Collects garbage (and only garbage) cells quickly
   • Incremental; can meet real-time constraints
● Has excellent spatial locality of reference
   • No excessive paging, no negative cache effects
● Manages the heap efficiently
   • Always satisfies an allocation request and does not
     fragment
                                                           slide 10
Summary of GC Techniques
● Reference counting
   • Directly keeps track of live cells
   • GC takes place whenever heap block is allocated
   • Doesn’t detect all garbage
● Tracing
   • GC takes place and identifies live cells when a
     request for memory fails
   • Mark-sweep
   • Copy collection
● Modern techniques: generational GC
                                                       slide 11
Reference Counting
● Simply count the number of references to a cell
● Requires space and time overhead to store the
  count and increment (decrement) each time a
  reference is added (removed)
   • Reference counts are maintained in real-time, so no
     “stop-and-gag” effect
   • Incremental garbage collection
● Unix file system uses reference counts for files
● C++ “smart pointer” (e.g., auto_ptr) uses
  reference counts
                                                           slide 12
Reference Counting: Example
                     Heap space
root
 set                  1
       1         1                    1
       1     2              1
                                          slide 13
Reference Counting: Example
  • How many pointers are currently pointing to the
    Heap elements?
  • We mainitain the reference count along with the
    Heap information.
                                          p    q
                        After q =p
   p     10     rc=1                      10       rc=2
   q    20      rc=1       free(p);
                                        20     rc=0
  • Scan all those heap elements that have rc=0, they
    are collected by Garbage Collector.
Reference Counting: Strengths
● Incremental overhead
   • Cell management interleaved with program execution
   • Good for interactive or real-time computation
● Relatively easy to implement
● Can coexist with manual memory management
● Can re-use freed cells immediately
   • If RC == 0, put back onto the free list
                                                      slide 15
Reference Counting: Weaknesses
● Space overhead
   • 1 word for the count, 1 for an indirect pointer
● Time overhead
   • Updating a pointer to point to a new cell requires:
      –   Check to ensure that it is not a self-reference
      –   Decrement the count on the old cell, possibly deleting it
      –   Update the pointer with the address of the new cell
      –   Increment the count on the new cell
● One missed increment/decrement results in a
  dangling pointer / memory leak
● Cyclic data structures may cause leaks
                                                                      slide 16
Reference Counting: Cycles
                      Heap space
root
                                   Memory leak
 set                   1
        1         1                     1
        1     2              1
                                                 slide 17
Reference Counting: Cycles
  • Garbage collection with reference counts
 The list shown here cannot be found via any program variable, but
 because it is circular, every cell contains a nonzero count.
     “Smart Pointer” in C++
•In modern C++, the Standard Library includes
 smart pointers, which ensure that programs are
 free of memory and resource leaks and are
 exception-safe.
•Smart pointers are defined in the std namespace in
 the <memory> header file.
•When you initialize a raw pointer to point to an
 actual resource, pass the pointer to a smart
 pointer.
            “Smart Pointer” in C++
•A smart pointer is a class template that you declare on the
 stack, and initialized by using a raw pointer that points to a
 heap-allocated object.
•After the smart pointer is initialized, it owns the raw
 pointer.
•The smart pointer is responsible for deleting the memory
 that the raw pointer specifies.
•The smart pointer destructor contains the call to delete,
 and because the smart pointer is declared on the stack, its
 destructor is invoked when the smart pointer goes out of
 scope.
  –Easily passed by value as an argument or result of a function
  – Takes no more space than regular pointer, but much “safer”
“Smart Pointer” in C++
            void main( )
            {
             shared_ptr<int> sptr1( new int );
             shared_ptr<int> sptr2 = sptr1;
             shared_ptr<int> sptr3;
             sptr3 = sptr2;
            }
Mark-Sweep Garbage Collection Idea
● If we can reach an object by following pointers from program
  variables, then the object is live (not garbage).
● These program variables are called roots and can be either
  frame-allocated or static.
● Approach: If a word has a value between the minimum and
  maximum address of the heap, then it is considered to be a pointer
  to the heap.
● An object is live if it is pointed by either a root or by a live object
  (this is a recursive definition).
● The garbage collector needs to start from each root and follow
  pointers recursively to mark all live objects.
Mark-Sweep Garbage Collection
● Each cell has a mark bit
● Marking phase
   • Starting from the roots, set the mark bit on all live cells
● Sweep phase
   • Return all unmarked cells to the free list
   • Reset the mark bit on all marked cells
                                                             slide 23
Mark-Sweep Example (1)
                Heap space
root
 set
                             slide 24
Mark-Sweep Example (2)
                Heap space
root
 set
                             slide 25
Mark-Sweep Example (3)
                Heap space
root
 set
                             slide 26
 Mark-Sweep Example (4)
                  Heap space
                               Free unmarked
root                                cells
 set
Reset mark bit
of marked cells
                                           slide 27
                     Another Example
                                                     Memory Addresses
                                                1     6    e   3
                                                2     0    i   4
                                                3     0    d   5
                                                4     0    g   0
                                                5     0    a   0
                                                6     8    b   7
                                                7     10   k   0
                                                8     0    c   0
                                                9     0    j   10
                                                10    0    f   0
                                                11    0    h   12
                                                12    11   l   0
•0 are null pointers while the other pointers are memory addresses. That is, the
 free list is 0 (empty) and the roots are 1, 6, and 9.
                   Example Cont..
1    6    e   3
2    0
3    0    d   5           After the mark-and-sweep
4    2                    garbage collection, the
5    0    a   0           memory layout becomes:
6    8    b   7
7    10   k   0
8    0    c   0
9    0    j   10
10   0    f   0
11   4
12   11
Mark-Sweep
● This process is naturally recursive
● The depth of recursion depends on the longest
  chain of cells
● Each recursive call will require a stack frame.
● Can this be done without using recursion?
                                                    slide 30
Mark-Sweep without Recursion
            Heap exploration via pointer reversal.
Mark-Sweep Costs and Benefits
● Good: handles cycles correctly
● Good: no space overhead
   • 1 bit used for marking cells may limit max values that
     can be stored in a cell (e.g., for integer cells)
● Bad: normal execution must be suspended
                                                          slide 32
Copying Collector
● Divide the heap into “from-space” and “to-space”
● Cells in from-space are traced and live cells are
  copied (“scavenged”) into to-space
   • To keep data structures linked, must update pointers
     for roots and cells that point into from-space
      – This is why references in Java and other languages are not
        pointers, but indirect abstractions for pointers
   • Only garbage is left in from-space
● When to-space fills up, the roles flip
   • Old to-space becomes from-space, and vice versa
                                                                     slide 33
Copying a Linked List
                                                      [Cheney’s algorithm]
        from-space                                   pointer
            root
                               A                  forwarding address
                     B
                                       C
                                   D
        to-space
                         A’   B’       C’   D’
                                            Cells in to-space
                                              are packed
                                                                       slide 34
Flipping Spaces
         to-space                            pointer
                                          forwarding address
         from-space
        root
                      A’   B’   C’   D’
                                                               slide 35
Copying Collector Tradeoffs
● Good: compacting
   • Eliminates fragmentation, good locality of reference
● Bad: twice the memory footprint
   • Probably Ok for 64-bit architectures (except for paging)
      – When copying, pages of both spaces need to be swapped in.
        For programs with large memory footprints, this could lead to
        lots of page faults for very little garbage collected
      – Large physical memory helps
                                                                   slide 36
Generational Garbage Collection
● Observation: most cells that die, die young
   • Nested scopes are entered and exited more
     frequently, so temporary objects in a nested scope
     are born and die close together in time
   • Inner expressions in Scheme are younger than outer
     expressions, so they become garbage sooner
● Divide the heap into generations, and GC the
  younger cells more frequently
   • Don’t have to trace all cells during a GC cycle
   • Periodically reap the “older generations”
                                                          slide 37
Generational Observations
● Can measure “youth” by time or by growth rate
● Common Lisp: 50-90% of objects die before they
  are 10KB old
● Glasgow Haskell:75-95% die within 10KB
   • No more than 5% survive beyond 1MB
● Standard ML reclaims over 98% of objects of any
  given generation during a collection
● C: one study showed that over 1/2 of the heap
  was garbage within 10KB and less than 10% lived
  for longer than 32KB
                                               slide 38
Generations with Semi-Spaces
root                                 Youngest
 set
       .
       .   From-space   To-space
       .
                                      Middle
                                   generation(s)
                                      Oldest
           From-space   To-space
                                                   slide 39
    Reference Counts versus Tracing
●   Reference counts require a counter field in every heap object. For small
    objects, this space overhead may be significant.
●   The expense of updating reference counts when pointers are changed can also
    be significant in a program with large amounts of pointer manipulation.
●   Tracing generally requires a reversed pointer indicator in every heap block,
    which reference counting does not,
●   Generational collectors must generally incur overhead on every pointer
    assignment in order to keep track of pointers into the newest section of the
    heap.
●   The two principal tradeoffs between reference counting and tracing are the
    inability of the former to handle cycles and the tendency of the latter to “stop
    the world” periodically in order to reclaim space.
                                                                                   slide 40
               References
● https://www.codeproject.com/Articles/541067/C
  plusplus-Smart-Pointers