0% found this document useful (0 votes)
19 views41 pages

MRP Garbage Collection

The document discusses garbage collection (GC) in programming, outlining major memory areas such as static, run-time stack, and heap. It explains the concept of garbage, which refers to inaccessible memory blocks, and various GC techniques including reference counting and mark-sweep. Additionally, it highlights the importance of GC in modern programming languages to manage memory efficiently and avoid issues like memory leaks and fragmentation.

Uploaded by

geniusloq361
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
19 views41 pages

MRP Garbage Collection

The document discusses garbage collection (GC) in programming, outlining major memory areas such as static, run-time stack, and heap. It explains the concept of garbage, which refers to inaccessible memory blocks, and various GC techniques including reference counting and mark-sweep. Additionally, it highlights the importance of GC in modern programming languages to manage memory efficiently and avoid issues like memory leaks and fragmentation.

Uploaded by

geniusloq361
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 41

Garbage Collection

slide 1
Major Areas of Memory

● Static area
• Fixed size, fixed content, allocated at compile time
• Part of it holds things that do not change while the
program runs, including:
– the machine language program;
– constants, including string constants.
• That part of the static area is marked read-only, and
your program is not allowed to change it.
• A second part of the static area holds global variables,
which can be changed by the programs, but that
always occupy a fixed number of bytes.
• The area is called static because its size does not
change while the program runs. slide 2
Major Areas of Memory
● Run-time stack
• fixed size, variable content (activation records)
• Used for managing function calls and returns.
• The run-time stack holds frames, where each frame
holds information about one function call, including
the function's variables, information about which
function called it, and some information to help the
program manage the run-time stack.
• The run-time stack grows and shrinks as the program
runs.
• When a function is called, a frame is created for it in
the run-time stack.
• When a function returns, its frame is removed. slide 3
Major Areas of Memory
● Heap
• Fixed/Variable size, variable content
• Dynamically allocated objects and data structures
– Examples: ML reference cells, malloc in C, new in Java

– The heap is another area of memory that can grow


and shrink, but where you have direct control over
when memory is allocated and when it is
deallocated.

slide 4
Cells and Liveness
● Cell = data item in the heap
• Cells are “pointed to” by pointers held in registers,
stack, global/static memory, or in other heap cells

● Roots: registers, stack locations, global/static


variables

● A cell is live if its address is held in a root or held


by another live cell in the heap

slide 5
Garbage
● Garbage is a block of heap memory that cannot
be accessed by the program
• An allocated block of heap memory does not have a
reference to it (cell is no longer “live”)
• Another kind of memory error: a reference exists to a
block of memory that is no longer allocated

● Garbage collection (GC) - automatic


management of dynamically allocated storage
• Reclaim unused heap blocks for later use by program

slide 6
Example of Garbage and Dangling
Reference

class node {
int value; p = new node();

node next; q = new node();

} q = p;

node p, q; delete p;

slide 7
Why Garbage Collection?
● Today’s programs consume storage freely
• 1GB laptops, 1-4GB deskops, 8-512GB servers
• 64-bit address spaces (SPARC, Itanium, Opteron)
● … and mismanage it
• Memory leaks, dangling references, null pointer
dereference, heap fragmentation
• Poor use of reference locality, resulting in high cache
miss rates and/or excessive demand paging
● Explicit memory management breaks high-level
programming abstraction

slide 8
GC and Programming Languages
● GC is not a language feature
● GC is a pragmatic concern for automatic and
efficient heap management
• Cooperative langs: Lisp, Scheme, Prolog, Smalltalk …
• Uncooperative languages: C and C++
– But garbage collection libraries have been built for C/C++
● Recent GC revival
• Object-oriented languages: Modula-3, Java
– In Java, runs as a low-priority thread; System.gc may be called
by the program
• Functional languages: ML and Haskell
slide 9
The Perfect Garbage Collector
● No visible impact on program execution
● Works with any program and its data structures
• For example, handles cyclic data structures
● Collects garbage (and only garbage) cells quickly
• Incremental; can meet real-time constraints
● Has excellent spatial locality of reference
• No excessive paging, no negative cache effects
● Manages the heap efficiently
• Always satisfies an allocation request and does not
fragment
slide 10
Summary of GC Techniques
● Reference counting
• Directly keeps track of live cells
• GC takes place whenever heap block is allocated
• Doesn’t detect all garbage
● Tracing
• GC takes place and identifies live cells when a
request for memory fails
• Mark-sweep
• Copy collection
● Modern techniques: generational GC

slide 11
Reference Counting
● Simply count the number of references to a cell
● Requires space and time overhead to store the
count and increment (decrement) each time a
reference is added (removed)
• Reference counts are maintained in real-time, so no
“stop-and-gag” effect
• Incremental garbage collection
● Unix file system uses reference counts for files
● C++ “smart pointer” (e.g., auto_ptr) uses
reference counts
slide 12
Reference Counting: Example

Heap space

root
set 1

1 1 1

1 2 1

slide 13
Reference Counting: Example

• How many pointers are currently pointing to the


Heap elements?
• We mainitain the reference count along with the
Heap information.
p q
After q =p
p 10 rc=1 10 rc=2

q 20 rc=1 free(p);
20 rc=0
• Scan all those heap elements that have rc=0, they
are collected by Garbage Collector.
Reference Counting: Strengths
● Incremental overhead
• Cell management interleaved with program execution
• Good for interactive or real-time computation
● Relatively easy to implement

● Can coexist with manual memory management

● Can re-use freed cells immediately


• If RC == 0, put back onto the free list

slide 15
Reference Counting: Weaknesses
● Space overhead
• 1 word for the count, 1 for an indirect pointer
● Time overhead
• Updating a pointer to point to a new cell requires:
– Check to ensure that it is not a self-reference
– Decrement the count on the old cell, possibly deleting it
– Update the pointer with the address of the new cell
– Increment the count on the new cell
● One missed increment/decrement results in a
dangling pointer / memory leak
● Cyclic data structures may cause leaks
slide 16
Reference Counting: Cycles

Heap space

root
Memory leak
set 1

1 1 1

1 2 1

slide 17
Reference Counting: Cycles

• Garbage collection with reference counts

The list shown here cannot be found via any program variable, but
because it is circular, every cell contains a nonzero count.
“Smart Pointer” in C++

•In modern C++, the Standard Library includes


smart pointers, which ensure that programs are
free of memory and resource leaks and are
exception-safe.

•Smart pointers are defined in the std namespace in


the <memory> header file.

•When you initialize a raw pointer to point to an


actual resource, pass the pointer to a smart
pointer.
“Smart Pointer” in C++

•A smart pointer is a class template that you declare on the


stack, and initialized by using a raw pointer that points to a
heap-allocated object.
•After the smart pointer is initialized, it owns the raw
pointer.
•The smart pointer is responsible for deleting the memory
that the raw pointer specifies.
•The smart pointer destructor contains the call to delete,
and because the smart pointer is declared on the stack, its
destructor is invoked when the smart pointer goes out of
scope.
–Easily passed by value as an argument or result of a function
– Takes no more space than regular pointer, but much “safer”
“Smart Pointer” in C++
void main( )
{
shared_ptr<int> sptr1( new int );
shared_ptr<int> sptr2 = sptr1;
shared_ptr<int> sptr3;
sptr3 = sptr2;
}
Mark-Sweep Garbage Collection Idea

● If we can reach an object by following pointers from program


variables, then the object is live (not garbage).

● These program variables are called roots and can be either


frame-allocated or static.

● Approach: If a word has a value between the minimum and


maximum address of the heap, then it is considered to be a pointer
to the heap.

● An object is live if it is pointed by either a root or by a live object


(this is a recursive definition).

● The garbage collector needs to start from each root and follow
pointers recursively to mark all live objects.
Mark-Sweep Garbage Collection
● Each cell has a mark bit

● Marking phase
• Starting from the roots, set the mark bit on all live cells
● Sweep phase
• Return all unmarked cells to the free list
• Reset the mark bit on all marked cells

slide 23
Mark-Sweep Example (1)

Heap space

root
set

slide 24
Mark-Sweep Example (2)

Heap space

root
set

slide 25
Mark-Sweep Example (3)

Heap space

root
set

slide 26
Mark-Sweep Example (4)

Heap space
Free unmarked
root cells
set

Reset mark bit


of marked cells

slide 27
Another Example
Memory Addresses
1 6 e 3
2 0 i 4
3 0 d 5
4 0 g 0
5 0 a 0
6 8 b 7
7 10 k 0
8 0 c 0
9 0 j 10
10 0 f 0
11 0 h 12
12 11 l 0

•0 are null pointers while the other pointers are memory addresses. That is, the
free list is 0 (empty) and the roots are 1, 6, and 9.
Example Cont..

1 6 e 3
2 0
3 0 d 5 After the mark-and-sweep
4 2 garbage collection, the
5 0 a 0 memory layout becomes:
6 8 b 7
7 10 k 0
8 0 c 0
9 0 j 10
10 0 f 0
11 4
12 11
Mark-Sweep
● This process is naturally recursive
● The depth of recursion depends on the longest
chain of cells
● Each recursive call will require a stack frame.

● Can this be done without using recursion?

slide 30
Mark-Sweep without Recursion

Heap exploration via pointer reversal.


Mark-Sweep Costs and Benefits
● Good: handles cycles correctly
● Good: no space overhead
• 1 bit used for marking cells may limit max values that
can be stored in a cell (e.g., for integer cells)
● Bad: normal execution must be suspended

slide 32
Copying Collector
● Divide the heap into “from-space” and “to-space”
● Cells in from-space are traced and live cells are
copied (“scavenged”) into to-space
• To keep data structures linked, must update pointers
for roots and cells that point into from-space
– This is why references in Java and other languages are not
pointers, but indirect abstractions for pointers
• Only garbage is left in from-space
● When to-space fills up, the roles flip
• Old to-space becomes from-space, and vice versa

slide 33
Copying a Linked List
[Cheney’s algorithm]

from-space pointer

root
A forwarding address

B
C
D

to-space

A’ B’ C’ D’

Cells in to-space
are packed

slide 34
Flipping Spaces

to-space pointer

forwarding address

from-space
root
A’ B’ C’ D’

slide 35
Copying Collector Tradeoffs
● Good: compacting
• Eliminates fragmentation, good locality of reference
● Bad: twice the memory footprint
• Probably Ok for 64-bit architectures (except for paging)
– When copying, pages of both spaces need to be swapped in.
For programs with large memory footprints, this could lead to
lots of page faults for very little garbage collected
– Large physical memory helps

slide 36
Generational Garbage Collection
● Observation: most cells that die, die young
• Nested scopes are entered and exited more
frequently, so temporary objects in a nested scope
are born and die close together in time
• Inner expressions in Scheme are younger than outer
expressions, so they become garbage sooner

● Divide the heap into generations, and GC the


younger cells more frequently
• Don’t have to trace all cells during a GC cycle
• Periodically reap the “older generations”
slide 37
Generational Observations
● Can measure “youth” by time or by growth rate
● Common Lisp: 50-90% of objects die before they
are 10KB old
● Glasgow Haskell:75-95% die within 10KB
• No more than 5% survive beyond 1MB
● Standard ML reclaims over 98% of objects of any
given generation during a collection
● C: one study showed that over 1/2 of the heap
was garbage within 10KB and less than 10% lived
for longer than 32KB
slide 38
Generations with Semi-Spaces

root Youngest
set
.
. From-space To-space
.

Middle
generation(s)

Oldest

From-space To-space

slide 39
Reference Counts versus Tracing

● Reference counts require a counter field in every heap object. For small
objects, this space overhead may be significant.

● The expense of updating reference counts when pointers are changed can also
be significant in a program with large amounts of pointer manipulation.

● Tracing generally requires a reversed pointer indicator in every heap block,


which reference counting does not,

● Generational collectors must generally incur overhead on every pointer


assignment in order to keep track of pointers into the newest section of the
heap.

● The two principal tradeoffs between reference counting and tracing are the
inability of the former to handle cycles and the tendency of the latter to “stop
the world” periodically in order to reclaim space.
slide 40
References

● https://www.codeproject.com/Articles/541067/C
plusplus-Smart-Pointers

You might also like