2011 3rd International Conference on Computer Modeling and Simulation (ICCMS 2011)
A New Garbage Collection Technique for Server Environment
                                                            Xiting Xie
                                              Zhong Xing Telecommunication Inc.
                                                       Nanjing, China
AbstractConventional generational garbage collection              threshold for each object; (2) providing an extra generation
techniques cannot achieve low tenuring costs, low scavenging       (e.g., aging space); (3) enlarging young generation size.
costs on the medium-lived objects and low generation size at the   These approaches give medium-lived objects more time to
meantime. In this paper, we will propose a novel Two-              die in the youngest generation, which avoid premature
Dimensional Generational Garbage Collector (2D-GC) which           promotion problem to some extent. However, the first two
can meet these three requirements at once. By its name, the        approaches will bring multiple-scavenging problem, that is,
2D-GC arranges the heap in two dimensions. Our                     the medium-lived object will be scavenged for multiple times
experimental results show that 2D-GC can achieve the lowest        before its death. Therefore, this approach will increase
tenuring cost and scavenging cost without enlarging generation
                                                                   scavenging cost of the medium-lived object. The last
size.
                                                                   approach will cost a lot of space, which can translate to long
   Keywords-Java Virtual Machine; Garbage Collection               pause time of a minor collection.
                                                                   In summary, conventional generational garbage collection
                                                                   techniques can not achieve low tenuring costs, low
                     I.     INTRODUCTION
                                                                   scavenging costs on the medium-lived objects and low
   One of the most useful language features of modern              generation size at the meantime. In this paper, we will
Object-Oriented programming languages is garbage                   propose a novel Two-Dimensional Generational Garbage
collection (GC).GC improves programming productivity by            Collector (2D-GC) which can meet these three requirements
reducing errors resulting from explicit memory management.         at once. The 2D-GC leverages Xian et.al.s research on
Moreover, GC underpins sound software engineering                  remote/local objects [10][23] to dynamically identify
principles of abstraction and modularity. GC leads to              clusters of similar-lifespan objects. Xian et.al defined key
cleaner code since memory management concerns are no               objects and segregate objects into groups based on temporal
longer cluttered with the programming logic.                       locality and lifespan similarity. Similarly, the 2D-GC
    To date, the best performing garbage collectors in wide        arranges the heap in two dimensions: (1) heap is divided
use are generational collectors. These collectors rely on the      into two vertical generations: a young generation and an old
fact that most objects die young. Generational GC partitions       generation; (2) the young generation is divided into two
the heap into age-based generations of objects, where age is       horizontal generations: one for accommodating short-lived
measured in the amount of allocation. New objects are              1 objects and the other for medium-lived and long-lived
allocated in the youngest generation. Rather than collecting       objects. New objects are allocated in either one horizontal
the entire heap and incurring the cost of copying all live         generation, depending on the estimated lifetimes of these
objects, generational collectors collect the youngest              objects. Survivors of either horizontal generation will be
generation (i.e., minor collection or nursery collection),         promoted to the old generation. The 2D-GC can be easily
place survivors in the next older generation, and only collect     extended to supply more horizontal generations in the young
older generations (i.e., major collection or full collection) if   generation, or tailored to a conventional generational
necessary.                                                         collector.
    However, the generational garbage collection has                   In the remainder of the paper, we will present some
difficulty of treating medium-lived objects. Medium-lived          related work and then describe our 2D-GC. Next we discuss
objects are the objects which survive at least one minor           its design issues and its implementation in Sun's JVM. Then
collection but die before the next full collection. Tenuring       we give experimental results and finally conclude this paper.
medium-lived objects too early will cause premature
promotion problem, which increases tenuring cost and also                II.   TWO DIMENSIONAL GENERATIONAL GARBAGE
causes the older generation to fill up too soon, resulting in a                       COLLECTION (2DGC)
full collection with a longer pause time. Also, premature
promotion has an adverse effect on the user program's                  Figure 1 illustrates the diagram of 2D-GC. 2D-GC
locality, since it is likely that most program accesses will be    arranges the heap in two dimensions: (1) heap is divided into
to younger objects. The premature promotion problem can be         a young generation and an old generation vertically; (2) the
alleviated by the following approaches: (1) setting a tenuring     young generation is divided into two horizontal generations:
978-1-4244-9243-5/11/$26.00   C   2011 IEEE                   V2-630
                       2011 3rd International Conference on Computer Modeling and Simulation (ICCMS 2011)
Gen-s for accommodating short-lived objects and Gen-o for         usually short-lived) and non-short-lived sites. During an
non-short-lived objects (i.e., medium-lived and long-lived        actual run, the runtime system uses the prediction advice to
objects). New objects are allocated in either one horizontal      identify short-lived objects. Thus, no monitoring overhead is
generation, depending on the estimated lifetimes of these         incurred during the production run of the application. Online
objects. When the Gen-s is filled up, it will trigger a Gen-s     profiling approaches can be accurate but they are tedious and
collection. When the Gen-o is filled up, it will trigger a Gen-   inconsistent in different production runs (e.g., different
o collection. Survivors of either collection will be promoted     input). In online sampling, we do not profile all object
to the old generation.                                            lifetimes. Instead, we compute survival rate of allocation
                                                                  sites during every collection. By the sample information, we
                                                                  can guide collector to identity short-lived (or non-short-lived)
                                                                  objects. The accuracy of online sampling approaches highly
                                                                  depends on the sampling interval.
                                                                      The other way, that we plan to use, is to segregate objects
                                                                  based on type and connectivity information. Our segregation
                                                                  scheme is based on the following two observations:
                                                                          1.     The type with higher allocation rate tends to
                                                                                 have shorter lifetime. This observation
                                                                                 conforms to the prolific hypothesis [20] that
                                                                                 prolific types (i.e., objects types that are
                                                                                 allocated most frequently) have short lifetimes.
                                                                                 An intuitive basis for this hypothesis is that if
                                                                                 this were not true, an application that
                                                                                 continuously allocates dynamic memory
                                                                                 would       have      unsustainable     memory
                                                                                 requirements, as it would keep creating objects
                   Figure 1.      2D-GC Diagram
                                                                                 of prolific types at a fast pace without
                                                                                 reclaiming sufficient space. Put in another way,
                III.   DESIGN ISSUES OF 2D-GC                                    the hypothesis predicts that the objects of
                                                                                 prolific types die younger than objects of non-
    In the design of 2D-GC, several important and                                prolific types. Shuf et al. [20] validated this
interrelated challenges need to be solved:                                       hypothesis and found that the relative survival
    1. Object segregation. Object segregation is directly                        rates are usually lower for objects of prolific
           related with object allocation. Objects are                           types than for objects of non-prolific types.
           segregated based on lifetime classification. Short-                   We also found that most of the dead objects
           lived objects are allocated in Gen-s and the rest                     and most of the garbage comes from short-
           objects are allocated in Gen-o. We need a cost-                       lived objects of prolific types.
           effective approach to classify objects into short-             2.     Connected objects tend to die together. Hirzel
           lived and non-short-lived.                                            et al. [19] investigated the relationship
    2. Heap organization. What is the optimal size ratio                         between lifetime and heap connectivity. They
           between Gen-s and Gen-o?                                              found that on average linked objects have a
    3. Intergenerational pointers. Intergenerational                             very high probability (about 70-80.4%) of
           pointers include pointers from the old generation                     dying at the same time.
           to the young generation, and pointers between              Our segregation scheme has two phases: the first phase is
           Gen-s and Gen-o. How do we remember these              coarse-grained classification phase; the second is refining
           pointers?                                              classification phase. The first phase use type-based lifetime
                                                                  classification approach. Based on the first observation, we
A. Object Segregation                                             can classify object types into short-lived types and non-short-
    We can use several ways to differentiate short-lived and      lived types based on their allocate rates. We compute
non-short-lived objects, varying in terms of accuracy and         allocation rate of each object type periodically (here we
overhead.                                                         perform computation at each collection point). If the
     One possible way is predicting object lifetimes based on     allocation rate of an object type exceeds a certain threshold,
allocation-site by using online profiling [12][13] or online      then this type will be treated as short-lived type, or else
sampling [21]. Most recent research indicated a strong            treated as non-short-lived type. All objects of short-lived
correlation between allocation-sites and object lifetimes. In     types will be assumed as short-lived objects and others will
online profiling, a runtime system monitors memory                be non-short-lived objects.
activities at each allocation site. When an application exits,        However, type-based lifetime classification is too strict
the collected profile is saved into a file. By analyzing the      and error-prone. For example, objects of long-lived types
profile, we generate an advice file which classifies allocation   might die young because they are linked to short-lived
sites into short-lived sites (i.e., the sites whose objects are   objects. Therefore, we use connectivity-based lifetime
                                                             V2-631
                      2011 3rd International Conference on Computer Modeling and Simulation (ICCMS 2011)
classification to refine our prediction. By the second                              IV.    EXPERIMENTAL RESULTS
observation, it is reasonable to assume that objects connected         To simply discussion, we begin with a baseline
with other short-lived objects will be most likely short-lived.    generational collector, Moon's Ephemeral Garbage Collector
Note that we can not apply this assumption directly to non-        [11], in which all survival objects in the young generation
short-lived objects, because the second observation does not       are promoted to the old generation. Then we discuss two
correlate object death-times with their lifetimes. Assume that     optimized version of the baseline generational collectors: one
an object x is allocated and linked to a long-lived object y,      using tenuring-threshold, the other which enlarges the
and then y dies immediately after the allocation. We can           generational size. Next we present 2D-GC and finally
reasonably assume that x will die together with y at high          compare the tenuring costs, scavenging cost and the young
probability, which implies that x is most likely short-lived       generation size of these four collectors.
instead of long-lived. However, if an objects is connected to          We summarize results in Table 1. As shown in the table,
a same-aged long-lived object (i.e., these two objects are         2D-GC can achieve the lowest tenuring cost and scavenging
allocated very closely in time), then it is highly possible that   cost without enlarging generation size.
this object is long-lived because the two objects will have the
same death-time. Therefore, given two connected objects o1                         TABLE I. Comparisons of 4 GC techniques
and o2 (o1 points to o2 directly or indirectly), we can decide
o2's lifetime by the following rules:
        1.      Rule 1: If o1 is short-lived, then o2 is most
                likely short-lived;                                                                                        Young
                                                                                                       Scavenging
        2.      Rule 2: If o1 is non-short-lived and o1 has the                     Tenuring cost                        generation
                                                                                                          cost
                same birth-time with o2, then o2 is most likely    Garbage                                                  size
                non-short-lived.                                   collector
    Note that these two rules cannot be applied at the same        Ephemeral
                                                                                          10                0                20
                                                                   GC
time because they might fail at the situation shown in Figure
4. In Figure 4, o1 and o2 are allocated at time 1. Assume we       GC without
know o1 is short-lived and o2 is long-lived. Then o3 is            tenuring                4               10                30
allocated at the same time and later both o1 and o2 point to       threshold
o3. By rule 1, o3 should be most likely short-lived. By rule 2,
however, o3 should be most likely long-lived. Therefore, we        GC       with
set different priority levels to these two rules. By default,      enlarged
                                                                                           4               10                20
rule 2 has higher priority than rule 1.                            generation
                                                                   size
                                                                   2D-GC                   4                0                20
                                                                                          V.   RELATED WORK
                                                                       Blackburn et al. [15][16] used the profile-based approach
                                                                   to select objects for pretenuring. They classified objects into
                                                                   three different classes: short lived, long lived, and immortal
                                                                   based on two measures: age and time of death. The proposed
                                                                   method was to provide the statically-analyzed object
                                                                   information to the compiler so that the optimal object
    When using connectivity-based lifetime classification to
                                                                   allocation site could be determined at runtime.
segregate objects, we encounter another difficulty: when an
                                                                       Jump and Blackburn et al. [21] introduced a low-
object is allocated, its future connectivity information is not
                                                                   overhead object sampling technique to dynamically
known a priori. This is a dilemma: we need the object's
                                                                   pretenure long-lived objects. The sampling technique can
connectivity information to predict its lifetime and allocate it
                                                                   accurately predict allocation site survival rates. Their
to a proper space (Gen-s or Gen-o); but connectivity can only
                                                                   approach cannot attain the consistent performance
be established after its allocation. Fortunately, we can
                                                                   improvement on SPECjvm98 benchmarks. So they leave
capture partial connectivity information from calling context
                                                                   open optimization policies that can benefit from lifetime
before the object's allocation. For example, if o2 is allocated
                                                                   sampling.
within o1's method call, then o1 will be connecting to o2
                                                                       Huang et al. extended the aforementioned work by
indirectly. Actually, we do not need the thorough
                                                                   maintaining online lifetime history based on the object type
connectivity information of an object, because it will be
                                                                   to adaptively select which objects are to be pretenured. They
costly and have little help to improve the accuracy of lifetime
                                                                   found a strong correlation between an object type and its
classification.
                                                                   lifespan. Depending on the nature of the applications, some
                                                                   object types are by default long-lived. They claimed that by
                                                              V2-632
                      2011 3rd International Conference on Computer Modeling and Simulation (ICCMS 2011)
pretenuring objects of long-lived types can reduce the                                           REFERENCES
garbage collection time by up to 37%.
     Xian et al. developed AS-GC [23][10] to solve long GC         [1]    M. Hertz, S.M. Blackburn, J. E. B. Moss, K. S. Mckliney, D.
pause time during heavy workload. Their approach classified               Stefanovic. Generatingobject lifetime traces with Merlin.
objects into remote objects and local objects. Their approach             Transactions on Programming Languages And Systems (TOPLAS)
is the first implementation of segregating objects of different           28, 3(May), 476516.
lifespans. In their study, they found that remote objects tend     [2]    M. Hertz, Y. Feng, E. D. Berger. Garbage collection without paging.
to live longer than local objects. Therefore, their approach              In Proceedings of the 2005 ACM SIGPLAN Conference on
                                                                          Programming Language Design and Implementation (PLDI 2005),
divided young generation into local space and remote space.               Volume 40(7) of ACM SIGPLAN Notices (Chicago, IL, June 2005),
We extended their approach to a more general scenario. Our                pp. 143153.
approach can intelligently distinguish long-lived objects and      [3]    M. Hertz, N. Immerman, J. E. B. Moss. Framework for analyzing
short-lived objects.                                                      garbage collection. In R. BAEZA-YATES, U. MONTANARI, AND
Optimizing JVM performance There are a plenty of                          N. SANTORO Eds., Foundations of Information Technology in the
literatures of optimizing Java virtual machine performance                Era of Network and Mobile Computing: IFIP 17th World Computer
                                                                          Congress - TC1 Stream (TCS 2002), Volume 223 of IFIP Conference
through OS support or hardware support.                                   Proceedings (Montreal, Canada, 2002), pp. 230241.
    Hertz et al. developed Bookmarking Collector (BC) [2].         [4]    F. Xian, W. Srisa-an, H. Jiang. Allocation-phase aware thread
The BC is a garbage collection approach which is tightly                  scheduling policies to improve garbage collection performance. In
coupled with VMM of the operating system to minimize                      Proceedings of the 6th international Symposium on Memory
paging overhead when memory pressure is very high.                        Management (Montreal, Quebec, Canada, October 21 - 22, 2007).
    Yang et al. developed CRAMM [9] to dynamically                        ACM, New York, NY, 79-90.
adjusts heap size through coordination between VMM and             [5]    F. Xian, W. Srisa-an, H. Jiang. Garbage collection: Java application
                                                                          servers' Achilles heel. Sci. Comput. Program. 70, 2-3 (Feb. 2008),
Java virtual machine. In CRAMM, a new VMM is                              89-110.
implemented to compute working-set size information of             [6]    F. Xian, W. Srisa-an, H. Jiang. Contention-aware scheduler:
each process. The information is passed to a heap sizing                  unlocking execution parallelism in multithreaded java programs. In
analytical model to predict an appropriate heap size that can             Proceedings of the 23rd ACM SIGPLAN Conference on Object-
improve performance as well as minimizing page faults.                    Oriented Programming Systems Languages and Applications
    Michael et al. proposed Yield Predictor (YP) [7] which                (Nashville, TN, USA, October 19 - 23, 2008). ACM, New York, NY,
                                                                          163-180.
provides a simple solution to improve GC efficiency by
                                                                   [7]    M. Wegiel, C. Krintz. Dynamic prediction of collection yield for
estimating GC yield (i.e., number of dead objects) using                  managed runtimes. In Proceeding of the 14th international
hardware page reference bits used by OS virtual memory                    Conference on Architectural Support For Programming Languages
manager. The YP exploits the prior empirical result that                  and Operating Systems (Washington, DC, USA, March 07 - 11, 2009).
objects with similar life spans tend to be clustered in the heap          ACM, New York, NY, 289-300.
and tend to die together.                                          [8]    L. Zhang, C. Krintz, and P. Nagpurkar, Supporting Exception
    Xian et al. implemented a Contention-Aware Scheduler                  Handling for Futures in Java, ACM International Conference on the
                                                                          Principles and Practice on Programming in Java (PPPJ), Sep, 2007.
[6] which exploits lock contention information to reduce the
occurrences of lock convoys in large application servers           [9]    L. Zhang, C. Krintz, and P. Nagpurkar, Language and Virtual
                                                                          Machine Support for Efficient Fine-Grained Futures in Java, The
running in multi-core systems. CA-Scheduler relies on                     International Conference on Parallel Architectures and Compilation
collaborations between operating system kernels and JVM to                Techniques, (PACT) Sep, 2007.
improve the efficiency of Java runtime environments.               [10]   F. Xian, W. Srisa-an, H. Jiang. Service oriented garbage collection:
    Xian and Srisa-an et al. proposed an allocation-phase                 improving performance and robustness of application servers. In
aware scheduler [4]. The allocation-phase aware scheduler                 Proceedings of ACM SIGPLAN Conference on Object-Oriented
                                                                          Programming, Systems, Languages and Applications 2006: 661-662.
exploits object allocation pattern to guide thread scheduling
decisions. The principle of the approach is based on the           [11]   D.A. Moon. Garbage collection in a large LISP system. In
                                                                          Conference Record of the 1994 ACM Symposium on LISP and
observations that objects are created and die in phases [22]              Functional Programming (Austin, TX,Aug. 1994), pp. 235245.
and that current thread scheduling method is not friendly to       [12]   S. M. Blackburn, P. Cheng, and K.S. McKinley. Myths and reality:
garbage collection [24] [5].                                              The performance impact of garbage collection. In Proceedings of the
                                                                          2004 Joint International Conference on Measurement and Modeling
                                                                          of Computer Systems (New York, NY, June 2004), ACM, pp. 2536.
                     VI.    CONSLUSIONS                            [13]   S. M. Blackburn, P. Cheng, and K. S. McKinley, "Oil and Water?
                                                                          High Performance Garbage Collection in Java with MMTk," in ICSE
   Conventional generational garbage collection techniques                2004, 26th International Conference on Software Engineering,
cannot achieve low tenuring costs, low scavenging costs on                Edinburgh, Scotland, May 23-28, 2004, 2004.
the medium-lived objects and low generation size at the            [14]   S. M. Blackburn, R. E. Jones, K. S. McKinley, and E. J. B. Moss,
meantime. In this paper, we will propose a novel Two-                     "Beltway: Getting Around Garbage Collection Gridlock," in
Dimensional Generational Garbage Collector (2D-GC)                        Proceedings of SIGPLAN 2002 Conference on Programming
which can meet these three requirements at once. By its                   Languages Design and Implementation, PLDI02, Berlin, June, 2002,
                                                                          Berlin, Germany, 2002.
name, 2D-GC arranges the heap in two dimensions. Our
experimental results show that 2D-GC can achieve the               [15]   S. M. Blackburn, S. Singhai, M. Hertz, K.S. McKinley, and J. E. B.
                                                                          Moss. Pretenuring for Java. In Proceedings of the 16th ACM
lowest tenuring cost and scavenging cost without enlarging                Conference on Object-Oriented Programming, Systems, Languages,
generation size.                                                          & Applications (Tampa, FL, Oct. 2001), ACM, pp. 342352.
                                                              V2-633
                         2011 3rd International Conference on Computer Modeling and Simulation (ICCMS 2011)
[16] S. M. Blackburn, M. Hertz, K. S. McKinley, E. J. B. Moss, and T.       [20] Y. Shuf, M. Gupta, R. Bordawekar, and J. Pal Singh. Exploiting
     Yang, "Profile-based pretenuring," ACM Transactions on                      prolific types for memory management and optimizations. SIGPLAN
     Programming Languages and Systems, vol. 27, iss. 1, 2007.                   Not. 37, 295-306.
[17] S. M. Blackburn, P. Cheng, and K. S. McKinley, "Myths and              [21] M. Jump, S. M. Blackburn, and K. S. McKinley, "Dynamic Object
     Realities: The Performance Impact of Garbage Collection," in                Sampling for Pretenuring," in The 2004 International Symposium on
     SIGMETRICS  Performance 2004, Joint International Conference               Memory Management (ISMM 2004), Vancouver, Canada, October
     on Measurement and Modeling of Computer Systems, New York, NY,              24-25, 2004.
     USA, June 1216, 2004, 2004.                                           [22] F. Xian, W. Srisa-an, H. Jiang. MicroPhase: Proactively Invoking
[18] S. M. Blackburn, R. Garner, C. Hoffmann, A. M. Khang, K. S.                 Garbage Collection for Improved Performance. In Proceedings of
     McKinley, R. Bentzur, A. Diwan, D. Feinberg, D. Frampton, S. Z.             ACM SIGPLAN Conference on Object-Oriented Programming,
     Guyer, M. Hirzel, A. Hosking, M. Jump, H. Lee, J. Eliot, B. Moss, A.        Systems, Languages and Applications (OOPSLA'07), Montreal,
     Phansalkar, D. Stefanovic, T. VanDrunen, D. von Dincklage, and B.           Canada. Oct 21-25.
     Wiedermann, "The DaCapo benchmarks: Java benchmarking                  [23] F. Xian, W. Srisa-an, C. Jia, H. Jiang. "AS-GC: An Efficient
     development and analysis," in OOPSLA 06: Proceedings of the 21st           Generational Garbage Collector for Java Application Servers". In
     annual ACM SIGPLAN conference on Object-oriented programming                Proceedings of 21st European Conference on Object-Oriented
     systems, languages, and applications, New York, NY, USA, 2006, pp.          Programming (ECOOP'07), Berlin, Germany. July 30-Aug 03, 2007.
     169-190.
                                                                            [24] F. Xian, W. Srisa-an, H. Jiang. "Investigating Throughput
[19] M. Hirzel, A. Diwan, and M. Hertz. Connectivity-based garbage               Degradation Behavior of Java Application Servers: A View from
     collection. In Proceedings of the 18th ACM Conference on Object-            Inside a Virtual Machine". In Proceedings of ACM International
     Oriented Programming, Systems, Languages, & Applications                    Conference on Principles and Practices of Programming In Java
     (Anaheim, CA, Oct. 2003), pp. 359373.                                      (PPPJ'06), Mannheim, Germany, Aug 30-Sep 1, 2006, Page 40-49.
                                                                      V2-634