Our Collectors

I drew this diagram on a whiteboard for some customers recently. They seemed to like it (or were just being very polite), so I thought I'd redraw it for your amusement.

Each blue box represents a collector that is used to collect a generation. The young generation is collected by the blue boxes in the yellow region and the tenured generation is collected by the blue boxes in the gray region.

  • "Serial" is a stop-the-world, copying collector which uses a single GC thread.
  • "ParNew" is a stop-the-world, copying collector which uses multiple GC threads. It differs from "Parallel Scavenge" in that it has enhancements that make it usable with CMS. For example, "ParNew" does the synchronization needed so that it can run during the concurrent phases of CMS.
  • "Parallel Scavenge" is a stop-the-world, copying collector which uses multiple GC threads.
  • "Serial Old" is a stop-the-world, mark-sweep-compact collector that uses a single GC thread.
  • "CMS" is a mostly concurrent, low-pause collector.
  • "Parallel Old" is a compacting collector that uses multiple GC threads.

    Using the -XX flags for our collectors for jdk6,

  • UseSerialGC is "Serial" + "Serial Old"
  • UseParNewGC is "ParNew" + "Serial Old"
  • UseConcMarkSweepGC is "ParNew" + "CMS" + "Serial Old". "CMS" is used most of the time to collect the tenured generation. "Serial Old" is used when a concurrent mode failure occurs.
  • UseParallelGC is "Parallel Scavenge" + "Serial Old"
  • UseParallelOldGC is "Parallel Scavenge" + "Parallel Old"
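
The pairings above can be written down as a small lookup table. The sketch below is only an illustration: the class and method names are made up, and the table is nothing more than the JDK 6 list above. The second method uses the standard java.lang.management API to ask the running JVM which collectors it has; the names it reports depend on the JVM version and flags, so no particular output should be expected.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: the class name and table layout are assumptions;
// the pairings are the JDK 6 ones listed above.
class CollectorCombos {
    private static final Map<String, String> COMBOS = new LinkedHashMap<>();
    static {
        COMBOS.put("UseSerialGC", "Serial + Serial Old");
        COMBOS.put("UseParNewGC", "ParNew + Serial Old");
        COMBOS.put("UseConcMarkSweepGC", "ParNew + CMS + Serial Old");
        COMBOS.put("UseParallelGC", "Parallel Scavenge + Serial Old");
        COMBOS.put("UseParallelOldGC", "Parallel Scavenge + Parallel Old");
    }

    // Returns the collectors selected by a -XX:+<flag> option, or null if unknown.
    static String forFlag(String flag) {
        return COMBOS.get(flag);
    }

    // Names of the collectors the running JVM reports through its MXBeans.
    static List<String> runningCollectors() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : COMBOS.entrySet()) {
            System.out.println("-XX:+" + e.getKey() + " -> " + e.getValue());
        }
        System.out.println("This JVM reports: " + runningCollectors());
    }
}
```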

    FAQ

    1) UseParNewGC and UseParallelGC both collect the young generation using multiple GC threads. Which is faster?

    There's no one correct answer for this question. Mostly they perform equally well, but I've seen one do better than the other in different situations. If you want to use GC ergonomics, it is only supported by UseParallelGC (and UseParallelOldGC) so that's what you'll have to use.

    2) Why don't "ParNew" and "Parallel Old" work together?

    "ParNew" is written in a style where each generation being collected offers certain interfaces for its collection. For example, "ParNew" (and "Serial") implements space_iterate() which will apply an operation to every object in the young generation. When collecting the tenured generation with either "CMS" or "Serial Old", the GC can use space_iterate() to do some work on the objects in the young generation. This makes the mix-and-match of collectors work but adds some burden to the maintenance of the collectors and to the addition of new collectors. And the burden seems to be quadratic in the number of collectors. Alternatively, "Parallel Scavenge" (at least with its initial implementation before "Parallel Old") always knew how the tenured generation was being collected and could call directly into the code in the "Serial Old" collector. "Parallel Old" is not written in the "ParNew" style so matching it with "ParNew" doesn't just happen without significant work. By the way, we would like to match "Parallel Scavenge" only with "Parallel Old" eventually and clean up any of the ad hoc code needed for "Parallel Scavenge" to work with both.

    Please don't think too much about the examples I used above. They are admittedly contrived and not worth your time.
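
To make the "ParNew" style from the answer to question 2 a little more concrete, here is a rough Java rendering of the idea. It is not HotSpot code (which is C++): the interface, the class names and the use of strings as stand-ins for objects are all invented. The point is only that the old-generation collector talks to the young generation through a shared iteration hook and never needs to know how that generation is laid out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical Java rendering of the "ParNew style": every generation
// exposes the same iteration hook, so any old-generation collector can
// walk any young generation without knowing how it is implemented.
interface IterableGeneration {
    void spaceIterate(Consumer<String> op); // apply op to every object
}

// A young generation made of eden plus one occupied survivor space.
class SketchYoungGen implements IterableGeneration {
    final List<String> eden = new ArrayList<>();
    final List<String> survivor = new ArrayList<>();

    @Override
    public void spaceIterate(Consumer<String> op) {
        eden.forEach(op);
        survivor.forEach(op);
    }
}

// Stand-in for a tenured-generation collector ("CMS" or "Serial Old").
class SketchOldCollector {
    // Visits every young object through the shared interface only.
    static List<String> scanYoung(IterableGeneration young) {
        List<String> visited = new ArrayList<>();
        young.spaceIterate(visited::add);
        return visited;
    }
}

class SpaceIterateDemo {
    public static void main(String[] args) {
        SketchYoungGen young = new SketchYoungGen();
        young.eden.add("a");
        young.eden.add("b");
        young.survivor.add("c");
        System.out.println(SketchOldCollector.scanYoung(young)); // [a, b, c]
    }
}
```

Every new generation type has to supply its own version of each such hook, and every collector that relies on the hook has to be checked against each implementation, which is where the maintenance burden comes from.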

    3) How do I use "CMS" with "Serial"?

    -XX:+UseConcMarkSweepGC -XX:-UseParNewGC. Don't use -XX:+UseConcMarkSweepGC and -XX:+UseSerialGC. Although that seems like a logical combination, it will result in a message saying something about conflicting collector combinations and the JVM won't start. Sorry about that. Our bad.

    4) Is the blue box with the "?" a typo?

    That box represents the new garbage collector that we're currently developing called Garbage First or G1 for short. G1 will provide

  • More predictable GC pauses
  • Better GC ergonomics
  • Low pauses without fragmentation
  • Parallelism and concurrency in collections
  • Better heap utilization

    G1 straddles the young generation - tenured generation boundary because it is a generational collector only in the logical sense. G1 divides the heap into regions and during a GC can collect a subset of the regions. It is logically generational because it dynamically selects a set of regions to act as a young generation which will then be collected at the next GC (as the young generation would be).

    The user can specify a goal for the pauses and G1 will do an estimate (based on past collections) of how many regions can be collected in that time (the pause goal). That set of regions is called a collection set and G1 will collect it during the next GC.

    G1 can choose the regions with the most garbage to collect first (Garbage First, get it?) so gets the biggest bang for the collection buck.
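
A toy version of that selection might look like the sketch below. This is not G1's policy code: the Region class, its fields and the greedy loop are assumptions made for illustration, and the per-region cost is simply given as input where G1 would derive an estimate from past collections.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// A sketch of garbage-first selection, not G1's actual policy code.
// Region, garbageBytes and predictedMillis are invented for illustration;
// the predicted cost stands in for G1's estimate from past collections.
class CollectionSetSketch {
    static final class Region {
        final String name;
        final long garbageBytes;      // reclaimable space in the region
        final double predictedMillis; // estimated cost of collecting the region
        Region(String name, long garbageBytes, double predictedMillis) {
            this.name = name;
            this.garbageBytes = garbageBytes;
            this.predictedMillis = predictedMillis;
        }
    }

    // Considers regions in order of most garbage first and adds each one
    // whose predicted cost still fits in what is left of the pause goal,
    // skipping any region that no longer fits.
    static List<String> choose(List<Region> candidates, double pauseGoalMillis) {
        List<Region> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingLong((Region r) -> r.garbageBytes).reversed());
        List<String> chosen = new ArrayList<>();
        double budget = pauseGoalMillis;
        for (Region r : sorted) {
            if (r.predictedMillis <= budget) {
                chosen.add(r.name);
                budget -= r.predictedMillis;
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Region> regions = new ArrayList<>();
        regions.add(new Region("r0", 900_000, 4.0));
        regions.add(new Region("r1", 100_000, 1.0));
        regions.add(new Region("r2", 700_000, 5.0));
        regions.add(new Region("r3", 400_000, 3.0));
        // With a 10 ms goal: r0 and r2 fit, r3 does not, r1 fills the remainder.
        System.out.println(choose(regions, 10.0)); // [r0, r2, r1]
    }
}
```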

    G1 compacts so fragmentation is much less of a problem. Why is it a problem at all? There can be internal fragmentation due to partially filled regions.

    The heap is not statically divided into a young generation and a tenured generation so the problem of an imbalance in their sizes is not there.

    Along with a pause time goal the user can specify a goal on the fraction of time that can be spent on GC during some period (e.g., during the next 100 seconds don't spend more than 10 seconds collecting). For such goals (10 seconds of GC in a 100 second period) G1 can choose a collection set that it expects it can collect in 10 seconds and schedules the collection 90 seconds (or more) from the previous collection. You can see how an evil user could specify 0 collection time in the next century so again, this is just a goal, not a promise.
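
The arithmetic in that example is simple enough to spell out. The helper below is a made-up name and covers only the spacing calculation from the paragraph above; real scheduling would have more to consider.

```java
// Back-of-the-envelope arithmetic for the time-fraction goal described above.
// The class and method names are made up; G1's real scheduling is more involved.
class GcTimeGoalSketch {
    // With at most budgetSeconds of GC allowed in any periodSeconds window,
    // a collection expected to use the whole budget should be scheduled at
    // least (period - budget) seconds from the previous collection.
    static long minSecondsBetweenCollections(long periodSeconds, long budgetSeconds) {
        if (budgetSeconds <= 0 || budgetSeconds > periodSeconds) {
            // A zero budget (the "evil user" case) can never be honored.
            throw new IllegalArgumentException("budget must be in (0, period]");
        }
        return periodSeconds - budgetSeconds;
    }

    public static void main(String[] args) {
        System.out.println(minSecondsBetweenCollections(100, 10)); // 90
    }
}
```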

    If G1 works out as we expect, it will become our low-pause collector in place of "ParNew" + "CMS". And if you're about to ask when will it be ready, please don't be offended by my dead silence. It's the highest priority project for our team, but it is software development so there are the usual unknowns. It will be out by JDK7. The sooner the better as far as we're concerned.

    Updated February 4. Yes, I can edit an already posted blog. Here's a reference to the G1 paper if you have ACM portal access.

    http://portal.acm.org/citation.cfm?id=1029879

  • Comments:

    G1 sounds neat. Can you comment on the similarities and differences between G1 and a Beltway collector?

    Posted by Trevor Strohman on February 01, 2008 at 08:05 AM PST #

    Interesting.
    You mentioned "Parallelism and concurrency in collections" in the feature list of G1. Is it already clear what kind of collections could be run concurrently and when a full stop would occur?

    I just recently thought about stack allocation for special kinds of objects. Couldn't the HotSpot compiler provide enough information to determine points in code when it's safe to delete certain objects?
    For example many methods use temporary objects. Is it really worth putting them into the young generation?
    (just a thought)

    thank you very much for the great overview of GCs

    Posted by Michael Bien on February 02, 2008 at 05:14 AM PST #

    Hi Jon,

    Nice summary. I've always used a matrix to explain which collectors can be used together.

    I like the idea of G1. It seems to play on cards. I was also wondering if a continuous collector had been considered. Something that would work very conservatively to reclaim memory while mutators were allowed to run. I suspect that eventually the space would have to be properly collected. I would also suspect that it would leave memory highly fragmented. But, if it can delay a collection (partial or full), it might offer some advantages.

    Kirk

    Posted by KIrk on February 02, 2008 at 09:34 AM PST #

    Where is the image?

    Posted by HashiDiKo on February 03, 2008 at 11:26 PM PST #

    Where do you find that kind of information about the garbage collectors?

    Posted by Nico on February 04, 2008 at 01:47 AM PST #

    Hi all,

    This is Tony from the HotSpot GC Group (and I've been working on G1
    for far too long...).

    Trevor,

    The Beltway GC (you're very well informed!) is a stop-the-world GC and
    a generalization over copying GCs (basically, you can express a
    variety of different copying schemes in terms of the Beltway
    abstractions). G1 is a particular (copying-based, I suppose) GC with a
    lot of its functionality (e.g., marking) done concurrently. I'm not
    quite sure whether you can actually express G1 in terms of the Beltway
    (the latter "churns" through regions when collecting, whereas in G1 we
    target the regions with the most garbage in them). Maybe, the Beltway
    folks can enlighten us on this.

    Michael,

    Initially, G1 will behave similarly to CMS, i.e., stop-the-world
    "young GCs" (with every now and then some old regions also being
    reclaimed during such GCs) and concurrent marking (but no sweeping, as
    it's not needed). But, with several advantages (compaction, better
    predictability, faster remarks, etc.). We have many ideas on how to
    proceed in the future to do even more work concurrently, but nothing
    is certain yet, so we will not say much else on this at this time.

    Regarding stack allocation. I believe (and I've seen data on papers
    that support this) that stack allocation can pay off for GCs that (a)
    do not compact or (b) are not generational (or both, of course).

    In the case of (a), a non-compacting GC has an inherently slower
    allocation mechanism (e.g., free-list look-ups) than a compacting GC
    (e.g., "bump-the-pointer"). So, stack allocation can allow some
    objects to be allocated and reclaimed more cheaply (and, maybe, reduce
    fragmentation given that you cut down on the number of objects
    allocated / de-allocated from the free lists).

    In the case of (b), typically objects that are stack allocated would
    also be short-lived (not always, but I'd guess this holds for the
    majority). So, effectively, you add the equivalent of a young
    generation to a non-generational GC.

    For generational GCs, results show that stack allocation might not pay
    off that much, given that compaction (I assume that most generational
    GCs would compact the young generation through copying) allows
    generational GCs to allocate and reclaim short-lived objects very
    cheaply. And, given that escape analysis (which is the mechanism that
    statically discovers which objects do not "escape" a thread and hence
    can be safely stack allocated as no other thread will access them)
    might only prove that a small proportion of objects allocated by the
    application can be safely stack allocated (so, the benefit would be
    quite small overall).

    (BTW, your 3D engine in Java shots on your blog look really cool!)

    Kirk,

    There are GCs that are pauseless. For example, this is how the
    real-time GC in our Java RTS (Real-Time System) product works.
    However, they cost a lot in terms of overhead, as well as
    complexity. I'd guess that, for the SE product, we might stick to
    having GCs that do pauses, but try to decrease the pause times as much
    as possible.

    Tony

    PS There's currently an outage on the blog server which is preventing
    images from being rendered. This is being rectified "as we speak".

    Posted by Tony Printezis on February 04, 2008 at 01:50 AM PST #

    Image is broken :(

    Posted by Arman on February 04, 2008 at 02:08 AM PST #

    Thank you very much for the detailed explanation Tony!

    >(BTW, your 3D engine in Java shots on your blog look really cool!)
    Thank you, the engine is actually pretty old. I tweaked it a lot to run without full stops (that's why I asked ;))

    Posted by Michael Bien on February 04, 2008 at 04:56 AM PST #

    HashiDiKo,

    If you are asking about the Hotspot garbage collectors and have not already seen the GC whitepaper, it is a good place to start.

    http://java.sun.com/j2se/reference/whitepapers/memorymanagement_whitepaper.pdf

    Posted by Jon Ustmiasam on February 04, 2008 at 07:00 AM PST #

    A basic GC question: Is it worthwhile to try and collect 'cheap' garbage? Or does the cost of tracking it outweigh the benefits?

    One thought was having a bit on each object that indicated whether it had ever been assigned to a non-stack location, coupled with a list on each stack frame for objects that had been allocated.

    Then, when unwinding that stack frame you could immediately GC any object that wasn't potentially referenced from elsewhere.

    (Note: the flag would just be turned on, no attempt would be made to reference count, etc)

    Or am I just being naive?

    Posted by Andrew on February 04, 2008 at 09:31 AM PST #

    Andrew,

    In practice doing what you're proposing is not really straightforward
    (even though it sounds good "on paper"!).

    The main issue is that for GCs that rely on compaction (or that at
    least have a copying young generation, which is basically all the GCs
    in HotSpot), GCing specific objects is just not possible (or at least,
    it's not very efficient). Compacting GCs assume that, when a GC
    happens, all live objects will move somewhere (to another space in
    copying GCs, or to the bottom of the compacting space in sliding
    compacting GCs) and all available free space will be in one
    place. This means that such GCs do not keep track of individual free
    chunks and makes it impossible to just reclaim specific objects. And
    there are several good reasons why we like such collectors, one of the
    most important ones being that they allow for very fast, very scalable
    bump-the-pointer allocation.
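
For anyone who hasn't seen the two allocation styles side by side, here is a toy comparison. Neither class is HotSpot code; the names, the first-fit policy and the use of plain numbers as addresses are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Two toy allocators over an imaginary address range, for comparison only.
// Neither is HotSpot code; sizes and addresses are plain longs.
class BumpPointerSpace {
    private long top;        // next free address
    private final long end;  // one past the last usable address

    BumpPointerSpace(long start, long end) { this.top = start; this.end = end; }

    // All free space is contiguous, so allocation is a bounds check and an add.
    long allocate(long size) {
        if (end - top < size) return -1; // full: time for a GC
        long result = top;
        top += size;
        return result;
    }
}

class FreeListSpace {
    // Each entry is {address, size}; free chunks are scattered.
    private final List<long[]> chunks = new ArrayList<>();

    void addFreeChunk(long address, long size) { chunks.add(new long[] {address, size}); }

    // First-fit: walk the list until a chunk is large enough.
    long allocate(long size) {
        for (Iterator<long[]> it = chunks.iterator(); it.hasNext();) {
            long[] c = it.next();
            if (c[1] >= size) {
                long result = c[0];
                c[0] += size;
                c[1] -= size;
                if (c[1] == 0) it.remove();
                return result;
            }
        }
        return -1; // no single chunk fits, even if the total free space would
    }
}

class AllocatorDemo {
    public static void main(String[] args) {
        BumpPointerSpace bump = new BumpPointerSpace(0, 64);
        System.out.println(bump.allocate(16)); // 0
        System.out.println(bump.allocate(16)); // 16

        FreeListSpace list = new FreeListSpace();
        list.addFreeChunk(100, 8);
        list.addFreeChunk(200, 32);
        System.out.println(list.allocate(16)); // 200: the first chunk is too small
        System.out.println(list.allocate(32)); // -1: 24 units free, but split 8 + 16
    }
}
```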

    Even if we could GC specific objects, how are we going to find all the
    objects allocated by a particular stack frame? Are we going to link
    them at allocation? That's extra overhead.

    Performance-wise, copying young generations (like the ones we have in
    HotSpot) are super efficient in reclaiming young, short-lived objects
    (they just evacuate the few survivors they come across and never even
    touch the dead objects; this is why they are so fast). So, in most
    cases, they should be able to reclaim space at least as efficiently as
    what you propose. In fact, they might be even more efficient, given
    that they don't have to iterate over the dead objects: they copy the
    survivors, the rest are reclaimed, done. Whereas, according to what
    you propose, we would have to iterate over the dead objects and
    de-allocate them one-by-one.
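
A toy model of that behaviour, in case it helps. The Node class and the evacuate() method are invented for this sketch; a real copying collector moves raw memory and keeps forwarding pointers in object headers, where this one uses a map. What the sketch does show is that the work done is proportional to the survivors, not to the amount of garbage.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Toy copying collection. Node and CopyingSketch are invented names; a real
// collector copies raw memory and installs forwarding pointers in headers.
class CopyingSketch {
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Copies everything reachable from the roots into a fresh to-space and
    // returns it. Unreachable nodes are never visited; their space is simply
    // reused once from-space is discarded.
    static List<Node> evacuate(List<Node> roots) {
        Map<Node, Node> forwarded = new IdentityHashMap<>(); // old -> copy
        List<Node> toSpace = new ArrayList<>();
        Deque<Node> pending = new ArrayDeque<>();
        for (Node root : roots) {
            copy(root, forwarded, toSpace, pending);
        }
        while (!pending.isEmpty()) {
            Node old = pending.poll();
            Node moved = forwarded.get(old);
            for (Node ref : old.refs) {
                moved.refs.add(copy(ref, forwarded, toSpace, pending));
            }
        }
        return toSpace;
    }

    private static Node copy(Node old, Map<Node, Node> forwarded,
                             List<Node> toSpace, Deque<Node> pending) {
        Node existing = forwarded.get(old);
        if (existing != null) return existing; // already evacuated
        Node fresh = new Node(old.name);
        forwarded.put(old, fresh);
        toSpace.add(fresh);
        pending.add(old);
        return fresh;
    }

    public static void main(String[] args) {
        Node a = new Node("a"), b = new Node("b");
        a.refs.add(b);
        List<Node> fromSpace = new ArrayList<>();
        fromSpace.add(a);
        fromSpace.add(b);
        for (int i = 0; i < 1000; i++) fromSpace.add(new Node("garbage" + i));

        List<Node> roots = new ArrayList<>();
        roots.add(a);
        List<Node> survivors = evacuate(roots);
        // prints "1002 allocated, 2 copied"
        System.out.println(fromSpace.size() + " allocated, " + survivors.size() + " copied");
    }
}
```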

    To summarize, your scheme might work for a non-generational,
    non-compacting GC (where you can de-allocate specific objects). But, I
    can't see it working for our GCs.

    I got slightly carried away in my reply here... Hope it helps!

    Tony

    Posted by Tony Printezis on February 05, 2008 at 04:39 AM PST #

    Thanks for the response Tony.

    I hadn't considered the compaction aspect of generational garbage collectors, and certainly it's not beneficial for that set of assumptions (ie the pool for the young generation pool has filled up at exactly the same rate).

    Basically I thought the idea had some interesting characteristics, particularly for SMP environments, and thought I should share it before I forgot it.

    Andrew

    Posted by Andrew on February 05, 2008 at 08:21 AM PST #

    Hi guys,

    G1 sounds very interesting! A few questions:

    1. How large is a region likely to be? (just a ballpark range)

    2. Will TLAB's be (partially) replaced by regions, so threads may be allocating into different regions?

    3. It's not obvious to me how to choose which regions have more garbage, without first having marked them, but then how do you know which region to allocate into?

    4. When doing a young gen collection, will you compact into a different region or the same region? I.e., does it work more like a copying collector than a mark-sweep-compact? (Or am I missing something there :)

    Thanks!
    Adam

    Posted by Adam Hawthorne on February 05, 2008 at 12:36 PM PST #

    Will the new GC also collect the non-heap (i.e. Code Cache and Perm Gen)? Or will you get rid of those two?

    I opt for the latter since these two spaces cause lots of problems for dynamic software like Eclipse (which loads plug-ins at runtime) or Groovy (which creates classes at runtime).

    Posted by Aaron Digulla on February 05, 2008 at 04:48 PM PST #

    Adam,

    Hi. Answers to your (very good!) questions:

    1. Right now, regions are 1MB. We allocate a contiguous block of
    regions for objects that are "humongous", i.e. that are too large to
    fit in one region.
    2. No, regions will not replace TLABs. There's one allocating region
    and threads will allocate TLABs from it. When that region is full,
    then it will be "retired" and another one will become the allocating
    region. So, a single region might hold TLABs from several threads.
    3. As I mentioned in an earlier post, we perform a marking phase every
    now and then to get up-to-date liveness information.
    4. (you're not missing anything! good question) Collections are done
    by copying. Basically, we pick the regions we want to GC (we refer to
    that set of regions as the "collection set") and we evacuate the
    surviving objects from those regions to another set (the
    "to-space"). The assumption is that to-space will have fewer regions
    than the collection set and this is how we reclaim space. Given that
    we assume that the survival rate in the collection set will be quite
    low (we chose which regions to GC, remember?), copying is the most
    efficient way to perform such collections.
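
As a rough model of answers 1 and 2, the sketch below carves TLABs out of a single allocating region and retires the region when the next TLAB does not fit. Only the 1MB region size comes from the answers above; the class, the TLAB size and the bookkeeping are assumptions, and the humongous helper just counts how many contiguous regions an object of a given size would span.

```java
// Toy model of answers 1 and 2 above. The 1 MB region size comes from the
// reply; the class, the TLAB sizes and the counters are invented.
class RegionAllocationSketch {
    static final long REGION_BYTES = 1L << 20; // 1 MB

    private long usedInCurrentRegion = 0;
    private int retiredRegions = 0;

    // Hands a thread a TLAB carved out of the current allocating region.
    // If the region cannot fit it, the region is retired and a new one takes
    // over. Returns the offset of the TLAB within its region.
    long allocateTlab(long tlabBytes) {
        if (tlabBytes > REGION_BYTES) {
            throw new IllegalArgumentException("TLAB larger than a region");
        }
        if (REGION_BYTES - usedInCurrentRegion < tlabBytes) {
            retiredRegions++;
            usedInCurrentRegion = 0;
        }
        long offset = usedInCurrentRegion;
        usedInCurrentRegion += tlabBytes;
        return offset;
    }

    int retiredRegions() { return retiredRegions; }

    // Number of contiguous regions a "humongous" object of this size needs.
    static long regionsForHumongous(long objectBytes) {
        return (objectBytes + REGION_BYTES - 1) / REGION_BYTES;
    }

    public static void main(String[] args) {
        RegionAllocationSketch heap = new RegionAllocationSketch();
        long tlab = 256 * 1024; // assumed TLAB size: four fit in one region
        for (int thread = 0; thread < 5; thread++) {
            // The fifth TLAB does not fit, so it starts at offset 0 of a new region.
            System.out.println("thread " + thread + " TLAB at offset " + heap.allocateTlab(tlab));
        }
        System.out.println("retired regions: " + heap.retiredRegions()); // 1
        System.out.println(regionsForHumongous(3_670_016)); // 3.5 MB object: 4
    }
}
```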

    Aaron,

    Right now the G1 heap replaces the young / old generations. I.e., we
    still have a permanent space + code cache. In the future, we might be
    able to incorporate the permanent space into G1 heap (there are many
    tricky issues that we need to resolve first to do that...). However, I
    don't think we'll also incorporate the code cache too.

    Point taken though.

    Tony

    Posted by Tony Printezis on February 06, 2008 at 12:03 AM PST #

    pretty interesting discussion!

    I have written a blog entry and aggregated the discussion in a question-answer schema.
    http://www.michael-bien.com/roller/mbien/entry/garbage_first_the_new_concurrent

    Maybe the first step to a FAQ ;)

    Posted by Michael Bien on February 06, 2008 at 12:44 AM PST #

    For those without ACM access, it's published on Sun's website at http://research.sun.com/jtech/pubs/

    Posted by Bob Hansen on February 06, 2008 at 08:29 AM PST #

    Tony,

    More and more dynamic languages try to use the JVM as their core engine. Some of them generate bytecode on the fly and with the current implementation of the Sun VM, we invariably run out of memory in the Code Cache or PermGen. Do you know if this issue is being worked on or is Java 7 again not recommendable for languages which rely on bytecode generation at runtime?

    Best regards,

    Aaron

    Posted by Aaron Digulla on February 07, 2008 at 01:28 AM PST #

    Aaron,

    Hi. The code cache works quite differently to the rest of the heap. In
    particular, code is non-relocatable, whereas objects in the heap
    typically move (we could relocate code, but it's tricky and
    error-prone).

    I think the issue with the code cache is not that we don't grow it
    enough, but that we do not have a good policy to work out which
    methods have not been used for some time and evict their compiled
    code. Right now, we just grow it and, when it's full, we stop
    compiling new methods.

    Notice, however, that code is evicted when classes are unloaded (as
    well as when methods are "invalidated", i.e., some assumptions were
    made during their compilation that do not hold any more). So, if you
    fill up the heap with lots of little classes that are used only for a
    short period of time and which are then never reclaimed, then, yes,
    you'll blow up the code cache and/or the permanent generation. But, I
    would claim that this is due to a memory leak, not bad behavior of the
    code cache.

    Also notice that the AppServers heavily load/unload classes (and
    stress the code cache quite a lot because of that) and with a bit of
    tuning they work quite well.

    Tony

    Posted by Tony Printezis on February 08, 2008 at 03:18 AM PST #

    In FAQ#2: "This makes the mix-and-match of collectors work but adds some burden to the maintenance of the collectors and to the addition of new collectors. And the burden seems to be quadratic in the number of collectors."

    Could you elaborate on that statement? May be a specific example to help illustrate the "burden".

    Posted by /dev/null on February 13, 2008 at 12:52 AM PST #

    hi All,
    I'm Ali from pakistan hyderabad,and i'm biggner in java,so i have no idia about java,that how i install it,and how to run,so humble request to all plz help me,via e-mail,my e-mail edress is sms_boy2001@yahoo.com,i will awaiting ur e-mails,thanks

    Posted by ALI Gulshan on February 13, 2008 at 01:30 PM PST #

    Tony,

    On the stack allocation front, is there not a particular case that could be supported:

    An object could be stack allocated if it is only ever referenced within the function.

    This optimization (I'm guessing that there are going to be cache line benefits) could also be done if the object is used to call another function, but that function gets inlined and it turns out that no additional references to the object are made.

    My gut feel is that stack allocation could have some notable benefits in certain cases.

    Posted by Neale on February 13, 2008 at 05:24 PM PST #

    Regarding the question of what I mean by burden, below
    are descriptions of 2 methods that are implemented
    for each generation.

    // This function gives a generation a chance to note a point between
    // collections. For example, a contiguous generation might note the
    // beginning allocation point post-collection, which might allow some later
    // operations to be optimized.
    virtual void save_marks();

    // This function is "true" iff no allocations have occurred in the
    // generation since the last call to "save_marks".
    virtual bool no_allocs_since_save_marks();

    These methods were created for use with generations that have
    the free space in the generation in a single contiguous
    region. As the comment for save_marks() says, noting the
    high water mark for allocations in the generation is an example
    of what save_marks() can do. Originally the description was
    more specific to generations with a contiguous region for its
    free space. But then we implemented CMS which maintains its
    free space in free lists of chunks of space that are available
    for allocation. save_marks() needs to be implemented for the
    CMS generation but what does it mean for a generation with free
    lists of free space? In order to figure that out you need to know
    how the information from save_marks() is used. no_allocs_since_save_marks()
    is a method that depends on the effects of save_marks().
    For a generation with a contiguous free space, no_allocs_since_save_marks()
    checks if the high water mark has moved since the last save_marks().
    For the no_allocs_since_save_marks() for CMS it would be sufficient to
    just note whether any allocations have occurred in the CMS generation.
    It turns out, however, that other methods need to look at each
    object that has been allocated since the save_marks(). So for
    CMS the save_marks() starts a list of allocations out of the CMS
    generation and no_allocs_since_save_marks() checks to see if that
    list is empty. The young generation has a slightly specialized
    version of these methods because the young generation has three
    spaces (eden and 2 survivor spaces). G1 has its own version of
    save_marks() but does not have its own no_allocs_since_save_marks().
    So just to get off the ground with a new collector, you need
    to think about a bunch of methods which may
    or may not make sense for the new collector. And if you
    need to change a method, you have to look at the implementations
    for all the collectors to see if your change is going to make
    a difference. On the other hand when you want your new
    collector to work with the others, you have a big head start.
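
A simplified Java paraphrase of the two flavours may make the difference easier to see. The real methods are C++ virtuals in HotSpot; everything below (the interface, the class names, the way addresses are handed out) is a stand-in written for this explanation.

```java
import java.util.ArrayList;
import java.util.List;

// Java paraphrase of the two flavours described above. The real methods are
// C++ virtuals in HotSpot; these classes are simplified stand-ins.
interface MarkedGeneration {
    void saveMarks();
    boolean noAllocsSinceSaveMarks();
    long allocate(long size);
}

// Free space is one contiguous block: remember the high-water mark.
class ContiguousGenSketch implements MarkedGeneration {
    private long top = 0;
    private long savedMark = 0;

    public long allocate(long size) { long addr = top; top += size; return addr; }
    public void saveMarks() { savedMark = top; }
    public boolean noAllocsSinceSaveMarks() { return top == savedMark; }
}

// Free space lives on free lists, so there is no single mark to compare.
// Instead, record each allocation made after saveMarks().
class FreeListGenSketch implements MarkedGeneration {
    private final List<Long> allocatedSinceMark = new ArrayList<>();
    private long nextAddress = 0; // stand-in for picking a chunk off a free list

    public long allocate(long size) {
        long addr = nextAddress;
        nextAddress += size;
        allocatedSinceMark.add(addr);
        return addr;
    }
    public void saveMarks() { allocatedSinceMark.clear(); }
    public boolean noAllocsSinceSaveMarks() { return allocatedSinceMark.isEmpty(); }
    // Lets other code visit each object allocated since the mark.
    public List<Long> allocationsSinceSaveMarks() { return allocatedSinceMark; }
}

class SaveMarksDemo {
    public static void main(String[] args) {
        MarkedGeneration[] gens = { new ContiguousGenSketch(), new FreeListGenSketch() };
        for (MarkedGeneration g : gens) {
            g.allocate(16);
            g.saveMarks();
            boolean before = g.noAllocsSinceSaveMarks(); // true
            g.allocate(8);
            boolean after = g.noAllocsSinceSaveMarks();  // false
            System.out.println(g.getClass().getSimpleName() + ": " + before + " then " + after);
        }
    }
}
```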

    Posted by Jon Ustimasam on February 15, 2008 at 03:38 AM PST #

    Please take into consideration an ability for an application to \*really\* shed all memory. Basically, a possibility to request a \*Super Garbage Collection\*, like System.utterShedding(), where everything possible was collected, the memory layout compacted and rearranged, and then every byte reclaimed was given back to the OS (bypassing the -XX:[Min|Max]HeapFreeRatio=xx switches, instead free'ing any memory not immediately needed). A rather large GC pause would be expected on invocation of this method.

    -XX:[Min|Max]HeapFreeRatio=xx :
    http://www.gossamer-threads.com/lists/lucene/java-user/44286#44286

    This would make java on the desktop slightly more possible. An application would do such a request on minimize, and in particular on "minimize to tray" - or in other situations where it knew that quite a bit of garbage was generated (some particular state change in the application, unloading of whatever), or when it had been inactive for some time X.

    Also, the GCer should try to not cycle through all memory when needed, touching every byte. Instead, if the application seems idle (very little mem churning), it could restrict itself to use only a few such regions. This of course to enable the OS to do swapping properly: When the GC repeatedly sweeps through the entire memory block, all memory looks like it is active and hence cannot be swapped out.

    Btw, what do you have to say to the "Bookmarking Collection"?
    http://www-cs.canisius.edu/~hertzm/bc.html
    Basically, it states that by cooperating with the OS kernel, you'd get way better performance on systems where memory is becoming scarce. On a multitasking desktop system, memory is ALWAYS scarce.

    Posted by Endre Stølsvik on February 19, 2008 at 06:51 AM PST #

    Also, this paper:
    http://www-cs.canisius.edu/~hertzm/gcmalloc-oopsla-2005.pdf

    .. from Matthew Hertz
    http://www-cs.canisius.edu/~hertzm/

    is interesting in that it tries to compare explicit memory management to GCing.

    Its findings are somewhat disturbing: With 5 times as much memory as needed, GC is as good as, or even somewhat better than, explicit memory management. But it degrades fast: with three times as much memory, it is 17% slower, and with twice, it runs 70% slower than explicit. Also, GCing performs VERY badly on a swapping system, with orders-of-magnitude worse behaviour than explicit memory management.

    Any comments?

    Posted by Endre Stølsvik on February 20, 2008 at 05:27 PM PST #

    Neale,

    Hi. Yes, what you're describing is referred to as "escape analysis" in
    the literature (i.e., detecting whether an object "escapes" a
    thread, usually by doing static analysis over the code).

    If an object is proven not to escape a thread, then there are a few
    interesting optimizations that can be done.

    One, as you say, is to stack allocate it (even though, as I said
    earlier on this thread, a generational GC can have similar performance
    benefits by reclaiming short-lived objects very cheaply). In fact,
    you can do better, as in some cases, you can just "allocate" the
    object directly on registers (i.e., the object might never actually
    see memory).

    Another optimization that escape analysis allows is actually lock
    elision (if no other thread is looking at this object, why lock
    it?). But, again, our biased locking optimization has similar
    performance benefits in a lot of cases.
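
Here is a small Java illustration of what "escaping" means in practice. It only shows the shapes of code involved; whether a particular JVM actually removes the allocation or the locks is up to its JIT compiler, and nothing in the example forces or verifies that. The class and method names are made up.

```java
// Illustration of escaping vs. non-escaping objects. Whether a given JVM
// actually optimizes these allocations or locks away is up to its JIT;
// nothing here forces or verifies that.
class EscapeSketch {
    static final class Point {
        final double x, y;
        Point(double x, double y) { this.x = x; this.y = y; }
    }

    static Point lastSeen; // anything stored here is visible to other threads

    // The temporary Point never leaves this method, so escape analysis could
    // prove it thread-local and keep its two fields in registers.
    static double lengthNoEscape(double x, double y) {
        Point p = new Point(x, y);
        return Math.sqrt(p.x * p.x + p.y * p.y);
    }

    // Same arithmetic, but the Point is published through a static field,
    // so it escapes and has to be a real heap object.
    static double lengthEscapes(double x, double y) {
        Point p = new Point(x, y);
        lastSeen = p;
        return Math.sqrt(p.x * p.x + p.y * p.y);
    }

    // StringBuffer methods are synchronized; since sb is never shared,
    // its locks are candidates for elision.
    static String joinLocalBuffer(String a, String b) {
        StringBuffer sb = new StringBuffer();
        sb.append(a).append(b);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(lengthNoEscape(3, 4));      // 5.0
        System.out.println(lengthEscapes(3, 4));       // 5.0
        System.out.println(joinLocalBuffer("G", "1")); // G1
    }
}
```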

    The HotSpot compiler folks have been working on escape analysis for a
    bit. Expect it in a 6 update release, as well as 7.

    Tony

    Posted by Tony Printezis on February 21, 2008 at 03:28 AM PST #

    Endre,

    We have heard requests from both inside and outside of Sun
    for an API which would allow an application to indicate
    that it is not going to be using memory for a while and
    so garbage collect and shrink the heap. We plan on implementing
    something but don't have anyone to work on it at the moment.

    Regarding cycling through memory, a minor collection (just
    collecting the young generation) will only touch a subset
    of the heap so will keep fewer pages in memory. But for
    a full collection (with all our current collectors)
    we really don't have any choice but to
    look at the entire heap so everything gets touched. G1
    can be more flexible about this and we plan on exploring
    different policies.

    Regarding the "Bookmarking" collector, swapping is definitely a killer
    in terms of performance for garbage collection. In a
    tight memory situation, if you can avoid swapping, performance
    will definitely be better. If I recall correctly there was some
    performance cost for the OS support and that would have to be
    traded off against the benefits. When we reviewed the paper here,
    there was not a sense that we should implement such a collector
    in the near term. We actually have other issues that we would
    like to address before we tackle something like that. For one thing
    we would like to be smarter about choosing the heap size when the
    JVM starts up on a system. Currently if the JVM starts up on a
    large system (generally speaking more than two processors and more
    than 2g of memory), we look at the physical memory on the system
    and choose some fraction of that for the maximum heap size. We really should
    be looking at the memory pressure on the system and base the
    maximum heap size on that. Actually, depending on the situation,
    we should not have a maximum heap size but should monitor the system
    and use what we need. "What we need" being something less than
    an amount that stresses the system. At start up we also choose how
    many GC threads we're going to use in parallel collections. Again we
    need to look at the activity on the system and make smarter choices.
    And we should vary that dynamically. We refer to these smarter choices
    as "external ergonomics". "External" because it's looking outside
    the JVM.

    Regarding the cost of user explicit memory management vs. GC, as you note our
    collectors (GC ergonomics collectors excluded) by default try to
    keep about 40% of the heap free after a collection. Users seem to
    be satisfied with the GC cost with that size heap (less than
    twice the size of the live data).
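
The "less than twice" figure follows directly from the 40% number. A quick sketch of the arithmetic, with a made-up helper name and whole-number units:

```java
// Arithmetic behind "less than twice the size of the live data".
// The helper name is made up; 40 is the free percentage quoted above.
class HeapFreeRatioSketch {
    // Smallest heap (in the same unit as liveData) that leaves freePercent
    // of the heap free after a collection.
    static long heapSizeFor(long liveData, int freePercent) {
        if (freePercent < 0 || freePercent >= 100) {
            throw new IllegalArgumentException("freePercent must be in [0, 100)");
        }
        long usedPercent = 100 - freePercent;
        return (liveData * 100 + usedPercent - 1) / usedPercent; // round up
    }

    public static void main(String[] args) {
        // 600 MB live with 40% free -> 1000 MB heap, about 1.67x the live data.
        System.out.println(heapSizeFor(600, 40)); // 1000
    }
}
```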

    Posted by Jon Ustimasma on February 22, 2008 at 01:51 AM PST #

    @Jon: Thanks for the answer!

    In regard to your last comment, I'd just like to point out that "Java on the Desktop" isn't really big. At all. In any way. What-so-ever. The only application I know of that has any circulation at all, is Azureus..! (And of course IDEs, but they don't count - I (at least) am talking about the mainstream population, not professional coders - we're not into this to make stash for ourselves, now are we? :-) )

    I postulate that a fairly big reason for this is exactly the memory issues: it is not an option to have three or more "javas" going at the same time, not in the same way it is possible to have dozens of native applications running. It is \*fully\* out of the question to leave a java application "running in the systray", not to mention ten systray applications.

    So, what I believe is that to actually get java outside of servers, the resource consumption side, of which memory is at the forefront, has to get much more focus.

    MS's .NET seems to be handling this rather better, but this is admittedly just a hunch.

    Btw: regarding the OS cost side of the Bookmarking collection: how big would that cost be for processes that didn't want it? I seem to remember that the cost was actually very close to zero, much like Sun states that the dtrace stuff is when not in use?

    Posted by Endre Stølsvik on February 24, 2008 at 06:46 PM PST #

    Endre,

    You're very right that making our JVM work better
    for client apps is a matter of focus. We have
    a very vocal client platform group within Sun that
    reminds us of that but when the rubber hits the
    road, we're doing G1 instead of something like
    a bookmarking collector.

    Regarding my comments on the bookmarking collector,
    I went back and looked at the paper and a process
    registers if it wants notification about memory being
    paged out so I was wrong about the cost being borne by
    all processes.

    Posted by Jon Ustimamsa on March 03, 2008 at 12:17 AM PST #

    Hi, let me suggest another way of specifying goals. Have the user specify a "cycle period", and then strive to keep the collection time in each cycle as low as possible, but also as consistent as possible.

    Imagine a game, for example, you would specify a cycle time of 1/60 seconds (for 60Hz framerate), and you want the GC to try to spread out its collections so that no single frame is disproportionately affected. This would be very useful, I think.

    Posted by Sebastian on March 08, 2008 at 08:21 AM PST #

    You mention that G1 will initially behave like CMS does now, with some extra tweaks enabled by default, like compaction...

    Anyway, I've just discovered that at the OS level (Linux 2.6.5-7 sles9) the number of file descriptors is not cleared by ParNew ... and CMS kicks in too late to free up excess "CLOSE_WAIT" and "pipe" fd's associated with the uid that the jvm runs under... This results in "Too many open files ..." for the jvm and the need to restart (or call System.gc(), but we have DisableExplicitGC on all our jvms)

    I know I can use "-XX:CMSInitiatingOccupancyFraction=xx%" to have CMS kick in earlier and thus free up OldGen faster, but I'd rather not do this, because when application is at peak CMS's get triggered continuously (CPU load) and what about Xmx setting, I might as well put a lower Xmx value in and keep an eye on CMS activity <-> heap usage.

    Will G1 take this into account ? Meaning, can it detect on OS level the need for a CMS, cleaning old gen and thus freeing up file descriptors ?

    Oh yes, and the compaction in G1, is this basically a "-XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=0 -XX:+CMSClassUnloadingEnabled" ? I don't have access to G1 paper you see ...

    grtz,
    gert

    Posted by Gert on March 09, 2008 at 10:36 PM PDT #

    Sebastian,

    With G1 you can specify a time slice NNN
    and the maximum amount of time to be spent on GC in
    that time slice MMM. G1 will then schedule
    its work such that for any NNN time slice no more than
    MMM will be spent doing GC. You can clearly
    over-specify this goal (MMM = 0 will never work) but
    G1 tries to provide the application with at least
    NNN - MMM per time slice.
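
To make the "no more than MMM in any NNN time slice" goal concrete, the sketch below checks a recorded list of pauses against such a goal. It is a model of the goal only, not of how G1 schedules its work; the class, the `Pause` type, and the sample numbers are hypothetical. In HotSpot builds that include G1, the two values are, to my knowledge, set with -XX:GCPauseIntervalMillis (NNN) and -XX:MaxGCPauseMillis (MMM).

```java
import java.util.List;

// Hypothetical model of the goal "at most MMM of GC in any NNN time slice".
final class PauseGoalCheck {

    // One stop-the-world pause, in milliseconds since application start.
    static final class Pause {
        final long startMs;
        final long durationMs;
        Pause(long startMs, long durationMs) {
            this.startMs = startMs;
            this.durationMs = durationMs;
        }
        long endMs() { return startMs + durationMs; }
    }

    // GC time that falls inside the window [windowStart, windowStart + sliceMs).
    static long gcTimeInWindow(List<Pause> pauses, long windowStart, long sliceMs) {
        long windowEnd = windowStart + sliceMs;
        long total = 0;
        for (Pause p : pauses) {
            long overlap = Math.min(p.endMs(), windowEnd) - Math.max(p.startMs, windowStart);
            if (overlap > 0) total += overlap;
        }
        return total;
    }

    // True when every window of length sliceMs (NNN) contains at most maxGcMs (MMM)
    // of pause time. The worst window always starts at a pause start or ends at a
    // pause end, so only those candidate windows are examined.
    static boolean meetsGoal(List<Pause> pauses, long sliceMs, long maxGcMs) {
        for (Pause p : pauses) {
            if (gcTimeInWindow(pauses, p.startMs, sliceMs) > maxGcMs) return false;
            if (gcTimeInWindow(pauses, p.endMs() - sliceMs, sliceMs) > maxGcMs) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Three 5 ms pauses; the window [0, 100) contains all of them (15 ms).
        List<Pause> pauses = List.of(new Pause(0, 5), new Pause(40, 5), new Pause(95, 5));
        System.out.println(meetsGoal(pauses, 100, 15)); // prints true
        System.out.println(meetsGoal(pauses, 100, 10)); // prints false
    }
}
```

With MMM = 0 the check fails as soon as a single pause exists, which is why that setting can never be met.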

    Gert,

    The garbage collector never directly releases
    OS resources such as file descriptors. Some
    applications have Java objects (TTT) that hold a
    resource such as a file descriptor and when
    TTT becomes unreachable the application executes
    code to release that resource. Weak references
    are often used to recognize when TTT becomes
    unreachable. Some applications use finalizers
    and take their chances with the unpredictability
    of when a finalizer will run. With regard to the
    release of OS resources, G1 and CMS will behave
    similarly.
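
The weak-reference pattern described above can be sketched as follows. This is a minimal illustration, not code from any particular application: `DescriptorTracker`, `OwnerRef`, and the integer standing in for a file descriptor are all hypothetical. The point it shows is that the collector only enqueues the reference once the owner is unreachable; the release itself is done by application code, and only after some collection has noticed the object.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: release an OS resource after its owning object (the "TTT"
// of the text) becomes unreachable.
final class DescriptorTracker {

    // Weak reference that remembers which descriptor its referent owned.
    static final class OwnerRef extends WeakReference<Object> {
        final int fd; // stand-in for a real file descriptor
        OwnerRef(Object owner, int fd, ReferenceQueue<Object> queue) {
            super(owner, queue);
            this.fd = fd;
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    private final Set<OwnerRef> pending = new HashSet<>(); // keeps the refs themselves reachable
    final List<Integer> released = new ArrayList<>();

    OwnerRef track(Object owner, int fd) {
        OwnerRef ref = new OwnerRef(owner, fd, queue);
        pending.add(ref);
        return ref;
    }

    // Call periodically: releases descriptors whose owners the GC found unreachable.
    int releaseUnreachable() {
        int count = 0;
        OwnerRef ref;
        while ((ref = (OwnerRef) queue.poll()) != null) {
            pending.remove(ref);
            released.add(ref.fd); // a real application would close the descriptor here
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws InterruptedException {
        DescriptorTracker tracker = new DescriptorTracker();
        Object owner = new Object();
        tracker.track(owner, 42);
        owner = null;      // drop the last strong reference
        System.gc();       // only a request; ignored entirely under -XX:+DisableExplicitGC
        Thread.sleep(100);
        // May report 0 if no collection has processed the reference yet.
        System.out.println("released now: " + tracker.releaseUnreachable());
    }
}
```

Where the application knows when it is done with a descriptor, closing it explicitly avoids waiting on any collector at all.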

    CMS's compaction of the tenured generation should
    be an exceptional event which occurs when the
    tenured generation becomes full before CMS's
    concurrent collection can free up space. That's
    when a stop-the-world full collection is done and
    a compaction occurs. With G1, compaction is the
    usual case. At each collection, part of the heap
    will be compacted.

    Posted by Jon Ustimmasa on March 17, 2008 at 04:12 AM PDT #

    So what happens if I do specify MMM=0? Will the amount of time it does occupy be mostly constant, or will I get spikes in some slices?

    My point is that I don't know in advance how much time I want to spend doing GC because the workload varies, so I can't give you an MMM value. So effectively I would like the GC to choose an MMM for me so that it varies slowly (i.e. the amount of time available to the application for each time slice is roughly constant in the short term, but varies in the long term for the duration of the application run).

    Posted by Sebastian on March 23, 2008 at 09:18 AM PDT #

    Sebastian,

    With MMM you select your pause time goal.
    After that there is the idea of a throughput
    goal which might be useful to you.
    Say your pause time goal has been satisfied
    (or you don't have one). Then your throughput
    goal would determine how much time you spent
    doing GC work. You could specify that you
    don't want to spend more than 10% of the total
    time doing GC and the heap would be sized
    (changing the heap is the usual way to affect
    total GC cost) to make that so. Your application
    would not necessarily see a single 10 second pause
    over the next 100 seconds. It might see two 5
    second pauses instead.
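
The arithmetic behind a throughput goal can be sketched as below. The class and method names are hypothetical. The conversion assumes the HotSpot flag -XX:GCTimeRatio=N, which as I understand it expresses a goal of 1/(1+N) of total time spent in GC, so N = 9 corresponds to the 10% in the example above.

```java
// Hypothetical sketch of the arithmetic behind a throughput goal.
final class ThroughputGoal {

    // Fraction of total time allowed for GC under -XX:GCTimeRatio=ratio: 1 / (1 + ratio).
    static double gcFractionForRatio(int gcTimeRatio) {
        return 1.0 / (1.0 + gcTimeRatio);
    }

    // Fraction of the elapsed time that was actually spent in the given pauses.
    static double observedGcFraction(long[] pausesMs, long elapsedMs) {
        long total = 0;
        for (long p : pausesMs) total += p;
        return (double) total / elapsedMs;
    }

    public static void main(String[] args) {
        double goal = gcFractionForRatio(9); // a 10% goal
        long elapsedMs = 100_000;            // 100 seconds of run time

        long[] onePause = {10_000};          // a single 10 second pause
        long[] twoPauses = {5_000, 5_000};   // two 5 second pauses

        // Both schedules spend the same share of time in GC, so a throughput
        // goal alone cannot tell them apart; that is the job of the pause time goal.
        System.out.println(observedGcFraction(onePause, elapsedMs) <= goal);
        System.out.println(observedGcFraction(twoPauses, elapsedMs) <= goal);
    }
}
```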

    Posted by Jon Ustimmasa on March 26, 2008 at 08:35 AM PDT #
