Java | Friday, February 1, 2008

Our Collectors

I drew this diagram on a whiteboard for some customers recently. They seemed to
like it (or were just being very polite) so I thought I'd redraw it for your
amusement.

Each blue box represents a collector that is used to collect a generation. The
young generation is collected by the blue boxes in the yellow region and
the tenured generation is collected by the blue boxes in the gray region.


  • "Serial" is a stop-the-world, copying collector which uses a single GC thread.

  • "ParNew" is a stop-the-world, copying collector which uses multiple GC threads. It differs
    from "Parallel Scavenge" in that it has enhancements that make it usable
    with CMS. For example, "ParNew" does the
    synchronization needed so that it can run during the
    concurrent phases of CMS.

  • "Parallel Scavenge" is a stop-the-world, copying collector
    which uses multiple GC threads.

  • "Serial Old" is a stop-the-world,
    mark-sweep-compact collector that uses a single GC thread.

  • "CMS" is a mostly concurrent, low-pause collector.

  • "Parallel Old" is a compacting collector that uses multiple GC threads.

    Using the -XX flags for our collectors in JDK 6:


  • UseSerialGC is "Serial" + "Serial Old"

  • UseParNewGC is "ParNew" + "Serial Old"

  • UseConcMarkSweepGC is "ParNew" + "CMS" + "Serial Old". "CMS" is used most of the time to collect the tenured generation. "Serial Old" is used when a concurrent mode failure occurs.

  • UseParallelGC is "Parallel Scavenge" + "Serial Old"

  • UseParallelOldGC is "Parallel Scavenge" + "Parallel Old"
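    To check which of these collectors a running JVM actually selected, the standard java.lang.management API exposes one GarbageCollectorMXBean per collector. The sketch below prints each bean's name alongside the name used in this post. The mapping table is an assumption based on the bean names HotSpot is known to report. "PS MarkSweep" appears to be reported for the tenured generation under both UseParallelGC and UseParallelOldGC, so the bean name alone can't tell those two apart, and later JDKs have removed some of these collectors entirely.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.Map;

class WhichCollectors {
    // Assumed mapping from HotSpot's GarbageCollectorMXBean names to the
    // collector names used in this post.
    private static final Map<String, String> BLOG_NAMES = Map.of(
        "Copy", "Serial",
        "ParNew", "ParNew",
        "PS Scavenge", "Parallel Scavenge",
        "MarkSweepCompact", "Serial Old",
        "ConcurrentMarkSweep", "CMS",
        "PS MarkSweep", "Serial Old or Parallel Old (same bean name for both)",
        "G1 Young Generation", "G1",
        "G1 Old Generation", "G1");

    static String describe(String beanName) {
        return BLOG_NAMES.getOrDefault(beanName, "unknown collector: " + beanName);
    }

    public static void main(String[] args) {
        // One bean per collector the running JVM is using.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + " -> " + describe(gc.getName()));
        }
    }
}
```

    Running it with -XX:+UseSerialGC, for example, should list the "Copy" and "MarkSweepCompact" beans.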

    FAQ

    1) UseParNewGC and UseParallelGC both collect the young generation using
    multiple GC threads. Which is faster?

    There's no one correct answer for
    this question. Mostly they perform equally well, but I've seen one
    do better than the other in different situations. If you want to use
    GC ergonomics, note that it is only supported by UseParallelGC (and UseParallelOldGC),
    so that's what you'll have to use.

    2) Why don't "ParNew" and "Parallel Old" work together?

    "ParNew" is written in
    a style where each generation being collected offers certain interfaces for its
    collection. For example, "ParNew" (and "Serial") implements
    space_iterate() which will apply an operation to every object
    in the young generation. When collecting the tenured generation with
    either "CMS" or "Serial Old", the GC can use space_iterate() to
    do some work on the objects in the young generation.
    This makes the mix-and-match of collectors work but adds some burden
    to the maintenance of the collectors and to the addition of new
    collectors. And the burden seems to be quadratic in the number
    of collectors.
    In contrast, "Parallel Scavenge"
    (at least with its initial implementation before "Parallel Old")
    always knew how the tenured generation was being collected and
    could call directly into the code in the "Serial Old" collector.
    "Parallel Old" is not written in the "ParNew" style so matching it with
    "ParNew" doesn't just happen without significant work.
    By the way, we would like to match "Parallel Scavenge" only with
    "Parallel Old" eventually and clean up any of the ad hoc code needed
    for "Parallel Scavenge" to work with both.

    Please don't think too much about the examples I used above. They
    are admittedly contrived and not worth your time.
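    To make the "ParNew style" a little more concrete, here is a hypothetical Java sketch; HotSpot is written in C++, and none of these class names come from it. The young generation offers a spaceIterate() hook, and an old-generation collector uses that hook without knowing which young collector it is paired with. Every young collector that wants to be paired with that old collector has to implement the hook, which is where the maintenance burden comes from.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical interface every young generation must implement so that it
// can be mixed and matched with any old-generation collector.
interface YoungGeneration {
    // Apply op to every object in the young generation.
    void spaceIterate(Consumer<Object> op);
}

// A toy young generation that keeps its objects in a list.
class ListYoungGen implements YoungGeneration {
    final List<Object> objects = new ArrayList<>();

    public void spaceIterate(Consumer<Object> op) {
        for (Object o : objects) op.accept(o);
    }
}

// A toy old-generation collector that relies only on the interface, so it
// works with any YoungGeneration implementation.
class OldGenCollector {
    int countYoungObjects(YoungGeneration young) {
        int[] count = {0};
        young.spaceIterate(o -> count[0]++);
        return count[0];
    }
}
```

    A "Parallel Scavenge"-style design would instead call a specific old collector's code directly: simpler, but it only works with the collectors it was written against.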

    3) How do I use "CMS" with "Serial"?

    -XX:+UseConcMarkSweepGC -XX:-UseParNewGC.
    Don't use -XX:+UseConcMarkSweepGC and -XX:+UseSerialGC. Although that seems like
    a logical combination, it will result in a message saying something about
    conflicting collector combinations and the JVM won't start. Sorry about that.
    Our bad.
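    The JDK 6 flag combinations from the list above, including the FAQ 3 cases, can be summarized as a lookup. This is a hypothetical model of that table, not HotSpot's argument processing; it only understands the flags mentioned in this post and reports the CMS + UseSerialGC combination as the conflict described above.

```java
import java.util.List;
import java.util.Set;

class Jdk6CollectorFlags {
    // Returns the collectors selected by a set of -XX flags, following the
    // table in this post. Flags are written as "+UseSerialGC", "-UseParNewGC", etc.
    static List<String> collectorsFor(Set<String> flags) {
        boolean cms = flags.contains("+UseConcMarkSweepGC");
        if (cms && flags.contains("+UseSerialGC")) {
            throw new IllegalArgumentException("conflicting collector combinations");
        }
        if (cms) {
            String young = flags.contains("-UseParNewGC") ? "Serial" : "ParNew";
            // "Serial Old" is the fallback after a concurrent mode failure.
            return List.of(young, "CMS", "Serial Old");
        }
        if (flags.contains("+UseParallelOldGC")) return List.of("Parallel Scavenge", "Parallel Old");
        if (flags.contains("+UseParallelGC")) return List.of("Parallel Scavenge", "Serial Old");
        if (flags.contains("+UseParNewGC")) return List.of("ParNew", "Serial Old");
        if (flags.contains("+UseSerialGC")) return List.of("Serial", "Serial Old");
        throw new IllegalArgumentException("no collector flag this model understands");
    }
}
```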

    4) Is the blue box with the "?" a typo?

    That box represents the new garbage collector that we're currently developing, called
    Garbage First, or G1 for short. G1 will provide:


  • More predictable GC pauses

  • Better GC ergonomics

  • Low pauses without fragmentation

  • Parallelism and concurrency in collections

  • Better heap utilization

    G1 straddles the young generation - tenured generation boundary because it is
    a generational collector only in the logical sense. G1 divides the
    heap into regions and during a GC can collect a subset of the regions.
    It is logically generational because it dynamically selects a set of
    regions to act as a young generation which will then be collected at
    the next GC (as the young generation would be).

    The user can specify a goal for the pauses and G1
    will do an estimate (based on past collections) of how many
    regions can be collected in that time (the pause goal).
    That set of regions is called a collection set and G1 will
    collect it during the next GC.

    G1 can choose the regions with the most garbage to collect first (Garbage First, get it?)
    and so gets the biggest bang for the collection buck.
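    A minimal sketch of that selection, under stated assumptions: each region's live and garbage sizes are known from the last marking, the cost of collecting a region is predicted with an assumed linear model (time per MB of live data copied plus a fixed per-region overhead, both rates taken from past collections), and regions are taken in order of most garbage first until the next one would push the predicted pause past the goal. The names and the cost model are illustrative, not G1's actual policy code.

```java
import java.util.ArrayList;
import java.util.List;

class CollectionSetChooser {
    // A heap region and what the last marking found in it (hypothetical model).
    record Region(int id, double liveMB, double garbageMB) {}

    // Assumed linear cost model: copying the live data dominates, plus a
    // fixed per-region overhead. Both rates would come from past collections.
    static double predictedMs(Region r, double msPerLiveMB, double msPerRegion) {
        return r.liveMB() * msPerLiveMB + msPerRegion;
    }

    // Greedily takes the regions with the most garbage first, stopping before
    // the predicted pause would exceed the pause goal.
    static List<Region> choose(List<Region> regions, double pauseGoalMs,
                               double msPerLiveMB, double msPerRegion) {
        List<Region> sorted = new ArrayList<>(regions);
        sorted.sort((a, b) -> Double.compare(b.garbageMB(), a.garbageMB()));
        List<Region> collectionSet = new ArrayList<>();
        double predictedTotal = 0;
        for (Region r : sorted) {
            double cost = predictedMs(r, msPerLiveMB, msPerRegion);
            if (predictedTotal + cost > pauseGoalMs) break;
            collectionSet.add(r);
            predictedTotal += cost;
        }
        return collectionSet;
    }
}
```

    Because a copying collector pays for the live data it moves, the regions with the most garbage also tend to be the cheapest to collect, which is why taking them first tends to reclaim the most space per unit of pause time.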

    G1 compacts, so fragmentation is much less of a problem. Why is it a problem at all?
    There can be internal fragmentation due to partially filled regions.

    The heap is not statically divided into
    a young generation and a tenured generation so the problem of
    an imbalance in their sizes is not there.

    Along with a pause time goal the user can specify a goal on the fraction of
    time that can be spent on GC during some period (e.g., during the next 100 seconds
    don't spend more than 10 seconds collecting). For such goals (10 seconds of
    GC in a 100-second period) G1 can choose a collection set that it expects to collect in 10 seconds and schedule the collection 90 seconds (or more) after the previous collection. You can see how an evil user could specify 0 collection
    time in the next century so again, this is just a goal,
    not a promise.
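    The arithmetic of that example can be checked with a small helper. It assumes pauses are given as [start, end] pairs in seconds and reads the goal as "at most M seconds of GC in any window of T seconds"; it is an illustration, not G1's scheduling code. If the 90-second gap is measured from the end of the previous pause, 10-second pauses meet the goal; if it is measured from the start, two pauses can land in the same 100-second window.

```java
class PauseGoalCheck {
    // GC time (seconds) that falls inside the window [from, to).
    static double gcTimeIn(double[][] pauses, double from, double to) {
        double total = 0;
        for (double[] p : pauses) {
            total += Math.max(0, Math.min(p[1], to) - Math.max(p[0], from));
        }
        return total;
    }

    // Worst-case GC time over every window of the given length. GC time in a
    // sliding window is piecewise linear in the window start, so it suffices
    // to try starts where a window edge lines up with a pause start or end.
    static double maxGcTimeInAnyWindow(double[][] pauses, double window) {
        double worst = 0;
        for (double[] p : pauses) {
            for (double start : new double[] {p[0], p[1], p[0] - window, p[1] - window}) {
                worst = Math.max(worst, gcTimeIn(pauses, start, start + window));
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        double[][] gapAfterEnd = {{0, 10}, {100, 110}};  // next GC 90 s after the previous one ended
        double[][] gapAfterStart = {{0, 10}, {90, 100}}; // next GC 90 s after the previous one started
        System.out.println(maxGcTimeInAnyWindow(gapAfterEnd, 100));   // 10.0
        System.out.println(maxGcTimeInAnyWindow(gapAfterStart, 100)); // 20.0
    }
}
```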

    If G1 works out as we expect, it will become our low-pause collector in place of
    "ParNew" + "CMS". And if you're about to ask when it will be ready, please don't
    be offended by my dead silence. It's the highest priority project for our team,
    but it is software development so there are the usual unknowns. It will be out
    by JDK7. The sooner the better as far as we're concerned.

    Updated February 4. Yes, I can edit an already posted blog. Here's
    a reference to the G1 paper if you have ACM portal access.

    http://portal.acm.org/citation.cfm?id=1029879


    Comments (34)
    • Trevor Strohman Friday, February 1, 2008

      G1 sounds neat. Can you comment on the similarities and differences between G1 and a Beltway collector?


    • Michael Bien Saturday, February 2, 2008

      Interesting.

      You mentioned "Parallelism and concurrency in collections" in the feature list of G1. Is it already clear what kind of collections could be run concurrently, and when a full stop would occur?

      I just recently thought about stack allocation for special kinds of objects. Couldn't the HotSpot compiler provide enough information to determine points in code when it's safe to delete certain objects?

      For example, many methods use temporary objects. Is it really worth putting them into the young generation?

      (just a thought)

      thank you very much for the great overview of GCs


    • Kirk Saturday, February 2, 2008

      Hi Jon,

      Nice summary. I've always used a matrix to explain which collectors can be used together.

      I like the idea of G1. It seems to play on cards. I was also wondering if a continuous collector had been considered: something that would work very conservatively to reclaim memory while mutators were allowed to run. I suspect that eventually the space would have to be properly collected. I would also suspect that it would leave memory highly fragmented. But if it can delay a collection (partial or full), it might offer some advantages.

      Kirk


    • HashiDiKo Monday, February 4, 2008

      Where is the image?


    • Nico Monday, February 4, 2008

      Where do you find that kind of information about the garbage collectors?


    • Tony Printezis Monday, February 4, 2008

      Hi all,

      This is Tony from the HotSpot GC Group (and I've been working on G1 for far too long...).

      Trevor,

      The Beltway GC (you're very well informed!) is a stop-the-world GC and a generalization over copying GCs (basically, you can express a variety of different copying schemes in terms of the Beltway abstractions). G1 is a particular (copying-based, I suppose) GC with a lot of its functionality (e.g., marking) done concurrently. I'm not quite sure whether you can actually express G1 in terms of the Beltway (the latter "churns" through regions when collecting, whereas in G1 we target the regions with the most garbage in them). Maybe the Beltway folks can enlighten us on this.

      Michael,

      Initially, G1 will behave similarly to CMS, i.e., stop-the-world "young GCs" (with, every now and then, some old regions also being reclaimed during such GCs) and concurrent marking (but no sweeping, as it's not needed), but with several advantages (compaction, better predictability, faster remarks, etc.). We have many ideas on how to proceed in the future to do even more work concurrently, but nothing is certain yet, so we will not say much else on this at this time.

      Regarding stack allocation, I believe (and I've seen data in papers that support this) that stack allocation can pay off for GCs that (a) do not compact or (b) are not generational (or both, of course).

      In the case of (a), a non-compacting GC has an inherently slower allocation mechanism (e.g., free-list look-ups) than a compacting GC (e.g., "bump-the-pointer"). So, stack allocation can allow some objects to be allocated and reclaimed more cheaply (and, maybe, reduce fragmentation, given that you cut down on the number of objects allocated / de-allocated from the free lists).
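      To illustrate the allocation-cost difference described above, here is a toy Java model (not HotSpot code; addresses and sizes are abstract units): bump-the-pointer allocation in a compacted space is a bounds check plus an addition, while a first-fit free list has to search for a chunk that is large enough and then split it.

```java
import java.util.ArrayList;
import java.util.List;

class AllocationStyles {
    // Compacted space: all free memory is one contiguous block [top, end).
    static final class BumpSpace {
        private long top;
        private final long end;

        BumpSpace(long start, long end) { this.top = start; this.end = end; }

        // Returns the address of the new object, or -1 if the space is full.
        long allocate(long size) {
            if (end - top < size) return -1;
            long obj = top;
            top += size;
            return obj;
        }
    }

    // Non-compacted space: free memory is a list of {address, size} chunks.
    static final class FreeListSpace {
        private final List<long[]> chunks = new ArrayList<>();

        void free(long address, long size) { chunks.add(new long[] {address, size}); }

        // First fit: walk the list until a large enough chunk is found, then split it.
        long allocate(long size) {
            for (int i = 0; i < chunks.size(); i++) {
                long[] c = chunks.get(i);
                if (c[1] >= size) {
                    long obj = c[0];
                    c[0] += size;
                    c[1] -= size;
                    if (c[1] == 0) chunks.remove(i);
                    return obj;
                }
            }
            return -1;
        }
    }
}
```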

      In the case of (b), typically objects that are stack allocated would also be short-lived (not always, but I'd guess this holds for the majority). So, effectively, you add the equivalent of a young generation to a non-generational GC.

      For generational GCs, results show that stack allocation might not pay off that much, given that compaction (I assume that most generational GCs would compact the young generation through copying) allows generational GCs to allocate and reclaim short-lived objects very cheaply. In addition, escape analysis (the mechanism that statically discovers which objects do not "escape" a thread and hence can be safely stack allocated, as no other thread will access them) might only prove that a small proportion of the objects allocated by the application can be safely stack allocated (so the benefit would be quite small overall).

      (BTW, your 3D engine in Java shots on your blog look really cool!)

      Kirk,

      There are GCs that are pauseless. For example, this is how the real-time GC in our Java RTS (Real-Time System) product works. However, they cost a lot in terms of overhead, as well as complexity. I'd guess that, for the SE product, we might stick to having GCs that do pauses, but try to decrease the pause times as much as possible.

      Tony

      PS There's currently an outage on the blog server which is preventing images from being rendered. This is being rectified "as we speak".


    • Arman Monday, February 4, 2008

      Image is broken :(


    • Michael Bien Monday, February 4, 2008

      Thank you very much for the detailed explanation Tony!

      >(BTW, your 3D engine in Java shots on your blog look really cool!)

      Thank you, the engine is actually pretty old. I tweaked it a lot to run without full stops (that's why I asked ;))


    • Jon Ustmiasam Monday, February 4, 2008

      HashiDiKo,

      If you are asking about the HotSpot garbage collectors and have not already seen the GC whitepaper, it is a good place to start.

      http://java.sun.com/j2se/reference/whitepapers/memorymanagement_whitepaper.pdf


    • Andrew Monday, February 4, 2008

      A basic GC question: Is it worthwhile to try and collect 'cheap' garbage? Or does the cost of tracking it outweigh the benefits?

      One thought was having a bit on each object that indicated whether it had ever been assigned to a non-stack location, coupled with a list on each stack frame for objects that had been allocated.

      Then, when unwinding that stack frame you could immediately GC any object that wasn't potentially referenced from elsewhere.

      (Note: the flag would just be turned on, no attempt would be made to reference count, etc)

      Or am I just being naive?


    • Tony Printezis Tuesday, February 5, 2008

      Andrew,

      In practice, doing what you're proposing is not really straightforward (even though it sounds good "on paper"!).

      The main issue is that for GCs that rely on compaction (or that at least have a copying young generation, which is basically all the GCs in HotSpot), GCing specific objects is just not possible (or at least, it's not very efficient). Compacting GCs assume that, when a GC happens, all live objects will move somewhere (to another space in copying GCs, or to the bottom of the compacting space in sliding compacting GCs) and all available free space will be in one place. This means that such GCs do not keep track of individual free chunks, which makes it impossible to just reclaim specific objects. And there are several good reasons why we like such collectors, one of the most important ones being that they allow for very fast, very scalable bump-the-pointer allocation.

      Even if we could GC specific objects, how are we going to find all the objects allocated by a particular stack frame? Are we going to link them at allocation? That's extra overhead.

      Performance-wise, copying young generations (like the ones we have in HotSpot) are super efficient in reclaiming young, short-lived objects (they just evacuate the few survivors they come across and never even touch the dead objects; this is why they are so fast). So, in most cases, they should be able to reclaim space at least as efficiently as what you propose. In fact, they might be even more efficient, given that they don't have to iterate over the dead objects: they copy the survivors, the rest are reclaimed, done. Whereas, according to what you propose, we would have to iterate over the dead objects and de-allocate them one-by-one.
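      A toy model of why a copying collection never touches dead objects (hypothetical classes, not HotSpot's): evacuation starts from the roots and copies only what it can reach, Cheney-style, using to-space itself as the queue of objects still to be scanned. Unreachable objects are never visited, and afterwards the whole of eden can be reused at once. For brevity the sketch doesn't update the roots to point at the new copies, which a real collector must do.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

class YoungGenEvacuation {
    // A heap object with outgoing references (toy model).
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    // Copies everything reachable from the roots into to-space and returns it.
    // The work is proportional to the survivors; dead objects are never examined.
    static List<Obj> evacuate(List<Obj> roots) {
        Map<Obj, Obj> forwarding = new IdentityHashMap<>(); // original -> copy
        List<Obj> toSpace = new ArrayList<>();
        for (Obj r : roots) copy(r, forwarding, toSpace);
        // Cheney scan: objects in to-space still point at originals until scanned.
        for (int scan = 0; scan < toSpace.size(); scan++) {
            Obj o = toSpace.get(scan);
            for (int i = 0; i < o.refs.size(); i++) {
                o.refs.set(i, copy(o.refs.get(i), forwarding, toSpace));
            }
        }
        return toSpace;
    }

    // Copies o once; later calls return the forwarded copy.
    private static Obj copy(Obj o, Map<Obj, Obj> forwarding, List<Obj> toSpace) {
        Obj existing = forwarding.get(o);
        if (existing != null) return existing;
        Obj c = new Obj(o.name);
        c.refs.addAll(o.refs);
        forwarding.put(o, c);
        toSpace.add(c);
        return c;
    }
}
```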

      To summarize, your scheme might work for a non-generational, non-compacting GC (where you can de-allocate specific objects). But I can't see it working for our GCs.

      I got slightly carried away in my reply here... Hope it helps!

      Tony


    • Andrew Tuesday, February 5, 2008

      Thanks for the response Tony.

      I hadn't considered the compaction aspect of generational garbage collectors, and certainly it's not beneficial under that set of assumptions (i.e., the young generation pool has filled up at exactly the same rate).

      Basically I thought the idea had some interesting characteristics, particularly for SMP environments, and thought I should share it before I forgot it.

      Andrew


    • Adam Hawthorne Tuesday, February 5, 2008

      Hi guys,

      G1 sounds very interesting! A few questions:

      1. How large is a region likely to be? (just a ballpark range)

      2. Will TLABs be (partially) replaced by regions, so threads may be allocating into different regions?

      3. It's not obvious to me how to choose which regions have more garbage, without first having marked them, but then how do you know which region to allocate into?

      4. When doing a young gen collection, will you compact into a different region or the same region? I.e., does it work more like a copying collector than a mark-sweep-compact? (Or am I missing something there :)

      Thanks!

      Adam


    • Aaron Digulla Wednesday, February 6, 2008

      Will the new GC also collect the non-heap (i.e. Code Cache and Perm Gen)? Or will you get rid of those two?

      I opt for the latter since these two spaces cause lots of problems for dynamic software like Eclipse (which loads plug-ins at runtime) or Groovy (which creates classes at runtime).


    • Tony Printezis Wednesday, February 6, 2008

      Adam,

      Hi. Answers to your (very good!) questions:

      1. Right now, regions are 1MB. We allocate a contiguous block of regions for objects that are "humongous", i.e., that are too large to fit in one region.

      2. No, regions will not replace TLABs. There's one allocating region and threads will allocate TLABs from it. When that region is full, it will be "retired" and another one will become the allocating region. So, a single region might hold TLABs from several threads.
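      A toy model of that arrangement: TLABs are bump-allocated from the current allocating region, and when the next TLAB doesn't fit, the region is retired and a fresh one becomes the allocating region, so one region ends up holding TLABs handed to several threads. The 1MB region size comes from the answer above; the TLAB size and all names are assumptions for the example.

```java
import java.util.ArrayList;
import java.util.List;

class AllocatingRegion {
    static final long REGION_BYTES = 1L << 20; // 1MB, as in the answer above
    static final long TLAB_BYTES = 256L << 10; // assumed fixed TLAB size

    private int regionIndex = 0; // which region is currently the allocating region
    private long top = 0;        // bump pointer inside that region
    final List<Integer> retired = new ArrayList<>();

    // Hands out a TLAB as {regionIndex, offsetInRegion}. Synchronized here for
    // simplicity; a real VM would more likely use an atomic compare-and-swap.
    synchronized long[] newTlab() {
        if (top + TLAB_BYTES > REGION_BYTES) {
            retired.add(regionIndex); // region is full: retire it
            regionIndex++;
            top = 0;
        }
        long[] tlab = {regionIndex, top};
        top += TLAB_BYTES;
        return tlab;
    }
}
```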

      3. As I mentioned in an earlier post, we perform a marking phase every now and then to get up-to-date liveness information.

      4. (You're not missing anything! Good question.) Collections are done by copying. Basically, we pick the regions we want to GC (we refer to that set of regions as the "collection set") and we evacuate the surviving objects from those regions to another set (the "to-space"). The assumption is that to-space will have fewer regions than the collection set, and this is how we reclaim space. Given that we assume that the survival rate in the collection set will be quite low (we choose which regions to GC, remember?), copying is the most efficient way to perform such collections.
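      The space accounting behind "to-space will have fewer regions than the collection set" is simple arithmetic. A sketch, assuming survivors can be packed densely into to-space regions (ignoring any space wasted at region ends):

```java
class EvacuationAccounting {
    // Net number of regions freed by evacuating a collection set, assuming the
    // surviving bytes are packed densely into to-space regions.
    static long regionsReclaimed(long csetRegions, long survivingBytes, long regionBytes) {
        long toSpaceRegions = (survivingBytes + regionBytes - 1) / regionBytes; // ceiling division
        return csetRegions - toSpaceRegions;
    }

    public static void main(String[] args) {
        long mb = 1L << 20;
        // 10 regions of 1MB, each 10% live: 1MB of survivors fits in one
        // to-space region, so 9 regions come back.
        System.out.println(regionsReclaimed(10, mb, mb)); // 9
    }
}
```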

      Aaron,

      Right now, the G1 heap replaces the young / old generations, i.e., we still have a permanent space + code cache. In the future, we might be able to incorporate the permanent space into the G1 heap (there are many tricky issues that we need to resolve first to do that...). However, I don't think we'll incorporate the code cache too.

      Point taken though.

      Tony


    • Michael Bien Wednesday, February 6, 2008

      pretty interesting discussion!

      I have written a blog entry and aggregated the discussion in a question-answer format.

      http://www.michael-bien.com/roller/mbien/entry/garbage_first_the_new_concurrent

      Maybe the first step to a FAQ ;)


    • Bob Hansen Wednesday, February 6, 2008

      For those without ACM access, it's published on Sun's website at http://research.sun.com/jtech/pubs/


    • Aaron Digulla Thursday, February 7, 2008

      Tony,

      More and more dynamic languages try to use the JVM as their core engine. Some of them generate bytecode on the fly, and with the current implementation of the Sun VM, we invariably run out of memory in the Code Cache or PermGen. Do you know if this issue is being worked on, or will Java 7 again not be recommended for languages that rely on bytecode generation at runtime?

      Best regards,

      Aaron


    • Tony Printezis Friday, February 8, 2008

      Aaron,

      Hi. The code cache works quite differently from the rest of the heap. In particular, code is non-relocatable, whereas objects in the heap typically move (we could relocate code, but it's tricky and error-prone).

      I think the issue with the code cache is not that we don't grow it enough, but that we do not have a good policy to work out which methods have not been used for some time and evict their compiled code. Right now, we just grow it and, when it's full, we stop compiling new methods.

      Notice, however, that code is evicted when classes are unloaded (as well as when methods are "invalidated", i.e., some assumptions were made during their compilation that do not hold any more). So, if you fill up the heap with lots of little classes that are used only for a short period of time and which are then never reclaimed, then, yes, you'll blow up the code cache and/or the permanent generation. But I would claim that this is due to a memory leak, not bad behavior of the code cache.

      Also notice that app servers heavily load/unload classes (and stress the code cache quite a lot because of that), and with a bit of tuning they work quite well.

      Tony


    • /dev/null Wednesday, February 13, 2008

      In FAQ#2: "This makes the mix-and-match of collectors work but adds some burden to the maintenance of the collectors and to the addition of new collectors. And the burden seems to be quadratic in the number of collectors."

      Could you elaborate on that statement? Maybe a specific example to help illustrate the "burden".


    • ALI Gulshan Wednesday, February 13, 2008

      hi All,

      I'm Ali from Hyderabad, Pakistan, and I'm a beginner in Java, so I have no idea how to install it or how to run it. A humble request to all: please help me via e-mail. My e-mail address is sms_boy2001@yahoo.com. I will be awaiting your e-mails, thanks.


    • Neale Thursday, February 14, 2008

      Tony,

      On the stack allocation front, isn't there a particular case that could be supported:

      An object could be stack allocated if it is only ever referenced within the function.

      This optimization (I'm guessing that there are going to be cache line benefits) could also be done if the object is used to call another function, but that function gets inlined and it turns out that no additional references to the object are made.

      My gut feel is that stack allocation could have some notable benefits in certain cases.


    • Jon Ustimasam Friday, February 15, 2008

      Regarding the question of what I mean by burden, below are descriptions of two methods that are implemented for each generation.

      // This function gives a generation a chance to note a point between
      // collections. For example, a contiguous generation might note the
      // beginning allocation point post-collection, which might allow some later
      // operations to be optimized.
      virtual void save_marks();

      // This function is "true" iff no allocations have occurred in the
      // generation since the last call to "save_marks".
      virtual bool no_allocs_since_save_marks();

      These methods were created for use with generations that keep their free space in a single contiguous region. As the comment for save_marks() says, noting the high water mark for allocations in the generation is an example of what save_marks() can do. Originally the description was more specific to generations with a contiguous region for their free space. But then we implemented CMS, which maintains its free space in free lists of chunks of space that are available for allocation. save_marks() needs to be implemented for the CMS generation, but what does it mean for a generation with free lists of free space? In order to figure that out, you need to know how the information from save_marks() is used. no_allocs_since_save_marks() is a method that depends on the effects of save_marks().

      For a generation with a contiguous free space, no_allocs_since_save_marks() checks whether the high water mark has moved since the last save_marks(). For CMS, it would be sufficient for no_allocs_since_save_marks() to just note whether any allocations have occurred in the CMS generation. It turns out, however, that other methods need to look at each object that has been allocated since the save_marks(). So for CMS, save_marks() starts a list of allocations out of the CMS generation, and no_allocs_since_save_marks() checks to see if that list is empty. The young generation has a slightly specialized version of these methods because the young generation has three spaces (eden and two survivor spaces). G1 has its own version of save_marks() but does not have its own no_allocs_since_save_marks().
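      To show why the same pair of methods means different things for different generations, here is a hypothetical Java rendering (HotSpot's real implementations are C++ and do more): a contiguous generation records its allocation high water mark, while a free-list generation, in the style described for CMS, keeps a list of the objects allocated since save_marks().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Java rendering of the two per-generation methods.
interface Generation {
    void saveMarks();
    boolean noAllocsSinceSaveMarks();
}

// Free space is one contiguous block, so a single pointer says it all.
class ContiguousGeneration implements Generation {
    private long top;      // current allocation pointer (no bounds check in this toy)
    private long savedTop; // high water mark at the last saveMarks()

    long allocate(long size) { long obj = top; top += size; return obj; }
    public void saveMarks() { savedTop = top; }
    public boolean noAllocsSinceSaveMarks() { return top == savedTop; }
}

// Free space lives in free lists, so there is no single high water mark.
// Other code must visit each object allocated since saveMarks(), so the
// generation keeps a list of them.
class FreeListGeneration implements Generation {
    private final List<Long> allocatedSinceSave = new ArrayList<>();
    private long next = 0; // stand-in for a real free-list allocation

    long allocate(long size) {
        long obj = next;
        next += size;
        allocatedSinceSave.add(obj);
        return obj;
    }
    public void saveMarks() { allocatedSinceSave.clear(); }
    public boolean noAllocsSinceSaveMarks() { return allocatedSinceSave.isEmpty(); }
    List<Long> allocatedSinceSaveMarks() { return allocatedSinceSave; }
}
```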

      So just to get off the ground with a new collector, you need to think about a bunch of methods which may or may not make sense for the new collector. And if you need to change a method, you have to look at the implementations for all the collectors to see if your change is going to make a difference. On the other hand, when you want your new collector to work with the others, you have a big head start.


    • Endre Stølsvik Tuesday, February 19, 2008

      Please take into consideration an ability for an application to *really* shed all memory. Basically, a possibility to request a *Super Garbage Collection*, like System.utterShedding(), where everything possible is collected, the memory layout compacted and rearranged, and then every byte reclaimed is given back to the OS (bypassing the -XX:[Min|Max]HeapFreeRatio=xx switches, instead freeing any memory not immediately needed). A rather large GC pause would be expected on invocation of this method.

      -XX:[Min|Max]HeapFreeRatio=xx :

      http://www.gossamer-threads.com/lists/lucene/java-user/44286#44286

      This would make java on the desktop slightly more possible. An application would do such a request on minimize, and in particular on "minimize to tray" - or in other situations where it knew that quite a bit of garbage was generated (some particular state change in the application, unloading of whatever), or when it had been inactive for some time X.

      Also, the GCer should try not to cycle through all memory, touching every byte, when it isn't needed. Instead, if the application seems idle (very little memory churn), it could restrict itself to using only a few such regions. This is, of course, to enable the OS to do swapping properly: when the GC repeatedly sweeps through the entire memory block, all memory looks like it is active and hence cannot be swapped out.

      Btw, what do you have to say to the "Bookmarking Collection"?

      http://www-cs.canisius.edu/~hertzm/bc.html

      Basically, it states that by cooperating with the OS kernel, you'd get way better performance on systems where memory is becoming scarce. On a multitasking desktop system, memory is ALWAYS scarce.


    • Endre Stølsvik Thursday, February 21, 2008

      Also, this paper:

      http://www-cs.canisius.edu/~hertzm/gcmalloc-oopsla-2005.pdf

      ... from Matthew Hertz

      http://www-cs.canisius.edu/~hertzm/

      is interesting in that it tries to compare explicit memory management to GCing.

      Its findings are somewhat disturbing: with 5 times as much memory as needed, GC is as good as, or even somewhat better than, explicit memory management. But it degrades fast: with three times as much memory, it runs 17% slower, and with twice, it runs 70% slower than explicit. Also, GCing performs VERY badly on a swapping system, with orders-of-magnitude worse behaviour than explicit memory management.

      Any comments?


    • Tony Printezis Thursday, February 21, 2008

      Neale,

      Hi. Yes, what you're describing is referred to as "escape analysis" in the literature (i.e., detecting whether an object "escapes" a thread, usually by doing static analysis over the code).

      If an object is proven not to escape a thread, then there are a few interesting optimizations that can be done.

      One, as you say, is to stack allocate it (even though, as I said earlier on this thread, a generational GC can have similar performance benefits by reclaiming short-lived objects very cheaply). In fact, you can do better, as in some cases you can just "allocate" the object directly in registers (i.e., the object might never actually see memory).
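      An example of the kind of code escape analysis targets: the Point below never leaves distanceSquared(), so a JIT compiler that proves this could replace it with two local values (often called scalar replacement) instead of allocating it. Whether a given HotSpot release actually does so depends on the version and on flags such as -XX:+DoEscapeAnalysis, so treat this as an illustration rather than a guarantee.

```java
class NonEscaping {
    // Small value holder used only inside one method.
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // p is never stored in a field, returned, or passed to code that could
    // keep it, so it does not escape this method (or this thread).
    static int distanceSquared(int x, int y) {
        Point p = new Point(x, y);
        return p.x * p.x + p.y * p.y;
    }

    public static void main(String[] args) {
        // A hot loop gives the JIT a reason to compile and optimize the method.
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) sum += distanceSquared(i % 100, 3);
        System.out.println(sum);
    }
}
```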

      Another optimization that escape analysis allows is actually lock elision (if no other thread is looking at this object, why lock it?). But, again, our biased locking optimization has similar performance benefits in a lot of cases.
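      Lock elision applies to code like the following: StringBuffer's methods are synchronized, but this buffer is created inside the method and never escapes it, so no other thread can ever lock it and the locking could be removed. Whether a particular JVM elides these locks is an implementation detail.

```java
class LockElision {
    // StringBuffer.append() and toString() are synchronized, but sb is created
    // here and never escapes, so no other thread can contend for its lock.
    static String greet(String name) {
        StringBuffer sb = new StringBuffer();
        sb.append("Hello, ").append(name).append('!');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(greet("HotSpot")); // Hello, HotSpot!
    }
}
```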

      The HotSpot compiler folks have been working on escape analysis for a bit. Expect it in a 6 update release, as well as 7.

      Tony


    • Jon Ustimasma Friday, February 22, 2008

      Endre,

      We have heard requests from both inside and outside of Sun for an API which would allow an application to indicate that it is not going to be using memory for a while, and so to garbage collect and shrink the heap. We plan on implementing something but don't have anyone to work on it at the moment.

      Regarding cycling through memory, a minor collection (just collecting the young generation) will only touch a subset of the heap and so will keep fewer pages in memory. But for a full collection (with all our current collectors), we really don't have any choice but to look at the entire heap, so everything gets touched. G1 can be more flexible about this, and we plan on exploring different policies.

      Regarding the "Bookmarking" collector, swapping is definitely a killer in terms of performance for garbage collection. In tight memory situations, if you can avoid swapping, performance will definitely be better. If I recall correctly, there was some performance cost for the OS support, and that would have to be traded off against the benefits. When we reviewed the paper here, there was not a sense that we should implement such a collector in the near term. We actually have other issues that we would like to address before we tackle something like that. For one thing, we would like to be smarter about choosing the heap size when the JVM starts up on a system. Currently, if the JVM starts up on a large system (generally speaking, more than two processors and more than 2 GB of memory), we look at the physical memory on the system and choose some fraction of that for the maximum heap size. We really should be looking at the memory pressure on the system and base the maximum heap size on that. Actually, depending on the situation, we should not have a maximum heap size but should monitor the system and use what we need, "what we need" being something less than an amount that stresses the system. At start up we also choose how many GC threads we're going to use in parallel collections. Again, we need to look at the activity on the system and make smarter choices, and we should vary that dynamically. We refer to these smarter choices as "external ergonomics": "external" because it's looking outside the JVM.
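      Two of the startup choices mentioned above can be observed from inside an application through the standard Runtime API: maxMemory() reports the maximum heap the JVM will attempt to use, and availableProcessors() reports the processor count the JVM sees, which HotSpot uses when choosing its default number of parallel GC threads. This sketch only reports the values; it doesn't show how they are derived.

```java
class StartupChoices {
    // Summarizes the JVM's view of its heap limit and processor count.
    // maxMemory() may be Long.MAX_VALUE if the JVM has no inherent limit.
    static String describe(Runtime rt) {
        return "max heap: " + rt.maxMemory() / (1024 * 1024) + " MB, processors: "
                + rt.availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println(describe(Runtime.getRuntime()));
    }
}
```

      Running it without -Xmx on machines with different amounts of physical memory is an easy way to see the default maximum heap change.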

      Regarding the cost of explicit memory management vs. GC, as you note, our collectors (GC ergonomics collectors excluded) by default try to keep about 40% of the heap free after a collection. Users seem to be satisfied with the GC cost with that size heap (less than twice the size of the live data).
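      The "less than twice the size of the live data" figure follows from the 40% target: if a fraction f of the heap is free right after a collection, the heap is live / (1 - f), so f = 0.40 gives about 1.67 times the live data. A quick check (the 40% corresponds to HotSpot's -XX:MinHeapFreeRatio default, though defaults can vary by version):

```java
class HeapSizing {
    // Heap size needed so that freeFraction of it is free right after a GC
    // that leaves liveMB of live data.
    static double heapSizeMB(double liveMB, double freeFraction) {
        return liveMB / (1 - freeFraction);
    }

    public static void main(String[] args) {
        double heap = heapSizeMB(600, 0.40);
        System.out.printf("heap %.0f MB = %.2fx live%n", heap, heap / 600);
    }
}
```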


    • Endre Stølsvik Monday, February 25, 2008

      @Jon: Thanks for the answer!

      In regard to your last comment, I'd just like to point out that "Java on the Desktop" isn't really big. At all. In any way. Whatsoever. The only application I know of that has any circulation at all is Azureus! (And of course IDEs, but they don't count: I, at least, am talking about the mainstream population, not professional coders. We're not into this to make stash for ourselves, now are we? :-) )

      I postulate that a fairly big reason for this is exactly the memory issue: it is not an option to have three or more "javas" going at the same time, not in the way it is possible to have dozens of native applications running. It is *fully* out of the question to leave a Java application "running in the systray", not to mention ten systray applications.

      So, what I believe is that to actually get Java outside of servers, the resource consumption side, with memory at the forefront, has to get much more focus.

      MS's .NET seems to be handling this rather better, but this is admittedly just a hunch.

      Btw: regarding the OS cost side of the bookmarking collector, how big would that cost be for processes that didn't want it? I seem to remember that the cost was actually very close to zero, much like Sun states the DTrace stuff is when not in use.


    • Jon Ustimamsa Monday, March 3, 2008

      Endre,

      You're very right that making our JVM work better for client apps is a matter of focus. We have a very vocal client platform group within Sun that reminds us of that, but when the rubber hits the road, we're doing G1 instead of something like a bookmarking collector.

      Regarding my comments on the bookmarking collector, I went back and looked at the paper: a process registers if it wants notification about memory being paged out, so I was wrong about the cost being borne by all processes.


    • Sebastian Saturday, March 8, 2008

      Hi, let me suggest another way of specifying goals. Have the user specify a "cycle period", and then strive to keep the collection time in each cycle as low as possible, but also as consistent as possible.

      Imagine a game, for example: you would specify a cycle time of 1/60 second (for a 60 Hz frame rate), and you want the GC to try to spread out its collections so that no single frame is disproportionately affected. This would be very useful, I think.


    • Gert Monday, March 10, 2008

      You mention that G1 will initially behave like CMS does now, with some extra tweaks enabled by default, like compaction...

      Anyway, I've just discovered that at the OS level (Linux 2.6.5-7, SLES 9) the number of file descriptors is not cleared by ParNew, and CMS kicks in too late to free up the excess "CLOSE_WAIT" and "pipe" fds associated with the uid the JVM runs under. This results in "Too many open files ..." for the JVM and the need to restart it (or call System.gc(), but we have -XX:+DisableExplicitGC on all our JVMs).

      I know I can use "-XX:CMSInitiatingOccupancyFraction=xx" (xx being a percentage) to have CMS kick in earlier and thus free up the old generation sooner, but I'd rather not, because when the application is at peak load, CMS cycles get triggered continuously (CPU load). And as for the Xmx setting, I might as well put in a lower Xmx value and keep an eye on CMS activity vs. heap usage.

      Will G1 take this into account? That is, can it detect at the OS level the need for a CMS-style collection of the old generation, thus freeing up file descriptors?

      Oh yes, and the compaction in G1: is this basically "-XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=0 -XX:+CMSClassUnloadingEnabled"? I don't have access to the G1 paper, you see...

      grtz,

      gert


    • Jon Ustimmasa Monday, March 17, 2008

      Sebastian,

      With G1 you can specify a time slice NNN and the maximum amount of time to be spent on GC in that time slice, MMM. G1 will then schedule its work such that for any NNN time slice no more than MMM will be spent doing GC. You can clearly over-specify this goal (MMM = 0 will never work), but G1 tries to provide the application with at least NNN - MMM per time slice.
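      The time-slice goal (no more than MMM of GC in any NNN window) can be written as a check over a history of GC pauses. A minimal sketch, assuming pauses are known as (start, duration) pairs in milliseconds; the class and method names are hypothetical, and this only audits a pause record after the fact, it does not show how G1 schedules its work. In later HotSpot releases we believe NNN and MMM correspond to `-XX:GCPauseIntervalMillis` and `-XX:MaxGCPauseMillis`. Requires Java 16+ for the record.

```java
import java.util.List;

public class MmuCheck {
    // One GC pause occupying [start, start + duration), in milliseconds.
    record Pause(long start, long duration) {
        long end() { return start + duration; }
    }

    // Total GC time overlapping the window [from, from + slice).
    static long gcTimeIn(List<Pause> pauses, long from, long slice) {
        long to = from + slice;
        long total = 0;
        for (Pause p : pauses) {
            long lo = Math.max(p.start(), from);
            long hi = Math.min(p.end(), to);
            if (hi > lo) total += hi - lo;
        }
        return total;
    }

    // Worst-case GC time in any window of length slice. The overlap is
    // piecewise linear in the window position, so its maximum occurs with
    // the window starting at a pause start or ending at a pause end.
    static long worstGcTime(List<Pause> pauses, long slice) {
        long worst = 0;
        for (Pause p : pauses) {
            worst = Math.max(worst, gcTimeIn(pauses, p.start(), slice));
            worst = Math.max(worst, gcTimeIn(pauses, p.end() - slice, slice));
        }
        return worst;
    }

    // True if no window of length slice contains more than maxGc of GC.
    static boolean meetsGoal(List<Pause> pauses, long slice, long maxGc) {
        return worstGcTime(pauses, slice) <= maxGc;
    }

    public static void main(String[] args) {
        List<Pause> pauses = List.of(new Pause(0, 10), new Pause(50, 10), new Pause(200, 10));
        System.out.println(worstGcTime(pauses, 100));    // 20
        System.out.println(meetsGoal(pauses, 100, 20));  // true
    }
}
```

      With maxGc = 0, any pause at all fails the check, which is one way to see why MMM = 0 can never be met.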

      Gert,

      The garbage collector never directly releases OS resources such as file descriptors. Some applications have Java objects (TTT) that hold a resource such as a file descriptor, and when TTT becomes unreachable, the application executes code to release that resource. Weak references are often used to recognize when TTT becomes unreachable. Some applications use finalizers and take their chances with the unpredictability of when a finalizer will run. With regard to the release of OS resources, G1 and CMS will behave similarly.
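      The pattern described here, where application code rather than the collector releases the descriptor once the holding object becomes unreachable, has a standard form in later JDKs: `java.lang.ref.Cleaner` (Java 9+), built on phantom references instead of the weak references or finalizers mentioned above. A minimal sketch; `FdHolder` and its `OPEN` counter are hypothetical stand-ins for a real descriptor. The explicit `close()` path is what avoids the "Too many open files" problem, because the Cleaner fallback still runs only after a GC has noticed the object is unreachable.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicInteger;

public class FdHolder implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    // Hypothetical count of "descriptors" currently held open.
    public static final AtomicInteger OPEN = new AtomicInteger();

    // Cleanup action; it must not reference the FdHolder itself,
    // or the holder could never become unreachable.
    private static final class Release implements Runnable {
        @Override
        public void run() { OPEN.decrementAndGet(); }
    }

    private final Cleaner.Cleanable cleanable;

    public FdHolder() {
        OPEN.incrementAndGet();                            // "acquire" the resource
        cleanable = CLEANER.register(this, new Release()); // fallback if never closed
    }

    // Deterministic release; Cleanable.clean() runs the action at most once.
    @Override
    public void close() { cleanable.clean(); }

    public static void main(String[] args) {
        try (FdHolder h = new FdHolder()) {
            System.out.println("open: " + OPEN.get()); // open: 1
        }
        System.out.println("open: " + OPEN.get());     // open: 0
    }
}
```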

      CMS's compaction of the tenured generation should be an exceptional event, which occurs when the tenured generation becomes full before CMS's concurrent collection can free up space. That's when a stop-the-world full collection is done and a compaction occurs. With G1, compaction is the usual case: at each collection, part of the heap will be compacted.


    • Sebastian Sunday, March 23, 2008

      So what happens if I do specify MMM=0? Will the amount of time it does occupy be mostly constant, or will I get spikes in some slices?

      My point is that I don't know in advance how much time I want to spend doing GC because the workload varies, so I can't give you an MMM value. So effectively I would like the GC to choose an MMM for me so that it varies slowly (i.e. the amount of time available to the application for each time slice is roughly constant in the short term, but varies in the long term for the duration of the application run).


    • Jon Ustimmasa Wednesday, March 26, 2008

      Sebastian,

      With MMM you select your pause time goal. Beyond that there is the idea of a throughput goal, which might be useful to you. Say your pause time goal has been satisfied (or you don't have one). Then your throughput goal would determine how much time you spend doing GC work. You could specify that you don't want to spend more than 10% of the total time doing GC, and the heap would be sized (changing the heap size is the usual way to affect total GC cost) to make that so. Your application would not necessarily see a single 10 second pause over the next 100 seconds. It might see two 5 second pauses instead.
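      A throughput goal is a budget on total GC time, not on any single pause, which is why one 10 second pause and two 5 second pauses can both satisfy "10% of 100 seconds". A small arithmetic sketch with hypothetical names. HotSpot's `-XX:GCTimeRatio=N` flag expresses such a goal as at most 1/(1+N) of total time in GC, so 10% corresponds to N = 9; whether that flag is the knob for the G1 goal discussed here is an assumption.

```java
public class ThroughputGoal {
    // GC time allowed over a period for a "no more than maxGcPercent" goal.
    static long gcBudgetMillis(long periodMillis, int maxGcPercent) {
        return periodMillis * maxGcPercent / 100;
    }

    // GCTimeRatio=N targets a GC fraction of 1/(1+N), so a percentage p
    // maps to N = 100/p - 1 (exact only when p divides 100).
    static int gcTimeRatioFor(int maxGcPercent) {
        return 100 / maxGcPercent - 1;
    }

    // Only the total matters: one long pause or several short ones both count.
    static boolean withinBudget(long[] pausesMillis, long periodMillis, int maxGcPercent) {
        long total = 0;
        for (long p : pausesMillis) total += p;
        return total <= gcBudgetMillis(periodMillis, maxGcPercent);
    }

    public static void main(String[] args) {
        System.out.println(gcBudgetMillis(100_000, 10));                          // 10000
        System.out.println(gcTimeRatioFor(10));                                   // 9
        System.out.println(withinBudget(new long[] {10_000}, 100_000, 10));       // true
        System.out.println(withinBudget(new long[] {5_000, 5_000}, 100_000, 10)); // true
    }
}
```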

