Object Allocation - Better the devil you know...
By tomas.nilsson on Dec 14, 2009
(The post below is written by Josefin Dahlstedt. Mainly due to x-mas holidays, there will be no accompanying webex to this post.)
Being a server JVM, JRockit constantly tries to make the application be all that it can be. However, the applications running in server environments are complex and difficult to predict, making a one-sided effort difficult. By knowing something about your allocation needs you will be able to make informed tuning choices to optimize the runtime for your application. Also, by knowing something of the internal workings of JRockit, your favorite server-side JVM, you will be able to write your application code to maximize performance and avoid poor design choices when running on JRockit.
Before you read further, know that creating a lot of objects, i.e. doing allocation, is going to cost you: the fewer allocations, the fewer garbage collections and the fewer fragmentation issues. That said, this post will address some opportunities to maximize the advantage of JRockit, when developing a server application in Java, with respect to object allocation and access patterns.
Generally, JRockit's allocation and garbage collection rely on two old assumptions about objects in Java:
1. Objects allocated together closely in time are usually accessed together closely in time
2. Most objects die young (the weak generational hypothesis)
Also, it is assumed that most objects allocated are smaller than a few hundred kB.
To accommodate these presumably prevalent smallish objects and to handle (1) well, JRockit implements the concept of thread local areas, TLAs. These areas co-locate objects allocated closely together in time by one thread. Keeping objects that are accessed closely together in time physically close has a huge impact on cache performance, a quality that grows increasingly important as computer architectures move towards multi-core and NUMA technologies.
The TLA is a small area of consecutive memory that is handed out to a Java thread for allocation. In a naïve implementation, every allocating thread would have to synchronize on the global Java heap structure for each allocation; this would create massive contention on the heap lock, increasing with the number of Java threads. The TLAs avoid this global synchronization for most allocations. The size of the TLAs for smallish object allocations may be configured through the command line option -XXtlaSize:[min= | preferred= | wasteLimit=]:
1. min specifies the minimum size for TLAs. It affects fragmentation, since areas of consecutive memory smaller than min will not be partitioned for allocation and may thus be left unused.
2. preferred specifies the preferred TLA size.
3. wasteLimit sets the limit of how much of a TLA you are willing to waste when trying to get memory for an object. It also affects fragmentation, as it decides whether a partly filled TLA is thrown away in favor of a new one, or whether the object takes the additional cost of being allocated directly on the heap.
The options are inherently related, so that wasteLimit <= min <= preferred.
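As an example, a JRockit launch with explicit TLA sizing might look like the line below; the 2k/16k values and the application name are placeholders chosen for illustration, not recommendations:

```shell
# Illustrative sizes only; wasteLimit <= min <= preferred must hold,
# and MyServerApp stands in for your application's main class.
java -XXtlaSize:min=2k,preferred=16k,wasteLimit=2k MyServerApp
```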
The second assumption (2) is what motivates generational garbage collection, which is usually used to reduce fragmentation of the Java heap. A generational collector attempts to keep newly allocated objects in a special area, in the hope that by the time the area has been used up, most of the objects in it will have become unreachable. The area can then be recycled for new allocations, and the cost of promoting the few survivors stays negligible. In practice this assumption has been shown to not be true enough, so the current solution in JRockit is to leave the part of the nursery where the most recent allocations were made uncollected, i.e. the keep area, to reduce the cost of promoting still-used objects out of the nursery.
If you are running a generational garbage collector, you will have a young space pool set up for allocation of the application's Java objects. This pool has two parts: one part that will be garbage collected at the next young collection, and one part holding the most recently allocated objects, which is kept uncollected on the assumption that those objects are still reachable and would only be promoted to old space.
Previously, when running a generational garbage collector, the entire nursery was split up into TLAs for thread local allocation. Each time an object larger than the TLA wasteLimit was allocated, the allocation had to be done in old space. A new feature in the JRockit memory system for the upcoming release enables allocations in the nursery outside TLAs, which avoids medium-sized objects cluttering old space unnecessarily.
Allocation will still mostly be done in TLAs. But when an object larger than the wasteLimit is allocated, instead of placing it in old space, in the worst case potentially causing garbage collection of the entire heap, JRockit will first attempt to allocate the object in the nursery. This should reduce fragmentation of old space and put off an old space garbage collection and compaction to a later time.
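As a rough illustration, the allocation policy described above can be modeled like this. The sizes, class name, and method names are invented for the sketch; this is a toy model of the policy, not JRockit's actual implementation:

```java
// Toy model of the TLA allocation decision: bump-allocate within the
// thread's TLA; when an object does not fit, either waste the small tail
// and fetch a fresh TLA, or allocate outside the TLA entirely.
final class TlaModel {
    static final int TLA_SIZE = 2048;    // invented "preferred" TLA size
    static final int WASTE_LIMIT = 256;  // invented wasteLimit

    int tlaUsed = 0;           // bump pointer within the current TLA
    int tlasRequested = 1;     // synchronized requests for a fresh TLA
    int outsideTlaAllocs = 0;  // allocations made outside any TLA

    // Returns where the object lands: "tla", or "outside" (the nursery in
    // the upcoming release, old space before that).
    String allocate(int size) {
        if (size > TLA_SIZE) {            // can never fit in a TLA
            outsideTlaAllocs++;
            return "outside";
        }
        if (tlaUsed + size <= TLA_SIZE) {
            tlaUsed += size;              // fast path: no heap lock needed
            return "tla";
        }
        int remaining = TLA_SIZE - tlaUsed;
        if (remaining <= WASTE_LIMIT) {
            // Waste the small tail of this TLA and synchronize once with
            // the heap to fetch a fresh one.
            tlasRequested++;
            tlaUsed = size;
            return "tla";
        }
        // Discarding this TLA would waste too much of it: allocate the
        // object outside the TLA instead.
        outsideTlaAllocs++;
        return "outside";
    }
}
```

Note how wasteLimit steers the trade-off: a larger limit throws TLAs away sooner (more waste, fewer outside-TLA allocations), a smaller one does the opposite.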
What do I do with this information?
These two concepts, TLAs and the ability to accommodate larger objects in the nursery, are useful to understand when designing a Java application for performance. With this information you should be able to see why large objects needed by the application may be good to pool for reuse over time. It may also help you take advantage of the TLAs when designing the contents of small objects, and knowing how allocations are locality-optimized per thread may help you when distributing work across a number of threads.
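For instance, pooling large buffers instead of re-allocating them keeps big objects from repeatedly churning through the heap; a minimal sketch might look as follows, where BufferPool and its methods are names invented for the example:

```java
import java.util.concurrent.ArrayBlockingQueue;

// Hypothetical pool that reuses large byte[] buffers over time, so the
// application allocates each big object once instead of on every request.
final class BufferPool {
    private final ArrayBlockingQueue<byte[]> free;
    private final int bufferSize;

    BufferPool(int poolCapacity, int bufferSize) {
        this.free = new ArrayBlockingQueue<>(poolCapacity);
        this.bufferSize = bufferSize;
    }

    // Reuse a pooled buffer if one is available; allocate only on a miss.
    byte[] acquire() {
        byte[] buf = free.poll();
        return (buf != null) ? buf : new byte[bufferSize];
    }

    // Hand the buffer back for reuse; silently drop it if the pool is full.
    void release(byte[] buf) {
        free.offer(buf);
    }
}
```

Whether pooling pays off depends on the object size and lifetime; for small, short-lived objects the TLA fast path is usually cheaper than the pool's bookkeeping.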
Being mindful of object allocation sizes and patterns in a runtime environment like Java may seem like something the Java developer should mostly be able to ignore. But when writing for performance, trying to balance resource use and maximize predictability, the little details tend to matter very much and make all the difference.