If you're using multiple threads (or one of the Java libraries is using multiple
threads on your behalf) then threads in your application are doing allocations
concurrently. All the threads are allocating from the same heap so some allowances
have to be made for simultaneous allocations by two or more threads. In the simplest
case (other than the case where no allowances are made, which is just plain wrong)
each thread would grab a lock, do the allocation and release the lock. That gives
the right answer but is just too slow. The slightly more complicated means would be
to use some atomic operation (such as compare-and-swap) plus a little logic to safely
do the allocation. Faster but still too slow. What is commonly done is to give
each thread a buffer that is used exclusively by that thread to do allocations.
You have to use some synchronization to allocate the buffer from the heap, but
after that the thread can allocate from the buffer without synchronization. In
the HotSpot JVM we refer to these as thread local allocation buffers (TLAB's).
They work well. But how large should the TLAB's be?
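
To make the progression above concrete, here is a minimal Java sketch of the last two approaches: a shared heap that hands out space with compare-and-swap, and a per-thread buffer that only touches the shared heap when it needs a refill. It is a toy model and not HotSpot's code. The heap is just a range of offsets, and the names TlabSketch, SharedHeap, Tlab, and the sizes used in main are all made up for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

class TlabSketch {
    static final long FAIL = -1;

    /** Toy shared heap: "addresses" are offsets in [0, capacity). */
    static final class SharedHeap {
        private final long capacity;
        private final AtomicLong top = new AtomicLong(0);

        SharedHeap(long capacity) { this.capacity = capacity; }

        /** Bump-pointer allocation using compare-and-swap; returns the start offset or FAIL. */
        long allocate(long size) {
            while (true) {
                long oldTop = top.get();
                long newTop = oldTop + size;
                if (newTop > capacity) return FAIL;             // heap exhausted
                if (top.compareAndSet(oldTop, newTop)) return oldTop;
                // Another thread moved top first; loop and retry.
            }
        }

        long used() { return top.get(); }
    }

    /** Buffer owned by exactly one thread, so its own bump needs no synchronization. */
    static final class Tlab {
        private final SharedHeap heap;
        private final long tlabSize;
        private long cur = 0;
        private long end = 0;      // cur == end means "empty", forcing a refill on first use
        private int refills = 0;

        Tlab(SharedHeap heap, long tlabSize) {
            this.heap = heap;
            this.tlabSize = tlabSize;
        }

        long allocate(long size) {
            if (size > tlabSize) return heap.allocate(size);    // can never fit in a buffer
            if (cur + size > end) {
                long start = heap.allocate(tlabSize);           // the only synchronized step
                if (start == FAIL) return FAIL;
                cur = start;
                end = start + tlabSize;
                refills++;
            }
            long result = cur;                                  // plain, unsynchronized bump
            cur += size;
            return result;
        }

        int refills() { return refills; }
    }

    public static void main(String[] args) throws InterruptedException {
        SharedHeap heap = new SharedHeap(1 << 20);
        Runnable worker = () -> {
            Tlab tlab = new Tlab(heap, 4096);
            for (int i = 0; i < 1000; i++) {
                tlab.allocate(16);
            }
            System.out.println(Thread.currentThread().getName() + " refills: " + tlab.refills());
        };
        Thread a = new Thread(worker, "worker-a");
        Thread b = new Thread(worker, "worker-b");
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println("heap used: " + heap.used());
    }
}
```

Note that this simple refill throws away whatever was left in the old buffer, which is exactly the kind of waste the sizing question is about.
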
Prior to 5.0 that was a difficult question to answer. TLAB's that were too large
wasted space. If you had tons of threads and large TLAB's,
you could conceivably fill up the heap
with buffers that were mostly unused. Creating more threads might
force a collection which would be unfortunate because the heap
was mostly empty.
TLAB's that were too small would fill quickly and would mean having to get
a new TLAB which would require some form of synchronization. There was
not a general recommendation that we could make on how large TLAB's should
be. Yup, we were reduced to trial-and-error.
Starting with 5.0, living large with TLAB's got much simpler - except for the guy
down the hall who did the implementation. Here's what the VM does for you.
Each thread starts with a small TLAB. Between the end of the last
young generation collection and the start of the next (let me call that
period an epoch), we keep track of the number of TLAB's a thread has
used. We also know the size of the TLAB's for each thread.
Averages for each thread are maintained for these two numbers (number and
size of TLAB's). These averages are weighted toward the most recent
epoch. Based on these averages the sizes of the TLAB's are adjusted so that a
thread gets 50 TLAB's during an epoch. Why 50? All things being equal we
figured that, on average, a thread would have used half of its last TLAB by the
end of the epoch. Per thread that left 1/2 a TLAB unused (out of the magic 50),
for a wastage of 1%. That seemed acceptable. Also if the young generation was not large enough
to provide the desired TLAB's, the size of the young generation would be
increased in order to make it so (within the usual heap size constraints, of course).
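
The sizing rule above can be sketched roughly as follows. This is only an illustration of the idea (weighted averages of TLAB count and size, resized to hit 50 TLAB's per epoch), not the VM's implementation. The class name TlabSizer, the weight of 0.35 given to the most recent epoch, and the min/max clamp are assumptions made for the example.

```java
class TlabSizer {
    static final int TARGET_TLABS_PER_EPOCH = 50;   // the "magic 50" from the text
    static final double RECENT_WEIGHT = 0.35;       // assumed weight for the newest epoch

    private double avgTlabCount;
    private double avgTlabSizeBytes;
    private boolean seeded = false;

    /** Record what one thread did between two young generation collections. */
    void endOfEpoch(int tlabsUsed, long tlabSizeBytes) {
        if (!seeded) {
            // First sample: nothing to average against yet.
            avgTlabCount = tlabsUsed;
            avgTlabSizeBytes = tlabSizeBytes;
            seeded = true;
            return;
        }
        // Exponentially weighted averages, tilted toward the most recent epoch.
        avgTlabCount = RECENT_WEIGHT * tlabsUsed + (1 - RECENT_WEIGHT) * avgTlabCount;
        avgTlabSizeBytes = RECENT_WEIGHT * tlabSizeBytes + (1 - RECENT_WEIGHT) * avgTlabSizeBytes;
    }

    /** TLAB size that would give this thread about 50 TLAB's next epoch, clamped to [min, max]. */
    long desiredSizeBytes(long minBytes, long maxBytes) {
        double bytesPerEpoch = avgTlabCount * avgTlabSizeBytes;
        long desired = (long) (bytesPerEpoch / TARGET_TLABS_PER_EPOCH);
        return Math.max(minBytes, Math.min(maxBytes, desired));
    }

    /** Half of the last TLAB unused, out of 50 TLAB's: 0.5 / 50 = 1%. */
    static double expectedWasteFraction() {
        return 0.5 / TARGET_TLABS_PER_EPOCH;
    }

    public static void main(String[] args) {
        TlabSizer sizer = new TlabSizer();
        sizer.endOfEpoch(200, 4096);   // thread burned through 200 TLAB's of 4 KB each
        System.out.println("desired size: " + sizer.desiredSizeBytes(1024, 1 << 20));
        System.out.println("expected waste: " + expectedWasteFraction());
    }
}
```

A thread that went through 200 small TLAB's in an epoch was refilling four times more often than the target, so the sketch asks for a TLAB four times as large next time.
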
The initial TLAB size is calculated from the
number of threads doing allocation and the size of the heap. More threads push
down the initial size of the TLAB's and larger heaps push up the initial size.
An allocation that cannot be made from a TLAB does not always mean that the thread
has to get a new TLAB. Depending on the size of the allocation and the unused
space remaining in the TLAB, the VM could decide to just do the allocation from
the heap. That allocation from the heap would require synchronization but so would
getting a new TLAB. If the allocation was considered large (some significant fraction
of the current TLAB size), the allocation would always be done out of the heap.
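
The decision just described might look something like the sketch below. The structure follows the text, but the two thresholds (what counts as a "large" allocation, and how much leftover space is too much to throw away) are hypothetical values chosen for the example, as are the names TlabSlowPath and Decision. The sketch also assumes that an allocation which fits in the current TLAB is always served from it.

```java
class TlabSlowPath {
    enum Decision { ALLOCATE_IN_TLAB, ALLOCATE_IN_HEAP, REFILL_TLAB }

    // Hypothetical thresholds, expressed as fractions of the current TLAB size.
    static final double LARGE_ALLOCATION_FRACTION = 0.25;
    static final double MAX_DISCARD_FRACTION = 1.0 / 64;

    static Decision decide(long allocationBytes, long tlabSizeBytes, long freeInTlabBytes) {
        if (allocationBytes <= freeInTlabBytes) {
            return Decision.ALLOCATE_IN_TLAB;      // fast path, no synchronization
        }
        if (allocationBytes > tlabSizeBytes * LARGE_ALLOCATION_FRACTION) {
            return Decision.ALLOCATE_IN_HEAP;      // large objects always go to the heap
        }
        if (freeInTlabBytes > tlabSizeBytes * MAX_DISCARD_FRACTION) {
            return Decision.ALLOCATE_IN_HEAP;      // too much unused space to discard
        }
        return Decision.REFILL_TLAB;               // little left over, so get a new TLAB
    }

    public static void main(String[] args) {
        System.out.println(decide(16, 4096, 100));
        System.out.println(decide(2048, 4096, 100));
        System.out.println(decide(200, 4096, 100));
        System.out.println(decide(200, 4096, 32));
    }
}
```
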
This cut down on wastage and gracefully handled the much-larger-than-average