By Poonam-Oracle on Oct 27, 2016
Recently a customer reported performance degradation when using JFR with G1 together with -XX:FlightRecorderOptions=stackdepth=512, but saw no performance impact when using the same stackdepth setting with the Parallel collector. The flight recordings revealed that the poor performance was caused by long pauses introduced by JFRCheckpoint operations. The long JFRCheckpoints occurred only in recordings started with -XX:FlightRecorderOptions=stackdepth=512. When tested with -XX:FlightRecorderOptions=stackdepth=3, there were no long JFRCheckpoint pauses.
Let's look at the possible reasons for that. The time taken by a JFRCheckpoint operation is directly proportional to the amount of data that needs to be written to disk. In G1, the TLAB size is smaller, so it generates more 'Allocation in TLAB' and 'Allocation outside TLAB' events. Each of these events carries a stack trace, so if we increase the stack depth using the 'stackdepth' option, G1 ends up with much more stack trace data to write to disk than the other collectors.
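To make the reasoning above concrete, here is a rough back-of-the-envelope sketch. It assumes that each allocation event records at most 'stackdepth' frames, so the number of frames recorded is bounded by the number of events times min(actual stack depth, stackdepth). The class name, method name, event counts, and actual depth are all hypothetical and chosen only for illustration. The model is an upper bound that ignores any sharing of identical stack traces within a recording.

```java
public class StackTraceVolume {

    // Upper bound on frames recorded: every event contributes a trace that is
    // truncated to stackDepthLimit frames (simplified model, an assumption).
    static long framesRecorded(long events, int actualDepth, int stackDepthLimit) {
        return events * Math.min(actualDepth, stackDepthLimit);
    }

    public static void main(String[] args) {
        int actualDepth = 300;         // hypothetical depth of the application's stacks
        long parallelEvents = 10_000;  // hypothetical allocation event count, Parallel collector
        long g1Events = 50_000;        // hypothetical allocation event count, G1 (smaller TLABs)

        for (int limit : new int[] {3, 64, 512}) {
            System.out.printf("stackdepth=%d  parallel=%d frames  g1=%d frames%n",
                    limit,
                    framesRecorded(parallelEvents, actualDepth, limit),
                    framesRecorded(g1Events, actualDepth, limit));
        }
    }
}
```

With a small stackdepth the difference between the collectors stays small in absolute terms. With a large one, the extra events under G1 are multiplied by a much larger per-event cost.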
I ran a simple test where I started a GC-intensive application, first with the Parallel collector and then with the G1 collector. For both tests, I started the HotSpot default recording, and then started a manual fixed-time recording with the profile settings. Observations:
1. G1 generates many more GC events than the Parallel collector.
2. The TLABs are smaller in G1, which leads to more allocation events than with the Parallel collector.
3. Comparing the size of the file written for a 2-minute profile recording with each collector, the G1 recording was 3 MB and the Parallel collector recording was 600 KB. This shows that more data gets written to disk with the G1 collector than with ParallelGC, and that contributes to longer JFRCheckpoint pauses.
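For readers who want to try a similar comparison, here is a minimal sketch of a GC-intensive test program. It is not the application I used. The class and method names are hypothetical, and the depth and allocation size are arbitrary. It allocates small arrays at the bottom of a deep recursion, so every allocation site sits under a deep stack.

```java
public class DeepAllocStress {

    // Published through a volatile field so the JIT cannot optimize the allocation away.
    static volatile byte[] sink;

    // Recurse 'depth' frames down, then allocate 'bytes' bytes at the bottom.
    static long allocateAtDepth(int depth, int bytes) {
        if (depth <= 0) {
            sink = new byte[bytes];
            return sink.length;
        }
        return allocateAtDepth(depth - 1, bytes);
    }

    // Repeat the deep allocation and return the total number of bytes allocated.
    static long run(int iterations, int depth, int bytes) {
        long total = 0;
        for (int i = 0; i < iterations; i++) {
            total += allocateAtDepth(depth, bytes);
        }
        return total;
    }

    public static void main(String[] args) {
        // Optional first argument: iteration count (use a larger value for a longer run).
        int iterations = args.length > 0 ? Integer.parseInt(args[0]) : 200_000;
        long total = run(iterations, 200, 1024);
        System.out.println("Allocated bytes: " + total);
    }
}
```

Assuming an Oracle JDK 8 setup, it could be run once with -XX:+UseParallelGC and once with -XX:+UseG1GC, each time with -XX:+UnlockCommercialFeatures -XX:+FlightRecorder and the -XX:FlightRecorderOptions=stackdepth value under test. The recording file sizes and JFRCheckpoint durations can then be compared. Other JDK versions may need different flags.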
To summarize, if you are using JFR with the G1 collector, my suggestion would be to either keep the default 'stackdepth' value, which is 64, or use an even lower value if you observe long JFRCheckpoint operations.