By Matthew Swift on Mar 26, 2009
- Throughput: the average number of requests handled per second
- Response time: the average time taken to handle a single request
- Variability: the variation, or predictability, of response times
For example, an application server such as GlassFish typically has very strong requirements for throughput, but much weaker requirements for response time and variability: a web service must be able to handle a huge volume of traffic, but users will not be too bothered whether their request is handled in 10ms, 100ms, or even 5 seconds. Many real-time applications have weak throughput requirements but, by nature, very strong variability requirements: when an event occurs it must be handled immediately and predictably (e.g. there should be no unpredictable pauses due to thread scheduling, disk I/O, DB checkpointing, etc.).
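To make the three criteria concrete, here is a minimal Java sketch that computes them from a set of measured response times. The class name, the method names, and the sample latencies are all hypothetical and are not taken from OpenDS; the percentile uses the simple nearest-rank definition, which is only one of several ways to express variability.

```java
import java.util.Arrays;

public class LatencyStats {
    /** Throughput: requests handled per second over the measurement window. */
    static double throughput(int requestCount, double windowSeconds) {
        return requestCount / windowSeconds;
    }

    /** Response time: arithmetic mean of the samples, in milliseconds. */
    static double mean(double[] latenciesMs) {
        double sum = 0;
        for (double v : latenciesMs) {
            sum += v;
        }
        return sum / latenciesMs.length;
    }

    /** Variability: nearest-rank percentile, with p in the range (0, 100]. */
    static double percentile(double[] latenciesMs, double p) {
        double[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Hypothetical sample: nine fast requests and one that hit a long pause,
        // all measured within an assumed 2 second window.
        double[] latenciesMs = {1, 1, 1, 1, 1, 1, 1, 1, 1, 101};
        System.out.println("throughput (req/s): " + throughput(latenciesMs.length, 2.0));
        System.out.println("mean (ms): " + mean(latenciesMs));
        System.out.println("median (ms): " + percentile(latenciesMs, 50));
        System.out.println("99th percentile (ms): " + percentile(latenciesMs, 99));
    }
}
```

In this made-up sample the mean is 11ms even though nine out of ten requests completed in 1ms. A single slow request dominates the average, which is why variability needs to be tracked separately from average response time.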
OpenDS has relatively strong performance requirements for all three criteria: on suitable hardware we need to handle tens of thousands of requests per second per server, with average response times in the millisecond range, and with soft real-time response time variability. I believe that we have satisfied the first two criteria; the last one, variability, has posed us some problems. Variability can be caused by many things, the most obvious of which, for Java applications, is garbage collection.
We recommend that people use the CMS garbage collector, which is enabled using the -XX:+UseConcMarkSweepGC JVM option and is recommended by Sun for server applications. However, CMS is not without its problems: typically it performs a stop-the-world (STW) garbage collection every few seconds, which may last over 100ms in some cases. Occasionally (e.g. once a day) it may perform a Full GC: a STW garbage collection that, depending on the size of the application, may last several seconds. These kinds of pauses are unacceptable to some OpenDS users, and this is where G1 enters the picture.
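One way to see which collectors a JVM is actually running, and how much work they have done so far, is the standard java.lang.management API. The sketch below is only an illustration: the class name GcReport is hypothetical, the allocation loop exists purely to generate some garbage, and the time reported by getCollectionTime() is the JVM's approximate accumulated collection time, which is not necessarily the same thing as stop-the-world pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class GcReport {
    /** One line per collector: name, collections so far, accumulated time. */
    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName()
                    + ": count=" + gc.getCollectionCount()
                    + ", time=" + gc.getCollectionTime() + "ms");
        }
        return lines;
    }

    public static void main(String[] args) {
        // Allocate short-lived objects so that a collection is likely to occur.
        for (int i = 0; i < 200000; i++) {
            byte[] junk = new byte[1024];
        }
        for (String line : describeCollectors()) {
            System.out.println(line);
        }
    }
}
```

Running it as `java -XX:+UseConcMarkSweepGC GcReport`, on a JVM that supports that option, should list the collectors selected by the flag. The names and counts printed depend on the JVM version and heap settings, so no particular output is assumed here.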
Garbage First, or G1 as it is commonly referred to, is Sun's next-generation soft real-time garbage collector. G1 still performs STW pauses but gives the user much more control over their duration and frequency. If you are interested in finding out more about G1 then take a look at the following links:
Here are the potential benefits of G1 to OpenDS:
- No long-duration stop-the-world full GCs,
- Possible to split GC pauses into more frequent but smaller pauses,
- Decouples GC pause times from heap size.
In fact, points 2 and 3 above mean that G1 is more cache-friendly, which is great for database applications such as OpenDS.
The OpenDS developers are in a fortunate position: we work for Sun and can collaborate closely with the HotSpot JVM team. To that end, I have had the great opportunity to work very closely with Tony Printezis (a.k.a. "Mr GC") and Laurent Daynes over the past few months. This collaboration is very exciting for both us and the HotSpot JVM team: we get to test G1 and feed our requirements directly to its developers, and they get to test G1 against a real-world application on big hardware that pushes Java, and G1, to the limits.
This collaboration has already been enormously productive for the G1 team:
- We have identified various scalability issues (bottlenecks and contention points) using a specially instrumented G1-based JVM. None of the issues identified so far look like design problems, and they should be fixed in the coming months,
- We have been able to test experimental changes to G1. For example, we found that for big JVMs (e.g. 32GB) bigger region sizes yielded a huge performance improvement. Tony and Laurent had suspected this would be the case and, with the help of OpenDS, we could prove it.
The benefits to the OpenDS team are obvious:
- Early experience with this exciting new garbage collector,
- A feel for how OpenDS will behave with G1 and great support from the guys who are developing it!
- We can adapt future development of OpenDS based on our findings.
The great news is that G1 looks like a very promising way forward for us. I can already see the potential benefits: even under heavy write load with really big caches, pause times remain stable and short, something we have never been able to achieve before.
An experimental version of G1 will be available in JDK6u14 (coming soon). Unfortunately, I don't think that it will include the improvements arising from the G1 and OpenDS collaboration, as these are arriving too late in the release cycle.