Java SE Out-of-Box Competitive Performance

Out-of-box performance, meaning performance with no tuning options, is in many ways our ultimate goal in HotSpot development. As a JVM performance engineer, I too have spent countless hours tweaking command-line arguments to squeeze out the last remaining bit of performance. In my last blog entry I asked if there was interest in an out-of-box competitive performance comparison, and the second comment I received hit it on the nose. Command-line tuning, albeit fruitful at times, can also be a royal waste of time, especially when you're shooting in the dark, trying any option you can find without knowing what the flag does.

Your friends in HotSpot engineering don't want you spending time tuning either. That was the driving force behind Java SE 5.0 Ergonomics and why key performance features previously available via JVM options are now enabled by default in Java SE 6.
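One way to see what ergonomics chose on a given machine, and to check that a run really is out-of-box, is to ask the running JVM. The sketch below (the class name ErgonomicsReport is hypothetical) uses the standard java.lang.management API to print the JVM's launch arguments, the processor count it sees, the maximum heap it settled on, and the garbage collectors in use. An empty argument list suggests no tuning flags were passed on the command line; treat it as a quick sanity check rather than proof.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

/**
 * Sketch: report what the JVM chose on its own. An empty argument list
 * suggests an out-of-box run (no -X/-XX options on the command line).
 */
public class ErgonomicsReport {

    // JVM options passed at launch (not the main class or its arguments).
    static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("VM:            " + System.getProperty("java.vm.name")
                + " " + System.getProperty("java.vm.version"));
        System.out.println("JVM arguments: " + jvmArguments());
        System.out.println("CPUs visible:  " + rt.availableProcessors());
        // Maximum heap the JVM settled on; with no -Xmx this comes from
        // ergonomics or built-in defaults.
        System.out.println("Max heap (MB): " + rt.maxMemory() / (1024 * 1024));
        // Collectors selected for this run.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println("Collector:     " + gc.getName());
        }
    }
}
```

Running it once with no options and again with, say, -Xmx512m should show the difference in both the argument list and the reported maximum heap.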

The intention of the data charts below is to highlight the importance of customer experience and out-of-box performance to Sun Java Engineering. These are not meant to be high performance benchmark results. Hand tuning can change the results significantly.

The following is an out-of-box performance comparison on a Sun Fire X4200. The system is configured with two dual-core Opteron 280 processors (2 CPUs, 4 cores, 2.4 GHz) and 8 GB of RAM. The operating system is Red Hat EL 4.0 AS Update 4, running the unmodified kernel from the base install (2.6.9-42.ELsmp). The only variable in this configuration is the JVM.

The JVM distributions and versions tested were the latest publicly available at the time of testing. I made sure to use the BEA JRockit JVM used in recent SPECjbb2005 submissions. The IBM JVM is the latest available on the IBM developer website.

  • Sun JDK 1.5.0_08
    • 32-bit: java version "1.5.0_08" Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_08-b03) Java HotSpot(TM) Server VM (build 1.5.0_08-b03, mixed mode)
    • 64-bit: java version "1.5.0_08" Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_08-b03) Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_08-b03, mixed mode)
  • Sun Java SE 6 build 99
    • 32-bit: java version "1.6.0-rc" Java(TM) SE Runtime Environment (build 1.6.0-rc-b99) Java HotSpot(TM) Server VM (build 1.6.0-rc-b99, mixed mode)
    • 64-bit: java version "1.6.0-rc" Java(TM) SE Runtime Environment (build 1.6.0-rc-b99) Java HotSpot(TM) 64-Bit Server VM (build 1.6.0-rc-b99, mixed mode)
  • IBM JDK 5.0 SR2
    • 32-bit: Java(TM) 2 Runtime Environment, Standard Edition (build pxi32dev-20060511 (SR2)) IBM J9 VM (build 2.3, J2RE 1.5.0 IBM J9 2.3 Linux x86-32 j9vmxi3223-20060504 (JIT enabled) J9VM - 20060501_06428_lHdSMR JIT - 20060428_1800_r8 GC - 20060501_AA) JCL - 20060511a
    • 64-bit: Java(TM) 2 Runtime Environment, Standard Edition (build pxa64dev-20060511 (SR2)) IBM J9 VM (build 2.3, J2RE 1.5.0 IBM J9 2.3 Linux amd64-64 j9vmxa6423-20060504 (JIT enabled) J9VM - 20060501_06428_LHdSMr JIT - 20060428_1800_r8 GC - 20060501_AA) JCL - 20060511a
  • BEA JRockit 5.0_06 R26.4
    • 32-bit: java version "1.5.0_06" Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_06-b05) BEA JRockit(R) (build R26.4.0-63-63688-1.5.0_06-20060626-2259-linux-ia32, )
    • 64-bit: java version "1.5.0_06" Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_06-b05) BEA JRockit(R) (build P26.4.0-10-62459-1.5.0_06-20060529-2101-linux-x86_64, )

As stated above and in the title, no JVM tuning options were used for these results. The results below are statistical comparisons: at least 10 samples were taken, and a one-tailed t-test was used to ensure confidence in each result. The data is normalized to the 32-bit Sun JDK 1.5.0_08 result.
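The statistical check described above can be sketched as follows. The post doesn't say which t-test variant was used; this sketch assumes Welch's unequal-variance test, uses made-up sample scores (not the data behind the charts), and leaves the p-value lookup to a t table or a statistics library. Normalization divides a JVM's mean score by the baseline mean, so the baseline itself scores 1.0. The class name BenchStats is hypothetical.

```java
/**
 * Sketch: one-tailed two-sample comparison of benchmark scores, assuming
 * Welch's unequal-variance t-test, plus normalization to a baseline.
 */
public class BenchStats {

    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    // Sample variance with Bessel's correction (divides by n - 1).
    static double variance(double[] xs) {
        double m = mean(xs);
        double ss = 0.0;
        for (double x : xs) ss += (x - m) * (x - m);
        return ss / (xs.length - 1);
    }

    // Welch's t statistic; large positive values favor mean(candidate) > mean(baseline).
    static double welchT(double[] candidate, double[] baseline) {
        double se = Math.sqrt(variance(candidate) / candidate.length
                + variance(baseline) / baseline.length);
        return (mean(candidate) - mean(baseline)) / se;
    }

    // Welch-Satterthwaite approximation of the degrees of freedom.
    static double welchDf(double[] a, double[] b) {
        double va = variance(a) / a.length;
        double vb = variance(b) / b.length;
        return (va + vb) * (va + vb)
                / (va * va / (a.length - 1) + vb * vb / (b.length - 1));
    }

    // Candidate mean relative to the baseline mean (baseline scores 1.0).
    static double normalized(double[] candidate, double[] baseline) {
        return mean(candidate) / mean(baseline);
    }

    public static void main(String[] args) {
        // Made-up scores for illustration only; NOT the data behind the charts.
        double[] baseline  = {100, 102, 98, 101, 99, 100, 103, 97, 100, 100};
        double[] candidate = {109, 111, 108, 110, 112, 109, 110, 111, 108, 112};
        double t = welchT(candidate, baseline);
        double df = welchDf(candidate, baseline);
        System.out.printf("normalized=%.3f t=%.2f df=%.1f%n",
                normalized(candidate, baseline), t, df);
        // Look up the one-tailed critical value for df in a t table (about 1.74
        // at the 95% level for df near 17); t above it rejects "no improvement".
    }
}
```

Welch's form is used here because it does not assume equal variances across JVMs; a pooled-variance Student's t-test is the other common choice.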

The first chart is SPECjbb2005, SPEC's benchmark for evaluating the performance of server-side Java. It emulates a three-tier client/server system (with emphasis on the middle tier) and extensively stresses Java collections, BigDecimal, and XML processing. The cool thing about SPECjbb2005 is that optimizations targeted at it also show performance gains in other competitive benchmarks, such as SPECjAppServer2004, and in a broad range of customer workloads. The benchmark results below were run in single-instance mode. Notice the impressive gains with Java SE 6: nearly a 15% improvement over JDK 1.5.0_08. Also notice that there is very little difference between the 32-bit and 64-bit BEA JRockit results.

SciMark 2.0 is a Java benchmark for scientific and numerical computing, and one where Sun's JVMs have continued to shine. It's a decent test of generated code, particularly for tight computational loops. However, it is sensitive to alignment issues and can show some run-to-run variance, mostly bimodal. All in all, it's a good set of microbenchmarks. Notice that 64-bit is faster than 32-bit for all of the JVMs under test. The additional registers available in 64-bit mode on AMD Opteron (16 general-purpose and 16 SSE registers, versus 8 of each in 32-bit mode) certainly help computational performance.
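To give a feel for the kind of tight loop SciMark measures, here is a sketch of a successive over-relaxation (SOR) sweep over a 2-D grid. It is modeled on the general shape of SciMark 2.0's SOR kernel but written for illustration, not copied from SciMark; the class name, grid size, omega value, and iteration counts are arbitrary choices. The warm-up calls give the JIT a chance to compile sor() before it is timed.

```java
import java.util.Random;

/**
 * Sketch of a SciMark-style kernel: successive over-relaxation over a
 * 2-D grid. Illustrative only; not the SciMark source.
 */
public class SorKernel {

    // Performs numIterations in-place SOR sweeps over the grid interior.
    static void sor(double omega, double[][] g, int numIterations) {
        int m = g.length;
        int n = g[0].length;
        double omegaOverFour = omega * 0.25;
        double oneMinusOmega = 1.0 - omega;
        for (int p = 0; p < numIterations; p++) {
            for (int i = 1; i < m - 1; i++) {
                // Hoisting the row references keeps the inner loop to plain
                // 1-D array accesses over three rows.
                double[] gi = g[i];
                double[] gim1 = g[i - 1];
                double[] gip1 = g[i + 1];
                for (int j = 1; j < n - 1; j++) {
                    gi[j] = omegaOverFour * (gim1[j] + gip1[j] + gi[j - 1] + gi[j + 1])
                            + oneMinusOmega * gi[j];
                }
            }
        }
    }

    public static void main(String[] args) {
        int size = 100;
        double[][] grid = new double[size][size];
        Random rnd = new Random(42);
        for (double[] row : grid) {
            for (int j = 0; j < size; j++) row[j] = rnd.nextDouble();
        }
        // Warm up so sor() is likely compiled before the timed call.
        for (int w = 0; w < 10; w++) sor(1.25, grid, 10);
        long start = System.nanoTime();
        sor(1.25, grid, 1000);
        long elapsedMs = (System.nanoTime() - start) / 1000000L;
        System.out.println("1000 SOR sweeps: " + elapsedMs + " ms, center = "
                + grid[size / 2][size / 2]);
    }
}
```

Printing a grid value at the end keeps the computed result observable, so the timed work is not simply discarded.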

Volano is a popular Java chat server. The benchmark is quick and involves both a client and a server instance. From a JVM perspective, the workload is heavily dominated by classic Java socket I/O, which is a bit long in the tooth; an NIO version would be quite interesting. That said, some customers have found this benchmark quite useful, so we continue to test it. On Volano the performance gaps are not as large, most likely because this benchmark has very little garbage collection overhead. BEA JRockit shows good performance here, with a result that's 10% over the baseline. Sun Java SE 6 shines as well, with a result that's nearly 20% over the baseline.

In summary, we in Java SE Performance and HotSpot Engineering feel that out-of-box performance is extremely important to Java developers and customers, and I hope the results above differentiate our product and highlight our ongoing work and focus. The next step is an out-of-box vs. highly tuned comparison. Stay tuned.

SPEC(R) and the benchmark name SPECjbb(TM) are trademarks of the Standard Performance Evaluation Corporation. Competitive benchmark results stated above reflect experiments performed by Sun Microsystems, Inc. For the latest SPECjbb2005 benchmark results, visit http://www.spec.org/osg/jbb2005.

Comments:

David, very cool blog post. How does one use the server VM in a standalone application? That is, do we need to download and install a different VM, or does it come with the SE distribution (and if so, how do we choose it instead of the client VM)? PD

Posted by powerdroid on October 10, 2006 at 06:52 AM EDT #

If you're running Solaris or Linux on a server-class machine, the server compiler is the default. A server-class machine is one with at least 2 physical processors and at least 2 GB of physical memory. To choose the server compiler explicitly, simply use the -server flag. If you are using the JRE on Windows, you'll need to download the JDK from http://java.sun.com, because the Windows JRE includes only the client compiler: Java on Windows is targeted at client applications, where startup time and footprint are the primary concerns.
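For a standalone application, running java -version prints the name of the VM used by default. The sketch below does a similar check from inside a program (the class and method names are hypothetical) and applies the server-class thresholds described above as a rough heuristic. Two caveats: availableProcessors() counts logical processors rather than physical ones, and physical memory is read through com.sun.management.OperatingSystemMXBean, a Sun/HotSpot-specific extension that other JVMs may not provide.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

/**
 * Sketch: which HotSpot VM is this program running on, and does the machine
 * roughly meet the server-class thresholds (2+ processors, 2+ GB of RAM)?
 */
public class WhichVm {

    static final long TWO_GB = 2L * 1024 * 1024 * 1024;

    // HotSpot names its VMs e.g. "Java HotSpot(TM) Server VM" or "... Client VM".
    static boolean isServerVm(String vmName) {
        return vmName != null && vmName.contains("Server VM");
    }

    // Rough heuristic only: callers pass logical CPU count and physical RAM.
    static boolean looksServerClass(int processors, long physicalBytes) {
        return processors >= 2 && physicalBytes >= TWO_GB;
    }

    public static void main(String[] args) {
        String vm = System.getProperty("java.vm.name");
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("java.vm.name: " + vm);
        System.out.println("server VM:    " + isServerVm(vm));
        System.out.println("processors:   " + cpus);
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        // com.sun.management is a vendor extension, so check before casting.
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            long ram = ((com.sun.management.OperatingSystemMXBean) os).getTotalPhysicalMemorySize();
            System.out.println("physical RAM: " + ram / (1024 * 1024) + " MB");
            System.out.println("server-class (heuristic): " + looksServerClass(cpus, ram));
        }
    }
}
```

On a client-VM install, launching the same class with java -server is the explicit way to request the server compiler, provided the server VM is present.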

Posted by dagastine on October 11, 2006 at 02:56 AM EDT #

David, thank you for your response. I'm testing some repetitive floating-point math. The loop is as follows:

    //start code
    double a = 3.0;
    double b = 1.2;
    double c = 2.0;
    double d = 12.0;
    double y = 0.5;
    double x = 0.0;
    long num = 0;
    int k;
    int z;
    int m;
    for (k = 0; k<=192000; k++) {
        for (z = 0; z<=300; z++) {
            num=num+k+z;
            x=k*a + b*b*c +a*c +y*a*b*c;
            x=z*a + b*b*c +a*c +y*a*b*c;
            x=k*z + b*b*c +a*c +y*a*b*c;
            x=y*z*k + b*b*c +a*c +y*a*b*c;
        }
    }
    //end code

This is simply a test to see whether HotSpot (in either client or server mode) can come close to C++, and I can't figure out why it doesn't. Generally, this takes around 250-350 ms with Java 5 and around 50 ms when I call out to C++ via JNI. I fully expected HotSpot to optimize it and match or beat C++. Any ideas what might be going wrong? Does HotSpot or javac unroll loops or do any other optimizations that can be exploited? Perhaps you could discuss squeezing out performance in situations like this in a future entry? Thanks again, PD

Posted by powerdroid on October 11, 2006 at 07:43 AM EDT #

Sorry David, my post (and the code section) looks pretty bad; it seems the formatting didn't survive pasting. PD

Posted by powerdroid on October 11, 2006 at 07:44 AM EDT #

PD, I took a look at your code. Yes, HotSpot does aggressive optimizations, and it looks like you are measuring the effect of compilation. The results are actually quite amusing. I'll try to post this code as an attachment, but here's a description of what I've done: I moved the loop into a separate method, wrapped it with currentTimeMillis(), and called it 5 times from the main method. This was run on my MacBook Pro.

Time elapsed (ms): 299

Time elapsed (ms): 293

Time elapsed (ms): 0

Time elapsed (ms): 0

Time elapsed (ms): 0

So what happened? HotSpot noticed that nothing computed in the method was ever used (no result was returned), so the compiled code eliminated all the work as dead code. How's that for optimizing! I'll modify the code to do useful work and will post another comment or blog entry a bit later.
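The harness described above can be sketched roughly as follows. This is a reconstruction, not the attachment mentioned above: the class and method names are hypothetical, and the constants mirror the loop PD posted. keepResult() adds every computed value into a checksum that main() prints, rather than returning only the final x, because with only x returned the first three assignments in each iteration would still be unused.

```java
/**
 * Sketch of the timing harness: the loop lives in its own method, is timed
 * with System.currentTimeMillis(), and is called 5 times. discardResult()
 * never uses its results, so a JIT may eliminate the work; keepResult()
 * returns a checksum that the caller prints.
 */
public class LoopTiming {

    static final double A = 3.0, B = 1.2, C = 2.0, Y = 0.5;

    // The posted loop: x and num are never read afterwards.
    static void discardResult() {
        double x = 0.0;
        long num = 0;
        for (int k = 0; k <= 192000; k++) {
            for (int z = 0; z <= 300; z++) {
                num = num + k + z;
                x = k * A + B * B * C + A * C + Y * A * B * C;
                x = z * A + B * B * C + A * C + Y * A * B * C;
                x = k * z + B * B * C + A * C + Y * A * B * C;
                x = Y * z * k + B * B * C + A * C + Y * A * B * C;
            }
        }
    }

    // Same arithmetic, but every value feeds the returned checksum, so the
    // loop's result is observable and cannot simply be dropped.
    static double keepResult() {
        double sum = 0.0;
        long num = 0;
        for (int k = 0; k <= 192000; k++) {
            for (int z = 0; z <= 300; z++) {
                num = num + k + z;
                sum += k * A + B * B * C + A * C + Y * A * B * C;
                sum += z * A + B * B * C + A * C + Y * A * B * C;
                sum += k * z + B * B * C + A * C + Y * A * B * C;
                sum += Y * z * k + B * B * C + A * C + Y * A * B * C;
            }
        }
        return sum + num;
    }

    public static void main(String[] args) {
        for (int run = 1; run <= 5; run++) {
            long t0 = System.currentTimeMillis();
            discardResult();
            long t1 = System.currentTimeMillis();
            double checksum = keepResult();
            long t2 = System.currentTimeMillis();
            // Printing the checksum makes keepResult()'s work observable.
            System.out.println("run " + run + ": discardResult " + (t1 - t0)
                    + " ms, keepResult " + (t2 - t1) + " ms, checksum " + checksum);
        }
    }
}
```

Whether and how quickly discardResult() drops toward 0 ms depends on the JVM version, compiler mode (client or -server), and machine; no particular timings are implied.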

Posted by dagastine on October 11, 2006 at 10:05 AM EDT #

This is pretty much the same setup I used, except I did not move the loop to a separate method. I also ran the loop repeatedly and averaged the elapsed milliseconds across runs, and got pretty much the same results. The 250-350 ms range I gave was the range I saw between running on Windows via Boot Camp and running on OS X. I wonder if the C++ code called via JNI is doing what you found HotSpot doing, i.e., optimizing the loop away since there is no return value. That would look like a huge speed improvement when really it's just not a good test.

Posted by powerdroid on October 11, 2006 at 11:12 PM EDT #

David
The theory mentioned in my previous comment seems to be exactly what was happening. Thank you for pointing that out.
When I changed the "all Java" approach to put the double calculations in their own method, and then changed both the "all Java" and JNI approaches to actually return the value of x, both took about the same amount of time.
Running with the -server flag helps tremendously.
PD

Posted by powerdroid on October 12, 2006 at 04:08 AM EDT #

This whole thing warms my heart. -Steve

Posted by Steve Wilson on October 13, 2006 at 02:45 AM EDT #
