Java 6 Leads Out of the Box Server Performance

Java 6 is finally here. It's our fastest, most reliable release, and it specifically targets out-of-box performance. What does this mean? Simply put, it means no tuning options are needed for the JVM to achieve optimal performance. Looking at the bigger picture, it means much more. No longer will you spend hours poring over cryptic JVM tuning parameters to determine the optimal configuration for your application. No more expensive re-qualification of your application for special command line tuning. Java 6 makes performance tuning easy.

Java 6, powered by the recently open-sourced HotSpot JVM, is impressive. Here's a summary:

  • On SPECjbb2005 the numbers are impressive. Java 6 out of the box is more than 40% ahead of the competition on Intel Core, and 30% ahead on AMD Opteron.
  • On SciMark, Java 6 continues to show solid performance, leading the competition by more than 40%.
  • On Volano, Java 6 improves performance by more than 20% over the most recent update of JDK 5.

The out-of-box performance of a Java application is an intriguing and difficult engineering problem. The requirements of client and server applications couldn't be more different. On one hand, client apps want fast startup and a low footprint; on the other hand, server applications want highly optimized code, high throughput, and low pause times. Both want reliability and compatibility.

Out-of-box performance is the right goal for JVM development, and future Java benchmarks should reflect that goal. Delivering optimizations quickly to allow high benchmark results is fun, but it doesn't help customers unless those optimizations become part of the default runtime behavior of the JVM.

To reiterate, the intention of the data charts below is to highlight the importance of customer experience and out-of-box performance to Sun Java Engineering. These are not meant to be high performance benchmark results. Hand tuning can change the results significantly.

The following is an out-of-box performance comparison on a Dell 2950 and a Sun Fire X4200. The Dell system is configured with 2 dual-core Intel 5160 processors (2 CPUs, 4 cores @ 3.0GHz) and 16GB of RAM. The Sun system is configured with 2 dual-core Opteron 280 processors (2 CPUs, 4 cores @ 2.4GHz) and 8GB of RAM. The operating system installed on both systems is Red Hat EL 4.0 AS Update 4. The kernel version is unmodified from the base install, which is 2.6.9-42.ELsmp. The only variable in this configuration is the JVM.

The JVM distributions and versions tested were the latest versions publicly available at the time of testing. The BEA JRockit JVMs tested were downloaded from BEA's main GA website and its 64-bit performance update website. The IBM JVM is the latest available on the IBM developer website.

  • Sun JDK 1.5.0_10
    • 32-bit: java version "1.5.0_10" Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_10-b03) Java HotSpot(TM) Server VM (build 1.5.0_10-b03, mixed mode)
    • 64-bit: java version "1.5.0_10" Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_10-b03) Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_10-b03, mixed mode)
  • Sun Java SE 6 build 98
    • 32-bit: java version "1.6.0" Java(TM) SE Runtime Environment (build 1.6.0-b105) Java HotSpot(TM) Server VM (build 1.6.0-rc-b99, mixed mode)
    • 64-bit: java version "1.6.0" Java(TM) SE Runtime Environment (build 1.6.0-b105) Java HotSpot(TM) 64-Bit Server VM (build 1.6.0-b105, mixed mode)
  • IBM JDK 5.0 SR3
    • 32-bit: Java(TM) 2 Runtime Environment, Standard Edition (build pxi32dev-20061002a (SR3) ) IBM J9 VM (build 2.3, J2RE 1.5.0 IBM J9 2.3 Linux x86-32 j9vmxi3223-20061001 (JIT enabled) J9VM - 20060915_08260_lHdSMR JIT - 20060908_1811_r8 GC - 20060906_AA) JCL - 20061002
    • 64-bit: Java(TM) 2 Runtime Environment, Standard Edition (build pxa64dev-20061002a (SR3) ) IBM J9 VM (build 2.3, J2RE 1.5.0 IBM J9 2.3 Linux amd64-64 j9vmxa6423-20061001 (JIT enabled) J9VM - 20060915_08260_LHdSMr JIT - 20060908_1811_r8 GC - 20060906_AA) JCL - 20061002
  • BEA JRockit 5.0_06 R26.4
    • 32-bit: java version "1.5.0_06" Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_06-b05) BEA JRockit(R) (build R26.4.0-63-63688-1.5.0_06-20060626-2259-linux-ia32, )
    • 64-bit: java version "1.5.0_06" Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_06-b05) BEA JRockit(R) (build P26.4.1-12-67782-1.5.0_06-20061003-1620-linux-x86_64, )
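
The version strings above are what each JVM reports about itself. As a rough sketch (not the harness used for these results), a run can also log which VM, collectors, and heap limit the JVM picked on its own, and confirm that no tuning flags were passed. The class name OutOfBoxReport is hypothetical; the calls are standard java.lang.management APIs available since Java 5.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper: reports what the JVM chose by default for this run.
final class OutOfBoxReport {

    /** JVM arguments passed at launch; an untuned run should show none of your own flags. */
    static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    /** Builds a short, human-readable summary of the running JVM's defaults. */
    static String describe() {
        StringBuilder sb = new StringBuilder();
        sb.append("VM: ").append(System.getProperty("java.vm.name"))
          .append(" (").append(System.getProperty("java.vm.version")).append(")\n");
        sb.append("CPUs visible: ")
          .append(Runtime.getRuntime().availableProcessors()).append('\n');
        sb.append("Max heap (MB): ")
          .append(Runtime.getRuntime().maxMemory() / (1024 * 1024)).append('\n');
        // One entry per collector the JVM selected (young and old generation, typically).
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append("Collector: ").append(gc.getName()).append('\n');
        }
        sb.append("JVM arguments: ").append(jvmArguments()).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

Saving this output next to each benchmark result makes it easy to check later that a run really was out of the box.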
As stated above and in the title, no JVM tuning options were used for these results. The results below are statistical comparisons. No fewer than 10 samples were taken for each result, and a single-tailed t-test was used to ensure confidence in the result. The data is normalized to the 32-bit IBM JDK 5 SR3 result.
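
To illustrate the kind of check described above, here is a minimal sketch of a single-tailed, two-sample t-test on 10 samples per JVM. The post does not say which t-test variant or confidence level was used, so the pooled-variance form and the 95% level are assumptions; the critical value 1.734 is the standard one-tailed value for 18 degrees of freedom. The class name and sample scores are hypothetical, not measured data.

```java
// Hypothetical sketch: is JVM A faster than JVM B with 95% confidence (one-tailed)?
final class OneTailedTTest {

    // One-tailed critical value for alpha = 0.05 and 18 degrees of freedom (10 + 10 - 2).
    static final double T_CRIT_95_DF18 = 1.734;

    static double mean(double[] x) {
        double sum = 0.0;
        for (double v : x) sum += v;
        return sum / x.length;
    }

    /** Unbiased sample variance (divides by n - 1). */
    static double sampleVariance(double[] x) {
        double m = mean(x), ss = 0.0;
        for (double v : x) ss += (v - m) * (v - m);
        return ss / (x.length - 1);
    }

    /** Pooled-variance t statistic for the alternative hypothesis mean(a) > mean(b). */
    static double tStatistic(double[] a, double[] b) {
        int na = a.length, nb = b.length;
        double pooled = ((na - 1) * sampleVariance(a) + (nb - 1) * sampleVariance(b))
                / (na + nb - 2);
        return (mean(a) - mean(b)) / Math.sqrt(pooled * (1.0 / na + 1.0 / nb));
    }

    /** True when a beats b at the assumed 95% level; requires exactly 10 samples each. */
    static boolean significantlyFaster(double[] a, double[] b) {
        if (a.length != 10 || b.length != 10) {
            throw new IllegalArgumentException("critical value assumes 10 samples per JVM");
        }
        return tStatistic(a, b) > T_CRIT_95_DF18;
    }

    public static void main(String[] args) {
        // Made-up scores for illustration only.
        double[] jvmA = {120, 121, 119, 122, 118, 120, 121, 119, 120, 122};
        double[] jvmB = {100, 101, 99, 102, 98, 100, 101, 99, 100, 102};
        System.out.println("t = " + tStatistic(jvmA, jvmB));
        System.out.println("A significantly faster: " + significantlyFaster(jvmA, jvmB));
    }
}
```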

The first set of charts reflects performance on Intel's latest Core 2 micro-architecture. The results below, particularly the SPECjbb2005 results, strongly highlight a core difference in philosophy between Sun HotSpot and its competitors. If you look at the highly tuned benchmark submissions of our competitors, BEA JRockit in particular, they have impressive numbers on the new chip. Our competitors have chosen to quickly deliver platform-specific performance optimizations for the purpose of benchmark submissions, but they require the use of several tuning parameters to achieve that level of performance. Unfortunately, this is quite misleading for customers. Yes, the benchmark numbers are good, but can a customer jump right in and use these features? If they were thoroughly tested and ready for prime time, shouldn't they be enabled by default on the platforms that require them? We think so, and we have chosen differently, and that's the difference with HotSpot.

The first chart is SPECjbb2005. SPECjbb2005 is SPEC's benchmark for evaluating the performance of server-side Java. It evaluates server-side Java by emulating a three-tier client/server system (with emphasis on the middle tier). It extensively stresses Java collections, BigDecimal, and XML processing. The cool thing about SPECjbb2005 is that optimizations targeted for it also show performance gains in other competitive benchmarks, such as SPECjAppServer2004, and a broad range of customer workloads. The benchmark results below are run in single instance mode.

SciMark 2.0 is a Java benchmark for scientific and numerical computing and is a benchmark where Sun's JVMs have continued to shine. It's a decent test of generated code, particularly for tight computational loops. However, it is particularly sensitive to alignment issues and can show some level of variance from run to run, mostly in a bimodal fashion. The test has three modes of execution: small, large, and default. These refer to the size of the data under test; more details can be found at the SciMark website. All in all, it's a good set of microbenchmarks.
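
To give a feel for the kind of tight computational loop meant here, the following is a small sketch of a successive over-relaxation (SOR) sweep with an MFlops estimate. It is not SciMark's code; the class name, grid size, iteration count, and relaxation factor are arbitrary assumptions, and the flop count (6 per interior point update) is simply counted from the expression in this sketch.

```java
import java.util.Random;

// Hypothetical sketch of a tight numeric kernel; not taken from SciMark itself.
final class SorSketch {

    // 3 adds inside the parentheses, 2 multiplies, 1 final add.
    static final double FLOPS_PER_UPDATE = 6.0;

    /** Runs in-place SOR sweeps over the interior points of the grid. */
    static void sweep(double[][] g, double omega, int iterations) {
        int rows = g.length, cols = g[0].length;
        double omegaOverFour = omega * 0.25;
        double oneMinusOmega = 1.0 - omega;
        for (int it = 0; it < iterations; it++) {
            for (int i = 1; i < rows - 1; i++) {
                double[] gi = g[i], above = g[i - 1], below = g[i + 1];
                for (int j = 1; j < cols - 1; j++) {
                    gi[j] = omegaOverFour * (above[j] + below[j] + gi[j - 1] + gi[j + 1])
                            + oneMinusOmega * gi[j];
                }
            }
        }
    }

    /** Floating point operations performed by sweep() for the given dimensions. */
    static double flops(int rows, int cols, int iterations) {
        return (rows - 2.0) * (cols - 2.0) * iterations * FLOPS_PER_UPDATE;
    }

    static double[][] randomGrid(int rows, int cols, long seed) {
        Random rnd = new Random(seed);
        double[][] g = new double[rows][cols];
        for (double[] row : g) {
            for (int j = 0; j < cols; j++) row[j] = rnd.nextDouble();
        }
        return g;
    }

    public static void main(String[] args) {
        int size = 1000, iterations = 50;      // arbitrary sizes for illustration
        double[][] grid = randomGrid(size, size, 42L);
        sweep(grid, 1.25, iterations);         // warm-up so the JIT compiles the loop
        long start = System.nanoTime();
        sweep(grid, 1.25, iterations);
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("%.2f MFlops%n", flops(size, size, iterations) / seconds / 1e6);
    }
}
```

A single timed pass like this will vary from run to run, which is one reason repeated samples matter for this kind of test.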

Note that the 32-bit JVMs in all cases are faster than the 64-bit JVMs when running on the Intel Core system. This is quite different from the AMD Opteron system further down the page, where 64-bit is significantly faster. Since the SciMark 2.0 test is using the large dataset, it's likely that the added pressure of 64-bit pointers on the memory subsystem increases bandwidth demand enough to impede performance; however, this is just a hypothesis.

Volano is a popular Java chat server. The benchmark is quick and involves both a client and a server instance. From a JVM perspective, the workload is heavily dominated by classic Java socket I/O, which is a bit long in the tooth; an NIO version would be quite interesting. That being said, some customers have found this benchmark quite useful, so we continue to test it. However, it is by no means our favorite benchmark, as my friends at BEA have suggested. Running Volano, the performance gaps are not as large, most likely because this benchmark has very little garbage collection overhead. BEA JRockit is showing good performance here with a result that's 19% over the baseline. Sun Java SE 6 shines as well with a result that's nearly 22% over baseline.

The second set of charts was run on a Sun Fire X4200 with AMD Opteron 280 CPUs. This is the identical system used in my previous blog articles on this subject, this time with updated JVM releases from Sun and IBM. I'm sure someone will be curious why I didn't compare the Intel and AMD based systems directly. The primary reason is simple: I'm writing about JVM performance, not CPU performance. That being said, I didn't have the latest AMD CPUs readily available. In short, Intel is faster when running some of these benchmarks, while AMD is faster on others. In general, the memory subsystem differences between these platforms are prevalent when comparing the performance of Java benchmarks. Sun Java 6 is showing impressive results running SPECjbb2005 with a result 30% over baseline and ~15% faster than J2SE 5.0_10.

SciMark 2.0 is impressive on AMD Opteron as well. The large dataset is an interesting workload, as its effect on cache can highlight memory subsystem limitations. If your application crunches on a large dataset, take a look at the large dataset of SciMark when comparing JVMs and system architectures.

Last but not least is Volano on AMD Opteron (and again, no, this is not our favorite benchmark!). Java 6 shows a strong improvement, with results more than 20% greater than 5.0_10, pulling ahead of 64-bit BEA JRockit. Nice.

SPECjbb2005 Result Disclosure
Single Instance Run. SPECjbb2005 bops = SPECjbb2005 bops/JVM
System: Dell 2950, 2 X Intel 5160 (2 CPUs, 4 cores @ 3.0Ghz), 16GB of RAM.

JVM Version                 32-bit SPECjbb2005 bops   64-bit SPECjbb2005 bops
IBM 5.0 SR3                 43,575                    32,617
BEA JRockit 5.0_06 R26.4    26,071                    26,092
Sun J2SE 5.0_10             49,308                    46,080
Sun Java SE 6               62,246                    56,488
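
The "percent ahead" figures quoted in the summary can be recomputed from the 32-bit column of the table above. The sketch below normalizes each score to the 32-bit IBM 5.0 SR3 baseline, as described for the charts, and prints the gap; the class and method names are hypothetical, and the scores are copied from the disclosure above.

```java
// Hypothetical sketch: normalize disclosed scores to a baseline and report the gap.
final class BaselineNormalizer {

    /** Score expressed as a multiple of the baseline (1.0 means equal to baseline). */
    static double normalized(double score, double baseline) {
        return score / baseline;
    }

    /** How far ahead score is of other, in percent (negative when behind). */
    static double percentAhead(double score, double other) {
        return (score / other - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // 32-bit SPECjbb2005 bops on the Dell 2950, taken from the table above.
        String[] names = {"IBM 5.0 SR3", "BEA JRockit 5.0_06 R26.4",
                          "Sun J2SE 5.0_10", "Sun Java SE 6"};
        double[] bops = {43575, 26071, 49308, 62246};
        double baseline = bops[0]; // 32-bit IBM 5.0 SR3 is the normalization baseline

        for (int i = 0; i < names.length; i++) {
            System.out.printf("%-26s normalized %.3f (%+.1f%% vs baseline)%n",
                    names[i], normalized(bops[i], baseline),
                    percentAhead(bops[i], baseline));
        }
    }
}
```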

System: Sun Fire X4200, 2 X AMD Opteron 280 (2 CPUs, 4 cores @ 2.4Ghz), 16GB of RAM.
JVM Version                 32-bit SPECjbb2005 bops   64-bit SPECjbb2005 bops
IBM 5.0 SR3                 30,500                    23,998
BEA JRockit 5.0_06 R26.4    19,309                    19,185
Sun J2SE 5.0_10             35,297                    31,096
Sun Java SE 6               39,973                    34,975

SciMark 2.0 Result Disclosure
Large Dataset. Score is in SciMark MFlops
System: Dell 2950, 2 X Intel 5160 (2 CPUs, 4 cores @ 3.0Ghz), 16GB of RAM.

JVM Version                 32-bit Score    64-bit Score
IBM 5.0 SR3                 171.49          207.21
BEA JRockit 5.0_06 R26.4    278.15          276.37
Sun J2SE 5.0_10             321.85          292.89
Sun Java SE 6               357.72          336.58

System: Sun Fire X4200, 2 X AMD Opteron 280 (2 CPUs, 4 cores @ 2.4Ghz), 16GB of RAM.
JVM Version                 32-bit Score    64-bit Score
IBM 5.0 SR3                 175.02          180.46
BEA JRockit 5.0_06 R26.4    230.85          231.53
Sun J2SE 5.0_10             300.23          332.23
Sun Java SE 6               320.42          343.74

VolanoMark 2.5.0.9 Result Disclosure
Loopback performance test
System: Dell 2950, 2 X Intel 5160 (2 CPUs, 4 cores @ 3.0Ghz), 16GB of RAM.

JVM Version                 32-bit Score    64-bit Score
IBM 5.0 SR3                 121,747         111,826
BEA JRockit 5.0_06 R26.4    128,185         146,012
Sun J2SE 5.0_10             120,048         116,959
Sun Java SE 6               149,198         142,602

System: Sun Fire X4200, 2 X AMD Opteron 280 (2 CPUs, 4 cores @ 2.4Ghz), 16GB of RAM.
JVM Version                 32-bit Score    64-bit Score
IBM 5.0 SR3                 64,218          60,802
BEA JRockit 5.0_06 R26.4    73,627          76,675
Sun J2SE 5.0_10             66,955          64,316
Sun Java SE 6               80,592          75,156

SPEC(R) and the benchmark name SPECjbb(TM) are trademarks of the Standard Performance Evaluation Corporation. Competitive benchmark results stated above reflect experiments performed by Sun Microsystems, Inc. For the latest SPECjbb2005 benchmark results, visit http://www.spec.org/osg/jbb2005.
Comments:

Hi,

Nice stuff!

Presumably the deployer will still have to explicitly specify -client/-server for now if their host's parameters don't match the "ergonomics" automatic choice for that flag, e.g. a powerful client machine that would force -server by default for an interactive app.

Also, presumably it is still necessary to choose (say) CMS GC explicitly if running a highly-interactive app, i.e. where absolute performance is trumped by the need to keep pauses short?

Rgds

Damon

Posted by Damon Hart-Davis on December 11, 2006 at 01:57 AM EST #

Yes, your assumptions are correct for now. Future development plans target the situations you reference. If you're curious, feel free to participate in the OpenJDK project to get the latest information.

Posted by dagastine on December 11, 2006 at 03:19 AM EST #

Cool. I've always wondered about a persistent JIT cache. I would assume something like that could provide highly optimized code without significantly impacting startup time.

Posted by Michael Slattery on December 13, 2006 at 04:01 AM EST #
