Monday Oct 16, 2006

No Tuning Required: Java SE Out-of-Box Vs. Tuned Performance

In my last entry, Java SE Out-of-Box Competitive Performance, I stressed how important out-of-box performance is to customers and developers, and how it is a passionate focus for Sun HotSpot and JVM performance engineering. The following is a comparison of out-of-box and hand-tuned performance. The benchmarks behind the charts below were run on the same system as in my previous entry, and the charts are normalized to the same baseline, so the two sets of charts are directly comparable.

I have to say the numbers are quite impressive (hence the "No Tuning Required" in the title). My colleagues are going to say I'm blogging us out of a job :-).

  • On SPECjbb2005 the numbers are impressive. JDK 5.0_08 is ~22% faster when tuned compared to JDK 5.0_08 right out of the box. JDK 6 is ~11% faster when tuned versus right out of the box, and JDK 6 out of the box is only ~7% slower than a highly tuned JDK 5.0_08. Very impressive indeed!
  • On SciMark, tuning improved results only slightly with JDK 5.0_08; with JDK 6 it's more or less a wash.
  • On Volano, except when running JDK 5.0_08 64-bit, the out-of-the-box configuration seems to work well, and doesn't require any explicit tuning.
The system under test has two dual-core Opteron 280 processors (2 CPUs, 4 cores, 2.4 GHz) and 8 GB of RAM. The operating system is Red Hat EL 4.0 AS Update 4. The kernel is unmodified from the base install, version 2.6.9-42.ELsmp. The charts are statistical comparisons: no fewer than 10 samples were taken, and a one-tailed t-test was used to ensure confidence in the significance of the results. The data is normalized to the 32-bit Sun JDK 1.5.0_08 out-of-box result.
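To make the methodology concrete, here is a rough sketch of the normalization and one-tailed Welch t-test described above. The sample values are made up for illustration, not the actual benchmark data:

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    # Sample variance (n - 1 denominator).
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def welch_t(a, b):
    """One-tailed Welch t statistic for the hypothesis mean(b) > mean(a)."""
    se = math.sqrt(var(a) / len(a) + var(b) / len(b))
    return (mean(b) - mean(a)) / se

# Hypothetical scores: 10 out-of-box runs vs. 10 tuned runs.
baseline = [100.0, 101.2, 99.5, 100.8, 99.9, 100.3, 101.0, 99.7, 100.1, 100.5]
tuned    = [121.9, 122.4, 121.1, 123.0, 121.7, 122.2, 122.8, 121.4, 122.0, 122.5]

# Normalize the tuned mean to the out-of-box mean, as the charts do.
ratio = mean(tuned) / mean(baseline)
t = welch_t(baseline, tuned)
print(f"tuned/baseline = {ratio:.3f}, t = {t:.1f}")
# A t statistic far above ~1.8 (one-tailed, alpha = 0.05, ~9 df) means the
# measured gain is statistically significant, not run-to-run noise.
```

The threshold of ten runs matters: with fewer samples the t-test loses power and small deltas, like the JDK 6 64-bit SciMark difference below, correctly show up as insignificant.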

The following JVMs were tested:

  • Sun JDK 1.5.0_08
    • 32-bit: java version "1.5.0_08" Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_08-b03) Java HotSpot(TM) Server VM (build 1.5.0_08-b03, mixed mode)
    • 64-bit: java version "1.5.0_08" Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_08-b03) Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_08-b03, mixed mode)
  • Sun Java SE 6 build 99
    • 32-bit: java version "1.6.0-rc" Java(TM) SE Runtime Environment (build 1.6.0-rc-b99) Java HotSpot(TM) Server VM (build 1.6.0-rc-b99, mixed mode)
    • 64-bit: java version "1.6.0-rc" Java(TM) SE Runtime Environment (build 1.6.0-rc-b99) Java HotSpot(TM) 64-Bit Server VM (build 1.6.0-rc-b99, mixed mode)
The following command line arguments were used:
  • SPECjbb2005
    • J2SE 5.0_08 32-bit: -Xmn1g -Xms1500m -Xmx1500m -XX:+UseBiasedLocking -XX:+AggressiveOpts -XX:+UseLargePages -XX:+UseParallelOldGC -Xss128k
    • J2SE 5.0_08 64-bit: -Xmn2g -Xms3g -Xmx3g -XX:+UseBiasedLocking -XX:+AggressiveOpts -XX:+UseLargePages -XX:+UseParallelOldGC -Xss128k
    • Java SE 6 RC1 32-bit: -Xmn1g -Xms1500m -Xmx1500m -XX:+UseLargePages -XX:+UseParallelOldGC -Xss128k
    • Java SE 6 RC1 64-bit: -Xmn2g -Xms3g -Xmx3g -XX:+UseLargePages -XX:+UseParallelOldGC -Xss128k
  • SciMark2
    • J2SE 5.0_08 32-bit: -XX:+UseBiasedLocking
    • J2SE 5.0_08 64-bit: -XX:+UseBiasedLocking
    • Java SE 6 RC1 32-bit: -XX:+DoEscapeAnalysis
    • Java SE 6 RC1 64-bit: -XX:+DoEscapeAnalysis
  • Volano 2.5.0.9
    • J2SE 5.0_08 32-bit: -XX:CompileThreshold=1500
    • J2SE 5.0_08 64-bit: -XX:CompileThreshold=1500
    • Java SE 6 RC1 32-bit: -XX:CompileThreshold=1500 -XX:-UseBiasedLocking
    • Java SE 6 RC1 64-bit: -XX:CompileThreshold=1500 -XX:-UseBiasedLocking
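To make the two configurations concrete: the only difference between "out-of-box" and "tuned" is the option string handed to java. A sketch, where the jar name and main class are placeholders standing in for the real SPECjbb2005 harness invocation:

```shell
#!/bin/sh
# Out-of-box: no options at all.
OOB_OPTS=""

# Hand-tuned 32-bit JDK 5.0_08 options, taken from the list above.
TUNED_OPTS="-Xmn1g -Xms1500m -Xmx1500m -XX:+UseBiasedLocking -XX:+AggressiveOpts -XX:+UseLargePages -XX:+UseParallelOldGC -Xss128k"

# Hypothetical launch lines -- jbb.jar and spec.jbb.JBBmain are placeholders.
echo "out-of-box: java $OOB_OPTS -cp jbb.jar spec.jbb.JBBmain"
echo "tuned:      java $TUNED_OPTS -cp jbb.jar spec.jbb.JBBmain"
```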
The SPECjbb2005 numbers are impressive. JDK 5.0_08 is ~22% faster tuned compared to JDK 5.0_08 out-of-box. JDK 6 is only ~11% faster tuned vs. JDK 6 out-of-box, and JDK 6 out-of-box is only ~7% slower than a highly tuned JDK 5.0_08. Nice.

Tuning improved SciMark only slightly when running 5.0_08. When running JDK 6 it's more or less a wash; the JDK 6 64-bit difference is statistically insignificant.

Tuning seems to hurt Volano, except when running 5.0_08 64-bit. It turns out the negative differences are statistically insignificant, so tuning is a wash with Volano as well.

In summary, meeting or exceeding tuned performance is the end game for out-of-box performance engineering. The above results make me quite proud of our accomplishments. Yes, every application is different, and in some cases we'll find ourselves needing to tune. But chances are, if you let us know the issues you're facing, a release or two down the line you won't need to tune. Eventually it will just be us geeks who can't help it :-). Next step is a Solaris x86 vs. Linux comparison. Stay tuned.

SPEC(R) and the benchmark name SPECjbb(TM) are trademarks of the Standard Performance Evaluation Corporation. Competitive benchmark results stated above reflect experiments performed by Sun Microsystems, Inc. For the latest SPECjbb2005 benchmark results, visit http://www.spec.org/osg/jbb2005.

Friday Oct 06, 2006

Java SE Out of Box Competitive Performance

Out-of-box performance, that is, running with no tuning options, is in many ways our ultimate goal in HotSpot development. As a JVM performance engineer I too have spent countless hours tweaking command-line arguments to squeeze out the last remaining bit of performance. In my last blog entry I asked if there was interest in an out-of-box competitive performance comparison, and the second comment I received hit it on the nose. Command-line tuning, albeit fruitful at times, can also be a royal waste of time, especially when you're shooting in the dark, trying any option you can find without any knowledge of what the flag is doing.

Your friends in HotSpot engineering don't want you spending time tuning either. That was the driving force behind Java SE 5.0 Ergonomics and why key performance features previously available via JVM options are now enabled by default in Java SE 6.

The intention of the data charts below is to highlight the importance of customer experience and out-of-box performance to Sun Java Engineering. These are not meant to be high performance benchmark results. Hand tuning can change the results significantly.

The following is an out-of-box performance comparison on a Sun Fire X4200. The system is configured with 2 dual-core Opteron 280 Processors (2 CPUs, 4 cores, 2.4 Ghz) and 8GB of RAM. The Operating System is Red Hat EL 4.0 AS Update 4. The kernel version is unmodified from the base install, which is 2.6.9-42.ELsmp. The only variable in this configuration is the JVM.

The JVM distributions and versions tested were the latest versions publicly available at the time of testing. I was sure to use the BEA JRockit JVM used in recent SPECjbb2005 submissions. The IBM JVM is the latest available on the IBM developer website.

  • Sun JDK 1.5.0_08
    • 32-bit: java version "1.5.0_08" Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_08-b03) Java HotSpot(TM) Server VM (build 1.5.0_08-b03, mixed mode)
    • 64-bit: java version "1.5.0_08" Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_08-b03) Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_08-b03, mixed mode)
  • Sun Java SE 6 build 99
    • 32-bit: java version "1.6.0-rc" Java(TM) SE Runtime Environment (build 1.6.0-rc-b99) Java HotSpot(TM) Server VM (build 1.6.0-rc-b99, mixed mode)
    • 64-bit: java version "1.6.0-rc" Java(TM) SE Runtime Environment (build 1.6.0-rc-b99) Java HotSpot(TM) 64-Bit Server VM (build 1.6.0-rc-b99, mixed mode)
  • IBM JDK 5.0 SR2
    • 32-bit: Java(TM) 2 Runtime Environment, Standard Edition (build pxi32dev-20060511 (SR2)) IBM J9 VM (build 2.3, J2RE 1.5.0 IBM J9 2.3 Linux x86-32 j9vmxi3223-20060504 (JIT enabled) J9VM - 20060501_06428_lHdSMR JIT - 20060428_1800_r8 GC - 20060501_AA) JCL - 20060511a
    • 64-bit: Java(TM) 2 Runtime Environment, Standard Edition (build pxa64dev-20060511 (SR2)) IBM J9 VM (build 2.3, J2RE 1.5.0 IBM J9 2.3 Linux amd64-64 j9vmxa6423-20060504 (JIT enabled) J9VM - 20060501_06428_LHdSMr JIT - 20060428_1800_r8 GC - 20060501_AA) JCL - 20060511a
  • BEA JRockit 5.0_06 R26.4
    • 32-bit: java version "1.5.0_06" Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_06-b05) BEA JRockit(R) (build R26.4.0-63-63688-1.5.0_06-20060626-2259-linux-ia32, )
    • 64-bit: java version "1.5.0_06" Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_06-b05) BEA JRockit(R) (build P26.4.0-10-62459-1.5.0_06-20060529-2101-linux-x86_64, )
As stated above and in the title no JVM tuning options were used for these results. The results below are statistical comparisons. No less than 10 samples were performed, and a T-test (single-tailed) was used to ensure confidence in the result. The data is normalized to the 32-bit Sun JDK 1.5.0_08 result.

The first chart is SPECjbb2005, SPEC's benchmark for evaluating the performance of server-side Java. It emulates a three-tier client/server system (with emphasis on the middle tier) and extensively stresses Java collections, BigDecimal, and XML processing. The cool thing about SPECjbb2005 is that optimizations targeted at it also show performance gains in other competitive benchmarks, such as SPECjAppServer2004, and in a broad range of customer workloads. The benchmark results below were run in single-instance mode. Notice the impressive gains with Java SE 6, nearly a 15% improvement over JDK 5.0_08. Also notice there is very little difference between the 32-bit and 64-bit BEA JRockit results.

SciMark 2.0 is a Java benchmark for scientific and numerical computing, and one where Sun's JVMs have continued to shine. It's a decent test of generated code, particularly for tight computational loops. However, it is particularly sensitive to alignment issues and can show some run-to-run variance, mostly in a bimodal fashion. All in all it's a good set of microbenchmarks. Notice that 64-bit is faster than 32-bit for all of the JVMs under test; the additional registers available in 64-bit mode on AMD Opteron certainly impact computational performance.

Volano is a popular Java chat server. The benchmark is quick and involves both a client and a server instance. From a JVM perspective the workload is heavily dominated by classic Java socket I/O, which is a bit long in the tooth; an NIO version would be quite interesting. That said, some customers have found this benchmark quite useful, so we continue to test it. Running Volano, the performance gaps are not as large, most likely because this benchmark has very little garbage collection overhead. BEA JRockit shows good performance here with a result that's 10% over the baseline. Sun Java SE 6 shines as well with a result that's nearly 20% over baseline.

In summary, we in Java SE Performance and HotSpot Engineering feel that out-of-box performance is extremely important to Java developers and customers, and I hope the results above differentiate our product and highlight our ongoing work and focus. Next step is an out-of-box vs. highly tuned comparison. Stay tuned.

SPEC(R) and the benchmark name SPECjbb(TM) are trademarks of the Standard Performance Evaluation Corporation. Competitive benchmark results stated above reflect experiments performed by Sun Microsystems, Inc. For the latest SPECjbb2005 benchmark results, visit http://www.spec.org/osg/jbb2005.

Thursday Sep 21, 2006

Java SE Out of Box Performance: Any interest in a performance comparison?

Out of box performance, or using no JVM tuning options, has been a focus of Sun HotSpot Engineering for quite some time. Our first major steps came with J2SE 5.0 Ergonomics, and we're taking it further in JDK 6 with many of our performance features enabled by default. I find it quite cool when no tuning yields performance close to or exceeding the best I can muster with command line tuning.

With that, I'd like to publish some "out-of-box" competitive performance comparisons on my blog. As you can imagine this could be a bit of a touchy subject for our competitors. Before I post data I'd like to get a feel for how interesting this would be, so let me ask for a bit of feedback. Is there interest in a Java SE out-of-box competitive performance comparison out there? Are there any benchmarks people would like to see? I was thinking of the usual benchmarks I talk about, SPECjbb2005 and SciMark. Any others?

Thanks in advance for the feedback!

Thursday Aug 10, 2006

Sun JDK 5.0_08 Is Now Available!

JDK 5.0_08 is now publicly available on java.sun.com! Another fine day for Sun Java performance. This is our highest-performing and most reliable release to date. We have demonstrated winning performance across Sun's server offerings, from x64 systems to CoolThreads servers, all the way up to the Sun Fire E25K.

Winning performance on The Sun Blade X8400, beating BEA JRockit on a comparable system! (Sun Hotspot result, BEA JRockit result)
Winning performance on The Sun Fire T1000 and Sun Fire T2000 benchmark result (T1000 result,T2000 result)
Winning performance on The Sun Fire E25K (benchmark result)

SPECjbb2005 Sun Fire T1000 (1 chip, 8 cores) 60,323 SPECjbb2005 bops, 15,081 SPECjbb2005 bops/JVM; Sun Fire T2000 (1 chip, 8 cores) 74,365 SPECjbb2005 bops, 18,591 SPECjbb2005 bops/JVM; Sun Fire E25K (72-way, 72 chips, 144 cores) 1,387,437 SPECjbb2005 bops, 19,270 SPECjbb2005 bops/JVM; Sun Blade X8400 (8 cores, 4 chips, Solaris 10, Sun HotSpot 5.0_08) 121,228 SPECjbb2005 bops, 30,307 SPECjbb2005 bops/JVM; Fabric7 Q80 (8 cores, 4 chips, Microsoft Windows Server 2003, JRockit 5.0 P26.4.0) . SPEC, SPECjbb reg tm of Standard Performance Evaluation Corporation. Results as of 06/19/06 on www.spec.org.

Wednesday Jun 21, 2006

Sun Java and the Sun Fire E25K Raise the Bar on SPECjbb2005

The Sun Fire E25K and Sun J2SE 5.0_08 team up to demonstrate leadership on large servers running SPECjbb2005, increasing performance by 19.1% over our previous submission on the same hardware. Not bad for 6 months of performance work! The 72-way Sun Fire E25K score is 1,387,437 SPECjbb2005 bops, 19,270 SPECjbb2005 bops/JVM. That is 11% faster than the 128-way Fujitsu PRIMEPOWER 2500 and many times faster than IBM's fastest SPECjbb2005 result to date. The BMSeer once again beats me to the punch on SPECjbb2005 results; he/she (who is BMSeer anyway?) has a great piece on this result.

Required Disclosure Statement: SPECjbb2005 Sun Fire E25K (72-way, 72 chips, 144 cores) 1,387,437 SPECjbb2005 bops, 19,270 SPECjbb2005 bops/JVM; Fujitsu PRIMEPOWER 2500 (128 chips, 128 cores) 1,251,024 SPECjbb2005 bops, 39,095 SPECjbb2005 bops/JVM; IBM eServer p5 570 (8 chips, 16 cores, 16-way) 244,361 SPECjbb2005 bops, 30,545 SPECjbb2005 bops/JVM. SPEC, SPECjbb reg tm of Standard Performance Evaluation Corporation. Results as of 06/19/06 on www.spec.org.

Monday Jun 19, 2006

Sun Java vs. C#

Here's my latest round of platform performance comparisons using SciMark, this time comparing Java to C#, and once again Java performance is looking quite good. Thanks to Tony Zhang, a colleague of mine on the performance team, who ran the initial performance comparison a few months back and provided me the environment to re-run the tests with our latest JVMs. The system under test was a 4-CPU Intel Xeon MP server (4 x 2.78 GHz, 8 cores, 3.87 GB memory) running Microsoft Windows Server 2003 and .NET 2.0. The CLR version under test, according to SciMark, was 2.0.50727.42. We used the SciMark 2.0 C# port found here. The HotSpot server compiler (-server) was used for both J2SE 5.0_08 and Java SE 6 b87. SciMark was run with the large data set (-large). Also, I found the chart below in an interesting writeup showing similar performance comparisons with older versions of the JVM. I particularly like HotSpot's performance lead over JRockit.

Friday Jun 09, 2006

Sun Java is faster than C/C++ (Round 2)

I received a few comments on my previous blog entry saying the results were bogus since I used an old compiler. I quickly found another test system running SuSE SLES 9 U2 with gcc 3.3.3 and repeated the test. If I get around to installing the latest Visual Studio I'll repeat the test there as well. The JVM versions are different because I wanted to post the results quickly. Guess what: the results are a lot better! I ran this several times and it's quite repeatable. I appreciate comments, so please let me know your thoughts, especially if there are issues with the choice of gcc 3.3.3.

The system under test was a 2 x 3.0 GHz Intel Xeon MP system (4 cores) running SuSE SLES 9 U2 and gcc 3.3.3. The C code was compiled with full optimization as shown by the Makefile in the SciMark source package. This time no tuning parameters were used for either 5.0_08 or 6.0 b83. Here's some output from /proc/cpuinfo:

vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Xeon(TM) MP CPU 3.00GHz

For background, here's the skinny on SciMark2. SciMark2 is a set of simple numerical kernels, and its performance is directly related to the performance and quality of the generated code. The tests are single-threaded and have little to no garbage collection overhead. In short, a great set of applications for comparing statically compiled C code and dynamically compiled Java.

This time Java is 35% faster than C. Here's a breakdown of the subtests: C is only ahead on Sparse MatMult, and by a small margin. Anyone interested in seeing how the other JVM vendors look? Can JRockit or IBM beat C?

Sun Java is faster than C/C++

This is quite cool. Andy Johnson, a colleague of mine on the Java performance team, did a few performance tests comparing Java to native C, using SciMark2 for the comparison. The system under test was a 2 GHz Pentium white box running Windows 2000 and Microsoft Visual C/C++ 6. The C code was compiled with full optimization. The server compiler was used for both J2SE 5.0_07 and Java SE 6.

SciMark2 is a set of simple numerical kernels, and its performance is directly related to the performance and quality of the generated code. The tests are single-threaded and have little to no garbage collection overhead. In short, a great set of applications for comparing statically compiled C code and dynamically compiled Java.

The chart below is quite revealing. Both charts are normalized to J2SE 5.0_07. Native C is only 3% faster than 5.0_07, and Java SE 6 pulls ahead of native C by 2%. The following chart breaks the comparison down further. Remember, SciMark2 is a composite benchmark: the overall score is a simple mean of each subtest's Mflops score. Java is ahead in all cases except Sparse MatMult. Looks like we have something to look at for additional optimization.
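Since the overall SciMark number is just the arithmetic mean of the subtest Mflops scores, a small sketch shows how one losing subtest like Sparse MatMult can coexist with a winning composite. The numbers here are made up for illustration, not the measured results behind the chart:

```python
# Hypothetical Mflops scores for the five SciMark2 subtests.
java = {"FFT": 120.0, "SOR": 250.0, "Monte Carlo": 90.0,
        "Sparse MatMult": 140.0, "LU": 330.0}
c    = {"FFT": 110.0, "SOR": 240.0, "Monte Carlo": 85.0,
        "Sparse MatMult": 180.0, "LU": 290.0}

def composite(scores):
    """SciMark's overall score: the simple mean of the subtest Mflops."""
    return sum(scores.values()) / len(scores)

print(f"Java composite: {composite(java):.1f} Mflops")
print(f"C composite:    {composite(c):.1f} Mflops")
# Java: 186.0, C: 181.0 -- Java leads the composite despite losing
# Sparse MatMult, because each subtest carries equal weight in the mean.
```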

Friday Jun 02, 2006

Java Performance Continues to Accelerate on Sun CoolThreads Technology

The performance of Java on Sun CoolThreads servers continues to be impressive. Our latest round of improvements has increased SPECjbb2005 performance by 17% on the Sun Fire T1000 and T2000. If you thought the competitive positioning of these systems was impressive before, take a look at them now. The charts below represent the competitive landscape for the Sun CoolThreads servers; by no means are they a complete comparison of all systems in the classes described below. If there are particular discrepancies that are annoying, please let me know. For more detailed information on the Sun Fire T1000 and T2000 and comparisons running competitive benchmarks, check out BMSeer's blog.

The first chart shows the competitive landscape for 1 RU servers. The Sun Fire T1000 shines compared to other systems in this space, and the Sun Fire X4100 (powered by AMD Opteron CPUs) looks rather good as well. The second chart shows the competitive landscape for 2 RU and 4 RU servers; the Sun Fire T2000 shows impressive performance against the competition in this space as well.

Now, this is where the Sun Fire T1000 and Sun Fire T2000 truly excel. The first power-performance graph shows a comparison based on performance per watt using the SPECjbb2005 bops metric. The data presented is limited to what I've gathered using the Sun Fire CoolThreads systems and what has been gathered on http://www.sun.com/coolthreads. Here's another look at power performance using the SWaP metric. SWaP is similar to performance per watt, but includes system footprint as part of the equation. The Sun Fire T1000 number is impressive; the light bulb next to my workbench in my basement uses more power than this server. For those who prefer a spreadsheet to charts, here is the same information as shown above.
Finally, this chart shows the performance difference between J2SE 5.0_06 and J2SE 5.0_08 on the same hardware, demonstrating a 17% increase in performance on both the Sun Fire T1000 and Sun Fire T2000. If we can improve performance by 17% in 6 months, wait until you see what Java SE 6 ("Mustang") can do.

Required Disclosure Statement: SPECjbb2005 Sun Fire T1000 (1 chip, 8 cores) 51,528 SPECjbb2005 bops, 12,882 SPECjbb2005 bops/JVM submitted for review; SPECjbb2005 Sun Fire T2000 (1 chip, 8 cores) 74,365 SPECjbb2005 bops, 18,591 SPECjbb2005 bops/JVM submitted for review; Sun Fire X4100 (2 chips, 2 cores) 38,090 SPECjbb2005 bops, 19,045 SPECjbb2005 bops/JVM submitted for review; IBM eServer p5 550 (2 chips, 4 cores) 61,789 SPECjbb2005 bops, 61,789 SPECjbb2005 bops/JVM; IBM x346 (2 chips, 4 cores) 39,585 SPECjbb2005 bops, 39,585 SPECjbb2005 bops/JVM; IBM eServer p5 520 (1 chip, 2 cores) 32,820 SPECjbb2005 bops, 32,820 SPECjbb2005 bops/JVM; IBM eServer p5 510 (1 chip, 2 cores) 36,039 SPECjbb2005 bops, 36,039 SPECjbb2005 bops/JVM; Fujitsu Siemens RX220 (2 chips, 2 cores) 61,155 SPECjbb2005 bops, 30,578 SPECjbb2005 bops/JVM; Dell PE SC1425 (2 chips, 2 cores) 24,208 SPECjbb2005 bops, 24,208 SPECjbb2005 bops/JVM; Dell PE 850 (1 chip, 2 cores) 31,138 SPECjbb2005 bops, 31,138 SPECjbb2005 bops/JVM; Dell PE 2950 (2 chips, 4 cores) 64,288 SPECjbb2005 bops, 64,288 SPECjbb2005 bops/JVM. SPEC, SPECjbb reg tm of Standard Performance Evaluation Corporation. Results as of 6/02/06 on www.spec.org

Monday May 22, 2006

Sun Fire T2000 Blows Cold Air!!

This year at JavaOne we had a demo at the performance pod demonstrating Java SE performance and scalability: a Sun Fire T2000 with a 1.0 GHz UltraSPARC T1 processor and 8 GB of RAM running Sun J2SE 5.0_06, J2SE 5.0_08, and Java SE 6. Brian Doherty did the setup this year (thanks a lot, Brian!). He spent the entire day on Monday fighting networking issues on the JavaOne pavilion floor but was eventually able to get the demo working (though we had to buy our own USB-to-serial kit to do it). It was also quite cold in the building, and Brian hadn't brought his jacket because of the 80-degree weather that day in San Francisco. So, like any resourceful engineer working in a lab, Brian decided to warm his hands with the fan exhaust at the back of the Sun Fire T2000. Much to his surprise, the T2000 was blowing cold air! Over the next few days on the show floor we put the system to the test. We ran SPECjbb2005 every day for 10 hours straight with the CPU fully consumed at 100%. Guess what? It still blew cold air. This was absolutely amazing, especially since my little laptop is about to burn my legs as I type this. At the risk of being a bit annoying, I asked nearly everyone who stopped by our booth to put their hands by the fans and feel the air. I wasn't the only one amazed; many people wanted to see the CPU stats to be sure the system was running full tilt. Very cool (literally).

Sun Java Performance: Here we come again

I love performance work. The sweet taste of knowing that your product is the fastest is like no other. Perhaps it is because I have a competitive personality, but beating the competition is a lot of fun. And you know what? Active competition between vendors on public Java benchmarks benefits customers. So without further ado, I'd like to announce our latest round of world-record Java competitive benchmark results. Sun J2SE 5.0_08, powered by the ripping-fast Sun HotSpot JVM, has set new world records running SPECjbb2005, improving our previous scores on the exact same hardware by a whopping 17%, and publishing the improved score in less than 6 months. See what I mean by sweet? The BMSeer has a great piece on the new results, check it out here. Be sure to check out the very popular press release here.

To top it off, performance is not Sun Java Software's highest priority. I'm sure you're well aware that performance optimization is my highest priority, but really it's not the top focus of the organization as a whole. Our primary foci are reliability and compatibility (though performance and scalability are not that far down the list). We would pass up a 20% performance gain at the drop of a hat if it imposed any reliability risk. I do mean any risk; as a performance guy I've butted heads with this ideology many times in the past. But you know what, in the end I agree, because that's what customers need. Reliability is always first. Brian Doherty, an esteemed colleague of mine, has often said, "The performance of a crashing JVM is zero," and that's dead on. A close second is compatibility, but that's an easy one as it speaks to the core of what defines Java technology. I'm proud to say Sun has taken this to heart; we support more hardware and OS combinations than any other vendor.

Any JVM vendor can claim they have the "world's fastest JVM." Competitive benchmarking is a lot of fun and is an opportunity to promote software and hardware performance. What's important is that your application is as fast as you need it to be, and so reliable that you don't have to think about it.

Friday Mar 10, 2006

Java Compatibility Call to Arms

Compatibility between Java implementations is critical to the success of the platform. It's the responsibility of the JRE vendor to ensure that any Java application will run. Yes, any Java application. After all, compatibility is a key ingredient of what makes Java, Java. "Write Once, Run Anywhere," right? Apparently this isn't always the case. Here's an example of a compatibility issue identified on the java.net GlassFish project: https://glassfish.dev.java.net/servlets/ReadMsg?list=dev&msgNo=761

There are always bugs in software, and some of those bugs can break compatibility. It is of the utmost importance that issues like this are addressed in a timely manner. This is where you come in. When testing Java software, whether it be new development, a purchase evaluation, or your tried-and-true back-office application, please do the following: run your application with your JVM of choice, but also test it against other JVMs running on the same platform. That's right, if you're running Sun's JVM, also test BEA JRockit and the IBM JDK. Multiple Java implementations are available on Windows, Linux, and now Solaris SPARC. If any of the implementations show incorrect behavior, or dare I say don't run at all, I implore you to send a note to the implementation's support channels and, if possible, file a bug. None of the Java vendors can possibly test enough Java applications, and in many ways we rely on users to let us know when something's broken. In the end, any Java application should run on any Java implementation. Hands down. No excuses.

If you run into problems, have questions about Java performance, or identify compatibility issues running Sun's JVMs, in particular Java SE 6 (https://mustang.dev.java.net), please post a note on the java.net performance forum or feel free to send me a comment here. I would love to hear your compatibility successes, along with the issues you've seen with our competitors' JVMs :-) We're very serious about the performance, compatibility, and reliability of the Java platform. If a vendor is not doing well in this regard, I would like to know about it so I can take steps to help ensure the compatibility of that implementation.
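A low-effort way to follow this advice is to script the cross-JVM run. A sketch, where the install paths, jar, and main class are all hypothetical placeholders for your own environment:

```shell
#!/bin/sh
# Hypothetical JVM install locations -- adjust to your machine.
JVM_HOMES="/opt/jdk1.5.0_08 /opt/jrockit-R26.4 /opt/ibm-java2-5.0"

for home in $JVM_HOMES; do
    if [ -x "$home/bin/java" ]; then
        echo "Testing against $home"
        # Placeholder app invocation -- substitute your own jar and class.
        "$home/bin/java" -cp myapp.jar com.example.Main
    else
        echo "Skipping $home (not installed)"
    fi
done
```

Running the same test suite under each installed JVM and diffing the results is usually enough to surface the incorrect-behavior cases worth reporting.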

Thursday Mar 09, 2006

Java SE Tuning Tip: Large Pages on Windows and Linux

Enabling large-page support on operating environments that support it can give a significant boost to performance, especially for applications with large datasets or large heap sizes. Below is a summary of how to enable large pages on Solaris, Windows, and Linux. The text is largely from the HotSpot VM Options page (http://java.sun.com/docs/hotspot/VMOptions.html), but I've had a lot of questions about this and thought it merited highlighting the information here. Stay tuned for a revamped HotSpot VM Options page coming your way in the next few weeks.

Beginning with Java SE 5.0 there is a cross-platform flag for requesting large memory pages: -XX:+UseLargePages (on by default on Solaris, off by default on Windows and Linux). The goal of large-page support is to optimize the processor's Translation-Lookaside Buffer. A Translation-Lookaside Buffer (TLB) is a page-translation cache that holds the most recently used virtual-to-physical address translations. The TLB is a scarce system resource, and a TLB miss can be costly: the processor must then read from the hierarchical page table, which may require multiple memory accesses. With a bigger page size, a single TLB entry can represent a larger memory range, so there is less pressure on the TLB and memory-intensive applications may perform better.

Note, however, that large-page memory can sometimes hurt system performance. For example, when a large amount of memory is pinned by an application, it may create a shortage of regular memory and cause excessive paging in other applications, slowing down the entire system. Also note that on a system that has been up for a long time, excessive fragmentation can make it impossible to reserve enough large-page memory; when that happens, either the OS or the JVM will revert to using regular pages.

Operating system configuration changes to enable large pages:

Solaris

As of Solaris 9, which includes Multiple Page Size Support (MPSS), no additional configuration is necessary. However, if you're running a 32-bit J2SE version prior to J2SE 5.0 Update 5 on AMD Opteron hardware, additional tuning is needed. Due to a bug in HotSpot's large-page code, the default large-page size for the 32-bit x86 binary is 4 MB. Since 4 MB pages are not supported on Opteron, the large-page request fails and the page size falls back to 8 KB. To work around this, explicitly set the large-page size to 2 MB with the following flag: -XX:LargePageSizeInBytes=2m

Linux

Large-page support is included in the 2.6 kernel, and some vendors have backported the code to their 2.4-based releases. To check whether your system can support large-page memory, try the following:

# cat /proc/meminfo | grep Huge
HugePages_Total: 0
HugePages_Free: 0
Hugepagesize: 2048 kB

If the output shows the three "Huge" variables, your system can support large-page memory, but it needs to be configured. If the command doesn't print anything, large-page support is not available. To configure the system to use large-page memory, log in as root, then:

1. Increase the SHMMAX value. It must be larger than the Java heap size. On a system with 4 GB of physical RAM (or less), the following will make all the memory sharable:

# echo 4294967295 > /proc/sys/kernel/shmmax

2. Specify the number of large pages. In the following example, 3 GB of a 4 GB system are reserved for large pages (assuming a large-page size of 2048 KB: 3 GB = 3 x 1024 MB = 3072 MB = 3072 x 1024 KB = 3145728 KB, and 3145728 KB / 2048 KB = 1536):

# echo 1536 > /proc/sys/vm/nr_hugepages

Note that the /proc values reset after reboot, so you may want to set them in an init script (e.g. rc.local or sysctl.conf). Also, internal testing has shown that root permissions may be necessary to get large-page support on various flavors of Linux, most notably SuSE SLES 9.
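The page arithmetic above generalizes. A small sketch that derives nr_hugepages from a desired reservation size, where the reservation figure is a placeholder you would size to cover your Java heap:

```shell
#!/bin/sh
# Derive nr_hugepages from a desired large-page reservation.
RESERVE_MB=3072        # placeholder: 3 GB, as in the example above
HUGEPAGE_KB=2048       # from the 'Hugepagesize:' line in /proc/meminfo

NR_HUGEPAGES=$(( RESERVE_MB * 1024 / HUGEPAGE_KB ))

# Print the command rather than run it, since writing the real
# /proc/sys/vm/nr_huge pages value requires root.
echo "As root, run: echo $NR_HUGEPAGES > /proc/sys/vm/nr_hugepages"
```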

Windows

Only Windows Server 2003 supports large-page memory. In order to use it, the administrator must first assign an additional privilege to the user who will be running the application:

1. Select Control Panel -> Administrative Tools -> Local Security Policy
2. Select Local Policies -> User Rights Assignment
3. Double-click "Lock pages in memory", add users and/or groups
4. Reboot the machine

As always, every application is different and true performance is always defined by each individual running their own application. If you run into problems or have questions about Java performance, visit the java.net performance forum or feel free to send me a comment.

Monday Feb 27, 2006

High Performance Java on Sun CoolThread Servers

Back in December when Sun's CoolThreads servers were announced, I wrote a similar blog entry comparing the Sun Fire T1000 and T2000 SPECjbb2005 scores to our competitors' SPECjbb2005 scores on 1U, 2U, and 4U systems. Below is updated data, along with space and power data using the SWaP metric. The Sun Fire T1000 scores are phenomenal! All were run with Sun J2SE 5.0_06 with HotSpot JVM technology. Interested in finding out for yourself? Go here to try a Sun Fire T2000 free for 60 days.

Take a look at the chart below. The Sun Fire T2000 surpasses all other competition in the 2U and 4U space. How are these results comparable? It's simple: compare the raw throughput SPECjbb2005 bops scores. One may ask: "How can you compare an 8 core / 32 thread box to a 4 core / 8 thread Power 5+?" It's easy. Chip and core counts are steadily becoming irrelevant. What really matters is how much work (throughput) a system can achieve and how much that system is going to cost to run. This includes lab space, power, and cooling costs. Below is a system comparison using the Space, Watts and Performance (SWaP) metric, which divides a system's performance by the product of the space it occupies and the power it draws.

How about scalability? Here's a good example of how the Sun Fire T2000 and the UltraSPARC T1 processor scale from 1 to 32 threads. Each SPECjbb2005 warehouse is a new thread. Throughput steadily increases as new threads are added, peaking at 32.

Fine print SPEC disclosure: SPECjbb2005 Sun Fire T1000 (1 chip, 8 cores, 32 threads) 51,540 bops, 12,885 bops/JVM; Sun Fire T2000 (1 chip, 8 cores, 32 threads) 63,378 bops, 15,845 bops/JVM; IBM eServer p5 520 (2 chips, 2 cores, 4 threads) 32,820 bops, 32,820 bops/JVM; IBM eServer p5 510 (2 chips, 2 cores, 4 threads) 32,820 bops, 32,820 bops/JVM (referenced on IBM benchmark website); AMD Tyan white box (2 chips, 4 cores, 4 threads) 44,574 bops, 44,574 bops/JVM; IBM eServer p5 550 (4 chips, 4 cores, 4 threads) 61,789 bops, 61,789 bops/JVM.
SPEC™ and the benchmark name SPECjbb2005™ are trademarks of the Standard Performance Evaluation Corporation. Competitive benchmark results stated above reflect results published on www.spec.org as of February 27, 2006. For the latest SPECjbb2005 benchmark results, visit http://www.spec.org/osg/jbb2005.
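The SWaP comparison above boils down to one division: performance over space times power. The sketch below assumes SWaP is computed as performance / (rack units × watts); the wattage and rack-unit figures in main() are hypothetical placeholders, not measured results:

```java
public class Swap {
    // SWaP = Performance / (Space x Power)
    // performance:    benchmark throughput (e.g. SPECjbb2005 bops)
    // spaceRackUnits: height of the server in rack units (U)
    // powerWatts:     power draw of the system under load
    static double swap(double performance, double spaceRackUnits, double powerWatts) {
        return performance / (spaceRackUnits * powerWatts);
    }

    public static void main(String[] args) {
        // Hypothetical comparison: a 2U server drawing 330 W
        // vs. a 4U server drawing 950 W (wattages assumed for illustration)
        System.out.println(swap(63378, 2, 330));
        System.out.println(swap(61789, 4, 950));
    }
}
```

With these assumed numbers the 2U system delivers several times more throughput per rack unit per watt, which is the point the metric is designed to make: raw bops alone ignores what the system costs to house and run.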

Thursday Feb 23, 2006

Sun Fire E25K and J2SE 5.0_06 SPECjbb2005 World Record

The Sun Fire E25K running J2SE 5.0_06 now holds the overall world record on SPECjbb2005! Hot off the presses, here's the new world record result: 1,164,995 SPECjbb2005 bops, 32,361 SPECjbb2005 bops/JVM. This result beats the recently announced result from Fujitsu for the PRIMEPOWER 2500 with SPARC64 V. Once again the combination of Sun's world class enterprise server architecture, the UltraSPARC IV+ processor, and Sun J2SE 5.0_06 with HotSpot JVM technology teams up to prove world class performance and scalability on the SPECjbb2005 benchmark. Very, very impressive.

As a designer and developer of this benchmark I found it hard to envision the day when the SPECjbb2005 bops score would breach 1 million. The day is here, and much sooner than I could have ever anticipated. These are exciting times for Java performance (and there are more performance optimizations coming soon!). Stay tuned for more information on this latest world record. The BMSeer has an excellent competitive overview of this result; the price/performance of the Sun Fire E25K is quite impressive compared to our competition $$ (add an extra $ for IBM). (Hey BMSeer, next time you won't beat me to the punch announcing our latest SPECjbb2005 world record!!)

Fine print SPEC disclosure: SPECjbb2005 Sun Fire E25K (72-way, 72 chips, 144 cores) 1,164,995 SPECjbb2005 bops, 32,361 SPECjbb2005 bops/JVM submitted for review; Fujitsu PRIMEPOWER 2500 (128 chips, 128 cores) 1,157,619 SPECjbb2005 bops, 72,351 SPECjbb2005 bops/JVM. SPEC™ and the benchmark name SPECjbb2005™ are trademarks of the Standard Performance Evaluation Corporation. Competitive benchmark results stated above reflect results published on www.spec.org as of February 23, 2006. For the latest SPECjbb2005 benchmark results, visit http://www.spec.org/osg/jbb2005.

Wednesday Feb 22, 2006

Sun HotSpot J2SE 5.0_06 Crushes BEA JRockit Running SPECjbb2005

(The following is a resubmission of a blog entry from February 10, 2006 with a few comments and edits. Changes are noted below.)

Looks like our friends from BEA JRockit are at it again. Take a look at the following blog entry from BEA: http://dev2dev.bea.com/blog/hstahl/archive/2006/01/new_specjbb2000_1.html

First, SPECjbb2000 is a 5-year-old retired benchmark. Its time has passed and SPECjbb2005 is its replacement. BEA loves to talk about SPECjbb2000; they obviously spent a lot of time optimizing for it. The problem with JRockit is that it is optimized just for SPECjbb2000. If that time had been spent on optimizations for the real world, they'd be able to maintain their competitive position on SPECjbb2005, right? The same applies to any other competitive benchmark (SPECjappserver2004, SciMark, and so on). The reality is much different: SPECjbb2000 is a special case for JRockit, and performance gains there don't pan out in the real world.

One more comment on SPECjbb2000. As I stated above, the benchmark retired at the beginning of January. Which JVM ended on top? Reading the BEA blog you'd assume it was BEA JRockit. Sun HotSpot J2SE 5.0_06 closed this benchmark as the final world record holder. Now let's move on; SPECjbb2000 is over.

BEA JRockit tried to spin their current competitive situation in the best possible light, omitting many results that did not suit their smoke-and-mirrors argument. First, BEA positioned a fully configured 32-way, 32-core, 32-thread Itanium 2 system against a partially configured 16-way, 32-core, 32-thread Sun Fire E6900 in an attempt to highlight JVM performance. These are completely different hardware platforms, and any attempt to highlight JVM performance alone using these results is inaccurate. Comparing these results does give insight on throughput and scaling capacity, but the comparison is at a system level and only demonstrates a JVM's capacity to fully utilize the underlying hardware platform.
When comparing fully configured mid-sized enterprise systems regardless of platform, the Sun Fire E6900 (24-way, 48-core, 48-thread) beats the JRockit result hands down:

342,578 SPECjbb2005 bops, 28,548 SPECjbb2005 bops/JVM (Sun Fire E6900 with Sun JVM)
322,719 SPECjbb2005 bops, 40,340 SPECjbb2005 bops/JVM (Fujitsu PRIMEQUEST 480 with JRockit)

Also, please review the SPECjbb2005 results page: http://www.spec.org/jbb2005/results/jbb2005.html A quick scan will show that Sun HotSpot holds the record for single and multi-instance results, more than doubling BEA's single JVM result and tripling BEA's multi-instance result. Funny how BEA forgot to mention these results. Their results use TWO (2) JVMs on a 4-core box. They even use 2 JVMs on a 2-core box. That's absolutely ridiculous. Why would anyone choose to do this? The only reason is they can't beat HotSpot running a single JVM and have difficulty scaling this benchmark on small 2 and 4 core systems. HotSpot could easily beat these multi-instance results, but chances are we won't submit multi-instance SPECjbb2005 on configurations that don't match customer deployments.

(Author's note: Since hindsight is always 20/20, the following is more specific than the above paragraph.) Now onto the AMD-based SPECjbb2005 results referred to in the BEA blog. I'm embarrassed for BEA because they had to use these results to talk about performance. Their 2-way, 2-core result uses TWO (2) JVMs on a 4-core box. They even use 2 JVMs on a 2-core box. That's absolutely ridiculous. Why would anyone choose to do this? The only logical reason is they can't beat HotSpot running a single JVM and have difficulty scaling SPECjbb2005 on small 2 and 4 core systems. HotSpot could easily beat these multi-instance results, but chances are we won't submit multi-instance SPECjbb2005 on configurations that don't match customer deployments.

Here are the latest 2 and 4 core single-instance SPECjbb2005 submissions on a Sun Fire X4200 running Windows, Linux, and Solaris:
49,097 SPECjbb2005 bops, 49,097 SPECjbb2005 bops/JVM (Sun Fire X4200 running Solaris 10 x64)
47,437 SPECjbb2005 bops, 47,437 SPECjbb2005 bops/JVM (Sun Fire X4200 running Windows 2003 Server)
43,076 SPECjbb2005 bops, 43,076 SPECjbb2005 bops/JVM (Sun Fire X4200 running Red Hat EL 4)

Fine print SPEC disclosure: SPECjbb2005 Sun Fire X4200 on Solaris 10 (2 chips, 4 cores, 4 threads) 49,097 bops, 49,097 bops/JVM; SPECjbb2005 Sun Fire X4200 on Windows 2003 Server (2 chips, 4 cores, 4 threads) 47,437 bops, 47,437 bops/JVM; SPECjbb2005 Sun Fire X4200 on Red Hat EL 4 (2 chips, 2 cores, 2 threads) 43,076 bops, 43,076 bops/JVM; Fujitsu Limited PRIMEQUEST 480 (32 chips, 32 cores, 32 threads) 322,719 bops, 40,340 bops/JVM; SPECjbb2005 Sun Fire E6900 on Solaris 10 (24 chips, 32 cores, 32 threads) 342,578 bops, 28,548 bops/JVM. SPEC™ and the benchmark name SPECjbb2005™ are trademarks of the Standard Performance Evaluation Corporation. Competitive benchmark results stated above reflect results published on www.spec.org as of February 22, 2006. For the latest SPECjbb2005 benchmark results, visit http://www.spec.org/osg/jbb2005.

Friday Feb 17, 2006

Java Performance: Solaris 10 x86 vs. Linux

Solaris 10 screams running Java. Competitive benchmarks do a good job highlighting this; just take a look at the latest SPECjbb2005 and SPECjappserver2004 results. I have noticed some fundamental differences in out-of-the-box tuning when comparing Solaris and Linux. When running Java server applications, Solaris 10 default tuning is general purpose and tuned for moderate thread counts, similar to a time-shared system. This in many ways is an indication of the maturity of the platform. Linux, on the other hand, is specifically tuned for high thread counts, and performance suffers when running low thread counts. A good example of this behavior can be seen comparing SPECjbb2005 results. Below are two results run on the exact same hardware, differing only in the OS and minor JVM tuning (the heap tuning has minimal performance impact):

SPECjbb2005 on Sun Fire X4200 running Solaris 10 Update 1: 49,097 SPECjbb2005 bops, 49,097 SPECjbb2005 bops/JVM
SPECjbb2005 on Sun Fire X4200 running Red Hat EL 4: 43,076 SPECjbb2005 bops, 43,076 SPECjbb2005 bops/JVM

Running SPECjbb2005 on identical hardware with optimal tuning parameters, Solaris 10 is 14% faster than Linux. SPECjbb2005 on small x64 hardware runs only a moderate number of threads; in the above example the peak application thread count is 8.

What tuning can be applied when running high thread counts on Solaris 10 x86? Here are two quick tuning steps you can try with your application.

1. If you're running many threads and performing socket I/O, try libumem.so. When launching your application within a shell script, set the following environment variable:

    LD_PRELOAD=/usr/lib/libumem.so; export LD_PRELOAD

2. Tune the Solaris scheduler. Simple scheduler tuning can yield significant performance gains, especially with highly threaded, short-lived applications.
    Try the FX scheduling class: priocntl -c FX -e java class_name
    Try the IA scheduling class: priocntl -c IA -e java class_name

Every application is different and true performance is always defined by each individual running their own application. If you run into problems or have questions about Java on Solaris performance, visit the java.net performance forum or feel free to send me a comment.

Fine print SPEC disclosure: SPECjbb2005 Sun Fire X4200 on Solaris 10 (2 chips, 4 cores, 4 threads) 49,097 bops, 49,097 bops/JVM; SPECjbb2005 Sun Fire X4200 on Red Hat EL 4 (2 chips, 2 cores, 2 threads) 43,076 bops, 43,076 bops/JVM. SPEC™ and the benchmark name SPECjbb2005™ are trademarks of the Standard Performance Evaluation Corporation. Competitive benchmark results stated above reflect results published on www.spec.org as of February 17, 2006. For the latest SPECjbb2005 benchmark results, visit http://www.spec.org/osg/jbb2005.

Wednesday Feb 15, 2006

Java SE 6 Beta is Released!

Hey Look, Java SE 6 ("Mustang") has gone Beta! http://java.sun.com/javase/6/download.jsp Huge performance improvements, slick client improvements (love the font smoothing!), and a plethora of other features make this our best beta release to date. Give it a try and let us know what you think. As always, please let us know if you run into issues or regressions. Go to the Java SE 6 Regressions Challenge Page if you identify a regression for a chance to win a Sun Ultra 20 Workstation. For performance issues and questions visit the java.net performance forum.

Tuesday Feb 14, 2006

Java SE Tuning Tip: Server Ergonomics on Windows

J2SE 5.0 Server Ergonomics is not on by default on Windows. The basic reasoning here is that Windows is largely a client platform and automatic server tuning may negatively impact startup performance. We are revisiting this for Mustang, but for now do the following to enable server ergonomics on Windows:

1. Specify JVM tuning options equivalent to server ergonomics:

    java -server -Xmx1g -XX:+UseParallelGC

2. Check to make sure the Server VM is enabled by checking the JVM version:

    $ java -server -Xmx1g -XX:+UseParallelGC -version
    java version "1.6.0-rc"
    Java(TM) 2 Runtime Environment, Standard Edition (build 1.6.0-rc-b69)
    Java HotSpot(TM) Server VM (build 1.6.0-rc-b69, mixed mode)

If you see "Server VM", you're ready to test.
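Besides checking the `java -version` banner, an application can confirm at runtime which VM it landed on via the standard java.vm.name system property; a minimal sketch:

```java
public class WhichVm {
    public static void main(String[] args) {
        // HotSpot reports e.g. "Java HotSpot(TM) Server VM" or
        // "Java HotSpot(TM) Client VM" depending on the VM selected at launch.
        String vm = System.getProperty("java.vm.name");
        System.out.println(vm);
        System.out.println(vm.contains("Server") ? "Server VM active"
                                                 : "Not the Server VM");
    }
}
```

Run it with and without -server to see the property change; this is handy inside startup scripts that log their effective configuration.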

Thursday Feb 02, 2006

SPECjbb2005: A Valid Representation of Java Server Workloads

I was reading some of the other blogs at Sun and noticed some entertaining comments on BMSeer's blog, in particular the comments on the entry titled Sun head-to-head wins again: SPECjbb2005. Specifically, the set of comments is from Robin (basspetersen@yahoo.com). Robin apparently works for, or has a close association with, HP. Hello Robin, I hope you are reading this.

Robin doesn't feel that SPECjbb2005 represents real world Java server applications and workloads, mostly because it doesn't stress the network or I/O subsystems. I strongly disagree, and feel that SPECjbb2005 is a valid representation of Java server workloads and has already had a significant impact on JVM and Java SE performance. Here are a few quotes from Robin's comments:

"It looks like HP is the only company smart enough to stay out of this benchmark game, with no relevance in the real world." ... "JBB pretends to measure the server-side performance of Java runtime environments but it is not at all representative of a real workload. Running unrealistic workloads to measure performance is a disservice to customers."

This statement is a bit naive. SPECjbb2005 has significant features that highlight its relevance to real world workloads. First, garbage collection is part of the measurement interval. SPECjbb2000 called System.gc() before each measurement interval to ease the impact of GC on the score. This was somewhat necessary to have the benchmark scale back in 2000; that is not the case now. Garbage collection is fully a part of this benchmark, and large GC pauses significantly impact benchmark scores. Second, XML DOM Level 3 is part of the benchmark, with 20% of the workload in DOM tree creation and manipulation. Parsing is not included, in order to avoid I/O bottlenecks. Third, the benchmark must run with thread counts (warehouses) 2X the number of hardware threads on the system. A 4-way must run to 8 warehouses. A 32-way must run 64 warehouses.
When did managing 64 threads become trivial and not impacted by system performance? Fourth, many of the optimizations and performance work that started with SPECjbb2005 have had a direct impact on customer and Java EE benchmark performance. Take a look at the latest SPECjappserver2004 world record: BEA WebLogic Server 9.0 on a Sun Fire T2000 cluster running Sun J2SE 5.0_06. Sun's HotSpot J2SE 5.0_06 was the JVM for this benchmark result, the same JVM which currently holds many, many major performance records on SPECjbb2005. If performance optimizations targeted at SPECjbb2005 have a direct impact on Java EE benchmarking, how again is SPECjbb2005 irrelevant?

"In my opinion HP does not want to give credit to a bad benchmark by publishing results. Why should they give you the satisfaction of jumping off the bridge after you? Clearly HP thinks the benchmark is not important."

HP was on the core development team of SPECjbb2005. Take a look at one of my first blog entries announcing SPECjbb2005. Why would HP think a benchmark was not important or irrelevant when they put resources into the development of the benchmark?

Fifth, I/O and network were purposely left out of the benchmark to concentrate on JVM, OS, and hardware performance. The benchmark heavily stresses the memory subsystem with large Java heaps and high memory allocation rates. The OS needs to manage many threads, and possibly many processes, effectively for high performance. SPECjbb2005 stresses the JVM, OS, and memory; it is a complete system benchmark concentrating on Java server performance.

Lastly, I would like to see HP submit SPECjbb2005 numbers; competition leads to innovation and performance optimizations that benefit customers. Chances are HP is plugging away working to improve their HotSpot implementation, preparing for the day they will submit a result.

Wednesday Jan 25, 2006

Sun Fire E6900 and Hotspot dominate SPECjbb2005 under 32 CPUs

The Sun Fire E6900 (24 chips) takes the lead running SPECjbb2005 on configurations with 32 chips or less, with a score of 342,578 bops. This score surpasses the previous high score of 322,719 bops run on a Fujitsu PRIMEQUEST 480 (32 chips!). Why is this result interesting? First, the Sun Fire E6900 surpasses all other competitors in this space, faster than the IBM p5 570 and the Fujitsu PRIMEQUEST 480. Second, and most importantly to me, this is the first of many results that highlight the performance of Sun HotSpot J2SE 5.0_06. Today's a good day for Sun Java performance.

SPEC footnote: SPECjbb2005 Sun Fire E6900 (24-way, 24 chips, 48 cores) 342,578 bops, 28,548 bops/JVM submitted for review; Fujitsu PRIMEQUEST 480 (32 chips, 32 cores) 322,719 bops, 40,340 bops/JVM; IBM eServer p5 570 (8 chips, 16 cores, 16-way) 244,361 bops, 30,545 bops/JVM. SPEC and SPECjbb are registered trademarks of the Standard Performance Evaluation Corporation. Results as of 01/23/06 on www.spec.org.

SPECjbb2005: Single Instance vs. Multiple Instance Competitive Comparisons

SPECjbb2005 can be run in single-instance and multiple-instance modes. Single instance is where one JVM runs the benchmark on a single system. Multiple instance is where n JVMs run in parallel, with the benchmark load distributed among the separate JVM processes. SPECjbb2005 also has two equally important metrics: SPECjbb2005 bops (business operations per second), a measure of overall system throughput, and SPECjbb2005 bops/JVM, a measure of JVM performance and scalability.

Both single- and multi-instance configurations of SPECjbb2005 can provide a sense of hardware, OS, and JVM performance and scalability. However, single-instance configurations put more focus on the throughput delivered by the JVM, whereas multi-instance configurations put more focus on the total throughput delivered by the system. When multiple-instance configurations demonstrate higher throughput than single-instance configurations, it's usually an indication that there's either a JVM limitation, such as maximum heap size or 64-bit JVM performance, or some hardware architectural aspect of the system that multiple JVMs can take advantage of, such as a NUMA memory architecture.

A SPECjbb2005 performance comparison between two hardware platforms is a comparison of the highest bops score as a measure of overall system throughput. When comparing hardware platforms the comparison can be made regardless of the benchmark configuration, but it's important that you choose a configuration type that matches the deployment characteristics of your system as deployed in production. Most large MP servers with greater than 16 hardware threads are deployed with many, many JVM (or OS) instances, and customers are concerned with complete system throughput and scalability.
The comparison is system throughput, not necessarily software component performance, but often JVM scalability is a factor considering each JVM must scale to 8 hardware threads or more. In this case the fastest results by hardware vendor A should be compared to the fastest results by hardware vendor B, with an eye to JVM scalability as measured by the bops/JVM metric. Small x86 or x64 systems with 8 or fewer cores are not typically deployed with more than one JVM. Customers are concerned with total system throughput, but also with efficient system utilization by their Java server software and the JVM. The SPECjbb2005 single-instance configuration is a good match for small systems with fewer than 8 hardware threads. SPECjbb2005 multiple-instance results should not be used to compare systems with fewer than 8 hardware threads, simply because those systems are not typically deployed in production in that fashion. It's the responsibility of the hardware and JVM vendors, along with the benchmark submitter, to hold the line on SPECjbb2005 configuration types and to ensure that the configuration type matches the system under test and, more importantly, how such systems are deployed in production.

JVM performance comparisons using SPECjbb2005 are a bit different. In this case JVM performance and scalability are the concentration and are best demonstrated using the single-instance SPECjbb2005 configuration. When comparing JVMs, multiple-instance results can only be compared to other multiple-instance results, and it's best if each result was run with the same number of JVM instances. Single-instance SPECjbb2005 results on large SMP systems can help give insight into the performance capabilities of the JVM within a given instruction set and the potential scalability characteristics on other supported platforms. The latest SPECjbb2005 scores can be found at http://www.spec.org/jbb2005
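The relationship between the two metrics is simple arithmetic: bops/JVM is the total system bops divided by the number of JVM instances. The sketch below is a hypothetical helper; the round-to-nearest behavior and the 12-instance count for the E6900 result are both inferred from published figures, not stated by SPEC here:

```java
public class BopsPerJvm {
    // bops/JVM = total system bops / number of JVM instances,
    // rounded to the nearest integer (rounding inferred from published results).
    static long bopsPerJvm(long totalBops, int instances) {
        return Math.round((double) totalBops / instances);
    }

    public static void main(String[] args) {
        // Single instance: the two metrics are identical by definition.
        System.out.println(bopsPerJvm(49097, 1));   // 49097
        // Multi-instance: the E6900 result above, assuming 12 instances
        // (342,578 / 28,548 is almost exactly 12).
        System.out.println(bopsPerJvm(342578, 12)); // 28548
    }
}
```

This is why a high bops score with a much lower bops/JVM signals a multi-instance submission: total throughput came from many JVMs, each carrying only a slice of the load.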

Thursday Jan 19, 2006

Sun Hotspot Wins Best Java Virtual Machine

Sun J2SE has won the JDJ Readers' Choice Best Java Virtual Machine Award. Take a look, it's category #16. Congratulations, Java Software!

Wednesday Jan 18, 2006

Sun Hotspot SPECjbb2000 World Record

The last results for SPECjbb2000 have been accepted at SPEC and it's official: Sun HotSpot running on a Fujitsu PRIMEPOWER 25000 holds the end-all SPECjbb2000 world record! As much as I personally disliked this benchmark (I've talked about it quite often), this result is more proof of the world class performance and scalability of Sun J2SE 1.5.0_06. Congratulations to Fujitsu Limited and the Sun HotSpot development team!

Monday Jan 09, 2006

SPECjbb2000 has finally retired

SPECjbb2000 has finally retired! SPECjbb2005 has replaced SPECjbb2000 and the competitive landscape has changed drastically. Strangely, a particular JVM vendor who showed strong performance on SPECjbb2000 doesn't seem to do as well with SPECjbb2005. Hmmm. Gone are the days when a stunt JVM could make broad claims of world record performance based on a 5-year-old benchmark. No more risky lock elision optimizations for 30% gains, and no more special object ordering and prefetching because GC is outside the measurement intervals. Good riddance I say!

Tuesday Dec 13, 2005

Sun's Hotspot JVM = Industry Leading Performance

Sun's HotSpot JVM continues to demonstrate industry leading performance. Here are just a few examples where HotSpot shines.

SPECjbb2005:
Leading x64 on Opteron 2-core result: 27,004 bops, 27,004 bops/JVM; Sun Fire X4100 and Sun Fire X4200
Leading x64 on Xeon 2-core result: 28,314 bops, 28,314 bops/JVM; Fujitsu Siemens Computers PRIMERGY TX300 S2
Leading x64 on Opteron 4-core result: 45,124 bops, 45,124 bops/JVM; Sun Fire X4100 and Sun Fire X4200
Best of class 1U result: 51,540 bops, 12,885 bops/JVM; Sun Fire T1000 (results under review)
Best of class 2U result: 63,378 bops, 15,845 bops/JVM; Sun Fire T2000, powered by UltraSPARC T1

SPECjappserver2004:
SPECjappserver2004 World Record: 6 Sun Fire T2000 servers
SPECjappserver2004 Single J2EE Node World Record: 1 Sun Fire T2000 server

SciMark:
Top 3 submitted results, running Solaris, Linux, and Windows

Please post comments and questions here or on the java.net performance forum sharing your experiences running HotSpot. Yes, I'd love to hear success stories, but what is most important are those situations where performance wasn't what you expected. We are serious about Java performance here at Sun, and want to do what it takes to make every Java user satisfied with the performance of their application. We want to fix any and all performance issues you run into. We can and will continue to demonstrate industry leading performance, but what is most important is broad and reliable JVM performance, which is defined individually by every user's application.

Fine print SPEC disclosure: SPECjbb2005 Sun Fire X4200 (2 chips, 2 cores, 2 threads) 27,004 bops, 27,004 bops/JVM; Fujitsu Siemens Computers PRIMERGY TX300 S2 (2 chips, 2 cores, 4 threads) 28,314 bops, 28,314 bops/JVM; Sun Fire X4200 (2 chips, 4 cores, 4 threads) 45,124 bops, 45,124 bops/JVM; Sun Fire T1000 (1 chip, 8 cores, 32 threads) 51,540 bops, 12,885 bops/JVM submitted for review; Sun Fire T2000 (1 chip, 8 cores, 32 threads) 63,378 bops, 15,845 bops/JVM.
SPEC™ and the benchmark name SPECjbb2005™ are trademarks of the Standard Performance Evaluation Corporation. Competitive benchmark results stated above reflect results published on www.spec.org as of November 30, 2005. For the latest SPECjbb2005 benchmark results, visit http://www.spec.org/osg/jbb2005.

Monday Dec 12, 2005

Sun's Hotspot JVM = Reliable Performance

Take a look at the latest SPECjappserver2004 world record results: BEA WebLogic running on Sun Fire T2000 servers powered by UltraSPARC T1 processors and Sun J2SE 5.0_06. That's right, BEA's "record setting WebLogic 9" set the world records running on Sun's HotSpot JVM. SPECjappserver2004 World Record (Multi-Node). SPECjappserver2004 World Record (2-Node). But how can this be? Sounds like BEA WebLogic relies on the cool performance and reliability of Sun's HotSpot JVM to achieve their world record performance on SPECjappserver2004.

Tuesday Dec 06, 2005

New Java Performance Tuning Whitepaper

Check out our new Java performance tuning whitepaper on java.sun.com. This has been on the Java performance group's to-do list for a very long time; thanks to Tom Marble for making this happen. There's nothing like the kickin' performance of the new UltraSPARC T1 processor, the Sun Fire T1000 and T2000 servers, and our latest update release, J2SE 5.0_06, to give us the needed kick in the pants to put out a tuning guide. This is a work in progress, so your feedback is very much appreciated and needed. Thanks.

UltraSPARC T1 Screams Running Java

Sun has announced the new Sun Fire T1000 and T2000 servers today, along with SPECjbb2005 benchmark results on these systems. What makes these results so special? They run the UltraSPARC T1 processor, with 8 cores and 32 threads on a single chip. The performance of the UltraSPARC T1 systems easily surpasses performance of all other 1U, 2U, or 4U systems. These results also leverage the high performance features in the newly released J2SE 5.0_06.

Take a look at the chart below. The Sun Fire T2000 surpasses all other competition in the 2U and 4U space. The 1U Sun Fire T1000 leads the 1U results. How are these results comparable? It's simple: compare the raw throughput SPECjbb2005 bops scores. One may ask: "How can you compare an 8 core / 32 thread box to a 4 core / 8 thread Power 5+?" It's easy. Chip and core counts are steadily becoming irrelevant. What really matters is how much work (throughput) a system can achieve, how much that system is going to cost to run, and how much lab space, power, and cooling the system will require. Looking at the above results with this in mind clearly shows why the Sun UltraSPARC T1 systems are separate from the pack. Sun Fire UltraSPARC T1 systems are much, much less expensive to run than their competitors. How about those CoolThreads! Here are the details on the configurations compared above.

How about scalability? Here's a good example of how the Sun Fire T2000 and the UltraSPARC T1 processor scale from 1 to 32 threads. Each SPECjbb2005 warehouse is a new thread. Throughput steadily increases as new threads are added, peaking at 32.
Fine print SPEC disclosure: SPECjbb2005 Sun Fire T1000 (1 chip, 8 cores, 32 threads) 51,540 bops, 12,885 bops/JVM submitted for review; Sun Fire T2000 (1 chip, 8 cores, 32 threads) 63,378 bops, 15,845 bops/JVM submitted for review; IBM eServer p5 520 (2 chips, 2 cores, 4 threads) 32,820 bops, 32,820 bops/JVM; AMD Tyan white box (2 chips, 4 cores, 4 threads) 44,574 bops, 44,574 bops/JVM; IBM eServer p5 550 (4 chips, 4 cores, 4 threads) 61,789 bops, 61,789 bops/JVM. SPEC™ and the benchmark name SPECjbb2005™ are trademarks of the Standard Performance Evaluation Corporation. Competitive benchmark results stated above reflect results published on www.spec.org as of November 30, 2005. For the latest SPECjbb2005 benchmark results, visit http://www.spec.org/osg/jbb2005.

[ T: http://technorati.com/tag/NiagaraCMT ]

About

dagastine
