Java Rocks Intel Nehalem

Today Intel announced its next-generation server processors, the Xeon 5500 series, code-named Nehalem.

This processor has been on our radar for performance optimizations since 2007, and the Sun and Intel Java VM and Performance teams have pushed the new chip to higher and higher levels.

Today I'm pleased to announce world class SPECjbb2005 results: Sun Java 6 Update 14 64-bit Server Performance Release, running on a Sun branded system with 2 Intel Xeon X5570 CPUs, scored 556,882 SPECjbb2005 bops and 278,411 SPECjbb2005 bops/JVM. Details on the system configuration will be available later this month.

This proves once again that Sun Java, powered by the HotSpot Server VM, is the fastest, most reliable, and most widely deployed open source Java Runtime in the world. World class reliability and performance and still open source: we can beat the proprietary competition even though they can use all of our source code.

World Class Performance on 2-Chip Systems: 556,882 SPECjbb2005 bops, 278,411 SPECjbb2005 bops/JVM.





Here's what we've done for Nehalem:
  • 64-bit Compressed Oops Performance Optimization: 64-bit now surpasses 32-bit performance.
  • Optimized class libraries to reduce cache misses and decrease path length
  • NUMA-aware Garbage Collection
  • XML library Optimizations
  • SSE 4.2 Support
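
For readers who want to check which of these features are switched on in their own JVM, here is a minimal sketch. It assumes that the compressed oops, NUMA and throughput collector items above correspond to the HotSpot options `-XX:+UseCompressedOops`, `-XX:+UseNUMA` and `-XX:+UseParallelGC`; the post itself does not name any flags, and the class `JvmFlagCheck` and its helper are hypothetical names made up for illustration. It only reports what was passed on the command line, not the defaults the JVM picks on its own.

```java
import java.lang.management.ManagementFactory;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of the JDK or of the original post.
class JvmFlagCheck {

    /**
     * Looks for an explicit -XX:+name or -XX:-name among the JVM arguments.
     * Returns TRUE or FALSE for the last occurrence found, or null if the
     * flag was never mentioned (the JVM default then applies).
     */
    static Boolean explicitSetting(List<String> jvmArgs, String name) {
        Boolean setting = null;
        for (String arg : jvmArgs) {
            if (arg.equals("-XX:+" + name)) {
                setting = Boolean.TRUE;
            } else if (arg.equals("-XX:-" + name)) {
                setting = Boolean.FALSE;
            }
        }
        return setting;
    }

    public static void main(String[] ignored) {
        // Arguments passed to the running JVM (not the arguments to main).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();

        // Assumed mapping from the feature list to HotSpot option names.
        for (String name : Arrays.asList("UseCompressedOops", "UseNUMA", "UseParallelGC")) {
            Boolean setting = explicitSetting(jvmArgs, name);
            String state = (setting == null)
                    ? "not set on the command line (JVM default applies)"
                    : (setting ? "enabled" : "disabled");
            System.out.println(name + ": " + state);
        }
    }
}
```

One way to try it would be `java -XX:+UseParallelGC -XX:+UseNUMA JvmFlagCheck` after compiling; whether a given option is accepted depends on the JVM build in use.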
Congratulations to the Sun Java VM Technologies team and the Intel Collaboration Team!

SPEC Disclosure Statement
SPEC, SPECjbb reg tm of Standard Performance Evaluation Corporation. Sun Fire X4450 results submitted to SPEC. Other results as of 03/30/2009 on www.spec.org. Sun Branded Intel Xeon X5570 System (2 chips, 8 cores, Sun JDK 6u14-P) SPECjbb2005 bops = 556,882, SPECjbb2005 bops/JVM = 278,411.

Comments:

Hi,

Great work on having the performance enhancements ready on the launch of the new processor. It's also great to see the 64-bit JDK finally surpassing the 32-bit one in terms of performance (for some benchmarks at least).

"Optimized class libraries to reduce cache misses and decrease path length"

Do you have any additional information on this (e.g. what areas of the library were changed)?

"NUMA-aware Garbage Collection"

I assume the throughput collector was used?

Thanks,
Ismael

Posted by Ismael Juma on March 30, 2009 at 12:26 PM EDT #

We modified several class libraries to reduce path length. The biggest changes to reduce cache misses were in the collection classes.

The NUMA changes were in the throughput collector, however many of the ideas will find their way into G1. The changes were to the young generation and the survivor spaces. We'll have an impressive single JVM result to talk about in a couple of weeks.

Posted by David Dagastine on March 30, 2009 at 12:36 PM EDT #

"we can beat the proprietary competition even though they can use all of our source code."

With GPL licensing for OpenJDK, isn't this a fairly inaccurate statement?

Posted by naveed on March 30, 2009 at 02:57 PM EDT #

Those are very impressive numbers. The NUMA changes are surely more to do with allocation - creation of lgrps - than collection?

It strikes me that this benchmark's biggest failing (although there are many) is that it has a tiny data set. When run with Sun/OpenJDK compressed oops - or jrockit x86-64 with less than 4 gbytes of heap - the default working data set fits into cache.

This means that SPECJBB2005 is not remotely representative of real-world server-side workloads where the data set simply cannot be expected to be that small.

Is there any chance we can get a "SPECJBB2005a" that increases the item count from 20,000 to say 200,000 or more? That would make the benchmark much more representative of real-world large-heap server side apps.

Posted by Paul Murray on March 30, 2009 at 07:37 PM EDT #

Well, Nehalem still looks a \*lot\* like AMD's quad cores, too much for my taste.

I find it a bit disappointing that a company as large/rich as Intel does almost a 1:1 copy of a very small competitor's product (memory controller on-die, interconnect buses instead of FSB, 3-tier cache architecture).

Nonetheless, this finally solves many of the scaling problems earlier Core-based Xeons experienced.

Posted by Clemens Eisserer on March 30, 2009 at 10:51 PM EDT #

Thank you for the answers David.

Clemens, Nehalem may look like AMD quad cores on the surface, but they are a lot better in reality. There are plenty of benchmarks around, for one example, see:

http://it.anandtech.com/IT/showdoc.aspx?i=3536

The things you mention are logical progressions of an architecture. AMD itself borrowed many ideas from the DEC Alpha EV7.

Having said that, AMD should be applauded for bringing these enhancements to the x86 architecture (including the introduction of x86-64) as Intel might have pushed us towards the Itanium otherwise.

Best,
Ismael

Posted by Ismael Juma on March 30, 2009 at 11:21 PM EDT #

Sorry, I forgot to add that the mods to the collections stuff seem to be generally very useful - they double the numbers from the derby benchmark on SPECJVM2008. Do you have a feel for when they will be production-ready?

Posted by Paul Murray on March 31, 2009 at 12:29 AM EDT #

"World class reliability and performance and still open source"
Congrats on the numbers and on being free software/open source.

I couldn't immediately find the source changes for the collection stuff you mention. Are they already in the openjdk repository?

Posted by Mark Wielaard on March 31, 2009 at 02:08 AM EDT #

A few responses to comments:
naveed, yes, GPL requires that changes made must be given back. However, the proprietary competition are also Java licensees, which gives them unrestricted access to our source code while in development.

Paul Murray, we understand the limitations of SPECjbb2005, especially when running small warehouse counts such as with this configuration. However, on large configurations, Sun CMT systems for instance, it's a much larger data set and is certainly out of L2. SPEC plans to fix this issue in the next generation of the benchmark. Also, yes, I agree the collections work is very useful. All library work will be included in JDK 6 Update 14. The only difference with JDK 6 Update 14 Performance is the JVM.

Posted by David Dagastine on March 31, 2009 at 02:14 AM EDT #

Thanks for the reply David. You say: "Also, yes, I agree the collections work is very useful. All library work will be included in JDK 6 Update 14. The only difference with JDK 6 Update 14 Performance is the JVM."

Could you clarify how the Performance and Update 14 releases relate to the OpenJDK code? Does this mean that the library work and the JVM changes are already in OpenJDK, or are they only going to be in the proprietary fork?

Posted by Mark Wielaard on March 31, 2009 at 05:20 AM EDT #

The library changes need additional work to integrate into OpenJDK, but that work is in progress now. The JVM changes are already in, or are about to go into, OpenJDK. No proprietary fork, it's all going into OpenJDK.

Posted by David Dagastine on March 31, 2009 at 07:36 AM EDT #

How do the other JVMs perform on the same hardware setup? I am sure you could overclock the Nehalem and break the record again, but this won't tell us how much faster the JVM actually is. A comparison between the latest 1.5, 1.6 and the performance release would be interesting.

Posted by mbien on March 31, 2009 at 10:25 AM EDT #

I'm lucky enough to have a Nehalem demo box in our lab that I'm going to be evaluating later in the week. On our 'old' X5450 Xeons I'm using Sun JRE 6u12. What JRE/JDK would you recommend for performance testing the Nehalem chips? For our production systems we'll be sticking with the Sun JRE for the foreseeable future, but I'm flexible for the purposes of evaluation. We have a fairly high throughput application (allocating ~300MB/sec) and we would like to see how it responds to the new architecture. Thanks for your help.

Posted by Jeff Hiltz on April 14, 2009 at 08:47 AM EDT #
