Sun SPARC Enterprise M9000 vs Sun Fire E25k - Datapoints
By mrbenchmark on Aug 20, 2007
As you know, Sun has a new high-end server available: the Sun SPARC Enterprise M9000.
Detailed documentation can even be found here.
But how do you compare the performance of an E25k versus an M9000?
Tough question... so here are some datapoints before you start developing your own customer benchmark.
Let's start with a side-by-side chip comparison:
|                 | UltraSPARC IV+ (E25k) | SPARC64 VI (M9000) |
|-----------------|-----------------------|--------------------|
| Die size        | 356 sq mm             | 421 sq mm          |
| Transistors     | 295 million           | 540 million        |
| L1 I-cache      | 64 KB                 | 128 KB/core        |
| L1 D-cache      | 64 KB                 | 128 KB/core        |
| On-chip L2 cache| 2 MB                  | 6 MB               |
| Off-chip L3 cache| 32 MB                | None               |
Interesting, but not necessarily helpful. What is hard to determine is how the very different memory architectures will influence performance levels. The SPARC64 VI chip has 3 times more L2 cache, but about 5.6 times less total chip cache. As we know, one popular workload was very much influenced by the addition of an L3 cache on the UltraSPARC IV+: Online Transaction Processing, or OLTP.
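As a back-of-the-envelope check of those cache figures (my own arithmetic on the table above, counting only L2 and L3; folding in the L1s shrinks the gap somewhat):

```python
# Cache sizes per chip, in MB, taken from the comparison table above.
us_iv_plus = {"L2": 2, "L3": 32}   # UltraSPARC IV+: on-chip L2 + off-chip L3
sparc64_vi = {"L2": 6}             # SPARC64 VI: on-chip L2 only, no L3

l2_ratio = sparc64_vi["L2"] / us_iv_plus["L2"]
total_ratio = sum(us_iv_plus.values()) / sum(sparc64_vi.values())

print(f"SPARC64 VI has {l2_ratio:.0f}x the L2 cache")               # 3x
print(f"...but {total_ratio:.1f}x less total cache (L2 + L3)")      # ~5.7x
```

The result lands at roughly 5.7×, in the same ballpark as the 5.6× quoted above.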
A note on multi-threading: the two threads of the SPARC64 VI processor are not designed to double the throughput of a single core. The goals are to minimize CPU core wait time and increase CPU core utilization. A critical piece of information is that the two threads share the two Translation Lookaside Buffers. All of the results below have been obtained with the second thread disabled on each core, as we obtained similar or better performance doing so. More info can be found here.
And of course this is only a chip comparison; let's learn more with a side-by-side view of the E25k and M9000 servers:
|                    | E25k            | M9000          |
|--------------------|-----------------|----------------|
| Max HW threads     | 144             | 256            |
| Max memory         | 1152 GB         | 2048 GB        |
| Memory bandwidth   | 173 GB/s        | 737 GB/s       |
| I/O bandwidth      | 36 GB/s         | 244 GB/s       |
| Max internal disks | 0               | 64             |
| OS support         | Solaris 9 or 10 | Solaris 10 U4  |
| Power type         | 1 phase         | 1 or 3 phase   |
| Max power          | 30.6 kW         | 42.6 kW        |
With this table, we certainly have a better idea of the immense capacity of these servers, but it still does not help us estimate performance. Now, I did not have the luxury of testing two fully loaded servers... so here is what I tested. The big decision was to use the same number of CPU boards on each system (called CMUs on the M9000).
So here is the tested hardware:

|                          | M9000 (SPARC64 VI)       | E25k (UltraSPARC IV+)    |
|--------------------------|--------------------------|--------------------------|
| CPUs (system clock freq.)| 16 @ 2280 MHz (32 cores) | 16 @ 1800 MHz (32 cores) |

Storage: 56 x 73 GB 15k RPM drives.
Regarding performance, the first metric we can look at is the basic CPU frequency ratio.
This value is a good starting point for our expectations, even though we know that comparing frequencies across different chips has little meaning.
|           | M9000    | E25k     |
|-----------|----------|----------|
| Frequency | 2280 MHz | 1800 MHz |
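That ratio is a one-line computation (nothing more than the division implied by the table above):

```python
m9000_mhz = 2280   # SPARC64 VI clock
e25k_mhz = 1800    # UltraSPARC IV+ clock

# Naive expectation: throughput scales with clock frequency.
print(m9000_mhz / e25k_mhz)  # ~1.2667
```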
Can we conclude that we will observe a 1.26x speed-up if we upgrade our current 4-board E25k to a 4-CMU M9000?
Not exactly. So let's try to be a little more specific, using four different 100% Java (1.6) workloads:
- iGenCPU v3 - Fractal simulation (50% integer / 50% floating point)
- iGenRAM v3 - Lotto simulation (memory allocation and search)
- iGenBATCH v2 - Oracle 10g batch using partitioning, triggers, stored procedures and sequences
- iGenOLTP v4 - Heavy-weight OLTP
The values shown here are peak results obtained by building the complete scalability curve. The response times mentioned are averages at peak, in milliseconds.
| Workload     | E25k throughput    | E25k RT (ms) | M9000 throughput   | M9000 RT (ms) |
|--------------|--------------------|--------------|--------------------|---------------|
| iGenCPU v3   | 303 fractals/second| 105          | 728 fractals/second| 44            |
| iGenRAM v3   | 2865 lottos/ms     | 55           | 4881 lottos/ms     | 17            |
| iGenBatch v2 | 35 TPS             | 907          | 50 TPS             | 626           |
| iGenOLTP v4  | 3938 TPM           | 271          | 4500 TPM           | 351           |
As we are trying to compare against the frequency-based M-value factor, let's look at those results by normalizing the E25k to a factor of 1.
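That normalization is straightforward; here is a small script, with the raw numbers copied from the results table above, that recomputes the factors:

```python
# Peak results from the table above: (throughput, response time in ms).
e25k = {
    "iGenCPU v3":   (303, 105),
    "iGenRAM v3":   (2865, 55),
    "iGenBatch v2": (35, 907),
    "iGenOLTP v4":  (3938, 271),
}
m9000 = {
    "iGenCPU v3":   (728, 44),
    "iGenRAM v3":   (4881, 17),
    "iGenBatch v2": (50, 626),
    "iGenOLTP v4":  (4500, 351),
}

# Normalize the M9000 against the E25k (E25k = 1.0).
for workload in e25k:
    tput_factor = m9000[workload][0] / e25k[workload][0]
    rt_factor = m9000[workload][1] / e25k[workload][1]
    print(f"{workload}: throughput x{tput_factor:.2f}, "
          f"response time x{rt_factor:.2f}")
```

The output matches the factors in the notes that follow: 2.4x on throughput for iGenCPU, 1.7x for iGenRAM, and a 0.69 response-time factor for iGenBatch.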
First, here is throughput, with the E25k as the 1.0 baseline (chart omitted): iGenCPU 2.40, iGenRAM 1.70, iGenBatch 1.43, iGenOLTP 1.14.
Performance notes on throughput
- As you can see, for pure CPU calculations, the M9000 is 2.4 times more powerful than the E25k. Way beyond the M-value.
- Memory allocation & access times are really faster on the M9000, driving a 1.7 times increase in throughput.
- Only one index is below the M-value: OLTP. It seems that the large reduction in total chip cache (all levels) has a big impact on this workload.
And here is the average response time at peak throughput, still normalized to the E25k (chart omitted): iGenCPU 0.42, iGenRAM 0.31, iGenBatch 0.69, iGenOLTP 1.30.
Performance notes on response time
- The CPU & RAM micro-benchmarks show very impressive improvements in response time: what takes 1 s on the E25k takes about 400 ms on the M9000 at peak throughput.
- Because of the richness of the batch benchmark and the inclusion of CPU-intensive Oracle stored procedures, we observe a nice factor of 0.69.
- Oracle OLTP is disappointing on the M9000, with an increase in response time at peak throughput. Upcoming releases of Solaris and Oracle 10g should improve this result.
As you can tell from the diversity of these factors, we should be really busy in the Sun Solution Center - Customer Benchmarking group. There is no magic number... and yes, it is only by testing your own application that you will obtain the relevant numbers.
See you next time in the wonderful world of benchmarking....