Sun SPARC Enterprise M9000 vs Sun Fire E25k - Datapoints

M9000 vs E25k
By: Mr Benchmark - Sun Solution Center

As you know, Sun has a new high-end server available: the Sun SPARC Enterprise M9000.
Detailed documentation can be found here.
But how do you compare the performance of an E25k with that of an M9000?
Tough question... so here are some datapoints before you start developing your own customer benchmark.


Chips

Let's start with a side-by-side chip comparison:

Chips               UltraSPARC-IV+    SPARC64 VI
Manufacturing       90 nm             90 nm
Die size            356 sq mm         421 sq mm
Transistors         295 million       540 million
Cores               2                 2
Threads/core        1                 2
Frequency           1.8 GHz           2.28 GHz
L1 I-cache          64 KB             128 KB/core
L1 D-cache          64 KB             128 KB/core
On-chip L2 cache    2 MB              6 MB
Off-chip L3 cache   32 MB             None


Interesting, but not necessarily helpful. What is difficult to determine is how the very different memory architectures will influence performance. The SPARC64 VI chip has 3 times more L2 cache but about 5.6 times less total chip cache than the UltraSPARC-IV+. As we know, one popular workload benefited greatly from the addition of an L3 cache on the UltraSPARC-IV+: Online Transaction Processing, or OLTP.
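The two cache ratios quoted above can be checked with a few lines of arithmetic. The sketch below uses Python, which is an arbitrary choice and not something the tests relied on. It assumes that "total chip cache" counts only the L2 and L3 caches and leaves out the L1 caches. The variable names are illustrative.

```python
# Cache sizes in MB, taken from the chip table above.
# Assumption: "total chip cache" counts L2 + L3 only (L1 is ignored).
us4plus = {"l2": 2, "l3": 32}     # UltraSPARC-IV+
sparc64_vi = {"l2": 6, "l3": 0}   # SPARC64 VI has no L3 cache

l2_ratio = sparc64_vi["l2"] / us4plus["l2"]
total_ratio = sum(us4plus.values()) / sum(sparc64_vi.values())

print(f"L2 cache: SPARC64 VI has {l2_ratio:.1f}x more")
print(f"L2 + L3:  SPARC64 VI has {total_ratio:.2f}x less")
```

Under that assumption the second ratio comes out at 34 MB / 6 MB, or roughly 5.67, which is in line with the "5.6 times less" figure in the text.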

A note on multi-threading: the two threads of the SPARC64 VI processor are not designed to double the throughput of a single core. The goals are to minimize CPU core wait time and increase CPU core utilization. A critical piece of information is that the two threads share the two Translation Lookaside Buffers. All of the results below were obtained with the second thread disabled on each core, as we obtained similar or better performance doing so. More info can be found here.

Of course, this is only a chip comparison. Let's learn more with a side-by-side view of the M9000 and E25k servers:

Servers

Systems              E25k              M9000
Max processors       72                64
Max cores            144               128
Max HW threads       144               256
Max memory           1152 GB           2048 GB
Memory bandwidth     173 GB/s          737 GB/s
I/O bandwidth        36 GB/s           244 GB/s
Max internal disks   0                 64
Max domains          18                24
OS support           Solaris 9 or 10   Solaris 10 U4
Media                None              DVD, DAT
Power type           1 phase           1 or 3 phase
Max power            30.6 kW           42.6 kW
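One way to read the system-wide maximums is to divide them by the maximum core count. The sketch below does that for memory capacity and memory bandwidth. It is only arithmetic on the table's figures, not a measurement, and the dictionary keys and the `per_core` helper are names made up for this illustration.

```python
# Maximum-configuration figures from the server table above.
servers = {
    "E25k":  {"cores": 144, "mem_gb": 1152, "mem_bw_gbs": 173},
    "M9000": {"cores": 128, "mem_gb": 2048, "mem_bw_gbs": 737},
}

def per_core(server, key):
    """Divide a system-wide maximum by the maximum core count."""
    return servers[server][key] / servers[server]["cores"]

for name in servers:
    print(f"{name}: {per_core(name, 'mem_gb'):.1f} GB RAM per core, "
          f"{per_core(name, 'mem_bw_gbs'):.2f} GB/s memory bandwidth per core")
```

These per-core figures describe theoretical capacity only, which is why they still say little about delivered performance.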

With this table, we certainly have a better idea of the immense capacity of these servers, but it still does not help us estimate performance. I did not have the luxury of testing two fully loaded servers, so here is what I tested. The big decision was to use the same number of CPU boards (called CMUs on the M9000) on each system.


So here is the tested hardware:

Hardware Stack

Role                    Model      Qty   System clock freq.   Sockets@Freq. (per domain)   RAM (per domain)
SPARC64 VI server       M9000-32   1     960 MHz              16 @ 2280 MHz (32 cores)     256 GB
UltraSPARC-IV+ server   E25k       1     150 MHz              16 @ 1800 MHz (32 cores)     256 GB
Console                 X4200      1     800 MHz              2 @ 2600 MHz                 8 GB
Storage                 SE6540     1     8 x RAID1            56 x 73 GB 15k drives        8 GB





Frequency

Regarding performance, the first metric we can look at is the basic CPU frequency ratio.
This value is a good starting point on which to base our expectations, even though we know that comparing frequencies across different chips has little meaning.

Server       M9000      E25k
Frequency    2280 MHz   1800 MHz
Comparison   1.26       1
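The comparison row is just the quotient of the two clock frequencies. Here is a minimal sketch of that arithmetic, together with a hypothetical `naive_projection` helper that shows what a frequency-only expectation would look like. The helper is not part of the benchmark suite. It exists only to make the assumption explicit.

```python
# Clock frequencies in MHz, from the frequency table above.
M9000_MHZ = 2280
E25K_MHZ = 1800

freq_ratio = M9000_MHZ / E25K_MHZ

def naive_projection(e25k_value):
    """Scale an E25k throughput figure by the clock ratio alone.

    Hypothetical helper: it assumes performance tracks frequency,
    which is an assumption rather than a measured result.
    """
    return e25k_value * freq_ratio

print(f"Frequency ratio: {freq_ratio:.3f}")
print(f"100 units/s on the E25k -> {naive_projection(100):.1f} units/s projected")
```

The exact quotient is about 1.267, which the table shows as 1.26.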

Can we conclude that we will observe a 1.26x speedup if we upgrade our current 4-board E25k to a 4-CMU M9000?

Java workloads

Not exactly. So let's try to be a little more specific, using four different 100% Java (1.6) workloads:
  1. iGenCPU v3 - Fractal simulation 50% Integer / 50% floating point
  2. iGenRAM v3 - Lotto simulation (Memory allocation and search)
  3. iGenBATCH v2 - (Oracle 10g batch using partitioning, triggers, stored procedures and sequences)
  4. iGenOLTP v4 - (Heavy-weight OLTP)
Note: These workloads are available for your own use. Just let me know at benoit@sun.com

Datapoints

The values shown here are peak results obtained by building the complete scalability curve. Response times are averages measured at peak throughput, in milliseconds.



Workload       E25k throughput        E25k RT (ms)   M9000 throughput       M9000 RT (ms)
iGenCPU v3     303 fractals/second    105            728 fractals/second    44
iGenRAM v3     2865 lottos/ms         55             4881 lottos/ms         17
iGenBATCH v2   35 TPS                 907            50 TPS                 626
iGenOLTP v4    3938 TPM               271            4500 TPM               351

Since we are trying to compare against the M-value factor of 1.33, let's normalize those results by assigning a factor of 1 to the E25k.

First, here is the throughput:

Throughput     E25k   M9000
iGenCPU v3     1      2.403
iGenRAM v3     1      1.704
iGenBATCH v2   1      1.450
iGenOLTP v4    1      1.143
Frequency      1      1.26

Which gives this chart:

[Chart: M9E25k_Thr, throughput relative to the E25k]
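The relative throughput factors can be recomputed from the peak values in the datapoints table. The sketch below does so with a small `relative` helper, a name chosen here for illustration. It works from the rounded figures published in this post, so a factor can differ slightly from the table. The batch workload, for instance, comes out near 1.43 rather than 1.450, presumably because the original factors were derived from unrounded measurements.

```python
# Peak throughput from the datapoints table as (E25k, M9000).
# Units differ per workload, which is fine: only the ratio is computed.
peak_throughput = {
    "iGenCPU v3":   (303, 728),     # fractals/second
    "iGenRAM v3":   (2865, 4881),   # lottos/ms
    "iGenBATCH v2": (35, 50),       # TPS
    "iGenOLTP v4":  (3938, 4500),   # TPM
}

def relative(e25k, m9000):
    """Return the M9000 value relative to an E25k baseline of 1."""
    return m9000 / e25k

factors = {name: relative(*pair) for name, pair in peak_throughput.items()}
for name, factor in factors.items():
    print(f"{name:<13} 1  {factor:.3f}")
```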


Performance notes on throughput

  1. As you can see, for pure CPU calculations, the M9000 is 2.4 times more powerful than the E25k, way beyond the M-value.
  2. Memory allocation and access are much faster on the M9000, resulting in a 1.7 times increase in throughput.
  3. Only one index is below the M-value: OLTP. It seems that the large reduction in total chip cache (all levels) has a big impact on this workload.

And here is the average response time at peak throughput (still using a base of 1 for the E25k):

RT             E25k   M9000
iGenCPU v3     1      0.419
iGenRAM v3     1      0.301
iGenBATCH v2   1      0.690
iGenOLTP v4    1      1.295


And the chart:

[Chart: M9E25k_RT, response time relative to the E25k]
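The response-time factors are easier to read as percentage changes. The sketch below derives them from the rounded millisecond values in the datapoints table. The `rt_change_percent` helper is a name invented for this example. Since the inputs are rounded, the results may not match the published factors exactly. The RAM workload, for example, works out to a factor near 0.31 rather than 0.301.

```python
# Average response time at peak throughput, in ms, as (E25k, M9000).
response_ms = {
    "iGenCPU v3":   (105, 44),
    "iGenRAM v3":   (55, 17),
    "iGenBATCH v2": (907, 626),
    "iGenOLTP v4":  (271, 351),
}

def rt_change_percent(e25k_ms, m9000_ms):
    """Percentage change in response time going from E25k to M9000."""
    return (m9000_ms / e25k_ms - 1) * 100

changes = {w: rt_change_percent(*pair) for w, pair in response_ms.items()}
for workload, pct in changes.items():
    print(f"{workload:<13} {pct:+.1f}%")
```

A negative value means the M9000 responds faster. Only OLTP shows a positive change.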




Performance notes on response time

  1. The CPU and RAM micro-benchmarks show very impressive improvements in response time. What takes 1 s on the E25k takes about 300 to 400 ms on the M9000 at peak throughput.
  2. Because of the richness of the batch benchmark and the inclusion of CPU-intensive Oracle stored procedures, we observe a nice factor of 0.69.
  3. Oracle OLTP is disappointing on the M9000, with an increase in response time at peak throughput. Upcoming releases of Solaris and Oracle 10g should improve this result.

As you can see from the diversity of these factors, we should be really busy in the Sun Solution Center customer benchmarking group. There is no magic number... and yes, it is only by testing your own application that you will obtain the relevant numbers.

See you next time in the wonderful world of benchmarking....



Comments:

I don't think you really mean those system max memory sizes in the System table do you?

Posted by James Mansion on August 20, 2007 at 07:16 PM PDT #

The system max memory sizes are correct.

Posted by MrBenchmark on August 21, 2007 at 02:22 AM PDT #

Actually E25K (the official name is not E25000) can have 1152GB of memory with the high-density dimms or 576 with the medium-density dimms. There are 18 boards not 16 boards so that is why it is more than just 1/2TB or 1TB.

Posted by BM Seer on August 21, 2007 at 06:20 AM PDT #

BM Seer;

Thanks for the detailed memory information. It is now corrected...

Posted by MrBenchmark on August 21, 2007 at 07:05 AM PDT #

BM Seer, you say "(the official name is not E25000)". Then what is the official name?

Posted by Egitim on December 09, 2010 at 10:52 PM PST #
