Comparative data of ORACLE 10g on SPARC & SOLARIS 10

Oracle 10g OLTP performance on SPARC chips








A boring ratio

Customers would love their performance levels to be tied directly to their hardware. But more often than you might think, they migrate from System X (designed 10 years ago) to System Y (fresh from the oven) and are surprised by the size of the performance improvement. In the past two years, we have completed many successful migrations from F15k/E25k servers to the new M9000 Enterprise Servers, and customers have reported great improvements in throughput and response time. But what can you really expect, and what share of the improvement is actually due to operating system enhancements? And can the recent small frequency increase on our SPARC64 VII chip be of any interest at all? The new SPARC64 VII 2.88 GHz, available on our M8000 and M9000 flagships, brings no architectural change and no additional features, only a modest frequency increase from 2.52 GHz to 2.88 GHz, a ratio of 1.14. We could stop our analysis there and label this change 'marginal' or 'not interesting'. But my initial tests showed a peak OLTP throughput ratio well above this frequency-based ratio.
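The frequency-based expectation is simple arithmetic. The short Python sketch below computes it from the two clock speeds quoted above. Treating this ratio as the expected throughput gain is itself an assumption (it ignores memory latency, I/O and operating system effects), and it is exactly the assumption the rest of this article puts to the test.

```python
# Clock speeds of the two SPARC64 VII speed grades discussed in this article.
OLD_GHZ = 2.52
NEW_GHZ = 2.88


def frequency_ratio(old_ghz: float, new_ghz: float) -> float:
    """Speedup expected if throughput scaled purely with clock frequency (an assumption)."""
    return new_ghz / old_ghz


ratio = frequency_ratio(OLD_GHZ, NEW_GHZ)
print(f"frequency-based ratio: {ratio:.2f}x")  # frequency-based ratio: 1.14x
```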



What happened?












A passion for Solaris

Most long-term Sun employees have a passion for Solaris. Solaris is the uncontested Unix leader and includes so many features that, once you are a Solaris addict, it is difficult to fall in love with another operating system. And Oracle executives made no mistake: Sun has the best UNIX kernel and performance engineers in the world. Without them, Solaris would not scale today to a 512-hardware-thread system (M9000-64).

But of course, Solaris is a moving target. Every release brings its truckload of features, bug fixes and performance improvements. Here are the critical fixes made between Solaris 10 Update 4 and the brand-new Solaris 10 Update 8 that influence Oracle performance on the M9000:

  • In Solaris 10 Update 5 (05/08), we optimized interrupt management (cr=5017144) and math operations (cr=6491717). We also streamlined CPU yield (cr=6495392) and the cache hierarchy (cr=6495401).

  • In Solaris 10 Update 6 (10/08), we optimized libraries and implemented shared context for Jupiter, the SPARC64 VII (cr=6655597 & 6642758).

  • In Solaris 10 Update 7 (05/09), we enhanced MPxIO as well as the PCI framework (cr=6449810 and others) and improved thread scheduling (cr=6647538). We also enhanced mutex operations (cr=6719447).

  • Finally, in Solaris 10 Update 8 (10/09), after long customer escalations, we fixed the single-threaded nature of callout processing (cr=6565503-6311743). This is critical for all calls to nanosleep and usleep. We also improved the throughput and latency of the very common e1000g driver (cr=6335837 + 5 more) and optimized the mpt driver (cr=6784459). We cleaned up interrupt management (cr=6799018) and optimized bcopy and kcopy operations (cr=6292199). Lastly, we improved some single-threaded operations (cr=6755069).

My initial SPARC64 VII iGenOLTP tests were done with Solaris 10 Update 4. But I could not test the new SPARC64 VII 2.88 GHz on that release, because it is not supported there! Therefore, I compared the new chip with the SPARC64 VII 2.52 GHz running both S10U4 and S10U8. We will see below that most of the improvement comes not from the frequency increase but from Solaris itself.






Chips & Chassis

Below are the key characteristics of the chips we tested:



| Chips | UltraSPARC IV+ | SPARC64 VI | SPARC64 VII | SPARC64 VII (+) |
|---|---|---|---|---|
| Manufacturing process | 90 nm | 90 nm | 65 nm | 65 nm |
| Die size | 356 mm² | 421 mm² | 421 mm² | 421 mm² |
| Transistors | 295 million | 540 million | 600 million | 600 million |
| Cores | 2 | 2 | 4 | 4 |
| Threads/core | 1 | 2 | 2 | 2 |
| Total threads | 2 | 4 | 8 | 8 |
| Frequency | 1.5 GHz | 2.28 GHz | 2.52 GHz | 2.88 GHz |
| L1 I-cache | 64 KB | 128 KB/core | 512 KB | 512 KB |
| L1 D-cache | 64 KB | 128 KB/core | 512 KB | 512 KB |
| On-chip L2 | 2 MB | 6 MB | 6 MB | 6 MB |
| Off-chip L3 | 32 MB | None | None | None |
| Max Watts | 56 W | 120 W | 135 W | 140 W |
| Watts/thread | 28 W | 30 W | 17 W | 17 W |



Note on (+): The new 2.88 GHz SPARC64 VII is not officially labeled with a plus sign, reflecting the absence of new features; the (+) is used here only to distinguish it from the 2.52 GHz part.
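The Watts/thread row is simply Max Watts divided by total hardware threads. The minimal Python sketch below recomputes it from the table's own values; the dictionary layout and the function name are illustrative only.

```python
# (max watts, total hardware threads) per chip, copied from the table above.
CHIPS = {
    "UltraSPARC IV+": (56, 2),
    "SPARC64 VI": (120, 4),
    "SPARC64 VII": (135, 8),
    "SPARC64 VII (+)": (140, 8),
}


def watts_per_thread(max_watts: float, threads: int) -> float:
    """Worst-case power budget per hardware thread."""
    return max_watts / threads


for name, (watts, threads) in CHIPS.items():
    print(f"{name:16s} {watts_per_thread(watts, threads):6.3f} W/thread")
```

The exact quotients are 28, 30, 16.875 and 17.5 W; the table lists them as whole watts. Either way, the four-core SPARC64 VII cuts the per-thread power budget by roughly 40% compared with the earlier dual-core chips, even though its total power per chip is higher.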





Now, here is our hardware list. Note that, to avoid the need for a huge client system, we ran this iGenOLTP workload in console/server mode: the Java processes sending SQL queries via JDBC run directly on the server under test. While this model was unusual ten years ago, in the client/server era, it is increasingly common in new customer deployments.



| Servers | E25k | M9000-32 | M9000-32 | M9000-32 |
|---|---|---|---|---|
| Chip | UltraSPARC IV+ | SPARC64 VI | SPARC64 VII | SPARC64 VII (+) |
| # chips | 8 | 8 | 8 | 8 |
| Total hardware threads | 16 | 16 | 32 | 32 |
| Frequency | 1.5 GHz | 2.28 GHz | 2.52 GHz | 2.88 GHz |
| System clock | 150 MHz | 960 MHz | 960 MHz | 960 MHz (~) |
| RAM | 64 GB | 64 GB | 64 GB (*) | 64 GB (*) |
| Operating system | Solaris 10 Update 4 | Solaris 10 Update 4 | Solaris 10 Update 4 & 8 | Solaris 10 Update 8 |






| Console system | Storage |
|---|---|
| X4240 | SE9990V [shared] |
| Opteron quad-core | 64 GB cache |
| 2 x 2.33 GHz | 25 TB |
| | 200 Hitachi HDDs, 15k RPM |
| | 8 x 2 Gbit/s |




Note on (~): While the system clock has not changed, the new M9000 CMUs are equipped with an optimized Memory Access Controller labeled MAC+. The MAC+ chip set is critical for system reliability, in particular for the memory mirroring and memory patrolling features. We have not identified performance improvements linked to this new feature.

Note on (*): These domains have 128 GB of memory in total. For an apples-to-apples comparison, 64 GB of that memory is allocated, populated and locked in place with my very own _shmalloc tool, leaving 64 GB for the test, as on the other domains.






Chart

The iGenOLTPv4 workload is a lightweight, Java-based OLTP database workload. Simulating a classic Order Entry system, it is run in stream mode (i.e., with no wait time between transactions). For this particular exercise, we created a very large database of 8 TB in total, stored on the SE9990V using Oracle ASM. We query 100 million customer identifiers in this database to create an I/O-intensive (but not I/O-bound) workload similar to the largest OLTP installations in the world (for example, the E25ks running the bulk of Oracle's internal applications). The exact throughput, in transactions per second, and the average response time are reported and aggregated for each scalability level. For this test, we used Solaris 10 Update 4 and 8, Java 1.6 build 16, and Oracle Database server 10.2.0.4.
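iGenOLTP's own code is not shown here, so the following Python sketch only illustrates the stream-mode principle under stated assumptions: each worker thread issues a new transaction as soon as the previous one completes (no think time), and one throughput/response-time pair is recorded per scalability level. `fake_transaction` is a hypothetical stand-in for a JDBC round trip to Oracle, and a real driver would also need warm-up and ramp-down handling.

```python
import threading
import time
from statistics import mean
from typing import Callable, List, Tuple


def run_stream(transaction: Callable[[], None], workers: int,
               duration_s: float) -> Tuple[float, float]:
    """Run `transaction` back to back from `workers` threads for `duration_s` seconds.

    Returns (throughput in transactions/second, average response time in ms).
    """
    latencies: List[float] = []
    lock = threading.Lock()
    began = time.perf_counter()
    deadline = began + duration_s

    def worker() -> None:
        local: List[float] = []
        # Stream mode: no wait time, the next transaction starts immediately.
        while time.perf_counter() < deadline:
            start = time.perf_counter()
            transaction()
            local.append(time.perf_counter() - start)
        with lock:
            latencies.extend(local)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - began
    if not latencies:
        return 0.0, 0.0
    return len(latencies) / elapsed, mean(latencies) * 1000.0


def fake_transaction() -> None:
    """Hypothetical stand-in for one Order Entry transaction (about 5 ms)."""
    time.sleep(0.005)


if __name__ == "__main__":
    # One (throughput, response time) pair per scalability level.
    for level in (1, 2, 4):
        tps, avg_ms = run_stream(fake_transaction, workers=level, duration_s=0.5)
        print(f"{level:2d} workers: {tps:7.1f} tx/s, {avg_ms:5.2f} ms average")
```

With a sleep-based stand-in, throughput grows roughly with the number of workers while the response time stays flat. On a real system, throughput eventually plateaus while response times keep growing, and the top of that curve is what a peak throughput figure summarizes.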




Performance notes:

  • At peak, the new SPARC64 VII 2.88 GHz delivers 1.10x the OLTP throughput of the 2.52 GHz chip, both running S10U8.

  • But compared to the 2.52 GHz chip on S10U4, the ratio is 1.54x, and compared to the SPARC64 VI it is 2.38x.

  • For a customer planning to upgrade an E25k equipped with 1.5 GHz chips, the throughput ratio is 4.125x! This means we can easily replace an 8-board E25k with a 2-board M8000 for better throughput and improved response times.

  • Average transaction response times at peak are 126 ms on the UltraSPARC IV+ domain, 87 ms on the SPARC64 VI, 82 ms on the SPARC64 VII 2.52 GHz (U4), 77 ms on the SPARC64 VII 2.52 GHz (U8) and 72 ms on the latest chip.
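These notes are enough to separate the Solaris effect from the chip effect. Both ratios share the same numerator (the 2.88 GHz chip on S10U8), so dividing them gives the gain of the 2.52 GHz chip going from S10U4 to S10U8. The Python sketch below does that arithmetic and adds a rough E25k sizing check. The log-scale attribution, the linear-scaling assumption and the board counts (4 chips per E25k board and per M8000 CMU) are assumptions for illustration, not measurements from this test.

```python
import math

# Peak-throughput ratios from the performance notes.
NEW_U8_VS_OLD_U8 = 1.10   # SPARC64 VII 2.88 GHz vs 2.52 GHz, both on S10U8
NEW_U8_VS_OLD_U4 = 1.54   # SPARC64 VII 2.88 GHz on S10U8 vs 2.52 GHz on S10U4
NEW_VS_E25K = 4.125       # same chip count: SPARC64 VII 2.88 GHz vs UltraSPARC IV+ 1.5 GHz

# Same numerator in both ratios, so this is 2.52 GHz on S10U8 vs 2.52 GHz on S10U4.
solaris_gain = NEW_U8_VS_OLD_U4 / NEW_U8_VS_OLD_U8

# Share of the combined gain attributed to Solaris, measured on a log scale (a convention).
solaris_share = math.log(solaris_gain) / math.log(NEW_U8_VS_OLD_U4)

# Sizing check, assuming 4 chips per board on both systems and linear scaling with chip count.
CHIPS_PER_BOARD = 4


def e25k_chip_equivalents(m8000_boards: int) -> float:
    """UltraSPARC IV+ chips matched by an M8000 with this many CMUs (hypothetical model)."""
    return m8000_boards * CHIPS_PER_BOARD * NEW_VS_E25K


print(f"Solaris U4 -> U8 on the 2.52 GHz chip: {solaris_gain:.2f}x")
print(f"Share of the 1.54x gain from Solaris (log scale): {solaris_share:.0%}")
print(f"2-board M8000 ~ {e25k_chip_equivalents(2):.0f} E25k chips "
      f"vs {8 * CHIPS_PER_BOARD} in an 8-board E25k")
```

The operating system upgrade alone accounts for about 1.40x out of the 1.54x, in line with the earlier statement that most of the improvement comes from Solaris itself. Under the linear-scaling assumption, the sizing check gives about 33 UltraSPARC IV+ chip equivalents for the 32 chips of an 8-board E25k.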





Conclusion

As expected, the Oracle OLTP improvement due to the new SPARC64 VII chip alone is modest on the latest Solaris 10. However, customers already in production on an earlier Solaris 10 release will see throughput improvements of up to 1.54x. This is most likely enough to motivate a system refresh. And all E25k customers now have a very attractive value proposition with our M8000 and M9000 chassis.


See you next time in the wonderful world of benchmarking....



Comments:

So a customer will get 90% of the higher performance numbers by upgrading Solaris on existing hardware rather than buying the new hardware... :-/

Posted by Network the dog on January 22, 2010 at 08:31 AM PST #

Nice work but confused on the configuration. The threads per core on SPARC64 are 2, but you are showing 1 per core based on the number of physical chips in your test configurations. Is this a typo or did you disable half the threads?

Posted by Ghost of Jonathan Schwartz on October 05, 2010 at 11:11 PM PDT #

About

mrbenchmark
