Tuesday Apr 23, 2013

More SPARC T5 Performance Results

Performance results for the new SPARC T5 systems keep coming in...

Last week, SPEC published the most recent result for the SPECjbb2013-MultiJVM benchmark. According to SPEC, this benchmark "is relevant to all audiences who are interested in Java server performance, including JVM vendors, hardware developers, Java application developers, researchers and members of the academic community."

All of the published results are at: http://www.spec.org/jbb2013/results/jbb2013multijvm.html.

For the first table below, I selected all of the max-JOPS results greater than 50,000 JOPS that used the most recent Java version, for the SPARC T5-2 and for competing systems. From the SPECjbb2013 data, I derived two new values: max-JOPS per chip and max-JOPS per core. The latter compensates for the different number of cores in one of the systems (80, versus 32 in the others). Finally, the "Advantage of T5" column shows the percentage by which the T5-2 cores outperform the other systems' cores. For example, on this benchmark a 32-core T5-2 demonstrated 15% better per-core performance than an HP DL560p running RHEL 6.3 with the same number of cores.

As you can see, on this benchmark a SPARC T5 core is faster than the Intel Xeon cores in each of the competing systems with 32 or more cores.

Model | CPU | Chips | Cores | OS | max-JOPS | Date Published | max-JOPS per chip | max-JOPS per core | Advantage of T5
SPARC T5-2 | SPARC T5 | 2 | 32 | Solaris 11.1 | 75658 | April 2013 | 37829 | 2364 | (baseline)
HP ProLiant DL560p Gen8 | Intel E5-4650 | 4 | 32 | Windows Server 2008 R2 Enterprise | 67850 | April 2013 | 16963 | 2120 | 12%
HP ProLiant DL560p Gen8 | Intel E5-4650 | 4 | 32 | RHEL 6.3 | 66007 | April 2013 | 16502 | 2063 | 15%
HP ProLiant DL980 G7 | Intel E7-4870 | 8 | 80 | RHEL 6.3 | 106141 | April 2013 | 13268 | 1327 | 78%

The SPECjbb2013 benchmark also includes a performance measure called "critical-JOPS." This measurement represents the ability of a system to achieve high levels of throughput while still maintaining a short response time. The performance advantage of the T5 cores is even more pronounced.

Model | CPU | Chips | Cores | OS | critical-JOPS | Date Published | critical-JOPS per chip | critical-JOPS per core | Advantage of T5
SPARC T5-2 | SPARC T5 | 2 | 32 | Solaris 11.1 | 23334 | April 2013 | 11667 | 729 | (baseline)
HP ProLiant DL560p Gen8 | Intel E5-4650 | 4 | 32 | Windows Server 2008 R2 Enterprise | 16199 | April 2013 | 4050 | 506 | 44%
HP ProLiant DL560p Gen8 | Intel E5-4650 | 4 | 32 | RHEL 6.3 | 18049 | April 2013 | 4512 | 564 | 29%
HP ProLiant DL980 G7 | Intel E7-4870 | 8 | 80 | RHEL 6.3 | 23268 | April 2013 | 2909 | 291 | 151%
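The derived columns in both tables are simple arithmetic: the per-chip and per-core values divide the published score by the chip or core count, and "Advantage of T5" is the T5-2's per-core value divided by the other system's per-core value, minus one. The following Python sketch recomputes them from the published scores. The function and variable names are illustrative, and it assumes half-up rounding to whole numbers, which reproduces the figures shown.

```python
# Recompute the derived columns of the SPECjbb2013-MultiJVM tables.
# Scores are copied from the published results; chips/cores are per system.

RESULTS = [
    # (system, chips, cores, max-JOPS, critical-JOPS)
    ("SPARC T5-2, Solaris 11.1", 2, 32, 75658, 23334),
    ("HP DL560p Gen8, Windows Server 2008 R2", 4, 32, 67850, 16199),
    ("HP DL560p Gen8, RHEL 6.3", 4, 32, 66007, 18049),
    ("HP DL980 G7, RHEL 6.3", 8, 80, 106141, 23268),
]
MAX_JOPS, CRITICAL_JOPS = 3, 4  # column positions in RESULTS


def round_half_up(x):
    """Round a non-negative number to the nearest integer, halves rounding up."""
    return int(x + 0.5)


def derived_rows(metric):
    """Return (system, per-chip, per-core, advantage-of-T5 %) for one metric.

    The first entry (the T5-2) is the baseline, so its advantage is None.
    Advantage = (T5-2 per-core / other per-core - 1) * 100, computed
    from unrounded per-core values.
    """
    base_per_core = RESULTS[0][metric] / RESULTS[0][2]
    rows = []
    for i, entry in enumerate(RESULTS):
        system, chips, cores, score = entry[0], entry[1], entry[2], entry[metric]
        per_core = score / cores
        advantage = None if i == 0 else round_half_up((base_per_core / per_core - 1) * 100)
        rows.append((system, round_half_up(score / chips), round_half_up(per_core), advantage))
    return rows


if __name__ == "__main__":
    for name, metric in (("max-JOPS", MAX_JOPS), ("critical-JOPS", CRITICAL_JOPS)):
        print(name)
        for row in derived_rows(metric):
            print("  ", row)
```

Running it prints one row per system for each metric; for example, the max-JOPS row for the DL560p running RHEL 6.3 comes out as 16502 per chip, 2063 per core, and a 15% advantage for the T5-2.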

As always, care should be taken to choose a benchmark that is similar to the workload you will run on a computer. For example, if you plan to implement a database server, the SPECint benchmark will not help you, because it merely measures the performance of the CPU cores and the speed and size of the memory caches (and perhaps the memory system). It does not measure network or disk I/O performance, and both of those are important factors in database performance - especially storage I/O.

According to the SPECjbb2013 design document, this benchmark "exercises the CPU, memory and network I/O, but not disk I/O." Because of this, it can be used as a simple method to estimate relative Java processing performance. From the data shown in the tables above, it is clear that the newest SPARC cores deliver Java performance that is competitive with the most recent Intel Xeon CPU cores.

Edit [2013.04.23]: Jim Laurent uses the same benchmark results in a quick look at the smooth scalability of Solaris 11, compared to RHEL6.

For more information on recent SPARC T5 world records, see https://blogs.oracle.com/BestPerf/.

SPEC and the benchmark name SPECjbb are registered trademarks of Standard Performance Evaluation Corporation (SPEC). Results as of 4/21/2013, see http://www.spec.org for more information.

Wednesday Dec 09, 2009

Virtual Overhead?

So you're wondering about operating system efficiency or the overhead of virtualization. How about a few data points?

SAP created benchmarks that measure transaction performance. One of them, the SAP SD 2-Tier benchmark, behaves more like real-world workloads than most other benchmarks, because it exercises all of the parts of a system: CPUs, memory access, I/O and the operating system. The other factor that makes this benchmark very useful is the large number of results submitted by vendors. This large data set enables you to make educated performance comparisons between computers, operating systems, or application software.

A couple of interesting comparisons can be made from this year's results. Many submissions use the same hardware configuration: two Nehalem (Xeon X5570) CPUs (8 cores total) running at 2.93 GHz, and 48GB RAM (or more). Submitters used several different operating systems: Windows Server 2008 EE, Solaris 10, and SuSE Linux Enterprise Server (SLES) 10. Also, two results were submitted using some form of virtualization: Solaris 10 Containers and SLES 10 on VMware ESX Server 4.0.

Operating System Comparison

The first interesting comparison is of different operating systems and database software, on the same hardware, with no virtualization. Using the hardware configuration listed above, the following results were submitted. The Solaris 10 and Windows results are the best results on each of those operating systems, on this hardware. The SLES 10 result is the best of any Linux distro, with any DB software, on the same hardware configuration.

Operating System | DB | Result (SAPS)
Solaris 10 | Oracle 10g | 21,000
Windows Server 2008 EE | SQL Server 2008 | 18,670
SLES 10 | MaxDB 7.8 | 17,380

(Note that none of the results submitted in 2009 can be compared against results from previous years, because SAP changed the workload.)

With those data points, it's very easy to conclude that for transactional workloads, the combination of Solaris 10 and Oracle 10g is roughly 20% more powerful than Linux and MaxDB.

Virtualization Comparison

The virtualization comparison is also interesting. The same benchmark was run using Solaris 10 Containers and 8 vCPUs. It was also run using SLES 10 on VMware ESX, also using 8 vCPUs.

Operating System | Virtualization | DB | Result (SAPS)
Solaris 10 | Solaris Containers | Oracle 10g | 15,320
SLES 10 | VMware ESX | MaxDB 7.8 | 11,230


Some of the 36% advantage of the Solaris Containers result is due to the operating systems and DB software, as we saw above. But the rest is due to the virtualization tools. The virtualized and non-virtualized results for each OS had only one difference: virtualization was used. For example, the two Solaris 10 results shown above used the same hardware, the same OS, the same DB software and the same workload. The only difference was the use of Containers and the limitation of 8 vCPUs.

If we assume that Solaris 10/Oracle 10g is consistently 21% more powerful than SLES 10/MaxDB on this benchmark, then it's easy to conclude that VMware ESX has 13% more overhead than Solaris Containers when running this workload.
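The percentages in this comparison follow directly from the four SAPS results. This Python sketch reproduces them; the variable names are illustrative, and it makes the same assumption as the text: that the non-virtualized Solaris 10/Oracle advantage carries over unchanged to the 8-vCPU runs.

```python
# Reproduce the percentages in the SAP SD 2-Tier comparison.
# SAPS values are the published results from the two tables above.

NATIVE_SOLARIS_ORACLE = 21000      # Solaris 10 + Oracle 10g, 8 cores
NATIVE_SLES_MAXDB = 17380          # SLES 10 + MaxDB 7.8, 8 cores
CONTAINER_SOLARIS_ORACLE = 15320   # Solaris Containers, 8 vCPUs
VMWARE_SLES_MAXDB = 11230          # SLES 10 on VMware ESX 4.0, 8 vCPUs


def pct_advantage(a, b):
    """Percentage by which a exceeds b."""
    return (a / b - 1.0) * 100.0


native_adv = pct_advantage(NATIVE_SOLARIS_ORACLE, NATIVE_SLES_MAXDB)      # about 20.8
virtual_adv = pct_advantage(CONTAINER_SOLARIS_ORACLE, VMWARE_SLES_MAXDB)  # about 36.4

# Assumption from the text: the native OS/DB advantage carries over unchanged
# to the 8-vCPU runs, so whatever gap remains is attributed to virtualization.
extra_overhead = pct_advantage(1 + virtual_adv / 100, 1 + native_adv / 100)  # about 12.9

if __name__ == "__main__":
    print(f"native advantage:  {native_adv:.1f}%")
    print(f"virtual advantage: {virtual_adv:.1f}%")
    print(f"extra VMware ESX overhead vs. Containers: {extra_overhead:.1f}%")
```

The same 13% appears if you compare how much of its native throughput each stack keeps when virtualized: 15,320/21,000 is about 73% for Solaris Containers, and 11,230/17,380 is about 65% for VMware ESX; 0.730/0.646 is about 1.13.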

However, the non-virtualized performance advantage of the Solaris 10 configuration over that of SLES 10 may be different with 8 vCPUs than with 8 cores. If Solaris' advantage is smaller, then the overhead of VMware is even worse. If its advantage is greater, then the real overhead of VMware is not quite that bad. Without more data, it's impossible to know.
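The three cases can be made concrete by treating the non-virtualized advantage at 8 vCPUs as an unknown. In this Python sketch, the baseline values in the loop are hypothetical inputs chosen only to show the trend; only the 15,320 and 11,230 SAPS figures come from the published results, and `implied_overhead` is an illustrative helper name.

```python
# Sensitivity of the implied VMware ESX overhead to the (unknown) advantage
# that Solaris 10/Oracle 10g would have over SLES 10/MaxDB 7.8 at 8 vCPUs
# without virtualization.

# Measured advantage of Solaris Containers over VMware ESX, both with 8 vCPUs (SAPS).
VIRTUAL_ADV = (15320 / 11230 - 1) * 100   # about 36.4%


def implied_overhead(virtual_adv_pct, baseline_adv_pct):
    """Extra virtualization overhead (%) left after removing an assumed baseline advantage."""
    return ((1 + virtual_adv_pct / 100) / (1 + baseline_adv_pct / 100) - 1) * 100


if __name__ == "__main__":
    # Hypothetical baseline advantages; 20.8% is the value measured with 8 physical cores.
    for baseline in (10.0, 15.0, 20.8, 25.0, 30.0):
        overhead = implied_overhead(VIRTUAL_ADV, baseline)
        print(f"baseline {baseline:4.1f}% -> implied overhead {overhead:4.1f}%")
```

A smaller assumed baseline advantage yields a larger implied VMware overhead (about 24% at a hypothetical 10%), and the implied overhead falls to zero only if the baseline advantage equals the full 36% measured between the two virtualized results.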

But one of those three cases (same, less, more) is true. And the claims by some people that VMware ESX has "zero" or "almost no" overhead are clearly untrue, at least for transactional workloads. For compute-intensive workloads, like HPC, the overhead of software hypervisors like VMware ESX is typically much smaller.

What Does All That Mean?

What does that overhead mean for real applications? Extra overhead means longer response times for transactions or fewer users per workload, or both. It also means that fewer workloads (guests) can be configured per system.

In other words, response time should be better (or maximum number of users should be greater) if your transactional workload is running in a Solaris Container rather than in a VMware ESX guest. And when you want to add more workloads, Solaris Containers should support more of those workloads than VMware ESX, on the same hardware.


Of course, the comparison shown above only applies to certain types of workloads. You should test your workload on different configurations before committing yourself to one.


For more detail, see the results for yourself.

SAP wants me to include the results:
Best result for Solaris 10 on 2-way X5570, 2.93GHz, 48GB:
Sun Fire X4270 (2 processors, 8 cores, 16 threads) 3,800 SAP SD Users, 21,000 SAPS, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, Oracle 10g, Solaris 10, Cert# 2009033.
Best result for any Linux distro on 2-way X5570, 2.93GHz, 48GB:
HP ProLiant DL380 G6 (2 processors, 8 cores, 16 threads) 3,171 SAP SD Users, 17,380 SAPS, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, MaxDB 7.8, SuSE Linux Enterprise Server 10, Cert# 2009006.
Result on Solaris 10 using Solaris Containers and 8 vCPUs:
Sun Fire X4270 (2 processors, 8 cores, 16 threads) run in 8 virtual cpu container, 2,800 SAP SD Users, 2x 2.93 GHz Intel Xeon X5570, 48 GB memory, Oracle 10g, Solaris 10, Cert# 2009034.
Result on SuSE Enterprise Linux as a VMware guest, using 8 vCPUs:
Fujitsu PRIMERGY Model RX300 S5 (2 processors, 8 cores, 16 threads) 2,056 SAP SD Users, 2x 2.93 GHz Intel Xeon X5570, 96 GB memory, MaxDB 7.8, SUSE Linux Enterprise Server 10 on VMware ESX Server 4.0, Cert# 2009029.
SAP, R/3, reg TM of SAP AG in Germany and other countries.

Addendum, added December 10, 2009:

Today an associate reminded me that previous SAP SD 2-tier results demonstrated the overhead of Solaris Containers. Sun ran four copies of the benchmark on one system, simultaneously, one copy in each of four Solaris Containers. The system was a Sun Fire T2000, with a single 1.2GHz SPARC processor, running Solaris 10 and MaxDB 7.5:

  1. 2006029
  2. 2006030
  3. 2006031
  4. 2006032

The same hardware and software configuration - but without Containers - already had a submission:

The sum of the results for the four Containers can be compared to the single result for the configuration without Containers. The single system outpaced the four Containers by less than 1.7%.

Second Addendum, also added December 10, 2009:

Although this blog entry focused on a comparison of performance overhead, there are other good reasons to use Solaris Containers in SAP deployments. At least 10, in fact, as shown in this slide deck. One interesting reason is that Solaris Containers is the only server virtualization technology supported by both SAP and Oracle on x86 systems.


Tuesday Feb 10, 2009

Zones to the Rescue

Recently, Thomson Reuters "demonstrated that RMDS [Reuters Market Data System software] performs better in a virtualized environment with Solaris Containers than it does with a number of individual Sun server machines."

This enabled Thomson Reuters to break the "million-messages-per-second barrier."

The performance improvement is probably due to the extremely high bandwidth and low latency of inter-Container network communications. With default settings, all inter-Container network traffic is carried by memory transfers, so packets 'move' at computer memory speeds, which are much faster than common 100Mbps or 1Gbps Ethernet links. Further, network performance is much more consistent without the extra hardware, such as switches and routers, that can add latency.
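If you want a rough feel for how fast traffic moves when no physical network is involved, one simple test is to push data through a TCP connection whose two endpoints are on the same host. This Python sketch does that over 127.0.0.1. It does not create Solaris Containers, so treating the loopback path as a stand-in for same-host inter-Container traffic is an assumption, and the result depends heavily on the host, the chunk size, and Python's own overhead. The function name `loopback_throughput` is illustrative.

```python
import socket
import threading
import time


def loopback_throughput(total_bytes=64 * 1024 * 1024, chunk=64 * 1024):
    """Send total_bytes over TCP to 127.0.0.1 and return MB/s (10**6 bytes/s).

    Both endpoints live on one host, so no NIC, switch, or router is involved;
    the data is copied through the kernel's network stack in memory.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    received = 0

    def sink():
        # Receiver thread: read until the sender closes the connection.
        nonlocal received
        conn, _ = server.accept()
        with conn:
            while True:
                data = conn.recv(chunk)
                if not data:
                    break
                received += len(data)

    reader = threading.Thread(target=sink)
    reader.start()

    payload = b"x" * chunk
    start = time.perf_counter()
    with socket.create_connection(("127.0.0.1", port)) as client:
        sent = 0
        while sent < total_bytes:
            client.sendall(payload)
            sent += len(payload)
    reader.join()                      # wait until the receiver has drained everything
    elapsed = time.perf_counter() - start
    server.close()
    return received / elapsed / 1e6


if __name__ == "__main__":
    print(f"loopback throughput: {loopback_throughput():.0f} MB/s")
```

For scale, a 1 Gbps Ethernet link cannot exceed 125 MB/s of raw bandwidth, and a 100 Mbps link cannot exceed 12.5 MB/s.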

Articles can be found at: http://finance.yahoo.com/news/Sun-Microsystems-and-Thomson-bw-14306924.html


Jeff Victor writes this blog to help you understand Oracle's Solaris and virtualization technologies.

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

