Thursday Sep 29, 2011

SPARC T4-1 Server Outperforms Intel (Westmere AES-NI) on IPsec Encryption Tests

Oracle's SPARC T4 processor has significantly greater performance than the Intel Xeon X5690 processor when both are using Oracle Solaris 11 secure IP networking (IPsec). The SPARC T4 processor using IPsec AES-256-CCM mode achieves line speed over a 10 GbE network.

  • On IPsec, SPARC T4 processor is 23% faster than the 3.46 GHz Intel Xeon X5690 processor (Intel AES-NI).

  • The SPARC T4 processor is only at 23% utilization when running at its maximum throughput making it 3.6 times more efficient at secure networking than the 3.46 GHz Intel Xeon X5690 processor.

  • The 3.46 GHz Intel Xeon X5690 processor is nearly fully utilized at its maximum throughput leaving little CPU for application processing.

  • The SPARC T4 processor using IPsec AES-256-CCM mode achieves line speed over a 10 GbE network.

  • The SPARC T4 processor approaches line speed with fewer than one-quarter the number of IPsec streams required for the Intel Xeon X5690 processor to achieve its peak throughput. The SPARC T4 processor supports the additional streams with minimal extra CPU utilization.

IPsec provides general purpose networking security which is transparent to applications. This is ideal for supplying the capability to those networking applications that don't have cryptography built-in. IPsec provides for more than Virtual Private Networking (VPN) deployments where the technology is often first encountered.

Performance Landscape

Performance was measured using the AES-256-CCM cipher in megabits per second (Mb/sec) aggregate over sufficient numbers of TCP/IP streams to achieve line rate threshold (SPARC T4 processor) or drive a peak throughput (Intel Xeon X5690).

Processor GHz AES Decrypt AES Encrypt
B/W (Mb/sec) CPU Util Streams B/W (Mb/sec) CPU Util Streams
– Peak performance
SPARC T4 2.85 9,800 23% 96 9,800 20% 78
Intel Xeon X5690 3.46 8,000 83% 4,700 81%
– Load at which SPARC T4 processor performance crosses 9000 Mb/sec
SPARC T4 2.85 9,300 19% 17 9,200 15% 17
Intel Xeon X5690 3.46 4,700 41% 3,200 47%

Configuration Summary

SPARC Configuration:

SPARC T4-1 server
1 x SPARC T4 processor 2.85 GHz
128 GB memory
Oracle Solaris 11
Single 10-Gigabit Ethernet XAUI Adapter

Intel Configuration:

Sun Fire X4270 M2
1 x Intel Xeon X5690 3.46 GHz, Hyper-Threading and Turbo Boost active
48 GB memory
Oracle Solaris 11
Sun Dual Port 10GbE PCIe 2.0 Networking Card with Intel 82599 10GbE Controller

Driver Systems Configuration:

2 x Sun Blade 6000 chassis each with
1 x Sun Blade 6000 Virtualized Ethernet Switched Network Express Module 10GbE (NEM)
10 x Sun Blade X6270 M2 server modules each with
2 x Intel Xeon X5680 3.33 GHz, Hyper-Threading and Turbo Boost active
48 GB memory
Oracle Solaris 11
Dual 10-Gigabit Ethernet Fabric Expansion Module (FEM)

Benchmark Configuration:

Netperf 2.4.5 network benchmark adapted for testing bandwidth of multiple streams in aggregate.

Benchmark Description

The results here are derived from runs of the Netperf 2.4.5 benchmark. Netperf is a client/server benchmark measuring network performance providing a number of independent tests, including the TCP streaming bandwidth tests used here.

Netperf is, however, a single network stream benchmark and to demonstrate peak network bandwidth over a 10 GbE line under encryption requires many streams.

The Netperf documentation provides an example of using the software to drive multiple streams. The example is not sufficient to develop the workload because it does not scale beyond a single driver node which limits the processing power that can be applied. This subsequently limits how many full bandwidth streams can be supported. We chose to have a single server process on the target system (containing either the SPARC T4 processor or the Intel Xeon processor) and to spawn one or more Netperf client processes each across a cluster of the driver systems. The client processes are managed by the mpirun program of the Oracle Message Passing Toolkit.

Tabular results include aggregate bandwidth and CPU utilization. The aggregate bandwidth is computed by dividing the total traffic of the client processes by the overall runtime. CPU utilization on the target system is the average of that reported by all of the Netperf client processes.

IPsec is configured in the operating system of each participating server transparently to Netperf and applied to the dedicated network connecting the target system to the driver systems.

Key Points and Best Practices

  • Line speed is defined as data bandwidth within 10% of theoretical maximum bit rate of network line. For 10 GbE greater than 9000 Mb/sec bandwidth is defined as line speed.

  • IPsec provides network security that is configured completely in the operating system and is transparent to the application.

  • Peak bandwidths under IPsec are achieved only in aggregate with multiple client network streams to the target server.

  • Oracle Solaris receiver fanout must be increased from the default to support the large numbers of streams at quoted peak rates.

  • The ixgbe network driver relevant on servers with Intel 82599 10GbE controllers (driver systems and Intel Xeon target system) was limited to only a single receiver queue to maximize utilization of extra fanout.

  • IPsec is configured to make a unique security association (SA) for each connection to avoid a bottleneck over the large stream counts.

  • Jumbo frames are enabled (MTU of 9000) and network interrupt blanking (sometimes called interrupt coalescence) is disabled.

  • The TCP streaming bandwidth tests, which run continuously for minutes and multiple times to determine statistical significance, are configured to use message sizes of 1,048,576 bytes.

  • IPsec configuration defines that each SA is established through the use of a preshared key and Internet Key Exchange (IKE).

  • IPsec encryption uses the Solaris Cryptographic Framework which applies the appropriate accelerated provider on both the SPARC T4 processor and the Intel Xeon processor.

  • There is no need to configure a specific authentication algorithm for IPsec. With the Encapsulated Security Payload (ESP) security protocol and choosing AES-256-CCM for the encryption algorithm, the encapsulation is self-authenticating.

See Also

Disclosure Statement

Copyright 2011, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 9/26/2011.


BestPerf is the source of Oracle performance expertise. In this blog, Oracle's Strategic Applications Engineering group explores Oracle's performance results and shares best practices learned from working on Enterprise-wide Applications.

Index Pages

« April 2014