
Everything you want and need to know about Oracle SPARC systems performance

Virtualized Network Performance: SPARC T7-1

Brian Whitney
Principal Software Engineer

Oracle's SPARC T7-1 server using Oracle VM Server for SPARC exhibits low network latency under virtualization. The network latency and bandwidth were measured using the Netperf benchmark.

  • TCP network latency between two Oracle VM Server for SPARC guests running on separate SPARC T7-1 servers each using SR-IOV is similar to that of two SPARC T7-1 servers without virtualization (native/bare metal).

  • TCP and UDP network latencies between two Oracle VM Server for SPARC guests running on separate SPARC T7-1 servers each using assigned I/O were significantly lower than with the other two I/O configurations (SR-IOV and paravirtual I/O).

  • TCP and UDP network latencies between two Oracle VM Server for SPARC guests running on separate SPARC T7-1 servers each using SR-IOV were significantly lower than when using paravirtual I/O.

Terminology notes:

  • VM – virtual machine
  • guest – encapsulated operating system instance, typically running in a VM.
  • assigned I/O – network hardware driven directly and exclusively by guests
  • paravirtual I/O – network hardware driven by hosts, indirectly by guests via paravirtualized drivers
  • SR-IOV – single root I/O virtualization; virtualized network interfaces provided by network hardware, driven directly by guests.
  • LDom – logical domain (previous name for Oracle VM Server for SPARC)

Performance Landscape

The following tables show the results for TCP and UDP Netperf Latency and Bandwidth tests (single stream). Netperf latency, often called the round-trip time, is measured in microseconds (usec); smaller is better.

TCP

  Networking Method     Netperf Latency (usec)      Bandwidth (Mb/sec)
                        MTU=1500    MTU=9000        MTU=1500    MTU=9000
  Native/Bare Metal         58          58              9100        9900
  assigned I/O              51          51              9400        9900
  SR-IOV                    58          59              9400        9900
  paravirtual I/O           91          91              4800        9800

UDP

  Networking Method     Netperf Latency (usec)      Bandwidth (Mb/sec)
                        MTU=1500    MTU=9000        MTU=1500    MTU=9000
  Native/Bare Metal         57          57              9100        9900
  assigned I/O              51          51              9400        9900
  SR-IOV                    66          63              9400        9900
  paravirtual I/O           98          97              4800        9800

Specifically, the Netperf benchmark latency:
  • is the average request/response time, computed as the inverse of the throughput reported by the program (see the short sketch after this list),
  • is measured within the program from 20 sample runs of 30 seconds each,
  • uses single-in-flight [i.e. non-burst] 1-byte messages,
  • is measured between separate servers connected by 10 GbE,
  • for each test, uses servers connected back-to-back (no network switch) and configured identically: native or guest VM.
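
As a minimal illustration of that relationship, the Python sketch below converts a reported transaction rate into an average round-trip latency; the example rate is a hypothetical value, chosen only so that the result lands near the native TCP figure in the table above.

    # Convert a Netperf request/response throughput (transactions/sec) into the
    # average round-trip latency in microseconds.  With single-in-flight 1-byte
    # messages, one transaction is one round trip, so the latency is simply the
    # inverse of the transaction rate.
    def rtt_usec(transactions_per_sec: float) -> float:
        return 1e6 / transactions_per_sec

    # Illustrative example (not a measurement from this report): a rate of about
    # 17,240 transactions/sec corresponds to roughly 58 usec of round-trip time.
    print(round(rtt_usec(17_240), 1))   # -> 58.0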

Configuration Summary

System Under Test:

2 x SPARC T7-1 servers, each with
1 x SPARC M7 processor (4.13 GHz)
256 GB memory (16 x 16 GB)
2 x 600 GB 10K RPM SAS-2 HDD
10 GbE (on-board and PCIe network devices)
Oracle Solaris 11.3
Oracle VM Server for SPARC 3.2

Benchmark Description

The Netperf 2.6.0 benchmark was used to evaluate native and virtualized (LDoms) network performance. Netperf is a client/server benchmark that measures network performance and provides a number of independent tests; the omni Request/Response (aka ping-pong) test with the TCP or UDP protocol was used here to obtain the Netperf latency measurements, and the TCP stream test was used for bandwidth. Netperf was run between separate servers connected back-to-back (no network switch) by a 10 GbE network interconnect.
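
For orientation, the Python sketch below shows one plausible way such runs could be driven, using the classic netperf test names (TCP_RR for the ping-pong latency test, TCP_STREAM for bandwidth). The peer address and the specific option values are illustrative assumptions, not a record of the exact invocations behind the results above.

    # Minimal sketch of driving netperf from Python; assumes netperf is installed
    # locally and a netserver instance is running on the peer machine.  The test
    # names and options are standard netperf ones, shown here for illustration.
    import subprocess

    PEER = "192.0.2.1"   # hypothetical address of the back-to-back 10 GbE peer

    def netperf(args):
        out = subprocess.run(["netperf", "-H", PEER] + args,
                             capture_output=True, text=True, check=True)
        return out.stdout

    # Request/response (ping-pong) test: 30-second run with single 1-byte
    # request and 1-byte response messages, i.e. one message in flight.
    print(netperf(["-t", "TCP_RR", "-l", "30", "--", "-r", "1,1"]))

    # Stream (bandwidth) test using 1 MB messages.
    print(netperf(["-t", "TCP_STREAM", "-l", "30", "--", "-m", "1048576"]))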

To measure the cost of virtualization, for each test the servers were configured identically: native (without virtualization) or guest VM. When running in a virtual environment, each server was configured, in identical fashion, with one of several representative methods of connecting the guest to the network hardware (assigned I/O, paravirtual I/O, or SR-IOV).

Key Points and Best Practices

  • Oracle VM Server for SPARC requires explicit partitioning of guests into Logical Domains of bound CPUs and memory, typically chosen to be local, and does not provide dynamic load balancing between guests on a host.

  • Oracle VM Server for SPARC guests (LDoms) were assigned 32 virtual CPUs (4 complete processor cores) and 64 GB of memory. The control domain served as the I/O domain (for paravirtualized I/O) and was assigned 4 cores and 64 GB of memory.

  • Each latency average reported was computed as the inverse of the reported throughput (similar to the transaction rate) of a Netperf Request/Response test run using 20 samples (aka iterations) of 30-second measurements of non-concurrent 1-byte messages.

  • To obtain a meaningful average latency from a Netperf Request/Response test, it is important that the transactions consist of single messages, which is Netperf's default. If, for instance, the Netperf "burst" and "TCP_NODELAY" options are turned on, multiple messages can overlap within the transactions and the reported transaction rate or throughput can no longer be used to compute the latency (see the first sketch after this list).

  • All results were obtained with interrupt coalescence (aka interrupt throttling, interrupt blanking) turned on in the physical NIC and, if applicable, in the attachment driver in the guest. Interrupt coalescence turned on is also the default for all the platforms used here.

  • All the results were obtained with large receive offload (LRO) turned off in the physical NIC, and, if applicable, for the attachment driver in the guest, in order to reduce the network latency between the two guests.

  • The Netperf bandwidth test used 1 MB (1,048,576-byte) send and receive messages (see the second sketch after this list).

  • The paravirtual variation of the measurements refers to the use of a paravirtualized network driver in the guest instance. IP traffic consequently is routed across the guest, the virtualization subsystem in the host, a virtual network switch or bridge (depending upon the platform), and the network interface card.

  • The assigned I/O variation of the measurements refers to the use of the card's driver in the guest instance itself. This is possible by assigning the device exclusively to the guest. Device assignment results in less (software) routing for IP traffic and consequently less overhead than using paravirtualized drivers, but virtualization can still impose significant overhead. Note also that NICs used in this way cannot be shared amongst guests, and their use may preclude certain other VM features, such as migration. The T7-1 system has four on-board 10 GbE devices, but all of them are connected to the same PCIe branch, making it impossible to configure them as assigned I/O devices. Using a PCIe 10 GbE NIC allows configuring it as an assigned I/O device.

  • In the context of Oracle VM Server for SPARC and these tests, assigned I/O refers to PCI endpoint device assignment, while paravirtualized I/O refers to virtual I/O using a virtual network device (vnet) in the guest connected to a virtual switch (vsw) through the I/O domain to the physical network device (NIC).
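
First sketch (referenced from the point about burst mode above): a purely illustrative Python calculation of why inverting the reported throughput only yields a latency when exactly one message is in flight.

    # Purely illustrative numbers: with a burst of b overlapping transactions the
    # reported transaction rate scales up by roughly b, so blindly inverting it
    # understates the true round-trip time by the same factor.
    true_rtt_usec = 58.0                      # assumed round-trip time
    for in_flight in (1, 4, 16):
        reported_tps = in_flight * 1e6 / true_rtt_usec
        naive_latency = 1e6 / reported_tps    # what blind inversion would give
        print(in_flight, round(reported_tps), round(naive_latency, 2))
    # prints: 1 17241 58.0 / 4 68966 14.5 / 16 275862 3.62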
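
Second sketch (referenced from the point about the 1 MB message size): a back-of-the-envelope Python calculation, using the best bandwidth figure from the tables above, of roughly how many such messages per second that bandwidth corresponds to.

    # Back-of-the-envelope arithmetic: messages/sec implied by a given bandwidth
    # when each message is 1 MB (1,048,576 bytes).
    msg_bytes = 1_048_576
    bandwidth_mbps = 9900                     # Mb/sec, from the tables above
    msgs_per_sec = bandwidth_mbps * 1e6 / (8 * msg_bytes)
    print(round(msgs_per_sec))                # -> about 1180 messages/sec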

See Also

Disclosure Statement

Copyright 2015, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 25 October 2015.
