Virtual Disk Performance Improvement for Oracle VM Server for SPARC
By Jsavit-Oracle on May 21, 2014
First, a quick review of some performance points, the same ones I discuss in tuning tips posts:
- CPU performance in a logical domain has always been essentially the same as native performance. That's due to the nature of SPARC CMT servers and how Oracle VM Server for SPARC dedicates hardware CPU threads or cores to a domain. This eliminates the overhead of virtualizing CPUs seen in traditional virtual machine environments.
- Memory performance: the same story. Oracle VM Server for SPARC doesn't have the expense of shadow page table management or the risk of thrashing seen with hypervisors that overcommit memory and do swapping and paging.
- I/O: this is where virtual machine systems have always had a performance cost relative to "bare metal", and this applied to Oracle VM Server for SPARC as well as other hypervisors.
Oracle VM Server for SPARC could provide excellent performance for virtual networks (in particular, since the virtual network performance enhancement was delivered). It could provide "good" performance for disk, given appropriately sized service domains and disk backends based on full disks or LUNs instead of convenient but slower file-based backends. However, there still was a substantial performance cost for virtual disk I/O, which became a significant factor for the more demanding applications increasingly deployed in logical domains.
The physical alternative
Oracle VM Server for SPARC addressed this by improving virtual I/O performance over time, and by offering physical I/O as a higher-performance alternative. This could be done by dedicating an entire PCIe bus and its host bus adapters to a domain, which yielded native I/O performance for every device on the bus. This is the highly effective method used with Oracle SuperCluster.
Oracle VM Server for SPARC 3.1.1 added the ability to use Single Root I/O Virtualization (SR-IOV) for Fibre Channel (FC) devices. This provides native performance with better resource granularity: there can be many SR-IOV devices to hand to domains.
Both provide native performance but have limitations. There is a fixed number of PCIe buses on each server, determined by the server model, so only a limited number of domains can each be assigned a bus of their own. SR-IOV provides much more resource granularity, as a single SR-IOV card can be presented as many "virtual functions", but is only supported for qualified FC cards. Both forms of physical I/O prevent the use of live migration, which only applies to domains that use virtual I/O. One had to compromise on either flexibility or performance - but now you can have both together.
The virtual disk I/O performance boost
Just as this issue was largely addressed for virtual network devices, it has now been addressed for virtual disk devices. Solaris 11.1 SRU 19.6 introduces new algorithms that remove bottlenecks caused by serialization (Update: patch update 150400-13 provides the same improvement on Solaris 10). Each virtual disk now has multiple read and multiple write threads assigned to it - this is analogous to the "queue depth" seen for real enterprise-scale disks.
The result is sharply reduced I/O latency and increased I/O operations per second - close to the results that would be seen in a non-virtualized environment. This is especially effective for workloads with multiple readers and writers in parallel, rather than a simplistic dd test.
Want the numbers and more detailed explanation? Read Stefan's Blog!
Stefan Hinker has written an excellent blog entry, "Improved vDisk Performance for LDoms", that quantifies the improvements. Rather than duplicate the material he put there, I strongly urge you to read his blog and then come back here. However, I can't resist "quoting" two of the graphics he produced:
I/O operations per second (IOPS)
This chart shows that delivered IOPS were essentially the same with the new virtual I/O and with bare metal, exceeding 150K IOPS. There is a difference, but the data points are so close that the yellow "new virtual" line almost covers the blue "bare metal" line.
I/O latency - response times
This chart shows that I/O response time is also essentially the same as bare metal. As with the IOPS chart, the yellow line for virtual I/O response time is so close to the blue line for bare metal that it almost obscures the blue line entirely.
This is a game-changing improvement - the flexibility of virtualization with the performance of bare-metal.
That said, I will emphasize some caveats: this will not solve I/O performance problems due to overloaded disks or LUNs. If the physical disk is saturated, then removing virtualization overhead won't solve the problem. A simple, single-threaded I/O program is not a good example to show the improvement, as it is really going to be gated by individual disk speeds. This enhancement provides I/O performance scalability for real workloads backed by appropriate disk subsystems.
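A rough way to see why a single-threaded test understates the gain is Little's law: the IOPS ceiling is the number of I/Os kept in flight divided by the latency of each one. The sketch below is only that arithmetic; max_iops is a hypothetical helper and the 1 ms latency is an illustrative figure, not a measurement from these tests.

```shell
# Little's law sketch: IOPS ceiling = outstanding I/Os / per-I/O latency.
# max_iops is a hypothetical helper; the inputs are illustrative, not measured.
max_iops() {
    # $1 = number of I/Os kept in flight
    # $2 = average latency per I/O, in milliseconds
    awk -v n="$1" -v ms="$2" 'BEGIN { printf "%d\n", n * 1000 / ms }'
}

max_iops 1 1     # one synchronous stream (a simple dd): prints 1000
max_iops 32 1    # 32 concurrent readers/writers, same latency: prints 32000
```

With one request in flight, latency alone sets the ceiling, so removing serialization in the virtual disk path has little to show. With many requests in flight, the ceiling scales with concurrency, which is where the multiple read and write threads per virtual disk pay off.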
How to implement the improvement
The main task to implement this improvement is to update Solaris 11 guest domains and the service domains they use to Solaris 11.1 SRU 19.6. Solaris 10 users should apply patch 150400-13, which was delivered June 16, 2014.
All of those domains have to be updated, or I/O will proceed using the prior algorithm. On Solaris 11, assuming that your systems are set up with the appropriate service repository, this is as simple as issuing the command: pkg update and rebooting. This is one of the things Solaris 11 makes really easy. The full dialog looks like this:
    $ sudo pkg update
    Password:
               Packages to install:   1
                Packages to update:  76
           Create boot environment: Yes
    Create backup boot environment:  No

    DOWNLOAD                                PKGS         FILES    XFER (MB)   SPEED
    Completed                              77/77     2859/2859  208.4/208.4  3.8M/s

    PHASE                                          ITEMS
    Removing old actions                         325/325
    Installing new actions                       362/362
    Updating modified actions                  4137/4137
    Updating package state database                 Done
    Updating package cache                         76/76
    Updating image state                            Done
    Creating fast lookup database                   Done

    A clone of solaris-3 exists and has been updated and activated.
    On the next boot the Boot Environment solaris-4 will be mounted on '/'.
    Reboot when ready to switch to this updated BE.

    ---------------------------------------------------------------------------
    NOTE: Please review release notes posted at:

    https://support.oracle.com/epmos/faces/DocContentDisplay?id=1501435.1
    ---------------------------------------------------------------------------
After that completes, just reboot by using init 6. That's all you have to do to install the software.
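Since every guest and service domain has to be at the new level, it is worth checking each one after the reboot. Here is a minimal sketch of such a check. It assumes that the "Branch" field printed by pkg info entire reads 0.175.1.19.0.6.0 for Solaris 11.1 SRU 19.6 (please confirm that string against the SRU readme), and sru_at_least is a hypothetical helper, not part of Solaris.

```shell
# sru_at_least is a hypothetical helper: it succeeds when the branch version
# in $1 is at or above the minimum in $2, comparing dot-separated fields
# numerically from left to right.
sru_at_least() {
    awk -v have="$1" -v want="$2" 'BEGIN {
        nh = split(have, h, "[.]")
        nw = split(want, w, "[.]")
        n = (nh > nw) ? nh : nw
        for (i = 1; i <= n; i++) {
            a = (i <= nh) ? h[i] + 0 : 0
            b = (i <= nw) ? w[i] + 0 : 0
            if (a > b) exit 0
            if (a < b) exit 1
        }
        exit 0
    }'
}

# ASSUMPTION: branch string for Solaris 11.1 SRU 19.6; verify before relying on it.
MIN_BRANCH="0.175.1.19.0.6.0"

# Only attempt the live check on a Solaris system that has the pkg command.
if [ "$(uname -s)" = "SunOS" ] && command -v pkg >/dev/null 2>&1; then
    branch=$(pkg info entire | awk '/Branch:/ { print $2 }')
    if sru_at_least "$branch" "$MIN_BRANCH"; then
        echo "OK: entire is at branch $branch"
    else
        echo "TOO OLD: entire is at branch $branch, need $MIN_BRANCH or later"
    fi
fi
```

Run it in each guest domain and in each service domain that backs its virtual disks; a single domain left at the older level keeps that I/O path on the prior algorithm.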
To gain the full performance benefits, it is still important to have properly sized service domains. The small allocations used for older servers and modest workloads, say one CPU core and 4GB of RAM, may not be enough. Consider boosting your control domain and other service domains to two cores and 8GB or 16GB of RAM: if the service domain is starved for resources, then all of the clients depending on it will be delayed. Use ldm list to see if the domains have high CPU utilization and adjust appropriately.
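The ldm list check can be scripted along these lines. This is a sketch under assumptions: busy_domains is a hypothetical helper, the 70 percent threshold is arbitrary, and it assumes UTIL is the seventh column of ldm list output and is printed with a trailing percent sign. The resize commands at the end are shown as comments because the right values depend on your server and workload.

```shell
# busy_domains is a hypothetical helper: it reads `ldm list` output on stdin
# and prints the name and UTIL value of every domain at or above the
# threshold given in $1 (a percentage). The header line is skipped.
# ASSUMPTION: UTIL is the 7th column and looks like "12%" or "0.5%".
busy_domains() {
    awk -v limit="$1" 'NR > 1 {
        util = $7
        sub(/%/, "", util)
        if (util + 0 >= limit) print $1, $7
    }'
}

# Run the check only where the ldm command exists (the control domain).
if command -v ldm >/dev/null 2>&1; then
    ldm list | busy_domains 70
fi

# If a service domain shows up as busy, it could be resized, for example:
#   ldm set-core 2 primary
#   ldm set-memory 16G primary
```

A single snapshot can mislead, so it is better to sample several times while the workload is running before deciding to add cores or memory.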
It's also essential to have appropriate virtual disk backends. No virtualization enhancement is going to make a single disk super-fast; a single spindle is going to max out at 150 to 300 IOPS no matter what you do. That's for rotating media - SSD or a LUN backed by cache hits will behave differently, of course. This is really intended for the robust disk resources needed for an I/O intensive application, just as would be the case for non-virtualized systems.
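To put the spindle figures in perspective, here is the back-of-the-envelope arithmetic for the 150K IOPS level shown in the charts. It is a sketch only: spindles_needed is a hypothetical helper, and the calculation ignores array cache, SSD, and RAID overhead.

```shell
# spindles_needed is a hypothetical helper: how many rotating disks it takes
# to reach a target IOPS rate, ignoring cache, SSD, and RAID overhead.
spindles_needed() {
    # $1 = target IOPS, $2 = IOPS one spindle can sustain
    # Integer ceiling division.
    echo $(( ($1 + $2 - 1) / $2 ))
}

spindles_needed 150000 300   # prints 500
spindles_needed 150000 150   # prints 1000
```

In other words, rates like those in the charts call for hundreds of spindles or for storage that serves much of the load from cache or flash, which is why the backend matters as much as the virtualization layer.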
While there may be some benefits for virtual disks backed by files or ZFS 'zvols', the emphasis and measurements have focused on production I/O configurations based on enterprise storage arrays presenting many LUNs.
The big picture
Now, Oracle VM Server for SPARC can be used with virtual I/O that maintains flexibility without compromising on performance, for both network and disk I/O. This can be applied to the most demanding applications with full performance.
Properly configured systems, in terms of choice of device backends and domain configuration, can achieve performance comparable to what they would receive in a non-virtualized environment, while still maintaining the features of dynamic reconfiguration (add and remove virtual devices as needed) and live migration. For upwards compatibility, and for applications requiring the ultimate in performance, we continue the availability of physical I/O, using root complex domains that own entire PCIe buses, or using SR-IOV. That said, the improved performance of virtual I/O means that there will be fewer instances in which physical I/O is necessary - virtual I/O will increasingly be the recommended way to provide I/O without compromising performance or functionality.