Friday Apr 22, 2016

New whitepaper: Optimizing Oracle VM Server for x86 Performance

I am very pleased to announce publication of a new whitepaper "Optimizing Oracle VM Server for x86 Performance". This article contains material previously posted on this blog, plus additional technical information and features newly introduced with Oracle VM 3.4.

Wednesday Apr 20, 2016

Upgrading Oracle VM Server to version 3.3.4

This blog entry shows the step-by-step procedure I used to upgrade Oracle VM Server from version 3.3.3 to 3.3.4, corresponding to the upgrade performed last week for Oracle VM Manager.

Tuesday Apr 12, 2016

Upgrading Oracle VM Manager to version 3.3.4

This blog entry shows the step-by-step procedure I used to upgrade Oracle VM Manager from version 3.3.3 to 3.3.4. No muss, no fuss.

Thursday Mar 24, 2016

Oracle VM 3.4.1 and new performance features

Oracle VM 3.4.1 has just been released, with important new features and improved performance and scalability. This blog describes a new feature that can be used to further improve disk device performance on Oracle VM 3.4.1, and also on the recent maintenance release Oracle VM 3.3.4.

Friday Mar 18, 2016

Root domains and I/O on SPARC M7

Please see the excellent blog entry on root domains and how they've changed (for the better) on SPARC M7 servers at the blog article Complex Root Domains. The article refers to SR-IOV but doesn't discuss it, in order to focus on root domains, but SR-IOV also remains available on M7 systems for physical I/O with high resource granularity.

Monday Nov 30, 2015

Oracle VM Performance and Tuning - Part 5

The fifth article in this series on Oracle VM performance focuses on Oracle VM Server for x86 domain types, huge pages, and CPU scheduling controls.

Tuesday Nov 24, 2015

Oracle VM Performance and Tuning - Part 4

The fourth article in this series on Oracle VM performance focuses on Oracle VM Server for x86 CPU and memory performance, with guidance on how to reduce memory latency and control CPU allocation.

Wednesday Nov 11, 2015

Virtual HBA in Oracle VM Server for SPARC

Oracle VM Server for SPARC 3.3 added an important new feature, virtual HBA (vHBA), which adds flexibility and relieves prior limitations of virtual I/O without sacrificing performance. This blog entry describes this new feature and shows how to use it.

Friday Nov 06, 2015

Oracle VM Performance and Tuning - Part 3

The third article in this series on Oracle VM performance focuses on performance goals for virtual machine and cloud environments, and on performance principles for CPU, memory, and I/O that behave differently in VM environments.

Thursday Oct 08, 2015

Oracle VM Server for SPARC Best practices: naming virtual network devices

This blog shows a simple usability best practice to make it easier to identify network resources using 'ldm list-netdev'.

Friday Oct 02, 2015

Oracle VM Performance and Tuning - Part 2

This article on Oracle VM performance reviews general performance principles, and follows with a review of Oracle VM architectural features that affect performance. This will be high-level as a basis for more technical detail in subsequent articles.

How to evaluate and measure performance (short version)

First, let's consider ways not to evaluate performance. Performance is often stated in unquantified generalities ("Give good response time!") or complaints ("Response time is terrible today. Fix it!"). That doesn't help us understand the performance situation.

Another habit is to look at resource utilization without relevance to delivered performance. For example:

  • High CPU utilization. Is 95% CPU busy bad or good?

    High CPU busy may just mean you're getting your money's worth from your servers. It is only a problem if service level objectives aren't being met due to resource starvation. However, it could be a symptom of a problem (program looping, excessive error handling, etc) rather than a problem in itself.

  • Low CPU utilization. Is 12% CPU busy bad or good?

    Low CPU may mean the workload is idle, or it's a single threaded app on a server with many CPUs. (Is this an 8 CPU machine where one CPU is 100% busy while the others are idle?)

    Applications are often unable to drive CPU because they are waiting on I/O, or because memory is over committed and they are thrashing. Low utilization might be innocent, or can indicate a bottleneck elsewhere.

So, raw utilization numbers are not good or bad in themselves. They can be clues when used in the right context - "it depends". Another trap is to use average numbers, which can hide peak loads and spikes.
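As a sketch of the point above (the intervals are arbitrary, and the interpretation comments are mine, not command output), aggregate statistics can hide exactly the single-threaded case described earlier, while per-CPU statistics expose it:

```
# Aggregate utilization: on an 8-CPU machine with one pegged CPU,
# the system-wide average reads as only ~12% busy.
vmstat 5

# Per-CPU utilization: the same machine shows one CPU near 100%
# usr+sys while the other seven sit idle.
mpstat 5
```

The point isn't the specific tools; it's that any averaged metric should be decomposed before drawing conclusions.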

There are other popular measurements that don't help, such as microbenchmarks that bear no resemblance to the actual workload, or measuring how long it takes to run the dd command to write a gigabyte of zeroes when the expected workload is random I/O. Unless the purpose of the system is to use dd to write zeroes, that's of limited utility. Measuring the wrong thing because it's easy is so common that it has its own name: the streetlight effect.

Instead, performance analysts and systems administrators should measure against requirements of their business users, using service level objectives stated in external terms: meeting a deadline to run a particular task (such as: get payroll out, post and clear stock trades, or close the books at end of quarter), or meeting response times for different types of transaction (load a web page, do a stock quote, transact a purchase or trade). Performance objectives are commonly expressed in the form of response times ("95% of a specified transaction must complete in X seconds at a rate of N transactions per second") or in throughput rates ("handle N payroll records in H hours").
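To make the response-time form of an objective concrete, here is a small hypothetical sketch: it checks a made-up set of transaction latencies against a "95% complete within 2 seconds" target. The nearest-rank percentile method and all the values are illustrative, not from any Oracle VM tool:

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample covering pct percent of the data."""
    ordered = sorted(samples)
    rank = max(math.ceil(pct / 100.0 * len(ordered)) - 1, 0)
    return ordered[rank]

# Hypothetical transaction response times in seconds (illustration only)
latencies = [0.4, 0.5, 0.6, 0.7, 0.9, 1.1, 1.3, 1.8, 2.4, 3.0]

p95 = percentile(latencies, 95)
slo_met = p95 <= 2.0   # SLO: 95% of transactions complete within 2 seconds
print("p95 =", p95, "SLO met:", slo_met)
```

Here the 95th-percentile latency is 3.0 seconds, so the objective is missed even though most transactions are fast - which is exactly why percentiles, not averages, belong in service level objectives.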

In the preceding paragraphs, I've touched on essential performance concepts: throughput, response time, latency, utilization, and service levels. We'll use these terms when relating them to virtual machine performance.

Oracle VM - architectural review

Some may find the preceding material too abstract and not specific to virtualization, so let's change gears and discuss the architecture of the Oracle VM hypervisors.

The Oracle VM family includes Oracle VM Server for x86 and Oracle VM Server for SPARC, two hypervisors that optionally share Oracle VM Manager as a common management infrastructure. There is also a desktop virtualization product, Oracle VM VirtualBox. VirtualBox is popular for end-user virtual machines on a desktop or laptop, but is out of scope for this series of articles.

Oracle VM Server for x86 and Oracle VM Server for SPARC have architectural similarities and differences. Both use a small hypervisor in conjunction with a privileged virtual machine ("domain") for administration and for virtual and physical device management. The hypervisor resides in firmware on SPARC, and in software on x86. That contrasts with traditional virtual machine systems that use a monolithic hypervisor kernel for system control and device management.

Oracle VM Server for x86 is based on Xen virtualization technology, and uses a "dom0" (domain 0) as an administrative control point and to provide virtual I/O device services to the guest VMs ("domU"). Oracle VM Server for SPARC uses a small firmware-based hypervisor coupled with a "control domain" that can be compared to dom0 on x86, with the option of having multiple "service domains" for resiliency.

The two products also have similarities and differences in how they handle systems resources:

CPU
  • Oracle VM Server for x86: CPUs can be shared, oversubscribed, and timesliced using a share-based scheduler. Virtual CPUs can be allocated cores (or CPU threads if hyperthreading is enabled), and the number of virtual CPUs in a domain can be changed while the domain is running.
  • Oracle VM Server for SPARC: CPUs are dedicated to each domain, with static assignment when the domain is "bound". Domains are given exclusive use of some number of CPU cores or threads, which can be changed while the domain is running.

Memory
  • Oracle VM Server for x86: Memory is dedicated to each domain, with no over-subscription. The hypervisor attempts to assign a VM's memory to a single NUMA node, and has CPU affinity rules to try to keep a VM's virtual CPUs near its memory for local latency.
  • Oracle VM Server for SPARC: Memory is dedicated to each domain, with no over-subscription. The hypervisor attempts to assign memory on a single NUMA node, and to allocate CPUs on the same NUMA node for local latency.

I/O
  • Oracle VM Server for x86: Guest VMs are provided virtual network, console, and disk devices served by dom0.
  • Oracle VM Server for SPARC: Guest VMs are provided virtual HBA, network, console, and disk devices served by the control domain and optional service domains. VMs can also use physical I/O with direct connection to SR-IOV virtual functions or PCIe buses.

Domain types
  • Oracle VM Server for x86: Guest VMs (domains) may be hardware virtualized (HVM), paravirtualized (PV), or hardware virtualized with PV device drivers.
  • Oracle VM Server for SPARC: Guest VMs (domains) are paravirtualized.

That's a lot of similarity for two products with different origins. When I'm asked for a quick summary, I say that the two products have a common memory model (VM memory is fixed, not overcommitted or swapped - very important), but different CPU models (Oracle VM Server for SPARC uses dedicated CPUs on servers that have lots of them, while x86 uses a more traditional software-based scheduler that time-slices virtual CPUs onto physical CPUs). Both products are aware of NUMA effects and try in different ways to reduce remote memory latency from CPUs. Both have virtual network and virtual disk devices, but the SPARC side has additional options for device backends and non-virtualized I/O. Finally, the x86 side has more domain types, reflecting the wide range of x86 operating systems.

That's an introduction to the concepts. The next article (rubbing hands together in anticipation!) will delve more into the technologies and their performance implications.


For general performance analysis, I recommend Brendan Gregg's book Systems Performance: Enterprise and the Cloud. It has excellent content for any performance analyst, as well as details for various versions of Linux and Solaris.

Thursday Aug 06, 2015

Oracle VM Performance and Tuning - Part 1

This blog entry starts a series of articles on virtual machine performance, focusing (obviously) on Oracle VM Server on both x86 and SPARC, though also including general concepts.

Thursday May 07, 2015

Oracle VM Server for SPARC 3.2 now available for Solaris 10 control domains

Oracle has released Oracle VM Server for SPARC 3.2 packages for Solaris 10 control domains. The package can be downloaded from

Not all of the performance and functional enhancements of Oracle VM Server for SPARC 3.2 are available when used with Solaris 10. Oracle recommends that customers use Solaris 11, especially for the control domain, service and I/O domains. Note that future Oracle VM Server for SPARC releases will no longer support the running of the Oracle Solaris 10 OS in control domains. You can continue to run the Oracle Solaris 10 OS in guest domains, root domains, and I/O domains when using future releases. Solaris 10 guest domains can be used with Solaris 11 control domains, allowing interoperability while moving to Solaris 11. For additional details, please see the Release Notes.

Thursday Apr 09, 2015

Oracle VM Server for SPARC 3.2 - Enhanced Virtual Disk Multipathing

Last month, Oracle released Oracle VM Server for SPARC release 3.2, which includes numerous enhancements. One of these is an improvement to virtual disk multipathing, which provides redundant paths to a virtual disk so that disk access continues even if a path or service domain fails.

Multipath groups are arranged in an active/standby pair of connections to the same physical media. In case of a path or service domain failure, I/O activity continues on a surviving path. This is also helpful for rolling upgrades: a service domain can be rebooted for an upgrade, and virtual disk I/O continues without interruption. That's important for continuous availability while upgrading system software.

A previous limitation was that you could not determine by command which path was active, and you couldn't force activity onto a selected path. That meant that all the I/O for multiple virtual disks typically went to the primary path instead of being load balanced across service domains and HBAs. You could deduce which service domains were actively doing disk I/O by using commands like iostat, but there was no direct visibility, and no way to spread the load. Oracle VM Server for SPARC 3.2 addresses this by adding command output that shows which path is active, and by letting you switch the active path to one of the available paths. Now, the command 'ldm list-bindings' shows which path is active, and the command 'ldm set-vdisk' lets you set which path is active. For further details and syntax, please see the documentation at Configuring Virtual Disk Multipathing.
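As a sketch of how this looks in practice (the domain and disk names "myguest" and "vdisk0" are invented; check the Configuring Virtual Disk Multipathing documentation for exact syntax on your release):

```
# Show bindings for a guest; the active path in each mpgroup is flagged
ldm list-bindings myguest

# Switch the active path of virtual disk "vdisk0" in domain "myguest"
# to a named path served by the alternate service domain
ldm set-vdisk path=alternate-path-name vdisk0 myguest
```

This makes it possible to deliberately spread virtual disk I/O across service domains and HBAs rather than letting everything default to the primary path.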

Monday Mar 23, 2015

Oracle VM Server for SPARC 3.2 - Live Migration

Oracle has just released Oracle VM Server for SPARC release 3.2. This update has been integrated into Oracle Solaris 11.2 beginning with SRU 8.4. Please refer to Oracle Solaris 11.2 Support Repository Updates (SRU) Index [ID 1672221.1]. 

This new release introduces the following features:

  • Improved multipath virtual disk I/O (mpgroup): view, set active I/O path
  • Improved domain observability to show dependencies between service and guest domains
  • Improved network observability, quality of service and security management, and PVLAN
  • I/O Resiliency (IOR) for physical I/O
  • Dynamic Bus (dynamically assign PCI bus to domains)
  • Live migration improvements
  • Guest additions (VM API to interact with host environment)
  • Guest access to SPARC performance counters

Live migration performance and security enhancements

This blog entry details 3.2 improvements to live migration. Oracle VM Server for SPARC has supported live migration since release 2.1, and has been enhanced over time to provide features like cross-CPU live migration to permit migrating domains across different SPARC CPU server types. Oracle VM Server for SPARC 3.2 improves live migration performance and security.

Live migration performance

The time to migrate a domain is reduced in Oracle VM Server for SPARC 3.2 by the following improvements:

  • Parallel page copying and memory mapped I/O: data compression and transmission were already multi-threaded, but copying from hypervisor memory was single-threaded and buffers were copied twice. This change adds worker threads for parallelism and reduces the number of times data is copied, including for network I/O.
  • LZJB compression: Memory is compressed before it is encrypted and transmitted over the network. This change uses the fast, lightweight LZJB (Lempel-Ziv Jeff Bonwick) algorithm to quickly compress and decompress memory pages. Zero-fill pages are skipped, and pages that are only slightly reduced in size are sent unchanged. That reduces overall processing time.
These and other changes reduce overall migration time, reduce domain suspension time (the time at the end of migration when the domain is paused to retransmit the last remaining pages), and reduce CPU utilization. In my own testing I've seen migration speedups from 50% to 500% depending on the guest domain activity and memory size. Others may experience different times, depending on network and CPU speeds and domain configuration.

This improvement is available on all SPARC servers supporting Oracle VM Server for SPARC, including the older UltraSPARC T2, UltraSPARC T2 Plus, and SPARC T3 systems. Some speedups are only available for guest domains running Solaris 11.2 SRU 8 or later, and will not be available on Solaris 10. Solaris 10 guests must run Solaris 10 10/09 or later, as that release introduced code for cooperative live migration that works with the hypervisor.

Live migration security

Oracle VM Server for SPARC 3.2 improves live migration security by adding certificate-based authentication and supporting the FIPS 140-2 standard.

Certificate based authentication

Live migration requires mutual authentication between the source and target servers. The simplest way to initiate live migration is to issue an "ldm migrate" command on the source system, specifying an administrator password on the target system or pointing to a root-readable file containing the target system's password. That is cumbersome, and not ideal for security. Oracle VM Server for SPARC 3.2 adds a secure, scalable way to permit password-less live migration using certificates, which prevents man-in-the-middle attacks.

This is accomplished by using SSL certificates to establish a trust relationship between different servers' control domains, as described at Configuring SSL Certificates for Migration. In brief, a certificate is securely copied from the remote system's /var/opt/SUNWldm/server.crt to the local system's /var/opt/SUNWldm/trust, and a symbolic link is made from the certificate in the ldmd trusted certificate directory to /etc/certs/CA. After the certificate and ldmd services are restarted, the two control domains can securely communicate with one another without passwords. This enhancement is available on all servers supporting Oracle VM Server for SPARC, using either Solaris 10 or Solaris 11.
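The steps above can be sketched as the following commands, run on the local control domain (the hostname "target-cd" and certificate filename are invented for illustration; the paths and service names are those in the procedure above - see Configuring SSL Certificates for Migration for the authoritative steps):

```
# 1. Securely copy the remote control domain's ldmd certificate
#    into the local trusted certificate directory
scp root@target-cd:/var/opt/SUNWldm/server.crt \
    /var/opt/SUNWldm/trust/target-cd.pem

# 2. Link the trusted certificate into the Solaris CA directory
ln -s /var/opt/SUNWldm/trust/target-cd.pem /etc/certs/CA/

# 3. Restart the certificate service, then ldmd, to pick up the trust change
svcadm restart svc:/system/ca-certificates
svcadm restart ldmd
```

Repeat in the other direction so both control domains trust each other, after which 'ldm migrate' no longer needs a password.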

FIPS 140-2 Mode

The Oracle VM Server for SPARC Logical Domains Manager can be configured to perform domain migrations using the Oracle Solaris FIPS 140-2 certified OpenSSL libraries. When this is in effect, migrations conform to this standard, and can only be done between servers that are all in FIPS 140-2 mode.

For more information, please see Using a FIPS 140 Enabled System in Oracle® Solaris 11.2. This enhancement requires that the control domain run Oracle Solaris 11.2 SRU 8.4 or later.

Where to get more information

For additional resources about Oracle VM Server for SPARC 3.2, please see the documentation, especially the What's New page, the Release Notes, and the Administration Guide.



