Is "Zero-Overhead Virtualization" Just Hype?
When Oracle SuperCluster first shipped, as Oracle SuperCluster T4-4, Oracle claimed zero-overhead virtualization for the domain technology it used. Was this claim just marketing hype, or was it real? And is the claim still made for current SuperCluster platform releases?
To answer these questions, we need to examine the virtual machine implementation used on SuperCluster: Oracle VM Server for SPARC, also known as Logical Domains (LDoms for short). Oracle VM Server for SPARC is a Type 1 hypervisor implemented in firmware on all modern SPARC systems. The virtual machines it creates are referred to as Domains.
The diagram below illustrates a typical industry approach to virtualization. In this case, available hardware resources are shared across virtual machines, with the allocation of resources managed by a hypervisor implemented using a software abstraction layer. This approach delivers flexibility, but at the cost of weaker isolation and increased virtualization overheads. Optimal performance is delivered only by “bare metal” configurations that eliminate the hypervisor (and therefore do not support virtualization).
By contrast, Oracle VM Server for SPARC has a number of unique characteristics:
- SPARC systems always use the SPARC firmware-based hypervisor, whether or not domains have been configured—there is no “bare metal” configuration on SPARC that eliminates the hypervisor. For this reason, the concept of bare metal that applies to most other platforms has no meaning on SPARC systems. An important implication is that no additional virtualization layer is required on SPARC systems when configuring domains. That means no additional performance overheads are introduced, either.
- The SPARC hypervisor partitions CPU and memory resources rather than virtualizing them. That approach is possible because CPU and memory resources are never shared by SPARC domains. Each hardware CPU strand is uniquely assigned to one and only one domain. In other words, each virtual CPU in a domain is backed by a dedicated hardware strand. Further, each memory block is uniquely assigned to one and only one domain. This approach has a number of important implications:
  - Since each domain has its own dedicated CPU resources, no hypervisor scheduler is needed to time-slice virtual CPUs onto physical CPUs. Each virtual CPU is a dedicated hardware strand, so the domain’s operating system schedules its threads directly onto the hardware. As a result, the scheduling overheads inherent in most virtualization implementations simply don’t apply on SPARC systems.
  - Memory resources in each domain are also dedicated to that domain. That means domain memory access is not subject to an additional layer of virtualization, either. Memory access operates in the same way on all SPARC systems, whether or not they use domains.
  - Neither CPU nor memory can be over-provisioned with SPARC domains: a domain can only be assigned strands and memory blocks that no other domain owns.
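The partitioning rules above can be sketched as a toy model. This is a minimal illustration, assuming a system described only by a count of hardware strands and memory blocks. The `PartitionedHost` class, its methods, and `OverProvisionError` are hypothetical names invented here; they are not part of Oracle VM Server for SPARC or its `ldm` command. The model shows two properties from the list: every virtual CPU is backed by exactly one strand owned by exactly one domain, and a request that exceeds the unassigned pool is refused rather than overcommitted.

```python
from dataclasses import dataclass, field


class OverProvisionError(Exception):
    """Raised when a request exceeds the strands or memory blocks still unassigned."""


@dataclass
class PartitionedHost:
    """Hypothetical toy model of firmware-level partitioning (not Oracle's API)."""

    strands: int        # hardware CPU strands in the system
    memory_blocks: int  # memory blocks in the system (arbitrary unit)
    strand_owner: dict = field(default_factory=dict)  # strand id -> domain name
    block_owner: dict = field(default_factory=dict)   # block id -> domain name

    @staticmethod
    def _free(total, owner):
        # Resources not yet assigned to any domain.
        return [i for i in range(total) if i not in owner]

    def create_domain(self, name, vcpus, mem_blocks):
        free_strands = self._free(self.strands, self.strand_owner)
        free_blocks = self._free(self.memory_blocks, self.block_owner)
        # Partitioning, not sharing: a request larger than the unassigned
        # pool is rejected outright instead of being overcommitted.
        if vcpus > len(free_strands) or mem_blocks > len(free_blocks):
            raise OverProvisionError(f"cannot create {name}: not enough unassigned resources")
        for strand in free_strands[:vcpus]:
            self.strand_owner[strand] = name  # one virtual CPU = one dedicated strand
        for block in free_blocks[:mem_blocks]:
            self.block_owner[block] = name    # each block belongs to exactly one domain

    def strands_of(self, name):
        return sorted(s for s, d in self.strand_owner.items() if d == name)

    def blocks_of(self, name):
        return sorted(b for b, d in self.block_owner.items() if d == name)
```

Consider a hypothetical host with 16 strands and 8 memory blocks. Two domains of 8 strands and 4 blocks each would consume everything, and any further `create_domain` call would raise `OverProvisionError`. A conventional shared hypervisor would instead accept the request and time-slice virtual CPUs across the physical ones.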
We have seen that access to CPU and memory resources on SPARC systems used in Oracle SuperCluster does not impose virtualization overheads. There are two reasons: these resources are dedicated to each domain, and the same highly efficient SPARC hypervisor is always in use, whether or not domains are configured.
We’ve examined CPU and memory. What about I/O? I/O virtualization is a major source of performance overhead in most virtualization implementations.
I/O virtualization with Oracle VM Server for SPARC takes one of three forms:
- Partitioning at PCIe slot granularity.
In this case, one or more PCIe slots, along with any PCIe devices hosted in them, are assigned uniquely to a single domain, so those I/O devices are dedicated to that domain. As with CPU and memory, the virtualization here is limited to resource partitioning. It therefore does not incur the overheads inherent in traditional virtualization.
This type of virtualization has been available on every Oracle SuperCluster platform release, and it was the only option on the original SPARC SuperCluster T4-4 platform. In this implementation, InfiniBand HCAs (which carry all storage and network traffic within SuperCluster) and 10GbE NICs (which carry network traffic between the SuperCluster rack and the datacenter) are dedicated to the domains to which they are assigned. As with CPU and memory access, I/O access in this implementation follows the same code path whether or not domains are in use.
Domains of this type are referred to as Dedicated Domains on SuperCluster, since all CPU and memory resources, and all InfiniBand and 10GbE devices, are uniquely dedicated to a single domain. Such domains incur zero virtualization overhead. SuperCluster Dedicated Domains are illustrated in the diagram below.
- Virtualization based on SR-IOV.
For Oracle SuperCluster T5-8 and subsequent SuperCluster platform releases, shared I/O has also been available for InfiniBand and 10GbE devices. The resulting I/O Domains leverage SR-IOV technology, and their I/O virtualization has very low, but not zero, performance overheads. The benefit of SR-IOV is that a single InfiniBand or 10GbE device can be shared between multiple domains. The device exposes multiple virtual functions, each of which can be assigned to a different domain, so I/O Domains do not require dedicated I/O devices. SuperCluster I/O Domains are illustrated in the diagram below.
- Virtualization based on proxies in combination with virtual device drivers.
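In this case, a virtual device driver in the domain forwards requests to a proxy running in a service domain that owns the physical device. This type of virtualization has been used on all SuperCluster implementations for functions that are not performance-critical, such as console access and the virtual disks used as domain root and swap devices.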
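The three I/O forms differ mainly in who owns what. The sketch below models that ownership under some stated assumptions. `PCIeSlot`, `SriovDevice`, and `VirtualDisk` are hypothetical classes, and the device and domain names used with them (such as `primary` as the service domain) are illustrative only. The model deliberately ignores performance. It captures only three things: a partitioned slot belongs to one domain; an SR-IOV device is shared by handing out virtual functions, each owned by one domain; and proxy-based I/O routes a domain's requests through a service domain.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PCIeSlot:
    """Partitioned I/O: the slot, and any device in it, is dedicated to one domain."""
    name: str
    owner: str | None = None

    def assign(self, domain: str) -> None:
        if self.owner is not None and self.owner != domain:
            raise ValueError(f"{self.name} is already dedicated to {self.owner}")
        self.owner = domain


@dataclass
class SriovDevice:
    """SR-IOV: the physical device exposes virtual functions (VFs); each VF is
    assigned to one domain, so the physical device is shared between domains."""
    name: str
    max_vfs: int
    vf_owner: dict[int, str] = field(default_factory=dict)  # VF index -> domain

    def assign_vf(self, domain: str) -> str:
        for vf in range(self.max_vfs):
            if vf not in self.vf_owner:
                self.vf_owner[vf] = domain
                return f"{self.name}.vf{vf}"
        raise ValueError(f"{self.name} has no free virtual functions")

    def domains(self) -> set[str]:
        return set(self.vf_owner.values())


@dataclass
class VirtualDisk:
    """Proxy-based I/O: a virtual driver in the guest forwards each request to a
    proxy in the service domain, which owns the physical device."""
    name: str
    service_domain: str

    def request_path(self, domain: str) -> list[str]:
        return [
            f"{domain}: virtual disk driver",
            f"{self.service_domain}: disk proxy",
            f"physical device backing {self.name}",
        ]
```

In this model, a Dedicated Domain would own whole `PCIeSlot` objects. By contrast, several I/O Domains could each receive one virtual function from the same `SriovDevice`.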
All Oracle SuperCluster platforms since Oracle SuperCluster T5-8—including the current Oracle SuperCluster M8—support hybrid configurations that deliver InfiniBand and 10GbE I/O virtualization via Dedicated Domains (domains that use PCIe slot partitioning), and/or via I/O Domains (domains that leverage SR-IOV virtualization).
An additional layer of virtualization is also supported: one or more low-overhead Oracle Solaris Zones can be deployed in domains of any type. An example of a configuration featuring nested virtualization is illustrated in the diagram below.
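The two boundaries in a nested, hybrid configuration can be expressed as a simple consistency check. This is a minimal sketch under a few assumptions. Each domain is described by its type and the set of strands it owns, and each zone by the strands it may run on. `Domain`, `Zone`, `validate`, and the `"dedicated"`/`"io"` labels are hypothetical and are not SuperCluster tooling. Domains must never share strands. Zones inside one domain may share that domain's strands, but they can never reach outside it.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical labels: Dedicated Domain (PCIe slots) or I/O Domain (SR-IOV).
DOMAIN_TYPES = {"dedicated", "io"}


@dataclass
class Zone:
    name: str
    cpus: set[int]  # strands this zone may run on


@dataclass
class Domain:
    name: str
    kind: str          # one of DOMAIN_TYPES
    strands: set[int]  # strands partitioned to this domain
    zones: list[Zone] = field(default_factory=list)


def validate(domains: list[Domain]) -> bool:
    """Raise ValueError if the configuration breaks a partition boundary."""
    claimed: set[int] = set()
    for d in domains:
        if d.kind not in DOMAIN_TYPES:
            raise ValueError(f"{d.name}: unknown domain type {d.kind!r}")
        # Domain boundary: strands are partitioned, never shared.
        if claimed & d.strands:
            raise ValueError(f"{d.name}: strands already assigned to another domain")
        claimed |= d.strands
        for z in d.zones:
            # Zone boundary: zones subdivide their own domain only. Zones in the
            # same domain are allowed to overlap (share strands).
            if not z.cpus <= d.strands:
                raise ValueError(f"zone {z.name} uses strands outside domain {d.name}")
    return True
```

The asymmetry is deliberate. The domain layer is a hard partition, while zones provide lighter-weight isolation that can share resources within a single domain.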
The Oracle SuperCluster tooling leverages SuperCluster’s built-in redundancy, along with both the resource partitioning and the resource virtualization described above, to let customers deploy flexible and highly available configurations. High Availability will be the subject of a future SuperCluster blog post.
In summary, SPARC domains offer efficient and secure isolation with zero or very low performance overheads. The current Oracle SuperCluster M8 platform delivers domain-based virtual machines with zero performance overheads for CPU and memory operations. Oracle SuperCluster M8 virtual machines also deliver I/O virtualization for InfiniBand and 10GbE, with either zero performance overheads via Dedicated Domains or very low performance overheads via I/O Domains.
About the Author
Allan Packer is a Senior Principal Software Engineer working for the Solaris Systems Engineering organization in the Operating Systems and Virtualization Engineering group at Oracle. He has worked on issues related to server systems performance, sizing, availability, and resource management, developed performance and regression testing tools, published several TPC industry-standard benchmarks as technical lead, and developed a systems/database training curriculum. He has published articles in industry magazines, presented at international industry conferences, and his book "Configuring and Tuning Databases on the Solaris Platform" was published by Sun Press in December 2001. Allan is currently the technical lead and architect for Oracle SuperCluster.