This is the original version of this blog entry kept for reference.
Please refer to the updated version.
This post is one of a series of "best practices" notes for Oracle VM Server for SPARC (formerly called Logical Domains).
Oracle VM Server for SPARC is a high performance virtualization technology for SPARC servers. It provides native CPU performance without the virtualization overhead typical of hypervisors. The way memory and CPU resources are assigned to domains avoids problems often seen in other virtual machine environments, and there are intentionally few "tuning knobs" to adjust.
However, there are best practices that can enhance or ensure performance. This blog post lists and briefly explains performance tips and best practices that should be used in most environments. Detailed instructions are in the Oracle VM Server for SPARC Administration Guide (http://docs.oracle.com/cd/E37707_01/html/E29665/toc.html). Other important information is in the Release Notes (http://docs.oracle.com/cd/E37707_01/html/E29668/index.html). The Oracle VM Server for SPARC documentation home page is at http://www.oracle.com/technetwork/documentation/vm-sparc-194287.html.
Standard Solaris tuning practices apply within domains just as they do on bare metal: the *stat tools, DTrace, driver options, TCP window sizing, /etc/system settings, and so on.
Keep firmware, the Logical Domains Manager, and Solaris up to date. That includes the firmware, which is easy to "install once and forget". The firmware contains much of the logical domains infrastructure, so it should be kept current. The Release Notes (http://docs.oracle.com/cd/E37707_01/html/E29668/index.html) list minimum and recommended firmware and software levels needed for each platform.
Some enhancements improve performance automatically just by installing the new versions. Others require administrators to configure and enable new features; the following items mention them as needed.
Use standard Solaris tools such as prstat to see if there is pent-up demand for CPU. Alternatively, issue ldm list -l from the control domain.
Good news: you can dynamically add and remove CPUs to meet changing load conditions, even on the control domain. You can do this manually or automatically with the built-in policy-based resource manager. That's a Best Practice of its own, especially if you have guest domains with peak and idle periods.
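A minimal sketch of both approaches; the domain name, sizes, and policy thresholds below are illustrative, not recommendations:

    ldm list -l mydomain           # check for pent-up CPU demand
    ldm add-vcpu 8 mydomain        # grow by one core's worth of vCPUs
    ldm remove-vcpu 8 mydomain     # shrink again when load drops

    # Or let the policy-based resource manager resize within bounds automatically:
    ldm add-policy vcpu-min=8 vcpu-max=32 util-upper=70 util-lower=25 \
        name=daypolicy mydomain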
The same applies to memory. Again, the good news is that standard Solaris tools can be used to see if a domain is low on memory, and memory can also be added to or removed from a domain. Applications need the same amount of RAM to run efficiently in a domain as they do on bare metal, so no guesswork or fudge-factor is required. Logical domains do not oversubscribe memory, which avoids problems like unpredictable thrashing.
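A quick sketch of checking and resizing memory (the sizes and domain name are examples):

    ldm list -o memory mydomain    # current memory allocation
    ldm add-mem 4G mydomain        # grow the domain by 4GB
    ldm remove-mem 2G mydomain     # shrink when the memory is no longer needed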
For the control domain and other service domains, a good starting point is at least 1 core (8 vCPUs) and 4GB or 8GB of memory.
Actual requirements must be based on system load: small CPU and memory allocations were appropriate on older, smaller LDoms-capable systems, but larger values are better choices for the demanding, higher-scale systems and applications now used with domains. Today's faster CPUs can generate much higher I/O rates than older systems, and service domains have to be suitably provisioned to support the load. Don't starve the service domains! Two cores and 8GB of RAM are a good starting point if there is substantial I/O load.
Live migration is known to run much faster if the control domain has at least 2 cores, both for total migration time and suspend time, so don't run with a minimum-sized control domain if live migration times are important. In general, add another core if ldm list shows that the control domain is busy.
Add more RAM if you are hosting lots of virtual devices or are running agents, management software, or applications in the control domain and vmstat -p shows that you are short on memory. Both changes can be made dynamically without an outage.
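As an illustration, both adjustments work on a live control domain; the sizes here are arbitrary examples:

    ldm list -l primary            # check CPU utilization of the control domain
    ldm add-core 1 primary         # add a core if it is consistently busy
    ldm add-mem 4G primary         # add RAM if vmstat -p shows memory pressure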
Split-core situations are easily avoided by always assigning virtual CPUs in multiples of 8 (ldm set-vcpu 8 mydomain or ldm add-vcpu 24). It is rarely good practice to give tiny allocations of 1 or 2 virtual CPUs, and definitely not for production workloads. If fine-grained CPU granularity is needed for multiple applications, deploy them in zones within a logical domain for sub-core resource control, as sketched below.
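For instance, a hypothetical zone named appzone can be capped at half a CPU with the standard Solaris capped-cpu resource control:

    zonecfg -z appzone
    zonecfg:appzone> add capped-cpu
    zonecfg:appzone:capped-cpu> set ncpus=0.5   # cap the zone at half a CPU
    zonecfg:appzone:capped-cpu> end
    zonecfg:appzone> exit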
Alternatively, use the whole-core constraint (ldm set-core 1 mydomain or ldm add-core 3 mydomain). The whole-core constraint requires that a domain be given its own cores, or the bind operation will fail. This prevents unnoticed sub-optimal configurations.
In most cases the logical domain manager avoids split-core situations even if you allocate fewer than 8 virtual CPUs to a domain. The manager attempts to allocate different cores to different domains even when partial-core allocations are used. It is not always possible, though, so the best practice is to allocate virtual CPUs in multiples of 8 or use whole cores, as described above.
For a slightly lengthier writeup, see Best Practices - Core allocation (https://blogs.oracle.com/jsavit/entry/best_practices_core_allocation).
Starting with release 3.0, the logical domains manager attempts to bind domains to CPU cores and RAM locations on the same CPU socket, making all memory references local. If this is not possible because of the domain's size or prior core assignments, the domain manager tries to distribute CPU core and RAM equally across sockets to prevent an unbalanced configuration. This optimization is automatically done at domain bind time, so subsequent reallocation of CPUs and memory may not be optimal. Keep in mind that this does not apply to single-board servers, like a T4-1. In many cases, the best practice is to do nothing special.
To further reduce the likelihood of NUMA latency, size domains so they don't unnecessarily span multiple sockets. This is unavoidable for very large domains that need more CPU cores or RAM than are available on a single socket, of course.
If you must control this for the most stringent performance requirements, you can use "named resources" to allocate specific CPU and memory resources to the domain, using commands like ldm add-core cid=3 ldm1 and ldm add-mem mblock=PA-start:size ldm1. This technique is successfully used in the SPARC SuperCluster engineered system, which is rigorously tested on a fixed number of configurations. It should be avoided in general-purpose environments unless you are certain of your requirements and configuration, because it requires model-specific knowledge of CPU and memory topology and increases administrative overhead.
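If you do go down this path, the available resource IDs can be inspected from the control domain before binding; the cid and mblock values here are purely illustrative:

    ldm list-devices -a core       # core IDs (cid) and whether they are bound or free
    ldm list-devices -a memory     # physical memory blocks (PA start address and size)
    ldm add-core cid=3 ldm1
    ldm add-mem mblock=0x80000000:16G ldm1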
If needed, single-thread performance can be emphasized by giving a domain whole cores and setting its threading mode to maximize instructions per cycle with ldm set-domain threading=max-ipc mydomain, but this is generally unnecessary and should not be done unless a workload is known to benefit, since fewer active threads per core means lower total throughput.
Perform migrations during low activity periods. Guests that heavily modify their memory take more time to migrate since memory contents have to be retransmitted, possibly several times. The overhead of tracking changed pages also increases CPU utilization.
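For example, a dry run can validate a planned migration before committing to it during a quiet period (the host and domain names are examples):

    ldm migrate-domain -n mydomain root@target-host   # dry run: feasibility check only
    ldm migrate-domain mydomain root@target-host      # the actual live migration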
Use RxDring support to substantially reduce network latency and CPU utilization. To turn this on (now the default), issue ldm set-domain extended-mapin-space=on mydomain for each of the involved domains. The domains must run Solaris 11 or Solaris 10 update 10 or later, and the involved domains (including the control domain) require a reboot for the change to take effect. This also requires 4MB of RAM per guest.
If you are using a Solaris 10 control or service domain for virtual network I/O, it is important to plumb the virtual switch (vsw) as the network interface and not use the native NIC or aggregate (aggr) interface. If the native NIC or aggr interface is plumbed, there can be a performance impact, since each packet may be duplicated to provide a copy to each client of the physical hardware. Avoid this by not plumbing the NIC and only plumbing the vsw. The vsw doesn't need to be plumbed either, unless the guest domains need to communicate with the service domain. This isn't an issue for Solaris 11 - another reason to use that in the service domain. (Thanks to Raghuram for this great tip.)
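A sketch for a Solaris 10 service domain, with example interface names and addresses; vsw0 is plumbed only because this service domain needs IP connectivity to its guests:

    ldm add-vsw net-dev=nxge0 primary-vsw0 primary    # virtual switch backed by the NIC
    ifconfig vsw0 plumb                               # plumb the vsw, not nxge0
    ifconfig vsw0 192.0.2.10 netmask 255.255.255.0 up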
As an alternative to virtual network I/O, use Direct I/O (DIO) or Single Root I/O Virtualization (SR-IOV) to provide native-level network I/O performance. They currently (this was written in 2013: these restrictions are gone now) have two main limitations: they cannot be used in conjunction with live migration, and they cannot be dynamically added to or removed from a running domain, but they provide superior performance. SR-IOV is described in an excellent blog article by Raghuram Kothakota (https://blogs.oracle.com/raghuram/entry/sr_iov_feature_in_ovm).
ZFS can also be used for disk backends. This provides flexibility and useful features (clones, snapshots, compression) but can impose overhead compared to a raw device. Note that local or SAN ZFS disk backends preclude live migration, because a zpool can be mounted on only one host at a time. When using ZFS backends for virtual disk, use a zvol rather than a flat file - it performs better. Also, make sure that the ZFS recordsize for the ZFS dataset matches the application's I/O size (just as in a non-virtual environment). This avoids read-modify-write cycles that inflate I/O counts and overhead. The default of 128K is not optimal for small random I/O.
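As a sketch, a zvol-backed virtual disk for an application doing 8K I/O might be set up as follows (pool, volume, and domain names are examples; volblocksize is the zvol analogue of recordsize):

    zfs create -V 20g -o volblocksize=8k rpool/vdisks/dom1vol   # zvol, not a flat file
    ldm add-vdsdev /dev/zvol/rdsk/rpool/vdisks/dom1vol dom1vol@primary-vds0
    ldm add-vdisk vdisk0 dom1vol@primary-vds0 mydomain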
For NFS and iSCSI disk backends, the usual network storage tuning applies: disable atime, use hard mounts, and set large read and write sizes.
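For example, assuming the server runs ZFS and exports /export/vdisks (names and sizes are illustrative; the usable rsize/wsize maximum depends on the NFS version and server):

    zfs set atime=off tank/vdisks                  # on the server: stop atime updates
    mount -F nfs -o hard,rsize=1048576,wsize=1048576 \
        filer:/export/vdisks /vdisks               # on the client: hard mount, large I/O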
If the NFS and iSCSI backends are provided by ZFS, such as in the ZFS Storage Appliance, provide lots of RAM for buffering, and install write-optimized solid-state disk (SSD) "logzilla" ZFS Intent Logs (ZIL) to speed up synchronous writes.
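On a ZFS server you administer yourself, adding a write-optimized SSD as a separate log device is a one-line operation (pool and device names are examples):

    zpool add tank log mirror c1t2d0 c1t3d0   # mirrored SSD ZIL for synchronous writes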
By design, logical domains don't have a lot of "tuning knobs", and many tuning practices you would do for Solaris in a non-domained environment apply equally when domains are used. However, there are configuration best practices and tuning steps you can use to improve performance. This blog note itemizes some of the most effective (and least exotic) performance best practices.