Best Practices - Top Ten Tuning Tips

This post is one of a series of "best practices" notes for Oracle VM Server for SPARC (formerly called Logical Domains).

Top Ten Tuning Tips

Oracle VM Server for SPARC is a high performance virtualization technology for SPARC servers. It provides native CPU performance without the virtualization overhead typical of hypervisors. The way memory and CPU resources are assigned to domains avoids problems often seen in other virtual machine environments, and there are intentionally few "tuning knobs" to adjust.

However, there are best practices that can enhance or ensure performance. This blog post lists and briefly explains performance tips and best practices that should be used in most environments. Detailed instructions are in the Oracle VM Server for SPARC Administration Guide. Other important information is in the Release Notes. (The Oracle VM Server for SPARC documentation home page is here.)

Big Rules / General Advice

Some important notes first:
  1. "Best practices" may not apply to every situation. There are often exceptions or trade-offs to consider. We'll mention them so you can make informed decisions. Please evaluate these practices in the context of your requirements and systems.
  2. Best practices and "rules of thumb" change over time as technology changes. What may be "best" at one time may not be the best answer later as new features are added or enhanced.
  3. Continuously measure, tune, and allocate resources to meet service level objectives. Then do something else - it's rarely worth trying to squeeze out the last bit of performance once performance objectives have been achieved!
  4. Standard Solaris tools and tuning apply in a domain or virtual machine just as on bare metal: the *stat tools, DTrace, driver options, TCP window sizing, /etc/system settings, and so on.
  5. The answer to many performance questions is "it depends". Your mileage may vary. In other words: there are few fixed "rules" that say how much performance boost you'll achieve from a given practice.

The Tips

  1. Keep firmware, Logical Domains Manager, and Solaris up to date - Performance enhancements are continually added to Oracle VM Server for SPARC, so staying current is important.

    That includes the firmware, which is easy to "install once and forget". The firmware contains much of the logical domains infrastructure, so it should be kept current. The Release Notes list minimum and recommended firmware and software levels needed for each platform.

    Some enhancements improve performance automatically just by installing the new versions. Others require administrators to configure and enable new features; the following items mention them as needed.
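
    A quick way to confirm what you are currently running is to query the versions from the control domain. This is just a sketch: the exact output varies by platform and release, and the package name shown assumes Solaris 11.

      # ldm -V                  # reports the Logical Domains Manager, hypervisor, and system firmware versions
      # pkg info ldomsmanager   # Solaris 11: shows the installed Logical Domains Manager package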

  2. Allocate sufficient CPU and memory resources to each domain, especially control, I/O and service domains - This should be obvious, but cannot be overemphasized. If a service domain is short on CPU, then all of its clients are delayed. Within the domain you can use vmstat, mpstat, and prstat to see if there is pent-up demand for CPU. Alternatively, issue ldm list or ldm list -l from the control domain.

    Good news: you can dynamically add and remove CPUs to meet changing load conditions, even on the control domain. You can do this manually or automatically with the built-in policy-based resource manager. That's a Best Practice of its own, especially if you have guest domains with peak and idle periods.

    The same applies to memory. Again, the good news is that standard Solaris tools can be used to see if a domain is low on memory, and memory can also be added to or removed from a domain. Applications need the same amount of RAM to run efficiently in a domain as they do on bare metal, so no guesswork or fudge-factor is required. Logical domains do not oversubscribe memory, which avoids problems like unpredictable thrashing.

    For the control domain and other service domains, a good starting point is at least 1 core (8 vCPUs) and 4GB or 8GB of memory. Actual requirements must be based on system load: small CPU and memory allocations were appropriate on the older, smaller LDoms-capable systems, but larger values are better choices for the demanding, larger-scale systems and applications now used with domains. Today's faster CPUs can generate much higher I/O rates than older systems, and service domains have to be suitably provisioned to support that load. Don't starve the service domains! Two cores and 8GB of RAM are a good starting point if there is substantial I/O load.

    Live migration is known to run much faster if the control domain has at least 2 cores, for both total migration time and suspend time, so don't run with a minimum-sized control domain if live migration times are important.

    In general, add another core if ldm list shows that the control domain is busy. Add more RAM if you are hosting lots of virtual devices or running agents, management software, or applications in the control domain, and vmstat -p shows that you are short on memory. Both can be done dynamically without an outage, as the sketch below illustrates.
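
    As a concrete sketch (the domain names and values are placeholders), resources can be added on the fly, and a dynamic resource management policy can grow and shrink a guest's CPU count between limits based on utilization; see the Administration Guide for the full set of policy properties:

      # ldm add-vcpu 8 primary       # add a core's worth of CPU threads to the control domain
      # ldm add-memory 4G primary    # memory can likewise be added (or removed) live
      # ldm add-policy util-lower=30 util-upper=70 vcpu-min=8 vcpu-max=32 \
          tod-begin=08:00 tod-end=18:00 name=daytime mydomain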

  3. Allocate domains on core boundaries - SPARC servers supporting logical domains have multiple CPU cores with 8 CPU threads each. Avoid "split core" situations in which CPU cores are shared by more than one domain (different domains have CPU threads on the same core). This can reduce performance by causing "false cache sharing" in which domains compete for a core's Level 1 cache. The impact on performance is highly variable, depending on the domains' behavior.

    Split core situations are easily avoided by always assigning virtual CPUs in multiples of 8 (ldm set-vcpu 8 mydomain or ldm add-vcpu 24 mydomain). It is rarely good practice to give tiny allocations of 1 or 2 virtual CPUs, and definitely not for production workloads. If finer-grained CPU allocation is needed for multiple applications, deploy them in zones within a logical domain for sub-core resource control.

    Alternatively, use the whole core constraint (ldm set-core 1 mydomain or ldm add-core 3 mydomain). The whole-core constraint requires a domain be given its own cores, or the bind operation will fail. This prevents unnoticed sub-optimal configurations.

    In most cases the logical domain manager avoids split-core situations even if you allocate fewer than 8 virtual CPUs to a domain. The manager attempts to allocate different cores to different domains even when partial core allocations are used. It is not always possible, though, so the best practice is to allocate entire cores.

    For a slightly lengthier writeup, see Best Practices - Core allocation.
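
    To check whether any domains ended up sharing a core, you can display per-core bindings from the control domain (output columns vary by release):

      # ldm list -o core            # core IDs and CPU threads bound to each domain
      # ldm list -o core mydomain   # the same information for a single domain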

  4. Use Solaris 11 in the control and service domains - Solaris 11 contains functional and performance improvements over Solaris 10 (some are mentioned below), and is where future enhancements will be made. It is also required in order to use Oracle VM Manager with SPARC. Guest domains can be a mixture of Solaris 10 and Solaris 11, so there is no problem doing "mix and match" regardless of which version of Solaris is used in the control domain. It is a best practice to deploy Solaris 11 in the control domain even if you haven't upgraded the domains running applications.
  5. NUMA latency - Servers with more than one CPU socket, such as a T4-4, have non-uniform memory access (NUMA) latency between CPUs and RAM. "Local" memory access from CPUs on the same socket has lower latency than "remote". This can have an effect on applications, especially those with large memory footprints that do not fit in cache, or are otherwise sensitive to memory latency.

    Starting with release 3.0, the logical domains manager attempts to bind domains to CPU cores and RAM locations on the same CPU socket, making all memory references local. If this is not possible because of the domain's size or prior core assignments, the domain manager tries to distribute CPU cores and RAM equally across sockets to prevent an unbalanced configuration. This optimization is done automatically at domain bind time, so subsequent reallocation of CPUs and memory may not be optimal. Keep in mind that this does not apply to single-board servers, like a T4-1. In many cases, the best practice is to do nothing special.

    To further reduce the likelihood of NUMA latency, size domains so they don't unnecessarily span multiple sockets. This is unavoidable, of course, for very large domains that need more CPU cores or RAM than are available on a single socket.

    If you must control this for the most stringent performance requirements, you can use "named resources" to allocate specific CPU and memory resources to the domain, using commands like ldm add-core cid=3 ldm1 and ldm add-mem mblock=PA-start:size ldm1. This technique is successfully used in the SPARC Supercluster engineered system, which is rigorously tested on a fixed number of configurations. This should be avoided in general purpose environments unless you are certain of your requirements and configuration, because it requires model-specific knowledge of CPU and memory topology, and increases administrative overhead.
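
    If you do want to check placement, the physical cores and memory blocks bound to a domain can be inspected from the control domain (the domain name is the same placeholder used above):

      # ldm list -o core,memory ldm1   # physical core IDs and memory blocks bound to the domain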

  6. Single thread CPU performance - Starting with the T4 processor, SPARC servers supporting domains can use a dynamic threading mode that allocates all of a core's resources to a single thread for highest single-thread performance. Solaris will generally detect threads that will benefit from this mode and "do the right thing" with little or no administrative effort, whether in a domain or not. An excellent writeup can be found in Critical Threads Optimization in the Observatory blog. Mentioned for completeness' sake: there is also a deprecated command to control this at the domain level by using ldm set-domain threading=max-ipc mydomain, but this is generally unnecessary and should not be done.
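
    For reference, the mechanism described in that article treats threads running in the fixed-priority (FX) scheduling class at priority 60 as critical threads. A sketch of how an administrator could place an existing process there by hand (the process ID is hypothetical):

      # priocntl -s -c FX -m 60 -p 60 -i pid 12345   # run process 12345 in the FX class at priority 60
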
  7. Live Migration - Live migration is CPU intensive in the control domain of the source (sending) host. Configure at least 1 core (8 vCPUs) for the control domain in all cases, but optionally add an additional core to speed migration and reduce suspend time. The core can be added just before starting migration and removed afterwards. If the machine is older than the T4, add crypto accelerators to the control domains. No such step is needed on later machines.

    Perform migrations during low activity periods. Guests that heavily modify their memory take more time to migrate since memory contents have to be retransmitted, possibly several times. The overhead of tracking changed pages also increases CPU utilization.
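
    A typical sequence, with the domain and target host names as placeholders, temporarily grows the control domain for the duration of the migration:

      # ldm add-core 1 primary                    # an extra core speeds migration and reduces suspend time
      # ldm migrate-domain mydomain target-host   # prompts for credentials on the target machine
      # ldm remove-core 1 primary                 # shrink the control domain back afterwards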

  8. Network I/O - Configure aggregates, use multiple network links, use jumbo frames, and adjust TCP windows and other system settings the same way and for the same reasons as you would in a non-virtual environment (a brief sketch appears at the end of this tip).

    Use RxDring support to substantially reduce network latency and CPU utilization. To turn this on, issue ldm set-domain extended-mapin-space=on mydomain for each of the involved domains. The domains must run Solaris 11, or Solaris 10 update 10 or later, and the involved domains (including the control domain) require a reboot for the change to take effect. This also requires 4MB of RAM per guest.

    If you are using a Solaris 10 control or service domain for virtual network I/O, it is important to plumb the virtual switch (vsw) as the network interface and not use the native NIC or aggregate (aggr) interface. If the native NIC or aggr interface is plumbed, there can be a performance impact since each packet may be duplicated to provide a copy to each client of the physical hardware. Avoid this by not plumbing the NIC and plumbing only the vsw. The vsw doesn't need to be plumbed either unless the guest domains need to communicate with the service domain. This isn't an issue for Solaris 11 - another reason to use that in the service domain. (Thanks to Raghuram for the great tip.)

    As an alternative to virtual network I/O, use Direct I/O (DIO) or Single Root I/O Virtualization (SR-IOV) to provide native-level network I/O performance. They provide superior performance, but currently have two main limitations: they cannot be used in conjunction with live migration, and they cannot be dynamically added to or removed from a running domain. SR-IOV is described in an excellent blog article by Raghuram Kothakota.
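
    The commands below sketch the service-domain side of this on Solaris 11; the link, virtual switch, and domain names are placeholders:

      # dladm create-aggr -l net0 -l net1 aggr0          # aggregate two physical links
      # dladm set-linkprop -p mtu=9000 aggr0             # enable jumbo frames on the aggregate
      # ldm set-vsw mtu=9000 primary-vsw0                # match the MTU on the virtual switch
      # ldm set-domain extended-mapin-space=on mydomain  # enable RxDring support (reboot required)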

  9. Disk I/O - For best performance, use a whole-disk backend (a LUN or full disk). Use multiple LUNs to spread load across virtual and physical disks and reduce queueing (just as you would do in a non-virtual environment). Flat files in a file system are convenient and easy to set up as backends, but offer lower performance. For completely native performance, use a PCIe root complex domain and physical I/O.

    ZFS can also be used for disk backends. This provides flexibility and useful features (clones, snapshots, compression) but can impose overhead compared to a raw device. Note that local or SAN ZFS disk backends preclude live migration, because a zpool can be mounted on only one host at a time. When using ZFS backends for virtual disks, use a zvol rather than a flat file - it performs much better. Also, make sure that the recordsize for the ZFS dataset (or volblocksize for a zvol) matches the application's I/O size (again, just as in a non-virtual environment). This avoids read-modify-write cycles that inflate I/O counts and overhead. The default recordsize of 128K is not optimal for small random I/O.
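
    For example, a zvol backend can be created and exported to a guest as follows; the pool, dataset, service, and domain names are placeholders, and the 8K volblocksize assumes a small-random-I/O workload:

      # zfs create -o volblocksize=8k -V 20g rpool/ldoms/mydomain-disk0
      # ldm add-vdsdev /dev/zvol/dsk/rpool/ldoms/mydomain-disk0 mydomain-disk0@primary-vds0
      # ldm add-vdisk disk0 mydomain-disk0@primary-vds0 mydomain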

  10. Networked disk on NFS and iSCSI - NFS and iSCSI can also perform quite well if an appropriately fast network is used. Apply the same network tuning you would use in non-virtual environments. For NFS, specify mount options to disable atime, use hard mounts, and set large read and write sizes.

    If the NFS and iSCSI backends are provided by ZFS, such as in the ZFS Storage Appliance, provide plenty of RAM for buffering, and install write-optimized solid-state disk (SSD) "logzilla" devices for the ZFS Intent Log (ZIL) to speed up synchronous writes.
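
    As a sketch, an NFS backend might be mounted in the service domain like this; the server name, mount point, and transfer sizes are placeholders, and the exact options (including how atime updates are disabled) depend on your NFS client, so check mount_nfs(1M):

      # mount -F nfs -o hard,vers=4,rsize=1048576,wsize=1048576 filer:/export/ldoms /ldomsnfs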

Summary

By design, logical domains don't have a lot of "tuning knobs", and many tuning practices you would do for Solaris in a non-domained environment apply equally when domains are used. However, there are configuration best practices and tuning steps you can use to improve performance. This blog note itemizes some of the most effective (and least exotic) performance best practices.

Comments:

I am looking to use NFS so that live migrations are easier and my SAN admin doesn't have to provide so many LUNs (I've got way too many LDoms and zones). Can you elaborate on the options for NFS please?

Thanks
Murali

Posted by Murali on February 20, 2013 at 07:21 AM MST #

Hi Murali,

This is pretty straightforward. Just set up NFS exports on your favorite NFS server as root-writable to the control domains hosting the guest domains (or, to be more specific: to the service domains). I set up an NFS share, and under that I have a separate directory to contain the virtual disks for each domain, created with a simple 'mkfile -n'.

For example, I have a mountpoint /ldomsnfs exported from Solaris 11:
# zfs create -o compression=on -o mountpoint=/ldomsnfs rpool/export/ldomsnfs
# zfs set share=name=ldomsnfs,path=/ldomsnfs,prot=nfs,sec=sys,rw,root=@<PUT SUBNET HERE> rpool/export/ldomsnfs

From a control domain which is the NFS client:
$ ls -l /ldomsnfs/ldom1
total 16002600
-rw------- 1 root root 21474836480 Feb 20 03:36 disk0.img

Posted by Jeff Savit on February 20, 2013 at 10:49 AM MST #

Jeff,

Thanks for your reply.

I've got a NFS exported out of the EMC SAN (NFS share).

On the control domain - I created a mount point called : /export/LDOMS/
and then I simply do mkfile -n <ldomname> in /export/LDOMS so that I can then add that via vdsdev to the appropriate LDOMs. It looks like both you and I are doing the same thing. However, I'm confused where you're running this command:
# zfs set share=name=ldomsnfs,path=/ldomsnfs,prot=nfs,sec=sys,rw,root=@<PUT SUBNET HERE> rpool/export/ldomsnfs

Again - thanks for this article. I've got over 300 virtual servers running off 10 blades, and we're moving away from LUNs to NFS for our "app tier" as everything runs off JVM and logs don't need SAN LUN speeds. DBs we're still doing as full LUNs.

Thanks
Murali

Posted by Murali on February 20, 2013 at 11:35 AM MST #

Hi Murali,

I'm glad it's working for you, and the only thing is that I created a separate ZFS dataset and explicitly shared it out (and controlling which subnet could mount it) using the Solaris 11 commands for that.

cheers, Jeff

Posted by guest on February 20, 2013 at 11:40 AM MST #

Thanks Jeff!

Posted by Murali on February 20, 2013 at 02:21 PM MST #
