What's up with LDoms: Part 6 - Sizing the IO Domain
By Stefan Hinker on Dec 21, 2012
Before the Christmas break, let's look at one of the more frequently asked questions: sizing the Control Domain and IO Domain.
By now, we've seen how to create the basic setup, create a simple domain and configure networking and disk IO. We know that for typical virtual IO, we use vswitches and virtual disk services to provide virtual network and disk services to the guests. The question to address here is: How much CPU and memory is required in the combined Control and IO Domain (or in any additional IO Domain) to provide these services without becoming a bottleneck?
The answer to this question can be very quick: LDoms Engineering usually recommends 1 or 2 cores for the Control Domain.
However, as always, one size doesn't fit all, and I'd like to look a little closer.
Essentially, this is a sizing question just like any other system sizing. So the first question to ask is: What services is the Control Domain providing that need CPU or memory resources? We can then continue to estimate or measure exactly how much of each we will need.
As for the services, the answer is straightforward:
- The Control Domain usually provides
  - Console Services using vntsd
  - Dynamic Reconfiguration and other infrastructure services
  - Live Migration
- Any IO Domain (either the Control Domain or an additional IO Domain) provides
  - Disk Services configured through the vds
  - Network Services configured through the vswitch
For sizing, it is safe to assume that vntsd, ldmd (the actual LDoms Manager daemon), ldmad (the LDoms agent) and any other infrastructure tasks will require very little CPU and can be ignored. Let's look at the remaining three services:
- Disk Services
Disk Services have two parts: Data transfer from the IO Domain to the backend devices and data transfer from the IO Domain to the guest. Disk IO in the IO Domain is relatively cheap; you don't need many CPU cycles to deal with it. I have found 1-2 threads of a T2 CPU to be sufficient for about 15,000 IOPS. Today we usually use T4...
However, this also depends on the type of backend storage you use. FC or SAS raw device LUNs will have very little CPU overhead. On the other hand, if you use files hosted on NFS or ZFS, you are likely to see more CPU activity involved. Here, your mileage will vary, depending on the configuration and usage pattern. Also keep in mind that backends hosted on NFS or iSCSI involve network traffic as well.
- Network Services - vswitches
There is a very old sizing rule that says that you need 1 GHz worth of CPU to saturate 1 GBit worth of Ethernet. SAE has published a network encryption benchmark where a single T4 CPU at 2.85 GHz will transmit around 9 GBit at 20% utilization. Converted into strands and cores, 20% of the T4's 64 strands would mean about 13 strands, which is less than 2 cores for 9 GBit worth of traffic. Encrypted, mind you. Applying the old rule to this result, we would need just over 3 cores at 2.85 GHz to do 9 GBit. It seems we've made some progress in efficiency ;-)
Applying all of this to IO Domain sizing, I would consider 2 cores an upper bound for typical installations. You might very well get along with just one core, especially on smaller systems like the T4-1, where you're not likely to have several guest systems that each require 10 GBit wirespeed networking.
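The arithmetic behind the network numbers above can be retraced in a few lines. This is only a back-of-the-envelope sketch: the 20% utilization, 9 GBit and 2.85 GHz figures come from the text, the 8 cores x 8 strands layout is that of the T4 chip, and the function names are made up for illustration.

```python
# Back-of-the-envelope check of the network sizing numbers quoted in the text.
# Function names are illustrative, not part of any tool.

T4_CORES = 8
STRANDS_PER_CORE = 8
T4_CLOCK_GHZ = 2.85

def strands_used(utilization, cores=T4_CORES, strands_per_core=STRANDS_PER_CORE):
    """Strands' worth of CPU consumed at a given whole-chip utilization (0..1)."""
    return utilization * cores * strands_per_core

def cores_by_old_rule(gbit, clock_ghz=T4_CLOCK_GHZ):
    """Old rule of thumb: 1 GHz of CPU per 1 GBit of Ethernet traffic."""
    return gbit / clock_ghz

measured = strands_used(0.20)                  # roughly 12.8 strands, "about 13"
measured_cores = measured / STRANDS_PER_CORE   # roughly 1.6 cores, "less than 2"
old_rule = cores_by_old_rule(9)                # a bit more than 3 cores

print(f"benchmark: {measured:.1f} strands = {measured_cores:.1f} cores for 9 GBit")
print(f"old rule:  {old_rule:.2f} cores for 9 GBit")
```

So the benchmark result needs roughly half the cores the old rule would predict, and that includes encryption.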
- Live Migration
When considering Live Migration, we should understand that the Control Domains of the two involved systems are the ones actually doing all the work. They encrypt, compress and send the source system's memory to the target system, and for this they need quite a bit of CPU. Of course, one could argue that Live Migration happens in the background, so it doesn't matter how fast it's actually done. However, there's still the suspend phase, where the guest system is suspended and the remaining dirty memory pages are copied over to the other side. This phase, while typically very short, significantly impacts the "live" experience of Live Migration. And while other factors like guest activity level and memory size also play a role, there's also a direct connection between CPU power and the length of this suspend time. The relation between Control Domain CPU configuration and suspend time has been studied and published in the whitepaper "Increasing Application Availability Using Oracle VM Server for SPARC (LDoms) An Oracle Database Example". The conclusion: For minimum suspend times, configure 3 cores in the Control Domain. I have personally had good experience with 2 cores, measuring suspend times as low as 0.1 seconds with a very idle domain, so again, your mileage will vary.
Another thought here: The Control Domain doesn't usually do Live Migration on a permanent basis. So if a single core is sufficient for the IO Domain role of the Control Domain, you are in good shape for everyday business with just one core. When you need additional CPU for a quick Live Migration, why not borrow it from somewhere else, like the domain being migrated, or any other domain not currently very busy? CPU DR does lend itself to this purpose...
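To make the borrowing idea a little more concrete, here is a dry-run sketch that only builds the ldm command lines and prints them; nothing is executed. It assumes the domains are configured with whole cores, so that the add-core and remove-core subcommands apply (with strand-based allocation, add-vcpu and remove-vcpu would be the counterparts). The domain name "guest1" and the helper name are hypothetical, and the donor here is assumed to be a domain that stays on the machine.

```python
# Dry-run sketch: build the ldm command lines for temporarily moving whole
# cores into the Control Domain around a Live Migration. Nothing is executed.
# Domain names ("primary", "guest1") and the helper name are hypothetical.

def borrow_core_plan(donor, control="primary", cores=1):
    """Return (before, after) lists of ldm commands as strings."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    before = [
        f"ldm remove-core {cores} {donor}",    # free cores in the donor domain
        f"ldm add-core {cores} {control}",     # hand them to the Control Domain
    ]
    after = [
        f"ldm remove-core {cores} {control}",  # give the cores back afterwards
        f"ldm add-core {cores} {donor}",
    ]
    return before, after

before, after = borrow_core_plan("guest1", cores=1)
for line in before + ["# ... run the migration here ..."] + after:
    print(line)
```

Whether it's worth the effort depends on how often you migrate and how much suspend time matters to you; review the printed commands before running anything like them on a real system.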
As you've seen, there are some rules and there is some experience, but there is no single answer. In many cases, you should be ok with a single core on T4 for each IO Domain. If you use Live Migration a lot, you might want to add another core to the Control Domain. On larger systems with higher networking demands, two cores for each IO Domain might be right. If these recommendations are good enough for you, you're done. If you want to dig deeper, simply check what's really going on in your IO Domains. Use mpstat(1M) to study the utilization of your IO Domain's CPUs in times of high activity. Perhaps record CPU utilization over a period of time, using your tool of choice. (I recommend DimSTAT for that.) With these results, you should be able to adjust the amount of CPU resources of your IO Domains to your exact needs. However, when doing that, please remember those unannounced utilization peaks, and don't be too stingy. Saving one or two CPU strands won't buy you much, all things considered.
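If you just want a quick look at captured mpstat output without a full monitoring tool, something like the following sketch might do. It assumes the usual Solaris mpstat layout with a header line starting with "CPU" and an "idl" column, and it looks columns up by name. The function name and the sample numbers are made up for illustration, not measurements. Keep in mind that the first report mpstat prints summarizes activity since boot, so you may want to drop it before feeding data in.

```python
# Sketch: summarize CPU utilization from captured `mpstat` output.
# Columns are looked up by header name, so extra columns do not matter.
# The function name and the sample text are made up for illustration.

def busy_per_cpu(mpstat_text):
    """Return {cpu_id: mean busy %}, where busy = 100 - idl."""
    samples = {}
    idl_col = None
    for line in mpstat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "CPU":                 # header line, repeated per interval
            idl_col = fields.index("idl")
            continue
        if idl_col is None or len(fields) <= idl_col:
            continue                           # skip anything that is not a data row
        samples.setdefault(int(fields[0]), []).append(100 - int(fields[idl_col]))
    return {cpu: sum(v) / len(v) for cpu, v in samples.items()}

SAMPLE = """\
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    1   0    5   310  110  220    3   10    4    0   500    5  25   0  70
  1    0   0    2    40    5   90    1    8    2    0   210    2   8   0  90
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0    7   420  150  300    5   12    6    0   650   10  40   0  50
  1    0   0    3    60    8  120    2    9    3    0   260    4  16   0  80
"""

usage = busy_per_cpu(SAMPLE)
for cpu, busy in sorted(usage.items()):
    print(f"cpu {cpu}: {busy:.1f}% busy")
print(f"peak: {max(usage.values()):.1f}%")
```

Averages hide peaks, so for sizing decisions it's worth looking at the maximum per interval as well, not only the mean.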
A few words about memory: This is much more straightforward. If you're not using ZFS as a backing store for your virtual disks, you should be well in the green with 2-4 GB of RAM. My current test system, running Solaris 11.0 in the Control Domain, needs less than 600 MB of virtual memory. Remember that 1 GB is the supported minimum for Solaris 11 (and it has changed to 1.5 GB for Solaris 11.1). If you do use ZFS, you might want to reserve a couple of GB for its ARC, so perhaps 8 GB is more appropriate. On the Control Domain, which is the first domain to be bound, take 7680 MB, which adds up to 8 GB together with the hypervisor's own 512 MB, nicely fitting the 8 GB boundary favoured by the memory controllers. Again, if you want to be precise, monitor memory usage in your IO Domains.
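The 7680 MB figure is simply 8192 MB minus the hypervisor's 512 MB. As a small sketch, the same calculation for other target sizes could look like this. It assumes the reasoning extends to larger multiples of 8 GB, which is my extrapolation rather than a documented rule, and the function name is illustrative.

```python
# Sketch: pick a Control Domain memory size that lands on an 8 GB boundary
# once the hypervisor's share is added. 512 MB is the figure from the text;
# extending this to larger multiples of 8 GB is an assumption.

HYPERVISOR_MB = 512
BOUNDARY_MB = 8 * 1024

def control_domain_mb(wanted_mb, hypervisor_mb=HYPERVISOR_MB, boundary_mb=BOUNDARY_MB):
    """Smallest size >= wanted_mb such that size + hypervisor is a multiple of the boundary."""
    total = wanted_mb + hypervisor_mb
    boundaries = -(-total // boundary_mb)      # ceiling division
    return boundaries * boundary_mb - hypervisor_mb

print(control_domain_mb(4096))   # 7680
print(control_domain_mb(7680))   # 7680
print(control_domain_mb(8192))   # 15872
```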
- Whitepaper about Live Migration & Control Domain sizing: "Increasing Application Availability Using Oracle VM Server for SPARC (LDoms) An Oracle Database Example"
- mpstat(1M) manpage