CPU | CPUs can be shared, over-subscribed, and time-sliced using a share-based scheduler. Domains can be allocated CPU cores (or CPU threads if hyper-threading is enabled). The number of virtual CPUs in a domain can be changed while the domain is running. |
Memory | Memory is dedicated to each domain; there is no over-subscription. The hypervisor attempts to assign a VM's memory to a single NUMA node, and has CPU affinity rules that try to keep a VM's virtual CPUs near its memory for local latency.
Oracle VM does not over-subscribe memory because doing so can unpredictably harm virtual machine performance. Guest VMs have poor locality of reference, so they are not good candidates for normal page replacement, and nesting guest operating systems that over-subscribe under a hypervisor that over-subscribes can lead to pathological (albeit interesting) performance problems and double-paging. |
Domain types | Guest VMs (domains) may be hardware virtualized (HVM), paravirtualized (PV), or hardware virtualized with PV device drivers. |
In addition, a privileged domain, "dom0", is used for system control and to map guest VMs' ("domU") virtual I/O onto physical devices. It is essential to make sure that dom0 performs well, as all guest I/O is processed by dom0.
The most important tuning actions control the allocation of virtual CPUs to physical ones. As mentioned in the previous article, giving a domain too many CPUs can harm performance by increasing NUMA latency or by increasing multiprocessor overhead such as lock management. EDIT: nice write-up from ACM Queue at http://queue.acm.org/detail.cfm?id=2852078
Similarly, giving a domain too much memory can cause NUMA latency if the memory straddles sockets. And since the number of VMs is limited by how many fit in memory, over-allocating VM memory reduces the density of guests that can be run.
Oracle VM applies this tuning to dom0 by default: recent versions of Oracle VM size dom0 so its virtual CPUs are pinned to physical CPUs on the first socket of the server, and size its memory based on the server's capacity. This eliminates NUMA memory latency since dom0's CPUs and memory are on the same socket. This is especially important for low-latency networking - further details are in the whitepaper on 10GbE network performance.
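If you want to verify that placement on your own server, you can ask the hypervisor where dom0's virtual CPUs are allowed to run. Here is a quick check, assuming the xm toolstack is present in dom0 (column details vary by release):
root# xm vcpu-list Domain-0
# the "CPU Affinity" column shows which physical CPUs each dom0 vCPU may run on;
# on a system tuned as described above, these fall on the first socket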
Similar tuning should be applied to guest VMs. In particular:
The physical server's architecture of sockets and cores can be determined by logging into dom0 and issuing the following commands:
root # xenpm get-cpu-topology
# note the relationship between CPU number, core and socket
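With the topology in hand, a guest's vCPUs can be confined to a single socket or NUMA node. CPU pinning is normally configured through Oracle VM Manager, but as a rough sketch from dom0 - again assuming the xm toolstack and a hypothetical domain named "myguest" (pins made this way typically do not persist across a restart):
root# xm vcpu-list myguest          # show current vCPU placement and affinity
root# xm vcpu-pin myguest 0 4-7     # restrict vCPU 0 to physical CPUs 4-7 (one socket)
root# xm vcpu-pin myguest all 4-7   # or restrict all of the guest's vCPUs to that range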
To detect CPU over-subscription, monitor CPU utilization at the system level (Enterprise Manager provides observability, or you can use Oracle VM Manager's Health tab) to see if servers are approaching full utilization without headroom. Be cautious about averages: a smoothed average over time of (say) 95% CPU busy may hide intervals that were 100% busy.
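From dom0 itself, per-domain CPU consumption can be watched with xentop, which ships with the Xen tools (a quick illustration; exact columns vary by version):
root# xentop        # interactive display showing CPU(%) for each running domain
# 'xm top' runs the same tool on releases that use the xm toolstack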
Within virtual machines running Linux or Solaris, use the vmstat, mpstat, or iostat commands and observe "percent steal" time (%st or %steal). When steal is non-zero, the VM was runnable and had work to do, but the hypervisor was unable to give it CPU cycles.
For example see these commands (run on idle systems, so not themselves interesting):
linux$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 3135616  49224 193668    0    0     0     0    2    5  0  0 100  0  0
 0  0      0 3135680  49224 193668    0    0     0     0   58   60  0  0 100  0  0
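To see steal per virtual CPU rather than as a single system-wide figure, mpstat from the sysstat package can be run inside the guest (a minimal sketch; the column layout varies with sysstat version):
linux$ mpstat -P ALL 1
# watch the %steal column for each CPU; sustained non-zero values mean the vCPU
# was runnable but the hypervisor could not give it physical CPU time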
Being CPU saturated is not a problem - it means you're getting your money's worth. Being CPU saturated with latent, unserviced resource demand is the problem.
On the other hand, high steal is not a problem for "cycle soaker" VMs (if you are lucky enough to have them): compute-intensive workloads with lax service levels that can run when there are no cycles needed by anyone else.
Hyper-threading (HTT) is a controversial topic, as it improves performance in some cases and degrades it in others. With hyper-threading, each core runs two CPU threads instead of one and time-slices the threads (in hardware) onto the core's resources. If one thread is idle or stalls on a cache miss, the other thread can continue to execute - this provides a potential throughput advantage. On the other hand, both threads compete for the core's resources, especially the level 1 cache ("L1$"), so each thread may run slower than if it owned the entire core.
EDIT: Nice writeup can be found at https://en.wikipedia.org/wiki/Hyper-threading
Each thread is assignable by the Xen hypervisor to a VM's virtual CPU and can be dispatched as a CPU. The performance effect is workload dependent, so it must be tested for different workload types.
In general, hyper-threading is best for multi-threaded applications that can drive multiple vCPUs, but it can reduce per-vCPU performance and hurt applications dominated by a single thread. It needs to be evaluated in the context of the application.
You can determine whether hyper-threading is enabled in several ways. Log into dom0 on Oracle VM Server and issue the commands shown below on a server that has hyper-threading enabled:
root# dmidecode|grep HTT
HTT (Hyper-threading technology)
HTT (Hyper-threading technology)
root# egrep 'siblings|cpu cores' /proc/cpuinfo | head -2
siblings: 2
(When hyper-threading is enabled, the "siblings" count is twice the "cpu cores" count for each physical CPU.)
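Where the lscpu utility is available in the guest or in dom0, it reports the same information more directly (an optional cross-check, not part of the Xen tools):
root# lscpu | grep -i thread
Thread(s) per core:    2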
This article described CPU management on Oracle VM Server for x86, with an emphasis on reducing NUMA latency, avoiding problems that can come with over-subscription and explaining the effects of hyper-threading. The next article will discuss Xen domain types and virtual I/O.
Blogger's prerogative: I like to tell old war stories and anecdotes about performance. If you don't enjoy them, feel free to stop here! (If you do like them, drop me a note so I know you did :) )
Back in Ye Olde Days, you could simply look at the system to know what was going on. As Yogi Berra said, "You can observe a lot by just watching." On mainframes and minicomputers in days of yore, you could look at the lights on the front panel (we had great computer lights then) and get a clue about activity. No more, alas, whether because of cloud computing, or simply because the systems we work on are in a data center somewhere, or darn it because we just don't put enough lights on the computer.
While working my way through college as Boy Systems Programmer, I had to write a program that searched accounting data on tape (OS/360 SMF data, for those who know what that is) for records matching a list of file names, to see who had accessed or deleted them. I wrote a program that read through the data, which was on 2400 foot tape reels. When I hit the record type for file access, I searched for matches in an in-memory table of interesting file names. I walked into the computer room while the batch job was running and saw that the CPU was pinned (the CPU "WAIT" light was off) and the tape wasn't moving all that fast. Nice thing about the old tape drives: you could watch them spin and judge performance by how fast they moved. "Hmm, that can't be good." Canceled the job, replaced the sequential search through the table of file names with a binary search, and reran it. The 0.5 MIPS (!) CPU was no longer CPU-saturated, and the tape drive was spinning as fast as it could.
Sometimes seeing the system is the best way to even know there's a problem. You can't fix a problem if you don't know you have a problem. We usually can't look at the computer any more, and even if we can it doesn't tell us much, so we have to replace tangible observation with the right tools.