I've been interested in virtual machine performance for a very long time, and while I've written Best Practices items that included performance, this post starts a series of articles exclusively focused on Oracle VM performance. Topics will include:
I intend this series to be relatively high-level: essays that present concepts to think about, illustrated by examples and with links to other resources for details.
There was a time when working on virtual machine performance, or system performance in general, required fine tuning of many system parameters. CPU resources were scarce, with many virtual machines on servers with one or a few CPUs, so we carefully managed CPU scheduling parameters: priorities or weights for VMs, the distinction between interactive and non-interactive VMs to prioritize access, and the duration of time slices. Memory was expensive and capacity was limited, so we aggressively over-subscribed memory (some VM systems still do, but it's less of a factor than it once was) and administered memory management: page locking and reservation, working-set controls, and the lifetime of idle pages before displacement.
Since we had to page (sometimes loosely referred to as "swapping", though there is a difference), we created hierarchies of page and swap devices and spread the load over multiple disks. Disks were slow and uncached, so we sometimes individually positioned files to reduce rotational and seek latency. It took a lot of skilled effort to have systems perform well.
These items have less impact in modern virtualization systems. Many of them have been eliminated or rendered less important by architecture, product design, or the abundance of resources in today's systems. In general, administering plentiful resources for optimal performance is very different from apportioning scarce ones. In particular, Oracle VM products eliminate as much of that effort as possible, designing in best practices and performance choices so that today's applications and hardware perform well "out of the box".
Today's servers have lots of CPU capacity, and we don't need to run at 95% CPU busy (though we can, of course) to make them cost effective, so we don't tweak CPU scheduling parameters to prevent starvation as we once did. Oracle VM Server for x86 lets you set CPU caps and weights as needed, and Oracle VM Server for SPARC dedicates CPU threads or cores directly to guests, so the topic simply evaporates on that platform. Having enough CPU cycles to get the job done is rarely the problem now. Instead, we tune for scale and to handle the Non-Uniform Memory Access (NUMA) properties of large systems.
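To make the weight and cap idea concrete, here's a minimal sketch, in Python, of how proportional weights and per-VM caps might carve up a host's CPU. This is a simplified illustration of proportional-share semantics, not Oracle VM's actual scheduler, and the VM names and numbers are invented for the example.

    # Simplified proportional-share model (illustration only, not the real scheduler):
    # weights divide the host's CPU capacity in proportion; a cap, expressed as a
    # percentage of one CPU (0 meaning "uncapped"), limits a VM regardless of weight.
    def cpu_shares(host_cpus, vms):
        total_weight = sum(weight for weight, _ in vms.values())
        shares = {}
        for name, (weight, cap) in vms.items():
            entitlement = host_cpus * weight / float(total_weight)
            if cap:
                entitlement = min(entitlement, cap / 100.0)
            shares[name] = entitlement          # CPUs' worth of time per second
        return shares

    # Two busy VMs on a 4-CPU host: "db" has twice the weight of "app",
    # and "app" is capped at 100% of one CPU.
    print(cpu_shares(4, {"db": (512, 0), "app": (256, 100)}))

A real scheduler also redistributes time that a capped VM leaves unused; the point here is only that weights express relative importance while caps set absolute limits.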
Neither Oracle VM platform over-subscribes memory, so we don't have to worry about managing working sets, virtual-to-real ratios, or paging performance. That's true in today's non-VM environments too, where it's safe to say that if you're swapping, your performance has already suffered and you should just add memory. This eliminates an entire category of problematic performance management that often (in the bad old days) resulted in pathological performance issues. Friends don't let friends swap. Instead, the memory tuning that remains is about NUMA latency and aligning memory with CPUs.
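The remaining NUMA concern is easy to quantify with a back-of-the-envelope calculation. The sketch below uses invented local and remote latency figures, not measurements from any particular system, to show how the average latency a guest sees climbs as more of its memory lands on a remote node; that is why aligning a VM's memory with its CPUs matters.

    # Back-of-the-envelope NUMA arithmetic with invented latency numbers:
    # average latency grows as less of a VM's memory is local to its CPUs.
    LOCAL_NS, REMOTE_NS = 90.0, 150.0     # hypothetical access latencies

    def average_latency(local_fraction):
        return local_fraction * LOCAL_NS + (1.0 - local_fraction) * REMOTE_NS

    for fraction in (1.0, 0.75, 0.5, 0.25):
        print("%3.0f%% local -> %.1f ns average" % (fraction * 100, average_latency(fraction)))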
There are still performance problems and the need to manage performance remains - or why would I be writing this? Effort has moved to other parts of the technology stack - network performance is much more important than it once was, for instance. Workloads are more demanding, less predictable, and are subject to scale up/down requirements far beyond those of the earlier systems. There still are, and probably always will be, constraints on performance and scale that have to be understood and managed. That said, the goal of Oracle VM is to increasingly have systems "do the right thing" for performance with as little need for administrative effort as possible. I'll illustrate examples of that in this series of posts.
Here's a problem that existed in the old days and persists today in a different form, which makes it interesting to think about.
Let's say it was 8am on a Monday morning back when people logged into timeshared systems for their daily work. The first people to get to their desks, coffee in hand, would get really good response time reading their e-mail until the laggards showed up.
Response time could degrade as users logged in if utilization of a key system component reached a performance-sensitive level. However, suitably sized and tuned systems could handle this or other peaks related to business cycles. In my case, it was tied to the open and close of the New York stock exchanges or the timing of large derivative transactions.
Now suppose there was a system outage: when the system came back up, all the interrupted users would log in at the same time to resume their work, and performance could be terrible. All the users would place demand on the system at the same time, instead of the normal mix of busy and idle (think time) users, putting pressure on CPU, memory, and I/O, and single-threading any serialized portion of the system. Viewed from the perspective of "getting work done", it could often be faster to have fewer active people logged on, so they could complete their effort and drop to an idle state, than to have everyone trying at once. You could even see this with plain old batch jobs: if there was excessive multiprogramming, reducing the number of tasks competing for resources could make the entire collection of work run faster. It's a classic congestion effect, like too many cars on the same highway.
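That "fewer at once can be faster overall" effect is easy to demonstrate with a toy model. The sketch below assumes a single CPU and memory that holds only so many working sets, with a paging penalty for every task beyond that limit; all of the parameters are invented for illustration.

    # Toy congestion model: one CPU, memory that holds mem_slots working sets.
    # Each resident task beyond mem_slots adds paging overhead, shrinking the
    # fraction of CPU time that does useful work. All parameters are invented.
    def finish_time(n_concurrent, work_per_task, mem_slots, penalty=0.25):
        overcommit = max(0, n_concurrent - mem_slots)
        useful_fraction = 1.0 / (1.0 + penalty * overcommit)
        return (n_concurrent * work_per_task) / useful_fraction

    def finish_time_batched(n_tasks, work_per_task, batch_size, mem_slots):
        total, remaining = 0.0, n_tasks
        while remaining > 0:
            batch = min(batch_size, remaining)
            total += finish_time(batch, work_per_task, mem_slots)
            remaining -= batch
        return total

    # 20 tasks of 10 work units each, memory that holds 8 working sets:
    print("all at once:", finish_time(20, 10, 8))             # pays the paging penalty
    print("8 at a time:", finish_time_batched(20, 10, 8, 8))  # same work, no thrashing

In this toy model the same 200 units of work take four times as long when everything runs at once, which is the highway-at-rush-hour effect in miniature.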
Systems of the day had clever controls to prevent excessive multiprogramming, usually driven by memory consumption to prevent thrashing on the small RAM capacities available: a subset of users whose working sets fit in RAM would be allowed to run, while other users had to wait. The fortunate users either finished their work and became idle so their memory could be re-used by others, or were evicted after a while to give somebody else a turn (that's where address space swapping comes in).
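Here's a rough sketch of that style of admission control, just to pin the idea down: admit users only while their combined working sets fit in RAM, and queue the rest. It's deliberately simplified (real systems also rotated membership in the running set, as described above), and the sizes are invented.

    # Simplified working-set admission control: run users only while the sum of
    # their working sets fits in RAM; everyone else waits for a turn.
    def admit(waiting_users, ram_pages):
        running, still_waiting, used = [], [], 0
        for name, working_set in waiting_users:     # e.g. in arrival order
            if used + working_set <= ram_pages:
                running.append(name)
                used += working_set
            else:
                still_waiting.append(name)
        return running, still_waiting

    # Invented working-set sizes (in pages) on a machine with 4096 pages of RAM:
    users = [("amy", 1500), ("bob", 1200), ("cal", 1100), ("dee", 900)]
    print(admit(users, 4096))    # dee waits until someone finishes or is evicted

The real controls were more sophisticated than this, but the principle was the same: bound the number of working sets competing for memory.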
That applied to the old timesharing systems, and applies even more to today's virtual machine systems, especially since the CPU, memory, and I/O footprint of starting up a VM is substantial. This is such a well-known problem that it has its own name, the "boot storm" - a Google search for "boot storm" yields over 40 million hits. Let's consider a few reasons it has such a powerful effect:
How can a boot storm be handled? A number of methods can be used:
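One of those methods, common enough to be worth a sketch, is simply to throttle how many guests start at once instead of booting everything simultaneously. The Python sketch below uses a thread pool to cap concurrency; start_vm is a hypothetical placeholder for whatever start mechanism your management tooling actually provides, and the limit of four is an invented example value.

    from concurrent.futures import ThreadPoolExecutor

    def start_vm(name):
        # Hypothetical placeholder: in real use, call whatever start mechanism
        # your management stack provides for the named guest.
        print("starting", name)

    def staggered_boot(vm_names, max_concurrent=4):
        # Cap how many guests boot at the same time so they don't all hit
        # CPU, memory, and storage in the same instant.
        with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
            futures = [pool.submit(start_vm, name) for name in vm_names]
            for future in futures:
                future.result()     # surface any startup failures

    staggered_boot(["vm%02d" % i for i in range(1, 21)], max_concurrent=4)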
While the landscape has changed, and we no longer tune to the same factors we once managed, the need to manage performance has not disappeared. Further articles in this series will discuss different aspects of Oracle VM performance management. Some starting concepts:
For additional resources about Oracle VM Server