Best Practices - Live Migration on Oracle VM Server for SPARC
By jsavit on Jun 14, 2013
This post is one of a series of "best practices" notes for Oracle VM Server for SPARC (formerly called Logical Domains). Oracle VM Server for SPARC has supported live migration since 2011, providing operational flexibility for customers who need to move a running guest domain between servers. This can be extremely useful, but there's confusion about when it is the right tool, when it isn't, and how best to use it. This article discusses some best practices and "do's and don'ts" for live migration.
Live migration non-disruptively moves a running guest domain from one SPARC server to another. As with most things, this requires planning and has several technical requirements:
- Servers running compatible versions of Oracle VM Server for SPARC and firmware.
- Common network accessibility.
- Networked storage (that is, not internal direct-attached disks within the server). Disks can be FC SAN LUNs, iSCSI LUNs, or NFS-backed disks, but must be available on both the source and target systems.
- Guests only using virtual devices - that is, virtual network and disk devices provided by a service domain, as opposed to physical assignment of devices. At this time, a guest with physical I/O (dedicated PCIe bus, direct I/O card, SR-IOV) cannot be live migrated.
- Identical CPU chip type and clock frequency, unless cross-CPU live migration is used. That requires planning, as described in this blog on cross-CPU live migration.
Several virtualization technologies offer live migration, and while details differ, they work in roughly similar ways:
- Migration is initiated by an operator or automated tool.
- Checks are made to ensure the guest can be migrated. That includes tests for server compatibility, network connectivity (no point trying if the source and target servers can't even talk to each other), and available resources to meet the guest's requirements.
- The virtual machine is copied from the source to the target server.
- Guest memory contents are copied over the network from the source system to the target while the guest continues to run.
- Since the guest is running, it is changing its memory contents. The virtual machine system tracks which memory pages have been changed after being transmitted to the target.
- Changed pages are retransmitted - but the guest is still changing its memory, so retransmitted pages might be changed again. And again...
- Depending on the product, there may be a fixed number of "passes" over memory, or the system may continue to retransmit until the working set of changed pages stops shrinking.
- At that point, the guest virtual machine or domain is "suspended" on the source system so its residual memory contents can be copied.
- Finally, the guest is started on the target machine, with all its state intact, including access to its I/O, IP and MAC addresses, etc.
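The copy-and-retransmit loop described above can be illustrated with a toy model. This is a minimal sketch, not the algorithm any particular product uses: the function name, the "pages per tick" units, and the assumption that the guest writes evenly within a fixed working set are all invented for illustration.

```python
def simulate_precopy(total_pages, working_set_pages, send_rate, write_rate,
                     max_passes=30):
    """Toy model of iterative memory copy during live migration.

    send_rate and write_rate are in pages per tick (hypothetical units).
    Returns (pages sent in each pass while the guest runs,
             residual pages left to copy while the guest is suspended).
    """
    pass_sizes = []
    to_send = total_pages                  # the first pass copies all memory
    while len(pass_sizes) < max_passes:
        pass_sizes.append(to_send)
        # Pages the guest dirties while this pass is on the wire (rounded up),
        # capped by the size of the guest's working set.
        dirtied = (to_send * write_rate + send_rate - 1) // send_rate
        dirtied = min(working_set_pages, dirtied)
        shrinking = dirtied < to_send
        to_send = dirtied
        if not shrinking or to_send == 0:  # changed set stopped shrinking
            break
    return pass_sizes, to_send


if __name__ == "__main__":
    # Guest that writes slowly compared to the network: passes shrink quickly.
    print(simulate_precopy(1000, 100, send_rate=100, write_rate=10))
    # Guest that dirties memory faster than the network can send it.
    print(simulate_precopy(1000, 800, send_rate=100, write_rate=200))
```

In the first case each pass is a tenth the size of the previous one and only a single page remains for the suspend phase. In the second case the changed set never drops below the working set, so most of the memory is copied while the guest is suspended.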
How long will it take? What can be done to reduce migration time?
Migration time is one of the first things that people ask. There are actually several times to consider:
- migration time - the total time it takes to migrate the guest VM.
- suspend time - the amount of time the guest is suspended during the last phase of migration.
- overhead time - how much longer will functions take within a migrating virtual machine due to overhead?
Total migration time is the one most people think of, but in practice suspension time is usually more important, because it represents the time that an application is unresponsive. Overhead added by migration also is significant because it changes the user's experience of the guest's performance.
Unfortunately, there is no easy way to calculate these times, and no rule of thumb that accurately predicts how long it will take an arbitrary virtual machine to migrate from one server to another. The times are a function of server speed, the algorithms used in the live migration process, the speed of the network connecting the servers, the size of the virtual machine's memory, and how actively the guest VM is changing its memory contents. Even knowing the memory size doesn't let you predict the time: a large-memory VM may be using only a small working set, while a relatively small VM may be actively changing most of its memory (think of a large Java heap during a full GC), and a CPU can update memory much faster than any network can transmit it. This is true across the industry. You can estimate how long it will take to migrate a virtual machine only by experimenting with a specific system and workload.
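What simple arithmetic can give is a floor: the time to send the guest's memory once over the network, ignoring retransmitted pages, encryption cost, and protocol overhead. The helper below is a hypothetical illustration of that lower bound only, and real migrations will take longer.

```python
def min_transfer_seconds(memory_gib, link_gbps):
    """Lower bound on migration time: one pass over memory at full line rate.

    memory_gib: guest memory in GiB (2**30 bytes)
    link_gbps:  network speed in gigabits per second (10**9 bits/s)
    """
    bits = memory_gib * 2**30 * 8
    return bits / (link_gbps * 1e9)


if __name__ == "__main__":
    for mem, link in [(16, 1), (64, 10)]:
        secs = min_transfer_seconds(mem, link)
        print(f"{mem} GiB over {link} Gb/s: at least {secs:.0f} s")
```

A 64 GiB guest on a dedicated 10 Gb/s link therefore needs at least about 55 seconds; how much more depends on how quickly the guest dirties its memory, which is why measurement with the real workload is the only reliable guide.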
The following best practices can be used to reduce live migration time on Oracle VM Server for SPARC:
- Allocate sufficient CPU resources to the control domain, especially the control domain on the source system. One CPU core is the minimum that should be allocated; two cores are typically enough, but you can add more if needed and remove them after migration finishes.
- Use a fast and otherwise low-utilization network segment.
- On machines prior to T4, allocate crypto accelerators to the control domain. This is unnecessary on recent servers because the crypto accelerator is built into the instruction pipeline and requires no administrative step. Remember that Oracle VM Server for SPARC always encrypts memory contents transmitted during live migration. That should be done regardless of virtualization technology to avoid the security exposure of transmitting memory contents in the clear.
- Try to reduce the virtual machine's memory size. Oracle VM Server for SPARC lets you add and remove memory from a running domain, so you can try to reduce the memory size before migration. This isn't a panacea, as it takes time to free up in-use memory pages within the guest, but can be a useful method anyway.
- Try to run the live migration during periods of low activity. The lower the load within the guest VM, the lower the overhead effects will be, and the smaller the number of pages changed during migration and requiring retransmission.
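Several of these steps map onto `ldm` subcommands. The sketch below only builds the command lines so they can be reviewed before anything is run; the domain name `ldg1`, the host name `target-host`, and the sizes are made up, the control domain is assumed to have the usual name `primary`, and the exact syntax should be checked against the `ldm` man page for your release.

```python
def plan_migration(guest, target_host, extra_cores=1, shrink_to=None):
    """Build (but do not run) ldm command lines for one live migration.

    extra_cores: cores temporarily added to the source control domain
    shrink_to:   optional new guest memory size, e.g. "16G"
    """
    cmds = []
    if extra_cores:
        cmds.append(["ldm", "add-core", str(extra_cores), "primary"])
    if shrink_to:
        cmds.append(["ldm", "set-memory", shrink_to, guest])
    # -n asks for a dry run: check that migration would succeed, move nothing.
    cmds.append(["ldm", "migrate-domain", "-n", guest, target_host])
    cmds.append(["ldm", "migrate-domain", guest, target_host])
    if extra_cores:
        # Give the borrowed cores back once the migration has finished.
        cmds.append(["ldm", "remove-core", str(extra_cores), "primary"])
    return cmds


if __name__ == "__main__":
    for cmd in plan_migration("ldg1", "target-host", shrink_to="16G"):
        print(" ".join(cmd))
```

Keeping the plan as data makes it easy to log or review. Actually running it (for example with `subprocess.run`) would need to be done on the source control domain with suitable privileges, and the migration command may prompt for credentials on the target.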
Use NTP for time consistency
Note that clocks are not advanced while a domain is suspended, which can create clock skew. Run the NTP client service in the guest domain so it can resynchronize to the correct time after migration completes.
Is live migration always the right answer?
This is an important question - live migration is not always the solution to the problem at hand. Just because you can do something doesn't mean you should do something!
First, it's not a substitute for fault resiliency or high availability technology: you cannot live migrate a virtual machine from a server that isn't alive (yes, I have been asked that). It is useful for vacating a server when planned in advance of a maintenance window or even a tech refresh. That makes it possible to service or even replace a server without an application outage - provided you plan to do it while the source server is available! Of course, if you have a server that is beginning to fail, then by all means use live migration to evacuate it before it goes away.
Second, there are sometimes better ways than live migration to provide uninterrupted service: the problem can often be solved more easily at the application level than at the virtualization level. For example, if you have a stateless web application behind a load balancer that sprays web requests to multiple machines, it is much simpler to remove a server from the load balancer than to migrate all of its guests off. Optionally, you can fire up new application instances on other servers in the web farm. It's a straightforward and well-understood method that works with or without virtualization.
Enterprise applications that provide their own resiliency provide a strong alternative to live migration. For example, Oracle WebLogic and Oracle Real Application Clusters (RAC) already have the ability to control an application that is distributed over cooperating instances on multiple server nodes. Rather than live migrate a WebLogic instance or RAC node, it is generally easier and faster to halt and remove a node from the application cluster than to live migrate the virtual machine containing it. Those applications tend to have large memories that they actively update, making them the ones most likely to have longer migration and suspend times.
Yes, they can be live migrated, but in this case there's an easier way to have the same effect. You can combine techniques to leverage their strengths: for example, remove a node from a cluster (a fast, well-understood technique), shut down the VM it runs in, and then optionally do a cold migration of the guest domain to another server and start it up again. The advantage is that cold migration is essentially instantaneous. No need to live migrate a massive Oracle database memory image in order to rehost it.
Migration and distributed resource management
Live migration is sometimes used for distributed resource management. An important guest VM might be migrated off a mostly-full server to a server with more capacity in order to give it more CPU and memory capacity. Conversely, less important guests can be migrated off of a server in order to free up resources for an important VM that stays where it is.
This is a widely used and effective technique, which can be used with Oracle VM Server for SPARC. However, OVM SPARC offers an elegant alternative method to achieve similar effects. Rather than migrate guests off of a box in order to free up resources, one can simply redistribute resources by shrinking CPU and memory assignments to less-important or idle guests and give freed up resources to the guests that need them. That can be faster and more effective than moving entire virtual machines from one server to another across an intervening network. Not all virtualization technologies permit dynamically adding and removing (often the harder task) resources from a running VM, but Oracle VM Server for SPARC provides this due to cooperation between the hypervisor and Solaris OS.
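As a rough illustration of that alternative, the sketch below builds the `ldm` command lines that would move cores and memory from one running domain to another on the same server. The domain names and amounts are hypothetical, whole-core allocation is assumed (domains configured by virtual CPU count would use the vcpu subcommands instead), and the commands are printed rather than executed.

```python
def plan_rebalance(donor, recipient, cores=0, memory=None):
    """Build ldm command lines that shift resources between two domains.

    cores:  number of whole cores to move from donor to recipient
    memory: amount of memory to move, e.g. "8G"
    """
    cmds = []
    if cores:
        # Free the resource from the donor first so it is available to add.
        cmds.append(["ldm", "remove-core", str(cores), donor])
        cmds.append(["ldm", "add-core", str(cores), recipient])
    if memory:
        cmds.append(["ldm", "remove-memory", memory, donor])
        cmds.append(["ldm", "add-memory", memory, recipient])
    return cmds


if __name__ == "__main__":
    for cmd in plan_rebalance("idle-ldom", "busy-ldom", cores=2, memory="8G"):
        print(" ".join(cmd))
```

Nothing crosses the network here, which is why this approach can be faster than migrating a guest just to free up capacity.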
Oracle VM Server for SPARC can live migrate running guest domains between servers. This can be an effective way to enhance operational flexibility, and can be used to evacuate a server or to provide distributed resource management. This blog entry describes some rules about how to use it effectively, and offers alternatives that can be even more effective.
Irrelevant observation: for the first time ever, I saw a system where all the domains were assigned CPU counts that were prime numbers. :-)