Live migration has been an eagerly anticipated addition to Oracle VM Server for SPARC,
so I immediately upgraded a pair of lab machines to OVMSS 2.1 and started experimenting.
The machines I used are a T5120 and T5220, with T2 chips running at 1.2GHz (actually 1165MHz, but why quibble).
Not the fastest or most recent T-series servers, but more than adequate for the test.
First, a review of domain migration.
A domain (also called a virtual machine) is moved from one host server (the "source" system) to another (the "target").
Both source and target systems require access to common network infrastructure
and to disk resources used by the guest virtual machine. Virtual disks are typically hosted on a SAN or via NFS.
Oracle VM Server for SPARC offers three types of domain migration: cold, warm, and live.
In all cases, the LDoms managers on source and target machines cooperate to migrate the domain.
The domain is defined on the source system before migration, and is defined on the target
system afterwards with the same state (actively running or not) and same identity and resources as before.
Also, the same syntax is used in each case, for example:
ldm migrate mydomain <othersystem>
The LDoms manager performs cold migration if the domain is inactive, and uses live migration (on 2.1 systems) or warm migration otherwise.
Note that cold and warm migration have been available since 2008.
I defined a guest domain called "rover" (perhaps I should have named it "bilbo") with 8 CPUs and 512MB of RAM (later set to 2GB).
I migrated it back and forth between the two machines, which took about 15 seconds each time, plus or minus 1 second.
Later, I increased the domain's memory to 2GB and migrated it while it was running a 'zpool scrub' of a mirrored ZFS root pool.
That took about 25 seconds. The larger the memory image, and the more memory changes during migration, the longer migration takes.
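To reproduce timings like these, a small wrapper can report the wall-clock duration of any command. This is only a sketch: `time_cmd` is a hypothetical helper name, and it assumes a `date` that understands `+%s` (epoch seconds), which older Solaris releases may not provide. The shell's built-in `time` is an alternative there.

```shell
# time_cmd: run a command and report its wall-clock duration in whole seconds.
# Assumption: `date +%s` prints seconds since the epoch on this system.
time_cmd() {
    start=$(date +%s)
    status=0
    "$@" || status=$?          # run the command, remember its exit status
    end=$(date +%s)
    echo "elapsed: $((end - start))s (exit status $status)"
    return "$status"
}

# Example, run on the source control domain ("othersystem" is a placeholder):
#   time_cmd ldm migrate rover othersystem
```

Whole-second resolution is coarse, but it is enough to tell a 15 second migration from a 25 second one.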
The really good news is that the guest was responsive almost the entire time in all tests.
I was logged onto rover via
ssh and it felt "normal" at the keyboard, so the unresponsive period was too short to detect
that way. I sent a stream of pings from other hosts to the migrating domain, and really couldn't tell anything there either:
ping times rose slightly but no pings were dropped.
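If you save the ping output to a file, you can check it for dropped replies after the fact rather than eyeballing it. The sketch below is not what I ran during the test. `find_ping_gaps` is a hypothetical helper, and it assumes each reply line contains an `icmp_seq=N` field, as Solaris `ping -s` output does.

```shell
# find_ping_gaps: read saved ping output on stdin and report missing icmp_seq values.
# Assumption: every reply line carries an "icmp_seq=N" field.
find_ping_gaps() {
    sed -n 's/.*icmp_seq=\([0-9][0-9]*\).*/\1/p' |
    awk '
        NR > 1 && $1 > prev + 1 {
            # every sequence number skipped between two replies is a lost ping
            for (s = prev + 1; s < $1; s++) { print "missing seq " s; lost++ }
        }
        { prev = $1 }
        END { print "lost: " (lost + 0) }
    '
}

# Example: ping -s rover > ping.log   (interrupt it after the migration completes)
#          find_ping_gaps < ping.log
```

This only catches gaps between replies that did arrive. Pings lost after the last reply in the file will not show up.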
Finally, I launched an X-windows graphics demo
/usr/X/demo/muncher& and watched it run.
This X windows program continuously draws pretty pixels on its windows.
Sure enough, it briefly paused at the very end of the migration.
In this screen shot you can see a live migration in progress.
I've previously migrated rover from the system in the top left window (note the output from
and now am migrating it back again from the system on the right.
You can see a graphical representation of CPU consumption: the source system is able to almost saturate 16 CPUs while compressing
and transmitting domain contents in parallel; it takes far less CPU power on the receiving side.
I have a terminal window open on rover with a small shell script that repeatedly sleeps two seconds and displays the current time.
In a separate window is the X-windows demo
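The timestamp script in the terminal window is nothing fancy. A minimal sketch of that kind of loop follows. The function name `tick` and its two optional arguments are my own additions, there so the loop can be stopped after a fixed number of passes.

```shell
# tick [COUNT] [INTERVAL]: print the current time, sleep INTERVAL seconds, repeat.
# COUNT of 0 (the default) loops until interrupted; INTERVAL defaults to 2 seconds.
tick() {
    count=${1:-0}
    interval=${2:-2}
    i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        date
        sleep "$interval"
        i=$((i + 1))
    done
}

# Example: run `tick` in the guest during a migration and watch for a gap
# between consecutive timestamps that is noticeably longer than two seconds.
```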
Domain or virtual machine migration (regardless of vendor)
is not the answer to all IT issues. For example, it doesn't provide disaster recovery or
non-disruptive high availability (if the server or site hosting a domain is down, you can't initiate migration).
For that, you need a true high availability solution, such as Oracle Solaris Cluster or Oracle Real Application Clusters (RAC).
Oracle VM Server for SPARC lets you provision redundant I/O and multiple service domains.
That makes it possible to perform non-disruptive "rolling upgrades" within a single box: you can
take a service domain down to update its OS version while guest I/O continues from other service domains.
After the first service domain is upgraded, you can update other domains in like manner without guest domain outages or interruption of service.
This is not possible with systems that use a monolithic hypervisor and must take an outage to update it.
On those platforms, virtual machine migration is the only way to upgrade without disruption.
That said, live migration certainly adds a useful option for upgrading and servicing logical domains systems, in conjunction with or instead of configuring same-box redundancy.
Oracle VM Server for SPARC 2.1 can now migrate running guest domains between servers with little delay and with
extremely short suspension periods.
Domain migration can now be used for a wider set of purposes.
Possible use cases include migrating a domain to free up resources on the source machine for other domains,
consolidating domains onto a smaller set of servers during low use periods in order to reduce power consumption,
or migrating domains off a server that is being taken out of service for maintenance purposes or to be decommissioned.
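For the maintenance case, clearing a server means one migration per guest domain. Here is a small sketch that prints the commands for review rather than running them. `evacuate_plan` is a hypothetical helper, and the host and domain names in the example are placeholders. It uses only the `ldm migrate` syntax shown earlier.

```shell
# evacuate_plan TARGET DOMAIN...: print one migrate command per domain.
# Nothing is executed; the output is a plan to review and run by hand.
evacuate_plan() {
    target=$1
    shift
    for dom in "$@"; do
        echo "ldm migrate $dom $target"
    done
}

# Example: evacuate_plan t5220 rover bilbo
```

Printing the plan first lets you check that the target has room for every domain before you commit to anything.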
Next blog entry: using DTrace to observe domain migration!
EDIT: I'm not the "principal engineer at Oracle responsible for the Sparc hypervisor" as The Register just said
here but thanks for the notice anyway! I agree with them that LDoms is one of the most clever things to come down the pike in a long time. It's a very significant innovation in server virtualization.