Friday Spotlight: Keeping a Server in Reserve in the Event of a Server Failure
By Ronen Kofman on May 02, 2014
Q: How can I ensure I always have one server in reserve to use in the event of server failure?
A: Uncheck the VM Server box on a server that is not running any virtual machines.
Everyone should be familiar with the fact that Oracle VM automatically restarts Oracle VM guests on other servers in the pool if a single server reboots or panics for any reason. Oracle VM will attempt to restart all of the affected virtual machines on the remaining servers, as long as you have selected the “Enable HA” checkbox in the Edit Virtual Machine dialog.
However, even this high availability feature will not take effect if the remaining servers have run out of computing resources such as RAM or CPU. There are a couple of ways to ensure that the other servers in the server pool have adequate computing resources to take on all the Oracle VM guests from a failed server.
You can distribute Oracle VM guests across the servers, always leaving enough resources free on each server to absorb the additional guest load if another server fails. This is the best option, since it allows the HA process to occur automatically.
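To make the "leave enough resources free" rule concrete, here is a minimal Python sketch that checks whether a pool could re-home every guest of any single failed server. It is a simplified model under stated assumptions, not Oracle VM's own placement logic: the `Server` class, the RAM figures, and the first-fit-decreasing placement are all hypothetical, and only RAM is considered (CPU is ignored).

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    # Hypothetical model of an Oracle VM server; RAM only, in GB.
    name: str
    ram_gb: int
    guests: dict[str, int] = field(default_factory=dict)  # guest name -> RAM (GB)

    def free_ram(self) -> int:
        return self.ram_gb - sum(self.guests.values())


def can_absorb_failure(pool: list[Server], failed: Server) -> bool:
    """True if the failed server's guests fit on the surviving servers.

    Uses first-fit decreasing (largest guest first, placed on the first
    survivor with enough free RAM). This is a heuristic, so False means
    this simple placement failed, not that no placement exists.
    """
    free = {s.name: s.free_ram() for s in pool if s is not failed}
    for guest_ram in sorted(failed.guests.values(), reverse=True):
        target = next((n for n, ram in free.items() if ram >= guest_ram), None)
        if target is None:
            return False
        free[target] -= guest_ram
    return True


def pool_survives_any_single_failure(pool: list[Server]) -> bool:
    # Simulate the loss of each server in turn.
    return all(can_absorb_failure(pool, s) for s in pool)


pool = [
    Server("ovs1", 64, {"db": 24, "app": 16}),
    Server("ovs2", 64, {"web": 16, "mail": 8}),
    Server("ovs3", 64, {"build": 32}),
]
print(pool_survives_any_single_failure(pool))
```

Running a check like `pool_survives_any_single_failure` whenever a virtual machine is added is one way to catch the gradual overloading described next; in this example, adding a 32 GB guest to `ovs2` leaves too little headroom to re-home `ovs1`'s guests.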
But we all know that people tend to load up the servers to their maximum capacity over time, sneaking in an extra virtual machine here and there until every server is running at maximum load. There is a way to force people to follow a resource loading scheme: build an N+1 server pool, so that you always have one server running but held in reserve.
This solution relies on temporarily preventing one of the Oracle VM servers in a server pool from running any virtual machines. You can do this simply by unchecking the VM Server box in the Edit Server dialog: select and edit a server that has no virtual machines running on it, uncheck the VM Server box, and then save the change by clicking OK, as shown below.
The inherent problem with this approach is that it requires manual intervention: the systems administrator must re-enable the reserve server's ability to run Oracle VM guests and then restart the virtual machines by hand. So this solution is not as good as simply keeping the servers in a pool running a balanced load of 75% or so, but it does enforce the rule of leaving at least one server ready to take on the entire load of a failed server.
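One way to see where a figure like "75% or so" can come from is a back-of-envelope bound, assuming identical servers and an evenly spread load (assumptions not stated in the original). If each of N servers runs at utilization u and one fails, the survivors must carry N·u of load on N − 1 machines, so u ≤ (N − 1)/N. For a four-server pool that bound is exactly 75%; smaller pools need more headroom and larger pools less. The helper name below is hypothetical.

```python
def max_safe_utilization(servers: int) -> float:
    """Highest even per-server load that still lets the pool absorb one failure.

    Assumes identical servers and an evenly distributed load: the total load
    servers * u must fit on servers - 1 machines, so u <= (servers - 1) / servers.
    """
    if servers < 2:
        raise ValueError("need at least two servers to tolerate a server failure")
    return (servers - 1) / servers


for n in (2, 3, 4, 5, 8):
    print(f"{n} servers: keep average load at or below {max_safe_utilization(n):.0%}")
```

This is only a ceiling; in practice you would likely stay somewhat below it to leave room for uneven guest sizes and CPU spikes.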
Contributed by Greg King, Principal Best Practices Consultant, Oracle VM Product Management