Best Practices - Dynamic Reconfiguration
By jsavit on Aug 31, 2012
This post is one of a series of "best practices" notes for Oracle VM Server for SPARC (formerly named Logical Domains)
Overview of dynamic Reconfiguration
Oracle VM Server for SPARC supports Dynamic Reconfiguration (DR), making it possible to add or remove resources to or from a domain (virtual machine) while it is running. This is extremely useful because resources can be shifted to or from virtual machines in response to load conditions without having to reboot or interrupt running applications. For example, if an application requires more CPU capacity, you can add CPUs to improve performance, and remove them when they are no longer needed. You can use even use Dynamic Resource Management (DRM) policies that automatically add and remove CPUs to domains based on load.
How it works (in broad general terms)
Dynamic Reconfiguration is done in coordination with Solaris, which recognises a hypervisor request to change its virtual machine configuration and responds appropriately. In essence, Solaris receives a message saying "you now have 16 more CPUs numbered 16 to 31" or "8GB more RAM starting at address X" or "here's a new network or disk device - have fun with it". These actions take very little time.
Solaris then can start using the new resource. In the case of added CPUs, that means dispatching processes and potentially binding interrupts to the new CPUs. For memory, Solaris adds the new memory pages to its "free" list and starts using them. Comparable actions occur with network and disk devices: they are recognised by Solaris and then used.
Removing is the reverse process: after receiving the DR message to free specific CPUs, Solaris unbinds interrupts assigned to the CPUs and stops dispatching process threads. That takes very little time.
primary # ldm list NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME primary active -n-cv- SP 16 4G 1.0% 6d 22h 29m ldom1 active -n---- 5000 16 8G 0.9% 6h 59m primary # ldm set-core 5 ldom1 primary # ldm list NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME primary active -n-cv- SP 16 4G 0.2% 6d 22h 29m ldom1 active -n---- 5000 40 8G 0.1% 6h 59m primary # ldm set-core 2 ldom1 primary # ldm list NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME primary active -n-cv- SP 16 4G 1.0% 6d 22h 29m ldom1 active -n---- 5000 16 8G 0.9% 6h 59m
Memory pages are vacated by copying their contents to other memory locations and wiping them clean. Solaris may have to swap memory contents to disk if the remaining RAM isn't enough to hold all the contents. For this reason, deallocating memory can take longer on a loaded system. Even on a lightly loaded system it took several 7 or 8 seconds to switch the domain below between 8GB and 24GB of RAM.
primary # ldm set-mem 24g ldom1 primary # ldm list NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME primary active -n-cv- SP 16 4G 0.1% 6d 22h 36m ldom1 active -n---- 5000 16 24G 0.2% 7h 6m primary # ldm set-mem 8g ldom1 primary # ldm list NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME primary active -n-cv- SP 16 4G 0.7% 6d 22h 37m ldom1 active -n---- 5000 16 8G 0.3% 7h 7m
What if the device is in use?
(this is the anecdote that inspired this blog post)
If CPU or memory is being removed, releasing it pretty straightforward, using the method described above. The resources are released, and Solaris continues with less capacity. It's not as simple with a network or I/O device: you don't want to yank a device out from underneath an application that might be using it. In the following example, I've added a virtual network device to ldom1 and want to take it away, even though it's been plumbed.
primary # ldm rm-vnet vnet19 ldom1 Guest LDom returned the following reason for failing the operation: Resource Information ---------------------------------------------------------- ----------------------- /devices/virtual-devices@100/channel-devices@200/network@1 Network interface net1 VIO operation failed because device is being used in LDom ldom1 Failed to remove VNET instance
That's what I call a helpful error message - telling me exactly what was wrong.
In this case the problem is easily solved. I know this NIC is seen in the guest as
ldom1 # ifconfig net1 down unplumb
Now I can dispose of it, and even the virtual switch I had created for it:
primary # ldm rm-vnet vnet19 ldom1 primary # ldm rm-vsw primary-vsw9
If I had to take away the device disruptively, I could have used
ldm rm-vnet -f
but that could disrupt whoever was using it. It's better if that can be avoided.
Oracle VM Server for SPARC provides dynamic reconfiguration, which lets you modify a guest domain's CPU, memory and I/O configuration on the fly without reboot. You can add and remove resources as needed, and even automate this for CPUs by setting up resource policies.
Taking things away can be more complicated than giving, especially for devices like disks and networks that may contain application and system state or be involved in a transaction. LDoms and Solaris cooperative work together to coordinate resource allocation and de-allocation in a safe and effective way. For best practices, use dynamic reconfiguration to make the best use of your system's resources.