
News, tips, partners, and perspectives for Oracle’s virtualization offerings

Best Practices - Dynamic Reconfiguration

Jeff Savit
Product Management Senior Manager
This post is one of a series of "best practices" notes for Oracle VM Server for SPARC (formerly named Logical Domains)

Overview of Dynamic Reconfiguration


Oracle VM Server for SPARC supports Dynamic Reconfiguration (DR), making it possible to add or remove
resources to or from a domain (virtual machine) while it is running.
This is extremely useful because resources can be shifted to or from virtual machines in response to
load conditions without having to reboot or interrupt running applications.
For example, if an application requires more CPU capacity,
you can add CPUs to improve performance, and remove them when they are no longer needed.
You can even use Dynamic Resource Management (DRM) policies that automatically add
CPUs to, and remove them from, domains based on load.
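
As a rough sketch of what such a policy could look like, the function below wraps `ldm add-policy`. The thresholds, the CPU limits, and the policy name `high-usage` are illustrative values, not recommendations, and the `LDM` variable is a hypothetical hook added here so the command can be previewed (for example with `LDM=echo`) before it is run for real.

```shell
# Sketch only: attach a DRM policy to a domain.
# All numeric values and the policy name are illustrative assumptions.
# LDM is a hypothetical override; it defaults to the real ldm command.
add_cpu_policy() {
    domain=$1
    "${LDM:-ldm}" add-policy \
        util-lower=25 util-upper=75 \
        vcpu-min=2 vcpu-max=16 \
        attack=1 decay=1 priority=1 \
        name=high-usage "$domain"
}
```

With these values the domain would be allowed to float between 2 and 16 virtual CPUs, gaining CPUs when utilization rises above 75% and giving them back when it drops below 25%. Check the ldm documentation for your release before relying on any of these properties.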

How it works (in broad general terms)

Dynamic Reconfiguration is done in coordination with Solaris, which recognises a hypervisor request to change its
virtual machine configuration and responds appropriately. In essence, Solaris receives a message saying
"you now have 16 more CPUs numbered 16 to 31" or "8GB more RAM starting at address X" or
"here's a new network or disk device - have fun with it". These actions take very little time.

Solaris then can start using the new resource. In the case of added CPUs, that means dispatching processes
and potentially binding interrupts to the new CPUs. For memory, Solaris adds the new memory pages to its
"free" list and starts using them. Comparable actions occur with network and disk devices: they are recognised
by Solaris and then used.

Removing is the reverse process: after receiving the DR message to free specific CPUs,
Solaris unbinds interrupts assigned to the CPUs and stops dispatching process threads.
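That takes very little time. In the example below, ldom1 goes from 16 virtual CPUs
to 5 cores (40 virtual CPUs) and back to 2 cores (16 virtual CPUs):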

primary # ldm list
NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME
primary active -n-cv- SP 16 4G 1.0% 6d 22h 29m
ldom1 active -n---- 5000 16 8G 0.9% 6h 59m
primary # ldm set-core 5 ldom1
primary # ldm list
NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME
primary active -n-cv- SP 16 4G 0.2% 6d 22h 29m
ldom1 active -n---- 5000 40 8G 0.1% 6h 59m
primary # ldm set-core 2 ldom1
primary # ldm list
NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME
primary active -n-cv- SP 16 4G 1.0% 6d 22h 29m
ldom1 active -n---- 5000 16 8G 0.9% 6h 59m
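
The VCPU column follows from the core count: 5 cores showed up as 40 virtual CPUs and 2 cores as 16, which implies 8 hardware threads per core on this particular server. That figure is inferred from the listing and differs between SPARC processors, so the small helper below (a hypothetical name) takes it as a parameter.

```shell
# Sketch: expected VCPU count for a given number of whole cores.
# The default of 8 threads per core is inferred from the listing above
# and is an assumption; pass a second argument for other processors.
vcpus_for_cores() {
    cores=$1
    threads_per_core=${2:-8}
    echo $((cores * threads_per_core))
}
```

For example, `vcpus_for_cores 5` prints 40 and `vcpus_for_cores 2` prints 16, matching the two `ldm list` outputs.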

When memory is removed, Solaris vacates the affected pages by copying their contents to other
memory locations and wiping them clean.
Solaris may have to swap memory contents to disk if the remaining RAM isn't enough to hold all the contents.
For this reason, deallocating memory can take longer on a loaded system. Even on a lightly loaded
system it took 7 or 8 seconds to switch the domain below between 8GB and 24GB of RAM.

primary # ldm set-mem 24g ldom1
primary # ldm list
NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME
primary active -n-cv- SP 16 4G 0.1% 6d 22h 36m
ldom1 active -n---- 5000 16 24G 0.2% 7h 6m
primary # ldm set-mem 8g ldom1
primary # ldm list
NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME
primary active -n-cv- SP 16 4G 0.7% 6d 22h 37m
ldom1 active -n---- 5000 16 8G 0.3% 7h 7m
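
Because a memory change is not instantaneous, it can be handy to confirm from a script that the domain ended up at the size you asked for. The helper below is a minimal sketch with a hypothetical name: it reads `ldm list` output in the column layout shown above and prints the MEMORY column for one domain. It assumes that layout stays the same.

```shell
# Sketch: print the MEMORY column for one domain from `ldm list` output
# supplied on standard input. Assumes the column order shown above:
# NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME
domain_memory() {
    while read -r name state flags cons vcpu memory rest; do
        if [ "$name" = "$1" ]; then
            echo "$memory"
        fi
    done
}
```

On the control domain this would be used as `ldm list | domain_memory ldom1`, and the result compared with the size passed to `ldm set-mem`.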

What if the device is in use?

(this is the anecdote that inspired this blog post)

If CPU or memory is being removed, releasing it is pretty straightforward, using the method described above.
The resources are released, and Solaris continues with less capacity.
It's not as simple with a network or I/O device: you don't want to yank a device out from underneath an application
that might be using it. In the following example, I've added a virtual network device to ldom1 and want to take it away,
even though it's been plumbed.

primary # ldm rm-vnet vnet19  ldom1
Guest LDom returned the following reason for failing the operation:
Resource Information
---------------------------------------------------------- -----------------------
/devices/virtual-devices@100/channel-devices@200/network@1 Network interface net1
VIO operation failed because device is being used in LDom ldom1
Failed to remove VNET instance

That's what I call a helpful error message - telling me exactly what was wrong.
In this case the problem is easily solved. I know this NIC is seen in the guest as net1 so:

ldom1 # ifconfig net1 down unplumb 

Now I can dispose of it, and even the virtual switch I had created for it:

primary # ldm rm-vnet vnet19  ldom1
primary # ldm rm-vsw primary-vsw9

If I had to take away the device disruptively, I could have used ldm rm-vnet -f
but that could disrupt whoever was using it. It's better if that can be avoided.
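
The non-disruptive approach can be scripted along these lines. This is a sketch, not a complete tool: the function name is hypothetical, and `LDM` is a hypothetical override used only so the logic can be tried without touching a real system. It attempts a normal removal and, if that fails, stops with a hint instead of escalating to `-f`.

```shell
# Sketch: remove a virtual network device without forcing it.
# LDM is a hypothetical override; it defaults to the real ldm command.
remove_vnet() {
    vnet=$1
    domain=$2
    if "${LDM:-ldm}" rm-vnet "$vnet" "$domain"; then
        echo "removed $vnet from $domain"
    else
        # The failure may have other causes; an in-use device is the
        # case described in this post.
        echo "rm-vnet failed for $vnet; if it is in use in $domain, unplumb it in the guest and retry" >&2
        return 1
    fi
}
```

Leaving the forced removal as a deliberate manual step keeps the script from pulling a NIC out from under an application. On Solaris 11 guests, `ipadm delete-ip net1` is the usual counterpart to the `ifconfig ... unplumb` command shown earlier.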

Summary

Oracle VM Server for SPARC provides dynamic reconfiguration, which lets you modify a guest domain's
CPU, memory and I/O configuration on the fly without reboot. You can add and remove resources as needed,
and even automate this for CPUs by setting up resource policies.

Taking things away can be more complicated than giving, especially for devices like disks and networks
that may contain application and system state or be involved in a transaction. LDoms and Solaris
work together to coordinate resource allocation and de-allocation in a safe and effective way.
For best practices, use dynamic reconfiguration to make the best use of your system's resources.

Join the discussion

Comments ( 3 )
  • Del Saturday, September 1, 2012

    Hi,

    it sounds too good to be true.... I mean that in 90% of the real life situations this DR depends on the running app... FOE.

    If we have a running Oracle 10 DB on an ldom and we take away some CPU and memory... I think that the DB will crash! Am I right?


  • Jeff Monday, September 3, 2012

    Hi Del,

    I guess in the ultimate analysis *everything* depends on the app, doesn't it? If, for example, an app won't benefit from extra RAM or extra CPUs (perhaps it only has a fixed number of runnable process threads) then adding more won't help - whether in DR situation or not.

    Now, taking resources away from a running application can be complicated by the fact that some sophisticated applications tune themselves to use all the resources made available to them, and may or may not be programmed to respond to subsequent changes in configuration. (Of course, taking CPUs away from an idle application is quite safe.) That's in general, but taking the example of Oracle database under Solaris: if you are using Intimate Shared Memory (ISM), you are (this is my understanding - I'm not an expert in Oracle DB) locking/pinning pages, so you shouldn't be able to remove them. If you have correctly configured Dynamic Intimate Shared Memory (DISM), you may be able to reduce memory. Again, this is not my area of expertise, so please consult the relevant documents. Also, it might be that there are things 10g doesn't do in this area that 11g handles.


  • Del Tuesday, September 4, 2012

    Hi Jeff,

    You are absolutely right... *everything* depends on the app ;). In my opinion when we are talking about dynamic resource reallocation (especially for resource removing) we all must think of the whole system (not only the OS, but the app running above, too). :))

