Virtualization @ Oracle (Part 5: Resource Management as Enabling Technology for Virtualization)
By uwes on Apr 24, 2012
After discussing Oracle VM and OS-Virtualization with Zones and Containers in the previous articles, we will cover some enabling technologies for virtualization in the next two articles and start with
Resource Management as Enabling Technology for Virtualization
Of course, here we are talking about IT resource management as a technology, but why is it important? In the first article of this series we used the following definition of virtualization:
"Virtualization, in computing, is the creation of a virtual (rather than actual) version of something, such as a hardware platform, operating system, a storage device or network resources."
Resources are the foundation that gets virtualized by the different virtualization technologies. These are:
- Hardware like the CPU, memory or devices
- The network
- The Operating System
- The desktop
- A general software layer.
Resource management limits access to shared resources, but also monitors resource consumption and collects accounting information.
The management of these resources is important because many consumers, such as VMs, zones, containers or virtual desktops, request resources. Consolidating different workloads on one system also means combining workloads with different needs for throughput, response time, availability and service level agreements.
But resources are always limited, and they are shared among many virtualized environments on one IT system. It is therefore important to restrict access to specific shared resources, to isolate resources from certain workloads, or at least to limit the shared resource consumption of workloads. With that, we can guarantee a service level for each virtualized environment or influence its performance. Without resource management, all workloads would be handled equally, based on their resource requests. The result could be, for example, that one VM consumes a lot of memory at runtime and other VMs on the same system are blocked because an important memory request can no longer be served once no memory is available. Another example is how many resources (e.g. CPUs) should be shown to a virtualized environment. This can be important for licensing reasons or because of software behavior.
So the goals of resource management for zones, containers or VMs are:
- Prevent them from consuming unlimited resources
- Change a priority, based on external events
- Balance resource guarantees against the goal of maximizing system utilization
By using constraints we set bounds on the consumption of specific resources. With that we can control ill-behaved environments that would otherwise compromise the performance of other environments or of the whole system, or that might even affect availability through unregulated resource requests. Typically, constraints are enforced through resource controls, which are set by the system administrator. Examples of resource controls are:
- Number of semaphores used
- Number of open files
- Used virtual memory
- Number of processes
- Used network bandwidth
The system can react in different ways when a specific bound has been reached:
- Allow the request, but let the requester know that the bound has been reached
- Cap the resource delivery at the defined bound
- Reject the whole resource request with an error message to the application
- Trigger an action on the system to free up resources and provide the requester with the needed resources
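The four reactions listed above can be sketched in a few lines of Python. This is a toy model and not the Oracle Solaris resource control interface: the `Action` names, the `ResourceControl` class, its `request` method and the `reclaimable` counter are all hypothetical and exist only for illustration.

```python
from enum import Enum


class Action(Enum):
    NOTIFY = "notify"    # grant the request, but signal that the bound was hit
    CAP = "cap"          # grant only up to the bound
    DENY = "deny"        # reject the whole request
    RECLAIM = "reclaim"  # free resources elsewhere, then grant


class ResourceControl:
    """Toy bound on one resource (hypothetical, for illustration only)."""

    def __init__(self, limit, action, reclaimable=0):
        self.limit = limit
        self.action = action
        self.used = 0
        self.reclaimable = reclaimable  # units the system could free on demand

    def request(self, amount):
        """Return (granted_units, message) for a request of `amount` units."""
        if self.used + amount <= self.limit:
            self.used += amount
            return amount, "ok"
        if self.action is Action.NOTIFY:
            self.used += amount
            return amount, "bound reached"
        if self.action is Action.CAP:
            granted = self.limit - self.used
            self.used = self.limit
            return granted, "capped"
        if self.action is Action.RECLAIM:
            shortfall = self.used + amount - self.limit
            if shortfall <= self.reclaimable:
                # Free just enough capacity to serve the request in full.
                self.reclaimable -= shortfall
                self.used = self.limit
                return amount, "reclaimed"
        # DENY, or RECLAIM that could not free enough capacity.
        return 0, "denied"
```

With a limit of 10 units and 8 already in use, a request for 5 more units is granted in full with a warning under `NOTIFY`, reduced to 2 units under `CAP`, and rejected under `DENY`.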
Depending on the implementation, applications or virtualized environments may have to be modified to be aware of resource controls and constraints. On the other hand, constraints are very flexible, and their bounds can be changed at runtime. Constraints also allow a workload to use free resources that have been assigned to, but are not needed by, a different workload.
Example 1: Constraints are important and useful for all kinds of shared, parallel access to resources. Good examples are processes, projects or Oracle Solaris Zones. They all use and share one kernel. That’s why many resource controls have been introduced in Oracle Solaris to limit their resource consumption. One example is the resource control zone.max-processes in Oracle Solaris 11, which limits the number of processes a zone can run. This limit matters because the process table of each OS kernel is large, but limited in size. With this resource control we limit a zone's portion of this table and protect the system from ill-behaved activity in a zone, such as runaway shell scripts that create processes in an endless loop. With the resource control enabled, the kernel will at some point no longer allow the zone to create a new process.
Example 2: Another commonly shared resource is the network, i.e. the connection of the workloads to the outside world. If all VMs share one network cable, the bandwidth consumption needs to be limited per VM. We will cover this example in more detail in the next article.
With scheduling we divide a resource into specific intervals and allocate them, based on a predictable algorithm. If an allocation is not needed, the resource interval can be used by others.
An example of a scheduled resource is CPU time. With this mechanism the available time of a CPU is divided into allocation units, which are used by applications. Scheduling-based resource management enables full utilization of a configuration, and in a critically committed or overcommitted situation the scheduling algorithm still guarantees controlled access of all applications to the resource. What “controlled access” means, and on what grounds and measurements the allocation units are changed or assigned to applications, depends on the scheduling algorithm. The decision can be based, for example, on a predefined importance of an application.
Example 1: Scheduling is achieved by the use of the fair share scheduler (FSS) in Oracle Solaris together with Oracle Solaris Zones. The FSS allows the allocation of CPU resources. Each zone can have a share assigned to it. The shares are used to manage the CPU resources in the event that the zones compete for CPU time.
- If the workload is less than 100%, no management is done since free CPU capacity is still available.
- If the workload is at 100%, the fair share scheduler is activated and modifies the priority of the participating processes such that the assigned CPU capacity of a zone corresponds to the defined share.
- The defined share is calculated from the share value of an active zone divided by the sum of the shares of all active zones.
With the FSS we guarantee the response time of workloads, based on CPU shares if the system is fully utilized.
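The share calculation described above is simple arithmetic, and a short Python sketch may make it concrete. The function name `cpu_entitlement` and the zone names and share values are made up for this illustration; the sketch only computes the entitlement and does not model how the FSS adjusts process priorities.

```python
def cpu_entitlement(shares, active):
    """Return the CPU fraction each active zone is entitled to under contention.

    shares: mapping of zone name to its configured share value
    active: names of the zones currently competing for CPU time
    """
    total = sum(shares[zone] for zone in active)
    return {zone: shares[zone] / total for zone in active}


shares = {"web": 10, "db": 30, "batch": 10}  # hypothetical zones and shares

# All three zones are busy: db is entitled to 30 / 50 of the CPU.
print(cpu_entitlement(shares, ["web", "db", "batch"]))

# batch is idle: its shares drop out of the sum and the others grow.
print(cpu_entitlement(shares, ["web", "db"]))
```

The first call yields 20%, 60% and 20% for web, db and batch. Once batch is idle, web rises to 25% and db to 75%, because only the shares of active zones enter the sum.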
Example 2: Another example is the creation and handling of virtual CPUs in Oracle VM for x86 when vCPUs are not pinned directly to physical CPUs. In that case the virtual CPUs (vCPUs) are managed (scheduled) through a local run queue that “divides” a physical CPU into multiple vCPUs. This work is done by the hypervisor. The queue is sorted by vCPU priority, and every vCPU in the queue gets its fair share of CPU resources. The priority that a vCPU gets can be influenced through a weight and a cap value. The relative weight parameter determines the amount of CPU cycles that a domain receives: a vCPU with a weight of 64 receives twice as many CPU cycles as a vCPU with a weight of 32. The second tunable is the cap parameter, which defines, as a percentage, the maximum amount of CPU cycles that a domain will receive. This is an absolute value: if it is set to 100, the vCPU may consume 100% of the available cycles on a physical CPU; if it is set to 50, the vCPU can never consume more than half of the available cycles. In this example we see a combination of scheduling and constraints (capping).
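How weight and cap interact can be illustrated with a small Python sketch. It is a simplified model under our own assumptions, not the hypervisor's actual scheduling algorithm: all vCPUs are assumed to be permanently busy on one physical CPU, the function name `allocate` is hypothetical, and `None` is used here as our own convention for "no cap".

```python
def allocate(vcpus, capacity=100.0):
    """Split `capacity` percent of one physical CPU among busy vCPUs.

    vcpus: mapping of name to (weight, cap), where cap is a percentage of the
           physical CPU, or None for "no cap" (a convention of this sketch).
    """
    alloc = {}
    remaining = dict(vcpus)
    while remaining:
        total_weight = sum(w for w, _ in remaining.values())
        # vCPUs whose weight-proportional slice would exceed their cap
        capped = {
            name: cap
            for name, (w, cap) in remaining.items()
            if cap is not None and capacity * w / total_weight > cap
        }
        if not capped:
            # Nobody hits a cap: hand out the rest proportionally to weight.
            for name, (w, _) in remaining.items():
                alloc[name] = capacity * w / total_weight
            break
        # Hold capped vCPUs at their cap and redistribute what is left.
        for name, cap in capped.items():
            alloc[name] = cap
            capacity -= cap
            del remaining[name]
    return alloc


# Weights 64 and 32 without caps: a receives twice as many cycles as b.
print(allocate({"a": (64, None), "b": (32, None)}))

# Same weights, but a is capped at 50% of the physical CPU.
print(allocate({"a": (64, 50), "b": (32, None)}))
```

In the second call the weight alone would give vCPU a about two thirds of the CPU, but the cap holds it at 50% and the remainder goes to b.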
Partitioning is used to assign a subset of resources to a workload. This assignment guarantees that this subset of resources is always available to the workload. However, these resources cannot be used by other workloads, because they are assigned and guaranteed to one specific workload. Thus, configurations that use partitioning can avoid overcommitment of resources, but the ability to achieve high utilization can be reduced: a reserved resource is not available to another workload even when the assigned workload is idle. Typical examples of partitioning are the assignment of physical CPUs, parts of physical memory or parts of the I/O system to workloads or virtualized environments.
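The trade-off between guaranteed resources and utilization can be shown with a small calculation. The following Python sketch assumes a hypothetical 8-CPU system with two workloads; the names `served_partitioned` and `utilization` and all numbers are invented for this illustration.

```python
def served_partitioned(demand, reserved):
    """Each workload may use only its own reserved CPUs."""
    return {w: min(demand[w], reserved[w]) for w in demand}


def utilization(served, total_cpus):
    """Fraction of all CPUs that is actually doing work."""
    return sum(served.values()) / total_cpus


# Hypothetical 8-CPU system, split evenly between two workloads.
reserved = {"db": 4, "web": 4}
demand = {"db": 6, "web": 1}  # CPUs each workload could keep busy right now

served = served_partitioned(demand, reserved)
print(served)
print(utilization(served, 8))
```

Here db is held to its 4 reserved CPUs although it could use 6, while 3 of the CPUs reserved for web sit idle, so the system runs at 62.5% utilization. A shared, scheduled configuration could serve all 7 requested CPUs, but without the guarantee that web always finds its 4 CPUs free.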
Example 1: Let’s look again at how Oracle VM for x86 handles CPUs. If we use the feature to pin vCPUs to physical CPUs and assign them to domains, we get a partitioning of the CPUs: certain CPUs are then permanently assigned to domains. With that we always guarantee a fixed performance, but the pinned CPUs cannot be used by other domains, even when they are idle.
Example 2: Partitioning is used with Oracle VM for SPARC for several resource types. CPU and memory are always assigned directly to Logical Domains. There are also options to assign PCI slots and complete PCI infrastructure to certain domains. The advantage is high-performance domains with close to zero overhead and, if direct I/O is used, guaranteed performance.
Constraints, scheduling and partitioning are the basic mechanisms of resource management that enable and guarantee the access of various virtualization technologies to limited and shared resources. They are used for different resources, based on the requirements of different workloads and virtualization technologies.
Partitioning is the most widely used way to control resources in hypervisor-based virtualization. In that case the hypervisor controls resources like CPU, memory, privilege checks or hardware interrupts.
To avoid overcommitment of CPU resources, they are typically partitioned, and the physical CPUs are assigned as virtual CPUs to virtual environments. In some cases a scheduler divides a physical CPU into multiple virtual CPUs, but this generates virtualization overhead and can lead to overcommitment of CPU resources.
Memory is typically controlled by the memory management system of the hypervisor, which allocates memory to guests and protects it based on rules. In some cases there is no memory management in the hypervisor, but a direct physical assignment (partitioning) of memory to guests. Overcommitment of memory resources should be avoided, and in most cases it cannot be configured in the hypervisor at all.
With that we'd like to close this article on Resource Management and hope we've kept you eager to read the ones coming in the following newsletters.
This series already had the following articles:
- December 2011: Introduction to Virtualization (Matthias Pfützner)
- January 2012: Oracle VM Server for SPARC (Matthias Pfützner)
- February 2012: Oracle VM Server for x86 (Matthias Pfützner)
- March 2012: Oracle Solaris Zones and Linux Containers (Detlef Drewanz)
The series will continue as follows (tentative):
- May 2012: Network Virtualization (Detlef Drewanz)
- June 2012: Oracle VM VirtualBox (Detlef Drewanz)
- July 2012: Oracle Virtual Desktop Infrastructure (VDI) (Matthias Pfützner)
- August 2012: OpsCenter as Management Tool for Virtualization (Matthias Pfützner)
If you have questions, feel free to contact me at: Detlef Drewanz