Best Practices - Core allocation
By jsavit on Jun 09, 2012
This post is one of a series of "best practices" notes for Oracle VM Server for SPARC (also called Logical Domains)
SPARC T-series servers currently have up to 4 CPU sockets, each of which has up to 8 or (on SPARC T3) 16 CPU cores, while each CPU core has 8 threads, for a maximum of 512 dispatchable CPUs. The defining feature of Oracle VM Server for SPARC is that each domain is assigned CPU threads or cores for its exclusive use. This avoids the overhead of software-based time-slicing and emulation (or binary rewriting) of system state-changing privileged instructions used in traditional hypervisors.
To create a domain, administrators specify either the number of CPU threads or cores that the domain will own, as well as its memory and I/O resources. When CPU resources are assigned at the individual thread level, the logical domains constraint manager attempts to assign threads from the same cores to a domain, and avoid "split core" situations where the same CPU core is used by multiple domains. Sometimes this is unavoidable, especially when domains are allocated and deallocated CPUs in small increments.
Why split cores can matter
Split core allocations can silenty reduce performance because multiple domains with different address spaces and memory contents are sharing the core's Level 1 cache (L1$). This is called false cache sharing since even identical memory addresses from different domains must point to different locations in RAM. The effect of this is increased contention for the cache, and higher memory latency for each domain using that core.
The degree of performance impact can be widely variable. For applications with very small memory working sets, and with I/O bound or low-CPU utilization workloads, it may not matter at all: all machines wait for work at the same speed. If the domains have substantial workloads, or are critical to performance then this can have an important impact: This blog entry was inspired by a customer issue in which one CPU core was split among 3 domains, one of which was the control and service domain. The reported problem was increased I/O latency in guest domains, but the root cause might be higher latency servicing the I/O requests due to the control domain being slowed down.
What to do about it
Split core situations are easily avoided. In most cases the logical domain constraint manager will avoid it without any administrative action, but it can be entirely prevented by doing one of the several actions:
- Assign virtual CPUs in multiples of 8 - the number of threads per core.
ldm set-vcpu 8 mydomainor
ldm add-vcpu 24 mydomain. Each domain will then be allocated on a core boundary.
- Use the whole core constraint when assigning CPU resources. This allocates
CPUs in increments of entire cores instead of virtual CPU threads.
The equivalent of the above commands would be
ldm set-core 1 mydomainor
ldm add-core 3 mydomain. Older syntax does the same thing by adding the
-cflag to the
set-vcpucommands, but the new syntax is recommended. When whole core allocation is used an attempt to add cores to a domain fails if there aren't enough completely empty cores to satisfy the request. See https://blogs.oracle.com/sharakan/entry/oracle_vm_server_for_sparc4 for an excellent article on this topic by Eric Sharakan.
- Don't obsess: - if the workloads have minimal CPU requirements and don't need anywhere near a full CPU core, then don't worry about it. If you have low utilization workloads being consolidated from older machines onto a current T-series, then there's no need to worry about this or to assign an entire core to domains that will never use that much capacity.
In any case, make sure the most important domains have their own CPU cores, in particular the control domain and any I/O or service domain, and of course any important guests.
Split core CPU allocation to domains can potentially have an impact on performance, but the logical domains manager tends to prevent this situation, and it can be completely and simply avoided by allocating virtual CPUs on core boundaries.