Friday Apr 22, 2016

New whitepaper: Optimizing Oracle VM Server for x86 Performance

I am very pleased to announce publication of a new whitepaper, "Optimizing Oracle VM Server for x86 Performance". This whitepaper contains material previously posted on this blog, plus additional technical information and features newly introduced with Oracle VM 3.4.

Thursday Dec 10, 2015

Oracle VM Performance and Tuning - Part 6 - I/O

This article focuses on what is often the most important performance topic: I/O, with discussions of network and disk I/O. [Read More]

Monday Nov 30, 2015

Oracle VM Performance and Tuning - Part 5

The fifth article in this series on Oracle VM performance focuses on Oracle VM Server for x86 domain types, huge pages, and CPU scheduling controls. [Read More]

Tuesday Nov 24, 2015

Oracle VM Performance and Tuning - Part 4

The fourth article in this series on Oracle VM performance focuses on Oracle VM Server for x86 CPU and memory performance, with guidance on how to reduce memory latency and control CPU allocation.
[Read More]

Friday Nov 06, 2015

Oracle VM Performance and Tuning - Part 3

The third article in this series on Oracle VM performance focuses on performance goals for virtual machine and cloud environments, and on performance principles for CPU, memory, and I/O that behave differently in VM environments. [Read More]

Friday Oct 02, 2015

Oracle VM Performance and Tuning - Part 2

This article on Oracle VM performance reviews general performance principles, and follows with a review of Oracle VM architectural features that affect performance. This will be high-level as a basis for more technical detail in subsequent articles.

How to evaluate and measure performance (short version)

First, let's consider ways to not evaluate performance. Performance is often stated in unquantified generalities ("Give good response time!") or complaints ("Response time is terrible today. Fix it!"). That doesn't help understand the performance situation.

Another habit is to look at resource utilization without relating it to delivered performance. For example:

  • High CPU utilization. Is 95% CPU busy bad or good?

    High CPU busy may just mean you're getting your money's worth from your servers. It is only a problem if service level objectives aren't being met due to resource starvation. However, it could be a symptom of a problem (program looping, excessive error handling, etc.) rather than being the problem itself.

  • Low CPU utilization. Is 12% CPU busy bad or good?

    Low CPU may mean the workload is idle, or it's a single threaded app on a server with many CPUs. (Is this an 8 CPU machine where one CPU is 100% busy while the others are idle?)

    Applications are often unable to drive CPU because they are waiting on I/O, or because memory is over committed and they are thrashing. Low utilization might be innocent, or can indicate a bottleneck elsewhere.

So, raw utilization numbers are not good or bad in themselves. They can be clues when used in the right context - "it depends". Another trap is to use average numbers, which can hide peak loads and spikes.
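To make the point about averages concrete, here is a minimal sketch in Python. All of the numbers are invented for illustration, and the `summarize` helper is my own name rather than part of any monitoring tool. The first data set mirrors the 8 CPU example above, and the second shows a short spike disappearing into an average over time.

```python
from statistics import mean

# Hypothetical per-CPU busy percentages on an 8 CPU server running one
# single-threaded program: one CPU is saturated and seven are idle.
per_cpu_busy = [100.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

# Hypothetical CPU busy samples taken once a minute, with a two-minute spike.
samples_over_time = [10.0, 12.0, 11.0, 98.0, 99.0, 12.0, 10.0, 11.0, 9.0, 10.0]


def summarize(values):
    """Return the average and the peak of a list of utilization samples."""
    return {"average": mean(values), "peak": max(values)}


across_cpus = summarize(per_cpu_busy)
across_time = summarize(samples_over_time)

print(f"across CPUs: average {across_cpus['average']:.1f}%, peak {across_cpus['peak']:.1f}%")
print(f"across time: average {across_time['average']:.1f}%, peak {across_time['peak']:.1f}%")
```

The server-wide average of 12.5% says nothing about the one saturated CPU, and the average over time of about 28% hides two minutes at nearly 100% busy.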

Other popular measurements don't help either, such as microbenchmarks that look nothing like the actual workload, or timing how long the dd command takes to write a gigabyte of zeroes when the expected workload is random I/O. Unless the purpose of the system is to use dd to write zeroes, that measurement is of limited utility. Measuring the wrong thing because it's easy is so common that it has its own name, the streetlight effect.

Instead, performance analysts and systems administrators should measure against the requirements of their business users, using service level objectives stated in external terms: meeting a deadline to run a particular task (such as: get payroll out, post and clear stock trades, or close the books at end of quarter), or meeting response times for different types of transaction (load a web page, do a stock quote, transact a purchase or trade). Performance objectives are commonly expressed in the form of response times ("95% of a specified transaction must complete in X seconds at a rate of N transactions per second") or in throughput rates ("handle N payroll records in H hours").
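As a rough illustration of the response time form of objective, the following Python sketch checks a set of measured response times against a "95% within X seconds at N transactions per second" target. The function names, the nearest-rank percentile method, and the sample data are all assumptions made for this example, not taken from any particular monitoring product.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_response_objective(response_times, elapsed_seconds, limit_seconds,
                             required_tps, pct=95):
    """Check 'pct% of transactions complete within limit_seconds at required_tps'."""
    achieved_tps = len(response_times) / elapsed_seconds
    within_limit = percentile(response_times, pct) <= limit_seconds
    return within_limit and achieved_tps >= required_tps


# Hypothetical measurement: 20 transactions observed over 10 seconds.
# Most are fast, one is slow, and one is very slow.
response_times = [0.2] * 18 + [0.9, 3.0]

print(percentile(response_times, 95))
print(meets_response_objective(response_times, 10, limit_seconds=1.0, required_tps=2.0))
print(meets_response_objective(response_times, 10, limit_seconds=0.5, required_tps=2.0))
```

With this data the 95th percentile is 0.9 seconds, so the objective is met with a 1 second limit and missed with a 0.5 second limit, even though the average response time (under 0.4 seconds) looks fine in both cases.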

In the preceding paragraphs, I've touched on essential performance concepts: throughput, response time, latency, utilization, service levels. We'll use those terms to relate to virtual machine performance.

Oracle VM - architectural review

Some may find the preceding material too abstract and not specific to virtualization, so let's change gears and discuss the architecture of the Oracle VM hypervisors.

The Oracle VM family includes Oracle VM Server for x86 and Oracle VM Server for SPARC, two hypervisors that optionally share Oracle VM Manager as a common management infrastructure. There is also a desktop virtualization product, Oracle VM VirtualBox. VirtualBox is popular for end-user virtual machines on a desktop or laptop, but is out of scope for this series of articles.

Oracle VM Server for x86 and Oracle VM Server for SPARC have architectural similarities and differences. Both use a small hypervisor in conjunction with a privileged virtual machine ("domain") for administration, and for virtual and physical device management. The hypervisor resides in firmware on SPARC, and in software on x86. That contrasts with traditional virtual machine systems that use a monolithic hypervisor kernel for system control and device management.

Oracle VM Server for x86 is based on Xen virtualization technology, and uses a "dom0" (domain 0) as an administrative control point and to provide virtual I/O device services to the guest VMs ("domU"). Oracle VM Server for SPARC uses a small firmware-based hypervisor coupled with a "control domain" that can be compared to dom0 on x86, with the option of having multiple "service domains" for resiliency.

The two products also have similarities and differences in how they handle systems resources:

  • CPU

    Oracle VM Server for x86: CPUs can be shared, oversubscribed, and timesliced using a share-based scheduler. CPUs can be allocated cores (or CPU threads if hyperthreading is enabled). The number of virtual CPUs in a domain can be changed while the domain is running.

    Oracle VM Server for SPARC: CPUs are dedicated to each domain with static assignment when the domain is "bound". Domains are given exclusive use of some number of CPU cores or threads, which can be changed while the domain is running.

  • Memory

    Oracle VM Server for x86: Memory is dedicated to each domain, no over-subscription. The hypervisor attempts to assign a VM's memory to a single NUMA node, and has CPU affinity rules to try to keep a VM's virtual CPUs near its memory for local latency.

    Oracle VM Server for SPARC: Memory is dedicated to each domain, no over-subscription. The hypervisor attempts to assign memory on a single NUMA node, and allocate CPUs on the same NUMA node for local latency.

  • I/O

    Oracle VM Server for x86: Guest VMs are provided virtual network, console, and disk devices provided by dom0.

    Oracle VM Server for SPARC: Guest VMs are provided virtual HBA, network, console, and disk devices provided by control domain and optional service domains. VMs can also use physical I/O with direct connection to SR-IOV virtual functions or PCIe buses.

  • Domain types

    Oracle VM Server for x86: Guest VMs (domains) may be hardware virtualization (HVM), paravirtualized (PV) or hardware virtualized with PV device drivers.

    Oracle VM Server for SPARC: Guest VMs (domains) are paravirtualized.

That's a lot of similarity for two products with different origins. When I'm asked for a quick summary, I say that the two products have a common memory model (VM memory is fixed, not overcommitted or swapped - very important), but different CPU models: Oracle VM Server for SPARC uses dedicated CPUs on servers that have lots of them, while x86 uses a more traditional software-based scheduler that time-slices virtual CPUs onto physical CPUs. Both products are aware of NUMA effects and try in different ways to reduce remote memory latency from CPUs. Both have virtual network and virtual disk devices, but the SPARC side has additional options for device backends and non-virtualized I/O. Finally, the x86 side has more domain types, reflecting the wide range of x86 operating systems.
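The difference between the two resource models can be sketched as simple capacity arithmetic. The Python below is only a planning illustration: the `Guest` class, the function names, the guest sizes, and the memory reserved for the administrative domain are all hypothetical, and none of this is an Oracle VM interface.

```python
from dataclasses import dataclass


@dataclass
class Guest:
    name: str
    vcpus: int
    memory_gb: int


def memory_fits(guests, host_memory_gb, reserved_gb=0):
    """Memory is dedicated in both products, so the total must fit in physical memory."""
    return sum(g.memory_gb for g in guests) + reserved_gb <= host_memory_gb


def cpu_oversubscription(guests, physical_cpus):
    """Shared-CPU model: ratio of virtual to physical CPUs; above 1.0 means time-slicing."""
    return sum(g.vcpus for g in guests) / physical_cpus


def dedicated_cpus_fit(guests, physical_cpus):
    """Dedicated-CPU model: every virtual CPU needs its own physical CPU thread."""
    return sum(g.vcpus for g in guests) <= physical_cpus


# Hypothetical guests on a hypothetical host with 16 CPU threads and 128 GB,
# with 4 GB assumed to be set aside for the administrative domain.
guests = [Guest("db", 8, 64), Guest("app", 8, 32), Guest("web", 4, 16)]

print(memory_fits(guests, host_memory_gb=128, reserved_gb=4))
print(cpu_oversubscription(guests, physical_cpus=16))
print(dedicated_cpus_fit(guests, physical_cpus=16))
```

In this made-up case the memory fits under either product. The 20 virtual CPUs can run on 16 physical CPU threads only under a shared, time-sliced model, at a ratio of 1.25; under a dedicated model the same guests would need more CPU threads or smaller allocations.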

That's an introduction to the concepts. The next article (rubbing hands together in anticipation!) will delve more into the technologies and their performance implications.


For general performance analysis, I recommend Brendan Gregg's book Systems Performance: Enterprise and the Cloud. It has excellent content for any performance analyst, as well as details for various versions of Linux and Solaris.

Thursday Aug 06, 2015

Oracle VM Performance and Tuning - Part 1

This blog entry starts a series of articles on virtual machine performance, focussing (obviously) on Oracle VM Server on both x86 and SPARC, though also including general concepts.[Read More]

Wednesday Aug 27, 2014

Best Practices for Oracle Solaris Network Performance with Oracle VM Server for SPARC

A new document has been published on OTN: "How to Get the Best Performance from Oracle VM Server for SPARC" by Jon Anderson, Pradhap Devarajan, Darrin Johnson, Narayana Janga, Raghuram Kothakota, Justin Hatch, Ravi Nallan, and Jeff Savit.

Wednesday May 21, 2014

Virtual Disk Performance Improvement for Oracle VM Server for SPARC

A new Solaris update dramatically improves performance for virtual disks on Oracle VM Server for SPARC, providing near-native I/O performance. With this change, virtual I/O is suitable for the most demanding I/O intensive applications under Oracle VM Server for SPARC. Read the full article to see more details.[Read More]

Friday Mar 28, 2014

Best Practices - Top Ten Tuning Tips Updated

Oracle VM Server for SPARC can be configured to provide optimal CPU and I/O performance - this blog entry updates a previous version to reflect improvements and new capabilities introduced into the product.[Read More]

Friday Mar 01, 2013

Announcing Whitepaper: "Implementing Root Domains with Oracle VM Server for SPARC"

A new whitepaper describes root domains with Oracle VM Server for SPARC, which can be used to provide bare metal performance and other benefits.[Read More]

Monday Feb 04, 2013

Best Practices - Top Ten Tuning Tips

This blog post lists and briefly explains performance tips and best practices that should be used in most environments. [Read More]

Sunday Oct 02, 2011

T4 arrives!

The SPARC T4 adds world-class single CPU thread performance to the throughput computing performance T-series systems are known for. It has a 2.85 or 3.0 GHz clock rate, branch prediction, longer pipelines, and out-of-order execution, for up to 5x better per-CPU performance than its predecessors. [Read More]

Friday Sep 19, 2008

Response to Joe Temple's blog on my blog...

IBM made an announcement that made invidious and inaccurate comparisons to our products. I blogged about why it was misleading. Other people blogged about my blog and said things that were even more misleading. Read my response here.[Read More]

Sunday Jun 15, 2008

No, there isn't a Santa Claus

I really don't like to use this blog for refuting competitor exaggerations and FUD. I'd really rather spend the infrequent time I spend on this blog talking about Sun technology. There's so much great new stuff in Solaris, in our servers and storage products, and software stack, that it's just a nuisance to have to refute silly attacks on us.

But, once again into the fray. Read on to see the latest fuzzy math. [Read More]



