by Martin Tegtmeier from ISV Engineering in Walldorf
Simple CPU metrics (user/system/idle/io-wait) are still widely used, although these numbers need interpretation on today's multi-thread / multi-core architectures. "Idle" as measured by operating systems cannot be translated literally into available CPU resources, which turns capacity planning into a more complex problem.
Back in the days when 1 processor contained 1 core capable of running 1 thread, CPU utilization reported by the operating system indicated actual resource consumption (and resource availability) of the processor. In such environments CPU utilization grows linearly with increased workload.
Multi-core CPUs: 1 processor = 2 or more cores
In multi-core CPUs, where 1 processor contains 2 or more cores, each processing core has its own arithmetic and logic unit, floating point unit, set of registers and pipeline, as well as some amount of cache. However, multi-core CPUs also share some resources between the cores (e.g. L3 cache, memory controller).
Simultaneous multi-threading CPUs/cores: 1 processor or core = 2 or more threads (aka "Hyper-Threading", "Chip Multi-threading")
The hardware components of one physical core are shared between several threads. Each thread has at least its own set of registers. Most resources of the core (arithmetic and logic unit, floating point unit, cache) are shared between the threads. Naturally those threads compete for processing resources and stall if the desired units are already busy.
What are the benefits of resource sharing?
Resource sharing can increase overall throughput and efficiency by keeping the processing units of a core busy. For instance, hyper-threading can reduce or hide stalls on memory access (cache misses). Instead of wasting many cycles while data is fetched from main memory, the current thread is suspended and the next runnable thread is resumed and continues execution.
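The latency-hiding effect can be illustrated with a deliberately simplified model. The sketch below assumes that each hardware thread alternates between a fixed number of compute cycles and a fixed number of memory-stall cycles, and that the core switches between threads at no cost. The function name and the cycle counts are hypothetical and are not measurements from any real CPU.

```python
def core_busy_fraction(threads: int, compute_cycles: int, stall_cycles: int) -> float:
    """Fraction of time the shared execution units are busy (toy model).

    Assumption: every thread computes for `compute_cycles`, then waits
    `stall_cycles` on memory, and other threads can run during that wait.
    """
    if threads < 0 or compute_cycles <= 0 or stall_cycles < 0:
        raise ValueError("invalid model parameters")
    per_thread = compute_cycles / (compute_cycles + stall_cycles)
    # The execution units cannot be more than 100% busy.
    return min(1.0, threads * per_thread)


# Hypothetical numbers: 25 compute cycles followed by a 75-cycle memory stall.
for n in (1, 2, 4, 8):
    print(n, core_busy_fraction(n, compute_cycles=25, stall_cycles=75))
```

In this toy model the execution units are fully busy with 4 threads, so running 8 threads adds no further throughput.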
What are the disadvantages?
Idle does not indicate how much more work can be accomplished by the CPU
Assume 1 CPU core has 4 hardware threads. Currently 2 (single-threaded) processes are scheduled to run on this core, and these 2 processes already saturate all available shared compute resources (ALU, FPU, cache, memory bandwidth, etc.) of the core. Commonly used performance tools would still report (at least) 50% idle, since 2 logical processors (hardware threads) appear completely idle.
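The gap between reported idle and real headroom in this scenario can be written down as a small sketch. The `Core` class and its `shared_units_saturation` field are hypothetical. Standard tools do not expose such a value; it is set by hand here to mirror the example.

```python
from dataclasses import dataclass


@dataclass
class Core:
    hw_threads: int                 # logical processors the OS sees on this core
    busy_threads: int               # hardware threads currently running a process
    shared_units_saturation: float  # 0.0..1.0, hypothetical: not reported by the OS

    def reported_idle_percent(self) -> float:
        """Idle as a thread-counting tool sees it: share of idle hardware threads."""
        return 100.0 * (self.hw_threads - self.busy_threads) / self.hw_threads

    def actual_headroom_percent(self) -> float:
        """Headroom left in the shared units (ALU, FPU, cache, memory bandwidth)."""
        return 100.0 * (1.0 - self.shared_units_saturation)


# The scenario from the text: 4 threads, 2 busy, shared resources saturated.
core = Core(hw_threads=4, busy_threads=2, shared_units_saturation=1.0)
print(core.reported_idle_percent())    # 50.0
print(core.actual_headroom_percent())  # 0.0
```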
In order to correctly estimate how much work can be added until the system approaches full saturation, the operating system would need detailed utilization information for all shared core processing units (ALU, FPU, cache, memory bandwidth, etc.) and would also need to know the characteristics of the workload to be added (!).
Measurements with SAP ABAP workload
To illustrate our case, let's look at a very specific but very common workload in Enterprise Computing: SAP-SD ABAP. We took these measurements on a SPARC T5 system running the latest Solaris 11 release. Simulated benchmark users logged onto the SAP system and entered SD transactions. The maximum number of SD-Users and SAP transaction throughput the system could handle are represented by the 100% mark on the X-Axis. A series of test runs was carried out in order to measure CPU utilization (Y-Axis) as reported by the operating system at 0%, 12.5%, 25%, 50%, 60%, 75%, 90% and 100% of the maximum number of SD-Users.
Contrary to what one might naively assume, the diagram does not show a straight diagonal line. Instead we see that at 25% of the SD-User / maximum throughput load, the operating system only reports 8% CPU utilization with 92% idle.
At half of the maximum achievable throughput the system only appears to be 21% busy with 79% idle.
Put another way, when the OS reports 50% CPU utilization we are already at 80% of maximum throughput. One cannot assume that adding the same load again would double the throughput at the same response time, yet this is a very common mistake we see reported by customers.
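As a rough illustration, the three data points quoted above can be turned into a lookup that maps reported CPU utilization to consumed throughput capacity. This is only a sketch: the end points (0, 0) and (100, 100) are assumptions that were not measured, the piecewise-linear interpolation between points is a simplification, and the function name is hypothetical. The numbers apply only to this SAP-SD workload on this system.

```python
from bisect import bisect_right

# (reported CPU utilization %, share of maximum throughput %)
POINTS = [
    (0.0, 0.0),      # assumption, not taken from the measurements
    (8.0, 25.0),     # from the text
    (21.0, 50.0),    # from the text
    (50.0, 80.0),    # from the text
    (100.0, 100.0),  # assumption, not taken from the measurements
]


def throughput_percent(reported_util: float) -> float:
    """Estimate consumed throughput capacity by linear interpolation."""
    if not 0.0 <= reported_util <= 100.0:
        raise ValueError("utilization must be between 0 and 100")
    utils = [u for u, _ in POINTS]
    i = bisect_right(utils, reported_util)
    if i == len(POINTS):  # exactly 100% utilization
        return POINTS[-1][1]
    (u0, t0), (u1, t1) = POINTS[i - 1], POINTS[i]
    return t0 + (t1 - t0) * (reported_util - u0) / (u1 - u0)


util = 50.0
print(100.0 - util)                      # naive headroom: 50.0
print(100.0 - throughput_percent(util))  # estimated headroom: 20.0
```

With the OS reporting 50% idle, the estimate leaves only about 20% of maximum throughput to spare, not another 50%.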
The curve shown on the diagram is highly dependent on the workload (application or application mix) and CPU architecture (number of hardware threads, shared computing resources, etc.). It can be assumed that most applications running on multi-threaded architectures will show this non-linear behaviour (more or less pronounced).
Capacity Planning has become a much more complex affair with the advent of multi-thread/multi-core CPU architectures, and to answer the question of how much more load one can add to an existing system, one has to analyze the workload to be added as well as the current resource consumption.
Solaris 11, and even the latest update to Solaris 10, include several performance monitoring tools such as pgstat, cpustat and cputrack. They complement the likes of vmstat, iostat, mpstat and prstat and allow a much finer-grained observation of CPU resource consumption.
Other tools like Oracle Solaris Studio Performance Analyzer can be invaluable to understand a particular workload.