by Martin Tegtmeier from ISV Engineering in Walldorf
Simple CPU metrics (user/system/idle/io-wait) are still
widely used, although these numbers require careful interpretation
on today's multi-thread/multi-core architectures. "Idle" as measured
by operating systems cannot be translated directly into available CPU
resources - turning capacity planning into a more complex problem.
Back in the days when 1 processor contained 1 core capable of
running 1 thread, CPU utilization reported by the operating system
indicated actual resource consumption (and resource availability) of
the processor. In such environments CPU utilization grows linearly
with increased workload.
Multi-core CPUs: 1 processor = 2 or more cores
In multi-core CPUs, where 1 processor contains 2 or more cores, each processing core has its own arithmetic
and logic unit, floating point unit, set of registers, pipeline,
as well as some amount of cache. However, multi-core CPUs also share
some resources between the cores (e.g. L3 cache, memory controller).
Simultaneous multi-threading CPUs/cores: 1 processor or core = 2 or
more threads (aka "Hyper-Threading", "Chip Multi-threading")
The hardware components of one physical core are shared between
several threads. Each thread has at least its own set of registers.
Most resources of the core (arithmetic and logic unit, floating
point unit, cache) are shared between the threads. Naturally, these
threads compete for processing resources and stall if the desired
units are already busy.
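As a side note, standard operating system interfaces present each hardware thread as a full logical CPU. A quick illustration in Python (an illustrative choice, not something the article uses): `os.cpu_count()` reports logical CPUs, i.e. hardware threads, not physical cores.

```python
import os

# Operating systems present each hardware thread as a logical CPU.
# os.cpu_count() therefore returns the number of logical CPUs
# (hardware threads), not the number of physical cores.
print(os.cpu_count())  # e.g. a 4-core CPU with 2-way SMT reports 8
```

This is exactly why per-CPU accounting tools treat every hardware thread as an independent processor, which sets up the measurement problem discussed below.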
What are the benefits of resource sharing?
Resource sharing can increase overall throughput and efficiency by
keeping the processing units of a core busy. For instance,
hyper-threading can reduce or hide stalls on memory access (cache
misses). Instead of wasting many cycles while data is fetched from
main memory, the current thread is suspended and the next runnable
thread is resumed and continues execution.
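This latency-hiding effect can be sketched with a toy model (all cycle counts below are illustrative, not measurements from this article): a thread alternates between compute cycles and stall cycles waiting on memory, and additional hardware threads can execute during those stalls.

```python
# Toy model: each thread does `compute` cycles of work, then stalls
# for `stall` cycles on a memory access. Other hardware threads can
# use the core's pipeline during those stalls, up to full occupancy.
def core_utilization(compute, stall, threads):
    demand = threads * compute   # cycles of real work per period
    period = compute + stall     # one thread's compute/stall cycle
    return min(1.0, demand / period)

# Example: 30 compute cycles followed by a 90-cycle cache miss.
print(core_utilization(30, 90, threads=1))  # 0.25 - pipeline idle 75%
print(core_utilization(30, 90, threads=4))  # 1.0  - stalls fully hidden
```

With four threads the memory stalls are completely overlapped with useful work, which is the throughput benefit SMT is designed to deliver.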
What are the disadvantages?
- CPU time accounting measurements (sys/usr/idle) as reported by
standard tools do not reflect the side-effects of resource
sharing between hardware threads
- It is impossible to correctly measure idle and extrapolate
available computing resources
Idle does not indicate how much more work can be accomplished by
the system. Assume 1 CPU core has 4 hardware threads. Currently 2 (single-threaded)
processes are scheduled to run on this core and these 2 processes
already saturate all available shared compute resources
(ALU, FPU, Cache, Memory bandwidth, etc.) of the core. Commonly used
performance tools would still report (at least) 50% idle since 2
logical processors (hardware threads) appear completely idle.
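This accounting gap can be made concrete with a minimal sketch (the function below is hypothetical, mirroring how per-thread idle accounting typically works: each hardware thread counts as a full CPU):

```python
# Classic per-CPU accounting: every hardware thread counts as a
# full, independent CPU when computing the idle percentage.
def reported_idle(busy_hw_threads, total_hw_threads):
    return 100.0 * (total_hw_threads - busy_hw_threads) / total_hw_threads

# 1 core with 4 hardware threads, 2 runnable single-threaded processes:
print(reported_idle(busy_hw_threads=2, total_hw_threads=4))  # 50.0
# Yet if those 2 processes already saturate the core's shared units
# (ALU, FPU, cache, memory bandwidth), the real headroom is close to 0%.
```

The tool's arithmetic is internally consistent; it is the mapping from "idle hardware thread" to "available compute capacity" that breaks down under SMT.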
In order to correctly estimate how much work can be added before the
system approaches full saturation, the operating system would need
detailed utilization information for all shared core processing
units (ALU, FPU, cache, memory bandwidth, etc.) as well as knowledge of
the characteristics of the workload to be added (!).
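To illustrate why both pieces of information are needed, here is a hypothetical headroom model (unit names and all numbers invented for illustration): the admissible amount of additional work is capped by whichever shared unit the new workload saturates first.

```python
# Hypothetical model: headroom is limited by the most constrained
# shared unit, weighted by how heavily the *new* workload uses it.
def headroom(unit_util, workload_mix):
    # unit_util:    current utilization of each shared unit (0..1)
    # workload_mix: fraction of the new workload's demand per unit
    return min((1.0 - unit_util[u]) / workload_mix[u]
               for u in workload_mix if workload_mix[u] > 0)

util = {"alu": 0.95, "fpu": 0.20, "cache": 0.80, "mem_bw": 0.90}
mix  = {"alu": 0.50, "fpu": 0.10, "cache": 0.20, "mem_bw": 0.20}
# The ALU saturates first: (1 - 0.95) / 0.50 = 0.1 of a full workload
print(headroom(util, mix))
```

Note that changing the workload mix changes the answer - the same machine state yields different headroom for an FPU-heavy workload than for an ALU-heavy one, which is precisely why idle alone cannot be extrapolated.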
Measurements with SAP ABAP workload
To illustrate our case, let's look at a very specific but very common workload in Enterprise Computing: SAP-SD ABAP. We took these measurements on a SPARC T5 system running the latest Solaris 11 release. Simulated
benchmark users logged onto the SAP system and entered SD
transactions. The maximum number of SD-Users and SAP transaction
throughput the system could handle are represented by the 100% mark
on the X-Axis. A series of test runs was carried out in order to
measure CPU utilization (Y-Axis) as reported by the operating system
at 0%, 12.5%, 25%, 50%, 60%, 75%, 90% and 100% of the maximum number