Tuesday Mar 31, 2009

Intelligent Performance - Solaris optimized for Nehalem

Over the last two years, Sun and Intel have been working together - from design and architecture through implementation - to ensure that Solaris is optimized to unleash the power and capabilities of current and future Intel Xeon processors. The compelling results include:
  • Increased performance, as the Solaris OS takes advantage of Intel multi-core processor capabilities and Intel Turbo Boost Technology.
  • Optimized power efficiency and utilization, as Solaris takes advantage of the performance-enhanced dynamic power management capabilities of the Intel Xeon processor 5500 series (aka Nehalem).
  • Extended predictive capabilities that improve reliability, by incorporating Nehalem features into the Solaris Fault Management Architecture (FMA).
Now let's take a more in-depth look at the innovations within Solaris in combination with Intel's Nehalem architecture.

Intelligent Performance

We have optimized the performance of Solaris for individual cores and for the overall multi-core microarchitecture, which increases both single- and multi-threaded performance. Intel Turbo Boost Technology uses any available power headroom to deliver higher clock rates. When an application requires maximum processing power, the Intel Xeon processor 5500 series raises the frequency of the active cores, provided that conditions such as load, power consumption, and temperature permit it.

The Solaris threading model delivers strong performance, with specific optimizations for the new Nehalem architecture. Solaris also takes advantage of the new Intel QuickPath Interconnect (QPI) through an optimized scheduler and memory placement optimization (MPO), which has proven performance benefits on non-uniform memory access (NUMA) systems because it reduces memory latency. The Solaris NUMA implementation takes its topology information from the Advanced Configuration and Power Interface (ACPI) System Resource Affinity Table (SRAT) and System Locality Information Table (SLIT).

Modern processors provide performance counters that make it possible to observe the performance characteristics of applications. Solaris provides the libcpc(3LIB) API to access these counters. The following utilities provide information from libcpc:
  • cpustat -h lists all of the events that are available on a given processor.
  • cputrack analyzes performance characteristics on a per-process or per-LWP basis.
  • Performance Analyzer tools such as collect, analyzer, and er_print.
The DTrace CPU Performance Counter provider (cpc provider) makes available probes associated with processor performance counter events.

Automated Energy Efficiency

Solaris takes advantage of many power efficiency features in the Nehalem architecture. For example, an innovative Power Aware Dispatcher (PAD) has been integrated into OpenSolaris, enabling finer-grained power management (P-states). We have seen a substantial reduction in idle power consumption, lower power consumption at maximum CPU utilization, and improved performance when switching between power states.

The kernel dispatcher - the part of the kernel that decides where threads should run - is integrated with the power management subsystem of the Nehalem CPU. As a result, the kernel can use those parts of the processor that are active and avoid scheduling work on those parts that are powered down.

PowerTOP is a new command-line tool that shows how effectively a system is taking advantage of the processor's power management features. The application observes the system at regular intervals and displays a summary of how much time the processor spends (on average) in each state.
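To make the idea of a per-state summary concrete, here is a minimal sketch of the arithmetic involved. It is not PowerTOP's code: the function name `residency_percent` and its inputs are hypothetical, and it simply assumes that the time spent in each state during one observation interval is already known.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of PowerTOP): given the time in
 * milliseconds that a processor spent in each power state during one
 * observation interval, compute each state's share of that interval
 * as a percentage.
 */
void residency_percent(size_t nstates, const double *ms_in_state,
                       double *percent_out)
{
    double total = 0.0;
    size_t i;

    /* Total observed time across all states. */
    for (i = 0; i < nstates; i++) {
        total += ms_in_state[i];
    }

    /* Share of each state; report 0 if nothing was observed. */
    for (i = 0; i < nstates; i++) {
        percent_out[i] = (total > 0.0)
            ? 100.0 * ms_in_state[i] / total
            : 0.0;
    }
}
```

Averaging these percentages over several intervals would give the kind of per-state summary described above.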

Reliability and Availability

Sun and Intel are working together to extend the capabilities of the Solaris Fault Manager by also supporting the Machine Check Architecture (MCA). The Fault Manager automatically diagnoses the underlying problems and responds by off-lining faulty components.

The ability to fast reboot a system drastically reduces downtime and improves efficiency. Fast Reboot is a command-line feature that enables you to reboot an Intel Xeon processor 5500 series system quickly, bypassing the BIOS, the power-on self-test (POST), and the GRUB boot loader. Fast Reboot (reboot -f) implements an in-kernel boot loader that loads the new kernel into memory and then switches to it, so that the reboot completes within seconds.

The ability to work around processor errata in the operating system by applying microcode updates is available in Solaris. This support alleviates the need to upgrade a system's BIOS every time a new microcode update is required.


We have optimized the compiler tools and runtime libraries, including full support for Streaming SIMD Extensions (SSE 4.2). To enable automatic use of SSE instructions, specify -xvector=simd in your compiler options.
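As a rough illustration, the loop below has the shape an auto-vectorizer typically looks for: independent iterations over arrays that do not overlap. The function name `scale_add` is hypothetical, and the compile line in the comment is an assumption; only -xvector=simd comes from the text above.

```c
#include <stddef.h>

/*
 * Hypothetical example: y[i] = a * x[i] + y[i].
 * Every iteration is independent, and restrict tells the compiler that
 * x and y do not overlap, so several elements can be processed by one
 * SIMD instruction.
 *
 * Assumed compile line (the optimization level is an assumption):
 *     cc -xO3 -xvector=simd -c scale_add.c
 */
void scale_add(size_t n, float a, const float *restrict x,
               float *restrict y)
{
    size_t i;

    for (i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

Whether a particular loop is actually vectorized depends on the compiler version and the options in effect, so it is worth checking the compiler's own diagnostics.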

To speed up serial applications on multithreaded chips like the Intel Xeon 5500 series, you can use the compiler option -xautopar. The compiler then generates code that, at runtime, uses more than one thread to execute the body of a parallelizable loop.
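The following sketch contrasts a loop that is a natural candidate for automatic parallelization with one that is not. Both function names are hypothetical, and the compile line in the comment is an assumption apart from -xautopar itself.

```c
#include <stddef.h>

/*
 * Assumed compile line (the optimization level is an assumption):
 *     cc -xO3 -xautopar -c loops.c
 */

/*
 * Independent iterations: each out[i] depends only on in[i], so a
 * parallelizing compiler is free to split the index range across
 * several threads.
 */
void square_all(size_t n, const double *restrict in,
                double *restrict out)
{
    size_t i;

    for (i = 0; i < n; i++) {
        out[i] = in[i] * in[i];
    }
}

/*
 * Loop-carried dependence: v[i] needs the value of v[i - 1] produced
 * by the previous iteration, so the iterations cannot simply be run
 * concurrently as written.
 */
void prefix_sum(size_t n, double *v)
{
    size_t i;

    for (i = 1; i < n; i++) {
        v[i] += v[i - 1];
    }
}
```

Restructuring loops so that iterations are independent, as in the first function, gives the compiler more opportunities to parallelize them.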

OpenMP is the de facto standard for writing multi-threaded applications to run on multi-threaded machines. Version 3.0 of the OpenMP specification defines a rich set of directives, runtime routines, and environment variables that allow the programmer to write multi-threaded applications in C, C++, and Fortran. The Sun Studio software facilitates OpenMP development. The main motivations for using OpenMP are performance, scalability, portability, and standardization: with a relatively small amount of coding effort, a programmer can parallelize an application.
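A minimal sketch of what that small amount of coding effort can look like: a single directive turns a serial dot product into a multi-threaded one. The function name `dot` is hypothetical, and the compile lines in the comment are assumptions that are not taken from the text above.

```c
#include <stddef.h>

/*
 * Dot product with an OpenMP work-sharing loop.
 * reduction(+:sum) gives every thread a private partial sum and adds
 * the partial sums together when the loop ends. If OpenMP is not
 * enabled at compile time, the pragma is ignored and the loop runs
 * serially with the same result.
 *
 * Assumed compile lines:
 *     cc -xopenmp -xO3 -c dot.c      (Sun Studio)
 *     gcc -fopenmp -O2 -c dot.c      (GCC)
 */
double dot(size_t n, const double *x, const double *y)
{
    double sum = 0.0;
    long i;   /* signed loop index for portability across OpenMP versions */

    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < (long)n; i++) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

The number of threads is normally controlled at run time, for example through the OMP_NUM_THREADS environment variable defined by the OpenMP specification.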


This video from David Steward (Intel) also provides some insights about the improvements in Solaris for the Nehalem architecture:

Sun provides everything necessary to get the best out of the latest Intel Xeon 5500 series CPUs: innovative X-Series servers, the Solaris operating system with all its enhancements, and a complete compiler environment that makes these features available to your application.

You can find more information on the following links:


