Thursday Jul 02, 2009

Update to: Configuring and Optimizing Intel® Xeon Processor 5500 & 3500 Series (Nehalem-EP) Systems Memory

Two DIMMs per Memory Channel Speed Enhancement
While Intel officially supports a maximum memory bus speed of 1066MHz when two single- or dual-rank DIMMs are populated per channel, Sun has engineered its Nehalem-EP servers to operate reliably at 1333MHz with up to two DIMMs per channel installed. The higher bus speed improves memory bandwidth by up to 12% and leaves idle latency unchanged, while increasing memory power by 5-8% depending on the DIMMs installed.

Two-DIMM-per-channel operation at 1333MHz is enabled by the Software 1.2 release for Sun Fire x2270 systems and the Software 2.0 release for the Sun Fire x4170, x4270, x4275, x6270 and x6275 platforms. Software releases are available on the download page located here. The x4000 and x6000 series systems enable the enhanced speed for a homogeneous population of 4GB DIMMs, whereas the x2270 supports it with either homogeneous or mixed DIMM densities. Note that the speed enhancement applies only to "Advanced" processor types, i.e. those capable of 1333MHz operation; see Tables 1 and 2 below for more information on the various processor types.

Below is an updated version of my original blog entry including augmented bandwidth and power tables.

The Memory Subsystem
An integrated memory controller and multiple DDR3 memory channels help the Intel® Xeon® 5500 and Intel® Xeon® 3500 processors provide high bandwidth for memory-intensive applications. DDR3 memory components offer greater density and run at higher speeds, but at lower voltage than previous generation DDR2 memories.

A typical DDR3 DIMM (with heat spreader)

Each processor has a three-channel, direct-connect memory interface and supports DDR3 memory from Sun in two speeds: 1066MT/s and 1333MT/s. When configuring system memory, it’s important to note that DIMMs may run slower than their individually rated speeds depending on a number of factors, including the CPU type, the number of DIMMs per channel, and the type of memory (speed, number of ranks, etc.). The speed at which memory ultimately runs is set by the system BIOS at startup, and all memory channels run at the highest common frequency.

The maximum theoretical memory bandwidth per processor socket for each supported data rate is:

  • 1333 MT/s: 32 GB/s (10.6 GB/s per channel)
  • 1066 MT/s: 25.5 GB/s (8.5 GB/s per channel)
  • 800 MT/s: 19.2 GB/s (6.4 GB/s per channel)
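The figures above follow from the data rate multiplied by the 8-byte (64-bit) width of a DDR3 channel, times the three channels per socket. Here is a minimal sketch of that arithmetic, assuming decimal gigabytes and the nominal data rates as written; the function names are illustrative only.

```python
BYTES_PER_TRANSFER = 8    # a DDR3 channel has a 64-bit (8-byte) data path
CHANNELS_PER_SOCKET = 3   # three memory channels per processor (from the text)


def channel_bandwidth_gbs(data_rate_mts: float) -> float:
    """Peak theoretical bandwidth of one channel in GB/s (decimal GB)."""
    return data_rate_mts * 1e6 * BYTES_PER_TRANSFER / 1e9


def socket_bandwidth_gbs(data_rate_mts: float,
                         channels: int = CHANNELS_PER_SOCKET) -> float:
    """Peak theoretical bandwidth of all channels on one socket in GB/s."""
    return channel_bandwidth_gbs(data_rate_mts) * channels


if __name__ == "__main__":
    for rate in (1333, 1066, 800):
        print(f"{rate} MT/s: {socket_bandwidth_gbs(rate):.2f} GB/s "
              f"({channel_bandwidth_gbs(rate):.2f} GB/s per channel)")
```

The computed values agree with the list to within rounding; some of the listed figures appear to be truncated rather than rounded (for example 10.66 shown as 10.6).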

Depending on the specific platform, Sun’s servers support either two or three registered ECC DDR3 DIMMs per channel in 2GB, 4GB, or 8GB capacities, with 8GB RDIMMs available shortly after initial product release. The Sun Ultra 27 workstation can accommodate up to two unbuffered DDR3 ECC 1GB or 2GB DIMMs per channel to support densities ranging from 2GB (2x 1GB) to 12GB (6x 2GB) of memory.

Memory Population Guidelines
Each of the processor's three memory channels supports either two or three DIMM slots, allowing 6 or 9 DIMMs per processor respectively. Memory slots in each channel are color-coded to simplify identification. On server platforms, slot 0 is blue, slot 1 is white and, on systems supporting 3 DIMMs per channel, slot 2 is black. On the Ultra 27 workstation, slot 0 is blue and slot 1 is black (see Figure 1). As a general rule to optimize memory performance, DIMMs should be populated in sets of three, one per channel per CPU.

  Figure 1 – DIMM & Channel Layout for 3 and 2 DIMMs per Channel Memory Configurations

A basic rule shared by all Intel® Xeon® Processor 5500 and Intel® Xeon® Processor 3500 platforms is that each DDR3 channel must be populated starting with the slot farthest from the CPU socket (i.e. the blue slot on servers, the black slot on a workstation). Ideally each channel should be populated with equal-capacity DIMMs and, if possible, with the same number of identical DIMMs, which helps make memory performance more consistent. However, DIMMs of different sizes (e.g. single- vs. dual-rank) can be installed in different slots within the same channel. In a server with a single processor, the DIMM slots next to the empty CPU socket should not be populated.
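The population rules can be sketched as a fill order for one CPU. This is only an illustration, assuming the server convention in which slot 0 (blue) is the slot farthest from the CPU; the function name and the (channel, slot) tuples are hypothetical and do not correspond to any Sun tool.

```python
from typing import List, Tuple


def population_order(num_dimms: int,
                     channels: int = 3,
                     slots_per_channel: int = 3) -> List[Tuple[int, int]]:
    """Return (channel, slot) pairs in the order DIMMs should be installed.

    Slot 0 is assumed to be the slot farthest from the CPU, so it is
    filled on every channel before any channel receives a second DIMM.
    """
    max_dimms = channels * slots_per_channel
    if not 0 <= num_dimms <= max_dimms:
        raise ValueError(f"num_dimms must be between 0 and {max_dimms}")
    order = [(channel, slot)
             for slot in range(slots_per_channel)
             for channel in range(channels)]
    return order[:num_dimms]


if __name__ == "__main__":
    # Six DIMMs on one CPU: slot 0 on all three channels, then slot 1.
    print(population_order(6))
```

Filling slot by slot across the channels keeps the population in sets of three, one per channel, whenever the DIMM count is a multiple of three.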

Optimizing for Capacity
To design a configuration optimized for capacity, it is recommended that all slots are populated with the highest density DDR3 dual-rank DIMMs available for that specific system. Memory bus speed may be reduced to 1066MT/s when 2 DIMMs per channel are installed (unless the platform is capable of and configured properly to run with 2 DIMMs per channel at 1333MT/s) and to 800MT/s for all platforms when three DIMMs per channel are populated regardless of whether DDR3-1066 or DDR3-1333 DIMMs are used.
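The speed rules described so far can be condensed into a small lookup. The sketch below is a simplification: it assumes the resulting bus speed is the lowest of the CPU limit, the DIMM rating and the limit imposed by the number of DIMMs per channel, and it ignores platform-specific conditions such as the homogeneous 4GB requirement on some systems. The function and parameter names are illustrative, not a BIOS interface.

```python
def expected_bus_speed(dimms_per_channel: int,
                       cpu_max_mts: int = 1333,
                       dimm_rated_mts: int = 1333,
                       two_dpc_1333_enabled: bool = False) -> int:
    """Estimate the memory bus speed in MT/s for a given population.

    two_dpc_1333_enabled stands for a platform and software release that
    allow two DIMMs per channel at 1333 MT/s (an assumption of this sketch).
    """
    if dimms_per_channel not in (1, 2, 3):
        raise ValueError("dimms_per_channel must be 1, 2 or 3")
    if dimms_per_channel == 3:
        population_limit = 800
    elif dimms_per_channel == 2 and not two_dpc_1333_enabled:
        population_limit = 1066
    else:
        population_limit = 1333
    # The slowest of the three limits determines the speed.
    return min(cpu_max_mts, dimm_rated_mts, population_limit)


if __name__ == "__main__":
    for dpc in (1, 2, 3):
        print(dpc, "DIMM(s) per channel:", expected_bus_speed(dpc), "MT/s")
```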

Optimizing for Performance
Server configurations with optimal memory bandwidth can be achieved using the “Advanced” class of Intel® Xeon® Processor 5500 & 3500 Series processors (see Tables 1 & 2 below) and memory components that run at 1333MT/s. Likewise, workstations achieve the highest memory performance with Intel® Xeon® 3500 processors that support DDR3-1333. A balanced DDR3 DIMM population is a key factor in achieving optimal performance.

Table 1 - Intel® Xeon® Processor 5500 Classes

Table 2 - Intel® Xeon® Processor 3500 Classes

To optimize a configuration for bandwidth, populate one identical dual rank DDR3 1333MT/s DIMM per channel (or two identical dual-rank DDR3 1333MT/s DIMMs per channel on systems/configurations that support two DIMMs per channel at 1333MT/s operation). Use of single rank DIMMs will provide lower performance than dual rank modules because of an insufficient number of banks per channel available to the memory controller and the resulting underutilization of available bus bandwidth. Other factors that result in less than optimal memory performance include:

  • installing more than one DIMM per channel, which may restrict the maximum memory access speed to 1066MT/s or 800MT/s depending on the specific type of system, the BIOS version and whether two or three DIMMs per channel are installed
  • an unbalanced DIMM population (e.g. one channel has more capacity than the others)
  • an odd number of DIMM ranks per channel (e.g. mixing a 2GB single rank DIMM and a 4GB dual rank DIMM on each channel)
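The three factors listed above lend themselves to a simple configuration check. Here is a minimal sketch, assuming each channel of one CPU is described as a list of (capacity in GB, ranks) pairs; that representation and the function name are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Dimm = Tuple[int, int]  # (capacity in GB, number of ranks), illustrative only


def population_warnings(channels: Sequence[Sequence[Dimm]]) -> List[str]:
    """Return warnings for one CPU's channels, each given as a list of DIMMs."""
    warnings = []
    if any(len(dimms) > 1 for dimms in channels):
        warnings.append(
            "multiple DIMMs per channel: bus speed may drop to 1066 or 800 MT/s")
    capacities = {sum(gb for gb, _ in dimms) for dimms in channels}
    if len(capacities) > 1:
        warnings.append("unbalanced population: channels differ in capacity")
    if any(sum(ranks for _, ranks in dimms) % 2 == 1 for dimms in channels):
        warnings.append("odd number of ranks on at least one channel")
    return warnings


if __name__ == "__main__":
    # One 4GB dual rank DIMM per channel: no warnings expected.
    print(population_warnings([[(4, 2)], [(4, 2)], [(4, 2)]]))
    # A 2GB single rank plus a 4GB dual rank DIMM on each channel.
    print(population_warnings([[(2, 1), (4, 2)]] * 3))
```

Note that a single single rank DIMM per channel is also reported as an odd rank count, which matches the earlier point that single rank DIMMs perform below dual rank modules.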

The Numbers
The table below illustrates how selected DIMM configurations compare from a bandwidth perspective. The numbers provided are all relative to a DDR3-1333 capable processor configured with one dual-rank DIMM per channel (highlighted in violet). A homogeneous DIMM population is used in every case presented. SR = Single Rank DIMM, DR = Dual Rank DIMM

Table 3 - Relative Bandwidth Comparisons

Key takeaways from the above are:

  • for DIMM configurations that support both speeds, memory bandwidth is up to 12% higher with 1333 DIMMs than with 1066 DIMMs (which is not obvious from the chart, since it shows bandwidths relative to 1x DR per channel rather than a direct comparison between 1333 and 1066 for a given configuration)
  • for a given capacity, one dual rank DIMM per channel provides higher bandwidth performance than two single rank DIMMs per channel
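Because the table normalizes every entry to the 1x DR per channel baseline, a direct comparison between two configurations is the ratio of their relative values. The sketch below shows that conversion; the inputs in the usage example are made-up placeholders and are not taken from Table 3.

```python
def percent_difference(relative_a: float, relative_b: float) -> float:
    """Percent by which configuration A exceeds configuration B.

    Both arguments are values normalized to the same baseline, so the
    baseline cancels out when the two are divided.
    """
    if relative_b <= 0:
        raise ValueError("relative_b must be positive")
    return (relative_a / relative_b - 1.0) * 100.0


if __name__ == "__main__":
    # Placeholder values for illustration only, not measured data.
    hypothetical_1333 = 0.95
    hypothetical_1066 = 0.87
    gain = percent_difference(hypothetical_1333, hypothetical_1066)
    print(f"{gain:.1f}% higher")
```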

Optimizing for Power
Following is an example of how different DIMM configurations compare from a memory power perspective. The power numbers provided are relative to a DDR3-1333 capable processor configured with one dual-rank DIMM per channel (highlighted in violet). In each case the DIMMs use the same DRAM technology and density. It’s important to keep in mind that DRAM power requirements typically drop as silicon technology and process change and mature, so the table is only applicable when comparing like technologies. As above, homogeneous DIMM populations are evaluated in every case, and SR = Single Rank DIMM, DR = Dual Rank DIMM.

Table 4 - Relative DIMM Power Comparisons

From the table it can be determined that:

  • for a particular DIMM configuration and bus speed, DDR3-1333 DIMMs consume up to 6% less power than DDR3-1066 modules
  • for a given DIMM configuration, the incremental power required to operate DDR3-1333 DIMMs at a 1333MT/s data rate vs. 1066MT/s is 8% or less (also not obvious from the chart, since it shows power relative to 1x DR per channel rather than a direct comparison between 1333 and 1066 for a given configuration)
and somewhat less obvious but equally important:
  • a dual rank DIMM operating at 1333MT/s consumes less power than two single rank DIMMs at 1066MT/s

The data presented indicates there are trade-offs to be made regarding how to best configure a system with memory. As each application will have unique requirements, processor memory bus speed capability, capacity, performance and power are all factors that must be considered as part of the process. Sun's Intel® Xeon® Processor 5500 Series and Intel® Xeon® Processor 3500 Series platforms offer tremendous capability and flexibility to optimize performance and capacity tailored to a specific need.

My sincere thanks go to Vijay Kunda for his tireless effort in helping to obtain the content.



John Nerl

