Configuring and Optimizing Intel® Xeon Processor 5500 & 3500 Series (Nehalem) Systems Memory
By John Nerl on Apr 14, 2009
The Memory Subsystem
An integrated memory controller and multiple DDR3 memory channels help the Intel® Xeon® 5500 and Intel® Xeon® 3500 processors provide high bandwidth for memory-intensive applications. DDR3 memory components offer greater density and run at higher speeds, but at lower voltage than previous generation DDR2 memories.
A typical DDR3 DIMM (with heat spreader)
Each processor has a three-channel, direct-connect memory interface and supports DDR3 memory from Sun in two speeds: 1066MT/s and 1333MT/s. When configuring system memory, it’s important to note that DIMMs may run slower than their individually rated speeds depending on a number of factors, including the CPU type, the number of DIMMs per channel, and the type of memory (speed, number of ranks, etc.). The speed at which memory ultimately runs is set by the system BIOS at startup, and all memory channels run at the highest common frequency.
The maximum theoretical memory bandwidth per processor socket for each supported data rate is:
- 1333 MT/s: 32 GB/s (10.6 GB/s per channel)
- 1066 MT/s: 25.5 GB/s (8.5 GB/s per channel)
- 800 MT/s: 19.2 GB/s (6.4 GB/s per channel)
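The figures above follow from simple arithmetic: transfers per second, times bytes per transfer, times the number of channels. The short Python sketch below reproduces that calculation. It assumes the standard 64-bit (8-byte) DDR3 data bus per channel and decimal gigabytes; the function and constant names are illustrative only.

```python
# Peak bandwidth = transfers per second x bytes per transfer x channels.
BYTES_PER_TRANSFER = 8   # assumption: standard 64-bit DDR3 data bus per channel
CHANNELS_PER_SOCKET = 3  # three channels per processor, as described above


def peak_bandwidth_gbs(data_rate_mts, channels=CHANNELS_PER_SOCKET):
    """Return (per_channel, per_socket) peak bandwidth in GB/s (10^9 bytes/s)."""
    per_channel = data_rate_mts * 1e6 * BYTES_PER_TRANSFER / 1e9
    return per_channel, per_channel * channels


for rate in (1333, 1066, 800):
    per_channel, per_socket = peak_bandwidth_gbs(rate)
    print(f"{rate} MT/s: {per_socket:.1f} GB/s ({per_channel:.1f} GB/s per channel)")
```

The computed values may differ from the listed ones in the last digit, since the list appears to round some figures down. These are theoretical peaks, and sustained bandwidth in practice will be lower.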
Depending on the specific platform, Sun’s servers support either two or three registered ECC DDR3 DIMMs per channel in 2GB, 4GB, or 8GB capacities, with 8GB RDIMMs available shortly after initial product release. The Sun Ultra 27 workstation can accommodate up to two unbuffered DDR3 ECC 1GB or 2GB DIMMs per channel to support capacities ranging from 2GB (2x 1GB) to 12GB (6x 2GB) of memory.
Memory Population Guidelines
Each of the processor’s three memory channels can support either two or three DIMM slots, enabling 6 or 9 DIMMs per processor respectively. Memory slots in each channel are color-coded to simplify identification. On server platforms, slot 0 is blue, slot 1 is white, and slot 2 (on systems supporting 3 DIMMs per channel) is black. On the Ultra 27 workstation, slot 0 is blue and slot 1 is black (see Figure 1). As a general rule to optimize memory performance, DIMMs should be populated in sets of three, one per channel per CPU.
Figure 1 – DIMM & Channel Layout for 3 and 2 DIMMs per Channel Memory Configurations
A basic rule shared by all Intel® Xeon® Processor 5500 and Intel® Xeon® Processor 3500 platforms is that each individual DDR3 channel must be populated starting with the slot farthest from the CPU socket (i.e. the blue slot on servers, the black slot on a workstation). Ideally each channel should be populated with equal capacity DIMMs, and if possible, with the same number of identical DIMMs, which helps to make memory performance more consistent. However, DIMMs of different sizes (e.g. single vs. dual rank) can be installed in different slots within the same channel.
In a server with a single processor, the DIMM slots next to the empty CPU socket should not be populated.
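The population guidelines in this section (sets of three, one DIMM per channel, farthest slot first) can be sketched as a small Python helper. This is an illustration only, not a substitute for a platform's service manual. It assumes the server numbering in which slot 0 (blue) is the slot farthest from the CPU, and the function name and `(channel, slot)` tuples are hypothetical.

```python
def population_order(dimm_count, slots_per_channel=3, channels=3):
    """Return the (channel, slot) pairs to fill, in order, for one CPU.

    Assumption: slot 0 is the slot farthest from the CPU (blue on servers),
    so lower slot numbers are filled first. One DIMM is placed in every
    channel before any channel receives a second DIMM.
    """
    capacity = slots_per_channel * channels
    if not 0 <= dimm_count <= capacity:
        raise ValueError(f"dimm_count must be between 0 and {capacity}")
    # Walk slot by slot, visiting every channel at each slot position.
    order = [(ch, slot)
             for slot in range(slots_per_channel)
             for ch in range(channels)]
    return order[:dimm_count]


# Six DIMMs on a 3-slot-per-channel server: slot 0 in all three channels,
# then slot 1 in all three channels.
for channel, slot in population_order(6):
    print(f"channel {channel}, slot {slot}")
```

Counts that are not a multiple of three still produce a valid order, but they leave the channels unbalanced, which the guidelines above advise against.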
Optimizing for Capacity
To design a configuration optimized for capacity, populate all slots with the highest density DDR3 dual-rank DIMMs available for that specific system. Memory bus speed will be reduced to 1066MT/s when two DIMMs per channel are installed and to 800MT/s when three DIMMs per channel are installed, regardless of whether DDR3-1066 or DDR3-1333 DIMMs are used.
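The speed reduction described above can be expressed as a lookup combined with the "highest common frequency" rule from earlier in the article. The following Python sketch is a simplified model under those stated limits only. The actual BIOS decision also depends on factors such as rank count, and the names used here are hypothetical.

```python
# Maximum bus speed (MT/s) by DIMMs per channel, as stated in the text.
MAX_SPEED_BY_DPC = {1: 1333, 2: 1066, 3: 800}


def effective_speed(dimms_per_channel, dimm_rating_mts, cpu_max_mts):
    """Estimate the memory bus speed in MT/s for a homogeneous population.

    dimms_per_channel: DIMMs in the most heavily populated channel (1 to 3)
    dimm_rating_mts:   rated speed of the slowest installed DIMM
    cpu_max_mts:       highest memory speed the processor supports
    """
    if dimms_per_channel not in MAX_SPEED_BY_DPC:
        raise ValueError("dimms_per_channel must be 1, 2, or 3")
    # The bus runs at the lowest of the three limits.
    return min(MAX_SPEED_BY_DPC[dimms_per_channel], dimm_rating_mts, cpu_max_mts)


print(effective_speed(1, 1333, 1333))  # one DDR3-1333 DIMM per channel
print(effective_speed(2, 1333, 1333))  # two DIMMs per channel
print(effective_speed(3, 1333, 1333))  # three DIMMs per channel
```

Taking the minimum of the three limits reflects the point that faster DIMMs do not raise the bus speed once the DIMM count per channel or the processor becomes the limiting factor.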
Optimizing for Performance
Server configurations with optimal memory bandwidth can be achieved using the “Performance” class of Intel® Xeon® Processor 5500 & 3500 Series processors (see Tables 1 & 2 below) and memory components that run at 1333MT/s. Similarly, workstations will achieve highest memory performance with Intel® Xeon® 3500 processors that support DDR3-1333 as well. A balanced DDR3 DIMM population is a key factor in achieving optimal performance.
Table 1 - Intel® Xeon® Processor 5500 Classes
Table 2 - Intel® Xeon® Processor 3500 Classes
To optimize a configuration for bandwidth, populate each channel with one identical dual-rank DDR3-1333 DIMM. Single-rank DIMMs provide lower performance than dual-rank modules because they give the memory controller too few banks per channel, which leaves available bus bandwidth underutilized. Other factors that result in less than optimal memory performance include:
- installing more than one DIMM per channel, which restricts the maximum memory access speed to 1066MT/s or 800MT/s depending on whether two or three DIMMs per channel are installed
- an unbalanced DIMM population (i.e. one channel has more capacity than others)
- an odd number of DIMM ranks per channel (e.g. mixing a 2GB single rank DIMM and a 4GB dual rank DIMM on each channel)
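As an illustration, the factors listed above can be turned into a simple configuration check. The Python sketch below is a hypothetical helper, not a Sun or Intel tool. It assumes each channel is described as a list of `(capacity_gb, ranks)` tuples, one per installed DIMM.

```python
def performance_warnings(channels):
    """Return warnings for one CPU's memory configuration.

    channels: list of channels, each a list of (capacity_gb, ranks)
              tuples, one tuple per installed DIMM (hypothetical format).
    """
    warnings = []
    # More than one DIMM per channel lowers the maximum bus speed.
    if any(len(dimms) > 1 for dimms in channels):
        warnings.append("more than one DIMM per channel")
    # Channels should all carry the same total capacity.
    capacities = {sum(cap for cap, _ in dimms) for dimms in channels}
    if len(capacities) > 1:
        warnings.append("unbalanced capacity across channels")
    # An odd total rank count on a channel is listed as suboptimal.
    if any(sum(ranks for _, ranks in dimms) % 2 == 1 for dimms in channels):
        warnings.append("odd number of ranks on a channel")
    return warnings


# One 4GB dual-rank DIMM per channel: the recommended layout.
print(performance_warnings([[(4, 2)], [(4, 2)], [(4, 2)]]))
# A 2GB single-rank plus a 4GB dual-rank DIMM on each channel.
print(performance_warnings([[(2, 1), (4, 2)]] * 3))
```

Note that a single single-rank DIMM per channel is also flagged by the odd-rank check, which is consistent with the statement above that single-rank DIMMs perform below dual-rank modules.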
Below is a table showing how different DIMM configurations compare from a bandwidth perspective. The numbers provided are all relative to a DDR3-1333 capable processor configured with one dual-rank DIMM per channel. A homogeneous DIMM population is used in every case presented. SR = Single Rank DIMM, DR = Dual Rank DIMM
Table 3 - Relative Bandwidth Comparisons
Key takeaways from the above are:
- for DIMM configurations that support both speeds, memory bandwidth is 5-8% higher with 1333 DIMMs than with 1066 DIMMs
- for a given capacity, one dual rank DIMM per channel provides higher bandwidth performance than two single rank DIMMs per channel
Optimizing for Power
Following is an example of how different DIMM configurations compare from a memory power perspective. The power numbers provided are relative to a DDR3-1333 capable processor configured with one dual-rank DIMM per channel (bold). In each case the DIMMs are composed of the same DRAM technology and density. It’s important to keep in mind that DRAM power requirements typically drop as silicon technology and processes change and mature, so the table is only applicable when comparing like technologies. As above, homogeneous DIMM populations are evaluated in every case, and SR = Single Rank DIMM, DR = Dual Rank DIMM.
Table 4 - Relative DIMM Power Comparisons
From the table it can be determined that:
- for a particular DIMM configuration and bus speed, DDR3-1333 DIMMs consume up to 6% less power than DDR3-1066 modules
- for a given DIMM configuration, the incremental power required to operate DDR3-1333 DIMMs at 1333MT/s data rate vs. 1066MT/s is 4% or less
- a dual rank DIMM operating at 1333MT/s consumes less power than two single rank DIMMs at 1066MT/s
The data presented indicates there are trade-offs to be made regarding how to best configure a system with memory. As each application will have unique requirements, processor memory bus speed capability, capacity, performance and power are all factors that must be considered as part of the process. Sun's Intel® Xeon® Processor 5500 Series and Intel® Xeon® Processor 3500 Series platforms offer tremendous capability and flexibility to optimize performance and capacity tailored to a specific need.