Update to: Configuring and Optimizing Intel® Xeon Processor 5500 & 3500 Series (Nehalem-EP) Systems Memory

Two DIMMs per Memory Channel Speed Enhancement
While Intel officially supports a maximum memory bus speed of 1066MHz with two single- or dual-rank DIMMs per channel, Sun has engineered its Nehalem-EP servers to operate reliably at 1333MHz with up to two DIMMs per channel installed. The higher bus speed yields up to a 12% improvement in memory bandwidth with no change in idle latency, at the cost of a 5-8% increase in memory power depending on the DIMMs installed.

Two-DIMM-per-channel operation at 1333MHz is enabled with the Software 1.2 release for Sun Fire x2270 systems and the Software 2.0 release for Sun Fire x4170, x4270, x4275, x6270 and x6275 platforms. Software releases are available from the Sun.com download pages. While the x4000 and x6000 series systems enable the enhanced speed only for a homogeneous population of 4GB DIMMs, the x2270 supports two DIMMs per channel at 1333MHz with either homogeneous or mixed DIMM densities. Note that the speed enhancement applies only to "Advanced" processor types, i.e., those capable of 1333MHz operation; see Tables 1 and 2 below for more information on the various processor types.

Below is an updated version of my original blog entry including augmented bandwidth and power tables.

The Memory Subsystem
An integrated memory controller and multiple DDR3 memory channels help the Intel® Xeon® 5500 and Intel® Xeon® 3500 processors provide high bandwidth for memory-intensive applications. Compared with previous-generation DDR2, DDR3 memory components offer greater density and higher speeds while running at a lower voltage.

A typical DDR3 DIMM (with heat spreader)

Each processor has a three-channel, direct-connect memory interface and supports DDR3 memory from Sun at two speeds: 1066MT/s and 1333MT/s. When configuring system memory, it’s important to note that DIMMs may run slower than their individually rated speed depending on a number of factors, including the CPU type, the number of DIMMs per channel, and the type of memory (speed, number of ranks, etc.). The system BIOS sets the memory speed at startup, and all memory channels run at the highest common frequency.

The maximum theoretical memory bandwidth per processor socket for each supported data rate is:

  • 1333 MT/s: 32 GB/s (10.6 GB/s per channel)
  • 1066 MT/s: 25.6 GB/s (8.5 GB/s per channel)
  • 800 MT/s: 19.2 GB/s (6.4 GB/s per channel)
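These figures follow from the 64-bit (8-byte) data path of each DDR3 channel: peak bandwidth per channel is the transfer rate times 8 bytes, and peak bandwidth per socket is three times that. The short Python sketch below reproduces the arithmetic, assuming decimal gigabytes and ignoring the ECC bits; the function names are illustrative only. The per-channel values in the list above correspond to the PC3-10600 and PC3-8500 module ratings, which round the exact results slightly differently.

```python
# Peak (theoretical) DDR3 bandwidth for one Nehalem-EP processor socket.
# Assumptions: 8 data bytes per transfer per channel (ECC excluded),
# 3 channels per socket, decimal gigabytes (1 GB = 10**9 bytes).

BYTES_PER_TRANSFER = 8    # 64-bit DDR3 channel data width
CHANNELS_PER_SOCKET = 3   # three-channel integrated memory controller


def channel_bandwidth_gbs(data_rate_mts: float) -> float:
    """Peak bandwidth of one channel in GB/s for a data rate in MT/s."""
    return data_rate_mts * 1e6 * BYTES_PER_TRANSFER / 1e9


def socket_bandwidth_gbs(data_rate_mts: float) -> float:
    """Peak bandwidth of all three channels on one socket in GB/s."""
    return channel_bandwidth_gbs(data_rate_mts) * CHANNELS_PER_SOCKET


if __name__ == "__main__":
    for rate in (1333, 1066, 800):
        print(f"{rate} MT/s: {socket_bandwidth_gbs(rate):.1f} GB/s "
              f"({channel_bandwidth_gbs(rate):.2f} GB/s per channel)")
```

Note that the theoretical ratio between 1333MT/s and 1066MT/s is about 1.25; the measured gain quoted at the top of this entry is up to 12%, so real workloads capture roughly half of the raw bus-speed increase.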

Depending on the specific platform, Sun’s servers support either two or three registered ECC DDR3 DIMMs per channel in 2GB, 4GB, or 8GB capacities, with 8GB RDIMMs available shortly after initial product release. The Sun Ultra 27 workstation accommodates up to two unbuffered DDR3 ECC 1GB or 2GB DIMMs per channel, supporting capacities from 2GB (2x 1GB) to 12GB (6x 2GB).

Memory Population Guidelines
Each of the processor’s three memory channels supports either two or three DIMM slots, for 6 or 9 DIMMs per processor respectively. Memory slots in each channel are color-coded to simplify identification. On server platforms, slot 0 is blue, slot 1 is white and, on systems supporting three DIMMs per channel, slot 2 is black. On the Ultra 27 workstation, slot 0 is blue and slot 1 is black (see Figure 1). As a general rule to optimize memory performance, populate DIMMs in sets of three, one per channel per CPU.

  Figure 1 – DIMM & Channel Layout for 3 and 2 DIMMs per Channel Memory Configurations


A basic rule shared by all Intel® Xeon® Processor 5500 and Intel® Xeon® Processor 3500 platforms is that each DDR3 channel must be populated starting with the slot farthest from the CPU socket (i.e., the blue slot on servers, the black slot on the workstation). Ideally each channel should hold equal capacity and, if possible, the same number of identical DIMMs, which helps make memory performance more consistent. However, DIMMs of different sizes (e.g., single- vs. dual-rank) can be installed in different slots within the same channel. In a server with a single processor, the DIMM slots next to the empty CPU socket should not be populated.
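The "farthest slot first, one DIMM per channel at a time" rule can be expressed as a fill order. The Python sketch below is a hypothetical helper, not a Sun tool: it numbers slot positions from the CPU outward in reverse, so position 0 is always the slot farthest from the socket (the blue slot on servers, the black slot on the Ultra 27), and it fills position 0 on every channel before moving to position 1.

```python
# Hypothetical sketch of the DIMM population order for one CPU socket.
# Position 0 = slot farthest from the CPU socket (its color varies by
# platform); channels are numbered 0..2.

from typing import List, Tuple

CHANNELS = 3


def population_order(dimm_count: int, slots_per_channel: int) -> List[Tuple[int, int]]:
    """Return (channel, position) pairs in the order DIMMs should be installed.

    The farthest position on every channel is filled before any channel's
    next position, so DIMMs go in as sets of three, one per channel.
    """
    capacity = CHANNELS * slots_per_channel
    if not 0 <= dimm_count <= capacity:
        raise ValueError(f"socket holds at most {capacity} DIMMs")
    order = [(ch, pos) for pos in range(slots_per_channel) for ch in range(CHANNELS)]
    return order[:dimm_count]


if __name__ == "__main__":
    # Six DIMMs on a 3-slot-per-channel server: two full sets of three.
    print(population_order(6, 3))
```

For six DIMMs on a three-slot-per-channel server this yields `[(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]`: every channel gets its farthest slot filled before any channel receives a second DIMM.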

Optimizing for Capacity
To design a configuration optimized for capacity, populate all slots with the highest-density dual-rank DDR3 DIMMs available for that specific system. Memory bus speed may drop to 1066MT/s when two DIMMs per channel are installed (unless the platform supports, and is configured for, two DIMMs per channel at 1333MT/s), and drops to 800MT/s on all platforms when three DIMMs per channel are populated, regardless of whether DDR3-1066 or DDR3-1333 DIMMs are used.
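The speed rules in this entry combine as a series of minimums: the processor's maximum supported rate, each DIMM's rated speed, and a cap set by the number of DIMMs per channel, with all channels then running at the highest frequency they have in common. The Python sketch below models that logic as a hypothetical helper; it is not how any BIOS is implemented. The `two_dpc_at_1333` flag is an assumption standing in for the platform-specific conditions described earlier (supported system, required software release, "Advanced" processor, and on the x4000/x6000 series a homogeneous 4GB population).

```python
# Hypothetical model of how the memory bus speed is chosen (MT/s).
from typing import List


def dpc_limit(dimms_per_channel: int, two_dpc_at_1333: bool) -> int:
    """Speed cap imposed by the number of DIMMs on one channel."""
    if dimms_per_channel <= 1:
        return 1333
    if dimms_per_channel == 2:
        return 1333 if two_dpc_at_1333 else 1066
    return 800  # three DIMMs per channel, on all platforms


def bus_speed(cpu_max_mts: int, channels: List[List[int]],
              two_dpc_at_1333: bool = False) -> int:
    """Return the common data rate for the whole system.

    channels: one list per channel, holding the rated speed (1066 or 1333)
    of each installed DIMM; empty channels do not constrain the speed.
    """
    limits = [cpu_max_mts]
    for dimms in channels:
        if dimms:
            limits.append(min(min(dimms), dpc_limit(len(dimms), two_dpc_at_1333)))
    # All channels run at the highest frequency they have in common.
    return min(limits)


if __name__ == "__main__":
    print(bus_speed(1333, [[1333, 1333]] * 3))                        # 2 DPC, standard
    print(bus_speed(1333, [[1333, 1333]] * 3, two_dpc_at_1333=True))  # 2 DPC, enhanced
    print(bus_speed(1333, [[1333] * 3] * 3))                          # 3 DPC
```

The three calls print 1066, 1333 and 800, matching the capacity trade-offs described above. Passing a 1066MT/s processor with 1333 DIMMs returns 1066, the case of a processor whose maximum memory speed is below the DIMM rating.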

Optimizing for Performance
Server configurations with optimal memory bandwidth can be achieved using the “Advanced” class of Intel® Xeon® Processor 5500 & 3500 Series processors (see Tables 1 & 2 below) and memory components that run at 1333MT/s. Similarly, workstations achieve the highest memory performance with Intel® Xeon® 3500 processors that support DDR3-1333. A balanced DDR3 DIMM population is a key factor in achieving optimal performance.

Table 1 - Intel® Xeon® Processor 5500 Classes

Table 2 - Intel® Xeon® Processor 3500 Classes

To optimize a configuration for bandwidth, populate each channel with one dual-rank DDR3-1333 DIMM, using identical DIMMs on every channel (or two identical dual-rank DDR3-1333 DIMMs per channel on systems and configurations that support two DIMMs per channel at 1333MT/s). Single-rank DIMMs provide lower performance than dual-rank modules because they leave the memory controller too few banks per channel, so the available bus bandwidth is underutilized. Other factors that result in less than optimal memory performance include:

  • installing more than one DIMM per channel, which may restrict the maximum memory speed to 1066MT/s or 800MT/s depending on the specific system, the BIOS version, and whether two or three DIMMs per channel are installed
  • an unbalanced DIMM population (e.g., one channel has more capacity than the others)
  • an odd number of DIMM ranks per channel (e.g., mixing a 2GB single-rank DIMM and a 4GB dual-rank DIMM on each channel)
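The checklist above can be turned into a simple configuration audit. The Python sketch below is a hypothetical helper that describes each channel as a list of `(capacity_gb, ranks)` tuples and reports the patterns listed: multiple DIMMs per channel, unequal channel capacities, a single rank per channel, and odd rank counts. It relies on the fact that DDR3 devices have 8 internal banks, so each rank on a channel contributes 8 banks the controller can interleave across.

```python
# Hypothetical audit of one socket's DIMM population for bandwidth pitfalls.
from typing import List, Tuple

Dimm = Tuple[int, int]  # (capacity in GB, number of ranks)
BANKS_PER_RANK = 8      # DDR3 DRAM devices have 8 internal banks


def bandwidth_warnings(channels: List[List[Dimm]]) -> List[str]:
    """Flag the sub-optimal population patterns for a list of channels."""
    warnings: List[str] = []

    # More than one DIMM on any channel may force a lower bus speed.
    if any(len(ch) > 1 for ch in channels):
        warnings.append("more than one DIMM per channel may lower the bus speed")

    # Balanced population: every channel should hold the same capacity.
    capacities = {sum(gb for gb, _ in ch) for ch in channels}
    if len(capacities) > 1:
        warnings.append(f"unbalanced population: channel capacities {sorted(capacities)} GB")

    # One rank leaves few banks to interleave; odd rank counts are sub-optimal.
    for i, ch in enumerate(channels):
        ranks = sum(r for _, r in ch)
        if ranks == 1:
            warnings.append(f"channel {i}: single rank, only {BANKS_PER_RANK} banks")
        elif ranks % 2:
            warnings.append(f"channel {i}: odd rank count ({ranks} ranks, "
                            f"{ranks * BANKS_PER_RANK} banks)")
    return warnings


if __name__ == "__main__":
    print(bandwidth_warnings([[(4, 2)]] * 3))          # recommended layout
    print(bandwidth_warnings([[(2, 1), (4, 2)]] * 3))  # mixed SR + DR per channel
```

The recommended layout of one 4GB dual-rank DIMM per channel produces an empty list. Mixing a 2GB single-rank and a 4GB dual-rank DIMM on each channel produces four warnings: one for multiple DIMMs per channel and one odd-rank warning per channel.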

The Numbers
The table below illustrates how selected DIMM configurations compare from a bandwidth perspective. The numbers provided are all relative to a DDR3-1333 capable processor configured with one dual-rank DIMM per channel (highlighted in violet). A homogeneous DIMM population is used in every case presented. SR = Single Rank DIMM, DR = Dual Rank DIMM

Table 3 - Relative Bandwidth Comparisons

Key takeaways from the above are:

  • for DIMM configurations that support both speeds, memory bandwidth is up to 12% higher with 1333 DIMMs than with 1066 DIMMs (this is not obvious from the chart, since it shows bandwidth relative to 1x DR per channel rather than comparing 1333 and 1066 directly for a given configuration)
  • for a given capacity, one dual rank DIMM per channel provides higher bandwidth performance than two single rank DIMMs per channel

Optimizing for Power
Following is an example of how different DIMM configurations compare in memory power. The power numbers provided are relative to a DDR3-1333 capable processor configured with one dual-rank DIMM per channel (highlighted in violet). In each case the DIMMs are built from the same DRAM technology and density. It’s important to keep in mind that DRAM power typically drops as silicon process technology changes and matures, so the table applies only when comparing like technologies. As above, homogeneous DIMM populations are evaluated in every case; SR = Single Rank DIMM, DR = Dual Rank DIMM.

Table 4 - Relative DIMM Power Comparisons

From the table it can be determined that:

  • for a particular DIMM configuration and bus speed, DDR3-1333 DIMMs consume up to 6% less power than DDR3-1066 modules
  • for a given DIMM configuration, the incremental power required to operate DDR3-1333 DIMMs at 1333MT/s rather than 1066MT/s is 8% or less (also not obvious from the chart, since it shows power relative to 1x DR per channel rather than comparing 1333 and 1066 directly for a given configuration)
and somewhat less obvious but equally important:
  • a dual rank DIMM operating at 1333MT/s consumes less power than two single rank DIMMs at 1066MT/s

The data presented indicates there are trade-offs to be made regarding how to best configure a system with memory. As each application will have unique requirements, processor memory bus speed capability, capacity, performance and power are all factors that must be considered as part of the process. Sun's Intel® Xeon® Processor 5500 Series and Intel® Xeon® Processor 3500 Series platforms offer tremendous capability and flexibility to optimize performance and capacity tailored to a specific need.

My sincere thanks go to Vijay Kunda for his tireless effort in helping to obtain the content.



Comments:

John, this and its predecessor entry are very helpful. How does one know whether a DDR3 DIMM is single, dual (or quad) ranked? I see in the Sun System Handbook details on X6270 memory options that the 4GB 1333MHz DIMMs are dual ranked... but what about the just announced 2GB 1333MHz (X4653A) DIMMs? I assume just wait for the SSH to be updated?

The software update that allows the X6270 to have two DIMMs per channel is very encouraging. Will this only apply to the 4GB 1333MHz DIMM modules in both slots per channel? I guess one could not mix 4GB 1333 and the new 2GB 1333 modules, right?

If my application wanted 4GB per core, it appears that an unbalanced memory config (similar to your response to Michelle in the earlier blog entry's comments section) of three 4GB DIMMs on one side and six 4GB DIMMs on the other (with the V2 software for the X6270) would provide 36GB of memory for the 8 cores... is this correct? Do you have a better suggestion?

Posted by Guy Kent on July 27, 2009 at 07:01 PM EDT #

Guy, Sun currently only offers single and dual-rank DIMMs for all densities, and they're all built with x4 DRAMs to enable SDDC (single device data correction, a.k.a. chipkill). While the 2GB is single-rank, 4GB and 8GB modules are dual-rank but at some point in time, as 2Gbit DRAMs become mainstream, expect the 4GB to morph into a single-rank DIMM. As you noted, the Sun System Handbook is the place to go for this information.

Two DIMM per channel operation at DDR3-1333 for the X6270 only applies to 4GB DIMMs so mixing 2GB and 4GB 1333 modules will result in a speed grade reduction to 1066MHz, as is the case today.

I would concur that the unbalanced configuration you propose to achieve 4GB/core seems to make the most sense as it will, with the BIOS upgrade, run at 1333MHz.

Posted by John Nerl on July 28, 2009 at 12:45 AM EDT #

John, thank you for clarifying. Interesting that the Sun Webdesk configurator does not allow "unbalanced DIMM configs" i.e. it forces qty 6 for each bank. I will try to get an opinion on having this changed. Thank you!

Posted by Guy Kent on July 28, 2009 at 02:26 AM EDT #

John,

A simple question: is this the default BIOS setting, or do I need to change an option in the BIOS menu?

Tomo-

Posted by Tomonori Hagihara on August 16, 2009 at 09:53 PM EDT #

Hi Tomo - sorry for the delay.

You do not need to modify anything within BIOS setup to enable the speed enhancement.

john

Posted by John Nerl on September 10, 2009 at 03:40 AM EDT #

John

Thank you for your reply. By the way, how can I run the unit at the original memory speed defined by Intel? This may be needed in case a customer experiences unexpected memory errors.

Thank you
Tomo

Posted by Tomo on September 13, 2009 at 07:07 PM EDT #

Tomo,

Sun does not allow the customer to change memory bus speed. All supported DIMM types have been exhaustively tested at rated speed, therefore reducing memory clock rate in the field should not be necessary.

john

Posted by John Nerl on September 14, 2009 at 01:00 AM EDT #

If I buy an x2270 (Advanced processors) with 2 processors and want 24GB, is that possible with 2GB DIMMs from Sun (6*2 + 6*2) at 1333MT/s?
From what I understand, this is possible only if I install something called "Software 1.2 for Sun x2270", but the download page doesn't show this software.
Could you be more precise?
Thanks in advance for your help.

Posted by gerard henry on September 21, 2009 at 06:04 AM EDT #

Gerard, other enhancements have been made since my initial posting, so as long as the version is SW1.2 or later the system will support 2 DIMMs per channel at 1333MT/s operation. All available x2270 software versions are located at:
http://www.sun.com/servers/x64/x2270/downloads.jsp

Posted by John Nerl on September 21, 2009 at 08:00 AM EDT #

John,

for completeness do you have the BW numbers for 2xSR and 2xDR for dual channel config?

Posted by Fatso on November 21, 2009 at 02:31 PM EST #

Sorry, but we did not benchmark that configuration as the better option is to spread the DIMMs across all channels. For best performance (independent channel mode) you want to avoid populating more than one DIMM on a channel while leaving another channel empty.

Posted by John Nerl on November 23, 2009 at 12:11 AM EST #

John
Thanks for a very good article. I shall be spending my money 'wisely' upgrading the RAM on a new Dell T3500 workstation (only £750 including an FX1800 from ITCsales). It is fitted with a W3520 processor, and the memory fitted as standard is unbuffered ECC 1333MHz; unless Dell have tweaked things for 1333MHz (no info), the Intel W3520 runs memory at 1066MHz. Would the higher latency of 1333MHz RAM be wasteful with 1 DIMM per channel compared to 1066MHz, or would there be an advantage with 1333MHz?

Oh! I found this Nehalem article, which discusses UDIMMs and RDIMMs in banks; I guess you might have seen it?
http://www.delltechcenter.com/page/04-08-2009+-+Nehalem+and+Memory+Configurations

Posted by Dominic on September 12, 2010 at 12:35 PM EDT #

Dominic, there should be no difference in performance between 1333 and 1066 DIMMs when both are operated at 1066MHz, which as you know is the max supported memory speed of the W3520 processor.

-john

Posted by John Nerl on September 13, 2010 at 02:21 AM EDT #

Hi John, thanks for the info, really learned a lot!
I have a question: I'm planning to upgrade the RAM on a workstation board (X58-based, Xeon W3680) which has 6 RAM slots. I saw two 6x4GB DDR3-1333 ECC kits from Crucial at the same price, but one is single rank and the other is dual rank. Which one should I get?

The motherboard (P6T6 WS REVOLUTION) says it supports ECC and a maximum of 24GB of memory, but I'm not sure whether it can take the full 6x2=12 ranks. In their qualified memory list I do see a "double sided" model from Crucial.

Any comments? Many thanks.

Posted by Marvin on August 18, 2011 at 05:55 AM EDT #

I believe it should be able to handle a full complement of dual-rank DIMMs. Why not contact Asus and ask for a definitive answer?

Posted by John Nerl on August 19, 2011 at 06:53 AM EDT #

Thanks John. I tried installing six dual-rank DDR3-1333 4GB modules on this motherboard, and only 16GB showed as usable, so that proves the X58 chipset supports only 8 memory ranks even though it usually has 6 memory slots.

Then I switched to six single-rank modules, and the full 24GB was recognized.

thanks again.

Posted by Marvin on September 13, 2011 at 01:25 AM EDT #

About

John Nerl
