eXtended System Boards


Like the Sun Fire midrange (6800/6900) and high-end (15K/25K) servers, the Sun SPARC Enterprise M-series servers allow you to organize system boards (SBs) into hardware domains (called "Dynamic System Domains" by marketing). Each hardware domain contains its own CPUs, memory and I/O, isolated from the other domains; one hardware domain may be powered on or off independently of the others. As on Sun Fire, a SPARC Enterprise system board consists of four CPU chip sockets, 32 DIMM sockets, and I/O.

The Sun SPARC Enterprise midrange and high-end servers, however, take system boards and hardware domains one step further. Each physical system board can be partitioned into four eXtended system boards (XSBs).

The Sun SPARC Enterprise M4000 can have up to four CPU chips and is organized as a single system board. The M5000 can have up to eight CPU chips and is organized as two system boards. On the M8000 and M9000, a CPU/Memory Unit (CMU) plus an I/O Unit (IOU) together form a system board. The M8000 can have up to four SBs, while the M9000-64 can have up to 16. When all of the resources on a system board are assigned to domains as a single group, the system board is said to be in "Uni-XSB" mode. The following tables show the CPU, memory and I/O resources on the Sun SPARC Enterprise system boards in Uni-XSB mode:

M4000 in Uni-XSB Mode

SB  CPUs   Memory    I/O
00  CPU#0  32 DIMMs  2 SAS Disks
    CPU#1            DVD/DAT
    CPU#2            2 GBE Ports
    CPU#3            PCI-X Slot#0
                     PCI-E Slot#1
                     PCI-E Slot#2
                     PCI-E Slot#3
                     PCI-E Slot#4

M5000 in Uni-XSB Mode

SB  CPUs   Memory    I/O
00  CPU#0  32 DIMMs  2 SAS Disks
    CPU#1            DVD/DAT
    CPU#2            2 GBE Ports
    CPU#3            PCI-X Slot#0
                     PCI-E Slot#1
                     PCI-E Slot#2
                     PCI-E Slot#3
                     PCI-E Slot#4
01  CPU#0  32 DIMMs  2 SAS Disks
    CPU#1            2 GBE Ports
    CPU#2            PCI-X Slot#0
    CPU#3            PCI-E Slot#1
                     PCI-E Slot#2
                     PCI-E Slot#3
                     PCI-E Slot#4

M8000/M9000 SB in Uni-XSB Mode

SB        CPUs   Memory    I/O
00 to 15  CPU#0  32 DIMMs  PCI-E Slot#1
          CPU#1            PCI-E Slot#2
          CPU#2            PCI-E Slot#3
          CPU#3            PCI-E Slot#4
                           PCI-E Slot#5
                           PCI-E Slot#6
                           PCI-E Slot#7
                           PCI-E Slot#8

On a Sun Fire system you can normally create only as many domains as you have system boards. With the Sun SPARC Enterprise servers, however, you can configure each system board into four XSBs (quad-XSB mode). This allows you to create domains as small as a single CPU, 8 DIMMs, and I/O. To make it easier to map XSBs back to the physical SB, XSBs are numbered xx-y, where xx is the physical system board and y is the XSB on that system board. For example, 01-2 refers to the XSB containing CPU#2 on physical system board #1. The next tables show how the various resources are partitioned among the four XSBs per SB.

M4000 in Quad-XSB Mode

SB  XSB   CPUs   Memory   I/O
00  00-0  CPU#0  8 DIMMs  2 SAS Disks
                          DVD/DAT
                          2 GBE Ports
                          PCI-X Slot#0
                          PCI-E Slot#1
                          PCI-E Slot#2
    00-1  CPU#1  8 DIMMs  PCI-E Slot#3
                          PCI-E Slot#4
    00-2  CPU#2  8 DIMMs  No I/O
    00-3  CPU#3  8 DIMMs  No I/O

M5000 in Quad-XSB Mode

SB  XSB   CPUs   Memory   I/O
00  00-0  CPU#0  8 DIMMs  2 SAS Disks
                          DVD/DAT
                          2 GBE Ports
                          PCI-X Slot#0
                          PCI-E Slot#1
                          PCI-E Slot#2
    00-1  CPU#1  8 DIMMs  PCI-E Slot#3
                          PCI-E Slot#4
    00-2  CPU#2  8 DIMMs  No I/O
    00-3  CPU#3  8 DIMMs  No I/O
01  01-0  CPU#0  8 DIMMs  2 SAS Disks
                          2 GBE Ports
                          PCI-X Slot#0
                          PCI-E Slot#1
                          PCI-E Slot#2
    01-1  CPU#1  8 DIMMs  PCI-E Slot#3
                          PCI-E Slot#4
    01-2  CPU#2  8 DIMMs  No I/O
    01-3  CPU#3  8 DIMMs  No I/O

M8000/M9000 in Quad-XSB Mode

SB        XSB   CPUs   Memory   I/O
00 to 15  XX-0  CPU#0  8 DIMMs  PCI-E Slot#1
                                PCI-E Slot#2
          XX-1  CPU#1  8 DIMMs  PCI-E Slot#3
                                PCI-E Slot#4
          XX-2  CPU#2  8 DIMMs  PCI-E Slot#5
                                PCI-E Slot#6
          XX-3  CPU#3  8 DIMMs  PCI-E Slot#7
                                PCI-E Slot#8
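
The xx-y numbering is easy to handle in scripts. Here is a minimal Python sketch of splitting an XSB label into its board and XSB numbers, and of listing the four labels a board exposes in quad-XSB mode. The names `XsbId`, `parse_xsb` and `quad_xsb_labels` are made up for illustration; they are not part of any Sun tool.

```python
import re
from typing import List, NamedTuple


class XsbId(NamedTuple):
    board: int  # physical system board number (the "xx" part)
    xsb: int    # XSB index on that board (the "y" part), 0 to 3


def parse_xsb(label: str) -> XsbId:
    """Split an 'xx-y' label such as '01-2' into board and XSB numbers."""
    match = re.fullmatch(r"(\d{2})-([0-3])", label)
    if match is None:
        raise ValueError(f"not an XSB label: {label!r}")
    return XsbId(board=int(match.group(1)), xsb=int(match.group(2)))


def quad_xsb_labels(board: int) -> List[str]:
    """Return the four XSB labels of one system board in quad-XSB mode."""
    return [f"{board:02d}-{y}" for y in range(4)]
```

For example, `parse_xsb("01-2")` gives board 1, XSB 2, which per the tables above is the XSB holding CPU#2 on system board #1.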

Note in the quad-XSB tables above that on M4000 and M5000 servers, XSB 0 gets the internal disks, DVD, Gigabit Ethernet, PCI-X slot and two PCI-Express slots. XSB 1 gets two PCI-Express slots. XSBs 2 and 3 have no I/O. This is a physical limitation -- the M4000 and M5000 I/O units only have two PCI-Express hostbridges. So, while in theory you could create eight single-CPU domains on an M5000, in reality a domain needs I/O, so you can only create four hardware domains on an M5000 and two on an M4000.

The M8000 and M9000 system boards, on the other hand, have symmetric XSBs -- each system board has four CPUs, 32 DIMMs, four PCI-Express hostbridges and 8 PCI-Express slots. When they're placed in quad-XSB mode, each XSB has one CPU, 8 DIMMs, one PCI-Express hostbridge and two PCI-Express slots. So a SPARC Enterprise M8000 with 4 system boards can effectively be split into 16 domains.
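
The domain counts above follow from one rule: a domain needs at least one XSB that owns I/O. A rough Python sketch of that arithmetic follows. The table `QUAD_XSB_HAS_IO` is transcribed from the quad-XSB tables in this post, and `max_domains` is a hypothetical helper name. It assumes every board is in quad-XSB mode and ignores any other limit the platform may place on the number of domains.

```python
# For each model: does XSB 0, 1, 2, 3 own any I/O in quad-XSB mode?
# Transcribed from the quad-XSB tables in this post.
QUAD_XSB_HAS_IO = {
    "M4000": [True, True, False, False],
    "M5000": [True, True, False, False],
    "M8000": [True, True, True, True],
    "M9000": [True, True, True, True],
}


def max_domains(model: str, system_boards: int) -> int:
    """Upper bound on domains: one per XSB that owns I/O.

    Assumes all boards are in quad-XSB mode; other platform limits
    on the number of domains are not modeled here.
    """
    io_xsbs_per_board = sum(QUAD_XSB_HAS_IO[model])
    return system_boards * io_xsbs_per_board
```

With one board, `max_domains("M4000", 1)` is 2; with two boards, `max_domains("M5000", 2)` is 4; and `max_domains("M8000", 4)` is 16, matching the numbers in the text.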

For example, with an M5000, you could place system board 00 in quad-XSB mode, and system board 01 in uni-XSB mode. Then you can create one domain with XSBs 00-0 and 00-1 (call this the "green" domain), and a second domain with 00-2, 00-3 and all of 01 (call this the "blue" domain). Here's what that would look like:

 
Example M5000 With Two Domains

Domain  SB  XSB      CPUs   Memory    I/O
green   00  00-0     CPU#0  8 DIMMs   2 SAS Disks
                                      DVD/DAT
                                      2 GBE Ports
                                      PCI-X Slot#0
                                      PCI-E Slot#1
                                      PCI-E Slot#2
green       00-1     CPU#1  8 DIMMs   PCI-E Slot#3
                                      PCI-E Slot#4
blue        00-2     CPU#2  8 DIMMs   No I/O
blue        00-3     CPU#3  8 DIMMs   No I/O
blue    01  Uni-XSB  CPU#0  32 DIMMs  2 SAS Disks
                     CPU#1            2 GBE Ports
                     CPU#2            PCI-X Slot#0
                     CPU#3            PCI-E Slot#1
                                      PCI-E Slot#2
                                      PCI-E Slot#3
                                      PCI-E Slot#4

The green domain would have 2 CPUs, 16 DIMMs, and all of the I/O on system board 00, while the blue domain would have 6 CPUs, 48 DIMMs, and all of the I/O on system board 01.
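
To double-check those totals, here is a small Python sketch that adds up CPUs and DIMMs per domain. The per-unit figures come from the tables in this post (one CPU and 8 DIMMs per quad-mode XSB, four CPUs and 32 DIMMs for a uni-XSB board); the names `DOMAINS` and `domain_totals` are only illustrative.

```python
from typing import Dict, List, Tuple

# (CPUs, DIMMs) contributed by each kind of unit, per the tables in this post.
QUAD_XSB: Tuple[int, int] = (1, 8)
UNI_SB: Tuple[int, int] = (4, 32)

# The example M5000 layout: board 00 in quad-XSB mode, board 01 in uni-XSB mode.
DOMAINS: Dict[str, List[Tuple[int, int]]] = {
    "green": [QUAD_XSB, QUAD_XSB],          # XSBs 00-0 and 00-1
    "blue": [QUAD_XSB, QUAD_XSB, UNI_SB],   # XSBs 00-2, 00-3 and all of board 01
}


def domain_totals(units: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Return (total CPUs, total DIMMs) for the units assigned to one domain."""
    cpus = sum(c for c, _ in units)
    dimms = sum(d for _, d in units)
    return cpus, dimms


for name, units in DOMAINS.items():
    cpus, dimms = domain_totals(units)
    print(f"{name}: {cpus} CPUs, {dimms} DIMMs")
```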

There are some downsides to using quad-XSB mode. The primary issue is availability in the face of hardware failures. On an M4000 or M5000 there are two SC chips per system board (officially, these are called "system controller" ASICs; however, due to potential confusion with the Sun Fire System Controllers, I like to just call them SC chips); the M8000/M9000 system board has four SC chips. The SC chips connect the CPUs, memory and I/O on the system board, and connect the system board to the system crossbar (or in the case of the M5000, the SCs on one system board connect directly to the SCs on the other system board). The SC chips are shared by all XSBs on a system board. If a system board is in uni-XSB mode and there's a fault internal to an SC chip, the system board (and the domain using that system board) may take a fatal error and be reset. If a system board is in quad-XSB mode, an SC fault may require the entire system board to be reset, which would reset all domains using XSBs on that system board.

Using the M5000 example above, if the system experienced a fatal error in CPU#0 on system board 00 (XSB 00-0), only the green domain would be reset. However, if one of the SC chips on system board 00 experiences a fault, then all XSBs on system board 00 are affected; both the blue and the green domains would be reset as a result.
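
The difference between the two failure cases can be sketched in a few lines of Python. This is only a model of the reasoning above, not anything the hardware or XSCF runs: `DOMAIN_XSBS` encodes the example layout, board 01 in uni-XSB mode is written as the single label 01-0 (an assumption of this sketch), and an SC fault is treated as the worst case where the whole board is reset.

```python
from typing import Dict, List, Set

# Example M5000 layout; uni-XSB board 01 is written as "01-0" in this sketch.
DOMAIN_XSBS: Dict[str, List[str]] = {
    "green": ["00-0", "00-1"],
    "blue": ["00-2", "00-3", "01-0"],
}


def domains_hit_by_cpu_fault(xsb: str) -> Set[str]:
    """A fatal CPU error only affects the domain that owns that XSB."""
    return {name for name, xsbs in DOMAIN_XSBS.items() if xsb in xsbs}


def domains_hit_by_sc_fault(board: str) -> Set[str]:
    """Worst case for an SC fault: every domain with an XSB on that board."""
    return {
        name
        for name, xsbs in DOMAIN_XSBS.items()
        if any(x.split("-")[0] == board for x in xsbs)
    }
```

Here `domains_hit_by_cpu_fault("00-0")` contains only green, while `domains_hit_by_sc_fault("00")` contains both green and blue, because both domains own XSBs on system board 00.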

On the other hand, XSBs do offer a great deal of flexibility. With an M4000 which only has one system board, you can create two domains, something you could never do with a Sun Fire 6900/25K with only one system board. On larger systems, you have the flexibility of configuring domains down to the CPU level, rather than at a system board level. If the impact of losing two domains due to a hardware failure is acceptable, then quad-XSB mode offers unprecedented flexibility and configurability.

Comments:

Hi,

I have an M5000 with 4 CPUs, 32 DIMMs and 2 IOUs - I have a single domain which I want to use all the hardware. However, I am unable to assign the second IOU to my domain. When I do showboards, XSB 00-0 is fine, but XSB 01-0 (which I presume contains IOU#1) is marked as unmounted. If I look with "showlogs event" I see "no CPU on XSB#01-0" and "no MEM on XSB#01-0". Currently my 2 CPU modules are in slots 0 and 1 and my memory modules are in slots 0,1,2,3 - do I need to move 1 CPU module to slot 2 and 2 memory modules to slots 4,5 to get XSB 01-0 working? Or is there some XSCF command I can use so I don't have to take the machine apart?

thanks for any help.

Posted by Simon Kilvington on March 27, 2008 at 05:50 AM EDT #

Let me start by saying: You should really contact your Sun Service Engineer for the authoritative answer. But let me tell you what I _think_...

If you look at the first set of diagrams above, you basically have all your CPUs and DIMMs and one IOU in logical "system board 0", and the other IOU in "system board 1". I don't believe you can have a "system board" without CPUs and DIMMs. But like I said, don't trust my word on it.

Posted by Bob Hueston on March 27, 2008 at 06:31 AM EDT #

Hi,

good news - moving the CPU and memory modules has solved the problem - I now have my 2 CPU modules in slots 0 and 2 and my 4 memory modules in slots 0,1,4 and 5 and my 2 IOU modules. The showboards command now shows me that both system boards are available for my domain.

I guess it would be good to get this bit of info added to the service manual or just recorded here.

thanks for your help.

Posted by Simon Kilvington on March 28, 2008 at 01:00 AM EDT #

About

Bob Hueston
