Recipe for a ZFS RAID-Z Storage Pool on Sun Fire X4540

[Update Sept 26th: I have revised this from the initial posting on Sept 25th. The hot spares have been laid out in a tidier way and I have included an improved script which is a little more generalized.]

Almost a year ago I posted a Recipe for Sun Fire X4500 RAID-Z Config with Hot Spares. The new Sun Fire X4540 has different disk controller numbering and more bootable disk slots, so I have revisited this.

Using my Sun Fire X4540 Disk Planner, I first worked out how I wanted it to look....

Plan

The server has six controllers, each with eight disks. In the planner the first controller is c0, but the controller numbering will not start at c0 in all cases: if you installed Solaris from an ISO image the controllers will run c1->c6; if Solaris was installed with Jumpstart they will run c0->c5; and in one case I have seen the first controller as c4. Whatever the first controller is, the others follow in sequence.
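
Because the controllers always follow in sequence, the six names can be derived from the first one rather than listed by hand. The following is only a sketch: `controllers_from` is a made-up helper name, and it assumes a POSIX shell (the `${1#c}` and `$(( ))` syntax may not be accepted by a legacy Bourne `/bin/sh`).

```sh
#!/bin/sh
# Hypothetical helper: print the six controller names that follow
# a given first controller, e.g. c4 -> c4 c5 c6 c7 c8 c9.
controllers_from() {
    first=${1#c}                        # strip the leading "c"
    case $first in
        ''|*[!0-9]*) echo "bad controller: $1" >&2; return 1 ;;
    esac
    i=0
    list=""
    while [ "$i" -lt 6 ]; do
        list="$list c$((first + i))"    # controllers are consecutive
        i=$((i + 1))
    done
    echo "${list# }"                    # drop the leading space
}

controllers_from c4
```

With something like this, a first controller of c4 would not need its own hand-written case.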

I assumed that mirrored boot disks are desirable, so I allocated two disks for the OS.

ZFS is happy with stripes of dissimilar lengths in a pool, but I like all the stripes in a pool to be the same length, so I allocated hot spares across the controllers to enable me to build eight 5-disk RAID-Z stripes. There is one hot spare per controller.
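
As a quick check of the arithmetic behind this plan, the disk budget can be worked through in a few lines of shell. The figures come from the text above; the variable names are just for illustration, and a POSIX shell is assumed for the `$(( ))` arithmetic.

```sh
#!/bin/sh
# Disk budget for the planned layout (figures taken from the text).
controllers=6
disks_per_controller=8
boot_disks=2
hot_spares=6          # one per controller
stripe_width=5        # RAID-Z 4+1

total=$((controllers * disks_per_controller))
pool_disks=$((total - boot_disks - hot_spares))
stripes=$((pool_disks / stripe_width))
leftover=$((pool_disks % stripe_width))

echo "total=$total pool_disks=$pool_disks stripes=$stripes leftover=$leftover"
```

The 48 disks less two boot disks and six spares leave 40, which divides evenly into eight stripes of five.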

The script below creates the pool as described above. The required arguments are the desired name of the pool and the name of the first controller. It does a basic check to see that you are on a Sun Fire X4540.

#! /bin/sh
#
#set -x
#
#Make ZFS storage pools on a Sun Fire X4540 (Thor).
#This WILL NOT WORK on Sun Fire X4500 (Thumper) as
#the boot disk locations and controller numbering
#are different.
#
#Need two arguments:
#
# 1. name of pool
# 2. name of first controller e.g c0
#

prtdiag -v | grep -w X4540 > /dev/null 2>&1
if [ $? -ne 0 ] ; then
        echo "This script can only be run on a Sun Fire X4540."
        exit 1
fi

#
case $# in
        2)#This is a valid argument count
        ZPOOLNAME=$1
        CFIRST=$2
        ;;
        *) #An invalid argument count
        echo "Usage: `basename ${0}` zfspoolname first_controller_number"
        echo "Example: `basename ${0}` tank c0"
        exit 1;;
esac

#The numbering of the disk controllers will vary,
#but will most likely start at c0 or c1.

case $CFIRST in
        c0)
        Cntrl0=c0
        Cntrl1=c1
        Cntrl2=c2
        Cntrl3=c3
        Cntrl4=c4
        Cntrl5=c5
        ;;
        c1)
        Cntrl0=c1
        Cntrl1=c2
        Cntrl2=c3
        Cntrl3=c4
        Cntrl4=c5
        Cntrl5=c6
        ;;
        *)
        echo "This script cannot work if the first controller is ${CFIRST}."
        echo "If this is the correct controller then edit the script to add"
        echo "settings for first controller = ${CFIRST}."
        exit 1
        ;;
esac

# Create pool with 8 x RAIDZ.4+1 stripes
# 6 Hot spares are staggered across controllers
# We skip ${Cntrl0}t0d0 and ${Cntrl1}t1d0 as they are assumed to be boot disks
zpool create -f ${ZPOOLNAME} \
raidz ${Cntrl1}t0d0 ${Cntrl2}t0d0 ${Cntrl3}t0d0 ${Cntrl4}t0d0 ${Cntrl5}t0d0 \
raidz ${Cntrl0}t1d0 ${Cntrl2}t1d0 ${Cntrl3}t1d0 ${Cntrl4}t1d0 ${Cntrl5}t1d0 \
raidz ${Cntrl0}t2d0 ${Cntrl1}t2d0 ${Cntrl3}t2d0 ${Cntrl4}t2d0 ${Cntrl5}t2d0 \
raidz ${Cntrl0}t3d0 ${Cntrl1}t3d0 ${Cntrl2}t3d0 ${Cntrl4}t3d0 ${Cntrl5}t3d0 \
raidz ${Cntrl0}t4d0 ${Cntrl1}t4d0 ${Cntrl2}t4d0 ${Cntrl3}t4d0 ${Cntrl5}t4d0 \
raidz ${Cntrl0}t5d0 ${Cntrl1}t5d0 ${Cntrl2}t5d0 ${Cntrl3}t5d0 ${Cntrl4}t5d0 \
raidz ${Cntrl1}t6d0 ${Cntrl2}t6d0 ${Cntrl3}t6d0 ${Cntrl4}t6d0 ${Cntrl5}t6d0 \
raidz ${Cntrl0}t7d0 ${Cntrl2}t7d0 ${Cntrl3}t7d0 ${Cntrl4}t7d0 ${Cntrl5}t7d0 \
spare ${Cntrl2}t2d0 ${Cntrl3}t3d0 ${Cntrl4}t4d0 ${Cntrl5}t5d0 ${Cntrl0}t6d0 ${Cntrl1}t7d0

#End of script
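
The hand-written disk list follows a regular pattern: in row tN the controller at position N mod 6 is left out, and from row t2 onward the disk left out becomes a hot spare (in rows t0 and t1 it is a boot disk). A rough sketch of generating the same command with loops is shown here. `build_zpool_cmd` is a hypothetical name, it takes the number of the first controller rather than its name, it assumes a POSIX shell, and it only prints the command instead of running it.

```sh
#!/bin/sh
# Sketch: build (but do not run) the zpool create command with loops.
# Arguments: pool name, NUMBER of the first controller (e.g. 1 for c1).
build_zpool_cmd() {
    pool=$1
    first=$2
    cmd="zpool create -f $pool"
    spares=""
    t=0
    while [ "$t" -lt 8 ]; do
        skip=$((t % 6))              # controller position left out of this row
        cmd="$cmd raidz"
        i=0
        while [ "$i" -lt 6 ]; do
            disk="c$((first + i))t${t}d0"
            if [ "$i" -ne "$skip" ]; then
                cmd="$cmd $disk"
            elif [ "$t" -ge 2 ]; then
                spares="$spares $disk"   # rows t0 and t1 skip the boot disks instead
            fi
            i=$((i + 1))
        done
        t=$((t + 1))
    done
    echo "$cmd spare$spares"
}

build_zpool_cmd tank 1
```

Printing the command first makes it easy to compare against the planner before anything touches the disks.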

I have called the script makex4540raidz-6hs.sh. In the example below I create a storage pool called tank and my first controller is c1.

root@isv-x4500a # makex4540raidz-6hs.sh tank c1

This is how it looks...

root@isv-x4500a # zpool status tank
  pool: tank
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c2t0d0  ONLINE       0     0     0
            c3t0d0  ONLINE       0     0     0
            c4t0d0  ONLINE       0     0     0
            c5t0d0  ONLINE       0     0     0
            c6t0d0  ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
            c3t1d0  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     0
            c5t1d0  ONLINE       0     0     0
            c6t1d0  ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c2t2d0  ONLINE       0     0     0
            c4t2d0  ONLINE       0     0     0
            c5t2d0  ONLINE       0     0     0
            c6t2d0  ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0
            c2t3d0  ONLINE       0     0     0
            c3t3d0  ONLINE       0     0     0
            c5t3d0  ONLINE       0     0     0
            c6t3d0  ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
            c2t4d0  ONLINE       0     0     0
            c3t4d0  ONLINE       0     0     0
            c4t4d0  ONLINE       0     0     0
            c6t4d0  ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c1t5d0  ONLINE       0     0     0
            c2t5d0  ONLINE       0     0     0
            c3t5d0  ONLINE       0     0     0
            c4t5d0  ONLINE       0     0     0
            c5t5d0  ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c2t6d0  ONLINE       0     0     0
            c3t6d0  ONLINE       0     0     0
            c4t6d0  ONLINE       0     0     0
            c5t6d0  ONLINE       0     0     0
            c6t6d0  ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c1t7d0  ONLINE       0     0     0
            c3t7d0  ONLINE       0     0     0
            c4t7d0  ONLINE       0     0     0
            c5t7d0  ONLINE       0     0     0
            c6t7d0  ONLINE       0     0     0
        spares
          c3t2d0    AVAIL   
          c4t3d0    AVAIL   
          c5t4d0    AVAIL   
          c6t5d0    AVAIL   
          c1t6d0    AVAIL   
          c2t7d0    AVAIL   

errors: No known data errors
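
One way to sanity-check output like this is to confirm that no RAID-Z group contains two disks from the same controller. The sketch below is an assumption-laden illustration, not part of the original recipe: `check_vdev_spread` is a made-up name, it reads `zpool status` text on standard input, and it needs a POSIX awk (on Solaris that would probably mean `nawk` rather than the old `/usr/bin/awk`).

```sh
#!/bin/sh
# check_vdev_spread: read "zpool status" text on stdin and report any
# raidz group that holds two disks from the same controller.
check_vdev_spread() {
    awk '
        $1 ~ /^raidz/  { group++; split("", seen); next }   # new group, reset
        $1 == "spares" { group = 0; next }                   # ignore spares
        group && $1 ~ /^c[0-9]+t[0-9]+d[0-9]+$/ {
            ctrl = $1
            sub(/t.*/, "", ctrl)                             # c2t0d0 -> c2
            if (ctrl in seen) {
                print "group " group ": " ctrl " used twice"
                bad = 1
            }
            seen[ctrl] = 1
        }
        END { exit bad + 0 }
    '
}

# Small sample in the same shape as the listing above.
check_vdev_spread <<'EOF' && echo "spread OK"
          raidz1    ONLINE       0     0     0
            c2t0d0  ONLINE       0     0     0
            c3t0d0  ONLINE       0     0     0
        spares
          c3t2d0    AVAIL
EOF
```

On a live system the input would come from something like `zpool status tank | check_vdev_spread`.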

I have used this layout on my lab systems for over a year now, pounding the heck out of it. The first two controllers are marginally less busy as each supports both a boot disk and a hot spare, but I have seen very even performance across all the data disks.

So far I have not lost a disk, so I am probably way over-cautious with my hot spares (famous last words :-)). If you want to reduce the number of hot spares to four, it is easy to modify the script by taking two spares and adding them to the stripes. Since the first two controllers are marginally less loaded than the other controllers, I recommend you modify the script to extend the stripes on rows t6 and t7 as below. You need to make this decision up front, before building the pool, as you cannot change the length of a RAID-Z stripe once the pool is built.
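
To see what the trade buys, the number of data disks in each layout can be compared. This is a small sketch with a hypothetical helper, `data_disks`, which takes pairs of stripe width and stripe count and assumes single-parity RAID-Z, where one disk's worth of capacity per stripe goes to parity.

```sh
#!/bin/sh
# data_disks WIDTH COUNT [WIDTH COUNT ...]
# Data-disk equivalents across single-parity RAID-Z stripes.
data_disks() {
    total=0
    while [ $# -ge 2 ]; do
        total=$((total + ($1 - 1) * $2))   # width minus one parity disk, per stripe
        shift 2
    done
    echo "$total"
}

echo "6 spares, 8 x (4+1):             $(data_disks 5 8) data disks"
echo "4 spares, 6 x (4+1) + 2 x (5+1): $(data_disks 5 6 6 2) data disks"
```

Moving two spares into the stripes raises the count from 32 to 34 data disks, at the cost of two stripes being one disk wider than the rest.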

The zpool create command in the script would now look like this. The changes are the extra disk in each of the t6 and t7 stripes and the shorter spare list.

<snip>

.

.

# Create pool with 6 x RAIDZ.4+1 stripes & 2 x RAIDZ.5+1 stripes
# 4 Hot spares are staggered across controllers
# We skip ${Cntrl0}t0d0 and ${Cntrl1}t1d0 as they are assumed to be boot disks
zpool create -f ${ZPOOLNAME} \
raidz ${Cntrl1}t0d0 ${Cntrl2}t0d0 ${Cntrl3}t0d0 ${Cntrl4}t0d0 ${Cntrl5}t0d0 \
raidz ${Cntrl0}t1d0 ${Cntrl2}t1d0 ${Cntrl3}t1d0 ${Cntrl4}t1d0 ${Cntrl5}t1d0 \
raidz ${Cntrl0}t2d0 ${Cntrl1}t2d0 ${Cntrl3}t2d0 ${Cntrl4}t2d0 ${Cntrl5}t2d0 \
raidz ${Cntrl0}t3d0 ${Cntrl1}t3d0 ${Cntrl2}t3d0 ${Cntrl4}t3d0 ${Cntrl5}t3d0 \
raidz ${Cntrl0}t4d0 ${Cntrl1}t4d0 ${Cntrl2}t4d0 ${Cntrl3}t4d0 ${Cntrl5}t4d0 \
raidz ${Cntrl0}t5d0 ${Cntrl1}t5d0 ${Cntrl2}t5d0 ${Cntrl3}t5d0 ${Cntrl4}t5d0 \
raidz ${Cntrl0}t6d0 ${Cntrl1}t6d0 ${Cntrl2}t6d0 ${Cntrl3}t6d0 ${Cntrl4}t6d0 ${Cntrl5}t6d0 \
raidz ${Cntrl0}t7d0 ${Cntrl1}t7d0 ${Cntrl2}t7d0 ${Cntrl3}t7d0 ${Cntrl4}t7d0 ${Cntrl5}t7d0 \
spare ${Cntrl2}t2d0 ${Cntrl3}t3d0 ${Cntrl4}t4d0 ${Cntrl5}t5d0

Comments:

Hi Tim,

I understand the X4540 has the ability to boot from an inbuilt flash card. Have you had a chance to play with this yet?

Posted by Malcolm Gibbs on September 25, 2008 at 02:57 PM BST #

Hi Malcolm. Yes, there is a bootable flash card, but I have not experimented with it myself. Rgds, Tim

Posted by Tim Thomas on September 26, 2008 at 03:20 AM BST #

I'm a noob to the Sun systems (it's been over 15 years since I've played with one). I've noticed that my x4540 that just arrived has a different physical drive layout than your drive mapping chart. What I have is like this:

left rear                      right rear
    3    7   11   ...   43   47
    2    6   10   ...   42   46
    1    5    9   ...   41   45
    0    4    8   ...   40   44
left front                     right front

Do you know if this is a different drive-to-controller mapping than what you've played with?

Thanks (and nice "how to" & script)

Jim

Posted by Jim Bucks on October 14, 2008 at 08:27 AM BST #

Oh, and the top cover says the boot drives can be 0, 1, 8, or 9.

Jim

Posted by Jim Bucks on October 14, 2008 at 08:28 AM BST #

Jim, your slot numbers are correct. Physically the disks are laid out 12 across and 4 down, as shown in the service manual (Sun Document 819-4359-14) on page 44. My disk planner shows a logical (rather than physical) view of how Solaris "enumerates" the devices. Rgds, Tim

Posted by Tim Thomas on October 15, 2008 at 11:50 AM BST #

Do you have any "rough / approximate" timings for the zfs create (?? format ??) to actually create usable space on the tank?

I've had my x4540 created for a couple of days now, but when I try to copy about 20 TB of files onto it, I keep getting error messages to the effect of "not enough free space".

I have been able to copy about 6.5 GB onto it, and I'm hoping this is just because the formatting process is still running on all the drives.

PS - I used your basic setup script, have 2 boot disks, 2 hot spares, and using raidz for the remaining 42 drives.

df -h looks like this --
hdtank             32T    39K    32T     1%    /hdtank
hdtank/programs    32T   6.5G    32T     1%    /export/programs

and, zpool list looks like this -
NAME      SIZE    USED   AVAIL   CAP  HEALTH  ALTROOT
hdtank   39.9T   7.90G   39.9T    0%  ONLINE  -

Thanks,

Jim

Posted by Jim Bucks on October 21, 2008 at 02:33 PM BST #

Hi Jim. This is peculiar behavior. There is no "format" taking place, once zpool create/zfs create return they are done. Try running "zpool status" to see if everything is healthy and "zfs list" to double check the capacity of your file system. Rgds, Tim

Posted by Tim Thomas on October 21, 2008 at 11:05 PM BST #

Hi Tim,

Thanks for this example - here's an alternate layout, for use with CF boot (so all 48 disks are available). It's more or less a variation of your layout, with a spare on each controller, and the remaining disks laid out as 7 sets of 5+1, with no stripes containing more than 1 disk on the same controller.

Note the CF card consumes c0, so the spinning disks start at c1, in this case.

Apologies for the formatting - not sure how to get tabs to survive.

raidz c1t0d0 c2t0d0 c3t0d0 c4t0d0 c5t0d0 c6t0d0
raidz c2t1d0 c3t1d0 c4t1d0 c5t1d0 c6t1d0 c1t2d0
raidz c3t2d0 c4t2d0 c5t2d0 c6t2d0 c1t3d0 c2t3d0
raidz c4t3d0 c5t3d0 c6t3d0 c1t4d0 c2t4d0 c3t4d0
raidz c5t4d0 c6t4d0 c1t5d0 c2t5d0 c3t5d0 c4t5d0
raidz c6t5d0 c1t6d0 c2t6d0 c3t6d0 c4t6d0 c5t6d0
raidz c1t7d0 c2t7d0 c3t7d0 c4t7d0 c5t7d0 c6t7d0
spare c1t1d0 c2t2d0 c3t3d0 c4t4d0 c5t5d0 c6t6d0

An example script to create the corresponding zpool command:

#!/usr/bin/perl

print "zpool create logpool ";

$vdevcount = 0;
foreach $t (0 .. 7)
{
    foreach $c (1 .. 6)
    {
        if ($vdevcount == 0)
        {
            # Start a new 6-disk raidz group
            print "raidz ";
            printf("c%dt%dd0 ", $c, $t);
            $vdevcount++;
        }
        elsif ($vdevcount == 6)
        {
            # Group is full: keep this disk back as a hot spare
            push(@spares, sprintf("c%dt%dd0", $c, $t));
            $vdevcount = 0;
        }
        else
        {
            printf("c%dt%dd0 ", $c, $t);
            $vdevcount++;
        }
    }
}
print "spare ";
foreach $spare (@spares)
{
    print $spare, " ";
}
print "\n";

Posted by Ben on March 16, 2009 at 09:14 PM GMT #
