Thursday Sep 25, 2008

Recipe for a ZFS RAID-Z Storage Pool on Sun Fire X4540

[Update Sept 26th: I have revised this from the initial posting on Sept 25th. The hot spares have been laid out in a tidier way and I have included an improved script which is a little more generalized.]

Almost  year ago I posted a Recipe for Sun Fire X4500 RAID-Z Config with Hot Spares. Now we have the new SunFire X4540, it has a different disk controller numbering and more bootable disk slots, so I have revisited this.

Using my Sun Fire X4540 Disk Planner, I first worked out how I wanted it to look....


The server has six controllers, each with 8 disks. In the planner, the first controller is c0, but the controller numbering will not start at c0 in all cases: if you installed Solaris off an ISO image they will run from c1->c6; if Solaris is installed with Jumpstart then they will run c0->c5, in one case I have seen the first controller as c4. Whatever the first controller is, the others will follow in sequence.

I assumed that mirrored boot disks are desirable, so I allocated two disk for the OS.

ZFS is happy with stripes of dissimilar lengths in a pool, but I like all the stripes in a pool to be the same length, so I allocated hot spares across the controllers to enable me to build Eight 5 disk RAID-Z stripes. There is one hot spare per controller.

This script creates the pool as described above. The required arguments are the desired name of the pool and the name of the first controller. It does a basic check to see that you are on a  Sun Fire X4540.

#! /bin/sh
#set -x
#Make ZFS storage pools on a Sun Fire X4540 (Thor).
#This WILL NOT WORK on Sun Fire X4500 (Thumper) as
#the boot disk locations and controller numbering
#is different.
#Need two arguments:
# 1. name of pool
# 2. name of first controller e.g c0

prtdiag -v | grep -w X4540 > /dev/null 2>&1
if [ $? -ne 0 ] ; then
        echo "This script can only be run on a Sun Fire X4540."
        exit 1

case $# in
        2)#This is a valid argument count
        \*) #An invalid argument count
        echo "Usage: `basename ${0}` zfspoolname first_controller_number"
        echo "Example: `basename ${0}` tank c0"
        exit 1;;

#The numbering of the disk controllers will vary,
#but will most likely start at c0 or c1.

case $CFIRST in
        echo "This script cannot work if the first controller is ${CFIRST}."
        echo "If this is the correct controller than edit the script to add"
        echo "settings for first controller = ${CFIRST}."
        exit 1

# Create pool with 8 x RAIDZ.4+1 stripes
# 6 Hot spares are staggered across controllers
# We skip ${Cntrl0}t0d0 and {Cntrl1}t1d0 as they are assummed to be boot disks
zpool create -f ${ZPOOLNAME} \\
raidz ${Cntrl1}t0d0 ${Cntrl2}t0d0 ${Cntrl3}t0d0 ${Cntrl4}t0d0 ${Cntrl5}t0d0 \\
raidz ${Cntrl0}t1d0 ${Cntrl2}t1d0 ${Cntrl3}t1d0 ${Cntrl4}t1d0 ${Cntrl5}t1d0 \\
raidz ${Cntrl0}t2d0 ${Cntrl1}t2d0 ${Cntrl3}t2d0 ${Cntrl4}t2d0 ${Cntrl5}t2d0 \\
raidz ${Cntrl0}t3d0 ${Cntrl1}t3d0 ${Cntrl2}t3d0 ${Cntrl4}t3d0 ${Cntrl5}t3d0 \\
raidz ${Cntrl0}t4d0 ${Cntrl1}t4d0 ${Cntrl2}t4d0 ${Cntrl3}t4d0 ${Cntrl5}t4d0 \\
raidz ${Cntrl0}t5d0 ${Cntrl1}t5d0 ${Cntrl2}t5d0 ${Cntrl3}t5d0 ${Cntrl4}t5d0 \\
raidz ${Cntrl1}t6d0 ${Cntrl2}t6d0 ${Cntrl3}t6d0 ${Cntrl4}t6d0 ${Cntrl5}t6d0 \\
raidz ${Cntrl0}t7d0 ${Cntrl2}t7d0 ${Cntrl3}t7d0 ${Cntrl4}t7d0 ${Cntrl5}t7d0 \\
spare ${Cntrl2}t2d0 ${Cntrl3}t3d0 ${Cntrl4}t4d0 ${Cntrl5}t5d0 ${Cntrl0}t6d0 ${Cntrl1}t7d0

#End of script

I have called the script In the below example I create a storage pool called tank and my first controller is c1.

root@isv-x4500a # tank c1

This is how it looks...

root@isv-x4540a # zpool status

root@isv-x4500a # zpool status tank
  pool: tank
 state: ONLINE
 scrub: none requested

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c2t0d0  ONLINE       0     0     0
            c3t0d0  ONLINE       0     0     0
            c4t0d0  ONLINE       0     0     0
            c5t0d0  ONLINE       0     0     0
            c6t0d0  ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
            c3t1d0  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     0
            c5t1d0  ONLINE       0     0     0
            c6t1d0  ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c2t2d0  ONLINE       0     0     0
            c4t2d0  ONLINE       0     0     0
            c5t2d0  ONLINE       0     0     0
            c6t2d0  ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0
            c2t3d0  ONLINE       0     0     0
            c3t3d0  ONLINE       0     0     0
            c5t3d0  ONLINE       0     0     0
            c6t3d0  ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
            c2t4d0  ONLINE       0     0     0
            c3t4d0  ONLINE       0     0     0
            c4t4d0  ONLINE       0     0     0
            c6t4d0  ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c1t5d0  ONLINE       0     0     0
            c2t5d0  ONLINE       0     0     0
            c3t5d0  ONLINE       0     0     0
            c4t5d0  ONLINE       0     0     0
            c5t5d0  ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c2t6d0  ONLINE       0     0     0
            c3t6d0  ONLINE       0     0     0
            c4t6d0  ONLINE       0     0     0
            c5t6d0  ONLINE       0     0     0
            c6t6d0  ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c1t7d0  ONLINE       0     0     0
            c3t7d0  ONLINE       0     0     0
            c4t7d0  ONLINE       0     0     0
            c5t7d0  ONLINE       0     0     0
            c6t7d0  ONLINE       0     0     0
          c3t2d0    AVAIL   
          c4t3d0    AVAIL   
          c5t4d0    AVAIL   
          c6t5d0    AVAIL   
          c1t6d0    AVAIL   
          c2t7d0    AVAIL   

errors: No known data errors

I have used this layout on my systems for over a year now in the labs, pounding the heck out of it. The first two controllers are marginally less busy as they support both a boot disk and hotspare, but I have seen very even performance across all the data disks.

So far, I have not lost a disk so I am probably way over a cautious with my hot spares...famous last words :-)..but if you want to reduce the number of hot spares to four, then it is easy to modify the script by taking spares and adding them to the stripes. If you want to do this, since the first two controllers are marginally less loaded than the other controllers, I recommend you modify the script to extend the stripes on rows t6 & t7 as below . You need to make this decision up front before building the pool as you cannot change the length of a RAID-Z stripe once the pool is built.

The zpool create command in the script would now look like this...the modified lines are in bold text.




# Create pool with 6 x RAIDZ.4+1 stripes & 2 x RAIDZ.5+1 stripes
# 6 Hot spares are staggered across controllers
# We skip ${Cntrl0}t0d0 and {Cntrl1}t1d0 as they are assummed to be boot disks
zpool create -f ${ZPOOLNAME} \\
raidz ${Cntrl1}t0d0 ${Cntrl2}t0d0 ${Cntrl3}t0d0 ${Cntrl4}t0d0 ${Cntrl5}t0d0 \\
raidz ${Cntrl0}t1d0 ${Cntrl2}t1d0 ${Cntrl3}t1d0 ${Cntrl4}t1d0 ${Cntrl5}t1d0 \\
raidz ${Cntrl0}t2d0 ${Cntrl1}t2d0 ${Cntrl3}t2d0 ${Cntrl4}t2d0 ${Cntrl5}t2d0 \\
raidz ${Cntrl0}t3d0 ${Cntrl1}t3d0 ${Cntrl2}t3d0 ${Cntrl4}t3d0 ${Cntrl5}t3d0 \\
raidz ${Cntrl0}t4d0 ${Cntrl1}t4d0 ${Cntrl2}t4d0 ${Cntrl3}t4d0 ${Cntrl5}t4d0 \\
raidz ${Cntrl0}t5d0 ${Cntrl1}t5d0 ${Cntrl2}t5d0 ${Cntrl3}t5d0 ${Cntrl4}t5d0 \\
${Cntrl0}t6d0 ${Cntrl1}t6d0 ${Cntrl2}t6d0 ${Cntrl3}t6d0 ${Cntrl4}t6d0 ${Cntrl5}t6d0 \\
raidz ${Cntrl0}t7d0
${Cntrl2}t7d0 ${Cntrl3}t7d0 ${Cntrl4}t7d0 ${Cntrl5}t7d0 \\
spare ${Cntrl2}t2d0 ${Cntrl3}t3d0 ${Cntrl4}t4d0 ${Cntrl5}t5d0

Wednesday Sep 24, 2008

Using the new Sun JBODS with ZFS

Sun recently announced a  new range of JBODS, the Sun Storage J4000 series: the J4200, J4400 and J4500....I borrowed this nice graphic from

J4000 Series

You can read about the JBODs in detail here on

JBODS are a great fit with ZFS and my good friend Dominic Kay has written a nice tutorial on using ZFS with the Sun Storage J4000 series.

The J4000 series is supported with Solaris, Windows and Linux. There is a choice of a simple 8 port SAS HBA or SAS HBA with Hardware RAID support.

Tuesday Sep 23, 2008

Sun Fire X4540 Disk Planner

The SunFire X4500 (commonly known as Thumper) got a facelift a few months ago and the new version is the SunFire X4540. The X4540 has to a large degree been re-architected, and has new CPUs, more memory and a new I/O subsystem. There are still 48 disks, but the controller numbering is different and we now have four bootable disk slots vs only two in the X4500. 

Now, I need to draw a picture when planning ZFS storage pools with so many disks so I just uploaded my SunFire X4540 disk planner in PDF and OpenOffice formats. There is no rocket science here, it just helps you draw a picture, but I find it useful. It is an update of a similar doc I created for the X4500

Tuesday May 27, 2008

Making SAMBA Go Faster.....

I do a lot of work with the CIFS server that we now provide as part of OpenSolaris, but I also still do work with SAMBA as well.

I have been experimenting with a workload where I am accessing 100 files from a Windows Server 2003 CIFS client (Sun Fire V40z). The server is a Sun Fire X4500 running SAMBA. I am doing sequential I/O with a  workload generating tool, and at any time 75 files are open for read and 25 open for write. There are 1million files in the file system comprising 4 TB of data.

I was not seeing great scalability or performance, then some research by a colleague (supported by some folk at led me to try enabling Async IO (AIO) on the SAMBA server. This is a standard feature of SAMBA, and has been available in Sun's SAMBA server build since Solaris 10 8/07. From what I have been told , AIO specifically helps the case of client workload scaling w/ threads Vs scaling with many connections; the workload generating tool I am using (vdbench) scales with threads.

To enable AIO add these lines to the global part of your smb.conf :

aio read size = 1
aio write size =1

Then restart SAMBA.

This parameter is defined in bytes. This setting means that any I/O over 1 byte will be handled asynchronously by smbd. There may be reasons for this to be a bigger value in some cases, I don't know.

Without AIO, my previous best result was 42 MB/sec reads and 14 MB/sec writes; with AIO the client could read at 64.4 MB/sec reads and write a 22.1 MB/sec. CPU utilization on the X4500 running SAMBA went up from 15% to 50%.

Your mileage will vary depending on your workload and application, but that is quite a nice boost for just adding two lines to a configuration file :-)

Note: This work was done running the Solaris Express Community Edition snv_89 X86 (aka Nevada) on the Sun Fire X4500. The underlying file system was a ZFS file system provisioned from a RAID-Z storage pool configured as described here. I was reading/writing 8KB blocks.

Friday Apr 11, 2008

OpenSolaris as a StorageOS - The Week That Everything Worked First Time

This has been an extraordinary week...everything I have tried to do has worked first time.

To summarise this weeks activities, I have set-up and then blogged on:

Configuring the OpenSolaris CIFS Server in Workgroup Mode
Configuring the OpenSolaris CIFS Server in Domain Mode
Solaris CIFS Windows SID Mapping - A First Look
Configuring the OpenSolaris Virus Scanning Services for ZFS Accessed via CIFS Clients

All in all..a very good week :-) 

Configuring the OpenSolaris Virus Scanning Services for ZFS Accessed via CIFS and NFSv4 Clients

OpenSolaris includes features that enable virus scanning of files accessed by CIFS and NFSv4 clients. You can read about the project on the OpenSolaris Project: VSCAN Service Home Page. The scanning model is similar to that discussed in this article in that files are sent (using the ICAP protocol) to external servers running anti virus software to be scanned...a common model for NAS appliances.

Having set up the OpenSolaris CIFS server as described here, I wanted to try out these services. Part of my job at Sun is to certify anti virus software with our NAS products, so I am experienced in this kind of testing.

As before, I am working on a Sun Fire X4500 with Solaris Nevada build 86 installed....

root@isv-x4500b # uname -a
SunOS isv-x4500b 5.11 snv_86 i86pc i386 i86pc

I have mostly presented the commands I have used as is..but I have occasionally had to edit fields.

I installed a copy of the Symantec Scan Engine Version 5.1 on a Windows server (hostname: scanengine) in my lab to provide the necessary scanning services. The Symantec Scan Engine has an ICAP protocol interface as standard, so enabling the OpenSolaris VSCAN service to communicate with it.

I configured the Symantec Scan Engine in "Scan Only" mode which means that it will notify the VSCAN service if a file is infected or will not attempt to repair infected files (i.e. if won't remove the virus). The response of the VSCAN service to being told that a file is infected is to set attributes on the file so that access is denied i.e. the file is quarantined.

The VSCAN services are managed with the vscanadm command. The vscan service daemon, vscand, interacts with the scan engine to have the file scanned; sending the file contents to the scan engine via the ICAP protocol.

I wanted to enable virus scanning on a CIFS share called cifs2 which is a share off the ZFS file system tank/cifs2. The procedure was as follows:

1. Enable VSCAN Services

root@isv-x4500b # svcadm enable vscan
root@isv-x4500b # svcs vscan
STATE          STIME    FMRI
online          7:38:08 svc:/system/filesystem/vscan:icap

2. Enable Virus Scanning on the ZFS File System

root@isv-x4500b # zfs set vscan=on tank/cifs2

NOTE: In the next steps you should substitute the hostname of your server running anti virus software for scanengine.

3. Add a Scan Engine

root@isv-x4500b # vscanadm add-engine scanengine
root@isv-x4500b # vscanadm get-engine scanengine

NOTE: Port 1344 is the default for ICAP, it can be changed. If you changed it you would also need to change the port used by the anti virus software.

4. Optional Step - Set Maximum Size of File To Sent To Be Scanned

It is inefficient to scan very large files. Here we set maximum size of a file to be scanned to 100 MB. If a file 100MB in size, or greater, needs to be scanned you have an option of allowing access or denying access: in either case the file will not be scanned.

root@isv-x4500b # vscanadm set -p max-size=100M
root@isv-x4500b # vscanadm set -p max-size-action=deny
root@isv-x4500b # vscanadm show


5. Optional Step - Modify The List of Types of File to Scan

By default all file types are scanned. Here we remove files ending in "jpg" from the list of files types to be scanned.

root@isv-x4500b # vscanadm set -p types=-jpg,+\*
root@isv-x4500b # vscanadm show


6. Checking That Scanning is Working

You can get files from EICAR to test virus scanners. The files look like viruses to the scanners, but are safe to use.

I mounted up the cifs2 share on a Windows server and created some files and Folders on the share with no issues. I then drag and dropped EICAR files from the Window server's unprotected drive onto the share. When I tried to open an EICAR file on the share, access was denied.


Messages like the one below appeared in the system log.

Apr  9 08:13:09 isv-x4500b vscand: [ID 540744 daemon.warning] quarantine /tank/cifs2/ 11101 - EICAR Test String

Back on the server running OpenSolaris I checked to see if files had actually been scanned as below

root@isv-x4500b # vscanadm stats

File are being scanned and there are no scan errors! 

Lastly, I also looked at the statistics on the Symantec Scan Engine GUI which also confirmed that files were being scanned. Note that the numbers below do not match the output of vscanadm above as the screenshot was taken a a later time.

Symantec Scan Engine

Access is denied to the infected files because the quarantine bit has been set. You can check for the q for quarantine bit as follows...

root@isv-x4500b # ls -/c
----------+  1 2147483649 2147483650      68 Apr  9 08:13

For More Information

OpenSolaris Project: VSCAN Service Home Page

vscanadm and  vscand manual pages.


Tim Thomas


« April 2014