SunFire X2100 - RAID, ZFS and Grub failover
By tls on Feb 18, 2007
This blog entry discusses the steps I went through to get a combination of RAID and ZFS running on a SunFire X2100. My goal is a simple web/mail server with two mirrored disks, so the unit is self-contained with a complete mirror of everything. If a disk goes bad, it's a simple trip to Fry's, and $150 later the machine is running fine with no data lost. At least, that's the theory. I'll admit up front that I'm not taking larger disaster recovery into account here; that's handled by separate backups to an offsite location.
I would have loved to use only ZFS for this task, but the current version of Solaris (Update 3) doesn't allow the root filesystem to be part of a ZFS pool. Once ZFS root is supported, ZFS can take over all of the RAID duties, metadbs, etc. So for now, I'm using a combination of Solaris Volume Manager RAID and ZFS.
RAID will mirror the root, swap, and a couple of metadb partitions. The larger data partition will be mirrored using ZFS. Below are the high-level steps I used. They aren't intended to be 100% complete, but should offer a brief set of steps for others to follow.
- Setting up partition scheme to use (Performed during my Solaris installation)
I used a Seagate 500GB SATA drive. The partition table looks like:
Partition  Tag         Size    Description
0          root        20 GB   / partition, Solaris 10 install
1          swap        4 GB    Normal swap
2          backup      465 GB  Entire drive (not used)
3          unassigned  40 MB   meta-db
4          unassigned  422 GB  Data partition (used by ZFS later)
5          unassigned  20 GB   / partition, reserved for future Live Upgrade
6          unassigned  40 MB   meta-db
7          (unused)
A few comments about this scheme. Several resources recommend separating the two meta-dbs from each other on the disk in case bad blocks form. We don't want both replicas corrupted in the case that the machine needs to reboot from the other drive.
- Format the 2nd drive with the same partition layout as the first
$ prtvtoc /dev/rdsk/c1d0s2 | fmthard -s - /dev/rdsk/c2d0s2
If you are running this on an x86 machine, first ensure that the fdisk partitions on the second disk match the first. You can view the fdisk info with the commands:
$ fdisk /dev/rdsk/c1d0p0 # first disk
$ fdisk /dev/rdsk/c2d0p0 # second disk
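If the fdisk layouts don't match, one way to copy the table from the first disk to the second is to dump it to a file and re-apply it. This is a sketch; the -W and -F options are documented in fdisk(1M) on Solaris, but double-check them on your release before running against a disk with data on it.

```shell
# Dump disk 1's fdisk partition table to a file...
fdisk -W /tmp/fdisk-c1d0 /dev/rdsk/c1d0p0
# ...then apply that table to disk 2 (destructive to disk 2's table).
fdisk -F /tmp/fdisk-c1d0 /dev/rdsk/c2d0p0
```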
- Setup the metadb's on the disks
$ metadb -af -c 2 c1d0s3 c1d0s6
$ metadb -af -c 2 c2d0s3 c2d0s6
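The flags above mean: -a adds replicas, -f forces creation of the initial set, and -c 2 puts two replica copies on each slice, for eight replicas total. Before moving on, it's worth confirming they were all created and look healthy:

```shell
# List all state database replicas with their status flags;
# you should see two replicas on each of the four slices.
metadb -i
```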
- Create the mirrored volumes
The next step is to create the submirrors for root and swap and attach the first submirror of each mirror (the second submirror is attached after the reboot below). The RAID 1 mirrored volumes in this example will be set up as:
RAID Volume  Submirror  Partition  Description
d0                                 / mirror
             d10        c1d0s0     / submirror, disk 1
             d20        c2d0s0     / submirror, disk 2
d1                                 swap mirror
             d11        c1d0s1     swap submirror, disk 1
             d21        c2d0s1     swap submirror, disk 2
$ metainit -f d10 1 1 c1d0s0
$ metainit -f d20 1 1 c2d0s0
$ metainit d0 -m d10
$ metainit -f d11 1 1 c1d0s1
$ metainit -f d21 1 1 c2d0s1
$ metainit d1 -m d11
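At this point d0 and d1 each exist as one-way mirrors. A quick sanity check before pointing vfstab at them:

```shell
# Show the configuration of the new mirrors; each should list
# a single attached submirror (d10 and d11 respectively).
metastat d0 d1
```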
- Setup /etc/vfstab with new mirror mount point
Use the helpful metaroot command to set up your root filesystem in vfstab.
$ metaroot d0
Then, edit your /etc/vfstab and make the equivalent change to your swap partition.
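For reference, after metaroot runs and the swap line is edited by hand, the relevant /etc/vfstab entries should look roughly like this sketch (device names follow the d0/d1 mirrors created above):

```
#device         device          mount   FS      fsck    mount   mount
#to mount       to fsck         point   type    pass    at boot options
/dev/md/dsk/d0  /dev/md/rdsk/d0 /       ufs     1       no      -
/dev/md/dsk/d1  -               -       swap    -       no      -
```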
- Reboot your system to use new vfstab definition
- Attach the 2nd drive's submirrors to the mirrors
$ metattach d0 d20
$ metattach d1 d21
You can view the status of the resync from d10 to d20 and d11 to d21 with:
$ metastat -c
- Install grub on 2nd drive
Install grub on the second drive in case the first one fails, so the system can still boot. It's important to note that it will boot up in single-user mode; you must use metadb to fix the state database replicas for the lost disk. See the notes down below.
$ installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c2d0s0
- Create the ZFS Pool
Now, we want to create a ZFS pool that mirrors the two partitions designated for ZFS.
$ zpool create pool mirror /dev/dsk/c1d0s4 /dev/dsk/c2d0s4
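Once the pool is created, you can confirm the mirror is healthy:

```shell
# The pool should report ONLINE, with both s4 slices
# listed under a single mirror vdev.
zpool status pool
```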
- Create the ZFS Mount Point
Create whatever mountpoints you need and customize ZFS to your heart's content. An example of creating a filesystem mounted at /foo might look like:
$ zfs create pool/foo
$ zfs set mountpoint=/foo pool/foo
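A couple of optional per-filesystem tweaks you might want on a small server (these use standard ZFS properties; the 100g quota value is just an example):

```shell
# Cap pool/foo at 100 GB and enable compression.
zfs set quota=100g pool/foo
zfs set compression=on pool/foo
# Review the filesystems and their mountpoints.
zfs list
```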
A couple of other notes:
- During some testing, I pulled out drive 2 while the server was up and running. I believe the SunFire X2100 doesn't support hot-swapping, as the OS started sending errors to the console. I ended up rebooting the box just to see what would happen. Upon reboot, the metastat -c command told me that d20 was in maint state, meaning there was a problem. I ran the following command to resync that submirror:
$ metareplace -e d0 c2d0s0
- If a drive fails, the system should boot up into single-user mode upon reboot. It's up to the system administrator to delete the state database replicas on the bad drive, which is done with a combination of metadb -d commands. See the Sun doc notes at:
Recovering from failed disk