Tuesday Dec 23, 2008

My Home Media Server on OpenSolaris + ZFS: Part 2

In my previous blog entry, I decided how ZFS will protect my data for a home media server I'm building. Next: partition my two larger drives and install OpenSolaris on them.

This would be stupendously easy if my four disk drives were all the same size: I would type "zpool create mediapool raidz <disk1> <disk2> <disk3> <disk4>" and ZFS would give me a ton of storage, all nice and protected. But I have two 1TB drives and two 1.5TB drives. My problem: ZFS wants all the pieces of a "vdev" (a virtual device; in this case, a virtual RAID-Z device with four disks in it) to be the same size. So I have some partitioning work to do. I'm documenting what I did in case any of you want to use ZFS with different-sized drives.
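To see why mismatched sizes hurt, here is a rough sketch in POSIX shell. It assumes, as a simplification, that a single-parity RAID-Z vdev counts every member as if it were the size of its smallest member and gives up one member's worth of space to parity; it ignores ZFS overhead, and the function name `raidz1_usable_gb` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: rough usable capacity (in GB) of a single-parity
# RAID-Z vdev. Simplifying assumptions: every member counts only as much as
# the smallest member, one member's worth goes to parity, overhead is ignored.
raidz1_usable_gb() {
  min=$1
  for size in "$@"; do
    if [ "$size" -lt "$min" ]; then
      min=$size
    fi
  done
  echo $(( min * ($# - 1) ))
}

# Two 1TB (1000 GB) and two 1.5TB (1500 GB) drives in one vdev:
raidz1_usable_gb 1000 1000 1500 1500   # prints 3000
```

With all four whole drives in one vdev, each 1.5TB drive would contribute only 1TB, leaving 500GB idle on each; that leftover space is what the plan below puts to use.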

Here is my plan:
  • make sure the 1.5TB drives are the 1st and 2nd drives seen by the computer's BIOS, so that I can install OpenSolaris on one of these bigger drives
  • partition each 1.5TB drive into a 1TB partition and a .5TB partition (I recommend doing the partitioning from the Live CD instead of after installing the OS; it went easier for me this way)
  • install OpenSolaris onto the first 1.5TB drive's .5TB partition; installation will create a ZFS pool called "rpool"
  • put the four 1TB partitions into a ZFS raidz pool I will call "mediapool", my primary storage for our home's stuff
  • attach the remaining .5TB partition (from the second 1.5TB drive) to "rpool", making it a ZFS mirror pool so that the OS is protected against a single disk failure
I suppose I could've just made a single pool for storage, but I still like the idea of being able to separate my media storage from my OS.  Anyway, this is my plan for now.

Recalling that ZFS wants all the devices in a vdev to be the same size, I need to do some disk math to make sure the partition sizes are the same number of bytes.  Here's why (and don't laugh if this is all trivial to you; I'm a manager, okay?  If I don't see headcount or budget somewhere in this, I just get confused):

First of all, fdisk lets me specify partition sizes by either a percentage of the disk or a number of cylinders. Specifying a percentage doesn't let me get precise enough to match the partition sizes on the 1.5TB disks and the 1TB disks, so I need to specify partition size in terms of cylinders. But cylinders aren't the same size on the two different disks.

The fdisk utility reports the following information about the 1.5TB and 1TB disks:

1.5TB Disk geometry:
Total disk size is 60800 cyls
Cylinder size is 48195 512-byte blocks
1TB Disk geometry:
Total disk size is 60800 cyls
Cylinder size is 32130 512-byte blocks

Notice that one cylinder on the 1.5TB drive is 1.5 times the size (or 3/2; this way of reckoning comes in handy later) of a cylinder on the 1TB drive (48195 = 3/2 × 32130).
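Here is the same arithmetic as a small shell sketch (the variable names are mine, not fdisk's). It converts the reported geometry into bytes, confirms the exact 3/2 ratio, and shows how coarse a one-percent step is on the 1.5TB drive, which is why percentages won't do.

```shell
#!/bin/sh
# Geometry reported by fdisk, as listed above.
CYLS=60800              # cylinders on each drive
BIG_CYL_BLOCKS=48195    # 512-byte blocks per cylinder, 1.5TB drive
SMALL_CYL_BLOCKS=32130  # 512-byte blocks per cylinder, 1TB drive

# Whole-disk size in bytes = cylinders * blocks per cylinder * 512.
BIG_BYTES=$(( CYLS * BIG_CYL_BLOCKS * 512 ))
SMALL_BYTES=$(( CYLS * SMALL_CYL_BLOCKS * 512 ))
echo "1.5TB drive: $BIG_BYTES bytes"
echo "1TB drive:   $SMALL_BYTES bytes"

# Two big cylinders hold exactly as many blocks as three small ones.
if [ $(( 2 * BIG_CYL_BLOCKS )) -eq $(( 3 * SMALL_CYL_BLOCKS )) ]; then
  echo "cylinder ratio is exactly 3/2"
fi

# Why percentages are too coarse: 1% of the 1.5TB drive is 608 cylinders.
ONE_PCT_BYTES=$(( CYLS / 100 * BIG_CYL_BLOCKS * 512 ))
echo "1% of the 1.5TB drive: $ONE_PCT_BYTES bytes"
```

It should report 1500291072000 and 1000194048000 bytes for the two drives, and about 15GB (15002910720 bytes) for a single percentage point.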

I want to use as much of the 1TB drive as possible (60800 cylinders) but I can't: 60800 cylinders on the 1TB drive corresponds to 40533.33333 cylinders on the 1.5TB drive; I can't enter a non-integer number into fdisk.  I must find a size that works for both disks.  It needs to be a multiple of 3 cylinders on the 1TB drive (which would be a multiple of 2 cylinders on the 1.5TB drive).  I'll waste a little space (2 cylinders' worth on the 1TB drive or about 32MB), but that's okay given that I'll get RAID-Z error correction in return.
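A sketch of that rounding in shell, again with my own variable names: integer division rounds the 60800 cylinders down to a multiple of 3, and the matching 1.5TB cylinder count follows from the 3:2 ratio.

```shell
#!/bin/sh
CYLS=60800              # cylinders on the 1TB drive
SMALL_CYL_BLOCKS=32130  # 512-byte blocks per cylinder, 1TB drive

# Round down to a multiple of 3 cylinders on the 1TB drive;
# every 3 small cylinders equal exactly 2 cylinders on the 1.5TB drive.
SMALL_PART_CYLS=$(( CYLS / 3 * 3 ))
BIG_PART_CYLS=$(( SMALL_PART_CYLS / 3 * 2 ))
WASTED_CYLS=$(( CYLS - SMALL_PART_CYLS ))
WASTED_BYTES=$(( WASTED_CYLS * SMALL_CYL_BLOCKS * 512 ))

echo "1TB drive partition:   $SMALL_PART_CYLS cylinders"
echo "1.5TB drive partition: $BIG_PART_CYLS cylinders"
echo "unused on each 1TB drive: $WASTED_CYLS cylinders ($WASTED_BYTES bytes)"
```

This yields 60798 and 40532 cylinders, with 2 cylinders (32,901,120 bytes) left unused on each 1TB drive, in line with the "about 32MB" estimate.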

I'll create one partition on the 1TB disk: 60798 cylinders (the largest multiple of 3 that fits in 60800) == 1,953,439,740 blocks.
I'll create two partitions on the 1.5TB disk:
  1. 20268 cyls (use this for the OS "rpool")
  2. 40532 cyls == 1,953,439,740 blocks
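Before typing numbers into fdisk, it's worth double-checking them. This hypothetical shell check multiplies each cylinder count by its drive's blocks-per-cylinder, confirms the two kinds of media partitions match to the block, and confirms the two partitions on a 1.5TB drive add up to all 60800 cylinders.

```shell
#!/bin/sh
# Cross-check of the partition sizes listed above (variable names are illustrative).
BIG_CYL_BLOCKS=48195    # 512-byte blocks per cylinder, 1.5TB drive
SMALL_CYL_BLOCKS=32130  # 512-byte blocks per cylinder, 1TB drive
CYLS=60800              # cylinders on each drive

SMALL_MEDIA_BLOCKS=$(( 60798 * SMALL_CYL_BLOCKS ))  # 1TB drive: media partition
BIG_MEDIA_BLOCKS=$(( 40532 * BIG_CYL_BLOCKS ))      # 1.5TB drive: media partition
OS_BLOCKS=$(( 20268 * BIG_CYL_BLOCKS ))             # 1.5TB drive: rpool partition

# RAID-Z members should be identical in size, down to the block.
if [ "$SMALL_MEDIA_BLOCKS" -eq "$BIG_MEDIA_BLOCKS" ]; then
  echo "media partitions match: $BIG_MEDIA_BLOCKS blocks each"
else
  echo "MISMATCH: $SMALL_MEDIA_BLOCKS vs $BIG_MEDIA_BLOCKS blocks" >&2
fi

# The two partitions on a 1.5TB drive should use all of its cylinders.
if [ $(( 40532 + 20268 )) -eq "$CYLS" ]; then
  echo "1.5TB drive fully allocated"
fi
echo "rpool partition: $OS_BLOCKS blocks ($(( OS_BLOCKS * 512 )) bytes)"
```

The rpool partition comes to 976,816,260 blocks, or about 500GB.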
Now that I know exactly how big each partition needs to be on the four disks, I can use this easy-to-follow example to create the Solaris fdisk partitions.  It's easy; it takes less than five minutes, once I've worked out the math I just laid out here.

Next, it's time to install the OS. I'm already running OpenSolaris from the Live CD, so I just click on the icon to install, and less than 20 minutes later, it's there.

Next: create the media storage pool, using all four disks in a RAID-Z configuration:
drapeau@blackfoot:$ pfexec format
Searching for disks...done

 0. c1t0d0  /pci@0,0/pci108e,534a@7/disk@0,0
 1. c1t1d0  /pci@0,0/pci108e,534a@7/disk@1,0
 2. c2t0d0  /pci@0,0/pci108e,534a@8/disk@0,0
 3. c2t1d0  /pci@0,0/pci108e,534a@8/disk@1,0
Specify disk (enter its number): ^C

drapeau@blackfoot:$ zpool list
NAME    SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
rpool   444G  13.7G   430G   3%  ONLINE  -

drapeau@blackfoot:$ pfexec zpool create mediapool raidz c2t1d0p1 c2t0d0p1 c1t1d0p2 c1t0d0p2

Note that I used partition names for these disks, which is important: according to this helpful document, Solaris disk device names include four primary partitions (p1-p4) as well as "p0", which means "the whole disk". I had to tell ZFS explicitly that I didn't want to use the whole 1TB disks, only the 1st partition on each (c2t1d0p1, c2t0d0p1). And I told ZFS to use the 2nd partitions on the 1.5TB disks (c1t1d0p2, c1t0d0p2), which are the roughly-1TB partitions.
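To make the naming scheme concrete, here is a small, hypothetical shell function (`describe_dev` is my own name, not a Solaris tool) that splits names like the ones above into controller, target, disk, and either an fdisk partition (pN, with p0 meaning the whole disk) or a Solaris slice (sN, as in the rpool's c1t0d0s0). It only parses text and only handles the cXtYdZ form used in this post; it does not check that the device exists.

```shell
#!/bin/sh
# Hypothetical helper: describe a Solaris x86 disk device name of the form
# c<controller>t<target>d<disk> followed by p<partition> or s<slice>.
describe_dev() {
  # Split the name into five space-separated fields: ctrl target disk p|s num.
  fields=$(printf '%s\n' "$1" |
    sed -n 's/^c\([0-9][0-9]*\)t\([0-9][0-9]*\)d\([0-9][0-9]*\)\([ps]\)\([0-9][0-9]*\)$/\1 \2 \3 \4 \5/p')
  if [ -z "$fields" ]; then
    echo "$1: unrecognized device name" >&2
    return 1
  fi
  set -- "$1" $fields   # $1 = name, $2..$6 = the five fields
  where="controller $2, target $3, disk $4"
  if [ "$5" = s ]; then
    echo "$1: $where, Solaris slice $6"
  elif [ "$6" = 0 ]; then
    echo "$1: $where, whole disk"
  else
    echo "$1: $where, fdisk partition $6"
  fi
}

# The four mediapool members:
for dev in c2t1d0p1 c2t0d0p1 c1t1d0p2 c1t0d0p2; do
  describe_dev "$dev"
done
```

For example, c1t1d0p2 comes out as "controller 1, target 1, disk 0, fdisk partition 2".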

So, did it work?  Let's see:
drapeau@blackfoot:$ zpool list
NAME        SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
mediapool  3.62T   132K  3.62T   0%  ONLINE  -
rpool       444G  13.7G   430G   3%  ONLINE  -

So far, so good: two ZFS pools. Let's check status:
drapeau@blackfoot:$ zpool status
  pool: mediapool
 state: ONLINE
 scrub: none requested

        NAME          STATE     READ WRITE CKSUM
        mediapool     ONLINE       0     0     0
          raidz1      ONLINE       0     0     0
            c2t1d0p1  ONLINE       0     0     0
            c2t0d0p1  ONLINE       0     0     0
            c1t1d0p2  ONLINE       0     0     0
            c1t0d0p2  ONLINE       0     0     0

errors: No known data errors

  pool: rpool
 state: ONLINE
 scrub: none requested

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          c1t0d0s0    ONLINE       0     0     0

errors: No known data errors

drapeau@blackfoot:$ zfs list
NAME                        USED  AVAIL  REFER  MOUNTPOINT
mediapool                  92.0K  2.67T  26.9K  /mediapool
rpool                      21.7G   415G    72K  /rpool
rpool/ROOT                 5.74G   415G    18K  legacy
rpool/ROOT/opensolaris     5.74G   415G  5.61G  /
rpool/dump                 8.00G   415G  8.00G  -
rpool/export                634K   415G    19K  /export
rpool/export/home           615K   415G    19K  /export/home
rpool/export/home/drapeau   596K   415G   596K  /export/home/drapeau
rpool/swap                 8.00G   423G    16K  -
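Why does zpool list show mediapool at 3.62T while zfs list shows 2.67T available? As far as I understand it, zpool list counts raw space with RAID-Z parity included, while zfs list reports what's left for data. Here is a back-of-the-envelope shell sketch, assuming single parity costs exactly one member's worth of space and ignoring ZFS's own overhead (`show_tib` is a made-up helper):

```shell
#!/bin/sh
# Rough RAID-Z (single parity) capacity estimate; ignores ZFS labels,
# metadata, and other overhead, so real numbers come out a little lower.
PART_BLOCKS=1953439740   # 512-byte blocks in each ~1TB partition
MEMBERS=4
TIB=$(( 1 << 40 ))

RAW_BYTES=$(( PART_BLOCKS * 512 * MEMBERS ))           # parity included
USABLE_BYTES=$(( PART_BLOCKS * 512 * (MEMBERS - 1) ))  # one member's worth of parity removed

# Shell arithmetic is integer-only, so print TiB truncated to two decimals.
show_tib() {
  printf '%s: %d.%02d TiB\n' "$1" $(( $2 / TIB )) $(( $2 * 100 / TIB % 100 ))
}
show_tib "raw (compare zpool list SIZE)" "$RAW_BYTES"
show_tib "usable (compare zfs list AVAIL)" "$USABLE_BYTES"
```

The estimates (3.63 and 2.72 TiB, truncated) land a little above what ZFS reports; I'd guess the gap is the overhead the sketch ignores.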

Sweet. Now I've got a mediapool configured as a four-disk RAID-Z, and I have the rpool, but right now it's only using one disk. I want to mirror it using the extra space on the 2nd 1.5TB disk. I'll do that now, then ask ZFS for status (I'll omit the mediapool status report because we just saw it). Oh, and I'll make sure the mirrored rpool is bootable; ZFS reminds me to do it, so I'll include my steps here:

drapeau@blackfoot:$ pfexec zpool attach rpool c1t0d0s0 c1t1d0p1
Please be sure to invoke installgrub(1M) to make 'c1t1d0p1' bootable.

drapeau@blackfoot:$ pfexec installgrub -m /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t1d0s0

Updating master boot sector destroys existing boot managers (if any).
continue (y/n)?y
stage1 written to partition 0 sector 0 (abs 48195)
stage2 written to partition 0, 267 sectors starting at 50 (abs 48245)
stage1 written to master boot sector

drapeau@blackfoot:$ zpool status
  pool: rpool
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h5m, 39.40% done, 0h9m to go
        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c1t0d0s0  ONLINE       0     0     0  84.1M resilvered
            c1t1d0p1  ONLINE       0     0     0  5.41G resilvered

errors: No known data errors

About 8 minutes later, zpool status reported that the 2nd drive in rpool was resilvered and I had a fully mirrored rpool. Now if one of the two drives fails, I can still boot the OS and replace the bad disk. And with the mediapool, I'm protected against any one of the four disks failing. I'm feeling nice and secure; it's unlikely that two disks will fail at once unless the whole computer goes up in flames. I'll deal with backup later, maybe by looking into Zmanda or something.
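If you'd rather not keep rerunning zpool status by hand, here's a hedged sketch of a polling loop. It assumes the status text keeps the "resilver in progress for ..., NN.NN% done" wording shown above (that wording may differ between ZFS releases), and both function names are my own inventions.

```shell
#!/bin/sh
# Hypothetical helper: read `zpool status` text on stdin and print the
# "NN.NN" from "resilver in progress ..., NN.NN% done"; prints nothing
# when no resilver line is present.
resilver_percent() {
  sed -n 's/.*resilver in progress.*, \([0-9.]*\)% done.*/\1/p'
}

# Hypothetical helper: poll a pool once a minute until no resilver is reported.
# Needs a real ZFS pool, e.g.: wait_for_resilver rpool
wait_for_resilver() {
  while pct=$(zpool status "$1" | resilver_percent) && [ -n "$pct" ]; do
    echo "$1: resilver ${pct}% done"
    sleep 60
  done
  echo "$1: no resilver in progress"
}

# Example with the status line from the output above (prints 39.40):
echo ' scrub: resilver in progress for 0h5m, 39.40% done, 0h9m to go' | resilver_percent
```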

This is great: to this point, I've decided how to set up my storage and protect it, I've installed the OS, and I've created my storage pools.

My next blog entry will describe how I set up the computer to share all that storage with the rest of the house.
