My Home Media Server on OpenSolaris + ZFS: Part 2
By drapeau on Dec 23, 2008
This would be stupendously easy if my four disk drives were all the same size: I would type "zpool create mediapool raidz <disk1> <disk2> <disk3> <disk4>" and ZFS would give me a ton of storage all nice and protected for me. But I have two 1TB drives and two 1.5TB drives. My problem: ZFS wants all the pieces of a "vdev" (a virtual device; in this case, I'm creating a virtual RAID-Z device with four disks in it) to be the same size. So I have some partitioning work to do. I'm documenting what I did in case any of you want to use ZFS with different sized drives.
Here is my plan:
- make sure the 1.5TB drives are the 1st and 2nd drives seen by the computer's BIOS, so that I can install OpenSolaris on one of these bigger drives
- partition each 1.5TB drive into a 1TB partition and a .5TB partition (I recommend doing the partitioning from the Live CD instead of after installing the OS; it went easier for me this way)
- install OpenSolaris onto the first 1.5TB drive's .5TB partition; installation will create a ZFS pool called "rpool"
- put the four 1TB partitions into a ZFS raidz pool I will call "mediapool", my primary storage for our home's stuff
- attach the remaining .5TB partition (from the second 1.5TB drive) to "rpool", making it a ZFS mirror pool so that the OS is protected against a single disk failure
Recalling that ZFS wants all the devices in a vdev to be the same size, I need to do some disk math to make sure the partition sizes are the same number of bytes. Here's why (and don't laugh if this is all trivial to you; I'm a manager, okay? If I don't see headcount or budget somewhere in this, I just get confused):
First of all, fdisk lets me specify partition sizes by either a percentage of the disk or a number of cylinders. Specifying a percentage doesn't let me get precise enough to match the partition sizes on the 1.5TB disks and the 1TB disks, so I need to specify partition size in terms of cylinders. But cylinders aren't the same size on the two different disks.
The fdisk utility reports the following information about the 1.5TB and 1TB disks:
1.5TB Disk geometry:
Total disk size is 60800 cyls
Cylinder size is 48195 512-byte blocks
1TB Disk geometry:
Total disk size is 60800 cyls
Cylinder size is 32130 512-byte blocks
Notice that one cylinder on the 1.5TB drive is 1.5 times the size (or 3/2; this way of reckoning comes in handy later) of a cylinder on the 1TB drive (48195 = 3/2 * 32130).
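If you'd like to double-check that ratio rather than take my word for it, here is a minimal Python sketch. It uses only the fdisk figures quoted above; the variable names are made up for illustration and are not anything fdisk prints.

```python
from fractions import Fraction

BLOCK_BYTES = 512  # fdisk reports cylinder sizes in 512-byte blocks

# Geometry as reported by fdisk (variable names are illustrative)
total_cyls = 60800          # same cylinder count on both drive models
big_cyl_blocks = 48195      # blocks per cylinder, 1.5TB drive
small_cyl_blocks = 32130    # blocks per cylinder, 1TB drive

# Fraction reduces the ratio to lowest terms for us
ratio = Fraction(big_cyl_blocks, small_cyl_blocks)
print("cylinder size ratio:", ratio)  # 3/2

# Total capacity of each drive in bytes
big_bytes = total_cyls * big_cyl_blocks * BLOCK_BYTES
small_bytes = total_cyls * small_cyl_blocks * BLOCK_BYTES
print("1.5TB drive bytes:", big_bytes)
print("1TB drive bytes:  ", small_bytes)
```

Because the ratio reduces to exactly 3/2, any whole number of cylinders that is a multiple of 3 on the 1TB drive lines up with a whole number of cylinders on the 1.5TB drive.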
I want to use as much of the 1TB drive as possible (60800 cylinders) but I can't: 60800 cylinders on the 1TB drive corresponds to 40533.33333 cylinders on the 1.5TB drive; I can't enter a non-integer number into fdisk. I must find a size that works for both disks. It needs to be a multiple of 3 cylinders on the 1TB drive (which would be a multiple of 2 cylinders on the 1.5TB drive). I'll waste a little space (2 cylinders' worth on the 1TB drive or about 32MB), but that's okay given that I'll get RAID-Z error correction in return.
I'll create one partition on the 1TB disk: 60798 cylinders (the largest multiple of 3 that fits) == 1,953,439,740 blocks.
I'll create two partitions on the 1.5TB disk:
- 40532 cyls == 1,953,439,740 blocks
- 20268 cyls (use this for the OS "rpool")
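The disk math above can be sketched in a few lines of Python. This is just one way to redo the arithmetic under the geometry fdisk reported, not a tool I actually ran; all names in it are hypothetical.

```python
from math import gcd

# Geometry reported by fdisk (names are illustrative)
small_cyl_blocks = 32130   # 1TB drive, blocks per cylinder
big_cyl_blocks = 48195     # 1.5TB drive, blocks per cylinder
small_total_cyls = 60800
big_total_cyls = 60800

# Smallest block count that is a whole number of cylinders on BOTH drives
# (the least common multiple of the two cylinder sizes)
step_blocks = small_cyl_blocks * big_cyl_blocks // gcd(small_cyl_blocks, big_cyl_blocks)

# Largest multiple of that step that still fits on the smaller drive
small_capacity_blocks = small_total_cyls * small_cyl_blocks
shared_blocks = (small_capacity_blocks // step_blocks) * step_blocks

small_cyls = shared_blocks // small_cyl_blocks   # partition size on the 1TB drive
big_cyls = shared_blocks // big_cyl_blocks       # matching partition on the 1.5TB drive
os_cyls = big_total_cyls - big_cyls              # what is left over for rpool
wasted_cyls = small_total_cyls - small_cyls      # unused cylinders on the 1TB drive

print(small_cyls, big_cyls, os_cyls, wasted_cyls, shared_blocks)
# 60798 40532 20268 2 1953439740
```

The step works out to 3 cylinders on the 1TB drive (2 on the 1.5TB drive), which is where the "multiple of 3" rule comes from.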
Next, it's time to install the OS. I'm already running OpenSolaris from the Live CD, so I just click on the icon to install and less than 20 minutes later, it's there.
Next: create the media storage pool, using all four disks in a RAID-Z configuration:
drapeau@blackfoot:$ pfexec format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c1t0d0
          /pci@0,0/pci108e,534a@7/disk@0,0
       1. c1t1d0
          /pci@0,0/pci108e,534a@7/disk@1,0
       2. c2t0d0
          /pci@0,0/pci108e,534a@8/disk@0,0
       3. c2t1d0
          /pci@0,0/pci108e,534a@8/disk@1,0
Specify disk (enter its number): ^C
drapeau@blackfoot:$ zpool list
NAME    SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
rpool   444G  13.7G   430G   3%  ONLINE  -
drapeau@blackfoot:$ pfexec zpool create mediapool raidz c2t1d0p1 c2t0d0p1 c1t1d0p2 c1t0d0p2
Note that I used partition names for these disks, which is important: according to this helpful document, in Solaris disk device names, you'll see four primary partitions (p1-p4) and a "p0" as well which means "the whole disk". I had to be clear to tell ZFS that I didn't want to use the whole 1TB disks, only the 1st partition on them (c2t1d0p1, c2t0d0p1). And I told ZFS to use the 2nd partitions on the 1.5TB disks (c1t1d0p2, c1t0d0p2), which are the roughly-1TB partitions.
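To make the naming scheme a bit more concrete, here is a small Python sketch that pulls a device name apart. It is only an illustration: the describe_device helper is hypothetical, and the pattern assumes names shaped like the ones in this post (controller, target, disk, then an optional "p" fdisk partition or "s" slice suffix); other name shapes are not handled.

```python
import re

# Matches names like c2t1d0p1 or c1t0d0s0 (assumed shape, see lead-in)
DEVICE_RE = re.compile(r"^c(\d+)t(\d+)d(\d+)(?:([ps])(\d+))?$")

def describe_device(name):
    """Return a short human-readable description of a disk device name."""
    match = DEVICE_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized device name: {name!r}")
    controller, target, disk, kind, index = match.groups()
    where = f"controller {controller}, target {target}, disk {disk}"
    if kind is None:
        return where
    if kind == "p":
        # p0 means the whole disk; p1-p4 are the primary fdisk partitions
        part = "whole disk" if index == "0" else f"fdisk partition {index}"
    else:
        part = f"slice {index}"
    return f"{where}, {part}"

for name in ("c2t1d0p1", "c1t0d0p2", "c1t0d0s0", "c2t0d0p0"):
    print(name, "->", describe_device(name))
```

So c2t1d0p1 reads as "fdisk partition 1 on disk 0 at target 1 on controller 2", while c2t1d0p0 would have handed ZFS the entire disk.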
So, did it work? Let's see:
drapeau@blackfoot:$ zpool list
NAME        SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
mediapool  3.62T   132K  3.62T   0%  ONLINE  -
rpool       444G  13.7G   430G   3%  ONLINE  -
So far, so good: two ZFS pools. Let's check status:
drapeau@blackfoot:$ zpool status
  pool: mediapool
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        mediapool     ONLINE       0     0     0
          raidz1      ONLINE       0     0     0
            c2t1d0p1  ONLINE       0     0     0
            c2t0d0p1  ONLINE       0     0     0
            c1t1d0p2  ONLINE       0     0     0
            c1t0d0p2  ONLINE       0     0     0

errors: No known data errors

  pool: rpool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c1t0d0s0  ONLINE       0     0     0

errors: No known data errors
drapeau@blackfoot:$ zfs list
NAME                        USED  AVAIL  REFER  MOUNTPOINT
mediapool                  92.0K  2.67T  26.9K  /mediapool
rpool                      21.7G   415G    72K  /rpool
rpool/ROOT                 5.74G   415G    18K  legacy
rpool/ROOT/opensolaris     5.74G   415G  5.61G  /
rpool/dump                 8.00G   415G  8.00G  -
rpool/export                634K   415G    19K  /export
rpool/export/home           615K   415G    19K  /export/home
rpool/export/home/drapeau   596K   415G   596K  /export/home/drapeau
rpool/swap                 8.00G   423G    16K  -
drapeau@blackfoot:$
Sweet. Now I've got a mediapool configured as a four-disk RAID-Z, and I have the rpool but right now it's only using one disk. I want to mirror it now, using the 2nd 1.5TB disk's extra space. I'll do that right now, then ask ZFS for status (I'll omit ZFS's status report on the mediapool because we just saw that). Oh, and I'll make sure that mirrored rpool is bootable; ZFS will remind me to do it, so I'll include my steps here:
drapeau@blackfoot:$ pfexec zpool attach rpool c1t0d0s0 c1t1d0p1
Please be sure to invoke installgrub(1M) to make 'c1t1d0p1' bootable.
drapeau@blackfoot:$ pfexec installgrub -m /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t1d0s0
Updating master boot sector destroys existing boot managers (if any).
continue (y/n)?y
stage1 written to partition 0 sector 0 (abs 48195)
stage2 written to partition 0, 267 sectors starting at 50 (abs 48245)
stage1 written to master boot sector
drapeau@blackfoot:$ zpool status
  pool: rpool
 state: ONLINE
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h5m, 39.40% done, 0h9m to go
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c1t0d0s0  ONLINE       0     0     0  84.1M resilvered
            c1t1d0p1  ONLINE       0     0     0  5.41G resilvered

errors: No known data errors
drapeau@blackfoot:$
About 8 minutes later, zpool status reported that the 2nd drive in rpool was resilvered and I had a fully-mirrored rpool. Now if one of the two drives fails, I can still boot the OS and replace the bad disk. And with the mediapool, I'm protected against any one of the four disks failing. I'm feeling nice and secure; it's unlikely that two disks will fail at once unless the whole computer goes up in flames. I'll deal with backup later, maybe by looking into Zmanda or something.
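That single-disk protection on the mediapool costs roughly one disk's worth of space, which is why zpool list and zfs list showed such different numbers for it earlier. Here is a back-of-the-envelope Python sketch of that, assuming four equal partitions of 1,953,439,740 blocks and single-parity RAID-Z; it deliberately ignores whatever space ZFS keeps for itself, so treat the results as rough upper bounds rather than what ZFS will report.

```python
BLOCK_BYTES = 512
TIB = 2 ** 40  # zpool and zfs report sizes in powers of two

partition_blocks = 1_953_439_740  # size of each RAID-Z member, from the disk math
disks = 4
parity_disks = 1  # single-parity RAID-Z (shown as raidz1 in zpool status)

partition_bytes = partition_blocks * BLOCK_BYTES

# Raw size: all four members, parity included
raw_tib = disks * partition_bytes / TIB

# Rough usable size: one member's worth of space goes to parity
usable_tib = (disks - parity_disks) * partition_bytes / TIB

print(f"raw    ~{raw_tib:.2f} TiB")     # raw    ~3.64 TiB
print(f"usable ~{usable_tib:.2f} TiB")  # usable ~2.73 TiB
```

Those estimates sit just above the 3.62T that zpool list showed and the 2.67T that zfs list showed; I assume the small difference is space ZFS holds back for its own bookkeeping.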
This is great: to this point, I've decided how to set up my storage and protect it, I've installed the OS, and I've created my storage pools.
My next blog entry will describe how I set up the computer to share all that storage with the rest of the house.