Tuesday Dec 23, 2008

My Home Media Server on OpenSolaris + ZFS: Part 2

In my previous blog entry, I decided how ZFS will protect my data for a home media server I'm building.  Next: partition my two larger drives and install OpenSolaris on them.

This would be stupendously easy if my four disk drives were all the same size: I would type "zpool create mediapool raidz <disk1> <disk2> <disk3> <disk4>" and ZFS would give me a ton of storage all nice and protected for me.  But I have two 1TB drives and two 1.5TB drives.   My problem: ZFS wants all the pieces of a "vdev" (a virtual device; in this case, I'm creating a virtual RAID-Z device with four disks in it) to be the same size.  So I have some partitioning work to do.  I'm documenting what I did in case any of you want to use ZFS with different sized drives.

Here is my plan:
  • make sure the 1.5TB drives are the 1st and 2nd drives seen by the computer's BIOS, so that I can install OpenSolaris on one of these bigger drives
  • partition each 1.5TB drive into a 1TB partition and a .5TB partition (I recommend doing the partitioning from the Live CD instead of after installing the OS; it went easier for me this way)
  • install OpenSolaris onto the first 1.5TB drive's .5TB partition; installation will create a ZFS pool called "rpool"
  • put the four 1TB partitions into a ZFS raidz pool I will call "mediapool", my primary storage for our home's stuff
  • attach the remaining .5TB partition (from the second 1.5TB drive) to "rpool", making it a ZFS mirror pool so that the OS is protected against a single disk failure
I suppose I could've just made a single pool for storage, but I still like the idea of being able to separate my media storage from my OS.  Anyway, this is my plan for now.

Recalling that ZFS wants all the devices in a vdev to be the same size, I need to do some disk math to make sure the partition sizes are the same number of bytes.  Here's why (and don't laugh if this is all trivial to you; I'm a manager, okay?  If I don't see headcount or budget somewhere in this, I just get confused):

First of all, fdisk lets me specify partition sizes by either a percentage of the disk or a number of cylinders.  Specifying a percentage doesn't let me get precise enough to match the partition sizes on the 1.5TB disks and the 1TB disks, so I need to specify partition size in terms of cylinders.  But cylinders aren't the same size on the two different disks.

The fdisk utility reports the following information about the 1.5TB and 1TB disks:

1.5TB Disk geometry:
Total disk size is 60800 cyls
Cylinder size is 48195 512-byte blocks

1TB Disk geometry:
Total disk size is 60800 cyls
Cylinder size is 32130 512-byte blocks

Notice that one cylinder on the 1.5TB drive is 1.5 times the size of a cylinder on the 1TB drive (or 3/2; this way of reckoning comes in handy later): 48195 = 3/2 * 32130.
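The fdisk numbers above can be sanity-checked with a few lines of Python. This is only an illustrative sketch: the variable names are made up for the example, and the only inputs are the geometry figures fdisk reported.

```python
from fractions import Fraction

BLOCK_BYTES = 512  # fdisk reports cylinder sizes in 512-byte blocks

# Geometry as reported by fdisk: (cylinders, blocks per cylinder)
big_cyls, big_blocks_per_cyl = 60800, 48195      # 1.5TB drive
small_cyls, small_blocks_per_cyl = 60800, 32130  # 1TB drive

# Total capacity of each drive in bytes
big_bytes = big_cyls * big_blocks_per_cyl * BLOCK_BYTES
small_bytes = small_cyls * small_blocks_per_cyl * BLOCK_BYTES

# Exact ratio of one 1.5TB-drive cylinder to one 1TB-drive cylinder
cyl_ratio = Fraction(big_blocks_per_cyl, small_blocks_per_cyl)

print(big_bytes)    # 1500291072000 bytes, i.e. about 1.5TB
print(small_bytes)  # 1000194048000 bytes, i.e. about 1TB
print(cyl_ratio)    # 3/2
```

Using Fraction keeps the ratio exact, which matters here because the whole point is that the two cylinder sizes line up only at whole multiples of 3 and 2.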

I want to use as much of the 1TB drive as possible (60800 cylinders) but I can't: 60800 cylinders on the 1TB drive corresponds to 40533.33333 cylinders on the 1.5TB drive; I can't enter a non-integer number into fdisk.  I must find a size that works for both disks.  It needs to be a multiple of 3 cylinders on the 1TB drive (which would be a multiple of 2 cylinders on the 1.5TB drive).  I'll waste a little space (2 cylinders' worth on the 1TB drive or about 32MB), but that's okay given that I'll get RAID-Z error correction in return.

I'll create one partition on each 1TB disk: 60798 cylinders (the largest multiple of 3 that fits) == 1,953,439,740 blocks.
I'll create two partitions on each 1.5TB disk:
  1. 40532 cyls == 1,953,439,740 blocks
  2. 20268 cyls (use this for the OS "rpool")
Now that I know exactly how big each partition needs to be on the four disks, I can use this easy-to-follow example to create the Solaris fdisk partitions.  It's easy; it takes less than five minutes, once I've worked out the math I just laid out here.
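For anyone with a different pair of drives, the disk math above can be sketched in Python. This is a hypothetical helper, not a tool that ships with OpenSolaris: the function name and arguments are invented for illustration, and it assumes every cylinder on each disk is available for partitioning.

```python
from math import gcd

def matching_partition(cyls_a, blocks_per_cyl_a, cyls_b, blocks_per_cyl_b):
    """Largest partition size that is a whole number of cylinders on
    both disks and fits on both.

    Returns (cylinders_on_a, cylinders_on_b, size_in_blocks).
    """
    # Smallest block count that is a whole number of cylinders on both
    # disks (the least common multiple of the two cylinder sizes)
    step = blocks_per_cyl_a * blocks_per_cyl_b // gcd(blocks_per_cyl_a,
                                                      blocks_per_cyl_b)
    # The partition cannot be larger than the smaller disk
    limit = min(cyls_a * blocks_per_cyl_a, cyls_b * blocks_per_cyl_b)
    blocks = (limit // step) * step
    return blocks // blocks_per_cyl_a, blocks // blocks_per_cyl_b, blocks

# 1TB drive first, 1.5TB drive second, using the fdisk geometry above
small_cyls, big_cyls, blocks = matching_partition(60800, 32130, 60800, 48195)

print(small_cyls)        # 60798 cylinders on the 1TB drive
print(big_cyls)          # 40532 cylinders on the 1.5TB drive
print(blocks)            # 1953439740 blocks on either drive
print(60800 - big_cyls)  # 20268 cylinders left over for the OS partition
```

The least common multiple does the same job as the "multiple of 3 on one disk, multiple of 2 on the other" reasoning, but it works for any two cylinder sizes.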

Next, it's time to install the OS.  I'm already running OpenSolaris from the Live CD, so I just click on the icon to install and less than 20 minutes later, it's there.

Next: create the media storage pool, using all four disks in a RAID-Z configuration:
drapeau@blackfoot:$ pfexec format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
 0. c1t0d0  /pci@0,0/pci108e,534a@7/disk@0,0
 1. c1t1d0  /pci@0,0/pci108e,534a@7/disk@1,0 
 2. c2t0d0  /pci@0,0/pci108e,534a@8/disk@0,0
 3. c2t1d0  /pci@0,0/pci108e,534a@8/disk@1,0
Specify disk (enter its number): ^C

drapeau@blackfoot:$ zpool list
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
rpool 444G 13.7G 430G 3% ONLINE -

drapeau@blackfoot:$ pfexec zpool create mediapool raidz c2t1d0p1 c2t0d0p1 c1t1d0p2 c1t0d0p2

Note that I used partition names for these disks, which is important: according to this helpful document, in Solaris disk device names, you'll see four primary partitions (p1-p4) and a "p0" as well which means "the whole disk".  I had to be clear to tell ZFS that I didn't want to use the whole 1TB disks, only the 1st partition on them (c2t1d0p1, c2t0d0p1).  And I told ZFS to use the 2nd partitions on the 1.5TB disks (c1t1d0p2, c1t0d0p2), which are the roughly-1TB partitions.
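To make the naming scheme concrete, here is a small Python sketch that picks such a device name apart. It is an illustration only: the helper name is made up, and the reading of "c", "t" and "d" as controller, target and disk is the usual Solaris convention rather than something spelled out above.

```python
import re

# Matches names like c2t1d0p1 (fdisk partition) or c1t0d0s0 (slice)
DEVICE_RE = re.compile(r"^c(\d+)t(\d+)d(\d+)([ps])(\d+)$")

def describe(name):
    """Return a human-readable description of a Solaris disk device name."""
    match = DEVICE_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognized device name: {name}")
    ctrl, target, disk, kind, index = match.groups()
    index = int(index)
    if kind == "p":
        # p0 means the whole disk; p1-p4 are the primary fdisk partitions
        part = "whole disk" if index == 0 else f"fdisk partition {index}"
    else:
        part = f"slice {index}"
    return f"controller {ctrl}, target {target}, disk {disk}: {part}"

print(describe("c2t1d0p1"))  # controller 2, target 1, disk 0: fdisk partition 1
print(describe("c1t0d0p0"))  # controller 1, target 0, disk 0: whole disk
```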

So, did it work?  Let's see:
drapeau@blackfoot:$ zpool list
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
mediapool 3.62T 132K 3.62T 0% ONLINE -
rpool 444G 13.7G 430G 3% ONLINE -

So far, so good: two ZFS pools. Let's check status:
drapeau@blackfoot:$ zpool status
  pool: mediapool
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        mediapool     ONLINE       0     0     0
          raidz1      ONLINE       0     0     0
            c2t1d0p1  ONLINE       0     0     0
            c2t0d0p1  ONLINE       0     0     0
            c1t1d0p2  ONLINE       0     0     0
            c1t0d0p2  ONLINE       0     0     0

errors: No known data errors

  pool: rpool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c1t0d0s0  ONLINE       0     0     0

errors: No known data errors

drapeau@blackfoot:$ zfs list
NAME USED AVAIL REFER MOUNTPOINT
mediapool 92.0K 2.67T 26.9K /mediapool
rpool 21.7G 415G 72K /rpool
rpool/ROOT 5.74G 415G 18K legacy
rpool/ROOT/opensolaris 5.74G 415G 5.61G /
rpool/dump 8.00G 415G 8.00G -
rpool/export 634K 415G 19K /export
rpool/export/home 615K 415G 19K /export/home
rpool/export/home/drapeau 596K 415G 596K /export/home/drapeau
rpool/swap 8.00G 423G 16K -

drapeau@blackfoot:$
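The mediapool sizes in that output can be cross-checked against the partition math from earlier. The following is only a rough Python sketch. It assumes the "T" in the zpool and zfs output means 2**40 bytes, and that zpool list counts the raidz parity space while zfs list does not. It also ignores whatever space ZFS reserves for itself, which is presumably why the reported 3.62T and 2.67T come out a little lower than these figures.

```python
BLOCK_BYTES = 512
PARTITION_BLOCKS = 1_953_439_740  # size of each of the four matched partitions
DISKS = 4
PARITY_DISKS = 1  # raidz1 spends one disk's worth of space on parity
TIB = 2 ** 40     # assumption: "T" in the output means 2**40 bytes

# Raw space across all four partitions, parity included
raw_tib = DISKS * PARTITION_BLOCKS * BLOCK_BYTES / TIB
# Space left for data once one partition's worth goes to parity
usable_tib = (DISKS - PARITY_DISKS) * PARTITION_BLOCKS * BLOCK_BYTES / TIB

print(f"raw: {raw_tib:.2f} TiB")              # raw: 3.64 TiB
print(f"after parity: {usable_tib:.2f} TiB")  # after parity: 2.73 TiB
```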
Sweet.  Now I've got a mediapool configured as a four-disk RAID-Z, and I have the rpool but right now it's only using one disk.  I want to mirror it now, using the 2nd 1.5TB disk's extra space.  I'll do that right now, then ask ZFS for status (I'll omit ZFS's status report on the mediapool because we just saw that).  Oh, and I'll make sure that mirrored rpool is bootable; ZFS will remind me to do it, so I'll include my steps here:

drapeau@blackfoot:$ pfexec zpool attach rpool c1t0d0s0 c1t1d0p1
Please be sure to invoke installgrub(1M) to make 'c1t1d0p1' bootable.

drapeau@blackfoot:$ pfexec installgrub -m /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t1d0s0

Updating master boot sector destroys existing boot managers (if any).
continue (y/n)?y
stage1 written to partition 0 sector 0 (abs 48195)
stage2 written to partition 0, 267 sectors starting at 50 (abs 48245)
stage1 written to master boot sector

drapeau@blackfoot:$ zpool status
  pool: rpool
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h5m, 39.40% done, 0h9m to go
config:
        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c1t0d0s0  ONLINE       0     0     0  84.1M resilvered
            c1t1d0p1  ONLINE       0     0     0  5.41G resilvered

errors: No known data errors
drapeau@blackfoot:$ 
About 8 minutes later, zpool status reported that the 2nd drive in rpool was resilvered and I had a fully-mirrored rpool.  Now if one of the two drives fails, I can still boot the OS and replace the bad disk.  And with the mediapool, I'm protected against any one of the four disks failing.  I'm feeling nice and secure; it's unlikely that two disks will fail at once unless the whole computer goes up in flames.  I'll deal with backup later, maybe by looking into Zmanda or something.

This is great: to this point, I've decided how to set up my storage and protect it, I've installed the OS, and I've created my storage pools.

My next blog entry will describe how I set up the computer to share all that storage with the rest of the house.


Powered by ScribeFire.

Thursday Dec 18, 2008

My Home Media Server on OpenSolaris + ZFS: Part 1

A couple of people gave me some good pointers after my last blog entry, in which I was saying that doing an ssh or vncviewer into my new OpenSolaris installation was taking a while.  They pointed out that it may be reverse DNS lookups, not something about the OpenSolaris box itself, and that reminded me that recently I had changed something else about my home network setup: I had two hubs/routers active.  So, that mystery's solved; things are looking much better now and doing an ssh into the OpenSolaris box doesn't take so long.  Moving on...

My mission: use the current release of OpenSolaris (2008.11) as the basis for the main fileserver for home.  Right now, we've got several computers that have external disks attached to them for extra storage (music, photos, movies, etc.) and I want to centralize that for a couple of reasons:

  1. Friends of mine lost their house to a wildfire; fortunately, they had stored all of their critical data on a single computer with lots of disk so when they had to evacuate the house they grabbed one box and didn't lose any critical data.  Laugh all you want; when The Big One comes, I want to be ready.
  2. I'd like to simplify the administration of our home machines.  This is home, for goodness' sake; I don't want to be hiring a system administrator to keep our stuff in order.
I'll document here what I'm doing to build the home server.  My intent is to use OpenSolaris, use ZFS to manage the disks and files, and to do it with a cheap computer that I build myself from off-the-shelf parts (as opposed to, say, buying a Dell computer).  Besides, I want to put a bunch of disks in that computer and it's hard for me to find a cheap computer from Dell or HP with a bunch of internal drive bays, but you can pretty easily buy a reasonable computer enclosure with plenty of internal drive bays.  Building it yourself can save money, and I'm all about saving some money on this.

But first, I'm going to try this out on a computer that is known to work with OpenSolaris.  I'll get the setup running there to make sure that ZFS + OpenSolaris really is as easy and reliable as I think it is.  Once I'm convinced that works, I'll switch to my cheapo computer and see if OpenSolaris runs on that.

So here goes...

Step 1: Data Protection

I have four disks: two 1TB drives, two 1.5TB drives.  I'll split the larger drives into two partitions: 500GB for the operating system and the remaining 1TB for the big bucket-o-storage.  (Let's call it my media pool: a ZFS pool used primarily for storing audio, video, and photos.)

So my first decision to make is: how should I have ZFS protect my data against disk failure?  After all, I'm buying consumer-grade disk drives but the server will be on 24/7.  The disks will fail.  I don't want to lose my data just because I don't want to pay extra for more reliable disks.  I want the software (ZFS) to take care of the problem for me.  I start by looking at the ZFS Best Practices Guide to see what my options are.

I'm considering three options for ZFS protecting my data:
  1. mirror
  2. raidz (shorthand for "raidz1", meaning 1 disk can fail and I don't lose data)
  3. raidz2 (meaning 2 disks can fail and I don't lose data)
The guide points me to this blog entry by Roch Bourbonnais which tells me that the tradeoff I need to make is space versus performance: mirroring gives me maximum performance but cuts my disk storage in half: 4 TB of storage over 4 disks would take 50% overhead to protect the data by mirroring, leaving me a 2TB storage pool for storing movies, photos, music, and the like.  I don't need super-high performance but I want as much usable space as I can get out of my disks, so I choose raidz which should give me about 3TB of usable space; I lose 1TB to data protection, which sounds fine to me.  Later, I may buy a fifth disk and use raidz2 to give me even more robust data protection, but I'm not going to do that right now.
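The space side of that tradeoff is simple arithmetic, sketched here in Python. The function is a made-up illustration, not part of ZFS. It assumes equal-sized devices and two-way mirrors, and it ignores filesystem overhead, so real pools will report somewhat less.

```python
def usable_tb(devices, device_tb, layout):
    """Approximate usable space in TB for a pool of equal-sized devices."""
    if layout == "mirror":   # two-way mirrors: half the raw space
        return devices * device_tb / 2
    if layout == "raidz":    # raidz1: one device's worth of parity
        return (devices - 1) * device_tb
    if layout == "raidz2":   # raidz2: two devices' worth of parity
        return (devices - 2) * device_tb
    raise ValueError(f"unknown layout: {layout}")

# Four 1TB devices, as in the plan above
for layout in ("mirror", "raidz", "raidz2"):
    print(layout, usable_tb(4, 1.0, layout))
# mirror 2.0
# raidz 3.0
# raidz2 2.0
```

With four devices, raidz2 costs as much space as mirroring. With a fifth 1TB device, usable_tb(5, 1.0, "raidz2") comes back to 3.0.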

Now that I've decided how to protect my data, I just need to create the appropriate partitions on the larger disks, and I'll be ready to install the OS.  I'll document that in my next blog entry.




About

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle. What more do you need to know, really?
