My Home Media Server on OpenSolaris + ZFS: Part 2

In my previous blog entry, I decided how ZFS will protect my data for a home media server I'm building.  Next: partition the disks on my two larger drives and install OpenSolaris on them.

This would be stupendously easy if my four disk drives were all the same size: I would type "zpool create mediapool raidz <disk1> <disk2> <disk3> <disk4>" and ZFS would give me a ton of storage all nice and protected for me.  But I have two 1TB drives and two 1.5TB drives.   My problem: ZFS wants all the pieces of a "vdev" (a virtual device; in this case, I'm creating a virtual RAID-Z device with four disks in it) to be the same size.  So I have some partitioning work to do.  I'm documenting what I did in case any of you want to use ZFS with different sized drives.

Here is my plan:
  • make sure the 1.5TB drives are the 1st and 2nd drives seen by the computer's BIOS, so that I can install OpenSolaris on one of these bigger drives
  • partition each 1.5TB drive into a 1TB partition and a .5TB partition (I recommend doing the partitioning from the Live CD instead of after installing the OS; it went easier for me this way)
  • install OpenSolaris onto the first 1.5TB drive's .5TB partition; installation will create a ZFS pool called "rpool"
  • put the four 1TB partitions into a ZFS raidz pool I will call "mediapool", my primary storage for our home's stuff
  • attach the remaining .5TB partition (from the second 1.5TB drive) to "rpool", making it a ZFS mirror pool so that the OS is protected against a single disk failure
I suppose I could've just made a single pool for storage, but I still like the idea of being able to separate my media storage from my OS.  Anyway, this is my plan for now.

Recalling that ZFS wants all the devices in a vdev to be the same size, I need to do some disk math to make sure the partition sizes are the same number of bytes.  Here's why (and don't laugh if this is all trivial to you; I'm a manager, okay?  If I don't see headcount or budget somewhere in this, I just get confused):

First of all, fdisk lets me specify partition sizes by either a percentage of the disk or a number of cylinders.  Specifying a percentage doesn't let me get precise enough to match the partition sizes on the 1.5TB disks and the 1TB disks, so I need to specify partition size in terms of cylinders.  But cylinders aren't the same size on the two different disks.

The fdisk utility reports the following information about the 1.5TB and 1TB disks:

1.5TB Disk geometry:
Total disk size is 60800 cyls
Cylinder size is 48195 512-byte blocks

1TB Disk geometry:
Total disk size is 60800 cyls
Cylinder size is 32130 512-byte blocks

Notice that one cylinder on the 1.5TB drive is 1.5 times the size (or 3/2; this way of reckoning comes in handy later) of a cylinder on the 1TB drive (48195 = 3/2 * 32130).
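For anyone who wants to double-check these numbers, here is a minimal Python sketch (an illustration added for this write-up, not output from fdisk) that multiplies out the geometry fdisk reported. The function name disk_bytes is hypothetical; the cylinder counts and cylinder sizes are the ones listed above.

```python
from fractions import Fraction

BLOCK_BYTES = 512  # fdisk reports cylinder size in 512-byte blocks

def disk_bytes(cylinders, blocks_per_cylinder):
    """Total capacity implied by the geometry fdisk reports."""
    return cylinders * blocks_per_cylinder * BLOCK_BYTES

big = disk_bytes(60800, 48195)    # the 1.5TB drive
small = disk_bytes(60800, 32130)  # the 1TB drive

# Size of one cylinder on the big drive relative to one on the small drive
cylinder_ratio = Fraction(48195, 32130)

print(big, small, cylinder_ratio)
# prints: 1500291072000 1000194048000 3/2
```

The totals come out at roughly 1.5 and 1.0 trillion bytes, and the cylinder ratio reduces to exactly 3/2, which is what makes the matching-size trick possible.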

I want to use as much of the 1TB drive as possible (60800 cylinders) but I can't: 60800 cylinders on the 1TB drive corresponds to 40533.33333 cylinders on the 1.5TB drive; I can't enter a non-integer number into fdisk.  I must find a size that works for both disks.  It needs to be a multiple of 3 cylinders on the 1TB drive (which would be a multiple of 2 cylinders on the 1.5TB drive).  I'll waste a little space (2 cylinders' worth on the 1TB drive or about 32MB), but that's okay given that I'll get RAID-Z error correction in return.

I'll create one partition on the 1TB disk: 60798 cylinders (the largest multiple of 3 that fits) == 1,953,439,740 blocks.
I'll create two partitions on the 1.5TB disk:
  1. 40532 cyls == 1,953,439,740 blocks
  2. 20268 cyls (use this for the OS "rpool")
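The disk math above can be sketched in a few lines of Python. This is only an illustration under the assumptions stated in the post (both disks report 60800 cylinders, with 32130 and 48195 blocks per cylinder); the function name matching_partition is made up for this example.

```python
from math import gcd

def matching_partition(cyls_small, blocks_small, blocks_big):
    """Largest partition that is a whole number of cylinders on both
    disks and fits within cyls_small cylinders of the smaller disk.

    Returns (cylinders on small disk, cylinders on big disk, blocks).
    """
    # Smallest block count that is a whole number of cylinders on both disks
    unit_blocks = blocks_small * blocks_big // gcd(blocks_small, blocks_big)
    # How many of those units fit on the smaller disk
    units = (cyls_small * blocks_small) // unit_blocks
    total_blocks = units * unit_blocks
    return total_blocks // blocks_small, total_blocks // blocks_big, total_blocks

small_cyls, big_cyls, blocks = matching_partition(60800, 32130, 48195)
leftover_big = 60800 - big_cyls  # cylinders left on the 1.5TB disk for rpool

print(small_cyls, big_cyls, blocks, leftover_big)
# prints: 60798 40532 1953439740 20268
```

The unit here works out to 3 cylinders on the 1TB disk (2 on the 1.5TB disk), which is why the partition size has to be a multiple of 3 cylinders on the smaller drive.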
Now that I know exactly how big each partition needs to be on the four disks, I can use this easy-to-follow example to create the Solaris fdisk partitions.  It's easy; it takes less than five minutes, once I've worked out the math I just laid out here.

Next, it's time to install the OS.  I'm already running OpenSolaris from the Live CD, so I just click on the icon to install and less than 20 minutes later, it's there.

Next: create the media storage pool, using all four disks in a RAID-Z configuration:
drapeau@blackfoot:$ pfexec format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
 0. c1t0d0  /pci@0,0/pci108e,534a@7/disk@0,0
 1. c1t1d0  /pci@0,0/pci108e,534a@7/disk@1,0 
 2. c2t0d0  /pci@0,0/pci108e,534a@8/disk@0,0
 3. c2t1d0  /pci@0,0/pci108e,534a@8/disk@1,0
Specify disk (enter its number): ^C

drapeau@blackfoot:$ zpool list
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
rpool 444G 13.7G 430G 3% ONLINE -

drapeau@blackfoot:$ pfexec zpool create mediapool raidz c2t1d0p1 c2t0d0p1 c1t1d0p2 c1t0d0p2

Note that I used partition names for these disks, which is important: according to this helpful document, in Solaris disk device names, you'll see four primary partitions (p1-p4) and a "p0" as well which means "the whole disk".  I had to be clear to tell ZFS that I didn't want to use the whole 1TB disks, only the 1st partition on them (c2t1d0p1, c2t0d0p1).  And I told ZFS to use the 2nd partitions on the 1.5TB disks (c1t1d0p2, c1t0d0p2), which are the roughly-1TB partitions.
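To make the naming scheme concrete, here is a small Python sketch that splits a device name of the cNtNdN form used in this post into its parts. It is a hypothetical helper written for illustration (parse_device is not a Solaris tool), and it assumes only the conventions described above: "p" followed by a number is an fdisk partition, "p0" is the whole disk, and "s" followed by a number is a Solaris slice.

```python
import re

# cNtNdN, optionally followed by pN (fdisk partition) or sN (Solaris slice)
DEVICE_RE = re.compile(r"^c(\d+)t(\d+)d(\d+)(?:([ps])(\d+))?$")

def parse_device(name):
    """Split a name such as 'c2t1d0p1' into controller, target, disk, part."""
    m = DEVICE_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized device name: {name}")
    controller, target, disk, kind, index = m.groups()
    if kind == "p":
        # p0 is the special name for the whole disk
        part = "whole disk" if index == "0" else f"fdisk partition {index}"
    elif kind == "s":
        part = f"slice {index}"
    else:
        part = "unspecified"
    return {
        "controller": int(controller),
        "target": int(target),
        "disk": int(disk),
        "part": part,
    }

for dev in ("c2t1d0p1", "c1t0d0p2", "c1t0d0p0", "c1t0d0s0"):
    print(dev, parse_device(dev))
```

Run against the names in this post, it shows that c2t1d0p1 is the first fdisk partition on controller 2, target 1, while c1t0d0s0 (the device rpool was installed on) is a slice rather than an fdisk partition.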

So, did it work?  Let's see:
drapeau@blackfoot:$ zpool list
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
mediapool 3.62T 132K 3.62T 0% ONLINE -
rpool 444G 13.7G 430G 3% ONLINE -
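As a rough cross-check of the 3.62T figure, the following Python sketch works out what four partitions of 1,953,439,740 blocks should add up to. It is a back-of-the-envelope estimate, not how ZFS computes sizes: the function raidz1_sizes is hypothetical, and it assumes simply that zpool list counts all four devices while single-parity RAID-Z gives up about one device's worth of space to parity.

```python
TIB = 2 ** 40  # zpool reports sizes in binary units

def raidz1_sizes(n_devices, blocks_per_device, block_bytes=512):
    """Return (raw bytes, approximate usable bytes) for a raidz1 vdev."""
    raw = n_devices * blocks_per_device * block_bytes
    # One device's worth of space goes to parity
    usable = raw * (n_devices - 1) // n_devices
    return raw, usable

raw, usable = raidz1_sizes(4, 1953439740)
print(f"raw    ~ {raw / TIB:.2f} TiB")     # prints: raw    ~ 3.64 TiB
print(f"usable ~ {usable / TIB:.2f} TiB")  # prints: usable ~ 2.73 TiB
```

The raw estimate lands just above the 3.62T that zpool list shows; the small difference is presumably space ZFS keeps for its own labels and metadata, though that is an assumption rather than something the output states.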

So far, so good: two ZFS pools. Let's check status:
drapeau@blackfoot:$ zpool status
 pool: mediapool
state: ONLINE
scrub: none requested
config:

 NAME STATE READ WRITE CKSUM
 mediapool ONLINE 0 0 0
 raidz1 ONLINE 0 0 0
 c2t1d0p1 ONLINE 0 0 0
 c2t0d0p1 ONLINE 0 0 0
 c1t1d0p2 ONLINE 0 0 0
 c1t0d0p2 ONLINE 0 0 0

errors: No known data errors

 pool: rpool
state: ONLINE
scrub: none requested
config:

 NAME STATE READ WRITE CKSUM
 rpool ONLINE 0 0 0
 c1t0d0s0 ONLINE 0 0 0

errors: No known data errors

drapeau@blackfoot:$ zfs list
NAME USED AVAIL REFER MOUNTPOINT
mediapool 92.0K 2.67T 26.9K /mediapool
rpool 21.7G 415G 72K /rpool
rpool/ROOT 5.74G 415G 18K legacy
rpool/ROOT/opensolaris 5.74G 415G 5.61G /
rpool/dump 8.00G 415G 8.00G -
rpool/export 634K 415G 19K /export
rpool/export/home 615K 415G 19K /export/home
rpool/export/home/drapeau 596K 415G 596K /export/home/drapeau
rpool/swap 8.00G 423G 16K -

drapeau@blackfoot:$
Sweet.  Now I've got a mediapool configured as a four-disk RAID-Z, and I have the rpool but right now it's only using one disk.  I want to mirror it now, using the 2nd 1.5TB disk's extra space.  I'll do that right now, then ask ZFS for status (I'll omit ZFS's status report on the mediapool because we just saw that).  Oh, and I'll make sure that mirrored rpool is bootable; ZFS will remind me to do it, so I'll include my steps here:

drapeau@blackfoot:$ pfexec zpool attach rpool c1t0d0s0 c1t1d0p1
Please be sure to invoke installgrub(1M) to make 'c1t1d0p1' bootable.

drapeau@blackfoot:$ pfexec installgrub -m /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t1d0s0

Updating master boot sector destroys existing boot managers (if any).
continue (y/n)?y
stage1 written to partition 0 sector 0 (abs 48195)
stage2 written to partition 0, 267 sectors starting at 50 (abs 48245)
stage1 written to master boot sector

drapeau@blackfoot:$ zpool status
  pool: rpool
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h5m, 39.40% done, 0h9m to go
config:
        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c1t0d0s0  ONLINE       0     0     0  84.1M resilvered
            c1t1d0p1  ONLINE       0     0     0  5.41G resilvered

errors: No known data errors
drapeau@blackfoot:$ 
About 8 minutes later, zpool status reported that the 2nd drive in rpool was resilvered and I had a fully-mirrored rpool.  Now if one of the two drives fails, I can still boot the OS and replace the bad disk.  And with the mediapool, I'm protected against any one of the four disks failing.  I'm feeling nice and secure; it's unlikely that two disks will fail at once unless the whole computer goes up in flames.  I'll deal with backup later, maybe by looking into Zmanda or something.
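The roughly eight-minute wait is about what a straight-line extrapolation of the resilver progress line suggests. The Python sketch below is a hypothetical helper (estimate_remaining_minutes is not part of ZFS) that parses a line in the exact format shown in the zpool status output above and assumes the resilver continues at a constant rate, which real resilvers need not do.

```python
import re

# Matches the progress line format shown in this post's zpool status output
SCRUB_RE = re.compile(r"resilver in progress for (\d+)h(\d+)m, ([\d.]+)% done")

def estimate_remaining_minutes(scrub_line):
    """Linear estimate of the minutes left, from elapsed time and percent done."""
    m = SCRUB_RE.search(scrub_line)
    if m is None:
        raise ValueError("not a resilver progress line")
    hours, minutes, percent = int(m.group(1)), int(m.group(2)), float(m.group(3))
    if percent <= 0:
        raise ValueError("no progress yet, cannot extrapolate")
    elapsed = hours * 60 + minutes
    total = elapsed * 100.0 / percent  # projected total duration
    return total - elapsed

line = " scrub: resilver in progress for 0h5m, 39.40% done, 0h9m to go"
print(round(estimate_remaining_minutes(line), 1))
# prints: 7.7
```

ZFS's own estimate in that line was 9 minutes; the simple extrapolation gives a little under 8, which matches what actually happened here.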

This is great: to this point, I've decided how to set up my storage and protect it, I've installed the OS, and I've created my storage pools.

My next blog entry will describe how I set up the computer to share all that storage with the rest of the house.



Comments:

Hi George, nice post. Are you planning to share some photos of the box? Have you made it noiseless?
By the way, I have a comment on mirroring slice s0 to partition p1.
From my personal experience, I would not recommend doing this. You may find that p1 is slightly bigger than s0, which means you cannot attach s0 back to p1 if you recreate s0 after a disk loss.
My situation was the following: I lost s0 and had trouble booting ZFS, which was taking the whole of p1.
In my case, it led to a reinstall on s0 and copying the system configuration files from p1.
In any case, you may want to physically detach the drive containing s0 and try to boot from p1.

Posted by Roman on December 23, 2008 at 05:56 PM PST #

Hi George,

I am trying to use 1.5TB drives and am having a terrible time doing so. It seems as if Solaris chokes on the size of the drives. If you have a minute, please check out the following discussion and see if you might have any insight:

http://www.opensolaris.org/jive/thread.jspa?messageID=321192&#321192

Thank you!

Posted by Michael on December 24, 2008 at 01:47 AM PST #

Can you please add a comment that tells us which media server software you are using and if you had to compile it? I'm wanting to set up a media server for my PS3 using my existing OpenSolaris home server and am anxiously awaiting your next blog entry about the media server software you chose, and how difficult the setup was. Thanks

Posted by Ryan de Laplante on January 02, 2009 at 02:36 PM PST #

I'm dying to know how you're going to set up this box to share media to your devices.

My OpenSolaris server currently shares its 2TB RaidZ volume over CIFS to my various Linux/Windows/Mac machines, but I have yet to find a workable solution for uPnP and streaming to either my PS3 or my Xbox360.

Posted by Joshua Wilsie on March 03, 2009 at 04:42 AM PST #

Just a note on your geometry stats: The disk does not report physical geometry. Actual number of cylinders on a 1.5TB drive (for example) is something like 165k (taken from disk geometry and a stated track density of 190'000/in for seagates new 1.5TB models). Given that these things use 4 heads, that gives you a real number of 660'000 cylinders at 1 head and an average cylinder length of 4440 sectors.

Posted by Arno Wagner on September 11, 2009 at 11:13 PM PDT #

Is there a way to convince OpenSolaris to detect the rest of the computer outside of Solaris's own installed files and folders? I realize that most people probably don't care to see the rest of their computer's files, folders, and disks, but if you wanted to see what else is on your computer, outside of the immediate folders that were installed by OpenSolaris, is that possible?

Posted by Anonymous on March 04, 2010 at 10:27 AM PST #

@Anonymous: sure, OpenSolaris can look at other files and directories beyond simply what it installed itself, but can you say more about what you mean? What would you like OpenSolaris to be watching or scanning for? It also depends on how you create filesystems; if you create your filesystems right, you can put your personal files there, and OpenSolaris of course can examine it just as any other filesystem.

Say more about what you're after, if you would, please. I'm happy to help if I can.

Posted by George Drapeau on March 05, 2010 at 07:13 AM PST #
