More ZFS Goodness: The OpenSolaris Build Machine


Apart from my usual LDAP'ing, I also try to help the OpenSolaris team with anything I can.

Lately, I've helped build their new x64 build rig, for which I carefully selected the best components out there while trying to keep the overall budget on a leash. It came out at about $5k. Not on the cheap side, but cheaper than most machines in most data centers.

The components:

  • 2 Intel Xeon E5520 HyperThreaded quad-cores @ 2.27GHz (16 CPUs in Solaris)
  • 2 × 32GB Intel X25 SSDs
  • 2 × 2TB WD drives
  • 24GB of ECC DDR2

I felt compelled to follow up my previous post about making the most out of your SSDs because some people commented that non-mirrored pools were evil. Well, here's how things are set up this time: to avoid using either of the relatively small SSDs for the system, I partitioned the two big 2TB drives with exactly the same layout, a 100GB partition for the system, with the rest of each disk holding our data. That leaves the SSDs available for the ZIL and the L2ARC. Thinking about it, though, the ZIL is never going to take up an entire 32GB SSD, so I partitioned one of the SSDs with a 3GB slice for the ZIL and left the rest for the L2ARC.
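As a sketch, a pool with that layout could be created along these lines. This is not the exact set of commands used on the build machine, just an illustration of the technique; the device names are the ones that show up in the `zpool status` output below, and your controller numbering will differ:

```shell
# Hypothetical re-creation of the layout described above -- a sketch,
# not the exact commands run on this box.

# Mirrored data pool on the big second partition of each 2TB drive:
zpool create data mirror c5d1p2 c6d1p2

# 3GB slice of one SSD as a separate intent log (slog):
zpool add data log c6d0p1

# The remaining SSD space as L2ARC cache devices:
zpool add data cache c5d0p1 c6d0p2
```

The rpool mirror is normally set up at install time (or with `zpool attach` afterwards), so it is not shown here.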

The result is a system with 24GB of RAM for the level-1 ZFS cache (ARC) and 57GB of L2ARC in combination with a 3GB ZIL, so we know it will be fast. But the icing on the cache ... the cake, sorry, is that the rpool is mirrored. And so is the data pool.
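Once the box is under load, you can check what the caches are actually holding by reading the ZFS kstats. A small sketch of the interface (the values reported will obviously be specific to your machine):

```shell
# Current ARC size in bytes (the level-1 cache in RAM):
kstat -p zfs:0:arcstats:size

# Current L2ARC size in bytes (data held on the cache SSDs):
kstat -p zfs:0:arcstats:l2_size
```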

Here's how it looks: 

admin@factory:~$ zpool status
  pool: data
 state: ONLINE
 scrub: none requested
config:

    NAME        STATE     READ WRITE CKSUM
    data        ONLINE       0     0     0
      c5d1p2    ONLINE       0     0     0
      c6d1p2    ONLINE       0     0     0
    logs
      c6d0p1    ONLINE       0     0     0
    cache
      c5d0p1    ONLINE       0     0     0
      c6d0p2    ONLINE       0     0     0

errors: No known data errors

  pool: rpool
 state: ONLINE
 scrub: none requested
config:

    NAME        STATE     READ WRITE CKSUM
    rpool       ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        c5d1s0  ONLINE       0     0     0
        c6d1p1  ONLINE       0     0     0

errors: No known data errors
admin@factory:~$

This is a really good example of how to set up a real-life machine designed to be robust and fast without compromise. This rig achieves performance on par with $40k+ servers, and THAT is why ZFS is so compelling.

Comments:

Um, is the data pool mirrored? Looks like a stripe to me.

Posted by Sean on December 21, 2009 at 02:24 AM MST #

Good point, I grabbed the wrong zpool status...

Here's the right output:

admin@factory:~$ zpool status
  pool: data
 state: ONLINE
 scrub: none requested
config:

    NAME        STATE     READ WRITE CKSUM
    data        ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        c5d1p2  ONLINE       0     0     0
        c6d1p2  ONLINE       0     0     0
    logs
      c6d0p1    ONLINE       0     0     0
    cache
      c5d0p1    ONLINE       0     0     0
      c6d0p2    ONLINE       0     0     0

errors: No known data errors

  pool: rpool
 state: ONLINE
 scrub: none requested
config:

    NAME        STATE     READ WRITE CKSUM
    rpool       ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        c5d1s0  ONLINE       0     0     0
        c6d1p1  ONLINE       0     0     0

errors: No known data errors

Posted by arnaud on December 21, 2009 at 03:51 AM MST #

Can the ZIL be mirrored? It strikes me that there would surely be data loss (and the pool would be unavailable) in the event of a failure of c6d0p1.

I have never used a cache device myself, though, so I am not sure this is possible...

Posted by danny on December 21, 2009 at 10:12 AM MST #

Hey Danny, of course you're right: ZILs (slogs) can indeed be mirrored for extra reliability.
There would not necessarily be data loss if a single ZIL device were lost, though. Some transactions could be lost, but it's not a given.
If, for example, you were to yank the SSD out of your rig while writing to disk, ZFS would automatically fall back to colocating the ZIL on your pool disks. In effect you would only lose the performance benefit of having the ZIL separated out from the disks; things would continue to work. At most, yanking the SSD would cost you the "synchronous" transactions transiently held on the ZIL that had not yet been committed to disk. All async disk writes bypass the ZIL anyway.
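For what it's worth, mirroring an existing slog is just a matter of attaching a second device to it. A sketch, using a made-up spare SSD partition (c7d0p1 is hypothetical, not a device on this box):

```shell
# Attach a second (hypothetical) SSD partition to the existing log device,
# turning the slog into a mirror that survives the loss of either device:
zpool attach data c6d0p1 c7d0p1
```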

For more details on the ZIL, I will defer to Neil Perrin, who really is the ZIL person. Check out his blog here: http://blogs.sun.com/perrin/

regards
-=arnaud=-

Posted by arnaud on December 22, 2009 at 02:30 AM MST #

Arnaud, if you have the time, I'd like to get your feedback (or see a post) on running the EON 0.59.9 64-bit CIFS version on this hardware.

Posted by andre on December 22, 2009 at 07:33 AM MST #

Sorry, I forgot the URL for EON: http://eonstorage.blogspot.com

Posted by andre on December 22, 2009 at 07:35 AM MST #

I will have to check with the OpenSolaris gatekeeper. The machine is now used to make their image builds a lot quicker, so I may not be able to pull it out of production to try EON. I do have another machine I might be able to try it on, though; stay tuned.

Posted by arnaud on December 22, 2009 at 07:41 AM MST #
