Friday Dec 18, 2009

More ZFS Goodness: The OpenSolaris Build Machine


Apart from my usual LDAP'ing, I also (try to) help the OpenSolaris team with anything I can.

Lately, I've helped build their new x64 build rig, carefully selecting the best components out there while keeping the overall budget on a leash. It came out at about $5k: not on the cheap side, but cheaper than most machines in most data centers.

The components:

  • 2 Intel Xeon E5520 hyper-threaded quad-cores @ 2.27GHz (16 CPUs in Solaris)
  • 2 32GB Intel X25 SSDs
  • 2 2TB Western Digital drives
  • 24GB of ECC DDR2

I felt compelled to follow up my previous post about making the most out of your SSD because some people commented that non-mirrored pools were evil. Well, here's how this box is set up: to avoid using either of the relatively small SSDs for the system, I partitioned the two big 2TB drives with exactly the same layout, one 100GB partition for the system, with the rest of each disk holding our data. That leaves the SSDs available for the ZIL and the L2ARC. But the ZIL is never going to take up an entire 32GB SSD, so I partitioned one of the SSDs with a 3GB slice for the ZIL and gave the rest to the L2ARC.
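In zpool terms, that layout comes together in just a few commands. Here's a sketch (the device names are the ones you'll see in the status output below, and the mirrored data layout is the one I just described):

pfexec zpool create data mirror c5d1p2 c6d1p2    # the data partitions on the two spindles
pfexec zpool add data log c6d0p1                 # the 3GB SSD slice as separate ZIL
pfexec zpool add data cache c5d0p1 c6d0p2        # everything else on the SSDs as L2ARC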

The result is a system with 24GB of RAM for the level-1 ZFS cache (ARC) and 57GB of L2ARC, in combination with a 3GB ZIL. So we know it will be fast. But the icing on the cache ... the cake, sorry, is that the rpool is mirrored. And so is the data pool.

Here's how it looks: 

admin@factory:~$ zpool status
  pool: data
 state: ONLINE
 scrub: none requested
config:

    NAME        STATE     READ WRITE CKSUM
    data        ONLINE       0     0     0
      c5d1p2    ONLINE       0     0     0
      c6d1p2    ONLINE       0     0     0
    logs
      c6d0p1    ONLINE       0     0     0
    cache
      c5d0p1    ONLINE       0     0     0
      c6d0p2    ONLINE       0     0     0

errors: No known data errors

  pool: rpool
 state: ONLINE
 scrub: none requested
config:

    NAME        STATE     READ WRITE CKSUM
    rpool       ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        c5d1s0  ONLINE       0     0     0
        c6d1p1  ONLINE       0     0     0

errors: No known data errors
admin@factory:~$
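Once the box is under load, you can watch both cache tiers fill up with kstat; the ARC statistics live in the arcstats kstat, with sizes reported in bytes:

kstat -p zfs:0:arcstats:size       # current ARC size
kstat -p zfs:0:arcstats:l2_size    # current L2ARC size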

This is a really good example of how to set up a real-life machine designed to be robust and fast without compromise. This rig achieves performance on par with servers costing $40k and up. And THAT is why ZFS is so compelling.

Monday Dec 14, 2009

Make The Most Of Your SSD With ZFS


Rationale

If you're anything like me, you're lucky if you have a single SSD in your workstation or server. I used to face a dilemma: I couldn't quite find a way to make the most of my one flash drive, since I had to choose between making it a ZIL or using it as L2ARC. I dreaded having to make a definitive choice for one or the other. So when I installed my workstation with OpenSolaris 2009.06, I had an idea in mind: I installed the system on the SSD in a small partition (10GB) and left the rest of the drive unallocated, if you catch my drift...

Bird's Eye View

Simple! Just partition the SSD so it can serve as both L2ARC and ZIL, in whatever proportions you think are going to suit your needs. Note, however, that I/Os are shared between the partitions on the same drive. From my testing, though, I can tell you that with this setup you still come out on top in most situations.

The Meat

It's all pretty simple, really: when you install OpenSolaris, you have the choice of installing on the whole disk or using the installer to carve out a smaller custom partition. I cut out a 36GB partition, which allows ample room for the system and swap. The rest of my 64GB SSD is left unallocated at install time; we'll take care of everything later.
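If you want to double-check what the installer left behind, you can dump the SSD's fdisk partition table (here I'm assuming the drive shows up as c9d0, like mine does later in this post):

pfexec fdisk -W - /dev/rdsk/c9d0p0    # prints the partition table to stdout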

The second disk in my system is a 300GB 10,000rpm SATA drive which, being fast but small, I wanted to leave whole for my data pool (keep in mind that the rpool is a little different from a regular pool, so make sure to treat it accordingly). That is why I decided to compromise and use some of the SSD space for the system. You don't have to, though: you could partition your spindle and put the system on there instead.

Now that you have OpenSolaris up and running, install GParted so you can edit your disk partitions. You can either use the OpenSolaris package manager or run:

pfexec pkg install SUNWGParted

It's all downhill from here. Open GParted. If you just installed it, you will need to log out and back in to see it in the GNOME menu, under Applications->System Tools->GParted Partition Editor.

Select your flash drive, carve out a 2GB partition for your ZIL, and assign the remaining space to the L2ARC. Apply the changes and keep the window open.


Note the two devices' paths in /dev/dsk, because that's what we'll use to add these two SSD partitions as performance-enhancing tools in our existing pool.
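If you'd rather grab those paths from a terminal, the partition device nodes are easy to list (again assuming the SSD is c9d0):

ls /dev/dsk/c9d0p*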

arnaud@ioexception:/data/dsee7.0/instances$ pfexec zpool add data log /dev/dsk/c9d0p2 cache /dev/dsk/c9d0p3

Let's check how our pool looks now...

arnaud@ioexception:/data/dsee7.0/instances$ zpool status data
  pool: data
 state: ONLINE
 scrub: none requested
config:

    NAME        STATE     READ WRITE CKSUM
    data        ONLINE       0     0     0
      c8d0      ONLINE       0     0     0
    logs
      c9d0p2    ONLINE       0     0     0
    cache
      c9d0p3    ONLINE       0     0     0

errors: No known data errors

Et voilà!

You've got the best of both worlds, making the absolute most of whatever little hardware you had at your disposal!
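And nothing is set in stone: should you want to reclaim the SSD later, a cache device can be dropped from a live pool, and on recent pool versions (19 and up) so can the slog. Something along these lines:

pfexec zpool remove data c9d0p3    # drop the L2ARC partition
pfexec zpool remove data c9d0p2    # drop the slog (needs pool version 19+)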

Enjoy!

Add And Remove ZILs Live!


Rationale

Ever been playing with separate logs (separate ZIL, slog, logzilla, etc.) and had to rebuild the pool every time you wanted to yank the slog out?

Not so anymore! The ZIL can now be added and removed as you like, which is truly fantastic for tinkering with it and observing its actual impact on performance. Here's a quick walk through one of the most painless migrations of all time.

Bird's Eye View

As separate ZILs get more and more exposure in production environments, technical staff are experimenting more and more in order to make recommendations, and the new removal feature adds a lot of flexibility for trying creative combinations.

The Meat

Let's suppose you have a pool with a separate log:

arnaud@ioexception:/data/dsee7.0/instances$ zpool status data
  pool: data
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
    still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
    pool will no longer be accessible on older software versions.
 scrub: none requested
config:

    NAME                      STATE     READ WRITE CKSUM
    data                      ONLINE       0     0     0
      c8d0                    ONLINE       0     0     0
    logs
      /dev/ramdisk/zil-drive  ONLINE       0     0     0

errors: No known data errors
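For reference, a ramdisk-backed slog like this one only takes a couple of commands to set up (the 1GB size is just an example):

pfexec ramdiskadm -a zil-drive 1g                   # create the ramdisk
pfexec zpool add data log /dev/ramdisk/zil-drive    # attach it as a separate log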

If you try to remove the log by removing the actual separate log device, you'll get the following error:

arnaud@ioexception:/data/dsee7.0/instances$ pfexec ramdiskadm -d zil-drive
ramdiskadm: couldn't delete ramdisk "zil-drive": Device busy

If you now try to use the zpool remove command, you will also hit a wall:


arnaud@ioexception:/data/dsee7.0/instances$ zpool remove data log /dev/ramdisk/zil-drive
cannot remove log: no such device in pool
cannot remove /dev/ramdisk/zil-drive: pool must be upgraded to support log removal

So let's just follow up on the suggestion and upgrade the pool: 



arnaud@ioexception:/data/dsee7.0/instances$ pfexec zpool upgrade data
This system is currently running ZFS pool version 22.

Successfully upgraded 'data' from version 14 to version 22
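You can confirm the on-disk format at any point, since the version is exposed as a pool property:

zpool get version data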

arnaud@ioexception:/data/dsee7.0/instances$ zpool status data
  pool: data
 state: ONLINE
 scrub: none requested
config:

    NAME                      STATE     READ WRITE CKSUM
    data                      ONLINE       0     0     0
      c8d0                    ONLINE       0     0     0
    logs
      /dev/ramdisk/zil-drive  ONLINE       0     0     0

errors: No known data errors

Wow, that's quick, easy and smooth. Gotta love migrations of that sort. Let's now try to remove our separate log:


arnaud@ioexception:/data/dsee7.0/instances$ pfexec zpool remove data log /dev/ramdisk/zil-drive
cannot remove log: no such device in pool
arnaud@ioexception:/data/dsee7.0/instances$ zpool status data
  pool: data
 state: ONLINE
 scrub: none requested
config:

    NAME        STATE     READ WRITE CKSUM
    data        ONLINE       0     0     0
      c8d0      ONLINE       0     0     0

errors: No known data errors

So even though it barfed an error message, my separate log has been removed from the pool and I can decommission the device. (The leftover error is just zpool remove choking on the word "log": unlike zpool add, the remove subcommand takes only the device path.)

arnaud@ioexception:/data/dsee7.0/instances$ pfexec ramdiskadm -d zil-drive
arnaud@ioexception:/data/dsee7.0/instances$

And I can now create a new device and do some more testing. Props to the ZFS team for the ever-improving level of service this file system brings to the table!
