Thursday Dec 03, 2009

OSDevCon 2009 Paper: Implementing a simple ZFS Auto-Scrub Service with SMF, RBAC, IPS and Visual Panels Integration - Lessons learned

A while ago, I wrote a little tool that helps you keep your ZFS pools clean by automatically running regular scrubs, similar to what the OpenSolaris auto-snapshot service does.

The lessons I learned during development of this service went into an OSDevCon 2009 paper that was presented in September 2009 in Dresden. It is a nice summary of things to keep in mind when developing SMF services of your own and it includes a tutorial on writing a GUI based on the OpenSolaris Visual Panels project.

Check out the Whitepaper here, the slides here, the SMF service here and if you want to take a peek at the Service's Visual Panels Java code, you'll find it here.

Saturday Sep 19, 2009

New OpenSolaris ZFS Home Server: Requirements

Old OpenSolaris Home Server with blinkenlights USB drives

A few months ago, I decided it was time for a new home server. The old one (see picture) is now more than 3 years old (the hardware is 2 years older), so it was time to plan ahead for the inevitable hardware failure. Curiously enough, my old server started to refuse working with some of my external USB disks only a few weeks ago, which confirmed my need for a new system. This is the beginning of a series of blog articles around building a new OpenSolaris home server.

Home Server Goals

Let's go over some goals for a home server to help us decide on the hardware. IMHO, a good home server should:

  1. Run OpenSolaris. This means I don't want an appliance because this is too limiting and I'd end up hacking it to make it run something it doesn't do by itself anyway. It should therefore use a real OS.
    It also means I don't want to use Linux, because quite frankly the whole Linux landscape is too unstable, confused and de-focused (don't get me wrong: It's nice for experimentation and as a hobbyist-OS, but I want something more serious to guard my data).
    Windows is out of the question because it delivers too little for too high a price.
    I like BSD (and used to run NetBSD on my Amiga 4000 back then in the mid nineties), but it seems to be more oriented to some (albeit interesting) niches for my taste.
    Right now I prefer OpenSolaris because it's rock-solid, clean, well-documented, well-designed and it has lots of advanced features that other OSes only dream of. Yes, I'd still write and do the same if I weren't a Sun employee.
  2. Leverage ZFS. This should be a no-brainer, but I just wanted to point out that any system that is serious about its data should absolutely run ZFS because of the end-to-end-integrity. Period. And then there are many useful features such as compression, send/receive, snapshots, ease of administration, no fscks and much more. Oh, and I'm looking forward to leveraging encryption and de-duplication at home in the near future, too!
  3. Use ECC Memory: What's the use of having end-to-end data integrity with ZFS if your data is corrupted before ZFS can create its checksum? That's why you need ECC Memory. Simply put: Use ECC memory and kiss those unexpected, inexplicable system crashes and broken data surprises goodbye.
  4. Be Power Efficient: Think 1.5 Euros of electricity bill per Watt per Year for a system running 24/7. The difference between your typical gaming PC and a power-efficient home server can easily be 50W or more when idle, so you're looking at an extra 75 Euros and more of free cash if you just pick your components more carefully. Notice that I'm not saying "Low-Power". There are a lot of compromises when trying to reach absolute low-powerness. Like many optimization problems, squeezing the last few Watts out of your system means investing a lot of money and effort while sacrificing important features. So I want this system to be power-efficient, but without too many sacrifices.
  5. Use a Moderate Amount of Space: While my home server sits in the basement, form doesn't matter. But I may move into a new apartment where networking to the basement is not an option. Then the server needs to be living-room capable and have a decent WAF. Which also brings us to:
  6. Be quiet: A power-efficient server needs less cooling which helps with being quiet. Again, we don't want to stretch the limits of quietness at all costs, but we want to make sure we don't make any obvious mistakes here that sacrifice the living-room capabilities of the system.
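
The rule of thumb in item 4 (about 1.5 Euros per Watt per year) can be checked with a few lines of shell. This is only a sketch: the function name is made up for illustration, and the price of 0.17 EUR per kWh is an assumed value chosen because it roughly reproduces that figure, so plug in your own tariff.

```shell
# Estimate the yearly electricity cost of a device that runs 24/7.
# Arguments: <power draw in watts> <price in EUR per kWh>
# The 0.17 EUR/kWh used below is an assumption, not a figure from the article.
annual_cost_eur() {
    awk -v watts="$1" -v price="$2" 'BEGIN {
        kwh_per_year = watts * 24 * 365 / 1000
        printf "%.2f\n", kwh_per_year * price
    }'
}

annual_cost_eur 1 0.17     # one Watt, all year
annual_cost_eur 50 0.17    # the 50 W idle difference mentioned above
```

With the assumed tariff this prints 1.49 and 74.46, which matches the "1.5 Euros per Watt" and "75 Euros" estimates.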

What's Next

In the next blog entry, we'll discuss a few processor and platform considerations and reveal a cool, yet powerful option that presented itself to me. Meanwhile, feel free to check out other home server resources, such as Simon Breden's blog, Matthias Pfuetzner's blog, Jan Brosowski's Blog (German) or one of the many home server discussions on the zfs-discuss mailing list.

What are your requirements for a good home server? What do you currently use at home to fulfill your home server needs? What would you add to the above list of home server requirements? Feel free to add a comment below!

Thursday Sep 17, 2009

New OpenSolaris ZFS Auto-Scrub Service Helps You Keep Proper Pool Hygiene

A harddisk that is being scrubbed

One of the most important features of ZFS is the ability to detect data corruption through the use of end-to-end checksums. In redundant ZFS pools (pools that are either mirrored or use a variant of RAID-Z), this can be used to fix broken data blocks by using the redundancy of the pool to reconstruct the data. This is often called self-healing.

This mechanism works whenever ZFS accesses any data, because it will always verify the checksum after reading a block of data. Unfortunately, this does not work if you don't regularly look at your data: Bit rot happens and with every broken block that is not checked (and therefore not corrected), the probability increases that even the redundant copy will be affected by bit rot too, resulting in data corruption.

Therefore, zpool(1M) provides the useful scrub sub-command which will systematically go through each data block on the pool and verify its checksum. On redundant pools, it will automatically fix any broken blocks and make sure your data is healthy and clean.

It should now be clear that every system should regularly scrub their pools to take full advantage of the ZFS self-healing feature. But you know how it is: You set up your server and often those little things get overlooked and that cron(1M) job you wanted to set up for regular pool scrubbing fell off your radar etc.
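
For comparison, the manual approach that tends to get forgotten could look roughly like the sketch below. It is not code from the service; the function name is hypothetical, and it assumes that scrubbing every imported pool at the same time is acceptable on your system.

```shell
# Start a scrub on every imported pool.
# "zpool list -H -o name" prints one pool name per line without a header.
scrub_all_pools() {
    for pool in $(zpool list -H -o name); do
        zpool scrub "$pool"
    done
}
```

Saved in a script that calls scrub_all_pools, this could be run from root's crontab once a month. It has no notion of when a pool was last scrubbed, which is one of the gaps the service described below is meant to close.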

Introducing the ZFS Auto-Scrub SMF Service

Here's a service that is easy to install and configure that will make sure all of your pools will be scrubbed at least once a month. Advanced users can set up individualized schedules per pool with different scrubbing periods. It is implemented as an SMF service which means it can be easily managed using svcadm(1M) and customized using svccfg(1M).

The service borrows heavily from Tim Foster's ZFS Auto-Snapshot Service. This is not just coding laziness, it also helps minimize bugs in common tasks (such as setting up periodic cron jobs) and provides better consistency across multiple similar services. Plus: Why reinvent the wheel?


The ZFS Auto-Scrub service assumes it is running on OpenSolaris. It should run on any recent distribution of OpenSolaris without problems.

More specifically, it uses the -d switch of the GNU variant of date(1) to parse human-readable date values. Make sure that /usr/gnu/bin/date is available (which is the default in OpenSolaris).

Right now, this service does not work on Solaris 10 out of the box (unless you install GNU date in /usr/gnu/bin). A future version of this script will work around this issue to make it easily usable on Solaris 10 systems as well.
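
To illustrate why GNU date matters here, the following sketch uses its -d switch to decide whether a scrub is due. It is an illustration under assumptions, not the service's actual code: the function name, the argument order and the fallback to a plain "date" on the PATH are all made up for this example.

```shell
# Use GNU date where OpenSolaris installs it; otherwise fall back to "date" on
# the PATH, which must then also be the GNU variant for -d to work.
GNU_DATE=/usr/gnu/bin/date
[ -x "$GNU_DATE" ] || GNU_DATE=date

# Succeed if the last scrub is at least <period in days> old.
# Arguments: <date string of last scrub> <period in days> [<reference date string>]
scrub_is_due() {
    last=$("$GNU_DATE" -d "$1" +%s) || return 2
    now=$("$GNU_DATE" -d "${3:-now}" +%s) || return 2
    [ $(( (now - last) / 86400 )) -ge "$2" ]
}
```

A call such as scrub_is_due "2008-08-08 11:19:37" 30 succeeds once 30 days have passed since that date. The optional third argument only exists to make the function easy to try out with a fixed reference date.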

Download and Installation

You can download Version 0.5b of the ZFS Auto-Scrub Service here. The included README file explains everything you need to know to make it work:

After unpacking the archive, start the install script as a privileged user:

pfexec ./

The script will copy three SMF method scripts into /lib/svc/method, import three SMF manifests and start a service that creates a new Solaris role for managing the service's privileges while it is running. It also installs the OpenSolaris Visual Panels package and adds a simple GUI to manage this service.

ZFS Auto-Scrub GUI

After installation, you need to activate the service. This can be done easily with:

svcadm enable auto-scrub:monthly

or by running the GUI with:

vp zfs-auto-scrub

This will activate a pre-defined instance of the service that makes sure each of your pools is scrubbed at least once a month.

This is all you need to do to make sure all your pools are regularly scrubbed.

If your pools haven't been scrubbed before or if the time of their last scrub is unknown, the script will proceed and start scrubbing. Keep in mind that scrubbing consumes a significant amount of system resources, so if you feel that a currently running scrub slows your system too much, you can interrupt it by saying:

pfexec zpool scrub -s <pool name>

In this case, don't worry, you can always start a manual scrub at a more suitable time or wait until the service kicks in by itself during the next scheduled scrubbing period.

Should you want to get rid of this service, use:

pfexec ./ -d

The script will then disable any instances of the service, remove the manifests from the SMF repository, delete the scripts from /lib/svc/method, remove the special role and the authorizations the service created and finally remove the GUI. Notice that it will not remove the OpenSolaris Visual Panels package in case you want to use it for other purposes. Should you want to get rid of this as well, you can do so by saying:

pkg uninstall OSOLvpanels

Advanced Use

You can create your own instances of this service for individual pools at specified intervals. Here's an example:

  constant@fridolin:~$ svccfg
  svc:> select auto-scrub
  svc:/system/filesystem/zfs/auto-scrub> add mypool-weekly
  svc:/system/filesystem/zfs/auto-scrub> select mypool-weekly
  svc:/system/filesystem/zfs/auto-scrub:mypool-weekly> addpg zfs application
  svc:/system/filesystem/zfs/auto-scrub:mypool-weekly> setprop zfs/pool-name=mypool
  svc:/system/filesystem/zfs/auto-scrub:mypool-weekly> setprop zfs/interval=days 
  svc:/system/filesystem/zfs/auto-scrub:mypool-weekly> setprop zfs/period=7
  svc:/system/filesystem/zfs/auto-scrub:mypool-weekly> setprop zfs/offset=0
  svc:/system/filesystem/zfs/auto-scrub:mypool-weekly> setprop zfs/verbose=false
  svc:/system/filesystem/zfs/auto-scrub:mypool-weekly> end
  constant@fridolin:~$ svcadm enable auto-scrub:mypool-weekly

This example will create and activate a service instance that makes sure the pool "mypool" is scrubbed once a week.

Check out the zfs-auto-scrub.xml file to learn more about how these properties work.
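
If you need several such instances, typing the interactive session each time gets tedious. The sketch below prints the same svccfg subcommands as the session above so they can be written to a file and passed to svccfg -f. The function name is hypothetical, offset and verbose are fixed to the values from the example, and the result is untested against a live SMF repository.

```shell
# Print svccfg commands that create one auto-scrub instance.
# Arguments: <instance name> <pool name> <interval unit> <period>
# The commands mirror the interactive session shown above.
scrub_instance_commands() {
    cat <<EOF
select auto-scrub
add $1
select $1
addpg zfs application
setprop zfs/pool-name=$2
setprop zfs/interval=$3
setprop zfs/period=$4
setprop zfs/offset=0
setprop zfs/verbose=false
end
EOF
}
```

One possible use: redirect the output of scrub_instance_commands mypool-weekly mypool days 7 into a file, run svccfg -f on that file as a privileged user, then enable the instance with svcadm enable auto-scrub:mypool-weekly as before.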

Implementation Details

Here are some interesting aspects of this service that I came across while writing it:

  • The service comes with its own Solaris role zfsscrub under which the script runs. The role has just the authorizations and profiles necessary to carry out its job, following the Solaris Role-Based Access Control philosophy. It comes with its own SMF service that takes care of creating the role if necessary, then disables itself. This makes a future deployment of this service with pkg(1) easier, which does not allow any scripts to be started during installation, but does allow activation of newly installed SMF services.
  • While zpool(1M) status can show you the last time a pool has been scrubbed, this information is not stored persistently. Every time you reboot or export/import the pool, ZFS loses track of when the last scrub of this pool occurred. This has been filed as CR 6878281. Until that has been resolved, we need to take care of remembering the time of the last scrub ourselves. This is done by introducing another SMF service that periodically checks the scrub status, then records the completion date/time of the scrub in a custom ZFS property on the pool's root filesystem when finished. We call this service whenever a scrub is started and it deactivates itself once its job is done.
  • As mentioned above, the GUI is based on the OpenSolaris Visual Panels project. Many thanks to the people on its discussion list for helping me get going. More about creating a visual panels GUI in a future blog entry.
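
The idea of remembering the last scrub in a ZFS user property can be sketched as follows. This is not the service's code: the function names and the property name example:last-scrub are hypothetical, and the parsing assumes the "scrub: scrub completed ... on <date>" line format that zpool status printed at the time.

```shell
# Extract the completion date from a "scrub: scrub completed ... on <date>" line.
# Reads "zpool status" text on stdin; prints nothing if no scrub has completed.
scrub_completion_date() {
    sed -n 's/^ *scrub: scrub completed .* on \(.*\)$/\1/p'
}

# Hypothetical property name; ZFS user properties only need to contain a colon.
LAST_SCRUB_PROP="example:last-scrub"

# Store the completion date on the pool's root filesystem, if there is one.
# Argument: <pool name>
record_last_scrub() {
    pool=$1
    finished=$(zpool status "$pool" | scrub_completion_date)
    if [ -n "$finished" ]; then
        zfs set "$LAST_SCRUB_PROP=$finished" "$pool"
    fi
}
```

Because the value lives in the pool itself, it survives reboots and export/import cycles. It could later be read back with zfs get -H -o value followed by the property and pool name.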

Lessons learned

It's funny how a very simple task like "Write an SMF service that takes care of regular zpool scrubbing" can develop into a moderately complex thing. It grew into three different services instead of one, each with their own scripts and SMF manifests. It required an extra RBAC role to make it more secure. I ran into some zpool(1M) limitations which I now feel are worthy of RFEs and working around them made the whole thing slightly more complex. Add an install and de-install script and some minor quirks like using GNU date(1) instead of the regular one to have a reliable parser for human-readable date strings, not to mention a GUI and you cover quite a lot of ground even with a service as seemingly simple as this.

But this is what made this project interesting to me: I learned a lot about RBAC and SMF (of course), some new scripting hacks from the existing ZFS Auto-Snapshot service, found a few minor bugs (in the ZFS Auto-Snapshot service) and RFEs, programmed some Java including the use of the NetBeans GUI builder and had some fun with scripting, finding solutions and making sure stuff is more or less cleanly implemented.

I'd like to encourage everyone to write their own SMF services for whatever tools they install or write for themselves. It helps you think your stuff through, make it easy to install and manage, and you get a better feel of how Solaris and its subsystems work. And you can have some fun too. The easiest way to get started is by looking at what others have done. You'll find a lot of SMF scripts in /lib/svc/method and you can extract the manifests of already installed services using svccfg export. Find an SMF service that is similar to the one you want to implement, check out how it works and start adapting it to your needs until your own service is alive and kicking.

If you happen to be in Dresden for OSDevCon 2009, check out my session on "Implementing a simple SMF Service: Lessons learned" where I'll share more of the details behind implementing this service including the Visual Panels part.

Edit (Sep. 21st) Changed the link to CR 6878281 to the externally visible OpenSolaris bug database version, added a link to the session details on OSDevCon.

Edit (Jun. 27th, 2011) As the Mediacast service was decommissioned, I have re-hosted the archive in my new blog and updated the download link. Since vpanels has changed a lot lately, the vpanels integration doesn't work any more, but the SMF service still does.

Monday Jul 06, 2009

How to Fix OpenSolaris Keyboard Irregularities with Virtual Box

Virtual Box is great: It allows you to install OS A on OS B for impressively large sets of A and B OSes and their permutations. Almost everything works smoothly and seamlessly between host and guest: Cut&Paste, File sharing, networking, USB pass-through, even seamless windows are supported.

But there's one little glitch that is still a little annoying, but apparently not annoying enough for someone else to have blogged about this before: Keyboard remapping on Mac OS X hosts.

The Problem

Simple problem: Macs are not that different from PCs (phew), but they have slightly different keyboard mappings (oops). Most notably, on my German keyboard, the "<" key at the bottom left on the Mac will yield "^" on OpenSolaris and vice versa. Same thing goes for "@", which is Right-Alt-L on the Mac, but Right-Alt-Q on PCs. Similar difficulties are encountered if you try to create a "|" pipe symbol or angular/curved brackets ("[]" and "{}" respectively).

Pressing the Right Keys

Usually no big deal. Close your eyes and blindly type what you would type on a PC and that'll give you a good hint at where the right keystrokes are. That works because Virtual Box actually maps the physical locations of the keys between host and guest, but not what's painted on them. So, with a little practice, you should be fine. But what happens if you can't quite remember what that PC keyboard looked like?

Last Friday I had an hour or so left and the playfulness of the problem got the better of me, so I decided to see if this can be fixed the Unix way. It's actually quite easy.

Searching for a cure

There are some helpful hints on the net, most notably Petr Hruska's entry on "Switching Keyboard Layout in Solaris", but it only deals with internationalization issues. What if you have the keyboard nationalities right, but individual keys are still different as in the Mac/PC case? Here's a step-by-step guide to help you with any keyboard remapping problem, plus a bonus table for OpenSolaris on Macbook users to get you started:

Xmodmap to the rescue

  1. We're going to use xmodmap(1) to remap the keys on our keyboard. Check out the man-page to familiarize yourself with how it works.
  2. See the keystrokes as OpenSolaris sees them: Use xev(6) to find out what keycodes belong to the keys you want to correct.
  3. Check out what OpenSolaris is thinking about your problematic keys, either by testing them in a terminal or by checking your version of the standard Sun USB keyboard layouts.
  4. Before you start modifying the current keyboard mapping, get the currently active one by saying something like:
    xmakemap > ~/.xmodmaprc.current
    Caution: There seems to be a bug in xmakemap that corrupts some of the entries. So, please use this only for reference but do not feed this file back into xmodmap (see later) or you'll likely make your keyboard unusable (until this bug is resolved).
  5. Start editing your own remapping script for xmodmap:
    vi ~/.xmodmaprc
  6. For each key you want to remap, copy its keycode entry from the xmakemap output into your own remapping table and modify to taste. Be careful, some entries from xmakemap are broken, but you should be able to figure those out. Here's my current .xmodmaprc file as a reference:
    ! Set up keys for a MacBook Pro running OpenSolaris on VirtualBox
    !       Key   Unshifted       Shifted         AltGraph        AltGraph-Shifted
    !       ---   --------------- --------------- --------------- ----------------
    keycode  49 = less            greater
    keycode  94 = asciicircum     degree          asciicircum     degree
    keycode  14 = 5               percent         bracketleft
    keycode  15 = 6               ampersand       bracketright
    keycode  16 = 7               slash           bar             backslash
    keycode  17 = 8               parenleft       braceleft
    keycode  18 = 9               parenright      braceright
    keycode  24 = q               Q               q               Q
    keycode  46 = l               L               at
    keycode  57 = n               N               asciitilde
    This works well on my MacBook Pro, your mileage may vary.
  7. You can activate your remapping by saying something like:
    xmodmap ~/.xmodmaprc
  8. In case something goes wrong and you render your keyboard useless, you can restart your X server by pressing Ctrl-Alt-Backspace twice.
  9. If you're happy with your remapping, you can automatically activate it on every login by using the System->Preferences->Sessions panel and adding an entry for the above xmodmap command there.
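
Since a broken entry can leave you with an unusable keyboard, it may be worth checking a remapping table before activating it. The sketch below is a deliberately strict check for keycode-only tables like the one above; the function name is made up, and xmodmap itself accepts more expression types (keysym, add, remove and so on) that this check would flag.

```shell
# Report lines in a keycode-only xmodmap table that do not look like
#   keycode <number> = <keysym> [<keysym> ...]
# Comment lines (starting with "!") and blank lines are accepted.
# Prints "<line number>: <line>" for each suspicious line and fails if any was found.
check_keycode_table() {
    awk '
        /^[ \t]*!/ { next }
        NF == 0    { next }
        $1 == "keycode" && $2 ~ /^[0-9]+$/ && $3 == "=" && NF >= 4 { next }
        { printf "%d: %s\n", NR, $0; bad = 1 }
        END { exit bad + 0 }
    ' "$1"
}
```

A cautious session entry could then be check_keycode_table ~/.xmodmaprc && xmodmap ~/.xmodmaprc, so that the remapping is only applied when the table passes the check.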


I hope this little exercise in some lesser known X-Windows commands (Hi Jörg) was useful for you, now you shouldn't need to worry too much about keyboard mapping inconsistencies any more.

If you want to learn a little more about modifying your keyboard, check out this section of the OpenSolaris docs.

The example keymap modifications above work well for me, but I'm sure I've forgotten a key or two. What other keys did you remap and why? Feel free to leave me a comment below.

Friday Feb 27, 2009

Munich OpenSolaris User Group Install Fest

mucosug logo

Yesterday we had the first Munich OpenSolaris User Group (MUCOSUG) install fest at Munich Technical University's Mathematics and Computer Science Building in the Garching Campus. Many thanks go to Martin Uhl for organizing coffee, meeting room and overall help!

The building is very cool, featuring two giant parabolic slides that go all the way from 3rd floor to the ground floor. Check out some construction pictures here.

Home server in the basement

We began the meeting with a short presentation on OpenSolaris as a home server (here are the slides, let me know if you want the source). It covers some thoughts on why you need a home server (hints: Photos, multimedia clients, backups, first-hand Solaris experience), where to get some extra software, first steps in ZFS, CIFS server and iSCSI and some useful blogs to follow up with for more good home-server specific content.

Most of the people had OpenSolaris installed already, either on their laptops or inside VirtualBox. So most of the conversation was centered around tips for setting up home server hardware, how to install the VirtualBox guest additions and why, or what the best ways are to integrate VirtualBox networking and exchange files between host and guest.

I learned that sharing the host interface with the Virtual Box guest has become as painless as using NAT with the added benefit of making your guest be a first-class citizen on your network, so that's what I'll try out next. Also, the cost of 32 GB USB sticks has come way down at acceptable speed rates, so I'll try one of them to host my OpenSolaris work environment and free my local harddisk a bit.

All in all, such geek gatherings are always a nice excuse to sit together and chat about the newest in technology, find new ideas and have a beer or two afterwards, so how about organizing your own OpenSolaris Installfest in your neighbourhood now?

Update: Setting up CIFS in OpenSolaris turned out to be slightly more complicated. Please check the above slides for an updated list of commands on how to set this up. I forgot to include how to expand /etc/pam.conf and assumed this was automatic. Sorry, must be because I set this up at home a while ago...

Wednesday Jan 14, 2009

How to get Audio to work on OpenSolaris on VirtualBox

Man playing a big trumpet

My regular working environment on the go or when working from home is, of course, OpenSolaris. I've been using it on an Acer Ferrari Laptop for years now and I can say I'm very happy with it, and that's not just because I work for Sun.

Lately, I tried OpenSolaris on VirtualBox on my private MacBook Pro. This configuration turned out to work better than the native OpenSolaris on my company's Acer Ferrari laptop! Due to the MBP being 2 years newer and it having a dual-core CPU plus 4 GB of RAM, it turned out to be the better machine to host my OpenSolaris work environment.

With one exception: Audio.

Audio isn't enabled in VirtualBox by default in the Mac version and that has already been blogged elsewhere. The solution is simply to enable Audio in VirtualBox settings and select the Intel ICH AC97 soundchip.

Then, OpenSolaris doesn't come with an ICH AC97 audio driver and even the new SUNWaudiohd driver doesn't support it. The solution here is to download the OSS sound drivers from 4Front technologies. So far, so good.

But this didn't work for me: Either the sound would play for a few seconds, then hang, or the sound drivers wouldn't be recognized by GNOME/GStreamer at all, resulting in a crossed-out loudspeaker icon at the top! This is very frustrating if you want to show Brendan's excellent shouting video to an audience and have to switch out of OpenSolaris/VirtualBox back to Mac OS X just for that.

Apparently others suffered from the same annoyance, too, but none of the solutions I found seemed to help: I installed and uninstalled and reinstalled the OSS drivers a number of times, ran the ossdevlinks script to recreate device links, even installed a newer, experimental version of the SUNWaudiohd driver. No luck yet.

Then Frank, a Sun sales person who happens to use OpenSolaris on his laptop as well (Yay! a salesrep using OpenSolaris! Kudos to Frank!) suggested uninstalling the SUNWaudiohd driver, then installing the OSS sound driver, which worked for him. It didn't occur to me that uninstalling SUNWaudiohd might be the solution, so I wanted to give it a try.

But, alas "pfexec pkg uninstall SUNWaudiohd" didn't work for me either! Apparently there's a dependency between this package and the slim_install package bundle. Again, Google is your friend and it turned out to be a known bug that prevented me from uninstalling SUNWaudiohd. The workaround is simply to "pfexec pkg uninstall slim_install" which is no longer needed after the installation process anyway.

So lo and behold, gone is slim_install, gone is SUNWaudiohd, installed the OSS drivers, logged out and back in and audio works fine now! (Notice: no reboot required).

Here's the sweet and short way to audio goodness on OpenSolaris on VirtualBox:

  1. Shut down your OpenSolaris VirtualBox image if it is running, so you can change its settings.
  2. Activate audio for your OpenSolaris VM in VirtualBox. Select the ICH AC97 Chip. Here's a blog entry that describes the process.
  3. Boot your OpenSolaris VirtualBox image.
  4. Uninstall the slim_install package: "pfexec pkg uninstall slim_install"
  5. Uninstall the SUNWaudiohd driver: "pfexec pkg uninstall SUNWaudiohd"
  6. Download the OSS sound driver for OpenSolaris.
  7. Install the OSS sound driver: "pfexec pkgadd -d oss-solaris-v4.1-1051-i386.pkg" (Or whatever revision you happened to download).
  8. Log out of your desktop and log back in. Sound should work now.

Tuesday Dec 16, 2008

New OpenSolaris Munich User Group

The Munich OpenSolaris User Group (MUCOSUG) Logo

Munich is one of the IT centers of Germany. Some would say, the IT center in Germany. Most popular IT and media companies are based here, including Sun Germany, and of course Bavaria has the reputation of being an important technology powerhouse for Germany, between Laptops and Lederhosen.

It was about time that a Munich OpenSolaris User Group be created, which Wolfgang and I just did.

So, if you love OpenSolaris and happen to be near Munich, welcome to the Munich Open Solaris User Group (MUCOSUG). Feel free to visit our project page, subscribe to the mailing list, watch our announcements or participate in our events.

As you can see above, we already have a logo. It shows a silhouette of the Frauenkirche church, which is a signature landmark of downtown Munich, with the Olympiaturm tower in the background. This is meant to symbolize the old and new features of Solaris, but let's not get too sentimental here... Let us know if you like it, or provide your own proposal for a better logo, this is not set in stone yet.

Our first meeting will be on January 12th, 2009, 7-11 PM (19:00-23:00) at the Sun Munich office near Munich, Germany. Check out some more information about this event, we're looking forward to meeting you!



Friday Aug 22, 2008

POFACS Podcast: Home Servers are quickly becoming Commonplace

I remember having talked at a conference 3 years ago and predicting that home servers are going to become a central part of most people's homes. Today, this would not be a surprise, but back then, running a server at home was really only for computer geeks.

Now, the entertainment industry gives us many home server alternatives to choose from: Add 50-100 EUR to a USB disk's price, and you'll get a built in server that offers the space to your local network through SMB, NFS or other protocols. Microsoft has discovered this, too and they're busily debugging their Windows Home Server products. UPnP has emerged as a standard for driving audio/video components over the network from servers, be they beefed up USB disks or some machine running some OS with some server component or a real dedicated home server machine. If you use iTunes and enable the "sharing" piece, you're already running a home server.

Of course, this is all driven by clients. First, people imported their music from CDs into their computers so they could listen on the go and fill their MP3 players. Then, they discovered that running a PC or even a laptop in your living room to listen to your music isn't really cool and lacks that WAF that makes or breaks most living room decisions. Soon, specialized living room clients started to pop up, such as the Roku Soundbridge or the Logitech SqueezeBox. Digital TV set-top-boxes and PVRs like the DreamBox were also early adopters of the home network by either offering TV streams on the network or using network attached storage for storing recorded TV shows. And the current generation of game consoles comes with Wifi and/or wired networking as a central part of their strategy, and they make good network media players as well. Even the traditional vendors of home entertainment equipment such as TVs, Hifi systems etc. have started to adopt some way of accepting digital audio and/or video from the network for A/V Receivers, DVD-Players, TVs etc. My current favourite, for example is the Linn Sneaky Music DS. And I applaud them for boldly migrating their records business to the digital world, in full studio master quality. You can even buy their full music catalog pre-installed on a 2TB NAS storage appliance, including UPnP server!

The current edition of the POFACS Podcast (sorry, it's in German) talks about the various ways a home server can add value to your living room experience, from serving files to your family's laptops, being a backup repository to the more interesting topics of serving music for dinner in a WAF-friendly way or handling your TV recordings over the net so you don't have to worry about noisy PCs and harddisks sitting in your living room. Enjoy!

Wednesday Aug 13, 2008

ZFS Replicator Script, New Edition

Many crates on a bicycle. A metaphor for ZFS snapshot replication

About a year ago, I blogged about a useful script that handles recursive replication of ZFS snapshots across pools. It helped me migrate my pool from a messy configuration into the clean two-mirrored-pairs configuration I have now.

Meanwhile, the fine guys at the ZFS developer team introduced recursive send/receive into the ZFS command, which makes most of what the script does a simple -R flag to the zfs(1M) send command.

Unfortunately, this new version of the ZFS command has not (yet?) been ported back to Solaris 10, so my ZFS snapshot replication script is still useful for Solaris 10 users, such as Mike Hallock from the School of Chemical Sciences at the University of Illinois at Urbana-Champaign (UIUC). He wrote:

Your script came very close to exactly what I needed, so I took it upon myself to make changes, and thought in the spirit of it all, to share those changes with you.

The first change he introduced was the ability to supply a pattern (via -p) that selects some of the potentially many snapshots that one wants to replicate. He's a user of Tim Foster's excellent automatic ZFS snapshot service like myself and wanted to base his migration solely on the daily snapshots, not any other ones.

Then, Mike wanted to migrate across two different hosts on a network, so he introduced the -r option that allows the user to specify a target host. This option simply pipes the replication data stream through ssh at the right places, making ZFS filesystem migration across any distance very easy.
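
The two ideas behind these options can be sketched in a few lines of shell. This is an illustration only and not the code from the replication script: both function names are hypothetical, the sketch sends a single full snapshot rather than handling incremental or recursive replication, and it assumes that ssh to the target host works without a password prompt.

```shell
# Keep only snapshot names that match the given pattern (reads names on stdin).
filter_snapshots() {
    grep -- "$1"
}

# Send one snapshot to a filesystem on another host by piping the stream through ssh.
# Arguments: <snapshot> <target host> <target filesystem>
send_snapshot_remote() {
    snapshot=$1
    host=$2
    target=$3
    zfs send "$snapshot" | ssh "$host" zfs receive "$target"
}
```

For example, piping the output of zfs list -H -t snapshot -o name through filter_snapshots daily would leave only snapshots with "daily" in their name, each of which could then be handed to send_snapshot_remote.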

The updated version including both of the new features is available as zfs-replicate_v0.7.tar.bz2. I didn't test this new version but the changes look very good to me. Still: Use at your own risk.

Thanks a lot, Mike! 

Tuesday Aug 12, 2008

ZFS saved my data. Right now.

Kid with a swimming ring.

As you know, I have a server at home that I use for storing all my photos, music, backups and more using the Solaris ZFS filesystem. You could say that I store my life on my server.

For storage, I use Western Digital's MyBook Essential Edition USB drives because they are the cheapest ones I could find from a well-known brand. The packaging says "Put your life on it!". How fitting.

Last week, I had a team meeting and a colleague introduced us to some performance tuning techniques. When we started playing with iostat(1M), I logged into my server to do some stress tests. That was when my server said something like this:

constant@condorito:~$ zpool status

(data from other pools omitted)

  pool: santiago
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
	attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
	using 'zpool clear' or replace the device with 'zpool replace'.
 scrub: scrub completed after 16h28m with 0 errors on Fri Aug  8 11:19:37 2008

	NAME         STATE     READ WRITE CKSUM
	santiago     DEGRADED     0     0     0
	  mirror     DEGRADED     0     0     0
	    c10t0d0  DEGRADED     0     0   135  too many errors
	    c9t0d0   DEGRADED     0     0    20  too many errors
	  mirror     ONLINE       0     0     0
	    c8t0d0   ONLINE       0     0     0
	    c7t0d0   ONLINE       0     0     0

errors: No known data errors

This tells us 3 important things:

  • Two of my disks (c10t0d0 and c9t0d0) are happily giving me garbage back instead of my data. Without knowing it.
    Thanks to ZFS' checksumming, we can detect this, even though the drive thinks everything is ok.
    No other storage device, RAID array, NAS or file system I know of can do this. Not even the increasingly hyped (and admittedly cool-looking) Drobo [1].
  • Because both drives are configured as a mirror, bad data from one device can be corrected by reading good data from the other device. This is the "applications are unaffected" and "no known data errors" part.
    Again, it's the checksums that enable ZFS to distinguish good data blocks from bad ones, which in turn enables self-healing while the system is reading stuff from disk.
    As a result, even though neither disk is functioning properly, my data is still safe, because the erroneous blocks don't overlap in terms of what pieces of data they store (luckily, albeit with millions of blocks per disk, statistics is on my side here).
    Again, no other storage technology can do this. RAID arrays only kick in when a disk drive as a whole is inaccessible or when a drive diagnoses itself to be broken. They do nothing against silent data corruption, which is what we see here and what all people on this planet who don't use ZFS (yet) can't see (yet). Until it's too late.
  • Data hygiene is a good thing. Do a "zpool scrub <poolname>" once in a while. Use cron(1M) to do this, for example every other week for all pools.
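cron has no direct way of saying "every other week", so one option is to let cron start a small script once a week and have the script decide whether it is a scrub week. The following is a minimal sketch of that decision, assuming that "even ISO week numbers" is an acceptable definition of every other week. The function names are made up, and the sketch only returns the zpool scrub commands instead of running them.

```python
import datetime


def scrub_due(today, every_n_weeks=2):
    """True when the ISO week number of 'today' is a multiple of every_n_weeks."""
    iso_week = today.isocalendar()[1]
    return iso_week % every_n_weeks == 0


def scrub_commands(pools, today):
    """Return one ['zpool', 'scrub', <pool>] argument list per pool if a scrub is due.

    Returns an empty list in off weeks. The caller decides how to execute
    the commands, for example with subprocess.run().
    """
    if not scrub_due(today):
        return []
    return [["zpool", "scrub", pool] for pool in pools]


if __name__ == "__main__":
    # Assumed pool name, taken from the listing above; adjust to your own pools.
    for command in scrub_commands(["santiago"], datetime.date.today()):
        print(" ".join(command))
```

One caveat of this simple rule: in years with 53 ISO weeks, the gap between the last scrub of the old year and the first one of the new year is three weeks instead of two.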

Over the weekend, I ordered myself a new disk (sheesh, they dropped EUR 5 in price already after just 5 days...) and after a "zpool replace santiago c10t0d0 c11t0d0" on Monday, my pool started resilvering:

constant@condorito:~$ zpool status

(data from other pools omitted)

  pool: santiago
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
 scrub: resilver in progress for 1h13m, 6.23% done, 18h23m to go

        NAME           STATE     READ WRITE CKSUM
        santiago       DEGRADED     0     0     0
          mirror       DEGRADED     0     0     0
            replacing  DEGRADED     0     0     0
              c10t0d0  DEGRADED     0     0   135  too many errors
              c11t0d0  ONLINE       0     0     0
            c9t0d0     DEGRADED     0     0    20  too many errors
          mirror       ONLINE       0     0     0
            c8t0d0     ONLINE       0     0     0
            c7t0d0     ONLINE       0     0     0

errors: No known data errors
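Since I only noticed the problem by accident, it makes sense to let a script watch the CKSUM column for you. Here is a minimal Python sketch that picks the devices with non-zero checksum error counts out of "zpool status" text like the listing above. The function name is made up, and the parsing assumes the plain five-column layout (NAME, STATE, READ, WRITE, CKSUM) with purely numeric counters, so treat it as a starting point rather than a robust parser.

```python
def devices_with_checksum_errors(status_text):
    """Return {device_name: cksum_count} for rows with a non-zero CKSUM column.

    A row counts as a device row when its 3rd to 5th whitespace-separated
    fields (READ, WRITE, CKSUM) are all plain integers. Other lines, such
    as the status text and the column header, are skipped.
    """
    bad = {}
    for line in status_text.splitlines():
        fields = line.split()
        if len(fields) >= 5 and all(f.isdigit() for f in fields[2:5]):
            name, cksum = fields[0], int(fields[4])
            if cksum > 0:
                bad[name] = cksum
    return bad
```

Fed with the output of "zpool status" (for example captured via subprocess), this returns an empty dictionary for healthy pools, so a cron job could send mail whenever the result is not empty.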

The next step for me is to send the c10t0d0 drive back and ask for a replacement under warranty (it's only a couple of months old). After receiving c10's replacement, I'll consider sending in c9 for replacement (depending on how the next scrub goes).

Which makes me wonder: How will drive manufacturers react to a new wave of warranty cases based on drive errors that were not easily detectable before?

[1] To the guys at Drobo: Of course you're invited to implement ZFS into the next revision of your products. It's open source. In fact, Drobo and ZFS would make a perfect team!



