Wednesday Jan 14, 2009

How to get Audio to work on OpenSolaris on VirtualBox

Man playing a big trumpet

My regular working environment on the go or when working from home is, of course, OpenSolaris. I've been using it on an Acer Ferrari laptop for years now and I can say I'm very happy with it, and that's not just because I work for Sun.

Lately, I tried OpenSolaris on VirtualBox on my private MacBook Pro. This configuration turned out to work better than native OpenSolaris on my company's Acer Ferrari laptop! Being two years newer, with a dual-core CPU and 4 GB of RAM, the MBP turned out to be the better machine to host my OpenSolaris work environment.

With one exception: Audio.

Audio isn't enabled in VirtualBox by default in the Mac version and that has already been blogged elsewhere. The solution is simply to enable Audio in VirtualBox settings and select the Intel ICH AC97 soundchip.

However, OpenSolaris doesn't come with an ICH AC97 audio driver, and even the new SUNWaudiohd driver doesn't support it. The solution here is to download the OSS sound drivers from 4Front Technologies. So far, so good.

But this didn't work for me: either the sound would play for a few seconds, then hang, or the sound drivers wouldn't be recognized by GNOME/GStreamer at all, resulting in a crossed-out loudspeaker icon at the top! This is very frustrating if you want to show Brendan's excellent shouting video to an audience and have to switch out of OpenSolaris/VirtualBox back to Mac OS X just for that.

Apparently others suffered from the same annoyance, too, but none of the solutions I found seemed to help: I installed, uninstalled and reinstalled the OSS drivers a number of times, ran the ossdevlinks script to recreate device links, and even installed a newer, experimental version of the SUNWaudiohd driver. No luck.

Then Frank, a Sun sales person who happens to use OpenSolaris on his laptop as well (Yay! A sales rep using OpenSolaris! Kudos to Frank!) suggested uninstalling the SUNWaudiohd driver first, then installing the OSS sound driver, which had worked for him. It hadn't occurred to me that uninstalling SUNWaudiohd might be the solution, so I gave it a try.

But, alas, "pfexec pkg uninstall SUNWaudiohd" didn't work for me either! Apparently there's a dependency between this package and the slim_install package bundle. Again, Google is your friend, and it turned out to be a known bug that prevented me from uninstalling SUNWaudiohd. The workaround is simply to "pfexec pkg uninstall slim_install", which is no longer needed after the installation process anyway.

And lo and behold: with slim_install gone and SUNWaudiohd gone, I installed the OSS drivers, logged out and back in, and audio works fine now! (Notice: no reboot required.)

Here's the sweet and short way to audio goodness on OpenSolaris on VirtualBox:

  1. Shut down your OpenSolaris VirtualBox image if it is running, so you can change its settings.
  2. Activate audio for your OpenSolaris VM in VirtualBox. Select the ICH AC97 Chip. Here's a blog entry that describes the process.
  3. Boot your OpenSolaris VirtualBox image.
  4. Uninstall the slim_install package: "pfexec pkg uninstall slim_install"
  5. Uninstall the SUNWaudiohd driver: "pfexec pkg uninstall SUNWaudiohd"
  6. Download the OSS sound driver for OpenSolaris.
  7. Install the OSS sound driver: "pfexec pkgadd -d oss-solaris-v4.1-1051-i386.pkg" (Or whatever revision you happened to download).
  8. Log out of your desktop and log back in. Sound should work now.
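If you do this on more than one VM, steps 4-7 can be rolled into one small script. This is a sketch only, using exactly the package names from the text above; adjust the OSS package file name to whatever revision you downloaded.

```shell
#!/bin/sh
# Sketch: bundles steps 4-7. Package names are taken from the text above;
# the OSS .pkg file name depends on the revision you downloaded.
fix_audio() {
    pfexec pkg uninstall slim_install || return 1
    pfexec pkg uninstall SUNWaudiohd  || return 1
    pfexec pkgadd -d oss-solaris-v4.1-1051-i386.pkg
}
```

Run fix_audio from the directory holding the OSS package, then log out and back in.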

Tuesday May 27, 2008

OpenSolaris Home Server: ZFS and USB Disks

My home server with a couple of USB disks

A couple of weeks ago, OpenSolaris 2008.05, Project Indiana, saw its first official release. I've been looking forward to this moment so I can upgrade my home server and work laptop and start benefiting from the many cool features. If you're running a server at home, why not use the best server OS on the planet for it?

This is the first in a small series of articles about using OpenSolaris for home server use. I did a similar series some time ago and got a lot of good and encouraging feedback, so this is an update, or a remake, or home server 2.0, if you will.

I'm not much of a PC builder, but Simon has posted his experience with selecting hardware for his home server. I'm sure you'll find good tips there. In my case, I'm still using my trusty old Sun Java W1100z workstation, running in my basement. And for storing data, I like to use USB disks.

USB disk advantages

This is the moment where people start giving me that "Yeah, right" or "Are you serious?" look. But USB disk storage has some cool advantages:

  • It's cheap. About 90 Euros for half a TB of disk from a major brand. Can't complain about that.
  • It's hot-pluggable. What happens if your server breaks and you want to access your data? With USB it's as easy as unplug from broken server, plug into laptop and you're back in business. And there's no need to shut down or open your server if you just want to add a new disk or change disk configuration.
  • It scales. I have 7 disks running in my basement. All I needed to do to make them work with my server was to buy a cheap 15 EUR 4-port USB card to expand my existing 5 USB ports. I still have 3 PCI slots left, so I could add 12 disks more at full USB 2.0 speed if I wanted.
  • It's fast enough. I measure about 10MB/s in write performance with a typical USB disk. That's about as fast as you can get over a 100 MBit/s LAN network which most people use at home. As long as the network remains the bottleneck, USB disk performance is not the problem.
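The arithmetic behind that last point, as a quick sanity check (sh integer math; real-world TCP throughput lands a little below the theoretical ceiling):

```shell
#!/bin/sh
# 100 Mbit/s divided by 8 bits per byte gives the LAN's theoretical ceiling
# in MB/s; protocol overhead pushes real throughput down to roughly 10-11 MB/s,
# which is about what a single USB 2.0 disk delivers for sequential writes.
lan_mbits=100
lan_mbytes=$((lan_mbits / 8))
echo "LAN ceiling: ${lan_mbytes} MB/s"
```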

ZFS and USB: A Great Team

But this is not enough. The beauty of USB disk storage lies in its combination with ZFS. When adding some ZFS magic to the above, you also get:

  • Reliability. USB disks can be mirrored or used in a RAID-Z/Z2 configuration. Each disk may be unreliable (because they're cheap) individually, but thanks to ZFS' data integrity and self-healing properties, the data will be safe and FMA will issue a warning early enough so disks can be replaced before any real harm can happen.
  • Flexibility. Thanks to pooled storage, there's no need to wonder what disks to use for what and how. Just build up a single pool with the disks you have, then assign filesystems to individual users, jobs, applications, etc. on an as-needed basis.
  • Performance. Suppose you upgrade your home network to Gigabit Ethernet. No need to worry: The more disks you add to the pool, the better your performance will be. Even if the disks are cheap.

Together, USB disks and ZFS make a great team. Not enterprise class, but certainly an interesting option for a home server.

ZFS & USB Tips & Tricks

So here's a list of tips, tricks and hints you may want to consider when daring to use USB disks with OpenSolaris as a home server:

  • Mirroring vs. RAID-Z/Z2: RAID-Z (or its more reliable cousin RAID-Z2) is tempting: you get more space for less money. In fact, earlier versions of my zpools at home were a combination of RAID-Z'ed leftover slices, with the goal of squeezing as much space as possible at some reliability level out of my mixed disk collection.
    But say you have a 3+1 RAID-Z and want to add some more space. Would you buy 4 disks at once? Isn't that a bit big, granularity-wise?
    That's why I decided to keep it simple and just mirror. USB disks are cheap enough, no need to be even more cheap. My current zpool has a pair of 1 TB USB disks and a pair of 512 GB USB disks and works fine.
    Another advantage of this approach is that you can organically modernize your pool: wait until one of your disks starts showing some flakiness (FMA and ZFS will warn you as soon as the first broken data block has been repaired). Then replace the disk with a bigger one, then its mirror with the same, bigger size. That will give you more space without the complexity of too many disks and keep them young enough not to be a serious threat to your data. Use the replaced disks for scratch space or less important tasks.
  • Instant replacement disk: A few weeks ago, one of my mirrored disks showed its first write error. It was a pair of 320GB disks, so I ordered a 512GB replacement (with the plan to order the second one later). But now, my mirror may be vulnerable: What if the second disk starts breaking before the replacement has arrived?
    That's why having a few old but functional disks around can be very valuable: in my case, I took a 200 GB and a 160 GB disk and combined them into their own zpool:
    zpool create temppool c11t0d0 c12t0d0
    Then, I created a new ZVOL sitting on the new pool:
    zfs create -sV 320g temppool/tempvol
    Here's our temporary replacement disk! I then attached it to my vulnerable mirror:
    zpool attach santiago c10t0d0 /dev/zvol/dsk/temppool/tempvol
    And voilà, my precious production pool started resilvering the new virtual disk. After the new disk arrived and had been resilvered, the temporary disk could be detached, destroyed and its space put to some other good use.
    Storage virtualization has never been so easy!
  • Don't forget to scrub: Especially with cheap USB disks, regular scrubbing is important. Scrubbing will check each and every block of your data on disk and make sure it's still valid. If not, it will repair it (since we're mirroring or using RAID-Z/Z2) and tell you which disk had a broken block, so you can decide whether it needs to be replaced yet.
    How often you want to or should scrub depends on how much you trust your hardware and how much your data is being read anyway (any data that is read is automatically checked, so that particular portion of the data is already "scrubbed", if you will). I find scrubbing once every two weeks a useful cycle; others may prefer once a month or once a week.
    But scrubbing is a process that needs to be initiated by the administrator. It doesn't happen by itself, so it is important that you think of issuing the "zpool scrub" command regularly, or better, set up a cronjob for it to happen automatically.
    As an example, the following line:
    23 01 1,15 * * for i in `zpool list -H -o name`; do zpool scrub $i; done
    in your crontab will start a scrub for each of your zpools twice a month on the 1st and the 15th at 01:23 AM.
  • Snapshot often: Snapshots are cheap, but they can save the world if you accidentally deleted that important file. Same rule as with scrubbing: do it. Often enough. Automatically. Tim Foster did a great job of implementing an automatic ZFS snapshot service, so why don't you just install it now and set up a few snapshot schemes for your favourite ZFS filesystems?
    The home directories on my home server are snapshotted once a month (and all snapshots are kept), once a week (keeping 52 snapshots) and once a day (keeping 31 snapshots). This gives me a time-machine with daily, weekly and monthly granularities depending on how far back in time I want to travel through my snapshots.
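If you'd rather not install the service, a stripped-down snapshot helper for cron could look roughly like this. The filesystem name is just an example; the auto-snapshot service remains the more robust choice.

```shell
#!/bin/sh
# Minimal hand-rolled daily snapshot, meant to be run from cron.
# 'tank/home' below is a made-up example filesystem; this sketch does no
# pruning of old snapshots, which the SMF auto-snapshot service handles for you.
snap_daily() {
    fs=$1
    zfs snapshot "$fs@daily-`date +%Y%m%d`"
}
# crontab sketch: snapshot every night at 02:15
#   15 02 * * * /path/to/snapscript tank/home
```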

So, USB disks aren't bad. In fact, thanks to ZFS, USB disks can be very useful building blocks for your own little cost-effective but reliable and integrity-checked data center.

Let me know what experiences you've had using USB storage at home or with ZFS, and what tips and tricks you have found to work well for you. Just enter a comment below or send me email!

Monday Apr 28, 2008

Presenting images and screenshots the cool, 3D, shiny way

My daughter Amanda in her 2D cheerfulness

If you give a presentation about hardware products, it is easy to make your slides look good: Remove boring text, add nice photos of your hardware and all is well.

But what if you have to present on software, some web service or give a Solaris training with lots of command line stuff?

Sure, you can do screenshots and hope that the GUI looks nice. Or use other photos (like the one to the left) that may or may not relate to the software you present about.

But screenshots and photos (to a lesser degree) are so, well, 2D. They look boring. Wouldn't it be nice to present your screenshots the way Apple presents its iTunes software? Like adding some 3D depth to your slide deck or website, with a nice, shiny, reflective surface?

Well, you don't need to spend thousands of dollars on art departments and graphics artists (they'd be glad to do something different for a change) or work long hours with Photoshop or the Gimp (a most excellent piece of software, BTW), trying to create that stylish 3D look. Here's a script that can do this easily for you!

You're probably wondering why my daughter Amanda shows up at the top of this article. Well, she was volunteered to be a test subject for my new script. The script uses ImageMagick and POV-Ray in a similar way to my earlier photocube script that we now use to generate the animated cube of the HELDENFunk show notes. It places any image you give it into a 3D space and adds a nice, shiny reflection to it. Let's see what Amanda looks like after she's been through the script:

-bash-3.00$ ./ -s 200 Amanda_small.jpg
Fetching and pre-processing file:///home/constant/software/featurepic/Amanda_small.jpg
Rendering image.
Writing image: featurepic.jpg

Amanda, in her new 3D shininess

The size (-s) parameter defines the length of either width or height of the result image, whichever is larger. In this case, we choose an image size of a maximum of 200x200 pixels, so the image can fit this blog. You can see the result to the right. Nice, eh?

As you can see, her picture has now been placed into a 3D scene, slightly rotated to the left, onto a shiny, white surface. More interesting than the usual flat picture on a blog, isn't it?

The script uses POV-Ray to place and rotate the photo in 3D and to generate the reflection. ImageMagick is used for pre- and post-processing the image. The reflection is not real, it is actually the same picture, flipped across the y axis and with a gradient transparency applied to it. That way, the reflection can be controlled much better. I tried the real thing and it didn't want to look artistic enough :).
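If all you want is the fading reflection and not the 3D rotation, the same flip-plus-gradient trick can be approximated with plain ImageMagick. This is a rough sketch, not the script itself: the file names, temp paths and the gray50 fade start are made-up example values.

```shell
#!/bin/sh
# Approximates the reflection (not the 3D part) with ImageMagick only:
# flip the image, use a gray-to-black gradient as its alpha channel
# (gray = partly opaque, black = transparent), then stack it under the original.
fake_reflection() {
    in=$1; out=$2
    w=`identify -format %w "$in"`
    h=`identify -format %h "$in"`
    convert "$in" -flip /tmp/flip.png
    convert -size ${w}x${h} gradient:gray50-black /tmp/grad.png
    convert /tmp/flip.png /tmp/grad.png \
        -alpha off -compose CopyOpacity -composite /tmp/fade.png
    convert "$in" /tmp/fade.png -append "$out"
}
```

Something like fake_reflection photo.jpg result.png would produce the mirrored, fading copy appended below the original.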

The amount of rotation, the reflection intensity and the length of the reflective tail can be adjusted with command-line switches, so can the height of the camera. Here's an example that uses all of these parameters:

-bash-3.00$ ./ -h
Usage: ./ [-a angle] [-c cameraheight] [-p] [-r reflection] [-s size] [-t taillength] image
-p creates a PNG image with transparency, otherwise a JPEG image is created.
Defaults: -a 15 -c 0.3 -r 0.3 -s 512 -t 0.3
-bash-3.00$ ./ -a 30 -c 0.1 -r 0.8 -s 200 -t 0.5 Amanda_small.jpg
Fetching and pre-processing file:///home/constant/software/featurepic/Amanda_small.jpg
Rendering image.
Writing image: featurepic.jpg

Amanda with more shininess

The angle is in degrees and can be negative. One good thing about rotating the image into 3D is that you gain horizontal real estate to fit that slightly longer bullet point in. It helps you trade off image width for height without losing too much detail. An angle value of up to 30 is still OK, I wouldn't recommend more than that.

The camera height (-c) value is relative to the picture: 0 is ground level, 1 is at the top edge. The camera will always look at the center of the image. Camera height values below 0.5 are good because a camera below the subject makes it look slightly more imposing. Values above 0.5 make you look down at the picture, making it a bit smaller and less significant.

The reflection intensity (-r) goes from 0 (no reflection) to 1 (perfect mirror) while the length of the reflection (the fade-off "tail", -t) goes from 0 (no tail) to 1 (same size as image). Smaller values for reflection and the tail length make the reflection more subtle and less distracting. I think the default values are very good for most cases.

Check out the -p option for a nicer way to integrate the resulting image into other graphical elements of your presentation. It creates a PNG image with a transparency channel. This means you can place it above other graphical elements (such as a different background color) and the reflection will still look right. See the next example to the right, where Amanda prefers a pink background. Keep in mind that the rendering step still assumes a white background, so drastic changes in background may or may not result in slight artifacts at the edges.

Amanda loves pink backgrounds!

You can also use this script with some pictures of hardware to make them look more interesting, provided the shot is taken dead-on from the front and doesn't have any border at the bottom. Use an angle value of 0; this will place your hardware onto that virtual glossy carbon plastic that makes it look nicer. See below for an embellished Sun Fire T5440 Server, the new flagship in our line of Chip-Multi-Threading (CMT) servers.

This script should work on any unixoid OS, especially Solaris, that understands sh and where a reasonably recent (6.x.x) version of ImageMagick and POV-Ray are available.

You can get ImageMagick and POV-Ray from their websites. On Solaris, you can easily install them through Blastwave. The version of ImageMagick that is shipped with Solaris in /usr/sfw is not recent enough for the way I'm using it, so the Blastwave version is recommended at the moment.

The Sun Fire T5440 Server, plus some added shininess.

The script is free, open source, distributed under the CDDL, and you don't have to attribute its use when using the resulting images in your own presentations, websites, or other derivative work.

It's free as in "free beer". Speaking of which, if you like this script, leave a comment or send me email at constantin at sun dot com telling me what you did with it, what other features you'd like to see in the script and where I can meet you for some beer :).


Thursday Mar 20, 2008

How to compile/run MediaTomb on Solaris for PS3 and other streaming clients

MediaTomb showing photos on a PS3

Before visiting CeBIT, I went to see my friend Ingo who works at the Clausthal University's computing center (where I grew up, IT-wise). This is a nice pre-CeBIT tradition we keep over the years when we get to watch movies in Ingo's home cinema and play computer games all day for a weekend or so :).

To my surprise, Ingo got himself a new PlayStation 3 (40GB). The new version is a lot cheaper (EUR 370 or so), less noisy (new chip process, no PS2 compatibility), and since HD-DVD is now officially dead, it's arguably the best value for money in Blu-Ray players right now (regular firmware upgrades, good picture quality, digital audio and enough horsepower for smooth Java BD content). All very rational and objective arguments to justify buying a new game console :).

The PS3 is not just a Blu-Ray player, it is also a game console (I recommend "Ratchet & Clank: Tools of Destruction" and the immensely cute "LocoRoco: Cocoreccho!", which is a steal at only EUR 3) and can act as a media renderer for DLNA-compliant media servers: watch videos and photos and listen to music in HD on the PS3 from your home server.

After checking out a number of DLNA server software packages, it seemed to me that MediaTomb is the most advanced open source one (TwonkyVision seems to be nicer, but sorry, it isn't open source...). So here is a step-by-step guide on how to compile and run it on a Solaris machine.

Basic assumptions

This guide assumes that you're using a recent version of Solaris. This should be at least Solaris 10 (it's free!), a current Solaris Express Developer Edition (it's free too, but more advanced) is recommended. My home server runs Solaris Express build 62, I'm waiting for a production-ready build of Project Indiana to upgrade to.

I'm also assuming that you are familiar with basic compilation and installation of open source products.

Whenever I compile and install a new software package from scratch, I use /opt/local as my base directory. Others may want to use /usr/local or some other directory (perhaps in their $HOME). Just make sure you use the right path in the --prefix=/your/favourite/install/path part of the ./configure command.

I'm also trying to be a good citizen and use the Sun Studio Compiler here where I can. It generally produces much faster code on both SPARC and x86 architectures vs. the ubiquitous gcc, so give it a try! Alas, sometimes the code was really stubborn and it wouldn't let me use Sun Studio so I had to use gcc. This was the path of least resistance, but with some tinkering, everything can be made to compile on Sun Studio. You can also try gcc4ss which combines a gcc frontend with the Sun Studio backend to get the best of both worlds.

Now, let's get started!

MediaTomb Prerequisites

Before compiling/installing the actual MediaTomb application, we need to install a few prerequisite packages. Don't worry, most of them are already present in Solaris, and the rest can be easily installed as pre-built binaries or easily compiled on your own. Check out the MediaTomb requirements documentation. Here is what MediaTomb wants:

  • sqlite3, libiconv and curl are available on BlastWave. BlastWave is a software repository for Solaris packages that has almost everything you need in terms of pre-built open source packages (but not MediaTomb...). Setting up BlastWave on your system is easy, just follow their guide. After that, installing the three packages above is as easy as:
    # /opt/csw/bin/pkg-get -i sqlite3
    # /opt/csw/bin/pkg-get -i libiconv
    # /opt/csw/bin/pkg-get -i curl
  • MediaTomb uses a library called libmagic to identify file types. It took a little research until I found out that it is part of the file package that is shipped as part of many Linux distributions. Here I'm using file-4.23.tar.gz, which seems to be a reasonably new version. Fortunately, this is easy to compile and install:

    $ wget
    $ gzip -dc file-4.23.tar.gz | tar xvf -
    $ cd file-4.23
    $ CC=/opt/SUNWspro/bin/cc ./configure --prefix=/opt/local
    $ gmake
    $ su
    # PATH=$PATH:/usr/ccs/bin:/usr/sfw/bin; export PATH; gmake install

    Notice that the last step is performed as root for installation purposes while compilation should generally be performed as a regular user.

  • For tag extraction of MP3 files, MediaTomb uses taglib:
    $ wget
    $ cd taglib-1.5
    $ CC=/usr/sfw/bin/gcc CXX=/usr/sfw/bin/g++ ./configure --prefix=/opt/local
    $ gmake
    $ su
    # PATH=$PATH:/usr/ccs/bin:/usr/sfw/bin; export PATH; gmake install
  • MediaTomb also uses SpiderMonkey, which is the Mozilla JavaScript Engine. Initially, I had some fear about having to compile all that Mozilla code from scratch, but then it dawned on me that we can just use the JavaScript libraries that are part of the Solaris Firefox standard installation, even the headers are there as well!

That was it. Now we can start building the real thing...

Compiling and installing MediaTomb

Now that we have all prerequisites, we can move on to downloading, compiling and installing the MediaTomb package:

  • Download the MediaTomb source from
  • Somehow, the MediaTomb developers enforce some funny LD_PRELOAD games, which is unnecessary (at least on recent Solaris versions...). So let's throw that part of the code out: edit src/ and comment out lines 128-141 by adding /* before line 128 and */ at the end of line 141.
  • Now we can configure the source to our needs. This is where all the prerequisite packages from above are configured in:
    $ ./configure --prefix=/opt/local --enable-iconv-lib --with-iconv-h=/opt/csw/include \
        --with-iconv-libs=/opt/csw/lib --enable-libjs \
        --with-js-h=/usr/include/firefox/js --with-js-libs=/usr/lib/firefox \
        --enable-libmagic --with-magic-h=/opt/local/include \
        --with-magic-libs=/opt/local/lib --with-sqlite3-h=/opt/csw/include

    Check out the MediaTomb compile docs for details. One hurdle here was to use an extra iconv library because the MediaTomb source didn't work with the gcc built-in iconv library. Also, there were some issues with the Sun Studio compiler, so I admit I was lazy and just used gcc instead. 

  • After these preparations, compiling and installing should work as expected:
    PATH=$PATH:/usr/ccs/bin:/usr/sfw/bin; export PATH; gmake install

Configuring MediaTomb

Ok, now we have successfully compiled and installed MediaTomb, but we're not done yet. The next step is to create a configuration file that works well. An initial config will be created automatically during the very first startup of MediaTomb. Since we compiled in some libraries from different places, we either need to set LD_LIBRARY_PATH during startup (e.g. in a wrapper script) or update the linker's path using crle(1).
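A wrapper script along these lines could serve as a starting point. The binary location and library paths are the ones used earlier in this guide; treat them as assumptions for your own system.

```shell
#!/bin/sh
# Wrapper sketch: extend the runtime library path, then start MediaTomb.
# /opt/csw/lib (Blastwave) and /opt/local/lib match the installs above;
# adjust the interface and port to your setup.
MEDIATOMB=${MEDIATOMB:-/opt/local/bin/mediatomb}
start_mediatomb() {
    LD_LIBRARY_PATH=/opt/csw/lib:/opt/local/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
    export LD_LIBRARY_PATH
    "$MEDIATOMB" --interface bge0 --port 49194 --daemon "$@"
}
```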

In my case, I went for the first option. So, starting MediaTomb works like this:

/opt/local/bin/mediatomb --interface bge0 --port 49194 --daemon \
    --pidfile /tmp/

Of course you should substitute your own interface. The port number is completely arbitrary, it should just be above 49152. Read the command line option docs to learn how they work.

You can now connect to MediaTomb's web interface and try out some stuff, but the important thing here is that we now have a basic config file in $HOME/.mediatomb/config.xml to work with. The MediaTomb config file docs should help you with this.

Here is what I added to my own config and why:

  • Set up an account for the web user interface with your own user id and password. It's not the most secure server, but better than nothing. Use something like this in the <ui> section:
    <accounts enabled="yes" session-timeout="30">
      <account user="me" password="secret"/>
    </accounts>
  • Uncomment the <protocolInfo> tag because according to the docs, this is needed for better PS3 compatibility.
  • I saw a number of iconv errors, so I added the following to the config file in the import section. Apparently, MediaTomb can better handle exotic characters in file names (very common with music files) with the following tag:
  • The libmagic library won't find its magic information because it's now in a nonstandard place. But we can add it with the following tag, again in the import section:
  • A few mime types should be added for completeness:

    <map from="mpg" to="video/mpeg"/>
    <map from="JPG" to="image/jpeg"/>
    <map from="m4a" to="audio/mpeg"/>

    Actually, it should "just work" through libmagic, but it didn't for me, so adding these mime types was the easiest option. It also improves performance by saving libmagic calls. Most digital cameras use the uppercase "JPG" extension and MediaTomb seems to be case-sensitive, so adding the uppercase variant was necessary. It's also apparent that MediaTomb doesn't have much support for AAC (.m4a) even though it is the official successor to MP3 (more than 95% of my music is in AAC format, so this is quite annoying).

  • You can now either add <directory> tags to the <autoscan> tags for your media data in the config file, or add them through the web interface.

MediaTomb browser on a PS3

This is it. The pictures show MediaTomb running in my basement and showing some photos through the PS3 on the TV set. I hope that you can now work from here and find a configuration that works well for you. Check out the MediaTomb scripting guide for some powerful ways to create virtual directory structures of your media files.

MediaTomb is OK for showing movies, pictures and the occasional song on the PS3, but it's not perfect yet. It lacks support for AAC (tags, cover art, etc.) and it could use some extra scripts for more comfortable browsing structures. But that's the point of open source: now we can start adding more features to MediaTomb ourselves and bring it a few steps closer to usefulness.

Tuesday Feb 19, 2008

VirtualBox and ZFS: The Perfect Team

I've never installed Windows in my whole life. My computer history includes systems like the Dragon 32, the Commodore 128, then the Amiga, Apple PowerBook (68k and PPC) etc. plus the occasional Sun system at work. Even the laptop my company provided me with only runs Solaris Nevada, nothing else. Today, this has changed. 

A while ago, Sun announced the acquisition of Innotek, the makers of the open-source virtualization software VirtualBox. After having played with it for a while, I'm convinced that this is one of the coolest innovations I've seen in a long time. And I'm proud to see that this is another innovative German company joining the Sun family. Welcome, Innotek!

Here's why this is so cool.

Windows XP running on VirtualBox on Solaris Nevada

After having upgraded my laptop to Nevada build 82, I had VirtualBox up and running in a matter of minutes. OpenSolaris Developer Preview 2 (Project Indiana) runs fine on VirtualBox, so does any recent Linux (I tried Ubuntu). But Windows just makes for a much cooler VirtualBox demo, so I did it:

After 36 years of Windows freedom, I ended up installing it on my laptop, albeit on top of VirtualBox. Safer XP, if you will. At the top, you see my VirtualBox running Windows XP in all its Tele-Tubby-ish glory.

As you can see, this is a plain vanilla install, I just took the liberty of installing a virus scanner on top. Well, you never know...

So far, so good. Now let's do something others can't. First of all, this virtual machine uses a .vdi disk image to provide hard disk space to Windows XP. On my system, the disk image sits on top of a ZFS filesystem:

# zfs list -r poolchen/export/vm/winxp
NAME                                                          USED  AVAIL  REFER  MOUNTPOINT
poolchen/export/vm/winxp                                     1.22G  37.0G    20K  /export/vm/winxp
poolchen/export/vm/winxp/winxp0                              1.22G  37.0G  1.05G  /export/vm/winxp/winxp0
poolchen/export/vm/winxp/winxp0@200802190836_WinXPInstalled   173M      -   909M  -
poolchen/export/vm/winxp/winxp0@200802192038_VirusFree           0      -  1.05G  -

Cool thing #1: You can do snapshots. In fact I have two snapshots here. The first is from this morning, right after the Windows XP installer went through, the second has been created just now, after installing the virus scanner. Yes, there has been some time between the two snapshots, with lots of testing, day job and the occasional rollback. But hey, that's why snapshots exist in the first place.

Cool thing #2: This is a compressed filesystem:

# zfs get all poolchen/export/vm/winxp/winxp0
NAME                             PROPERTY         VALUE                    SOURCE
poolchen/export/vm/winxp/winxp0  type             filesystem               -
poolchen/export/vm/winxp/winxp0  creation         Mon Feb 18 21:31 2008    -
poolchen/export/vm/winxp/winxp0  used             1.22G                    -
poolchen/export/vm/winxp/winxp0  available        37.0G                    -
poolchen/export/vm/winxp/winxp0  referenced       1.05G                    -
poolchen/export/vm/winxp/winxp0  compressratio    1.53x                    -
poolchen/export/vm/winxp/winxp0  compression      on                       inherited from poolchen

ZFS has already saved me more than half a gigabyte of precious storage capacity!
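In absolute numbers, taking the 1.05 GB referenced and the 1.53x compressratio from the zfs output above (the ratio is scaled by 100 here because sh arithmetic is integer-only):

```shell
#!/bin/sh
# What a 1.53x compressratio means in MB, using the zfs output above.
referenced_mb=1075                                  # 1.05 GB actually on disk
ratio_x100=153                                      # compressratio 1.53x
logical_mb=$((referenced_mb * ratio_x100 / 100))    # uncompressed size of the data
saved_mb=$((logical_mb - referenced_mb))
echo "saved roughly ${saved_mb} MB"
```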

Next, we'll try out Cool thing #3: Clones. Let's clone the virus free snapshot and try to create a second instance of Win XP from it:

# zfs clone poolchen/export/vm/winxp/winxp0@200802192038_VirusFree poolchen/export/vm/winxp/winxp1
# ls -al /export/vm/winxp
total 12
drwxr-xr-x   5 constant staff          4 Feb 19 20:42 .
drwxr-xr-x   6 constant staff          5 Feb 19 08:44 ..
drwxr-xr-x   3 constant staff          3 Feb 19 18:47 winxp0
drwxr-xr-x   3 constant staff          3 Feb 19 18:47 winxp1
dr-xr-xr-x   3 root     root           3 Feb 19 08:39 .zfs
# mv /export/vm/winxp/winxp1/WindowsXP_0.vdi /export/vm/winxp/winxp1/WindowsXP_1.vdi

The clone has inherited the mountpoint from the upper level ZFS filesystem (the winxp one) and so we have everything set up for VirtualBox to create a second Win XP instance from. I just renamed the new container file for clarity. But hey, what's this?

VirtualBox Error Message 

Damn! VirtualBox didn't fall for my sneaky little clone trick. Hmm, where is this UUID stored in the first place?

# od -A d -x WindowsXP_1.vdi | more
0000000 3c3c 203c 6e69 6f6e 6574 206b 6956 7472
0000016 6175 426c 786f 4420 7369 206b 6d49 6761
0000032 2065 3e3e 0a3e 0000 0000 0000 0000 0000
0000048 0000 0000 0000 0000 0000 0000 0000 0000
0000064 107f beda 0001 0001 0190 0000 0001 0000
0000080 0000 0000 0000 0000 0000 0000 0000 0000
0000336 0000 0000 0200 0000 f200 0000 0000 0000
0000352 0000 0000 0000 0000 0200 0000 0000 0000
0000368 0000 c000 0003 0000 0000 0010 0000 0000
0000384 3c00 0000 0628 0000 06c5 fa07 0248 4eb6
0000400 b2d3 5c84 0e3a 8d1c 8225 aae4 76b5 44f5
0000416 aa8f 6796 283f db93 0000 0000 0000 0000
0000432 0000 0000 0000 0000 0000 0000 0000 0000
0000448 0000 0000 0000 0000 0400 0000 00ff 0000
0000464 003f 0000 0200 0000 0000 0000 0000 0000
0000480 0000 0000 0000 0000 0000 0000 0000 0000
0000512 0000 0000 ffff ffff ffff ffff ffff ffff
0000528 ffff ffff ffff ffff ffff ffff ffff ffff
0012544 0001 0000 0002 0000 0003 0000 0004 0000

Ahh, it seems to be stored at byte 392, with varying degrees of byte- and word-swapping. Some further research reveals that you'd better leave the first part of the UUID alone (I'll spare you the details...). Instead, the last six bytes (845c3a0e1c8d, sitting at bytes 402-407) look like a great candidate for an arbitrary serial number. Let's try changing them (this is a hack for demo purposes only, please don't do this in production):

# dd if=/dev/random of=WindowsXP_1.vdi bs=1 count=6 seek=402 conv=notrunc
6+0 records in
6+0 records out
# od -A d -x WindowsXP_1.vdi | more
0000000 3c3c 203c 6e69 6f6e 6574 206b 6956 7472
0000016 6175 426c 786f 4420 7369 206b 6d49 6761
0000032 2065 3e3e 0a3e 0000 0000 0000 0000 0000
0000048 0000 0000 0000 0000 0000 0000 0000 0000
0000064 107f beda 0001 0001 0190 0000 0001 0000
0000080 0000 0000 0000 0000 0000 0000 0000 0000
0000336 0000 0000 0200 0000 f200 0000 0000 0000
0000352 0000 0000 0000 0000 0200 0000 0000 0000
0000368 0000 c000 0003 0000 0000 0010 0000 0000
0000384 3c00 0000 0628 0000 06c5 fa07 0248 4eb6
0000400 b2d3 2666 6fbb c1ca 8225 aae4 76b5 44f5
0000416 aa8f 6796 283f db93 0000 0000 0000 0000
0000432 0000 0000 0000 0000 0000 0000 0000 0000
0000448 0000 0000 0000 0000 0400 0000 00ff 0000
0000464 003f 0000 0200 0000 0000 0000 0000 0000
0000480 0000 0000 0000 0000 0000 0000 0000 0000
0000512 0000 0000 ffff ffff ffff ffff ffff ffff
0000528 ffff ffff ffff ffff ffff ffff ffff ffff
0012544 0001 0000 0002 0000 0003 0000 0004 0000

Who needs a hex editor when you have good old friends od and dd on board? The trick is in the "conv=notrunc" part. It tells dd to leave the rest of the file as is and not truncate it after doing its patching job. Let's see if it works:
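If you want to convince yourself of dd's in-place patching behavior without risking a real disk image, here's a minimal sketch on a scratch file (demo.img is a made-up name; I use /dev/urandom here so it never blocks):

```shell
# create a 1 KiB scratch file standing in for a disk image
dd if=/dev/zero of=/tmp/demo.img bs=512 count=2 2>/dev/null
# patch 6 random bytes in place at offset 402, just like above
dd if=/dev/urandom of=/tmp/demo.img bs=1 count=6 seek=402 conv=notrunc 2>/dev/null
# conv=notrunc left the rest of the file alone: still 1024 bytes
wc -c < /tmp/demo.img
```

Without conv=notrunc, dd would truncate the file right after the patched bytes, destroying the image.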

VirtualBox with two Windows VMs, one ZFS-cloned from the other.

Eureka, it works! Notice that the second instance is running with the freshly patched harddisk image, as shown in the window above.

Windows XP booted without any problem from the ZFS-cloned disk image. There was just the occasional popup message from Windows saying that it found a new harddisk (well observed, buddy!).

Thanks to ZFS clones we can now create new virtual machine clones in just seconds without having to wait a long time for disk images to be copied. Great stuff. Now let's do what everybody should be doing to Windows once a virus scanner is installed: Install Firefox:

Cloned WinXP instance, running Firefox

I must say that the performance of VirtualBox is stunning. It sure feels like the real thing; you just need to make sure you have enough memory in your real computer to support both OSes at once, otherwise you'll run into swapping hell...

BTW: You can also use ZFS volumes (called ZVOLs) to provide storage space to virtual machines. You can snapshot and clone them just like regular file systems, plus you can export them as iSCSI devices, giving you the flexibility of a SAN for all your virtualized storage needs. The reason I chose files over ZVOLs was just so I can swap pre-installed disk images with colleagues. On second thought, you can dump/restore ZVOL snapshots with zfs send/receive just as easily...

Anyway, let's see how we're doing storage-wise:

# zfs list -rt filesystem poolchen/export/vm/winxp
NAME                              USED  AVAIL  REFER  MOUNTPOINT
poolchen/export/vm/winxp         1.36G  36.9G    21K  /export/vm/winxp
poolchen/export/vm/winxp/winxp0  1.22G  36.9G  1.05G  /export/vm/winxp/winxp0
poolchen/export/vm/winxp/winxp1   138M  36.9G  1.06G  /export/vm/winxp/winxp1

Watch the "USED" column for the winxp1 clone. That's right: Our second instance of Windows XP only cost us a meager 138 MB on top of the first instance's 1.22 GB! Both filesystems (and their .vdi containers with Windows XP installed) represent roughly a Gigabyte of storage each (the REFER column), but the actual physical space our clone consumes is just 138MB.

Cool thing #4: ZFS clones save even more space, big time!

How does this work? Well, when ZFS creates a snapshot, it only creates a new reference to the existing on-disk tree-like block structure, indicating where the entry point for the snapshot is. If the live filesystem changes, only the changed blocks need to be written to disk, the unchanged ones remain the same and are used for both the live filesystem and the snapshot.

A clone is a snapshot that has been marked writable. Again, only the changed (or new) blocks consume additional disk space (in this case Firefox and some WinXP temporary data), everything that is unchanged (in this case nearly all of the WinXP installation) is shared between the clone and the original filesystem. This is de-duplication done right: Don't create redundant data in the first place!

That was only one example of the tremendous benefits Solaris can bring to the virtualization game. Imagine the power of ZFS, FMA, DTrace, Crossbow and whatnot providing the best infrastructure possible to your virtualized guest operating systems, be they Windows, Linux, or Solaris. It works in the SPARC world (through LDOMs) and in the x86/x64 world through xVM server (based on the work of the Xen community), now joined by VirtualBox. Oh, and it's free and open source, too.

So with all that: Happy virtualizing, everyone. Especially to everybody near Stuttgart.

Tuesday Feb 05, 2008

Faster Email Deletion

Cook unloading a very large pumpkin from a truck. A while ago I wrote about email efficiency, which turned out to be a fairly popular article. During the recent holidays, I had about 4 weeks of vacation. The pile of email that accumulated during this period was humongous! Hundreds of emails in my Inbox, after server-side filtering (which means I need to pay real attention to all of them), and over 7000 emails in my "To read" folder (which means finding a few needles in a large haystack).

It became apparent that the speed at which one can delete email is a survival factor in the information age.

During "normal" days, I would scan my Inbox (usually 20-30 mails a day, after server-side filtering) in the right order (My emails are sorted by priority thanks to the Thunderbird mail client). Then file away the good ones (into a single folder, since Thunderbird's search capabilities are real good) and delete the rest. After that, I would scan the "To read" folder for anything interesting and only occasionally file away something. Most of the time, I would just hit the delete key.

Thunderbird's threading ability is a great help: I can view email discussions on popular aliases with dozens of emails each as a single thread, then delete the whole thread (Shift-Ctrl-A to select, then Del) at once. Two keystrokes, dozens of emails gone. But this doesn't help with 7000 emails in your "To read" list while scanning them for a few dozen that might be interesting.

The numbers add up: while I'm pretty good at scanning email subjects and senders for keep/delete decisions, every "delete" action takes a second or two for the transaction to complete with the email server. I simply didn't have 7000-14000 seconds (that's 2-4 hours!) just for deleting.

Then it struck me: why not just scan, file away the good ones, and at the end select everything that's left and do one huge delete action? Most of the 7000 emails were gone in just 2 seconds, after scanning them for about an hour or so.

I didn't bother thinking about this earlier, because deleting 200-300 emails a day in smaller batches doesn't feel like much. Only after they added up over the holidays did it become apparent to me what a huge time-sink the simple act of deleting emails can be. Now I always do it this way: scan first, file away the good ones, then hit Ctrl-A, then Del, and everything else is gone. This gives me about half a day of time back per month. Half. A. Day. Per. Month. 

Sometimes, it's just those little things that can make a huge difference.

P.S.: If you want me to read your email, don't put me on BCC only.

Monday Jan 21, 2008

The Newly Found Art of Presenting

I haven't blogged for a long time. Ok, there were the holidays and I was on vacation for the first two weeks of January etc. but that's not a good excuse. I'm not the this-is-what-I-did blogger, because I don't find that too interesting to blog about. Follow me on Twitter if you're really bored. I don't like the I-read-this-on-that-blog-and-agree/disagree/opine style either, as I'm not particularly chatty and much of what is written in this style seems kinda redundant.

I prefer picking an interesting subject and trying to write something that strikes me as useful and that I hope may be useful for others.

Last week, for instance, I was invited to present at a Sun-internal event: three presentations on broad and complex topics, and then my time was cut to 3 x 30 minutes because there were so many other presentations to fit into the agenda.

Last year, I ran into Guy Kawasaki's "10/20/30 rule of PowerPoint" (Guy, I assure you, StarOffice/OpenOffice is compatible with this rule, as with virtually all PowerPoint presentations). Watch him illustrate the concept in this fun 2-minute video:

I made a New Year's resolution to try it out and stick to it as much as possible.

First presentation: "Workstations and High End Visualization Solutions". Hmm, two topics. Will it work? Slide #1 was about Guy's rules, as a warning to the audience. 9 to go. Slide #2 shows the Sun workstation portfolio, both SPARC and x86 ones. Then: Positioning, usefulness of >4 cores in a desktop, monitors, NVIDIA Quadro Plex, NVIDIA Tesla, Visualization in HPC overview, Sun Visualization Software, Summary. I didn't read a single word from my slides, they were all mostly (except for some tech specs) 30 points or larger anyway and the audience grasped them instantly. Instead, I enjoyed a nice flow of information to the audience, adorning the slide content with customer stories, practical examples and the occasional joke. After 20 minutes, there was still room left for Q&A which fit exactly into my remaining 10 minutes comfortably. It worked!

Second: "EcoComputing and UltraSPARC T1/T2". This isn't going to be easy... The presentation I stole (and compressed) the Eco part from is 20 slides long (thanks, Rolf!) and a "good" UltraSPARC presentation can easily have 50 slides! But, so be it: # of servers in Europe (millions), the amount of power they consume (37TWh or 4 nuclear power plants in 2006), stuff that helps (consolidation, more efficient servers, Sun Rays, tape), UltraSPARC T2 servers, UltraSPARC T2 chip overview and features, example benchmark (SPECjbb2005, 10x more efficient than a GHz hungry server), application compatibility (It runs almost everything beautifully), project eTude, customer example, Victoria Falls (it gets even better). It worked again: All major messages came through, they won't be forgotten. Let me quote a favourite technology of mine: "The audience is listening." Saving the world with 10 slides in 20 minutes!

Last topic: Web 2.0. A tough deck to prepare. A topic dear to my heart. So much to say, so little space-time. Guy makes it really hard to prepare, as he forces you to dig deeply into your content and really bring out what's essential: Web 1.0 vs. Web 2.0, The Flickr Example, User Generated Content & Wisdom of Crowds, Long Tail, Folksonomy & Tagging, Social Networks & Viral Distribution, Rich Web Interfaces, Open APIs & Mashups, The New World of Software (we're in the middle of buying a piece of it), What This All Means & Call to Action. I may have cheated with all the "&"s that combine two slides into one. And I'd like them to be more visual, which I'm going to change. And there's still too much text. But it doesn't matter: I hardly use the text on the slides, they're just waypoints in an excited, motivating plea for participation. I'm sweating. It was the last talk of the day, but the audience was still there and they loved it!

Simplicity is king, especially with presentations. Limiting your deck to just 10 slides doesn't make preparation easier: each slide for the presentations above took a lot of work to create (or not to create), but it forced me to concentrate on what's important. And it worked out just great. Thank you, Guy!

Want more? Try out the Presentation Zen blog. Today, I received the book in my mail, it's brand new, still warm and I'm looking forward to reading it.

Sunday Oct 21, 2007

How to burn high resolution DVD-Audio DVDs on Solaris and Linux (And it's legal!)

This weekend I burned my first DVD-Audio DVD with high resolution music at 96 kHz/24 bit.

It all started with this email I got from Linn Records, advertising the release of their Super Audio Surround Collection Vol 3 Sampler (Yes, targeted advertising works, but only if customers choose to receive it), which is offered in studio master quality FLAC format files, as a download. Gerald and I applauded Linn Records a few months ago for offering high quality music as lossless quality downloads, so I decided to try out their high resolution studio master quality offerings.

The music comes as 96 kHz/24-bit FLAC-encoded files. These can be played back quite easily on a computer with a high-resolution-capable sound card, but computers don't really look good in living rooms, despite all the home theater PC and other efforts. The better alternative is to burn your own DVD-Audio disc and then use a DVD-A-capable DVD player connected to your HiFi amplifier to play back the music.

There's a common misconception that "DVD-Audio" means "DVD-Video" without the picture which is wrong. DVD-Video is one standard, aimed at reproducing movies, that uses PCM, AC-3, DTS or MP2 (mostly lossy) for encoding audio, while DVD-Audio sacrifices moving pictures (allowing only still ones for illustration) so it can use the extra bandwidth for high resolution audio, encoded as lossless PCM or lossless MLP bitstreams. Also, note that it is not common for regular DVD-players to accept DVD-Audio discs, they must state that they can handle the format, otherwise you're out of luck. Some if not most DVD-Audio Discs are hybrid in that they offer the content stored in DVD-Audio format additionally as DVD-Video streams with one of the lossy DVD-Video audio codecs so they can be played on both DVD-Video and DVD-Audio players.


Now, after having downloaded a bunch of high-res FLAC audio files, how can you create a DVD-Audio disc? Here's a small open source program called dvda-author that does just that: Give it a bunch of FLAC or WAV files and a directory, and it'll create the correct DVD-A UDF file structure for you. It compiles very easily on Solaris so I was able to use my Solaris fileserver in the basement where I downloaded the songs to. Then you give the dvda-author output directory along with a special sort file (supplied by dvda-author) to mkisofs (which is included in Solaris in the /usr/sfw directory) and it'll create a DVD ISO image that you can burn onto any regular DVD raw media. It's all described nicely on the dvda-author How-To page. Linn Records also supplies a PNG image to download along with the music that you can print and use as your DVD-Audio cover.
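If you want to try this at home, the whole pipeline is surprisingly short. Here's a rough sketch from memory, with made-up file and directory names; check the dvda-author How-To page for the exact options and the sort-file mechanics:

```
# dvda-author -o /tmp/dvda track01.flac track02.flac
# mkisofs -o dvdaudio.iso -udf -sort dvda.sort /tmp/dvda
```

Then burn dvdaudio.iso onto a blank DVD with your favorite burning program.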

And how about iPods and other MP3-Players? Most open source media players such as the VideoLan Client (VLC) can transcode from high resolution FLAC format to MP3 or AAC so that's easily done, too. For Mac users, there's a comfortable utility called XLD that does the transcoding for you.

Here's common misconception #2: many people think AAC is proprietary to Apple, mostly because Apple heavily advertises its use as their standard for music encoding. This is wrong. AAC is actually an open standard: it is part of the ISO/IEC MPEG-4 specification and therefore the legitimate successor to MP3. AAC delivers better audio quality at lower bitrates, and even the inventors of MP3, the Fraunhofer IIS institute, treat it as such — just check their current projects page under the letter "A". Apple developed the "Fairplay" DRM extension to QuickTime (which is the official MPEG-4/AAC encapsulation format) to be able to sell their iTunes Music Store as a download portal to the music industry. Fairplay is proprietary to Apple, but has nothing to do with AAC per se.

As much as I love Apple's way of using open standards wherever possible, I don't think it's a good thing that their marketing department creates the illusion of these technologies being Apple's own. This is actually an example of how AAC suffers in the public perception because people think it's proprietary where the opposite is true.

How is the actual music, you ask? Good. The album is a nice mixture of jazz and classical music, both in smooth and in more lively forms, great for a nice dinner and produced with a very high quality. Being a sampler, this album gives you a good overview of current Linn Records productions, so you can choose your favourite artists and then dig deeper into the music you liked most.

There's one drawback still: The high-res files available on the Linn Records download store are currently stereo only, while the physical SACD releases come with 5.1 surround sound. It would be nice if they could introduce 5.1 FLAC downloads in the future. That would make downloading high resolution audio content perfect, and this silly SACD/DVD-Audio/Dolby-TrueHD/DTS-HD Master Audio war would finally be over.

P.S.: A big hello to the folks at who were so kind to link to my previous high resolution audio entry!


Thursday Sep 20, 2007

Robot Surgery 101

A Roomba vacuuming robot. Today, Roomba was ill. Roomba is our vacuuming robot.

How come we have one? Well, during CeBIT 2006 I met Roger Meike. He works at SunLabs and was presenting one of my favourite projects: Sun SPOT. Anyway, we talked a bit about robots and if they really can be useful or just expensive toys. He recommended Roomba to me and said it would really work.

So I checked out the iRobot home page. Gosh! Their main business is building robots for military purposes! You know, just like in the movies with robot arms and belt-drive, searching for bombs, that kind of thing. So if there is a company capable of building a real working vacuum bot, then it must be these guys. Millions of dollars of robot research hovering around my home for just a few hundred bucks.

It actually works well, and Roomba has been vacuuming our home for quite a while now. The deal is this: you save time because you don't have to vacuum your home; Roomba does it while you're away working or at the supermarket. It may be slower, but it doesn't steal your time, so it really acts as a time-saver. If you save, say, 15 minutes per week by not vacuuming and value your free time at, say, 10-20 dollars per hour, then you'll have more than paid off the cost of a Roomba after a year or two. And you can have a lot of geek fun, too!

Anyway, today Roomba decided to not work correctly anymore, it would spin around in circles and never go straight. After checking iRobot's support area and some forums, it seemed to be the bumper sensor. But thorough cleaning of the robot and more or less gentle banging on its bumper wouldn't improve Roomba's behaviour much. More drastic measures were needed.

Thanks to some diagnostic instructions I found on the web, it became clear that the right bumper sensor really was kind of stuck. Fortunately, I found some well-documented disassembly instructions and was able to remove the bumper assembly to check out the sensors. Roomba uses a lot of optical sensors, even for the mechanical parts (based on rods interrupting light beams, etc.), which makes them very robust but sometimes sensitive to dust build-up. Now that the sensors have all been cleaned, our Roomba is back and can continue to happily vacuum our home again!

Thursday Sep 06, 2007

7 Easy Tips for ZFS Starters

So you're now curious about ZFS. Maybe you read Jonathan's latest blog entry on ZFS or you've followed some other buzz on the Solaris ZFS file system or maybe you saw a friend using it. Now it's time for you to try it out yourself. It's easy and here are seven tips to get you started quickly and effortlessly:

1. Check out what Solaris ZFS can do for you

First, get yourself a picture of what the Solaris ZFS filesystem is, what features it has and how it can work to your advantage. Check out the CSI:Munich video for a fun demo on how Solaris ZFS can turn 12 cheap USB memory sticks into highly available, enterprise-class, robust storage. Of course, what works with USB sticks also works with your own harddisks or any other storage device. Also, there are great ZFS screencasts that show you some more powerful features in an easy-to-follow way. Finally, there's a nice writeup on "What is ZFS?" at the OpenSolaris ZFS Community's homepage.

2. Read some (easy) documentation

It's easy to configure Solaris ZFS. Really. You just need to know two commands: zpool (1M) and zfs (1M). That's it. So, get your hands on a Solaris system (or download and install it for free) and take a look at those manpages. If you still want more, there's of course the ZFS Administration Guide with detailed planning, configuration and troubleshooting steps. If you want to learn even more, check out the OpenSolaris ZFS Community Links page. German-speaking readers are invited to read my German white paper on ZFS or listen to episode #006 of the POFACS podcast.

3. Dive into the pool

Solaris ZFS manages your storage devices in pools. Pools are a convenient way of abstracting storage hardware and turning it into a repository of blocks to store your data in. Each pool takes a number of devices and applies an availability scheme (or none) to them. Pools can then be easily expanded by adding more disks. Use pools to manage your hardware and its availability properties. You could create a mirrored pool for data that should be protected against disk failure and that needs fast access. Then, you could add another pool using RAID-Z (which is similar to, but better than, RAID-5) for data that needs to be protected but where performance is not the first priority. For scratch, test or demo data, a pool without any RAID scheme is OK, too. Pools are easily created:

zpool create mypool mirror c0d0 c1d0

This will create a mirror out of the two disk devices c0d0 and c1d0. Similarly, you can easily create a RAID-Z pool:

zpool create mypool raidz c0d0 c1d0 c2d0

The easiest way to turn a disk into a pool is:

zpool create mypool c0d0

It's that easy. All the complexity of finding, sanity-checking, labeling, formatting and managing disks is hidden behind this simple command.

If you don't have any spare disks to try this out with, then you can just create yourself some files, then use them as if they were block devices:

# mkfile 128m /export/stuff/disk1
# mkfile 128m /export/stuff/disk2
# zpool create testpool mirror /export/stuff/disk1 /export/stuff/disk2
# zpool status testpool
  pool: testpool
 state: ONLINE
 scrub: none requested
config:

        NAME                     STATE     READ WRITE CKSUM
        testpool                 ONLINE       0     0     0
          mirror                 ONLINE       0     0     0
            /export/stuff/disk1  ONLINE       0     0     0
            /export/stuff/disk2  ONLINE       0     0     0

errors: No known data errors

The cool thing about this procedure is that you can create as many virtual disks as you like and then test ZFS's features such as data integrity, self-healing, hot spares, RAID-Z and RAID-Z2 etc. without having to find any free disks.

When creating a pool for production data, think about redundancy. There are three basic properties to storage: availability, performance and space. And it's a good idea to prioritize them in that order: Make sure you have redundancy (mirroring, RAID-Z, RAID-Z2) so ZFS can self-heal data when stuff goes wrong at the hardware level. Then decide how much performance you want. Generally, mirroring is faster and more flexible than RAID-Z/Z2, especially if the pool is degraded and ZFS needs to reconstruct data. Space is the cheapest of all three, so don't be greedy and try to give priority to the other two. Richard Elling has some great recommendations on RAID, space and MTTDL. Roch has also posted a great article on mirroring vs. RAID-Z.

4. The power to give

Once you have set up your basic pool, you can already access your new ZFS file system: Your pool has been automatically mounted for you in the root directory. If you followed the examples above, then you can just cd to /mypool and start using ZFS!

But there's more: Creating additional ZFS file systems that use your pool's resources is very easy, just say something like:

zfs create mypool/home
zfs create mypool/home/johndoe
zfs create mypool/home/janedoe

Each of these commands only takes seconds to complete and every time you will get a full new file system, already set up and mounted for you to start using it immediately. Notice that you can manage your ZFS filesystems hierarchically as seen above. Use pools to manage storage properties at the hardware level, use filesystems to present storage to your users and applications. Filesystems have properties (compression, quotas, reservations, etc.) that you can easily administer using zfs set and that are inherited across the hierarchy. Check out Chris Gerhard's blog on more thoughts about file system organization.
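In practice, property management is just as terse as filesystem creation. A sketch, following the pool and user names from the examples above:

```
# zfs set compression=on mypool/home
# zfs set quota=10G mypool/home/johndoe
# zfs get -r compression,quota mypool/home
```

The compression setting is inherited by mypool/home/johndoe and mypool/home/janedoe automatically, while the quota applies only to the filesystem it was set on.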

5. Snapshot early, snapshot often

ZFS snapshots are quick, easy and cheap. Much cheaper than the horrible experience when you realize that you just deleted a very important file that hasn't been backed up yet! So, use snapshots whenever you can. If you're debating whether to snapshot or not, just do it. I recently spent only about $220 on two 320 GB USB disks for my home server to expand my pool with. At these prices, the time you spend thinking about whether to snapshot or not may be worth more than just buying more disk.

Again, Chris has some wisdom on this topic in his ZFS snapshot massacre blog entry. He once had over 60000 snapshots, and he's snapshotting filesystems by the minute! Since snapshots in ZFS "just work" and since they only take up the space that actually changes between snapshots, there's really no reason not to snapshot all the time. Maybe once per minute is a bit exaggerated, but once a week, once a day or once an hour per active filesystem is definitely good advice.

Instead of time-based snapshotting, Chris came up with the idea of snapshotting a file system shared with Samba whenever the Samba user logs in!

6. See the Synergy

ZFS by itself is very powerful. But the full beauty of it can be unleashed by combining ZFS with other great Solaris 10 features. Here are some examples:

  • Tim Foster has written a great SMF service that will snapshot your ZFS filesystems on a regular basis. It's fully automatic, configurable and integrated with SMF in a beautiful way.

  • ZFS can create block devices, too. They are called zvols. Since Nevada build 54, they are fully integrated into the Solaris iSCSI infrastructure. See Ben Rockwood's blog entry on the beauty of iSCSI with ZFS.

  • A couple of people are now elevating this concept even further: Take two Thumpers, create big zvols inside them, export them through iSCSI and mirror over them with ZFS on a server. You'll get a huge, distributed storage subsystem that can be easily exported and imported on a regular network. A poor man's SAN and a powerful shared storage for future HA clusters thanks to ZFS, iSCSI and Thumper! Jörg Möllenkamp is taking this concept a bit further by thinking about ZFS, iSCSI, Thumper and SAM-FS.

  • Check out some cool Sun StorageTek Availability Suite and ZFS demos here.

  • ZFS and boot support is still in the works, but if you're brave, you can try it out with the newer Solaris Nevada distributions on x64 systems. Think about the possibilities together with Solaris Live Upgrade! Create a new boot environment in seconds while not needing to find or dedicate a new partition, thanks to snapshots, while saving most of the needed disk space!

And that's only the beginning. As ZFS becomes more and more adopted, we'll see many more creative uses of ZFS with other Solaris 10 technologies and other OSes.

7. Beam me up, ZFS!

One of the most amazing features of ZFS is zfs send/receive. zfs send will turn a ZFS filesystem into a bitstream that you can save to a file, pipe through bzip2 for compression, or send through ssh to a distant server for archiving or for remote replication through the corresponding zfs receive command. It also supports incremental sending and receiving of subsequent snapshots through the -i option.
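The basic send/receive pattern looks like this (host, pool and snapshot names are made up for illustration):

```
# zfs snapshot mypool/home@today
# zfs send mypool/home@today | ssh otherhost zfs receive otherpool/home
# zfs send -i mypool/home@yesterday mypool/home@today | ssh otherhost zfs receive otherpool/home
```

The first send transfers the complete filesystem; the -i variant only transfers the blocks that changed between the two snapshots.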

This is a powerful feature with a lot of uses:

  • Create your Solaris zone as a ZFS filesystem, complete with applications, configuration, automation scripts, users etc., zfs send | bzip2 >zone_archive.zfs.bz2 it for later use. Then, unpack and create hundreds of cloned zones out of this master copy.

  • Easily migrate ZFS filesystems between pools on the same machine or on distant machines (through ssh) with zfs send/receive.

  • Create a crontab entry that takes a snapshot every minute, then zfs send -i it over ssh to a second machine where it is piped into zfs receive. Tadah! You'll get free, finely-grained, online remote replication of your precious data.

  • Easily create efficient full or incremental backups of home directories (each in their own ZFS filesystems) through ZFS send. Again, you can compress them and treat them like you would, say, treat a tar archive.
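The minute-by-minute replication idea above could be sketched as a crontab fragment like this (dataset and host names are made up, and a real script would have to keep track of the previous snapshot's name instead of the fixed ones shown here):

```
* * * * * zfs snapshot tank/data@latest && zfs send -i tank/data@previous tank/data@latest | ssh backuphost zfs receive -F tank/data-copy
```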

See? It is easy, isn't it? I hope this guide helps you find your way around the world of ZFS. If you want more, drop by the OpenSolaris ZFS Community, we have a mailing list/forum where bright and friendly people hang out that will be glad to help you.

