Thursday Jun 26, 2008

ZFS and Mac OS X Time Machine: The Perfect Team

A few months ago, I wrote about "X4500 + Solaris ZFS + iSCSI = Perfect Video Editing Storage". And thanks to you, my readers, it became one of my most popular blog entries. Then I wrote about "VirtualBox and ZFS: The Perfect Team", which turned out to be another very popular blog article. Well, I'm glad to introduce you to another perfect team now: Solaris ZFS and Mac OS X Time Machine.

Actually, it began a long time ago: In December '06, Ben Rockwood wrote about the beauty of ZFS and iSCSI integration, and immediately I thought "That's the perfect solution to back up my Mac OS X PowerBook!" No more strings attached, just back up over WLAN to a really good storage device that lives on Solaris ZFS, while still using the Mac OS X native file system peculiarities. But Apple didn't have an iSCSI initiator yet (they still don't have one now) and the only free iSCSI initiator I could find was buggy, unstable and didn't like Solaris targets at all.

Then, Apple announced their Time Machine technology. Many people thought that this was related to them supporting ZFS and in fact, it's easy to believe that Time Machine's travels back in time are supported by ZFS snapshots. But they aren't. In reality, it's just a clever use of hardlinks. And not a very efficient one, either: Whenever a file changes, the whole file gets backed up again, even if you just changed a little bit of it.
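
To make the hardlink idea concrete, here is a minimal Python sketch of a hardlink-based snapshot scheme. It is not Apple's implementation, only an illustration of the general technique: the function name snapshot, the directory layout and the demo files are all made up for this example. Unchanged files are hardlinked to the previous snapshot, changed files are copied in full.

```python
import filecmp
import os
import shutil
import tempfile


def snapshot(source_dir, previous_snapshot, new_snapshot):
    """Create new_snapshot from source_dir (hypothetical helper, not Time Machine code).

    Files identical to their copy in previous_snapshot are hardlinked,
    everything else is copied in full. Returns the number of full copies.
    """
    copied = 0
    for root, _dirs, files in os.walk(source_dir):
        rel = os.path.relpath(root, source_dir)
        target_dir = os.path.normpath(os.path.join(new_snapshot, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(target_dir, name)
            old = None
            if previous_snapshot is not None:
                old = os.path.normpath(os.path.join(previous_snapshot, rel, name))
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                os.link(old, dst)       # unchanged: costs no extra data blocks
            else:
                shutil.copy2(src, dst)  # new or changed: the whole file is copied
                copied += 1
    return copied


# Small demo in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "source")
    os.makedirs(source)
    for name in ("a.txt", "b.txt"):
        with open(os.path.join(source, name), "w") as f:
            f.write("original contents of " + name)

    first = os.path.join(tmp, "snap1")
    second = os.path.join(tmp, "snap2")
    copied_first = snapshot(source, None, first)

    with open(os.path.join(source, "b.txt"), "a") as f:
        f.write("!")  # a one-byte change

    copied_second = snapshot(source, first, second)
    a_shared = os.path.samefile(os.path.join(first, "a.txt"),
                                os.path.join(second, "a.txt"))
    b_shared = os.path.samefile(os.path.join(first, "b.txt"),
                                os.path.join(second, "b.txt"))

print(copied_first, copied_second, a_shared, b_shared)  # prints: 2 1 True False
```

The one-byte change to b.txt forces a full second copy of that file, which is exactly the inefficiency described above.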

Last week, a colleague of mine told me that Studio Network Solutions had updated their globalSAN iSCSI Initiator software for Mac OS X and that it now works well with Solaris iSCSI targets. I decided to give it another try. So, here are two ZFS ZVOLs sitting on my OpenSolaris 2008.05 home server:

Sun Microsystems Inc.   SunOS 5.11      snv_86  January 2008
-bash-3.2$ zfs list -rt volume 
NAME                             USED  AVAIL  REFER  MOUNTPOINT
santiago/volumes/aperturevault  6.50G   631G  6.50G  -
santiago/volumes/mbptm           193G   631G   193G  -

They have both been shared as iSCSI targets through a single zfs set shareiscsi=on santiago/volumes command, thanks to ZFS' attribute inheritance:

-bash-3.2$ zfs get shareiscsi santiago/volumes
NAME              PROPERTY    VALUE             SOURCE
santiago/volumes  shareiscsi  on                local
-bash-3.2$ zfs get shareiscsi santiago/volumes/aperturevault
NAME                            PROPERTY    VALUE                           SOURCE
santiago/volumes/aperturevault  shareiscsi  on                              inherited from santiago/volumes
-bash-3.2$ zfs get shareiscsi santiago/volumes/mbptm
NAME                    PROPERTY    VALUE                   SOURCE
santiago/volumes/mbptm  shareiscsi  on                      inherited from santiago/volumes

On the Mac side, they show up nicely in the globalSAN GUI.

And Disk Utility can format them perfectly as if they were real disks:

[Screenshot: Disk Utility with 2 iSCSI disks]

Time Machine happily accepted one of the iSCSI disks and synced more than 190GB to it just fine. As I type these lines, Aperture is busy syncing more than 40GB of photos to the other iSCSI disk (it wouldn't accept a network share). Sometimes, they're busy working simultaneously :).

Of course, iSCSI performance heavily depends on network performance, so for larger transfers, a cable connection is mandatory. But the occasional Time Machine or Aperture sync in the background runs just fine over WLAN.

So finally, Solaris and Mac fans can have a Time Machine based on ZFS, with real data integrity, redundancy, robustness, two different ways of travelling through time (ZVOLs can be snapshotted just like regular ZFS file systems) and much more.

Many thanks to Christiano for letting me know and to the guys at Studio Network Solutions for making this possible. And of course to the ZFS team for a wonderful piece of open storage software!

Thursday Dec 06, 2007

X4500 + Solaris ZFS + iSCSI = Perfect Video Editing Storage

Digital video editing is one of those applications that tend to be very data hungry. At SD PAL resolution, we're talking about 720 pixels x 576 lines x 3 bytes of color x 25 full frames per second = about 30 MB/s of data. That's about 224 GB for a 2 hour feature film. Not counting audio (that would only be around 3-4 GB). And we (in Germany) haven't looked at HD or Digital Cinema a lot yet...
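
The arithmetic above is easy to check. Here is a small Python sketch that uses only the figures from this paragraph and decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes); the variable names are just for illustration.

```python
# Uncompressed SD PAL video data rate, using the figures from the text.
WIDTH_PX = 720
HEIGHT_LINES = 576
BYTES_PER_PIXEL = 3      # 3 bytes of color per pixel
FRAMES_PER_SECOND = 25   # full frames

bytes_per_second = WIDTH_PX * HEIGHT_LINES * BYTES_PER_PIXEL * FRAMES_PER_SECOND
film_seconds = 2 * 60 * 60  # a 2 hour feature film
film_bytes = bytes_per_second * film_seconds

print(f"{bytes_per_second / 1e6:.1f} MB/s")           # prints: 31.1 MB/s
print(f"{film_bytes / 1e9:.0f} GB for a 2 hour film")  # prints: 224 GB for a 2 hour film
```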

During the last couple of weeks I worked with a customer who bought a Sun Fire X4500 server (you know, Thumper). The plan is to run Solaris ZFS on it, then provide big iSCSI volumes to the video editing systems, which tend to be specialized Windows or Mac OS X machines. Wonderful idea: Just use zpool create to combine a number of disks with some RAID level into a pool, then zfs create -V to create a ZVOL. Thanks to zfs set shareiscsi=on, sharing the volume over iSCSI is dead easy.

But it didn't work.

First, Windows wouldn't mount the iSCSI volume. After some trying, we discovered that there must be an upper limit of 2TB to the size of iSCSI volumes that Windows can mount (we initially tried something like 5 or 10TB). So be it: zfs create -V 2047G videopool/videovolume.
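
Put together, the setup boils down to three commands. The Python sketch below only builds and prints them as a dry run and does not execute anything. The pool and volume names and the 2047G size come from the text. The raidz layout and the disk names are hypothetical placeholders, and reading the 2TB limit as 2048 GiB is an assumption.

```python
import shlex

POOL = "videopool"
VOLUME = f"{POOL}/videovolume"
DISKS = ["c0t0d0", "c0t1d0", "c0t2d0"]  # hypothetical disk names
RAID_LEVEL = "raidz"                     # assumption: any supported layout would do

VOLUME_GIB = 2047
WINDOWS_LIMIT_GIB = 2 * 1024             # assumption: the 2TB limit means 2048 GiB

# Stay just below the size that Windows refused to mount.
assert VOLUME_GIB < WINDOWS_LIMIT_GIB

commands = [
    ["zpool", "create", POOL, RAID_LEVEL, *DISKS],     # combine disks into a pool
    ["zfs", "create", "-V", f"{VOLUME_GIB}G", VOLUME],  # create the ZVOL
    ["zfs", "set", "shareiscsi=on", VOLUME],            # share it as an iSCSI target
]

for argv in commands:
    print(shlex.join(argv))
```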

Now it mounted OK. We formatted the disk with NTFS (yuck!) and started the editing system's speed test. Then came the real issue: The test reported a write performance of 8-10 MB/s, but the editing system needs something like 30 MB/s sustained to be able to record reliably!

After some trying, we started the systematic approach:

  • A simple dd from one disk to another yielded >39 MB/s.
  • dd'ing from one small ZFS pool to another exceeded 120 MB/s (I later learned that cp is a better benchmark because it works asynchronously with large chunks of data vs. dd's synchronous block approach), so that was again more than we needed.
  • We tried re-attaching our ZVOL through iscsiadm to test the iSCSI stack's performance and ran into a TCP fusion issue. Ok, I've always wanted to play with mdb, so we followed the workaround instructions and we were able to attach our own ZVOL over the loopback interface. Slightly less performance (due to up the stack, down the stack effects, I presume) but still way more than we needed. So, it was neither the X4500's nor ZFS' fault.

Finally, Danilo pointed me in the right direction: Nagle's algorithm. What usually helps maximize network bandwidth turns out to be a killer for iSCSI performance. For Solaris iSCSI clients, we know this already, but how do we turn off Nagle on Windows?
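
For background: an application that owns its TCP socket can switch off Nagle's algorithm itself through the standard TCP_NODELAY socket option. The Python sketch below shows only that per-socket mechanism, as an illustration. It is not how an iSCSI initiator is configured, and on Windows we needed a system-wide setting instead.

```python
import socket

# Create a TCP socket and ask the stack not to coalesce small writes
# (TCP_NODELAY disables Nagle's algorithm for this socket only).
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

# Read the option back to confirm it is set (non-zero means enabled).
nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
sock.close()

print("Nagle disabled:", bool(nodelay))
```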

The answer is deeply buried inside Microsoft's iSCSI Initiator user guide: The "Addressing Slow Performance with iSCSI Clusters" chapter mentions a similar issue (although they talk about read not write performance) and they do mention RFC 1122's delayed ACK feature, which is related to Nagle's algorithm. The Microsoft document suggests a workaround which involves setting a variable in the registry, so it was worth a try (and my vengeance for having to use mdb before).

And lo and behold, the speed test now yielded 90-100 MB/s (close to a GBE's raw performance)! Yippee, that was it! One little registry entry on the client side gave us a 10x improvement in iSCSI performance!

Now, can someone explain to me why on Windows 2000 you need to set "TcpAckDelTicks=0" while on Windows 2003 the same thing is accomplished by saying "TcpAckFrequency=1" (which is the same thing, only seen from the other side of the division sign)?

So, to all you storage hungry video editors out there: The Sun Fire X4500 with Solaris ZFS and iSCSI is a great solution for reliable, fast, easy to use and inexpensive video storage. You just need to know how to tell your TCP/IP stack to not delay ACKs...

