Thursday Dec 06, 2007

X4500 + Solaris ZFS + iSCSI = Perfect Video Editing Storage

Digital video editing is one of those applications that tend to be very data hungry. At SD PAL resolution, we're talking about 720 pixels x 576 lines x 3 bytes of color x 25 full frames per second = about 30 MB/s of data. That's about 224 GB for a 2 hour feature film, not counting audio (which would only add around 3-4 GB). And we (in Germany) haven't looked at HD or Digital Cinema a lot yet...
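For anyone who wants to check the arithmetic, here is a minimal sketch that just multiplies out the figures above. It assumes decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes), which is what the 224 GB figure implies; the variable names are only illustrative.

```python
# Uncompressed SD PAL video, using the figures from the paragraph above.
WIDTH, HEIGHT = 720, 576        # pixels per line, lines per frame
BYTES_PER_PIXEL = 3             # 3 bytes of color per pixel
FPS = 25                        # full frames per second
FILM_SECONDS = 2 * 60 * 60      # a 2 hour feature film

bytes_per_second = WIDTH * HEIGHT * BYTES_PER_PIXEL * FPS
film_bytes = bytes_per_second * FILM_SECONDS

print(f"{bytes_per_second / 1e6:.1f} MB/s")    # 31.1 MB/s
print(f"{film_bytes / 1e9:.0f} GB per film")   # 224 GB per film
```

So the "about 30 MB/s" is really a shade over 31 MB/s in decimal units, and that rate is what the storage has to sustain for the whole recording.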

During the last couple of weeks I worked with a customer who bought a Sun Fire X4500 server (you know, Thumper). The plan is to run Solaris ZFS on it, then provide big iSCSI volumes to the video editing systems, which tend to be specialized Windows or Mac OS X machines. Wonderful idea: Just use zpool create to combine a number of disks with some RAID level into a pool, then zfs create -V to create a ZVOL. Thanks to zfs set shareiscsi=on, sharing the volume over iSCSI is dead easy.
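Spelled out, the plan comes down to three commands. The sketch below only builds and prints them, it does not run anything. The disk names and the choice of raidz are assumptions for illustration (the post doesn't say which disks or RAID level were used); the pool name, volume name and size are the ones that show up later in this post. The helper function name is made up.

```python
import shlex

def thumper_iscsi_commands(pool, disks, volume, size, raid="raidz"):
    """Return the three commands described above as argv lists.

    Nothing is executed; the lists are only built so they can be reviewed.
    """
    dataset = f"{pool}/{volume}"
    return [
        ["zpool", "create", pool, raid, *disks],    # combine disks into a pool
        ["zfs", "create", "-V", size, dataset],     # carve out a fixed-size ZVOL
        ["zfs", "set", "shareiscsi=on", dataset],   # share the ZVOL over iSCSI
    ]

# Hypothetical disk names; raidz is an assumed RAID level.
commands = thumper_iscsi_commands(
    pool="videopool",
    disks=["c0t0d0", "c0t1d0", "c0t2d0", "c0t3d0"],
    volume="videovolume",
    size="2047G",
)
for argv in commands:
    print(shlex.join(argv))
```

Printing the commands instead of running them makes it easy to double-check device names before a pool is actually created on them.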

But it didn't work.

First, Windows wouldn't mount the iSCSI volume. After some trying, we discovered that there must be an upper limit of 2 TB to the size of iSCSI volumes that Windows can mount (we initially tried something like 5 or 10 TB). So be it: zfs create -V 2047G videopool/videovolume.
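We never tracked down where the 2 TB ceiling comes from. One common cause, assumed here and not confirmed for this setup, is 32-bit sector addressing with 512-byte sectors. A quick check of the numbers under that assumption shows why 2047G squeezes in (ZFS sizes use powers of two, so G means GiB):

```python
SECTOR_BYTES = 512            # assumed sector size
MAX_SECTORS = 2**32           # assumed 32-bit sector addressing

limit_bytes = SECTOR_BYTES * MAX_SECTORS
volume_bytes = 2047 * 2**30   # zfs create -V 2047G

print(limit_bytes // 2**30)                    # 2048 (GiB, i.e. 2 TiB)
print((limit_bytes - volume_bytes) // 2**30)   # 1 (GiB of headroom)
```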

Now it mounted ok, we formatted the disk with NTFS (yuck!) and started the editing system's speed test. Then came the real issue: The test reported a write performance of 8-10 MB/s, but the editing system needs something like 30 MB/s sustained to be able to record reliably!

After some trying, we started the systematic approach:

  • A simple dd from one disk to another yielded >39 MB/s.
  • dd'ing from one small ZFS pool to another exceeded 120 MB/s (I later learned that cp is a better benchmark because it works asynchronously with large chunks of data vs. dd's synchronous block approach), so that was again more than we needed.
  • We tried re-attaching our ZVOL through iscsiadm to test the iSCSI stack's performance and ran into a TCP fusion issue. Ok, I've always wanted to play with mdb, so we followed the workaround instructions and we were able to attach our own ZVOL over the loopback interface. Slightly less performance (due to up the stack, down the stack effects, I presume) but still way more than we needed. So, it was neither the X4500's nor ZFS' fault.

Finally, Danilo pointed me in the right direction: Nagle's algorithm. What usually helps maximize network bandwidth turns out to be a killer for iSCSI performance. For Solaris iSCSI clients, we know this already, but how do we turn off Nagle on Windows?

The answer is deeply buried inside Microsoft's iSCSI Initiator user guide: The "Addressing Slow Performance with iSCSI Clusters" chapter mentions a similar issue (although they talk about read not write performance) and they do mention RFC 1122's delayed ACK feature, which is related to Nagle's algorithm. The Microsoft document suggests a workaround which involves setting a variable in the registry, so it was worth a try (and my vengeance for having to use mdb before).

And lo and behold, the speed test now yielded 90-100 MB/s (close to GbE's raw performance)! Yippee, that was it! One little registry entry on the client side gave us a 10x improvement in iSCSI performance!
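To put those numbers into perspective, here is a small back-of-the-envelope sketch. It takes Gigabit Ethernet's nominal line rate of 1,000,000,000 bit/s and ignores all protocol overhead (Ethernet, IP, TCP and iSCSI headers), so the real ceiling for payload data is somewhat lower than the raw figure used here.

```python
GBE_BITS_PER_SECOND = 1_000_000_000
raw_mb_per_s = GBE_BITS_PER_SECOND / 8 / 1e6   # 125.0 MB/s, overhead ignored

before = (8, 10)     # MB/s measured with delayed ACKs
after = (90, 100)    # MB/s measured after the registry change

for mb in after:
    print(f"{mb} MB/s = {mb / raw_mb_per_s:.0%} of GbE line rate")

speedup = [a / b for a, b in zip(after, before)]
print(speedup)       # [11.25, 10.0]
```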

Now, can someone explain to me why on Windows 2000 you need to set "TcpDelAckTicks=0" while on Windows 2003 the same thing is accomplished by saying "TcpAckFrequency=1" (which is the same thing, only seen from the other side of the division sign)?

So, to all you storage hungry video editors out there: The Sun Fire X4500 with Solaris ZFS and iSCSI is a great solution for reliable, fast, easy to use and inexpensive video storage. You just need to know how to tell your TCP/IP stack to not delay ACKs...

Friday Mar 09, 2007

CSI:Munich - How to save the world with ZFS and 12 USB Sticks

Here's a fun video that shows how cool the Sun Fire X4500 (codename: Thumper) is and how you can create your own Thumper experience on your laptop using Solaris ZFS and 12 USB sticks:

This is finally the English dubbed version of a German video that a couple of colleagues and I produced some weeks ago. If you don't mind the German language, you might enjoy the original German version, too (it turns out that the English language has a lot less redundancy than the German one, so please forgive the occasional soundless lip motions).

If you liked the video(s), let us know, we'll be glad to answer any questions, receive any leftover Oscars or accept any new ideas for future episodes.

Here are a few more details, in case you really want to try this at home:

The first hurdle to overcome is to teach Solaris how to accept more than 10 USB storage devices. On a plain vanilla Solaris 10 system, it turns out that there is a limitation: Connecting more than 10 USB sticks through 3 USB-powered hubs yields a "Connecting device on port n failed" error. Thanks to a colleague from engineering, the fix is to set ehci:ehci_qh_pool_size = 120 in /etc/system.

The second issue is briefly explained in the video itself: Not all USB sticks (particularly the cheap ones) are created equal. Small variations in the components create small variations in their storage space. So, when creating a zpool, you need to use -f to tell zpool to ignore differing device sizes.

If you pay close attention to the video, you'll notice around 7:20 that pulling a hub wasn't so harmless after all: "errors: 8 data errors, use '-v' for a list" can be seen at the bottom of the terminal window. In fact, zpool status reports 6 checksum errors in c21t0d0p0. Well, using cheap USB sticks means that block errors can occur in practice and once you don't have enough redundancy (like after unplugging a USB hub for show effect) they may hurt you. Fortunately, they didn't hurt our particular demo, since on one hand ZFS' prefetch algorithm had most of the video in memory anyway, while on the other hand zpool scrub fixed any broken blocks after re-plugging the USB hub. So, the cheaper the storage the more redundancy one should add. In this case, RAID-Z2 would have been better. Perhaps we can get some more USB sticks and hubs from any sponsors?

Finally, it took us a couple of retries until the remove-sticks-mix-then-replug stunt worked, because it turned out that the laptop's USB implementation wasn't as reliable as we needed it to be. And yes, it does help to wait until they've finished blinking before removing any sticks :).

All in all, it was great fun for us producing this video and thanks to the tireless efforts of Marc, our beloved but invisible video editor, we now can proudly present an English version. Actually, we were quite surprised by this video's success: We published it in early February and just a day later, it got noticed by a couple of Solaris engineering people. Now, we have more than 9000 views of the German version (counting the Google video and the YouTube edition together) and are still counting. Hopefully, we can cross the 10,000 views barrier with the English version, now that we have increased the potential audience :).

After watching the video, feel free to try out Solaris ZFS for yourself. There's nothing like building your own pool, then watching ZFS take care of your data. At home, ZFS keeps my photos, music and TV videos nice and tidy, including weekly snapshots thanks to Tim Foster's automatic snapshot SMF service. Just this Tuesday, my weekly zpool scrub cron job told me it had fixed a broken block on one of my disks. One that I'd never have found out about with any other storage system.

To get started, get OpenSolaris here or download it here. All you need to do is check out the docs, though real system heroes only need two man-pages: zpool(1M) and zfs(1M).

P.S.: CSI of course stands for "Computer Systems Integration". Any similarities to the popular TV show are purely coincidence. Really. Hmm, but maybe having a dead body or two in one of the next episodes might spice up things a little...

P.P.S.: The cool rock music at the beginning is from XING, a great rock band in which one of our colleagues plays drums. Go XING!

Update: Here is a much higher quality version, in case you want to show this video around on your laptop.


