Thursday Dec 06, 2007

X4500 + Solaris ZFS + iSCSI = Perfect Video Editing Storage

Digital video editing is one of those applications that tend to be very data hungry. At SD PAL resolution, we're talking about 720 pixels x 576 lines x 3 bytes of color x 25 full frames per second = about 30 MB/s of uncompressed data. That's about 224 GB for a 2-hour feature film, not counting audio (which would add only around 3-4 GB). And we (in Germany) haven't looked much at HD or Digital Cinema yet...
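The same arithmetic as a quick Python check. It uses decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes), which is how the 224 GB figure works out, and it ignores audio, as the text does.

```python
# Uncompressed SD PAL video: 720 x 576 pixels, 3 bytes per pixel (24-bit color), 25 frames/s.
WIDTH, HEIGHT = 720, 576
BYTES_PER_PIXEL = 3
FRAMES_PER_SECOND = 25


def video_rate_bytes_per_s(width: int, height: int, bpp: int, fps: int) -> int:
    """Raw (uncompressed) video data rate in bytes per second."""
    return width * height * bpp * fps


def storage_bytes(rate: int, seconds: int) -> int:
    """Total bytes needed to record `seconds` of video at `rate` bytes/s."""
    return rate * seconds


rate = video_rate_bytes_per_s(WIDTH, HEIGHT, BYTES_PER_PIXEL, FRAMES_PER_SECOND)
total = storage_bytes(rate, 2 * 60 * 60)  # a 2-hour feature film

print(f"data rate: {rate / 1e6:.1f} MB/s")  # data rate: 31.1 MB/s
print(f"2 hours:   {total / 1e9:.0f} GB")   # 2 hours:   224 GB
```

The exact rate is about 31.1 MB/s, which the text rounds to "about 30 MB/s".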

During the last couple of weeks I worked with a customer who bought a Sun Fire X4500 server (you know, Thumper). The plan is to run Solaris ZFS on it, then provide big iSCSI volumes to the video editing systems, which tend to be specialized Windows or Mac OS X machines. Wonderful idea: Just use zpool create to combine a number of disks with some RAID level into a pool, then zfs create -V to create a ZVOL. Thanks to the shareiscsi property (zfs set shareiscsi=on), sharing the volume over iSCSI is dead easy.
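A minimal sketch of that three-step sequence, written as Python that only prints the command lines (a dry run). The pool name, volume name, disk device names and the raidz2 RAID level are hypothetical placeholders, not the customer's actual layout. On a Solaris host you could hand each argv list to subprocess.run() instead of printing it.

```python
import shlex

# Hypothetical names for illustration only: pool, volume, disks and RAID level are assumptions.
POOL = "videopool"
VOLUME = f"{POOL}/videovolume"
DISKS = ["c0t0d0", "c0t1d0", "c0t2d0", "c0t3d0", "c0t4d0", "c0t5d0"]


def provisioning_commands(pool, volume, disks, size, raid="raidz2"):
    """Return the zpool/zfs command lines (argv lists) for pool -> ZVOL -> iSCSI share."""
    return [
        ["zpool", "create", pool, raid, *disks],  # combine the disks into one RAID-Z pool
        ["zfs", "create", "-V", size, volume],    # create a ZVOL (a block device) of fixed size
        ["zfs", "set", "shareiscsi=on", volume],  # export the ZVOL as an iSCSI target
    ]


commands = provisioning_commands(POOL, VOLUME, DISKS, "2047G")
for argv in commands:
    print(shlex.join(argv))  # dry run: print instead of executing
```

Keeping the commands as argv lists rather than shell strings avoids quoting problems if the names ever contain unusual characters.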

But it didn't work.

First, Windows wouldn't mount the iSCSI volume. After some trying, we discovered that there must be an upper limit of 2 TB to the size of iSCSI volumes that Windows can mount (we initially tried something like 5 or 10 TB). So be it: zfs create -V 2047G videopool/videovolume.
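The post doesn't say where the 2 TB ceiling comes from. One common cause, assumed here rather than verified on that system, is 32-bit sector addressing: an MBR partition table stores sector counts in 32 bits, which with 512-byte sectors caps a disk at just under 2 TiB. Since ZFS size suffixes are binary (G = 2^30 bytes), 2047G lands just below that cap. A small Python check under those assumptions:

```python
# Assumption: the limit comes from 32-bit sector numbers (as in MBR partition
# tables) combined with classic 512-byte sectors.
SECTOR_SIZE = 512
MAX_SECTORS = 2**32 - 1
MAX_BYTES = MAX_SECTORS * SECTOR_SIZE  # 512 bytes short of 2 TiB

# ZFS size suffixes are powers of 1024.
SUFFIXES = {"K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}


def zfs_size_to_bytes(size: str) -> int:
    """Convert a size such as '2047G' to bytes, using binary (1024-based) suffixes."""
    unit = size[-1].upper()
    if unit in SUFFIXES:
        return int(size[:-1]) * SUFFIXES[unit]
    return int(size)


for candidate in ("10T", "5T", "2048G", "2047G"):
    nbytes = zfs_size_to_bytes(candidate)
    verdict = "fits" if nbytes <= MAX_BYTES else "too big"
    print(f"{candidate:>6}: {nbytes:>16,} bytes -> {verdict}")
```

Under this model 2048G (exactly 2 TiB) is already one sector too many, which would explain settling on 2047G.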

Now it mounted OK. We formatted the disk with NTFS (yuck!) and started the editing system's speed test. Then came the real issue: The test reported a write performance of only 8-10 MB/s, but the editing system needs something like 30 MB/s sustained to record reliably!

After some more trial and error, we took a systematic approach:

  • A simple dd from one disk to another yielded >39 MB/s.
  • dd'ing from one small ZFS pool to another exceeded 120 MB/s, again more than we needed. (I later learned that cp is a better benchmark, because it works asynchronously with large chunks of data, whereas dd uses a synchronous, block-by-block approach.)
  • We tried re-attaching our ZVOL locally through iscsiadm to test the iSCSI stack's performance and ran into a TCP fusion issue. OK, I've always wanted to play with mdb, so we followed the workaround instructions and were able to attach our own ZVOL over the loopback interface. Performance was slightly lower (due to up-the-stack, down-the-stack effects, I presume) but still way more than we needed. So it was neither the X4500's nor ZFS's fault.
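For a quick sequential-write number in the spirit of the dd tests above, here is a rough Python sketch (not the tool we used, and not the editing system's own speed test). It writes large chunks, calls fsync so that the time to reach stable storage is included, and reports MB/s. The file location is a placeholder: for a meaningful result, point it at a file on the volume under test rather than a temporary directory.

```python
import os
import tempfile
import time


def sequential_write_mb_per_s(path: str, total_mb: int = 64, block_kb: int = 1024) -> float:
    """Write total_mb MiB in block_kb KiB chunks, fsync, and return MB/s (10^6 bytes/s)."""
    # Random bytes, so a compressing filesystem can't shrink each block.
    block = os.urandom(block_kb * 1024)
    n_blocks = (total_mb * 1024) // block_kb
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(n_blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # count the time until the data reaches stable storage
    elapsed = time.perf_counter() - start
    return n_blocks * len(block) / elapsed / 1e6


# Usage: a throwaway file in a temporary directory (placeholder location).
with tempfile.TemporaryDirectory() as tmp:
    rate = sequential_write_mb_per_s(os.path.join(tmp, "testfile"), total_mb=16)
print(f"sequential write: {rate:.1f} MB/s")
```

Large blocks plus a single fsync at the end measure streaming throughput, which is what video recording needs; calling fsync after every small write would instead measure per-write latency.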

Finally, Danilo pointed me in the right direction: Nagle's algorithm. What usually helps maximize network bandwidth turns out to be a killer for iSCSI performance. For Solaris iSCSI clients we already knew this, but how do we turn off Nagle on Windows?
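Applications that own their sockets can switch Nagle off per connection with the standard TCP_NODELAY socket option. The minimal Python sketch below only illustrates that option: the Windows iSCSI initiator isn't our code, so we can't set it there, which is why the fix had to come from somewhere else.

```python
import socket

# Nagle's algorithm coalesces small writes into fewer, larger segments.
# TCP_NODELAY disables it for this one socket, so each write goes out immediately.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
print("Nagle disabled:", bool(nodelay))
sock.close()
```

The option is set before connecting here, but it can also be changed on an established connection.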

The answer is buried deep inside Microsoft's iSCSI Initiator User Guide: The "Addressing Slow Performance with iSCSI Clusters" chapter mentions a similar issue (although it talks about read, not write, performance) and mentions RFC 1122's delayed ACK feature, which interacts badly with Nagle's algorithm: Nagle makes the sender hold back a small segment until its previously sent data has been acknowledged, while delayed ACK lets the receiver hold back that acknowledgment for up to a timer interval, so each side can end up waiting for the other. The Microsoft document suggests a workaround that involves setting a value in the registry, so it was worth a try (and my revenge for having had to use mdb earlier).
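To see how Nagle and delayed ACKs can stall each other, here is a deliberately simplified Python model. Everything in it is an assumption for illustration, not a measurement: a 1460-byte MSS, a 200 ms delayed-ACK timer, a receiver that ACKs every second full-sized segment at once (as RFC 1122 recommends), a window large enough for the whole message, round-trip time ignored, and no data left unacknowledged from earlier messages.

```python
# Toy model of the Nagle / delayed-ACK interaction (all parameters are assumptions).
MSS = 1460             # bytes per full-sized segment
DELAYED_ACK_S = 0.200  # receiver's delayed-ACK timer


def nagle_stall_seconds(message_bytes: int, mss: int = MSS, ack_delay: float = DELAYED_ACK_S) -> float:
    """Extra delay before the last (partial) segment of one message can be sent."""
    full, tail = divmod(message_bytes, mss)
    if tail == 0 or full == 0:
        # No partial segment, or nothing unacknowledged ahead of it: Nagle lets it go.
        return 0.0
    # Full segments go out at once and are ACKed in pairs. With an odd count, the last
    # one is only ACKed when the delayed-ACK timer fires, and Nagle holds the tail until then.
    return ack_delay if full % 2 == 1 else 0.0


# 8192 + 48: an 8 KiB data payload plus a 48-byte iSCSI header (illustrative size).
for size in (1000, 2 * MSS + 100, 3 * MSS + 100, 8192 + 48):
    print(f"{size:>6} bytes: +{nagle_stall_seconds(size) * 1000:.0f} ms")
```

Whether a given message size stalls depends on segment-count parity in this model, which is why the effect can look erratic. Disabling either mechanism (Nagle on the sender, or the ACK delay on the receiver) removes the stall.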

And lo and behold, the speed test now yielded 90-100 MB/s (close to Gigabit Ethernet's raw 125 MB/s)! Yippee, that was it! One little registry entry on the client side gave us a 10x improvement in iSCSI performance!
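How close is "close"? A back-of-the-envelope Python calculation of the best-case TCP payload rate over Gigabit Ethernet, assuming a standard 1500-byte MTU (no jumbo frames) and IPv4/TCP headers without options. Real iSCSI traffic loses a little more to iSCSI PDU headers and TCP options such as timestamps.

```python
LINK_BITS_PER_S = 1_000_000_000  # Gigabit Ethernet line rate
MTU = 1500                       # standard (non-jumbo) Ethernet MTU
ETH_OVERHEAD = 8 + 14 + 4 + 12   # preamble+SFD, MAC header, FCS, inter-frame gap (bytes)
IP_TCP_HEADERS = 20 + 20         # IPv4 + TCP headers without options (bytes)

raw_bytes_per_s = LINK_BITS_PER_S / 8
payload_per_frame = MTU - IP_TCP_HEADERS  # 1460 bytes of data per frame
wire_bytes_per_frame = MTU + ETH_OVERHEAD # 1538 bytes of wire time per frame
max_tcp_payload = raw_bytes_per_s * payload_per_frame / wire_bytes_per_frame

print(f"raw line rate:        {raw_bytes_per_s / 1e6:.0f} MB/s")
print(f"max TCP payload rate: {max_tcp_payload / 1e6:.1f} MB/s")
```

That ceiling is roughly 118.7 MB/s, so 90-100 MB/s from an editing workstation is a large share of what a single GbE link can carry.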

Now, can someone explain to me why on Windows 2000 you need to set "TcpAckDelTicks=0", while on Windows Server 2003 the same thing is accomplished by setting "TcpAckFrequency=1" (which is the same thing, only seen from the other side of the division sign)?
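A small Python sketch that writes the tweak out as a .reg file, so it can be reviewed before importing with regedit. As I read Microsoft's documentation, both values are DWORDs under the per-interface key (Services\Tcpip\Parameters\Interfaces\{GUID}) of the network adapter that carries the iSCSI traffic. The GUID below is a hypothetical placeholder, and you should check the guide for your Windows version before applying anything.

```python
# Hypothetical placeholder: substitute the GUID of the NIC that carries iSCSI traffic.
INTERFACE_GUID = "{00000000-0000-0000-0000-000000000000}"

KEY = ("HKEY_LOCAL_MACHINE\\SYSTEM\\CurrentControlSet\\Services\\Tcpip"
       "\\Parameters\\Interfaces\\" + INTERFACE_GUID)

# Values as given in the text: Windows 2000 vs. Windows Server 2003.
SETTINGS = {
    "2000": ("TcpAckDelTicks", 0),
    "2003": ("TcpAckFrequency", 1),
}


def reg_file(windows_version: str) -> str:
    """Return .reg file text that sets the delayed-ACK value for one interface."""
    name, value = SETTINGS[windows_version]
    return "\r\n".join([
        "Windows Registry Editor Version 5.00",
        "",
        f"[{KEY}]",
        f'"{name}"=dword:{value:08x}',  # REG_DWORD, written as 8 hex digits
        "",
    ])


print(reg_file("2003"))
```

Generating the file rather than editing the registry directly keeps the change reviewable and easy to repeat on every editing workstation.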

So, to all you storage-hungry video editors out there: The Sun Fire X4500 with Solaris ZFS and iSCSI is a great solution for reliable, fast, easy-to-use and inexpensive video storage. You just need to know how to tell your TCP/IP stack not to delay ACKs...
 
