X4500 + Solaris ZFS + iSCSI = Perfect Video Editing Storage

Digital video editing is one of those applications that tend to be very data hungry. At SD PAL resolution, we're talking about 720 pixels x 576 lines x 3 bytes of color per pixel x 25 full frames per second = about 30 MB/s of data. That's about 224 GB for a 2-hour feature film, not counting audio (which would only add around 3-4 GB). And we (in Germany) haven't looked at HD or Digital Cinema a lot yet...
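The bandwidth figures follow from simple arithmetic. Here is a minimal Python sketch that reproduces them, assuming uncompressed video with 8 bits per color component (real editing codecs may differ). Note that "about 30 MB/s" matches the binary-unit result, while decimal units give about 31 MB/s:

```python
# Raw bandwidth of uncompressed SD PAL video, using the figures from the text:
# 720 pixels x 576 lines x 3 bytes per pixel x 25 frames per second.
WIDTH, HEIGHT = 720, 576
BYTES_PER_PIXEL = 3          # one byte per color component (assumed 8-bit color)
FRAMES_PER_SECOND = 25       # PAL full frames


def bytes_per_second(width: int, height: int, bpp: int, fps: int) -> int:
    """Uncompressed video data rate in bytes per second."""
    return width * height * bpp * fps


rate = bytes_per_second(WIDTH, HEIGHT, BYTES_PER_PIXEL, FRAMES_PER_SECOND)
film = rate * 2 * 60 * 60    # bytes for a 2-hour feature film

print(f"{rate / 1e6:.1f} MB/s ({rate / 2**20:.1f} MiB/s)")  # 31.1 MB/s (29.7 MiB/s)
print(f"{film / 1e9:.0f} GB for 2 hours")                   # 224 GB for 2 hours
```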

During the last couple of weeks, I worked with a customer who bought a Sun Fire X4500 server (you know, Thumper). The plan is to run Solaris ZFS on it, then provide big iSCSI volumes to the video editing systems, which tend to be specialized Windows or Mac OS X machines. Wonderful idea: just use zpool create to combine a number of disks with some RAID level into a pool, then zfs create -V to create a ZVOL (a ZFS volume that behaves like a block device). Thanks to the ZFS shareiscsi=on property, sharing the volume over iSCSI is dead easy.
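For reference, the plan boils down to three commands. The Python sketch below only assembles and prints them (it is a dry run unless told otherwise). The disk names, the raidz layout and the 2047G size are assumptions for illustration. Also, shareiscsi belongs to the Solaris iSCSI target of this era; later Solaris releases replaced that target with COMSTAR.

```python
import shlex
import subprocess


def zvol_commands(pool: str, disks: list[str], volume: str, size: str) -> list[list[str]]:
    """Build the commands for: disks -> pool -> ZVOL -> iSCSI share."""
    dataset = f"{pool}/{volume}"
    return [
        # Combine the disks into one pool; raidz is only an example RAID level.
        ["zpool", "create", pool, "raidz", *disks],
        # Create a ZVOL (block device) of the requested size inside the pool.
        ["zfs", "create", "-V", size, dataset],
        # Export the ZVOL via the legacy Solaris iSCSI target.
        ["zfs", "set", "shareiscsi=on", dataset],
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


# Hypothetical disk names: substitute the actual device names of your X4500.
disks = ["c0t0d0", "c1t0d0", "c4t0d0", "c5t0d0", "c6t0d0"]
cmds = zvol_commands("videopool", disks, "videovolume", "2047G")
run(cmds)  # dry run: prints the three commands, executes nothing
```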

But it didn't work.

First, Windows wouldn't mount the iSCSI volume. After some trying, we discovered that there must be an upper limit of 2 TB on the size of iSCSI volumes that Windows can mount (we initially tried something like 5 or 10 TB). So be it: zfs create -V 2047G videopool/videovolume.
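We never pinned down where exactly the limit comes from. One plausible explanation (an assumption, not something we tested) is 32-bit sector addressing, as used by MBR partition tables and 10-byte SCSI read/write commands. With 512-byte sectors, that caps a disk at exactly 2 TiB, and 2047G stays just below it. A quick Python check:

```python
SECTOR_SIZE = 512        # bytes per logical sector (assumed)
LBA_BITS = 32            # 32-bit sector numbers, e.g. in MBR partition tables

LIMIT = SECTOR_SIZE * 2**LBA_BITS   # 2**41 bytes = 2 TiB

UNITS = {"K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}  # ZFS sizes are binary


def size_to_bytes(size: str) -> int:
    """Convert a ZFS-style size such as '2047G' or '5T' to bytes."""
    return int(size[:-1]) * UNITS[size[-1].upper()]


for size in ("2047G", "5T", "10T"):
    verdict = "fits" if size_to_bytes(size) <= LIMIT else "exceeds 2 TiB"
    print(f"{size}: {verdict}")   # 2047G fits; 5T and 10T exceed 2 TiB
```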

Now it mounted OK. We formatted the disk with NTFS (yuck!) and started the editing system's speed test. Then came the real issue: the test reported a write performance of 8-10 MB/s, but the editing system needs something like 30 MB/s sustained to record reliably!

After some trying, we started the systematic approach:

  • A simple dd from one disk to another yielded >39 MB/s.
  • dd'ing from one small ZFS pool to another exceeded 120 MB/s, so that was again more than we needed. (I later learned that cp is a better benchmark, because it works asynchronously with large chunks of data, whereas dd reads and writes synchronously, one block at a time.)
  • We tried re-attaching our ZVOL through iscsiadm to test the iSCSI stack's performance and ran into a TCP fusion issue. OK, I've always wanted to play with mdb, so we followed the workaround instructions and were able to attach our own ZVOL over the loopback interface. Performance was slightly lower (due to up-the-stack, down-the-stack effects, I presume), but still way more than we needed. So it was neither the X4500's nor ZFS's fault.
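To compare block-size and sync behavior on your own hardware, here is a rough Python sketch. It is not a reimplementation of dd or cp. It only contrasts small blocks that are each forced to stable storage with large blocks that the OS may buffer. File size, block sizes and target directory are assumptions to adjust.

```python
import os
import tempfile
import time
from typing import Optional


def write_throughput(total_bytes: int, block_size: int, sync_each_block: bool,
                     directory: Optional[str] = None) -> float:
    """Sequentially write total_bytes to a temp file; return throughput in MB/s.

    sync_each_block=True calls os.fsync() after every block, so each write
    must reach stable storage before the next one starts. False lets the OS
    buffer and coalesce writes, with a single fsync() at the end.
    """
    block = memoryview(b"\0" * block_size)
    blocks = total_bytes // block_size
    fd, path = tempfile.mkstemp(dir=directory)   # e.g. a directory on the pool under test
    try:
        start = time.perf_counter()
        for _ in range(blocks):
            written = 0
            while written < block_size:          # os.write may write less than asked
                written += os.write(fd, block[written:])
            if sync_each_block:
                os.fsync(fd)
        os.fsync(fd)                             # include the final flush in the timing
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)
    return blocks * block_size / max(elapsed, 1e-9) / 1e6
```

For example, compare write_throughput(512 * 2**20, 4096, True) with write_throughput(512 * 2**20, 2**20, False) against the file system you want to measure. Larger totals reduce the influence of caches.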

Finally, Danilo pointed me in the right direction: Nagle's algorithm. What usually helps maximize network bandwidth turns out to be a killer for iSCSI performance. For Solaris iSCSI clients we already knew this, but how do we turn off Nagle on Windows?

The answer is deeply buried inside Microsoft's iSCSI Initiator user guide: the "Addressing Slow Performance with iSCSI Clusters" chapter mentions a similar issue (although it talks about read, not write, performance) and mentions RFC 1122's delayed ACK feature, which interacts with Nagle's algorithm. The Microsoft document suggests a workaround that involves setting a value in the registry, so it was worth a try (and my revenge for having had to use mdb before).
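Why would a receive-side setting fix a problem caused by a send-side algorithm? Below is a toy Python model of the interaction. It is not a TCP implementation. On the sender, Nagle's algorithm holds back a small final segment until earlier data is acknowledged. The receiver follows RFC 1122's delayed ACK rule: it acknowledges every second full segment right away and otherwise waits for a timer. All parameters are assumptions: 200 ms is the usual Windows default for that timer, and the PDU size is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Link:
    mss: int = 1460              # TCP payload bytes per full segment (Ethernet MTU 1500)
    rtt_ms: float = 0.2          # assumed LAN round-trip time
    ack_frequency: int = 2       # receiver ACKs at once after this many full segments
    ack_delay_ms: float = 200.0  # otherwise it waits this long (delayed-ACK timer)


def tail_stall_ms(message_bytes: int, link: Link) -> float:
    """Toy model: extra wait before the last, partial segment of one message
    can be sent on an otherwise idle connection.

    - Nagle's algorithm holds a sub-MSS segment while earlier data is unacknowledged.
    - The receiver ACKs at once only when `ack_frequency` full segments are
      pending; a leftover segment waits for the delayed-ACK timer.
    """
    full, tail = divmod(message_bytes, link.mss)
    if tail == 0 or full == 0:
        # No small tail, or the message fits into one segment and nothing is
        # outstanding yet, so Nagle lets it go out immediately.
        return 0.0
    if full % link.ack_frequency == 0:
        # The last full segment triggers an immediate ACK: the tail waits ~1 RTT.
        return link.rtt_ms
    # One full segment stays un-ACKed until the timer fires, and Nagle
    # keeps the tail waiting for that ACK.
    return link.ack_delay_ms


pdu = 8 * 1024 + 512  # hypothetical 8.5 KiB iSCSI write PDU
print(tail_stall_ms(pdu, Link()))                 # 200.0 (ACK every 2nd segment)
print(tail_stall_ms(pdu, Link(ack_frequency=1)))  # 0.2   (ACK every segment)
```

When the receiver acknowledges every segment, the timer never comes into play. A stall of a couple of hundred milliseconds on even a fraction of the write commands is enough to drag throughput far below what the disks can deliver.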

And lo and behold, the speed test now yielded 90-100 MB/s (close to Gigabit Ethernet's raw bandwidth)! Yippee, that was it! One little registry entry on the client side gave us a 10x improvement in iSCSI performance!
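How close is "close"? Here is a back-of-the-envelope Python calculation of the best case for TCP payload over Gigabit Ethernet. It assumes standard 1500-byte frames, IPv4 and no TCP options. iSCSI's own PDU headers lower the ceiling a little further.

```python
LINE_RATE = 1_000_000_000 / 8     # Gigabit Ethernet: 125,000,000 bytes/s of line time
MTU = 1500                        # standard frames (no jumbo frames assumed)
IP_TCP_HEADERS = 20 + 20          # IPv4 + TCP headers without options
ETH_OVERHEAD = 8 + 14 + 4 + 12    # preamble/SFD + Ethernet header + FCS + inter-frame gap

payload = MTU - IP_TCP_HEADERS    # 1460 bytes of TCP payload per frame
on_wire = MTU + ETH_OVERHEAD      # 1538 bytes of line time per frame
tcp_ceiling = LINE_RATE * payload / on_wire

print(f"TCP payload ceiling: {tcp_ceiling / 1e6:.1f} MB/s")   # 118.7 MB/s
for measured in (90e6, 100e6):
    print(f"{measured / 1e6:.0f} MB/s is {measured / tcp_ceiling:.0%} of it")  # 76%, 84%
```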

Now, can someone explain to me why on Windows 2000 you need to set "TcpAckDelTicks=0", while on Windows 2003 the same thing is accomplished by setting "TcpAckFrequency=1" (which is the same thing, only seen from the other side of the division sign)?
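The two values describe the same delay from different angles. The Windows 2000 value sets the delayed-ACK timer in 100 ms ticks, so 0 means "don't wait". TcpAckFrequency counts how many segments to receive before sending an ACK, so 1 means "acknowledge every segment". On Windows 2003, TcpAckFrequency is a per-interface REG_DWORD under the TCP/IP parameters key. The Python sketch below only generates the text of a .reg file for it, assuming the REGEDIT4 file format and a hypothetical interface GUID. Check Microsoft's iSCSI Initiator user guide for the exact procedure on your Windows version.

```python
# The Interfaces key holds one subkey per network interface, named by GUID.
TCPIP_INTERFACES = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces"
)


def ack_frequency_reg(interface_guid: str, frequency: int = 1) -> str:
    """Return .reg file text that sets TcpAckFrequency (REG_DWORD) for one interface.

    frequency=1 asks the stack to ACK every segment instead of delaying ACKs.
    """
    if not 0 <= frequency <= 0xFFFFFFFF:
        raise ValueError("a REG_DWORD holds a 32-bit unsigned value")
    lines = [
        "REGEDIT4",
        "",
        f"[{TCPIP_INTERFACES}\\{interface_guid}]",
        f'"TcpAckFrequency"=dword:{frequency:08x}',
        "",
    ]
    return "\r\n".join(lines)   # .reg files use Windows line endings


# Hypothetical GUID: use the subkey of the NIC that carries the iSCSI traffic.
print(ack_frequency_reg("{00000000-0000-0000-0000-000000000000}"))
```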

So, to all you storage-hungry video editors out there: the Sun Fire X4500 with Solaris ZFS and iSCSI is a great solution for reliable, fast, easy-to-use and inexpensive video storage. You just need to know how to tell your TCP/IP stack not to delay ACKs...
 

Comments:

[Trackback] constantin posted a nice howto for having iscsi storage fast as hell.

Posted by smue.org:useless stuff on December 07, 2007 at 04:15 AM CET #

And for Digital Cinema (with 2K or even 4K resolution) you can put the 2TB iSCSI volumes on multiple X4500!

Posted by Danilo Poccia on December 07, 2007 at 09:24 AM CET #

Hi

What was the size/setup of the zpool?

Posted by Mika on December 07, 2007 at 09:47 AM CET #

Hi Danilo! Yes, striping over multiple 2 TB targets does work on the Windows side. Kinda ugly, but it does :).

Mika: I'm not sure, but I think it was around 10 disks. Keep in mind that dd will not give you maximum performance; cp should yield more.

Thanks for reading&posting&linking,
Constantin

Posted by Constantin Gonzalez on December 07, 2007 at 10:04 AM CET #

I am quite concerned about the need to set TCP_NODELAY. The default was changed in Nevada back in February, but unless the iSCSI protocol is just totally braindead, it shouldn't make a difference. See CRs 6523439 and 6543549.

Posted by Brian Utterback on December 07, 2007 at 12:21 PM CET #
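For readers wondering what TCP_NODELAY is: it is the standard per-socket option an application sets to disable Nagle's algorithm. Here is a minimal Python illustration. It only demonstrates the socket option itself; it is not the Solaris initiator's code.

```python
import socket


def disable_nagle(sock: socket.socket) -> None:
    """Turn off Nagle's algorithm: send small segments without waiting for ACKs."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)


def nagle_disabled(sock: socket.socket) -> bool:
    """True if TCP_NODELAY is set on the socket."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    print(nagle_disabled(s))  # False: Nagle is on by default
    disable_nagle(s)
    print(nagle_disabled(s))  # True
```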

Hi Brian, thank you for your comment. So it has been fixed for the iSCSI initiator in Nevada build 60. What about Solaris 10?

At least on Windows, this is still very much needed: we saw the 10x performance difference with a recent version of the Windows iSCSI initiator.

BTW, I don't know whether this is necessary on Linux, but I assume it's easy to configure there as well.

Have a nice weekend, Constantin

Posted by Constantin Gonzalez on December 07, 2007 at 12:30 PM CET #

[Trackback] When you use the OpenSolaris iSCSI Target for a Windows Initiator, the performance may be not in the expected range: Constantin summarizes the steps in his blog entry "X4500 + Solaris ZFS + iSCSI = Perfect Video Editing Storage" And low and behold, t...

Posted by c0t0d0s0.org on December 08, 2007 at 05:17 PM CET #

Hello Constantin,
I'm trying to generate performance numbers for NFS/iSCSI on Solaris and Linux. I have already posted some numbers, but I'm working on understanding them, explaining why they look the way they do, and finding the tuning needed to improve them. One thing I did not understand: is the Nagle tuning needed on the target machine? I'm trying to run more tests with files larger than 2 GB (4 GB), and the connection between the initiator (Linux) and the target (Solaris) gets lost. I have got excellent numbers, but with intensive cache hits :) so I'm working on running such tests with bigger files.
http://www.posix.brte.com.br/blog/?p=89

Posted by Marcelo leal on December 11, 2007 at 06:48 AM CET #

This is great to see - in the video realm, we have had great success with Thumper and IP Video Surveillance too. Network/IP camera shipments are growing more than 50% while Analog/CCTV camera shipments are declining by 9% - so IP cameras are causing a big disruption in this space.

Now a 16-channel CCTV Network Video Recorder (NVR) costs about $11,000 (USD) and holds about 500GB, so it costs users approx. $688/camera. Look at an IP, 16-channel, 500GB Digital Video Recorder (DVR) that costs about $4,420, and now we are down to $276/camera...

Introduce Thumper - add surveillance software like ipConfigure, turn Thumper into a DVR, and what do we get? Let's say our Thumper cost is $60,000. It holds 48TB, so now we are down to $30/Camera - quite the difference! What's more, a traditional DVR can only handle 16 cameras, while a single Thumper can handle almost 2,000....

cool stuff...

Posted by Taylor Allis on December 14, 2007 at 08:50 AM CET #
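The per-camera figures in the comment above are simple division. A quick Python check using the commenter's prices and camera counts (the 2,000-camera figure is the commenter's estimate; $688 and $276 are these results rounded):

```python
# Prices (USD) and camera counts as given in the comment above.
systems = {
    "16-channel CCTV NVR, 500GB": (11_000, 16),
    "16-channel IP DVR, 500GB": (4_420, 16),
    "Thumper + surveillance software": (60_000, 2_000),
}

per_camera = {name: cost / cameras for name, (cost, cameras) in systems.items()}

for name, cost in per_camera.items():
    print(f"{name}: ${cost:,.2f} per camera")   # $687.50, $276.25, $30.00
```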

Quite some time ago I shook my head at NetApp wanting me to sell iSCSI block device files on WAFL, and now you suggest the same thing on ZFS.

I hope CIFS serving will become a saner alternative (no NTFS needed).

I am happy that with Oracle 11g you don't need that anymore and can do dNFS to storage servers like Thumper.

Bernd

Posted by Bernd Eckenfels on December 15, 2007 at 06:29 PM CET #
