By Darius Zanganeh-Oracle on Aug 21, 2014
Have you ever wondered why your Netapp FAS box is slow and doesn't perform well at large block read workloads? In this blog entry I will give you a little bit of information that will probably help you understand why it’s so slow and why you shouldn't use it for applications that READ in large blocks like 64k, 128k, 256k and up. Of course, since I work for Oracle at this time, I will show you why the ZS3 storage boxes are excellent choices for these types of workloads.
Netapp’s Fundamental Problem
Update 10/21/14: Well, it turns out that the Netapp whitepaper referenced below is wrong/incomplete and doesn't tell the whole story. The Netapp will try to lay the blocks out contiguously and retrieve multiple blocks per IO using a method they call chaining. So they are not as bad at large block as it may seem. They are still slow; we just do not know the real reason. Could it be their CPU-bound architecture? Lack of ability to do SMP multithreading like the Solaris kernel? The answer is I don't know. I don't know why they won't post an SPC-2 benchmark and I am not sure why the FAS box is slow at such a workload. I would bet anyone that if performance were even modestly OK, you would see Netapp posting a benchmark for the 8080EX.
So fundamentally the issue still remains: the FAS box is not designed for these types of workloads and will run out of gas much faster than a ZS3, therefore costing the customer more in $/IOP and $/GB.
The fundamental problem you have running these workloads on Netapp is the backend block size of their WAFL file system. Every application block on a Netapp FAS ends up in a 4k chunk on a disk. Reference: Netapp TR-3001 Whitepaper
Netapp has demonstrated this lack of large block performance in at least two different ways.
- They have NEVER posted an SPC-2 Benchmark, yet they have posted SPC-1 and SPECSFS, both recently.
- In 2011 they purchased Engenio to try to fill this gap in their portfolio.
Block Size Matters
So why does block size matter anyway? Many applications use large blocks of data, especially in the Big Data movement. Some examples are SAS Business Analytics, Microsoft SQL, and Hadoop HDFS, where the block size is even 64MB! Now let me boil this down for you.
(Updated 9-10-14 with more detail. The facts haven't changed; I added detail because Netapp was upset about the way I made their writes look, even though the original entry focused on READS! So to be fair and friendly I have updated the write section.)
If an application such as MS SQL is writing data in a 64k chunk, then before Netapp actually writes it to disk it will have to split it into 16 different 4k writes and 16 different DISK IOPS. Now, to be fair, these WAFL writes will mostly be hidden or masked from the application via the write cache, as on all storage arrays. In Netapp's case they have a couple of NVRAM pools that get flushed every 10 seconds or when they start getting full.
Now it is quite possible a customer could run into a huge problem here as well, with the NVRAM buffers filling up faster than they can be flushed. These are usually called "Back to Back CPs". Netapp even has a sysstat command to measure this potential problem. Netapp will tell you they have good sequential write performance: "the main reason ONTAP can provide good sequential performance with randomly written data is the blocks are organized contiguously on disk". So for writes this is mostly true, unless you blow through the relatively small amount of NVRAM. If you hit that bottleneck you cannot grow your NVRAM; you will have to buy a bigger Netapp FAS Filer. With the Oracle ZS3 we use a different method to handle write cache. ZFS uses something called the ZFS Intent Log, or ZIL. The ZIL is based on write-optimized SLC SSDs and can be expanded on the fly by simply hot-adding more ZIL drives to the given ZFS pool.
But this post is much less about writes and more about reads. The only reason I even wrote about writes is that Netapp has a fixed back-end block size of 4k, which can be very limiting on large block read workloads. So it seemed natural to explain that.
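The 64k-into-4k arithmetic used in this post can be sketched in a few lines of Python. This is only a worst-case illustration: it assumes every back-end block costs its own disk IO and ignores chaining, caching and prefetch. The function name `backend_blocks` is made up for this example.

```python
def backend_blocks(app_block_bytes: int, backend_block_bytes: int = 4096) -> int:
    """Number of fixed-size back-end blocks needed to hold one application block.

    Assumption: worst case, each back-end block is a separate disk IO
    (no chaining, no cache hits).
    """
    # Integer ceiling division, so a partial block still counts as a whole one.
    return -(-app_block_bytes // backend_block_bytes)


for size_k in (64, 128, 256):
    blocks = backend_blocks(size_k * 1024)
    print(f"{size_k}k application block -> {blocks} x 4k back-end blocks")
```

For a 64k application block this gives the 16 back-end blocks mentioned above, and the count doubles each time the application block size doubles.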
When the application later goes to read that 64k chunk, the Netapp will again have to do 16 different disk IOPS for data that is not chained together on the Netapp disk. In comparison, the ZS3 Storage Appliance can write in variable block sizes ranging from 512 bytes to 1MB. So if you put the same MSSQL database on a ZS3 you can set the specific LUNs for this database to 64k, and then an application read/write requires only a single disk IO. That is one disk IO instead of 16!

But back to the problem with your Netapp: you will VERY quickly run out of disk IO and hit a wall. Now, all arrays have some fancy prefetch algorithm and some nice cache, and maybe even flash-based cache such as a PAM card in your Netapp, but with large block workloads you will usually blow through the cache and still need significant disk IO. Also, because these datasets are usually very large and usually not dedupable, they are usually not good candidates for an all-flash system.

You can do some simple math in Excel and very quickly you will see why it matters. Here are a couple of READ examples using SAS and MSSQL. Assume these are the READ IOPS the application needs even after all the fancy cache and prefetch read-ahead algorithms or chaining. Netapp will tell you they get around this via read-ahead and chaining and the fact that the data is laid out on the disk contiguously, but at some point with these large block reads even that is going to fail. If it didn't fail, I guarantee you Netapp would be posting SPC-2 throughput numbers and bragging about it...
Here is an example with 128k blocks. Notice the numbers of drives on the Netapp!
Here is an example with 64k blocks
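The kind of spreadsheet math behind examples like these can be sketched in Python. Everything numeric here is an assumption for illustration, not a figure from the examples above: the 10,000 application read IOPS and the 150 IOPS per drive are hypothetical, and the model is a worst case that ignores RAID layout, chaining, cache hits and controller CPU limits.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


def drives_needed(app_read_iops: int, app_block_bytes: int,
                  backend_block_bytes: int, iops_per_drive: int) -> int:
    """Drives required to serve the given application read IOPS.

    Assumption: each back-end block read costs one disk IO.
    """
    ios_per_app_read = ceil_div(app_block_bytes, backend_block_bytes)
    disk_iops = app_read_iops * ios_per_app_read
    return ceil_div(disk_iops, iops_per_drive)


KIB = 1024
APP_READ_IOPS = 10_000   # hypothetical workload
IOPS_PER_DRIVE = 150     # hypothetical per-drive capability

for app_k in (64, 128):
    # Fixed 4k back end versus a back-end block size matched to the application.
    fixed_4k = drives_needed(APP_READ_IOPS, app_k * KIB, 4 * KIB, IOPS_PER_DRIVE)
    matched = drives_needed(APP_READ_IOPS, app_k * KIB, app_k * KIB, IOPS_PER_DRIVE)
    print(f"{app_k}k reads: {fixed_4k} drives at 4k back end, {matched} drives when matched")
```

Swap in your own IOPS requirement and per-drive figure; the point of the sketch is only that the drive count scales with the ratio of application block size to back-end block size.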
You can easily see that the Oracle ZS3 can do dramatically more work with dramatically fewer drives. This doesn't even take into account that the ONTAP system will likely run out of CPU way before you get to these drive numbers, so you would be buying many more controllers. So with all that said, let's look at the ZS3 and why you should consider it for any workload you're running on Netapp today.
ZS3 World Record Price/Performance in the SPC-2 benchmark
ZS3-2 is #1 in Price Performance at $12.08 per MBPS
ZS3-2 is #3 in Overall Performance 16,212 MBPS
Note: The number one overall spot in the world is held by an AFA at 33,477 MBPS, but at a Price Performance of $29.79. A customer could purchase 2 x ZS3-2 systems as configured in the benchmark, get roughly the same performance, and walk away with $600,000 in their pocket.
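As a rough check of that comparison, here is the arithmetic in Python. It assumes that the price-performance figure is total tested price divided by MBPS, so price can be recovered by multiplying the two, and it takes the post's premise that two ZS3-2 systems simply deliver twice the throughput of one. The variable names are made up for this sketch.

```python
def total_price(price_per_mbps: float, mbps: float) -> float:
    """Recover total price, assuming price-performance = total price / MBPS."""
    return price_per_mbps * mbps


ZS3_MBPS, ZS3_PRICE_PERF = 16_212, 12.08
AFA_MBPS, AFA_PRICE_PERF = 33_477, 29.79

zs3_price = total_price(ZS3_PRICE_PERF, ZS3_MBPS)
afa_price = total_price(AFA_PRICE_PERF, AFA_MBPS)

# Assumption from the post: two ZS3-2 systems scale linearly.
two_zs3_mbps = 2 * ZS3_MBPS
two_zs3_price = 2 * zs3_price
savings = afa_price - two_zs3_price

print(f"2 x ZS3-2: {two_zs3_mbps:,} MBPS for about ${two_zs3_price:,.0f}")
print(f"AFA:       {AFA_MBPS:,} MBPS for about ${afa_price:,.0f}")
print(f"Difference: about ${savings:,.0f}")
```

Under those assumptions the two ZS3-2 systems come out at 32,424 MBPS combined, and the price difference lands a little over $600,000, which is where the round number in the note comes from.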