By Darius Zanganeh-Oracle on Aug 21, 2014
Have you ever wondered why your NetApp FAS box is slow and doesn't perform well at large-block read workloads? In this blog entry I'll give you a bit of information that should help you understand why it's so slow, and why you shouldn't use it for applications that read in large blocks like 64k, 128k, 256k, and up. And of course, since I work for Oracle at this time, I'll show you why the ZS3 storage systems are excellent choices for these types of workloads.
NetApp's Fundamental Problem
NetApp has effectively acknowledged this large-block performance gap in at least two different ways:
- They have NEVER posted an SPC-2 benchmark result, even though they have posted both SPC-1 and SPECsfs results recently.
- In 2011 they purchased Engenio to try to fill this gap in their portfolio.
Block Size Matters
So why does block size matter anyway? Many applications move data in large-block chunks, especially in the Big Data movement. Some examples are SAS business analytics, Microsoft SQL Server, and Hadoop HDFS, whose block size is 64MB! Now let me boil this down for you.
If an application such as MS SQL Server writes data in a 64k chunk, NetApp has to split it into 16 different 4k writes before it actually lands on disk. Now, to be fair, these WAFL writes will mostly be hidden or masked from the application by the write cache, like on all storage arrays; in NetApp's case there are a couple of NVRAM pools that get flushed every 10 seconds or when they start getting full.
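To make that split concrete, here is a quick sketch (my own illustration, not NetApp code) of how many backend block writes one application write turns into on a fixed 4k backend:

```python
# Sketch: fixed 4k WAFL-style backend block vs. application write sizes.
# The 4096-byte block size is NetApp's fixed backend block per this post.

WAFL_BLOCK = 4 * 1024  # NetApp's fixed 4k backend block size

def backend_writes(app_write_bytes, backend_block=WAFL_BLOCK):
    """Number of backend block writes needed for one application write."""
    return -(-app_write_bytes // backend_block)  # ceiling division

print(backend_writes(64 * 1024))   # one 64k SQL Server write -> 16 backend writes
print(backend_writes(128 * 1024))  # one 128k write -> 32 backend writes
```

The same ceiling-division logic applies on the read path, which is where it really hurts, as described below.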
Now, it is quite possible for a customer to run into a huge problem here as well, when the NVRAM buffers cannot flush faster than they fill. These events are usually called "back-to-back CPs," and NetApp even has a sysstat counter to measure this potential problem. NetApp will tell you they have good sequential write performance: "the main reason ONTAP can provide good sequential performance with randomly written data is the blocks are organized contiguously on disk." For writes this is mostly true, unless you blow through the relatively small amount of NVRAM. If you hit that bottleneck you cannot grow your NVRAM; you have to buy a bigger NetApp FAS filer. The Oracle ZS3 uses a different method to handle write cache: ZFS has something called the ZFS Intent Log, or ZIL. The ZIL is backed by write-optimized SLC SSDs and can be expanded on the fly by simply hot-adding more ZIL drives to the given ZFS pool.
But this post is much less about writes and more about reads. The only reason I even brought up writes is that NetApp has a fixed back-end block size of 4k, which can be very limiting on large-block read workloads, so it seemed natural to explain that first.
When the application later goes to read that 64k chunk, the NetApp again has to do 16 separate disk I/Os for data that is not chained together on disk. In comparison, the ZS3 Storage Appliance can write in variable block sizes ranging from 512 bytes to 1MB. So if you put the same MS SQL database on a ZS3, you can set the specific LUNs for that database to a 64k record size, and each application read or write then requires only a single disk I/O. That is 16x fewer disk operations! But back to the problem with your NetApp: you will VERY quickly run out of disk I/O and hit a wall.

Now, every array has some fancy prefetch algorithm and some nice cache, maybe even flash-based cache such as a PAM card in your NetApp, but with large-block workloads you will usually blow through the cache and still need significant disk I/O. And because these datasets are usually very large and usually not dedupable, they are usually not good candidates for an all-flash system.

You can do some simple math in Excel and very quickly see why this matters. Here are a couple of READ examples using SAS and MS SQL. Assume these are the read IOPS the application needs even after all the fancy cache and prefetch/read-ahead algorithms or chaining have done their work. NetApp will tell you they get around this via read-ahead, chaining, and the fact that the data is laid out contiguously on disk, but at some point with these large-block reads even that is going to fail. If it didn't fail, I guarantee you NetApp would be posting SPC-2 throughput numbers and bragging about them...
Here is an example with 128k blocks. Notice the number of drives on the NetApp!
Here is an example with 64k blocks.
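The "simple math in Excel" is easy to reproduce. Here is a sketch of the drive-count estimate; the per-drive IOPS figure is my own assumption (roughly 200 random IOPS for a 10k RPM SAS drive), not a number from the benchmark tables above:

```python
# Back-of-the-envelope drive-count math for large-block reads.
# Assumption (mine, not from the post's tables): each spindle sustains
# about 200 random read IOPS, a common figure for a 10k RPM SAS drive.

DRIVE_IOPS = 200        # assumed per-drive random read IOPS
BACKEND_4K = 4 * 1024   # NetApp's fixed backend block size

def drives_needed(app_iops, app_block, backend_block):
    """Estimate spindles needed to serve app_iops reads of app_block bytes
    when the array stores data in backend_block chunks."""
    split = -(-app_block // backend_block)         # backend I/Os per app read
    return -(-(app_iops * split) // DRIVE_IOPS)    # ceil(total IOPS / per-drive IOPS)

# Example: 10,000 application read IOPS at 128k.
print(drives_needed(10_000, 128 * 1024, BACKEND_4K))   # fixed 4k backend: 1600 drives
print(drives_needed(10_000, 128 * 1024, 128 * 1024))   # matched 128k records: 50 drives
```

The 10,000 IOPS workload is a hypothetical example, but the 32x gap between the two drive counts follows directly from the 128k-to-4k split described above.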
You can easily see that the Oracle ZS3 can do dramatically more work with dramatically fewer drives. This doesn't even take into account that the ONTAP system would likely run out of CPU well before you reached those drive counts, so you'd be buying many more controllers as well. With all that said, let's look at the ZS3 and why you should consider it for any workload you're running on NetApp today.
ZS3 World Record Price/Performance in the SPC-2 benchmark
- ZS3-2 is #1 in price/performance at $12.08 per MBPS
- ZS3-2 is #3 in overall performance at 16,212 MBPS
Note: The number-one overall spot in the world is held by an AFA at 33,477 MBPS, but at a price/performance of $29.79 per MBPS. A customer could purchase two of the ZS3-2 systems from the benchmark, get roughly the same performance, and walk away with about $600,000 in their pocket.
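That $600,000 figure follows directly from the published numbers. A quick check, assuming the quoted price/performance values are total benchmark price divided by SPC-2 MBPS:

```python
# Rough check of the "$600,000 in their pocket" claim using the
# SPC-2 figures quoted above (price/performance is $ per MBPS).

zs3_mbps, zs3_ppp = 16_212, 12.08   # ZS3-2: throughput, $/MBPS
afa_mbps, afa_ppp = 33_477, 29.79   # top AFA: throughput, $/MBPS

two_zs3_cost = 2 * zs3_mbps * zs3_ppp   # two ZS3-2 systems, ~$392k total
afa_cost = afa_mbps * afa_ppp           # the AFA, ~$997k total

print(f"2x ZS3-2: {2 * zs3_mbps} MBPS vs AFA: {afa_mbps} MBPS")
print(f"Savings: ${afa_cost - two_zs3_cost:,.0f}")
```

Two ZS3-2 systems deliver 32,424 MBPS against the AFA's 33,477 MBPS, at a savings of roughly $606,000.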