Thursday Aug 21, 2014

Why your Netapp is so slow...

Have you ever wondered why your Netapp FAS box is slow and doesn't perform well on large block read workloads?  In this blog entry I will give you some information that should help you understand why it's so slow and why you shouldn't use it for applications that READ in large blocks like 64k, 128k, 256k and up.  Of course, since I work for Oracle, I will also show you why the ZS3 storage boxes are excellent choices for these types of workloads.

Netapp’s Fundamental Problem

Netapp itself has demonstrated this large block performance gap in at least two different ways.

  1. They have NEVER posted an SPC-2 benchmark, even though they have recently posted both SPC-1 and SPECsfs results.
  2. In 2011 they purchased Engenio to try to fill this gap in their portfolio.

Block Size Matters

So why does block size matter anyway?  Many applications read and write data in large chunks, especially in the Big Data space.  Some examples are SAS Business Analytics and Microsoft SQL Server, and Hadoop HDFS uses a default block size of 64MB!  Now let me boil this down for you.


If an application such as MS SQL is writing data in a 64k chunk, then before Netapp actually writes it to disk it has to split it into 16 different 4k writes.  Now, to be fair, these WAFL writes will mostly be hidden or masked from the application by the write cache, as on all storage arrays; in Netapp's case they have a couple of NVRAM pools that get flushed every 10 seconds or when they start getting full.
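The split described above is simple ceiling division; here is a minimal sketch (illustrative only, not actual WAFL internals) showing how a fixed 4k backend block size multiplies application writes:

```python
# Illustrative sketch: a fixed 4 KiB backend block size means every
# application write is split into multiple backend writes.
WAFL_BLOCK = 4 * 1024  # Netapp's fixed backend block size

def backend_writes(app_write_bytes, block=WAFL_BLOCK):
    """Number of fixed-size backend blocks needed for one app write."""
    return -(-app_write_bytes // block)  # ceiling division

print(backend_writes(64 * 1024))    # a 64k MS SQL write -> 16 x 4k writes
print(backend_writes(256 * 1024))   # a 256k write -> 64 x 4k writes
```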

Now it is quite possible for a customer to run into a huge problem here as well, if the NVRAM buffers cannot flush faster than they fill.  These events are usually called "back-to-back CPs" (consistency points), and Netapp even has a sysstat counter to measure this potential problem.  Netapp will tell you they have good sequential write performance: "the main reason ONTAP can provide good sequential performance with randomly written data is the blocks are organized contiguously on disk".  For writes this is mostly true, unless you blow through the relatively small amount of NVRAM.  If you hit that bottleneck you cannot grow your NVRAM; you have to buy a bigger Netapp FAS filer.  The Oracle ZS3 uses a different method to handle write cache: ZFS uses something called a ZFS Intent Log, or ZIL.  The ZIL is based on write-optimized SLC SSDs and can be expanded on the fly by simply hot-adding more ZIL drives to the given ZFS pool.
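The back-to-back CP condition boils down to a rate comparison: if sustained write ingest exceeds the rate at which a consistency point can drain NVRAM to disk, the next CP starts the moment the last one finishes and write latency spikes. A tiny sketch with made-up illustrative numbers (not Netapp specs):

```python
# Hedged sketch of the back-to-back CP condition: NVRAM refills before
# each consistency point (CP) finishes draining it whenever sustained
# ingest bandwidth exceeds flush bandwidth. Numbers are illustrative.

def hits_back_to_back_cp(ingest_mb_s, flush_mb_s):
    """True if writes arrive faster than a CP can flush them to disk."""
    return ingest_mb_s > flush_mb_s

print(hits_back_to_back_cp(800, 1200))   # flush keeps up: no problem
print(hits_back_to_back_cp(1500, 1200))  # ingest wins: back-to-back CPs
```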

But this post is much less about writes and more about reads.  The only reason I even wrote about writes is that Netapp has a fixed backend block size of 4k, which can be very limiting on large block read workloads, so it seemed natural to explain that first.


When the application later goes to read that 64k chunk back, the Netapp again has to do 16 separate disk IOs for data that is not chained together on disk.  In comparison, the ZS3 Storage Appliance can write in variable block sizes ranging from 512 bytes to 1MB.  So if you put the same MS SQL database on a ZS3, you can set the specific LUNs for this database to 64k, and then each application read or write requires only a single disk IO.  That is 16x fewer disk IOs per request!

But back to the problem with your Netapp: you will VERY quickly run out of disk IO and hit a wall.  Now, all arrays have some fancy prefetch algorithm and some nice cache, maybe even flash-based cache such as a PAM card in your Netapp, but with large block workloads you will usually blow through the cache and still need significant disk IO.  Also, because these datasets are usually very large and usually not dedupable, they are usually not good candidates for an all-flash system.

You can do some simple math in Excel and very quickly see why this matters.  Here are a couple of READ examples using SAS and MS SQL.  Assume these are the READ IOPS the application still needs after all the fancy cache, prefetch, read-ahead algorithms, and chaining.  Netapp will tell you they get around this via read-ahead, chaining, and the fact that the data is laid out contiguously on disk, but at some point, with these large block reads, even that is going to fail.  If it didn't fail, I guarantee you Netapp would be posting SPC-2 throughput numbers and bragging about it...

Here is an example with 128k blocks.  Notice the number of drives on the Netapp!

Here is an example with 64k blocks
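The "simple math in Excel" behind examples like these can be sketched in a few lines. The per-drive IOPS figure and the application IOPS target below are illustrative assumptions, not measurements from either vendor:

```python
# Back-of-the-envelope drive-count math. Assumptions (illustrative):
# each spinning drive sustains ~200 random read IOPS, and caching /
# prefetch have already been accounted for in the app IOPS number.
DRIVE_IOPS = 200

def disk_iops_needed(app_iops, app_block, backend_block):
    """Disk IOPS when each app read is split into backend-block reads."""
    return app_iops * -(-app_block // backend_block)  # ceiling division

def drives_needed(disk_iops, per_drive=DRIVE_IOPS):
    return -(-disk_iops // per_drive)

app_iops = 20_000  # READ IOPS the application needs
for blk in (64 * 1024, 128 * 1024):
    fixed_4k = disk_iops_needed(app_iops, blk, 4 * 1024)  # fixed 4k backend
    matched  = disk_iops_needed(app_iops, blk, blk)       # matched block size
    print(blk // 1024, drives_needed(fixed_4k), drives_needed(matched))
```

At 64k blocks the fixed 4k backend needs 16x the drives; at 128k it needs 32x, which is why the drive counts in the examples diverge so dramatically.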

You can easily see that the Oracle ZS3 can do dramatically more work with dramatically fewer drives.  This doesn't even take into account that the ONTAP system will likely run out of CPU well before you reach these drive counts, so you would be buying many more controllers as well.  So with all that said, let's look at the ZS3 and why you should consider it for any workload you're running on Netapp today.

ZS3 World Record Price/Performance in the SPC-2 benchmark 
ZS3-2 is #1 in Price-Performance at $12.08 per SPC-2 MBPS
ZS3-2 is #3 in Overall Performance at 16,212 SPC-2 MBPS

Note: The number one overall spot in the world is held by an AFA at 33,477 MBPS, but at a price-performance of $29.79 per MBPS.  A customer could purchase 2 x ZS3-2 systems from the benchmark, get roughly the same performance, and walk away with about $600,000 still in their pocket.
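That $600,000 figure follows directly from the published SPC-2 numbers, since SPC-2 price-performance is total system price divided by MBPS:

```python
# Reconstructing the ~$600k claim from the published SPC-2 numbers:
# total price = (price per MBPS) * MBPS.
zs3_price = 12.08 * 16_212   # one ZS3-2 system
afa_price = 29.79 * 33_477   # the #1 all-flash entry
two_zs3   = 2 * zs3_price    # ~32,424 MBPS combined, near the AFA's 33,477

print(f"AFA: ${afa_price:,.0f}  2 x ZS3-2: ${two_zs3:,.0f}")
print(f"left in your pocket: ${afa_price - two_zs3:,.0f}")
```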

Tuesday Sep 10, 2013

ZFS Storage Appliance Benchmarks Destroy Netapp, EMC, Hitachi, HP etc..

Today Oracle released two new storage products and also posted two World Record Benchmarks!

First, Oracle posted a new SPC-2 throughput benchmark that is faster than anything else posted on the planet!  The benchmark came in at 17,244 MBPS.  What is probably even more amazing than this result is the cost of the system used to accomplish it.  Oracle has by far the lowest cost per MBPS among our major competitors, coming in at $22.53; IBM, by contrast, comes in at $131.21.  I should also note that Oracle accomplished this with less hardware than IBM: the ZS3-4 entry uses 384 drives while the IBM DS8700 uses 480.  When it comes to throughput applications, the Oracle ZS3-4 beats every competitor in every category of the SPC-2 benchmark.  You can read the details here yourself.

The second benchmark posted was SPECsfs2008.  Full disclosure: SPECsfs2008 is the latest version of the Standard Performance Evaluation Corporation benchmark suite measuring file server throughput and response time.  Oracle broke the World Record response-time measurement, coming in at 0.70ms.  Oracle did not break the record for most operations per second, but in many ways those comparisons are silly because you would be comparing apples and oranges.  The Oracle benchmark is based on a standard 2-node cluster with some disk and flash behind it, while the vendors with higher OPS/sec numbers mostly ran enormous configs or all-flash configs that are irrelevant for all but a few niche workloads.  For the average NAS user, the ZS3-4 config used for this benchmark is perfectly in line with what most customers purchase to run things like Oracle Database, VMware, MS SQL, etc.

The closest 2-node cluster comparable is the recent Hitachi HNAS 4100, which came in at 293,128 OPS/sec with a latency of 1.67ms.  Compare that to the 2-node ZS3-4 entry at 450,702 OPS/sec and 0.70ms latency.  That latency is less than half the Hitachi's, and the ZS3-4 still blows it away in OPS/sec.  Interestingly, they both have very close to the same number of drives.  You can read the actual results here.  With the new ZFS Storage Appliance hardware/software combo, Oracle can clearly see the competition in the rearview mirror when it comes to performance.  Many analysts have also recently commented on the Oracle ZFS Storage line-up.



