Thursday Aug 21, 2014

Why your Netapp is so slow...

Have you ever wondered why your NetApp FAS box is slow and doesn't perform well on large-block read workloads?  In this blog entry I will give you a little information that will probably help you understand why it's so slow and why you shouldn't use it for applications that READ in large blocks such as 64k, 128k, 256k and up.  Of course, since I work for Oracle at this time, I will also show you why the ZS3 storage boxes are excellent choices for these types of workloads.

Netapp’s Fundamental Problem

Update 10/21/14:  Well, it turns out that the NetApp whitepaper referenced below is wrong/incomplete and doesn't tell the whole story.  The NetApp will try to lay the blocks out contiguously and retrieve multiple blocks per IO using a method they call chaining.  So they are not as bad at large block as it may seem.  They are still slow; we just do not know the real reason they are slow.  Could it be their CPU-bound architecture?  A lack of ability to do SMP multithreading like the Solaris kernel?  The answer is I don't know.  I don't know why they won't post an SPC-2 benchmark, and I am not sure why the FAS box is slow at such a workload.  I would bet anyone that if performance were even modestly OK, you would see NetApp posting a benchmark for the 8080EX.

So fundamentally the issue still remains that the FAS box is not created for these types of workloads and will run out of gas much faster than a ZS3, therefore costing the customer more $/IOP and $/GBs.

The fundamental problem you have running these workloads on Netapp is the backend block size of their WAFL file system.  Every application block on a Netapp FAS ends up in a 4k chunk on a disk. Reference:  Netapp TR-3001 Whitepaper

NetApp has demonstrated this lack of large-block performance in at least two different ways.

  1. They have NEVER posted an SPC-2 benchmark, yet they have posted SPC-1 and SPECsfs results, both recently.
  2. In 2011 they purchased Engenio to try to fill this gap in their portfolio.

Block Size Matters

So why does block size matter anyway?  Many applications use large-block chunks of data, especially in the Big Data movement.  Some examples are SAS Business Analytics and Microsoft SQL, and Hadoop HDFS even uses 64MB blocks!  Now let me boil this down for you.

9-10-14 (Updated with more detail.  The facts haven't changed; there is just more detail, because NetApp was upset about the way I made their writes look, even though the original entry focused on READS!  So to be fair and friendly I have updated the write section with more detail.)


If an application such as MS SQL is writing data in a 64k chunk, then before NetApp actually writes it to disk it will have to split it into 16 different 4k writes and 16 different disk IOs.  Now, to be fair, these WAFL writes will mostly be hidden or masked from the application via the write cache, like all storage arrays have.  In NetApp's case they have a couple of NVRAM pools that get flushed every 10 seconds or when they start getting full.

Now it is quite possible a customer could run into a huge problem here as well, with the NVRAM buffers not being able to flush faster than they fill up.  These are usually called "Back to Back CPs".  NetApp even has a sysstat command to measure this potential problem.  NetApp will tell you they have good sequential write performance: "the main reason ONTAP can provide good sequential performance with randomly written data is the blocks are organized contiguously on disk."  So for writes this is mostly true, unless you blow through the relatively small amount of NVRAM.  If you hit that bottleneck you cannot grow your NVRAM; you will have to buy a bigger NetApp FAS filer.  With the Oracle ZS3 we use a different method to handle write cache.  ZFS uses something called a ZFS Intent Log, or ZIL.  The ZIL is based on write-optimized SLC SSDs and can be expanded on the fly by simply hot-adding more ZIL drives to the given ZFS pool.

But this post is much less about writes and more about reads.  The only reason I even wrote about writes is because Netapp has a fixed Back-end block size of 4k which can be very limiting on large block read workloads.  So it seemed natural to explain that.


When the application later goes to read that 64k chunk, the NetApp will again have to do 16 different disk IOs for data that is not chained together on the NetApp disk.  In comparison, the ZS3 Storage Appliance can write in variable block sizes ranging from 512 bytes to 1MB.  So if you put the same MS SQL database on a ZS3, you can set the specific LUNs for this database to 64k, and then an application read/write requires only a single disk IO.  That is 16x fewer disk IOs!  But back to the problem with your NetApp: you will VERY quickly run out of disk IO and hit a wall.

Now, all arrays will have some fancy prefetch algorithm and some nice cache, and maybe even flash-based cache such as a PAM card in your NetApp, but with large-block workloads you will usually blow through the cache and still need significant disk IO.  Also, because these datasets are usually very large and usually not dedupable, they are usually not good candidates for an all-flash system.

You can do some simple math in Excel and very quickly you will see why it matters.  Here are a couple of READ examples using SAS and MS SQL.  Assume these are the READ IOPS the application needs even after all the fancy cache and prefetch read-ahead algorithms or chaining.  NetApp will tell you they get around this via read-ahead and chaining and the fact that the data is laid out contiguously on disk, but at some point with these large-block reads even that is going to fail.  If it didn't fail, I guarantee you NetApp would be posting SPC-2 throughput numbers and bragging about it...

Here is an example with 128k blocks.  Notice the numbers of drives on the Netapp!

Here is an example with 64k blocks
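
If you want to redo that spreadsheet math yourself, here is a rough Python sketch.  It assumes the worst case described above: every 4k back-end block costs its own disk IO, with no help from cache, read-ahead or chaining.  The 5,000 application read IOPS and the 150 IOPS per drive are placeholder values chosen purely for illustration (they are not the figures from the examples above), and the function names are made up for this sketch, so plug in your own numbers.

```python
import math

WAFL_BLOCK_KB = 4  # fixed back-end block size discussed in this post


def disk_ios_per_app_read(app_block_kb, backend_block_kb):
    """Disk IOs for one application read, worst case (no chaining, no cache hits)."""
    return math.ceil(app_block_kb / backend_block_kb)


def drives_needed(app_read_iops, app_block_kb, backend_block_kb, iops_per_drive):
    """Drives required to sustain the read load, ignoring controller CPU limits."""
    disk_iops = app_read_iops * disk_ios_per_app_read(app_block_kb, backend_block_kb)
    return math.ceil(disk_iops / iops_per_drive)


# ASSUMPTIONS for illustration only, not measured or published values.
ASSUMED_APP_READ_IOPS = 5000
ASSUMED_IOPS_PER_DRIVE = 150

for block_kb in (64, 128):
    fixed_4k = drives_needed(ASSUMED_APP_READ_IOPS, block_kb,
                             WAFL_BLOCK_KB, ASSUMED_IOPS_PER_DRIVE)
    matched = drives_needed(ASSUMED_APP_READ_IOPS, block_kb,
                            block_kb, ASSUMED_IOPS_PER_DRIVE)
    print(f"{block_kb}k reads: {fixed_4k} drives at a 4k back end, "
          f"{matched} drives with a matching {block_kb}k block size")
```

The sketch treats a drive as delivering the same IOPS regardless of IO size, which is a simplification; a real sizing exercise would also account for cache hit rates and sequential layout.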

You can easily see that the Oracle ZS3 can do dramatically more work with dramatically fewer drives.  This doesn't even take into account that the ONTAP system will likely run out of CPU way before you get to these drive numbers, so you would be buying many more controllers.  So with all that said, let's look at the ZS3 and why you should consider it for any workload you're running on NetApp today.

ZS3 World Record Price/Performance in the SPC-2 benchmark 
ZS3-2 is #1 in Price Performance $12.08
ZS3-2 is #3 in Overall Performance 16,212 MBPS

Note: The number one overall spot in the world is held by an AFA at 33,477 MBPS, but at a Price Performance of $29.79.  A customer could purchase 2 x ZS3-2 systems as configured in the benchmark, get roughly the same performance, and walk away with $600,000 in their pocket.
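
The arithmetic behind that note can be sketched in a few lines of Python.  SPC-2 price-performance is total price divided by MBPS, so multiplying the two published numbers back together gives an approximate system price.  Treat these as implied prices derived from rounded figures, not the exact tested prices, and note that simply adding the throughput of two systems together is a simplifying assumption of this sketch.

```python
# Published SPC-2 figures quoted in this post.
zs3_2 = {"mbps": 16212, "usd_per_mbps": 12.08}
afa = {"mbps": 33477, "usd_per_mbps": 29.79}


def implied_price(result):
    """Approximate system price: MBPS multiplied by $/MBPS."""
    return result["mbps"] * result["usd_per_mbps"]


two_zs3_price = 2 * implied_price(zs3_2)
two_zs3_mbps = 2 * zs3_2["mbps"]  # ASSUMPTION: throughput adds up linearly
afa_price = implied_price(afa)
savings = afa_price - two_zs3_price

print(f"2 x ZS3-2: {two_zs3_mbps} MBPS for about ${two_zs3_price:,.0f}")
print(f"AFA:       {afa['mbps']} MBPS for about ${afa_price:,.0f}")
print(f"Difference: about ${savings:,.0f}")
```

The difference comes out at a little over $600,000, which is where the figure in the note comes from.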

Tuesday Sep 10, 2013

ZFS Storage Appliance Benchmarks Destroy Netapp, EMC, Hitachi, HP etc..

Today Oracle released two new storage products and also posted two World Record Benchmarks!

First, Oracle posted a new SPC-2 throughput benchmark that is faster than anything else posted on the planet!  The benchmark came in at 17,244 MBPS.  What is probably even more amazing than this result is the cost of the system that accomplished it.  Oracle has by far the lowest cost per MBPS compared to our major competitors, coming in at $22.53.  IBM by contrast comes in at $131.21.  I should also note that Oracle accomplished this with less hardware than IBM.  The ZS3-4 entry uses 384 drives while the IBM DS8700 uses 480 drives.  When it comes to throughput applications, the Oracle ZS3-4 beats every competitor in every category of the SPC-2 benchmark.  You can read the details here yourself.

The second benchmark posted was the SPECsfs2008 benchmark.  Full disclosure: SPECsfs2008 is the latest version of the Standard Performance Evaluation Corporation benchmark suite measuring file server throughput and response time.  Oracle did break the world record for response time, coming in at 0.70ms.  Oracle did not break the record for most operations per second, but in many ways those comparisons are sort of silly because you would be comparing apples and oranges.  The Oracle benchmark is based on a standard 2-node cluster with some disk and flash behind it.  The vendors with higher OPS/sec numbers mostly have enormous configs or all-flash configs, which are irrelevant for all but a few niche workloads.  For the average NAS user, the ZS3-4 config used for this benchmark is perfectly in line with what most customers purchase to run things like Oracle Databases, VMware, MS SQL, etc.

The closest comparable 2-node cluster is the recent Hitachi HNAS 4100, which came in at 293,128 OPS/sec with a latency of 1.67ms.  Compare that to the 2-node ZS3-4 entry with 450,702 OPS/sec and 0.70ms latency.  That latency is more than twice as fast as the Hitachi's, and it still blows it away in OPS/sec.  They both have very close to the same number of drives as well, which is interesting.  You can read the actual results here.  With the new ZFS Storage Appliance hardware/software combo, Oracle can clearly see the competition in the rearview mirror when it comes to performance.  Many analysts have also recently commented on the Oracle ZFS Storage line-up.

Tuesday Jul 10, 2012

Oracle ZFSSA Hybrid Storage Pool Demo

The ZFS Hybrid Storage Pool (HSP) has been around since the ZFSSA first launched.  It is one of the main contributors to the high performance we see on the Oracle ZFSSA, both in benchmarks and in many production environments.  Below is a short video I made to show at a high level just how impactful the HSP is on storage performance.  We squeeze a ton of performance out of our drives with our unique use of cache, write-optimized SSD and read-optimized SSD.  Many have written and blogged about this technology; here it is in action.

Demo of the Oracle ZFSSA Hybrid Storage Pool and how it speeds up workloads.

Wednesday Apr 18, 2012

7420 SPEC SFS Torches EMC/Isilon, Netapp, HDS Comparables


Another SPECsfs submission, and another confirmation that ZFSSA is a force to be reckoned with in the NFS world.  The Oracle ZFSSA continues to astound with its performance benchmark numbers.  Today Oracle posted the anticipated SPECsfs benchmark numbers for the 7420 that simply leave you wondering HOW?  How is Oracle technology so much faster, more cost-effective and more efficient than the competition?  I say efficient because Oracle continues to post impressive performance benchmarks surpassing competitors' multi-million dollar configurations at 2-5X lower price points.


For this comparison, I grabbed the top 2-node NetApp 6240 cluster, a recent Hitachi/BlueArc submission, and a 28-node SSD Isilon cluster (not really realistic).  The NetApp is fairly close in terms of number of drives and number of controllers, which makes it a good comparison.  However, the ZFSSA provides 40% more performance than the 6240 at a $700,000 lower price point!!  This doesn't even factor in maintenance costs, extra software licensing, or the NetApp's lack of DTrace Analytics.  I will demo these in an upcoming post.

Another interesting point if you dig into the details: at NetApp's max of 190k IOPS their latency is 3.6ms, while the ZFSSA has only 1.7ms latency at 202k IOPS.  That means that with the same 190k IOPS workload Oracle would respond over 2x faster!  I threw Isilon in the mix because they refuse to post to SPC-2 even though they are usually purchased for high-bandwidth applications.  They cost over $2 million more than the ZFSSA and still post lower performance.  Hitachi is included due to the market perception that they have a competitive NFS system for performance.

All the links below on the respective system names will take you to the detailed summary, including hardware setup, RAID type, etc.  The 7420 in this case was mirrored.

SPEC SFS result in ops/sec (higher is better); peak and overall response time (RT) in ms (lower is better).

Storage System     ops/sec  Peak RT (ms)  Overall RT (ms)  # of Disks  Exported TB  Estimated List Price  $/IOPS
Oracle 7420        267928   3.1           1.31             280         36.32        $430,332              $1.61
NetApp 6240 - 4n   260388   4.8           1.53             288         48           $1,606,048            $6.17
Isilon S200 - 28n  230782   7.8           3.20             672         172.3        $2,453,708            $10.63
NetApp 6240        190675   3.6           1.17             288         85.8         $1,178,868            $6.18
Hitachi 3090-G2    189994   9.5           2.08             368         36.95        ?                     ?
NetApp 3270        101183   4.3           1.66             360         110.08       $1,089,785            $10.77
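
For anyone who wants to check the headline claims against the numbers above, here is a small Python sketch.  It uses only figures quoted in this post (the list prices are my estimates, as noted), and the variable and function names are just for illustration.

```python
# Figures from the table above; list prices are the post's own estimates.
oracle_7420 = {"ops": 267928, "list_price": 430332}
netapp_6240 = {"ops": 190675, "list_price": 1178868}

# Extra throughput of the 7420 over the 2-node 6240, in percent.
perf_gain_pct = (oracle_7420["ops"] / netapp_6240["ops"] - 1) * 100

# Difference in estimated list price, in dollars.
price_gap = netapp_6240["list_price"] - oracle_7420["list_price"]

# Latencies quoted in the text: 3.6ms at 190k IOPS vs 1.7ms at 202k IOPS.
latency_ratio = 3.6 / 1.7


def usd_per_op(system):
    """Estimated list price divided by peak SPEC SFS ops/sec."""
    return system["list_price"] / system["ops"]


print(f"Performance gain: {perf_gain_pct:.1f}%")
print(f"Price gap: ${price_gap:,}")
print(f"Latency ratio: {latency_ratio:.2f}x")
print(f"$/IOPS: Oracle {usd_per_op(oracle_7420):.2f}, NetApp {usd_per_op(netapp_6240):.2f}")
```

That works out to roughly 40% more ops/sec, a price gap of about $750,000, and a latency ratio a bit over 2x, in line with the claims in the text.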

Fast Databases, Servers, Sailboats and NFS Storage Systems

So not only do we have some of the fastest sailboats in the world, we also have some of the fastest NFS Storage Systems as well.  You could say we have a Storage benchmark Trifecta.  We have some of the top benchmarks in SPECsfs, SPC1 and SPC2.  Take a ZFSSA for a test ride today, improve performance and lower your storage costs.

Tuesday Apr 17, 2012

ZFSSA Storage Stomps IBM DS8800 and XIV

Another benchmark completed by the ZFSSA engineering team with astounding results.  This is becoming expected and usual around Oracle these days.  This is the 3rd major benchmark for the ZFSSA in the last 7 months.  Oracle released the SPC-2 benchmark, which scored 10,704 MBPS and earned 2nd place in performance and 1st in terms of $/MBPS by a long margin, especially against the IBM systems also in the top 10.  The HP P9500 currently holds the number 1 position in terms of performance but is more than 2x the cost of the Oracle solution.  In addition, I don't believe the HP solution has any compression capabilities, which is another significant advantage of the ZFSSA.

System        MBPS    System Cost
Oracle 7420   10704   $377,225.38
IBM DS8800    9706    $2,624,257.00
IBM XIV       7468    $1,137,641.30
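
To put the table in $/MBPS terms, here is a quick Python sketch that divides each system cost by its throughput.  The inputs are the numbers from the table; the computed ratios are my own arithmetic and may differ by a cent or so from the officially published SPC-2 price-performance values.

```python
# (system, SPC-2 MBPS, system cost in USD) taken from the table above.
results = [
    ("Oracle 7420", 10704, 377225.38),
    ("IBM DS8800", 9706, 2624257.00),
    ("IBM XIV", 7468, 1137641.30),
]


def usd_per_mbps(cost, mbps):
    """Price-performance: system cost divided by throughput."""
    return cost / mbps


price_perf = {name: usd_per_mbps(cost, mbps) for name, mbps, cost in results}

# Print the systems from cheapest to most expensive per MBPS.
for name, value in sorted(price_perf.items(), key=lambda item: item[1]):
    print(f"{name:12s} ${value:,.2f} per MBPS")
```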

EMC and NetApp both choose not to participate in this benchmark.  I can only guess this is because they would have to share their $/MBPS, which might not make them look so shiny anymore.  I think they may say that the configs posted are ridiculous and would never be purchased by the common customer.  There is certainly some truth to that, even with their previous SPEC SFS submissions.  However, this is not an issue with the Oracle submission; our tested system is well within reason and practicality for what a typical customer would purchase.  Benchmarks provide a valuable resource for customers to see how the same workload performs on each vendor's box without doing a disruptive in-house POC bake-off.  It is unfortunate that not all vendors submit results for this and other benchmarks.

The massive performance and aggressive price points reflected in this benchmark add to the increasing number of reasons to consider the ZFS Storage Appliance for any of your upcoming SAN or NAS storage projects.

Thursday Feb 23, 2012

Oracle Posts SPEC SFS Benchmark and Crushes Netapp Comparables

Oracle posted another shot across the bow of NetApp.  In Oct 2011 Oracle posted impressive SPC-1 benchmarks that were 2x faster at half the cost of NetApp.  Now those customers looking for proof of ZFSSA's superior performance and cost have another benchmark to compare.

Why are we posting now?
For a long time the old Sun Engineering regime refused to post SFS results, citing problems with the benchmark, which are real.  However, some customers refused to even look at the Oracle ZFS Storage Appliance because of the lack of benchmark postings.  Our competitors like NetApp and EMC would use it as some sort of proof that we must perform poorly.

But Netapp and EMC have other much larger configs that are much faster?
I should point out that NetApp and EMC both have much larger benchmark posts to SPEC SFS, but they are ridiculous configurations that almost no customer would run, let alone be willing to pay for.  Most customers that purchase NAS to run NFS purchase many smaller 2-node HA clusters rather than a 20 million dollar 24-node NAS cluster.  I tried to include EMC in this comparison but soon realized it was pointless, in that their closest post used a Celerra gateway in front of a 4-engine VMAX.  The list price for that would be off the charts, so I considered it not valuable for this comparison.  My goal was to get a good view of comparable systems that customers might consider for a performance-oriented NAS box using NFS.

Price Matters!
One of the major downsides of the SPEC SFS results is that, unlike SPC, they don't force vendors to post prices so customers can easily compare competitors.  Obviously every customer wants great performance, but price is always a major factor as well.  Therefore I have included the list prices as best I could figure them.  For the NetApp prices I used the following price sheet I easily found on Google.  When comparing performance-oriented storage, customers should be comparing $/ops rather than $/GB.

Let's look at the results at a high level

SPEC SFS result in ops/sec (higher is better); peak and overall response time (RT) in ms (lower is better).

Storage System  ops/sec  Peak RT (ms)  Overall RT (ms)  # of Disks  Exported TB  Estimated List Price  $/OPS
Oracle 7320     134140   2.5           1.51             136         36.96        $184,840              $1.38
NetApp 3270     101183   4.3           1.66             360         110.08       $1,089,785            $10.77
NetApp 3160     60507    3.5           1.58             56          10.34        $258,043              $4.26
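
To see why I say $/ops matters more than $/GB for performance-oriented storage, here is a short Python sketch that ranks the three systems both ways.  The inputs come straight from the table above (with my estimated list prices); the $/TB figures are my own division of list price by exported TB, not published numbers.

```python
# (system, ops/sec, exported TB, estimated list price in USD) from the table.
systems = [
    ("Oracle 7320", 134140, 36.96, 184840),
    ("NetApp 3270", 101183, 110.08, 1089785),
    ("NetApp 3160", 60507, 10.34, 258043),
]

# Cost per operation per second, and cost per exported terabyte.
usd_per_op = {name: price / ops for name, ops, tb, price in systems}
usd_per_tb = {name: price / tb for name, ops, tb, price in systems}

# System names ordered from cheapest to most expensive under each metric.
rank_by_op = sorted(usd_per_op, key=usd_per_op.get)
rank_by_tb = sorted(usd_per_tb, key=usd_per_tb.get)

for name in rank_by_op:
    print(f"{name:12s} ${usd_per_op[name]:6.2f}/op   ${usd_per_tb[name]:,.0f}/TB")

print("Cheapest first by $/op:", rank_by_op)
print("Cheapest first by $/TB:", rank_by_tb)
```

Notice that the two NetApp systems swap places depending on which metric you pick, which is exactly why a capacity-based comparison can hide what you are paying for performance.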

Umm, Why is the ZFSSA so much more efficient?
In a nutshell, it's superior engineering and the use of technologies such as the Hybrid Storage Pool (HSP) in the ZFS Storage Appliance.  The HSP extends flash technology not only to read cache but also to write cache.

The 3160 result includes the use of Netapp PAM Read flash cards.  I am not sure why a year later they didn't include them in the 3270 test if they improve performance so much?  Maybe they will post another netapp result with them now?

What now?
Now Oracle ZFSSA Engineering has posted results that again blow away NetApp and prove our engineering is outstanding.  It makes sense that we would have an edge when you consider that the NFS protocol itself was invented at Sun.  NetApp has yet to respond with a new SPC-1 benchmark that is comparable.  I know some NetApp bloggers were looking for us to post SPEC SFS results, thought we never would, and therefore said our performance must be poor.  Now we have posted impressive results and there will be more to come, so stay tuned.  As one famous blogger has said, the proof is in the pudding.



