Why your Netapp is so slow...

Have you ever wondered why your Netapp FAS box is slow and doesn't perform well on large block read workloads?  In this blog entry I will give you a little information that should help you understand why it's so slow, and why you shouldn't use it for applications that READ in large blocks like 64k, 128k, 256k and up.  Of course, since I work for Oracle at this time, I will also show you why the ZS3 storage boxes are excellent choices for these types of workloads.

Netapp’s Fundamental Problem

Update 10/21/14:  Well, it turns out that the Netapp whitepaper referenced below is wrong/incomplete and doesn't tell the whole story.  Netapp will try to lay the blocks out contiguously and retrieve multiple blocks per IO using a method they call chaining.   So they are not as bad at large block IO as it may seem.  They are still slow; we just do not know the real reason why.  Could it be their CPU-bound architecture?  A lack of SMP multithreading like the Solaris kernel has?  The answer is: I don't know.  I don't know why they won't post an SPC-2 benchmark, and I am not sure why the FAS box is slow at such a workload.  I would bet anyone that if performance were even modestly OK, you would see Netapp posting a benchmark for the 8080EX.

So fundamentally the issue remains that the FAS box is not built for these types of workloads and will run out of gas much faster than a ZS3, costing the customer more in $/IOP and $/GB.

The fundamental problem you run into with these workloads on Netapp is the backend block size of their WAFL file system.  Every application block on a Netapp FAS ends up in 4k chunks on disk. Reference:  Netapp TR-3001 Whitepaper

Netapp has effectively conceded this large block performance gap in at least two different ways.

  1. They have NEVER posted an SPC-2 benchmark, yet they have posted SPC-1 and SPECSFS results, both recently.
  2. In 2011 they purchased Engenio to try to fill this gap in their portfolio.

Block Size Matters

So why does block size matter anyway?  Many applications move data in large block chunks, especially in the Big Data movement.  Some examples are SAS Business Analytics, Microsoft SQL Server, and Hadoop, whose HDFS default block size is 64MB! Now let me boil this down for you.

9-10-14 (Updated with more detail. The facts haven't changed; there is just more detail, because Netapp was upset about the way I made their writes look, even though the original entry focused on READS!  So to be fair and friendly I have updated the write section with more detail.)


If an application such as MS SQL is writing data in 64k chunks, then before Netapp actually writes it to disk it will have to split it into 16 different 4k writes, which means 16 different DISK IOPS.   Now, to be fair, these WAFL writes will mostly be hidden or masked from the application via the write cache, like all storage arrays have; in Netapp's case they have a couple of NVRAM pools that get flushed every 10 seconds or when they start getting full.
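To make the arithmetic concrete, here is a trivial sketch of the claim above. The 64k write size is the MS SQL example from the text and the 4k figure is WAFL's back-end block size:

```python
# Sanity-check the claim above: a 64k application write on a 4k
# back-end block size becomes 16 back-end blocks (and, worst case,
# 16 separate disk writes before any contiguous layout or chaining).
APP_WRITE_KB = 64    # MS SQL example write size from the text
WAFL_BLOCK_KB = 4    # WAFL's fixed back-end block size

backend_blocks = APP_WRITE_KB // WAFL_BLOCK_KB
print(f"{APP_WRITE_KB}k write -> {backend_blocks} x {WAFL_BLOCK_KB}k blocks")
# -> 64k write -> 16 x 4k blocks
```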

Now, it is quite possible a customer could run into a huge problem here as well, with the NVRAM buffers not being able to flush as fast as they fill.  These are usually called "Back to Back CPs," and Netapp even has a sysstat counter to measure this potential problem.  Netapp will tell you they have good sequential write performance: "the main reason ONTAP can provide good sequential performance with randomly written data is the blocks are organized contiguously on disk."  For writes this is mostly true, unless you blow through the relatively small amount of NVRAM.  If you hit that bottleneck, you cannot grow your NVRAM; you have to buy a bigger Netapp FAS filer.  The Oracle ZS3 uses a different method to handle write cache: ZFS uses something called the ZFS Intent Log, or ZIL.  The ZIL is based on write-optimized SLC SSDs and can be expanded on the fly by simply hot-adding more ZIL drives to the given ZFS pool.

But this post is much less about writes and more about reads.  The only reason I even wrote about writes is that Netapp has a fixed backend block size of 4k, which can be very limiting on large block read workloads, so it seemed natural to explain that.


When the application later goes to read that 64k chunk, the Netapp will again have to do 16 different disk IOPS for data that is not chained together on the Netapp disks.  In comparison, the ZS3 Storage Appliance can write in variable block sizes ranging from 512b to 1MB.  So if you put the same MS SQL database on a ZS3, you can set the specific LUNs for this database to 64k, and an application read or write then requires only a single disk IO.  That is 16x fewer disk IOPS!  But back to the problem with your Netapp: you will VERY quickly run out of disk IO and hit a wall.

Now, every array has some fancy prefetch algorithm and some nice cache, maybe even flash-based cache such as a PAM card in your Netapp, but with large block workloads you will usually blow through the cache and still need significant disk IO.  Also, because these datasets are usually very large and usually not very dedupable, they are usually not good candidates for an all-flash system.

You can do some simple math in Excel and very quickly see why it matters.  Here are a couple of READ examples using SAS and MS SQL.  Assume these are the READ IOPS the application needs even after all the fancy cache, prefetch/read-ahead algorithms, and chaining.  Netapp will tell you they get around this via read-ahead, chaining, and the fact that the data is laid out on disk contiguously, but at some point, with these large block reads, even that is going to fail.  If it didn't fail, I guarantee you Netapp would be posting SPC-2 throughput numbers and bragging about it...
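Since the spreadsheet examples themselves aren't reproduced here, a minimal Python sketch of that "simple math" follows. The 20,000 reads/sec workload and the 200 IOPS-per-drive figure are illustrative assumptions, not measured vendor numbers:

```python
# Illustrative spreadsheet math: how many disk IOPS (and drives) a given
# large-block READ workload needs when the back end serves 4k blocks vs.
# the application's native block size. All inputs are assumptions.
IOPS_PER_DRIVE = 200           # assumed per-drive IOPS for a 10k/15k HDD

def drives_needed(app_iops, app_block_kb, backend_block_kb):
    # Each application read fans out into ceil(app/backend) disk IOs
    # once it misses cache and prefetch.
    ios_per_read = -(-app_block_kb // backend_block_kb)  # ceiling division
    disk_iops = app_iops * ios_per_read
    return disk_iops, -(-disk_iops // IOPS_PER_DRIVE)

# Hypothetical MS SQL workload: 20,000 x 64k reads/sec after cache.
for backend_kb, label in [(4, "4k fixed (WAFL-style)"), (64, "64k variable (ZFS-style)")]:
    disk_iops, drives = drives_needed(20_000, 64, backend_kb)
    print(f"{label}: {disk_iops:,} disk IOPS -> ~{drives:,} drives")
```

With these assumed inputs, the 4k back end needs 320,000 disk IOPS (~1,600 drives) while the 64k back end needs 20,000 (~100 drives), which is the 16:1 ratio the text describes.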

Here is an example with 128k blocks.  Notice the number of drives on the Netapp!

Here is an example with 64k blocks

You can easily see that the Oracle ZS3 can do dramatically more work with dramatically fewer drives.  This doesn't even take into account that the ONTAP system will likely run out of CPU well before you get to these drive counts, so you would be buying many more controllers.  So with all that said, let's look at the ZS3 and why you should consider it for any workload you're running on Netapp today.

ZS3 holds the World Record Price/Performance in the SPC-2 benchmark
ZS3-2 is #1 in Price-Performance: $12.08 per SPC-2 MBPS
ZS3-2 is #3 in Overall Performance: 16,212 MBPS

Note: The number one overall spot in the world is held by an AFA at 33,477 MBPS, but at a Price-Performance of $29.79.  A customer could purchase 2 x ZS3-2 systems from the benchmark, get roughly the same aggregate performance, and walk away with about $600,000 in their pocket.
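The $600,000 figure can be sanity-checked from the published numbers, since SPC-2 price-performance is total tested system price divided by aggregate MBPS. A rough sketch, rounding aside:

```python
# Back out approximate total prices from the SPC-2 figures above:
# price-performance ($/MBPS) x MBPS ~= total tested system price.
zs3_mbps, zs3_dollars_per_mbps = 16_212, 12.08   # ZS3-2 result
afa_mbps, afa_dollars_per_mbps = 33_477, 29.79   # #1 AFA result

zs3_price = zs3_mbps * zs3_dollars_per_mbps      # ~$196k per ZS3-2
afa_price = afa_mbps * afa_dollars_per_mbps      # ~$997k for the AFA
savings = afa_price - 2 * zs3_price              # two ZS3-2s vs one AFA
print(f"2 x ZS3-2 = {2 * zs3_mbps:,} MBPS, saving ~${savings:,.0f}")
```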


I've never responded to a blog in my life, but this time I feel compelled to do so.

Your statement that a 64k write requires 16 disk writes to 16 different locations is not true. The 4k block size is a minimum physical block size. Block operations in larger sizes will result in larger IOs, just like on any storage array.

The remainder of your posting appears to be based on that false premise.

It would take more space than just a blog comment to properly explain how WAFL works, but irrespective of the details, I have Oracle AWR reports from real customers showing multiple GB/sec of sequential IO using NetApp storage systems, and certainly not with 5000 drives. Nobody would buy such a system.

Note: I am a NetApp employee, but my comments are my own.

Posted by Jeffrey Steiner on August 31, 2014 at 08:09 AM MDT #

Hi Jeff,
Yes, I know your writes get queued up in NVRAM and then sequentially written to disk. They still end up as essentially 4k chunks of data on each disk, each of which will require a separate disk IO to READ. If you read the rest of the post, it's all focused on READs and NOT writes… The weakness of your limited NVRAM write feature would take a whole post of its own, where we would discuss your NVRAM pools filling up faster than they can empty and creating Consistency Point problems (aka Back to Back CPs): https://communities.netapp.com/thread/14228

For this post and comments, please focus on the LARGE BLOCK READs. Please tell me how you READ 64k in a single disk IO like ZFS does?

Posted by Darius Zanganeh on September 03, 2014 at 09:21 AM MDT #

One other comment. Your Oracle DB example is not really relevant because it's not a large block application like my post discusses; it is mostly 8k blocks. So I guess your box would only be degraded at a 2:1 ratio for reads. Probably most of that could be masked with cache and PAM cards, but on large READ-heavy DW workloads it would be a different story, and I think the ZS3 would smoke any Netapp FAS in a bakeoff. Even more so if you start talking about HCC and OISP.

Posted by Darius Zanganeh on September 03, 2014 at 09:32 AM MDT #

You need to learn how WAFL works. It's not what you're describing.

An Oracle and NetApp customer


Posted by guest on September 04, 2014 at 04:04 PM MDT #

Hello Irwin,
First of all, thank you for your business.

RE: WAFL, please let me know or show me something that shows my data is incorrect on LARGE BLOCK READs that are NOT coming from cache or prefetch etc., and I will happily update my post.

I want to know how you do large block 64k+ reads from disk when clearly all the data is out on disk in 4k chunks.


Posted by Darius Zanganeh on September 04, 2014 at 05:20 PM MDT #

Sorry about the delay following up on my earlier comment. It wasn't going to be productive to try to explain myself in the comment fields of this blog, so I went ahead and wrote up a more comprehensive response over on http://recoverymonkey.org/

Again, the statement that a 64k write results in 16 different 4k writes is factually false. Leave that uncorrected if you wish. I would hope potential customers would realize the claims here are obviously false, but if a customer asks about it, I'll direct them to my response. That should resolve their questions more than adequately.

I also responded to the questions about SPC-2 benchmarks and the Engenio purchase.

Posted by Jeffrey Steiner on September 09, 2014 at 11:33 PM MDT #

Hi Jeff,
I posted a reply on your blog and also updated my blog. Here is what I posted on your blog:

Hey Guys,
Thanks for the conversation. My blog clearly says the example is about LARGE READs, not writes.

WAFL clearly writes the data down in 4k chunks, even if it does so contiguously, which I have now acknowledged on my blog. If my 4k block claim is plainly FALSE, then why do your own white papers, written by Dave H., explain WAFL that way, with 4k data blocks and lots of inodes...

Yes, you mask the 4k writes from the application via NVRAM, and you try to mask the 4k reads via read-ahead, which EVERY storage array has, and I clearly said that in my blog post. You must admit that they are still 4k writes, just written contiguously...

The point of my post, which you clearly missed, is that Netapp WAFL is incredibly inefficient when it comes to large block sequential reads. Yes, some of the time the customer may be completely happy with the system, because the workload is small enough and the read-ahead of the 4k blocks on the WAFL backend works well enough to mask any degradation in performance.

The point of the SPC-2, which Netapp can certainly afford (give me a break, they just posted new SPC-1 and SPECSFS benchmarks), is that you can compare two systems running the EXACT same throughput workload. It allows you to compare the efficiency of storage subsystems: how much bang for your buck you get with vendor A vs. vendor B.

Regarding your Oracle AWR report:
You are looking at a report from the client, not the storage. Of course Netapp is not going to give the DB a bunch of 4k blocks. What the AWR does NOT show is how many individual DISK IOPS the WAFL system did to get this data. Only the filer could tell you that. Or can it? With DTrace on the ZS3 we can show you individual DISK IOPS broken down by size, both real-time and historical. I have no idea if ONTAP can do that as easily as DTrace.

I doubt your customer will let us run a bakeoff between the ZS3 and your current Netapp model. So, failing that, we can look at benchmarks. We could look at the SPC-1 benchmark, which is the closest public and OPEN benchmark to an Oracle DB workload. The SPC states the following: "SPC-1 consists of a single workload designed to demonstrate the performance of a storage subsystem while performing the typical functions of business critical applications. Those applications are characterized by predominately random I/O operations and require both queries as well as update operations. Examples of those types of applications include OLTP, database operations, and mail server implementations."

So let’s compare the efficiency of a nice new Netapp 8040 to a 3-year-old Oracle 7420.


The Netapp 8040 did roughly 86,000 IOPS at $5.76 per IOP.
The Oracle 7420 did roughly 137,000 IOPS at $2.99 per IOP, and that was almost 3 years ago.

So against 3-year-old Oracle ZFS technology you still get almost 2x the performance for almost half the cost per IOP. Which system is more efficient? And stay tuned, because eventually Oracle ZFS engineering will release a new SPC-1 result that will make the 7420 seem as slow and expensive as the Netapp 8040.
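To make the comparison explicit, here is a small sketch multiplying the quoted SPC-1 numbers back out (SPC-1 price-performance is total tested price per SPC-1 IOPS):

```python
# Total tested price ~= IOPS x $/IOP, from the SPC-1 figures quoted above.
netapp_iops, netapp_per_iop = 86_000, 5.76     # Netapp 8040
oracle_iops, oracle_per_iop = 137_000, 2.99    # Oracle 7420 (3 yrs older)

netapp_total = netapp_iops * netapp_per_iop    # ~$495k
oracle_total = oracle_iops * oracle_per_iop    # ~$410k
print(f"8040: ${netapp_total:,.0f} total, 7420: ${oracle_total:,.0f} total")
print(f"7420 delivers {oracle_iops / netapp_iops:.2f}x the IOPS "
      f"at {oracle_per_iop / netapp_per_iop:.0%} of the $/IOP")
```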

I could talk about the actual benefits of running the Oracle DB on Oracle ZS3 with HCC (Hybrid Columnar Compression) and OISP (Oracle Intelligent Storage Protocol), plus many other reasons Oracle DBs run best on Oracle Storage, but that is another topic for another post some other time.

So please, let’s focus on large block reads that cannot be served by read-ahead, and explain how your disk IOs will be larger than 4k each. I promise you that if you put a large block app like SAS or a big Data Warehouse on the system, you will run out of read-ahead and cache performance and at some point be relying on disk. Even the ZS3, with its HUGE caches and WORLD RECORD performance, hits this with large throughput workloads. You’re talking about a couple of measly GB/s. Try pushing 16GB/s of reads that are not in cache through a Netapp FAS.

Darius Zanganeh
Oracle Storage

Posted by Darius Zanganeh on September 10, 2014 at 12:35 PM MDT #

Hi Darius,

Can you explain how the SPC results prove your point that you need more disks in a NetApp for comparable performance? The 7420 was using 280 15K HDDs vs. 192 10K HDDs in the NetApp 8040. So 50% more of a faster HDD resulted in a 60% increase in IOPS. This contradicts what you say in the article, that NetApp needs more disks to achieve similar IOPS...

P.S. It also looks like the 7420 used 4 times as much SSD cache (4TB vs. 1TB)

Posted by guest on September 18, 2014 at 08:34 AM MDT #

Hi Mikhail,
Sorry for the confusion.
1. The SPC-1 benchmark is out of context for this post, which is mainly about large block reads such as SAS/BI/data warehouse workloads. I was simply responding to the Netapp blog post refuting my original post.
2. The 7420 SPC-1 result is almost 3 years old, so you can rest assured that a new benchmark with 10k drives will likely come eventually. What is interesting is that we are still about half the cost of the Netapp price today.
3. If you want to compare apples to apples you should look at the SPECSFS benchmarks, though again, SPECSFS isn’t about throughput like the SPC-2 is. The REAL question you should be asking is: WHY DOESN’T Netapp EVER post an SPC-2 with FAS?

Oracle ZS3-2 with 172 x 10k drives = 210,535 IOPS with 1.12ms response time

Netapp 8020 with 144 x 10k drives = 110,281 IOPS with 1.18ms response time

The Oracle system is likely half the price of the Netapp as well.
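One way to normalize those two results is work per spindle; a quick sketch using the drive counts and IOPS quoted above:

```python
# IOPS per 10k drive for the two results quoted above.
systems = {
    "Oracle ZS3-2": (210_535, 172),   # (IOPS, 10k drive count)
    "Netapp 8020":  (110_281, 144),
}
for name, (iops, drives) in systems.items():
    print(f"{name}: {iops / drives:,.0f} IOPS per drive")
# The ZS3-2 does roughly 1.6x the work per spindle here.
```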


Posted by Darius Zanganeh on September 18, 2014 at 10:13 AM MDT #
