Monday Sep 23, 2013

Using Analytics (DTrace) to Troubleshoot a Performance Problem in Real Time

I was recently on a call with one of the largest financial retirement investment firms in the USA. They were using a very small 7320 ZFS Storage Appliance (ZFSSA) with 24GB of DRAM and 18 x 3TB HDDs. In terms of specifications, this system was nothing compared to the latest shipping systems, the ZS3-2 and ZS3-4, with 512GB and 2TB of DRAM respectively. Nevertheless, I was more than happy to see what help I, and mostly the ZFSSA Analytics (DTrace), could offer.

The Problem: Umm, performance (aka storage IO latency) is less than acceptable...

This customer was (and still is) using this very small ZFSSA as an additional Data Guard target for their production Exadata. In this case it was causing issues and pain. Below you can see a small sample screenshot we got from their Enterprise Manager console.

Discovery: Let's see what is going on right now! It's time to take a little DTrace tour...

The next step was to fire up the browser and take a look at the real-time analytics. Our first stop on the DTrace train is IOPS by protocol. In this case the protocol is NFSv3, but we could just as easily see the same thing for NFSv4, SMB, HTTP, FTP, FC, iSCSI, etc. Quickly we see that this little box with 24GB of DRAM and 18 x 7200RPM HDDs was being pounded! Averaging 13,000 IOPS with 18 drives isn't bad. I should note that this box had zero read SSDs and 2 write SSDs.

Doing a quick little calculation, we see that this little system is doing about 650 IOPS per drive (13,000 IOPS spread across the 20 drives in the box, or over 700 per spindle if you count only the 18 HDDs). Holy Cow SON! Wait, is that even possible? A single 7200RPM drive typically sustains only on the order of 100 random IOPS, so there is something else at play here. Could it be the large ZFSSA cache (a measly 24GB in our case) at work? Hold your horses: IOPS are not everything. In fact, you could argue that they really don't matter at all; what really matters is latency. Latency is how we truly measure performance: how long it takes to do one thing, versus how many things can be done at the same time regardless of how quickly each one gets done. To best understand how much effect DRAM has on latency, see our World Record SPECsfs benchmark information here. Here you can see that, sure enough, some of the read latency is pathetic for just about any application in the world.
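Here is the back-of-the-envelope math behind "is that even possible?", as a small sketch. The 100 IOPS per spindle figure is an assumed rule of thumb for a 7200RPM drive doing small random IO, not something measured on this appliance; everything else comes straight from the Analytics numbers above.

```python
# Sanity check on the IOPS numbers from the NFSv3 Analytics screen.
# ASSUMPTION: HDD_RULE_OF_THUMB_IOPS is a generic planning figure for a
# 7200RPM drive doing small random IO, not a measurement from this system.
TOTAL_IOPS = 13_000
HDD_COUNT = 18
WRITE_SSD_COUNT = 2
HDD_RULE_OF_THUMB_IOPS = 100


def iops_per_device(total_iops: float, devices: int) -> float:
    """Average IOPS per device if the load were spread evenly across them."""
    if devices <= 0:
        raise ValueError("devices must be positive")
    return total_iops / devices


per_hdd = iops_per_device(TOTAL_IOPS, HDD_COUNT)
per_device = iops_per_device(TOTAL_IOPS, HDD_COUNT + WRITE_SSD_COUNT)
spindle_ceiling = HDD_COUNT * HDD_RULE_OF_THUMB_IOPS

print(f"per HDD only:           {per_hdd:.0f} IOPS")
print(f"per device (HDD + SSD): {per_device:.0f} IOPS")
print(f"18 spindles at ~100 IOPS each: about {spindle_ceiling} IOPS")
```

The gap between roughly 1,800 IOPS of raw spindle capability and the 13,000 IOPS actually observed is the hint that the disks alone are not doing all the work.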

There are many ways to solve this problem. The average SAN/NAS vendor would tell you to simply add more disks. With DTrace we can get more granular, ask many other questions, and see whether there is a better or more efficient way to solve this problem.

This leads us to our next stop on the DTrace discovery train: ARC (Adaptive Replacement Cache) accesses. Here we quickly find that, even with our lackluster read latency, we have an amazing read cache hit ratio: roughly 60% of our IO is coming from our 24GB of DRAM. On this particular system the DRAM is upgradable to 144GB. Do ya think that would make a small dent in those 6,033 data misses below?
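To put the hit ratio and the miss count side by side, here is a minimal sketch of the arithmetic. It assumes the ~60% hit ratio and the 6,033 data misses describe the same sampling interval, so the implied hit count is derived, not read off the screen.

```python
# Relating the ARC hit ratio to the miss count on the ARC accesses screen.
# ASSUMPTION: the ~60% hit ratio and the 6,033 data misses describe the same
# sampling interval; the implied hit count is derived, not read off the screen.
MISSES = 6033
OBSERVED_HIT_RATIO = 0.60
DRAM_NOW_GB = 24
DRAM_MAX_GB = 144


def hit_ratio(hits: float, misses: float) -> float:
    """Fraction of ARC accesses satisfied from DRAM."""
    total = hits + misses
    return hits / total if total else 0.0


def implied_hits(misses: float, ratio: float) -> float:
    """Solve hits / (hits + misses) == ratio for hits."""
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must be in [0, 1)")
    return misses * ratio / (1.0 - ratio)


hits = implied_hits(MISSES, OBSERVED_HIT_RATIO)
upgrade_factor = DRAM_MAX_GB / DRAM_NOW_GB

print(f"implied ARC hits: about {hits:,.0f}")
print(f"check ratio: {hit_ratio(hits, MISSES):.2f}")
print(f"DRAM upgrade: {upgrade_factor:.0f}x")
```

Going from 24GB to 144GB is the 6x DRAM upgrade that comes up again in the recommendation.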

This leads nicely into the next stop on the DTrace train, which is to ask DTrace how many of those 6,033 data misses would have been eligible to be read from L2ARC (the read SSDs in the Hybrid Storage Pool). We quickly noticed that they would indeed have made a huge difference: sometimes 100% of the misses were eligible. This means that after missing the soon-to-be-upgraded 6x DRAM-based cache, the rest of this workload's read IOs would likely be served from high-performance MLC flash SSDs right in the controller itself.
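As a rough what-if, the sketch below estimates how many of those misses would still fall through to the HDDs for a few eligibility fractions. It optimistically assumes every eligible miss really is served from a warm L2ARC that is large enough to hold those blocks; the 50% and 90% fractions are hypothetical, and only the "sometimes 100%" case comes from the observation above.

```python
# What-if: reads left for the HDDs once an L2ARC is added.
# ASSUMPTION: every L2ARC-eligible miss is actually served from a warm L2ARC
# that is large enough; real results depend on warm-up time and SSD capacity.
ARC_DATA_MISSES = 6033


def disk_reads_after_l2arc(arc_misses: float, eligible_fraction: float) -> float:
    """ARC misses that would still fall through to disk."""
    if not 0.0 <= eligible_fraction <= 1.0:
        raise ValueError("eligible_fraction must be between 0 and 1")
    return arc_misses * (1.0 - eligible_fraction)


# Hypothetical eligibility fractions; only 100% was actually observed.
for fraction in (0.5, 0.9, 1.0):
    remaining = disk_reads_after_l2arc(ARC_DATA_MISSES, fraction)
    print(f"{fraction:.0%} eligible -> about {remaining:,.0f} reads left for the HDDs")
```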

Conclusion: Analytics on the ZFSSA are amazing, the Hybrid Storage Pool of the ZFSSA is amazing, and the large DRAM-based cache on the ZFSSA is very amazing...

At this point I recommended that they take a two-phase approach to the workload. First, upgrade the DRAM cache 6x (from 24GB to 144GB) and add 2 x L2ARC read SSDs. After that, evaluate whether they still need to add more disk.

Extra Credit Stop: One last stop I made was to look at their NFS share settings and see whether they had compression turned on, as I recommended in a previous post. They did not have it enabled, and CPU utilization was very low, at less than 10% on average. I explained to the customer how they would benefit even now, without any upgrades, by enabling compression, and I asked if I could enable it that very second, at 12pm during a busy work day. They trusted me, so we enabled it on the fly, and then for giggles I decided to see whether it made a difference on disk IO. Sure enough, it made an immediate impact: disk IO dropped, because every new write was now taking less disk space and less IO to move through the subsystem, since the ZFSSA compresses data in memory in real time. Remember that this system was a Data Guard target, so it was constantly being written to. You can clearly see in the image below how compression lowered the amount of IO on the actual drives. Try doing that with your average storage array's compression.
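The reason compression lowers disk IO is simply that fewer bytes reach the drives for each write. The sketch below illustrates that with Python's zlib, which implements DEFLATE, the same algorithm family behind ZFS's gzip compression levels; it is not the appliance's code path. The sample payload is a made-up, highly repetitive text stream, so it overstates what real redo data would achieve, and the random payload shows the other extreme.

```python
import os
import zlib

# Illustration only: Python's zlib (DEFLATE) stands in for appliance compression.
# ASSUMPTION: the sample payload is a made-up, highly repetitive text stream;
# real redo data will compress differently, and random data not at all.


def write_sizes(payload: bytes, level: int = 6) -> tuple[int, int]:
    """Return (bytes before compression, bytes after compression)."""
    return len(payload), len(zlib.compress(payload, level))


sample = b"INSERT INTO trades VALUES (42, 'ORCL', 100.00);\n" * 2000
raw, packed = write_sizes(sample)
print(f"repetitive: raw={raw:,} bytes, compressed={packed:,} bytes, "
      f"ratio={raw / packed:.1f}x")

# Random bytes are effectively incompressible, so they save nothing on disk.
noise = os.urandom(len(sample))
noise_raw, noise_packed = write_sizes(noise)
print(f"random:     raw={noise_raw:,} bytes, compressed={noise_packed:,} bytes")
```

With CPU sitting under 10%, trading a little CPU for fewer bytes written is the bet that paid off here; on already-compressed data the same trade buys nothing.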

Special thanks go out to my teammates Michael Dautle and Hugh Shannon, who helped with this adventure and with capturing the screenshots.

Tuesday Sep 10, 2013

ZFS Storage Appliance Benchmarks Destroy NetApp, EMC, Hitachi, HP, etc.

Today Oracle released two new storage products and also posted two World Record Benchmarks!

First, Oracle posted a new SPC-2 throughput benchmark result that is faster than anything else posted on the planet! The benchmark came in at 17,244 MBPS. What is probably even more amazing than this result is the cost of the system that accomplished it. Oracle has by far the lowest cost per MBPS compared to our major competitors, coming in at $22.53. IBM, by contrast, comes in at $131.21. I should also note that Oracle accomplished this with less hardware than IBM: the ZS3-4 entry uses 384 drives, while the IBM DS8700 uses 480 drives. When it comes to throughput applications, the Oracle ZS3-4 beats every competitor in every category of the SPC-2 benchmark. You can read the details here yourself.
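The price-performance figures above are just total system price divided by SPC-2 MBPS. This sketch runs the quoted numbers through that formula. The Oracle system price it prints is back-calculated from the rounded $22.53 figure, so treat it as approximate; the exact price lives in the published disclosure.

```python
# Price-performance arithmetic for the SPC-2 figures quoted above.
# SPC-2 price-performance = total system price / SPC-2 MBPS.
ORACLE_MBPS = 17_244
ORACLE_USD_PER_MBPS = 22.53
IBM_USD_PER_MBPS = 131.21
ORACLE_DRIVES = 384
IBM_DRIVES = 480


def usd_per_mbps(total_price_usd: float, mbps: float) -> float:
    """Dollars paid per MB/s of sustained throughput."""
    return total_price_usd / mbps


# Back-calculated from the rounded $/MBPS figure, so only approximate.
approx_oracle_price = ORACLE_USD_PER_MBPS * ORACLE_MBPS
cost_advantage = IBM_USD_PER_MBPS / ORACLE_USD_PER_MBPS
mbps_per_drive = ORACLE_MBPS / ORACLE_DRIVES
drive_savings = 1 - ORACLE_DRIVES / IBM_DRIVES

print(f"approx. Oracle system price: ${approx_oracle_price:,.0f}")
print(f"IBM pays {cost_advantage:.1f}x more per MBPS")
print(f"ZS3-4 throughput per drive: {mbps_per_drive:.1f} MBPS")
print(f"ZS3-4 uses {drive_savings:.0%} fewer drives than the DS8700")
```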

The second benchmark posted was SPECsfs2008. For background: SPECsfs2008 is the latest version of the Standard Performance Evaluation Corporation benchmark suite measuring file server throughput and response time. Oracle broke the world record for response time, coming in at 0.70ms. Oracle did not break the record for most operations per second, but in many ways those comparisons are sort of silly because you would be comparing apples and oranges. The Oracle benchmark is based on a standard 2-node cluster with some disk and flash behind it. The vendors with higher OPS/sec numbers mostly used enormous or all-flash configurations, which are irrelevant for all but a few niche workloads. For the average NAS user, the ZS3-4 configuration used for this benchmark is perfectly in line with what most customers purchase to run things like Oracle Database, VMware, MS SQL, etc.

The closest comparable 2-node cluster is the recent Hitachi HNAS 4100, which came in at 293,128 OPS/sec with a latency of 1.67ms. Compare that to the 2-node ZS3-4 entry with 450,702 OPS/sec and 0.70ms latency. The ZS3-4's latency is less than half of the Hitachi's, and it still blows it away in OPS/sec. Interestingly, they both have very close to the same number of drives. You can read the actual results here. With the new ZFS Storage Appliance hardware/software combo, Oracle can clearly see the competition in the rearview mirror when it comes to performance. Many analysts have also recently commented on the Oracle ZFS Storage lineup.
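A quick ratio check on the two 2-node SPECsfs2008 entries, using only the numbers quoted above. SPECsfs2008 reports latency as an overall response time averaged across its load points rather than measured at peak throughput, so this sketch only compares ratios and does not try to combine the two metrics.

```python
# Ratio comparison of the two 2-node SPECsfs2008 entries quoted above.
from dataclasses import dataclass


@dataclass(frozen=True)
class SfsResult:
    name: str
    ops_per_sec: int
    overall_response_ms: float


zs3_4 = SfsResult("Oracle ZS3-4 (2-node)", 450_702, 0.70)
hnas_4100 = SfsResult("Hitachi HNAS 4100 (2-node)", 293_128, 1.67)

# How many times higher the Hitachi response time is.
latency_ratio = hnas_4100.overall_response_ms / zs3_4.overall_response_ms
# How many times more operations per second the ZS3-4 delivers.
throughput_ratio = zs3_4.ops_per_sec / hnas_4100.ops_per_sec

print(f"{hnas_4100.name} response time is {latency_ratio:.2f}x that of {zs3_4.name}")
print(f"{zs3_4.name} delivers {throughput_ratio:.2f}x the OPS/sec")
```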

