Monday Jul 15, 2013

ZFS Storage Appliance Compression vs Netapp, EMC and IBM in a Real Production Environment

Dear Oracle ZFSSA Customer: please enable LZJB compression, by default, on every file system and LUN you have
I have been involved in many large Proofs of Concept during my time here at Oracle, and I have come to realize that the ZFS Storage Appliance has a compelling and genuinely important feature that every customer and potential customer should review and plan on using: inline data compression for just about any storage need, including production database storage.  Many storage vendors list and even tout their compression technologies, but rarely do they tell you to turn compression on for production storage, and never would they tell you to turn it on for every file system and LUN.  This is where the ZFS Storage Appliance is somewhat unique.  Real-world experience across a wide range of production systems has shown that Oracle storage customers should run LZJB compression unless there is a compelling reason not to, such as storage dedicated to incompressible data.  Even with mixed data, where some of it does not compress, the benefit of LZJB is still strong enough, and system CPU capacity still large enough, to make running compression worthwhile.

Why Compress Everything?
There are many reasons to compress data, such as saving space and money, but with the ZFSSA you may actually gain performance as well.  ZFS compression reduces the amount of data written to the physical disks over the back-end channels.  Finite limits on disk drive bandwidth and channel bandwidth put fundamental caps on application data transfer rates.  With compression, the amount of data written to the drives and channels is smaller than the amount of data written by the application, so spending CPU on compression increases the effective drive and channel bandwidth and opens up a bottleneck that commonly constrains application throughput.
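To put a rough number on that bandwidth argument, here is a quick back-of-the-envelope sketch; the 2x ratio and the 1000 MB/s figure are illustrative assumptions on my part, not ZFSSA measurements:

```python
# Back-of-the-envelope: how inline compression raises effective back-end bandwidth.
# The numbers below are illustrative assumptions, not measured ZFSSA figures.

app_write_rate_mb_s = 1000.0   # what the application pushes at the storage
compression_ratio = 2.0        # assumed LZJB ratio for this data set

# With inline compression, only the compressed bytes cross the back-end
# channels and land on the spindles.
backend_write_rate_mb_s = app_write_rate_mb_s / compression_ratio

print(f"{backend_write_rate_mb_s:.0f} MB/s hits the disks for "
      f"{app_write_rate_mb_s:.0f} MB/s of application writes "
      f"({compression_ratio:.1f}x effective drive/channel bandwidth)")
```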

What does this cost in terms of performance and is there a license fee?
In terms of performance it costs very, very little.  The currently shipping ZFSSA 7420, for example, has either 32 or 40 cores of Intel Westmere horsepower under the covers, coupled with up to 1TB of DRAM and a Solaris kernel that is exceptional at doing many things at once and using every core it can.  Most of our customers have CPU cores and MHz sitting around waiting for something to do.  Turning on LZJB compression by default turns out to be a win-win: you spend a few cores, gain performance, and save space.  There is no license fee or cost to use ZFSSA compression; you simply turn it on where you want to use it, and it is granular enough to enable at the file system or LUN level.

There are four different built-in levels of compression to choose from with the ZFSSA: LZJB, GZIP-2, GZIP (GZIP-6) and GZIP-9.  A customer could easily create four file systems, each with a different compression setting, and copy the same test data to each share at different times.  You could then correlate the CPU usage (using DTrace Analytics) against the compression ratio and make an educated decision on which level of compression to use for a particular data type.  I tell my customers to, at a minimum, start with LZJB pretty much everywhere.  Below is a real-world example I pulled from one of my customers.  Here you see a ZFS pool with multiple Oracle Databases running on it.  They have LZJB turned on everywhere and are seeing a very nice 2.79x compression ratio.
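If you want a feel for the ratio-versus-CPU trade-off before running the test on the appliance itself, here is a rough stand-in sketch using Python's zlib; LZJB is not available in the standard library, so zlib levels 1/2/6/9 are only an analogy for the appliance's LZJB and GZIP settings, and the sample data is made up:

```python
# Rough stand-in for the compression-level experiment described above.
# zlib levels 1/2/6/9 are only an analogy for LZJB and the GZIP-2/6/9
# settings on the appliance; LZJB itself is not in the Python stdlib.
import time
import zlib

# Made-up, fairly repetitive sample data (~250 KB); real results depend
# entirely on your own data.
data = b"Some repetitive sample data, like rows in a database block. " * 4096

for level in (1, 2, 6, 9):
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    ratio = len(data) / len(compressed)
    print(f"zlib level {level}: {ratio:5.2f}x ratio, {elapsed * 1000:6.2f} ms CPU")
```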

The Other Guys' Compression Story:
So at this point you may be wondering what is so unique about Oracle ZFSSA compression versus, say, Netapp Data Compression, EMC VNX compression, or IBM's v7000/SVC compression.  The basic issue is that pretty much all the other big players have trouble scaling across CPUs, threads and processes, so turning on compression in those environments can create a CPU bottleneck pretty fast.  Let's take a minute to examine each.

Netapp Data Compression:
Netapp's own whitepapers are riddled with warnings about using compression in production environments (especially inline compression for Oracle OLTP).  Most notably, they say that compression can chew up a lot of the meager CPU/thread resources available.  I tried to find a detailed document that explains how Netapp uses its cores, and it proved next to impossible to find, even for Google!  I did find a few blogs and Netapp forums where people mentioned the Kahuna Domain, and that compression is part of this (CPU) domain and shared with many other services.  Maybe someone else can share how all the cores on a Netapp box are used?  It appears that if a Netapp box is over 50% CPU utilized, you could get into trouble turning on compression.

"On workloads such as file services, systems with less than 50% CPU utilization have shown an increased CPU usage of ~20% for datasets that were compressible. For systems with more than 50% CPU utilization, the impact may be more significant."

"When data is read from a compressed volume, the impact on the read performance varies depending on the access patterns, the amount of compression savings on disk, and how busy the system resources are (CPU and disk). In a sample test with a 50% CPU load on the system, read throughput from a dataset with 50% compressibility showed decreased throughput of 25%. On a typical system the impact could be higher because of the additional load on the system. Typically the most impact is seen on small random reads of highly compressible data and on a system that is more than 50% CPU busy. Impact on performance will vary and should be tested before implementing in production." Reference: http://www.netapp.com/us/system/pdf-reader.aspx?m=tr-3958.pdf&cc=us

The following conclusions can be drawn from industry analyst reviews as well as Netapp's own documentation: in the best possible scenario, NetApp compression occurs asynchronously for active, transactional workloads, meaning that the data is initially written in uncompressed form. Capacity must therefore always be over-provisioned and later reclaimed, most likely with some degree of fragmentation, as compared to the synchronous, inline compression architecture of the ZFS Storage Appliance. As a result, much of the potential cost-saving value of compression cannot be realized, because the data must always initially reside in an uncompressed state on the media.

EMC VNX Compression:
EMC is very similar to Netapp in that they give lots of warnings and say that compression is only recommended for static data.  EMC also has very small limits on the number of compressed LUNs you can have per controller.  The document that details this is a few years old, originally written for the Clariion CX4, but apparently still relevant for the newer controllers since it is linked directly from the newer VNX compression whitepaper.

"Compression’s strength is improved capacity utilization. Therefore, compression is not recommended for active database or messaging systems, but it can successfully be applied to more static datasets like archives, clones of database, or messaging-system volumes." Reference: http://www.emc.com/collateral/hardware/white-papers/h8045-data-compression-wp.pdf http://www.emc.com/collateral/hardware/white-papers/h8198-vnx-deduplication-compression-wp.pdf
"VNX file deduplication and compression should be used exclusively for file data as more granular control is available for file data rather than block, so the system can identify inactive data to process versus active data.”
“Block data compression is intended for relatively inactive data that requires the high availability of the VNX system. Consider static data repositories or copies of active data sets that users want to keep on highly available storage."
Reference: http://www.emc.com/collateral/hardware/white-papers/h8198-vnx-deduplication-compression-wp.pdf

In conclusion, like Netapp, EMC's compression technology is best suited for infrequently accessed data, and it similarly requires post-process functions to compress data; the use cases for any data that will be accessed more often than an archive are highly limited.  Because data is initially written in uncompressed form, compression must occur asynchronously by means of a time-consuming post-process.  In fact, compression post-processing for newly written data often must run for periods measured in days before the data is stored in compressed format.  As a result, much of the space-saving value of compression is moot, because the data must first reside in an uncompressed state on the media, meaning capacity must always be over-provisioned relative to implementations that feature inline compression, such as the ZFS Storage Appliance.

IBM Real-Time Compression:

IBM is slightly different from the above in that they actually claim to support running their compression for production databases.  But as you dig into the bowels of the Redbook, you quickly find there are some severe limitations and concerns to be aware of.  First off, the v7000 has only 4 cores and 8GB of cache memory, pretty small versus the 32-40 cores and 1TB of cache memory in the Oracle ZFSSA 7420.  Apparently, when you enable compression on even a single volume/LUN, 3 of the 4 available cores and 2GB of the 8GB of memory become dedicated solely to compression operations.  So with the IBM v7000, if you turn on compression you can kiss away 75% of your storage CPU resources on the particular node/controller that LUN lives on.  IBM affirms this by saying that if you have more than 25% CPU used before compression is enabled, then you should not turn it on.  They also have a hard limit of 200 compressed volumes within a 2-node I/O group, and they say not to mix compressed and uncompressed LUNs/volumes in the same storage pool.  Probably one of the least desirable aspects of the IBM compression is cost $$$.  IBM is the only vendor here that charges extra for compression: with SVC they charge per TB, and with the v7000 they charge per enclosure, so either way, every time you add disk to an I/O Group/controller pair that has compression enabled you are going to be hit with more software licensing as well.  Another major SVC and v7000 viability concern arises for customers that would like to exploit IBM's Easy Tier auto-tiering technology in addition to Real-Time Compression, because RTC and Easy Tier are currently mutually exclusive: Easy Tier is automatically disabled for compressed volumes and cannot be enabled.

"An I/O Group that is servicing at least one compressed volume dedicates certain processor and memory resources for exclusive use by the compression engine."
Reference: http://www.redbooks.ibm.com/redpapers/pdfs/redp4859.pdf

Special Thanks

I want to especially thank Jeff Wright (Oracle ZFSSA Product Management) and Mark Kremkus (fellow Oracle Storage Sales Consultant), who both contributed to this entry.

Tuesday Jul 10, 2012

Oracle ZFSSA Hybrid Storage Pool Demo

The ZFS Hybrid Storage Pool (HSP) has been around since the ZFSSA first launched.  It is one of the main contributors to the high performance we see on the Oracle ZFSSA, both in benchmarks and in many production environments.  Below is a short video I made to show, at a high level, just how impactful the HSP is on storage performance.  We squeeze a ton of performance out of our drives with our unique use of cache, write-optimized SSD and read-optimized SSD.  Many have written and blogged about this technology; here it is in action.

Demo of the Oracle ZFSSA Hybrid Storage Pool and how it speeds up workloads.

Thursday Feb 23, 2012

Oracle Posts SPEC SFS Benchmark and Crushes Netapp Comparables

Oracle has fired another shot across Netapp's bow.  In Oct 2011 Oracle posted impressive SPC-1 benchmarks that were 2x faster than Netapp at half the cost.  Now customers looking for proof of the ZFSSA's superior performance and value have another benchmark to compare.

Why are we posting now?
For a long time the old Sun engineering regime refused to post spec.org SFS results, citing problems with the benchmark, which are real.  However, some customers refused to even look at the Oracle ZFS Storage Appliance because of the lack of benchmark postings, and competitors like Netapp and EMC would use that as some sort of proof that we must perform poorly.

But don't Netapp and EMC have other, much larger configs that are much faster?
I should point out that Netapp and EMC both have much larger SPEC SFS postings, but those are ridiculous configurations that almost no customer would run, let alone be willing to pay for.  Most customers that purchase NAS to run NFS buy many smaller 2-node HA clusters rather than a 20-million-dollar, 24-node NAS cluster.  I tried to include EMC in this comparison but soon realized it was pointless: their closest posting used a Celerra gateway in front of a 4-engine VMAX, and the list price for that would be off the charts, so I did not consider it valuable for this comparison.  My goal was to get a good view of comparable systems that customers might consider for a performance-oriented NAS box running NFS.

Price Matters!
One of the major downsides of the SPEC SFS results is that, unlike SPC, they don't force vendors to post prices so customers can easily compare competitors.  Obviously every customer wants great performance, but price is always a major factor as well.  Therefore I have included the list prices as best I could figure them; for the Netapp prices I used the following price sheet, which I easily found on Google.  When comparing performance-oriented storage, customers should be comparing $/ops rather than $/GB.

Let's look at the results at a high level

Storage System | SPEC SFS ops/sec (higher is better) | Peak Response Time, ms (lower is better) | Overall Response Time, ms (lower is better) | # of Disks | Exported TB | Estimated List Price | $/OPS
Oracle 7320    | 134,140 | 2.5 | 1.51 | 136 | 36.96  | $184,840   | $1.38
Netapp 3270    | 101,183 | 4.3 | 1.66 | 360 | 110.08 | $1,089,785 | $10.77
Netapp 3160    |  60,507 | 3.5 | 1.58 |  56 | 10.34  | $258,043   | $4.26
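For anyone who wants to check my math, the $/OPS column is simply the estimated list price divided by the ops/sec result; a quick sketch using the numbers from the table above:

```python
# The $/OPS column is just estimated list price divided by SPEC SFS ops/sec.
systems = {
    "Oracle 7320": (134140, 184840),
    "Netapp 3270": (101183, 1089785),
    "Netapp 3160": (60507, 258043),
}

for name, (ops_per_sec, list_price) in systems.items():
    print(f"{name}: ${list_price / ops_per_sec:.2f} per op/sec")
```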

Umm, why is the ZFSSA so much more efficient?
In a nutshell, it's superior engineering and the use of technologies such as the Hybrid Storage Pool (HSP) in the ZFS Storage Appliance.  The HSP extends flash technology not only to read cache but also to write cache.

The 3160 result includes the use of Netapp PAM read flash cards.  I am not sure why, a year later, they didn't include them in the 3270 test if they improve performance so much.  Maybe they will post another Netapp result with them now?

What now?
Now Oracle ZFSSA engineering has posted results that again blow away Netapp and prove our engineering is outstanding.  It makes sense that we would have an edge when you consider that the NFS protocol itself was invented at Sun.  Netapp has yet to respond with a comparable new SPC-1 benchmark.  I know some Netapp bloggers were waiting for us to post SPEC SFS results, thought we never would, and therefore said our performance must be poor.  Now we have posted impressive results, and there will be more to come, so stay tuned.  As one famous blogger has said, the proof is in the pudding.
