Tuesday Feb 09, 2016

ZFSSA update 2013.1.5.0 is now available

The newest version of the ZFS Storage Appliance code, version 2013.1.5.0, was released yesterday. This is a minor release, as opposed to a micro release, which means it includes a few feature enhancements rather than only bug fixes (which are what the micros contain). Since this release also includes all previous micros, it's a very good idea to check your systems and upgrade to this latest release if you are behind. You can download it from the "Patches" tab of MOS.

Here is the readme: https://updates.oracle.com/Orion/Services/download?type=readme&aru=19853187

Here is the "What's New" section from the readme file: 

LDAP Security Configuration

In the user interface, the security-related parameters for LDAP service configuration have been redesigned to only present supported combinations of authentication methods, credential levels, and SSL/TLS options. The "anonymous" credential level does not use an authorization method. The "self" credential level uses Kerberos encryption and the SASL/GSSAPI authentication method. 

NFS Reserved Ports for Security

NFS clients must use low-numbered (reserved) ports when the new NFS "reserved ports" property is set in conjunction with the AUTH_SYS authentication mode. Some NFS clients, such as Solaris and Linux, use low-numbered TCP ports by default. Other clients, such as Windows, may require configuration.
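To make the rule concrete, here is a minimal Python sketch, assuming "reserved" means a source port below 1024 (the traditional Unix privileged-port range). The function names are hypothetical and not part of any appliance API; the sketch only checks the port when the property is set and the client uses AUTH_SYS, which is how the readme describes it.

```python
# Hypothetical model of the NFS "reserved ports" check described above.
# Assumption: "reserved" means a source port below 1024, the traditional
# Unix privileged-port boundary (IPPORT_RESERVED).
IPPORT_RESERVED = 1024


def is_reserved_port(port: int) -> bool:
    """Return True if the port is in the privileged range 1-1023."""
    return 0 < port < IPPORT_RESERVED


def nfs_request_allowed(source_port: int, auth_flavor: str,
                        reserved_ports_required: bool) -> bool:
    """Decide whether a request passes the reserved-port rule.

    The port is only checked when the property is set AND the client
    uses AUTH_SYS; otherwise any source port is accepted.
    """
    if reserved_ports_required and auth_flavor == "AUTH_SYS":
        return is_reserved_port(source_port)
    return True


if __name__ == "__main__":
    print(nfs_request_allowed(1010, "AUTH_SYS", True))    # low port: allowed
    print(nfs_request_allowed(50123, "AUTH_SYS", True))   # high port: rejected
    print(nfs_request_allowed(50123, "AUTH_SYS", False))  # property off: allowed
```

This is why a Windows client that picks a high source port may need configuration before it can mount a share with the property enabled.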

NFSv4 Numeric String IDs

A new property allows NFSv4 clients to use numeric strings for user and group IDs when used in conjunction with authentication type AUTH_SYS. If you do not set this property, user and group IDs are exchanged in the default form, which is user@domain.
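The difference between the two owner-string forms is easier to see in code. This is a small sketch, not the appliance's implementation: the `USERS` table, the domain `example.com`, and the `numeric_ids` flag (standing in for the new property) are all hypothetical, and the AUTH_SYS condition is left out for brevity.

```python
# Hypothetical helpers showing the two NFSv4 owner-string forms.
# Assumptions: USERS and "example.com" are illustrative only; a real
# server maps IDs through its name service (files, NIS, LDAP).
USERS = {1000: "alice", 1001: "bob"}


def format_owner(uid: int, domain: str, numeric_ids: bool) -> str:
    """Return the owner string that would be sent over the wire."""
    if numeric_ids:
        return str(uid)                  # numeric string form, e.g. "1000"
    return f"{USERS[uid]}@{domain}"      # default user@domain form


def parse_owner(owner: str, domain: str) -> int:
    """Map an owner string back to a numeric UID."""
    if owner.isdigit():
        return int(owner)                # numeric string form
    name, _, owner_domain = owner.partition("@")
    if owner_domain != domain:
        raise ValueError(f"unexpected NFSv4 domain: {owner_domain!r}")
    by_name = {v: k for k, v in USERS.items()}
    return by_name[name]


if __name__ == "__main__":
    print(format_owner(1000, "example.com", numeric_ids=False))  # alice@example.com
    print(format_owner(1000, "example.com", numeric_ids=True))   # 1000
```

The numeric form avoids the need for client and server to agree on an NFSv4 domain and user names, which is the usual source of "nobody" owners on NFSv4 mounts.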

Monday Jan 11, 2016

Disk Scrub - Why and When?

What is a "Disk Scrub" and why should you do it?

To begin with, it's a good idea to do this about once a month. Have you ever done one, ever? Maybe you don't need to. We will talk about what a scrub is, why you would or would not do it, and how you can automate it if you wish. We're talking about the scrub button on the Configuration-->Storage page of your ZFSSA.


ZFS will automatically protect your data from "bit rot", something that can happen to ALL forms of storage. Every time ZFS reads a block, it compares it to its checksum, and if they don't match, it automatically repairs the block from the pool's redundant data. That's great if you read your data a lot. However, what if you write the data, and then don't look at it again for years? Is it protected from bit rot?


That's where a disk scrub comes in. A disk scrub reads all the data in every VDEV in the pool, therefore finding and fixing any and all bit rot errors. If you don't normally read all of the data in the pool, Oracle recommends a disk scrub about every month. This is a very low-priority background process, and I doubt you'll even notice it's happening. Because it's low-priority, it can take anywhere from seconds to months to complete, depending on how much data you have and how busy your ZFSSA is. I can give you some examples. These both came from the Oracle Cloud, which uses many ZFS systems. A ZFSSA with 192 4TB drives, configured as a single RAIDz1 pool, with only 1TB of data currently in it, finished a disk scrub in less than 90 seconds. At the other extreme, an older 7410 system, with only 256GB of DRAM and 192 2TB drives, completely full and running a high-IO, write-intensive workload around the clock, took many months to complete a scrub!
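Here is a toy Python model of what a scrub does on a two-way mirror: read every block, verify each copy against its stored checksum, and rewrite a bad copy from a good one. It is only an illustration of the idea, not how ZFS is implemented; SHA-256 stands in for the pool's checksum algorithm, and the function names are hypothetical.

```python
import hashlib

# Toy model of a scrub on a two-way mirror. NOT the ZFS implementation;
# it only shows why reading every block catches rot in data nobody reads.


def checksum(data: bytes) -> str:
    # SHA-256 stands in for whatever checksum the pool actually uses.
    return hashlib.sha256(data).hexdigest()


def scrub(mirror: list[list[bytes]], checksums: list[str]) -> int:
    """Verify every block on both sides of the mirror; return repair count."""
    repaired = 0
    for i, expected in enumerate(checksums):
        copies = [side[i] for side in mirror]
        good = next((c for c in copies if checksum(c) == expected), None)
        if good is None:
            raise RuntimeError(f"block {i}: no valid copy left, data lost")
        for side in mirror:
            if checksum(side[i]) != expected:
                side[i] = good       # rewrite the rotted copy
                repaired += 1
    return repaired


if __name__ == "__main__":
    blocks = [b"old archive", b"rarely read", b"never read"]
    sums = [checksum(b) for b in blocks]
    mirror = [list(blocks), list(blocks)]
    mirror[1][2] = b"never reaD"     # simulate silent bit rot on one side
    print(scrub(mirror, sums))       # 1
    print(scrub(mirror, sums))       # 0
```

The `RuntimeError` branch is the case a regular scrub is meant to prevent: if rot goes unnoticed on one side long enough for the other copy to rot or fail too, there is nothing left to repair from.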

On his excellent blog, Matt Barnson gives some tips on who should use a scrub, and how often:

Should I do this?

  1. Is the pool formatted with either RAIDZ or Mirror2 configuration? Although these two options offer higher performance than RAIDZ2 or Mirror3, redundancy is lower. (No, I'm not going to talk about Stripe. That should only ever be used on a simulator; I don't even know why it exists on a ZFS appliance.)
  2. Are you unable to guarantee, absolutely 100%, that every byte of data in the pool is read frequently? Note that even databases that the DBAs think of as "very busy" often have blocks of data that go unread for years and are at risk of bit rot. Ask me how I know...
  3. Do you run restore tests of your data less frequently than once per year?
  4. Do you back up every byte of data in your pool less frequently than once per quarter?

If you answer "Yes" to any of the above questions, then you probably want to scrub your pools from time to time to guarantee data consistency.

How often should I do this? 

This question is challenging for Support to answer, because as always the true answer is "It Depends".  So before I offer a general guideline, here are a few tips to help you create an answer more tailored to your use pattern.

  1. What is the expiration of your oldest backup? You should probably scrub your data at least as often as your oldest tapes expire so that you have a known-good restore point.
  2. How often are you experiencing disk failures? While the recruitment of a hot-spare disk invokes a "resilver" -- a targeted scrub of just the VDEV which lost a disk -- you should probably scrub at least as often as you experience disk failures on average in your specific environment.
  3. How often is the oldest piece of data on your disk read? You should scrub occasionally to prevent very old, very stale data from experiencing bit-rot and dying without you knowing it.

If any of your answers to the above are "I don't know", I'll provide a general guideline: you should probably be scrubbing your zpool at least once per month. It's a schedule that works well for most use cases, gives scrubs enough time to complete before starting up again on all but the busiest and most heavily loaded systems, and, even on very large zpools (192+ disks), should usually let a scrub finish in the interval between disk failures.
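The two checklists above can be written down as a small helper. This is a hypothetical sketch of the guideline, not a Support tool: the parameter names are made up, `None` stands for "I don't know", and the third "How often" question (how often your oldest data is read) is qualitative, so it is not modeled.

```python
# Hypothetical helper encoding the two checklists above.
# None means "I don't know" for any numeric answer.
DEFAULT_INTERVAL_DAYS = 30  # the "at least once per month" guideline


def should_scrub(raidz1_or_mirror2: bool, cannot_guarantee_reads: bool,
                 restore_tests_less_than_yearly: bool,
                 full_backup_less_than_quarterly: bool) -> bool:
    """A "yes" to any question means you probably want to scrub."""
    return any([raidz1_or_mirror2, cannot_guarantee_reads,
                restore_tests_less_than_yearly,
                full_backup_less_than_quarterly])


def scrub_interval_days(oldest_backup_expiry_days=None,
                        mean_days_between_disk_failures=None) -> int:
    """Scrub at least as often as the tighter of the two bounds.

    Falls back to the monthly guideline if either answer is unknown.
    """
    answers = [oldest_backup_expiry_days, mean_days_between_disk_failures]
    if any(a is None for a in answers):
        return DEFAULT_INTERVAL_DAYS
    return min(answers)


if __name__ == "__main__":
    if should_scrub(True, True, False, False):
        print(scrub_interval_days(90, 45))   # tighter bound wins
        print(scrub_interval_days(90))       # unknown answer: monthly
```

Taking the minimum reflects the wording "at least as often as": each answer is a ceiling on the interval, so the smallest one is the binding constraint.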

How can I automate this?

There is no easy button to automate this, but you can make a script to do it for you from the CLI. I don't want to put the script commands here and get in trouble with anyone, but you can ask your friendly, neighborhood SE for some help in making one.

In the meantime, just push the button in the BUI. Easy. 

Friday Jan 23, 2015

ZFS Improvements

This is a wonderful article by Roch (rhymes with Spock) that I thought you all may find interesting. Roch's top ten list of how ZFS has improved performance.


Thursday Nov 06, 2014

New Logzilla Drives for your ZFSSA

Yes, the new, larger Logzilla SSD drives for your ZFSSA systems are now out. They are 200GB usable, up from the 73GB usable drives. 

Yes, you will sometimes see them referred to in some marketing literature as 400GB. This is because enterprise SSDs carry extra flash capacity to allow for cell burnout and maintain their five-year lifetime. Make no mistake, they will give you 200GB of actual capacity in the ZFSSA systems.

Yes, they are compatible with the current 73GB version. You can mix and match. The one thing to look out for is in a 'mirrored' log profile. If you mix a new one with an old one in a mirrored log profile, then the new one will size down to 73GB to match it. In a striped profile, it doesn't matter, nor will it matter if you have 2 or more of each.
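Here is a rough calculator for usable log capacity under the rules above. It is a sketch under one stated assumption: in a mirrored profile the appliance pairs like-sized devices when it can, which matches the note that having two or more of each size doesn't matter. The function name is hypothetical, and rejecting an odd device count is a simplification.

```python
# Hypothetical calculator for usable Logzilla capacity.
# Assumption: in a mirrored profile, like-sized devices are paired when
# possible, so we pair largest with largest.


def log_capacity_gb(device_sizes_gb: list[int], profile: str) -> int:
    if profile == "striped":
        return sum(device_sizes_gb)          # every device adds capacity
    if profile == "mirrored":
        if len(device_sizes_gb) % 2:
            # Simplification for this sketch: require complete pairs.
            raise ValueError("mirrored profile needs an even number of devices")
        sizes = sorted(device_sizes_gb, reverse=True)
        # Each mirror pair is only as large as its smaller member.
        return sum(min(a, b) for a, b in zip(sizes[::2], sizes[1::2]))
    raise ValueError(f"unknown profile: {profile}")


if __name__ == "__main__":
    print(log_capacity_gb([200, 73], "mirrored"))           # 73
    print(log_capacity_gb([200, 73], "striped"))            # 273
    print(log_capacity_gb([200, 73, 200, 73], "mirrored"))  # 273
```

The first case is the one to watch for: a single new 200GB device mirrored with an old 73GB device gives you only 73GB.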

One last thing: they are almost twice as fast as the older 73GB version. If you mix them, you will get faster, but not as fast as if you had all 200GB versions. Diminishing returns. Talk to your local SC about whether your Logzilla workload is heavy enough that adding some new ones, or even replacing your old ones, would help your performance. Not every workload needs Logzillas, but there are built-in analytics that can tell us if yours is a good fit.


Sunday Sep 07, 2014

VMWare with the ZFSSA

So we have been saying for years how well the ZFSSA works in a VM environment. We tested VMWare running on the ZFSSA and wrote a white paper on it back at Sun Microsystems, well before we were bought by Oracle. People still assume that now that we are Oracle, we must only work with Oracle's own virtualization products and not with VMWare... I do hope our presence at VMWorld and this blog can help put those fears to rest. The ZFSSA KILLS the VMWare workload, and we fully test and support it.

Check this out...  http://siliconangle.com/blog/2014/09/05/oracles-zfs-storage-zs3-series-boots-16000-vms-in-under-7-mins-outperforms-netapps-fas6000-vmworld/ 

Oracle Claims ZFS ZS3 Storage boots 16,000 VMs in under 7 mins., outperforms NetApp’s FAS6000

Wednesday Sep 03, 2014

Why is my NetApp so slow?

My colleague Darius wrote an excellent blog about the superior performance on the ZFSSA due to larger block sizes. It shows why we out-perform NetApp with workloads such as SAS and SQL.

Check it out here: https://blogs.oracle.com/si/entry/why_your_netapp_is_so

Cloud service with the ZFSSA

Everyone is talking about Clouds. Cloud this, cloud that, cloudy cloud cloud.

What is it? To begin with, there's no such thing. If you store your data "on the cloud" it's still being stored SOMEWHERE by SOMEBODY. It's just that you're not storing it yourself. You are paying someone else to do it. Well, they are storing it on real hardware. Real servers and storage. Then, they are charging you to use their hardware (or maybe giving you the space for free and charging advertisers).

Now it turns out the ZFSSA is an excellent storage device for cloud services. There are many cloud service software products out there. OpenStack is one of them, and it's open source, so that's cool. Icehouse is its newest version, the ninth release. There is an OpenStack plug-in for the ZFSSA.

My colleague, Ronen Kofman, has a new blog showing how this plug-in works with the ZFSSA. Check it out here: https://blogs.oracle.com/ronen/entry/running_openstack_icehouse_with_zfs

You can read more about OpenStack Icehouse here: http://www.openstack.org/software/icehouse/

Tuesday Jul 01, 2014

New ZS3-2 benchmark

Oracle released a new SPC-2 benchmark result today, which you can find on the Storage Performance Council website here: http://www.storageperformance.org/results/benchmark_results_spc2_active

As you can see, the ZS3-2 gave excellent results, with the best price/performance ratio on the entire website, and the third fastest score overall. Does the Kaminario still beat it on speed? Yep it sure does. However, you can buy FIVE Oracle ZS3-2 systems for the same price as the Kaminario.  :)

Storage Performance Council SPC-2 Results (SPC-2 Price-Performance, ASU Capacity GB, Total Price, Data Protection Level, Date Submitted)

Kaminario K2: RAID 10
...: RAID 5
...: RAID 6
SGI InfiniteStorage 5600: RAID 6
HP P9500 XP: RAID 5
Oracle ZS3-4: RAID 10
Oracle ZS3-2: RAID 10

Results found on http://www.storageperformance.org/results/benchmark_results_spc2_active
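For reference, SPC-2 Price-Performance is the total system price divided by the SPC-2 MBPS result, so lower is better. The short sketch below shows that arithmetic and the "how many systems for the same money" comparison. The dollar and MBPS figures are placeholders, NOT the published results, and the function names are hypothetical.

```python
# Minimal sketch of the SPC-2 price-performance arithmetic.
# The example figures are placeholders, NOT published SPC-2 results.


def price_performance(total_price_usd: float, spc2_mbps: float) -> float:
    """Dollars per SPC-2 MBPS (lower is better)."""
    if spc2_mbps <= 0:
        raise ValueError("SPC-2 MBPS must be positive")
    return total_price_usd / spc2_mbps


def units_for_same_budget(budget_usd: float, unit_price_usd: float) -> int:
    """How many whole systems of a given price fit in a budget."""
    return int(budget_usd // unit_price_usd)


if __name__ == "__main__":
    print(price_performance(1_000_000, 20_000))        # 50.0 dollars per MBPS
    print(units_for_same_budget(1_000_000, 200_000))   # 5 systems
```

A system can therefore be slower in absolute terms and still win on price-performance, which is exactly the ZS3-2 versus Kaminario comparison above.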

Wednesday Jun 11, 2014

New expansion for the ZS3-2

If you missed the announcement, the ZS3-2 can now grow to 16 disk trays, up from 8. It can now also support four of any kind of IO card. 

I know, I know, I have not done anything in this blog for a while now. That was not by design. There will be a nice upgrade for the 2013 code (OS8.2) coming soon. When it comes out I will certainly blog about it ASAP.

Thursday Dec 12, 2013

Cluster tricks & tips

Most of us have clustered ZFSSAs, and have been frustrated at one time or another with getting the proper resource to be owned by the proper controller.

I feel your pain, and believe me, I have to deal with it as much or even more than you do. There are, however, some cool things you can do here and it will make your life easier if you fully understand how this screen works. 

First, understand this: you almost never want to push the 'Takeover' button. The 'Takeover' button actually sends a signal to instantly reboot the OTHER controller, in a non-graceful way. More on that below. We have two heads in this picture, and they're both in the "Active" state, as you see here. This means you cannot click the "Failback" button, which is how we move resources to the head you wish to own them. You are only allowed ONE Failback when a head is in the "Ready for Failback" state, as it is when it first comes up. We have already hit Failback on this system, so both heads are now Active. That's it. You're done until one reboots.

Do NOT hit the 'Takeover' button. That button should be labeled "Ungracefully shut down the other controller". Those were just too many words to fit on the button, so they called it Takeover. Sure, since the other head is being instantly rebooted, this head will take over all of the resources while the other head reboots. But this is one of the worst ways to reboot the other head. It's not nice. It does not flush the cache first. It's actually slower than the other way. When and why would you ever hit it? There are a few reasons. Perhaps the other head is in a failed state that does not allow you to log in and shut it down correctly. Perhaps you are just setting the controllers up on day one, you know there's no workload at all, and you really don't care how the other head gets rebooted. If that's the case, then go for it.

Instead, for a clean and faster reboot, log into the controller you want to reboot, and click the power button:

This allows you to reboot it gracefully, flushing the cache first, and it almost always comes up faster than the 'Takeover' way.

Now that it has rebooted, which may take 5-15 minutes, the good controller's cluster screen should show that it's "Ready for Failback". Be certain all of your resources are set to the proper owner, and then hit the "Failback" button to move the resources and change both controllers to the "Active" state. REMEMBER--- You only get to hit the Failback button ONCE!!! So take your time and do all of your config and setup and get the ownership right before you hit it. Otherwise, you will be rebooting one of your controllers again. Not a huge deal, but another 15 minutes of your life, and perhaps a production slowdown for your clients.
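The Failback rules above behave like a tiny state machine, so here is a toy Python model of them. The class and method names are hypothetical and do not correspond to any appliance API; Takeover is deliberately left out, and the surviving head is simplified to stay "Active" while its peer is down.

```python
# Toy model of the cluster Failback rules described above.
# Hypothetical names; not an appliance API. Takeover is not modeled.


class Cluster:
    """Two-head cluster; 'assigned' is the ownership set on the Cluster screen."""

    def __init__(self, assigned: dict[str, str]):
        self.state = {"A": "Active", "B": "Active"}
        self.assigned = dict(assigned)    # resource -> intended owner
        self.serving = dict(assigned)     # resource -> head serving it now

    def graceful_reboot(self, head: str) -> None:
        peer = "B" if head == "A" else "A"
        # While the head is down, its peer serves every resource.
        self.serving = {r: peer for r in self.serving}
        self.state[head] = "Ready for Failback"

    def failback(self) -> None:
        if "Ready for Failback" not in self.state.values():
            raise RuntimeError("Failback is grayed out: both heads are Active")
        # Ownership is applied exactly as it is set at THIS moment.
        self.serving = dict(self.assigned)
        self.state = {"A": "Active", "B": "Active"}


if __name__ == "__main__":
    c = Cluster({"pool-1": "A", "net-1": "B"})
    c.graceful_reboot("B")
    c.failback()          # the one Failback you get
    print(c.serving)
```

The point the model makes is the one in the text: get `assigned` right before calling `failback()`, because a second call raises until a head reboots again.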

Now for a trick. There's nothing I can do to help you with the network resources. If they are on the wrong controller, you may have to reboot one and fix it and do a failback. However, if you have a storage pool on the wrong controller, I may be able to show you something cool.  The best thing to remember and do is this: Create the resource (network or pool) ON the controller you wish to be the owner in the first place!!! Then, it will already be owned by the proper one, and you don't have to do a failback at all. However, what if, for whatever reason, you need to move a pool to the other controller and you MUST NOT reboot a controller in order to move it using the Failback process? In other words, you have an Active-Active setup, the Failback button is grayed out, and it's very important that you change the ownership of a storage pool but you are not allowed to reboot one of the controllers?

Bummer, right? Not so fast, check this out. 

So here I have a system with two pools, Rotation and Bob, both on controller A. The Bob pool is supposed to be on controller B. They are both Active, so I cannot click Failback. I would normally have to reboot head B to fix this. But I don't want to.

So I'm going to unconfigure the Bob pool here on controller A. That's right, unconfigure. This does NOT hurt your data. Your data is safe as long as you do NOT create a new pool in that space. We're not going to create a new pool. We're going to IMPORT the Bob pool on controller B. All of your shares, LUNs, and their properties will be perfectly fine. There is only one hiccup, which we will talk about.

Go to Configuration-->Storage, select the correct pool (Bob), and then click "Unconfig". 
But first, I want you to look carefully at the info below the pie chart here. Note that Bob currently has 2 Readzilla cache drives in it. This is important.

You will get this screen. Take a deep breath and hit apply.

No more Bob. Bob gone. Not really. It's still there and can be imported into another controller. This is how we safely move disk trays to new controllers, anyway. No big deal.

So, now go log into the OTHER controller. Don't do this on the same one or else you'll have to start all over again. 
Here we are on B. DO NOT click the Plus Sign!!!! That will destroy your data!!!!
Click the IMPORT button.

The Import button will go out and scan your disk trays for any valid ZFS pools not already listed. Here, it finds one called "bob". 

Select it and hit "Commit". There, the Bob pool is back. All of its shares and LUNs will be there too. The "Rotation" pool shows Exported because it's owned by the "A" controller, and the Bob pool is owned here on B.

We can go to Configuration-->Cluster and see all is well and Bob Pool is indeed owned by the controller we wanted, and we never had to reboot!

However, we have one big problem... Did you notice that when you imported the Bob pool into controller B, the cache drives did NOT come over?
It now has zero cache drives. What did you expect? The cache drives are the readzillas inside the controller, itself. They can't move over just because you changed the owner.
No problem.
I have 2 extra Readzillas in my B controller that are not being used. So all I have to do is add them to the Bob pool.
Go back to Configuration-->Storage on the B controller. Select the Bob pool and click "ADD". Do NOT click the plus sign. This is different.

I can now add any extra drives to the Bob pool. In this case, I don't have anything I could possibly add other than these two Readzillas inside controller B. So pretty easy.

Once added, I'm all good. I now have the Bob pool, with cache drives, being serviced on controller B with no reboot necessary.
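Here is a toy model of the trick, to show why the cache drives get left behind. It assumes, as the walkthrough describes, that pool data lives in the shared disk trays while Readzilla cache devices live inside a controller and stay with it. All names here are hypothetical; this does not drive the BUI or CLI.

```python
# Toy model of the unconfigure/import trick described above.
# Assumption from the walkthrough: pool data lives in the shared trays,
# but Readzilla cache devices stay inside the controller they sit in.
from dataclasses import dataclass, field


@dataclass
class Controller:
    name: str
    spare_readzillas: list[str] = field(default_factory=list)
    pools: dict[str, list[str]] = field(default_factory=dict)  # pool -> cache devs


def unconfigure(ctrl: Controller, pool: str) -> None:
    cache = ctrl.pools.pop(pool)          # data stays on the trays
    ctrl.spare_readzillas.extend(cache)   # cache SSDs stay in this head


def import_pool(ctrl: Controller, pool: str) -> None:
    ctrl.pools[pool] = []                 # arrives with zero cache devices


def add_cache(ctrl: Controller, pool: str, count: int) -> None:
    if count > len(ctrl.spare_readzillas):
        raise ValueError("not enough unused Readzillas in this controller")
    for _ in range(count):
        ctrl.pools[pool].append(ctrl.spare_readzillas.pop())


if __name__ == "__main__":
    a = Controller("A", [], {"Rotation": [], "Bob": ["rz-a1", "rz-a2"]})
    b = Controller("B", ["rz-b1", "rz-b2"])
    unconfigure(a, "Bob")
    import_pool(b, "Bob")
    add_cache(b, "Bob", 2)
    print(b.pools["Bob"], a.spare_readzillas)
```

Note that controller A ends up with two unused Readzillas, which is the "remove Readzillas from a pool" side effect hinted at below.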

That's it.

By the way, you know you cannot remove drives from a pool, right? We can only add. This includes SSDs like Logzillas and Readzillas.
Well, I kind of just showed you a way you CAN remove readzillas from a pool, didn't I? Hmmmmmm.....

Tuesday Dec 10, 2013

Upgrading to AK8.1 (2013.1.1.0)

Ok, so AK8.1 has some cool new features. One of which is the ability to have block or record sizes on your shares larger than the current 128K. You can now have 256K, 512K, and 1M. This is important for some great performance boosts on different types of workloads. 

This feature is a deferred update in AK8.1. In other words, just upgrading to AK8.1 does not turn on this feature. Also, after you apply the deferred update, you will NOT be able to roll-back to a previous version. So test and check before you apply.

Here is my screen after the upgrade. Note the new "Deferred Update" section that appears.

If you click the "More info" button, you will see the following help file:

So, before I apply the deferred update, if I go into a share, I will see the large block sizes in the pull-down, but note that they are greyed out.

After I apply the update, which does NOT require a reboot and is very quick, the same menu now has black entries that I can choose.
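Here is a small sketch of the menu behavior and of one reason larger records help: fewer blocks to track per file. The function names are hypothetical, and the sketch only models the 128K ceiling lifting to 1M; it says nothing about the other sizes in the menu.

```python
# Sketch of record-size availability before/after the deferred update,
# plus block-count arithmetic. Function names are hypothetical.
KIB = 1024
LARGE_RECORD_SIZES = (256 * KIB, 512 * KIB, 1024 * KIB)  # 256K, 512K, 1M


def max_record_size(deferred_update_applied: bool) -> int:
    """Largest selectable record size: 128K before the update, 1M after."""
    return 1024 * KIB if deferred_update_applied else 128 * KIB


def selectable(record_size: int, deferred_update_applied: bool) -> bool:
    """Large sizes show in the menu either way, but stay greyed out until the update."""
    return record_size <= max_record_size(deferred_update_applied)


def blocks_needed(file_bytes: int, record_size: int) -> int:
    """Number of records a file of this size occupies (ceiling division)."""
    return -(-file_bytes // record_size)


if __name__ == "__main__":
    one_gib = 1024 ** 3
    for size in (128 * KIB, *LARGE_RECORD_SIZES):
        print(f"{size // KIB}K records: {blocks_needed(one_gib, size)} blocks per GiB")
```

For a 1 GiB file this works out to 8192 records at 128K versus 1024 at 1M, so large sequential workloads have far fewer blocks, checksums, and I/Os to handle.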

Monday Dec 09, 2013

New code out today- AK8.1 or 2013.1.1.0

The first minor release of code 2013.1 is now out and can be downloaded in MOS.

It is 2013.1.1.0, or AK8.1

Along with some bug fixes, it has three main new features:

Support for 2-port 16Gb Fibre Channel HBA Target and Initiator (backup)

The drivers for supporting the 8300 Series adapters are available in 2013.1.1.0. In the ZFS Storage Appliance, they support SAN traffic at line rate, 16Gbps Fibre Channel speeds, with extremely low CPU usage and full hardware offloads. This extreme performance eliminates potential I/O bottlenecks in today's powerful multiprocessor, multicore servers.


Large Block Size (1M) Support

Enable support for block/record sizes bigger than 128k (256k, 512k and 1M) for filesystems or LUNs. The implementation for this includes a deferred update detailed in the Deferred Updates and Remote Replication Compatibility with Large Block/Recordsize Update sections below.


SPA Sync Concurrency

In order to better utilize the performance of high-speed storage devices, such as SSDs, improvements have been made to ZFS's algorithm for committing transaction groups. Specifically, the Storage Pool Allocator (SPA) sync process has been improved to parallelize some operations so that it spends a larger percentage of time writing data to the pool devices.

Friday Nov 01, 2013

VNIC - New feature of AK8 - Working with VNICs

One of the important new features of the AK8 code is the ability to use multiple IP addresses on the same physical network port. This feature is called VNICs, or Virtual NICs. This means we no longer have to "burn" a whole port in a cluster when one cluster peer owns a network port. Traditionally, we have had to leave Net0 empty on controller 2, because it was used for managing controller 1, and vice versa for Net1 on controller 1. Then, if you had data going over 10GigE ports, you probably only had half of your ports running at any given time, and the partner 10GigE port on the other controller just sat there, doing nothing, unless the first controller went down.

What a waste. Those days are over. 

I want to thank and give a big shout-out to our good partner, OnX Enterprise Solutions, for allowing me to come into their lab and play around with their 7320 to do this demo. They let me make a big mess of their lab for the day as I played around with VNICs. If you're looking for a partner who knows Oracle well and can also piece together a solution from multiple vendors to get you what you need, OnX is a good choice. If you would like to talk to your local OnX rep, you can contact Scott Gill at Scott.Gill@Onx.com and he can point you in the right direction for your area. 

Here we go:

Here is what your Datalinks window looks like BEFORE you upgrade to AK8.

Here's what the same screen looks like after you upgrade. See the new box?

So here is my current network setup. I have my 4 physical interfaces set up, each with an IP address. If I ping them, no problems.

So I can ping 180, 181, 251, and 252. However, if I try to ping 240, it does not work, as the 240 address is not being used by any of these interfaces, right?
Let's change that.

Here, I'm going to make a new Datalink by clicking the Datalink "Plus sign" button. I will check the VNIC box and tell it to use igb2, even though another interface is already using it.

Now, I will create a new Interface, and choose "v_dl2" for its datalink.

My new network screen looks like this.
A few things to take note of here. First, when I click the "igb2" device, it only highlights dl2 and int2. It does not highlight v_dl2 or v_int2.
I think it should, but OK, it looks like VNICs don't highlight when you click the device. 
Second, note how the underscore characters in v_dl2 and v_int2 do not seem to show on this screen. You can see them plainly if you go in and edit them, but from here it looks like a space instead of an underscore. Just a cosmetic bug, but something to be aware of.

Now, if I click the VNIC datalink "v_dl2", on the other hand, it DOES highlight the device it belongs to, as it should. Seen here:

Note that it did not, however, highlight int2 with it, even though int2 is connected to igb2. That's because we clicked v_dl2, which int2 has nothing to do with. So I'm OK with that.

So let's try pinging 240 now. Of course, it works great.

So I now make another VNIC datalink, v_dl3, using igb3, and an interface, v_int3, with an address of 241. I then set up three shares, using IP addresses 251, 240, and 241.
Remember that IP 251 and 240 both are using the same physical port of igb2, and IP 241 is using port igb3.

Next, I copy a folder full of stuff over to all three shares at the same time. I have analytics going so I can see the traffic. My top chart is showing the logical interfaces, and the bottom chart is showing the physical ports.
Sure enough, look at the igb2 and vnic1 interfaces. They equal the traffic going over the igb2 physical port on the second chart. VNIC2, on the other hand, gets igb3 all to itself.
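The arithmetic behind the two charts is simple: traffic on logical interfaces adds up on the physical port they share. This sketch uses the interface names from the top chart and the port mapping from this example; the MB/s values in the usage line are made up for illustration, and the function name is hypothetical.

```python
# Sketch: per-interface traffic rolled up onto physical ports.
# Mapping follows the example above; MB/s values are made up.
from collections import defaultdict

INTERFACE_PORT = {
    "igb2": "igb2",    # original interface on igb2 (IP 251)
    "vnic1": "igb2",   # VNIC sharing igb2 (IP 240)
    "vnic2": "igb3",   # VNIC on igb3 (IP 241)
}


def port_totals(interface_mbps: dict[str, float]) -> dict[str, float]:
    """Sum logical-interface throughput per physical port."""
    totals: dict[str, float] = defaultdict(float)
    for iface, mbps in interface_mbps.items():
        totals[INTERFACE_PORT[iface]] += mbps
    return dict(totals)


if __name__ == "__main__":
    print(port_totals({"igb2": 40.0, "vnic1": 35.0, "vnic2": 60.0}))
```

The consequence to keep in mind: VNICs share the physical port's bandwidth, so two busy VNICs on one 10GigE port still top out at 10GigE combined.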

This would work the same way with 10Gig or Infiniband ports. You can now have multiple IP addresses and even completely different subnets sharing the same physical ports. You may need to make route table entries for that. This allows us to use all of the ports you paid for with no more waste. 

Very, very cool. 

One small "bug" I found when doing this. It's really not a bug; it was designed to do this before VNICs were around. But now that we have VNIC capability, they should probably change this. I've alerted the engineering team about this and they're looking into it, so perhaps it will be fixed in a later code.

Here it is. Remember when we made the new VNIC datalink, I specifically said to click on the "Plus Sign" button to create it? I don't always do that. I really like to use the drag-and-drop method to create my datalinks in the network screen.
HOWEVER, if you were to do that for building a VNIC, it will mess you up a little. Watch this.

Here, I'm dragging igb3 over to make a new datalink. igb3 is already being used by dl3, but I'm going to make this a VNIC, so who cares, right?

Well, the ZFSSA does not KNOW you are going to make it a VNIC, now does it? So... it works as designed and REMOVES the igb3 device from the current dl3 datalink in the background. See how it's now missing? At the same time, the dl3 datalink choice is missing from my list of possible VNICs for me to choose from!!!! Hey!!! I wanted to pick dl3. Why isn't it on the list??? Well, it can't be on this list because dl3 no longer has a device associated with it. Bummer for you.

When you click cancel, the device is still missing from dl3.

The fix is easy. Just edit dl3 by clicking the pencil button, do absolutely nothing, and click "Apply". The device will magically come back.

Now, make the VNIC datalink by clicking the "Plus Sign" button. Sure enough, once you check the VNIC box, dl3 is a valid choice. No problem.

That's it for now. Have fun with VNICs.

Wednesday Oct 23, 2013

Replication with AK8

Hello folks,

This came up today and I want to make sure it's clear.

Remember the "deferred update" I spoke about in my "Upgrade to AK8" entry just a bit ago? It's important to understand that this deferred update changes the way replication works. It is necessary that systems with the deferred update applied only replicate with other systems that have also had this deferred update applied. So if you apply it, your system can NOT replicate with ANY other system that has NOT had it applied, even if that other system is running AK8!!! Got it???

Remember, we do have a new version of the 2011 code for the older systems that do not want to upgrade to AK8. This 2011.1.8 code ALSO HAS this same deferred update in it. So, if you upgrade your system to AK8 and then apply the deferred update, and you have another system running either 2011.1.8 or AK8, you can replicate with it again once it applies the deferred update for multiple initiator groups. Yes, even if you're not using LUNs. If you try to replicate between a system that has applied it and one that has not, it will fail.
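The compatibility rule fits in a few lines. This is a hypothetical sketch, not an appliance check: the names are made up, the release list contains only the two releases named above, and it assumes two systems that BOTH lack the deferred update can still replicate with each other, as they did before.

```python
# Hypothetical check for the replication rule above.
# Assumptions: only the releases named in the text carry the update, and
# two systems that both lack it can still replicate as they did before.
from dataclasses import dataclass

RELEASES_WITH_UPDATE = {"2011.1.8", "AK8"}


@dataclass
class Appliance:
    release: str
    update_applied: bool = False

    def apply_deferred_update(self) -> None:
        if self.release not in RELEASES_WITH_UPDATE:
            raise ValueError(f"{self.release} does not include this deferred update")
        self.update_applied = True   # one-way: no rollback afterward


def can_replicate(source: Appliance, target: Appliance) -> bool:
    """Both sides must agree on the deferred update."""
    return source.update_applied == target.update_applied


if __name__ == "__main__":
    src, dst = Appliance("AK8"), Appliance("2011.1.8")
    src.apply_deferred_update()
    print(can_replicate(src, dst))   # mismatch: fails
    dst.apply_deferred_update()
    print(can_replicate(src, dst))   # both applied: works
```

The practical takeaway: plan the deferred update across every system in a replication relationship at the same time.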

Wednesday Oct 16, 2013

OS8- AK8- The bad news...

Ok I told you I would give you the bad news of AK8 to go along with all the cool new stuff, so here it is. It's not that bad, really, just things you need to be aware of.

First, the 2013.1 code is being called OS8, AK8 and 2013.1 by different people. I mean different people INSIDE Oracle!! It was supposed to be easy, but it never is. So for the rest of this blog entry, I'm calling it AK8.

AK8 is not compatible with the 7x10 series. Ever. The 7x10 series is not supported with AK8, and if you try to upgrade one, it will fail at the healthcheck.

All 7x20 series, all of them regardless of age, are supported with AK8.

Drive trays. Let's talk about drive trays and SAS cards. The older drive trays for the 7x20 series were called the "Riverwalk 2" or "DS2" trays. They were technically the "J4410" series JBODs that Sun used to sell a la carte before we stopped selling JBODs. Don't get me started on that, it still makes me mad. We used these for many years, and you can still buy them right now until December 15th, 2013, when they will no longer be sold. The DS2 tray only came as a 4u, 24 drive shelf. It held 3.5" drives, and you had a choice of 2TB, 3TB, 300GB or 600GB drives. The SAS HBA in the 7x20 series was called a "Thebe" card, with a part # of 7105394. The 7420, for example, came standard with two of these "Thebe" cards for connecting to the disk trays. Two Thebe cards could handle up to 12 trays, so one would add two more cards to go to 24 trays, or have up to six Thebe cards to handle 36 trays. This card was for external SAS only. It did not connect to the internal OS drives or the Readzillas, both of which used the internal SCSI controller of the server.

These Riverwalk 2 trays ARE supported with AK8. You can upgrade your older 7420 or 7320, no problem, as-is. The much older Riverwalk 1 trays or J4400 trays are NOT supported by AK8. However, they were only used by the 7x10 series, and we already said that the 7x10 series was not supported.

Here's where it gets tricky. Since last January, we have been selling the new style disk trays. We call them the "DE2-24P" and the "DE2-24C" trays. The "C" tray is for capacity drives, which are 3.5" 3TB or 4TB drives. The "P" trays are for performance drives, which are 2.5" 300GB and 900GB drives. These trays are NOT Riverwalk 2 trays, even though the "C" series may kind of look like it. Different manufacturer and different firmware. They are not new. Like I said, we've been selling them with the 7x20 series since last January. They are the only disk trays we will be selling going forward. Of course, AK8 supports them.

So what's the problem? The problem is going to be for people who have to mix drive trays.

Remember, your older 7x20 series has Thebe SAS2 HBAs. These have 2 SAS ports per card. The new ZS3-2 and ZS3-4 systems, however, have the new "Thebe2" SAS2 HBAs. These Thebe2 cards have 4 ports per card. This is very cool, as we can now do more SAS channels with fewer cards. Instead of needing 4 SAS cards to grow to 24 trays like we did with the old Thebe cards, I can now do 24 trays with only 2 Thebe2 cards. This means more IO slots for fun things like Infiniband and 10G. So far, so good, right? These Thebe2 cards work with any disk tray. You can even mix older DS2 trays with the newer DE2 trays in the same system, as long as you have Thebe2 cards.
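The HBA math above can be written as a quick calculator. It assumes, from the figures in this post, that a pair of Thebe cards handles 12 trays, a pair of Thebe2 cards handles 24, and cards are added in pairs. It only does the arithmetic; it does not check any system's maximum supported tray count, and the function name is hypothetical.

```python
import math

# Trays handled per PAIR of SAS HBAs, taken from the figures above.
TRAYS_PER_PAIR = {"Thebe": 12, "Thebe2": 24}


def hbas_needed(trays: int, card: str) -> int:
    """SAS HBAs needed for a tray count, assuming cards are added in pairs."""
    if trays <= 0:
        raise ValueError("need at least one tray")
    return 2 * math.ceil(trays / TRAYS_PER_PAIR[card])


if __name__ == "__main__":
    for trays in (12, 24, 36):
        print(trays, "trays:", hbas_needed(trays, "Thebe"), "Thebe vs",
              hbas_needed(trays, "Thebe2"), "Thebe2")
```

At 24 trays that is 4 Thebe cards versus 2 Thebe2 cards, which is where the extra free IO slots come from.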

Ah, there's your problem. You don't have Thebe2 cards in your old 7420, do you? Well, I told you the bad news wasn't that bad, right? We can take out your Thebe cards and replace them with Thebe2. You can then plug your older DS2 trays right back in, and also now get newer DE2 trays going forward. However, it's important that the trays are on different SAS channels. You can mix them in the same system, but not on the same channel. Ask your local SC if you need help with the new cable layout. By the way, the new ZS3-2 and ZS3-4 systems also include a new IO card called the "Erie" card. These are for INTERNAL SAS to the OS drives and the Readzillas. So those are now SAS2 instead of SATA like the older models. Yes, the Erie card uses an IO slot, but that's OK, because the Thebe2 cards allow us to use fewer SAS HBAs to grow the system, right?

That's it. Not too much bad news and really not that bad. AK8 does not support the 7x10 series, and you may need new Thebe2 cards in your older systems if you want to add on newer DE2 trays. I think we can all agree that there are worse things out there. Like our Congress.  

Next up... More good news and cool AK8 tricks, such as virtual NICs.

Friday Oct 11, 2013

Do you want to upgrade to AK8 (2013.1) right now?

Ok, so you will hear some great stuff about AK8, but are you going to upgrade your production system to a new major release right after it comes out? Probably not. If you have a test system or a lab system you can play with, then I highly recommend upgrading it so you can start to see the new performance features that AK8 can give you. If you only have one system, or they're all in production, then of course you're going to wait for the first minor release of the new code, aren't you? I would too. I'm told the first minor is coming out in just a few weeks. It is the release they used for the public benchmark performance testing. So you can feel more confident in that release. You may also be able to talk to your local sales team about getting a demo unit. Then, you can play with the new code in a safe lab area before upgrading your production system.

Next up... The negative aspects of upgrading to AK8. It's not too bad, but you will need to know which older systems can't do it, how to work with older disk trays, and whether or not you can replicate newer systems with older systems. 

Hey, I told you I wasn't just going to blow sunshine on you all the time, right? I can spit out the kool-aid as well as drink it!  :)

Thursday Oct 10, 2013

Upgrading to OS8 - AK8- 2013.1

The upgrade to OS8, AK8, or whatever we are calling it this week was pretty straightforward. It takes some extra time, as it has to perform some one-time jobs the first time it reboots, but it wasn't more than 15 minutes. Your mileage may vary; on larger systems it may take longer. There is also a deferred update, which I will show you further down, that you can choose to apply right away or later. Once you apply that deferred update, you do NOT want to roll back to the previous version, so be warned.

It's been over 1.5 years since the last major update, so many of you probably have never done one before. The process is just like a minor update, it just takes longer. 

1. Get the update from MOS and unzip it to a folder. Go ahead and upload it and unpack it like normal from your Maintenance-->System screen. I liked that it tried to tell me how much time was left, but the estimates were all over the place, and the process had finished by the time the estimate was accurate.

Now, when you click the arrow to apply the update, the normal health check window appears, but you will notice something extra: the 'Deferred Update' choice. You can have it apply as soon as the system reboots, or you can apply it manually later. Remember, you do NOT want to roll back after this is applied. I chose "Upon Request". Click the "Check" button, and if all is well, click "Apply".

After it installs and reboots, you can look at the command line via serial port or SSH. You will notice a few things are different during this boot-up.

Right after the "Updating ####" section you can see it actually upgrading various services and the SMF repository. This can take around 3 minutes, but if you have a lot of aggregations or IPMP groups, it could take longer. So relax. You can see mine, below, which took 290 seconds and then continued upgrading other stuff.

The upgrade continues, and the screen is pretty obvious.

When you see it configuring network devices, you're almost done. You can see the new code level, and it's about to go to the login prompt. At that point, you should be able to log back into the BUI.

Log back into the BUI, and you will see the new version is the current version in Maintenance-->System.

Now, let's do the deferred update on the same screen.

You can read about the deferred updates here, and click Apply when ready to add them. In this case, the update adds the ability to associate multiple initiator groups with a LUN, something we have wanted for some time now, so very cool. Note that ANY other deferred updates you have not applied yet will also be applied, as there is no way to pick and choose: either they all apply or none do. Remember I said not to roll back to a previous version of the code after you do this? It will let you, but if you do, your LUN operations will fail. No bueno. Don't do it. The deferred updates are one-way.

Note that the deferred update does NOT force a reboot. 

Once you apply the deferred updates, the whole deferred update area goes away, and the screen now looks like this. 

Do you want to see something cool in OS8 right away that you could not do before? There's a lot I will talk about later, but for now, since you're so excited, go to Configuration-->Alerts and create a new Threshold Alert. Notice the new Capacity threshold alerts, where you can now get emails or trigger an action when a pool, a project, or a share goes over, say, 80% full. Sweet.
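To make the capacity alert concrete, here is a minimal Python sketch of the threshold logic. It is an illustration only: `over_threshold` and its inputs are hypothetical and are not the ZFSSA alert API. On the appliance you set this up in the BUI as described above.

```python
def over_threshold(usage, threshold_pct=80.0):
    """Return the names whose used/total ratio exceeds threshold_pct, sorted.

    usage maps a name (pool, project, or share; labels are illustrative) to a
    (used_bytes, total_bytes) pair.
    """
    flagged = []
    for name, (used, total) in usage.items():
        # Skip entries with no capacity to avoid dividing by zero.
        if total > 0 and used * 100.0 / total > threshold_pct:
            flagged.append(name)
    return sorted(flagged)


usage = {
    "pool-0": (850, 1000),
    "pool-0/proj-db": (400, 1000),
    "pool-0/proj-db/share-logs": (810, 1000),
}
print(over_threshold(usage))  # prints ['pool-0', 'pool-0/proj-db/share-logs']
```

The comparison is strictly "over" the threshold, so something sitting at exactly 80% would not be flagged.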

Tuesday Oct 08, 2013

AK8- OS8- 2013.1- New major release code is available NOW

Well, they said it would be released on October 8th, and they did not disappoint.

The new code, internally called 2013.1.0.1 and marketed as ZFSSA OS8 or AK8, is out now. Download it from MOS.

There are so many updates that it's hard to get a handle on them all at once. These release notes will help: https://wikis.oracle.com/display/FishWorks/ak-2013.1.0.1+Release+Notes

I will be loading it, playing with it, and showing some of my favorite things coming up soon, as in the next few days.

Many of the improvements are ones you cannot see, such as the ARC and RAID improvements.

Lots to talk about. Especially if you need to mix trays. Be careful. Read the file. Stay tuned. 

Tuesday Sep 24, 2013

Great Analytic blog

My co-worker, Darius, just made a great post about how he helped a client using the built-in analytics of the ZFSSA. Check it out here: https://blogs.oracle.com/si/entry/using_analytics_dtrace_to_troubleshoot

Tuesday Sep 10, 2013

ZS3 is #1 on Storage Performance Council benchmark site

This is pretty cool. It seems the ZS3-4 just became the number 1 system in performance on Storage Performance Council's benchmark site.

The email below went out today to all SPC members. 

I would like to point out that we are also the LEAST EXPENSIVE system per unit of SPC-2 performance. Check out our price/performance numbers.
We came in at 17,244 SPC-2 MBPS for a $388,472 system, for a price/performance of $22.53 per SPC-2 MBPS.
Now compare that to the 2nd-place system on the site, HP's P9500. It came in at 13,147 SPC-2 MBPS for a huge price of $1,161,503, for a price/performance of $88.34 per SPC-2 MBPS.
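The price/performance figure is simply total price divided by SPC-2 MBPS. This short Python sketch reproduces that arithmetic from the numbers quoted in this post. Because the HP MBPS figure is quoted rounded to 13,147, the computed HP value comes out a cent above the published $88.34.

```python
def price_per_mbps(total_price, mbps):
    """SPC-2 price-performance: dollars per SPC-2 MBPS."""
    return total_price / mbps


# Figures as quoted in this post.
zs3_4 = price_per_mbps(388_472.03, 17_244.22)
hp_p9500 = price_per_mbps(1_161_503, 13_147)  # MBPS and price rounded as quoted

print(f"ZS3-4:    ${zs3_4:.2f} per SPC-2 MBPS")
print(f"HP P9500: ${hp_p9500:.2f} per SPC-2 MBPS")
print(f"HP costs about {hp_p9500 / zs3_4:.1f}x as much per SPC-2 MBPS")
```

In other words, each unit of SPC-2 throughput on the P9500 costs roughly four times what it does on the ZS3-4.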

We KILLED it....

SPC Members:
Oracle Corporation has submitted the SPC-2 Result™ listed below.
The Executive Summary and Full Disclosure Report (FDR) are posted in the Benchmark Results section of the website.
The documents may be accessed by using the URL listed below:

Oracle ZFS Storage ZS3-4 (2-node cluster):

   SPC-2 Submission Identifier: B00067
   SPC-2 MBPS™: 17,244.22
   SPC-2 Price-Performance™: $22.53/SPC-2 MBPS™
   Total ASU Capacity: 31,610.959 GB
   Data Protection Level: Protected 2 (Mirroring)
   Total Price: $388,472.03

Congratulations to Oracle for an outstanding SPC-2 Result, which established a new #1 for SPC-2 performance (17,244.22 SPC-2 MBPS™).
SPC Administrator
Storage Performance Council (SPC) 

New ZS3 ZFS Storage family announced TODAY! Finally!

It's official and we can finally be excited about the new ZS3 family. We can start talking about it now and start ordering it on Thursday. It won't actually ship, however, until next month, on October 8th. I know, I know... but hey, I CAN give you something TODAY... How about the 4TB drives, available right now??? Also, the new 1.6TB Readzillas and the 16-port 4x4 SAS HBAs are all here this Thursday, Sept 12th. Not bad, right?

So we now have three new systems:
1- The new ZS3-2
2- The new ZS3-4
3- The updated 7420M2 with internal SAS

The first two will ship with the new OS8 code. The 3rd one will ship with the older OS7 (2011.1.7) code, but can be updated to OS8 at any time. (See my last blog entry about the new OS names.)

Ok, here is the low-down. The new 4TB drives are for the DE2-C trays, which have been out since last December. I do NOT YET KNOW whether they will also be available for the older DS2 trays, but I will tell you when I find out. This is important: the new 1.6TB Readzillas are SAS, not SATA, so they will only work in the new ZS3 series and the new 7420M2. Your older 7420 and 7320 use internal SATA, not SAS, for their Readzillas and system OS drives. The new 900GB OS drives and the new 1.6TB Readzillas are SAS, so you need one of the newer systems to use them.
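The SAS-versus-SATA rule above boils down to a small lookup. This is a hypothetical Python sketch: the model and part labels are informal names taken from this post, not Oracle part numbers, and only the systems and parts mentioned here are included.

```python
# Internal drive bus per controller model, as described in this post.
INTERNAL_BUS = {
    "ZS3-2": "SAS",
    "ZS3-4": "SAS",
    "7420M2": "SAS",
    "7420": "SATA",
    "7320": "SATA",
}

# Bus required by the new internal parts (informal labels, not part numbers).
PART_BUS = {
    "1.6TB Readzilla": "SAS",
    "900GB OS drive": "SAS",
}


def part_fits(controller, part):
    """True if the part's bus matches the controller's internal bus."""
    try:
        return INTERNAL_BUS[controller] == PART_BUS[part]
    except KeyError:
        raise ValueError(f"unknown controller or part: {controller!r}, {part!r}") from None


print(part_fits("ZS3-2", "1.6TB Readzilla"))  # prints True
print(part_fits("7320", "1.6TB Readzilla"))   # prints False
```

Unknown models raise an error rather than guessing, since this post only covers the systems listed above.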

The LAST order date for the current 7420 is September 30, 2013, and the LOD for the 7320 and 7120 is November 30, 2013. 

You can get a new product datasheet or the product announcement from your local storage SC.  

Monday Aug 26, 2013

Some info about our ZFSSA codes

As you now know, version 2011.1.7.0 is the current shipping code for our ZFSSA. You really want to be running this code, no matter what ZFSSA system you have. This code will work all the way back to the 7x10 systems. Bugs have been found even in the previous release, 2011.1.6.0, that this newer code fixes, so get on it.

Let's talk for a moment about code names and numbers, as they're going to change, from your point of view, very soon. Many years ago, Sun Microsystems created the "Fishworks" team to create the code that we now run on the ZFS Storage Appliance. You can still see Fishworks and the original team names if you "Shift-Click" the Oracle/Sun logo in the top left corner of your ZFSSA. (There are MANY secret Shift-Click operations in the ZFSSA. I told you about some back in my blog on analytics here: https://blogs.oracle.com/7000tips/entry/fun_tips_with_analytics) By the way, FISH stands for "Fully Integrated Software & Hardware."

So the code that Fishworks created is a layer between you, the user, and the special version of Solaris and ZFS underneath. This is called the "Appliance Kit", and you will see all sorts of system names with "AK" in them, all directly linked to the Appliance Kit, which is basically the code for the interface, both the GUI and the CLI, that you all know and love. Internally at Oracle, the Fishworks team (now grown far beyond the original team) calls the ZFSSA code levels "AK#####". For example, the code level you are all running right now is called AK7. It has had minor updates, but the major code is AK7, with a minor of 04.24.7 right now, so the last code level released is really AK7.04.24.7. You have all been calling it "2011.04.24.7", because in the past they used the year the major release came out as its name. For obvious reasons, this no longer makes sense. People think the code they're running was made two years ago, in 2011, but that's just not the case. This last release bears almost no resemblance to the original AK7 code; so much has changed in it.

So, to make things simpler, Oracle is dropping the year from the code name and will now call it AK#.#.#, starting with the upcoming release of AK8. In all likelihood, there will still be one more minor release of AK7 coming first, so don't put off upgrading as if AK8 were just around the corner. It's still going to be a few months, and you don't want to hit a bug before then, so upgrade to AK7-7 (my nickname for the current release) when you can.

AK8 will be a game-changer. I'm not allowed to talk about it too much, but speak with your local storage SC and maybe they can give you a heads-up. HUGE stuff coming folks. Just the performance enhancements are going to be a world-changing event in the storage industry. If you have a 7x20 series system, going to AK8, without doing anything else, is going to make your system better and faster.

You can see all of the software release history here: https://wikis.oracle.com/display/FishWorks/Software+Updates

Wednesday Aug 14, 2013

New code out now

In case you were not paying attention, code 2011.1.7 is now out.



Tuesday Jul 02, 2013

Awesome new feature for HCC

I've talked about HCC (Hybrid Columnar Compression) before. This is Oracle's built-in compression feature, free of charge in 11gR2, that allows a CRAZY amount of compression on historical data inside an Oracle database. It only works if the database is stored on a ZFSSA, Exadata or Axiom. You can read all about it in this whitepaper, which shows the huge value of HCC when used with the ZFSSA: http://www.oracle.com/technetwork/articles/servers-storage-admin/perf-hybrid-columnar-compression-1689701.html

Now, even better, Oracle has announced a great new feature in Oracle 12c called "Automatic Data Optimization". This allows one to set up HCC to AUTOMATICALLY compress data AS IT AGES.

So this is now ILM built right into the Oracle database. It's free, for crying out loud. It just needs to be sitting on Oracle storage, such as the ZFSSA, Exadata or Axiom.

Read about ADO here: http://www.oracle.com/technetwork/database/automatic-data-optimization-wp-12c-1896120.pdf?ssSourceSiteId=ocomen

Thursday May 30, 2013

Wikibon has a new article giving nice praise to the ZFSSA

It seems Wikibon has done some research and interviews and has written a very nice article on the awesome cost savings of the ZFSSA. 
Check it out here:


Here are some of my favorite quotes from the article:

“The high-end ZFS storage array is the highest performing hybrid storage device that has been analyzed by Wikibon, and in a class of its own when it comes to high write-IO environments.”

“Wikibon analyzed the architecture and performance of the ZFS Appliance in depth, and compared it to "traditional storage arrays" (e.g. EMC, NetApp, HDS, HP, etc. mainstream mid-market arrays) in high write environments.”

“For an environment with 100 terabytes, 1,000,000 IOPS and 20% writes, the additional cost of the traditional system (NetApp) is 194% higher than the hybrid system (ZFS). “

“Wikibon members should consider the ZFS Appliance in more demanding workloads where sustained write performance and IO requirements are higher. Examples include high performance environments such as specific backup applications and core transaction-intensive database workloads. In these situations, because of the hybrid design of the ZFS Appliance, customers will find significant savings relative to traditional disk arrays that don't scale as well.”

“CIOs, CTOs and senior storage executives should position the Oracle ZFS appliance as an ideal strategic fit for high streaming environments such as database backups. As well, the product can be successfully integrated into high-performance Oracle database workloads. In write-intensive and heavy IO workloads, the ZFS appliance will likely prove the best-of-breed, lowest cost solution.”

“The general feedback from the ZFS appliance practitioners was positive”

“Praise for the performance of the ZFS, particularly in backup (high-write) environment;”

“7 gigabytes/second write rates achieved in a benchmarks;”

“11 terabytes/hour sustained over 2.5 hours for backup, compared with 1 terabytes/hour for a traditional storage device;”

“ZFS snapshots and clones universally praised;”

“DTrace was praised for the quality and completeness of the performance analytic tool;”

“Compression performance was strongly praised (up to 16x compression), especially for reads;”

“None of the respondents needed to tune the ZFS read or write caching - performance maintenance was minimal;”

“No problems with availability.” 

Tuesday May 28, 2013

New eye-chart for the ZFSSA

I finally updated my ZFSSA eye-chart. Hey, it's only three months late.

You can find it under the "Bookmarks" section on the right. Version 12 is the newest one.

Wednesday May 08, 2013

ZFSSA update 2011.1.6.0 is now available

For those who have not noticed, ZFSSA version 2011.1.6.0 came out on April 29th.

Go get it.

Release notes are here:


Monday Apr 08, 2013

Clone license clarification

Someone asked a good question about the clone and snap-manager licenses, so I wanted to clarify.

The info I received is that if you use the new snap-manager for ZFSSA product, all snaps and clones created by that product are covered by its one-time license. You do not need the additional clone license to manage these at all.

However, if you create other clones that are NOT created via the snap-manager tool, then yes, you do need the separate Clone license for those to be supported.

I hope that makes sense.


Tuesday Feb 12, 2013

SnapManager for Oracle DB for ZFSSA is out and ready

A few weeks ago, Oracle announced the Oracle database SnapManager software for ZFSSA.

It is a license just like the Clone or the Replication license. It's just a one-time, yes-or-no, on-or-off license per controller. Better yet, you can go ahead and get the software and try it out for free for 30 days. Go check it out with the link below.

The Snap Management Utility combines the underlying snapshot, clone, and rollback capabilities of the Oracle ZFS Storage Appliance with standard host-side processing so all operations are consistent.

Downloading the Oracle Snap Management Utility for Oracle Database Software

A. Customers who purchased the license need to download the software from eDelivery (see instructions below).

B. Customers who wish to evaluate for 30 days prior to purchase may download from the same site. The license allows a 30-day evaluation period. Follow the instructions below.

Instructions to download software:

1. Go to eDelivery link: https://edelivery.oracle.com/EPD/Search/handle_go

2. Login

3. Accept Terms and Restrictions

4. In the “Media Pack Search” window:

a. Under Product Pack, select “Sun Products”

b. Under Platform, select “Generic”

c. Click “Go”

5. From the results, select the “Oracle Snap Management Utility for Oracle Database”

6. There are two files for download:

a. The “Oracle Snap Management Utility for Oracle Database, Client v 1.1.0” is required

b. The “Sun ZFS Storage Software 2011.1.5.0” is the latest version of the ZFS Storage Appliance SW, provided for customers who need to upgrade their software.

UPDATE 7-27-13: Just found out that if you buy the SMU license, you do NOT need to buy the clone license. The cloning is included in SMU, so that's cool.

Monday Feb 11, 2013

Oracle IaaS now includes the ZFS Backup Appliance

Ok, so this is pretty cool. If you didn't know, Oracle has this great program called IaaS, which stands for Infrastructure as a Service. You can go check it out here: http://www.oracle.com/us/products/engineered-systems/iaas/overview/index.html

What this means is that someone who really wants an Oracle engineered system, such as an Exadata, but can't come up with the up-front cost, can use IaaS and put it in their datacenter for a low monthly fee. This can be really cool. Some people can now shift their entire budget from Cap-ex to Op-ex, save a bunch of up-front costs, and still get the hardware they need and want.

As of this week, the ZFSBA is included in the IaaS offering. So one can get the ZFS Backup Appliance and use it to back up their engineered system (Exadata, Exalogic, or SuperCluster) over InfiniBand. They can also use it to make snaps and clones of that data for testing and development, as well as use it for general-purpose storage over 10Gig, 1Gig or FC. Pretty sweet way to get the ZFS Storage system into your site without the up-front costs. You can even get the ZFSBA through IaaS all by itself, without an engineered system at all, just to get the ZFS storage.

Now, some of you may be asking, "What the heck is the ZFSBA and how is it different than the ZFSSA?"

I haven't talked about the ZFSBA, the ZFS Backup Appliance, before. I probably should have. You can get more info on it here: http://www.oracle.com/us/products/servers-storage/storage/nas/zfs-backup-appliance/overview/index.html
Here is the low-down. It's a 7420 cluster with drive trays, all pre-cabled and in a rack, ready to go. The 7420 has IB cards in place, and the whole system is a single line-item part number, which gives the sales team an easy way to add a ZFSSA to an engineered-system deal for backing up the engineered system. There are two versions, one with high-capacity drives and the other with high-performance drives. Whichever one you get, you can add additional trays of either type later. Unlike the other engineered systems, the ZFSBA does allow one to use the extra space in the rack, which is nice.

So, if you want a 7420 cluster and a rack, is there a downside to always using the ZFSBA to order a 7420? Not really. Same price, and easier to order with fewer part numbers. You can still customize it and add more stuff. There is one downside: the ZFSBA uses the 32-core version of the 7420, not the 40-core version. Backing up an Exadata does not require more cores, so they went with the smaller of the two. If you need more power and more DRAM for faster workloads, however, you may want to build a 7420 ZFSSA the normal way.

If this doesn't make sense, please add a comment below or just email me.  



This blog is a way for Steve to send out his tips, ideas, links, and general sarcasm. Almost all related to the Oracle 7000, code named ZFSSA, or Amber Road, or Open Storage, or Unified Storage. You are welcome to contact Steve.Tunstall@Oracle.com with any comments or questions

