By Steve Tunstall-Oracle on Jan 23, 2015
This is a wonderful article by Roch (rhymes with Spock) that I thought you all may find interesting. Roch's top ten list of how ZFS has improved performance.
Yes, the new, larger Logzilla SSD drives for your ZFSSA systems are now out. They are 200GB usable, up from the 73GB usable drives.
Yes, you will sometimes see them referred to in some marketing literature as 400GB. This is because there is extra room in enterprise SSD chips to allow for cell burnout and keep their 5-year lifetime. Make no mistake, they will give you 200GB of actual capacity in the ZFSSA systems.
Yes, they are compatible with the current 73GB version. You can mix and match. The one thing to look out for is in a 'mirrored' log profile. If you mix a new one with an old one in a mirrored log profile, then the new one will size down to 73GB to match it. In a striped profile, it doesn't matter, nor will it matter if you have 2 or more of each.
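The sizing rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic as described in this post, not how the appliance computes it. The function name `usable_log_gb` and the profile strings are hypothetical, and the mirrored case assumes a single two-way mirror.

```python
def usable_log_gb(drive_sizes_gb, profile):
    """Rough usable log capacity for a set of Logzillas (illustrative only).

    `profile` is "mirrored" or "striped"; these are hypothetical labels
    for this sketch, not values taken from the appliance.
    """
    if profile == "mirrored":
        # A mirror can only be as large as its smallest member, so a
        # 200GB drive paired with a 73GB drive sizes down to 73GB.
        return min(drive_sizes_gb)
    if profile == "striped":
        # In a stripe every drive contributes its full size.
        return sum(drive_sizes_gb)
    raise ValueError(f"unknown profile: {profile}")


print(usable_log_gb([200, 73], "mirrored"))  # 73
print(usable_log_gb([200, 73], "striped"))   # 273
```

So a mixed mirror wastes the extra space on the new drive, while a mixed stripe does not.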
One last thing-- They are almost twice as fast as the older 73GB version. If you mix them, you will get faster, but not as fast as if you had all 200GB versions. Diminishing returns. Talk to your local SC on whether your Logzilla workload is so great that either adding some new ones or even changing out your old ones would help your performance. Not every workload needs Logzillas, but there are built-in analytics that can tell us if yours is a good fit.
So we have been saying how well the ZFSSA works in a VM environment for years. We tested and wrote a white paper on VMWare running on the ZFSSA back at Sun Microsystems, well before being bought by Oracle. People still assume that now that we are Oracle, we must only work with Oracle's version of virtual machines and not with true VMWare... I do hope our presence at VMWorld and this blog can help put those fears to rest. The ZFSSA KILLS the VMWare workload and we fully test and support it.
My colleague Darius wrote an excellent blog about the superior performance on the ZFSSA due to larger block sizes. It shows why we out-perform NetApp with workloads such as SAS and SQL.
Check it out here: https://blogs.oracle.com/si/entry/why_your_netapp_is_so
Everyone is talking about Clouds. Cloud this, cloud that, cloudy cloud cloud.
What is it? To begin with, there's no such thing. If you store your data "on the cloud" it's still being stored SOMEWHERE by SOMEBODY. It's just that you're not storing it yourself. You are paying someone else to do it. Well, they are storing it on real hardware. Real servers and storage. Then, they are charging you to use their hardware (or maybe giving you the space for free and charging advertisers).
Now it turns out the ZFSSA is an excellent storage device for Cloud services. There are many cloud service software products out there. OpenStack is one of them, and it's open source, so that's cool. Icehouse is the newest version of it. Version 9 I believe. There is a plug-in for OpenStack for the ZFSSA.
My colleague, Ronen Kofman, has a new blog showing how this plugin works with the ZFSSA. Check it out here: https://blogs.oracle.com/ronen/entry/running_openstack_icehouse_with_zfs
You can read more about OpenStack Icehouse here: http://www.openstack.org/software/icehouse/
Oracle released a new SPC2 benchmark today, which you can find on the Storage Performance Council website here: http://www.storageperformance.org/results/benchmark_results_spc2_active
As you can see, the ZS3-2 gave excellent results, with the best price/performance ratio on the entire website, and the third fastest score overall. Does the Kaminario still beat it on speed? Yep it sure does. However, you can buy FIVE Oracle ZS3-2 systems for the same price as the Kaminario. :)
Storage Performance Council SPC2 Results (table not preserved; its columns included ASU Capacity in GB and Data Protection Level, and its rows included the SGI InfiniteStorage 5600 and the HP P9500 XP)
Results found on http://www.storageperformance.org/results/benchmark_results_spc2_active
If you missed the announcement, the ZS3-2 can now grow to 16 disk trays, up from 8. It can now also support four of any kind of IO card.
I know, I know, I have not done anything in this blog for a while now. That was not by design. There will be a nice upgrade for the 2013 code (OS8.2) coming soon. When it comes out I will certainly blog about it ASAP.
Most of us have clustered ZFSSAs, and have been frustrated at one time or another with getting the proper resource to be owned by the proper controller.
I feel your pain, and believe me, I have to deal with it as much or even more than you do. There are, however, some cool things you can do here and it will make your life easier if you fully understand how this screen works.
First, understand this- You almost never want to push the 'Takeover' button. The 'Takeover' button actually sends a signal to instantly reboot the OTHER controller, in a non-graceful way. More on that below. We have two heads in this picture and they're both in the "Active" state as you see here. This means you can not click the "Failback" button which is how we move resources to the head you wish to own them. You are only allowed ONE Failback when a head is in the "Ready for Failback" state, as it is when it first comes up. We have already hit Failback on this system, so both heads are now Active. That's it. You're done until one reboots.
Do NOT hit the 'Takeover' button. That button should be labeled "Ungracefully shutdown the other controller". Those were just too many words to fit on the button, so they called it Takeover. Sure, that means that since the other head is now being instantly rebooted, this head will now take over all of the resources while the other head reboots. This is one of the worst ways to reboot the other head. It's not nice. It does not flush the cache first. It's actually slower than the other way. When and why would you ever hit it? There are a few reasons. Perhaps the other head is in a failed state that is not allowing you to log in and shut it down correctly. Perhaps you are just setting the controls up on day one, you know there's no workload at all, and you really don't care how the other head gets rebooted. If that's the case, then go for it.
Instead, for a clean and faster reboot, log into the controller you want to reboot, and click the power button:
This allows you to reboot it gracefully, flushing the cache first, and it actually comes up faster than the 'takeover' way, almost always.
Now that it has rebooted, which may take 5-15 minutes, the good controller's cluster screen should show that it's "Ready for Failback". Be certain all of your resources are set to the proper owner, and then hit the "Failback" button to move the resources and change both controllers to the "Active" state. REMEMBER--- You only get to hit the Failback button ONCE!!! So take your time and do all of your config and setup and get the ownership right before you hit it. Otherwise, you will be rebooting one of your controllers again. Not a huge deal, but another 15 minutes of your life, and perhaps a production slowdown for your clients.
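If it helps to see the "you only get ONE Failback" rule spelled out, here is a toy Python model of it. It is a sketch of the behavior described in this post only. The class and method names are hypothetical, and it does not talk to a real appliance or reflect its internals.

```python
class FailbackModel:
    """Toy model of the Failback button as described above (not appliance code)."""

    def __init__(self):
        # Both heads Active: the Failback button is greyed out.
        self.ready_for_failback = False

    def peer_rebooted(self):
        # After the other head comes back up, the surviving head shows
        # "Ready for Failback" and the button becomes available.
        self.ready_for_failback = True

    def can_failback(self):
        return self.ready_for_failback

    def failback(self):
        if not self.ready_for_failback:
            raise RuntimeError("Failback unavailable: both heads are Active")
        # Resources move to their configured owners and both heads go
        # Active, which uses up the one Failback you get per reboot.
        self.ready_for_failback = False


cluster = FailbackModel()
cluster.peer_rebooted()
cluster.failback()          # allowed once
try:
    cluster.failback()      # a second attempt is refused
except RuntimeError as err:
    print(err)              # Failback unavailable: both heads are Active
```

The point of the model is the last step: once both heads are Active, the only way back to "Ready for Failback" is another reboot.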
Now for a trick. There's nothing I can do to help you with the network resources. If they are on the wrong controller, you may have to reboot one and fix it and do a failback. However, if you have a storage pool on the wrong controller, I may be able to show you something cool. The best thing to remember and do is this: Create the resource (network or pool) ON the controller you wish to be the owner in the first place!!! Then, it will already be owned by the proper one, and you don't have to do a failback at all. However, what if, for whatever reason, you need to move a pool to the other controller and you MUST NOT reboot a controller in order to move it using the Failback process? In other words, you have an Active-Active setup, the Failback button is grayed out, and it's very important that you change the ownership of a storage pool but you are not allowed to reboot one of the controllers?
Bummer, right? Not so fast, check this out.
So here I have a system with two pools, Rotation and Bob, both on controller A. The Bob pool is supposed to be on controller B. They are both Active, so I can not click Failback. I would normally have to reboot head B to fix this. But I don't want to.
So I'm going to unconfigure the Bob pool here on controller A. That's right, unconfigure. This does NOT hurt your data. Your data is safe as long as you do NOT create a new pool in that space. We're not going to create a new pool. We're going to IMPORT the Bob pool on controller B. All of your shares, LUNs, and their properties will be perfectly fine. There is only one hiccup, which we will talk about.
Go to Configuration-->Storage, select the correct pool (Bob), and then click "Unconfig".
But first, I want you to look carefully at the info below the pie chart here. Note that Bob currently has 2 Readzilla cache drives in it. This is important.
You will get this screen. Take a deep breath and hit apply.
No more Bob. Bob gone. Not really. It's still there and can be imported into another controller. This is how we safely move disk trays to new controllers, anyway. No big deal.
So, now go log into the OTHER controller. Don't do this on the same one or else you'll have to start all over again.
Here we are on B. DO NOT click the Plus Sign!!!! That will destroy your data!!!!
Click the IMPORT button.
The Import button will go out and scan your disk trays for any valid ZFS pools not already listed. Here, it finds one called "bob".
Select it and hit "Commit". There, Bob Pool is back. All of its shares and LUNs will be there too. The "Rotation" pool shows Exported because it's owned by the "A" controller, and the Bob Pool is owned here on B.
We can go to Configuration-->Cluster and see all is well and Bob Pool is indeed owned by the controller we wanted, and we never had to reboot!
However, we have one big problem.... Did you notice when you Imported the Bob Pool into controller B, the Cache drives did NOT come over?
It now has zero cache drives. What did you expect? The cache drives are the readzillas inside the controller, itself. They can't move over just because you changed the owner.
I have 2 extra Readzillas in my B controller not being used. So all I have to do is add them to the Bob Pool.
Go back to Configuration-->Storage on the B controller. Select the Bob pool and click "ADD". Do NOT click the plus sign. This is different.
I can now add any extra drives to the Bob pool. In this case, I don't have anything I could possibly add other than these two readzillas inside controller B. So pretty easy.
Once added, I'm all good. I now have the Bob pool, with cache drives, being serviced on controller B with no reboot necessary.
By the way, you know you can not remove drives from a pool, right? We can only add. This includes SSDs like Logzillas and Readzillas.
Well, I kind of just showed you a way you CAN remove readzillas from a pool, didn't I? Hmmmmmm.....
Ok, so AK8.1 has some cool new features. One of which is the ability to have block or record sizes on your shares larger than the current 128K. You can now have 256K, 512K, and 1M. This is important for some great performance boosts on different types of workloads.
This feature is a deferred update in AK8.1. In other words, just upgrading to AK8.1 does not turn on this feature. Also, after you apply the deferred update, you will NOT be able to roll-back to a previous version. So test and check before you apply.
Here is my screen after the upgrade. Note the new "Deferred Update" section that appears.
If you click the "More info" button, you will see the following help file:
So, before I apply the deferred update, if I go into a share, I will see the large block sizes in the pull-down but note that they are greyed out.
After I apply the update, which does NOT require a reboot and is very quick, the same menu now has black entries that I can choose.
The first minor release of code 2013.1 is now out and can be downloaded in MOS.
It is 2013.1.1.0, or AK8.1
Along with some bug fixes, it has three main new features:
The drivers for supporting the 8300 Series adapters are available in 2013.1.1.0. In ZFS Storage Appliance, it supports SAN traffic at line rate, 16Gbps Fibre-Channel speeds at extremely low CPU usage with full hardware offloads. This extreme performance eliminates potential I/O bottlenecks in today's powerful multiprocessor, multicore servers.
Enable support for block/record sizes bigger than 128k (256k, 512k and 1M) for filesystems or LUNS. The implementation for this includes a deferred update detailed in the Deferred Updates and Remote Replication Compatibility with Large Block/Recordsize Update sections below.
In order to better utilize the performance of high-speed storage devices, such as SSDs, improvements have been made to ZFS's algorithm for committing transaction groups. Specifically, the Storage Pool Allocator (SPA) sync process has been improved to parallelize some operations so that it spends a larger percentage of time writing data to the pool devices.
One of the important new features of the AK8 code is the ability to use multiple IP addresses on the same physical network port. This feature is called VNICs, or Virtual NICs. This allows us to no longer "burn" a whole port in a cluster when one cluster peer owns a network port. Traditionally, we have had to leave Net0 empty on controller 2, because it was used for managing controller 1. Vice versa for Net1 on Controller 1. Then, if you had data going over 10GigE ports, you probably only had half of your ports running at any given time, and the partner 10GigE port on the other controller just sat there, doing nothing, unless the first controller went down.
What a waste. Those days are over.
I want to thank and give a big shout-out to our good partner, OnX Enterprise Solutions, for allowing me to come into their lab and play around with their 7320 to do this demo. They let me make a big mess of their lab for the day as I played around with VNICs. If you're looking for a partner who knows Oracle well and can also piece together a solution from multiple vendors to get you what you need, OnX is a good choice. If you would like to talk to your local OnX rep, you can contact Scott Gill at Scott.Gill@Onx.com and he can point you in the right direction for your area.
Here we go:
Here is what your Datalinks window looks like BEFORE you upgrade to AK8.
Here's what the same screen looks like after you upgrade. See the new box?
So here is my current network setup. I have my 4 physical interfaces setup each with an IP address. If I ping them, no problems.
So I can ping 180, 181, 251, and 252. However, if I try to ping 240, it does not work, as the 240 address is not being used by any of these interfaces, right?
Let's change that.
Here, I'm going to make a new Datalink by clicking the Datalink "Plus sign" button. I will check the VNIC box and tell it to use igb2, even though another interface is already using it.
Now, I will create a new Interface, and choose "v_dl2" for its datalink.
My new network screen looks like this.
A few things to take note of here. First, when I click the "igb2" device, it only highlights dl2 and int2. It does not highlight v_dl2 or v_int2.
I think it should, but OK, it looks like VNICs don't highlight when you click the device.
Second, note how the underscore character in v_dl2 and v_int2 does not seem to show on this screen. You can see it plainly if you go in and edit them, but from here it looks like a space instead of an underscore. Just a cosmetic bug, but something to be aware of.
Now, if I click the VNIC datalink "v_dl2", on the other hand, it DOES highlight the device it belongs to, as it should. Seen here:
Note that it did not, however, highlight int2 with it, even though int2 is connected to igb2. That's because we clicked v_dl2, which int2 has nothing to do with. So I'm OK with that.
So let's try pinging 240 now. Of course, it works great.
So I now make another VNIC, and call it v_dl3 using igb3, and v_int3 with an address of 241. I then setup three shares, using ports 251, 240, and 241.
Remember that IP 251 and 240 both are using the same physical port of igb2, and IP 241 is using port igb3.
Next, I copy a folder full of stuff over to all three shares at the same time. I have analytics going so I can see the traffic. My top chart is showing the logical interfaces, and the bottom chart is showing the physical ports.
Sure enough, look at the igb2 and vnic1 interfaces. They equal the traffic going over the igb2 physical port on the second chart. VNIC2, on the other hand, gets igb3 all to itself.
This would work the same way with 10Gig or Infiniband ports. You can now have multiple IP addresses and even completely different subnets sharing the same physical ports. You may need to make route table entries for that. This allows us to use all of the ports you paid for with no more waste.
Very, very cool.
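What the two analytics charts are telling you can be written down as a small Python sketch: the traffic on a physical port is just the sum of the logical interfaces riding on it. The interface and device names come from this demo, but the throughput numbers below are made up for illustration and are NOT readings from the lab system. The function name `per_port_traffic` is hypothetical.

```python
from collections import defaultdict

# Which logical interface rides on which physical port in this demo:
# int2 (IP 251) and v_int2 (IP 240) share igb2, v_int3 (IP 241) uses igb3.
INTERFACE_TO_PORT = {"int2": "igb2", "v_int2": "igb2", "v_int3": "igb3"}


def per_port_traffic(interface_mbps):
    """Sum per-interface throughput into per-physical-port totals."""
    totals = defaultdict(float)
    for interface, mbps in interface_mbps.items():
        totals[INTERFACE_TO_PORT[interface]] += mbps
    return dict(totals)


# Made-up sample numbers, only to show the arithmetic.
sample = {"int2": 40.0, "v_int2": 35.0, "v_int3": 50.0}
print(per_port_traffic(sample))  # {'igb2': 75.0, 'igb3': 50.0}
```

A VNIC does not add bandwidth to a port. It only lets more than one IP address share the bandwidth that is already there.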
One small "bug" I found when doing this. It's really not a bug, it was designed to do this when VNICs were not around. But now that we have VNIC capability, they should probably change this. I've alerted the engineering team about this and they're looking into it, so perhaps it will be fixed in a later code.
Here it is. Remember when we made the new VNIC datalink, I specifically said to click on the "Plus Sign" button to create it? I don't always do that. I really like to use the drag-and-drop method to create my datalinks in the network screen.
HOWEVER, if you were to do that for building a VNIC, it will mess you up a little. Watch this.
Here, I'm dragging igb3 over to make a new datalink. igb3 is already being used by dl3, but I'm going to make this a VNIC, so who cares, right?
Well, the ZFSSA does not KNOW you are going to make it a VNIC, now does it? So... it works as designed and REMOVES the igb3 device from the current dl3 datalink in the background. See how it's now missing? At the same time, the dl3 datalink choice is missing from my list of possible VNICs for me to choose from!!!! Hey!!! I wanted to pick dl3. Why isn't it on the list??? Well, it can't be on this list because dl3 no longer has a device associated with it. Bummer for you.
When you click cancel, the device is still missing from dl3.
The fix is easy. Just edit dl3 by clicking the pencil button, do absolutely nothing, and click "Apply". The device will magically come back.
Now, make the VNIC datalink by clicking the "Plus Sign" button. Sure enough, once you check the VNIC box, dl3 is a valid choice. No problem.
That's it for now. Have fun with VNICs.
This came up today and I want to make sure it's clear.
Remember the "deferred update" I spoke about in my "Upgrade to AK8" entry just a bit ago? It's important to understand that this deferred update changes the way replication works. It is necessary that systems with the deferred update applied only replicate with other systems that have also had this deferred update applied. So if you apply it, your system can NOT replicate with ANY other system that has NOT had it applied, even if that other system is running AK8!!! Got it???
Remember, we do have a new version of the 2011 code for the older systems that do not want to upgrade to AK8. This 2011.1.8 code ALSO HAS this same deferred update in it. So, if you upgrade your system to AK8, and then apply the deferred update, and you have another system running either 2011.1.8 or AK8, you can replicate with them again once they apply the deferred update for multiple initiator groups. Yes, even if you're not using LUNs. Here is what it looks like if you try. It will fail.
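The replication rule above boils down to a simple check, sketched here in Python. This is just my reading of the rule written as code, not anything the appliance exposes. The function name `can_replicate` is hypothetical, and the case where NEITHER system has applied the deferred update is an assumption (this post only covers the case where at least one side has applied it).

```python
def can_replicate(source_has_update, target_has_update):
    """Return True if two systems can replicate with each other.

    Each argument says whether that system has applied the deferred
    update, regardless of whether it runs AK8 or 2011.1.8.
    """
    if source_has_update or target_has_update:
        # Once either side has applied it, BOTH sides must have it.
        return source_has_update and target_has_update
    # Assumption: two systems without the update keep replicating as before.
    return True


print(can_replicate(True, True))    # True
print(can_replicate(True, False))   # False
```

Note that the code level does not appear in the check at all. What matters is whether the deferred update has been applied on both ends.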
Ok I told you I would give you the bad news of AK8 to go along with all the cool new stuff, so here it is. It's not that bad, really, just things you need to be aware of.
First, the 2013.1 code is being called OS8, AK8 and 2013.1 by different people. I mean different people INSIDE Oracle!! It was supposed to be easy, but it never is. So for the rest of this blog entry, I'm calling it AK8.
AK8 is not compatible with the 7x10 series. Ever. The 7x10 series is not supported with AK8, and if you try to upgrade one, it will fail at the healthcheck.
All 7x20 series, all of them regardless of age, are supported with AK8.
Drive trays. Let's talk about drive trays and SAS cards. The older drive trays for the 7x20 series were called the "Riverwalk 2" or "DS2" trays. They were technically the "J4410" series JBODs that Sun used to sell a la carte before we stopped selling JBODs. Don't get me started on that, it still makes me mad. We used these for many years, and you can still buy them right now until December 15th, 2013, when they will no longer be sold. The DS2 tray only came as a 4u, 24 drive shelf. It held 3.5" drives, and you had a choice of 2TB, 3TB, 300GB or 600GB drives. The SAS HBA in the 7x20 series was called a "Thebe" card, with a part # of 7105394. The 7420, for example, came standard with two of these "Thebe" cards for connecting to the disk trays. Two Thebe cards could handle up to 12 trays, so one would add two more cards to go to 24 trays, or have up to six Thebe cards to handle 36 trays. This card was for external SAS only. It did not connect to the internal OS drives or the Readzillas, both of which used the internal SCSI controller of the server.
These Riverwalk 2 trays ARE supported with AK8. You can upgrade your older 7420 or 7320, no problem, as-is. The much older Riverwalk 1 trays or J4400 trays are NOT supported by AK8. However, they were only used by the 7x10 series, and we already said that the 7x10 series was not supported.
Here's where it gets tricky. Since last January, we have been selling the new style disk trays. We call them the "DE2-24P" and the "DE2-24C" trays. The "C" tray is for capacity drives, which are 3.5" 3TB or 4TB drives. The "P" trays are for performance drives, which are 2.5" 300GB and 900GB drives. These trays are NOT Riverwalk 2 trays, even though the "C" series may kind of look like it. Different manufacturer and different firmware. They are not new. Like I said, we've been selling them with the 7x20 series since last January. They are the only disk trays we will be selling going forward. Of course, AK8 supports them.
So what's the problem? The problem is going to be for people who have to mix drive trays.
Remember, your older 7x20 series has Thebe SAS2 HBAs. These have 2 SAS ports per card. The new ZS3-2 and ZS3-4 systems, however, have the new "Thebe2" SAS2 HBAs. These Thebe2 cards have 4 ports per card. This is very cool, as we can now do more SAS channels with fewer cards. Instead of needing 4 SAS cards to grow to 24 trays like we did with the old Thebe cards, I can now do 24 trays with only 2 Thebe2 cards. This means more IO slots for fun things like Infiniband and 10G. So far, so good, right? These Thebe2 cards work with any disk tray. You can even mix older DS2 trays with the newer DE2 trays in the same system, as long as you have Thebe2 cards.
Ah, there's your problem. You don't have Thebe2 cards in your old 7420, do you? Well, I told you the bad news wasn't that bad, right? We can take out your Thebe cards and replace them with Thebe2. You can then plug your older DS2 trays right back in, and also now get newer DE2 trays going forward. However, it's important that the trays are on different SAS channels. You can mix them in the same system, but not on the same channel. Ask your local SC if you need help with the new cable layout. By the way, the new ZS3-2 and ZS3-4 systems also include a new IO card called "Erie" cards. These are for INTERNAL SAS to the OS drives and the Readzillas. So those are now SAS2 instead of SATA like the older models. Yes, the Erie card uses an IO slot, but that's OK, because the Thebe2 cards allow us to use fewer SAS HBAs to grow the system, right?
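The card counts quoted above work out as follows. This Python sketch derives a trays-per-card figure from the numbers in this post (two Thebe cards for 12 trays, two Thebe2 cards for 24 trays) and assumes cards are always installed in pairs, which matches the examples here but is my assumption, not a published rule. The function name `hbas_needed` is hypothetical, and your SC should confirm any real configuration.

```python
import math

# Derived from this post: 2 Thebe cards handle 12 trays (6 per card),
# and 2 Thebe2 cards handle 24 trays (12 per card).
TRAYS_PER_CARD = {"Thebe": 6, "Thebe2": 12}


def hbas_needed(trays, card):
    """Estimate how many SAS HBAs of a given type a tray count needs."""
    cards = math.ceil(trays / TRAYS_PER_CARD[card])
    # Assumption: cards are added in pairs, with a minimum of two.
    if cards % 2:
        cards += 1
    return max(cards, 2)


print(hbas_needed(24, "Thebe"))   # 4
print(hbas_needed(24, "Thebe2"))  # 2
print(hbas_needed(36, "Thebe"))   # 6
```

At 24 trays that is two IO slots freed up, which is more than enough to cover the slot the Erie card takes.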
That's it. Not too much bad news and really not that bad. AK8 does not support the 7x10 series, and you may need new Thebe2 cards in your older systems if you want to add on newer DE2 trays. I think we can all agree that there are worse things out there. Like our Congress.
Next up.... More good news and cool AK8 tricks. Such as virtual NICS.
Ok, so you will hear some great stuff about AK8, but are you going to upgrade your production system to a new major release right after it comes out? Probably not. If you have a test system or a lab system you can play with, then I highly recommend upgrading it so you can start to see the new performance features that AK8 can give you. If you only have one system, or they're all in production, then of course you're going to wait for the first minor release of the new code, aren't you? I would too. I'm told the first minor is coming out in just a few weeks. It is the release they used for the public benchmark performance testing. So you can feel more confident in that release. You may also be able to talk to your local sales team about getting a demo unit. Then, you can play with the new code in a safe lab area before upgrading your production system.
Next up... The negative aspects of upgrading to AK8. It's not too bad, but you will need to know which older systems can't do it, how to work with older disk trays, and whether or not you can replicate newer systems with older systems.
Hey, I told you I wasn't just going to blow sunshine on you all the time, right? I can spit out the kool-aid as well as drink it! :)
The upgrade to OS8, AK8 or whatever we are calling it this week was pretty straightforward. It will take some extra time, as it has to perform some one-time jobs the first time it reboots, but it wasn't more than 15 minutes. Your mileage may vary, it's possible on larger systems that it takes longer. There is also a deferred update I will show you down below that you can choose to do right away or later. Once you do that deferred update, you do NOT want to roll back to the previous version, so be warned.
It's been over 1.5 years since the last major update, so many of you probably have never done one before. The process is just like a minor update, it just takes longer.
Get the update from MOS and unzip it to a folder. Go ahead and upload it and unpack it like normal from your Maintenance-->System screen. I did like how it tried to tell me how much time was left, but the numbers were all over the place, and it was over by the time it was correct.
Now, when you click the arrow to apply the update, the normal health check window appears, but you will notice something extra. That's the 'Deferred Update' choice. You can make it apply as soon as it reboots, or you can manually apply it later. Remember, you do NOT want to roll back after this is applied. I chose "Upon Request". Click the "Check" button, and if all is well, click "Apply".
After it installs and reboots, you can look at the command line via serial port or SSH. You will notice a few things are different during this boot-up.
Right after the "Updating ####" section you can see it actually upgrading various services and the SMF repository. This can take around 3 minutes, but if you have a lot of aggregations or IPMP then it could take longer. So relax. You can see mine, below, which went 290 seconds, and then continued upgrading other stuff.
The upgrade continues, and the screen is pretty obvious.
When you see it configuring network devices, you're almost done. You can see the new code level, and it's about to go to the login prompt. At that point, you should be able to log back into the BUI.
Log back into the BUI, and you will see the new version is the current version in Maintenance-->System
Now, let's do the deferred update on the same screen.
You can read about the deferred updates here, and click apply when ready to add them. In this case, it's for the ability to associate multiple initiator groups with a LUN, something we have wanted for some time now, so very cool. Note that ANY other deferred updates you have not applied yet will also apply, as there is no way to pick and choose. Either they all apply or none do. Remember I said not to roll-back to a previous version of the code after you do this? It will let you, but if you do, your LUN operations will fail. No bueno. Don't do it. The deferred upgrades are one-way.
Note that the deferred update does NOT force a reboot.
Once you apply the deferred updates, the whole deferred update area goes away, and the screen now looks like this.
Do you want to see something cool right away now in OS8 that you could not do before? There's a lot I will talk about later, but for now, since you're so excited, go to Configuration-->Alerts, and create a new Threshold Alert. Notice the new Capacity threshold alerts, where you can now get emails or create an action when a pool, a project, or a share goes over, say, 80% full. Sweet.
Well, they said it would be released on October 8th, and they did not disappoint.
The new code, internally called 2013.1.0.1 and what marketing is calling ZFSSA OS8 or AK8 is out now. Download from MOS.
The numerous updates are hard to all get a handle on at once. This readme file will help: https://wikis.oracle.com/display/FishWorks/ak-2013.1.0.1+Release+Notes
I will be loading it, playing with it, and showing some of my favorite things coming up soon, as in the next few days.
Many of the improvements are ones you can not see, such as the improved ARC and RAID benefits.
Lots to talk about. Especially if you need to mix trays. Be careful. Read the file. Stay tuned.
This is pretty cool. It seems the ZS3-4 just became the number 1 system in performance on Storage Performance Council's benchmark site.
The email below went out today to all SPC members.
I would like to point out that we are also the LEAST EXPENSIVE system per SPC-2 performance. Check out our Price/Performance numbers.
So we came in at 17,244 for a $388,472 system, for a price/performance of $22.53.
Now compare that to the 2nd place system on the site, which is HP's P9500. It came in at 13,147 for a huge price of $1,161,503 and a price/performance of $88.34
We KILLED it....
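The price/performance figure is nothing more than total price divided by SPC-2 MBPS. Here is that arithmetic as a short Python sketch, using the Oracle figures quoted in this post and the two price/performance numbers quoted above. The function name `price_performance` is just an illustrative helper.

```python
def price_performance(total_price_usd, spc2_mbps):
    """Dollars paid per SPC-2 MBPS delivered (lower is better)."""
    return total_price_usd / spc2_mbps


# Oracle ZS3-4 figures as quoted in this post.
oracle = price_performance(388_472.03, 17_244.22)
print(f"${oracle:.2f} per SPC-2 MBPS")  # $22.53 per SPC-2 MBPS

# Ratio of the two quoted price/performance numbers ($88.34 vs $22.53).
print(f"{88.34 / 22.53:.1f}x")  # 3.9x
```

In other words, each unit of SPC-2 throughput on the second place system costs nearly four times as much.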
Oracle Corporation has submitted the SPC-2 Result™ listed below.
The Executive Summary and Full Disclosure Report (FDR) are posted in the Benchmark Results section of the website.
The documents may be accessed by using the URL listed below:
SPC-2 Submission Identifier .... B00067
SPC-2 MBPS™ …………………….... 17,244.22
SPC-2 Price-Performance™ …… $22.53/SPC-2 MBPS™
Total ASU Capacity ………….….. 31,610.959 GB
Data Protection Level ………..…. Protected 2 (Mirroring)
Total Price ………………………….... $388,472.03
Congratulations to Oracle for an outstanding SPC-2 Result, which established a new #1 for SPC-2 performance (17,244.22 SPC-2 MBPS™).
Storage Performance Council (SPC)
It's official and we can finally be excited about the new ZS3 family. We can start talking about it now and start ordering it on Thursday. It won't actually ship, however, until next month on October 8th. I know, I know... but hey I CAN give you something TODAY... How about the 4TB drives, available right now??? Also, the new 1.6TB Readzillas and the 16-port 4x4 SAS HBAs are all here this Thursday, Sept 12th. Not bad, right?
So we now have three new systems:
1- The new ZS3-2
2- The new ZS3-4
3- The updated 7420M2 with internal SAS
The first two will ship with the new OS8 code. The 3rd one will ship with the older OS7 (2011.1.7) code, but can be updated to OS8 at anytime. (see my last blog entry about the new OS names)
Ok, here is the low-down. The new 4TB drives are for the DE2-C trays which have been out since last December. I do NOT YET KNOW if they will also be available for the older DS2 trays, but I will tell you when I find out. This is important--- The new 1.6TB Readzillas are SAS, not SATA, so they will only work in the new ZS3 series and the new 7420M2 box. Your older 7420 and 7320 use internal SATA, not SAS, for their Readzillas and system OS drives. The new 900GB OS drives and the new 1.6TB Readzillas are SAS, so you need the newer versions to work with them.
The LAST order date for the current 7420 is September 30, 2013, and the LOD for the 7320 and 7120 is November 30, 2013.
You can get a new product datasheet or the product announcement from your local storage SC.
As you now know, version 2011.1.7.0 is the current shipping code for our ZFSSA. You really want to be running this code, no matter what ZFSSA system you have. This code will work all the way back to 7x10 systems. There are known bugs even in the previous code, 2011.1.6.0, that this newer code fixes, so get on it.
Let's talk for a moment about code names and numbers, as it's going to change from your point of view very soon. Many years ago, Sun Microsystems created the "Fishworks" team to create this code that we now run on the ZFS Storage appliance. You can still see Fishworks and the original team names if you "Shift-Click" the Oracle/Sun logo in the top left corner of your ZFSSA. (There are MANY secret Shift-Click operations in the ZFSSA. I told you about some back in my blog on analytics here: https://blogs.oracle.com/7000tips/entry/fun_tips_with_analytics) By the way, FISH stands for "Fully Integrated Software & Hardware."
So the code that Fishworks created is a layer between you, the user, and the special version of Solaris and ZFS underneath. This is called an "Appliance Kit", and you will see all sorts of system names with an "AK" on them, which are directly linked to the Appliance Kit. It is basically the code for the interface, both the GUI and the CLI, which you all know and love. Internally at Oracle, the Fishworks team (which Oracle has grown far beyond the original team) calls the code levels for the ZFSSA "AK#####". For example, the code level you are all running right now is called AK7. It has minor updates to it, but the major code is AK7, with a minor now of 04.24.7, so really the last code level released is AK7.04.24.7. You have all been calling it "2011.04.24.7", because in the past they used the year the major release came out as its name. For obvious reasons, this no longer makes sense. People think the current code they're running was made two years ago in 2011, but that's just not the case. This last release bears almost no resemblance to the original AK7 code. So much has changed in it.
So, to make things simpler, Oracle is dropping the year on the code, and will now call it AK#.#.#, starting with the upcoming release of AK8. In all likelihood, there will still be one more minor release of AK7 coming first, so don't put off upgrading as if AK8 were just around the corner. It's still going to be a few months, and you don't want to hit a bug before that, so upgrade when you can to AK7-7 (my nickname for the current release).
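To make the renaming concrete, here is a minimal sketch of how an old year-based name lines up with its AK name. The function name and the lookup table are hypothetical, just for illustration; the only mapping taken from this post is that the 2011 code is AK7.

```python
# Hypothetical lookup: year used in the old names -> AK major number.
# Only the 2011 -> 7 pair comes from this post; nothing else is assumed.
YEAR_TO_AK_MAJOR = {"2011": 7}

def to_ak_name(old_name):
    """Convert a year-based name like '2011.04.24.7' to 'AK7.04.24.7'."""
    year, _, minor = old_name.partition(".")
    if year not in YEAR_TO_AK_MAJOR:
        raise ValueError(f"no known AK major for year {year!r}")
    return f"AK{YEAR_TO_AK_MAJOR[year]}.{minor}"

print(to_ak_name("2011.04.24.7"))
```

The minor part of the name is carried over untouched; only the year prefix is swapped for the AK major number.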
AK8 will be a game-changer. I'm not allowed to talk about it too much, but speak with your local storage SC and maybe they can give you a heads-up. HUGE stuff coming folks. Just the performance enhancements are going to be a world-changing event in the storage industry. If you have a 7x20 series system, going to AK8, without doing anything else, is going to make your system better and faster.
You can see all of the software release history here: https://wikis.oracle.com/display/FishWorks/Software+Updates
In case you were not paying attention, code 2011.1.7 is now out.
I've talked about HCC (Hybrid Columnar Compression) before. This is Oracle's built-in compression feature, free of charge in 11gR2, that allows a CRAZY amount of compression on historical data inside an Oracle database. It only works if the database is being stored in a ZFSSA, Exadata or Axiom. You can read all about it in this whitepaper, which shows the huge value of HCC when used with the ZFSSA. http://www.oracle.com/technetwork/articles/servers-storage-admin/perf-hybrid-columnar-compression-1689701.html
Now, even better, Oracle has announced a great new feature in Oracle 12c called "Automatic Data Optimization". This allows one to set up HCC to AUTOMATICALLY compress data AS IT AGES.
So this is now ILM all built into the Oracle database. It's free for crying out loud. It just needs to be sitting on Oracle storage, such as the ZFSSA, Exadata or Axiom.
It seems Wikibon has done some research and interviews and has written a very nice article on the awesome cost savings of the ZFSSA.
Check it out here:
Here are some of my favorite quotes from the article:
“The high-end ZFS storage array is the highest performing hybrid storage device that has been analyzed by Wikibon, and in a class of its own when it comes to high write-IO environments.”
“Wikibon analyzed the architecture and performance of the ZFS Appliance in depth, and compared it to "traditional storage arrays" (e.g. EMC, NetApp, HDS, HP, etc. mainstream mid-market arrays) in high write environments.”
“For an environment with 100 terabytes, 1,000,000 IOPS and 20% writes, the additional cost of the traditional system (NetApp) is 194% higher than the hybrid system (ZFS).”
“Wikibon members should consider the ZFS Appliance in more demanding workloads where sustained write performance and IO requirements are higher. Examples include high performance environments such as specific backup applications and core transaction-intensive database workloads. In these situations, because of the hybrid design of the ZFS Appliance, customers will find significant savings relative to traditional disk arrays that don't scale as well.”
“CIOs, CTOs and senior storage executives should position the Oracle ZFS appliance as an ideal strategic fit for high streaming environments such as database backups. As well, the product can be successfully integrated into high-performance Oracle database workloads. In write-intensive and heavy IO workloads, the ZFS appliance will likely prove the best-of-breed, lowest cost solution.”
“The general feedback from the ZFS appliance practitioners was positive”
“Praise for the performance of the ZFS, particularly in backup (high-write) environment;”
“7 gigabytes/second write rates achieved in a benchmarks;”
“11 terabytes/hour sustained over 2.5 hours for backup, compared with 1 terabytes/hour for a traditional storage device;”
“ZFS snapshots and clones universally praised;”
“DTrace was praised for the quality and completeness of the performance analytic tool;”
“Compression performance was strongly praised (up to 16x compression), especially for reads;”
“None of the respondents needed to tune the ZFS read or write caching - performance maintenance was minimal;”
“No problems with availability.”
For those who have not noticed, ZFSSA version 2011.1.6.0 came out on April 29th.
Go get it.
Release notes are here:
Someone asked a good question about the clone and snap-manager licenses, so I wanted to clarify.
The info I received is that if you use the new snap-manager for ZFSSA product, all snaps and clones created by that product are covered by its one-time license. You do not need the additional clone license to manage these at all.
However, if you are creating other clones that are NOT created via the snap-manager tool, then yes, you do need the other clone license for these to be supported.
I hope that makes sense.
A few weeks ago, Oracle announced the Oracle database SnapManager software for ZFSSA.
It is a license just like the Clone or the Replication license. It's just a one-time, yes-or-no, on-or-off license per controller. Better yet, you can go ahead and get the software and try it out for free for 30 days. Go check it out with the link below.
The Snap Management Utility combines the underlying snapshot, clone, and rollback capabilities of the Oracle ZFS Storage Appliance with standard host-side processing so all operations are consistent.
Downloading the Oracle Snap Management Utility for Oracle Database Software
A. Customers who purchased the license need to download the software from eDelivery (see
B. Customers who wish to evaluate for 30 days prior to purchase may download from the same site. The license allows a 30-day evaluation period. Follow instructions below.
Instructions to download software:
1. Go to eDelivery link: https://edelivery.oracle.com/EPD/Search/handle_go
3. Accept Terms and Restrictions
4. In the “Media Pack Search” window:
a. Under Product Pack, select “Sun Products”
b. Under Platform, select “Generic”
c. Click “Go”
5. From the results, select the “Oracle Snap Management Utility for Oracle Database”
6. There are two files for download:
a. The “Oracle Snap Management Utility for Oracle Database, Client v 1.1.0” is required
b. The “Sun ZFS Storage Software 2011.1.5.0” is the latest version of the ZFS Storage Appliance SW provided for customers who need to upgrade their software.
UPDATE-7-27-13- Just found out that if you buy the SMU license, you do NOT need to buy the clone license. The cloning is included in SMU, so that's cool.
Ok, so this is pretty cool. If you didn't know, Oracle has this great program called IaaS, which is Infrastructure as a Service. You can go check it out here: http://www.oracle.com/us/products/engineered-systems/iaas/overview/index.html
What this means is that someone who really wants an Oracle engineered system, such as an Exadata, but can't come up with the up-front cost, can do IaaS and put it in their datacenter for a low monthly fee. This can be really cool. Some people can now change their entire budget from Cap-ex to Op-ex, save a bunch of up-front costs, and still get the hardware they need and want.
As of this week, the ZFSBA is now included in the IaaS offering. So one can get the ZFS Backup Appliance and use it to back up their engineered system (Exadata, Exalogic, or SuperCluster) over InfiniBand. They can also use it to then make snaps and clones of that data for their testing and development, as well as use it for general-purpose storage over 10Gig, 1Gig or FC. Pretty sweet way to get the ZFS Storage system in your site without the up-front costs. You can get the ZFSBA in an IaaS all by itself if you want, without the engineered system at all, just to get the ZFS storage.
Now, some of you may be asking, "What the heck is the ZFSBA and how is it different than the ZFSSA?"
I haven't talked about the ZFSBA before. The ZFS Backup appliance. I probably should have. You can get more info on it here: http://www.oracle.com/us/products/servers-storage/storage/nas/zfs-backup-appliance/overview/index.html
Here is the low-down. It's a 7420 cluster with drive trays, all pre-cabled and in a rack, ready-to-go. The 7420 has IB cards in place, and the whole system is a single line-item part number, which gives the sales team an easy way to add a ZFSSA to an engineered system deal for backing up the engineered system. There are two versions, one with high-capacity drives and the other with high-performance drives. Whichever one you get, you can add additional trays of either type later. Unlike the other engineered systems, the ZFSBA does allow one to use the extra space in the rack, which is nice.
So, if you want a 7420 cluster and a rack, is there a downside to always using the ZFSBA to order a 7420? Not many. Same price, easier to order with fewer part numbers. You can still customize it and add more stuff. There is one downside, and that's the fact that the ZFSBA does use the 32-core version of the 7420, not the 40-core version. The backup of an Exadata does not require more cores, so they went with the smaller of the two. If you need more power and more DRAM for faster workloads, however, you may want to build a 7420 ZFSSA the normal way.
If this doesn't make sense, please add a comment below or just email me.
Ok, here are some much better pictures of our two new trays.
The DE2-24P is the 2u performance model, meaning that it holds 2.5" 10,000 RPM drives (and up to four LZ SSDs, of course). These are currently either 300GB or 900GB drives.
The DE2-24C is the 4u capacity model, which holds the larger 3.5" 7,200 RPM drives and LZ drives. These are currently 3TB drives.
One of these days, I really need to update my storage eye charts with these new trays. I just haven't had the time!!!
Hey everybody, happy new year and some great news for the ZFSSA...
The new 2u disk trays have come out early. I was not expecting them until later this quarter, but was surprised yesterday that Oracle announced them ready for sale. Sweet. So we now have a 4u capacity tray for 3TB drives (soon to be 4TB drives), and a 2u high-performance tray with either 300GB or 900GB 10K speed drives. These new 900GB 10K speed drives have the same IOPS as our current 600GB 15K speed drives, since the form factor went from 3.5" to 2.5". So you now can have 24 drives in a 2u tray. Very cool. These new trays require OS 2011.1.5, and right now you can NOT mix them with the older DS2 trays. Being able to mix them will be supported later, however.
To go along with that, the new 2011.1.5 code has been released. You can download it right now in MOS. It fixes a ridiculous number of issues and adds support for these new 2u drive trays. You can read all about the new code here: https://updates.oracle.com/Orion/Services/download?type=readme&aru=15826899
**Update 1-18-13 - I need to correct myself, and I'm adding this note instead of changing what I wrote up above and trying to hide that I messed up... Hey it happens...
At first I was led to believe that the smaller platter size made up for the slower speed on the new 2.5" drives. This is not the case. It does help, but the 10K speed drives do get slightly less IOPS and throughput than the 3.5" 15K speed drives. Not that this matters too much for us, since we pride ourselves on the fact we drive performance with the ZFSSA via our cache, not our spindle speed, but it's important to point out. Now, the power savings and space savings are real, and very much worth using the smaller form factor. Also, you do understand that Oracle does not have a whole lot to do with this? This is the way drive manufacturers are going. They just don't make 2.5" drives at 15K speed. So this is the way it is. Now, at some point sooner rather than later, we will also be putting out an all SSD tray. So if you need fast IOP speeds on the spindles, we will have you covered there, too.
This blog is a way for Steve to send out his tips, ideas, links, and general sarcasm. Almost all related to the Oracle 7000, code named ZFSSA, or Amber Road, or Open Storage, or Unified Storage. You are welcome to contact Steve.Tunstall@Oracle.com with any comments or questions