Friday Jul 27, 2012

Installing new ZFSSA systems

I had not realized how long it has been since my last blog entry. I've been so busy installing ZFSSAs that I've been a flake on my blog. I haven't hit anything Earth-shattering to report, and like most of you, I'm really waiting for the release of the new code, which I heard was supposed to be in July. Hey, we have two more days before we can say it's late, right?  :)

This week, I spent two days setting up and configuring a large 7420 cluster. The funny thing is, a day later, I spent only 30 minutes setting up two different 7420 single-head systems. It's really funny how easy it is to set up a single-head, single-tray system. 

Monday Jun 04, 2012

New Analytic settings for the new code

If you have upgraded to the new 2011.1.3.0 code, you may find some very useful settings for Analytics. If you didn't already know, the analytic datasets have the potential to fill up your OS hard drives. The more datasets you use and create, the faster this can happen. Since they take a measurement every second, forever, some of these metrics can grow to multiple GBs in a matter of weeks. The traditional 'fix' was that you had to go into Analytics -> Datasets about once a month and clean up the largest datasets. You did this by deleting them. Ouch. Now you lost all of that historical data that you might have wanted to check out many months from now. Or, you had to export each metric individually to a CSV file first. Not very easy or fun. You could also suspend a dataset, and have it not collect data at all. Well, that fixed the problem, didn't it? Of course, you now had no data to go look at. Hmmmm....

All of this is no longer a concern. Check out the new Settings tab under Analytics...

Now, I can tell the ZFSSA to keep every second of data for, say, 2 weeks, and then average those 60 seconds of each minute into a single 'minute' value. I can go even further and ask it to average those 60 minutes of data into a single 'hour' value.  This allows me to effectively shrink my older datasets to 1/3600th of their original size !!! Very cool. I can now allow my datasets to go forever, and really never have to worry about them filling up my OS drives.
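
If you like numbers, here is a quick back-of-the-envelope sketch in Python of why this matters. The bytes-per-sample figure is just an assumption for illustration; the real on-disk size per data point varies by metric and by how many breakdowns it keeps.

# Rough estimate of analytic dataset growth. BYTES_PER_POINT is a made-up
# number for illustration only; real sizes vary by metric and breakdown.
BYTES_PER_POINT = 200
SECONDS_PER_DAY = 86400

def dataset_mb(days, points_per_day):
    """Approximate on-disk size in MB for 'days' of retained data."""
    return days * points_per_day * BYTES_PER_POINT / (1024 * 1024)

# Keeping every second forever: one year of one metric
print("1 year of per-second data : %8.1f MB" % dataset_mb(365, SECONDS_PER_DAY))

# With averaging: 2 weeks of seconds, ~3 months of minutes, hours after that
seconds_part = dataset_mb(14, SECONDS_PER_DAY)
minutes_part = dataset_mb(76, SECONDS_PER_DAY // 60)
hours_part   = dataset_mb(275, SECONDS_PER_DAY // 3600)
print("Same year with averaging  : %8.1f MB" % (seconds_part + minutes_part + hours_part))

Run that and the keep-every-second-forever number comes out in the multiple-GB range, while the averaged version stays down in the low hundreds of MB, which is the whole point of the new retention settings.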

That's great going forward, but what about those huge datasets you already have? No problem. Another new feature in 2011.1.3.0 is the ability to shrink the older datasets in the same way. Check this out. I have here a dataset called "Disk: I/O ops per second" that is about 6.32MB on disk. (You need not worry so much about the "In Core" value, as that is in RAM, and it fluctuates all the time. Once you stop viewing a particular metric, you will see that shrink over time. Just relax.) 

When one clicks on the trash can icon to the right of the dataset, it used to delete the whole thing, and you would have to re-create it from scratch to get the data collecting again. Now, however, it gives you this prompt:

As you can see, this allows you to once again shrink the dataset by averaging the second data into minutes or hours.

Here is my new dataset size after I do this. So it shrank from 6.32MB down to 2.87MB, but I can still see my metrics going back to the time I began the dataset.

Now, you do understand that once you do this, when you look back in time at the minute or hour data, you are going to see much coarser time granularity, right? You will need to decide what level of granularity you can live with, and for how long. Check this out.

Here is my Disk: Percent utilized from 5-21-2012 2:42 pm to 4:22 pm:

After I went through the delete process to change everything older than 1 week to "Minutes", the same date and time looks like this:

Just understand what this will do and how you want to use it. Right now, I'm thinking of keeping the last 6 weeks of data as "seconds", and then the last 3 months as "Minutes", and then "Hours" forever after that. I'll check back in six months and see how the sizes look.

Steve 

Friday May 18, 2012

New code is out- Version 2011.1.3.0

 

The newest version of the ZFSSA code, 2011.1.3.0, is now out and available on MOS.

I will be writing more about one of its many new, useful features early next week. It's very cool, and has to do with how you can now change the size of your analytic datasets.

Steve

ak-2011.04.24.3.0 Release Notes

2011.1.3.0

This minor release of the Sun ZFS Storage Appliance software contains significant bug fixes for all supported platforms. Please carefully review the list of CRs that have been addressed and all known issues prior to updating.

Among other issues, this release fixes some memory fragmentation issues (CRs 7092116 and 7105404), includes improvements to DTrace Analytics, and failover improvements to DNS, LDAP, and the SMB Domain Controller.

This release requires appliances to be running the 2010.Q3.2.1 micro release or higher prior to updating to this release. In addition, this release includes update health checks that are performed automatically when an update is started, before the actual update from the prerequisite 2010.Q3.2.1 micro release or higher begins. If an update health check fails, it can cause the update to abort. The update health checks help ensure that component issues which may impact an update are addressed. It is important to resolve all hardware component issues prior to performing an update.

Deferred Updates

When updating from a 2010.Q3 release to a 2011.1 release, the following deferred updates are available and may be reviewed in the Maintenance System BUI screen. See the "Maintenance:System:Updates#Deferred_Updates" section in the online help for important information on deferred updates before applying them.

1. RAIDZ/Mirror Deferred Update (Improved RAID performance)
This deferred update improves both latency and throughput on several important workloads. These improvements rely on a ZFS pool upgrade provided by this update. Applying this update is equivalent to upgrading the on-disk ZFS pool to version 29.

2. Optional Child Directory Deferred Update (Improved snapshot performance)
This deferred update improves list retrieval performance and replication deletion performance by improving dataset rename speed. These improvements rely on a ZFS pool upgrade provided by this update. Before this update has been applied, the system will be able to retrieve lists and delete replications, but will do so using the old, much slower, recursive rename code. Applying this update is equivalent to upgrading the on-disk ZFS pool to version 31.

Supported Platforms

Issues Addressed

The following CRs have been fixed in this release:

4325892 performance decrease if 1st nameserver is down
4377911 RFE for improved DNS resolver failover performance
6822262 Windows Media Service/SQL server cannot connect to cifs share
6822586 SMB should try UDP first to get LDAP SRV record and retry with TCP if truncated UDP response
6941854 idmap_getwinnamebygid() and idmap_getwinnamebyuid() need to work for builtin names
6953716 Exception 'Native message: datasets is undefined' while append dataset to current worksheet
6973870 Specify retention time for Analytics data
6991949 panic will happen during some error injection stress
6996698 The SRAO may terminate irrelevant memory copy
6997450 gcpu_mca_process() doesn't return a right disp for poisoned error
7023548 replacement failed for faulted readzilla
7040757 smb_com_write_andx NULL pointer dereference panic in mbc_marshal_get_uio
7044065 Replay records within a dataset in parallel
7047976 zil replay assertion failure with full pool
7048780 I/O sent from iSCSI initiator embedded in VirtualBox completes with a status TASK SET FULL
7052406 NFS Server shouldn't take zero copy path on READ if no write chunk list provided
7052703 zl_replay_lock needs to be initialised and destroyed
7066080 s11n code generated by fcc should include strings.h
7066138 fcc must define _INT64_TYPE
7066170 configurable max- and min-units for akInputDuration
7066552 NFS Server Fails READS with NFS4ERR_INVAL when using krb5i or krb5p
7071147 DC failover improvements
7071628 ldap_cachemgr exits after failure in profile refresh, only when using sasl/GSSAPI authentication
7071916 ztest/ds_3 missing log records: replayed X < committed Y
7074722 dataspan can be marked read-only even if it has dirty subspans
7080443 LDAP client failover doesn't work
7080790 ZIL: Assertion failed: zh->zh_replay_seq < *replayed_seq (0x1a < 0x1a)
7084762 ztest/ds_3 missing log records: replayed X < committed Y - Part 2
7089422 ldap client uses bindTimeLimit instead of searchTimeLimit when searching for entries
7090133 Large READs are broken with krb5i or krb5p with NFS Zero-Copy turned on
7090153 table-free akInputRadio
7090166 ak_dataspan_stashed needs a reality check
7091223 command to prune datasets
7092116 Extremely sluggish 7420 node due to heap fragmentation
7093687 LDAP client/ldap_cachemgr: long delays in failover to secondary Directory Server
7098553 deadlock when recursive zfs_inactive collides with zfs_unmount
7099848 Phone Home logs still refer to SUN support
7102888 akShow() can clobber CSS
7103620 akInputRadio consumers must be explicit in their use of subinputs as labels
7104363 Influx of snapshots can stall resilvering
7105404 appliance unavailable due to zio_arena fragmentation
7107750 SMB kernel door client times out too early on authentication requests
7108243 ldap_cachemgr spins on configuration error
7114579 Operations per second broken down by share reports "Datum not present" for most time periods
7114890 Ak Build tools should accommodate double-slashes in paths
7117823 RPC: Can't decode result after READ of zero bytes
7118230 Need to deliver CMOS images for Lynxplus SW 1.5
7121760 failure to post an alert causes a umem double-free
7122403 akCreateLabel(): helper function for creating LABEL elements
7122405 several Analytics CLI commands do not check for extra arguments
7122426 akParseDateTime() could be a bit more flexible
7123096 panic: LU is done with the task but LPORT is not done, itask ffffff9f59c540a0 itask_flags 3204
7125626 fmtopo shows duplicate target-path for both sims in a tray starting in 2010.q3.4 and 2011.1.1
7126842 NTLMSSP negotiation fails with 0xC00000BB (NT_STATUS_NOT_SUPPORTED)
7128218 uio_to_mblk() doesn't check for esballoca() failure
7129787 status could be uninitialized in netlogon_logon function
7130441 CPU is pegging out at 98%
7131965 SMB stops serving data (still running) Need to reset smb to fix issue
7133069 smbserver locks up on 7320 running 2011.1. Can't kill the service
7133619 Need to deliver CMOS image for SW 1.3 for Otoro
7133643 Need to deliver CMOS image for SW 1.2 for Otoro+
7142320 Enable DNS defer-on-fail functionality
7144155 idmap kernel module has lock contention calling zone_getspecific()
7144745 Online help Application Integration - MOS should be replaced with OTN
7145938 Add maximum cards for 7420 10GbE, FC, Quad GbE, and InfiniBand
7146346 Online Help: Document Sun ZFS Backup Appliance
7149992 Update doc to include 7320 DRAM
7152262 double-digit firefox version number throws off appliance version checks
7153789 ldapcachemgr lint warnings
7154895 Remove CMOS images for Lynx+ SW 1.5
7155512 lint warning in usr/src/cmd/ldapcachemgr/cachemgr_change.c
7158091 BUI Alert Banner and Wait Dialog are not functioning correctly
7158094 Still not ready for 9005
7158519 dataset class authorization doesn't work as expected
7158522 pruning an unsaved dataset does nothing but looks like it working continuously
7160553 NMI does not panic appliance platforms using apix
7161060 system hang due to physical memory exhaustion seen when major shift in workload
7165883 arc data shrinks continuously after arc grew to reach its steady state

Thursday May 03, 2012

Analytics & Threshold Alerts

Alerts are great not only for letting you know when there's some kind of hardware event, but they can also be pro-active and let you know there's a bottleneck coming BEFORE it happens. Check these out. There are two kinds of Alerts in the ZFSSA. When you go to Configuration-->Alerts, you first see the plus sign by the "Alert Actions" section. These are pretty self-explanatory and not what I'm talking about today. Click on "Threshold Alerts", and then click the plus sign by those.

This is what I'm talking about. The default one that comes up, "CPU: Percent Utilization", is a good one to start with. I don't mind if my CPUs go to 100% utilized for a short time. After all, we bought them to be used, right? If they go over 90% for over 10 minutes, however, something is up, and maybe we have workloads on this machine it was not designed for, or we don't have enough CPUs in the system and need more. So we can set up an alert that will keep an eye on this for us and send us an email if this were to occur. Now I don't have to keep watching it all the time. For an even better example, keep reading...
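
By the way, a threshold alert is conceptually nothing more than a sustained-condition check over the per-second Analytics samples. Here is a tiny Python sketch of that idea; this is only my illustration of the logic, not how the appliance actually implements it, and the threshold and window are just the 90%-for-10-minutes example from above.

from collections import deque

THRESHOLD = 90.0   # percent utilization
WINDOW = 600       # 10 minutes of one-second samples

def watch(samples):
    """samples: iterable of (timestamp, percent_util) pairs, one per second."""
    recent = deque(maxlen=WINDOW)
    for ts, util in samples:
        recent.append(util > THRESHOLD)
        if len(recent) == WINDOW and all(recent):
            yield ts            # condition has held for the full 10 minutes
            recent.clear()      # re-arm, roughly like the alert's repost interval

The ZFSSA does all of this for you, of course; the sketch is just to show why the "for at least 10 minutes" part keeps a short CPU spike from paging you at 3am.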

What if you want to keep your eyes on whether your Readzillas or Logzillas are being over-utilized? In other words, do you have enough of them? Perhaps you only have 2 Logzillas, and you think you may be better off with 4, but how do you prove it? No problem. Here in Threshold Alerts, click on the Threshold drop-down box, and choose your "Disk: Percent Utilization for Disk: Jxxxxx 013" choice, which is my Logzilla drive in the Jxxxxx tray.

Wait. What's that? You don't have a choice in your drop-down for the Threshold item you are looking for, such as an individual disk?
Well, we will have to fix that.

Leave Alerts for now, and join me over in Analytics. Start with a worksheet containing the "Disk: Percent utilization broken down by Disk" chart. You do have this, as it's already one of your built-in datasets.

Now, expand it so you can see all of your disks, and find one of your Readzilla or Logzilla drives. (Hint: It will NOT be disk 13 like my example here. Logzillas are always in slots 20, 21, 22, or 23 of a disk tray. Go to your Configuration-->Hardware screens and you can easily find out which drives are which for your system).

Now, click on that drive to highlight it, like this: 

 Click on the Drill Button, and choose to drill down on that drive as a raw statistic. You will now have a whole new data chart, just for that one drive.

 Don't go away yet. You now need to save that chart as a new dataset, which will keep it in your ZFSSA analytic metrics forever. Well, until you delete it.
Click on the "Save" button, the second to last button on that chart. It looks like a circle with white dots on it (it's supposed to look like a reel-to-reel tape spindle).

Now go to your "Analytics-->Datasets", and you will see a new dataset in there for it. 

 Go back to your Threshold Alerts, and you will now be able to make an alert that will tell you if this specific drive goes over 90% for more than 10 minutes. If this happens a lot, you probably need more Readzillas or Logzillas.

I hope you like these Alerts. They may take some time to set up at first, but in the long run you may thank yourself. It might not be a bad idea to send the email alerts to a mail distribution list, instead of a single person who may be on vacation when the alert is hit.  Enjoy. 

Thursday Apr 19, 2012

Route Table Stuff

Let's talk about your Routing Table.

I have never installed a ZFSSA, ever, without having to edit this table. If you believe that you do not need to edit your routing table, then you are wrong.
:)  Ok, maybe not. Maybe you only have your ZFSSA connected to one network with only a few systems on it. I guess it's possible. Even in my simulator, however, I had to edit the routing table so I could use it no matter how I had my laptop connected, at home over a VPN or at work or using a public Wifi. So I'm going to bet a nice dinner that you, or someone, should be checking this out.

First things first. I'm going to assume you have a cluster. I try really hard to only sell clusters, but yes, I know there are plenty of single-nodes out there too. Single-node people can skip these first two paragraphs. It's very important in your cluster to have a 1GigE management interface to each of the two controllers. You really want to be able to manage each controller, even when one of them is down, right? So best practice is to use the 'igb0' port for controller 1 management and to use the 'igb1' port for controller 2 management. It's important to make these ports 'Private' in the cluster configuration screen, so they do NOT fail over to the other controller when a cluster takeover takes place for whatever reason. Igb0 and igb1 are two of the four built-in 1GigE ports. You can still use igb2 and igb3 for data, either alone or as an aggregate, and don't make them private, so they DO fail over in a cluster takeover event. Now go to your remote workstation, which may be on a different subnet, and you should be able to ping and connect to Controller 1 using igb0.
Now, back to the routing table. You have probably noticed that you cannot ping or connect to the other controller, and you think something is wrong. Not to worry, everything is fine. You just need to tell your routing table, which is shared between the heads, how to talk to that other port, igb1. You see, you already have a default route set up for port igb0; that's why it works. Your new, private igb1, however, does not know how to speak back to the remote system you are now using to manage the BUI from a different subnet. So, make a new default route for igb1 and point it to the default gateway, which is the router it needs to use in order to cross subnets. See the picture below. Note how I have a default route for "ZFS1-MGMT" for port igb0. This shows a green light because I'm currently on ZFS1, and it sees this port just fine. I also have a default route for "ZFS2-MGMT" from port igb1. This route has a blue light, showing it as inactive. That's because this controller, ZFS1, has nothing plugged into its igb1 port. That's perfect. Hit "Apply". Now count to 10. Now, from your remote host, go ahead and ping or connect to Controller 2, and it works!!! This is because your controllers share a routing table, and when you added that igb1 route, it propagated over to the other controller, where igb1 is plugged in, so over there that route has a green light and works fine. You will see from Controller 2's point of view that igb1 has a green light and igb0 has a blue light.  (continued below the picture)

Now it's time to set up any static routes you may need. If you have different subnets for your 1GigE management and your IB or 10GigE data (a very good idea), then you will need to make these. It's important to have routes for this, as you do not want data coming in over the 10GigE pipe but then returning over the 1GigE pipe, right? That will happen if this is not set up correctly. Make your routes, as the picture example shows, with a 10Gig aggregate here we called "Front-end-IP". Any traffic coming in from subnet 172.20.69 will use this pipe.

Lastly, check your multi-homing model button up top. I like 'Adaptive'. Loose is the default, and makes it so your packets can traverse your routes, even though they may go over the wrong route, so it seems like your system is working. This can very well be an illusion. Your ping may work, but it may be coming from the wrong interface, as "Loose" basically means the ZFSSA just doesn't care or enforce any rules. "Strict", on the other hand, is great if you want total enforcement. If you are very good with your routes, and are positive you have it right, and want to ensure that a packet never goes the wrong way, even if that means dropping the packet, then use Strict. I'm using Adaptive here, which is a happy medium.  From the help file: The "Adaptive" choice will prefer routes with a gateway address on the same subnet as the packet's source IP address: 1) An IP packet will be accepted on an IP interface so long as its destination IP address is up on the appliance. 2) An IP packet will be transmitted over the IP interface tied to the route that most specifically matches an IP packet's destination address. If multiple routes are equally specific, prefer routes that have a gateway address on the same subnet as the packet's source address. If no eligible routes exist, drop the packet.
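
If it helps to see the "Adaptive" behavior written out, here is a rough Python sketch of that selection logic. It's my own simplification of the help-file description above (real routing also considers interface state, and my same-subnet test just assumes /24 networks), so treat it as an illustration, not the appliance's actual code. The addresses are made up.

import ipaddress

# Each route: (destination network, gateway, interface). A default route is 0.0.0.0/0.
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"),      ipaddress.ip_address("10.1.1.1"),    "igb0"),
    (ipaddress.ip_network("0.0.0.0/0"),      ipaddress.ip_address("10.1.2.1"),    "igb1"),
    (ipaddress.ip_network("172.20.69.0/24"), ipaddress.ip_address("172.20.69.1"), "Front-end-IP"),
]

def pick_route(src_ip, dst_ip):
    """Adaptive model: most-specific match wins; ties prefer a gateway on the source's subnet."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    candidates = [r for r in ROUTES if dst in r[0]]
    if not candidates:
        return None                      # no eligible route: drop the packet
    best_prefix = max(r[0].prefixlen for r in candidates)
    best = [r for r in candidates if r[0].prefixlen == best_prefix]
    for net, gw, ifname in best:         # prefer a gateway sharing the source's subnet
        if ipaddress.ip_network(f"{src}/24", strict=False) == ipaddress.ip_network(f"{gw}/24", strict=False):
            return ifname
    return best[0][2]

print(pick_route("172.20.69.50", "172.20.69.77"))   # -> Front-end-IP, the 10Gig pipe, not management

The point of the sketch is the same point as the static route above: with the right routes in place, traffic that arrives on the 10GigE data subnet goes back out over the 10GigE data subnet.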

Update 4/23/12- My colleague Darius (https://blogs.oracle.com/si/) rightfully wanted me to point out how important it is to set up a static route for replication. You do not want replication to go over a private management port by mistake, as this will cause it to fail when one controller or the other goes down for maintenance.

I hope this helps. Routing can be fun. 

Saturday Apr 14, 2012

New SPC2 benchmark- The 7420 KILLS it !!!

This is pretty sweet. The new SPC2 benchmark came out last week, and the 7420 not only came in 2nd of ALL speed scores, but came in #1 for price per MBPS.

Check out this table. The 7420 score of 10,704 makes it really fast, but that's not the best part. The price one would have to pay in order to beat it is ridiculous. You can go see for yourself at http://www.storageperformance.org/results/benchmark_results_spc2
The only system on the whole page that beats it was over twice the price per MBPS. Very sweet for Oracle.

So let's see, the 7420 is the fastest per $.
The 7420 is the cheapest per MBPS.
The 7420 has incredible, built-in features, management services, analytics, and protocols. It's extremely stable and as a cluster has no single point of failure. It won the Storage Magazine award for best NAS system this year.

So how long will it be before it's the number 1 NAS system in the market? What are the biggest hurdles still stopping the widespread adoption of the ZFSSA? From what I see, it's three things: 1. Administrator's comfort level with older legacy systems. 2. Politics 3. Past issues with Oracle Support.  

I see all of these issues crop up regularly. Number 1 just takes time and education. Number 3 takes time with our new, better, and growing support team. Many of them came from Oracle, and there were growing pains when they went from a straight software model to having to also support hardware. Number 2 is tricky, but it's the job of the sales teams to break through the internal politics and help their clients see the value in Oracle hardware systems. Benchmarks like this will help.

Thursday Apr 12, 2012

Hybrid Columnar Compression

You heard me in the past talk about the HCC feature for Oracle databases. Hybrid Columnar Compression is a fantastic, built-in, free feature of Oracle 11gR2. One used to need an Exadata to make use of it. However, last October, Oracle opened it up and now allows it to work on ANY Oracle DB server running 11gR2, as long as the storage behind it is a ZFSSA for DNFS, or an Axiom for FC.

If you're not sure why this is so cool or what HCC can do for your Oracle database, please check out this presentation. In it, Art will explain HCC, show you what it does, and give you a great idea why it's such a game-changer for those holding lots of historical DB data.

Did I mention it's free? Click here:

http://hcc.zanghosting.com/hcc-demo-swf.html

Monday Apr 02, 2012

New ZFSSA code release - April 2012

A new version of the ZFSSA code was released over the weekend.

In case you have missed a few, we are now on code 2011.1.2.1. This minor update is very important for our friends with the older SAS1 cards on the older 7x10 systems. This 2.1 minor release was made specifically for them, and fixes the issue that their SAS1 card had with the last major release. They can now go ahead and upgrade straight from the 2010.Q3.2.1 code directly to 2011.1.2.1.

If you are on a 7x20 series, and already running 2011.1.2.0, there is no real reason why you need to upgrade to 1.2.1, as it's really only the Pandora SAS1 HBA fix. If you are not already on 1.2.0, then go ahead and upgrade all the way to 2011.1.2.1.

I hope everyone out there is having a good April so far. For my next blog, the plan is to work off the Analytic tips I did last week and expand on which Analytics you want to really keep your eyes on, and also how to setup alerts to watch them for you.

You can read more and keep up on your releases here: https://wikis.oracle.com/display/FishWorks/Software+Updates

Steve 

 

Wednesday Mar 28, 2012

Fun tips with Analytics

If you read this blog, I am assuming you are at least familiar with the Analytic functions in the ZFSSA. They are basically amazing, very powerful and deep.

However, you may not be aware of some great, hidden functions inside the Analytic screen.

Once you open a metric, the toolbar looks like this:

Now, I’m not going over every tool, as we have done that before, and you can hover your mouse over them and they will tell you what they do. But…. Check this out.
Open a metric (CPU Percent Utilization works fine), and click on the “Hour” button, which is the 2nd clock icon. That’s easy, you are now looking at the last hour of data. Now, hold down your ‘Shift’ key, and click it again. Now you are looking at 2 hours of data. Hold down Shift and click it again, and you are looking at 3 hours of data. Are you catching on yet?
You can do this with not only the ‘Hour’ button, but also with the ‘Minute’, ‘Day’, ‘Week’, and the ‘Month’ buttons. Very cool. It also works with the ‘Show Minimum’ and ‘Show Maximum’ buttons, allowing you to go to the next iteration of either of those.

One last button you can Shift-click is the handy ‘Drill’ button. This button usually drills down on one specific aspect of your metric. If you Shift-click it, it will display a “Rainbow Highlight” of the current metric. This works best if this metric has many ‘Range Average’ items in the left-hand window. Give it a shot.

Also, one will sometimes click on a certain second of data in the graph, like this:

 In this case, I clicked 4:57 and 21 seconds, and the 'Range Average' on the left went away, and was replaced by the time stamp. It seems at this point to some people that you are now stuck, and cannot get back to an average for the whole chart. However, you can click on the actual time stamp of "4:57:21" right above the chart. Even though your mouse does not change into the typical browser finger that most links look like, you can click it, and it will change your range back to the full metric.

Another trick you may like is to save a certain view or look of a group of graphs. Most of you know you can save a worksheet, but did you know you could Sync them, Pause them, and then Save it? This will save the paused state, allowing you to view it forever the way you see it now. 

Heatmaps. Heatmaps are cool, and look like this: 

Some metrics use them and some don't. If you have one, and wish to zoom it vertically, try this. Open a heatmap metric like my example above (I believe every metric that deals with latency will show as a heatmap). Select one or two of the ranges on the left. Click the "Change Outlier Elimination" button. Click it again and check out what it does. 

Enjoy. Perhaps my next blog entry will be the best Analytic metrics to keep your eyes on, and how you can use the Alerts feature to watch them for you.

Steve 

Wednesday Mar 21, 2012

Using all Ten IO slots on a 7420

So I had the opportunity recently to actually use up all ten slots in a clustered 7420 system. This actually uses 20 slots, or 22 if you count the Clustron cards. I thought it was interesting enough to share here. This is at one of my clients here in southern California.

You can see the picture below. We have four SAS HBAs instead of the usual two. This is because we wanted to split up the back-end traffic for different workloads. We have a set of disk trays coming from two SAS cards for nothing but Exadata backups. Then, we have a different set of disk trays coming off of the other two SAS cards for non-Exadata workloads, such as regular user file storage. 
We have 2 Infiniband cards which allow us to do a full mesh directly into the back of the nearby, production Exadata, specifically for fast backups and restores over IB. You can see a 3rd IB card here, which is going to be connected to a non-production Exadata for slower backups and restores from it.
The 10Gig card is for client connectivity, allowing other, non-Exadata Oracle databases to make use of the many snapshots and clones that can now be created using the RMAN copies from the original production database coming off the Exadata. This allows a good number of test and development Oracle databases to use these clones without affecting performance of the Exadata at all.
We also have a couple FC HBAs, both for NDMP backups to an Oracle/StorageTek tape library and also for FC clients to come in and use some storage on the 7420.

 Now, if you are adding more cards to your 7420, be aware of which cards you can place in which slots. See the bottom graphic just below the photo. 
Note that the slots are numbered 0-4 for the first 5 cards, then the "C" slot, which holds the dedicated cluster card (called the Clustron), and then another 5 slots numbered 5-9.

Some rules for the slots:

  • Slots 1 & 8 are automatically populated with the two default SAS cards. The only other slots you can add SAS cards to are 2 & 7.
  • Slots 0 and 9 can only hold FC cards. Nothing else. So if you have four SAS cards, you are now down to only four more slots for your 10Gig and IB cards. Be sure not to waste one of these slots on a FC card, which can go into 0 or 9, instead. 
  • If at all possible, slots should be populated in this order: 9, 0, 7, 2, 6, 3, 5, 4
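
Those rules are easy enough to encode if you want a sanity check before you order or rack the cards. Here's a small Python sketch of my own (not an Oracle tool, and the card names are just labels I picked) that flags an illegal placement:

# Hypothetical helper that checks a proposed 7420 card layout against the slot
# rules above. Slot numbers follow the chassis labeling 0-9 on each controller.
SAS_DEFAULT    = {1, 8}                    # always populated with the two default SAS HBAs
SAS_ONLY_EXTRA = {2, 7}                    # extra SAS HBAs may only go here
FC_ONLY        = {0, 9}                    # these two slots accept nothing but FC HBAs
FILL_ORDER     = [9, 0, 7, 2, 6, 3, 5, 4]  # preferred population order, for reference only

def check_layout(layout):
    """layout: dict of slot number -> card type ('SAS', 'FC', 'IB', '10GbE', 'QuadGbE')."""
    problems = []
    for slot, card in layout.items():
        if slot in SAS_DEFAULT and card != "SAS":
            problems.append(f"slot {slot} must keep its default SAS HBA")
        if card == "SAS" and slot not in SAS_DEFAULT | SAS_ONLY_EXTRA:
            problems.append(f"SAS HBA not allowed in slot {slot}")
        if slot in FC_ONLY and card != "FC":
            problems.append(f"slot {slot} is FC-only; don't waste it on a {card} card")
    return problems

# The fully loaded system described above: 4 SAS, 3 IB, 1 10GbE, 2 FC
example = {1: "SAS", 8: "SAS", 2: "SAS", 7: "SAS", 9: "FC", 0: "FC",
           6: "IB", 3: "IB", 5: "IB", 4: "10GbE"}
print(check_layout(example) or "layout looks legal")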




Monday Mar 12, 2012

Good papers and links for the ZFSSA

So I have a pretty good collection of links and papers for the ZFSSA, and instead of giving them out one-at-a-time when asked, I thought it may be easier to do it this way. Many of the links from my old blog last May no longer work, so here is an updated list of some good spots to check out.

These are for ZFS in general, not the ZFSSA, but they give one good insight into how ZFS functions:


Tuesday Mar 06, 2012

New 7420 hardware released today

Some great new upgrades to the 7420 were announced and released today. You can now get 10-core CPUs in your 7420, allowing you to have 40 cores in each controller. Even better, you can now also go to a huge 1TB of DRAM for your L1ARC in each controller, using the new 16GB DRAM modules.

So your new choices for the new 7420 hardware are 4 x 8-core or 4 x 10-core models. Oracle is no longer going to sell the 2 x CPU models, and they are also going to stop selling the 6-core CPUs, both as of May 31st. Also, you can now order 8GB or 16GB modules, meaning that the minimum amount of memory is now 128GB, and can go to 1TB in each controller. No more 64GB, as the 4GB module has also been phased out (starting today, actually).

Now before you get upset that you can no longer get the 2-CPU model, be aware that there was also a price drop, so that the 4 x 8-core CPU model is a tad LESS than the old 2 x 8-core CPU model. So stop complaining.

It's the DRAM that I'm most excited about. I don't have a single ZFSSA client that I know of that has a CPU bottleneck. So the extra cores are great, but not amazing. What I really like is that my L1ARC can now be a whole 1TB. That's crazy, and will be able to drive some fantastic workloads. I can now place your whole, say 800GB, database entirely in DRAM cache, and not even have to go to the L2ARC on SSDs in order to hit 99% of your reads. That's sweet. 

Friday Feb 24, 2012

New ZFSSA code release today

The first minor release of the 2011.1.1 major release for the ZFSSA came out yesterday.

You can get the code via MOS, under the "Patches and updates" tab. Just click the "Product or Family (advanced)" link, and then type "ZFS" in the search window and it takes you right to it. Or search on its patch ID, which is 13772123.

Along with some other fixes, the most important piece of this update is the RPC flow control fix, which will greatly help those using the ZFSSA to backup an Exadata over Infiniband. 

If you're not already on the major release of 2011.1.1, I urge you to update to it as soon as you can. You can jump right to this new 2011.1.1.1 code, as long as you are already on 2010.Q3.2.1 or higher. You don't need to go to 2011.1.1 first, just jump to 2011.1.1.1.

If you are using your ZFSSA to backup an Exadata, I urge you to get on 2011.1.1.1 ASAP, even if it means staying late and scheduling special time to do it.

It's also important to note that if you have a much older ZFSSA (one of the 7x10 models that are using the older SAS1 HBAs, and not the SAS2 HBAs), you should NOT upgrade to the 2011.1 code. The latest code that supports your SAS1 systems is 2010.Q3.4.2.

 **Update 2-26-12:  I noted a few folks saying the link was down, however that may have been a burp in the system, as I just went into MOS and was able to get 2011.1.1.1 just fine. So delete your cookies and try again. - Steve

Thursday Feb 23, 2012

Great new 7320 benchmark

A great new benchmark has been put up on SPEC for our mid-class 7320. You can see it here:

http://www.spec.org/sfs2008/results/res2012q1/sfs2008-20120206-00207.html

What's cool about this benchmark is the fact this is not only our middle-sized box, but it used only 136 drives to reach this rather high 134,140 NFS Ops/sec number. If you look at the other systems tested here, you will notice that they must use MANY more drives (at presumably a much higher cost) in order to meet or beat those IOPS.

Check these out here... http://www.spec.org/sfs2008/results/sfs2008nfs.html

For example, a FAS6080 should be far faster than our smaller 7320, right? But it only scored 120,011 even though it used 324 disks. The Isilon S200 with 14 nodes and 679 drives only scored 115,911. I would hate to find out what that system's street price is. I'm pretty sure it's higher than our 7320 with 136 drives. Now, of course all of these benchmark numbers are unrealistic to most people, as they are done in perfect conditions with each manufacturer's engineers tuning and tweaking the system the best they can, right? True, but if that's the case, and the other folks tuned and configured those other boxes just like we did, it still seems like a fair fight to me, and our results are just head and shoulders above the rest on a cost-per-IOP basis. I don't see anything on this site that touches our IOPS with the same amount of drives and presumably the same cost price range. Please point out if I missed anything here, I might be wrong.
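
If you want to put the per-drive argument into one number, here is the arithmetic, using only the figures quoted above (street prices, of course, are not public, so this is ops per drive, not ops per dollar):

# Ops/sec per drive, straight from the SPEC SFS2008 numbers quoted above.
systems = {
    "ZFSSA 7320":        (134140, 136),
    "NetApp FAS6080":    (120011, 324),
    "Isilon S200 (14n)": (115911, 679),
}
for name, (ops, drives) in systems.items():
    print(f"{name:18s}: {ops / drives:6.0f} ops/sec per drive")

That comes out to roughly 986 ops/sec per drive for the 7320, versus about 370 for the FAS6080 and about 171 for the 14-node Isilon.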

I really love the ones that go so far overboard on this site... Check out the 140 node Isilon. Let's see... Wow, it's over one million IOPS!!!! That's impressive, until you see it's using 3,360 disk drives. That's funny. PLEASE let me know if you have a 140 node Isilon up and running. I'd love to see it. I'd also love to know what it costs.

Tuesday Feb 07, 2012

Tip- Setting up a new cluster

I haven't given out a real tip for a while now, but this issue popped up on me last week, so I thought I would pass it along. I had a horrible time setting up a new 7320 cluster, for the sole reason that I screwed it up by not doing it in the right order. This caused my install, which should have been done in 1 hour, to take me over 3 hours to complete.

So let me tell you what I did wrong, and then I'll tell you the way I should have done it.

Out of the box, my client's two new 7320 controller heads were one software revision behind, at 2010.Q3.4.2, so I wanted to upgrade them to the newest version of 2011.1.1.1. So far, so good, right? Well here was my mistake. I configured controller A via the serial interface, gave it IP numbers, went into the BUI, and did the upgrade to 2011.1.1.1. No problem. Now, I wanted to bring the other one up and do the same thing. However, I knew that controller B in a cluster must be in the initial, factory-reset state in order to be joined to a cluster.  You can't configure it first, or if you do, you must factory-reset it in order to join a cluster. So I bring controller B up, but I don't configure it, and I go to controller A to start the cluster setup process. Big mistake. The process starts, but because the two controllers are on two different software versions, the cluster process cannot continue. This hoses me (that's southern California slang for "messes me up"), because now controller B has started the cluster setup process, and going to the serial connection just has it hung up in a "configuring cluster" state. Rebooting it does not help, as it's still in the "configuring cluster" state once it comes back up.

So.... now I have 2 choices. I can downgrade controller A back to 2010.Q3.4.2, or I can factory-reset controller B, bring it up as a single controller, upgrade it to 2011.1.1.1, and then factory reset again, and then finally be able to add it to the cluster via controller A's cluster setup process. I opt for the second choice, as I do not want to downgrade controller A, which is working just fine. Remember, controller B is currently hosed, messed up, or wanked, depending on how you want to say it.
It's stuck. So to get it back to a state I can work with, I need to do the trick I talked about way back in this blog on May 31, 2011 (http://blogs.oracle.com/7000tips/entry/how_to_reset_passwords_on). I had to use the GRUB menu, use the -c trick on the kernel line, and reset the machine and erase all configuration on it. Now I could bring it up as a single controller, upgrade it, factory reset it, and then have it join the cluster. That all worked fine, it just took me two hours to do it all.

Here's what I should have done.

Bring up controller A, configure it, and log into the BUI. Now bring up controller B. Do NOT configure it in any way. Using controller A, set up clustering in the cluster menu.

Once the two controllers are clustered and all is well, NOW go ahead and upgrade controller A to the latest code. Once it reboots, go ahead and upgrade controller B. Everything's fine. You see, if the cluster has already been made, it's perfectly fine to upgrade one controller at a time. The software lets you do that. The software does NOT let you set up a NEW cluster if the controllers are not on the same software level. 

So that is the cluster setup safety tip of the day, kids. Have fun. 

Tuesday Jan 31, 2012

New Power Calculator is up

The Oracle Power Calculator for the new 3TB, 600GB, and 300GB drive versions of the ZFSSA is now up and running.

http://www.oracle.com/us/products/servers-storage/sun-power-calculators/calc/s7420-power-calculator-180618.html

From this page, you can click on the "Power Calculators" link on top to go back out to the main screen where you will find power calculators for all of Oracle hardware. 

Friday Jan 20, 2012

New Storage Magazine awards for NAS... Check this out...

Well, it's hard to be quiet about this. Storage Magazine just came out with the January 2012 issue, showing Oracle Storage doing quite well (#1) with the Oracle ZFSSA 7420 and 7320 family. Check out pages 37-43 of this month's Storage Magazine.

Storage Magazine: http://docs.media.bitpipe.com/io_10x/io_103104/item_494970/StoragemagOnlineJan2012final2.pdf (pages 37-43)


Thursday Jan 12, 2012

New ZFSSA simulator download

I've just been informed that the simulator download has been updated to the latest version of 2011.1.1.

So instead of trying to upgrade your older simulator, it is possible to download and install the new one at the latest code. Mine upgraded just fine, but some people report errors during the upgrade, which can happen when using a computer or laptop without enough memory, among a variety of other problems. You can get the simulator here:

http://www.oracle.com/webapps/dialogue/ns/dlgwelcome.jsp?p_ext=Y&p_dlg_id=10521841&src=7299332&Act=45

Tuesday Jan 10, 2012

Even more ZFSSA announcements

The new announcements for the ZFSSA just keep on coming.

Oracle has released today the 3TB drives for the 7420 and 7320 disk trays. So you now can choose 2TB and 3TB 7,200 RPM drives and 300GB and 600GB 15,000 RPM drives in your 7420 and 7320 systems.

Now, the 2TB drives have a last order date of May 31, 2012, so after that it will be 3TB only for the slower-speed drives.

Also, has anyone checked out the new local replication feature that just came out in the 2011.1.1 software release? I'm going to play with it this week and I'll do a write up on it soon.

Steve 

Thursday Jan 05, 2012

New ZFSSA firmware release is available in MOS

 

In case you have not been paying attention, the new 2011.1.1.0 software release for the ZFSSA is out and available for download inside the My Oracle Support website.

To find it, go to the "Patches & Updates" tab, and then do the advanced family search. Type in "ZFSSA" and it will take you right to it (choose 2011.1 in the next submenu).

You need to have your systems on 2010.Q3.2.1 or greater in order to upgrade to 2011.1.1, so be prepared.

It also includes a new OEM grid control plug-in for the ZFSSA.

Here are some details about it from the readme file: 

Sun ZFS Storage Software 2011.1.1.0 (ak-2011.04.24.1.0)

This major software update for Sun ZFS Storage Appliances contains numerous bug fixes and important firmware upgrades. Please carefully review all release notes below prior to updating.
Seven separate patches are provided for the 2011.1.1.0 release:

Features

This release includes a variety of new features, including:

  • Improved RMAN support for Oracle Exadata
  • Improved ACL interoperability with SMB
  • Replication enhancements - including self-replication
  • InfiniBand enhancements - including better connectivity to Oracle Exalogic
  • Datalink configuration enhancements - including custom jumbogram MTUs
  • Improved fault diagnosis - including support for a variety of additional alerts
  • Per-share rstchown support

Performance

This release also includes major performance improvements, including:

  • Significant cluster rejoin performance improvements
  • Significant AD Domain Controller failover time improvements
  • Support for level-2 SMB Oplocks
  • Significant zpool import speed improvements
  • Significant NFS, iSER, iSCSI and Fibre Channel performance improvements due to elimination of data copying in critical datapaths
  • ZFS RAIDZ read performance improvements
  • Significant fairness improvements during ZFS resilver operations
  • Significant Ethernet VLAN performance improvements

Bug Fixes

This release includes numerous bug fixes, including:

  • Significant clustering stability fixes
  • ZFS aclmode support restored and enhanced
  • Assorted user interface and online help fixes
  • Significant ZFS, NFS, SMB and FMA stability fixes
  • Significant InfiniBand, iSER, iSCSI and Fibre Channel stability fixes
  • Important firmware updates

Wednesday Dec 28, 2011

New Storage Eye Charts

My new Storage Eye Chart is out. You can get it from the bookmark link on the right-hand side of this page.

Version 10 adds the Axiom and 2500M2 to a new page and also updates the ZFSSA with the new updates.

I hope everyone out there has a very happy New Year. See you in January. 

Tuesday Dec 06, 2011

New SSDs announced today

Thought you should know about the 3 new announcements for the ZFSSA.

--Write-flash-cache SSDs have gone from 18GB to 73GB each.
--New long-range transceivers for the 10GigE cards are now available
--3TB drives for the 7120 model are here today. The 3TB drives for the 7320 and 7420 are NOT here yet, but close.


Overview
Effective December 6, 2011, we are pleased to announce three new options for Oracle’s Sun ZFS Storage portfolio:
1. Availability of a 73GB Write Flash Cache for 7320 and 7420.  This new SSD features 4X the capacity and almost double the write throughput and IOPS performance of its predecessor.  In comparison to the current 18GB SSD, this new 73GB SSD significantly enhances the system write speed.  As an example, a recent test on a particular 7420 system demonstrated a 7% improvement in system write performance while using half the number of SSDs.  The 73GB SSD is also available to our customers at a lower list price point.  This is available as an ATO or X Option.
2. Availability of the standard Sun 10 GbE Long Range Transceiver for the current 1109A-Z 10GbE card as a configurable option for ZFS Storage Appliance.  This Long Range Transceiver enables 10 GbE optical connectivity for distances greater than 2 KM.
3. Availability of a new 7120 base model featuring integrated 11 x 3TB HDDs and a 73GB Write Flash Cache.  (Note that availability of the 3TB drive is limited to the 7120 base model internal storage only – it is not available in the disk shelves at this time.)


Additionally, we are announcing End-of-Life for the following two items:
1. 2TB drive-equipped base model of the 7120, with a Last Order Date of December 31, 2011.
2. 18GB Write Flash Cache, with a Last Order Date of January 10, 2012.

Wednesday Oct 26, 2011

VDEV - What is a VDEV and why should you care?

Ok, so we can finally talk VDEVs. Going back to my blog on disk calculations, I told you how the calculator works, and the way you can see how many drive spindles you would have for any particular RAID layout. Let's use an example of nine trays of 24 drives each, using 1TB drives.


 Yes, I know we no longer offer 1TB drives, but this is the graphic I had, so just roll with me. Now, if we were setting this up in the ZFSSA BUI, it would look like this:


 So that's all great and it all lines up, right? Well, the one thing the BUI doesn't show very well is the VDEVs. You can figure it out in your head if you know what you're doing, but the calculator can do it for you if you just add the "-v" option right after the .py command in the python string. Doing that for the above example will give you back this:


Notice the new column for VDEVs. Cool. So now I can see the breakdown of Virtual Devices that each type of RAID will create out of my physical devices (spindles). In this case, my nine trays of 24 spindles give me 216 physical devices.
-If I do something silly and make that a 'Stripe', then I would get 1 big virtual device made up of 216 physical devices.
-I could also make it a 'Mirror', which will give me 106 virtual devices, each made up of 2 physical devices.
-A RAIDz1 pool will give me 53 virtual devices, each with 4 physical devices to make my 3+1 stripes.
-Finally, for the sake of this conversation, a RAIDz2 choice will give me only 15 VDEVs, each with 14 physical drives that make 12+2 stripes. You don't get 14 data drives, you get 14 drives per stripe, so you need to remember that 2 of those are parity drives in a RAIDz2 stripe when you calculate your usable space.
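
The arithmetic behind that breakdown is simple division, minus a few hot spares. Here's a rough Python sketch of it; the spare counts are just my guesses chosen to match the example above, since the appliance decides the actual number of spares based on the configuration:

# Rough vdev math for 216 physical drives. The appliance holds back some hot
# spares (the exact count depends on the config), so real numbers come out a
# bit lower than a naive division.
TOTAL_DRIVES = 216

def vdev_count(stripe_width, spares=0):
    """How many vdevs a profile yields, given drives per vdev and reserved spares."""
    return (TOTAL_DRIVES - spares) // stripe_width

print("Mirror (2-wide) :", vdev_count(2, spares=4))    # 106 in the example above
print("RAIDz1 (3+1)    :", vdev_count(4, spares=4))    # 53
print("RAIDz2 (12+2)   :", vdev_count(14, spares=6))   # 15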

Now, why do you care how many VDEVs you have? It's all about throughput.  Very simply stated, the more VDEVs you have, the more data can be pushed into the system by the largest number of users at once. Now, that's very simplistic, and it really depends on your workload. There are exceptions as I have found, but for the most part, more VDEVs will equal better throughput for small, random IO. This is why a Mirrored pool is almost always the best way to set up a high-throughput pool for small, random IO such as a database. Look at all the VDEVs a mirrored pool gives you.

Think of it this way: Say you have a 256K block of data you want to write to the system, using a 128K record size. With a mirrored pool, ZFS will split your 256K file into 2 blocks of 128K each, and send it down to exactly 2 of the VDEVs to write out to 4 physical drives. Now, you still have a whopping 104 other VDEVs not doing anything, and they could all be handling other user's workflows all at the same time. Take the same example using a RAIDz1 pool. ZFS will have to break up your 256K block again into two 128K chunks and send it to 2 VDEVs, each with 4 physical drives, with each data drive of the 3+1 stripe getting about 43K. That's all fine, but while those 8 physical drives are working on that data, they can't do anything else, and you only have 51 other VDEVs to handle everyone else's workload.
As an extreme example, let's check out a RAIDz3 wide-stripe pool. You only get 4 VDEVs, each with 53 drives, each in a 50+3 stripe. Writing that same 256K block with 128K record sizes will still split it over 2 VDEVs, and you only have 2 left for others to use at the same time. In other words, it will take the IOPS of 106 physical spindles to write that one stupid 256K block, while in the Mirrored pool, it would have only taken the IOPS of 4 physical spindles, leaving you with tons of other IOPS. 

Make sense?

Like I said, Mirroring is not always the best way to go. I've seen plenty of examples where we choose other pools over Mirrored after testing. That is the key. You need to test your workload with multiple pool types before picking one. If you don't have that luxury, make your best, educated guess based on the knowledge that in general, high throughput random IO does better with more VDEVs, and large, sequential files can do very well with larger stripes found in RAIDz2. 

As a side note, we recommend the RAIDz1 pool for our Exadata backups to a ZFSSA. After testing, we found that, yes, the mirrored pool did go a bit faster, but not enough to justify the drop in capacity. We also found that the RAIDz1 pool was about 20% faster for backups and restores than the RAIDz2 pool, so that speed difference didn't justify the extra capacity of RAIDz2. Now, some people may disagree and say they don't care about capacity, they want the fastest no matter what, and go with Mirrored even in this scenario. That's fine, and that's the beauty of the ZFSSA, where you are allowed to experiment with many choices and options and choose the right balance for your company and your workload.

Have fun. Steve 


Thursday Oct 20, 2011

Shadow Migration

Still not talking about VDEVs? I know, I know, but hey, there's only so many hours in a day, folks, and I do have a life... So something came up this week and I want to talk about Shadow Migration, instead.

Now, built into the ZFSSA you have both Replication and Shadow Migration. Be sure to use the right one for the right job. Replication is used from one 7000 family system to a different 7000 system. This is important: It can NOT be used on two clustered controllers of the same system. That will mess you up. It is only for other 7000s, and cannot replicate to anything other than another ZFSSA. ***UPDATE- This is no longer the case. Replication inside the same system between two clustered controllers has been supported since October 2012.

Shadow Migration, on the other hand, is really handy for migrating data either from any non-ZFSSA NFS source (think a filer made by someone other than Oracle), or even from a different pool between controllers on the SAME clustered ZFSSA system. This can be very cool when you have an important share on one pool, and you want to move it (and the data inside it) to a different pool. Maybe it's because you want it on your RAIDz2 pool instead of your Mirrored pool. Maybe it's because you want ControllerA in charge of the share but it got made months ago by mistake in the pool owned by ControllerB. I don't care; you just want data from some share, either local to the system or from an NFS share on a different system, to come over into a brand-new share in some pool. Maybe you want to suck in the data from an older, non-Oracle filer, but you know it will take a while, and you want people to be able to still get to the data while the migration is taking place.

Great. That's Shadow Migration. It can get data from both a local source (another share of the same system) or from any NFS mount from anywhere. While the migration is taking place, the original source turns read-only, and users start to mount and use your new share being created. If the data being requested by a user has not been migrated over yet, the ZFSSA will go get it, while continuing to migrate in the background.

Here's how to do a Local Shadow Migration, moving data from a share in one pool to another pool on the same system.

1. Check out the Shadow Migration Service. Under Services, one can change how many threads the background service will use to do the migration. Make sure the green light is on here, while you're at it. **Update: I have been told that our internal team took this down from 8 to 2 for our large (13PB) migrations from various older filers to new 7000s for our Oracle data center. Oracle IT and our Oracle DC is now 100% ZFSSA. 

2. I have a share called Share1A, inside Pool1, which is a mirrored pool. Note that I have about 85MB of stuff in it.
Be careful NOT to choose the replication area from here, or at all, from anywhere. You're not doing replication, remember? 
Do not confuse replication with shadow migration.

3. Now, I don't want that data inside pool1, I really want it in Pool2, which is a RAIDz1 pool. So, switch to Pool2, and create a brand-new share, just like normal.
Change pools with the Pools drop-down in the upper left, then click the plus sign.

 4. Now, in the new Share box, first choose the pool you want the new share to be in, and then be sure to choose "LOCAL" as your data migration source.
Instead of typing in the path to some external NFS share, you will type in the local path of another share on the same system, in this case it's "/export/Share1A"

5. Now it gets cool. Check out my new Shadow1 share. As the migration begins (right away), you will see the progress bar here on the left. You can actually stop it, and even change the source from here, mid-stream (although that would be strange and I don't think I would recommend that).  ***Update: To be fair, it was explained to me that this process may take a while to start. The process may have to read a large amount of metadata before you see the bar move. If you have very large directories in the share, especially at the top, then be patient.

6. When the migration is done (The Local version should go quite quickly), the Shadow Migration section goes away, and you will get an alert message on the top of the screen like this:

7. Also, you can view some Shadow Migration specific Analytics while it's running:

8. Now that it's done, I have 2 shares. My original Share1A, and my new Shadow1 in a different pool with the same data copied over.
I could now delete the first share or pool in order to rebuild the pool a different way. Or, if this was a migration from an older filer, I could re-purpose that filer as a nice planter in my garden.


Wednesday Oct 12, 2011

ARC- Adaptive Replacement Cache

I know, I know, I told you I was going to talk about the very important VDEVs next, but this other article came up in another blog, and it’s a rather good read about the ZFSSA cache system, called our ARC, or Adaptive Responsive Cache.

So, if you want to learn more about the ARC in a ZFSSA, go check it out. Our ARC has two levels. Level 1 ARC is our RAM. Almost the entire RAM in a ZFSSA is used for data caching, and that's the ARC, or L1ARC. Now, we go further by having an L2ARC. Once RAM is full, our L2ARC can hold even more cache by using any Readzillas you have in the system. That's right; our Readzilla SSDs are the L2ARC. We use SSDs for cache, not as storage space. (Logzillas, on the other hand, are for fast synchronous write acknowledgements, and have nothing to do with ARC at all).

So a 7420 with 512GB memory and four Readzillas has about a 500GB L1ARC and a 2TB L2ARC to use as an Adaptive Responsive Cache to work with. 500GB of that 2.5TB of space will be nano-second speed while 2TB of it will be micro-second speed. Still much faster than the milli-second speed you get when you have to get data off a hard drive.
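
The sizing math is nothing fancy; here's a little sketch of it, assuming roughly 512GB Readzillas (which is what the 4-Readzillas-equals-2TB example above implies) and a small DRAM overhead guess for the appliance OS itself:

# Rough cache-tier arithmetic for the 7420 example above. The overhead number
# is a guess; the appliance OS uses some DRAM, so usable L1ARC is a bit less.
dram_gb      = 512
readzilla_gb = 512      # assumed size of each Readzilla SSD in this example
readzillas   = 4

l1arc_gb = dram_gb - 12                 # rough guess, giving the "about 500GB" figure
l2arc_gb = readzilla_gb * readzillas

print(f"L1ARC (DRAM)      : ~{l1arc_gb} GB, nanosecond-class reads")
print(f"L2ARC (Readzilla) : ~{l2arc_gb} GB, microsecond-class reads")
print(f"Total ARC         : ~{(l1arc_gb + l2arc_gb) / 1024:.1f} TB before touching spinning disk")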

So Cache is cool, and it’s nice to have a high cache hit ratio, and it’s easier to have a high cache hit ratio if you have more cache, right? With the new, lower priced Readzillas, this should be easier to do.

Now, this other blog I’m pointing you to says we call our cache something else, but don’t worry about it, we use the name “Adaptive responsive Cache” in the Oracle ZFSSA world.

Go check out:
http://www.c0t0d0s0.org/archives/5329-Some-insight-into-the-read-cache-of-ZFS-or-The-ARC.html

Ok, VDEVs will come next!

Maybe.

Steve

Tuesday Oct 11, 2011

Where can you find info on updates?

Someone asked where one could find info on what is updated in each update. Once you download any update from MOS, there is a readme file inside of it with this info.

However, if you want to see the readme file first, go here:
http://wikis.sun.com/display/FishWorks/Software+Updates

Thursday Oct 06, 2011

New ZFSSA code release today - 2010.Q3.4.2

A new code was released on MOS today. We are now on code 2010.Q3.4.2. (ak-2010.08.17.4.2)

Our minimum recommended version is still 2010.Q3.4.0, but if you have the time and opportunity to upgrade to this new Q3.4.2 release, it would be a very good idea. It includes many minor bug fixes. You can view the readme file it comes with to see what it includes.

Download it under the "patches & upgrades" tab in My Oracle Support. 

Tuesday Oct 04, 2011

How to calculate your usable space on a ZFSSA

So let’s say you’re trying to figure out the best way to setup your storage pools on a ZFSSA. So many choices. You can have a Mirrored pool, a RAIDz1, RAIDz2, or RAIDz3 pool, a simple striped pool, or (if you’re REALLY anal) you can even have a Triple Mirrored pool.

How can you choose which pool to make? What if you want more than one pool on your system? How much usable space will you have when it’s all done?

All of these questions can be answered with Ryan Mathew’s Size Calculator. Ryan made a great calculator a while back that allows one to use the ZFSSA engine to give you back all sorts of pool results. You simply enter how many disk trays you have, what size drives they are, how many pools you want to make, and the calculator does the rest. It even shows you a nice graphical layout of your trays. Now, it’s not as easy as a webpage, but it’s not too bad, I promise. It’s a python script, but don’t let that scare you. I never used Python before I got my hands on this calculator, and it was worth loading it up for this. First, you need to go download and install Python 2.6 here: http://www.python.org/getit/releases/2.6/ Make sure you have 2.6 installed, as the calculator will not work with the newer 3.0 Python. In fact, I had both loaded, and had to completely uninstall 3.0 before it would work with my installed 2.6.

Now, get your hands on the Size Calc script. Ryan is making a new one that is for the general public. It will be out soon. In the meantime, ask your local Oracle Storage SC to do a calculation for you.

This is a copy from Ryan’s, but I fixed a few things to make it work on my Windows 7 laptop. If you’re not using Windows 7, you may find Ryan’s original blog and files here: http://blogs.oracle.com/rdm/entry/capacity_sizing_on_7x20

So now you’re ready. Go to a command line and get to the Python26 directory, where you have also placed the “size3.py” script.

Type “size3.py ZFSipaddress password 20”
Use your ZFSSA for the IP address and your root password for the password. You can use the simulator for this. Remember, the simulator is the real code and has no idea it's not a 'real' system.

Mine looks like this: “Size3.py 192.168.56.102 changeme 20” Now, you will see the calculator present a single tray with 20 drives, and all the types of pools you can make with that.

So now, make it bigger. Along with the first tray that has 20 drives (because of the Logzillas, right?), we also want to add a 2nd and a 3rd tray, each full with 24 drives. So type “Size3.py 192.168.56.102 changeme 20 24 24”  You could do this all day long. Notice that now you have some extra choices, as the NSPF (no single point of failure) pools are now allowed, since you have more than two trays.

That’s it for the basics. Pretty simple. Now, we can get more complicated. Say you don’t want one big pool, but want to have an active/active cluster with two pools. Type “Size3.py 192.168.56.102 changeme 10/10 12/12 12/12”


This will create two even pools. They don’t have to be even. Check this out. I want to make two pools, one with the first 2 disk trays with 8 logzillas plus half of full trays 3 and 4. So the second pool would only be the other half of trays 3 and 4. I used “Size3.py 192.168.56.102 changeme 20/0 20/0 12/12 12/12”

Here’s the last one for today- Say you already have a 2-disk shelf system, with 2 pools, and you set it up like this: “Size3.py 192.168.56.102 changeme 10/10 12/12” Simple. Now, you go out and buy another tray of 24 drives, and you want to add 12 drives to each pool. You can use the “add” command to add a tray onto an existing system. It’s very possible that adding a tray will give you different results than if you configured 3 trays to begin with, so be careful. This is a good example. Note that you get different results if you do “10/10 12/12 12/12” then if you do “10/10 12/12 add 12/12”.

Our next lesson will be about VDEVs. When you add the “-v” command right after “size3.py”, you may notice a new column in the output called “VDEVS”. These are the most important aspect of your pool. It’s very important to understand what these are, how many you need and how many you have.

It’s so important, I’m going to save it for another blog topic. Have a great day!!!! J

Monday Oct 03, 2011

New SPC benchmark for the 7420

Oracle announced today a great new benchmark on SPC (Storage Performance Council) for our 7420. Instead of re-writing everything already written, please go see this excellent blog entry by Roch at http://blogs.oracle.com/roch/entry/fast_safe_cheap_pick_3

It explains the new results and why they're so cool.

Go to http://www.storageperformance.org/results/benchmark_results_spc1 to see the results. Scroll down to the "O" section for oracle, and the 7420 results is the first one.

Friday Sep 30, 2011

Hooray! The HCC announcement came out today...

Check this out.

http://www.oracle.com/us/corporate/press/508020

The 7000 now supports Oracle 11gR2 HCC feature. This is something that, until now, you could only get inside an Exadata. Now, one can use HCC and see a huge savings not only in space used, but also in performance, so long as your database is being stored on Oracle hardware families of ZFSSA 7000 or the Axiom 600.

Very cool. No license. It's ready to go.

About

This blog is a way for Steve to send out his tips, ideas, links, and general sarcasm. Almost all related to the Oracle 7000, code named ZFSSA, or Amber Road, or Open Storage, or Unified Storage. You are welcome to contact Steve.Tunstall@Oracle.com with any comments or questions
