By Steve Tunstall-Oracle on Feb 07, 2012
I haven’t given out a real tip for a while now, but this issue popped up on my last week, so thought I would pass it along. I had a horrible time setting up a new 7320 cluster; for the sole reason that I screwed it up by not doing it in the right order. This caused my install, which should have been done in 1 hour, to take me over 3 hours to complete.
So let me tell you what I did wrong, and then I'll tell you the way I should have done it.
Out of the box, my client's two new 7320 controller heads were one software revision behind, at 2010.Q3.4.2, so I wanted to upgrade them to the newest version of 2011.Q1.1.1. So far, so good, right? Well here was my mistake. I configured controller A via the serial interface, gave it IP numbers, went into the BUI, and did the upgrade to 2011.Q1.1.1. No problem. Now, I wanted to bring the other one up and do the same thing. However, I knew that controller B in a cluster must be in the initial, factory-reset state in order to be joined to a cluster. You can't configure it, first, or if you do, you must factory-reset it in order to join a cluster. So I bring controller B up, but I don't configure it, and I go to controller A to start the cluster setup process. Big mistake. The process starts, but because the two controllers are on two different software versions, the cluster process cannot continue. This hoses me (that's southern California slang for "messes me up"), because now controller B has started the cluster setup process, and going to the serial connection just has it hung up in a "configuring cluster" state. Rebooting it does not help, as it's still in the "configuring cluster" state once it comes back up.
So.... now I have 2 choices. I can downgrade controller
A back to 2010.Q3.4.2, or I can factory-reset controller B, bring it up as a
single controller, upgrade it to 2011.Q1.1.1, and then factory reset again, and
then finally be able to add it to the cluster via controller A's cluster setup
process. I opt for the second choice, as I do not want to downgrade controller
A, which is working just fine. Remember, controller B is currently hosed,
messed up, or wanked, depending on how you want to say it.
It's stuck. So to get it back to a state I can work with, I need to do the trick I talked about way back in this blog on May 31, 2011 (http://blogs.oracle.com/7000tips/entry/how_to_reset_passwords_on). I had to use the GRUB menu, use the -c trick on the kernel line, and reset the machine and erase all configuration on it. Now I could bring it up as a single controller, upgrade it, factory reset it, and then have it join the cluster. That all worked fine, it just took be two hours to do it all.
Here's what I should have done.
Bring up controller A, config it and log into the BUI. Now bring up controller B. Do NOT config it in any way. Using controller A, setup clustering in the cluster menu.
Once the two controllers are clustered and all is well, NOW go ahead and upgrade controller A to the latest code. Once it reboots, go ahead and upgrade controller B. Everything's fine. You see, if the cluster has already been made, it's perfectly fine to upgrade one controller at a time. The software lets you do that. The software does NOT let you setup a NEW cluster if the controllers are not on the same software level.
So that is the cluster setup safety tip of the day, kids. Have fun.