Tuesday Aug 17, 2010

New numbers of Solaris Cluster 3.2 core patches

The Solaris Cluster 3.2 core patches have been given new patch numbers. The new patches are

144220 Solaris Cluster 3.2: CORE patch for Solaris 9
144221 Solaris Cluster 3.2: CORE patch for Solaris 10
144222 Solaris Cluster 3.2: CORE patch for Solaris 10_x86
This time these patches do NOT have to be installed in non-cluster-single-user mode. They can be installed while the cluster is running, but a reboot is required afterwards.

Beware: the new patches require the previous revision -42 of the SC 3.2 core patch.
126105-42 Sun Cluster 3.2: CORE patch for Solaris 9
126106-42 Sun Cluster 3.2: CORE patch for Solaris 10
126107-42 Sun Cluster 3.2: CORE patch for Solaris 10_x86
The -42 revision still has to be installed in non-cluster-single-user mode. Furthermore, carefully study the special install instructions and the related entries of this blog.

The advantage: once -42 is applied, patching Solaris Cluster 3.2 becomes much easier.

Of course, it is also possible to apply the new SC core patch together with the -42 core patch in non-cluster-single-user mode.
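A minimal sketch of such an update on a Solaris 10 SPARC node (the staging directory and the patch revision are placeholders):

    # showrev -p | grep 126106          (verify that revision -42 is already installed)
    # patchadd /var/tmp/144221-<rev>    (can run while the node is in cluster mode)
    # init 6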

Monday May 17, 2010

Oracle Solaris Cluster Community


- Do you have a question around Oracle Solaris Cluster?
- Do you have a login for My Oracle Support or SunSolve?
- Would you like to collaborate and share knowledge with your peers?


Then log in to the moderated Oracle Solaris Cluster Community and feel free to ask and answer questions. In the Community you can start discussions and create knowledge documents. For easy monitoring, subscribe to discussions and documents. If somebody answers your question, you can mark the thread with 'Correct Answer' or 'Helpful Answer'; this helps others find solutions quickly. Additionally, in the 'Private Messages' tab you can talk to other community members…

At this time there are more than 140 Oracle Support Communities online where you can participate. So please give it a try: log in to the 'My Oracle Support Community', set up your profile, and refer to the 'Getting Started' region in the upper right corner of the main page.

See you there...

Friday Mar 26, 2010

Summary of install instructions for 126106-40 and 126107-40

My last blog entry describes some issues around these patches (please read it):
126106-40 Sun Cluster 3.2: CORE patch for Solaris 10
126107-40 Sun Cluster 3.2: CORE patch for Solaris 10_x86
This is a follow-up with a summary of best practices for installing these patches. There is a difference between new installations, 'normal' patching, and Live Upgrade patching.
Important: The instructions below work if Solaris Cluster 3.2 1/09 Update 2 (or Solaris Cluster 3.2 core patch revision -27 (SPARC) / -28 (x86)) or higher is already installed. If a lower revision of the Solaris Cluster 3.2 core patch is running, additional steps are necessary. Please refer to the special install instructions of the patches for those additional steps.
Update: 28.Apr 2010
This also applies to the already released -41 and -42 SC core patches when -40 is not yet active.

A) In case of new installations:

Install the SC core patch -40 immediately after the installation of the Solaris Cluster 3.2 software.
In brief:
    1.) Install Solaris Cluster 3.2 via JES installer
    2.) Install the SC core patch -40
    3.) Run scinstall
    4.) Do the reboot
Note: Do NOT reboot between 1.) and 2.). Follow the EIS Solaris Cluster 3.2 checklist, which also has a note about this issue. If it is not available, follow the standard installation process for Sun Cluster 3.2.
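For illustration, the whole sequence on a Solaris 10 SPARC node might look like this (the media path is an assumption, and scinstall runs interactively):

    # cd <path-to-JES-media>/Solaris_sparc
    # ./installer                       (install the Solaris Cluster 3.2 framework)
    # patchadd /var/tmp/126106-40       (no reboot between installer and patchadd!)
    # /usr/cluster/bin/scinstall        (configure the cluster node)
    # init 6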


B) In case of 'normal' patching

It is vital to use the following approach when patching, because otherwise Solaris Cluster 3.2 can no longer boot:
    0.) Only if using AVS 4.0
    # patchadd 12324[67]-05 (Follow Special Install Instructions)
    1.) # boot in non-cluster mode
    2.) # svcadm disable svc:/system/cluster/loaddid
    3.) # svccfg delete svc:/system/cluster/loaddid
    4.) # patchadd 12610[67]-40
    5.) # init 6
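As a reminder for step 1.): a node is booted in non-cluster mode with the -x boot option. A short sketch:

    ok boot -x                          (SPARC: at the OpenBoot prompt)

On x86, press 'e' in the GRUB menu and append -x to the kernel line before booting.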


C) In case of LU (Live Upgrade feature) to install patches
    1.) Create ABE:
    For zfs root within the same root pool:
    # lucreate -n patchroot
    For ufs on different root drive:
    # prtvtoc /dev/rdsk/c1t3d0s2 | fmthard -s - /dev/rdsk/c1t2d0s2
    # lucreate -c "c1t3d0s0-root" -m /:/dev/dsk/c1t2d0s0:ufs -m /global/.devices/node@2:/dev/dsk/c1t2d0s6:ufs -n "c1t2d0s0-patchroot"
    2.) Install the patch into the ABE (the patch is already unpacked in /var/tmp)
    # luupgrade -t -n c1t2d0s0-patchroot -s /var/tmp 126106-40
    3.) Activate ABE
    # luactivate patchroot
    4.) # init 6
    # Some errors come up at this point
    (dependency cycle & ORB error - please see the examples below)
    5.) # init 6 (the second reboot fixes the problem, Bug 6938144)
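As a quick sanity check between steps 3.) and 4.), lustatus shows which boot environment will be active after the reboot:

    # lustatus                          (the new BE should be marked active on reboot)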





My personal recommendation to minimize the risk of installing the SC core patch -40:
Step 1) Upgrade the Solaris Cluster to
a) Solaris 10 10/09 update 8 and Solaris Cluster 3.2 11/09 update3.
or
b) EIS Baseline 26JAN10, which includes the Solaris kernel update 14144[45]-09 and the SC core patch -39. If the EIS baseline is not available, use another patch set which includes the mentioned patches.
Step 2) After the successful upgrade, do a single-patch install of the SC core patch -40 using installation instruction B) above. In this software state the -40 patch can be applied 'rolling' to the cluster.

Note: 'Rolling' means: boot node1 in non-cluster mode -> install -40 (see B) -> boot node1 back into the cluster -> boot node2 in non-cluster mode -> install -40 (see B) -> boot node2 back into the cluster.
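A sketch of one such rolling cycle on a SPARC node (node names are placeholders; scswitch -S evacuates all resource and device groups first):

    # scswitch -S -h node1              (evacuate resource/device groups from node1)
    # reboot -- -x                      (boot node1 in non-cluster mode)
    # svcadm disable svc:/system/cluster/loaddid
    # svccfg delete svc:/system/cluster/loaddid
    # patchadd 126106-40
    # init 6                            (node1 boots back into the cluster)

Then repeat the same steps on node2.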

Wednesday Mar 03, 2010

Oracle Solaris Cluster core patch 126106-40 and 126107-40

This is a notification because there is some trouble with the following Sun Cluster 3.2 -40 core patches:
126106-40 Sun Cluster 3.2: CORE patch for Solaris 10
126107-40 Sun Cluster 3.2: CORE patch for Solaris 10_x86
Before installing the patch, carefully read the Special Install Instructions.
Update: 28.Apr 2010
This also applies to the already released -41 and -42 SC core patches when -40 is not yet active.

Two new notes were added to these patches:


NOTE 16: Remove the loaddid SMF service by running the following
commands before installing this patch, if current patch level
(before installing this patch) is less than -40:
svcadm disable svc:/system/cluster/loaddid
svccfg delete svc:/system/cluster/loaddid


      So, the right approach is:
      # boot in non-cluster mode
      # svcadm disable svc:/system/cluster/loaddid
      # svccfg delete svc:/system/cluster/loaddid
      # patchadd 126106-40
      # init 6
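Not part of the patch README, but a quick health check I would suggest after the final reboot:

      # svcs svc:/system/cluster/loaddid:default    (should show 'online')
      # svcs -x                                     (no output means no service is in maintenance)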


NOTE 17:
Installing this patch on a machine with Availability Suite
software installed will cause the machine to fail to boot with
dependency errors due to BugId 6896134 (AVS does not wait for
did devices to startup in a cluster).
Please contact your Sun
Service Representative for relief before installing this patch.


The solution for Bug 6896134 is now available; please follow the right approach below for the installation:
123246-05 Sun StorEdge Availability Suite 4.0: Patch for Solaris 10
123247-05 Sun StorEdge Availability Suite 4.0: Patch for Solaris 10_x86
       # patchadd 12324[67]-05 (Follow Special Install Instructions)
       # boot in non-cluster mode
       # svcadm disable svc:/system/cluster/loaddid
       # svccfg delete svc:/system/cluster/loaddid
       # patchadd 12610[67]-40
       # init 6



Important to know: These two issues only come up when using Solaris 10 10/09 Update 8 or the kernel patch 141444-09 or higher. There are changes in the startup of the iSCSI initiator (it is now an SMF service) - please refer to Bug 6888193 for details.

Hint: If using LU (Live Upgrade) for patching, please refer to my blog entry Summary of install instructions for 126106-40 and 126107-40.

ATTENTION: NOTE 16 is also valid for the removal of the patch - carefully read the 'Special Removal Instructions'.


Additional information around NOTE 16
1) What happens if you forget to delete the loaddid service before the patch installation?

The following error (or similar) comes up during the patch installation on the console of the server:
Mar 2 12:01:46 svc.startd[7]: Transitioning svc:/system/cluster/loaddid:default to maintenance because it completes a dependency cycle (see svcs -xv for details):
svc:/network/iscsi/initiator:default
svc:/network/service
svc:/network/service:default
svc:/network/rpc/nisplus
svc:/network/rpc/nisplus:default
svc:/network/rpc/keyserv
svc:/network/rpc/keyserv:default
svc:/network/rpc/bind
svc:/network/rpc/bind:default
svc:/system/sysidtool:net
svc:/milestone/single-user:default
svc:/system/cluster/loaddid
svc:/system/cluster/loaddid:default
Mar 2 12:01:46 svc.startd[7]: system/cluster/loaddid:default transitioned to maintenance by request (see 'svcs -xv' for details)

But this should NOT be a problem, because patch 126106-40 is installed in non-cluster mode. This means that after the next boot into cluster mode the error should disappear. This is reported in Bug 6911030.

But to be sure that the system boots correctly:
- check the log file /var/svc/log/system-cluster-loaddid:default.log
- verify that one of the last lines is: [ Mar 16 10:31:15 Rereading configuration. ]
- if not, go to the 'Recovery procedure' below
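For example (the timestamp will differ on your system):

# tail /var/svc/log/system-cluster-loaddid:default.log
...
[ Mar 16 10:31:15 Rereading configuration. ]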


2) What happens if you delete the loaddid service after the patch installation?

You may then see the problem mentioned in 1). If you disable and delete the 'svc:/system/cluster/loaddid' service after the patch installation, the system will no longer join the cluster. The following errors come up:
...
Mar 16 11:42:54 Cluster.Framework: Could not initialize the ORB. Exiting.
Mar 16 11:42:54 : problem waiting the deamon, read errno 0
Mar 16 11:42:54 Cluster.Framework: Could not initialize the ORB. Exiting.
Mar 16 11:42:54 svc.startd[8]: svc:/system/cluster/sc_failfast:default: Method "/usr/cluster/lib/svc/method/sc_failfast start" failed with exit status 1.
Mar 16 11:42:54 svc.startd[8]: svc:/system/cluster/cl_execd:default: Method "/usr/cluster/lib/svc/method/sc_cl_execd start" failed with exit status 1.
Mar 16 11:42:54 Cluster.Framework: Could not initialize the ORB. Exiting.
Mar 16 11:42:54 : problem waiting the deamon, read errno 0
Configuring devices.
Mar 16 11:42:54 svc.startd[8]: svc:/system/cluster/sc_failfast:default: Method "/usr/cluster/lib/svc/method/sc_failfast start" failed with exit status 1.
Mar 16 11:42:54 Cluster.Framework: Could not initialize the ORB. Exiting.
Mar 16 11:42:54 svc.startd[8]: svc:/system/cluster/cl_execd:default: Method "/usr/cluster/lib/svc/method/sc_cl_execd start" failed with exit status 1.
Mar 16 11:42:54 Cluster.Framework: Could not initialize the ORB. Exiting.
Mar 16 11:42:54 : problem waiting the deamon, read errno 0
Mar 16 11:42:54 svc.startd[8]: svc:/system/cluster/sc_failfast:default: Method "/usr/cluster/lib/svc/method/sc_failfast start" failed with exit status 1.
Mar 16 11:42:54 svc.startd[8]: system/cluster/sc_failfast:default failed: transitioned to maintenance (see 'svcs -xv' for details)
Mar 16 11:42:54 Cluster.Framework: Could not initialize the ORB. Exiting.
Mar 16 11:42:55 svc.startd[8]: svc:/system/cluster/cl_execd:default: Method "/usr/cluster/lib/svc/method/sc_cl_execd start" failed with exit status 1.
Mar 16 11:42:55 svc.startd[8]: system/cluster/cl_execd:default failed: transitioned to maintenance (see 'svcs -xv' for details)
…
If you see these errors, refer to the 'Recovery procedure' below.


Recovery procedure

1) Boot in non-cluster mode if you are not able to log in.
2) Bring the files loaddid and loaddid.xml into place (normally using the files from SC core patch -40).
ONLY in case of trouble with the files from SC core patch -40, use the old files!
Note: If you restore the old files without the dependency on the iSCSI initiator, there can be problems when trying to use iSCSI storage within Sun Cluster.
3) Repair the loaddid service:
# svcadm disable svc:/system/cluster/loaddid
# svccfg delete svc:/system/cluster/loaddid
# svccfg import /var/svc/manifest/system/cluster/loaddid.xml
# svcadm restart svc:/system/cluster/loaddid:default
4) Check the log file /var/svc/log/system-cluster-loaddid:default.log:
# tail /var/svc/log/system-cluster-loaddid:default.log
for the following line (which should be at the end of the log file):
[ Mar 16 11:43:06 Rereading configuration. ]
Note: The 'Rereading configuration' step must have happened before rebooting!
5) Reboot the system:
# init 6
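After the reboot, a quick way to confirm that the node joined the cluster again (my suggestion, not part of the official procedure):

# scstat -n                         (the repaired node should be listed as Online)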


Additional information: detailed differences in the loaddid files after installation of SC core patch -40.

A) /var/svc/manifest/system/cluster/loaddid.xml
The SC core patch -40 delivers a new version with the following changes:
<       ident "@(#)loaddid.xml 1.3 06/05/12 SMI"
---
>       ident "@(#)loaddid.xml 1.5 09/11/04 SMI"
56,61c79,92
<       <dependent
<              name='loaddid_single-user'
<              grouping='optional_all'
<             restart_on='none'>
<             <service_fmri value='svc:/milestone/single-user:default' />
<       </dependent>
---
>       <!--
>              The following dependency is for did drivers to get loaded
>              properly for iSCSI based quorum and data devices. We want to
>              start loaddid service after the time when iSCSI connections
>              can be made.
>        -->
>       <dependency
>              name='cl_iscsi_initiator'
>              grouping='optional_all'
>              restart_on='none'
>              type='service'>
>              <service_fmri
>              value='svc:/network/iscsi/initiator:default' />
>       </dependency>


Before patch -40 is applied:
node1 # svcs -d loaddid:default
STATE STIME FMRI
online 11:29:45 svc:/system/cluster/cl_boot_check:default
online 11:29:49 svc:/system/coreadm:default
online 11:30:52 svc:/milestone/devices:default

node1 # svcs -D svc:/system/cluster/loaddid:default
STATE STIME FMRI
online 15:34:41 svc:/system/cluster/bootcluster:default
online 15:34:46 svc:/milestone/single-user:default


After patch -40 is applied:
node1 # svcs -d loaddid:default
STATE STIME FMRI
online 12:09:18 svc:/system/coreadm:default
online 12:09:20 svc:/system/cluster/cl_boot_check:default
online 12:09:21 svc:/network/iscsi/initiator:default
online 12:10:21 svc:/milestone/devices:default

node1 # svcs -D svc:/system/cluster/loaddid:default
STATE STIME FMRI
online 16:08:19 svc:/system/cluster/bootcluster:default


B) /usr/cluster/lib/svc/method/loaddid
The SC core patch -40 delivers a new version with the following changes:
< #ident "@(#)loaddid 1.7 06/08/07 SMI"
---
> #ident "@(#)loaddid 1.9 09/11/04 SMI"
15,16c36,44
<        svcprop -q -p system/reconfigure system/svc/restarter:default 2>/dev/null
<        if [ $? -eq 0 ] && [ `svcprop -p system/reconfigure system/svc/restarter:default` = "true" ]
---
>        # The property "reconfigure" is used to store whether the boot is
>        # a reconfiguration boot or not. The property "system/reconfigure"
>        # of the "system/svc/restarter" Solaris SMF service can be used
>        # for this purpose as well. However the system/reconfigure
>        # property is reset at the single-user milestone. SC requires this
>        # property for use by service after the single-user milestone as
>        # well.
>        svcprop -q -p clusterdata/reconfigure system/cluster/cl_boot_check 2>/dev/null
>        if [ $? -eq 0 ] && [ `svcprop -p clusterdata/reconfigure system/cluster/cl_boot_check` = "true" ]
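The effect of this change can be checked manually; the new method reads the reconfigure flag from the cl_boot_check service instead of the restarter:

# svcprop -p clusterdata/reconfigure system/cluster/cl_boot_check   (prints true on a reconfiguration boot, otherwise false)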


Wednesday Jan 27, 2010

The Move

If you miss some blog entries: I removed all personal posts due to the new Social Media Participation Policy.
About

I'm still mostly blogging around Solaris Cluster and support - independently of whether it's for Sun Microsystems or Oracle. :-)
