Oracle Solaris Cluster core patch 126106-40 and 126107-40

This is a heads-up because there are known problems with the following Sun Cluster 3.2 -40 core patches:
126106-40 Sun Cluster 3.2: CORE patch for Solaris 10
126107-40 Sun Cluster 3.2: CORE patch for Solaris 10_x86
Before installing the patches, carefully read the Special Install Instructions.
Update: 28 Apr 2010
This also applies to the already released -41 and -42 SC core patches if -40 is not already active.
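Whether NOTE 16 applies thus depends on the core patch revision currently installed. Here is a minimal sketch for checking it, assuming the usual `showrev -p` line format (`Patch: 126106-38 Obsoletes: ...`); verify the format on your own system:

```shell
#!/bin/sh
# Sketch: report the highest installed revision of a patch base ID by
# parsing `showrev -p` output (line format assumed, not guaranteed).
patch_rev() {
    # $1 = patch base ID (e.g. 126106); reads showrev-style lines on stdin
    awk -v base="$1" '
        $2 ~ "^" base "-" { split($2, a, "-"); if (a[2] + 0 > max) max = a[2] + 0 }
        END { print max + 0 }'
}

# Only meaningful on a Solaris node where showrev exists:
if command -v showrev >/dev/null 2>&1; then
    rev=`showrev -p | patch_rev 126106`
    if [ "$rev" -lt 40 ]; then
        echo "current level -$rev is below -40: NOTE 16 applies"
    fi
fi
```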

Two new notes were added to these patches:


NOTE 16: Remove the loaddid SMF service by running the following
commands before installing this patch, if current patch level
(before installing this patch) is less than -40:
svcadm disable svc:/system/cluster/loaddid
svccfg delete svc:/system/cluster/loaddid


      So, the right approach is:
      # boot in non-cluster mode
      # svcadm disable svc:/system/cluster/loaddid
      # svccfg delete svc:/system/cluster/loaddid
      # patchadd 126106-40
      # init 6


NOTE 17:
Installing this patch on a machine with Availability Suite
software installed will cause the machine to fail to boot with
dependency errors due to BugId 6896134 (AVS does not wait for
did devices to startup in a cluster).
Please contact your Sun
Service Representative for relief before installing this patch.


The solution for Bug 6896134 is now available; please follow the approach below for installation:
123246-05 Sun StorEdge Availability Suite 4.0: Patch for Solaris 10
123247-05 Sun StorEdge Availability Suite 4.0: Patch for Solaris 10_x86
       # patchadd 12324[67]-05 (Follow Special Install Instructions)
       # boot in non-cluster mode
       # svcadm disable svc:/system/cluster/loaddid
       # svccfg delete svc:/system/cluster/loaddid
       # patchadd 12610[67]-40
       # init 6



Important to know: These two issues only come up when using Solaris 10 10/09 (Update 8) or kernel patch 141444-09 or higher. The startup of the iSCSI initiator has changed (it is now an SMF service); please refer to Bug 6888193 for details.
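A quick way to check whether a node is in the affected range is to look for the kernel patch level and the iSCSI initiator SMF service. A sketch, again assuming `showrev -p` output of the form `Patch: 141444-09 Obsoletes: ...` (141445 being the x86 counterpart; confirm the IDs for your architecture):

```shell
#!/bin/sh
# Sketch: is kernel patch 141444-09 (SPARC) or 141445-09 (x86) or
# higher installed? showrev -p line format is an assumption.
has_patch_at_least() {
    # $1 = patch base ID, $2 = minimum revision; showrev -p lines on stdin
    awk -v base="$1" -v min="$2" '
        $2 ~ "^" base "-" { split($2, a, "-"); if (a[2] + 0 >= min + 0) found = 1 }
        END { exit !found }'
}

if command -v showrev >/dev/null 2>&1; then
    if showrev -p | has_patch_at_least 141444 9 \
       || showrev -p | has_patch_at_least 141445 9; then
        echo "affected kernel level: NOTE 16 / NOTE 17 are relevant here"
        # on affected kernels the initiator is an SMF service:
        svcs -H svc:/network/iscsi/initiator:default
    fi
fi
```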

Hint: If using Live Upgrade (LU) for patching, please refer to my blog entry 'Summary of installation instructions for 126106-40 and 126107-40'.

ATTENTION: NOTE 16 is also valid for removal of the patch; carefully read the 'Special Removal Instructions'.


Additional information about NOTE 16
1) What happens if you forget to delete the loaddid service before the patch installation?

The following error (or similar) comes up during patch installation on the server console:
Mar 2 12:01:46 svc.startd[7]: Transitioning svc:/system/cluster/loaddid:default to maintenance because it completes a dependency cycle (see svcs -xv for details):
svc:/network/iscsi/initiator:default
svc:/network/service
svc:/network/service:default
svc:/network/rpc/nisplus
svc:/network/rpc/nisplus:default
svc:/network/rpc/keyserv
svc:/network/rpc/keyserv:default
svc:/network/rpc/bind
svc:/network/rpc/bind:default
svc:/system/sysidtool:net
svc:/milestone/single-user:default
svc:/system/cluster/loaddid
svc:/system/cluster/loaddid:default
Mar 2 12:01:46 svc.startd[7]: system/cluster/loaddid:default transitioned to maintenance by request (see 'svcs -xv' for details)

But this should NOT be a problem, because patch 126106-40 is installed in non-cluster mode. This means that after the next boot into cluster mode the error should disappear. This is reported in Bug 6911030.

But to be sure that the system boots correctly:
- check the log file /var/svc/log/system-cluster-loaddid:default.log
- verify that one of the last lines is: [ Mar 16 10:31:15 Rereading configuration. ]
- if not, go to the 'Recovery procedure' below
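This check can be scripted. A small sketch; the log path and the 'Rereading configuration' message are taken from the text above:

```shell
#!/bin/sh
# Sketch: verify that the loaddid service log ends with the expected
# "Rereading configuration" line after booting back into cluster mode.
loaddid_ok() {
    # $1 = path to the loaddid service log
    [ -r "$1" ] && tail -5 "$1" | grep -q 'Rereading configuration'
}

LOG=/var/svc/log/system-cluster-loaddid:default.log
if loaddid_ok "$LOG"; then
    echo "loaddid looks healthy"
else
    echo "check '$LOG' and consider the recovery procedure" >&2
fi
```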


2) What happens if you delete the loaddid service after the patch installation?

You may then see the problem mentioned in 1). If you disable and delete the 'svc:/system/cluster/loaddid' service after the patch installation, the system will no longer join the cluster. The following errors come up:
...
Mar 16 11:42:54 Cluster.Framework: Could not initialize the ORB. Exiting.
Mar 16 11:42:54 : problem waiting the deamon, read errno 0
Mar 16 11:42:54 Cluster.Framework: Could not initialize the ORB. Exiting.
Mar 16 11:42:54 svc.startd[8]: svc:/system/cluster/sc_failfast:default: Method "/usr/cluster/lib/svc/method/sc_failfast start" failed with exit status 1.
Mar 16 11:42:54 svc.startd[8]: svc:/system/cluster/cl_execd:default: Method "/usr/cluster/lib/svc/method/sc_cl_execd start" failed with exit status 1.
Mar 16 11:42:54 Cluster.Framework: Could not initialize the ORB. Exiting.
Mar 16 11:42:54 : problem waiting the deamon, read errno 0
Configuring devices.
Mar 16 11:42:54 svc.startd[8]: svc:/system/cluster/sc_failfast:default: Method "/usr/cluster/lib/svc/method/sc_failfast start" failed with exit status 1.
Mar 16 11:42:54 Cluster.Framework: Could not initialize the ORB. Exiting.
Mar 16 11:42:54 svc.startd[8]: svc:/system/cluster/cl_execd:default: Method "/usr/cluster/lib/svc/method/sc_cl_execd start" failed with exit status 1.
Mar 16 11:42:54 Cluster.Framework: Could not initialize the ORB. Exiting.
Mar 16 11:42:54 : problem waiting the deamon, read errno 0
Mar 16 11:42:54 svc.startd[8]: svc:/system/cluster/sc_failfast:default: Method "/usr/cluster/lib/svc/method/sc_failfast start" failed with exit status 1.
Mar 16 11:42:54 svc.startd[8]: system/cluster/sc_failfast:default failed: transitioned to maintenance (see 'svcs -xv' for details)
Mar 16 11:42:54 Cluster.Framework: Could not initialize the ORB. Exiting.
Mar 16 11:42:55 svc.startd[8]: svc:/system/cluster/cl_execd:default: Method "/usr/cluster/lib/svc/method/sc_cl_execd start" failed with exit status 1.
Mar 16 11:42:55 svc.startd[8]: system/cluster/cl_execd:default failed: transitioned to maintenance (see 'svcs -xv' for details)

If you see these errors, refer to the recovery procedure below.


Recovery procedure

1) Boot in non-cluster mode if you are not able to log in.
2) Put the loaddid and loaddid.xml files in place (normally the files from the -40 SC core patch).
Use the old files ONLY in case of trouble with the files from the -40 SC core patch!
Note: If you restore the old files without the dependency on the iSCSI initiator, there can be problems when trying to use iSCSI storage within Sun Cluster.
3) Repair the loaddid service:
# svcadm disable svc:/system/cluster/loaddid
# svccfg delete svc:/system/cluster/loaddid
# svccfg import /var/svc/manifest/system/cluster/loaddid.xml
# svcadm restart svc:/system/cluster/loaddid:default
4) Check the log file /var/svc/log/system-cluster-loaddid:default.log:
# tail /var/svc/log/system-cluster-loaddid:default.log
for the following line, which should be at the end of the log file:
[ Mar 16 11:43:06 Rereading configuration. ]
Note: 'Rereading configuration' must appear before rebooting!
5) Reboot the system:
# init 6
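Steps 3) to 5) can be combined into one small script. This is a sketch only: run it in non-cluster mode, and only after step 2) has put the -40 files in place.

```shell
#!/bin/sh
# Sketch of recovery steps 3)-5): rebuild the loaddid service, then
# reboot only after the log confirms "Rereading configuration".
recover_loaddid() {
    svcadm disable svc:/system/cluster/loaddid
    svccfg delete svc:/system/cluster/loaddid
    svccfg import /var/svc/manifest/system/cluster/loaddid.xml
    svcadm restart svc:/system/cluster/loaddid:default
    sleep 5    # give the restarter a moment to write the log
    if tail -5 /var/svc/log/system-cluster-loaddid:default.log \
       | grep -q 'Rereading configuration'; then
        init 6
    else
        echo "Rereading configuration not seen: do NOT reboot yet" >&2
        return 1
    fi
}

# Only meaningful on a Solaris cluster node where svcadm exists:
if command -v svcadm >/dev/null 2>&1; then
    recover_loaddid
fi
```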


Additional information: differences in the loaddid files after installation of the -40 SC core patch.

A) /var/svc/manifest/system/cluster/loaddid.xml
The SC core patch -40 delivers a new version with the following changes:
<       ident "@(#)loaddid.xml 1.3 06/05/12 SMI"
---
>       ident "@(#)loaddid.xml 1.5 09/11/04 SMI"
56,61c79,92
<       <dependent
<              name='loaddid_single-user'
<              grouping='optional_all'
<             restart_on='none'>
<             <service_fmri value='svc:/milestone/single-user:default' />
<       </dependent>
---
>       <!--
>              The following dependency is for did drivers to get loaded
>              properly for iSCSI based quorum and data devices. We want to
>              start loaddid service after the time when iSCSI connections
>              can be made.
>        -->
>       <dependency
>              name='cl_iscsi_initiator'
>              grouping='optional_all'
>              restart_on='none'
>              type='service'>
>              <service_fmri
>              value='svc:/network/iscsi/initiator:default' />
>       </dependency>


Before patch -40 is applied:
node1 # svcs -d loaddid:default
STATE STIME FMRI
online 11:29:45 svc:/system/cluster/cl_boot_check:default
online 11:29:49 svc:/system/coreadm:default
online 11:30:52 svc:/milestone/devices:default

node1 # svcs -D svc:/system/cluster/loaddid:default
STATE STIME FMRI
online 15:34:41 svc:/system/cluster/bootcluster:default
online 15:34:46 svc:/milestone/single-user:default


After patch -40 is applied:
node1 # svcs -d loaddid:default
STATE STIME FMRI
online 12:09:18 svc:/system/coreadm:default
online 12:09:20 svc:/system/cluster/cl_boot_check:default
online 12:09:21 svc:/network/iscsi/initiator:default
online 12:10:21 svc:/milestone/devices:default

node1 # svcs -D svc:/system/cluster/loaddid:default
STATE STIME FMRI
online 16:08:19 svc:/system/cluster/bootcluster:default


B) /usr/cluster/lib/svc/method/loaddid
The SC core patch -40 delivers a new version with the following changes:
< #ident "@(#)loaddid 1.7 06/08/07 SMI"
---
> #ident "@(#)loaddid 1.9 09/11/04 SMI"
15,16c36,44
<        svcprop -q -p system/reconfigure system/svc/restarter:default 2>/dev/null
<        if [ $? -eq 0 ] && [ `svcprop -p system/reconfigure system/svc/restarter:default` = "true" ]
---
>        # The property "reconfigure" is used to store whether the boot is
>        # a reconfiguration boot or not. The property "system/reconfigure"
>        # of the "system/svc/restarter" Solaris SMF service can be used
>        # for this purpose as well. However the system/reconfigure
>        # property is reset at the single-user milestone. SC requires this
>        # property for use by service after the single-user milestone as
>        # well.
>        svcprop -q -p clusterdata/reconfigure system/cluster/cl_boot_check 2>/dev/null
>        if [ $? -eq 0 ] && [ `svcprop -p clusterdata/reconfigure system/cluster/cl_boot_check` = "true" ]


Comments:

Hi Juergen,

thanks for the heads up.

If I understand this correctly then Note 16 can be disregarded if live upgrade is used for patching as it is only when patching live environments that the dependency issue will be seen?

Thanks,

Richard

Posted by Richard on March 05, 2010 at 07:02 AM CET #

Yes, Richard,
Note 16 should NOT apply to live upgrade patching.
Best Regards,
Juergen

Posted by Juergen Schleich on March 05, 2010 at 07:15 AM CET #

Hi Juergen,

Well, we had already deleted the loaddid service before the installation of 126106-42, but the following error (or similar) came up during patch installation on the server console:

Jul 26 15:49:08 2010 Jul 26 15:49:15 bryan01 svc.startd[7]: Transitioning svc:/s
ystem/cluster/loaddid:default to maintenance because it completes a dependency cycle (see svcs -xv for details):
Jul 26 15:49:08 2010 Jul 26 15:49:15 bryan01 svc:/network/iscsi/initiator:default
Jul 26 15:49:08 2010 Jul 26 15:49:15 bryan01 svc:/network/service
Jul 26 15:49:08 2010 Jul 26 15:49:15 bryan01 svc:/network/service:default
Jul 26 15:49:08 2010 Jul 26 15:49:15 bryan01 svc:/network/rpc/nisplus
Jul 26 15:49:08 2010 Jul 26 15:49:15 bryan01 svc:/network/rpc/nisplus:default
Jul 26 15:49:08 2010 Jul 26 15:49:15 bryan01 svc:/network/rpc/keyserv
Jul 26 15:49:08 2010 Jul 26 15:49:15 bryan01 svc:/network/rpc/keyserv:default
Jul 26 15:49:08 2010 Jul 26 15:49:15 bryan01 svc:/network/rpc/bind
Jul 26 15:49:08 2010 Jul 26 15:49:15 bryan01 svc:/network/rpc/bind:default
Jul 26 15:49:08 2010 Jul 26 15:49:15 bryan01 svc:/system/sysidtool:net
Jul 26 15:49:08 2010 Jul 26 15:49:15 bryan01 svc:/milestone/single-user:default

After the reboot and rejoining the cluster, the following error was reported:

Jul 26 16:11:21 2010 Jul 26 16:11:12 svc.startd[8]: Transitioning svc:/system/sysidtool:net to maintenance because it completes a dependency cycle (see svcs -xv
for details):
Jul 26 16:11:21 2010 svc:/milestone/single-user:default
Jul 26 16:11:21 2010 svc:/system/cluster/loaddid
Jul 26 16:11:21 2010 svc:/system/cluster/loaddid:default
Jul 26 16:11:21 2010 svc:/network/iscsi/initiator:default
Jul 26 16:11:21 2010 svc:/network/service
Jul 26 16:11:21 2010 svc:/network/service:default
Jul 26 16:11:21 2010 svc:/network/rpc/nisplus
Jul 26 16:11:21 2010 svc:/network/rpc/nisplus:default
Jul 26 16:11:21 2010 svc:/network/rpc/keyserv
Jul 26 16:11:21 2010 svc:/network/rpc/keyserv:default
Jul 26 16:11:21 2010 svc:/network/rpc/bind
Jul 26 16:11:21 2010 svc:/network/rpc/bind:default

So, any idea on this issue?

Thanks in advance.

Thanks & Regards
Paul

Posted by Paul Liong on July 28, 2010 at 11:06 PM CEST #

If you are not using AVS, then it seems that something went wrong with the deletion of the loaddid service. A second reboot or the mentioned recovery procedure should fix the issue.

Posted by Juergen Schleich on August 05, 2010 at 11:23 AM CEST #

Hi Juergen,

Thanks for your update. Yes, we don't use AVS in our environment. However, the error persists after the 2nd reboot. In fact, we have tried the suggested recovery procedure, but in vain. So, is there any more hint on that issue?

Thanks & Regards
Paul

Posted by Paul Liong on August 05, 2010 at 08:19 PM CEST #

This sounds strange. When the message
[ Mar 16 11:43:06 Rereading configuration. ]
is at the end of /var/svc/log/system-cluster-loaddid:default.log after the recovery procedure, then the startup should work. If not, please open an SR for further investigation.
Thanks.

Posted by Juergen Schleich on August 11, 2010 at 11:08 AM CEST #

Hi Juergen,

Yes, we raised an SR with Oracle two weeks ago. The only feedback was "back out the patch and re-install again". But I doubt this will work in our case :(

Thanks & Regards
Paul

Posted by Paul Liong on August 12, 2010 at 04:59 AM CEST #

Paul, I have reinstalled this patch several times with the mentioned approach. Be very careful: if you make a mistake in the approach (install or uninstall), you will end up with an error. The recovery procedure is also sensitive; it should be followed very exactly. If you are not able to install this patch successfully, then something seems to be wrong in your system...

Posted by Juergen Schleich on August 12, 2010 at 05:46 AM CEST #

I have encountered the service-related difficulties with 126106-40. Is there a kernel revision with which this bug does not surface? We backed out the patch and left our cluster version at 126106-35. Thanks.

Posted by William Griffin on September 07, 2010 at 03:33 PM CEST #

Another workaround for the sysidtool issue mentioned in this thread...

The problem is the single-user milestone's dependency on loaddid.
This dependency is not in the new loaddid manifest (starting with patch 126106-40) and should be removed during the patch update. The new loaddid adds a new dependency on iscsi/initiator. If this dependency is added but the old dependent entry for single-user is not removed, we end up in a dependency cycle and the system gets stuck during boot.

Note: There seems to be odd behavior here. If we deleted the loaddid service from SMF, the dependency was gone; but if we then imported the new loaddid manifest, the dependency was created again, although it is not defined in the manifest.

-----
The solution is to remove the dependency with svccfg

!! Here is an example of how to remove the loaddid dependency from the single-user milestone
-----
# svcs -d single-user | grep loaddid
online Oct_29 svc:/system/cluster/loaddid:default
-----
# svccfg
svc:> select single-user:default
-----
svc:/milestone/single-user:default> listprop
single_user dependency
single_user/entities fmri svc:/system/cluster/cl_boot_check
single_user/external boolean true
single_user/grouping astring require_all
single_user/restart_on astring none
single_user/type astring service
loaddid_single-user dependency
loaddid_single-user/entities fmri svc:/system/cluster/loaddid
loaddid_single-user/external boolean true
loaddid_single-user/grouping astring optional_all
loaddid_single-user/restart_on astring none
loaddid_single-user/type astring service
general framework
general/enabled boolean true
restarter framework NONPERSISTENT
restarter/start_pid count 375
restarter/start_method_timestamp time 1288378108.897436000
restarter/start_method_waitstatus integer 0
restarter/transient_contract count
restarter/logfile astring /var/svc/log/milestone-single-user:default.log
restarter/auxiliary_state astring none
restarter/next_state astring none
restarter/state astring online
restarter/state_timestamp time 1289318946.463352000
restarter_actions framework NONPERSISTENT
restarter_actions/refresh integer
-----
svc:/milestone/single-user:default> delprop loaddid_single-user
-----
svc:/milestone/single-user:default> listprop
single_user dependency
single_user/entities fmri svc:/system/cluster/cl_boot_check
single_user/external boolean true
single_user/grouping astring require_all
single_user/restart_on astring none
single_user/type astring service
general framework
general/enabled boolean true
restarter framework NONPERSISTENT
restarter/start_pid count 375
restarter/start_method_timestamp time 1288378108.897436000
restarter/start_method_waitstatus integer 0
restarter/transient_contract count
restarter/logfile astring /var/svc/log/milestone-single-user:default.log
restarter/auxiliary_state astring none
restarter/next_state astring none
restarter/state astring online
restarter/state_timestamp time 1289318946.463352000
restarter_actions framework NONPERSISTENT
restarter_actions/refresh integer
svc:/milestone/single-user:default> exit
-----

!! the loaddid dependency was removed but is still active:
-----
# svcs -d single-user | grep loaddid
online Oct_29 svc:/system/cluster/loaddid:default
-----
!! to remove the dependency permanently, a refresh is necessary:
-----
# svcadm refresh single-user
# svcs -d single-user | grep loaddid

Posted by Juergen Schleich on November 10, 2010 at 01:29 AM CET #

Hi Juergen,

Thanks for your further update. It really helps. By the way, I noticed that the following note appears in the 126106-42 patch README:

NOTE 8: This patch updates the RT versions of the following Resource Types.

SUNW.crs_framework
SUNW.rac_framework
SUNW.rac_cvm
SUNW.rac_svm
SUNW.rac_udlm
SUNW.LogicalHostname
SUNW.ScalDeviceGroup
SUNW.HAStoragePlus
SUNW.ScalMountPoint

You can upgrade your resources to these new versions by performing a Resource Type upgrade.

So, I just want to know where to check the instructions for upgrading resource types, to determine when we can migrate resources to a new version of a resource type.

Thanks & Regards
Paul

Posted by Paul Liong on December 28, 2010 at 05:48 PM CET #

Hi Paul,
normally you should always use the newest available resource type versions in your installation. For verification do:
# clsetup
by selecting:
2) Resource groups
9) Change properties of a resource
3) Manage resource versioning
1) Show versioning status
clsetup can also be used to do the upgrade of the resource type...
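If you prefer the object-oriented CLI over clsetup, the same steps should be reachable with the cl* commands. A sketch with a hypothetical resource name (my-hasp-rs) and target version; check the clresource(1CL) and clresourcetype(1CL) man pages on your release before relying on it:

```shell
#!/bin/sh
# Sketch: inspect and migrate a resource's resource-type version via
# the SC 3.2 CLI. Resource name and version below are placeholders.
show_and_migrate() {
    rs=$1; ver=$2
    clresourcetype list -v                       # registered RT versions
    clresource show -p Type_version "$rs"        # version the resource uses
    clresource set -p Type_version="$ver" "$rs"  # migrate the resource
}

# Only meaningful on a cluster node where the cl* commands exist:
if command -v clresource >/dev/null 2>&1; then
    show_and_migrate my-hasp-rs 6    # placeholder values
fi
```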

Posted by Juergen Schleich on January 04, 2011 at 07:21 AM CET #

Hi Juergen,

On the initial boot of a newly patched BE I experienced:

- the loaddid service being disabled for dependency reasons (on node 1)
and
- Could not initialize the ORB (on node 2)

I understand the root cause was the presence of the loaddid service in the alternate (inactive) boot environment, from which I was upgrading the cluster from 3.2 to 3.3.

Is there a way to delete the loaddid service from an inactive BE (without booting into it) so these issues can be avoided?

Thanks & rgds,

Tony

Posted by guest on January 31, 2014 at 06:46 AM CET #

Hi Tony,

if you do a second reboot of the patched BE, then the issue should be solved. Please look at example (C) in my blog entry 'Summary of install instructions for 126106-40 and 126107-40' (https://blogs.oracle.com/js/entry/summary_of_install_instructions_for).

Unfortunately you cannot delete the loaddid service from an inactive boot environment.

Hth, Juergen

Posted by Juergen on February 10, 2014 at 10:48 AM CET #
