Friday Aug 14, 2009

Improvements to Solaris 10 Recommended and Sun Alert Patch Clusters released

My colleague, Ed Clark, has made significant improvements to the Solaris 10 Recommended and Sun Alert patch clusters.  These improvements have just been released and are in the current clusters available to contract customers from the Patch Cluster & Patch Bundle Downloads on SunSolve.

Ed's improvements include:

  • Filtering out "false negatives" from the patch utility return codes, so that if the cluster install script returns "1", you know you've got a real problem which needs investigating.   As you may know, the Solaris patch utility, 'patchadd', can return errors for some acceptable situations - for example, if the patch is already applied to the system, or a later revision of the patch or a patch which obsoletes it is already applied to the system, or none of the packages in the patch are on the target system (e.g. because a reduced Install Metacluster was used to install it or the system has been security hardened by package removal), etc.   Such conditions are acceptable "errors" which do not usually require further investigation by the user.  By filtering these conditions out, if the 'installcluster' script returns "1", you know it isn't because of one of these acceptable "errors", and therefore you need to look at the logfiles to find out what's gone wrong.  For further information, please see the cluster README and Analyzing a patchadd or patchrm Failure in the Solaris OS.
  • The new 'installcluster' script will exit as soon as it encounters an unexpected failure - i.e. not one of the acceptable "errors" mentioned above.  This prevents potentially compounding issues by attempting to apply further patches.
  • The new 'installcluster' script includes context intelligence for patching operations.  It informs the user when zones need to be halted, and it provides phased installation to handle patches which absolutely require an immediate reboot before further patches can be applied.  Such interim reboots are only needed when patching a live boot environment on a system below Kernel patch 118833-36 (SPARC) / 118855-36 (x86), as well as for the earlier interim reboot required on x86 related to 'libc.so' patches and Kernel patch 118844-14.  On systems below these patch levels, the 'installcluster' script will stop at the appropriate point when patching the live boot environment, and inform the user to reboot and re-invoke the 'installcluster' script.  (The old cluster install script simply tried to carry on blindly past such interim reboots, spewing out error messages, although code in the relevant patches prevented any harm from being done.)  These interim reboots, when required, are dealt with relatively early in the cluster install sequence so that once completed, the Sys Admin can leave the rest of the installation to finish unattended and move on to other systems.
  • The new 'installcluster' script provides better integration with Solaris Live Upgrade as the user can now specify the Live Upgrade alternate boot environment to patch by name.
  • The new 'installcluster' script performs space checking prior to installing each patch, and will halt if it believes there is insufficient space to complete the installation successfully.  For example, this helps avoid non-global zones getting out of sync with the global zone regarding patch levels.  This is an important enhancement, as running out of space during patching can potentially leave the system in an inconsistent state and is to be avoided.  Even removing a patch requires space, so immediate removal of a patch which has failed to apply correctly due to space issues should be avoided until sufficient space is freed up and potential issues caused by its partial installation are investigated - for example, was the undo.Z file successfully created to enable backout?  (Tip: In such circumstances, it may be better to retry the patch installation once space has been freed up rather than remove the patch.  Contact Sun Support for instructions if you encounter such issues.)  The space checking enhancements in the 'installcluster' script are designed to prevent such problems occurring.
  • The messages and log files produced by the 'installcluster' script are clear and well structured.  For example, a "failed" log is created if a patch fails to apply.  See the Cluster README for further information.
  • The 'patch_order' file places patches in an optimal order for installation to avoid known issues - for example, the patch utilities patches are installed as early in the sequence as possible to avoid hitting patch installation bugs which are fixed in the patch utility patches, and the Kernel patch procedural script override patch, 125555 (SPARC) / 125556 (x86), is ordered prior to 137137-09 (SPARC) / 137138-09 (x86) to resolve some known issues.  When patching an alternate boot environment (which is recommended), a small sub-set of pre-requisite patches, primarily the patch utility patches, need to be applied to the live boot environment to ensure correct patching operation.  The 'installcluster' script will check for these pre-requisite patches and halt installation if they are not present, advising the user of the 'installcluster' script option to use to install them.  Further patches may need to be installed on the live boot environment to support Live Upgrade.  See the cluster README for further information.
  • The patches have been moved to a 'patches' sub-directory, to de-clutter the top level directory of the unzipped cluster.
  • Please see the cluster README file for further information.  Customers should read the cluster README file and look at the Special Install Instructions in the patches within the cluster prior to installation.
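The first two improvements in the list above (filtering out the acceptable "errors" and exiting at the first unexpected failure) can be pictured with a short sketch.  This is only an illustration in Python, not the 'installcluster' script itself: the `apply_patch` callable is a hypothetical stand-in for invoking 'patchadd', and the return codes in `ACCEPTABLE_CODES` are placeholder values, so please check the patchadd documentation for the real ones.

```python
from typing import Callable, Iterable, List, Set, Tuple

# Placeholder return codes standing in for the acceptable "errors" described
# above (patch already applied, obsoleted, no matching packages installed).
# These numbers are assumptions for illustration only.
ACCEPTABLE_CODES: Set[int] = {2, 8}


def install_in_order(
    patch_order: Iterable[str],
    apply_patch: Callable[[str], int],
    acceptable: Set[int] = ACCEPTABLE_CODES,
) -> Tuple[int, List[str], List[str]]:
    """Apply patches in order and stop at the first unexpected failure.

    Returns (exit_status, applied, skipped).  exit_status is 1 only when a
    patch failed for a reason that is not in `acceptable`.
    """
    applied: List[str] = []
    skipped: List[str] = []
    for patch_id in patch_order:
        code = apply_patch(patch_id)
        if code == 0:
            applied.append(patch_id)
        elif code in acceptable:
            # Acceptable "error": note it and carry on with the next patch.
            skipped.append(patch_id)
        else:
            # Real problem: stop here so later patches cannot compound it.
            return 1, applied, skipped
    return 0, applied, skipped
```

With this shape, a return value of 1 always means "look at the log files", and no further patches are attempted after the one that failed.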

I really want to thank Ed Clark for the enormous amount of thought and effort he has put into improving the cluster installation experience.   The work he's done on the Solaris 10 Recommended and Sun Alert patch cluster is a continuation of his previous work on the Solaris Update Patch Bundles and the Solaris 10 Live Upgrade Zones Starter Patch Bundle.  Nice work, Ed!

While the 'installcluster' script is copyrighted, I am happy for customers to use it, and the 'patch_order' file, as a starting point for their own customized patch bundles, so long as it is for their own use and is not to be given to a 3rd party or used for commercial gain (e.g. by a 3rd party maintainer or 3rd party commercial automation tool).

We have also made significant improvements to the back end processes to ensure higher and more consistent cluster quality. 

Originally, the clusters were created by the Patch Operations and Distribution (POD) team after patch release.  The POD Cluster QA process left a lot to be desired, resulting in inconsistent cluster quality.   To plug this gap, my Patch System Test team have been testing the clusters for several years, but the old process only allowed us to test them in parallel with their release, which meant that we found issues at the same time that early downloaders of the cluster encountered them.  Although we ensured such issues were fixed as quickly as possible, it still obviously compromised our customers' experience.

In the new process, the clusters are routed to Patch System Test (PST) prior to release.  PST run a transformation script on them to optimize the patch installation order, etc.  The clusters will only be released once they have passed PST testing.  This should ensure higher and more consistent quality for customers.  Work is continuing to move the entire patch cluster generation process to PST, although these future back-end enhancements should be invisible to customers.

Wednesday Dec 17, 2008

Definitive interpretation of the "rebootimmediate" and "reconfigimmediate" patch flags

The following is now available as Infodoc 249046:


What follows is an open letter to customers in response to customer confusion over how to handle the "rebootimmediate" and "reconfigimmediate" flags specified in some patches.

Despite the READMEs of patch clusters which contain such patches clearly stating that during a patching session, a reboot is only required in exceptional and documented circumstances, it has come to my attention that some customers are initiating reboots after applying every single patch in a patch set which specifies such flags.  Not surprisingly, such customers are concerned at the length of time this takes.

Open Letter with definitive interpretation of the "rebootimmediate" and "reconfigimmediate" patch flags

To whom it may concern,

Summary: When patching a live boot environment, it is usually OK to apply any number of patches before performing a single reboot at the end, even if multiple patches specify "rebootimmediate" or "reconfigimmediate".  On the rare occasion when it is found that this is not possible, specifically for 118833-36 (SPARC) and 118855-36 (x86) and 118844-14+ (x86), code will typically be inserted into the relevant patches to prevent the application of further patches which could cause problems.  Use of Live Upgrade to patch an inactive boot environment is recommended as it avoids the need for interim reboots for even these atypical patches.  Details below.

The "reboot" metadata flags which may be contained in the patch 'pkginfo' file(s) have the following meaning:

rebootafter - a reboot is required to activate some of the content delivered in the patch, but the system remains in a consistent state until the reboot is performed.

reconfigafter - a reconfiguration reboot is required to activate some of the content in the patch, but the system remains in a consistent state until the reconfiguration reboot is performed.

rebootimmediate - the system is in a potentially inconsistent state until the system is rebooted.  The objects applied in the patch are potentially inconsistent with processes running in memory.  Normal production must not be resumed until a reboot takes place to bring the system back into a fully consistent state.  However, since the footprint of the patch utilities is relatively small, it is normally OK to continue to apply further patches before initiating the reboot.   In cases where this is not OK, the patch in question will typically contain additional code to prevent further patches from being applied until the reboot takes place\*.  Since the system is in a potentially inconsistent state, it's advisable to avoid running any additional processes until the reboot takes place.  If patch automation tools are being used to apply "rebootimmediate" or "reconfigimmediate" patches, it's up to the automation tools' QA to ensure that their additional code footprint does not hit the potential inconsistent system state when applying such patches.

reconfigimmediate - exactly the same as rebootimmediate, except a reconfiguration reboot is required.

\*This is the case with Kernel patch 118833-36 (SPARC) / 118855-36 (x86), whose patch scripts replace 'patchadd' with a no-op telling the user to reboot the system.  The only other known reboot required before further patching can be done is specific to x86, and only if the system is running at a Kernel patch level below 118844-14.  A later revision of 118844, e.g. 118844-20, needs to be applied and the system rebooted to ensure the Kernel running in memory is compatible with library changes supplied in the libc patch 121208-02.  The prepatch script in 121208-02 and -03, and 118855-xx which obsoletes it, contains code to ensure 118844-14 or later is installed and active on the system.  (BTW, 118844-14 wasn't released. 118844-20 is recommended to fulfill the libc compatibility requirement.)
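Checks such as "is the system below 118844-14?" come down to comparing revisions of the same patch ID.  Here is a minimal sketch of that comparison in Python; the function names are my own for illustration and are not part of any Sun tool.  Note that it deliberately ignores obsoletion (for example, 118855-xx obsoleting 121208), which real patch tools must take into account.

```python
from typing import Tuple


def parse_patch_id(patch_id: str) -> Tuple[str, int]:
    """Split an ID such as '118833-36' into ('118833', 36)."""
    base, _, rev = patch_id.partition("-")
    if not (base.isdigit() and rev.isdigit()):
        raise ValueError(f"not a patch ID: {patch_id!r}")
    return base, int(rev)


def is_below(installed: str, required: str) -> bool:
    """True if `installed` is an earlier revision of the same patch."""
    inst_base, inst_rev = parse_patch_id(installed)
    req_base, req_rev = parse_patch_id(required)
    if inst_base != req_base:
        # Revisions are only comparable within one base patch ID.
        raise ValueError("patch base IDs differ")
    # Compare revisions as numbers, so '-09' sorts before '-14'.
    return inst_rev < req_rev
```

For example, `is_below("118844-09", "118844-14")` is true, while `is_below("118844-20", "118844-14")` is false.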

UPDATE, Jan 20, 2009: Murphy's Law strikes again!  There's currently an issue, CR 6704883, with the "Sun Fibre Channel Device Drivers" patches 125184-05, -06, -07, and -08 (SPARC) and 125185-05, -06, -07, and -08 (x86) as described in Sun Alert 238630.  The fix for this issue is in rev-09 of the patches, which is currently available as a T-Patch and will be released shortly.  Rev-09 of the patches uses modloading in its prepatch script to avoid the issue.  In the meantime, a workaround is to apply the affected patches last, immediately prior to rebooting the system.  The patches in the Solaris 10 10/08 patch bundle were specifically ordered to avoid this issue.  Where such issues are found, Sun Alerts are published and the issue fixed.

Remember, patches can be downloaded and installed individually.  Therefore, each patch which requires a reboot must specify the reboot requirements.  But if patches are installed collectively in the same patching session, for example, as part of a patch cluster, then the install instructions contained in the cluster README file take precedence - e.g. that reboots are only required \*during\* patching sessions for the specific cases mentioned above.
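To make the "one reboot at the end of the session" point concrete, here is a rough sketch in Python of how the flags of all patches applied in a session might be folded into a single reboot decision.  The function and dictionary names are hypothetical, and the sketch does not model the exceptional patches mentioned above which force an interim reboot.

```python
from typing import Dict, Iterable

REBOOT_FLAGS = {
    # flag: (reconfiguration reboot?, potentially inconsistent until reboot?)
    "rebootafter": (False, False),
    "reconfigafter": (True, False),
    "rebootimmediate": (False, True),
    "reconfigimmediate": (True, True),
}


def plan_session_reboot(flags_per_patch: Iterable[Iterable[str]]) -> Dict[str, bool]:
    """Combine the flags of every patch in a session into one reboot plan."""
    reboot = reconfigure = hold_production = False
    for flags in flags_per_patch:
        for flag in flags:
            if flag not in REBOOT_FLAGS:
                continue  # not a reboot flag, so it does not affect the plan
            is_reconfig, inconsistent = REBOOT_FLAGS[flag]
            reboot = True
            reconfigure = reconfigure or is_reconfig
            hold_production = hold_production or inconsistent
    return {
        "reboot": reboot,
        "reconfigure": reconfigure,
        "hold_production": hold_production,
    }
```

Here `hold_production` means that normal production must not resume until the reboot has taken place, which is the case as soon as any patch in the session specifies one of the "immediate" flags.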

Since the above patches were created, a significant enhancement has been made to the Solaris patch utilities called Deferred Activation Patching.  This enhancement is not retrospective, so the above historical problematic patches remain.

Deferred Activation Patching

The problem with the above atypical patches is that the new code they deliver may be invoked by the original patchadd code and the utilities it calls \*during\* patch installation.  A patch may patch many packages.  The packages are applied in alphabetic order.  In a Zones environment, the patch is applied to the global zone first, then to each non-global zone.

In the case of 118833-36 (SPARC) / 118855-36 (x86), the new versions of the libdevinfo.so.1 and libsec.so.1 libraries delivered in the patch could be invoked by patchadd and are potentially incompatible with the processes running in memory.

The solution devised in the patch scripts contained in 118833-36 (SPARC) / 118855-36 (x86) is to overlay mount the old objects on top of the newly laid down objects using the loopback filesystem (lofs).  This ensures that the system remains in a consistent state \*during\* the patch process as the old library versions which are compatible with what's running in memory will be called.

To prevent further patches that touch the same objects as 118833-36 (SPARC) / 118855-36 (x86) from patching the overlay mounted objects instead of the patched objects, 118833-36 (SPARC) / 118855-36 (x86) replace 'patchadd' with a no-op telling the customer to reboot the system before applying any further patches.

During reboot, the loopback filesystem mounts are torn down exposing the patched objects.  Further patching can now continue as the system is in a fully consistent state.

This loopback filesystem mount solution is the basis of Deferred Activation Patching.  After patch 118833-36 (SPARC) / 118855-36 (x86) was released, the solution was perfected and moved to the patch utilities.  The few patches which require application using Deferred Activation Patching specify the SUNW_PATCH_SAFE_MODE=true flag in their pkginfo files.  The solution was enhanced so that any subsequent patch applied prior to a reboot of the system, which patches the same objects as a patch explicitly specifying Deferred Activation Patching, will itself be automatically applied in Deferred Activation Patching mode.   This is known as implicit Deferred Activation Patching and enables other patches to be applied on top of a patch applied using Deferred Activation Patching without the need for an intervening reboot.  When a patch specifying Deferred Activation Patching mode is applied to a system, the user will see lots of loopback filesystem mounts on the system until such time as the reboot takes place.  Upon reboot, the loopback filesystem mounts are torn down, exposing the newly patched objects.
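The explicit versus implicit distinction can be sketched as follows.  This is a simplified model in Python, not the patch utilities' code: the `Patch` type and `deferred_modes` function are hypothetical, and the sketch follows the wording above literally, so only objects delivered by patches which explicitly specify Deferred Activation Patching trigger the implicit mode.

```python
from typing import Dict, FrozenSet, Iterable, NamedTuple, Set


class Patch(NamedTuple):
    patch_id: str
    objects: FrozenSet[str]  # paths the patch delivers
    safe_mode: bool          # SUNW_PATCH_SAFE_MODE=true in its pkginfo


def deferred_modes(session: Iterable[Patch]) -> Dict[str, str]:
    """Classify each patch applied before the next reboot.

    Returns a mapping of patch_id to 'explicit', 'implicit' or 'normal'.
    """
    # Objects delivered so far by patches that explicitly request
    # Deferred Activation Patching during this session.
    explicit_objects: Set[str] = set()
    modes: Dict[str, str] = {}
    for patch in session:
        if patch.safe_mode:
            modes[patch.patch_id] = "explicit"
            explicit_objects |= patch.objects
        elif explicit_objects & patch.objects:
            # Touches an object already patched in deferred mode.
            modes[patch.patch_id] = "implicit"
        else:
            modes[patch.patch_id] = "normal"
    return modes
```

Order matters in this model: a patch applied before the explicit Deferred Activation patch is applied normally, even if it delivers some of the same objects.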

Kernel patch 12001[12]-14, which is included in Solaris 10 8/07 (Update 4), Kernel patch 12712[78]-11, which is included in Solaris 10 5/08 (Update 5), and Kernel patch 13713[78]-09, which is included in Solaris 10 10/08 (Update 6), are currently the only patches which specify application in Deferred Activation Patching mode.  Future Kernel patches included in future Solaris 10 Update releases are the likely candidates to require application using Deferred Activation Patching.

With the introduction of Deferred Activation Patching, it is highly unlikely that future patches will require an interim reboot before further patches can be applied.

The problems with the system getting into an inconsistent state \*during\* patching (which Deferred Activation Patching resolves) could only occur when patching a live boot environment, because they arise when newly patched objects which are incompatible with processes running in memory are invoked before the system is rebooted.

To avoid this and other issues, Sun strongly recommends the use of Live Upgrade to patch (or upgrade) an inactive boot environment, which dramatically reduces the risk and downtime associated with patching.  For example, even though Deferred Activation Patching resolves the inconsistency issue, patching a live boot environment takes time, during which the system is out of production.

Using Live Upgrade, the inactive boot environment is patched, potentially while the system is still in production.  Issues such as those described above with Kernel patch 118833-36 (SPARC) / 118855-36 (x86), and 118844-20 (x86) simply don't apply when patching an inactive boot environment as there is no interaction between the objects being patched and the processes running in memory, as all the calls patchadd makes will be to the objects on the live partition, not the patched objects on the inactive partition.  A single reboot is required to boot into the new boot environment.

Another advantage of Live Upgrade is that if a problem arises with the new boot environment for whatever reason, the user can simply reboot back into the old boot environment to enable production to resume and the issues with the now inactive boot environment can be resolved later.

Best Wishes,

Gerry Haskins
Director, Software Patch Services

Wednesday Jan 09, 2008

Patch Install Downtime Requirements

As mentioned previously, Solaris Live Upgrade can help minimize the downtime associated with patching, by enabling users to patch an inactive boot environment.  When all modifications have been made to the inactive boot environment, a single reboot is required to activate it.  Also, most Special Install Instructions specified in patch READMEs can be ignored when patching an inactive boot environment.

When patching a live boot environment, certain patches require system downtime in order to complete their installation.

Such requirements will be specified in the patch README file and are also specified by the SUNW_PATCH_PROPERTIES field in the patch's pkginfo file(s).

As mentioned previously, the problem with patching a live boot environment (without Deferred Activation Patching) is that some objects delivered in a patch, such as shared objects, may be invoked immediately while other objects, such as genunix, will only be activated when the system is rebooted.  Also, processes already running in memory may be using the old versions of objects while processes started after the patch(es) were applied will be using the new versions of the same objects (when Deferred Activation Patching isn't specified).  In some cases this is harmless.  In other cases, the system may be in a potentially inconsistent state until it is rebooted.

Some patches require that they be installed in Single User Mode (run level S) when applied to a live boot environment.  This is to ensure that the system is in a quiesced state to avoid the above potential problems.  This will be specified in the patch README file.  Also, the SUNW_PATCH_PROPERTIES field in the patch's pkginfo file(s) will contain the entry singleuser_required.

Some patches require that the system be rebooted at some convenient point after the patch is applied in order to activate its contents. Either a normal reboot or a reconfigure reboot may be required. This will be specified in the patch README file.  Also, the SUNW_PATCH_PROPERTIES field in the patch's pkginfo file(s) will contain the entry rebootafter or reconfigafter.  For such patches, the system remains in a consistent state until the reboot takes place.  The reboot is simply to allow the changes supplied by the patch to be activated. If a reconfigure reboot is required, application of the patch will cause the creation of a /.reconfigure file which will result in a reconfigure reboot when the system is rebooted.

Some patches require that the system be rebooted immediately after the patch is applied to a live boot environment.  For such patches, the system is in a potentially inconsistent state until the reboot takes place.  Either a normal reboot or a reconfigure reboot may be required. This will be specified in the patch README file.  Also, the SUNW_PATCH_PROPERTIES field in the patch's pkginfo file(s) will contain the entry rebootimmediate or reconfigimmediate.  If a reconfigure reboot is required, application of the patch will cause the creation of a /.reconfigure file which will result in a reconfigure reboot when the system is rebooted.  Normally, it is OK to apply further patches to the live boot environment before initiating the reboot.  However, normal operations must not be resumed until after the reboot is performed.  On the rare occasion where the reboot must be instigated before any further patches are applied, such as is the case with Solaris 10 Kernel patch 118833-36 (SPARC) / 118855-36 (x86), such patches will typically contain code to prevent further patches from being applied as an added safety mechanism.
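For anyone scripting around these requirements, a minimal sketch of reading them from a pkginfo file might look like this in Python.  It assumes the pkginfo file is a set of KEY=value lines and that multiple SUNW_PATCH_PROPERTIES entries are separated by whitespace; the latter is an assumption on my part, and the function names are hypothetical.

```python
from typing import Dict, List


def parse_pkginfo(text: str) -> Dict[str, str]:
    """Parse KEY=value lines, ignoring blank lines and '#' comments."""
    params: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        params[key.strip()] = value.strip().strip('"')
    return params


def patch_properties(pkginfo_text: str) -> List[str]:
    """Return the SUNW_PATCH_PROPERTIES entries (assumed space-separated)."""
    return parse_pkginfo(pkginfo_text).get("SUNW_PATCH_PROPERTIES", "").split()


def downtime_requirements(pkginfo_text: str) -> Dict[str, bool]:
    """Summarize the downtime needed when patching a live boot environment."""
    props = set(patch_properties(pkginfo_text))
    immediate = {"rebootimmediate", "reconfigimmediate"}
    after = {"rebootafter", "reconfigafter"}
    return {
        "single_user": "singleuser_required" in props,
        "reboot": bool(props & (immediate | after)),
        "reconfigure": bool(props & {"reconfigafter", "reconfigimmediate"}),
        "immediate": bool(props & immediate),
    }
```

A patch whose pkginfo contains neither entry yields all-false values, meaning no downtime requirement is declared for it.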

About

This blog is to inform customers about patching best practice, feature enhancements, and key issues. The views expressed on this blog are my own and do not necessarily reflect the views of Oracle. The Documents contained within this site may include statements about Oracle's product development plans. Many factors can materially affect these plans and the nature and timing of future product releases. Accordingly, this Information is provided to you solely for information only, is not a commitment to deliver any material code, or functionality, and SHOULD NOT BE RELIED UPON IN MAKING PURCHASING DECISIONS. The development, release, and timing of any features or functionality described remains at the sole discretion of Oracle. THIS INFORMATION MAY NOT BE INCORPORATED INTO ANY CONTRACTUAL AGREEMENT WITH ORACLE OR ITS SUBSIDIARIES OR AFFILIATES. ORACLE SPECIFICALLY DISCLAIMS ANY LIABILITY WITH RESPECT TO THIS INFORMATION. ~~~~~~~~~~~~ Gerry Haskins, Director, Software Lifecycle Engineer
