By Gerry Haskins on Aug 14, 2009
My colleague, Ed Clark, has made significant improvements to the Solaris 10 Recommended and Sun Alert patch clusters. These improvements have just been released and are included in the current clusters, available to contract customers from the Patch Cluster & Patch Bundle Downloads page on SunSolve.
Ed's improvements include:
- Filtering out "false negatives" from the patch utility return codes, so that if the cluster install script returns "1", you know you've got a real problem which needs investigating. As you may know, the Solaris patch utility, 'patchadd', can return errors in some acceptable situations. Examples include the patch already being applied to the system, a later revision of the patch (or a patch which obsoletes it) already being applied, or none of the packages in the patch being present on the target system (e.g. because a reduced Install Metacluster was used to install the system, or the system has been security hardened by package removal). Such conditions are acceptable "errors" which do not usually require further investigation by the user. Because these conditions are filtered out, if the 'installcluster' script returns "1", you know it isn't because of one of these acceptable "errors", so you need to look at the log files to find out what's gone wrong. For further information, please see the cluster README and the document "Analyzing a patchadd or patchrm Failure in the Solaris OS".
- The new 'installcluster' script exits as soon as it encounters an unexpected failure, i.e. anything other than the acceptable "errors" mentioned above. This avoids compounding the problem by attempting to apply further patches.
- The new 'installcluster' script includes context intelligence for patching operations. It informs the user when zones need to be halted, and it provides phased installation to handle patches which absolutely require an immediate reboot before further patches can be applied. Such interim reboots are only needed when patching a live boot environment on a system below Kernel patch 118833-36 (SPARC) / 118855-36 (x86), plus the earlier interim reboot required on x86 related to 'libc.so' patches and Kernel patch 118844-14. On systems below these patch levels, the 'installcluster' script will stop at the appropriate point when patching the live boot environment and tell the user to reboot and re-invoke it. (The old cluster install script simply tried to carry on blindly past such interim reboots, spewing out error messages, although code in the relevant patches prevented any harm from being done.) These interim reboots, when required, are dealt with relatively early in the cluster install sequence so that, once they are completed, the Sys Admin can leave the rest of the installation to finish unattended and move on to other systems.
- The new 'installcluster' script provides better integration with Solaris Live Upgrade as the user can now specify the Live Upgrade alternate boot environment to patch by name.
- The new 'installcluster' script performs space checking prior to installing each patch, and will halt if it believes there is insufficient space to complete the installation successfully. For example, this helps prevent non-global zones from getting out of sync with the global zone regarding patch levels. This is an important enhancement, as running out of space during patching can leave the system in an inconsistent state and must be avoided. Even removing a patch requires space, so a patch which has failed to apply correctly due to space issues should not be removed immediately. First free up sufficient space and investigate any issues caused by its partial installation: for example, was the undo.Z file successfully created to enable backout? (Tip: in such circumstances it may be better to retry the patch installation once space has been freed up than to remove the patch. Contact Sun Support for instructions if you encounter such issues.) The space checking enhancements in the 'installcluster' script are designed to prevent such problems from occurring.
- The messages and log files produced by the 'installcluster' script are clear and well structured. For example, a "failed" log is created if a patch fails to apply. See the cluster README for further information.
- The 'patch_order' file places patches in an optimal order for installation to avoid known issues. For example, the patch utility patches are installed as early in the sequence as possible to avoid hitting patch installation bugs which those patches fix, and the Kernel patch procedural script override patch, 125555 (SPARC) / 125556 (x86), is ordered before 137137-09 (SPARC) / 137138-09 (x86) to resolve some known issues. When patching an alternate boot environment (which is recommended), a small subset of prerequisite patches, primarily the patch utility patches, needs to be applied to the live boot environment to ensure correct patching operation. The 'installcluster' script checks for these prerequisite patches and halts installation if they are not present, advising the user which 'installcluster' option to use to install them. Further patches may need to be installed on the live boot environment to support Live Upgrade. See the cluster README for further information.
- The patches have been moved to a 'patches' sub-directory, to de-clutter the top level directory of the unzipped cluster.
- Please see the cluster README file for further information. Customers should read the cluster README file and look at the Special Install Instructions in the patches within the cluster prior to installation.
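The control flow described in the list above (filtering out acceptable "errors", halting on the first real failure, and checking space before each patch) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the actual 'installcluster' code: the Outcome categories, install_cluster(), and its apply_patch, required_kb and free_kb parameters are all hypothetical, and the mapping of real 'patchadd' return codes onto these outcomes is assumed rather than shown (the cluster README describes which conditions the real script treats as acceptable). Interim reboots, zones and Live Upgrade are not modelled.

```python
# Hypothetical sketch only: not the real installcluster logic.
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, Dict, List


class Outcome(Enum):
    """Hypothetical summary of a single patch installation attempt."""
    APPLIED = auto()
    ALREADY_APPLIED = auto()  # this revision is already installed
    SUPERSEDED = auto()       # a later revision, or an obsoleting patch, is installed
    NOT_APPLICABLE = auto()   # none of the patch's packages are on the system
    FAILED = auto()           # anything else: a real problem to investigate


# Outcomes treated as acceptable "errors" (filtered false negatives).
ACCEPTABLE = {Outcome.ALREADY_APPLIED, Outcome.SUPERSEDED, Outcome.NOT_APPLICABLE}


@dataclass
class ClusterResult:
    exit_code: int = 0                                # 0 = success, 1 = real problem
    applied: List[str] = field(default_factory=list)
    skipped: List[str] = field(default_factory=list)  # acceptable "errors"
    failed: List[str] = field(default_factory=list)   # would go to a "failed" log
    stopped_at: str = ""                              # patch where install halted


def install_cluster(
    patch_order: List[str],
    apply_patch: Callable[[str], Outcome],
    required_kb: Dict[str, int],
    free_kb: Callable[[], int],
) -> ClusterResult:
    """Apply patches in the given order, halting on the first real problem."""
    result = ClusterResult()
    for patch_id in patch_order:
        # Check space before touching the system: halting here is safer
        # than running out of space part-way through a patch.
        if required_kb.get(patch_id, 0) > free_kb():
            result.exit_code = 1
            result.failed.append(f"{patch_id}: insufficient space")
            result.stopped_at = patch_id
            return result

        outcome = apply_patch(patch_id)
        if outcome is Outcome.APPLIED:
            result.applied.append(patch_id)
        elif outcome in ACCEPTABLE:
            result.skipped.append(patch_id)
        else:
            # Unexpected failure: stop rather than compound the problem
            # by applying further patches.
            result.exit_code = 1
            result.failed.append(f"{patch_id}: {outcome.name}")
            result.stopped_at = patch_id
            return result
    return result


# Usage with made-up patch IDs and simulated outcomes.
simulated = {
    "patch-A": Outcome.APPLIED,
    "patch-B": Outcome.ALREADY_APPLIED,
    "patch-C": Outcome.NOT_APPLICABLE,
    "patch-D": Outcome.FAILED,
    "patch-E": Outcome.APPLIED,  # never attempted: install halts at patch-D
}
demo = install_cluster(list(simulated), simulated.__getitem__, {}, lambda: 10**6)
print(demo.exit_code, demo.applied, demo.skipped, demo.stopped_at)
# prints: 1 ['patch-A'] ['patch-B', 'patch-C'] patch-D
```

Passing the patch application and free-space query in as callables keeps the decision logic separate from the system operations, so the filtering and halting rules can be exercised with simulated outcomes, as in the usage lines above.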
I really want to thank Ed Clark for the enormous amount of thought and effort he has put into improving the cluster installation experience. The work he's done on the Solaris 10 Recommended and Sun Alert patch cluster is a continuation of his previous work on the Solaris Update Patch Bundles and the Solaris 10 Live Upgrade Zones Starter Patch Bundle. Nice work, Ed!
While the 'installcluster' script is copyrighted, I am happy for customers to use it, and the 'patch_order' file, as a starting point for their own customized patch bundles, so long as it is for their own use and is not to be given to a 3rd party or used for commercial gain (e.g. by a 3rd party maintainer or 3rd party commercial automation tool).
We have also made significant improvements to the back end processes to ensure higher and more consistent cluster quality.
Originally, the clusters were created by the Patch Operations and Distribution (POD) team after patch release. The POD Cluster QA process left a lot to be desired, resulting in inconsistent cluster quality. To plug this gap, my Patch System Test team have been testing the clusters for several years, but the old process only allowed us to test them in parallel with their release, which meant that we found issues at the same time that early downloaders of the cluster encountered them. Although we ensured such issues were fixed as quickly as possible, it still obviously compromised our customers' experience.
In the new process, the clusters are routed to Patch System Test (PST) prior to release. PST run a transformation script on them to optimize the patch installation order, etc. The clusters will only be released once they have passed PST testing, which should ensure higher and more consistent quality for customers. Work is continuing to move the entire patch cluster generation process to PST, although these future back-end enhancements should be invisible to customers.