By Gerry Haskins-Oracle on Jan 10, 2008
As mentioned in my initial posting, there isn't a "one size fits all" patching strategy for all customers to use in all circumstances.
Perhaps the most common question customers ask is, "What patches should I apply to my system?"
The answer, unfortunately, is "It depends."
Many factors determine what patching strategy is appropriate for a particular system. These may include:
- Risk profile of the customer. For example, financial institutions tend to be very risk averse. Their change control processes can be onerous.
- Criticality of the system. Is it Life Critical, Mission Critical, Business Critical, or relatively expendable?
- Risk profile of the system. For example, is it behind a firewall, is it vulnerable to Denial of Service attacks, etc.?
- Cost of planned downtime (for proactive patching and maintenance) versus the cost of unplanned downtime (for reactive break-and-fix patching and maintenance).
- Available Maintenance Windows
- Upgrade strategy. Is the customer still on older versions such as Solaris 8 or Solaris 9, or is there a desire to leverage some of the cool new software features (e.g. Containers (Zones), ZFS, DTrace, etc.) or support for cool new hardware available in Solaris 10?
- Desire to keep a relatively homogeneous Operating Environment across similar servers
While I can discuss some evolving thoughts on patching strategies here, please note that Sun Services offer comprehensive solutions tailored for the needs of specific customers. The thoughts expressed here are not a substitute for careful analysis of the specific needs of individual customers.
Risk minimization is a key consideration for many customers when deciding on a patching strategy.
Change implies risk.
There are industry studies which show that for every x number of bug fixes or lines of code changed, a new bug or regression is introduced.
One might logically conclude, therefore, that the more change is applied to a system, the greater the risk of introducing a regression. Hence, one might conclude that applying the minimum number of patches, and hence the minimum amount of change, would minimize risk.
But that's just one factor.
It's also important to consider the test coverage of the various change delivery options. This includes test coverage by Sun as well as test coverage by the customer, channel partner, or other vendor.
Always install the latest patch utilities patch first
Always install the latest version of the Solaris patch utilities patch before installing any other patches. This is important to ensure that you have all the latest fixes to the patch utilities. The patch utilities patches are always contained in the Solaris Recommended and Sun Alert patch clusters and are always installed first, along with any patches which they themselves require.
The patch utilities patches are currently:
Solaris 10 SPARC: 119254
Solaris 10 x86: 119255
Solaris 9 SPARC: 112951
Solaris 9 x86: 114194
Solaris 8 SPARC: 110380
Solaris 8 x86: 110403
Depending on the OS version, several other patches may be required to avoid issues which can impact correct patch application. Such patches are listed in the "Latest patch updates" section on the SunSolve home page.
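The patch utilities list above can be expressed as a simple lookup. The following is a minimal sketch, not a Sun tool: the function names and the sample candidate list are hypothetical, and the revision numbers in the example are made up. It only moves the patch utilities patch to the front of a list; it does not resolve the patches which the utilities patch itself requires.

```python
# Patch utilities patch IDs as listed in this post (January 2008).
PATCH_UTILITIES = {
    ("10", "sparc"): "119254",
    ("10", "x86"): "119255",
    ("9", "sparc"): "112951",
    ("9", "x86"): "114194",
    ("8", "sparc"): "110380",
    ("8", "x86"): "110403",
}


def base_id(patch):
    """Return the base ID of a patch written as ID-revision, e.g. '119254-01'."""
    return patch.split("-", 1)[0]


def utilities_first(patches, release, arch):
    """Reorder patches so the patch utilities patch, if present, comes first.

    The relative order of all other patches is preserved.
    """
    utility = PATCH_UTILITIES[(release, arch)]
    first = [p for p in patches if base_id(p) == utility]
    rest = [p for p in patches if base_id(p) != utility]
    return first + rest


# Hypothetical candidate list; the revision numbers are made up.
candidates = ["100001-02", "119254-01", "100002-07"]
print(utilities_first(candidates, "10", "sparc"))
```

In practice the patch clusters already handle this ordering, so a helper like this would only matter when assembling your own patch list.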
Solaris Patch Management: Recommended Strategy
The "Solaris Patch Management: Recommended Strategy" document, available from http://docs.sun.com and linked off the SunSolve "Patches and Updates" page, is a good starting point.
Perhaps surprisingly, it shows from an analysis of customer Explorer data that the more patches which are applied to a system, the more downtime that system will experience. This is largely because, as discussed in the preceding posting, a number of patches require downtime in order to be installed on a live boot environment.
However, in many cases the cost of unplanned downtime to fix issues is much, much higher than the cost of planned downtime to facilitate preventative patching to prevent issues from occurring in the first place.
The trick is to know which patches are likely to prevent issues on a particular system.
Recommended and Sun Alert Patch Clusters
When deciding what patches to apply to Solaris, the Recommended and the Sun Alert Patch Clusters, which are available from SunSolve to customers with a valid support contract, are a good starting point. They provide:
- The latest revision of the patch utilities patch
- Solaris patches which address Sun Alert issues - that is, patches which address Security, Data Corruption, or System Availability issues.
- Any patch which is required by either of the above.
The main difference between the Recommended Patch Cluster and the newer Sun Alert Patch Cluster is that the Sun Alert Patch Cluster contains the lowest revision of patches which address Sun Alert issues while the Recommended Patch Cluster contains the latest available revision of such patches. Both are good options.
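The difference in revision selection can be illustrated with a small sketch. It assumes that patch revisions are cumulative, so any revision at or above the one which first addressed a Sun Alert issue also contains that fix. The `PatchHistory` type, the function names, and the patch ID are hypothetical and used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatchHistory:
    patch_id: str             # hypothetical base patch ID
    released: tuple           # all released revisions, as integers
    alert_fix_revision: int   # first revision addressing the Sun Alert issue


def sun_alert_choice(history):
    """Lowest released revision which addresses the Sun Alert issue."""
    return min(r for r in history.released if r >= history.alert_fix_revision)


def recommended_choice(history):
    """Latest available revision of the same patch."""
    return max(history.released)


def format_patch(patch_id, revision):
    """Format as ID-revision with a two-digit revision, e.g. '100001-03'."""
    return f"{patch_id}-{revision:02d}"


# Made-up example: five revisions released, the fix first appeared in -03.
history = PatchHistory("100001", (1, 2, 3, 4, 5), 3)
print(format_patch(history.patch_id, sun_alert_choice(history)))    # 100001-03
print(format_patch(history.patch_id, recommended_choice(history)))  # 100001-05
```

The Sun Alert choice minimizes change while still fixing the issue; the Recommended choice takes all later fixes to the same patch as well.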
Note, the Recommended and Sun Alert Clusters only contain patches for the Solaris OS. They do not contain patches for middleware or application layer products such as Java ES, SunStudio, etc.
Both the Recommended and the Sun Alert Patch Clusters come with an install_cluster script and a patch_order file listing the order in which the patches are to be installed. See the Cluster README files linked off the "Patches and Updates" page on SunSolve for further information. (On http://sunsolve.sun.com/show.do?target=patches/patch-access , "Solaris 10 x86" is the Solaris 10 x86 Recommended Cluster and "Solaris 10 x86 Sun Alert Patch Cluster" is self-explanatory.)
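If you want to inspect a cluster before running install_cluster, the patch_order file can be read programmatically. The sketch below assumes the file lists one patch per line in ID-revision form; skipping blank lines and lines starting with "#" is also an assumption, so check the Cluster README for the actual format. The sample contents use made-up patch IDs.

```python
def parse_patch_order(text):
    """Return (base_id, revision) pairs in the order they appear."""
    order = []
    for line in text.splitlines():
        entry = line.strip()
        # Assumption: blank lines and '#' comments are not patch entries.
        if not entry or entry.startswith("#"):
            continue
        base, _, revision = entry.partition("-")
        order.append((base, revision))
    return order


# Hypothetical patch_order contents.
sample = """\
100001-02
100002-07
100003-11
"""

for base, revision in parse_patch_order(sample):
    print(base, revision)
```

The list order is preserved because the order in patch_order is the order in which the patches are to be installed.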
Applying the Solaris Recommended or Sun Alert patch cluster at each available maintenance window, plus any patches for fixes for bugs which you as a customer have filed yourself, is a good approach to proactive patching.
In between maintenance windows, monitor new Sun Alerts which are issued and determine whether your systems are vulnerable to the issue. If the risk of the issue occurring is low or the consequences of the potential problem manageable, you may decide that it's OK to wait until the next maintenance window before applying the patch or taking whatever other action is recommended in the Sun Alert. If the risk of the issue occurring is high and the potential problem severe, consider applying the patch or taking whatever other action is recommended in the Sun Alert as soon as possible.
Apart from these Solaris patch clusters, other patch clusters are available on http://sunsolve.sun.com/show.do?target=patches/patch-access for other products, such as J2SE and Java ES.
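The Sun Alert triage rule described above, for use between maintenance windows, can be written down as a small decision function. This is only a sketch of the reasoning in this post: the names are hypothetical, and reducing likelihood and severity to yes/no answers is a simplification which each customer would need to refine for their own systems.

```python
from enum import Enum


class Action(Enum):
    NOT_AFFECTED = "systems are not vulnerable to this issue"
    WAIT_FOR_WINDOW = "wait until the next maintenance window"
    APPLY_NOW = "apply the patch or recommended action as soon as possible"


def triage_sun_alert(vulnerable, likelihood_high, impact_severe):
    """Suggest what to do about a newly issued Sun Alert."""
    if not vulnerable:
        return Action.NOT_AFFECTED
    # High risk of occurrence AND severe consequences: act promptly.
    if likelihood_high and impact_severe:
        return Action.APPLY_NOW
    # Low risk of occurrence OR manageable consequences: it can wait.
    return Action.WAIT_FOR_WINDOW


print(triage_sun_alert(True, True, True).value)
print(triage_sun_alert(True, False, True).value)
```

Note that "low risk or manageable consequences" is the logical complement of "high risk and severe consequences", so the two branches cover every vulnerable system.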
Installing or Upgrading to the latest Solaris Update Release
Each bi-weekly build of the next Solaris Marketing Release ( "Nevada" ) and the next Solaris 10 Update Release is intensely tested by a large number of test teams throughout Sun. Each team has a particular focus, from functional testing of new features, regression testing of pre-existing features, performance improvement testing (each release should be faster than the last), new hardware testing, hardware regression testing, Desktop, Globalization, Accessibility, SunCluster, Java Enterprise System, patch testing, etc.
Due to the intensive testing of Update releases, installing or upgrading a system to the latest available Update Release should be seriously considered by customers wishing to minimize risk.
While each Update release contains a significant amount of code change, it has been intensely tested as a unit. As previously mentioned, each Update contains all the available bug fixes at the time it was built. Therefore, pre-existing functionality should be more stable and more performant in each successive Update. The latest available Update should therefore provide a good stable baseline for customers.
For example, Solaris 10 8/07 (Update 4) not only introduces cool new software features and support for cool new hardware, it also contains many fixes and enhancements to pre-existing functionality such as Containers (Zones). One such enhancement is the ability to upgrade Zones, which significantly improves their maintainability.
The complexities of patching the live boot environment of a pre-Update 4 system with Zones can be avoided by upgrading to Update 4 instead.
"Dim Sum" Patching
As previously mentioned, new bug fixes and features "soak" in the next Marketing Release of Solaris under development (currently "Nevada") to shake out any bugs in the code. Only then are the code changes allowed to be put back into an older release of Solaris for inclusion in a patch and, if the patch is for Solaris 10, in the next Update Release.
In this way, patches leverage the intensive testing done on the Marketing and Update releases. Indeed, Solaris 10 patches leverage this intensive testing twice: once in the Marketing Release "soak" test, and again when the bug fix is included in builds of the next Solaris 10 Update.
The bug fix in each patch is verified by the responsible engineers in-house using a test case and/or by providing the T-Patch (Test Patch) to escalating customers to verify that it fixes the issue. In addition, the patch will be tested by the Patch System Test group and potentially by other test teams such as the Desktop QA or Hardware QA teams. Only when all verification and testing has been successfully completed will the patch be released to SunSolve. See an Overview of Patch Testing on SunSolve for further details.
The Patch System Test group tests the patch both individually with any required patches, and cumulatively along with all other available patches. Testing the patch on its own helps ensure that all patch requirements have been correctly specified. Testing the patch in combination with all other available patches helps ensure that there are no bad interactions between patches. Testing these boundary conditions gives confidence that other patch combinations should work.
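The two boundary conditions can be thought of as two install sets derived from the patch requirements. The sketch below is an illustration only, not Sun's test tooling: the function names, the `requires` mapping, and the patch IDs are all hypothetical. It computes the minimal set (a patch plus everything it transitively requires) and the cumulative set (all available patches), each ordered with prerequisites first.

```python
def _visit(patch, requires, seen, ordered):
    """Depth-first walk which appends prerequisites before the patch itself."""
    if patch in seen:
        return
    seen.add(patch)
    for dependency in requires.get(patch, ()):
        _visit(dependency, requires, seen, ordered)
    ordered.append(patch)


def minimal_set(patch, requires):
    """The patch plus its transitive requirements, prerequisites first."""
    ordered = []
    _visit(patch, requires, set(), ordered)
    return ordered


def cumulative_set(requires):
    """All available patches, prerequisites first."""
    ordered, seen = [], set()
    for patch in sorted(requires):
        _visit(patch, requires, seen, ordered)
    return ordered


# Made-up requirements: 100003-01 requires 100002-01, which requires 100001-01.
requires = {
    "100001-01": [],
    "100002-01": ["100001-01"],
    "100003-01": ["100002-01"],
    "100004-01": [],
}
print(minimal_set("100003-01", requires))
print(cumulative_set(requires))
```

If a requirement is missing from the patch metadata, the minimal set is the configuration most likely to expose it, which is why testing the patch on its own is valuable.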
Extreme lengths are also taken in the code development and putback approval processes to ensure that patch requirements are correctly specified and that the change is compatible, well designed, and will not introduce regressions.
Nevertheless, if a customer takes individual patches which they feel are appropriate to their system, outside of a defined patch cluster such as the Recommended or Sun Alert Patch Cluster, they may end up running a code combination which has never been tested as a unit.
The various checks and balances in the patch process should be fully sufficient to ensure this code combination is stable and functional. But from a risk management perspective, running code which may not have been tested as a unit remains a finite risk.
This is what Bart Smaalders refers to as "Dim Sum" patching.
Most customers have practiced "Dim Sum" patching for years and, in general, it works very well. Even with the massive amount of code changes included in Solaris 10 Update Releases compared to Solaris 8 or Solaris 9 Update Releases, there have been very few issues as a result of "Dim Sum" patching.
But using a defined baseline has an advantage: the baseline has been tested, to varying degrees, as a unit. Candidate baselines include the latest available Solaris Update release, the Recommended or Sun Alert Cluster, the EIS CD (via Sun Connection 1.1.1 Satellite or xVM Ops Center 1.0), or another set of patches, with Solaris Update releases the most intensely tested of those options.
This is a case where taking more change by installing or upgrading to the latest Solaris Update may actually imply less risk.
Testing by Sun is just one factor. Testing by the customer, channel partner, or other vendor also plays a significant part in managing risk.
If the customer has a test setup which exactly mirrors their live production environment, with tests which mimic normal and peak loads, then their confidence level in any patching strategy they choose can be very high.
The less sophisticated the customer test environment, the more the customer is relying on Sun's Development and QA processes to catch all the issues.
The good news is that Sun's Development processes are meticulous and mature and the QA processes sophisticated and effective.
That doesn't stop all issues escaping to customers but, in general, the quality of patches is very high.
For example, out of approximately 4,500 patches released by Sun in 2007, only 70 have been subsequently withdrawn due to serious issues with them.
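To put those figures in proportion, the withdrawal rate can be worked out directly. This is simple arithmetic on the approximate numbers quoted above, so the result is itself approximate; the function name is just for illustration.

```python
def withdrawal_rate(withdrawn, released):
    """Fraction of released patches which were later withdrawn."""
    return withdrawn / released


released_2007 = 4500   # approximate figure quoted in this post
withdrawn_2007 = 70

rate = withdrawal_rate(withdrawn_2007, released_2007)
print(f"{rate:.2%}")                            # 1.56%
print(round(released_2007 / withdrawn_2007))    # 64, i.e. roughly 1 patch in 64
```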
A number of customers like to wait a set period of time after a patch has been released before considering installing it to see if Sun or other customers find issues with it.
This is a reasonable strategy.
Some customers wait until 6 weeks after a patch has been released before applying it. Data analysis shows that there is no particular significance in this time period.
Data analysis also shows that, on the rare occasions when patches are withdrawn from SunSolve after release due to serious issues, the time between release and withdrawal follows no clear pattern: there is no period before which a patch can be considered unsafe or after which it can be considered safe. Some patches are withdrawn within the first 3 weeks of release, others not until 18 months or 2 years later.
However, it is reasonable to assume that if a patch hasn't been withdrawn for a significant period of time, any serious issue it does have arises only in a rare configuration which most customers won't encounter.