Patching Strategies

As mentioned in my initial posting, there isn't a "one size fits all" patching strategy for all customers to use in all circumstances.

Perhaps the most common question customers ask is, "What patches should I apply to my system?"

The answer, unfortunately, is "It depends."

Many factors determine what patching strategy is appropriate for a particular system.  These may include:

  • Risk profile of the customer.  For example, financial institutions tend to be very risk averse.  Their change control processes can be onerous.
  • Criticality of the system.  Is it Life Critical, Mission Critical, Business Critical, or relatively expendable?
  • Risk profile of the system.  For example, is it behind a firewall?  Is it vulnerable to Denial of Service attacks?
  • Cost of planned downtime (for proactive patching and maintenance) versus the cost of unplanned downtime (for reactive break-and-fix patching and maintenance).
  • Available Maintenance Windows
  • Upgrade strategy.  Is the customer still on an older version such as Solaris 8 or Solaris 9, or is there a desire to leverage some of the cool new software features (e.g. Containers (Zones), ZFS, DTrace) or the support for cool new hardware available in Solaris 10?
  • Desire to keep a relatively homogeneous Operating Environment across similar servers
  • etc.

While I can discuss some evolving thoughts on patching strategies here, please note that Sun Services offer comprehensive solutions tailored for the needs of specific customers.  The thoughts expressed here are not a substitute for careful analysis of the specific needs of individual customers.

Minimizing Risk

Risk minimization is a key consideration for many customers when deciding on a patching strategy.

Change implies risk.

Industry studies show that for every x bug fixes or lines of code changed, a new bug or regression is introduced.

One might logically conclude, therefore, that the more change is applied to a system, the greater the risk of introducing a regression.  Hence, one might conclude that applying the minimum number of patches, and so the minimum amount of change, would minimize risk.

But that's just one factor.

It's also important to consider the test coverage of the various change delivery options.  This includes test coverage by Sun as well as test coverage by the customer, channel partner, or other vendor.

Always install the latest patch utilities patch first

Always install the latest version of the Solaris patch utilities patch before installing any other patches.  This is important to ensure that you have all the latest fixes to the patch utilities.  The patch utility patches are always contained in the Solaris Recommended and SunAlert patch clusters and are always installed first, along with any patches which they themselves require.

The patch utilities patches are currently:

  Solaris 10 SPARC:  119254
  Solaris 10 x86:    119255
  Solaris 9 SPARC:   112951
  Solaris 9 x86:     114194
  Solaris 8 SPARC:   110380
  Solaris 8 x86:     110403
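
As a rough illustration only (this is not a Sun tool), the list above can be captured as a small lookup that moves the patch utilities patch to the front of a list of patches.  The function names and the placeholder patch IDs in the usage note are my own assumptions.  The sketch also ignores any patches which the utilities patch itself requires.

```python
# Base IDs of the patch utilities patches, copied from the list above.
# Revision suffixes (the "-NN" part) are deliberately omitted.
PATCH_UTILITIES_PATCH = {
    ("10", "sparc"): "119254",
    ("10", "x86"): "119255",
    ("9", "sparc"): "112951",
    ("9", "x86"): "114194",
    ("8", "sparc"): "110380",
    ("8", "x86"): "110403",
}


def patch_utilities_patch(solaris_version, arch):
    """Return the base ID of the patch utilities patch for a release."""
    key = (str(solaris_version), arch.lower())
    if key not in PATCH_UTILITIES_PATCH:
        raise ValueError(f"no patch utilities patch listed for {key}")
    return PATCH_UTILITIES_PATCH[key]


def utilities_first(patch_ids, solaris_version, arch):
    """Reorder patch_ids so the patch utilities patch comes first.

    Patch IDs are assumed to look like "<base>-<revision>".
    The relative order of all other patches is preserved.
    """
    base = patch_utilities_patch(solaris_version, arch)
    first = [p for p in patch_ids if p.split("-")[0] == base]
    rest = [p for p in patch_ids if p.split("-")[0] != base]
    return first + rest
```

For example, `utilities_first(["100001-01", "119254-01", "100002-01"], "10", "SPARC")` puts `119254-01` at the head of the list.  The other IDs and all revision numbers in that call are placeholders, not real patches.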

Depending on the OS version, several other patches may be required to avoid issues which can impact correct patch application.  Such patches are listed in the "Latest patch updates" section on the SunSolve home page.

Solaris Patch Management: Recommended Strategy 

The Solaris Patch Management: Recommended Strategy available from http://docs.sun.com and linked off the SunSolve "Patches and Updates" page is a good starting point. 

Perhaps surprisingly, its analysis of customer Explorer data shows that the more patches are applied to a system, the more downtime that system will experience.  This is largely because, as discussed in the preceding posting, a number of patches require downtime in order to be installed on a live boot environment.

However, in many cases the cost of unplanned downtime to fix issues is much, much higher than the cost of planned downtime to facilitate preventative patching to prevent issues from occurring in the first place. 
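
The trade-off can be made concrete with a back-of-the-envelope comparison.  The sketch below is only an illustration: every figure in it (hours, hourly costs, and the probability of an incident) is a hypothetical placeholder, and the simple expected-cost model is my assumption rather than anything taken from the strategy document.

```python
def planned_cost(window_hours, cost_per_hour):
    """Cost of a planned maintenance window."""
    return window_hours * cost_per_hour


def expected_unplanned_cost(incident_probability, outage_hours, cost_per_hour):
    """Expected cost of an outage that proactive patching would have prevented."""
    return incident_probability * outage_hours * cost_per_hour


def proactive_patching_pays(planned, expected_unplanned):
    """True when the planned window is cheaper than the expected outage."""
    return planned < expected_unplanned


# Hypothetical figures, for illustration only.
planned = planned_cost(window_hours=2, cost_per_hour=1000)
unplanned = expected_unplanned_cost(
    incident_probability=0.25, outage_hours=8, cost_per_hour=5000
)
print(planned, unplanned, proactive_patching_pays(planned, unplanned))
```

With these made-up numbers the planned window costs 2000 while the expected cost of the outage is 10000, so proactive patching comes out ahead.  Real figures will differ widely from system to system.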

The trick is to know which patches are likely to prevent issues on a particular system.

Recommended and Sun Alert Patch Clusters 

When deciding what patches to apply to Solaris, the Recommended and the Sun Alert Patch Clusters, which are available from SunSolve to customers with a valid support contract, are a good starting point.  They provide:
  • The latest revision of the patch utilities patch
  • Solaris patches which address Sun Alert issues - that is, patches which address Security, Data Corruption, or System Availability issues.
  • Any patch which is required by either of the above.

The main difference between the Recommended Patch Cluster and the newer Sun Alert Patch Cluster is that the Sun Alert Patch Cluster contains the lowest revision of patches which address Sun Alert issues while the Recommended Patch Cluster contains the latest available revision of such patches.  Both are good options.
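
The difference in revision selection can be sketched as follows.  This is a simplified illustration, not how the clusters are actually built: it assumes each patch has a single "lowest revision which addresses the Sun Alert issue", and the data structures, function names, and patch IDs are all hypothetical.

```python
def format_patch(base, revision):
    """Render a patch ID as "<base>-<two-digit revision>"."""
    return f"{base}-{revision:02d}"


def sun_alert_cluster(fix_revision):
    """Lowest revision of each patch which addresses a Sun Alert issue.

    fix_revision maps a base patch ID to that lowest revision number.
    """
    return [format_patch(b, r) for b, r in sorted(fix_revision.items())]


def recommended_cluster(fix_revision, released_revisions):
    """Latest available revision of each patch which addresses a Sun Alert issue.

    released_revisions maps a base patch ID to all released revision numbers.
    """
    return [
        format_patch(b, max(released_revisions[b]))
        for b in sorted(fix_revision)
    ]
```

With hypothetical data `fix_revision = {"100001": 2}` and `released_revisions = {"100001": [1, 2, 3, 4]}`, the Sun Alert selection is `100001-02` while the Recommended selection is `100001-04`.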

Note, the Recommended and Sun Alert Clusters only contain patches for the Solaris OS.  They do not contain patches for middleware or application layer products such as Java ES, SunStudio, etc. 

Both the Recommended and the Sun Alert Patch Clusters come with an install_cluster script and a patch_order file listing the order in which the patches are to be installed.  See the Cluster README files linked off the "Patches and Updates" page on SunSolve for further information.  (On http://sunsolve.sun.com/show.do?target=patches/patch-access, "Solaris 10 x86" is the Solaris 10 x86 Recommended Cluster and "Solaris 10 x86 Sun Alert Patch Cluster" is self-explanatory.)
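
To show what "install in patch_order order" means in practice, here is a minimal sketch.  It assumes the patch_order file holds one patch ID per line; skipping blank lines and "#" comment lines is my own defensive assumption.  It does not replace install_cluster, and the helper names and patch IDs are hypothetical.

```python
def read_patch_order(text):
    """Return the patch IDs from patch_order content, in listed order."""
    order = []
    for line in text.splitlines():
        entry = line.strip()
        # Assumption: ignore blank lines and '#' comments if present.
        if not entry or entry.startswith("#"):
            continue
        order.append(entry)
    return order


def remaining_in_order(order, already_installed):
    """Patches still to install, keeping the cluster's prescribed order."""
    installed = set(already_installed)
    return [patch for patch in order if patch not in installed]
```

For example, if the file lists `100001-01`, `100002-01`, `100003-01` (placeholder IDs) and `100002-01` is already on the system, the remaining patches are `100001-01` followed by `100003-01`, in that order.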

Applying the Solaris Recommended or Sun Alert patch cluster at each available maintenance window, plus any patches for fixes for bugs which you as a customer have filed yourself, is a good approach to proactive patching.

In between maintenance windows, monitor new Sun Alerts which are issued and determine whether your systems are vulnerable to the issue.  If the risk of the issue occurring is low or the consequences of the potential problem manageable, you may decide that it's OK to wait until the next maintenance window before applying the patch or taking whatever other action is recommended in the Sun Alert.  If the risk of the issue occurring is high and the potential problem severe, consider applying the patch or taking whatever other action is recommended in the Sun Alert as soon as possible.
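
The decision rule in the preceding paragraph can be written down as a tiny triage function.  This is a sketch of the reasoning only.  The two-level scale, the names, and the returned strings are my own assumptions, and real Sun Alert assessment involves judgment that a function like this cannot capture.

```python
from enum import Enum


class Level(Enum):
    LOW = "low"
    HIGH = "high"


def triage_sun_alert(vulnerable, likelihood, severity):
    """Suggest when to act on a Sun Alert.

    vulnerable: whether the system is exposed to the issue at all.
    likelihood: Level for the risk of the issue occurring.
    severity:   Level for the consequences (LOW meaning manageable).
    """
    if not vulnerable:
        return "no action needed"
    if likelihood is Level.HIGH and severity is Level.HIGH:
        return "act as soon as possible"
    # Low likelihood, or manageable consequences: it can wait.
    return "wait for next maintenance window"
```

For instance, `triage_sun_alert(True, Level.HIGH, Level.LOW)` suggests waiting for the next maintenance window, because the consequences are manageable even though the issue is likely to occur.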

Apart from these Solaris patch clusters, other patch clusters are available on http://sunsolve.sun.com/show.do?target=patches/patch-access for other products, such as J2SE and Java ES.

Installing or Upgrading to the latest Solaris Update Release

Each bi-weekly build of the next Solaris Marketing Release ("Nevada") and of the next Solaris 10 Update Release is intensely tested by a large number of test teams throughout Sun.  Each team has a particular focus: functional testing of new features, regression testing of pre-existing features, performance improvement testing (each release should be faster than the last), new hardware testing, hardware regression testing, Desktop, Globalization, Accessibility, SunCluster, Java Enterprise System, patch testing, and so on.

Due to the intensive testing of Update releases, installing or upgrading a system to the latest available Update Release should be seriously considered by customers wishing to minimize risk. 

While each Update release contains a significant amount of code change, it has been intensely tested as a unit.  As previously mentioned, each Update contains all the available bug fixes at the time it was built.  Therefore, pre-existing functionality should be more stable and more performant in each successive Update.  The latest available Update should therefore provide a good stable baseline for customers.

For example, Solaris 10 8/07 (Update 4) not only introduces cool new software features and support for cool new hardware, it also contains many fixes and enhancements to pre-existing functionality such as Containers (Zones).  One example is the ability to upgrade Zones, which significantly improves the maintainability of Zones.

The complexities of patching the live boot environment of pre-Update 4 Zones systems can be avoided by upgrading to Update 4 instead.

"Dim Sum" Patching

As previously mentioned, new bug fixes as well as features "soak" in the next Marketing Release of Solaris under development (currently "Nevada") to shake out any bugs in the code.  Only then are the code changes allowed to be put back into an older release of Solaris for inclusion in a patch and, if the patch is for Solaris 10, in the next Update Release.

In this way, patches leverage the intensive testing done on the Marketing and Update releases.  Indeed, Solaris 10 patches leverage this intensive testing twice: once in the Marketing Release "soak" test, and again when the bug fix is included in builds of the next Solaris 10 Update.

The bug fix in each patch is verified by the responsible engineers in-house using a test case and/or by providing the T-Patch (Test Patch) to escalating customers to verify that it fixes the issue.  In addition, the patch will be tested by the Patch System Test group and potentially by other test teams such as the Desktop QA or Hardware QA teams.  Only when all verification and testing has been successfully completed will the patch be released to SunSolve.  See an Overview of Patch Testing on SunSolve for further details.

The Patch System Test group tests each patch both individually, with only its required patches, and cumulatively, along with all other available patches.  Testing the patch on its own helps ensure that all patch requirements have been correctly specified.  Testing the patch in combination with all other available patches helps ensure that there are no bad interactions between patches.  Testing these boundary conditions gives confidence that other patch combinations should work.
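
The two boundary conditions can be pictured as two sets of patches: the smallest set on which the patch can be installed, and the largest set available.  The sketch below is purely illustrative and is not a description of Sun's internal test tooling.  The dependency map, function names, and patch names are hypothetical.

```python
def required_closure(patch, requires):
    """All patches directly or indirectly required by the given patch.

    requires maps a patch to the patches it directly requires.
    """
    seen = set()
    stack = [patch]
    while stack:
        current = stack.pop()
        for dep in requires.get(current, ()):
            if dep != patch and dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def build_test_configurations(patch, requires, all_available):
    """Return the (minimal, cumulative) patch sets for testing a patch.

    minimal:    the patch plus only what it requires.
    cumulative: the patch plus every other available patch.
    """
    minimal = {patch} | required_closure(patch, requires)
    cumulative = {patch} | set(all_available)
    return minimal, cumulative
```

If the minimal configuration fails to install, a requirement is probably missing from the patch's declared dependencies.  If only the cumulative configuration misbehaves, an interaction with another patch is the more likely suspect.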

Extreme lengths are also taken in the code development and putback approval processes to ensure that patch requirements are correctly specified and that the change is compatible, well designed, and will not introduce regressions.

Nevertheless, if a customer takes individual patches which they feel are appropriate to their system, outside of a defined patch cluster such as the Recommended or Sun Alert Patch Cluster, they may end up running a code combination which has never been tested as a unit. 

The various checks and balances in the patch process should be fully sufficient to ensure this code combination is stable and functional.  But from a risk management perspective, running code which may not have been tested as a unit remains a finite risk.

This is what Bart Smaalders refers to as "Dim Sum" patching.

Most customers have practiced "Dim Sum" patching for years and, in general, it works very well.  Even with the massive amount of code changes included in Solaris 10 Update Releases compared to Solaris 8 or Solaris 9 Update Releases, there have been very few issues as a result of "Dim Sum" patching.

But using a defined baseline has an advantage: the baseline has been tested, to varying degrees, as a unit.  Such a baseline might be the latest available Solaris Update release, the Recommended or Sun Alert Cluster, the EIS CD (via Sun Connection 1.1.1 Satellite or xVM Ops Center 1.0), or another set of patches.  Solaris Update releases are the most intensely tested of those options.

This is a case where taking more change by installing or upgrading to the latest Solaris Update may actually imply less risk.

Customer Testing

Testing by Sun is just one factor.  Testing by the customer, channel partner, or other vendor also plays a significant part in managing risk.

If the customer has a test setup which exactly mirrors their live production environment, with tests which mimic normal and peak loads, then their confidence level in any patching strategy they choose can be very high.

The less sophisticated the customer test environment, the more the customer is relying on Sun's Development and QA processes to catch all the issues.

Patch Quality 

The good news is that Sun's Development processes are meticulous and mature, and the QA processes sophisticated and effective.

That doesn't stop all issues escaping to customers but, in general, the quality of patches is very high. 

For example, out of approximately 4,500 patches released by Sun in 2007, only 70 have been subsequently withdrawn due to serious issues with them.
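
Expressed as a rate, using only the two figures quoted above (the function name is my own):

```python
def withdrawal_rate_percent(withdrawn, released):
    """Share of released patches later withdrawn, as a percentage."""
    return 100.0 * withdrawn / released


# Figures from the text: about 4,500 patches released, 70 withdrawn.
rate = withdrawal_rate_percent(70, 4500)
print(f"{rate:.2f}%")  # prints "1.56%"
```

That is roughly one patch in every 64 released.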

Patch Maturity

A number of customers like to wait a set period of time after a patch has been released before considering installing it to see if Sun or other customers find issues with it.

This is a reasonable strategy.

Some customers wait until 6 weeks after a patch has been released before applying it.  Data analysis shows that there is no particular significance in this time period.

On the rare occasions when patches are withdrawn from SunSolve after release due to serious issues, the time between release and withdrawal follows no clear pattern.  There is no period before which a patch can be considered unsafe or after which it can be considered safe.  Some patches are withdrawn within the first 3 weeks of release.  Others not until 18 months or 2 years later.

However, it is reasonable to assume that patches which aren't withdrawn for a significant period of time only have a serious issue in a rare configuration which most customers won't encounter.

Comments:

Hi:

So, is it better to use the Sun Recommended cluster, or a tool such as PCA, and download and install all the patches?

Second question: if I use the 6/06 release with SC 3.1 and zones (local zones) and a clusterized zone, is it a good idea to update everything at the same time, node by node?

Posted by pbal on January 10, 2008 at 06:28 AM GMT #

pca is a good 3rd party tool, written by Martin Paul. This particular posting was largely about what to apply rather than how to apply it. I'll deal a little with patch automation tools in a subsequent posting.

Regarding your Sun Cluster question, that's outside my personal expertise, so I'll see if I can find someone who can answer that. My limited understanding is yes, Cluster nodes are usually patched using a rolling update, with each node updated in turn.

Posted by Gerry Haskins on January 10, 2008 at 07:27 AM GMT #

The following document describes how to patch a clusterized zone:
http://docs.sun.com/app/docs/doc/819-2664/ftzlv?a=view

The above document describes patching a zone which is part of the cluster (fails over to other nodes).

I suggest you read the SunCluster documentation on docs.sun.com and follow up on the zones-discuss@opensolaris.sun.com alias if you have further questions on patching Zones on SunCluster.

Posted by Gerry Haskins on January 10, 2008 at 08:50 AM GMT #

I like and agree with what you've written so far.

I'd be interested in hearing your comments on the performance of the patch tools: currently it is terrible for reasons which are fairly easy to discover if not widely known. With Solaris 10 this is becoming a serious problem as the time to apply large patch bundles can exceed the largest available window for such work (many organisations are not happy for even LU-based patching to be taking place during critical periods). For machines with whole root zones it can exceed the largest window by a very substantial amount.

I remain surprised that Sun have not chosen to fix this performance issue (which is basically an algorithmic problem). What I've heard most recently is variations of "everything will be different in Nevada/Indiana and we basically are not going to fix this in 10". This is pretty depressing if true: it says something bad about Sun's attitude to their customers, who are hurting, and who will not be deploying Nevada/Indiana for years to come.

Posted by Tim Bradshaw on January 16, 2008 at 10:19 AM GMT #

Hi Tim!

Yes, patching performance in general, and on Zones in particular, is a known issue which Sun does intend to address.

Some prototype scripts existed for parallel patching of Zones, but unfortunately these are incompatible with Deferred Activation Patching.

Now that we have Deferred Activation Patching sorted, improving patching performance, especially on Zones, is close to being top of the list of remaining issues to tackle.

Regarding your references to Nevada/Indiana, the issues encountered with Solaris 10 patching are part of the inspiration for the Image Packaging System (IPS), which aims to resolve the underlying issues architecturally. I'm excited about what I'm hearing. Both Bart Smaalders and David Comay are intimately familiar with the issues with the current package and patching architecture, so they are well placed to architect solutions in IPS.

However, please be assured that Sun will continue to address the issues in the current package and patching implementation, including performance.

I'm not in a position at the moment to say when we'll have a performance enhancement solution available for customers to use, but I will keep you posted in this blog.

Posted by Gerry Haskins on January 17, 2008 at 11:09 AM GMT #

About

This blog is to inform customers about patching best practice, feature enhancements, and key issues. The views expressed on this blog are my own and do not necessarily reflect the views of Oracle. The Documents contained within this site may include statements about Oracle's product development plans. Many factors can materially affect these plans and the nature and timing of future product releases. Accordingly, this Information is provided to you solely for information only, is not a commitment to deliver any material code, or functionality, and SHOULD NOT BE RELIED UPON IN MAKING PURCHASING DECISIONS. The development, release, and timing of any features or functionality described remains at the sole discretion of Oracle. THIS INFORMATION MAY NOT BE INCORPORATED INTO ANY CONTRACTUAL AGREEMENT WITH ORACLE OR ITS SUBSIDIARIES OR AFFILIATES. ORACLE SPECIFICALLY DISCLAIMS ANY LIABILITY WITH RESPECT TO THIS INFORMATION. ~~~~~~~~~~~~ Gerry Haskins, Director, Software Lifecycle Engineer
