Wednesday Jul 25, 2007

Rethinking patching

As Stephen mentioned recently, several of us have been thinking about revising the way we manage software change on Solaris.  I've been particularly focused on the difficulties Sun and its customers have with the patching process, and on the kinds of changes we need to make as a result in our technology and development processes.

 Today, most customers don't run OpenSolaris; they run a supported version of Solaris such as Solaris 8, 9 or 10.  A supported release means that someone will answer the phone, and that patches for problems are available.

Patches are a separate software change control mechanism, distinct from package versions in Solaris.  Each patch may affect portions of several packages; patches are intended to include all the files necessary to fix one or more problems, either directly or by specifying dependencies.  If a patch affects packages which are not installed on the system (typically because it has been minimized), those portions of the patch are not installed.  If the administrator later adds the missing package, he must remember (good luck) to re-apply the patches, since the packaging code knows nothing of patches.
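To make the re-apply problem concrete, here is a minimal sketch in Python.  It is a toy model, not how the Solaris patch tools are implemented: the `System` class, the function names, and the package and patch identifiers are all hypothetical, invented purely for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class System:
    """Toy model of a host: installed packages plus a record of applied patches."""
    packages: set[str]
    # patch id -> packages whose portion of that patch was actually installed
    applied: dict[str, set[str]] = field(default_factory=dict)


def apply_patch(system: System, patch_id: str, affected: set[str]) -> set[str]:
    """Install only the portions for packages that are present.

    Returns the packages whose portions were skipped.
    """
    installed_portions = affected & system.packages
    system.applied.setdefault(patch_id, set()).update(installed_portions)
    return affected - installed_portions


def add_package(system: System, package: str) -> None:
    """Add a package. The packaging side knows nothing of patches,
    so nothing is re-applied here."""
    system.packages.add(package)


def stale_patches(system: System, patch_contents: dict[str, set[str]]) -> dict[str, set[str]]:
    """Find applied patches with portions for now-installed packages
    that were never actually installed."""
    stale = {}
    for patch_id, affected in patch_contents.items():
        if patch_id not in system.applied:
            continue
        missing = (affected & system.packages) - system.applied[patch_id]
        if missing:
            stale[patch_id] = missing
    return stale


# Hypothetical patch touching two hypothetical packages.
patches = {"patch-A": {"pkg-core", "pkg-driver"}}
box = System(packages={"pkg-core"})          # minimized: no pkg-driver
skipped = apply_patch(box, "patch-A", patches["patch-A"])
add_package(box, "pkg-driver")               # added later, unpatched
needs_reapply = stale_patches(box, patches)
```

After the package is added, `needs_reapply` reports that `patch-A` must be applied again for `pkg-driver`.  In this model something has to go looking for that gap; nothing in the packaging step notices it.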

Customers are free today to install whichever patches they feel are appropriate for their environment, consistent with the built-in dependency requirements.  This customization is a technique I refer to as Dim Sum patching, and it is a major cause of patching difficulties.  Many customers pick and choose among the thousands of patches available for Solaris 10, for example; this means that customers are often pioneering new configurations.  Note that each Solaris release consists of a single source base; all Solaris 10 updates, for example, are but snapshots of the same Solaris patch gate at different times.  As a result, the developers are working on a cumulative set of all previous changes; when a new patch is created, the files in the patch contain not only the desired fix but all previous fixes to those files as well.  Thus, the software is constructed as a linear stream of change, but customers install selected binaries from the various builds via patches.
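The mismatch between a linear stream of change and selective installation can be sketched in a few lines of Python.  This is a simplified model under stated assumptions: each patch is treated as one build in the stream, patch dependencies are ignored, and the patch and file names are hypothetical.

```python
def build_snapshots(changes):
    """Return the file -> revision state after each build in the linear stream.

    `changes` is an ordered list of (patch_id, files_touched).
    """
    state, snapshots = {}, []
    for rev, (_patch_id, files) in enumerate(changes, start=1):
        for name in files:
            state[name] = rev
        snapshots.append(dict(state))
    return snapshots


def installed_state(changes, selected):
    """File revisions on a system that installed only the selected patches.

    Each delivered file is cumulative: it carries every fix up to its build.
    """
    state = {}
    for rev, (patch_id, files) in enumerate(changes, start=1):
        if patch_id in selected:
            for name in files:
                state[name] = rev
    return state


def matches_a_build(changes, selected):
    """True if the resulting mix of files equals the state of some build
    (or the unpatched baseline, revision 0) that developers actually had."""
    all_files = {name for _pid, files in changes for name in files}

    def complete(state):
        return {name: state.get(name, 0) for name in all_files}

    candidates = [complete({})] + [complete(s) for s in build_snapshots(changes)]
    return complete(installed_state(changes, selected)) in candidates


# Hypothetical stream of three patches.
stream = [
    ("P1", ["libfoo", "kernel"]),
    ("P2", ["kernel"]),
    ("P3", ["libfoo", "cmd"]),
]
everything = matches_a_build(stream, {"P1", "P2", "P3"})
dim_sum = matches_a_build(stream, {"P1", "P3"})
```

Installing everything reproduces the latest build.  Skipping `P2` pairs a `libfoo` from the third build with a `kernel` from the first, a combination that never existed in the gate and so was never built or tested as a whole.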


When I've discussed the hazards of Dim Sum patching with customers, the reasons they give for doing it typically fall into three categories:



  1. we don't need all those patches; we don't have those drivers loaded
  2. we're reducing downtime by not installing so many patches
  3. the less change, the less risk.

To these, I reply:

  1. If you don't need those drivers, then remove them with pkgrm rather than leaving them in an unpatched state awaiting the introduction of new hardware or software to expose problems.  Minimization, not spotty patching, is the answer.  This is akin to disposing of an unused car, rather than simply leaving it unmaintained.
  2. Today, you should be using Live Upgrade and patching the alternate boot image to reduce downtime.  This allows machines in production to be safely patched, and will not leave the system in an inconsistent or unbootable state in the case of power failure during patching operations.  In the future, the new packaging system will always patch a clone of the current system to avoid the potential for disaster in case of power failure.
  3. Our experience has been that customers running all of the changes in an update are generally far less likely to experience problems than those who select only the fixes and features that appeal to them, and hope that our QA processes found all hidden dependencies on previous changes.

For our new packaging system, there is a powerful incentive to eliminate Dim Sum patching:  since we wish to use a single version numbering space for any package, attempting to support fine-grain Dim Sum patching would require very small packages, which would hurt the performance of packaging operations and significantly increase the workload of OpenSolaris developers.  Instead, we can set package boundaries according to what makes sense for minimization purposes.

This implies that future (post Solaris 11) patches will be completely cumulative (aside from some exceptions for urgent security fixes), at least for the core OS.  Your system will be able to determine what is needed to bring the installed software up to the desired revision level automatically; needing to pick and choose patches will be a thing of the past. 
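With a single version number per package and cumulative updates, working out what a system needs becomes a simple comparison.  The sketch below is only an illustration of that idea, not the design of the new packaging system; the function name, the integer version scheme, and the package names are assumptions made for the example.

```python
def update_plan(installed, desired):
    """Return {package: (current, target)} for installed packages that are
    behind the desired revision level.

    Packages that are not installed are ignored, so a minimized system
    stays minimized.
    """
    plan = {}
    for package, have in installed.items():
        want = desired.get(package)
        if want is not None and want > have:
            plan[package] = (have, want)
    return plan


# Hypothetical package names and integer versions.
installed = {"core": 3, "net": 5}
desired = {"core": 5, "net": 5, "drivers": 2}
plan = update_plan(installed, desired)
```

Here `plan` contains only `core`, moving from version 3 to 5.  Because each version is cumulative, there is no list of individual fixes to choose from along the way.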


No Dim Sum Patching!



