Friday Dec 04, 2009

SATisfaction

During pkg(5) development it has become quite clear that computing the correct set of packages to install or upgrade is a non-trivial task.  Initially, we started delivering pkg(5) with a solution engine that simply took the latest available packages.  This worked so long as we only delivered packages that were all compatible, no third party publishers existed, and users were happy staying on the bleeding edge.

Since none of these conditions were maintainable, a more sophisticated solution was essential.

 We gained some breathing room with the introduction of incorporation dependencies.  Such a dependency in a package specifies the version (at a variable level of precision) of compatibility with another package.   We have used a package full of these dependencies (termed an incorporation ) during OpenSolaris development to insure that the various operating system packages from pkg.opensolaris.org  come from the same build - that there's no way to get build 123's drivers, but build 127's IP stack.  In effect these packages define surfaces of compatible package versions, and allow package maintainers to refactor their packages, exchange content, etc. without the need for dependencies at the package level that would prevent incompatible packages from appearing on the same system.  The use of incorporations has allowed us to continue OpenSolaris development with a solver that first applied all constraints imposed by installed incorporations, and then attempted to install the latest possible packages. 

As we anticipated, however, the existing solver's deficiencies have become steadily more limiting.  Since the existing solver doesn't support back-tracking (e.g. revising a selected package version selection backwards during solution generation), trying to install third party packages that were published with different versions for various OpenSolaris releases was difficult if your machine was not running the most recent releases, and dealing with nested incorporation dependencies was impossible.  I experimented with a more conservative solver that attempted to upgrade as little as possible; this made upgrades across multiple releases painfully slow, however, and still didn't deal with newer versions being un-installable due to missing dependencies, etc.  In addition, we received multiple requests for exclude-type dependencies that would allow packages to prevent installation of incompatible packages; this was definitely outside the capabilities of our naive solver.

Conventional solvers iteratively attempt to satisfy package dependencies by walking the package dependency graph and selecting package versions to try; our experience w/ the large numbers of package being generated by biweekly (or nightly or even every push) builds indicated that such an approach would be very slow in some cases as the order of graph traversal might lead to the need to explore thousands of possible solutions.  Reading some of the research (in particular, the EDOS and ZYpp projects) indicated significant interest/progress in attacking packaging computations as boolean satisfiability problems, and we decided to try that approach.

Boolean satisfiability solvers need their problems posed in conjunctive normal form; e.g. as a conjunction (logical ANDing) of clauses containing disjunctions (logical ORed) of variables (or their negation).  By assigning the presence of a particular version of each package a unique boolean variable, we can construct clauses for dependencies and existence that allow the solver to compute solutions to our packaging problems.  For example, given four possible versions of package A, the fact that we can only install one version of a time of package A yields the following set of clauses (assigning A1 to indicate the presence of version 1 of A, and ! to represent negation and | to represent disjunction):

  • !A1 |  !A2
  • !A1 |  !A3
  • !A1 |  !A4
  • !A2 |  !A3
  • !A2 |  !A4
  • !A3 |  !A4

Clearly, large numbers of versions of a package can generate an inordinate number of clauses; more on that a bit later.

If a require dependency exists on a particular package version, indicating that that version or newer is required, clauses are generated to describe that dependency.  For example, if package B@1 depended on A@2:

  • !B1 | A2 | A3 | A4

If an optional dependency exists on a particular package version, that indicates that if that package is installed it must be at least at the specified level.  Here, we end up excluding versions we don't want.... For example, if package B@1 optional depended on A@3:

  • !B1 | !A1
  • !B1 | !A2

Our incorporate dependencies that specify the version needed also generate such exclusionary clauses.  For example, if package B@1 incorporates A@3 (e.g. if A is present it must be at version 3):

  • !B1 | !A1
  • !B1 | !A2
  • !B1 | !A4

Lastly, actual exclude dependencies indicate that if present, the depended upon package must be at the specified level or lower.  If package B@1 has a exclude dependency on A@3:

  • !B1 | !A3
  • !B1 | !A4

Once the packaging problem can be described as a series of clauses, it can be passed to a SAT solver for solution; the solver generates a set of packages that will meet the specified criteria, or declare that no solution exists.  The number of variables used in the solver is the number of package versions installed plus those considered; the number of clauses used depends on the number of versions of a package and the types of dependencies.  To minimize the size of the problem and the resulting memory footprint, we don't simply generate clauses for all possible packages and their versions; since we know that packages are not allowed (normally) to decrease in version, we eliminate from consideration earlier versions of any installed packages, and any packages excluded by incorporations we're unwilling to change.  We also eliminate duplicate packages from publishers we're not willing to consider.  This "trimming" phase is actually the most time consuming phase of generating the list of packages to install.

If all we needed was a single solution, this would be adequate; however, we'd like to find solutions that meet our definition of optimal.  We do this by finding solutions and then looking for better ones by resolving the problem w/ additional constraints excluding areas we don't consider optimal. For example, when installing a package we're willing to update other packages if needed (within incorporation constraints, of course), but we'd like to minimize such changes.  This is an area we're still exploring, and a likely topic for additional blogging.

We choose the Minisat solver as a good place to start as they had built a C version of Minisat that would be easy to link into our packaging system, which is coded in Python.  About the only changes I made were to keep track of the clauses fed to the solver, so that it is possible to cheaply revise solutions by caching copies of the current state of the solver. Introduction of the SAT solver awaited Shawn Walker's very nice catalog rewrite which added dependency information into the package catalog, as it was critical for perfomance reasons to not have to download hundreds of manifests to do package planning.  I integrated the new solver into the packaging gate for build 128 of OpenSolaris, now available from pkg.opensolaris.org/dev and other mirror repositories.

One of the interesting implications of the solver change has been that it is more difficult to determine just why there are no image-updates are available.  The previous solver would fail (badly) when encountering missing packages in dependencies, etc; the new solver just considers packages with missing dependencies as uninstallable and thus unavailable for upgrade.  Image-update will now very rarely generate any error messages, which is nice from a user aspect but makes debugging mis-configured or broken builds more difficult than before.  If you think you should be able to upgrade, try explicitly installing the version of entire (the incorporation that currently controls what software build you're running) you think you should be able to install w/ -nv as flags; this will generate much more verbose debugging output when no solution can be found, as the packaging system has some idea of what you'd like to achieve other than just "get me newer bits if you can". Generation of more useful error messages will remain an important area for further work.

Other interesting areas for further enhancements enabled by the SAT solver include constructing the entire incorporation as an incorporation of other incorporations; this will allow developers to easily run the latest kernel and older window system bits, or vice versa.  We're also considering conditional (package A requires package B if package C is installed)  and disjunction (package A requires package B or package C) dependencies to solve some of the more complex package configuration requests we've seen.



Wednesday Feb 04, 2009

Fattening packages - supporting multiple variants in a single package

Dealing with parts of a package

Traditionally, packaging systems have placed optional components of a package in separate packages, and established conventions for naming such components, such as -localization, -locale, -devel, -doc, etc. This method of ad-hoc decomposition makes it more difficult for GUI programs to offer the appropriate choices when selecting components, makes the introduction of new optional components difficult and makes installing documentation after the fact a painful process.

Packaging options also exist which are mutually exclusive; the typical example is which architecture the package supports. One cannot select both sparc and x86, since the two architecture's files collide - /usr/bin/ls is either a sparc binary or a i386 binary. Other examples of such colliding options include debug vs non-debug kernel binaries, and global vs nonglobal zones.

Initially we started solving the problem of selecting parts of a package using more or less ad-hoc client-side filtering, but it became clear that the publisher of the package needed to be able to describe what components intersected or were optional, and that we needed to be able to clearly distinguish between the two cases. Thus, we refer to options that may be selected or not selected in any combination, such as various locales, documentation, etc., as facets. Options which must be mutually exclusive are called variants. Both variants and facet appear as tags on IPS actions, and result in the action being selected or de-selected for installation.

An example of variant tags on an action is:

dir group=sys mode=0755 opensolaris.zone=global owner=root path=kernel/drv/amd64
  variant.arch=i386 variant.opensolaris.zone=global

Here this directory action is tagged with variant.arch=i386, indicating that this directory should only be installed on i386 machines (Solaris doesn't distinguish between 32 and 64 bit architectures) and variant.opensolaris.zone=global, which shows that this directory should only be installed in the global zone and not in local zones as those don't contain kernel components.

Note that any action in the repository can be so tagged, including dependencies.

During installation planning, all actions not applicable to the current image are removed from consideration. If a user with a down-rev version of IPS that doesn't understand variants attempts to install a fat package, either an exception will occur indicating that duplicate actions have been found, or if a set action is found with an architecture tag an assertion will trigger. In either case the solution is either to stop attempting to install fat packages, or upgrade your version of IPS to the latest available for your build.

The initial implementation in build 106 supports only variants (facets are coming later on). To faciliate detection of attempts to install packages on the wrong architecture, our publication code now inserts set actions into packages to indicate what architectures the package supports:

set name=variant.arch value=sparc value=i386 

Publishing Fat packages

We produce fat packages by publishing separately on each architecture, and then merging the resulting manifests. Actions which are identical for both architectures are not tagged w/ variant tags; those that differ or exist only for one architecture are tagged.  Packages that exist only on one architecture are tagged to indicate that as well. This step can be repeated for different variants, and more than two variants types can be combined at once, so packages containing debug kernel binaries could be merged w/ non-debug versions, and then merged across several architectures. An example of a merged manifest can be seen here. Of course, those using pkgsend to create packages can simply insert the appropriate tags directly. For the curious, the code to merge packages can be seen here. Further work is needed to improve the performance of publishing the merged packages; allowing multiple repositories to share the filestore would help greatly here.



Wednesday Jul 25, 2007

Rethinking patching

As Stephen mentioned recently, several of us have been thinking about revising the way we manage software change on Solaris.  I've been particularly focused on the difficulties Sun and it's customers have with the patching process, and the kinds of changes we need to make as a result in our technology and development processes.

 Today, most customers don't run OpenSolaris; they run a supported version of Solaris such as Solaris 8, 9 or 10.  A supported release means that someone will answer the phone, and that patches for problems are available.

Patches are a separate software change control mechanism distinct from package versions in Solaris.  Each patch may affect portions of several packages; patches are intended to include all the files necessary to fix one or more problems, either directly or by specifying dependencies.  If a patch affects packages which are not installed on this system (typically because it has been minimized), those portions of the patch are not installed.  If the administrator later adds the missing package, he must remember (good luck) to re-apply the patches since the packaging code knows nothing of patches.

Customers are today free to install which ever patches they feel are appropriate for their environment, consistent with the built-in dependency requirements.  This customization is a technique I refer to as Dim Sum patching, and is a major cause of patching difficulties.  Many customers pick and choose amongst the thousands of patches available for Solaris 10, for example; this means that customers are often pioneering new configurations.  Note that each Solaris release consists of a single source base; all Solaris 10 updates, for example, are but snapshots of the same Solaris patch gate at different times.  As a result, the developers are working on a cumulative  set of all previous changes; when a new patch is created, the files in the patch not only contain the desired fix, but all previous fixes as well.  Thus, the software change is constructed as a linear stream of change, but customers installs selected binaries from the various builds via patches.

 

When I've discussed the hazards of  Dim Sum patching with customers, the reasons given are typically characterizable as :

 

 

  1. we don't need all those patches,  we don't have those drivers loaded
  2. we're reducing downtime by not installing so many patches
  3. the less change, the less risk.

To these, I reply:

  1. If you don't need those drivers, then remove them them w/ pkgrm rather than leaving them in an unpatched state awaiting the introduction of new hardware or software to expose problems.  Minimization, not spotty patching, is the answer.  This is akin to disposing of an unused car, rather than simply leaving it unmaintained.
  2. Today, you should be using Live Upgrade and patching the alternate boot image to reduce downtime.  This allows machines in production to be safely patched, and will not leave the system in an inconsistent or unbootable state in the case of power failure during patching operations.  In the future, the new packaging system will always patch a clone of the current system to avoid the potential for disaster in case of power failure.
  3. Our experience has been that customers running all of the changes in an update are generally far less likely to experience problems than those who select only the fixes and features that appeal to them, and hope that our QA processes found all hidden dependencies on previous changes.

For our new packaging system, there is a powerful incentive to eliminate Dim Sum patching:  since we wish to use a single version numbering space for any package, attempting to support fine-grain Dim Sum patching would require very small packages - affecting the performance of packaging operations, and significantly increasing the workload of OpenSolaris developers.   Instead, we can set package boundaries according to what makes sense for minimization purposes. 

This implies that future (post Solaris 11) patches will be completely cumulative (aside from some exceptions for urgent security fixes), at least for the core OS.  Your system will be able to determine what is needed to bring the installed software up to the desired revision level automatically; needing to pick and choose patches will be a thing of the past. 


 

No Dim Sum Patching!


About

An engineer's viewpoint on Solaris...

Search

Categories
Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today