Thursday Oct 06, 2011

Walking in the shadows of giants

As I sit here in 22A on an American Airlines flight from San Francisco to O'Hare at the start of my 16-hour journey home to Ireland, I'm reflecting on some of the key Solaris 11-related events at Oracle OpenWorld this week.

For the first time in a couple of years, I got to spend the weekend in Northern California, having been here last week for Solaris 11 planning meetings.  I went up to the Sierras to hug some Sequoias.  I'm not normally the tree-hugging type, but I make an exception for these giants.  I saw Mono Lake.  Cool.  Devil's Postpile.  Way Cool.  And the Sequoia National Park - it's truly amazing walking in the shadows of these giants.

As usual, Oracle OpenWorld and JavaOne this week provided the opportunity to hear about bleeding edge technologies directly from their architects and to chat with them about the what and the why.

Markus Flierl (VP, Solaris Engineering) hosted a session on Monday with some of his key architects who have been developing Solaris 11 over the last 7+ years, including Liane Praza (IPS), Bart Smaalders (IPS), Darren Moffat (Security), Dan Price (Zones), and Mark Maybee (I/O).  It was great to hear these experts express their passion, ingenuity, and innovation.  They have a justifiable parental sense of pride in Solaris 11.  Technologies which were bolt-ons in Solaris 10, or indeed far too disruptive to even be considered for release in a Solaris 10 Update, are tightly integrated and honed in Solaris 11.  Low latency (i.e. performance), scalability, security, availability, robustness, and diagnosability are all factors that customers have come to expect of Solaris.  Solaris 11 takes it to a whole new level.  Warp drive.

My colleague, Pete Dennis, and I have been working closely with Bart, Liane, David Comay, and others to ensure that IPS fully meets the needs of our customers' maintenance lifecycle.  They've listened to us and subtly tweaked and adapted their implementations where necessary.  Working with geniuses is great.  Working with geniuses who are prepared to listen and adapt is truly wonderful.

But what really blew me away this week was a presentation by Nicolas Droux last night on Network Virtualization in Solaris 11.  Some of you may know about earlier incarnations of this, codenamed Project "Crossbow".  But the fleshing out of the capabilities in Solaris 11 is truly amazing: virtualized NICs (VNICs), virtualized LANs (VLANs), Zones which act as virtualized switches, Zones which act as virtualized firewalls, fully segregated data "lanes", "flows", and so on, all with diagnosability built in via new utilities such as 'dlstat' (data-link statistics) and 'flowstat'.  I hadn't met Nicolas before, but wow!  Not only is Nicolas a key architect, he has an amazing ability to explain it all with crystal clarity.  As I said to the Product Manager, Joost Pronk, we've got to video Nicolas giving this talk once Solaris 11 ships so that the world can see it.
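For a flavour of what this looks like in practice, here's a minimal sketch using the Solaris 11 networking commands mentioned above (the datalink, VNIC, and flow names are illustrative):

```shell
# Create a virtual NIC (VNIC) on top of the physical datalink net0
dladm create-vnic -l net0 vnic0

# Define a flow so that TCP port 80 traffic over the VNIC gets
# its own segregated "lane" with observable statistics
flowadm add-flow -l vnic0 -a transport=tcp,local_port=80 httpflow

# Observe traffic with the new diagnosability utilities
dlstat show-link vnic0
flowstat -l vnic0
```

These commands require a Solaris 11 system with appropriate privileges; the point is simply how little ceremony the virtualized network stack demands.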

At the end of Nicolas's presentation, Thierry Manfe showed how he is leveraging Network Virtualization in the Oracle Solaris cloud infrastructure provided to enable ISVs to test their apps with complete data integrity and segregation.  You can sign up for this; it's available now.  "Solaris 11. #1 for Clouds" isn't just some Marketing hype.  It's true.

I'm walking in the shadow of giants.  And it's a wonderful feeling.

Roll on Solaris 11.  It won't be long now and I really can't wait.  It's amazing.  Big time!

Thank you to the 90+ of you who attended Pete Dennis, Isaac Rozenfeld, and my presentation on Solaris 11 Customer Maintenance Lifecycles, policies, and best practices.  If you missed it, there'll be another chance to catch an updated version with more technical content at DOAG (the German Oracle Users Group) conference in Nuremberg, Germany in November (see previous posting for details).

Finally, I'd like to pay my respects to a true giant of our industry, Steve Jobs.  Gone way too soon.  RIP Steve.  You'll be missed.  Big time!

Best Wishes,


Disclaimer: Any forward looking statements in this posting are subject to the vagaries of my Crystal ball, possible hallucinations, and lack of coffee.  You get the drift.

Thursday Jun 25, 2009

Heads up on Kernel patch installation issues with jumpstart or ZFS Root

I'd like to give you a heads-up on a couple of Kernel patch installation issues:

1. There was a bug (since fixed) in the Deferred Activation Patching functionality in a ZFS Root environment.  See Sun Alert 263928.  An error message may be seen to the effect that a Class Action Script has failed to complete and the environment for Deferred Activation Patching could not be set up.  The relevant CR is 6850329: "KU 139556-08 fails to apply on x86 systems that have ZFS root filesystems and corrupts the OS".  Despite the x86-specific CR synopsis, SPARC systems are similarly affected.  The following error message is returned:
mv: cannot rename /var/run/.patchSafeMode/root/lib/ to /lib/ Device busy
ERROR: Move of /var/run/.patchSafeMode/root/lib/ to dstActual failed
usage: puttext [-r rmarg] [-l lmarg] string
pkgadd: ERROR: class action script did not complete successfully

Installation of <SUNWcslr> failed.

This issue is fixed in the Patch Utilities patch, 119255-70 or later revision.

BTW: The principal reason ZFS Root support was implemented in Live Upgrade is to avoid the need for patch application like this to the live boot environment.  With ZFS Root, creating a clone Boot Environment is so easy that there's no good reason not to.  This avoids the need for technologies such as Deferred Activation Patching, which attempt to make it safer to apply arbitrary change to a live boot environment, an inherently risky process.
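To make the clone-and-patch approach concrete, here's a minimal Live Upgrade sketch (the boot environment name, patch directory, and PatchID are illustrative):

```shell
# Create a clone boot environment - cheap and fast with ZFS Root
lucreate -n patchBE

# Apply patches to the clone rather than to the live boot environment
luupgrade -t -n patchBE -s /var/tmp/patches 139555-08

# Activate the clone; use 'init 6' (not 'reboot') so that the
# Live Upgrade shutdown scripts run
luactivate patchBE
init 6
```

If anything goes wrong, the original boot environment is untouched and can simply be re-activated.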

2. There are reproducible issues using jumpstart finish scripts and other scenarios to install Kernel patch 137137-09 followed by Kernel patch 139555-08.   Here's the gist of the issue which I've pulled from an engineering email thread on the subject:

Issue 1: I have a customer whose system is not booting after applying the patch cluster with Live Upgrade (LU).

Solution 1: If using 'luupgrade -t', you must ensure that the latest revision of the LU patch is installed first: currently 121430-36 on SPARC and 121431-37 on x86.  Once these patches are installed, LU will automatically handle the build of the boot archive when 'luactivate' is called, thus avoiding the problem.

Issue 2: There are other ways to get oneself into situations where a boot archive is out of sync, e.g. using jumpstart finish scripts to apply patches that include 137137-09.  Basically, any operation that involves patching an ABE outside of 'luupgrade' will require a manual build of the boot archive.

Solution 2: One must manually rebuild the boot archive on the /a partition after applying the patches.  Otherwise, once the system boots, the boot archive will be out of sync.

Here's some more detail on the jumpstart finish script version of this: 

We've seen the same panic a few times when the latest patch cluster is applied via a finish script to a pre-s10u6 boot environment during a jumpstart installation.  It appears that the boot archive is out of sync with the kernel on the system: the boot archive was created from the 137137-09 patch and not updated after the 139555-08 kernel was applied, hence the mismatch between the kernel and the boot archive.

In these instances, updating the boot archive allows the system to boot successfully.  Booting failsafe (ok boot -F failsafe) will detect an out-of-sync boot archive.  Execute the automated update, then reboot.  The system will then boot from the later kernel (139555-08) which was successfully installed from the finish script.

I reproduced the problem in a jumpstart installation environment applying the latest 10_Recommended patch cluster from a finish script. The initial installation was S10U5 which is deployed from a miniroot that has no knowledge of a boot archive (my theory anyway).  This is similar to a live upgrade environment if the boot environment doing the patching is also boot archive unaware (meaning the kernel is pre 137137-09).

In the jumpstart scenario, the immediate problem was solved by updating the boot archive by booting failsafe as previously described.  The longer-term solution was to update the boot archive from the finish script after the patch cluster installation completed.  BTW, all patches in the patch cluster installed successfully per /var/sadm/system/logs/finish.log.

In a standard jumpstart the boot device (install target) is mounted to /a, therefore adding the following entry to the finish script solved the problem:

/a/boot/solaris/bin/create_ramdisk -R /a

Depending on the finish script configuration and variables, the following would also work:

$ROOTDIR/boot/solaris/bin/create_ramdisk -R $ROOTDIR

Issue 3: The above issues are sometimes mis-diagnosed as CR 6850202: "bootadm fails to build bootarchive in certain configurations leading to unbootable system".

But CR 6850202 will only be encountered in very specific circumstances, all of which must occur in order to hit this specific bug, namely:

1. Install u6 SUNWCreq - there's no mkisofs, so we build a ufs boot archive

2. Limit /tmp to 512M - thus forcing the ufs build to happen in /var/run

3. Have a separate /var - bootadm.c only lofs nosub mounts "/" when creating the alt root for DAP patching build of boot archive

4. Install 139555-08

You must have all 4 of the above in order to hit this, i.e. step 4 must be installing a DAP patch, such as a Kernel patch associated with a Solaris 10 Update, e.g. 139555-08.

Solution 3: Removing the 512MB limit (or whatever limit has been imposed) on /tmp in /etc/vfstab, and/or adding SUNWmkcd (and probably SUNWmkcdS) so that mkisofs is available on the system, is sufficient to avoid the code path that fails this way.

Booting failsafe and recreating the boot archive will successfully recreate the boot archive.

Here's further input from one of my senior engineers, Enda O'Connor:

If using Live Upgrade (LU), and LU on the live partition is up to date in terms of the latest revision of the LU patch, 121430 (SPARC) or 121431 (x86), the boot archive will be built automatically once the user runs shutdown (after 'luactivate' to activate the new BE).  This is done from a kill script in rc0.d.

If using a jumpstart finish script, or jumpstart profile to patch a pre-U6 image with latest kernel patches, then you need to run create_ramdisk from the finish script after all patching/packaging operations have been finished.  Alternatively, you can patch your pre-U6 miniroot to the U6 SPARC NewBoot level (137137-09), at which point the modified miniroot will handle the build of the boot_archive after the finish script has run.

If patching U6 and upwards from jumpstart, the boot archive will get built automatically after the finish script has run, so there's no issue in this scenario.

If using any home-grown technology to patch or install/modify software on an Alternate Boot Environment (ABE), such as ufsrestore/cpio/tar, you must always run create_ramdisk manually before booting to said ABE.
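For example, if an ABE has just been populated via ufsrestore, the manual rebuild might look like this (device and mount point names are illustrative):

```shell
# Mount the alternate boot environment
mount /dev/dsk/c0t1d0s0 /mnt

# Rebuild its boot archive in place before booting it
/mnt/boot/solaris/bin/create_ramdisk -R /mnt

umount /mnt
```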

Best Wishes,


Thursday Dec 04, 2008

Patching enhancements and other stuff

New title, same role, same me

I was promoted to Director, Software Patch Services in September.  The last couple of months have been quite hectic, as I've suddenly got a whole new bunch of buddies in Marketing and elsewhere who want some of my time.  That's a good thing, and I believe it will help me to drive and co-ordinate improvements to our customers' patching experience.

Resources are limited and, as always, I'm interested in getting your thoughts as to what areas I should concentrate on next.  

Some of the stuff we're currently working on is outlined below as well as other information which I hope you will find useful.

Solaris 10 10/08 Patch Bundle

The Solaris 10 10/08 Patch Bundle, which delivers the equivalent set of patches to the Solaris 10 10/08 (Update 6) release image, is now available from SunSolve.  See my blog entry below on the Solaris 10 5/08 (Update 5) Patch Bundle for further information on why we produce it, what it contains, why you might wish to use it, how to download it, etc.

Recommended and Sun Alert patch cluster contents updated

I discussed the purpose of, and difference between, the Solaris Recommended and Sun Alert patch clusters in a previous blog posting. To recap:

The "Recommended" Cluster contains the latest revision of any Solaris OS patch which addresses a Sun Alert issue.  That is, a fix for a Security, Data Corruption, or System Availability issue.  The cluster also contains the latest revision of the patch utility patches to ensure correct patch application and any patch required by any other patch in the cluster.

The Sun Alert Cluster is newer, and contains the minimum revision of any Solaris OS patch which addresses a Sun Alert issue. The cluster also contains the latest revision of the patch utility patches to ensure correct patch application and any patch required by any other patch in the cluster.  Therefore, the Sun Alert Cluster provides the minimum amount of change to fix all Solaris OS Sun Alert issues. 

Both clusters are updated whenever a new patch meeting their inclusion criteria is released.  The Sun Alert Cluster changes less frequently than the "Recommended" Cluster as it contains only what is really needed to address Sun Alert issues and apply the patches.

One of my team members has been reconciling the cluster contents against the Sun Alert reports, and the cluster contents have been updated as a result.  Some issues were found, largely to do with patches for components such as GNOME which are also part of the Solaris OS.  A process has been put in place to ensure the cluster contents match the patches specified in the Sun Alert reports.

Keeping as up to date as possible with the SunAlert or Recommended Cluster contents is advisable.   Remember also to keep firmware up to date.

BTW: The monthly EIS (Enterprise Installation Standards) patch baseline is based upon the Recommended Cluster contents but also includes ca. 150 additional patches to address irritants which are not Sun Alert fixes and includes patches for SunCluster, SunVTS, etc.  The monthly EIS patch baselines are available through xVM Ops Center and Sun Proactive Services.

I am planning to merge the Recommended and Sun Alert patch clusters into a single cluster using the Sun Alert cluster criteria as having two very similar clusters tends to confuse customers unnecessarily.  

I also intend to merge the two cluster pages on SunSolve, as one is essentially a better-formatted subset of the other.

ZFS and Zones features fully contained in patches

As I've mentioned previously, there's effectively a single customer visible code branch for each Solaris named release.  That means that there's one set of patches for all of Solaris 10, a separate set for Solaris 9, and a separate set for Solaris 8.  Within a named release, e.g. Solaris 10, the same set of patches will apply to any of the Solaris 10 releases, from the original Solaris 10 3/05 release right up to the current Solaris 10 10/08 (Update 6) release.  This simplifies System Administration and enables Sun to provide very long term support at reasonable cost for each Solaris named release. 

A consequence of effectively having a single code branch for each Solaris named release is that any change to pre-existing packages will be delivered in patch format.

New features are typically only added to the current Solaris named release, which is currently Solaris 10.  (They are also available via OpenSolaris.)

This means that if new features don't add any new packages, then the entire feature functionality is fully available in patches.  Customers can utilize the new features by simply applying the appropriate patches to their existing Solaris 10 system.  This is the case with all current Zones and ZFS* functionality, including neat features like ZFS Root, ZFS Boot, and Zones "Update on Attach".

Other features which deliver new packages are only available from the Solaris Update release in which they were first included.  So, for example, if a new package was first delivered in Solaris 10 8/07 (Update 4), then a customer wishing to use that feature would need to install or upgrade to the Solaris 10 8/07 (Update 4) or subsequent update release image.   Such features are not available in patches.

*OK, we cheated with ZFS.  ZFS does deliver new packages, but they are streamed into existence from a patch.  This type of patch is called a "genesis" patch, but they are hard to perfect, so we don't intend to release any more "genesis" patches.

Improving Zones Patching Performance

Zones Parallel Patching

My team has been working with those awfully nice folks in the Sustaining organization to deliver a Zones Parallel Patching enhancement to the patch utilities to dramatically improve Zones patching performance.  We have a fully stable prototype which has been given to selected Beta customers to trial. 

For a simple T2000 with 5 sparse non-global zones, the performance improvement is >3x.  On systems with optimized I/O (as Zones patching is primarily I/O bound), we expect the performance improvement to be even better.  A configuration file will allow users to select how many Zones to patch in parallel.  This will typically equate to the number of processors or threads available on the target system.

The general release of this feature is planned for April 2009.

Zones "Update on Attach" 

The Kernel patch associated with Solaris 10 10/08 (Update 6), 137137-09 (SPARC) / 137138-09 (x86), contains some cool new features, such as ZFS Root, ZFS Boot, and Zones "Update on Attach".  Beware: installing this patch requires significant free disk space!  See Sun Alert

Zones "Update on Attach" is a very cool feature indeed.

For example, if the patch level of non-global Zones is out-of-sync with respect to the global Zone, e.g. because the non-global Zones ran out of disk space during patch application, Zones "Update on Attach" provides a very neat way to bring the Zones back into sync.  Simply detach the affected non-global Zones, apply Kernel patch 137137-09 (SPARC) / 137138-09 (x86) to the global zones, and reattach the affected non-global Zones using 'zoneadm -z <zone-name> attach -u'.  The non-global Zones will be automagically updated to the same patch level as the global Zone.  Neat!

There are other interesting possibilities.  For example, detach all non-global Zones, apply an arbitrary set of patches to the global Zone (including 13713[78]-09), and reattach the non-global Zones using 'zoneadm -z <zone-name> attach -u'.  Voilà! The non-global Zones will be automagically updated with all of the patches applied to the global Zone.  Way neat!  And more importantly, way faster than even the Zones Parallel Patching solution we're working on.  And even better, it's available now!  This could be a key solution for customers having difficulty completing patching updates on Zones systems during tight maintenance windows.
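A sketch of the sequence on SPARC (the zone name is illustrative; use 137138-09 on x86):

```shell
# Halt and detach the non-global zone
zoneadm -z myzone halt
zoneadm -z myzone detach

# Patch the global zone; the patch set must include 137137-09
patchadd 137137-09

# Reattach with -u to bring the zone up to the global zone's patch level
zoneadm -z myzone attach -u
zoneadm -z myzone boot
```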

We are working to explore potential caveats.  For example, when a patch is applied using 'patchadd' to a non-global zone, an "Undo.Z" file containing the data necessary to back out the patch is created specifically for each non-global zone to which the patch is applied.  Using Zones "Update on Attach" to patch non-global Zones will cause the "Undo.Z" file from the global Zone to be propagated to the non-global Zones.  This could theoretically cause issues if the patch is subsequently backed out (e.g. data from global Zone config files could potentially be merged into non-global Zone config files during patch backout), although we've never actually encountered such an issue.  BTW: The same caveat applies to creating non-global Zones after the global Zone has been patched.  Again, we have yet to see this cause an actual issue, so it appears to be more of a theoretical caveat than a practical one.

Improvements to 'smpatch' and Update Manager

The way the PatchPro analysis engine for 'smpatch' and Update Manager used to work was fine in theory, but in practice was what I call "a process with too many moving parts".   Too many steps had to happen correctly for the overall result to be correct.  In Six Sigma terms, there was too much error opportunity.  Occasionally, it would end up recommending a SPARC patch for an x86 system or a Solaris 8 patch for a Solaris 10 system.  Not surprisingly, its reputation suffered.

I'm pleased to say that a major overhaul to dramatically simplify the back-end processing of 'smpatch' and Update Manager has just been rolled out by their engineering team.  'smpatch' and Update Manager work by associating Realization Detector(s) with each patch.  These Realization Detectors determine whether it's appropriate to recommend a patch for application on a target system.  In the vast majority of cases, the Realization Detectors simply compare the packages contained in the patch to the packages installed on the system to see if the patch is applicable.

The enhancement replaces these myriad Realization Detectors, each of which could potentially contain coding bugs, with a single Generic Realization Detector which maps patch packages to packages on the target system.  It looks at the package name, package version, and package architecture fields (in pkginfo) for each package in the patch, and compares them to the same values for the packages installed on the target system.  If they match, the patch is recommended; if not, it isn't.  This is exactly how 'patchadd' decides whether a patch is applicable when installing it, and it's also how 'pca' determines which patches to apply.
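The comparison the Generic Realization Detector performs boils down to a simple matching exercise.  Here's a simplified sketch in Python, purely as an illustration of the description above, not the actual implementation (the field names mirror pkginfo; the package data is hypothetical):

```python
def patch_applicable(patch_pkgs, installed_pkgs):
    """Recommend a patch if any package it delivers matches an
    installed package on name, version, and architecture."""
    installed = {(p["name"], p["version"], p["arch"]) for p in installed_pkgs}
    return any(
        (p["name"], p["version"], p["arch"]) in installed
        for p in patch_pkgs
    )

# Hypothetical example: a patch delivering SUNWcsl for SPARC
patch = [{"name": "SUNWcsl", "version": "11.10.0", "arch": "sparc"}]
sparc_host = [{"name": "SUNWcsl", "version": "11.10.0", "arch": "sparc"}]
x86_host = [{"name": "SUNWcsl", "version": "11.10.0", "arch": "i386"}]

print(patch_applicable(patch, sparc_host))  # True
print(patch_applicable(patch, x86_host))    # False: architecture differs
```

With one generic comparison like this replacing per-patch detectors, there are far fewer "moving parts" to get wrong.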

A few specialist Realization Detectors remain for a small number of patches which require special handling.

The changes to 'smpatch' and Update Manager should dramatically improve the reliability of these tools and the accuracy of their patching recommendations.

One remaining distinction between 'smpatch' / Update Manager and 'pca' is that 'pca' "knows" about all current Sun patches via the patchdiag.xref file, whereas 'smpatch' / Update Manager "knows" about all patches containing a 'patchinfo' file, including older patch revisions.  All Solaris OS and Java Enterprise System (middleware) patches contain a 'patchinfo' file.  These account for 49% of patches.  For patching the Solaris OS, the tools should produce similar results.  A decision was made not to "auto-include" all other patches for 'smpatch' and Update Manager, as it was felt that the explicit step of the patch creator including a non-blank PATCH_CORRECTS realization detector specification line in the 'patchinfo' file to signal that the patch was suitable for patch automation was potentially useful.  (Don't worry about what value the PATCH_CORRECTS field has.  This is overridden by the Generic Realization Detector in the vast majority of cases.  It has no meaning from a customer perspective.)

This enhancement is not an attempt to undermine 'pca'.  It's simply to improve 'smpatch' and Update Manager.  I will continue to work closely with Martin Paul to give him heads-ups on any initiative which may impact 'pca' and resolve any issues with patchdiag.xref.

One thing I want to do when I can free up some resources is a comparative study of the patching recommendations of the various available patch automation tools: 'smpatch' / Update Manager, 'pca', UCE (a.k.a. Sun Connection Satellite), xVM Ops Center*, and TLP (Traffic Light Patching), which is used by Sun Proactive Services to provide tailored patching solutions for customers in conjunction with SRAS (Sun Risk Analysis Service) and the EIS (Enterprise Installation Standards) methodology.  The goal is to ensure that the patching recommendations of the various tools are coherent and consistent, with the higher-value tools providing more sophisticated analysis.  It's part of my efforts to co-ordinate patching improvements and improve our customers' patching experience.

*xVM Ops Center also utilizes the monthly EIS patch "baselines".

Same Patch Entitlement policy, new Patch Entitlement implementation

Solaris changed its business model a few years ago from selling Solaris and providing patches for free to a model of giving away the software releases for free and charging for patches. 

The policy is that patches delivering new security fixes will remain free to all customers, irrespective of whether or not they have a support contract, but most other patches require that customers have a valid support contract to access them.  (See my earlier blog entry on the subject.)

All fixes will be available for free in the next Solaris Update release, so customers not willing to pay for a support contract can still get the fixes by installing or upgrading to it.  They'll just need to wait for it to ship.  Alternatively, they can use OpenSolaris.

This policy is not changing.

What is changing is the implementation of patch entitlement to ensure it matches the policy.  Currently, circa 60% of Solaris patches are free, including most of the key patches.  Under the new entitlement implementation, 18% of Solaris patches will remain free, including the specific revision of all Solaris patches which include new security fixes.  The rest will require a valid support contract to access. 

Any of the following support contracts will provide access to all Solaris patches and patch clusters: a Solaris subscription, a Software Support Contract, a Sun System Service Plan for Solaris, a Sun Spectrum Storage Plan, or a Sun Spectrum Enterprise Service Plan.  Since the names of the support contracts change from time-to-time, this list may change.

The new implementation will roll out in Phases, starting this month.  The roll-out should be transparent to customers with valid support contracts.

Patch signing certificate renewal

The signing certificate used to sign Sun patches expires shortly.  A new signing certificate will be rolled out in January and instructions provided on how to adopt it.

Customers who download the unsigned patch versions will not need to take any action.

"Accumulation-only" patches

The "SplitGate" source code management model we first introduced in Solaris 10 8/07 (Update 4) has dramatically improved Solaris 10 patch quality.  A side-effect of the "SplitGate" model is that base PatchIDs (the first 6 digits) change at the end of each Update release.  See my earlier Solaris 10 Kernel PatchID Sequence posting.

In the "SplitGate" model, when building an Update release, we effectively have two parallel source code gates: one called the Sustaining Gate, containing just the bug fixes we need to release to customers in patches asynchronous to the Update release, and the other called the Update Gate, containing a superset of the Sustaining Gate as well as new features and less critical bug fixes which will be released as part of the Update release.

The two gates remain separate (split) for the duration of the Update release build process.  Once the Update release has reached release quality, the Update Gate is promoted to become the new Sustaining Gate and the process repeats.  Since the Update Gate is always a strict superset of the Sustaining Gate, no regressions should result from the promotion of the Update Gate to become the new Sustaining Gate.  Each patch in the old Sustaining Gate is obsoleted by a corresponding patch from the Update Gate which has accumulated its contents.  When the Update is released, these new PatchIDs are released to SunSolve.  This is why you see the base PatchIDs changing after each Update release. 

If the Update Gate patch doesn't contain any additional code changes over the corresponding Sustaining Gate patch, then there's no need for customers to install the new Update Gate patch.  Such patches are called "accumulation-only" patches and can be identified as they have a different base PatchID (the first 6 digits) but don't contain any additional CR numbers over the Sustaining patch which they obsolete.

The reason Sun releases these "accumulation-only" patches is because some customers insist that all of the PatchIDs pre-applied into a Solaris Update release image be also available from SunSolve.


This blog is to inform customers about patching best practice, feature enhancements, and key issues. The views expressed on this blog are my own and do not necessarily reflect the views of Oracle. The Documents contained within this site may include statements about Oracle's product development plans. Many factors can materially affect these plans and the nature and timing of future product releases. Accordingly, this Information is provided to you solely for information only, is not a commitment to deliver any material code, or functionality, and SHOULD NOT BE RELIED UPON IN MAKING PURCHASING DECISIONS. The development, release, and timing of any features or functionality described remains at the sole discretion of Oracle. THIS INFORMATION MAY NOT BE INCORPORATED INTO ANY CONTRACTUAL AGREEMENT WITH ORACLE OR ITS SUBSIDIARIES OR AFFILIATES. ORACLE SPECIFICALLY DISCLAIMS ANY LIABILITY WITH RESPECT TO THIS INFORMATION. ~~~~~~~~~~~~ Gerry Haskins, Director, Software Lifecycle Engineer

