Wednesday Oct 12, 2011

Live Upgrade document updated and simplified

I forgot to let you know, but a couple of months ago my colleagues Don O'Malley and Ed Clark updated the Oracle Solaris Live Upgrade (LU) document describing the prerequisites for Live Upgrade.

The original document was pretty convoluted and required several cups of strong coffee to parse.  The updated version is a little easier to understand, even without caffeine.

Thanks also to Beth Barrett, Rick Ramsey, and Jon Bowman who helped make this happen.

Thursday Jun 25, 2009

Heads up on Kernel patch installation issues with jumpstart or ZFS Root

I'd like to give you a heads-up on a couple of Kernel patch installation issues:

1. There was a bug (since fixed) in the Deferred Activation Patching (DAP) functionality in a ZFS Root environment on x86 only.  See Sun Alert 263928.  The failure shows up as an error message to the effect that a Class Action Script has failed to complete and that the environment for Deferred Activation Patching could not be set up.  The relevant CR is 6850329: "KU 139556-08 fails to apply on x86 systems that have ZFS root filesystems and corrupts the OS".  SPARC systems are not affected.  The following error message is returned:
mv: cannot rename /var/run/.patchSafeMode/root/lib/libc.so.1.20102 to /lib/libc.so.1: Device busy
ERROR: Move of /var/run/.patchSafeMode/root/lib/libc.so.1.20102 to dstActual failed
usage: puttext [-r rmarg] [-l lmarg] string
pkgadd: ERROR: class action script did not complete successfully

Installation of <SUNWcslr> failed.

This issue is fixed in the Patch Utilities patch 119255-70 or later revision.

BTW: The principal reason ZFS Root support was implemented in Live Upgrade is so that applying patches like this to the live boot environment would not be necessary.  With ZFS Root, creating a clone Boot Environment is so easy that there's no good reason not to.  This avoids the need to use technologies such as Deferred Activation Patching, which attempt to make it safer to apply arbitrary changes to a live boot environment, an inherently risky process.
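
To make the "clone, patch, activate" flow concrete, here's a rough sketch of what it might look like.  The BE name, patch directory, and patch ID are placeholders I've made up for illustration, and the LU_RUN switch is just a convenience of this sketch (it prints the commands by default rather than running them).  Treat it as an outline under those assumptions, not a tested procedure:

```shell
#!/bin/sh
# Sketch: patch a clone boot environment instead of the live one.
# BE_NAME, PATCH_DIR and PATCH_IDS are hypothetical placeholders.
BE_NAME=${BE_NAME:-patched_be}
PATCH_DIR=${PATCH_DIR:-/var/tmp/patches}
PATCH_IDS=${PATCH_IDS:-139556-08}

# By default every command is only printed.  Set LU_RUN to an empty
# value (LU_RUN= ./script) to actually execute the commands.
LU_RUN=${LU_RUN-echo}

lu_patch_clone() {
    # 1. Create the clone BE (a snapshot and clone on ZFS root).
    $LU_RUN lucreate -n "$BE_NAME" &&
    # 2. Apply the patches to the inactive clone, not the running system.
    $LU_RUN luupgrade -t -n "$BE_NAME" -s "$PATCH_DIR" $PATCH_IDS &&
    # 3. Mark the clone as the BE to boot next.
    $LU_RUN luactivate "$BE_NAME" &&
    # 4. Restart through init so the shutdown scripts get to run.
    $LU_RUN init 6
}
```

Calling lu_patch_clone with the defaults simply prints the four commands, so you can review them before running anything for real.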

2. There are reproducible issues using jumpstart finish scripts and other scenarios to install Kernel patch 137137-09 followed by Kernel patch 139555-08.   Here's the gist of the issue which I've pulled from an engineering email thread on the subject:

Issue 1: I have a customer whose system is not booting after applying the patch cluster with Live Upgrade (LU).

Solution 1: If using 'luupgrade -t', you must first ensure that the latest revision of the LU patch is installed; at the time of writing that is 121430-36 on SPARC and 121431-37 on x86. Once these patches are installed, LU will automatically handle the build of the boot archive when 'luactivate' is called, thus avoiding the problem.
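
If you want to script that check, something along these lines might do.  It's a minimal sketch that assumes the patch list arrives on stdin in the "Patch: <id>-<rev> ..." form printed by 'showrev -p'; the function name patch_rev_ok is my own invention:

```shell
#!/bin/sh
# patch_rev_ok BASE MINREV
# Reads a patch listing on stdin (assumed format: lines beginning with
# "Patch: 121430-36 ...") and succeeds if patch BASE is present at
# revision MINREV or later.
patch_rev_ok() {
    base=$1
    minrev=$2
    status=1
    while read tag id rest; do
        if [ "$tag" = "Patch:" ]; then
            case $id in
                "$base"-*)
                    # Keep only the revision after the last dash.
                    rev=`echo "$id" | sed 's/^.*-//'`
                    if [ "$rev" -ge "$minrev" ]; then
                        status=0
                    fi
                    ;;
            esac
        fi
    done
    return $status
}
```

On a SPARC system you might then run: showrev -p | patch_rev_ok 121430 36 || echo "Install the latest LU patch first"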

Issue 2: There are other ways to get into a situation where the boot archive is out of sync, for example when using jumpstart finish scripts to apply patches that include 137137-09.  Basically, any operation that patches an ABE outside of 'luupgrade' requires a manual build of the boot archive.

Solution 2: One must manually rebuild the boot-archive on the /a partition after applying the patches.  Otherwise once the system boots, the boot-archive will be out of sync.

Here's some more detail on the jumpstart finish script version of this: 

We've seen the same panic a few times when the latest patch cluster is applied via a finish script to a boot environment prior to s10u6 via a jumpstart installation. It appears that the boot archive is out of sync with the kernel on the system. The boot archive was created from the 137137-09 patch and not updated after the 139555-08 kernel was applied, hence the mismatch between the kernel and the boot archive.

In these instances updating the boot archive allows the system to boot successfully. Boot failsafe (ok boot -F failsafe) will detect an out of sync boot archive.  Execute the automated update then reboot.  This will now boot from the later kernel (139555-08) which successfully installed from the finish script.

I reproduced the problem in a jumpstart installation environment applying the latest 10_Recommended patch cluster from a finish script. The initial installation was S10U5 which is deployed from a miniroot that has no knowledge of a boot archive (my theory anyway).  This is similar to a live upgrade environment if the boot environment doing the patching is also boot archive unaware (meaning the kernel is pre 137137-09).

In the jumpstart scenario the immediate problem was solved by updating the boot archive by booting failsafe as previously described.  The permanent solution was to update the boot archive from the finish script after the patch cluster installation completed.  BTW, all patches in the patch cluster installed successfully per /var/sadm/system/logs/finish.log.

In a standard jumpstart the boot device (install target) is mounted to /a, therefore adding the following entry to the finish script solved the problem:

/a/boot/solaris/bin/create_ramdisk -R /a

Depending on the finish script configuration and variables, the following would also work:

$ROOTDIR/boot/solaris/bin/create_ramdisk -R $ROOTDIR
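
Putting the two variants together, the tail end of a finish script might look something like the sketch below.  The function name is hypothetical, and the decision to skip quietly when create_ramdisk is missing is my assumption about what you'd want on an image that has no boot archive support, so adjust to taste:

```shell
#!/bin/sh
# Sketch of a finish script tail; call it after every patchadd/pkgadd step.
# Uses the first argument, then $ROOTDIR, then /a as the install target.
rebuild_boot_archive() {
    root=${1:-$ROOTDIR}
    root=${root:-/a}
    ramdisk="$root/boot/solaris/bin/create_ramdisk"

    if [ -x "$ramdisk" ]; then
        echo "Rebuilding boot archive under $root"
        "$ramdisk" -R "$root"
    else
        # Assumption: with no create_ramdisk on the target there is
        # nothing to rebuild, so warn and carry on.
        echo "No create_ramdisk under $root; skipping" >&2
        return 0
    fi
}
```

A finish script would simply call rebuild_boot_archive as its last step, after the patch cluster has been applied.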

Issue 3: The above issues are sometimes misdiagnosed as CR 6850202: "bootadm fails to build bootarchive in certain configurations leading to unbootable system".

But CR 6850202 will only be encountered in very specific circumstances, all of which must occur in order to hit this specific bug, namely:

1. Install u6 SUNWCreq - there's  no mkisofs so we build ufs boot archive

2. Limit /tmp to 512M - thus forcing the ufs build to happen in /var/run

3. Have a separate /var - bootadm.c only lofs nosub mounts "/" when creating the alt root for DAP patching build of boot archive

4. Install 139555-08

You must have all four of the above in order to hit this, i.e. step 4 must be installing a DAP patch such as a Kernel patch associated with a Solaris 10 Update, for example 139555-08.

Solution 3: Removing the 512MB limit (or whatever limit has been imposed) on /tmp in /etc/vfstab and/or adding SUNWmkcd (and probably SUNWmkcdS) so that mkisofs is available on the system is sufficient to avoid the code path that fails this way.

On a system that has already hit the problem, booting failsafe and rebuilding the boot archive from there will succeed.
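
If you'd like a quick way to see whether a system matches the configuration side of that list before installing the Kernel patch, a sketch like this could help.  The function names are mine, it assumes the standard seven-field vfstab layout (mount point in field 3, mount options in field 7), and it assumes an awk that behaves like a POSIX one (on Solaris 10 that might mean putting /usr/xpg4/bin first in PATH):

```shell
#!/bin/sh
# Print the size= mount option for /tmp, if one is set in the vfstab.
tmp_size_option() {
    awk '$1 !~ /^#/ && $3 == "/tmp" {
             n = split($7, opt, ",")
             for (i = 1; i <= n; i++)
                 if (opt[i] ~ /^size=/) print opt[i]
         }' "${1:-/etc/vfstab}"
}

# Succeed if the vfstab has its own entry for /var.
has_separate_var() {
    awk '$1 !~ /^#/ && $3 == "/var" { found = 1 }
         END { exit !found }' "${1:-/etc/vfstab}"
}

# Report which of the configuration preconditions are present.
report_preconditions() {
    vfstab=${1:-/etc/vfstab}
    size=`tmp_size_option "$vfstab"`
    if [ -n "$size" ]; then
        echo "/tmp is limited ($size)"
    fi
    if has_separate_var "$vfstab"; then
        echo "/var is a separate file system"
    fi
    if type mkisofs >/dev/null 2>&1; then
        :
    else
        echo "mkisofs not found in PATH"
    fi
}
```

If report_preconditions prints all three lines, either lifting the /tmp limit or adding the mkisofs package before patching is the safer route.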

Here's further input from one of my senior engineers, Enda O'Connor:

If using Live Upgrade (LU), and LU on the live partition is up to date in terms of the latest revision of the LU patch, 121430 (SPARC) and 121431 (x86), the boot-archive will be built automatically once the user runs shutdown (after luactivate to activate the new BE).  This is done from a kill script in rc0.d.

If using a jumpstart finish script, or jumpstart profile to patch a pre-U6 image with latest kernel patches, then you need to run create_ramdisk from the finish script after all patching/packaging operations have been finished.  Alternatively, you can patch your pre-U6 miniroot to the U6 SPARC NewBoot level (137137-09), at which point the modified miniroot will handle the build of the boot_archive after the finish script has run.

If patching U6 and upwards from jumpstart, the boot_archive will get built automatically after finish script has run, so there's no issue in this scenario.

If using any home-grown technology to patch or install/modify software on an Alternate Boot Environment (ABE), such as ufsrestore/cpio/tar, you must always run create_ramdisk manually before booting to said ABE.

Best Wishes,

Gerry.

About

This blog is to inform customers about patching best practice, feature enhancements, and key issues. The views expressed on this blog are my own and do not necessarily reflect the views of Oracle. The Documents contained within this site may include statements about Oracle's product development plans. Many factors can materially affect these plans and the nature and timing of future product releases. Accordingly, this Information is provided to you solely for information only, is not a commitment to deliver any material code, or functionality, and SHOULD NOT BE RELIED UPON IN MAKING PURCHASING DECISIONS. The development, release, and timing of any features or functionality described remains at the sole discretion of Oracle. THIS INFORMATION MAY NOT BE INCORPORATED INTO ANY CONTRACTUAL AGREEMENT WITH ORACLE OR ITS SUBSIDIARIES OR AFFILIATES. ORACLE SPECIFICALLY DISCLAIMS ANY LIABILITY WITH RESPECT TO THIS INFORMATION. ~~~~~~~~~~~~ Gerry Haskins, Director, Software Lifecycle Engineer
