By Gerry Haskins-Oracle on Sep 24, 2015
Interesting blog post from Steve Duplessie which is well worth reading.
Steve, you're not losing your mind. You're just seeing the light!
A nugget I took from a management class I attended years ago is "You can't manage what you don't measure". Meaning that without good operational metrics in place, it's impossible to accurately measure the impact of process improvements and other initiatives. Therefore, I've always been a big fan of good metrics.
My colleague, statistician, and 110.5% good guy, Chuck Malovrh, has been crunching data on the impact of Oracle Installation Services on customer issue rates.
The results are very interesting indeed.
Nerdy bit: Normalizing metrics so we're comparing like-with-like is the difficult bit. So Chuck focused on M6 installs, as it's a relatively new system (minimizing data staleness / hardware changes / wear-and-tear factors), yet powerful (people don't buy M6s unless they intend to run them hard - very high load, typically with high performance & redundancy requirements). By focusing on all instances of a particular system model, we eliminate variability between system types, geographical factors, customer experience factors, etc. Chuck normalized results by looking at the Service Request rate per 1,000 days of server operation. That is, how many issues did customers report per 1,000 days - so let's call our unit of measurement Software Service Requests (SRs) per Kilo System Service Days (KSSD*). "Bugged SRs" are Service Requests for which an actual Bug was identified, as opposed to, say, a configuration issue or user error.
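To make the normalization concrete, here is a minimal sketch of the arithmetic: the SR count is scaled to 1,000 days of accumulated server operation. The function name `srs_per_kssd` and the sample numbers are made up for illustration; they are not Chuck's data or tooling.

```shell
#!/bin/sh
# Hypothetical helper: Service Requests per Kilo System Service Days (KSSD).
# $1 = number of SRs reported
# $2 = total days of server operation, summed across all systems in the sample
srs_per_kssd() {
    awk -v srs="$1" -v days="$2" 'BEGIN {
        if (days <= 0) { print "n/a"; exit }
        printf "%.2f\n", srs * 1000 / days
    }'
}

# Made-up example: 12 SRs over 8,000 accumulated system days.
srs_per_kssd 12 8000
```

Because the denominator is accumulated system days rather than a count of systems, servers that have been in production for different lengths of time can be pooled into one rate.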
Fewer issues are obviously better, both for customers and for Oracle.
This is the win-win for which we strive.
| Oracle Installation Service || Percentage of Total || Number of Software SRs per KSSD* || Number of Bugged SRs per KSSD*
| Hardware Only
| Hardware & Software
From the table we can clearly see that Oracle Installation Services have a dramatic impact on reducing the number of Customer Software Service Requests and Bugs encountered.
Oracle's Hardware Installation Service leverages Oracle's Enterprise Installation Standards (EIS), a mature, tried-and-trusted best practice process, including recommended software and configuration checklists, so it's not a surprise that Hardware Installation Services have a positive impact on customers' subsequent operational experience.
It should also come as no surprise that the additional Best Practice Software Installation and Configuration available with the Oracle Software Installation Service has a significant positive impact, further reducing the number of issues which customers subsequently experience - over 70% fewer issues and 60% fewer bugs per 1,000 days of subsequent operation than customers who didn't purchase an Oracle Installation Service.
These are really impressive figures.
I have long felt that customers' lifecycle experience is intrinsically linked to the quality of the initial installation and configuration. Once a sub-optimal configuration is deployed to production, it's usually impossible to take it back out of production and reconfigure it properly. Instead, it becomes a question of mitigating issues over the System's lifecycle, which is unlikely to result in an optimal customer experience.
That's why my team and I have been so focused on Best Practice Installation and Configuration over the last number of years, including the development of the Installation and Configuration utilities for SuperCluster Engineered Systems.
But this is the first time I've had reliable metrics on generic systems to back up my gut feel.
So next time you're purchasing servers, please consider purchasing Oracle Installation Services for them too.
It'll save you time and money in the long run and save us all from dealing with unnecessary issues once the servers are deployed to production.
Nice work Chuck!
Just cross-posting some interesting information on forthcoming Solaris 11.3 features as well as newly available FOSS packages for evaluation on Solaris 11.2:
New Program: FOSS Evaluation Packages for Solaris 11.2
The new ORAchk release 12.1.0.2.4 is now available to download.
New in this release, if ORAchk is older than 120 days and a newer version is not available locally it will check to see if a newer version is available on My Oracle Support and automatically download and upgrade.
Download of the latest version directly from My Oracle Support can also be specifically triggered with “./orachk -download”.
If ORAchk is running in automated mode the daemon will automatically upgrade from local location defined by RAT_UPGRADE_LOC just before the next scheduled run. Email notification will be sent about the upgrade then ORAchk will continue with the scheduled run using the upgraded version, all without requiring you to restart the ORAchk daemon.
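The upgrade rule described above can be sketched as a small decision function. This is a toy model of the behavior as worded in this post, not ORAchk's actual code: the function name `upgrade_source` and its arguments are hypothetical, and treating a locally staged newer version as always taking priority is my simplifying assumption.

```shell
#!/bin/sh
# Toy model of the upgrade rule described in the post; not ORAchk's own logic.
# $1 = age of the installed ORAchk in days
# $2 = "yes" if a newer version is already staged locally, otherwise "no"
upgrade_source() {
    age_days=$1
    newer_local=$2
    if [ "$newer_local" = "yes" ]; then
        echo "local"    # e.g. the directory named by RAT_UPGRADE_LOC
    elif [ "$age_days" -gt 120 ]; then
        echo "mos"      # older than 120 days: look on My Oracle Support
    else
        echo "none"     # recent enough, nothing to do
    fi
}

upgrade_source 150 no
upgrade_source 150 yes
upgrade_source 30 no
```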
ORAchk 12.1.0.2.4 now brings wider and deeper support throughout the Oracle product stack, with newly added support for the following product areas:
See Document 1268927.2 for further details of the new product support.
This release of ORAchk adds new checks for some of the most impactful problems seen by Oracle Customer Support, specifically in the areas of:
For more details and to download the latest release of ORAchk see Document 1268927.2
A customer once said to me that "bad news, delivered early, is relatively good news, as it enables me to plan for contingencies".
That need to manage expectations has stuck with me over the years.
And in that spirit, we issue Docs detailing known issues with Solaris 11 SRUs (Doc ID 1900381.1) and Solaris 10 CPU patchsets (Doc ID 1943839.1).
Many issues only occur in very specific configuration scenarios which won't be seen by the vast majority of customers.
A few will be subtle issues which have proved hard to diagnose and hence may impact a number of releases.
But providing the ability to read up on known issues before upgrading to a particular Solaris 11 SRU or Solaris 10 CPU patchset enables customers to make more informed and hence better decisions.
BTW: The Solaris 11 Support Repository Update (SRU) Index (Doc ID 1672221.1) provides access to SRU READMEs summarizing the goodness that each SRU provides. (As do the bugs fixed lists in Solaris 10 patch and patchset READMEs.)
For example, from the Solaris 11.2 SRU10.5 (11.2.10.5.0) README:
Why Apply Oracle Solaris 11.2.10.5.0
Oracle Solaris 11.2.10.5.0 provides improvements and bug fixes that are applicable for all the Oracle Solaris 11 systems. Some of the noteworthy improvements in this SRU include:
- Bug fix to prevent panics when using zones configured with exclusive IP networking, and DR has been used to add and remove CPUs from the domain (Bug 19880562).
- Bug fix to improve NFS stability when under stress (Bug 20138331).
- Bug fix to address the generation of FMA events on the PCIEX bus on T5-2 (Bug 20245857).
- Bug fix to improve the performance of the zoneadm list command for systems running a large number of zones (Bug 20386861).
- Bug fix to remove misleading warning messages seen while booting the Oracle VM Server for SPARC guests (Bug 20341341).
- Bug fix to address NTP security issues, which includes the new slew always mode for leap second processing (Bug 20783962).
- OpenStack components have been updated to Juno. For more information, see OpenStack Upgrade Procedures.
- The Java 8, Java 7, and Java 6 packages have been updated. For more information, see Java 8 Update 45 Release Notes, Java 7 Update 80 Release Notes, and Java 6 Update 95 Release Notes.
Time is money.
I remember my first unplanned downtime as a Sys Admin on-site at a major Aluminum Mill in upstate New York. The Operations Manager was literally poking me in the back of the neck asking me "Don't you know downtime costs us $250,000 per hour ? How long will it take to get back up ?", to which I replied "It'll be faster if you stop poking me in the neck!". I had the Systems back up in 20 minutes.
For Solaris and other Oracle Sun products, we try to release bug fixes as fast as possible, balancing the need for speed with the need for quality.
Since an Operating System performs many disparate functions for many disparate workloads, testing that a fix isn't toxic in any supported scenario is complex and takes time.
But we can and do provide faster relief to the customer(s) who raised the specific issue as it's easier to ensure the fix is correct for their specific environments.
We do this by supplying Interim Diagnostics and Relief (an IDR). As the name suggests, it provides relief for the issue until the final fix is available in a Support Repository Update (SRU) or Solaris Update release (for example, Solaris 11.3). For hard to diagnose issues, an IDR may also provide additional diagnostic instrumentation to get to the root cause of an issue.
Like many things in Solaris 11, the IDR mechanism is far smoother thanks to the Image Packaging System (IPS) than it was in Solaris 10 and earlier releases.
SRUs for Solaris 11 and patches for Solaris 10 are released on a monthly cadence. These are tested as a unit to ensure quality.
In Solaris 11, IDRs are automatically superseded by later SRUs or Solaris Updates which include fixes for all the bugs the IDR addresses. An IDR terminal package is included in the SRU Repo for superseded IDRs. This tells IPS it's OK to overwrite the IDR on the target system. Therefore, it is no longer necessary to manually remove such IDRs before updating to a later SRU or Solaris Update.
This automatic superseding typically saves customers the need for an additional reboot, since it's no longer necessary to remove an IDR, reboot, apply an SRU, reboot. Instead, simply 'pkg update' to the desired SRU, reboot once to activate it, and you're done.
If the issues addressed by an IDR are not yet fixed in the later SRU or Solaris Update, IPS will warn the user and a Service Request (SR) should be filed requesting a new IDR at the later software version for the outstanding issues.
Normally, IDRs are provided to the specific customers who have filed Service Requests (SRs) for a specific bug.
To accelerate the release of fixes for public security vulnerabilities, we intend to release Security IDRs to the SRU Repo and My Oracle Support (MOS) so that all customers can get relief from such vulnerabilities quicker. Customers should continue to file Service Requests (SRs) for such bugs, so we know there's demand for a Security IDR.
These security fixes will be included into the next SRU to be released, which will automatically obsolete the Security IDRs, so customers need have no concern about installing such Security IDRs in advance of the SRU being available. The Security IDR simply provides a faster delivery mechanism.
As mentioned in a previous post, there's now a security Critical Patch Update (CPU) package which can be installed and updated on Solaris 11 systems to provide all available Common Vulnerabilities and Exposures (CVE) security fixes in the minimum amount of change to satisfy security compliance requirements. This package automagically pulls in the security fixes via IPS dependencies.
There are also significant new security compliance features in Solaris 11.2.
Also in Solaris 11.2 is support for a new Package Group install option: solaris-minimal-server, which provides the minimum useful bootable environment. Use this and install additional packages as required to support your applications. This is useful for security compliance as if the vulnerable software isn't installed, you ain't vulnerable, and you don't need to expend unnecessary time and effort applying fixes.
There's lots of other new stuff in Solaris 11.2 including OpenStack and the Oracle 12c Database Prerequisite Package. Check it out!
My hard working colleagues, Darrin Johnson, Darrell May and team, have been working diligently to dramatically improve the robustness of the iSCSI / iSER implementation in Solaris 11 and ZFSSA (AK).
Darrell's published Doc 1989174.1 available from MOS to provide a handy list of what's fixed where.
The current recommendation is to update to Solaris 11.2 SRU8 and AK 2013.1.3.5 to get the latest and greatest iSCSI stack related fixes.
The new ORAchk release 12.1.0.2.3 is now available to download.
ORAchk now supports upgrade checks for 12.1.0.2, enabling you to do pre and post upgrade checking for Oracle Database 12.1.0.2 to avoid the most common upgrade problems.
ORAchk now supports ASM checks and patch recommendations for single instance databases as well as the already supported RAC instances.
ORAchk can now query details related to the OS resource consumption of different GoldenGate processes, identifying any components using excessive resources. It also identifies if GoldenGate is configured to avoid known performance problems.
Oracle Enterprise Manager Agent support has now been added to the existing support for Enterprise Manager Repository checks. The agent checks now appear in a new “Enterprise Manager” section of the report. With the new EM 12c Agent checks, you will quickly identify common EM Agent configuration mistakes that if undetected can result in poor performance or a failure to run the Agent process.
There are certain things ORAchk can only do a partial check for, where a complete check requires information outside the scope of the machine or that require other customer specific knowledge. These partially identified checks now appear in the new section marked “Findings needing further review”. If you review these checks and verify they are not problematic you can choose to exclude them in the same way as any other checks, see Document 1268927.2 for further details.
The ORAchk Health Score calculation has been improved with version 12.1.0.2.3. Certain INFO level checks, which only communicate best practice and do not confirm a problem in your environment, no longer deduct points from your health score. This means that if you follow the recommended advice for excluding any non-relevant “Findings needing further review” then a health score of 100 is now obtainable.
This release of ORAchk adds new checks for some of the most impactful problems seen by Oracle Customer Support, specifically in the areas of:
For more details and to download the latest release of ORAchk see Document 1268927.2
Now, you have a chance to try it out yourself, with the Software In Silicon cloud. See John Soat's Proof Of Concept article for details.
I'm delighted to report that my hard working colleagues, Darren Moffat and Pete Dennis, have released the Solaris 11 Critical Patch Update package to make it easier for you to install and track fixes for Common Vulnerabilities and Exposures (CVE).
Once you've installed the package (pkg install solaris-11-cpu), applying all available Solaris fixes for CVE is now as simple as:
# pkg update solaris-11-cpu
See Darren's blog and MOS doc 1948847.1 for details.
Now that's a nice Thanksgiving present!
Since this is security related, this post will self-destruct in 5 seconds.
My ORAchk colleagues have asked me to post the following:
The new ORAchk release 12.1.0.2.1 is now available to download.
ORAchk release versioning now aligns with and follows the same format used by the Oracle 12c Database Patch Set Updates (PSUs); this version is 12.1.0.2.1, the next will be 12.1.0.2.2.
It’s also now even easier to update ORAchk across multiple machines.
ORAchk is now supported on Windows when run within a Cygwin environment. Instructions for configuring Cygwin can be found from Document 1268927.2. ORAchk now includes hundreds of database and application checks which will run on Windows. There are even more Windows specific checks in the pipeline.
You no longer need to have different users execute different ORAchk profiles to work around your company’s implementation of role separation. ORAchk can now be run once as root to execute all checks. Prior to executing checks that do not require root access, ORAchk will switch user to the lower level accounts.
When running against multiple databases, ORAchk can now run database checks in parallel meaning it takes a fraction of the time to complete execution. Parallel database execution is now the default. It can be turned off, if you prefer to run checks serially.
Quickly find out what has changed on your system between two ORAchk runs. When ORAchk is run with the -diff option it will now not only compare check results but collection data too. Quickly compare and understand differences in kernel parameters or database initialization parameters.
ORAchk support for EBS has been enriched and broadened, with even more checks for Oracle Payables (R12) and Oracle Workflow, and release 12.1.0.2.1 now introduces new support for Oracle Order Management (R12) and Oracle Process Manufacturing (R12).
For more details and to download the latest release of ORAchk see Document 1268927.2
SRUs, Patches, and IDRs (Interim Diagnostics & Relief) are available from My Oracle Support, support.oracle.com for all supported Solaris releases to address the recent critical bash vulnerabilities, CVE-2014-6271, CVE-2014-7169.
Newer IDR revisions are available on MOS which additionally address the less critical "mop up" vulnerabilities, CVE-2014-7186, CVE-2014-7187. Patches and SRUs will follow for these too.
See MOS Doc ID 1930090.1 for details.
Many thanks to the folks around the globe who have been working tirelessly over the last 48 hours to code, test, and release these SRUs, patches, and IDRs - from Australia to India to the Czech Republic to Ireland and the US.
I sincerely apologise for the delay in proactively communicating these fixes to you. That was outside of my control.
Solaris 11.2 is released!
There are a huge number of new and improved features in Solaris 11.2, as well as thousands of bug fixes. In short, it's our best Solaris ever!
For security conscious customers, Solaris 11.2 delivers significant compliance enhancements (see the docs) and provides the new "solaris-minimal-server" Install group, which is an excellent basis for installing secure, minimized (hardened) systems.
Hardening (minimizing) a system in Solaris 10 and earlier was as much an art form as a science. It was hard to be sure that the system was as minimized as possible.
In Solaris 11.2, the "solaris-minimal-server" Install group dramatically simplifies the process. It's a new install option in addition to the existing "solaris-small-server", "solaris-large-server", and "solaris-desktop" install groups.
"solaris-minimal-server" does exactly what it says. It provides the minimal set of packages to provision a minimal supported command-line Oracle Solaris environment. You will typically need to add packages to this minimal set which are required to support your applications.
For example, install a test domain with "solaris-minimal-server", your application, and any additional packages which you know your application requires - for example JRE7 and the application installer. Test it, and add in any additional packages which you discover your application requires - for example, for its user GUI/BUI. That's the minimum install footprint for your application. Repeat as desired for other applications.
By reducing the install footprint, you reduce the "attack surface", ensuring your system is exposed to the minimum number of vulnerabilities. This in turn reduces the need to patch for security compliance, further reducing your TCO.
Since installing an Oracle Database would be a common scenario, Solaris 11.2 also provides an additional group package for the database:
group/prerequisite/oracle/oracle-rdbms-server-12-1-preinstall
So, if you want to install the Oracle Database (single instance), you can simply add the above package to your solaris-minimal-server and you will have the required packages to install the database.
It's just one of many new features in Solaris 11.2 which I think you'll like. Please take a few minutes to browse the "What's New" and other documentation released with 11.2.
As with any Solaris Update release, expect a number of important bug fixes in the first few Solaris 11.2 SRUs which didn't make the Solaris 11.2 release.
More details on "solaris-minimal-server":
$ pkg contents -mr -g ./s11u2 group/system/solaris-minimal-server
set name=pkg.fmri value=pkg://firstname.lastname@example.org,5.11-0.175.2.0.0.42.0:20140623T214938Z
set name=pkg.summary value="Oracle Solaris Minimal Server"
set name=pkg.description value="Provides the minimal, supported command-line Oracle Solaris environment"
set name=info.classification value="org.opensolaris.category.2008:Meta Packages/Group Packages"
set name=org.opensolaris.consolidation value=solaris_re
set name=variant.arch value=i386 value=sparc
set name=variant.opensolaris.zone value=global value=nonglobal
depend fmri=network/ping type=group
depend fmri=service/network/ssh type=group
depend fmri=shell/tcsh type=group
depend fmri=shell/zsh type=group
depend fmri=system/network type=group
depend fmri=developer/debug/mdb type=require
depend fmri=editor/vim/vim-core type=require
depend fmri=group/system/solaris-core-platform type=require
depend fmri=package/pkg type=require
depend fmri=release/name type=require
depend fmri=release/notices type=require
depend fmri=shell/bash type=require
depend fmri=shell/ksh93 type=require
depend fmri=system/core-os type=require
depend fmri=system/library/platform type=require
The packages with group dependencies in the list above can be removed to further minimize the system. For example, if you don't want 'ssh', you don't have to install it.
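If you'd like to pick out those removable packages without reading the manifest by eye, something along these lines should do it. It's a rough sketch, assuming the manifest text is fed in on standard input and that each depend line has the simple two-attribute shape shown above; the function name `optional_members` is made up for illustration.

```shell
#!/bin/sh
# List the packages a group package pulls in via "group" dependencies,
# i.e. the ones the post says can be left out to minimize further.
# Reads `pkg contents -m` style manifest text on standard input.
optional_members() {
    awk '$1 == "depend" && $NF == "type=group" {
        sub(/^fmri=/, "", $2)   # strip the attribute name, keep the package
        print $2
    }'
}

# Sample lines copied from the manifest shown in the post.
optional_members <<'EOF'
depend fmri=network/ping type=group
depend fmri=service/network/ssh type=group
depend fmri=shell/bash type=require
depend fmri=system/core-os type=require
EOF
```

On a live system the same function could be fed from the `pkg contents -mr` command shown earlier instead of the here-document.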
More details on group package with Oracle Database 12.1 install pre-requisites:
$ pkg contents -mr -g ./s11u2 group/prerequisite/oracle/oracle-rdbms-server-12-1-preinstall
set name=pkg.fmri value=pkg://email@example.com,5.11-0.175.2.0.0.42.0:20140623T214934Z
set name=pkg.summary value="Prerequisite package for Oracle Database 12.1"
set name=pkg.description value="Provides the set of Oracle Solaris packages required for installation and operation of Oracle Database 12."
set name=info.classification value="org.opensolaris.category.2008:Meta Packages/Group Packages"
set name=org.opensolaris.consolidation value=solaris_re
set name=variant.arch value=i386 value=sparc
depend fmri=x11/diagnostic/x11-info-clients type=group
depend fmri=x11/library/libxi type=group
depend fmri=x11/library/libxtst type=group
depend fmri=x11/session/xauth type=group
depend fmri=compress/unzip type=require
depend fmri=developer/assembler type=require
depend fmri=developer/build/make type=require
As you may know, my team and I have been heavily focused on SuperCluster Engineered Systems for the last few years.
The intense work we've done for SuperCluster - especially on expediting fixes for scalability and availability issues - has a significant trickle down benefit for all Solaris customers. All of these critical fixes are in Solaris 11.2 SRU1.
Did you know that 97% of all customer SuperCluster domains / zones run Solaris 11.x ? Only 3% run Solaris 10. This massive adoption of Solaris 11.x is due to its compelling features, excellent quality, and superb stability. It really is time to move to Solaris 11.x. It's like going from horses to motor cars. It is that big a difference.
Even if you are not in a position to adopt Solaris 11.2 immediately, please do consider using a recent Solaris 11.1 SRU, such as Solaris 11.1 SRU19.6 or later. This includes fixes for 110 critical issues encountered on SuperCluster and which are also relevant for other T4/T5/M5/M6/M10 users. This is our current recommended version for SuperCluster and our experience with it to date has been excellent.
We'll be moving up to Solaris 11.2 shortly to leverage more of the exciting features it provides.
Those awfully nice ORAchk folks have asked me to let you know about their latest release...
ORAchk version 2.2.5 is now available for download, new features in 2.2.5:
ORAchk has replaced the popular RACcheck tool, extending the coverage based on prioritization of top issues reported by users, to proactively scan for known problems within the area of:
ORAchk will expand in the future with high impact checks in existing and additional product areas. If you have particular checks or product areas you would like to see covered, please post suggestions in the ORAchk subspace in My Oracle Support Community.
For more details about ORAchk see Document 1268927.2
My colleagues, Susan Miller and Erwann Chénedé, have been working with the nice people behind the ORAchk tool (formerly RACcheck) to add Solaris health checks to the tool.
ORAchk 2.2.4, containing the initial 8 Solaris health checks, is now available:
ORAchk includes EXAchk's functionality and replaces the popular RACcheck tool, extending the coverage based on prioritization of top issues reported by users, to proactively scan for known problems within:
ORAchk will expand in the future with more high impact checks in existing and additional product areas. If you have particular checks or product areas you would like to see covered, please post suggestions in the ORAchk community thread accessed from the support tab on the below document.
For more details about ORAchk see Document 1268927.1
Some say it's rude to stare. But that's not my experience.
I've been working on SuperCluster for 2 years now. And I've been looking intently at issues arising for SuperCluster customers, both to ensure the issues are fixed a.s.a.p. and to understand what lessons we can learn and where we can improve our products and processes.
Well, I want every customer to have the best possible experience.
Call me naive, but I sincerely believe that ensuring a good customer experience is the best way to encourage repeat business.
So, what have I learned ?
I've been working in the Solaris customer lifecycle space for 15 years now. One thing that's always puzzled me is why, while most customers have a perfectly good experience, there's always one or two customers who repeatedly hit problems.
The reasons are often not obvious. They may be running very similar hardware with very similar software configuration and broadly comparable workloads to hundreds of other customers in the same industry segment who are not experiencing any issue.
It's easy to assume that there may be something subtle "wrong" in their set-up. Either a misconfigured network, a piece of 3rd party kit which we don't have internally to help us reproduce the issue, 3rd party or home grown apps relying on private interfaces they shouldn't be using, even a dodgy "favorite" /etc/system setting which the customer "knows" works from their Solaris 2.5.1 or V880 days that hamstrings performance, or whatever. Occasionally, despite enormous effort, it feels like we never get to true root cause and that customer never does have an optimal experience.
More often, we do determine the root cause, which may indeed be a sub-optimal configuration but, if the system's already in production, it may not be possible to reconfigure the system and start again, so the customer experience remains compromised for that system.
Indeed, it's for this exact reason - sub-optimal customer lifecycle experiences are often due to sub-optimal initial install and configuration - that my team was asked to develop the install and configuration utilities for SuperCluster so that they are configured according to best practice right out of the box. And that's worked very well indeed.
But some issues do still arise for SuperCluster customers.
Most are when we leverage new functionality - initially Infiniband, more lately VM2 and iSCSI. These issues are found and fixed rapidly, with proactive roll-out of the fixes to the entire SuperCluster customer base.
I previously blogged that, even though SuperCluster is configurable and certainly not an appliance, we are finding Engineered Systems issues much easier to debug, as the fixed hardware layout, cabling, protocols, etc., dramatically reduce the number of variables in play, making issue reproduction in-house much easier, and hence issue analysis and resolution much faster. This really helps to improve our customers' experience.
But we still see a very small number of customers (two or possibly three come to mind) who repeatedly hit issues not seen by any other.
Why is that ?
The hardware is identical. The configurations are similar. We have other customers in the same industry segment utilizing the SuperClusters for broadly similar purposes. Even with similar DB and load characteristics. We know the networking is correct - it's fixed. We know the I/O cards are in the right slots - it's fixed. We know we're using the optimal protocols, configured optimally. We even have a process, ssctuner, running in the background to check that no dodgy settings are added to /etc/system, and it'll automatically remove them if they are.
We've gone through an interesting period over the summer. In early summer, we were seeing very few issues indeed reported from our now large customer base. Then, we saw 3 customers raise issues in quick succession.
The first, in Europe, looked like an Infiniband issue. Responses would just stop for multiple seconds for no apparent reason, then restart. We actually sent two very experienced engineers on site to debug after trying to debug over shared shell was unsuccessful, and they root caused a VM2 (Virtual Memory) issue and two scheduler issues.
Almost the same week, two U.S. SuperCluster customers raised VM2 issues. Our lead VM2 sustaining engineer, Vamsi Nagineni, engaged Eric Lowe from the VM2 development team, and they determined that none of the customer issues had the same root cause.
In one case, a bank, the customer's database is not optimized for Exadata, so more of the load runs on the SuperCluster compute nodes rather than on the storage cells. Nothing overly excessive, just enough to encounter an issue not seen by other customers.
In another, a State social services provider, the customer runs a high proportion of batch processing. Again, nothing excessive, just enough to encounter a different issue not seen by other customers.
In the third, a major retailer, the customer's apps had very specific memory requirements which the VM2 algorithms were handling sub-optimally.
The outcome of this is that a number of subtle VM2 and other bugs have been found and fixed, not just for the benefit of these and other SuperCluster customers, but since the fixes are putback into generic Solaris SRUs, all Solaris 11 customers benefit.
Without the reduced variables at play in Engineered Systems, it would be extremely difficult if not impossible to reproduce, analyze, and fix such subtle issues.
So even if you don't have a SuperCluster, you can still reap the benefits.
FYI, currently most of the SuperCluster install base is running Solaris 11.1 SRU7.5 (which fixes a number of VM2 issues).
BTW: We also improved the SRU README last month to summarize the important content.
Here's a Top Tip from my colleague, IPS Guru, and all-round good guy, Pete Dennis:
If the issue(s) addressed by a Solaris 11 IDR (Interim Diagnostics / Relief) are fixed in a subsequent SRU (Support Repository Update), the SRU is said to "supersede" the IDR.
As mentioned in previous posts, in Solaris 11 the IDR is automatically superseded when the system is updated to the relevant SRU (or any later SRU). That is, unlike in Solaris 10, there's no need to manually remove the IDR before updating*. We provide "terminal packages" for superseded IDRs in the Support Repo, enabling IPS (Image Packaging System) to automatically handle the IDRs for you.
Several weeks before a planned maintenance update, it's a good idea to check whether all the IDRs in use are superseded by the SRU to which you are planning to update.
If any of them aren't superseded, and the relevant packages they touch are updated in the SRU, you'll need to raise an SR (Service Request) with Oracle Support to get new IDRs generated for the relevant BugIDs at that SRU level. So please ensure you provide enough time for these to be generated. Note, if the Bugs are already fixed in a later SRU, you'll be told to update to that SRU.
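The pre-update check boils down to a set difference: the IDRs you have installed, minus the IDRs superseded at the target SRU. Here's a minimal sketch, assuming you've already collected both as plain lists of IDR names, one per line. The function name `needs_new_idr` and the name idr700 are hypothetical, and how you produce the two lists is left to you.

```shell
#!/bin/sh
# Print the IDRs from the first list that do not appear in the second.
# $1 = file with one installed IDR name per line
# $2 = file with one superseded IDR name per line (for the target SRU)
needs_new_idr() {
    # -x matches whole lines only, so "idr67" is not confused with "idr679".
    grep -vxF -f "$2" "$1" || true
}

# Made-up example: idr679 is superseded, the hypothetical idr700 is not.
installed=$(mktemp)
superseded=$(mktemp)
printf 'idr679\nidr700\n' > "$installed"
printf 'idr679\n' > "$superseded"
needs_new_idr "$installed" "$superseded"
rm -f "$installed" "$superseded"
```

Anything this prints is a candidate for a Service Request asking for a new IDR at the target SRU level.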
Is there a simple way for a customer to find out which of their IDRs will be superseded by updating to a given SRU?
All superseded IDRs are tagged in the Support Repository and on the incremental ISO images available from MOS (My Oracle Support).
The following command will list the superseded IDRs in the Support Repository, so you can then examine the ones of interest.
I'm assuming here that you're maintaining a local Repo behind your firewall which is, at a minimum, up to date with the SRU to which you are planning to update:
pkg list -g http://<url of local repo> -af 'idr*'
pkg contents -g http://<url of local repo> -m idr679
set name=pkg.fmri value=pkg://solaris/idr679@3,5.11:20130905T193900Z
set name=pkg.description value="Terminal package"
set name=pkg.renamed value=true
depend fmri=pkg:/entire@0.5.11,5.11-0.175.1.11.0.4.2 type=require
You do need to be able to interpret FMRI strings correctly (see previous posts). For example, 5.11-0.175.1.11.0.4.2 is Solaris 11.1 SRU 11.4 or, to give it its official Marketing name, Solaris 11.1.11.4.0.
So that tells us that idr679 is superseded by Solaris 11.1 SRU 11.4 (Solaris 11.1.11.4.0).
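That check is easy to script. Here's a minimal sketch, assuming the manifest layout shown in the example above: a terminal package carries 'pkg.renamed' set to 'true' plus a 'require' dependency on the superseding 'entire' version. The function name 'superseding_fmri' is my own invention and is not part of IPS, and it assumes a POSIX awk:

```shell
# Hypothetical helper (not part of IPS): read a package manifest on stdin
# and print the FMRI which a terminal IDR package requires.
# Exits non-zero if the manifest doesn't look like a terminal package.
superseding_fmri() {
    awk '
        /^set name=pkg\.renamed value=true/ { renamed = 1 }
        /^depend / && /type=require/ {
            # "fmri=" is 5 characters long, so the value starts at position 6
            for (i = 1; i <= NF; i++)
                if ($i ~ /^fmri=/) target = substr($i, 6)
        }
        END {
            if (renamed && target != "") { print target } else { exit 1 }
        }
    '
}

# Usage, against your own local repo:
#   pkg contents -g http://<url of local repo> -m idr679 | superseding_fmri
```

An IDR for which this prints nothing and returns non-zero is one you'd want to raise with Oracle Support before the maintenance window.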
We'll look to make this more transparent by adding a text field with the human readable translation of the FMRI string to the metadata.
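In the meantime, here's a rough sketch of that translation in shell. The field layout it assumes, 0.175.<update>.<sru>.<reserved>.<build> plus an optional trailing field, is inferred purely from the examples in these posts rather than from any specification, and 'sru_name' is a made-up name:

```shell
# Hypothetical helper: translate the branch part of an FMRI into an SRU name.
# Assumed layout (inferred from examples, not from a spec):
#   0.175.<update>.<sru>.<reserved>.<build>[.<trailing field>]
# The body runs in a subshell, so its variables and IFS don't leak out.
sru_name() (
    branch=${1##*-}        # keep only what follows the last '-'
    branch=${branch%%:*}   # drop a trailing :timestamp, if present
    IFS=. read -r trunk minor update sru reserved build rest <<EOF
$branch
EOF
    if [ "$update" = 0 ]; then
        echo "Solaris 11 11/11 SRU $sru.$build"
    else
        echo "Solaris 11.$update SRU $sru.$build"
    fi
)

# Usage:
#   sru_name 5.11-0.175.1.11.0.4.2
```

It accepts either a bare version string or a longer FMRI, since everything up to the last '-' is discarded.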
If you wish to restrict updates to selected SRUs which you have "qualified" in your environment, for example, a "Golden Image", Bart's blog posting may also be of interest.
* There's more work required to make this happen seamlessly in Solaris 11 Zones.
Just thought you may be interested in some random metrics...
The unit of measurement is Umpf! The more Umpf!, the better. Comparing Oracle Sun's latest systems to some of our old favorite systems:
The new M6-32 which was announced at OpenWorld is ~174x more powerful than a fully loaded E10K running Solaris 8.
A T5-8 running Solaris 11 is ~133x more powerful than a fully loaded V880 and ~530x more powerful than a fully loaded E450 running Solaris 8. Holy spondulicks!
Guess it's time to replace the E450s in my lab and rent out the space I'll save for student accommodation.
With that sort of power, you can have world domination at your fingertips.
And surprisingly affordable world domination at that!
Usual disclaimers about my personal incompetence, lies, damn lies, and statistics, etc., apply.
We now have quite a bit of experience of IPS and Repositories under our belt.
Feedback from customers has been extremely positive. I recently met a customer with 1000+ Solaris servers who told me that with Solaris 10 it took them 2 months to roll out a new patchset across their enterprise. With Solaris 11, it takes 10 days.
That really helps lower TCO.
As with anything, experience teaches us how to optimize things. Here's a few Top Tips around IPS / Repo management which I'd like to share with you from my experience with SuperCluster:
I hope you find these tips useful.
My colleagues, Glynn Foster and Bart Smaalders, will be presenting on "Oracle Solaris 11 Best Practices for Software Lifecycle Management [Con3889]" @ Oracle OpenWorld next week. The Oracle Sun "Systems" sessions are in the Westin this year. This particular session is on Tuesday, Sept 24 @ 5:15pm in the "City" meeting room in the Westin and will have lots more tips and best practices.
Other colleagues, Rob Hulme and Colin Seymour, are presenting on "Best Practices for Maintaining and Upgrading Oracle Solaris [CON8255]" on Monday, Sept 23 @ 10:45am in the Westin San Francisco, also in the "City" meeting room.
And there's lots of other good stuff on Solaris and SuperCluster. For example, the "Deep Dive into Oracle SuperCluster [CON8632]" on Tuesday, Sept 24 @ 5:15pm in the Westin, Metropolitan II.
I'm not presenting this year, but if you would like to meet up with me @ OpenWorld to discuss anything about Solaris / Systems / SuperCluster Lifecycle Maintenance, whether it's ideas you'd like to see implemented, what's keeping you awake at night, issues you want me to look at, etc., I am more than happy to do so. Just ping me at Gerry.Haskins@oracle.com.
We're tweaking the naming convention used by Oracle Solaris SRUs (Support Repository Updates) to use a 5-digit taxonomy.
For example, Oracle Solaris 11.1.6.4.0
The digits represent Release.Update.SRU.Build.Respin
For the above example, the old name would have been Oracle Solaris 11.1 SRU 6.4.
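To make the mapping concrete, here's a small sketch which splits a five-field name back into the old-style name. It assumes the old name given above (11.1 SRU 6.4) corresponds to the five fields 11.1.6.4.0, and 'old_style_name' is a hypothetical helper of my own:

```shell
# Hypothetical helper: map Release.Update.SRU.Build.Respin to the old naming.
# The Respin field is read but not shown, as the old names had no equivalent.
# The body runs in a subshell, so its variables and IFS don't leak out.
old_style_name() (
    IFS=. read -r release update sru build respin <<EOF
$1
EOF
    echo "Oracle Solaris $release.$update SRU $sru.$build"
)

# Usage:
#   old_style_name 11.1.6.4.0
```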
As with Oracle Solaris 10 and below, all bug fixes are putback to the tip of the source tree for Solaris 11, which is currently Solaris 11.1.x.y.z.
Therefore, these same SRUs are also the way to get fixes for systems installed with Oracle Solaris 11 11/11, in exactly the same way that Solaris 10 Kernel patches included code from all preceding Solaris 10 Updates.
As discussed in previous postings, systems should be updated to a later SRU, for example from Oracle Solaris 11 11/11 SRU13.4 to an Oracle Solaris 11.1.x.y.z SRU.
If you maintain a local Solaris Repository behind your firewall, both Solaris 11.1 and whichever subsequent SRUs you are interested in should be added to your Repo. This is because SRUs only contain the change delta relative to the preceding Solaris Update.
Solaris's long standing Binary Compatibility Guarantee coupled with the technical benefits of Image Packaging System (IPS) help to ensure a smooth update experience.
Image Packaging System (IPS) is a single tier packaging architecture which in Oracle Solaris 11, and other Oracle Sun products such as Oracle Solaris Cluster 4.x, replaces the previous SVR4-based dual tier packaging and patching architecture.
IPS and its implementation in Solaris 11 has a number of significant advantages over the old SVR4-based architecture, including:
As we get used to Solaris 11 and IPS, it's natural that users will encounter some issues.
As a novice user myself, I've documented here some of the more common Solaris 11 / IPS issues which I've come across over the past year. I plan to update it with additional items as they arise.
This is not designed to be an exhaustive list, but rather the "gotchas" which temporarily stumped either myself (easy to do!) or other non-IPS-expert colleagues.
Some of the "issues" are more to do with users getting used to conceptual changes.
Some are Caveats resulting from bugs or sub-optimal choices made in early releases. While these have been fixed, their residual impact may still be felt on systems with the affected software installed.
Much of the solutions knowledge below is thanks to two Solaris 11 IPS-expert colleagues of mine, Pete Dennis and Albert White, who I've been pestering unmercifully about IPS issues over the past year. It was either that, or I'd have to RTFM!
If you're looking to update from Solaris 11/11 to Solaris 11.1 or later, please read this article.
The 'pkg' command is functionally rich. See 'man pkg' and other documentation. When installing or updating packages, it dynamically analyzes the constraints on the target system, including dependencies and other factors defining what may be installed.
IPS is network repository based.
It is expected that most production customers will set-up their own repository behind their firewall and update it periodically with content from the Support Repository published by Oracle.
Many issues where 'pkg' is unable to resolve all the constraints imposed on a system arise because the required package versions are not available from the Repositories specified.
Sometimes, it is not immediately obvious why a particular package version is required to resolve a constraint, which can leave users scratching their heads.
Therefore, when a 'pkg install' or 'pkg update' command does not provide the anticipated results, check the specified Publishers (i.e. which Repositories are available to that system) and the content of those Repositories.
For example, Solaris 11 bug fix updates are provided by Support Repository Updates (SRUs) which are released monthly. They contain only the incremental changes relative to their base release, e.g. Solaris 11.1. They are designed to be used in conjunction with a Repository containing that base release.
If the system is already installed with that base release, and the user is just updating existing installed packages, as opposed to installing additional packages, then the user can often get away with just using the SRU on its own.
However, if a bug fix in the SRU has added a dependency on a package which is not installed on the target system, and that package is in the base release rather than the SRU, then an update to that SRU will fail if the base release is not available to enable the dependent package to be pulled in and installed.
For example, a bug fix to the 'thunderbird' package in Solaris 11 11/11 SRU4 to fix font displays resulted in a new dependency being added to the Solaris 11 11/11 'fonts' package. Since the 'fonts' package hadn't changed since the initial Solaris 11 11/11 release, it wasn't included in the SRU, so access to the base Solaris 11 11/11 release in a Repository was required to resolve the dependency. There was a similar dependency addition in a later Solaris 11 11/11 SRU.
Similarly, if a Publisher is specified but is unavailable, or is not specified but is needed because that Repository contains a required package, then 'pkg' will be unable to resolve the constraints and will fail.
Making sure the correct Repository Publishers are defined and accessible, and the content of those Repositories is complete will resolve many package install and update issues.
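A quick sanity check along these lines can be scripted before a maintenance window. This is only a sketch: it reuses the 'pkg list -g <repo> -af' form shown elsewhere in these posts and assumes 'pkg list' returns non-zero when nothing matches. The function name 'repo_missing' and the repo URL are placeholders of my own:

```shell
# Hypothetical helper: report which of the given package patterns a
# repository cannot supply. Assumes 'pkg list' exits non-zero on no match.
# The body runs in a subshell, so its variables don't leak out.
repo_missing() (
    repo=$1
    shift
    status=0
    for pattern in "$@"; do
        if ! pkg list -g "$repo" -af "$pattern" >/dev/null 2>&1; then
            echo "missing from $repo: $pattern"
            status=1
        fi
    done
    exit "$status"
)

# Usage: check that both the base release and the target SRU are present:
#   repo_missing http://<url of local repo> \
#       'entire@0.5.11,5.11-0.175.1.0.0.24.2' \
#       'entire@0.5.11,5.11-0.175.1.11.0.4.2'
```

Checking for the base release as well as the SRU is the point here, since an SRU on its own can't satisfy a new dependency on an unchanged base package.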
The concept I've had the most difficulty getting straight in my own head is the relationship between Install Groups and Incorporations.
Install Groups simply specify a list of packages to be installed for common Use Cases. They do not specify the versions of packages to install. Currently, the following Install Groups exist in Solaris 11:
Note the Install Group names 'solaris-small-server' and 'solaris-large-server' have nothing to do with the size of the server; rather, they refer to the size of the Solaris footprint on the server. Note also, that 'solaris-desktop' is not a superset of the other two. See here for more information.
The use of Install Groups is not mandatory. They are simply provided for ease of use. Additional packages can be specified in addition to these Install Groups, for example, to resolve application dependencies.
Incorporations specify the versions of packages which should be installed together to provide a set of functionality, called a surface. Incorporations exist for various consolidated sub-components of Solaris 11, such as the 'osnet-incorporation' for the core Operating System and Networking:
gerryh@dublin:~$ pkg info osnet-incorporation
Summary: OS/Net consolidation incorporation
Description: This incorporation constrains packages from the OS/Net
Category: Meta Packages/Incorporations
Build Release: 5.11
Packaging Date: Wed Jan 02 19:28:00 2013
Size: 6.22 kB
The 'entire' Incorporation defines what constitutes the version of the entire Solaris Operating System, for example the 'entire' Solaris 11.1 SRU3.4:
gerryh@dublin:~$ pkg info entire
Summary: entire incorporation including Support Repository Update (Oracle Solaris 11.1 SRU 3.4).
Description: This package constrains system package versions to the same
build. WARNING: Proper system update and correct package
selection depend on the presence of this incorporation.
Removing this package will result in an unsupported system. For
more information see https://support.oracle.com/CSP/main/article
Category: Meta Packages/Incorporations
Version: 0.5.11 (Oracle Solaris 11.1 SRU 3.4)
Build Release: 5.11
Packaging Date: Wed Jan 02 19:31:02 2013
Size: 5.46 kB
Removal of the Solaris 'entire' Incorporation is not supported. Removing it would remove constraints on other Incorporations, allowing an untested mix of Solaris software versions on the system, potentially leading to unnecessary issues.
When installing a Solaris system, it is common to specify both an Install Group - i.e. which packages to install - and a version of the 'entire' Incorporation - i.e. which versions of those packages to install.
For example, this could be specified in an AI (Automated Installer) manifest, along with any additional IPS products or packages required. Here's part of an AI manifest my team uses to install SPARC SuperClusters with Solaris 11 11/11 SRU12.4 as well as other tools from a separate Exa-family tools Repository which is specifically for Engineered Systems:
Now for the bit which always confuses me. Strong coffee helps!:
Installing an Incorporation does not, by itself, install any packages. Rather, the Incorporation specifies the constraints on package versions if they are present on the system.
So 'pkg install entire' on a bare metal system does nothing, unless other packages are specified upon which the constraints specified in the Incorporation are to operate - e.g. an Install Group package such as 'solaris-large-server'. To show this:
# create a bare metal image to play with
$ pkg image-create -p http://pkg.us.oracle.com/solaris11/release bare_metal
# what is in this image:
$ cd bare_metal
$ pkg -R `pwd` list
pkg: no packages installed
# Install 'entire'
$ pkg -R `pwd` install --accept entire
Packages to install: 28
# It installed 28 packages! What are they ?
$ pkg -R `pwd` list
NAME (PUBLISHER) VERSION IFO
consolidation/SunVTS/SunVTS-incorporation 0.5.11-0.175.1.0.0.14.0 i--
consolidation/X/X-incorporation 0.5.11-0.175.1.0.0.24.1317 i--
consolidation/admin/admin-incorporation 0.5.11-0.175.1.0.0.5.0 i--
consolidation/xvm/xvm-incorporation 0.5.11-0.175.1.0.0.5.0 i--
entire 0.5.11-0.175.1.0.0.24.2 i--
# These are all Incorporations specified by 'entire'. There is no software payload installed at all.
But once Solaris is installed on a system, updating an Incorporation, for example, using 'pkg update entire', updates the constraints, causing the relevant packages which are installed to be updated by IPS to the later functional 'surface' specified by the Incorporation.
So if, for example, the new version of the Incorporation specifies 'foo@1.24' and specifies it has a new dependency on 'bar@1.13' and package 'foo' is already installed, say @ Version 1.20, then updating the Incorporation tells IPS to update 'foo' to Version 1.24 and install 'bar' at Version 1.13 if it hasn't already been installed (from whichever specified Repository/Repositories contains these packages at these versions).
You can have too much of a good thing. Like information. Which can make it hard to see the wood for the trees when trying to debug a 'pkg' issue.
When issues occur, 'pkg' is verbose in its output about the problem.
Packages will have dependencies upon other packages. These dependencies may not only be satisfied by the explicit version mentioned but also by any later version of a package.
This means that if 'pkg' is unable to solve for all dependencies given the available Publishers specified, the contents of those Repositories, and the constraints specified for the target system, then 'pkg' will produce a list of all the dependencies that could not be satisfied. While these errors are all true, the sheer number of them can freak out the user (they do me!), obscuring the underlying issue.
One way to reduce the number of errors is to specify the version of the packages that you wish to update to.
This is because, by default, 'pkg' will attempt to move to the latest set of packages. If this update fails then it will recurse through all other permutations, producing errors for each possible set of packages for which it attempted to resolve constraints.
By specifying an explicit version of the packages to update to, the errors produced will be just for that particular version.
Therefore, rather than just saying:
$ pkg update entire
...explicitly state the FMRI string of the SRU you want to update to...
$ pkg update entire@0.5.11,5.11-0.175.0.12.0.4
...which specifies Solaris 11 11/11 SRU 12.4.
There's a couple of other good reasons to explicitly specify which SRU or package version you want to update to.
Firstly, if you don't specify a version, 'pkg' will try to update to the latest version which satisfies the constraints on the target system.
If the repository has been updated, this could produce a different result than the same command issued prior to the repository being updated. This may be undesirable if you are trying to update a number of systems to a homogeneous SRU level.
Secondly, if 'pkg' is unable to update to the latest available release due to the constraints on the target system, it will recursively try to update to a version higher than what is already installed.
For example, one of my team issued a 'pkg update entire' to update a test system to SRU4. Only days later when he realized that the test system didn't appear to have the expected bug fixes, did he discover it had actually updated the system to SRU3 as there was a constraint which prevented IPS updating the system to SRU4.
Since IPS is not telepathic, it's best to explicitly state what version you want it to update a system to.
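Both halves of that advice, saying exactly what you want and then checking that you got it, can be sketched in a few lines of shell. The helper names are made up, the '0.5.11' version component is taken from the 'pkg info entire' output shown earlier, and the branch layout 0.175.<update>.<sru>.0.<build> is an assumption inferred from the examples in this post:

```shell
# Hypothetical helper: build an explicit 'entire' FMRI from update, SRU and build.
entire_fmri() {
    echo "entire@0.5.11,5.11-0.175.$1.$2.0.$3"
}

# Hypothetical helper: succeed only if the installed 'entire' is at the given
# update, SRU and build. Relies on 'pkg list -Hv' printing the full FMRI.
entire_is_at() {
    case "$(pkg list -Hv entire)" in
        *"-0.175.$1.$2.0.$3"[.:]*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: update to Solaris 11 11/11 SRU 12.4, then confirm after the reboot:
#   pkg update --accept "$(entire_fmri 0 12 4)"
#   entire_is_at 0 12 4 || echo "not at the SRU you asked for" >&2
```

A check like the second one would have caught the SRU3-instead-of-SRU4 surprise on the spot rather than days later.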
All 'pkg' commands are logged.
'pkg history' is useful for examining the history of the system. Additionally, it can be used to print out the previous error messages without having to rerun a command that you know is going to fail.
Since 'pkg history' can be verbose, it's best to first identify when the error occurred and drill down on that specific invocation.
For example, if you think the failure occurred within the last 5 invocations of the 'pkg' command then run:
# pkg history -n 5
START OPERATION CLIENT OUTCOME
2013-01-18T10:28:04 update pkg Succeeded
2013-01-18T10:28:07 refresh-publishers pkg Succeeded
2013-01-18T10:28:24 rebuild-image-catalogs pkg Succeeded
2013-01-22T14:39:55 install pkg Failed
2013-01-22T14:40:51 install pkg Succeeded
Now look at the command that failed using the -l and -t options:
# pkg history -l -t 2013-01-22T14:39:55
User: root (0)
Boot Env.: s11.1sru341-reprise
Boot Env. UUID: 6c841d3c-7d0a-c42f-b480-b53bfb0c265e
New Boot Env.: None
New Boot Env. UUID: (None)
Start Time: 2013-01-22T14:39:55
End Time: 2013-01-22T14:40:06
Total Time: 0:00:11
Command: /usr/bin/pkg install
Release Notes: No
Start State: None
End State: None
Traceback (most recent call last):
  File "/usr/lib/python2.6/vendor-packages/pkg/client/api.py", line 1079, in __plan_op
    self._img.make_install_plan(**kwargs)
  File "/usr/lib/python2.6/vendor-packages/pkg/client/image.py", line 4288, in make_install_plan
    reject_list=reject_list)
  File "/usr/lib/python2.6/vendor-packages/pkg/client/image.py", line 4249, in __make_plan_common
    ip.plan_install(**kwargs)
  File "/usr/lib/python2.6/vendor-packages/pkg/client/imageplan.py", line 419, in plan_install
    reject_list=reject_list)
  File "/usr/lib/python2.6/vendor-packages/pkg/client/imageplan.py", line 395, in __plan_install
    reject_list=reject_list)
  File "/usr/lib/python2.6/vendor-packages/pkg/client/imageplan.py", line 370, in __plan_install_solver
    ignore_inst_parent_deps=ignore_inst_parent_deps)
  File "/usr/lib/python2.6/vendor-packages/pkg/client/pkg_solver.py", line 442, in solve_install
    no_version=ret, solver_errors=solver_errors)
PlanCreationException: No matching version of system/kernel can be installed:
Reason: Newer versionis already installed
This version is excluded by installed incorporation
All errors are indented with, in most cases, the significant error being indented furthest to the right hand side.
In the above example, the requested Kernel can't be installed because a later revision is already installed and it's constrained by the 'osnet-incorporation' so it can't be "down-rev'd" to an earlier version.
Therefore, when investigating an issue, use the 'pkg history' command to print out the previous errors and look at the errors that are indented to the right.
Note that the errors themselves may be repeated for each and every package that the update has failed on, so the output may still be verbose.
But the errors are typically caused by just one or two issues, such as having the incorrect Publishers specified (too few, too many, or not accessible) or insufficient content in the Repositories.
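The "find the failure, then drill down" routine described above can be scripted. A minimal sketch, assuming the four-column 'pkg history' layout shown earlier (START, OPERATION, CLIENT, OUTCOME) and a POSIX awk; 'failed_ops' is a made-up name:

```shell
# Hypothetical helper: read 'pkg history' output on stdin and print the
# start time of every invocation whose OUTCOME column begins with "Failed".
failed_ops() {
    awk '$4 == "Failed" { print $1 }'
}

# Usage: show the long-form record for each recent failure:
#   pkg history -n 5 | failed_ops | while read -r ts; do
#       pkg history -l -t "$ts"
#   done
```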
The Release Repository contains just the Solaris Releases such as the original Solaris 11 11/11 release and the Solaris 11.1 update release.
The Support Repository is only available to customers with a valid support contract. It contains all releases, including all monthly Support Repository Updates (SRUs) providing bug fix updates to support contract customers.
As discussed in my previous blog posting, we've implemented a process improvement in Solaris 11 to remove any 'blackout' period on the release of bug fixes by tweaking the relationship between Update releases and bug fix releases, compared to Solaris 10 and older.
We still produce periodic Update releases such as Solaris 11.1, containing support for new hardware and enhanced software features (e.g. VM2.0).
Update releases also contain a significant number of bug fixes for issues found internally during Solaris testing and more complex customer reported issues which required more test "soak" time than is possible in an SRU.
Update releases are intensely tested and hence provide high quality Solaris Baselines.
The Solaris Binary Compatibility Guarantee applies, so users should not experience any compatibility issues crossing a Solaris Update boundary.
The Release Notes for the Update will give at least 12 months notice of any interfaces which will be deprecated.
Support Repository Updates (SRUs) primarily deliver bug fixes, although they may include some feature enhancements.
They too, are intensely tested prior to release.
SRUs go through several internal builds prior to release.
Once released, additional critical bug fixes can be "back-published" to SRUs.
The build number of the SRU is now included in its name to uniquely identify it, e.g. Solaris 11 11/11 SRU13.4 is Build 4 of SRU13 on top of Solaris 11 11/11.
Earlier SRUs were documented with a letter suffix to denote "back-published" additional content, e.g. Solaris 11 11/11 SRU2a.
We've improved the process in Solaris 11 so that we can continue to deliver bug fixes for critical issues in SRUs while the content for an Update release is being finalized.
This implies that an Update release may not be a superset of the SRU(s) immediately preceding it.
Rather, it is the SRU after the Update release which is effectively the superset of both the Update and the SRUs preceding the Update release.
The relationship between Update releases and SRUs can be drawn as follows:
Solaris 11 11/11                  Solaris 11.1                      Solaris 11.2...
  \                                        \                                 \
    SRU1, SRU1a, SRU2, ... SRU12.4, SRU13.4, SRU1.4, SRU2.5, SRU3.4, SRU3.4.1, ...
Installed systems with a valid support contract should always be updated using SRUs from the Support Repository.
The SRUs are contiguous, just as Kernel patches were contiguous in Solaris 10 and earlier. That is, the next SRU after Solaris 11 11/11 SRU13.4 is Solaris 11.1 SRU1.4.
It is important to understand that this is no different to the Kernel PatchID progression in Solaris 10 and earlier releases whereby Kernel patches released after an Update release depended upon the Kernel patch from the Update, which contained feature code from that Update.
The only difference is that that lineage is a little more transparent in Solaris 11 due to the naming of the SRUs.
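One practical consequence of a contiguous lineage is that SRUs can be put in order by comparing their branch versions field by field, numerically. A sketch, assuming the 0.175.<update>.<sru>.<reserved>.<build>.<trailing field> layout inferred from the examples in these posts (the sample strings in the usage note are constructed from that assumption rather than copied from a repository):

```shell
# Hypothetical helper: sort branch versions, one per line on stdin, into
# release order. Each dot-separated field from the third onwards is
# compared numerically, so SRU 13 sorts after SRU 2.
sort_branches() {
    sort -t . -k 3,3n -k 4,4n -k 5,5n -k 6,6n -k 7,7n
}

# Usage:
#   printf '%s\n' 0.175.1.1.0.4.0 0.175.0.13.0.4.0 | sort_branches
```

With that ordering, the Solaris 11 11/11 SRU13.4 branch sorts immediately before the Solaris 11.1 SRU1.4 branch, matching the progression described above.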
As developers, reviewers, and release engineers have become used to Image Packaging System and the Solaris 11 eco-system, the number of bugs, "features", and caveats caused by inexperience continues to diminish.
Nevertheless, users may be impacted by the residual effects of some of these items.
Here's a non-exhaustive list of "features", potential issues, and their workarounds:
Oracle's Legal department insist that users explicitly accept the revised license terms in Java 7.
This means that users must add "--accept" to 'pkg install' or 'pkg update' commands when moving to versions with revised license terms. For example:
$ pkg install --accept entire
An IPS 'pkg' bug in early Solaris 11 11/11 versions can result in some x86 packages being installed on SPARC systems and vice versa due to the incorrect resolution of indirect dependencies. This is now fixed.
There are several methods to remove the residual effects of the issue on early Solaris 11 11/11 installations.
Until the erroneous architecture packages are removed, errors similar to the following may be displayed when updating:
Plan Creation: Package solver is unable to compute solution.
Dependency analysis is unable to determine exact cause.
Try specifying expected results to obtain more detailed error messages.
Include specific version of packages you wish installed.
Note, the above is a rather generic error indicating the package solver couldn't compute a solution, so not all instances of the above error message may be due to this particular issue. But for those which are, here are the options to resolve it:
The 'pkg' version delivered in Solaris 11 11/11 SRU 10.5, SRU 11.4, SRU 12.4, and SRU 13.4 contains functionality to remove the residual effects of the issue - i.e. remove the incorrect architecture packages.
Users can perform a "bunny hop" update to SRU 10.5, SRU 11.4, SRU 12.4, or SRU 13.4 prior to updating to a Solaris 11.1 SRU.
Indeed, simply updating the 'pkg' package itself is sufficient:
# pkg update
WARNING: pkg(5) appears to be out of date, and should be updated before
running update. Please update pkg(5) by executing 'pkg install
pkg:/package/pkg' as a privileged user and then retry the update.
# pkg update package/pkg
Packages to remove: 1
Create boot environment: No
Create backup boot environment: No
Removal Phase 13/13
Package State Update Phase 1/1
Package Cache Update Phase 1/1
Image State Update Phase 2/2
The first command shows that the 'pkg' client has detected an error and outputs a message to fix it by running 'pkg update package/pkg'.
Running this command removes the incorrectly installed packages - in the above example, it was the 'ldoms-incorporation' on an x86 system.
On SPARC systems, the 'xsvc' and 'nvidia-incorporation' x86 packages may be installed. Since they were introduced via indirect requirements from other packages, such as the optional 'hmp-tools' package on SPARC or 'ldoms-incorporation' on x86, an alternative resolution is to remove the package with the dependency which will also remove the erroneous packages if nothing else depends upon them. For example:
root@foobar:~# pkg -R /a/test2 uninstall -v
Packages to remove: 3
Estimated space available: 103.67 GB
Estimated space to be consumed: 19.20 MB
Create boot environment: No
Create backup boot environment: No
Rebuild boot archive: Yes
126.96.36.19913,5.11-1:20120314T235822Z -> None
0.5.11,5.11-0.175.0.0.0.0.0:20110927T192422Z -> None
0.5.11,5.11-0.175.0.0.0.2.1:20111019T060937Z -> None
Removal Phase 89/89
Package State Update Phase 3/3
Package Cache Update Phase 3/3
Image State Update Phase 2/2
Once the erroneous architecture packages have been removed, you can update the system as normal.
Oracle Solaris delivers the 'cacao' package. Its version is constrained by the 'cacao-incorporation'.
Oracle Enterprise Manager Ops Center also delivers the 'cacao' package. Rather than working with Solaris to update its 'cacao' version, or delivering its own version to a private location, early Ops Center versions on Solaris 11 updated the Solaris 'cacao' package to a level later than that contained in any Solaris release.
This was sub-optimal as it had the unintended consequence of effectively breaking Solaris updates as IPS found a version of 'cacao' installed on the target system which was later than any version available from the Solaris publisher.
The Solaris 'entire' Incorporation includes the Solaris 'cacao-incorporation' and that constrained 'cacao' to an earlier version than that installed by Ops Center, meaning IPS could not resolve the constraints, and hence could not update Solaris without user intervention.
The workaround for this, and other such issues, is to "unlock" the offending package(s) from their incorporation, allowing them to float free. This is done by toggling the IPS 'facet.version-lock' facility to 'false':
gerryh@dublin:~$ pkg contents -m cacao-incorporation
set name=pkg.fmri value=pkg://firstname.lastname@example.org,5.11-0.175.1.2.0.1.0:20121116T222617Z
set name=pkg.summary value="cacao consolidation incorporation"
set name=pkg.description value="This incorporation constrains packages from the cacao consolidation."
set name=pkg.depend.install-hold value=core-os.cacao
set name=info.classification value="org.opensolaris.category.2008:Meta Packages/Incorporations"
set name=org.opensolaris.consolidation value=cacao
set name=variant.arch value=sparc value=i386
depend fmri=SUNWcacaort@0.5.11-0.133 type=incorporate
depend fmri=SUNWcacaodtrace@0.5.11-0.133 type=incorporate
depend facet.version-lock.library/cacao=true email@example.com,5.11-0.175.1.2.0.1.0 type=incorporate
depend facet.version-lock.library/cacao/cacao-dtrace=true firstname.lastname@example.org,5.11-0.175.1.0.0.11.0 type=incorporate
depend fmri=SUNWcacaowsvr@0.5.11,5.11-0.166 type=incorporate
depend email@example.com,5.11-0.166 type=incorporate
signature 235c7674d821032ae3eeda280c7837d1f1f4fdb5 algorithm=rsa-sha256 chain="8e422c1bb80b05f08f7a849f3d7ae90a976e048e 754665e03bd28ef63b05a416073eb6d649624781" chain.chashes="083e40bb50e6964834ebfd3c66b8720b46028068 f85dabbb0d56b37de3c3de98663dd8f27a12ff8e" chain.csizes="1273 1326" chain.sizes="1773 2061" chash=05654e46fc5cac3b9b9bd11c39512bc92bc85089 pkg.csize=1281 pkg.size=1753 value=3d4ef4de2458c2f6f34e6e8b6fac08312e7b556e70b2af881e30e70912c335e5a39e526c9c662b71e88c8e293cd42d3e688d515fa961ceb876fbc9af5339123c923dc0fd81c8252c9d6098b5bb8bb886733e2b827445266d67ef888bc4f9b814d1b22eaea8758e767f21f2828bd7239797b503333d931f3a2fa2d8cf63ea560fb634228611ad61fc8a5401e0b49f3a54c6198cd712f3d6a97efdc5b11d8a94a160b47d30fd24da303a43bf500111b0bc13bc28d5d66440d50ce434e9002404b103c6e8c23f7a29bbe650002ccb9edffeced85b97656c1e06beb28f937ce054aaf3bb753ed84632a9956c30cbcb67fca1fef52d3c8b0af8a97c60aa85e871d725 version=0
Setting 'facet.version-lock' to 'false' tells IPS that the constraint on that package version can be ignored. This enables the rest of the packages to be updated.
Once the rest of Solaris has been updated, the package should typically be re-locked - i.e. set 'facet.version-lock' back to 'true', so that it will be updated along with the rest of the packages when future updates are performed (assuming the issue is transitory). Failure to re-lock the package will leave it floating independently.
Only use the 'facet.version-lock' feature when you have good cause to do so and you are confident you understand what you are doing.
Some core packages do not have a 'facet.version-lock' and cannot be unlocked from their Incorporation as their version is considered integral to the correct operation of Solaris.
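For reference, facets are changed with 'pkg change-facet'. The sketch below only builds the facet argument, following the 'facet.version-lock.<package>' naming visible in the manifest above. 'version_lock_facet' is a hypothetical helper of my own, and you should check the exact invocation against 'man pkg' on your release before relying on it:

```shell
# Hypothetical helper: build the facet setting which locks or unlocks a package.
version_lock_facet() {
    case "$2" in
        true|false)
            echo "facet.version-lock.$1=$2"
            ;;
        *)
            echo "usage: version_lock_facet <package> true|false" >&2
            return 2
            ;;
    esac
}

# Usage: unlock, update the rest of Solaris, then re-lock:
#   pkg change-facet "$(version_lock_facet library/cacao false)"
#   ...perform the update...
#   pkg change-facet "$(version_lock_facet library/cacao true)"
```

Keeping the unlock and re-lock steps side by side like this makes it harder to forget the second one.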
Due to an unfortunate sequence of missteps in development, and in a rare set of circumstances only, users may experience issues updating pre-Solaris 11 11/11 SRU10.4 versions due to 'boot-management' package issues.
The 'boot-management' package was originally part of the 'install-incorporation' and needed to be moved to the 'osnet-incorporation' as part of the GRUB2 boot project.
In preparation for that work the 'boot-management' package was unincorporated from the 'install-incorporation'.
Unfortunately, it didn't get incorporated into the 'osnet-incorporation' until Solaris 11 11/11 SRU 10.4, which effectively allowed it to float free unconstrained in the interim.
Thus, if you have a system installed with an early Solaris 11 version, such as Solaris 11 11/11 SRU 2a, and want to update it to another pre-SRU 10.4 version such as SRU 5.5, but a later SRU version, e.g. SRU 13.4, is also in your Repository, then the latest GRUB2 boot-management package version available in your Repository will be installed, as there's no incorporation in SRU 5.5 constraining it to an earlier version:
/usr/bin/pkg update --accept --be-name=Solaris11-sru5.5
If you subsequently try to update to an SRU from SRU 10.4 onwards which contains an earlier version of the 'boot-management' package than the one installed, say SRU 11.4, the update will fail, because from SRU 10.4 onwards the version of the 'boot-management' package is constrained. In SRU 11.4, the 'osnet-incorporation' constrains the 'boot-management' package to an earlier version than the SRU 13.4 version installed. The error message will be similar to the following:
Reason: Excluded by proposed incorporation 'consolidation/osnet/osnet-incorporation'
Newer version <package FMRI> is already installed
Reason: Newer version <package FMRI> is already installed
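The failure comes down to a version comparison: the installed 'boot-management' package is newer than the version the target SRU's 'osnet-incorporation' allows. Here is a minimal sketch of that comparison, assuming plain dotted numeric version strings. The function name and the two version strings are illustrative placeholders, not the actual package versions, and this is not how IPS itself is implemented.

```shell
# Compare two dotted numeric version strings component by component.
# Prints "newer", "older" or "same" for the first version relative to the second.
compare_versions() {
    left=$1
    right=$2
    while [ -n "$left" ] || [ -n "$right" ]; do
        l=${left%%.*}      # first remaining component of each version
        r=${right%%.*}
        l=${l:-0}          # treat a missing component as 0
        r=${r:-0}
        if [ "$l" -gt "$r" ]; then echo newer; return 0; fi
        if [ "$l" -lt "$r" ]; then echo older; return 0; fi
        # Drop the component just compared.
        case $left in *.*) left=${left#*.} ;; *) left= ;; esac
        case $right in *.*) right=${right#*.} ;; *) right= ;; esac
    done
    echo same
}

# Illustrative values only: an "installed" version against a "constrained" one.
compare_versions 0.175.0.13.0.4 0.175.0.11.0.4   # prints: newer
```

A result of "newer" corresponds to the situation above, where the update is rejected until the package is down-revved.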
The solution is actually quite simple.
Since the 'boot-management' package on the target system installed with SRU 5.5 is not constrained by any incorporation in that SRU, simply "down-rev" the 'boot-management' package to the version in the SRU you wish to update to, e.g. SRU 11.4:
Now perform the update to the desired SRU, e.g. SRU 11.4, again:
pkg update entire@0.5.11,5.11-0.175.0.11.0.4
It's not surprising that Engineered Systems accelerate the debugging and resolution of customer issues.
But what has surprised me is just how much faster issue resolution is with Engineered Systems such as SPARC SuperCluster.
These are powerful, complex systems used by customers wanting extreme database performance, app performance, and cost-saving server consolidation.
A SPARC SuperCluster consists of 2 or 4 powerful T4-4 compute nodes, 3 or 6 extreme performance Exadata Storage Cells, a ZFS Storage Appliance 7320 for general purpose storage, and ultra fast Infiniband switches, each with its own firmware.
It runs Solaris 11, Solaris 10, 11gR2, LDoms virtualization, and Zones virtualization on the T4-4 compute nodes, a modified version of Solaris 11 in the ZFS Storage Appliance, a modified and highly tuned version of Oracle Linux running Exadata software on the Storage Cells, another Linux derivative in the Infiniband switches, etc.
It has an Infiniband data network between the components, a 10Gb data network to the outside world, and a 1Gb management network.
And customers can run whatever middleware and apps they want on it, clustered in whatever way they want.
In one word, powerful. In another, complex.
The system is highly Engineered. But it's designed to run general purpose applications.
That is, the physical components, configuration, cabling, virtualization technologies, switches, firmware, Operating System versions, network protocols, tunables, etc. are all preset for optimum performance and robustness.
That improves the customer experience as what the customer runs leverages our technical know-how and best practices and is what we've tested intensely within Oracle.
It should also make debugging easier by fixing a large number of variables which would otherwise be in play if a customer or Systems Integrator had assembled such a complex system themselves from the constituent components. For example, there are myriad network protocols which could be used with Infiniband, myriad ways the components could be interconnected, myriad tunable settings, etc.
But what has really surprised me - and I've been working in this area for 15 years now - is just how much easier and faster Engineered Systems have made debugging and issue resolution.
All those error opportunities for sub-optimal cabling, unusual network protocols, sub-optimal deployment of virtualization technologies, issues with 3rd party storage, issues with 3rd party multi-pathing products, etc., are simply taken out of the equation.
All those opportunities for an issue to be unique to a particular set-up, the "why aren't we seeing this on any other system?" type questions, the doubts, just go away when we or a customer discover an issue on an Engineered System.
It enables a really honed response, getting to the root cause much, much faster than would otherwise be the case.
Here are a couple of examples from the last month, one found in-house by my team, one found by a customer:
Example 1: We found a node eviction issue running 11gR2 with Solaris 11 SRU 12 under extreme load on what we call our ExaLego test system (mimics an Exadata / SuperCluster 11gR2 Exadata Storage Cell set-up). We quickly established that an enhancement in SRU12 enabled an 11gR2 process to query Infiniband's Subnet Manager, replacing a fallback mechanism it had used previously. Under abnormally heavy load, the query could return results which were misinterpreted resulting in node eviction. In several daily joint debugging sessions between the Solaris, Infiniband, and 11gR2 teams, the issue was fully root caused, evaluated, and a fix agreed upon. That fix went back into all Solaris releases the following Monday. From initial issue discovery to the fix being put back into all Solaris releases was just 10 days.
Example 2: A customer reported sporadic performance degradation. The reasons were unclear and the information sparse. The SPARC SuperCluster Engineered Systems support team, which comprises both SPARC/Solaris and Database/Exadata experts, worked to root cause the issue. A number of contributing factors were discovered, including tunable parameters. An intense collaborative investigation between the engineering teams traced the root cause to a CPU-bound networking thread which was being starved of CPU cycles under extreme load. Workarounds were identified. Modifications have been put back into 11gR2 to alleviate the issue and a development project already underway within Solaris has been sped up to provide the final resolution on the Solaris side. The fixed nature of the SPARC SuperCluster configuration greatly aided issue reproduction and dramatically sped up root cause analysis, allowing the correct workarounds and fixes to be identified, prioritized, and implemented. The customer is now extremely happy with performance and robustness.
Since the Engineered System configuration is common to other SPARC SuperCluster customers, the lessons learned are being proactively rolled out to other customers and incorporated into the installation procedures for future customers.
This effectively acts as a turbo-boost to performance and reliability for all SPARC SuperCluster customers.
If this had occurred in a "home grown" system of this complexity, I expect it would have taken at least 6 months to get to the bottom of the issue.
But because it was an Engineered System, known, understood, and qualified by both the Solaris and Database teams, we were able to collaborate closely to identify cause and effect and expedite a solution for the customer.
That is a key advantage of Engineered Systems which should not be underestimated.
Indeed, the initial issue mitigation on the Database side, followed by the final fix on the Solaris side, highlights the high degree of collaboration and excellent teamwork between the Oracle engineering teams.
It's a compelling advantage of the integrated Oracle Red Stack in general and Engineered Systems in particular.
As you may know, Support Repository Updates (SRUs) for Oracle Solaris 11 are released monthly and are available to customers with an appropriate support contract. SRUs primarily deliver bug fixes. They may also deliver low risk feature enhancements.
Solaris Updates are typically released once or twice a year, containing support for new hardware, new software feature enhancements, and all bug fixes available at the time the Update content was finalized. They also contain a significant number of new bug fixes, for issues found internally in Oracle and complex customer bug fixes which require significant "soak" time to ensure their efficacy prior to release.
We're changing the naming convention of Update releases from a date-based format such as Oracle Solaris 10 8/11 to a simpler "dot" version numbering, e.g. Oracle Solaris 11.1. Oracle Solaris 11 11/11 (i.e. the initial Oracle Solaris 11 release) may be referred to as 11.0.
SRUs will simply be named as "dot.dot" releases, e.g. Oracle Solaris 11.1.1, for SRU1 after Oracle Solaris 11.1.
Many Oracle products and infrastructure tools such as BugDB and MOS are tailored towards this "dot.dot" style of release naming, so these name changes align Oracle Solaris with these conventions.
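As a small illustration of the "dot.dot" convention, the SRU name can be derived from the Update it follows plus the SRU number. This is only a sketch of the naming rule described above; the function name is made up for this example.

```shell
# Build the "dot.dot" name for an SRU from the Update it follows
# and the SRU number.
sru_release_name() {
    update=$1   # e.g. 11.1
    sru=$2      # e.g. 1 for SRU1
    printf 'Oracle Solaris %s.%s\n' "$update" "$sru"
}

sru_release_name 11.1 1    # prints: Oracle Solaris 11.1.1
```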
The Oracle Solaris 11 release process has been enhanced to eliminate blackout periods on the delivery of new bug fixes to customers.
Previously, Oracle Solaris Updates were a superset of all preceding bug fix deliveries. This made for a very simple update message - that which releases later is always a superset of that which was delivered previously.
However, it had a downside. Once the contents of an Update release were frozen prior to release, the release of new bug fixes for customer issues was also frozen to maintain the Update's superset relationship.
Since the amount of change allowed into the final internal builds of an Update release is reduced to mitigate risk, this throttling back also impacted the release of new bug fixes to customers.
This meant that there was effectively a 6 to 9 week hiatus on the release of new bug fixes prior to the release of each Update. That wasn't good for customers awaiting critical bug fixes.
We've eliminated this hiatus on the delivery of new bug fixes in Oracle Solaris 11 by allowing new bug fixes to continue to be released in SRUs even after the contents of the next Update release have been frozen.
The release of SRUs will remain contiguous, with the first SRU released after the Update release effectively being a superset of both the Update release and all preceding SRUs*.
That is, later SRUs are supersets of the content of previous SRUs.
Therefore, the progression path from the final SRUs prior to the Update release is to the first SRU after the Update release, rather than to the Update release itself.
The timeline / logical sequence of releases can be shown as follows:
Updates: 11.0                                     11.1                           11.2 etc.
             \                                        \                              \
SRUs:         11.0.1, 11.0.2,...,11.0.12, 11.0.13,     11.1.1, 11.1.2,...,11.1.x,     11.2.1, etc.
For example, for systems with Oracle Solaris 11 11/11 SRU12.4 or later installed, the recommended update path is to Oracle Solaris 11.1.1 (i.e. SRU1 after Solaris 11.1) or later rather than to the Solaris 11.1 release itself. This will ensure no bug fixes are "lost" during the update*.
If for any reason you do wish to update from SRU12.4 or later to the 11.1 release itself - for example to update a test system - the instructions to do so are in the SRU12.4 README, https://updates.oracle.com/Orion/Services/download?type=readme&aru=15607102
For systems with Oracle Solaris 11 11/11 SRU11.4 or earlier installed, customers can update to either the 11.1 release or any 11.1 SRU as both will be supersets of their current version. My colleague, Pete Dennis, explains the step-by-step process here.
Please do read the README of the SRU you are updating to, as it will contain important installation instructions which will save you time and effort.
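The update-path rule described above can be summarized as a small decision function. This is a sketch only: the function name is invented here, it expects a numeric SRU level such as "12.4" (not a suffixed one such as "2a"), and it assumes no SRU levels exist between SRU11.4 and SRU12.4.

```shell
# Recommend an update target for a system running Oracle Solaris 11 11/11
# at the given SRU level (e.g. "12.4" or "5.5").
recommend_update_target() {
    sru_major=${1%%.*}   # "12.4" -> "12"
    if [ "$sru_major" -ge 12 ]; then
        # SRU12.4 or later: go to the first SRU after the Update, or later.
        echo "Oracle Solaris 11.1.1 or later"
    else
        # SRU11.4 or earlier: both the Update and its SRUs are supersets.
        echo "Oracle Solaris 11.1 or any 11.1 SRU"
    fi
}

recommend_update_target 12.4   # prints: Oracle Solaris 11.1.1 or later
recommend_update_target 11.4   # prints: Oracle Solaris 11.1 or any 11.1 SRU
```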
7166132 vim should be able to run its test suite
7190213 libibmad and associated files need to be delivered in an NGZ
7191495 mkisofs install is incomplete
7195687 Update fetchmail to version 6.3.22
7195704 Problem with utility/fetchmail
7196234 Problem with network/dns
7197223 vim shows high CPU usage when editing dtrace script with syntax
7071362 tcp_icmp_source_quench and other tunables may no longer be field
7181137 sol_umad should allow userland MAD operations in NGZs
7196540 After 7174929 integration 0.9.0 is shown for first disk in
The October 2012 security "Critical Patch Update" information and downloads are now available from My Oracle Support (MOS).
See http://www.oracle.com/technetwork/topics/security/alerts-086861.html and in particular Document 1475188.1 on My Oracle Support (MOS), http://support.oracle.com, which includes security CVE mappings for Oracle Sun products.
For Solaris 11, Doc 1475188.1 points to the relevant SRUs containing the fixes for each issue. SRU12.4 was released on the CPU date and contains the current cumulative security fixes for the Solaris 11 OS.
For Solaris 10, we take a copy of the Recommended Solaris OS patchset containing the relevant security fixes and rename it as the October CPU patchset on MOS. See the link provided in Doc 1475188.1.
Doc 1475188.1 also contains references for Firmware, etc., and links to other useful security documentation, including information on Userland/FOSS vulnerabilities and fixes in https://blogs.oracle.com/sunsecurity/
My colleague, Albert White, has published a useful article detailing how to set up local IPS repositories for use within an enterprise: How to Create Multiple Internal Repositories for Oracle Solaris 11
This is useful as most servers will not be directly connected to the Internet and most customers will want to control which Oracle Solaris SRUs (Support Repository Updates) are "qualified" for deployment within their organization. Setting up and managing Internal IPS (Image Packaging System) Repositories is the way to do this.
The concept can naturally be extended and adapted. For example, Albert talks about a "Development" Repo containing the latest Oracle Solaris 11 deliverables. When qualifying a software level for deployment across the enterprise, a copy of a specific level could be taken, e.g. "GoldenImage2012Q3" or "SRU8.5", and once it passes testing, be used to deploy across the enterprise.
You may have already heard that Solaris 11 and SPARC T4 processors deliver stunning performance.
Here's a link from my colleagues in the Strategic Applications Engineering team which you may find useful:
The performance blog: http://blogs.oracle.com/BestPerf/
Welcome to my new blog http://blogs.oracle.com/Solaris11Life which is all about the Customer Maintenance Lifecycle for Image Packaging System (IPS) based Solaris releases, such as Solaris 11.
It'll include policies, best practices, clarifications, and lots of other stuff which I hope you'll find useful as you get up to speed with Solaris 11 and IPS.
Let's start with an updated version of my Solaris 11 Customer Maintenance Lifecycle presentation which I originally gave at Oracle Open World 2011 and at the 2011 Deutsche Oracle Anwendergruppe (DOAG - German Oracle Users Group) conference in Nürnberg.
Some of you may be familiar with my Patch Corner blog, http://blogs.oracle.com/patch , which fulfilled a similar purpose for System V [five] Release 4 (SVR4) based Solaris releases, such as Solaris 10 and below.
Since maintaining a Solaris 11 system is quite different to maintaining a Solaris 10 system, I thought it prudent to start this 2nd parallel blog for Solaris 11.
Actually, I have an ulterior motive for starting this separate blog.
Since IPS is a single tier packaging architecture, it doesn't have any patches, only package updates.
I've therefore banned the word "patch" in Solaris 11 and introduced a swear box to which my colleagues must contribute a quarter [$0.25] every time they use the word "patch" in a public forum. From their Oracle Open World presentations, John Fowler owes 50 cents, Liane Preza owes $1.25, and Bart Smaalders owes 75 cents.
Since I'm stinging my colleagues in what could be a lucrative enterprise, I couldn't very well discuss IPS best practices on a blog called "Patch Corner" with a URI of http://blogs.oracle.com/patch. I simply couldn't afford all those contributions to the "patch" swear box. :)
Feel free to let me know what topics you'd like covered - just post a comment in the comment box on the blog.
This blog is to inform customers about Solaris 11 maintenance best practice, feature enhancements, and key issues. The views expressed on this blog are my own and do not necessarily reflect the views of Oracle. The Documents contained within this site may include statements about Oracle's product development plans. Many factors can materially affect these plans and the nature and timing of future product releases. Accordingly, this Information is provided to you solely for information only, is not a commitment to deliver any material code, or functionality, and SHOULD NOT BE RELIED UPON IN MAKING PURCHASING DECISIONS. The development, release, and timing of any features or functionality described remains at the sole discretion of Oracle. THIS INFORMATION MAY NOT BE INCORPORATED INTO ANY CONTRACTUAL AGREEMENT WITH ORACLE OR ITS SUBSIDIARIES OR AFFILIATES. ORACLE SPECIFICALLY DISCLAIMS ANY LIABILITY WITH RESPECT TO THIS INFORMATION. Gerry Haskins, Director, Software Lifecycle Engineering