Sunday Jul 20, 2014

Pre-Upgrade Checks for Enterprise Manager Ops Center

With the release of Enterprise Manager Ops Center 12.2.1, it is time to go through the upgrade cycle. I thought I would share the pre-upgrade checks I go through when I upgrade to a new Ops Center build. As part of the development team, I get involved in pre-release Quality Assurance testing, which means I end up doing hundreds of upgrades as part of the testing process.

Update releases come out regularly and contain enhancements and bug fixes. As with any other application in your environment, you should upgrade Ops Center to the current release/update in a timely and controlled manner. For those of you who are long-time sys-admins, there is no rocket science here. It is the same sort of planning you would do for any other enterprise-level application.

In my test environments, I have my Enterprise Controller (EC) and Proxy Controllers (PC) inside Solaris Zones (Solaris 11), so I do a couple of extra checks, but the process as a whole is still valid if your EC/PC are on their own separate hardware.

1) Read the Release Notes

Yes, those release notes/README files are important and you should spend the time reading them. They will contain the latest information about the update and any known issues and workarounds.

2) Check Free Disk Space

Confirm that there is enough disk space to unpack and install the upgrade. How much space is enough is the ultimate question. It varies with each upgrade and depends on how you have configured your underlying filesystems and on your actual environment. Here are some guidelines. Please note that the numbers I quote tend to be a little generous, as it is always better to have more free space than not enough.

  • There should always be a few GB of space free in the root partition (it is just good sys-admin practice; below 90% used would be ideal).
  • The filesystem that holds /var/tmp will need space for the DB backup that is run as part of the installer. The size of this will depend on the size of your DB, so check how much space an "ecadm backup" takes on your system.
  • The filesystem that holds /var/tmp is also the temporary location where we unpack the upgrade bundle.
  • The filesystem that holds /var/opt/sun/xvm will have the majority of the upgrade code installed into it as well as a copy of the installer under the update-saved-state directory.
So what does that mean for size requirements?
  • You need about 5 times the size of the upgrade bundle. The current upgrade bundle is 3.8GB unpacked, so that would be about 20GB (5 x 3.8GB = 19GB, rounded up).
  • The DB backup will take about 10% of the actual DB size.
root@ec:/ec_backup# du -hs *
 1.3G   sat-backup-pre-12.2.1-upgrade.20140702
root@ec:/ec_backup# du -hs /var/opt/sun/xvm/oracle/oradata/OCDB
14G   /var/opt/sun/xvm/oracle/oradata/OCDB

More space than that is used during the backup before it is packed up, so I would allow for about 4GB of space.

  • So for my environment, I would look for about 25GB (rounding up) free space (your number may vary). I am sure I could scrimp and save and get this number down, but the idea is to have plenty of free space to allow for the upgrade to go through without incident.
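
As a rough sketch of the sizing rule above, the following shell functions add five times the bundle size to a backup allowance and compare the result with the free space that "df" reports. The function names (required_gb, free_gb, check_space) are my own illustration and not part of Ops Center, and the sketch assumes a "df" that accepts the POSIX -P and -k options. Adjust the numbers for your own environment.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of Ops Center.

# required_gb BUNDLE_GB BACKUP_GB
# Prints 5 x bundle size plus the backup allowance, rounded up to a whole GB.
required_gb() {
    awk -v b="$1" -v d="$2" 'BEGIN {
        n = 5 * b + d
        r = int(n)
        if (r < n) r++
        print r
    }'
}

# free_gb PATH
# Prints the free space (whole GB, rounded down) of the filesystem holding PATH.
# Assumes POSIX "df -Pk" output, where column 4 is the available KB.
free_gb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# check_space PATH BUNDLE_GB BACKUP_GB
# Returns 0 if PATH has at least the required free space, 1 otherwise.
check_space() {
    need=$(required_gb "$2" "$3")
    have=$(free_gb "$1")
    if [ "$have" -ge "$need" ]; then
        echo "OK: ${have}GB free in $1 (need about ${need}GB)"
    else
        echo "WARNING: only ${have}GB free in $1 (need about ${need}GB)"
        return 1
    fi
}
```

With the numbers quoted above, "required_gb 3.8 4" prints 23, a little under the generously rounded 25GB, and "check_space /var/tmp 3.8 4" compares that figure with the free space in the filesystem that holds /var/tmp.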

3) Backups Backups Backups

Before commencing any upgrade, you should make sure you can roll back if something goes horribly wrong. Years of history in administration and support have made me a paranoid person. I believe you can never have too many backups, so I do the following:

  • Confirm you have a successful database backup using "ecadm backup". (You should already be doing this on a weekly basis.)
root@ec:/# /opt/SUNWxvmoc/bin/ecadm backup -d pre-12.2.1-upgrade -o

ecadm: using logFile =
ecadm: *** PreBackup Phase 
ecadm: *** Backup Phase
ecadm: *** PostBackup Phase 
ecadm: *** Backup complete
ecadm: *** Output in /ec_backup/sat-backup-pre-12.2.1-upgrade.20140702 
ecadm: *** Log in /var/opt/sun/xvm/logs/sat-backup-2014-07-02-11:52:16.log
Of course, copy the generated backup file to somewhere safe on another system.
  • Confirm you have a successful filesystem backup using your Enterprise backup software. (You should already be doing this on a weekly basis.) I would recommend full filesystem backups and having a separate backup of the /var/opt/sun/xvm directory and any of your Ops Center software libraries if you did not put them in the default location (/var/opt/sun/xvm/locallib/swlib[0-2]).
  • Take a recursive ZFS snapshot of the full zone (rpool and any other zpools that are part of the zone). This is normally your easiest and fastest rollback method should you need it. NOTE: Make sure you know how to recover/roll back a zone. "zfs snapshot -r rpool@snapname" recursively snapshots all underlying filesystems, but "zfs rollback -r rpool@snapname" only rolls back a single filesystem (there, -r just destroys any snapshots newer than the one you roll back to). You need to roll back each filesystem separately. If you are not sure, practice it on a test zone first.
### Take a zfs snapshot ###
    root@ec:/# zfs list
    NAME                              USED  AVAIL  REFER  MOUNTPOINT
    rpool                             156G  41.1G    31K  /rpool
    rpool/ROOT                        134G  41.1G    31K  legacy
    rpool/ROOT/solaris                134G  41.1G  24.6G  /
    rpool/ROOT/solaris-backup-1       174K  41.1G  1.37G  /
    rpool/ROOT/solaris-backup-1/var   110K  41.1G  27.9G  /var
    rpool/ROOT/solaris-backup-2       296K  41.1G  24.2G  /
    rpool/ROOT/solaris-backup-2/var   232K  41.1G  48.4G  /var
    rpool/ROOT/solaris/var            109G  41.1G  77.2G  /var
    rpool/VARSHARE                     88K  41.1G  66.5K  /var/share
    rpool/ec_backup                  1.29G  41.1G  1.29G  /ec_backup
    rpool/export                      161K  41.1G    32K  /export
    rpool/export/home                 111K  41.1G    32K  /export/home
    rpool/export/home/ocadmin          61K  41.1G  40.5K 
    rpool/oracle                     20.7G  41.1G  20.7G 
    root@ec:/# zfs snapshot -r rpool@pre-OC-12.2.1-install.20140702
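
Because each filesystem has to be rolled back separately, it can help to generate the rollback commands from the snapshot name rather than typing them by hand. The sketch below is only an illustration: rollback_cmds is a hypothetical helper that reads dataset names on standard input and prints the commands without running them, so you can review them first.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) one "zfs rollback" per dataset.
# Usage: rollback_cmds SNAPSHOT_NAME < list-of-datasets
rollback_cmds() {
    snap="$1"
    while IFS= read -r ds; do
        # Skip blank lines in the input.
        [ -n "$ds" ] || continue
        # Note: -r also destroys any snapshots newer than the target.
        echo "zfs rollback -r ${ds}@${snap}"
    done
}
```

On the EC zone above, the dataset list could come from "zfs list -H -r -o name rpool", for example "zfs list -H -r -o name rpool | rollback_cmds pre-OC-12.2.1-install.20140702". Datasets created after the snapshot was taken will not have that snapshot, so expect to prune those lines before running anything.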

4) Check for any failed services

It is good practice to clear/enable/disable any broken SMF services, but there are a few key ones to check.

  • Make sure all the Ops Center services that should be running are running and the ones that should not be running are not. A classic example here is when you have an EC running without a collocated PC. The PC shows as disabled, but still shows up in the "svcs -xv" output.

    root@ec:/var/tmp/downloads# svcs -xv
    (Cacao, a common Java container for JDMK/JMX based management
     State: disabled since June 12, 2014 08:07:08 AM EST 
    Reason: Disabled by an administrator.
       See: man -M /usr/share/man -s 1M cacaoadm 
       See: man -M /usr/share/man -s 5 cacao
       Impact: 1 dependent service is not running: 

In this case, our EC did not have a collocated PC, so we should ensure that these services are really disabled and don't try to start up during the upgrade process.

    root@ec:/var/tmp/downloads# svcadm disable
    root@ec:/var/tmp/downloads# svcadm disable
  • If you are using zones, either because the system where the EC is installed in the GZ also hosts zones or because your EC/PC runs in a NGZ, you also need to check that the IPS proxies are running so that the Solaris 11 packaging system works correctly.
    • In a Global Zone (GZ) check that zones-proxyd is online.
root@t4-1-syd04-b:~# svcs svc:/application/pkg/zones-proxyd:default
STATE          STIME    FMRI
online         Jul_02   svc:/application/pkg/zones-proxyd:default
    • In a Non-Global Zone (NGZ) check that the zones-proxy-client is online.
root@ec:~# svcs svc:/application/pkg/zones-proxy-client:default
STATE          STIME    FMRI
online          8:54:47 svc:/application/pkg/zones-proxy-client:default
  • What you are looking for is a clean bill of health from the "svcs -xv" command, which means it prints no output.
    root@ec:/var/tmp/downloads# svcs -xv
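
A quick way to script this check is to treat any output from "svcs -xv" as a failure. The sketch below assumes that a healthy system prints nothing for that command. smf_clean is a hypothetical helper name, and it reads the command output on standard input so that it can be tried out without touching SMF.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the piped-in text is empty
# (whitespace-only lines are ignored).
# Usage: svcs -xv 2>&1 | smf_clean
smf_clean() {
    # grep exits non-zero when nothing is left; "|| true" keeps that harmless.
    report=$(grep -v '^[[:space:]]*$' || true)
    if [ -z "$report" ]; then
        echo "OK: no problem services reported"
        return 0
    fi
    echo "WARNING: services need attention:"
    echo "$report"
    return 1
}
```

Run as "svcs -xv 2>&1 | smf_clean", the exit status can gate the rest of a pre-upgrade script.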

5) Check the pkg publishers

To be able to do a successful upgrade, you need the pkg publishers for a system to be working. In a zones environment, that means the publishers in the GZ and all the NGZs should be working. Publishers that don't resolve when a package links into a zone will cause the whole upgrade to stop.

So here are a couple of things to look for when you are using an EC in a zone.

  • If this is a test environment where you have multiple EC/PC in different zones, either those EC/PC should be running or the publishers that point to a non-running EC/PC should be cleared. This can be done by issuing:
# pkg unset-publisher Publisher-Name 

The aim here is to clear all the local publishers in the zone and just use the proxied publishers in the GZ.

  • If you have the GZ pointing to a PC that points to the EC that is being upgraded, where the EC is in a NGZ under the GZ (yes, this is the whole chicken-and-egg problem), you have a slightly different problem. During the upgrade, parts of the EC will be shut down, which will stop the remote PC from proxying access to the EC's IPS repository. So you need to set the publishers to point to an IPS repository that they can reach. Luckily, the actual IPS repository on the EC does still keep running on port 11000 throughout the upgrade.
root@t4-1-syd04-b:~# pkg publisher
solaris                     origin   online F https://oracle-oem-oc-mgmt-pc217:8002/IPS/
cacao                       origin   online F https://oracle-oem-oc-mgmt-pc217:8002/IPS/
mp-re          (non-sticky) origin   online F https://oracle-oem-oc-mgmt-pc217:8002/IPS/
opscenter                   origin   online F https://oracle-oem-oc-mgmt-pc217:8002/IPS/
root@t4-1-syd04-b:~# pkg unset-publisher opscenter
root@t4-1-syd04-b:~# pkg unset-publisher mp-re
root@t4-1-syd04-b:~# pkg unset-publisher cacao
root@t4-1-syd04-b:~# pkg set-publisher -G '*' -g http://ec:11000/ solaris
root@t4-1-syd04-b:~# pkg publisher
solaris                     origin   online F http://ec:11000/
  • You can reset all the publishers that were set by Ops Center to their original state by rebooting the system or by running the following script in each zone.
# /var/opt/sun/xvm/utils/ -P PC_IP_Address
Use as the IP address for the EC/PC when it is pointing to itself.
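
To see at a glance which publishers still point at an EC/PC that will be unavailable, the publisher list can be filtered by origin. The sketch below is illustrative only: stale_publishers is a hypothetical helper that reads "pkg publisher -H" style output on standard input and prints the names of publishers whose origin contains a given host name. It assumes the publisher name is the first field and the origin URI is the last field, as in the listing shown earlier.

```shell
#!/bin/sh
# Hypothetical helper: print publisher names whose origin URI mentions HOST.
# Assumes field 1 is the publisher name and the last field is the URI.
# Usage: pkg publisher -H | stale_publishers HOST
stale_publishers() {
    awk -v host="$1" 'NF >= 2 && index($NF, host) > 0 { print $1 }'
}
```

Each name printed is a candidate for "pkg unset-publisher", or for repointing with "pkg set-publisher", as described above.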

6) Run OCDoctor troubleshoot

Run the OCDoctor troubleshoot script over your EC and PCs before an upgrade. It is a good sanity check to look for and fix underlying problems before you start the upgrade process. If you are in connected mode, your EC should already have the latest version of OCDoctor downloaded. Otherwise, you can update it by running " --update" or downloading from

Note: The error "'root' should not be a role" can be safely ignored as it was only required for earlier versions of Ops Center.

    root@ec:/var/tmp/downloads# /var/opt/sun/xvm/OCDoctor/ -t
    Ops Center Doctor 4.34  [OC,SunOS11] [Read only]
    [02-Jul-2014 11:25AM EST]
    ======================== Checking Enterprise Controller...==============================
    OK: Total number of OSes: 12  Total LDOMs:7  Total Zones: 
    ERROR: User 'root' should not be a role. You should convert it to a
    normal user before the installation.
           This can be done by running:
    # rolemod -K type=normal root
    OK: Files in /var/opt/sun/xvm/images/agent/ have the right permissions
    OK: Files in /var/opt/sun/xvm/osp/web/pub/pkgs/ have the right  permissions
    OK: both pvalue and pdefault in systemproperty are equal to false (at id 114)
    OK: Found only 285 OCDB*.aud files in oracle/admin/OCDB/adump folder
    OK: Found no ocdb*.aud files in oracle/admin/OCDB/adump folder
    OK: No auth.cgi was found in cgi-bin
    OK: User 'oracleoc' home folder points to the right location
    OK: User 'allstart' home folder points to the right location
    OK: Apache logs are smaller than 2 GB
    OK: n1gc folder has the right permissions
    OK: All agent packages are installed properly
    OK: All Enterprise Controller packages are installed properly
    OK: Enterprise Controller status is online
    OK: the version is the latest one (
    OK: satadm timeouts were increased
    OK: tar command was properly adjusted in satadm
    OK: stclient command works properly
    OK: Colocated proxy status is 'disabled'
    OK: Local Database used space is 19%, 6G out of 32G (local DB, using 1 files)
    OK: Debug is disabled in .uce.rc
    OK: Debug is disabled for cacao instance oem-ec
    OK: no 'conn_properties_file_name' value in .uce.rc
    OK: 30G available in /
    OK: 30G available in /var
    OK: 30G available in /var/tmp
    OK: 30G available in /var/opt/sun/xvm
    OK: 30G available in /opt
    OK: DNS does not unexpectedly resolve hostname '_default_'
    OK: Found the server .uce.rc at /var/opt/sun/xvm/uce/opt/server/cgi-bin/.uce.rc
    OK: Server .uce.rc has the correct file permissions
    OK: Server .uce.rc has the correct ownership
    OK: Connectivity to the KB public servers works properly (using download_large.cgi)
    OK: Grouplock file doesn't exist
    OK: package hmp-tools@2.2.1 is not installed
    OK: package driver/x11/xsvc is not installed
    OK: Cacao facet is set to False
    OK: All Solaris 11 agent bundles in /var/opt/sun/xvm/images/agent are imported properly to the repository
    OK: Disconnected mode is not configured
    OK: Locales are OK ("en_US.UTF-8")
    OK: No need to check for Solaris 11 agent bundle issue as this EC is newer than Update 1
    OK: No partially installed packages
    OK: UCE 'private' folder exists
    OK: No http_proxy is set in the user profile files
    OK: 'public' folder has the right ownership
    OK: 'public' folder is writable for uce-sds
    OK: 'private' folder has the right ownership
    OK: 'private' folder is writable for uce-sds
    OK: '/var/tmp' folder is writable for uce-sds
    OK: No old jobs rerun (CR 6990675)
    OK: No need to adjust SEQ_COUNT (MAXID:2986 SEQCOUNT:2986)
    OK: no row with found in DB table
    NOTICE: Can't perform cryptoadm test inside a zone.
            Run --troubelshoot from the global zone as well to test the crypto services.
    OK: System time is not in the past
    OK: User uce-sds is part of all the proper groups
    OK: oracleoc user ulimit -Sn is 1024
    OK: oracleoc user ulimit -Hn is 65536
    OK: FC Libraries do not contain duplicate LUNs
    OK: 'update-saved-state' folder exists and has the right permissions
    OK: verify-db does not return 'Invalid pad value' message
    OK: No credential issues found 
    =========== Proxy controller is installed but not configured, skipping ==================
    =========== Agent controller is installed but not configured, skipping ==================

Now do the upgrade

Choose whichever upgrade method you like. Both the BUI and CLI methods will give you the same end result. The Ops Center upgrade is not a difficult upgrade and following some simple pre-work checks will maximize your chance of a straightforward and successful upgrade.


Rodney Lindner

