Pre-Upgrade Checks Enterprise Manager Ops Center
By Rodney Lindner-Oracle on Jul 20, 2014
With the release of Enterprise Manager Ops Center 12.2.1, it is time to go through the upgrade cycle. I thought I would share the pre-upgrade checks I go through when I upgrade to a new Ops Center build. As part of the development team, I get involved in pre-release Quality Assurance testing, which means I end up doing hundreds of upgrades as part of the testing process.
Update releases come out regularly and contain enhancements and bug fixes. As with any other application in your environment, you should upgrade Ops Center to the current release/update in a timely and controlled manner. For those of you who are long-time sys-admins, there is no rocket science here. It is the same sort of planning you would do for any other enterprise-level application.
In my test environments, I have my Enterprise Controller (EC) and Proxy Controllers (PC) inside Solaris Zones (Solaris 11), so I have a couple of extra checks I do, but the process as a whole is still valid if your EC/PC are on their own separate hardware.
1) Read the Release Notes
Yes, those release notes/README files are important and you should spend the time reading them. They will contain the latest information about the update and any known issues and workarounds.
2) Check Free Disk Space
Confirm that there is enough disk space to unpack and install the upgrade. How much is enough space is the ultimate question. It will vary with each different upgrade and will depend on how you have configured your underlying filesystems and your actual environment. Here are some guidelines. Please note that the numbers I quote tend to be a little generous as it is always better to have more free space than not enough.
- There should always be a few GB of free space in the root partition (this is just good sys-admin practice; keeping it below 90% full would be ideal).
- The filesystem that holds /var/tmp will need space for the DB backup that is run as part of the installer. The size of this will depend on the size of your DB, so check how much space an "ecadm backup" takes on your system.
- The filesystem that holds /var/tmp is also the temporary location where we unpack the upgrade bundle.
- The filesystem that holds /var/opt/sun/xvm will have the majority of the upgrade code installed into it as well as a copy of the installer under the update-saved-state directory.
- You need about 5 times the size of the upgrade bundle. The current upgrade bundle is 3.8GB unpacked, so that would be about 20GB (5 x 3.8GB = 19GB, rounded up).
- The DB backup will take about 10% of the actual DB size.
root@ec:/ec_backup# du -hs *
1.3G sat-backup-pre-12.2.1-upgrade.20140702
root@ec:/ec_backup# du -hs /var/opt/sun/xvm/oracle/oradata/OCDB
14G /var/opt/sun/xvm/oracle/oradata/OCDB
root@ec:/ec_backup#
More space than this is used while the backup is running, before it is packed up, so I would allow for about 4GB of space.
- So for my environment, I would look for about 25GB (rounding up) free space (your number may vary). I am sure I could scrimp and save and get this number down, but the idea is to have plenty of free space to allow for the upgrade to go through without incident.
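The arithmetic behind that estimate can be sketched as a small shell helper. This is only an illustration of the guideline above (5 times the unpacked bundle plus an allowance for the DB backup); the function names required_gb and free_gb are my own and are not part of Ops Center. On Solaris you may need to substitute nawk if the default awk rejects -v.

```shell
# required_gb BUNDLE_GB BACKUP_GB
# Prints 5 x the unpacked bundle size plus the DB backup allowance,
# rounded up to a whole number of GB.
required_gb() {
    awk -v b="$1" -v d="$2" 'BEGIN {
        need = 5 * b + d
        whole = int(need)
        if (need > whole) whole++
        print whole
    }'
}

# free_gb PATH
# Prints the whole GB available on the filesystem holding PATH.
# df -k reports available KB in the third-from-last column; this
# assumes the mount point contains no spaces.
free_gb() {
    df -k "$1" | awk 'END { print int($(NF-2) / 1024 / 1024) }'
}

# Example (uncomment to run): compare the estimate with /var/tmp
# need=$(required_gb 3.8 4)
# have=$(free_gb /var/tmp)
# [ "$have" -ge "$need" ] || echo "only ${have}GB free, want ${need}GB"
```

With the numbers from my environment (3.8GB bundle, 4GB backup allowance) this gives 23GB; I round more generously to 25GB, and you should pick whatever margin you are comfortable with.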
3) Backups Backups Backups
Before commencing any upgrade, you should make sure you can roll back if something goes horribly wrong. Years of history in administration and support have made me a paranoid person. I believe you can never have too many backups, so I do the following:
- Confirm you have a successful database backup using "ecadm backup". (You should already be doing this on a weekly basis)
Of course, copy the generated backup file to somewhere safe on another system.
root@ec:/# /opt/SUNWxvmoc/bin/ecadm backup -d pre-12.2.1-upgrade -o /ec_backup/sat-backup-pre-12.2.1-upgrade.20140702
ecadm: using logFile = /var/opt/sun/xvm/logs/sat-backup-2014-07-02-11:52:16.log
ecadm: *** PreBackup Phase
ecadm: *** Backup Phase
ecadm: *** PostBackup Phase
ecadm: *** Backup complete
ecadm: *** Output in /ec_backup/sat-backup-pre-12.2.1-upgrade.20140702
ecadm: *** Log in /var/opt/sun/xvm/logs/sat-backup-2014-07-02-11:52:16.log
root@ec:/#
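If you script the backup, it is worth confirming that it finished before copying the file anywhere. A minimal sketch, assuming the completion message looks like the "Backup complete" line in the output above; the function name backup_completed, the /tmp/ecadm-backup.out path and the host name backuphost are hypothetical.

```shell
# backup_completed
# Reads "ecadm backup" output on stdin and succeeds only if the
# completion line is present.
backup_completed() {
    grep 'Backup complete' >/dev/null
}

# Example (uncomment to run on the EC; backuphost is a hypothetical host):
# out=/ec_backup/sat-backup-pre-12.2.1-upgrade.20140702
# /opt/SUNWxvmoc/bin/ecadm backup -d pre-12.2.1-upgrade -o "$out" 2>&1 |
#     tee /tmp/ecadm-backup.out
# backup_completed < /tmp/ecadm-backup.out && scp "$out" backuphost:/backups/
```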
- Confirm you have a successful filesystem backup using your Enterprise backup software. (You should already be doing this on a weekly basis.) I would recommend full filesystem backups and having a separate backup of the /var/opt/sun/xvm directory and any of your Ops Center software libraries if you did not put them in the default location (/var/opt/sun/xvm/locallib/swlib[0-2]).
- Take a recursive ZFS snapshot of the full zone (rpool and any other zpools that are part of the zone). This is normally your easiest and fastest rollback method should you need it. NOTE: Make sure you know how to recover/roll back a zone. "zfs snapshot -r rpool@snapname" recursively snapshots all underlying filesystems, but "zfs rollback -r rpool@snapname" only rolls back a single filesystem (for rollback, -r destroys any snapshots newer than the target; it does not recurse into child filesystems). You need to roll back each filesystem separately. If you are not sure, practice it on a test zone first.
### Take a zfs snapshot ###
root@ec:/# zfs list
NAME                              USED   AVAIL  REFER  MOUNTPOINT
rpool                             156G   41.1G  31K    /rpool
rpool/ROOT                        134G   41.1G  31K    legacy
rpool/ROOT/solaris                134G   41.1G  24.6G  /
rpool/ROOT/solaris-backup-1       174K   41.1G  1.37G  /
rpool/ROOT/solaris-backup-1/var   110K   41.1G  27.9G  /var
rpool/ROOT/solaris-backup-2       296K   41.1G  24.2G  /
rpool/ROOT/solaris-backup-2/var   232K   41.1G  48.4G  /var
rpool/ROOT/solaris/var            109G   41.1G  77.2G  /var
rpool/VARSHARE                    88K    41.1G  66.5K  /var/share
rpool/ec_backup                   1.29G  41.1G  1.29G  /ec_backup
rpool/export                      161K   41.1G  32K    /export
rpool/export/home                 111K   41.1G  32K    /export/home
rpool/export/home/ocadmin         61K    41.1G  40.5K  /export/home/ocadmin
rpool/oracle                      20.7G  41.1G  20.7G  /var/opt/sun/xvm/oracle
root@ec:/#
root@ec:/# zfs snapshot -r rpool@pre-OC-12.2.1-install.20140702
root@ec:/#
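Because the rollback has to be done one filesystem at a time, it can help to generate the list of commands rather than type them. The sketch below only prints the commands so you can review them first; it does not run anything. The function name rollback_cmds is my own, and the right order and handling of the active boot environment will depend on your zone, so treat this as a starting point to practice with on a test zone.

```shell
# rollback_cmds SNAPNAME
# Reads snapshot names on stdin, one per line, as printed by
#   zfs list -H -o name -t snapshot -r rpool
# and prints one "zfs rollback" command for every dataset that has a
# snapshot called SNAPNAME. Nothing is executed.
# Note: "zfs rollback -r" destroys any snapshots newer than the target.
rollback_cmds() {
    awk -v tag="$1" -F'@' '$2 == tag { print "zfs rollback -r " $0 }'
}

# Example (uncomment to run inside the zone):
# zfs list -H -o name -t snapshot -r rpool |
#     rollback_cmds pre-OC-12.2.1-install.20140702
```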
4) Check for any failed services
It is good practice to clear/enable/disable any broken SMF services, but there are a few key ones to check.
Make sure all the Ops Center services that should be running are running and the ones that should not are not. A classic example here is when you have an EC running without a collocated PC. The PC shows as disabled, but still shows in a "svcs -xv" output.
root@ec:/var/tmp/downloads# svcs -xv
svc:/application/management/common-agent-container-1:scn-proxy (Cacao, a common Java container for JDMK/JMX based management solution)
 State: disabled since June 12, 2014 08:07:08 AM EST
Reason: Disabled by an administrator.
   See: http://support.oracle.com/msg/SMF-8000-05
   See: man -M /usr/share/man -s 1M cacaoadm
   See: man -M /usr/share/man -s 5 cacao
Impact: 1 dependent service is not running:
        svc:/application/scn/proxy-available:default
root@ec:/var/tmp/downloads#
In this case, our EC did not have a collocated PC, so we should ensure that these services are really disabled and don't try to start up during the upgrade process.
root@ec:/var/tmp/downloads# svcadm disable svc:/application/scn/proxy-available:default
root@ec:/var/tmp/downloads# svcadm disable svc:/application/management/common-agent-container-1:scn-proxy
- If you are using zones, either with the EC installed in the Global Zone (GZ) of a system that hosts zones or with your EC/PC running in a Non-Global Zone (NGZ), you also need to check that the IPS proxies are running so that the Solaris 11 packaging system works correctly.
- In a Global Zone (GZ), check that zones-proxyd is online.
root@t4-1-syd04-b:~# svcs svc:/application/pkg/zones-proxyd:default
STATE          STIME    FMRI
online         Jul_02   svc:/application/pkg/zones-proxyd:default
root@t4-1-syd04-b:~#
- In a Non-Global Zone (NGZ), check that the zones-proxy-client is online.
root@ec:~# svcs svc:/application/pkg/zones-proxy-client:default
STATE          STIME    FMRI
online         8:54:47  svc:/application/pkg/zones-proxy-client:default
root@ec:~#
- What you are looking for is a clean bill of health from the "svcs -xv" command, which means it prints nothing.
root@ec:/var/tmp/downloads# svcs -xv
root@ec:/var/tmp/downloads#
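If you want the same check in a script, one approach is to list every service with its state and flag the ones that are in maintenance, degraded or offline. This is a rough sketch and not a replacement for reading the "svcs -xv" output; the function name unhealthy_services is my own.

```shell
# unhealthy_services
# Reads "STATE FMRI" pairs on stdin, for example from
#   svcs -H -a -o state,fmri
# and prints the FMRI of each service whose state starts with
# maintenance, degraded or offline (this also catches transitional
# states such as "offline*").
unhealthy_services() {
    awk '$1 ~ /^(maintenance|degraded|offline)/ { print $2 }'
}

# Example (uncomment to run on the EC/PC):
# svcs -H -a -o state,fmri | unhealthy_services
```

Services that are intentionally disabled, such as the collocated PC services above, are not reported, so you still need to confirm separately that they are in the state you expect.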
5) Check the pkg publishers
To be able to do a successful upgrade, you need the pkg publisher for a system to be working. In a zones environment, that means the publishers in the GZ and all the NGZ should be working. Publishers that don't resolve when a package links into a zone will cause the whole upgrade to stop.
So here are a couple of things to look for when you are using an EC in a zone.
- If this is a test environment where you have multiple EC/PCs in different zones, either those EC/PCs should be running or the publishers that point to a non-running EC/PC should be cleared. This can be done by issuing:
# pkg unset-publisher Publisher-Name
The aim here is to clear all the local publishers in the zone and just use the proxied publishers in the GZ.
- If you have the GZ pointing to a PC that points to the EC being upgraded, and that EC is in an NGZ under the same GZ (yes, this is the whole chicken-and-egg problem), you have a slightly different problem. During the upgrade, parts of the EC will be shut down, which will stop the remote PC from proxying access to the EC's IPS repository. So you need to set the publishers to point to an IPS repository that they can reach. Luckily, the actual IPS repository on the EC keeps running on port 11000 throughout the upgrade.
root@t4-1-syd04-b:~# pkg publisher
PUBLISHER TYPE STATUS P LOCATION
solaris origin online F https://oracle-oem-oc-mgmt-pc217:8002/IPS/
cacao origin online F https://oracle-oem-oc-mgmt-pc217:8002/IPS/
mp-re (non-sticky) origin online F https://oracle-oem-oc-mgmt-pc217:8002/IPS/
opscenter origin online F https://oracle-oem-oc-mgmt-pc217:8002/IPS/
root@t4-1-syd04-b:~# pkg unset-publisher opscenter
root@t4-1-syd04-b:~# pkg unset-publisher mp-re
root@t4-1-syd04-b:~# pkg unset-publisher cacao
root@t4-1-syd04-b:~# pkg set-publisher -G '*' -g http://ec:11000/ solaris
root@t4-1-syd04-b:~# pkg publisher
PUBLISHER TYPE STATUS P LOCATION
solaris origin online F http://ec:11000/
- You can reset all the publishers that were set by Ops Center to their original state by rebooting the system or by running the install_ips_ac.sh script in each zone.
# /var/opt/sun/xvm/utils/install_ips_ac.sh -P PC_IP_Address
Use 127.0.0.1 as the IP address for the EC/PC when it is pointing to itself.
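Before starting, it can be handy to boil the "pkg publisher" listing down to the distinct origins a zone depends on, so you can see at a glance which EC/PC each one points at. A small sketch, assuming the column layout shown in the listings above (location in the last column); the function name publisher_origins is my own.

```shell
# publisher_origins
# Reads "pkg publisher" output on stdin and prints each distinct origin
# URI once. The header line is skipped, and the URI is taken from the
# last column so entries such as "mp-re (non-sticky)" are handled too.
publisher_origins() {
    awk 'NF >= 2 && $1 != "PUBLISHER" { print $NF }' | sort -u
}

# Example (uncomment to run in the GZ and in each NGZ):
# pkg publisher | publisher_origins
```

Run against the first listing above, this would print the single PC URI; after the change it would print only http://ec:11000/.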
6) Run OCDoctor troubleshoot
Run the OCDoctor troubleshoot script over your EC and PCs before an upgrade. It is a good sanity check to look for and fix underlying problems before you start the upgrade process. If you are in connected mode, your EC should already have the latest version of OCDoctor downloaded. Otherwise, you can update it by running "OCDoctor.sh --update" or downloading from https://java.net/projects/oc-doctor/downloads/download/OCDoctor-4.36.zip
Note: The error "'root' should not be a role" can be safely ignored as it was only required for earlier versions of Ops Center.
root@ec:/var/tmp/downloads# /var/opt/sun/xvm/OCDoctor/OCDoctor.sh -t
Ops Center Doctor 4.34 [OC 18.104.22.16863,SunOS11] [Read only] [02-Jul-2014 11:25AM EST]
======================== Checking Enterprise Controller...==============================
OK: Total number of OSes: 12 Total LDOMs:7 Total Zones:
ERROR: User 'root' should not be a role. You should convert it to a normal user before the installation. This can be done by running:
# rolemod -K type=normal root
OK: Files in /var/opt/sun/xvm/images/agent/ have the right permissions
OK: Files in /var/opt/sun/xvm/osp/web/pub/pkgs/ have the right permissions
OK: both pvalue and pdefault in systemproperty are equal to false (at id 114)
OK: Found only 285 OCDB*.aud files in oracle/admin/OCDB/adump folder
OK: Found no ocdb*.aud files in oracle/admin/OCDB/adump folder
OK: No auth.cgi was found in cgi-bin
OK: User 'oracleoc' home folder points to the right location
OK: User 'allstart' home folder points to the right location
OK: Apache logs are smaller than 2 GB
OK: n1gc folder has the right permissions
OK: All agent packages are installed properly
OK: All Enterprise Controller packages are installed properly
OK: Enterprise Controller status is online
OK: the version is the latest one (22.214.171.12463)
OK: satadm timeouts were increased
OK: tar command was properly adjusted in satadm
OK: stclient command works properly
OK: Colocated proxy status is 'disabled'
OK: Local Database used space is 19%, 6G out of 32G (local DB, using 1 files)
OK: Debug is disabled in .uce.rc
OK: Debug is disabled for cacao instance oem-ec
OK: no 'conn_properties_file_name' value in .uce.rc
OK: 30G available in /
OK: 30G available in /var
OK: 30G available in /var/tmp
OK: 30G available in /var/opt/sun/xvm
OK: 30G available in /opt
OK: DNS does not unexpectedly resolve hostname '_default_'
OK: Found the server .uce.rc at /var/opt/sun/xvm/uce/opt/server/cgi-bin/.uce.rc
OK: Server .uce.rc has the correct file permissions
OK: Server .uce.rc has the correct ownership
OK: Connectivity to the KB public servers works properly (using download_large.cgi)
OK: Grouplock file doesn't exist
OK: package email@example.com is not installed
OK: package driver/x11/xsvc is not installed
OK: Cacao facet is set to False
OK: All Solaris 11 agent bundles in /var/opt/sun/xvm/images/agent are imported properly to the repository
OK: Disconnected mode is not configured
OK: Locales are OK ("en_US.UTF-8")
OK: No need to check for Solaris 11 agent bundle issue as this EC is newer than Update 1
OK: No partially installed packages
OK: UCE 'private' folder exists
OK: No http_proxy is set in the user profile files
OK: 'public' folder has the right ownership
OK: 'public' folder is writable for uce-sds
OK: 'private' folder has the right ownership
OK: 'private' folder is writable for uce-sds
OK: '/var/tmp' folder is writable for uce-sds
OK: No old jobs rerun (CR 6990675)
OK: No need to adjust SEQ_COUNT (MAXID:2986 SEQCOUNT:2986)
OK: no row with ssh.tunnel.info found in DB table HD_RESOURCE_PARAMETER
NOTICE: Can't perform cryptoadm test inside a zone. Run --troubelshoot from the global zone as well to test the crypto services.
OK: System time is not in the past
OK: User uce-sds is part of all the proper groups
OK: oracleoc user ulimit -Sn is 1024
OK: oracleoc user ulimit -Hn is 65536
OK: FC Libraries do not contain duplicate LUNs
OK: 'update-saved-state' folder exists and has the right permissions
OK: verify-db does not return 'Invalid pad value' message
OK: No credential issues found
=========== Proxy controller is installed but not configured, skipping ==================
=========== Agent controller is installed but not configured, skipping ==================
root@ec:/var/tmp/downloads#
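With this much output, it is easy to miss the one line that matters. A minimal sketch for pulling out only the errors, assuming each result starts its own line with the "ERROR:" prefix as in the run above; the function name doctor_errors is my own, and it deliberately drops the "'root' should not be a role" message that can be ignored.

```shell
# doctor_errors
# Reads OCDoctor troubleshoot output on stdin and prints the ERROR lines,
# leaving out the "root should not be a role" message that is safe to
# ignore on current Ops Center versions.
doctor_errors() {
    grep '^ERROR:' | grep -v "User 'root' should not be a role"
}

# Example (uncomment to run on the EC/PC):
# /var/opt/sun/xvm/OCDoctor/OCDoctor.sh -t | doctor_errors
```

No output means there is nothing left to fix, but it is still worth reading the NOTICE lines in the full report.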
Now do the upgrade
Choose whichever upgrade method you like. Both the BUI and CLI methods will give you the same end result. The Ops Center upgrade is not a difficult upgrade and following some simple pre-work checks will maximize your chance of a straightforward and successful upgrade.