As discussed in an earlier blog post, Always-On Monitoring (AOM) can monitor all targets, or a specific list of them, during complete downtime of the EM environment, using the same EM agents already deployed to those targets. It is especially useful for monitoring critical targets during planned EM downtime such as patching or upgrades. That earlier post reviewed the high-level steps for setting up and configuring the AOM tool. This post covers three topics: configuring High Availability, configuring Disaster Recovery, and troubleshooting AOM.
As with any other application, Always-On Monitoring should be configured for High Availability (HA) and Disaster Recovery (DR) according to the Maximum Availability Architecture (MAA) standards. This ensures that the targets identified for monitoring during Enterprise Manager downtime continue to be monitored even if a system failure occurs on one of the AOM instances.
High Availability
Setting up High Availability for AOM is as simple as setting up additional AOM instances. Additional instances not only keep monitoring available but also share the load. When the instances sit behind a server load balancer (SLB), incoming alerts from agents can be directed to another AOM instance if one goes down. To keep events from a target in the proper order, a single target sends its messages to one AOM instance only. If that AOM instance goes down, the application reassigns the queues it was managing to another instance. The work of sending out notifications can likewise be shared among the AOM instances. To add another AOM instance, run the following command from the new server: emsca add_ems. Below is a sample of the output from this command:
% $AOM_HOME/scripts/emsca add_ems
Oracle Enterprise Manager Cloud Control 13c Release 1
Copyright (c) 1996, 2015 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
Always-On Monitoring Repository Connection String : (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=aomRepo.domain)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=aom)))
Always-On Monitoring Repository Username [ems] :
Always-On Monitoring Repository Password [ems] :
Enterprise Manager Repository Connection String : (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=emServer.domain)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=emdb)))
Enterprise Manager Repository Username : sysman
Enterprise Manager Repository Password :
Connecting to Always-On Monitoring repository.
Enter Enterprise Manager Middleware Home : /u01/app/oracle/MWare
Registering Always-On Monitoring instance
Always-On Monitoring Upload URL: https://aomserver.domain:8081/upload
Oracle PKI Tool : Version 12.1.3.0.0
Copyright (c) 2004, 2014, Oracle and/or its affiliates. All rights reserved.
Certificate was added to keystore
removed the key from the repository
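The sticky assignment described above (each target reports to exactly one instance, and its queue moves only when that instance is lost) can be illustrated with a small sketch. This is not AOM's internal algorithm, which the product manages itself. The function name pick_instance, the target name, and the instance names are all hypothetical, and hash-modulo is just one simple way to get a stable mapping.

```shell
# Illustration only: NOT the algorithm AOM uses internally.
# pick_instance TARGET INSTANCE...
# Maps a target name to one of the listed (live) instances. The same target
# always maps to the same instance while the instance list is unchanged.
pick_instance() {
    target=$1
    shift
    # At least one live instance is required.
    [ "$#" -gt 0 ] || return 1
    # cksum prints "<crc> <byte count>"; keep only the CRC.
    crc=$(printf '%s' "$target" | cksum | cut -d ' ' -f 1)
    # Skip (crc mod N) instances, then report the one now in first position.
    shift $(( crc % $# ))
    printf '%s\n' "$1"
}
```

Calling pick_instance db01.example aom1 aom2 aom3 repeatedly returns the same instance each time, which is what keeps one target's events in order. Calling it again with a failed instance removed from the list picks a survivor. Unlike this simple scheme, a real implementation would move only the failed instance's queues rather than remapping other targets as well.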
Disaster Recovery
For those familiar with the Disaster Recovery setup for an Enterprise Manager 13c environment, the AOM Disaster Recovery setup follows essentially the same approach. It is an active/passive configuration: if the primary AOM site goes down, the virtual IP fails over to a standby node. Once the AOM instance is started on the DR site, it runs exactly as it did on the primary site. To support DR, the AOM repository needs to be available on the DR site and the AOM instance file systems need to be replicated to the DR site.
Troubleshooting
The following commands are helpful in troubleshooting AOM.
emsctl status
This command will return the status of the AOM service. Below is an example:
% $AOM_HOME/scripts/emsctl status
Oracle Enterprise Manager Cloud Control 13c Release 1
Copyright (c) 1996, 2015 Oracle Corporation. All rights reserved.
------------------------------------------------------------------
Always-On Monitoring Version : 13.1.0.0.0
Always-On Monitoring Home: /u01/app/oracle/aom
Started At: January 13, 2016 4:11:01 PM PST
Last Repository Sync: February 2, 2016 1:41:17 PM PST
Upload URL: https://aomserver.domain:8081/upload
Always-On Monitoring Process ID: 15399
Always-On Monitoring Repository: (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=aomRepo.domain)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=aom)))
Enterprise Manager Repository: (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=emServer.domain)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=emdb)))
Notifications Enabled: false
Always-On Monitoring is up.
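For scripted checks, individual fields can be pulled out of the status output. The following is a minimal sketch that assumes the "Label: value" layout shown in the sample above stays stable across releases. The helper name aom_status_field is hypothetical and is not part of emsctl.

```shell
# aom_status_field LABEL
# Reads `emsctl status` output on stdin and prints the value of the first
# line that starts with LABEL followed by a colon. Assumes the
# "Label: value" layout shown in the sample output.
aom_status_field() {
    awk -v label="$1" '
        !found && index($0, label) == 1 {
            rest = substr($0, length(label) + 1)
            # Require the colon right after the label (optional spaces),
            # so a shorter label does not match a longer one.
            if (rest ~ /^[ \t]*:/) {
                sub(/^[ \t]*:[ \t]*/, "", rest)
                print rest
                found = 1
            }
        }'
}
```

For example, piping the output of $AOM_HOME/scripts/emsctl status into aom_status_field "Last Repository Sync" prints only the timestamp, which a wrapper script could compare against the current time to spot a stale synchronization.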
emsctl ping
Another way to check the status of the AOM service is to use the ping option as seen below:
% $AOM_HOME/scripts/emsctl ping
Oracle Enterprise Manager Cloud Control 13c Release 1
Copyright (c) 1996, 2015 Oracle Corporation. All rights reserved.
------------------------------------------------------------------
Always-On Monitoring is running
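The ping check lends itself to automation, for example as a pre-flight test before planned EM downtime. The sketch below matches only the "is running" message from the sample output above. It makes no assumption about emsctl exit codes or about what is printed when the service is down. The function name aom_is_running and the EMSCTL override variable are hypothetical.

```shell
# aom_is_running
# Succeeds when `emsctl ping` prints the "is running" message shown in the
# sample output. Relies on that message only, not on emsctl's exit code.
# EMSCTL (hypothetical override) defaults to the script path used above.
aom_is_running() {
    ping_out=$("${EMSCTL:-$AOM_HOME/scripts/emsctl}" ping 2>/dev/null) || true
    case $ping_out in
        *'Always-On Monitoring is running'*) return 0 ;;
        *) return 1 ;;
    esac
}
```

A cron job or patching runbook could call aom_is_running and raise an alert, or halt the procedure, when it fails.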
emsctl list_agents
This command provides a count and list of URLs that have communicated with the AOM service in the past hour.
emsctl getstats
This command will display AOM performance statistics.
Log files are also helpful during AOM troubleshooting. AOM records events that occur during operation in log files located under the $AOM_HOME/logs directory; these are listed below. The .log files are rotated once the primary file reaches the maximum size of 10M. Two copies of each file are kept: the primary and one previous (rotated) file.
emsca logs:
There are different log levels that determine the type of information recorded in these log files. The level can be set to INFO, DEBUG, WARN (the default), or ERROR. The log level can be changed by adding the logLevel property to the $AOM_HOME/conf/emsconfig.properties file. The AOM instance must be restarted for the change to take effect. This is done with the commands below:
emsctl stop
emsctl start
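The property edit described above can be scripted so that the level is validated before the file is touched. This is a minimal sketch that assumes the file uses standard key=value properties syntax (so the entry would read logLevel=DEBUG). The function name set_aom_log_level is hypothetical.

```shell
# set_aom_log_level LEVEL [FILE]
# Sets logLevel in the AOM properties file, replacing an existing entry or
# appending a new one. Assumes key=value properties syntax.
set_aom_log_level() {
    level=$1
    file=${2:-$AOM_HOME/conf/emsconfig.properties}
    # Accept only the four levels named in the text.
    case $level in
        INFO|DEBUG|WARN|ERROR) ;;
        *) echo "unsupported log level: $level" >&2; return 1 ;;
    esac
    [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
    tmpfile=$(mktemp) || return 1
    # Copy every line except an existing logLevel entry, then append the new one.
    awk '!/^logLevel[ \t]*=/' "$file" > "$tmpfile"
    printf 'logLevel=%s\n' "$level" >> "$tmpfile"
    # Write back in place so the file keeps its owner and permissions.
    cat "$tmpfile" > "$file" && rm -f "$tmpfile"
}
```

After running set_aom_log_level DEBUG, the instance still has to be restarted with emsctl stop and emsctl start for the new level to apply.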
Below is a recommended list of steps to take when troubleshooting AOM:
emctl get property -name "oracle.sysman.core.events.ems.emsURL"
emcli get_oms_config_property -property_name='oracle.sysman.core.events.ems.downtimeContact'
To summarize, Always-On Monitoring offers a solution to the long-standing problem of who monitors our critical targets when EM is down. To get the most from this solution, it is important, at a minimum, to set up AOM with High Availability. If a DR/Standby site is set up for the EM environment, it is an MAA best practice to also set up a DR site for the AOM application. For more detailed information on Always-On Monitoring, refer to the Always-On Monitoring chapter in the Enterprise Manager Cloud Control Administrator’s Guide.