Oracle Enterprise Manager 13c and Always-On Monitoring - Part 2

As discussed in an earlier blog post, Always-On Monitoring (AOM) monitors all targets, or a specific list of targets, during complete downtime of the EM environment, using the same EM agents already deployed to those targets. It is especially useful for monitoring critical targets during planned EM downtime such as patching or upgrades. That earlier post reviewed the high-level steps for setting up and configuring AOM. This post covers three topics: configuring High Availability, configuring Disaster Recovery, and troubleshooting AOM.

As with any other application, Always-On Monitoring should be configured for High Availability (HA) and Disaster Recovery (DR) according to the Maximum Availability Architecture (MAA) standards. This ensures that the targets identified for monitoring during Enterprise Manager downtime continue to be monitored even if one of the AOM instances suffers a system failure.

High Availability

Setting up High Availability for AOM is as simple as setting up additional AOM instances. This not only helps ensure that monitoring remains available but also provides load sharing. If one AOM instance goes down, a server load balancer (SLB) can direct the incoming alerts from agents to another instance. To keep events from a target in the proper order, a single target sends its messages to one AOM instance only. If that instance goes down, the application reassigns the queues it was managing to another instance. Not only are the incoming agent alerts shared by multiple AOM instances, but the work of sending out notifications can also be shared among them. To add another AOM instance, run the following command from the new server: emsca add_ems. Below is a sample of the output from this command:

% $AOM_HOME/scripts/emsca add_ems

Oracle Enterprise Manager Cloud Control 13c Release 1

Copyright (c) 1996, 2015 Oracle Corporation. All rights reserved.

---------------------------------------------------------------

Always-On Monitoring Repository Connection String : (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=aomRepo.domain)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=aom)))

Always-On Monitoring Repository Username [ems] :

Always-On Monitoring Repository Password [ems] :

Enterprise Manager Repository Connection String : (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=emServer.domain)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=emdb)))

Enterprise Manager Repository Username : sysman

Enterprise Manager Repository Password :

Connecting to Always-On Monitoring repository.

Enter Enterprise Manager Middleware Home : /u01/app/oracle/MWare

Registering Always-On Monitoring instance

Always-On Monitoring Upload URL: https://aomserver.domain:8081/upload

Oracle PKI Tool : Version 12.1.3.0.0

Copyright (c) 2004, 2014, Oracle and/or its affiliates. All rights reserved.

Certificate was added to keystore

removed the key from the repository

Disaster Recovery

For those familiar with the Disaster Recovery setup for an Enterprise Manager 13c environment, the AOM Disaster Recovery setup follows essentially the same implementation. It is an active/passive configuration: if the primary AOM site goes down, the virtual IP fails over to a standby node. Once the AOM instance is started, it runs on the DR site exactly as it did on the primary site. To support DR, the AOM repository must be available on the DR site and the AOM instance file systems must be replicated to the DR site.

Troubleshooting

The following commands are helpful in troubleshooting AOM.

emsctl status

This command will return the status of the AOM service. Below is an example:

% $AOM_HOME/scripts/emsctl status

Oracle Enterprise Manager Cloud Control 13c Release 1

Copyright (c) 1996, 2015 Oracle Corporation. All rights reserved.

------------------------------------------------------------------

Always-On Monitoring Version : 13.1.0.0.0

Always-On Monitoring Home: /u01/app/oracle/aom

Started At: January 13, 2016 4:11:01 PM PST

Last Repository Sync: February 2, 2016 1:41:17 PM PST

Upload URL: https://aomserver.domain:8081/upload

Always-On Monitoring Process ID: 15399

Always-On Monitoring Repository: (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=aomRepo.domain)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=aom)))

Enterprise Manager Repository: (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=emServer.domain)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=emdb)))

Notifications Enabled: false

Always-On Monitoring is up.
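When this check is scripted, the two lines that usually matter are the final up/down line and the Notifications Enabled line. The following is a minimal sketch, not part of AOM itself: aom_status_summary is a hypothetical helper that reads status output in the format of the sample above from standard input. It assumes that an enabled instance reports the value "true", which the sample does not show.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with AOM): summarize `emsctl status` output.
# Reads the status text on stdin and prints two key=value lines.
# Assumption: notifications are reported as "true" when enabled.
aom_status_summary() {
    awk '
        /Always-On Monitoring is up/ { up = 1 }
        /Notifications Enabled/ {
            # Keep the text after the last colon, trimmed of surrounding spaces.
            value = $0
            sub(/^.*:[ \t]*/, "", value)
            sub(/[ \t\r]+$/, "", value)
            notifications = value
        }
        END {
            printf "service_up=%s\n", (up ? "yes" : "no")
            printf "notifications_enabled=%s\n", (notifications == "" ? "unknown" : notifications)
            # Succeed only when the service is up and notifications are on.
            exit ((up && notifications == "true") ? 0 : 1)
        }
    '
}

# Usage on an AOM server:
#   "$AOM_HOME/scripts/emsctl" status | aom_status_summary
```

A non-zero exit status from the helper means that either the service is not up or notifications are not enabled, so it can be used directly in an if statement.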

emsctl ping

Another way to check the status of the AOM service is to use the ping option as seen below:

% $AOM_HOME/scripts/emsctl ping

Oracle Enterprise Manager Cloud Control 13c Release 1

Copyright (c) 1996, 2015 Oracle Corporation. All rights reserved.

------------------------------------------------------------------

Always-On Monitoring is running

emsctl list_agents

This command provides a count and a list of the URLs of the agents that have communicated with the AOM service in the past hour.

emsctl getstats

This command will display AOM performance statistics.

Log files are also helpful during AOM troubleshooting. AOM records events that occur during operation in the log files listed below, which are located under the $AOM_HOME/logs directory. The .log files are rotated once the primary file reaches the maximum size of 10 MB. Two copies of each file are kept: the primary and one previous (rotated) file.

emsca logs:

  • emsca.err (only errors)
  • emsca.log.0 (rotating log file that contains all output including errors)

ems logs:

  • ems.err (only errors)
  • ems.log.0 (rotating log file that contains all output including errors)
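Because the .err files contain only errors, they are a convenient first place to look. The sketch below shows one way to print the most recent entries from both error logs. aom_recent_errors is a hypothetical helper name, and the default of 20 lines is an arbitrary choice; only the directory and file names come from the list above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with AOM): show the tail of each error log.
# $1 = log directory (defaults to $AOM_HOME/logs), $2 = number of lines (default 20).
aom_recent_errors() {
    log_dir=${1:-"$AOM_HOME/logs"}
    lines=${2:-20}
    for name in ems.err emsca.err; do
        file="$log_dir/$name"
        # Skip files that are missing or empty.
        if [ -s "$file" ]; then
            printf '== %s ==\n' "$name"
            tail -n "$lines" "$file"
        fi
    done
}

# Usage on an AOM server:
#   aom_recent_errors              # last 20 lines of each error log
#   aom_recent_errors /some/dir 5  # last 5 lines from another directory
```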

Log levels determine the type of information recorded in these log files. The level can be set to INFO, DEBUG, WARN (the default), or ERROR. To change it, add the logLevel property to the $AOM_HOME/conf/emsconfig.properties file. The AOM instance must be restarted for the change to take effect, using the commands below:

emsctl stop

emsctl start
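The property change itself can be scripted so that repeated runs do not leave duplicate entries in the file. The following is a rough sketch only: aom_set_log_level is a hypothetical helper, and it assumes the property is written in the usual key=value form (for example, logLevel=DEBUG). Confirm the exact syntax against the Administrator's Guide before relying on it.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with AOM): set logLevel in emsconfig.properties.
# Assumption: the file uses plain key=value lines, e.g. logLevel=DEBUG.
# $1 = level, $2 = properties file (defaults to $AOM_HOME/conf/emsconfig.properties).
aom_set_log_level() {
    level=$1
    conf=${2:-"$AOM_HOME/conf/emsconfig.properties"}
    case "$level" in
        INFO|DEBUG|WARN|ERROR) ;;
        *) echo "unsupported log level: $level" >&2; return 1 ;;
    esac
    [ -f "$conf" ] || { echo "no such file: $conf" >&2; return 1; }
    tmp=$(mktemp) || return 1
    # Drop any existing logLevel line, then append the new setting.
    grep -v '^[[:space:]]*logLevel[[:space:]]*=' "$conf" > "$tmp" || true
    printf 'logLevel=%s\n' "$level" >> "$tmp"
    # Write back through the original file so its permissions are preserved.
    cat "$tmp" > "$conf" && rm -f "$tmp"
}

# Usage on an AOM server, followed by the restart described above:
#   aom_set_log_level DEBUG
#   "$AOM_HOME/scripts/emsctl" stop
#   "$AOM_HOME/scripts/emsctl" start
```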

Below is a recommended list of steps to take when troubleshooting AOM:

  1. Check the log file.
  2. Verify that the URL is set in Enterprise Manager. This is done using this command: emctl get property -name "oracle.sysman.core.events.ems.emsURL"
  3. Verify that the URL is set on the Management Agent. To do this, verify the proper setting for EMS_URL in the $AGENT_HOME/sysman/config/emd.properties file.
  4. Verify that Always-On Monitoring is running and notifications are enabled. This can be verified by running the emsctl status command.
  5. Verify that downtime contacts have been specified in Enterprise Manager and Always-On Monitoring. Verifying this is a manual process that depends on how the downtime contacts were configured.
    1. If a global downtime contact was specified, verify this setting using emcli get_oms_config_property -property_name='oracle.sysman.core.events.ems.downtimeContact'
    2. If per target downtime contacts were specified, it can be verified by looking at the “Downtime Contact” target property for each of the targets.
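Step 3 in the list above can be checked from the command line on each agent host. The sketch below prints the EMS_URL value from emd.properties and fails if the property is missing. agent_ems_url is a hypothetical helper name, and comparing the result with the Upload URL reported by emsctl status is a suggested cross-check rather than a documented procedure.

```shell
#!/bin/sh
# Hypothetical helper (not an Oracle tool): print EMS_URL from emd.properties.
# $1 = properties file (defaults to $AGENT_HOME/sysman/config/emd.properties).
# Returns 2 if the file is missing and 1 if EMS_URL is not set.
agent_ems_url() {
    props=${1:-"$AGENT_HOME/sysman/config/emd.properties"}
    [ -f "$props" ] || { echo "no such file: $props" >&2; return 2; }
    # Print the value of every EMS_URL line; keep the last one if it repeats.
    value=$(sed -n 's/^[[:space:]]*EMS_URL[[:space:]]*=[[:space:]]*//p' "$props" | tail -n 1)
    [ -n "$value" ] || { echo "EMS_URL is not set in $props" >&2; return 1; }
    printf '%s\n' "$value"
}

# Usage on an agent host:
#   agent_ems_url
# The printed URL can then be compared with the Upload URL shown by
# `emsctl status` on the AOM server.
```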

To summarize, Always-On Monitoring offers a solution to the long-standing problem of who monitors our critical targets when EM is down. To get the most from this solution, it is important, at a minimum, to set up AOM with High Availability. If a DR/Standby site is set up for the EM environment, it is an MAA best practice to also set up a DR site for the AOM application. For more detailed information on Always-On Monitoring, refer to the Always-On Monitoring chapter in the Enterprise Manager Cloud Control Administrator’s Guide.
