Tuesday Nov 25, 2014

Monitoring NFS mounted file systems using EM12c

A customer recently asked me how they could monitor and alert on all the NFS mounted file systems across their datacenter. Here is a quick guide to doing just that.

Read More

Thursday Sep 11, 2014

Simplify deployment of JVMD Agents to command line Java applications

Contributing Author: Shiraz Kanga, Consulting Member of Technical Staff, Oracle

Most customers of Oracle Enterprise Manager who use JVM Diagnostics use the tool to monitor their Java application servers, such as WebLogic, WebSphere and Tomcat. In that environment it is fairly easy to deploy the JVMD Agent: since it is distributed as a war file, you merely deploy it into a running application server using the management GUI or command line tools, and you can start monitoring with no need to restart the app server or modify any startup commands or scripts. However, other types of Java applications, such as AWT/Swing or command line Java applications, do not allow code to be deployed at runtime, so for these the startup scripts must be modified. This is complex because each application comes with its own custom and unique launch script. Additionally, the command that actually launches the runtime needs to combine the java command with its related parameters (like -Xmx), the JVMD Agent with its own parameters (like console host/port), and the application itself, which may have some more custom parameters. People often get confused by this complexity.

I've recently had customers who needed to monitor Hadoop, HDFS, Zookeeper, Kafka, Cassandra and Solr with JVMD. In order to simplify some of the complexity discussed above, I created a simple script-based framework that makes things a bit easier. Feel free to use my approach to quickly set up JVMD with these or any other command line Java programs. You can also use it as the basis for your own modifications. The framework modifies the startup scripts supplied with these tools in order to add the JVMD agent. All the code/scripts are attached in a zip file. Both original and modified versions of all changed scripts are included, so you can easily see the modifications I made with a simple diff.

Here's how these scripts are set up. Everything is configured using four environment variables, as shown below:

    export JVMD_AGENT_HOME=/home/skanga/servers
    export JVMD_MANAGER_HOST=jvmdconsole.us.oracle.com
    export JVMD_MANAGER_PORT=3800
    export JVMD_UNIQUE_ID=<unique name for each server process>

where JVMD_AGENT_HOME must contain jamagent-env.sh (from the attached zip file) and jamagent.war (which can be downloaded from your JVMD console). The first three of these are likely to remain unchanged for all the JVMs being monitored, so you can easily add them directly into jamagent-env.sh if needed.
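Since those three rarely vary, one option is to give them defaults inside jamagent-env.sh itself. A minimal sketch, reusing the example values from above (the :- idiom simply keeps any value already exported):

    # inside jamagent-env.sh: fall back to site-wide defaults when not already exported
    JVMD_AGENT_HOME=${JVMD_AGENT_HOME:-/home/skanga/servers}
    JVMD_MANAGER_HOST=${JVMD_MANAGER_HOST:-jvmdconsole.us.oracle.com}
    JVMD_MANAGER_PORT=${JVMD_MANAGER_PORT:-3800}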

JVMD_UNIQUE_ID, on the other hand, must be unique for each server process, so it should not be placed there directly. However, it has two other modes that let you supply a pointer to the unique ID instead of specifying it directly: the pointer can name either an environment variable or a JVM system property that holds the actual unique ID. If you use one of those modes, you could add this variable to the jamagent-env.sh script too.

If JVMD_UNIQUE_ID starts with the string "sysprop-", the actual unique ID will be read from the JVM system property named by the string following "sysprop-". For example, if JVMD_UNIQUE_ID is "sysprop-server_name" and we have a system property -Dserver_name=MyTestingServer, then JVMD will use MyTestingServer as the JVM unique identifier.

If JVMD_UNIQUE_ID starts with the string "envvar-", the actual unique ID will be read from the environment variable named by the string following "envvar-". For example, if JVMD_UNIQUE_ID is "envvar-server_name" and we have an environment variable server_name=MyTestingServer, then JVMD will use MyTestingServer as the JVM unique identifier.

Caution: Do not use the dash (minus) character in the name of the environment variable that holds the unique ID; use an underscore instead.
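Putting the three modes side by side (the values are examples only):

    # direct: the value itself is the unique ID
    export JVMD_UNIQUE_ID=kafka-server

    # sysprop- mode: read the ID from a JVM system property, e.g. -Dserver_name=MyTestingServer
    export JVMD_UNIQUE_ID=sysprop-server_name

    # envvar- mode: read the ID from an environment variable (underscores, not dashes)
    export JVMD_UNIQUE_ID=envvar-server_name
    export server_name=MyTestingServer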

Generic Launch Script Modifications

After these four environment variables are set, we need to modify our launch scripts. Make sure you have a backup of all files before you proceed. In the main script that you use to launch your Java application, look for the line that invokes java on your main class, and replace it with a version that adds $JVMD_AGENT_INSERT, as sketched below.
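As an illustrative sketch (the $JAVA, $JAVA_OPTS, $CLASSPATH and com.example.Main names are placeholders, not taken from any particular product script):

    # before: a typical launch line
    exec "$JAVA" $JAVA_OPTS -cp "$CLASSPATH" com.example.Main "$@"

    # after: the JVMD agent insert added just before the main class
    exec "$JAVA" $JAVA_OPTS -cp "$CLASSPATH" $JVMD_AGENT_INSERT com.example.Main "$@"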

So we simply added a $JVMD_AGENT_INSERT just before the name of the main class. If there are multiple such lines, modify them all in the same way. In order to configure $JVMD_AGENT_INSERT, we also need to source jamagent-env.sh (with some error checking), so we insert a snippet like this just before the java invocation:

# add JVMD Agent Env settings
[[ -e "${JVMD_AGENT_HOME}/jamagent-env.sh" ]] && source "${JVMD_AGENT_HOME}/jamagent-env.sh" || { echo "ERROR: JVMD_AGENT_HOME undefined or does not contain jamagent-env.sh" 1>&2; exit 1; }

NOTE: Everything after the comment above must be on a single line of code in your launch script, as shown. Long lines tend to get mangled by blogging software, so it is best to cut and paste it from one of the scripts in the attached zip file.

We will now look at how I used these techniques to add JVMD monitoring to Kafka, Hadoop, Zookeeper, Cassandra and Solr. 

1) Kafka 2.8.0-

I used Kafka 2.8.0- and downloaded it directly from the Kafka site. In Kafka, ALL processes are initiated through a common launcher called kafka-run-class.sh in the bin folder. All the other shell scripts (including the built-in Zookeeper) call this one. So this single insertion point is the only place that we will need to modify in order to add JVMD monitoring to Kafka. Pretty simple. Using the modified script (inside the attached zip file) you can run the servers as shown below:

TEST - with mods to use JVMD
cd /home/skanga/servers/kafka_2.8.0-
export JVMD_AGENT_HOME=/home/skanga/servers
export JVMD_MANAGER_HOST=jvmdconsole.us.oracle.com

# start a zookeeper server
export JVMD_UNIQUE_ID=zookeeper-server
./zookeeper-server-start.sh ../config/zookeeper.properties

# start a kafka server
export JVMD_UNIQUE_ID=kafka-server
./kafka-server-start.sh ../config/server.properties

2) Hadoop 2.4.1

The scripts called hadoop, hdfs, mapred and yarn in the hadoop bin directory will ALL need to be modified for JVMD monitoring. Using the modified scripts (inside the attached zip file) you can run all the servers as shown below:

TEST - with mods for hadoop command to use JVMD

cd /home/skanga/servers/hadoop-2.4.1
export JVMD_AGENT_HOME=/home/skanga/servers
export JVMD_MANAGER_HOST=jvmdconsole.us.oracle.com

# Launch the hdfs nfs gateway
export JVMD_UNIQUE_ID=hdfs-nfs3-gateway
./bin/hdfs nfs3

# Run a mapreduce history server
export JVMD_UNIQUE_ID=mapred-historyserver
./bin/mapred historyserver

# Run a yarn resource manager
export JVMD_UNIQUE_ID=yarn-resourcemanager
./bin/yarn resourcemanager

# Run a hadoop map-reduce job to find the value of PI (QuasiMonteCarlo method)
export JVMD_UNIQUE_ID=hadoop-test-pi-montecarlo
./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar pi 1024 100

3) Zookeeper 3.4.6

The standalone version of Zookeeper has a common environment setup script called zkEnv.sh where most of the JVMD setup can be done. After that, a minor modification is needed to the java launch command in zkServer.sh, after which all JVMD monitoring works fine. The scripts zkCleanup.sh and zkCli.sh probably do not need monitoring, but it can easily be added if really needed.

TEST - with mods for zkServer.sh command to use JVMD

cd /home/skanga/servers/zookeeper-3.4.6/bin
export JVMD_AGENT_HOME=/home/skanga/servers
export JVMD_MANAGER_HOST=jvmdconsole.us.oracle.com
export JVMD_UNIQUE_ID=zk-server

# start the zookeeper server
./zkServer.sh start
./zkServer.sh status
./zkServer.sh stop

4) Cassandra 2.0.9

The Apache Cassandra data store has a common environment setup script called conf/cassandra-env.sh where we can add the command to source our include script. Then a minor modification is needed to the java launch command in bin/cassandra, after which all JVMD monitoring works fine. The other scripts probably do not need monitoring, but it can easily be added if really needed.

TEST - with mods for cassandra command to use JVMD

cd /home/skanga/servers/apache-cassandra-2.0.9/bin
export JVMD_AGENT_HOME=/home/skanga/servers
export JVMD_MANAGER_HOST=jvmdconsole.us.oracle.com
export JVMD_UNIQUE_ID=cassandra-server

# start cassandra
./cassandra -f

5) Solr 4.9.0

The Solr search server is an interesting case. In production scenarios, users will probably deploy the Solr war file in their own application server; in that scenario the standard JVMD war file can be deployed to the same application server and monitored easily. However, the Solr distribution also includes an embedded mode, started by simply running java -jar start.jar. For this scenario I converted that java command into a simple script called start.sh, placed in the same folder as start.jar. Using this script (inside the attached zip file) you can run a test as shown below:
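The actual start.sh is in the zip; below is a minimal sketch of such a wrapper, following the same pattern as the other modified scripts (the placement of $JVMD_AGENT_INSERT here is my assumption, mirroring the insert-before-the-main-class rule above):

    #!/bin/bash
    # start.sh - launch embedded Solr with the JVMD agent
    # add JVMD Agent Env settings (the same single-line check used in the other scripts)
    [[ -e "${JVMD_AGENT_HOME}/jamagent-env.sh" ]] && source "${JVMD_AGENT_HOME}/jamagent-env.sh" || { echo "ERROR: JVMD_AGENT_HOME undefined or does not contain jamagent-env.sh" 1>&2; exit 1; }
    # launch embedded Solr exactly as before, with the agent insert added
    exec java $JVMD_AGENT_INSERT -jar start.jar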

TEST - with addition of start.sh command to use JVMD with Solr

cd /home/skanga/servers/solr-4.9.0/example
export JVMD_AGENT_HOME=/home/skanga/servers
export JVMD_MANAGER_HOST=jvmdconsole.us.oracle.com
export JVMD_UNIQUE_ID=solr-server

# start solr
./start.sh

After everything is set up properly for your servers, you should see all the relevant JVMs in the default pool with the proper IDs, as shown in the image below.

JVMs in Default Pool (with hostnames & ip addresses blanked out)

Remember to be a bit patient and wait a few seconds until the connections are established and the servers appear in the console.

Wednesday Sep 10, 2014

Enterprise Manager Ops Center - Changing alert severity in default monitoring policies

Modifying Monitoring Policies

Ops Center delivers default monitoring policies for the various types of assets managed and monitored by Ops Center. These policies are specific to each asset type. In the real world, these policies act only as a starting point, and you will need to customize them to suit your own environment. Most of the customizations can be done in the BUI (Browser User Interface), which is covered in the manuals and other blogs on this site, but occasionally you will need to manually edit the underlying XML of the default policies to get the customization you require. That method is covered in this blog entry.

List of monitoring profiles

In the BUI, you can easily copy these default policies and then modify them to suit your own environment.

You can do the following modifications in the BUI:

  • enable/disable monitoring rules
  • add a new monitoring rule
  • delete an existing monitoring rule
  • Modify the thresholds/severities/triggers for most alert rules

Modifications are normally done by highlighting the rule, clicking the edit icon, making your changes and then clicking the Apply button. Remember that once you have made all the rule changes, the policy should be applied (or reapplied) to your target assets. Most rules are editable in this way.

Edit SMF rule

However, not all rules can be edited in the BUI. A rule like "Operating System Reachability" cannot be edited from the BUI and must be changed manually by editing the underlying XML. Such rules can be identified by the fact that no edit icon is available when the alert rule is selected.

Only Ops Center factory default policies (product standard default policies) can be edited by modifying the XML on the filesystem. When a policy is modified in the BUI, Ops Center copies the default policy to a custom policy, and these custom policies are stored in the database, not as XML on the filesystem. This means that if you want to change one of these non-editable rules, you must first manually edit the factory default policy, then make a copy of the policy to create a custom policy and, if required, re-apply any additional customizations in the BUI, so that your new policy absorbs the manual modifications.

While the default values are normally sufficient for most customers, I had a request from a customer who wanted to change the "Operating System Reachability" severity from Warning (the default) to Critical. He considered this to be an important event that needed to be alerted at a higher level so that it would grab the attention of his administration staff. Below is the procedure for how to achieve such a modification.

Manually Modifying the Default Alert Severity

As part of a standard install, Ops Center will create an alert of severity Warning if it loses connectivity with an Operating System (a Solaris 8/9 OS or a Solaris 10/11 global zone).

This will create an alert with the description "The asset can no longer be reached"

Warning Level alert

So here is the procedure for how to change the default alert severity for the "Operating System Reachability" alert from Warning to Critical. Be aware that there is a different alert for "Non-global zone Reachability", which will not be covered here, but modifying it, or other alerts, would follow a similar procedure.

We will be modifying the XML files for the default monitoring policies. These can be found at /var/opt/sun/xvm/monitoringprofiles on your EC.

root@ec:/var/opt/sun/xvm/monitoringprofiles# ls
Chassis.xml                       MSeriesDomain.xml                 ScCluster.xml
CiscoSwitch.xml                   NasLibrary.xml                    ScNode.xml
Cloud.xml                         NonGlobalZone.xml                 ScZoneClusterGroup.xml
ExadataCell.xml                   OperatingSystem.xml               ScZoneClusterNode.xml
FileServer.xml                    OvmGuest.xml                      Server.xml
GlobalZone.xml                    OvmHost.xml                       Storage.xml
IscsiStorageArray.xml             OvmManager.xml                    Switch.xml
LDomGuest.xml                     PDU.xml                           Tenancy.xml
LDomHost.xml                      RemoteOracleEngineeredSystem.xml  VirtualPool.xml
LocalLibrary.xml                  SanLibrary.xml
MSeriesChassis.xml                SanStorageArray.xml

Follow the steps below to modify the monitoring policy:

  1. In the BUI, identify which policies you want to modify. Look at an asset in the BUI and select the "Monitoring" tab. At the top of the screen, you will see which monitoring policy (Alert Monitoring Rules) it is running. In this case, the policy is called "OC - Global Zone", which corresponds to the "GlobalZone.xml" file.

    Identify Profile type

    Alternatively, log on to the EC and grep for the alert rule name.

    # grep "Operating System Reachability" *
    GlobalZone.xml:                <name>Operating System Reachability</name>
    OperatingSystem.xml:                <name>Operating System Reachability</name>

    In this case, we will want to change the "OC - Operating System" and "OC - Global Zone" policies, as they both contain the "Operating System Reachability" rule, so we will be editing both the "GlobalZone.xml" and "OperatingSystem.xml" files.

  2. Make a backup copy of any XML file you modify (in case you mess something up).

    # pwd
    /var/opt/sun/xvm/monitoringprofiles
    # cp OperatingSystem.xml OperatingSystem.xml.orig
    # cp GlobalZone.xml GlobalZone.xml.orig
  3. Edit each file and look for the rule containing the name

         <name>Operating System Reachability</name>

    Within that rule, change "unreachable.duration.minutes.WARNING" to "unreachable.duration.minutes.CRITICAL", then save the file. Repeat for the other file(s). A scripted version of this edit is sketched below.

  4. Make a backup copy of your modified XML files, as these files may be overwritten during an upgrade.

  5. Now restart the EC so that the new monitoring policies are re-read.

  6. You should now apply the new policy to the hosts you want to have the updated rule.

  7. Check the Message Center in the Navigation panel and you will see that your alert has now changed from "Warning" to "Critical".

Critical Alert level
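For reference, the edit in step 3 can also be scripted. A sketch using GNU sed (Solaris sed lacks -i, so use gsed or perl -pi there; verify the files by hand before restarting the EC):

    # swap the severity suffix in both policy files; .bak copies are kept
    cd /var/opt/sun/xvm/monitoringprofiles
    sed -i.bak 's/unreachable\.duration\.minutes\.WARNING/unreachable.duration.minutes.CRITICAL/' GlobalZone.xml OperatingSystem.xml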

A best practice would now be to use the BUI to copy the new (OC - Global Zone and OC - Operating System) policies to your own custom policies, adding any additional rule modifications. Copying the new OC policy to a custom policy saves it into the database, so it will not get overridden by any subsequent Ops Center upgrade. Remember to apply the custom policy to your asset(s) or asset groups.

It is good practice to keep the name of the source policy in the name of your custom policy. It will make your life easier if you ever get confused about which policy applies to which type of asset or if you want to go back to the original source policy.

If you want your new custom policy to be automatically applied when you discover/provision a new asset, you will need to select the policy and click the "Set as Default Policy" action for that asset class.

Setting default policy

The green tick on the icon indicates that a policy is the default for that asset class.

You have now successfully modified the default alert severity for an alert that could not be modified in the BUI.


Rodney Lindner
Senior IT/Product Architect
Systems Management - Ops Center Engineering

Thursday Feb 27, 2014

Fast Recovery Area for Archive Destination

If you are using the Fast Recovery Area (FRA) for the archive destination, that is, the destination is set to USE_DB_RECOVERY_FILE_DEST, you may notice that the Archive Area % Used metric does not trigger anymore. Instead, the Recovery Area % Used metric triggers when it hits a Warning threshold of 85% full and a Critical threshold of 97% full. As this metric is controlled by server-side database thresholds, it cannot be modified from Enterprise Manager (see MOS Note 428473.1 for more information). Thresholds of 85/97 are not sufficient for some of the larger, busier databases and may not give you enough time to kick off a backup and clear enough logs before the archiver hangs. If you need different thresholds, you can easily accomplish this by creating a Metric Extension (ME) and setting thresholds to your desired values. This blog will walk through an example of creating an ME to monitor archive on FRA destinations. For more information on MEs and how they can be used, refer to the Oracle Enterprise Manager Cloud Control Administrator's Guide.
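The value such an ME would sample is exposed by the database itself. A sketch of the kind of query it might run, wrapped in sqlplus here for testing from the shell (illustrative; the ME in the full post may be defined differently):

    # sample FRA usage the way a Metric Extension might (run as a DBA user)
    sqlplus -s / as sysdba <<'EOF'
    SELECT ROUND((space_used - space_reclaimable) / space_limit * 100, 2) AS pct_used
      FROM v$recovery_file_dest;
    EOF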

[Read More]

Tuesday Feb 18, 2014

Monitoring Archive Area % Used on Cluster Databases

One of the most critical events to monitor on an Oracle Database is your archive area. If the archive area fills up, your database will halt until it can continue to archive the redo logs. If your archive destination is set to a file system, then the Archive Area % Used metric is often the best way to go. This metric allows you to monitor a particular file system for the percentage space that has been used. However, there are a couple of things to be aware of for this critical metric.

Cluster Database vs. Database Instance

You will notice that in EM 12c, the Archive Area metric exists on both the Cluster Database and the Database Instance targets. The majority of Cluster Databases (RAC) are built against database best practices, which indicate that the archive destination should be shared read/write between all instances. The purpose of this is that, in case of recovery, any instance can perform the recovery and has all the necessary archive logs to do so. Monitoring this destination at the instance level caused duplicate alerts and notifications, as each instance would hit the Warning/Critical threshold for Archive Area % Used within minutes of the others. To eliminate the duplicate notifications, the Archive Area % Used metric for Cluster Databases was introduced. This allows the archive destination to be monitored at the database level, much like tablespaces are monitored in a RAC database.

In the Database Instance (RAC Instance) target, you will notice the Archive Area % Used metric collection schedule is set to Disabled.

If you have a RAC database and you do not share archive destinations between instances, you will want to disable the Cluster Database metric and enable the Database Instance metric, to ensure that each destination is monitored individually.
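To confirm whether your instances actually share an archive destination (and therefore which of the two metrics to enable), a query along these lines can help (a sketch, wrapped in sqlplus for convenience):

    # list each instance's active archive destinations for the RAC database
    sqlplus -s / as sysdba <<'EOF'
    SELECT inst_id, dest_name, destination, status
      FROM gv$archive_dest
     WHERE status = 'VALID';
    EOF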

Tuesday Mar 12, 2013

Monitoring virtualization targets in Oracle Enterprise Manager 12C

Contributed by Sampanna Salunke, Principal Member of Technical Staff, Enterprise Manager

For monitoring any target instance in Oracle Enterprise Manager 12C, you would typically go to target home page, and click on the target menu to navigate to:

  • Monitoring->All Metrics page to view all the collected metrics
  • Monitoring->Metric and Collection Settings to set thresholds and/or modify collection frequencies of metrics
The thresholds and collection frequencies modified this way affect only the target instance you are making changes to.

However, some virtualization targets need to be monitored and managed differently due to changes in the way data is collected and thresholds/collection frequencies are applied. Such target types include:

  • Oracle VM Server
  • Oracle VM Guest

As an optimization to minimize the number of connections made to Oracle VM Manager to collect data for virtualization targets, the performance metrics for Oracle VM Server and Oracle VM Guest targets are “bulk-collected” at the Oracle VM Server Pool level. This means that thresholds and collection frequencies for Oracle VM Server and Oracle VM Guest metrics need to be set on the Oracle VM Server Pool that they belong to. For example, if a user wants to set thresholds on the “Oracle VM Server Load:CPU Utilization” metric for an Oracle VM Server target, the sequence of steps is:

1. Navigate to the homepage of the Oracle VM Server Pool target that the Oracle VM Server target belongs to

2. Click on the target menu->Monitoring->Metric and Collection Settings

3. Expand the view to “All Metrics” if necessary, find the “Oracle VM Server Load” metric, and change the thresholds or collection frequency of "CPU Utilization" as required.

Note that any change made at the Oracle VM Server Pool level for a “bulk collected” metric affects all the targets in the server pool for which the metric is applicable. In this example, since the user modified the “Oracle VM Server Load: CPU Utilization” threshold, the change is applied to all the Oracle VM Server targets in the server pool sg-pool1.

To summarize: the difference between “traditional” monitoring and “bulk-collected” monitoring is that the thresholds and collection frequencies of bulk-collected metrics are modified at the parent target, and the changes are applied to all the child targets for which the metrics are applicable. However, data and alerts continue to appear as normal against each child target.


Thursday Nov 03, 2011

Alert Monitoring and Problem Notification in Oracle Enterprise Manager Ops Center

Oracle Enterprise Manager Ops Center provides full lifecycle management of your Oracle hardware and operating systems, including your virtual environments. A significant portion of any given asset's lifecycle is spent in daily operations and when things are running smoothly, there isn't much for an administrator to do. When things go awry, it's critical to know what happened and why as quickly as possible. Oracle Enterprise Manager Ops Center provides alert monitoring and problem notification and management capabilities to enable you to do just that. I'll walk you through a quick and simple example of how you can use these features and hopefully it will spark ideas of how you can implement even more interesting solutions using the same basic steps.

The first step is to tune your monitoring rules. Each type of asset will have a default set of monitoring rules that are applied when the asset is first managed. Rules can be managed on individual assets via their Monitoring tab, or by applying Monitoring Profiles to individual assets or groups of assets. Monitoring rules can be configured to raise alerts when, for example, a monitored attribute exceeds a threshold value for a selectable period of time. For more details on how to configure your monitoring rules, please see section 9 of the Advanced User's Guide, available by clicking on the Help link from within the browser user interface. If you update monitoring rules in a Monitoring Profile, be sure to apply that profile to your desired assets in order to make it affect their monitoring rules. For this example I have set a very short window for the CPU Usage attribute to generate an alert after only 1 minute of high CPU utilization, as shown in the screenshot below.

When an alert is generated, a new problem will be created if none is already open for the issue. Otherwise the alert will be added to an existing problem. Problems aggregate alerts and annotations together and provide the opportunity to assign and track resolution. Any users who have their Notification Profile defined to receive notification of the problem will get an email or page with the pertinent details. The image below shows how you might specify to have the root user subscribe to get email notification of all WARNING or higher level problems.

Problems can be managed holistically from the Message Center in the top of the left-hand navigation panel or they can be viewed for individual assets by selecting the Problems tab. When looking at an open problem, icons along the top allow you to see existing alerts and annotations, to add an annotation, to assign the problem to a user or to take action on the problem, as shown in the screenshot below.

Annotations can be simple textual comments or suggested actions which can include the execution of an existing Operational Plan. For more detail on how to use Operational Plans, see section 11 of the Advanced User's Guide. For this example, I created a simple Operational Plan to execute a prstat. Be sure to select the appropriate Subtype, in this case a Global Zone.
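The Operational Plan in this example simply wraps a prstat. A minimal sketch of such a plan's shell script (the exact flags are my choice, not taken from the original post):

    #!/bin/sh
    # one-shot snapshot of the top 20 processes sorted by CPU usage (Solaris prstat)
    prstat -s cpu -n 20 1 1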

When adding an annotation to a problem, you can optionally select the checkbox at the bottom of the window in order to save that annotation to the Problems Knowledge Base and associate it to future problems of the same type and severity as shown below.

When an annotation has been saved to the Problems Knowledge Base, it can be edited to include additional severities and can also be changed to execute automatically when a future problem is initially created, as shown below. For more detail on the Problems Knowledge Base, please refer to section 10 of the Advanced User's Guide.

When a new problem is detected, the newly added Automated Action will execute the associated Operational Plan and attach the output as an annotation to the problem. To demonstrate this in action, I executed several 'dd' commands on the host to force excessive CPU usage. In this case, the prstat output shows the high CPU usage of the processes that were running at the time that the alert was generated, even though they lasted only a few minutes.
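For reference, CPU load of that kind can be generated with something like the following (illustrative; each dd pegs roughly one CPU, so run a few in parallel and kill them when done):

    # burn CPU by copying /dev/zero to /dev/null in the background
    dd if=/dev/zero of=/dev/null &
    dd if=/dev/zero of=/dev/null &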

This is clearly a simple example and would not suffice to capture very short-lived processes, but it illustrates the possibilities available. The automated action could have been a more in-depth data gathering script utilizing DTrace, or could even have made system changes, depending on the real scenario it was built to address. I hope this quick walk-through has provoked thoughts of how you might implement Alert Monitoring and Problem Notification and Management in your enterprise using Oracle Enterprise Manager Ops Center.


