Enterprise Manager 12c Agent Home Page

My first blog post focused on the new architecture of the Oracle Enterprise Manager 12c agent. Now I will start a series of posts that discuss various issues. This entry covers the tools available for end users to self-diagnose issues and keep track of the agent's key performance metrics.

To start with, the agent maintains detailed statistics around each operation being performed and can report on these automatically in a top 'N' style report hourly. This is called the Top Metric Report and can be enabled via the property topMetricReporter=true in the emd.properties file. Once enabled, reports are automatically produced hourly and can also be requested on demand. Reports are maintained for a period of 24 hours.

The reports are stored as XML in the agent state home (under $EMSTATEDIR/sysman/emd/topMetrics). They are also available from the agent metric browser, where a new report can be requested on demand.
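As a rough illustration of working with that directory, the sketch below lists report files written in the last 24 hours, newest first. It relies only on the path given above. The function name `recent_reports` is hypothetical, and nothing is assumed about the report file names or their XML layout.

```python
import os
import time
from pathlib import Path


def recent_reports(state_dir, max_age_hours=24.0, now=None):
    """Return files under <state_dir>/sysman/emd/topMetrics, newest first.

    Only files modified within the last `max_age_hours` are returned,
    mirroring the 24-hour retention described in the post.
    """
    report_dir = Path(state_dir) / "sysman" / "emd" / "topMetrics"
    if not report_dir.is_dir():
        return []
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    files = [
        p for p in report_dir.iterdir()
        if p.is_file() and p.stat().st_mtime >= cutoff
    ]
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    # EMSTATEDIR is the agent state home referenced in the post.
    state_dir = os.environ.get("EMSTATEDIR")
    if state_dir:
        for path in recent_reports(state_dir):
            print(path)
```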

The report provides detailed information about the top CPU-consuming targets and/or metrics, enabling customers to decide whether collections are occurring too frequently. Customers can then modify the Oracle default collection frequencies.

The agent home page in the Enterprise Manager console has been redesigned to provide specific performance information about the agent. There are three main panels to review.

The first is the performance panel. It contains 3 key performance indicators (KPIs) of the agent. They are:

  • Upload transfer time for 1KB measured in milliseconds (this is the graphed value)

  • Time spent on average per collection as a percentage of the declared collection interval

  • Agent heartbeat round trip time measured in milliseconds.

An example is shown below.

The upload transfer time is simply a measure of how long, on average, it's taking to perform the round trip between the agent and the loading sub-system. The time includes not only the HTTP communication time but also the time to load into the repository.

The % of collection interval measures, on average, how long metrics are taking to collect as a percentage of their declared interval. For instance, if a metric takes 1 second to collect and its collection interval is 30 seconds, then the percentage is 1/30 or about 3%. The rolled-up number is computed across all running metrics and is re-evaluated by the agent every minute. The expectation is that the overall percentage will be low. If targets experience issues that cause collections to take longer, or the agent itself is not performing well, the percentage should increase.
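The arithmetic can be sketched in a few lines. The per-metric calculation follows the 1/30 example above. How the agent actually rolls the values up is not documented here, so the simple mean in `rolled_up_percentage` is an assumption, and both function names are hypothetical.

```python
def collection_percentage(collect_seconds, interval_seconds):
    """Time spent collecting, as a percentage of the declared interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return 100.0 * collect_seconds / interval_seconds


def rolled_up_percentage(metrics):
    """Average the per-metric percentages.

    `metrics` is an iterable of (collect_seconds, interval_seconds) pairs.
    A plain mean is an assumption; the agent may weight metrics differently.
    """
    percentages = [collection_percentage(c, i) for c, i in metrics]
    if not percentages:
        return 0.0
    return sum(percentages) / len(percentages)


if __name__ == "__main__":
    # The example from the text: 1 second against a 30 second interval.
    print(round(collection_percentage(1, 30), 1))
```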

Finally the heartbeat round trip is a measure similar to the upload in concept – a measurement of how long it's taking to send a certain amount of data; but given that these heartbeats (or pings) are typically constant in size, the measure is just the total round trip time.

If either the upload or ping times are increasing, this could indicate a network performance issue or a performance issue in the Oracle Management Server itself.

The second panel is the usage panel. It also contains 3 KPIs:

  • Number of requests the console makes of the agent (called Dispatched Actions) (this is the graphed data set)

  • Number of collections scheduled to be run for the next hour.

  • Number of jobs currently running on the agent

The usage panel reflects the work being requested of the agent, both from console actions (real-time metric requests, for example, or jobs being submitted) and from the agent's own work queue (i.e. the number of collections to be run in the next hour).



The final panel is the resource panel. There are 5 KPIs here:

  • percentage of the Java heap in use (graphed)

  • approximate CPU usage (graphed)

  • current megabytes of the Java heap in use

  • the maximum number of megabytes of the Java heap used

  • a rolling load average of the CPU usage

First, let me explain the Java heap statistics. The heap is sized to a specified maximum. The percentage of the heap used is a measure against that maximum. Current represents the amount currently in use, and the maximum is the highest recorded value (not the absolute declared maximum). For instance, if the declared maximum is 100MB and the maximum ever observed is 80MB, then at some point in time 80% of the available heap was in use. It's fully expected that the percentage of the heap would follow the classic 'saw-tooth' pattern seen below. Java virtual machines allocate memory from the heap and then run garbage collection (GC) to free up space. If the trend line is flat, then no leaks are evident. It's recommended that the maximum ever observed be no greater than 95% of the declared maximum, as anything higher could lead to either excessive GC or out of memory exceptions.
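To make the relationship between the three heap numbers concrete, here is a minimal sketch that derives the percentages and applies the 95% guideline. The function `heap_summary` and its parameter names are hypothetical; the agent computes these values internally and this is not its implementation.

```python
def heap_summary(current_mb, peak_mb, declared_max_mb, warn_ratio=0.95):
    """Summarize heap usage against the declared maximum heap size.

    current_mb      -- heap currently in use
    peak_mb         -- highest heap usage ever observed
    declared_max_mb -- the configured maximum heap size
    warn_ratio      -- the 95% guideline from the text
    """
    if declared_max_mb <= 0:
        raise ValueError("declared_max_mb must be positive")
    return {
        "percent_in_use": 100.0 * current_mb / declared_max_mb,
        "peak_percent": 100.0 * peak_mb / declared_max_mb,
        "peak_too_high": peak_mb > warn_ratio * declared_max_mb,
    }


if __name__ == "__main__":
    # The example from the text: 100MB declared, 80MB peak observed.
    print(heap_summary(current_mb=50, peak_mb=80, declared_max_mb=100))
```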

The CPU statistics show the CPU currently in use and the 'load average': a rolling weighted average of the CPU consumed over the last 15 minutes.
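One way to picture a rolling weighted average is sketched below. The weighting the agent actually uses is not described in this post, so the linear weights (newer samples count more) and the one-sample-per-minute window of 15 are assumptions, and the class name `RollingLoadAverage` is hypothetical.

```python
from collections import deque


class RollingLoadAverage:
    """Weighted average over the most recent `window` CPU samples.

    Assumes one sample per minute, so window=15 covers 15 minutes.
    Weights grow linearly with recency; this is an illustrative choice.
    """

    def __init__(self, window=15):
        self.samples = deque(maxlen=window)  # oldest samples drop off

    def add(self, cpu_percent):
        self.samples.append(cpu_percent)

    def value(self):
        if not self.samples:
            return 0.0
        weights = range(1, len(self.samples) + 1)
        weighted = sum(w * s for w, s in zip(weights, self.samples))
        return weighted / sum(weights)


if __name__ == "__main__":
    avg = RollingLoadAverage()
    for sample in (10.0, 20.0):
        avg.add(sample)
    print(round(avg.value(), 2))
```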

My next post will continue on the theme of using additional services to diagnose current and/or potential issues with the agent.

