OS Analytics with Oracle Enterprise Manager (by Eran Steiner)
By Zeynep Koch-Oracle on Nov 13, 2012
Oracle Enterprise Manager Ops Center provides a feature called "OS Analytics". This feature allows you to get a better understanding of how the Operating System is being utilized. You can research the historical usage as well as real time data. This post will show how you can benefit from OS Analytics and how it works behind the scenes.
The recording of our call to discuss this blog is available here:
Here is a quick summary of what you can do with OS Analytics in Ops Center:
- View historical charts and real-time values of CPU, memory, network and disk utilization
- Find the top CPU and memory processes in real time or on a given day in the past
- Determine proper monitoring thresholds based on historical data
- Drill down into a process's details
Where to start
To start with OS Analytics, choose the OS asset in the tree and click the Analytics tab.
You can see the CPU utilization, Memory utilization and Network utilization, along with the current real-time top 5 processes in each category:
In the above screen, you can click each of the top 5 processes to see a more detailed view of that process. Here is an example of one of the processes:
One of the cool things is that you can see the process tree for this process along with its port bindings and open file descriptors.
Next, click the "Processes" tab to see real-time information about all the processes on the machine:
An interesting column is the "Target" column. If you configured Ops Center to work with Enterprise Manager Cloud Control, then the two products will talk to each other and Ops Center will display the correlated target from Cloud Control in this table. If you are only using Ops Center, this column will remain empty.
The "Threshold" tab is particularly helpful: you can view historical trends of different monitored values and, based on the graph, determine what the monitoring values should be:
You can ask Ops Center to suggest monitoring levels based on the historical values or you can set your own. The different colors in the graph represent the currently set levels: red for critical, yellow for warning and blue for information, allowing you to quickly see how they're positioned against real data.
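If you prefer to set your own levels, here is a minimal Python sketch of one way to derive them from historical samples. This is not how Ops Center computes its suggestions (that algorithm is not described in this post); the function name `suggest_thresholds` and the 90th/95th/99th percentile rule are assumptions chosen purely for illustration.

```python
from statistics import quantiles


def suggest_thresholds(history):
    """Return (information, warning, critical) levels from historical samples.

    Hypothetical rule, not Ops Center's: use the 90th, 95th and 99th
    percentiles of the observed values.
    """
    # n=100 yields 99 cut points; index k-1 holds the k-th percentile.
    cuts = quantiles(history, n=100, method="inclusive")
    return cuts[89], cuts[94], cuts[98]


# Hypothetical history: CPU utilization samples in percent.
cpu_history = [12, 15, 11, 40, 18, 22, 90, 17, 14, 35, 16, 13, 60, 19, 21]
info, warning, critical = suggest_thresholds(cpu_history)
print(f"information={info:.1f} warning={warning:.1f} critical={critical:.1f}")
```

Percentiles are only one option. The point is that the levels come from what the machine actually did, which is the same idea the graph in the "Threshold" tab lets you apply by eye.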
It's important to note that when looking at longer periods, Ops Center smooths out the data and uses averages. So when looking at values such as CPU Usage, try shorter time frames which are more detailed, such as one hour or one day.
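To see why the time frame matters, here is a small Python sketch with made-up numbers. It only illustrates the general effect of averaging; the bucket size and the sample values are assumptions and do not reflect how Ops Center actually aggregates its data.

```python
from statistics import mean


def downsample(samples, bucket_size):
    """Average consecutive samples in fixed-size buckets."""
    return [
        mean(samples[i:i + bucket_size])
        for i in range(0, len(samples), bucket_size)
    ]


# One hour of hypothetical per-minute CPU readings (percent):
# mostly idle at 10%, with a five-minute burst at 95%.
per_minute = [10.0] * 60
per_minute[30:35] = [95.0] * 5

hourly = downsample(per_minute, 60)

print("peak in the detailed view:", max(per_minute))
print("value after hourly averaging:", round(hourly[0], 1))
```

The burst is obvious in the per-minute data but nearly disappears once the hour is collapsed into a single average, which is why a spike can be hard to spot on a long time frame.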
Applying new monitoring values
When you first apply new values to monitored attributes, a popup will ask whether it's OK to take the asset out of its current Monitoring Policy. This is OK if you want to either have custom monitoring for a specific machine, or if you want to use this machine as a "Gold image" and extract a Monitoring Policy from it. You can later apply the new Monitoring Policy to other machines and also set it as a default Monitoring Profile.
Once you're done applying the different monitoring values, you can review and change them in the "Monitoring" tab. You can also click "Extract a Monitoring Policy" in the actions pane on the right to save all the new values to a new Monitoring Policy, which can then be found under "Plan Management" -> "Monitoring Policies".
Visiting the past
Under the "History" tab you can "go back in time". This is very helpful when you know that a machine was busy a few hours ago (perhaps in the middle of the night?), but you were not around to take a look at it in real time. Here's a view into yesterday's data on one of the machines:
You can see an interesting CPU spike happening at around 3:30 am along with some memory use. In the bottom table you can see the top 5 CPU and Memory consumers at the requested time. Very quickly you can see that this spike is related to the Solaris 11 IPS repository synchronization process using the "pkgrecv" command.
The "time machine" doesn't stop here - you can also view historical data to determine which of the zones was the busiest at a given time:
Under the hood
The data collected is stored on each of the agents under /var/opt/sun/xvm/analytics/historical/
- An "os.zip" file exists for the main OS. Inside you will find many small text files, named after the Epoch time stamp at which they were collected
- If you have any zones, there will be a file called "guests.zip" containing the same small files for all the zones, as well as a folder with the name of the zone along with "os.zip" in it
- If this is the Enterprise Controller or the Proxy Controller, you will have folders called "proxy" and "sat" in which you will find the "os.zip" for that controller
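If you want to peek inside one of these archives, the following Python sketch lists the entries and turns their names into readable time stamps. It assumes the entry names are Epoch time stamps in seconds, optionally followed by an extension; the post does not spell out the exact naming, so treat that as a guess. The function name `list_samples` is made up for this example.

```python
import zipfile
from datetime import datetime, timezone
from pathlib import PurePosixPath


def list_samples(zip_path):
    """Return (UTC datetime, entry name) pairs for epoch-named entries, oldest first."""
    samples = []
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            # Assumption: the file name (minus any extension) is epoch seconds.
            stem = PurePosixPath(name).stem
            if not stem.isdigit():
                continue  # skip folders and anything not named after a time stamp
            taken_at = datetime.fromtimestamp(int(stem), tz=timezone.utc)
            samples.append((taken_at, name))
    return sorted(samples)
```

On an agent you could call `list_samples("/var/opt/sun/xvm/analytics/historical/os.zip")` and print each pair to find the samples taken around the time you care about. It is safest to work on a copy of the archive rather than the live file.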
The actual script collecting the data can be viewed for debugging purposes as well:
- On Linux, the location is: /opt/sun/xvmoc/private/os_analytics/collect
If you would like to redirect all the standard error into a file for debugging, touch the following file and the output will go into it:
# touch /tmp/.collect.stderr
The temporary data is collected under /var/opt/sun/xvm/analytics/.collectdb until it is zipped.
If you would like to review the properties for the Analytics, you can view those per each agent in
I hope you find this helpful! Please post questions in the comments below.