Oracle Enterprise Manager Ops Center provides a feature called "OS Analytics". This feature gives you a better understanding of how the operating system is being utilized. You can research historical usage as well as real-time data. This post shows how you can benefit from OS Analytics and how it works behind the scenes.
The recording of our call to discuss this blog is available here:
Here is a quick summary of what you can do with OS Analytics in Ops Center:
Where to start
To start with OS Analytics, choose the OS asset in the tree and click the Analytics tab.
You can see the CPU utilization, memory utilization and network utilization, along with the current real-time top 5 processes in each category:
In the above screen, you can click each of the top 5 processes to see a more detailed view of that process. Here is an example of one of the processes:
One of the cool things is that you can see the process tree for this process along with its port bindings and open file descriptors.
On Solaris machines with zones, you get an extra level of tabs, allowing you to get more information on the different zones:
This is a good way to see the busiest zones. For example, one zone may not take a lot of CPU but it can consume a lot of memory, or perhaps network bandwidth. To see the detailed Analytics for each of the zones, simply click each of the zones in the tree and go to its Analytics tab.
Next, click the "Processes" tab to see real-time information on all the processes on the machine:
An interesting column is the "Target" column. If you configured Ops Center to work with Enterprise Manager Cloud Control, the two products talk to each other and Ops Center displays the correlated target from Cloud Control in this table. If you are only using Ops Center, this column remains empty.
Next, if you view a Solaris machine, you will have a "Services" tab:
By default, all services will be displayed, but you can choose to display only certain states, for example, those in maintenance or the degraded ones. You can highlight a service and choose to view the details, where you can see the Dependencies, Dependents and also the location of the service log file (not shown in the picture as you need to scroll down to see the log file).
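If you prefer a terminal, the standard Solaris SMF command `svcs` reports roughly the same details from the command line. This is only a sketch of a command-line equivalent, not what Ops Center runs internally; the function name `service_report` and the example FMRI are placeholders chosen for illustration.

```shell
# Print SMF details for one service; prints a notice on systems without svcs.
service_report() {
    fmri=$1
    if type svcs >/dev/null 2>&1; then
        svcs -x            # explain services that are enabled but not running
        svcs -l "$fmri"    # long listing: state, log file, dependencies
        svcs -d "$fmri"    # services this one depends on
        svcs -D "$fmri"    # services that depend on this one
    else
        echo "svcs not found; this sketch only applies to Solaris"
    fi
}

# Example FMRI (placeholder); substitute the service you are investigating.
service_report svc:/network/ssh:default
```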
The "Threshold" tab is particularly helpful. You can view historical trends of different monitored values and, based on the graph, determine what the monitoring values should be:
You can ask Ops Center to suggest monitoring levels based on the historical values or you can set your own. The different colors in the graph represent the currently set levels: red for critical, yellow for warning and blue for information, allowing you to quickly see how they're positioned against real data.
It's important to note that for longer periods, Ops Center smooths out the data and uses averages. So for values such as CPU usage, try shorter time frames, which are more detailed, such as one hour or one day.
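To see why averaging over a long period can hide a short spike, here is a small illustration with made-up CPU percentages. It is not Ops Center's actual smoothing logic, just a sketch of the effect; the `summarize` function and the sample values are invented for this example.

```shell
# Print the average and the maximum of the numeric samples passed as arguments.
summarize() {
    printf '%s\n' "$@" |
        awk '{ sum += $1; if ($1 > max) max = $1 }
             END { printf "avg=%.1f max=%d\n", sum / NR, max }'
}

# Six hypothetical CPU samples with one short spike to 95%.
summarize 10 12 11 95 13 9
```

The average comes out at 25, so a graph built from averages would show a quiet machine even though one sample hit 95.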
Applying new monitoring values
When you first apply new values to monitored attributes, a popup asks whether it's OK to take the machine out of its current Monitoring Policy. This is OK if you want to either have custom monitoring for a specific machine, or if you want to use this machine as a "Gold image" and extract a Monitoring Policy from it. You can later apply the new Monitoring Policy to other machines and also set it as a default Monitoring Profile.
Once you're done applying the different monitoring values, you can review and change them in the "Monitoring" tab. You can also click "Extract a Monitoring Policy" in the actions pane on the right to save all the new values to a new Monitoring Policy, which can then be found under "Plan Management" -> "Monitoring Policies".
Visiting the past
Under the "History" tab you can "go back in time". This is very helpful when you know that a machine was busy a few hours ago (perhaps in the middle of the night?), but you were not around to take a look at it in real time. Here's a view into yesterday's data on one of the machines:
You can see an interesting CPU spike happening at around 3:30 am along with some memory use. In the bottom table you can see the top 5 CPU and Memory consumers at the requested time. Very quickly you can see that this spike is related to the Solaris 11 IPS repository synchronization process using the "pkgrecv" command.
The "time machine" doesn't stop here - you can also view historical data to determine which of the zones was the busiest at a given time:
Under the hood
The data collected is stored on each of the agents under /var/opt/sun/xvm/analytics/historical/
The actual script collecting the data can be viewed for debugging purposes as well:
If you would like to redirect all the standard error into a file for debugging, touch the following file and the output will go into it:
# touch /tmp/.collect.stderr
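Once the file is in place, you can check on it with ordinary shell tools. A minimal sketch, assuming you run it as root on the agent; only the file path comes from this post, the rest is standard shell.

```shell
# Path taken from the post; stderr from the collection script goes here once the file exists.
STDERR_FILE=/tmp/.collect.stderr

# Create the file if it is missing.
touch "$STDERR_FILE"

# Confirm it exists and see whether its size has grown.
ls -l "$STDERR_FILE"

# Show the last few messages; empty output means nothing has been written yet.
tail "$STDERR_FILE"
```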
The temporary data is collected under /var/opt/sun/xvm/analytics/.collectdb until it is zipped.
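To get a feel for how much data an agent is holding, you can inspect both directories with standard tools. This is just a sketch: the two paths are the ones mentioned in this post, while the `show_dir` helper is made up for the example.

```shell
# Base directory for the analytics data, as given in the post.
ANALYTICS_DIR=/var/opt/sun/xvm/analytics

# Print the size (in KB) and the contents of a directory, if it exists.
show_dir() {
    if [ -d "$1" ]; then
        du -sk "$1"
        ls -la "$1"
    else
        echo "not found: $1"
    fi
}

show_dir "$ANALYTICS_DIR/historical"
show_dir "$ANALYTICS_DIR/.collectdb"
```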
If you would like to review the properties for the Analytics, you can view those per each agent in
I hope you find this helpful! Please post questions in the comments below.