By Scott McNeil on Jul 30, 2012
To try and make the new features a bit more understandable, I’ll be writing a number of blog entries over the coming months to highlight just some of my favourite new features for EM12c. From an administrator’s perspective, one of those standout features (and the subject of today’s entry) has to be incident management.
The goal of incident management is to enable administrators to monitor and resolve service disruptions that may be occurring in their data centre as quickly and efficiently as possible. Instead of managing the numerous discrete individual events that may be raised as the result of any of these service disruptions, we want to manage a smaller number of more meaningful incidents, and to manage them based on business priority across the lifecycle of those incidents.
To do this, Enterprise Manager now provides a centralized incident console called Incident Manager that will enable the administrator to track, diagnose, and resolve incidents, as well as providing features to help rectify the root causes of recurrent incidents. Incident Manager also directly leverages Oracle’s own expertise via My Oracle Support knowledge base articles and documentation to enable administrators to accelerate the process of diagnosing and resolving incidents and problems. Finally, Incident Manager also offers the ability to do lifecycle operations for incidents, so you can assign ownership of an incident to a specific user, acknowledge an incident, set priority for an incident, track an incident’s status, escalate an incident or suppress it so you can defer it to a later time. You can also raise notifications on an incident or open a helpdesk ticket via the helpdesk connectors.
Enterprise Manager continues to be the primary tool for managing and monitoring the Oracle data center. It covers Oracle applications, the entire application stack from the presentation layer through middleware and databases down to hosts and the operating system, and non-Oracle technology as well. When Enterprise Manager detects issues in any of this infrastructure, it raises events. Sample events might be:
1. Metric alerts (for example, CPU utilization or tablespace usage alerts) where a critical threshold you set has been crossed
2. Job events – events are raised by the job system for the job statuses that you specify; for example, an event can be raised to signal the failure of a job.
3. Standards violations – if you are using compliance standards and any of the targets that are being monitored violate any of the compliance standards, then a standards violation event could be raised.
4. Availability events – if Enterprise Manager detects that a target is down, an availability event can be raised
5. Other events – there are other types of events that occur as well
All these events signal that particular issues have occurred in the managed data centre. As an administrator, you really want to be able to determine which of these events are significant. From these significant events, you then want to be able to correlate discrete events that are related to the same underlying issue, so that you only have to manage a smaller number of significant incidents.
An incident could then be defined as an object containing a significant event (such as a target being down, for example) or it could be a combination of events that all relate to the same issue (for example, running out of space could be detected by Enterprise Manager as separate events raised from the database, host and storage target types). For example, you may have a performance incident that amalgamates a number of performance events, another incident related to space, and a different incident based on availability problems.
Sound good? OK, so how do we do this? Well, events are significant occurrences in your IT infrastructure that Enterprise Manager detects and raises. Each event has a set of attributes: what type of event it is, the severity (fatal, critical and so on), the object or entity on which the event is raised (typically a target, but it can also be a job or some other object), the message associated with the event, the timestamp at which it occurred, and the functional category (such as availability or security).
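To make this list of attributes concrete, here is a minimal Python sketch of an event as a plain data record. The class and field names (`Event`, `Severity`, `entity`, `category` and so on) are hypothetical and are not Enterprise Manager's internal schema. The numeric ranking of severities, with Fatal as the worst, is an assumption made only so that severities can be compared.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import IntEnum


class Severity(IntEnum):
    """Event severities; a larger value means a worse event (assumed ranking)."""
    INFORMATIONAL = 1
    ADVISORY = 2
    WARNING = 3
    CRITICAL = 4
    FATAL = 5


@dataclass(frozen=True)
class Event:
    """Hypothetical in-memory model of the event attributes described above."""
    event_type: str       # e.g. "Target availability", "Metric alert"
    severity: Severity
    entity: str           # usually a target name, but could be a job or other object
    message: str
    timestamp: datetime
    category: str         # functional category, e.g. "Availability", "Security"


# Example: the availability event for a database that is down.
db_down = Event(
    event_type="Target availability",
    severity=Severity.FATAL,
    entity="DB1",
    message="The database is down",
    timestamp=datetime(2012, 7, 30, 9, 15),
    category="Availability",
)
print(db_down.severity.name, db_down.entity)
```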
Some examples of the different types of events include:
· Target availability: raised when a target is down or has gone into an agent unreachable state.
· Metric alert: raised when a metric crosses its threshold.
· Job status change: raised, for example, when a job fails.
· Compliance standard rule: raised when a compliance standard rule is violated.
· Metric evaluation: raised when there is an error with the evaluation of a metric.
· Other events such as SLA Alert, High Availability and Compliance Standard Score violation can also be raised, and of course, users can cause an event to be raised.
Associated with these event types are event severities. The first of these, “Fatal”, is a new severity level in Enterprise Manager specifically associated with the target availability event type for when the target is down. Critical and warning events have the same meaning as they had in previous releases, and then we have the Advisory level. Typically, this is associated with non-service-impacting events such as compliance standard violation events. The informational level is an event severity used to indicate simply that an event has occurred, but there is no need to do anything about it.
As we discussed previously, an actual incident will contain one or more events. Let’s look at the details of an incident with one event. For example, Figure 1 shows us an availability event:
Figure 1: Incident with one event
The event signals that the database DB1 is down and includes a timestamp of when the event was raised. Because this is a target availability event and the database is down, the severity is marked as Fatal. An incident can be created for that event, so the incident contains only one event. In order to manage and track the resolution of the incident, the incident has other attributes such as owner (the Enterprise Manager user that is working on the incident), status, incident severity (which is based on the event severity), priority and a comment field.
Many incidents will instead contain multiple events, where those events are related and point to the same underlying cause. In the example shown in Figure 2, we have two metric alert events on a host target, a memory utilization metric alert event and a CPU utilization metric alert event, raised because the host is starting to suffer from heavy load. First a warning severity memory utilization metric alert event is raised, and a short time later a critical severity CPU utilization metric alert event.
An incident can be created containing both events in order to manage and track the resolution of the incident. In the current release, the administrator needs to manually combine events into an incident in the Enterprise Manager console (the automatic grouping of related events into an incident is a future enhancement). Again, the incident has the same additional attributes as in the previous example. Enterprise Manager automatically assigns the incident severity based on the worst-case severity of all the events contained in the incident. Since the worst event severity is Critical, the incident severity is also set to Critical. Finally, the incident has a summary, which is a short description of what the incident is about. The individual events indicate that the machine load is high, so you can set the summary to say that. Alternatively, you can set the incident summary to be the same as the event messages.
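The severity rule described above (an incident takes the worst severity among its events) can be sketched in a few lines of Python. This is an illustrative model only, assuming severities can be ranked from Informational up to Fatal; `Incident`, `Event` and the placeholder `status` and `priority` defaults are hypothetical names, not an Enterprise Manager API.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List, Optional


class Severity(IntEnum):
    # Assumed ranking: a higher value is a worse severity.
    INFORMATIONAL = 1
    ADVISORY = 2
    WARNING = 3
    CRITICAL = 4
    FATAL = 5


@dataclass
class Event:
    message: str
    severity: Severity


@dataclass
class Incident:
    summary: str
    events: List[Event] = field(default_factory=list)
    owner: Optional[str] = None   # Enterprise Manager user working on the incident
    status: str = "New"           # placeholder lifecycle value
    priority: str = "None"        # placeholder priority value

    @property
    def severity(self) -> Severity:
        # Worst-case severity of all contained events.
        return max(e.severity for e in self.events)


# The host-load example: a warning memory alert followed by a critical CPU alert.
host_load = Incident(
    summary="Machine load is high",
    events=[
        Event("Memory utilization is high", Severity.WARNING),
        Event("CPU utilization is high", Severity.CRITICAL),
    ],
)
print(host_load.severity.name)
```

Computing the severity with `max()` over an ordered enum keeps the rule in one place. An incident with no events would raise `ValueError` here, which seems acceptable since an incident always contains at least one event.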
If you are using one of the helpdesk connectors to interface to a helpdesk system, an incident might also result in a helpdesk ticket which can allow the helpdesk analyst to work on the ticket. Within Enterprise Manager, we’ll be able to track both the ticket number and the status of that particular ticket.
A problem is the underlying root cause of an incident. In Enterprise Manager terms, a problem is specifically related to either an Automatic Diagnostic Repository (ADR) incident or Oracle software incident. Enterprise Manager will automatically create a problem whenever it detects an ADR incident has been raised. An ADR incident can be thought of as a critical Oracle software problem where the resolution of the software problem typically involves contacting Oracle Support, opening a service request and possibly receiving a patch for that problem.
Whenever an ADR incident is raised, we generate one incident in Enterprise Manager for that ADR incident, and we also automatically generate a problem as well. All the ADR incidents that have the same problem signature (that is, the same root cause) will be linked into a single problem object. The administrator can manage the problem in Incident Manager in the same way as you would manage an incident, so you can assign an owner to the problem, track the resolution and so on. In addition, there are in-context links to Support Workbench functionality which allows the administrator to package the diagnostic material, open a service request and view the status of diagnostic activity such as the SR number and ultimately bug number (if one is generated) within the user interface.
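Linking ADR incidents into problems by problem signature is essentially a group-by operation. The sketch below assumes each ADR incident can be reduced to an ID plus a signature string; the `AdrIncident` class, the incident IDs and the signature strings are hypothetical placeholders, not real ADR data or an Enterprise Manager API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AdrIncident:
    incident_id: int
    problem_signature: str  # identifies the root cause; placeholder strings below


def group_into_problems(incidents: List[AdrIncident]) -> Dict[str, List[int]]:
    """Link every ADR incident that shares a problem signature into one problem."""
    problems: Dict[str, List[int]] = defaultdict(list)
    for inc in incidents:
        problems[inc.problem_signature].append(inc.incident_id)
    return dict(problems)


# Hypothetical data: two incidents with the same signature and one unrelated incident.
adr_incidents = [
    AdrIncident(101, "ORA 600 [arg1]"),
    AdrIncident(102, "ORA 600 [arg1]"),
    AdrIncident(103, "ORA 7445 [arg2]"),
]
for signature, ids in group_into_problems(adr_incidents).items():
    print(signature, ids)
```

Each key in the result corresponds to one problem object. The two incidents with the same signature end up in the same problem, mirroring the two ORA-600 incidents in the example that follows.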
Figure 3 shows a diagrammatic example of how incidents and problems are related. Two ADR incidents have occurred; in this example, two ORA-600 errors have occurred in my database. Both of these incidents are of critical severity. Enterprise Manager automatically creates a problem containing those incidents. Within the Incident Manager interface, you can link to the Support Workbench to open a service request, which you can then track from Incident Manager.
Figure 3: Incidents and problems
So now that you understand the terminology and the relationships between these terms, what’s next? Well, the next thing to understand is how you deal with these incidents. That will be the topic of my next blog post, so stay tuned for more!
Contributed by Pete Sharman, Principal Product Manager, Oracle Enterprise Manager
Oracle Enterprise Manager Ops Center 12c Update 1 was released earlier this month. Eran Steiner, Technical Architect, Oracle Enterprise Manager, adds some additional information and best practices about upgrading to Ops Center 12c Update 1 in this blog. Eran hosted a call to provide an overview of Oracle Enterprise Manager Ops Center 12c Update 1 and answer any questions. The recording of this call is available here, and the presentation can be downloaded here.
Each month the Enterprise Manager team puts together a full line-up of great articles, content and technical product information around a particular theme. This month is no different. In July, we’re focusing on lifecycle management and all the things you need to know about this important topic. Read the July newsletter to get the latest articles and content on lifecycle management. Read it now!
The July edition covers:
So stay tuned as we bring you more feature articles, and more webcasts throughout the month.
Subscribe here to receive the Oracle Enterprise Manager 12c Newsletter.
Over the last few months, the Oracle Enterprise Manager team has released Enterprise Manager 12c Bundle Patch 1 (a.k.a. BP1) for Oracle Enterprise Manager Cloud Control 12c (EM 12c) on all the supported platforms (Customer announcement). BP1 is a mandatory patch because all future patches will assume its presence. BP1 includes several critical fixes and therefore touches almost every component of Enterprise Manager, so applying it to an existing environment is a multi-step process that can be tricky. Based on our recent experience applying BP1 to an internal production demo site with over a thousand targets, we would like to share the following tips and tricks.
1. Applying incremental Bundle Patch or using full install?
Unless you have EM 12c running in a production environment where you can’t afford to lose the existing management repository, it is highly recommended to reinstall EM using the patched EM Base Platform 220.127.116.11 Full Installer (with BP1) instead of applying BP1. For instance, if you have a test or “sandbox” environment, even one with a substantial number of targets, it might be easier to reinstall the EM environment than to manually apply patches (including agent patches) and upgrade plug-ins. Note that reinstalling EM also requires reinstalling the target agents, so consider this option carefully before choosing it.
2. Plan it well
3. EM 12c Backup is critical
4. Upgrade plug-ins in bulk using emcli
BP1 comes with the 18.104.22.168 release of plug-ins, and it is highly recommended to upgrade plug-ins together with the application of BP1. In a typical environment you may need to upgrade more than 15 plug-ins. For some plug-ins, the upgrade process requires a restart of the OMS(s) and can therefore take a good deal of time (almost fifteen minutes for a single plug-in upgrade). Hence, upgrading plug-ins one by one via the UI can be a time-consuming process requiring manual intervention every few minutes.
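The timing argument above is simple multiplication. As a rough sketch, assuming every plug-in upgrade needs its own OMS restart of about fifteen minutes (in practice only some plug-ins do, so this is an upper bound), the one-by-one approach scales linearly with the number of plug-ins. The function name is hypothetical.

```python
def one_by_one_minutes(plugin_count: int, minutes_each: float = 15.0) -> float:
    """Upper-bound estimate when each plug-in is upgraded in its own OMS restart window.

    Assumes every upgrade costs about `minutes_each` minutes, per the rough
    fifteen-minute figure quoted above.
    """
    return plugin_count * minutes_each


print(one_by_one_minutes(8))    # eight plug-ins
print(one_by_one_minutes(15))   # a typical environment with 15 plug-ins
```

For eight plug-ins this gives about 120 minutes, in line with the two-plus hours quoted below for the one-by-one approach, whereas a single bulk window took under 30 minutes.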
Use the EM command line interface (emcli) verb ‘deploy_plugin_on_server’ to deploy multiple plug-ins in one go in an automated manner. Since multiple plug-ins are upgraded in a single OMS downtime window, this is a far more efficient process. In our environment, using the command line, we could upgrade eight plug-ins in under 30 minutes, which otherwise would have taken more than two hours. Make sure to run emcli in prerequisite check mode before doing the actual deployment, as shown in the screenshot below.
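If you script the bulk upgrade, it can help to build the emcli invocation programmatically so that the prerequisite check run and the real run use exactly the same plug-in list. The Python sketch below only assembles the command line; it does not execute anything. It assumes that `deploy_plugin_on_server` accepts a semicolon-separated `-plugin` list of `id:version` pairs and a `-prereq_check` switch, so please confirm the exact syntax with `emcli help deploy_plugin_on_server` in your release. The function name is hypothetical, the plug-in IDs are examples, and `VERSION` is a placeholder for the plug-in release you are deploying.

```python
import shlex
from typing import List, Sequence, Tuple


def build_deploy_command(
    plugins: Sequence[Tuple[str, str]], prereq_check: bool
) -> List[str]:
    """Assemble an emcli argv that deploys several plug-ins in one OMS downtime window.

    Assumed syntax: -plugin="id:version;id:version" plus an optional -prereq_check.
    """
    plugin_arg = ";".join(f"{plugin_id}:{version}" for plugin_id, version in plugins)
    argv = ["emcli", "deploy_plugin_on_server", f"-plugin={plugin_arg}"]
    if prereq_check:
        argv.append("-prereq_check")
    return argv


# Example plug-in IDs; VERSION is a placeholder, not a real release number.
PLUGINS = [("oracle.sysman.db", "VERSION"), ("oracle.sysman.emas", "VERSION")]

# Run the prerequisite check first, then the real deployment with the same list.
for check_only in (True, False):
    print(shlex.join(build_deploy_command(PLUGINS, prereq_check=check_only)))
```

Returning an argument list rather than a single string lets you pass it straight to `subprocess.run` without shell quoting issues. `shlex.join` (Python 3.8+) is used only to print a copy-pasteable version.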
Also note that if your OMS is on the Linux platform, you’ll need to apply one-off patch 13638422 after applying BP1 on the OMS to get the ‘deploy_plugin_on_server’ emcli verb. For BP1 on other platforms (e.g. Windows), this patch is already included in BP1 itself.
5. From a Linux OMS, don’t push agents to other platforms without the necessary patches
A very common use case is deploying agents on non-Linux platforms from a Linux OMS with BP1. Since the Windows/Solaris agent software (with BP1) is available via Self Update, you might assume that all you need to do is push the Windows/Solaris agents from the BP1 Linux OMS, but that’s not the right approach. You first have to apply a few patches on the OMS and plug-ins before pushing the agent, following the instructions in the Oracle® Enterprise Manager Bundle Patch 1 Application Guide.
6. Read the right documentation
Last but surely not the least, it is vital that you go through the Oracle® Enterprise Manager Bundle Patch 1 Application Guide, before you begin with the BP1 process. This guide provides step by step instructions for applying BP1 including pre-requisite checks, recommendations, and troubleshooting steps.
In addition to the documentation, we suggest referring to the following key resources published by Oracle.
These few simple tips can make your experience with BP1 a lot smoother. In future entries, we’ll discuss best practices for major updates to Oracle Enterprise Manager 12c as they become available.
I would like to bring your attention to the online launch event on July 25 introducing Oracle Exalogic Elastic Cloud Software 2.0. One of the focus areas of the upcoming Exalogic release is enhanced manageability with Oracle Enterprise Manager.
Please join Oracle executives Hasan Rizvi, Cliff Godwin, Steve Wilson, Wim Coekaerts and Mohamad Afshar on July 25 for the launch of Oracle Exalogic Elastic Cloud Software 2.0. Oracle’s latest release of hardware and software is engineered to work together to run your business applications.
Learn how Oracle Exalogic Elastic Cloud Software 2.0 can help your company:
The interactive launch event will feature a panel discussion with Oracle executives and customer testimonials. Register now.
Following the announcement of Oracle Enterprise Manager Ops Center 12c on April 4th, we are happy to announce the release of Oracle Enterprise Manager Ops Center 12c update 1. This is a bundled patch release for Oracle Enterprise Manager Ops Center.
Here are the key features of the Oracle Enterprise Manager Ops Center 12c update 1:
This new release contains significant enhancements in the update provisioning, bare-metal OS provisioning, shared storage management, cloud/virtual datacenter, and networking management sections of the product. With this update, customers can achieve better handling of ASR faults, add networks and storage to virtual guests more easily, understand IPMP and VLAN configurations better, get a more robust LDAP integration, get virtualization-aware firmware patching, and observe improved product performance across the board. Customers can now accelerate Oracle VM SPARC and T4 deployments into production.
Oracle Enterprise Manager Ops Center 11g and Ops Center 12c customers will now notice that a new product update is available under the Administration tab within the Browser User Interface (BUI). The upgrade process is explained in detail in the Ops Center Administration Guide under “Chapter 10: Upgrading”. Please be sure to read that chapter and the Release Notes before upgrading.
During the week of July 9th, the full download of the product will be available from the Oracle Enterprise Manager Ops Center download website. Based on customer feedback, we have changed the updates to include the entire product, so customers no longer need to install Ops Center 12c and then upgrade to the update 1 release. They can simply install Ops Center 12c update 1 directly. Here are some of the resources that can help you learn more about Oracle Enterprise Manager Ops Center and the new update 1.
On or around July 1, 2012, Oracle became aware of an issue on Linux distributions resulting from the introduction of a leap second; this issue is causing problems for some customers. Leap seconds may be introduced at the end of June or December of a calendar year (as in 2012), as necessary to maintain time standards. Servers hosting Oracle products that are clients of an NTP (Network Time Protocol) server may be particularly susceptible to this issue as the NTP server is updated.
Linux distributions that may be affected include Oracle Enterprise Linux, Red Hat Enterprise Linux, Oracle VM and Oracle Unbreakable Enterprise Kernel. Asianux 2 and 3, based on RHEL 4 and 5, may also be affected. There has also been one report of high agent CPU on SLES11 being corrected using Note 1472421.1.
Not all customers will be affected, but those who are may observe higher than normal CPU consumption in their Linux environments where JVMs are used. In Oracle Enterprise Manager (EM), this problem can manifest itself as high CPU consumption by the EM Agent process (which runs on a JVM in EM 12c, for instance). It is possible that the OMS is also affected.
We would advise customers to review the description of this problem in MOS Note 1472651.1 and take action if they observe that their environment is affected.
Contributed by Andrew Bulloch, Director, Application Systems Management Products
Note – Oracle Enterprise Manager 12c DBaaS is platform agnostic and is designed to work on Exadata/non-Exadata, physical/virtual, and Oracle/non-Oracle infrastructure (hardware and OS) platforms; using Exadata as the base platform is not a mandatory requirement.
Database-as-a-Service (DBaaS) is an important trend these days, and the top business drivers motivating customers toward a private database cloud model include constant pressure to reduce IT cost and complexity, as well as the need to improve agility and quality of service.
For many enterprises, the first step in their journey towards cloud computing is to move to a consolidated and standardized environment. Because Exadata is already a proven, best-in-class consolidation platform, we are now seeing more and more customers evolve from an Exadata-based platform into an agile, self-service-driven private database cloud using Oracle Enterprise Manager 12c.
Together, Exadata Database Machine and Enterprise Manager 12c provide the industry’s most comprehensive and integrated solution for transforming a typical siloed environment into an enterprise-class database cloud with self-service, rapid elasticity and pay-per-use capabilities.
In today’s post, I’ll list the important steps to enable DBaaS on Exadata using Enterprise Manager 12c. These steps are based on a recent DBaaS implementation from a real customer engagement:
Congratulations! You just delivered a successful database cloud implementation project!
In future posts, we will cover these additional useful topics around database cloud –
More Information –