Wednesday Feb 25, 2015

Editing EM12c Jobs in Bulk

I recently received requests for suggestions on how to edit EM12c jobs in bulk. The two examples that were presented to me were:

  1. Change the start time of 70+ jobs
  2. Change the oracle home path in 80+ RMAN script jobs

So how do we do this?

Read More

Monday Feb 09, 2015

Take the IOUG Database Manageability Survey

The Independent Oracle Users Group (IOUG) is conducting its annual database manageability survey. If you are a database professional responsible for managing and operating your organization's database applications and systems, then the IOUG wants to hear from you.

Take the next few minutes to answer simple questions about your common manageability practices and challenges. Share your experience and insight with the IOUG and Oracle communities and you'll be entered into a drawing for a chance to win an Apple iPad from the IOUG. You will also receive a free copy of the final report.

Take the 2015 IOUG Database Manageability Survey


Stay Connected:
Twitter |
Facebook | YouTube | LinkedIn | Newsletter
Download the Oracle Enterprise Manager 12c Mobile app

Monday Jan 26, 2015

Register Now for the Cloud Platform Online Forum

January 28th, 2015 10:00 a.m. PST/1:00 p.m. EST
Register Today

Modern Business. Modern Cloud.
Is Your PaaS Delivering the Agility Your Users Demand?

Don't miss the opportunity to join Oracle and IDC in a series of deep discussions on Cloud Platform as a Service. This online event kicks off on Wednesday January 28th at 10:00 a.m. PST. Take advantage of 20+ sessions, 10+ demos and 100+ tips and techniques to plan your PaaS adoption effectively. Topics encompass private, public and hybrid cloud. You will also learn how to plan your transition from traditional IT delivery to cloud delivery using Oracle technologies you use every day. Register at bit.ly/PaaSForum

Oracle Enterprise Manager discussions during this event include tips and techniques on workload consolidation using database as a service and application platform as a service; transforming traditional IT delivery using self-service; effectively collaborating with business users via showback/chargeback; and more. Our speaker Sudip Datta, Vice President of Product Management, will also provide a glimpse into the future of PaaS management.


Sunday Jan 18, 2015

Enterprise Manager Ops Center - Ops Center EC/PC sizing

Counting Access Points Vs Assets

When designing a customer's Ops Center environment, one of the first questions I get asked is how many servers an Enterprise Controller (EC) or a Proxy Controller (PC) can support. Customers think of sizing as the number of servers, LDOMs, etc., lumped together into a total number of assets. In the real world, the answer to the question "How many assets can my Enterprise Controller/Proxy Controller support?" is "it depends". How you place your Proxy Controllers in your network, and what you manage from each Proxy Controller, can affect the total number of assets that can be supported from an Enterprise Controller.

The load placed on an Enterprise Controller or Proxy Controller is not directly determined by the number of assets it manages, but by the number of Access Points (APs) it is supporting. A simple view of an Access Point is that it is a connection, but it is a little more complex than that: Access Points also reflect the way Ops Center internally models its data structures. The actual load on any given Enterprise Controller or Proxy Controller is closely linked to the number of Access Points it is managing. The number of Access Points seen by the Enterprise Controller can differ depending on whether an asset is connected via a single Proxy Controller or via multiple Proxy Controllers. Therefore, if assets are managed by multiple Proxy Controllers, the Access Point count on the Enterprise Controller will be higher than the actual number of managed assets. When the documentation refers to Enterprise Controller/Proxy Controller capacity and uses the term "Asset", it is actually counting "Access Points".

Let's look at a couple of examples to make this clearer:

Physical Server ILOM

In this example, a single physical server has multiple data collection methods. A Proxy Controller can gather LOM data by accessing the service processor and by using the agent running inside the operating system.


If the same Proxy Controller is used to access a single asset, the two data feeds are consolidated into a single Access Point. The Enterprise Controller's total load will be a single Access Point.


If different Proxy Controllers are used to access a single asset, each Proxy Controller will record it as a separate Access Point. The Enterprise Controller's total load will be the sum across both Proxy Controllers: a total of 2 Access Points.

LDOMs

Another example would be an LDOM guest where we obtain data from the Control Domain agent and the agent running inside the LDOM guest OS.

Once again, if both data feeds go via a single Proxy Controller, they count as 1 Access Point on both the Proxy Controller and the Enterprise Controller.

Whereas, if each data feed goes via a separate Proxy Controller, each counts as 1 Access Point on its Proxy Controller and the total Access Point count on the Enterprise Controller will be 2.
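The counting rules above can be sketched in a few lines of Python (hypothetical assets and proxies, not real OCDoctor output): there is one Access Point per (asset, Proxy Controller) pair, so the EC total can exceed the number of distinct assets.

```python
# Hypothetical mapping of assets to the Proxy Controllers that reach them.
# An asset reached via two Proxy Controllers counts as two Access Points
# on the Enterprise Controller, even though it is a single managed asset.
asset_proxies = {
    "ldom1": {"pc1", "pc4"},   # Control Domain feed via pc1, OS agent via pc4
    "ldom2": {"pc1", "pc4"},
    "ldom3": {"pc1"},          # no OS agent, so only the Control Domain feed
    "server1": {"pc0"},        # ILOM and OS agent via the same proxy: one AP
}

# Access Points per Proxy Controller: one per asset it reaches.
ap_per_proxy = {}
for asset, proxies in asset_proxies.items():
    for proxy in proxies:
        ap_per_proxy[proxy] = ap_per_proxy.get(proxy, 0) + 1

# The EC load is the sum over all Proxy Controllers, which can exceed
# the number of distinct assets.
ec_access_points = sum(ap_per_proxy.values())

print(len(asset_proxies))   # 4 distinct assets
print(ec_access_points)     # 6 Access Points on the EC
```

With these made-up assets, 4 managed assets produce 6 Access Points on the EC, which is exactly why the documentation's "Asset" capacity figures are really Access Point figures.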

With the release of Enterprise Manager Ops Center - 12.2.2, we have updated and renamed the Enterprise Manager Ops Center Sizing and Performance Guide (https://docs.oracle.com/cd/E40871_01/doc.122/e57052/toc.htm). This revised document is easier to understand and has been updated to reflect current releases of Ops Center. Note: Since the term "Access Point" was not considered to be commonly understood, the term "Asset" has been used in the documentation. 

To make the counting of Access Points easier, we have added a program to the OCDoctor toolbox. This program "AssetCount.sh" is new in OCDoctor version 4.45 and can be found on any EC/PC or system that has an agent deployed. The path to the OCDoctor toolbox is /var/opt/sun/xvm/OCDoctor/toolbox.

# /var/opt/sun/xvm/OCDoctor/toolbox/AssetCount.sh

        AssetCount.sh [-v] { standard | machine | agent | all }

        standard - show Asset count
        machine - show machine output with Assets names
        agent - show information about agents, agentless Assets, Service Processors
        all - show all the information together

        -v : verbose output, present additional notes around the counting algorithm

        (Version 1.0 2015/01/08) 
  #

Let's just look at the standard output (we are running this on the Enterprise Controller). The output shows:

  • The total Access Point count for the EC (72)
  • The number of Access Points for each type of asset, for each Proxy Controller (Note: the total Access Point count for each Proxy Controller is labeled "Assets")

root@ec1:~# /var/opt/sun/xvm/OCDoctor/toolbox/AssetCount.sh standard

EC 72

Proxy Assets Zones Ldoms OVMGuests Servers Storages Switches ExadataCells MSeriesChassis MSeriesDomain
------------------------------------------------------------------------------------------------------
pc4   32     5     25    0         2       0        0        0            0              0
pc1   28     0     26    0         2       0        0        0            0              0
pc0   12     2     4     0         6       0        0        0            0              0

Use option '-v' to see additional notes on the counting algorithm.

#

You can also use the "machine" option to list out which asset is attached to which proxy.

# /var/opt/sun/xvm/OCDoctor/toolbox/AssetCount.sh machine

EC 72

Proxy 32 pc4
Zones 5 S11zone101 S11zone102 S11zone100 S11zone103 S11zone104
Ldoms 25 stdldom21 stdldom34 stdldom36 stdldom22 stdldom45 stdldom47 stdldom25 stdldom43 stdldom49 stdldom30 stdldom23 stdldom29 stdldom42 stdldom20 stdldom32 stdldom46 stdldom44 stdldom26 stdldom31 stdldom27 stdldom48 stdldom40 stdldom35 stdldom41 stdldom28
OVMGuests 0
Servers 2 10.187.70.169 pc4
...
Proxy 28 pc1
Zones 0
Ldoms 26 stdldom21 stdldom34 stdldom36 stdldom22 stdldom45 stdldom47 stdldom25 stdldom43 stdldom49 stdldom30 stdldom23 stdldom29 stdldom42 stdldom33 stdldom20 stdldom32 stdldom46 stdldom44 stdldom26 stdldom31 stdldom27 stdldom48 stdldom40 stdldom35 stdldom41 stdldom28
OVMGuests 0
Servers 2 10.187.70.171 pc1
...
# 

You can see this would be incredibly verbose in a large environment (I have truncated it here). You can clearly see that the LDOMs (stdldomXX) are being reported by both "pc1" (via the Control Domain) and "pc4" (via the OS agent). Note the differing LDOM counts on "pc1" and "pc4": "stdldom33" has no OS or agent on it, so it only reports against "pc1".

You can also use the "agent" option to display agent/agentless/LOM totals for each Proxy Controller.

# /var/opt/sun/xvm/OCDoctor/toolbox/AssetCount.sh agent

EC 72

Proxy Agents Agentless SPs
--------------------------
pc4   25     2         0
pc1   1      1         0
pc0   5      5         5

Use option '-v' to see additional notes on the counting algorithm.

#

In addition, OCDoctor.sh --troubleshoot checks the number of configured Access Points against the recommended Access Point count. It prints a warning at approximately 90% of the recommended capacity and a further warning if you exceed it.

Armed with the information above, you should be able to better design your Ops Center deployments and to monitor the growing load on the Enterprise Controller/Proxy Controllers as you add more assets.

Regards,

Rodney

Monday Jan 12, 2015

Enterprise Manager Ops Center - The Power of Groups

To Group or not to Group

Any customer with more than a handful of servers will struggle to take advantage of Ops Center's full feature set unless they embrace the use of groups. The ability to group a number of like servers into a logical target that can then be operated on means you can perform a single action instead of running the same action hundreds of times, once for each server in the group. Grouping also allows you to view and manage your environment in line with your organizational units and business practices.

What can be done with Groups

Groups avoid the need to select individual servers and can be used as a target for:

  • Update profiles (Patching) - Update patches/packages and files and apply pre/post actions
  • Operational plans - Run a script, update config files, etc.
  • Perform actions - Reboot/halt/refresh/power-off/power-on all assets in the group
  • Monitoring profiles - Apply customized monitoring profiles
  • Reporting - Run a single report that includes multiple servers

Groups can also:

  • Display membership - The Membership Graph or the asset tree view both show the servers that make up a group
  • Display incidents/alerts - Groups work as a roll-up point for incidents on any server in the group
  • Display Top CPU/Memory/Network consumers - The "Asset summary" of a group shows top consumers of CPU/Memory/Network resources
  • Restrict the assets that a given user has access to in Ops Center

Types of Groups

Groups are purely logical constructs. An asset (server, OS, zone, LDOM) can be a member of as many groups as you like, and deleting a group does not delete the assets it contains. Most often, a group will contain assets of the same type (e.g., system LOMs), so that an action like "power off" makes sense for every member of the group. However, it is also possible to create a group made up of differing asset types, e.g., all the assets (OS/LOM/zones) that are part of a physical server. This type of group would normally be used to restrict the permissions of users so that they can only view/manage the servers for which they are responsible. A key fact to remember when thinking about groups is that an asset that is a member of one group is not precluded from being a member of other groups.
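As a tiny illustration of groups being logical constructs, here is a Python sketch with hypothetical asset and group names: each group is just a set of asset names, one asset can appear in any number of groups, and deleting a group leaves the assets themselves untouched.

```python
# Hypothetical inventory: assets exist independently of any grouping.
assets = {"web01", "web02", "db01", "lom01"}

groups = {
    "Production": {"web01", "db01"},
    "Web-Tier":   {"web01", "web02"},
    "Server-A":   {"web01", "lom01"},   # mixed asset types (OS + LOM)
}

# One asset can be a member of several groups at once.
groups_for_web01 = sorted(g for g, members in groups.items()
                          if "web01" in members)

# Deleting a group removes only the grouping, never the assets themselves.
del groups["Web-Tier"]

print(groups_for_web01)    # ['Production', 'Server-A', 'Web-Tier']
print("web02" in assets)   # True - the asset is untouched
```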

Ops Center Predefined Groups

As a starting point, Ops Center provides a number of predefined groups found under the [ Navigation ==> Assets ] menu.

While most of the groupings are what you would expect, there are a few that require a little more explanation.

Standard Views 

[ All Assets ] - Not really a group as everything turns up here

[ Engineered Systems ] - A grouping of all discovered Engineered Systems (SPARC SuperCluster). Note that each Engineered System is also its own sub-group

[ Operating Systems ] -  Grouping based on OS type, release and version

[ Servers ] - Grouping based on physical servers

  • SPARC
    • M-Series - M3000/M4000/M5000/M8000/M9000, M6 and M10 servers
    • Other SPARC - servers that are not sun4u or sun4v architecture or non Oracle/Sun servers
    • U-Class - servers with sun4u architecture CPUs (V240/V490/SF6800 etc.)
    • V-Class - servers with sun4v architecture CPUs (T1000/T2000/T5XX0/T3/T4/T5 etc.) - not V-series servers as you might first think
  • x86
    • Other x86 - Non Oracle/Sun servers 
    • x64 - 64 bit servers
    • x86 32-bit - 32 bit servers

[ Chassis ] - 6000/8000 blade based chassis and their server blades 

[ Network Switches ] - Managed InfiniBand and network switches. Ops Center only manages a limited number of switch models and these will normally be found as part of an Engineered System (Exadata/Exalogic/SPARC Super Cluster).

[ Racks ]  - Both Engineered System racks and manually declared racks. It is not commonly known that you can declare all the racks in your data center in Ops Center and place all your servers in their respective racks, giving you a useful data center view.

All the predefined groups are useful but as you can see, they are based on broad brush physical characteristics of a server and its OS. There is no allowance for how you actually use your servers. For that you will need to build your own "User Defined Groups".

User Defined Groups

User Defined Groups are an extremely powerful addition to Ops Center and allow you to model your application, organizational units and business constraints into Ops Center's management interface. Basically, it makes Ops Center capable of working much more in alignment with the way your business operates. Before we go onto how to create "User Defined Groups", let's go over, in a little more detail, what you could use them for:

  • Applications - create a group of all your web servers to report on patch levels, apply patches, update application configurations, restart applications, list top resource consumers.
  • Prod/Dev/Test - create a group based on production status, so you can  apply differing monitoring/alerting profiles, produce management reports and restrict admin access.
  • Admin By - create a group of all the servers that an admin(s) is responsible for, so they can quickly respond to alerts or you can limit the functions they are authorized to perform.
  • Patching - create a group based on the servers that are in the 1st round of patching, so you can easily and consistently patch, do before and after patching reports and maintain consistent patch levels across key applications.

These are just a few of the many things for which groups can be used. Setting up groups will greatly decrease your day to day workload and increase the manageability of your environment. Without the use of grouping, it is unlikely that you will be able to scale your Ops Center environment efficiently beyond about 30 servers.

Creating a User Defined Group 

First select the "Create Group" action [ Navigation ==> All Assets ==> Create Group ]

Static Groups 

Static groups are just as the name suggests, you define a group and place static members in it.

The default action of the "Configure Group" wizard is to create a Static Group. As long  as the "Configure group rules" checkbox is unchecked this will be a static group.

Give the group a name (mandatory), a description (optional), and one or more group tags (optional), then click "Next" and "Finish" to complete the wizard and launch the job that creates the group.

Tagging is another powerful feature that will be the topic of another blog. In summary, it is a way of storing an arbitrary tag (or tag/value pair) with an asset or group, which means you can store any important information with the asset, such as asset number, production status, etc.

Now, one by one, navigate to your servers and manually add the server to the group you have created.


Go to each individual server's page and select the "Add Asset to Group" action.

Select the group you want to add to (our example group is "Static Group") and then click the [Add Assets to Group] button.

Dynamic (Smart) Groups 

Dynamic (smart) groups are, once again, much as the label says. An asset (server, OS, etc.) becomes part of the group by matching one or more criteria. The criteria are evaluated every time the group is accessed, so if you deploy a new server, create a zone, or update any other matched attribute, the group membership changes: the next time you access the group, its membership is automatically updated to reflect the current state of the environment. A large number of attributes can be used as criteria, and criteria can be combined into complex grouping rules. There is more than enough to say about building these rules for another blog, but today let's go with a single example to give you a feel for the capabilities of dynamic grouping.

We will launch the "Create Group" wizard, as we did for the static group, but this time we will give it a more descriptive name and description. Last but not least, we will check the "Configure group rules" check-box, which will make the group we create a dynamic group.


Rules can be as simple as "hostname" starts with "prod" or as complex as having multiple rules each with multiple criteria matches. This is why I will be going into more details on building these rule sets in another blog in the next few weeks.


For this example, I have chosen a moderate level of complexity. We have a single rule, but we will only match on any asset that has all 4 of the attributes set.

  • OS Version contains 11 (I could also have used Version equals 5.11)
  • Has an IP address on subnet 192.168.20.0/24
  • Is a Non Global Zone
  • Is managed by Ops Center (it would be unusual not to be in a managed state, but a Proxy Controller in an NGZ is an example of a non-managed asset)
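Conceptually, a dynamic group rule is a conjunction of predicates evaluated over asset attributes each time the group is accessed. Here is an illustrative Python sketch of the four-criteria rule above; the attribute names and asset records are hypothetical and do not reflect Ops Center's real rule schema.

```python
import ipaddress

# Hypothetical asset records (not Ops Center's internal model).
assets = [
    {"name": "zone1", "os_version": "5.11", "ip": "192.168.20.5",
     "type": "NGZ", "managed": True},
    {"name": "zone2", "os_version": "5.10", "ip": "192.168.20.9",
     "type": "NGZ", "managed": True},
    {"name": "host1", "os_version": "5.11", "ip": "10.187.56.3",
     "type": "Global", "managed": True},
]

subnet = ipaddress.ip_network("192.168.20.0/24")

def matches(asset):
    # All four criteria must hold, mirroring the rule in the text.
    return ("11" in asset["os_version"]
            and ipaddress.ip_address(asset["ip"]) in subnet
            and asset["type"] == "NGZ"
            and asset["managed"])

# Membership is recomputed from current data on every evaluation,
# which is why newly deployed zones appear in the group automatically.
members = [a["name"] for a in assets if matches(a)]
print(members)  # ['zone1']
```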

Click [Next] to see the preview screen and to check that we matched the assets we want.

You can see that we have matched on 4 Solaris 11 zones. Now let's see how that looks in the BUI [Navigation==>Assets ==>All User Defined Groups (pull-down)].

You see we have our static group and our dynamic group we have just created.

OK, let's create a second group, but this time for managed Non Global zones of the 10.187.56.0/24 network.

Create a new name and description.

Configure our rule, but this time look for Non Global Zones on the 10.187.56.0 network.

Preview shows no matching hosts, which in this case is correct, as I have not deployed a zone on that network yet. Finish the wizard and now let's look in the BUI to see what we have.

Checking the [All User Defined Groups] pull-down, we now see our static group and 2 different S11 NGZ groups, one with 4 members and one with no members.  (I was not quite consistent with the group naming, but I could fix that using the [Modify Group action].)

Now if I go off and deploy a few new zones, we can then see what our smart groups look like. I have deployed 2 zones on the 10.187.56.0 subnet and one more zone on the 192.168.20.0 subnet.

As you can see, the new zones automatically appear in the correct groups.

Dynamic (Smart) Groups - Criteria

There are far too many attributes to go through here ( a few are listed below) and I will be writing a further blog to show you how to use some of the more useful ones.

https://blogs.oracle.com/oem/resource/RL-Groups/Criteria-2.png

Semantic Tags

But I will call out one special criterion (attribute) that is probably the most useful of all: the Semantic Tag. A Semantic Tag is an arbitrary tag, or tag/value pair, that can be added to any asset to store descriptive information about that asset. You can add a tag to an asset by simply clicking the [Edit Tag] action.

Examples of Semantic Tags (Key):

 Tag Name     Description
 PRODUCTION   Describes the production status
 SALES        Describes the business unit that owns the server

Examples of Semantic Tags/Value pairs (Key and Value):

 Tag Name           Value                           Description
 PRODUCTION_STATUS  DEV/UAT/PROD                    The value describes the production status
 Admin_By           admin1@oracle.com               The value describes the name/group of the administrator of the system (could even be their email)
 Business_Unit      SALES/ENGINEERING/ACCOUNTS      The value describes the business unit that owns the server
 Application        DataBase/Web/ApplicationServer  The value describes the application running on the asset

As you can see, you can create a Semantic Tag to store any information about your server that you require. These tags and tag/value pairs can be used as attributes to create dynamic groups.

Configure a group using a "Semantic Tag Key & Value".

And the group based on the tag/value pair is ready to use.

Nesting Groups

One final feature of groups is that you can nest them (have a group that contains 1 or more other groups).

Create a group as before. This time click the check-box for "Configure subgroups".

Then you must drag the subgroups you want to include to the "Selected Group" icon.

Repeat this procedure until you have all your required groups selected.

Now click [Next], [Next] and [Finish], then check what our new group looks like.


You can see the S11-Zones group contains both S11 NGZ groups.

And by highlighting the S11-Zones group, we can see its membership and incident information for all included assets.

Summary

I hope this has given you a good understanding of groups and how they can make your use of Ops Center more productive.

Regards,

Rodney

Tuesday Nov 25, 2014

Monitoring NFS mounted file systems using EM12c

A customer recently asked me how they could monitor and alert on all the NFS mounted file systems across their datacenter. Here is a quick guide to doing just that.

Read More

Friday Oct 24, 2014

Get Compliant with Oracle Java VM Database PSU OCT 2014 using EM12c

Check for compliance and automate patching of Oracle Database fleet using EM12c

Along with its regular quarterly Database PSU/SPU/CPU updates, in October 2014 Oracle released the Oracle JavaVM PSU patch. The patch is recommended for all databases in your fleet, especially the ones that use JavaVM (support note 1929745.1 explains it in detail).
The mandate is primarily to apply the patches to databases that use the JavaVM option. Ideally, you would apply it to all databases, so that any new database created in the ORACLE_HOME is covered.

Oracle Enterprise Manager 12c provides Compliance Management and automated database patching. Using both of these features, you can identify the databases in your fleet that need the patch and automate applying it.

To get started, download the starter kit here. It contains utilities to jump-start your compliance work: a readily importable Compliance Standard and a step-by-step guide.

[Read More]

Tuesday Sep 30, 2014

Using JVM Diagnostics (JVMD) to help tune production JVMs

Contributing Author: Shiraz Kanga, Consulting Member of Technical Staff, Oracle


Tuning a production JVM involves more than merely adding more RAM via the -Xmx parameter. It depends on an understanding of how your application truly behaves in a production environment. Most JVM tuning is done by developers with a simulated load in their development or QA environment, which is unlikely to be truly representative of the production load running on production hardware.

One of the tools that actually contains real-world production data is JVM Diagnostics (JVMD). Hence it is a good idea to use data collected by JVMD for tuning your production JVMs. Note that JVMD is a component of Oracle Enterprise Manager, licensed as part of both the WebLogic Management Pack and the Non-Oracle Middleware Management Pack.


Figure 1. Heap Utilization and Garbage Collections for a specific JVM


In this document we are primarily addressing the Hotspot JVM. There are several aspects of tuning this JVM that we will look into:

Tuning Heap Size

The main parameters needed to tune heap size are:
  • -Xms<n>[g|m|k] is the initial and minimum size of the Java heap
  • -Xmx<n>[g|m|k] is the maximum size that the Java heap can grow to

Figure 2. Heap after GC and Garbage Collection Overhead for a specific JVM


The Java heap size here refers to the total size of the young and old generation spaces. To start, look at the heap usage chart (Figure 1) of your production JVM under maximum load in the JVMD Performance Diagnostics page. You should see patterns in the minimum and maximum heap sizes over time, and you can use this data as a rough guide for your choice of -Xms and -Xmx, with a reasonable amount of padding.

After setting these, start monitoring the garbage collection charts of your production JVMs (Figure 2) in the JVMD Live Heap Analysis page. It is useful to look at the JVMD metric called "Heap use after GC", which is a good reflection of the actual amount of heap memory being used by your application. Ideally this metric should remain relatively steady over time, with only a few full garbage collections occurring. If there are too many full garbage collections, the performance of your production application suffers, since a full GC blocks application threads while the entire heap is scanned. You can monitor this with the JVM GC Overhead% chart on the same JVMD page; garbage collection overhead is the percentage of total time spent in garbage collection. Increasing -Xmx can make full collections happen less frequently, but really it is time to dig deeper into your tuning options.

The key questions you need to answer are: How frequently does garbage collection take place? How long does each collection take? And what is the actual memory used (i.e., heap after GC)? Also, NEVER make the heap larger than the available free RAM on your system, as performance will drop sharply once RAM starts getting swapped to disk.
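As a rough illustration of turning an observed "heap after GC" figure into -Xms/-Xmx values, here is a small Python sketch. The 30% padding factor, the helper name, and the free-RAM figure are my own rule-of-thumb assumptions for illustration, not Oracle guidance; validate any sizing against your own workload.

```python
# Rule-of-thumb sketch (assumptions, not Oracle guidance): pad the peak
# "Heap use after GC" value observed in JVMD, and cap at free RAM so the
# heap can never push the system into swapping.
def suggest_heap_mb(peak_after_gc_mb, padding=0.30, free_ram_mb=8192):
    suggested = int(round(peak_after_gc_mb * (1 + padding)))
    # Never size the heap beyond available free RAM.
    return min(suggested, free_ram_mb)

xmx = suggest_heap_mb(2500)        # observed 2.5 GB peak after full GC
print(xmx)                         # 3250
print(f"-Xms{xmx}m -Xmx{xmx}m")    # equal Xms/Xmx avoids heap-resize pauses
```

Setting -Xms equal to -Xmx is a common choice for production servers, since it avoids pauses caused by the heap growing and shrinking at runtime.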

The Sun HotSpot JVM relies on generational garbage collection to achieve optimum performance. The -XX:SurvivorRatio command line parameter could further help in tuning garbage collection.

The Java heap has a young generation for newly created objects and an old generation for long lived objects. The young generation is further subdivided into the Eden space where new objects are allocated and the Survivor space where new objects that are still in use can survive their first few garbage collections before being promoted to old generations. The Survivor Ratio is the ratio of Eden to Survivor space in the young object area of the heap. Increasing this setting optimizes the JVM for applications with high object creation and low object preservation. In applications that generate more medium and long lived objects, this setting should be lowered from the default and vice versa.

For example, -XX:SurvivorRatio=10 sets the ratio between each survivor space and eden space to be 1:10. If survivor spaces are too small, they will overflow directly into the old generation. If survivor spaces are too large, they will be empty. At each GC, the JVM determines the number of times an object can be copied before it is tenured, called the tenure threshold. This threshold should be set to keep the survivor space half full.
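The survivor-space arithmetic can be made concrete with a short sketch. It assumes the HotSpot convention that the young generation consists of eden plus two survivor spaces and that SurvivorRatio is the eden-to-one-survivor-space ratio; the 120 MB young generation size is a made-up figure for illustration.

```python
def young_gen_layout(young_mb, survivor_ratio):
    # HotSpot layout: young gen = eden + 2 survivor spaces.
    # SurvivorRatio = eden / one survivor space,
    # so one survivor space = young / (ratio + 2).
    survivor = young_mb / (survivor_ratio + 2)
    eden = survivor * survivor_ratio
    return eden, survivor

# With -XX:SurvivorRatio=10 and a 120 MB young generation:
eden, survivor = young_gen_layout(120, 10)
print(eden, survivor)  # 100.0 10.0 -> 100 MB eden, two 10 MB survivor spaces
```

Raising the ratio makes eden larger at the expense of the survivor spaces (good for short-lived garbage); lowering it gives maturing objects more room before they are tenured.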

Most tuning operations represent a trade-off of some type or another. In the case of garbage collection, the trade-off usually involves memory used vs. throughput and latency.
  • The throughput of a JVM is measured in terms of the time spent doing garbage collection vs. the time spent outside of garbage collection (referred to as application time). It is the inverse of GC overhead mentioned above and represents the amount of work done by an application as a ratio of time spent in GC. Throughput can be tuned with -XX:GCTimeRatio=99 where 99 is the default which represents a 1% GC overhead.
  • Latency is the amount of time delay caused by garbage collection. Latency for GC pauses can be tuned by specifying the maximum pause time goal with the command line option -XX:MaxGCPauseMillis=<N>. This is interpreted as a hint that pause times of <N> milliseconds or less are desired. By default, there is no maximum pause time goal. If a pause time goal is specified, the heap size and other garbage collection related parameters are adjusted in an attempt to keep garbage collection pauses shorter than the specified value. Note that these adjustments may cause the garbage collector to reduce the overall throughput of the application, and in some cases the desired pause time goal cannot be met.
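The throughput arithmetic above can be checked in two lines. This follows the stated HotSpot semantics of -XX:GCTimeRatio=N, which targets an application-to-GC time ratio of N:1, i.e. a GC overhead of 1/(1+N).

```python
def gc_overhead_pct(gc_time_ratio):
    # -XX:GCTimeRatio=N targets app-time : GC-time of N:1,
    # so the GC overhead fraction is 1 / (1 + N).
    return 100.0 / (1 + gc_time_ratio)

print(gc_overhead_pct(99))  # 1.0  -> the default: 1% GC overhead
print(gc_overhead_pct(19))  # 5.0  -> a looser goal: 5% GC overhead
```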
Some lesser-known options concern the permanent generation space, which is used by the JVM itself to hold metadata, class structures and so on:
  • -XX:PermSize=<n>[g|m|k] is the initial and minimum size of the permanent generation space.
  • -XX:MaxPermSize=<n>[g|m|k] is the maximum size of the permanent generation space. If you ever get the message java.lang.OutOfMemoryError: PermGen space then it means that your application is loading a very large number of classes and this should be raised.
  • -Xss<n>[g|m|k] is the size of the thread stack. Each thread in a Java application has its own stack, used to hold return addresses, arguments to functions and method calls, and so on. The default stack size for a thread in Java is 1MB. In a highly multi-threaded system, like an application server, there are multiple thread pools and many threads in use at any given point in time, so this may need to be reduced. Since stack space has to be allocated in contiguous blocks, if the machine is busy and there are many threads running you may encounter an OutOfMemory error even when you have sufficient heap space. Recursive code can quickly exhaust the stack, and if you use such code you may need to increase the -Xss setting. However, if you see java.lang.OutOfMemoryError: unable to create new native thread, then you may have too many threads, or each thread may have too large a stack, so you may need to decrease it.

Tuning Garbage Collection Algorithm

Garbage collection is expensive. Generational garbage collectors divide the JVM memory into several spaces:
  • Eden space: All objects are placed here when first created
  • Survivor spaces: One or more regions where objects mature
  • Tenured space: Where long lived objects are stored
  • Permanent generation: This area is only used by the JVM itself to hold metadata, such as data structures for classes, methods, interned strings
One thing that people often forget to try is lowering the amount of garbage being created in the first place. There are many ways to do this, specific to the application code being written; common techniques include using StringBuilder/StringBuffer instead of String concatenation, lowering the amount of logging, etc.

Several GC algorithms are available in the Java VM. The following command line options select a specific GC algorithm:
  • -XX:+UseSerialGC uses a single-threaded garbage collector for both the young and old generations (normally a poor choice; use only for small Java heap sizes such as -Xmx256m or smaller)
  • -XX:+UseParallelGC uses a multithreaded (parallel) garbage collector for the young generation and a single-threaded garbage collector for the old generation
  • -XX:+UseParallelOldGC uses a multithreaded garbage collector for both the young and old generations
  • -XX:+UseParNewGC enables a multithreaded young generation garbage collector
  • -XX:+UseConcMarkSweepGC enables the VM's mostly concurrent garbage collector, and also auto-enables -XX:+UseParNewGC (use it if you cannot meet your application's worst case latency requirements because full garbage collections take too long)
  • -XX:+UseG1GC enables the Garbage First (G1) collector (available in Java 7 and in the latest releases of Java 6)
In practice, the default collector on server-class machines in both Java 6 and Java 7 is the parallel collector. Changing the algorithm requires detailed analysis of the application's behavior. If you see a nice regular sawtooth chart in the heap usage, you may not need any changes at all. If not, we recommend trying out each GC algorithm under a realistic load and comparing it to the default algorithm's behavior under the same load. Usually you will find that the default algorithm outperforms the new setting and that there is no reason to change it.
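For example, a candidate collector can be evaluated by launching one run with that collector plus GC logging enabled, then comparing the logged pause times against a baseline run. This is only a sketch: the JAVA_OPTS variable and log file name are illustrative, and the logging flags are the HotSpot names from the Java 6/7 era this article targets:

```shell
# Sketch of flags for one A/B comparison run; names are illustrative.
GC_UNDER_TEST="-XX:+UseG1GC"   # candidate collector to compare against the default
GC_LOG_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamPS -Xloggc:gc-g1.log"
GC_LOG_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc-g1.log"
JAVA_OPTS="${JAVA_OPTS:-} $GC_UNDER_TEST $GC_LOG_OPTS"
# ...then launch the application under realistic load with $JAVA_OPTS
# and compare the pause times in gc-g1.log with the baseline run.
```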

As you can see, tuning the JVM and its garbage collectors is largely a trade-off between space and time. If you had infinite heap space, you would never need to collect garbage. Conversely, if you could tolerate infinite time delays, you could run a cleanup as frequently as you like and keep the heap compact. Clearly, both situations are impossible. Finding the middle ground that is right for you is a careful balancing act based on understanding how GC works and what your application requires.

References:

Java SE 6 HotSpot Virtual Machine Garbage Collection Tuning http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html

Summary of Enterprise Manager general session at Oracle OpenWorld 2014

Oracle OpenWorld is in full swing, and this morning we delivered our Enterprise Manager general session. We realize that the majority of the thousands of readers of this blog are probably not attending OpenWorld, so we have created a summary of the Tweets that were sent during the session.
(in case you are not familiar with Twitter timelines, please read from the bottom to the top for a chronological order of the presentation) 

For those who want to see the full presentation, we've uploaded the slides to our SlideShare account for your convenience.

For all Enterprise Manager related sessions at Oracle OpenWorld 2014, please take a look here and let us know which ones you'd like to see summarized, or even see as a full deck.

Wednesday Sep 17, 2014

Stay Connected at Oracle OpenWorld

Follow Oracle Enterprise Manager 12c @ Oracle OpenWorld

If you are planning to attend OpenWorld this year, make sure you maximize the full conference experience—stay connected to all the Oracle Enterprise Manager 12c activities.

For the complete list of OpenWorld events and activities, read the Oracle Enterprise Manager 12c Focus on Doc.

Stay Connected:
Twitter |
Facebook | YouTube | Linkedin | Newsletter
Download the Oracle Enterprise Manager 12c Mobile app

Thursday Sep 11, 2014

Simplify deployment of JVMD Agents to command line Java applications

Contributing Author: Shiraz Kanga, Consulting Member of Technical Staff, Oracle

Most customers of Oracle Enterprise Manager using JVM Diagnostics use the tool to monitor their Java application servers such as WebLogic, WebSphere, Tomcat, etc. In this environment it is fairly easy to deploy the JVMD Agent: since it is distributed as a war file, you merely deploy the agent into a running application server using the management GUI or command line tools, and you can start monitoring with no need to restart the app server or modify any startup commands or scripts. However, other types of Java applications that do not allow code deployment at runtime, such as AWT/Swing or command line Java applications, do require those steps. Modifying startup scripts is complex because each application comes with its own custom and unique launch script. Additionally, the command that actually launches the runtime needs to combine the java command with its related parameters (like -Xmx), the JVMD Agent with its own parameters (like console host/port), and the application itself, which may have some more custom parameters. People often get confused by this complexity.

I've recently had customers that needed to monitor Hadoop, HDFS, Zookeeper, Kafka, Cassandra and Solr with JVMD. In order to simplify some of the complexity discussed above, I created a simple script based framework that makes things a bit easier. Feel free to use my approach to quickly setup JVMD with these or any other command line java programs. You can also use it as the basis for your own modifications. The framework modifies the startup scripts supplied with these tools in order to add the JVMD agent. All the code/scripts are attached in a zip file. Both original and modified versions of all changed scripts are included so you can easily see the modifications I made with a simple diff.

Here's how these scripts are setup. Everything is configured using 4 environment variables as shown below:

    export JVMD_AGENT_HOME=/home/skanga/servers
    export JVMD_MANAGER_HOST=jvmdconsole.us.oracle.com
    export JVMD_MANAGER_PORT=3800
    export JVMD_UNIQUE_ID=<unique name for each server process>

where the JVMD_AGENT_HOME must contain the jamagent-env.sh (from the attached zip file) and jamagent.war (which can be downloaded from your JVMD console). The first three of these are likely to remain unchanged for all the JVMs being monitored so you can easily add them directly into jamagent-env.sh if needed.

The JVMD_UNIQUE_ID will always be unique, so it must not be placed there. However, it has two other modes in which you specify a pointer to the unique ID instead of the ID itself: you can point to either an environment variable or a JVM system property that holds the actual unique ID. In those cases you could add this variable to the jamagent-env.sh script too.

If JVMD_UNIQUE_ID starts with the string "sysprop-" then the actual unique ID will be read from the JVM system property named by the string following "sysprop-". For example if JVMD_UNIQUE_ID is "sysprop-server_name" and we have a system property -Dserver_name=MyTestingServer then JVMD will use MyTestingServer as the JVM unique identifier.

If JVMD_UNIQUE_ID starts with the string "envvar-" then the actual unique ID will be read from the environment variable named by the string following "envvar-". For example if JVMD_UNIQUE_ID is "envvar-server_name" and we have an environment variable called server_name=MyTestingServer then JVMD will use MyTestingServer as the JVM unique identifier.

Caution: Do not use the dash (minus) character in the environment variable name that holds the unique ID. Use an underscore instead.
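To make the indirection concrete, here is a rough shell sketch of how the envvar- prefix could be resolved. The real resolution is performed by jamagent-env.sh (and a sysprop- value has to be resolved by the agent inside the JVM), so the function name and details below are hypothetical:

```shell
# Hypothetical sketch of the envvar- indirection; the real logic lives in
# jamagent-env.sh. A sysprop- value must be resolved inside the JVM by the
# agent, so it is passed through unchanged here.
resolve_unique_id() {
  case "$1" in
    envvar-*)
      # read the environment variable named after the "envvar-" prefix
      eval "printf '%s' \"\${${1#envvar-}}\"" ;;
    *)
      # direct value, or a sysprop- pointer left for the agent to resolve
      printf '%s' "$1" ;;
  esac
}
```

For example, with server_name=MyTestingServer exported, resolve_unique_id envvar-server_name yields MyTestingServer, matching the behavior described above.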

Generic Launch Script Modifications

After these four environment variables are set we need to modify our launch scripts. Make sure you have a backup of all files before you proceed. In the main script that you use to launch your java application look for a line that has a format that is similar to the one below: 
    $JAVA $JAVA_OPTS $MAIN_CLASS $MAIN_CLASS_ARGS
and replace it with
    $JAVA $JAVA_OPTS $JVMD_AGENT_INSERT $MAIN_CLASS $MAIN_CLASS_ARGS

So we simply added $JVMD_AGENT_INSERT just before the name of the main class. If there are multiple such lines, modify them all in the same way. To configure $JVMD_AGENT_INSERT we also need to source jamagent-env.sh (with some error checking), so we insert a snippet like the following just before the java invocation.

# add JVMD Agent Env settings
[[ -e "${JVMD_AGENT_HOME}/jamagent-env.sh" ]] && source "${JVMD_AGENT_HOME}/jamagent-env.sh" || { echo "ERROR: JVMD_AGENT_HOME undefined or does not contain jamagent-env.sh" 1>&2 ; exit 1; }

NOTE: Everything after the comment above must be on a single line of code in your launch script. Lines like this can get mangled by the blogging software, so it is best to cut & paste it from one of the scripts in the attached zip file.

We will now look at how I used these techniques to add JVMD monitoring to Kafka, Hadoop, Zookeeper, Cassandra and Solr. 

1) Kafka 2.8.0-0.8.1.1

I used Kafka 2.8.0-0.8.1.1 and downloaded it directly from the Kafka site. In Kafka, ALL processes are initiated through a common launcher called kafka-run-class.sh in the bin folder. All the other shell scripts (including the built-in Zookeeper) call this one. So this single insertion point is the only place that we will need to modify in order to add JVMD monitoring to Kafka. Pretty simple. Using the modified script (inside the attached zip file) you can run the servers as shown below:

TEST - with mods to use JVMD
cd /home/skanga/servers/kafka_2.8.0-0.8.1.1/bin
export JVMD_AGENT_HOME=/home/skanga/servers
export JVMD_MANAGER_HOST=jvmdconsole.us.oracle.com
export JVMD_MANAGER_PORT=3800

# start a zookeeper server
export JVMD_UNIQUE_ID=zookeeper-server
./zookeeper-server-start.sh ../config/zookeeper.properties

# start a kafka server
export JVMD_UNIQUE_ID=kafka-server
./kafka-server-start.sh ../config/server.properties

2) Hadoop 2.4.1

The scripts called hadoop, hdfs, mapred and yarn in the hadoop bin directory will ALL need to be modified for JVMD monitoring. Using the modified scripts (inside the attached zip file) you can run all the servers as shown below:

TEST - with mods for hadoop command to use JVMD

cd /home/skanga/servers/hadoop-2.4.1
export JVMD_AGENT_HOME=/home/skanga/servers
export JVMD_MANAGER_HOST=jvmdconsole.us.oracle.com
export JVMD_MANAGER_PORT=3802

# Launch the hdfs nfs gateway
export JVMD_UNIQUE_ID=hdfs-nfs3-gateway
./bin/hdfs nfs3

# Run a mapreduce history server
export JVMD_UNIQUE_ID=mapred-historyserver
./bin/mapred historyserver

# Run a yarn resource manager
export JVMD_UNIQUE_ID=yarn-resourcemanager
./bin/yarn resourcemanager

# Run a hadoop map-reduce job to find the value of PI (QuasiMonteCarlo method)
export JVMD_UNIQUE_ID=hadoop-test-pi-montecarlo
./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar pi 1024 100

3) Zookeeper 3.4.6

The standalone version of zookeeper has a common environment setup script called zkEnv.sh, where most of the JVMD setup can be done. After that, a minor modification is needed to the java launch command in zkServer.sh, after which all JVMD monitoring works fine. The scripts called zkCleanup.sh and zkCli.sh probably do not need monitoring, but it can be easily added if really needed.

TEST - with mods for zkServer.sh command to use JVMD

cd /home/skanga/servers/zookeeper-3.4.6/bin
export JVMD_AGENT_HOME=/home/skanga/servers
export JVMD_MANAGER_HOST=jvmdconsole.us.oracle.com
export JVMD_MANAGER_PORT=3800
export JVMD_UNIQUE_ID=zk-server

# start the zookeeper server
./zkServer.sh start
./zkServer.sh status
./zkServer.sh stop

4) Cassandra 2.0.9

The Apache Cassandra data store has a common environment setup script called conf/cassandra-env.sh where we can add the command to source our include script. Then a minor modification is needed to the java launch command in bin/cassandra after which all JVMD monitoring works fine. The other scripts probably do not need monitoring but can be easily added if really needed. 

TEST - with mods for cassandra command to use JVMD

cd /home/skanga/servers/apache-cassandra-2.0.9/bin
export JVMD_AGENT_HOME=/home/skanga/servers
export JVMD_MANAGER_HOST=jvmdconsole.us.oracle.com
export JVMD_MANAGER_PORT=3800
export JVMD_UNIQUE_ID=cassandra-server

# start cassandra
./cassandra -f

5) Solr 4.9.0

The Solr search server is an interesting case. In production scenarios, users will probably deploy the Solr war file in their own application server; there, the standard JVMD war file can be deployed to the same application server and monitored easily. However, the Solr distribution also includes an embedded mode that is started by simply running java -jar start.jar. For this scenario we have converted that java command into a simple script called start.sh, placed in the same folder as start.jar. Using this script (inside the attached zip file) you can run a test as shown below:

TEST - with addition of start.sh command to use JVMD with Solr

cd /home/skanga/servers/solr-4.9.0/example
export JVMD_AGENT_HOME=/home/skanga/servers
export JVMD_MANAGER_HOST=jvmdconsole.us.oracle.com
export JVMD_MANAGER_PORT=3800
export JVMD_UNIQUE_ID=solr-server

# start solr
./start.sh

After everything is set up properly for your servers, you should see all the relevant JVMs in the default pool with the proper IDs, as shown in the image below.


JVMs in Default Pool (with hostnames & ip addresses blanked out)

Remember to be a bit patient and wait a few seconds until the connections are established and the servers appear in the console.

Wednesday Sep 10, 2014

Enterprise Manager Ops Center - Changing alert severity in default monitoring policies

Modifying Monitoring Policies

Ops Center delivers default monitoring policies for the various types of assets managed and monitored by Ops Center. These policies are specific to each asset type. In the real world, they act only as a starting point, and you will need to customize them to suit your own environment. Most customizations can be done in the BUI (Browser User Interface), which is covered in the manuals and other blogs on this site, but occasionally you will need to manually edit the underlying XML of the default policies to get the customization you require. That method is covered in this blog entry.

List of monitoring profiles

In the BUI, you can easily copy these default policies and then modify them to suit your own environment.

You can do the following modifications in the BUI:

  • Enable/disable monitoring rules
  • Add a new monitoring rule
  • Delete an existing monitoring rule
  • Modify the thresholds/severities/triggers for most alert rules


Modifications are normally done by highlighting the rule, clicking the edit icon, making your changes, and then clicking the apply button. Remember that once you have made all the rule changes, the policy must be applied/reapplied to your target assets. Most rules are editable in this way.


Edit SMF rule


However, not all rules can be edited in the BUI. A rule like "Operating System Reachability" cannot be edited from the BUI and must be changed by manually editing the underlying XML. Such rules can be identified by the fact that no edit icon is available when the "Operating System Reachability" alert rule is selected.



Only Ops Center factory default policies (product standard default policies) can be edited by modifying the XML on the filesystem. When a policy is modified in the BUI, Ops Center copies the default policy to a custom policy, and custom policies are stored in the database, not as XML on the filesystem. This means that if you want to change one of these non-editable rules, you must manually edit the factory default policy first. Then make a copy of the policy to create a custom policy and, if required, re-apply any additional customizations in the BUI, so that your new policy absorbs the manual modifications.

While the default values are normally sufficient for most customers, I had a request from a customer who wanted to change the "Operating System Reachability" severity from Warning (the default) to Critical. He considered this to be an important event that needed to be alerted at a higher level so that it would grab the attention of his administration staff. Below is the procedure for how to achieve such a modification.

Manually Modifying the Default Alert Severity

As part of a standard install, Ops Center will create an alert of severity Warning if it loses connectivity with an Operating System (S8/9 OS or S10/11 GZ).

This will create an alert with the description "The asset can no longer be reached"


Warning Level alert

So here is the procedure for how to change the default alert severity for the "Operating System Reachability" alert from Warning to Critical. Be aware that there is a different alert for "Non-global zone Reachability", which will not be covered here, but modifying it, or other alerts, would follow a similar procedure.

We will be modifying the XML files for the default monitoring policies. These can be found at /var/opt/sun/xvm/monitoringprofiles on your EC.

root@ec:/var/opt/sun/xvm/monitoringprofiles# ls
Chassis.xml                       MSeriesDomain.xml                 ScCluster.xml
CiscoSwitch.xml                   NasLibrary.xml                    ScNode.xml
Cloud.xml                         NonGlobalZone.xml                 ScZoneClusterGroup.xml
ExadataCell.xml                   OperatingSystem.xml               ScZoneClusterNode.xml
FileServer.xml                    OvmGuest.xml                      Server.xml
GlobalZone.xml                    OvmHost.xml                       Storage.xml
IscsiStorageArray.xml             OvmManager.xml                    Switch.xml
LDomGuest.xml                     PDU.xml                           Tenancy.xml
LDomHost.xml                      RemoteOracleEngineeredSystem.xml  VirtualPool.xml
LocalLibrary.xml                  SanLibrary.xml
MSeriesChassis.xml                SanStorageArray.xml
root@ec:/var/opt/sun/xvm/monitoringprofiles# 

Follow the steps below to modify the monitoring policy:

  1. In the BUI, identify which policies you want to modify. Look at an asset in the BUI and select the "Monitoring" tab. At the top of the screen, you will see which monitoring policy (Alert Monitoring Rules) it is running. In this case, the policy is called "OC - Global Zone", which corresponds to the "GlobalZone.xml" file.

    Identify Profile type

    Or alternatively, log on to the EC and grep for the alert rule name.

    # grep "Operating System Reachability" *
    GlobalZone.xml:                <name>Operating System Reachability</name>
    OperatingSystem.xml:                <name>Operating System Reachability</name>
    #

    In this case, we will want to change "OC - Operating System" and "OC - Global Zone" policies, as they both have the "Operating System Reachability" rule, so we will be editing both the "GlobalZone.xml" and "OperatingSystem.xml" files.

  2. Make a backup copy of any XML file you modify (in case you mess something up).

    # pwd
    /var/opt/sun/xvm/monitoringprofiles
    # cp OperatingSystem.xml OperatingSystem.xml.orig
    # cp GlobalZone.xml GlobalZone.xml.orig
    
  3. Edit each file and look for the rule name

     <monitor>
         <enabled>true</enabled>
         <monitorType>Reachability</monitorType>
         <name>Operating System Reachability</name>
         <parameter>
            <name>unreachable.duration.minutes.WARNING</name>
            <value>3</value>
         </parameter>
     </monitor>
    

    and change "unreachable.duration.minutes.WARNING" to "unreachable.duration.minutes.CRITICAL".

    <monitor>
         <enabled>true</enabled>
         <monitorType>Reachability</monitorType>
         <name>Operating System Reachability</name>
         <parameter>
            <name>unreachable.duration.minutes.CRITICAL</name>
            <value>3</value>
         </parameter>
     </monitor>

    Repeat for the other file(s).

  4. Make a backup copy of  your modified XML files as these files may be overwritten during an upgrade process.

  5. Now restart the EC so that the new monitoring policies are re-read.

  6. You should now apply the new policy to the hosts you want to have the updated rule.

  7. Check the Message Center in the Navigation panel and you will see that your alert has now changed from "Warning" to "Critical".

Critical Alert level
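The edit in steps 2 and 3 can also be scripted when you have several policy files to change. The helper below is a hypothetical convenience, not part of Ops Center; it assumes GNU sed for in-place editing and keeps a .orig backup of each file, as recommended above:

```shell
# Hypothetical helper (not part of Ops Center): raise the reachability
# severity from WARNING to CRITICAL in the given policy XML files,
# keeping a .orig backup of each. Assumes GNU sed (-i).
bump_reachability() {
  for f in "$@"; do
    cp "$f" "$f.orig"    # backup first, as in step 2
    sed -i 's/unreachable\.duration\.minutes\.WARNING/unreachable.duration.minutes.CRITICAL/g' "$f"
  done
}

# e.g. on the EC:
#   cd /var/opt/sun/xvm/monitoringprofiles
#   bump_reachability GlobalZone.xml OperatingSystem.xml
```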

A best practice now is to use the BUI to copy the new (OC - Global Zone and OC - Operating System) policies to your own custom policies, adding any additional rule modifications. Copying the new OC policy to a custom policy saves it into the database, so it will not be overwritten by any subsequent Ops Center upgrade. Remember to apply the custom policy to your asset(s) or asset groups.

It is good practice to keep the name of the source policy in the name of your custom policy. It will make your life easier if you ever get confused about which policy applies to which type of asset or if you want to go back to the original source policy.

If you want your new custom policy to be automatically applied when you discover/provision a new asset, you will need to select the policy and click the "Set as Default Policy" action for that asset class.

Setting default policy

The green tick on the icon indicates that a policy is the default for that asset class.

You have now successfully modified the default alert severity, for an alert that could not be modified in the BUI.

Regards,

Rodney Lindner
Senior IT/Product Architect
Systems Management - Ops Center Engineering

Tuesday Sep 09, 2014

Express Scripts Automates IT with Oracle Enterprise Manager

Express Scripts, the largest pharmacy benefit management organization in the United States, automates its database services using Oracle Enterprise Manager's centralized management solution. Watch the video.




Wednesday Aug 27, 2014

Demo: Consolidation Planner

Oracle Enterprise Manager 12c Consolidation Planner is a great tool that helps you plan and consolidate multiple targets onto a single machine such as Oracle Exadata. This solution helps you visualize what you have running in your environment and where you can take advantage of consolidation in order to maximize resources and lower IT operational costs. Watch the demo below to get a better understanding of how Consolidation Planner works.


Want to learn more? Read the Consolidation Planner documentation.



Tuesday Aug 26, 2014

Q&A: Oracle's Andrew Sutherland on Managing the Entire Oracle Stack with Oracle Enterprise Manager 12c

As senior vice president of middleware for Oracle in Europe, Dr. Andrew Sutherland has more than 20 years’ experience in emerging technologies and their application to business problems. Currently, he manages a team of architects, business development managers, and technical specialists who help customers make the best use of their investments in Oracle technologies.

Given his breadth and depth of experience, we decided to ask Sutherland how Oracle Enterprise Manager 12c Release 4 is helping the Oracle customers he works with.

Q. What makes Oracle Enterprise Manager 12c different from competitors' offerings?
A. Oracle Enterprise Manager's approach is unique in that it manages across the entire Oracle stack, from applications, middleware, and the database all the way down to servers and storage. That means it can truly unlock the value of the entire Oracle stack.

Q. What is the payoff for organizations that adopt such a comprehensive approach?
A. Our customers are able to manage the entire Oracle estate in the most cost-effective way possible by automating many of their day-to-day tasks. To give you an idea of its scope, many of our customers have made sure that Oracle Enterprise Manager 12c’s dashboard is available to their senior IT management team. They use it to ensure that all parts of their IT stack are delivering what they should be delivering, when they should be delivering it.

Perhaps most important of all, Oracle Enterprise Manager 12c enables organizations to move beyond the old paradigm of multiple independent IT stacks to offer infrastructure as a service and platform as a service.

Q. As someone who helps customers make the most of their investment in Oracle technology, what do you find most promising about Oracle Enterprise Manager 12c Release 4?
A. There are three key areas that are especially exciting. First, it provides an accelerated path to the cloud. Whether you are building a small, medium, or large private cloud within your organization, it provides the tools you need to make it happen, from designing the cloud to provisioning and testing.

Secondly, this release provides monitoring and management tools that go both deeper into the stack and wider across components within the stack. That means an even more comprehensive dashboard.

Finally Oracle Enterprise Manager 12c Release 4 offers true enterprise-grade management. With the growth of social and mobile connectivity, the need for a highly performant and robust stack is more prominent than ever. And Oracle Enterprise Manager 12c is there to do exactly that: manage true, enterprise-grade IT deployments.

Q. What should Oracle customers do if they want to learn more about the latest release of Oracle Enterprise Manager 12c?
A. First, speak to your Oracle contact, whether it is a partner or Oracle representative, to get more complete information. Also consider coming to an Oracle Day event in your area, especially if you can attend one dedicated to cloud computing. And in the meantime, you can always head to the Oracle Enterprise Manager pages on oracle.com to get started.

Find out more about Oracle Enterprise Manager 12c Release 4.

Watch a short video featuring Sutherland.


