Tuesday Dec 23, 2014

The vServers were migrated to another Compute Node... Oh My!

Recently I was working on two similar SRs, both related to this Exalogic behavior.

When a compute node is rebooted for some reason, the vServers running on it are automatically moved to other compute nodes. Why does this happen? It occurs when the HA flag is enabled on a vServer, which can be verified by looking at the vm.cfg file of that VM.
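As a rough illustration of checking that flag, here is a minimal Python sketch that reads a `key = value` setting out of vm.cfg-style text. The key name `OVM_high_availability` and the sample content are assumptions for illustration; check the vm.cfg of your own vServer for the exact name used on your release.

```python
import re

def read_flag(cfg_text, key="OVM_high_availability"):
    """Return the value of a `key = value` line in vm.cfg-style text, or None.

    The default key name is an assumption, not taken from the post.
    """
    pattern = re.compile(
        r"^[ \t]*" + re.escape(key) + r"[ \t]*=[ \t]*(.*?)[ \t]*$",
        re.MULTILINE,
    )
    match = pattern.search(cfg_text)
    if match is None:
        return None
    # Drop surrounding quotes, if the value was written as a string.
    return match.group(1).strip("'\"")

def ha_enabled(cfg_text, key="OVM_high_availability"):
    """True only if the flag is present and set to a truthy value."""
    value = read_flag(cfg_text, key)
    return value is not None and value.lower() in ("true", "1", "yes")

# Hypothetical vm.cfg content, for illustration only.
sample = """\
name = 'example_vserver'
OVM_high_availability = True
memory = 4096
"""

print(ha_enabled(sample))
```

In practice you would pass the contents of the real vm.cfg file (for example, read with `open(path).read()`) instead of the sample string.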

After the rebooted compute node is back up and running, you will probably want the migrated vServers back where they were before the outage (that is, on the compute node that failed). Unfortunately, "Live Migration", the ability to migrate running vServers, is not yet supported in Exalogic. By design, migration is not exposed at the vDC level: a vDC cloud user does not care where a vServer runs, it simply runs in the cloud. So you need to use "Cold Migration" instead.

Basically, cold migration can be performed manually by an admin with "root" privileges on the OVS node:
- stop the vServer; it will be detached from its OVS node
- start the vServer on the OVS node where you want it to run, making sure you do not switch server pools
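
The two steps above can be sketched as a small Python helper that only builds a plan and does not execute anything. It assumes the standard Xen `xm shutdown` and `xm create` commands are what you would run on the OVS nodes; whether those are the right commands for your Exalogic release is an assumption, and the domain name, path, pool and node names are hypothetical.

```python
def cold_migration_plan(domain, vm_cfg_path, source_pool, target_pool, target_node):
    """Return the (where, command) steps for a cold migration.

    Nothing is executed here. The function refuses to build a plan that
    would move the vServer to a different server pool.
    """
    if source_pool != target_pool:
        raise ValueError("cold migration must stay within the same server pool")
    return [
        # Step 1: stop the vServer on the node where it currently runs.
        ("current node", "xm shutdown " + domain),
        # Step 2: start it on the node where you want it to run.
        (target_node, "xm create " + vm_cfg_path),
    ]

# Hypothetical values, for illustration only.
plan = cold_migration_plan(
    domain="example_vserver",
    vm_cfg_path="/path/to/vm.cfg",
    source_pool="pool1",
    target_pool="pool1",
    target_node="cn02",
)
for where, command in plan:
    print(where + ": " + command)
```

Keeping the pool check in front of the commands makes the "do not switch server pools" condition hard to forget.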

Tuesday Aug 12, 2014

A "ZFS Storage Appliance is not reachable" message is thrown by Exachk and the ZFS checks are skipped. WAD?

Sometimes something like the following appears in the "Skipped Nodes" section:

Host Name               Reason
myexalogic01sn01-mgm1   ZFS Storage Appliance is not reachable
myexalogic01sn01-mgm2   ZFS Storage Appliance is not reachable

Also, a message like the following can be seen when executing Exachk:
Could not find infiniband gateway switch names from env or configuration file.
Please enter the first gateway infiniband switch name :
Could not find storage node names from env or configuration file.
Please enter the first storage server :

This happens because Exachk relies on the standard "<rack-name>sn0x" naming convention to find the storage nodes.

To solve this, make sure there is an o_storage.out file in the directory where Exachk is installed. If the file is missing, create a blank one.

The o_storage.out file must contain the correct storage node hostnames, in the format they have in the hosts file. This format is typically something like "<rack-name>sn0x-mgmt". For example, an o_storage.out file could simply look like this:

myexalogic01sn01-mgmt

This way it is ensured that the o_storage.out file has valid ZFS Storage Appliance hostnames.
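
As an illustration, here is a minimal Python sketch that pulls the "<rack-name>sn0x-mgmt" hostnames out of hosts-file text and renders them as o_storage.out content. Writing one hostname per line is an assumption, and the sample hosts entries, IP addresses and domain are hypothetical.

```python
import re

def storage_mgmt_hosts(hosts_text, rack_name):
    """Return the '<rack-name>snNN-mgmt' short hostnames found in hosts-file text."""
    pattern = re.compile(re.escape(rack_name) + r"sn\d+-mgmt$")
    found = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0]      # drop trailing comments
        for name in line.split()[1:]:     # skip the IP address column
            if pattern.match(name) and name not in found:
                found.append(name)
    return found

def render_o_storage(hostnames):
    """Render the file body, one hostname per line (an assumed layout)."""
    return "".join(name + "\n" for name in hostnames)

# Hypothetical hosts-file content, for illustration only.
sample_hosts = """\
127.0.0.1   localhost
10.0.0.1    myexalogic01sn01-mgmt.example.com myexalogic01sn01-mgmt
10.0.0.2    myexalogic01sn02-mgmt   # second storage head
10.0.0.3    myexalogic01cn01
"""

names = storage_mgmt_hosts(sample_hosts, "myexalogic01")
print(render_o_storage(names), end="")
```

The rendered text can then be saved as o_storage.out in the directory where Exachk is installed, after checking it by eye against the real hosts file.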

If the switch checks are skipped, then a similar procedure should be performed with the o_switches.out file.

Wednesday Mar 26, 2014

Independent Clusters

Some days ago, I was asked if it's possible to have an Oracle Service Bus (OSB) domain with two independent clusters. The answer is no. In a domain you can have one and only one OSB cluster. An OSB cluster can co-exist with other clusters, like a WLS-only cluster, a SOA Suite cluster, etc., but not with another OSB cluster. As a matter of fact, in this case, the observed behavior was that all services were assigned to the first OSB cluster created, and the newer one was not responding to the URL calls. Works as designed.

However, for a WLS domain, is it possible to have 2 independent clusters? In this case, the answer is yes. Here you can check what types of objects can be clustered and what cannot be clustered.

The following types of objects can be clustered in a WebLogic Server deployment:
  • Servlets
  • JSPs
  • EJBs
  • Remote Method Invocation (RMI) objects
  • Java Messaging Service (JMS) destinations
The following APIs and internal services cannot be clustered in WebLogic Server:
  • File services including file shares
  • Time service

Wednesday Dec 12, 2012

Managed servers being restarted regularly by Node Manager. WAD?

Recently I have been working on a service request where several instances were running, and several technologies were being used, including SOA, BAM, BPEL and others.

At first glance, this may seem to be a Node Manager problem. But in this situation, the problem was actually at the JMS Persistent Store level. Node Manager can automatically restart Managed Servers that have the "failed" health state, or that have shut down unexpectedly due to a system crash or reboot. As a matter of fact, from the provided log files it was clear that the instance was becoming unhealthy because of a Persistent Store problem.

So in the end, the problem was not with Node Manager, which was working as designed; the restarts were being caused by the Persistent Store problem. Once that problem was fixed, everything went fine.

This particular issue that I worked on was on an Exalogic machine, but note that it may happen on any hardware running WebLogic.

