Wednesday Feb 26, 2014

Clustering Events

Setting up an Oracle Event Processing Cluster

Recently I was working with Oracle Event Processing (OEP) and needed to set it up as part of a high availability cluster.  OEP uses Coherence for quorum membership in an OEP cluster.  Because the solution used caching it was also necessary to include access to external Coherence nodes.  Input messages need to be duplicated across multiple OEP streams and so a JMS Topic adapter needed to be configured.  Finally, only one copy of each output event was desired, requiring the use of an HA adapter.  In this blog post I will go through the steps required to implement a true HA OEP cluster.

OEP High Availability Review

The diagram below shows a very simple non-HA OEP configuration:

Events are received from a source (JMS in this blog).  The events are processed by an event processing network which makes use of a cache (Coherence in this blog).  Finally any output events are emitted.  The output events could go to any destination but in this blog we will emit them to a JMS queue.

OEP provides high availability by having multiple event processing instances processing the same event stream in an OEP cluster.  One instance acts as the primary and the other instances act as secondary processors.  Usually only the primary will output events as shown in the diagram below (top stream is the primary):

The actual event processing is the same as in the previous non-HA example.  What is different is how input and output events are handled.  Because we want to minimize or avoid duplicate events we have added an HA output adapter to the event processing network.  This adapter acts as a filter, so that only the primary stream will emit events to our queue.  If the processing of events within the network depends on the time at which events are received then it is necessary to use an HA input adapter to synchronize the arrival timestamps of events across the cluster.

OEP Cluster Creation

Let's begin by setting up the base OEP cluster.  To do this we create new OEP configurations on each machine in the cluster.  The steps are outlined below.  Note that the same steps are performed on each machine for each server which will run on that machine:

  • Run ${MW_HOME}/ocep_11.1/common/bin/config.sh.
    • MW_HOME is the installation directory.  Note that multiple Fusion Middleware products may be installed in this directory.
  • When prompted, select “Create a new OEP domain”.
  • Provide administrator credentials.
    • Make sure you provide the same credentials on all machines in the cluster.
  • Specify a “Server name” and “Server listen port”.
    • Each OEP server must have a unique name.
    • Different servers can share the same “Server listen port” unless they are running on the same host.
  • Provide keystore credentials.
    • Make sure you provide the same credentials on all machines in the cluster.
  • Configure any required JDBC data source.
  • Provide the “Domain Name” and “Domain location”.
    • All servers must have the same “Domain name”.
    • The “Domain location” may be different on each server, but I would keep it the same to simplify administration.
    • Multiple servers on the same machine can share the “Domain location” because their configuration will be placed in the directory corresponding to their server name.
  • Create domain!
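
The naming and port rules in the steps above can be checked before running the configuration wizard on each machine.  The following is a minimal sketch, not part of OEP: the function name check_server_plan, the "name host port" input format and the sample port numbers are all assumptions made for illustration.

```sh
#!/bin/sh
# Hypothetical pre-flight check (not an OEP tool).
# stdin: one planned server per line as "server-name host listen-port".
# Reports duplicate server names and duplicate host/port combinations.
check_server_plan() {
    awk '
        seen_name[$1]++        { print "duplicate server name: " $1; bad = 1 }
        seen_addr[$2 ":" $3]++ { print "duplicate host/port: " $2 ":" $3; bad = 1 }
        END                    { exit bad }
    '
}

# Two servers on one host need different ports; another host may reuse a port.
check_server_plan <<'EOF' && echo "plan OK"
server1 oep1.oracle.com 9002
server2 oep1.oracle.com 9012
server3 oep2.oracle.com 9002
EOF
```

The function exits with a non-zero status when a rule is violated, so it can be used as a guard in a provisioning script.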

Configuring an OEP Cluster

Now that we have created our servers we need to configure them so that they can find each other.  OEP uses Oracle Coherence to determine cluster membership.  Coherence clusters can use either multicast or unicast to discover already running members of a cluster.  Multicast has the advantage that it is easy to set up and scales better (see http://www.ateam-oracle.com/using-wka-in-large-coherence-clusters-disabling-multicast/) but has a number of challenges, including failure to propagate by default through routers and accidentally joining the wrong cluster because someone else chose the same multicast settings.  We will show how to use both unicast and multicast to discover the cluster.

Multicast Discovery

Coherence multicast uses a class D multicast address that is shared by all servers in the cluster.  On startup a Coherence node broadcasts a message to the multicast address looking for an existing cluster.  If no-one responds then the node will start the cluster.

Unicast Discovery

Coherence unicast uses Well Known Addresses (WKAs).  Each server in the cluster needs a dedicated listen address/port combination.  A subset of these addresses are configured as WKAs and shared between all members of the cluster.  As long as at least one of the WKAs is up and running then servers can join the cluster.  If a server does not find any cluster members then it checks to see if its listen address and port are in the WKA list.  If they are then that server will start the cluster, otherwise it will wait for a WKA server to become available.
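
The unicast start-up rule can be sketched as a small shell function.  This only illustrates the decision described here and is not how Coherence is implemented.  The function name wka_startup_action and its arguments are assumptions made for the example.

```sh
#!/bin/sh
# Sketch of the WKA start-up decision (illustration only, not Coherence code).
# $1             this server's listen address:port
# $2             "yes" if an existing cluster member answered, otherwise "no"
# remaining args the configured WKA list as address:port entries
wka_startup_action() {
    self=$1
    member_found=$2
    shift 2
    if [ "$member_found" = "yes" ]; then
        echo join               # a cluster is already running
        return
    fi
    for wka in "$@"; do
        if [ "$wka" = "$self" ]; then
            echo start          # we are a WKA, so we may start the cluster
            return
        fi
    done
    echo wait                   # not a WKA: wait for a WKA server to appear
}

wka_startup_action 192.168.56.91:9200 no  192.168.56.91:9200 192.168.56.92:9200
wka_startup_action 192.168.56.93:9200 no  192.168.56.91:9200 192.168.56.92:9200
wka_startup_action 192.168.56.93:9200 yes 192.168.56.91:9200 192.168.56.92:9200
```

The three calls print start, wait and join respectively, which is why one of the WKA servers must be the first server started.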
  To configure a cluster the same steps need to be followed for each server in the cluster:
  • Set an event server address in the config.xml file.
    • Add the following to the <cluster> element:
      <cluster>
          <server-name>server1</server-name>
          <server-host-name>oep1.oracle.com</server-host-name>
      </cluster>
    • The “server-name” is displayed in the visualizer and should be unique to the server.

    • The “server-host-name” is used by the visualizer to access remote servers.

    • The “server-host-name” must be an IP address or it must resolve to an IP address that is accessible from all other servers in the cluster.

    • The listening port is configured in the <netio> section of the config.xml.

    • The server-host-name/listening port combination should be unique to each server.

 
  • Set a common cluster multicast listen address shared by all servers in the config.xml file.
    • Add the following to the <cluster> element:
      <cluster>
          …
          <!-- For use in Coherence multicast only! -->
          <multicast-address>239.255.200.200</multicast-address>
          <multicast-port>9200</multicast-port>
      </cluster>
    • The “multicast-address” must be able to be routed through any routers between servers in the cluster.

  • Optionally you can specify the bind address of the server.  This allows you to control port usage and determine which network is used by Coherence.

    • Create a “tangosol-coherence-override.xml” file in the ${DOMAIN}/{SERVERNAME}/config directory for each server in the cluster.
      <?xml version='1.0'?>
      <coherence>
          <cluster-config>
              <unicast-listener>
                  <!-- This server Coherence address and port number -->
                  <address>192.168.56.91</address>
                  <port>9200</port>
              </unicast-listener>
          </cluster-config>
      </coherence>
  • Configure the Coherence WKA cluster discovery.

    • Create a “tangosol-coherence-override.xml” file in the ${DOMAIN}/{SERVERNAME}/config directory for each server in the cluster.
      <?xml version='1.0'?>
      <coherence>
          <cluster-config>
              <unicast-listener>
                  <!-- WKA Configuration -->
                  <well-known-addresses>
                      <socket-address id="1">
                          <address>192.168.56.91</address>
                          <port>9200</port>
                      </socket-address>
                      <socket-address id="2">
                          <address>192.168.56.92</address>
                          <port>9200</port>
                      </socket-address>
                  </well-known-addresses>
                  <!-- This server Coherence address and port number -->
                  <address>192.168.56.91</address>
                  <port>9200</port>
              </unicast-listener>
          </cluster-config>
      </coherence>

    • List at least two servers in the <socket-address> elements.

    • For each <socket-address> element there should be a server that has corresponding <address> and <port> elements directly under <unicast-listener> (after the closing </well-known-addresses> tag).

    • One of the servers listed in the <well-known-addresses> element must be the first server started.

    • Not all servers need to be listed in <well-known-addresses>, but see previous point.

 
  • Enable clustering using a Coherence cluster.
    • Add the following to the <cluster> element in config.xml.
      <cluster>
          …
          <enabled>true</enabled>
      </cluster>
    • The “enabled” element tells OEP that it will be using Coherence to establish cluster membership.  This can also be achieved by setting the value to be “coherence”.

 
  • The following shows the <cluster> config for another server in the cluster when using multicast discovery.  Only the server name and host name differ:
    <cluster>
        <server-name>server2</server-name>
        <server-host-name>oep2.oracle.com</server-host-name>
        <!-- For use in Coherence multicast only! -->
        <multicast-address>239.255.200.200</multicast-address>
        <multicast-port>9200</multicast-port>
        <enabled>true</enabled>
    </cluster>

  • The following shows the <cluster> config for another server in the cluster when using unicast discovery.  Only the server name and host name differ:
    <cluster>
        <server-name>server2</server-name>
        <server-host-name>oep2.oracle.com</server-host-name>
        <enabled>true</enabled>
    </cluster>

 
  • The following shows the “tangosol-coherence-override.xml” file for another server in the cluster when using unicast discovery.  Only the server's own listen address differs:
    <?xml version='1.0'?>
    <coherence>
        <cluster-config>
            <unicast-listener>
                <!-- WKA Configuration -->
                <well-known-addresses>
                    <socket-address id="1">
                        <address>192.168.56.91</address>
                        <port>9200</port>
                    </socket-address>
                    <socket-address id="2">
                        <address>192.168.56.92</address>
                        <port>9200</port>
                    </socket-address>
                </well-known-addresses>
                <!-- This server Coherence address and port number -->
                <address>192.168.56.92</address>
                <port>9200</port>
            </unicast-listener>
        </cluster-config>
    </coherence>

You should now have a working OEP cluster.  Check the cluster by starting all the servers.

Look for a message like the following on the first server to start to indicate that another server has joined the cluster:

<Coherence> <BEA-2049108> <The domain membership has changed to [server2, server1], the new domain primary is "server1">
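
When several servers are started it can be convenient to pull the member list and the current primary out of this message.  The sketch below assumes the message text has exactly the format shown above.  The function names are hypothetical, and how you obtain the log line (for example from the server's standard output) depends on your environment.

```sh
#!/bin/sh
# Extract the primary server name from a domain membership log line.
primary_from_line() {
    printf '%s\n' "$1" | sed -n 's/.*the new domain primary is "\([^"]*\)".*/\1/p'
}

# Extract the member list (the text between the square brackets).
members_from_line() {
    printf '%s\n' "$1" | sed -n 's/.*membership has changed to \[\([^]]*\)].*/\1/p'
}

line='<Coherence> <BEA-2049108> <The domain membership has changed to [server2, server1], the new domain primary is "server1">'
primary_from_line "$line"
members_from_line "$line"
```

For the sample line this prints server1 followed by the member list server2, server1.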

Log on to the Event Processing Visualizer of one of the servers – http://<hostname>:<port>/wlevs.  Select the cluster name on the left and then select group “AllDomainMembers”.  You should see a list of all the running servers in the “Servers of Group – AllDomainMembers” section.

Sample Application

Now that we have a working OEP cluster let us look at a simple application that can be used as an example of how to cluster enable an application.  This application models service request tracking for hardware products.  The application we will use performs the following checks:

  1. If a new service request (identified by SRID) arrives (indicated by status=RAISE) then we expect some sort of follow up in the next 10 seconds (seconds because I want to test this quickly).  If no follow up is seen then an alert should be raised.
    • For example if I receive an event (SRID=1, status=RAISE) and after 10 seconds I have not received a follow up message (SRID=1, status<>RAISE) then I need to raise an alert.
  2. If a service request (identified by SRID) arrives and there has been another service request (identified by a different SRID) for the same physical hardware (identified by TAG) then an alert should be raised.
    • For example if I receive an event (SRID=2, TAG=M1) and later I receive another event for the same hardware (SRID=3, TAG=M1) then an alert should be raised.

Note use case 1 is nicely time bounded – in this case the time window is 10 seconds.  Hence this is an ideal candidate to be implemented entirely in CQL.
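
To make the rule concrete, here is a batch sketch of use case 1 in shell and awk.  It is not the CQL used by the application.  It assumes a complete, time-ordered log in a hypothetical "seconds,SRID,status" format, and the status value ASSIGN is made up for the example.

```sh
#!/bin/sh
# Batch sketch of use case 1 (not CQL).
# stdin: hypothetical lines of the form "seconds,SRID,status", ordered by time.
missing_followups() {
    awk -F, -v window=10 '
        $3 == "RAISE" { raised[$2] = $1; next }          # remember when the SR was raised
        ($2 in raised) && ($1 - raised[$2] <= window) {  # follow-up arrived in time
            delete raised[$2]
        }
        END { for (id in raised) print "ALERT: no follow-up within " window "s for SRID=" id }
    '
}

# SRID 1 is followed up after 4 seconds, SRID 2 only after 15 seconds.
printf '0,1,RAISE\n4,1,ASSIGN\n5,2,RAISE\n20,2,ASSIGN\n' | missing_followups
```

Only SRID 2 is reported.  The state that must be held is limited to requests raised within the window, which is what makes this use case a good fit for CQL.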

Use case 2 has no time constraints, hence over time there could be a very large number of CQL queries running looking for a matching TAG but a different SRID.  In this case it is better to put the TAGs into a cache and search the cache for duplicate tags.  This reduces the amount of state information held in the OEP engine.
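
The cache lookup idea can be sketched with an awk associative array standing in for the Coherence cache.  This is an illustration only, not the application code, and the "SRID,TAG" input format is an assumption.

```sh
#!/bin/sh
# Sketch of use case 2: an in-memory map from TAG to SRID plays the role of the cache.
# stdin: hypothetical lines of the form "SRID,TAG".
find_duplicate_tags() {
    awk -F, '
        ($2 in seen) && seen[$2] != $1 {
            print "ALERT: TAG=" $2 " already used by SRID=" seen[$2] " (new SRID=" $1 ")"
        }
        !($2 in seen) { seen[$2] = $1 }   # first request for this TAG: store it
    '
}

printf '2,M1\n3,M1\n4,M2\n' | find_duplicate_tags
```

The second request for TAG M1 raises an alert while the request for M2 does not.  Each TAG is stored once, however many requests arrive, which keeps the state outside the CQL engine small and easy to move into an external cache.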

The sample application to implement this is shown below:

Messages are received from a JMS Topic (InboundTopicAdapter).  Test messages can be injected via a CSV adapter (RequestEventCSVAdapter).  Alerts are sent to a JMS Queue (OutboundQueueAdapter), and also printed to the server standard output (PrintBean).  Use case 1 is implemented by the MissingEventProcessor.  Use case 2 is implemented by inserting the TAG into a cache (InsertServiceTagCacheBean) using a Coherence event processor and then querying the cache for each new service request (DuplicateTagProcessor).  If the same tag is already associated with an SR in the cache then an alert is raised.  The RaiseEventFilter is used to filter out existing service requests from the use case 2 stream.

The non-HA version of the application is available to download here.

We will use this application to demonstrate how to HA enable an application for deployment on our cluster.

A CSV file (TestData.csv) and Load generator properties file (HADemoTest.prop) are provided to test the application by injecting events using the CSV Adapter.

Note that the application reads a configuration file (Server.properties) which should be placed in the domain directory of each event server.

Deploying an Application

Before deploying an application to a cluster it is a good idea to create a group in the cluster.  Multiple servers can be members of this group.  To add a group to an event server just add an entry to the <cluster> element in config.xml as shown below:

<cluster>
    …
    <groups>HAGroup</groups>
</cluster>

Multiple servers can be members of a group and a server can be a member of multiple groups.  This allows you to have different levels of high availability in the same event processing cluster.

Deploy the application using the Visualizer.  Target the application at the group you created, or the AllDomainMembers group.

Test the application, typically using a CSV Adapter.  Note that using a CSV adapter sends all the events to a single event server.  To fix this we need to add a JMS output adapter (OutboundTopicAdapter) to our application and then send events from the CSV adapter to the outbound JMS adapter as shown below:

So now we are able to send events via CSV to an event processor that in turn sends the events to a JMS topic.  But we still have a few challenges.

Managing Input

The first challenge is managing input.  Because OEP relies on the same event stream being processed by multiple servers we need to make sure that all our servers get the same messages from the JMS Topic.  To do this we configure the JMS connection factory to have an Unrestricted Client ID.  This allows multiple clients (OEP servers in our case) to use the same connection factory.  Client IDs are mandatory when using durable topic subscriptions.  We also need each event server to have its own subscriber ID for the JMS Topic.  This ensures that each server will get a copy of all the messages posted to the topic.  If we use the same subscriber ID for all the servers then the messages will be distributed across the servers, with each server seeing a set of messages completely disjoint from those seen by the other servers in the cluster.  This is not what we want because each server should see the same event stream.  We can use the server name as the subscriber ID as shown in the excerpt below from our application:

<wlevs:adapter id="InboundTopicAdapter" provider="jms-inbound">
    …
    <wlevs:instance-property name="durableSubscriptionName"
            value="${com_bea_wlevs_configuration_server_ClusterType.serverName}" />
</wlevs:adapter>

This works because I have placed a ConfigurationPropertyPlaceholderConfigurer bean in my application as shown below.  The same bean is also used to access properties from a configuration file:

<bean id="ConfigBean"
        class="com.bea.wlevs.spring.support.ConfigurationPropertyPlaceholderConfigurer">
    <property name="location" value="file:../Server.properties"/>
</bean>

With this configuration each server will now get a copy of all the events.

As our application relies on elapsed time we should make sure that the timestamps of the received messages are the same on all servers.  We do this by adding an HA Input adapter to our application.

<wlevs:adapter id="HAInputAdapter" provider="ha-inbound">
    <wlevs:listener ref="RequestChannel" />
    <wlevs:instance-property name="keyProperties"
            value="EVID" />
    <wlevs:instance-property name="timeProperty" value="arrivalTime"/>
</wlevs:adapter>

The HA Adapter sets the given “timeProperty” in the input message to be the current system time.  This time is then communicated to other HAInputAdapters deployed to the same group.  This allows all servers in the group to have the same timestamp in their event.  The event is identified by the “keyProperties” key field.

To allow the downstream processing to treat the timestamp as an arrival time, the downstream channel is configured with an “application-timestamped” element to set the arrival time of the event.  This is shown below:

<wlevs:channel id="RequestChannel" event-type="ServiceRequestEvent">
    <wlevs:listener ref="MissingEventProcessor" />
    <wlevs:listener ref="RaiseEventFilterProcessor" />
    <wlevs:application-timestamped>
        <wlevs:expression>arrivalTime</wlevs:expression>
    </wlevs:application-timestamped>
</wlevs:channel>

Note the property set in the HAInputAdapter is used to set the arrival time of the event.

So now all servers in our cluster have the same events arriving from a topic, and each event arrival time is synchronized across the servers in the cluster.

Managing Output

Note that an OEP cluster has multiple servers processing the same input stream.  Obviously if we have the same inputs, synchronized to appear to arrive at the same time, then we will get the same outputs, which is central to OEP's promise of high availability.  So when an alert is raised by our application it will be raised by every server in the cluster.  If we have 3 servers in the cluster then we will get 3 copies of the same alert appearing on our alert queue.  This is probably not what we want.  To fix this we take advantage of an HA Output Adapter.  Unlike input, where there is a single HA Input Adapter, there are multiple HA Output Adapters, each with distinct performance and behavioral characteristics.  The table below is taken from the Oracle® Fusion Middleware Developer's Guide for Oracle Event Processing and shows the different levels of service and performance impact:

Table 24-1 Oracle Event Processing High Availability Quality of Service

High Availability Option                            | Missed Events? | Duplicate Events? | Performance Overhead
Section 24.1.2.1, "Simple Failover"                 | Yes (many)     | Yes (few)         | Negligible
Section 24.1.2.2, "Simple Failover with Buffering"  | Yes (few)      | Yes (many)        | Low
Section 24.1.2.3, "Light-Weight Queue Trimming"     | No             | Yes (few)         | Low-Medium
Section 24.1.2.4, "Precise Recovery with JMS"       | No             | No                | High

I decided to go for the lightweight queue trimming option.  This means I won’t lose any events, but I may emit a few duplicate events in the event of primary failure.  This setting causes all output events to be buffered by the secondaries until they are told by the primary that a particular event has been emitted.  To configure this option I add the following adapter to my EPN:

    <wlevs:adapter id="HAOutputAdapter" provider="ha-broadcast">
        <wlevs:listener ref="OutboundQueueAdapter" />
        <wlevs:listener ref="PrintBean" />
        <wlevs:instance-property name="keyProperties" value="timestamp"/>
        <wlevs:instance-property name="monotonic" value="true"/>
        <wlevs:instance-property name="totalOrder" value="false"/>
    </wlevs:adapter>

This uses the time of the alert (timestamp property) as the key to be used to identify events which have been trimmed.  This works in this application because the alert time is the time of the source event, and the times of the source events are synchronized using the HA Input Adapter.  Because this is a time value it will only increase, and so I set monotonic=“true”.  However I may get two alerts raised at the same timestamp, so I set totalOrder=“false”.

I also added the additional configuration to config.xml for the application:

<ha:ha-broadcast-adapter>
    <name>HAOutputAdapter</name>
    <warm-up-window-length units="seconds">15</warm-up-window-length>
    <trimming-interval units="millis">1000</trimming-interval>
</ha:ha-broadcast-adapter>

This causes the primary to tell the secondaries every second which alert it emitted most recently.  The secondaries then trim from their buffers all alerts up to and including that alert.  So in the worst case I will get one second of duplicated alerts.  It is also possible to set a number of events rather than a time period.  The trade off is as follows.  Longer time intervals (or more events) reduce synchronization overhead but cause more memory to be used by the secondaries.  More frequent synchronization uses less memory in the secondaries and generates fewer duplicate alerts, but there is more communication between the primary and the secondaries to trim the buffers.
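
The trimming step on a secondary can be pictured with a few lines of shell.  This is a sketch of the idea only, not OEP code.  The function name trim_buffer and the numeric alert keys are assumptions, with keys taken to be monotonically increasing as configured above.

```sh
#!/bin/sh
# Sketch of queue trimming on a secondary (illustration only, not OEP code).
# stdin:  buffered alert keys, one per line, in increasing order
# $1:     key of the latest alert the primary reports as emitted
# stdout: alerts still buffered after trimming
trim_buffer() {
    awk -v last="$1" '$1 + 0 > last + 0'
}

# The secondary has buffered four alerts; the primary reports key 200 as emitted.
printf '100\n200\n300\n400\n' | trim_buffer 200
```

Alerts 300 and 400 remain buffered.  If the primary failed at this point, a secondary taking over would emit what is left in its buffer, so any of those alerts that the old primary had already sent would appear as duplicates.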

The warm-up window is used to stop a secondary joining the cluster before it has been running for that time period.  The window is based on the time that the EPN needs to be running to have the same state as the other servers.  In our example application we have a CQL query that runs for a period of 10 seconds, so I set the warm-up window to be 15 seconds to ensure that a newly started server has the same state as all the other servers in the cluster.  The warm-up window should be greater than the longest query window.

Adding an External Coherence Cluster

When we are running OEP as a cluster we have additional overhead in the servers.  The HA Input Adapter is synchronizing event time across the servers, and the HA Output Adapter is synchronizing output events across the servers.  The HA Output Adapter is also buffering output events in the secondaries.  We can’t do anything about this but we can move the Coherence cache we are using outside of the OEP servers, reducing the memory pressure on those servers and also moving some of the processing outside of the server.  Making our Coherence caches external to our OEP cluster is a good idea for the following reasons:

  • Allows moving storage of cache entries outside of the OEP server JVMs hence freeing more memory for storing CQL state.
  • Allows storage of more entries in the cache by scaling cache independently of the OEP cluster.
  • Moves cache processing outside OEP servers.

To create the external Coherence cache do the following:

  • Create a new directory for our standalone Coherence servers, perhaps at the same level as the OEP domain directory.
  • Copy the tangosol-coherence-override.xml file previously created for the OEP cluster into a config directory under the Coherence directory created in the previous step.
  • Copy the coherence-cache-config.xml file from the application into a config directory under the Coherence directory created in the previous step.
  • Add the following to the tangosol-coherence-override.xml file in the Coherence config directory:
      <coherence>
          <cluster-config>
              <member-identity>
                  <cluster-name>oep_cluster</cluster-name>
                  <member-name>Grid1</member-name>
              </member-identity>
              …
          </cluster-config>
      </coherence>
    • Important Note: The <cluster-name> must match the name of the OEP cluster as defined in the <domain><name> element in the event server's config.xml.
    • The member name is used to help identify the server.
  • Disable storage for our caches in the event servers by editing the coherence-cache-config.xml file in the application and adding the following element to the caches:
      <distributed-scheme>
          <scheme-name>DistributedCacheType</scheme-name>
          <service-name>DistributedCache</service-name>
          <backing-map-scheme>
              <local-scheme/>
          </backing-map-scheme>
          <local-storage>false</local-storage>
      </distributed-scheme>
    • The local-storage flag stops the OEP server from storing entries for caches using this cache scheme.
    • Do not disable storage at the global level (-Dtangosol.coherence.distributed.localstorage=false) because this will disable storage on some OEP specific cache schemes as well as our application cache.  We don’t want to put those schemes into our cache servers because they are used by OEP to maintain cluster integrity and have only one entry per application per server, so are very small.  If we put those into our Coherence Cache servers we would have to add OEP specific libraries to our cache servers and enable them in our coherence-cache-config.xml, all of which is too much trouble for little or no benefit.
  • If using Unicast Discovery (this section is not required if using Multicast) then we want to make the Coherence Grid be the Well Known Address servers because we want to disable storage of entries on our OEP servers, and Coherence nodes with storage disabled cannot initialize a cluster.  To enable the Coherence servers to be primaries in the Coherence grid do the following:
    • Change the unicast-listener addresses in the Coherence servers' tangosol-coherence-override.xml file to be suitable values for the machine they are running on (typically change the listen address).
    • Modify the WKA addresses in the OEP servers' and the Coherence servers' tangosol-coherence-override.xml files to match at least two of the Coherence servers' listen addresses.
    • The following listings show how this might be configured for 2 OEP servers and 2 cache servers:

      OEP Server 1

      <?xml version='1.0'?>
      <coherence>
        <cluster-config>
          <unicast-listener>
            <well-known-addresses>
              <socket-address id="1">
                <address>192.168.56.91</address>
                <port>9300</port>
              </socket-address>
              <socket-address id="2">
                <address>192.168.56.92</address>
                <port>9300</port>
              </socket-address>
            </well-known-addresses>
            <address>192.168.56.91</address>
            <port>9200</port>
          </unicast-listener>
        </cluster-config>
      </coherence>

      OEP Server 2

      <?xml version='1.0'?>
      <coherence>
        <cluster-config>
          <unicast-listener>
            <well-known-addresses>
              <socket-address id="1">
                <address>192.168.56.91</address>
                <port>9300</port>
              </socket-address>
              <socket-address id="2">
                <address>192.168.56.92</address>
                <port>9300</port>
              </socket-address>
            </well-known-addresses>
            <address>192.168.56.92</address>
            <port>9200</port>
          </unicast-listener>
        </cluster-config>
      </coherence>

      Cache Server 1

      <?xml version='1.0'?>
      <coherence>
        <cluster-config>
          <member-identity>
            <cluster-name>oep_cluster</cluster-name>
            <member-name>Grid1</member-name>
          </member-identity>
          <unicast-listener>
            <well-known-addresses>
              <socket-address id="1">
                <address>192.168.56.91</address>
                <port>9300</port>
              </socket-address>
              <socket-address id="2">
                <address>192.168.56.92</address>
                <port>9300</port>
              </socket-address>
            </well-known-addresses>
            <address>192.168.56.91</address>
            <port>9300</port>
          </unicast-listener>
        </cluster-config>
      </coherence>

      Cache Server 2

      <?xml version='1.0'?>
      <coherence>
        <cluster-config>
          <member-identity>
            <cluster-name>oep_cluster</cluster-name>
            <member-name>Grid2</member-name>
          </member-identity>
          <unicast-listener>
            <well-known-addresses>
              <socket-address id="1">
                <address>192.168.56.91</address>
                <port>9300</port>
              </socket-address>
              <socket-address id="2">
                <address>192.168.56.92</address>
                <port>9300</port>
              </socket-address>
            </well-known-addresses>
            <address>192.168.56.92</address>
            <port>9300</port>
          </unicast-listener>
        </cluster-config>
      </coherence>

    • Note that the OEP servers do not listen on the WKA addresses, using different port numbers even though they run on the same servers as the cache servers.
    • Also note that the Coherence servers are the ones that listen on the WKA addresses.
  • Now that the configuration is complete we can create a start script for the Coherence grid servers as follows:
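      #!/bin/sh
      # Locations of the OEP install, the JVM and the standalone cache server directory
      MW_HOME=/home/oracle/fmw
      OEP_HOME=${MW_HOME}/ocep_11.1
      JAVA_HOME=${MW_HOME}/jrockit_160_33
      CACHE_SERVER_HOME=${MW_HOME}/user_projects/domains/oep_coherence
      # Entry processor classes plus the directory holding the Coherence config files
      CACHE_SERVER_CLASSPATH=${CACHE_SERVER_HOME}/HADemoCoherence.jar:${CACHE_SERVER_HOME}/config
      # Use the Coherence jar shipped with OEP
      COHERENCE_JAR=${OEP_HOME}/modules/com.tangosol.coherence_3.7.1.6.jar
      JAVAEXEC=$JAVA_HOME/bin/java
      # specify the JVM heap size
      MEMORY=512m
      # An optional first argument of -jmx enables the Coherence JMX management beans
      JMXPROPERTIES=""
      if [ "$1" = "-jmx" ]; then
          JMXPROPERTIES="-Dcom.sun.management.jmxremote -Dtangosol.coherence.management=all -Dtangosol.coherence.management.remote=true"
          shift
      fi
      JAVA_OPTS="-Xms$MEMORY -Xmx$MEMORY $JMXPROPERTIES"
      # JAVA_OPTS is deliberately unquoted so that it splits into separate options
      "$JAVAEXEC" -server -showversion $JAVA_OPTS -cp "${CACHE_SERVER_CLASSPATH}:${COHERENCE_JAR}" com.tangosol.net.DefaultCacheServer $1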
    • Note that I put the tangosol-coherence-override and the coherence-cache-config.xml files in a config directory and added that directory to my path (CACHE_SERVER_CLASSPATH=${CACHE_SERVER_HOME}/HADemoCoherence.jar:${CACHE_SERVER_HOME}/config) so that Coherence would find the override file.
    • Because my application uses in-cache processing (entry processors) I had to add a jar file containing the required classes for the entry processor to the classpath (CACHE_SERVER_CLASSPATH=${CACHE_SERVER_HOME}/HADemoCoherence.jar:${CACHE_SERVER_HOME}/config).
    • The classpath references the Coherence Jar shipped with OEP to avoid versoin mismatches (COHERENCE_JAR=${OEP_HOME}/modules/com.tangosol.coherence_3.7.1.6.jar).
    • This script is based on the standard cache-server.sh script that ships with standalone Coherence.
    • The -jmx flag can be passed to the script to enable Coherence JMX management beans.

We have now configured OEP to use an external Coherence data grid for its application caches.  When starting up we should always start at least one of the grid servers before starting the OEP servers, so that the OEP servers can find the grid.  If we start things in the wrong order then the OEP servers will block waiting for a storage-enabled node to start (one of the WKA servers if using unicast).
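
The start-up ordering can be scripted.  The following is only a minimal sketch, assuming the cache server output is redirected to a log file and that a line containing "Started DefaultCacheServer" indicates a ready grid node.  The function name wait_for_log_line, the log path and the OEP start command in the usage comment are all hypothetical.

```shell
#!/bin/sh
# wait_for_log_line FILE PATTERN TIMEOUT_SECONDS
# Checks FILE about once a second, up to TIMEOUT_SECONDS times, for a line
# matching PATTERN. Returns 0 if the pattern was seen, 1 otherwise.
wait_for_log_line() (
    file=$1
    pattern=$2
    remaining=$3
    while [ "$remaining" -gt 0 ]; do
        if [ -f "$file" ] && grep -q "$pattern" "$file"; then
            exit 0
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    exit 1
)

# Example usage (paths, readiness message and OEP start script are assumptions):
#   ./cache-server.sh > /tmp/cache-server.log 2>&1 &
#   if wait_for_log_line /tmp/cache-server.log "Started DefaultCacheServer" 120; then
#       ./startwlevs.sh    # start the OEP server only once the grid is up
#   fi
```

Polling a log file is crude but needs no extra tooling; checking cluster membership through the Coherence JMX beans would be a more robust alternative.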

Summary

We have now created an OEP cluster that makes use of an external Coherence grid for application caches.  The application has been modified to ensure that the timestamps of arriving events are synchronized and that output events are emitted by only one of the servers in the cluster.  In the event of failure we may get some duplicate events with our configuration (there are configurations that avoid duplicate events) but we will not lose any events.  The final version of the application with full HA capability is shown below:

Files

The following files are available for download:

  • Oracle Event Processing
    • Includes Coherence
  • Non-HA version of application
    • Includes test file TestData.csv and Load Test property file HADemoTest.prop
    • Includes Server.properties.Antony file to customize to point to your WLS installation
  • HA version of application
    • Includes test file TestData.csv and Load Test property file HADemoTest.prop
    • Includes Server.properties.Antony file to customize to point to your WLS installation
  • OEP Cluster Files
    • Includes config.xml
    • Includes tangosol-coherence-override.xml
    • Includes Server.properties that will need customizing for your WLS environment
  • Coherence Cluster Files
    • Includes tangosol-coherence-override.xml and coherence-cache-configuration.xml
    • Includes cache-server.sh start script
    • Includes HADemoCoherence.jar with required classes for entry processor


Tuesday Dec 24, 2013

Cleaning Up After Yourself

Maintaining a Clean SOA Suite Test Environment

A fun blog entry with Fantasia animated GIFs got me thinking, like Mickey, about how nice it would be to automate clean up tasks.

I don’t have a sorcerer’s castle to clean up but I often have a test environment which I use to run tests, then after fixing problems that I uncovered in the tests I want to run them again.  The problem is that all the data from my previous test run is still there.

Now in the past I used VirtualBox snapshots to roll back to a clean state, but the problem with this is that it loses not only the environment changes I want to get rid of, such as data inserted into tables, but also changes I want to keep, such as WebLogic configuration changes and new shell scripts.  So like Mickey I went in search of some magic to help me.

Cleaning Up SOA Environment

My first task was to clean up the SOA environment by deleting all instance data from the tables.  Now I could use the purge scripts to do this, but that would still leave me with running instances, for example 800 Human Workflow Tasks that I don’t want to deal with.  So I used the new truncate script to take care of this.  Basically this removes all instance data from your SOA Infrastructure, whether or not the data is live.  This can be run without taking down the SOA Infrastructure (although if you do get strange behavior you may want to restart SOA).  Some statistics, such as service and reference statistics, are kept since server startup, so you may want to restart your server to clear that data.  A sample script to run the truncate SQL is shown below.

#!/bin/sh
# Truncate the SOA schemas, does not truncate BAM.
# Use only in development and test, not production.

# Properties to be set before running script
# SOAInfra Database SID
DB_SID=orcl
# SOA DB Prefix
SOA_PREFIX=DEV
# SOAInfra DB password
SOAINFRA_PASSWORD=welcome1
# SOA Home Directory
SOA_HOME=/u01/app/fmw/Oracle_SOA1

# Set DB Environment
. oraenv << EOF
${DB_SID}
EOF

# Run Truncate script from directory it lives in
cd ${SOA_HOME}/rcu/integration/soainfra/sql/truncate

# Run the truncate script
sqlplus ${SOA_PREFIX}_soainfra/${SOAINFRA_PASSWORD} @truncate_soa_oracle.sql << EOF
exit
EOF

After running this script all your SOA composite instances and associated workflow instances will be gone.
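
Because the truncate is destructive, it can be worth guarding the script against being pointed at the wrong schema.  The following is a minimal sketch of such a guard; the function name is_safe_prefix and the allow-list of prefixes (DEV, TEST) are assumptions to adapt to your own naming.

```shell
#!/bin/sh
# is_safe_prefix PREFIX
# Returns 0 when PREFIX is on the allow-list of schema prefixes that may be
# truncated, 1 otherwise. The allow-list (DEV, TEST) is an assumption.
is_safe_prefix() {
    case "$1" in
        DEV|TEST) return 0 ;;
        *)        return 1 ;;
    esac
}

# Example usage near the top of the truncate script:
#   is_safe_prefix "${SOA_PREFIX}" || {
#       echo "Refusing to truncate ${SOA_PREFIX}_soainfra" >&2
#       exit 1
#   }
```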

Cleaning Up BAM

The above example shows how easy it is to get rid of all the runtime data in your SOA repository, however if you are using BAM you still have all the contents of your BAM objects from previous runs.  To get rid of that data we need to use BAM ICommand’s clear command as shown in the sample script below:

#!/bin/sh
# Set software locations
FMW_HOME=/home/oracle/fmw
export JAVA_HOME=${FMW_HOME}/jdk1.7.0_17
BAM_CMD=${FMW_HOME}/Oracle_SOA1/bam/bin/icommand
# Set objects to purge
BAM_OBJECTS="/path/RevenueEvent /path/RevenueViolation"

# Clean up BAM
for name in ${BAM_OBJECTS}
do
  ${BAM_CMD} -cmd clear -name ${name} -type dataobject
done

After running this script all the rows of the listed objects will be gone.
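
Before clearing anything for real it can help to see exactly which commands would run.  The sketch below wraps the same ICommand clear call in a function with a dry-run switch; the function name clear_bam_objects and the DRY_RUN variable are hypothetical, and only the -cmd, -name and -type arguments are taken from the script above.

```shell
#!/bin/sh
# clear_bam_objects ICOMMAND OBJECT...
# Runs the ICommand "clear" operation for every data object given.
# When DRY_RUN=1 the commands are printed instead of executed.
clear_bam_objects() (
    bam_cmd=$1
    shift
    for name in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$bam_cmd -cmd clear -name $name -type dataobject"
        else
            "$bam_cmd" -cmd clear -name "$name" -type dataobject
        fi
    done
)

# Example usage:
#   DRY_RUN=1
#   clear_bam_objects "${BAM_CMD}" /path/RevenueEvent /path/RevenueViolation
```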

Ready for Inspection

Unlike the hapless Mickey, our clean up scripts work reliably and do what we want without unexpected consequences, like flooding the castle.

Saturday Oct 12, 2013

Share & Enjoy : Using a JDeveloper Project as an MDS Store

Share & Enjoy : Sharing Resources through MDS

One of my favorite radio shows was The Hitchhiker’s Guide to the Galaxy by the sadly departed Douglas Adams.  One of the characters, Marvin the Paranoid Android, was created by the Sirius Cybernetics Corporation whose corporate song was entitled Share and Enjoy!  Just like using the products of the Sirius Cybernetics Corporation, reusing resources through MDS is not fun, but at least it is useful and avoids some problems in SOA deployments.  So in this blog post I am going to show you how to re-use SOA resources stored in MDS using JDeveloper as a development tool.

The Plan

We would like to have some SOA resources such as WSDLs, XSDs, Schematron files, DVMs etc. stored in a shared location.  This gives us the following benefits:

  • Single source of truth for artifacts
  • Remove cross composite dependencies which can cause deployment and startup problems
  • Easier to find and reuse resources if stored in a single location

So we will store a WSDL and XSD in MDS, using a JDeveloper project to maintain the shared artifact and using File based MDS to access it from development and Database based MDS to access it from runtime.  We will create the shared resources in a JDeveloper project and deploy them to MDS.  We will then deploy a project that exposes a service based on the WSDL.  Finally we will deploy a client project to the previous project that uses the same MDS resources.

Creating Shared Resources in a JDeveloper Project

First let’s create a JDeveloper project and put our shared resources into that project.  To do this:

  1. In a JDeveloper Application create a New Generic Project (File->New->All Technologies->General->Generic Project)
  2. In that project create a New Folder called apps (File->New->All Technologies->General->Folder) – It must be called apps for local File MDS to work correctly.
  3. In the project properties delete the existing Java Source Paths (Project Properties->Project Source Paths->Java Source Paths->Remove)
  4. In the project properties add a new Java Source Path pointing to the just created apps directory (Project Properties->Project Source Paths->Java Source Paths->Add)

Having created the project we can now put our resources into that project, either copying them from other projects or creating them from scratch.

Create a SOA Bundle to Deploy to a SOA Instance

Having created our resources we now want to package them up for deployment to a SOA instance.  To do this we take the following steps.

  1. Create a new JAR deployment profile (Project Properties->Deployment->New->Jar File)
  2. In JAR Options uncheck the Include Manifest File
  3. In File Groups->Project Output->Contributors uncheck all existing contributors and check the Project Source Path
  4. Create a new SOA Bundle deployment profile (Application Properties->Deployment->New->SOA Bundle)
  5. In Dependencies select the project jar file from the previous steps.
  6. On Application Properties->Deployment unselect all options.

The bundle can now be deployed to the server by selecting Deploy from the Application Menu.

Create a Database Based MDS Connection in JDeveloper

Having deployed our shared resources it would be good to check they are where we expect them to be, so let’s create a Database Based MDS Connection in JDeveloper to let us browse the deployed resources.

  1. Create a new MDS Connection (File->All Technologies->General->Connections->SOA-MDS Connection)
  2. Make the Connection Type DB Based MDS and choose the database Connection and partition.  The username of the connection will be the <PREFIX>_mds user and the MDS partition will be soa-infra.

Browse the repository to make sure that your resources deployed correctly under the apps folder.  Note that you can also use this browser to look at deployed composites.  You may find it interesting to look at the /deployed-composites/deployed-composites.xml file which lists all deployed composites.

Create a File Based MDS Connection in JDeveloper

We can now create a File based MDS connection to the project we just created.  A file based MDS connection allows us to work offline without a database or SOA server.  We will create a file based MDS that actually references the project we created earlier.

  1. Create a new MDS Connection (File->All Technologies->General->Connections->SOA-MDS Connection)
  2. Make the Connection Type File Based MDS and choose the MDS Root Folder to be the location of the JDeveloper project previously created (not the source directory, the top level project directory).

We can browse the file based MDS using the IDE Connections Window in JDeveloper.  This lets us check that we can see the contents of the repository.

Using File Based MDS

Now that we have MDS set up both in the database and locally in the file system we can try using some resources in a composite.  To use a WSDL from the file based repository:

  1. Insert a new Web Service Reference or Service onto your composite.xml.
  2. Browse the Resource Palette for the WSDL in the File Based MDS connection and import it.
  3. Do not copy the resource into the project.
  4. If you are creating a reference, don’t worry about the warning message, that can be fixed later.  Just say Yes you do want to continue and create the reference.

Note that when you import a resource from an MDS connection it automatically adds a reference to that MDS into the application’s adf-config.xml.  SOA applications do not deploy their adf-config.xml; they use it purely to help resolve oramds protocol references in SOA composites at design time.  At runtime the soa-infra application’s adf-config.xml is used to help resolve oramds protocol references.

The reason we set file based MDS to point to the project directory rather than the apps directory underneath is that when we deploy SOA resources to MDS as a SOA bundle the resources are all placed under the apps MDS namespace.  To make sure that our file based MDS includes an apps namespace we have to rename the src directory to be apps and then make sure that our file based MDS points to the directory above the new source directory.
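
To make the directory layout concrete, the sketch below maps an oramds URL onto the file you would expect to find under a file based MDS root.  This is only an illustration of the naming convention, not how MDS resolves references internally; the function name oramds_to_path and the example project path are hypothetical.

```shell
#!/bin/sh
# oramds_to_path MDS_ROOT ORAMDS_URL
# Prints the file that a file based MDS rooted at MDS_ROOT would be expected
# to hold for ORAMDS_URL. It simply strips the "oramds:" scheme and prefixes
# the root, which is why the root must be the directory ABOVE apps.
oramds_to_path() {
    case "$2" in
        oramds:/*) printf '%s%s\n' "$1" "${2#oramds:}" ;;
        *)         echo "not an oramds URL: $2" >&2; return 1 ;;
    esac
}

# Example usage (the project directory is an assumption):
#   oramds_to_path /home/dev/jdeveloper/SharedResources \
#       oramds:/apps/shared/WriteFileProcess.wsdl
```

If the MDS root pointed at the apps directory itself, the same URL would resolve to a path containing apps/apps, which does not exist.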

Patching Up References

When we use an abstract WSDL as a service then the SOA infrastructure automatically adds binding and service information at run time.  An abstract WSDL used as a reference needs to have binding and service information added in order to compile successfully.  By default the imported MDS reference for an abstract WSDL will look like this:

<reference name="Service3"
   ui:wsdlLocation="oramds:/apps/shared/WriteFileProcess.wsdl">
  <interface.wsdl interface="http://xmlns.oracle.com/Test/SyncWriteFile/WriteFileProcess#wsdl.interface(WriteFileProcess)"/>
  <binding.ws port="" location=""/>
</reference>

Note that the port and location properties of the binding are empty.  We need to replace the location with a runtime WSDL location that includes binding information; this can be obtained by getting the WSDL URL from the soa-infra application or from EM.  Be sure to remove any MDS instance strings from the URL.

The port information is a little more complicated.  The first part of the string should be the target namespace of the service, usually the same as the first part of the interface attribute of the interface.wsdl element.  This is followed by #wsdl.endpoint and then, in parentheses, the service name and port name from the runtime WSDL, separated by a /.  The format should look like this:

{Service Namespace}#wsdl.endpoint({Service Name}/{Port Name})

So if we have a WSDL like this:

<wsdl:definitions
   …
   targetNamespace="http://xmlns.oracle.com/Test/SyncWriteFile/WriteFileProcess"
>
   …
   <wsdl:service name="writefileprocess_client_ep">
      <wsdl:port name="WriteFileProcess_pt"
            binding="client:WriteFileProcessBinding">
         <soap:address location=… />
      </wsdl:port>
   </wsdl:service>
</wsdl:definitions>

Then we get a binding.ws port like this:

http://xmlns.oracle.com/Test/SyncWriteFile/WriteFileProcess#wsdl.endpoint(writefileprocess_client_ep/WriteFileProcess_pt)
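
If you need to build several of these strings, the format is easy to assemble in a script.  The helper below is a small sketch; the function name make_ws_port is hypothetical, and the three values are the target namespace, service name and port name read from the runtime WSDL.

```shell
#!/bin/sh
# make_ws_port NAMESPACE SERVICE_NAME PORT_NAME
# Prints a binding.ws port value in the form
#   {Service Namespace}#wsdl.endpoint({Service Name}/{Port Name})
make_ws_port() {
    printf '%s#wsdl.endpoint(%s/%s)\n' "$1" "$2" "$3"
}

# Example usage with the values from the WSDL above:
#   make_ws_port "http://xmlns.oracle.com/Test/SyncWriteFile/WriteFileProcess" \
#       writefileprocess_client_ep WriteFileProcess_pt
```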

Note that you don’t have to set actual values until deployment time.  The following binding information will allow the composite to compile in JDeveloper, although it will not run in the runtime:

<binding.ws port="dummy#wsdl.endpoint(dummy/dummy)" location=""/>

The binding information can be changed in the configuration plan.  Deferring it in this way means that a configuration plan is required before the reference can be invoked, which reduces the risk of deploying composites whose references point to the wrong environment.

Summary

In this blog post I have shown how to store resources in MDS so that they can be shared between composites.  The resources can be created in a JDeveloper project that doubles as an MDS file repository.  The MDS resources can be reused in composites.  If using an abstract WSDL from MDS I have also shown how to fix up the binding information so that at runtime the correct endpoint can be invoked.  Maybe it is more fun than dealing with the Sirius Cybernetics Corporation!

Wednesday Oct 09, 2013

Multiple SOA Developers Using a Single Install

Running Multiple SOA Developers from a Single Install

A question just came up about how to run multiple developers from a single software install.  The objective is to have a single software installation on a shared server and then provide different OS users with the ability to create their own domains.  This is not a supported configuration but it is attractive for a development environment.

Out of the Box

Before we do anything special let’s review the basic installation.

  • Oracle WebLogic Server 10.3.6 installed using oracle user in a Middleware Home
  • Oracle SOA Suite 11.1.1.7 installed using oracle user
  • Software installed with group oinstall
  • Developer users dev1, dev2 etc
    • Each developer user is a member of oinstall group and has access to the Middleware Home.

Customizations

To get this to work I did the following customization:

  • In the Middleware Home make all user readable files/directories group readable and make all user executable files/directories group executable.
    • find $MW_HOME -perm /u+r ! -perm /g+r | xargs -Iargs chmod g+r args
    • find $MW_HOME -perm /u+x ! -perm /g+x | xargs -Iargs chmod g+x args
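
The same change can be wrapped in a function that uses find's -exec instead of a pipe to xargs, which also copes with file names containing spaces.  This is a sketch rather than a tested procedure for a full Middleware Home; the function name grant_group_access is hypothetical and the -perm /mode test assumes GNU find.

```shell
#!/bin/sh
# grant_group_access DIR
# Below DIR, gives the group read access to everything the owner can read,
# and execute access to everything the owner can execute.
# Relies on GNU find's "-perm /mode" test.
grant_group_access() {
    find "$1" -perm /u+r ! -perm /g+r -exec chmod g+r {} +
    find "$1" -perm /u+x ! -perm /g+x -exec chmod g+x {} +
}

# Example usage:
#   grant_group_access "$MW_HOME"
```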

Domain Creation

When creating a domain for a developer note the following:

  • Each developer will need their own FMW repository, perhaps prefixed by their username, e.g. dev1, dev2 etc.
  • Each developer needs to use a unique port number for all WebLogic channels
  • Any use of Coherence should use Well Known Addresses to avoid cross talk between developer clusters (note SOA and OSB both use Coherence!)
  • If using Node Manager each developer will need their own instance, using their own configuration.
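
One simple way to keep port numbers unique is to derive them from the developer number.  The scheme below (a base port plus 100 per developer) is purely an assumption for illustration, as is the function name dev_port; any scheme works as long as no two domains on the server share a port.

```shell
#!/bin/sh
# dev_port BASE_PORT DEV_NUMBER
# Prints a listen port for developer DEV_NUMBER (1, 2, ...), spacing
# developers 100 ports apart so each domain has room for several servers.
dev_port() {
    echo $(( $1 + ($2 - 1) * 100 ))
}

# Example usage: Admin Server port for dev2, starting from 7001
#   dev_port 7001 2
```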

Friday Jun 28, 2013

SOA Suite 11g Developers Cookbook Published

SOA Suite 11g Developers Cookbook Available

Just realized that I failed to mention that Matt’s and my most recent book, the SOA Suite 11g Developers Cookbook, was published over Christmas last year!

In some ways this was an easier book to write than the Developers Guide; the hard bit was deciding what recipes to include.  Once we had decided that, the writing of the book was pretty straightforward.

The book focuses on areas that we felt we had neglected in the Developers Guide, and so there is more about Java integration and OSB, both of which we see a lot of questions about when working with customers.

Amazon has a couple of reviews.

Table of Contents

Chapter 1: Building an SOA Suite Cluster
Chapter 2: Using the Metadata Service to Share XML Artifacts
Chapter 3: Working with Transactions
Chapter 4: Mapping Data
Chapter 5: Composite Messaging Patterns
Chapter 6: OSB Messaging Patterns
Chapter 7: Integrating OSB with JSON
Chapter 8: Compressed File Adapter Patterns
Chapter 9: Integrating Java with SOA Suite
Chapter 10: Securing Composites and Calling Secure Web Services
Chapter 11: Configuring the Identity Service
Chapter 12: Configuring OSB to Use Foreign JMS Queues
Chapter 13: Monitoring and Management

More Reviews

In addition to the Amazon Reviews I also found some reviews on GoodReads.

Wednesday Apr 25, 2012

Scripting WebLogic Admin Server Startup

How to Script WebLogic Admin Server Startup

My first car was a 14 year old Vauxhall Viva.  It is the only one of my cars that has ever been stolen, and to this day how they stole it is a mystery to me as I could never get it to start.  I always parked it pointing down a steep hill so that I was ready to jump start it!  Of course its ability to start was dramatically improved when I replaced the carburetor butterfly valve!

Getting SOA Suite or other WebLogic based systems to start can sometimes be a problem because the default WebLogic start scripts require you to stay logged on to the computer where you started the script.  Obviously this is awkward and a better approach is to run the script in the background.  This problem can be avoided by using a WLST script to start the AdminServer but that is more work, so I never bother with it.

If you just run the startup script in the background the standard output and standard error still go to the session where you started the script, not helpful if you log off and later want to see what is happening.  So the next thing to do is to redirect standard out and standard error from the script.

Finally it would be nice to have a record of the output of the last few runs of the Admin Server, but these should be purged to avoid filling up the directory.

Doing the above three tasks is the job of the script I use to start WebLogic.  The script is shown below:

Startup Script

#!/bin/sh

# SET VARIABLES
SCRIPT_HOME=`dirname $0`
MW_HOME=/home/oracle/app/Middleware
DOMAIN_HOME=$MW_HOME/user_projects/domains/dev_domain
LOG_FILE=$DOMAIN_HOME/servers/AdminServer/logs/AdminServer.out

# MOVE EXISTING LOG FILE
logrotate -f -s $SCRIPT_HOME/logrotate.status $SCRIPT_HOME/AdminServerLogRotation.cfg

# RUN ADMIN SERVER
touch $LOG_FILE
# Redirect stdout and stderr portably (&> is a bash extension)
nohup $DOMAIN_HOME/startWebLogic.sh > $LOG_FILE 2>&1 &
tail -f $LOG_FILE

Explanation

Lets walk through each section of the script.

SET VARIABLES

The first few lines of the script just set the environment.  Note that I put the output of the start script into the same location and same filename that it would go to if I used the Node Manager to start the server.  This keeps it consistent with other servers that are started by the node manager.

MOVE EXISTING LOG FILE

The next section keeps a copy of the previous output file by using the logrotate command.  This reads its configuration from the “AdminServerLogRotation.cfg” file shown below:

/home/oracle/app/Middleware/user_projects/domains/dev_domain/servers/AdminServer/logs/AdminServer.out {
  rotate 10
  missingok
}

This tells the logrotate command to keep 10 copies (rotate 10) of the log file and if there is no previous copy of the log file that is not an error condition (missingok).

The logrotate.status file is used by logrotate to keep track of what it has done.  It is ignored when the -f flag is used, causing the log file to be rotated every time the command is invoked.
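
If it is unclear what "rotate 10" actually does to the files, the following sketch mimics the renaming in plain shell.  It is an illustration of the behavior only, not logrotate's implementation, and the function name rotate_log is hypothetical.

```shell
#!/bin/sh
# rotate_log FILE KEEP
# Illustrates what "rotate KEEP" means: FILE.KEEP is dropped, every FILE.n
# becomes FILE.(n+1), and FILE itself becomes FILE.1.
rotate_log() (
    file=$1
    keep=$2
    [ -f "$file" ] || exit 0      # nothing to rotate; same effect as missingok
    rm -f "$file.$keep"           # the oldest copy falls off the end
    n=$((keep - 1))
    while [ "$n" -ge 1 ]; do
        [ -f "$file.$n" ] && mv "$file.$n" "$file.$((n + 1))"
        n=$((n - 1))
    done
    mv "$file" "$file.1"
)

# Example usage:
#   rotate_log "$LOG_FILE" 10
```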

RUN ADMIN SERVER

UPDATE: Sometimes the tail command starts before the shell has created the log file for the startWebLogic.sh command. To avoid an error in the tail command I "touch" the log file to make sure that it is there.

The final section actually invokes the standard command to start an admin server (startWebLogic.sh) and redirects the standard out and standard error to the log file.  Note that I run the command in the background and set it to ignore the death of the parent shell.

Finally I tail the log file so that the user experience is the same as running the start command directly.  However in this case if I Ctrl-C the command only the tail will be terminated, the Admin Server will continue to run as a background process.

This approach allows me to watch the output of the AdminServer but not to shut it down if I accidentally hit Ctrl-C or close the shell window.

Restart Script

I also have a restart script shown below:

#!/bin/sh
# SET VARIABLES
SCRIPT_HOME=`dirname $0`
MW_HOME=/home/oracle/app/Middleware
DOMAIN_HOME=$MW_HOME/user_projects/domains/dev_domain

# STOP ADMIN SERVER
$DOMAIN_HOME/bin/stopWebLogic.sh

# RUN ADMIN SERVER
$SCRIPT_HOME/startAdminServer.sh

This is just like the start script except that it runs the stop weblogic command followed by my start script command.

Summary

The above scripts are quick and easy to put in place for the Admin Server and make the stdout and stderr logging consistent with other servers that are started from the node manager.  Now can someone help me push start my car!

Wednesday Dec 28, 2011

Too Much Debug

Too Much Debug


Well it is Christmas and as is traditional, in England at least, we had roast turkey dinner.  And of course no matter how big your family, turkeys come in only two sizes; massively too big or enormously too big!  So by the third day of Christmas you are ready never to eat turkey again until Thanksgiving.  Your trousers no longer fit around the waist, your sweater is snug around the midriff, and your children start talking about the return of the Blob.

And my point?  Well just like the food world, sometimes in the SOA world too much of a good thing is bad for you.  I had just extended my BPM domain with OSB only to discover that I could no longer start the BAM server, or the newly configured OSB server.  The error message I was getting was:

starting weblogic with Java version:
FATAL ERROR in native method: JDWP No transports initialized, jvmtiError=AGENT_ERROR_TRANSPORT_INIT(197)
ERROR: transport error 202: bind failed: Address already in use
ERROR: JDWP Transport dt_socket failed to initialize, TRANSPORT_INIT(510)
JDWP exit error AGENT_ERROR_TRANSPORT_INIT(197): No transports initialized [../../../src/share/back/debugInit.c:690]
Starting WLS with line:
C:\app\oracle\product\FMW\JDK160~2\bin\java -client -Xdebug -Xnoagent -Xrunjdwp:transport=dt_socket,address=8453,server=y,suspend=n

The mention of JDWP points to a problem with debug settings and sure enough in a development domain the setDomainEnv script is set up to enable debugging of OSB.  The problem is that the settings apply to all servers started with settings in the setDomainEnv script and should really only apply to the OSB servers.  There is a blog entry by Jiji Sasidharan that explains this and provides a fix.  However the fix disables the debug flag for all servers, not just the non-OSB servers.  So I offer my extension to this fix which modifies the setDomainEnv script as follows from:

set debugFlag=true

to:

rem Added so that only OSB server starts in debug mode
if "%SERVER_NAME%"=="osb_server1" (
    set debugFlag=true
)

This enables debugging to occur on managed server osb_server1 (this should match the name of one of your OSB servers to enable debugging).  It does not enable the debug flag for any other server, including other OSB servers in a cluster.  After making this change it may be necessary to restart the Admin Server because it is probably bound to the debug port.
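
On Unix the same idea applies to setDomainEnv.sh.  The sketch below assumes the shell script uses the same debugFlag and SERVER_NAME variables as the Windows version, which you should check against your own domain; the function name debug_flag_for is hypothetical.

```shell
#!/bin/sh
# debug_flag_for SERVER_NAME [DEBUG_SERVER]
# Prints "true" only for the one server that should start with the debug
# agent (default osb_server1), and "false" for every other server.
debug_flag_for() {
    if [ "$1" = "${2:-osb_server1}" ]; then
        echo true
    else
        echo false
    fi
}

# Example usage in setDomainEnv.sh (variable names assumed to mirror the
# Windows script):
#   debugFlag=$(debug_flag_for "${SERVER_NAME}")
#   export debugFlag
```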

So the moral of this tale is don’t eat too much turkey, don’t abuse the debug flag, but make sure you can get the benefits of debugging.

Have a great new year!

Thursday Sep 22, 2011

Coping with Failure

Handling Endpoint Failure in OSB

Recently I was working on a POC and we had demonstrated stellar performance with OSB fronting a BPEL composite calling back end EJBs.  The final test was a failover test which tested killing an OSB and bringing it back online and then killing a SOA(BPEL) server and bringing it back online and finally killing a backend EJB server and bringing it back online.  All was going well until the BPEL failover test when for some reason OSB refused to mark the BPEL server as down.  Turns out we had forgotten to set a very important setting and so this entry outlines how to handle endpoint failure in OSB.

Step 1 – Add Multiple End Points to Business Service

The first thing to do is create multiple end points for the business service, pointing to all available backends.  This is required for HTTP/SOAP bindings.  In theory if using a T3 protocol then a single cluster address is sufficient and load balancing will be taken care of by T3 smart proxies.  In this scenario though we will focus on HTTP/SOAP endpoints.

Navigate to the Business Service->Configuration Details->Transport Configuration and add all your endpoint URIs.  Make sure that Retry Count is greater than 0 if you don’t want to pass failures back to the client.  In the example below I have set up links to three back end web service instances.  Go to Last and Save the changes.

Step 2 – Enable Offlining & Recovery of Endpoint URIs

When a back end service instance fails we want to take it offline, meaning we want to remove it from the pool of instances to which OSB will route requests.  We do this by navigating to the Business Service->Operational Settings and selecting the Enable check box for Offline Endpoint URIs in the General Configuration section.  This causes OSB to stop routing requests to a backend that returns errors (if the transport setting Retry Application Errors is set) or fails to respond at all.

Offlining the service is good because we won’t send any more requests to a broken endpoint, but we also want to add the endpoint again when it becomes available.  We do this by setting the Enable with Retry Interval in General Configuration to some non-zero value, such as 30 seconds.  Then every 30 seconds OSB will add the failed service endpoint back into the list of endpoints.  If the endpoint is still not ready to accept requests then it will error again and be removed again from the list.  In the example below I have set up a 30 second retry interval.  Remember to hit update and then commit all the session changes.


Considerations on Retry Count

A couple of things to be aware of on retry count.

If you set retry count to greater than zero then endpoint failures will be transparent to OSB clients, other than the additional delay they experience.  However, if the request is mutative (changes the backend) then there is no guarantee that it was not already executed before the endpoint failed to return the result, in which case a retry submits the mutative operation twice.  If your back end service can’t cope with this then don’t set retries.

If your back-end service can’t cope with retries then you can still get the benefit of transparent retries for non-mutative operations by creating two business services, one with retry enabled that handles non-mutative requests, and the other with retry set to zero that handles mutative requests.

Considerations on Retry Interval for Offline Endpoints

If you set the retry interval to too small a value then it is very likely that your failed endpoint will not have recovered, so you will waste time on a request failing to contact that endpoint before failing over to a new endpoint, which will increase the client response time.  Work out what would be a typical unplanned outage time for a node (such as one caused by a JVM failure and subsequent restart) and set the retry interval to, say, half of this as a compromise between causing additional client response time delays and adding the endpoint back into the mix as soon as possible.
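
As a worked example of that rule of thumb, the sketch below halves a measured outage time to suggest a retry interval.  The function name suggest_retry_interval and the one second floor are assumptions; the right value for your system depends on how your nodes actually fail and restart.

```shell
#!/bin/sh
# suggest_retry_interval OUTAGE_SECONDS
# Prints half of the typical unplanned outage time (integer division),
# with a floor of 1 second so the result is never zero.
suggest_retry_interval() (
    interval=$(( $1 / 2 ))
    [ "$interval" -ge 1 ] || interval=1
    echo "$interval"
)

# Example usage: a JVM crash and restart typically takes about 60 seconds
#   suggest_retry_interval 60
```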

Conclusion

Always remember to set the Operational Setting to Enable Offlining and then you won’t be surprised in a fail over test!

About

Musings on Fusion Middleware and SOA.  Antony works with customers across the US and Canada in implementing SOA and other Fusion Middleware solutions.  Antony is the co-author of the SOA Suite 11g Developers Cookbook, the SOA Suite 11g Developers Guide and the SOA Suite Developers Guide.
