Wednesday Feb 26, 2014

Clustering Events

Setting up an Oracle Event Processing Cluster

Recently I was working with Oracle Event Processing (OEP) and needed to set it up as part of a high availability cluster.  OEP uses Coherence for quorum membership in an OEP cluster.  Because the solution used caching it was also necessary to include access to external Coherence nodes.  Input messages needed to be duplicated across multiple OEP streams, so a JMS Topic adapter had to be configured.  Finally, only one copy of each output event was desired, which required the use of an HA adapter.  In this blog post I will go through the steps required to implement a true HA OEP cluster.

OEP High Availability Review

The diagram below shows a very simple non-HA OEP configuration:

Events are received from a source (JMS in this blog).  The events are processed by an event processing network which makes use of a cache (Coherence in this blog).  Finally any output events are emitted.  The output events could go to any destination but in this blog we will emit them to a JMS queue.

OEP provides high availability by having multiple event processing instances processing the same event stream in an OEP cluster.  One instance acts as the primary and the other instances act as secondary processors.  Usually only the primary will output events as shown in the diagram below (top stream is the primary):

The actual event processing is the same as in the previous non-HA example.  What is different is how input and output events are handled.  Because we want to minimize or avoid duplicate events we have added an HA output adapter to the event processing network.  This adapter acts as a filter, so that only the primary stream will emit events to our queue.  If the processing of events within the network depends on the time at which events are received, then it is also necessary to add an HA input adapter to synchronize the arrival timestamps of events across the cluster.

OEP Cluster Creation

Let's begin by setting up the base OEP cluster.  To do this we create new OEP configurations on each machine in the cluster.  The steps are outlined below.  Note that the same steps are performed on each machine for each server which will run on that machine:

  • Run ${MW_HOME}/ocep_11.1/common/bin/config.sh.
    • MW_HOME is the installation directory.  Note that multiple Fusion Middleware products may be installed in this directory.
  • When prompted, choose “Create a new OEP domain”.
  • Provide administrator credentials.
    • Make sure you provide the same credentials on all machines in the cluster.
  • Specify a “Server name” and “Server listen port”.
    • Each OEP server must have a unique name.
    • Different servers can share the same “Server listen port” unless they are running on the same host.
  • Provide keystore credentials.
    • Make sure you provide the same credentials on all machines in the cluster.
  • Configure any required JDBC data source.
  • Provide the “Domain Name” and “Domain location”.
    • All servers must have the same “Domain name”.
    • The “Domain location” may be different on each server, but I would keep it the same to simplify administration.
    • Multiple servers on the same machine can share the “Domain location” because their configuration will be placed in the directory corresponding to their server name.
  • Create domain!

Configuring an OEP Cluster

Now that we have created our servers we need to configure them so that they can find each other.  OEP uses Oracle Coherence to determine cluster membership.  Coherence clusters can use either multicast or unicast to discover already running members of a cluster.  Multicast has the advantage that it is easy to set up and scales better (see http://www.ateam-oracle.com/using-wka-in-large-coherence-clusters-disabling-multicast/) but has a number of challenges, including failure to propagate by default through routers and accidentally joining the wrong cluster because someone else chose the same multicast settings.  We will show how to use both unicast and multicast to discover the cluster.

Multicast Discovery: Coherence multicast uses a class D multicast address that is shared by all servers in the cluster.  On startup a Coherence node broadcasts a message to the multicast address looking for an existing cluster.  If no-one responds then the node will start the cluster.

Unicast Discovery: Coherence unicast uses Well Known Addresses (WKAs).  Each server in the cluster needs a dedicated listen address/port combination.  A subset of these addresses are configured as WKAs and shared between all members of the cluster.  As long as at least one of the WKAs is up and running then servers can join the cluster.  If a server does not find any cluster members then it checks to see if its listen address and port are in the WKA list.  If they are then that server will start the cluster, otherwise it will wait for a WKA server to become available.
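The unicast start-up rule described above can be modelled in a few lines.  This is a minimal Python sketch of the decision logic only, not Coherence code; the function name startup_action and the (host, port) tuple representation are hypothetical and chosen purely for illustration.

```python
def startup_action(own_address, wka_list, running_members):
    """Decide what a node does at start-up under WKA (unicast) discovery.

    own_address     -- (host, port) this node listens on
    wka_list        -- list of (host, port) well known addresses
    running_members -- set of (host, port) members that are currently up
    """
    # A running WKA member means there is an existing cluster to join.
    if any(wka in running_members for wka in wka_list):
        return "join"
    # Nobody answered: only a node that is itself a WKA may start the cluster.
    if own_address in wka_list:
        return "start"
    # Otherwise keep waiting for a WKA server to become available.
    return "wait"


wka = [("192.168.56.91", 9200), ("192.168.56.92", 9200)]
first_wka = startup_action(("192.168.56.91", 9200), wka, set())
non_wka_alone = startup_action(("192.168.56.93", 9200), wka, set())
non_wka_later = startup_action(("192.168.56.93", 9200), wka, {("192.168.56.92", 9200)})
print(first_wka, non_wka_alone, non_wka_later)
```

The sketch makes the practical consequence visible: a server that is not in the WKA list can never be the first member of the cluster.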
  To configure a cluster the same steps need to be followed for each server in the cluster:
  • Set an event server address in the config.xml file.
    • Add the following to the <cluster> element:
      <cluster>
          <server-name>server1</server-name>
          <server-host-name>oep1.oracle.com</server-host-name>
      </cluster>
    • The “server-name” is displayed in the visualizer and should be unique to the server.

    • The “server-host-name” is used by the visualizer to access remote servers.

    • The “server-host-name” must be an IP address or it must resolve to an IP address that is accessible from all other servers in the cluster.

    • The listening port is configured in the <netio> section of the config.xml.

    • The server-host-name/listening port combination should be unique to each server.

 
  • When using multicast discovery, set a common cluster multicast listen address shared by all servers in the config.xml file.
    • Add the following to the <cluster> element:
      <cluster>
          …
          <!-- For use in Coherence multicast only! -->
          <multicast-address>239.255.200.200</multicast-address>
          <multicast-port>9200</multicast-port>
      </cluster>
    • The “multicast-address” must be able to be routed through any routers between servers in the cluster.

  • When using multicast discovery, you can optionally specify the bind address of the server.  This allows you to control port usage and determine which network is used by Coherence.

    • Create a “tangosol-coherence-override.xml” file in the ${DOMAIN}/{SERVERNAME}/config directory for each server in the cluster.
      <?xml version='1.0'?>
      <coherence>
          <cluster-config>
              <unicast-listener>
                  <!-- This server's Coherence address and port number -->
                  <address>192.168.56.91</address>
                  <port>9200</port>
              </unicast-listener>
          </cluster-config>
      </coherence>
  • When using unicast discovery, configure the Coherence WKA cluster discovery.

    • Create a “tangosol-coherence-override.xml” file in the ${DOMAIN}/{SERVERNAME}/config directory for each server in the cluster.
      <?xml version='1.0'?>
      <coherence>
          <cluster-config>
              <unicast-listener>
                  <!-- WKA Configuration -->
                  <well-known-addresses>
                      <socket-address id="1">
                          <address>192.168.56.91</address>
                          <port>9200</port>
                      </socket-address>
                      <socket-address id="2">
                          <address>192.168.56.92</address>
                          <port>9200</port>
                      </socket-address>
                  </well-known-addresses>
                  <!-- This server's Coherence address and port number -->
                  <address>192.168.56.91</address>
                  <port>9200</port>
              </unicast-listener>
          </cluster-config>
      </coherence>

    • List at least two servers in the <socket-address> elements.

    • For each <socket-address> element there should be a server that has corresponding <address> and <port> elements directly under <unicast-listener>.

    • One of the servers listed in the <well-known-addresses> element must be the first server started.

    • Not all servers need to be listed in <well-known-addresses>, but see previous point.

 
  • Enable clustering using a Coherence cluster.
    • Add the following to the <cluster> element in config.xml.
      <cluster>
          …
          <enabled>true</enabled>
      </cluster>
    • The “enabled” element tells OEP that it will be using Coherence to establish cluster membership.  This can also be achieved by setting the value to “coherence”.

 
  • For multicast discovery, the following shows the <cluster> config for another server in the cluster.  Only the server-name and server-host-name values differ:
    <cluster>
        <server-name>server2</server-name>
        <server-host-name>oep2.oracle.com</server-host-name>
        <!-- For use in Coherence multicast only! -->
        <multicast-address>239.255.200.200</multicast-address>
        <multicast-port>9200</multicast-port>
        <enabled>true</enabled>
    </cluster>

  • For unicast discovery, the following shows the <cluster> config for another server in the cluster.  Only the server-name and server-host-name values differ:
    <cluster>
        <server-name>server2</server-name>
        <server-host-name>oep2.oracle.com</server-host-name>
        <enabled>true</enabled>
    </cluster>

 
  • For unicast discovery, the following shows the “tangosol-coherence-override.xml” file for another server in the cluster.  Only this server's own listen address differs:
    <?xml version='1.0'?>
    <coherence>
        <cluster-config>
            <unicast-listener>
                <!-- WKA Configuration -->
                <well-known-addresses>
                    <socket-address id="1">
                        <address>192.168.56.91</address>
                        <port>9200</port>
                    </socket-address>
                    <socket-address id="2">
                        <address>192.168.56.92</address>
                        <port>9200</port>
                    </socket-address>
                </well-known-addresses>
                <!-- This server's Coherence address and port number -->
                <address>192.168.56.92</address>
                <port>9200</port>
            </unicast-listener>
        </cluster-config>
    </coherence>
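Because a misplaced <address> element is easy to miss by eye, it can help to check an override file mechanically.  The following is a minimal Python sketch using only the standard library; the function name summarize_override and the returned dictionary keys are hypothetical, and the XML is an inline copy of the unicast example so that the sketch is self-contained.

```python
import xml.etree.ElementTree as ET

OVERRIDE_XML = """<?xml version='1.0'?>
<coherence>
  <cluster-config>
    <unicast-listener>
      <well-known-addresses>
        <socket-address id="1">
          <address>192.168.56.91</address>
          <port>9200</port>
        </socket-address>
        <socket-address id="2">
          <address>192.168.56.92</address>
          <port>9200</port>
        </socket-address>
      </well-known-addresses>
      <address>192.168.56.92</address>
      <port>9200</port>
    </unicast-listener>
  </cluster-config>
</coherence>
"""


def summarize_override(xml_text):
    """Extract the WKA list and this server's own listen address."""
    listener = ET.fromstring(xml_text).find("cluster-config/unicast-listener")
    wkas = [
        (sa.findtext("address").strip(), int(sa.findtext("port")))
        for sa in listener.findall("well-known-addresses/socket-address")
    ]
    # The server's own address and port are direct children of <unicast-listener>.
    own = (listener.findtext("address").strip(), int(listener.findtext("port")))
    return {
        "wkas": wkas,
        "own": own,
        "is_wka": own in wkas,            # can this server start the cluster?
        "enough_wkas": len(wkas) >= 2,    # at least two WKAs are recommended above
    }


summary = summarize_override(OVERRIDE_XML)
print(summary)
```

Running this against each server's file shows at a glance which servers are able to start the cluster and which must wait for a WKA server.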

You should now have a working OEP cluster.  Check the cluster by starting all the servers.

Look for a message like the following on the first server to start to indicate that another server has joined the cluster:

<Coherence> <BEA-2049108> <The domain membership has changed to [server2, server1], the new domain primary is "server1">
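If you want to check for this message from a script rather than by eye, a regular expression is enough.  This is a minimal Python sketch that assumes the log line has exactly the shape shown above; the function name parse_membership is hypothetical.

```python
import re

MEMBERSHIP_RE = re.compile(
    r'The domain membership has changed to \[(?P<members>[^\]]*)\], '
    r'the new domain primary is "(?P<primary>[^"]+)"'
)


def parse_membership(line):
    """Return (members, primary) from a membership log line, or None."""
    match = MEMBERSHIP_RE.search(line)
    if match is None:
        return None
    members = [m.strip() for m in match.group("members").split(",") if m.strip()]
    return members, match.group("primary")


line = ('<Coherence> <BEA-2049108> <The domain membership has changed to '
        '[server2, server1], the new domain primary is "server1">')
members, primary = parse_membership(line)
print(members, primary)
```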

Log on to the Event Processing Visualizer of one of the servers – http://<hostname>:<port>/wlevs.  Select the cluster name on the left and then select group “AllDomainMembers”.  You should see a list of all the running servers in the “Servers of Group – AllDomainMembers” section.

Sample Application

Now that we have a working OEP cluster let us look at a simple application that can be used as an example of how to cluster enable an application.  This application models service request tracking for hardware products.  The application we will use performs the following checks:

  1. If a new service request (identified by SRID) arrives (indicated by status=RAISE) then we expect some sort of follow up in the next 10 seconds (seconds because I want to test this quickly).  If no follow up is seen then an alert should be raised.
    • For example if I receive an event (SRID=1, status=RAISE) and after 10 seconds I have not received a follow up message (SRID=1, status<>RAISE) then I need to raise an alert.
  2. If a service request (identified by SRID) arrives and there has been another service request (identified by a different SRID) for the same physical hardware (identified by TAG) then an alert should be raised.
    • For example if I receive an event (SRID=2, TAG=M1) and later I receive another event for the same hardware (SRID=3, TAG=M1) then an alert should be raised.

Note use case 1 is nicely time bounded – in this case the time window is 10 seconds.  Hence this is an ideal candidate to be implemented entirely in CQL.

Use case 2 has no time constraints, hence over time there could be a very large number of CQL queries running looking for a matching TAG but a different SRID.  In this case it is better to put the TAGs into a cache and search the cache for duplicate tags.  This reduces the amount of state information held in the OEP engine.
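To make the two checks concrete, here is a minimal Python sketch of the logic.  It is only a model: the real application uses CQL and a Coherence cache, whereas this sketch uses plain dictionaries.  The function name check_requests, the tuple layout, the alert labels and the follow-up status names ASSIGNED and CLOSED are all hypothetical, and a follow-up arriving at exactly 10 seconds is assumed to be in time.

```python
FOLLOW_UP_WINDOW = 10  # seconds, as in use case 1


def check_requests(events, end_time):
    """events: list of (time, srid, status, tag) tuples ordered by time."""
    alerts = []
    pending = {}    # srid -> time of a RAISE still waiting for a follow-up
    tag_cache = {}  # tag -> first srid seen for that hardware (stands in for the cache)

    def expire(now):
        # Use case 1: raise an alert for every RAISE whose window has passed.
        for srid, raised in sorted(pending.items()):
            if now - raised > FOLLOW_UP_WINDOW:
                alerts.append(("MISSING_FOLLOW_UP", srid))
                del pending[srid]

    for time, srid, status, tag in events:
        expire(time)
        if status == "RAISE":
            pending[srid] = time
            # Use case 2: only new service requests are checked against the cache.
            previous = tag_cache.get(tag)
            if previous is not None and previous != srid:
                alerts.append(("DUPLICATE_TAG", srid))
            tag_cache.setdefault(tag, srid)
        else:
            # Any other status counts as a follow-up for that service request.
            pending.pop(srid, None)
    expire(end_time)
    return alerts


events = [
    (0, 1, "RAISE", "M1"),
    (4, 1, "ASSIGNED", "M1"),  # follow-up in time, no alert
    (5, 2, "RAISE", "M2"),     # never followed up
    (6, 3, "RAISE", "M1"),     # same hardware as SRID 1
    (8, 3, "CLOSED", "M1"),
]
alerts = check_requests(events, end_time=30)
print(alerts)
```

Note how the pending dictionary empties itself as windows expire, while tag_cache only ever grows.  That difference is the reason use case 2 is a better fit for a cache than for CQL state.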

The sample application to implement this is shown below:

Messages are received from a JMS Topic (InboundTopicAdapter).  Test messages can be injected via a CSV adapter (RequestEventCSVAdapter).  Alerts are sent to a JMS Queue (OutboundQueueAdapter), and also printed to the server standard output (PrintBean).  Use case 1 is implemented by the MissingEventProcessor.  Use case 2 is implemented by inserting the TAG into a cache (InsertServiceTagCacheBean) using a Coherence event processor and then querying the cache for each new service request (DuplicateTagProcessor).  If the same tag is already associated with an SR in the cache then an alert is raised.  The RaiseEventFilter is used to filter out existing service requests from the use case 2 stream.

The non-HA version of the application is available to download here.

We will use this application to demonstrate how to HA enable an application for deployment on our cluster.

A CSV file (TestData.csv) and a load generator properties file (HADemoTest.prop) are provided to test the application by injecting events using the CSV Adapter.

Note that the application reads a configuration file (Server.properties) which should be placed in the domain directory of each event server.

Deploying an Application

Before deploying an application to a cluster it is a good idea to create a group in the cluster.  Multiple servers can be members of this group.  To add a group to an event server just add an entry to the <cluster> element in config.xml as shown below:

<cluster>
    …
    <groups>HAGroup</groups>
</cluster>

Multiple servers can be members of a group and a server can be a member of multiple groups.  This allows you to have different levels of high availability in the same event processing cluster.

Deploy the application using the Visualizer.  Target the application at the group you created, or the AllDomainMembers group.

Test the application, typically using a CSV Adapter.  Note that using a CSV adapter sends all the events to a single event server.  To fix this we need to add a JMS output adapter (OutboundTopicAdapter) to our application and then send events from the CSV adapter to the outbound JMS adapter as shown below:

So now we are able to send events via CSV to an event processor that in turn sends the events to a JMS topic.  But we still have a few challenges.

Managing Input

The first challenge is managing input.  Because OEP relies on the same event stream being processed by multiple servers we need to make sure that all our servers get the same messages from the JMS Topic.  To do this we configure the JMS connection factory to have an Unrestricted Client ID.  This allows multiple clients (OEP servers in our case) to use the same connection factory.  Client IDs are mandatory when using durable topic subscriptions.  We also need each event server to have its own subscriber ID for the JMS Topic; this ensures that each server will get a copy of all the messages posted to the topic.  If we use the same subscriber ID for all the servers then the messages will be distributed across the servers, with each server seeing a set of messages completely disjoint from those seen by the other servers in the cluster.  This is not what we want because each server should see the same event stream.  We can use the server name as the subscriber ID as shown in the excerpt from our application below:

<wlevs:adapter id="InboundTopicAdapter" provider="jms-inbound">
    …
    <wlevs:instance-property name="durableSubscriptionName"
            value="${com_bea_wlevs_configuration_server_ClusterType.serverName}" />
</wlevs:adapter>

This works because I have placed a ConfigurationPropertyPlaceholderConfigurer bean in my application as shown below.  This same bean is also used to access properties from a configuration file:

<bean id="ConfigBean"
        class="com.bea.wlevs.spring.support.ConfigurationPropertyPlaceholderConfigurer">
    <property name="location" value="file:../Server.properties"/>
</bean>

With this configuration each server will now get a copy of all the events.
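The effect of the subscriber ID can be illustrated with a small model.  This is a Python sketch of the behaviour described above, not JMS code: the function name deliver is hypothetical, and the round-robin split between servers sharing a subscription is an assumption made for the illustration (the real distribution is decided by the JMS provider).

```python
from collections import defaultdict
from itertools import cycle


def deliver(messages, subscriber_id_by_server):
    """Model topic delivery: every subscription receives each message once,
    and servers sharing a subscription split its messages between them."""
    servers_by_subscription = defaultdict(list)
    for server, sub_id in subscriber_id_by_server.items():
        servers_by_subscription[sub_id].append(server)

    received = {server: [] for server in subscriber_id_by_server}
    for servers in servers_by_subscription.values():
        # Round-robin is an assumption; it just makes the split easy to see.
        for message, server in zip(messages, cycle(servers)):
            received[server].append(message)
    return received


messages = ["e1", "e2", "e3", "e4"]
distinct = deliver(messages, {"server1": "server1", "server2": "server2"})
shared = deliver(messages, {"server1": "shared", "server2": "shared"})
print(distinct)
print(shared)
```

With distinct subscriber IDs both servers receive all four events; with a shared ID each server sees only half of the stream, which would break the HA model.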

As our application relies on elapsed time we should make sure that the timestamps of the received messages are the same on all servers.  We do this by adding an HA Input adapter to our application.

<wlevs:adapter id="HAInputAdapter" provider="ha-inbound">
    <wlevs:listener ref="RequestChannel" />
    <wlevs:instance-property name="keyProperties"
            value="EVID" />
    <wlevs:instance-property name="timeProperty" value="arrivalTime"/>
</wlevs:adapter>

The HA Adapter sets the given “timeProperty” in the input message to be the current system time.  This time is then communicated to other HAInputAdapters deployed to the same group.  This allows all servers in the group to have the same timestamp in their event.  The event is identified by the “keyProperties” key field.

To allow the downstream processing to treat the timestamp as an arrival time, the downstream channel is configured with an “application-timestamped” element that sets the arrival time of the event.  This is shown below:

<wlevs:channel id="RequestChannel" event-type="ServiceRequestEvent">
    <wlevs:listener ref="MissingEventProcessor" />
    <wlevs:listener ref="RaiseEventFilterProcessor" />
    <wlevs:application-timestamped>
        <wlevs:expression>arrivalTime</wlevs:expression>
    </wlevs:application-timestamped>
</wlevs:channel>

Note the property set in the HAInputAdapter is used to set the arrival time of the event.

So now all servers in our cluster have the same events arriving from a topic, and each event arrival time is synchronized across the servers in the cluster.
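The idea behind the timestamp synchronization can be sketched as follows.  This is a simplified Python model, not the adapter's implementation: it assumes the primary assigns the time and that the value has already reached the secondary when the secondary processes the event.  The function names and the shared_times dictionary (standing in for the communication between adapters) are hypothetical.

```python
def stamp_on_primary(event, clock, shared_times):
    """Primary: set arrivalTime from the local clock and publish it by key."""
    stamped = dict(event, arrivalTime=clock())
    shared_times[event["EVID"]] = stamped["arrivalTime"]
    return stamped


def stamp_on_secondary(event, shared_times):
    """Secondary: reuse the time published for the event with the same key."""
    return dict(event, arrivalTime=shared_times[event["EVID"]])


shared_times = {}
event = {"EVID": 42, "SRID": 1, "status": "RAISE"}

# The two servers have different local clocks, but only the primary's is used.
primary_event = stamp_on_primary(event, lambda: 1000, shared_times)
secondary_event = stamp_on_secondary(event, shared_times)
print(primary_event["arrivalTime"], secondary_event["arrivalTime"])
```

Because both copies of the event carry the same arrivalTime, time-based CQL windows on every server open and close at the same point in the stream.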

Managing Output

Note that an OEP cluster has multiple servers processing the same input stream.  Obviously if we have the same inputs, synchronized to appear to arrive at the same time, then we will get the same outputs, which is central to OEP's promise of high availability.  So when an alert is raised by our application it will be raised by every server in the cluster.  If we have 3 servers in the cluster then we will get 3 copies of the same alert appearing on our alert queue.  This is probably not what we want.  To fix this we take advantage of an HA Output Adapter.  Unlike input, where there is a single HA Input Adapter, there are multiple HA Output Adapters, each with distinct performance and behavioral characteristics.  The table below is taken from the Oracle® Fusion Middleware Developer's Guide for Oracle Event Processing and shows the different levels of service and performance impact:

Table 24-1 Oracle Event Processing High Availability Quality of Service

High Availability Option                            | Missed Events? | Duplicate Events? | Performance Overhead
Section 24.1.2.1, "Simple Failover"                 | Yes (many)     | Yes (few)         | Negligible
Section 24.1.2.2, "Simple Failover with Buffering"  | Yes (few)      | Yes (many)        | Low
Section 24.1.2.3, "Light-Weight Queue Trimming"     | No             | Yes (few)         | Low-Medium
Section 24.1.2.4, "Precise Recovery with JMS"       | No             | No                | High

I decided to go for the light-weight queue trimming option.  This means I won’t lose any events, but I may emit a few duplicate events in the event of primary failure.  This setting causes all output events to be buffered by the secondaries until they are told by the primary that a particular event has been emitted.  To configure this option I add the following adapter to my EPN:

    <wlevs:adapter id="HAOutputAdapter" provider="ha-broadcast">
        <wlevs:listener ref="OutboundQueueAdapter" />
        <wlevs:listener ref="PrintBean" />
        <wlevs:instance-property name="keyProperties" value="timestamp"/>
        <wlevs:instance-property name="monotonic" value="true"/>
        <wlevs:instance-property name="totalOrder" value="false"/>
    </wlevs:adapter>

This uses the time of the alert (timestamp property) as the key used to identify events which have been trimmed.  This works in this application because the alert time is the time of the source event, and the times of the source events are synchronized using the HA Input Adapter.  Because this is a time value it will never decrease, and so I set monotonic=“true”.  However, I may get two alerts raised with the same timestamp, so I set totalOrder=“false”.

I also added the following configuration to config.xml for the application:

<ha:ha-broadcast-adapter>
    <name>HAOutputAdapter</name>
    <warm-up-window-length units="seconds">15</warm-up-window-length>
    <trimming-interval units="millis">1000</trimming-interval>
</ha:ha-broadcast-adapter>

This causes the primary to tell the secondaries its latest emitted alert every second.  The secondaries then trim from their buffers all alerts up to and including that alert.  So in the worst case I will get one second of duplicated alerts.  It is also possible to set a number of events rather than a time period.  The trade-off is as follows.  Longer time intervals (or more events) reduce synchronization overhead but cause the secondaries to use more memory.  More frequent synchronization uses less memory in the secondaries and generates fewer duplicate alerts, but requires more communication between the primary and the secondaries to trim the buffers.
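The buffering and trimming behaviour can be modelled in a few lines.  This is a simplified Python sketch of the idea, not the HA adapter's implementation; the class name SecondaryBuffer and its methods are hypothetical, and the timestamps are arbitrary example values.

```python
class SecondaryBuffer:
    """Output events held by a secondary until the primary confirms them."""

    def __init__(self):
        self.events = []  # (timestamp, payload) in emission order

    def add(self, timestamp, payload):
        self.events.append((timestamp, payload))

    def trim(self, last_emitted_timestamp):
        # Drop everything up to and including the primary's latest emitted key.
        self.events = [e for e in self.events if e[0] > last_emitted_timestamp]

    def take_over(self):
        # On primary failure the secondary emits whatever is still buffered.
        pending, self.events = self.events, []
        return pending


buffer = SecondaryBuffer()
for ts, alert in [(100, "alert A"), (200, "alert B"), (300, "alert C"), (400, "alert D")]:
    buffer.add(ts, alert)

# The primary emitted all four alerts, but its last trim message only covered 200.
buffer.trim(200)
replayed = buffer.take_over()
print(replayed)
```

In this run the primary fails after emitting alerts C and D but before the next trimming message, so the new primary emits them again.  Those are the duplicates the trimming interval bounds: nothing is lost, and at most one interval's worth of alerts is repeated.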

The warm-up window is used to stop a secondary joining the cluster before it has been running for that time period.  The window is based on the time that the EPN needs to be running to have the same state as the other servers.  In our example application we have a CQL query that runs over a period of 10 seconds, so I set the warm-up window to 15 seconds to ensure that a newly started server has the same state as all the other servers in the cluster.  The warm-up window should be greater than the longest query window.

Adding an External Coherence Cluster

When we are running OEP as a cluster we have additional overhead in the servers.  The HA Input Adapter synchronizes event time across the servers, and the HA Output Adapter synchronizes output events across the servers.  The HA Output Adapter also buffers output events in the secondaries.  We can’t do anything about this, but we can move the Coherence cache we are using outside of the OEP servers, reducing the memory pressure on those servers and also moving some of the processing outside of them.  Making our Coherence caches external to our OEP cluster is a good idea for the following reasons:

  • Allows moving storage of cache entries outside of the OEP server JVMs hence freeing more memory for storing CQL state.
  • Allows storage of more entries in the cache by scaling cache independently of the OEP cluster.
  • Moves cache processing outside OEP servers.

To create the external Coherence cache do the following:

  • Create a new directory for our standalone Coherence servers, perhaps at the same level as the OEP domain directory.
  • Copy the tangosol-coherence-override.xml file previously created for the OEP cluster into a config directory under the Coherence directory created in the previous step.
  • Copy the coherence-cache-config.xml file from the application into a config directory under the Coherence directory created in the previous step.
  • Add the following to the tangosol-coherence-override.xml file in the Coherence config directory:
    • <coherence>
          <cluster-config>
              <member-identity>
                  <cluster-name>oep_cluster</cluster-name>
                  <member-name>Grid1</member-name>
              </member-identity>
              …
          </cluster-config>
      </coherence>
    • Important Note: The <cluster-name> must match the name of the OEP cluster as defined in the <domain><name> element in the event server's config.xml.
    • The member name is used to help identify the server.
  • Disable storage for our caches in the event servers by editing the coherence-cache-config.xml file in the application and adding the following element to the caches:
    • <distributed-scheme>
          <scheme-name>DistributedCacheType</scheme-name>
          <service-name>DistributedCache</service-name>
          <backing-map-scheme>
              <local-scheme/>
          </backing-map-scheme>
          <local-storage>false</local-storage>
      </distributed-scheme>
    • The local-storage flag stops the OEP server from storing entries for caches using this cache scheme.
    • Do not disable storage at the global level (-Dtangosol.coherence.distributed.localstorage=false) because this will disable storage on some OEP specific cache schemes as well as our application cache.  We don’t want to put those schemes into our cache servers because they are used by OEP to maintain cluster integrity and have only one entry per application per server, so are very small.  If we put those into our Coherence Cache servers we would have to add OEP specific libraries to our cache servers and enable them in our coherence-cache-config.xml, all of which is too much trouble for little or no benefit.
  • If using Unicast Discovery (this section is not required if using Multicast) then we want to make the Coherence Grid be the Well Known Address servers because we want to disable storage of entries on our OEP servers, and Coherence nodes with storage disabled cannot initialize a cluster.  To enable the Coherence servers to be primaries in the Coherence grid do the following:
    • Change the unicast-listener addresses in the Coherence servers’ tangosol-coherence-override.xml files to be suitable values for the machine they are running on; typically this means changing the listen address.
    • Modify the WKA addresses in the OEP servers’ and the Coherence servers’ tangosol-coherence-override.xml files to match at least two of the Coherence servers’ listen addresses.
    • The following listings show how this might be configured for 2 OEP servers and 2 Cache servers:

      OEP Server 1:

      <?xml version='1.0'?>
      <coherence>
        <cluster-config>
          <unicast-listener>
            <well-known-addresses>
              <socket-address id="1">
                <address>192.168.56.91</address>
                <port>9300</port>
              </socket-address>
              <socket-address id="2">
                <address>192.168.56.92</address>
                <port>9300</port>
              </socket-address>
            </well-known-addresses>
            <address>192.168.56.91</address>
            <port>9200</port>
          </unicast-listener>
        </cluster-config>
      </coherence>

      OEP Server 2:

      <?xml version='1.0'?>
      <coherence>
        <cluster-config>
          <unicast-listener>
            <well-known-addresses>
              <socket-address id="1">
                <address>192.168.56.91</address>
                <port>9300</port>
              </socket-address>
              <socket-address id="2">
                <address>192.168.56.92</address>
                <port>9300</port>
              </socket-address>
            </well-known-addresses>
            <address>192.168.56.92</address>
            <port>9200</port>
          </unicast-listener>
        </cluster-config>
      </coherence>

      Cache Server 1:

      <?xml version='1.0'?>
      <coherence>
        <cluster-config>
          <member-identity>
            <cluster-name>oep_cluster</cluster-name>
            <member-name>Grid1</member-name>
          </member-identity>
          <unicast-listener>
            <well-known-addresses>
              <socket-address id="1">
                <address>192.168.56.91</address>
                <port>9300</port>
              </socket-address>
              <socket-address id="2">
                <address>192.168.56.92</address>
                <port>9300</port>
              </socket-address>
            </well-known-addresses>
            <address>192.168.56.91</address>
            <port>9300</port>
          </unicast-listener>
        </cluster-config>
      </coherence>

      Cache Server 2:

      <?xml version='1.0'?>
      <coherence>
        <cluster-config>
          <member-identity>
            <cluster-name>oep_cluster</cluster-name>
            <member-name>Grid2</member-name>
          </member-identity>
          <unicast-listener>
            <well-known-addresses>
              <socket-address id="1">
                <address>192.168.56.91</address>
                <port>9300</port>
              </socket-address>
              <socket-address id="2">
                <address>192.168.56.92</address>
                <port>9300</port>
              </socket-address>
            </well-known-addresses>
            <address>192.168.56.92</address>
            <port>9300</port>
          </unicast-listener>
        </cluster-config>
      </coherence>

    • Note that the OEP servers do not listen on the WKA addresses; they use different port numbers even though they run on the same machines as the cache servers.
    • Also note that the Coherence servers are the ones that listen on the WKA addresses.
  • Now that the configuration is complete we can create a start script for the Coherence grid servers as follows:
    • #!/bin/sh
      MW_HOME=/home/oracle/fmw
      OEP_HOME=${MW_HOME}/ocep_11.1
      JAVA_HOME=${MW_HOME}/jrockit_160_33
      CACHE_SERVER_HOME=${MW_HOME}/user_projects/domains/oep_coherence
      CACHE_SERVER_CLASSPATH=${CACHE_SERVER_HOME}/HADemoCoherence.jar:${CACHE_SERVER_HOME}/config
      COHERENCE_JAR=${OEP_HOME}/modules/com.tangosol.coherence_3.7.1.6.jar
      JAVAEXEC=$JAVA_HOME/bin/java
      # specify the JVM heap size
      MEMORY=512m
      if [ "$1" = "-jmx" ]; then
          JMXPROPERTIES="-Dcom.sun.management.jmxremote -Dtangosol.coherence.management=all -Dtangosol.coherence.management.remote=true"
          shift
      fi
      JAVA_OPTS="-Xms$MEMORY -Xmx$MEMORY $JMXPROPERTIES"
      $JAVAEXEC -server -showversion $JAVA_OPTS -cp "${CACHE_SERVER_CLASSPATH}:${COHERENCE_JAR}" com.tangosol.net.DefaultCacheServer $1
    • Note that I put the tangosol-coherence-override and the coherence-cache-config.xml files in a config directory and added that directory to my path (CACHE_SERVER_CLASSPATH=${CACHE_SERVER_HOME}/HADemoCoherence.jar:${CACHE_SERVER_HOME}/config) so that Coherence would find the override file.
    • Because my application uses in-cache processing (entry processors) I had to add a jar file containing the required classes for the entry processor to the classpath (CACHE_SERVER_CLASSPATH=${CACHE_SERVER_HOME}/HADemoCoherence.jar:${CACHE_SERVER_HOME}/config).
    • The classpath references the Coherence Jar shipped with OEP to avoid versoin mismatches (COHERENCE_JAR=${OEP_HOME}/modules/com.tangosol.coherence_3.7.1.6.jar).
    • This script is based on the standard cache-server.sh script that ships with standalone Coherence.
    • The -jmx flag can be passed to the script to enable Coherence JMX management beans.

We have now configured OEP to use an external Coherence data grid for its application caches.  When starting up we should always start at least one of the grid servers before starting the OEP servers, so that the OEP servers can find the grid.  If we start things in the wrong order then the OEP servers will block, waiting for a storage-enabled node to start (one of the WKA servers if using unicast).

Summary

We have now created an OEP cluster that makes use of an external Coherence grid for application caches.  The application has been modified to ensure that the timestamps of arriving events are synchronized and that output events are emitted by only one of the servers in the cluster.  In the event of failure we may get some duplicate events with our configuration (there are configurations that avoid duplicate events) but we will not lose any events.  The final version of the application with full HA capability is shown below:

Files

The following files are available for download:

  • Oracle Event Processing
    • Includes Coherence
  • Non-HA version of application
    • Includes test file TestData.csv and Load Test property file HADemoTest.prop
    • Includes Server.properties.Antony file to customize to point to your WLS installation
  • HA version of application
    • Includes test file TestData.csv and Load Test property file HADemoTest.prop
    • Includes Server.properties.Antony file to customize to point to your WLS installation
  • OEP Cluster Files
    • Includes config.xml
    • Includes tangosol-coherence-override.xml
    • Includes Server.properties that will need customizing for your WLS environment
  • Coherence Cluster Files
    • Includes tangosol-coherence-override.xml and coherence-cache-configuration.xml
    • Includes cache-server.sh start script
    • Includes HADemoCoherence.jar with required classes for entry processor

Tuesday May 21, 2013

Target Verification

Verifying the Target

I just built a combined OSB, SOA/BPM, BAM clustered domain.  The biggest hassle is validating that the resource targeting is correct.  There is a great appendix in the documentation that lists all the modules and resources with their associated targets.  The only problem is that the appendix is six pages of small print.  I manually went through the first page, verifying my targeting, until I thought ‘there must be a better way of doing this’.  So this blog post is the better way.

WLST to the Rescue

WebLogic Scripting Tool allows us to query the MBeans and discover what resources are deployed and where they are targeted.  So I built a script that iterates over each of the following resource types and verifies that they are correctly targeted:

  • Applications
  • Libraries
  • Startup Classes
  • Shutdown Classes
  • JMS System Resources
  • WLDF System Resources

Source Data

To get the data to verify my domain against, I copied the tables from the documentation into a text file.  The copy ended up putting the resource on the first line and the targets on the second line.  Rather than reformat the data I just read the lines in pairs, storing the resource as a string and splitting apart the targets into a list of strings.  I then stored the data in a dictionary with the resource string as the key and the target list as the value.  The code to do this is shown below:

# Load resource and target data from file created from documentation
# File format is one line with the resource name followed by
# one line with comma separated list of targets
# fileIn - Resource & Target File
# accum - Dictionary containing mappings of expected Resource to Target
# returns - Dictionary mapping expected Resource to expected Target
def parseFile(fileIn, accum) :
  # Load resource name
  line1 = fileIn.readline().strip('\n')
  if line1 == '':
    # Done if no more resources
    return accum
  else:
    # Load list of targets
    line2 = fileIn.readline().strip('\n')
    # Convert string to list of targets
    targetList = map(fixTargetName, line2.split(','))
    # Associate resource with list of targets in dictionary
    accum[line1] = targetList
    # Parse remainder of file
    return parseFile(fileIn, accum)

This makes it very easy to update the lists by just copying and pasting from the documentation.
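
As a rough illustration of the two-line file format, the sketch below parses the same layout from an in-memory string. It is plain Python rather than WLST, uses a loop instead of recursion, and uses `strip()` as a stand-in for the script's `fixTargetName`. The function name `parse_pairs` and the resource `exampleLibrary#1.0` are made up for the example.

```python
from io import StringIO

def parse_pairs(file_in):
    """Read (resource, targets) line pairs into a dictionary."""
    expected = {}
    while True:
        resource = file_in.readline().strip('\n')
        if resource == '':
            # Blank line or end of file: no more resources
            return expected
        targets = file_in.readline().strip('\n')
        # strip() stands in for the script's fixTargetName here
        expected[resource] = [t.strip() for t in targets.split(',')]

# Illustrative data only: resource line, then comma separated targets
sample = StringIO(u"frevvo\nSOA_Cluster\n"
                  u"exampleLibrary#1.0\nSOA_Cluster, BAM_Cluster\n")
expected = parse_pairs(sample)
```

The resulting dictionary is keyed by resource name, with the list of expected targets as the value, which is the shape the validation code relies on.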

Each table in the documentation has a corresponding file that is used by the script.

The data read from the file has the target names mapped to the actual domain target names which are provided in a properties file.
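
The mapping step might look something like the sketch below. The real `fixTargetName` lives in the downloadable script and is not shown in this post, so the function name `fix_target_name`, the dictionary and its entries are all assumptions made for illustration; in the script the mapping comes from the properties file.

```python
# Hypothetical mapping from documentation target names to domain target
# names, standing in for values loaded from the properties file
TARGET_MAP = {
    'SOA_Cluster': 'soa_cluster',
    'BAM_Cluster': 'bam_cluster',
    'AdminServer': 'AdminServer',
}

def fix_target_name(doc_name):
    """Map a documentation target name to the domain's target name."""
    name = doc_name.strip()
    # Fall back to the documentation name when no mapping is configured
    return TARGET_MAP.get(name, name)

targets = [fix_target_name(t) for t in 'SOA_Cluster, BAM_Cluster,optional'.split(',')]
```

Passing unmapped names through unchanged is a design choice in this sketch: a name with no mapping simply never matches a real target, so it surfaces as a missing target rather than being silently dropped.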

Listing & Verifying the Resources & Targets

Within the script I move to the domain configuration MBean and then iterate over the resources deployed and for each resource iterate over the targets, validating them against the corresponding targets read from the file as shown below:

# Validate that resources are correctly targeted
# name - Name of Resource Type
# filename - Filename to validate against
# items - List of Resources to be validated
def validateDeployments(name, filename, items) :
  print name+' Check'
  print "====================================================="
  fList = loadFile(filename)
  # Iterate over resources
  for item in items:
    try:
      # Get expected targets for resource
      itemCheckList = fList[item.getName()]
      # Iterate over actual targets
      for target in item.getTargets() :
        try:
          # Remove actual target from expected targets
          itemCheckList.remove(target.getName())
        except ValueError:
          # Target not found in expected targets
          print 'Extra target: '+item.getName()+': '+target.getName()
      # Iterate over remaining expected targets, if any
      for refTarget in itemCheckList:
        print 'Missing target: '+item.getName()+': '+refTarget
    except KeyError:
      # Resource not found in expected resource dictionary
      print 'Extra '+name+' Deployed: '+item.getName()
  print
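
The core comparison can be tried outside WLST with plain lists of names. The sketch below is a simplified stand-in for the inner loops above, not the script itself: `compare_targets` is a name introduced here, and it works on strings rather than MBeans. It also copies the expected list first, so the loaded data is left unmodified.

```python
def compare_targets(expected, actual):
    """Return (extra, missing) target names for one resource."""
    remaining = list(expected)  # copy so the expected data is not modified
    extra = []
    for target in actual:
        if target in remaining:
            # Actual target was expected; tick it off
            remaining.remove(target)
        else:
            # Actual target not found in expected targets
            extra.append(target)
    # Whatever is left was expected but not actually targeted
    return extra, remaining

extra, missing = compare_targets(['SOA_Cluster'], ['soa_cluster'])
```

Because the comparison is an exact string match, a case difference such as `soa_cluster` versus `SOA_Cluster` is reported as one extra and one missing target.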

Obtaining the Script

I have uploaded the script here.  It is a zip file containing all the required files together with a PDF explaining how to use the script.

To install, just unzip VerifyTargets.zip.  It will create the following files:

  • verifyTargets.sh
  • verifyTargets.properties
  • VerifyTargetsScriptInstructions.pdf
  • scripts/verifyTargets.py
  • scripts/verifyApps.txt
  • scripts/verifyLibs.txt
  • scripts/verifyStartup.txt
  • scripts/verifyShutdown.txt
  • scripts/verifyJMS.txt
  • scripts/verifyWLDF.txt

Sample Output

The following is sample output from running the script:

Application Check
=====================================================
Extra Application Deployed: frevvo
Missing target: usermessagingdriver-xmpp: optional
Missing target: usermessagingdriver-smpp: optional
Missing target: usermessagingdriver-voicexml: optional
Missing target: usermessagingdriver-extension: optional
Extra target: Healthcare UI: soa_cluster
Missing target: Healthcare UI: SOA_Cluster ??
Extra Application Deployed: OWSM Policy Support in OSB Initializer Aplication

Library Check
=====================================================
Extra Library Deployed: oracle.bi.adf.model.slib#1.0-AT-11.1-DOT-1.2-DOT-0
Extra target: oracle.bpm.mgmt#11.1-DOT-1-AT-11.1-DOT-1: AdminServer
Missing target: oracle.bpm.mgmt#11.1.1-AT-11.1.1: soa_cluster
Extra target: oracle.sdp.messaging#11.1.1-AT-11.1.1: bam_cluster

StartupClass Check
=====================================================

ShutdownClass Check
=====================================================

JMS Resource Check
=====================================================
Missing target: configwiz-jms: bam_cluster

WLDF Resource Check
=====================================================

IMPORTANT UPDATE

Since posting this I have discovered a number of issues.  I have updated the configuration files to correct these problems.  The changes made are as follows:

  • Added a WLS_OSB1 server mapping to the script properties file (verifyTargets.properties) to accommodate OSB singletons and modified the script (verifyTargets.py) to use the new property.
  • Changes to verifyApps.txt
    • Changed target from OSB_Cluster to WLS_OSB1 for the following applications:
      • ALSB Cluster Singleton Marker Application
      • ALSB Domain Singleton Marker Application
      • Message Reporting Purger
    • Added the following application, targeted at SOA_Cluster:
      • frevvo
    • Added the following application, targeted at OSB_Cluster & Admin Server:
      • OWSM Policy Support in OSB Initializer Aplication
  • Changes to verifyLibs.txt
    • Added the following library, targeted at OSB_Cluster, SOA_Cluster, BAM_Cluster & Admin Server:
      • oracle.bi.adf.model.slib#1.0-AT-11.1.1.2-DOT-0
    • Modified targeting of following library to include BAM_Cluster
      • oracle.sdp.messaging#11.1.1-AT-11.1.1

Make sure that you download the latest version.  It is at the same location but now includes a version file (version.txt).  The contents of the version file should be:

FMW_VERSION=11.1.1.7

SCRIPT_VERSION=1.1
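
If you want to check the download programmatically, a minimal sketch along the following lines could be used. It assumes version.txt contains only KEY=VALUE lines as shown above; the function name `parse_version_file` is made up for the example, and the text is passed in as a string rather than read from disk.

```python
def parse_version_file(text):
    """Parse KEY=VALUE lines, ignoring blank lines."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if line and '=' in line:
            key, value = line.split('=', 1)
            values[key.strip()] = value.strip()
    return values

versions = parse_version_file("FMW_VERSION=11.1.1.7\n\nSCRIPT_VERSION=1.1\n")
# True when the script version matches the one described in this update
is_latest = versions.get('SCRIPT_VERSION') == '1.1'
```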

About

Musings on Fusion Middleware and SOA.  Antony works with customers across the US and Canada in implementing SOA and other Fusion Middleware solutions.  Antony is the co-author of the SOA Suite 11g Developers Cookbook, the SOA Suite 11g Developers Guide and the SOA Suite Developers Guide.
