Thursday Sep 26, 2013

Enterprise Deployment Presentation at OpenWorld

Presentation Today Thursday 26 September 2013

Today Matt & I, together with Ram from Oracle Product Management and Craig from Rubicon Red, will be talking about building a highly available, highly scalable enterprise deployment. We will go through Oracle's Enterprise Deployment Guide, explain why it recommends what it does, and identify alternatives to its recommendations. Come along to Moscone West 2020 at 2pm today. It would be great to see you there.

Update

Thanks to all who attended. We were gratified to see around 100 people turn out on Thursday afternoon; I know I am usually presentationed out by then! I have uploaded the presentation here.

Tuesday May 21, 2013

Target Verification

Verifying the Target

I just built a combined OSB, SOA/BPM, BAM clustered domain.  The biggest hassle is validating that the resource targeting is correct.  There is a great appendix in the documentation that lists all the modules and resources with their associated targets.  The only problem is that the appendix is six pages of small print.  I manually went through the first page, verifying my targeting, until I thought ‘there must be a better way of doing this’.  So this blog post is the better way.

WLST to the Rescue

WebLogic Scripting Tool allows us to query the MBeans and discover what resources are deployed and where they are targeted.  So I built a script that iterates over each of the following resource types and verifies that they are correctly targeted:

  • Applications
  • Libraries
  • Startup Classes
  • Shutdown Classes
  • JMS System Resources
  • WLDF System Resources
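As a rough sketch of how these six resource lists might be gathered, the helper below calls the corresponding getters on the domain configuration MBean. It assumes `domain` is the DomainMBean that WLST exposes as `cmo` after `connect(...)` and `domainConfig()`. The function names `collectResources` and `listTargets` are made up for this illustration and are not taken from the uploaded script.

```python
def collectResources(domain):
    """Return (label, resources) pairs for each resource type to check.

    'domain' is assumed to be the DomainMBean (WLST's 'cmo' after
    connect(...) and domainConfig()).
    """
    return [
        ('Application', domain.getAppDeployments()),
        ('Library', domain.getLibraries()),
        ('StartupClass', domain.getStartupClasses()),
        ('ShutdownClass', domain.getShutdownClasses()),
        ('JMS Resource', domain.getJMSSystemResources()),
        ('WLDF Resource', domain.getWLDFSystemResources()),
    ]


def listTargets(domain):
    """Map each resource type label to {resource name: [target names]}."""
    result = {}
    for label, resources in collectResources(domain):
        byName = {}
        for resource in resources:
            # Each of these MBeans exposes its targets via getTargets()
            byName[resource.getName()] = [t.getName() for t in resource.getTargets()]
        result[label] = byName
    return result
```

Inside a connected WLST session this would be called as `listTargets(cmo)`. Outside WLST any object offering the same getters can stand in for the domain, which makes the logic easy to try out.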

Source Data

To get the data to verify my domain against, I copied the tables from the documentation into a text file.  The copy ended up putting the resource on the first line and the targets on the second line.  Rather than reformat the data I just read the lines in pairs, storing the resource as a string and splitting apart the targets into a list of strings.  I then stored the data in a dictionary with the resource string as the key and the target list as the value.  The code to do this is shown below:

# Load resource and target data from file created from documentation
# File format is one line with the resource name followed by
# one line with a comma-separated list of targets
# fileIn - Resource & Target File
# accum - Dictionary containing mappings of expected Resource to Target
# returns - Dictionary mapping expected Resource to expected Target
def parseFile(fileIn, accum) :
  # Load resource name
  line1 = fileIn.readline().strip('\n')
  if line1 == '':
    # Done if no more resources
    return accum
  else:
    # Load list of targets
    line2 = fileIn.readline().strip('\n')
    # Convert string to list of targets, mapping each documentation
    # target name to the actual domain target name
    targetList = map(fixTargetName, line2.split(','))
    # Associate resource with list of targets in dictionary
    accum[line1] = targetList
    # Parse remainder of file
    return parseFile(fileIn, accum)

This makes it very easy to update the lists by just copying and pasting from the documentation.

Each table in the documentation has a corresponding file that is used by the script.

As the data is read from the file, the target names used in the documentation are mapped to the actual target names in the domain, which are provided in a properties file.
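The post does not show the `fixTargetName` function used by `parseFile`, so the following is only a guess at its shape. It assumes the mapping is a simple lookup from documentation name to domain name. The `TARGET_MAP` entries are hypothetical values hard-coded for illustration; in the real script they come from the properties file.

```python
# Hypothetical mapping from documentation target names to this domain's
# target names; the real script reads these from a properties file.
TARGET_MAP = {
    'SOA_Cluster': 'soa_cluster',
    'OSB_Cluster': 'osb_cluster',
    'BAM_Cluster': 'bam_cluster',
    'Admin Server': 'AdminServer',
}


def fixTargetName(name):
    """Trim whitespace and translate a documentation target name."""
    name = name.strip()
    # Unknown names pass through unchanged so they still appear in the report
    return TARGET_MAP.get(name, name)


print([fixTargetName(t) for t in 'SOA_Cluster, Admin Server'.split(',')])
# prints ['soa_cluster', 'AdminServer']
```

Passing unknown names through unchanged is a design choice made for this sketch: a target the mapping does not know about is reported as missing or extra rather than being silently dropped.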

Listing & Verifying the Resources & Targets

Within the script I move to the domain configuration MBean and then iterate over the resources deployed and for each resource iterate over the targets, validating them against the corresponding targets read from the file as shown below:

# Validate that resources are correctly targeted
# name - Name of Resource Type
# filename - Filename to validate against
# items - List of Resources to be validated
def validateDeployments(name, filename, items) :
  print name+' Check'
  print "====================================================="
  fList = loadFile(filename)
  # Iterate over resources
  for item in items:
    try:
      # Get expected targets for resource
      itemCheckList = fList[item.getName()]
      # Iterate over actual targets
      for target in item.getTargets() :
        try:
          # Remove actual target from expected targets
          itemCheckList.remove(target.getName())
        except ValueError:
          # Target not found in expected targets
          print 'Extra target: '+item.getName()+': '+target.getName()
      # Iterate over remaining expected targets, if any
      for refTarget in itemCheckList:
        print 'Missing target: '+item.getName()+': '+refTarget
    except KeyError:
      # Resource not found in expected resource dictionary
      print 'Extra '+name+' Deployed: '+item.getName()
  print

Obtaining the Script

I have uploaded the script here.  It is a zip file containing all the required files together with a PDF explaining how to use the script.

To install, just unzip VerifyTargets.zip.  It will create the following files:

  • verifyTargets.sh
  • verifyTargets.properties
  • VerifyTargetsScriptInstructions.pdf
  • scripts/verifyTargets.py
  • scripts/verifyApps.txt
  • scripts/verifyLibs.txt
  • scripts/verifyStartup.txt
  • scripts/verifyShutdown.txt
  • scripts/verifyJMS.txt
  • scripts/verifyWLDF.txt

Sample Output

The following is sample output from running the script:

Application Check
=====================================================
Extra Application Deployed: frevvo
Missing target: usermessagingdriver-xmpp: optional
Missing target: usermessagingdriver-smpp: optional
Missing target: usermessagingdriver-voicexml: optional
Missing target: usermessagingdriver-extension: optional
Extra target: Healthcare UI: soa_cluster
Missing target: Healthcare UI: SOA_Cluster ??
Extra Application Deployed: OWSM Policy Support in OSB Initializer Aplication

Library Check
=====================================================
Extra Library Deployed: oracle.bi.adf.model.slib#1.0-AT-11.1-DOT-1.2-DOT-0
Extra target: oracle.bpm.mgmt#11.1-DOT-1-AT-11.1-DOT-1: AdminServer
Missing target: oracle.bpm.mgmt#11.1.1-AT-11.1.1: soa_cluster
Extra target: oracle.sdp.messaging#11.1.1-AT-11.1.1: bam_cluster

StartupClass Check
=====================================================

ShutdownClass Check
=====================================================

JMS Resource Check
=====================================================
Missing target: configwiz-jms: bam_cluster

WLDF Resource Check
=====================================================

IMPORTANT UPDATE

Since posting this I have discovered a number of issues.  I have updated the configuration files to correct these problems.  The changes made are as follows:

  • Added WLS_OSB1 server mapping to the script properties file (verifyTargets.properties) to accommodate OSB singletons and modified script (verifyTargets.py) to use the new property.
  • Changes to verifyApplications.txt
    • Changed target from OSB_Cluster to WLS_OSB1 for the following applications:
      • ALSB Cluster Singleton Marker Application
      • ALSB Domain Singleton Marker Application
      • Message Reporting Purger
    • Added the following application, targeted at SOA_Cluster:
      • frevvo
    • Added the following application, targeted at OSB_Cluster & Admin Server:
      • OWSM Policy Support in OSB Initializer Aplication
  • Changes to verifyLibraries.txt
    • Added the following library, targeted at OSB_Cluster, SOA_Cluster, BAM_Cluster & Admin Server:
      • oracle.bi.adf.model.slib#1.0-AT-11.1.1.2-DOT-0
    • Modified targeting of the following library to include BAM_Cluster:
      • oracle.sdp.messaging#11.1.1-AT-11.1.1

Make sure that you download the latest version.  It is at the same location but now includes a version file (version.txt).  The contents of the version file should be:

FMW_VERSION=11.1.1.7

SCRIPT_VERSION=1.1

Thursday Sep 22, 2011

Coping with Failure

Handling Endpoint Failure in OSB

Recently I was working on a POC and we had demonstrated stellar performance with OSB fronting a BPEL composite calling back end EJBs.  The final test was a failover test, which involved killing an OSB server and bringing it back online, then killing a SOA (BPEL) server and bringing it back online, and finally killing a backend EJB server and bringing it back online.  All was going well until the BPEL failover test, when for some reason OSB refused to mark the BPEL server as down.  It turns out we had forgotten to set a very important setting, so this entry outlines how to handle endpoint failure in OSB.

Step 1 – Add Multiple End Points to Business Service

The first thing to do is create multiple end points for the business service, pointing to all available backends.  This is required for HTTP/SOAP bindings.  In theory if using a T3 protocol then a single cluster address is sufficient and load balancing will be taken care of by T3 smart proxies.  In this scenario though we will focus on HTTP/SOAP endpoints.

Navigate to the Business Service->Configuration Details->Transport Configuration and add all your endpoint URIs.  Make sure that Retry Count is greater than 0 if you don’t want to pass failures back to the client.  In the example below I have set up links to three back end web service instances.  Go to Last and Save the changes.

[Image: MultiOSBEndpoint]

Step 2 – Enable Offlining & Recovery of Endpoint URIs

When a back end service instance fails we want to take it offline, meaning we want to remove it from the pool of instances to which OSB will route requests.  We do this by navigating to the Business Service->Operational Settings and selecting the Enable check box for Offline Endpoint URIs in the General Configuration section.  This causes OSB to stop routing requests to a backend that returns errors (if the transport setting Retry Application Errors is set) or fails to respond at all.

Offlining the service is good because we won’t send any more requests to a broken endpoint, but we also want to add the endpoint again when it becomes available.  We do this by setting the Enable with Retry Interval in General Configuration to some non-zero value, such as 30 seconds.  Then every 30 seconds OSB will add the failed service endpoint back into the list of endpoints.  If the endpoint is still not ready to accept requests then it will error again and be removed again from the list.  In the example below I have set up a 30 second retry interval.  Remember to hit update and then commit all the session changes.

[Image: OfflineOSBEndpoint]
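The offline and retry interval behaviour described above can be pictured with a small toy model. This is not OSB's implementation, just a sketch of the described behaviour: a failed endpoint is removed from the routing pool and added back once the retry interval has elapsed. The class name, method names and URIs are all invented for the illustration.

```python
class EndpointPool(object):
    """Toy model of offlining endpoint URIs with a retry interval."""

    def __init__(self, uris, retryInterval):
        self.uris = list(uris)
        self.retryInterval = retryInterval  # seconds; 0 means never re-enable
        self.offlineSince = {}              # uri -> time it was taken offline

    def markFailed(self, uri, now):
        """Take an endpoint offline after a failed request."""
        self.offlineSince[uri] = now

    def online(self, now):
        """Return the URIs eligible for routing at time 'now'."""
        # Re-enable any endpoint whose retry interval has elapsed
        for uri, since in list(self.offlineSince.items()):
            if self.retryInterval > 0 and now - since >= self.retryInterval:
                del self.offlineSince[uri]
        return [u for u in self.uris if u not in self.offlineSince]


pool = EndpointPool(['http://host1/svc', 'http://host2/svc', 'http://host3/svc'], 30)
pool.markFailed('http://host2/svc', now=0)
print(pool.online(10))  # host2 is still offline
print(pool.online(30))  # host2 is back in the pool
```

If host2 is still down at 30 seconds, the next request routed to it fails and `markFailed` takes it offline again for another interval, which matches the cycle described above.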

Considerations on Retry Count

A couple of things to be aware of on retry count.

If you set retry count to greater than zero then endpoint failures will be transparent to OSB clients, other than the additional delay they experience.  However if the request is mutative (changes the backend) then there is no guarantee that the request was not executed: the endpoint may have failed after executing the request but before returning the result, in which case a retry submits the mutative operation twice.  If your back end service can’t cope with this then don’t set retries.

If your back-end service can’t cope with retries then you can still get the benefit of transparent retries for non-mutative operations by creating two business services, one with retry enabled that handles non-mutative requests, and the other with retry set to zero that handles mutative requests.
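The decision behind the two business services boils down to a lookup on the operation being invoked. The sketch below shows only that decision. The operation names and business service names are hypothetical, and in OSB itself the choice would be made in the proxy service's routing logic rather than in Python.

```python
# Hypothetical operation and business service names, for illustration only
MUTATIVE_OPERATIONS = set(['createOrder', 'cancelOrder'])


def chooseBusinessService(operation):
    """Pick the business service to route to, based on whether a retry is safe."""
    if operation in MUTATIVE_OPERATIONS:
        # Retry count 0: never risk submitting the change twice
        return 'OrderService_NoRetry'
    # Retry count > 0: failures are retried transparently
    return 'OrderService_Retry'


print(chooseBusinessService('getOrderStatus'))  # prints OrderService_Retry
print(chooseBusinessService('createOrder'))     # prints OrderService_NoRetry
```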

Considerations on Retry Interval for Offline Endpoints

If you set the retry interval to too small a value then it is very likely that your failed endpoint will not have recovered, so you will waste time on a request failing to contact that endpoint before failing over to another endpoint, which increases the client response time.  Work out what would be a typical unplanned outage time for a node (such as one caused by a JVM failure and subsequent restart) and set the retry interval to, say, half of this as a compromise between causing additional client response time delays and adding the endpoint back into the mix as soon as possible.
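To put rough numbers on this trade-off, the sketch below counts how many times a failed endpoint would be added back while it is still down, for a given outage length and retry interval. It makes a simplifying assumption that a request hits the endpoint as soon as it is re-enabled, and both function names are invented for the illustration.

```python
def suggestRetryInterval(typicalOutageSeconds):
    """Half the typical outage time, as suggested above (at least 1 second)."""
    return max(1, typicalOutageSeconds // 2)


def failedProbes(outageSeconds, intervalSeconds):
    """Count re-enable attempts that land while the endpoint is still down.

    Attempts happen at interval, 2*interval, ... seconds after the first
    failure; those before the outage ends each cost one failed request.
    """
    return len(range(intervalSeconds, outageSeconds, intervalSeconds))


print(suggestRetryInterval(60))  # prints 30
print(failedProbes(60, 30))      # prints 1
print(failedProbes(60, 5))       # prints 11
```

For a 60 second outage, a 30 second interval costs one wasted request and brings the endpoint back at most 30 seconds after it recovers, while a 5 second interval brings it back sooner at the cost of eleven wasted requests.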

Conclusion

Always remember to set the Operational Setting to Enable Offlining and then you won’t be surprised in a fail over test!

About

Musings on Fusion Middleware and SOA.  Antony works with customers across the US and Canada in implementing SOA and other Fusion Middleware solutions. Antony is the co-author of the SOA Suite 11g Developers Cookbook, the SOA Suite 11g Developers Guide and the SOA Suite Developers Guide.
