Sunday Mar 09, 2014

The Impact of Change

Measuring Impact of Change in SOA Suite

Mormon prophet Thomas S. Monson once said:

When performance is measured, performance improves. When performance is measured and reported, the rate of performance accelerates.

(LDS Conference Report, October 1970, p107)

Like everything in life, a SOA Suite installation that is monitored and tracked has a much better chance of performing well than one that is not measured.  With that in mind I came up with a tool to measure the impact of configuration changes on database usage in SOA Suite.  This tool can be used to assess the impact of different configurations on both database growth and database performance, helping to decide which optimizations offer real benefit to the composite under test.

Basic Approach

The basic approach of the tool is to take a snapshot of the number of rows in the SOA tables before executing a composite.  The composite is then executed.  After the composite has completed another snapshot is taken of the SOA tables.  This is illustrated in the diagram below:

An example of the data collected by the tool is shown below:

Test Name    Total Tables Changed  Total Rows Added  Notes
AsyncTest1   13                    15                Async interaction with simple SOA composite, one retry to send response.
AsyncTest2   12                    13                Async interaction with simple SOA composite, no retries on sending response.
AsyncTest3   12                    13                Async interaction with simple SOA composite, no callback address provided.
OneWayTest1  12                    13                One-Way interaction with simple SOA composite.
SyncTest1    7                     7                 Sync interaction with simple SOA composite.

Note that the first three columns are provided by the tool; the fourth column is just an aide-memoire to identify what each test actually did.  The tool also allows us to drill into the data to get a better look at what is actually changing, as shown in the table below:

Test Name   Table Name            Rows Added
AsyncTest1  AUDIT_COUNTER         1
AsyncTest1  AUDIT_DETAILS         1
AsyncTest1  AUDIT_TRAIL           2
AsyncTest1  COMPOSITE_INSTANCE    1
AsyncTest1  CUBE_INSTANCE         1
AsyncTest1  CUBE_SCOPE            1
AsyncTest1  DLV_MESSAGE           1
AsyncTest1  DOCUMENT_CI_REF       1
AsyncTest1  DOCUMENT_DLV_MSG_REF  1
AsyncTest1  HEADERS_PROPERTIES    1
AsyncTest1  INSTANCE_PAYLOAD      1
AsyncTest1  WORK_ITEM             1
AsyncTest1  XML_DOCUMENT          2

Here we have drilled into the test case with the retry of the callback to see what tables are actually being written to.

Finally, we can compare two tests to see the difference in the number of rows written and the tables updated, as shown below:

Test Name   Base Test Name  Table Name   Row Difference
AsyncTest1  AsyncTest2      AUDIT_TRAIL  1

Here are the additional tables referenced by this test:

Test Name   Base Test Name  Additional Table Name  Rows Added
AsyncTest1  AsyncTest2      WORK_ITEM              1

How it Works

I created a database stored procedure, soa_snapshot.take_soa_snapshot(test_name, phase), that queries all the SOA tables and records the number of rows in each table.  By running the stored procedure before and after the execution of a composite we can capture how the row counts change.  I then created a view that shows the difference in the number of rows before and after composite execution.  This view has a number of sub-views that allow us to query specific items.  The schema is shown below:

The different tables and views are:

  • CHANGE_TABLE
    • Used to track the number of rows in the SOA schema; each test case has two or more phases.  Usually phase 1 is before execution and phase 2 is after execution.
    • This table is only used by the stored procedure and the views.
  • DELTA_VIEW
    • Used to track changes in the number of rows in the SOA database between phases of a test case.  This is a view on CHANGE_TABLE.  All other views are based off this view.
  • SIMPLE_DELTA_VIEW
    • Provides the number of rows changed in each table.
  • SUMMARY_DELTA_VIEW
    • Provides a summary of total rows and tables changed.
  • DIFFERENT_ROWS_VIEW
    • Provides a summary of differences in rows updated between test cases.
  • EXTRA_TABLES_VIEW
    • Provides a summary of the extra tables and rows used by a test case.
    • This view makes use of a session context, soa_ctx, which holds the test case name and the baseline test case name.  This context is initialized by calling the stored procedure soa_ctx_pkg.set(testCase, baseTestCase).

I created a web service wrapper to the take_soa_snapshot procedure so that I could use SoapUI to perform the tests.
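
To make the mechanics concrete, the sketch below shows roughly how the capture could be driven directly from SQL*Plus and how a delta might be computed.  The CHANGE_TABLE columns and the simplified delta query are assumptions for illustration only; the real definitions are in the createSchema.sql script available in the download section below.

-- Sketch only: assumed CHANGE_TABLE columns, the real DDL is in createSchema.sql
create table change_table (
  test_name  varchar2(100),
  phase      number,                         -- 1 = before execution, 2 = after execution
  table_name varchar2(128),
  row_count  number,
  captured   timestamp default systimestamp
);

-- Snapshot before the test, run the composite, then snapshot again
exec soa_snapshot.take_soa_snapshot('AsyncTest1', 1);
-- ... invoke the composite (for example from SoapUI) here ...
exec soa_snapshot.take_soa_snapshot('AsyncTest1', 2);

-- A delta view along these lines then subtracts the phase 1 counts from the phase 2 counts
select t2.test_name, t2.table_name, t2.row_count - t1.row_count as deltarows
from   change_table t1 join change_table t2
       on t1.test_name = t2.test_name and t1.table_name = t2.table_name
where  t1.phase = 1 and t2.phase = 2 and t2.row_count > t1.row_count;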

Sample Output

How many rows and tables did a particular test use?

Here we can see how many rows in how many tables changed as a result of running a test:

-- Display the total number of rows and tables changed for each test
select * from summary_delta_view
order by test_name;

TEST_NAME            TOTALDELTAROWS TOTALDELTASIZE TOTALTABLES
-------------------- -------------- -------------- -----------
AsyncTest1                   15              0          13
AsyncTest1noCCIS             15              0          13
AsyncTest1off                 8              0           8
AsyncTest1prod               13              0          12
AsyncTest2                   13              0          12
AsyncTest2noCCIS             13              0          12
AsyncTest2off                 7              0           7
AsyncTest2prod               11              0          11
AsyncTest3                   13              0          12
AsyncTest3noCCIS             13          65536          12
AsyncTest3off                 7              0           7
AsyncTest3prod               11              0          11
OneWayTest1                  13              0          12
OneWayTest1noCCI             13          65536          12
OneWayTest1off                7              0           7
OneWayTest1prod              11              0          11
SyncTest1                     7              0           7
SyncTest1noCCIS               7              0           7
SyncTest1off                  2              0           2
SyncTest1prod                 5              0           5

20 rows selected

Which tables grew during a test?

Here for a given test we can see which tables had rows inserted.

-- Display the tables which grew and show the number of rows they grew by
select * from simple_delta_view
where test_name='AsyncTest1'
order by table_name;
TEST_NAME            TABLE_NAME                      DELTAROWS  DELTASIZE
-------------------- ------------------------------ ---------- ----------
AsyncTest1       AUDIT_COUNTER                           1          0
AsyncTest1       AUDIT_DETAILS                           1          0
AsyncTest1       AUDIT_TRAIL                             2          0
AsyncTest1       COMPOSITE_INSTANCE                      1          0
AsyncTest1       CUBE_INSTANCE                           1          0
AsyncTest1       CUBE_SCOPE                              1          0
AsyncTest1       DLV_MESSAGE                             1          0
AsyncTest1       DOCUMENT_CI_REF                         1          0
AsyncTest1       DOCUMENT_DLV_MSG_REF                    1          0
AsyncTest1       HEADERS_PROPERTIES                      1          0
AsyncTest1       INSTANCE_PAYLOAD                        1          0
AsyncTest1       WORK_ITEM                               1          0
AsyncTest1       XML_DOCUMENT                            2          0
13 rows selected

Which tables grew more in test1 than in test2?

Here we can see the differences in rows for two tests.

-- Return difference in rows updated (test1)
select * from different_rows_view
where test1='AsyncTest1' and test2='AsyncTest2';

TEST1                TEST2                TABLE_NAME                          DELTA
-------------------- -------------------- ------------------------------ ----------
AsyncTest1       AsyncTest2       AUDIT_TRAIL                             1

Which tables were used by test1 but not by test2?

Here we can see tables that were used by one test but not by the other test.

-- Register base test case for use in extra_tables_view
-- First parameter (test1) is test we expect to have extra rows/tables
begin soa_ctx_pkg.set('AsyncTest1', 'AsyncTest2'); end;
/
anonymous block completed
-- Return additional tables used by test1
column TEST2 FORMAT A20
select * from extra_tables_view;
TEST1                TEST2                TABLE_NAME                      DELTAROWS
-------------------- -------------------- ------------------------------ ----------
AsyncTest1       AsyncTest2       WORK_ITEM                               1

 

Results

I used the tool to find out the following.  All tests were run using SOA Suite 11.1.1.7.

The following is based on a very simple composite as shown below:

Each BPEL process is basically the same as the one shown below:

Impact of Fault Policy Retry Being Executed Once

Setting    Total Rows Written  Total Tables Updated
No Retry   13                  12
One Retry  15                  13

When a fault policy causes a retry then the following additional database rows are written:

Table Name   Number of Rows
AUDIT_TRAIL  1
WORK_ITEM    1

Impact of Setting Audit Level = Development Instead of Production

Setting      Total Rows Written  Total Tables Updated
Development  13                  12
Production   11                  11

When the audit level is set at development instead of production then the following additional database rows are written:

Table Name   Number of Rows
AUDIT_TRAIL  1
WORK_ITEM    1

Impact of Setting Audit Level = Production Instead of Off

Setting     Total Rows Written  Total Tables Updated
Production  11                  11
Off         7                   7

When the audit level is set at production rather than off then the following additional database rows are written:

Table Name          Number of Rows
AUDIT_COUNTER       1
AUDIT_DETAILS       1
AUDIT_TRAIL         1
COMPOSITE_INSTANCE  1

Impact of Setting Capture Composite Instance State

Setting  Total Rows Written  Total Tables Updated
On       13                  12
Off      13                  12

When capture composite instance state is on rather than off then no additional database rows are written.  Note, however, that there are other activities that occur when composite instance state is captured.

Impact of Setting oneWayDeliveryPolicy = async.cache or sync

Setting        Total Rows Written  Total Tables Updated
async.persist  13                  12
async.cache    7                   7
sync           7                   7

When choosing async.persist (the default) instead of sync or async.cache then the following additional database rows are written:

Table Name            Number of Rows
AUDIT_DETAILS         1
DLV_MESSAGE           1
DOCUMENT_CI_REF       1
DOCUMENT_DLV_MSG_REF  1
HEADERS_PROPERTIES    1
XML_DOCUMENT          1

As you would expect, the sync mode behaves just like a regular synchronous (request/reply) interaction and creates the same number of rows in the database.  The async.cache setting also creates the same number of rows as a sync interaction because it stores state in memory and provides no restart guarantee.

Caveats & Warnings

The results above are based on a trivial test case.  The numbers will be different for bigger and more complex composites.  However by taking snapshots of different configurations you can produce the numbers that apply to your composites.

The capture procedure supports multiple steps in a test case, but the views only support two snapshots per test case.

Code Download

The sample project I used is available here.

The scripts used to create the user (createUser.sql), create the schema (createSchema.sql) and sample queries (TableCardinality.sql) are available here.

The Web Service wrapper to the capture state stored procedure is available here.

The sample SoapUI project that I used to take a snapshot, perform the test and take a second snapshot is available here.

Friday Jan 17, 2014

Clear Day for Cloud Adapters

salesforce.com Adapter Released

Yesterday Oracle released their cloud adapter for salesforce.com (SFDC) so I thought I would talk a little about why you might want it.  I had previously integrated with SFDC using BPEL and the SFDC web interface, so in this post I will explore why the adapter might be a better approach.

Why?

So if I can interface to SFDC without the adapter, why would I spend money on the adapter?  There are a number of reasons; in this post I will explain the following three benefits:

  • Auto-Login
  • Non-Polymorphic Operations
  • Operation Wizards

Let’s take each one in turn.

Auto-Login

The first obvious benefit is how you connect and make calls to SFDC.  To perform an operation such as querying an account or updating an address, the SFDC interface requires you to do the following:

  • Invoke a login method, which returns a session ID to be placed in the header of all future calls, together with the actual endpoint to call.
  • Invoke the actual operation using the returned endpoint, passing the provided session ID.
  • When finished making calls, invoke the logout operation.

Now these are not unreasonable demands.  The problem comes when you try to implement this interface.

Before calling the login method you need the credentials.  These need to be obtained from somewhere; I set them as BPEL preferences, but there are other ways to store them.  Calling the login method is not a problem, but you need to be careful about how you make subsequent calls.

First, all subsequent calls must override the endpoint address with the one returned from the login operation.  Second, the provided session ID must be placed into a custom SOAP header.  So you have to copy the session ID into a custom SOAP header and provide that header to the invoke activity, and you also have to override the endpointURI property on the invoke with the provided endpoint.

Finally, when you have finished performing operations, you have to log out.

In addition to the number of steps you have to code, there is the problem of knowing when to log out.  The simplest approach is, for each operation you wish to perform, to execute the login operation followed by the actual working operation and then a logout operation.  The trouble with this is that you are now making three calls every time you want to perform an operation against SFDC, which adds latency to every request.

The adapter hides all of this from you: it manages the login/logout operations and re-uses connections, reducing the number of logins required.  The SFDC call ends up looking like a call to any other web service, while the adapter maintains a session cache behind the scenes to avoid repeated logins.
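
For comparison, here is a minimal sketch in Python of the conversation the adapter is hiding, written against the standard SFDC partner SOAP API.  The API version, credentials, query and the crude string parsing are placeholders of my own; a real BPEL or Java implementation would use proper XML handling, but the three-step shape (login, call with session header against the returned endpoint, logout) is the point.

# Sketch only: login / call-with-session-header / logout against the SFDC partner SOAP API.
# The API version, credentials and query are placeholders.
import requests

LOGIN_URL = "https://login.salesforce.com/services/Soap/u/29.0"  # assumed API version
SOAP = ('<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" '
        'xmlns:urn="urn:partner.soap.sforce.com">{header}<soapenv:Body>{body}</soapenv:Body>'
        '</soapenv:Envelope>')

def call(url, body, header=""):
    response = requests.post(url, data=SOAP.format(header=header, body=body),
                             headers={"Content-Type": "text/xml; charset=UTF-8",
                                      "SOAPAction": '""'})
    return response.text

# 1. Login: returns the real endpoint (serverUrl) and a session ID
login_resp = call(LOGIN_URL,
                  "<urn:login><urn:username>USER</urn:username>"
                  "<urn:password>PASSWORD</urn:password></urn:login>")
server_url = login_resp.split("<serverUrl>")[1].split("</serverUrl>")[0]
session_id = login_resp.split("<sessionId>")[1].split("</sessionId>")[0]

# 2. Call the actual operation: override the endpoint and add the session header
session_hdr = ("<soapenv:Header><urn:SessionHeader><urn:sessionId>" + session_id +
               "</urn:sessionId></urn:SessionHeader></soapenv:Header>")
accounts = call(server_url,
                "<urn:query><urn:queryString>SELECT Id FROM Account</urn:queryString></urn:query>",
                session_hdr)

# 3. Log out when finished
call(server_url, "<urn:logout/>", session_hdr)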

Non-Polymorphic Operations

The standard operations in the SFDC interface provide a base object return type, the sObject.  This could be an Account or a Campaign, for example, but the operations always return the base sObject type, leaving it to the client to make sure they process the correct return type.  Similarly, requests also use polymorphic data types.  In BPEL this often means that the returned sObject must be copied to a variable of a more specific type to simplify processing of the data.  If you don’t do this you can still query fields within the specific object, but the SOA tooling cannot check them for you.

The adapter identifies the type of request and response and also helps build the request for you with bind parameters.  This means that you are able to build your process to actually work with the real data structures, not the abstract ones.  This is another big benefit in my view!

Operation Wizards

The SFDC API is very powerful.  Translation: the SFDC API is very complex.  With great power comes complexity, to paraphrase Uncle Ben (amongst others).  The adapter groups the operations into logical collections and then provides additional help in selecting from within those collections of operations and providing the correct parameters for them.

Installing

Installation takes place in two parts.  The Design time is deployed into JDeveloper and the run time is deployed into the SOA Suite Oracle Home.  The adapter is available for download here and the installation instructions and documentation are here.  Note that you will need OPatch to install the adapter.  OPatch can be downloaded from Oracle Support Patch 6880880.  Don’t use the OPatch that ships with JDeveloper and SOA Suite.  If you do you may see an error like:

Uncaught exception
oracle.classloader.util.AnnotatedNoClassDefFoundError:
      Missing class: oracle.tip.tools.ide.adapters.cloud.wizard.CloudAdapterWizard

You will want OPatch 11.1.0.x.x.  Make sure you download the correct 6880880: it must be for 11.1.x, as that is the version of JDeveloper and SOA Suite, and it must be for the platform you are running on.

Apart from getting the right OPatch, the installation is very straightforward.

So don’t be afraid of SFDC integration any more; cloud integration is clear with the SFDC adapter.

Thursday Dec 26, 2013

List Manipulation in Rules

Generating Lists from Rules

Recently I was working with a customer who wanted to use rules to do validation.  The idea was to pass a document to the rules engine and get back a list of violations, or an empty list if there were no violations.  It turns out that there were a couple more steps required than I expected, so I thought I would share my solution in case anyone else is wondering how to return lists from the rules engine.

The Scenario

For the purposes of this blog I modeled a very simple shipping company document that has two main parts.   The Package element contains information about the actual item to be shipped, its weight, type of package and destination details.  The Billing element details the charges applied.

For the purpose of this blog I want to validate the following:

  • A residential surcharge is applied to residential addresses.
  • A residential surcharge is not applied to non-residential addresses.
  • The package is of the correct weight for the type of package.

The Shipment element is sent to the rules engine and the rules engine replies with a ViolationList element that has all the rule violations that were found.

Creating the Return List

We need to create a new ViolationList within rules so that we can return it from within the decision function.  To do this we create a new global variable – I called it ViolationList – and initialize it.  Note that I also had some globals that I used to allow changing the weight limits for different package types.

When the rules session is created it will initialize the global variables and assert the input document – the Shipment element.  However within rules our ViolationList variable has an uninitialized internal List that is used to hold the actual List of Violation elements.  We need to initialize this to an empty RL.list in the Decision Function’s “Initial Actions” section.

We can then assert the global variable as a fact to make it available to be the return value of the decision function.  After this we can now create the rules.

Adding a Violation to the List

If a rule fires because of a violation then we need to add a Violation element to the list.  The easiest way to do this, without having the rule check the ViolationList directly, is to create a function that adds the Violation to the global variable ViolationList.

The function creates a new Violation and initializes it with the appropriate values before appending it to the list within the ViolationList.

When a rule fires it is then just necessary to call the function to add the violation to the list.

In the example above if the address is a residential address and the surcharge has not been applied then the function is called with an appropriate error code and message.

How it Works

Each time a rule fires we can add the violation to the list by calling the function.  If multiple rules fire then we will get multiple violations in the list.  We can access the list from a function because it is a global variable.  Because we asserted the global variable as a fact in the decision function initialization function it is picked up by the decision function as a return value.  When all possible rules have fired then the decision function will return all asserted ViolationList elements, which in this case will always be 1 because we only assert it in the initialization function.

What Doesn’t Work

A return from a decision function is always a list of the element you specify, so you may be tempted to just assert individual Violation elements and get those back as a list.  That will work if there is at least one element in the list, but the decision function must always return at least one element.  So if there are no violations then you will get an error thrown.

Alternative

Instead of having a completely separate return element you could have the ViolationList as part of the input element and then return the input element from the decision function.  This would work but now you would be copying most of the input variables back into the output variable.  I prefer to have a cleaner more function like interface that makes it easier to handle the response.

Download

Hope this helps someone.  A sample composite project is available for download here.  The composite includes some unit tests.  You can run these from the EM console and then look at the inputs and outputs to see how things work.

Tuesday Dec 24, 2013

Cleaning Up After Yourself

Maintaining a Clean SOA Suite Test Environment

A fun blog entry with Fantasia animated GIFs got me thinking, like Mickey, about how nice it would be to automate clean-up tasks.

I don’t have a sorcerer’s castle to clean up, but I often have a test environment in which I run tests; after fixing the problems that the tests uncovered I want to run them again.  The problem is that all the data from the previous test run is still there.

Now in the past I used VirtualBox snapshots to roll back to a clean state, but this has the problem that it not only loses the environment changes I want to get rid of, such as data inserted into tables, it also gets rid of changes I want to keep, such as WebLogic configuration changes and new shell scripts.  So like Mickey I went in search of some magic to help me.

Cleaning Up SOA Environment

My first task was to clean up the SOA environment by deleting all instance data from the tables.  Now I could use the purge scripts to do this, but that would still leave me with running instances, for example 800 Human Workflow Tasks that I don’t want to deal with.  So I used the new truncate script to take care of this.  Basically this removes all instance data from your SOA Infrastructure, whether or not the data is live.  It can be run without taking down the SOA Infrastructure (although if you do get strange behavior you may want to restart SOA).  Some statistics, such as service and reference statistics, are kept since server startup, so you may want to restart your server to clear that data.  A sample script to run the truncate SQL is shown below.

#!/bin/sh
# Truncate the SOA schemas, does not truncate BAM.
# Use only in development and test, not production.

# Properties to be set before running script
# SOAInfra Database SID
DB_SID=orcl
# SOA DB Prefix
SOA_PREFIX=DEV
# SOAInfra DB password
SOAINFRA_PASSWORD=welcome1
# SOA Home Directory
SOA_HOME=/u01/app/fmw/Oracle_SOA1

# Set DB Environment
. oraenv << EOF
${DB_SID}
EOF

# Run Truncate script from directory it lives in
cd ${SOA_HOME}/rcu/integration/soainfra/sql/truncate

# Run the truncate script
sqlplus ${SOA_PREFIX}_soainfra/${SOAINFRA_PASSWORD} @truncate_soa_oracle.sql << EOF
exit
EOF

After running this script all your SOA composite instances and associated workflow instances will be gone.

Cleaning Up BAM

The above example shows how easy it is to get rid of all the runtime data in your SOA repository, however if you are using BAM you still have all the contents of your BAM objects from previous runs.  To get rid of that data we need to use BAM ICommand’s clear command as shown in the sample script below:

#!/bin/sh
# Set software locations
FMW_HOME=/home/oracle/fmw
export JAVA_HOME=${FMW_HOME}/jdk1.7.0_17
BAM_CMD=${FMW_HOME}/Oracle_SOA1/bam/bin/icommand
# Set objects to purge
BAM_OBJECTS="/path/RevenueEvent /path/RevenueViolation"

# Clean up BAM
for name in ${BAM_OBJECTS}
do
  ${BAM_CMD} -cmd clear -name ${name} -type dataobject
done

After running this script all the rows of the listed objects will be gone.

Ready for Inspection

Unlike the hapless Mickey, our clean up scripts work reliably and do what we want without unexpected consequences, like flooding the castle.

Friday Dec 20, 2013

Supporting the Team

SOA Support Team Blog

Some of my former colleagues in support have created a blog to help answer common problems for customers.  One way they are doing this is by creating better landing zones within My Oracle Support (MOS).  I just used the blog to locate the landing zone for database related issues in SOA Suite.  I needed to get the purge scripts working on 11.1.1.7 and I couldn’t find the patches needed to do that.  A quick look on the blog and I found a suitable entry that directed me to the Oracle Fusion Middleware (FMW) SOA 11g Infrastructure Database: Installation, Maintenance and Administration Guide (Doc ID 1384379.1) in MOS.  Lots of other useful stuff on the blog so stop by and check it out, great job Shawn, Antonella, Maria & JB.

Thursday Dec 19, 2013

$5 eBook Bonanza

Packt eBooks $5 Offer

Packt Publishing just told me about their Christmas offer, get eBooks for $5.

From December 19th, customers will be able to get any eBook or Video from Packt for just $5. This offer covers a myriad of titles in the 1700+ range where customers will be able to grab as many as they like until January 3rd 2014 – more information is available at http://bit.ly/1jdCr2W

If you haven’t bought the SOA Developers Cookbook then now is a great time to do so!

Friday Nov 15, 2013

Postscript on Scripts

More Scripts for SOA Suite

Over time I have evolved my startup scripts and thought it would be a good time to share them.  They are available for download here.  I have finally converted to using WLST, which has a number of advantages.  To me the biggest advantage is that the output and log files are automatically written to a consistent location in the domain directory or node manager directory.  In addition the WLST scripts wait for the component to start and then return; this lets us string commands together without worrying about the dependencies.

The following are the key scripts (available for download here):

Script                     Description                       Pre-Reqs              Stops when Task Complete
startWlstNodeManager.sh    Starts Node Manager using WLST    None                  Yes
startNodeManager.sh        Starts Node Manager               None                  Yes
stopNodeManager.sh         Stops Node Manager using WLST     Node Manager running  Yes
startWlstAdminServer.sh    Starts Admin Server using WLST    Node Manager running  Yes
startAdminServer.sh        Starts Admin Server               None                  No
stopAdminServer.sh         Stops Admin Server                Admin Server running  Yes
startWlstManagedServer.sh  Starts Managed Server using WLST  Node Manager running  Yes
startManagedServer.sh      Starts Managed Server             None                  No
stopManagedServer.sh       Stops Managed Server              Admin Server running  Yes

Samples

To start Node Manager and Admin Server

startWlstNodeManager.sh ; startWlstAdminServer.sh

To start Node Manager, Admin Server and SOA Server

startWlstNodeManager.sh ; startWlstAdminServer.sh ; startWlstManagedServer.sh soa_server1

Note that the Admin server is not started until the Node Manager is running; similarly, the SOA server is not started until the Admin server is running.

Node Manager Scripts

startWlstNodeManager.sh

Uses WLST to start the Node Manager.  When the script completes the Node manager will be running.

startNodeManager.sh

The Node Manager is started in the background and the output is piped to the screen. This causes the Node Manager to continue running in the background if the terminal is closed. Log files, including a .out capturing standard output and standard error, are placed in the <WL_HOME>/common/nodemanager directory, making them easy to find. This script pipes the output of the log file to the screen and keeps doing this until terminated; terminating the script does not terminate the Node Manager.

stopNodeManager.sh

Uses WLST to stop the Node Manager.  When the script completes the Node manager will be stopped.

Admin Server Scripts

startWlstAdminServer.sh

Uses WLST to start the Admin Server.  The Node Manager must be running before executing this command.  When the script completes the Admin Server will be running.

startAdminServer.sh

The Admin Server is started in the background and the output is piped to the screen. This causes the Admin Server to continue running in the background if the terminal is closed.  Log files, including the .out capturing standard output and standard error, are placed in the same location as if the server had been started by Node Manager, making them easy to find.  This script pipes the output of the log file to the screen and keeps doing this until terminated; terminating the script does not terminate the server.

stopAdminServer.sh

Stops the Admin Server.  When the script completes the Admin Server will no longer be running.

Managed Server Scripts

startWlstManagedServer.sh <MANAGED_SERVER_NAME>

Uses WLST to start the given Managed Server. The Node Manager must be running before executing this command. When the script completes the given Managed Server will be running.

startManagedServer.sh <MANAGED_SERVER_NAME>

The given Managed Server is started in the background and the output is piped to the screen. This causes the given Managed Server to continue running in the background if the terminal is closed. Log files, including the .out capturing standard output and standard error, are placed in the same location as if the server had been started by Node Manager, making them easy to find. This script pipes the output of the log file to the screen and keeps doing this until terminated; terminating the script does not terminate the server.

stopManagedServer.sh <MANAGED_SERVER_NAME>

Stops the given Managed Server. When the script completes the given Managed Server will no longer be running.

Utility Scripts

The following scripts are not called directly but are used by the previous scripts.

_fmwenv.sh

This script is used to provide information about the Node Manager and WebLogic domain and must be edited to reflect the installed FMW environment; in particular, the following values must be set:

  • DOMAIN_NAME – the WebLogic domain name.
  • NM_USERNAME – the Node Manager username.
  • NM_PASSWORD – the Node Manager password.
  • MW_HOME – the location where WebLogic and other FMW components are installed.
  • WEBLOGIC_USERNAME – the WebLogic Administrator username.
  • WEBLOGIC_PASSWORD - the WebLogic Administrator password.

The following values may also need changing:

  • ADMIN_HOSTNAME – the server where AdminServer is running.
  • ADMIN_PORT – the port number of the AdminServer.
  • DOMAIN_HOME – the location of the WebLogic domain directory, defaults to ${MW_HOME}/user_projects/domains/${DOMAIN_NAME}
  • NM_LISTEN_HOST – the Node Manager listening hostname, defaults to the hostname of the machine it is running on.
  • NM_LISTEN_PORT – the Node Manager listening port.

_runWLst.sh

This script runs the WLST script passed in environment variable ${SCRIPT} and takes its configuration from _fmwenv.sh.  It dynamically builds a WLST properties file in the /tmp directory to pass parameters into the scripts.  The properties filename is of the form <DOMAIN_NAME>.<PID>.properties.
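
For illustration, the heart of _runWLst.sh might look something like the sketch below; the property names written to the file and the wlst.sh location are my own assumptions rather than the actual contents of the download.

#!/bin/sh
# Sketch only: build a temporary WLST properties file and run the script in ${SCRIPT}
. `dirname $0`/_fmwenv.sh

PROPS=/tmp/${DOMAIN_NAME}.$$.properties
cat > ${PROPS} << EOF
domainName=${DOMAIN_NAME}
domainHome=${DOMAIN_HOME}
adminUrl=t3://${ADMIN_HOSTNAME}:${ADMIN_PORT}
nmHost=${NM_LISTEN_HOST}
nmPort=${NM_LISTEN_PORT}
nmUsername=${NM_USERNAME}
nmPassword=${NM_PASSWORD}
username=${WEBLOGIC_USERNAME}
password=${WEBLOGIC_PASSWORD}
serverName=${SERVER_NAME}
EOF

# Run the WLST script, making the properties available as variables inside it
${MW_HOME}/oracle_common/common/bin/wlst.sh -loadProperties ${PROPS} ${SCRIPT}
rm -f ${PROPS}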

_runAndLog.sh

This script runs the command passed in as an argument, writing standard out and standard error to a log file.  The log file is rotated between invocations to avoid losing the previous log files.  The log file is then tailed and output to the screen.  This means that this script will never finish by itself.
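
A minimal sketch of that rotate-then-tail behaviour, assuming a LOG_DIR variable set in _fmwenv.sh, could be:

#!/bin/sh
# Sketch only: run a command, rotating its log file and tailing the output to the screen
LOG_FILE=${LOG_DIR:-/tmp}/`basename $1`.log

# Rotate the previous log so it is not lost
[ -f ${LOG_FILE} ] && mv ${LOG_FILE} ${LOG_FILE}.`date +%Y%m%d%H%M%S`

# Run the command in the background, capturing standard output and standard error
"$@" > ${LOG_FILE} 2>&1 &

# Tail the log forever; terminating the tail does not terminate the command
tail -f ${LOG_FILE}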

WLST Scripts

The following WLST scripts are used by the scripts above, taking their properties from /tmp/<DOMAIN_NAME>.<PID>.properties; a sketch of how one of them might look follows the list:

  • startNodeManager.py
  • stopNodeManager.py
  • startServer.py
  • startServerNM.py
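
For example, startServerNM.py, which starts a server via the Node Manager, might be little more than the sketch below; the variable names come from the properties file sketched above for _runWLst.sh and are assumptions, not the actual script contents.

# Sketch only: start a named server via Node Manager, using variables loaded
# by wlst.sh -loadProperties from the generated properties file
nmConnect(nmUsername, nmPassword, nmHost, nmPort, domainName, domainHome, 'plain')
nmStart(serverName)
nmDisconnect()
exit()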

Relationships

The dependencies and relationships between my scripts and the built in scripts are shown in the diagram below.

Thanks for the Memory

Controlling Memory in Oracle SOA Suite

Within WebLogic you can specify the memory to be used by each managed server in the WebLogic console.  Unfortunately if you create a domain with Oracle SOA Suite it adds a new config script, setSOADomainEnv.sh, that overwrites any USER_MEM_ARGS passed in to the start scripts.  setDomainEnv.sh only sets a single set of memory arguments that are used by all servers, so an admin server gets the same memory parameters as a BAM server.  This means some servers will have more memory than they need and others will be too tightly constrained.  This is a bad thing.

A Solution

To overcome this I wrote a small script, setMem.sh, that checks which server is being started and then sets the memory accordingly.  It supports up to 9 different server types, each identified by a prefix.  If the prefix matches then the max and min heap and the max and min perm gen are set.  Settings are controlled through variables set in the script.  If using JRockit then leave the PERM settings empty and they will not be set.

SRV1_PREFIX=Admin
SRV1_MINHEAP=768m
SRV1_MAXHEAP=1280m
SRV1_MINPERM=256m
SRV1_MAXPERM=512m

The above settings match any server whose name starts with Admin, and will result in the following USER_MEM_ARGS value:

USER_MEM_ARGS=-Xms768m -Xmx1280m -XX:PermSize=256m -XX:MaxPermSize=512m

If the prefix were soa then it would match soa_server1, soa_server2 etc.

There is a set of DEFAULT_ values that lets you provide default memory settings and only set them explicitly for servers whose settings differ from the default.  Note that there must still be a matching prefix, otherwise the USER_MEM_ARGS will be the ones from setSOADomainEnv.sh.
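
The matching logic itself is simple; a cut-down sketch of the kind of test setMem.sh performs for the first prefix slot is shown below (the real script loops over all nine slots and falls back to the DEFAULT_ values).

# Sketch only: match the server name against one prefix and build USER_MEM_ARGS
# SERVER_NAME is already set by setDomainEnv.sh when this is called
case ${SERVER_NAME} in
  ${SRV1_PREFIX}*)
    USER_MEM_ARGS="-Xms${SRV1_MINHEAP} -Xmx${SRV1_MAXHEAP}"
    # Only add perm gen settings if they are provided (leave empty for JRockit)
    if [ -n "${SRV1_MINPERM}" ] ; then
      USER_MEM_ARGS="${USER_MEM_ARGS} -XX:PermSize=${SRV1_MINPERM} -XX:MaxPermSize=${SRV1_MAXPERM}"
    fi
    export USER_MEM_ARGS
    ;;
esac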

This script needs to be called from setDomainEnv.sh.  Add the call immediately before the following lines:

if [ "${USER_MEM_ARGS}" != "" ] ; then
        MEM_ARGS="${USER_MEM_ARGS}"
        export MEM_ARGS
fi

The script can be downloaded here.

Thoughts on Memory Setting

I know conventional wisdom is that Xms (initial heap size) and Xmx (max heap size) should be set the same to avoid needing to allocate memory after startup.  However I don’t agree with this.  Setting Xms==Xmx works great if you are omniscient; I don’t claim to be omniscient, my omni is a little limited, so I prefer to set Xms to what I think the server needs, which is the value I would choose if I were setting Xms=Xmx.  I then like to set Xmx a little higher than I think the server needs.  This allows me to get the memory requirement wrong without the server failing with an out of memory error.  In addition I can monitor the server, and if heap usage goes above my Xms setting I know that I calculated wrong and can change the Xms setting and restart the servers at a suitable time.  Setting them not equal buys me time to fix increased memory needs, for example due to changes in usage patterns or new functionality; it also allows me to be less than omniscient.

ps Please don’t tell my children I am not omniscient!

Tuesday May 21, 2013

Target Verification

Verifying the Target

I just built a combined OSB, SOA/BPM, BAM clustered domain.  The biggest hassle is validating that the resource targeting is correct.  There is a great appendix in the documentation that lists all the modules and resources with their associated targets.  The only problem is that the appendix is six pages of small print.  I manually went through the first page, verifying my targeting, until I thought ‘there must be a better way of doing this’.  So this blog post is the better way.

WLST to the Rescue

WebLogic Scripting Tool allows us to query the MBeans and discover what resources are deployed and where they are targeted.  So I built a script that iterates over each of the following resource types and verifies that they are correctly targeted:

  • Applications
  • Libraries
  • Startup Classes
  • Shutdown Classes
  • JMS System Resources
  • WLDF System Resources

Source Data

To get the data to verify my domain against, I copied the tables from the documentation into a text file.  The copy ended up putting the resource on the first line and the targets on the second line.  Rather than reformat the data I just read the lines in pairs, storing the resource as a string and splitting apart the targets into a list of strings.  I then stored the data in a dictionary with the resource string as the key and the target list as the value.  The code to do this is shown below:

# Load resource and target data from file created from documentation
# File format is a one line with resource name followed by
# one line with comma separated list of targets
# fileIn - Resource & Target File
# accum - Dictionary containing mappings of expected Resource to Target
# returns - Dictionary mapping expected Resource to expected Target
def parseFile(fileIn, accum) :
  # Load resource name
  line1 = fileIn.readline().strip('\n')
  if line1 == '':
    # Done if no more resources
    return accum
  else:
    # Load list of targets
    line2 = fileIn.readline().strip('\n')
    # Convert string to list of targets
    targetList = map(fixTargetName, line2.split(','))
    # Associate resource with list of targets in dictionary
    accum[line1] = targetList
    # Parse remainder of file
    return parseFile(fileIn, accum)

This makes it very easy to update the lists by just copying and pasting from the documentation.

Each table in the documentation has a corresponding file that is used by the script.

The data read from the file has the target names mapped to the actual domain target names which are provided in a properties file.
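
The fixTargetName function used in parseFile is where that mapping happens.  A minimal sketch, assuming the properties have been loaded into a dictionary called targetMap keyed by the documentation names, might be:

# Sketch only: map a documentation target name (e.g. SOA_Cluster) to the actual
# name used in this domain, leaving any unmapped names unchanged
def fixTargetName(docName) :
  name = docName.strip()
  if targetMap.has_key(name):
    return targetMap[name]
  else:
    return name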

Listing & Verifying the Resources & Targets

Within the script I move to the domain configuration MBean and then iterate over the deployed resources; for each resource I iterate over its targets, validating them against the corresponding targets read from the file, as shown below:

# Validate that resources are correctly targeted
# name - Name of Resource Type
# filename - Filename to validate against
# items - List of Resources to be validated
def validateDeployments(name, filename, items) :
  print name+' Check'
  print "====================================================="
  fList = loadFile(filename)
  # Iterate over resources
  for item in items:
    try:
      # Get expected targets for resource
      itemCheckList = fList[item.getName()]
      # Iterate over actual targets
      for target in item.getTargets() :
        try:
          # Remove actual target from expected targets
          itemCheckList.remove(target.getName())
        except ValueError:
          # Target not found in expected targets
          print 'Extra target: '+item.getName()+': '+target.getName()
      # Iterate over remaining expected targets, if any
      for refTarget in itemCheckList:
        print 'Missing target: '+item.getName()+': '+refTarget
    except KeyError:
      # Resource not found in expected resource dictionary
      print 'Extra '+name+' Deployed: '+item.getName()
  print
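
The function is then driven once per resource type from the domain configuration MBean.  A sketch of that driver, assuming an online connection and credential variable names of my own choosing, is shown below; the MBean getters are the standard DomainMBean ones.

# Sketch only: connect, move to the domain configuration and check each resource type
connect(adminUser, adminPassword, adminUrl)
domainConfig()

validateDeployments('Application',   'scripts/verifyApps.txt',     cmo.getAppDeployments())
validateDeployments('Library',       'scripts/verifyLibs.txt',     cmo.getLibraries())
validateDeployments('StartupClass',  'scripts/verifyStartup.txt',  cmo.getStartupClasses())
validateDeployments('ShutdownClass', 'scripts/verifyShutdown.txt', cmo.getShutdownClasses())
validateDeployments('JMS Resource',  'scripts/verifyJMS.txt',      cmo.getJMSSystemResources())
validateDeployments('WLDF Resource', 'scripts/verifyWLDF.txt',     cmo.getWLDFSystemResources())

disconnect()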

Obtaining the Script

I have uploaded the script here.  It is a zip file containing all the required files together with a PDF explaining how to use the script.

To install just unzip VerifyTargets.zip. It will create the following files:

  • verifyTargets.sh
  • verifyTargets.properties
  • VerifyTargetsScriptInstructions.pdf
  • scripts/verifyTargets.py
  • scripts/verifyApps.txt
  • scripts/verifyLibs.txt
  • scripts/verifyStartup.txt
  • scripts/verifyShutdown.txt
  • scripts/verifyJMS.txt
  • scripts/verifyWLDF.txt

Sample Output

The following is sample output from running the script:

Application Check
=====================================================
Extra Application Deployed: frevvo
Missing target: usermessagingdriver-xmpp: optional
Missing target: usermessagingdriver-smpp: optional
Missing target: usermessagingdriver-voicexml: optional
Missing target: usermessagingdriver-extension: optional
Extra target: Healthcare UI: soa_cluster
Missing target: Healthcare UI: SOA_Cluster ??
Extra Application Deployed: OWSM Policy Support in OSB Initializer Aplication

Library Check
=====================================================
Extra Library Deployed: oracle.bi.adf.model.slib#1.0-AT-11.1-DOT-1.2-DOT-0
Extra target: oracle.bpm.mgmt#11.1-DOT-1-AT-11.1-DOT-1: AdminServer
Missing target: oracle.bpm.mgmt#11.1.1-AT-11.1.1: soa_cluster
Extra target: oracle.sdp.messaging#11.1.1-AT-11.1.1: bam_cluster

StartupClass Check
=====================================================

ShutdownClass Check
=====================================================

JMS Resource Check
=====================================================
Missing target: configwiz-jms: bam_cluster

WLDF Resource Check
=====================================================

IMPORTANT UPDATE

Since posting this I have discovered a number of issues.  I have updated the configuration files to correct these problems.  The changes made are as follows:

  • Added WLS_OSB1 server mapping to the script properties file (verifyTargets.properties) to accommodate OSB singletons and modified script (verifyTargets.py) to use the new property.
  • Changes to verifyApplications.txt
    • Changed target from OSB_Cluster to WLS_OSB1 for the following applications:
      • ALSB Cluster Singleton Marker Application
      • ALSB Domain Singleton Marker Application
      • Message Reporting Purger
    • Added following application and targeted at SOA_Cluster
      • frevvo
    • Added following application and targeted at OSB_Cluster & Admin Server
      • OWSM Policy Support in OSB Initializer Aplication
  • Changes to verifyLibraries.txt
    • Added following library and targeted at OSB_Cluster, SOA_Cluster, BAM_Cluster & Admin Server
      • oracle.bi.adf.model.slib#1.0-AT-11.1.1.2-DOT-0
    • Modified targeting of following library to include BAM_Cluster
      • oracle.sdp.messaging#11.1.1-AT-11.1.1

Make sure that you download the latest version.  It is at the same location but now includes a version file (version.txt).  The contents of the version file should be:

FMW_VERSION=11.1.1.7

SCRIPT_VERSION=1.1

Wednesday Oct 10, 2012

Following the Thread in OSB

Threading in OSB

The Scenario

I recently led an OSB POC where we needed to get high throughput from an OSB pipeline that had the following logic:

1. Receive Request
2. Send Request to External System
3. If Response has a particular value
  3.1 Modify Request
  3.2 Resend Request to External System
4. Send Response back to Requestor

It all looks very straightforward, with no nasty wrinkles along the way.  The flow was implemented in OSB as follows (see diagram for more details):

  • Proxy Service to Receive Request and Send Response
  • Request Pipeline
    •   Copies Original Request for use in step 3
  • Route Node
    •   Sends Request to External System exposed as a Business Service
  • Response Pipeline
    •   Checks Response to Check If Request Needs to Be Resubmitted
      • Modify Request
      • Callout to External System (same Business Service as Route Node)

The Proxy and the Business Service were each assigned their own Work Manager, effectively giving each of them their own thread pool.

The Surprise

Imagine our surprise when, on stressing the system, we saw it lock up with large numbers of blocked threads.  The lock up was due to some subtleties in the OSB thread model, which are the topic of this post.

 

Basic Thread Model

OSB goes to great lengths to avoid holding on to threads.  Let’s start by looking at how OSB deals with a simple request/response routing to a business service in a route node.

Most Business Services are implemented by OSB in two parts.  The first part uses the request thread to send the request to the target.  In the diagram this is represented by the thread T1.  After sending the request to the target (the Business Service in our diagram) the request thread is released back to whatever pool it came from.  A multiplexor (muxer) is used to wait for the response.  When the response is received the muxer hands off the response to a new thread that is used to execute the response pipeline; this is represented in the diagram by T2.

OSB allows you to assign different Work Managers and hence different thread pools to each Proxy Service and Business Service.  In our example we have the “Proxy Service Work Manager” assigned to the Proxy Service and the “Business Service Work Manager” assigned to the Business Service.  Note that the Business Service Work Manager is only used to assign the thread to process the response; it is never used to process the request.

This architecture means that while waiting for a response from a business service there are no threads in use, which makes for better scalability in terms of thread usage.

First Wrinkle

Note that if the Proxy and the Business Service both use the same Work Manager then there is potential for starvation.  For example:

  • Request Pipeline makes a blocking callout, say to perform a database read.
  • Business Service response tries to allocate a thread from thread pool but all threads are blocked in the database read.
  • New requests arrive and contend with responses arriving for the available threads.

Similar problems can occur if the response pipeline blocks for some reason, maybe a database update for example.

Solution

The solution to this is to make sure that the Proxy and Business Service use different Work Managers so that they do not contend with each other for threads.

Do Nothing Route Thread Model

So what happens if there is no route node?  In this case OSB just echoes the Request message as a Response message, but what happens to the threads?  OSB still uses a separate thread for the response, but in this case the Work Manager used is the Default Work Manager.

So this is really a special case of the Basic Thread Model discussed above, except that the response pipeline will always execute on the Default Work Manager.

 

Proxy Chaining Thread Model

So what happens when the route node is actually calling a Proxy Service rather than a Business Service?  Does the second Proxy Service use its own thread or does it re-use the thread of the original request pipeline?

Well, as you can see from the diagram, when a route node calls another proxy service the original Work Manager is used for both request pipelines.  Similarly the response pipeline uses the Work Manager associated with the ultimate Business Service invoked via a Route Node.  This actually fits in with the earlier description I gave of Business Services, and by extension Route Nodes: they “… use the request thread to send the request to the target”.

Call Out Threading Model

So what happens when you make a Service Callout to a Business Service from within a pipeline?  The documentation says that “The pipeline processor will block the thread until the response arrives asynchronously” when using a Service Callout.  What this means is that the target Business Service is called using the pipeline thread and the response is also handled by the pipeline thread.  This implies that the pipeline thread blocks waiting for a response.  It is the handling of this response that behaves in an unexpected way.

When a Business Service is called via a Service Callout, the calling thread is suspended after sending the request, but unlike the Route Node case the thread is not released; it waits for the response.  The muxer uses the Business Service Work Manager to allocate a thread to process the response, but in this case processing the response means getting the response and notifying the blocked pipeline thread that the response is available.  The original pipeline thread can then continue to process the response.

Second Wrinkle

This leads to an unfortunate wrinkle.  If the Business Service is using the same Work Manager as the Pipeline then it is possible for starvation or a deadlock to occur.  The scenario is as follows:

  • Pipeline makes a Callout and the thread is suspended but still allocated
  • Multiple Pipeline instances using the same Work Manager are in this state (common for a system under load)
  • Response comes back but all Work Manager threads are allocated to blocked pipelines.
  • Response cannot be processed and so pipeline threads never unblock – deadlock!

Solution

The solution to this is to make sure that any Business Services used by a Callout in a pipeline use a different Work Manager to the pipeline itself.

The Solution to My Problem

Looking back at my original workflow we see that the same Business Service is called twice, once in a Routing Node and once in a Response Pipeline Callout.  This was what was causing my problem: the response pipeline was using the Business Service Work Manager, but the Service Callout wanted to use the same Work Manager to handle its responses, so eventually my Response Pipeline hogged all the available threads and no callout responses could be processed.

The solution was to create a second Business Service pointing to the same location as the original Business Service, the only difference was to assign a different Work Manager to this Business Service.  This ensured that when the Service Callout completed there were always threads available to process the response because the response processing from the Service Callout had its own dedicated Work Manager.

Summary

  • Request Pipeline
    • Executes on Proxy Work Manager (WM) Thread so limited by setting of that WM.  If no WM specified then uses WLS default WM.
  • Route Node
    • Request sent using Proxy WM Thread
    • Proxy WM Thread is released before getting response
    • Muxer is used to handle response
    • Muxer hands off response to Business Service (BS) WM
  • Response Pipeline
    • Executes on Routed Business Service WM Thread so limited by setting of that WM.  If no WM specified then uses WLS default WM.
  • No Route Node (Echo functionality)
    • Proxy WM thread released
    • New thread from the default WM used for response pipeline
  • Service Callout
    • Request sent using proxy pipeline thread
    • Proxy thread is suspended (not released) until the response comes back
    • Notification of response handled by BS WM thread so limited by setting of that WM.  If no WM specified then uses WLS default WM.
      • Note this is a very short lived use of the thread
    • After notification by callout BS WM thread that thread is released and execution continues on the original pipeline thread.
  • Route/Callout to Proxy Service
    • Request Pipeline of callee executes on requestor thread
    • Response Pipeline of caller executes on response thread of requested proxy
  • Throttling
    • Request message may be queued if limit reached.
    • Requesting thread is released (route node) or suspended (callout)

So what this means is that you may get deadlocks caused by thread starvation if you use the same thread pool for the business service in a route node and the business service in a callout from the response pipeline because the callout will need a notification thread from the same thread pool as the response pipeline.  This was the problem we were having.
You get a similar problem if you use the same work manager for the proxy request pipeline and a business service callout from that request pipeline.
It also means you may want to have different work managers for the proxy and business service in the route node.
Basically you need to think carefully about how threading impacts your proxy services.

References

Thanks to Jay Kasi, Gerald Nunn and Deb Ayers for helping to explain this to me.  Any errors are my own and not theirs.  Also thanks to my colleagues Milind Pandit and Prasad Bopardikar who travelled this road with me.

About

Musings on Fusion Middleware and SOA.  Antony works with customers across the US and Canada in implementing SOA and other Fusion Middleware solutions.  Antony is the co-author of the SOA Suite 11g Developers Cookbook, the SOA Suite 11g Developers Guide and the SOA Suite Developers Guide.
