
The Integration blog covers the latest in product updates, best practices, customer stories, and more.

Bulk Recovery of Fault Instances

Viswanatha Basavalingappa
Consulting Member of Technical Staff

One of the most common requirements of enterprise integration is error management. It is critical for customers to manage recoverable errors in a seamless and automated fashion.

What are Recoverable Fault Errors?

All faulted instances in asynchronous flows in Oracle Integration Cloud Service are recoverable and can be resubmitted. Synchronous flows cannot be resubmitted. You can resubmit errors in the following ways:

  • Single failed message resubmissions
  • Bulk failed message resubmissions

Today, an operator can manually resubmit failed messages one at a time from the monitoring dashboard of the integration console.

In this blog we focus on how to create an integration flow that automatically resubmits faulted instances in bulk.

High-Level Steps

Here are the steps to create an integration flow to implement the automated bulk recovery of errors.

Note: a sample integration is also available for download.

STEP 1: Create New Scheduled Orchestration Flow 

STEP 2: Add Schedule Parameters 

It is good practice to parameterize variables so that you can configure the flow to suit business needs by overriding them. The bulk resubmit sample integration defines the following schedule parameters:

  • strErrorQueryFilter: fault query filter parameter.
    • This defines which error instances are selected for recovery.
    • Supported filter fields:
      • timewindow: 1h, 6h, 1d, 2d, 3d, or RETENTIONPERIOD. Default is 1h.
      • code: integration code
      • version: integration version
      • id: error ID (instance ID)
      • primaryValue: value of the primary tracking variable
      • secondaryValue: value of the secondary tracking variable
    • See the API documentation for details.
  • strMaxBatchSize: maximum number of error instances to resubmit per run (default 50).
    • This limits the number of recovery requests to avoid overloading the system.
  • strMinBatchSize: minimum number of error instances required before a run resubmits anything (default 2).
    • This defers recovery until the given number of errors has accumulated.
  • strRetryCount: maximum number of retry attempts for an individual error instance (default 3).
    • This prevents a failed instance from being resubmitted repeatedly.
  • strMaxThershold: number of errors above which recovery is aborted and the user is notified (default 500).
    • This skips resubmission when an excessive number of errors is detected, which suggests that user intervention may be required.
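The parameter handling can be sketched in Python as follows. This is a minimal illustration, not code from the sample: the names `DEFAULTS`, `RecoveryParams`, and `load_params` are hypothetical, the default filter value `timewindow:'1h'` is an assumption, and the sketch assumes that schedule parameters arrive as strings (hence the `str` prefixes) and must be parsed to integers.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Defaults as described above; the filter default is an assumption.
DEFAULTS = {
    "strErrorQueryFilter": "timewindow:'1h'",
    "strMaxBatchSize": "50",
    "strMinBatchSize": "2",
    "strRetryCount": "3",
    "strMaxThershold": "500",
}


@dataclass(frozen=True)
class RecoveryParams:
    """Hypothetical typed view of the string schedule parameters."""
    error_query_filter: str
    max_batch_size: int
    min_batch_size: int
    retry_count: int
    max_threshold: int


def load_params(overrides: Optional[Mapping[str, str]] = None) -> RecoveryParams:
    """Merge per-run overrides onto the defaults and parse numeric values."""
    values = {**DEFAULTS, **(overrides or {})}
    params = RecoveryParams(
        error_query_filter=values["strErrorQueryFilter"],
        max_batch_size=int(values["strMaxBatchSize"]),
        min_batch_size=int(values["strMinBatchSize"]),
        retry_count=int(values["strRetryCount"]),
        max_threshold=int(values["strMaxThershold"]),
    )
    # A minimum batch larger than the maximum could never be resubmitted.
    if params.min_batch_size > params.max_batch_size:
        raise ValueError("strMinBatchSize must not exceed strMaxBatchSize")
    return params
```

For example, `load_params({"strMaxThershold": "1000"})` keeps the other defaults and raises only the threshold, which mirrors temporarily overriding that parameter after a large outage has been fixed.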

STEP 3: Update the Query Filter to Include only Recoverable Errors

concat(concat("{",$strErrorQueryFilter,",recoverable:'true'"),"}")
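In Python, the same string construction looks roughly like this. It is a sketch: `build_recoverable_filter` is a hypothetical helper, and the empty-filter guard is an addition that the mapper expression above does not have (without it, an empty filter would produce `{,recoverable:'true'}`).

```python
def build_recoverable_filter(str_error_query_filter: str) -> str:
    """Wrap the user filter in braces and append recoverable:'true'."""
    user_filter = str_error_query_filter.strip()
    if not user_filter:
        # Guard not present in the original expression: avoid a leading comma.
        return "{recoverable:'true'}"
    return "{" + user_filter + ",recoverable:'true'}"
```

For instance, a filter of `timewindow:'1h'` becomes `{timewindow:'1h',recoverable:'true'}`.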

STEP 4: Query All Recoverable Error Instances in the System matching the Query Filter

GET /ic/api/integration/v1/monitoring/errors?q=strErrorQueryFilter
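Because the filter contains braces, quotes, colons, and commas, it must be URL-encoded when it is placed in the `q` parameter. A minimal Python sketch that builds the request path, assuming the path is relative to your OIC instance host (`errors_query_url` is a hypothetical helper); the optional `limit` and `offset` arguments also cover the batched query in STEP 5:

```python
from typing import Dict, Optional
from urllib.parse import urlencode

# Path from the blog; prepend https://<your-oic-host> when calling it.
ERRORS_PATH = "/ic/api/integration/v1/monitoring/errors"


def errors_query_url(query_filter: str,
                     limit: Optional[int] = None,
                     offset: Optional[int] = None) -> str:
    """Build GET .../monitoring/errors?q=...[&limit=...&offset=...]."""
    params: Dict[str, str] = {"q": query_filter}
    if limit is not None:
        params["limit"] = str(limit)
    if offset is not None:
        params["offset"] = str(offset)
    # urlencode percent-encodes {, }, ', : and , in the filter value.
    return ERRORS_PATH + "?" + urlencode(params)
```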

STEP 4 (continued): Determine the Recovery Action

STEP 4a: If the total number of recoverable error instances found exceeds the maximum threshold (totalResults > strMaxThershold), send a notification instead of resubmitting. That many errors may indicate a more serious problem, so it is best practice to review them manually and, once the issue is fixed, temporarily override the strMaxThershold value to allow the failed instances to be recovered.

STEP 4b: Else, if no recoverable error instances are found (totalResults <= 0), end the flow.

STEP 4c: Otherwise, continue and resubmit up to strMaxBatchSize of the errors found in a single batch.

NOTE: We limit the number of errors resubmitted in a single batch to avoid overloading the system; we suggest a limit of 50 instances.
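The branching in STEP 4a to 4c can be summarized as a small decision function. This Python sketch is illustrative only; `RecoveryAction` and `decide_action` are hypothetical names, and `total_results` stands for the `totalResults` value returned by the query in STEP 4.

```python
from enum import Enum


class RecoveryAction(Enum):
    NOTIFY = "notify"      # STEP 4a: too many errors, alert a human
    END = "end"            # STEP 4b: nothing to recover
    RESUBMIT = "resubmit"  # STEP 4c: continue with a batch


def decide_action(total_results: int, max_threshold: int) -> RecoveryAction:
    """Pick the recovery branch from the error count and strMaxThershold."""
    if total_results > max_threshold:
        return RecoveryAction.NOTIFY
    if total_results <= 0:
        return RecoveryAction.END
    return RecoveryAction.RESUBMIT
```

Note that the threshold check comes first, so an error count exactly equal to the threshold still proceeds to resubmission.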

STEP 5: Query Recoverable Errors (Limited to the Batch Size)

GET /ic/api/integration/v1/monitoring/errors?q=strErrorQueryFilter&limit=strMaxBatchSize&offset=0
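The query responses can be unpacked with a helper like the following. It is a sketch that assumes the response JSON carries a top-level `totalResults` count and an `items` array whose entries have `id` and `retryCount` fields; verify these names against the REST API reference for your OIC version. `parse_error_page` is a hypothetical name.

```python
import json
from typing import Any, Dict, List, Tuple


def parse_error_page(body: str) -> Tuple[int, List[Dict[str, Any]]]:
    """Extract totalResults and items from a .../monitoring/errors response.

    Assumed shape: {"totalResults": N, "items": [{"id": ..., "retryCount": ...}]}
    """
    payload = json.loads(body)
    items = payload.get("items", [])
    # Fall back to the page size if totalResults is absent.
    total = int(payload.get("totalResults", len(items)))
    return total, items
```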

STEP 6: Filter Results to Avoid Too Many Retries

STEP 6a: If totalResults < strMinBatchSize, skip the batch resubmission and stop the flow.

STEP 6b: Otherwise (totalResults >= strMinBatchSize), invoke the Bulk Resubmit Error REST API with the IDs of the faulted instances.

Before resubmitting, filter out fault instances that have already been retried and failed again, as follows:

- Drag and drop a for-each onto items

- Add an if function from the mapper on top of items

- Add a <= condition element

- Set the left operand to retryCount from the source

- Set the right operand to the strMaxRetryAttempt variable

retryCount <= $strMaxRetryAttempt
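STEP 6a and the retry filter from STEP 6b can be expressed together as follows. This is a Python sketch, not the mapper logic itself: `select_for_resubmit` is a hypothetical helper, and treating a missing `retryCount` as 0 is an assumption.

```python
from typing import Any, Dict, List, Sequence


def select_for_resubmit(total_results: int,
                        items: Sequence[Dict[str, Any]],
                        min_batch_size: int,
                        max_retry_attempt: int) -> List[str]:
    """Return the error IDs to resubmit, or an empty list to skip the run."""
    # STEP 6a: not enough errors have accumulated yet, so skip this run.
    if total_results < min_batch_size:
        return []
    # STEP 6b: keep instances with retryCount <= strMaxRetryAttempt.
    # A missing retryCount is assumed to mean "never retried" (0).
    return [str(item["id"]) for item in items
            if int(item.get("retryCount", 0)) <= max_retry_attempt]
```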

STEP 7: Resubmit Errors

POST /ic/api/integration/v1/monitoring/errors/resubmit
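A minimal sketch of the request body for this call. The `{"ids": [...]}` shape is an assumption to verify against the REST API reference; `build_resubmit_body` is a hypothetical helper that also truncates the list to `strMaxBatchSize` as an extra safeguard.

```python
import json
from typing import Optional, Sequence


def build_resubmit_body(ids: Sequence[str], max_batch_size: int) -> Optional[str]:
    """Build the JSON body for POST .../monitoring/errors/resubmit.

    Assumed payload shape: {"ids": ["<error id>", ...]}.
    Returns None when there is nothing to send.
    """
    batch = list(ids)[:max_batch_size]  # never exceed strMaxBatchSize
    if not batch:
        return None
    return json.dumps({"ids": batch})
```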

 

STEP 8: Check whether `resubmitRequested` in the response is true or false.
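Checking the flag can be sketched as below. This assumes the response is JSON with a top-level `resubmitRequested` field (as named in STEP 8); accepting the string `"true"` as well as a boolean is a defensive assumption, and `resubmit_requested` is a hypothetical helper.

```python
import json


def resubmit_requested(response_body: str) -> bool:
    """Read resubmitRequested from the resubmit API response."""
    payload = json.loads(response_body)
    flag = payload.get("resubmitRequested")
    if isinstance(flag, str):
        # Defensive: tolerate a string "true"/"false".
        return flag.strip().lower() == "true"
    # Anything other than a literal boolean true counts as not requested.
    return flag is True
```

Treating a missing flag as false means the notification in the next step reports a failure rather than silently assuming success.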

STEP 9: Send an email notification with the resubmission status details.

(Optional): Based on the resubmit response, you can model the integration to invoke a process (using the OIC Process capability for human interaction and long-running tasks) or to take other action through a child flow or a third-party integration, for example posting the resubmission information to another system for later analysis or review. You can use the local invoke feature to model the hand-off from the parent flow to the child flow.

STEP 10: Activate the Integration

STEP 11: Schedule the Integration to Run at a Regular Interval

You can also run it on demand by using the Submit Now option.

Email Notification

Here are the email notifications you would receive:

Case 1: The bulk resubmission succeeds, and a status email is sent.

Case 2: Too many fault instances are found, and an alert email is sent.

You have now developed the integration and scheduled it to run on your Integration Cloud instance.

How to Customize your Integration to Run Recovery for a Specific Integration or Connection

Because different integrations or error types may have different recovery requirements, you may want different query parameters and/or schedule intervals.

To do this, clone the integration above and override the schedule parameters so that it queries only the fault instances of a given integration or connection type, based on the query filter. This lets you keep a separate instance running for each specific business use case.

Here is how you do it:

STEP 1: Clone the above integration.

STEP 2: Update the schedule parameter strErrorQueryFilter, for example:

timewindow : '3d', code : 'SC2RNSYNC', version : '01.00.0000'
code : 'SC2RNSYNC', version : '01.00.0000', connection :'EH_RN'
timewindow : '3d', primaryValue : 'TestValue'

You may also want to modify other parameters or even to modify the integration to take alternative actions.

STEP 3: Schedule to Run

This gives you the ability to configure bulk resubmission at the integration level or the connection level.

Sample Integration (IAR)  -  Download Here

Summary

This blog explained how to automatically resubmit errored instances while controlling the rate of recovery and the types of errors to recover, and showed how to customize the recovery integration by cloning it and modifying its parameters.

We hope that you find this a useful pattern for your integrations.

Thank you!

 

 
