
Archiving and Purging Process Automation Data in Oracle Integration

Arvind Venugopal and Nilakshi Soni

Archiving Process Data in Oracle Integration

Archiving is the practice of taking data that is no longer actively used and moving it to external secondary storage, where it can be retained for an extended time. Archived data can still be accessed, if needed, for compliance, audit, and analytics. Archiving the right data not only saves businesses money; the intelligence derived from it also adds value to the business. Users setting up archiving often ask the same questions: what data should be archived, how should archiving be set up, and how can data be retrieved from the archive file? The purpose of these blogs is to discuss a couple of options that answer these very questions.
When planning an archiving strategy best suited to your needs, give thought to some basic questions:

  • How often will you need to retrieve data from the archive?
  • What data must be available for legal and regulatory compliance?
  • What data sets could be analyzed to add value to the business over time?
  • What insights are relevant to your business?

What data is generated by Process in Oracle Integration

Large volumes of data are generated during the life of each process instance. The data collected can be broadly divided into two categories: instance data and analytics data.

  • Instance Data:    Instance-specific data generated at each step during execution. It includes the payload, comments, and execution and audit information.
  • Analytics Data:   Additional system metrics and user-defined business metrics collated and stored at predefined events during the lifecycle of an instance. This data exists independently of the instance data.

Purge in Oracle Integration

To keep Oracle Integration lightweight, agile, and performant, completed instances are purged 7 days after completion. If completed instances need to be retained for longer, it is recommended that the instance data be archived to secondary storage. Based on business requirements and regulatory compliance, it may be necessary for the instance data to remain available in the production environment for longer; in that case the 7-day retention period may be increased. But before changing this value, it is imperative to understand why the data is being retained and the security, space, and performance impact.

Purge:  The archive and purge utilities are independent. When configuring the purge retention time, budget enough time to ensure that the data is archived before it is removed. If the number of instances is large and the payloads are big, additional time will be needed to complete the archive using the out-of-the-box utility. Inactive instances are eligible for purge based on their state. Refer to the table below for more details.

Category   State                 Eligible for Purge after
Open       OPEN                  Not purged
Close      COMPLETED             Completion date + retention days
Close      ABORTED               Abort date + retention days
Close      SUSPENDED             Not eligible
Close      CANCELED              Completion date + retention days
Close      RETIRE                Completion date + retention days
Error      FAULTED_RECOVERABLE   Not eligible
Error      ERRORED               Error date + retention days

 

Archiving instances using the out-of-the-box feature

Oracle Integration comes with a rich out-of-the-box archive utility. A prerequisite to archiving is to ensure that the connection to the storage service is configured, as all the archive files are stored there and can be accessed by end users using the storage service REST libraries. The following blog describes how to create and set up the storage service. While setting up archiving, the user can configure the following:

  • What data should be archived: instance or analytics?
  • When should the archive start?
  • The maximum duration for the archive to run
  • The archive contents
  • The email address to be notified if the archive fails

The archive job starts at the scheduled time and does the following:

  • Identify the pointer where the last archive run stopped.
  • Retrieve the next set of instances to be archived. The default set size is 10 instances.
  • Retrieve the content details for these instances.
  • Create the set of XML files with the archive content and add them to the staging area.
  • If the archive content in the staging area has reached the size threshold, push it to the storage service.
  • Check whether the maximum duration for the archive run has been exceeded; if not, repeat.
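In rough pseudocode, one archive run following these steps looks something like the sketch below. This is illustrative only: the batch size of 10 comes from the steps above, while the staging threshold, the checkpoint object, and the helper callables are assumptions, not the utility's actual internals.

```python
import time

BATCH_SIZE = 10               # default set size, per the steps above
SIZE_THRESHOLD = 50 * 2**20   # assumed staging threshold (50 MB), illustrative

def run_archive(fetch_batch, fetch_content, push_to_storage,
                checkpoint, max_duration_secs):
    """One scheduled archive run, mirroring the steps listed above.
    The callables stand in for the utility's internals: fetch_batch
    returns the next instances after a pointer, fetch_content builds
    the XML archive content for one instance, push_to_storage uploads
    staged content, and checkpoint persists the pointer between runs."""
    deadline = time.time() + max_duration_secs
    staged, staged_size = [], 0
    pointer = checkpoint.load()                   # where the last run stopped
    while time.time() < deadline:                 # stop at max duration
        batch = fetch_batch(pointer, BATCH_SIZE)  # next set of instances
        if not batch:                             # nothing left to archive
            break
        for instance in batch:
            xml = fetch_content(instance)         # XML content for this instance
            staged.append(xml)
            staged_size += len(xml)
        pointer = batch[-1]                       # advance the pointer
        if staged_size >= SIZE_THRESHOLD:         # staging full: push it
            push_to_storage(staged)
            staged, staged_size = [], 0
    if staged:                                    # flush whatever remains
        push_to_storage(staged)
    checkpoint.save(pointer)                      # remember where we stopped
```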

Step-by-step directions to enable archive

Step 1: Configure the storage service
Navigate to OIC Home – My Tasks – Administration – Services – Infrastructure

•    Enter the URL for the storage service
•    Enter a user-friendly container name
•    Enter the user ID and password to access the service
•    Test the connection. If successful, save and proceed
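Once archive runs complete, the files sit in this container and can be fetched back over the storage service's REST interface. Below is a minimal sketch in Python, assuming a Swift-style classic storage API with an X-Auth-Token header; the storage URL, account, container, and object names are placeholders, and the opener parameter is injectable purely to keep the function testable.

```python
from urllib.request import Request, urlopen

STORAGE_URL = "https://storage.example.com/v1/myaccount"  # assumed; use the URL from Step 1
CONTAINER = "oic-archives"                                # assumed; the container from Step 1

def fetch_archive(object_name, auth_token, opener=urlopen):
    """Download one archived file from the storage container.
    auth_token is the X-Auth-Token obtained from the storage
    service's authentication endpoint (token shape assumed)."""
    req = Request(f"{STORAGE_URL}/{CONTAINER}/{object_name}",
                  headers={"X-Auth-Token": auth_token})
    with opener(req) as resp:   # urlopen's response is a context manager
        return resp.read()
```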

Step 2: Enable and schedule archive

Navigate to OIC Home – My Tasks – Administration – Archive and Purge – Schedule Instances Archive

•    Enable archive by clicking the checkbox
•    Enter the frequency and time for the archive run to start. In the above example it will run every day at 06:00 UTC
•    Configure the archive content. In the above example Audit Payload and Task History will be archived. Note: as the archive content increases, the size of the archive and the time taken to generate it increase too. Archiving the audit image adds a large overhead and should be avoided as much as possible
•    Enter the maximum time for each archive run. In the above example it is set to 90 mins
•    Enter the failure notification email address
•    The purge retention days can also be updated here. In the above example it is set to 200 days

Step 3: Enable and schedule archive and purge for analytics data

  • Follow the same steps as above to schedule archive and purge for analytics data.
  • The analytics data is usually used for analysis and comparison over a longer duration; hence, the lifecycle of this data is defined independently.

Archiving instances using REST APIs

The out-of-the-box archive utility is sturdy and feature rich, but some users prefer more control over what data is archived and the format of the archive. This is possible by extracting the data using REST APIs and storing it in a preferred format such as JSON, XML, or PDF.
Advantages
•    Archiving can be restricted to instances associated with specific applications
•    End users can extract the relevant data, then groom, aggregate, and summarize it before either creating an archive file or uploading it directly to their analytics or BI database
•    The archive file can be created in any preferred format, such as PDF, XML, or JSON
•    If the extracted data is stored as JSON, it can later be retrieved and displayed using the UI Snippet widgets available as part of OIC, giving it the same look and feel as in OIC
•    Additional dashboards can be created and metrics analyzed to discover trends and patterns

REST APIs available to retrieve data

Processes

  1   Process List     ic/api/process/v1/processes
  2   Process Detail   ic/api/process/v1/processes/<instance ID>
  3   Payload          ic/api/process/v1/processes/<instance ID>/payload
  4   Comments         ic/api/process/v1/processes/<instance ID>/comments
  5   Audit            ic/api/process/v1/processes/<instance ID>/audit
  6   Attachments      ic/api/process/v1/processes/<instance ID>/attachments
  7   Conversation     ic/api/process/v1/processes/<instance ID>/conversation

 

Human Task

  1   Task Details   ic/api/process/v1/tasks/<Task ID>
  2   Payload        ic/api/process/v1/tasks/<Task ID>/payload
  3   Comments       returned along with process comments
  4   Attachments    returned along with process attachments
  5   History        ic/api/process/v1/tasks/<Task ID>/history

 

Sample Steps

  1. Retrieve the list of completed process instances to be archived - GET https://<Instance URL>/ic/api/process/v1/processes?assignment=ALL&processState=COMPLETED
  2. Parse the response received from Step 1 to get the list of process instances to be archived. Repeat the steps below for each instance to be archived.
  3. Get the instance details - GET https://<Instance URL>/ic/api/process/v1/processes/<instanceID>
  4. Get the instance audit details - GET https://<Instance URL>/ic/api/process/v1/processes/<instanceID>/audit
  5. Parse the response to get the list of human tasks and their task IDs, and repeat the steps below for each human task.
  6. Get the task details - GET https://<Instance URL>/ic/api/process/v1/tasks/<taskID>
  7. Get the task payload - GET https://<Instance URL>/ic/api/process/v1/tasks/<taskID>/payload
  8. Get the task history - GET https://<Instance URL>/ic/api/process/v1/tasks/<taskID>/history
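The sample steps above can be sketched as a single Python function. This is only an illustration: the field names in the JSON responses ("items", "processId", and the task list shape inside the audit response) are assumptions to verify against your instance, and get_json stands in for any authenticated HTTP helper (for example, a thin wrapper around requests.get) that returns the parsed JSON body.

```python
BASE = "https://example.oic.ocp.oraclecloud.com/ic/api/process/v1"  # assumed instance URL

def archive_completed_instances(get_json):
    """Walk the sample steps: list COMPLETED instances, then pull
    detail, audit, and human-task data for each one. get_json(url,
    **params) performs an authenticated GET and returns parsed JSON."""
    records = []
    # Steps 1-2: list completed instances and iterate over them
    listing = get_json(f"{BASE}/processes",
                       assignment="ALL", processState="COMPLETED")
    for item in listing.get("items", []):        # "items" key assumed
        iid = item["processId"]                  # ID field name assumed
        record = {
            "detail": get_json(f"{BASE}/processes/{iid}"),        # step 3
            "audit":  get_json(f"{BASE}/processes/{iid}/audit"),  # step 4
            "tasks":  [],
        }
        # Step 5: the audit response is assumed to list the human tasks
        for task in record["audit"].get("tasks", []):
            tid = task["taskId"]                 # ID field name assumed
            record["tasks"].append({
                "detail":  get_json(f"{BASE}/tasks/{tid}"),          # step 6
                "payload": get_json(f"{BASE}/tasks/{tid}/payload"),  # step 7
                "history": get_json(f"{BASE}/tasks/{tid}/history"),  # step 8
            })
        records.append(record)
    return records
```

Each returned record can then be serialized to the preferred archive format (for example, one JSON file per instance) or pushed straight to an analytics store.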

Using archive data

The next article in this series will deep dive into this topic.
