Friday Nov 20, 2009

Parallel Processing in ODI

This post assumes that you have some level of familiarity with ODI. The concepts of Packages, Interfaces, Procedures and Scenarios are used here assuming that you understand them in the context of ODI. If you need more details on these elements, please refer to the ODI Tutorial for a quick introduction, or to the complete ODI documentation for detailed information.

ODI: Parallel Processing

A common question in ODI is how to run processes in parallel. When you look at a typical ODI package, all steps are described in a serial fashion and will be executed in sequence.


However, this same package can parallelize and synchronize processes if needed.


The first piece of the puzzle, if you want to parallelize your executions, is that a package can invoke other packages once they have been compiled into scenarios (the generation of scenarios is described later in this post). You can then have a master package that orchestrates other scenarios. There is no limit on the number of nesting levels, as long as your processes make sense: your master package invokes a secondary package which, in turn, invokes another package...

When you invoke these scenarios, you have two possible execution modes: synchronous and asynchronous.


A synchronous execution will serialize the scenario execution with other steps in the package: ODI executes the scenario, and only after its execution is completed, runs the next step.

An asynchronous execution will only invoke the scenario but will immediately execute the next step in the calling package: the scenario will then run in parallel with the next step. You can use this option to start multiple scenarios concurrently: they will all run in parallel, independently of one another.
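The difference between the two modes can be sketched outside of ODI. The following Python analogy is not ODI code: run_scenario is a hypothetical stand-in for a scenario, and threads stand in for sessions. It only shows where the caller blocks and where it does not.

```python
import threading
import time


def run_scenario(name, log, seconds=0.2):
    """Stand-in for a scenario: log the start, do some work, log the end."""
    log.append(f"{name} start")
    time.sleep(seconds)  # placeholder for the real work
    log.append(f"{name} end")


def run_synchronous(names):
    """Each scenario finishes before the next step begins."""
    log = []
    for name in names:
        run_scenario(name, log)  # the caller blocks here
    log.append("next step")
    return log


def run_asynchronous(names):
    """Each scenario is started, then the caller moves on immediately."""
    log = []
    threads = [threading.Thread(target=run_scenario, args=(n, log)) for n in names]
    for thread in threads:
        thread.start()  # returns right away; the scenario runs in the background
    log.append("next step")
    for thread in threads:
        thread.join()  # only so that this demo can return a complete log
    return log
```

In the synchronous version, "next step" is always logged last. In the asynchronous version, it is logged while the scenarios are still running.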


Once we have started multiple processes in parallel, a common requirement is to synchronize these processes: some steps may run in parallel, but at times we will need all separate threads to be completed before we proceed with a final series of steps. ODI provides a tool for this: OdiWaitForChildSession.


An interesting feature is that as you start your different processes in parallel, they can each be assigned a keyword (this is just one of the parameters you can set when you start a scenario). When you synchronize the processes, you can select which processes will be synchronized based on a selection of keywords.
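To make the keyword-based synchronization more concrete, here is a small Python model of the idea. It is an analogy only: MasterSession, start_async and wait_for_children are hypothetical names, and the sketch assumes that a child is selected when it carries all of the requested keywords.

```python
from concurrent.futures import ThreadPoolExecutor, wait


class MasterSession:
    """Toy model of a parent session that tracks the children it started."""

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=8)
        self._children = []  # list of (keywords, future) pairs

    def start_async(self, work, keywords=()):
        """Start a child in the background and remember its keywords."""
        future = self._pool.submit(work)
        self._children.append((frozenset(keywords), future))
        return future

    def wait_for_children(self, keywords=None):
        """Block until the selected children are done and return their results.

        With no keywords, every child is selected. Otherwise only children
        carrying all of the given keywords are selected (an assumption of
        this sketch).
        """
        wanted = frozenset(keywords or ())
        selected = [future for kws, future in self._children if wanted <= kws]
        wait(selected)
        return [future.result() for future in selected]

    def close(self):
        """Release the worker threads once all work is finished."""
        self._pool.shutdown(wait=True)
```

A master process could start two children tagged DIM and one tagged FACT, wait on DIM only before a dimension-dependent step, and wait on everything before the final steps.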


To add a scenario to your package, simply drag and drop the generated scenario in the package, and edit the execution parameters as needed. In particular, remember to set the execution mode to Asynchronous.

You can generate a scenario from a package, from an interface, or from a procedure. The last two will be more atomic (one interface or one procedure only per execution unit). The typical way to generate a scenario is to right-click on one of these objects and to select Generate Scenario.

The generation of scenarios can also be automated with ODI processes that would invoke the ODI tool OdiGenerateAllScen. The parameters of this tool will let you define which scenarios are being generated automatically.

In all cases, scenarios can be found in the object tree, under the object they were generated from - or in the Operator interface, in the Scenarios tab.

While you are developing your different objects, keep in mind that you can regenerate existing scenarios. This is faster than deleting existing ones only to re-create them with the same version number. To regenerate a scenario, simply right-click on the existing version and select Regenerate...

From an execution perspective, you can specify that the scenario you will execute is version -1 (negative one) to ensure that the latest version number is always the one executed. This is a lot easier than editing the parameters with each new release.
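The effect of requesting version -1 can be illustrated with a short Python sketch. This is an analogy, not ODI's actual lookup logic: resolve_version is a hypothetical helper, and it assumes that versions are numeric strings such as "001" and that the latest one is the highest number.

```python
def resolve_version(requested, available):
    """Return the scenario version to execute.

    requested: a version string, or -1 to ask for the latest version.
    available: the version strings that exist for the scenario.
    """
    if not available:
        raise ValueError("no version of this scenario has been generated")
    if str(requested) == "-1":
        # Assumption of this sketch: versions are numeric strings and
        # the latest version is the one with the highest number.
        return max(available, key=int)
    if str(requested) not in available:
        raise LookupError(f"version {requested} does not exist")
    return str(requested)
```

The caller keeps asking for -1 release after release, while the version actually executed moves forward as new versions are generated.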


You will notice that, as of the version used for this post, ODI does not graphically differentiate between serialized and parallelized executions: all are represented in a serial manner. One way to make parallel executions more visible is to stack up the objects vertically, versus the more natural horizontal layout for serialized objects. (If we have electricians reading this, the layout will be very familiar to them, but this is only a coincidence...)



Scenarios are not the only objects that will allow for parallel (or Asynchronous) execution. If you look at the ODI tool OdiOSCommand, you will notice a Synchronous option that will allow you to define if the external component you are executing will run in parallel with the current process, or if it will be serialized in your process. The same is true for the Data Quality tool OdiDataQuality.
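The same distinction exists in most languages that can launch external programs. As a rough analogy in Python (not ODI code; run_os_command and PYTHON_NOOP are hypothetical names), a synchronous call waits for the external command to finish, while an asynchronous one returns a handle immediately:

```python
import subprocess
import sys

# Hypothetical external command for the demo: a Python process that does nothing.
PYTHON_NOOP = [sys.executable, "-c", "pass"]


def run_os_command(argv, synchronous=True):
    """Run an external command, either blocking or in the background.

    Synchronous: wait for the command and return its exit code.
    Asynchronous: return the process handle at once; the caller decides
    if and when to call wait() on it.
    """
    if synchronous:
        return subprocess.run(argv, check=False).returncode
    return subprocess.Popen(argv)
```

With synchronous=False the caller carries on with its next step while the external command is still running, which is the behavior to expect when the Synchronous option is turned off.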


As you start running more processes in parallel, be ready to see more processes being executed concurrently in the Operator interface. If you are only interested in seeing the master processes, though, the Hierarchy tab will allow you to limit your view to parent processes. Child processes will be listed under the entry Children Sessions under each session.

Likewise, when you access the logs from the web front end, you can view the Parent processes only.


Screenshots were taken using version of ODI. Actual icons and graphical representations may vary with other versions of ODI.

Friday Jun 12, 2009

Using an ODI Procedure to Loop Through a Command

The posts in this series assume that you have some level of familiarity with ODI. The concepts of Procedure, Command and Logical Architecture are used here assuming that you understand them in the context of ODI. If you need more details on these elements, please refer to the ODI Tutorial for a quick introduction, or to the complete ODI documentation for more details.

In this article we will focus on using the Command on Target and Command on Source tabs of a Command in an ODI procedure.

It is not uncommon to have to execute a specific command for each value returned by a select statement. For example, you might want to send an email as part of your integration process to a specific list of users maintained in a database table.
You might also receive several zip files that have to be extracted before being processed by ODI. An ODI procedure can help you loop through a list of files and start the extraction process.

If we want to implement the email example we will need to:
- Reverse-Engineer the database table in ODI.
- Create an ODI procedure.
- Add a Command.
- Use the Command on Source tab to execute a select statement on a table to retrieve all the email addresses.
- Use the Command on Target tab to execute an OdiSendMail process for each email address returned by the select statement.

Step 1. Create the ODI Procedure

Expand your ODI project then expand a folder and right-click on the Procedures node. Select Insert Procedure in the menu.


You can pick any name for the procedure; we will use Send Email to Mailing List Users in this example.
You don't need to modify the other parameters.


Now that the procedure is created, we will add a command.

Step 2. Add a Command

Go to the Details tab and click on the grid button to create a new Command in the procedure.


A new window will appear. You can specify any name for this Command; we will use Email Step in this example.

Step 3. Define a Command on Source

In the Command on Source we want to execute a select statement on a database table to retrieve a list of email addresses.

In this example the email addresses are stored in the MAILING_LIST table in an Oracle schema called STAGING.

The table can be created easily using the following code:

To define the Command on Source implementation, click on the Command on Source tab.


Set the technology to Oracle.
Set the Schema to the logical schema that hosts your MAILING_LIST table.
You don't need to modify the other parameters.
The select statement we want to use is the following:
select EMAIL email_address from STAGING.MAILING_LIST

EMAIL is the column storing the email addresses and email_address is the alias we will use in the Command on Target to refer to them.

As we are following the ODI Best Practices we do not want to hard-code the schema name STAGING in our query. To avoid this we will use the getObjectName substitution method and let ODI complete the table name with the schema name at runtime:
select EMAIL email_address from <%=odiRef.getObjectName("L","MAILING_LIST","D")%>

Refer to the ODI documentation for additional information regarding the substitution methods.
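To see why avoiding the hard-coded schema name matters, consider this simplified Python model of resolving a schema at runtime. It is not how the substitution API is implemented: TOPOLOGY, the LS_STAGING logical schema, the DEV and PROD contexts and the STAGING_DEV physical schema are all hypothetical names chosen for the illustration.

```python
# Hypothetical topology: (logical schema, context) -> physical schema.
TOPOLOGY = {
    ("LS_STAGING", "DEV"): "STAGING_DEV",
    ("LS_STAGING", "PROD"): "STAGING",
}


def qualified_name(table, logical_schema, context, topology=TOPOLOGY):
    """Prefix the table with the physical schema mapped to this context."""
    physical_schema = topology[(logical_schema, context)]
    return f"{physical_schema}.{table}"


def build_query(context):
    """Build the select statement with the schema resolved at runtime."""
    table = qualified_name("MAILING_LIST", "LS_STAGING", context)
    return f"select EMAIL email_address from {table}"
```

The query text stays the same in every environment; only the context chosen at execution time decides which physical schema is read.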

You should now have the following:


We are done with the Command on Source; we will now define the Command on Target.

Step 4. Define a Command on Target

In the Command on Target tab we will use the OdiSendMail tool to send an email to the email addresses retrieved from the command in the Command on Source tab.

To define the Command on Target implementation, click on the Command on Target tab.


Set the technology to Sunopsis API.
You don't need to modify the other parameters.

We will use the following command:
OdiSendMail -MAILHOST= -FROM= "-TO=#email_address" "-SUBJECT=log and bad files" -ATTACH=c:/temp/log.txt
Please find attached the log file...

We are referring to the alias email_address defined in the Command on Source tab prefixed by #: #email_address. This gives us access to the email addresses retrieved by the select statement in the Command on Source.
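The looping behavior can be pictured with a short Python sketch. It is only an analogy of the source and target mechanism, not ODI's implementation: expand_target_command is a hypothetical helper, an in-memory SQLite table stands in for the Oracle table, and the commands are built as text rather than executed.

```python
import sqlite3


def expand_target_command(source_conn, source_sql, target_template):
    """Return one target command per row returned by the source query.

    Every "#alias" placeholder in the template is replaced by the value
    of the column with that alias in the current row.
    """
    cursor = source_conn.execute(source_sql)
    aliases = [column[0] for column in cursor.description]
    commands = []
    for row in cursor:
        command = target_template
        # Replace longer aliases first so that "#email" could never
        # clobber part of "#email_address".
        pairs = sorted(zip(aliases, row), key=lambda pair: -len(pair[0]))
        for alias, value in pairs:
            command = command.replace(f"#{alias}", str(value))
        commands.append(command)
    return commands


def demo():
    """Build the commands for two sample addresses (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table MAILING_LIST (EMAIL text)")
    conn.executemany(
        "insert into MAILING_LIST values (?)",
        [("ann@example.com",), ("bob@example.com",)],
    )
    return expand_target_command(
        conn,
        "select EMAIL email_address from MAILING_LIST",
        'OdiSendMail "-TO=#email_address" "-SUBJECT=log and bad files"',
    )
```

One command is produced per row returned by the select statement, which is why a single Command in the procedure is enough to reach every address in the table.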

You will have to set the MAILHOST and FROM parameters according to your SMTP server settings; otherwise you will not be able to send any emails.
The SUBJECT and ATTACH parameters can be modified as well, as can the text of the message.

Refer to the ODI documentation for additional information regarding OdiSendMail and the other ODI tools.

Note: You can use any other ODI tools or technologies in the Command on Source and the Command on Target.


The procedure is now complete. We can click on Execute to start it and follow its execution in Operator.

All Screenshots were taken using version of ODI. Actual icons and graphical representations may vary with other versions of ODI.

