Parallel Processing in ODI

This post assumes that you have some level of familiarity with ODI. The concepts of Packages, Interfaces, Procedures and Scenarios are used here assuming that you understand them in the context of ODI. If you need more details on these elements, please refer to the ODI Tutorial for a quick introduction, or to the complete ODI documentation for detailed information.

ODI: Parallel Processing

A common question in ODI is how to run processes in parallel. When you look at a typical ODI package, all steps are described in a serial fashion and will be executed in sequence.

ParallelPackageSerial.PNG

However, this same package can parallelize and synchronize processes if needed.

PARALLEL PROCESSES

The first piece of the puzzle, if you want to parallelize your executions, is that a package can invoke other packages once they have been compiled into scenarios (scenario generation is described later in this post). You can then have a master package that orchestrates other scenarios. There is no limit to the number of nesting levels, as long as your processes make sense: your master package invokes a secondary package which, in turn, invokes another package...

When you invoke these scenarios, you have two possible execution modes: synchronous and asynchronous.

ParallelScenario.PNG

A synchronous execution will serialize the scenario execution with other steps in the package: ODI executes the scenario, and only after its execution is completed, runs the next step.

An asynchronous execution will only invoke the scenario but will immediately execute the next step in the calling package: the scenario will then run in parallel with the next step. You can use this option to start multiple scenarios concurrently: they will all run in parallel, independently of one another.
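
The difference between the two modes can be illustrated outside of ODI. The sketch below is only an analogy written in Python: run_scenario is a hypothetical stand-in that sleeps instead of running a real scenario, and the scenario names are made up. The synchronous runner waits for each call to finish before starting the next one, while the asynchronous runner starts everything at once.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_scenario(name, seconds):
    """Hypothetical stand-in for a scenario run: sleeps instead of doing work."""
    time.sleep(seconds)
    return name


def run_synchronously(scenarios):
    """Each call blocks until the scenario ends, like the synchronous mode."""
    return [run_scenario(name, seconds) for name, seconds in scenarios]


def run_asynchronously(scenarios):
    """Every scenario is started right away, like the asynchronous mode."""
    with ThreadPoolExecutor(max_workers=len(scenarios)) as pool:
        futures = [pool.submit(run_scenario, name, seconds)
                   for name, seconds in scenarios]
        # Results are collected in submission order once each run has ended.
        return [future.result() for future in futures]


if __name__ == "__main__":
    scenarios = [("LOAD_A", 0.2), ("LOAD_B", 0.2), ("LOAD_C", 0.2)]
    for runner in (run_synchronously, run_asynchronously):
        started = time.perf_counter()
        runner(scenarios)
        elapsed = time.perf_counter() - started
        print(runner.__name__, round(elapsed, 2), "seconds")
```

With three steps of equal duration, the serialized run takes roughly the sum of the durations, while the parallel run takes roughly the duration of the longest step.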

SYNCHRONIZING PROCESSES

Once we have started multiple processes in parallel, a common requirement is to synchronize these processes: some steps may run in parallel, but at times we will need all separate threads to be completed before we proceed with a final series of steps. ODI provides a tool for this: OdiWaitForChildSession.

ParallelSynchronize.PNG

An interesting feature is that as you start your different processes in parallel, they can each be assigned a keyword (this is just one of the parameters you can set when you start a scenario). When you synchronize the processes, you can select which processes will be synchronized based on a selection of keywords.
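
To make the idea of keyword-based synchronization more concrete, here is a toy model in Python. It is not how ODI implements OdiWaitForChildSession: the MasterSession class, its method names and the matching rule (a child is selected when it carries every requested keyword) are assumptions of this sketch, so check the tool reference for the actual behavior.

```python
import threading
import time


class MasterSession:
    """Toy model of a master package that starts children and waits on them."""

    def __init__(self):
        self._children = []  # list of (keywords, thread) pairs

    def start_child(self, name, keywords, work):
        """Start `work` in the background and tag the child with keywords."""
        thread = threading.Thread(target=work, name=name)
        thread.start()
        self._children.append((set(keywords), thread))

    def wait_for_children(self, keywords=None):
        """Block until the selected children end; no keywords selects all.

        Assumption of this sketch: a child is selected when it carries
        every requested keyword.
        """
        wanted = set(keywords or [])
        waited = []
        for child_keywords, thread in self._children:
            if wanted <= child_keywords:
                thread.join()
                waited.append(thread.name)
        return waited


if __name__ == "__main__":
    master = MasterSession()
    master.start_child("LOAD_CUSTOMERS", ["DIMENSION"], lambda: time.sleep(0.1))
    master.start_child("LOAD_PRODUCTS", ["DIMENSION"], lambda: time.sleep(0.1))
    master.start_child("PURGE_LOGS", ["HOUSEKEEPING"], lambda: time.sleep(0.3))
    # Only the two children tagged DIMENSION are waited for here.
    print(master.wait_for_children(["DIMENSION"]))
    # Wait for everything else before leaving.
    master.wait_for_children()
```

The point of the keywords is visible in the usage part: the master can move on as soon as the children it depends on are done, without waiting for unrelated background work.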

ADDING SCENARIOS TO YOUR PACKAGE FOR PARALLEL PROCESSING

To add a scenario to your package, simply drag and drop the generated scenario in the package, and edit the execution parameters as needed. In particular, remember to set the execution mode to Asynchronous.

You can generate a scenario from a package, from an interface, or from a procedure. The last two will be more atomic (one interface or one procedure only per execution unit). The typical way to generate a scenario is to right-click on one of these objects and to select Generate Scenario.

The generation of scenarios can also be automated with ODI processes that would invoke the ODI tool OdiGenerateAllScen. The parameters of this tool will let you define which scenarios are being generated automatically.

In all cases, scenarios can be found in the object tree, under the object they were generated from - or in the Operator interface, in the Scenarios tab.

While you are developing your different objects, keep in mind that you can regenerate existing scenarios. This is faster than deleting existing ones only to re-create them with the same version number. To regenerate a scenario, simply right-click on the existing version and select Regenerate...

From an execution perspective, you can specify that the scenario you will execute is version -1 (negative one) to ensure that the latest version number is always the one executed. This is a lot easier than editing the parameters with each new release.

DISPLAYING PARALLEL PROCESSING

You will notice that as of 10.1.3.4, ODI does not graphically differentiate between serialized and parallelized executions: all are represented in a serial manner. One way to make parallel executions more visible is to stack up the objects vertically, versus the more natural horizontal layout for serialized objects. (If we have electricians reading this, the layout will be very familiar to them, but this is only a coincidence...)

ParallelPackageStackUp.PNG

OTHER OBJECTS THAN SCENARIOS

Scenarios are not the only objects that will allow for parallel (or Asynchronous) execution. If you look at the ODI tool OdiOSCommand, you will notice a Synchronous option that will allow you to define if the external component you are executing will run in parallel with the current process, or if it will be serialized in your process. The same is true for the Data Quality tool OdiDataQuality.

EXECUTION LOGS

As you start running more processes in parallel, be ready to see more processes being executed concurrently in the Operator interface. If you are only interested in seeing the master processes, though, the Hierarchy tab will allow you to limit your view to parent processes. Child processes will be listed under the entry Child Sessions under each session.

Likewise, when you access the logs from the web front end, you can view the Parent processes only.

Enjoy!

Screenshots were taken using version 10.1.3.5 of ODI. Actual icons and graphical representations may vary with other versions of ODI.

Comments:

Christophe - is there a way to encourage the interfaces to be run in a particular order? At runtime? I'd like to run a group of hundreds of interfaces in parallel, but submit them for execution largest to smallest. Won't know until runtime what the sizes of the underlying tables are. thx- lyn

Posted by Lyn Pratt on August 03, 2010 at 10:16 AM PDT #

The best way to do this is to add some processing that orders the tables by size. I would create an admin table that lists the tables you are interested in, along with the size of each table (the easiest way to get it would be a straight SQL query in an ODI procedure) and the matching interface scenario names.
Rather than graphically listing the interfaces top to bottom, create a separate ODI procedure with a single step:
- In the source tab, put the query that will return the interfaces in the appropriate order. Something like: select interface_name MYSCEN from mytable order by tablecount desc
- In the target tab, use the ODI tool OdiStartScen. For the scenario name, use the alias from the query: OdiStartScen "-SCEN_NAME=#MYSCEN" "-SYNC_MODE=2" ...
Set all the parameters as required. ODI will substitute the names for you as needed.
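
The ordering logic in this reply can be sketched outside of ODI. The Python function below is only an illustration: the list of (scenario name, row count) pairs stands in for the rows of the admin table, the scenario names are made up, and the function merely builds the command text that the procedure step would run, one line per row, largest table first. Only the OdiStartScen parameters already mentioned in this thread are used.

```python
def build_start_commands(tables, version="-1"):
    """Return one asynchronous OdiStartScen command per row, largest first.

    `tables` is a list of (scenario_name, row_count) pairs standing in for
    the rows of the admin table described above (hypothetical data).
    """
    ordered = sorted(tables, key=lambda row: row[1], reverse=True)
    return [
        f'OdiStartScen "-SCEN_NAME={name}" "-SCEN_VERSION={version}" "-SYNC_MODE=2"'
        for name, _row_count in ordered
    ]


if __name__ == "__main__":
    admin_rows = [("LOAD_SMALL", 1200), ("LOAD_HUGE", 48000000), ("LOAD_MEDIUM", 350000)]
    for command in build_start_commands(admin_rows):
        print(command)
```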

Posted by Christophe Dupupet on August 31, 2010 at 01:15 AM PDT #

Hi Christophe, Your post is very similar to my requirement. We have created a procedure that uses the ODI Tools technology to call OdiStartScen, with a variable that holds the dynamic value of the scenario name. When we use the command "OdiStartScen -SCEN_NAME='#PROJ.ETL_SCENARIO_NAME' -SCEN_VERSION=-1" it is unable to find the scenario, even though the variable holds a valid scenario name. Do you know how to pass an ODI project variable to the ODI tool command? Thanks, Mallesh

Posted by guest on May 09, 2011 at 02:25 AM PDT #

Mallesh, Make sure that your variable is declared in the package before it is used in your ODITool. It should definitely work. More details on passing parameters are available here: http://blogs.oracle.com/dataintegration/entry/using_parameters_in_odi_the_dy_1 My best -Christophe

Posted by guest on May 09, 2011 at 02:26 AM PDT #

Hi All,

Here I am calling a package in a particular context. Within that package another package is being called (a kind of hierarchy). In my child package I am trying to use odiRef.getContext in a variable to get the name of the context that was used to call the parent package, but the variable does not get any value.

The query being used is as given below ( in the refreshing tab of the ODI project variable):
select replace(”, ‘”‘, ”) from dual

My requirement is to get the execution context name of the parent package(e.g. PKJ_1) in a child package(PKJ_2). This child PKJ_2 is being called in the PKJ_1 (parent) by placing the scenario of PKJ_2 in PKJ_1.

I have used the given variable in a simple package to get the context name and it works fine there. But when I try to use that variable in the child package instead of the parent, the variable remains empty. I do not want to pass the context name as a parameter from the parent to the child package execution.

Immediate pointers would be of great help to me.

Thanks,
Kalyan

Posted by Kalyan on June 14, 2011 at 02:19 PM PDT #

Kalyan,

Unless you pass the variable as a parameter to the child process, the values set in the parent process will not be accessible to the child process.

Another approach could be to retrieve the context value in the child process: unless you explicitly change the context when executing the child process, parent and child share the same context.

Posted by Christophe Dupupet on June 28, 2011 at 09:02 AM PDT #

I am invoking a Scenario of a package within a package. Is it possible to pass parameters back to the calling package like the session no generated by the invoked Scenario? If not, how can the calling package see the session no of the child process?

Posted by David on February 04, 2012 at 07:23 PM PST #

David,

That is a very good question... Variables are owned by the packages, so you can easily pass a variable from a parent to a child (it is just a parameter in the end), but sending information the other way is a little bit more tricky. I can think of 2 possible approaches here:
1st approach:
- Create a table to store the information you want to share across packages (child to parent, or even across siblings).
- Write the value of the information you want to send back in that table
- in the parent scenario, do a "refresh variable" to retrieve the value from the table.
You can make the table more complete than just a list to make sure that you retrieve the proper value (by adding timestamps, name of the process that writes a session ID, etc).
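
As a rough illustration of this first approach, here is a sketch in Python with an in-memory SQLite database. The table name session_exchange, its columns and the helper functions are all hypothetical: in ODI the write would be a procedure step in the child and the read would be the refresh query of a variable in the parent.

```python
import sqlite3


def create_exchange_table(conn):
    """Create the shared table (hypothetical name and columns)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS session_exchange ("
        " writer TEXT NOT NULL,"
        " session_no INTEGER NOT NULL,"
        " written_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )


def publish_session_no(conn, writer, session_no):
    """Child side: record the value to send back to the parent."""
    conn.execute(
        "INSERT INTO session_exchange (writer, session_no) VALUES (?, ?)",
        (writer, session_no),
    )
    conn.commit()


def latest_session_no(conn, writer):
    """Parent side: what a 'refresh variable' query could return."""
    row = conn.execute(
        "SELECT session_no FROM session_exchange WHERE writer = ?"
        " ORDER BY rowid DESC LIMIT 1",
        (writer,),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_exchange_table(conn)
    publish_session_no(conn, "CHILD_PKG", 1001)
    publish_session_no(conn, "CHILD_PKG", 1002)
    print(latest_session_no(conn, "CHILD_PKG"))
```

Filtering on the writer name and keeping only the most recent row is one way to make sure that the parent picks up the value written by the right child.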

2nd approach:
Instead of using OdiStartScen, take advantage of the fact that all ODI scenarios can be invoked as web services, because this approach gives you access to the session number.
For an example of how you could leverage this approach, look at the script startscenremote.sh that comes with the installation of the agent.
The documentation on how to invoke ODI scenarios as a web service is available here: http://docs.oracle.com/cd/E21764_01/integrate.1111/e12643/running_executions.htm#BABDHJJF (or look for "20.11 Managing Executions Using Web Services" in the "ODI Developer's Guide").

I hope this will help!
-Christophe

Posted by Christophe on February 06, 2012 at 08:29 AM PST #

Christophe,

Thank you very much. We now have 2 options to work with though we are leaning towards the 1st option coz we have already started coding using the ODIstartscen approach. The 2nd option is something we could explore going forward.

Thanks,

David

Posted by David on February 08, 2012 at 07:55 AM PST #

Hi Christophe,

Can we load all the data from a single source file (more than 10 million records) into a single target table with 5 interfaces, so that each interface loads 2 million records into the same target? This is an insert/update process. Please suggest a feasible solution if there is one. I don't have any sequence in my source file. I want to run these 5 interfaces in parallel.
Thank You

Posted by Piyush on May 16, 2012 at 07:57 AM PDT #

Piyush,

The challenge with an insert/update is that the database will serialize the processes. If you have 5 processes, then you basically have 5 connections to the database. When the process running on the first connection is doing updates on that table the database will lock the table... and all other 4 processes are queued. In the end that will not be any faster than waiting for the whole file to be processed.

If you were to do inserts only, then launching 5 interfaces would be possible - depending on which LKM you are using to load the file, you may have to modify the KM to enable a filtering mechanism. Short of a sequence number, you could use any field that has a distribution that can be used for a somewhat even distribution of data.
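
The filtering idea can be sketched as follows. This is only an illustration in Python, not a KM modification: the field name customer_id and the helper names are made up, and a CRC32 checksum is just one possible way to turn an arbitrary field into a repeatable bucket number between 0 and 4. Each of the 5 interfaces would then keep only the records of its own bucket.

```python
import zlib


def bucket_for(key, bucket_count=5):
    """Map a field value to a bucket number in a repeatable way."""
    return zlib.crc32(str(key).encode("utf-8")) % bucket_count


def split_records(records, key_field, bucket_count=5):
    """Distribute records into buckets based on one of their fields."""
    buckets = [[] for _ in range(bucket_count)]
    for record in records:
        buckets[bucket_for(record[key_field], bucket_count)].append(record)
    return buckets


if __name__ == "__main__":
    # Hypothetical records; customer_id plays the role of the distribution field.
    records = [{"customer_id": f"C{i:05d}", "amount": i} for i in range(1000)]
    for number, bucket in enumerate(split_records(records, "customer_id")):
        print("bucket", number, "holds", len(bucket), "records")
```

How even the distribution is depends on the field you choose: a field with only a few distinct values will leave some buckets much larger than others.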

I hope this helps
-Christophe

Posted by Christophe on May 16, 2012 at 12:45 PM PDT #

Running multiple parallel inserts might work if the separate inserts each write to different partitions of a partitioned table, assuming the locking doesn't escalate beyond partition level to the whole table (it all depends on your DBMS, I guess).

Posted by guest on February 06, 2013 at 05:58 AM PST #

Sorry I have a little issue :(. I need your help

Suppose i have package SUMVALUES that use the following variables:
- values1 ( input )
- values2 ( input)
- SUMVALUES

I want to create a package that calculates the sum of the values into SUMVALUES and then inserts the calculated value into a table.

I would like to execute the package in parallel mode using different values.
Is it possible? Is the value of the variable overwritten, or is it local to each execution?

Posted by Alberto on March 20, 2014 at 06:35 AM PDT #

Alberto,

Each session (each parallel package is a separate session) will have its own instance of the variables. The only thing to pay attention to will be the insertion in the target table: if all jobs process the data in parallel, you will have to make sure that the database will allow these operations to be executed in parallel.
Hope this helps
-Christophe

Posted by guest on March 20, 2014 at 10:34 AM PDT #

Thanks but my question is different.
I would like to know, if I run two instances of the same package with different values for the input variables, whether each instance uses its own variable or whether they all share the same one.

Is the variable allocated at run time? Does every session use a different variable with the same name?

Alberto

Posted by Alberto on March 20, 2014 at 11:13 AM PDT #

Yes, each package instantiates the variable independently of the others, so each package will use independent values of the variables.
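
As an analogy only (this is Python, not ODI), the behavior is comparable to local variables in a function that is called from several threads at once: each call has its own values1, values2 and sumvalues, so concurrent runs cannot overwrite one another. The function names below are made up for the illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_values_session(values1, values2):
    """One run of the package: sumvalues exists only inside this call."""
    sumvalues = values1 + values2
    return sumvalues


def run_sessions(inputs):
    """Run one 'session' per pair of inputs, all at the same time."""
    with ThreadPoolExecutor(max_workers=len(inputs)) as pool:
        futures = [pool.submit(sum_values_session, a, b) for a, b in inputs]
        return [future.result() for future in futures]


if __name__ == "__main__":
    print(run_sessions([(1, 2), (10, 20), (100, 200)]))
```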

Posted by guest on March 21, 2014 at 05:49 AM PDT #
