Parallel Processing in ODI
By Christophe Dupupet on Nov 20, 2009
This post assumes that you have some level of familiarity with ODI. The concepts of Packages, Interfaces, Procedures and Scenarios are used here assuming that you understand them in the context of ODI. If you need more details on these elements, please refer to the ODI Tutorial for a quick introduction, or to the complete ODI documentation for detailed information.
ODI: Parallel Processing
A common question in ODI is how to run processes in parallel. When you look at a typical ODI package, all steps are described in a serial fashion and will be executed in sequence.
However, this same package can parallelize and synchronize processes if needed.
The first piece of the puzzle if you want to parallelize your executions is that a package can invoke other packages once they have been compiled into scenarios (the process of generation of scenarios is described later in this post). You can then have a master package that will orchestrate other scenarios. There is no limit as to how many levels of nesting you will have, as long as your processes are making sense: Your master package invokes a seconday package which, in turn invokes another package...
When you invoke these scenarios, you have two possible execution modes: synchronous and asynchronous.
A synchronous execution will serialize the scenario execution with other steps in the package: ODI executes the scenario, and only after its execution is completed, runs the next step.
An asynchronous execution will only invoke the scenario but will immediately execute the next step in the calling package: the scenario will then run in parallel with the next step. You can use this option to start multiple scenarios concurrently: they will all run in parallel, independently of one another.
Once we have started multiple processes in parallel, a common requirement is to synchronize these processes: some steps may run in parallel, but at times we will need all separate threads to be completed before we proceed with a final series of steps. ODI provides a tool for this: OdiWaitForChildSession.
An interesting feature is that as you start your different processes in parallel, they can each be assigned a keyword (this is just one of the parameters you can set when you start a scenario). When you synchronize the processes, you can select which processes will be synchronized based on a selection of keywords.
ADDING SCENARIOS TO YOUR PACKAGE FOR PARALLEL PROCESSING
To add a scenario to your package, simply drag and drop the generated scenario in the package, and edit the execution parameters as needed. In particular, remember to set the execution mode to Asynchronous.
You can generate a scenario from a package, from an interface, or from a procedure. The last two will be more atomic (one interface or one procedure only per execution unit). The typical way to generate a scenario is to right-click on one of these objects and to select Generate Scenario.
The generation of scenarios can also be automated with ODI processes that would invoke the ODI tool OdiGenerateAllScen. The parameters of this tool will let you define which scenarios are being generated automatically.
In all cases, scenarios can be found in the object tree, under the object they were generated from - or in the Operator interface, in the Scenarios tab.
While you are developing your different objects, keep in mind that you can Regenerate existing scenarios. This is faster than deleting existing ones only to re-create them with the same version number. To re-generate a scenario, simply right-click on the existing version and select Regenerate ... .
From an execution perspective, you can specify that the scenario you will execute is version -1 (negative one) to ensure that the latest version number is always the one executed. This is a lot easier than editing the parameters with each new release.
DISPLAYING PARALLEL PROCESSING
You will notice that as of 10.1.3.4, ODI does not graphically differentiate between serialized and parallelized executions: all are represented in a serial manner. One way to make parallel executions more visible is stack up the objects vertically, versus the more natural horizontal execution for serialized objects. (If we have electricians reading this, the layout will be very familiar to them, but this is only a coincidence...)
OTHER OBJECTS THAN SCENARIOS
Scenarios are not the only objects that will allow for parallel (or Asynchronous) execution. If you look at the ODI tool OdiOSCommand, you will notice a Synchronous option that will allow you to define if the external component you are executing will run in parallel with the current process, or if it will be serialized in your process. The same is true for the Data Quality tool OdiDataQuality.
As you will start running more processes in parallel, be ready to see more processes being executed concurrently in the Operator interface. If you are only interested in seing the master processes though, the Hierarchy tab will allow you to limit your view to parent processes. Children processes will be listed under the entry Childres Sessions under each session.
Likewise, when you access the logs from the web front end, you can view the Parent processes only.
Screenshots were taken using version 10.1.3.5 of ODI. Actual icons and graphical representations may vary with other versions of ODI.