November 20, 2009

Parallel Processing in ODI

This post assumes that you have some level of familiarity with ODI. The concepts of Packages, Interfaces, Procedures and Scenarios are used here assuming that you understand them in the context of ODI. If you need more details on these elements, please refer to the ODI Tutorial for a quick introduction, or to the complete ODI documentation for detailed information.

ODI: Parallel Processing

A common question in ODI is how to run processes in parallel. When you look at a typical ODI package, all steps are described in a serial fashion and will be executed in sequence.

ParallelPackageSerial.PNG

However, this same package can parallelize and synchronize processes if needed.

PARALLEL PROCESSES

The first piece of the puzzle if you want to parallelize your executions is that a package can invoke other packages once they have been compiled into scenarios (the process of generation of scenarios is described later in this post). You can then have a master package that will orchestrate other scenarios. There is no limit as to how many levels of nesting you will have, as long as your processes are making sense: Your master package invokes a seconday package which, in turn invokes another package...

When you invoke these scenarios, you have two possible execution modes: synchronous and asynchronous.

ParallelScenario.PNG

A synchronous execution will serialize the scenario execution with other steps in the package: ODI executes the scenario, and only after its execution is completed, runs the next step.

An asynchronous execution will only invoke the scenario but will immediately execute the next step in the calling package: the scenario will then run in parallel with the next step. You can use this option to start multiple scenarios concurrently: they will all run in parallel, independently of one another.

SYNCHRONIZING PROCESSES

Once we have started multiple processes in parallel, a common requirement is to synchronize these processes: some steps may run in parallel, but at times we will need all separate threads to be completed before we proceed with a final series of steps. ODI provides a tool for this: OdiWaitForChildSession.

ParallelSynchronize.PNG

An interesting feature is that as you start your different processes in parallel, they can each be assigned a keyword (this is just one of the parameters you can set when you start a scenario). When you synchronize the processes, you can select which processes will be synchronized based on a selection of keywords.

ADDING SCENARIOS TO YOUR PACKAGE FOR PARALLEL PROCESSING

To add a scenario to your package, simply drag and drop the generated scenario in the package, and edit the execution parameters as needed. In particular, remember to set the execution mode to Asynchronous.

You can generate a scenario from a package, from an interface, or from a procedure. The last two will be more atomic (one interface or one procedure only per execution unit). The typical way to generate a scenario is to right-click on one of these objects and to select Generate Scenario.

The generation of scenarios can also be automated with ODI processes that would invoke the ODI tool OdiGenerateAllScen. The parameters of this tool will let you define which scenarios are being generated automatically.

In all cases, scenarios can be found in the object tree, under the object they were generated from - or in the Operator interface, in the Scenarios tab.

While you are developing your different objects, keep in mind that you can Regenerate existing scenarios. This is faster than deleting existing ones only to re-create them with the same version number. To re-generate a scenario, simply right-click on the existing version and select Regenerate ... .

From an execution perspective, you can specify that the scenario you will execute is version -1 (negative one) to ensure that the latest version number is always the one executed. This is a lot easier than editing the parameters with each new release.

DISPLAYING PARALLEL PROCESSING

You will notice that as of 10.1.3.4, ODI does not graphically differentiate between serialized and parallelized executions: all are represented in a serial manner. One way to make parallel executions more visible is stack up the objects vertically, versus the more natural horizontal execution for serialized objects. (If we have electricians reading this, the layout will be very familiar to them, but this is only a coincidence...)

ParallelPackageStackUp.PNG

OTHER OBJECTS THAN SCENARIOS

Scenarios are not the only objects that will allow for parallel (or Asynchronous) execution. If you look at the ODI tool OdiOSCommand, you will notice a Synchronous option that will allow you to define if the external component you are executing will run in parallel with the current process, or if it will be serialized in your process. The same is true for the Data Quality tool OdiDataQuality.

EXECUTION LOGS

As you will start running more processes in parallel, be ready to see more processes being executed concurrently in the Operator interface. If you are only interested in seing the master processes though, the Hierarchy tab will allow you to limit your view to parent processes. Children processes will be listed under the entry Childres Sessions under each session.

Likewise, when you access the logs from the web front end, you can view the Parent processes only.

Enjoy!

Screenshots were taken using version 10.1.3.5 of ODI. Actual icons and graphical representations may vary with other versions of ODI.

November 10, 2009

Oracle GoldenGate Downloads now Available on OTN

If you listened to our webcast today: Introducing Oracle GoldenGate, Real-time Data Integration and Continuous Availability with Hasan Rizvi and Juan Loaiza, you heard us mention about some of the key performance, heterogeneity and reliability benefits.

Now you can download the Oracle GoldenGate software here on our website and check it out for yourself. We've included some of the common platforms and Oracle sources and targets to get you started, but as we pointed out in our I-seminar we have support for many other sources and target environments:

Click here for more information about Oracle GoldenGate.

November 9, 2009

The Benefits of ODI Knowledge Modules: a Template Approach for Better Data Integration

This post assumes that you have some level of familiarity with ODI. The concepts of Knowledge Module are used here assuming that you understand them in the context of ODI. If you need more details on these elements, please refer to the ODI Tutorial for a quick introduction, or to the complete ODI documentation for detailed information..

At the core, ODI knowledge modules are templates of code that will be leveraged for data integration tasks: they pre-define data integration steps that are required to extract, load, stage - if needed - and integrate data.

Several types of Knowledge Modules are available, and are grouped in families for Loading operations (LKMs), Integration operations (IKMs), Data Control operations (CKMs), and more.

For a more detailed description of what a knowledge module is, simply picture the multiple steps required to load data from a flat file into a database. You can connect to the file using JDBC, or leverage the database native loading utility (sqlldr for Oracle, bcp for SQL Server or Sybase, load for db2, etc.). External tables are another alternative for databases that support this feature.
As you use one or the other technique, you may first want to stage the data before loading your actual target table; in other cases, staging will only slow down your data load.

As far as the integration in the target system is concerned, again multiple strategies are available: simple inserts, inserts and updates, upserts, slowly changing dimension... these techniques may be as simple as one step, or be a complex series of commands that must be issued to your database for proper execution.

The Knowledge Modules will basically list these steps so that a developer who needs to repeat the same integration pattern only has to select the appropriate templates, versus re-developing the same logic over and over again.

The immediate benefits of this approach are well known and well documented:
- All developers use the same approach, and code development is consistent across the company, hence guarantying the quality of the code
- Productivity is greatly improved, as proven path are re-used versus being re-developed
- Code improvement and modification can be centralized and has a much broader impact: optimization and regulatory changes are done once and inherited by all processes
- Maintenance is greatly simplified

To fully appreciate all the benefits of using knowledge Modules, there is a lot more that needs to be exposed and understood about the technology. This post is a modest attempt at addressing this need.

GENERATION OF CODE AND TRANSFORMATIONS

Most tools today will offer the ability to generate SQL code (or some other type of code, such as scripts) on your source or target system. As most products come with a transformation engine, they will also generate proprietary code for this engine where data is staged (I'll skip the debate here as to whether a transformation engine is a staging area or not - the point being that code can be generated on either source, "middle-tier" or target).

However, real life requirements are rarely either/or. Often times, it makes sense to leverage all systems to optimize the processing: spread out the load for the transformations, reduce the amount of data to be transferred over the network, process the data where it is versus moving the data around solely for the purpose of transformations.

To achieve this, Data Integration tools must be able to distribute the transformation logic across the different systems.

KMCodeExecution.PNG

Only ODI will effectively generate code and transformations on all systems. This feature is only possible thanks to the KM technology.

Beyond the ability to generate code, you have to make sure that the generated code is the best possible code for the selected technology. Too often, tools first generate code that is then translated for the appropriate database. With the KMs technology, no translation is required: the generated code was initially conceived explicitly for a given technology, hence taking advantage of all the specifics of this technology.

And since the KMs are technology specific, there is no limit to what can be leveraged on the databases, including user defined functions or stored procedures.

KMCodeGeneration.PNG

CODE ADAPTABILITY

Whenever a tool generates code, the most common complaint is that there is very little (if any) control over the generated result. What if a simple modification of the code could provide dramatic performance improvements? Basic examples would include index management, statistics generation, joins management, and a lot more.

The KM technology is open and expansible so that developers have complete control over the code generation process. Beyond the ability to optimize the code, they can extend their solution to define and enforce in house best practices, and comply with corporate, industry or regulatory requirements. KMs Modifications are done directly from the developers graphical interface.

One point that can easily be adapted is whether data have to be materialized throughout the integration process. Some out-of-the-box KMs will explicitly land data in a physical file or tables. Others will avoid I/Os by leveraging pipes instead of files, views and synonyms instead of tables. Again, developers can adapt the behavior to their actual requirements.

EXPANDING THE TOOL TO NEW TECHNOLOGIES

How much time does it take to adapt your code to a new release of your database? How much time does it take to add a new technology altogether? In both cases, KMs will provide a quick and easy answer.

Let us start with the case of a new version of the database. While our engineering teams will release new KMs as quickly as possible to take advantage of the latest releases of any new database, you do not have to wait for them. A new release typically means new parameters for your DDL and DML, as well as new functions for your existing transformations. Adapt the existing KMs with the features you need, and in minutes your code is ready to leverage the latest and greatest of your database.

Likewise, if you ever need to define a new technology that would not be listed by ODI (in spite of the already extensive list we provide), simply define the behavior of this technology in the Topology interface, and design technology specific KMs to take advantage of the specific features of this database. I can guaranty you that 80% of the code you need (at least!) is already available in an existing KM... Thus dramatically reducing the amount of effort required to generate code for your own technology.

ARE KM MODIFICATIONS REQUIRED?

I am a strong advocate of the customization of KMs: I like to get the best I can out of what I am given. But often times, good enough is more than enough. I will always remember trying to optimize performance for a customer: we did not know initially what our processing window would be - other than "give us your best possible performance". The first out-of-the-box KM we tried processed the required 30,000,000 records in 20 minutes. Due to IT limitations, we could only leverage lesser systems for faster KMs... but still reduced performance to 6 minutes for the same volume of data. We started modifying KMs to get even better results, when the customer admitted that we actually had 3 hour for the process to complete... At this point, spending time in KM modifications was clearly not needed anymore.

KMs are meant to give the best possible performance out of the box. But every environment is unique, and assuming that we can have the best possible code for you before knowing your own specific challenges would be an illusion - hence the ability to push the product and the code to the limit

Another common question is: do you have to leverage both source and target systems as part of your transformations? Clearly, the answer is no. But in most cases, it is crucial to have the flexibility to leverage all systems, versus being cornered in using only one of them. Over time, you will want to reduce the volume of data transferred over the network; you will want to distribute some of your processing... all more reasons to leverage all available engines in your environment.

Do not hesitate and share with us how you extend your KMs!

Screenshots were taken using version 10.1.3.5 of ODI. Actual icons and graphical representations may vary with other versions of ODI.

November 2, 2009

Live Webcast: How Your Business Can Profit from Oracle GoldenGate

gg_header.jpg

Through its recent acquisition of GoldenGate Software, Oracle has accelerated its product strategy to deliver a comprehensive data integration platform and increase uptime for customers' mission-critical systems. But how can your business profit from Oracle GoldenGate products - and how do they complement your existing Oracle investments?

Join us for this live Webcast hosted by Hasan Rizvi, Senior Vice President for Oracle Fusion Middleware Product Development and Juan Loaiza, Senior Vice President Systems Technology, to get the answers!

Register now for this event

You'll learn how you can apply Oracle GoldenGate's real-time data integration and heterogeneous data replication capabilities to:

  • Improve business insight through real-time integrations to business intelligence and data warehousing
  • Maximize data availability through multi-master database configurations
  • Synchronize and distribute data using bi-directional replication
  • Upgrade databases, hardware and applications with zero downtime
  • Replicate data between Oracle and non-Oracle systems

To find out more about Oracle and GoldenGate goto: http://www.oracle.com/goldengate

October 15, 2009

Day 4: Hasta la Vista Oracle Open World

I hope everyone enjoyed the blockbuster keynote with Arnold and Larry - my favorite quote of the night was from Gov. Shwarzeneggar in speaking how much he loved the tech industry:

"I love tech. I love all kinds of tech... and not just high tech but bio tech, green tech, clean tech. The tech industry has made me what I am today... I wouldn't have made my start in acting career without help from body building technology and of course food supplement technology!"

Ellison delivered a compelling keynote as well - especially interesting was his 10 million dollar challenge for anyone that could double the speed of Exadata.

So what about data integration? Well yesterday we had an action-packed day. Thanks to Rittman Mead for their Oracle Un-conference on Oracle Warehouse Builder, and special thanks to Paul Longhurst at Overstock.com for delivering an innovative best practice presentation on how to make Data Warehouses more active, more real-time and more efficient.

Today there's a session which I'm delivering with Haidong Song on Information Management: Combining the power of ODI and MDM together. I invite you to come to our session at

9AM in Moscone South, Room 304

October 14, 2009

Day 3 Oracle OpenWorld: the Innovator meets Terminator

For all those geeks like me who idolize both the innovators and the terminators in the world, today is your day. Day 3 promises to keynote both Larry Ellison and Gov. Schwarzenegger. And if you're like me - and I know you are - you've memorized all the one-liners from Commando and stapled them into your pocket protector, or I should say you've archived them on your iPhone and created ring tones out of them.

What else to do at OpenWorld today?

Don't forget to spend time at the Oracle GoldenGate and the Oracle Data Integration booths in Moscone West . And check out Overstock.com session which promises to be a good one on the how Overstock.com used both Oracle GoldenGate and Oracle Data Integration. Moscone West L3 in Room 3014

What else is up today? Well I'm definitely looking forward today to meet up with some friends and colleagues and discuss and share what we've been up to this year.

Don't forget to follow me on twitter: "Dainsworld" and check out my own #oow09 one-liners

October 13, 2009

Day 2 OpenWorld: 1 Umbrella. 2 Espressos. And the Innovation Across the Stack

With a 2 hour commute through what appears to be horizontal rain, I'm enjoying my double espresso in preparation for today's excitement. Stay dry and try not to miss the important sessions below:

Check out Thomas Kurian talk about the "Innovation across the stack" today in the keynote and be sure not to miss the most important session of the day for the Data Integration crew:

Oracle GoldenGate & Oracle Streams: The Future of Oracle Replication and Data Integration

Panel With Hasan Rizvi and Juan Loaiza, Sachin Chawla, and Ali Kutay
Tuesday, 2:30 - 3:30 PM
Moscone South Room 270


And be sure not to miss these sessions today


  • 11:30a Using Oracle GoldenGate for Zero Down time with Sabre, Marriott Hotel GoldenGate B1

  • 11:30a Enterprise Data Services in the Cloud with Oracle Fusion Middleware, Marriott Hotel Salon 7

  • 1:00p Oracle GoldenGate Deep Dive Architecture for Real-time, Marriott Hotel

  • 1:00p Deep Dive for Real-world Use cases with SOA and Data Integration, Hilton Hotel GoldenGate 3

  • 4:00p BPM and Data Integration combined, Marriott Hotel Nob Hill CD

October 12, 2009

Live from Oracle Open World: "Art of the Possible" ... and your must see on Day 1

I'm sitting here on one of just a handful of the Oracle Open World official bean bags waiting for the Charles Phillips keynote, "Art of the Possible". I thought I would send you what I think some important noteworthy sessions of the day:

  • 8:30 AM - Charles Phillips - Art of the possible
  • 11:30 AM - Introducing Oracle GoldenGate products, strategy discussion: Marriott Hotel Golden Gate B1
  • 1:00 PM - Real-time BI using Oracle Data Integration and Oracle GoldenGate, Marriott Hotel Golden Gate B1
  • 1:00 PM - Event-Driven BI from Heterogeneous Datasources with Oracle Data Integrator Marriott Hotel Salon 1
  • 2:30 PM - Oracle Fusion Middleware 11g--Foundation for Innovation Moscone North Hall D
  • 4:00 PM - RIM Case Study: Using Oracle GoldenGate Data Replication for High Availability, Marriott Hotel Golden Gate B1
  • 5:30 PM - Integrate Oracle BI Solutions, Using Oracle Data Integrator Enterprise Edition - Moscone South Room 307
And don't forget two hands on labs for those of you that like to get your hands dirty

11:30 AM and 1:00 PM for Oracle Data Integrator Enterprise Edition at Marriott Hotel Golden Gate A2

Useful links:

Find it all here: Oracle OpenWorld home page

To keep up to date on the Data Integration sessions at Oracle Open World feel free to follow me on twitter: "dainsworld"

See you at the show!


October 9, 2009

Did You Know that ODI Automatically Summarizes Data Errors?

Looking for Data Integration at OpenWorld 2009? Click here!

This post assumes that you have some level of familiarity with ODI. The concepts of Interface, Flow and Static Control, as well as Knowledge Module are used here assuming that you understand them in the context of ODI. If you need more details on these elements, please refer to the ODI Tutorial for a quick introduction, or to the complete ODI documentation for detailed information..


TABLES GENERATED BY ODI TO IDENTIFY ERRORS

If you take advantage of either Flow Control or Static Control in your interfaces, you know that ODI will automatically trap errors for you as you run your interfaces.

When you select the Controls tab of your interface, where you will decide which Knowledge Module will be used to identify the errors, you have an option to drop the Error table and another one to drop the Check table. Have you ever wondered what these are?

Interface Controls Tab

The Error table is the table that will be created by ODI to store all errors trapped by the FLOW_CONTROL and STATIC_CONTROL of your interface. You have probably already used the error table. This table is structured after your target table, along with administrative information needed to re-cycle or re-process the invalid records. It is loaded by ODI with all records that fail to pass the validation of the rules defined on your Target table. This feature is often referred to as a Data Quality Firewall as only the "good" data will make it to the target table.

Once all errors have been identified for a given interface, ODI will summarize them into another table: the Check table. There will be only one such table per data server: all target tables in the server (irrespectively of their schema) will share the same summary table. The name of this table is defined by default by the CKMs as SNP_CHECK_TAB.


LOCATION OF THE CHECK TABLE

You will find the check table in the default work schema of your server. To locate this schema, you have to go back to topology, in the Physical Architecture tab. Expand your data server to list the different physical schemas. One of the schemas is your default schema and will be identified by a checkmark on the schema icon (see SALES_DWH in the example below).

Default Schema
When you edit the schema, it has an associated work schema. The work schema associated to your default schema is your default work schema: ODI_TMP is the following example.

Default Work Schema

Note that you can change your default schema by selecting/unselecting the default option in the schema definition. But remember that you will always need exactly one default schema for each server.

Now that we know where to find this table, let's look at its structure:

  • CATALOG_NAME, SCHEMA_NAME: location of the table that was being loaded (i.e. the target table)

  • RESOURCE_NAME, FULL_RES_NAME: name of the table that was being loaded

  • ERR_TYPE: type of control that was performed (Flow Control or Static Control)

  • ERR_MESS: plain English error message associated with the error

  • CHECK_DATE: date and time of the control

  • ORIGIN: name of the ODI process that identified the errors

  • CONS_NAME: name of the constraint (as defined in the ODI Models) that defines the rule that the record violated

  • CONS_TYPE: type of error (duplicate primary key, invalid reference, conditional check failed, Null Value)

  • ERR_COUNT: number of records identified by the process that failed to pass that specific control rule.

ErrorsSummaryStructure.PNG

A sample of the data available in that summary table is show below (we split the content in 2 screenshots to make this more readable - this is one and only one table):

Errors Summary Data

Errors Summary Data2

There are many possible uses for this table: decision making in your ODI processes based on the number of errors identified or the type of errors identified, basic reporting on errors trapped by ODI, trend analysis or the evolution of errors over time...

Do not hesitate and share with us how you leverage this table!

Screenshots were taken using version 10.1.3.5 of ODI. Actual icons and graphical representations may vary with other versions of ODI.

October 8, 2009

Connecting to Oracle Business Intelligence EE using Oracle Data Integrator


Looking for Data Integration at OpenWorld 2009? Click here!

The posts in this series assume that you have some level of familiarity with ODI. The concepts of Topology, Data Server, Physical and Logical Architecture are used here assuming that you understand them in the context of ODI. If you need more details on these elements, please refer to the ODI Tutorial for a quick introduction, or to the complete ODI documentation for more details.

In this post I will describe how to create a connection to an Oracle Business Intelligence (OBI) Server.

Adding the Oracle BI Server JDBC driver to ODI

Before we can connect to Oracle BI Server we need to add its JDBC driver to the ODI_HOME\oracledi\drivers directory.
You will find the Oracle BI JDBC driver (bijdbc.jar) in your OBI EE installation directory -> OBI_HOME\jdbc. You can simply copy it and then paste it into the ODI drivers directory.
Note: Make sure you use the JDBC driver coming with OBI EE 10.1.3.4.1, I've had issues with previous releases of the driver. If you are getting an error like 'No Suitable Driver' you are not using the right driver.

Creating the Oracle BI Server Technology in Topology Manager

The Oracle BI Server technology does not exist in ODI 10g, while we could use the Oracle technology it is cleaner to create a new technology.
Instead of creating it from scratch I recommend to duplicate the Oracle technology.

Open Topology Manager and locate the Oracle technology in the Physical Architecture tree.
Right-click on Oracle and select Duplicate.
Duplicate_Technology.png
A new technology called Copy of Oracle should now appear in the list.
Double-click on it to edit it.
Change its name to Oracle BI Server and its code to ORACLE_BI_SERVER.
BI_Server_Techno_1.png
Under Naming Rules check the Using "Catalog"' checkbox and enter Catalog in the Used Term field.
Remove Schema in the Used Term field and uncheck the Using "Schema" checkbox.
BI_Server_Techno_2.png
In the Local Object Mask and Remote Object Mask fields enter "%CATALOG"."%OBJECT".

BI_Server_Techno_3.png
Go to the Language tab and empty the Object Delimiter field.

BI_Server_Techno_4.png

Creating a Data Server for the Oracle BI Server Technology

Now that we have created the OBI Server technology we can create a data server to connect to an OBI Server instance.
Right-click on the Oracle BI Server technology and select Insert Data Server.

BIServer_Insert_DataServer.png

In the Data Server window, enter any name in the Name field.
Enter your OBI Server username and password in the User and Password fields.

BI_Server_DataServer.png

Click on the JDBC tab.

Enter the JDBC Driver Name and JDBC URL in their respective fields:
- JDBC Driver: oracle.bi.jdbc.AnaJdbcDriver
- JDBC URL: jdbc:oraclebi://localhost:9703/ (modify it according to your environment settings)

BI_Server_DataServer_2.png

Note:
The OBI Server JDBC Driver is not in the Driver list, you need to type the Driver name in the JDBC Driver field.
Make sure that there are no spaces at the end or at the beginning of each string.
The JDBC URL needs to end with the '/' character otherwise you'll receive an error while trying to connect.

When you are done click on the Properties tab. We need to add 2 properties.

Click on the Insert button to add a Key.
Enter Catalog in the Key field and the name of the OBI Catalog you would like to access in the Value field.
In this example I used a catalog called Paint.

Click on the Insert button to add a second Key.
Enter NQ_SESSION.SELECTPHYSICAL in the Key field and yes in the Value field.

BI_Server_DataServer_3.png

Click on Test to validate the connection information.
Click OK to close the Successful Connection window.

Click OK to save and close the Data Server window.

A Physical Schema window should pop up.
In the Catalog (Catalog) list select the Catalog you would like to access.
In the Catalog (Work Catalog) list select the same Catalog.

BI_Server_PSchema_1.png

Verify that the Local Object Mask and Remote Object Mask fields are set to "%CATALOG"."%OBJECT".
Click on the Context tab then click on the Insert button to add a row.

Select a context in the Context column and select a Logical Schema or type a new Logical Schema name in the Logical Schema column. I used BIEE_LSCHEMA in this example.

BI_Server_PSchema_2.png

Click OK to save and close the Physical Schema window.

We now have successfully created the infrastructure we need to use OBI Server in our ODI processes.

In another post I will describe how to use OBI EE as a source in an ODI interface.

About

The world of Data Integration: Data warehousing, ELT/ETL, Data Quality, Data Governance, Master Data Management, Business Intelligence, Data Services, Data Federation, and much much more...

Top Tags

Categories

Blogroll

Contributors

Dain Hansen
Christophe Dupupet
Jeff Pollock
Powered by
Movable Type and Oracle