Monday Nov 09, 2009

The Benefits of ODI Knowledge Modules: a Template Approach for Better Data Integration

This post assumes that you have some level of familiarity with ODI. The concepts of Knowledge Module are used here assuming that you understand them in the context of ODI. If you need more details on these elements, please refer to the ODI Tutorial for a quick introduction, or to the complete ODI documentation for detailed information..

At the core, ODI knowledge modules are templates of code that will be leveraged for data integration tasks: they pre-define data integration steps that are required to extract, load, stage - if needed - and integrate data.

Several types of Knowledge Modules are available, and are grouped in families for Loading operations (LKMs), Integration operations (IKMs), Data Control operations (CKMs), and more.

For a more detailed description of what a knowledge module is, simply picture the multiple steps required to load data from a flat file into a database. You can connect to the file using JDBC, or leverage the database native loading utility (sqlldr for Oracle, bcp for SQL Server or Sybase, load for db2, etc.). External tables are another alternative for databases that support this feature.
As you use one or the other technique, you may first want to stage the data before loading your actual target table; in other cases, staging will only slow down your data load.

As far as the integration in the target system is concerned, again multiple strategies are available: simple inserts, inserts and updates, upserts, slowly changing dimension... these techniques may be as simple as one step, or be a complex series of commands that must be issued to your database for proper execution.

The Knowledge Modules will basically list these steps so that a developer who needs to repeat the same integration pattern only has to select the appropriate templates, versus re-developing the same logic over and over again.

The immediate benefits of this approach are well known and well documented:
- All developers use the same approach, and code development is consistent across the company, hence guarantying the quality of the code
- Productivity is greatly improved, as proven path are re-used versus being re-developed
- Code improvement and modification can be centralized and has a much broader impact: optimization and regulatory changes are done once and inherited by all processes
- Maintenance is greatly simplified

To fully appreciate all the benefits of using knowledge Modules, there is a lot more that needs to be exposed and understood about the technology. This post is a modest attempt at addressing this need.


Most tools today will offer the ability to generate SQL code (or some other type of code, such as scripts) on your source or target system. As most products come with a transformation engine, they will also generate proprietary code for this engine where data is staged (I'll skip the debate here as to whether a transformation engine is a staging area or not - the point being that code can be generated on either source, "middle-tier" or target).

However, real life requirements are rarely either/or. Often times, it makes sense to leverage all systems to optimize the processing: spread out the load for the transformations, reduce the amount of data to be transferred over the network, process the data where it is versus moving the data around solely for the purpose of transformations.

To achieve this, Data Integration tools must be able to distribute the transformation logic across the different systems.


Only ODI will effectively generate code and transformations on all systems. This feature is only possible thanks to the KM technology.

Beyond the ability to generate code, you have to make sure that the generated code is the best possible code for the selected technology. Too often, tools first generate code that is then translated for the appropriate database. With the KMs technology, no translation is required: the generated code was initially conceived explicitly for a given technology, hence taking advantage of all the specifics of this technology.

And since the KMs are technology specific, there is no limit to what can be leveraged on the databases, including user defined functions or stored procedures.



Whenever a tool generates code, the most common complaint is that there is very little (if any) control over the generated result. What if a simple modification of the code could provide dramatic performance improvements? Basic examples would include index management, statistics generation, joins management, and a lot more.

The KM technology is open and expansible so that developers have complete control over the code generation process. Beyond the ability to optimize the code, they can extend their solution to define and enforce in house best practices, and comply with corporate, industry or regulatory requirements. KMs Modifications are done directly from the developers graphical interface.

One point that can easily be adapted is whether data have to be materialized throughout the integration process. Some out-of-the-box KMs will explicitly land data in a physical file or tables. Others will avoid I/Os by leveraging pipes instead of files, views and synonyms instead of tables. Again, developers can adapt the behavior to their actual requirements.


How much time does it take to adapt your code to a new release of your database? How much time does it take to add a new technology altogether? In both cases, KMs will provide a quick and easy answer.

Let us start with the case of a new version of the database. While our engineering teams will release new KMs as quickly as possible to take advantage of the latest releases of any new database, you do not have to wait for them. A new release typically means new parameters for your DDL and DML, as well as new functions for your existing transformations. Adapt the existing KMs with the features you need, and in minutes your code is ready to leverage the latest and greatest of your database.

Likewise, if you ever need to define a new technology that would not be listed by ODI (in spite of the already extensive list we provide), simply define the behavior of this technology in the Topology interface, and design technology specific KMs to take advantage of the specific features of this database. I can guaranty you that 80% of the code you need (at least!) is already available in an existing KM... Thus dramatically reducing the amount of effort required to generate code for your own technology.


I am a strong advocate of the customization of KMs: I like to get the best I can out of what I am given. But often times, good enough is more than enough. I will always remember trying to optimize performance for a customer: we did not know initially what our processing window would be - other than "give us your best possible performance". The first out-of-the-box KM we tried processed the required 30,000,000 records in 20 minutes. Due to IT limitations, we could only leverage lesser systems for faster KMs... but still reduced performance to 6 minutes for the same volume of data. We started modifying KMs to get even better results, when the customer admitted that we actually had 3 hour for the process to complete... At this point, spending time in KM modifications was clearly not needed anymore.

KMs are meant to give the best possible performance out of the box. But every environment is unique, and assuming that we can have the best possible code for you before knowing your own specific challenges would be an illusion - hence the ability to push the product and the code to the limit

Another common question is: do you have to leverage both source and target systems as part of your transformations? Clearly, the answer is no. But in most cases, it is crucial to have the flexibility to leverage all systems, versus being cornered in using only one of them. Over time, you will want to reduce the volume of data transferred over the network; you will want to distribute some of your processing... all more reasons to leverage all available engines in your environment.

Do not hesitate and share with us how you extend your KMs!

Screenshots were taken using version of ODI. Actual icons and graphical representations may vary with other versions of ODI.


Thursday Oct 01, 2009

Creating a New Knowledge Module for Sample Data Sets Generation

Looking for Data Integration at OpenWorld 2009? Click here!

The posts in this series assume that you have some level of familiarity with ODI. The concepts of Interface and Knowledge Module are used here assuming that you understand them in the context of ODI. If you need more details on these elements, please refer to the ODI Tutorial for a quick introduction, or to the complete ODI documentation for detailed information..
In particular, to learn more on Knowledge Modules, I strongly recommend the Knowledge Module Developer's Guide - Fundamentals that comes with the product. You will have to download and install ODI to access this document in the Documentation Library.

This post will look into "when" and "how" to create a knowledge module. Then it will walk through some of the choices that can be made when designing a Knowledge Module.

To illustrate the descriptions, we are working on an example described previously in this post.


The first element to look into is what parts of the logic of your code are reusable. What you are typically looking for are the following:

  • Sequences of steps that are repeated commonly, even though some steps may be optional. For instance: creation of a staging table, creation of a script or parameter file for a utility, invoking an external program, extraction of data from a database, etc.
  • For each step, identification of the non variable parts vs. the variable parts. For instance, in a select statement, the body of the code remains the same. In the following example, the elements in brackets are variables, the others are fixed:
    • Insert into [TableName] ([ListOfColumns]) select ([ListOfColumns and Expressions]) from [List of Tables] where [conditions]

  • For your module to be re-usable, you want to make sure that no information that physically relates your code to any system or table structure is left out of the code. The idea behind the KMs is that as developers will build their transformations and mappings, ODI will "fill in the blanks" with the appropriate data.

The easiest way to get started with a knowledge module is actually to take an existing one and modify it. As the syntax has already been validated in existing KMs, the amount of work required to produce valid code will be greatly reduced.
In most cases, column names, mapping expressions do not belong to a knowledge module. The exception would be administrative columns that you add as part of the logic of your KM. For instance, most Incremental Update knowledge modules that ship with ODI create an IND_UPDATE column to differentiate records that will be updated from those that will be inserted. These columns definitely belong in the code of the KM.


Likewise, you may want to create your own tables (administrative tables, audit tables, etc.) with a very static name. These can be created by the Knowledge Module. But in general, it is better to dynamically generate the table name after the table being loaded, to prevent multiple processes running in parallel from trying to use the same intermediate table.


Any technique used to extract data out of a database (or file, or messaging system, or web service for that matter) can be a good opportunity to create a new KM. The same is true for loading techniques and integration techniques: inserts, updates, slowly changing dimension, etc.

In the scenario that we are contemplating, we want to insert data (albeit random data) into a table, so we probably have a good case for a knowledge module.

The first step is usually to look for available techniques, try the code independently of any knowledge module, and check out how it behaves: how is performance? How does the code behave when data volume grows? You want to make sure that the code you will integrate as a template is as good as it can be before you share it with the entire corporation!

Typically, extracting from a source system to stage data is done in an LKM. Loading data into a target table is done with an IKM. In our case, we will clearly create an IKM.


For our example, will start with a KM that works exclusively for Oracle Databases. Adaptations of the code will be possible later on to make similar processes run on other databases.

The Oracle database provides a fantastic feature that we can leverage to generate a large number of records: group by cube: it returns all the possible permutation for the selected columns. So the following code:

select NULL from dual group by cube(1,1,1)

returns 8 records (2 to the power 3). Add columns for the list for the permutations, and you are adding an exponential number of records.

Now when I played with this function on my (very) little installation of the database, I seemed to hit a limit for (very) large permutation numbers. I have to admit that I am not using the function in its expected fashion, no I cannot really complain. But at least I can easily generate 1024 records (2 to the power 10). Now from a usability perspective, I do not really want to use that number for the users of my KM (1024 has a geeky flavor to it, doesn't it?). How about generating a table with just 1,000 records?

The following code will do the trick:

select NULL from dual group by cube(1,1,1,1,1,1,1,1,1,1)
where rownum<=1000

Note that so far, all the instructions we have are hard-coded. We still do not have anything that would be dynamic in nature.

Now we need to use the above query to create some table with our 1,000 records. Again, we can hard-code the table name - but this does not make for very portable code. In particular, from one environment to the next, the database schema names will vary. We have three options to create our staging table, from the least portable to the most portable:

  • Hardcoded table name and schema name: myschema.SEED
  • Dynamic schema name, hardcoded table name: let ODI retrieve the proper schema name and automatically update the code at execution time:  (Generated code: myschema.SEED)
  • Fully dynamic table name and schema name (usually, dynamic tables are named after the target table with some sort of extension): _SEED (generated code: if you are loading TRG_CUSTOMERS, then the SEED table name is myschema.TRG_CUSTOMER_SEED)

Best practice is of course to use the last one of these options to allow for multiple processes to run in parallel. To keep our explanations simple, we will use the second option above - but keep in mind that best practice would be to use the fully dynamic one.


As we will use our KM over and over, it is important to make the developer's life easy. Steps have to be included here to create our seeding table, drop it when we are done, and make sure before we create it that it is not there from a previous run that could have failed.
The typical sequence of steps for a KM creating any type of staging table is:

  • Drop table (and ignore errors - if there is no table, we are fine)
  • Create table (re-create it to reflect any possible meta-data changes)
  • Load the table with staging data - in our case a sequence of numbers that we will be able to leverage later on for filtering... (please be patient: we will come back to this). Here a rownum will do the trick...

Now that we have the code to insert data into our staging table, we can put all the pieces together and have the first three steps of our knowledge module. Keep in mind that you have to be consistent from one step to the next as you name your table. The actual knowledge module with all the matching code is available here (look for KM_IKM Oracle - Build Sample Data - Gen II.xml).



So far our table only has 1,000 records. Not much in terms of volume. But all we need now to create a table with 1,000,000 records... is a Cartesian product (you know, the one thing your mother told you NOT to do with a database? It comes very handy here!):

insert into [Target] ([Columns]) Select * from SEED S1, SEED s2

And if we want to return less records, all we have to do is filter on the S2 table. For instance the clause:

where S2.SEED_ID<=10

will return 10,000 records!. Remember when we stored rownums in this table earlier? This is where it becomes very handy...

So far the only thing we have done is to generate a fairly large number of records. Where the exercise becomes even more interesting is if we can generate data for each record that matches our requirements for sample data. In a previous post we have seen how to generate User Functions in ODI to abstract this type of random generation. The code for the sample data generation typically does not belong to the Knowledge Module as it would not give us enough flexibility for all the possible combinations out there.

The User Function examples used before can generate numerics and strings. We could expand their design to generate random phone numbers, or random social security numbers... and work with random data that will now look like real data, instead of exposing sensitive information.


The fact that User Functions do not belong to the code of the Knowledge Module does not mean that there is no flexibility in the Knowledge Modules. Here, we are building a sample table out of thin air: it is very possible that the table does not exist in the first place. Or if want to run multiple tests with different amounts of records each time, we may want to truncate the table before each new run.

KM options are a very simple way to toggle such behaviors in a KM. We can create 2 options: CREATE_TABLE and TRUNCATE. Then create the appropriate steps in our KM:

  • Create Target Table
  • Truncate table

When you want to associate an option to a given step, edit the step itself; then click on the "option" tab and un-check "always execute". Select only the appropriate option in the list and click OK...



As we define the options, it is also good to think of the most common usage for the KM. In our case, chances are we will often want to create the table and truncate it for successive runs: we can then define that these steps will be executed by default (set the default for the variables to "Yes". More conventional KMs would typically have these options, but their defaults would be set to "No".

We now have a complete Knowledge Module that can be used to generate between 1,000 and 1,000,000 records in any table of your choice, complete with options that will let the users of the KM adapt the behavior to their actual needs...

Again, if you want to review all the code in details, it is available here (look for KM_IKM Oracle - Build Sample Data - Gen II.xml).


Screenshots were taken using version of ODI. Actual icons and graphical representations may vary with other versions of ODI.

Sunday Sep 13, 2009

How to Define Multi Record Format Files in ODI

The posts in this series assume that you have some level of familiarity with ODI. The concepts of Datastore, Model and Logical Schema are used here assuming that you understand them in the context of ODI. If you need more details on these elements, please refer to the ODI Tutorial for a quick introduction, or to the complete ODI documentation for more details.

It is not unusual to have to load files containing various record formats. For example a company might store orders and order lines using distinct record formats in the same flat file or you might have one single file containing a header, some records and a footer.
In this post, we'll use the following source file as an example:

1,101,2009/09/01,Computer Parts
2,234,101,Motherboard, Asus P6T,239.99
2,235,101,CPU,Intel Celeron 430,40
1,102,2009/09/02,Computer Parts
2,301,102,CPU,AMD Phenom II X4,170
2,401,103,Inkjet Printer,Canon iP4600,69.99
2,402,103,Inkjet Printer,Epson WF30,39.99
2,403,103,Inkjet Printer,HP Deskjet D2660,49.99

As we can see the Order and Order Lines records have different formats (one has 4 fields, the other has 6 fields), they could also have a different field separator.

Identifying the Record Codes

The first step in order to handle such a file in ODI is to identify a record code, this record code should be unique for a particular record type. In our example the record code will be used by ODI to identify if the record is an Order or an Order Line. All the Order records should have the same record code, this also applies to the Order Lines records.
In our example the first field indicates the record code:
- 1 for Orders records.
- 2 for Order Lines records.

Define the Datastores

We assume that you have already created a Model using a Logical Schema that points to the directory containing your source file.

We will start by defining a datastore for the Order records.

Right-click on the File model and select Insert Datastore.
In the Definition tab, enter a name and specify the flat file resource name in the Resource Name field. 




In the Files tab, specify your flat file settings (delimiter, field separator etc.)

Refer to the ODI documentation for additional information regarding how to define a flat file datastore.



In our example the Order records have 4 fields:

Go to the columns tab and add those 4 columns to your datastore.

Now specify the Record Code in the 'Rec. Code' field of the RECORD_CODE column.




Click OK.



In the Models view, right-click on the datastore and select View Data to display the file content and make sure it is defined correctly. 



The data is filtered based on the record code value, we only see the Order records.


We will now apply the same approach to the Order Lines record.

Right-click on the File model and select Insert Datastore to add a second datastore for the Order Lines record.

In the Definition tab, enter a name and specify the flat file resource name in the Resource Name field. We are pointing this datastore to the same file we used for the Order records. 



In the Files tab, specify your flat file settings (delimiter, field separator etc.).
Refer to the ODI documentation for additional information regarding how to define a flat file datastore.



In our example the Order Lines records have 6 fields:

Go to the columns tab and add those 6 columns to your datastore.

Now specify the Record Code in the 'Rec. Code' field of the RECORD_CODE column.



Click OK.


Right-click on the datastore and select View Data to display the file content and make sure it is defined correctly.



The data is filtered based on the record code value, we only see the Order Lines records.

You can now use those 2 datastores in your interfaces.


All Screenshots were taken using version of ODI. Actual icons and graphical representations may vary with other versions of ODI.

Wednesday Sep 09, 2009

Generating Sample Data with ODI: A Case Study For Knowledge Modules and User Functions

Looking for Data Integration at OpenWorld 2009? Look no further: all you need is here!

The posts in this series assume that you have some level of familiarity with ODI. The concepts of Interface, Model, Knowledge Module and User Function are used here assuming that you understand them in the context of ODI. If you need more details on these elements, please refer to the ODI Tutorial for a quick introduction, or to the complete ODI documentation for detailed information.

We've all been there: we start coding, waiting for a set of sample data to be available. We move along with the code... and the data is not available. Or we need to build a small (or not so small) data set quickly. Sure, we all have sample databases left and right for that purpose. But recently I was looking for a decent size data set for some tests (more than the traditional 30 sample records) and could not put my hands on what I needed. What the heck: why not have ODI build this for me?

The techniques that we will leveraged for this are the following:

  • Creation of a temporary interface to create the sample table (See this previous post for details on how to create a temporary interface)
  • Creation of a new knowledge module to generate enough records in the new table
  • Creation of ODI User Functions to simplify the generation of random values


All the objects mentioned in this article can be downloaded. Save

this XML file
if you want to import in your repository a project that already contains all the objects (IKM and User functions). Click
if you want to download a file that will let you import the different objects individually. You will have to unzip the file before importing the objects in the later case.

The samples provided here have all been designed for an Oracle database, but can be modified and adapted for other technologies.

Today we will discuss the different elements that allow us to generate the sample data set. In future posts, we will dissect the Knowledge Modules and User Functions to see what technological choices were made based on the different challenges that had to be solved.


For more details on how to create a temporary interface, you can refer to this post. For our example, we will create a new table in an existing schema. When you create your temporary interface, remember to set the following elements:

  • Select of your staging area ( In the Definition tab of the interface)
  • Name your target table
  • Select the location of your target table (work schema / data schema)
  • Name the Columns, and set their individual data type and length


Interface Definition Tab

For our example, we will use a fairly simple table structure:


SAMPLER_ID number(3)
SAMPLER_NAME varchar2(30)
SAMPLER_PRICE number(10,2)


Sampler Interface Table Creation


The Oracle database comes with a package called DBMS_RANDOM. Other random generators can be used (DBMS_CRYPTO for instance has random generation functions as well). These functions take more or less parameters, and if we realize after creating dozens of mappings that using the "other" package would have been better... we would be in a lot of trouble. Creating user functions will allow us to:

  • Have a naming convention that is simplified
  • Limit the number of parameters
  • Limit the complexity of the code
  • Later maintain the code independently of our interfaces, in a centralized location: if we decide to change the code entirely, we will make modifications in one single place - no matter how often we use that function.


For our example, we will have 5 ODI user functions in ODI (again, these can be downloaded


  • RandomDecimal(Min, Max): generates a random value (with decimals) between the Min and Max values
  • RandomNumber(Min, Max): generates a random value (without decimals) between the Min and Max values
  • RandomBool(): generate a 0 or a 1
  • RandomDate(MinDate, MaxDate): returns a date between MinDate and MaxDate (make sure MinDate and MaxDate are valid dates for Oracle)
  • RandomString(Format, Min, Max): generates a random string with a minimum of Min characters and a maximum of Max characters. Valid formats are:
    • 'u', 'U' - returning string in uppercase alpha characters
    • 'l', 'L' - returning string in lowercase alpha characters
    • 'a', 'A' - returning string in mixed case alpha characters
    • 'x', 'X' - returning string in uppercase alpha-numeric characters
    • 'p', 'P' - returning string in any printable characters.



We can either use these functions as is or as part of a more complex code logic, such as a case...when statement.

For our example, we will build the following mappings:

Column Mapping
SAMPLER_ID RandomNumber(1,999)
SAMPLER_NAME RandomString('A', 1, 30)
SAMPLER_PROMOTION case when RandomBool()=0 then 'FALSE'
else 'TRUE'
SAMPLER_PRICE RandomDecimal(1,10000)
SAMPLER_RELEASE_DATE RandomDate('01-JAN-2000', sysdate)

In ODI, the mappings will look like this:

Sampler Interface Mappings


Since we do not have any source table in this interface, we only need an IKM. The IKM provided will this example needs to be imported in your project.

Because the purpose of this KM is to generate sample data, it will have a few options where the default values will be different from the usual KMs:

  • TRUNCATE defaults to 'YES': we assume here that if you re-run the interface, you want to create a new sample. If you only want to add more records to an existing table, simply set this option to 'NO' in your interface.
  • CREATE_TABLE defaults to 'YES': we assume that the table to be loaded does not exist yet. You can turn that option to 'NO' if there is no need to create the table.
  • THOUSANDS_OF_RECORDS: set this to any value between 1 and 1,000 to generate between 1,000 and 1,000,000 records


Sampler IKM

Once you have set the values for your KM, you can run the interface and let it generate the random data set.

With the above configuration, and using a standard laptop (dual core 1.86GHz processor and 2 Gb of RAM) equipped with Oracle XE my statistics were as follows:

10,000 records generated in 5 seconds
100,000 records generated in 24 to 35 seconds (about 30 seconds on average)
1,000,000 records generated in 211 to 235 seconds (about 4 minutes on average)

Note that the machine was not dedicated to this process and was running other processes.

Statistics are available in the Operator interface.

Sampler Stats

To review the data loaded by ODI in your target table, simply reverse-engineer this table in a model, then right-click on the table and select View Data to see what was generated!



One question: why did I stop here and did not try to make this work for other technologies? Well, it turns out that ODI is really meant to move and transform data. As long as I have at least ONE table with random data in any one of my databases, it is now faster to just create a regular ODI interface and move the data across... The design will take less than a minute. The data transfer should not take much time either. Who would try to spend more time coding when the solution is that simple?

But if you want to make this work for other databases, here are your entry points:

  • Duplicate the KM and modify it to use SQL that would work on these other databases
  • Update the user functions to make sure that they use the appropriate functions for the given databases
  • Use the same logic to create your interface



All Screenshots were taken using version of ODI. Actual icons and graphical representations may vary with other versions of ODI.

Data Integration Showcased at OpenWorld 2009

Thursday Sep 03, 2009

Designing and Loading Your Own Staging Tables

The posts in this series assume that you have some level of familiarity with ODI. The concepts of Interface, Model and Knowledge Module are used here assuming that you understand them in the context of ODI. If you need more details on these elements, please refer to the ODI Tutorial for a quick introduction, or to the complete ODI documentation for detailed information.


When ODI moves data from your source systems to your target system, it automates the management (creation, loading, deletion) of all the necessary staging tables. The Knowledge Modules control which tables are created when and where, so that the developers do not have to worry about these mechanics.
There are cases though where the logic of your integration processes calls for the creation of your own staging tables, independently of what ODI will create. You may want to pre-aggregate some data before applying further logic. You may want to break a very complex interface into a set on interfaces that will be easier to develop and maintain. Over time, I have found that creating these intermediate tables very valuable.
The problem is: they are not part of the data model.
One of the ODI features that is often overlooked is the ability to create tables directly from the designer tool.

We will look into the creation of the table from 2 different angles: using ODI Diagrams or using temporary interfaces. We will look into the pros and cons of each approach.Note that these techniques can be used beyond staging tables and leveraged in the design of your overall solutions.

1. ODI Diagrams

ODI Diagrams can be found under the Models section in the Designer interface.


Creating Diagrams will have many benefits from a design perspective:
- You will be able to retrieve and convert table structures from other technologies, ODI will automatically convert the data types as required;
- Once you have designed your new tables, ODI can generate a DDL that is context independent (See this previous post for more details on contexts);
- If you derive the structure of the new tables from existing models, ODI will offer the automatic creation of interfaces for you - using the base tables as sources and the new tables as targets.

Let's now see how to create a diagram and get started.

1.1 Creation of an ODI Diagram

In your models, you will find the entry point for Diagrams directly under the Model name itself. Simply right-click on the Diagrams entry and select Insert Diagram to create your first Diagram.

Insert Diagram

When you create a diagram, you can either model all your tables one column at a time, or start from existing structures. This is the feature that I like best, as it will save me a lot of time in all subsequent operations. To try this out, simply drag and drop a table from another model (if possible even from a different technology) directly in the model. ODI will first ask you if you really want to create this table in the new model. Then it will immediately match the data types with the new technology, and display the resulting table so that you can make any modification as required: you may want different datatypes, different column names, you may want to add and remove columns, or even rename the table itself. You can either perform these operations as you are copying the structure, or later on when you will edit your diagram.

Diagram Table Name

Diagram Table Columns

Once you validate the definition of the table, ODI will display the table in the diagram and add it to your model tree. Note that at this point in time, the table is not created in the database.

Not all tables will necessarily have a matching structure in other systems. You may have to create them directly from the diagram. You can then take advantage of the toolbar to create not only the table and its columns, but also to define the relationship between the different tables.


1.2 Leveraging the ODI Diagram

Once the Diagram has been created, several options become available:
- Generation of the DDLs to create the tables in the actual databases
- Generation of Interfaces using the initial tables as sources and the new tables as target (and vice-versa).
All these operations are possible from the Models menu (right-click on the model to see these operations).

Diagram Menus

The Generate DDL menu will allow you to create the tables you have designed along with all the necessary constraints. Furthermore, you will be able to run the DDL in all contexts with absolutely no code modification - hence guarantying consistency across environments .

The creation of interfaces will be as easy. But interfaces will require knowledge modules, and to truly take advantage of this feature, you want to make sure that the knowledge modules of your choice (and only these!) are imported in a project. The trick with "only these" is that you want the knowledge modules to be selected for you automatically. When ODI will generate the interfaces, if there is only one choice for the LKM, and only one choice for the IKM, maybe a CKM if needed - then you will not even have to open the interfaces before you run them (except for a sanity check maybe!). Do not hesitate and modify the default values for the KMs you have selected as, again, that will save you a tremendous amount of time in terms of reviews.

Once the KMs are set, go back to your model, and select the menu entry Generate Interfaces IN to generate the interfaces that will load your new tables (or Generate Interfaces OUT to generate interfaces that will extract data from your new tables). Select the project where you have imported the appropriate KMs as the destination for the interfaces, and wait for the automatic generation to happen. If you have renamed columns, they are properly mapped to the original column. If a table was created from several source tables, all sources are part of the interfaces. You only have to worry about new columns, relationships between source tables that ODI would not know of, and transformations if you need any.

Over time, I've used this feature a lot when I am given a set of files that are supposed to represent tables in a real case. I reverse engineer the files - then use the files definitions to quickly design the equivalent structures in the database that I need to load... and run the DDLs and interfaces all in one shot. A very nice way to save hours of development time!

1.3 When to use this approach

The real benefit of this approach is in the massive generation of DDLs and interfaces. If you only need to load a couple of tables, the benefit is not as obvious. But as the number of tables to create and interfaces to generate grows, so does the value of Diagrams. I have recently worked on a case where in less than an hour, I had two dozen tables designed, created, and loaded with data when initially all I had was a series of flat files...


There will be cases where you need to create just a few set of staging tables and you do not necessarily want to go through the creation of models and DDLs. For these cases, ODI will let you create your table directly from with your interfaces.

2.1 Creating the Interface

When you create a new table in an interface, you will first have to indicate where the table will be created. To do so, select "staging area different from target" in the definition tab of your interface, and select the schema where the table will be created.

New Interface Staging

2.2 Designing the table

The first step will be to name the table, and define where it will be created (in the data schema or in the work schema. These schemas are defined in the Topology interface. The work schema is typically your staging area). To name the table, click on the "Untitled" caption in the target side of you interface, and enter a table name in the properties window. It is in that same properties window that you will select the data or work schema.

Interface - Name Table

You can then add columns to this table. If you add a datastore on the source side of your interface, you can add the columns one by one (simply drag and drop the columns from the source datastore to the target datastore. Or you can right-click on any source column to add it to the target table. Or you can right-click on the table name and add all columns at once by selecting the menu "add to target". When the columns are added, ODI converts the data types to match those of the target technology, and pre-maps the target columns to the source columns.

Of course, you can always edit the column information later on in the Properties window.

Interface Add Columns

You can also right-click under the existing columns on the target side to manually add new columns. In this case, you will have to specify the column name, data type and size.

Interface Add New Column

2.3 Creating and loading the table

Once you have created the table as needed, you can use it as any other target table in your interface and design your transformations as usual. The one thing to remember will be to set the "CREATE_TARGET_TABLE" option to "YES" in the IKM of this interface to make sure that the table is created at execution time.

2.4 Using the staging table as a source

Once this table has been created and loaded, all you have to do to use it as a source is to drag and drop the Interface as a source in the next interface (you read that right: drag and drop the interface directly from the project tree, as if it were a table!). No need to actually reverse engineer the table and add it to your model.

This does not mean that you cannot reverse engineer the table and add it to your model. In many cases, the table will actually belong to the model in the end and definitely belongs with your metadata.

2.5 When to use this approach

This second approach is more attractive for "one-off" creation of staging tables. It is a very quick and easy way to integrate intermediary steps in your data integration logic, but does not have the scale effect of the diagram approach.


As we have seen here, ODI offer multiple techniques to create new tables very easily directly from the Graphical Interface. Depending on the techniques used to create the tables, ODI will offer many shortcuts to make your development initiatives even faster: either the generation of DDL and interfaces based on the old and new structures, or the ability to create a table and use it immediately without ever having to deal with the metadata...


All Screenshots were taken using version of ODI. Actual icons and graphical representations may vary with other versions of ODI.

Looking for Data Integration at OpenWorld 2009? Were are here!

For more information about ODI, click here.

Friday Jun 12, 2009

Using an ODI Procedure to Loop Through a Command

The posts in this series assume that you have some level of familiarity with ODI. The concepts of Procedure, Command and Logical Architecture are used here assuming that you understand them in the context of ODI. If you need more details on these elements, please refer to the ODI Tutorial for a quick introduction, or to the complete ODI documentation for more details.

In this article we will focus on using the Command on Target and Command on Source tab of a Command in an ODI procedure.

It is not uncommon to have to execute a specific command for each value returned by a select statement. For example you might be willing to send an email as part of your integration process to a specific list of users maintained in a database table.
You might also receive several zip files that have to be extracted before being processed by ODI. An ODI procedure can help you loop through a list of files and start the extraction process.

If we want to implement the email example we will need to:
- Reverse-Engineer the database table in ODI.
- Create an ODI procedure.
- Add a Command.
- Use the Command on Source tab to execute a select statement on a table to retrieve all the email addresses.
- Use the Command on Target tab to execute an OdiSendMail process for each email address returned by the select statement.

Step 1. Create the ODI Procedure

Expand your ODI project then expand a folder and right-click on the Procedures node. Select Insert Procedure in the menu.


You can pick any name for the procedure, we will use Send Email to Mailing List Users in this example.
You don't need to modify the other parameters.


Now that the procedure is created, we will add a command.

Step 2. Add a Command

Go to the Details tab and click on the grid button to create a new Command in the procedure.


A new window will appear you can specify any name for this Command. We will use Email Step in this example.

Step 3. Define a Command on Source

In the Command on Source we want to execute a select statement on a database table to retrieve a list of email addresses.

In this example the email addresses are stored in the MAILING_LIST table in an Oracle schema called STAGING.

The table can be created easily using the following code:

To define the Command on Source implementation, click on the Command on Source tab.


Set the technology to Oracle.
Set the Schema to the logical schema that hosts your MAILING_LIST table.
You don't need to modify the other parameters.
The select statement we want to use is the following:
select EMAIL email_address from STAGING.MAILING_LIST

EMAIL is the column storing the email addresses and email_address is the alias we will use in the Command on Target to refer to them.

As we are following the ODI Best Practices we do not want to hard-code the schema name STAGING in our query. To avoid this we will use the getObjectName substitution method and let ODI complete the table name with the schema name at runtime:
select EMAIL email_address from <%=odiRef.getObjectName("L","MAILING_LIST","D")%>

Refer to the ODI documentation for additional information regarding the substitution methods.

You should now have the following:


We are done with the Command on Source, we will now define the Command on Target.

Step 4. Define a Command on Target

In the Command on Target tab we will use the OdiSendMail tool to send an email to the email addresses retrieved from the command in the Command on Source tab.

To define the Command on Target implementation, click on the Command on Target tab.


Set the technology to Sunopsis API.
You don't need to modify the other parameters.

We will use the following command:
OdiSendMail -MAILHOST= -FROM= "-TO=#email_address" "-SUBJECT=log and bad files" -ATTACH=c:/temp/log.txt
Please find attached the log file...

We are referring to the alias email_address defined in the Command on Source tab prefixed by #: #email_address. This gives us access to the email addresses retrieved by the select statement in the Command on Source.

You will have to modify the MAILHOST and FROM parameters according to your SMTP server settings otherwise you will not be able to send any emails.
The SUBJECT and ATTACH parameter can be modified as well as the text.

Refer to the ODI documentation for additional information regarding OdiSendMail and the other ODI tools.

Note: You can use any other ODI tools or technologies in the Command on Source and the Command on Target.


The procedure is now complete we can click on Execute to start it and follow its execution in Operator.

All Screenshots were taken using version of ODI. Actual icons and graphical representations may vary with other versions of ODI.

Friday May 08, 2009

Using ODI Variables in Topology: Pushing Contexts to the Extreme

The posts in this series assume that you have some level of familiarity with ODI. The concepts of Context, Topology, Logical Architecture, Physical Architecture and Variables are used here assuming that you understand them in the context of ODI. If you need more details on these elements, please refer to the ODI Tutorial for a quick introduction, or to the complete ODI documentation for more details.

In a previous post, we have discussed the notion of Context and how powerful it can be. There is a limit however. If you remember our discussion, each server, each schema on each server, have to be defined in the ODI Topology. Now imagine that you design processes that have to run on a large number of systems. The exact same code will be executed; only the physical location will be different. How about having thousands of systems where this code has to run. Who wants to maintain the connection information manually through a graphical interface? And maintain thousands of contexts while doing so?

Realistically, I think that contexts are fine up to about a dozen of environments. Beyond that, you need an environment that will be more dynamic. But we want to keep the benefits of having the exact same code on all systems, the flexibility of having a complete separation of the generated code and of the execution location. The solution? Use variables in Topology!


Before jumping heads down in the usage of variables in Topology, I strongly recommend the creation of 2 contexts:
- A development context where all URL in topology point to an actual server, not using the variables. This will ensure that data can be viewed, and interfaces can be tested without any concerns regarding the variables resolution;
- The dynamic context (similar to what will be used in QA and Production) will use the variables in topology to name the servers, port numbers, user names for the connections, etc. The package will assign the appropriate values to the variable and run the interfaces on the appropriate servers. This context will only be used to validate that the processes defined in the Development context work properly when we use the variables.

Independently from ODI, we will need a table to store the values that will be used for the Topology variables. For this example, will be simply use different server names. Keep in mind that other topology parameters can be set using this same technique.

The table for our structure will contain the server names, as well as the name of an ODI agent: with many processes running concurrently, it is better to assign pools of servers to different agents. We will use the following structure:

create table ODI_SERVERS (
SERVER_NAME varchar(50),
AGENT_NAME varchar(50)

Loading this table is not part of the description we have here. If you have a list of servers available somewhere, ODI would be the perfect tool to load your table though…


2.1 Creation of the Variable
Details on the creation of a variable can be found in this post. We will simply review the main steps here.

Create a new variable called ServerName.
The data type is Alphanumeric.
You can set the default value to localhost.
You can use the Description field to describe the variable.
In the Refreshing tab, select the logical schema that hosts your ODI_SERVERS table and enter the following query:


Note: for more flexibility in the execution of your code, you should never hard-code a schema name – nor assume that the login used to connect to a server defaults to your schema name. Best practice in the example above would be to let ODI complete the table name with the schema name by using the following syntax:

select SERVER_NAME from <%=odiRef.getObjectName(“L”, “ODI_SERVERS”, “D”) %>


2.2 Definition of the Topology

2.2.1 Contexts

To get more details on the creation of contexts, please check out this example. If you already know your way around Topology, create the following two contexts: Development and Dynamic

Servers Context

2.2.2 Development Server: Hardcoded Values

In topology, define one of the servers that you want to access. We will use this server for our development work, and ignore the variables for now: enter the parameters as usual for the definition if this server:

Development Server

Make sure to test the connectivity to validate your parameters. Define a physical schemas as usual under this data server. Map it to the logical schema ORA_DYNAMIC_SERVER for the Development context.

Note: you will have to create the logical schema to perform this mapping.

Servers: Development Schema

2.2.3 Dynamic Server: Using the Variables

For the dynamic contexts, we will use the variable as part of the connection string (aka the JDBC URL). Instead of typing the actual hostname, type the variable name, including the CODE of the project where the variable was created (or GLOBAL for a global variable).

URL example for Oracle:

Servers: Dynamic

This physical server will be considered as the dynamic server. Map its physical schema to the logical schema defined in the previous step, in the Dynamic context this time.

Dynamic Server Schema


Your code will work as usual in the Development environment: you can validate your transformations, your processing logic and processing time. When you will want to use the Dynamic context though, a few more steps will be required.

3.1 Connection to the Databases

The ODI agents are in charge of the connection to the databases, sources and targets. At the beginning of the execution of a process, the agent will connect to the Master Repository to retrieve the Topology information. Then it will connect to the databases, generate the code, and send that code to the databases. This represents a challenge in our case, since we need to set a value for our variables before the agent connects to the databases: we cannot set the value of our variables in the package that will be executed...

3.2 Setting the Values for Topology Variables

The solution will be to pass these values as parameters to the process, so that the values are known to the agent before it establishes the connection. You can set the values no matter how you start the scenario: from a command line interface, from another scenario or from an ODI procedure, from a web service or from a Java API. Remember to declare your variables at the beginning of your package to make sure that the value of the parameters gets properly stored though!

For more details on how to pass parameters to an ODI process, you can check out this other post.


- Make sure that the variable name is properly spelled in Topology (Variable names are case sensitive
- Make sure that the variable is declared at the beginning of your package
- Make sure that the variable is referenced with its project CODE (not the project name) in Topology

For more information on ODI contexts, please refer to the ODI Users Guide (part of the Documentation Library that comes with the ODI installation), in particular the entry "What is the Topology? "

All Screenshots were taken using version of ODI. Actual icons and graphical representations may vary with other versions of ODI.

Monday Apr 27, 2009

Executing the Same Code in All Environments: ODI Contexts.

The posts in this series assume that you have some level of familiarity with ODI. The concepts of Context, Topology, Logical Architecture and Physical Architecture are used here assuming that you understand these concepts in the context of ODI. If you need more details on these elements, please refer to the ODI Tutorial for a quick introduction, or to the complete ODI documentation for detailed information on these concepts.

The concept of Contexts in ODI is both very powerful and very convenient. Over the years, I have seen too many people ignore this feature, or not use it properly – hence not fully taking advantage of the flexibility offered by the tool. Hopefully, this brief discussion on the subject will help with a better overall utilization of this feature.


A typical challenge for any code that accesses remote systems is to update the connection parameters to access these without disrupting the code that has been developed. In particular in data integration solutions, when developers promote their code from development to QA and then to production, any modification of the code could alter its quality.

Many software solutions will provide parameter files that can be edited, or variables and parameters than can be changed to reflect the change of environment. The downside of these solutions is that maintenance is performed outside of the tool’s control, and leaves the door open for mis-configuration.

The architecture of ODI has been defined with this issue in mind. To solve this problem, developers will never have to know where they are physically connecting: the administrator will give them an alias (say ORACLE_SALES) and that will give them the connection they require. The details of the connection information is entered and managed by the administrator, through the GUI. This will allow the administrator to validate the connection strings and to check the validity of all environments: missing connections can be easily identified, different environments can be compared, etc.

As developers use the metadata to design the transformations, they have a limited need to access the actual systems. They will need to access the actual systems in two cases: to retrieve the Metadata (the reverse engineering operation) and to run the data integration processes. In the QA and production environments, we will need to access the systems as well to run the data integration jobs.

When metadata is imported from actual system, or when code is about to be executed, ODI will ask for the selection of a Context to define the execution environment.

The following screenshots show where ODI will prompt you for a context, and how you can switch from one execution context to the next.

Execution Context

Execution Context Selection

This of course assumes that you have enough security privileges to do so. Different users may have privileges to run the code in one environment and not in the other. For instance, QA engineers may not have the right to execute the processes in the Development environment or in the Production environment. Or if contexts define different company branches, employees from one branch would not be allowed to run code in a different branch.


The contexts in ODI are basically labels that you can create as needed. ODI comes with only one context called Global. Typical implementations would actually have three contexts: Development, QA, Production. As mentioned earlier, other implementations could define contexts for different branch or anything that identifies different execution locations.

Contexts will be defined in the Topology interface, along with 2 other types of objects: Logical Schemas and Physical Schemas. In the Topology GUI, you will find three corresponding tabs: Physical Architecture, Contexts, Logical Architecture.


The physical and logical architectures are organized by technologies.


The Context will simply list the contexts that you have defined, independently of the technologies.



Let us now take a databases for a data integration project. Let us say this is a sales database. The structure of the data will have to be consistent from Development to QA, and from QA to Production (Over time, the structures will evolve in the different environments. But for the QA to be meaningful, we need to be consistent from Development to QA and from QA to Production).

3.1 Creation of a Logical Schema

To represent the connectivity to the data structures, we will define an alias. Let’s call this alias ORA_SALES. This is our Logical Schema. The Logical Schema is created in the Logical Architecture tree under a given technology.

Logical Schema Creation.PNG

When you create a logical schema, unless you associate it with a context, it is just an alias that points nowhere. Create a logical Schema under the Oracle Technology and call it ORA_SALES. We will revisit this entry shortly after creating contexts, and again after defining the connectivity to the actual physical servers.

3.2 Contexts

Switch now to the Context tab in Topology, and create 3 contexts: Development, Test and Production. Simply give a name to each – no passwords are required.

Development Context.PNG

Once the contexts are created, we can revisit the definition of the logical schema we had created in the first step, and see that there is no physical connection defined for either context. Basically, we have said that we have a data model called ORA_SALES that will exist in three separate environments, but we have not defined the connectivity for either one yet: this will be done in the physical architecture.

3.3 Physical Architecture

The Physical Architecture is where we will define the actual servers that host the data. We will provide the credentials to connect to the servers: user name and password. We will also provide the JDBC connection strings, which usually contain the IP address or host name, the port number for the database, and any additional information as required by the technology: SID or service for Oracle, database name for SQL server, etc.

For this example to work in our environment, you will have to adapt it to point to your own databases and schemas.

Under the Oracle Technology, create a new Data Server (be careful when you make your selection from the menus: a common mistake is to create a new Technology instead of a new server!). Name this server, and then enter the appropriate user name and password for that server.

Physical Credentials

Once you have entered these parameters, you can click on the JDBC tab and enter the database JDBC connection information. Here, connecting to Oracle, we select the Oracle Driver name from the drop-down menu and update the JDBC URL with our server name, port number and SID.

Physical JDBC

Note: if you are not using an Oracle database, you will have to copy the JDBC driver files for your database in the appropriate ODI directory. Check out the ODI documentation for more details on this process.

You can validate the parameters you have entered with the “test” button: at this point the connection should be successful.

Succesfull Connexion

When you will save this entry (click Ok or Apply) ODI will open another window to select a Physical Schema on that server. If the schema window does not appear, you can right-click on the physical server you have just created in the Physical Architecture tree and select Insert Physical Schema.

Note: Different Technologies will have different terminologies and ODI will display the proper terminology as needed. From an ODI perspective though, we will always talk about a “Physical Schema”.

Select the Physical schema name, as well as a “work schema” from the drop-down menus.

Physical Schema Definition

Then click on the Context tab and click on the blue grid button to add a line, where you can select a context and the logical schema name. For instance, Development and ORA_SALES.

Physical Logical Mapping

What you are saying here is that ODI will connect to the physical schema that you have just defined when the generated code will refer to the ORA_SALES object and when the context selected at runtime is set to Development.

You can then define the appropriate servers and/or schemas for your QA and production environments, and associate them with the appropriate Context for your Logical Schema.


Once all physical schemas have been defined and mapped properly, you can edit your logical schema and review your mappings. You can make sure that the schema is actually pointing to a physical schema for all contexts. If a context displays Undefined for the physical schema, then your code cannot run in this context: ODI would not know where to connect.

Logical Schema Review


You can also edit your contexts and select the Schema tab to review the mapping of all logical schemas for this context: all you file directories, source systems and target systems must be properly mapped for your code to run in this environment.

Context Review

For more information on ODI contexts, please refer to the ODI Users Guide (part of the Documentation Library that comes with the ODI installation), in particular the entry "What is the Topology? "

All Screenshots were taken using version of ODI. Actual icons and graphical representations may vary with other versions of ODI.

Monday Apr 20, 2009

Using Variables in ODI: Creating a Loop in a Package

The posts in this series assume that you have some level of familiarity with ODI. The concepts of Model, Project, Interface and Package are used here assuming that you understand these concepts in the context of ODI. If you need more details on these elements, please refer to the ODI Tutorial for a quick introduction, or to the complete ODI documentation for detailed information on these concepts.

In our last post in this series, we looked into passing parameters to dynamically set the name of the file that we want to process. An alternative may be to parse a list of file names that would be stored in a table in your database and to loop over this list to process all files.

To perform this, we will need to perform the following operations:
- Create a variable to store the file name, and a second variable that we will use as a cursor to point in our table
- Define a package and use the variables to loop over the different values in the table

If you need help with the creation of variables please refer to our post on the usage of variables in ODI Interfaces and Packages.


1.1 The Counter Variable

We will start here with the definition of a variable that we will use to loop through your list of files. Let’s call it Counter. I usually use an alphanumeric variable and define the SQL query as being

select #Counter+1 from dual

(Note: the code would vary for non Oracle databases). I find alphanumeric variables easier to process than using numerics as numerics come with decimal values and may require conversions based on where and how you use them. One advantage of numeric variables though is that you can use ODI to increment their values when you select the Assign action on the variable.

Create Counter Variable Counter Variable Query

1.2 The FileName Variable

For this example, we assume that the file names are stored into a table called ODI_FILES_TABLE. This table has only one column with the file names, called FILE_NAME. And we store the table in an Oracle database. You can easily adapt the code for different table structures or databases. Here we will parse the table taking advantage of the ROWNUM returned by the Oracle database.

Create a Variable in the same project, and call it FileName. Define the following query for the FileName variable:

select FILE_NAME
where COUNTER=#Counter

Note that we are using our fist variable to retrieve one record only. To run this statement, remember that the Counter variable MUST have a value – otherwise the where clause would not be valid.

Create FileName Variable FileName Variable Query


We will now create a package to design our loop. In the package, drag and drop the Counter variable and set the action to Set Variable. Assign the value 1 to the variable (select Assign in the properties window and enter the value 1 in the text box below the Assign selection).

Drag and drop the FileName variable in the package after the Counter variable. Set the action on the variable to Refresh Variable. In case of success after the refresh, execute any procedures or interfaces where the FileName is used. In case of failure of the refresh, execute a last step to indicate you are done: a failure here would indicate that you have exhausted all files listed in the table. In our example we are sending an email, but you could execute another branch of your process.

After the execution of the procedures and interfaces, add the Counter variable again, but this time set the action to Refresh Variable: this will increment the value of the variable. (If you are using a numeric variable, you can replace this with a Set Variable action and set the operation to Increment).

Then loop back to the refresh step of the FileName variable: this will either select the next file in the list… or fail and exit the loop.

Loop Package


3.1 A Cleaner Exit

For a cleaner exit than the one described here, you may want to use a third variable where keep track of the number of files in your table with the following query:

select count(*) from ODI_FILES_TABLE

Refresh this new variable at the beginning of the package. You can then compare your Counter variable (using the action Evaluate Variable) to that value and exit the loop when you reach this value.

3.2 Guarantying that the tables are found in all Contexts

Instead of hard-coding schema names in your variable refresh queries, remember to use the ODI Substitution Method that will automatically rebuild the schema name in all contexts: odiRef.getObjectName. For instance:

select FILE_NAME
from (select FILE_NAME, ROWNUM COUNTER from < % = odiRef.getObjectName("ODI_FILES_TABLE") % >)
where COUNTER=#Counter


For more information on ODI variables, please refer to the ODI Users Guide (part of the Documentation Library that comes with the ODI installation), in particular the entry "Creating and Using Variables"

All Screenshots were taken using version of ODI. Actual icons and graphical representations may vary with other versions of ODI.



Learn the latest trends, use cases, product updates, and customer success examples for Oracle's data integration products-- including Oracle Data Integrator, Oracle GoldenGate and Oracle Enterprise Data Quality


« July 2016