Wednesday Sep 09, 2009

Generating Sample Data with ODI: A Case Study For Knowledge Modules and User Functions

Looking for Data Integration at OpenWorld 2009? Look no further: all you need is here!

The posts in this series assume that you have some level of familiarity with ODI. The concepts of Interface, Model, Knowledge Module and User Function are used here assuming that you understand them in the context of ODI. If you need more details on these elements, please refer to the ODI Tutorial for a quick introduction, or to the complete ODI documentation for detailed information.

We've all been there: we start coding, waiting for a set of sample data to be available. We move along with the code... and the data is not available. Or we need to build a small (or not so small) data set quickly. Sure, we all have sample databases left and right for that purpose. But recently I was looking for a decent size data set for some tests (more than the traditional 30 sample records) and could not put my hands on what I needed. What the heck: why not have ODI build this for me?

The techniques that we will leveraged for this are the following:

  • Creation of a temporary interface to create the sample table (See this previous post for details on how to create a temporary interface)
  • Creation of a new knowledge module to generate enough records in the new table
  • Creation of ODI User Functions to simplify the generation of random values


All the objects mentioned in this article can be downloaded. Save

this XML file
if you want to import in your repository a project that already contains all the objects (IKM and User functions). Click
if you want to download a file that will let you import the different objects individually. You will have to unzip the file before importing the objects in the later case.

The samples provided here have all been designed for an Oracle database, but can be modified and adapted for other technologies.

Today we will discuss the different elements that allow us to generate the sample data set. In future posts, we will dissect the Knowledge Modules and User Functions to see what technological choices were made based on the different challenges that had to be solved.


For more details on how to create a temporary interface, you can refer to this post. For our example, we will create a new table in an existing schema. When you create your temporary interface, remember to set the following elements:

  • Select of your staging area ( In the Definition tab of the interface)
  • Name your target table
  • Select the location of your target table (work schema / data schema)
  • Name the Columns, and set their individual data type and length


Interface Definition Tab

For our example, we will use a fairly simple table structure:


SAMPLER_ID number(3)
SAMPLER_NAME varchar2(30)
SAMPLER_PRICE number(10,2)


Sampler Interface Table Creation


The Oracle database comes with a package called DBMS_RANDOM. Other random generators can be used (DBMS_CRYPTO for instance has random generation functions as well). These functions take more or less parameters, and if we realize after creating dozens of mappings that using the "other" package would have been better... we would be in a lot of trouble. Creating user functions will allow us to:

  • Have a naming convention that is simplified
  • Limit the number of parameters
  • Limit the complexity of the code
  • Later maintain the code independently of our interfaces, in a centralized location: if we decide to change the code entirely, we will make modifications in one single place - no matter how often we use that function.


For our example, we will have 5 ODI user functions in ODI (again, these can be downloaded


  • RandomDecimal(Min, Max): generates a random value (with decimals) between the Min and Max values
  • RandomNumber(Min, Max): generates a random value (without decimals) between the Min and Max values
  • RandomBool(): generate a 0 or a 1
  • RandomDate(MinDate, MaxDate): returns a date between MinDate and MaxDate (make sure MinDate and MaxDate are valid dates for Oracle)
  • RandomString(Format, Min, Max): generates a random string with a minimum of Min characters and a maximum of Max characters. Valid formats are:
    • 'u', 'U' - returning string in uppercase alpha characters
    • 'l', 'L' - returning string in lowercase alpha characters
    • 'a', 'A' - returning string in mixed case alpha characters
    • 'x', 'X' - returning string in uppercase alpha-numeric characters
    • 'p', 'P' - returning string in any printable characters.



We can either use these functions as is or as part of a more complex code logic, such as a case...when statement.

For our example, we will build the following mappings:

Column Mapping
SAMPLER_ID RandomNumber(1,999)
SAMPLER_NAME RandomString('A', 1, 30)
SAMPLER_PROMOTION case when RandomBool()=0 then 'FALSE'
else 'TRUE'
SAMPLER_PRICE RandomDecimal(1,10000)
SAMPLER_RELEASE_DATE RandomDate('01-JAN-2000', sysdate)

In ODI, the mappings will look like this:

Sampler Interface Mappings


Since we do not have any source table in this interface, we only need an IKM. The IKM provided will this example needs to be imported in your project.

Because the purpose of this KM is to generate sample data, it will have a few options where the default values will be different from the usual KMs:

  • TRUNCATE defaults to 'YES': we assume here that if you re-run the interface, you want to create a new sample. If you only want to add more records to an existing table, simply set this option to 'NO' in your interface.
  • CREATE_TABLE defaults to 'YES': we assume that the table to be loaded does not exist yet. You can turn that option to 'NO' if there is no need to create the table.
  • THOUSANDS_OF_RECORDS: set this to any value between 1 and 1,000 to generate between 1,000 and 1,000,000 records


Sampler IKM

Once you have set the values for your KM, you can run the interface and let it generate the random data set.

With the above configuration, and using a standard laptop (dual core 1.86GHz processor and 2 Gb of RAM) equipped with Oracle XE my statistics were as follows:

10,000 records generated in 5 seconds
100,000 records generated in 24 to 35 seconds (about 30 seconds on average)
1,000,000 records generated in 211 to 235 seconds (about 4 minutes on average)

Note that the machine was not dedicated to this process and was running other processes.

Statistics are available in the Operator interface.

Sampler Stats

To review the data loaded by ODI in your target table, simply reverse-engineer this table in a model, then right-click on the table and select View Data to see what was generated!



One question: why did I stop here and did not try to make this work for other technologies? Well, it turns out that ODI is really meant to move and transform data. As long as I have at least ONE table with random data in any one of my databases, it is now faster to just create a regular ODI interface and move the data across... The design will take less than a minute. The data transfer should not take much time either. Who would try to spend more time coding when the solution is that simple?

But if you want to make this work for other databases, here are your entry points:

  • Duplicate the KM and modify it to use SQL that would work on these other databases
  • Update the user functions to make sure that they use the appropriate functions for the given databases
  • Use the same logic to create your interface



All Screenshots were taken using version of ODI. Actual icons and graphical representations may vary with other versions of ODI.

Data Integration Showcased at OpenWorld 2009

Thursday Sep 03, 2009

Designing and Loading Your Own Staging Tables

The posts in this series assume that you have some level of familiarity with ODI. The concepts of Interface, Model and Knowledge Module are used here assuming that you understand them in the context of ODI. If you need more details on these elements, please refer to the ODI Tutorial for a quick introduction, or to the complete ODI documentation for detailed information.


When ODI moves data from your source systems to your target system, it automates the management (creation, loading, deletion) of all the necessary staging tables. The Knowledge Modules control which tables are created when and where, so that the developers do not have to worry about these mechanics.
There are cases though where the logic of your integration processes calls for the creation of your own staging tables, independently of what ODI will create. You may want to pre-aggregate some data before applying further logic. You may want to break a very complex interface into a set on interfaces that will be easier to develop and maintain. Over time, I have found that creating these intermediate tables very valuable.
The problem is: they are not part of the data model.
One of the ODI features that is often overlooked is the ability to create tables directly from the designer tool.

We will look into the creation of the table from 2 different angles: using ODI Diagrams or using temporary interfaces. We will look into the pros and cons of each approach.Note that these techniques can be used beyond staging tables and leveraged in the design of your overall solutions.

1. ODI Diagrams

ODI Diagrams can be found under the Models section in the Designer interface.


Creating Diagrams will have many benefits from a design perspective:
- You will be able to retrieve and convert table structures from other technologies, ODI will automatically convert the data types as required;
- Once you have designed your new tables, ODI can generate a DDL that is context independent (See this previous post for more details on contexts);
- If you derive the structure of the new tables from existing models, ODI will offer the automatic creation of interfaces for you - using the base tables as sources and the new tables as targets.

Let's now see how to create a diagram and get started.

1.1 Creation of an ODI Diagram

In your models, you will find the entry point for Diagrams directly under the Model name itself. Simply right-click on the Diagrams entry and select Insert Diagram to create your first Diagram.

Insert Diagram

When you create a diagram, you can either model all your tables one column at a time, or start from existing structures. This is the feature that I like best, as it will save me a lot of time in all subsequent operations. To try this out, simply drag and drop a table from another model (if possible even from a different technology) directly in the model. ODI will first ask you if you really want to create this table in the new model. Then it will immediately match the data types with the new technology, and display the resulting table so that you can make any modification as required: you may want different datatypes, different column names, you may want to add and remove columns, or even rename the table itself. You can either perform these operations as you are copying the structure, or later on when you will edit your diagram.

Diagram Table Name

Diagram Table Columns

Once you validate the definition of the table, ODI will display the table in the diagram and add it to your model tree. Note that at this point in time, the table is not created in the database.

Not all tables will necessarily have a matching structure in other systems. You may have to create them directly from the diagram. You can then take advantage of the toolbar to create not only the table and its columns, but also to define the relationship between the different tables.


1.2 Leveraging the ODI Diagram

Once the Diagram has been created, several options become available:
- Generation of the DDLs to create the tables in the actual databases
- Generation of Interfaces using the initial tables as sources and the new tables as target (and vice-versa).
All these operations are possible from the Models menu (right-click on the model to see these operations).

Diagram Menus

The Generate DDL menu will allow you to create the tables you have designed along with all the necessary constraints. Furthermore, you will be able to run the DDL in all contexts with absolutely no code modification - hence guarantying consistency across environments .

The creation of interfaces will be as easy. But interfaces will require knowledge modules, and to truly take advantage of this feature, you want to make sure that the knowledge modules of your choice (and only these!) are imported in a project. The trick with "only these" is that you want the knowledge modules to be selected for you automatically. When ODI will generate the interfaces, if there is only one choice for the LKM, and only one choice for the IKM, maybe a CKM if needed - then you will not even have to open the interfaces before you run them (except for a sanity check maybe!). Do not hesitate and modify the default values for the KMs you have selected as, again, that will save you a tremendous amount of time in terms of reviews.

Once the KMs are set, go back to your model, and select the menu entry Generate Interfaces IN to generate the interfaces that will load your new tables (or Generate Interfaces OUT to generate interfaces that will extract data from your new tables). Select the project where you have imported the appropriate KMs as the destination for the interfaces, and wait for the automatic generation to happen. If you have renamed columns, they are properly mapped to the original column. If a table was created from several source tables, all sources are part of the interfaces. You only have to worry about new columns, relationships between source tables that ODI would not know of, and transformations if you need any.

Over time, I've used this feature a lot when I am given a set of files that are supposed to represent tables in a real case. I reverse engineer the files - then use the files definitions to quickly design the equivalent structures in the database that I need to load... and run the DDLs and interfaces all in one shot. A very nice way to save hours of development time!

1.3 When to use this approach

The real benefit of this approach is in the massive generation of DDLs and interfaces. If you only need to load a couple of tables, the benefit is not as obvious. But as the number of tables to create and interfaces to generate grows, so does the value of Diagrams. I have recently worked on a case where in less than an hour, I had two dozen tables designed, created, and loaded with data when initially all I had was a series of flat files...


There will be cases where you need to create just a few set of staging tables and you do not necessarily want to go through the creation of models and DDLs. For these cases, ODI will let you create your table directly from with your interfaces.

2.1 Creating the Interface

When you create a new table in an interface, you will first have to indicate where the table will be created. To do so, select "staging area different from target" in the definition tab of your interface, and select the schema where the table will be created.

New Interface Staging

2.2 Designing the table

The first step will be to name the table, and define where it will be created (in the data schema or in the work schema. These schemas are defined in the Topology interface. The work schema is typically your staging area). To name the table, click on the "Untitled" caption in the target side of you interface, and enter a table name in the properties window. It is in that same properties window that you will select the data or work schema.

Interface - Name Table

You can then add columns to this table. If you add a datastore on the source side of your interface, you can add the columns one by one (simply drag and drop the columns from the source datastore to the target datastore. Or you can right-click on any source column to add it to the target table. Or you can right-click on the table name and add all columns at once by selecting the menu "add to target". When the columns are added, ODI converts the data types to match those of the target technology, and pre-maps the target columns to the source columns.

Of course, you can always edit the column information later on in the Properties window.

Interface Add Columns

You can also right-click under the existing columns on the target side to manually add new columns. In this case, you will have to specify the column name, data type and size.

Interface Add New Column

2.3 Creating and loading the table

Once you have created the table as needed, you can use it as any other target table in your interface and design your transformations as usual. The one thing to remember will be to set the "CREATE_TARGET_TABLE" option to "YES" in the IKM of this interface to make sure that the table is created at execution time.

2.4 Using the staging table as a source

Once this table has been created and loaded, all you have to do to use it as a source is to drag and drop the Interface as a source in the next interface (you read that right: drag and drop the interface directly from the project tree, as if it were a table!). No need to actually reverse engineer the table and add it to your model.

This does not mean that you cannot reverse engineer the table and add it to your model. In many cases, the table will actually belong to the model in the end and definitely belongs with your metadata.

2.5 When to use this approach

This second approach is more attractive for "one-off" creation of staging tables. It is a very quick and easy way to integrate intermediary steps in your data integration logic, but does not have the scale effect of the diagram approach.


As we have seen here, ODI offer multiple techniques to create new tables very easily directly from the Graphical Interface. Depending on the techniques used to create the tables, ODI will offer many shortcuts to make your development initiatives even faster: either the generation of DDL and interfaces based on the old and new structures, or the ability to create a table and use it immediately without ever having to deal with the metadata...


All Screenshots were taken using version of ODI. Actual icons and graphical representations may vary with other versions of ODI.

Looking for Data Integration at OpenWorld 2009? Were are here!

For more information about ODI, click here.


Learn the latest trends, use cases, product updates, and customer success examples for Oracle's data integration products-- including Oracle Data Integrator, Oracle GoldenGate and Oracle Enterprise Data Quality


« June 2016