« June 2009 | Main | October 2009 »

September 2009 Archives

September 3, 2009

Designing and Loading Your Own Staging Tables

The posts in this series assume that you have some level of familiarity with ODI. The concepts of Interface, Model and Knowledge Module are used here assuming that you understand them in the context of ODI. If you need more details on these elements, please refer to the ODI Tutorial for a quick introduction, or to the complete ODI documentation for detailed information.

When ODI moves data from your source systems to your target system, it automates the management (creation, loading, deletion) of all the necessary staging tables. The Knowledge Modules control which tables are created when and where, so that the developers do not have to worry about these mechanics.
There are cases though where the logic of your integration processes calls for the creation of your own staging tables, independently of what ODI will create. You may want to pre-aggregate some data before applying further logic. You may want to break a very complex interface into a set on interfaces that will be easier to develop and maintain. Over time, I have found that creating these intermediate tables very valuable.
The problem is: they are not part of the data model.
One of the ODI features that is often overlooked is the ability to create tables directly from the designer tool.

We will look into the creation of the table from 2 different angles: using ODI Diagrams or using temporary interfaces. We will look into the pros and cons of each approach.Note that these techniques can be used beyond staging tables and leveraged in the design of your overall solutions.


1. ODI Diagrams

ODI Diagrams can be found under the Models section in the Designer interface.

DiagramNew.PNG

Creating Diagrams will have many benefits from a design perspective:
- You will be able to retrieve and convert table structures from other technologies, ODI will automatically convert the data types as required;
- Once you have designed your new tables, ODI can generate a DDL that is context independent (See this previous post for more details on contexts);
- If you derive the structure of the new tables from existing models, ODI will offer the automatic creation of interfaces for you - using the base tables as sources and the new tables as targets.

Let's now see how to create a diagram and get started.

1.1 Creation of an ODI Diagram

In your models, you will find the entry point for Diagrams directly under the Model name itself. Simply right-click on the Diagrams entry and select Insert Diagram to create your first Diagram.

Insert Diagram

When you create a diagram, you can either model all your tables one column at a time, or start from existing structures. This is the feature that I like best, as it will save me a lot of time in all subsequent operations. To try this out, simply drag and drop a table from another model (if possible even from a different technology) directly in the model. ODI will first ask you if you really want to create this table in the new model. Then it will immediately match the data types with the new technology, and display the resulting table so that you can make any modification as required: you may want different datatypes, different column names, you may want to add and remove columns, or even rename the table itself. You can either perform these operations as you are copying the structure, or later on when you will edit your diagram.

Diagram Table Name
Diagram Table Columns

Once you validate the definition of the table, ODI will display the table in the diagram and add it to your model tree. Note that at this point in time, the table is not created in the database.

Not all tables will necessarily have a matching structure in other systems. You may have to create them directly from the diagram. You can then take advantage of the toolbar to create not only the table and its columns, but also to define the relationship between the different tables.

Diagram.PNG


1.2 Leveraging the ODI Diagram

Once the Diagram has been created, several options become available:
- Generation of the DDLs to create the tables in the actual databases
- Generation of Interfaces using the initial tables as sources and the new tables as target (and vice-versa).
All these operations are possible from the Models menu (right-click on the model to see these operations).

Diagram Menus

The Generate DDL menu will allow you to create the tables you have designed along with all the necessary constraints. Furthermore, you will be able to run the DDL in all contexts with absolutely no code modification - hence guarantying consistency across environments .

The creation of interfaces will be as easy. But interfaces will require knowledge modules, and to truly take advantage of this feature, you want to make sure that the knowledge modules of your choice (and only these!) are imported in a project. The trick with "only these" is that you want the knowledge modules to be selected for you automatically. When ODI will generate the interfaces, if there is only one choice for the LKM, and only one choice for the IKM, maybe a CKM if needed - then you will not even have to open the interfaces before you run them (except for a sanity check maybe!). Do not hesitate and modify the default values for the KMs you have selected as, again, that will save you a tremendous amount of time in terms of reviews.

Once the KMs are set, go back to your model, and select the menu entry Generate Interfaces IN to generate the interfaces that will load your new tables (or Generate Interfaces OUT to generate interfaces that will extract data from your new tables). Select the project where you have imported the appropriate KMs as the destination for the interfaces, and wait for the automatic generation to happen. If you have renamed columns, they are properly mapped to the original column. If a table was created from several source tables, all sources are part of the interfaces. You only have to worry about new columns, relationships between source tables that ODI would not know of, and transformations if you need any.

Over time, I've used this feature a lot when I am given a set of files that are supposed to represent tables in a real case. I reverse engineer the files - then use the files definitions to quickly design the equivalent structures in the database that I need to load... and run the DDLs and interfaces all in one shot. A very nice way to save hours of development time!

1.3 When to use this approach

The real benefit of this approach is in the massive generation of DDLs and interfaces. If you only need to load a couple of tables, the benefit is not as obvious. But as the number of tables to create and interfaces to generate grows, so does the value of Diagrams. I have recently worked on a case where in less than an hour, I had two dozen tables designed, created, and loaded with data when initially all I had was a series of flat files...

2. TEMPORARY INTERFACES

There will be cases where you need to create just a few set of staging tables and you do not necessarily want to go through the creation of models and DDLs. For these cases, ODI will let you create your table directly from with your interfaces.

2.1 Creating the Interface

When you create a new table in an interface, you will first have to indicate where the table will be created. To do so, select "staging area different from target" in the definition tab of your interface, and select the schema where the table will be created.

New Interface Staging

2.2 Designing the table

The first step will be to name the table, and define where it will be created (in the data schema or in the work schema. These schemas are defined in the Topology interface. The work schema is typically your staging area). To name the table, click on the "Untitled" caption in the target side of you interface, and enter a table name in the properties window. It is in that same properties window that you will select the data or work schema.

Interface - Name Table

You can then add columns to this table. If you add a datastore on the source side of your interface, you can add the columns one by one (simply drag and drop the columns from the source datastore to the target datastore. Or you can right-click on any source column to add it to the target table. Or you can right-click on the table name and add all columns at once by selecting the menu "add to target". When the columns are added, ODI converts the data types to match those of the target technology, and pre-maps the target columns to the source columns.

Of course, you can always edit the column information later on in the Properties window.

Interface Add Columns

You can also right-click under the existing columns on the target side to manually add new columns. In this case, you will have to specify the column name, data type and size.

Interface Add New Column


2.3 Creating and loading the table

Once you have created the table as needed, you can use it as any other target table in your interface and design your transformations as usual. The one thing to remember will be to set the "CREATE_TARGET_TABLE" option to "YES" in the IKM of this interface to make sure that the table is created at execution time.

2.4 Using the staging table as a source

Once this table has been created and loaded, all you have to do to use it as a source is to drag and drop the Interface as a source in the next interface (you read that right: drag and drop the interface directly from the project tree, as if it were a table!). No need to actually reverse engineer the table and add it to your model.

This does not mean that you cannot reverse engineer the table and add it to your model. In many cases, the table will actually belong to the model in the end and definitely belongs with your metadata.

2.5 When to use this approach

This second approach is more attractive for "one-off" creation of staging tables. It is a very quick and easy way to integrate intermediary steps in your data integration logic, but does not have the scale effect of the diagram approach.

CONCLUSION

As we have seen here, ODI offer multiple techniques to create new tables very easily directly from the Graphical Interface. Depending on the techniques used to create the tables, ODI will offer many shortcuts to make your development initiatives even faster: either the generation of DDL and interfaces based on the old and new structures, or the ability to create a table and use it immediately without ever having to deal with the metadata...

Enjoy!


All Screenshots were taken using version 10.1.3.5 of ODI. Actual icons and graphical representations may vary with other versions of ODI.


Looking for Data Integration at OpenWorld 2009? Were are here!

For more information about ODI, click here.

September 9, 2009

Generating Sample Data with ODI: A Case Study For Knowledge Modules and User Functions

Looking for Data Integration at OpenWorld 2009? Look no further: all you need is here!

The posts in this series assume that you have some level of familiarity with ODI. The concepts of Interface, Model, Knowledge Module and User Function are used here assuming that you understand them in the context of ODI. If you need more details on these elements, please refer to the ODI Tutorial for a quick introduction, or to the complete ODI documentation for detailed information.


We've all been there: we start coding, waiting for a set of sample data to be available. We move along with the code... and the data is not available. Or we need to build a small (or not so small) data set quickly. Sure, we all have sample databases left and right for that purpose. But recently I was looking for a decent size data set for some tests (more than the traditional 30 sample records) and could not put my hands on what I needed. What the heck: why not have ODI build this for me?

The techniques that we will leveraged for this are the following:

  • Creation of a temporary interface to create the sample table (See this previous post for details on how to create a temporary interface)

  • Creation of a new knowledge module to generate enough records in the new table

  • Creation of ODI User Functions to simplify the generation of random values


All the objects mentioned in this article can be downloaded. Save this XML file if you want to import in your repository a project that already contains all the objects (IKM and User functions). Click here if you want to download a file that will let you import the different objects individually. You will have to unzip the file before importing the objects in the later case.

The samples provided here have all been designed for an Oracle database, but can be modified and adapted for other technologies.

Today we will discuss the different elements that allow us to generate the sample data set. In future posts, we will dissect the Knowledge Modules and User Functions to see what technological choices were made based on the different challenges that had to be solved.

1. THE INTERFACE

For more details on how to create a temporary interface, you can refer to this post. For our example, we will create a new table in an existing schema. When you create your temporary interface, remember to set the following elements:

  • Select of your staging area ( In the Definition tab of the interface)

  • Name your target table

  • Select the location of your target table (work schema / data schema)

  • Name the Columns, and set their individual data type and length


Interface Definition Tab

For our example, we will use a fairly simple table structure:
TABLE NAME:

SAMPLER

COLUMNS:
SAMPLER_ID number(3)
SAMPLER_NAME varchar2(30)
SAMPLER_PROMOTION varchar2(10)
SAMPLER_PRICE number(10,2)
SAMPLER_RELEASE_DATE date

Sampler Interface Table Creation

2. USER FUNCTIONS

The Oracle database comes with a package called DBMS_RANDOM. Other random generators can be used (DBMS_CRYPTO for instance has random generation functions as well). These functions take more or less parameters, and if we realize after creating dozens of mappings that using the "other" package would have been better... we would be in a lot of trouble. Creating user functions will allow us to:

  • Have a naming convention that is simplified

  • Limit the number of parameters

  • Limit the complexity of the code

  • Later maintain the code independently of our interfaces, in a centralized location: if we decide to change the code entirely, we will make modifications in one single place - no matter how often we use that function.


For our example, we will have 5 ODI user functions in ODI (again, these can be downloaded here):

  • RandomDecimal(Min, Max): generates a random value (with decimals) between the Min and Max values

  • RandomNumber(Min, Max): generates a random value (without decimals) between the Min and Max values

  • RandomBool(): generate a 0 or a 1

  • RandomDate(MinDate, MaxDate): returns a date between MinDate and MaxDate (make sure MinDate and MaxDate are valid dates for Oracle)

  • RandomString(Format, Min, Max): generates a random string with a minimum of Min characters and a maximum of Max characters. Valid formats are:
    • 'u', 'U' - returning string in uppercase alpha characters

    • 'l', 'L' - returning string in lowercase alpha characters

    • 'a', 'A' - returning string in mixed case alpha characters

    • 'x', 'X' - returning string in uppercase alpha-numeric characters

    • 'p', 'P' - returning string in any printable characters.


SamplerUserFunctions.PNG

We can either use these functions as is or as part of a more complex code logic, such as a case...when statement.

For our example, we will build the following mappings:








ColumnMapping
SAMPLER_IDRandomNumber(1,999)
SAMPLER_NAMERandomString('A', 1, 30)
SAMPLER_PROMOTIONcase when RandomBool()=0 then 'FALSE'
else 'TRUE'
end
SAMPLER_PRICERandomDecimal(1,10000)
SAMPLER_RELEASE_DATERandomDate('01-JAN-2000', sysdate)


In ODI, the mappings will look like this:

Sampler Interface Mappings

3. THE KNOWLEDGE MODULE

Since we do not have any source table in this interface, we only need an IKM. The IKM provided will this example needs to be imported in your project.

Because the purpose of this KM is to generate sample data, it will have a few options where the default values will be different from the usual KMs:

  • TRUNCATE defaults to 'YES': we assume here that if you re-run the interface, you want to create a new sample. If you only want to add more records to an existing table, simply set this option to 'NO' in your interface.

  • CREATE_TABLE defaults to 'YES': we assume that the table to be loaded does not exist yet. You can turn that option to 'NO' if there is no need to create the table.

  • THOUSANDS_OF_RECORDS: set this to any value between 1 and 1,000 to generate between 1,000 and 1,000,000 records

Sampler IKM

Once you have set the values for your KM, you can run the interface and let it generate the random data set.

With the above configuration, and using a standard laptop (dual core 1.86GHz processor and 2 Gb of RAM) equipped with Oracle XE my statistics were as follows:

10,000 records generated in 5 seconds
100,000 records generated in 24 to 35 seconds (about 30 seconds on average)
1,000,000 records generated in 211 to 235 seconds (about 4 minutes on average)

Note that the machine was not dedicated to this process and was running other processes.

Statistics are available in the Operator interface.

Sampler Stats

To review the data loaded by ODI in your target table, simply reverse-engineer this table in a model, then right-click on the table and select View Data to see what was generated!

SamplerData.PNG

4. EXPANDING TO OTHER TECHNOLOGIES

One question: why did I stop here and did not try to make this work for other technologies? Well, it turns out that ODI is really meant to move and transform data. As long as I have at least ONE table with random data in any one of my databases, it is now faster to just create a regular ODI interface and move the data across... The design will take less than a minute. The data transfer should not take much time either. Who would try to spend more time coding when the solution is that simple?

But if you want to make this work for other databases, here are your entry points:

  • Duplicate the KM and modify it to use SQL that would work on these other databases

  • Update the user functions to make sure that they use the appropriate functions for the given databases

  • Use the same logic to create your interface


Enjoy!


All Screenshots were taken using version 10.1.3.5 of ODI. Actual icons and graphical representations may vary with other versions of ODI.

Data Integration Showcased at OpenWorld 2009

September 13, 2009

How to Define Multi Record Format Files in ODI

The posts in this series assume that you have some level of familiarity with ODI. The concepts of Datastore, Model and Logical Schema are used here assuming that you understand them in the context of ODI. If you need more details on these elements, please refer to the ODI Tutorial for a quick introduction, or to the complete ODI documentation for more details.

It is not unusual to have to load files containing various record formats. For example a company might store orders and order lines using distinct record formats in the same flat file or you might have one single file containing a header, some records and a footer.
In this post, we'll use the following source file as an example:


1,101,2009/09/01,Computer Parts
2,234,101,Motherboard, Asus P6T,239.99
2,235,101,CPU,Intel Celeron 430,40
1,102,2009/09/02,Computer Parts
2,301,102,CPU,AMD Phenom II X4,170
1,103,2009/09/05,Printers
2,401,103,Inkjet Printer,Canon iP4600,69.99
2,402,103,Inkjet Printer,Epson WF30,39.99
2,403,103,Inkjet Printer,HP Deskjet D2660,49.99

As we can see the Order and Order Lines records have different formats (one has 4 fields, the other has 6 fields), they could also have a different field separator.

Identifying the Record Codes

The first step in order to handle such a file in ODI is to identify a record code, this record code should be unique for a particular record type. In our example the record code will be used by ODI to identify if the record is an Order or an Order Line. All the Order records should have the same record code, this also applies to the Order Lines records.
In our example the first field indicates the record code:
- 1 for Orders records.
- 2 for Order Lines records.

Define the Datastores

We assume that you have already created a Model using a Logical Schema that points to the directory containing your source file.

We will start by defining a datastore for the Order records.

Right-click on the File model and select Insert Datastore.
In the Definition tab, enter a name and specify the flat file resource name in the Resource Name field. 

Order_Datastore.jpg

In the Files tab, specify your flat file settings (delimiter, field separator etc.)

Refer to the ODI documentation for additional information regarding how to define a flat file datastore.

In our example the Order records have 4 fields:
- RECORD_TYPE
- ORDER_ID
- ORDER_DATE
- ORDER_TYPE

Go to the columns tab and add those 4 columns to your datastore.

Now specify the Record Code in the 'Rec. Code' field of the RECORD_CODE column.

Order_Datastore_Columns.jpg

Click OK.

In the Models view, right-click on the datastore and select View Data to display the file content and make sure it is defined correctly. 

Order_Datastore_Data.jpg

The data is filtered based on the record code value, we only see the Order records.

We will now apply the same approach to the Order Lines record.

Right-click on the File model and select Insert Datastore to add a second datastore for the Order Lines record.

In the Definition tab, enter a name and specify the flat file resource name in the Resource Name field. We are pointing this datastore to the same file we used for the Order records. 

Order_Line_Datastore.jpg

In the Files tab, specify your flat file settings (delimiter, field separator etc.).
Refer to the ODI documentation for additional information regarding how to define a flat file datastore.

In our example the Order Lines records have 6 fields:
- RECORD_TYPE
- LORDER_ID
- ORDER_ID
- LINE_ORDER_TYPE
- ITEM
- PRICE

Go to the columns tab and add those 6 columns to your datastore.

Now specify the Record Code in the 'Rec. Code' field of the RECORD_CODE column.

Order_Line_Datastore_Columns.jpg

Click OK.

Right-click on the datastore and select View Data to display the file content and make sure it is defined correctly.

Order_Line_Datastore_Data.jpg

The data is filtered based on the record code value, we only see the Order Lines records.

You can now use those 2 datastores in your interfaces.

All Screenshots were taken using version 10.1.3.5 of ODI. Actual icons and graphical representations may vary with other versions of ODI.

September 20, 2009

ODI User Functions: A Case Study

Looking for Data Integration at OpenWorld 2009? Check out here!

The posts in this series assume that you have some level of familiarity with ODI. The concepts of Interface and User Function are used here assuming that you understand them in the context of ODI. If you need more details on these elements, please refer to the ODI Tutorial for a quick introduction, or to the complete ODI documentation for detailed information..

This post will give some examples of how and where user functions can be used in ODI. We will look back at a previous post and see why and how user functions were created for this case.

1. A CASE FOR USER FUNCTIONS

As I was trying to design a simple way to generate random data, I simple approach to specify the type of data to be generated. An example would be to easily generate a random string.

Working on Oracle for this example, I could leverage the database function DBMS_RANDOM.STRING. It takes 2 parameters: one for the type of characters (uppercase, lowercase, mixed case), and on for the length of the string. But I want a little more than this: I also want the ability to force my generation to have a minimum number of characters. For this, I now need to generate a random number. The database function DBMS_RANDOM.VALUE does this, but returns a decimal value. Well, the TRUNC function can take care of this... but my mapping expression becomes somewhat complex:

DBMS_RANDOM.STRING(Format, TRUNC(DBMS_RANDOM.VALUE(MinLen,MaxLen)))

Imagine now using this formula over and over again in your mappings - not the easiest and most readable portion of code to handle. And maintenance will easily become a nightmare if you simply cut and paste...

A user function makes sense at this point, and will provide the following benefits:


  • A readable name that makes the usage and understanding of the transformation formula a lot easier

  • A central place to maintain the transformation code

  • The ability to share the transformation logic with other developers who do not have to come up with the code for this anymore.

From a code generation perspective, if you use the User Function in your interfaces, ODI will replace the function name with the associated code and send that code to the database. Databases will never see the User Function name - the substitution is part of the code generation.

2. BUILDING THE USER FUNCTION

You need to provide 3 elements when you build a user function:


  • A name (and a group name to organize the functions)

  • A syntax

  • The actual code to be generated when the developers will use the functions


2.1 Naming the User Function

The name of the user function is yours to choose, but make sure that it properly describes what this function is doing: this will make the function all the more usable for others. You will also notice a drop down menu that will let you select a group for this function. If you want to create a new group, you can directly type the group name in the drop down itself.

For our example, we will name the user Function RandomString and create a new group called RandomGenerators.

User Function Name


2.2 The User Function Syntax

The next step will be to define the syntax for the function. Parameters are defined with a starting dollarsign:
$
and enclosed in parenthesis:
().
You can name your parameters as you want: these names will be used when you put together the code for the user functions, along with the $ and (). You can have no parameters or as many parameters as you want...

In our case, we need 3 parameters: One for the string format, one for the minimum length of the generated string, one for the maximum length. Our syntax will hence be:

RandomString($(Format), $(MinLen), $(MaxLen))

If you want to force the data types for the parameters, you can do so by adding the appropriate character after the parameter name: s for String, n for Numeric and d for Date. In that case our User Function syntax would be:

RandomString($(Format)s, $(MinLen)n, $(MaxLen)n)

User Function Syntax


2.3 Code of the User Function

The last part in the definition of the user function will be to define the associated SQL code. To do this, click on the Implementation tab and click the Add button. A window will pop up with two parts: the upper part is for the code per se, the bottom part is for the selection of the technologies where this syntax can be used. Whenever you will use the User Function in your mappings, if the underlying technology is one of the ones you have selected, then ODI will substitute the function name with the code you enter here.

For our example, select Oracle in the list of Linked Technologies and type the following code in the upper part:

DBMS_RANDOM.STRING($(Format), TRUNC(DBMS_RANDOM.VALUE($(MinLen),$(MaxLen))))

Note that we are replacing here the column names with the names of the parameters we have defined in the syntax field previously.

User Function Code

Click on the Ok button to save your syntax and on the Ok or Apply button to save your User Function.

3. EXPANDING THE USER FUNCTIONS TO OTHER TECHNOLOGIES

Once we are done with the previous step, the user function is ready to be used. One downside with our design so far: it can only be used on one technology. Chances are we will need a more flexibility.

One key feature of the User Functions is that they will give you the opportunity to enter the matching syntax for any other database of your choice. No matter which technology you will use later on, you will only have to provide the name of the user function, irrespectively of the underlying technology.

To add other implementations and technologies, simply go back to the Implementation tab of the User Function, click the Add button. Select the technology for which you are defining the code, and add the code.

Note that you can select multiple technologies for any given code implementation: these technologies will then be listed next to one antother.

4. USING THE USER FUNCTIONS IN IINTERFACES

To use the User Functions in your interfaces, simply enter them in your mappings, filters, joins and constraints the same way you would use SQL code.

Interface Mappings

When you will execute your interface, the generated code can be reviewed in the Operator interface. Note that you should never see the function names in the generated code. If you do, check out the following elements:


  • User Function names are case sensitive. Make sure that you are using the appropriate combination of uppercase and lowercase characters

  • Make sure that you are using the appropriate number of parameters, and that they have the appropriate type (string, number or date)

  • Make sure that there is a definition for the User Function for the technology in which it is running. This last case may be the easiest one to oversee, so try to keep it in mind!


As long as you see SQL code in place of the User Function name, the substitution happened successfully.

Enjoy!

All Screenshots were taken using version 10.1.3.5 of ODI. Actual icons and graphical representations may vary with other versions of ODI.

About September 2009

This page contains all entries posted to Data Integration and Management in September 2009. They are listed from oldest to newest.

June 2009 is the previous archive.

October 2009 is the next archive.

Many more can be found on the main index page or by looking through the archives.

Powered by
Movable Type and Oracle