Friday Jan 03, 2014

ODI 12c - Components and LKMs/IKMs

Here I'd like to illustrate some nice capabilities of ODI 12c's knowledge module framework in combination with the new component-based mapper. Some of this was prompted by recent questions from Holger Friedrich (on migration from OWB: 'where is delete?'), Stewart Bryson and Mark Rittman (on components and KMs here), and by a nagging thought I had that people were generally unaware of these capabilities. There's a lot more we can illustrate and talk about in this area, so this is a little taster...

Customized loading, integration and error management can be defined using knowledge modules. A text-based substitution library makes building such knowledge modules very easy, especially with the abundance of examples, and the substitution documentation is a good guide (intro here, reference here). This library can be used in harmony with ODI 12c components. In 12c we have modularized a lot and introduced components: components are logical and describe the WHAT of the transformation, while component KMs describe the HOW, just like regular KMs. We have also formalized some parts of the framework, but it is still as open as ever.

In supporting the odiRef substitution library we have chosen an approach similar to how the sub-select worked in 11g. If you want to build your own integration, you can use the odiRef methods to get the target shape, details, connection etc. The source for the target, just like in 11g, is also provided via the odiRef methods and may be rolled into the odiRef.getFrom method.

To illustrate, if you want an integration that performs a delete operation, then the following snippet is sufficient in 11g or 12c (the code in red is boilerplate). In ODI 12c you can now have an arbitrary mapping graph leading up to the target:

  DELETE FROM <%=odiRef.getTable("L","TARG_NAME","A")%> T
  WHERE (<%=odiRef.getTargetColList("", "[COL_NAME]", ", ", "\n", "UK")%> )
  IN
  (
  select * from (
   <%for (int i=odiRef.getDataSetMin(); i <= odiRef.getDataSetMax(); i++){%>
   <%=odiRef.getDataSet(i, "Operator")%>
     SELECT  <%=snpRef.getColList(i,"", "\t[COL_NAME]", ",\n", "", "UK")%>
     FROM <%=odiRef.getFrom(i)%>
     WHERE (1=1) <%=snpRef.getJoin(i)%> <%=snpRef.getFilter(i)%> <%=snpRef.getGrpBy(i)%> <%=snpRef.getHaving(i)%>
   <%}%>
  ) S
  )
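
To make the text-substitution idea concrete, here is a minimal Python sketch of the loop in the template above. It is not the odiRef API and not how ODI is implemented: the `DataSet` class and the `render_delete` function are hypothetical stand-ins, and the strings they hold play the role of what the odiRef methods would return. The only point is that the DELETE wrapper never needs to know how each source SELECT was produced.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    """Hypothetical stand-in for one dataset feeding the target."""
    operator: str         # set operator placed before this dataset ("" for the first)
    key_columns: list     # columns that match the target's update key
    from_clause: str      # plays the role of the text odiRef.getFrom(i) returns
    filter_sql: str = ""  # extra predicate text, already prefixed with "and"


def render_delete(target: str, key_columns: list, datasets: list) -> str:
    """Assemble DELETE ... WHERE (keys) IN (subquery) from plain text pieces."""
    keys = ", ".join(key_columns)
    selects = []
    for ds in datasets:
        select = (
            f"SELECT {', '.join(ds.key_columns)}\n"
            f"FROM {ds.from_clause}\n"
            f"WHERE (1=1) {ds.filter_sql}".rstrip()
        )
        # The operator (for example UNION) goes in front of every dataset but the first.
        selects.append(f"{ds.operator}\n{select}".strip())
    body = "\n".join(selects)
    return f"DELETE FROM {target} T\nWHERE ({keys})\nIN (\n{body}\n)"


sql = render_delete(
    "OE.CONTACT_LIST",
    ["CUSTOMER_ID"],
    [DataSet("", ["CUSTOMER_ID"], "OE.ADDRESSES ADDRESSES")],
)
print(sql)
```

Adding a second `DataSet` with `operator="UNION"` extends the subquery without touching the DELETE wrapper, which is the same separation the KM template relies on.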

For example, in the mapping below, I am using the subquery filter component to identify some rows that I then want to DELETE from the CONTACT_LIST target. The integration type property on target datastores was introduced in 12c and supports a fixed set of integration types to help filter the IKMs. If you select None, you can pick any IKM applicable to the technology.

If you look at the physical design for the above mapping, you can now pick the IKM which performs the delete. The subquery filter also has a component KM which produces code, and this code is rolled into the odiRef methods in the template. This lets you create arbitrary map designs and still build customized KMs for loading, integrating and error management. The odiRef substitution methods are simple text-based APIs for producing text from simple primitives.

This then produces the following SQL statement to perform the DELETE DML. Our IKM has no knowledge of how to construct a subquery filter, pivot or any other complex transformation: the code has been modularized. The code produced is below:

  DELETE FROM OE.CONTACT_LIST T WHERE (CUSTOMER_ID)
  IN (
  SELECT   CUSTOMER_ID
  FROM OE.ADDRESSES ADDRESSES
  WHERE (1=1)   and ( ( ADDRESSES.CUSTOMER_ID ,  ADDRESSES.PHONE_NUMBER )
                 = SOME  (
  SELECT
    CUSTOMERS.CUSTOMER_ID  CUSTOMER_ID ,   CUSTOMERS.CUST_FIRST_NAME  PHONE_NUMBER
  FROM
   OE.CUSTOMERS CUSTOMERS
   WHERE  CUSTOMERS.MARITAL_STATUS = 'married' AND CUSTOMERS.CREDIT_LIMIT > 1000  AND  ADDRESSES.CUSTOMER_ID=CUSTOMERS.CUSTOMER_ID
      )
     )
  )
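
If you want to try the shape of this statement without an Oracle instance, the sketch below runs a simplified version against an in-memory SQLite database from Python. The tables, columns and rows are made up for the illustration, and because SQLite has no `= SOME` operator the subquery filter is written with the equivalent `IN` on a single column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, made-up versions of the three tables used above.
    CREATE TABLE contact_list (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE addresses    (customer_id INTEGER, city TEXT);
    CREATE TABLE customers    (customer_id INTEGER, marital_status TEXT,
                               credit_limit INTEGER);

    INSERT INTO contact_list VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO addresses    VALUES (1, 'Oslo'), (2, 'Rome'), (3, 'Lima');
    INSERT INTO customers    VALUES (1, 'married', 5000),
                                    (2, 'single',  9000),
                                    (3, 'married',  500);
""")

# Outer DELETE: the part the IKM owns.
# Inner SELECTs: the part produced by the source and the subquery filter.
conn.execute("""
    DELETE FROM contact_list
    WHERE customer_id IN (
        SELECT a.customer_id
        FROM addresses a
        WHERE a.customer_id IN (
            SELECT c.customer_id
            FROM customers c
            WHERE c.marital_status = 'married'
              AND c.credit_limit > 1000
        )
    )
""")

remaining = [row[0] for row in conn.execute(
    "SELECT customer_id FROM contact_list ORDER BY customer_id")]
print(remaining)  # [2, 3]: only customer 1 is married with a limit above 1000
```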

Another useful illustration of things working in harmony is the improved code generation for Oracle connectivity: the basics of database links have been greatly improved (a view on the source is no longer required, but is still supported). The framework change that supports this will have more use cases down the line too. You can now use the 12c database link LKMs to pull data from tables over a database link and leverage custom IKMs on the target. One of the 11g issues was the set of requirements that the 11g Oracle to Oracle database link LKM placed on users. This has been greatly improved, and you can now take advantage of it while using existing IKMs, building new ones, and so on.

The mapping above uses new 12c LKMs to access remote Oracle datastores and a customized IKM to illustrate the mix and match capabilities. 

Hopefully this gives you some background and insight into ODI 12c mapping and knowledge module capabilities that you weren't aware of. Looking forward to any questions, additional ideas and insights that you have.

Wednesday Sep 09, 2009

Generating Sample Data with ODI: A Case Study For Knowledge Modules and User Functions


The posts in this series assume that you have some level of familiarity with ODI. The concepts of Interface, Model, Knowledge Module and User Function are used here assuming that you understand them in the context of ODI. If you need more details on these elements, please refer to the ODI Tutorial for a quick introduction, or to the complete ODI documentation for detailed information.


We've all been there: we start coding, waiting for a set of sample data to be available. We move along with the code... and the data is not available. Or we need to build a small (or not so small) data set quickly. Sure, we all have sample databases left and right for that purpose. But recently I was looking for a decent size data set for some tests (more than the traditional 30 sample records) and could not put my hands on what I needed. What the heck: why not have ODI build this for me?

The techniques that we will leverage for this are the following:

  • Creation of a temporary interface to create the sample table (See this previous post for details on how to create a temporary interface)
  • Creation of a new knowledge module to generate enough records in the new table
  • Creation of ODI User Functions to simplify the generation of random values

 


All the objects mentioned in this article can be downloaded. Save this XML file if you want to import into your repository a project that already contains all the objects (IKM and User functions). Click here if you want to download a file that will let you import the different objects individually. You will have to unzip the file before importing the objects in the latter case.

The samples provided here have all been designed for an Oracle database, but can be modified and adapted for other technologies.

Today we will discuss the different elements that allow us to generate the sample data set. In future posts, we will dissect the Knowledge Modules and User Functions to see what technological choices were made based on the different challenges that had to be solved.

1. THE INTERFACE

For more details on how to create a temporary interface, you can refer to this post. For our example, we will create a new table in an existing schema. When you create your temporary interface, remember to set the following elements:

  • Select your staging area (in the Definition tab of the interface)
  • Name your target table
  • Select the location of your target table (work schema / data schema)
  • Name the Columns, and set their individual data type and length

 


Interface Definition Tab

For our example, we will use a fairly simple table structure:

TABLE NAME: SAMPLER

COLUMNS:
SAMPLER_ID              number(3)
SAMPLER_NAME            varchar2(30)
SAMPLER_PROMOTION       varchar2(10)
SAMPLER_PRICE           number(10,2)
SAMPLER_RELEASE_DATE    date

 

Sampler Interface Table Creation

2. USER FUNCTIONS

The Oracle database comes with a package called DBMS_RANDOM. Other random generators can be used (DBMS_CRYPTO, for instance, has random generation functions as well). These functions do not all take the same number of parameters, and if we realized after creating dozens of mappings that using the "other" package would have been better... we would be in a lot of trouble. Creating user functions will allow us to:

  • Have a naming convention that is simplified
  • Limit the number of parameters
  • Limit the complexity of the code
  • Later maintain the code independently of our interfaces, in a centralized location: if we decide to change the code entirely, we will make modifications in one single place - no matter how often we use that function.

 


For our example, we will have 5 ODI user functions (again, these can be downloaded here):

  • RandomDecimal(Min, Max): generates a random value (with decimals) between the Min and Max values
  • RandomNumber(Min, Max): generates a random value (without decimals) between the Min and Max values
  • RandomBool(): generates a 0 or a 1
  • RandomDate(MinDate, MaxDate): returns a date between MinDate and MaxDate (make sure MinDate and MaxDate are valid dates for Oracle)
  • RandomString(Format, Min, Max): generates a random string with a minimum of Min characters and a maximum of Max characters. Valid formats are:
    • 'u', 'U' - returning string in uppercase alpha characters
    • 'l', 'L' - returning string in lowercase alpha characters
    • 'a', 'A' - returning string in mixed case alpha characters
    • 'x', 'X' - returning string in uppercase alpha-numeric characters
    • 'p', 'P' - returning string in any printable characters.
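
To pin down what each of these five functions is expected to return, here is a small Python sketch of the same contracts. It says nothing about how the downloadable ODI user functions are written (those expand to Oracle SQL); the Python names are made up for the illustration, and the choice of inclusive bounds is an assumption.

```python
import random
import string
from datetime import date, timedelta

# Character sets for each format code (case-insensitive, as in the list above).
_ALPHABETS = {
    "u": string.ascii_uppercase,
    "l": string.ascii_lowercase,
    "a": string.ascii_letters,
    "x": string.ascii_uppercase + string.digits,
    "p": string.ascii_letters + string.digits + string.punctuation + " ",
}


def random_decimal(low, high):
    """Random value with decimals between low and high."""
    return random.uniform(low, high)


def random_number(low, high):
    """Random whole number between low and high (both included here)."""
    return random.randint(low, high)


def random_bool():
    """Returns 0 or 1."""
    return random.randint(0, 1)


def random_date(min_date, max_date):
    """Random date between min_date and max_date (both included here)."""
    span_days = (max_date - min_date).days
    return min_date + timedelta(days=random.randint(0, span_days))


def random_string(fmt, low, high):
    """Random string of low..high characters drawn from the chosen format."""
    alphabet = _ALPHABETS[fmt.lower()]
    length = random.randint(low, high)
    return "".join(random.choice(alphabet) for _ in range(length))


print(random_number(1, 999), random_string("A", 1, 30),
      random_date(date(2000, 1, 1), date.today()))
```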

 


SamplerUserFunctions.PNG

We can either use these functions as is or as part of a more complex code logic, such as a case...when statement.

For our example, we will build the following mappings:

Column                  Mapping
SAMPLER_ID              RandomNumber(1,999)
SAMPLER_NAME            RandomString('A', 1, 30)
SAMPLER_PROMOTION       case when RandomBool()=0 then 'FALSE' else 'TRUE' end
SAMPLER_PRICE           RandomDecimal(1,10000)
SAMPLER_RELEASE_DATE    RandomDate('01-JAN-2000', sysdate)


In ODI, the mappings will look like this:

Sampler Interface Mappings

3. THE KNOWLEDGE MODULE

Since we do not have any source table in this interface, we only need an IKM. The IKM provided with this example needs to be imported into your project.

Because the purpose of this KM is to generate sample data, it will have a few options where the default values will be different from the usual KMs:

  • TRUNCATE defaults to 'YES': we assume here that if you re-run the interface, you want to create a new sample. If you only want to add more records to an existing table, simply set this option to 'NO' in your interface.
  • CREATE_TABLE defaults to 'YES': we assume that the table to be loaded does not exist yet. You can turn that option to 'NO' if there is no need to create the table.
  • THOUSANDS_OF_RECORDS: set this to any value between 1 and 1,000 to generate between 1,000 and 1,000,000 records
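
The three options interact in a simple way, which the following Python sketch mimics against an in-memory SQLite table. This is not the IKM's code (the real KM generates Oracle SQL); the function name `load_sampler` and the batch-of-1,000 loop are assumptions made for the illustration, and the row values follow the column mappings shown earlier.

```python
import random
import sqlite3
import string
from datetime import date, timedelta


def load_sampler(conn, thousands_of_records=1, truncate=True, create_table=True):
    """Load thousands_of_records * 1,000 random rows; return the table's row count."""
    if not 1 <= thousands_of_records <= 1000:
        raise ValueError("thousands_of_records must be between 1 and 1,000")
    if create_table:  # CREATE_TABLE option
        conn.execute("""
            CREATE TABLE IF NOT EXISTS sampler (
                sampler_id INTEGER, sampler_name TEXT, sampler_promotion TEXT,
                sampler_price REAL, sampler_release_date TEXT)""")
    if truncate:  # TRUNCATE option: start from an empty table
        conn.execute("DELETE FROM sampler")

    start = date(2000, 1, 1)
    span_days = (date.today() - start).days

    def random_row():
        return (
            random.randint(1, 999),
            "".join(random.choices(string.ascii_letters, k=random.randint(1, 30))),
            "TRUE" if random.randint(0, 1) else "FALSE",
            round(random.uniform(1, 10000), 2),
            (start + timedelta(days=random.randint(0, span_days))).isoformat(),
        )

    for _ in range(thousands_of_records):  # one batch of 1,000 rows per unit
        conn.executemany("INSERT INTO sampler VALUES (?, ?, ?, ?, ?)",
                         [random_row() for _ in range(1000)])
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM sampler").fetchone()[0]


conn = sqlite3.connect(":memory:")
first = load_sampler(conn, thousands_of_records=2)                   # fresh sample
second = load_sampler(conn, thousands_of_records=1, truncate=False)  # append to it
print(first, second)  # 2000 3000
```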

 

Sampler IKM

Once you have set the values for your KM, you can run the interface and let it generate the random data set.

With the above configuration, and using a standard laptop (dual core 1.86GHz processor and 2 GB of RAM) equipped with Oracle XE, my statistics were as follows:

10,000 records generated in 5 seconds
100,000 records generated in 24 to 35 seconds (about 30 seconds on average)
1,000,000 records generated in 211 to 235 seconds (about 4 minutes on average)
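
For a rough sense of throughput, the timings above can be turned into records per second. The short Python sketch below only does that arithmetic on the figures quoted; the function name is made up for the illustration.

```python
def throughput(records, seconds):
    """Records generated per second."""
    return records / seconds


# (records, fastest run in seconds, slowest run in seconds), from the figures above
runs = [
    (10_000, 5, 5),
    (100_000, 24, 35),
    (1_000_000, 211, 235),
]

for records, fastest, slowest in runs:
    low = throughput(records, slowest)
    high = throughput(records, fastest)
    print(f"{records:>9,} records: {low:,.0f} to {high:,.0f} records/second")
```

On these figures the rate goes from 2,000 records per second for the smallest run to somewhere between roughly 4,250 and 4,750 for the largest, so the per-record cost drops as the volume grows.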

Note that the machine was not dedicated to this process and was running other processes.

Statistics are available in the Operator interface.

Sampler Stats

To review the data loaded by ODI in your target table, simply reverse-engineer this table in a model, then right-click on the table and select View Data to see what was generated!

SamplerData.PNG

4. EXPANDING TO OTHER TECHNOLOGIES

One question: why did I stop here and not try to make this work for other technologies? Well, it turns out that ODI is really meant to move and transform data. As long as I have at least ONE table with random data in any one of my databases, it is now faster to just create a regular ODI interface and move the data across... The design will take less than a minute. The data transfer should not take much time either. Why spend more time coding when the solution is that simple?

But if you want to make this work for other databases, here are your entry points:

  • Duplicate the KM and modify it to use SQL that would work on these other databases
  • Update the user functions to make sure that they use the appropriate functions for the given databases
  • Use the same logic to create your interface
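
As an illustration of the second point, the Python sketch below keeps one SQL expression per technology for a RandomDecimal-style function and picks the right one by name. The dictionary and function are hypothetical and exist only for this example; the Oracle entry assumes the function is built on DBMS_RANDOM.VALUE, and the other two entries use the `random()` and `RAND()` built-ins of PostgreSQL and MySQL.

```python
# One SQL template per technology; {low} and {high} are the function's arguments.
RANDOM_DECIMAL = {
    "ORACLE": "DBMS_RANDOM.VALUE({low}, {high})",
    "POSTGRESQL": "({low} + random() * ({high} - {low}))",
    "MYSQL": "({low} + RAND() * ({high} - {low}))",
}


def random_decimal_sql(technology, low, high):
    """Return the SQL text that yields a random decimal on the given technology."""
    try:
        template = RANDOM_DECIMAL[technology.upper()]
    except KeyError:
        raise ValueError(f"no implementation for {technology!r}") from None
    return template.format(low=low, high=high)


print(random_decimal_sql("Oracle", 1, 10000))      # DBMS_RANDOM.VALUE(1, 10000)
print(random_decimal_sql("PostgreSQL", 1, 10000))  # (1 + random() * (10000 - 1))
```

The mappings keep calling the same function name, and only the expression behind it changes per database.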

Enjoy!

 


All Screenshots were taken using version 10.1.3.5 of ODI. Actual icons and graphical representations may vary with other versions of ODI.

