Thursday Jun 07, 2012

ODI 11g – Faster Files

Deep in the trenches of ODI development I raised my head above the parapet to read a few odds and ends and then think why don’t they know this? Such as this article here – in the past customers (see forum) were told to use a staging route which has a big overhead for large files. This KM is an example of the great extensibility capabilities of ODI, its quite simple, just a new KM that;

  1. improves the out of the box experience – just build the mapping and the appropriate KM is used
  2. improves out of the box performance for file to file data movement.

This improvement for out of the box handling for File to File data integration cases (from the 11.1.1.5.2 companion CD and on) dramatically speeds up the file integration handling. In the past I had seem some consultants write perl versions of the file to file integration case, now Oracle ships this KM to fill the gap. You can find the documentation for the IKM here. The KM uses pure java to perform the integration, using java.io classes to read and write the file in a pipe – it uses java threading in order to super-charge the file processing, and can process several source files at once when the datastore's resource name contains a wildcard. This is a big step for regular file processing on the way to super-charging big data files using Hadoop – the KM works with the lightweight agent and regular filesystems.

So in my design below transforming a bunch of files, by default the IKM File to File (Java) knowledge module was assigned. I pointed the KM at my JDK (since the KM generates and compiles java), and I also increased the thread count to 2, to take advantage of my 2 processors.

For my illustration I transformed (can also filter if desired) and moved about 1.3Gb with 2 threads in 140 seconds (with a single thread it took 220 seconds) - by no means was this on any super computer by the way. The great thing here is that it worked well out of the box from the design to the execution without any funky configuration, plus, and a big plus it was much faster than before,

So if you are doing any file to file transformations, check it out!

Tuesday Mar 27, 2012

ODI 11g – Oracle Multi Table Insert

With the IKM Oracle Multi Table Insert you can generate Oracle specific DML for inserting into multiple target tables from a single query result – without reprocessing the query or staging its result.

When designing this to exploit the IKM you must split the problem into the reusable parts – the select part goes in one interface (I named SELECT_PART), then each target goes in a separate interface (INSERT_SPECIAL and INSERT_REGULAR).

So for my statement below…

/*INSERT_SPECIAL interface */ insert  all
when 1=1 And (INCOME_LEVEL > 250000) then
into SCOTT.CUSTOMERS_NEW
(ID, NAME, GENDER, BIRTH_DATE, MARITAL_STATUS, INCOME_LEVEL, CREDIT_LIMIT, EMAIL, USER_CREATED, DATE_CREATED, USER_MODIFIED, DATE_MODIFIED)
values
(ID, NAME, GENDER, BIRTH_DATE, MARITAL_STATUS, INCOME_LEVEL, CREDIT_LIMIT, EMAIL, USER_CREATED, DATE_CREATED, USER_MODIFIED, DATE_MODIFIED)

/* INSERT_REGULAR interface */ when 1=1  then
into SCOTT.CUSTOMERS_SPECIAL
(ID, NAME, GENDER, BIRTH_DATE, MARITAL_STATUS, INCOME_LEVEL, CREDIT_LIMIT, EMAIL, USER_CREATED, DATE_CREATED, USER_MODIFIED, DATE_MODIFIED)
values
(ID, NAME, GENDER, BIRTH_DATE, MARITAL_STATUS, INCOME_LEVEL, CREDIT_LIMIT, EMAIL, USER_CREATED, DATE_CREATED, USER_MODIFIED, DATE_MODIFIED)
/*SELECT*PART interface */ select   
    CUSTOMERS.EMAIL EMAIL,
    CUSTOMERS.CREDIT_LIMIT CREDIT_LIMIT,
    UPPER(CUSTOMERS.NAME) NAME,
    CUSTOMERS.USER_MODIFIED USER_MODIFIED,
    CUSTOMERS.DATE_MODIFIED DATE_MODIFIED,
    CUSTOMERS.BIRTH_DATE BIRTH_DATE,
    CUSTOMERS.MARITAL_STATUS MARITAL_STATUS,
    CUSTOMERS.ID ID,
    CUSTOMERS.USER_CREATED USER_CREATED,
    CUSTOMERS.GENDER GENDER,
    CUSTOMERS.DATE_CREATED DATE_CREATED,
    CUSTOMERS.INCOME_LEVEL INCOME_LEVEL
from    SCOTT.CUSTOMERS   CUSTOMERS
where    (1=1)

Firstly I create a SELECT_PART temporary interface for the query to be reused and in the IKM assignment I state that it is defining the query, it is not a target and it should not be executed.

Then in my INSERT_SPECIAL interface loading a target with a filter, I set define query to false, then set true for the target table and execute to false. This interface uses the SELECT_PART query definition interface as a source.

Finally in my final interface loading another target I set define query to false again, set target table to true and execute to true – this is the go run it indicator!

To coordinate the statement construction you will need to create a package with the select and insert statements. With 11g you can now execute the package in simulation mode and preview the generated code including the SQL statements.

Hopefully this helps shed some light on how you can leverage the Oracle MTI statement. A similar IKM exists for Teradata. The ODI IKM Teradata Multi Statement supports this multi statement request in 11g, here is an extract from the paper at www.teradata.com/white-papers/born-to-be-parallel-eb3053/

Teradata Database offers an SQL extension called a Multi-Statement Request that allows several distinct SQL statements to be bundled together and sent to the optimizer as if they were one. Teradata Database will attempt to execute these SQL statements in parallel. When this feature is used, any sub-expressions that the different SQL statements have in common will be executed once, and the results shared among them.

It works in the same way as the ODI MTI IKM, multiple interfaces orchestrated in a package, each interface contributes some SQL, the last interface in the chain executes the multi statement.

Tuesday Nov 16, 2010

ODI 11g – Simple, Flexible, Powerful

 

SQL as a source, inline SQL, SQL override...want to know more ....? A couple of specific enhancements in the code generation capabilities of ODI 11g, opens the door to an area of code generation which provides great benefit from the ability to have SQL as a source to the construction of even better performing code to better interface designs and accelerators.

So what are they? The two key changes are;

  • support for sub-select to chain, when possible, multiple interfaces into one. With ODI 11g the code generation capabilities of ODI were extended in order to build larger more complex interface designs without the necessity of staging via temporary interfaces. There have already been blog posts on this subject, for example see Uli Bethke's post here.
  • a property in the KM definition to define current command for sub-select. A new property defined on a command in a KM to indicate to the ODI code generator that this command should be used for generating the SQL statement for a sub-select. This one is the key change in addition to the support for sub-select that opens the door. Is the penny dropping yet?

These two together open the door to building operator(or transformation)-like KMs within the bounds of the information that can be defined by a temporary interface and the options on the KM.

Let's see...

Simple Example - The SQL as Source IKM

Let's build a very simple example that is similar to Informatica's SQL override capability or inline views - SQL as source. So to the new breed of code template an IKM with the SQL code generator to be used for sub-query. We define an IKM and have the target technology just now set to Oracle;

image

Now comes the interesting part, we can define a command to fulfill the SQL generation. This can be as simple as actually just being the SQL code (SQL as source) or something a little more complex such as fulfilling a transformation such as a pivot or table function.

Let's check out the command definition, its very simple - the SQL representing the subquery will be an option on the IKM. Don't worry you won't have to build this KM its on the code samples already, you can download the KM here. (I built it with Oracle as a target technology at first then changed for all using undefined technology). Also it will become blatantly obvious how its used when you see a simple example.

image

From the above you see 2 things; 'Use current command for Derived-Table sub-select statement' is checked, a bit of a mouthful, basically ODI will use THIS command in the IKM for generating the SQL when a temporary interface is used in an interface and the sub-select check box is selected.

The IKM has an option VIEWQUERY. When the IKM is used in an interface design this is where you will define the SQL as a source.

image

That's a peek inside the IKM, we can now build temporary interfaces that are essentially inline view designs that will define the view columns (the target datastore columns in the temporary interface) and the SQL query which is to be used to realize the interface. With ODI we can go the extra mile and use the ODI reference APIs to ensure the SQL is portable also across systems so that we don't hard-wire access, schemas etc.

Building the inline view definition

Like any view (in database terms) we define a signature - the columns to be projected are defined on the temporary target datastore on an ODI temporary interface, below you see I have defined a comment (--) for the mapping expression and the indicator is for the target;

image

The execution unit is assigned the SQL_as_Source IKM, where the SQL query is then defined on the VIEWQUERY option.

image

Above I have hard-wired the query, as mentioned we can use ODI reference APIs to get the physical name from the datastore in a flexible manner. So the query could be defined using the odiRef.getObjectName which makes the statement much more flexible and provides flexibility via the context to use different physical schemas for example.

image

That is the SQL as a source or inline view defined. We can reuse this definition many times in different interfaces. Let's see how it is used.

Using the inline view in an interface

The temporary interface (our inline view definition) can be dropped into other interfaces and used like any other interface in ODI, now in 11g as we have seen from other posts we can include the interface as a sub-select. The INLINE_VIEW can be joined  with other tables etc and used like any other interface artifact.

image

Above I am using the INLINE_VIEW temporary interface and have selected the Use Temporary Interface as Derived Table (Sub-Select), since the tables used by the view are from the same server as the BONUS table.

So what code gets generated?

Simulating the code in ODI 11g we see the SQL statement we defined in our temporary interface nested inside the overall code (this was using the had-wired SQL rather than the odiRef'd version).

image

This KM lets you then use a temporary interface as your SQL Query or inline view and provides a great mechanism to balance fully fledged design using interfaces and integrating custom arbitrary SQL.

You can download the SQL as Source KM from the code samples here on java.net. The example illustrates a capability that we can exploit for other cases - more to come!

 

About

ETL, CDC, Real-Time DI and Data Quality for the Oracle Database from the inside.

Search

Archives
« July 2015
SunMonTueWedThuFriSat
   
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
 
       
Today