Thursday Jun 07, 2012

ODI 11g – Faster Files

Deep in the trenches of ODI development I raised my head above the parapet to read a few odds and ends and then think why don’t they know this? Such as this article here – in the past customers (see forum) were told to use a staging route which has a big overhead for large files. This KM is an example of the great extensibility capabilities of ODI, its quite simple, just a new KM that;

  1. improves the out of the box experience – just build the mapping and the appropriate KM is used
  2. improves out of the box performance for file to file data movement.

This improvement for out of the box handling for File to File data integration cases (from the 11.1.1.5.2 companion CD and on) dramatically speeds up the file integration handling. In the past I had seem some consultants write perl versions of the file to file integration case, now Oracle ships this KM to fill the gap. You can find the documentation for the IKM here. The KM uses pure java to perform the integration, using java.io classes to read and write the file in a pipe – it uses java threading in order to super-charge the file processing, and can process several source files at once when the datastore's resource name contains a wildcard. This is a big step for regular file processing on the way to super-charging big data files using Hadoop – the KM works with the lightweight agent and regular filesystems.

So in my design below transforming a bunch of files, by default the IKM File to File (Java) knowledge module was assigned. I pointed the KM at my JDK (since the KM generates and compiles java), and I also increased the thread count to 2, to take advantage of my 2 processors.

For my illustration I transformed (can also filter if desired) and moved about 1.3Gb with 2 threads in 140 seconds (with a single thread it took 220 seconds) - by no means was this on any super computer by the way. The great thing here is that it worked well out of the box from the design to the execution without any funky configuration, plus, and a big plus it was much faster than before,

So if you are doing any file to file transformations, check it out!

Tuesday Mar 27, 2012

ODI 11g – Oracle Multi Table Insert

With the IKM Oracle Multi Table Insert you can generate Oracle specific DML for inserting into multiple target tables from a single query result – without reprocessing the query or staging its result.

When designing this to exploit the IKM you must split the problem into the reusable parts – the select part goes in one interface (I named SELECT_PART), then each target goes in a separate interface (INSERT_SPECIAL and INSERT_REGULAR).

So for my statement below…

/*INSERT_SPECIAL interface */ insert  all
when 1=1 And (INCOME_LEVEL > 250000) then
into SCOTT.CUSTOMERS_NEW
(ID, NAME, GENDER, BIRTH_DATE, MARITAL_STATUS, INCOME_LEVEL, CREDIT_LIMIT, EMAIL, USER_CREATED, DATE_CREATED, USER_MODIFIED, DATE_MODIFIED)
values
(ID, NAME, GENDER, BIRTH_DATE, MARITAL_STATUS, INCOME_LEVEL, CREDIT_LIMIT, EMAIL, USER_CREATED, DATE_CREATED, USER_MODIFIED, DATE_MODIFIED)

/* INSERT_REGULAR interface */ when 1=1  then
into SCOTT.CUSTOMERS_SPECIAL
(ID, NAME, GENDER, BIRTH_DATE, MARITAL_STATUS, INCOME_LEVEL, CREDIT_LIMIT, EMAIL, USER_CREATED, DATE_CREATED, USER_MODIFIED, DATE_MODIFIED)
values
(ID, NAME, GENDER, BIRTH_DATE, MARITAL_STATUS, INCOME_LEVEL, CREDIT_LIMIT, EMAIL, USER_CREATED, DATE_CREATED, USER_MODIFIED, DATE_MODIFIED)
/*SELECT*PART interface */ select   
    CUSTOMERS.EMAIL EMAIL,
    CUSTOMERS.CREDIT_LIMIT CREDIT_LIMIT,
    UPPER(CUSTOMERS.NAME) NAME,
    CUSTOMERS.USER_MODIFIED USER_MODIFIED,
    CUSTOMERS.DATE_MODIFIED DATE_MODIFIED,
    CUSTOMERS.BIRTH_DATE BIRTH_DATE,
    CUSTOMERS.MARITAL_STATUS MARITAL_STATUS,
    CUSTOMERS.ID ID,
    CUSTOMERS.USER_CREATED USER_CREATED,
    CUSTOMERS.GENDER GENDER,
    CUSTOMERS.DATE_CREATED DATE_CREATED,
    CUSTOMERS.INCOME_LEVEL INCOME_LEVEL
from    SCOTT.CUSTOMERS   CUSTOMERS
where    (1=1)

Firstly I create a SELECT_PART temporary interface for the query to be reused and in the IKM assignment I state that it is defining the query, it is not a target and it should not be executed.

Then in my INSERT_SPECIAL interface loading a target with a filter, I set define query to false, then set true for the target table and execute to false. This interface uses the SELECT_PART query definition interface as a source.

Finally in my final interface loading another target I set define query to false again, set target table to true and execute to true – this is the go run it indicator!

To coordinate the statement construction you will need to create a package with the select and insert statements. With 11g you can now execute the package in simulation mode and preview the generated code including the SQL statements.

Hopefully this helps shed some light on how you can leverage the Oracle MTI statement. A similar IKM exists for Teradata. The ODI IKM Teradata Multi Statement supports this multi statement request in 11g, here is an extract from the paper at www.teradata.com/white-papers/born-to-be-parallel-eb3053/

Teradata Database offers an SQL extension called a Multi-Statement Request that allows several distinct SQL statements to be bundled together and sent to the optimizer as if they were one. Teradata Database will attempt to execute these SQL statements in parallel. When this feature is used, any sub-expressions that the different SQL statements have in common will be executed once, and the results shared among them.

It works in the same way as the ODI MTI IKM, multiple interfaces orchestrated in a package, each interface contributes some SQL, the last interface in the chain executes the multi statement.

About

ETL, CDC, Real-Time DI and Data Quality for the Oracle Database from the inside.

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today