ODI 11g – Faster Files

Deep in the trenches of ODI development, I raised my head above the parapet to read a few odds and ends, and found myself thinking: why don't people know this? Take this article here: in the past, customers (see forum) were told to use a staging route, which has a big overhead for large files. This KM is a good example of ODI's extensibility. It's quite simple, just a new KM that:

  1. improves the out of the box experience – just build the mapping and the appropriate KM is used
  2. improves out of the box performance for file to file data movement.

This improvement to the out-of-the-box handling of file-to-file data integration (available from the 11.1.1.5.2 companion CD onward) dramatically speeds up file integration. In the past I had seen some consultants write Perl versions of the file-to-file integration case; now Oracle ships this KM to fill the gap. You can find the documentation for the IKM here. The KM uses pure Java to perform the integration, using java.io classes to read and write the file in a pipe. It uses Java threading to speed up the file processing, and can process several source files at once when the datastore's resource name contains a wildcard. This is a big step for regular file processing on the way to super-charging big data files using Hadoop: the KM works with the lightweight agent and regular filesystems.
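To give a feel for the approach, here is a minimal sketch of threaded file-to-file processing in plain Java. It is not the code the KM generates: the class name ThreadedFileCopy, the transform method and the upper-casing rule are all made up for illustration, and the sketch assumes UTF-8 text files where each line is one row.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ThreadedFileCopy {

    // Hypothetical transformation, standing in for the mapping expressions.
    static String transform(String line) {
        return line.toUpperCase();
    }

    // Reads each source file on a pool of worker threads and appends the
    // transformed rows to a single target file. Returns the rows written.
    static long copy(List<Path> sources, Path target, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            List<Future<Long>> results = new ArrayList<>();
            for (Path source : sources) {
                Callable<Long> task = () -> {
                    long count = 0;
                    try (BufferedReader in = Files.newBufferedReader(source, StandardCharsets.UTF_8)) {
                        String line;
                        while ((line = in.readLine()) != null) {
                            String row = transform(line);
                            // Only one thread writes at a time, so rows stay intact.
                            synchronized (out) {
                                out.write(row);
                                out.newLine();
                            }
                            count++;
                        }
                    }
                    return count;
                };
                results.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> result : results) {
                total += result.get(); // waits for the worker to finish
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling ThreadedFileCopy.copy(sources, target, 2) processes the source files on two threads. Rows from different files can interleave in the target, so this sketch only suits cases where row order does not matter.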

In my design below, which transforms a bunch of files, the IKM File to File (Java) knowledge module was assigned by default. I pointed the KM at my JDK (since the KM generates and compiles Java), and I also increased the thread count to 2 to take advantage of my 2 processors.

For my illustration I transformed (filtering is also possible if desired) and moved about 1.3 GB with 2 threads in 140 seconds. With a single thread it took 220 seconds, so the second thread made the run roughly 1.6 times faster. This was by no means on a supercomputer, by the way. The great thing here is that it worked well out of the box, from design to execution, without any funky configuration. Plus, and it's a big plus, it was much faster than before.

So if you are doing any file to file transformations, check it out!

Comments:

There is a bug, 13646250, in this IKM on Windows: the single quotes wrapping the javac command cause a problem in the Compile Program task, and the same happens with the java command in the Execute Program task. Removing the quotes in both tasks makes it work for me; you can change the IKM yourself and try.

Change the Compile Program task from ...
OdiOSCommand "-COMMAND='<%= odiRef.getOption("JAVA_HOME") %>/bin/javac' <%= odiRef.getSrcTablesList("", "[WORK_SCHEMA]", "", "") %>/<?= getOdiClassName() ?>.java" "-ERR_FILE=<%= odiRef.getSrcTablesList("", "[WORK_SCHEMA]", "", "") %>/<?= getOdiClassName() ?>_err.txt" "-OUT_FILE=<%= odiRef.getSrcTablesList("", "[WORK_SCHEMA]", "", "") %>/<?= getOdiClassName() ?>_out.txt"

to
OdiOSCommand "-COMMAND=<%= odiRef.getOption("JAVA_HOME") %>/bin/javac <%= odiRef.getSrcTablesList("", "[WORK_SCHEMA]", "", "") %>/<?= getOdiClassName() ?>.java" "-ERR_FILE=<%= odiRef.getSrcTablesList("", "[WORK_SCHEMA]", "", "") %>/<?= getOdiClassName() ?>_err.txt" "-OUT_FILE=<%= odiRef.getSrcTablesList("", "[WORK_SCHEMA]", "", "") %>/<?= getOdiClassName() ?>_out.txt"

Change the Execute Program task from ...
OdiOSCommand "-COMMAND='<%= odiRef.getOption("JAVA_HOME") %>/bin/java' -cp <%= odiRef.getSrcTablesList("", "[WORK_SCHEMA]", "", "") %> <?= getOdiClassName() ?>" "-ERR_FILE=<%= odiRef.getSrcTablesList("", "[WORK_SCHEMA]", "", "") %>/<?= getOdiClassName() ?>_err.txt" "-OUT_FILE=<%= odiRef.getSrcTablesList("", "[WORK_SCHEMA]", "", "") %>/<?= getOdiClassName() ?>_out.txt"

to
OdiOSCommand "-COMMAND=<%= odiRef.getOption("JAVA_HOME") %>/bin/java -cp <%= odiRef.getSrcTablesList("", "[WORK_SCHEMA]", "", "") %> <?= getOdiClassName() ?>" "-ERR_FILE=<%= odiRef.getSrcTablesList("", "[WORK_SCHEMA]", "", "") %>/<?= getOdiClassName() ?>_err.txt" "-OUT_FILE=<%= odiRef.getSrcTablesList("", "[WORK_SCHEMA]", "", "") %>/<?= getOdiClassName() ?>_out.txt"

Cheers
David

Posted by David on June 25, 2012 at 10:02 AM PDT #

Hi David,
Can you provide the link to download the IKM File to File (Java)?
Thanks.

Regards
Ashok

Posted by Ashok on November 22, 2012 at 07:48 AM PST #

Hi David,

When I use this KM, I get one error in the log file:
Column C1 : is mandatory

What could be the reason behind this? All the data is going to the bad file.
Please suggest.
Thanks

Posted by guest on November 23, 2012 at 04:02 AM PST #

Have you specified data for all mandatory target columns? It sounds like you haven't.

Cheers
David

Posted by David on November 26, 2012 at 08:48 AM PST #

Yes, I have mappings for all target columns; it's a one-to-one mapping. Setting a key column on the target side of the interface makes no difference: I get the same error in that case too. What I did was reverse-engineer two text files with a pipe delimiter, then create a one-to-one mapping and select the IKM File to File (Java). But no luck so far.

When I hard-code a value such as "11" in the target column, I can see a few records loaded, but in a different format.
Input file
C1|C2|C3|C4
1412|tom|333|4455
2414|alex|333|4455
103|aaac|33|4435
4525|aaad|33|4415
5525|aaae|333|445
6525|aaaf|333|55
7|aaag|333|4955
8|aaah|33|4625
9|aaai|33|325
10|aaaj|336|4545
11|aaak|334|65

output file has blank rows
output log file contains
Oracle Data Integrator * File to File:
Copyright (c) Oracle Corporation. All rights reserved.

Number of threads: 1

Discradmax: 1

OutputFile: D:/FF/fjava_tgt/fjava_tgt.txt
BAD file: D:/FF/fjava_tgt/fjava_tgt.txt.bad

Pattern: D:\\FF/fjava1\.txt
Input file: D:\FF\fjava1.txt
Error line: 2 Column C1 : is mandatory
Maximum number of errors reached

Number of lines read for this file: 2

*************************************************************************
2 Rows successfully read.
1 Rows skipped (Header).
0 Rows successfully loaded.
==>0 Rows loaded with warning.
1 Rows not loaded due to data errors.
0 Rows not loaded because of filter.
Run began on Fri Nov 23 18:39:56 IST 2012
Run ended on Fri Nov 23 18:39:56 IST 2012
Elapsed time was: 140 milliseconde

Posted by guest on November 26, 2012 at 08:59 AM PST #

I found the problem: the generated Java tokenizes each line using the split function, which takes a regular expression, and the pipe (|) is a special regular expression character.

To fix it, escape the pipe character: change the field delimiter defined for your source file to \|
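Here is a minimal sketch of the behaviour, assuming the delimiter is passed straight to String.split. The class and method names are made up for illustration and are not taken from the KM. Pattern.quote is shown as another way to treat a delimiter literally in your own Java code; it is not an option of the KM.

```java
import java.util.regex.Pattern;

class PipeSplitDemo {

    // Naive split: the delimiter is interpreted as a regular expression.
    static String[] splitRaw(String line, String delimiter) {
        return line.split(delimiter);
    }

    // Literal split: Pattern.quote turns the delimiter into a literal,
    // and the -1 limit keeps trailing empty fields.
    static String[] splitLiteral(String line, String delimiter) {
        return line.split(Pattern.quote(delimiter), -1);
    }

    public static void main(String[] args) {
        String line = "1412|tom|333|4455";
        // "|" alone is regex alternation between two empty patterns,
        // so the line is not cut into its four fields.
        System.out.println(splitRaw(line, "|").length);
        // Escaped pipe: four fields, as intended.
        System.out.println(splitRaw(line, "\\|").length);
        // Quoted delimiter: also four fields.
        System.out.println(splitLiteral(line, "|").length);
    }
}
```

The exact tokens the unescaped split returns differ between Java versions, but in every case the row is not split into the four columns of the input file.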

Cheers
David

Posted by David on November 26, 2012 at 02:13 PM PST #

I raised a bug to track this...15918659

Posted by David on November 26, 2012 at 02:22 PM PST #

Thanks David for the information.

Posted by Ashok on November 27, 2012 at 06:26 AM PST #

Hi David,

Can you please provide the link for downloading this IKM File to File (Java)? The same scenario has come up for me.

Kindly do the needful.

Many Thanks,
Pavan Kumar

Posted by Pavan KUmar on February 21, 2013 at 03:40 AM PST #

Hi Pavan

You can get it from the ODI companion CD from OTN.

Cheers
David

Posted by David on February 21, 2013 at 09:16 AM PST #
