Wednesday Jul 09, 2014

ODI 11g - HDFS Files to Oracle, community post from ToadWorld

There is a new tutorial on using ODI to move HDFS files into an Oracle database using OLH-OSCH from Deepak Vohra on the ToadWorld blog. This article covers all the setup required in great detail and will be very helpful if you're planning on integrating with HDFS files.

Thursday Aug 29, 2013

ODI - File transformation debatching

Carrying on from the blog post on performing file transformations within ODI using nXSD, in this post I will show how a large file can be debatched into smaller chunks/shards. The example uses a COBOL data file as input and debatches this file into parts using the debatching capability of the adapter.

COBOL is a great example of a complex data source for Hadoop and Big Data initiatives since there are plenty of systems that companies wish to unlock potential gems from. Its big data and there is tonnes of it! The ODI file transformer tool on has been extended to include an optional parameter to define the number of rows in the chunk/shard. Having this parameter now lets you take control of the relative size of the files being processed - perhaps the platform has certain characteristics that work better than others. Hadoop has challenges with millions of small files and with very very large ones, so being able to prepare the data is useful.

The COBOL copybook was used as an input in the Native Format Builder and an nXSD generated for the copybook, I used the following copybook;

  •         02  DTAR020.                                      
  •         03  DTAR020-KCODE-STORE-KEY.                                      
  •             05 DTAR020-KEYCODE-NO      PIC X(08).                         
  •             05 DTAR020-STORE-NO        PIC S9(03)   COMP-3.               
  •         03  DTAR020-DATE               PIC S9(07)   COMP-3.               
  •         03  DTAR020-DEPT-NO            PIC S9(03)   COMP-3.               
  •         03  DTAR020-QTY-SOLD           PIC S9(9)    COMP-3.               
  •         03  DTAR020-SALE-PRICE         PIC S9(9)V99 COMP-3.               

The data file came from an example on the web. Below you can see for the ODIFileTransformer tool, an actual example of the command executed, the SHARDROWS parameter defines the number of rows to be written to each data file. The tool simply adds an integer index to the end of the output parameter - a little basic I know, the source is on if you feel like being creative.

ODIFileTransformer "-INPUT=D:\input\cbl_data.bin" "-SCHEMA=D:\reference\cbl.xsd" "-OUTPUT=d:\output\out.xml" "-ROOT=ROOT" "-SHARDROWS=2" 

Executing this generates many output files with for example 2 rows in each, this is a contrived example just to illustrate the 2 rows in the generated file;

  1. <ROOT xmlns="">
  2.    <DTAR020>
  3.       <DTAR020-KCODE-STORE-KEY>
  4.          <DTAR020-KEYCODE-NO>69684558</DTAR020-KEYCODE-NO>
  5.          <DTAR020-STORE-NO>+020</DTAR020-STORE-NO>
  6.       </DTAR020-KCODE-STORE-KEY>
  7.       <DTAR020-DATE>+0040118</DTAR020-DATE>
  8.       <DTAR020-DEPT-NO>+280</DTAR020-DEPT-NO>
  9.       <DTAR020-QTY-SOLD>+000000001</DTAR020-QTY-SOLD>
  10.       <DTAR020-SALE-PRICE>+00000019.00</DTAR020-SALE-PRICE>
  11.    </DTAR020>
  12.    <DTAR020>
  13.       <DTAR020-KCODE-STORE-KEY>
  14.          <DTAR020-KEYCODE-NO>69684558</DTAR020-KEYCODE-NO>
  15.          <DTAR020-STORE-NO>+020</DTAR020-STORE-NO>
  16.       </DTAR020-KCODE-STORE-KEY>
  17.       <DTAR020-DATE>+0040118</DTAR020-DATE>
  18.       <DTAR020-DEPT-NO>+280</DTAR020-DEPT-NO>
  19.       <DTAR020-QTY-SOLD>-000000001</DTAR020-QTY-SOLD>
  20.       <DTAR020-SALE-PRICE>-00000019.00</DTAR020-SALE-PRICE>
  21.    </DTAR020>
  22. </ROOT>
This is a small extension to the earlier version of the tool and provides a useful debatching capability. The ZIP containing the tool is on the site, see the earlier post on some background details. ...and then what? Well that would be telling.

Monday Aug 05, 2013

ODI - Complex file example, multi record row

Here we will see how a parser is defined to project a schema based on a data file with many records in a single line in the file. The complex file adapter provides a plethora of capabilities and is probably one of those under utilized and little known components - plus I think its one that the more examples provided for it the better! This example I will show has a department defined on every line and each department can have many employees (as many as you want) defined on the same line. Replace Department and Employee for your own concepts; buyers, sellers, or whatever. We define the parser in an NXSD file, which is based on the standard XSD with additional information to define how the data is parsed and projected as XML. If you know XSD you are in business, if not, with a little knowledge you can become very dangerous - resort to the examples in the Oracle documentation if not.

My source data file has the following content;


I want to define a parser that creates departments for each line (ODI is a department and OWF is a department, its a 3 character string) and also has employees (David, Bob etc a 7 character string) with their job roles (ARCHITECT, DEVELOPER etc a 9 character string) and the department they are in.

The following NXSD file does exactly that;

  1. <?xml version="1.0" encoding="UTF-8" ?>

  2. <xsd:schema xmlns:xsd=""
  3.             xmlns:nxsd=""
  4.             targetNamespace=""
  5.             xmlns:tns=""
  6.             nxsd:encoding="ASCII" nxsd:stream="chars" nxsd:version="NXSD">

  7.   <xsd:element name="Departments">
  8.     <xsd:complexType>
  9.       <xsd:sequence>
  10.         <xsd:element name="Department" type="tns:DepartmentT" maxOccurs="unbounded"/>
  11.       </xsd:sequence>
  12.     </xsd:complexType>
  13.   </xsd:element>

  14.   <xsd:complexType name="DepartmentT">
  15.     <xsd:sequence>
  16.         <xsd:element name="Name" type="xsd:string" nxsd:style="fixedLength" nxsd:length="3" />
  17.         <xsd:element name="Employee" type="tns:EmployeeT"  nxsd:style="array" maxOccurs="unbounded" nxsd:arrayTerminatedBy="${eol}" />
  18.     </xsd:sequence>
  19.   </xsd:complexType>

  20.   <xsd:complexType name="EmployeeT">
  21.          <xsd:sequence>
  22.            <xsd:element name="Name" type="xsd:string" nxsd:style="fixedLength" nxsd:length="7"/>
  23.            <xsd:element name="Role" type="xsd:string" nxsd:style="fixedLength" nxsd:length="9"/>
  24.          </xsd:sequence>
  25.   </xsd:complexType>

  26. </xsd:schema>

My employee records above happen to fixed length based on the length of the name and the role, there are many of them, so they are in an unbounded array and the array is terminated by an EOL character. When I reverse this into ODI I get datastores for DEPARTMENTS, DEPARTMENT and EMPLOYEE; EMPLOYEE has columns Name and Role, DEPARTMENT has column Name.

I can define this NXSD in my favorite editor and develop and test the parsing using ODI, this avoids installing and using JDeveloper and gives you a shortcut to quickly developing and building parsers using this capability. Previous blogs on the topic are here and here.

Saturday Dec 29, 2012

ODI - Hive and Complex JSON

Time for a nested JSON example using Hive external tables. ODI treats nested complex types in Hive the same way it treats types in other technologies such as Oracle, the type name is captured, not the definition - you can see XMLType or SDO_GEOMETRY as an example within the ODI Oracle technology. The Hive technology in ODI also has the ARRAY, STRUCT and MAP types seeded. For the nested JSON example I will use the example define in this JSON SerDe page here.

The external table definition is below, I have defined this in Hive and reverse engineered into ODI just like the previous post. Note it is using a different SerDe from the post here, when I tried using that Google SerDe the data was not projected properly (all data was projected as beware of components used). Just like the previous post we need to add the jar whenever it is used (plus don't forget the ODI_HIVE_SESSION_JARS), either from Hive or ODI;

  1. ADD JAR /home/oracle/json/hive-serde-1.0.jar;
  3.   messageid string,
  4.   messagesize int,
  5.   sender string,
  6.   recipients array<string>,
  7.   messageparts array<struct<
  8.     extension: string,
  9.     size: int
  10.   >>,
  11.   headers map<string,string>
  12. )
  13. ROW FORMAT SERDE 'com.proofpoint.hive.serde.JsonSerde'
  14. LOCATION '/user/oracle/json_complex';

This external table has ARRAY fields, STRUCT fields and MAP fields, so we are going above and beyond simple types. The data I will use is the same as the referenced web page;

  1. {
  2.   "messageId": "34dd0d3c-f53b-11e0-ac12-d3e782dff199",
  3.   "messageSize": 12345,
  4.   "sender": "",
  5.   "recipients": ["", ""],
  6.   "messageParts": [
  7.     {
  8.       "extension": "pdf",
  9.       "size": 4567
  10.     },
  11.     {
  12.       "extension": "jpg",
  13.       "size": 9451
  14.     }
  15.   ],
  16.   "headers": {
  17.     "Received-SPF": "pass",
  18.     "X-Broadcast-Id": "9876"
  19.   }
  20. }

Again, I will use the Hive RKM that I mentioned in the post here in order to reverse engineer the external table defined in Hive into ODI. Below you can see the table and how it is represented in ODI. You can see the recipients and messageparts columns are defined as ARRAY, and headers is a MAP.

We can view the data in ODI just like other tables - remember the external table is projected the JSON data from HDFS through Hive;

When the datastore is used in an interface these are the columns that are available in mapping, you can use any of the Hive functions or operators available on these complex types, plus constructors for building them. For example to retrieve the 1st element in the array the following Hive QL can be used;

  • select sender, recipients[0] from message;

 You can build such expressions in ODI;

You just need to be aware of the datatypes you are using and the functions available. Haven't gone into complex SQL/Hive QL here, but you see the basic mechanics are pretty straightforward. One of the points that comes home here is the functionality level of the non-core pieces of Hadoop, so the first 2 JSON SerDe libraries I have used support different capabilities of JSON - so be aware.

ODI - Hive External Tables, reverse engineering

Here we see Hive external tables on JSON data files in action and being reversed engineered from Hive into ODI. Carrying on from my earlier post on Hive reverse engineering here we will tackle external tables with primitive data types. Just like Oracle external tables for accessing data on the filesystem, Hive external tables can access big data files in HDFS. This example uses a JSON SerDe for Hive which was downloaded from here. The external tables in Hive are quite nice! You just specify the HDFS directory and all files in that directory are projected through the external table definition, also you can specify sophisticated SerDe classes that comply with the Hive SerDe interfaces - these classes can reverse engineer complex types also which make them very useful.

Before launching ODI or the agent for this I added the SerDe JAR file to this environment variable used by ODI, here is what I did in my system;

  • export ODI_HIVE_SESSION_JARS=$ODI_HIVE_SESSION_JARS:/home/oracle/json/hive-json-serde-0.2.jar

I have 2 JSON files with data in my HDFS under /user/oracle/json I will specify this directory in the external table definition, along with the JSON SerDe. Within my Hive shell I can define the external table, remembering to add the JSON SerDe file (using the ADD JAR command) just before I create the external table.

  1. ADD JAR /home/oracle/json/hive-json-serde-0.2.jar;
  3.           field1 string, field2 int, field3 string, field4 double
  4.        )
  5.        ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'
  6.        LOCATION '/user/oracle/json';

 The data in the first JSON file looks like the following, this was just an example from the google code snippets;

  • {"field1":"data1","field2":100,"field3":"more data1","field4":123.001}
  • {"field1":"data2","field2":200,"field3":"more data2","field4":123.002}
  • {"field1":"data3","field2":300,"field3":"more data3","field4":123.003}
  • {"field1":"data4","field2":400,"field3":"more data4","field4":123.004}

I have defined these fields in the external table. Back in ODI, in my model below I am using the customized reverse with the RKM Hive mentioned above, also for this reverse I just want one table imported, so I defined the mask just to be my_table. Clicking the 'Reverse Engineer' button on the top left of the panel starts the reverse engineering.

If there are errors, then check the ODI Operator, that's where you will see anything that goes wrong. The table in ODI has just some rudimentary information about the number of columns, all of the external table definition for this is defined in Hive, this is the same as when we reverse engineer an Oracle external table. Once the table is captured in ODI I can view the data;

The custom Hive RKM I used is here - this has some changes to the one shipped with ODI to include external table reverse (I included EXTERNAL_TABLE in the getTables API call), plus support for custom SerDe JARs loaded via the environment variable ODI_HIVE_SESSION_JARS (just like the ODI IKMs load the JARs, the RKM also needs to).

Anyway, that's a quick peek at external tables in Hive and reverse engineering them into ODI. Once the table is captured in ODI you can use it in mappings just like other tables!

Thursday Dec 20, 2012

ODI 11g - Loading More Files Faster

This post shows you how to leverage more of your Oracle system, you can load more data and faster using these kind of best practices. The LKM here, lets you leverage the Oracle database's preprocessor to load MORE data faster. The KM also has the capability to leverage a regular expression to identify the file or files so that you can load much much more, it also sorts the files based on size to comply with the most efficient way of loading into Oracle via the external table. The preprocessor can basically do anything, but is commonly used to uncompress data on the fly - so you can eliminate transferring large uncompressed files around or needlessly uncompressing them! There are various creative blog posts around showing you what some customers have done - some even include the transfer in the preprocessing phase.

I posted the LKM on the site here

You can see the LKM configured below, I have used the preprocessor and a regular expression to load all gzipped files in the directory; 

When the code is generated below you can see the PREPROCESSOR directive used, plus you can see that the file names are going to be generated from a Java API call where the regular expression and directory are passed.

My gunzipdb.bat script has the following definition, wrappering the gzip utility (note the use of the -d -c directives);

  • @echo off
  • c:\ext_tab_test\gzip -d -c %1

Check this creative use of the PREPROCESSOR directive here (External table to load compressed data residing on remote locations) which does some creative work in the preprocessing script. This could do pretty much anything you want including transforming from some complex file to project through an external table.

Tuesday Jun 14, 2011

Complex File Processing - 0 to 60

The Complex File technology adds a powerful new driver into the ODI capabilities. It leverages the Fusion Middleware capabilities for supporting complex files, you can then build ODI interfaces just as you are used to!

Using an example from the Oracle® Fusion Middleware User's Guide for Technology Adapters 11g Release 1 (see here), the viewlet quickly builds up the model based on the XSD for the complex file (use Native Format Builder to construct).

 Have a look at the viewlet right here

You can pause it to catch your breath! Its condensed into 60 seconds to keep your attention.

Sunday Sep 13, 2009

How to Define Multi Record Format Files in ODI

The posts in this series assume that you have some level of familiarity with ODI. The concepts of Datastore, Model and Logical Schema are used here assuming that you understand them in the context of ODI. If you need more details on these elements, please refer to the ODI Tutorial for a quick introduction, or to the complete ODI documentation for more details.

It is not unusual to have to load files containing various record formats. For example a company might store orders and order lines using distinct record formats in the same flat file or you might have one single file containing a header, some records and a footer.
In this post, we'll use the following source file as an example:

1,101,2009/09/01,Computer Parts
2,234,101,Motherboard, Asus P6T,239.99
2,235,101,CPU,Intel Celeron 430,40
1,102,2009/09/02,Computer Parts
2,301,102,CPU,AMD Phenom II X4,170
2,401,103,Inkjet Printer,Canon iP4600,69.99
2,402,103,Inkjet Printer,Epson WF30,39.99
2,403,103,Inkjet Printer,HP Deskjet D2660,49.99

As we can see the Order and Order Lines records have different formats (one has 4 fields, the other has 6 fields), they could also have a different field separator.

Identifying the Record Codes

The first step in order to handle such a file in ODI is to identify a record code, this record code should be unique for a particular record type. In our example the record code will be used by ODI to identify if the record is an Order or an Order Line. All the Order records should have the same record code, this also applies to the Order Lines records.
In our example the first field indicates the record code:
- 1 for Orders records.
- 2 for Order Lines records.

Define the Datastores

We assume that you have already created a Model using a Logical Schema that points to the directory containing your source file.

We will start by defining a datastore for the Order records.

Right-click on the File model and select Insert Datastore.
In the Definition tab, enter a name and specify the flat file resource name in the Resource Name field. 




In the Files tab, specify your flat file settings (delimiter, field separator etc.)

Refer to the ODI documentation for additional information regarding how to define a flat file datastore.



In our example the Order records have 4 fields:

Go to the columns tab and add those 4 columns to your datastore.

Now specify the Record Code in the 'Rec. Code' field of the RECORD_CODE column.




Click OK.



In the Models view, right-click on the datastore and select View Data to display the file content and make sure it is defined correctly. 



The data is filtered based on the record code value, we only see the Order records.


We will now apply the same approach to the Order Lines record.

Right-click on the File model and select Insert Datastore to add a second datastore for the Order Lines record.

In the Definition tab, enter a name and specify the flat file resource name in the Resource Name field. We are pointing this datastore to the same file we used for the Order records. 



In the Files tab, specify your flat file settings (delimiter, field separator etc.).
Refer to the ODI documentation for additional information regarding how to define a flat file datastore.



In our example the Order Lines records have 6 fields:

Go to the columns tab and add those 6 columns to your datastore.

Now specify the Record Code in the 'Rec. Code' field of the RECORD_CODE column.



Click OK.


Right-click on the datastore and select View Data to display the file content and make sure it is defined correctly.



The data is filtered based on the record code value, we only see the Order Lines records.

You can now use those 2 datastores in your interfaces.


All Screenshots were taken using version of ODI. Actual icons and graphical representations may vary with other versions of ODI.

Thursday Apr 09, 2009

Using Parameters in ODI: The Dynamic File Name Example

The posts in this series assume that you have some level of familiarity with ODI. The concepts of Model, Project, Interface and Package are used here assuming that you understand these concepts in the context of ODI. If you need more details on these elements, please refer to the ODI Tutorial for a quick introduction, or to the complete ODI documentation for detailed information on these concepts.

In a previous post we have seen how to use variables in the ODI interfaces and packages.
We used two methods to assign a value these variables: a direct assignment of a value, or a "refresh" statement that was running a SQL query and storing the resulting value in the variable.
Today, we will see how to set the value dynamically as we start our process. The value to be stored in our variable will be passed as a parameter to our process.

The example we will work on is a fairly common one. When dealing with flat files, it is not rare for the actual name of the file to vary over time. An external process may know the file name and pass that name over to ODI for further processing.

For our example to work, we will need to perform two actions:
- Reference the file name dynamically (In the ODI Metadata, we cannot hard-code the file name anymore)
- Pass the file name as a parameter to our process.


1.1 Creation of a Variable

Our first step is to create a variable to store the file name. Variables are defined in projects, even for variables that will be used in the metadata. Select the project where you will use the variable (typically the project that will contain the Interface where the file is used as a source or as a target).
We will create a variable called FileName.

Filename Variable Creation

Defining a default value for the variable is not required, though it is good practice. We do not need an associated query in the refresh tab of the variable object, as we will receive this value as a parameter.
Click Ok to save the definition of the variable.

1.2 Using the variable in the metadata

Now that we have created a variable, we can use it in place of the file name.
Note: If you are not familiar with the definition of Files in the ODI Models (where the metadata are defined) please refer to the ODI Tutorial where you will find a couple of examples.

I usually recommend starting with a hard-coded version of your file name. Copy a version of the file locally, and use it to define the file structure. This will allow you to validate the file format, make sure that the data can be viewed with the ODI GUI, validate alignment of the columns, etc. Pay particularly attention to date columns (unless you define them as strings – after all, this being a file, you can choose the most convenient format for you): dates require the definition of the date format. The ODI documentation gives you the necessary details on how to describe this format (See the Appendix on Date Formats in the Reference Manual).

In our example, we will use the file SRC_SALES_PERSON.txt that is provided as an example with ODI (look under your ODI installation directory under oracledi/demo/file).

Once we have validated that the file format is correct, we will edit the file definition. Double click on the file name in the Model tree, so that we can change the value of the Resource Name. We can keep the Name as is, to have a meaningful description of the nature of the file, and replace the Resource Name with our variable. Whenever using a variable outside of a project, keep in mind that you always have to prefix the variable name with the project CODE. In case of doubt, double click on your project Name in the Project tree: the definition window of the project will show you both the name of the project and the code of the project.

Project Code

In the example above, the project code is SALES.

Back to the Definition of our file, we now change the Resource Name from SRC_SALES_PERSON.txt to #SALES.FileName.

Dynamic File Name

When you click Ok to save this new definition, you can see in the Model tree that ODI will show you both the Name (SRC_SALES_PERSON.txt) and the Resource Name (#SALES.FileName) to indicate that this value will be set dynamically.

File Name In Tree

One important note is that once you use the variable as part of the resource name, you cannot see the data from your file from the GUI as the variable will only be resolved at run-time. From that perspective, it may be useful to have two definitions for the same file: one using the variable, and one that points to a specific file so that debugging is made easier.


2.1 Create a Package and Declare the Variable

You will have to create an interface where you use the file (as you would do for any other file) so that you can really see the operations in action. Then create a package where you drag and drop the FileName variable and the interface. Make sure that the variable is your first step (right-click on the variable icon in the package and select First Step if it is not the case. The first step of your package is identified by a little green triangle over the step name).

Parameters Package

Click on the icon representing the FileName variable and set the action on this variable to Declare Variable. This will authorize the passage of the value as a parameter at run time.

Parameter Declare Variable

2.2 Create a Scenario

You will only be able to pass parameters to a "production ready" object: the scenario. The creation of a scenario is trivial: right-click on the package name, select "generate scenario", edit the scenario name and version if needed and click ok.

Generate Scenario

Set Scenario Name and Version

If you have "declare variable" steps though, you will be prompted to define whether these will be parameters or not.

Scenario Parameters

When this window prompts you, if you do not change anything, all declared variables become placeholders for potential parameter values. Note that in the screenshot above, we have changed the selection from "Use All" to "Selective Use" simply to highlight the variable name. Either selection would have had the same value in our example.

At this point, we now have a scenario that is ready to receive a value for our FileName variable.

2.3 Different ways to start the scenario and pass the parameter

There are many ways to start a scenario. Each one of those will allow you to set the values for your parameters.

2.3.1 Starting the Scenario from Designer or Operator

In both cases, the GUI will prompt you for the value of the parameters after you select in which context and with which agent to run the scenario:

Set Scenario Params Value

ODI will by default select the last value assigned to the variable. You can uncheck the "Last Value" check box and change the value for the parameter. Hit the "Enter" key or the "Tab" key to validate your entry and run the scenario.

2.3.2 Starting the Scenario from a Command Line

You can also start a scenario from a command line interface. The script startscen (.bat for Windows or .sh for Unix-like environments) will take the scenario name as a parameter... along with the name of the variables that you are setting along with their values. Enclose the variable name and its value in double quotes, and prefix the project code with a minus sign (-). Remember to prefix the variable name with the project CODE as usual:

starscen DYNAMICFILENAME 001 "-SALES.FileName=c:/Incoming/Sales09092009.dat"

2.3.3 Starting the Scenario through a Web Service

If you start the scenario from a web service, you are invoking the service OdiInvoke. When you look at the WSDL for this Web Service, you will notice that beyond the scenario name, version and the execution context, you can specify pairs of variable names and values. Again, remember to specify the project code

WSDL to Start Scenario

2.3.4 Starting the Scenario from Another Scenario

If you start a scenario from another ODI package, you can either drag and drop the scenario in the parent package or use the ODI tool OdiStartScen and enter the information manually. When you look at the property window for that step, you will notice an Additional Variables tab. In this tab, you will be able to select your project, variable and to set the value for the variable. Remember to use the Enter or the Tab key to validate your choices as you make them.

Scenario Call From Package


For more information on ODI variables, please refer to the ODI Users Guide (part of the Documentation Library that comes with the ODI installation), in particular the entry "Creating and Using Variables"

All Screenshots were taken using version of ODI. Actual icons and graphical representations may vary with other versions of ODI.


Learn the latest trends, use cases, product updates, and customer success examples for Oracle's data integration products-- including Oracle Data Integrator, Oracle GoldenGate and Oracle Enterprise Data Quality


« March 2015