Thursday Aug 29, 2013

ODI - File transformation debatching

Carrying on from the blog post on performing file transformations within ODI using nXSD, in this post I will show how a large file can be debatched into smaller chunks/shards. The example uses a COBOL data file as input and debatches this file into parts using the debatching capability of the adapter.

COBOL is a great example of a complex data source for Hadoop and Big Data initiatives since there are plenty of systems that companies wish to unlock potential gems from. Its big data and there is tonnes of it! The ODI file transformer tool on java.net has been extended to include an optional parameter to define the number of rows in the chunk/shard. Having this parameter now lets you take control of the relative size of the files being processed - perhaps the platform has certain characteristics that work better than others. Hadoop has challenges with millions of small files and with very very large ones, so being able to prepare the data is useful.

The COBOL copybook was used as an input in the Native Format Builder and an nXSD generated for the copybook, I used the following copybook;

  •         02  DTAR020.                                      
  •         03  DTAR020-KCODE-STORE-KEY.                                      
  •             05 DTAR020-KEYCODE-NO      PIC X(08).                         
  •             05 DTAR020-STORE-NO        PIC S9(03)   COMP-3.               
  •         03  DTAR020-DATE               PIC S9(07)   COMP-3.               
  •         03  DTAR020-DEPT-NO            PIC S9(03)   COMP-3.               
  •         03  DTAR020-QTY-SOLD           PIC S9(9)    COMP-3.               
  •         03  DTAR020-SALE-PRICE         PIC S9(9)V99 COMP-3.               

The data file came from an example on the web. Below you can see for the ODIFileTransformer tool, an actual example of the command executed, the SHARDROWS parameter defines the number of rows to be written to each data file. The tool simply adds an integer index to the end of the output parameter - a little basic I know, the source is on java.net if you feel like being creative.

ODIFileTransformer "-INPUT=D:\input\cbl_data.bin" "-SCHEMA=D:\reference\cbl.xsd" "-OUTPUT=d:\output\out.xml" "-ROOT=ROOT" "-SHARDROWS=2" 

Executing this generates many output files with for example 2 rows in each, this is a contrived example just to illustrate the 2 rows in the generated file;

  1. <ROOT xmlns="http://TargetNamespace.com/CBL">
  2.    <DTAR020>
  3.       <DTAR020-KCODE-STORE-KEY>
  4.          <DTAR020-KEYCODE-NO>69684558</DTAR020-KEYCODE-NO>
  5.          <DTAR020-STORE-NO>+020</DTAR020-STORE-NO>
  6.       </DTAR020-KCODE-STORE-KEY>
  7.       <DTAR020-DATE>+0040118</DTAR020-DATE>
  8.       <DTAR020-DEPT-NO>+280</DTAR020-DEPT-NO>
  9.       <DTAR020-QTY-SOLD>+000000001</DTAR020-QTY-SOLD>
  10.       <DTAR020-SALE-PRICE>+00000019.00</DTAR020-SALE-PRICE>
  11.    </DTAR020>
  12.    <DTAR020>
  13.       <DTAR020-KCODE-STORE-KEY>
  14.          <DTAR020-KEYCODE-NO>69684558</DTAR020-KEYCODE-NO>
  15.          <DTAR020-STORE-NO>+020</DTAR020-STORE-NO>
  16.       </DTAR020-KCODE-STORE-KEY>
  17.       <DTAR020-DATE>+0040118</DTAR020-DATE>
  18.       <DTAR020-DEPT-NO>+280</DTAR020-DEPT-NO>
  19.       <DTAR020-QTY-SOLD>-000000001</DTAR020-QTY-SOLD>
  20.       <DTAR020-SALE-PRICE>-00000019.00</DTAR020-SALE-PRICE>
  21.    </DTAR020>
  22. </ROOT>
This is a small extension to the earlier version of the tool and provides a useful debatching capability. The ZIP containing the tool is on the java.net site, see the earlier post on some background details. ...and then what? Well that would be telling.

Monday Aug 26, 2013

ODI - File transformation tool

ODI 11g introduced the complex file technology which supported the ability to define a parser for data files and plug it into ODI. I have created an ODI Open Tool (ODIFileTransformation) which utilizes part of this capability just to do native file to XML file transformation - so just file to file based on the nXSD parser definition. The native input file (such as COBOL or whatever) is parsed based on the nXSD definition and the resultant XML file streamed directly (to disk, a pipe or wherever). Btw...its very easy to extend ODI in a number of ways, adding a tool is an easy way to integrate existing capabilities.

Why? It gives us different opportunities for consuming the parsed data and it is now in a convenient form (XML). As an illustration I processed a 200Mb data file and the resultant XML file was generated in 120 seconds whereas using the complex file technology directly in ODI and staging the same native data file in a database took 9000 seconds. There is no staging with the tool, it simply reads the data, transforms using the nXSD and produces the generated XML. So.... you could write the resultant file on to a disk such as HDFS then process it using Hadoop.

The tool has the following signature and is useful for running from the command line for performing file to file transformation or from within a package. This tool was based on a test tool that was blogged here and I changed to remove the XML DOM nature so that it streams to the result file.

The tool is on the java.net site, to install, you will need the java mail JAR (download here) and the tool itself (download zip containing JAR and source here). The JARs get copied to the ODI userlib directory and then you restart the client and register the open tool. The tool can have more options added including debatching into chunked files. Hopefully some useful bits and pieces here. 

Monday Aug 05, 2013

ODI - Complex file example, multi record row

Here we will see how a parser is defined to project a schema based on a data file with many records in a single line in the file. The complex file adapter provides a plethora of capabilities and is probably one of those under utilized and little known components - plus I think its one that the more examples provided for it the better! This example I will show has a department defined on every line and each department can have many employees (as many as you want) defined on the same line. Replace Department and Employee for your own concepts; buyers, sellers, or whatever. We define the parser in an NXSD file, which is based on the standard XSD with additional information to define how the data is parsed and projected as XML. If you know XSD you are in business, if not, with a little knowledge you can become very dangerous - resort to the examples in the Oracle documentation if not.

My source data file has the following content;

ODIDavid  ARCHITECTBob    DEVELOPERJohn   MANAGER  
OWFFred   SUPPORT  Jim    SUPPORT  

I want to define a parser that creates departments for each line (ODI is a department and OWF is a department, its a 3 character string) and also has employees (David, Bob etc a 7 character string) with their job roles (ARCHITECT, DEVELOPER etc a 9 character string) and the department they are in.

The following NXSD file does exactly that;

  1. <?xml version="1.0" encoding="UTF-8" ?>

  2. <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  3.             xmlns:nxsd="http://xmlns.oracle.com/pcbpel/nxsd"
  4.             targetNamespace="http://TargetNamespace.com/testsch"
  5.             xmlns:tns="http://TargetNamespace.com/testsch"
  6.             nxsd:encoding="ASCII" nxsd:stream="chars" nxsd:version="NXSD">

  7.   <xsd:element name="Departments">
  8.     <xsd:complexType>
  9.       <xsd:sequence>
  10.         <xsd:element name="Department" type="tns:DepartmentT" maxOccurs="unbounded"/>
  11.       </xsd:sequence>
  12.     </xsd:complexType>
  13.   </xsd:element>

  14.   <xsd:complexType name="DepartmentT">
  15.     <xsd:sequence>
  16.         <xsd:element name="Name" type="xsd:string" nxsd:style="fixedLength" nxsd:length="3" />
  17.         <xsd:element name="Employee" type="tns:EmployeeT"  nxsd:style="array" maxOccurs="unbounded" nxsd:arrayTerminatedBy="${eol}" />
  18.     </xsd:sequence>
  19.   </xsd:complexType>

  20.   <xsd:complexType name="EmployeeT">
  21.          <xsd:sequence>
  22.            <xsd:element name="Name" type="xsd:string" nxsd:style="fixedLength" nxsd:length="7"/>
  23.            <xsd:element name="Role" type="xsd:string" nxsd:style="fixedLength" nxsd:length="9"/>
  24.          </xsd:sequence>
  25.   </xsd:complexType>

  26. </xsd:schema>

My employee records above happen to fixed length based on the length of the name and the role, there are many of them, so they are in an unbounded array and the array is terminated by an EOL character. When I reverse this into ODI I get datastores for DEPARTMENTS, DEPARTMENT and EMPLOYEE; EMPLOYEE has columns Name and Role, DEPARTMENT has column Name.

I can define this NXSD in my favorite editor and develop and test the parsing using ODI, this avoids installing and using JDeveloper and gives you a shortcut to quickly developing and building parsers using this capability. Previous blogs on the topic are here and here.

Monday Jun 20, 2011

Oracle Data Integrator 11.1.1.5 Complex Files as Sources and Targets

Overview

ODI 11.1.1.5 adds the new Complex File technology for use with file sources and targets. The goal is to read or write file structures that are too complex to be parsed using the existing ODI File technology. This includes:

    • Different record types in one list that use different parsing rules
    • Hierarchical lists, for example customers with nested orders
    • Parsing instructions in the file data, such as delimiter types, field lengths, type identifiers
    • Complex headers such as multiple header lines or parseable information in header
    • Skipping of lines
    • Conditional or choice fields

Similar to the ODI File and XML File technologies, the complex file parsing is done through a JDBC driver that exposes the flat file as relational table structures. Complex files are mapped to one or more table structures, as opposed to the (simple) file technology, which always has a one-to-one relationship between file and table. The resulting set of tables follows the same concept as the ODI XML driver, table rows have additional PK-FK relationships to express hierarchy as well as order values to maintain the file order in the resulting table.

pic1.jpg

The parsing instruction format used for complex files is the nXSD (native XSD) format that is already in use with Oracle BPEL. This format extends the XML Schema standard by adding additional parsing instructions to each element. Using nXSD parsing technology, the native file is converted into an internal XML format. It is important to understand that the XML is streamed to improve performance; there is no size limitation of the native file based on memory size, the XML data is never fully materialized. The internal XML is then converted to relational schema using the same mapping rules as the ODI XML driver.

How to Create an nXSD file

Complex file models depend on the nXSD schema for the given file. This nXSD file has to be created using a text editor or the Native Format Builder Wizard that is part of Oracle BPEL. BPEL is included in the ODI Suite, but not in standalone ODI Enterprise Edition. The nXSD format extends the standard XSD format through nxsd attributes. NXSD is a valid XML Schema, since the XSD standard allows extra attributes with their own namespaces.

The following is a sample NXSD schema blog.xsd:

<?xml version="1.0"?>

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:nxsd="http://xmlns.oracle.com/pcbpel/nxsd" elementFormDefault="qualified" xmlns:tns="http://xmlns.oracle.com/pcbpel/demoSchema/csv" targetNamespace="http://xmlns.oracle.com/pcbpel/demoSchema/csv" attributeFormDefault="unqualified"

nxsd:encoding="US-ASCII" nxsd:stream="chars" nxsd:version="NXSD">

<xsd:element name="Root">

<xsd:complexType><xsd:sequence>

<xsd:element name="Header">

<xsd:complexType><xsd:sequence>

<xsd:element name="Branch" type="xsd:string" nxsd:style="terminated" nxsd:terminatedBy=","/>

<xsd:element name="ListDate" type="xsd:string" nxsd:style="terminated" nxsd:terminatedBy="${eol}"/>

</xsd:sequence></xsd:complexType>

</xsd:element>

<xsd:element name="Customer" maxOccurs="unbounded">

<xsd:complexType><xsd:sequence>

<xsd:element name="Name" type="xsd:string" nxsd:style="terminated" nxsd:terminatedBy=","/>

<xsd:element name="Street" type="xsd:string" nxsd:style="terminated" nxsd:terminatedBy="," />

<xsd:element name="City" type="xsd:string" nxsd:style="terminated" nxsd:terminatedBy="${eol}" />

</xsd:sequence></xsd:complexType>

</xsd:element>

</xsd:sequence></xsd:complexType>

</xsd:element>

</xsd:schema>

The nXSD schema annotates elements to describe their position and delimiters within the flat text file. The schema above uses almost exclusively the nxsd:terminatedBy instruction to look for the next terminator chars. There are various constructs in nXSD to parse fixed length fields, look ahead in the document for string occurences, perform conditional logic, use variables to remember state, and many more.

nXSD files can either be written manually using an XML Schema Editor or created using the Native Format Builder Wizard. Both Native Format Builder Wizard as well as the nXSD language are described in the Application Server Adapter Users Guide. The way to start the Native Format Builder in BPEL is to create a new File Adapter; in step 8 of the Adapter Configuration Wizard a new Schema for Native Format can be created:

pic2.jpg

The Native Format Builder guides through a number of steps to generate the nXSD based on a sample native file. If the format is complex, it is often a good idea to “approximate” it with a similar simple format and then add the complex components manually. The resulting *.xsd file can be copied and used as the format for ODI, other BPEL constructs such as the file adapter definition are not relevant for ODI. Using this technique it is also possible to parse the same file format in SOA Suite and ODI, for example using SOA for small real-time messages, and ODI for large batches.

This nXSD schema in this example describes a file with a header row containing data and 3 string fields per row delimited by commas, for example blog.dat:

Redwood City Downtown Branch, 06/01/2011
Ebeneezer Scrooge, Sandy Lane, Atherton
Tiny Tim, Winton Terrace, Menlo Park

The ODI Complex File JDBC driver exposes the file structure through a set of relational tables with PK-FK relationships. The tables for this example are:

Table ROOT (1 row):

ROOTPK

Primary Key for root element

SNPSFILENAME

Name of the file

SNPSFILEPATH

Path of the file

SNPSLOADDATE

Date of load

Table HEADER (1 row):

ROOTFK

Foreign Key to ROOT record

HEADERORDER

Order of row in native document

BRANCH

Data

BRANCHORDER

Order of Branch within row

LISTDATE

Data

LISTDATEORDER

Order of ListDate within row

Table CUSTOMER (2 rows):

ROOTFK

Foreign Key to ROOT record

CUSTOMERORDER

Order of rows in native document

NAME

Data

NAMEORDER

Oder of Name within row

STREET

Data

STREETORDER

Order of Street within row

CITY

Data

CITYORDER

Order of City within row

Every table has PK and/or FK fields to reflect the document hierarchy through relationships. In this example this is trivial since the HEADER and all CUSTOMER records point back to the PK of ROOT. Deeper nested documents require this to identify parent elements. All child element tables also have a order field (HEADERORDER, CUSTOMERORDER) to define the order of rows, as well as order fields for each column, in case the order of columns varies in the original document and needs to be maintained. If order is not relevant, these fields can be ignored.

How to Create an Complex File Data Server in ODI

After creating the nXSD file and a test data file, and storing it on the local file system accessible to ODI, you can go to the ODI Topology Navigator to create a Data Server and Physical Schema under the Complex File technology.

pic3_new.jpg

This technology follows the conventions of other ODI technologies and is very similar to the XML technology. The parsing settings such as the source native file, the nXSD schema file, the root element, as well as the external database can be set in the JDBC URL:

pic4.jpg

The use of an external database defined by dbprops is optional, but is strongly recommended for production use. Ideally, the staging database should be used for this. Also, when using a complex file exclusively for read purposes, it is recommended to use the ro=true property to ensure the file is not unnecessarily synchronized back from the database when the connection is closed. A data file is always required to be present at the filename path during design-time. Without this file, operations like testing the connection, reading the model data, or reverse engineering the model will fail.

All properties of the Complex File JDBC Driver are documented in the Oracle Fusion Middleware Connectivity and Knowledge Modules Guide for Oracle Data Integrator in Appendix C: Oracle Data Integrator Driver for Complex Files Reference.

David Allan has created a great viewlet Complex File Processing - 0 to 60 which shows the creation of a Complex File data server as well as a model based on this server.

How to Create Models based on an Complex File Schema

Once physical schema and logical schema have been created, the Complex File can be used to create a Model as if it were based on a database. When reverse-engineering the Model, data stores(tables) for each XSD element of complex type will be created. Use of complex files as sources is straightforward; when using them as targets it has to be made sure that all dependent tables have matching PK-FK pairs; the same applies to the XML driver as well.

Debugging and Error Handling

There are different ways to test an nXSD file. The Native Format Builder Wizard can be used even if the nXSD wasn’t created in it; it will show issues related to the schema and/or test data. In ODI, the nXSD will be parsed and run against the existing test XML file when testing a connection in the Dataserver. If either the nXSD has an error or the data is non-compliant to the schema, an error will be displayed.

Sample error message:

Error while reading native data.
[Line=1, Col=5] Not enough data available in the input, when trying to read data of length "19" for "element with name D1" from the specified position, using "style" as "fixedLength" and "length" as "". Ensure that there is enough data from the specified position in the input.

Complex File FAQ

Is the size of the native file limited by available memory?
No, since the native data is streamed through the driver, only the available space in the staging database limits the size of the data. There are limits on individual field sizes, though; a single large object field needs to fit in memory.

Should I always use the complex file driver instead of the file driver in ODI now?
No, use the file technology for all simple file parsing tasks, for example any fixed-length or delimited files that just have one row format and can be mapped into a simple table. Because of its narrow assumptions the ODI file driver is easy to configure within ODI and can stream file data without writing it into a database. The complex file driver should be used whenever the use case cannot be handled through the file driver.

Should I use the complex file driver to parse standard file formats such as EDI, HL7, FIX, SWIFT, etc.? 
The complex file driver is technically able to parse most standard file formats, the user would have to develop an nXSD to parse the expected message. However, in some instances the use case requires a supporting infrastructure, such as message validation, acknowledgement messages, routing rules, etc. In these cases products such as Oracle B2B or  Oracle Service Bus for Financial Services will be better suited and could be combined with ODI.

Are we generating XML out of flat files before we write it into a database?
We don’t materialize any XML as part of parsing a flat file, either in memory or on disk. The data produced by the XML parser is streamed in Java objects that just use XSD-derived nXSD schema as its type system. We use the nXSD schema because is the standard for describing complex flat file metadata in Oracle Fusion Middleware, and enables users to share schemas across products.

Is the nXSD file interchangeable with SOA Suite?
Yes, ODI can use the same nXSD files as SOA Suite, allowing mixed use cases with the same data format.

Can I start the Native Format Builder from the ODI Studio?
No, the Native Format Builder has to be started from a JDeveloper with BPEL instance. You can get BPEL as part of the SOA Suite bundle. Users without SOA Suite can manually develop nXSD files using XSD editors.

When is the database data written back to the native file?
Data is synchronized using the SYNCHRONIZE and CREATE FILE commands, and when the JDBC connection is closed. It is recommended to set the ro or read_only property to true when a file is exclusively used for reading so that no unnecessary write-backs occur.

Is the nXSD metadata part of the ODI Master or Work Repository?
No, the data server definition in the master repository only contains the JDBC URL with file paths; the nXSD files have to be accessible on the file systems where the JDBC driver is executed during production, either by copying or by using a network file system.

Where can I find sample nXSD files?
The Application Server Adapter Users Guide contains nXSD samples for various different use cases.

About

Learn the latest trends, use cases, product updates, and customer success examples for Oracle's data integration products-- including Oracle Data Integrator, Oracle GoldenGate and Oracle Enterprise Data Quality

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
2
3
5
6
7
8
9
10
12
13
14
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today