Monday Jun 20, 2011

Oracle Data Integrator 11.1.1.5 Complex Files as Sources and Targets

Overview

ODI 11.1.1.5 adds the new Complex File technology for use with file sources and targets. The goal is to read or write file structures that are too complex to be parsed using the existing ODI File technology. This includes:

    • Different record types in one list that use different parsing rules
    • Hierarchical lists, for example customers with nested orders
    • Parsing instructions in the file data, such as delimiter types, field lengths, type identifiers
    • Complex headers such as multiple header lines or parseable information in header
    • Skipping of lines
    • Conditional or choice fields

Similar to the ODI File and XML File technologies, the complex file parsing is done through a JDBC driver that exposes the flat file as relational table structures. Complex files are mapped to one or more table structures, as opposed to the (simple) file technology, which always has a one-to-one relationship between file and table. The resulting set of tables follows the same concept as the ODI XML driver, table rows have additional PK-FK relationships to express hierarchy as well as order values to maintain the file order in the resulting table.

pic1.jpg

The parsing instruction format used for complex files is the nXSD (native XSD) format that is already in use with Oracle BPEL. This format extends the XML Schema standard by adding additional parsing instructions to each element. Using nXSD parsing technology, the native file is converted into an internal XML format. It is important to understand that the XML is streamed to improve performance; there is no size limitation of the native file based on memory size, the XML data is never fully materialized. The internal XML is then converted to relational schema using the same mapping rules as the ODI XML driver.

How to Create an nXSD file

Complex file models depend on the nXSD schema for the given file. This nXSD file has to be created using a text editor or the Native Format Builder Wizard that is part of Oracle BPEL. BPEL is included in the ODI Suite, but not in standalone ODI Enterprise Edition. The nXSD format extends the standard XSD format through nxsd attributes. NXSD is a valid XML Schema, since the XSD standard allows extra attributes with their own namespaces.

The following is a sample NXSD schema blog.xsd:

<?xml version="1.0"?>

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:nxsd="http://xmlns.oracle.com/pcbpel/nxsd" elementFormDefault="qualified" xmlns:tns="http://xmlns.oracle.com/pcbpel/demoSchema/csv" targetNamespace="http://xmlns.oracle.com/pcbpel/demoSchema/csv" attributeFormDefault="unqualified"

nxsd:encoding="US-ASCII" nxsd:stream="chars" nxsd:version="NXSD">

<xsd:element name="Root">

<xsd:complexType><xsd:sequence>

<xsd:element name="Header">

<xsd:complexType><xsd:sequence>

<xsd:element name="Branch" type="xsd:string" nxsd:style="terminated" nxsd:terminatedBy=","/>

<xsd:element name="ListDate" type="xsd:string" nxsd:style="terminated" nxsd:terminatedBy="${eol}"/>

</xsd:sequence></xsd:complexType>

</xsd:element>

<xsd:element name="Customer" maxOccurs="unbounded">

<xsd:complexType><xsd:sequence>

<xsd:element name="Name" type="xsd:string" nxsd:style="terminated" nxsd:terminatedBy=","/>

<xsd:element name="Street" type="xsd:string" nxsd:style="terminated" nxsd:terminatedBy="," />

<xsd:element name="City" type="xsd:string" nxsd:style="terminated" nxsd:terminatedBy="${eol}" />

</xsd:sequence></xsd:complexType>

</xsd:element>

</xsd:sequence></xsd:complexType>

</xsd:element>

</xsd:schema>

The nXSD schema annotates elements to describe their position and delimiters within the flat text file. The schema above uses almost exclusively the nxsd:terminatedBy instruction to look for the next terminator chars. There are various constructs in nXSD to parse fixed length fields, look ahead in the document for string occurences, perform conditional logic, use variables to remember state, and many more.

nXSD files can either be written manually using an XML Schema Editor or created using the Native Format Builder Wizard. Both Native Format Builder Wizard as well as the nXSD language are described in the Application Server Adapter Users Guide. The way to start the Native Format Builder in BPEL is to create a new File Adapter; in step 8 of the Adapter Configuration Wizard a new Schema for Native Format can be created:

pic2.jpg

The Native Format Builder guides through a number of steps to generate the nXSD based on a sample native file. If the format is complex, it is often a good idea to “approximate” it with a similar simple format and then add the complex components manually. The resulting *.xsd file can be copied and used as the format for ODI, other BPEL constructs such as the file adapter definition are not relevant for ODI. Using this technique it is also possible to parse the same file format in SOA Suite and ODI, for example using SOA for small real-time messages, and ODI for large batches.

This nXSD schema in this example describes a file with a header row containing data and 3 string fields per row delimited by commas, for example blog.dat:

Redwood City Downtown Branch, 06/01/2011
Ebeneezer Scrooge, Sandy Lane, Atherton
Tiny Tim, Winton Terrace, Menlo Park

The ODI Complex File JDBC driver exposes the file structure through a set of relational tables with PK-FK relationships. The tables for this example are:

Table ROOT (1 row):

ROOTPK

Primary Key for root element

SNPSFILENAME

Name of the file

SNPSFILEPATH

Path of the file

SNPSLOADDATE

Date of load

Table HEADER (1 row):

ROOTFK

Foreign Key to ROOT record

HEADERORDER

Order of row in native document

BRANCH

Data

BRANCHORDER

Order of Branch within row

LISTDATE

Data

LISTDATEORDER

Order of ListDate within row

Table CUSTOMER (2 rows):

ROOTFK

Foreign Key to ROOT record

CUSTOMERORDER

Order of rows in native document

NAME

Data

NAMEORDER

Oder of Name within row

STREET

Data

STREETORDER

Order of Street within row

CITY

Data

CITYORDER

Order of City within row

Every table has PK and/or FK fields to reflect the document hierarchy through relationships. In this example this is trivial since the HEADER and all CUSTOMER records point back to the PK of ROOT. Deeper nested documents require this to identify parent elements. All child element tables also have a order field (HEADERORDER, CUSTOMERORDER) to define the order of rows, as well as order fields for each column, in case the order of columns varies in the original document and needs to be maintained. If order is not relevant, these fields can be ignored.

How to Create an Complex File Data Server in ODI

After creating the nXSD file and a test data file, and storing it on the local file system accessible to ODI, you can go to the ODI Topology Navigator to create a Data Server and Physical Schema under the Complex File technology.

pic3_new.jpg

This technology follows the conventions of other ODI technologies and is very similar to the XML technology. The parsing settings such as the source native file, the nXSD schema file, the root element, as well as the external database can be set in the JDBC URL:

pic4.jpg

The use of an external database defined by dbprops is optional, but is strongly recommended for production use. Ideally, the staging database should be used for this. Also, when using a complex file exclusively for read purposes, it is recommended to use the ro=true property to ensure the file is not unnecessarily synchronized back from the database when the connection is closed. A data file is always required to be present at the filename path during design-time. Without this file, operations like testing the connection, reading the model data, or reverse engineering the model will fail.

All properties of the Complex File JDBC Driver are documented in the Oracle Fusion Middleware Connectivity and Knowledge Modules Guide for Oracle Data Integrator in Appendix C: Oracle Data Integrator Driver for Complex Files Reference.

David Allan has created a great viewlet Complex File Processing - 0 to 60 which shows the creation of a Complex File data server as well as a model based on this server.

How to Create Models based on an Complex File Schema

Once physical schema and logical schema have been created, the Complex File can be used to create a Model as if it were based on a database. When reverse-engineering the Model, data stores(tables) for each XSD element of complex type will be created. Use of complex files as sources is straightforward; when using them as targets it has to be made sure that all dependent tables have matching PK-FK pairs; the same applies to the XML driver as well.

Debugging and Error Handling

There are different ways to test an nXSD file. The Native Format Builder Wizard can be used even if the nXSD wasn’t created in it; it will show issues related to the schema and/or test data. In ODI, the nXSD will be parsed and run against the existing test XML file when testing a connection in the Dataserver. If either the nXSD has an error or the data is non-compliant to the schema, an error will be displayed.

Sample error message:

Error while reading native data.
[Line=1, Col=5] Not enough data available in the input, when trying to read data of length "19" for "element with name D1" from the specified position, using "style" as "fixedLength" and "length" as "". Ensure that there is enough data from the specified position in the input.

Complex File FAQ

Is the size of the native file limited by available memory?
No, since the native data is streamed through the driver, only the available space in the staging database limits the size of the data. There are limits on individual field sizes, though; a single large object field needs to fit in memory.

Should I always use the complex file driver instead of the file driver in ODI now?
No, use the file technology for all simple file parsing tasks, for example any fixed-length or delimited files that just have one row format and can be mapped into a simple table. Because of its narrow assumptions the ODI file driver is easy to configure within ODI and can stream file data without writing it into a database. The complex file driver should be used whenever the use case cannot be handled through the file driver.

Should I use the complex file driver to parse standard file formats such as EDI, HL7, FIX, SWIFT, etc.? 
The complex file driver is technically able to parse most standard file formats, the user would have to develop an nXSD to parse the expected message. However, in some instances the use case requires a supporting infrastructure, such as message validation, acknowledgement messages, routing rules, etc. In these cases products such as Oracle B2B or  Oracle Service Bus for Financial Services will be better suited and could be combined with ODI.

Are we generating XML out of flat files before we write it into a database?
We don’t materialize any XML as part of parsing a flat file, either in memory or on disk. The data produced by the XML parser is streamed in Java objects that just use XSD-derived nXSD schema as its type system. We use the nXSD schema because is the standard for describing complex flat file metadata in Oracle Fusion Middleware, and enables users to share schemas across products.

Is the nXSD file interchangeable with SOA Suite?
Yes, ODI can use the same nXSD files as SOA Suite, allowing mixed use cases with the same data format.

Can I start the Native Format Builder from the ODI Studio?
No, the Native Format Builder has to be started from a JDeveloper with BPEL instance. You can get BPEL as part of the SOA Suite bundle. Users without SOA Suite can manually develop nXSD files using XSD editors.

When is the database data written back to the native file?
Data is synchronized using the SYNCHRONIZE and CREATE FILE commands, and when the JDBC connection is closed. It is recommended to set the ro or read_only property to true when a file is exclusively used for reading so that no unnecessary write-backs occur.

Is the nXSD metadata part of the ODI Master or Work Repository?
No, the data server definition in the master repository only contains the JDBC URL with file paths; the nXSD files have to be accessible on the file systems where the JDBC driver is executed during production, either by copying or by using a network file system.

Where can I find sample nXSD files?
The Application Server Adapter Users Guide contains nXSD samples for various different use cases.

About

Learn the latest trends, use cases, product updates, and customer success examples for Oracle's data integration products-- including Oracle Data Integrator, Oracle GoldenGate and Oracle Enterprise Data Quality

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
2
3
5
6
7
8
9
10
12
13
14
17
18
19
20
21
23
24
25
26
27
28
29
30
   
       
Today