« Using Oracle Warehouse Builder with Exadata: Success Story at Allegro Group | Main | Why does it take forever to build ETL processes »

File staging using external table preprocessor

Here is a quick write up of how to take advantage of the external table preprocessor feature which was squeezed into the 11.1.0.7 and 10.2.0.5 database patch sets from within OWB. OWB has rich metadata support for file design and its manipulation from the file metadata itself to the external table definition, SQL*Loader mappings and PLSQL support, it also provides a flexible approach (more like tips and tricks) for incorporating aspects of the database that are not directly designed in the UI, such objects include views, unbound operators in mapping etc. For external table, the access specification can be hand-crafted rather than generated from the metadata or imported from the data dictionary. It is this flexibility that allows you to incorporate the preprocessor directive into external table designs within production OWB releases.

For details of the preprocessor directive see the article 'Using the Preprocessor Feature with External Tables in Oracle Database 11g' on OTN.

The example we will illustrate here is using a compressed data file as the data file source for the external table. The external table represents customer details and is defined in a database schema. We will setup the metadata location for the Oracle module and select 'Import' (external tables can also be designed, you do not need to base it on a file defined in OWB, if you hit 'Finish' on the file selection page of the wizard you will create an 'unbound' external table);

exttab_gzip1

From the import wizard we can select the external table to be imported;

exttab_gzip2

When the table has been imported we get the signature of the table with all of the columns and data types defined;

exttab_gzip3

In the access specification section we see the PREPROCESSOR directive that our external table was defined with also imported. The script gunzipdb is used to encapsulate the unzipping and write the resultant file to the standard output stream to be consumed by the external table access driver. As mentioned from the above  post although commonly used for compressed files it could perform any manipulation on the file in preparation for it being consumed by the external table.

exttab_gzip4

The gunzipdb is just a shell script I created to wrapper the options to the gzip executable, it performs the unzip using the -d and -c options (on Windows I included the echo off command, also had much better success with large compressed files on Linux than on Windows).

We can inspect the external tables configuration parameters and customize the external table's treatment for missing values, trimming etc.

exttab_gzip5

We can also see the data file imported with the external table definition, note it is a gzip'd file and the SQL directory location can be set. We can also add additional data files if we wanted to process many files.

exttab_gzip6

 

We can now either just use this external table in a mapping or generate and deploy the external table definition off to another schema if desired.

exttab_gzip7

That's a quick run through of using the external table preprocessor capabilities from within OWB, if there are other interesting use-cases it'd be interesting to get some feedback.

TrackBack

TrackBack URL for this entry:
http://blogs.oracle.com/mt/mt-tb.cgi/12409

Post a comment

(If you haven't left a comment here before, you may need to be approved by the site owner before your comment will appear. Until then, it won't appear on the entry. Thanks for waiting.)