File staging using external table preprocessor

Here is a quick write up of how to take advantage of the external table preprocessor feature which was squeezed into the 11.1.0.7 and 10.2.0.5 database patch sets from within OWB. OWB has rich metadata support for file design and its manipulation from the file metadata itself to the external table definition, SQL*Loader mappings and PLSQL support, it also provides a flexible approach (more like tips and tricks) for incorporating aspects of the database that are not directly designed in the UI, such objects include views, unbound operators in mapping etc. For external table, the access specification can be hand-crafted rather than generated from the metadata or imported from the data dictionary. It is this flexibility that allows you to incorporate the preprocessor directive into external table designs within production OWB releases.

For details of the preprocessor directive see the article 'Using the Preprocessor Feature with External Tables in Oracle Database 11g' on OTN.

The example we will illustrate here is using a compressed data file as the data file source for the external table. The external table represents customer details and is defined in a database schema. We will setup the metadata location for the Oracle module and select 'Import' (external tables can also be designed, you do not need to base it on a file defined in OWB, if you hit 'Finish' on the file selection page of the wizard you will create an 'unbound' external table);

exttab_gzip1

From the import wizard we can select the external table to be imported;

exttab_gzip2

When the table has been imported we get the signature of the table with all of the columns and data types defined;

exttab_gzip3

In the access specification section we see the PREPROCESSOR directive that our external table was defined with also imported. The script gunzipdb is used to encapsulate the unzipping and write the resultant file to the standard output stream to be consumed by the external table access driver. As mentioned from the above  post although commonly used for compressed files it could perform any manipulation on the file in preparation for it being consumed by the external table.

exttab_gzip4

The gunzipdb is just a shell script I created to wrapper the options to the gzip executable, it performs the unzip using the -d and -c options (on Windows I included the echo off command, also had much better success with large compressed files on Linux than on Windows).

We can inspect the external tables configuration parameters and customize the external table's treatment for missing values, trimming etc.

exttab_gzip5

We can also see the data file imported with the external table definition, note it is a gzip'd file and the SQL directory location can be set. We can also add additional data files if we wanted to process many files.

exttab_gzip6

 

We can now either just use this external table in a mapping or generate and deploy the external table definition off to another schema if desired.

exttab_gzip7

That's a quick run through of using the external table preprocessor capabilities from within OWB, if there are other interesting use-cases it'd be interesting to get some feedback.

Comments:

I would like to point out that parallel select on a external table will not work with a single zipped file. Only one slave will do the reading while rest of the slaves will be idle.The workaround is to have the external table split into multiple zipped files and then add all the files in the LOCATION clause. Without this ability, loading or reading uncompressed files is much faster. However, compressed files still will save you bandwidth and time when moving them across the network.

Posted by Vijay balebail on July 28, 2010 at 02:35 AM PDT #

Hi Vijay Thanks, that's right. When you configure the external table in OWB, the 'Data Files' node is where you can add multiple files. This list of files will be added to the location clause. Cheers David

Posted by David Allan on August 04, 2010 at 02:52 AM PDT #

Hi,

I can't get my external table to operate on the database, wgereas they are created fine from owb. The database runs on windows server 2003. The flat location is ok. I still get ora-29913.

Do you have any idea what might be the matter?

Thanks,

Rob

Posted by rob heinen on March 01, 2012 at 05:43 AM PST #

Hi Rob

Can the OS process running the Oracle instance access the file you are using? Is the file accessible to that database server and are the OS permissions such that the OS user running Oracle can access the file?

Cheers
David

Posted by David on March 01, 2012 at 07:11 AM PST #

Post a Comment:
  • HTML Syntax: NOT allowed
About

ETL, CDC, Real-Time DI and Data Quality for the Oracle Database from the inside.

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today