ODI - Integrating PDF using iText
By David Allan on Jan 14, 2013
Integrating PDF form data with ODI is as easy as integrating any other data source. I followed the code template that Suraj Bang posted a few years back for OWB and converted it to ODI, with a few changes for the newer version of iText and to make it work with ODI. The LKM PDF to SQL uses ODI's File technology as a source and loads to any ANSI SQL store. The LKM reads all of the form fields from a PDF form and supports the File technology's String and Numeric datatypes. The fields have to be defined by position, so the column names in the File datastore can be MYCOL_1, MYCOL_2, MYCOL_3 and so on, with whatever prefix you like. As mentioned, the LKM uses the iText Java API, so the iText JAR should be downloaded and copied into the normal ODI userlib directory. I tested with version 5.3.5 of iText.
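To make the positional naming concrete, here is a minimal sketch in plain Python of how a numeric suffix such as the 2 in MYCOL_2 could be used to pick a form field by position. It is not the LKM's actual code. The helper names, the sample values, and the choice of 1-based positions are all assumptions made for illustration, and the list of values simply stands in for whatever the form fields in the PDF contain.

```python
import re
from decimal import Decimal


def column_position(column_name):
    """Return the position encoded as the numeric suffix of a column name."""
    match = re.search(r"_(\d+)$", column_name)
    if match is None:
        raise ValueError("no numeric position suffix in %r" % column_name)
    return int(match.group(1))


def map_fields_by_position(columns, field_values):
    """Pair each (name, datatype) column with the form value at its position.

    Positions are assumed to be 1-based, so MYCOL_1 maps to the first value.
    """
    row = {}
    for name, datatype in columns:
        position = column_position(name)
        if not 1 <= position <= len(field_values):
            raise IndexError("no form field at position %d" % position)
        value = field_values[position - 1]
        # Only the two datatypes mentioned in the post are handled here.
        row[name] = Decimal(value) if datatype == "Numeric" else value
    return row


# Hypothetical form values, in the order the fields appear in the form.
field_values = ["Jane", "Doe", "2"]
# Hypothetical datastore columns; the prefix is arbitrary, the suffix is the position.
columns = [("MYCOL_1", "String"), ("MYCOL_2", "String"), ("MYCOL_3", "Numeric")]

row = map_fields_by_position(columns, field_values)
print(row)
```

Because only the suffix matters in this sketch, the prefix can be anything, which matches the "whatever prefix you like" rule for the datastore columns.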
The LKM PDF to SQL is here. I manually defined a File datastore with column names (remembering to use the position as the suffix) and types. The delimiter and similar settings are not used since the source is a PDF, but I set them anyway. The example I used processes W4 PDF data, the same as in Suraj's post, and I was able to process all of the PDFs in the same manner. Kudos to Suraj for the Jython, with a few small changes from me to use the latest iText.