ODI - Integrating PDF using iText

Integrating PDF form data with ODI is as easy as any other data source. I followed the code template Suraj Bang posted a few years back for OWB and converted it to ODI, with a few changes for the newer version of iText and to work with ODI. The LKM PDF to SQL uses ODI's File technology as a source and loads to any ANSI SQL store. The LKM reads all of the form fields from a PDF form and supports the File technology's String and Numeric datatypes. The fields have to be defined by position, so the field names in your file datastore can be MYCOL_1, MYCOL_2, MYCOL_3 and so on, with whatever prefix you like. As mentioned, the LKM uses the iText Java API, so the JAR should be downloaded and copied into the normal ODI userlib directory. I tested with version 5.3.5 of iText.
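
To make the "defined by position" rule concrete, here is a minimal sketch of how a column name's numeric suffix could be used to pick a form field value. This is not the LKM's code. It assumes the suffix is 1-based and that the form field values have already been read into an ordered list. The function names and the sample values are hypothetical.

```python
import re


def column_position(column_name):
    """Return the position encoded as the numeric suffix of a column name.

    Assumption: the suffix is 1-based, e.g. MYCOL_1 is the first field.
    """
    match = re.search(r"_(\d+)$", column_name)
    if match is None:
        raise ValueError("column name has no position suffix: %r" % column_name)
    return int(match.group(1))


def row_from_fields(column_names, field_values):
    """Build one row by picking a field value for each positional column."""
    row = {}
    for name in column_names:
        position = column_position(name)
        if not 1 <= position <= len(field_values):
            raise IndexError("no form field at position %d" % position)
        row[name] = field_values[position - 1]
    return row


# Hypothetical values standing in for the fields read from a PDF form.
field_values = ["Jane", "Doe", "2"]
print(row_from_fields(["MYCOL_1", "MYCOL_3"], field_values))
```

The prefix is ignored entirely, which is why any prefix works as long as the suffix carries the position.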

The LKM PDF to SQL is here. I manually defined a File datastore with column names (remembering to use the position as the suffix) and types. The delimiter and similar settings are not used since it's a PDF, but I set them anyway. The example I used was to process W4 PDF data, the same as in Suraj's post, and I was able to process all of the PDFs in the same manner. Kudos to Suraj for the Jython, with a few small changes from me to use the latest iText.

Comments:

David,

Here is the (old) link for ODI developers who have not worked with OWB.
http://odi-ee.blogspot.com/2010/11/pdf-data-extraction-odi.html

I have added a link back to this post. Thanks for the code changes.

Cheers,
Suraj.

Posted by suraj on January 15, 2013 at 07:07 AM PST #

Hi Suraj

Cool, I forgot about that! Must be getting old...and of course you have a nice way to generate the metadata, sweet! I have updated the metadata generator with the new iText calls also. I will probably do a quick blog or viewlet on using it.

https://blogs.oracle.com/dataintegration/resource/odi_11g/pdf_metadata_generator_jython.txt

Did you see the post on using java table functions with javadb/derby/jdk7? This makes the construction of API based integration much simpler on the retrieval side and would greatly simplify the LKM, what do you think?

Cheers
David

Posted by David on January 15, 2013 at 09:37 AM PST #

I have posted an alternative to the LKM PDF to SQL approach below. This one uses the Java table function mechanism.

https://blogs.oracle.com/dataintegration/entry/odi_java_table_function_for

Cheers
David

Posted by David on January 17, 2013 at 09:00 AM PST #

Hi David,

I did go through your posts on Java table functions and MongoDB and tried it. I think it's cool and helpful for encapsulating and integrating with APIs. I just need to try a few things to test its limits (complexity of objects, volume of data) :-) .

Cheers,
Suraj.

Posted by suraj on January 24, 2013 at 08:07 AM PST #

I created a quick viewlet to show this in ODI; see the blog post here:
https://blogs.oracle.com/dataintegration/entry/odi_extracting_data_from_pdf

Cheers
David

Posted by David on February 06, 2013 at 01:50 PM PST #

The metadata procedure generates a file without values, like this:

COLUMN_1|COLUMN_2|COLUMN_3|COLUMN_4|COLUMN_5|COLUMN_6|COLUMN_7|COLUMN_8|COLUMN_9|COLUMN_10|COLUMN_11|COLUMN_12|COLUMN_13|COLUMN_14|COLUMN_15|COLUMN_16|COLUMN_17|COLUMN_18|COLUMN_19|COLUMN_20|COLUMN_21|COLUMN_22|COLUMN_23|COLUMN_24|COLUMN_25|COLUMN_26|COLUMN_27|COLUMN_28|COLUMN_29|COLUMN_30|COLUMN_31|COLUMN_32|COLUMN_33|COLUMN_34|COLUMN_35|COLUMN_36|COLUMN_37|COLUMN_38|COLUMN_39|COLUMN_40|COLUMN_41|COLUMN_42
topmostSubform[0].Page1[0].f1_17_0_[0]|topmostSubform[0].Page1[0].f1_18_0_[0]|topmostSubform[0].Page2[0].f2_12_0_[0]|topmostSubform[0].Page1[0].f1_16_0_[0]|topmostSubform[0].Page2[0].f2_04_0_[0]|topmostSubform[0].Page1[0].f1_04_0_[0]|topmostSubform[0].Page1[0].f1_06_0_[0]|topmostSubform[0].Page2[0].f2_18_0_[0]|topmostSubform[0].Page1[0].f1_07_0_[0]|topmostSubform[0].Page2[0].f2_09_0_[0]|topmostSubform[0].Page1[0].c1_01[0]|topmostSubform[0].Page1[0].c1_01[1]|topmostSubform[0].Page1[0].f1_01_0_[0]|topmostSubform[0].Page2[0].f2_10_0_[0]|topmostSubform[0].Page2[0].f2_06_0_[0]|topmostSubform[0].Page1[0].f1_20_0_[0]|topmostSubform[0].Page1[0].Line1[0].f1_15_0_[0]|topmostSubform[0].Page1[0].Line1[0].f1_10_0_[0]|topmostSubform[0].Page1[0].f1_19_0_[0]|topmostSubform[0].Page2[0].f2_05_0_[0]|topmostSubform[0].Page2[0].f2_11_0_[0]|topmostSubform[0].Page2[0].f2_08_0_[0]|topmostSubform[0].Page2[0].f2_16_0_[0]|topmostSubform[0].Page2[0].f2_07_0_[0]|topmostSubform[0].Page1[0].Line1[0].f1_14_0_[0]|topmostSubform[0].Page1[0].f1_05_0_[0]|topmostSubform[0].Page1[0].c1_02[0]|topmostSubform[0].Page2[0].f2_14_0_[0]|topmostSubform[0].Page2[0].f2_03_0_[0]|topmostSubform[0].Page2[0].f2_02_0_[0]|topmostSubform[0].Page2[0].f2_15_0_[0]|topmostSubform[0].Page1[0].c1_01[2]|topmostSubform[0].Page1[0].f1_02_0_[0]|topmostSubform[0].Page1[0].f1_22_0_[0]|topmostSubform[0].Page2[0].f2_19_0_[0]|topmostSubform[0].Page1[0].f1_03_0_[0]|topmostSubform[0].Page1[0].Line1[0].f1_09_0_[0]|topmostSubform[0].Page2[0].f2_17_0_[0]|topmostSubform[0].Page2[0].f2_13_0_[0]|topmostSubform[0].Page2[0].f2_01_0_[0]|topmostSubform[0].Page1[0].f1_08_0_[0]|topmostSubform[0].Page1[0].f1_13_0_[0]
|||||||||||||||||||||||||||||||||||||||||

Where could the problem be?

Thanks,
Wojciech

Posted by Wojciech on March 19, 2013 at 03:16 PM PDT #

Hi

Are you using the metadata generator I linked to in the comment above? Does it have the generate field values variable set to Y?
https://blogs.oracle.com/dataintegration/resource/odi_11g/pdf_metadata_generator_jython.txt

Which PDF form are you reading from? Does it have values filled out?

Cheers
David

Posted by David on March 19, 2013 at 03:57 PM PDT #
