Technical info and insight on using Oracle Documaker for customer communication and document automation

  • ODEE
    January 6, 2016

Using ODEE's Transactional Data

Andy Little
Technical Director

Today I'm going to demonstrate how you can query the Oracle Documaker Enterprise Edition ("ODEE") transactional data to get some statistics on your processing. All you will need to complete this exercise is a functional ODEE environment that has some transactions in it, and you might need to know a little bit about the content of those transactions. In addition you'll need access to the database that houses the Assembly Line - read only access is fine for this exercise. This will actually be a pretty simple exercise that doesn't require much knowledge of SQL syntax or XML, but a little bit of expertise here would be helpful. Let's get started! Note: the examples herein assume you're using Oracle database 12c. If you're using DB2 or MS-SQL Server, you may have to tweak these examples.

First, let's understand the history of this transactional data in Documaker. As you might be aware, the Standard edition of Documaker uses a proprietary record layout for the transactional data generated by Documaker during execution. The transactional data may be stored in two files: the POL file ("POLFILE"), which contains form set information, and the NA file ("NAFILE"), which contains section and variable field information. In some systems these two files are combined into a single file called the NAPOL file. The layout of the NAFILE is documented here.  Side note: I've been here for 17 years and I still haven't gotten a good explanation for what the acronyms POL and NA mean!

With the advent of ODEE, two important design changes occurred: all transactional data was moved from the filesystem into a relational database system. This change was crucial to create a scalable, robust architecture for an enterprise-class document automation solution. Relying on intermediary or final output files being written to disk creates a bottleneck for systems that need to scale to accommodate large volumes. The second change was more subtle - changing the internal NAPOL data format from a proprietary record-based format to an self-describing XML format. This means that the NAPOL data becomes immediately more useful. It is a rich data source about every transaction that can be mined, queried, and inspected for statistical analysis, spot checking, and workflow control, and more. In many cases this data is consolidated into a single place - the document - which would otherwise be scattered across many disparate systems. I've uploaded a sample of the NAFILE in XML format here.

When using ODEE the NAPOL data is stored in the TRNS table of the Assembly Line database, in the column named TRNNAPOLXML. This column is defined with the datatype XMLTYPE. In Oracle database, this means that we can execute XPATH-style queries and obtain results from an XML document using SQL-type syntax. Note: my goal here is not to explain how this works internally, but I am giving you a little background to be helpful. In order to query this table and the resulting XML data, we're going to use Oracle's XMLTABLE function. To begin, open SQLPLUS (or your favorite SQL tool, such as TOAD or SQLDeveloper) and login to your Assembly Line schema with an appropriate credential. Here's what this looks like using SQLPLUS on my demonstration system:

$ sqlplus
SQL*Plus: Release Production on Thu Jan 7 01:29:30 2016
Copyright (c) 1982, 2014, Oracle.  All rights reserved.
Enter user-name: dmkr_asline
Enter password: 
Last Successful login time: Thu Jan 07 2016 01:29:08 +05:30
Connected to:
Oracle Database 12c Enterprise Edition Release - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

Next, you'll want to enter the query to execute. First, let's step through the significant query elements and you can modify this for your system before executing it. The first line instructs the database to return the row count of the TRN_ID columns which match our forthcoming criteria.


The next line tells the database the source table(s) for our query. In this case, the actual table is the TRNS table, which is part of the DMKR_ASLINE schema.


In addtion to the TRNS table, we have defined a second table using the XMLTABLE() command. The Oracle database will create this virtual table using the XPATH query '//*:Form', which instructs the XML parser to obtain every <Form> element and child nodes, and the source XML document is from the column called TRNNAPOLXML (from the TRNS table).


The virtual table will contain a single column called 'v_name' and it's datatype is VARCHAR2 with a length of 100. The data value for this column will be source by applying the XPATH '@name' to the nodes obtained from //*:Form.

COLUMNS v_name varchar2(100) PATH '@name')

Finally, the WHERE clause will filter into our count only those rows from the virtual table where the column v_name has a value of 'UL APPLICATION REJECTION NOTICE'.


Below is a snapshot of the XML within the TRANNAPOLXML column. I've placed arrows pointing to the <Form> nodes in this XML document. You can see the attribute "name" here as well.

The parameters for the XMLTABLE function instruct the database to create an XML snippet from the TRNNAPOLXML column by applying the XPATH //*:Form. This means that the XML used to create the XMLTABLE will be restricted to only the <Form> elements and their children. The XMLTABLE parameters tell the database that the virtual table will contain one column, v_name, which comes from the XPATH @name, which is parsed from the XML snippet. The end result for the above example is an XMLTABLE that looks like this:


In practice, the virtual table will be much larger, as it could encompass data from all the rows in the TRNS table - it could be huge! We'll discuss this implication later. Recalling that our WHERE clause restricts our count to include only the form name 'UL APPLICATION REJECTION NOTICE', we note that the TIFFINCLUDE form will not be included in the overall count. So, at this point you should have an understanding of how the query works. Let's enter the query into SQL PLUS (Note - for readability lines are wrapped, however you will enter the following on a single line):


After pressing <ENTER>, the query will be executed and you'll get the results. If you did everything correctly, you see some output that looks similar to this:


Keep in mind of course that your results may vary based on two factors: 1) if you have no transactions, then you obviously won't get any results; and 2) you may need to change the name of the form you're querying if you haven't been using my demo system. What this tells me is that of all the transactions in my system, the form UL APPLICATION REJECTION NOTICE was used 45 times. What this information doesn't show is the timeframe in which this form was used, or over how many transactions this form was used. Those two additional pieces of information will give us insight into how often this particular form is used. Recall that the only WHERE clause we used was based on the form name - we didn't place any restrictions on the timeframe of the transaction. We can augment our query using an additional clause that restricts the query to only those transactions that are completed and were generated within a specific window. Note again that for readability lines are wrapped, but you will enter everything on a single line.



What's different is that  I added an AND conjunction and then the filter on the ENDTIME to include only those transactions that completed between 01-Jan-2016 and 07-Jan-2016, and then I added another AND conjunction with a final criteria of only completed transactions (TRNSTATUS=999). After pressing <ENTER> we see the results of the count - 4! To gain some additional insight we might want to know how many total forms were generated during this time period as well, and we can obtain this by removing v_name filter from the query:

SQL> SELECT count(TRN_ID) FROM DMKR_ASLINE.TRNS,XMLTABLE ('//*:Form' PASSING TRNNAPOLXML COLUMNS v_name varchar2(100) PATH '@name') WHERE ENDTIME BETWEEN to_timestamp_tz('01-JAN-16', 'DD-MON-RR HH.MI.SSXFF AM') AND to_timestamp_tz('07-JAN-16', 'DD-MON-RR HH.MI.SSXFF AM')AND TRNSTATUS = 999; 


This query shows us the count of all <Form> nodes present in transactions that completed successfully between 01-Jan-2016 and 07-Jan-2016. That means that the UL Application Rejection Notice constitutes 50% of forms generated during that time period! So now you might be wondering why this information is useful - the biggest benefit is knowing how a form is used in your business. You can customize this query to dig even further into the data model by looking at fonts, sections, rules, even actual data values. You can augment the query to consider only transactions that had to be processed manually, so you can understand which forms are being routed to interactive. You can interrogate this data to determine if some of your forms are generated in cycles. There are many ways you can view this information - the key is knowing how to access it! I hope this has been useful for you - feel free to post comments or questions!

Be the first to comment

Comments ( 0 )
Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.