Inside the Document Factory

The Oracle Documaker Enterprise Edition documentation includes a glossary entry that defines the Document Factory:

"Document Factory is a document publishing and distribution application. The Document Factory monitors the jobs, transactions and other objects in Document Factory."

It’s terse, but at its heart, that is exactly what the Document Factory does. For brevity, let’s call it docfactory from here on out: we’ll use Automated Document Factory ("ADF") to describe the processing model, and docfactory to mean the actual software. ADF is a term defined by Gartner as:

"…an architecture and set of processes to manage the creation and delivery of mission-critical, high-volume digital documents. The ADF applies factory production concepts to the document production — raw materials, including data and preparation instructions, enter the ADF, where they are transformed into digital documents and prepared for delivery."

Catchy stuff indeed, but it formed the basis of the transformation of Documaker from a batch-processing, single-threaded document production engine into a full-fledged, enterprise-class document automation center. So… how exactly does this document factory work?

Let’s start with the basics. The factory metaphor features some common roles filled by humans: supervisors (who ensure everything runs), schedulers (who determine when things should be done), and workers (who perform tasks). In the docfactory, these roles are filled by Java processes with matching names: Supervisor, Scheduler, and a host of workers (Historian, Archiver, Batcher, Distributor, Assembler, Presenter, Publisher, Receiver, and Identifier).

The workers are further classified into two groups: pure Java workers and Gendata workers, so called because they are built using Documaker core runtime components. The classification is used primarily to control how the different workers scale. The pure Java workers are thread-pool aware, whereas the Gendata workers are instance-aware. Stay tuned for an in-depth article on scaling that covers this topic in greater detail.

In addition to the Java processes, the docfactory requires several external resources to run: a database, a database connection pool, a set of JMS queues hosted on a JMS server, and JNDI lookup services. All of these items, with the exception of the database, are provided by an application server, either WebLogic or WebSphere. The database services can be provided by Oracle 11g or IBM DB2 9.7.

Processing begins with input data, which can make its way to the docfactory via several channels – hot directory, web service, or queue – and each channel has different handlers which control the processing. Receiver worker threads monitor directories which are configured as hot directories. Whenever a file is written to a hot directory, a Receiver worker thread will rename the file to a temporary name to prevent other Receiver worker threads from trying to process the file. The input file is read, and a new JOBS table record is created, which contains the input data from the file. Once the record is created, the Receiver sets the JOBSTATUS value for the new row, indicating that it is
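The claim-by-rename step described above can be illustrated with a small sketch. This is not the Receiver's actual implementation, only an illustration of the general technique under stated assumptions: the function names `claim_file` and `poll_hot_directory`, the `.processing` suffix, and the in-memory `jobs` list standing in for the JOBS table are all hypothetical. The sketch relies on the fact that a rename fails for every caller except the first one when several workers race for the same file.

```python
from __future__ import annotations

import os
import tempfile
import uuid
from pathlib import Path


def claim_file(path: Path) -> Path | None:
    """Try to claim an input file by renaming it to a unique temporary name.

    Returns the new path if this caller won the claim, or None if the file
    was already renamed (or removed) by another worker.
    """
    # Hypothetical naming scheme: original name + random token + suffix.
    claimed = path.with_name(f"{path.name}.{uuid.uuid4().hex}.processing")
    try:
        os.rename(path, claimed)
    except FileNotFoundError:
        # Another worker got there first; nothing left to claim.
        return None
    return claimed


def poll_hot_directory(hot_dir: Path, jobs: list) -> int:
    """Claim each unclaimed file in hot_dir and record one job per file.

    `jobs` is an in-memory stand-in for the JOBS table (an assumption made
    for this sketch). Status handling is deliberately left out.
    """
    created = 0
    for entry in sorted(hot_dir.iterdir()):
        # Skip directories and files that some worker has already claimed.
        if not entry.is_file() or entry.name.endswith(".processing"):
            continue
        claimed = claim_file(entry)
        if claimed is None:
            continue
        # The new job record carries the input data read from the file.
        jobs.append({"source": entry.name, "data": claimed.read_bytes()})
        created += 1
    return created


if __name__ == "__main__":
    demo_dir = Path(tempfile.mkdtemp())
    (demo_dir / "policy-extract.xml").write_bytes(b"<extract/>")
    demo_jobs: list = []
    print("jobs created:", poll_hot_directory(demo_dir, demo_jobs))
```

Because each worker renames to its own unique target name, two workers can never both succeed on the same source file, so no separate lock file or shared in-process lock is needed in this sketch.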

Andy Little

Technical Director
