
Technical info and insight on using Oracle Documaker for customer communication and document automation

  • ODEE
    December 17, 2015

Stability through Monitoring

Andy Little
Technical Director

Monitoring is a critical function for ensuring that systems and processes are running properly. Good monitoring practice can also be proactive, identifying and resolving potential problems before they occur. In many implementations the definition of process monitoring is omitted or deferred, which can lead to overlooking important items. Often this effort is left to systems analysts to determine which processes and files should be monitored, and without guidance it is easy to miss critical functions. My goal in this post is to consolidate and
explain some of the basic areas within a Documaker Enterprise implementation that should be monitored, and how to monitor them.

Processes

Process monitoring is a basic function of all IT departments and is
typically done with enterprise-level tools. Where Oracle Documaker is
concerned, there are two types of processes that should be monitored:
singletons and over-watch processes. A singleton is a process that exists in
only one instance and thus represents a single point of failure (SPOF). SPOFs
should be minimized in a highly available environment, so process monitoring
should be enabled to mitigate singletons. Two workers in the Oracle Documaker
infrastructure can be configured as singletons: the Historian and the
Receiver. These workers can be deployed in a clustered fashion; however, certain
implementation choices preclude the use of a clustered environment and should
therefore be avoided. Hot folder submission in a clustered environment is the
main example: while the hot folder is a supported implementation method, it is
not advisable from an enterprise perspective. Additionally, when hot folder
submission is used, there should be at most one Receiver per hot folder
directory. This model has two implications:

  • One hot folder per cluster member that must be
    independent and not shared.
  • Processes that deposit files to hot folders must
    use round-robin protocol to ensure load-balanced job submission.

Failure to implement a file delivery model that meets these
requirements means a single Receiver worker must be run across the ODEE
cluster. To do otherwise results in duplicate job creation, as multiple
Receivers will attempt to process the same data files[1].

The Historian performs high volumes of database activity and
should be scheduled to run at off-peak hours. To prevent overloading the
database, only one Historian worker should be configured. The Historian is not
critical to the document generation process and does not represent a SPOF in
the document generation process chain.

Singleton processes are monitored by the application’s
over-watch process and therefore do not need special handling in the event of
failure; however, they should be monitored to ensure there is always
at least one instance running on the primary node. In the event of primary node
failure, the singleton must be started on another node. Upon recovery, the singleton
should be shut down on the failover node and restarted on the recovered node.
Startup and shutdown of the singleton is controlled by the presence of the appropriate
JAR in the deploy directory. The following processes should be monitored on
each node of the application tier to ensure functional operation of the Oracle
Documaker system:

Application       Process                   Type
Docfactory        docfactory_supervisor     Over-watch process
Docupresentment   idswatchdog               Over-watch process
Docfactory        docfactory_receiver[2]    Singleton
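As a minimal sketch (assuming standard Unix tools and the process names in the table above), a monitor might count matching entries in a `ps` listing; integration with your alerting system is left to your enterprise tooling.

```shell
#!/bin/sh
# Sketch: verify the over-watch processes in the table are running.
# Taking the ps listing as an argument keeps the check testable.

count_procs() {
    # $1 = process name pattern, $2 = output of `ps -eo args`
    printf '%s\n' "$2" | grep -c "$1"
}

check_node() {
    listing=$(ps -eo args)
    for name in docfactory_supervisor idswatchdog; do
        if [ "$(count_procs "$name" "$listing")" -eq 0 ]; then
            echo "ALERT: $name is not running" >&2
        fi
    done
}
# Usage: call check_node from cron or your monitoring agent.
```

On nodes where the Receiver runs as a singleton, `docfactory_receiver` would be added to the list of names to check.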

Numerous restarts of the over-watch processes are indicative of a
non-functional system and merit investigation. The over-watch processes are
responsible for ensuring that the sub-processes are running and load-balanced,
and there are many tunable parameters that govern when they start or kill
sub-processes. It is worth noting that the over-watch processes start the
sub-processes, so a parent-child relationship is established: if a parent is
stopped, its children will also be stopped. The monitoring application should
therefore ensure that children are properly stopped before restarting the
parent. Although outside the scope of this document, it is also advisable to
monitor the appropriate processes of the Presentation Tier to ensure functional
operation of the web application server
(e.g. WebLogic NodeManager or WebSphere Application Server).
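The stop-then-restart ordering described above can be sketched as follows. The `docfactory.sh` script appears later in this post's startup example; its `stop` argument, the `docfactory_` process pattern, and the polling loop are assumptions to adapt to your environment.

```shell
#!/bin/sh
# Sketch: restart an over-watch process only after its children have
# fully exited, so the new parent does not collide with orphaned workers.

wait_for_exit() {
    # $1 = process pattern, $2 = max seconds to wait (default 60)
    max=${2:-60}; i=0
    while pgrep -f "$1" >/dev/null 2>&1 && [ "$i" -lt "$max" ]; do
        sleep 1
        i=$((i + 1))
    done
}

restart_docfactory() {
    ./docfactory.sh stop
    wait_for_exit docfactory_      # supervisor and all child workers
    ./docfactory.sh start
}
```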

Recommendations

If the Receiver must be implemented as a singleton, amend the
environment build instructions as follows:

  • After installing Documaker on all application nodes, create
    the directory [ODEE_HOME]/documaker/docfactory/undeploy.
  • On non-primary application nodes, move receiver.jar
    from [ODEE_HOME]/documaker/docfactory/deploy to [ODEE_HOME]/documaker/docfactory/undeploy.

Create operational procedures that reflect the following
instructions:

  • Upon failure of the primary application node, move receiver.jar from
    [ODEE_HOME]/documaker/docfactory/undeploy to [ODEE_HOME]/documaker/docfactory/deploy on the first
    non-primary node, hereafter called the failover node.
  • Upon recovery of the primary application node, move receiver.jar
    from [ODEE_HOME]/documaker/docfactory/deploy to [ODEE_HOME]/documaker/docfactory/undeploy on the failover
    node.
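The two operational procedures above can be sketched as shell functions to run on the failover node. The deploy/undeploy paths come from the build instructions; passing the ODEE home as an argument is a convenience assumption.

```shell
#!/bin/sh
# Sketch: toggle the Receiver singleton by moving receiver.jar between
# the deploy and undeploy directories, as described in the procedures.

activate_receiver() {    # run on the failover node when the primary fails
    mv "$1/documaker/docfactory/undeploy/receiver.jar" \
       "$1/documaker/docfactory/deploy/receiver.jar"
}

deactivate_receiver() {  # run on the failover node once the primary recovers
    mv "$1/documaker/docfactory/deploy/receiver.jar" \
       "$1/documaker/docfactory/undeploy/receiver.jar"
}
```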

Monitoring over-watch processes:

  • Establish thresholds for tracked metrics (e.g. process memory or
    CPU consumption, excessive GC or long-running GC, child-process restarts) to determine
    when investigation is required.
  • Ensure child processes are completely stopped before restarting
    an over-watch process.

Monitoring child processes is not necessary in production, as this is the responsibility of the over-watch processes;
however, it can be useful for performance tuning and appropriate sizing in a
performance test environment.

Logging

The logging mechanism within Oracle Documaker is highly
configurable and is based on LOG4J principles and industry standard logging
patterns. Log messages are generated by ODEE components and are passed to a
logging component, which then routes the messages according to priorities, filters, and appenders.

Priorities

Priorities determine which log messages are captured.
Priorities are defined at the APPCONFIGCONTEXT level (per application/worker
process) and are set for the various Java components of the Document Factory.
It is generally advisable to keep the default “ERROR”
priority for all entries unless directed to modify a package for diagnostic
purposes.

Priority   Contents
FATAL      Events that prevent Documaker from starting properly (not typically used)
ERROR      Events that cannot be processed and prevent Documaker from running properly
WARN       Events that cannot be processed but do not prevent Documaker from running properly
INFO       Informational messages (not typically used)
DEBUG      Diagnostic information

Filters

Document Factory workers use filters to determine the location
where log messages are written: database or file. Filters are defined by creating
LogFilter entries in the ALCONFIGCONTEXT table. 
Each LogFilter entry includes a package name that correlates to specific
components within Documaker. When a Documaker component generates a log
message, it uses the LogFilter list to determine if it should pass the message
to the database or to the file system. Any package that is named in a LogFilter
will be written to the database; conversely packages not named in LogFilters
will be written to the file system in the [ODEE_HOME]/doucmaker/docfactory/temp/<process-name>
directory. The installation process creates LogFilter entries for each worker,
and some additional components. To change the log location for these
components, deactivate the corresponding row in the ALCONFIGCONTEXT table by
using Documaker Administrator (System -> Assembly Line -> Configure ->
LOG4J_LogFilter_LogFilter) or update the ACTIVE value to 0 in the table for the
appropriate row.
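As an illustration of the second option (updating the table directly), the statement below writes the update to a script for review. Only the ACTIVE column comes from the post; the NAME predicate, the package name shown, and the sqlplus invocation are assumptions to verify against your actual schema.

```shell
#!/bin/sh
# Sketch: deactivate a LogFilter row so the matching component logs to
# the file system instead of the database. Verify the key columns of
# ALCONFIGCONTEXT in your schema before running anything like this.

cat <<'EOF' > deactivate_logfilter.sql
UPDATE ALCONFIGCONTEXT
   SET ACTIVE = 0
 WHERE NAME LIKE '%LogFilter%'
   AND NAME LIKE '%oracle.documaker.somepackage%';  -- placeholder package
COMMIT;
EOF

# Review the statement, then run it, e.g.:
#   sqlplus <assembly-line-user>@<db> @deactivate_logfilter.sql
```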

Appenders

Appenders define the destinations to which log statements can be
sent. They are defined globally in the ALCONFIGCONTEXT table, and can also be
defined at the APPCONFIGCONTEXT level, which provides an override at
the application or worker process level. Each appender has specific
configuration information; see the ODEE Administrator
Guide, “Configuring the LOG4J Appenders”.

stdout – Standard output (e.g. the console). To redirect console
output from stdout, amend your startup script to redirect to the
desired file, for example:

./docfactory.sh start 1>stdout.txt 2>stderr.txt

Monitor the size of this output periodically, as it will grow until
available disk space is exhausted.

roll – File system. To modify this setting, use the Documaker
Administrator to configure the Assembly Line: locate Context LOG4J,
Category Appender, Group roll, and modify the File property to point
to the desired logging location. Replacement variables such as
~THREADID and ~PROGRAM can be used to further categorize the
filename/directory structure of the log. The default LOG4J
configuration sets a maximum size for this file and rolls it
automatically when the size limit is hit; the limit can be adjusted
via the MaxFileSize property, and the number of rolled files retained
via the MaxBackupIndex property.

process-roll – File system. Configured the same way as roll, but under
Group process-roll.

LogAppender – LOGS table (INFO, DEBUG, and WARN priorities).

ErrorAppender – ERRS table (ERROR and FATAL priorities).

EMAIL – Email notification for critical error messages.

Loggers

Loggers provide the match between Document Factory components
and appenders. Loggers are hierarchical by name,
e.g. oracle.documaker is the parent of oracle.documaker.util. Parent loggers
will also receive log messages from their descendants if the additivity value
is set to YES, which can result in duplicate messages. Loggers and their
settings are defined in the ALCONFIGCONTEXT table. The logger settings
determine which event information from the logging package is captured
(Priority) and the destinations available to the logger (Appenders).

Special Circumstances

When a worker starts, it must establish a database connection to
obtain its settings, particularly its LOG4J settings. Event messages
generated before that connection exists are logged to the file system.
Rudimentary logging settings are stored in the log4j.xml file located inside
the worker’s deployment JAR file; deployment JAR files are located in the
[ODEE_HOME]/docfactory/deploy directory. At installation, the LOG4J settings
in this file are the same as the roll, LogAppender, and ErrorAppender settings
described above, so if you change those appenders
it is recommended to change them in the deploy files as well.

Documaker Interactive generally logs messages to the LOGS and
ERRS tables as necessary. However, during debugging sessions it may be
necessary to send debug information to a file rather than one of those
tables. The file location is specified in Documaker
Administrator -> System -> Assembly Line -> Correspondence ->
Context Name LOG4J, Category LOGGING, Group Name LOG4J_INIT, Property
logFilePath. By default this file appears in the root of the idm_server
directory on the presentation tier web application server.

Recommendations

  • Consolidate file logging to a single area for
    monitoring
    • DocFactory
      stdout/stderr (console redirect)
    • DocFactory
      Worker process logs (process-roll appender)
    • DocFactory
      Worker program logs (roll appender)
    • Documaker
      Interactive – debug logs
    • Web
      Application Server logs – use the console web application to change the
      name/location for dmkr_server and idm_server log files. Keep in mind this
      location must be uniformly accessible across all nodes in the cluster.
  • Periodically scan ERRS table for relevant error
    messages that may need resolution. Filter out document-related errors that are
    due to publishing problems (e.g. missing required data or other non-system
    errors). Use ERRDATA and ERRPROGRAM columns for filters –
    filter values TBD.
  • Periodically scan LOGS table for relevant
    information about the general health of the system. Filter out document-related
    log information. Filter values TBD.
  • For reconciliation reporting, scan the JOBS,
    TRNS, and PUBS tables and filter on JOBSTATUS, TRNSTATUS, and PUBSTATUS LIKE '%41';
    this will show any jobs, transactions, or publications that resulted in an
    error. It may be desirable to create a VIEW that consolidates these columns
    into a single source that can be queried.
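The suggested consolidating VIEW might be sketched as follows. The JOBS/TRNS/PUBS tables and their *STATUS columns come from the recommendation above; the ID column names are placeholders to replace with your schema's actual keys.

```shell
#!/bin/sh
# Sketch: write a reconciliation view over the three status columns.
# The *_ID column names below are placeholders - substitute the real
# key columns from your assembly-line schema.

cat <<'EOF' > recon_view.sql
CREATE OR REPLACE VIEW RECON_ERRORS AS
SELECT 'JOB' AS SOURCE, JOB_ID AS ID, JOBSTATUS AS STATUS
  FROM JOBS WHERE JOBSTATUS LIKE '%41'
UNION ALL
SELECT 'TRN', TRN_ID, TRNSTATUS
  FROM TRNS WHERE TRNSTATUS LIKE '%41'
UNION ALL
SELECT 'PUB', PUB_ID, PUBSTATUS
  FROM PUBS WHERE PUBSTATUS LIKE '%41';
EOF
```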

Instrumentation

The Supervisor and the Java-based workers (all except Assembler,
Distributor, and Presenter) support JMX instrumentation for monitoring class
loading, memory usage, garbage collection, and deadlocks. Each worker instance
is monitored on a separate TCP/IP port: ports are assigned a starting number
and then incremented by 1 for each additional instance.
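If instrumentation is enabled (say, in a test environment), the per-instance port scheme can be expressed with the standard JVM JMX system properties. How these options are injected into each worker's startup is implementation-specific, and the base port shown is an arbitrary example.

```shell
#!/bin/sh
# Sketch: build the JMX options for one worker instance, following the
# "starting port + 1 per additional instance" scheme from the post.
# Authentication and SSL are disabled here for a test environment only.

BASE_PORT=9010
INSTANCE=0            # 0 for the first instance, 1 for the second, ...
PORT=$((BASE_PORT + INSTANCE))

JMX_OPTS="-Dcom.sun.management.jmxremote \
 -Dcom.sun.management.jmxremote.port=$PORT \
 -Dcom.sun.management.jmxremote.authenticate=false \
 -Dcom.sun.management.jmxremote.ssl=false"

echo "$JMX_OPTS"
```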

Recommendations

The documented recommendation is not
to enable instrumentation in production because of the additional overhead and
port usage; however, this can be mitigated with a long interval between checks
(the default is 60 seconds). With the automated instrumentation introduced in
Java 6, JMX is not strictly required: instrumentation can be configured
programmatically pursuant to the needs of an organization. Utilize enterprise
tools to inject instrumentation code and perform inspection according to
desired results.

Notification

The Supervisor can send email notifications in the event of a fault. The
messages are not configurable and are the same messages delivered via the
other LOG4J methods; email is simply one of the appenders, so this
information will be captured in the other appender locations as well.

Recommendations

  • Consider email notification in non-production environments to
    support development efforts.
  • Consolidate issue notification using an enterprise-wide tool –
    therefore consider using the other methods provided by the software to log
    messages to common locations.

System

A complete monitoring plan should include the overall
health of each node across the tiers of the Oracle Documaker environment. At
a high level, the following metrics should be monitored:

  • CPU Utilization – utilization should average at or below 80%. Peaks
    are expected (e.g. during process startup or shutdown, or during
    high-load times), but the average should hover at or below 80%. An
    average above 80% suggests the need for performance tuning or other
    remediation, which may include one or more of the following:
    • Inspect the affected node’s configuration and ensure there are no
      unnecessary processes running.
    • Add a node to the affected node’s cluster. For example, if the
      affected node is a Documaker application node in a cluster, create
      a new node and add it to the cluster.
    • Tune the process ceilings allowed on the affected node. For
      example, if the affected node is a Documaker application node and
      the UseLoadBalancing settings are enabled, it may be necessary to
      set a ceiling on the maximum number of processes that can be
      started.
  • Memory Utilization / Swapping – allocating the proper amount of
    memory to a node is imperative, since disk swapping leads to poor
    performance. If excessive memory/disk swapping occurs, consider one
    or more of the following remediation activities:
    • Inspect the memory consumption of processes on the affected node
      and remediate any irregularities (for non-Documaker processes).
    • Reduce the number of processes running on the affected node. If
      UseLoadBalancing settings are enabled on a Documaker application
      node, consider setting a ceiling on the maximum number of
      processes that can be started. Inspect the node to ensure there
      are no unnecessary processes running.
    • Reduce the memory allocation of processes. Performance tests
      should be conducted to determine the appropriate memory allocation
      to achieve maximum performance for a given assembly line. The
      optimal configuration may result in a higher memory specification
      for application nodes, so be aware of appropriate node memory
      sizing.
  • Disk Space – ensure the system has adequate free space for swap
    files, temporary files, log files, and data file storage areas as
    determined by system, business, and technology requirements.
  • Table Space – ensure the tablespaces used by the Documaker system
    have adequate space allocated so that new rows can be added to meet
    processing requirements.
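A minimal sketch of threshold checks for these metrics, assuming standard Unix tools: the 80% CPU figure follows the guidance above, while the 90% disk threshold is an example value to adjust per your requirements.

```shell
#!/bin/sh
# Sketch: threshold checks for node health. Feed cpu_ok the average
# utilization from your collector; the df/awk line flags any
# filesystem at or above the example 90% threshold.

cpu_ok() {
    # $1 = average CPU utilization percentage
    [ "$1" -le 80 ]
}

disk_ok() {
    # $1 = filesystem use percentage (e.g. column 5 of `df -P`)
    [ "$1" -lt 90 ]
}

# Example: flag any filesystem at or above the disk threshold.
df -P | awk 'NR > 1 { sub(/%/, "", $5);
                      if ($5 + 0 >= 90) print "ALERT:", $6, $5 "%" }'
```

Tablespace checks would instead query the database's own dictionary views; consult your database documentation.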

The presence of other components of an enterprise system, such as web
application servers and the database, may suggest additional monitoring
requirements; consult the appropriate documentation for those systems.

Housekeeping

The Oracle Documaker data schema includes live and historical data tables. Live tables are populated and read
continuously by the workers within the DocFactory; the workers do not
consume data pushed into the historical tables. No differentiation of data
disposition (i.e. live versus history tables) is presented to external consumers
(e.g. web service consumers, dashboard or Interactive users); to these consumers
the data simply appears to be within the Documaker schema. This is useful for
maintaining historical information apart from live data while still keeping
the historical data usable. It is important to keep the live data
tables as lean as possible while still supporting business requirements.

Oracle Documaker includes a Historian worker that facilitates movement of data from live to
historical tables. The Historian worker can also clean the LOGS
and ERRS tables, and will remove old data from the historical tables.
All of these functions are configurable using schedules and rules. Schedules
determine when a particular Historian task executes, typically during idle
time. Rules determine eligibility for processing and can be used to support
business requirements for retention; e.g. transactions matching certain
eligibility conditions may need to be retained for a specific time period.
The Historian can also purge specific columns of data, which allows the
system to retain transactional information for statistical use while removing
the heavyweight columns (e.g. BLOBs containing XML data and print stream
data) to keep the system lean.

Refer to the Historian documentation in the Documaker Enterprise
Edition Administrator Guide for complete details on configuration and
scheduling of Historian tasks.



[1] Note: this problem appears
to be specific to Linux systems, which do not implement file locking in the
same manner as Windows systems.

[2] This process should only
be monitored if the Receiver has been implemented as a Singleton.
