So you've just installed a brand-new Oracle Documaker Enterprise Edition ("ODEE") system, and at some point during your implementation you're going to have to scale it. You are probably already familiar with ODEE's scaling properties, but let's review a little. In the past, with Standard Edition (a/k/a Documaker, Documaker RP, "ODSE", or various other names), scaling meant figuring out how to split input files into multiple jobs, distributing those jobs to multiple executions of ODSE, commingling the intermediary output, and then running a final print process. This created a rigid framework that had to be scaled manually to meet increased volumes or reduced processing windows.
Some years ago, Docupresentment (a/k/a IDS) came along and suddenly Documaker was adorned with a service-based interface that allows for real-time document generation, both in batch and in "batches of one". Docupresentment added some enhanced scaling capabilities, but it still requires some manual intervention to scale large batches and has limited automatic scaling capabilities.
With ODEE's database-backed processing combined with scalable technologies, you're in the driver's seat of a supercar in the world of truly scalable document automation. Under the hood, ODEE uses JMS queues to push internal units of work from schedulers to workers, and as such it requires a well-tuned JMS server to obtain the best performance. In this post, I'm going to discuss JMS configuration within WebLogic and how you can configure JMS for high availability and failover with ODEE. Finally, we'll cover one facet of tuning: JMS performance. Let's get started!
Let's review some of the JMS implementation details within WebLogic. The JMS components deployed by the ODEE installer consist of:
The hierarchy of these objects looks like this in a default installation with a single assembly line.
An ODEE Assembly Line has its own set of workers and therefore needs its own set of JMS resources - this is why the hierarchy of components is structured as it is: an Assembly Line has a JMS Server, JMS Module, Subdeployment, Queues, and a QCF. These can be collectively retargeted and migrated as scaling needs change.
First, it is important to know that WebLogic JMS provides two load-balancing algorithms: Round Robin (the default) and Random. In the round-robin algorithm, WebLogic maintains an ordering of the physical destinations within the distributed destination, and the messaging load is distributed across them one at a time, in the order in which they are defined in the WebLogic Server configuration. Each WebLogic Server maintains an identical ordering, but may be at a different point within it. Multiple threads of execution within a single server that use the same distributed destination share that position, so they affect which physical destination each of them is assigned when producing a message. Because round-robin is the default, it needs no configuration, and it is the algorithm recommended for Documaker.
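To make the two algorithms concrete, here is a minimal sketch in plain Python (not WLST, and not WebLogic's actual implementation). The class names and the member names are hypothetical; the `start` argument only illustrates the point that each server walks the same ordering but may be at a different position in it.

```python
import itertools
import random


class RoundRobinBalancer:
    """Hands out destination members one at a time, in configured order."""

    def __init__(self, members, start=0):
        members = list(members)
        # Same ordering on every server; 'start' models a different position.
        rotated = members[start:] + members[:start]
        self._cycle = itertools.cycle(rotated)

    def next_member(self):
        return next(self._cycle)


class RandomBalancer:
    """Picks any destination member with equal probability."""

    def __init__(self, members, seed=None):
        self._members = list(members)
        self._rng = random.Random(seed)

    def next_member(self):
        return self._rng.choice(self._members)


# Hypothetical member names, for illustration only.
members = ["member_1", "member_2", "member_3"]

server_a = RoundRobinBalancer(members)
server_b = RoundRobinBalancer(members, start=1)
picks_a = [server_a.next_member() for _ in range(4)]
picks_b = [server_b.next_member() for _ in range(4)]

random_picks = [RandomBalancer(members, seed=1).next_member() for _ in range(4)]
```

With round-robin, each member receives an equal share once a full cycle has completed; with random selection the shares only even out over a large number of messages.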
When an ODEE Worker starts, it must connect to a queue destination as a consumer. When distributed destinations are used, WebLogic JMS must find a physical destination that the worker will receive messages from. The choice of which destination member to use is made only upon initial connection by using one of the load-balancing algorithms. From that point on, the consumer gets messages from that member only. When testing failover behavior of Workers and queues, you will notice how ODEE handles loss of queue connections. When a distributed JMS destination member goes down, the Worker will lose connection to the member, and will destroy the existing consumer. The Worker will attempt to re-establish queue connection by creating a new consumer, according to the selected load-balancing algorithm.
When a producer sends a message, WebLogic JMS looks at the destination where the message is being sent. If the destination is a distributed destination, WebLogic JMS decides where the message will be sent: the producer sends to one of the destination members according to one of the load-balancing algorithms, and it makes that decision each time it sends a message. However, there is no compromise of ordering guarantees between a consumer and producer, because consumers are load balanced once and are then pinned to a single destination member. If a producer attempts to send a persistent message to a distributed destination, every effort is made to first forward the message to distributed members that utilize a persistent store. However, if none of the distributed members utilize a persistent store, the message will still be sent to one of the members according to the selected load-balancing algorithm. It is therefore important to understand that JMS Servers do not share messages in a cluster unless additional configuration is performed to forward JMS messages between distributed queue members.
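The difference between the two sides (producers choose a member for every message, consumers choose once and stay pinned) can be sketched with a toy model in plain Python. Everything here is hypothetical and simplified: the `DistributedQueue` class, the member names, and the job names are made up for illustration, and no message forwarding between members is modeled.

```python
from collections import deque
from itertools import cycle


class DistributedQueue:
    """Toy model of a distributed destination with round-robin balancing."""

    def __init__(self, member_names):
        self.members = {name: deque() for name in member_names}
        self._producer_cycle = cycle(member_names)
        self._consumer_cycle = cycle(member_names)

    def send(self, message):
        # Producer side: a member is chosen again for every message.
        name = next(self._producer_cycle)
        self.members[name].append(message)
        return name

    def connect_consumer(self):
        # Consumer side: a member is chosen once, at connection time.
        return next(self._consumer_cycle)

    def receive(self, pinned_member):
        # A pinned consumer only ever reads from its own member.
        queue = self.members[pinned_member]
        return queue.popleft() if queue else None


dq = DistributedQueue(["member_a", "member_b"])
worker_member = dq.connect_consumer()  # a single worker, pinned to member_a

for i in range(6):
    dq.send("job-%d" % i)

received = []
while True:
    message = dq.receive(worker_member)
    if message is None:
        break
    received.append(message)

stranded = list(dq.members["member_b"])
```

In this model the single worker only ever sees the messages that landed on its own member; the rest wait on the other member until a consumer connects there or forwarding is configured.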
This specific configuration relates to JMS clustering. However, in our testing with ODEE 12.6.2 we found that it does not properly support the use of clustered JMS queues (we have found that some older versions may support them). A primary objective in implementing high availability is to eliminate single points of failure (SPoFs), and clustering is a typical remediation for SPoFs. WebLogic offers another option that also remediates SPoFs: service migration, which is a feature of WebLogic high availability. In this configuration, you create a cluster of WebLogic managed servers, which can be scaled, and pin the JMS service to one cluster member; the service can then be migrated automatically (or manually, if you prefer) from an unhealthy cluster member to a healthy one. This model requires a bit more effort to ensure that the cluster members are sized appropriately for the work being passed through the system; however, in our testing we have found JMS services to be extremely lightweight, with a trivial effect on system processing speed.
Failover configuration for JMS can take several forms depending on your level of tolerance for message loss. Since this post is specifically dealing with performance I'm not going to cover failover in great detail. In general, JMS services can be configured for service migration which meets the failover requirement. To modify the default deployment of ODEE to support highly available configuration, perform the following configuration steps in WebLogic Console.
These instructions assume you have an existing ODEE installation that is already deployed to WebLogic, which means you have a machine (node), on which are multiple managed servers (one of which is hosting JMS modules). These instructions assume some familiarity with WebLogic Console, which is where this configuration takes place.
During a failover scenario, this configuration should act as follows: if the server instance that is hosting the JMS deployment fails, the services are automatically migrated to the next member of the cluster. The producers and consumers using those JMS resources will then fail to connect to the now-nonexistent service on the now-dead server, and a connection will be established to the next server in the list provided by the jms.provider.URL setting. Messages remain intact if the persistent store is a database.
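The reconnect behavior described above can be pictured with a small plain-Python sketch. It assumes the provider URL is a comma-separated address list of the form t3://host:port,host:port and that addresses are tried in list order; both are assumptions made for illustration, as are the host names and the `fake_connect` stand-in. It is not ODEE's or WebLogic's client code.

```python
def parse_provider_url(url):
    """Split 'scheme://a:1,b:2' into ['scheme://a:1', 'scheme://b:2']."""
    scheme, sep, rest = url.partition("://")
    if not sep:
        raise ValueError("expected scheme://host:port[,host:port...]: %r" % url)
    return ["%s://%s" % (scheme, addr.strip())
            for addr in rest.split(",") if addr.strip()]


def connect_with_failover(url, try_connect):
    """Try each address in list order; return the first that connects."""
    errors = []
    for candidate in parse_provider_url(url):
        try:
            return candidate, try_connect(candidate)
        except ConnectionError as exc:
            errors.append((candidate, str(exc)))
    raise ConnectionError("no server reachable: %r" % errors)


def fake_connect(address):
    # Hypothetical stand-in for a real connection attempt: hostA is "dead".
    if address == "t3://hostA:7001":
        raise ConnectionError("hostA is down")
    return "session@" + address


address, session = connect_with_failover(
    "t3://hostA:7001,hostB:7001", fake_connect)
```

Here the first address fails and the connection lands on the second, which mirrors what you should observe from the Workers once the JMS services have migrated.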
One method of performance tuning an ODEE implementation involves determining how efficiently workers are handling the workload. Because every implementation is different (different inputs, documents, and rules), there isn't a one-size-fits-all solution. There are a number of activities that you can undertake to give visibility into your system, and one such activity is to monitor your JMS queues. Each queue can expose information about how many messages it contains, the high water mark of messages (i.e. the maximum number of messages that have existed in the queue), the number of active consumers, and more. For our purposes, we are interested in three figures for each queue: the number of consumers, the number of messages, and the high water mark of messages. If you've spent any time digging around in WebLogic Console, you will have learned that capturing enough of this information to conduct trend analysis is somewhat painful, requiring a lot of configuration and overhead. Luckily, I have put together a handy script that you can run in WLST to capture or display this information. You can download the script here.
########## USER SETTINGS ############
# connection to WebLogic Instance
username = '<weblogic_user_id>'
password = '<weblogic_password>'
wlsUrl = 't3://<hostname>:<port>'
# milliseconds to wait between polls to JMS queues
sleepTime = 5000
# comma-delimited list of managed servers hosting JMS services to query.
includeServer = ['jms_server']
# comma-delimited list of JMS servers (note: not managed servers!) to query.
includeJms = ['AL1Server']
# comma-delimited list of JMS destinations to query.
includeDestinations = ['IdentifierReq','PresenterReq','AssemblerReq','DistributorReq','ArchiverReq']
#ReceiverReq,ReceiverRes,PubNotifierReq,BatcherReq,SchedulerReq,PublisherReq
# path/file name of logfile to write output
logfilename = 'jmsmon.csv'
# Logging output options:
# 0 - log to screen and file
# 1 - log to file
# 2 - log to screen
logoption = 0
############ END USER SETTINGS ###########

import time
from time import gmtime, strftime

def getTime():
    # timestamps are written in UTC (gmtime)
    return strftime("%Y-%m-%d %H:%M:%S", gmtime())

def monitorJms():
    # walk: managed server -> JMS server -> destination, filtered by the include lists
    servers = domainRuntimeService.getServerRuntimes()
    if (len(servers) > 0):
        for server in servers:
            serverName = server.getName()
            if serverName in includeServer:
                jmsRuntime = server.getJMSRuntime()
                jmsServers = jmsRuntime.getJMSServers()
                for jmsServer in jmsServers:
                    jmsName = jmsServer.getName()
                    if jmsName in includeJms:
                        destinations = jmsServer.getDestinations()
                        for destination in destinations:
                            destName = destination.getName()
                            # keep only the part after '@' (whole name if there is no '@')
                            destName = destName[destName.find('@')+1:]
                            if destName in includeDestinations:
                                try:
                                    if (logoption < 2):
                                        f.write("%s,%s,%s,%s,%s,%s,%s\n" %(getTime(),serverName,jmsName,destName,destination.getMessagesCurrentCount(),destination.getMessagesHighCount(),destination.getConsumersCurrentCount()))
                                    if (logoption == 0 or logoption == 2):
                                        print("%s\t%s\t%s\t%s\t%s,%s\t\t\t%s" %(getTime(),serverName,jmsName,destName,destination.getMessagesCurrentCount(),destination.getMessagesHighCount(),destination.getConsumersCurrentCount()))
                                except:
                                    if (logoption < 2):
                                        f.write('ERROR_DATA\n')
                                    if (logoption == 0 or logoption == 2):
                                        print('ERROR_DATA!')

connect(username, password, wlsUrl)
if (logoption < 2):
    # append mode: a header row is written each time the script starts
    f = open(logfilename, 'a+')
    f.write('Time,ServerName,JMSServer,Destination,Msgs Cur,Msgs High,ConsumersCur\n')
if (logoption == 0 or logoption == 2):
    print('Time\t\t\tServerName\tJMSServer\tDestName\tMesg Cur,High\tCons. Cur Count')
try:
    # poll until interrupted with Ctrl+C
    while 1:
        monitorJms()
        if (logoption == 0 or logoption == 2):
            print('--')
        java.lang.Thread.sleep(sleepTime)
except KeyboardInterrupt:
    if (logoption < 2):
        f.close()
This script will output either to a file (as comma-separated values) or the terminal (as formatted output) a listing of each of the desired JMS servers and queues, and the message depths/high water mark and consumer count. To configure for your environment, you can drop the contents of the above into a file called jmsmon.py in your [ODEE_HOME]/documaker/j2ee/weblogic/oracle11g/scripts folder, and then add a shell script file to execute it, which is a simple file with these commands:
. ./set_middleware_env.sh > /dev/null
Edit the .py file and adjust the settings as necessary. You'll notice that the user settings are contained at the top of the file. The only settings you must change are the username, password, and WebLogic connection URL for server/port. You can optionally change the settings for includeServer, includeJms, and includeDestinations. Each of these settings is a list of quoted, comma-separated names that you want to be polled and included in the results. If you have multiple JMS managed servers, add them to includeServer. If you have multiple JMS servers, add them to includeJms. You can specify which destinations are included by adding them to includeDestinations - note that this list is applied to all managed servers and JMS servers. In this way, if you have a clustered configuration or multiple assembly lines, you can capture the statistics for all of them using this script. Note that the default setting is to log to both screen and file; the screen uses tab-formatted output, while file output is comma-separated values for analysis in a software package like Excel.
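If you would rather not open the CSV in a spreadsheet, a short plain-Python sketch like the following can summarize it. It assumes the column layout written by the script (Time, ServerName, JMSServer, Destination, Msgs Cur, Msgs High, ConsumersCur) and skips repeated header rows and ERROR_DATA lines. The `load_samples` and `summarize` helpers and the sample rows are made up for illustration; to analyze a real run, pass `open('jmsmon.csv')` instead of the in-memory sample.

```python
import csv
import io
from datetime import datetime

COLUMN_COUNT = 7  # Time,ServerName,JMSServer,Destination,Msgs Cur,Msgs High,ConsumersCur


def load_samples(stream):
    """Read monitor rows, skipping header rows and ERROR_DATA lines."""
    samples = []
    for row in csv.reader(stream):
        if len(row) != COLUMN_COUNT or row[0] == "Time":
            continue
        samples.append({
            "time": datetime.strptime(row[0], "%Y-%m-%d %H:%M:%S"),
            "queue": (row[2], row[3]),  # (JMS server, destination)
            "current": int(row[4]),
            "high": int(row[5]),
            "consumers": int(row[6]),
        })
    return samples


def summarize(samples):
    """Per queue: peak high-water mark, peak consumers, drain rates (msgs/sec)."""
    summary, previous = {}, {}
    for s in samples:
        entry = summary.setdefault(
            s["queue"], {"high": 0, "max_consumers": 0, "drain_rates": []})
        entry["high"] = max(entry["high"], s["high"])
        entry["max_consumers"] = max(entry["max_consumers"], s["consumers"])
        prev = previous.get(s["queue"])
        if prev is not None:
            seconds = (s["time"] - prev["time"]).total_seconds()
            # Only count intervals where the backlog actually shrank.
            if seconds > 0 and s["current"] < prev["current"]:
                entry["drain_rates"].append((prev["current"] - s["current"]) / seconds)
        previous[s["queue"]] = s
    return summary


# Made-up sample data in the script's CSV layout.
SAMPLE = """Time,ServerName,JMSServer,Destination,Msgs Cur,Msgs High,ConsumersCur
2020-01-01 10:00:00,jms_server,AL1Server,AssemblerReq,1000,1000,1
2020-01-01 10:00:05,jms_server,AL1Server,AssemblerReq,875,1000,1
ERROR_DATA
2020-01-01 10:00:10,jms_server,AL1Server,AssemblerReq,750,1000,1
"""

summary = summarize(load_samples(io.StringIO(SAMPLE)))
assembler = summary[("AL1Server", "AssemblerReq")]
```

Keep in mind that a drain rate derived from queue depth only approximates worker throughput once the upstream workers have stopped adding messages to the queue.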
The script is meant to be executed during a load test, usually of at least 100 transactions, to get some useful data for analysis. Run the script and start your test. While the load test is underway, you will see the current messages and high-water mark on some queues ramp up considerably; these queues belong to the workers that typically take more time to complete a unit of work, so a backlog of work builds up. In my particular test case, I'm running 1,000 transactions, all of which are routed for manual intervention and so will not proceed beyond the Assembler worker. If I modify the script to query only the Assembler worker, we can see the number of messages waiting. This test tells us that the Assembler is pumping through around 125 transactions every 5 seconds or so, with a single Assembler instance running. I happen to know that these are relatively complex transactions, and this particular system is a virtual machine running on a laptop, with the database, application server, and processing services consolidated onto that single virtual machine, so my performance expectations are low.
By examining the consumer count (1) we know that the load balancing algorithm built into ODEE is not kicking in, based on the default configuration. The load balancer configuration allows the Scheduler worker to query each worker at regular intervals. If the worker is able to respond within a specified time frame, it is deemed to be idle. If it is unable to respond within the time frame, it is deemed to be busy. After a predefined number of busy responses, the Scheduler will start up another worker instance (or thread pool, depending on the type of worker) as long as the configured maximum has not been reached. In the example above, if I were unhappy with the amount of time taken to run this batch of jobs, I could lower the threshold for load balancing to kick in, or I could preconfigure the number of instances on startup to be higher.
In either case, the goal is to prevent worker starvation across the assembly line by having enough workers to satisfy the demand, while balancing this within the confines of the processing cluster. I reviewed the Identifier queue figures in another run: the high water mark for messages in this queue was under 50 and the current message count was very low, meaning the Identifier was keeping up with demand.
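The busy/idle decision described above can be sketched as a small state machine in plain Python. This is only a model of the behavior as described in this post, not ODEE's implementation: the class, its parameter names, and the numbers are hypothetical, and counting consecutive busy responses (resetting on an idle one) is an assumption.

```python
class WorkerPoolBalancer:
    """Toy model: start another instance after repeated busy responses."""

    def __init__(self, busy_threshold, max_instances, response_timeout_ms,
                 instances=1):
        self.busy_threshold = busy_threshold
        self.max_instances = max_instances
        self.response_timeout_ms = response_timeout_ms
        self.instances = instances
        self.busy_count = 0

    def record_poll(self, response_ms):
        """Record one poll; response_ms is None when the worker never answered.

        Returns True when this poll caused a new instance to be started.
        """
        busy = response_ms is None or response_ms > self.response_timeout_ms
        if not busy:
            self.busy_count = 0  # assumption: an idle response resets the count
            return False
        self.busy_count += 1
        if (self.busy_count >= self.busy_threshold
                and self.instances < self.max_instances):
            self.instances += 1
            self.busy_count = 0
            return True
        return False


# Hypothetical settings: 3 busy polls trigger a scale-up, capped at 2 instances.
balancer = WorkerPoolBalancer(busy_threshold=3, max_instances=2,
                              response_timeout_ms=500)
polls = [100, 900, 900, None, 900, 900, 900]
started = [balancer.record_poll(p) for p in polls]
```

In this model, lowering `busy_threshold` makes the pool grow sooner, while a higher starting `instances` value corresponds to preconfiguring more instances at startup; both are the two tuning options mentioned above.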
There is no predetermined performance configuration that will meet all needs, since each implementation is different, but this exercise will give you information to determine how to configure ODEE for your implementation and environment. Good luck!