Planned Database Maintenance with WebLogic Multi Data Source (MDS)
This article discusses how to handle planned maintenance on the database server when it is accessed by WebLogic Multi Data Source (MDS) in a fashion that no service interruption occurs.
To ensure there is no service interruption there must be multiple database instances available so the database can be updated in a rolling fashion. Oracle technologies to accomplish this include RAC cluster and GoldenGate or a combination of these products (note that DataGuard cannot be used for planned maintenance without service interruption). Each database instance is configured as a member generic data source, as described in the product documentation. This approach assumes that the application is returning connections to the pool on a regular basis.
1. On mid-tier systems - Shutdown all member data sources associated with the RAC instance that will be shut down for maintenance. It's important that not all data sources in each MDS list be shutdown so that connections can be reserved on the other member(s). Wait for data source shutdown to complete. See http://docs.oracle.com/cd/E13222_01/wls/docs100/wlsmbeanref/mbeans/JDBCDataSourceRuntimeMBean.html?skipReload=true#shutdown.
2. At this point, it may be desirable to do some work on the database side to reduce remaining connections not associated with WLS data source. For the Oracle database server, this might include stopping (or relocating) the application services at the instances that will be shut down for maintenance, stopping the listener, and/or issue a transactional disconnect for the services on the database instance. See https://blogs.oracle.com/WebLogicServer/entry/agl_database_outages for more information that is included in the Active GridLink description.
3. Shutdown the instance immediate using your preferred tools
4. Do the planned maintenance.
5. Start up the database instance using your preferred tools
6. Startup the services when the database instances are ready for application use.
7. On midtier systems -Start the member data sources. See See http://docs.oracle.com/cd/E13222_01/wls/docs100/wlsmbeanref/mbeans/JDBCDataSourceRuntimeMBean.html?skipReload=true#start.
Shutting down the data source
Shutting down the data source involves first suspending the data source and then releasing the associated resources including the connections. When a member data source in a MDS is marked as suspended, the MDS will not try to get connections from the suspended pool but will go to the next member data source in the MDS to reserve connections. It's important not all data sources in each MDS list be shut down at the same time. If all members are shut down or fail, then access to the MDS will fail and the application will see failures.
When you gracefully suspend a data source, which happens as the first step of shut down:
- the data source is immediately marked as suspended at the beginning of the operation so that no further connections will be created on the data source
- idle (not reserved) connections are not closed but are marked as disabled.
- after a timeout period for the suspend operation, all remaining connections in the pool will be marked as suspended and “java.sql.SQLRecoverableException: Connection has been administratively disabled. Try later.” will be thrown for any operations on the connection, indicating that the data source is suspended. These connections remain in the pool and are not closed. We won't know until the data source is resumed if they are good or not. In this case, we know that the database will be shut down and the connections in the pool will not be good if the data source is resumed. Instead, we are doing a data source shutdown which will close all of the disabled connections.
The timeout period is 60 seconds by default. This can be changed by configuring or dynamically setting Inactive Connection Timeout Seconds to a non-zero value (note that this value is overloaded with another feature when connection leak profiling is enabled). There is no upper limit on the inactive timeout. Note that the processing actually checks for in-use (reserved) resources every tenth of a second so if the timeout value is set to 2 hours and it's done a second later, it will complete a second later.
Note that this operation runs synchronously; there is no asynchronous version of the mbean operation available. It was designed to run in a short amount of time but testing shows that there is no problem setting it for much longer. It should be possible to use threads in jython if you want to run multiple operations in one script as opposed to lots of script (a jython programmer is needed).
This procedure works for MDS configured with either Load-Balancing or Failover.
This is what a WLST script looks like to edit the configuration to increase the suspend timeout and then use the runtime MBean to shutdown a data source. It would need to be integrated into the existing framework for all WLS servers/data sources.
java weblogic.WLST myscript2.py
import sys, socket, os
hostname = socket.gethostname()
# Edit the configuration to set the suspend timeout
cd('/JDBCSystemResources/' + datasource + '/JDBCResource/' + datasource + '/JDBCConnectionPoolParams/' + datasource )
cmo.setInactiveConnectionTimeoutSeconds(21600) # set the suspend timeout
# Shutdown the data source
cd('/JDBCServiceRuntime/' + svr + '/JDBCDataSourceRuntimeMBeans/' + datasource )
Note that if MDS is using a database service, you cannot stop or relocate the service before suspending or shutting down the MDS. If you do,. MDS may attempt to create a connection to the now missing service and it will react as though the database is down and kill all connections, not allowing for a graceful shutdown. Since MDS suspend ensures that no new connections are created at the associated instance (and the MDS only creates connections on this instance, never another instance even if relocated), stopping the service is not necessary for MDS graceful shutdown. Also, since MDS suspend causes all connections to no longer do any operations, no further progress will be made on any sessions (the transactions won't complete) that remain in the MDS pool.
There is one known problem with this approach related to XA affinity that is enforced by the MDS algorithms. When an XA branch is created on a RAC instance, all additional branches are created on the same instance. While RAC supports XA across instances, there are some significant limitations that applications run into before the prepare so MDS enforces that all operations be on the same instance. As soon as the graceful suspend operation starts, the member data source is marked as suspended so no further connections are allocated there. If an application using global transactions tries to start another branch on the suspending data source, it will fail to get a connection and the transaction will fail. In this case of an XA transaction spanning multiple WLS servers, the suspend is not graceful. This is not a problem for Emulate 1PC or 2pc, which uses a single connection for all work, and LLR.
If there is a reason to separate the suspending of the data source, at which point all connections are disabled, from the releasing of the resources, it is possible to run suspend followed by forceShutdown. A forced shutdown must be used to avoid going through the waiting period a second time. This separation is not recommended.
To get a graceful shutdown of the data source when shutting down the database, the data source must be involved. This process of shutting down the data source followed by shutdown of the database requires coordination between the mid-tier and the database server processing. Processing is simplified by using Active GridLink instead of MDS; see the AGL blog included above.
When using the Oracle database, it is recommended that an application service be configured for each database so that it can be configured to have HA features. By using an application service, it is possible to start up the database without data source starting to use it until the administrator is ready to make it available and the application service is explicitly started.