By Steve Felts-Oracle on Jul 22, 2013
When an error occurs on a connection, it would be nice to be able to keep processing on another connection - that's what Application Continuity (AC) does. On a single-node database, getting another connection means that the database and network are still available; maybe it was just a network glitch. With Real Application Clusters (RAC), however, chances are good that even if you lost a connection on one instance, you can get a connection on another instance for the same database. For this feature to work correctly, all of the work that you did on the connection before the error must be replayed on the new connection - that's why AC is also called replay. To replay the operations, it's necessary to keep a list of the operations that have been done on the connection so they can be done again. Of course, data might have changed since the operations first ran, so it's also necessary to keep track of the results to make sure that they match on replay. So AC may fail if replaying the operations fails or produces different results.
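To make the record-and-replay idea concrete, here is a minimal toy sketch in plain Java. It is not how the Oracle driver implements replay; the class ReplayLog and its methods are hypothetical names invented for illustration, and the "connection" is just any object the operations run against. It only shows the two ingredients described above: a history of operations, and a comparison of results when that history is re-run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;

/** Toy model of record-and-replay; hypothetical, not the Oracle driver's code. */
final class ReplayLog<C> {

    /** One recorded call: the operation plus the result the application saw. */
    private static final class Entry<C> {
        final Function<C, Object> op;
        final Object result;

        Entry(Function<C, Object> op, Object result) {
            this.op = op;
            this.result = result;
        }
    }

    private final List<Entry<C>> history = new ArrayList<>();

    /** Runs op on the connection and remembers it together with its result. */
    Object execute(C conn, Function<C, Object> op) {
        Object result = op.apply(conn);
        history.add(new Entry<>(op, result));
        return result;
    }

    /** Re-runs the history on a replacement connection; false if any result differs. */
    boolean replayOn(C replacement) {
        for (Entry<C> e : history) {
            Object replayed = e.op.apply(replacement);
            if (!Objects.equals(replayed, e.result)) {
                return false; // data changed since the first run, so replay fails
            }
        }
        return true;
    }

    /** Forgets the history, e.g. when the application returns the connection. */
    void clear() {
        history.clear();
    }
}
```

In this sketch a replay against unchanged data succeeds, while a replay against data that changed in the meantime returns false, which corresponds to the application seeing the original error.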
You can read the WLS documentation, an AC white paper, and the Database documentation to get the details. This article gives an overview of this feature (and a few tidbits that the documentation doesn't cover).
To use this feature, you need both the 12c driver and a 12c database. Note that if you use an 11g driver and/or database, it might appear to work, but replay will only happen for read-only transactions (the 11g mode is not supported, but we don't have a mechanism to prevent this).
You need to configure a database service to run with AC. To do that, you need to set the new 12c service attributes FAILOVER_TYPE=TRANSACTION and COMMIT_OUTCOME=true on the server side.
There's not much to turning on AC in WLS - you just use the replay driver "oracle.jdbc.replay.OracleDataSourceImpl" instead of "oracle.jdbc.OracleDriver" when configuring the data source. There's no programming in the application needed to use it. Internal to WLS, when you get a connection we call the API to start collecting the operations and when you close a connection we call the API to clear the operation history.
WLS introduced a labeling callback for applications that use the
connection labeling feature. If this callback is registered, it will be
called when a new connection is being initialized before replay. Even
if you aren't using labeling, you might still want to be called and
there is a new connection initialization callback that is for replay
(labeling callback trumps initialization callback if both exist).
It sounds easy and perfect - what's the catch? For one, you need to get rid of references to concrete classes and use the new Oracle interfaces. For some applications that might be significant work. As mentioned above, if replaying the operations fails or is inconsistent, AC fails. There are a few other operations that turn off AC - see the documentation for details. One of the big ones is that you can't use replay with XA transactions (at least for now). Selecting from V$INSTANCE or SYS_CONTEXT, or other tracing used for test instrumentation, needs to be done in callouts because the values change when replayed. If you use SYSDATE or SYSTIMESTAMP, you need to grant KEEP DATE TIME to your user.
We are also tracking two defects - AC doesn't work with Oracle proxy
authentication and it doesn't work with the new DRCP feature.
There's another, more complex topic to consider (not yet covered in the current WLS documentation). By default, when a local transaction is completed on the connection, replay ends. You can keep working on the connection, but a failure from that point on will return an error to the application. This default is based on the service attribute SESSION_STATE_CONSISTENCY having a value of DYNAMIC. You can set the value to STATIC if your application does not modify non-transactional session state (NTSS) in the transaction. I'm not sure how many applications fall into this trap, but the safe thing is to default to DYNAMIC. The P.S. at the end of this article walks through an example. A related common problem is forgetting to disable autocommit, which defaults to true; the first (implicit) commit then turns off replay if SESSION_STATE_CONSISTENCY is set to DYNAMIC.
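As an illustration of the autocommit point, the following sketch keeps a unit of work inside one explicit local transaction using only standard java.sql calls. The class SingleTransactionSketch and the Work callback are hypothetical helpers made up for this example, not part of WLS or the Oracle driver, and the error handling is deliberately minimal.

```java
import java.sql.Connection;
import java.sql.SQLException;

/** Sketch only: keeps a unit of work inside one explicit local transaction. */
final class SingleTransactionSketch {

    /** Hypothetical callback type for the application's JDBC work. */
    interface Work {
        void run(Connection conn) throws SQLException;
    }

    static void runInOneTransaction(Connection conn, Work work) throws SQLException {
        // Autocommit defaults to true in JDBC. Left on, every statement commits
        // on its own, and the first commit ends replay in DYNAMIC mode.
        conn.setAutoCommit(false);
        try {
            work.run(conn);
            conn.commit(); // the only commit in this unit of work
        } catch (SQLException e) {
            try {
                conn.rollback();
            } catch (SQLException rollbackFailure) {
                e.addSuppressed(rollbackFailure); // keep the original error visible
            }
            throw e;
        }
    }
}
```

With autocommit off, all statements issued inside the callback stay within the replayable window until the single explicit commit at the end.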
It's important to know how to turn on debugging so that if a particular sequence doesn't replay, you can understand why. You simply need to run with the debug driver (ojdbc6_g.jar or ojdbc7_g.jar) and run with -Dweblogic.debug.DebugJDBCReplay (or turn this debug category on in the configuration).
AC won't replay everything and you still need to have some application logic to deal with the failures or return them to the end user. Also, there's some overhead in time and memory to keep the replay data. Still, it seems like a great feature for a lot of applications where you don't need to change anything but the driver name and you can avoid showing an error to the end user or simplify some recovery logic.
P.S. Confused about NTSS? So was I. Examples of non-transactional session state that can change at run-time are ALTER SESSION, PL/SQL global variables, SYS_CONTEXT, and temporary table contents. Here's an example of a PL/SQL global variable. Imagine a package with the following body:

    -- Illustrative only: ORDER is a reserved word, so a real package
    -- would need a different name.
    package body order is
      -- package-level (global) variables: session state that outlives a commit
      current_order number := null;
      current_line  number;

      procedure new_order (customer_id number) is
      begin
        current_order := order_seq.nextval;
        insert into orders values (current_order, customer_id);
        current_line := 0;
      end new_order;

      procedure new_line (product_id number, count number) is
      begin
        current_line := current_line + 1;
        insert into order_lines values (current_order, current_line, product_id, count);
      end new_line;
    end order;

and a pseudo-code sequence in WLS like this:

    getConnection()
    exec "begin order.new_order(:my_customer_id); end;"
    commit;
    exec "begin order.new_line(:my_product_id, :my_count); end;"
    <DB server failure and failover>
    commit;

In this scenario, we can't replay the first transaction, because it's already committed and we'd end up with two orders in the database. But if we don't replay it, we lose the current_order value held in the package variable, so the new_line call won't work. So we have to disable replay after the commit. If you can guarantee that no such scenario exists in your application, you can set SESSION_STATE_CONSISTENCY to STATIC on the service.
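
The rule this scenario leads to can be summarized as a tiny state machine. The sketch below is a toy model in plain Java, assuming only the behavior described in this article (replay ends at the first commit under DYNAMIC, and continues under STATIC); the class and method names are hypothetical and do not correspond to any driver or WLS API.

```java
/** Toy model of when a failure would still be replayed; hypothetical names. */
final class ReplayWindowSketch {

    /** Mirrors the two values of the SESSION_STATE_CONSISTENCY service attribute. */
    enum Consistency { DYNAMIC, STATIC }

    private final Consistency consistency;
    private boolean replayEnabled;

    ReplayWindowSketch(Consistency consistency) {
        this.consistency = consistency;
    }

    /** The application gets a connection: recording of operations starts. */
    void onGetConnection() {
        replayEnabled = true;
    }

    /** A commit (explicit or via autocommit) completed on the connection. */
    void onCommit() {
        if (consistency == Consistency.DYNAMIC) {
            // Session state built before the commit cannot be rebuilt safely.
            replayEnabled = false;
        }
    }

    /** The application closes the connection: the history is cleared. */
    void onClose() {
        replayEnabled = false;
    }

    /** True if a connection failure right now would be hidden by replay. */
    boolean wouldReplayOnFailure() {
        return replayEnabled;
    }
}
```

In the order example above, the failure happens after the first commit, so under DYNAMIC this model reports that the failure would be returned to the application rather than replayed.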