Using XA Transactions in Coherence-based Applications
By jpurdy on May 31, 2012
While the costs of XA transactions are well known (e.g. increased data contention, higher latency, significant disk I/O for logging, availability challenges, etc.), in many cases they are the most attractive option for coordinating logical transactions across multiple resources.
There are a few common approaches when integrating Coherence into applications via the use of an application server's transaction manager:
- Use of Coherence as a read-only cache, applying transactions to the underlying database (or any system of record) instead of the cache.
- Use of TransactionMap interface via the included resource adapter.
- Use of the new ACID transaction framework, introduced in Coherence 3.6.
Each of these may have significant drawbacks for certain workloads.
Using Coherence as a read-only cache is the simplest option. In this approach, the application is responsible for managing both the database and the cache (either within the business logic or via application server hooks). This approach also tends to provide limited benefit for many workloads, particularly those workloads that either have queries (given the complexity of maintaining a fully cached data set in Coherence) or are not read-heavy (where the cost of managing the cache may outweigh the benefits of reading from it). All updates are made synchronously to the database, leaving it as both a source of latency as well as a potential bottleneck. This approach also prevents addressing "hot data" problems (when certain objects are updated by many concurrent transactions) since most database servers offer no facilities for explicitly controlling concurrent updates. Finally, this option tends to be a better fit for key-based access (rather than filter-based access such as queries) since this makes it easier to aggressively invalidate cache entries without worrying about when they will be reloaded. The advantage of this approach is that it allows strong data consistency as long as optimistic concurrency control is used to ensure that database updates are applied correctly regardless of whether the cache contains stale (or even dirty) data. Another benefit of this approach is that it avoids the limitations of Coherence's write-through caching implementation.
TransactionMap is generally used when Coherence acts as system of record. TransactionMap is not generally compatible with write-through caching, so it will usually be either used to manage a standalone cache or when the cache is backed by a database via write-behind caching. TransactionMap has some restrictions that may limit its utility, the most significant being:
- The lock-based concurrency model is relatively inefficient and may introduce significant latency and contention. As an example, in a typical configuration, a transaction that updates 20 cache entries will require roughly 40ms just for lock management (assuming all locks are granted immediately, and excluding validation and writing which will require a similar amount of time). This may be partially mitigated by denormalizing (e.g. combining a parent object and its set of child objects into a single cache entry), at the cost of increasing false contention (e.g. transactions will conflict even when updating different child objects).
- If the client (application server JVM) fails during the commit phase, locks will be released immediately, and the transaction may be partially committed. In practice, this is usually not as bad as it may sound since the commit phase is usually very short (all locks having been previously acquired). Note that this vulnerability does not exist when a single NamedCache is used and all updates are confined to a single partition (generally implying the use of partition affinity).
- The unconventional TransactionMap API is cumbersome but manageable.
- Only a few methods are transactional, primarily get(), put() and remove().
The ACID transactions framework (accessed via the Connection class) provides atomicity guarantees by implementing the NamedCache interface, maintaining its own cache data and transaction logs inside a set of private partitioned caches. This feature may be used as either a local transactional resource or as logging XA resource. However, a lack of database integration precludes the use of this functionality for most applications. A side effect of this is that this feature has not seen significant adoption, meaning that any use of this is subject to the usual headaches associated with being an early adopter (greater chance of bugs and greater risk of hitting an unoptimized code path). As a result, for the moment, we generally recommend against using this feature.
In summary, it is possible to use Coherence in XA-oriented applications, and several customers are doing this successfully, but it is not a core usage model for the product, so care should be taken before committing to this path. For most applications, the most robust solution is normally to use Coherence as a read-only cache of the underlying data resources, even if this prevents taking advantage of certain product features.