Oracle's Latest Technology Acquisition
This week I managed to sit in on just over half of a new training course on Oracle's new Coherence product, acquired through the purchase of Tangosol. Coherence is described as a data grid, although as we will see that is not a particularly helpful image. I knew that Steve Harris, our VP of Java Products, was keen to acquire something like a data grid, whether by buying one or building one, and it looks like his wish has now been granted.
What is Coherence?
Describing Coherence as a data grid is a little like describing the QE2 as a boat. It is 100% accurate but does nothing to convey what is possible. At the heart of Coherence is some clever Java-based clustering software that dynamically adds new members and, more importantly, keeps all members informed about who is a member of the cluster (coherence). Layered on top of this is a distributed caching mechanism that spreads cached items across the cluster and replicates the data for high availability. Changes to entries in the cache cause events to fire. These events can be captured by any or all nodes in the cluster, providing a consistent view of the distributed data.
What is it Good For?
There are lots of scenarios where Coherence adds value.
Data can be lazily loaded from a database and stored in the cache, greatly reducing database load and speeding up application access to the data. Different kinds of application can access the same data from memory. This might be useful for an online retailer, providing easy access to the stock descriptions while giving a consistent view of stock levels across the whole product range, without any complicated coding in the applications or any extra load on the database.
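To make the lazy loading idea concrete, here is a minimal single-JVM sketch in plain Java. It is not the Coherence API: the class name ReadThroughCache and the loader function (standing in for a database query) are assumptions made purely for illustration, and a real data grid would also spread the entries across cluster members.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Single-JVM sketch of a lazily loaded (read-through) cache; not the Coherence API. */
final class ReadThroughCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;   // hypothetical stand-in for a database query

    ReadThroughCache(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value, invoking the loader only when the key is not yet cached. */
    V get(K key) {
        return entries.computeIfAbsent(key, loader);
    }

    public static void main(String[] args) {
        AtomicInteger databaseReads = new AtomicInteger();
        ReadThroughCache<String, String> stock = new ReadThroughCache<>(sku -> {
            databaseReads.incrementAndGet();   // count simulated database hits
            return "description of " + sku;
        });
        stock.get("SKU-1");   // first access loads from the "database"
        stock.get("SKU-1");   // second access is served from memory
        System.out.println("database reads: " + databaseReads.get());   // prints "database reads: 1"
    }
}
```

Every application that shares the cache benefits from the first load, which is where the reduction in database traffic comes from.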
By holding all data in Coherence (loaded from a database) and writing changes back to the database lazily, it is possible to guarantee that all changes will eventually be written to the database, even if the database or individual nodes in the Coherence cluster go down. This is a major step forward in HA configuration. Normally the availability of a system is the product of the availability of its individual tiers, so two tiers that are each 99% available give a system that is only about 98% available. With Coherence we can remove the database from the equation, improving the availability percentage.
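The lazy write idea can be sketched roughly as follows. This is plain Java rather than the Coherence API, and the names WriteBehindCache, Store and flush are hypothetical. The sketch also keeps the pending queue in a single JVM, whereas the point of the product is that the queued changes are themselves replicated across the cluster so they survive the loss of a node.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Single-JVM sketch of write-behind caching; not the Coherence API. */
final class WriteBehindCache {
    /** Hypothetical stand-in for the database tier. */
    interface Store {
        void write(String key, String value) throws Exception;
    }

    private final Map<String, String> entries = new HashMap<>();
    private final Queue<String> dirtyKeys = new ArrayDeque<>();
    private final Store store;

    WriteBehindCache(Store store) {
        this.store = store;
    }

    /** Updates memory immediately and records the key for a later database write. */
    synchronized void put(String key, String value) {
        entries.put(key, value);
        if (!dirtyKeys.contains(key)) {
            dirtyKeys.add(key);
        }
    }

    synchronized String get(String key) {
        return entries.get(key);
    }

    /** Tries to write queued changes; returns how many keys are still pending. */
    synchronized int flush() {
        while (!dirtyKeys.isEmpty()) {
            String key = dirtyKeys.peek();
            try {
                store.write(key, entries.get(key));
                dirtyKeys.remove();   // dequeue only after a successful write
            } catch (Exception databaseDown) {
                break;                // keep the change queued and retry on the next flush
            }
        }
        return dirtyKeys.size();
    }
}
```

Readers and writers only ever touch the in-memory map, so a database outage delays the flush but does not stop the application from working.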
In addition to distributing data across the cluster, we can also execute code locally in the cache against a set of objects, removing the need to transfer large amounts of data to a single point for processing. This can be thought of as similar to stored procedures in a database, but inherently distributed and scalable, making it possible to perform large-scale calculations on distributed data sets.
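A rough way to picture this "send the code to the data" approach is the sketch below. It is plain Java and not the Coherence API: PartitionedGrid is a hypothetical class in which each in-memory map plays the role of one cluster member's share of the data, and a parallel stream stands in for running the task on each member.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

/** Single-JVM sketch of running a calculation next to partitioned data; not the Coherence API. */
final class PartitionedGrid {
    // Each map stands in for the entries held by one cluster member.
    private final List<Map<String, Double>> partitions = new ArrayList<>();

    PartitionedGrid(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new HashMap<>());
        }
    }

    /** Routes a key to a partition by hash, as a stand-in for data distribution. */
    void put(String key, double value) {
        int index = Math.floorMod(key.hashCode(), partitions.size());
        partitions.get(index).put(key, value);
    }

    /** Runs the task against each partition and sums the partial results. */
    double aggregate(ToDoubleFunction<Map<String, Double>> task) {
        return partitions.parallelStream().mapToDouble(task).sum();
    }

    public static void main(String[] args) {
        PartitionedGrid grid = new PartitionedGrid(4);
        grid.put("order-1", 1.0);
        grid.put("order-2", 2.0);
        grid.put("order-3", 3.0);
        // Only one number per partition travels back, not the entries themselves.
        double total = grid.aggregate(
                partition -> partition.values().stream().mapToDouble(Double::doubleValue).sum());
        System.out.println("total: " + total);   // prints "total: 6.0"
    }
}
```

The design point is that only the small partial results are combined centrally, while the entries stay where they are stored.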
The ability to monitor state changes in cached items allows us to build highly event-driven systems. A change in a particular data item can trigger whatever action is necessary, such as a database update or the launch of a BPEL process. Filtering allows us to receive only those events we are interested in.
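The combination of change events and filtering might look something like the following sketch. Again this is plain Java rather than the Coherence API, and ObservableCache, addListener and the low-stock rule are all assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.BiPredicate;

/** Single-JVM sketch of filtered change events on a cache; not the Coherence API. */
final class ObservableCache<V> {
    /** Pairs a listener with the filter that decides which changes it sees. */
    private static final class Registration<V> {
        final BiPredicate<String, V> filter;
        final BiConsumer<String, V> listener;

        Registration(BiPredicate<String, V> filter, BiConsumer<String, V> listener) {
            this.filter = filter;
            this.listener = listener;
        }
    }

    private final Map<String, V> entries = new HashMap<>();
    private final List<Registration<V>> registrations = new ArrayList<>();

    void addListener(BiPredicate<String, V> filter, BiConsumer<String, V> listener) {
        registrations.add(new Registration<>(filter, listener));
    }

    /** Stores the value, then notifies every listener whose filter accepts the change. */
    void put(String key, V value) {
        entries.put(key, value);
        for (Registration<V> registration : registrations) {
            if (registration.filter.test(key, value)) {
                registration.listener.accept(key, value);
            }
        }
    }

    public static void main(String[] args) {
        ObservableCache<Integer> stockLevels = new ObservableCache<>();
        // Hypothetical rule: react only when stock drops below five units.
        stockLevels.addListener(
                (sku, level) -> level < 5,
                (sku, level) -> System.out.println("reorder " + sku));
        stockLevels.put("widget", 10);   // filtered out, nothing printed
        stockLevels.put("widget", 3);    // prints "reorder widget"
    }
}
```

In place of the print statement, the listener could just as easily queue a database update or start a BPEL process.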
The acquisition of Coherence will probably have profound implications for Fusion Middleware going forward. Already there is integration with the session handling code in OC4J. Looking further afield, we can envisage any component that requires clustered operation making use of the underlying Coherence framework. Some possibilities include:
- A JMS implementation that is memory based but highly available and with guaranteed delivery capability.
- BPEL persistence being distributed through the cache before being written to the database.
- Events being transmitted to a clustered BAM engine.
- BAM taking advantage of the filtering to enhance its event engine.
- A notification service to inform middle tier components of updates in the database.
I suspect that over time we will see Coherence embed itself very firmly into the underlying Fusion Middleware infrastructure, adding data grid capabilities to a variety of components. It will be interesting to see how the acquisition is absorbed into the rest of the middleware stack. Watch this space.