The Perfect Marriage: Oracle Business Rules & Coherence In-Memory Data Grid. Highly Scalable Business Rules with Extremely Low Latency
By Ricardo Ferreira on Aug 16, 2013
The idea of separating business rules from application logic is an old one. In the last ten years, however, dozens of platforms and technologies have been created to enable this separation of concerns. One of those technologies is the BRMS, an acronym for Business Rules Management System. The basic idea of a BRMS is to be a repository of rules, governing them in such a way that they can be created, updated, tested and controlled through an external interface. Part of the BRMS's responsibility is also to provide an API (more than one when possible) that lets external applications send data over the network, where that data can trigger the execution of zero, one or multiple rules in the BRMS repository. Rule execution occurs outside those external applications, reducing their memory footprint and CPU overhead, since the rules are processed on a separate server or cluster. This architectural approach is very powerful, offering the following benefits:
- Rules can be managed (created, updated) outside of the application code
- Rules can be reused across different applications, no matter their technology
- Less CPU overhead and smaller memory footprint in the applications
- More control over rules, auditing of changes and enterprise log history
- Integration with other IT artifacts like dictionaries, processes, services
With this context in place, we could all agree that using a BRMS should be mandatory in every IT architecture, given its power, were it not for the fact that BRMS technologies introduce a lot of overhead into the overall transaction latency. Between the external application that invokes the BRMS to execute rules and the BRMS platform itself sits the network channel. This means that we must deal with network I/O and its technical implications (serialization, instability, byte buffering) whenever we send data to or receive data from the BRMS. Whether the BRMS provides a SOAP API, a REST API or any other TCP/IP-based API, the overall transaction latency is compromised by the network overhead.
Another huge problem of BRMS platforms is scalability. When the BRMS platform is first introduced into an architecture, it handles an acceptable number of TPS (Transactions Per Second), which nowadays varies from 1K TPS to 5K TPS. But when other applications start using the same BRMS platform, or the number of transactions just naturally grows, you can face scenarios in which your BRMS platform must deal with 20K TPS or even 100K TPS. What happens when a huge number of objects are allocated in the heap space of a Java-based server? Heap usage approaches its maximum size and the garbage collector starts running to reclaim unused memory and/or compact the heap. Whatever work the garbage collector has to do, it will take as much processing power as it can to finish as soon as possible, since the amount of garbage to handle will be huge. This is true for almost all BRMS platforms on the market, regardless of vendor. If the BRMS platform is Java-based, once the server JVMs grow beyond about 16 GB of heap on average, they start to face serious performance problems due to garbage collection.
Unlike other architecture designs in which the load is distributed across a cluster, BRMS platforms must handle the entire processing in a single server, due to a general concept of BRMS platforms known as the execution agenda and working memory. All the facts (the data sent as input) are maintained in this agenda in a single server, making the BRMS platform a pinned service that does its job in a singleton fashion. In this situation, when you need to scale, you can introduce a series of identical servers behind a corporate load balancer that, instead of distributing load, divides entire transaction volumes across those servers. Because each server behind the load balancer handles its entire share of the volume by itself, its concurrency is limited by the number of processors available on its mainboard. If you need more compute power because of this lack of concurrency, you are forced to buy a much bigger server. Those servers are huge and expensive, since they need enough processors to handle thousands of simultaneous executions completely alone. Not a very smart approach when you are considering handling millions of TPS.
With this situation in mind, it is necessary to design an architecture that allows business rules execution to be distributed across different servers. To achieve this, we need another software component that can share data (business entities, fact types, data transfer objects) across different processes, running on the same or on different hardware boxes. More importantly, this software component must keep transaction latency short, removing many of the milliseconds introduced by network overhead. In other words, it must bring the data to the one hardware layer that does not really imply I/O overhead, which is memory.
Recently, in order to deal with this problem and give a customer a scalable, high-performance way to use Oracle Business Rules, I designed a solution that solves both problems at once, without losing the separation of concerns provided by BRMS platforms. In-Memory Data Grid technologies like Oracle Coherence can handle massive amounts of data (MB, GB or even TB) completely in memory. Moreover, this kind of technology has been written from scratch to distribute data across a number of servers, so scalability is never a problem here. When you integrate a BRMS with In-Memory Data Grid technologies, you get the best of both worlds: scalability plus high performance, and also extremely low latency. And when I say extremely low latency, I mean sub-millisecond latency: somewhere under 650 μs in my tests.
This article shows how to integrate Oracle Business Rules with Oracle Coherence. The steps shown here can be reproduced in a huge number of scenarios, making your investment in Oracle Fusion Middleware (Cloud Application Foundation and/or SOA Suite stack) even more attractive.
The Business Scenario: Automatic Promotions for Bank Customers
Before we move to the implementation details, we need to understand the business scenario used as the teaching example. We are going to simulate an automatic decision system that creates promotions for banking customers based on their profiles. The idea is to let the BRMS platform decide which promotions to offer based on the customer profiles that applications send to it. This automatic promotion system should allow applications such as internet banking sites, mobile applications or kiosk terminals to present promotions (up-selling/cross-selling) to their end customers.
Building the Solution Domain Model
Let's start developing the example. The first thing to do is create the domain model, which means designing and implementing the business entities that will drive both the client-side application and the business rules. The automatic promotion system is composed of three entities: promotions, products and customers. A promotion is something that the bank offers to the customer, with contextual information about the business value of one or more products, derived from the customer profile. Here is the implementation of the promotion entity:
A product is something that the customer contracts from the bank: some kind of service or item that makes the customer's account more valuable to the bank and more attractive to the customer, since it is a differentiator. Here is the implementation of the product entity:
And finally, we need to design the customer entity. The customer entity represents the person or company that contracts one or more products from the bank. Here is the implementation of the customer entity:
As you can see in the code, the customer entity has a relationship with the two other entities. Build this code and package the three entities into a JAR file. We can now move to the second part of the implementation, which is the creation of a SOA project that includes a business rules dictionary.
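For readers who want something concrete to compile, the three entities could look roughly like the minimal sketch below. The field names (description, name) are assumptions made for illustration. The article's own classes are serialized with POF and annotated with @Portable; this sketch uses plain java.io.Serializable only so that it builds without the Coherence libraries.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Something the customer contracts from the bank (field name is an assumption).
class Product implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String name;

    public Product(String name) { this.name = name; }

    public String getName() { return name; }
}

// An offer made by the bank about one or more products.
class Promotion implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String description;
    private final List<Product> products = new ArrayList<>();

    public Promotion(String description) { this.description = description; }

    public String getDescription() { return description; }
    public List<Product> getProducts() { return products; }
}

// The person or company; it references both of the other entities.
class Customer implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String name;
    private final List<Product> products = new ArrayList<>();
    private final List<Promotion> promotions = new ArrayList<>();

    public Customer(String name) { this.name = name; }

    public String getName() { return name; }
    public List<Product> getProducts() { return products; }
    public List<Promotion> getPromotions() { return promotions; }
}
```

A new customer starts with an empty promotions list, which is the list the business rules are expected to fill in.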
Creating the Business Rules Dictionary
Business rules in the Oracle Business Rules product are defined in an artifact called a dictionary. In order to create a dictionary, you must use the Oracle JDeveloper IDE plus the SOA extension for JDeveloper. I will assume here that you are familiar with those tools, so I will not go into too much detail about them. In JDeveloper, create a new SOA project, and after that create a business rules dictionary. With the dictionary in place, you must configure it to treat our domain model as fact types.
Now you can write down some business rules. Using the JDeveloper business rules editor, define the following rules as shown in the picture below.
For testing purposes, the variable "MinimumBalanceForCreditCard" is just a global variable of type java.lang.Double that contains a constant value. Finally, you are required to expose those business rules through a decision function. As you probably already know, decision functions are constructs that make it easier for external applications to interact with Oracle Business Rules, minimizing the developer's effort in dealing with the Oracle Business Rules API, besides providing a very nice contract-based access point. Create one decision function that receives a customer as input and returns the same customer as output. Don't forget to associate the ruleset with the decision function.
Integrating Oracle Business Rules and Coherence through Interceptors
Now comes the most exciting part of the article: the integration between Oracle Business Rules and Oracle Coherence In-Memory Data Grid. Starting with version 12.1.2 of Coherence, Oracle introduced a new API called Live Events. This new API allows applications to listen to and consume events from Coherence, no matter what type of event is being generated. You can learn more about Coherence Live Events in this YouTube presentation.
Using both the Coherence and the Oracle Business Rules main libraries, implement the following event interceptor in your favorite Java development environment:
If you are familiar with the Oracle Business Rules Java API, you won't find it difficult to understand this code. It simply creates a DecisionPoint object in the constructor and stores it in a static variable, which allows the object to be shared across the entire JVM. Remember that the JVM in this context is a Coherence node, so each Coherence node will hold one DecisionPoint instance. The onEvent() method contains the logic that checks which type of event the implementation should intercept, and also checks whether the DecisionPoint instance should be updated. This last check is based on the timestamp of the dictionary file.
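The timestamp check can be sketched independently of the Oracle libraries. The helper below is hypothetical (TimestampReloader is not part of Oracle Business Rules or Coherence): it caches a value and rebuilds it only when a timestamp source reports a change. That is the same idea as rebuilding the DecisionPoint when the dictionary file's last-modified time moves.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical helper: caches a value and rebuilds it whenever the
// timestamp reported by the source differs from the one seen at load time.
class TimestampReloader<T> {
    private final LongSupplier timestampSource; // e.g. dictionaryFile::lastModified
    private final Supplier<T> loader;           // e.g. code that builds a DecisionPoint
    private T cached;
    private long loadedTimestamp;
    private boolean loaded;

    public TimestampReloader(LongSupplier timestampSource, Supplier<T> loader) {
        this.timestampSource = timestampSource;
        this.loader = loader;
    }

    // Synchronized because many cache threads may hit the same JVM-wide instance.
    public synchronized T get() {
        long current = timestampSource.getAsLong();
        if (!loaded || current != loadedTimestamp) {
            cached = loader.get();
            loadedTimestamp = current;
            loaded = true;
        }
        return cached;
    }
}
```

With a java.io.File named dictionaryFile, the timestamp source would be dictionaryFile::lastModified, and the loader would be whatever code builds the DecisionPoint from that file.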
After creating a DecisionPointInstance, the intercepted entries become the input variables for the business rules execution. The interceptor triggers the rules engine through the invoke() method, and after that it replaces the original intercepted entries with the result that comes back from the business rules agenda, but only if the event is of type INSERTING or UPDATING. This check is necessary for two reasons. First, those event types are raised in the same thread as the cache transaction, before the entry is committed, so the interceptor can still change the value. Second, other event types like INSERTED or UPDATED happen in another thread, which means that they are triggered asynchronously by Coherence.
Setting Up a Coherence Distributed Cache with the Business Rules Interceptor
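The control flow described above can be illustrated without the Coherence or Oracle Business Rules libraries. In this minimal sketch everything is a stand-in: EventType only mirrors the event names mentioned in the text, InterceptedCache is a hypothetical toy cache, and the UnaryOperator plays the role of the rules engine call. It is not the article's interceptor, just the same decision logic in plain Java.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Stand-in for the event types named in the text (not the Coherence enum).
enum EventType { INSERTING, UPDATING, INSERTED, UPDATED }

// Toy cache that calls an interceptor before committing each write.
class InterceptedCache<K, V> {
    private final Map<K, V> store = new HashMap<>();
    private final UnaryOperator<V> rules; // stand-in for the rules engine invocation

    public InterceptedCache(UnaryOperator<V> rules) { this.rules = rules; }

    public void put(K key, V value) {
        // Pre-commit event: raised in the caller's thread, before the write.
        EventType pre = store.containsKey(key) ? EventType.UPDATING : EventType.INSERTING;
        store.put(key, onEvent(pre, value));
    }

    // Only pre-commit events may replace the value that will be stored.
    public V onEvent(EventType type, V value) {
        if (type == EventType.INSERTING || type == EventType.UPDATING) {
            return rules.apply(value);
        }
        return value; // post-commit events leave the entry untouched
    }

    public V get(K key) { return store.get(key); }
}
```

The caller only ever calls put() and get(); the value it reads back has already been processed by the rules.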
Now we can start configuring the Coherence cache. Since we are using POF as the serialization strategy, we need to assemble a POF configuration file. Starting with version 12.1.2 of Coherence, there is a new tool called pof-config-gen that introspects JAR files searching for classes annotated with @Portable. Create a POF configuration file with the following content:
As expected, we also need to create a Coherence cache configuration file. Create a file called coherence-cache-config.xml and fill it with the following contents:
This cache configuration file is very straightforward. There are only three important things to consider here. First, we are using the new interceptor section to declare our interceptor and pass constructor arguments to it. Second, we used another feature from Coherence 12.1.2, which is asynchronous backup. Using this feature dramatically reduces the latency of a single transaction, since backups are written (in another thread) after the primary entry has been written. It is not a prerequisite for the interceptor to work, but in the context of a BRMS it is a great idea. Third, we also defined a proxy-scheme that exposes a TCP/IP endpoint, so we can use the Coherence*Extend feature later in this article to allow a C++ application to access the same cache.
Testing the Scenario
Now that we have all the configuration in place, we can start the tests. Start a Coherence node JVM with the configuration file from the previous section. When you start Coherence, a DecisionPoint object pointing to the business rules dictionary will be created in memory. Implement a Java program to test the behavior of the implementation, as in the listing below:
This Java application can be executed with the storage-enabled parameter set to false. Executing this code will give you an output similar to this:
As you can see in the output, the number of promotions shown reveals that the business rules were really executed, since no promotions were provided when the customer object was instantiated. The output also tells us another important thing: transaction latency. For the first cache entry we got 53 ms of overall latency, quite short if you consider what happened behind the scenes. But the second cache entry is even faster, with 0 ms of latency. This means that the actual time needed to execute the entire transaction was below one millisecond, giving us a real sub-millisecond latency scenario, measured in microseconds.
Highly Scalable Business Rules
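A reading of 0 ms only says that the operation finished in under a millisecond. To see the actual figure in microseconds, the timing has to be taken with a finer clock. The sketch below is a minimal illustration of that using System.nanoTime(). LatencyProbe is a hypothetical helper, and the ConcurrentHashMap is only a stand-in so the code runs without a cluster. Because Coherence's NamedCache implements java.util.Map, the same method could be pointed at a real cache.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

class LatencyProbe {
    // Times a single put and returns the elapsed time in microseconds.
    public static <K, V> long timePutMicros(Map<K, V> cache, K key, V value) {
        long start = System.nanoTime();
        cache.put(key, value);
        return TimeUnit.NANOSECONDS.toMicros(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // Stand-in for the real cache; timings here say nothing about Coherence.
        Map<String, String> cache = new ConcurrentHashMap<>();
        for (int i = 1; i <= 3; i++) {
            long micros = timePutMicros(cache, "customer-" + i, "profile-" + i);
            System.out.println("put #" + i + " took " + micros + " microseconds");
        }
    }
}
```

As in the output discussed above, the first call is usually the slowest because of class loading and JIT warm-up, so it is worth reporting it separately from the later ones.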
It is not so obvious when you first look at this implementation, but another important aspect of this design is scalability. Since the cache type that we used was the distributed one, also known as partitioned, the cache entries are evenly distributed among all available Coherence nodes. If we use only one node, that node will of course handle the entire dataset by itself. But if we use four nodes, each node will handle about 25% of the dataset. This means that if we insert one million customer objects into the cache, each node will handle only about 250K customers.
This type of data storage offers a huge benefit for Oracle Business Rules: true distribution of the data load. Remember that I said before that each Coherence node holds one DecisionPoint instance? Since each node handles only a percentage of the entire dataset, it is reasonable to expect each node to fire rules only for the data that it manages. This happens because Coherence interceptors are executed in the JVM where the data lives, not across the entire data grid, since this is not distributed processing. For instance, if customer "A" is primarily stored in "JVM 1", and this customer has its fields updated by a client application, business rules will be fired and executed only in "JVM 1". The other JVMs will not execute any business rules. This means that CPU overhead can be balanced across the cluster of servers, allowing the In-Memory Data Grid to scale out horizontally, using the combined compute power of the different servers available in the cluster.
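The arithmetic behind this can be checked with a small simulation. The sketch below is a simplification under stated assumptions: keys are plain integers, a key's partition is its hash modulo the partition count, and partitions are assigned to nodes round-robin. Real Coherence hashes the serialized key and balances partition ownership with its own strategy, so the numbers are only indicative. The value 257 is used because it is Coherence's default partition count.

```java
import java.util.Arrays;

class PartitionSimulation {
    // Returns how many of keyCount sequential integer keys each node would own.
    public static int[] distribute(int keyCount, int partitionCount, int nodeCount) {
        int[] perNode = new int[nodeCount];
        for (int key = 0; key < keyCount; key++) {
            // Simplified key-to-partition mapping (assumption, see lead-in).
            int partition = Math.floorMod(Integer.hashCode(key), partitionCount);
            // Simplified round-robin partition ownership (assumption).
            int owner = partition % nodeCount;
            // Rules for this key would fire only on its owning node.
            perNode[owner]++;
        }
        return perNode;
    }

    public static void main(String[] args) {
        // One million customers, 257 partitions, four storage nodes.
        System.out.println(Arrays.toString(distribute(1_000_000, 257, 4)));
    }
}
```

Each of the four counters ends up at roughly a quarter of a million, and each counter is also the number of rule executions that node would perform, since the interceptor runs where the entry is stored.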
API Transparency and Multiple Programming Language Support
Once Oracle Business Rules is encapsulated in Coherence through an interceptor, there is another great advantage of this design: API transparency. Developers don't need to write custom code to interact with Oracle Business Rules. In fact, they don't even need to know that business rules are being executed when objects are written to Coherence. Since everything happens behind the scenes, this approach frees developers from extra complexity, allowing them to work in a purely data-oriented fashion, which is very productive and less error-prone.
And because Oracle Coherence offers you not only a Java API to interact with the In-Memory Data Grid, but also C++, .NET and REST APIs, you can leverage several types of clients and applications to trigger business rules execution. In fact, I have created a very small C++ application using Microsoft Visual Studio to test this behavior. The application code below inserts 1K customers into the In-Memory Data Grid, with an average transaction latency of ~5 ms, using a VM with 3 vCores and 10 GB of RAM.
An Alternative Version of the Interceptor for MDS Scenarios
The interceptor created in this article uses the Oracle Business Rules Java API to read the dictionary directly from the file system. This approach implies two things: first, that the repository of the dictionary is the file system; second, that the authoring and management of the dictionary is done through JDeveloper. This can lead to some loss of the BRMS's power, since business users won't feel comfortable authoring their rules in a technical environment such as JDeveloper. Administrators also won't be able to see who changed what, since virtually anyone can open the file in JDeveloper and change its contents.
A better way to manage this is to store the dictionary in an MDS repository, which is part of the Oracle SOA Suite platform. Storing the dictionary in the MDS repository allows business users to interact with business rules through the SOA Composer, a very nice web tool that is simpler and easier to use than JDeveloper. Administrators can also track changes, since everything in MDS is audited, transaction-based and securely controlled: you have to log in to the console first to get access to the composer.
I have implemented another version of the interceptor, making full use of the power of Oracle SOA Suite and MDS repositories. The MDSRulesInterceptor.java implementation has been in testing for over a month and is performing quite well, just like the FSRulesInterceptor.java implementation. In the future, I will post this implementation here, but for now just keep in mind the powerful things that can be done with Oracle Business Rules and Coherence In-Memory Data Grid. Oracle Fusion Middleware really rocks, doesn't it?