Monday Oct 28, 2013

Oracle Coherence, Split-Brain and Recovery Protocols In Detail

This article provides a high-level conceptual overview of Split-Brain scenarios in distributed systems. It focuses on a specific example of cluster communication failure and recovery in Oracle Coherence, including a discussion of the witness protocol (used to remove failed cluster members) and the panic protocol (used to resolve Split-Brain scenarios).

Note that the removal of cluster members does not necessarily indicate a Split-Brain condition. Oracle Coherence does not (and cannot) detect a Split-Brain as it occurs; the condition is only detected when cluster members that previously lost contact with each other regain contact.

Cluster Topology and Configuration

To provide a concrete example for this article, let's assume the following cluster topology and configuration. In this example we have a six-member cluster, consisting of one JVM on each physical machine. The member IDs are as follows:

Member ID  IP Address
 1  10.149.155.76
 2  10.149.155.77
 3  10.149.155.236
 4  10.149.155.75
 5  10.149.155.79
 6  10.149.155.78

Members 1, 2, and 3 are connected to a switch, and members 4, 5, and 6 are connected to a second switch. There is a link between the two switches, which provides network connectivity between all of the machines.


Member 1 is the first member to join this cluster, thus making it the senior member. Member 6 is the last member to join this cluster. Here is a log snippet from Member 6 showing the complete member set:

2010-02-26 15:27:57.390/3.062 Oracle Coherence GE 12.1.2.0.0 <Info> (thread=main, member=6): Started DefaultCacheServer...

SafeCluster: Name=cluster:0xDDEB

Group{Address=224.3.5.3, Port=35465, TTL=4}

MasterMemberSet
  (
  ThisMember=Member(Id=6, Timestamp=2010-02-26 15:27:58.635, Address=10.149.155.78:8088, MachineId=1102, Location=process:228, Role=CoherenceServer)
  OldestMember=Member(Id=1, Timestamp=2010-02-26 15:27:06.931, Address=10.149.155.76:8088, MachineId=1100, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:511, Role=CoherenceServer)
  ActualMemberSet=MemberSet(Size=6, BitSetCount=2
    Member(Id=1, Timestamp=2010-02-26 15:27:06.931, Address=10.149.155.76:8088, MachineId=1100, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:511, Role=CoherenceServer)
    Member(Id=2, Timestamp=2010-02-26 15:27:17.847, Address=10.149.155.77:8088, MachineId=1101, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:296, Role=CoherenceServer)
    Member(Id=3, Timestamp=2010-02-26 15:27:24.892, Address=10.149.155.236:8088, MachineId=1260, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:32459, Role=CoherenceServer)
    Member(Id=4, Timestamp=2010-02-26 15:27:39.574, Address=10.149.155.75:8088, MachineId=1099, Location=process:800, Role=CoherenceServer)
    Member(Id=5, Timestamp=2010-02-26 15:27:49.095, Address=10.149.155.79:8088, MachineId=1103, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:3229, Role=CoherenceServer)
    Member(Id=6, Timestamp=2010-02-26 15:27:58.635, Address=10.149.155.78:8088, MachineId=1102, Location=process:228, Role=CoherenceServer)
    )
  RecycleMillis=120000
  RecycleSet=MemberSet(Size=0, BitSetCount=0
    )
  )

At approximately 15:30, the connection between the two switches is severed:


Thirty seconds later (the default packet timeout in development mode) the logs indicate communication failures across the cluster. In this example, the communication failure was caused by a network failure. In a production setting, this type of communication failure can have many root causes, including (but not limited to) network failures, excessive GC, high CPU utilization, swapping/virtual memory, and exceeding maximum network bandwidth. In addition, this type of failure is not necessarily indicative of a Split-Brain; any communication failure will be logged in this fashion. Member 2 logs a communication failure with Member 5:

2010-02-26 15:30:32.638/196.928 Oracle Coherence GE 12.1.2.0.0 <Warning> (thread=PacketPublisher, member=2): Timeout while delivering a packet; requesting the departure confirmation for Member(Id=5, Timestamp=2010-02-26 15:27:49.095, Address=10.149.155.79:8088, MachineId=1103, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:3229, Role=CoherenceServer)
by MemberSet(Size=2, BitSetCount=2
  Member(Id=1, Timestamp=2010-02-26 15:27:06.931, Address=10.149.155.76:8088, MachineId=1100, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:511, Role=CoherenceServer)
  Member(Id=4, Timestamp=2010-02-26 15:27:39.574, Address=10.149.155.75:8088, MachineId=1099, Location=process:800, Role=CoherenceServer)
  )
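
As a side note, the packet timeout mentioned above is configurable in the operational override file (commonly tangosol-coherence-override.xml). The following is a minimal sketch of such an override, shown here for illustration with the 30-second development default made explicit:

<?xml version="1.0"?>

<coherence xmlns="http://xmlns.oracle.com/coherence/coherence-operational-config">
   <cluster-config>
      <packet-publisher>
         <packet-delivery>
            <!-- time to wait for an ACK before the witness protocol is invoked -->
            <timeout-milliseconds>30000</timeout-milliseconds>
         </packet-delivery>
      </packet-publisher>
   </cluster-config>
</coherence>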

The Coherence clustering protocol (TCMP) is a reliable transport mechanism built on UDP. In order for the protocol to be reliable, it requires an acknowledgement (ACK) for each packet delivered. If a packet fails to be acknowledged within the configured timeout period, the Coherence cluster member will log a packet timeout (as seen in the log message above). When this occurs, the cluster member will consult with other members to determine who is at fault for the communication failure. If the witness members agree that the suspect member is at fault, the suspect is removed from the cluster. If the witnesses unanimously disagree, the accuser is removed. This process is known as the witness protocol.

Since Member 2 cannot communicate with Member 5, it selects two witnesses (Members 1 and 4) to determine if the communication issue is with Member 5 or with itself (Member 2). However, Member 4 is on the switch that is no longer accessible by Members 1, 2 and 3; thus a packet timeout for Member 4 is recorded as well:

2010-02-26 15:30:35.648/199.938 Oracle Coherence GE 12.1.2.0.0 <Warning> (thread=PacketPublisher, member=2): Timeout while delivering a packet; requesting the departure confirmation for Member(Id=4, Timestamp=2010-02-26 15:27:39.574, Address=10.149.155.75:8088, MachineId=1099, Location=process:800, Role=CoherenceServer)
by MemberSet(Size=2, BitSetCount=2
  Member(Id=1, Timestamp=2010-02-26 15:27:06.931, Address=10.149.155.76:8088, MachineId=1100, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:511, Role=CoherenceServer)
  Member(Id=6, Timestamp=2010-02-26 15:27:58.635, Address=10.149.155.78:8088, MachineId=1102, Location=process:228, Role=CoherenceServer)
  )

Member 1 has the ability to confirm the departure of Member 4; however, Member 6 cannot, as it is also inaccessible. At the same time, Member 3 sends a request to remove Member 6, which is followed by a report from Member 3 indicating that Member 6 has departed the cluster:

2010-02-26 15:30:35.706/199.996 Oracle Coherence GE 12.1.2.0.0 <D5> (thread=Cluster, member=2): MemberLeft request for Member 6 received from Member(Id=3, Timestamp=2010-02-26 15:27:24.892, Address=10.149.155.236:8088, MachineId=1260, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:32459, Role=CoherenceServer)
2010-02-26 15:30:35.709/199.999 Oracle Coherence GE 12.1.2.0.0 <D5> (thread=Cluster, member=2): MemberLeft notification for Member 6 received from Member(Id=3, Timestamp=2010-02-26 15:27:24.892, Address=10.149.155.236:8088, MachineId=1260, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:32459, Role=CoherenceServer)

The log for Member 3 shows how the departure of Member 6 was decided:

2010-02-26 15:30:35.161/191.694 Oracle Coherence GE 12.1.2.0.0 <Warning> (thread=PacketPublisher, member=3): Timeout while delivering a packet; requesting the departure confirmation for Member(Id=6, Timestamp=2010-02-26 15:27:58.635, Address=10.149.155.78:8088, MachineId=1102, Location=process:228, Role=CoherenceServer)
by MemberSet(Size=2, BitSetCount=2
  Member(Id=1, Timestamp=2010-02-26 15:27:06.931, Address=10.149.155.76:8088, MachineId=1100, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:511, Role=CoherenceServer)
  Member(Id=2, Timestamp=2010-02-26 15:27:17.847, Address=10.149.155.77:8088, MachineId=1101, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:296, Role=CoherenceServer)
  )
2010-02-26 15:30:35.165/191.698 Oracle Coherence GE 12.1.2.0.0 <Info> (thread=Cluster, member=3): Member departure confirmed by MemberSet(Size=2, BitSetCount=2
  Member(Id=1, Timestamp=2010-02-26 15:27:06.931, Address=10.149.155.76:8088, MachineId=1100, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:511, Role=CoherenceServer)
  Member(Id=2, Timestamp=2010-02-26 15:27:17.847, Address=10.149.155.77:8088, MachineId=1101, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:296, Role=CoherenceServer)
  ); removing Member(Id=6, Timestamp=2010-02-26 15:27:58.635, Address=10.149.155.78:8088, MachineId=1102, Location=process:228, Role=CoherenceServer)

In this case, Member 3 happened to select two witnesses that it still had connectivity with (Members 1 and 2) thus resulting in a simple decision to remove Member 6.
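
For illustration only, the outcome of the witness decision described above can be sketched as a tiny Java function. This is not Coherence's internal code; member IDs simply stand in for the real Member objects:

import java.util.Arrays;
import java.util.List;

public class WitnessProtocolSketch {

    enum Verdict { CONFIRM_SUSPECT, DEFEND_SUSPECT }

    // Decide who leaves the cluster: the suspect or the accuser.
    static int decide(int accuserId, int suspectId, List<Verdict> witnessVerdicts) {
        // If the witnesses also cannot reach the suspect, the suspect is removed;
        // if every witness can still reach it, the accuser itself is considered faulty.
        boolean suspectConfirmed = witnessVerdicts.contains(Verdict.CONFIRM_SUSPECT);
        return suspectConfirmed ? suspectId : accuserId;
    }

    public static void main(String[] args) {
        // Member 3 (accuser) asked Members 1 and 2 (witnesses) about Member 6 (suspect).
        List<Verdict> verdicts = Arrays.asList(Verdict.CONFIRM_SUSPECT, Verdict.CONFIRM_SUSPECT);
        System.out.println("Member to remove: " + decide(3, 6, verdicts)); // prints 6
    }
}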


Given the departure of Member 6, Member 2 is left with a single witness to confirm the departure of Member 4:

2010-02-26 15:30:35.713/200.003 Oracle Coherence GE 12.1.2.0.0 <Info> (thread=Cluster, member=2): Member departure confirmed by MemberSet(Size=1, BitSetCount=2
  Member(Id=1, Timestamp=2010-02-26 15:27:06.931, Address=10.149.155.76:8088, MachineId=1100, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:511, Role=CoherenceServer)
  ); removing Member(Id=4, Timestamp=2010-02-26 15:27:39.574, Address=10.149.155.75:8088, MachineId=1099, Location=process:800, Role=CoherenceServer)

In the meantime, Member 4 logs a missing heartbeat from the senior member. This message is also logged on Members 5 and 6.

2010-02-26 15:30:07.906/150.453 Oracle Coherence GE 12.1.2.0.0 <Info> (thread=PacketListenerN, member=4): Scheduled senior member heartbeat is overdue; rejoining multicast group.

Next, Member 4 logs a TcpRing failure with Member 2, thus resulting in the termination of Member 2:

2010-02-26 15:30:21.421/163.968 Oracle Coherence GE 12.1.2.0.0 <D4> (thread=Cluster, member=4): TcpRing: Number of socket exceptions exceeded maximum; last was "java.net.SocketTimeoutException: connect timed out"; removing the member: 2

For quick detection of process termination, Oracle Coherence utilizes a feature called TcpRing, which is a sparse collection of TCP/IP-based connections between members of the cluster. Each member in the cluster is connected to at least one other member, which (if at all possible) is running on a different physical box. This connection is not used for any data transfer; only heartbeat communications are sent, once per second over each link. If a certain number of exceptions are thrown while trying to re-establish a connection, the member throwing the exceptions is removed from the cluster.

Member 5 logs a packet timeout with Member 3 and cites witnesses Members 4 and 6:

2010-02-26 15:30:29.791/165.037 Oracle Coherence GE 12.1.2.0.0 <Warning> (thread=PacketPublisher, member=5): Timeout while delivering a packet; requesting the departure confirmation for Member(Id=3, Timestamp=2010-02-26 15:27:24.892, Address=10.149.155.236:8088, MachineId=1260, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:32459, Role=CoherenceServer)
by MemberSet(Size=2, BitSetCount=2
  Member(Id=4, Timestamp=2010-02-26 15:27:39.574, Address=10.149.155.75:8088, MachineId=1099, Location=process:800, Role=CoherenceServer)
  Member(Id=6, Timestamp=2010-02-26 15:27:58.635, Address=10.149.155.78:8088, MachineId=1102, Location=process:228, Role=CoherenceServer)
  )
2010-02-26 15:30:29.798/165.044 Oracle Coherence GE 12.1.2.0.0 <Info> (thread=Cluster, member=5): Member departure confirmed by MemberSet(Size=2, BitSetCount=2
  Member(Id=4, Timestamp=2010-02-26 15:27:39.574, Address=10.149.155.75:8088, MachineId=1099, Location=process:800, Role=CoherenceServer)
  Member(Id=6, Timestamp=2010-02-26 15:27:58.635, Address=10.149.155.78:8088, MachineId=1102, Location=process:228, Role=CoherenceServer)
  ); removing Member(Id=3, Timestamp=2010-02-26 15:27:24.892, Address=10.149.155.236:8088, MachineId=1260, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:32459, Role=CoherenceServer)

Eventually we are left with two distinct clusters consisting of Members 1, 2, 3 and Members 4, 5, 6, respectively. In the latter cluster, Member 4 is promoted to senior member.


The connection between the two switches is restored at 15:33. Upon the restoration of the connection, the cluster members immediately receive cluster heartbeats from the two senior members. In the case of Members 1, 2, and 3, the following is logged:

2010-02-26 15:33:14.970/369.066 Oracle Coherence GE 12.1.2.0.0 <Warning> (thread=Cluster, member=1): The member formerly known as Member(Id=4, Timestamp=2010-02-26 15:30:35.341, Address=10.149.155.75:8088, MachineId=1099, Location=process:800, Role=CoherenceServer) has been forcefully evicted from the cluster, but continues to emit a cluster heartbeat; henceforth, the member will be shunned and its messages will be ignored.

Likewise for Members 4, 5, and 6:

2010-02-26 15:33:14.343/336.890 Oracle Coherence GE 12.1.2.0.0 <Warning> (thread=Cluster, member=4): The member formerly known as Member(Id=1, Timestamp=2010-02-26 15:30:31.64, Address=10.149.155.76:8088, MachineId=1100, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:511, Role=CoherenceServer) has been forcefully evicted from the cluster, but continues to emit a cluster heartbeat; henceforth, the member will be shunned and its messages will be ignored.

This message indicates that a senior heartbeat is being received from members that were previously removed from the cluster; in other words, something that should not be possible. For this reason, the recipients of these messages will initially ignore them. After several iterations of these messages, the existence of multiple clusters is acknowledged, thus triggering the panic protocol to reconcile the situation. When the presence of more than one cluster (i.e. Split-Brain) is detected by a Coherence member, the panic protocol is invoked in order to resolve the conflicting clusters and consolidate into a single cluster. The protocol consists of the removal of smaller clusters until there is one cluster remaining. In the case of equal-size clusters, the one with the older senior member will survive. Member 1, being the oldest member, initiates the protocol:

2010-02-26 15:33:45.970/400.066 Oracle Coherence GE 12.1.2.0.0 <Warning> (thread=Cluster, member=1): An existence of a cluster island with senior Member(Id=4, Timestamp=2010-02-26 15:27:39.574, Address=10.149.155.75:8088, MachineId=1099, Location=process:800, Role=CoherenceServer) containing 3 nodes have been detected. Since this Member(Id=1, Timestamp=2010-02-26 15:27:06.931, Address=10.149.155.76:8088, MachineId=1100, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:511, Role=CoherenceServer) is the senior of an older cluster island, the panic protocol is being activated to stop the other island's senior and all junior nodes that belong to it.

Member 3 receives the panic:

2010-02-26 15:33:45.803/382.336 Oracle Coherence GE 12.1.2.0.0 <Error> (thread=Cluster, member=3): Received panic from senior Member(Id=1, Timestamp=2010-02-26 15:27:06.931, Address=10.149.155.76:8088, MachineId=1100, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:511, Role=CoherenceServer) caused by Member(Id=4, Timestamp=2010-02-26 15:27:39.574, Address=10.149.155.75:8088, MachineId=1099, Location=process:800, Role=CoherenceServer)

Member 4, the senior member of the younger cluster, receives the kill message from Member 3:

2010-02-26 15:33:44.921/367.468 Oracle Coherence GE 12.1.2.0.0 <Error> (thread=Cluster, member=4): Received a Kill message from a valid Member(Id=3, Timestamp=2010-02-26 15:27:24.892, Address=10.149.155.236:8088, MachineId=1260, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:32459, Role=CoherenceServer); stopping cluster service.

In turn, Member 4 requests the departure of its junior members 5 and 6:

2010-02-26 15:33:44.921/367.468 Oracle Coherence GE 12.1.2.0.0 <Error> (thread=Cluster, member=4): Received a Kill message from a valid Member(Id=3, Timestamp=2010-02-26 15:27:24.892, Address=10.149.155.236:8088, MachineId=1260, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:32459, Role=CoherenceServer); stopping cluster service.

2010-02-26 15:33:43.343/349.015 Oracle Coherence GE 12.1.2.0.0 <Error> (thread=Cluster, member=6): Received a Kill message from a valid Member(Id=4, Timestamp=2010-02-26 15:27:39.574, Address=10.149.155.75:8088, MachineId=1099, Location=process:800, Role=CoherenceServer); stopping cluster service.

Once Members 4, 5, and 6 restart, they rejoin the original cluster with senior member 1. The log below is from Member 4. Note that it receives a different member id when it rejoins the cluster.

2010-02-26 15:33:44.921/367.468 Oracle Coherence GE 12.1.2.0.0 <Error> (thread=Cluster, member=4): Received a Kill message from a valid Member(Id=3, Timestamp=2010-02-26 15:27:24.892, Address=10.149.155.236:8088, MachineId=1260, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:32459, Role=CoherenceServer); stopping cluster service.
2010-02-26 15:33:46.921/369.468 Oracle Coherence GE 12.1.2.0.0 <D5> (thread=Cluster, member=4): Service Cluster left the cluster
2010-02-26 15:33:47.046/369.593 Oracle Coherence GE 12.1.2.0.0 <D5> (thread=Invocation:InvocationService, member=4): Service InvocationService left the cluster
2010-02-26 15:33:47.046/369.593 Oracle Coherence GE 12.1.2.0.0 <D5> (thread=OptimisticCache, member=4): Service OptimisticCache left the cluster
2010-02-26 15:33:47.046/369.593 Oracle Coherence GE 12.1.2.0.0 <D5> (thread=ReplicatedCache, member=4): Service ReplicatedCache left the cluster
2010-02-26 15:33:47.046/369.593 Oracle Coherence GE 12.1.2.0.0 <D5> (thread=DistributedCache, member=4): Service DistributedCache left the cluster
2010-02-26 15:33:47.046/369.593 Oracle Coherence GE 12.1.2.0.0 <D5> (thread=Invocation:Management, member=4): Service Management left the cluster
2010-02-26 15:33:47.046/369.593 Oracle Coherence GE 12.1.2.0.0 <D5> (thread=Cluster, member=4): Member 6 left service Management with senior member 5
2010-02-26 15:33:47.046/369.593 Oracle Coherence GE 12.1.2.0.0 <D5> (thread=Cluster, member=4): Member 6 left service DistributedCache with senior member 5
2010-02-26 15:33:47.046/369.593 Oracle Coherence GE 12.1.2.0.0 <D5> (thread=Cluster, member=4): Member 6 left service ReplicatedCache with senior member 5
2010-02-26 15:33:47.046/369.593 Oracle Coherence GE 12.1.2.0.0 <D5> (thread=Cluster, member=4): Member 6 left service OptimisticCache with senior member 5
2010-02-26 15:33:47.046/369.593 Oracle Coherence GE 12.1.2.0.0 <D5> (thread=Cluster, member=4): Member 6 left service InvocationService with senior member 5
2010-02-26 15:33:47.046/369.593 Oracle Coherence GE 12.1.2.0.0 <D5> (thread=Cluster, member=4): Member(Id=6, Timestamp=2010-02-26 15:33:47.046, Address=10.149.155.78:8088, MachineId=1102, Location=process:228, Role=CoherenceServer) left Cluster with senior member 4
2010-02-26 15:33:49.218/371.765 Oracle Coherence GE 12.1.2.0.0 <Info> (thread=main, member=n/a): Restarting cluster
2010-02-26 15:33:49.421/371.968 Oracle Coherence GE 12.1.2.0.0 <D5> (thread=Cluster, member=n/a): Service Cluster joined the cluster with senior service member n/a
2010-02-26 15:33:49.625/372.172 Oracle Coherence GE 12.1.2.0.0 <Info> (thread=Cluster, member=n/a): This Member(Id=5, Timestamp=2010-02-26 15:33:50.499, Address=10.149.155.75:8088, MachineId=1099, Location=process:800, Role=CoherenceServer, Edition=Grid Edition, Mode=Development, CpuCount=2, SocketCount=1) joined cluster "cluster:0xDDEB" with senior Member(Id=1, Timestamp=2010-02-26 15:27:06.931, Address=10.149.155.76:8088, MachineId=1100, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:511, Role=CoherenceServer, Edition=Grid Edition, Mode=Development, CpuCount=2, SocketCount=2)

Cool, isn't it?

Friday Aug 16, 2013

The Perfect Marriage: Oracle Business Rules & Coherence In-Memory Data Grid. Highly Scalable Business Rules with Extremely Low Latency

The idea of separating business rules from application logic is far from new, but over the last ten years dozens of platforms and technologies have been created to enable this separation of concerns. One of those technologies is the BRMS, an acronym for Business Rules Management System. The basic idea of a BRMS is to be a repository of rules, governing those rules in such a way that they can be created, updated, tested and controlled through an external interface. Part of the BRMS responsibility is also to provide an API (more than one when possible) that allows external applications to interact with the BRMS, sending data over the network that can trigger the execution of zero, one or multiple rules in the BRMS repository. This rule execution occurs outside of those external applications, minimizing their process memory footprint and generating much less CPU overhead, since the rule processing happens in a separate server/cluster. This architectural approach is very powerful, allowing:

  • Rules can be managed (created, updated) outside of the application code
  • Rules can be reused across different applications, no matter their technology
  • Less CPU overhead and smaller memory footprint in the applications
  • More control over rules, auditing of changes and enterprise log history
  • Integration with other IT artifacts like dictionaries, processes, services

With this context in place, we would all agree that the usage of a BRMS is a mandatory approach in every IT architecture, if it were not for the fact that BRMS technologies introduce a lot of overhead into the overall transaction latency. Between the external application that invokes the BRMS to execute rules and the BRMS platform itself, there is the network channel. This means that we must deal with network I/O and its technical implications (serialization, instability, byte buffering) when we send/receive data to/from the BRMS. No matter whether the BRMS provides a SOAP API, a REST API or any other TCP/IP-based API, the overall transaction latency is compromised by the network overhead.

Another huge problem of BRMS platforms is scalability. When the BRMS platform is first introduced into an architecture, it handles an acceptable number of TPS (Transactions Per Second), which nowadays varies from 1K TPS to 5K TPS. But when other applications start using the same BRMS platform, or the number of transactions simply grows naturally, you can face scenarios where your BRMS platform must deal with 20K TPS or even 100K TPS. What happens when a huge number of objects is allocated in the heap space of the Java-based server? The memory footprint approaches its maximum size and the garbage collector starts to run to reclaim the unused memory and/or compact the heap layout. Whatever job the garbage collector has to do, it will consume most of the processing power to finish it as soon as possible, since the amount of garbage to handle will be huge. This is true for almost all BRMS platforms on the market, no matter the vendor. If the BRMS platform is Java based, once its JVMs reach more than 16 GB of heap on average, they start to face huge performance problems due to garbage collection.

Unlike other architecture designs in which the load is distributed across a cluster, BRMS platforms must handle the entire processing on a single server due to general BRMS concepts known as the execution agenda and the working memory. All the facts (the data sent as input) are maintained in this agenda on a single server, making the BRMS platform a pinned service that does its job in a singleton fashion. In this situation, when you need to scale, you can introduce a series of identical servers behind a corporate load balancer that, instead of distributing the load, divides the transaction volume across those servers. Because each server behind the load balancer handles its share of the volume entirely on its own, concurrency is limited by the number of processors available on each server's mainboard. If you need more compute power, due to this lack of concurrency, you are forced to buy a much bigger server. Those servers are huge and expensive, since they need enough processors to handle thousands of executions simultaneously and completely alone. Not a very smart approach when you are considering handling millions of TPS.

With this situation in mind, it is necessary to design an architecture that allows business rules execution to be distributed across different servers. To achieve this behavior, we need another software component that can share data (business entities, fact types, data transfer objects) across different processes, running on the same or different hardware boxes. More importantly, this component must keep transaction latency short, eliminating the many milliseconds introduced by network overhead. In other words, it must bring data to the only hardware layer that really does not imply I/O overhead: memory.

Recently, in order to deal with this problem and provide a customer with a scalable, high-performance way to use Oracle Business Rules, I designed a solution that solves both problems at once, without losing the power of separation of concerns provided by BRMS platforms. In-Memory Data Grid technologies like Oracle Coherence have the power to handle massive amounts of data (MB, GB or even TB) completely in memory. Moreover, this kind of technology has been written from scratch to distribute data across a number of servers, so scalability is never a problem here. When you integrate a BRMS with In-Memory Data Grid technologies, you get the best of both worlds: scalability and high performance, plus extremely low latency. And when I say extremely low latency I mean sub-millisecond latency; something below 650 μs in my tests.

This article will show how to integrate Oracle Business Rules with Oracle Coherence. The steps shown here can be reproduced for a large number of scenarios, making your investment in Oracle Fusion Middleware (Cloud Application Foundation and/or the SOA Suite stack) even more attractive.

The Business Scenario: Automatic Promotions for Bank Customers

Before we move to the implementation details, we need to understand the business scenario used as the teaching example. We are going to simulate an automatic decision system that creates promotions for banking customers based on their profiles. The idea is to let the BRMS platform decide which promotions to offer based on the customer profiles that applications send to it. This automatic promotion system should allow applications like internet banking sites, mobile applications or kiosk terminals to present promotions (up-selling/cross-selling) to end customers.

Building the Solution Domain Model

Let's start the development of the example. The first thing to do is to create the domain model, which means designing and implementing the business entities that will drive both the client-side application execution and the business rules. The automatic promotion system will be composed of three entities: promotions, products and customers. A promotion is something that the bank offers to the customer, with contextual information about the business value of one or more products, derived from the customer profile. Here is the implementation of the promotion entity:

package com.acme.architecture.multichannel.domain;

import com.tangosol.io.pof.annotation.Portable;
import com.tangosol.io.pof.annotation.PortableProperty;

@Portable public class Promotion {
    
    @PortableProperty(0) private String id;
    @PortableProperty(1) private String description;
    
    public Promotion() {}
    
    public Promotion(String id, String description) {
        setId(id);
        setDescription(description);
    }

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public String getDescription() {
        return description;
    }

    public void setDescription(String description) {
        this.description = description;
    }
    
}

A product is something that the customer hires from the bank: some kind of service or item that makes the customer's account more valuable to the bank and more attractive to the customer, since it is a differentiator. Here is the implementation of the product entity:

package com.acme.architecture.multichannel.domain;

import com.tangosol.io.pof.annotation.Portable;
import com.tangosol.io.pof.annotation.PortableProperty;

@Portable public class Product {
    
    @PortableProperty(0) private int id;
    @PortableProperty(1) private String name;
    
    public Product() {}
    
    public Product(int id, String name) {
        setId(id);
        setName(name);
    }

    public int getId() {
        return id;
    }

    public void setId(int id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }
    
}

And finally, we need to design the customer entity, which represents the person or company that hires one or more products from the bank. Here is the implementation of the customer entity:

package com.acme.architecture.multichannel.domain;

import com.tangosol.io.pof.annotation.Portable;
import com.tangosol.io.pof.annotation.PortableProperty;

import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Set;

@Portable public class Customer {
    
    @PortableProperty(0) private String ssn;
    @PortableProperty(1) private String firstName;
    @PortableProperty(2) private String lastName;
    @PortableProperty(3) private Date birthDate;
    
    @PortableProperty(4) private String account;
    @PortableProperty(5) private String agency;
    @PortableProperty(6) private double balance;
    @PortableProperty(7) private char custType;
    
    @PortableProperty(8) private Set<Product> products;
    @PortableProperty(9) private List<Promotion> promotions;
    
    public Customer() {}
    
    public Customer(String ssn, String firstName, String lastName,
                    Date birthDate, String account, String agency,
                    double balance, char custType, Set<Product> products) {
        setSsn(ssn);
        setFirstName(firstName);
        setLastName(lastName);
        setBirthDate(birthDate);
        setAccount(account);
        setAgency(agency);
        setBalance(balance);
        setCustType(custType);
        setProducts(products);
    }
    
    public void addPromotion(String id, String description) {
        getPromotions().add(new Promotion(id, description));
    }

    public String getSsn() {
        return ssn;
    }

    public void setSsn(String ssn) {
        this.ssn = ssn;
    }

    public String getFirstName() {
        return firstName;
    }

    public void setFirstName(String firstName) {
        this.firstName = firstName;
    }

    public String getLastName() {
        return lastName;
    }

    public void setLastName(String lastName) {
        this.lastName = lastName;
    }

    public Date getBirthDate() {
        return birthDate;
    }

    public void setBirthDate(Date birthDate) {
        this.birthDate = birthDate;
    }

    public String getAccount() {
        return account;
    }

    public void setAccount(String account) {
        this.account = account;
    }

    public String getAgency() {
        return agency;
    }

    public void setAgency(String agency) {
        this.agency = agency;
    }

    public double getBalance() {
        return balance;
    }

    public void setBalance(double balance) {
        this.balance = balance;
    }

    public char getCustType() {
        return custType;
    }

    public void setCustType(char custType) {
        this.custType = custType;
    }

    public Set<Product> getProducts() {
        return products;
    }

    public void setProducts(Set<Product> products) {
        this.products = products;
    }

    public List<Promotion> getPromotions() {
        if (promotions == null) {
            promotions = new ArrayList<Promotion>();
        }
        return promotions;
    }

    public void setPromotions(List<Promotion> promotions) {
        this.promotions = promotions;
    }
    
} 

As you can see in the code, the customer entity has a relationship with the two other entities. Build this code and package the three entities into a JAR file. We can now move to the second part of the implementation, which is the creation of a SOA project that includes a business rules dictionary.

Creating the Business Rules Dictionary

Business rules in the Oracle Business Rules product are defined in an artifact called a dictionary. In order to create a dictionary, you must use the Oracle JDeveloper IDE plus the SOA extension for JDeveloper. I will assume here that you are familiar with those tools, so I will not go into too much detail about them. In JDeveloper, create a new SOA project, and after that create a business rules dictionary. With the dictionary in place, you must configure it to register our domain model as fact types.


Now you can write down some business rules. Using the JDeveloper business rules editor, define the following rules as shown in the picture below.


For testing purposes, the variable "MinimumBalanceForCreditCard" is just a global variable of type java.lang.Double that contains a constant value. Finally, you are required to expose those business rules through a decision function. As you probably already know, decision functions are constructs that make it easier for external applications to interact with Oracle Business Rules, minimizing the developers' effort to deal with the Oracle Business Rules API, besides providing a very nice contract-based access point. Create one decision function that receives a customer as input and returns the same customer as output. Don't forget to associate the ruleset with the decision function.
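
Since the rule screenshots are not reproduced here, the following Java sketch captures only the intent of the two rules used later in the tests. The thresholds, promotion IDs and descriptions are hypothetical and are not taken from the actual dictionary:

import com.acme.architecture.multichannel.domain.Customer;

public class PromotionRulesSketch {

    // Stands in for the "MinimumBalanceForCreditCard" global; the real value lives in the dictionary.
    private static final double MINIMUM_BALANCE_FOR_CREDIT_CARD = 100000;

    public static void applyPromotions(Customer c) {
        // "Basic Products for First Customers": a person with no products yet gets two entry-level offers.
        if (c.getCustType() == 'P' && (c.getProducts() == null || c.getProducts().isEmpty())) {
            c.addPromotion("PROMO-01", "Free checking account package");
            c.addPromotion("PROMO-02", "Basic credit card");
        }
        // "Corporate Credit Card for Enterprise Customers": a company above the minimum balance gets one offer.
        if (c.getCustType() == 'C' && c.getBalance() >= MINIMUM_BALANCE_FOR_CREDIT_CARD) {
            c.addPromotion("PROMO-03", "Corporate credit card");
        }
    }
}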

Integrating Oracle Business Rules and Coherence through Interceptors

Now here comes the most exciting part of the article: the integration between Oracle Business Rules and the Oracle Coherence In-Memory Data Grid. Starting with the 12.1.2 version of Coherence, Oracle introduced a new API called Live Events. This API allows applications to listen to and consume events from Coherence, no matter what type of event is being generated. You can learn more about Coherence Live Events in this YouTube presentation.

Using both the Coherence and Oracle Business Rules main libraries, implement the following event interceptor in your favorite Java development environment:

package com.oracle.coherence.events;

import com.tangosol.net.events.EventInterceptor;
import com.tangosol.net.events.annotation.Interceptor;
import com.tangosol.net.events.partition.cache.EntryEvent;
import com.tangosol.util.BinaryEntry;

import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

import oracle.rules.sdk2.decisionpoint.DecisionPoint;
import oracle.rules.sdk2.decisionpoint.DecisionPointBuilder;
import oracle.rules.sdk2.decisionpoint.DecisionPointDictionaryFinder;
import oracle.rules.sdk2.decisionpoint.DecisionPointInstance;
import oracle.rules.sdk2.dictionary.RuleDictionary;
import oracle.rules.sdk2.exception.SDKException;

/**
 * @author ricardo.s.ferreira@oracle.com
 */

@Interceptor
public class FSRulesInterceptor implements EventInterceptor<EntryEvent> {

    private List<EntryEvent.Type> types;
    private String dictionaryLocation;
    private String decisionFunctionName;
    private boolean dictionaryAutoUpdate;
    private long dictionaryTimestamp;

    private void parseEntryEventTypes(String entryEventTypes) {
        types = new ArrayList<EntryEvent.Type>();
        String[] listTypes = entryEventTypes.split(COMMA);
        for (String type : listTypes) {
            types.add(EntryEvent.Type.valueOf(type.trim()));
        }
    }

    private RuleDictionary loadRuleDictionary()
        throws FileNotFoundException, SDKException, IOException {
        File dictionaryFile = new File(dictionaryLocation);
        dictionaryTimestamp = dictionaryFile.lastModified();
        RuleDictionary ruleDictionary = RuleDictionary.readDictionary(
            new FileReader(dictionaryFile), new DecisionPointDictionaryFinder(null));
        return ruleDictionary;
    }

    private DecisionPoint createDecisionPoint() {
        DecisionPoint decisionPoint = null;
        try {
            decisionPoint =
                    new DecisionPointBuilder()
                    .with(loadRuleDictionary())
                    .with(decisionFunctionName).build();
        } catch (Exception ex) {
            throw new RuntimeException("Unable to create the DecisionPoint", ex);
        }
        return decisionPoint;
    }

    private void updateDecisionPointIfNecessary() {
        File dictionaryFile = new File(dictionaryLocation);
        if (dictionaryFile.lastModified() != dictionaryTimestamp) {
            decisionPoint.release();
            decisionPoint = createDecisionPoint();
        }
    }

    private boolean eventTypeIsAllowed(EntryEvent.Type entryEventType) {
        return types.contains(entryEventType);
    }

    public FSRulesInterceptor(String entryEventTypes,
                              String dictionaryLocation,
                              String decisionFunctionName) {
        parseEntryEventTypes(entryEventTypes);
        this.dictionaryLocation = dictionaryLocation;
        this.decisionFunctionName = decisionFunctionName;
        decisionPoint = createDecisionPoint();
    }

    public FSRulesInterceptor(String entryEventTypes,
                              String dictionaryLocation,
                              String decisionFunctionName,
                              boolean dictionaryAutoUpdate) {
        this(entryEventTypes, dictionaryLocation, decisionFunctionName);
        this.dictionaryAutoUpdate = dictionaryAutoUpdate;
    }

    public void onEvent(EntryEvent entryEvent) {

        BinaryEntry binaryEntry = null;
        Iterator<BinaryEntry> iter = null;
        Set<BinaryEntry> entrySet = null;
        DecisionPointInstance dPointInst = null;
        List<Object> inputs, outputs = null;

        try {

            if (eventTypeIsAllowed(entryEvent.getType())) {

                if (dictionaryAutoUpdate) {
                    updateDecisionPointIfNecessary();
                }

                inputs = new ArrayList<Object>();
                entrySet = entryEvent.getEntrySet();
                for (BinaryEntry binaryEntryInput : entrySet) {
                    inputs.add(binaryEntryInput.getValue());
                }

                dPointInst = decisionPoint.getInstance();
                dPointInst.setInputs(inputs);
                outputs = dPointInst.invoke();

                if (entryEvent.getType() == EntryEvent.Type.INSERTING ||
                    entryEvent.getType() == EntryEvent.Type.UPDATING) {
                    if (outputs != null && !outputs.isEmpty()) {
                        iter = entrySet.iterator();
                        for (Object output : outputs) {
                            if (iter.hasNext()) {
                                binaryEntry = iter.next();
                                binaryEntry.setValue(output);
                            }
                        }
                    }
                }
                
            }

        } catch (Exception ex) {
            ex.printStackTrace();
        }

    }
    
    private static final String COMMA = ",";
    private static DecisionPoint decisionPoint;

}

If you are familiar with the Oracle Business Rules Java API, you won't find it difficult to understand this code. What it does is simply create a DecisionPoint object during the constructor phase and put this object into a static variable, which allows it to be shared across the entire JVM. Remember that the JVM in this context is a Coherence node, so each Coherence node will hold one DecisionPoint instance. In the onEvent() method, there is the logic that checks which types of events the implementation should intercept, and also checks whether the DecisionPoint instance should be updated. This last check is based on the timestamp of the dictionary file.

After creating a DecisionPointInstance, the intercepted entries become the input variables for the business rules execution. The interceptor triggers the rules engine through the invoke() method, and afterwards it replaces the original intercepted entries with the results that come back from the business rules agenda, but only if one of the following events has happened: INSERTING or UPDATING. This check is necessary for two reasons. First, those are the only event types that occur in the same thread as the cache transaction. Second, other event types like INSERTED or UPDATED happen in another thread, which means they are triggered asynchronously by Coherence.

Setting Up a Coherence Distributed Cache with the Business Rules Interceptor

Now we can start the configuration of the Coherence cache. Since we are using POF as the serialization strategy, we need to assemble a POF configuration file. Starting with the 12.1.2 version of Coherence, there is a new tool called pof-config-gen that introspects JAR files searching for classes annotated with @Portable. Create a POF configuration file with the following content:

<?xml version='1.0'?>

<pof-config
   xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
   xmlns='http://xmlns.oracle.com/coherence/coherence-pof-config'
   xsi:schemaLocation='http://xmlns.oracle.com/coherence/coherence-pof-config coherence-pof-config.xsd'>
	
   <user-type-list>
      <include>coherence-pof-config.xml</include>
      <user-type>
         <type-id>1001</type-id>
         <class-name>com.acme.architecture.multichannel.domain.Promotion</class-name>
      </user-type>
      <user-type>
         <type-id>1002</type-id>
         <class-name>com.acme.architecture.multichannel.domain.Product</class-name>
      </user-type>
      <user-type>
         <type-id>1003</type-id>
         <class-name>com.acme.architecture.multichannel.domain.Customer</class-name>
      </user-type>
   </user-type-list>
	
</pof-config>
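
Optionally, the POF registrations above can be sanity-checked outside the cluster with a quick serialization round trip. The sketch below assumes the file above is saved as pof-config.xml and is available on the classpath, and that the annotation-based registration resolves as shown:

import com.acme.architecture.multichannel.domain.Product;

import com.tangosol.io.pof.ConfigurablePofContext;
import com.tangosol.util.Binary;
import com.tangosol.util.ExternalizableHelper;

public class PofRoundTripCheck {

    public static void main(String[] args) {
        // Builds a POF context from the configuration file created above.
        ConfigurablePofContext pofContext = new ConfigurablePofContext("pof-config.xml");

        // Serializes and deserializes a sample entity to confirm the user-type registration.
        Product original = new Product(1, "Gold Credit Card");
        Binary binary = ExternalizableHelper.toBinary(original, pofContext);
        Product copy = (Product) ExternalizableHelper.fromBinary(binary, pofContext);

        System.out.println("Round trip OK, product name: " + copy.getName());
    }
}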

And as expected, we also need to create a Coherence cache configuration file. Create a file called coherence-cache-config.xml and fill it with the following contents:

<?xml version="1.0"?>

<cache-config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns="http://xmlns.oracle.com/coherence/coherence-cache-config"
   xsi:schemaLocation="http://xmlns.oracle.com/coherence/coherence-cache-config coherence-cache-config.xsd">

   <defaults>
      <serializer>pof</serializer>
   </defaults>

   <caching-scheme-mapping>
      <cache-mapping>
         <cache-name>customers</cache-name>
         <scheme-name>customersScheme</scheme-name>
      </cache-mapping>
   </caching-scheme-mapping>

   <caching-schemes>
	
      <distributed-scheme>
         <scheme-name>customersScheme</scheme-name>
         <service-name>DistribService</service-name>
         <backing-map-scheme>
            <local-scheme />
         </backing-map-scheme>
         <autostart>true</autostart>
         <interceptors>
            <interceptor>
               <name>rulesInterceptor</name>
               <instance>
                  <class-name>com.oracle.coherence.events.FSRulesInterceptor</class-name>
                  <init-params>
                     <init-param>
                        <param-type>java.lang.String</param-type>
                        <param-value>INSERTING, UPDATING</param-value>
                     </init-param>
                     <init-param>
                        <param-type>java.lang.String</param-type>
                        <param-value>C:\\multiChannelArchitecture.rules</param-value>
                     </init-param>
                     <init-param>
                        <param-type>java.lang.String</param-type>
                        <param-value>BankingDecisionFunction</param-value>
                     </init-param>
                     <init-param>
                        <param-type>java.lang.Boolean</param-type>
                        <param-value>true</param-value>
                     </init-param>
                  </init-params>
               </instance>
            </interceptor>
         </interceptors>
         <async-backup>true</async-backup>
      </distributed-scheme>
		
      <proxy-scheme>
         <scheme-name>customersProxy</scheme-name>
         <service-name>ProxyService</service-name>
         <acceptor-config>
            <tcp-acceptor>
               <local-address>
                  <address>cloud.app.foundation</address>
                  <port>5555</port>
               </local-address>
            </tcp-acceptor>
         </acceptor-config>
         <autostart>true</autostart>
      </proxy-scheme>

   </caching-schemes>
	
</cache-config>    

This cache configuration file is very straightforward. There are only three important things to consider here. First, we are using the new interceptors section to declare our interceptor and pass constructor arguments to it. Second, we used another Coherence 12.1.2 feature, asynchronous backup. Using this feature dramatically reduces the latency of a single transaction, since backups are written (in another thread) after the primary entry has been written. It is not a precondition for the interceptor to work, but in the context of a BRMS it is a great idea. Third, we also defined a proxy-scheme that exposes a TCP/IP endpoint, so we can use the Coherence*Extend feature later in this article to allow a C++ application to access the same cache.
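
The client side of that proxy is not shown in the article's listings. A Coherence*Extend client (Java, C++ or .NET) would map the same cache name to a remote-cache-scheme pointing at the proxy address. Here is a minimal sketch; the scheme and service names are assumptions, while the address and port come from the proxy-scheme above:

<?xml version="1.0"?>

<cache-config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns="http://xmlns.oracle.com/coherence/coherence-cache-config"
   xsi:schemaLocation="http://xmlns.oracle.com/coherence/coherence-cache-config coherence-cache-config.xsd">

   <defaults>
      <serializer>pof</serializer>
   </defaults>

   <caching-scheme-mapping>
      <cache-mapping>
         <cache-name>customers</cache-name>
         <scheme-name>remoteCustomersScheme</scheme-name>
      </cache-mapping>
   </caching-scheme-mapping>

   <caching-schemes>
      <remote-cache-scheme>
         <scheme-name>remoteCustomersScheme</scheme-name>
         <service-name>ExtendTcpCacheService</service-name>
         <initiator-config>
            <tcp-initiator>
               <remote-addresses>
                  <socket-address>
                     <!-- same host/port exposed by the proxy-scheme above -->
                     <address>cloud.app.foundation</address>
                     <port>5555</port>
                  </socket-address>
               </remote-addresses>
            </tcp-initiator>
         </initiator-config>
      </remote-cache-scheme>
   </caching-schemes>

</cache-config>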

Testing the Scenario

Now that we have all the configuration in place, we can start the tests. Start a Coherence node JVM with the configuration files from the previous sections. When the node starts, a DecisionPoint object pointing to the business rules dictionary will be created in memory. Implement a Java program to test the behavior of the implementation, as in the listing below:

package com.acme.architecture.multichannel.test;

import com.acme.architecture.multichannel.domain.Customer;

import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;

import java.util.Date;

public class Application {
    
    public static void main(String[] args) {
        
        NamedCache customers = CacheFactory.getCache("customers");
        
        // Simulating a simple customer of type 'Person'...
        // Should trigger the rule 'Basic Products for First Customers'
        // Expected number of promotions: 02
        String ssn = "12345";
        Customer personCust = new Customer(ssn, "Ricardo", "Ferreira",
                                         new Date(1981, 10, 05), "245671",
                                         "3158", 98000, 'P', null);
        
        long startTime = System.currentTimeMillis();
        customers.put(ssn, personCust);
        personCust = (Customer) customers.get(ssn);
        long elapsedTime = System.currentTimeMillis() - startTime;
        
        System.out.println();
        System.out.println("   ---> Number of Promotions: " +
                           personCust.getPromotions().size());
        System.out.println("   ---> Elapsed Time: " + elapsedTime + " ms");
        
        // Simulating a simple customer of type 'Company'...
        // Should trigger the rule 'Corporate Credit Card for Enterprise Customers'
        // Expected number of promotions: 01
        ssn = "54321";
        Customer companyCust = new Customer(ssn, "Huge", "Company",
                                         new Date(1981, 10, 05), "235437",
                                         "7856", 8900000, 'C', null);
        
        startTime = System.currentTimeMillis();
        customers.put(ssn, companyCust);
        companyCust = (Customer) customers.get(ssn);
        elapsedTime = (System.currentTimeMillis() - startTime);
        
        System.out.println("   ---> Number of Promotions: " +
                           companyCust.getPromotions().size());
        System.out.println("   ---> Elapsed Time: " + elapsedTime + " ms");
        System.out.println();
        
    }
    
}

This Java application can be executed with the storage-enabled parameter set to false. Executing this code will give you an output similar to this:

2013-08-15 23:37:53.058/1.378 Oracle Coherence 12.1.2.0.0 <Info> (thread=Main Thread, member=n/a): Loaded operational configuration from "jar:file:/C:/mw-home/coherence/lib/coherence.jar!/tangosol-coherence.xml"
2013-08-15 23:37:53.093/1.413 Oracle Coherence 12.1.2.0.0 <Info> (thread=Main Thread, member=n/a): Loaded operational overrides from "jar:file:/C:/mw-home/coherence/lib/coherence.jar!/tangosol-coherence-override-dev.xml"
2013-08-15 23:37:53.093/1.413 Oracle Coherence 12.1.2.0.0 <D5> (thread=Main Thread, member=n/a): Optional configuration override "/tangosol-coherence-override.xml" is not specified
2013-08-15 23:37:53.093/1.413 Oracle Coherence 12.1.2.0.0 <D5> (thread=Main Thread, member=n/a): Optional configuration override "cache-factory-config.xml" is not specified
2013-08-15 23:37:53.093/1.413 Oracle Coherence 12.1.2.0.0 <D5> (thread=Main Thread, member=n/a): Optional configuration override "cache-factory-builder-config.xml" is not specified
2013-08-15 23:37:53.093/1.413 Oracle Coherence 12.1.2.0.0 <D5> (thread=Main Thread, member=n/a): Optional configuration override "/custom-mbeans.xml" is not specified

Oracle Coherence Version 12.1.2.0.0 Build 44396
 Grid Edition: Development mode
Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

2013-08-15 23:37:53.380/1.700 Oracle Coherence GE 12.1.2.0.0 <Info> (thread=Main Thread, member=n/a): Loaded cache configuration from "file:/C:/poc-itau-brms/resources/coherence-cache-config.xml"
2013-08-15 23:37:55.615/3.935 Oracle Coherence GE 12.1.2.0.0 <Info> (thread=Main Thread, member=n/a): Created cache factory com.tangosol.net.ExtensibleConfigurableCacheFactory
2013-08-15 23:37:56.206/4.526 Oracle Coherence GE 12.1.2.0.0 <D4> (thread=Main Thread, member=n/a): TCMP bound to /10.0.3.15:8090 using SystemDatagramSocketProvider
2013-08-15 23:37:56.528/4.848 Oracle Coherence GE 12.1.2.0.0 <Info> (thread=Cluster, member=n/a): Failed to satisfy the variance: allowed=16, actual=54
2013-08-15 23:37:56.528/4.848 Oracle Coherence GE 12.1.2.0.0 <Info> (thread=Cluster, member=n/a): Increasing allowable variance to 20
2013-08-15 23:37:56.885/5.205 Oracle Coherence GE 12.1.2.0.0 <Info> (thread=Cluster, member=n/a): This Member(Id=2, Timestamp=2013-08-15 23:37:56.653, Address=10.0.3.15:8090, MachineId=2319, Location=site:,process:3484, Role=AcmeArchitectureApplication, Edition=Grid Edition, Mode=Development, CpuCount=3, SocketCount=3) joined cluster "cluster:0x50DB" with senior Member(Id=1, Timestamp=2013-08-15 21:16:58.769, Address=10.0.3.15:8088, MachineId=2319, Location=site:,process:3000, Role=CoherenceServer, Edition=Grid Edition, Mode=Development, CpuCount=3, SocketCount=3)
2013-08-15 23:37:57.189/5.509 Oracle Coherence GE 12.1.2.0.0 <Info> (thread=Main Thread, member=n/a): Started cluster Name=cluster:0x50DB

Group{Address=224.12.1.0, Port=12100, TTL=4}

MasterMemberSet(
  ThisMember=Member(Id=2, Timestamp=2013-08-15 23:37:56.653, Address=10.0.3.15:8090, MachineId=2319, Location=site:,process:3484, Role=AcmeArchitectureApplication)
  OldestMember=Member(Id=1, Timestamp=2013-08-15 21:16:58.769, Address=10.0.3.15:8088, MachineId=2319, Location=site:,process:3000, Role=CoherenceServer)
  ActualMemberSet=MemberSet(Size=2
    Member(Id=1, Timestamp=2013-08-15 21:16:58.769, Address=10.0.3.15:8088, MachineId=2319, Location=site:,process:3000, Role=CoherenceServer)
    Member(Id=2, Timestamp=2013-08-15 23:37:56.653, Address=10.0.3.15:8090, MachineId=2319, Location=site:,process:3484, Role=AcmeArchitectureApplication)
    )
  MemberId|ServiceVersion|ServiceJoined|MemberState
    1|12.1.2|2013-08-15 21:16:58.769|JOINED,
    2|12.1.2|2013-08-15 23:37:56.653|JOINED
  RecycleMillis=1200000
  RecycleSet=MemberSet(Size=0
    )
  )

TcpRing{Connections=[1]}
IpMonitor{Addresses=0}

2013-08-15 23:37:57.243/5.581 Oracle Coherence GE 12.1.2.0.0 <Info> (thread=Cluster, member=2): Loaded POF configuration from "file:/C:/poc-itau-brms/resources/pof-config.xml"
2013-08-15 23:37:57.261/5.581 Oracle Coherence GE 12.1.2.0.0 <Info> (thread=Cluster, member=2): Loaded included POF configuration from "jar:file:/C:/mw-home/coherence/lib/coherence.jar!/coherence-pof-config.xml"
2013-08-15 23:37:57.386/5.706 Oracle Coherence GE 12.1.2.0.0 <D5> (thread=Invocation:Management, member=2): Service Management joined the cluster with senior service member 1
2013-08-15 23:37:57.494/5.814 Oracle Coherence GE 12.1.2.0.0 <Info> (thread=Main Thread, member=2): Loaded Reporter configuration from "jar:file:/C:/mw-home/coherence/lib/coherence.jar!/reports/report-group.xml"
2013-08-15 23:37:58.012/6.332 Oracle Coherence GE 12.1.2.0.0 <D5> (thread=DistributedCache:DistribService, member=2): Service DistribService joined the cluster with senior service member 1

   ---> Number of Promotions: 2
   ---> Elapsed Time: 53 ms
   ---> Number of Promotions: 1
   ---> Elapsed Time: 0 ms

2013-08-15 23:37:58.137/6.457 Oracle Coherence GE 12.1.2.0.0 <D4> (thread=ShutdownHook, member=2): ShutdownHook: stopping cluster node

As you can see in the output, the number of promotions shown reveals that the business rules were really executed, since no promotions were provided when the customer objects were instantiated. The output also tells us another important thing: transaction latency. For the first cache entry we got 53 ms of overall latency, quite short if you consider what happened behind the scenes. But the second cache entry is much faster, with 0 ms of latency. This means that the actual time necessary to execute the entire transaction was below one millisecond, giving us a real sub-millisecond latency scenario, measured in microseconds.

Highly Scalable Business Rules

It is not so obvious when you see this implementation for the first time, but another important aspect of this design is scalability. Since the cache type that we used was the distributed one, also known as partitioned, the cache entries are equally distributed among all available Coherence nodes. If we use only one node, that node will of course handle the entire dataset by itself. But if we use four nodes, each node will handle 25% of the dataset. This means that if we insert one million customer objects into the cache, each node will handle only 250K customers.

This type of data storage offers a huge benefit for Oracle Business Rules: true distribution of the data load. Remember that I said before that each Coherence node will hold one DecisionPoint instance? Since each node handles only a percentage of the entire dataset, it is reasonable to expect that each node will fire rules only for the data that it manages. This works this way because Coherence interceptors are executed in the JVM where the data lives, not across the entire data grid, since this is not a distributed processing model. For instance, if customer "A" is primarily stored in "JVM 1", and this customer "A" has its fields updated by a client application, business rules will be fired and executed only in "JVM 1". The other JVMs will not execute any business rules. This means that the CPU overhead is balanced across the cluster of servers, allowing the In-Memory Data Grid to scale horizontally, using the overall compute power of the different servers available in the cluster.
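
One simple way to observe this locality is to log the executing member at the top of the interceptor's onEvent() method. The lines below are a hypothetical diagnostic addition, not part of the original interceptor:

// Hypothetical diagnostic that could be placed at the beginning of onEvent():
// it prints the storage member executing the interceptor, confirming that rules
// fire only on the node that owns the entry being processed.
int localMemberId = com.tangosol.net.CacheFactory.getCluster().getLocalMember().getId();
System.out.println("Firing rules on member " + localMemberId);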

API Transparency and Multiple Programming Language Support

Once Oracle Business Rules is encapsulated in Coherence through an interceptor, there is another great advantage to this design: API transparency. Developers don't need to write custom code to interact with Oracle Business Rules. In fact, they never need to know that business rules are being executed when objects are written to Coherence. Since everything happens behind the scenes, this approach frees developers from extra complexity, allowing them to work in a purely data-oriented fashion, which is very productive and less error prone.

And because Oracle Coherence offers not only a Java API to interact with the In-Memory Data Grid, but also C++, .NET and REST APIs, you can leverage several types of clients and applications to trigger business rule executions. In fact, I created a very small C++ application using Microsoft Visual Studio to test this behavior. The application code below inserts 1K customers into the In-Memory Data Grid, with an average transaction latency of ~5 ms, using a VM with 3 vCores and 10 GB of RAM.

#include "stdafx.h"
#include <windows.h>
#include <cstring>
#include <iostream>
#include <sstream>
#include <string>
#include "Customer.hpp"

#include "coherence/lang.ns"
#include "coherence/net/CacheFactory.hpp"
#include "coherence/net/NamedCache.hpp"

using namespace coherence::lang;
using coherence::net::NamedCache;
using coherence::net::CacheFactory;

int NUMBER_OF_ENTRIES = 1000;

__int64 currentTimeMillis()
{
	static const __int64 magic = 116444736000000000;
	SYSTEMTIME st;
	GetSystemTime(&st);
	FILETIME   ft;
	SystemTimeToFileTime(&st,&ft);
	__int64 t;
	memcpy(&t,&ft,sizeof t);
	return (t - magic)/10000;
}

int _tmain(int argc, _TCHAR* argv[])
{

	std::string _customerKey;
	String::View customerKey;
	__int64 startTime = 0, endTime = 0;
	__int64 elapsedTime = 0;
	
	NamedCache::Handle customers = CacheFactory::getCache("customers");

	startTime = currentTimeMillis();
	for (int i = 0; i < NUMBER_OF_ENTRIES; i++)
	{
		std::ostringstream stream;
		stream << "customer-" << (i + 1);
		_customerKey = stream.str();
		customerKey = String::create(_customerKey);
		std::string firstName = _customerKey;
		std::string agency = "3158";
		std::string account = "457899";
		char custType = 'P';
		double balance = 98000;
		Customer customer(customerKey, firstName,
			agency, account, custType, balance);
		Managed<Customer>::View customerWrapper = Managed<Customer>::create(customer);
		customers->put(customerKey, customerWrapper);
		Managed<Customer>::View result =
			cast<Managed<Customer>::View> (customers->get(customerKey));
	}
	endTime = currentTimeMillis();
	elapsedTime = endTime - startTime;

	std::cout << std::endl;
	std::cout << "   Elapsed Time..................: "
		<< elapsedTime << " ms" << std::endl;
	std::cout << std::endl;

	CacheFactory::shutdown();
	getchar();
	return 0;

}

An Alternative Version of the Interceptor for MDS Scenarios

The interceptor created in this article uses the Oracle Business Rules Java API to read the dictionary directly from the file system. This approach implies two things: first, that the repository for the dictionary will be the file system; second, that the authoring and management of the dictionary will be done through JDeveloper. This can lead to some loss of BRMS power, since business users won't feel comfortable authoring their rules in a technical environment such as JDeveloper. Administrators also won't be able to see who changed what, since virtually anyone can open the file in JDeveloper and change its contents.

A better way to manage this is to store the dictionary in an MDS repository, which is part of the Oracle SOA Suite platform. Storing the dictionary in the MDS repository allows business users to interact with business rules through the SOA Composer, a very nice web tool that is simpler and easier to use than JDeveloper. Administrators can also track changes, since everything in MDS is audited, transactional and securely controlled, because you have to log in to the console first to get access to the composer.

I have implemented another version of the interceptor, making full use of the power of Oracle SOA Suite and MDS repositories. The MDSRulesInterceptor.java implementation has been under test for over a month and is performing quite well, just like the FSRulesInterceptor.java implementation. In the future I will post that implementation here, but for now just keep in mind the powerful things that can be done with Oracle Business Rules and the Coherence In-Memory Data Grid. Oracle Fusion Middleware really rocks, doesn't it?


Wednesday Aug 29, 2012

Integrating Coherence & Java EE 6 Applications using ActiveCache

OK, so you are a developer starting a new Java EE 6 application using the most wonderful features of the Java EE platform, like Enterprise JavaBeans, JavaServer Faces, CDI, JPA and other cool technologies. And your architecture needs to hold pieces of data in distributed caches to improve the application's performance, scalability and reliability?

If this is the scenario you are facing, maybe you should look closely at the solutions provided by Oracle WebLogic Server. Oracle has integrated WebLogic Server with its champion data caching technology, Oracle Coherence. This seamless integration between the two products provides a comprehensive environment to develop applications without the complexity of extra Java code to manage the cache as a dependency, since Oracle provides a DI ("Dependency Injection") mechanism for Coherence, the same DI mechanism available in standard Java EE applications. This feature is called ActiveCache. In this article, I will show you how to configure ActiveCache in WebLogic and in your Java EE application.

Configuring WebLogic to manage Coherence

Before you start changing your application to use Coherence, you need to configure your Coherence distributed cache. The good news is that you can manage all of this without writing a single line of XML or Java. The configuration can be done entirely in the WebLogic administration console. The first thing to do is set up a Coherence cluster. A Coherence cluster is a set of Coherence JVMs configured to form one single view of the cache. This means that you can add or remove members of the cluster without the client application (the application that produces or consumes data from the cache) knowing about the changes. This concept allows your solution to scale out without changing the application server JVMs. You can grow your application in the data grid layer alone.

To start the configuration, you need to create a machine that points to the server on which you want to execute the Coherence JVMs. WebLogic Server allows you to do this very easily using the Administration Console. For this example, consider the machine name to be "coherence-server".

Remember that for the machine concept to work, you need to ensure that the NodeManager is running on the target server that the machine points to. The NodeManager script can be found at <WLS_HOME>/server/bin/startNodeManager.sh.

The next thing to do is configure a Coherence cluster. In the WebLogic administration console, navigate to Environment > Coherence Clusters and click the "New" button.

In the field "Name", set the value to "my-coherence-cluster". Click in next.

Specify a valid cluster address and port. The Coherence members will communicate with each other through this address and port. This configuration section tells Coherence to form a cluster using unicast messaging instead of multicast, which is the default. Since the method used will be unicast, you need to configure a valid cluster address and cluster port.

The Coherence cluster has now been configured successfully. It is time to configure the Coherence members and add them to the cluster. In the WebLogic administration console, navigate to Environment > Coherence Servers and click the "New" button.

In the field "Name" set the value to "coh-server-1". In the field "Machine", associate this new Coherence server to the machine "coherence-server". In the field "Cluster", associate this new Coherence server to the cluster named "my-coherence-cluster". Click in the "Finish" button.

For this example, I will configure only one Coherence server. This means that the Coherence cluster will be composed of only one member. In production scenarios you will have many Coherence members, all of them distributed across different machines with different configurations. The idea behind Coherence clusters is exactly this: to form a virtual data grid composed of different members that join or leave the cluster dynamically.

Before starting the Coherence server, you need to configure its classpath. When the JVM of a Coherence server starts, it loads a couple of classes that need to be available at runtime. To configure the classpath of the Coherence server, click the Coherence server name. This action brings up the configuration page of the Coherence server. Click the "Server Start" tab.

In the "Server Start" tab you will find a lot of fields that can be configured to control the behavior of the Coherence server JVM. In the "Class Path" field, you need to list the following JAR files:

- <WLS_HOME>/modules/features/weblogic.server.modules.coherence.server_12.1.1.0.jar

- <COHERENCE_HOME>/lib/coherence.jar

Remember to use a valid file separator compatible with the target operating system that you are using. Click the "Save" button to update the configuration of the Coherence server.

Now you are ready to go. Start the Coherence server using the "Control" tab of the WebLogic administration console. This instructs WebLogic to send a command to the target machine. Once received by the target machine, this command is responsible for starting a new JVM for the Coherence server, according to the parameters we have configured.

Configuring your Java EE Application to Access Coherence

Now let's move on to the fun part of the configuration. The first thing to do is to tell your Java EE application which Coherence cluster to join. Oracle has updated the WebLogic Server deployment descriptors, so you will not have to change your code or the container's standard deployment descriptors like application.xml, ejb-jar.xml or web.xml.

In this example, I will show you how to enable DI ("Dependency Injection") of a Coherence cache into a Servlet 3.0 component. In the WEB-INF/weblogic.xml deployment descriptor, put the following metadata:

<?xml version="1.0" encoding="UTF-8"?>
<wls:weblogic-web-app
	xmlns:wls="http://xmlns.oracle.com/weblogic/weblogic-web-app"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd
	http://xmlns.oracle.com/weblogic/weblogic-web-app http://xmlns.oracle.com/weblogic/weblogic-web-app/1.4/weblogic-web-app.xsd">
	
	<wls:context-root>myWebApp</wls:context-root>
	
	<wls:coherence-cluster-ref>
		<wls:coherence-cluster-name>my-coherence-cluster</wls:coherence-cluster-name>
	</wls:coherence-cluster-ref>
	
</wls:weblogic-web-app>

As you can see, using the "coherence-cluster-name" tag, we are telling our Java EE application that it should join the "my-coherence-cluster" cluster when it loads in the web container. Without this information, the application will not be able to access the predefined Coherence cluster; it will form its own Coherence cluster with no other members. So never forget to include this information.

Now put the coherence.jar and active-cache-1.0.jar dependencies in your WEB-INF/lib application classpath. You need to deploy these dependencies so ActiveCache can automatically take care of the Coherence cluster join phase. These dependencies can be found in the following locations:

- <WLS_HOME>/common/deployable-libraries/active-cache-1.0.jar

- <COHERENCE_HOME>/lib/coherence.jar

Finally, you need to write the access code for the Coherence cache in your Servlet. In the following example, we have a Servlet 3.0 component that accesses a Coherence cache named "transactions" and prints to the browser output the content (the "ammount" property) of one specific transaction.

package com.oracle.coherence.demo.activecache;

import java.io.IOException;

import javax.annotation.Resource;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import com.tangosol.net.NamedCache;

@WebServlet("/demo/specificTransaction")
public class TransactionServletExample extends HttpServlet {
	
	@Resource(mappedName = "transactions") NamedCache transactions;
	
	protected void doGet(HttpServletRequest request,
			HttpServletResponse response) throws ServletException, IOException {
		
		int transId = Integer.parseInt(request.getParameter("transId"));
		Transaction transaction = (Transaction) transactions.get(transId);
		response.getWriter().println("<center>" + transaction.getAmmount() + "</center>");
		
	}

}

That's it! No further configuration is necessary and you are all set to start putting data into and getting data from Coherence. As you can see in the example code, the Coherence cache is treated as a normal dependency in the Java EE container. The magic happens behind the scenes when ActiveCache makes your application join the defined Coherence cluster.

The most interesting thing about this approach is that no matter which type of Coherence cache you are using (Distributed, Partitioned, Replicated, WAN-Remote), to the client application it is just a simple attribute of type com.tangosol.net.NamedCache. And it is all managed by the Java EE container as a dependency. This means that if you inject the same dependency (the Coherence cache named "transactions") into another Java EE component (a JSF managed bean, a stateless EJB), the cache will be the same. Cool, isn't it? The sketch below shows the same cache injected into a stateless EJB.
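As a hedged illustration of that last point, here is a minimal sketch of a stateless EJB that injects the very same "transactions" cache used by the Servlet above. The bean name TransactionServiceBean is mine, and the Transaction domain class (with a getAmmount() accessor assumed to return a double) is carried over from the Servlet example, not from any official API.

package com.oracle.coherence.demo.activecache;

import javax.annotation.Resource;
import javax.ejb.Stateless;

import com.tangosol.net.NamedCache;

@Stateless
public class TransactionServiceBean {

    // Same resource name used in the Servlet example, so the container
    // hands back a reference to the very same "transactions" cache...
    @Resource(mappedName = "transactions")
    private NamedCache transactions;

    // Hypothetical business method: looks up a transaction by its id and
    // returns its ammount, assuming the same Transaction class used above...
    public double getTransactionAmmount(int transId) {
        Transaction transaction = (Transaction) transactions.get(transId);
        return transaction.getAmmount();
    }
}

Because the container resolves the resource, both components see exactly the same cache contents, with no extra lookup code.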

Thanks to CDI, we can extend the same support to non-standard Java EE components, like plain POJOs. This means that you are not forced to use only Servlets, EJBs or JSF in order to inject Coherence caches. You can use the same approach for regular POJOs that you create and that are managed by the lightweight CDI container running inside Oracle WebLogic Server, as sketched below.
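One hedged way to wire this up is with a CDI producer: a managed bean receives the cache through the same @Resource lookup and re-exposes it to any other CDI bean via @Produces. The class names below (TransactionsCacheProducer, TransactionAuditor) are illustrative assumptions, not an ActiveCache-mandated pattern, and a beans.xml is assumed to be present in WEB-INF so that CDI is enabled.

package com.oracle.coherence.demo.activecache;

import javax.annotation.Resource;
import javax.enterprise.context.ApplicationScoped;
import javax.enterprise.inject.Produces;
import javax.inject.Inject;

import com.tangosol.net.NamedCache;

// Producer bean: obtains the "transactions" cache once and makes it
// injectable into any other CDI-managed POJO in the application...
@ApplicationScoped
public class TransactionsCacheProducer {

    @Resource(mappedName = "transactions")
    private NamedCache transactions;

    @Produces
    public NamedCache transactionsCache() {
        return transactions;
    }
}

// Example POJO consumer: no Servlet, EJB or JSF artifact involved,
// just a CDI bean that receives the cache through @Inject...
class TransactionAuditor {

    @Inject
    private NamedCache transactions;

    public boolean exists(int transId) {
        // NamedCache extends java.util.Map, so containsKey() is available...
        return transactions.containsKey(transId);
    }
}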

Sunday Jul 08, 2012

The Developers Conference 2012: Presentation about CEP & BAM

This year I again had the pleasure of being one of the speakers at the TDC ("The Developers Conference") event. I have spoken at this event for three years now. This year, the main theme of the SOA track was EDA ("Event-Driven Architecture"), and I decided to deliver a comprehensive presentation about one of my favorite personal subjects: real-time processing using Complex Event Processing. The theme of the presentation was "Business Intelligence in Real-time using CEP & BAM", and I would like to share here the presentation that I gave. The material is in Portuguese, since it was a Brazilian event held in São Paulo.

Since my presentation has a lot of videos, I decided to share the material as a YouTube video, so you can pause, rewind and play it again as many times as you want. I strongly recommend that, before you start watching the video, you change the video quality settings to 1080p in High Definition.

Saturday Jun 23, 2012

Calculating the Size (in Bytes and MB) of an Oracle Coherence Cache

The concept and usage of data grids are becoming very popular these days, since this type of technology is evolving very fast, with some cool leading products like Oracle Coherence. Once in a while, developers need a programmatic way to calculate the total size of a specific cache residing in the data grid. In this post, I will show how to accomplish this using the Oracle Coherence API. This example has been tested with the 3.6, 3.7 and 3.7.1 versions of Oracle Coherence.

To start the development of this example, you need to create a POJO ("Plain Old Java Object") that represents a data structure that will hold user data. This data structure will also create what I call an internal "fat": randomly generated extra data that should considerably increase the size of each instance in heap memory. Create a Java class named "Person" as shown in the listing below.

package com.oracle.coherence.domain;

import java.io.Serializable;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Random;

@SuppressWarnings("serial")
public class Person implements Serializable {
	
	private String firstName;
	private String lastName;
	private List<Object> fat;
	private String email;
	
	public Person() {
		generateFat();
	}
	
	public Person(String firstName, String lastName,
			String email) {
		setFirstName(firstName);
		setLastName(lastName);
		setEmail(email);
		generateFat();
	}
	
	private void generateFat() {
		fat = new ArrayList<Object>();
		Random random = new Random();
		for (int i = 0; i < random.nextInt(18000); i++) {
			HashMap<Long, Double> internalFat = new HashMap<Long, Double>();
			for (int j = 0; j < random.nextInt(10000); j++) {
				internalFat.put(random.nextLong(), random.nextDouble());
			}
			fat.add(internalFat);
		}
	}
	
	public String getFirstName() {
		return firstName;
	}

	public void setFirstName(String firstName) {
		this.firstName = firstName;
	}

	public String getLastName() {
		return lastName;
	}

	public void setLastName(String lastName) {
		this.lastName = lastName;
	}

	public String getEmail() {
		return email;
	}

	public void setEmail(String email) {
		this.email = email;
	}

}

Now let's create a Java program that will start a data grid node in Coherence and create a cache named "People" that will hold Person instances with sequential integer keys. Each person created in this program triggers the execution of the custom constructor in the Person class, which instantiates an internal fat (the random amount of data generated to increase the size of the object) for each person. Create a Java class named "CreatePeopleCacheAndPopulateWithData" as shown in the listing below.

package com.oracle.coherence.demo;

import com.oracle.coherence.domain.Person;
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;

public class CreatePeopleCacheAndPopulateWithData {

	public static void main(String[] args) {
		
		// Asks Coherence for a new cache named "People"...
		NamedCache people = CacheFactory.getCache("People");
		
		// Creates three people that will be put into the data grid. Each person
		// generates an internal fat that should increase its size in terms of bytes...
		Person pessoa1 = new Person("Ricardo", "Ferreira", "ricardo.ferreira@example.com");
		Person pessoa2 = new Person("Vitor", "Ferreira", "vitor.ferreira@example.com");
		Person pessoa3 = new Person("Vivian", "Ferreira", "vivian.ferreira@example.com");
		
		// Insert three people into the data grid...
		people.put(1, pessoa1);
		people.put(2, pessoa2);
		people.put(3, pessoa3);
		
		// Waits for 5 minutes until the user runs the Java program
		// that calculates the total size of the people cache...
		try {
			System.out.println("---> Waiting for 5 minutes for the cache size calculation...");
			Thread.sleep(300000);
		} catch (InterruptedException ie) {
			ie.printStackTrace();
		}
		
	}

}

Finally, let's create a Java program that, using the Coherence API and JMX, will calculate the total size of each cache that the data grid is currently managing. The approach used in this example was to retrieve every cache the data grid is currently managing, but if you are interested in a specific cache the same approach can be used; you just need to filter which cache should be looked at. Create a Java class named "CalculateTheSizeOfPeopleCache" as shown in the listing below.

package com.oracle.coherence.demo;

import java.text.DecimalFormat;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

import javax.management.MBeanServer;
import javax.management.MBeanServerFactory;
import javax.management.ObjectName;

import com.tangosol.net.CacheFactory;

public class CalculateTheSizeOfPeopleCache {
	
	@SuppressWarnings({ "unchecked", "rawtypes" })
	private void run() throws Exception {
		
        // Enable JMX support in this Coherence data grid session...
        System.setProperty("tangosol.coherence.management", "all");

        // Create a sample cache just to access the data grid...
        CacheFactory.getCache(MBeanServerFactory.class.getName());

        // Gets the JMX server from the Coherence data grid...
        MBeanServer jmxServer = getJMXServer();

        // Creates an internal data structure that maintains
        // the statistics for each cache in the data grid...
        Map cacheList = new TreeMap();
        Set jmxObjectList = jmxServer.queryNames(new ObjectName("Coherence:type=Cache,*"), null);
        for (Object jmxObject : jmxObjectList) {
            ObjectName jmxObjectName = (ObjectName) jmxObject;
            String cacheName = jmxObjectName.getKeyProperty("name");
            if (cacheName.equals(MBeanServerFactory.class.getName())) {
            	continue;
            } else {
            	cacheList.put(cacheName, new Statistics(cacheName));
            }
        }
        
        // Updates the internal data structure with statistic data
        // retrieved from caches inside the in-memory data grid...
        Set<String> cacheNames = cacheList.keySet();
        for (String cacheName : cacheNames) {
            Set resultSet = jmxServer.queryNames(
            	new ObjectName("Coherence:type=Cache,name=" + cacheName + ",*"), null);
            for (Object resultSetRef : resultSet) {
                ObjectName objectName = (ObjectName) resultSetRef;
                if (objectName.getKeyProperty("tier").equals("back")) {
                    int unit = (Integer) jmxServer.getAttribute(objectName, "Units");
                    int size = (Integer) jmxServer.getAttribute(objectName, "Size");
                    Statistics statistics = (Statistics) cacheList.get(cacheName);
                    statistics.incrementUnit(unit);
                    statistics.incrementSize(size);
                    cacheList.put(cacheName, statistics);
                }
            }
        }
        
        // Finally... print the objects from the internal data
        // structure that represents the statistics from caches...
        cacheNames = cacheList.keySet();
        for (String cacheName : cacheNames) {
            Statistics estatisticas = (Statistics) cacheList.get(cacheName);
            System.out.println(estatisticas);
        }
        
    }

    public MBeanServer getJMXServer() {
        MBeanServer jmxServer = null;
        for (Object jmxServerRef : MBeanServerFactory.findMBeanServer(null)) {
            jmxServer = (MBeanServer) jmxServerRef;
            if (jmxServer.getDefaultDomain().equals(DEFAULT_DOMAIN) || DEFAULT_DOMAIN.length() == 0) {
                break;
            }
            jmxServer = null;
        }
        if (jmxServer == null) {
            jmxServer = MBeanServerFactory.createMBeanServer(DEFAULT_DOMAIN);
        }
        return jmxServer;
    }
	
    private class Statistics {
		
        private long unit;
        private long size;
        private String cacheName;
		
        public Statistics(String cacheName) {
            this.cacheName = cacheName;
        }

        public void incrementUnit(long unit) {
            this.unit += unit;
        }

        public void incrementSize(long size) {
            this.size += size;
        }

        public long getUnit() {
            return unit;
        }

        public long getSize() {
            return size;
        }

        public double getUnitInMB() {
            return unit / (1024.0 * 1024.0);
        }

        public double getAverageSize() {
            return size == 0 ? 0 : unit / size;
        }

        public String toString() {
            StringBuffer sb = new StringBuffer();
            sb.append("\nCache Statistics of '").append(cacheName).append("':\n");
            sb.append("   - Total Entries of Cache -----> " + getSize()).append("\n");
            sb.append("   - Used Memory (Bytes) --------> " + getUnit()).append("\n");
            sb.append("   - Used Memory (MB) -----------> " + FORMAT.format(getUnitInMB())).append("\n");
            sb.append("   - Object Average Size --------> " + FORMAT.format(getAverageSize())).append("\n");
            return sb.toString();
        }

    }
	
    public static void main(String[] args) throws Exception {
        new CalculateTheSizeOfPeopleCache().run();
    }
	
    public static final DecimalFormat FORMAT = new DecimalFormat("###.###");
    public static final String DEFAULT_DOMAIN = "";
    public static final String DOMAIN_NAME = "Coherence";

}

I've commented the example throughout, so I don't think you will have trouble understanding it. Basically, we are dealing with JMX. The first thing to do is enable JMX support for the Coherence client application (i.e., a JVM that only retrieves values from the data grid and does not store any data in it). This can be done very easily using the "tangosol.coherence.management" runtime system property, either programmatically as in the code above or on the command line with -Dtangosol.coherence.management=all. Consult the Coherence JMX documentation to understand the possible values that can be applied. The program creates an in-memory data structure that holds instances of a custom class called "Statistics".

This class represents the information we are interested in seeing, which in this case is the size in bytes and in MB of the caches. An instance of this class is created for each cache currently managed by the data grid. Using JMX-specific methods, we retrieve the information that is relevant for calculating the total size of the caches. To test this example, you should first run the CreatePeopleCacheAndPopulateWithData.java program and then the CalculateTheSizeOfPeopleCache.java program. The results in the console should look something like this:

2012-06-23 13:29:31.188/4.970 Oracle Coherence 3.6.0.4 <Info> (thread=Main Thread, member=n/a): Loaded operational configuration from "jar:file:/E:/Oracle/Middleware/oepe_11gR1PS4/workspace/calcular-tamanho-cache-coherence/lib/coherence.jar!/tangosol-coherence.xml"
2012-06-23 13:29:31.219/5.001 Oracle Coherence 3.6.0.4 <Info> (thread=Main Thread, member=n/a): Loaded operational overrides from "jar:file:/E:/Oracle/Middleware/oepe_11gR1PS4/workspace/calcular-tamanho-cache-coherence/lib/coherence.jar!/tangosol-coherence-override-dev.xml"
2012-06-23 13:29:31.219/5.001 Oracle Coherence 3.6.0.4 <D5> (thread=Main Thread, member=n/a): Optional configuration override "/tangosol-coherence-override.xml" is not specified
2012-06-23 13:29:31.266/5.048 Oracle Coherence 3.6.0.4 <D5> (thread=Main Thread, member=n/a): Optional configuration override "/custom-mbeans.xml" is not specified

Oracle Coherence Version 3.6.0.4 Build 19111
 Grid Edition: Development mode
Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved.

2012-06-23 13:29:33.156/6.938 Oracle Coherence GE 3.6.0.4 <Info> (thread=Main Thread, member=n/a): Loaded Reporter configuration from "jar:file:/E:/Oracle/Middleware/oepe_11gR1PS4/workspace/calcular-tamanho-cache-coherence/lib/coherence.jar!/reports/report-group.xml"
2012-06-23 13:29:33.500/7.282 Oracle Coherence GE 3.6.0.4 <Info> (thread=Main Thread, member=n/a): Loaded cache configuration from "jar:file:/E:/Oracle/Middleware/oepe_11gR1PS4/workspace/calcular-tamanho-cache-coherence/lib/coherence.jar!/coherence-cache-config.xml"
2012-06-23 13:29:35.391/9.173 Oracle Coherence GE 3.6.0.4 <D4> (thread=Main Thread, member=n/a): TCMP bound to /192.168.177.133:8090 using SystemSocketProvider
2012-06-23 13:29:37.062/10.844 Oracle Coherence GE 3.6.0.4 <Info> (thread=Cluster, member=n/a): This Member(Id=2, Timestamp=2012-06-23 13:29:36.899, Address=192.168.177.133:8090, MachineId=55685, Location=process:244, Role=Oracle, Edition=Grid Edition, Mode=Development, CpuCount=2, SocketCount=2) joined cluster "cluster:0xC4DB" with senior Member(Id=1, Timestamp=2012-06-23 13:29:14.031, Address=192.168.177.133:8088, MachineId=55685, Location=process:1128, Role=CreatePeopleCacheAndPopulateWith, Edition=Grid Edition, Mode=Development, CpuCount=2, SocketCount=2)
2012-06-23 13:29:37.172/10.954 Oracle Coherence GE 3.6.0.4 <D5> (thread=Cluster, member=n/a): Member 1 joined Service Cluster with senior member 1
2012-06-23 13:29:37.188/10.970 Oracle Coherence GE 3.6.0.4 <D5> (thread=Cluster, member=n/a): Member 1 joined Service Management with senior member 1
2012-06-23 13:29:37.188/10.970 Oracle Coherence GE 3.6.0.4 <D5> (thread=Cluster, member=n/a): Member 1 joined Service DistributedCache with senior member 1
2012-06-23 13:29:37.188/10.970 Oracle Coherence GE 3.6.0.4 <Info> (thread=Main Thread, member=n/a): Started cluster Name=cluster:0xC4DB

Group{Address=224.3.6.0, Port=36000, TTL=4}

MasterMemberSet
  (
  ThisMember=Member(Id=2, Timestamp=2012-06-23 13:29:36.899, Address=192.168.177.133:8090, MachineId=55685, Location=process:244, Role=Oracle)
  OldestMember=Member(Id=1, Timestamp=2012-06-23 13:29:14.031, Address=192.168.177.133:8088, MachineId=55685, Location=process:1128, Role=CreatePeopleCacheAndPopulateWith)
  ActualMemberSet=MemberSet(Size=2, BitSetCount=2
    Member(Id=1, Timestamp=2012-06-23 13:29:14.031, Address=192.168.177.133:8088, MachineId=55685, Location=process:1128, Role=CreatePeopleCacheAndPopulateWith)
    Member(Id=2, Timestamp=2012-06-23 13:29:36.899, Address=192.168.177.133:8090, MachineId=55685, Location=process:244, Role=Oracle)
    )
  RecycleMillis=1200000
  RecycleSet=MemberSet(Size=0, BitSetCount=0
    )
  )

TcpRing{Connections=[1]}
IpMonitor{AddressListSize=0}

2012-06-23 13:29:37.891/11.673 Oracle Coherence GE 3.6.0.4 <D5> (thread=Invocation:Management, member=2): Service Management joined the cluster with senior service member 1
2012-06-23 13:29:39.203/12.985 Oracle Coherence GE 3.6.0.4 <D5> (thread=DistributedCache, member=2): Service DistributedCache joined the cluster with senior service member 1
2012-06-23 13:29:39.297/13.079 Oracle Coherence GE 3.6.0.4 <D4> (thread=DistributedCache, member=2): Asking member 1 for 128 primary partitions

Cache Statistics of 'People':
   - Total Entries of Cache -----> 3
   - Used Memory (Bytes) --------> 883920
   - Used Memory (MB) -----------> 0.843
   - Object Average Size --------> 294640

I hope this post saves you some time when calculating the total size of a Coherence cache becomes a requirement for your highly scalable system using data grids. See you!

Wednesday Feb 22, 2012

Oracle Coherence: First Steps Using Clusters and Basic API Usage

When we talk about distributed data grids, elastic caching platforms and in-memory caching technologies, Oracle Coherence is the first option that comes to mind. This is because Oracle Coherence is the oldest and most mature implementation of data grids, creating success stories across the world. Oracle Coherence is the implementation with the largest number of use cases in the world. Since its acquisition by Oracle in 2007, the product has been enhanced with powerful enterprise features to maintain its best-of-breed position against its competitors.


This article will help you take your first steps with Oracle Coherence. I have prepared a sequence of three videos that will guide you through the process of creating a data grid cluster, managing data using both the Java API and CohQL ("Coherence Query Language"), and finally testing the reliability and failover features of the product.

Oracle allows you to download and use any of its products for free if you are interested in learning or testing the technology. Unlike other vendors that first put you in contact with a sales representative or simply do not make their software available for download, Oracle encourages you to use the technology so you gain confidence with it. You can download Oracle Coherence at this link. If you don't have an OTN ("Oracle Technology Network") account, you will be asked to create one.

If you have a powerful computer and fast internet bandwidth, change the video quality settings to 1080p HD ("High Definition"). It will considerably improve the quality of your viewing.

