Friday Jan 31, 2014

Handling Large Objects (e.g., Audio/Video files) in Oracle NoSQL DB

Recently, I worked on a project that required using Oracle NoSQL Database to store complex structures like audio and video files, and to make those files rapidly accessible through a web application. When I started developing the file retrieval from the KVStore, I was surprised by how poorly documented this scenario is, especially if you are dealing with LOBs. So I decided to document examples of how I solved it.

Writing LOB Values into KVStore

So you have a file stored in the file system and you would like to put it in the KVStore for later retrieval? The best way to accomplish this is to open an InputStream on the file and put its value into the KVStore like this:

package com.oracle.bigdata.nosql;

import java.io.IOException;
import java.io.InputStream;
import java.util.List;
import java.util.concurrent.TimeUnit;

import oracle.kv.Durability;
import oracle.kv.KVStore;
import oracle.kv.Key;

public class HowToStoreFilesFromFileSystem {

	// The reference below should be injected
	// by some runtime IoC mechanism or initialized
	// through constructor or setter methods...
	
	private KVStore store;

	public void storeValue(List<String> majorKeys, List<String> minorKeys,
			InputStream inputStream) {

		Key key = null;

		try {

			key = Key.createKey(majorKeys, minorKeys);
			store.putLOBIfAbsent(key, inputStream, Durability.COMMIT_NO_SYNC,
					10, TimeUnit.SECONDS);

		} catch (IOException ex) {

			throw new RuntimeException(
					"Error trying to store files into NoSQL Database", ex);

		}

	}

}
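For illustration, here is a minimal sketch of how the method above could be exercised; the store name, helper host, key components and file path are all hypothetical. Keep in mind that the LOB APIs require the final minor key component to end with the LOB suffix (".lob" by default):

package com.oracle.bigdata.nosql;

import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Arrays;

import oracle.kv.KVStore;
import oracle.kv.KVStoreConfig;
import oracle.kv.KVStoreFactory;

public class StoreFileExample {

	public static void main(String[] args) throws Exception {

		// Hypothetical store name and helper host
		KVStore store = KVStoreFactory.getStore(
				new KVStoreConfig("kvstore", "localhost:5000"));

		// Assuming the KVStore reference is passed through
		// a constructor, as the comment in the class suggests
		HowToStoreFilesFromFileSystem howTo =
				new HowToStoreFilesFromFileSystem(store);

		try (InputStream inputStream = new FileInputStream("/tmp/intro.avi")) {
			// The final minor key component carries the ".lob" suffix
			howTo.storeValue(Arrays.asList("videos"),
					Arrays.asList("intro.avi.lob"), inputStream);
		} finally {
			store.close();
		}

	}

}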

Reading LOB Values from the KVStore

Once you are ready to retrieve LOB values from the KVStore, perhaps wondering how to rebuild the file in the file system, you can proceed this way:

package com.oracle.bigdata.nosql;

import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.List;
import java.util.concurrent.TimeUnit;

import oracle.kv.Consistency;
import oracle.kv.KVStore;
import oracle.kv.Key;
import oracle.kv.lob.InputStreamVersion;

public class HowToLoadFilesFromKVStore {

	// The reference below should be injected
	// by some runtime IoC mechanism or initialized
	// through constructor or setter methods...

	private KVStore store;

	public File buildFileFromKVStore(List<String> majorKeys,
			List<String> minorKeys, String fileName) {

		Key key = null;
		InputStreamVersion isVersion = null;

		int _byte = 0;
		File file = null;
		InputStream inputStream = null;
		BufferedInputStream bis = null;
		ByteArrayOutputStream baos = null;
		FileOutputStream fos = null;

		try {

			key = Key.createKey(majorKeys, minorKeys);
			isVersion = store.getLOB(key, Consistency.NONE_REQUIRED, 10,
					TimeUnit.SECONDS);

			if (isVersion != null) {

				inputStream = isVersion.getInputStream();
				bis = new BufferedInputStream(inputStream);
				baos = new ByteArrayOutputStream();

				while ((_byte = bis.read()) != -1) {
					baos.write(_byte);
				}

				file = new File(fileName);
				fos = new FileOutputStream(file);
				fos.write(baos.toByteArray());
				fos.flush();

			}

		} catch (Exception ex) {

			throw new RuntimeException(
					"Error trying to get files from NoSQL Database", ex);

		} finally {

			// Close the LOB stream as well; otherwise the
			// underlying resources would leak on every call
			try {
				if (bis != null) {
					bis.close();
				}
				if (fos != null) {
					fos.close();
				}
			} catch (IOException ioe) {
				ioe.printStackTrace();
			}

		}

		return file;

	}

}

In the example above, the file is completely rebuilt in the file system so it can be accessed from any application. Note that once you have the BufferedInputStream in place, you don't actually need to rebuild the entire file in the file system in order to work with it. You can instead transfer those bytes to a mechanism that is perfectly capable of handling the stream in memory, taking full advantage of on-demand data handling through the streaming API for LOBs.
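For example, if the goal is to serve the file through a web application, something like the sketch below streams the LOB straight from the KVStore to the HTTP response, without ever materializing the file on disk. The servlet, request parameter, key layout and content type are all assumptions:

package com.oracle.bigdata.nosql;

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.concurrent.TimeUnit;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import oracle.kv.Consistency;
import oracle.kv.KVStore;
import oracle.kv.Key;
import oracle.kv.lob.InputStreamVersion;

public class MediaStreamingServlet extends HttpServlet {

	// As before, this reference should be injected or
	// initialized during servlet startup
	private KVStore store;

	@Override
	protected void doGet(HttpServletRequest request,
			HttpServletResponse response) throws IOException {

		// Hypothetical key layout: /videos/-/<file>.lob
		Key key = Key.createKey(Arrays.asList("videos"),
				Arrays.asList(request.getParameter("file") + ".lob"));

		InputStreamVersion isVersion = store.getLOB(key,
				Consistency.NONE_REQUIRED, 10, TimeUnit.SECONDS);

		if (isVersion == null) {
			response.sendError(HttpServletResponse.SC_NOT_FOUND);
			return;
		}

		response.setContentType("video/avi"); // assumed content type

		byte[] buffer = new byte[8192];
		int length = 0;

		// Copy the LOB to the response chunk by chunk, on demand
		try (InputStream in = isVersion.getInputStream();
				OutputStream out = response.getOutputStream()) {
			while ((length = in.read(buffer)) != -1) {
				out.write(buffer, 0, length);
			}
		}

	}

}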

Saturday Jan 18, 2014

Capturing Business Events for Future Playback through Oracle Event Processing Platform

At the heart of OEP application development lies the understanding of business event anatomy: knowledge about their structure, the frequency with which they happen, their relationships with other types of events and, of course, their volume. Business events play a critical role in event-driven applications because they are the input for the problem you are trying to solve. Without business events, an event-driven application would be like a car engine without fuel.

When you design the EPN (Event Processing Network) of an OEP application, you are actually expressing how the business events will be received, processed and perhaps transformed into some kind of output. Like any other technology, the behavior of this output depends heavily on which data you used as input. For one specific volume the EPN could work; for another volume, maybe not. With one specific ordering the EPN could work; with another ordering, it could not. It will only work when the right set of events is used.

It is very common to deploy the first version of your OEP application and, after a while, have users start complaining about undesired results. This happens because, no matter how many times you tested your application, you will never get close to the mass of events present in the customer environment. The ordering, volume, size, encoding and frequency of the events found in the customer environment are always different from the mass of events you used during functional testing. If this situation happens to you, it is time to think about a way to record those events and bring them back to the development environment to figure out what is wrong.

This article aims to explore the record and playback feature of the Oracle Event Processing Platform. Through the examples shown here, you will be able to record and replay a mass of business events to simulate some behavior that you may have forgotten to capture in your EPN. This feature is also pretty cool for prototyping scenarios: instead of bringing an event simulator with you, you can just replay a recorded set of events and demonstrate your OEP application.

Configuring the OEP Default Provider

For the record and playback features to work, OEP needs to rely on a repository in which events will be stored. To make the developer's work easier, OEP brings an out-of-the-box default provider for this repository based on Berkeley DB technology. Inside every OEP domain there is an instance of Berkeley DB (aka "BDB") which can be adjusted through the configuration file of the domain. In most cases, you don't need to change anything before starting to use the record and playback features, but there is one catch that you should be aware of.

BDB stores each recorded event as an entry in this repository. Each entry has a default size of 1KB (1000 bytes), pre-defined at the moment you create the domain. The size of each entry is important in terms of performance and of how BDB will grow its storage. So if you know the average size of your events, it is a good idea to declare the size of each entry in the domain configuration file. For instance, suppose that we are dealing with a mass of events with an average size of 32KB per event. You can adjust BDB by editing the domain configuration file located in <DOMAIN_HOME>/<SERVER_NAME>/config/config.xml:
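A minimal sketch of what that section could look like for our assumed 32KB entries (32768 bytes):

<bdb-config>
   <!-- Assumed average entry size: 32KB, in bytes -->
   <cache-size>32768</cache-size>
</bdb-config>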

As you can see in the example above, you need to adjust the cache-size property of the bdb-config section. This property is measured in bytes, and its value represents the amount of memory that BDB needs to allocate for each entry.

Recording the Events

In order to record events, you need to access the OEP Visualizer Console of your domain. Log in to the console and navigate to the EPN of the OEP application whose events you want to record.

In the EPN of your OEP application, you can right-click any of the elements inside, no matter if it is an adapter, a channel, a processor or a cache. But it makes the most sense to record events from the adapters' perspective: they are the genesis of your EPN, and all the events that flow through it come from the adapters. Right-click the input adapter of your EPN and choose the "Record Event" option:

This opens the record tab of the adapter. At the bottom of the page, in a section named "Change Recording Schedule", you will find a button named "Add". Click on it.

Adding a new record configuration scheme enables the UI for filling in some fields. Enter the name of this recording session in the "DataSet Name" field; this value will be used to construct the physical BDB repository. You are also required to set the "Selected Event Type List" field, describing which incoming events you expect to record. After this, you can click the "Save" button.

We are all set to start recording. Double-check that events are being sent to your OEP application and click the "Start" button. This starts a recording session, and all events received by the adapter will be recorded. At any point in time, you can click the "Stop" button to end the recording session.

Playing Back the Events

To start the playback of any recorded session, you need to go back to the EPN screen and right-click the channel that will receive the events. This brings you to the playback tab of the selected channel. Just like in the recording session, you need to add a playback configuration scheme by clicking the "Add" button.

The group of fields to be filled in is almost the same. Note that in the "DataSet Name" field you need to enter exactly the name you used in the recording session: since you can have multiple recording sessions under different names, during playback you need to tell OEP which session you would like to play back. Likewise, you need to specify which events you would like to play back by filling in the "Selected Event Type List" field.

Optionally, you can set some additional parameters related to playback behavior. In the "Change Playback Schedule Parameters" section you will find fields to define which interval of time you want to play back, which is useful when you have recorded hours or days of event streaming and would like to reproduce just one piece of it. You can also define the speed at which events will be injected into the channel, useful to test how your EPN behaves in a high-velocity situation. Finally, you can choose whether the playback repeats forever: since the recording session is finite, you can play it back as many times as you want, or just set the "Repeat" field to true to automatically restart from the beginning when the recorded stream ends.

When you are done, save the playback configuration scheme and click the "Start" button to start the playback session according to your settings. A "Playing..." message will flash on the screen during the playback session. At any time, you can stop the playback by clicking the "Stop" button.

All the configuration you have made so far is stored in the OEP domain along with your application, so don't be afraid to restart the OEP server and potentially lose your work: all the settings will still be there when the server comes back to life. This holds as long as you don't undeploy your application from the OEP domain; if you undeploy it, all the settings will be lost. If you want to make those settings permanent, you can define them in your application. Access this link to learn how to configure a component to record events, and this link to learn how to configure a component to play back events.
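As a rough illustration of that application-level approach, a permanent record configuration is declared on the component in the EPN assembly file. The sketch below is based on the wlevs assembly schema; the adapter id, dataset name and event type are assumptions, and the linked documentation remains the authoritative reference:

<wlevs:adapter id="inputAdapter">
    <wlevs:record-parameters>
        <wlevs:dataset-name>mySessionName</wlevs:dataset-name>
        <wlevs:event-type-list>
            <wlevs:event-type>MyEventType</wlevs:event-type>
        </wlevs:event-type-list>
    </wlevs:record-parameters>
</wlevs:adapter>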

Wednesday Jan 08, 2014

Service Enablement of CORBA Applications through Oracle Service Bus

A quick overview of the definition of an ESB will tell us that one of its main responsibilities is, among other things, enabling existing systems to provide fresh new services, using the same or perhaps different protocols and/or contracts. This is where the ESB makes magical things happen: virtualizing existing services and making them available to the outside world, abstracting away from the consuming applications the details of the underlying system.

With this statement in mind, it is reasonable to expect that a good ESB must be able to handle the different types of systems and technologies found in legacy environments, no matter if the system was built last year, five years ago or even in the last decade. Those systems represent the assets of an organization in terms of its business building blocks, so there is a huge chance that they carry a substantial number of business services that could be leveraged by an SOA initiative.

CORBA is a very powerful distributed technology that was pretty popular in the 90s and the early 2000s. Many industries that demand a robust infrastructure to handle their business transactions, critical by nature and extremely sensitive in terms of performance and reliability, relied on CORBA as their implementation technology. It is pretty common to find communications companies like internet providers, mobile operators and pre-paid service chains that built their foundations (also known as engineering services) on top of CORBA systems.

This article will show how to service enable CORBA systems through OSB, the ESB implementation from Oracle that is part of the SOA Suite. Through the steps shown here, you will be able to leverage existing CORBA systems and expose their business logic (defined as CORBA objects) over different transports, protocols and contracts, making the reuse of that business logic both possible and viable. This article will not cover any specific CORBA ORB, to keep the techniques shown here reproducible in different contexts.

The Interface Definition Language

The definition of any CORBA object is written in a neutral description language called IDL, an acronym for Interface Definition Language. For this example, I will consider that OSB will service enable a piece of functionality that sends SMS messages, currently implemented as an object of a CORBA system. The IDL of this object is described below:

module corbaEnablementExample
{

    struct Message
    {

        string content;
        string phoneId;

    };

    interface SMSGateway
    {

        oneway void send(in Message message);

    };

};

As you can see, this is a very simple object that accepts a message as its main parameter; the message has attributes representing the content to be sent as an SMS message and the mobile phone number that will receive the content.

The CORBA Server Application

For the purposes of this article, it does not matter in which programming language the server part of the CORBA application is implemented. What really matters is in which ORB the CORBA server application registers its implementation stub. To illustrate the example, let's suppose that this CORBA object is implemented in Java.
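The server application relies on a servant class, SMSGatewayImpl, whose listing is not shown here. Assuming the Java stubs were generated from the IDL with the idlj compiler, a minimal servant could look like this (the console output is just for illustration):

package corbaEnablementApplication;

import corbaEnablementExample.Message;
import corbaEnablementExample.SMSGatewayPOA;

public class SMSGatewayImpl extends SMSGatewayPOA {

	@Override
	public void send(Message message) {

		// Simulates the SMS dispatch, printing the
		// struct fields generated from the IDL
		System.out.println("Sending '" + message.content +
				"' to phone: " + message.phoneId);

	}

}

With this servant in place, the server application below instantiates it and registers it with the ORB's naming service: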

package corbaEnablementApplication;

import org.omg.CORBA.ORB;
import org.omg.CORBA.Object;
import org.omg.CosNaming.NameComponent;
import org.omg.CosNaming.NamingContextExt;
import org.omg.CosNaming.NamingContextExtHelper;
import org.omg.PortableServer.POA;
import org.omg.PortableServer.POAHelper;

import corbaEnablementExample.SMSGateway;
import corbaEnablementExample.SMSGatewayHelper;

public class ServerApplication {

	public static void main(String[] args) {
		
		ORB orb = null;
		Object tmp = null;
		POA rootPOA = null;
		SMSGatewayImpl impl = null;
		Object ref, objRef = null;
		SMSGateway href = null;
		NamingContextExt ncRef = null;
		NameComponent path[] = null;
		
		try {
			
			System.setProperty("org.omg.CORBA.ORBInitialHost", "soa.suite.machine");
			System.setProperty("org.omg.CORBA.ORBInitialPort", "8001");
			orb = ORB.init(args, null);
			
			tmp = orb.resolve_initial_references("RootPOA");
			rootPOA = POAHelper.narrow(tmp);
			rootPOA.the_POAManager().activate();
			
			impl = new SMSGatewayImpl();
			ref = rootPOA.servant_to_reference(impl);
			href = SMSGatewayHelper.narrow(ref);
			
			objRef = orb.resolve_initial_references("NameService");
			ncRef = NamingContextExtHelper.narrow(objRef);
			
			path = ncRef.to_name("sms-gateway");
			ncRef.rebind(path, href);
			
			System.out.println("----------------------------------------------------");
			System.out.println("  CORBA Server is Running and waiting for Requests  ");
			System.out.println("----------------------------------------------------");
			
			orb.run();
			
		} catch (Exception ex) {
			ex.printStackTrace();
		}
		
	}

}

The code listing above shows a CORBA server application that connects to an ORB available on TCP/IP port 8001. After retrieving the POA from the ORB, it gets access to the naming service that will be used to register the object implementation. Finally, the application binds the object implementation under the name "sms-gateway", the name by which the CORBA object will be known to the outside world. In order to test this CORBA server application, start an ORB on port 8001 and execute the program in a JVM. If you don't have any commercial ORB available, you can use the ORB that comes with the JDK. Just enter the /bin folder of your JDK and type:

orbd -ORBInitialHost soa.suite.machine -ORBInitialPort 8001

To check that this remote object is working properly, you need to write a CORBA client application. Here is an example of a CORBA client written against the same IDL interface as the server:

package corbaEnablementApplication;

import java.util.Random;
import java.util.UUID;

import org.omg.CORBA.ORB;
import org.omg.CORBA.Object;
import org.omg.CosNaming.NamingContextExt;
import org.omg.CosNaming.NamingContextExtHelper;

import corbaEnablementExample.Message;
import corbaEnablementExample.SMSGateway;
import corbaEnablementExample.SMSGatewayHelper;

public class ClientApplication {
	
	private static final Random random = new Random();

	public static void main(String[] args) {
		
		ORB orb = null;
		Object objRef = null;
		NamingContextExt ncRef = null;
		SMSGateway smsGateway = null;
		
		try {
			
			System.setProperty("org.omg.CORBA.ORBInitialHost", "soa.suite.machine");
			System.setProperty("org.omg.CORBA.ORBInitialPort", "8001");
			orb = ORB.init(args, null);
			
			objRef = orb.resolve_initial_references("NameService");
			ncRef = NamingContextExtHelper.narrow(objRef);
			
			smsGateway = SMSGatewayHelper.narrow(ncRef.resolve_str("sms-gateway"));
			Message message = createNewRandomMessage();
			smsGateway.send(message);
			
		} catch (Exception ex) {
			ex.printStackTrace();
		}
		
	}

	private static Message createNewRandomMessage() {
		String content = UUID.randomUUID().toString();
		String phoneId = String.valueOf(random.nextLong());
		Message message = new Message(content, phoneId);
		return message;
	}

}

The Business Services Layer

For OSB to get access to the remote object, it is necessary to create a mechanism that can translate the IIOP protocol (the protocol used in pure CORBA systems) into a protocol that OSB can understand, which could be RMI/IIOP or pure RMI. The best way to accomplish that is to implement the wrapper pattern: write an EJB 3.0 service that encapsulates the CORBA remote object and delegates its service calls to that object. The interface for this EJB 3.0 service should be something simple like this:

package com.oracle.fmw.soa.osb.corba;

public interface SMSGateway {
	
	public void sendMessage(String content, String phoneId);

}

The implementation of this EJB 3.0 service should perform a job similar to the CORBA client application described previously, but quite different in terms of how it connects to the ORB:

package com.oracle.fmw.soa.osb.corba;

import javax.annotation.PostConstruct;
import javax.annotation.Resource;
import javax.ejb.Remote;
import javax.ejb.Stateless;

import org.omg.CORBA.ORB;
import org.omg.CORBA.Object;
import org.omg.CosNaming.NamingContextExt;
import org.omg.CosNaming.NamingContextExtHelper;

import corbaEnablementExample.Message;
import corbaEnablementExample.SMSGatewayHelper;

@Remote(value = SMSGateway.class)
@Stateless(name = "SMSGateway", mappedName = "SMSGateway")
public class SMSGatewayImpl implements SMSGateway {
	
	private corbaEnablementExample.SMSGateway smsGateway;
	
	@Resource(name = "namingService")
	private String namingService;
	
	@Resource(name = "objectName")
	private String objectName;
	
	@PostConstruct
	@SuppressWarnings("unused")
	private void retrieveStub() {
		
		ORB orb = null;
		Object objRef = null;
		NamingContextExt ncRef = null;
		
		try {
			
			orb = ORB.init();
			objRef = orb.resolve_initial_references(namingService);
			ncRef = NamingContextExtHelper.narrow(objRef);
			smsGateway = SMSGatewayHelper.narrow(ncRef.resolve_str(objectName));
			
		} catch (Exception ex) {
			
			throw new RuntimeException("EJB wrapper failed in the retrieval of the CORBA stub.", ex);
			
		}
		
	}

	@Override
	public void sendMessage(String content, String phoneId) {
		
		smsGateway.send(new Message(content, phoneId));
			
	}

}

The code is very similar to the CORBA client application shown before, with one important difference: it has no information about which ORB to connect to. In this case, the EJB will reside in a WebLogic JVM, and each WebLogic JVM includes an ORB implementation out-of-the-box. So when you write EJB objects that wrap CORBA remote objects, you don't need to worry about which ORB to use: WebLogic is already an ORB.


Note: as explained before, this article will not go into the details of any commercial ORB, for the sake of clarity. But keep in mind that the steps shown here for stub retrieval can be quite different if you are using another ORB. If you are using Borland VisiBroker, for instance, there is a unique way to access the ORB through a service called the "Smart Agent", which dynamically finds other objects in the network. IONA Orbix has its own way to connect to an ORB, which relies on the domain configuration location of the Orbix network.


Create a WebLogic domain, start one or more WebLogic managed servers, and run the CORBA server application again. Remember that the CORBA server application should now point to the WebLogic port, since the ORB should now be the one available in the WebLogic subsystem. If you check the JNDI tree of the WebLogic JVM, you should see something like this:

This means that the remote CORBA object was properly registered in the CosNaming service available in the WebLogic ORB. Package the EJB 3.0 implementation into a JAR or an EAR and deploy it on the same WebLogic JVM in which the CORBA remote object was registered. Now we have everything in place to start the development of the OSB project. For the purposes of this article, I will assume that the EJB 3.0 object is available under the following JNDI name: "SMSGateway#com.oracle.fmw.soa.osb.corba.SMSGateway".

The OSB Project

On the OSB side, all you have to do is create a business service that points to one or more endpoints of the EJB 3.0 running on one or more servers of the WebLogic domain. To accomplish that, you will need to teach OSB how to communicate with this foreign WebLogic domain. This is done by creating a JNDI provider in the OSB configuration scheme:


OSB also needs access to the EJB 3.0 interfaces (and any other helper classes) to instantiate client proxies, so you need to package all the EJB 3.0 artifacts (except, of course, the enterprise bean implementation) and deploy them into your OSB project:

Now we have everything in place. It is time to create the business service that will point to the EJB 3.0 wrapper. Create one business service and set its service type to "Transport Typed":

Configure the business service protocol as "EJB" and set its endpoint URI to the prefix "ejb:" plus the name of the JNDI provider plus the JNDI name of the EJB 3.0:
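For example, assuming the JNDI provider created earlier was registered under the hypothetical name CorbaDomainProvider, the endpoint URI would be:

ejb:CorbaDomainProvider:SMSGateway#com.oracle.fmw.soa.osb.corba.SMSGateway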

Finally, you need to configure the client interface of the EJB 3.0 endpoint in the business service configuration page. Check the "EJB 3.0" checkbox and choose from the drop-down list which interface will be used for message communication.

Finish the creation of the business service and save the changes. You can now test your business service using the testing tool available on OSB:

After making a request to the business service using the OSB testing tool, you can check the CORBA server application log to see the results of the invocation. Here is an example:

With the business service in place, you can easily create one or more proxy services to access the remote CORBA object with minimal effort. From the OSB perspective, it is all about routing messages to the business service you created, making the fact that this business service wraps a CORBA remote object truly irrelevant.

No matter what your use case is, you now have the CORBA remote object available in OSB for virtually anything. You can expose it directly using one of the available transports, forward messages to it in the middle of your pipeline, use it as an enrichment mechanism through service callouts, or just use the business service as one of the choices of a dynamic routing. If you choose to expose this business service over a new protocol, you can play with SOAP, REST, HTTP, JMS, Email, Tuxedo, File and FTP with zero coding. OSB will take care of the protocol translation during message exchanges.

You can download the project artifacts created in this article here.

Monday Oct 28, 2013

Oracle Coherence, Split-Brain and Recovery Protocols In Detail

This article provides a high level conceptual overview of Split-Brain scenarios in distributed systems. It will focus on a specific example of cluster communication failure and recovery in Oracle Coherence. This includes a discussion on the witness protocol (used to remove failed cluster members) and the panic protocol (used to resolve Split-Brain scenarios).

Note that the removal of cluster members does not necessarily indicate a Split-Brain condition. Oracle Coherence does not (and cannot) detect a Split-Brain as it occurs; the condition is only detected when cluster members that previously lost contact with each other regain contact.

Cluster Topology and Configuration

For didactic purposes, let's assume a cluster topology and configuration. In this example we have a six-member cluster, consisting of one JVM on each of six physical machines. The member IDs are as follows:

Member ID    IP Address
    1        10.149.155.76
    2        10.149.155.77
    3        10.149.155.236
    4        10.149.155.75
    5        10.149.155.79
    6        10.149.155.78

Members 1, 2, and 3 are connected to a switch, and members 4, 5, and 6 are connected to a second switch. There is a link between the two switches, which provides network connectivity between all of the machines.


Member 1 is the first member to join this cluster, thus making it the senior member. Member 6 is the last member to join this cluster. Here is a log snippet from Member 6 showing the complete member set:

2010-02-26 15:27:57.390/3.062 Oracle Coherence GE 12.1.2.0.0 <Info> (thread=main, member=6): Started DefaultCacheServer...

SafeCluster: Name=cluster:0xDDEB

Group{Address=224.3.5.3, Port=35465, TTL=4}

MasterMemberSet
  (
  ThisMember=Member(Id=6, Timestamp=2010-02-26 15:27:58.635, Address=10.149.155.78:8088, MachineId=1102, Location=process:228, Role=CoherenceServer)
  OldestMember=Member(Id=1, Timestamp=2010-02-26 15:27:06.931, Address=10.149.155.76:8088, MachineId=1100, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:511, Role=CoherenceServer)
  ActualMemberSet=MemberSet(Size=6, BitSetCount=2
    Member(Id=1, Timestamp=2010-02-26 15:27:06.931, Address=10.149.155.76:8088, MachineId=1100, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:511, Role=CoherenceServer)
    Member(Id=2, Timestamp=2010-02-26 15:27:17.847, Address=10.149.155.77:8088, MachineId=1101, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:296, Role=CoherenceServer)
    Member(Id=3, Timestamp=2010-02-26 15:27:24.892, Address=10.149.155.236:8088, MachineId=1260, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:32459, Role=CoherenceServer)
    Member(Id=4, Timestamp=2010-02-26 15:27:39.574, Address=10.149.155.75:8088, MachineId=1099, Location=process:800, Role=CoherenceServer)
    Member(Id=5, Timestamp=2010-02-26 15:27:49.095, Address=10.149.155.79:8088, MachineId=1103, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:3229, Role=CoherenceServer)
    Member(Id=6, Timestamp=2010-02-26 15:27:58.635, Address=10.149.155.78:8088, MachineId=1102, Location=process:228, Role=CoherenceServer)
    )
  RecycleMillis=120000
  RecycleSet=MemberSet(Size=0, BitSetCount=0
    )
  )

At approximately 15:30, the connection between the two switches is severed:


Thirty seconds later (the default packet timeout in development mode), the logs indicate communication failures across the cluster. In this example, the communication failure was caused by a network failure. In a production setting, this type of communication failure can have many root causes, including (but not limited to) network failures, excessive GC, high CPU utilization, swapping/virtual memory, and exceeding maximum network bandwidth. In addition, this type of failure is not necessarily indicative of a Split-Brain; any communication failure will be logged in this fashion. Member 2 logs a communication failure with Member 5:

2010-02-26 15:30:32.638/196.928 Oracle Coherence GE 12.1.2.0.0 <Warning> (thread=PacketPublisher, member=2): Timeout while delivering a packet; requesting the departure confirmation for Member(Id=5, Timestamp=2010-02-26 15:27:49.095, Address=10.149.155.79:8088, MachineId=1103, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:3229, Role=CoherenceServer)
by MemberSet(Size=2, BitSetCount=2
  Member(Id=1, Timestamp=2010-02-26 15:27:06.931, Address=10.149.155.76:8088, MachineId=1100, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:511, Role=CoherenceServer)
  Member(Id=4, Timestamp=2010-02-26 15:27:39.574, Address=10.149.155.75:8088, MachineId=1099, Location=process:800, Role=CoherenceServer)
  )

The Coherence clustering protocol (TCMP) is a reliable transport mechanism built on UDP. In order for the protocol to be reliable, it requires an acknowledgement (ACK) for each packet delivered. If a packet fails to be acknowledged within the configured timeout period, the Coherence cluster member will log a packet timeout (as seen in the log message above). When this occurs, the cluster member will consult with other members to determine who is at fault for the communication failure. If the witness members agree that the suspect member is at fault, the suspect is removed from the cluster; if the witnesses unanimously disagree, the accuser is removed. This process is known as the witness protocol. Since Member 2 cannot communicate with Member 5, it selects two witnesses (Members 1 and 4) to determine if the communication issue is with Member 5 or with itself (Member 2). However, Member 4 is on the switch that is no longer accessible by Members 1, 2 and 3; thus a packet timeout for Member 4 is recorded as well:

2010-02-26 15:30:35.648/199.938 Oracle Coherence GE 12.1.2.0.0 <Warning> (thread=PacketPublisher, member=2): Timeout while delivering a packet; requesting the departure confirmation for Member(Id=4, Timestamp=2010-02-26 15:27:39.574, Address=10.149.155.75:8088, MachineId=1099, Location=process:800, Role=CoherenceServer)
by MemberSet(Size=2, BitSetCount=2
  Member(Id=1, Timestamp=2010-02-26 15:27:06.931, Address=10.149.155.76:8088, MachineId=1100, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:511, Role=CoherenceServer)
  Member(Id=6, Timestamp=2010-02-26 15:27:58.635, Address=10.149.155.78:8088, MachineId=1102, Location=process:228, Role=CoherenceServer)
  )

Member 1 is able to confirm the departure of Member 4; however, Member 6 cannot, as it is also inaccessible. At the same time, Member 3 sends a request to remove Member 6, which is followed by a report from Member 3 indicating that Member 6 has departed the cluster:

2010-02-26 15:30:35.706/199.996 Oracle Coherence GE 12.1.2.0.0 <D5> (thread=Cluster, member=2): MemberLeft request for Member 6 received from Member(Id=3, Timestamp=2010-02-26 15:27:24.892, Address=10.149.155.236:8088, MachineId=1260, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:32459, Role=CoherenceServer)
2010-02-26 15:30:35.709/199.999 Oracle Coherence GE 12.1.2.0.0 <D5> (thread=Cluster, member=2): MemberLeft notification for Member 6 received from Member(Id=3, Timestamp=2010-02-26 15:27:24.892, Address=10.149.155.236:8088, MachineId=1260, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:32459, Role=CoherenceServer)

The log for Member 3 shows how Member 6 departed the cluster:

2010-02-26 15:30:35.161/191.694 Oracle Coherence GE 12.1.2.0.0 <Warning> (thread=PacketPublisher, member=3): Timeout while delivering a packet; requesting the departure confirmation for Member(Id=6, Timestamp=2010-02-26 15:27:58.635, Address=10.149.155.78:8088, MachineId=1102, Location=process:228, Role=CoherenceServer)
by MemberSet(Size=2, BitSetCount=2
  Member(Id=1, Timestamp=2010-02-26 15:27:06.931, Address=10.149.155.76:8088, MachineId=1100, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:511, Role=CoherenceServer)
  Member(Id=2, Timestamp=2010-02-26 15:27:17.847, Address=10.149.155.77:8088, MachineId=1101, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:296, Role=CoherenceServer)
  )
2010-02-26 15:30:35.165/191.698 Oracle Coherence GE 12.1.2.0.0 <Info> (thread=Cluster, member=3): Member departure confirmed by MemberSet(Size=2, BitSetCount=2
  Member(Id=1, Timestamp=2010-02-26 15:27:06.931, Address=10.149.155.76:8088, MachineId=1100, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:511, Role=CoherenceServer)
  Member(Id=2, Timestamp=2010-02-26 15:27:17.847, Address=10.149.155.77:8088, MachineId=1101, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:296, Role=CoherenceServer)
  ); removing Member(Id=6, Timestamp=2010-02-26 15:27:58.635, Address=10.149.155.78:8088, MachineId=1102, Location=process:228, Role=CoherenceServer)

In this case, Member 3 happened to select two witnesses that it still had connectivity with (Members 1 and 2), resulting in a simple decision to remove Member 6.


Given the departure of Member 6, Member 2 is left with a single witness to confirm the departure of Member 4:

2010-02-26 15:30:35.713/200.003 Oracle Coherence GE 12.1.2.0.0 <Info> (thread=Cluster, member=2): Member departure confirmed by MemberSet(Size=1, BitSetCount=2
  Member(Id=1, Timestamp=2010-02-26 15:27:06.931, Address=10.149.155.76:8088, MachineId=1100, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:511, Role=CoherenceServer)
  ); removing Member(Id=4, Timestamp=2010-02-26 15:27:39.574, Address=10.149.155.75:8088, MachineId=1099, Location=process:800, Role=CoherenceServer)

In the meantime, Member 4 logs a missing heartbeat from the senior member. This message is also logged on Members 5 and 6.

2010-02-26 15:30:07.906/150.453 Oracle Coherence GE 12.1.2.0.0 <Info> (thread=PacketListenerN, member=4): Scheduled senior member heartbeat is overdue; rejoining multicast group.

Next, Member 4 logs a TcpRing failure with Member 2, thus resulting in the termination of Member 2:

2010-02-26 15:30:21.421/163.968 Oracle Coherence GE 12.1.2.0.0 <D4> (thread=Cluster, member=4): TcpRing: Number of socket exceptions exceeded maximum; last was "java.net.SocketTimeoutException: connect timed out"; removing the member: 2

For quick process termination detection, Oracle Coherence utilizes a feature called TcpRing, which is a sparse collection of TCP/IP-based connections between members of the cluster. Each member in the cluster is connected to at least one other member which, if at all possible, is running on a different physical box. This connection is not used for any data transfer; only heartbeat communications are sent, once a second per link. If a certain number of exceptions are thrown while trying to re-establish a connection, the member throwing the exceptions is removed from the cluster. Member 5 logs a packet timeout with Member 3 and cites witnesses (Members 4 and 6):

2010-02-26 15:30:29.791/165.037 Oracle Coherence GE 12.1.2.0.0 <Warning> (thread=PacketPublisher, member=5): Timeout while delivering a packet; requesting the departure confirmation for Member(Id=3, Timestamp=2010-02-26 15:27:24.892, Address=10.149.155.236:8088, MachineId=1260, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:32459, Role=CoherenceServer)
by MemberSet(Size=2, BitSetCount=2
  Member(Id=4, Timestamp=2010-02-26 15:27:39.574, Address=10.149.155.75:8088, MachineId=1099, Location=process:800, Role=CoherenceServer)
  Member(Id=6, Timestamp=2010-02-26 15:27:58.635, Address=10.149.155.78:8088, MachineId=1102, Location=process:228, Role=CoherenceServer)
  )
2010-02-26 15:30:29.798/165.044 Oracle Coherence GE 12.1.2.0.0 <Info> (thread=Cluster, member=5): Member departure confirmed by MemberSet(Size=2, BitSetCount=2
  Member(Id=4, Timestamp=2010-02-26 15:27:39.574, Address=10.149.155.75:8088, MachineId=1099, Location=process:800, Role=CoherenceServer)
  Member(Id=6, Timestamp=2010-02-26 15:27:58.635, Address=10.149.155.78:8088, MachineId=1102, Location=process:228, Role=CoherenceServer)
  ); removing Member(Id=3, Timestamp=2010-02-26 15:27:24.892, Address=10.149.155.236:8088, MachineId=1260, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:32459, Role=CoherenceServer)

Eventually we are left with two distinct clusters consisting of Members 1, 2, 3 and Members 4, 5, 6, respectively. In the latter cluster, Member 4 is promoted to senior member.


The connection between the two switches is restored at 15:33. Upon the restoration of the connection, the cluster members immediately receive cluster heartbeats from the two senior members. In the case of Members 1, 2, and 3, the following is logged:

2010-02-26 15:33:14.970/369.066 Oracle Coherence GE 12.1.2.0.0 <Warning> (thread=Cluster, member=1): The member formerly known as Member(Id=4, Timestamp=2010-02-26 15:30:35.341, Address=10.149.155.75:8088, MachineId=1099, Location=process:800, Role=CoherenceServer) has been forcefully evicted from the cluster, but continues to emit a cluster heartbeat; henceforth, the member will be shunned and its messages will be ignored.

Likewise for Members 4, 5, and 6:

2010-02-26 15:33:14.343/336.890 Oracle Coherence GE 12.1.2.0.0 <Warning> (thread=Cluster, member=4): The member formerly known as Member(Id=1, Timestamp=2010-02-26 15:30:31.64, Address=10.149.155.76:8088, MachineId=1100, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:511, Role=CoherenceServer) has been forcefully evicted from the cluster, but continues to emit a cluster heartbeat; henceforth, the member will be shunned and its messages will be ignored.

This message indicates that a senior heartbeat is being received from members that were previously removed from the cluster; in other words, something that should not be possible. For this reason, the recipients of these messages will initially ignore them. After several iterations of these messages, the existence of multiple clusters is acknowledged, thus triggering the panic protocol to reconcile the situation. When the presence of more than one cluster (i.e., Split-Brain) is detected by a Coherence member, the panic protocol is invoked in order to resolve the conflicting clusters and consolidate them into a single cluster. The protocol consists of the removal of the smaller clusters until there is one cluster remaining. In the case of equal-size clusters, the one with the older senior member will survive. Member 1, being the oldest member, initiates the protocol:

2010-02-26 15:33:45.970/400.066 Oracle Coherence GE 12.1.2.0.0 <Warning> (thread=Cluster, member=1): An existence of a cluster island with senior Member(Id=4, Timestamp=2010-02-26 15:27:39.574, Address=10.149.155.75:8088, MachineId=1099, Location=process:800, Role=CoherenceServer) containing 3 nodes have been detected. Since this Member(Id=1, Timestamp=2010-02-26 15:27:06.931, Address=10.149.155.76:8088, MachineId=1100, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:511, Role=CoherenceServer) is the senior of an older cluster island, the panic protocol is being activated to stop the other island's senior and all junior nodes that belong to it.

Member 3 receives the panic:

2010-02-26 15:33:45.803/382.336 Oracle Coherence GE 12.1.2.0.0 <Error> (thread=Cluster, member=3): Received panic from senior Member(Id=1, Timestamp=2010-02-26 15:27:06.931, Address=10.149.155.76:8088, MachineId=1100, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:511, Role=CoherenceServer) caused by Member(Id=4, Timestamp=2010-02-26 15:27:39.574, Address=10.149.155.75:8088, MachineId=1099, Location=process:800, Role=CoherenceServer)

Member 4, the senior member of the younger cluster, receives the kill message from Member 3:

2010-02-26 15:33:44.921/367.468 Oracle Coherence GE 12.1.2.0.0 <Error> (thread=Cluster, member=4): Received a Kill message from a valid Member(Id=3, Timestamp=2010-02-26 15:27:24.892, Address=10.149.155.236:8088, MachineId=1260, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:32459, Role=CoherenceServer); stopping cluster service.

In turn, Member 4 requests the departure of its junior members 5 and 6:

2010-02-26 15:33:43.343/349.015 Oracle Coherence GE 12.1.2.0.0 <Error> (thread=Cluster, member=6): Received a Kill message from a valid Member(Id=4, Timestamp=2010-02-26 15:27:39.574, Address=10.149.155.75:8088, MachineId=1099, Location=process:800, Role=CoherenceServer); stopping cluster service.

Once Members 4, 5, and 6 restart, they rejoin the original cluster with senior member 1. The log below is from Member 4. Note that it receives a different member id when it rejoins the cluster.

2010-02-26 15:33:44.921/367.468 Oracle Coherence GE 12.1.2.0.0 <Error> (thread=Cluster, member=4): Received a Kill message from a valid Member(Id=3, Timestamp=2010-02-26 15:27:24.892, Address=10.149.155.236:8088, MachineId=1260, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:32459, Role=CoherenceServer); stopping cluster service.
2010-02-26 15:33:46.921/369.468 Oracle Coherence GE 12.1.2.0.0 <D5> (thread=Cluster, member=4): Service Cluster left the cluster
2010-02-26 15:33:47.046/369.593 Oracle Coherence GE 12.1.2.0.0 <D5> (thread=Invocation:InvocationService, member=4): Service InvocationService left the cluster
2010-02-26 15:33:47.046/369.593 Oracle Coherence GE 12.1.2.0.0 <D5> (thread=OptimisticCache, member=4): Service OptimisticCache left the cluster
2010-02-26 15:33:47.046/369.593 Oracle Coherence GE 12.1.2.0.0 <D5> (thread=ReplicatedCache, member=4): Service ReplicatedCache left the cluster
2010-02-26 15:33:47.046/369.593 Oracle Coherence GE 12.1.2.0.0 <D5> (thread=DistributedCache, member=4): Service DistributedCache left the cluster
2010-02-26 15:33:47.046/369.593 Oracle Coherence GE 12.1.2.0.0 <D5> (thread=Invocation:Management, member=4): Service Management left the cluster
2010-02-26 15:33:47.046/369.593 Oracle Coherence GE 12.1.2.0.0 <D5> (thread=Cluster, member=4): Member 6 left service Management with senior member 5
2010-02-26 15:33:47.046/369.593 Oracle Coherence GE 12.1.2.0.0 <D5> (thread=Cluster, member=4): Member 6 left service DistributedCache with senior member 5
2010-02-26 15:33:47.046/369.593 Oracle Coherence GE 12.1.2.0.0 <D5> (thread=Cluster, member=4): Member 6 left service ReplicatedCache with senior member 5
2010-02-26 15:33:47.046/369.593 Oracle Coherence GE 12.1.2.0.0 <D5> (thread=Cluster, member=4): Member 6 left service OptimisticCache with senior member 5
2010-02-26 15:33:47.046/369.593 Oracle Coherence GE 12.1.2.0.0 <D5> (thread=Cluster, member=4): Member 6 left service InvocationService with senior member 5
2010-02-26 15:33:47.046/369.593 Oracle Coherence GE 12.1.2.0.0 <D5> (thread=Cluster, member=4): Member(Id=6, Timestamp=2010-02-26 15:33:47.046, Address=10.149.155.78:8088, MachineId=1102, Location=process:228, Role=CoherenceServer) left Cluster with senior member 4
2010-02-26 15:33:49.218/371.765 Oracle Coherence GE 12.1.2.0.0 <Info> (thread=main, member=n/a): Restarting cluster
2010-02-26 15:33:49.421/371.968 Oracle Coherence GE 12.1.2.0.0 <D5> (thread=Cluster, member=n/a): Service Cluster joined the cluster with senior service member n/a
2010-02-26 15:33:49.625/372.172 Oracle Coherence GE 12.1.2.0.0 <Info> (thread=Cluster, member=n/a): This Member(Id=5, Timestamp=2010-02-26 15:33:50.499, Address=10.149.155.75:8088, MachineId=1099, Location=process:800, Role=CoherenceServer, Edition=Grid Edition, Mode=Development, CpuCount=2, SocketCount=1) joined cluster "cluster:0xDDEB" with senior Member(Id=1, Timestamp=2010-02-26 15:27:06.931, Address=10.149.155.76:8088, MachineId=1100, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:511, Role=CoherenceServer, Edition=Grid Edition, Mode=Development, CpuCount=2, SocketCount=2)

Cool, isn't it?

Friday Aug 16, 2013

The Perfect Marriage: Oracle Business Rules & Coherence In-Memory Data Grid. Highly Scalable Business Rules with Extremely Low Latency

The idea of separating business rules from application logic is an old concept. But in the last ten years, what we have seen is that dozens of platforms and technologies have been created to enable this separation of concerns. One of those technologies is the BRMS, an acronym for Business Rules Management System. The basic idea of a BRMS is to be a repository of rules, governing those rules in such a way that they can be created, updated, tested and controlled through an external interface. Part of the BRMS responsibility is also to provide an API (more than one when possible) that allows external applications to interact with it, sending data over the network that can trigger the execution of zero, one or multiple rules in the BRMS repository. This rule execution occurs outside of those external applications, minimizing their process memory footprint and generating much less CPU overhead, since rule execution happens on a separate server/cluster. This architectural approach is very powerful, allowing:

  • Rules can be managed (created, updated) outside of the application code
  • Rules can be reused across different applications, no matter their technology
  • Less CPU overhead and smaller memory footprint in the applications
  • More control over rules, auditing of changes and enterprise log history
  • Integration with other IT artifacts like dictionaries, processes, services

With this context in place, we would all agree that using a BRMS would be a mandatory approach in every IT architecture, given its power, if it were not for the fact that BRMS technologies introduce a lot of overhead into the overall transaction latency. Between the external application that invokes the BRMS to execute rules and the BRMS platform itself, there is the network channel. This means we must deal with network I/O and its technical implications (serialization, instability, byte buffering) when we send/receive data to/from the BRMS. Whether the BRMS provides a SOAP API, a REST API or any other TCP/IP-based API, the overall transaction latency is compromised by the network overhead.

Another huge problem with BRMS platforms is scalability. When the BRMS platform is first introduced into an architecture, it handles an acceptable number of TPS (Transactions Per Second), which nowadays varies from 1K TPS to 5K TPS. But when other applications start using the same BRMS platform, or the number of transactions just naturally grows, you can face scenarios where your BRMS platform must deal with 20K TPS or even 100K TPS. What happens when a huge number of objects is allocated in the heap space of the Java-based server? The memory footprint approaches its maximum size, and the garbage collector starts to run to reclaim unused memory and/or reorganize the heap layout. Whatever job the garbage collector has to do, it will use the entire processing power to finish it as soon as possible, since the amount of garbage to handle will be huge. This is true for almost all BRMS platforms on the market, from one vendor or another. If the BRMS platform is Java-based, once those server JVMs exceed roughly 16 GB of heap on average, they start to face huge performance problems due to garbage collection.

Unlike other architecture designs in which the load is distributed across a cluster, BRMS platforms must handle the entire processing on a single server, due to a general BRMS concept known as the execution agenda and working memory. All the facts (the data sent as input) are maintained in this agenda on a single server, making the BRMS platform a pinned service that does its job in singleton fashion. In this situation, when you need to scale, you can introduce a series of identical servers behind a corporate load balancer that, instead of distributing load, divides entire transaction volumes across those servers. Because each server behind the load balancer handles its entire volume by itself, concurrency is limited by the number of processors available on each server's mainboard. If you need more compute power, due to this lack of concurrency, you are forced to buy a much bigger server. Those servers are huge and expensive, since they need to carry enough processors to handle thousands of executions simultaneously and completely alone. Not a very smart approach when you are considering handling millions of TPS.

With this situation in mind, it is necessary to design an architecture that allows business rules execution to be distributed across different servers. To achieve this behavior, we need another software component that can share data (business entities, fact types, data transfer objects) across different processes running on the same or different hardware boxes. More important than that, we need a software component that keeps transaction latency short enough, shaving off the many milliseconds introduced by network overhead. In other words, this software component must bring the data to the one hardware layer that really doesn't imply I/O overhead: memory.

Recently, in order to deal with this problem and provide a customer with a scalable, high-performance way to use Oracle Business Rules, I designed a solution that solves both problems at once, without losing the power of separation of concerns provided by BRMS platforms. In-Memory Data Grid technologies like Oracle Coherence have the power to handle massive amounts of data (MB, GB or even TB) completely in memory. Moreover, this kind of technology has been written from scratch to distribute data across a number of servers, so scalability is never a problem here. When you integrate a BRMS with In-Memory Data Grid technologies, you get the best of both worlds: scalability plus high performance, and also extremely low latency. And when I say extremely low latency, I mean sub-millisecond latency: something below 650 μs in my tests.

This article will show how to integrate Oracle Business Rules with Oracle Coherence. The steps shown here can be reproduced in a huge number of scenarios, making your investment in Oracle Fusion Middleware (Cloud Application Foundation and/or the SOA Suite stack) even more attractive.

The Business Scenario: Automatic Promotions for Bank Customers

Before we move to the implementation details of this article, we need to understand the business scenario used as its teaching example. We are about to simulate an automatic decision system that creates promotions for banking customers based on their profiles. The idea here is to let the BRMS platform decide which promotions to offer, based on the customer profiles that applications send it. This automatic promotion system should allow applications like internet banking sites, mobile applications or kiosk terminals to present promotions (up-selling/cross-selling) to their final customers.

Building the Solution Domain Model

Let's start the development of the example. The first thing to do is create the domain model, which means we need to design and implement the business entities that will drive the client-side application execution, as well as the business rules. The automatic promotion system will be composed of three entities: promotions, products and customers. A promotion is something that the bank offers to the customer, with contextual information about the business value of one or more products, derived from the customer profile. Here is the implementation of the promotion entity:

package com.acme.architecture.multichannel.domain;

import com.tangosol.io.pof.annotation.Portable;
import com.tangosol.io.pof.annotation.PortableProperty;

@Portable public class Promotion {
    
    @PortableProperty(0) private String id;
    @PortableProperty(1) private String description;
    
    public Promotion() {}
    
    public Promotion(String id, String description) {
        setId(id);
        setDescription(description);
    }

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public String getDescription() {
        return description;
    }

    public void setDescription(String description) {
        this.description = description;
    }
    
}

A product is something that the customer hires from the bank: some kind of service or item that makes the customer's account more valuable to the bank and more attractive to the customer, since it is a differentiator. Here is the implementation of the product entity:

package com.acme.architecture.multichannel.domain;

import com.tangosol.io.pof.annotation.Portable;
import com.tangosol.io.pof.annotation.PortableProperty;

@Portable public class Product {
    
    @PortableProperty(0) private int id;
    @PortableProperty(1) private String name;
    
    public Product() {}
    
    public Product(int id, String name) {
        setId(id);
        setName(name);
    }

    public int getId() {
        return id;
    }

    public void setId(int id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }
    
}

And finally, we need to design the customer entity, which represents the person or company that hires one or more products from the bank. Here is the implementation of the customer entity:

package com.acme.architecture.multichannel.domain;

import com.tangosol.io.pof.annotation.Portable;
import com.tangosol.io.pof.annotation.PortableProperty;

import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Set;

@Portable public class Customer {
    
    @PortableProperty(0) private String ssn;
    @PortableProperty(1) private String firstName;
    @PortableProperty(2) private String lastName;
    @PortableProperty(3) private Date birthDate;
    
    @PortableProperty(4) private String account;
    @PortableProperty(5) private String agency;
    @PortableProperty(6) private double balance;
    @PortableProperty(7) private char custType;
    
    @PortableProperty(8) private Set<Product> products;
    @PortableProperty(9) private List<Promotion> promotions;
    
    public Customer() {}
    
    public Customer(String ssn, String firstName, String lastName,
                    Date birthDate, String account, String agency,
                    double balance, char custType, Set<Product> products) {
        setSsn(ssn);
        setFirstName(firstName);
        setLastName(lastName);
        setBirthDate(birthDate);
        setAccount(account);
        setAgency(agency);
        setBalance(balance);
        setCustType(custType);
        setProducts(products);
    }
    
    public void addPromotion(String id, String description) {
        getPromotions().add(new Promotion(id, description));
    }

    public String getSsn() {
        return ssn;
    }

    public void setSsn(String ssn) {
        this.ssn = ssn;
    }

    public String getFirstName() {
        return firstName;
    }

    public void setFirstName(String firstName) {
        this.firstName = firstName;
    }

    public String getLastName() {
        return lastName;
    }

    public void setLastName(String lastName) {
        this.lastName = lastName;
    }

    public Date getBirthDate() {
        return birthDate;
    }

    public void setBirthDate(Date birthDate) {
        this.birthDate = birthDate;
    }

    public String getAccount() {
        return account;
    }

    public void setAccount(String account) {
        this.account = account;
    }

    public String getAgency() {
        return agency;
    }

    public void setAgency(String agency) {
        this.agency = agency;
    }

    public double getBalance() {
        return balance;
    }

    public void setBalance(double balance) {
        this.balance = balance;
    }

    public char getCustType() {
        return custType;
    }

    public void setCustType(char custType) {
        this.custType = custType;
    }

    public Set<Product> getProducts() {
        return products;
    }

    public void setProducts(Set<Product> products) {
        this.products = products;
    }

    public List<Promotion> getPromotions() {
        if (promotions == null) {
            promotions = new ArrayList<Promotion>();
        }
        return promotions;
    }

    public void setPromotions(List<Promotion> promotions) {
        this.promotions = promotions;
    }
    
} 

As you can see in the code, the customer entity has a relationship with the other two entities. Build this code and package the three entities into a JAR file.
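A minimal sketch of the compile-and-package step (paths and JAR names are illustrative; the Coherence JAR must be on the classpath because of the POF annotations):

javac -cp /path/to/coherence.jar com/acme/architecture/multichannel/domain/*.java
jar cf domain-model.jar com/acme/architecture/multichannel/domain/*.class

We can now move to the second part of the implementation, which is the creation of a SOA project that includes a business rules dictionary.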

Creating the Business Rules Dictionary

Business rules in the Oracle Business Rules product are defined in an artifact called a dictionary. In order to create a dictionary, you must use the Oracle JDeveloper IDE plus the SOA extension for JDeveloper. I will assume here that you are familiar with those tools, so I will not go into too much detail about them. In JDeveloper, create a new SOA project, and after that create a business rules dictionary. With the dictionary in place, you must configure it to consider our domain model as fact types.


Now you can write down some business rules. Using the JDeveloper business rules editor, define the following rules as shown in the picture below.


For testing purposes, the variable "MinimumBalanceForCreditCard" is just a global variable of type java.lang.Double that contains a constant value. Finally, you are required to expose those business rules through a decision function. As you probably already know, decision functions are constructs that make it easier for external applications to interact with Oracle Business Rules, minimizing the developer effort to deal with the Oracle Business Rules API, besides providing a very nice contract-based access point. Create one decision function that receives a customer as input and returns the same customer as output. Don't forget to associate the ruleset with the decision function.

Integrating Oracle Business Rules and Coherence through Interceptors

Now comes the most exciting part of the article: the integration between Oracle Business Rules and Oracle Coherence In-Memory Data Grid. Starting with the 12.1.2 version of Coherence, Oracle introduced a new API called Live Events. This API allows applications to listen to and consume events from Coherence, no matter what type of event is being generated. You can learn more about Coherence Live Events in this YouTube presentation.

Using both the Coherence and Oracle Business Rules main libraries, implement the following event interceptor in your favorite Java development environment:

package com.oracle.coherence.events;

import com.tangosol.net.events.EventInterceptor;
import com.tangosol.net.events.annotation.Interceptor;
import com.tangosol.net.events.partition.cache.EntryEvent;
import com.tangosol.util.BinaryEntry;

import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

import oracle.rules.sdk2.decisionpoint.DecisionPoint;
import oracle.rules.sdk2.decisionpoint.DecisionPointBuilder;
import oracle.rules.sdk2.decisionpoint.DecisionPointDictionaryFinder;
import oracle.rules.sdk2.decisionpoint.DecisionPointInstance;
import oracle.rules.sdk2.dictionary.RuleDictionary;
import oracle.rules.sdk2.exception.SDKException;

/**
 * @author ricardo.s.ferreira@oracle.com
 */

@Interceptor
public class FSRulesInterceptor implements EventInterceptor<EntryEvent> {

    private List<EntryEvent.Type> types;
    private String dictionaryLocation;
    private String decisionFunctionName;
    private boolean dictionaryAutoUpdate;
    private long dictionaryTimestamp;

    private void parseEntryEventTypes(String entryEventTypes) {
        types = new ArrayList<EntryEvent.Type>();
        String[] listTypes = entryEventTypes.split(COMMA);
        for (String type : listTypes) {
            types.add(EntryEvent.Type.valueOf(type.trim()));
        }
    }

    private RuleDictionary loadRuleDictionary()
        throws FileNotFoundException, SDKException, IOException {
        File dictionaryFile = new File(dictionaryLocation);
        dictionaryTimestamp = dictionaryFile.lastModified();
        RuleDictionary ruleDictionary = RuleDictionary.readDictionary(
            new FileReader(dictionaryFile), new DecisionPointDictionaryFinder(null));
        return ruleDictionary;
    }

    private DecisionPoint createDecisionPoint() {
        DecisionPoint decisionPoint = null;
        try {
            decisionPoint =
                    new DecisionPointBuilder()
                    .with(loadRuleDictionary())
                    .with(decisionFunctionName).build();
        } catch (Exception ex) {
            throw new RuntimeException("Unable to create the DecisionPoint", ex);
        }
        return decisionPoint;
    }

    private void updateDecisionPointIfNecessary() {
        File dictionaryFile = new File(dictionaryLocation);
        if (dictionaryFile.lastModified() != dictionaryTimestamp) {
            decisionPoint.release();
            decisionPoint = createDecisionPoint();
        }
    }

    private boolean eventTypeIsAllowed(EntryEvent.Type entryEventType) {
        return types.contains(entryEventType);
    }

    public FSRulesInterceptor(String entryEventTypes,
                              String dictionaryLocation,
                              String decisionFunctionName) {
        parseEntryEventTypes(entryEventTypes);
        this.dictionaryLocation = dictionaryLocation;
        this.decisionFunctionName = decisionFunctionName;
        decisionPoint = createDecisionPoint();
    }

    public FSRulesInterceptor(String entryEventTypes,
                              String dictionaryLocation,
                              String decisionFunctionName,
                              boolean dictionaryAutoUpdate) {
        this(entryEventTypes, dictionaryLocation, decisionFunctionName);
        this.dictionaryAutoUpdate = dictionaryAutoUpdate;
    }

    public void onEvent(EntryEvent entryEvent) {

        BinaryEntry binaryEntry = null;
        Iterator<BinaryEntry> iter = null;
        Set<BinaryEntry> entrySet = null;
        DecisionPointInstance dPointInst = null;
        List<Object> inputs, outputs = null;

        try {

            if (eventTypeIsAllowed(entryEvent.getType())) {

                if (dictionaryAutoUpdate) {
                    updateDecisionPointIfNecessary();
                }

                inputs = new ArrayList<Object>();
                entrySet = entryEvent.getEntrySet();
                for (BinaryEntry binaryEntryInput : entrySet) {
                    inputs.add(binaryEntryInput.getValue());
                }

                dPointInst = decisionPoint.getInstance();
                dPointInst.setInputs(inputs);
                outputs = dPointInst.invoke();

                if (entryEvent.getType() == EntryEvent.Type.INSERTING ||
                    entryEvent.getType() == EntryEvent.Type.UPDATING) {
                    if (outputs != null && !outputs.isEmpty()) {
                        iter = entrySet.iterator();
                        for (Object output : outputs) {
                            if (iter.hasNext()) {
                                binaryEntry = iter.next();
                                binaryEntry.setValue(output);
                            }
                        }
                    }
                }
                
            }

        } catch (Exception ex) {
            ex.printStackTrace();
        }

    }
    
    private static final String COMMA = ",";
    private static DecisionPoint decisionPoint;

}

If you are familiar with the Oracle Business Rules Java API, you won't find it difficult to understand this code. What it does is simply create a DecisionPoint object during the constructor phase and put this object into a static variable, which allows the object to be shared across the entire JVM. Remember that the JVM in this context is a Coherence node, so what I am saying is that each Coherence node will hold an instance of one DecisionPoint. The onEvent() method contains the algorithm that checks which types of events the implementation should intercept, and also checks whether the DecisionPoint instance should be updated. This last check is done based on the timestamp of the dictionary file.

After creating a DecisionPointInstance, the intercepted entries become the input variables for the business rules execution. The interceptor triggers the rules engine through the invoke() method, and after that it replaces the original intercepted entries with the result that comes back from the business rules agenda, but only if the event type is INSERTING or UPDATING. This check is necessary for two reasons. First, those are the only event types that occur in the same thread as the cache transaction. Second, other event types like INSERTED or UPDATED happen in another thread, which means that they are triggered asynchronously by Coherence.

Setting Up a Coherence Distributed Cache with the Business Rules Interceptor

Now we can start the configuration of the Coherence cache. Since we are using POF as the serialization strategy, we need to assemble a POF configuration file. Starting with the 12.1.2 version of Coherence, there is a new tool called pof-config-gen that introspects JAR files searching for classes annotated with @Portable. Create a POF configuration file with the following content:

<?xml version='1.0'?>

<pof-config
   xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
   xmlns='http://xmlns.oracle.com/coherence/coherence-pof-config'
   xsi:schemaLocation='http://xmlns.oracle.com/coherence/coherence-pof-config coherence-pof-config.xsd'>
	
   <user-type-list>
      <include>coherence-pof-config.xml</include>
      <user-type>
         <type-id>1001</type-id>
         <class-name>com.acme.architecture.multichannel.domain.Promotion</class-name>
      </user-type>
      <user-type>
         <type-id>1002</type-id>
         <class-name>com.acme.architecture.multichannel.domain.Product</class-name>
      </user-type>
      <user-type>
         <type-id>1003</type-id>
         <class-name>com.acme.architecture.multichannel.domain.Customer</class-name>
      </user-type>
   </user-type-list>
	
</pof-config>

And as expected, we also need to create a Coherence cache configuration file. Create a file called coherence-cache-config.xml and fill it with the following content:

<?xml version="1.0"?>

<cache-config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns="http://xmlns.oracle.com/coherence/coherence-cache-config"
   xsi:schemaLocation="http://xmlns.oracle.com/coherence/coherence-cache-config coherence-cache-config.xsd">

   <defaults>
      <serializer>pof</serializer>
   </defaults>

   <caching-scheme-mapping>
      <cache-mapping>
         <cache-name>customers</cache-name>
         <scheme-name>customersScheme</scheme-name>
      </cache-mapping>
   </caching-scheme-mapping>

   <caching-schemes>
	
      <distributed-scheme>
         <scheme-name>customersScheme</scheme-name>
         <service-name>DistribService</service-name>
         <backing-map-scheme>
            <local-scheme />
         </backing-map-scheme>
         <autostart>true</autostart>
         <interceptors>
            <interceptor>
               <name>rulesInterceptor</name>
               <instance>
                  <class-name>com.oracle.coherence.events.FSRulesInterceptor</class-name>
                  <init-params>
                     <init-param>
                        <param-type>java.lang.String</param-type>
                        <param-value>INSERTING, UPDATING</param-value>
                     </init-param>
                     <init-param>
                        <param-type>java.lang.String</param-type>
                        <param-value>C:\\multiChannelArchitecture.rules</param-value>
                     </init-param>
                     <init-param>
                        <param-type>java.lang.String</param-type>
                        <param-value>BankingDecisionFunction</param-value>
                     </init-param>
                     <init-param>
                        <param-type>java.lang.Boolean</param-type>
                        <param-value>true</param-value>
                     </init-param>
                  </init-params>
               </instance>
            </interceptor>
         </interceptors>
         <async-backup>true</async-backup>
      </distributed-scheme>
		
      <proxy-scheme>
         <scheme-name>customersProxy</scheme-name>
         <service-name>ProxyService</service-name>
         <acceptor-config>
            <tcp-acceptor>
               <local-address>
                  <address>cloud.app.foundation</address>
                  <port>5555</port>
               </local-address>
            </tcp-acceptor>
         </acceptor-config>
         <autostart>true</autostart>
      </proxy-scheme>

   </caching-schemes>
	
</cache-config>    

This cache configuration file is very straightforward. There are only three important things to consider here. First, we are using the new interceptor section to declare our interceptor and pass constructor arguments to it. Second, we used another feature from the Coherence 12.1.2 version, asynchronous backup. This feature dramatically reduces the latency of a single transaction, since backups are written (in another thread) after the primary entry has been written. It is not a precondition for the interceptor to work, but in the context of BRMS it is a good idea. Third, we also defined a proxy-scheme that exposes a TCP/IP endpoint, so we can use the Coherence*Extend feature later in this article to allow a C++ application to access the same cache.

Testing the Scenario

Now that we have all the configuration in place, we can start the tests. Start a Coherence node JVM using the configuration files from the previous section.
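A minimal sketch of such a startup command (paths and JAR names are illustrative, and the Oracle Business Rules SDK JARs must also be on the classpath; tangosol.coherence.cacheconfig and tangosol.pof.config are the standard system properties for pointing a node at custom configuration files):

java -cp coherence.jar:rules-interceptor.jar:domain-model.jar \
   -Dtangosol.coherence.cacheconfig=/path/to/coherence-cache-config.xml \
   -Dtangosol.pof.config=/path/to/pof-config.xml \
   com.tangosol.net.DefaultCacheServer

When the node starts, a DecisionPoint object pointing to the business rules dictionary will be created in memory. Implement a Java program to test the behavior of the implementation, as in the listing below: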

package com.acme.architecture.multichannel.test;

import com.acme.architecture.multichannel.domain.Customer;

import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;

import java.util.Date;

public class Application {
    
    public static void main(String[] args) {
        
        NamedCache customers = CacheFactory.getCache("customers");
        
        // Simulating a simple customer of type 'Person'...
        // Should trigger the rule 'Basic Products for First Customers'
        // Expected number of promotions: 02
        String ssn = "12345";
        Customer personCust = new Customer(ssn, "Ricardo", "Ferreira",
                                         new Date(1981, 10, 05), "245671",
                                         "3158", 98000, 'P', null);
        
        long startTime = System.currentTimeMillis();
        customers.put(ssn, personCust);
        personCust = (Customer) customers.get(ssn);
        long elapsedTime = System.currentTimeMillis() - startTime;
        
        System.out.println();
        System.out.println("   ---> Number of Promotions: " +
                           personCust.getPromotions().size());
        System.out.println("   ---> Elapsed Time: " + elapsedTime + " ms");
        
        // Simulating a simple customer of type 'Company'...
        // Should trigger the rule 'Corporate Credit Card for Enterprise Customers'
        // Expected number of promotions: 01
        ssn = "54321";
        Customer companyCust = new Customer(ssn, "Huge", "Company",
                                         new Date(1981, 10, 05), "235437",
                                         "7856", 8900000, 'C', null);
        
        startTime = System.currentTimeMillis();
        customers.put(ssn, companyCust);
        companyCust = (Customer) customers.get(ssn);
        elapsedTime = (System.currentTimeMillis() - startTime);
        
        System.out.println("   ---> Number of Promotions: " +
                           companyCust.getPromotions().size());
        System.out.println("   ---> Elapsed Time: " + elapsedTime + " ms");
        System.out.println();
        
    }
    
}

This Java application should be executed with local storage disabled, so it joins the cluster as a storage-disabled client.
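A sketch of how this is typically done (the tangosol.coherence.distributed.localstorage system property controls whether the member stores data; paths and JAR names are illustrative):

java -cp coherence.jar:domain-model.jar:classes \
   -Dtangosol.coherence.cacheconfig=/path/to/coherence-cache-config.xml \
   -Dtangosol.pof.config=/path/to/pof-config.xml \
   -Dtangosol.coherence.distributed.localstorage=false \
   com.acme.architecture.multichannel.test.Application

Executing this code will give you an output similar to this: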

2013-08-15 23:37:53.058/1.378 Oracle Coherence 12.1.2.0.0 <Info> (thread=Main Thread, member=n/a): Loaded operational configuration from "jar:file:/C:/mw-home/coherence/lib/coherence.jar!/tangosol-coherence.xml"
2013-08-15 23:37:53.093/1.413 Oracle Coherence 12.1.2.0.0 <Info> (thread=Main Thread, member=n/a): Loaded operational overrides from "jar:file:/C:/mw-home/coherence/lib/coherence.jar!/tangosol-coherence-override-dev.xml"
2013-08-15 23:37:53.093/1.413 Oracle Coherence 12.1.2.0.0 <D5> (thread=Main Thread, member=n/a): Optional configuration override "/tangosol-coherence-override.xml" is not specified
2013-08-15 23:37:53.093/1.413 Oracle Coherence 12.1.2.0.0 <D5> (thread=Main Thread, member=n/a): Optional configuration override "cache-factory-config.xml" is not specified
2013-08-15 23:37:53.093/1.413 Oracle Coherence 12.1.2.0.0 <D5> (thread=Main Thread, member=n/a): Optional configuration override "cache-factory-builder-config.xml" is not specified
2013-08-15 23:37:53.093/1.413 Oracle Coherence 12.1.2.0.0 <D5> (thread=Main Thread, member=n/a): Optional configuration override "/custom-mbeans.xml" is not specified

Oracle Coherence Version 12.1.2.0.0 Build 44396
 Grid Edition: Development mode
Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

2013-08-15 23:37:53.380/1.700 Oracle Coherence GE 12.1.2.0.0 <Info> (thread=Main Thread, member=n/a): Loaded cache configuration from "file:/C:/poc-itau-brms/resources/coherence-cache-config.xml"
2013-08-15 23:37:55.615/3.935 Oracle Coherence GE 12.1.2.0.0 <Info> (thread=Main Thread, member=n/a): Created cache factory com.tangosol.net.ExtensibleConfigurableCacheFactory
2013-08-15 23:37:56.206/4.526 Oracle Coherence GE 12.1.2.0.0 <D4> (thread=Main Thread, member=n/a): TCMP bound to /10.0.3.15:8090 using SystemDatagramSocketProvider
2013-08-15 23:37:56.528/4.848 Oracle Coherence GE 12.1.2.0.0 <Info> (thread=Cluster, member=n/a): Failed to satisfy the variance: allowed=16, actual=54
2013-08-15 23:37:56.528/4.848 Oracle Coherence GE 12.1.2.0.0 <Info> (thread=Cluster, member=n/a): Increasing allowable variance to 20
2013-08-15 23:37:56.885/5.205 Oracle Coherence GE 12.1.2.0.0 <Info> (thread=Cluster, member=n/a): This Member(Id=2, Timestamp=2013-08-15 23:37:56.653, Address=10.0.3.15:8090, MachineId=2319, Location=site:,process:3484, Role=AcmeArchitectureApplication, Edition=Grid Edition, Mode=Development, CpuCount=3, SocketCount=3) joined cluster "cluster:0x50DB" with senior Member(Id=1, Timestamp=2013-08-15 21:16:58.769, Address=10.0.3.15:8088, MachineId=2319, Location=site:,process:3000, Role=CoherenceServer, Edition=Grid Edition, Mode=Development, CpuCount=3, SocketCount=3)
2013-08-15 23:37:57.189/5.509 Oracle Coherence GE 12.1.2.0.0 <Info> (thread=Main Thread, member=n/a): Started cluster Name=cluster:0x50DB

Group{Address=224.12.1.0, Port=12100, TTL=4}

MasterMemberSet(
  ThisMember=Member(Id=2, Timestamp=2013-08-15 23:37:56.653, Address=10.0.3.15:8090, MachineId=2319, Location=site:,process:3484, Role=AcmeArchitectureApplication)
  OldestMember=Member(Id=1, Timestamp=2013-08-15 21:16:58.769, Address=10.0.3.15:8088, MachineId=2319, Location=site:,process:3000, Role=CoherenceServer)
  ActualMemberSet=MemberSet(Size=2
    Member(Id=1, Timestamp=2013-08-15 21:16:58.769, Address=10.0.3.15:8088, MachineId=2319, Location=site:,process:3000, Role=CoherenceServer)
    Member(Id=2, Timestamp=2013-08-15 23:37:56.653, Address=10.0.3.15:8090, MachineId=2319, Location=site:,process:3484, Role=AcmeArchitectureApplication)
    )
  MemberId|ServiceVersion|ServiceJoined|MemberState
    1|12.1.2|2013-08-15 21:16:58.769|JOINED,
    2|12.1.2|2013-08-15 23:37:56.653|JOINED
  RecycleMillis=1200000
  RecycleSet=MemberSet(Size=0
    )
  )

TcpRing{Connections=[1]}
IpMonitor{Addresses=0}

2013-08-15 23:37:57.243/5.581 Oracle Coherence GE 12.1.2.0.0 <Info> (thread=Cluster, member=2): Loaded POF configuration from "file:/C:/poc-itau-brms/resources/pof-config.xml"
2013-08-15 23:37:57.261/5.581 Oracle Coherence GE 12.1.2.0.0 <Info> (thread=Cluster, member=2): Loaded included POF configuration from "jar:file:/C:/mw-home/coherence/lib/coherence.jar!/coherence-pof-config.xml"
2013-08-15 23:37:57.386/5.706 Oracle Coherence GE 12.1.2.0.0 <D5> (thread=Invocation:Management, member=2): Service Management joined the cluster with senior service member 1
2013-08-15 23:37:57.494/5.814 Oracle Coherence GE 12.1.2.0.0 <Info> (thread=Main Thread, member=2): Loaded Reporter configuration from "jar:file:/C:/mw-home/coherence/lib/coherence.jar!/reports/report-group.xml"
2013-08-15 23:37:58.012/6.332 Oracle Coherence GE 12.1.2.0.0 <D5> (thread=DistributedCache:DistribService, member=2): Service DistribService joined the cluster with senior service member 1

   ---> Number of Promotions: 2
   ---> Elapsed Time: 53 ms
   ---> Number of Promotions: 1
   ---> Elapsed Time: 0 ms

2013-08-15 23:37:58.137/6.457 Oracle Coherence GE 12.1.2.0.0 <D4> (thread=ShutdownHook, member=2): ShutdownHook: stopping cluster node

As you can see in the output, the number of promotions shown reveals that the business rules were actually executed, since no promotions were provided when the customer objects were instantiated. The output also tells us another important thing: transaction latency. For the first cache entry we got 53 ms of overall latency, quite short if you consider what happened behind the scenes. But the second cache entry is much faster, reported as 0 ms of latency. This means that the actual time needed to execute the entire transaction was below one millisecond, giving us a real sub-millisecond latency scenario, measurable in microseconds.

Highly Scalable Business Rules

It is not so obvious when you see this implementation for the first time, but another important aspect of this design is scalability. Since the cache type we used was the distributed one, also known as partitioned, the cache entries are equally distributed among all available Coherence nodes. If we use only one node, that node will of course handle the entire dataset by itself. But if we use four nodes, each node will handle 25% of the dataset. This means that if we insert one million customer objects into the cache, each node will handle only 250K customers.

This type of data storage offers a huge benefit to Oracle Business Rules: true data load distribution. Remember that I said before that each Coherence node will hold one DecisionPoint instance? Since each node handles only a percentage of the entire dataset, it is reasonable to conclude that each node will fire rules only for the data that it manages. This happens because Coherence interceptors are executed in the JVM where the data lives, not across the entire data grid, since this is not distributed processing. For instance, if customer "A" is primarily stored in "JVM 1", and this customer "A" has its fields updated by a client application, business rules will be fired and executed only in "JVM 1". The other JVMs will not execute any business rules. This means that the CPU overhead can be balanced across the cluster of servers, allowing the In-Memory Data Grid to scale horizontally, using the overall compute power of the different servers available in the cluster.

API Transparency and Multiple Programming Language Support

Once Oracle Business Rules is encapsulated in Coherence through an interceptor, there is another great advantage to this design: API transparency. Developers don't need to write custom code to interact with Oracle Business Rules. In fact, they don't even need to know that business rules are being executed when objects are written to Coherence. Since everything happens behind the scenes, this approach frees developers from extra complexity, allowing them to work in a purely data-oriented fashion, which is very productive and less error prone.

And because Oracle Coherence offers not only a Java API to interact with the In-Memory Data Grid, but also C++, .NET and REST APIs, you can leverage several types of clients and applications to trigger business rules executions. In fact, I have created a very small C++ application using Microsoft Visual Studio to test this behavior. The application code below inserts 1K customers into the In-Memory Data Grid, with an average transaction latency of ~5 ms, using a VM with 3 vCores and 10 GB of RAM.

#include "stdafx.h"
#include <windows.h>
#include <cstring>
#include <iostream>
#include <sstream> // required for std::ostringstream used below
#include "Customer.hpp"

#include "coherence/lang.ns"
#include "coherence/net/CacheFactory.hpp"
#include "coherence/net/NamedCache.hpp"

using namespace coherence::lang;
using coherence::net::NamedCache;
using coherence::net::CacheFactory;

int NUMBER_OF_ENTRIES = 1000;

__int64 currentTimeMillis()
{
	static const __int64 magic = 116444736000000000;
	SYSTEMTIME st;
	GetSystemTime(&st);
	FILETIME   ft;
	SystemTimeToFileTime(&st,&ft);
	__int64 t;
	memcpy(&t,&ft,sizeof t);
	return (t - magic)/10000;
}

int _tmain(int argc, _TCHAR* argv[])
{

	std::string _customerKey;
	String::View customerKey;
	__int64 startTime = 0, endTime = 0;
	__int64 elapsedTime = 0;

	// Handle to the remote "customers" cache; note the plural name,
	// which the read/write loop below relies on.
	NamedCache::Handle customers = CacheFactory::getCache("customers");

	startTime = currentTimeMillis();
	for (int i = 0; i < NUMBER_OF_ENTRIES; i++)
	{
		std::ostringstream stream;
		stream << "customer-" << (i + 1);
		_customerKey = stream.str();
		customerKey = String::create(_customerKey);
		std::string firstName = _customerKey;
		std::string agency = "3158";
		std::string account = "457899";
		char custType = 'P';
		double balance = 98000;
		Customer customer(customerKey, firstName,
			agency, account, custType, balance);
		Managed<Customer>::View customerWrapper = Managed<Customer>::create(customer);
		customers->put(customerKey, customerWrapper);
		Managed<Customer>::View result =
			cast<Managed<Customer>::View> (customers->get(customerKey));
	}
	endTime = currentTimeMillis();
	elapsedTime = endTime - startTime;

	std::cout << std::endl;
	std::cout << "   Elapsed Time..................: "
		<< elapsedTime << " ms" << std::endl;
	std::cout << std::endl;

	CacheFactory::shutdown();
	getchar();
	return 0;

}

An Alternative Version of the Interceptor for MDS Scenarios

The interceptor created in this article uses the Oracle Business Rules Java API to read the dictionary directly from the file system. This approach implies two things: first, that the repository of the dictionary will be the file system; second, that the authoring and management of the dictionary will be done through JDeveloper. This can lead to some loss of the BRMS power, since business users won't feel comfortable authoring their rules in a technical environment such as JDeveloper, and administrators won't be able to see who changed what, since virtually anyone can open the file in JDeveloper and change its contents.

A better way to manage this is to store the dictionary in an MDS repository, which is part of the Oracle SOA Suite platform. Storing the dictionary in the MDS repository allows business users to interact with business rules through the SOA Composer, a very nice web tool that is much simpler and easier to use than JDeveloper. Administrators can also track down changes, since everything in MDS is audited, transaction-based and securely controlled: you have to log in to the console first to get access to the composer.

I have implemented another version of the interceptor, making full use of the power of Oracle SOA Suite and MDS repositories. The MDSRulesInterceptor.java implementation has been tested for over a month and is performing quite well, just like the FSRulesInterceptor.java implementation. In the future, I will post that implementation here, but for now just keep in mind the powerful things that can be done with Oracle Business Rules and the Coherence In-Memory Data Grid. Oracle Fusion Middleware really rocks, doesn't it?


Saturday Jul 13, 2013

Upgrading to Coherence for C++ 12.1.2 and The "Ambiguous Compilation Error" in Types Derived From AbstractAggregator

This weekend, I started the migration of some old C++ applications built on top of the Coherence for C++ 3.7.1 API to its newest and refreshed version, released a few days ago. Version 12.1.2 introduces a lot of cool changes and features in the Coherence product, not to mention the improvements done in different areas like installation, WebLogic integration, TCMP, Exabus, REST and Coherence*Extend. If you are interested in those changes and improvements, check out the documentation at the following link, and please join us for the live virtual launch event on July 31st. Registration here.

After a couple of hours migrating my projects from the old version to the new one, I was surprised by the following compiler message when trying to build the code:

call of overloaded 'Float64Sum(const coherence::lang::TypedHandle<coherence::util::extractor::ReflectionExtractor> &)' is ambiguous 

If you are experiencing the same compiler message... don't worry. There is a quick and clean solution for this. This compiler error happens because in version 12.1.2 of the Coherence for C++ API, an overloaded version of the create() factory method was introduced in the types derived from the coherence::util::aggregator::AbstractAggregator class.

Up until version 3.7.1, the only way to instantiate an AbstractAggregator object was to pass an instance of a coherence::util::ValueExtractor to its factory method. Now you also have the option of passing an instance of a coherence::lang::String::View. This object should contain the name of the attribute to be aggregated. Automatically and behind the scenes, the Coherence for C++ API will create a coherence::util::ValueExtractor for you.

In my case, I've changed my code from this:

Float64Sum::Handle scoreAggregator = Float64Sum::create(ReflectionExtractor::create("getScore")); 

To this:

Float64Sum::Handle scoreAggregator = Float64Sum::create("getScore"); 

And my project compiled completely again, using both the GNU G++ and MS Visual C++ compilers.

Hope this blog helps you eliminate some hours of code debugging :-)

Tuesday Mar 19, 2013

Creating Scalable Fast Data Applications using Oracle Event Processing Platform (Setting Up an Active-Active Oracle CEP Domain)

This article covers some technical aspects that should be considered if you are involved in serious implementations of Oracle CEP, the technical foundation of Oracle's strategy for Fast Data, called the Oracle Event Processing Platform. It is expected that you have some basic knowledge about Oracle CEP and JMS, and some knowledge about programming in Java.

Fast Data and the Concern with Scalability

There is no such thing as an application that isn't meant to grow. Every application, even the simplest one, should expect some growth across the months or years it is up and running. Growth is a consequence of many things: organizational growth; application maturity, which in turn gives users more confidence to use it; a marketing campaign that worked and brought in many more clients than expected; more front ends enabling people to interact with your application through other types of devices; exponential generation of data from the social networks your application is configured to listen to; a market opportunity that demands more of your software; or perhaps just natural growth of the installed user base.

No matter the source of growth, your application needs to be ready to scale up. And this is true regardless of which architectural style you consider as your strategy. Of course, there are some architectural styles that suggest moderate growth, like the client-server style or maybe the monolithic style. But take, for instance, the SOA ("Service-Oriented Architecture") style. The basic concept behind this architectural style is the reuse of services, which are the building blocks of the functional architecture, representing the business knowledge (standards, culture, procedures, routines) of an organization in the form of reusable functions. The more reuse grows, the more scalable your SOA foundation must be. You virtually can't predict the level of reuse of your services, but the key thing is, you should design your services to really scale up.

Another great example of an architectural style that needs to be designed to scale up is EDA ("Event-Driven Architecture"), which basically deals with the processing of heterogeneous events coming from different sources, with different message formats and, more importantly, with event channels that could potentially generate events with different throughputs, frequencies and volumes. In the SOA architectural style this can happen too, of course, but the scary thing about EDA is that you don't necessarily deal with fixed message schemas, nor with well-known message contracts. Prior knowledge of message contracts and schemas gives you the ability to predict the message size that the hardware infrastructure must deal with, an important requirement when you are sizing an infrastructure based on reasonable levels of reuse, as in the case of SOA.

As mentioned earlier, in the EDA architectural style you cannot predict the message schema or contract of your events. It can be virtually any message format containing structured and/or unstructured content. A good event-driven solution must be able to deal with this kind of situation, making the task of sizing an ideal hardware infrastructure really tough. Sizing ideal hardware for an event-driven solution is a combination of science and imagination. There are a lot of things to consider, a lot of scenarios to evaluate, a lot of hardware and/or software failures to predict, and a huge number of situations that could potentially stop your application from running due to hardware resource limitations, even in the first five hours in production. Believe me, it is really tough.

Designing an event-driven solution that is ready to scale up demands more than the regular architect role we find nowadays. It requires deep knowledge of the problem domain, of distributed systems, of server systems (and perhaps engineered systems), of enterprise integration patterns, and of the software stacks used to build the solution, regardless of whether they are proprietary, open source or a combination of both.

Why do you need to worry about scalability? Because ten years ago the market demanded event-driven solutions prepared to handle hundreds of events per second. Today we build event-driven solutions that must handle thousands of events per second. Fast Data, one of the new buzzwords of IT, demands event-driven solutions that can handle millions of events per second. My advice to any architect responsible for an event-driven solution is to treat scalability as a main goal, just like the problem domain to be solved.

Business Scenario for Scalability Study on Oracle CEP

Let's start our study of applying scalability in EDA by considering a business scenario. Imagine that you are designing an EDA solution that combines technologies like CEP, BAM and JMS to deliver near real-time business monitoring of KPIs ("Key Performance Indicators") for a financial services company. All the information needed to process the KPIs comes from a JMS channel that must be listened to by a specialized adapter. Those JMS messages will be the business events. Inside every business event there is information about payment transactions, containing data like the total amount paid, the credit card brand, etc. The idea here is to process those payment transactions as they happen, in order to generate valuable KPIs that can be monitored through a near real-time monitoring solution like BAM. For simplicity's sake, let's consider only one KPI and concentrate our focus on the CEP layer, which is responsible for the KPI compilation and aggregation. The example KPI will be the total count of payment transactions per second.

In order to compute this KPI, the CEP layer must execute the count aggregation function on the stream of events, considering only the events of the last second. This means that this KPI will be compiled and aggregated every second (a 1000-millisecond time window) and the output will also be generated every second. The EPN ("Event Processing Network") of this business scenario looks something like this:


Reading this EPN is not that complicated: you basically read the flow from left to right. The basic idea behind this EPN is: listen to the business events through a JMS adapter, put those events sequentially, based on their temporal order, into an event channel, compute the KPI over the stream of events using a processor, send the generated output event (the KPI itself) to a new event channel and finally, print the KPI to the server output console using a custom adapter.

The event model of this EPN is composed of two simple event types. The first event type is the payment transaction, which acts in the EPN as the event source. This event type contains three fields: a dateTime field that tells you the exact moment the payment transaction occurred, an amount field that reveals the amount paid for a product and/or service, and a brand field that tells you which credit card type was used in the payment transaction. The second event type is the transactions-per-second KPI, which acts in this EPN as the complex event. The only field this event type has is totalCountTPS, which represents the computed value of the KPI. The following UML class diagram summarizes this event model.
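In code, this event model boils down to two plain JavaBeans. The sketch below uses the field names described above (accessors omitted for brevity):

// PaymentTransaction.java -- acts as the event source
public class PaymentTransaction {
    private long dateTime; // exact moment the transaction occurred
    private double amount; // amount paid for a product and/or service
    private String brand;  // credit card type used
    // getters and setters omitted...
}

// TransactionsPerSecondKPI.java -- acts as the complex event
public class TransactionsPerSecondKPI {
    private long totalCountTPS; // computed value of the KPI
    // getters and setters omitted...
}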


All the payment transactions are received through a built-in JMS adapter, called paymentTransactionsJMSAdapter in the EPN. This adapter is configured to listen to a JMS destination through a dedicated connection factory. The listing below is the configuration file for this JMS adapter.

<?xml version="1.0" encoding="UTF-8"?>
<wlevs:config xmlns:wlevs="http://www.bea.com/ns/wlevs/config/application">

	<jms-adapter>
		<name>paymentTransactionsJMSAdapter</name>
		<event-type>PaymentTransaction</event-type>
		<jndi-provider-url>t3://localhost:7001</jndi-provider-url>
		<connection-jndi-name>jmsConnectionFactory</connection-jndi-name>
		<destination-jndi-name>distributedQueue</destination-jndi-name>
		<concurrent-consumers>24</concurrent-consumers>
	</jms-adapter>
	
</wlevs:config>

The processor that computes the KPI is also very simple. It basically counts the events coming from the event channel, considering only those events that belong to one whole second. It also filters out events that lack meaningful values in the amount and brand fields, to prevent the KPI from being computed based on dirty events. The following CQL ("Continuous Query Language") statement is used to compute the KPI:

SELECT COUNT(dateTime) AS totalCountTPS
FROM paymentTransactionsChannel [RANGE 1 SECOND SLIDE 1 SECOND]
WHERE amount > 0 AND brand IS NOT NULL

Finally, let's take a look at the last part of the EPN, the custom adapter created to print the results of the TransactionsPerSecondKPI event type to the server output console. As I mentioned earlier, for simplicity's sake I will not show how this event would be monitored in BAM. Instead, I have created a custom adapter using the Oracle CEP adapter API to print to the server output console the content of the totalCountTPS field present in the TransactionsPerSecondKPI event type. The listing below is the Java implementation of this custom console adapter.

package com.oracle.cep.examples.ha;

import com.bea.wlevs.ede.api.EventProperty;
import com.bea.wlevs.ede.api.EventRejectedException;
import com.bea.wlevs.ede.api.EventType;
import com.bea.wlevs.ede.api.EventTypeRepository;
import com.bea.wlevs.ede.api.StreamSink;
import com.bea.wlevs.util.Service;

public class ConsoleAdapter implements StreamSink {

	private EventTypeRepository eventTypeRepository;

	@Override
	public void onInsertEvent(Object event) throws EventRejectedException {
		EventType eventType = eventTypeRepository.getEventType("TransactionsPerSecondKPI");
		EventProperty eventProperty = eventType.getProperty("totalCountTPS");
		System.out.println("   ---> [Console] Total Count of TPS: " + eventProperty.getValue(event));
	}

	@Service
	public void setEventTypeRepository(EventTypeRepository eventTypeRepository) {
		this.eventTypeRepository = eventTypeRepository;
	}

}

As you can see in the code, this custom adapter doesn't do anything special. It just accesses the totalCountTPS field from the TransactionsPerSecondKPI event type and prints the value to the server output console. The idea here is simply to monitor the computation of the KPI in near real-time, in order to test the behavior of Oracle CEP when running in a single JVM or when running clustered across multiple JVMs.

Creating a Testing Environment to Simulate the Payment Transactions

Now that we are all set regarding the business scenario, we can start the tests. You need to create a simple Oracle CEP domain to test this application. You will also need a JMS provider to host the JMS destination. I would recommend Oracle WebLogic as the JMS provider, but feel free to use any JMS provider compliant with JMS 1.1. Set up your JMS provider, create a queue to be used in the tests, and configure your Oracle CEP JMS adapter to listen to this queue. Later in this article I will discuss the difference between using queues and topics, but for now let's just focus on the functional testing.

Implement the following Java program in your development environment. The program is just an example of how to send JMS messages to a queue, and it assumes that you are connecting to a WebLogic JMS domain. If you prefer to use another JMS provider, you should adapt this program to connect correctly to your host system.

package com.oracle.cep.examples.ha;

import java.util.Hashtable;
import java.util.Random;

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MapMessage;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.naming.InitialContext;

public class EventsGenerator {

	public static void main(String[] args) {
		
		final String[] brands = {"Visa", "Mastercard",
				"Dinners", "American Express"};
		final Random random = new Random();
		
		InitialContext jndi = null;
		Hashtable<String, String> params = null;
		ConnectionFactory jmsConnectionFactory = null;
		Queue distributedQueue = null;
		
		Connection jmsConn = null;
		Session session = null;
		MessageProducer msgProd = null;
		MapMessage message = null;
		
		try {
			
			params = new Hashtable<String, String>();
			params.put(InitialContext.PROVIDER_URL, "t3://localhost:7001");
			params.put(InitialContext.INITIAL_CONTEXT_FACTORY,
					"weblogic.jndi.WLInitialContextFactory");
			jndi = new InitialContext(params);
			jmsConnectionFactory = (ConnectionFactory) jndi.lookup("jmsConnectionFactory");
			distributedQueue = (Queue) jndi.lookup("distributedQueue");
			
			jmsConn = jmsConnectionFactory.createConnection();
			session = jmsConn.createSession(false, Session.AUTO_ACKNOWLEDGE);
			msgProd = session.createProducer(distributedQueue);
			
			long startTime, endTime = 0;
			long elapsedTime = 0;
			
			for (;;) {
				startTime = System.currentTimeMillis();
				for (int i = 0; i < 10; i++) {
					message = session.createMapMessage();
					message.setLong("dateTime", System.currentTimeMillis());
					message.setDouble("amount", random.nextInt(1000));
					message.setString("brand", brands[random.nextInt(brands.length)]);
					msgProd.send(message);
				}
				endTime = System.currentTimeMillis();
				elapsedTime = endTime - startTime;
				if (elapsedTime < 1000) {
					Thread.sleep(1000 - elapsedTime);
				}
				System.out.println("Tick...");
			}
			
		} catch (Exception ex) {
			ex.printStackTrace();
		} finally {
			
			try {
				if (msgProd != null) { msgProd.close(); }
				if (session != null) { session.close(); }
				if (jmsConn != null) { jmsConn.close(); }
			} catch (Exception ex) {
				ex.printStackTrace();
			}
			
		}
		
	}

}

What this program does is continuously send ten messages per second to the specified JMS queue. As you can see in the code, after sending the messages it pauses for the remainder of one second, taking into account the elapsed time spent sending the messages to the JMS queue. This program never ends, unless of course the user terminates the JVM.

Starting the Functional Tests with a Single Oracle CEP JVM

Make sure that your whole development environment is up and running, including your JMS provider and a fully operational Oracle CEP domain. Deploy the Oracle CEP application into this domain and run the client JMS application. A recorded result of this functional test is published on YouTube, so you can check out the results of this test.

In the following sections, we will explore two different approaches to scalability. These approaches will be applied to the near real-time KPI monitoring scenario, transforming it into a more scalable solution that can handle growth in the number of events just by adding more Oracle CEP JVMs across the same and/or multiple server systems.

The Simplest Scalability Approach Ever: Using JMS Queues

Whether you are designing an application that must deal with asynchronous calls, or are just worried about message delivery guarantees, using JMS queues is always a good choice. Consumer applications connected to a JMS queue work hard to read the most recent message, in FIFO style. This means that each consumer enters a race condition with the other consumers, competing to grab as many messages as possible. From the application consumer perspective this can be a hard problem, since there are no guarantees that a specific message will be consumed by a specific consumer; but from the JMS queue perspective, this is a really powerful scalability technique. The reason is that each consumer application will work hard to provide maximum throughput for the consumption of messages.

Imagine, for instance, a consumer application connected to a JMS queue, running on a server system with 24 CPU cores: two hardware chips with 6 cores each, using Intel Hyper-Threading technology. If we observe that this consumer application running on this type of hardware gives us an average throughput of 3,000 TPS ("Transactions per Second"), we can reasonably assume that putting another copy of the same consumer application on another server system with the same hardware configuration will give us an average throughput of 6,000 TPS. This is what we call horizontal scalability: you increase the throughput of a software-based application by adding more server systems that engage their hardware resources toward a specific goal, which in this case is consuming messages from a JMS queue as fast as possible.

This type of scalability technique delivers another important advantage: when you increase the total count of server systems running the consumer applications, you minimize the number of messages that each server system needs to handle. This seems a little bit contradictory, doesn't it, since in theory it should be "a good thing" for each server system to handle as many messages as possible. Well, it is a good thing, but like anything else in software architecture, there are trade-offs. Consider, for instance, each consumer application running on the same hardware configuration. In this scenario, it is reasonable to think that each server system will handle the same number of messages on average, because each server system puts its hardware resources to work equally and, due to the nature of JMS queues (FIFO-style consumption), the total number of messages in the queue will be divided by the number of server systems available.

The problem here is the memory footprint of each JVM running the consumer application. If you are consuming messages from the JMS queue with fewer server systems, the total number of messages that each server system needs to handle will be higher. Since each received message is allocated in the heap memory of the JVM, the total size of the heap will increase. Did you know that a javax.jms.MapMessage with a payload of 256 bytes can allocate more than 400 bytes in the heap space? Imagine all those messages being received by your consumer application at the same time: you could reach the maximum size of your heap in a matter of seconds. And the problem with reaching the maximum heap size of a JVM is the inevitable execution of the garbage collector. When the JVM detects that the heap memory is full (or almost full, depending on the algorithm used) or too fragmented, it engages the garbage collector threads to reclaim the allocated memory and/or rearrange the heap layout. Depending on the state of the heap memory, the JVM may use almost all the CPU cores of the server system to accomplish this task, and that's when your consumer application starts to run slower than usual, presenting performance issues and becoming a system bottleneck.

Let's consider the usage of JMS queues in our scenario of near real-time business monitoring of KPIs. Each Oracle CEP JVM would be connected to a JMS queue that holds the payment transaction messages, acting as a consumer application through its JMS adapter. Considering that each Oracle CEP JVM (or a group of them) would run on a separate server system, we can assume that increasing the number of server systems will increase the average throughput of message consumption from the JMS queue, and also that each Oracle CEP JVM will handle a reasonable number of messages in its JVM heap. Enough theory; let's see in practice how this scalability approach can be applied to our business scenario.

The first thing to do is transform your Oracle CEP domain into a multi-server, clustered domain. You can find comprehensive information in the product documentation to help you do this, but I will highlight the main steps for you here.

In the root directory of your Oracle CEP domain, you will find a sub-directory called "defaultserver". As the name suggests, this is your default server, created automatically during domain creation. For development and staging environments, this server is more than enough. But if you want to expand your domain, you will need to change that. Rename the default server directory from "defaultserver" to "server1". After that, make two copies of this directory and call the newly created directories "server2" and "server3", respectively.

Now you have to set up some aspects of the cluster. Open the configuration file of server1 in a text editor; it can be found at <DOMAIN_HOME>/server1/config/config.xml. With the configuration file open, you will have to change the "netio" and "cluster" tags. Change the port value of the "NetIO" component to "9001", and the port value of the "sslNetIo" component to "9011".
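For reference, a sketch of what those two elements might look like after the change (only the name and port children are shown; keep any other child elements of your config.xml as they are):

<netio>
   <name>NetIO</name>
   <port>9001</port>
</netio>

<netio>
   <name>sslNetIo</name>
   <port>9011</port>
</netio>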

In the cluster tag you have to define how server1 will behave together with the other members. Edit the cluster tag according to the listing below:

<cluster>
    <server-name>server1</server-name>
    <multicast-address>239.255.0.1</multicast-address>
    <groups>clusteredServers</groups>
    <enabled>true</enabled>
</cluster>

That's it. This is all that has to be done to transform a simple server into a cluster-aware member. Repeat the same steps for server2 and server3. Just keep in mind that if you plan to execute these servers on the same server system, you will need to define different ports for each server. Also remember that the "server-name" tag must match the server name you gave, which is the name of the server's directory.

One of the advantages of using JMS queues as your scalability approach is that you don't need to change anything in your Oracle CEP application. Just deploy the same application to all the new servers and start the tests again. Start server1 and server2 and run the client JMS application. You will see in the servers' output that each server generates the totalCountTPS KPI based on the number of events it receives from the JMS queue, which is the total count of events (ten events per second) divided by the total count of servers; in this case two, resulting in five events per server on average. If you start server3 while events are being processed, you will see that the number of events each server handles decreases, which is evidence that the scalability is really working, because the servers partitioned the event load.

A recorded result of this second test is also available on YouTube. Watch the video below to see how using JMS queues affects the scalability of your Oracle CEP applications.

So far, it seems that using JMS queues as your scalability approach is the right thing to do, right? In fact, for the most demanding scenarios, this approach should be enough. But you should be aware of one catch: in-flight events can be missed during failures. By "in-flight" I mean all those events that have been received by the JMS adapter and for which an acknowledgment has been sent back to the JMS provider. At that moment, the message is no longer in the JMS queue and, more importantly, no other Oracle CEP JVM is aware of that event. This means that there is no recovery of events during failures in this approach. If you cannot tolerate missing events, using JMS queues for scalability is not the most reliable approach. But if your scenario can tolerate missing events, don't think twice and choose this approach, since it is simpler and does not require changing the Oracle CEP application.

Reliability Really Matters: Complicating Things Just a Little Bit with JMS Topics

OK, so you realized that you cannot tolerate missing events, and you need to improve the reliability of your Oracle CEP application as much as possible. The road to achieve this is quite a bit longer than just using JMS queues. There are some challenges that you need to overcome in order to reach this level of reliability, where no missing events can be tolerated, without of course putting scalability aside. The challenges you will need to overcome are:

  • Provide guarantees that all of the JVMs are aware of all events
  • Equally distribute the load of events across all of the JVMs
  • Ensure that even in-flight events are completely synchronized
  • Provide backups for every cluster member to ensure HA

The good news is, all those challenges can be overcome fairly easily, and the Oracle CEP product provides native support for all that stuff. The bad news is, compared to the previous approach, the final solution starts looking more complicated in terms of design and product configuration. Let's learn how to apply this type of scalability approach (with the highest level of reliability that exists) in our near real-time business monitoring of KPIs scenario.

The situation has now changed. You need to make the solution available in two different data centers, working in an active-active scheme and with high availability assured in case of failure of the primary servers. No event can be missed, since the operational people from the financial services company must rely on the KPIs to make important decisions.

The two data centers are located in the state of California, but are geographically separated. The first one is based in Redwood, and the other one in Palo Alto. These data centers are connected through a high-bandwidth fiber-optic link of 10 Gb/s.


There are some changes that need to take place both in the Oracle CEP application and in the Oracle CEP domain. Let's start with the Oracle CEP application changes. First, you need to modify the EPN assembly descriptor file to include an instance of the following Oracle CEP component: com.oracle.cep.cluster.hagroups.ActiveActiveGroupBean. This special component dynamically manages the group subscriptions available to partition the incoming events between the cluster members.

Secondly, you need to synchronize all the events with all members that belong to the cluster, including in-flight events. This can be done explicitly using the special HA adapters that Oracle CEP makes available. Change your EPN flow to include, just after your input adapter, an instance of an HA adapter. This adapter needs to know which property of your event type carries the information about its age, so you need to configure the "timeProperty" property on it. This is necessary to guarantee that even in case of failure of one cluster member, the ordering of the events won't be affected. You will also need to create an instance of an HA correlating adapter, just before your output adapter. After these changes, your EPN assembly descriptor should include the following components:

<wlevs:adapter id="haInboundAdapter" provider="ha-inbound">
    <wlevs:listener ref="paymentTransactionsChannel" />
    <wlevs:instance-property name="timeProperty" value="dateTime" />
</wlevs:adapter>

<wlevs:adapter id="haCorrelatingAdapter" provider="ha-correlating">
    <wlevs:listener ref="consoleAdapter" />
    <wlevs:instance-property name="failOverDelay" value="2000" /> 
</wlevs:adapter>
	
<bean id="clusterAdapter"
      class="com.oracle.cep.cluster.hagroups.ActiveActiveGroupBean" />

Now here comes the most important change of this approach, which is changing the JMS destination from a queue to a topic. This change should be executed both at your JMS provider (because you need to explicitly create a topic endpoint) and in your JMS adapter configuration file. Using JMS topics guarantees that all of the JVMs are aware of all events. You will also need to define an event partitioning criterion in your JMS adapter configuration file. Since topic consumers act more like subscribers than plain consumers, every consumer application receives a copy of all events sent to the topic endpoint. This means that an automatic partition of the events won't happen, but it needs to occur in order to provide real scalability.

Thanks to the JMS technology, there is a way to apply criteria during message consumption, which is using selectors. Selectors give you the possibility to apply the Content-Based Router EAI pattern based on existing header/property values. Change the JMS adapter configuration file to include selector criteria for the message consumption:

<?xml version="1.0" encoding="UTF-8"?>
<wlevs:config xmlns:wlevs="http://www.bea.com/ns/wlevs/config/application">

	<jms-adapter>
		<name>paymentTransactionsJMSAdapter</name>
		<event-type>PaymentTransaction</event-type>
		<jndi-provider-url>t3://localhost:7001</jndi-provider-url>
		<connection-jndi-name>jmsConnectionFactory</connection-jndi-name>
		<destination-jndi-name>distributedTopic</destination-jndi-name>
		<message-selector>${CONDITION}</message-selector>
		<bindings>
			<group-binding group-id="ActiveActiveGroupBean_Redwood">
				<param id="CONDITION">site = 'rw'</param>
			</group-binding>
			<group-binding group-id="ActiveActiveGroupBean_PaloAlto">
				<param id="CONDITION">site = 'pa'</param>
			</group-binding>
		</bindings>
	</jms-adapter>
	
</wlevs:config>

Let's understand the changes. The group binding entries provided tell the JMS adapter which events to listen to. The first group binding says that messages containing "rw" in the site property should be listened to only by the Redwood site. The second group binding says that messages containing "pa" in the site property should be listened to only by the Palo Alto site. Each group binding is associated with a group id, which determines which servers will be able to receive those events. For instance, in the case of the first group binding, only servers associated with the group "ActiveActiveGroupBean_Redwood" will be able to receive events that belong to the Redwood site. This configuration ensures that the load of events will be equally distributed across all the JVMs. Let's start the configuration of the Oracle CEP domain.

In the root directory of your Oracle CEP domain, create four copies from one of your current servers, naming them "rw1", "rw2", "pa1" and "pa2" respectively. Since the solution must be available in two different data centers, one in Redwood and another in Palo Alto, these four servers will sustain this topology. The rw1 and rw2 servers will be the Redwood servers, with rw1 as the primary and rw2 as its backup. The pa1 and pa2 servers will be the Palo Alto servers, with pa1 as the primary and pa2 as its backup. The idea here is to provide active-active load balancing between servers rw1 and pa1, each one having its backup on its own site. Those backups will ensure high availability for every cluster member. The final topology should be something similar to this:

You need to change each server's configuration file in order to make this topology work. Servers rw1 and rw2 should be part of the "ActiveActiveGroupBean_Redwood" group, and servers pa1 and pa2 should be part of the "ActiveActiveGroupBean_PaloAlto" group. Apply a configuration like the one below to each server's config.xml in the Oracle CEP domain:
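
A sketch for the rw1 server is shown below, assuming the same cluster tag structure used for server1 earlier (for rw2 change the server-name accordingly, and for pa1 and pa2 use "ActiveActiveGroupBean_PaloAlto" as the group):

<cluster>
    <server-name>rw1</server-name>
    <multicast-address>239.255.0.1</multicast-address>
    <groups>ActiveActiveGroupBean_Redwood</groups>
    <enabled>true</enabled>
</cluster>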

Don't forget to change the ports of the "NetIO" and "sslNetIo" components if you plan to execute all those servers on the same machine. In terms of the Oracle CEP domain, this is all the configuration necessary, so we are all set in terms of infrastructure. Before starting the tests, we need to adapt the client JMS application to provide two things. First, it must send the messages to a topic instead of a queue. Second, it must provide the "site" property on each message, in order to satisfy the message selection criteria defined in the JMS adapter configuration file. Implement the client JMS application according to the listing below.

package com.oracle.cep.examples.ha;

import java.util.Hashtable;
import java.util.Random;

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MapMessage;
import javax.jms.MessageProducer;
import javax.jms.Session;
import javax.jms.Topic;
import javax.naming.InitialContext;

public class EventsGenerator {
	
	private static final int REDWOOD_SITE = 1;
	private static final int PALOALTO_SITE = 2;
	private static final String REDWOOD = "rw";
	private static final String PALOALTO = "pa";

	public static void main(String[] args) {
		
		final String[] brands = {"Visa", "Mastercard",
				"Dinners", "American Express"};
		final Random random = new Random();
		
		InitialContext jndi = null;
		Hashtable<String, String> params = null;
		ConnectionFactory jmsConnectionFactory = null;
		Topic distributedTopic = null;
		Connection jmsConn = null;
		Session session = null;
		MessageProducer msgProd = null;
		MapMessage message = null;
		int siteId = REDWOOD_SITE;
		
		try {
			
			params = new Hashtable<String, String>();
			params.put(InitialContext.PROVIDER_URL, "t3://localhost:7001");
			params.put(InitialContext.INITIAL_CONTEXT_FACTORY,
					"weblogic.jndi.WLInitialContextFactory");
			jndi = new InitialContext(params);
			jmsConnectionFactory = (ConnectionFactory) jndi.lookup("jmsConnectionFactory");
			distributedTopic = (Topic) jndi.lookup("distributedTopic");
			
			jmsConn = jmsConnectionFactory.createConnection();
			session = jmsConn.createSession(false, Session.AUTO_ACKNOWLEDGE);
			msgProd = session.createProducer(distributedTopic);
			
			long startTime, endTime = 0;
			long elapsedTime = 0;
			
			// Emit ten events per second, alternating the "site" property
			// between Redwood ("rw") and Palo Alto ("pa") so that the JMS
			// selectors can split the load evenly between the two sites...
			for (;;) {
				startTime = System.currentTimeMillis();
				for (int i = 0; i < 10; i++) {
					message = session.createMapMessage();
					switch (siteId) {
						case REDWOOD_SITE:
							message.setStringProperty("site", REDWOOD);
							siteId = PALOALTO_SITE;
							break;
						case PALOALTO_SITE:
							message.setStringProperty("site", PALOALTO);
							siteId = REDWOOD_SITE;
							break;
						default:
							break;
					}
					message.setLong("dateTime", System.currentTimeMillis());
					message.setDouble("ammount", random.nextInt(1000));
					message.setString("brand", brands[random.nextInt(brands.length)]);
					msgProd.send(message);
				}
				endTime = System.currentTimeMillis();
				elapsedTime = endTime - startTime;
				if (elapsedTime < 1000) {
					Thread.sleep(1000 - elapsedTime);
				}
				System.out.println("Tick...");
			}
			
		} catch (Exception ex) {
			ex.printStackTrace();
		} finally {
			
			try {
				if (msgProd != null) { msgProd.close(); }
				if (session != null) { session.close(); }
				if (jmsConn != null) { jmsConn.close(); }
			} catch (Exception ex) {
				ex.printStackTrace();
			}
			
		}
		
	}

}

This version of the client JMS application didn't change much compared to its original version using queues, except for the use of the message property called site. This message property will be used by the selectors as the criterion to load-balance the incoming events across all the servers. In the real world, this type of message enrichment is commonly applied by an architectural mechanism that sits in front of the Oracle CEP JVMs, which could be an ESB, an application server or a corporate load balancer. Regardless of which mechanism you intend to use, in terms of architecture it is the responsibility of this mechanism to provide this message enrichment, another EAI pattern (the Content Enricher) that must be part of your final solution.

Now we can start the tests. Deploy this new version of the Oracle CEP application to the servers (rw1, rw2, pa1 and pa2) and run the client JMS application again. A recorded result of this third and last test is also available on YouTube. Watch the video below to see how using JMS topics affects the scalability of your Oracle CEP applications.

Conclusion

Today more than ever, solution architects and developers should be aware of which techniques can be applied in their solutions in order to provide real scalability. Trends like Fast Data are among the biggest motivators for that. This article discussed the importance of scalability, especially in event-driven solutions. Through a didactic business scenario, the article showed how to apply scalability in Oracle CEP applications, exploring the pros and cons of two different approaches. Finally, the article showed in detail how to implement the two approaches in the Oracle CEP example application.

Thursday Dec 06, 2012

JavaOne 2012 LAD Session: The Future of JVM Performance Tuning

Hi folks. This year, together with Oracle OpenWorld Latin America, came another edition of JavaOne Latin America, the most important Java event for the developer community. I would like to share with you the slides that I used in my session. The session was "The Future of JVM Performance Tuning" and the idea was to share some knowledge about the performance enhancements Oracle implemented in HotSpot, especially those related to GC ("Garbage Collection") and SDP ("Sockets Direct Protocol").

I hope you enjoy the content :)

Wednesday Aug 29, 2012

Integrating Coherence & Java EE 6 Applications using ActiveCache

OK, so you are a developer starting a new Java EE 6 application using the most wonderful features of the Java EE platform, like Enterprise JavaBeans, JavaServer Faces, CDI, JPA and other cool technologies. And your architecture needs to hold pieces of data in distributed caches to improve the application's performance, scalability and reliability?

If this is the scenario you are currently facing, maybe you should look closely at the solutions provided by Oracle WebLogic Server. Oracle has integrated WebLogic Server and its champion data caching technology called Oracle Coherence. This seamless integration between these two products provides a comprehensive environment to develop applications without the complexity of extra Java code to manage the cache as a dependency, since Oracle provides a DI ("Dependency Injection") mechanism for Coherence, the same DI mechanism available in standard Java EE applications. This feature is called ActiveCache. In this article, I will show you how to configure ActiveCache in WebLogic and in your Java EE application.

Configuring WebLogic to manage Coherence

Before you start changing your application to use Coherence, you need to configure your Coherence distributed cache. The good news is, you can manage all this stuff without writing a single line of XML or even Java. This configuration can be done entirely in the WebLogic administration console. The first thing to do is the setup of a Coherence cluster. A Coherence cluster is a set of Coherence JVMs configured to form one single view of the cache. This means that you can insert or remove members of the cluster without the client application (the application that produces or consumes data from the cache) knowing about the changes. This concept allows your solution to scale out without changing the application server JVMs. You can grow your application in the data grid layer alone.

To start the configuration, you need to configure a machine that points to the server on which you want to execute the Coherence JVMs. WebLogic Server allows you to do this very easily using the Administration Console. For this example, consider the machine name to be "coherence-server".

Remember that for the machine concept to work, you need to ensure that the NodeManager is running on the target server that the machine points to. The NodeManager script can be found in <WLS_HOME>/server/bin/startNodeManager.sh.

The next thing to do is to configure a Coherence cluster. In the WebLogic administration console, navigate to Environment > Coherence Clusters and click the "New" button.

In the field "Name", set the value to "my-coherence-cluster". Click in next.

Specify a valid cluster address and port. The Coherence members will communicate with each other through this address and port. This configuration section tells Coherence to form a cluster using unicast messaging instead of multicast, which is the default. Since unicast will be used, you need to configure a valid cluster address and cluster port.

The Coherence cluster has been configured successfully. Now it is time to configure the Coherence members and add them to the cluster. In the WebLogic administration console, navigate to Environment > Coherence Servers and click the "New" button.

In the field "Name" set the value to "coh-server-1". In the field "Machine", associate this new Coherence server to the machine "coherence-server". In the field "Cluster", associate this new Coherence server to the cluster named "my-coherence-cluster". Click in the "Finish" button.

For this example, I will configure only one Coherence server. This means that the Coherence cluster will be composed of only one member. In production scenarios, you will have thousands of Coherence members, all of them distributed across different machines with different configurations. The idea behind Coherence clusters is exactly this: form a virtual data grid composed of different members that join or leave the cluster dynamically.

Before starting the Coherence server, you need to configure its classpath. When the JVM of a Coherence server starts, it loads a couple of classes that need to be available at runtime. To configure the classpath of the Coherence server, click the Coherence server name. This action will bring up the configuration page of the Coherence server. Click the "Server Start" tab.

In the "Server Start" tab you will find a lot of fields that can be configured to control the behavior of the Coherence server JVM. In the "Class Path" field, you need to list the following JAR files:

- <WLS_HOME>/modules/features/weblogic.server.modules.coherence.server_12.1.1.0.jar

- <COHERENCE_HOME>/lib/coherence.jar

Remember to use a file separator valid for the target operating system that you are using. Click the "Save" button to update the configuration of the Coherence server.
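
For example, on a Linux machine the "Class Path" field would be the two entries above joined with a colon:

<WLS_HOME>/modules/features/weblogic.server.modules.coherence.server_12.1.1.0.jar:<COHERENCE_HOME>/lib/coherence.jar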

Now you are ready to go. Start the Coherence server using the "Control" tab of the WebLogic administration console. This will instruct WebLogic to send a command to the target machine. This command, once received by the target machine, will be responsible for starting a new JVM for the Coherence server, according to the parameters that we have configured.

Configuring your Java EE Application to Access Coherence

Now let's get to the fun part of the configuration. The first thing to do is to tell your Java EE application which Coherence cluster to join. Oracle has updated the WebLogic Server deployment descriptors, so you will not have to change your code or the container deployment descriptors like application.xml, ejb-jar.xml or web.xml.

In this example, I will show you how to enable DI ("Dependency Injection") of a Coherence cache in a Servlet 3.0 component. In the WEB-INF/weblogic.xml deployment descriptor, put the following metadata:

<?xml version="1.0" encoding="UTF-8"?>
<wls:weblogic-web-app
	xmlns:wls="http://xmlns.oracle.com/weblogic/weblogic-web-app"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd
	http://xmlns.oracle.com/weblogic/weblogic-web-app http://xmlns.oracle.com/weblogic/weblogic-web-app/1.4/weblogic-web-app.xsd">
	
	<wls:context-root>myWebApp</wls:context-root>
	
	<wls:coherence-cluster-ref>
		<wls:coherence-cluster-name>my-coherence-cluster</wls:coherence-cluster-name>
	</wls:coherence-cluster-ref>
	
</wls:weblogic-web-app>

As you can see, using the "coherence-cluster-name" tag, we are telling our Java EE application that it should join the "my-coherence-cluster" cluster when it loads in the web container. Without this information, the application will not be able to access the predefined Coherence cluster. It will form its own Coherence cluster without any members. So never forget to include this information.

Now put the coherence.jar and active-cache-1.0.jar dependencies in your WEB-INF/lib application classpath. You need to deploy these dependencies so ActiveCache can automatically take care of the Coherence cluster join phase. These dependencies can be found in the following locations:

- <WLS_HOME>/common/deployable-libraries/active-cache-1.0.jar

- <COHERENCE_HOME>/lib/coherence.jar

Finally, you need to write the access code for the Coherence cache in your Servlet. In the following example, we have a Servlet 3.0 component that accesses a Coherence cache named "transactions" and prints to the browser output the content (the ammount property) of one specific transaction.

package com.oracle.coherence.demo.activecache;

import java.io.IOException;

import javax.annotation.Resource;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import com.tangosol.net.NamedCache;

@WebServlet("/demo/specificTransaction")
public class TransactionServletExample extends HttpServlet {
	
	// ActiveCache injects the Coherence cache whose name matches the
	// mappedName attribute; Transaction is assumed to be an application
	// domain class with a getAmmount() accessor...
	@Resource(mappedName = "transactions") NamedCache transactions;
	
	protected void doGet(HttpServletRequest request,
			HttpServletResponse response) throws ServletException, IOException {
		
		int transId = Integer.parseInt(request.getParameter("transId"));
		Transaction transaction = (Transaction) transactions.get(transId);
		response.getWriter().println("<center>" + transaction.getAmmount() + "</center>");
		
	}

}

That's it! No more configuration is necessary and you are all set to start producing and getting data to/from Coherence. As you can see in the example code, the Coherence cache is treated as a normal dependency in the Java EE container. The magic happens behind the scenes when ActiveCache makes your application join the defined Coherence cluster.

The most interesting thing about this approach is that no matter which type of Coherence cache you are using (Distributed, Partitioned, Replicated, WAN-Remote), to the client application it is just a simple attribute member of type com.tangosol.net.NamedCache. And it's all managed by the Java EE container as a dependency. This means that if you inject the same dependency (the Coherence cache named "transactions") into another Java EE component (a JSF managed bean, a stateless EJB) the cache will be the same. Cool, isn't it?
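
For example, a hypothetical stateless EJB (the class name here is illustrative) could inject the same "transactions" cache and see exactly the same data as the Servlet shown before:

package com.oracle.coherence.demo.activecache;

import javax.annotation.Resource;
import javax.ejb.Stateless;

import com.tangosol.net.NamedCache;

@Stateless
public class TransactionService {

	// Same mappedName as in the Servlet: both components are wired
	// to the very same "transactions" cache in the data grid...
	@Resource(mappedName = "transactions") NamedCache transactions;

	public void registerTransaction(int transId, Transaction transaction) {
		transactions.put(transId, transaction);
	}

}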

Thanks to the CDI technology, we can extend the same support to components outside the Java EE standards, like simple POJOs. This means that you are not forced to use only Servlets, EJBs or JSF in order to inject Coherence caches. You can apply the same approach to regular POJOs created by you and managed by lightweight containers running inside Oracle WebLogic Server.
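
A minimal sketch of what that could look like, assuming a CDI-enabled application (the class name is illustrative): a producer field re-exposes the container-injected cache so any CDI-managed POJO can inject it.

package com.oracle.coherence.demo.activecache;

import javax.annotation.Resource;
import javax.enterprise.inject.Produces;

import com.tangosol.net.NamedCache;

public class CacheResources {

	// The container injects the "transactions" cache here; the @Produces
	// annotation re-exposes it for CDI injection points elsewhere...
	@Produces
	@Resource(mappedName = "transactions")
	NamedCache transactions;

}

With the producer in place, a plain POJO managed by CDI can simply declare an "@Inject NamedCache transactions" field to access the very same cache.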

Sunday Jul 08, 2012

The Developers Conference 2012: Presentation about CEP & BAM

This year I again had the pleasure of being one of the speakers at the TDC ("The Developers Conference") event. I have spoken at this event for three years now. This year, the main theme of the SOA track was EDA ("Event-Driven Architecture") and I decided to deliver a comprehensive presentation about one of my favorite personal subjects: real-time processing using Complex Event Processing. The theme of the presentation was "Business Intelligence in Real-time using CEP & BAM" and I would like to share here the presentation that I gave. The material is in Portuguese, since it was a Brazilian event held in São Paulo.

Since my presentation has a lot of videos, I decided to share the material as a YouTube video, so you can pause, rewind and replay it as many times as you want. I strongly recommend that before you start watching the video, you change the video quality settings to 1080p High Definition.

Saturday Jun 23, 2012

Calculating the Size (in Bytes and MB) of an Oracle Coherence Cache

The concept and usage of data grids are becoming very popular these days, since this type of technology is evolving very fast with some cool leading products like Oracle Coherence. Every once in a while, developers need a programmatic way to calculate the total size of a specific cache residing in the data grid. In this post, I will show how to accomplish this using the Oracle Coherence API. This example has been tested with the 3.6, 3.7 and 3.7.1 versions of Oracle Coherence.

To start the development of this example, you need to create a POJO ("Plain Old Java Object") that represents a data structure that will hold user data. This data structure will also create what I call an internal "fat", which should considerably increase the size of each instance in heap memory. Create a Java class named "Person" as shown in the listing below.

package com.oracle.coherence.domain;

import java.io.Serializable;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Random;

@SuppressWarnings("serial")
public class Person implements Serializable {
	
	private String firstName;
	private String lastName;
	private List<Object> fat;
	private String email;
	
	public Person() {
		generateFat();
	}
	
	public Person(String firstName, String lastName,
			String email) {
		setFirstName(firstName);
		setLastName(lastName);
		setEmail(email);
		generateFat();
	}
	
	// Fills the "fat" list with a random number of maps, each holding a
	// random number of entries, so every Person instance occupies a large
	// and unpredictable amount of heap memory...
	private void generateFat() {
		fat = new ArrayList<Object>();
		Random random = new Random();
		for (int i = 0; i < random.nextInt(18000); i++) {
			HashMap<Long, Double> internalFat = new HashMap<Long, Double>();
			for (int j = 0; j < random.nextInt(10000); j++) {
				internalFat.put(random.nextLong(), random.nextDouble());
			}
			fat.add(internalFat);
		}
	}
	
	public String getFirstName() {
		return firstName;
	}

	public void setFirstName(String firstName) {
		this.firstName = firstName;
	}

	public String getLastName() {
		return lastName;
	}

	public void setLastName(String lastName) {
		this.lastName = lastName;
	}

	public String getEmail() {
		return email;
	}

	public void setEmail(String email) {
		this.email = email;
	}

}

Now let's create a Java program that will start a data grid in Coherence and create a cache named "People", which will hold Person instances with sequential integer keys. Each person created in this program will trigger the execution of a custom constructor in the Person class that instantiates an internal fat (the random amount of data generated to increase the size of the object) for each person. Create a Java class named "CreatePeopleCacheAndPopulateWithData" as shown in the listing below.

package com.oracle.coherence.demo;

import com.oracle.coherence.domain.Person;
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;

public class CreatePeopleCacheAndPopulateWithData {

	public static void main(String[] args) {
		
		// Asks Coherence for a new cache named "People"...
		NamedCache people = CacheFactory.getCache("People");
		
		// Creates three people that will be put into the data grid. Each person
		// generates an internal fat that should increase its size in terms of bytes...
		Person pessoa1 = new Person("Ricardo", "Ferreira", "ricardo.ferreira@example.com");
		Person pessoa2 = new Person("Vitor", "Ferreira", "vitor.ferreira@example.com");
		Person pessoa3 = new Person("Vivian", "Ferreira", "vivian.ferreira@example.com");
		
		// Insert three people at the data grid...
		people.put(1, pessoa1);
		people.put(2, pessoa2);
		people.put(3, pessoa3);
		
		// Waits for 5 minutes until the user runs the Java program
		// that calculates the total size of the people cache...
		try {
			System.out.println("---> Waiting for 5 minutes for the cache size calculation...");
			Thread.sleep(300000);
		} catch (InterruptedException ie) {
			ie.printStackTrace();
		}
		
	}

}

Finally, let's create a Java program that, using the Coherence API and JMX, will calculate the total size of each cache that the data grid is currently managing. The approach used in this example retrieves every cache the data grid is currently managing, but if you are interested in one specific cache, the same approach can be used; you only need to filter which cache to look for (see the short sketch right after the listing). Create a Java class named "CalculateTheSizeOfPeopleCache" as shown in the listing below.

package com.oracle.coherence.demo;

import java.text.DecimalFormat;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

import javax.management.MBeanServer;
import javax.management.MBeanServerFactory;
import javax.management.ObjectName;

import com.tangosol.net.CacheFactory;

public class CalculateTheSizeOfPeopleCache {
	
    @SuppressWarnings({ "unchecked", "rawtypes" })
    private void run() throws Exception {

        // Enable JMX support in this Coherence data grid session...
        System.setProperty("tangosol.coherence.management", "all");

        // Create a sample cache just to access the data grid...
        CacheFactory.getCache(MBeanServerFactory.class.getName());

        // Get the JMX server from the Coherence data grid...
        MBeanServer jmxServer = getJMXServer();

        // Create an internal data structure that will maintain
        // the statistics for each cache in the data grid...
        Map cacheList = new TreeMap();
        Set jmxObjectList = jmxServer.queryNames(new ObjectName("Coherence:type=Cache,*"), null);
        for (Object jmxObject : jmxObjectList) {
            ObjectName jmxObjectName = (ObjectName) jmxObject;
            String cacheName = jmxObjectName.getKeyProperty("name");
            if (cacheName.equals(MBeanServerFactory.class.getName())) {
            	continue;
            } else {
            	cacheList.put(cacheName, new Statistics(cacheName));
            }
        }
        
        // Updates the internal data structure with statistic data
        // retrieved from caches inside the in-memory data grid...
        Set<String> cacheNames = cacheList.keySet();
        for (String cacheName : cacheNames) {
            Set resultSet = jmxServer.queryNames(
            	new ObjectName("Coherence:type=Cache,name=" + cacheName + ",*"), null);
            for (Object resultSetRef : resultSet) {
                ObjectName objectName = (ObjectName) resultSetRef;
                if (objectName.getKeyProperty("tier").equals("back")) {
                    int unit = (Integer) jmxServer.getAttribute(objectName, "Units");
                    int size = (Integer) jmxServer.getAttribute(objectName, "Size");
                    Statistics statistics = (Statistics) cacheList.get(cacheName);
                    statistics.incrementUnit(unit);
                    statistics.incrementSize(size);
                    cacheList.put(cacheName, statistics);
                }
            }
        }
        
        // Finally... print the objects from the internal data
        // structure that represents the statistics from caches...
        cacheNames = cacheList.keySet();
        for (String cacheName : cacheNames) {
            Statistics estatisticas = (Statistics) cacheList.get(cacheName);
            System.out.println(estatisticas);
        }
        
    }

    public MBeanServer getJMXServer() {
        MBeanServer jmxServer = null;
        for (Object jmxServerRef : MBeanServerFactory.findMBeanServer(null)) {
            jmxServer = (MBeanServer) jmxServerRef;
            if (jmxServer.getDefaultDomain().equals(DEFAULT_DOMAIN) || DEFAULT_DOMAIN.length() == 0) {
                break;
            }
            jmxServer = null;
        }
        if (jmxServer == null) {
            jmxServer = MBeanServerFactory.createMBeanServer(DEFAULT_DOMAIN);
        }
        return jmxServer;
    }
	
    private class Statistics {
		
        private long unit;
        private long size;
        private String cacheName;
		
        public Statistics(String cacheName) {
            this.cacheName = cacheName;
        }

        public void incrementUnit(long unit) {
            this.unit += unit;
        }

        public void incrementSize(long size) {
            this.size += size;
        }

        public long getUnit() {
            return unit;
        }

        public long getSize() {
            return size;
        }

        public double getUnitInMB() {
            return unit / (1024.0 * 1024.0);
        }

        public double getAverageSize() {
            return size == 0 ? 0 : unit / size;
        }

        public String toString() {
            StringBuffer sb = new StringBuffer();
            sb.append("\nCache Statistics of '").append(cacheName).append("':\n");
            sb.append("   - Total Entries of Cache -----> " + getSize()).append("\n");
            sb.append("   - Used Memory (Bytes) --------> " + getUnit()).append("\n");
            sb.append("   - Used Memory (MB) -----------> " + FORMAT.format(getUnitInMB())).append("\n");
            sb.append("   - Object Average Size --------> " + FORMAT.format(getAverageSize())).append("\n");
            return sb.toString();
        }

    }
	
    public static void main(String[] args) throws Exception {
        new CalculateTheSizeOfPeopleCache().run();
    }
	
    public static final DecimalFormat FORMAT = new DecimalFormat("###.###");
    public static final String DEFAULT_DOMAIN = "";
    public static final String DOMAIN_NAME = "Coherence";

}
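
As mentioned above, if you are interested in one specific cache, you only need to narrow the JMX query by name instead of iterating over everything. A short sketch, reusing the jmxServer reference from the listing and the "People" cache from this example:

        // Query only the MBeans that belong to the "People" cache...
        Set peopleOnly = jmxServer.queryNames(
                new ObjectName("Coherence:type=Cache,name=People,*"), null);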

I've commented the overall example, so I don't think you will have trouble understanding it. Basically, we are dealing with JMX. The first thing to do is enable JMX support for the Coherence client application (i.e., a JVM that will only retrieve values from the data grid and will not join the cluster as a storage member). This can be done very easily using the "tangosol.coherence.management" system property at runtime. Consult the Coherence documentation for JMX to understand the possible values that can be applied. The program creates an in-memory data structure that holds instances of a custom class called "Statistics".
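
For instance, instead of calling System.setProperty() programmatically as the listing does, the same property could be passed on the command line when launching the JVM (assuming coherence.jar and the example classes are on the classpath):

java -Dtangosol.coherence.management=all com.oracle.coherence.demo.CalculateTheSizeOfPeopleCache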

The Statistics class represents the information that we are interested in seeing, which in this case is the size in bytes and in MB of the caches. An instance of this class is created for each cache that is currently managed by the data grid. Using JMX-specific methods, we retrieve the information relevant to calculating the total size of the caches. To test this example, you should execute the CreatePeopleCacheAndPopulateWithData.java program first and the CalculateTheSizeOfPeopleCache.java program afterwards. The results in the console should be something like this:

2012-06-23 13:29:31.188/4.970 Oracle Coherence 3.6.0.4 <Info> (thread=Main Thread, member=n/a): Loaded operational configuration from "jar:file:/E:/Oracle/Middleware/oepe_11gR1PS4/workspace/calcular-tamanho-cache-coherence/lib/coherence.jar!/tangosol-coherence.xml"
2012-06-23 13:29:31.219/5.001 Oracle Coherence 3.6.0.4 <Info> (thread=Main Thread, member=n/a): Loaded operational overrides from "jar:file:/E:/Oracle/Middleware/oepe_11gR1PS4/workspace/calcular-tamanho-cache-coherence/lib/coherence.jar!/tangosol-coherence-override-dev.xml"
2012-06-23 13:29:31.219/5.001 Oracle Coherence 3.6.0.4 <D5> (thread=Main Thread, member=n/a): Optional configuration override "/tangosol-coherence-override.xml" is not specified
2012-06-23 13:29:31.266/5.048 Oracle Coherence 3.6.0.4 <D5> (thread=Main Thread, member=n/a): Optional configuration override "/custom-mbeans.xml" is not specified

Oracle Coherence Version 3.6.0.4 Build 19111
 Grid Edition: Development mode
Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved.

2012-06-23 13:29:33.156/6.938 Oracle Coherence GE 3.6.0.4 <Info> (thread=Main Thread, member=n/a): Loaded Reporter configuration from "jar:file:/E:/Oracle/Middleware/oepe_11gR1PS4/workspace/calcular-tamanho-cache-coherence/lib/coherence.jar!/reports/report-group.xml"
2012-06-23 13:29:33.500/7.282 Oracle Coherence GE 3.6.0.4 <Info> (thread=Main Thread, member=n/a): Loaded cache configuration from "jar:file:/E:/Oracle/Middleware/oepe_11gR1PS4/workspace/calcular-tamanho-cache-coherence/lib/coherence.jar!/coherence-cache-config.xml"
2012-06-23 13:29:35.391/9.173 Oracle Coherence GE 3.6.0.4 <D4> (thread=Main Thread, member=n/a): TCMP bound to /192.168.177.133:8090 using SystemSocketProvider
2012-06-23 13:29:37.062/10.844 Oracle Coherence GE 3.6.0.4 <Info> (thread=Cluster, member=n/a): This Member(Id=2, Timestamp=2012-06-23 13:29:36.899, Address=192.168.177.133:8090, MachineId=55685, Location=process:244, Role=Oracle, Edition=Grid Edition, Mode=Development, CpuCount=2, SocketCount=2) joined cluster "cluster:0xC4DB" with senior Member(Id=1, Timestamp=2012-06-23 13:29:14.031, Address=192.168.177.133:8088, MachineId=55685, Location=process:1128, Role=CreatePeopleCacheAndPopulateWith, Edition=Grid Edition, Mode=Development, CpuCount=2, SocketCount=2)
2012-06-23 13:29:37.172/10.954 Oracle Coherence GE 3.6.0.4 <D5> (thread=Cluster, member=n/a): Member 1 joined Service Cluster with senior member 1
2012-06-23 13:29:37.188/10.970 Oracle Coherence GE 3.6.0.4 <D5> (thread=Cluster, member=n/a): Member 1 joined Service Management with senior member 1
2012-06-23 13:29:37.188/10.970 Oracle Coherence GE 3.6.0.4 <D5> (thread=Cluster, member=n/a): Member 1 joined Service DistributedCache with senior member 1
2012-06-23 13:29:37.188/10.970 Oracle Coherence GE 3.6.0.4 <Info> (thread=Main Thread, member=n/a): Started cluster Name=cluster:0xC4DB

Group{Address=224.3.6.0, Port=36000, TTL=4}

MasterMemberSet
  (
  ThisMember=Member(Id=2, Timestamp=2012-06-23 13:29:36.899, Address=192.168.177.133:8090, MachineId=55685, Location=process:244, Role=Oracle)
  OldestMember=Member(Id=1, Timestamp=2012-06-23 13:29:14.031, Address=192.168.177.133:8088, MachineId=55685, Location=process:1128, Role=CreatePeopleCacheAndPopulateWith)
  ActualMemberSet=MemberSet(Size=2, BitSetCount=2
    Member(Id=1, Timestamp=2012-06-23 13:29:14.031, Address=192.168.177.133:8088, MachineId=55685, Location=process:1128, Role=CreatePeopleCacheAndPopulateWith)
    Member(Id=2, Timestamp=2012-06-23 13:29:36.899, Address=192.168.177.133:8090, MachineId=55685, Location=process:244, Role=Oracle)
    )
  RecycleMillis=1200000
  RecycleSet=MemberSet(Size=0, BitSetCount=0
    )
  )

TcpRing{Connections=[1]}
IpMonitor{AddressListSize=0}

2012-06-23 13:29:37.891/11.673 Oracle Coherence GE 3.6.0.4 <D5> (thread=Invocation:Management, member=2): Service Management joined the cluster with senior service member 1
2012-06-23 13:29:39.203/12.985 Oracle Coherence GE 3.6.0.4 <D5> (thread=DistributedCache, member=2): Service DistributedCache joined the cluster with senior service member 1
2012-06-23 13:29:39.297/13.079 Oracle Coherence GE 3.6.0.4 <D4> (thread=DistributedCache, member=2): Asking member 1 for 128 primary partitions

Cache Statistics of 'People':
   - Total Entries of Cache -----> 3
   - Used Memory (Bytes) --------> 883920
   - Used Memory (MB) -----------> 0.843
   - Object Average Size --------> 294640

I hope that this post can save you some time when calculating the total size of a Coherence cache becomes a requirement for your highly scalable system using data grids. See you!

Wednesday Jun 20, 2012

Why should you choose Oracle WebLogic 12c instead of JBoss EAP 6?

In this post, I will present some interesting differences between Oracle WebLogic 12c and JBoss EAP 6, which was released by Red Hat just a couple of days ago. This article aims to help you evaluate key points that you should consider when choosing a Java EE application server. Before you start reading this post, I strongly recommend reading one amazing report created by Crimson Consulting Group, an independent firm that evaluated Oracle WebLogic and JBoss and published their findings, Oracle opinions aside, in a public report entitled "Cost of Ownership Analysis | Oracle WebLogic Server Costs versus JBoss Application Server". To get full access to this report, click here. This report should be a nice read for your CTO and CFO. They will be very satisfied with this reading, since most of their work is based on TCO analyses like the one in that report.

In the following sections, I will present some important technical and non-technical aspects that most customers ask us about when they are seriously evaluating an enterprise middleware infrastructure, especially when they are in doubt about considering JBoss for some reason. I would recommend that you keep the following question in mind while you are reading the overall post: "Why should I choose WebLogic instead of JBoss?"

1) Multi Datacenter Deployment and Clustering

- D/R ("Disaster & Recovery") architecture support is embedded on the WebLogic Server 12c product. JBoss EAP 6 on the other hand has no direct D/R support included, Red Hat relies on third-part tools with higher prices. When you consider a middleware solution to host your business critical application, you should worry with every architectural aspect that are related with the solution. Fail-over support is one little aspect of a truly reliable solution. If you do not worry about D/R, your solution will not be reliable. Having said that, with Red Hat and JBoss EAP 6, you have this extra cost that will increase considerably the total cost of ownership of the solution. As you probably saw in the Crimson Consulting Group report, open-source are not so cheap when you start seeing the big picture.

- WebLogic Server 12c supports advanced LAN clustering, detection of dead servers, and has a common alert framework. JBoss EAP 6 on the other hand has limited LAN clustering support with no server death detection, at least natively in the product. JBoss EAP relies on mod_cluster to provide decent server death detection. It does not generate any alerts when servers go down (only if you buy JBoss ON, which is a separate technology) and manual intervention is required when servers go down. In most cases, admin people must rely on "kill -9", "tail -f someFile.log" and "ps ax | grep java" commands to manage failures and clustering anomalies.

- WebLogic Server 12c supports the concept of a Node Manager, which is a separate process that runs on the physical or virtual servers and allows you to extend the administration of the cluster to WebLogic managed servers that are often distributed across multiple machines and geographic locations. JBoss EAP 6 on the other hand has the concept of a "Host Controller", which is a very limited and still unproven JBoss instance manager. As you can see here, here and here, there are a lot of issues that this kind of technology needs to correct.

- The WebLogic Server 12c Node Manager supports Coherence to boost performance when managing servers. JBoss EAP 6 on the other hand has no similar technology. There is no way to coordinate JBoss instances using high-throughput, low-latency protocols like InfiniBand. The Node Manager feature also enables another very important capability that JBoss EAP lacks: securing the administration. When using the WebLogic Node Manager, all the administration tasks are sent to the managed servers in a secure tunnel protected by a certificate, which means that the transport layer that separates the WebLogic administration console from the managed servers is secured by SSL.

- WebLogic Server 12c is now integrated with OTD ("Oracle Traffic Director"), which is a web server technology derived from the former Sun iPlanet Web Server. This software complements the web server support offered by OHS ("Oracle HTTP Server"). Using OTD, WebLogic instances are load-balanced by highly capable software that knows how to handle SDP ("Sockets Direct Protocol") over InfiniBand, which boosts performance when used with engineered systems technologies like Oracle Exalogic Elastic Cloud. JBoss EAP 6 on the other hand only offers support for the Apache Web Server with custom modules created to deal with JBoss clusters, but only across standard TCP/IP networks.

2) Application and Runtime Diagnostics

- WebLogic Server 12c has diagnostics capabilities embedded in the server, called WLDF ("WebLogic Diagnostic Framework"), so there is no need to rely on third-party tools. JBoss EAP 6 on the other hand has no diagnostics capabilities. Its only diagnostics tool is the log generated by the application server. Admin people are encouraged to analyze thousands of log lines to find out what is going on.

- WebLogic Server 12c complements WLDF with JRockit MC ("Mission Control"), which provides administrators and developers complete insight into the JVM's performance, behavior and possible bottlenecks. WebLogic Server 12c also has a classloader analysis tool embedded, and even a log analyzer tool that enables administrators and developers to view the logs of multiple servers at the same time. JBoss EAP 6 on the other hand relies on third-party tools to do something similar. Again, only log searching is offered to find out what's going on.

- WebLogic Server 12c offers end-to-end traceability and monitoring available through Oracle EM ("Enterprise Manager"), including monitoring of business transactions that flow through web servers, ESBs, application servers and database servers, all of this with deep JVM analysis and diagnostics. JBoss EAP 6 on the other hand, even using JBoss ON ("Operations Network"), which is a separate technology, does not support those features. Red Hat relies on third-party tools to provide direct Oracle database traceability across JVMs. One of those tools is Oracle EM for non-Oracle middleware, which manages JBoss, Tomcat, WebSphere and IIS transparently.

- WebLogic Server 12c with its JRockit support offers a tool called JRockit Flight Recorder, which can give developers complete visibility into a certain period of application production monitoring with zero extra overhead. This automatic recording allows you to deeply analyze thread latency, memory leaks, thread contention, resource utilization, stack overflow damage and GC ("Garbage Collection") cycles; to observe in real time stop-the-world phenomena and generational, reference-count and parallel collections; and to analyze mutator threads. JBoss EAP 6 doesn't even come close to supporting something similar, not least because they don't have their own JVM.

3) Application Server Administration

- WebLogic Server 12c offers a complete administration console complemented with scripting and macro-like recording capabilities. A single WebLogic console can manage up to hundreds of WebLogic servers belonging to the same domain. JBoss EAP 6 on the other hand has a limited console and provides an XML-centric administration. JBoss, after ten years, started the development of a rudimentary centralized administration that still leaves a lot of administration tasks aside, so admin people and developers must touch scripts and XML configuration files for most advanced and even simple administration tasks. This leads to error-prone and risky deployments. Even using JBoss ON, JBoss EAP is not able to offer decent administration features for admin people, who must be highly skilled in JBoss internal architecture and its management capabilities.

- Oracle EM is available to manage multiple domains, databases, application servers, operating systems, packaged applications, cloud and virtualization, with complete end-to-end visibility. JBoss ON does not provide management capabilities across the complete architecture, only basic monitoring of the operating system, JBoss itself and the JVM. But if you need to trace a business transaction across those layers, you just cannot. They only provide monitoring capabilities, not cross-layer tracing capabilities.

- WebLogic Server 12c has the same administration model whatever topology the customer selects. JBoss EAP 6 on the other hand differentiates between two operational models, standalone mode and domain mode, which are not consistent with each other. Depending on the mode used, the administration skills required are different.

- WebLogic Server 12c has no single-point-of-failure processes, and it does not need to define any specialized server. In WebLogic, even when the administration server is down, the managed servers can be started and stopped without any constraints. The domain model in WebLogic has been available for years (at least ten years or more) and is production proven. JBoss EAP 6 on the other hand needs special processes to guarantee JBoss integrity, the PC ("Process Controller") and the HC ("Host Controller"). Different from WebLogic, the domain model in JBoss is quite new (one year old at most) and needs to mature considerably before it does things the way the WebLogic domain model does.

- WebLogic Server 12c supports a parallel deployment model, which enables several artifacts to be deployed at the same time. JBoss EAP 6 on the other hand does not have any similar feature. Every deployment is done atomically in the containers. This means that if you have a huge EAR (an EAR of 120 MB, for instance) and deploy it onto JBoss EAP 6, this EAR will take some minutes before it starts accepting requests. The same EAR deployed onto WebLogic Server 12c will cut the deployment time at least in half compared to JBoss.

4) Support and Upgrades

- WebLogic Server 12c has patch management available. JBoss EAP 6 on the other hand has no patch management available; each JBoss EAP instance must be patched manually. To achieve such a feature, you need to buy a separate technology called JBoss ON ("Operations Network") that manages this type of stuff. But as of now, JBoss ON does not support JBoss EAP 6, so in practice JBoss EAP 6 does not have this feature.

- WebLogic Server 12c supports previous WebLogic domains without any reconfiguration, since its kernel has been robust and mature since its creation in 1995. JBoss EAP 6 on the other hand has a proven lack of supportability between JBoss AS 4, 5, 6 and 7. Different kernels and messaging engines were implemented in the JBoss stack in the last five years, revealing their inability to create a well-architected and proven middleware technology.

- WebLogic Server 12c has patch prescription based on the customer configuration. JBoss EAP 6 on the other hand has no such capability. People need to create support tickets and have their installations reviewed by Red Hat support staff to get some patch prescription from them.

- Oracle WebLogic Server, independent of the version, has 8 years of support with new patches and lifetime availability of existing patches beyond that. JBoss EAP 6 on the other hand provides patches for a specific application server version up to 5 years after the release date. JBoss EAP 4 and previous versions had only 4 years. A good question that Red Hat will struggle to answer is: "what happens when you find issues after year 5?"

5) RAC ("Real Application Clusters") Support

- WebLogic Server 12c ships with a specific JDBC driver to leverage Oracle RAC clustering capabilities (Fast Application Notification, Transaction Affinity, Fast Connection Failover, etc.). The Oracle JDBC thin driver is also available. JBoss EAP 6 on the other hand ships only the standard Oracle JDBC thin driver, which means that load balancing with FCF ("Fast Connection Failover") is not supported. Manual intervention is necessary in case of planned or unplanned RAC downtime. In JBoss EAP 6, the situation does not reestablish itself automatically after downtime.

- WebLogic Server 12c has a feature called Active GridLink for Oracle RAC, which provides up to 3X performance on OLTP applications. This seamless integration between WebLogic and the Oracle database adds more value to critical business applications, leveraging their investments in Oracle database technology and Oracle middleware. JBoss EAP 6 on the other hand has no performance gains at all, even when admin people implement some kind of connection-pool tuning.

- WebLogic Server 12c also supports transaction and web session affinity to Oracle RAC, which provides additional performance gains. This is particularly interesting if you are creating a reliable solution that is distributed not only across a LAN cluster, but also across different data centers. JBoss EAP 6 on the other hand has no such support.

6) Standards and Technology Support

- WebLogic Server 12c has been fully Java EE 6 compatible and production ready since December 2011. JBoss EAP 6 on the other hand became fully compatible with Java EE 6 only in the community version three months later, and production ready only a few days before this article was written in June 2012. Red Hat says that they are the masters of innovation and technology proliferation, but compared with Oracle and even other proprietary vendors like IBM, historically speaking they are slow to deliver the newest technologies and standards adherence.

- Oracle is the steward of Java, driving innovation into the platform from commercial and open-source vendors. Red Hat on the other hand does not have its own JVM and relies on third-party JVMs (i.e., OpenJDK) to complete its application server offer. 95% of Red Hat customers are using Oracle HotSpot as the JVM, which means that without any Oracle involvement, their support will be limited exclusively to the application server layer, and we all know that most problems must be investigated across the whole middleware stack, from the application server layer down to the JVM layer. Red Hat says that if you use OpenJDK as the JVM, they will support you too. But OpenJDK as a production JVM is not proven, and most developers rely on Oracle HotSpot for their mission-critical systems. Only Oracle can provide you a truly complete stack, offering not only the middleware stack, but also a complete set of hardware engineered for superior performance and scalability, like Oracle Exalogic Elastic Cloud.

- WebLogic Server 12c natively supports JDK 7, which empowers developers to explore the maximum of the Java platform's productivity when writing code. This feature differentiates WebLogic from other application servers (except GlassFish, which is also managed by Oracle) because JDK 7 introduces such remarkable productivity features as the try-with-resources enhancement, catching multiple exceptions with one try block, Strings in switch statements, JVM improvements in terms of JDBC, I/O, networking, security and concurrency, and of course the most important feature of Java 7: native support for multiple non-Java languages (see the short sketch at the end of this section). More features regarding JDK 7 can be found here. JBoss EAP 6 on the other hand has questionable support for JDK 7. As you can see here, there are quite a few bugs when you seriously start using JBoss with JDK 7 implementations.

- Oracle WebLogic Server 12c supports integration with the Spring framework, allowing Spring applications to use the WebLogic-specific transaction manager and exposing bean interfaces as WebLogic MBeans to take advantage of all the WebLogic monitoring and administration advantages. JBoss EAP 6 on the other hand has no special integration with Spring. In fact, Red Hat offers a questionable package called "JBoss Web Platform" that in theory supports Spring, but in practice this package does not offer any special integration. It is just a facility for Red Hat customers to get support for both JBoss and Spring technology through the same customer support channel.
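
As a quick illustration of the JDK 7 productivity features mentioned above (Strings in switch statements, try-with-resources and multi-catch), here is a minimal, self-contained sketch; the package and class names are illustrative:

package com.oracle.demo.jdk7;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class Jdk7FeaturesSketch {

	public static String firstLineOf(String brand, String path) {
		// Strings in switch statements...
		switch (brand) {
			case "Visa":
			case "Mastercard":
				break;
			default:
				return "unknown brand";
		}
		// try-with-resources: the reader is closed automatically...
		try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
			return reader.readLine();
		} catch (IOException | SecurityException ex) {
			// multi-catch: one handler for two unrelated exception types...
			return "error: " + ex.getMessage();
		}
	}

}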

7) Lightweight Development

- Oracle WebLogic Server 12c and Oracle GlassFish are completely integrated and can share applications without any modifications. Starting with the 12c version, WebLogic now natively understands GlassFish deployment descriptors and specific configurations in order to offer you a truly reliable migration path from a community Java EE application server to an enterprise middleware product like WebLogic. JBoss EAP 6 on the other hand has no official and documented support for natively reusing an existing (or still in development) application from the JBoss AS community server. Users of JBoss suffer from critical issues during deployment time that include: changing the libraries and dependencies of the application, patching the DTD or XSD deployment descriptors, refactoring of the application layers due to classloading issues and anomalies, and rebuilding of the persistence, business and web layers due to issues with "usage of the certified version of a certain dependency" or "frameworks that Red Hat potentially does not recommend", etc. If you have the culture or enterprise IT directive of developing Java EE applications using community middleware and then, at some point in the future, transitioning to enterprise (vendor-supported) middleware, Oracle WebLogic plus Oracle GlassFish offers you a more sustainable solution.

- WebLogic Server 12c has a very light ZIP distribution (less than 165 MB). The JBoss EAP 6 ZIP is around 130 MB, but together with JBoss ON you have another 100 MB, resulting in a larger download footprint. This is particularly interesting if you plan to use automated setup of application server instances (for example, to rapidly set up a development or staging environment) using Maven or Hudson.

- WebLogic Server 12c has complete integration with Maven, allowing developers to set up WebLogic domains with a few commands. Tasks like downloading WebLogic, installation, domain creation and data source deployment are completely integrated. JBoss EAP 6, on the other hand, offers limited integration with those tools.

- WebLogic Server 12c has a startup mode called WLX that turns off the EJB, JMS and JCA containers, leaving only the web container enabled. JBoss EAP 6, on the other hand, has no such feature; you need to manually disable the containers if you do not intend to use them. For people starting their learning path in Java EE development, this can be an issue, since specific knowledge about the application server configuration will be needed.
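For reference, a minimal sketch of how the lightweight mode is typically activated, assuming the serverType startup option documented for WebLogic 12c and a domain environment already prepared by the standard start scripts:

     java -DserverType=wlx weblogic.Server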

- WebLogic Server 12c supports FastSwap, which enables you to change classes without redeployment. This is particularly interesting if you are developing patches for an application that is already deployed in the staging environment and you do not want to redeploy the entire application. This is the same behavior that most application servers offer for JSP pages, but with WebLogic Server 12c you have the same feature for Java classes in general. JBoss EAP 6, on the other hand, has no such support. Not even JBoss EAP 5 supports this to date.
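As an illustration, FastSwap is switched on through a deployment descriptor; a minimal sketch of a weblogic-application.xml fragment, assuming an EAR packaging (check the schema version against the documentation for your exact release):

<weblogic-application xmlns="http://xmlns.oracle.com/weblogic/weblogic-application">
	<fast-swap>
		<enabled>true</enabled>
	</fast-swap>
</weblogic-application>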

8) JMS and Messaging

- WebLogic Server 12c has had a proven and highly scalable JMS implementation since its initial release in 1995. JBoss EAP 6, on the other hand, uses a JMS implementation called HornetQ, which was introduced in JBoss EAP 5, replacing everything that was implemented in the previous versions. Red Hat loves to introduce new technologies across JBoss versions, playing around with customers and their investments. And when they are asked why they changed the implementation and caused such a mess, their answer is always: "the previous implementation was inadequate and not aligned with the community strategy, so we are creating a new and improved one". This Red Hat practice leads to uncomfortable investments that in the near future (sometimes less than a year) will be affected in some way.

- WebLogic Server 12c has troubleshooting and monitoring features included in the WebLogic console and WLDF. JBoss EAP 6, on the other hand, has no direct monitoring in the console; activity is reflected only in the logs, and no debug logs are available in case of JMS issues.

- WebLogic Server 12c supports enterprise messaging features like SAF ("Store and Forward"), Distributed Queues/Topics and Foreign JMS provider support, which leverage JMS implementations without compromising developer code, making things completely transparent. JBoss EAP 6, on the other hand, does not even dream of supporting such features.
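To make the transparency point concrete, here is a minimal sketch in plain JMS. The JNDI names jms/MyConnectionFactory and jms/MyQueue are hypothetical; the administrator can map them to a distributed destination or to a Foreign JMS provider without any change to this code:

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.naming.InitialContext;

public class TransparentJmsSender {

	public void send(String text) throws Exception {

		InitialContext ctx = new InitialContext();

		// Local JNDI names (hypothetical); whether they resolve to a plain
		// queue, a distributed queue or a foreign provider is an admin decision.
		ConnectionFactory cf = (ConnectionFactory) ctx.lookup("jms/MyConnectionFactory");
		Queue queue = (Queue) ctx.lookup("jms/MyQueue");

		Connection conn = cf.createConnection();

		try {
			Session session = conn.createSession(false, Session.AUTO_ACKNOWLEDGE);
			MessageProducer producer = session.createProducer(queue);
			producer.send(session.createTextMessage(text));
		} finally {
			conn.close();
		}

	}

}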

9) Caching and Grid

- Coherence, which is the leading and most mature data grid technology from Oracle, has been available since early 2000 and was integrated with WebLogic in 2009. Coherence and WebLogic clusters can both be managed from the WebLogic administrative console. Even Node Manager supports Coherence. JBoss, on the other hand, discontinued JBoss Cache, their caching implementation, just like they did with the messaging implementation (JBossMQ), which was an issue for long-term customers. JBoss EAP 6 ships Infinispan, which is immature and has a very limited set of proven use cases, compared with Oracle Coherence, which since 2001 has built a very comprehensive and successful list of use cases. It is really impressive how Oracle Coherence has proven to the industry how strong it is as an elastic data grid platform. Take a careful look.

- WebLogic Server 12c has a feature called ActiveCache which uses Coherence to replicate, without any code changes, HTTP sessions from both WebLogic and other application servers like JBoss, Tomcat, WebSphere, GlassFish and even Microsoft IIS. JBoss EAP 6, on the other hand, has very limited support for this, covering only their own application server.

- Coherence can be used to manage both L1 and L2 cache levels, providing support for Oracle TopLink and other JPA-compliant implementations, even Hibernate. JBoss EAP 6 and Infinispan, on the other hand, support only Hibernate. And most important of all: Infinispan does not have any successful case of L1 or L2 cache support using Hibernate, which leads us to question its viability.

10) Performance

- WebLogic Server 12c is certified with Oracle Exalogic Elastic Cloud and can run unchanged applications on this engineered system. Customers can benefit from Exalogic optimizations in both the kernel and JVM layers to boost performance by up to 10X for web, OLTP, JMS and grid applications. JBoss EAP 6, on the other hand, has no investment in engineered systems: customers do not have the choice to deploy on an ultra-fast Java system if their project becomes relevant and performance issues are detected.

- WebLogic Server 12c has maintained a performance gain across each new release: starting from WebLogic 5.1, the overall performance gain has been close to 4X, which amounts to roughly a 20% gain release over release. JBoss, on the other hand, does not provide SPECjAppServer or SPECjEnterprise performance benchmarks. Their so-called "performance gains" remain hidden in their customers' environments, which leaves us wondering whether they are real, since we will never get access to those environments.

- WebLogic Server 12c has industry performance benchmarks, with submissions across platforms and configurations leading SPECj. Oracle WebLogic leads SPECjAppServer performance in multiple categories, fitting all customer topologies: dual-node, single-node, multi-node and multi-node with RAC. JBoss... again, does not provide any SPECjAppServer performance benchmarks.

- WebLogic Server 12c has a feature called work managers which allows your application to reach new performance levels based on the utilization of critical resources such as CPU. Work managers prioritize work and allocate threads based on an execution model that takes into account administrator-defined parameters as well as actual run-time performance and throughput. JBoss EAP 6, on the other hand, has no comparable feature, and probably never will. Without a feature like work managers, JBoss EAP 6 forces administrators and especially developers to chase performance gains in an intrusive way, rewriting code and doing performance refactorings.
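As an illustration, WebLogic exposes work managers to application code through the CommonJ API; a minimal sketch, assuming the default work manager available under the documented java:comp/env/wm/default JNDI name:

import javax.naming.InitialContext;

import commonj.work.Work;
import commonj.work.WorkManager;

public class WorkManagerExample {

	public void dispatch() throws Exception {

		InitialContext ctx = new InitialContext();

		// The default work manager; a named one could be declared instead and
		// tuned with administrator-defined request classes and constraints.
		WorkManager wm = (WorkManager) ctx.lookup("java:comp/env/wm/default");

		wm.schedule(new Work() {
			public void run() {
				// Business logic here runs on a thread the work manager
				// prioritizes according to its execution model.
			}
			public boolean isDaemon() {
				return false;
			}
			public void release() {
			}
		});

	}

}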

11) Professional Services Support

- WebLogic Server 12c, like any other technology sold by Oracle, gives customers the possibility of hiring OCS ("Oracle Consulting Services") to manage critical scenarios, assist in the deployment of new applications, and provide highly skilled consultancy on architecture and best practices, with people allocated together with customer teams. All OCS services are available without any restrictions, whether the customer has already bought software from Oracle or is just starting the implementation before any acquisition. JBoss EAP 6, or Red Hat to be more specific, only offers professional services if you buy subscriptions from them. If you are developing a new business-critical application and need Red Hat's help with a serious issue or an architecture decision, they will probably say: "OK... I can help you, but only after you buy subscriptions from me". Red Hat also does not allow its professional services consultants to manage environments that use community-based software. They will probably force you to first buy a subscription, download their "enterprise" version and then, optionally, hire their consultants.

- Oracle provides its university to educate your team in our technologies, including, of course, specialized training on the WebLogic application server. At any time and location, you can hire Oracle to train your team so you get trustworthy knowledge tailored to your specific needs. Product certifications are also available if your technical people wish to differentiate themselves as professionals. Red Hat, on the other hand, has a limited pool of resources to train your team in their technologies. They basically sell training and certification for RHEL ("Red Hat Enterprise Linux"), but if you demand more specialized training in JBoss middleware, they will probably connect you to some "certified" partner for localized training, since they are apparently discontinuing their education center, at least here in Brazil. They were not able to reproduce their RHEL education success in their middleware division, since they need to sell you subscriptions first before giving you specialized training. And again, they only offer specialized training based on their enterprise version (EAP, in the case of JBoss), which means the courses will be quite outdated. There are reports of developers who took official training from Red Hat this year (2012) and, in a certain advanced JBoss course, Red Hat supposedly covered JBossMQ as the messaging subsystem, and even the printed material provided was based on JBossMQ, since the training was created for JBoss EAP 4.3.

12) Encouraging Transparency without Ulterior Motives

- WebLogic Server 12c, like any other software from Oracle, can be downloaded at any time from anywhere; you need only an OTN ("Oracle Technology Network") credential, and you can download any enterprise software as many times as you want. And it is not some kind of "trial" version: it is the official binaries that will be running forever in your data center. Oracle does not encourage the usage of "specific versions" of our software. The binaries you buy from Oracle are the same binaries anyone in the world can download and use for testing and personal education. JBoss EAP 6, on the other hand, is not available for download unless you buy a subscription and get access to the Red Hat enterprise repositories. If you need to test, learn or just start creating your application using Red Hat's middleware software, you have to download it from the community website. You are not allowed to download the enterprise version that, according to Red Hat, is more secure, reliable and robust. But none of us wants to start developing software on insecure, unreliable and unscalable middleware, right? So what do you do? You are "invited" by Red Hat to buy subscriptions from them to get access to the "cool" version of the software.

- WebLogic Server 12c prices are publicly available on the Oracle website. If you want to know right now how much WebLogic will cost your organization, just click here to access our price list. In the case of WebLogic, check the "US Oracle Technology Commercial Price List". Oracle also encourages you to get in touch with a sales representative to discuss discounts that could make the investment in our technology possible, but you are not required to do this; only if you are interested in buying our technology or perhaps in discussing discount scenarios. JBoss EAP 6, on the other hand, does not have its cost publicly available on Red Hat's website or in any other media; at least it is not easy to find such information. The only link you will likely find on their website is a "Contact a Sales Representative" link. This is not a very good relationship between a customer and a vendor, and it is not an example of transparency, especially when the software is sold as open. In these situations, customers expect to see software prices publicly available, so they have the chance to decide, based on the software's existing features, whether the cost is fair or not.

Conclusion

Oracle WebLogic is the most mature, secure, reliable and scalable Java EE application server on the market, and it has a proven record of success around the globe to prove its maturity. Don't lose the chance to discover today how WebLogic could fit your needs and sustain your global IT middleware strategy, whether or not that strategy is completely based on the Cloud.

Wednesday Apr 18, 2012

Oracle Technical Workshop: WebLogic Suite 12c (March 28, São Paulo)

On March 28, 2012, I presented, at the Pullman hotel in São Paulo, a full-day workshop about the technical innovations of Oracle WebLogic 12c, plus the correlated middleware stack that Oracle created around it. This workshop, for those who could attend in person, was very informative and productive, since every presentation, without exception, was done hands-on with the audience.

I would like to share with you the slides I used in this workshop. The slides are written in Portuguese, since it was a Brazilian workshop for the Brazil folks. Please enjoy!

Wednesday Feb 22, 2012

Oracle Coherence: First Steps Using Clusters and Basic API Usage

When we talk about distributed data grids, elastic caching platforms and in-memory caching technologies, Oracle Coherence is the first option that comes to mind. This happens because Oracle Coherence is the oldest and most mature implementation of data grids, creating success stories across the world. Oracle Coherence is the implementation with the largest number of use cases in the world. Since its acquisition by Oracle in 2007, the product has been enhanced with powerful enterprise features to retain its position as the "best in the world" against its competitors.


This article will help you take your first steps with Oracle Coherence. I have prepared a sequence of three videos that will guide you through the process of creating a data grid cluster and managing data using both the Java API and CohQL ("Coherence Query Language"), and finally testing the reliability and failover features of the product.
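Just to give a taste of what the Java API side looks like before you watch the videos, here is a minimal sketch, assuming the Coherence jars are on the classpath; the cache name "hello-cache" is hypothetical:

import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;

public class CoherenceFirstSteps {

	public static void main(String[] args) {

		// Joins (or starts) a Coherence cluster and obtains a named cache.
		NamedCache cache = CacheFactory.getCache("hello-cache");

		cache.put("key-1", "Hello, Coherence!");
		System.out.println(cache.get("key-1"));

		// Leaves the cluster gracefully.
		CacheFactory.shutdown();

	}

}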

Oracle allows you to download and use any of its products for free if you are interested in learning or testing the technology. Unlike other vendors that first put you in contact with a sales representative, or that simply do not make their software available for download, Oracle encourages you to use the technology so you gain confidence with it. You can download Oracle Coherence at this link. If you don't have an OTN ("Oracle Technology Network") credential, you will be asked to create one.

If you have a powerful computer and fast internet bandwidth, change the video quality settings to 1080p HD ("High Definition"). It will considerably improve the quality of your viewing.

Friday Oct 28, 2011

Enabling HPC (High Performance Computing) with InfiniBand in Java™ Applications

For many IT managers, choosing a computing fabric often falls short of receiving the center stage attention it deserves. Familiarity and time have set infrastructure architects on a seemingly intransigent course toward settling for fabrics that have become the "de-facto" norm. On one hand is Ethernet, which is regarded as the solution for an almost ridiculously broad range of needs, from web surfing to high-performance, every microsecond counts storage traffic. On the other hand is Fibre Channel, which provides more deterministic performance and bandwidth, but at the cost of bifurcating the enterprise network while doubling the cabling, administrative overhead, and total cost of ownership. But even with separate fabrics, these networks sometimes fall short of today's requirements. So slowly but surely some enterprises are waking up to the potential of new fabrics.

There is no other component of the data center that is as important as the fabric, and the I/O challenges facing enterprises today have been setting the stage for next-generation fabrics for several years. One of the more mature solutions, InfiniBand, is sometimes at the forefront when users are driven to select a new fabric. New customers are often surprised at what InfiniBand can deliver, and how easily it integrates into today's data center architectures. InfiniBand has been leading the charge toward consolidated, flexible, low latency, high bandwidth, lossless I/O connectivity. While other fabrics have just recently turned to addressing next generation requirements, with technologies such as Fibre Channel over Ethernet (FCoE) still seeking general acceptance and even standardization, InfiniBand's maturity and bandwidth advantages continue to attract new customers.

Although InfiniBand remains a small industry compared to the Ethernet juggernaut, it continues to grow aggressively, and this year it is growing beyond projections. InfiniBand often plays a role in enterprises with huge datasets. The majority of Fortune 1000 companies are involved in high-throughput processing behind a wide variety of systems, including business analytics, content creation, content trans-coding, real-time financial applications, messaging systems, consolidated server infrastructures, and more. In these cases, InfiniBand has worked its way into the enterprise as a localized fabric that, via transparent interconnection to existing networks, is sometimes even hidden from the eyes of administrators.

For high performance computing environments, the capacity to move data across a network quickly and efficiently is a requirement. Such networks are typically described as requiring high throughput and low latency. High throughput refers to an environment that can deliver a large amount of processing capacity over a long period of time. Low latency refers to the minimal delay between processing input and providing output, such as you would expect in a real-time application.

In these environments, conventional networking using socket streams can create bottlenecks when it comes to moving data. Introduced in 1999 by the InfiniBand Trade Association, InfiniBand (IB for short) was created to address the need for high performance computing. One of the most important features of IB is Remote Direct Memory Access (RDMA). RDMA enables moving data directly from the memory of one computer to another computer, bypassing the operating system of both computers and resulting in significant performance gains. The usage of an InfiniBand network is particularly interesting when applied in engineered systems like Oracle Exalogic and SPARC SuperCluster T4-4. In this type of system, you can boost the performance of your application from 3X to 10X without changing a single line of code.

The Sockets Direct Protocol, or SDP for short, is a networking protocol developed to support stream connections over InfiniBand fabric. SDP support was introduced in the JDK 7 release of the Java Platform for applications deployed on the Solaris and Linux operating systems. The Solaris operating system has supported SDP and InfiniBand since Solaris 10. In the Linux world, the InfiniBand package is called OFED (OpenFabrics Enterprise Distribution). The JDK 7 release supports the 1.4.2 and 1.5 versions of OFED.

SDP support is essentially a TCP bypass technology. When SDP is enabled and an application attempts to open a TCP connection, the TCP mechanism is bypassed and communication goes directly to the InfiniBand network. For example, when your application attempts to bind to a TCP address, the underlying software will decide, based on information in the configuration file, whether it should be rebound to the SDP protocol. This process can happen during the binding process or the connecting process, but it happens only once for each socket.

There are no API changes required in your code to take advantage of the SDP protocol: the implementation is transparent and is supported by both classic networking and NIO (New I/O) channels. SDP support is disabled by default. The steps to enable SDP support are:

  • Create a text file that will act as the SDP configuration file.
  • Set the system property that specifies the location of this configuration file.

Creating a Configuration File for SDP

An SDP configuration file is a text file, and you decide where on the file system this file will reside. Every line in the configuration file is either a comment or a rule. A comment is indicated by the hash character "#" at the beginning of the line, and everything following the hash character will be ignored.

There are two types of rules, as follows:

  • A "bind" rule indicates that the SDP protocol transport should be used when a TCP socket binds to an address and port that match the rule.
  • A "connect" rule indicates that the SDP protocol transport should be used when an unbound TCP socket attempts to connect to an address and port that match the rule.

A rule has the following format:

("bind"|"connect")1*LWSP-char(hostname|ipaddress)["/"prefix])1*LWSP-char("*"|port)["-"("*"|port)]
The first keyword indicates whether the rule is a bind or a connect rule. The next token specifies either a host name or a literal IP address. When you specify a literal IP address, you can also specify a prefix, which indicates an IP address range. The third and final token is a port number or a range of port numbers. Consider the following notation in this sample configuration file:
# Enable SDP when binding to 192.168.0.1
bind 192.168.0.1 *

# Enable SDP when connecting to all
# application services on 192.168.0.*
connect 192.168.0.0/24     1024-*

# Enable SDP when connecting to the HTTP server
# or a database on oracle.infiniband.sdp.com
connect oracle.infiniband.sdp.com   80
connect oracle.infiniband.sdp.com   3306

The first rule in the sample file specifies that SDP is enabled for any port (*) on the local IP address 192.168.0.1. You would add a bind rule for each local address assigned to an InfiniBand adaptor. An InfiniBand adaptor is the equivalent of a network interface card (NIC) for InfiniBand. If you had several InfiniBand adaptors, you would use a bind rule for each address that is assigned to those adaptors. The second rule in the sample file specifies that whenever connecting to 192.168.0.* and the target port is 1024 or greater, SDP will be enabled. The prefix on the IP address, /24, indicates that the first 24 bits of the 32-bit IP address should match the specified address. Each portion of the IP address uses 8 bits, so 24 bits indicates that the IP address should match 192.168.0 and the final byte can be any value. The "-*" notation on the port token specifies "and above." A range of ports, such as 1024-2056, would also be valid and would include the end points of the specified range.

The final rules in the sample file specify a host name called oracle.infiniband.sdp.com, first with the port assigned to an HTTP server (80) and then with the port assigned to a database (3306). Unlike a literal IP address, a host name can translate into multiple addresses. When you specify a host name, it matches all addresses that the host name is registered to in the name service.

How to Enable the SDP Protocol?

SDP support is disabled by default. To enable SDP support, set the com.sun.sdp.conf system property by providing the location of the configuration file. The following example starts an application named MyApplication using a configuration file named sdp.conf:

     java -Dcom.sun.sdp.conf=sdp.conf -Djava.net.preferIPv4Stack=true MyApplication

MyApplication refers to the client application written in Java that is attempting to connect through the InfiniBand adaptor. Note that this example specifies another system property, java.net.preferIPv4Stack. Not everything discussed so far may work perfectly on the first try: there are some technical issues you should be aware of when enabling SDP, most of them related to the IPv4 or IPv6 stacks. If something goes wrong in your tests, it is a good idea to check with your operating system administrators that InfiniBand is correctly configured at that layer.

APIs and Supported Java Classes

All classes in the JDK that read or write network sockets can use SDP. As said before, there are no changes in the way you write the code. This is particularly interesting if you are simply installing an already deployed application in a cluster that uses InfiniBand: when SDP support is enabled, it just works, without any change to your code. Recompiling is not necessary. However, it is important to know that a socket is bound only once. A connection is an implicit bind, so if the socket hasn't been previously bound and connect is invoked, the binding occurs at that time. For example, consider the following code:

    import java.net.InetSocketAddress;
    import java.nio.channels.AsynchronousSocketChannel;
    import java.util.concurrent.Future;

    // The explicit bind below ties the socket to a local TCP address
    // before the connection attempt is made.
    AsynchronousSocketChannel asyncSocketChannel = AsynchronousSocketChannel.open();
    InetSocketAddress localAddress = new InetSocketAddress("10.0.0.3", 8020);
    InetSocketAddress remoteAddress = new InetSocketAddress("192.168.0.1", 1521);
    asyncSocketChannel.bind(localAddress);
    Future<Void> result = asyncSocketChannel.connect(remoteAddress);


In this example, the asynchronous socket channel is bound to a local TCP address when bind is invoked on the socket. Then, the code attempts to connect to a remote address by using the same socket. If the remote address uses InfiniBand, as specified in the configuration file, the connection will not be converted to SDP because the socket was previously bound.
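
By contrast, a sketch of the connect-only case, reusing the addresses from the sample configuration file: when the channel is left unbound, the implicit bind happens at connect time, and a matching "connect" rule can then convert the connection to SDP.

    AsynchronousSocketChannel channel = AsynchronousSocketChannel.open();
    // No explicit bind: the implicit bind occurs during connect, so the
    // connection can be converted to SDP by a matching "connect" rule.
    Future<Void> result = channel.connect(new InetSocketAddress("192.168.0.1", 1521));
    result.get(); // waits until the (possibly SDP) connection is established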

Conclusion

    InfiniBand is a "de-facto" standard for high performance computing environments that needs to guarantee extreme low latency and superior throughput. But to really take advantage of this technology, your applications need to be properly prepared. This article showed how you can maximize the performance of Java applications enabling the SDP support, the network protocol that "talks" with the InfiniBand network.

About

Ricardo Ferreira is just a regular person who lives in Brazil and is passionate about technology, movies and his whole family. He currently works at Oracle Corporation.
