Is Big Data just Super Sexy Batch?
By Dain C. Hansen-Oracle on Jul 22, 2012
One of the key expectations we have for big data and our information architecture is that it will yield faster, better and more insightful analytics. The appeal of processing so much information quickly may be why the Hadoop technologies were originally invented. But is it nothing more than a super sexy batch? Yes – on sexy. But there’s definitely an important real-time element involved.
Real-time replication is one of the most important next-generation approaches in the business analytics and data warehousing industry. It’s one of the most efficient ways to get information to our analytics tools from the multiple sources and systems where data lives and, of course, to get it there instantly without impact on the original source systems. How is this different from the famous “3Vs” of big data? V-ery different.
At the heart of big data precepts are good old-fashioned, super-non-sexy batch processing underpinnings: extraction, transformation, and loading. We just call it something different, with terms like reduction or federation. What is different is that this batch is done much, much faster, with different styles of data (unstructured and semi-structured), and on different, non-relational systems. All of these elements contribute to velocity, volume and variety. So yes, some big data processing is implicitly fast.
So let’s go back to analytics again. What a business user sees on a business analytics dashboard depends on how the data is loaded, transformed, cleansed, and ultimately mastered into many different applications. But how timely that data is remains a question that needs to be asked, whether it is big data or traditional enterprise data. And that gets to the heart of it. We need both. These technologies need to work together. Real-time and big data are very much siblings in developing next-generation information architectures that can help businesses overcome the challenges of managing a data explosion.
While big data on its own has no simple means of applying ‘traditional’ change data capture, implementing real-time solutions in conjunction with big data is still an important requirement. Otherwise the speed advantage of indexing reams of big data will be undone by the sluggish ETL processing it depends on. Big data can be processed at high volume with high velocity. Combine this power with real-time solutions in replication, change data capture, synchronization, and integration with in-memory business analytics tooling, and you have the compelling advantages of real-time business analytics.
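To make the contrast between a batch reload and change data capture concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the names `ChangeEvent`, `batch_reload` and `apply_changes` are hypothetical, the "tables" are plain in-memory dictionaries, and none of this reflects how Oracle GoldenGate or any other product is actually implemented.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

Row = Dict[str, object]


@dataclass
class ChangeEvent:
    """Hypothetical record of one captured change on the source system."""
    op: str                    # "insert", "update", or "delete"
    key: int                   # primary key of the affected row
    row: Optional[Row] = None  # new row image; not needed for deletes


def batch_reload(source: Dict[int, Row]) -> Dict[int, Row]:
    """Batch style: copy every row, whether or not it changed."""
    return {key: dict(row) for key, row in source.items()}


def apply_changes(target: Dict[int, Row], events: List[ChangeEvent]) -> int:
    """Change-data-capture style: apply only the captured changes, in order."""
    applied = 0
    for event in events:
        if event.op in ("insert", "update"):
            if event.row is None:
                raise ValueError(f"{event.op} on key {event.key} needs a row image")
            target[event.key] = dict(event.row)
        elif event.op == "delete":
            target.pop(event.key, None)
        else:
            raise ValueError(f"unknown operation: {event.op}")
        applied += 1
    return applied


# Initial load of the analytics copy, done once as a batch.
source = {
    1: {"customer": "Ada", "total": 120},
    2: {"customer": "Grace", "total": 75},
}
target = batch_reload(source)

# The source system keeps changing after the load.
source[2] = {"customer": "Grace", "total": 90}
source[3] = {"customer": "Edsger", "total": 40}
del source[1]

# Only the three changes travel to the target, not the whole table.
events = [
    ChangeEvent("update", 2, source[2]),
    ChangeEvent("insert", 3, source[3]),
    ChangeEvent("delete", 1),
]
applied = apply_changes(target, events)
```

In this toy setup the target ends up matching the source after three small events, whereas a batch approach would have to re-read every row to reach the same state. That difference in work per refresh is the reason the two styles complement each other.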
Oracle's solution for real-time business analytics combines Oracle Data Integration and Oracle Business Analytics. Within Oracle Business Analytics is Oracle Exalytics, the industry’s first in-memory BI machine designed to achieve explosive performance of BI applications. Oracle GoldenGate is an integral part of the real-time business analytics use case: it performs real-time data capture and replication, thereby ensuring that applications have the data they need immediately. Together these solutions can be integrated with big data architectures using Oracle Big Data Connectors to help solve more complex information correlation challenges that span both worlds: enterprise data and big data.