Friday Apr 04, 2014

Turning Big Data into Real-Time Action for a Greater Customer Experience

The power has shifted to us, consumers. The digital revolution allows us to access a broader set of services and communicate without boundaries. Today we demand more and better choices in a competitive market, putting pressure on businesses to catch up with our expectations.

By offering a differentiated and improved experience to their customers, organizations see that they can drive revenue growth via higher loyalty and improved brand perception. Because technology is a key enabler for delivering a superb and consistent customer experience across all touchpoints, customer experience solutions have become a top priority for CIOs in recent years. Thanks to the availability of big data analytics, organizations can now analyze a broader variety of data, rather than a few basic data points, and gain deeper insight into their customers and operations. In turn, this deeper insight helps them align their business to provide a seamless customer experience.

In our digital, fast-paced world we produce large volumes of data with unprecedented velocity. This data carries perishable value: it requires fast capture, analysis, and action to influence operations or the interaction with the customer. Otherwise the insight or action may become irrelevant, which significantly decreases its value for the customer and the organization. To extract maximum value from highly dynamic, perishable data, you need to process it much faster and take timely action. This is the main premise behind Oracle's Fast Data solutions, which we have discussed in previous blogs and webcasts.

Real-time data integration and analytics play a crucial role in our new world of big and fast data. Organizations that look to leverage big data to create a greater customer experience need to evaluate the analytical foundation behind their customer-facing systems and the resulting interactions, and determine whether they can improve how and when they collect, analyze, and act on their ever-growing data assets.

In our next webcast my colleague Pete Schutt in the Oracle Business Analytics team and I will discuss how organizations can create value for their customers using real-time customer analytics, and how to leverage big data to build a solid business analytics foundation using the latest features of Oracle Data Integration and Oracle Business Analytics. We will provide multiple customer examples for different solution architectures.

Join us on April 15th 10am PT/ 1pm ET by registering via the link below.

Turning Big Data into Real-Time Action for a Greater Customer Experience

Tuesday, April 15th 10am PT/ 1pm ET

Until we meet at this webcast, please review my related article on this topic published on DBTA earlier this year:

Wednesday Jan 29, 2014

ODI 12.1.2 Demo on the Oracle BigDataLite Virtual Machine

Oracle's big data team has just announced the Oracle BigDataLite Virtual Machine, a pre-built environment that reflects the core software of Oracle's Big Data Appliance 2.4 and gets you started quickly. BigDataLite is a VirtualBox VM that contains a fully configured Cloudera Hadoop distribution CDH 4.5, Oracle Database 12c, Oracle's Big Data Connectors, Oracle Data Integrator 12.1.2, and other software.

You can use this environment to see ODI 12c in action integrating big data with Oracle Database using ODI's declarative graphical design, efficient E-LT loads, and Knowledge Modules designed to optimize big data integration.

The sample data contained in BigDataLite represents the fictional Oracle MoviePlex on-line movie streaming company. The ODI sample performs the following two steps:

  • Pre-process application logs within Hadoop: All user activity on the MoviePlex web site is gathered on HDFS in Avro format. ODI reads these logs through Hive and processes the activity records by aggregating, filtering, joining, and unioning them in an ODI flow-based mapping. All processing is performed inside Hive MapReduce jobs controlled by ODI, and the resulting data is stored in a staging table within Hive.
  • Load user activity data from Hadoop into Oracle: The pre-processed data is loaded from Hadoop into an Oracle 12c database, where it can serve as the basis for Business Intelligence reports. ODI uses the Oracle Loader for Hadoop (OLH) connector, which executes distributed MapReduce processes to load data in parallel from Hadoop into Oracle. ODI transparently configures and invokes this connector through the Hive-to-Oracle Knowledge Module.

Both steps are orchestrated and executed through an ODI Package workflow. 

Demo Instructions

Please follow these steps to execute the ODI demo in BigDataLite:

  1. Download and install BigDataLite, following the instructions in the Deployment Guide at the download page.
  2. Start the VM and log in as user oracle, password welcome1.
  3. Start the Oracle Database 12c by double-clicking the icon on the desktop.


  4. Start ODI 12.1.2 by clicking the icon on the toolbar.


  5. Press Connect To Repository... on the ODI Studio window. 


  6. Press OK in the ODI Login dialog.


  7. Switch to the Designer tab, open the Projects accordion and expand the projects tree to Movie > First Folder > Mappings. Double-click on the mapping Transform Hive Avro to Hive Staging.


  8. Review the mapping that transforms source Avro data by aggregating, joining, and unioning data within Hive. You can also review the mapping Load Hive Staging to Oracle the same way. 


  9. In the Projects accordion expand the projects tree to Movie > First Folder > Packages. Double-click on the package Process Movie Data.


  10. The Package workflow for Process Movie Data opens. You can review the package.


  11. Press the Run icon on the toolbar, then press OK in the Run dialog and in the Information: Session started dialog.




  12. You can follow the progress of the load by switching to the Operator tab and expanding All Executions and the upmost Process Movie Data entry. You can refresh the display by pressing the refresh button or setting Auto-Refresh. 


  13. Depending on the environment, the load can take 5-15 minutes. When the load is complete, the execution shows green check marks for all steps. You can traverse the Operator log and double-click entries to explore statistics and executed commands.

This demo shows only some of ODI's big data capabilities. You can find more information at:


Monday Oct 14, 2013

Streaming relational transactions to HBase

Following the introductory blog post on the topic, 'Stream your transactions into Big Data Systems', and the blog posts on 'Streaming Relational Transactions to HDFS and Hive', in this blog post I focus on the architecture for streaming transactions into HBase.

As shown in the diagram below, integrating a relational database with HBase is accomplished by developing a custom handler using Oracle GoldenGate's Java API and the HBase APIs.

The custom handler is deployed as an integral part of the Oracle GoldenGate Pump process. The Pump process and the custom adapter are configured through the Pump parameter file and the custom adapter's properties file. The Pump process executes the adapter in its address space. The Pump reads the Trail File created by the Oracle GoldenGate Capture process and passes the transactions to the adapter. Based on the configuration, the adapter writes the transactions to HBase.
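
To give a feel for the HBase side of such a handler, here is a minimal sketch in Java. It assumes the adapter has already extracted the target table name, row key, and changed column values from the trail record, and it writes everything under a single column family named "cf"; the actual sample implementation referenced below uses the Oracle GoldenGate Java API classes, which are not shown here.

import java.io.IOException;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

/**
 * Minimal sketch: writes one captured database row change into HBase.
 * The caller (the GoldenGate adapter callback) is assumed to have already
 * extracted the target table name, row key, and column values.
 */
public class HBaseRowWriter {

    private final Configuration conf = HBaseConfiguration.create();

    public void writeRow(String tableName, String rowKey,
                         Map<String, String> columnValues) throws IOException {
        HTable table = new HTable(conf, tableName);
        try {
            Put put = new Put(Bytes.toBytes(rowKey));
            // Store every captured column under a single column family, "cf".
            for (Map.Entry<String, String> col : columnValues.entrySet()) {
                put.add(Bytes.toBytes("cf"),
                        Bytes.toBytes(col.getKey()),
                        Bytes.toBytes(col.getValue()));
            }
            table.put(put);
        } finally {
            table.close();
        }
    }
}

In a real handler you would typically reuse the HTable instance (or an HTablePool) across operations rather than opening the table per row.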

A sample implementation of the HBase adapter is provided on My Oracle Support (Knowledge ID - 1586211.1). This is provided to illustrate the capability and to assist in the adoption of the Oracle GoldenGate Java API in developing custom solutions. The sample implementation illustrates the configuration and the code required for replicating database transactions on an example table to a corresponding HBase table. The instructions for configuring Oracle GoldenGate, compiling and running the sample implementation are also provided.

The sample code and configuration may be extended to develop custom solutions; however, please note that Oracle will not provide support for the code and the configuration illustrated in the knowledge base paper.

As always, I would greatly appreciate it if you could share your use case for integrating Oracle GoldenGate into your Big Data strategy and your feedback on using the custom handler to integrate relational databases with your Big Data systems. Please post your comments in this blog or in the Oracle GoldenGate public forum - https://forums.oracle.com/community/developer/english/business_intelligence/system_management_and_integration/goldengate

Wednesday Oct 09, 2013

Streaming relational transactions to Hive

Following the introductory blog post on the topic, 'Stream your transactions into Big Data Systems', and the blog post on 'Streaming relational transactions to HDFS', in this blog post I will discuss the architecture for streaming relational transactions into Hive.

Referring to the architecture diagram below, integrating a relational database with Hive is accomplished by developing a custom handler using Oracle GoldenGate's Java API and the Hadoop HDFS APIs.

The custom handler is deployed as an integral part of the Oracle GoldenGate Pump process. The Pump process and the custom adapter are configured through the Pump parameter file and the custom adapter's properties file. The Pump process executes the adapter in its address space. The Pump reads the Trail File created by the Oracle GoldenGate Capture process and passes the transactions to the adapter. Based on the configuration, the adapter writes the transactions in the desired format, with the appropriate content, to a file whose layout is defined by the Hive DDL for the target table.
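
As a rough illustration of the formatting step, the sketch below renders one captured row as a text record for a Hive table declared with tab-delimited fields. The delimiter, column ordering, and NULL handling are assumptions for illustration and must match the actual Hive DDL of the target table.

import java.util.List;

/**
 * Minimal sketch: formats one captured row change as a delimited text
 * record matching a Hive table declared with
 *   ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'.
 * Column values are assumed to arrive in the same order as the Hive DDL.
 */
public class HiveRecordFormatter {

    private static final char FIELD_DELIMITER = '\t';
    private static final char ROW_TERMINATOR = '\n';

    public String format(List<String> columnValues) {
        StringBuilder record = new StringBuilder();
        for (int i = 0; i < columnValues.size(); i++) {
            if (i > 0) {
                record.append(FIELD_DELIMITER);
            }
            String value = columnValues.get(i);
            // Represent NULLs the way Hive expects them in text files by default.
            record.append(value == null ? "\\N" : value);
        }
        record.append(ROW_TERMINATOR);
        return record.toString();
    }
}

The same approach extends to other row formats; what matters is that the delimiter, column order, and NULL token agree with the Hive table definition, otherwise queries will silently return shifted or NULL columns.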

A sample implementation of the Hive adapter is provided on My Oracle Support (Knowledge ID - 1586188.1). This is provided to illustrate the capability and to assist in the adoption of the Oracle GoldenGate Java API in developing custom solutions. The sample implementation illustrates the configuration and the code required for replicating database transactions on an example table to a corresponding Hive table. The instructions for configuring Oracle GoldenGate, compiling and running the sample implementation are also provided.

The sample code and configuration may be extended to develop custom solutions; however, please note that Oracle will not provide support for the code and the configuration illustrated in the knowledge base paper.

It would be great if you could share your use case for leveraging Oracle GoldenGate in your Big Data strategy and your feedback on using the custom handler to integrate relational databases with Hive. Please post your comments in this blog or in the Oracle GoldenGate public forum - https://forums.oracle.com/community/developer/english/business_intelligence/system_management_and_integration/goldengate

Thursday Oct 03, 2013

Streaming relational transactions to Hadoop HDFS

Following the introductory blog post on the topic, 'Stream your transactions into Big Data Systems', in this blog post I will drill down on the architecture for streaming relational transactions into HDFS.

As you can see in the architecture diagram below, you can integrate a relational database with HDFS by developing a custom handler using Oracle GoldenGate's Java API and the Hadoop HDFS APIs.

The custom handler is deployed as an integral part of the Oracle GoldenGate Pump process. The Pump process and the custom adapter are configured through the Pump parameter file and the custom adapter's properties file.

The Pump process executes the adapter in its address space. The Pump reads the Trail File created by the Oracle GoldenGate Capture process and passes the transactions to the adapter. Based on the configuration, the adapter writes the transactions in the desired format, with the appropriate content to a desired file on HDFS.
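
As a minimal sketch of the write step, the code below uses the Hadoop FileSystem API to create a file on HDFS and write formatted records to it. The path handling (one file per run, overwritten on restart) is an assumption for illustration; the sample implementation referenced below shows the configuration actually used.

import java.io.IOException;
import java.nio.charset.Charset;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Minimal sketch: writes formatted transaction records to a file on HDFS.
 * Assumes the HDFS name node address is picked up from the Hadoop
 * configuration files on the classpath.
 */
public class HdfsRecordWriter {

    private final FileSystem fs;
    private final FSDataOutputStream out;

    public HdfsRecordWriter(String hdfsFilePath) throws IOException {
        Configuration conf = new Configuration();
        fs = FileSystem.get(conf);
        // Create (or overwrite) the target file for this run.
        out = fs.create(new Path(hdfsFilePath), true);
    }

    public void write(String record) throws IOException {
        out.write(record.getBytes(Charset.forName("UTF-8")));
    }

    public void close() throws IOException {
        out.close();
        fs.close();
    }
}

A production adapter would typically roll files by size or time and flush on transaction boundaries so that downstream consumers see consistent data.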

A sample implementation of the HDFS adapter is provided on My Oracle Support (Knowledge ID - 1586210.1). This is provided to illustrate the capability and to assist in the adoption of the Oracle GoldenGate Java API in developing custom solutions. The sample implementation illustrates the configuration and the code required for replicating database transactions on an example table to a file on HDFS. The instructions for configuring Oracle GoldenGate, compiling and running the sample implementation are also provided.

The sample code and configuration may be extended to develop custom solutions; however, please note that Oracle will not provide support for the code and the configuration illustrated in the knowledge base paper.

Please share your use case about how you use Oracle GoldenGate in your Big Data strategy and your feedback on using the custom handler for integrating relational database with your Big Data systems. Please post your comments in this blog or in the Oracle GoldenGate public forum - https://forums.oracle.com/community/developer/english/business_intelligence/system_management_and_integration/goldengate

Thursday Sep 12, 2013

Stream Relational Transactions into Big Data Systems

Are you one of the organizations adopting 'big data systems' to manage and analyze the class of data typically referred to as big data? If so, you may know that big data includes data that could be structured, semi-structured, or unstructured, each of which originates from a variety of different sources. Big data is also characterized by its volume, velocity, and veracity. Due to its promise to help harness the data deluge we are faced with, the adoption of big data solutions is becoming quite pervasive. In this blog post I'd like to discuss how to leverage Oracle GoldenGate's real-time replication for big data systems.

The term 'big data systems' is an umbrella term used to describe a wide variety of technologies, each of which serves a specific purpose. Broadly speaking, big data technologies address batch, transactional, and real-time processing requirements. Choosing the appropriate big data technology depends heavily on the use case being addressed.

While gaining business intelligence from transactional data continues to be a dominant factor in the decision-making process, businesses have realized that gaining intelligence from the other forms of data they have been collecting will enable them to achieve a more complete view, address additional business objectives, and make better decisions. The following table gives examples of industry verticals, the forms of data involved, and the objective the business attempts to achieve using those other forms of data.

Industry | Data | Objective
Healthcare | Practitioner's notes, machine statistics | Best practices and reduced hospitalization
Retail | Weblogs, click streams | Micro-segmentation recommendations
Banking | Weblogs, fraud reports | Fraud detection, risk analysis
Utilities | Smart meter readings, call center data | Real-time and predictive utilization analysis

Role of transactional data

When other forms of data are used for analytics, better contextual intelligence is obtained when the analysis is combined with transactional data. Low-latency transactional data, in particular, brings value to dynamically changing operations that day-old data cannot deliver. In most organizations, the vast majority of applications' transactional data is captured in relational databases. To ensure an efficient supply of transactional data for big data analytics, the data integration solution should address several requirements:

  • Reliable change data capture and delivery mechanism
  • Minimal resource consumption when extracting data from the relational data source
  • Secured data delivery
  • Ability to customize data delivery
  • Support heterogeneous database sources
  • Easy to install, configure and maintain

A solution that can reliably stream database transactions to the desired target ensures that effort is spent on data analysis rather than data acquisition. And when the solution is non-intrusive and has minimal impact on the source database, it minimizes the need for additional resources and changes on the source database.

Oracle GoldenGate is a time-tested and proven product for real-time, heterogeneous relational database replication. It addresses the challenges listed above and is widely used by organizations for mission-critical data replication among relational databases. Furthermore, GoldenGate moves transactional data in real time to support timely operational business intelligence needs.

Oracle GoldenGate Integration Options for Big Data Analytics

A variety of integration options is available with the Oracle GoldenGate product to facilitate delivering transactions from relational databases into non-relational targets.

Oracle GoldenGate provides pre-built adapters that integrate with flat files and messaging systems. Please refer to the Oracle GoldenGate for Java - Administration Guide and the Oracle GoldenGate for Flat Files - Administration Guide for more information.

Oracle GoldenGate also provides Java APIs and a framework for developing custom integrations to Java-enabled targets. Using this capability, custom adapters or handlers can be developed to address specific requirements. In this blog post I'd like to focus on the Oracle GoldenGate Java APIs for developing custom integrations to big data systems.

As mentioned earlier, 'big data systems' is an umbrella term describing a wide variety of technologies, each of which serves a specific purpose. Among the various big data systems, Hadoop and its suite of technologies are widely adopted by organizations for processing big data. The diagram below illustrates a general high-level architecture for integrating with Hadoop.

[Architecture diagram: the Oracle GoldenGate Pump process hosting the custom adapter, configured through the Pump parameter file and the adapter properties file.]

You can implement a custom adapter or handler for the big data system using Oracle GoldenGate's Java API. The custom adapter is deployed as an integral part of the Oracle GoldenGate Pump process. The Pump and the custom adapter are configured through the Pump parameter file and the custom adapter's properties file, respectively. The specific properties the custom adapter needs will depend on your requirements.

The Pump process will execute the adapter in its address space. The Pump reads the Trail File created by the Oracle GoldenGate Capture process and passes the transactions to the adapter. Based on the configuration, the adapter will write the transactions into Hadoop.
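
To convey the overall shape of such an adapter, here is a schematic skeleton. The class and method names below (BigDataHandler, init, operationAdded, transactionCommit) are illustrative placeholders only, not the actual Oracle GoldenGate Java API signatures; the Oracle GoldenGate for Java Administration Guide documents the real handler interface and its event callbacks.

/**
 * Schematic skeleton of a GoldenGate custom adapter for a big data target.
 * All names below are illustrative placeholders, not the real GoldenGate
 * Java API; the real API delivers change operations and transaction
 * boundaries through equivalent callbacks.
 */
public abstract class BigDataHandler {

    /** Called once when the Pump loads the adapter; read the properties file here. */
    public void init(java.util.Properties adapterProperties) {
        // e.g. target cluster address, table mappings, output format
    }

    /** Called for every captured insert/update/delete read from the trail file. */
    public void operationAdded(String tableName,
                               String operationType,
                               java.util.Map<String, String> columnValues) {
        // buffer the change, or format and write it to the big data target
    }

    /** Called when the source transaction commits; flush buffered changes. */
    public void transactionCommit(String transactionId) {
        // e.g. flush to HDFS, Hive, or HBase so the target stays consistent
    }
}

Whatever the exact API, the key design point is that the adapter only reacts to events the Pump feeds it, so it inherits GoldenGate's ordering and recovery behavior instead of re-implementing them.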

Enabling the co-existence of big data systems with relational systems helps organizations better serve customers and improve decision-making capabilities. Oracle GoldenGate, which has an excellent record of addressing a wide range of data management requirements, provides the capability to integrate with big data systems. In upcoming blog posts, we will discuss in depth the implementation and configuration of integrating Oracle GoldenGate with Hadoop technologies.

Friday May 31, 2013

Improving Customer Experience for Segment of One Using Big Data

Customer experience has been one of the top focus areas for CIOs in recent years. A key requirement for improving customer experience is understanding the customer: their past and current interactions with the company, their preferences, demographic information, and so on. This capability helps the organization tailor its services or products for different customer segments to maximize their satisfaction. This is not a new concept. However, there have been two parallel changes in how we approach and execute this strategy.

The first is the big data phenomenon, which brought the ability to obtain a much deeper understanding of customers, especially by bringing in social data. As the Forbes article "Six Tips for Turning Big Data into Great Customer Experiences" mentions, big data has especially transformed online marketing. With the large volume and different types of data now available, companies can run more sophisticated analysis in a more granular way. This leads to the second change: the size of customer segments. It is shrinking down to one, where each individual customer is offered a personalized experience based on their individual needs and preferences. This notion brings more relevance into day-to-day interactions with customers, and takes customer satisfaction and loyalty to a level that was not possible before.

One of the key technology requirements for improving customer experience at such a granular level is obtaining a complete and up-to-date view of the customer. That requires integrating data across disparate systems in a timely manner. The data integration solution should move and transform large data volumes stored in heterogeneous systems in geographically dispersed locations. Moving data with very low latency to the customer data repository or a data warehouse enables companies to have relevant and actionable insight for each customer. Instead of relying on yesterday's data, which may no longer be pertinent, the solution should analyze the latest information and turn it into a deeper understanding of that customer. With that knowledge the company can formulate real opportunities to drive higher customer satisfaction.

Real-time data integration is a key enabling technology for real-time analytics. Oracle GoldenGate's real-time data integration technology has been used by many leading organizations to get the most out of their big data and build a closer relationship with customers. One good example in the telecommunications industry is MegaFon. MegaFon is Russia's top provider of mobile internet solutions. The company deployed Oracle GoldenGate 11g to capture billions of monthly transactions from eight regional billing systems. The data was integrated and centralized onto Oracle Database 11g and distributed to business-critical subsystems. The unified and up-to-date view into customers enabled more sophisticated analysis of mobile usage information and facilitated more targeted customer marketing. As a result, the company increased revenue generated from its current customer base. Many other telecommunications industry leaders, including DIRECTV, BT, TataSky, SK Telecom, and Ufone, have improved customer experience by leveraging real-time data integration.

Telecommunications is not the only industry where a single view of the customer drives more personalized interaction. Woori Bank implemented Oracle Exadata and Oracle GoldenGate. In the past, it had been difficult for them to revise and incorporate changes to marketing campaigns in real time because they were working with the previous day's data. Now, users can immediately access and analyze transactions for specific trends in the data mart access layer and adjust campaigns and strategies accordingly. Woori Bank can also send tailored offers to customers.

This is just one example of how real-time data integration can transform business operations and the way a company interacts with its customers. I would like to invite you to learn more about how data integration facilitates improved customer experience by reviewing our free resources here and following us on Facebook, Twitter, YouTube, and LinkedIn.

Image courtesy of jscreationzs at FreeDigitalPhotos.net

Thursday Apr 11, 2013

Why Real Time?

Continuing on the five key data integration requirements topic, this time we focus on real-time data for decision making. 


Friday Mar 15, 2013

Pervasive Access to Any Data

In my previous blog, I shared with you the five key data integration requirements, which can be summarized as: integrating any data from any source, stored on premise or in the cloud, with maximum performance and availability, to achieve 24/7 access to timely and trusted information. Today, I want to focus on the requirement for integrating “any data”.

We all feel the impact of huge growth in the amount of raw data collected on a daily basis. And big data is a popular topic of information technology these days. Highly complex, large volumes of data bring opportunities and challenges to IT and business audiences. The opportunities, as discussed in McKinsey’s report, are vast, and companies are ready to tap into big data to differentiate and innovate in today’s competitive world.

One of the key challenges of big data is managing unstructured data, which is estimated to be 80% of enterprise data. Structured and unstructured data must coexist and be used in conjunction with each other to gain maximum insight. This means organizations must collect, organize, and analyze data from sensors, conversations, e-commerce websites, social networks, and many other sources.

Big data also changes the perspective on information management. It changes the question from “How do you look at your data?” to “How do you look at the data that is relevant to you?” This shift in perspective has huge implications for information-management best practices and the technologies applied. Data integration technologies now need to support unstructured and semi-structured data, in addition to structured transactional data, to support a complete picture of the enterprise that will drive higher efficiency, productivity, and innovation.

Oracle addresses big data requirements with a complete solution.

In addition to Oracle Big Data Appliance for acquiring and organizing big data, Oracle offers Oracle Big Data Connectors that enable an integrated data set for analysis. Big Data Connectors is a software suite that integrates big data across the enterprise. Oracle Data Integrator offers an Application Adapter for Hadoop, which is part of the Big Data Connectors, and allows organizations to build Hadoop metadata within Oracle Data Integrator, load data into Hadoop, transform data within Hadoop, and load data directly into Oracle Database using Oracle Loader for Hadoop. Oracle Data Integrator has the necessary capabilities for integrating structured, semi-structured, and unstructured data to support organizations with transforming any type of data into real value.

If you would like to learn more about how to use Oracle’s data integration offering for your big data initiatives take a look at our resources on Bridging the Big Data Divide with Oracle Data Integration.

Thursday Mar 07, 2013

5 New Data Integration Requirements for Today’s Data-Driven Organizations

How are the latest technology trends affecting data integration requirements? Read about the 5 key data integration requirements and Oracle's Data Integration product strategy in meeting these requirements.

Monday Feb 25, 2013

Connecting Velocity to Value: Introducing Oracle Fast Data

To understand fast data, one must first look at one of the most compelling recent breakthroughs in data management: big data. Big data solutions address the challenge today's businesses face in managing the increasing volume, velocity, and variety of all data - not just data within the organization, but also data about it. Much of the buzz around big data has centered on Hadoop and NoSQL technologies, but little has been said about velocity. Velocity is about the speed at which this data is generated, and in many cases the economic value of this data diminishes just as fast. As a result, companies need to process large volumes of data in real time and make decisions more rapidly to create value from highly perishable, high-volume data in business operations.

This is where fast data comes in. Fast data solutions help manage the velocity (and scale) of any type of data and any type of event to enable precise action for real-time results.

Fast data solutions come from multiple technologies, and some of the concepts, such as complex event processing and business activity monitoring, have been in use in areas such as the financial services industry for years. But often the pieces were used in isolation: a complex event processing engine as a standalone application applying predefined business rules to filter data, for example. When these concepts are tied to analytics, capabilities expand to allow improved real-time insights. By tying together these strands, companies can filter/correlate, move/transform, analyze, and finally act on information from big data sources quickly and efficiently, enabling both real-time analysis and further business intelligence work once the information is stored.

Oracle’s Fast Data solutions offer multiple technologies that work hand in hand to create value out of high-velocity, high-volume data. They are designed to optimize efficiency and scale for processing high volumes of events and transactions.


Friday Dec 28, 2012

ODI - Reverse Engineering Hive Tables

ODI can reverse engineer Hive tables via the standard reverse engineer and also via an RKM for tables defined in Hive, which makes it very easy to capture table designs from Hive in ODI for integration. To illustrate, I will use the MovieLens data set, a common data set used in Hadoop training.

I have defined two tables in Hive for movies and their ratings, as below; one file has fields delimited with '|', the other is tab-delimited.

  1. create table movies (movie_id int, movie_name string, release_date string, vid_release_date string,imdb_url string) row format delimited fields terminated by '|';
  2. create table movie_ratings (user_id string, movie_id string, rating float, tmstmp string) row format delimited fields terminated by '\t';

For this example I have loaded the Hive tables manually from my local filesystem (into Hive/HDFS) using the following LOAD DATA Hive commands and the MovieLens data set mentioned earlier:

  1. load data local inpath '/home/oracle/data/u.item' OVERWRITE INTO TABLE movies;
  2. load data local inpath '/home/oracle/data/u.data' OVERWRITE INTO TABLE movie_ratings;

The u.item data file looks like the following, with the '|' delimiter:

  • 1|Toy Story (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Toy%20Story%20(1995)|0|0|0|1|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0
  • 2|GoldenEye (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?GoldenEye%20(1995)|0|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0
  • 3|Four Rooms (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Four%20Rooms%20(1995)|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0

In ODI I can define my Hive data server and logical schema; here is the JDBC connection for my Hive database (I just used the default):
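
For reference, connecting to that default Hive server from Java looks roughly like the sketch below. The driver class and URL shown are the HiveServer1-era defaults (port 10000, the default database) and are assumptions; match them to whatever your Hive installation and ODI topology actually use.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

/**
 * Rough sketch of connecting to Hive over JDBC with the kind of settings a
 * default ODI Hive data server of that era would use (assumptions:
 * HiveServer1 driver class, default port 10000, "default" database).
 */
public class HiveJdbcCheck {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
        Connection conn = DriverManager.getConnection(
                "jdbc:hive://localhost:10000/default", "", "");
        Statement stmt = conn.createStatement();
        // List the tables we just created, e.g. movies and movie_ratings.
        ResultSet rs = stmt.executeQuery("show tables");
        while (rs.next()) {
            System.out.println(rs.getString(1));
        }
        rs.close();
        stmt.close();
        conn.close();
    }
}

These are essentially the same driver, URL, and credential fields that the ODI data server definition captures in the topology.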

I can then define my model and perform a selective reverse using standard ODI functionality; below I am reversing just the movies table and the movie ratings table;


After the reverse engineering is complete, the tables appear in the model tree, and the data can be inspected just like regular datastores;

From here we see the data in the regular data view;

The ODI RKM for Hive performs logging that is useful for debugging if you hit issues with the reverse engineer. This is a very basic example of how some of the capabilities hang together. ODI can also be used to design the load of the file into Hive, transformations within Hive, and subsequent loads into Oracle using Oracle Loader for Hadoop, and so on.

Monday Oct 15, 2012

Bridging the Gap in Cloud, Big Data, and Real-time

With all the buzz around big data and cloud computing, it is easy to overlook one of your most precious commodities: your data. Today’s businesses cannot stand still when it comes to data. Market success now depends on speed, volume, complexity, and keeping pace with the latest data integration breakthroughs. Are you up to speed with big data, cloud integration, and real-time analytics? Join us in this three-part blog series where we’ll look at each component in more detail. Meet us online on October 24th, where we’ll take your questions about the issues you are facing in this brave new world of integration.

Let’s start first with Cloud.

What happens to your data when you decide to implement a private cloud architecture? Or a public cloud? Data integration solutions play a vital role in migrating data simply, efficiently, and reliably to the cloud; they are a necessary ingredient of any platform-as-a-service strategy because they support cloud deployments with data-layer application integration between on-premise and cloud environments of all kinds. For private cloud architectures, consolidating your databases and data stores is an important step toward receiving the full benefits of cloud computing. Private cloud integration requires bidirectional replication between heterogeneous systems so you can perform data consolidation without interrupting business operations. In addition, bulk loading and transforming data into and out of your private cloud is a crucial step for companies moving to a private cloud. There is also a need to manage data services as part of SOA/BPM solutions that enable agile application delivery and help build shared data services for organizations.

But what about the public cloud? If you have moved your data to a public cloud application, you may also need to connect your on-premise enterprise systems and the cloud environment by moving data in bulk or as real-time transactions across geographies.

For both public and private cloud architectures, Oracle offers a complete and extensible set of integration options that span not only data integration but also service and process integration, security, and management. For those companies investing in Oracle Cloud, you can move your data through Oracle SOA Suite using REST APIs to Oracle Messaging Cloud Service, a new service that lets applications deployed in Oracle Cloud securely and reliably communicate over Java Message Service. As an example of loading and transforming data into other public clouds, Oracle Data Integrator supports a knowledge module for Salesforce.com, now available on AppExchange. Other third-party knowledge modules are being developed by customers and partners every day.

To learn more about how to leverage Oracle’s Data Integration products for Cloud, join us live: Data Integration Breakthroughs Webcast on October 24th 10 AM PST.

Friday Sep 21, 2012

Tackling Big Data Analytics with Oracle Data Integrator

 By Mike Eisterer

 The term big data draws a lot of attention, but behind the hype there's a simple story. For decades, companies have been making business decisions based on transactional data stored in relational databases. Beyond that critical data, however, is a potential treasure trove of less structured data: weblogs, social media, email, sensors, and documents that can be mined for useful information.

Companies are facing emerging technologies, increasing data volumes, numerous data varieties, and the need for processing power to efficiently analyze data that changes with high velocity.

 

 

Oracle offers the broadest and most integrated portfolio of products to help you acquire and organize these diverse data sources and analyze them alongside your existing data to find new insights and capitalize on hidden relationships.

 

Oracle Data Integrator Enterprise Edition (ODI) is critical to any enterprise big data strategy. ODI and the Oracle Data Connectors provide native access to Hadoop, leveraging technologies such as MapReduce, HDFS, and Hive. Together with ODI’s metadata-driven approach to extracting, loading, and transforming data, companies can now integrate their existing data with big data technologies and deliver timely and trusted data to their analytic and decision support platforms. In this session, you’ll learn about ODI and Oracle Big Data Connectors and how, coupled together, they provide the critical integration with multiple big data platforms.

 

Tackling Big Data Analytics with Oracle Data Integrator

October 1, 2012 12:15 PM at MOSCONE WEST – 3005

For other data integration sessions at OpenWorld, please check our Focus-On document

If you are not able to attend OpenWorld, please check out our latest resources for Data Integration.

Monday Aug 20, 2012

Hadoop - Invoke Map Reduce

Carrying on from the previous post on Hadoop and HDFS with ODI packages, this is another needed call-out task: how to execute existing MapReduce code from ODI. I will show this using ODI Packages and the Open Tool framework.

The Hadoop JobConf SDK is the class needed for initiating jobs, whether local or remote - so the ODI agent could, for example, be hosted on a system other than the Hadoop cluster and just fire jobs off to it. Some useful posts, such as this older one on executing MapReduce jobs from Java (following the reply from Thomas Jungblut in that post), also helped me get up to speed.

Where better to start than the WordCount example (see a version of it here, with both mapper and reducer as inner classes)? Let's see how this can be invoked from an ODI package. The HadoopRunJob below is a tool I added via the Open Tool framework; it basically wraps the JobConf SDK, and its parameters are defined in ODI.

You can see some of the parameters below: I define the various class names I need, plus various other parameters including the Hadoop job name, and I can also specify the job tracker to fire the job on (for a client-server style architecture). The input path and output path are also defined as parameters. The first tool in the package calls the copy-file-to-HDFS tool - this just demonstrates copying the files needed by the WordCount program into HDFS ready for it to run.
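
Under the covers, submitting the job through the v1 JobConf SDK looks roughly like the following sketch; the class names, job name, and JobTracker address are assumptions standing in for the parameters the HadoopRunJob tool collects in ODI.

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

/**
 * Rough sketch of what a wrapper around the v1 JobConf SDK does: configure
 * and submit the classic WordCount job. Input and output paths are passed
 * on the command line, mirroring the tool parameters.
 */
public class RunWordCount {

    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output,
                        Reporter reporter) throws IOException {
            StringTokenizer tokenizer = new StringTokenizer(value.toString());
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, ONE);
            }
        }
    }

    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output,
                           Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(RunWordCount.class);
        conf.setJobName("ODI WordCount");
        // Optionally point at a remote JobTracker (client-server style):
        // conf.set("mapred.job.tracker", "jobtracker-host:8021");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(Map.class);
        conf.setReducerClass(Reduce.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        // Submits the job and waits for completion; fails if the output
        // directory already exists, as noted below.
        JobClient.runJob(conf);
    }
}

JobClient.runJob blocks until the job completes, which is why the ODI step only reports success or failure once the MapReduce job has finished.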

Nice and simple, and it shields a lot of the complexity behind some simple tools. The JAR file containing WordCount needed to be available to the ODI Agent (or Studio, since I invoked it with the local agent); that was it. When the package is executed, the agent processes the code and executes the steps just like normal. If I run the package above it will successfully copy the files to HDFS and perform the word count. On a second execution of the package an error will be reported because the output directory already exists, as below.

I left the example like this to illustrate that we can then extend the package design to have conditional branching to handle errors after a flow, just like the following;

Here after executing the word count, the status is checked and you can conditionally branch on success or failure – just like any other ODI package. I used the beep just for demonstration.

The HadoopRunJob tool above was built using the v1 MapReduce SDK; with MR2/YARN this will change again - these kinds of changes hammer home the need for better tooling to abstract common concepts for users to exploit.

You can see from these posts that we can provide useful tooling behind the basic mechanics of Hadoop and HDFS very easily, along with the power of generating map-reduce jobs from interface designs which you can see from the OBEs here.

About

Learn the latest trends, use cases, product updates, and customer success examples for Oracle's data integration products, including Oracle Data Integrator, Oracle GoldenGate, and Oracle Enterprise Data Quality.
