Wednesday May 13, 2015

Looking for Cutting-Edge Data Integration: 2015 Excellence Awards

It is nomination time!!!

This year's Oracle Fusion Middleware Excellence Awards will honor customers and partners who are creatively using various products across Oracle Fusion Middleware. Think you have something unique and innovative with Oracle Data Integration products?

We'd love to hear from you! Please submit today in the Big Data and Analytics category.

The deadline for the nomination is July 31, 2015. Win a free pass to Oracle OpenWorld 2015!!

Let’s reminisce a little…

For details on the 2014 Data Integration Winners: NET Serviços and Griffith University, check out this blog post.

For details on the 2013 Data Integration Winners: Royal Bank of Scotland’s Market and International Banking and The Yalumba Wine Company, check out this blog post.

For details on the 2012 Data Integration Winners: Raymond James Financial and Morrisons Supermarkets, check out this blog post.

We hope to honor you!

Click here to submit your nomination today. And just a reminder: the deadline to submit a nomination is 5pm Pacific Time on July 31, 2015.

Tuesday May 12, 2015

ODI 12c - Improving Usability of KM recipes

This post is all about reducing user errors and improving usability around the definition and usage of Knowledge Modules. Knowledge Modules are all about encapsulating recipes - every great cookbook has lots of recipes, and some are based on common techniques and ingredients. ODI Knowledge Modules are data integration recipes - they define how to access, transform and store information based on the directions in the KM. The recent 12.1.3.0.1 release includes a few improvements around both KM definition and KM usage that make for a better user experience. I've seen many KMs over the years that try to be many things to many people, with a bundle of options exposing every facet of every possible path through the KM - the user has to read the descriptions and follow the instructions carefully.

The first improvement I'll mention is the KM (and procedure) option type of 'Choice'. Not exactly rocket science I know, but an addition that greatly helps usage of a KM that may do more than one thing. Let's take the example of a KM that can make different .....pizzas. In the past you would have an option field with a string-based value where the user would type either margherita or pepperoni to drive a path within the KM implementation; users of the KM would have to know that those were the accepted option values and they'd have to type them in properly (otherwise things would go wrong). Now the option can be specified as the 'Choice' type - see below, where in the IKM we capture the recipe type as a choice.

The choices can be defined in the default value field. Below, the recipe is going to create either margherita or pepperoni pizza - these are the only two choices and the default is margherita;

Then I can define all the rest of the options. Let's say the pizza needs flour, oil, salt and yeast; pepperoni needs... pepperoni of course, and margherita needs tomatoes and basil - so some of the options are applicable to both types and some are only applicable to one. Prior to this release, when the KM was used you would see all of these options and you'd be reading descriptions like 'only set basil if you are making margherita' and so on. Another feature has been added to improve this area. Below you can see all of the options....

One column was snipped out of the image - the condition expression. This is a Groovy expression that determines whether the option is shown. So now we can say: only display basil when margherita is the recipe type, or only display pepperoni when pepperoni is the recipe type. Below we see that only the options applicable to the selected recipe type are displayed - anything common has no condition expression.

The Groovy snippet must return a string of the format show=true|false.
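As a rough illustration, a condition expression for the pepperoni option might look like the sketch below. The option name RECIPE_TYPE and the accessor used to read its value are assumptions for illustration; the only contract stated here is the returned string format.

// Rough sketch of a condition expression - not from the original post.
// RECIPE_TYPE and the accessor below are illustrative assumptions; the documented
// contract is only that the snippet returns "show=true" or "show=false".
def recipeType = getOption("RECIPE_TYPE")   // hypothetical accessor for the sibling option's value
return recipeType == "pepperoni" ? "show=true" : "show=false"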

When you see the KM assigned in the mapping it becomes a little clearer. Below you can see the choice box; the user is constrained to pick one of those types;

When margherita is selected above - remember some options were for margherita and some were for pepperoni - we see only a subset of the options;

Above you can see tomatoes and basil; if you change the type to pepperoni the above options are hidden and pepperoni is displayed as below;

This helps guide the user into configuration options that are more applicable to a path within the KM. One of the other visual enhancements is the ability to group options together. We can add all of the options above into a group named 'Ingredients' that helps visually group related options together;

Then when the KM is used you see the options related to ingredients grouped together wherever the KM is assigned.

You can see how these features improve the usability of KMs in ODI and help reduce errors by further constraining how data is entered and related in the configuration options of the KM. The tasks within the KM can retrieve the option values and perform conditional logic based on those values (see the sketch below). There are some other areas around this, but that's all for now. The functionality described here is available in the 12.1.3.0.1 release.
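For example, a task's command can branch on the chosen option value using the substitution API, along the lines of the sketch below. The option name RECIPE_TYPE, the table and the SQL are made up for illustration; only the odiRef.getOption call pattern mirrors the standard KM substitution API.

<% /* illustrative only - branch the generated code on the KM option value */ %>
<% if (odiRef.getOption("RECIPE_TYPE").equals("margherita")) { %>
insert into PIZZA_ORDER (RECIPE) values ('margherita')
<% } else { %>
insert into PIZZA_ORDER (RECIPE) values ('pepperoni')
<% } %>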

Monday May 11, 2015

Oracle Big Data Preparation Cloud Service (BDP) – Coming Soon

What are your plans around Big Data and Cloud?

If your organization has already begun to explore these topics, you might be interested in a new offering from Oracle that will dramatically simplify how you use your data in Hadoop and the Cloud:

Oracle Big Data Preparation Cloud Service (BDP)

There is a perception that most of the time spent in Big Data projects is dedicated to harvesting value. The reality is that 90% of the time in Big Data projects is really spent on data preparation. Data may be structured, but more often it will be semi-structured, such as weblogs, or fully unstructured, such as free-form text. The content is vast, inconsistent, incomplete, often off topic, and comes in many differing formats and from many sources. In this environment each new dataset takes weeks or months of effort to process, frequently requiring programmers to write custom scripts. Minimizing data preparation time is the key to unlocking the potential of Big Data.

Oracle Big Data Preparation Cloud Service (BDP) addresses this very reality. BDP is a non-technical, web-based tool that sets out to minimize data preparation time in an effort to quickly unlock the potential of your data. The BDP tool provides an interactive set of services that automate, streamline, and guide the process of data ingestion, preparation, enrichment, and governance without costly manual intervention.

The technology behind this service is amazing; it intuitively guides the user with a machine learning driven recommendation engine based on semantic data classification and natural language processing algorithms. But the best part is that non-technical staff can use this tool as easily as they use Excel, resulting in a significant cost advantage for data intensive projects by reducing the amount of time and resources required to ingest and prepare new datasets for downstream IT processes.

Curious to find out more? We invite you to view a short demonstration of BDP below:

Let us know what you think!

Stay tuned as we write more about this offering… visit here often!

Tuesday May 05, 2015

Oracle Data Integrator for Big Data Webcast - Recap

We followed our recent announcement of Oracle Data Integrator (ODI) for Big Data with an in-depth webcast featuring Denis Gray, Product Management Director for Oracle Data Integration, and me. It was a deep dive into the product features, the differentiators, and an inside look at how ODI for Big Data functions to support the various Apache projects. If you missed it you can watch it again here on demand.

We also talked about Oracle Metadata Management, a data governance tool that brings trust and transparency to Big Data projects within Oracle and 3rd party solutions.

You will want to watch this if you are interested in knowing:

a. How to become an ETL developer for Big Data without learning Java coding

b. Why ODI for Big Data stands out, architecture-wise, for Big Data processing

c. A comparative study of Big Data ETL vendors and offerings in the market.

 Below are some of the questions that we encountered in the session.

How is unstructured data handled by ODI?

We have different solutions for unstructured input.

We regularly post best practices and lessons learned here on our blog. These blog posts on ODI for Big Data will also help you get started:

ODI for Big Data Announcement post

Big Data Lite Free Demo and Download

The white paper Top 5 Big Data Integration Mistakes to Avoid also covers the most common pitfalls enterprises encounter when approaching a big data project.

Is Oracle DI for Big Data a separately licensed product from Oracle DI?

Oracle Data Integrator Enterprise Edition Advanced Big Data Option is a separately licensed option. It is an option which would be purchased in addition to Oracle Data Integrator Enterprise Edition for advanced big data processing. More on this option can be found at the website and on the datasheet.

How do I load data from Oracle to Hadoop in an event-driven manner?

You can use Oracle GoldenGate for Big Data for this: it captures all committed transactions on a source Oracle database and delivers them to Hive, HDFS, HBase, Flume, Kafka, and others. You can learn more about Oracle GoldenGate for Big Data here.

Can a customer be just fine with ODI rather than purchasing Oracle Warehouse Builder for data warehousing projects? What are Oracle's strategic directions for these products for data warehousing projects?

Oracle Data Integrator (ODI) is Oracle's strategic product for data integration. You can read the full statement of direction on this topic here. There are also automated migration utilities that help migrate your OWB work into the ODI environment.

Does the ODI that comes with Financial Analytics also have the big data capability, or is it only in the full version?

Financial Analytics Hub uses a restricted use license of ODI which is meant for use specifically with the Financial Analytics products as outlined in the license.

ODI EE has basic big data functionality such as Hive, Sqoop and HBase. The Pig, Spark and Oozie functionality requires ODI EE as well as the Advanced Big Data Option for ODI.

When customers expand beyond the specialized Financial Analytics use, they upgrade to the full license of ODI EE and ODI EE for Big Data.

Can I use Impala (Cloudera's advancement on Hive)? Does ODI recognize Impala?

We have customers using Impala with ODI. This is supported through our regular JDBC support.
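As a rough sketch of what that JDBC path looks like outside of ODI, the Groovy snippet below queries Impala over its HiveServer2-compatible endpoint. The host, port, URL options, driver class, table and column names are assumptions for illustration - check your Impala/Hive JDBC driver documentation for the exact values.

// Illustrative only: querying Impala over plain JDBC from Groovy.
// URL, port, auth option, driver class and object names below are assumptions, not from the post.
import groovy.sql.Sql

def sql = Sql.newInstance(
        'jdbc:hive2://quickstart.cloudera:21050/default;auth=noSasl',
        'org.apache.hive.jdbc.HiveDriver')
sql.eachRow('SELECT cust_id, cust_name FROM src_customers LIMIT 10') { row ->
    println "${row.cust_id} ${row.cust_name}"   // print a few rows from the hypothetical table
}
sql.close()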

Will the utilization of Hadoop cut down on the need for a lot of manual coding, or will manual coding still be an essential part of Oracle Data Integration for Big Data?

ODI is specifically made to avoid manual coding and to provide a graphical, metadata-driven approach to Big Data integration. The value of using tools instead of manual coding has been understood in the Data Integration community for decades, and this realization is now coming to the Big Data community through painful experiences.

See also this article: Top 5 Big Data Integration Mistakes to Avoid

Are resources required for ODI managed by YARN in Hadoop?

ODI uses standard Hadoop subsystems like Hive, HDFS, Oozie, Pig, Spark and HBase, which are managed by YARN, so we implicitly take advantage of YARN.

Can you please share any performance benchmarks we have with the other competitors?

We might suggest a couple of whitepapers for your review on this subject:

  • Data Integration Platforms for Big Data and the Enterprise: Customer Perspectives on IBM, Informatica and Oracle
  • The Oracle Data Integrator Architecture
  • Best Practices for Real-time Data Warehousing
  • Oracle Data Integrator Performance Guide

What is the major differentiator in the usage of Hive, Spark or Pig within ODI?

ODI provides a graphical and logical abstraction over these engines, so you can design your transformation without worrying about what the implementation engine will be. You can choose your engine after the fact based on its performance characteristics, such as the in-memory performance of Spark. If a new Hadoop engine comes up in the future, it's easy to retarget your logical design to run on it.

Can you explain the difference between using the GG Big Data Adapters and the GG Java flat file Adapters?

The OGG BD adapter comes with 4 pre-packaged adapters for Hive, HDFS, Flume and HBase that have been developed and tested by Oracle R&D. It also comes with custom Java and JMS support like the "Java Flat File App Adapter". The only capability exclusive to the "Java Flat File App Adapter" is the flat file support.

Oracle ODI is an alternate/equivalent to which particular product of Apache?

ODI is not competing against any Apache/Hadoop project; instead it integrates with them. We utilize HDFS, Hive, Spark, Pig, HBase and Sqoop for our transformations, unlike other DI vendors who deploy their own engines on the Hadoop cluster.

So is ODI generating Hive/Pig/MR code behind the scenes when you define mappings?

You are correct. You design a logical mapping and pick an execution engine (Spark/Pig/Hive), and ODI will generate and execute code for that engine. You have full visibility into this code, unlike with a proprietary ETL engine.

Is Oracle Metadata Management (OMM) part of ODI or a separate product?

Oracle Metadata Management is a separate product from ODI, however one that is complementary - please find the datasheet here:

Can you please share details on GG for Big Data?

You can find more info here.

Can you share the names of the blogs for effective ODI data integration?

The ODI team is regularly posting on https://blogs.oracle.com/dataintegration/, but there is a rich community of bloggers writing about ODI: http://www.ateam-oracle.com/data-integration/di-odi/ http://odiexperts.com/ http://www.rittmanmead.com/category/oracle-data-integrator/ http://oracleodi.com/ https://gurcanorhan.wordpress.com/

And many, many more - google for ODI blogs to get more info.

ODI can pull data on a regular schedule (say every 2 minutes). GoldenGate does it in real time. So if a 2-minute delay is acceptable, do we still need GG for big data?

That is the general principle. If you are looking for real-time replication with sub-second latency, then GoldenGate is the product. If you are looking for heavy processing of Big Data, then ODI is the answer. They are actually complementary and work off one another: customers use GG for data ingestion and ODI for data processing.

I'm an Oracle apps DBA and Oracle performance DBA. Can I use my existing skill set to transition into Oracle DI for big data, or is this completely different from the DBA skill set?

ODI is popular with DBAs as the generated SQL code (RDBMS or Hive/Impala) is visible, and all of our runtime actions are "white box" so you can see what's happening. You can review queries and their query plans and optimize them using our Knowledge Module framework.

Thursday Apr 30, 2015

How To Setup Oracle GoldenGate When Performing the DB2 11.1 Upgrade

After the announcement of DB2 11.1 support in Oracle GoldenGate for DB2 z/OS, a lot of questions were received on how to set up Oracle GoldenGate when performing the DB2 11.1 upgrade. This blog post provides some instructions and explanations.

DB2 11.1 increases the log record sequence numbers from 6 bytes to 10 bytes. The log reading API changed significantly to support the new log record format.  Oracle GoldenGate provides support for DB2 11.1 with a version specific build.  In other words, starting with Oracle GoldenGate 12.1.2.1.4, two downloadable builds will be provided to support DB2 z/OS: 

  • GoldenGate for DB2 10 and earlier versions
  • GoldenGate for DB2 11
If you are upgrading to DB2 11.1 in a data sharing configuration and will be upgrading the subsystems in the group gradually (i.e. you'll have a mixed DB2 11.1 and DB2 10.1/9.1 group for some period of time), we first recommend upgrading the existing GoldenGate installation to the GoldenGate version you plan to use once you've upgraded to DB2 11.1. At the time of writing, the earliest version of GoldenGate that supports DB2 11.1 is 12.1.2.1.4.

The diagram below depicts the GoldenGate and data sharing configuration prior to upgrading the first subsystem to DB2 11.1.
[Figure: GoldenGate and data sharing configuration before the first subsystem is upgraded to DB2 11.1]
Please make sure you are not using the data sharing group name (i.e. ADDO) in the extract connection parameter. For example, if the data sharing group name is ADDO and the subsystem SSIDs of the group are ADD1, ADD2..., please use an SSID instead. When you use the data sharing group name, GoldenGate will connect to any of the subsystems to access log files from all of the subsystems in the data sharing group. During the upgrade process, however, we need to make sure the GoldenGate extract is connected to a specific subsystem of the group that will not be upgraded to DB2 11.1 initially. For example,

SOURCEDB ADD1 userid uuuuuu, password ppppppp


To quickly modify a GoldenGate extract connection to another subsystem in the data sharing group, it is common practice to define the connection parameter in an include file. For example, an "extract-conn.inc" file would contain the connection parameter above, and the extract parameter file would reference it with the "INCLUDE" parameter:

INCLUDE extract-conn.inc
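As a rough sketch (the extract name, trail path and table specification below are illustrative placeholders, not from the original post), the extract parameter file might then look something like this, so that only extract-conn.inc needs editing when you switch the connection to another subsystem:

EXTRACT EXTZOS
INCLUDE extract-conn.inc
EXTTRAIL /u01/ogg/dirdat/et
TABLE MYSCHEMA.*;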

In this example, you can keep the extract connected to ADD1 while upgrading the other members of the data sharing group to DB2 11.1. Data from all members in the data sharing group will be captured by GoldenGate. 
[Figure: extract connected to ADD1 while the other members of the data sharing group are upgraded to DB2 11.1]
As soon as you upgrade one member of the data sharing group to DB2 11.1, you can choose to use the new GoldenGate for DB2 z/OS 11 build and connect the extract to that subsystem and capture log records from all the other subsystems in the data sharing group as illustrated below:
[Figure: extract connected to the upgraded DB2 11.1 subsystem, capturing log records from all members of the data sharing group]
The DB2 IFI allows a GoldenGate extract to access log files for all DB2 subsystems that are a part of the DB2 data sharing group no matter which LPAR these subsystems are running in.  GoldenGate can capture from all members of a data sharing group even if there are different DB2 subsystem versions.  To clarify this further:

  • GoldenGate can connect to a DB2 11.1 subsystem and successfully capture log records from DB2 10.1 subsystem(s) that are also a part of the DB2 data sharing group.
  • In like manner, GoldenGate can also connect to a DB2 10.1 subsystem and successfully capture log records from DB2 11.1 subsystem(s) that are a part of the DB2 data sharing group.

Please refer to KM 1060540.1 if you need more information about the Oracle GoldenGate support for DB2 z/OS data sharing group.

If you have further questions or suggestions, please feel free to reach me at @jinyu512

(Thanks to my colleagues Mark Geisler, Richard Johnson and Greg Wood for reviewing this post.)

Wednesday Apr 15, 2015

Data Governance for Migration and Consolidation

By Martin Boyd, Senior Director of Product Management

How would you integrate millions of parts, customer and supplier records from multiple acquisitions into a single JD Edwards instance?  This was the question facing National Oilwell Varco (NOV), a leading worldwide provider of components used in the oil and gas industry.  If they could not find an answer then many operating synergies would be lost, but they knew from experience that simply “moving and mapping” the data from the legacy systems into JDE was not sufficient, as the data was anything but standardized.

This was the problem described yesterday in a session at the Collaborate Conference in Las Vegas.  The presenters were Melissa Haught of NOV and Deepak Gupta of KPIT, their systems integrator. Together they walked through an excellent discussion of the problem and the solution they have developed:

The Problem:  It is first important to recognize that the data to be integrated from many and various legacy systems had been created over time with different standards by different people according to their different needs. Thus, saying it lacked standardization would be an understatement.  So how do you “govern” data that is so diverse?  How do you apply standards to it months or years after it has been created? 

The Solution:  The answer is that there is no single answer, and certainly no “magic button” that will solve the problem for you.  Instead, in the case of NOV, a small team of dedicated data stewards, or specialists, works to reverse-engineer a set of standards from the data at hand.  In the case of product data, which is usually the most complex, NOV found they could actually infer rules to recognize, parse, and extract information from ‘smart’ part numbers, even for part numbering schemes from acquired companies.  Once these rules are created for an entity or a category, they are built into the Oracle Enterprise Data Quality (EDQ) platform. The data is then run through the DQ process and the results are examined.  Most often problems surface, which suggest that some rule refinements are required. The rule refinement and data quality processing steps run repeatedly until the result is as good as it can be.  The result is never 100% standardized, clean data, though; some data is always flagged into a “data dump” for future manual remediation.

Lessons Learned:

  • Although technology is a key enabler, it is not the whole solution. Dedicated specialists are required to build the rules and improve them through successive iterations
  • A ‘user friendly’ data quality platform is essential so that it is approachable and intuitive for the data specialists who are not (nor should they be) programmers
  • A rapid iteration through testing and rules development is important to keep up project momentum.  In the case of NOV, specialists request rule changes, which are implemented by KPIT resources in India. So in effect, changes are made and re-run overnight, which has worked very well

Technical Architecture:  Data is extracted from the legacy systems by Oracle Data Integrator (ODI), which also transforms the data into the right ‘shape’ for review in EDQ.  An Audit Team reviews these results for completeness and correctness based on the supplied data compared to the required data standards.  A secondary check is also performed using EDQ, which verifies that the data is in a valid format to be loaded into JDE.

The Benefit:  The benefit of having data that is “fit for purpose” in JDE is that NOV can mothball the legacy systems and use JDE as a complete and correct record for all kinds of purposes from operational management to strategic sourcing.  The benefit of having a defined governance process is that it is repeatable.  This means that every time the process is run, the individuals and the governance team as a whole learn something from it and they get better at executing it next time around.  Because of this NOV has already seen orders of magnitude improvements in productivity as well as data quality, and is already looking for ways to expand the program into other areas.

All-in-all, Melissa and Deepak gave the audience great insight into how they are solving a complex integration program and reminded us of what we should already know: "integrating" data is not simply moving it. To be of business value, the data must be 'fit for purpose', which often means that both the integration process and the data must be governed. 

Friday Apr 10, 2015

Customers Tell All: What Sets Oracle Apart in Big Data Integration

Data integration has become a critical component of many technology solutions that businesses pursue to differentiate in their markets. Instead of relying on manual coding in house, more and more businesses choose data integration solutions to support their strategic IT initiatives, from big data analytics to cloud integration.

To explore the differences among the leading data integration solutions and the impact their technologies are having on real-world businesses, Dao Research recently conducted a research study, where they interviewed IBM, Informatica, and Oracle customers. In addition they reviewed publicly available solution information from these three vendors.

The research revealed some key findings that explain Oracle's leadership in the data integration space. For example:

  • Customers who participated in this study cite a range of 30 to 60% greater development productivity using Oracle Data Integrator versus traditional ETL tools from Informatica and IBM. Dao's research ties Oracle's advantage to product architecture differences such as native push-down processing, the separation of logical and physical layers, and the ability to extend Oracle Data Integrator using its knowledge modules.
  • The research also showed that Oracle’s data integration cost of ownership is lower because of its unified platform strategy (versus offering multiple platforms and options), its use of source and target databases for processing, higher developer productivity, faster implementation, and the absence of a middle-tier integration infrastructure to manage.
  • In the area of big data integration, the study highlights Oracle’s advantage with its flexible and native solutions. Unlike competitors’ offerings, developed as separate solutions, Oracle’s solution is aware of the cluster environment of big data systems. Oracle enables big data integration and cloud data integration through the use of a single platform with common tooling and inherent support for big data processing environments.
  • I should add that the latest release of the Oracle Data Integrator EE Advanced Big Data Option widens the competitive gap. Oracle is the only vendor that can automatically generate Spark, Hive, and Pig transformations from a single mapping. Oracle Data Integration customers can focus on building the right architecture for driving business value, and do not have to become experts in multiple programming languages.  For example, an integration architect at a large financial services provider told the research company "As an ODI developer, I am a Big Data developer without having to understand the underpinnings of Big Data. That's pretty powerful capability."


You can find the report of Dao's research here:

I invite you to read this research paper to understand why more and more customers trust Oracle for their strategic data integration initiatives after working with or evaluating competitive offerings.


This Week's A-Team Blog Speaks to Automating Changes after Upgrading ODI or Migrating from Oracle Warehouse Builder

The A-Team not only provides great content, they are humorous too!

Check out this week’s post, the title says it all: Getting Groovy with Oracle Data Integrator: Automating Changes after Upgrading ODI or Migrating from Oracle Warehouse Builder

The article covers various scripts, written in Groovy and leveraging the ODI SDK, that assist in automating massive changes to one’s repository. These initially came about from customers wanting to enhance their environment as they moved from Oracle Warehouse Builder (OWB) to Oracle Data Integrator (ODI), but in the end came the realization that these scripts could be used by any ODI user.

Happy reading!

Thursday Apr 09, 2015

ODI, Big Data SQL and Oracle NoSQL

Back in January Anuj posted an article here on using Oracle NoSQL via the Oracle Database Big Data SQL feature. In this post - I guess you could call it part 2 of Anuj's - I will follow up with how the Oracle external table is configured and how it all hangs together, both with manual code and via ODI. For this I used the Big Data Lite VM and the newly released Oracle Data Integrator Big Data option. The Big Data Lite VM 4.1 release uses version 3.2.5 of Oracle NoSQL - from this release I used the new declarative DDL for Oracle NoSQL to project the shape from NoSQL, with some help from Anuj.

My goal for the integration design is to show a logical design in ODI and how KMs are used to realize the implementation and leverage Oracle Big Data SQL - this integration design supports predicate pushdown, so I actually minimize data moved between my NoSQL store on Hadoop and the Oracle database - think speed and scalability! My NoSQL store contains user movie recommendations. I want to join this with reference data in Oracle - customer, movie and genre information - and store the result in a summary table.

Here is the code to create and load the recommendation data in NoSQL - this would normally be computed by another piece of application logic in a real world scenario;

export KVHOME=/u01/nosql/kv-3.2.5
cd /u01/nosql/scripts
./admin.sh

connect store -name kvstore
EXEC "CREATE TABLE recommendation( \
         custid INTEGER, \
         sno INTEGER, \
         genreid INTEGER,\
         movieid INTEGER,\
         PRIMARY KEY (SHARD(custid), sno, genreid, movieid))"
PUT TABLE -name RECOMMENDATION -file /home/oracle/movie/moviework/bigdatasql/nosqldb/user_movie.json

The Manual Approach

This example is using the new data definition language in NoSQL. To make this accessible via Hive, users can create Hive external tables that use the NoSQL Storage Handler provided by Oracle. If this were manually coded in Hive, we could define the table as follows;

CREATE EXTERNAL TABLE IF NOT EXISTS recommendation(
                 custid INT,
                 sno INT,
                 genreId INT,
                 movieId INT)
          STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler'
          TBLPROPERTIES  ( "oracle.kv.kvstore"="kvstore",
                           "oracle.kv.hosts"="localhost:5000",
                           "oracle.kv.hadoop.hosts"="localhost",
                           "oracle.kv.tableName"="recommendation");

At this point we have made NoSQL accessible to many components in the Hadoop stack - pretty much every component in the Hadoop ecosystem can leverage the HCatalog entries defined, be they Hive, Pig, Spark and so on. We are looking at Oracle Big Data SQL though, so let's see how that is achieved. We must define an external table that uses either the SerDe or a Hive table; below you can see how the table has been defined in Oracle;

CREATE TABLE recommendation(
                 custid NUMBER,
                 sno NUMBER,
                 genreid NUMBER,
                 movieid NUMBER
         )
                 ORGANIZATION EXTERNAL
         (
                 TYPE ORACLE_HIVE
                 DEFAULT DIRECTORY DEFAULT_DIR
                 ACCESS PARAMETERS  (
                     com.oracle.bigdata.tablename=default.recommendation
                 )
         ) ;

Now we are ready to write SQL! Really!? Well let's see, below we can see the type of query we can do to join the NoSQL data with our Oracle reference data;

SELECT m.title, g.name, c.first_name
FROM recommendation r, movie m, genre g, customer c
WHERE r.movieid=m.movie_id and r.genreid=g.genre_id and r.custid=c.cust_id and r.custid=1255601 and r.sno=1
ORDER by r.sno, r.genreid;

Great, we can now access the data from Oracle - we benefit from the scalability of the solution and minimal data movement! Let's make it better - more maintainable, more flexible to future changes, and accessible to more people - by showing how it is done in ODI.

Oracle Data Integrator Approach

The data in NoSQL has a shape; we can capture that shape in ODI just as it is defined in NoSQL. We can then design mappings that manipulate the shape and load into whatever target we like. The SQL we saw above is represented in a logical mapping as below;


Users get the same design experience as with other data items and benefit from the mapping designer. They can join, map and transform just as normal. The ODI designer allows you to separate how you physically want this to happen from the logical semantics - this is all about giving you the flexibility to change and adapt to new integration technologies and patterns.

In the physical design we can assign Knowledge Modules that take on the responsibility of building the integration objects we manually coded above. These KMs are generic, so they support all shapes and sizes of data items. Below you can see how the LKM is assigned for accessing Hive from Oracle;

This KM takes the role of building the external table - you can take this, use it, customize it, and the logical design stays the same. Why is that important? Integration recipes CHANGE as we learn more and developers build newer and better mechanisms to integrate.

This KM takes care of creating the external table in Hive that accesses our NoSQL system. You could also have manually built the external table, imported it into ODI and used that as a source for the mapping, but I want to show how the raw items can be integrated - the more metadata we have and use at design time, the greater the flexibility in the future. The LKM Oracle NoSQL to Hive uses regular KM APIs to build the access object; here is a snippet from the KM;

create table <%=odiRef.getObjectName("L", odiRef.getTableName("COLL_SHORT_NAME"), "W")%>
 <%=odiRef.getColList("(", "[COL_NAME] [DEST_CRE_DT]", ", ", ")", "")%>
          STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler'
          TBLPROPERTIES  ( "oracle.kv.kvstore"="<%=odiRef.getInfo("SRC_SCHEMA")%>",
                           "oracle.kv.hosts"="<%=odiRef.getInfo("SRC_DSERV_NAME")%>",
                           "oracle.kv.hadoop.hosts"="localhost",
                           "oracle.kv.tableName"="<%=odiRef.getSrcTablesList("", "[TABLE_NAME]", ", ", "")%>");

You can see the templatized code versus literals. This still needs some work as you can see - can you spot some hard-wiring that needs fixing? ;-) This was using the 12.1.3.0.1 Big Data option of ODI, so integration with Hive is much improved and it leverages the DataDirect driver, which is also a big improvement. In this post I created a new technology for Oracle NoSQL in ODI - you can do this too for anything you want. I will post this technology on java.net and more, so that as a community we can learn and share.

Summary 

Here we have seen how we can make seemingly complex integration tasks quite simple and leverage the best of data integration technologies today and importantly in the future!


Wednesday Apr 08, 2015

Oracle GoldenGate for DB2 z/OS Supports DB2 11

With the Oracle GoldenGate 12.1.2.1.4 release, Oracle GoldenGate for DB2 z/OS provides support for DB2 11. This release also includes the fix to make Oracle GoldenGate z/OS Extract compatible with IBM APAR PI12599 for DB2 z/OS. [Read More]

Monday Apr 06, 2015

Announcing Oracle Data Integrator for Big Data

We are proud to announce the availability of Oracle Data Integrator for Big Data. This release is the latest in a series of advanced Big Data updates and features that Oracle Data Integration is rolling out for customers to help take their Hadoop projects to the next level.

Increasing Big Data Heterogeneity and Transparency

This release sees significant additions in heterogeneity and governance for customers. Some highlights of this release include:

  • Support for Apache Spark,
  • Support for Apache Pig, and
  • Orchestration using Oozie.

Click here for a detailed list of what is new in Oracle Data Integrator (ODI).

Oracle Data Integrator for Big Data helps transform and enrich data within the big data reservoir/data lake without users having to learn the languages necessary to manipulate it. ODI for Big Data generates native code that is then run on the underlying Hadoop platform without requiring any additional agents. ODI separates the design interface used to build logic from the physical implementation layer that runs the code. This allows ODI users to build business and data mappings without having to learn HiveQL, Pig Latin or MapReduce.

Oracle Data Integrator for Big Data Webcast

We invite you to join us on the 30th of April for our webcast to learn more about Oracle Data Integrator for Big Data and to get your questions about Big Data Integration answered. We discuss how the newly announced Oracle Data Integrator for Big Data:

  • Provides advanced scale and expanded heterogeneity for big data projects
  • Uniquely complements Hadoop’s strengths to accelerate decision making, and
  • Ensures sub-second latency with Oracle GoldenGate for Big Data.


Thursday Mar 26, 2015

Oracle Big Data Lite 4.1.0 is available with more on Oracle GoldenGate and Oracle Data Integrator

Oracle's big data team has announced the newest Oracle Big Data Lite Virtual Machine, 4.1.0.  This release contains great improvements from a data integration perspective with the inclusion of the recently released Oracle GoldenGate for Big Data.  You will see this in an improved demonstration that highlights inserts, updates, and deletes into Hive using Oracle GoldenGate for Big Data, with Oracle Data Integrator performing a merge of the new operations into a consolidated table.

Big Data Lite is a pre-built environment which includes many of the key capabilities for Oracle's big data platform.   The components have been configured to work together in this Virtual Machine, providing a simple way to get started in a big data environment.  The components include Oracle Database, Cloudera Distribution including Apache Hadoop, Oracle Data Integrator, Oracle GoldenGate amongst others. 

Big Data Lite also contains hands-on labs and demonstrations to help you get started using the system.  Tame Big Data with Oracle Data Integration is a hands-on lab that teaches you how to design Hadoop data integration using Oracle Data Integrator and Oracle GoldenGate. 

                Start here to learn more!  Enjoy!

Wednesday Mar 11, 2015

Recap - How to Future Proof Your Big Data Investments - An Oracle webcast with Cloudera

Last week Oracle and Cloudera experts came together to discuss how Big Data is shaping Data Management.

Cloudera's Charles Zedlewski  spoke about  the Enterprise Data Hub, a Hadoop data store that can be used to store vast quantities of unstructured data. This data store, which is secure and governed, stores data in full fidelity, along with enriched and transformed data that can then be drawn upon to make critical business decisions.

Following Cloudera, Jeff Pollock emphasized the importance of Data Integration technologies that can take advantage of Hadoop clusters in their data processing methods. Offloading data queries into the cluster, real-time data ingestion into the data hub and building an integrated Big Data reservoir were top of mind when designing Oracle's Data Management technologies.

Rounding out the webcast was Dain Hansen, who quizzed the two experts on some of the top-of-mind questions that customers have about big data.

If you missed the webcast do not worry. You can watch it here on demand.

Friday Feb 27, 2015

How to Future Proof Your Big Data Investments - An Oracle webcast with Cloudera

Cutting through the Big Data Clutter

The Big Data world is changing rapidly, giving rise to new standards, languages and architectures. Customers are unclear about which Big Data technology will benefit their business the most, and how to future-proof their Big Data investments.

This webcast helps customers sift through the changing Big Data architectures and build their own resilient Big Data platform. Oracle and Cloudera experts discuss how enterprise platforms need to provide more flexibility to handle real-time and in-memory computations for Big Data.



The speakers introduce the 4th-generation architecture for Big Data that allows expanded and critical capabilities to exist alongside each other. Customers can now see higher returns on their Big Data investment through real-time data ingestion and improved data transformation for their Big Data analytics solutions. By choosing Oracle Data Integrator, Oracle GoldenGate and Oracle Enterprise Metadata Management, customers gain the ability to keep pace with changing Big Data technologies like Spark, Oozie, Pig and Flume without losing productivity, and reduce risk through robust Big Data governance.

In this webcast we also discuss the newly announced Oracle GoldenGate for Big Data. With this release, customers can stream real-time data from their heterogeneous production systems into Hadoop and other Big Data systems like Apache Hive, HBase and Flume. This brings real-time capabilities to customers’ Big Data architectures, allowing them to enhance their big data analytics and ensure their Big Data reservoirs are up to date with production systems.

Click here to mark your calendars and join us for the webcast to understand Big Data Integration and ensure that you are investing in the right Big Data Integration solutions.

Thursday Feb 19, 2015

Introducing Oracle GoldenGate for Big Data!

Big data systems and big data analytics solutions are becoming critical components of modern information management architectures.  Organizations realize that by combining structured transactional data with semi-structured and unstructured data they can realize the full potential value of their data assets, and achieve enhanced business insight. Businesses also recognize that in today’s fast-paced, digital business environment, being agile and responding with immediacy requires access to data with low latency. Low-latency transactional data brings additional value, especially for dynamically changing operations, that day-old data, structured or unstructured, cannot deliver.

Today we announced the general availability of the Oracle GoldenGate for Big Data product, which offers a platform for streaming real-time transactional data into big data systems. By providing easy-to-use, real-time data integration for big data systems, Oracle GoldenGate for Big Data facilitates improved business insight for better customer experience. It also allows IT organizations to quickly move ahead with their big data projects without extensive training and management resources. Oracle GoldenGate for Big Data's real-time data streaming platform also allows customers to keep their big data reservoirs up to date with their production systems.

Oracle GoldenGate’s fault-tolerant, secure and flexible architecture shines in this new big data streaming offering as well. Customers can enjoy secure and reliable data streaming with sub-second latency. With its core log-based change data capture capabilities, Oracle GoldenGate enables real-time streaming without degrading the performance of the source production systems.

The new offering, Oracle GoldenGate for Big Data, provides integration for Apache Flume, Apache HDFS, Apache Hive and Apache HBase. It also includes Oracle GoldenGate for Java, which enables customers to easily integrate with additional big data systems, such as Oracle NoSQL, Apache Kafka, Apache Storm, Apache Spark, and others.

You can learn more about our new offering via Oracle GoldenGate for Big Data data sheet and by registering for our upcoming webcast:

How to Future-Proof your Big Data Integration Solution

March 5th, 2015 10am PT/ 1pm ET

I invite you to join this webcast to learn from Oracle and Cloudera executives how to future-proof your big data infrastructure. The webcast will discuss :

  • Selection criteria that will drive business results with Big Data Integration 
  • Oracle's new big data integration and governance offerings, including Oracle GoldenGate for Big Data
  • Oracle’s comprehensive big data features in a unified platform 
  • How Cloudera Enterprise Data Hub and Oracle Data Integration combine to offer complementary features to store data in full fidelity, to transform and enrich the data for increased business efficiency and insights.

Hope you can join us and ask your questions to the experts.

About

Learn the latest trends, use cases, product updates, and customer success examples for Oracle's data integration products, including Oracle Data Integrator, Oracle GoldenGate and Oracle Enterprise Data Quality.
