Thursday Jul 02, 2015

Chalk Talk Video: How to Raise Trust and Transparency in Big Data with Oracle Metadata Management

Some fun new videos are available; we call the series ‘Chalk Talk’!

The first in the series that we will share with you around Oracle Data Integration speaks to raising trust and transparency within big data. Crucial big data projects often fail due to a lack of overall trust in the data. Data is not always transparent, and governing it can become a costly overhead. Oracle Metadata Management assists in the governance of, and trust in, all data across the enterprise, both Oracle and 3rd party.

View this video to learn more: Chalk Talk: How to Raise Trust and Transparency in Big Data.

For additional information on Oracle Metadata Management, visit the OEMM homepage.

Wednesday Jul 01, 2015

ODI - Integration with Oracle Storage Cloud Service

Oracle Data Integrator’s open tool framework can be leveraged to quickly get access to the Oracle Storage Cloud Service, which is gradually becoming an essential part of integrating on-premises data with many cloud services. The reference implementation of an open tool for Oracle Storage Cloud is now available in the Data Integration project on Java.net: ODI OpenTool for Oracle Storage Cloud, which can be used and modified as per your integration needs. [Read More]

Friday Jun 19, 2015

Oracle GoldenGate Certified on Teradata Unity

Oracle GoldenGate 12.1.2.1.1 is now certified with Unity 14.11. With this certification, customers can use Oracle GoldenGate to deliver data to Teradata Unity, which can then automate the distribution of data to multiple Teradata databases. This joint effort from Oracle GoldenGate and Teradata extends the Oracle GoldenGate and Teradata ecosystem for real-time data integration.

Tuesday Jun 16, 2015

ODI - Hive DDL Generation for Parquet, ORC, Cassandra and so on

Here you'll see how, with some small surgical extensions, we can use ODI to generate complex integration models in Hive for modelling all kinds of challenges;

  • integrate data from Cassandra, or any arbitrary SerDe
  • use Hive Parquet or ORC storage formats
  • create partitioned, clustered tables
  • define arbitrarily complex attributes on tables

I'm very interested in hearing what you think of this, and what is needed or useful over and above it (RKMs and so on).

When you use this technique to generate your models you can benefit from one of ODI's lesser known but very powerful features - the Common Format Designer (see here for nice write up). With this capability you can build models from other models, generate the DDL and generate the mappings or interfaces to integrate the data from the source model to the target. This gets VERY interesting in the Big Data space since many customers want to get up and running with data they already know and love.

What do you need to get there? The following basic pieces;

  • a Hive custom create table action (get it here)
  • flex fields for the external table indicator, table data format, and attribute type metadata (get the Groovy script here)

Pretty simple, right? If you take the simple example on the Confluence wiki;

  CREATE TABLE parquet_test (
     id INT,
     str STRING,
     mp MAP<STRING,STRING>,
     lst ARRAY<STRING>,
     strct STRUCT<A:STRING,B:STRING>)
  PARTITIONED BY (part STRING)
  STORED AS PARQUET;

...and think about how to model this in ODI, you'll think - how do I define the Parquet storage? How can the complex types be defined? That's where the flex fields come in, you can define any of the custom storage properties you wish in the datastore's Data Format flex field, then define the complex type formats.

In ODI the Parquet table above can be defined with the Hive datatypes as below; we don't capture the complex field details within attributes mp, lst or strct. The partitioned column 'part' is added into the regular list of datastore attributes (like Oracle, unlike Hive DDL today);

Using the Data Format flex field we can specify the STORED AS PARQUET info; the table is not external to Hive, so we do not specify EXTERNAL in the external table indicator field;
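
To make this concrete, here is a minimal sketch of what the flex field values for the parquet_test datastore could look like (the layout is illustrative; the fields come from the Groovy script linked above):

  -- Data Format flex field value (appended to the generated CREATE TABLE DDL):
  STORED AS PARQUET
  -- External table indicator flex field: left empty here since the table is managed
  -- by Hive; for an external table it would carry the EXTERNAL indicator.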


This value can contain lots of ODI goodies, which makes it very versatile - you can use variables, which will be resolved when the generated procedure containing the DDL is executed, plus you can use <% style code substitution.
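
For instance, a sketch only (the project and variable names are hypothetical), the same field could be driven dynamically rather than hard-coding the literal:

  -- resolved when the generated DDL procedure is executed
  STORED AS #BIGDATA_PRJ.STORAGE_FORMAT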

The partitioning information is defined for the part attribute, the 'Used for Partitioning' field is checked below;

Let's take stock: we have our table defined as Parquet and we have the partitioning info defined. Now we have to define the complex types. This is done on attributes mp, lst and strct using the creatively named Metadata field - below I have fully defined the MAP, STRUCT and ARRAY for the respective attributes;

Note the struct example escapes the ':' because the code is executed via the JDBC driver when running through the agent, and ':' is the way of binding information to a statement execution (you can see the generated DDL further below with the ':' escaped).

With that we are complete, we can now generate DDL for our model;

This brings up a dialog with options for where to store the procedure with the generated DDL - this has actually done a base compare against the Hive system and allowed me to select which datastores to create DDL for, or which DDL to create; I have checked the parquet_test datastore;

Upon completion, we can see the procedure which was created. We can choose to further edit and customize this if so desired. We can schedule it or execute it anytime we want.
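
For reference, the DDL inside the generated procedure should be roughly equivalent to the statement below (a sketch: the schema qualifier shown is illustrative and is resolved from ODI's runtime context at execution, and in the actual generated code the ':' characters inside the STRUCT type are escaped as noted earlier);

  CREATE TABLE default.parquet_test (
     id INT,
     str STRING,
     mp MAP<STRING,STRING>,
     lst ARRAY<STRING>,
     strct STRUCT<A:STRING,B:STRING>)
  PARTITIONED BY (part STRING)
  STORED AS PARQUET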

Inspecting the Create Hive Table action, the DDL looks exactly like what we were after when we first started - the datastore schema / name will be resolved upon execution using ODI's runtime context - which raises another interesting point! We could have used some ODI substitution code <% when defining the storage format, for example;

As I mentioned earlier, you can take this approach to capture all sorts of table definitions for integrating. If you want to look at the inside of how this is done, check the content of the Hive action in the image below (I linked the code of the action above); it should look very familiar if you have seen any KM code, and it uses the flex fields to inject the relevant information into the DDL;

This is a great example of what you can do with ODI and how to leverage it to work efficiently. 

There is a lot more that could be illustrated here, including what the RKM Hive can do or be modified to do. Perhaps creating the datastore and complex type information from JSON, a JSON schema, an Avro schema or Parquet would be very useful? I'd be very interested to hear your comments and thoughts. So let me know....

Tuesday Jun 09, 2015

Oracle Data Integrator Journalizing Knowledge Module for GoldenGate Integrated Replicat Blog from the A-Team

As always, useful content from the A-Team…

Check out the most recent blog about how to modify the out-of-the-box Journalizing Knowledge Module for GoldenGate to support the Integrated Replicat apply mode.

An Oracle Data Integrator Journalizing Knowledge Module for GoldenGate Integrated Replicat

Enjoy!

Wednesday May 13, 2015

Looking for Cutting-Edge Data Integration: 2015 Excellence Awards

It is nomination time!!!

This year's Oracle Fusion Middleware Excellence Awards will honor customers and partners who are creatively using various products across Oracle Fusion Middleware. Think you have something unique and innovative with Oracle Data Integration products?

We'd love to hear from you! Please submit today in the Big Data and Analytics category.

The deadline for the nomination is July 31, 2015. Win a free pass to Oracle OpenWorld 2015!!

Let’s reminisce a little…

For details on the 2014 Data Integration Winners: NET Serviços and Griffith University, check out this blog post.

For details on the 2013 Data Integration Winners: Royal Bank of Scotland’s Market and International Banking and The Yalumba Wine Company, check out this blog post.

For details on the 2012 Data Integration Winners: Raymond James Financial and Morrisons Supermarkets, check out this blog post.

We hope to honor you!

Click here to submit your nomination today. And just a reminder: the deadline to submit a nomination is 5pm Pacific Time on July 31, 2015.

Tuesday May 12, 2015

ODI 12c - Improving Usability of KM recipes

This post is all about reducing user errors and improving usability surrounding the definition of Knowledge Modules and their usage. Knowledge Modules are all about encapsulating recipes - every great cookbook has lots of recipes and some are based on common techniques and ingredients. ODI Knowledge Modules are data integration recipes - they define how to access, transform and store information based on the directions in the KM. There are a few usability improvements in the recent 12.1.3.0.1 release around both the KM definition and usage of the KM that make for an improved user experience. I've seen many KMs over the years where it's many things to many people and there are a bundle of options that expose all facets for every path possible in the KM - the user has to read the description and follow the instructions.

The first improvement I'll mention is the KM (and procedure) option type of 'Choice'. Not exactly rocket science here, I know, but an addition that greatly helps usage of a KM that may do more than one thing. Let's take the example of a KM that can make different..... pizzas. In the past you would have an option field with a string-based value where the user would type either margherita or pepperoni to drive a path within the KM implementation; users of the KM would have to know that those were the accepted option values and they'd have to type them in properly (otherwise things would go wrong). So now the options can be specified as the 'Choice' type - see below where in the IKM we capture the recipe type as a choice.

The choices can be defined in the default value field; below, the recipe is going to create either margherita pizza or pepperoni - these are the only two choices and the default is margherita;

Then I can define all the rest of the options; let's say any pizza needs flour, oil, salt and yeast, pepperoni needs... pepperoni of course, and margherita needs tomatoes and basil - so some of the options are applicable to both types and some are only applicable to a specific one. Prior to this release, when the KM was used you would see all of these option values and you'd be reading the description 'only set basil if you are making margherita' and so on. Another feature has been added to improve this area. Below you can see all of the options....

One column was snipped out of the image - the condition expression. This is a Groovy expression that determines whether the option is shown. So now we can say: only display basil when margherita is the recipe type, or only display pepperoni when pepperoni is the recipe type. We see below that only the options applicable to the recipe type are displayed - anything common has no condition expression.

The Groovy snippet must return a string, and the string must be of the format show=true|false.

When you see the KM assigned in the mapping it becomes a little clearer. Below you can see the choice box, the user is constrained to pick one of those types;

When margherita is selected above, remember some options were for margherita and some were for pepperoni, so we see a subset of the options;

Above you can see tomatoes and basil; if you change the type to pepperoni, the above options are hidden and pepperoni is displayed as below;

This helps guide the user into configuration options that are more applicable to a path within the KM. One of the other visual enhancements is the ability to group options together. We can add all of the options above into a group named 'Ingredients' that helps visually group related options together;

Then when this is used you see the options related to ingredients grouped together where the KM is assigned.

You can see how these help improve the usability of KMs in ODI and help reduce errors by further specializing how data is entered and related in the configuration options of the KM. The tasks within the KM can retrieve the option values and perform conditional logic based on those values. There are some other areas around this, but that's all for now. The functionality described here is available in the 12.1.3.0.1 release.

Monday May 11, 2015

Oracle Big Data Preparation Cloud Service (BDP) – Coming Soon

What are your plans around Big Data and Cloud?

If your organization has already begun to explore these topics, you might be interested in a new offering from Oracle that will dramatically simplify how you use your data in Hadoop and the Cloud:

Oracle Big Data Preparation Cloud Service (BDP)

There is a perception that most of the time spent in Big Data projects is dedicated to harvesting value. The reality is that 90% of the time in Big Data projects is really spent on data preparation. Data may be structured, but more often it will be semi-structured such as weblogs, or fully unstructured such as free-form text. The content is vast, inconsistent, incomplete, often off topic, and comes in multiple differing formats from many sources. In this environment each new dataset takes weeks or months of effort to process, frequently requiring programmers to write custom scripts. Minimizing data preparation time is the key to unlocking the potential of Big Data.

Oracle Big Data Preparation Cloud Service (BDP) addresses this very reality. BDP is a non-technical, web-based tool that sets out to minimize data preparation time in an effort to quickly unlock the potential of your data. The BDP tool provides an interactive set of services that automate, streamline, and guide the process of data ingestion, preparation, enrichment, and governance without costly manual intervention.

The technology behind this service is amazing; it intuitively guides the user with a machine learning driven recommendation engine based on semantic data classification and natural language processing algorithms. But the best part is that non-technical staff can use this tool as easily as they use Excel, resulting in a significant cost advantage for data intensive projects by reducing the amount of time and resources required to ingest and prepare new datasets for downstream IT processes.

Curious to find out more? We invite you to view a short demonstration of BDP below:

Let us know what you think!

Stay tuned as we write more about this offering… visit often here!

Tuesday May 05, 2015

Oracle Data Integrator for Big Data Webcast - Recap

We followed our recent announcement of Oracle Data Integrator (ODI) for Big Data with an in-depth webcast featuring Denis Gray, Product Management Director for Oracle Data Integration, and me. It was a deep dive into the product features, the differentiators, and an inside look at how ODI for Big Data functions to support the various Apache projects. If you missed it, you can watch it again here on demand.

We also talked about Oracle Metadata Management, a data governance tool that brings trust and transparency to Big Data projects within Oracle and 3rd party solutions.

You will want to watch this if you are interested in knowing:

a. How to become an ETL developer for Big Data without learning Java coding

b. Why ODI for Big Data is a stand-out technology, architecture-wise, for Big Data processing

c. A comparative study of Big Data ETL vendors and offerings in the market.

 Below are some of the questions that we encountered in the session.

How is unstructured data handled by ODI?

We have different solutions for unstructured input.

We constantly post best practices and lessons learned here on our blog. These blog posts on ODI for Big Data will also help you get started.

ODI for Big Data Announcement post

Big Data Lite Free Demo and Download

This white paper Top 5 Big Data Integration Mistakes to Avoid also talks about the most common pitfalls enterprises make when approaching a big data project.

Is Oracle DI for Big Data a separately licensed product from Oracle DI?

Oracle Data Integrator Enterprise Edition Advanced Big Data Option is a separately licensed option. It is an option which would be purchased in addition to Oracle Data Integrator Enterprise Edition for advanced big data processing. More on this option can be found at the website and on the datasheet.

How do I load data from Oracle to Hadoop in an event-driven manner?

You can use Oracle GoldenGate for Big Data for this: it captures all committed transactions on a source Oracle DB and delivers them to Hive, HDFS, HBase, Flume, Kafka, and others. You can learn more about Oracle GoldenGate for Big Data here.

Can a customer be just fine with ODI, rather than purchasing Oracle Warehouse Builder, for data warehousing projects? What are the strategic directions for these products from Oracle for data warehousing projects?

Oracle Data Integrator (ODI) is Oracle's strategic product for data integration. You can read the full statement of direction on this topic here. There are also automated migration utilities that help migrate your OWB work into the ODI environment.

Does the ODI that comes with Financial Analytics also have the big data capability, or is it only in the full version?

Financial Analytics Hub uses a restricted use license of ODI which is meant for use specifically with the Financial Analytics products as outlined in the license.

ODI EE has basic big data functionality such as Hive, Sqoop and HBase. The Pig, Spark and Oozie functionality requires ODI EE as well as the Advanced Big Data Option for ODI.

When customers expand beyond the specialized financial analytics they upgrade to the full license of ODIEE and ODIEE for Big Data.

Can I use Impala (Cloudera's advancement on Hive)? Does ODI recognize Impala?

We have customers using Impala with ODI. This is supported through our regular JDBC support.

Will the utilization of Hadoop cut down on the need for a lot of manual coding, or will manual coding still be an essential part of Oracle Data Integration for Big Data?

ODI is specifically made to avoid manual coding and provide a graphical and metadata-driven way of Big Data Integration. Using tools instead of manual coding has been understood in the Data Integration community for decades, and this realization is coming to the Big Data community now through painful experiences.

See also this article: Top 5 Big Data Integration Mistakes to Avoid

Are resources required for ODI managed by YARN in Hadoop?

ODI is using standard Hadoop subsystems like Hive, HDFS, Oozie, Pig, Spark, HBase which are managed by YARN, so we are implicitly taking advantage of YARN.

Can you please share any performance benchmarks we have with the other competitors?

We might suggest a couple whitepapers for your review on this subject:

Data Integration Platforms for Big Data and the Enterprise: Customer Perspectives on IBM, Informatica and Oracle

The Oracle Data Integrator Architecture

Best Practices for Real-time Data Warehousing

Oracle Data Integrator Performance Guide

What is the major differentiator in the usage of Hive, Spark or Pig within ODI?

ODI provides a graphical and logical abstraction over these engines, so you can design your transformation without concern for what the implementation engine will be. You can choose your engine after the fact based on their performance characteristics, such as the in-memory performance of Spark. If a new Hadoop engine comes up in the future, it's easy to retool your logical design to run on a future language.

Can you explain the difference between using the GG Big Data Adapters and the GG Java flat file Adapters?

The OGG BD adapter comes with 4 pre-packaged adapters for Hive, HDFS, Flume and HBase that have been developed and tested by Oracle R&D. It also comes with custom Java and JMS support like the "Java flat file App Adapter". The only capability exclusive to the "Java flat file App Adapter" is the flat file support.

Is Oracle ODI an alternative or equivalent to a particular Apache product?

ODI is not competing against any Apache/Hadoop project; instead it integrates with them. We are utilizing HDFS, Hive, Spark, Pig, HBase and Sqoop for our transformations, unlike other DI vendors who deploy their own engine on the Hadoop cluster.

So is ODI generating Hive/Pig/MR code behind the scenes when you define mappings?

You are correct. You design a logical mapping and pick an execution engine (Spark/Pig/Hive), ODI will generate and execute code for this engine. You have full visibility into this code, unlike a proprietary ETL engine.

Is Oracle Metadata Management (OMM) part of ODI or a separate product?

Oracle Metadata Management is a separate product from ODI, however one that is complementary - please find the datasheet here:

Can you please share the details on GG for Big Data?

You can find more info here

Can you share the names of blogs for effective ODI data integration?

The ODI team is regularly posting on https://blogs.oracle.com/dataintegration/, but there is a rich community of bloggers writing about ODI: http://www.ateam-oracle.com/data-integration/di-odi/ http://odiexperts.com/ http://www.rittmanmead.com/category/oracle-data-integrator/ http://oracleodi.com/ https://gurcanorhan.wordpress.com/

And many, many more - Google for ODI blogs to get more info.

ODI can pull data on a regular schedule (say every 2 minutes); GoldenGate does it in real time. So if a delay of around 2 minutes is OK, do we still need GG for Big Data?

That is the general principle. If you are looking for real-time replication with sub-second latency, then GoldenGate is the product. If you are looking for heavy processing of Big Data, then ODI is the answer. They are actually complementary and work off of one another, where customers use GG for data ingestion and ODI for data processing.

I'm an Oracle apps DBA and Oracle performance DBA. Can I use my existing skillsets to transition into Oracle DI for Big Data? Is this completely different from the DBA skillset?

ODI is popular with DBAs as the generated SQL code (RDBMS or Hive/Impala) is visible, and all of our runtime actions are "white box" so you can see what's happening. You can review queries and their query plans and optimize them using our Knowledge Module framework.
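
For instance, here is a minimal sketch of how a DBA might inspect the plan of a query that ODI generated (the table and column names are purely illustrative):

  -- Hive/Impala: display the execution plan of a generated query
  EXPLAIN SELECT cust_id, COUNT(*) FROM web_orders GROUP BY cust_id;

  -- Oracle: capture and display the plan for the equivalent generated SQL
  EXPLAIN PLAN FOR SELECT cust_id, COUNT(*) FROM web_orders GROUP BY cust_id;
  SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);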

Thursday Apr 30, 2015

How To Setup Oracle GoldenGate When Performing the DB2 11.1 Upgrade

After the announcement of DB2 11.1 support in Oracle GoldenGate for DB2 z/OS, a lot of questions were received on how to set up Oracle GoldenGate when performing the DB2 11.1 upgrade. This blog provides some instructions and explanations.

DB2 11.1 increases the log record sequence numbers from 6 bytes to 10 bytes. The log reading API changed significantly to support the new log record format.  Oracle GoldenGate provides support for DB2 11.1 with a version specific build.  In other words, starting with Oracle GoldenGate 12.1.2.1.4, two downloadable builds will be provided to support DB2 z/OS: 

  • GoldenGate for DB2 10 and earlier versions
  • GoldenGate for DB2 11
If you are upgrading to DB2 11.1 in a data sharing configuration and will be upgrading the subsystems in the group gradually (i.e. you'll have a mixed DB2 11.1 and DB2 10.1/9.1 group for some period of time), we first recommend that you upgrade the existing GoldenGate to the GoldenGate version that you plan to use once you've upgraded to DB2 11.1. At the time of writing this document, the earliest version of GoldenGate that supports DB2 11.1 is 12.1.2.1.4.

The diagram below depicts the GoldenGate and data sharing configuration prior to upgrading the first subsystem to DB2 11.1.
[Diagram: GoldenGate and data sharing configuration prior to upgrading the first subsystem to DB2 11.1]
Please make sure you are not using the data sharing group name (e.g. ADDO) in the extract connection parameter. For example, if the data sharing group name is ADDO, and the subsystem SSIDs of the group are ADD1 and ADD2..., please use the SSID name instead. When you use the data sharing group name, GoldenGate will connect to any of the subsystems to access log files from all of the subsystems in the data sharing group. However, during the upgrade process, we need to make sure the GoldenGate extract is connected to a specific subsystem of the group that will not be upgraded to DB2 11.1 initially. For example,

SOURCEDB ADD1 userid uuuuuu, password ppppppp


To quickly modify a GoldenGate extract connection to another subsystem in the data sharing group, it is common practice to use an include file to define the connection parameter.  For example, the following “extract-conn.inc” file denoted in the “INCLUDE” parameter would contain the connection parameter above:

INCLUDE extract-conn.inc

In this example, you can keep the extract connected to ADD1 while upgrading the other members of the data sharing group to DB2 11.1. Data from all members in the data sharing group will be captured by GoldenGate. 
[Diagram: extract kept connected to subsystem ADD1 while other members of the group are upgraded to DB2 11.1]
As soon as you upgrade one member of the data sharing group to DB2 11.1, you can choose to use the new GoldenGate for DB2 z/OS 11 build and connect the extract to that subsystem and capture log records from all the other subsystems in the data sharing group as illustrated below:
[Diagram: extract connected to the upgraded DB2 11.1 subsystem, capturing log records from the other members of the group]
The DB2 IFI allows a GoldenGate extract to access log files for all DB2 subsystems that are a part of the DB2 data sharing group no matter which LPAR these subsystems are running in.  GoldenGate can capture from all members of a data sharing group even if there are different DB2 subsystem versions.  To clarify this further:

  • GoldenGate can connect to a DB2 11.1 subsystem and successfully capture log records from DB2 10.1 subsystem(s) that are also a part of the DB2 data sharing group.
  • In like manner, GoldenGate can also connect to a DB2 10.1 subsystem and successfully capture log records from DB2 11.1 subsystem(s) that are a part of the DB2 data sharing group.

Please refer to KM 1060540.1 if you need more information about the Oracle GoldenGate support for DB2 z/OS data sharing group.

If you have further questions or suggestions, please feel free to reach me at @@jinyu512

(Thanks to my colleagues Mark Geisler, Richard Johnson and Greg Wood for reviewing this doc.)

Wednesday Apr 15, 2015

Data Governance for Migration and Consolidation

By Martin Boyd, Senior Director of Product Management

How would you integrate information on millions of parts, customers and suppliers from multiple acquisitions into a single JD Edwards instance?  This was the question facing National Oilwell Varco (NOV), a leading worldwide provider of components used in the oil and gas industry.  If they could not find an answer then many operating synergies would be lost, but they knew from experience that simply “moving and mapping” the data from the legacy systems into JDE was not sufficient, as the data was anything but standardized.

This was the problem described yesterday in a session at the Collaborate Conference in Las Vegas.  The presenters were Melissa Haught of NOV and Deepak Gupta of KPIT, their systems integrator. Together they walked through an excellent discussion of the problem and the solution they have developed:

The Problem:  It is first important to recognize that the data to be integrated from many and various legacy systems had been created over time with different standards by different people according to their different needs. Thus, saying it lacked standardization would be an understatement.  So how do you “govern” data that is so diverse?  How do you apply standards to it months or years after it has been created? 

The Solution:  The answer is that there is no single answer, and certainly no “magic button” that will solve the problem for you.  Instead, in the case of NOV, a small team of dedicated data stewards, or specialists, works to reverse-engineer a set of standards from the data at hand.  In the case of product data, which is usually the most complex, NOV found they could actually infer rules to recognize, parse, and extract information from ‘smart’ part numbers, even from the part numbering schemes of acquired companies.  Once these rules are created for an entity or a category and built into their Oracle Enterprise Data Quality (EDQ) platform, the data is run through the DQ process and the results are examined.  Most often you will find problems, which then suggest some rule refinements are required. Rule refinement and data quality processing steps run repeatedly until the result is as good as it can be.  The result is never 100% standardized and clean data, though; some data is always flagged into a “data dump” for future manual remediation.

Lessons Learned:

  • Although technology is a key enabler, it is not the whole solution. Dedicated specialists are required to build the rules and improve them through successive iterations
  • A ‘user friendly’ data quality platform is essential so that it is approachable and intuitive for the data specialists who are not (nor should they be) programmers
  • A rapid iteration through testing and rules development is important to keep up project momentum.  In the case of NOV, specialists request rule changes, which are implemented by KPIT resources in India; in effect, changes are made and re-run overnight, which has worked very well

Technical Architecture:  Data is extracted from the legacy systems by Oracle Data Integrator (ODI), which also transforms the data into the right ‘shape’ for review in EDQ.  An Audit Team reviews these results for completeness and correctness based on the supplied data compared to the required data standards.  A secondary check is also performed using EDQ, which verifies that the data is in a valid format to be loaded into JDE.

The Benefit:  The benefit of having data that is “fit for purpose” in JDE is that NOV can mothball the legacy systems and use JDE as a complete and correct record for all kinds of purposes from operational management to strategic sourcing.  The benefit of having a defined governance process is that it is repeatable.  This means that every time the process is run, the individuals and the governance team as a whole learn something from it and they get better at executing it next time around.  Because of this NOV has already seen orders of magnitude improvements in productivity as well as data quality, and is already looking for ways to expand the program into other areas.

All-in-all, Melissa and Deepak gave the audience great insight into how they are solving a complex integration program and reminded us of what we should already know: "integrating" data is not simply moving it. To be of business value, the data must be 'fit for purpose', which often means that both the integration process and the data must be governed. 

Friday Apr 10, 2015

Customers Tell All: What Sets Oracle Apart in Big Data Integration

Data integration has become a critical component of many technology solutions that businesses pursue to differentiate in their markets. Instead of relying on manual coding in house, more and more businesses choose data integration solutions to support their strategic IT initiatives, from big data analytics to cloud integration.

To explore the differences among the leading data integration solutions and the impact their technologies are having on real-world businesses, Dao Research recently conducted a research study, where they interviewed IBM, Informatica, and Oracle customers. In addition they reviewed publicly available solution information from these three vendors.

The research revealed some key findings that explain Oracle's leadership in the data integration space. For example:

  • Customers who participated in this study cite a range of 30 to 60% greater development productivity using Oracle Data Integrator versus traditional ETL tools from Informatica and IBM. Dao's research ties Oracle's advantage to product architecture differences such as native push-down processing, the separation of logical and physical layers, and the ability to extend Oracle Data Integrator using its knowledge modules.
  • The research also showed that Oracle’s data integration cost of ownership is lower because of its unified platform strategy (versus offering multiple platforms and options), its use of source and target databases for processing, higher developer productivity, faster implementation, and the fact that it doesn’t require management resources for a middle-tier integration infrastructure.
  • In the area of big data integration, the study highlights Oracle’s advantage with its flexible and native solutions. Unlike competitors’ offerings, developed as separate solutions, Oracle’s solution is aware of the cluster environment of big data systems. Oracle enables big data integration and cloud data integration through the use of a single platform with common tooling and inherent support for big data processing environments.
  • I should add that the latest release of the Oracle Data Integrator EE Big Data Option widens the competitive gap. Oracle is the only vendor that can automatically generate Spark, Hive, and Pig transformations from a single mapping. Oracle Data Integration customers can focus on building the right architecture for driving business value, and do not have to become experts in multiple programming languages.  For example, an integration architect at a large financial services provider told the research company, "As an ODI developer, I am a Big Data developer without having to understand the underpinnings of Big Data. That's pretty powerful capability."


You can find the report of Dao's research here:

I invite you to read this research paper to understand why more and more customers trust Oracle for their strategic data integration initiatives after working with or evaluating competitive offerings.


This Week's A-Team Blog Speaks to Automating Changes after Upgrading ODI or Migrating from Oracle Warehouse Builder

The A-Team not only provides great content, they are humorous too!

Check out this week’s post, the title says it all: Getting Groovy with Oracle Data Integrator: Automating Changes after Upgrading ODI or Migrating from Oracle Warehouse Builder

The article covers various scripts written in Groovy that leverage the ODI SDK to assist in automating massive changes to one’s repository. These initially came to be as a result of customers’ desire to enhance their environment in their effort to move from Oracle Warehouse Builder (OWB) to Oracle Data Integrator (ODI), but in the end came the realization that these scripts could be used by any ODI user.

Happy reading!

Thursday Apr 09, 2015

ODI, Big Data SQL and Oracle NoSQL

Back in January Anuj posted an article here on using Oracle NoSQL via the Oracle Database Big Data SQL feature. In this post, which I guess you could call part 2 of Anuj's, I will follow up with how the Oracle external table is configured and how it all hangs together, with manual code and via ODI. For this I used the Big Data Lite VM and also the newly released Oracle Data Integrator Big Data option. The Big Data Lite VM 4.1 release uses version 3.2.5 of Oracle NoSQL - from this release I used the new declarative DDL for Oracle NoSQL to project the shape from NoSQL, with some help from Anuj.

My goal for the integration design is to show a logical design in ODI and how KMs are used to realize the implementation and leverage Oracle Big Data SQL - this integration design supports predicate pushdown, so I actually minimize data moved between my NoSQL store on Hadoop and the Oracle database - think speed and scalability! My NoSQL store contains user movie recommendations. I want to join this with reference data in Oracle, which includes the customer, movie and genre information, and store the result in a summary table.

Here is the code to create and load the recommendation data in NoSQL - this would normally be computed by another piece of application logic in a real world scenario;

  export KVHOME=/u01/nosql/kv-3.2.5
  cd /u01/nosql/scripts
  ./admin.sh

  connect store -name kvstore
  EXEC "CREATE TABLE recommendation( \
           custid INTEGER, \
           sno INTEGER, \
           genreid INTEGER, \
           movieid INTEGER, \
           PRIMARY KEY (SHARD(custid), sno, genreid, movieid))"
  PUT TABLE -name RECOMMENDATION -file /home/oracle/movie/moviework/bigdatasql/nosqldb/user_movie.json

The Manual Approach

This example is using the new data definition language in NoSQL. To make this accessible via Hive, users can create Hive external tables that use the NoSQL Storage Handler provided by Oracle. If this were manually coded in Hive, we could define the table as follows;

  CREATE EXTERNAL TABLE IF NOT EXISTS recommendation(
      custid INT,
      sno INT,
      genreId INT,
      movieId INT)
  STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler'
  TBLPROPERTIES ( "oracle.kv.kvstore"="kvstore",
                  "oracle.kv.hosts"="localhost:5000",
                  "oracle.kv.hadoop.hosts"="localhost",
                  "oracle.kv.tableName"="recommendation");

At this point we have made NoSQL accessible to many components in the Hadoop stack - pretty much every component in the Hadoop ecosystem can leverage the HCatalog entries defined, be they Hive, Pig, Spark and so on. We are looking at Oracle Big Data SQL though, so let's see how that is achieved. We must define an external table that uses either the SerDe or a Hive table; below you can see how the table has been defined in Oracle;

  CREATE TABLE recommendation(
      custid NUMBER,
      sno NUMBER,
      genreid NUMBER,
      movieid NUMBER
  )
  ORGANIZATION EXTERNAL
  (
      TYPE ORACLE_HIVE
      DEFAULT DIRECTORY DEFAULT_DIR
      ACCESS PARAMETERS (
          com.oracle.bigdata.tablename=default.recommendation
      )
  );

Now we are ready to write SQL! Really!? Well let's see, below we can see the type of query we can do to join the NoSQL data with our Oracle reference data;

  SELECT m.title, g.name, c.first_name
  FROM recommendation r, movie m, genre g, customer c
  WHERE r.movieid=m.movie_id and r.genreid=g.genre_id and r.custid=c.cust_id and r.custid=1255601 and r.sno=1
  ORDER BY r.sno, r.genreid;

Great, we can now access the data from Oracle - we benefit from the scalability of the solution and minimal data movement! Let's make it better: let's make it more maintainable, flexible to future changes, and accessible by more people, by showing how it is done in ODI.

Oracle Data Integrator Approach

The data in NoSQL has a shape, we can capture that shape in ODI just as it is defined in NoSQL. We can then design mappings that manipulate the shape and load into whatever target we like. The SQL we saw above is represented in a logical mapping as below;


Users can use the same design experience as other data items and benefit from the mapping designer. They can join, map, transform just as normal. The ODI designer allows you to separate how you physically want this to happen from the logical semantics - this is all about giving you flexibility to change and adapt to new integration technologies and patterns.

In the physical design we can assign Knowledge Modules that take the responsibility of building the integration objects that we previously manually coded above. These KMs are generic so they support all shapes and sizes of data items. Below you can see how the LKM is assigned for accessing Hive from Oracle;

This KM takes the role of building the external table - you can take this, use it, customize it, and the logical design stays the same. Why is that important? Integration recipes CHANGE as we learn more and developers build newer and better mechanisms to integrate.

This KM takes care of creating the external table in Hive that accesses our NoSQL system. You could also have manually built the external table, imported it into ODI and used that as a source for the mapping; I want to show how the raw items can be integrated, as the more metadata we have and use in the design, the greater the flexibility in the future. The LKM Oracle NoSQL to Hive uses regular KM APIs to build the access object; here is a snippet from the KM;

  create table <%=odiRef.getObjectName("L", odiRef.getTableName("COLL_SHORT_NAME"), "W")%>
  <%=odiRef.getColList("(", "[COL_NAME] [DEST_CRE_DT]", ", ", ")", "")%>
  STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler'
  TBLPROPERTIES ( "oracle.kv.kvstore"="<%=odiRef.getInfo("SRC_SCHEMA")%>",
                  "oracle.kv.hosts"="<%=odiRef.getInfo("SRC_DSERV_NAME")%>",
                  "oracle.kv.hadoop.hosts"="localhost",
                  "oracle.kv.tableName"="<%=odiRef.getSrcTablesList("", "[TABLE_NAME]", ", ", "")%>");

You can see the templatized code versus literals. This still needs some work as you can see - can you spot some hard-wiring that needs fixed? ;-) This was using the 12.1.3.0.1 Big Data option of ODI, so integration with Hive is much improved and it leverages the DataDirect driver, which is also a big improvement. In this post I created a new technology for Oracle NoSQL in ODI; you can do this too for anything you want. I will post this technology on java.net and more, so that as a community we can learn and share.
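
As a hint on the hard-wiring: the "oracle.kv.hadoop.hosts"="localhost" property in the snippet above is one such literal. One possible refinement (illustrative only - the option name KV_HADOOP_HOSTS is hypothetical, not part of the shipped KM) would be to drive it from a KM option instead:

                  "oracle.kv.hadoop.hosts"="<%=odiRef.getOption("KV_HADOOP_HOSTS")%>",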

Summary 

Here we have seen how we can make seemingly complex integration tasks quite simple and leverage the best of data integration technologies today and importantly in the future!


Wednesday Apr 08, 2015

Oracle GoldenGate for DB2 z/OS Supports DB2 11

With the release of Oracle GoldenGate 12.1.2.1.4 release, Oracle GoldenGate for DB2 z/OS provides the support for DB2 11. This release also includes the fix to make Oracle GoldenGate z/OS Extract compatible with  IBM APAR PI12599 for DB2 z/OS. [Read More]
About

Learn the latest trends, use cases, product updates, and customer success examples for Oracle's data integration products, including Oracle Data Integrator, Oracle GoldenGate and Oracle Enterprise Data Quality.
