Tuesday Aug 04, 2015
Tuesday Jul 07, 2015
By Sandrine Riley-Oracle on Jul 07, 2015
Next in the series of Oracle Data Integration chalk talk videos, we speak to Oracle Data Integrator (ODI) for big data. ODI allows you to become a big data developer without learning to code Java and MapReduce! ODI generates the code and optimizes it with support for Hive, Spark, Oozie, and Pig.
View this video to learn more:
Thursday Jul 02, 2015
By Sandrine Riley-Oracle on Jul 02, 2015
Some fun new videos are available; we call the series ‘Chalk Talk’!
The first in the series that we will share with you around Oracle Data Integration speaks to raising trust and transparency within big data. Crucial big data projects often fail due to a lack of overall trust in the data. Data is not always transparent, and governing it can become a costly overhead. Oracle Metadata Management assists with governance and trust across all data within the enterprise, both Oracle and third-party.
View this video to learn more: Chalk Talk: How to Raise Trust and Transparency in Big Data.
For additional information on Oracle Metadata Management, visit the OEMM homepage.
Tuesday Jun 09, 2015
Oracle Data Integrator Journalizing Knowledge Module for GoldenGate Integrated Replicat Blog from the A-Team
By Sandrine Riley-Oracle on Jun 09, 2015
As always, useful content from the A-Team…
Check out the most recent blog about how to modify the out-of-the-box Journalizing Knowledge Module for GoldenGate to support the Integrated Replicat apply mode.
Wednesday May 13, 2015
By Sandrine Riley-Oracle on May 13, 2015
It is nomination time!!!
This year's Oracle Fusion Middleware Excellence Awards will honor customers and partners who are creatively using various products across Oracle Fusion Middleware. Think you have something unique and innovative with Oracle Data Integration products?
We'd love to hear from you! Please submit today in the Big Data and Analytics category.
The deadline for the nomination is July 31, 2015. Win a free pass to Oracle OpenWorld 2015!!
Let’s reminisce a little…
For details on the 2014 Data Integration Winners: NET Serviços and Griffith University, check out this blog post.
For details on the 2013 Data Integration Winners: Royal Bank of Scotland’s Market and International Banking and The Yalumba Wine Company, check out this blog post.
For details on the 2012 Data Integration Winners: Raymond James Financial and Morrisons Supermarkets, check out this blog post.
We hope to honor you!
Click here to submit your nomination today. And just a reminder: the deadline to submit a nomination is 5pm Pacific Time on July 31, 2015.
Monday May 11, 2015
By Sandrine Riley-Oracle on May 11, 2015
What are your plans around Big Data and Cloud?
If your organization has already begun to explore these topics, you might be interested in a new offering from Oracle that will dramatically simplify how you use your data in Hadoop and the Cloud:
There is a perception that most of the time spent in Big Data projects is dedicated to harvesting value. The reality is that 90% of the time in Big Data projects is really spent on data preparation. Data may be structured, but more often it will be semi-structured such as weblogs, or fully unstructured such as free-form text. The content is vast, inconsistent, incomplete, often off topic, and comes from many differing formats and sources. In this environment each new dataset takes weeks or months of effort to process, frequently requiring programmers to write custom scripts. Minimizing data preparation time is the key to unlocking the potential of Big Data.
Oracle Big Data Preparation Cloud Service (BDP) addresses this very reality. BDP is a non-technical, web-based tool that sets out to minimize data preparation time in an effort to quickly unlock the potential of your data. The BDP tool provides an interactive set of services that automate, streamline, and guide the process of data ingestion, preparation, enrichment, and governance without costly manual intervention.
The technology behind this service is amazing; it intuitively guides the user with a machine learning driven recommendation engine based on semantic data classification and natural language processing algorithms. But the best part is that non-technical staff can use this tool as easily as they use Excel, resulting in a significant cost advantage for data intensive projects by reducing the amount of time and resources required to ingest and prepare new datasets for downstream IT processes.
Curious to find out more? We invite you to view a short demonstration of BDP below:
Let us know what you think!
Stay tuned as we write more about this offering… visit often here!
Wednesday Apr 15, 2015
By Irem Radzik-Oracle on Apr 15, 2015
By Martin Boyd, Senior Director of Product Management
How would you integrate millions of parts, customer and supplier information from multiple acquisitions into a single JD Edwards instance? This was the question facing National Oilwell Varco (NOV), a leading worldwide provider of components used in the oil and gas industry. If they could not find an answer, many operating synergies would be lost; but they knew from experience that simply “moving and mapping” the data from the legacy systems into JDE was not sufficient, as the data was anything but standardized.
This was the problem described yesterday in a session at the Collaborate Conference in Las Vegas. The presenters were Melissa Haught of NOV and Deepak Gupta of KPIT, their systems integrator. Together they walked through an excellent discussion of the problem and the solution they developed.
The Problem: It is first important to recognize that the data to be integrated from many and various legacy systems had been created over time with different standards by different people according to their different needs. Thus, saying it lacked standardization would be an understatement. So how do you “govern” data that is so diverse? How do you apply standards to it months or years after it has been created?
The Solution: The answer is that there is no single answer, and certainly no “magic button” that will solve the problem for you. Instead, in the case of NOV, a small team of dedicated data stewards, or specialists, works to reverse-engineer a set of standards from the data at hand. In the case of product data, which is usually the most complex, NOV found they could actually infer rules to recognize, parse, and extract information from ‘smart’ part numbers, even from the part numbering schemes of acquired companies. Once these rules are created for an entity or a category, they are built into the Oracle Enterprise Data Quality (EDQ) platform. The data is then run through the DQ process and the results are examined. Most often problems surface, which suggest that some rule refinements are required. The rule refinement and data quality processing steps are run repeatedly until the result is as good as it can be. The result is never 100% standardized and clean data, though; some data is always flagged into a “data dump” for future manual remediation.
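EDQ implements this kind of logic in its own parsing and standardization processors rather than in SQL, but purely as an illustration of what such a rule does, here is a minimal SQL sketch that parses a hypothetical smart part number of the form PPP-CCCC-NNNN (plant, commodity, sequence) and flags anything that does not match the scheme for manual review:
- -- Hypothetical example: parse a "smart" part number of the form PPP-CCCC-NNNN
- -- into its component attributes and flag values that do not match the scheme.
- SELECT part_number,
-        REGEXP_SUBSTR(part_number, '[^-]+', 1, 1) AS plant_code,
-        REGEXP_SUBSTR(part_number, '[^-]+', 1, 2) AS commodity_code,
-        REGEXP_SUBSTR(part_number, '[^-]+', 1, 3) AS sequence_no,
-        CASE WHEN REGEXP_LIKE(part_number, '^[A-Z]{3}-[0-9]{4}-[0-9]{4}$')
-             THEN 'PASS'
-             ELSE 'REVIEW'   -- route to the "data dump" for manual remediation
-        END AS rule_check
-   FROM legacy_parts;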
- Although technology is a key enabler, it is not the whole solution. Dedicated specialists are required to build the rules and improve them through successive iterations
- A ‘user friendly’ data quality platform is essential so that it is approachable and intuitive for the data specialists who are not (nor should they be) programmers
- Rapid iteration through testing and rules development is important to keep up project momentum. In the case of NOV, specialists request rule changes, which are implemented by KPIT resources in India; in effect, changes are made and re-run overnight, which has worked very well
Technical Architecture: Data is extracted from the legacy systems by Oracle Data Integrator (ODI), which also transforms the data into the right ‘shape’ for review in EDQ. An Audit Team reviews these results for completeness and correctness based on the supplied data compared to the required data standards. A secondary check is also performed using EDQ, which verifies that the data is in a valid format to be loaded into JDE.
The Benefit: The benefit of having data that is “fit for purpose” in JDE is that NOV can mothball the legacy systems and use JDE as a complete and correct record for all kinds of purposes from operational management to strategic sourcing. The benefit of having a defined governance process is that it is repeatable. This means that every time the process is run, the individuals and the governance team as a whole learn something from it and they get better at executing it the next time around. Because of this, NOV has already seen order-of-magnitude improvements in productivity as well as data quality, and is already looking for ways to expand the program into other areas.
All in all, Melissa and Deepak gave the audience great insight into how they are solving a complex integration problem and reminded us of what we should already know: "integrating" data is not simply moving it. To be of business value, the data must be 'fit for purpose', which often means that both the integration process and the data must be governed.
Friday Apr 10, 2015
By Irem Radzik-Oracle on Apr 10, 2015
Data integration has become a critical component of many technology solutions that businesses pursue to differentiate in their markets. Instead of relying on manual coding in house, more and more businesses choose data integration solutions to support their strategic IT initiatives, from big data analytics to cloud integration.
To explore the differences among the leading data integration solutions and the impact their technologies are having on real-world businesses, Dao Research recently conducted a research study, where they interviewed IBM, Informatica, and Oracle customers. In addition they reviewed publicly available solution information from these three vendors.
The research revealed some key findings that explain Oracle's leadership in the data integration space. For example:
- Customers who participated in this study cite 30 to 60% greater development productivity using Oracle Data Integrator versus traditional ETL tools from Informatica and IBM. Dao's research ties Oracle's advantage to product architecture differences such as native push-down processing (illustrated in the sketch after this list), the separation of logical and physical layers, and the ability to extend Oracle Data Integrator using its knowledge modules.
- The research also showed that Oracle’s data integration cost of ownership is lower because of its unified platform strategy (versus offering multiple platforms and options), its use of source and target databases for processing, higher developer productivity, faster implementation, and the absence of a middle-tier integration infrastructure to manage.
- In the area of big data integration, the study highlights Oracle’s advantage with its flexible and native solutions. Unlike competitors’ offerings, developed as separate solutions, Oracle’s solution is aware of the cluster environment of big data systems. Oracle enables big data integration and cloud data integration through the use of a single platform with common tooling and inherent support for big data processing environments.
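To make the push-down point concrete, here is a minimal, purely illustrative sketch of the set-based SQL an E-LT style tool can generate and run inside the target database itself, instead of pumping rows through a separate middle-tier ETL engine. The table and column names are hypothetical, not taken from any specific knowledge module:
- -- Illustrative only: a set-based load generated to run inside the target
- -- database rather than row by row in a middle-tier ETL server.
- INSERT INTO dw_sales_summary (product_id, sale_month, total_amount)
- SELECT s.product_id,
-        TRUNC(s.sale_date, 'MM') AS sale_month,
-        SUM(s.amount)            AS total_amount
-   FROM stg_sales s
-   JOIN dim_product p ON p.product_id = s.product_id
-  GROUP BY s.product_id, TRUNC(s.sale_date, 'MM');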
I should add that the latest release of Oracle Data Integrator EE Big Data Option widens the competitive gap. Oracle is the only vendor that can automatically generate Spark, Hive, and Pig transformations from a single mapping. Oracle Data Integration customers can focus on building the right architecture for driving business value, and do not have to become experts in multiple programming languages. For example, an integration architect in a large financial services provider told the research company "As an ODI developer, I am a Big Data developer without having to understand the underpinnings of Big Data. That's pretty powerful capability."
You can find the report of Dao's research here:
I invite you to read this research paper to understand why more and more customers trust Oracle for their strategic data integration initiatives after working with or evaluating competitive offerings.
This Week's A-Team Blog Speaks to Automating Changes after Upgrading ODI or Migrating from Oracle Warehouse Builder
By Sandrine Riley-Oracle on Apr 10, 2015
The A-Team not only provides great content, they are humorous too!
Check out this week’s post, the title says it all: Getting Groovy with Oracle Data Integrator: Automating Changes after Upgrading ODI or Migrating from Oracle Warehouse Builder
The article covers various scripts written in Groovy that leverage the ODI SDK and assist in automating massive changes to one’s repository. These scripts initially came about in response to customers wanting to enhance their environment as they moved from Oracle Warehouse Builder (OWB) to Oracle Data Integrator (ODI), but in the end it became clear that they could be used by any ODI user.
Thursday Apr 09, 2015
By David Allan-Oracle on Apr 09, 2015
Back in January, Anuj posted an article here on using Oracle NoSQL via the Oracle Database Big Data SQL feature. In this post (you could call it part 2 of Anuj's) I will follow up with how the Oracle external table is configured and how it all hangs together, both with manual code and via ODI. For this I used the Big Data Lite VM and the newly released Oracle Data Integrator Big Data option. The Big Data Lite VM 4.1 release uses version 3.2.5 of Oracle NoSQL; from this release I used the new declarative DDL for Oracle NoSQL to project the shape from NoSQL, with some help from Anuj.
My goal for the integration design is to show a logical design in ODI and how KMs are used to realize the implementation and leverage Oracle Big Data SQL. This integration design supports predicate pushdown, so I actually minimize the data moved between my NoSQL store on Hadoop and the Oracle database; think speed and scalability! My NoSQL store contains user movie recommendations. I want to join this with reference data in Oracle, which includes the customer, movie, and genre information, and store the result in a summary table.
Here is the code to create and load the recommendation data in NoSQL; in a real-world scenario this would normally be computed by another piece of application logic:
- export KVHOME=/u01/nosql/kv-3.2.5
- cd /u01/nosql/scripts
- connect store -name kvstore
- EXEC "CREATE TABLE recommendation( \
- custid INTEGER, \
- sno INTEGER, \
- genreid INTEGER,\
- movieid INTEGER,\
- PRIMARY KEY (SHARD(custid), sno, genreid, movieid))"
- PUT TABLE -name RECOMMENDATION -file /home/oracle/movie/moviework/bigdatasql/nosqldb/user_movie.json
The Manual Approach
This example is using the new data definition language in NoSQL. To make this accessible via Hive, users can create Hive external tables that use the NoSQL Storage Handler provided by Oracle. If this were manually coded in Hive, we could define the table along the following lines (the table properties identify the KV store, the NoSQL table, and the store's host and port, shown here as a placeholder):
- CREATE EXTERNAL TABLE IF NOT EXISTS recommendation(
- custid INT,
- sno INT,
- genreId INT,
- movieId INT)
- STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler'
- TBLPROPERTIES ( "oracle.kv.kvstore"="kvstore",
At this point we have made NoSQL accessible to many components in the Hadoop stack: pretty much every component in the Hadoop ecosystem can leverage the HCatalog entries so defined, be it Hive, Pig, Spark, and so on. We are looking at Oracle Big Data SQL though, so let's see how that is achieved. We must define an Oracle external table that uses either the SerDe or a Hive table; below you can see how the table has been defined in Oracle:
- CREATE TABLE recommendation(
-   custid NUMBER,
-   sno NUMBER,
-   genreid NUMBER,
-   movieid NUMBER)
- ORGANIZATION EXTERNAL (
-   TYPE ORACLE_HIVE
-   DEFAULT DIRECTORY DEFAULT_DIR
-   ACCESS PARAMETERS (
-   )
- );
Now we are ready to write SQL! Really!? Well, let's see; below is the type of query we can run to join the NoSQL data with our Oracle reference data:
- SELECT m.title, g.name, c.first_name
- FROM recommendation r, movie m, genre g, customer c
- WHERE r.movieid=m.movie_id and r.genreid=g.genre_id and r.custid=c.cust_id and r.custid=1255601 and r.sno=1
- ORDER BY r.sno, r.genreid;
Great, we can now access the data from Oracle; we benefit from the scalability of the solution and minimal data movement! Let's make it better: more maintainable, more flexible to future changes, and accessible to more people, by showing how it is done in ODI.
Oracle Data Integrator Approach
The data in NoSQL has a shape, and we can capture that shape in ODI just as it is defined in NoSQL. We can then design mappings that manipulate the shape and load into whatever target we like. The SQL we saw above is represented in a logical mapping as below.
Users get the same design experience as with other data items and benefit from the mapping designer. They can join, map, and transform just as normal. The ODI designer allows you to separate how you physically want this to happen from the logical semantics; this is all about giving you the flexibility to change and adapt to new integration technologies and patterns.
In the physical design we can assign Knowledge Modules that take the responsibility of building the integration objects that we previously manually coded above. These KMs are generic so support all shapes and sizes of data items. Below you can see how the LKM is assigned for accessing Hive from Oracle;
This KM takes the role of building the external table; you can take it, use it, customize it, and the logical design stays the same. Why is that important? Integration recipes CHANGE as we learn more and developers build newer and better mechanisms to integrate.
This KM takes care of creating the external table in Hive that accesses our NoSQL system. You could also have manually built the external table, imported it into ODI, and used that as a source for the mapping; I want to show how the raw items can be integrated, because the more metadata we capture and use at design time, the greater the flexibility in the future. The LKM Oracle NoSQL to Hive uses regular KM APIs to build the access object; here is a snippet from the KM:
- create table <%=odiRef.getObjectName("L", odiRef.getTableName("COLL_SHORT_NAME"), "W")%>
- <%=odiRef.getColList("(", "[COL_NAME] [DEST_CRE_DT]", ", ", ")", "")%>
- STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler'
- TBLPROPERTIES ( "oracle.kv.kvstore"="<%=odiRef.getInfo("SRC_SCHEMA")%>",
- "oracle.kv.tableName"="<%=odiRef.getSrcTablesList("", "[TABLE_NAME]", ", ", "")%>");
You can see the templatized code versus the literals. This still needs some work, as you can see; can you spot some hard-wiring that needs to be fixed? ;-) This was using the newly released Big Data option of ODI, so integration with Hive is much improved, and it leverages the DataDirect driver, which is also a big improvement. In this post I created a new technology for Oracle NoSQL in ODI; you can do this too for anything you want. I will post this technology on java.net, and more, so that as a community we can learn and share.
Here we have seen how we can make seemingly complex integration tasks quite simple and leverage the best of data integration technologies today and importantly in the future!
Monday Apr 06, 2015
By Madhu Nair on Apr 06, 2015
Proudly announcing the availability of Oracle Data Integrator for Big Data. This release is the latest in a series of advanced Big Data updates and features that Oracle Data Integration is rolling out to help customers take their Hadoop projects to the next level.
Increasing Big Data Heterogeneity and Transparency
This release sees significant additions in heterogeneity and governance for customers. Highlights of this release include:
- Support for Apache Spark,
- Support for Apache Pig, and
- Orchestration using Oozie.
Click here for a detailed list of what is new in Oracle Data Integrator (ODI).
Oracle Data Integrator for Big Data helps transform and enrich data within the big data reservoir/data lake without users having to learn the languages necessary to manipulate it. ODI for Big Data generates native code that is then run on the underlying Hadoop platform without requiring any additional agents. ODI separates the design interface used to build logic from the physical implementation layer that runs the code. This allows ODI users to build business and data mappings without having to learn HiveQL, Pig Latin, or MapReduce.
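To give a rough feel for what generating native code means, here is a purely hypothetical sketch (not actual knowledge module output) of the kind of HiveQL a simple filter-and-aggregate mapping might be realized as on the Hive engine; the table and column names are illustrative only:
- -- Hypothetical sketch of the kind of HiveQL a simple mapping could generate:
- -- filter a source table and aggregate into a target, entirely inside Hadoop.
- INSERT OVERWRITE TABLE movie_views_by_genre
- SELECT genre_id,
-        COUNT(*) AS view_count
-   FROM movie_activity
-  WHERE activity_type = 'VIEW'
-  GROUP BY genre_id;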
Oracle Data Integrator for Big Data Webcast
We invite you to join us on the 30th of April for our webcast to learn more about Oracle Data Integrator for Big Data and to get your questions about Big Data Integration answered. We will discuss how the newly announced Oracle Data Integrator for Big Data can help take your Hadoop projects to the next level.
Thursday Mar 26, 2015
By Sandrine Riley-Oracle on Mar 26, 2015
Oracle's big data team has announced the newest Oracle Big Data Lite Virtual Machine, 4.1.0. This newest Big Data Lite Virtual Machine contains great improvements from a data integration perspective with the inclusion of the recently released Oracle GoldenGate for Big Data. You will see this in an improved demonstration that highlights inserts, updates, and deletes into Hive using Oracle GoldenGate for Big Data, with Oracle Data Integrator performing a merge of the new operations into a consolidated table.
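A common pattern for that kind of merge is to rebuild the consolidated table by keeping only the latest operation per key and dropping keys whose latest operation is a delete. As a purely illustrative sketch (the table and column names are hypothetical, not the demonstration's actual schema), such a consolidation step in HiveQL might look like this:
- -- Hypothetical sketch: rebuild a consolidated table from a stream of change
- -- records by keeping only the latest operation per key and dropping keys
- -- whose latest operation is a delete.
- INSERT OVERWRITE TABLE customer_consolidated
- SELECT cust_id, first_name, last_name, email
-   FROM (SELECT c.*,
-                ROW_NUMBER() OVER (PARTITION BY cust_id
-                                   ORDER BY op_timestamp DESC) AS rn
-           FROM customer_changes c) ranked
-  WHERE rn = 1
-    AND op_type <> 'D';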
Big Data Lite is a pre-built environment which includes many of the key capabilities of Oracle's big data platform. The components have been configured to work together in this Virtual Machine, providing a simple way to get started in a big data environment. The components include Oracle Database, Cloudera Distribution including Apache Hadoop, Oracle Data Integrator, and Oracle GoldenGate, amongst others.
Big Data Lite also contains hands-on labs and demonstrations to help you get started using the system. Tame Big Data with Oracle Data Integration is a hands-on lab that teaches you how to design Hadoop data integration using Oracle Data Integrator and Oracle GoldenGate.
Start here to learn more! Enjoy!
Thursday Feb 19, 2015
By Irem Radzik-Oracle on Feb 19, 2015
Big data systems and big data analytics solutions are becoming critical components of modern information management architectures. Organizations realize that by combining structured transactional data with semi-structured and unstructured data they can realize the full potential value of their data assets and achieve enhanced business insight. Businesses also recognize that in today’s fast-paced digital environment, low-latency access to data is essential to being agile and responding with immediacy. Low-latency transactional data brings additional value, especially for dynamically changing operations, that day-old data, structured or unstructured, cannot deliver.
Today we announced the general availability of the Oracle GoldenGate for Big Data product, which offers a platform for streaming real-time transactional data into big data systems. By providing easy-to-use, real-time data integration for big data systems, Oracle GoldenGate for Big Data facilitates improved business insight for better customer experience. It also allows IT organizations to quickly move ahead with their big data projects without extensive training and management resources. Oracle GoldenGate for Big Data's real-time data streaming platform also allows customers to keep their big data reservoirs up to date with their production systems.
Oracle GoldenGate’s fault-tolerant, secure, and flexible architecture shines in this new big data streaming offering as well. Customers can enjoy secure and reliable data streaming with subsecond latency. Oracle GoldenGate’s core log-based change data capture capabilities enable real-time streaming without degrading the performance of the source production systems.
The new offering, Oracle GoldenGate for Big Data, provides integration for Apache Flume, Apache HDFS, Apache Hive, and Apache HBase. It also includes Oracle GoldenGate for Java, which enables customers to easily integrate with additional big data systems, such as Oracle NoSQL, Apache Kafka, Apache Storm, Apache Spark, and others.
March 5th, 2015 10am PT/ 1pm ET
I invite you to join this webcast to learn from Oracle and Cloudera executives how to future-proof your big data infrastructure. The webcast will discuss:
- Selection criteria that will drive business results with Big Data Integration
- Oracle's new big data integration and governance offerings, including Oracle GoldenGate for Big Data
- Oracle’s comprehensive big data features in a unified platform
- How Cloudera Enterprise Data Hub and Oracle Data Integration combine to offer complementary features to store data in full fidelity, to transform and enrich the data for increased business efficiency and insights.
Hope you can join us and ask your questions to the experts.
Monday Feb 16, 2015
By Madhu Nair on Feb 16, 2015
This is the second of our Data Governance Series. Read the first part here.
The Four Pillars of Data Governance
Our Data Governance Commandments are simple principles that can help your organization get its data story straight, and get more value from customer, performance or employee data.
Data governance is a wide-reaching discipline, but as in all walks of life, there are a handful of essential elements you need in place before you can start really enjoying the benefits of a good data governance strategy. These are the four key pillars of data governance:
Data is like any other asset your business has: It needs to be properly managed and maintained to ensure it continues delivering the best results.
Enter the data steward; a role dedicated to managing, curating and monitoring the flow of data through your organization. This can be a dedicated individual managing data full-time, or just a role appended to an existing employee’s tasks.
But do you really need one? If you take your data seriously, then someone should certainly be taking on this role; even if they only do it part-time.
So what are these data stewards doing with your data exactly? That’s for you to decide, and it’s the quantity and quality of these processes that will determine just how successful your data governance program is.
Whatever cleansing, cleaning and data management processes you undertake, you need to make sure they’re linked to your organization’s key metrics. Data accuracy, accessibility, consistency and completeness all make fine starting metrics, but you should add to these based on your strategic goals.
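As a deliberately simplified illustration of turning one such metric into something measurable, completeness of a single attribute could be profiled with a query like the following (the customers table and email column are hypothetical):
- -- Illustrative sketch: completeness of the email attribute as a percentage
- -- of populated rows in a hypothetical customers table.
- SELECT ROUND(100 * COUNT(email) / COUNT(*), 1) AS email_completeness_pct
-   FROM customers;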
No matter how ordered your data is, it still needs somewhere to go, so you need to make sure your data warehouse is up to task, and is able to hold all your data in an organized fashion that complies with all your regulatory obligations.
But as data begins filling up your data warehouse, you’ll need to improve your level of data control and consider investing in a tool to better manage metadata: the data about other data. By managing metadata, you master the data itself, and can better anticipate data bottlenecks and discrepancies that could impact your data’s performance.
More importantly, metadata management allows you to better manage the flow of data—wherever it is going. You can manage and better control your data not just within the data warehouse or a business analytics tool, but across all systems, increasing transparency and minimizing security and compliance risks.
But even if you can control data across all your systems, you also need to ensure you have the analytics to put the data to use. Unless actionable insights are gleaned from your data, it’s just taking up space and gathering dust.
For your data governance to really deliver—and keep delivering—you need to follow best practices.
Stakeholders must be identified and held accountable, strategies must be in place to evolve your data workflows, and data KPIs must be measured and monitored. But that’s just the start. Data governance best practices are evolving rapidly, and only by keeping your finger on the pulse of the data industry can you prepare your governance strategy to succeed.