Monday Feb 16, 2015

The Data Governance Commandments

This is the second part of our Data Governance series. Read the first part here.

The Four Pillars of Data Governance

Our Data Governance Commandments are simple principles that can help your organization get its data story straight, and get more value from customer, performance or employee data.

Data governance is a wide-reaching discipline, but as in most disciplines, there are a handful of essential elements you need in place before you can really start enjoying the benefits of a good data governance strategy. These are the four key pillars of data governance:


Data is like any other asset your business has: It needs to be properly managed and maintained to ensure it continues delivering the best results.

Enter the data steward: a role dedicated to managing, curating and monitoring the flow of data through your organization. This can be a dedicated individual managing data full-time, or a responsibility appended to an existing employee’s tasks.

But do you really need one? If you take your data seriously, then someone should certainly be taking on this role, even if they only do it part-time.


So what are these data stewards doing with your data exactly? That’s for you to decide, and it’s the quantity and quality of these processes that will determine just how successful your data governance program is.

Whatever cleansing, cleaning and data management processes you undertake, you need to make sure they’re linked to your organization’s key metrics. Data accuracy, accessibility, consistency and completeness all make fine starting metrics, but you should add to these based on your strategic goals.


No matter how ordered your data is, it still needs somewhere to go, so you need to make sure your data warehouse is up to the task and able to hold all your data in an organized fashion that complies with all your regulatory obligations.

But as data begins filling up your data warehouse, you’ll need to improve your level of data control and consider investing in a tool to better manage metadata: the data about other data. By managing metadata, you master the data itself, and can better anticipate data bottlenecks and discrepancies that could impact your data’s performance.

More importantly, metadata management allows you to better manage the flow of data—wherever it is going. You can manage and better control your data not just within the data warehouse or a business analytics tool, but across all systems, increasing transparency and minimizing security and compliance risks.

But even if you can control data across all your systems, you also need to ensure you have the analytics to put the data to use. Unless actionable insights are gleaned from your data, it’s just taking up space and gathering dust.

Best Practices

For your data governance to really deliver—and keep delivering—you need to follow best practices.

Stakeholders must be identified and held accountable, strategies must be in place to evolve your data workflows, and data KPIs must be measured and monitored. But that’s just the start. Data governance best practices are evolving rapidly, and only by keeping your finger on the pulse of the data industry can you prepare your governance strategy to succeed.

How Many Have You Got?

These four pillars are essential to holding up a great data governance strategy, and if you’re missing even one of them, you’re severely limiting the value and reliability of your data.

If you’re struggling to get all the pillars in place, you might want to read our short guide to data governance success.

Tuesday Feb 10, 2015

The Data Governance Commandments: Ignoring Your Data Challenges is Not an Option

This is the first part of our Data Governance blog series. Read the next part here.

Our Data Governance Commandments are simple principles that can help your organization get its data story straight, and get more value from customer, performance or employee data.

All businesses are data businesses in the modern world, and if you’re collecting any information on employees, performance, operations, or your customers, your organization is swimming in data by now. Whether you’re using it, or just sitting on it, that data is there and it is most definitely your responsibility.

Even if you lock it in a vault and bury your head in the sand, that data will still be there, and it will still be:

  • Subject to changeable regulations and legislation
  • An appealing target for cybercriminals
  • An opportunity that you’re missing out on

Those are already three very good reasons to start working on your data strategy. But let’s break it down a bit more.


Few things stand still in the world of business, but regulations in particular can move lightning-fast.

If your data is sitting in a data warehouse you built a few years ago, it could now be stored in an insecure format, catalogued incorrectly, and in violation of new regulations you haven’t taken into account.

You may be ignoring the data, but regulatory bodies aren’t—and you don’t want to find yourself knee-deep in fines.


Your network is like a big wall around your business. Cybercriminals only need to find one crack in the brickwork, and they’ll come flooding in.

Sure, you’ve kept firewalls, anti-virus software and your critical servers up to date, but what about that old data warehouse? How’s that looking?

If you’ve taken your eye off your DW for even a second, you’re putting all that data at risk. And if the cybercriminals establish a backdoor through the DW into the rest of the organization, who knows how far the damage could spread?

If all you lose following such a data breach is consumer trust and some business, consider yourself lucky. The impact could be far worse for an organization that ignores its data security issues.


Even without the dangers of data neglect, ignoring your data means you’re ignoring fantastic business opportunities. The data you’re ignoring could be helping your business:

  • Better target marketing and sales activities
  • Make more informed business decisions
  • Get more from key business applications
  • Improve process efficiency

Can you afford to ignore all of these benefits, and risk the security and compliance of your data?

Thankfully, there are plenty of ways you can start tightening up your data strategy right away.

Check out our short guide to data governance, and discover the three principles you need to follow to take control of your data.

Thursday Jan 29, 2015

Oracle GoldenGate 12c for Oracle Database - Integrated Capture sharing capture session

Oracle GoldenGate for Oracle Database has introduced several new features in its latest release. In this blog post I would like to explain one of the more interesting ones: “Integrated Capture (a.k.a. Integrated Extract) sharing capture session”. This feature makes the creation of additional Integrated Extracts faster by leveraging existing LogMiner dictionaries. Since Integrated Extract requires registering the Extract with the database, let’s first look at what ‘registering the Extract’ means.


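The registration is done from GGSCI after connecting to the database with DBLOGIN. As an illustrative sketch only (the Extract group name ext1 is an assumed placeholder, not a value from this post):

GGSCI> REGISTER EXTRACT ext1 DATABASE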
The above command registers the Extract process with the database for what is called “Integrated Capture Mode”. In this mode the Extract interacts directly with the database LogMining server to receive data changes in the form of logical change records (LCRs).

When you created an Integrated Extract prior to this release of Oracle GoldenGate, you might have seen a delay in registering the Extract with the database. This is mainly because the creation of an Integrated Extract involves dumping the dictionary and then processing that dictionary to populate the LogMiner tables for each session, which adds overhead to online systems and therefore extra startup time. The same process was followed each time you created an additional Integrated Extract.

What if you could use the existing LogMiner dictionaries to create the additional Integrated Extract? That is exactly what has been done in this release. Creation of additional Integrated Extracts can be made significantly faster by leveraging existing LogMiner dictionaries that have already been mined. A separate copy of the LogMiner dictionary no longer has to be dumped for each Integrated Extract. As a result, additional Integrated Extracts are created much faster, and the significant database overhead caused by dumping and processing the dictionary is avoided.

In order to use the feature, you need an Oracle Database version and an Oracle GoldenGate for Oracle release that support it (see the certification matrix for the exact minimum versions). The feature is currently supported for non-CDB databases only.

Command Syntax:



{SHARE [AUTOMATIC | extract | NONE]}

There are three options to choose from; NONE is the default if you don’t specify anything.

The AUTOMATIC option clones/shares the LogMiner dictionary from the closest existing capture. If no suitable clone candidate is found, a new LogMiner dictionary is created.

The extract option clones/shares from the capture session associated with the specified Extract. If this is not possible, an error occurs and the registration does not complete.

The NONE option does not clone from an existing capture session; a new LogMiner dictionary build is performed instead. This is the default.

When you combine the SHARE option with an SCN, the specified SCN must be greater than or equal to the First SCN of at least one existing capture, and it must be less than the current SCN.
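For instance, a registration that combines an SCN with dictionary sharing might look like the following sketch (the group name and SCN value are illustrative and mirror the scenarios below):

GGSCI> REGISTER EXTRACT ext7 DATABASE SCN 61000 SHARE AUTOMATIC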

Let’s look at a few behaviors prior to this release and with the different SHARE options. ‘Current SCN’ indicates the current SCN value at the time the REGISTER EXTRACT command was executed in the following example scenarios.

(The accompanying table tracks, for each capture session: Capture Name, LogMiner ID, First SCN, Start SCN, and LogMiner Dictionary ID (LM-DID). Its values are referenced in the scenarios below.)
Behavior Prior to This Release – No Sharing

Register extract EXT1 with database (current SCN: 60000)

Register extract EXT2 with database (current SCN: 65000)

Register extract EXT3 with database SCN 60000 (current SCN: 65555)

Register extract EXT4 with database SCN 61000 → Error!

Registration of Integrated Extracts EXT1, EXT2 and EXT3 succeeded, whereas EXT4 fails because the LogMiner server does not exist at SCN 61000.

Also note that each Integrated Extract (EXT1–EXT3) created its dictionary separately (their LogMiner Dictionary IDs are different; from now on I’ll call them LM-DIDs).

New behavior with different SHARE options

  • Register extract EXT4 with database SHARE AUTOMATIC (current SCN: 66000)

EXT4 automatically chose the capture session EXT2, as its Start SCN of 65000 is nearest to the current SCN 66000. Hence the EXT4 and EXT2 capture sessions share the same LM-DID 2.

  • Register extract EXT5 with database SHARE EXT1 (current SCN: 68000)

EXT5 shares the capture session of EXT1. Since EXT1 is up and running, no error is raised. LM-DID 1 is shared across the EXT5 and EXT1 capture sessions.

  • Register extract EXT6 with database SHARE NONE (current SCN: 70000)

EXT6 is registered with the SHARE NONE option; hence a new LogMiner dictionary is created (dumped). See the LM-DID column for EXT6 in the table above: it contains LM-DID value 4.

  • Register extract EXT7 with database SCN 61000 SHARE NONE (current SCN: 70000)

This generates an error similar to EXT4 @SCN 61000. The LogMiner server doesn’t exist at SCN 61000, and since the SHARE option is NONE, the existing LogMiner dictionaries are not shared either. This is the same behavior as prior to this release.

  • Register extract EXT7 with database SCN 61000 SHARE AUTOMATIC (current SCN: 72000)

EXT7 shares the capture session of EXT1, as it is the closest for SCN 61000. Notice that the EXT7 @SCN 61000 scenario now succeeds with the SHARE AUTOMATIC option, which was not the case earlier (EXT4 @SCN 61000).

  • Register extract EXT8 with database SCN 68000 SHARE EXT2 (current SCN: 76000)

The EXT8 Extract shares the EXT2 capture session; hence the LogMiner dictionary is shared between EXT8 and EXT2.

This feature not only provides faster startup for additional Integrated Extracts, but also resolves a few scenarios that weren’t possible earlier. If you are using this feature and have questions or comments, please let me know by leaving a comment below. I’ll reply to you as soon as possible.

Thursday Jan 22, 2015

OTN Virtual Technology Summit Data Integration Subtrack Features Big Data Integration and Governance

I am sure many of you have heard about the quarterly Oracle Technology Network (OTN) Virtual Technology Summits. It provides a hands-on learning experience on the latest offerings from Oracle by bringing experts from our community and product management team. 

The next OTN Virtual Technology Summit is scheduled for February 11th (9am-12:30pm PT) and will feature Oracle's big data integration and metadata management capabilities with hands-on-lab content.

The Data Integration and Data Warehousing sub-track includes the following sessions and speakers:

Feb 11th 9:30am PT -- HOL: Real-Time Data Replication to Hadoop using GoldenGate 12c Adaptors

Oracle GoldenGate 12c is well known for its highly performant data replication between relational databases. With the GoldenGate Adaptors, the tool can now apply the source transactions to a Big Data target, such as HDFS. In this session, we'll explore the different options for utilizing Oracle GoldenGate 12c to perform real-time data replication from a relational source database into HDFS. The GoldenGate Adaptors will be used to load movie data from the source to HDFS for use by Hive. Next, we'll take the demo a step further and publish the source transactions to a Flume agent, allowing Flume to handle the final load into the targets.

Speaker: Michael Rainey, Oracle ACE, Principal Consultant, Rittman Mead

Feb 11th 10:30am PT -- HOL: Bringing Oracle Big Data SQL to Oracle Data Integration 12c Mappings

Oracle Big Data SQL extends Oracle SQL and Oracle Exadata SmartScan technology to Hadoop, giving developers the ability to execute Oracle SQL transformations against Apache Hive tables and extending the Oracle Database data dictionary to the Hive metastore. In this session we'll look at how Oracle Big Data SQL can be used to create ODI12c mappings against both Oracle Database and Hive tables, to combine customer data held in Oracle tables with incoming purchase activities stored on a Hadoop cluster. We'll look at the new transformation capabilities this gives you over Hadoop data, and how you can use ODI12c's Sqoop integration to copy the combined dataset back into the Hadoop environment.

Speaker: Mark Rittman, Oracle ACE Director, CTO and Co-Founder, Rittman Mead

Feb 11th 11:30am PT-- An Introduction to Oracle Enterprise Metadata Manager
This session takes a deep technical dive into the recently released Oracle Enterprise Metadata Manager. You’ll see the standard features of data lineage, impact analysis and version management applied across a myriad of Oracle and non-Oracle technologies into a consistent metadata whole, including Oracle Database, Oracle Data Integrator, Oracle Business Intelligence and Hadoop. This session will examine the Oracle Enterprise Metadata Manager "bridge" architecture and how it is similar to the ODI knowledge module. You will learn how to harvest individual sources of metadata, such as OBIEE, ODI, the Oracle Database and Hadoop, and you will learn how to create OEMM configurations that contain multiple metadata stores as a single coherent metadata strategy.

Speaker: Stewart Bryson, Oracle ACE Director, Owner and Co-founder, Red Pill Analytics

I invite you to register now for this free event and enjoy this feast for big data integration and governance enthusiasts.

Americas -- February 11th/ 9am to 12:30pm PT- Register Now

Please note that the same OTN Virtual Technology Summit content will be presented again for EMEA and APAC. You can register via the links below.

EMEA – February 25th / 9am to 12:30pm GMT* - Register Now

APAC – March 4th / 9:30am-1:00pm IST* - Register Now

Join us and let us know how you like the data integration sessions in this quarter's OTN event.

Monday Jan 12, 2015

ODI 12c - Eclipse and Updated Mapping Builder Example

The 12c ODI release offers a full SDK, which opens up great opportunities for automation. You can reduce development times and work smarter. Here I will show how to use Eclipse to configure and set up the SDK to start working smarter. In this example I will use an updated version of the Mapping Builder, a sample SDK illustration I blogged about previously. The builder has been updated to include auto mapping of attributes, multiple targets, loading modes and more. The updated code used in this example is here

There is a viewlet illustrating the use of Eclipse and the ODI SDK here. The viewlet walks through the entire configuration of an Eclipse project using the 12.1.3 release of ODI. 

The important areas concern the libraries you need to define for the project. Below you can see the libraries used from the ODI installation; I have installed my software in C:\Oracle\ODI_1213 in these examples.

Once the libraries are configured you can import the mapping builder java source code...

Then you must define how the code is executed by specifying the parameters as arguments. This is of course just one example for illustration; you can define how you would like this to be done in whatever way you like. You specify all of the mapping builder parameters here. I also pass the control file as a parameter rather than via stdin (which is what I did in previous blog posts).

I mentioned that the mapping builder code now supports auto mapping, multiple targets and loading modes; you can see an example driver file below. The line starting with >mapping has two qualifiers; below I have used EQUALS and IGNORECASE (an illustrative sketch follows these lists). The first qualifier supports the following values:

  • EQUALS - supports matching EMPNO with EMPNO (or empno with EMPNO when IGNORECASE used for example)
  • SRCENDSWITH - supports matching source XEMPNO with EMPNO (plus case option as above)
  • TGTENDSWITH - supports matching source EMPNO with XEMPNO (plus case option as above)

The second qualifier supports:

  • MATCH - exact match, matches EMPNO with EMPNO or empno with empno
  • IGNORECASE - supports mismatched case matching EMPNO matches empno
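As a purely hypothetical sketch (the exact driver-file syntax is defined by the mapping builder sample itself, so treat this as illustrative only), an auto-mapping line combining the two qualifiers described above might look like:

>mapping EQUALS IGNORECASE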

As well as auto mapping, you can also specify column-level mappings explicitly, as was previously supported; the viewlet has an example. 

You can also see that the control file has a target_load directive, which can be given a condition (or else) in the final field; this lets you load multiple conditional targets. This is a new addition to the existing target directive. So the example now has three directives:

  • target - load via insert/append the data
  • targetinc - incremental update / merge the data
  • target_load - insert multiple targets 

In the example above, the following mapping is created; you can see the multiple conditional targets, and the data is all mapped from a simple driver file:

Using Eclipse you can quickly build and debug your utilities and applications using the SDK and get the benefits of using an IDE such as code insight and auto completion.

Check out the viewlet and the mapping builder example, plus the blogs on the SDK, such as the recent one on the SDK overview, and work smarter. Also, the updated code used in this example is here; get an overview of the mapping builder here.

Friday Jan 09, 2015

ODI 12c - Mapping SDK Overview

In this post I'll show some of the high-level concepts in the physical design and the SDKs that go with it. To do this I'll also cover some of the logical design area so that it all makes sense. The conceptual model for logical mapping in ODI 12c is shown below (it's quite a change from 11g); the model allows us to build arbitrary flows. You can find each entity below in the 12c SDK. Many of these have specialized classes - for example, MapComponent has specializations for the many mapping components available from the designer - and these classes may have specific business logic or specialized behavior. You can use the strongly typed, highly specialized classes like DatastoreComponent, or you can write applications in a generic manner using the conceptual high-level SDK - this is the technique I used in the mapping builder here.

The heart of the SDK for this area of the model can be found here;

If you need to see these types in action, take the mapping illustration below as an example; I have annotated the different items within the mapper. The connector points are shown in the property inspector; they are not shown in the graphical design. Some components have many input or output connector points (for example, the set component has many input connector points). Some components are simple expression-based components (such as join and filter); we call these selector components. Other components project a specific shape; we call those projectors - that's just how we classify them. 

In 12c we clearly separated the physical design from the logical; in 11g much of this was blended together. In separating them we also allow many physical designs for one logical mapping design. We also had to change the physical SDK and model so that we could support multiple targets and arbitrary flows. 11g was fairly rigid - if you look at the 'limitations' sections of the KMs you can see some of that. KMs are assigned on map physical nodes in the physical design, and there are some helper methods on the execution unit so you can set/get KMs.

The heart of the SDK for this logical mapping area of the model can be found here;

If we use the logical mapping shown earlier and look at the physical design we have for it, we can annotate the items below so you can envisage how each of the classes above is used in the design;

The MapPhysicalDesign class has all of the physical related information such as the ODI Optimization Context and Default Staging Location (there also exists a Staging Location Hint on the logical mapping design) - these are items that existed in ODI 11g and are carried forward.

To take an example, if I want to change the LKMs or IKMs set on all physical designs, one approach would be to iterate through all of the nodes in a physical design and check whether an LKM or an IKM is assigned to each node - this then lets you do all sorts of things, from getting the current setting to setting a new value. The snippet below gives a small illustration, using Groovy, of the methods from the ODI SDK:

PhysicalDesignList = map.getPhysicalDesigns()
for (pd in PhysicalDesignList) {
    PhysicalNodesList = pd.getPhysicalNodes()
    for (pn in PhysicalNodesList) {
        if (pn.isLKMNode()) {
            CurrentLKMName = pn.getLKMName()
            // ... inspect or act on the current LKM here ...
            pn.setLKM(myLKM)
        } else if (pn.isIKMNode()) {
            CurrentIKMName = pn.getIKMName()
            // ... inspect or act on the current IKM here ...
            pn.setIKM(myIKM)
        }
    }
}

There are many other methods within the SDK to do all sorts of useful stuff. The first example is the getAllAPNodes method on a MapPhysicalDesign: this returns all of the nodes in a design that will have LKMs assigned, so you can quickly set or check them. The second example is the getTargetNodes method on MapPhysicalDesign: this is handy for getting all target nodes to set IKMs on. A final example is finding the AP node in the physical design for a logical component in your design - use the findNode method to achieve this.
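Here is a minimal Groovy sketch of the getTargetNodes idea, assuming you already hold a MapPhysicalDesign object pd and an IKM object myIKM; the getName() call is an assumption used only for logging:

for (tn in pd.getTargetNodes()) {
    // report the current IKM, then override it on each target node
    println "Target node: " + tn.getName() + ", current IKM: " + tn.getIKMName()
    tn.setIKM(myIKM)
}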

Hopefully there are some useful pointers here. It is also worth being aware of the ODI blog post on the Mapping SDK ins and outs, which provides an overview of and cross-reference between the primary ODI objects and the underpinning SDKs. If there are any other specific questions, let us know.

Monday Dec 29, 2014

Oracle Data Enrichment Cloud Service (ODECS) - Coming Soon

What are your plans around Big Data and Cloud?

If your organization has already begun to explore these topics, you might be interested in a new offering from Oracle that will dramatically simplify how you use your data in Hadoop and the Cloud:

Oracle Data Enrichment Cloud Service (ODECS)

There is a perception that most of the time spent in Big Data projects is dedicated to harvesting value. The reality is that 90% of the time in Big Data projects is really spent on data preparation. Data may be structured, but more often it will be semi-structured, such as weblogs, or fully unstructured, such as free-form text. The content is vast, inconsistent, incomplete, often off topic, and drawn from multiple differing formats and sources. In this environment each new dataset takes weeks or months of effort to process, frequently requiring programmers to write custom scripts. Minimizing data preparation time is the key to unlocking the potential of Big Data.

Oracle Data Enrichment Cloud Service (ODECS) addresses this very reality. ODECS is a non-technical, web-based tool that sets out to minimize data preparation time in an effort to quickly unlock the potential of your data. The ODECS tool provides an interactive set of services that automate, streamline, and guide the process of data ingestion, preparation, enrichment, and governance without costly manual intervention.

The technology behind this service is amazing; it intuitively guides the user with a machine learning driven recommendation engine based on semantic data classification and natural language processing algorithms. But the best part is that non-technical staff can use this tool as easily as they use Excel, resulting in a significant cost advantage for data intensive projects by reducing the amount of time and resources required to ingest and prepare new datasets for downstream IT processes.

Curious to find out more? We invite you to view a short demonstration of ODECS below:

Let us know what you think!

Stay tuned as we write more about this offering…

Wednesday Dec 17, 2014

Oracle Partition Exchange Blog from the ODI A-Team

More great information from the ODI A-Team!

Check out the A-Team’s most recent blog about Oracle Partition Exchange – it comes in two parts:

Using Oracle Partition Exchange with ODI

Configuring ODI with Oracle Partition Exchange

The knowledge module is on Java.Net, and it is called “IKM Oracle Partition Exchange Load”.  To search for it, enter “PEL” or “exchange” in the Search option of Java.Net.

A sample ODI 12.1.3 repository is available as well. The sample repository includes great examples of how to perform both initial and incremental data upload operations with Oracle Partition Exchange, and it will help users understand how to use Oracle Partition Exchange with ODI.

Happy reading!

Thursday Dec 11, 2014

Recap of Oracle GoldenGate 12c for the Enterprise and the Cloud webcast

Last week I hosted a webcast on Oracle GoldenGate 12c's latest features and its solutions for cloud environments. For those of you who missed it, I wanted to give a quick recap and remind you that you can watch it on demand via the following link:

Oracle GoldenGate 12c for the Enterprise and the Cloud

In this webcast my colleague Chai Pydimukkala, senior director of product management for Oracle GoldenGate, and I talked about some of the key challenges in cloud deployments and how Oracle GoldenGate addresses them. We discussed examples of cloud-specific data integration use cases, such as synchronizing data between on-premises systems and Oracle Cloud or Amazon Cloud environments. We also discussed zero-downtime consolidation to the cloud using Oracle GoldenGate.

In the webcast, Chai also presented the latest features of Oracle GoldenGate 12.1.x, including:

  • New database support, including Informix, SQL Server 2014, MySQL Community Edition
  • Real-time data integration between on-premises and cloud with SOCKS5 compliance
  • New features in Oracle GoldenGate Veridata especially the new data repair capabilities
  • Enhancements to Integrated Delivery, and support for capturing data from Active Data Guard standby system
  • The new migration utility to help with the move from Oracle Streams to Oracle GoldenGate.
As with previous GoldenGate webcasts we had a very interactive Q&A session where we received tons of questions. We tried to answer as many as possible in the available time but could not get to all of them. Below are some of the commonly asked questions we received during the webcast, with brief answers:

Question: Does GoldenGate replace ODI? When shall we use an ETL tool vs GoldenGate?

Answer: GoldenGate is designed for real-time change data capture, routing, and delivery, and it performs basic, row-level transformations. For complex transformation requirements you still need ETL/E-LT solutions. Our customers augment their existing ETL/E-LT solutions by adding GoldenGate for real-time, low-impact change data capture and delivery. GoldenGate can deliver data for ETL in flat-file format, feed staging tables, or publish JMS messages. Oracle Data Integrator's E-LT architecture creates a perfect combination: GoldenGate captures changed data non-intrusively with low impact and delivers it to staging tables in the target with sub-second latency. With the ability to perform transformations within the target (or source) database, ODI takes this change data, performs transformations in micro-batches and loads user tables with high performance. Because of this natural and strategic fit between the products, we have tightly integrated ODI and GoldenGate. To learn more about how GoldenGate and ODI are integrated and work together, please watch this on-demand webcast. I also recommend reading the following white paper on real-time data warehousing best practices.

Here you can see a demo of ODI, and for customer examples you can watch the Paychex, RBS, and Raymond James videos.

Question: Is there a plan to sell GoldenGate as a service soon?

Answer: Yes, it is in the plans. We are working with the Oracle Cloud team. But we are not able to give a timeline.

Question: Are Integrated Capture and Delivery only available for Oracle Database or this can be used for non-Oracle databases?

Answer: Integrated Capture and Delivery are only available for Oracle Database and truly differentiate Oracle GoldenGate from other data integration and replication vendors. For all supported databases we offer Coordinated Delivery, which also simplifies configuration significantly and works with non-Oracle databases. You can read more about Coordinated Delivery in a related blog, via the Oracle GoldenGate 12c Release 1 New Features Overview white paper, or in the documentation.

Question: Is GoldenGate available for download for trial?

Answer: Yes, you can download GoldenGate on OTN for education and development purposes. For the big data use case, you can use the Big Data Lite virtual environment to experiment with Oracle GoldenGate. 

Question: Does GoldenGate replace Active Data Guard? 

Answer: No. The products are complementary. Data Guard is a physical replication solution designed for Oracle Database disaster recovery, and it offers that with great simplicity and performance. Oracle GoldenGate offers logical/transactional data replication, which supplements Active Data Guard by eliminating downtime during planned outages (migration, consolidation, maintenance) and enabling active-active data center synchronization for maximum availability. The license for Oracle GoldenGate for Oracle Database also includes Active Data Guard. As mentioned in the webcast, GoldenGate 12c can now capture data from an Active Data Guard standby system too.

Question: Does Oracle GoldenGate Veridata repair a subset of data instead of doing a full sync? For example, I want to repair only missed deletes.

Answer: Yes. Oracle GoldenGate Veridata can do granular repair of out-of-sync records. Please see the Oracle GoldenGate Veridata data sheet for more info.

Question: How do we use Enterprise Manager for GoldenGate? 

Answer: The Oracle Management Pack for Oracle GoldenGate license includes an Enterprise Manager plug-in that allows you to use your Oracle Enterprise Manager solution to monitor and manage Oracle GoldenGate deployments. 

If you did not attend the webcast live, I highly recommend watching Oracle GoldenGate 12c for the Enterprise and the Cloud on demand and listening to the long Q&A session with Chai. During the webcast we covered many other frequently asked questions.

Wednesday Dec 10, 2014

Oracle Enterprise Metadata Management is now available!

As a quick refresher, metadata management is essential for solving a wide variety of critical business and technical challenges: how report figures are calculated, understanding the impact of changes to data upstream, providing reports in a business-friendly way in the browser, and providing reporting capabilities on the entire metadata of an enterprise for analysis and improvement. Oracle Enterprise Metadata Management is built to solve all these pressing needs for customers in a lightweight, browser-based interface. Today we announce the availability of Oracle Enterprise Metadata Management as we continue to enhance this offering.

With Oracle Enterprise Metadata Management, you will find business glossary updates, user interface updates for a better experience, as well as improved and new metadata harvesting bridges including Oracle SQL Developer Data Modeler, Microsoft SQL Server Integration Services, SAP Sybase PowerDesigner, Tableau and more. There are also new dedicated web pages for tracing data lineage and impact! At a more granular level you will also find new customizable action menus per repository object type for more personalization. For a full read on new features, please read here. Additionally, view here for the certification matrix details.

Download Oracle Enterprise Metadata Management!

Tuesday Dec 09, 2014

Big Data Governance– Balancing Big Risks and Bigger Profits

To me, nothing exemplifies the real value that Big Data brings to life better than the role it played in the last edition of the FIFA World Cup. Stephen Hawking predicted that England’s chance of winning a game drops by 60 percent every time the temperature increases by 5ºC. Moreover, he found that England plays better in stadiums situated 500 meters above sea level, and performs better if the games kick off later than 3PM local time. In short, England’s soccer team struggles to cope with the conditions in hot and humid countries.

We have all heard, meditated and opined about the value of Big Data, the panacea for all problems. And it is true: Big Data has started delivering real profits and wins to businesses. But as with any data management program, the profit benefits should be maximized while striving to minimize potential risks and costs.

Customer Data is Especially Combustible Data

The biggest lift for businesses using Big Data comes from mining customer data. By storing and analyzing seemingly disparate customer attributes and running analytic models over the whole data set (data sampling is dying a painful demise), businesses are able to accurately predict buying patterns and customer preferences, and to create products and services that cater to today’s demanding consumers. But this veritable mine of customer information is combustible. By that I mean that a small leak is enough to undo any benefits hitherto extracted, through ensuing blowbacks like financial restitution, regulatory constrictions and, most important of all, immense reputational damage. This is why Big Data should always be well governed. Data governance is an aspect of data security that helps safeguard Big Data in business enterprises.

Big Data Governance

Big Data governance is but a part (albeit a very important part) of a larger Big Data security strategy. Big Data security should involve considerations around the efficient and economic storage, retrieval, and consumption of data. It should also deal with backups, disaster recovery and other traditional considerations.

When properly implemented, a good governance program serves as a crystal ball into the data flow within the organization. It will answer questions about how safe the data is and who can and should be able to lay their hands on it, and it will proactively prevent data leakage and misuse. Because when dealing with big reservoirs of data, small leakages can go unnoticed. 

Thursday Nov 20, 2014

Let Oracle GoldenGate 12c Take You to the Cloud

If your organization is among the ~80% of the global business community, you are most likely working on a cloud computing strategy, or actively implementing one. Cloud computing is growing at five times the rate of overall IT because of the clear and already proven cost savings, agility, and scalability benefits of cloud architectures.

When organizations decide to embark on their cloud journey, they notice there are several questions and challenges to be addressed, involving data accessibility, security, availability, system management, performance, and more. Oracle GoldenGate's real-time data integration and bi-directional transactional replication technology addresses critical challenges such as:

  • How to move my systems to the cloud without interrupting operations?
  • How to enable timely data synchronization between the systems on the cloud and on-premises to ensure access to consistent data for all end users?
  • How do I run operational reports with the data I have in cloud environments, or feed my analytical systems in cloud solutions?
  • In managed or private clouds, how do I keep the cloud platform highly available when I need to do maintenance or upgrades?

On Tuesday, December 2nd we will tackle these questions in a free webcast:

Live Webcast: Oracle GoldenGate 12c for the Enterprise and the Cloud

Tuesday, December 2nd, 2014 10am PT/ 1pm ET 

In this webcast, you will not only hear about Oracle GoldenGate's strong solutions for cloud environments, but also the latest features that strengthen its offering. The new features we will discuss include:

  • Support for Informix, SQL Server 2014, MySQL Community Edition, and big data environments
  • Real-time data integration between on premises and cloud with SOCKS5 compliance
  • New data repair functionality to help ensure database consistency across heterogeneous systems
  • Moving from Oracle Streams to GoldenGate with the new migration utility

I would like to invite you to join me and my colleague Chai Pydimukkala, Senior Director of Product Management for Oracle GoldenGate, in this session to learn the latest on GoldenGate 12c and ask your questions in a live Q&A.

Hope to see you there!

Tuesday Nov 18, 2014

Oracle GoldenGate for Informix is Released

Oracle GoldenGate for Informix is available on OTN and Oracle eDelivery. This is a new addition to Oracle GoldenGate's heterogeneous database support.

Wednesday Nov 12, 2014

ODI 12c - Spark SQL and Hive?

In this post I'll cover some new capabilities in the Apache Spark 1.1 release and show what they mean for ODI today. There's a nice slide, shown below, from the Databricks training for Spark SQL that pitches some of the Spark SQL capabilities now available. As well as programmatic access via Python, Scala and Java, the Hive QL compatibility within Spark SQL is particularly interesting for ODI... today. The Spark 1.1 release supports a subset of the Hive QL features, which in turn is a subset of ANSI SQL; there is already a lot there and it is only going to grow. The Hive engine today uses MapReduce, which is not fast; the Spark engine is fast and in-memory - you can read much more on that elsewhere.

Figure taken from from the Databricks training for Spark SQL, July 2014.

In the examples below I used the Oracle Big Data Lite VM; I downloaded the Spark 1.1 release and built it using Maven (I was on CDH 5.2). To use Spark SQL in ODI, we need to create a Hive data server - the Hive data server masquerades as many things: it can be used for Hive, for HCatalog or for Spark SQL. Below you can see my data server. Note the Hive port is 10001; by default 10000 is the Hive server port - we aren't using the Hive server to execute the query, here we are using the Spark SQL server. I will show later how I started the Spark SQL server on this port (the Apache Spark doc for this is here).

I started the server using the Spark standalone cluster that I configured using the following command from my Spark 1.1 installation;

./sbin/ --hiveconf hive.server2.thrift.bind.host bigdatalite --hiveconf hive.server2.thrift.port 10001 --master spark://

You can also specify local (for testing), YARN or other cluster information for the master. I could have just as easily started the server using YARN by specifying the master URI as something like --master yarn:// where 8032 is my YARN resource manager port. I ran on the 10001 port so that I can run both the Spark SQL and Hive engines in parallel whilst I do various tests. To reverse-engineer, I actually used the Hive engine to reverse-engineer the table definitions in ODI (I hit some problems using the Spark SQL reversing, so worked around it) and then changed the model to use my newly created Spark SQL data server above.

Then I built my mappings just like normal - and used the KMs in ODI for Hive just like normal. For example the mapping below aggregates movie ratings and then joins with movie reference data to load movie rating data - the mapping uses the datastores from a model obtained from the Hive metastore;

If you look at the physical design, the Hive KMs are assigned, but we will execute this through the Spark SQL engine rather than through Hive. The switch from engine to engine was handled in the URL within our Hive data server.
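For reference, a data server URL of roughly this shape points ODI at the Spark SQL Thrift server on port 10001 instead of HiveServer2 on 10000 (a sketch only; the host name bigdatalite is an assumption based on the Big Data Lite VM):

jdbc:hive2://bigdatalite:10001/default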

When the mapping is executed you can use the Spark monitoring API to check the status of the running application and Spark master/workers.

You can also monitor from the regular ODI Operator and ODI Console. Spark SQL support uses the Hive metastore for all the table definitions, whether the data is internally or externally managed. 

There are other blogs from tools showing how to access and use Spark SQL, such as the one here from Antoine Amend using SQL Developer. Antoine also has another very cool blog worth checking out, Processing GDELT Data Using Hadoop. In that post he shows a custom InputFormat class that produces records/columns. This is a very useful post for anyone wanting to see the Spark newAPIHadoopFile API in action. It has a pretty funky name, but it is a key piece (along with its related methods) of the framework.

// Read file from HDFS - use GdeltInputFormat (Text is
val input = sc.newAPIHadoopFile(
    "hdfs://path/to/gdelt",
    classOf[GdeltInputFormat],
    classOf[Text],
    classOf[Text]
)
Antoine also provides the source code to GdeltInputFormat so you can see the mechanics of his record reader, although the input data is delimited data (so could have been achieved in different ways) it's a useful resource to be aware of.

If you are looking at Spark SQL, this post was all about using Spark SQL via the JDBC route - there is another whole topic on transformations using the Spark framework alongside Spark SQL that is left for future discussion. You should also check out the Hive QL compatibility documentation here, to see what you can and can't do within Spark SQL today. Download the Big Data Lite VM and give it a try.

Monday Nov 10, 2014

Big Data Governance and Metadata Management - A Recap

On the 30th of November we held a webcast on governing Big Data. It was the second in our series on Big Data (if you missed the first, you can register for it here). We discussed the importance of bringing transparency to the Big Data Reservoir architecture and how to improve and enrich data within the reservoir using Oracle Enterprise Data Quality (OEDQ). Oracle also announced Oracle Enterprise Metadata Management (OEMM), a comprehensive metadata management tool built with a business-friendly, search-driven interface. 

Here is a quick recap of some of the questions that came through. 

Do these principles and technology of Metadata Management, Data Governance and Data Quality apply to Big Data as well as traditional Data?

All of these technologies are as applicable to Big Data as they are to traditional data warehousing. In fact, Oracle Enterprise Data Quality and Oracle Enterprise Metadata Management are designed to bridge these two worlds.

 Does Oracle Enterprise Metadata Management work with 3rd party metadata?

Yes. We recognize that to truly govern the data life cycle, Oracle Enterprise Metadata Management should be able to harvest metadata across multiple technologies and platforms, including Oracle and non-Oracle databases, business analytics tools, data warehouses and ETL engines.

Is Oracle Enterprise Metadata Management compatible with 11g?

Oracle Enterprise Metadata Management is compatible with many 11g products too.

Where can I get more information about Oracle’s Data Integration products?

The best resources for Oracle Data Integration products are:

The Oracle Data Integration Home Page,

The Oracle Data Integration Technology Network,

The Oracle Data Integration Blog

Also connect with us on Facebook and Twitter (#OEDQ, #OEMM, ORCLGoldenGate, ODI12c)

