Thursday Nov 12, 2015

Oracle Data Integrator 12.2.1 - Parallel Development using Subversion Branches

This is the second article in the series of three articles covering Oracle Data Integrator (ODI) 12.2.1 lifecycle management features. In the previous article, Managing Versions in Apache Subversion, I discussed ODI Studio features for creating, viewing, comparing and restoring versions in Apache Subversion. In this article I will cover the Branch and Tag management capabilities that are required for parallel development.

Parallel Development Using Subversion Branches

First, let’s take a look at the high-level setup of Subversion branches in ODI for parallel development. As shown in the diagram below, you can configure a Subversion branch or trunk with one master and work repository combination. In this example, the Subversion trunk is used by user set 1 as the main code line and contains all ODI objects. Branch 01 is configured for user set 2, who are working on a subset of ODI objects. User set 3 is working on a release branch, fixing bugs on the release code line. Each of these user sets works in parallel and leverages Subversion branches as the channel for merging changes across branches; a minimal sketch of the underlying Subversion operations follows.
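
ODI Studio drives branch and tag creation through its own UI, but underneath it relies on standard Subversion semantics, where a branch is simply a cheap copy inside the repository. As a minimal, illustrative sketch (the repository URL and branch names are hypothetical), the branches used by user sets 2 and 3 could be created with standard svn copy operations:

```python
# Illustrative only: ODI Studio performs the equivalent work through its UI.
# The repository URL and branch names below are hypothetical.
import subprocess

REPO = "https://svn.example.com/repos/odi"

def svn_copy(src, dst, message):
    """Create a cheap server-side copy, which is how Subversion branches."""
    subprocess.run(["svn", "copy", src, dst, "-m", message], check=True)

# Branch 01 for user set 2, who work on a subset of ODI objects
svn_copy(f"{REPO}/trunk", f"{REPO}/branches/branch01",
         "Create branch01 for user set 2")

# Release branch for user set 3, who fix bugs on the release code line
svn_copy(f"{REPO}/trunk", f"{REPO}/branches/release_1.0",
         "Create release branch for user set 3")
```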

Creating Tags

A Tag is a text name applied to the current versions of a set of objects to take a snapshot of their current state. It freezes the current state of the objects, which can be referred back to by the Tag name at some future point in time. You can create a Tag directly from ODI Studio, either for all the repository objects or for a selected set of objects.

ODI takes the following actions on the objects included in the Tag to ensure consistency.

  1. Adds objects to Subversion if they are not yet versioned.
  2. Creates versions for objects whose latest changes are not yet versioned.
  3. Automatically includes all the dependencies of the selected objects, so that all the relevant objects can be found in the Tag when you refer back to it.

Thus, creating a Tag is a good way of ensuring that all the relevant objects are under version control, and it is recommended to create Tags at logical points throughout your development cycle.

There are two types of Tags in ODI:

  • Full Tag – Created with all the objects in the repository. ODI first syncs the state of all repository objects to Subversion versions before applying the Tag.

  • Partial Tag – Created for a selected set of objects; the user drags and drops objects into the Partial Tag creation wizard. All the dependencies of the selected objects are automatically added to the Tag.

In the example, a mapping was added to the Partial Tag, and all the required model and topology objects were automatically added by ODI Studio.

Creating a Subversion Branch

You can create a Subversion branch from a Tag. The branch creation screen lists all the relevant Subversion Tags and their respective comments so you can easily locate a Tag for branch creation. The object versions in the selected Tag become the branch point, which is not necessarily the latest version of the objects; a Subversion-level sketch of this operation follows.
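
Conceptually, a branch created from a Tag is a Subversion copy of the tagged path into the branches area, so the new branch starts at the tagged object versions rather than at the head of the trunk. The sketch below is illustrative only, with a hypothetical repository URL and names; in practice ODI Studio drives this from the branch creation screen:

```python
# Illustrative only: branching from a Tag means the branch point is the set of
# tagged versions, not the latest trunk revision. URL and names are hypothetical.
import subprocess

REPO = "https://svn.example.com/repos/odi"
subprocess.run(
    ["svn", "copy",
     f"{REPO}/tags/REL_1.0_TAG",         # the Tag chosen as the branch point
     f"{REPO}/branches/rel_1.0_bugfix",  # the new bug-fix branch
     "-m", "Create bug-fix branch from Tag REL_1.0_TAG"],
    check=True,
)
```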

Populating ODI Repository from a Branch or Trunk Objects

Once you configure your ODI repository with a Subversion branch/trunk, you can populate it with the branch/trunk's objects through a couple of ODI Studio menu options.

  1. Populating repository – Used when you want to populate a fresh repository with the branch content, for example when you create a project branch and want to set up an ODI repository for that branch.
  2. Populating restored repository – Used when your current repository becomes corrupted and you restore an older repository backup. This option syncs the objects in the restored repository with the latest contents of the Subversion branch/trunk: it pulls all the objects from the branch/trunk and imports them in a special Import Mode that only inserts or updates the objects present in the repository, leaving any extra objects in the repository in place.

Branch Merge

Merging changes from one development branch to another is an essential need for any parallel development. ODI Studio allows you to merge changes from a Branch, Trunk or Tag into the repository it is connected to.

Merge Summary

The merge process automatically merges object changes that do not have any conflicts. The Merge Summary details all the objects affected by the merge. It also lists the objects that could not be automatically merged and whose conflicts must be manually resolved. The summary report can be saved to the file system in multiple formats so that it can be shared with the team.

Merge Results

The merge result window assists you in resolving conflicts from a branch merge. There are a couple of tabs available in this window.

Merge Object Selection Tab

It lists the objects affected by the merge and provides different search and filtering options. It allows the following operations on conflicted objects:

  1. Mark the conflict as resolved for the selected object.
  2. Open Merge Conflict Resolution window for the selected object.
  3. Assign ownership of the conflict resolution to a particular user responsible for resolving that object's conflicts. By default, the merge process assigns conflict ownership to the user who last modified the object.

Merge Conflict Resolution Tab

It assists you in manually resolving an object’s conflict by presenting the object properties side by side.


The branch management features introduced in ODI 12.2.1 are vital for organizing objects across branches and enable parallel development across functionally and geographically distributed teams.

Stay tuned for the third and final article on lifecycle management, which will cover the release management capabilities of the newly introduced Deployment Archives.

Wednesday Apr 15, 2015

Data Governance for Migration and Consolidation

By Martin Boyd, Senior Director of Product Management

How would you integrate millions of part, customer and supplier records from multiple acquisitions into a single JD Edwards instance?  This was the question facing National Oilwell Varco (NOV), a leading worldwide provider of components used in the oil and gas industry.  If they could not find an answer, many operating synergies would be lost, but they knew from experience that simply “moving and mapping” the data from the legacy systems into JDE was not sufficient, as the data was anything but standardized.

This was the problem described yesterday in a session at the Collaborate Conference in Las Vegas.  The presenters were Melissa Haught of NOV and Deepak Gupta of KPIT, their systems integrator. Together they walked through an excellent discussion of the problem and the solution they have developed:

The Problem:  It is first important to recognize that the data to be integrated from many and various legacy systems had been created over time with different standards by different people according to their different needs. Thus, saying it lacked standardization would be an understatement.  So how do you “govern” data that is so diverse?  How do you apply standards to it months or years after it has been created? 

The Solution:  The answer is that there is no single answer, and certainly no “magic button” that will solve the problem for you.  Instead, in the case of NOV, a small team of dedicated data stewards, or specialists, works to reverse-engineer a set of standards from the data at hand.  In the case of product data, which is usually the most complex, NOV found they could actually infer rules to recognize, parse, and extract information from ‘smart’ part numbers, even for part numbering schemes inherited from acquired companies.  Once these rules are created for an entity or a category, they are built into their Oracle Enterprise Data Quality (EDQ) platform; the data is then run through the DQ process and the results are examined.  Most often, problems surface that call for rule refinements, so rule refinement and data quality processing are repeated until the result is as good as it can be.  The result is never 100% standardized, clean data, though: some data is always flagged into a “data dump” for later manual remediation.
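
To make the idea of reverse-engineered rules concrete, here is a simplified sketch of the kind of parsing logic a ‘smart’ part number rule might capture. The numbering scheme, field names and lookup values are entirely hypothetical, and in NOV's solution such rules are built and executed inside EDQ rather than hand-coded:

```python
# Hypothetical illustration of a 'smart' part number rule. Real rules are
# configured in Oracle Enterprise Data Quality (EDQ); the PPP-CCC-NNNNN-R
# scheme below is invented purely for illustration.
import re

# Hypothetical scheme: 3-letter product family, 3-digit category code,
# 5-digit sequence number, optional single-letter revision.
SMART_PART = re.compile(
    r"^(?P<family>[A-Z]{3})-(?P<category>\d{3})-(?P<sequence>\d{5})(?:-(?P<revision>[A-Z]))?$"
)

FAMILY_NAMES = {"PMP": "Pump", "VLV": "Valve"}  # hypothetical lookup table

def parse_part_number(part_number: str) -> dict:
    """Recognize and extract attributes from a part number, or flag it."""
    match = SMART_PART.match(part_number.strip().upper())
    if not match:
        # Unparseable records are flagged for manual remediation.
        return {"part_number": part_number, "status": "needs_manual_review"}
    fields = match.groupdict()
    fields["family_name"] = FAMILY_NAMES.get(fields["family"], "Unknown")
    fields["status"] = "parsed"
    return fields

print(parse_part_number("PMP-010-00042-B"))  # parsed into its attributes
print(parse_part_number("legacy#42"))        # falls through to manual review
```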

Lessons Learned:

  • Although technology is a key enabler, it is not the whole solution. Dedicated specialists are required to build the rules and improve them through successive iterations
  • A ‘user friendly’ data quality platform is essential so that it is approachable and intuitive for the data specialists who are not (nor should they be) programmers
  • Rapid iteration through testing and rule development is important to keep up project momentum. In NOV's case, specialists request rule changes, which are implemented by KPIT resources in India; in effect, changes are made and re-run overnight, which has worked very well

Technical Architecture:  Data is extracted from the legacy systems by Oracle Data Integrator (ODI), which also transforms the data into the right ‘shape’ for review in EDQ.  An Audit Team reviews these results for completeness and correctness, comparing the supplied data against the required data standards.  A secondary check is also performed using EDQ, which verifies that the data is in a valid format to be loaded into JDE.
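
As a rough illustration of the kind of secondary format check described above (the field names and length limits are hypothetical, and in the actual solution this validation is configured in EDQ rather than hand-coded):

```python
# Hypothetical pre-load format check; in the NOV solution this is an EDQ
# process. Field names and length limits are invented for the example.
REQUIRED_FIELDS = {"item_number": 25, "description": 30, "unit_of_measure": 2}

def validate_for_load(record: dict) -> list:
    """Return a list of violations; an empty list means the record can load."""
    violations = []
    for field, max_len in REQUIRED_FIELDS.items():
        value = str(record.get(field, "")).strip()
        if not value:
            violations.append(f"{field} is missing")
        elif len(value) > max_len:
            violations.append(f"{field} exceeds {max_len} characters")
    return violations

sample = {"item_number": "PMP-010-00042-B",
          "description": "Centrifugal pump",
          "unit_of_measure": "EA"}
print(validate_for_load(sample))  # -> [] : valid for load
```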

The Benefit:  The benefit of having data that is “fit for purpose” in JDE is that NOV can mothball the legacy systems and use JDE as a complete and correct record for all kinds of purposes, from operational management to strategic sourcing.  The benefit of having a defined governance process is that it is repeatable.  This means that every time the process is run, the individuals and the governance team as a whole learn something from it and get better at executing it the next time around.  Because of this, NOV has already seen order-of-magnitude improvements in productivity as well as data quality, and is already looking for ways to expand the program into other areas.

All in all, Melissa and Deepak gave the audience great insight into how they are solving a complex integration problem and reminded us of what we should already know: "integrating" data is not simply moving it. To be of business value, the data must be 'fit for purpose', which often means that both the integration process and the data must be governed.

Thursday Feb 19, 2015

Hive, Pig, Spark - Choose your Big Data Language with Oracle Data Integrator

The strength of Oracle Data Integrator (ODI) has always been the separation of logical design and physical implementation. Users can define a logical transformation flow that maps any sources to targets without being concerned with the exact mechanisms used to realize such a job. In fact, ODI doesn’t have its own transformation engine but instead outsources all work to the native mechanisms of the underlying platforms, be it relational databases, data warehouse appliances, or Hadoop clusters.

In the case of Big Data this philosophy of ODI gains even more importance. New Hadoop projects are incubated and released on a constant basis and introduce exciting new capabilities; the combined brain trust of the big data community conceives new technology that outdoes any proprietary ETL engine. ODI’s ability to separate your design from the implementation enables you to pick the ideal environment for your use case; and if the Hadoop landscape evolves, it is easy to retool an existing mapping with a new physical implementation. This way you don’t have to tie yourself to one language that is hyped this year, but might be legacy in the next.

ODI generates executable code from the logical design through physical designs and Knowledge Modules. You can even define multiple physical designs for different languages based on the same logical design. For example, you could choose Hive as your transformation platform, and ODI would generate Hive SQL as the execution language. You could also pick Pig, and the generated code would be Pig Latin. If you choose Spark, ODI will generate PySpark code, which is Python with Spark APIs. Knowledge Modules orchestrate the generation of code for the different languages and can be further configured to optimize the execution of the different implementations, for example parallelism in Pig or in-memory caching in Spark.

The example below shows an ODI mapping that reads from a log file in HDFS, registered in HCatalog. The data is filtered, aggregated, and then joined with another table before being written into another HCatalog-based table. ODI can generate code for Hive, Pig, or Spark based on the Knowledge Modules chosen; an illustrative PySpark sketch of the Spark variant follows.
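
As a rough idea of what a Spark physical design produces, the PySpark sketch below performs the same filter, aggregate, and join steps. It is illustrative only, not actual ODI-generated code, and the table and column names are hypothetical:

```python
# Illustrative PySpark equivalent of the mapping described above; not actual
# ODI-generated code. Table and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hive support lets Spark read the HCatalog-registered tables.
spark = (SparkSession.builder
         .appName("web_log_mapping")
         .enableHiveSupport()
         .getOrCreate())

web_logs = spark.table("web_logs")    # log data in HDFS, registered in HCatalog
customers = spark.table("customers")  # table to join with

result = (
    web_logs
    .filter(F.col("status_code") == 200)              # filter
    .groupBy("customer_id")                           # aggregate
    .agg(F.count("*").alias("page_views"))
    .join(customers, on="customer_id", how="inner")   # join
)

# Write the result into another HCatalog-based table.
result.write.mode("overwrite").saveAsTable("customer_page_views")
```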

ODI provides developer productivity and can future-proof your investment by removing the need to manually code Hadoop transformations in a particular language. You can logically design your mapping and then choose the implementation that best suits your use case.

Monday Feb 16, 2015

The Data Governance Commandments

This is the second of our Data Governance Series. Read the first part here.

The Four Pillars of Data Governance

Our Data Governance Commandments are simple principles that can help your organization get its data story straight, and get more value from customer, performance or employee data.

Data governance is a wide-reaching discipline, but as in all walks of life, there are a handful of essential elements you need in place before you can start really enjoying the benefits of a good data governance strategy. These are the four key pillars of data governance:


Data is like any other asset your business has: It needs to be properly managed and maintained to ensure it continues delivering the best results.

Enter the data steward: a role dedicated to managing, curating and monitoring the flow of data through your organization. This can be a dedicated individual managing data full-time, or just a role appended to an existing employee’s tasks.

But do you really need one? If you take your data seriously, then someone should certainly be taking on this role; even if they only do it part-time.


So what are these data stewards doing with your data exactly? That’s for you to decide, and it’s the quantity and quality of these processes that will determine just how successful your data governance program is.

Whatever cleansing, cleaning and data management processes you undertake, you need to make sure they’re linked to your organization’s key metrics. Data accuracy, accessibility, consistency and completeness all make fine starting metrics, but you should add to these based on your strategic goals.


No matter how ordered your data is, it still needs somewhere to go, so you need to make sure your data warehouse is up to the task and able to hold all your data in an organized fashion that complies with all your regulatory obligations.

But as data begins filling up your data warehouse, you’ll need to improve your level of data control and consider investing in a tool to better manage metadata: the data about other data. By managing metadata, you master the data itself, and can better anticipate data bottlenecks and discrepancies that could impact your data’s performance.

More importantly, metadata management allows you to better manage the flow of data—wherever it is going. You can manage and better control your data not just within the data warehouse or a business analytics tool, but across all systems, increasing transparency and minimizing security and compliance risks.

But even if you can control data across all your systems, you also need to ensure you have the analytics to put the data to use. Unless actionable insights are gleaned from your data, it’s just taking up space and gathering dust.

Best Practices

For your data governance to really deliver—and keep delivering—you need to follow best practices.

Stakeholders must be identified and held accountable, strategies must be in place to evolve your data workflows, and data KPIs must be measured and monitored. But that’s just the start. Data governance best practices are evolving rapidly, and only by keeping your finger on the pulse of the data industry can you prepare your governance strategy to succeed.

How Many Have You Got?

These four pillars are essential to holding up a great data governance strategy, and if you’re missing even one of them, you’re severely limiting the value and reliability of your data.

If you’re struggling to get all the pillars in place, you might want to read our short guide to data governance success.

Tuesday Feb 10, 2015

The Data Governance Commandments: Ignoring Your Data Challenges is Not an Option

This is the first of our Data Governance blog series. Read the next part of the series here.

Our Data Governance Commandments are simple principles that can help your organization get its data story straight, and get more value from customer, performance or employee data.

All businesses are data businesses in the modern world, and if you’re collecting any information on employees, performance, operations, or your customers, your organization is swimming in data by now. Whether you’re using it, or just sitting on it, that data is there and it is most definitely your responsibility.

Even if you lock it in a vault and bury your head in the sand, that data will still be there, and it will still be:

  • Subject to changeable regulations and legislation
  • An appealing target for cybercriminals
  • An opportunity that you’re missing out on

Those are already three very good reasons to start working on your data strategy. But let’s break it down a bit more.


Few things stand still in the world of business, but regulations in particular can move lightning-fast.

If your data is sitting in a data warehouse you built a few years ago, that data could now be stored in an insecure format, listed incorrectly, and violating new regulations you haven’t taken into account.

You may be ignoring the data, but regulatory bodies aren’t—and you don’t want to find yourself knee-deep in fines.


Your network is like a big wall around your business. Cybercriminals only need to find one crack in the brickwork, and they’ll come flooding in.

Sure, you’ve kept firewalls, anti-virus software and your critical servers up to date, but what about that old data warehouse? How’s that looking?

If you’ve taken your eye off your DW for even a second, you’re putting all that data at risk. And if the cybercriminals establish a backdoor through the DW into the rest of the organization, who knows how far the damage could spread?

If all you lose following such a data breach is some consumer reputation and business, consider yourself lucky. The impact could be far worse for an organization that ignores its data security issues.


Even without the dangers of data neglect, ignoring your data means you’re ignoring fantastic business opportunities. The data you’re ignoring could be helping your business:

  • Better target marketing and sales activities
  • Make more informed business decisions
  • Get more from key business applications
  • Improve process efficiency

Can you afford to ignore all of these benefits, and risk the security and compliance of your data?

Thankfully, there are plenty of ways you can start tightening up your data strategy right away.

Check out our short guide to data governance, and discover the three principles you need to follow to take control of your data.

Thursday Nov 06, 2014

Oracle Data Integrator and Hortonworks

Check out Oracle's Alex Kotopoulis being featured on the Hortonworks blog discussing how Oracle Data Integrator is the best tool for data ingest into Hadoop!

Remember to register for the November 11th joint webinar presented by Jeff Pollock, VP Oracle, and Tim Hall, VP Hortonworks.  Click here to register.  

Monday Oct 20, 2014

Announcing Availability of Oracle Enterprise Metadata Management

Oracle today announced the general availability of Oracle Enterprise Metadata Management (OEMM), Oracle's comprehensive metadata management technology for Data Governance. With this release, Oracle underscores a product strategy that not only offers best-in-class Data Integration solutions like Oracle Data Integrator (ODI), Oracle GoldenGate (OGG) and Oracle Enterprise Data Quality (OEDQ), but also technology that ties together business initiatives like governance.

Data Governance Considerations

Organizations have long struggled to impose credible governance on their data, using ad hoc processes and technologies that are unwieldy and unscalable. There are a number of reasons why this has been the case.

  • Data Governance cannot be done without managing metadata.
  • Data Governance cannot be done without extending across all platforms, irrespective of technology.
  • Data Governance cannot be done without a business- and IT-friendly interface.

Complete Stewardship - Data Transparency from Source to Report

The biggest advantages of an airtight Data Governance program are reduced data risk, increased security, and management of your organization's data lifecycle. Any governance tool should be able to surface lineage, impact analysis and data flow not just within a business analytics tool or a data warehouse, but across all of these systems, no matter which technology is used. This increased transparency allows risks and impacts to be assessed accurately when data changes.

Data Flow Diagram across platforms.

With a focus on stewardship, OEMM is designed to be intuitive and search-based. Its search catalog allows easy browsing of all objects, with collaboration and social features for the Data Steward.