Thursday Feb 19, 2015

Hive, Pig, Spark - Choose your Big Data Language with Oracle Data Integrator

The strength of Oracle Data Integrator (ODI) has always been the separation of logical design and physical implementation. Users can define a logical transformation flow that maps any sources to targets without being concerned about the exact mechanisms used to realize the job. In fact, ODI doesn’t have its own transformation engine; instead it outsources all work to the native mechanisms of the underlying platforms, be it relational databases, data warehouse appliances, or Hadoop clusters.

In the case of Big Data this philosophy of ODI gains even more importance. New Hadoop projects are incubated and released constantly, introducing exciting new capabilities; the combined brain trust of the big data community conceives new technology that outdoes any proprietary ETL engine. ODI’s ability to separate your design from the implementation enables you to pick the ideal environment for your use case; and if the Hadoop landscape evolves, it is easy to retool an existing mapping with a new physical implementation. This way you don’t have to tie yourself to a language that is hyped this year but might be legacy the next.

ODI turns a logical design into executable code through physical designs and Knowledge Modules. You can even define multiple physical designs, for different languages, based on the same logical design. For example, you could choose Hive as your transformation platform, and ODI would generate Hive SQL as the execution language. You could also pick Pig, and the generated code would be Pig Latin. If you choose Spark, ODI will generate PySpark code, which is Python with Spark APIs. Knowledge Modules orchestrate the code generation for the different languages and can be further configured to optimize the execution of each implementation, for example parallelism in Pig or in-memory caching in Spark.

The example below shows an ODI mapping that reads from a log file in HDFS, registered in HCatalog. The data is filtered, aggregated, and then joined with another table, before being written into another HCatalog-based table. ODI can generate code for Hive, Pig, or Spark based on the Knowledge Modules chosen.
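
To make this concrete, here is a rough, hand-written PySpark sketch of what such a mapping boils down to. It is not ODI’s actual generated code, and the table and column names are invented for illustration.

    # A hand-written PySpark equivalent of the mapping described above --
    # not ODI-generated code. Table and column names are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Hive support lets Spark read HCatalog-registered tables directly.
    spark = (SparkSession.builder
             .appName("odi_mapping_sketch")
             .enableHiveSupport()
             .getOrCreate())

    logs = spark.table("web_logs")        # log file in HDFS, registered in HCatalog
    customers = spark.table("customers")  # the table joined in later

    result = (logs
              .filter(F.col("status") == 200)       # filter
              .groupBy("customer_id")               # aggregate
              .agg(F.count("*").alias("hits"))
              .join(customers, "customer_id"))      # join

    # Write the result into another HCatalog-based table.
    result.write.mode("overwrite").saveAsTable("customer_hits")

The same logical flow could just as well be rendered as Hive SQL or Pig Latin; only the chosen Knowledge Modules change.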

ODI provides developer productivity and can future-proof your investment by overcoming the need to hand-code Hadoop transformations in a particular language. You can logically design your mapping and then choose the implementation that best suits your use case.

Monday Feb 16, 2015

The Data Governance Commandments

This is the second part of our Data Governance series. Read the first part here.

The Four Pillars of Data Governance

Our Data Governance Commandments are simple principles that can help your organization get its data story straight, and get more value from customer, performance or employee data.

Data governance is a wide-reaching discipline, but as in all walks of life, there are a handful of essential elements you need in place before you can really start enjoying the benefits of a good data governance strategy. These are the four key pillars of data governance:

People

Data is like any other asset your business has: It needs to be properly managed and maintained to ensure it continues delivering the best results.

Enter the data steward: a role dedicated to managing, curating and monitoring the flow of data through your organization. This can be a dedicated individual managing data full-time, or just a role appended to an existing employee’s tasks.

But do you really need one? If you take your data seriously, then someone should certainly be taking on this role, even if they only do it part-time.

Processes

So what are these data stewards doing with your data exactly? That’s for you to decide, and it’s the quantity and quality of these processes that will determine just how successful your data governance program is.

Whatever cleansing and data management processes you undertake, you need to make sure they’re linked to your organization’s key metrics. Data accuracy, accessibility, consistency and completeness all make fine starting metrics, but you should add to these based on your strategic goals.
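
As a toy illustration (in Python with pandas, not any particular product), here is how a steward might compute two such starting metrics over a small customer table; the data and the format rule are invented for the example.

    # A toy data quality check -- the sample data and rules are made up.
    import pandas as pd

    customers = pd.DataFrame({
        "email":   ["a@example.com", None, "c@example.com"],
        "country": ["US", "us", "DE"],
    })

    # Completeness: share of non-null values per column.
    completeness = customers.notna().mean()

    # Consistency: share of values matching an agreed format
    # (here, upper-case two-letter country codes).
    consistency = customers["country"].str.fullmatch(r"[A-Z]{2}").mean()

    print(completeness)
    print(consistency)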

Technology

No matter how ordered your data is, it still needs somewhere to go, so you need to make sure your data warehouse is up to the task and able to hold all your data in an organized fashion that complies with all your regulatory obligations.

But as data begins filling up your data warehouse, you’ll need to improve your level of data control and consider investing in a tool to better manage metadata: the data about other data. By managing metadata, you master the data itself, and can better anticipate data bottlenecks and discrepancies that could impact your data’s performance.

More importantly, metadata management allows you to better manage the flow of data—wherever it is going. You can manage and better control your data not just within the data warehouse or a business analytics tool, but across all systems, increasing transparency and minimizing security and compliance risks.

But even if you can control data across all your systems, you also need to ensure you have the analytics to put the data to use. Unless actionable insights are gleaned from your data, it’s just taking up space and gathering dust.

Best Practices

For your data governance to really deliver—and keep delivering—you need to follow best practices.

Stakeholders must be identified and held accountable, strategies must be in place to evolve your data workflows, and data KPIs must be measured and monitored. But that’s just the start. Data governance best practices are evolving rapidly, and only by keeping your finger on the pulse of the data industry can you prepare your governance strategy to succeed.

How Many Have You Got?

These four pillars are essential to holding up a great data governance strategy, and if you’re missing even one of them, you’re severely limiting the value and reliability of your data.

If you’re struggling to get all the pillars in place, you might want to read our short guide to data governance success.

Tuesday Feb 10, 2015

The Data Governance Commandments: Ignoring Your Data Challenges is Not an Option

This is the first part of our Data Governance blog series. Read the next part here.

Our Data Governance Commandments are simple principles that can help your organization get its data story straight, and get more value from customer, performance or employee data.

All businesses are data businesses in the modern world, and if you’re collecting any information on employees, performance, operations, or your customers, your organization is swimming in data by now. Whether you’re using it, or just sitting on it, that data is there and it is most definitely your responsibility.

Even if you lock it in a vault and bury your head in the sand, that data will still be there, and it will still be:

  • Subject to changeable regulations and legislation
  • An appealing target for cybercriminals
  • An opportunity that you’re missing out on

Those are already three very good reasons to start working on your data strategy. But let’s break it down a bit more.

Regulations

Few things stand still in the world of business, but regulations in particular can move lightning-fast.

If your data is sitting in a data warehouse you built a few years ago, that data could now be stored in an insecure format, listed incorrectly, and in violation of new regulations you haven’t taken into account.

You may be ignoring the data, but regulatory bodies aren’t—and you don’t want to find yourself knee-deep in fines.

Security

Your network is like a big wall around your business. Cybercriminals only need to find one crack in the brickwork, and they’ll come flooding in.

Sure, you’ve kept firewalls, anti-virus software and your critical servers up to date, but what about that old data warehouse? How’s that looking?

If you’ve taken your eye off your DW for even a second, you’re putting all that data at risk. And if the cybercriminals establish a backdoor through the DW into the rest of the organization, who knows how far the damage could spread?

If you lose only consumer trust and some business following such a data breach, consider yourself lucky. The impact could be far worse for an organization that ignores its data security issues.

Potential

Even without the dangers of data neglect, ignoring your data means you’re ignoring fantastic business opportunities. The data you’re ignoring could be helping your business:

  • Better target marketing and sales activities
  • Make more informed business decisions
  • Get more from key business applications
  • Improve process efficiency

Can you afford to ignore all of these benefits, and risk the security and compliance of your data?

Thankfully, there are plenty of ways you can start tightening up your data strategy right away.

Check out our short guide to data governance, and discover the three principles you need to follow to take control of your data.

Thursday Nov 06, 2014

Oracle Data Integrator and Hortonworks

Check out Oracle's Alex Kotopoulis being featured on the Hortonworks blog, discussing how Oracle Data Integrator is the best tool for data ingest into Hadoop!

Remember to register for the November 11th joint webinar presented by Jeff Pollock, VP Oracle, and Tim Hall, VP Hortonworks.  Click here to register.  

Monday Oct 20, 2014

Announcing Availability of Oracle Enterprise Metadata Management

Oracle today announced the general availability of Oracle Enterprise Metadata Management (OEMM), Oracle's comprehensive metadata management technology for data governance. With this release Oracle underscores a product strategy that not only offers best-in-class data integration solutions like Oracle Data Integrator (ODI), Oracle GoldenGate (OGG) and Oracle Enterprise Data Quality (OEDQ), but also includes technology that ties together business initiatives like governance.

Data Governance Considerations

Organizations have long struggled to impose credible governance on their data, relying on ad hoc processes and technologies that are unwieldy and unscalable. There are several reasons why this has been the case:

  • Data Governance cannot be done without managing metadata.
  • Data Governance cannot be done without extending across all platforms, irrespective of technology.
  • Data Governance cannot be done without a business- and IT-friendly interface.

Complete Stewardship - Data Transparency from Source to Report

The biggest advantage of having an airtight Data Governance program is to reduce data risk, increase security and manage your organization's data life-cycle. Any governance tool should be able to surface lineage, impact analysis and data flow not just within a business analytics tool or a data warehouse, but across all these systems, no matter what technology is in use. This increased transparency makes it possible to accurately assess risks and impacts when data changes.


Data Flow Diagram across platforms.

With a focus on stewardship, OEMM is designed to be intuitive and search-based. Its search catalog allows easy browsing of all objects, with collaboration and social features for the Data Steward.

Search-based catalog and Business Glossary for easy browsing of objects.

Big Data Governance

OEMM along with Oracle Data Integrator provides a powerful combination to govern Big Data technologies including HBase, Sqoop and JSON. With ODI providing complete support for these technologies for data loading and transformation, OEMM harvests the ODI metadata to stitch together a complete data map that even traverses any Big Data Reservoir that organizations have in place.

Oracle and 3rd Party Metadata

OEMM is truly heterogeneous. It is designed to pull in and manage metadata from Oracle and third-party databases, data warehouses, ETL tools, business intelligence platforms, and other reporting tools.

Visit the OEMM homepage for more information about Oracle Enterprise Metadata Management.

Friday Oct 17, 2014

Upcoming Webinar: Data Transformation and Acquisition Techniques, to Handle Petabytes of Data

Many organizations have become aware of the importance of big data technologies such as Apache Hadoop, but are struggling to determine the right architecture to integrate them with their existing analytics and data processing infrastructure. As companies implement Hadoop, they need to learn new skills and languages, which can impact developer productivity. Oftentimes they resort to hand-coded solutions, which can be brittle and hurt both developer productivity and the efficiency of the Hadoop cluster.

To truly tap into the business benefits of big data solutions, it’s necessary to ensure that the business and IT have simple, tools-based methods to get data in, change and transform it, and keep the data warehouse continuously updated.

In this webinar you’ll learn how the Oracle and Hortonworks solution can:

  • Accelerate developer productivity
  • Optimize data transformation workloads on Hadoop
  • Lower cost of data storage and processing
  • Minimize risks in deployment of big data projects
  • Provide proven industrial scale tooling for data integration projects

We will also discuss how technologies from both Oracle and Hortonworks can deploy the big data reservoir or data lake, an efficient cost-effective way to handle petabyte-scale data staging, transformations, and aged data requirements while reclaiming compute power and storage from your existing data warehouse.

Speakers:
Jeff Pollock, Vice President, Oracle
Tim Hall, Vice President, Hortonworks

Hosted by:
Tim Matteson, Co-Founder, Data Science Central

Click Here to Register.

Wednesday Oct 15, 2014

Oracle Data Integrator Certified with Hortonworks HDP 2.1

Too often, companies fall into what they perceive as the path of least resistance by using custom, hand-coded methods to create big data solutions; but with the rush to production, these hand-coded solutions more often perform slower and are more costly to maintain. To truly tap into the business benefits of big data solutions, a simple tools-based solution is required to move large volumes of data into Hadoop and efficiently transform it without the need for costly mid-tier servers. The Oracle Data Integration Solutions team is pleased to announce the certification of Oracle Data Integrator with Hortonworks HDP 2.1.

This collaboration between the Oracle Data Integrator and Hortonworks teams will provide customers a familiar and comprehensive data integration platform for Hadoop, covering high-volume, high-performance batch loads, agile transformations using the power of Hadoop, and a superior developer experience with the flow-based declarative user interface of Oracle Data Integrator.

To learn more, click here.    

Also, on November 11th, 2014, Jeff Pollock, VP Oracle Data Integration Solutions, and Tim Hall, VP of Product Management at Hortonworks, will host a joint webinar to discuss the certification and how technologies from both Oracle and Hortonworks can be used to deploy big data reservoirs. To register, click here.

Wednesday Jul 02, 2014

Learn more about ODI and Apache Sqoop

The ODI A-Team just published a new article about moving data from relational databases into Hadoop using ODI and Apache Sqoop. Check out the blog post here: Importing Data from SQL databases into Hadoop with Sqoop and Oracle Data Integrator (ODI)
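
For a flavor of what such a Sqoop-based load involves, here is a minimal sketch of the kind of import the article covers, wrapped in Python for consistency with the other snippets on this blog; the connection details, table and target directory are all hypothetical.

    # Minimal Sqoop import from a relational database into HDFS.
    # All connection details and names below are hypothetical.
    import subprocess

    subprocess.run([
        "sqoop", "import",
        "--connect", "jdbc:oracle:thin:@//dbhost:1521/ORCL",
        "--username", "scott",
        "--password-file", "/user/scott/.password",
        "--table", "ORDERS",
        "--target-dir", "/data/raw/orders",
        "--num-mappers", "4",
    ], check=True)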

Monday Jun 30, 2014

Oracle Enterprise Data Quality Product Family

Oracle Enterprise Data Quality (OEDQ) is critical to the Oracle Data Integration portfolio. OEDQ helps make sure that the data that customers use is fit for purpose. While Oracle Data Integrator (ODI) helps with data movement and extract, load and transform (ELT) and Oracle GoldenGate (OGG) is a leader in data replication, OEDQ is the tool that helps maintain data consistency and quality.

Critical Data Quality Challenges

Data used for decision making and analytics has to be fully trustworthy. However, in real life data rarely comes clean. It contains missing values, duplicate entries, misspelled words, non-standardized names and various other forms of questionable data. Making critical decisions with such data results in operational inefficiencies, loss of goodwill among customers, faulty market readings, and audit and compliance lapses.
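
As a generic illustration of these problems (sketched in Python with pandas, not with OEDQ itself), consider a few invented records:

    # Invented records showing typical quality problems -- not OEDQ code.
    import pandas as pd

    df = pd.DataFrame({
        "name": ["Acme Corp", "ACME Corporation", "Globex", None],
        "city": ["New York", "new york", "Sprngfield", "Springfield"],
    })

    # Missing values per column.
    print(df.isna().sum())

    # Duplicate entries that only surface after simple standardization
    # (trimming whitespace and normalizing case). The misspelling
    # "Sprngfield" would additionally require fuzzy matching to catch.
    standardized = df["city"].str.strip().str.lower()
    print(standardized.duplicated(keep=False))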

Essential Data Quality Capabilities

Ever since there have been databases and applications, there have been data quality problems. Unfortunately all those problems are not created equal and neither are the solutions that address them. Some of the largest differences are driven by the data type, or domain, of the data in question. The most common data domains in data quality are customer (or more generally, party data including suppliers, employees, etc.) and product data. Oracle Enterprise Data Quality products recognize these differences and provide purpose-built capabilities to address each. Quick to deploy and easy to use, Oracle Enterprise Data Quality products bring the ability to enhance the quality of data to all stakeholders in any data management initiative.

Oracle Enterprise Data Quality covers:

  • Profiling, Audit and Dashboards
  • Parsing and Standardization
  • Match and Merge
  • Case Management
  • Address Verification
  • Product Data Capabilities

Zebra Technologies Uses OEDQ to Reduce Costs

"We chose Enterprise Data Quality because it is easy to integrate. We have got the business groups excited about cleaning up the data."

Click here to learn more about Oracle Enterprise Data Quality.