Thursday Aug 30, 2012

Article about Oracle Enterprise Data Quality

Rittman Mead recently published an interesting article on Oracle Enterprise Data Quality and on its integration with Oracle Data Integrator. You can find it at Introducing Oracle Enterprise Data Quality.

Wednesday Aug 29, 2012

Load Plan article in Oracle Magazine

There is a timely article in Oracle Magazine on ODI Load Plans from Mark Rittman in the current issue – worth having a quick read and a play with the included sample if you get the time. Thanks to Mark for investing the time and energy in providing such useful information to the community.

http://www.oracle.com/technetwork/issue-archive/2012/12-sep/o52bi-1735905.html

Mark goes over the main benefits of load plans in the article. I would be interested to hear any creative use cases, or comments in general.

Monday Aug 27, 2012

It was worth the wait… Welcome Oracle GoldenGate 11g Release 2

It certainly was worth the wait to meet Oracle GoldenGate 11gR2, because it is full of new features on multiple fronts. In fact, this release has the longest and strongest list of new features in Oracle GoldenGate’s history. The new release brings GoldenGate closer to the Oracle Database while expanding the support for global implementations and heterogeneous systems. It is more secure, more flexible, and faster.

We announced the availability of Oracle GoldenGate 11gR2 via a press release. If you haven’t seen it yet, please check it out. As covered in this announcement, there are a variety of improvements in the product:

  • Integrated Capture for Oracle Database: brings Oracle GoldenGate’s Capture process closer to the Oracle Database engine and enables support for Advanced Compression among other benefits.
  • Enhanced Conflict Detection & Resolution: speeds and simplifies the conflict detection and resolution process for active-active deployments.
  • Globalization: Oracle GoldenGate can now be deployed for databases that use multi-byte/Unicode character sets.
  • Security and Performance Improvements: includes support for the Federal Information Processing Standard (FIPS).
  • Increased Extensibility: actions can be kicked off based on an event record in the transaction log or in the Trail file.
  • Integration with Oracle Enterprise Manager 12c: in addition to the Oracle GoldenGate Monitor product.
  • Expanded Heterogeneity: including capture from IBM DB2 for i on iSeries (AS/400) and delivery to Postgres.

We will explain these new features in more detail at our upcoming launch webcast:

(Sept 12 8am/10am PT)

In addition to learning more about these new features, the webcast will allow you to ask your questions to product management via a live Q&A session. So I hope you will not miss this opportunity to explore the new release of Oracle GoldenGate 11g and see how it can deliver enterprise-class real-time data integration solutions.

I look forward to a great webcast to unveil GoldenGate’s new capabilities.

Monday Aug 20, 2012

Hadoop - Invoke Map Reduce

Carrying on from the previous post on Hadoop and HDFS with ODI packages, here is another task worth calling out – how to execute existing map-reduce code from ODI. I will show this using ODI packages and the Open Tool framework.

The Hadoop JobConf SDK provides the class needed for initiating jobs, whether local or remote – so the ODI agent could, for example, be hosted on a system other than the Hadoop cluster and simply fire jobs off to it. Some useful posts, such as this older one on executing map-reduce jobs from Java (following the reply from Thomas Jungblut in this post), also helped me get up to speed.

Where better to start than the WordCount example (see a version of it here, with both mapper and reducer as inner classes) – let's see how this can be invoked from an ODI package. The HadoopRunJob below is a tool I added via the Open Tool framework; it basically wraps the JobConf SDK, and its parameters are defined in ODI.
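Under the covers, the tool is doing something along the lines of the following raw JobConf submission. This is a minimal sketch only – the driver class, the WordCount inner-class names and the HDFS paths are illustrative assumptions rather than the exact values wired into the tool.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class RunWordCount {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(WordCount.class);       // JAR containing the existing job classes
    conf.setJobName("wordcount");                      // job name as it appears on the job tracker
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);
    conf.setMapperClass(WordCount.Map.class);          // existing mapper inner class from the WordCount example
    conf.setReducerClass(WordCount.Reduce.class);      // existing reducer inner class from the WordCount example
    // conf.set("mapred.job.tracker", "hadoopnode:8021");  // optional: fire the job at a remote job tracker (hostname assumed)
    FileInputFormat.setInputPaths(conf, new Path("/users/oracle/logs/input"));
    FileOutputFormat.setOutputPath(conf, new Path("/users/oracle/logs/output"));
    JobClient.runJob(conf);                            // submit the job and wait for completion
  }
}

The HadoopRunJob tool simply surfaces these class names, paths and the optional job tracker as ODI tool parameters.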

You can see some of the parameters below: I define the various class names I need, plus various other parameters including the Hadoop job name, and can also specify the job tracker to fire the job on (for a client-server style architecture). The input path and output path are also defined as parameters. The first tool in the package calls the copy file to HDFS tool – this is just to demonstrate copying the files needed by the WordCount program into HDFS ready for it to run.

Nice and simple – a lot of the complexity is hidden behind some simple tools. The JAR file containing WordCount needed to be available to the ODI agent (or Studio, since I invoked it with the local agent), and that was it. When the package is executed, the agent processes the code and executes the steps just like normal. If I run the package above it will successfully copy the files to HDFS and perform the word count. On a second execution of the package an error will be reported because the output directory already exists, as below.

I left the example like this to illustrate that we can then extend the package design with conditional branching to handle errors after a flow, just like the following:

Here after executing the word count, the status is checked and you can conditionally branch on success or failure – just like any other ODI package. I used the beep just for demonstration.

The HadoopRunJob tool used above was built with the v1 MapReduce SDK; with MR2/YARN this will change again – these kinds of changes hammer home the need for better tooling that abstracts common concepts for users to exploit.

You can see from these posts that we can provide useful tooling behind the basic mechanics of Hadoop and HDFS very easily, along with the power of generating map-reduce jobs from interface designs which you can see from the OBEs here.

Book Review - Getting Started with Oracle Data Integrator 11g

Getting Started with Oracle Data Integrator 11g: A Hands-On Tutorial is a good introduction to ODI with some common examples that will be useful to many. It illustrates how data can be imported from various sources, transformed and exported to various targets with ODI. Follow the step-by-step guide through the examples and you will have the right knowledge at the end of the book to create your own data integration solutions. The book is easy to read for novices and offers a great introduction to the world of ODI with useful tips from the authors - and there are a few of them ;-)

This book is ideal for somebody who wants to learn quickly about ODI; it's not a reference manual but a good introduction. It is filled with an abundance of useful info – I even discovered new information myself, such as the Column Setup Wizard when importing flat file data. The chapter on working with XML files is also a useful one for anyone working in that area, clearly spelling out with an example how it works.

The first and second chapters cover the product overview and installation. Chapter 3 introduces variables and is a useful reference since it details not only how variables are defined but also where they can be used - a much asked question which I have seen many times on mailing lists and forums, so it is good to see it included here. The fourth chapter introduces the design objects including topology, models and interfaces. There are useful notes scattered throughout the book which are worth checking out - these are little insights and useful hints on using the tool; personally they are my favorite items in the book.

Chapters five, six and seven illustrate working with databases including Oracle, MySQL and Microsoft SQL Server, covering various areas along the way from building interfaces using ODI's declarative designer and designing lookups to writing transformation expressions and KM selection. Chapters eight and nine are all about files and XML files and, as I mentioned, even surprised me with a few little pieces of info. Chapter 10 covers the workflow-oriented objects including load plans, packages and procedures - a lot for a single chapter but an OK introduction; these three could easily be expanded into much, much more. Chapter 11 covers errors in general - nice to see this too, as it gives some insight into how error handling in ODI works and where to go to check up on errors. Finally there is an introduction to the ODI management components including the integration with Oracle's Enterprise Manager and the ODI Console itself. Some topics are covered quickly, such as procedures, and I couldn't find any information on user functions, but on the whole it's a good start.

All in all, this book is an excellent read for somebody who wants a quick start in ODI. It's a useful book to dive into, reading up on individual chapters and topics - you don't have to read it end to end, which is my favorite type. It's an introduction, not a cookbook; the cookbook would be another useful companion book for ODI! This book is ideal for somebody who wants to get up and running in a short amount of time. If you are interested in this book, you can get further info from Packt Publishing here, plus see Julien's earlier blog post here on discounts.

Thursday Aug 16, 2012

Hadoop and HDFS - file system tools

Underpinning the Oracle Big Data Appliance (and any other Hadoop cluster) is HDFS. Working with files in HDFS is just like working with regular file systems (although quite different under the hood), but to manipulate them from the OS you have a different API to use: Hadoop prefixes the commands with 'hadoop fs', so you issue 'hadoop fs -mkdir' or 'hadoop fs -rm' where a local file system uses just mkdir or rm. ODI has file management tools available in the package editor - there are tools for preparing files and moving them around. The HDFS commands can act just like any other tool in ODI; let's see how!

Here I will show you how I added a bunch of tools to perform common HDFS file system commands and map-reduce job execution from within the ODI package editor – this is over and above the support ODI has for building interfaces that exploit the Hadoop ecosystem.

You will see how users can easily perform HDFS actions and execute map-reduce jobs - I will save the latter for another post. Using ODI's Open Tool extensibility in the package editor, I have wrapped the following:

  • HDFSCopyFromLocalFile – Copy a file (or group of files) from the local file system to HDFS.
  • HDFSCopyToLocalFile – Copy a file (or group of files) from HDFS to the local file system.
  • HDFSMkDirs – Create directories in HDFS; will create intermediate directories in a path that are missing.
  • HDFSMoveFromLocalFile – Move a file (or group of files) from the local file system to HDFS.
  • HDFSRm – Delete directories in HDFS; can also delete recursively.
  • HadoopRunJob – Run a map-reduce job by defining the job parameters: mapper class, reducer class, formats, input, output etc.
These are common HDFS actions that every Hadoop training walks the user through, and they are equivalents of the local file system tools ODI has under the Files group in the package toolbox. You can see from the example below that the HDFSCopyFromLocalFile call has a source URI to copy and a destination on HDFS - it's very simple to use. This uses the FileSystem SDK to manipulate the HDFS file system. The HDFSCopyFromLocalFile below copies the file /home/oracle/weblogs/apache08122012_1290.log from the local file system to HDFS at /users/oracle/logs/input. Very simple.
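Under the covers this maps to a single call on the FileSystem SDK – a minimal sketch, where the name node URI is an assumption and the paths are the ones from the example:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyLogToHdfs {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.default.name", "hdfs://localhost:8020");   // assumed name node URI
    FileSystem fs = FileSystem.get(conf);
    // Copy the local log file into the HDFS input directory, as HDFSCopyFromLocalFile does
    fs.copyFromLocalFile(new Path("/home/oracle/weblogs/apache08122012_1290.log"),
                         new Path("/users/oracle/logs/input"));
    fs.close();
  }
}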

The package flow shown above basically removes a directory structure, creates a directory, copies some files, runs a few map-reduce jobs and then tidies up with some removal. I built these tools using ODI's Open Tool SDK; it's another great way of extending the product. In the image you can see there are a bunch of tools in the Plugins folder of the Toolbox. These tools use Hadoop's SDKs, including org.apache.hadoop.fs.FileSystem, org.apache.hadoop.fs.Path and org.apache.hadoop.conf.Configuration. Another cool aspect of ODI Open Tools is that you can call the tools from commands within a KM task, or from procedure commands.
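For completeness, the raw FileSystem calls behind HDFSRm and HDFSMkDirs look roughly like this – again a sketch only, with an assumed name node URI and illustrative paths:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PrepareHdfsDirs {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.default.name", "hdfs://localhost:8020");   // assumed name node URI
    FileSystem fs = FileSystem.get(conf);
    fs.delete(new Path("/users/oracle/logs"), true);        // recursive delete, as HDFSRm offers
    fs.mkdirs(new Path("/users/oracle/logs/input"));        // creates intermediate directories, as HDFSMkDirs does
    fs.close();
  }
}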

The dependent JARs must be available to ODI for execution - this is a common reply in forums across the web to developers using these SDKs. The HadoopRunJob uses the JobConf SDK to define the map-reduce job (following the reply from Thomas Jungblut in this post); I will cover this in another blog post. For more on integrating Hadoop and ODI, see the self study material here.

Monday Aug 06, 2012

Deduplicating and Creating Lists – Oracle SQL LISTAGG

How do you leverage LISTAGG from ODI? How do you aggregate rows into strings or lists? Let’s see! Here we will see a few things including LISTAGG, an Oracle Analytic SQL function useful for creating lists from rows of information. We will also see how ODI can be extended to recognize this special function and generate the code we desire.

The data in our example has many order numbers for each customer. What we want is a single row per customer with, for example, a comma separated list of its order numbers. The LISTAGG function is perfect for this, and is blogged about all over the place. The example we will build in ODI takes the data from a source table and creates a comma separated list of order numbers.

To get ODI to recognize LISTAGG as an aggregation function we extend the ODI language elements in the Topology by adding a LISTAGG entry and define it as a ‘Group Function’ as below (I defined one implementation for Oracle and switched off the Universal flag). This will enable the ODI built-in aggregation analysis, so we get the group by generated automatically.

The ODI interface we will define looks like the following – we simply define the LISTAGG expression as an expression in the target column mapping and the CUSTID column will be the group by!
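The SQL this boils down to looks roughly like the following – a sketch only, with illustrative table and column names rather than the exact code ODI generates for the interface:

-- Illustrative source table/column names; not the generated code for a specific interface.
SELECT custid,
       LISTAGG(order_number, ',') WITHIN GROUP (ORDER BY order_number) AS order_list
FROM   src_orders
GROUP  BY custid;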

I did come across a limitation with user functions (one it would have been better not to have) – they cannot include aggregation expressions, otherwise we could have wrapped this up in a nice friendly user function (avoiding the technology-specific grammar).

Other systems have similar aggregation and list creation capabilities – for example, Hypersonic SQL added GROUP_CONCAT to do similar list creation, and many others seem to be derived from the XMLAGG functions which build XML structures from relational data.

The LISTAGG function is a useful function to remember.

Wednesday Aug 01, 2012

ODI 11g - Hadoop integration self study

There is a self study available at the link below which is a great introduction to the Hadoop related integration available in ODI 11.1.1.6 (see earlier blog here). Thanks to the curriculum development group for creating this material. You can see from the study how ODI was extended to support integration in and out of the Hadoop ecosystem.

https://apex.oracle.com/pls/apex/f?p=44785:24:0::NO:24:P24_CONTENT_ID,P24_PREV_PAGE:6130,29

The paper here, titled 'High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database', describes the raw capabilities of the Oracle Loader for Hadoop and Oracle Direct Connector for HDFS. These capabilities are encapsulated in the HDFS File/Hive to Oracle KM, so the different options for loading described in the paper are modeled as capabilities of the Knowledge Module - another great illustration of the capabilities of KMs.

Much more to come in this space... 

