Wednesday Mar 12, 2014

ODI 12c - Data Input Experts

Back in the olde days of OWB I blogged about a few utilities (see here) that were useful for collecting user input data in custom flows; users build such flows as accelerators to take the mundane tasks out of common activities. In ODI you can also use groovy's SwingBuilder, which lets you build useful dialogs very easily. I posted some examples such as the one below for model creation in ODI and a launchpad example;

The utilities for OWB I mentioned in that blog are just basic java classes that were invoked from OWB via tcl/jacl. Because they are written in java, they can still be used from ODI via groovy; still as useful, still as functional. Let's see how we call them now!

The required JARs need to be added to the groovy classpath: in the ODI IDE go to Tools->Preferences, then ODI->System->Groovy, and set the groovy classpath to include jexpert.jar, tcljava.jar and jacl.jar. For example, I have the following referencing the JARs from my 11gR2 database home, which contains the OWB code;

  • D:\app\dallan\product\11.2.0\dbhome_1\owb\lib\int\jexpert.jar;D:\app\dallan\product\11.2.0\dbhome_1\owb\lib\int\tcljava.jar;D:\app\dallan\product\11.2.0\dbhome_1\owb\lib\int\jacl.jar

I can then launch the shuttle dialog for example as follows;

  import oracle.owb.jexpert.ShuttleObjects

  arrayOfString = [ "PRODUCT_ID", "PRODUCT_NAME", "PRODUCT_COLOR", "PRODUCT_DESC", "PRODUCT_LONG_DESC", "CATEGORY_ID", "CATEGORY_NAME", "CATEGORY_DESCRIPTION", "SUBCATEGORY_ID", "SUBCATEGORY_NAME", "SUBCATEGORY_DESCRIPTION" ]
  sels = ShuttleObjects.getselection("Select dimension levels", "Select columns to identify levels:", "Columns:", "Levels", (String[]) arrayOfString.toArray())
  println sels

I can then use the returned variable sels and do whatever ODI work I need; you can see the code above executed from within ODI and the dialog appearing with the information;

Likewise the data entry dialog works as is when executed from groovy: just like in OWB, we can display the information, let the user enter data, then collect it and action it in our groovy using the ODI SDK;

The blog on the 12c mapping SDK here has a good SDK reference table that gives you pointers from all parts of the product into the SDK areas. This is definitely a handy one to bookmark - I often use it myself. Learn some scripting; it'll save you and your teams a lot of time.

Monday Oct 14, 2013

Streaming relational transactions to HBase

Following the introductory blog post on the topic – 'Stream your transactions into Big Data Systems' – and the blog posts on 'Streaming Relational Transactions to HDFS' and 'Streaming Relational Transactions to Hive', in this post I focus on the architecture for streaming transactions into HBase.

As shown in the diagram below, integrating a database with HBase is accomplished by developing a custom handler using Oracle GoldenGate's Java API and the HBase APIs.

The custom handler is deployed as an integral part of the Oracle GoldenGate Pump process. The Pump process and the custom adapter are configured through the Pump parameter file and the custom adapter's properties file. The Pump process executes the adapter in its address space. The Pump reads the Trail File created by the Oracle GoldenGate Capture process and passes the transactions to the adapter. Based on the configuration, the adapter writes the transactions to HBase.
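
To make that concrete, below is a minimal, hypothetical sketch of the kind of write such a handler performs for each change record it receives, using the HBase client API of that era (HTable/Put). The class name, the table name, the single column family "cf" and the row-key scheme are my own assumptions for illustration; the sample handler on My Oracle Support is the real reference.

  import java.io.IOException;
  import java.util.Map;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  // Hypothetical helper a custom handler could call for each insert/update
  // operation the Pump hands to it via the GoldenGate Java API.
  public class HBaseWriter {

      private final HTable table;

      public HBaseWriter(String tableName) throws IOException {
          Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml from the classpath
          this.table = new HTable(conf, tableName);
      }

      // Writes one change record: the row key is assumed to be the source table's
      // primary key value, and every source column lands in one column family "cf".
      public void writeRow(String rowKey, Map<String, String> columns) throws IOException {
          Put put = new Put(Bytes.toBytes(rowKey));
          for (Map.Entry<String, String> col : columns.entrySet()) {
              put.add(Bytes.toBytes("cf"), Bytes.toBytes(col.getKey()), Bytes.toBytes(col.getValue()));
          }
          table.put(put); // send the put to the region server
      }

      public void close() throws IOException {
          table.close();
      }
  }

In the real handler the table name, row key and column values are driven by the operation metadata the Pump passes in, not hard-coded strings.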

A sample implementation of the HBase adapter is provided on My Oracle Support (Knowledge ID - 1586211.1). This is provided to illustrate the capability and to assist in the adoption of the Oracle GoldenGate Java API in developing custom solutions. The sample implementation illustrates the configuration and the code required for replicating database transactions on an example table to a corresponding HBase table. The instructions for configuring Oracle GoldenGate, compiling and running the sample implementation are also provided.

The sample code and configuration may be extended to develop custom solutions; however, please note that Oracle will not provide support for the code and configuration illustrated in the knowledge base paper.

As always, I would greatly appreciate it if you could share your use case for integrating Oracle GoldenGate into your Big Data strategy, along with your feedback on using the custom handler to integrate a relational database with your Big Data systems. Please post your comments on this blog or in the Oracle GoldenGate public forum - https://forums.oracle.com/community/developer/english/business_intelligence/system_management_and_integration/goldengate

Wednesday Oct 09, 2013

Streaming relational transactions to Hive

Following the introductory blog post on the topic – 'Stream your transactions into Big Data Systems' – and the blog post on 'Streaming relational transactions to HDFS', in this post I will discuss the architecture for streaming relational transactions into Hive.

Referring to the architecture diagram below, integrating a database with Hive is accomplished by developing a custom handler using Oracle GoldenGate's Java API and the Hadoop HDFS APIs.

The custom handler is deployed as an integral part of the Oracle GoldenGate Pump process. The Pump process and the custom adapter are configured through the Pump parameter file and the custom adapter's properties file. The Pump process executes the adapter in its address space. The Pump reads the Trail File created by the Oracle GoldenGate Capture process and passes the transactions to the adapter. Based on the configuration, the adapter writes the transactions in the desired format, with the appropriate content, to a file defined by the Hive DDL for the table.
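
As a rough illustration of writing in 'a format defined by the Hive DDL', the sketch below builds one text line for a Hive table declared with ROW FORMAT DELIMITED and the default Ctrl-A (\u0001) field terminator; the class name, the column-ordering assumption and the null handling are mine, not the sample adapter's.

  // Hypothetical formatting step: turn the column values of one source operation
  // into a single text line that a Hive text table (ROW FORMAT DELIMITED, default
  // field terminator \u0001) can read once the file lands in the table's location.
  public class HiveRecordFormatter {

      private static final char FIELD_DELIMITER = '\u0001'; // Hive's default field terminator

      // Column values are assumed to arrive in the same order the Hive DDL declares them.
      public static String format(String... columnValues) {
          StringBuilder line = new StringBuilder();
          for (int i = 0; i < columnValues.length; i++) {
              if (i > 0) {
                  line.append(FIELD_DELIMITER);
              }
              line.append(columnValues[i] == null ? "\\N" : columnValues[i]); // \N is Hive's default null marker
          }
          return line.toString();
      }
  }

The formatted lines are then written to a file under the Hive table's location on HDFS, just as in the HDFS case.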

A sample implementation of the Hive adapter is provided on My Oracle Support (Knowledge ID - 1586188.1). This is provided to illustrate the capability and to assist in the adoption of the Oracle GoldenGate Java API in developing custom solutions. The sample implementation illustrates the configuration and the code required for replicating database transactions on an example table to a corresponding Hive table. The instructions for configuring Oracle GoldenGate, compiling and running the sample implementation are also provided.

The sample code and configuration may be extended to develop custom solutions; however, please note that Oracle will not provide support for the code and configuration illustrated in the knowledge base paper.

It would be great if you could share your use case for leveraging Oracle GoldenGate in your Big Data strategy and your feedback on using the custom handler for integrating a relational database with Hive. Please post your comments on this blog or in the Oracle GoldenGate public forum - https://forums.oracle.com/community/developer/english/business_intelligence/system_management_and_integration/goldengate

Thursday Oct 03, 2013

Streaming relational transactions to Hadoop - HDFS

Following the introductory blog post on the topic – 'Stream your transactions into Big Data Systems' – in this blog post I will drill down into the architecture for streaming relational transactions into HDFS.

As you can see in the architecture diagram below, you can integrate a relational database with HDFS by developing a custom handler using Oracle GoldenGate's Java API and the Hadoop HDFS APIs.

The custom handler is deployed as an integral part of the Oracle GoldenGate Pump process. The Pump process and the custom adapter are configured through the Pump parameter file and the custom adapter's properties file.

The Pump process executes the adapter in its address space. The Pump reads the Trail File created by the Oracle GoldenGate Capture process and passes the transactions to the adapter. Based on the configuration, the adapter writes the transactions in the desired format, with the appropriate content, to the desired file on HDFS.
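
For a flavour of the HDFS side, here is a minimal sketch using the Hadoop FileSystem API; the class name, target path handling and record format are assumptions for illustration, and the sample adapter on My Oracle Support is the definitive reference.

  import java.io.BufferedWriter;
  import java.io.IOException;
  import java.io.OutputStreamWriter;
  import java.util.List;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  // Hypothetical helper a custom handler could use to land formatted
  // transaction records in a file on HDFS.
  public class HdfsFileWriter {

      public static void writeRecords(String hdfsPath, List<String> records) throws IOException {
          Configuration conf = new Configuration();       // picks up core-site.xml / hdfs-site.xml
          FileSystem fs = FileSystem.get(conf);
          BufferedWriter out = new BufferedWriter(
                  new OutputStreamWriter(fs.create(new Path(hdfsPath), true))); // true = overwrite
          try {
              for (String record : records) {
                  out.write(record);
                  out.newLine();
              }
          } finally {
              out.close();
          }
      }
  }

In the real adapter the records come from the operations the Pump hands to the handler, already formatted according to the configuration.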

A sample implementation of the HDFS adapter is provided on My Oracle Support (Knowledge ID - 1586210.1). This is provided to illustrate the capability and to assist in the adoption of the Oracle GoldenGate Java API in developing custom solutions. The sample implementation illustrates the configuration and the code required for replicating database transactions on an example table to a file on HDFS. The instructions for configuring Oracle GoldenGate, compiling and running the sample implementation are also provided.

The sample code and configuration may be extended to develop custom solutions; however, please note that Oracle will not provide support for the code and configuration illustrated in the knowledge base paper.

Please share your use case for Oracle GoldenGate in your Big Data strategy, as well as your feedback on using the custom handler for integrating a relational database with your Big Data systems. Please post your comments on this blog or in the Oracle GoldenGate public forum - https://forums.oracle.com/community/developer/english/business_intelligence/system_management_and_integration/goldengate

Tuesday Mar 12, 2013

ODI - Compressing/Decompressing Files in Parallel

Here's a couple of user functions for compressing and decompressing files in parallel; you control the degree of parallelism, and they compress/decompress all files in one directory and write into another. The number of parallel processes can be configured on invocation. I posted some time back about how ODI User Functions are really useful, and this is a great example. What I have here is a couple of user functions you can call from an ODI Procedure or an ODI KM, for example, and each function has 3 arguments: the input directory of files to compress/decompress, the output directory where zipped files will be stored or contents extracted to, and the number of processes in the pool used to process all of the files.

Below you can see the FileZipper user function used in an ODI procedure to compress all of the files in a directory d:\inputlogs into a target directory d:\outputzips; it uses 4 parallel processes to perform the compression. Obviously the performance is determined by the processors you have available, so size the pool accordingly to gain maximum benefit.

You can download the user functions below; note that the target output directory must exist:

  1. FileZipper( inputDirectory, outputZipDirectory, numberOfProcesses) user function is here
  2. FileUnzipper( zipDirectory, outputDirectory, numberOfProcesses) user function is here

 

You can look into the user functions and edit, change and enhance them - let me know what you think. The implementation is java code that uses the thread pools introduced in Java 5; a great example of leveraging core java capabilities to provide real benefit in your integration processes.
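
For a flavour of the approach, here is a minimal sketch of the same idea - a fixed-size pool of workers compressing every file in one directory into another. The class and method names here are mine, not the ones in the downloadable user functions.

  import java.io.File;
  import java.io.FileInputStream;
  import java.io.FileOutputStream;
  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Executors;
  import java.util.concurrent.TimeUnit;
  import java.util.zip.ZipEntry;
  import java.util.zip.ZipOutputStream;

  // Minimal sketch: zip every file in inputDir into outputDir using a pool of worker threads.
  public class ParallelZipper {

      public static void zipDirectory(String inputDir, String outputDir, int numberOfProcesses)
              throws InterruptedException {
          ExecutorService pool = Executors.newFixedThreadPool(numberOfProcesses);
          for (final File source : new File(inputDir).listFiles()) {
              if (!source.isFile()) {
                  continue; // skip subdirectories
              }
              final File target = new File(outputDir, source.getName() + ".zip");
              pool.submit(new Runnable() {
                  public void run() {
                      try {
                          ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(target));
                          FileInputStream in = new FileInputStream(source);
                          zos.putNextEntry(new ZipEntry(source.getName()));
                          byte[] buffer = new byte[8192];
                          int read;
                          while ((read = in.read(buffer)) != -1) {
                              zos.write(buffer, 0, read);
                          }
                          in.close();
                          zos.closeEntry();
                          zos.close();
                      } catch (Exception e) {
                          e.printStackTrace(); // a real user function would surface this to ODI
                      }
                  }
              });
          }
          pool.shutdown();
          pool.awaitTermination(1, TimeUnit.HOURS); // wait for all compressions to finish
      }
  }

The FileUnzipper side is simply the mirror image, reading each archive's entries and writing them out to the target directory.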

Thursday Jan 10, 2013

ODI - MongoDB and Java Table Functions Anyone?

Let's see how we could integrate MongoDB using ODI - but first, take a step back. Derby/JavaDB introduced table functions a few years ago. Table functions are really useful; they exist in the Oracle database and are just as much fun in JavaDB! ODI is a great platform for integration, and JavaDB table functions provide a really nice way to integrate arbitrary Java APIs into your designs. What I have done here is;

  • built a very simple java table function to project the documents in a MongoDB collection as a table. The collection is passed as a parameter to the KM, the column names are the keys of the MongoDB documents, and the data comes from the values.
  • built a very simple LKM from a Java table function to SQL

All of this uses JavaDB in-memory, so there is no admin and it is simple to use. Rick Hillegas wrote a nice article with some handy classes that I have used. The mongo_table class I have written uses the EnumeratorTableFunction class included in Rick's examples. The MongoDB DBCursor class is a Java Iterator, which makes it really nice to pass to the EnumeratorTableFunction class and let it do all of the work.
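
For a flavour of what mongo_table looks like, here is a sketch of the entry point. The method signature is dictated by the CREATE FUNCTION statement further below, while the EnumeratorTableFunction constructor shown is an assumption of mine - check Rick's examples for its real shape.

  import java.sql.ResultSet;

  import com.mongodb.DB;
  import com.mongodb.DBCursor;
  import com.mongodb.Mongo;

  // Sketch of the Derby/JavaDB table function entry point declared in the DDL below.
  // (The import/package for Rick's EnumeratorTableFunction is omitted; it depends on his example code.)
  public class mongo_table {

      // DERBY_JDBC_RESULT_SET parameter style: a public static method returning a ResultSet.
      public static ResultSet readCollection(String dbName, String collectionName)
              throws Exception {
          DB db = new Mongo().getDB(dbName);                         // default localhost:27017
          DBCursor cursor = db.getCollection(collectionName).find(); // DBCursor is a java.util.Iterator
          // Hand the iterator to Rick Hillegas's EnumeratorTableFunction, which adapts it into a
          // ResultSet; the column names are assumed to match the RETURNS TABLE clause, and the
          // real constructor may differ from the one sketched here.
          return new EnumeratorTableFunction(cursor, new String[] { "EMPNO", "ENAME", "SAL" });
      }
  }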

The LKM I constructed declares the table function to JavaDB/Derby, as in the example below. The function is declared based on the source datastore name (MONGO_EMPS) and columns (my source datastore has EMPNO, ENAME and SAL; note the table function actually projects the types defined in ODI's datastore), and it takes the MongoDB database name and collection name as parameters.

  create function MONGO_EMPS( dbName varchar( 330), collectionName varchar( 30))
  returns table
  (
    EMPNO VARCHAR(20),
    ENAME VARCHAR(30),
    SAL NUMERIC(10)
  )
  language java
  parameter style DERBY_JDBC_RESULT_SET
  no sql
  external name 'mongo_table.readCollection'

Then the actual code to use the function as a source is executed from a source task (the target is SQL, as I mentioned earlier for the LKM). Below you can see my execution using the test MongoDB database and the myStuff collection;

  select
    MON.EMPNO   C1_EMPNO,
    MON.ENAME   C2_ENAME,
    MON.SAL   C6_SAL
  from table(MONGO_EMPS('test', 'myStuff' )) MON
  where (1=1)
  And (MON.SAL > 4000)

Note I can also perform some filtering; as an example, here it is being done in JavaDB, and in my case it's in-memory. No setup, no persistence, just on-the-fly Java. Ideally I would push the filter down to MongoDB rather than reading and filtering in the driver - more on that later.
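
For reference, here is a sketch of what pushing that filter down would look like with the Java driver of the time - building the query document in the table function instead of filtering in Derby. The class name and the hard-coded threshold are purely illustrative.

  import com.mongodb.BasicDBObject;
  import com.mongodb.DBCollection;
  import com.mongodb.DBCursor;
  import com.mongodb.DBObject;
  import com.mongodb.Mongo;

  // Sketch: let MongoDB evaluate SAL > 4000 instead of filtering the full cursor in JavaDB.
  public class MongoFilterPushdown {

      public static DBCursor findHighEarners(String dbName, String collectionName) throws Exception {
          DBCollection coll = new Mongo().getDB(dbName).getCollection(collectionName);
          DBObject query = new BasicDBObject("SAL", new BasicDBObject("$gt", 4000));
          return coll.find(query); // only matching documents cross the wire
      }
  }

That way only the matching documents ever leave MongoDB, leaving JavaDB with nothing to filter.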

I had defined my documents in MongoDB using the following basic commands in the mongo shell;

  use test
  a1 = { EMPNO: "1", ENAME : "Fred", SAL : 10000 }
  a2 = { EMPNO: "2", ENAME : "John", SAL : 2000 }
  db.myStuff.insert( a1 )
  db.myStuff.insert( a2 )

In ODI, I can then simply use the datastore representing the MongoDB collection of documents in an interface and map it to my target;

The physical design uses the LKM JavaTableFunction to SQL KM and sets the MongoDB database, the collection and the Java table function name.

That's it. Pretty straightforward, and we are reading and consuming MongoDB documents. So what about complex document types like this? These can be transported too - more to come. The other point here is that this is a generic LKM into which you can plug other arbitrary table functions, so you can consume from any API; it's very simple. For the LKM I created, I just defined 2 parameters (because my table function only had two), but really we need a better way to handle this and ensure they are ignored if not defined, etc. That's all polishing though. Anyway, fun stuff you can see, all with a small piece of code leveraging JavaDB!

Friday Oct 05, 2012

Hidden Gems: Accelerating Oracle Data Integrator with SOA, Groovy, SDK, and XML

On the last day of Oracle OpenWorld, we had a final advanced session on getting the most out of Oracle Data Integrator through the use of various advanced techniques.

The primary way to improve your ODI processes is to choose the optimal knowledge modules for your load and take advantage of the optimized tools of your database, such as Oracle Data Pump and similar mechanisms in other databases. Knowledge modules also allow you to customize tasks, letting you codify best practices that are consistently applied by all integration developers.

The ODI SDK is another very powerful means to automate and speed up your integration development process. It allows you to automate life cycle management, code comparison, repetitive code generation and changes to your integration projects. The SDK is easily accessible through Java or scripting languages such as Groovy and Jython.

Finally, all Oracle Data Integration products provide services that can be integrated into a larger Service Oriented Architecture. This moves data integration from an isolated environment into an agile part of a larger business process environment. All Oracle data integration products can play a part in this:

  • Oracle GoldenGate can integrate into business event streams by processing JMS queues or publishing new events based on database transactions.
  • Oracle Data Integrator allows full control of its runtime sessions through web services, so that integration jobs can become part of business processes.
  • Oracle Data Service Integrator provides a data virtualization layer over your distributed sources, allowing unified reading and updating for heterogeneous data without replicating and moving data.
  • Oracle Enterprise Data Quality provides data quality services to cleanse and deduplicate your records through web services.

Tuesday Jun 19, 2012

ODI 11g - Faster Files

Deep in the trenches of ODI development, I raised my head above the parapet to read a few odds and ends and found myself thinking: why don't they know this? Take this article here – in the past, customers (see the forum) were told to use a staging route, which has a big overhead for large files. This KM is an example of the great extensibility capabilities of ODI; it's quite simple, just a new KM that;

  1. improves the out of the box experience – just build the mapping and the appropriate KM is used
  2. improves out of the box performance for file to file data movement.

This improvement in the out of the box handling of File to File data integration cases (from the 11.1.1.5.2 companion CD onwards) dramatically speeds up file integration. In the past I had seen some consultants write Perl versions of the file to file integration case; now Oracle ships this KM to fill the gap. You can find the documentation for the IKM here. The KM uses pure java to perform the integration, using java.io classes to read and write the file in a pipe; it uses java threading in order to super-charge the file processing (a simplified sketch of the piped approach is below), and it can process several source files at once when the datastore's resource name contains a wildcard. This is a big step for regular file processing on the way to super-charging big data files using Hadoop – the KM works with the lightweight agent and regular filesystems.
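
To picture what "read and write the file in a pipe" means, here is a stripped-down sketch (my own, not the KM's generated code): one thread reads the source file into a pipe while another drains the pipe into the target file, so reading and writing overlap.

  import java.io.FileInputStream;
  import java.io.FileOutputStream;
  import java.io.IOException;
  import java.io.PipedInputStream;
  import java.io.PipedOutputStream;

  // Minimal sketch of a piped, two-threaded file copy - the reader and writer run concurrently.
  public class PipedFileCopy {

      public static void copy(final String sourceFile, final String targetFile)
              throws IOException, InterruptedException {
          final PipedOutputStream pipeOut = new PipedOutputStream();
          final PipedInputStream pipeIn = new PipedInputStream(pipeOut);

          Thread reader = new Thread(new Runnable() {
              public void run() {
                  try {
                      FileInputStream in = new FileInputStream(sourceFile);
                      byte[] buffer = new byte[8192];
                      int read;
                      while ((read = in.read(buffer)) != -1) {
                          pipeOut.write(buffer, 0, read); // push source bytes into the pipe
                      }
                      in.close();
                      pipeOut.close();
                  } catch (IOException e) {
                      e.printStackTrace();
                  }
              }
          });

          Thread writer = new Thread(new Runnable() {
              public void run() {
                  try {
                      FileOutputStream out = new FileOutputStream(targetFile);
                      byte[] buffer = new byte[8192];
                      int read;
                      while ((read = pipeIn.read(buffer)) != -1) {
                          out.write(buffer, 0, read); // drain the pipe into the target file
                      }
                      out.close();
                      pipeIn.close();
                  } catch (IOException e) {
                      e.printStackTrace();
                  }
              }
          });

          reader.start();
          writer.start();
          reader.join();
          writer.join();
      }
  }

The actual KM generates and compiles its own java at runtime (which is why it asks for a JDK) and runs multiple workers depending on the configured thread count.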

So in my design below transforming a bunch of files, by default the IKM File to File (Java) knowledge module was assigned. I pointed the KM at my JDK (since the KM generates and compiles java), and I also increased the thread count to 2, to take advantage of my 2 processors.

For my illustration I transformed (you can also filter if desired) and moved about 1.3Gb with 2 threads in 140 seconds (with a single thread it took 220 seconds) - and this was by no means on any super computer. The great thing here is that it worked well out of the box, from design to execution, without any funky configuration; plus, and it's a big plus, it was much faster than before.

So if you are doing any file to file transformations, check it out!

Thursday Oct 13, 2011

Oracle GoldenGate's Advanced Features and Best Practices: A Recap from OpenWorld

I was joined by Shan Nawaz of Verisign last Thursday morning for a great session covering best practices and advanced features of Oracle GoldenGate. Despite the rain, people recovering from the previous night's festivities, and a clash with a whole bunch of cloud sessions and demos, we had pretty amazing attendance.

Oracle GoldenGate Advanced Features and Best Practices Session

I kicked off by giving an overview of Oracle Data Integration and a quick crash course on Oracle GoldenGate for the few in the audience not familiar with our technology.

After that, I handed over to Shan, who did a terrific job covering some of the best practices for getting optimal performance out of Oracle GoldenGate. He started off with an image that truly summarizes the difference GoldenGate can make to an enterprise:

Shan then moved on to cover a number of philosophies and techniques that he had used successfully to maximize the potential of Verisign’s investment in Oracle GoldenGate, including:

  • Making sure you understand what the data integration requirements are before you start. What is the primary objective of using GoldenGate? What is the maximum throughput you expect, and maximum latency you can accept? If you don’t know these things, you won’t be able to measure success.
  • Learning when you might need to consider parallel data streams to handle maximum load, and what functionality is available to help you achieve that.
  • Determining the impact of I/O on GoldenGate processes and architecting solutions to minimize this.
  • Improving the performance of the GoldenGate Capture, Pump and Delivery processes and optimal tuning of network components.

This was followed by my Advanced Features portion of the session, in which I highlighted some of the less commonly known aspects of Oracle GoldenGate functionality. I covered:

  • The Event Marker Infrastructure – which allows GoldenGate processes to take specific actions whenever they encounter certain database records
  • Flat File Integration – which adds micro-batch file delivery capabilities to GoldenGate
  • Java Integration – which enables GoldenGate to invoke Java APIs based on database events; and provides capture and delivery of data using a JMS message bus

Finally, I gave a preview of some future (next calendar year) functionality that aims to enable Oracle GoldenGate to directly integrate with Oracle Coherence and Oracle TopLink to allow database changes to immediately trigger cache changes. This should help solve a number of use-cases where cached data may become stale if the underlying data changes.

With so much content, we didn’t have much time for Q&A, but still managed to have quite a lively, if brief, Q&A session. Shan had to leave quickly to catch an international flight, but still graciously answered some in-depth questions and swapped contact information with another customer so they could continue the conversation off-line.

It was a pleasure to present with Shan, and I am looking forward to similar sessions at Oracle OpenWorld 2012.

About

Learn the latest trends, use cases, product updates, and customer success examples for Oracle's data integration products, including Oracle Data Integrator, Oracle GoldenGate and Oracle Enterprise Data Quality.
