Monday Aug 26, 2013

ODI - File transformation tool

ODI 11g introduced the complex file technology, which supports defining a parser for data files and plugging it into ODI. I have created an ODI Open Tool (ODIFileTransformation) that uses part of this capability purely for native file to XML file transformation - file to file, driven by the nXSD parser definition. The native input file (COBOL, for example) is parsed according to the nXSD definition and the resultant XML is streamed directly to the output (a disk file, a pipe or wherever). By the way, it's very easy to extend ODI in a number of ways; adding a tool is an easy way to integrate existing capabilities.

Why? Because it gives us different opportunities for consuming the parsed data, which is now in a convenient form (XML). As an illustration, I processed a 200MB data file and the resultant XML file was generated in 120 seconds, whereas using the complex file technology directly in ODI and staging the same native data file in a database took 9000 seconds. There is no staging with the tool: it simply reads the data, transforms it using the nXSD and produces the generated XML. So you could, for example, write the resultant file to a filesystem such as HDFS and then process it using Hadoop.

The tool has the following signature and is useful for running from the command line to perform file to file transformation, or from within a package. This tool was based on a test tool that was blogged here; I changed it to remove the XML DOM approach so that it streams to the result file instead.
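To give a flavour of the streaming approach, here is a minimal sketch using the standard StAX XMLStreamWriter. This is not the tool's actual code (which drives Oracle's nXSD translator); the element names and the two hard-coded records are placeholders for whatever the parser emits. The point is that each record is written straight to the output file as it arrives, so no DOM has to be built in memory.

    import java.io.FileOutputStream;
    import javax.xml.stream.XMLOutputFactory;
    import javax.xml.stream.XMLStreamWriter;

    // Minimal sketch: stream records to an XML file as they are produced; no DOM is built.
    public class StreamingXmlSketch {
        public static void main(String[] args) throws Exception {
            try (FileOutputStream out = new FileOutputStream("result.xml")) {
                XMLStreamWriter writer =
                    XMLOutputFactory.newInstance().createXMLStreamWriter(out, "UTF-8");
                writer.writeStartDocument("UTF-8", "1.0");
                writer.writeStartElement("Records");
                // In the real tool each record would come from the nXSD parser.
                for (String value : new String[] { "row1", "row2" }) {
                    writer.writeStartElement("Record");
                    writer.writeCharacters(value);
                    writer.writeEndElement();
                }
                writer.writeEndElement();   // close Records
                writer.writeEndDocument();
                writer.close();
            }
        }
    }

Because nothing beyond the current record is held in memory, the size of the generated XML is limited only by disk space.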

The tool is on the java.net site. To install it, you will need the Java Mail JAR (download here) and the tool itself (download the zip containing the JAR and source here). Copy the JARs to the ODI userlib directory, then restart the client and register the open tool. The tool could have more options added, including debatching the output into chunked files. Hopefully there are some useful bits and pieces here.

Tuesday Mar 12, 2013

ODI - Compressing/Decompressing Files in Parallel

Here's a couple of user functions for compressing and decompressing files in parallel. You can control the degree of parallelism: each function compresses or decompresses all of the files in one directory and writes the results into another, and the number of parallel processes can be configured on invocation. I posted some time back about how useful ODI User Functions are, and this is a great example. The two user functions can be called from an ODI Procedure or an ODI KM, for example, and each takes three arguments: the input directory of files to compress/decompress, the output directory where the zipped files will be stored or the contents extracted to, and the number of processes in the pool used to process the files.

Below you can see the FileZipper user function used in an ODI procedure to compress all of the files in the directory d:\inputlogs into the target directory d:\outputzips, using 4 parallel processes to perform the compression. Obviously performance depends on the processors you have available, so size the pool accordingly to gain maximum benefit.
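For example, assuming the user function has been imported with the signature shown in the download list below, the procedure's command text would contain something like this (the directories are simply the ones from my scenario):

    FileZipper("d:\inputlogs", "d:\outputzips", 4)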

You can download the user functions below; note that the target output directory must exist:

  1. FileZipper( inputDirectory, outputZipDirectory, numberOfProcesses) user function is here
  2. FileUnzipper( zipDirectory, outputDirectory, numberOfProcesses) user function is here


You can look into the user functions and edit, change and enhance them; let me know what you think. The implementation is Java code that uses the thread pools introduced in Java 5, and it is a great example of leveraging core Java capabilities to provide real benefit in your integration processes.
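If you are curious about the mechanics, the sketch below shows the general idea, assuming one zip file is produced per input file. It is a simplified illustration of the thread pool pattern rather than the exact code in the downloadable user function.

    import java.io.*;
    import java.util.concurrent.*;
    import java.util.zip.*;

    // Sketch: compress every file in inputDir into its own zip in outputDir,
    // using a fixed-size pool of worker threads.
    public class ParallelZipSketch {
        public static void zipAll(String inputDir, String outputDir, int processes) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(processes);
            for (final File f : new File(inputDir).listFiles()) {
                if (!f.isFile()) continue;
                final File target = new File(outputDir, f.getName() + ".zip");
                pool.submit(new Runnable() {
                    public void run() {
                        try (FileInputStream in = new FileInputStream(f);
                             ZipOutputStream out = new ZipOutputStream(new FileOutputStream(target))) {
                            out.putNextEntry(new ZipEntry(f.getName()));
                            byte[] buf = new byte[8192];
                            int len;
                            while ((len = in.read(buf)) > 0) {
                                out.write(buf, 0, len);
                            }
                            out.closeEntry();
                        } catch (IOException e) {
                            throw new RuntimeException(e);
                        }
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS); // wait for all files to finish
        }
    }

Each worker handles a complete file, so the pool size maps directly to how many files are being compressed concurrently at any point in time.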

Tuesday Jun 19, 2012

ODI 11g - Faster Files

Deep in the trenches of ODI development, I raised my head above the parapet to read a few odds and ends and wondered: why don't they know this? Take this article here, for example; in the past customers (see the forum) were told to use a staging route, which has a big overhead for large files. This KM is an example of the great extensibility capabilities of ODI. It's quite simple, just a new KM that:

  1. improves the out of the box experience – just build the mapping and the appropriate KM is used
  2. improves out of the box performance for file to file data movement.

This out-of-the-box handling for File to File data integration cases (available from the 11.1.1.5.2 companion CD onwards) dramatically speeds up file integration. In the past I had seen some consultants write Perl versions of the file to file integration case; now Oracle ships this KM to fill the gap. You can find the documentation for the IKM here. The KM uses pure Java to perform the integration, reading and writing the files with java.io classes in a pipe. It uses Java threading to super-charge the file processing and can process several source files at once when the datastore's resource name contains a wildcard. This is a big step for regular file processing on the way to super-charging big data files using Hadoop; the KM works with the lightweight agent and regular filesystems.
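To give a feel for what the KM does under the covers, here is a hand-written sketch of the same pattern. The directory names, the wildcard and the toUpperCase() stand-in transformation are my own placeholders, not the code the KM actually generates.

    import java.io.*;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.*;
    import java.util.concurrent.*;

    // Sketch: each source file matching a wildcard is read, transformed line by line
    // and written to the target directory on its own worker thread.
    public class FileToFileSketch {
        public static void main(String[] args) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(2); // like the KM's thread count option
            try (DirectoryStream<Path> sources =
                     Files.newDirectoryStream(Paths.get("d:/input"), "orders_*.txt")) {
                for (final Path source : sources) {
                    pool.submit(new Callable<Void>() {
                        public Void call() throws Exception {
                            Path target = Paths.get("d:/output", source.getFileName().toString());
                            try (BufferedReader in = Files.newBufferedReader(source, StandardCharsets.UTF_8);
                                 BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
                                String line;
                                while ((line = in.readLine()) != null) {
                                    out.write(line.toUpperCase()); // stand-in for the mapping's transformations
                                    out.newLine();
                                }
                            }
                            return null;
                        }
                    });
                }
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);
        }
    }

Because each thread processes a complete file end to end, adding threads helps most when there are several source files (or the work can otherwise be split), which is what the wildcard support is for.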

So in my design below, which transforms a bunch of files, the IKM File to File (Java) knowledge module was assigned by default. I pointed the KM at my JDK (since the KM generates and compiles Java), and I also increased the thread count to 2 to take advantage of my 2 processors.

For my illustration I transformed (you can also filter if desired) and moved about 1.3GB with 2 threads in 140 seconds; with a single thread it took 220 seconds, and this was by no means a super computer. The great thing here is that it worked well out of the box from design through execution without any funky configuration, plus, and it's a big plus, it was much faster than before.

So if you are doing any file to file transformations, check it out!

Thursday Oct 13, 2011

Oracle GoldenGate's Advanced Features and Best Practices: A Recap from OpenWorld

I was joined by Shan Nawaz of Verisign last Thursday morning for a great session covering best practices and advanced features of Oracle GoldenGate. Despite the rain, people recovering from the previous night's festivities, and the session coinciding with a whole bunch of cloud sessions and demos, we had pretty amazing attendance.

Oracle GoldenGate Advanced Features and Best Practices Session

I kicked off by giving an overview of Oracle Data Integration and a quick crash course on Oracle GoldenGate for the few in the audience not familiar with our technology.

After that, I handed over to Shan, who did a terrific job covering some of the best practices for getting optimal performance out of Oracle GoldenGate. He started off with an image that truly summarizes the difference GoldenGate can make to an enterprise.

Shan then moved on to cover a number of philosophies and techniques that he had used successfully to maximize the potential of Verisign’s investment in Oracle GoldenGate, including:

  • Making sure you understand what the data integration requirements are before you start. What is the primary objective of using GoldenGate? What is the maximum throughput you expect, and maximum latency you can accept? If you don’t know these things, you won’t be able to measure success.
  • Learning when you might need to consider parallel data streams to handle maximum load, and what functionality is available to help you achieve that.
  • Determining the impact of I/O on GoldenGate processes and architecting solutions to minimize this.
  • Improving the performance of the GoldenGate Capture, Pump and Delivery processes and optimal tuning of network components.

This was followed by my Advanced Features portion of the session, in which I highlighted some of the less commonly known aspects of Oracle GoldenGate functionality. I covered:

  • The Event Marker Infrastructure – which allows GoldenGate processes to take specific actions whenever they encounter certain database records
  • Flat File Integration – which adds micro-batch file delivery capabilities to GoldenGate
  • Java Integration – which enables GoldenGate to invoke Java APIs based on database events; and provides capture and delivery of data using a JMS message bus

Finally, I gave a preview of some future (next calendar year) functionality that aims to enable Oracle GoldenGate to integrate directly with Oracle Coherence and Oracle TopLink, allowing database changes to immediately trigger cache changes. This should help solve a number of use cases where cached data may become stale if the underlying data changes.

With so much content, we didn’t have much time for Q&A, but still managed to have quite a lively, if brief, Q&A session. Shan had to leave quickly to catch an international flight, but still graciously answered some in-depth questions and swapped contact information with another customer so they could continue the conversation off-line.

It was a pleasure to present with Shan, and I am looking forward to similar sessions at Oracle OpenWorld 2012.
