Friday Feb 14, 2014

ODI 12c - Table Functions, Parallel Unload to File and More

ODI 12c includes a new component for integrating and transformation data programmatically, there have been plenty of examples through the years of such implementations, recent examples include SQL access to R from Mark Hornick (see an example blog here). As well as a great integration technique they have fantastic performance and scalability options - hence you see posts and talks from Kuassi Mensah on in-database map-reduce; all about leveraging the Oracle database's parallel query engine and the skills you already have (SQL and PLSQL/java).

The table function component in ODI 12c lets you integrate an existing table function implementation into a flow - the parameters for the table function can be scalar or a ref cursor, you can see how the examples from the AMIS posting here are defined within the mapping designer below, there are multiple table functions chained together, used as both a data source and a transformation;

In the above image you can see the table function name defined in the database is specified in the component's general properties (property is Function Name). The signature for the function must be manually defined by adding input/output connector points and attributes. Check the AMIS blog and reflect on the design above.

Regarding performance, one of the examples I blogged (OWB here and ODI here) was parallel unload to file. The table function examples from those previous blogs were fairly rudimentary, in this blog we will see what happens when we tweak the implementation of such functions - we can get much better performance. Here is the table function implementation I will use within the ODI examples (the type definitions used come from the OWB blog post above).

  1. create or replace function ParallelUnloadX (r SYS_REFCURSOR) return NumSet 
  3.    TYPE row_ntt IS TABLE OF VARCHAR2(32767);
  4.    v_rows row_ntt;
  5.    v_buffer VARCHAR2(32767);
  6.    i binary_integer := 0; 
  7.    v_lines pls_integer := 0;
  8.    c_eol CONSTANT VARCHAR2(1) := CHR(10); 
  9.    c_eollen CONSTANT PLS_INTEGER := LENGTH(c_eol); 
  10.    c_maxline CONSTANT PLS_INTEGER := 32767; 
  11.    out utl_file.file_type; 
  12.    filename varchar2(256) := 'dbunload'; 
  13.    directoryname varchar2(256) := 'MY_DIR'; 
  14.    vsid varchar2(120); 
  15. begin 
  16.    select sid into vsid from v$mystat where rownum=1; 
  17.    filename := filename || vsid || '.dat'; 
  18.    out := utl_file.fopen (directoryname, filename , 'w');

  19.    loop 
  20.      fetch r BULK COLLECT INTO v_rows; 
  21.      for i in 1..v_rows.COUNT LOOP
  22.        if LENGTH(v_buffer) + c_eollen + LENGTH (v_rows(i)) <= c_maxline THEN
  23.          v_buffer := v_buffer || c_eol || v_rows(i);
  24.        else
  25.          IF v_buffer IS NOT NULL then
  26.            utl_file.put_line(out, v_buffer);
  27.          end if;
  28.          v_buffer := v_rows(i);
  29.        end if;
  30.      end loop;
  31.      v_lines := v_lines + v_rows.COUNT;
  32.      exit when r%notfound;
  33.    end loop;
  34.    close r;
  35.    utl_file.put_line(out, v_buffer); 

  36.    utl_file.fclose(out); 
  37.    PIPE ROW(i); 
  38.    return ;
  39. end; 
  40. /

The function uses PARALLEL_ENABLE and PARTITION BY keywords - these 2 are critical to performance and scalability. In addition, this function is further optimized; it uses the PLSQL BULK COLLECT capability and also buffers data in PLSQL variables before writing to file (this avoids IO calls). This was not rocket science to tune (plenty of posts on PLSQL IO tuning such as this) yet you can see the impact it has on performance further below.

My mapping using the table function as a target is shown below, 

In the physical design I define the parallel hints, this will then perform parallel unloads to file and you can easily leverage the hardware and power of the Oracle database. Using the hints to tweak the physical design let's the designer very easily compare and tune performance - you do not have to design the parallelism in your own flows.

In the table below you can see the performance difference when I use the PARALLEL(4) hint on a 4 CPU machine;

5 million rows  16s  6s
32 million rows 200s  47s 

If I execute the agent based SQL to file LKM, the time taken out of the box is 398 seconds (slower than 47s above when a hint is used) on the 32 million row example, the only divide and conquer techniques with the LKM are building a custom workflow to do such. With the table function approach if your database is on a bigger, more powerful host you can easily take advantage of the system by tweaking the hints.

As you see, the ODI table function component provides another custom exit point in a flow which let's you not only provide some useful integration capabilities but you can also do it in a very efficient manner - leveraging and exploiting the database you are running on. Hopefully this gives you a little insight and an interesting use case or two.

Thursday Jul 14, 2011

ODI 11g - Parallel, Pipelined Unload to File

Here we can see how we can leverage table functions as targets in ODI interfaces to do cool stuff! It carries on from my pipelining post. Table functions let you do all kinds of powerful transformations and are a great way for incorporating in-memory custom transformations on sets. The illustration here is using a table function to parallel unload to file, it was initially posted on the OWB blog here.

There is a viewlet right here that walks you through the creation of the interface.

There are controls at the bottom if you need to pause, or step forward/backward. 


Tuesday Jul 05, 2011

ODI 11g - Pipelines

Carrying on from the simple, flexible, powerful posting on the code generation capabilities of ODI 11g which illustrated the SQL as a Source option we can see how additional new code generation components can be defined (which let us capture information about specific transformations) and assemble in a pipeline. This is the kind of info I covered in my ODTUG KScope11 'Pushing the Envelope with ODIEE' talk at Long Beach this year - had a great time meeting everyone there!

The example we will illustrate here is using table functions (could be other transformations such as SQL unpivot) in the Oracle database – table functions are functions that can be queried like a table and support parallelization capabilities along with piping of data which are critical elements for improving performance and leveraging capabilities of the server (the same approach can be used for DB2 table functions or Microsoft SQLServer CROSS APPLY style capabilities). The flow can be constructed by piping SQL and transformations (like table functions) in an assembly like manner by constructing interfaces in a modular manner.

Below is an illustration of a number of interfaces assembled in a pipeline which can leverage the code generation capabilities in ODI 11g for assembling components.

We have seen from other posts how the interfaces can be organized into flows which are resolved into a single SQL (for example) with sub-queries. In this post you will see how transformations such as table functions can be incorporated into a design and the components assembled. The ODI 11g capability to define the SQL subquery generator within a KM opens the door to this flexibility. Table functions themselves are useful for many things;

  • parallelizing arbitrarily complex (PLSQL, SQL PL) code in a SQL manner
  • eliminating intermediate storage
  • SQL-izing the stored procedure

In the illustration below I have a pipelined table function FX which is a temporary target data store process data from datastores T1 and T2. The output of the table function is then joined with T3 and further consumed in target FX_TAB. The temporary datastore FX (representing the table function) has user defined properties set which indicate the inputs and outputs for FX (A and B are inputs and have expressions, X and Y are outputs).
The image has the two interfaces side by side, on the left we have a temporary interface for T1 and T2 joined and providing data for table function FX. On the right, the result of the table function is joined with T3 and written to FX_TAB.

The above illustrates how table functions can be used in pipelined manner with query inputs. The meaning of the columns in the FX temporary target datastore is inferred from the user defined property settings – this lets us distinguish the inputs from the outputs. Below in the KM definition (SQL_TableFunction) we can see how the SQL for the transformation type is generated.

Things to note are that the KM supports both cursor and scalar definitions (so a query can be the input to the table function or alternatively you can provide scalar values such as ODI variables, the option CURSOR_INPUT provides the switch between cursor and scalar). So, importantly table functions can be used as a data source, as transformation pipes and and targets (since there is a step in the KM for this). A good use case for a target table function is when the results of the SQL are being written in parallel to file for example.

In my case I used the cursor approach, so the CURSOR_INPUT option for the SQL_TableFunction KM assignment is set to true (the default);

The cursor option in the KM lets us generate either the following style for the cursor;

…from TABLE ( <tablefunction> ( CURSOR(SELECT <columns> FROM <tables> where <conditions>

or for scalar inputs;

…from TABLE ( <tablefunction> ( <variables> ) )

Simple and flexible. As you can see we can assemble a flow of ODI interfaces into a series of pipes that interconnect leveraging ODIs declarative description for the capabilities that are supported well out of the box and extend to encompass table functions for incorporating pipelining or other transformation capabilities that you can think of. A topical area of assembly style problem solving is the divide and conquer Map-Reduce style problem, check out Jean-Pierre’s example from the Oracle Data Warehousing blog MapReduce + Oracle = Table Functions, we can see how this maps to the components we have described above;

each mapper and reducer is an ODI interface.

Where is the sample KM? It's on the Oracle Code Sample site here;

The example above was illustrated using SQL and table functions to chain pipes together, there are other avenues for creating named pipes between interfaces to do much, much more.


Learn the latest trends, use cases, product updates, and customer success examples for Oracle's data integration products-- including Oracle Data Integrator, Oracle GoldenGate and Oracle Enterprise Data Quality


« April 2014