Friday Feb 14, 2014

ODI 12c - Table Functions, Parallel Unload to File and More

ODI 12c includes a new component for integrating and transformation data programmatically, there have been plenty of examples through the years of such implementations, recent examples include SQL access to R from Mark Hornick (see an example blog here). As well as a great integration technique they have fantastic performance and scalability options - hence you see posts and talks from Kuassi Mensah on in-database map-reduce; all about leveraging the Oracle database's parallel query engine and the skills you already have (SQL and PLSQL/java).

The table function component in ODI 12c lets you integrate an existing table function implementation into a flow - the parameters for the table function can be scalar or a ref cursor, you can see how the examples from the AMIS posting here are defined within the mapping designer below, there are multiple table functions chained together, used as both a data source and a transformation;

In the above image you can see the table function name defined in the database is specified in the component's general properties (property is Function Name). The signature for the function must be manually defined by adding input/output connector points and attributes. Check the AMIS blog and reflect on the design above.

Regarding performance, one of the examples I blogged (OWB here and ODI here) was parallel unload to file. The table function examples from those previous blogs were fairly rudimentary, in this blog we will see what happens when we tweak the implementation of such functions - we can get much better performance. Here is the table function implementation I will use within the ODI examples (the type definitions used come from the OWB blog post above).

  1. create or replace function ParallelUnloadX (r SYS_REFCURSOR) return NumSet 
  2.    PIPELINED  PARALLEL_ENABLE (PARTITION r BY ANY) is 
  3.    TYPE row_ntt IS TABLE OF VARCHAR2(32767);
  4.    v_rows row_ntt;
  5.    v_buffer VARCHAR2(32767);
  6.    i binary_integer := 0; 
  7.    v_lines pls_integer := 0;
  8.    c_eol CONSTANT VARCHAR2(1) := CHR(10); 
  9.    c_eollen CONSTANT PLS_INTEGER := LENGTH(c_eol); 
  10.    c_maxline CONSTANT PLS_INTEGER := 32767; 
  11.    out utl_file.file_type; 
  12.    filename varchar2(256) := 'dbunload'; 
  13.    directoryname varchar2(256) := 'MY_DIR'; 
  14.    vsid varchar2(120); 
  15. begin 
  16.    select sid into vsid from v$mystat where rownum=1; 
  17.    filename := filename || vsid || '.dat'; 
  18.    out := utl_file.fopen (directoryname, filename , 'w');

  19.    loop 
  20.      fetch r BULK COLLECT INTO v_rows; 
  21.      for i in 1..v_rows.COUNT LOOP
  22.        if LENGTH(v_buffer) + c_eollen + LENGTH (v_rows(i)) <= c_maxline THEN
  23.          v_buffer := v_buffer || c_eol || v_rows(i);
  24.        else
  25.          IF v_buffer IS NOT NULL then
  26.            utl_file.put_line(out, v_buffer);
  27.          end if;
  28.          v_buffer := v_rows(i);
  29.        end if;
  30.      end loop;
  31.      v_lines := v_lines + v_rows.COUNT;
  32.      exit when r%notfound;
  33.    end loop;
  34.    close r;
  35.    utl_file.put_line(out, v_buffer); 

  36.    utl_file.fclose(out); 
  37.    PIPE ROW(i); 
  38.    return ;
  39. end; 
  40. /

The function uses PARALLEL_ENABLE and PARTITION BY keywords - these 2 are critical to performance and scalability. In addition, this function is further optimized; it uses the PLSQL BULK COLLECT capability and also buffers data in PLSQL variables before writing to file (this avoids IO calls). This was not rocket science to tune (plenty of posts on PLSQL IO tuning such as this) yet you can see the impact it has on performance further below.

My mapping using the table function as a target is shown below, 

In the physical design I define the parallel hints, this will then perform parallel unloads to file and you can easily leverage the hardware and power of the Oracle database. Using the hints to tweak the physical design let's the designer very easily compare and tune performance - you do not have to design the parallelism in your own flows.

In the table below you can see the performance difference when I use the PARALLEL(4) hint on a 4 CPU machine;

No Hint PARALLEL(4)
5 million rows  16s  6s
32 million rows 200s  47s 

If I execute the agent based SQL to file LKM, the time taken out of the box is 398 seconds (slower than 47s above when a hint is used) on the 32 million row example, the only divide and conquer techniques with the LKM are building a custom workflow to do such. With the table function approach if your database is on a bigger, more powerful host you can easily take advantage of the system by tweaking the hints.

As you see, the ODI table function component provides another custom exit point in a flow which let's you not only provide some useful integration capabilities but you can also do it in a very efficient manner - leveraging and exploiting the database you are running on. Hopefully this gives you a little insight and an interesting use case or two.

Wednesday Dec 11, 2013

GoldenGate 12c - What is Coordinated Delivery?

In Oracle GoldenGate 12c, the parallel apply is improved with two features: Integrated Delivery and Coordinated Delivery. Different from the Integrated Delivery, Coordinated Delivery is not limited to Oracle Database.This blog discusses the new Coordinated Delivery feature by answering two questions:


  • Why do I need Coordinated Delivery?

  • What is Coordinated Delivery?
  • [Read More]

    Monday Nov 11, 2013

    ODI 12c - Parallel Table Load

    In this post we will look at the ODI 12c capability of parallel table load from the aspect of the mapping developer and the knowledge module developer - two quite different viewpoints. This is about parallel table loading which isn't to be confused with loading multiple targets per se. It supports the ability for ODI mappings to be executed concurrently especially if there is an overlap of the datastores that they access, so any temporary resources created may be uniquely constructed by ODI. Temporary objects can be anything basically - common examples are staging tables, indexes, views, directories - anything in the ETL to help the data integration flow do its job. In ODI 11g users found a few workarounds (such as changing the technology prefixes - see here) to build unique temporary names but it was more of a challenge in error cases.

    ODI 12c mappings by default operate exactly as they did in ODI 11g with respect to these temporary names (this is also true for upgraded interfaces and scenarios) but can be configured to support the uniqueness capabilities. We will look at this feature from two aspects; that of a mapping developer and that of a developer (of procedures or KMs).

    1. Firstly as a Mapping Developer.....

    1.1 Control when uniqueness is enabled

    A new property is available to set unique name generation on/off. When unique names have been enabled for a mapping, all temporary names used by the collection and integration objects will be generated using unique names. This property is presented as a check-box in the Property Inspector for a deployment specification.

    1.2 Handle cleanup after successful execution
    Provided that all temporary objects that are created have a corresponding drop statement then all of the temporary objects should be removed during a successful execution. This should be the case with the KMs developed by Oracle.

    1.3 Handle cleanup after unsuccessful execution

    If an execution failed in ODI 11g then temporary tables would have been left around and cleaned up in the subsequent run. In ODI 12c, KM tasks can now have a cleanup-type task which is executed even after a failure in the main tasks. These cleanup tasks will be executed even on failure if the property 'Remove Temporary Objects on Error' is set.

    If the agent was to crash and not be able to execute this task, then there is an ODI tool (OdiRemoveTemporaryObjects here) you can invoke to cleanup the tables - it supports date ranges and the like.

    That's all there is to it from the aspect of the mapping developer it's much, much simpler and straightforward. You can now execute the same mapping concurrently or execute many mappings using the same resource concurrently without worrying about conflict. 

    2. Secondly as a Procedure or KM Developer.....

    In the ODI Operator the executed code shows the actual name that is generated - you can also see the runtime code prior to execution (introduced in 11.1.1.7), for example below in the code type I selected 'Pre-executed Code' this lets you see the code about to be processed and you can also see the executed code (which is the default view).

    References to the collection (C$) and integration (I$) names will be automatically made unique by using the odiRef APIs - these objects will have unique names whenever concurrency has been enabled for a particular mapping deployment specification. It's also possible to use name uniqueness functions in procedures and your own KMs.

    2.1 New uniqueness tags 

    You can also make your own temporary objects have unique names by explicitly including either %UNIQUE_STEP_TAG or %UNIQUE_SESSION_TAG in the name passed to calls to the odiRef APIs. Such names would always include the unique tag regardless of the concurrency setting.

    To illustrate, let's look at the getObjectName() method. At <% expansion time, this API will append %UNIQUE_STEP_TAG to the object name for collection and integration tables. The name parameter passed to this API may contain  %UNIQUE_STEP_TAG or %UNIQUE_SESSION_TAG. This API always generates to the <? version of getObjectName()

    At execution time this API will replace the unique tag macros with a string that is unique to the current execution scope. The returned name will conform to the name-length restriction for the target technology, and its pattern for the unique tag. Any necessary truncation will be performed against the initial name for the object and any other fixed text that may have been specified.

    Examples are:-
    1. <?=odiRef.getObjectName("L", "%COL_PRFEMP%UNIQUE_STEP_TAG", "D")?>
    2. SCOTT.C$_EABH7QI1BR1EQI3M76PG9SIMBQQ
    1. <?=odiRef.getObjectName("L", "EMP%UNIQUE_STEP_TAG_AE", "D")?>
    2. SCOTT.EMPAO96Q2JEKO0FTHQP77TMSAIOSR_
    Methods which have this kind of support include getFrom, getTableName, getTable, getObjectShortName and getTemporaryIndex. There are APIs for retrieving this tag info also, the getInfo API has been extended with the following properties (the UNIQUE* properties can also be used in ODI procedures);
    • UNIQUE_STEP_TAG - Returns the unique value for the current step scope, e.g. 5rvmd8hOIy7OU2o1FhsF61 Note that this will be a different value for each loop-iteration when the step is in a loop.
    • UNIQUE_SESSION_TAG - Returns the unique value for the current session scope, e.g. 6N38vXLrgjwUwT5MseHHY9
    • IS_CONCURRENT - Returns info about the current mapping, will return 0 or 1 (only in % phase)
    • GUID_SRC_SET - Returns the UUID for the current source set/execution unit (only in % phase)
    The getPop API has been extended with the IS_CONCURRENT property which returns info about an mapping, will return 0 or 1. 

    2.2 Additional APIs

    Some new APIs are provided including getFormattedName which will allow KM developers to construct a name from fixed-text or ODI symbols that can be optionally truncate to a max length and use a specific encoding for the unique tag. It has syntax getFormattedName(String pName[, String pTechnologyCode]) This API is available at both the % and the ? phase.  The format string can contain the ODI prefixes that are available for getObjectName(), e.g. %INT_PRF, %COL_PRF, %ERR_PRF, %IDX_PRF alongwith %UNIQUE_STEP_TAG or %UNIQUE_SESSION_TAG. The latter tags will be expanded into a unique string according to the specified technology. Calls to this API within the same execution context are guaranteed to return the same unique name provided that the same parameters are passed to the call. e.g.
    • <%=odiRef.getFormattedName("%COL_PRFMY_TABLE%UNIQUE_STEP_TAG_AE", "ORACLE")%>
    • <?=odiRef.getFormattedName("%COL_PRFMY_TABLE%UNIQUE_STEP_TAG_AE", "ORACLE")?>
    • C$_MY_TAB7wDiBe80vBog1auacS1xB_AE
    • <?=odiRef.getFormattedName("%COL_PRFMY_TABLE%UNIQUE_STEP_TAG.log", "FILE")?>
    • C2_MY_TAB7wDiBe80vBog1auacS1xB.log

    2.3 Name length generation 

    As part of name generation, the length of the generated name will be compared with the maximum length for the target technology and truncation may need to be applied. When a unique tag is included in the generated string it is important that uniqueness is not compromised by truncation of the unique tag. When a unique tag is NOT part of the generated name, the name will be truncated by removing characters from the end - this is the existing 11g algorithm. When a unique tag is included, the algorithm will first truncate the <postfix> and if necessary  the <prefix>. It is recommended that users will ensure there is sufficient uniqueness in the <prefix> section to ensure uniqueness of the final resultant name.

    SUMMARY

    To summarize, ODI 12c make it much simpler to utilize mappings in concurrent cases and provides APIs for helping developing any procedures or custom knowledge modules in such a way they can be used in highly concurrent, parallel scenarios. 

    Tuesday Mar 12, 2013

    ODI - Compressing/Decompressing Files in Parallel

    Here's a couple of user functions for compressing and decompressing files in parallel, you can control the degree of parallelism, it will compress/decompress all files in one directory and write into another. The number of parallel processes can be configured on invocation. I posted some time back about how ODI User Functions are really useful, this is a great example. What I have here is a couple of user functions you can call in an ODI Procedure or an ODI KM for example and the functions have 3 arguments; the input directory of files to compress/decompress, the output directory where zipped files will be stored or contents extracted to and the number of processes in the pool to process all of the files.

    Below you can see the FileZipper user function used in an ODI procedure to demonstrate how to compress all of the files in a directory d:\inputlogs into a target directory d:\outputzips it uses 4 parallel processes to perform the compression. Obviously the performance is determined on the processors you have available in order to gain maximum benefit. 

    You can download the user functions below, the target output directory must exist;

    1. FileZipper( inputDirectory, outputZipDirectory, numberOfProcesses) user function is here
    2. FileUnzipper( zipDirectory, outputDirectory, numberOfProcesses) user function is here

     

    You can look into the user functions and edit, change and enhance. Let me know what you think. The implementation is java code that uses Java thread pools introduced in Java 5. A great example of leveraging java core capabilities to provide real benefit in your integration processes.

    Wednesday Aug 29, 2012

    Load Plan article in Oracle Magazine

    Timely article in Oracle Magazine on ODI Load Plans from Mark Rittman in the current issue, worth having a quick read of the article and play with the sample which is included if you get the time. Thanks to Mark for investing the time and energy providing such useful information to the community.

    http://www.oracle.com/technetwork/issue-archive/2012/12-sep/o52bi-1735905.html

    Mark goes over the main benefits of the load plan in the article. Interested to hear any creative use cases or comments in general.

    Thursday Jul 14, 2011

    ODI 11g - Parallel, Pipelined Unload to File

    Here we can see how we can leverage table functions as targets in ODI interfaces to do cool stuff! It carries on from my pipelining post. Table functions let you do all kinds of powerful transformations and are a great way for incorporating in-memory custom transformations on sets. The illustration here is using a table function to parallel unload to file, it was initially posted on the OWB blog here.

    There is a viewlet right here that walks you through the creation of the interface.

    There are controls at the bottom if you need to pause, or step forward/backward. 

     

    Wednesday Jun 01, 2011

    Load plans - getting up and running

    To get up and running with load plans there are a few things you'll need to do in order to execute them. One is to setup your agent, if you haven't got one, there is a useful ODI 11g OBE below that explains how to configure a standalone agent;

    http://st-curriculum.oracle.com/obe/fmw/odi/odi_11g/Setup_ODI_Agent/Setup_ODI_Agent.htm

    In ODI 11.1.1.5 there is also a change in the requirements on the database schema you use (for Oracle anyway), in that there is an additional execute privilege needed on DBMS_LOCK which is needed for load plans, see doc below;

    http://download.oracle.com/docs/cd/E21764_01/integrate.1111/e12643/admin_reps.htm#ODIDG156

    If you have not granted this privilege you will get the error 'PLS-00201: identfiier DBMS_LOCK must be declared' like below;

    One you have granted the execute privilege on DBMS_LOCK to the work schema you are good to go and execute the load plans.

    Friday Nov 20, 2009

    Parallel Processing in ODI

    This post assumes that you have some level of familiarity with ODI. The concepts of Packages, Interfaces, Procedures and Scenarios are used here assuming that you understand them in the context of ODI. If you need more details on these elements, please refer to the ODI Tutorial for a quick introduction, or to the complete ODI documentation for detailed information.

    ODI: Parallel Processing

    A common question in ODI is how to run processes in parallel. When you look at a typical ODI package, all steps are described in a serial fashion and will be executed in sequence.

    ParallelPackageSerial.PNG

    However, this same package can parallelize and synchronize processes if needed.

    PARALLEL PROCESSES

    The first piece of the puzzle if you want to parallelize your executions is that a package can invoke other packages once they have been compiled into scenarios (the process of generation of scenarios is described later in this post). You can then have a master package that will orchestrate other scenarios. There is no limit as to how many levels of nesting you will have, as long as your processes are making sense: Your master package invokes a seconday package which, in turn invokes another package...

    When you invoke these scenarios, you have two possible execution modes: synchronous and asynchronous.

    ParallelScenario.PNG

    A synchronous execution will serialize the scenario execution with other steps in the package: ODI executes the scenario, and only after its execution is completed, runs the next step.

    An asynchronous execution will only invoke the scenario but will immediately execute the next step in the calling package: the scenario will then run in parallel with the next step. You can use this option to start multiple scenarios concurrently: they will all run in parallel, independently of one another.

    SYNCHRONIZING PROCESSES

    Once we have started multiple processes in parallel, a common requirement is to synchronize these processes: some steps may run in parallel, but at times we will need all separate threads to be completed before we proceed with a final series of steps. ODI provides a tool for this: OdiWaitForChildSession.

    ParallelSynchronize.PNG

    An interesting feature is that as you start your different processes in parallel, they can each be assigned a keyword (this is just one of the parameters you can set when you start a scenario). When you synchronize the processes, you can select which processes will be synchronized based on a selection of keywords.

    ADDING SCENARIOS TO YOUR PACKAGE FOR PARALLEL PROCESSING

    To add a scenario to your package, simply drag and drop the generated scenario in the package, and edit the execution parameters as needed. In particular, remember to set the execution mode to Asynchronous.

    You can generate a scenario from a package, from an interface, or from a procedure. The last two will be more atomic (one interface or one procedure only per execution unit). The typical way to generate a scenario is to right-click on one of these objects and to select Generate Scenario.

    The generation of scenarios can also be automated with ODI processes that would invoke the ODI tool OdiGenerateAllScen. The parameters of this tool will let you define which scenarios are being generated automatically.

    In all cases, scenarios can be found in the object tree, under the object they were generated from - or in the Operator interface, in the Scenarios tab.

    While you are developing your different objects, keep in mind that you can Regenerate existing scenarios. This is faster than deleting existing ones only to re-create them with the same version number. To re-generate a scenario, simply right-click on the existing version and select Regenerate ... .

    From an execution perspective, you can specify that the scenario you will execute is version -1 (negative one) to ensure that the latest version number is always the one executed. This is a lot easier than editing the parameters with each new release.

    DISPLAYING PARALLEL PROCESSING

    You will notice that as of 10.1.3.4, ODI does not graphically differentiate between serialized and parallelized executions: all are represented in a serial manner. One way to make parallel executions more visible is stack up the objects vertically, versus the more natural horizontal execution for serialized objects. (If we have electricians reading this, the layout will be very familiar to them, but this is only a coincidence...)

    ParallelPackageStackUp.PNG

    OTHER OBJECTS THAN SCENARIOS

    Scenarios are not the only objects that will allow for parallel (or Asynchronous) execution. If you look at the ODI tool OdiOSCommand, you will notice a Synchronous option that will allow you to define if the external component you are executing will run in parallel with the current process, or if it will be serialized in your process. The same is true for the Data Quality tool OdiDataQuality.

    EXECUTION LOGS

    As you will start running more processes in parallel, be ready to see more processes being executed concurrently in the Operator interface. If you are only interested in seing the master processes though, the Hierarchy tab will allow you to limit your view to parent processes. Children processes will be listed under the entry Childres Sessions under each session.

    Likewise, when you access the logs from the web front end, you can view the Parent processes only.

    Enjoy!

    Screenshots were taken using version 10.1.3.5 of ODI. Actual icons and graphical representations may vary with other versions of ODI.

    About

    Learn the latest trends, use cases, product updates, and customer success examples for Oracle's data integration products-- including Oracle Data Integrator, Oracle GoldenGate and Oracle Enterprise Data Quality

    Search

    Archives
    « April 2014
    SunMonTueWedThuFriSat
      
    2
    3
    5
    6
    7
    8
    9
    10
    12
    13
    14
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
       
           
    Today