Wednesday Apr 16, 2014

Learn about Oracle Data Integrator (ODI) Agents

Check out two new ODI A-Team blog posts – all about the Oracle Data Integrator (ODI) Agents! Understand where to install the ODI standalone agent, and find out more about the ODI agent flavors and installation types. Which one(s) make sense for you?

Understanding Where to Install the ODI Standalone Agent

ODI Agents: Standalone, JEE and Colocated 

Happy reading!

Tuesday Apr 15, 2014

Fusion Application Bulk Import Process

Written by Ayush Ganeriwal, Oracle Data Integrator Product Management

In the previous blog post we looked at the Fusion Applications end-to-end bulk data integration use cases. Now let’s take a closer look at the Bulk Import process that transforms and moves data from Interface tables to internal tables. For this use case ODI is bundled along with Fusion Application and get configured transparently by the Fusion Application provisioning process. The entire process is automated and controlled through the Fusion Application User Interface. It also seeds the ODI repository with the Fusion Application specific Models, Interfaces, Procedures and Packages which are then dynamically modified through ODI SDK for any Fusion Application customizations.


Fusion Application Bulk Import Process

The above diagram shows the Bulk import process in Fusion Application where ODI is used for data transformation. Here the Interface tables are the source tables which were populated by other processes before the kicking off the Bulk Import process. The Fusion Application internal tables are the target for these integrations where the data needs to be loaded. These internal tables are directly used for Fusion Application functionalities therefore a number of data validations are applied to load only the good quality data into the internal tables. The data validation errors are monitored and corrected through Fusion Application User Interface. The metadata of Fusion Application tables is not fixed and gets modified as the Application is customized for customer’s requirement. Any change in such source or target tables would require corresponding adjustments in ODI artifacts too and is taken care of by the AppComposer which uses ODI SDK to make such changes in ODI artifacts. If auditing is enabled then any change in the internal table data or the changes in ODI artifacts are recorded in centralized auditing table.

Packaged ODI Artifacts

There are a large number of ODI models, interfaces and packages seeded in the default ODI repository used for Bulk Import. These ODI artifacts are built based upon the base metadata of Fusion Application schema.

Extensibility

As part of the customization, Fusion Application entities are added or modified as per the customer’s requirement. Such customizations result in changes in the underlying Fusion Application’s internal tables and interface tables, and require the ODI artifacts to be updated accordingly. The Fusion Application development team as built the extensibility framework to update ODI artifacts dynamically along with any change in Fusion Application schema. It leverages the ODI-SDK for performing any changes in the ODI repository. The dynamic generation of ODI artifacts is automatically kicked off as part of Patching and Upgrades process. Fusion Application AppComposer User Interface also supports explicitly triggering this process so that administrators can regenerate ODI artifacts whenever they make any customizations.

Validation Error Reporting

The validation errors are populated in intermediate tables and are exposed through BI Publisher so that admin users can correct and recycle these error records.

Auditing

The Fusion Application auditing framework keeps track of the changes performed by each of the users and at what time. There are two levels of auditing captured in Fusion Application audit table for Bulk Import use case. First, metadata changes in ODI artifacts through ODI SDK during customizations. Second, the transactional data changes in the Fusion Application table data as part of ODI interfaces execution. For these purposes the ODI team has exposed some substitution APIs that are used by Fusion Application development team to customize ODI KMs to perform such auditing during the actual data movement.

Provisioning and Upgrade

The provisioning process takes care of install and configuring ODI for the Fusion Application instance.

It takes care of automatically creating ODI repository schemas, configuring topology, setting up ODI agents, setup configurations for ODI –ESS bridge, seeding packaged ODI artifacts, apply modifications to seeded artifacts and create internal users in IDM for external authentication. There is a separate process to apply patches or upgrade the environment to the newer release. Such patching or upgrade processes not only take care of importing newer ODI artifacts but also kick off a CRM extensibility process that modifies ODI artifacts as per the Fusion Application customizations.

External Authentication

There is a dedicated IDM configured with each Fusion Application instance and all Fusion Application components are expected to have their users authenticated through this centralized IDM. For Bulk Import use case ODI is configured with external authentication and there are internal users created in IDM that are used for communication with ODI agent and kicking off ODI jobs.

Enterprise Scheduler Service (ESS) - ODI Bridge

The ODI scenarios are kicked off through ODI-ESS bridges. It is a separate library build for ODI-ESS integration and gets deployed along with Enterprise Scheduler Service (ESS) in Fusion Application environment. It supports both synchronous and asynchronous modes of invocation for ODI jobs. In the asynchronous mode the session status is updated to callbacks to the ESS services. There is a topology editor provided to manage the ESS callback service connectivity exclusively for Fusion Application use cases.

Note: Use of ESS-ODI Bridge is restricted to Fusion Application use case only at the moment.

High Availability

The ODI agent is deployed on Weblogic cluster in the Fusion Application environment to take advantage of ODI high availability capabilities. By default there is only one managed server in the Weblogic cluster created for ODI but as the load increases more managed servers can be added to the cluster to distribute execution of ODI sessions among ODI agent instances in the cluster.

Stay tuned for the last post on this topic coming soon.  This was part two in a series of three posts.  The initial post can be found here.

Friday Apr 11, 2014

ODI 12c - Expression Hints

The ODI 12c mapping designer let's you design a mapping using components and define expressions for each of those components. Like 11g, in 12c there are hints to let ODI know how you would like to compute a physical plan for your design and from this generate code. I think the rules about how some of this work are not known - both in 11g and 12c - people second guess how they think it works. There's no magic to it, let's have a look at it. Underpinning the mapping are the technology definitions, it's here that datatypes are defined and datatype mappings between technologies. This let's ODI be very flexible in support for arbitrary data transformations between systems and how such data and its datatype is mapped across heterogeneous systems.

Putting the heterogeneous nature aside we can look at how datatypes are transformed just in a distributed example for Oracle (the example is for demonstration, in reality the database link LKM will be used which will do no staging). The example has 2 columns in a source table that are both VARCHAR2(30), one of those columns has an actual string, the other has a date. The target system has 2 columns in our target table that are VARCHAR2(10) and DATE. Note the target for one is shorter than its source and the other is a DATE datatype and not a string.

We can define the simple table to table mapping as below and define the expressions on the target table.

By default the expressions have no hint defined and will execute where the table is executed - in this case on the target system. We can see how the C$ table would be defined by previewing the DDL code in the physical design, we will see the type/DDL in the syntax of the particular technology. Below you can see the source datatype information is propagated - the length is still 30.

 If we look at the target table we can see the expressions defined on it, in the example below I have selected the TGTEMP table and I can see the 2 expressions, I could actually change where the expression is defined for this particular physical design, I will not do that though, I will go back to the logical designer and set the hint there - then all of my potential physical designs leverage it.

Use the 'Execute on hint' property for the attribute, you can see the different options there, just now it has value no hint. 

Below I have selected 'Source' to indicate I want the SUBSTR expression executed on the source system. 

After this change has been made, if you look at the physical design you will see that the datatype information on our AP is now different. Because the data has been prepared on the source then the datatype for our C$ column now takes on the definition of the target (VARCHAR2(10)). 

This gives you some idea as to how the hints works for expressions in a datastore. ODI 12c also has an expression component that let's you define groups of expressions. I generally think this is good for when an expression is reused within a mapping, but I know everyone works differently with tools and I have heard that some people like using this for all complex expressions rather than it being defined within a component such as a target table as they can easily 'see' where complex expressions are defined. Each to their own, the good thing is that you can do whatever you like. One benefit with using the expression component is that ODI by default will push that expression component as close to the source as possible and you can easily grab the entire component and push it to the stage, target or wherever.

The mapping below defines the expressions on the expression component, again there are no hints defined on the component or individual expressions.

When the physical design for this mapping is inspected we can see the expression component is by default on the source execution unit. This goes for other components too (filter, join etc.). In this case you can see both of the columns in the AP take on the downstream target table's datatypes (VARCHAR2(10) and DATE).

Changing the hint on the logical design for the expression component to stage will place the expression in the downstream execution unit. If I had just switched the hint to be stage for the expression component then in the physical design the expression would go in TARGET_UNIT_1. In 11g, ODI also supported a concept where the stage was different for the target. This is still available in 12c and is configured by defining what you want to execute in the stage by using these hints plus defining what the stage is (so similar to 11g apart from you don't have to switch tabs and the gestures are more simple). So firstly, define the expression to execute on the stage using that hint. Then on the logical mapping if you click on the canvas background you will see a property named 'Staging Location Hint', you can set this to the logical schema location for the staging area if you have one. By default it is not set as the staging area is the same as the target.

Let's change this to MEMORY_ENGINE just to see what the physical design looks like. We see we now have multiple execution units and the middle one where we executed the expression component is the 'stage' executing on the MEMORY_ENGINE location.

The hints are done on the logical design. You can also hard-wire physical changes in the physical design, I will go into that in a subsequent post but wanted to stick to the logical hints here to demystify how this works. I hope this is some useful background, I think for ODIers from 11g it will help.

Oracle Data Integrator (ODI) Usage in Fusion Applications (FA)

Written by Ayush Ganeriwal, Oracle Data Integrator Product Management

Oracle Data Integrator (ODI) is the bulk data transformation platform for Fusion Applications (FA). ODI is used by Fusion Customer Relationship Management (CRM), Fusion Human Capital Management (HCM), Fusion Supply Chain Management (SCM), Fusion Incentive Compensation (IC) and Fusion Financials family products and many other Fusion Application teams are following suit. Among all these product families CRM is the biggest consumer of ODI leveraging a breadth of ODI features and functionality, out of which some features were developed specifically for Fusion Applications use. Some ODI features they utilize include: ODI SDK, high availability, external authentication, various out of the box and customized Knowledge Modules, ODI-ESS bridge, callbacks to ESS EJBs, auditing, open tools, etc. In this post we will first talk about the different Fusion Application use cases at higher level and then take a closer look at different integration points.

Figure 1 shows data integration need of a typical on-premise Fusion Applications deployment.

  1. Bulk Import: Fusion Applications exposes a set of interface tables as the entry point for data load from any outside source. The bulk import process validates this data and loads it in the internal table which can then be used by the fusion application.
  2. Data Migration: Extracting data from external applications, legacy applications or any other data source and loading it into Fusion Application’s interface table. ODI can be used for such data load.
  3. Preparing Data Files: Converting data into Comma Separated Values (CSV) files that can be imported through Fusion Application’s files import wizard. ODI can be used to extract data into such CSV file.

Figure 1: Data Integration Needs in On-Premise Fusion Application

Figure 2 shows the on-demand or cloud environment requirements, which are slightly different as there is no direct connectivity available to the interface tables.

  1. Bulk Import: Fusion Application exposes a set of interface tables as the entry point for any data load from any outside source. The bulk import process validates this data and then loads it in the internal table which can then be used by the application.
  2. Preparing Data Files: Converting data into CSV files that can be imported through Fusion Application’s files import wizard. ODI can be used for creation on such CSV files.
  3. Uploading data files: The data files are uploaded to the Tenant File repository through either Fusion Application’s File import page or Oracle WebCenter Content Document Transfer Utility. The WebCenter Utility is built using ODI open tool framework allowing orchestrating entire process through ODI package.
  4. Loading Interface Table: Data files to be loaded in the interface tables so that it can be consumed by the Bulk Import process. ODI is used for loading these interface tables.

Figure 2: Data Integration Needs for On-Demand Fusion Application

Stay tuned for more blog posts on this topic coming next week. This was part one in a series of three posts.


Tuesday Apr 01, 2014

ODI 12c - Mapping Builder

A few years ago I posted a utility (see interface builder post here) to build interfaces from driver files, here I have updated it for 12c to build mappings from driver files. The example uses a tab delimited text file to control the mapping creation, but it could be easily taken and changed to drive from whatever you wanted to capture the design of the mapping.

The mapping can be as complete or incomplete as you’d like, so could just contain the objects or could be concise and semantically complete.

The control file is VERY simple and just like ODI requests the minimal amount of information required. The basic format is as follows;So for example the control file below can define the sources, target, joins, mapping expressions etc;

Directive Column2 Column3 Column4 Column5 Column6
 source  <model>  <datastore>  <alias>
 .....can add many
 target  <model>  <datastore>
 filter  <filter_condition>  <filter_datastore_alias>  <alias>
 lookup  <model> <datastore>  <driver_alias> <lookup_condition>  <alias>
 join  <join_condition>  <alias>
 ....can add many of the components above.
 mapping  <column>  <expression>

So for example the control file below can define the sources, target, joins, mapping expressions etc;

  • source SOURCE EMP EMP
  • source SOURCE DEPT DEPT
  • target TARGET_MODEL TGTEMP
  • join EMP.DEPTNO = DEPT.DEPTNO AJOIN
  • filter EMP.SAL > 1 EMP AFILTER
  • lookup SOURCE BONUS EMP BONUS.ENAME = EMP.ENAME ALOOKUP
  • mapping ENAME UPPER(EMP.ENAME) 
  • mapping DEPTNO ABS(DEPT.DEPTNO) 
  • mapping COMM ABS(BONUS.COMM)

When executed, this generates the mapping below with the join, filter, lookup and target expressions from the file;


You should be able to join the dots between the control file sample and the mapping design above. You will need to compile and execute the code in OdiMappingBuilder;

java –classpath <cp> OdinterfaceBuilder jdbc:oracle:thin:@localhost:1521:ora112 oracle.jdbc.OracleDriver ODI_MASTER mypwd WORKREP1 SUPERVISOR myodipwd DEMOS SDK DEMO1 < mymappingcontrolfile.tab

The mapping to be created is passed from the command line. You can intersperse other documentation lines between the control lines so long as the control keywords in first column don’t clash. See the driver file below viewed from within Excel;

Anyway some useful snippets of code for those learning the SDK (download OdiMappingBuilder here), or for those wanting to capture the design outside and generate ODI mappings. Have fun!

Wednesday Mar 12, 2014

ODI 12c - Data Input Experts

Back in the olde days of OWB I blogged about a few utilities (see here) that were useful for collecting user input data in custom flows, users build such flows to implement accelerators to take the mundane tasks out of common activities. In ODI you can also use groovy SwingBuilder, this let's you build useful dialogs very easily. I posted some examples such as the one below for model creation in ODI and a launchpad example;

The utilities for OWB I mentioned in the blog are just basic java classes that were invoked from OWB via tcl/jacl. These utilities are written in java and can still be used from ODI via groovy. Still as useful, still as functional. Let's see how we call them now!

The required JARs need to be put on the groovy classpath, which is under the ODI IDE's Tools->Preferences option, and then under ODI->System->Groovy and set the groovy classpath to include jexpert.jar, tcljava.jar and jacl.jar. For example I have the following referencing the JARs from my 11gR2 database which has the OWB code;

  • D:\app\dallan\product\11.2.0\dbhome_1\owb\lib\int\jexpert.jar;D:\app\dallan\product\11.2.0\dbhome_1\owb\lib\int\tcljava.jar;D:\app\dallan\product\11.2.0\dbhome_1\owb\lib\int\jacl.jar

I can then launch the shuttle dialog for example as follows;

  1. import oracle.owb.jexpert.ShuttleObjects
  2. arrayOfString = [ "PRODUCT_ID", "PRODUCT_NAME", "PRODUCT_COLOR", "PRODUCT_DESC", "PRODUCT_LONG_DESC", "CATEGORY_ID", "CATEGORY_NAME", "CATEGORY_DESCRIPTION", "SUBCATEGORY_ID", "SUBCATEGORY_NAME", "SUBCATEGORY_DESCRIPTION" ]
  3. sels = ShuttleObjects.getselection("Select dimension levels", "Select columns to identify levels:", "Columns:", "Levels", (String[]) arrayOfString.toArray())

  4. println sels

I can use the returned variable sels and do whatever ODI stuff I need, you can see the code above executed from within ODI and the dialog appearing with the information;

Likewise the data entry dialog works as is, when that dialog is executed from groovy, just like in OWB we can get the information displayed, the user can enter data, we can collect it and action it in our groovy using the ODI SDK;

The blog on the 12c mapping SDK here has a good SDK reference table that gives you pointers for all parts of the product into the SDK areas. This is definitely a handy one to bookmark, I often use it myself. Learn some scripting it'll help save you are your teams a lot of time.

Friday Mar 07, 2014

Using the Unpivot Component in ODI 12c

As we’ve seen last week the latest ODI 12c patches added several new Mapping Components such as Pivot or Unpivot. In this blog post we will walk you through an example of how to use the new Unpivot Component. 

You can use the following SQL statements to recreate this example in your environment. It will create the source (UNPIVOT_TEST) and target (TRG_UNPIVOT_TEST) tables used in this article in your database then you can reverse engineer them in ODI.

CREATE TABLE unpivot_test (
year NUMBER,
q1_sales NUMBER,
q2_sales NUMBER,
q3_sales NUMBER,
q4_sales NUMBER
);
insert into unpivot_test values (2012, 10.5, 11.4, 9.5, 8.7);
insert into unpivot_test values (2013, 9.5, 10.5, 10.3, 7.6);
commit;
CREATE TABLE trg_unpivot_test (
year NUMBER,
quarter VARCHAR2(255),
sales NUMBER
);

Our goal is to unpivot the data on the Quarter columns when going from UNPIVOT_TEST into TRG_UNPIVOT_TEST as shown below


Follow these steps to add and configure an Unpivot Component in a Mapping:

  1. First add the Source table UNPIVOT_TEST into your Mapping, to do so drag and drop the datastore from the Models into the Mapping
  2. Next add an Unpivot component into the Mapping. This is done by clicking on the Unpivot Component in the Component palette and then clicking on the Mapping diagram. A new UNPIVOT component will appear in the Mapping:

  3. Drag and drop the YEAR column from UNPIVOT_TEST into the UNPIVOT component. There is no need to add the remaining source attributes yet, we will be using them in the Unpivot Transforms section later.

  4. Then click on the UNPIVOT component and select the Attributes panel. We will add 2 new attributes representing the row and value locator attributes: QUARTER (VARCHAR) and SALES (NUMERIC).

  5. Click on the General panel and select QUARTER in the Row Locator drop-down list. We will be performing the unpivot operation using this column.

  6. Next add the transformations to unpivot the data. Click on the Unpivot Transforms panel and create a transformation rule for each of the 4 quarters and match the appropriate QUARTER and SALES values as shown below:

  7. Finally add the Target datastore TRG_UNPIVOT_TEST and link the UNPIVOT component to it. Leave the default settings in the Attributes Matching panel and click OK

  8. In this example you can use the default Physical settings for your Mapping. Integration Type is set to Control Append by default and the IKM Oracle Insert is used
  9. Finally click on Run to execute the Mapping, 8 inserts are performed and you should see the following data in your target table:

  10. If you review the generated code you will notice that ODI leverages the UNPIVOT function on Oracle to perform such operation. The Unpivot component supports Oracle as well as any database

You can recreate the following example using the ODI 12c Getting Started VirtualBox image which is available on OTN: http://www.oracle.com/technetwork/middleware/data-integrator/odi-demo-2032565.html

Friday Feb 28, 2014

Pivoting Data in ODI 12c

We have recently added several new Mapping Components in Oracle Data Integrator 12c such as Pivot or Unpivot. In this blog post we will walk you through an example of how to use the new Pivot Component.

You can use the following SQL statements to recreate this example in your environment. It will create the source (PIVOT_TEST) and target (TRG_PIVOT_TEST) tables used in this article in your database then you can reverse engineer them in ODI.

CREATE TABLE pivot_test (
year NUMBER,
quarter VARCHAR2(255),
sales NUMBER
);

insert into pivot_test values (2012, 'Q1', 10.5);
insert into pivot_test values (2012, 'Q2', 11.4);
insert into pivot_test values (2012, 'Q3', 9.5);
insert into pivot_test values (2012, 'Q4', 8.7);
insert into pivot_test values (2013, 'Q1', 9.5);
insert into pivot_test values (2013, 'Q2', 10.5);
insert into pivot_test values (2013, 'Q3', 10.3);
insert into pivot_test values (2013, 'Q4', 7.6);
commit;

CREATE TABLE trg_pivot_test (
year NUMBER,
q1_sales NUMBER,
q2_sales NUMBER,
q3_sales NUMBER,
q4_sales NUMBER
);

Our goal is to pivot the data on the Quarter column when going from PIVOT_TEST into TRG_PIVOT_TEST as shown below:


Follow these steps to add and configure a Pivot Component in an ODI 12c Mapping:

  1. First add the Source table PIVOT_TEST into your Mapping, to do so drag and drop the PIVOT_TEST datastore from the Models into the Mapping.
  2. Next add a Pivot component into the Mapping. This is done by clicking on the Pivot Component in the Component palette and then clicking on the Mapping diagram. A new PIVOT component will appear in the Mapping:

  3. Drag and drop the YEAR column from PIVOT_TEST into the PIVOT component. There is no need to add the QUARTER and SALES attributes yet, they will be used later in the Row Locator and Attributes sections.

  4. Click on the PIVOT component and in the Properties window select the Row Locator panel. In our example the Row Locator will be the QUARTER column which is transposed from rows into 4 columns in our target table TRG_PIVOT_TEST.

  5. Open up the Expression Editor next to the Row Locator field and select the QUARTER column from our source table PIVOT_TEST. Then click OK.


  6. Now specify the various values the QUARTER column can take. This is done using the Row Locator Values table. Click on the + icon under Row Locator Values and add the 4 possible values: ‘Q1’, ‘Q2’, ‘Q3’ and ‘Q4’.

  7. Then click on the Attributes panel and add the 4 output attributes which correspond to each Row Locator values: Q1_SALES, Q2_SALES, Q3_SALES and Q4_SALES.

  8. Next select a Matching Row for the output attributes you just created. The Matching Row values come from the Row Locator Values entered earlier.
    Pick ‘Q1’ for Q1_SALES, ‘Q2’ for Q2_SALES, ‘Q3’ for Q3_SALES and ‘Q4’ for Q4_SALES.
    Finally enter an expression for each of the new attributes, use PIVOT_TEST.SALES for all of them as we are interested in getting the Sales data into those columns. You can type the expression using the Expression Editor or drag and drop the SALES column from PIVOT_TEST into each of the newly created attributes.

  9. Finally add the target table TRG_PIVOT_TEST and connect the PIVOT component to it. Unselect the Create Attributes on Source checkbox in the Attribute Matching window and click OK to finish the Mapping configuration.

  10. In this example you can use the default Physical settings for your Mapping. Integration Type is set to Control Append by default and the IKM Oracle Insert is used.
  11. Click on Run to execute the Mapping, 2 inserts are performed and you should see the following data in your target table.

  12. If you review the generated code you will notice that ODI leverages the PIVOT function on Oracle to perform such operation. The Pivot component supports Oracle as well as any other database supported by ODI 12c.

You can recreate the following example using the ODI 12c Getting Started VirtualBox image which is available on OTN: http://www.oracle.com/technetwork/middleware/data-integrator/odi-demo-2032565.html

Friday Feb 14, 2014

ODI 12c - Table Functions, Parallel Unload to File and More

ODI 12c includes a new component for integrating and transformation data programmatically, there have been plenty of examples through the years of such implementations, recent examples include SQL access to R from Mark Hornick (see an example blog here). As well as a great integration technique they have fantastic performance and scalability options - hence you see posts and talks from Kuassi Mensah on in-database map-reduce; all about leveraging the Oracle database's parallel query engine and the skills you already have (SQL and PLSQL/java).

The table function component in ODI 12c lets you integrate an existing table function implementation into a flow - the parameters for the table function can be scalar or a ref cursor, you can see how the examples from the AMIS posting here are defined within the mapping designer below, there are multiple table functions chained together, used as both a data source and a transformation;

In the above image you can see the table function name defined in the database is specified in the component's general properties (property is Function Name). The signature for the function must be manually defined by adding input/output connector points and attributes. Check the AMIS blog and reflect on the design above.

Regarding performance, one of the examples I blogged (OWB here and ODI here) was parallel unload to file. The table function examples from those previous blogs were fairly rudimentary, in this blog we will see what happens when we tweak the implementation of such functions - we can get much better performance. Here is the table function implementation I will use within the ODI examples (the type definitions used come from the OWB blog post above).

  1. create or replace function ParallelUnloadX (r SYS_REFCURSOR) return NumSet 
  2.    PIPELINED  PARALLEL_ENABLE (PARTITION r BY ANY) is 
  3.    TYPE row_ntt IS TABLE OF VARCHAR2(32767);
  4.    v_rows row_ntt;
  5.    v_buffer VARCHAR2(32767);
  6.    i binary_integer := 0; 
  7.    v_lines pls_integer := 0;
  8.    c_eol CONSTANT VARCHAR2(1) := CHR(10); 
  9.    c_eollen CONSTANT PLS_INTEGER := LENGTH(c_eol); 
  10.    c_maxline CONSTANT PLS_INTEGER := 32767; 
  11.    out utl_file.file_type; 
  12.    filename varchar2(256) := 'dbunload'; 
  13.    directoryname varchar2(256) := 'MY_DIR'; 
  14.    vsid varchar2(120); 
  15. begin 
  16.    select sid into vsid from v$mystat where rownum=1; 
  17.    filename := filename || vsid || '.dat'; 
  18.    out := utl_file.fopen (directoryname, filename , 'w');

  19.    loop 
  20.      fetch r BULK COLLECT INTO v_rows; 
  21.      for i in 1..v_rows.COUNT LOOP
  22.        if LENGTH(v_buffer) + c_eollen + LENGTH (v_rows(i)) <= c_maxline THEN
  23.          v_buffer := v_buffer || c_eol || v_rows(i);
  24.        else
  25.          IF v_buffer IS NOT NULL then
  26.            utl_file.put_line(out, v_buffer);
  27.          end if;
  28.          v_buffer := v_rows(i);
  29.        end if;
  30.      end loop;
  31.      v_lines := v_lines + v_rows.COUNT;
  32.      exit when r%notfound;
  33.    end loop;
  34.    close r;
  35.    utl_file.put_line(out, v_buffer); 

  36.    utl_file.fclose(out); 
  37.    PIPE ROW(i); 
  38.    return ;
  39. end; 
  40. /

The function uses PARALLEL_ENABLE and PARTITION BY keywords - these 2 are critical to performance and scalability. In addition, this function is further optimized; it uses the PLSQL BULK COLLECT capability and also buffers data in PLSQL variables before writing to file (this avoids IO calls). This was not rocket science to tune (plenty of posts on PLSQL IO tuning such as this) yet you can see the impact it has on performance further below.

My mapping using the table function as a target is shown below, 

In the physical design I define the parallel hints, this will then perform parallel unloads to file and you can easily leverage the hardware and power of the Oracle database. Using the hints to tweak the physical design let's the designer very easily compare and tune performance - you do not have to design the parallelism in your own flows.

In the table below you can see the performance difference when I use the PARALLEL(4) hint on a 4 CPU machine;

No Hint PARALLEL(4)
5 million rows  16s  6s
32 million rows 200s  47s 

If I execute the agent based SQL to file LKM, the time taken out of the box is 398 seconds (slower than 47s above when a hint is used) on the 32 million row example, the only divide and conquer techniques with the LKM are building a custom workflow to do such. With the table function approach if your database is on a bigger, more powerful host you can easily take advantage of the system by tweaking the hints.

As you see, the ODI table function component provides another custom exit point in a flow which let's you not only provide some useful integration capabilities but you can also do it in a very efficient manner - leveraging and exploiting the database you are running on. Hopefully this gives you a little insight and an interesting use case or two.

Wednesday Feb 12, 2014

ODI 12c - Mapping SDK Auto Mapping

The ODI 12c release has the new flow based mapping designer, this comes with new concepts to make the mapping design as efficient as possible as well as the runtime execution of such! The 12c release also has a new SDK for mapping, the 11g SDK is still available for backwards compatibility, but if if you want to properly leverage the 12c release the new mapping designer and SDK is the way forward. I posted a bunch of SDK examples (here and here) which demonstrated different mapping designs - the examples were in groovy and the column/attribute level mapping expressions were all done explicitly, I did not illustrate any auto mapping capabilities. So... I thought I should do it here. In doing so I'll show some other APIs within the mapping area that are very useful.

The 12c release introduced mapping components and categorized such components so that we can minimize the column level mapping expressions. If you compare ODI 12c with OWB, ODI 12c has a lot less inter component and cross component information, OWB capture a lot of information in a very explicit manner (it was very verbose, concise, but verbose).

One of the useful capabilities in the UI is to perform auto mapping, the function createExpressions below will use all available in-scope attributes that are upstream from the target component and match with attributes in the component you are targeting. The match can be done by equality, ends or starts and ignore case or exact match. Quite a simple piece of code and you can see the use of the function getUpstreamLeafAttributes for components or even getUpstreamInScopeAttributes for connector points. Some components have multiple input connector points with different graphs, for example set component, other ones are simple.

  1. enum MatchTypes {EQUALS,SRCENDSWITH, TGTENDSWITH, SRCSTARTSWITH, TGTSTARTSWITH}
  2. enum MatchCaseTypes {MATCH,IGNORECASE}

  3. def createExpressions(component, conPoint, matchType, matchCaseType) { 
  4.   atts = null
  5.   if (conPoint != null)   atts = conPoint.getUpstreamInScopeAttributes()
  6.   else atts = component.getUpstreamLeafAttributes(component)
  7.   tatts = component.getAttributes()
  8.   for (MapAttribute tgt_attr : tatts) {
  9.     attr_str = tgt_attr.getName()
  10.     if (matchCaseType == MatchCaseTypes.IGNORECASE) {
  11.       attr_str = attr_str.toLowerCase()
  12.     }
  13.     sourceCol = null;
  14.     for (MapAttribute src_attr : atts) {
  15.       src_attr_str = src_attr.getName()
  16.       if (matchCaseType == MatchCaseTypes.IGNORECASE) {
  17.        src_attr_str = src_attr_str.toLowerCase()
  18.       }
  19.       if ( (matchType == MatchTypes.SRCENDSWITH && src_attr_str.endsWith( attr_str )) ||
  20.            (matchType == MatchTypes.SRCSTARTSWITH && src_attr_str.startsWith( attr_str )) ||
  21.            (matchType == MatchTypes.TGTSTARTSWITH && attr_str.startsWith( src_attr_str )) ||
  22.            (matchType == MatchTypes.TGTENDSWITH && attr_str.endsWith( src_attr_str )) ||
  23.            (matchType == MatchTypes.EQUALS && attr_str.equals( src_attr_str )) ) {
  24.        sourceCol = src_attr
  25.        break
  26.       }
  27.     }
  28.     if (sourceCol != null && conPoint != null)  tgt_attr.setExpression( conPoint, sourceCol, null )      
  29.     else if (sourceCol != null)  tgt_attr.setExpression( sourceCol )      
  30.   }
  31. }

You can then call this function on a datastore to auto map all attribute expressions in the component as follows;

  • createExpressions(tgtempDatastoreComponent, null,MatchTypes.EQUALS,MatchCaseTypes.MATCH);

To illustrate the set component, you can code the population of each connector point as follows;

  • createExpressions(setComponent, inConnectorPoint1,MatchTypes.EQUALS,MatchCaseTypes.MATCH);
  • createExpressions(setComponent, inConnectorPoint2,MatchTypes.EQUALS,MatchCaseTypes.IGNORECASE);

this will auto map the attributes in a set component for each connector point with different rules (just for illustration purposes). You can see below the result of calling these 2 functions on the set component for each connector point. All upstream in-scope attributes are considered.

These APIs to get scoping attributes make it simple to build customized accelerators for building expressions when auto mapping. Its a little different than in 11g, have a look at the examples I posted above and the snippets above, there's a lot you can do and its easy to utilize.

Wednesday Jan 22, 2014

Deep Dive Into Oracle Data Integrator Changed Data Capture Leveraging Oracle GoldenGate

Check out this blog post below from Christophe – first the details related to Oracle Data Integrator’s (ODI) Journalizing Knowledge Modules (JKMs) as well as a deeper dive into the particulars around the seamless out-of-the-box integration between Oracle Data Integrator (ODI) and Oracle GoldenGate.

http://www.ateam-oracle.com/understanding-the-odi-jkms-and-how-they-work-with-oracle-goldengate/

Happy reading!

Thursday Jan 09, 2014

ODI - High performance data movement using Datapump

Great blog post below by Ben from the A-Team on improved Datapump support in ODI. As well as a great write up the KM is posted on the https://java.net/projects/oracledi site too.

http://www.ateam-oracle.com/using-oracle-data-pump-in-oracle-data-integrator-odi/

Feedback appreciated on this.

Friday Jan 03, 2014

ODI 12c - Components and LKMs/IKMs

Here I'd like to illustrate some nice capabilities of ODI 12c's knowledge module framework in combination with the new component based mapper. Some of this was prompted from recent questions from Holger Friedrich (on migration from OWB..'where is delete'), Stewart Bryson and Mark Rittman (on components and KMs here) and a general nagging thought I had that people were generally unaware of the capabilities. There's a lot more we can illustrate and talk about in this area, so this is a little taster...

Customized loading, integration and error management can be defined using knowledge modules. There is a text based substitution library that makes building such knowledge modules very easy - especially with the abundance of examples - plus the substitution reference documentation is a good reference guide (intro here, reference here). This can be used in harmony with ODI 12c components. In 12c we have modularized a lot and introduced components (components are logical and describe the WHAT of the transformation, component KMs describe the HOW, just like regular KMs), also we have formalized some parts of the framework but it is still as open as ever.

In supporting the odiRef substitution library we have chosen a similar approach to how the sub-select worked in 11g. If you want to build your own integration you can use the odiRef methods to get the target shape, details, connection etc. The source for the target, just like in 11g is provided via the odiRef methods also and may be rolled into the odiRef.getFrom method.

To illustrate, if you want to perform an integration to do a delete operation, then the following snippet is sufficient - in 11g or 12c (the code in red is boiler plate). In ODI 12c you can now have an arbitrary mapping graph leading up to the target;

  1. DELETE FROM <%=odiRef.getTable("L","TARG_NAME","A")%> T
  2. WHERE (<%=odiRef.getTargetColList("", "[COL_NAME]", ", ", "\n", "UK")%> )
  3. IN
  4. (
  5. select * from (
  6.  <%for (int i=odiRef.getDataSetMin(); i <= odiRef.getDataSetMax(); i++){%>
  7.  <%=odiRef.getDataSet(i, "Operator")%>
  8.    SELECT  <%=snpRef.getColList(i,"", "\t[COL_NAME]", ",\n", "", "UK")%>
  9.    FROM <%=odiRef.getFrom(i)%>
  10.    WHERE (1=1) <%=snpRef.getJoin(i)%> <%=snpRef.getFilter(i)%> <%=snpRef.getGrpBy(i)%> <%=snpRef.getHaving(i)%>
  11.  <%}%>
  12. ) S
  13. )

For example in the mapping below, I am using the subquery filter component to identify some rows that I then want to DELETE from the CONTACT_LIST target. The integration type property on target datastores was introduced in 12c and supports a set number of integrations to help filter the IKMs, if you select None, you can pick any IKM applicable for the technology.

If you look at the physical design for the above mapping you can now pick the IKM which performs the delete, the subquery filter also has a component KM which produces code, this code is rolled into the odiRef methods in the template. This then lets arbitrary map designs to be created and still build customized KMs for loading, integrating and error management. The odiRef substitution methods are simple text based APIs for producing text based on simple primitives.

This then produces the following SQL statement to perform the DELETE DML, our IKM has no knowledge of how to construct subquery filter, pivot or any other complex transformation - the code has been modularized. The code produced is below;
  1. DELETE FROM OE.CONTACT_LIST T WHERE (CUSTOMER_ID)
  2. IN (
  3. SELECT   CUSTOMER_ID
  4. FROM OE.ADDRESSES ADDRESSES
  5. WHERE (1=1)   and ( ( ADDRESSES.CUSTOMER_ID ,  ADDRESSES.PHONE_NUMBER )
  6.                = SOME  (  
  7. SELECT 
  8.   CUSTOMERS.CUSTOMER_ID  CUSTOMER_ID ,   CUSTOMERS.CUST_FIRST_NAME  PHONE_NUMBER  
  9. FROM
  10.  OE.CUSTOMERS CUSTOMERS  
  11.  WHERE  CUSTOMERS.MARITAL_STATUS = 'married' AND CUSTOMERS.CREDIT_LIMIT > 1000  AND  ADDRESSES.CUSTOMER_ID=CUSTOMERS.CUSTOMER_ID 
  12.     )
  13.    )  
  14. )

Another useful illustration of things working in harmony is related to the improved code generation for Oracle connectivity - the basics of database links have been greatly improved (no more requirements for view on source, but still supported). The framework change in order to support this will have more use cases down the line too. Now you can use the 12c database link LKMs to pull data from tables over a database link and leverage custom IKMs on the target. One of the 11g issues was related to the requirements the 11g Oracle to Oracle database link LKM placed on users, this has been greatly improved, you can now take advantage of this, plus use existing IKMs, build new ones etc.

The mapping above uses new 12c LKMs to access remote Oracle datastores and a customized IKM to illustrate the mix and match capabilities. 

Hopefully this gives you some background and insight into the ODI 12c mapping and knowledge module capabilities that you weren't aware of. Looking forward to any questions and additional ideas, insights that you have.

Wednesday Dec 18, 2013

ODI 12c - Temporal Data Loading

The temporal validity feature in 12c of the Oracle Database is a great feature for any time based data - see the 12c database OBE here for a demo of the database capability. If you are thinking dimensional data that varies over time.... the temporal validity capabilities of 12c are a great fit, worth checking it out. I wanted to post this to stir discussion in this area, this uses some customized KMs to integrate data into a database table that supports the temporal validity capabilities of the 12c database. This gives a few benefits - one is with respect to querying, you can easily query the current view of the world without adding filters on numerous tables and so on, you can also query back in time - again without having such filters messing up semantics of your logical query.

  1. create table DIM_CUSTOMER (
  2. KEY NUMBER NOT NULL,
  3. H_BID VARCHAR2(5) NULL,
  4. H_ADDR VARCHAR2(20) NULL,
  5. H_ZIP VARCHAR2(5) NULL,
  6. C_BID VARCHAR2(5) NULL,
  7. C_MSTAT VARCHAR2(10) NULL,
  8. C_TELE VARCHAR2(10) NULL
  9. );
  10. alter table DIM_CUSTOMER add period for CURRENT_IND;  

The data tables for your slowly changing dimension tables probably already have start/end timestamps, the above example shows how temporal information can be added to a table, adding the temporal validity capability is as simple as an 'ALTER TABLE' on the existing table. If we then try and reverse this in ODI, the columns to support temporal validity will not be reversed as they are hidden and the RKM Oracle shipped with ODI uses the ALL_TAB_COLUMNS dictionary view which does not project hidden columns. Creating a new RKM Oracle (Inc Hidden) lets us reverse engineer such tables into ODI. You can download my modified RKM here. Using this lets me reverse the datastore above and see the following definition in ODI (the RKM uses all_tab_cols rather than all_tab_columns etc so I can see hidden columns);


You can see the columns CURRENT_IND_START, CURRENT_IND_END and CURRENT_IND have been defined on the table, this was as a result of adding the temporal validity to the table in the ALTER TABLE statement above. I have defined the SCD tags for starting timestamp, ending timestamp and current record flag on the attributes. Note I can also define how new records are versioned, I define that some columns will simply have their values overwritten and other columns will trigger new rows to be added. I have also defined a surrogate key - this is optional, so you do not need to create such annotations to use this IKM.

The data is then loaded and I can specify the start/end timestamps for the row when it is loaded. The IKM will load the data, updating data marked as overwrite and inserting new rows where trigger data has changed.

The IKM is almost identical to the IKMs for slowly changing dimensions with some minor changes - ODI in the slowly changing KM will manage the current row indicator, with Oracle this is a virtual column so is computed. The population of that column has been removed from the KM plus a few other places it was used. I called it IKM Oracle Temporal Data, you can download it here.

 If I load my data then change the status from Married to Single and rerun, then I will get the following rows loaded, note I am querying the surrogate key also since I supplied that metadata (this is optional);

  1. select key, c_mstat,
  2. to_char(current_ind_start,'dd-mon-yyyy') "Start",
  3. to_char(current_ind_end,'dd-mon-yyyy') "End"
  4. from scott.dim_customer;
  5.        KEY C_MSTAT, Start       End               
  6.         11 Married 18-dec-2013 18-dec-2013
  7.         12 Single 18-dec-2013 01-jan-2400

A new surrogate key is generated for the versioned row. If I define only the current time to be used, then the database will automatically filter my data;

  1. exec dbms_flashback_archive.enable_at_valid_time('CURRENT');

  2. select key, c_mstat,
  3.   to_char(current_ind_start,'dd-mon-yyyy') "Start",
  4.   to_char(current_ind_end,'dd-mon-yyyy') "End"
  5. from scott.dim_customer;
  6.        KEY C_MSTAT Start End     
  7.         12 Single 18-dec-2013 01-jan-2400

I can also query as of a particular time without impacting my query

  1. exec dbms_flashback_archive.enable_at_valid_time('ASOF','18-DEC-13 08.28.30.771994 AM');

  2. select key, c_mstat,
  3.   to_char(current_ind_start,'dd-mon-yyyy') "Start",
  4.   to_char(current_ind_end,'dd-mon-yyyy') "End"
  5. from scott.dim_customer;
  6.        KEY C_MSTAT Start End
  7.         11 Married 18-dec-2013 18-dec-2013

Imagine without this, on the query side you will have to have lots of conditional queries based on time for each object that varies by time. With this approach you define the time window and query. Simple. The KMs help you easily load data into such tables too, you can see how flexible ODI is with respect to providing new integration patterns to take advantage of the greatest and latest features. The KMs used in this blog post have been posted on the java.net site to stir ideas and further discussion, I'm interested to hear what you think.

Tuesday Dec 17, 2013

ODI 12c - Slowly Changing Dimensions

Here we see how to setup a slowly changing dimension load in ODI 12c, everything from defining the metadata on the datastore to loading the data. I illustrated how this was done in ODI 11g here, for ODI 12c this has been simplified, the ODI 12c viewlet here provides a very quick look at setting up a datastore for supporting slowly changing dimension data loading. It uses the IKM Slowly Changing Dimension and shows dimension members being versioned when for example a marital status change happens.

The datastore editor has been enhanced to be able to quickly annotate attributes in the datastore with SCD metadata - information such as the surrogate key attribute, or the attribute that triggers history.

This used to be a particular pain to setup, so now you can quickly define this information and also get a convenient display/overview of the attributes and what they mean and are used for. Check out the viewlet above, see how it hangs together. I plan on illustrating thus further, so we have seen how you can load slowly changing dimensions for a data warehouse, we can take this even further and use the temporal validity feature of the Oracle 12c database - how you load temporal data, what does the KM give you etc. 

About

Learn the latest trends, use cases, product updates, and customer success examples for Oracle's data integration products-- including Oracle Data Integrator, Oracle GoldenGate and Oracle Enterprise Data Quality

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
2
3
5
6
7
8
9
10
12
13
14
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today