Tuesday Apr 22, 2014

Fusion Application Incremental Data Migration

Written by Ayush Ganeriwal, Oracle Data Integrator Product Management

In the last post we discussed how Fusion Application uses ODI to transform and move data from their interface tables to the internal tables. In this article we will look into the use case of how Fusion Application uses ODI for extracting data from legacy applications and loading into Interface tables.

Fusion Applications have created a large number of ODI interfaces and packages for addressing this use case. These ODI artifacts are shipped to Fusion Application customers for performing the initial and incremental data migration from their legacy applications to Fusion Applications interface tables. These shipped artifacts can be customized as per the customizations in the customer’s environment. The below diagram depicts the data migration from Siebel to Fusion Customer Relationship Management (CRM). The whole process is performed in two stages. First the data is extracted from the underlying database of Siebel’s production system into staging area on Oracle database and then migrated into Fusion Applications interface tables. ODI is used for both of these operations. Once the initial migration is done then the trickle of change data is replicated and transformed through ODI and Golden gate combination.


Incremental Migration using ODI and Golden Gate

Initial Load

The initial is the bulk data movement process where the snapshot of the data from legacy application is moved into staging area in Oracle database. Depending upon the underlying database type of the legacy application, appropriate ODI Knowledge Module is used in the interfaces to get the best performance. For instance, for the Oracle to Oracle data movement the Knowledge Modules with DBLINK is used to move data natively through DBLINK.

Replication

The replication process takes care of moving data incrementally after the Initial load is complete. Oracle Golden Gate is leveraged for this incremental change data replication which continuously replicates the changes into the staging area. The trickle of change data is then moved from staging area to Fusion Applications interface tables through ODI’s change data capture processes using ODI Journalizing Knowledge Modules.

Thanks again for reading about ODI in the Fusion Applications!  This was the last in a series of three posts.  To review the related posts:  Oracle Data Integrator (ODI) Usage in Fusion Applications and Fusion Application Bulk Import Process.

Wednesday Apr 16, 2014

Learn about Oracle Data Integrator (ODI) Agents

Check out two new ODI A-Team blog posts – all about the Oracle Data Integrator (ODI) Agents! Understand where to install the ODI standalone agent, and find out more about the ODI agent flavors and installation types. Which one(s) make sense for you?

Understanding Where to Install the ODI Standalone Agent

ODI Agents: Standalone, JEE and Colocated 

Happy reading!

Tuesday Apr 15, 2014

Fusion Application Bulk Import Process

Written by Ayush Ganeriwal, Oracle Data Integrator Product Management

In the previous blog post we looked at the Fusion Applications end-to-end bulk data integration use cases. Now let’s take a closer look at the Bulk Import process that transforms and moves data from Interface tables to internal tables. For this use case ODI is bundled along with Fusion Application and get configured transparently by the Fusion Application provisioning process. The entire process is automated and controlled through the Fusion Application User Interface. It also seeds the ODI repository with the Fusion Application specific Models, Interfaces, Procedures and Packages which are then dynamically modified through ODI SDK for any Fusion Application customizations.


Fusion Application Bulk Import Process

The above diagram shows the Bulk import process in Fusion Application where ODI is used for data transformation. Here the Interface tables are the source tables which were populated by other processes before the kicking off the Bulk Import process. The Fusion Application internal tables are the target for these integrations where the data needs to be loaded. These internal tables are directly used for Fusion Application functionalities therefore a number of data validations are applied to load only the good quality data into the internal tables. The data validation errors are monitored and corrected through Fusion Application User Interface. The metadata of Fusion Application tables is not fixed and gets modified as the Application is customized for customer’s requirement. Any change in such source or target tables would require corresponding adjustments in ODI artifacts too and is taken care of by the AppComposer which uses ODI SDK to make such changes in ODI artifacts. If auditing is enabled then any change in the internal table data or the changes in ODI artifacts are recorded in centralized auditing table.

Packaged ODI Artifacts

There are a large number of ODI models, interfaces and packages seeded in the default ODI repository used for Bulk Import. These ODI artifacts are built based upon the base metadata of Fusion Application schema.

Extensibility

As part of the customization, Fusion Application entities are added or modified as per the customer’s requirement. Such customizations result in changes in the underlying Fusion Application’s internal tables and interface tables, and require the ODI artifacts to be updated accordingly. The Fusion Application development team as built the extensibility framework to update ODI artifacts dynamically along with any change in Fusion Application schema. It leverages the ODI-SDK for performing any changes in the ODI repository. The dynamic generation of ODI artifacts is automatically kicked off as part of Patching and Upgrades process. Fusion Application AppComposer User Interface also supports explicitly triggering this process so that administrators can regenerate ODI artifacts whenever they make any customizations.

Validation Error Reporting

The validation errors are populated in intermediate tables and are exposed through BI Publisher so that admin users can correct and recycle these error records.

Auditing

The Fusion Application auditing framework keeps track of the changes performed by each of the users and at what time. There are two levels of auditing captured in Fusion Application audit table for Bulk Import use case. First, metadata changes in ODI artifacts through ODI SDK during customizations. Second, the transactional data changes in the Fusion Application table data as part of ODI interfaces execution. For these purposes the ODI team has exposed some substitution APIs that are used by Fusion Application development team to customize ODI KMs to perform such auditing during the actual data movement.

Provisioning and Upgrade

The provisioning process takes care of install and configuring ODI for the Fusion Application instance.

It takes care of automatically creating ODI repository schemas, configuring topology, setting up ODI agents, setup configurations for ODI –ESS bridge, seeding packaged ODI artifacts, apply modifications to seeded artifacts and create internal users in IDM for external authentication. There is a separate process to apply patches or upgrade the environment to the newer release. Such patching or upgrade processes not only take care of importing newer ODI artifacts but also kick off a CRM extensibility process that modifies ODI artifacts as per the Fusion Application customizations.

External Authentication

There is a dedicated IDM configured with each Fusion Application instance and all Fusion Application components are expected to have their users authenticated through this centralized IDM. For Bulk Import use case ODI is configured with external authentication and there are internal users created in IDM that are used for communication with ODI agent and kicking off ODI jobs.

Enterprise Scheduler Service (ESS) - ODI Bridge

The ODI scenarios are kicked off through ODI-ESS bridges. It is a separate library build for ODI-ESS integration and gets deployed along with Enterprise Scheduler Service (ESS) in Fusion Application environment. It supports both synchronous and asynchronous modes of invocation for ODI jobs. In the asynchronous mode the session status is updated to callbacks to the ESS services. There is a topology editor provided to manage the ESS callback service connectivity exclusively for Fusion Application use cases.

Note: Use of ESS-ODI Bridge is restricted to Fusion Application use case only at the moment.

High Availability

The ODI agent is deployed on Weblogic cluster in the Fusion Application environment to take advantage of ODI high availability capabilities. By default there is only one managed server in the Weblogic cluster created for ODI but as the load increases more managed servers can be added to the cluster to distribute execution of ODI sessions among ODI agent instances in the cluster.

Stay tuned for the last post on this topic coming soon.  This was part two in a series of three posts.  The initial post can be found here.

Friday Apr 11, 2014

ODI 12c - Expression Hints

The ODI 12c mapping designer let's you design a mapping using components and define expressions for each of those components. Like 11g, in 12c there are hints to let ODI know how you would like to compute a physical plan for your design and from this generate code. I think the rules about how some of this work are not known - both in 11g and 12c - people second guess how they think it works. There's no magic to it, let's have a look at it. Underpinning the mapping are the technology definitions, it's here that datatypes are defined and datatype mappings between technologies. This let's ODI be very flexible in support for arbitrary data transformations between systems and how such data and its datatype is mapped across heterogeneous systems.

Putting the heterogeneous nature aside we can look at how datatypes are transformed just in a distributed example for Oracle (the example is for demonstration, in reality the database link LKM will be used which will do no staging). The example has 2 columns in a source table that are both VARCHAR2(30), one of those columns has an actual string, the other has a date. The target system has 2 columns in our target table that are VARCHAR2(10) and DATE. Note the target for one is shorter than its source and the other is a DATE datatype and not a string.

We can define the simple table to table mapping as below and define the expressions on the target table.

By default the expressions have no hint defined and will execute where the table is executed - in this case on the target system. We can see how the C$ table would be defined by previewing the DDL code in the physical design, we will see the type/DDL in the syntax of the particular technology. Below you can see the source datatype information is propagated - the length is still 30.

 If we look at the target table we can see the expressions defined on it, in the example below I have selected the TGTEMP table and I can see the 2 expressions, I could actually change where the expression is defined for this particular physical design, I will not do that though, I will go back to the logical designer and set the hint there - then all of my potential physical designs leverage it.

Use the 'Execute on hint' property for the attribute, you can see the different options there, just now it has value no hint. 

Below I have selected 'Source' to indicate I want the SUBSTR expression executed on the source system. 

After this change has been made, if you look at the physical design you will see that the datatype information on our AP is now different. Because the data has been prepared on the source then the datatype for our C$ column now takes on the definition of the target (VARCHAR2(10)). 

This gives you some idea as to how the hints works for expressions in a datastore. ODI 12c also has an expression component that let's you define groups of expressions. I generally think this is good for when an expression is reused within a mapping, but I know everyone works differently with tools and I have heard that some people like using this for all complex expressions rather than it being defined within a component such as a target table as they can easily 'see' where complex expressions are defined. Each to their own, the good thing is that you can do whatever you like. One benefit with using the expression component is that ODI by default will push that expression component as close to the source as possible and you can easily grab the entire component and push it to the stage, target or wherever.

The mapping below defines the expressions on the expression component, again there are no hints defined on the component or individual expressions.

When the physical design for this mapping is inspected we can see the expression component is by default on the source execution unit. This goes for other components too (filter, join etc.). In this case you can see both of the columns in the AP take on the downstream target table's datatypes (VARCHAR2(10) and DATE).

Changing the hint on the logical design for the expression component to stage will place the expression in the downstream execution unit. If I had just switched the hint to be stage for the expression component then in the physical design the expression would go in TARGET_UNIT_1. In 11g, ODI also supported a concept where the stage was different for the target. This is still available in 12c and is configured by defining what you want to execute in the stage by using these hints plus defining what the stage is (so similar to 11g apart from you don't have to switch tabs and the gestures are more simple). So firstly, define the expression to execute on the stage using that hint. Then on the logical mapping if you click on the canvas background you will see a property named 'Staging Location Hint', you can set this to the logical schema location for the staging area if you have one. By default it is not set as the staging area is the same as the target.

Let's change this to MEMORY_ENGINE just to see what the physical design looks like. We see we now have multiple execution units and the middle one where we executed the expression component is the 'stage' executing on the MEMORY_ENGINE location.

The hints are done on the logical design. You can also hard-wire physical changes in the physical design, I will go into that in a subsequent post but wanted to stick to the logical hints here to demystify how this works. I hope this is some useful background, I think for ODIers from 11g it will help.

Oracle Data Integrator (ODI) Usage in Fusion Applications (FA)

Written by Ayush Ganeriwal, Oracle Data Integrator Product Management

Oracle Data Integrator (ODI) is the bulk data transformation platform for Fusion Applications (FA). ODI is used by Fusion Customer Relationship Management (CRM), Fusion Human Capital Management (HCM), Fusion Supply Chain Management (SCM), Fusion Incentive Compensation (IC) and Fusion Financials family products and many other Fusion Application teams are following suit. Among all these product families CRM is the biggest consumer of ODI leveraging a breadth of ODI features and functionality, out of which some features were developed specifically for Fusion Applications use. Some ODI features they utilize include: ODI SDK, high availability, external authentication, various out of the box and customized Knowledge Modules, ODI-ESS bridge, callbacks to ESS EJBs, auditing, open tools, etc. In this post we will first talk about the different Fusion Application use cases at higher level and then take a closer look at different integration points.

Figure 1 shows data integration need of a typical on-premise Fusion Applications deployment.

  1. Bulk Import: Fusion Applications exposes a set of interface tables as the entry point for data load from any outside source. The bulk import process validates this data and loads it in the internal table which can then be used by the fusion application.
  2. Data Migration: Extracting data from external applications, legacy applications or any other data source and loading it into Fusion Application’s interface table. ODI can be used for such data load.
  3. Preparing Data Files: Converting data into Comma Separated Values (CSV) files that can be imported through Fusion Application’s files import wizard. ODI can be used to extract data into such CSV file.

Figure 1: Data Integration Needs in On-Premise Fusion Application

Figure 2 shows the on-demand or cloud environment requirements, which are slightly different as there is no direct connectivity available to the interface tables.

  1. Bulk Import: Fusion Application exposes a set of interface tables as the entry point for any data load from any outside source. The bulk import process validates this data and then loads it in the internal table which can then be used by the application.
  2. Preparing Data Files: Converting data into CSV files that can be imported through Fusion Application’s files import wizard. ODI can be used for creation on such CSV files.
  3. Uploading data files: The data files are uploaded to the Tenant File repository through either Fusion Application’s File import page or Oracle WebCenter Content Document Transfer Utility. The WebCenter Utility is built using ODI open tool framework allowing orchestrating entire process through ODI package.
  4. Loading Interface Table: Data files to be loaded in the interface tables so that it can be consumed by the Bulk Import process. ODI is used for loading these interface tables.

Figure 2: Data Integration Needs for On-Demand Fusion Application

Stay tuned for more blog posts on this topic coming next week. This was part one in a series of three posts.


Tuesday Apr 01, 2014

ODI 12c - Mapping Builder

A few years ago I posted a utility (see interface builder post here) to build interfaces from driver files, here I have updated it for 12c to build mappings from driver files. The example uses a tab delimited text file to control the mapping creation, but it could be easily taken and changed to drive from whatever you wanted to capture the design of the mapping.

The mapping can be as complete or incomplete as you’d like, so could just contain the objects or could be concise and semantically complete.

The control file is VERY simple and just like ODI requests the minimal amount of information required. The basic format is as follows;So for example the control file below can define the sources, target, joins, mapping expressions etc;

Directive Column2 Column3 Column4 Column5 Column6
 source  <model>  <datastore>  <alias>
 .....can add many
 target  <model>  <datastore>
 filter  <filter_condition>  <filter_datastore_alias>  <alias>
 lookup  <model> <datastore>  <driver_alias> <lookup_condition>  <alias>
 join  <join_condition>  <alias>
 ....can add many of the components above.
 mapping  <column>  <expression>

So for example the control file below can define the sources, target, joins, mapping expressions etc;

  • source SOURCE EMP EMP
  • source SOURCE DEPT DEPT
  • target TARGET_MODEL TGTEMP
  • join EMP.DEPTNO = DEPT.DEPTNO AJOIN
  • filter EMP.SAL > 1 EMP AFILTER
  • lookup SOURCE BONUS EMP BONUS.ENAME = EMP.ENAME ALOOKUP
  • mapping ENAME UPPER(EMP.ENAME) 
  • mapping DEPTNO ABS(DEPT.DEPTNO) 
  • mapping COMM ABS(BONUS.COMM)

When executed, this generates the mapping below with the join, filter, lookup and target expressions from the file;


You should be able to join the dots between the control file sample and the mapping design above. You will need to compile and execute the code in OdiMappingBuilder;

java –classpath <cp> OdinterfaceBuilder jdbc:oracle:thin:@localhost:1521:ora112 oracle.jdbc.OracleDriver ODI_MASTER mypwd WORKREP1 SUPERVISOR myodipwd DEMOS SDK DEMO1 < mymappingcontrolfile.tab

The mapping to be created is passed from the command line. You can intersperse other documentation lines between the control lines so long as the control keywords in first column don’t clash. See the driver file below viewed from within Excel;

Anyway some useful snippets of code for those learning the SDK (download OdiMappingBuilder here), or for those wanting to capture the design outside and generate ODI mappings. Have fun!

Thursday Mar 27, 2014

Interested in presenting and sharing your insights around Data Integration at OpenWorld?

You have been successful in making your organization run smoother. Faster. More cost-effectively. You have come up with the perfect solution to increase your staff retention, speed up your lead to sales pipeline, or minimize your supply management costs. We want to hear your story. Submit your proposal today and share your success at OpenWorld 2014.

Send us a proposal covering your Data Integration success. If it is selected, you will share your idea, experiences, and stories with Oracle customers, developers, and partners from around the world. You will also get a complimentary full pass to the conference.

Learn more about submitting your proposal.

Conference attendees want to hear it straight from you.

Don't wait—proposals must be submitted by April 15!

Wednesday Mar 12, 2014

ODI 12c - Data Input Experts

Back in the olde days of OWB I blogged about a few utilities (see here) that were useful for collecting user input data in custom flows, users build such flows to implement accelerators to take the mundane tasks out of common activities. In ODI you can also use groovy SwingBuilder, this let's you build useful dialogs very easily. I posted some examples such as the one below for model creation in ODI and a launchpad example;

The utilities for OWB I mentioned in the blog are just basic java classes that were invoked from OWB via tcl/jacl. These utilities are written in java and can still be used from ODI via groovy. Still as useful, still as functional. Let's see how we call them now!

The required JARs need to be put on the groovy classpath, which is under the ODI IDE's Tools->Preferences option, and then under ODI->System->Groovy and set the groovy classpath to include jexpert.jar, tcljava.jar and jacl.jar. For example I have the following referencing the JARs from my 11gR2 database which has the OWB code;

  • D:\app\dallan\product\11.2.0\dbhome_1\owb\lib\int\jexpert.jar;D:\app\dallan\product\11.2.0\dbhome_1\owb\lib\int\tcljava.jar;D:\app\dallan\product\11.2.0\dbhome_1\owb\lib\int\jacl.jar

I can then launch the shuttle dialog for example as follows;

  1. import oracle.owb.jexpert.ShuttleObjects
  2. arrayOfString = [ "PRODUCT_ID", "PRODUCT_NAME", "PRODUCT_COLOR", "PRODUCT_DESC", "PRODUCT_LONG_DESC", "CATEGORY_ID", "CATEGORY_NAME", "CATEGORY_DESCRIPTION", "SUBCATEGORY_ID", "SUBCATEGORY_NAME", "SUBCATEGORY_DESCRIPTION" ]
  3. sels = ShuttleObjects.getselection("Select dimension levels", "Select columns to identify levels:", "Columns:", "Levels", (String[]) arrayOfString.toArray())

  4. println sels

I can use the returned variable sels and do whatever ODI stuff I need, you can see the code above executed from within ODI and the dialog appearing with the information;

Likewise the data entry dialog works as is, when that dialog is executed from groovy, just like in OWB we can get the information displayed, the user can enter data, we can collect it and action it in our groovy using the ODI SDK;

The blog on the 12c mapping SDK here has a good SDK reference table that gives you pointers for all parts of the product into the SDK areas. This is definitely a handy one to bookmark, I often use it myself. Learn some scripting it'll help save you are your teams a lot of time.

Friday Mar 07, 2014

Using the Unpivot Component in ODI 12c

As we’ve seen last week the latest ODI 12c patches added several new Mapping Components such as Pivot or Unpivot. In this blog post we will walk you through an example of how to use the new Unpivot Component. 

You can use the following SQL statements to recreate this example in your environment. It will create the source (UNPIVOT_TEST) and target (TRG_UNPIVOT_TEST) tables used in this article in your database then you can reverse engineer them in ODI.

CREATE TABLE unpivot_test (
year NUMBER,
q1_sales NUMBER,
q2_sales NUMBER,
q3_sales NUMBER,
q4_sales NUMBER
);
insert into unpivot_test values (2012, 10.5, 11.4, 9.5, 8.7);
insert into unpivot_test values (2013, 9.5, 10.5, 10.3, 7.6);
commit;
CREATE TABLE trg_unpivot_test (
year NUMBER,
quarter VARCHAR2(255),
sales NUMBER
);

Our goal is to unpivot the data on the Quarter columns when going from UNPIVOT_TEST into TRG_UNPIVOT_TEST as shown below


Follow these steps to add and configure an Unpivot Component in a Mapping:

  1. First add the Source table UNPIVOT_TEST into your Mapping, to do so drag and drop the datastore from the Models into the Mapping
  2. Next add an Unpivot component into the Mapping. This is done by clicking on the Unpivot Component in the Component palette and then clicking on the Mapping diagram. A new UNPIVOT component will appear in the Mapping:

  3. Drag and drop the YEAR column from UNPIVOT_TEST into the UNPIVOT component. There is no need to add the remaining source attributes yet, we will be using them in the Unpivot Transforms section later.

  4. Then click on the UNPIVOT component and select the Attributes panel. We will add 2 new attributes representing the row and value locator attributes: QUARTER (VARCHAR) and SALES (NUMERIC).

  5. Click on the General panel and select QUARTER in the Row Locator drop-down list. We will be performing the unpivot operation using this column.

  6. Next add the transformations to unpivot the data. Click on the Unpivot Transforms panel and create a transformation rule for each of the 4 quarters and match the appropriate QUARTER and SALES values as shown below:

  7. Finally add the Target datastore TRG_UNPIVOT_TEST and link the UNPIVOT component to it. Leave the default settings in the Attributes Matching panel and click OK

  8. In this example you can use the default Physical settings for your Mapping. Integration Type is set to Control Append by default and the IKM Oracle Insert is used
  9. Finally click on Run to execute the Mapping, 8 inserts are performed and you should see the following data in your target table:

  10. If you review the generated code you will notice that ODI leverages the UNPIVOT function on Oracle to perform such operation. The Unpivot component supports Oracle as well as any database

You can recreate the following example using the ODI 12c Getting Started VirtualBox image which is available on OTN: http://www.oracle.com/technetwork/middleware/data-integrator/odi-demo-2032565.html

Friday Feb 28, 2014

Pivoting Data in ODI 12c

We have recently added several new Mapping Components in Oracle Data Integrator 12c such as Pivot or Unpivot. In this blog post we will walk you through an example of how to use the new Pivot Component.

You can use the following SQL statements to recreate this example in your environment. It will create the source (PIVOT_TEST) and target (TRG_PIVOT_TEST) tables used in this article in your database then you can reverse engineer them in ODI.

CREATE TABLE pivot_test (
year NUMBER,
quarter VARCHAR2(255),
sales NUMBER
);

insert into pivot_test values (2012, 'Q1', 10.5);
insert into pivot_test values (2012, 'Q2', 11.4);
insert into pivot_test values (2012, 'Q3', 9.5);
insert into pivot_test values (2012, 'Q4', 8.7);
insert into pivot_test values (2013, 'Q1', 9.5);
insert into pivot_test values (2013, 'Q2', 10.5);
insert into pivot_test values (2013, 'Q3', 10.3);
insert into pivot_test values (2013, 'Q4', 7.6);
commit;

CREATE TABLE trg_pivot_test (
year NUMBER,
q1_sales NUMBER,
q2_sales NUMBER,
q3_sales NUMBER,
q4_sales NUMBER
);

Our goal is to pivot the data on the Quarter column when going from PIVOT_TEST into TRG_PIVOT_TEST as shown below:


Follow these steps to add and configure a Pivot Component in an ODI 12c Mapping:

  1. First add the Source table PIVOT_TEST into your Mapping, to do so drag and drop the PIVOT_TEST datastore from the Models into the Mapping.
  2. Next add a Pivot component into the Mapping. This is done by clicking on the Pivot Component in the Component palette and then clicking on the Mapping diagram. A new PIVOT component will appear in the Mapping:

  3. Drag and drop the YEAR column from PIVOT_TEST into the PIVOT component. There is no need to add the QUARTER and SALES attributes yet, they will be used later in the Row Locator and Attributes sections.

  4. Click on the PIVOT component and in the Properties window select the Row Locator panel. In our example the Row Locator will be the QUARTER column which is transposed from rows into 4 columns in our target table TRG_PIVOT_TEST.

  5. Open up the Expression Editor next to the Row Locator field and select the QUARTER column from our source table PIVOT_TEST. Then click OK.


  6. Now specify the various values the QUARTER column can take. This is done using the Row Locator Values table. Click on the + icon under Row Locator Values and add the 4 possible values: ‘Q1’, ‘Q2’, ‘Q3’ and ‘Q4’.

  7. Then click on the Attributes panel and add the 4 output attributes which correspond to each Row Locator values: Q1_SALES, Q2_SALES, Q3_SALES and Q4_SALES.

  8. Next select a Matching Row for the output attributes you just created. The Matching Row values come from the Row Locator Values entered earlier.
    Pick ‘Q1’ for Q1_SALES, ‘Q2’ for Q2_SALES, ‘Q3’ for Q3_SALES and ‘Q4’ for Q4_SALES.
    Finally enter an expression for each of the new attributes, use PIVOT_TEST.SALES for all of them as we are interested in getting the Sales data into those columns. You can type the expression using the Expression Editor or drag and drop the SALES column from PIVOT_TEST into each of the newly created attributes.

  9. Finally add the target table TRG_PIVOT_TEST and connect the PIVOT component to it. Unselect the Create Attributes on Source checkbox in the Attribute Matching window and click OK to finish the Mapping configuration.

  10. In this example you can use the default Physical settings for your Mapping. Integration Type is set to Control Append by default and the IKM Oracle Insert is used.
  11. Click on Run to execute the Mapping, 2 inserts are performed and you should see the following data in your target table.

  12. If you review the generated code you will notice that ODI leverages the PIVOT function on Oracle to perform such operation. The Pivot component supports Oracle as well as any other database supported by ODI 12c.

You can recreate the following example using the ODI 12c Getting Started VirtualBox image which is available on OTN: http://www.oracle.com/technetwork/middleware/data-integrator/odi-demo-2032565.html

Friday Feb 14, 2014

ODI 12c - Table Functions, Parallel Unload to File and More

ODI 12c includes a new component for integrating and transformation data programmatically, there have been plenty of examples through the years of such implementations, recent examples include SQL access to R from Mark Hornick (see an example blog here). As well as a great integration technique they have fantastic performance and scalability options - hence you see posts and talks from Kuassi Mensah on in-database map-reduce; all about leveraging the Oracle database's parallel query engine and the skills you already have (SQL and PLSQL/java).

The table function component in ODI 12c lets you integrate an existing table function implementation into a flow - the parameters for the table function can be scalar or a ref cursor, you can see how the examples from the AMIS posting here are defined within the mapping designer below, there are multiple table functions chained together, used as both a data source and a transformation;

In the above image you can see the table function name defined in the database is specified in the component's general properties (property is Function Name). The signature for the function must be manually defined by adding input/output connector points and attributes. Check the AMIS blog and reflect on the design above.

Regarding performance, one of the examples I blogged (OWB here and ODI here) was parallel unload to file. The table function examples from those previous blogs were fairly rudimentary, in this blog we will see what happens when we tweak the implementation of such functions - we can get much better performance. Here is the table function implementation I will use within the ODI examples (the type definitions used come from the OWB blog post above).

  1. create or replace function ParallelUnloadX (r SYS_REFCURSOR) return NumSet 
  2.    PIPELINED  PARALLEL_ENABLE (PARTITION r BY ANY) is 
  3.    TYPE row_ntt IS TABLE OF VARCHAR2(32767);
  4.    v_rows row_ntt;
  5.    v_buffer VARCHAR2(32767);
  6.    i binary_integer := 0; 
  7.    v_lines pls_integer := 0;
  8.    c_eol CONSTANT VARCHAR2(1) := CHR(10); 
  9.    c_eollen CONSTANT PLS_INTEGER := LENGTH(c_eol); 
  10.    c_maxline CONSTANT PLS_INTEGER := 32767; 
  11.    out utl_file.file_type; 
  12.    filename varchar2(256) := 'dbunload'; 
  13.    directoryname varchar2(256) := 'MY_DIR'; 
  14.    vsid varchar2(120); 
  15. begin 
  16.    select sid into vsid from v$mystat where rownum=1; 
  17.    filename := filename || vsid || '.dat'; 
  18.    out := utl_file.fopen (directoryname, filename , 'w');

  19.    loop 
  20.      fetch r BULK COLLECT INTO v_rows; 
  21.      for i in 1..v_rows.COUNT LOOP
  22.        if LENGTH(v_buffer) + c_eollen + LENGTH (v_rows(i)) <= c_maxline THEN
  23.          v_buffer := v_buffer || c_eol || v_rows(i);
  24.        else
  25.          IF v_buffer IS NOT NULL then
  26.            utl_file.put_line(out, v_buffer);
  27.          end if;
  28.          v_buffer := v_rows(i);
  29.        end if;
  30.      end loop;
  31.      v_lines := v_lines + v_rows.COUNT;
  32.      exit when r%notfound;
  33.    end loop;
  34.    close r;
  35.    utl_file.put_line(out, v_buffer); 

  36.    utl_file.fclose(out); 
  37.    PIPE ROW(i); 
  38.    return ;
  39. end; 
  40. /

The function uses PARALLEL_ENABLE and PARTITION BY keywords - these 2 are critical to performance and scalability. In addition, this function is further optimized; it uses the PLSQL BULK COLLECT capability and also buffers data in PLSQL variables before writing to file (this avoids IO calls). This was not rocket science to tune (plenty of posts on PLSQL IO tuning such as this) yet you can see the impact it has on performance further below.

My mapping using the table function as a target is shown below, 

In the physical design I define the parallel hints, this will then perform parallel unloads to file and you can easily leverage the hardware and power of the Oracle database. Using the hints to tweak the physical design let's the designer very easily compare and tune performance - you do not have to design the parallelism in your own flows.

In the table below you can see the performance difference when I use the PARALLEL(4) hint on a 4 CPU machine;

No Hint PARALLEL(4)
5 million rows  16s  6s
32 million rows 200s  47s 

If I execute the agent based SQL to file LKM, the time taken out of the box is 398 seconds (slower than 47s above when a hint is used) on the 32 million row example, the only divide and conquer techniques with the LKM are building a custom workflow to do such. With the table function approach if your database is on a bigger, more powerful host you can easily take advantage of the system by tweaking the hints.

As you see, the ODI table function component provides another custom exit point in a flow which let's you not only provide some useful integration capabilities but you can also do it in a very efficient manner - leveraging and exploiting the database you are running on. Hopefully this gives you a little insight and an interesting use case or two.

Wednesday Feb 12, 2014

ODI 12c - Mapping SDK Auto Mapping

The ODI 12c release has the new flow based mapping designer, this comes with new concepts to make the mapping design as efficient as possible as well as the runtime execution of such! The 12c release also has a new SDK for mapping, the 11g SDK is still available for backwards compatibility, but if if you want to properly leverage the 12c release the new mapping designer and SDK is the way forward. I posted a bunch of SDK examples (here and here) which demonstrated different mapping designs - the examples were in groovy and the column/attribute level mapping expressions were all done explicitly, I did not illustrate any auto mapping capabilities. So... I thought I should do it here. In doing so I'll show some other APIs within the mapping area that are very useful.

The 12c release introduced mapping components and categorized such components so that we can minimize the column level mapping expressions. If you compare ODI 12c with OWB, ODI 12c has a lot less inter component and cross component information, OWB capture a lot of information in a very explicit manner (it was very verbose, concise, but verbose).

One of the useful capabilities in the UI is to perform auto mapping, the function createExpressions below will use all available in-scope attributes that are upstream from the target component and match with attributes in the component you are targeting. The match can be done by equality, ends or starts and ignore case or exact match. Quite a simple piece of code and you can see the use of the function getUpstreamLeafAttributes for components or even getUpstreamInScopeAttributes for connector points. Some components have multiple input connector points with different graphs, for example set component, other ones are simple.

  1. enum MatchTypes {EQUALS,SRCENDSWITH, TGTENDSWITH, SRCSTARTSWITH, TGTSTARTSWITH}
  2. enum MatchCaseTypes {MATCH,IGNORECASE}

  3. def createExpressions(component, conPoint, matchType, matchCaseType) { 
  4.   atts = null
  5.   if (conPoint != null)   atts = conPoint.getUpstreamInScopeAttributes()
  6.   else atts = component.getUpstreamLeafAttributes(component)
  7.   tatts = component.getAttributes()
  8.   for (MapAttribute tgt_attr : tatts) {
  9.     attr_str = tgt_attr.getName()
  10.     if (matchCaseType == MatchCaseTypes.IGNORECASE) {
  11.       attr_str = attr_str.toLowerCase()
  12.     }
  13.     sourceCol = null;
  14.     for (MapAttribute src_attr : atts) {
  15.       src_attr_str = src_attr.getName()
  16.       if (matchCaseType == MatchCaseTypes.IGNORECASE) {
  17.        src_attr_str = src_attr_str.toLowerCase()
  18.       }
  19.       if ( (matchType == MatchTypes.SRCENDSWITH && src_attr_str.endsWith( attr_str )) ||
  20.            (matchType == MatchTypes.SRCSTARTSWITH && src_attr_str.startsWith( attr_str )) ||
  21.            (matchType == MatchTypes.TGTSTARTSWITH && attr_str.startsWith( src_attr_str )) ||
  22.            (matchType == MatchTypes.TGTENDSWITH && attr_str.endsWith( src_attr_str )) ||
  23.            (matchType == MatchTypes.EQUALS && attr_str.equals( src_attr_str )) ) {
  24.        sourceCol = src_attr
  25.        break
  26.       }
  27.     }
  28.     if (sourceCol != null && conPoint != null)  tgt_attr.setExpression( conPoint, sourceCol, null )      
  29.     else if (sourceCol != null)  tgt_attr.setExpression( sourceCol )      
  30.   }
  31. }

You can then call this function on a datastore to auto map all attribute expressions in the component as follows;

  • createExpressions(tgtempDatastoreComponent, null,MatchTypes.EQUALS,MatchCaseTypes.MATCH);

To illustrate the set component, you can code the population of each connector point as follows;

  • createExpressions(setComponent, inConnectorPoint1,MatchTypes.EQUALS,MatchCaseTypes.MATCH);
  • createExpressions(setComponent, inConnectorPoint2,MatchTypes.EQUALS,MatchCaseTypes.IGNORECASE);

this will auto map the attributes in a set component for each connector point with different rules (just for illustration purposes). You can see below the result of calling these 2 functions on the set component for each connector point. All upstream in-scope attributes are considered.

These APIs to get scoping attributes make it simple to build customized accelerators for building expressions when auto mapping. Its a little different than in 11g, have a look at the examples I posted above and the snippets above, there's a lot you can do and its easy to utilize.

Wednesday Jan 29, 2014

ODI 12.1.2 Demo on the Oracle BigDataLite Virtual Machine

Oracle's big data team has just announced the Oracle BigDataLite Virtual Machine, a pre-built environment to get you started on an environment reflecting the core software of Oracle's Big Data Appliance 2.4. BigDataLite is a VirtualBox VM that contains a fully configured Cloudera Hadoop distribution CDH 4.5, an Oracle DB 12c, Oracle's Big Data Connectors, Oracle Data Integrator 12.1.2, and other software.

You can use this environment to see ODI 12c in action integrating big data with Oracle DB using ODI's declarative graphical design, efficient EL-T loads, and Knowledge Modules designed to optimize big data integration. 

The sample data contained in BigDataLite represents the fictional Oracle MoviePlex on-line movie streaming company. The ODI sample performs the following two steps:

  • Pre-process application logs within Hadoop: All user activity on the MoviePlex web site is gathered on HDFS in Avro format. ODI is reading these logs through Hive and processes activities by aggregating, filtering, joining and unioning the records in an ODI flow-based mapping. All processing is performed inside Hive map-reduce jobs controlled by ODI, and the resulting data is stored in a staging table within Hive.
  • Loading user activity data from Hadoop into Oracle: The previously pre-processed data is loaded from Hadoop into an Oracle 12c database, where this data can be used as basis for Business Intelligence reports. ODI is using the Oracle Loader for Hadoop (OLH) connector, which executes distributed Map-Reduce processes to load data in parallel from Hadoop into Oracle. ODI is transparently configuring and invoking this connector through the Hive-to-Oracle Knowledge Module.

Both steps are orchestrated and executed through an ODI Package workflow. 

Demo Instructions

Please follow these steps to execute the ODI demo in BigDataLite:

  1. Download and install BigDataLite. Please follow the instructions in the Deployment Guide at the download page
  2. Start the VM and log in as user oracle, password welcome1.
  3. Start the Oracle Database 12c by double-clicking the icon on the desktop.


  4. Start ODI 12.1.2 by clicking the icon on the toolbar.


  5. Press Connect To Repository... on the ODI Studio window. 


  6. Press OK in the ODI Login dialog.


  7. Switch to the Designer tab, open the Projects accordion and expand the projects tree to Movie > First Folder > Mappings. Double-click on the mapping Transform Hive Avro to Hive Staging.


  8. Review the mapping that transforms source Avro data by aggregating, joining, and unioning data within Hive. You can also review the mapping Load Hive Staging to Oracle the same way. 

    (Click image for full size)

  9. In the Projects accordion expand the projects tree to Movie > First Folder > Packages. Double-click on the package Process Movie Data.


  10. The Package workflow for Process Movie Data opens. You can review the package.


  11. Press the Run icon on the toolbar. Press OK for the Run and Information: Session started dialogs. 




  12. You can follow the progress of the load by switching to the Operator tab and expanding All Executions and the upmost Process Movie Data entry. You can refresh the display by pressing the refresh button or setting Auto-Refresh. 


  13. Depending on the environment, the load can take 5-15 minutes. When the load is complete, the execution will show all green checkboxes. You can traverse the operator log and double-click entries to explore statistics and executed commands. 

This demo shows only some of the ODI big data capabilities. You can find more information about ODI's big data capabilities at:


Wednesday Jan 22, 2014

Deep Dive Into Oracle Data Integrator Changed Data Capture Leveraging Oracle GoldenGate

Check out this blog post below from Christophe – first the details related to Oracle Data Integrator’s (ODI) Journalizing Knowledge Modules (JKMs) as well as a deeper dive into the particulars around the seamless out-of-the-box integration between Oracle Data Integrator (ODI) and Oracle GoldenGate.

http://www.ateam-oracle.com/understanding-the-odi-jkms-and-how-they-work-with-oracle-goldengate/

Happy reading!

Thursday Jan 09, 2014

ODI - High performance data movement using Datapump

Great blog post below by Ben from the A-Team on improved Datapump support in ODI. As well as a great write up the KM is posted on the https://java.net/projects/oracledi site too.

http://www.ateam-oracle.com/using-oracle-data-pump-in-oracle-data-integrator-odi/

Feedback appreciated on this.

About

Learn the latest trends, use cases, product updates, and customer success examples for Oracle's data integration products-- including Oracle Data Integrator, Oracle GoldenGate and Oracle Enterprise Data Quality

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
2
3
5
6
7
8
9
10
12
13
14
17
18
19
20
21
23
24
25
26
27
28
29
30
   
       
Today