Tuesday Aug 04, 2015

Simplicity in Leveraging Oracle Data Integrator for Cloud Applications

Check out last week’s A-Team Blog post… A Universal Cloud Applications Adapter for ODI

Learn about the simplicity of leveraging Oracle Data Integrator (ODI) with all emerging technologies in the world of cloud computing!

For more A-Team reads on ODI, browse through the A-Team Chronicles.

Tuesday Jun 16, 2015

ODI - Hive DDL Generation for Parquet, ORC, Cassandra and so on

Here you'll see how with some small surgical extensions we can use ODI to generate complex integration models in Hive for modelling all kinds of challenges;

  • integrate data from Cassandra, or any arbitrary SerDe
  • use Hive Parquet or ORC storage formats
  • create partitioned, clustered tables
  • define arbitrarily complex attributes on tables

I'm very interested in hearing what you think of this, and what is needed/useful over and above this (RKMs etc.).

When you use this technique to generate your models you can benefit from one of ODI's lesser known but very powerful features - the Common Format Designer (see here for nice write up). With this capability you can build models from other models, generate the DDL and generate the mappings or interfaces to integrate the data from the source model to the target. This gets VERY interesting in the Big Data space since many customers want to get up and running with data they already know and love.

What do you need to get there? The following basic pieces;

  • a Hive custom create table action here.
  • flex fields for the external table indicator, table data format and attribute type metadata - get the groovy script here

Pretty simple, right? If you take the simple example on the Confluence wiki;

  CREATE TABLE parquet_test (
     id INT,
     str STRING,
     mp MAP<STRING,STRING>,
     lst ARRAY<STRING>,
     strct STRUCT<A:STRING,B:STRING>)
  PARTITIONED BY (part STRING)
  STORED AS PARQUET;

...and think about how to model this in ODI, you'll think: how do I define the Parquet storage? How can the complex types be defined? That's where the flex fields come in - you can define any of the custom storage properties you wish in the datastore's Data Format flex field, then define the complex type formats.

In ODI the Parquet table above can be defined with the Hive datatypes as below; we don't capture the complex fields within attributes mp, lst or strct. The partition column 'part' is added into the regular list of datastore attributes (like Oracle, unlike Hive DDL today);

Using the flex field Data Format we can specify the STORED AS PARQUET info; the table is not external to Hive, so we do not specify EXTERNAL in the external data flex field;


This value can have lots of ODI goodies in it which makes it very versatile - you can use ODI variables (for example a value such as STORED AS #MY_PROJECT.STORAGE_FORMAT, where the variable name is purely illustrative), which will be resolved when the generated procedure containing the DDL is executed, plus you can use <% style code substitution.

The partitioning information is defined for the part attribute, the 'Used for Partitioning' field is checked below;

Let's take stock: we have our table defined as Parquet and we have the partitioning info defined. Now we have to define the complex types. This is done on attributes mp, lst and strct using the creatively named Metadata field - below I have fully defined the MAP, STRUCT and ARRAY for the respective attributes;

Note the struct example escapes the ':' because the code is executed via the JDBC driver when running through the agent, and ':' is the way of binding information to a statement execution (you can see the generated DDL further below with the ':' escaped). For example, based on the DDL above, the Metadata values would be MAP<STRING,STRING> for mp, ARRAY<STRING> for lst and STRUCT<A:STRING,B:STRING> for strct, with the ':' characters escaped as described.

With that we are complete, we can now generate DDL for our model;

This brings up a dialog with options for where to store the procedure with the generated DDL - this has actually done a base compare against the Hive system and allowed me to select which datastores to create DDL for, or which DDL to create; I have checked the parquet_test datastore;

Upon completion, we can see the procedure which was created. We can choose to further edit and customize this if so desired, and we can execute it immediately or schedule it for whenever we want.

Inspecting the Create Hive Table action, the DDL looks exactly like what we were after when we first started - the datastore schema / name will be resolved upon execution using ODI's runtime context - which raises another interesting point! We could have used some ODI substitution code (<% style) when defining the storage format, for example;

As I mentioned earlier, you can take this approach to capture all sorts of table definitions for integrating. If you want to look at the inside of how this is done, check the content of the Hive action in the image below (I linked the code of the action above); it should look very familiar if you have seen any KM code and it uses the flex fields to inject the relevant information into the DDL;

This is a great example of what you can do with ODI and how to leverage it to work efficiently. 

There is a lot more that can be illustrated here, including what the RKM Hive can do or be modified to do. Perhaps creating the datastore and complex type information from JSON, a JSON schema, an Avro schema or Parquet would be very useful? I'd be very interested to hear your comments and thoughts. So let me know....

Tuesday Jun 09, 2015

Oracle Data Integrator Journalizing Knowledge Module for GoldenGate Integrated Replicat Blog from the A-Team

As always, useful content from the A-Team…

Check out the most recent blog about how to modify the out-of-the-box Journalizing Knowledge Module for GoldenGate to support the Integrated Replicat apply mode.

An Oracle Data Integrator Journalizing Knowledge Module for GoldenGate Integrated Replicat

Enjoy!

Tuesday May 12, 2015

ODI 12c - Improving Usability of KM recipes

This post is all about reducing user errors and improving usability surrounding the definition of Knowledge Modules and their usage. Knowledge Modules are all about encapsulating recipes - every great cookbook has lots of recipes and some are based on common techniques and ingredients. ODI Knowledge Modules are data integration recipes - they define how to access, transform and store information based on the directions in the KM. There are a few usability improvements in the recent 12.1.3.0.1 release around both the KM definition and the usage of the KM that make for an improved user experience. I've seen many KMs over the years where it's many things to many people and there is a bundle of options that expose all facets for every possible path in the KM - the user has to read the description and follow the instructions.

The first improvement I'll mention is the KM (and procedure) option type of 'Choice'. Not exactly rocket science here I know, but an addition that greatly helps usage of a KM that may do more than one thing. Let's take the example of a KM that can make different ..... pizzas. In the past you would have an option field which was a string-based value where the user would type either margherita or pepperoni to drive a path within the KM implementation; users of the KM would have to know that those were the accepted option values and they'd have to type them in properly (otherwise things would go wrong). So now the options can be specified as the 'Choice' type - see below where in the IKM we capture the recipe type as a choice.

The choices can be defined in the default value field; below, the recipe is going to create either a margherita or a pepperoni pizza - these are the only two choices and the default is margherita;

Then I can define all the rest of the options: let's say every pizza needs flour, oil, salt and yeast; pepperoni needs... pepperoni of course, and margherita needs tomatoes and basil - so some of the options are applicable to both types and some are only applicable to a specific one. Prior to this release, when the KM was used you would see all of these option values and you'd be reading the description 'only set basil if you are making margherita' and so on. Another feature has been added to improve this area. Below you can see all of the options....

One column was snipped out of the image - the condition expression. This is a groovy expression to determine whether the option is shown. So now we can say only display basil when margherita pizza is the recipe type or only display pepperoni when pepperoni is the recipe type. We see below the options only applicable to the recipe type are displayed - anything common has no condition expression.

The groovy snippet must return a string. The string must be of the format show=true|false
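For illustration only, here is a minimal standalone Groovy sketch of that contract; how the selected recipe type actually reaches the real condition expression is not shown here (the recipeType parameter is just a stand-in), the point is simply the show=true|false return value:

  // Minimal sketch: the expression must evaluate to "show=true" or "show=false".
  // recipeType stands in for however the selected option value is made available.
  def showWhenMargherita(String recipeType) {
      return recipeType.equalsIgnoreCase("Margherita") ? "show=true" : "show=false"
  }

  assert showWhenMargherita("Margherita") == "show=true"
  assert showWhenMargherita("Pepperoni")  == "show=false"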

When you see the KM assigned in the mapping it becomes a little clearer. Below you can see the choice box, the user is constrained to pick one of those types;

When margherita is selected above, remember some options were for margherita and some were for pepperoni; we see only the applicable subset of options;

Above you can see tomatoes and basil; if you change the type to pepperoni the above options are hidden and pepperoni is displayed as below;

This helps guide the user into configuration options that are more applicable to a path within the KM. One of the other visual enhancements is the ability to group options together. We can add all of the options above into a group named 'Ingredients' that helps visually group related options together;

Then when the KM is used you see the options related to ingredients grouped together where the KM is assigned.

You can see how these help improve the usability of KMs in ODI and help reduce errors by further specializing how data is entered and related in the configuration options of the KM. The tasks within the KM can retrieve the option values and perform conditional logic based on those values. There are some other areas around this, but that's all for now. The functionality described here is available in the 12.1.3.0.1 release.

Friday Apr 10, 2015

This Week's A-Team Blog Speaks to Automating Changes after Upgrading ODI or Migrating from Oracle Warehouse Builder

The A-Team not only provides great content, they are humorous too!

Check out this week’s post, the title says it all: Getting Groovy with Oracle Data Integrator: Automating Changes after Upgrading ODI or Migrating from Oracle Warehouse Builder

The article covers various scripts, written in Groovy and leveraging the ODI SDK, that assist in automating massive changes to one's repository. These initially came about as a result of a customer's desire to enhance their environment in their effort to move from Oracle Warehouse Builder (OWB) to Oracle Data Integrator (ODI), but in the end came the realization that these scripts could be used by any ODI user.

Happy reading!

Thursday Apr 09, 2015

ODI, Big Data SQL and Oracle NoSQL

Back in January Anuj posted an article here on using Oracle NoSQL via the Oracle database Big Data SQL feature. In this post, which I guess you could call part 2 of Anuj's, I will follow up with how the Oracle external table is configured and how it all hangs together, both with manual code and via ODI. For this I used the Big Data Lite VM and also the newly released Oracle Data Integrator Big Data option. The BDA Lite VM 4.1 release uses version 3.2.5 of Oracle NoSQL - from this release I used the new declarative DDL for Oracle NoSQL to project the shape from NoSQL, with some help from Anuj.

My goal for the integration design is to show a logical design in ODI and how KMs are used to realize the implementation and leverage Oracle Big Data SQL - this integration design supports predicate pushdown, so I actually minimize the data moved between my NoSQL store on Hadoop and the Oracle database - think speed and scalability! My NoSQL store contains user movie recommendations. I want to join this with reference data in Oracle, which includes the customer, movie and genre information, and store the result in a summary table.

Here is the code to create and load the recommendation data in NoSQL - this would normally be computed by another piece of application logic in a real world scenario;

  export KVHOME=/u01/nosql/kv-3.2.5
  cd /u01/nosql/scripts
  ./admin.sh

  connect store -name kvstore
  EXEC "CREATE TABLE recommendation( \
           custid INTEGER, \
           sno INTEGER, \
           genreid INTEGER,\
           movieid INTEGER,\
           PRIMARY KEY (SHARD(custid), sno, genreid, movieid))"
  PUT TABLE -name RECOMMENDATION -file /home/oracle/movie/moviework/bigdatasql/nosqldb/user_movie.json

The Manual Approach

This example is using the new data definition language in NoSQL. To make this accessible via Hive, users can create Hive external tables that use the NoSQL Storage Handler provided by Oracle. If this were manually coded in Hive, we could define the table as follows;

  CREATE EXTERNAL TABLE IF NOT EXISTS recommendation(
           custid INT,
           sno INT,
           genreId INT,
           movieId INT)
  STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler'
  TBLPROPERTIES ( "oracle.kv.kvstore"="kvstore",
                  "oracle.kv.hosts"="localhost:5000",
                  "oracle.kv.hadoop.hosts"="localhost",
                  "oracle.kv.tableName"="recommendation");

At this point we have made NoSQL accessible to many components in the Hadoop stack - pretty much every component in the Hadoop ecosystem can leverage the HCatalog entries defined, be they Hive, Pig, Spark and so on. We are looking at Oracle Big Data SQL though, so let's see how that is achieved. We must define an Oracle external table that uses either the SerDe or a Hive table; below you can see how the table has been defined in Oracle;

  CREATE TABLE recommendation(
           custid NUMBER,
           sno NUMBER,
           genreid NUMBER,
           movieid NUMBER
  )
  ORGANIZATION EXTERNAL
  (
           TYPE ORACLE_HIVE
           DEFAULT DIRECTORY DEFAULT_DIR
           ACCESS PARAMETERS (
               com.oracle.bigdata.tablename=default.recommendation
           )
  );

Now we are ready to write SQL! Really!? Well let's see, below we can see the type of query we can do to join the NoSQL data with our Oracle reference data;

  SELECT m.title, g.name, c.first_name
  FROM recommendation r, movie m, genre g, customer c
  WHERE r.movieid=m.movie_id and r.genreid=g.genre_id and r.custid=c.cust_id and r.custid=1255601 and r.sno=1
  ORDER by r.sno, r.genreid;
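As a purely illustrative aside, once the external table exists the same query can be run from any JDBC client. Here is a hedged Groovy sketch using the Oracle thin driver - the connection URL, schema and password are Big Data Lite style assumptions, not details from this post:

  import groovy.sql.Sql

  // Connection details below are assumptions for illustration only.
  def db = Sql.newInstance("jdbc:oracle:thin:@localhost:1521/orcl",
                           "moviedemo", "welcome1", "oracle.jdbc.OracleDriver")
  db.eachRow("""SELECT m.title, g.name, c.first_name
                FROM recommendation r, movie m, genre g, customer c
                WHERE r.movieid=m.movie_id AND r.genreid=g.genre_id AND r.custid=c.cust_id
                  AND r.custid=1255601 AND r.sno=1
                ORDER BY r.sno, r.genreid""") { row ->
      println "${row.title} / ${row.name} / ${row.first_name}"
  }
  db.close()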

Great, we can now access the data from Oracle - we benefit from the scalability of the solution and minimal data movement! Let's make it better: let's make it more maintainable, more flexible to future changes, and accessible to more people, by showing how it is done in ODI.

Oracle Data Integrator Approach

The data in NoSQL has a shape, we can capture that shape in ODI just as it is defined in NoSQL. We can then design mappings that manipulate the shape and load into whatever target we like. The SQL we saw above is represented in a logical mapping as below;


Users can use the same design experience as other data items and benefit from the mapping designer. They can join, map, transform just as normal. The ODI designer allows you to separate how you physically want this to happen from the logical semantics - this is all about giving you flexibility to change and adapt to new integration technologies and patterns.

In the physical design we can assign Knowledge Modules that take the responsibility of building the integration objects that we previously manually coded above. These KMs are generic so support all shapes and sizes of data items. Below you can see how the LKM is assigned for accessing Hive from Oracle;

This KM takes the role of building the external table - you can take this use it, customize it and the logical design stays the same. Why is that important? Integration recipes CHANGE as we learn more and developers build newer and better mechanisms to integrate. 

This KM takes care of creating the external table in Hive that accesses our NoSQL system. You could also have manually built the external table, imported it into ODI and used that as a source for the mapping; I wanted to show how the raw items can be integrated, because the more metadata we have and use at design time, the greater the flexibility in the future. The LKM Oracle NoSQL to Hive uses regular KM APIs to build the access object; here is a snippet from the KM;

  create table <%=odiRef.getObjectName("L", odiRef.getTableName("COLL_SHORT_NAME"), "W")%>
   <%=odiRef.getColList("(", "[COL_NAME] [DEST_CRE_DT]", ", ", ")", "")%>
  STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler'
  TBLPROPERTIES ( "oracle.kv.kvstore"="<%=odiRef.getInfo("SRC_SCHEMA")%>",
                  "oracle.kv.hosts"="<%=odiRef.getInfo("SRC_DSERV_NAME")%>",
                  "oracle.kv.hadoop.hosts"="localhost",
                  "oracle.kv.tableName"="<%=odiRef.getSrcTablesList("", "[TABLE_NAME]", ", ", "")%>");

You can see the templatized code versus literals. This still needs some work as you can see - can you spot some hard-wiring that needs to be fixed? ;-) This was using the 12.1.3.0.1 Big Data option of ODI, so integration with Hive is much improved and it leverages the DataDirect driver, which is also a big improvement. In this post I created a new technology for Oracle NoSQL in ODI; you can do this too for anything you want. I will post this technology on java.net and more, so that as a community we can learn and share.

Summary 

Here we have seen how we can make seemingly complex integration tasks quite simple and leverage the best of data integration technologies today and importantly in the future!


Monday Jan 12, 2015

ODI 12c - Eclipse and Updated Mapping Builder Example

The 12c ODI release offers a full SDK which allows great opportunities for automation. You can reduce development times and work smarter. Here I will show how to use Eclipse to configure and set up the SDK to start working smarter. In this example I will use an updated version of the Mapping Builder, which is a sample SDK illustration I blogged about previously. The builder has been updated to include auto mapping of attributes, multiple targets, loading modes and more. The updated OdiMappingBuilder.java code used in this example is here

There is a viewlet illustrating the use of Eclipse and the ODI SDK here. The viewlet walks through the entire configuration of an Eclipse project using the 12.1.3 release of ODI.


The important areas are the libraries you need to define for the project; below you can see the libraries used from the ODI installation - I have installed my software in C:\Oracle\ODI_1213 in these examples.

Once the libraries are configured you can import the mapping builder java source code...

Then you must define how the code is executed, you specify the parameters as arguments. This is of course an example for illustration, you can define how you would like this to be done in whatever way you like. You specify all of the mapping builder parameters here. I also pass the control file as a parameter rather than via stdin (which is what I did in previous blog posts).

I mentioned the mapping builder code now supports auto mapping, multiple targets and loading modes; you can see an example driver file below. The line starting with >mapping has 2 qualifiers - below I have used EQUALS and IGNORECASE (a Groovy sketch of how the two qualifiers combine follows the two lists below). The first parameter supports the following values;

  • EQUALS - supports matching EMPNO with EMPNO (or empno with EMPNO when IGNORECASE used for example)
  • SRCENDSWITH - supports matching source XEMPNO with EMPNO (plus case option as above)
  • TGTENDSWITH - supports matching source EMPNO with XEMPNO (plus case option as above)

The second parameter supports;

  • MATCH - exact match, matches EMPNO with EMPNO or empno with empno
  • IGNORECASE - supports mismatched case matching EMPNO matches empno
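To make the combination of the two qualifiers concrete, here is a small standalone Groovy sketch of how such name matching could work. This is not the actual OdiMappingBuilder code (that is linked above), just an illustration of the rules described in the two lists:

  // Illustration only - not the OdiMappingBuilder implementation.
  def matches(String src, String tgt, String mode, String caseRule) {
      def s = (caseRule == "IGNORECASE") ? src.toUpperCase() : src
      def t = (caseRule == "IGNORECASE") ? tgt.toUpperCase() : tgt
      switch (mode) {
          case "EQUALS":      return s == t          // EMPNO with EMPNO (or empno with IGNORECASE)
          case "SRCENDSWITH": return s.endsWith(t)   // source XEMPNO with target EMPNO
          case "TGTENDSWITH": return t.endsWith(s)   // source EMPNO with target XEMPNO
          default:            return false
      }
  }

  assert matches("empno", "EMPNO", "EQUALS", "IGNORECASE")
  assert matches("XEMPNO", "EMPNO", "SRCENDSWITH", "MATCH")
  assert !matches("empno", "EMPNO", "EQUALS", "MATCH")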

As well as auto mapping you can also specify specific column level mapping explicitly as was previously supported, the viewlet has an example. 

You can also see the control file has a target_load directive which can be given a condition or an 'else' in the final field; this lets you load multiple condition-based targets. This is a new addition to the existing target directive. So the example has 3 directives now;

  • target - load via insert/append the data
  • targetinc - incremental update / merge the data
  • target_load - insert multiple targets 

In the example above, the following mapping is created; you can see the multiple condition-based targets and the data being mapped, all from a simple driver file;

Using Eclipse you can quickly build and debug your utilities and applications using the SDK and get the benefits of using an IDE such as code insight and auto completion.

Check out the viewlet and the mapping builder example plus the blogs on the SDK such as the recent one on the SDK overview and work smarter. Also, the updated OdiMappingBuilder.java code used in this example is here - get an overview of the mapping builder here.

Friday Jan 09, 2015

ODI 12c - Mapping SDK Overview

In this post I'll show some of the high level concepts in the physical design and the SDKs that go with it. To do this I'll cover some of the logical design area so that it all makes sense. The conceptual model for logical mapping in ODI 12c is shown below (it's quite a change from 11g), the model below allows us to build arbitrary flows. Each entity below you can find in the 12c SDK. Many of these have specialized classes - for example MapComponent has specializations for the many mapping components available from the designer - these classes may have specific business logic or specialized behavior. You can use the strongly typed, highly specialized classes like DatastoreComponent or you can write applications in a generic manner using the conceptual high level SDK - this is the technique I used in the mapping builder here.

The heart of the SDK for this area of the model can be found here;

If you need to see these types in action, take the mapping illustration below as an example; I have annotated the different items within the mapper. The connector points are shown in the property inspector, they are not shown in the graphical design. Some components have many input or output connector points (for example the set component has many input connector points). Some components are simple expression-based components (such as join and filter) - we call these selector components; other components project a specific shape - we call those projectors. That's just how we classify them.

In 12c we clearly separated the physical design from the logical, in 11g much of this was blended together. In separating them we also allow many physical designs for a logical mapping design. We also had to change the physical SDK and model so that we could support multiple targets and arbitrary flows. 11g was fairly rigid - if you look at the 'limitations' sections of the KMs you can see some of that. KMs are assigned on map physical nodes in the physical design, there are some helper methods on execution unit so you can set/get KMs.

The heart of the SDK for this physical design area of the model can be found here;

If we use the logical mapping shown earlier and look at the physical design we have for it, we can annotate the items below so you can envisage how each of the classes above is used in the design;

The MapPhysicalDesign class has all of the physical related information such as the ODI Optimization Context and Default Staging Location (there also exists a Staging Location Hint on the logical mapping design) - these are items that existed in ODI 11g and are carried forward.

To take an example: if I want to change the LKMs or IKMs set on all physical designs, one approach would be to iterate through all of the nodes in each physical design and check whether an LKM or an IKM is assigned for that node - this then lets you do all sorts of things, from getting the current setting to setting it with a new value. The snippet below gives a small illustration, using Groovy, of the methods from the ODI SDK;

  PhysicalDesignList = map.getPhysicalDesigns()
  for (pd in PhysicalDesignList) {
      PhysicalNodesList = pd.getPhysicalNodes()
      for (pn in PhysicalNodesList) {
          if (pn.isLKMNode()) {
              CurrentLKMName = pn.getLKMName()
              ...
              pn.setLKM(myLKM)
          } else if (pn.isIKMNode()) {
              CurrentIKMName = pn.getIKMName()
              ...
              pn.setIKM(myIKM)
          }
      }
  }
There are many other methods within the SDK to do all sorts of useful stuff. The first example is the getAllAPNodes method on a MapPhysicalDesign - this gives all of the nodes in a design which will have LKMs assigned, so you can quickly set or check them. The second example is the getTargetNodes method on MapPhysicalDesign - this is handy for getting all target nodes to set IKMs on. The final example is finding an AP node in the physical design for a logical component in your design - use the findNode method to achieve this.
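As a hedged illustration using only the methods named above, a snippet along the following lines could report and swap the IKM on every target node of every physical design; it assumes 'map' and 'myIKM' have already been looked up via the SDK:

  // Sketch only - assumes 'map' (the mapping) and 'myIKM' (the replacement IKM)
  // were obtained elsewhere via the SDK finders.
  map.getPhysicalDesigns().each { pd ->
      pd.getTargetNodes().each { tn ->
          println tn.getIKMName()   // current IKM assignment
          tn.setIKM(myIKM)          // assign the new IKM
      }
  }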

Hopefully there are some useful pointers here. It's worth being aware of the ODI blog post 'Mapping SDK - the ins and outs', which provides an overview and cross reference to the primary ODI objects and the underpinning SDKs. If there are any other specific questions let us know.

Wednesday Dec 17, 2014

Oracle Partition Exchange Blog from the ODI A-Team

More great information from the ODI A-Team!

Check out the A-Team’s most recent blog about the Oracle Partition Exchange – it does come in two parts:

Using Oracle Partition Exchange with ODI

Configuring ODI with Oracle Partition Exchange

The knowledge module is on Java.Net, and it is called “IKM Oracle Partition Exchange Load”.  To search for it, enter “PEL” or “exchange” in the Search option of Java.Net.

A sample ODI 12.1.3 Repository is available as well.  The ODI sample repository has great examples of how to perform both initial and incremental data upload operations with Oracle Partition Exchange.  This repository will help users to understand how to use Oracle Partition Exchange with ODI.

Happy reading!

Wednesday Nov 12, 2014

ODI 12c - Spark SQL and Hive?

In this post I'll cover some new capabilities in the Apache Spark 1.1 release and show what they mean to ODI today. There's a nice slide shown below from the Databricks training for Spark SQL that pitches some of the Spark SQL capabilities now available. As well as programmatic access via Python, Scala and Java, the Hive QL compatibility within Spark SQL is particularly interesting for ODI... today. The Spark 1.1 release supports a subset of the Hive QL features, which in turn is a subset of ANSI SQL; there is already a lot there and it is only going to grow. The Hive engine today uses MapReduce, which is not fast; the Spark engine is fast and in-memory - you can read much more on that elsewhere.

Figure taken from the Databricks training for Spark SQL, July 2014.

In the examples below I used the Oracle Big Data Lite VM; I downloaded the Spark 1.1 release and built it using Maven (I was on CDH 5.2). To use Spark SQL in ODI, we need to create a Hive data server - the Hive data server masquerades as many things: it can be used for Hive, for HCatalog or for Spark SQL. Below you can see my data server; note the Hive port is 10001 - by default 10000 is the Hive server port. We aren't using Hive server to execute the query, here we are using the Spark SQL server. I will show later how I started the Spark SQL server on this port (the Apache Spark doc for this is here).

I started the server using the Spark standalone cluster that I configured using the following command from my Spark 1.1 installation;

./sbin/start-thriftserver.sh --hiveconf hive.server2.thrift.bind.host=bigdatalite --hiveconf hive.server2.thrift.port=10001 --master spark://192.168.122.1:7077

You can also specify local (for test), YARN or other cluster information for the master. I could have just as easily started the server using YARN by specifying the master URI as something like --master yarn://192.168.122.1:8032, where 8032 is my YARN resource manager port. I ran using the 10001 port so that I can run both the Spark SQL and Hive engines in parallel whilst I do various tests. To reverse engineer, I actually used the Hive engine to reverse engineer the table definitions in ODI (I hit some problems using the Spark SQL reversing, so worked around it) and then changed the model to use my newly created Spark SQL data server above.
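If you want to sanity-check the Thrift server outside of ODI, here is a hedged Groovy sketch using the standard Apache Hive JDBC driver (ODI itself connects through its bundled DataDirect driver); the host and user values are Big Data Lite style assumptions:

  import groovy.sql.Sql

  // 10001 is the Spark SQL Thrift server started above; 10000 would be plain Hive.
  // The Hive JDBC driver jar and its dependencies must be on the classpath.
  def spark = Sql.newInstance("jdbc:hive2://bigdatalite:10001/default",
                              "oracle", "", "org.apache.hive.jdbc.HiveDriver")
  spark.eachRow("show tables") { row -> println row }
  spark.close()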

Then I built my mappings just like normal - and used the KMs in ODI for Hive just like normal. For example the mapping below aggregates movie ratings and then joins with movie reference data to load movie rating data - the mapping uses the datastores from a model obtained from the Hive metastore;

If you look at the physical design, the Hive KMs are assigned but we will execute this through the Spark SQL engine rather than through Hive. The switch from engine to engine was handled in the URL within our Hive data server.

When the mapping is executed you can use the Spark monitoring API to check the status of the running application and Spark master/workers.

You can also monitor from the regular ODI operator and ODI console. Spark SQL support uses the Hive metastore for all the table definitions be they internally or externally managed data. 

There are other blogs from tools showing how to access and use Spark SQL, such as the one here from Antoine Amend using SQL Developer. Antoine has also another very cool blog worth checking out Processing GDELT Data Using Hadoop. In this post he shows a custom InputFormat class that produces records/columns. This is a very useful post for anyone wanting to see the Spark newAPIHadoopFile api in action. It has a pretty funky name, but is a key piece (along with its related methods) of the framework.

  // Read file from HDFS - Use GdeltInputFormat
  val input = sc.newAPIHadoopFile(
     "hdfs://path/to/gdelt",
     classOf[GdeltInputFormat],
     classOf[Text],
     classOf[Text]
  )
Antoine also provides the source code to GdeltInputFormat so you can see the mechanics of his record reader, although the input data is delimited data (so could have been achieved in different ways) it's a useful resource to be aware of.

If you are looking at Spark SQL, this post was all about using Spark SQL via the JDBC route - there is another whole topic on transformations using the Spark framework alongside Spark SQL that is for future discussion. You should be aware of and check out the Hive QL compatibility documentation here - check what you can and can't do within Spark SQL today. Download the BDA Lite VM and give it a try.

Tuesday Oct 28, 2014

ODI 12c and DBaaS in the Oracle Public Cloud

This article illustrates how to connect ODI on premise to Oracle in the cloud (OPC), specifically the Database as a Service (DBaaS, see doc here) offering. You will see how easy it is to configure connectivity from within ODI and the use of familiar tools gives you the same consistency from on premise use to the cloud. A big concern for cloud computing is security and ensuring access is restricted and as secure as possible. For ODI on premise the challenge is how to connect to such a secure service. ODI provides tasks for transferring files to and from locations - databases are generally accessed via JDBC.

The initial state of an Oracle DBaaS service restricts remote access to SSL - so you can't just remotely connect by default to an Oracle database listener for example (it is possible to open that up by configuring this within DBaaS). File transfer to the cloud can be done out of the box using sftp capabilities, access to the database in order to load it with data, to transform data within it and to extract data from it can be done with a small bit of SSL tunneling - let's see how. The examples discussed in this article have been developed with ODI 12.1.3, a copy of the driver which performs the SSL tunneling can be found on java.net here. With this driver it is effortless to work with an on premise ODI and an Oracle database in the cloud.

Before we get to the ODI parts let's look at the basics, this is mentioned in the DBaaS documentation but sometimes it's simpler to read what someone has done than follow the doc.....

If you want to be using ODI or other remote capabilities such as ssh, sftp then before creating the Oracle database instance in the cloud you should generate a secure private key-public key pair. The public key gets used when you create the Oracle database instance in the cloud and the private key is used by SSL tools (such as sftp, ssh or the driver used) to securely connect to the cloud. 

When you create the key using something like PuTTY, ensure you save the public key and private key, and also export the key using the OpenSSH key option. ODI actually needs the OpenSSH format right now, as the version of a library it depends on supports this.


You can see where the public key is provided below in the Instance Configuration section.....


The great news about the DBaaS capabilities is that it is all very familiar for Oracle database folks; the service itself can also be managed from the command line - so as well as the web console and EM etc., you can use the command line and work the way you are used to.

Anyway, back on course... when you have the instance up and running it's time to have some fun with it!

File Transfer to the Oracle Public Cloud

In ODI you can use the OdiSftpPut/Get tool in a package, procedure or KM to transfer data to/from the cloud. You can see in the example below the OdiSftpPut tool being used to transfer a file 'm.csv' from the local filesystem (d:\data) to the directory (/home/oracle) in the cloud. The private key is specified in the property 'SSH Identity File' and the key file password is specified in 'Remote User Password'. The OS user to use for the ftp is specified as 'oracle' in the property 'Remote User Name'.

Very simple. The DBaaS instance has OS users created when it is initialized; you can read more about the users 'opc' and 'oracle' in the DBaaS documentation.

Transforming Data in the Oracle Public Cloud

Mappings are used to transform data from one representation to another. In this example you will see how the file staged in the Oracle Public Cloud is integrated with an Oracle target - just like standard ODI on premise use cases. It is no different. Below you can see the image has the logical mapping at the top, with the file being mapped to an Oracle target table; the middle part of the image shows the physical design - the map uses the LKM File to Oracle (External Table) KM to create an external table on top of the file in the cloud, and then the target is integrated with the Oracle Insert KM.

When I execute the mapping all of the statements to transform are executed in the OPC - in this particular design everything is executed inside the Oracle database.

The ODI data server definition is using a custom driver (here) which extends the Oracle JDBC driver. The driver creates a SSH tunnel between the host executing the connect and the instance in the cloud. This means all ODI objects such as procedures, mappings and so forth that execute statements on regular Oracle systems can execute them on the cloud instances too. I actually created my demonstration schemas and granted all the permissions using a procedure in ODI. The procedure is shown below, see the snippet of the statements creating the users - the target logical schema was my DBAAS_INSTANCE.

Let's dig under the covers and have a look at how the physical schema is defined. You'll need the driver, and have it copied into your oracledi/userlib directory (or wherever your agent is installed if using an agent). You can then define the connection, specify the database user name and password for that database user;

Then you specify the driver - the one you need to download, mentioned above. The URL is of the form of the Oracle JDBC driver. The difference is in how you specify the host, port and sid/service. The sid/service are your actual cloud service details. Since we are using the SSH tunnel technique, we actually specify the local host and a port number (default 5656) on the local host - for example, something of the form jdbc:oracle:thin:@localhost:5656:<your_cloud_sid> (the SID shown is just a placeholder).

The properties to configure the SSH tunnel are defined either in the properties panel or in a file specified in the properties panel. I've decided here to use the file approach, so can specify the file name in the property propertiesFile.

In my example, this file contains;

  sslUser=oracle
  sslHost=<my_cloud_ip_address>
  sslRHost=<my_cloud_ip_address>
  sslRPort=1521
  sslPassword=my_private_key_password
  sslPrivateKey=D:\\credentials\\dbcloud12c_private_key_openssh.ppk

That is all that is needed and you can be very creative using all the powers of Oracle in the cloud and ODI for integration. Here is a summary of the properties supported by the driver.

Property Name - Property Description
sslUser - The OS user to connect via SSL with.
sslHost - The address of the host to SSL to.
sslRHost - The tunnel can be made between the client, through the SSL host, to this host. In my case this was the same as the SSL host.
sslRPort - The port to tunnel to. The Oracle listener is often run on 1521, so this is the default if this property is not specified.
sslPassword - The password for the private key file. In ODI you must use an OpenSSH formatted private key file.
sslPrivateKey - The SSL private key file location.
sslLPort - By default the local port used is 5656; it can be changed with this property. You must reference this port number in the URL also.

The driver is a fairly simple wrapper around the Oracle JDBC driver; it leverages SSL tunneling to forward requests on a secure port to the Oracle TNS listener. This technique enables a very familiar way of connecting and interacting with the Oracle database. The driver is on java.net and is available for you to try and give feedback on, so try it and let us know what you think. Familiarity and consistency are very important, both from the stance of the tooling and from leveraging existing knowledge (including modules). This allows ODI users to work with the Oracle Public Cloud DBaaS instance just as they do with their on premise systems. Have fun!

Friday Oct 24, 2014

Automating ODI development tasks using the SDK

By Ayush Dash, Oracle Consulting Services

Oracle Data Integrator (ODI) 11g uses Interfaces to move data from a source to a target datastore. ODI Studio is a very convenient drag and drop UI to build, test and execute such interfaces. These interfaces can have different processing logic and complex transformations for disparate requirements, but there could be interfaces which behave in the exact same manner except the source and target are different.

Let's say I have these requirements defined, and I am able to identify the different buckets of processing required and develop the respective interfaces. That's easily done! However, if I change the requirement such that I need 10 interfaces for each bucket, the requirement gets a little more complex and I face an increased level of effort. What about 100 such interfaces for each bucket? Much more effort is required now! It's the same repetitive set of tasks, but it needs to be done for each interface in each bucket. The problem we face here is to somehow expedite and automate the entire sequence of steps for each bucket and reduce the redundant, manual development of ODI interfaces. As the number of interfaces grows, our problem (the increase in effort) compounds.

Note, this is not limited to interfaces only, it can be extended to generate scenarios, packages etc.

Use Case:

In one of my ODI engagements, we had the below requirements with aggressive timelines.

  1. Incremental Loads from a Source Database to Incremental Database. (ODI interfaces)
  2. Data masking on Incremental Database (not an ODI task)
  3. Incremental loads from Incremental Database to Target Database. (ODI Interfaces)

This had to be done for Oracle and PeopleSoft (2 buckets) and a total of 2300 tables (so a total of 4600 interfaces: 2300 interfaces for step 1 and 2300 for step 3) and eventually the respective scenarios.

ODI SDK Groovy Scripts:

ODI Studio provides a Groovy editor as part of its install; Groovy is a Java-based scripting language that can be leveraged to work with the ODI SDK and build automation scripts. Below is the list of scripts (a minimal sketch of what the CreateProject script boils down to follows the list);

  • CreateProject – Creates an ODI Project with a ProjectName and FolderName.
  • ImportKnowledgeModules – Imports the specified Knowledgemodules to the KM directories.
  • CreateModels – Creates the source and target Models for existing Datastores.
  • CreateModelsandDatastore – Creates Datastores and Models.
  • CreateInterfaceIterative – Iterates through all the Source Datastores and generates an interface for each with respective Target Datastores.
  • CreateInterfaceIterativeandSetKMOptions – Creates Interfaces and set a KM options iteratively.
  • CreateScenario – Create scenarios for all the interfaces.
  • ExecuteScenario – Executes all the scenarios under all the interfaces.
  • CountInterface – Counts the no. of interfaces, can be used as validation.
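For orientation, here is a minimal hedged sketch of the kind of thing the CreateProject script boils down to, written for the ODI Studio Groovy console where odiInstance is already bound; the project name and code are illustrative, and the real scripts are in the download linked below:

  import oracle.odi.domain.project.OdiProject
  import oracle.odi.core.persistence.transaction.support.DefaultTransactionDefinition

  // Sketch only: run from the ODI Studio Groovy console ('odiInstance' is bound there).
  def tm  = odiInstance.getTransactionManager()
  def txn = tm.getTransaction(new DefaultTransactionDefinition())

  def project = new OdiProject("Automation Project", "AUTOMATION_PROJECT")   // illustrative names
  odiInstance.getTransactionalEntityManager().persist(project)

  tm.commit(txn)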

The scripts and guide have been uploaded to the Oracle Data Integration project on Java.net: https://java.net/projects/oracledi.

All the scripts can be downloaded from here: https://java.net/projects/oracledi/downloads/download/ODI/SDK%20Samples/ODI%2011g%20SDK%20Automation/Automation%20Scripts/Automation%20Scripts.zip

Please refer to the ODI SDK Automation Guide for detailed steps: https://java.net/projects/oracledi/downloads/download/ODI/SDK%20Samples/ODI%2011g%20SDK%20Automation/ODI%2011g%20SDK%20Automation%20Guide.doc

Thursday Sep 25, 2014

ODI 12c - Migration from OWB to ODI - PLSQL Procedures Pt 2

In the first part of this blog post I showed you how PLSQL procedures could be invoked from ODI, letting you benefit from supporting such functionality in the tool while increasing performance. Here in part 2, I'll show how the map can be changed in OWB and subsequently migrated using the migration utility.

Remember the original mapping design in OWB used a transformation operator to invoke the PLSQL procedure. The same mapping can be modified by replacing that transformation component with a construct operator to build the ref cursor plus a table function operator - see the original map design at the top below and the modified one underneath with the ref cursor and table function;


The mapping can be configured to be further optimized by specifying extraction and loading hints in order to tune performance for your system;


You will have to enable the parallel DML capabilities for the OWB mapping also. With this design you can test and ensure it works, and then use it in a migration. Why go through all this? You may want to try and optimize existing mappings in OWB, or when migrating, your map may be more complex and you may not wish to reconstruct it (perhaps the mapping is large and complex upstream from the PLSQL procedure). Doing this will save that work; you only need to remove the PLSQL procedure plus the target attribute mappings and insert the cursor and table function.

When the migration utility migrates this, it will migrate the entire logical design and also the hints you have specified. You will see the following mapping in ODI;

You will have to add the enable parallel DML code into the begin mapping command, and then the code will be equivalent and perform as such. For details of the OWB to ODI migration utility see here; it's also worth checking various other useful resources such as the migration-in-action demo here and Stewart Bryson's 'Making the Move from Oracle Warehouse Builder to Oracle Data Integrator 12c' article here (it has a useful tip on database link mechanics in ODI).

ODI 12c - Migrating from OWB to ODI - PLSQL Procedures

The OWB to ODI migration utility does a lot, but there are a few things it doesn't handle today. Here's one for our OWB customers moving to ODI who are scratching their heads on the apparent lack of support for PLSQL procedures in ODI... With a little creative work you can not only get those mappings into ODI but there is potential to dramatically improve performance (my test below improves performance by 400% and can be easily further tuned based on hardware).

This specific illustration will cover the support for PLSQL procedures (not functions) in OWB and what to do in ODI. OWB takes care of invoking PLSQL procedures mid-flow - it did this by supporting PLSQL row-based code for certain (not all) map designs out of the box, and PLSQL procedure invocation was one case where this was done. The PLSQL that OWB generated was pretty much as efficient as it could have been for PLSQL (it used bulk collect and many other best practices you didn't have to worry about as a map designer) but it was limited in the logical map support (you couldn't have a set-based operator such as a join after a row-based-only operator such as a PLSQL transformation) - and it was PLSQL, not SQL.

Here we see how, with a simple pipelined, parallel-enabled table function wrapper around your PLSQL procedure call, you can capture the same design in ODI 12c and/or get the mapping migrated from OWB. I think the primary hurdle customers have is what the option going forward is. To solve this, we will just leverage more of the Oracle database: table functions - and parallelize the <insert your favorite word> out of it!

The mapping below calls a PLSQL procedure and OWB generated PLSQL row based code for this case, the target is getting loaded with data from the source table and the 2 output parameters of the PLSQL procedure;

When you try and migrate such a mapping using the OWB to ODI migration utility, you'll get a message indicating that the map cannot be migrated using the utility. Let's see what we can do! The vast majority of mappings are set-based; generally only a very small subset are row-based PLSQL mappings. Let's see how this is achieved in ODI 12c.

I did a test using the generated code from OWB - no tuning, just the raw code for the above mapping - it took 12 minutes 32 seconds to process about 32 million rows, invoke the PLSQL procedure and perform a direct path insert into the target. With my ODI 12c design using a very simple table function wrapper around the PLSQL procedure I can cut the time to 3 minutes 14 seconds!! Not only can I do this, but I can easily further optimize it to better leverage the Oracle database server by quickly changing the hints - I had a 4 processor machine, so that's about as much as I could squeeze out of it.

Here is my map design in ODI 12c;

The table function wrapper to call the PLSQL procedure is very simple; line 7 is where I call the PLSQL procedure - I use the object instance in the call and pipe the data when the call is made;

  1. create or replace function TF_CALL_PROC(input_values sys_refcursor) return TB_I_OBJ pipelined parallel_enable(PARTITION input_values BY ANY) AS
  2.   out_pipe i_obj := i_obj(null,null,null,null,null,null,null,null,null);
  3. BEGIN
  4.   LOOP
  5.     FETCH input_values INTO out_pipe.prod_id, out_pipe.cust_id,out_pipe.time_id,out_pipe.channel_id,out_pipe.promo_id,out_pipe.quantity_sold,out_pipe.amount_sold;
  6.     EXIT WHEN input_values%NOTFOUND;
  7.     MYPROC(out_pipe.prod_id,out_pipe.status,out_pipe.info);
  8.     PIPE ROW(out_pipe);
  9.   END LOOP;
  10.   CLOSE input_values;
  11.   RETURN;
  12. END;

This is a very simple table function (with enough metadata you could generate it); it uses table function pipelining and parallel capabilities - I will be able to parallelize all aspects of the generated statement and really leverage the Oracle database. The above table function uses the types below; it has to project all of the data used downstream - whereas OWB computed this for you, here you will have to do it.

  create or replace type I_OBJ as object (
   prod_id number,
   cust_id number,
   time_id date,
   channel_id number,
   promo_id number,
   quantity_sold number(10,2),
   amount_sold number(10,2),
   status varchar2(10),
   info number
  );
  create or replace type TB_I_OBJ as table of I_OBJ;

The physical design in ODI has the PARALLEL(4) hints on my source and target, and I enable parallel DML using the begin mapping command (for example, an ALTER SESSION ENABLE PARALLEL DML statement) within the physical design.

You can see in the above image that when using the Oracle KMs there are options for hints on sources and targets; you can easily set these to take advantage of the hardware resources and tweak them to pump up the performance throughput!

To summarize, you can see how we can leverage the database to really speed the process up (remember the 400%!). We can still capture the design in ODI, and on top of that, unlike in OWB, this approach lets us carry on doing arbitrary data flow transformations after the table function component which invokes our PLSQL procedure - so we could join, lookup etc. Let me know what you think of this; I'm a huge fan of table functions and I think they afford a great extensibility capability.

Tuesday Sep 09, 2014

ODI 12c - Models in Data Modeler and ODI

Ever wondered how to get your models from Oracle SQL Developer Data Modeler (SDDM) into ODI data models? The most common practice is to generate and execute the physical DDL scripts from SDDM on the target database. Another technique is possible by leveraging the ODI SDK and the SDDM SDK - that's what I will illustrate here. There is an example script posted here on the java.net site.

There is an end to end demo viewlet below to see the script in action, check it out;

https://blogs.oracle.com/dataintegration/resource/viewlets/odi_models_from_sddm_viewlet_swf.html

The viewlet shows the transformation script in action creating the ODI model from the Data Modeler design, here's a peek at the script in action;

In the viewlet you will see how I added the groovy scripting engine as an engine for SDDM - I can then leverage my groovy skills and the ODI SDK to build the script and provide useful capabilities.

This script is publicly available at the link above - take it, enhance it and comment! Join the communities on LinkedIn for Oracle Data Integration and the OTN forum to learn and exchange with other members of the community.

About

Learn the latest trends, use cases, product updates, and customer success examples for Oracle's data integration products, including Oracle Data Integrator, Oracle GoldenGate and Oracle Enterprise Data Quality.
