Friday Jul 26, 2013

The Best Data Integration for Oracle Exadata Comes from Oracle

In a previous blog post I talked about about how Oracle Exadata customers can migrate/consolidate their systems without downtime. In that blog post I mentioned that Oracle Data Integrator and Oracle GoldenGate offer unique and optimized data integration solutions for Oracle Exadata. For example, customers that choose to feed their data warehouse or reporting database with near real-time throughout the day, can do so without decreasing  performance or availability of source and target systems. And if you ask why real-time, the short answer is: in today’s fast-paced, always-on world, business decisions need to use more relevant, timely data to be able to act fast and seize opportunities. A longer response to "why real-time" question can be found in a related blog post.

If we look at the solution architecture, as shown on the diagram below,  Oracle Data Integrator and Oracle GoldenGate are both uniquely designed to take full advantage of the power of the database and to eliminate unnecessary middle-tier components. Oracle Data Integrator (ODI) is the best bulk data loading solution for Exadata. ODI is the only ETL platform that can leverage the full power of Exadata, integrate directly on the Exadata machine without any additional hardware, and by far provides the simplest setup and fastest overall performance on an Exadata system.

We regularly see customers achieving a 5-10 times boost when they move their ETL to ODI on Exadata. For  some companies the performance gain is even much higher. For example a large insurance company did a proof of concept comparing ODI vs a traditional ETL tool (one of the market leaders) on Exadata. The same process that was taking 5hrs and 11 minutes to complete using the competing ETL product took 7 minutes and 20 seconds with ODI. Oracle Data Integrator was 42 times faster than the conventional ETL when running on Exadata.This shows that Oracle's own data integration offering helps you to gain the most out of your Exadata investment with a truly optimized solution. 

GoldenGate is the best solution for streaming data from heterogeneous sources into Exadata in real time. Oracle GoldenGate can also be used together with Data Integrator for hybrid use cases that also demand non-invasive capture, high-speed real time replication. Oracle GoldenGate enables real-time data feeds from heterogeneous sources non-invasively, and delivers to the staging area on the target Exadata system. ODI runs directly on Exadata to use the database engine power to perform in-database transformations. Enterprise Data Quality is integrated with Oracle Data integrator and enables ODI to load trusted data into the data warehouse tables. Only Oracle can offer all these technical benefits wrapped into a single intelligence data warehouse solution that runs on Exadata.

Compared to traditional ETL with add-on CDC this solution offers:

  • Non-invasive data capture from heterogeneous sources and avoids any performance impact on source
  • No mid-tier; set based transformations use database power
  • Mini-batches throughout the day –or- bulk processing nightly which means maximum availability for the DW
  • Integrated solution with Enterprise Data Quality enables leveraging trusted data in the data warehouse

In addition to Starwood Hotels and Resorts, Morrison Supermarkets, United Kingdom’s fourth-largest food retailer, has seen the power of this solution for their new BI platform and shared their story with us. Morrisons needed to analyze data across a large number of manufacturing, warehousing, retail, and financial applications with the goal to achieve single view into operations for improved customer service. The retailer deployed Oracle GoldenGate and Oracle Data Integrator to bring new data into Oracle Exadata in near real-time and replicate the data into reporting structures within the data warehouse—extending visibility into operations. Using Oracle's data integration offering for Exadata, Morrisons produced financial reports in seconds, rather than minutes, and improved staff productivity and agility. You can read more about Morrison’s success story here and hear from Starwood here.

I also recommend you watch our on demand webcast on Zero-Downtime Migration to Oracle Exadata Using Oracle GoldenGate: A Customer Case Study and download free resources on Oracle Data Integration products to learn more about their powerful architecture and solutions for data-driven enterprises.

Tuesday Jul 09, 2013

ODI - Loading MongoDB (API as Target)

In this post I will show how to load documents into a MongoDB collection. The interface design looks just like all other ODI interfaces, but behind the scenes the KM configured in the physical design uses the MongoDB SDK (see MongoDB SDK here) to insert the documents. The target datastore below represents a MongoDB document, the columns are the keys in the document. Each row is inserted as a document, and each column is a key, the column value is the value. The ENAME value below is shown as a complex JSON value.

The IKM I have used is a multi-connect IKM, the source is a SQL data source and the target uses a MongoDB groovy command. The heart of the IKM to insert the documents into the collection has a SQL select as the source command and the following groovy code for the target command;

  1. import com.mongodb.*
  2. MongoClient mongoClient = new MongoClient(" <%=odiRef.getOption("MONGO_SERVER")%> ", <%=odiRef.getOption("MONGO_PORT")%> );
  3. DB db = mongoClient.getDB("<%=odiRef.getOption("MONGODB")%>");
  4. DBCollection coll = db.getCollection("<%=odiRef.getOption("MONGOCOLL")%>")
  5. BasicDBObject doc = new BasicDBObject();

  6. <%=odiRef.getColList(" ", "doc.put(\u0022[COL_NAME]\u0022, \u0022#[CX_COL_NAME]\u0022);", " \n ", "", "((INS and !TRG) and REW)")%>

  7. coll.insert(doc);

The odiRef.getColList method call above generates code for every target column, the code performs a doc.put invocation to add the key-value pairs into the document. For example this is the code generated and executed based on the interface design discussed above;

  1. import com.mongodb.*
  2. MongoClient mongoClient = new MongoClient("DALLAN-SVR", 27017);
  3. DB db = mongoClient.getDB("test");
  4. DBCollection coll = db.getCollection("testCollection")
  5. BasicDBObject doc = new BasicDBObject();

  6.  doc.put("EMPNO", "#EMPNO"); 
  7.  doc.put("ENAME", "#ENAME"); 
  8.  doc.put("JOB", "#JOB"); 
  9.  doc.put("MGR", "#MGR"); 
  10.  doc.put("HIREDATE", "#HIREDATE"); 
  11.  doc.put("SAL", "#SAL"); 
  12.  doc.put("COMM", "#COMM"); 
  13.  doc.put("DEPTNO", "#DEPTNO");

  14. coll.insert(doc);

 This is a simple illustration of how to load documents into MongoDB. We can go into the MongoDB command line and execute the command to see all objects in the collection and get the list of documents, below you can see a preview of executing db.testCollection.find()

  • { "_id" : ObjectId("51dc3ded6c4b9a5bd07d68a6"), "EMPNO" : "7369", "ENAME" : "{ NAME : SMITH, DESCR : 22 }", "JOB" : "CLERK", "MGR" : "7902", "HIREDATE" : "1980-12-17 00:00:00.0", "SAL" : "801", "COMM" : "", "DEPTNO" : "20" }
  • { "_id" : ObjectId("51dc3ded6c4b9a5bd07d68a7"), "EMPNO" : "7499", "ENAME" : "{ NAME : ALLEN, DESCR : 22 }", "JOB" : "SALESMAN", "MGR" : "7698", "HIREDATE" : "1981-02-20 00:00:00.0", "SAL" : "1601", "COMM" : "300", "DEPTNO" : "30" }

 You can see the key:value pairs in our document. For those MongoDB gurus, you'll notice in the 'complex' data illustration, this is really a string and not a MongoDB complex object - that discussion is for another day.

This post is not just about MongoDB, but also a useful post on how to integrate APIs as a target in a data flow. 

Wednesday Aug 29, 2012

Load Plan article in Oracle Magazine

Timely article in Oracle Magazine on ODI Load Plans from Mark Rittman in the current issue, worth having a quick read of the article and play with the sample which is included if you get the time. Thanks to Mark for investing the time and energy providing such useful information to the community.

Mark goes over the main benefits of the load plan in the article. Interested to hear any creative use cases or comments in general.

Wednesday Jun 01, 2011

Load plans - getting up and running

To get up and running with load plans there are a few things you'll need to do in order to execute them. One is to setup your agent, if you haven't got one, there is a useful ODI 11g OBE below that explains how to configure a standalone agent;

In ODI there is also a change in the requirements on the database schema you use (for Oracle anyway), in that there is an additional execute privilege needed on DBMS_LOCK which is needed for load plans, see doc below;

If you have not granted this privilege you will get the error 'PLS-00201: identfiier DBMS_LOCK must be declared' like below;

One you have granted the execute privilege on DBMS_LOCK to the work schema you are good to go and execute the load plans.

Sunday May 29, 2011

What’s new with Oracle Data Integrator - Load Plans

Oracle Data Integrator 11gR1 PS1 introduces a major new feature called Load Plans. This post will give you an overview of this feature.

The documentation defines the load plan as an executable object that can contain a hierarchy of steps that can be executed conditionally, in parallel or in series. The leaves of this hierarchy are Scenarios. Packages, interfaces, variables, and procedures can be added to Load Plans for executions in the form of scenarios.

In a nutshell, Load Plans are extremely powerful objects for organizing and launching scenarios in a production context. They should help you getting rid of manual scripts coded for starting scenarios in the correct order, and of packages used to launch other scenarios in cascade.

Creating a Load Plan 

Load Plans appear in both the Designer and Operator Navigator as shown below. They are available for edition in a development and production repositories, and can be organized into scenario folders.

Creating a load plan is pretty much straightforward: Right-click and select New (Load Plan), specifiy a Name for the Load Plan. As a Load Plan will be launching scenarios, you can define at that level how these scenarios will be logged (Log Session, and Log Session Step, etc options).

The real work with Load Plans takes place on the Steps tab. There, you can define a hierarchy of steps. The leaves of this hieararchy will be Scenarios that will be started in sequence, in parallel and/or conditionally based on the values of variables.

In the example below, the Datawarehouse Load Plan does the following in a sequence (serial step).

  1. First it runs an Initialization step (this step starts a scenario called INITIALIZATION), 
  2. It does a Refresh [of the] Dimensions in Parallel (more information below.)
  3. Then it evaluates the value of the IS_LOAD_FACT variable. This variable passed as a startup parameter of this load plan.
    • If this value if 1, it run the LOAD_SALES then the FINALIZE_FACT_LOADING scenarios
    • If this value if 2, it run the LOAD_SALES scenario only
    • Otherwise, it runs the FINALIZE_FACT_LOADING scenario

Refreshing the Dimensions in Parallel implies that we perform two actions simultaneously:

  • Load the Products (this is done by the LOAD_PRODUCTS scenario)
  • Load Geographies (This is a package loading a set of country/regions/cities tables) and then load Customers (this is a second package).
This steps embeds a serial step within a parallel step. It is possible in load plans to embed steps within steps, creating a complete execution flow in the hierarchy.

To add the steps, you can either use the wizards (available with the "+" button in the toolbar). You can also drag and drop scenarios, interfaces, procedures, etc directly from the Designer tree view into the Step hierarchy to automatically create a scenario for this component and add this scenario as a step in the load plan.

If you prefer top-down development, you can create a load plan and add using the wizard scenarios that do not exist yet. In the example below, the scenario added with the wizard does not exist yet, and by using version number -1, we simply tell the load plan to execute the latest version of this scenario.

In addition, you can, from a load plan step, access the object from which the scenario was created or regenerate the scenario.
Reorganizing the load plan is also extremely simple as it is just matter of drag and drop !

Running Load Plans 

After saving your Load Plan this, you can  run it by clicking the execute button in the toolbar. The load plan running will be shown in the Load Plan Executions accordion on the Operator. The Steps tab of the Load Plan Run will show you the steps executed, their status and statistics. This whole tab reflects the executions in progress, and can be refreshed while the executions take place.
The sessions started by the load plan still appear in the Session's list, but the Steps tab is 10 times more useful to monitor the overall execution of all these sessions. By clicking on the Session ID link (in blue) in this tab, you open the Session editor and can drill down into the session.

Like Scenarios, Load Plans can also be started from a command line or a web service interface.They can of course be scheduled using external scheduling or the built-in scheduler.

Note that  Load Plans require a JEE or Standalone agent for running. They cannot run within the Studio Local Agent. This is due to the fact that the Load Plan execution flow is distributed across the agents running the sessions started from the Load Plan. Using this architecture, there is no single technical failure point that may prevent a load plan from proceeding its execution flow when the execution takes place on multiple agents.

Exception Handling 

Exception Handling and Restartability behavior are one of coolest things in the Load Plans.

An Exception is simply a group of steps (like a mini Load Plans) that can be executed in case of a failure. 

In the example above, I have defined two Exceptions (Minor Exception and Major Exception). They will start a scenario that mail the administrator. The major one in addition starts a scenario to dump the log. These exceptions can be triggered on step failure.

Every step has  a property that indicates when a whether the exception should be executed when this steps fails, and whether the failure on this step should be raised to the parent step in the hierarchy. By raising a failure, you can escalate the failure, up to the root step, which fails the whole Load. By ignoring the failure, you flag this step's failure as a minor error.

In the example below, if any of the parallel steps refreshing the dimensions fails ("Max Error Child Count=0") the Refresh dimension is considered as failed. In the even of such failure, I will run the Minor Exception and continue the load. Even if not all dimensions are refreshed, the fact can still be loaded, as I am using ODI data integrity framework to isolate facts that would reference dimensions not correctly refreshed.

This example  also  illustrate the restartability for such a step. If I decide to restart this load plan, only the failed children would be restarted, as defined by the Restart Type option.

Note that when restarting an existing load plan, ODI does not overwrite the first load plan run, but copies it and restarts the copy. Each Load Plan Run is preserved for error identification and tracking.

Load Plans vs. Packages

Users already knowledgeable of ODI should now wonder: Are Load Plans a new type of packages? Well, although there are similarities between these two objects, they do not have the same objective: 

  • Packages are (simply said) technical workflows with a strong transaction nature, produced mainly by data integration developers.
  • Load Plans aim at making easier the functional design and administration of a production, and are produced by production users and data integration project leads/architects.

Let's discuss the differences:

Capability Load Plans  Packages Comments
Edition Design-Time and Run-Time  Design Time. Packages are compiled into scenarios at run-time.  If production needs to modify the execution flow, it is preferable to deliver a load plan.  
Starting/Monitoring UI, Command-Line, Web Services, Scheduling UI, Command-Line, Web Services, Scheduling Both features are equivalent.
Transactions. Each Load Plan Step contains its own transactions. Package Steps may share transaction. If the workflow requires a transaction that spawns accross several steps, use a package.
Parallelism Yes, using Parallel Steps. Parallel execution is easy to follow in the Operator. Yes, by starting other scenarios. Parallel execution is hard to follow in the Operator. When there is a strong need for parallel step execution, use preferably Load Plans. 
Restartability Yes. Status of previous runs is persisted. Yes. Status of previous executions is overwritten. Database transactions are not  continued, hence restarting the whole package is often needed. Due to their transactional nature and the fact that their execution is overwritten by the new execution, packages are often restarted as atomic units of work. Load Plan provide better flexibility for restartability.
Branching/Loops Branching (Case/When is supported) Loops are not supported. Branching and Looping are supported. If there is a need for looping in a workflow, use preferably packages.

That's all for today. Stay tuned for more deep dives in the new features !!!


Learn the latest trends, use cases, product updates, and customer success examples for Oracle's data integration products-- including Oracle Data Integrator, Oracle GoldenGate and Oracle Enterprise Data Quality


« July 2016