Monday Nov 25, 2013

In-Session Parallelism in ODI12c

contributed by Ayush Ganeriwal

In this post we discuss the in-session parallelism introduced in the ODI12c release. ODI12c can now concurrently execute the parts of a mapping that are independent of each other. For example, the data loads into C$ tables from two disparate data sources are done in parallel, because these operations are independent of each other. Similarly, parallelism can be achieved for flow control and target loads in other use cases. ODI12c automatically identifies the session tasks that can be executed concurrently and generates code for their parallel execution. Users can see which parts of the mapping will be executed in parallel and, if the business logic demands it, change this behavior so that the tasks execute sequentially.

Parallelism Display in Deployment Plan

A typical deployment plan is shown above, where the data from a file and a relational data store are first loaded into C$ tables in the staging area and then joined and loaded into the target table. Here the different source stores and their corresponding C$ tables, the join components, and the target data store are organized in separate blue and yellow boxes, which are called execution units and execution unit groups respectively. An execution unit group contains one or more execution units that are independent of each other and can be executed concurrently. An execution unit contains a set of operations that need to be executed serially. In the above example, the SOURCE_GROUP contains two execution units that are independent of each other and are executed in parallel. The TARGET_GROUP contains only one execution unit, indicating the absence of parallelism.
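The grouping described above can be modeled in a few lines of Python. This is only an illustrative sketch, not ODI's internal representation: the class names and the unit names (FILE_UNIT, DB_UNIT, TARGET_UNIT) are hypothetical, while SOURCE_GROUP and TARGET_GROUP come from the example.

```python
from dataclasses import dataclass, field


@dataclass
class ExecutionUnit:
    """Operations that must run one after another (hypothetical model)."""
    name: str
    operations: list = field(default_factory=list)


@dataclass
class ExecutionUnitGroup:
    """Execution units that are independent and may run concurrently."""
    name: str
    units: list = field(default_factory=list)

    def is_parallel(self) -> bool:
        # More than one unit in a group means the units can run in parallel.
        return len(self.units) > 1


# Unit names below are made up for illustration.
source_group = ExecutionUnitGroup("SOURCE_GROUP", [
    ExecutionUnit("FILE_UNIT", ["create C$ table", "load file into C$ table"]),
    ExecutionUnit("DB_UNIT", ["create C$ table", "load table into C$ table"]),
])
target_group = ExecutionUnitGroup("TARGET_GROUP", [
    ExecutionUnit("TARGET_UNIT", ["join C$ tables", "load target table"]),
])

for group in (source_group, target_group):
    mode = "parallel" if group.is_parallel() else "serial"
    print(f"{group.name}: {len(group.units)} unit(s), {mode}")
```

In this model, moving a unit into its own group is what turns parallel execution into serial execution.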

How is parallelism achieved within a session?

The session tasks within a step are now generated in a hierarchical manner, similar to a load plan's step hierarchy. Two new types of task have been introduced, namely the Serial task and the Parallel task. These are container tasks, which can have one or more child tasks under them. As the names suggest, the child tasks under a serial container task are executed serially, whereas the child tasks of a parallel container task are executed in parallel. These container tasks can be nested within each other, resulting in multiple levels in the task hierarchy. Depending upon the deployment plan, ODI12c generates the hierarchy of these serial and parallel tasks to achieve in-session parallelism. Here is an example of a serial and parallel task hierarchy.
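To make the container idea concrete, here is a minimal Python sketch of nested serial and parallel containers. It is an assumption-laden illustration rather than ODI's runtime: the classes Task, SerialTask and ParallelTask and the task names are hypothetical, and threads from the standard library stand in for the agent's own threading.

```python
from concurrent.futures import ThreadPoolExecutor


class Task:
    """Leaf task: records its name in a shared log when run."""
    def __init__(self, name):
        self.name = name

    def run(self, log):
        log.append(self.name)  # list.append is thread-safe in CPython


class SerialTask:
    """Container whose children run one after another."""
    def __init__(self, *children):
        self.children = children

    def run(self, log):
        for child in self.children:
            child.run(log)


class ParallelTask:
    """Container whose children run concurrently, one thread per child."""
    def __init__(self, *children):
        self.children = children

    def run(self, log):
        if not self.children:
            return
        with ThreadPoolExecutor(max_workers=len(self.children)) as pool:
            futures = [pool.submit(child.run, log) for child in self.children]
            for future in futures:
                future.result()  # wait for the child and re-raise any error


# Two source loads run in parallel; the target load runs afterwards.
hierarchy = SerialTask(
    ParallelTask(
        SerialTask(Task("create C$_FILE"), Task("load C$_FILE")),
        SerialTask(Task("create C$_DB"), Task("load C$_DB")),
    ),
    Task("join and load target"),
)

log = []
hierarchy.run(log)
print(log)
```

The order of the two source branches in the log can vary between runs, but each branch stays in order and the target load always comes last.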

Mapping segments candidate for Parallelism

The following are some of the common parts of a mapping in which tasks are usually independent of each other and can be performed simultaneously.

1. Loading source data into staging area

2. Loading data into multiple target tables

3. Performing flow control on I$ table

These are some of the common candidate areas for parallelism; depending upon the mapping flow, there may be other areas that could be executed in parallel. Let’s look at an example of one of the above common parallelism candidates and see how the behavior can be altered by tweaking the deployment plan.

Loading source data into staging area

In this mapping a file and a DB table are joined and then loaded into the target table. ODI generates the following deployment plan for it, so that the sources are loaded into collector tables in the staging area concurrently and the join and target load tasks are then performed sequentially.

On executing the mapping with this deployment plan, the task hierarchy is generated as follows. Notice that the following sets of tasks are organized under a parallel container task:

a. LKM tasks for dropping and recreating the C$ tables for each data source

b. LKM tasks to load data into each of the C$ tables

Suppose we do not want these C$ tables to be loaded in parallel and would rather load them sequentially. We can achieve that simply by modifying the deployment plan: drag and drop one of the execution units outside of the execution unit group. This creates a separate execution unit group for each source dataset, forcing all the execution units to run serially.

On execution, the task hierarchy for this plan is generated as follows, showing serial execution of all tasks.

Threads configuration for parallel tasks

The ODI12c runtime spawns a separate thread for each of the parallel tasks in a session. This effectively means that executing a highly parallelized mapping may use a large number of threads and exhaust system resources. ODI provides two levels of configuration in the physical agent configuration to control the number of such parallel threads: one controls the maximum number of threads in the agent, and the other controls the maximum thread count within a session. A parallel task is started only if a new thread can be spawned according to these thread configurations. Therefore, these should be kept at sufficiently large levels.
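The effect of the two limits can be sketched in Python with a pair of semaphores. This is an illustration under stated assumptions, not the agent's implementation: the constant names and values are made up, and the sketch assumes a task waits for a free slot, which the text above does not specify.

```python
import threading
import time

AGENT_MAX_THREADS = 4    # hypothetical agent-wide limit
SESSION_MAX_THREADS = 2  # hypothetical per-session limit

# One semaphore shared by every session running on the (simulated) agent.
agent_slots = threading.BoundedSemaphore(AGENT_MAX_THREADS)


def run_session(task_count, session_limit=SESSION_MAX_THREADS):
    """Run task_count parallel tasks; return the peak number running at once."""
    session_slots = threading.BoundedSemaphore(session_limit)
    state = {"running": 0, "peak": 0}
    state_lock = threading.Lock()

    def task():
        # A task proceeds only when both a session slot and an agent slot are free.
        with session_slots, agent_slots:
            with state_lock:
                state["running"] += 1
                state["peak"] = max(state["peak"], state["running"])
            time.sleep(0.01)  # stand-in for real work
            with state_lock:
                state["running"] -= 1

    threads = [threading.Thread(target=task) for _ in range(task_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["peak"]


peak = run_session(6)
print("peak concurrent tasks:", peak)
```

With six parallel tasks and a session limit of two, no more than two tasks run at once, which is why limits that are too low can erase the benefit of a highly parallel mapping.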

Connection management for parallel task

Since each of the parallel tasks is executed by a separate thread, they should not work on the common connection for any non-transactional behavior. So for performing the non-transactional operations each of the parallel tasks acquires its own parallel connection from the connection pool. Therefore, for highly parallel mappings the connection pool size should be configured appropriately according to the level of parallelism.

One other point to keep in mind is that each parallel task acquires a new auto-commit connection and closes it on task completion, so the OnConnect/OnDisconnect tasks are executed when each such connection is created and closed. Thus you may see multiple OnConnect/OnDisconnect task entries for a mapping that has parallel tasks.

Another implication of such parallelism is that a parallel task cannot rely on any alteration to the database connection made by earlier tasks, because each parallel task gets a new connection that is unaware of any operations performed by earlier tasks.
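The same effect can be reproduced outside ODI with any database that has connection-scoped state. The following Python sketch uses SQLite purely as a stand-in (an assumption for illustration; the table and file names are made up): a temporary table created on one connection is not visible on a second connection to the same database.

```python
import os
import sqlite3
import tempfile

# A throwaway database file standing in for the staging database.
db_path = os.path.join(tempfile.mkdtemp(), "staging.db")

# The "earlier task" changes state that is scoped to its own connection.
earlier = sqlite3.connect(db_path)
earlier.execute("CREATE TEMP TABLE session_settings (name TEXT, value TEXT)")
earlier.execute("INSERT INTO session_settings VALUES ('batch_size', '500')")
earlier.commit()

# A "parallel task" opens a new connection and cannot see that state.
parallel = sqlite3.connect(db_path)
try:
    parallel.execute("SELECT * FROM session_settings")
    visible = True
except sqlite3.OperationalError:
    visible = False  # the temp table exists only on the first connection

print("visible to parallel task:", visible)

earlier.close()
parallel.close()
```

Any connection-level setup that a parallel task depends on therefore has to be done by that task itself, for example through the OnConnect tasks mentioned earlier.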

For operations performed on a transactional connection, the parallel tasks do not have much flexibility and have to perform their operations one by one. For such transactional connections, synchronization is maintained so that only one parallel task can perform an operation on the connection at a time. Thus, having parallel tasks participate in such transactions may actually degrade performance because of the extra locking and unlocking needed for this synchronization.
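A rough Python sketch of this kind of synchronization is shown below. The TransactionalConnection class is hypothetical and only records statements; it is meant to show that a lock around a shared connection forces parallel tasks to take turns, not to reflect how ODI implements it.

```python
import threading
import time


class TransactionalConnection:
    """Hypothetical stand-in for a transactional connection shared by tasks."""
    def __init__(self):
        self._lock = threading.Lock()
        self._active = 0
        self.max_active = 0
        self.statements = []

    def execute(self, statement):
        with self._lock:  # only one parallel task may use the connection
            self._active += 1
            self.max_active = max(self.max_active, self._active)
            time.sleep(0.005)  # stand-in for the database round trip
            self.statements.append(statement)
            self._active -= 1


conn = TransactionalConnection()
threads = [
    threading.Thread(target=conn.execute, args=(f"INSERT batch {i}",))
    for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("statements executed:", len(conn.statements))
print("max tasks on the connection at once:", conn.max_active)
```

Although four threads are started, the work on the shared connection is effectively serial, and each task also pays the cost of acquiring and releasing the lock.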

The take-away

ODI12c allows parts of a mapping to be parallelized to achieve extreme performance. It has the intelligence to automatically identify the parallelizable parts within a mapping and also provides the flexibility to enforce serial behavior for special business needs. Such parallelism results in improved performance, and for even higher levels of parallelism the connection pool and thread count can be fine-tuned further.

For more up-to-date Oracle Data Integration news, also follow the Twitter feeds from @ORCLGoldenGate and @nmadhu2k3.

Thursday Nov 21, 2013

Recap of Oracle GoldenGate 12c and Oracle Data Integrator 12c Launch Webcast

Last week we had a great video webcast for Oracle GoldenGate 12c and Oracle Data Integrator 12c. Our development executives, Brad Adelberg and Tim Hall, talked about the new features and how the new release helps with delivering future-ready data integration solutions with extreme performance and high IT productivity.

If you missed the webcast you can watch it on demand via the following page:
On-Demand Video Webcast: Introducing 12c for Oracle Data Integration

In previous blogs we have talked about the new features of Oracle GoldenGate 12c and Oracle Data Integrator 12c, so I am not going to repeat them here. What is worth repeating is what our customers and partners say about the new 12c release.

SolarWorld’s Senior Database Administrator, Russ Toyama, was in the studio with our executives for the launch webcast and discussed the GoldenGate implementation for SolarWorld's manufacturing process. SolarWorld has been the largest U.S. solar panel manufacturer for more than 35 years and needed operational intelligence to continuously improve the quality of its products, while ensuring that its systems operate with high performance and stability.

SolarWorld uses GoldenGate to move data from multiple manufacturing databases into a single decision support system database in real time, freeing up the OLTP systems for transaction processing, which improves their performance and stability. The DSS database is very flexible in meeting reporting needs, and provides a comprehensive view of multiple manufacturing processes. This provides the traditional roles of reporting and engineering analysis to continuously improve product quality, yield and efficiency, and enables real time monitoring of the production manufacturing process. Using this real-time monitoring capability SolarWorld is able to detect the deviations from the norm right away, and take action to remedy or understand the situation. This not only improves production quality, but also improves cost management. The manufacturing process is a series of steps building upon the previous, so if there is an issue, it needs to be corrected as soon as possible to prevent waste and reduce manufacturing costs.

For GoldenGate 12c we heard from Surren Partabh, CTO for Technology Services for BT. Surren explained the role of Oracle GoldenGate in BT's private cloud initiative and how they have improved customer experience and availability while also managing costs. Surren highlighted that Oracle's data integration product family is one of the cornerstones of BT's cloud migration project and enables BT to migrate to the cloud simply and in an agile manner. BT used GoldenGate to build a replication hub to help with the migration from legacy systems to the cloud with a reliable fallback strategy. Surren also commented on the tight integration between GoldenGate 12c and Oracle Data Integrator 12c, saying that it is a "step in the right direction" as it simplifies the actual installation, configuration, management, and monitoring of solutions. You can watch the interview with BT's Surren Partabh here.

In the launch webcast we also heard from Mark Rittman, CTO of Rittman Mead Consulting. Mark talked about the Oracle Data Integrator 12c new features in great depth, given that he was closely involved with the Beta testing program. Mark shared his opinion that the new flow-based design interface is the most critical feature of this 12c release and will bring major productivity gains for developers. He added that interoperability with Oracle Warehouse Builder for easier migration and tighter integration with Oracle GoldenGate are very valuable for customers. Mark also discussed the new release of Oracle BI Applications and how its use of Oracle Data Integrator for data movement and transformations simplifies life for Oracle BI Apps customers.  The complete interview with Mark Rittman is available for you as well.

If you missed the launch webcast last week, I hope you take the time to watch it on demand and discover how Oracle has changed the data integration and replication technology space with Oracle Data Integrator 12c and Oracle GoldenGate 12c. For more information including white papers and podcasts you can also download free resources.

Friday Nov 15, 2013

ODI 12c - OWB Migration, Integration Plus Components

Today is another big day for ODI and OWB! Get your hands on the much anticipated functionality, covering everything from new components added to the ODI 12c mapping designer to the ETL design migration utility (announced here) for moving designs from OWB into ODI 12c. These complement the integration and auditing of OWB jobs from ODI, which was included in 12c.

The patches for ODI and OWB are great news on many fronts, not only as a major point in the roadmap. As well as enabling the OWB migration, this is great for ODI the product: the new components provide a lot of great capabilities (and will help with a lot of those 'how do I do this' questions).

Included in the ODI patch you'll find a component bundle including transformation components from pivot/unpivot through subquery filtering on to table functions and the like. All great examples of the product's extensibility - a fusion of declarative design and technology specific knowledge modules in harmony. Lots of good stuff that you will hear much more of.

There are multiple patches. If you are only interested in ODI features and enhancements, then all you need is the ODI patch below:

  • ODI installed (ODI plus patch number 17053768)

If you want the OWB migration utility, then you will need the following patch to OWB as well as the ODI patch above:

  • OWB installed (OWB plus patch number 17547241)

If you install the ODI patch you will find some new components in the palette. These components let you express many more transformations, including transformations that could previously be designed in OWB and can now be designed in ODI. As well as the components, we have knowledge modules to support their code generation for heterogeneous environments.

The ODI 12c release included a new ODI tool and support for monitoring OWB jobs. Jobs can be initiated using the OdiStartOwbJob tool, which is available for use from within packages, procedures, KMs or the command line. The jobs, including OWB mappings and process flows, can be monitored from ODI. The patch provides the migration utility plus components.

In subsequent postings we will deep dive into these areas. 

Advanced Replication for The Masses – Oracle GoldenGate 12c for the Oracle Database.

Customer requirements for data replication are reaching very high levels. The demand is for a continuous, uninterrupted flow of data in real time while lowering IT costs and improving efficiencies. This part of the market constitutes the middle to upper end of the data replication market, and customers are more broadly educated and better versed in replication technologies than ever before. As a result, they demand well-designed, well-engineered, and well-crafted products with advanced features that complement their evolving IT systems.

Vendors offer a wide variety of replication products and services, but with the exception of Oracle these other vendors rely on brute force log scraping, inefficient fetching of data types that are not supported or are encrypted, and heavy reliance on manual configuration and tuning when dealing with highly tolerant RAC systems or high data volumes. Oracle GoldenGate 12c is at the forefront and differentiates itself in the market by providing a truly well-designed, well-engineered and smart solution for the Oracle database.

What are the key new features of Oracle GoldenGate 12c for the Oracle database?

  • Support for Database 12c Multitenant feature which allows smooth transitions to private cloud deployments.
  • The new Integrated Delivery for Oracle Database. This feature enables simpler and more powerful configurations with a streamlined process model that coordinates multiple parallel apply processes, which improve performance for high volume implementations. 
  • Enhanced high availability via integration with Data Guard Fast Start Failover
  • Improved security with tighter integration with the Oracle wallet

Oracle GoldenGate is at the vanguard of extreme performance and the one standout feature of the 12c release is Integrated Delivery.    Oracle GoldenGate now leverages a lightweight streaming API built into Database 12c for Oracle GoldenGate. The Smart Product Design of the Oracle GoldenGate 12c Integrated Delivery feature enables simpler and more powerful configurations that coordinate multiple parallel apply processes to improve performance for high volume implementations. 

This high performance feature leverages numerous database parallel apply servers for automatic, dependency-aware parallel apply. What this means is that Integrated Delivery will automatically balance transactions across these apply processes and coordinate the transactions to maintain order and ensure referential integrity, while delivering data at speeds not feasible with other vendors.

Oracle GoldenGate 12c has many other new features for the Oracle database. You can learn more about GoldenGate 12c by downloading our new resources and watching the replay of the executive video webcast with customer and partner speakers.

Monday Nov 11, 2013

ODI 12c - Parallel Table Load

In this post we will look at the ODI 12c capability of parallel table load from the aspect of the mapping developer and the knowledge module developer - two quite different viewpoints. This is about parallel table loading which isn't to be confused with loading multiple targets per se. It supports the ability for ODI mappings to be executed concurrently especially if there is an overlap of the datastores that they access, so any temporary resources created may be uniquely constructed by ODI. Temporary objects can be anything basically - common examples are staging tables, indexes, views, directories - anything in the ETL to help the data integration flow do its job. In ODI 11g users found a few workarounds (such as changing the technology prefixes - see here) to build unique temporary names but it was more of a challenge in error cases.

ODI 12c mappings by default operate exactly as they did in ODI 11g with respect to these temporary names (this is also true for upgraded interfaces and scenarios) but can be configured to support the uniqueness capabilities. We will look at this feature from two aspects; that of a mapping developer and that of a developer (of procedures or KMs).

1. Firstly as a Mapping Developer.....

1.1 Control when uniqueness is enabled

A new property is available to set unique name generation on/off. When unique names have been enabled for a mapping, all temporary names used by the collection and integration objects will be generated using unique names. This property is presented as a check-box in the Property Inspector for a deployment specification.

1.2 Handle cleanup after successful execution

Provided that all temporary objects that are created have a corresponding drop statement then all of the temporary objects should be removed during a successful execution. This should be the case with the KMs developed by Oracle.

1.3 Handle cleanup after unsuccessful execution

If an execution failed in ODI 11g then temporary tables would have been left around and cleaned up in the subsequent run. In ODI 12c, KM tasks can now have a cleanup-type task which is executed even after a failure in the main tasks. These cleanup tasks will be executed even on failure if the property 'Remove Temporary Objects on Error' is set.
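The behavior described above resembles a guarded cleanup phase. The Python sketch below is a hypothetical model of it, not ODI code: run_step, its parameters and the task names are made up, and it assumes that cleanup tasks also run after a successful execution.

```python
def run_step(main_tasks, cleanup_tasks, remove_temp_objects_on_error=True):
    """Run main tasks in order, then the cleanup tasks.

    After a failure, cleanup tasks run only if remove_temp_objects_on_error
    is set. Returns (succeeded, names_of_executed_tasks).
    """
    executed = []
    succeeded = True
    try:
        for name, action in main_tasks:
            action()
            executed.append(name)
    except Exception:
        succeeded = False  # remaining main tasks are skipped

    if succeeded or remove_temp_objects_on_error:
        for name, action in cleanup_tasks:
            action()
            executed.append(name)
    return succeeded, executed


def fail():
    raise RuntimeError("load failed")


# Hypothetical tasks: the load fails after the C$ table has been created.
main = [("create C$ table", lambda: None), ("load C$ table", fail)]
cleanup = [("drop C$ table", lambda: None)]

with_cleanup = run_step(main, cleanup, remove_temp_objects_on_error=True)
without_cleanup = run_step(main, cleanup, remove_temp_objects_on_error=False)
print(with_cleanup)
print(without_cleanup)
```

With the flag set, the drop still runs after the failed load; with it cleared, the temporary table is left behind for a later run or for an explicit cleanup.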

If the agent were to crash and not be able to execute this task, then there is an ODI tool (OdiRemoveTemporaryObjects here) you can invoke to clean up the tables - it supports date ranges and the like.

That's all there is to it from the aspect of the mapping developer; it's much, much simpler and more straightforward. You can now execute the same mapping concurrently, or execute many mappings using the same resource concurrently, without worrying about conflicts.

2. Secondly as a Procedure or KM Developer.....

In the ODI Operator the executed code shows the actual name that is generated. You can also see the runtime code prior to execution: for example, below I selected the code type 'Pre-executed Code', which lets you see the code about to be processed. You can also see the executed code (which is the default view).

References to the collection (C$) and integration (I$) names will be automatically made unique by using the odiRef APIs - these objects will have unique names whenever concurrency has been enabled for a particular mapping deployment specification. It's also possible to use name uniqueness functions in procedures and your own KMs.

2.1 New uniqueness tags 

You can also make your own temporary objects have unique names by explicitly including either %UNIQUE_STEP_TAG or %UNIQUE_SESSION_TAG in the name passed to calls to the odiRef APIs. Such names would always include the unique tag regardless of the concurrency setting.

To illustrate, let's look at the getObjectName() method. At <% expansion time, this API will append %UNIQUE_STEP_TAG to the object name for collection and integration tables. The name parameter passed to this API may contain %UNIQUE_STEP_TAG or %UNIQUE_SESSION_TAG. This API always generates to the <? version of getObjectName().

At execution time this API will replace the unique tag macros with a string that is unique to the current execution scope. The returned name will conform to the name-length restriction for the target technology, and its pattern for the unique tag. Any necessary truncation will be performed against the initial name for the object and any other fixed text that may have been specified.

Examples are:
  1. <?=odiRef.getObjectName("L", "%COL_PRFEMP%UNIQUE_STEP_TAG", "D")?>
  2. <?=odiRef.getObjectName("L", "EMP%UNIQUE_STEP_TAG_AE", "D")?>

Methods which have this kind of support include getFrom, getTableName, getTable, getObjectShortName and getTemporaryIndex. There are also APIs for retrieving this tag information: the getInfo API has been extended with the following properties (the UNIQUE* properties can also be used in ODI procedures):
  • UNIQUE_STEP_TAG - Returns the unique value for the current step scope, e.g. 5rvmd8hOIy7OU2o1FhsF61. Note that this will be a different value for each loop iteration when the step is in a loop.
  • UNIQUE_SESSION_TAG - Returns the unique value for the current session scope, e.g. 6N38vXLrgjwUwT5MseHHY9
  • IS_CONCURRENT - Returns info about the current mapping, will return 0 or 1 (only in % phase)
  • GUID_SRC_SET - Returns the UUID for the current source set/execution unit (only in % phase)

The getPop API has been extended with the IS_CONCURRENT property, which returns info about a mapping and will return 0 or 1.
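To show the idea behind the tag macros, here is a small Python sketch of macro expansion. It is not the odiRef implementation: new_unique_tag and expand_name are hypothetical helpers, the 22-character tag length is inferred from the sample values above, the prefix values C$_ and I$_ are assumed defaults, and no truncation is applied.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits


def new_unique_tag(length=22):
    """Generate a random alphanumeric tag (length is an assumption)."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def expand_name(template, step_tag, session_tag):
    """Replace prefix and unique-tag macros in a name template."""
    replacements = {
        "%COL_PRF": "C$_",  # assumed collection prefix
        "%INT_PRF": "I$_",  # assumed integration prefix
        "%UNIQUE_STEP_TAG": step_tag,
        "%UNIQUE_SESSION_TAG": session_tag,
    }
    for macro, value in replacements.items():
        template = template.replace(macro, value)
    return template


# One tag per step and one per session, reused for every name in that scope.
step_tag = new_unique_tag()
session_tag = new_unique_tag()

name_a = expand_name("%COL_PRFEMP%UNIQUE_STEP_TAG", step_tag, session_tag)
name_b = expand_name("EMP%UNIQUE_STEP_TAG_AE", step_tag, session_tag)
print(name_a)
print(name_b)
```

Because the tag is fixed for the scope, repeated expansions of the same template within a step give the same name, while a concurrent execution with a different tag gets a different one.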

2.2 Additional APIs

Some new APIs are provided, including getFormattedName, which allows KM developers to construct a name from fixed text or ODI symbols; the name can optionally be truncated to a maximum length and can use a specific encoding for the unique tag. Its syntax is getFormattedName(String pName[, String pTechnologyCode]). This API is available at both the % and the ? phase. The format string can contain the ODI prefixes that are available for getObjectName(), e.g. %INT_PRF, %COL_PRF, %ERR_PRF, %IDX_PRF, along with %UNIQUE_STEP_TAG or %UNIQUE_SESSION_TAG. The latter tags will be expanded into a unique string according to the specified technology. Calls to this API within the same execution context are guaranteed to return the same unique name provided that the same parameters are passed to the call, e.g.
  • <%=odiRef.getFormattedName("%COL_PRFMY_TABLE%UNIQUE_STEP_TAG_AE", "ORACLE")%>
  • <?=odiRef.getFormattedName("%COL_PRFMY_TABLE%UNIQUE_STEP_TAG_AE", "ORACLE")?>
  • C$_MY_TAB7wDiBe80vBog1auacS1xB_AE
  • <?=odiRef.getFormattedName("%COL_PRFMY_TABLE%UNIQUE_STEP_TAG.log", "FILE")?>
  • C2_MY_TAB7wDiBe80vBog1auacS1xB.log

2.3 Name length generation 

As part of name generation, the length of the generated name will be compared with the maximum length for the target technology and truncation may need to be applied. When a unique tag is included in the generated string it is important that uniqueness is not compromised by truncation of the unique tag. When a unique tag is NOT part of the generated name, the name will be truncated by removing characters from the end - this is the existing 11g algorithm. When a unique tag is included, the algorithm will first truncate the <postfix> and, if necessary, the <prefix>. It is recommended that users ensure there is sufficient uniqueness in the <prefix> section to guarantee the uniqueness of the final resultant name.


To summarize, ODI 12c makes it much simpler to use mappings in concurrent cases and provides APIs to help develop procedures or custom knowledge modules in such a way that they can be used in highly concurrent, parallel scenarios.

Thursday Nov 07, 2013

Time to Get Started with Oracle Data Integrator 12c!

It is time to get started with Oracle Data Integrator 12c!

We would like to highlight for you a great place to begin your journey with ODI.  Here you will find the Getting Started section for ODI, which provides a few options.

Step 1 –

Start by downloading the Getting Started (PDF). The document provides general background information and detailed examples to help you learn how to use Oracle Data Integrator. This document guides you through a first project with Oracle Data Integrator 12c, and the instructions in this document are required for using the Getting Started Demonstration Environment.

If you would like to run the Getting Started Demonstration Environment on your system go to Step 2 A.

If you would like to run the Getting Started Demonstration Environment installed and pre-configured in a Virtual Image, go to Step 2 B.

Step 2 -

A -  If you would like to run the Getting Started Demo Environment on your system, you will want to then download the Demo Environment for Getting Started (ZIP) archive containing ODI metadata, configuration scripts and sample data.

B -  Alternatively, a pre-configured virtual machine is available with the Getting Started installation and configuration. The Virtual Machine platform uses Oracle Virtual Box technology, and use of this configuration would necessitate the download/installation of Virtual Box and the Getting Started virtual machine. If you prefer this route and would rather run the Getting Started Demonstration Environment installed and preconfigured in a Virtual Machine, please download the ODI 12c Getting Started Virtual Machine.

Step 3 –

In continuation with the Getting Started Demonstration Environment and the Getting Started Guide, you will now be able to run through a first project with Oracle Data Integrator.

Now that you have successfully completed the exercises above, you may want to experiment on your own and develop with Oracle Data Integrator in your environment. Here you can download a full version of Oracle Data Integrator 12c.

Enjoy, and happy developing!