Friday Apr 11, 2014

ODI 12c - Expression Hints

The ODI 12c mapping designer let's you design a mapping using components and define expressions for each of those components. Like 11g, in 12c there are hints to let ODI know how you would like to compute a physical plan for your design and from this generate code. I think the rules about how some of this work are not known - both in 11g and 12c - people second guess how they think it works. There's no magic to it, let's have a look at it. Underpinning the mapping are the technology definitions, it's here that datatypes are defined and datatype mappings between technologies. This let's ODI be very flexible in support for arbitrary data transformations between systems and how such data and its datatype is mapped across heterogeneous systems.

Putting the heterogeneous nature aside we can look at how datatypes are transformed just in a distributed example for Oracle (the example is for demonstration, in reality the database link LKM will be used which will do no staging). The example has 2 columns in a source table that are both VARCHAR2(30), one of those columns has an actual string, the other has a date. The target system has 2 columns in our target table that are VARCHAR2(10) and DATE. Note the target for one is shorter than its source and the other is a DATE datatype and not a string.

We can define the simple table to table mapping as below and define the expressions on the target table.

By default the expressions have no hint defined and will execute where the table is executed - in this case on the target system. We can see how the C$ table would be defined by previewing the DDL code in the physical design, we will see the type/DDL in the syntax of the particular technology. Below you can see the source datatype information is propagated - the length is still 30.

 If we look at the target table we can see the expressions defined on it, in the example below I have selected the TGTEMP table and I can see the 2 expressions, I could actually change where the expression is defined for this particular physical design, I will not do that though, I will go back to the logical designer and set the hint there - then all of my potential physical designs leverage it.

Use the 'Execute on hint' property for the attribute, you can see the different options there, just now it has value no hint. 

Below I have selected 'Source' to indicate I want the SUBSTR expression executed on the source system. 

After this change has been made, if you look at the physical design you will see that the datatype information on our AP is now different. Because the data has been prepared on the source then the datatype for our C$ column now takes on the definition of the target (VARCHAR2(10)). 

This gives you some idea as to how the hints works for expressions in a datastore. ODI 12c also has an expression component that let's you define groups of expressions. I generally think this is good for when an expression is reused within a mapping, but I know everyone works differently with tools and I have heard that some people like using this for all complex expressions rather than it being defined within a component such as a target table as they can easily 'see' where complex expressions are defined. Each to their own, the good thing is that you can do whatever you like. One benefit with using the expression component is that ODI by default will push that expression component as close to the source as possible and you can easily grab the entire component and push it to the stage, target or wherever.

The mapping below defines the expressions on the expression component, again there are no hints defined on the component or individual expressions.

When the physical design for this mapping is inspected we can see the expression component is by default on the source execution unit. This goes for other components too (filter, join etc.). In this case you can see both of the columns in the AP take on the downstream target table's datatypes (VARCHAR2(10) and DATE).

Changing the hint on the logical design for the expression component to stage will place the expression in the downstream execution unit. If I had just switched the hint to be stage for the expression component then in the physical design the expression would go in TARGET_UNIT_1. In 11g, ODI also supported a concept where the stage was different for the target. This is still available in 12c and is configured by defining what you want to execute in the stage by using these hints plus defining what the stage is (so similar to 11g apart from you don't have to switch tabs and the gestures are more simple). So firstly, define the expression to execute on the stage using that hint. Then on the logical mapping if you click on the canvas background you will see a property named 'Staging Location Hint', you can set this to the logical schema location for the staging area if you have one. By default it is not set as the staging area is the same as the target.

Let's change this to MEMORY_ENGINE just to see what the physical design looks like. We see we now have multiple execution units and the middle one where we executed the expression component is the 'stage' executing on the MEMORY_ENGINE location.

The hints are done on the logical design. You can also hard-wire physical changes in the physical design, I will go into that in a subsequent post but wanted to stick to the logical hints here to demystify how this works. I hope this is some useful background, I think for ODIers from 11g it will help.

Monday Mar 11, 2013

ODI - Tip of the day, latest scenario version

Here's how you can run the latest version of an ODI scenario from a load plan, package or SDK without explicitly defining the scenario version. Although the UI infers you pick a specific version of the scenario code to execute, if you type in -1 for the version the latest version of the scenario code is executed.

Not sure how many customers are aware of that but a handy tip. I'm confident not many use it in the UI as its kind of tricky to setup. If you just type the scenario name and -1 its OK that works fine, but if you use the scenario picker you have to close the wizard after since the version is not editable, you would have to then edit the step and set the version to -1.

Friday May 08, 2009

Using ODI Variables in Topology: Pushing Contexts to the Extreme

The posts in this series assume that you have some level of familiarity with ODI. The concepts of Context, Topology, Logical Architecture, Physical Architecture and Variables are used here assuming that you understand them in the context of ODI. If you need more details on these elements, please refer to the ODI Tutorial for a quick introduction, or to the complete ODI documentation for more details.

In a previous post, we have discussed the notion of Context and how powerful it can be. There is a limit however. If you remember our discussion, each server, each schema on each server, have to be defined in the ODI Topology. Now imagine that you design processes that have to run on a large number of systems. The exact same code will be executed; only the physical location will be different. How about having thousands of systems where this code has to run. Who wants to maintain the connection information manually through a graphical interface? And maintain thousands of contexts while doing so?

Realistically, I think that contexts are fine up to about a dozen of environments. Beyond that, you need an environment that will be more dynamic. But we want to keep the benefits of having the exact same code on all systems, the flexibility of having a complete separation of the generated code and of the execution location. The solution? Use variables in Topology!

1. DEFINING THE ENVIRONMENT

Before jumping heads down in the usage of variables in Topology, I strongly recommend the creation of 2 contexts:
- A development context where all URL in topology point to an actual server, not using the variables. This will ensure that data can be viewed, and interfaces can be tested without any concerns regarding the variables resolution;
- The dynamic context (similar to what will be used in QA and Production) will use the variables in topology to name the servers, port numbers, user names for the connections, etc. The package will assign the appropriate values to the variable and run the interfaces on the appropriate servers. This context will only be used to validate that the processes defined in the Development context work properly when we use the variables.

Independently from ODI, we will need a table to store the values that will be used for the Topology variables. For this example, will be simply use different server names. Keep in mind that other topology parameters can be set using this same technique.

The table for our structure will contain the server names, as well as the name of an ODI agent: with many processes running concurrently, it is better to assign pools of servers to different agents. We will use the following structure:

create table ODI_SERVERS (
SERVER_NAME varchar(50),
AGENT_NAME varchar(50)
);

Loading this table is not part of the description we have here. If you have a list of servers available somewhere, ODI would be the perfect tool to load your table though…

2. SETUP: CREATION OF THE VARIABLE AND TOPOLOGY DEFINITION

2.1 Creation of the Variable
Details on the creation of a variable can be found in this post. We will simply review the main steps here.

Create a new variable called ServerName.
The data type is Alphanumeric.
You can set the default value to localhost.
You can use the Description field to describe the variable.
In the Refreshing tab, select the logical schema that hosts your ODI_SERVERS table and enter the following query:

select SERVER_NAME from ODI_SERVERS

Note: for more flexibility in the execution of your code, you should never hard-code a schema name – nor assume that the login used to connect to a server defaults to your schema name. Best practice in the example above would be to let ODI complete the table name with the schema name by using the following syntax:

select SERVER_NAME from <%=odiRef.getObjectName(“L”, “ODI_SERVERS”, “D”) %>

Variable-ServerName


2.2 Definition of the Topology

2.2.1 Contexts

To get more details on the creation of contexts, please check out this example. If you already know your way around Topology, create the following two contexts: Development and Dynamic

Servers Context

2.2.2 Development Server: Hardcoded Values

In topology, define one of the servers that you want to access. We will use this server for our development work, and ignore the variables for now: enter the parameters as usual for the definition if this server:

Development Server


Make sure to test the connectivity to validate your parameters. Define a physical schemas as usual under this data server. Map it to the logical schema ORA_DYNAMIC_SERVER for the Development context.

Note: you will have to create the logical schema to perform this mapping.

Servers: Development Schema


2.2.3 Dynamic Server: Using the Variables

For the dynamic contexts, we will use the variable as part of the connection string (aka the JDBC URL). Instead of typing the actual hostname, type the variable name, including the CODE of the project where the variable was created (or GLOBAL for a global variable).

URL example for Oracle:
jdbc:oracle:thin:@#PROJECTCODE.ServerName:1521:ORCL

Servers: Dynamic


This physical server will be considered as the dynamic server. Map its physical schema to the logical schema defined in the previous step, in the Dynamic context this time.

Dynamic Server Schema


3. EXECUTING CODE IN THE DYNAMIC CONTEXT

Your code will work as usual in the Development environment: you can validate your transformations, your processing logic and processing time. When you will want to use the Dynamic context though, a few more steps will be required.

3.1 Connection to the Databases

The ODI agents are in charge of the connection to the databases, sources and targets. At the beginning of the execution of a process, the agent will connect to the Master Repository to retrieve the Topology information. Then it will connect to the databases, generate the code, and send that code to the databases. This represents a challenge in our case, since we need to set a value for our variables before the agent connects to the databases: we cannot set the value of our variables in the package that will be executed...

3.2 Setting the Values for Topology Variables

The solution will be to pass these values as parameters to the process, so that the values are known to the agent before it establishes the connection. You can set the values no matter how you start the scenario: from a command line interface, from another scenario or from an ODI procedure, from a web service or from a Java API. Remember to declare your variables at the beginning of your package to make sure that the value of the parameters gets properly stored though!

For more details on how to pass parameters to an ODI process, you can check out this other post.


4. UNDERSTANDING POTENTIAL ERRORS

- Make sure that the variable name is properly spelled in Topology (Variable names are case sensitive
- Make sure that the variable is declared at the beginning of your package
- Make sure that the variable is referenced with its project CODE (not the project name) in Topology


For more information on ODI contexts, please refer to the ODI Users Guide (part of the Documentation Library that comes with the ODI installation), in particular the entry "What is the Topology? "

All Screenshots were taken using version 10.1.3.5 of ODI. Actual icons and graphical representations may vary with other versions of ODI.

Monday Apr 27, 2009

Executing the Same Code in All Environments: ODI Contexts.

The posts in this series assume that you have some level of familiarity with ODI. The concepts of Context, Topology, Logical Architecture and Physical Architecture are used here assuming that you understand these concepts in the context of ODI. If you need more details on these elements, please refer to the ODI Tutorial for a quick introduction, or to the complete ODI documentation for detailed information on these concepts.

The concept of Contexts in ODI is both very powerful and very convenient. Over the years, I have seen too many people ignore this feature, or not use it properly – hence not fully taking advantage of the flexibility offered by the tool. Hopefully, this brief discussion on the subject will help with a better overall utilization of this feature.

1. THE ISSUE AT HAND – AND THE ODI WAY TO SOLVE IT

A typical challenge for any code that accesses remote systems is to update the connection parameters to access these without disrupting the code that has been developed. In particular in data integration solutions, when developers promote their code from development to QA and then to production, any modification of the code could alter its quality.

Many software solutions will provide parameter files that can be edited, or variables and parameters than can be changed to reflect the change of environment. The downside of these solutions is that maintenance is performed outside of the tool’s control, and leaves the door open for mis-configuration.

The architecture of ODI has been defined with this issue in mind. To solve this problem, developers will never have to know where they are physically connecting: the administrator will give them an alias (say ORACLE_SALES) and that will give them the connection they require. The details of the connection information is entered and managed by the administrator, through the GUI. This will allow the administrator to validate the connection strings and to check the validity of all environments: missing connections can be easily identified, different environments can be compared, etc.

As developers use the metadata to design the transformations, they have a limited need to access the actual systems. They will need to access the actual systems in two cases: to retrieve the Metadata (the reverse engineering operation) and to run the data integration processes. In the QA and production environments, we will need to access the systems as well to run the data integration jobs.

When metadata is imported from actual system, or when code is about to be executed, ODI will ask for the selection of a Context to define the execution environment.

The following screenshots show where ODI will prompt you for a context, and how you can switch from one execution context to the next.

Execution Context

Execution Context Selection

This of course assumes that you have enough security privileges to do so. Different users may have privileges to run the code in one environment and not in the other. For instance, QA engineers may not have the right to execute the processes in the Development environment or in the Production environment. Or if contexts define different company branches, employees from one branch would not be allowed to run code in a different branch.

2. CREATION AND MANAGEMENT OF CONTEXTS

The contexts in ODI are basically labels that you can create as needed. ODI comes with only one context called Global. Typical implementations would actually have three contexts: Development, QA, Production. As mentioned earlier, other implementations could define contexts for different branch or anything that identifies different execution locations.

Contexts will be defined in the Topology interface, along with 2 other types of objects: Logical Schemas and Physical Schemas. In the Topology GUI, you will find three corresponding tabs: Physical Architecture, Contexts, Logical Architecture.

Topology


The physical and logical architectures are organized by technologies.

Technologies

The Context will simply list the contexts that you have defined, independently of the technologies.

Contexts

3. WORKING WITH CONTEXTS: REAL LIFE EXAMPLE

Let us now take a databases for a data integration project. Let us say this is a sales database. The structure of the data will have to be consistent from Development to QA, and from QA to Production (Over time, the structures will evolve in the different environments. But for the QA to be meaningful, we need to be consistent from Development to QA and from QA to Production).

3.1 Creation of a Logical Schema

To represent the connectivity to the data structures, we will define an alias. Let’s call this alias ORA_SALES. This is our Logical Schema. The Logical Schema is created in the Logical Architecture tree under a given technology.

Logical Schema Creation.PNG

When you create a logical schema, unless you associate it with a context, it is just an alias that points nowhere. Create a logical Schema under the Oracle Technology and call it ORA_SALES. We will revisit this entry shortly after creating contexts, and again after defining the connectivity to the actual physical servers.


3.2 Contexts

Switch now to the Context tab in Topology, and create 3 contexts: Development, Test and Production. Simply give a name to each – no passwords are required.

Development Context.PNG

Once the contexts are created, we can revisit the definition of the logical schema we had created in the first step, and see that there is no physical connection defined for either context. Basically, we have said that we have a data model called ORA_SALES that will exist in three separate environments, but we have not defined the connectivity for either one yet: this will be done in the physical architecture.

3.3 Physical Architecture

The Physical Architecture is where we will define the actual servers that host the data. We will provide the credentials to connect to the servers: user name and password. We will also provide the JDBC connection strings, which usually contain the IP address or host name, the port number for the database, and any additional information as required by the technology: SID or service for Oracle, database name for SQL server, etc.

For this example to work in our environment, you will have to adapt it to point to your own databases and schemas.

Under the Oracle Technology, create a new Data Server (be careful when you make your selection from the menus: a common mistake is to create a new Technology instead of a new server!). Name this server, and then enter the appropriate user name and password for that server.

Physical Credentials


Once you have entered these parameters, you can click on the JDBC tab and enter the database JDBC connection information. Here, connecting to Oracle, we select the Oracle Driver name from the drop-down menu and update the JDBC URL with our server name, port number and SID.

Physical JDBC


Note: if you are not using an Oracle database, you will have to copy the JDBC driver files for your database in the appropriate ODI directory. Check out the ODI documentation for more details on this process.

You can validate the parameters you have entered with the “test” button: at this point the connection should be successful.

Succesfull Connexion


When you will save this entry (click Ok or Apply) ODI will open another window to select a Physical Schema on that server. If the schema window does not appear, you can right-click on the physical server you have just created in the Physical Architecture tree and select Insert Physical Schema.

Note: Different Technologies will have different terminologies and ODI will display the proper terminology as needed. From an ODI perspective though, we will always talk about a “Physical Schema”.

Select the Physical schema name, as well as a “work schema” from the drop-down menus.

Physical Schema Definition


Then click on the Context tab and click on the blue grid button to add a line, where you can select a context and the logical schema name. For instance, Development and ORA_SALES.

Physical Logical Mapping


What you are saying here is that ODI will connect to the physical schema that you have just defined when the generated code will refer to the ORA_SALES object and when the context selected at runtime is set to Development.

You can then define the appropriate servers and/or schemas for your QA and production environments, and associate them with the appropriate Context for your Logical Schema.

4. REVIEWING LOGICAL SCHEMAS

Once all physical schemas have been defined and mapped properly, you can edit your logical schema and review your mappings. You can make sure that the schema is actually pointing to a physical schema for all contexts. If a context displays Undefined for the physical schema, then your code cannot run in this context: ODI would not know where to connect.

Logical Schema Review


5. REVIEWING CONTEXTS

You can also edit your contexts and select the Schema tab to review the mapping of all logical schemas for this context: all you file directories, source systems and target systems must be properly mapped for your code to run in this environment.

Context Review


For more information on ODI contexts, please refer to the ODI Users Guide (part of the Documentation Library that comes with the ODI installation), in particular the entry "What is the Topology? "

All Screenshots were taken using version 10.1.3.5 of ODI. Actual icons and graphical representations may vary with other versions of ODI.

About

Learn the latest trends, use cases, product updates, and customer success examples for Oracle's data integration products-- including Oracle Data Integrator, Oracle GoldenGate and Oracle Enterprise Data Quality

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
2
3
5
6
7
8
9
10
12
13
14
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today