Main

ETL Archives

April 25, 2007

Error handling using Data Rules

This has been a topic on the shows we presented, but it is a very powerful feature to use, so here is a little bit on how to use data rules in ETL. Note that data rules are part of the Data Quality option to Warehouse Builder.

To create this solution we identify a number of discrete phases after the initial setup steps which are generic for creating sources and targets in Warehouse Builder:


  1. Create a data rule to document the business rule you want to enforce
  2. Apply the data rule to a specific table (or set of tables is the rule applies on more tables)
  3. Create a mapping with the tables and set the properties for error handling
Create a Data Rule

Data rules are metadata definitions of a business rule you define. You can use some of the pre-defined rules in Warehouse Builder (as-is or customized) or create completely new rules. In this case we will create a new rule.

Steps:

Import the HR schema (you may find that a lot of the data fails the rule, but that is not the point we are making here...)


  1. In the HANDSON project locate the Data Rules node
  2. Right mouse click and choose New...
  3. Name the new module ERROR_HANDLERS
  4. Click OK, this launches the Data Rule Wizard
  5. Click Next in the welcome screen

  6. Call the new rule EMAIL_VERIFICATION
  7. Since we will be verifying the pattern for email addresses, on the Define Rule panel set the following:

    Type: Common Format
    Format: Email Address

    CreateEmailRule:


  8. In the Values section you see the default pattern (in Regular Expressions) applied, since this is a rule that will only recognize 3 letter domains (.com, .org, .net etc.) we will modify the rule Highlight and copy (ctrl-c) the text from the first line. Paste the text into the second line visible (the empty line shows this is extensible)

    RegularExpressionEmail:


  9. In the second line find the +. then locate the {3} behind it, we will change this to recognize 2 letter domains. Change the {3} into {2}

    RegularExpressionModified:


  10. Finish the wizard
Regular expressions are a very interesting language, to learn or read more take a look at the Oracle Regular Expressions Pocket Reference by Jonathan Gennick and Peter Linsley. Nice format, great reference for playing with regexps...

Apply the data rule

The important thing to understand when using data rules is that, first they are applied to a table (this can be a source, but read on for that), only then are they moved into the mapping to produce ETL routines.

Because the rule is applied to the table and because you can choose to remove errors from the main table into an error table (no this is not DML error logging! More on that in a later post, it is coming in 10.2.0.3!), you will need to regenerate the table. The error table DDL is generated only when the main table (after applying the rule) is generated.

Applying the rule to any table is done in the Data Object editor:


  1. Copy the HR.EMPLOYEES table into any target module
  2. Name it STG_EMPS
  3. Open the STG_EMPS table in the editor (double click the table)
  4. Navigate to the Data Rules tab

    ApplyDataRuleDOE:


  5. In the Applied Rules panel, click the Apply Rule button to add a new rule to the table
  6. Click Next on the welcome screen of the wizard
  7. Find the ERROR_HANDLERS module and select the EMAIL_VERIFICATION rule
  8. Click Next
  9. Change the name (of the usage) to ETL_EMAIL_VERIFICATION
  10. Click Next

    ApplyRuleWizardBinding:


  11. In the Binding, find and select EMAIL
  12. Finish the wizard
  13. The rule is now attached (applied) to the STG_EMPS table
If you now generate the STG_EMPS you will see an STG_EMPS_ERR table as well. This table gets deployed when you deploy STG_EMPS (sorry no other way to do this). In configuration you can change the error table name as well if so desired.

Once the STG_EMPS and STG_EMPS_ERR tables are deployed and thus in the database you can start creating a mapping that utilizes the data rule within STG_EMPS.

Data rules in a mapping

Create a new mapping (if you have the table in a mapping already and it was in there before adding the rule, do a synchronize inbound so the mapping gets the latest table definition including the data rule) and drag the STG_EMPS table into it. Make sure you can see the entire operator:

MapTableOperatorwithRule:

Notice how the operator has two groups. The top group is the regular in/out group for the table data and you can map from and to this group. The second group (ERR_GROUP above) can be mapped from (not into) and reflects all the columns that are in the STG_EMPS_ERR table. So there is no need to join these two tables or to import the STG_EMPS_ERR table to retrieve rows from. You can map from both groups in the same mapping and even join them back together (simply think of them as two tables).

Activating a data rule

Once a table with a data rule is in your mapping, you need to choose how you want to use the data rule. For that highlight the operator and find the operator properties:

ActivateDataRuleInMap:

You have three choices:
The default is Ignore. This simply does not act upon the data rule, so it does not influence the generated code or the mapping logic.
MOVE TO ERROR means that the data rule is enforced when running the mapping and that all data rows that fail the rule (or any of the rules if you have multiple) is moved out of the main table STG_EMPS into STG_EMPS_ERR. That failing row does no longer exist in the STG_EMPS table and any selects from it will not see this row.
REPORT means that the error rows are written into the error table, but are NOT removed from the STG_EMPS table and remain in the regular select flow from the main table.

Multiple rules on the same table are possible and can be individually configured. In the case that 2 rules are violated, the error reason is concatenated for the record indicating multiple violations.

ErrorTableProperties:

A second set of properties governs the behavior of the error table. Truncate the error table simply means that the table gets truncated by OWB before loading into it. You would typically do this if you process the error rows in the originating mapping ensuring the information is captured.

Roll up errors gives you a consolidated and aggregated set of rows, ensuring that each row is only given to the extraction query once when selecting from the error table. Note it does not prevent multiple rows to be present, it merely aggregates them to show a single one. If you need all violations as separate records, switch this to No.

Once you have set all properties, you can actually inspect what the pluggable mapping looks like for this operator, and more surprisingly (I think) you can deploy this mapping as is with only a single operator! So it could play the role of validation mapping. In that case you would probably choose Report as strategy.

As a last step, you can create a mapping the extracts both from the source STG_EMPS and from the STG_EMPS_ERR operators in the same mapping.

CONNECT BY in OWB 10gR2 (Hierarchical queries)

A common question from OWB users is how to incorporate the CONNECT BY clause and build maps that handle parent child tables. The good news is now you can build such maps...so I have just found out!


To define a hierarchic query in OWB you can define the CONNECT BY clause in a FILTER operator (this is in the OWB 10.2.0.2 patch). For example in the mapping below the filter operator CONNECT_BY has the hierarchic CONNECT BY clause:


Connect By Map: This map uses the filter operator with a filter condition containing a CONNECT_BY clause.


The CONNECT BY clause can be used in a FILTER (the START WITH clause if used must be after the CONNECT BY, even though the Oracle SQL grammar states the other way) when bug 5233636 is fixed, the filter condition will conform to the grammar ie. START WITH INOUTGRP1.ENAME=�KING� CONNECT BY PRIOR�).


In the expression operator pseudo columns can be utilized,  for example an output expression value could be the LEVEL pseudo column. You can also use others such as the SYS_CONNECT_BY_PATH to get a fully qualified path of hierarchic information.


Connect By Pseudo Columns: The pseudo columns for CONNECT BY can be used in a map.


When the code is generated we get the CONNECT BY SQL we want!


Taking this further the 'create dimension' expert has been extended on the Exchange to cover 2 more scenarios; generating normalized level-based dimension and generating a single level dimension with a level indicator attribute. The expert prompts for minimal user input - for example below the parent and member columns are identified:


Connect By Expert: Create dimension expert for handling parent child tables.


After the columns are identified you are prompted for a starting condition, here we select the root of the hierarchy, where the MGR is NULL:


Connect By Expert Levels: Select the start with and max levels


The normalization option will create n levels and populate the dimension table based on the parent child source table. It will also push ragged dimension members from the actual leaf to the deepest level in the hierarchy. The single level option will create a dimension with a single level including a parent reference attribute and a level indicator for depth in the tree, along with the accompanying data loader.


In summary we can see how we can incorporate the CONNECT BY SQL clause into OWB mappings. Experts and scripting provide a way of accelerating the development of objects with common design patterns.


 

May 2, 2007

AW Loading/Solving with OWB 10gR2

This note details how AW cubes are created by OWB 10gR2, it covers how cubes can be loaded with data, solved and how large cubes can be loaded using the parallel solve capabilities of the AW in Oracle 10g.

The AW engine in 10g has an api (AWXML) that allows both client-server style usage (as in AWM) and batch like usage using java-stored procedures (this is the OWB generate/deploy model). We'll cover some details for how the AWXML is generated in order to understand facets of building an AW solution with OWB.

How is the AWXML map generated for ragged fact data in OWB?
If the natural keys option is defined for the dimension, then any cube map using this dimension will have mapping code (AWXML mapping code) generated by OWB for handling ragged fact data. The source column for the cube dimension level is actually mapped to every parent level also; this enables ragged fact data to be loaded. If the generate keys options is defined then when you define a map to load data at this level then (because of the generate surrogate keys behavior) you will be loading cube dimension members for this level, and only this level.

How are aggregations defined?
OWB allows reuse of existing dimensions without necessarily defining additional hierarchies. Aggregations are generated based on the cube dimension level references defined within OWB. Only hierarchies where the cube dimension level is a member will be included in the aggregation, and OWB goes a step further. If the cube dimension level referenced is a non-leaf level of the hierarchy, then levels lower in the hierarchy will be excluded when the cube or measures are solved. So if you have a budget cube and an actual sales cube then they can share the same dimension definitions without additional dimension hierarchy definitions.

So what is AW auto-solving, what is it all about?
An important attribute of the OLAP AWXML engine is the AW auto-solve facility, this will happen when:

·  Cube is compressed, or
·  Cube is non-compressed - all measures are set to �Solve=Yes� and the dimension levels that the cube refers to are at the leaf level of all hierarchies the level is a member of.
· Auto-solved when OWB map is executed

Incremental Aggregation of cube is dependent on auto-solve (load and aggregate in one operation). Incremental aggregation is a property of the cube operator in the mapping.

OWB can generate non-autosolved cubes;

· OWB map optionally solves these cubes
· OWB transformation can be used to solve
· Cube has to be
    Non-compressed plus some measures are set to �Solve=No� or the dimension levels that the cube refers to are non-leaf levels of a hierarchy the level is a member of.
· Incremental Aggregation is only for Auto-Solved cubes.

Solve one measure and not another / load and solve?
With compressed cube use the NOAGG operation on the cube dimension aggregation tab. With non-compressed cubes it is possible to solve one measure and not another as follows. You will need the latest database patch also for this (10.2.0.2, bug 4550247 has details for 10.1 patch) for it to work properly.  The options defined on cube measures for solve indicate which measures will be included in the primary solve. The solve indicator on the cube operator in the map however indicates whether this solve will be executed or not. So the map can just load data or load and solve the data.

So how do I solve independent of load?
There is a transformation function for executing solves, so solves can be scheduled independently from loading. It is also possible to solve measures independently from each other using this function (WB_OLAP_AW_PRECOMPUTE).
For example the following will solve the measure SALES in the SALES_CUBE

declare
  rslt varchar2(4000);
begin
�  rslt := wb_olap_aw_precompute(�MART',�SALES_CUBE','SALES');
�end;
/

This routine has parameters for parallel solve, max number of job queues etc. If parallel solve then an ASYNCHRONOUS solve job is launched and the master job ID returned via the functions return value.


What calculation plans are generated by OWB?
The following calculation plans are generated by OWB

· Calculation plan generated for the cube
· Calculation plan generated for each stored measure

This allows measures to be solved individually after a data load, or entire cubes to be solved. The actual calculation plan can also exclude levels based on OWB metadata.


How do I parallel solve, what is the degree of parallelism?
This is a configuration parameter of the cube mapping. The cube operator has a property �Allow Parallel Solve� and also a property for the �Max Job Queues Allocated�, these 2 properties control whether parallel solve is performed and also the size of the job pool (the default is to let the AWXML engine figure it out).

What is the output of an OWB AW map?
There is an output parameter AW_EXECUTE_RESULT from the mapping. When the map is executed using parallel solve, this output parameter will contain the Oracle job ID � this is ASYNCHORONOUS. The user must then, or can then use the dictionary views below to determine when the job is complete and what to do next;

all_scheduler_jobs
 (Scheduled Jobs)
all_scheduler_job_run_details
 (Job History)
all_scheduler_running_jobs
 (Running Jobs)

If the map is not executed using parallel solve, the AW_EXECUTE_RESULT output parameter will return the �Successful� tag or an error. For error view the OLAPSYS.XML_LOAD_LOG table.

Hopefully this provides some background to the different kinds of load/solve capabilities that can be incorporated into OWB.

File Listing/XMLType from a stored procedure

Ever wondered how to get a list of file names in a directory from within the Oracle database? See post for more details on OWB. This is something I've done in 2 cases, one which happens to use the XML/SQL capabilities from within a map, and the other using simple indexing from within a process flow....

For a mapping...

Here you will see how java in the database, along with XML/SQL can be used to provide a useful utility for getting a list of files on the OS. With a small piece of java we can return an XML document containing the files in a directory. The document could also contain other useful information about the file such as the updated/created timestamp (you just jave to extend the java and XML returned) etc. You can then use this in SQL queries.

The following Java operation has a directory name parameter and returns a list of file names in the directory. The files are returned in an XMLType instance which has a top level node <files> and <file> child nodes, where the <file> child has an attribute name with the name of the file.

With the java loaded into the database, a JSP wrapper and some permissions granting, we can now write SQL to get the file names;

SELECT
  extractValue(value(GET_XML_FILE_NODE), '/file/@name') FILE_NAME
FROM
  TABLE ((XMLSEQUENCE(EXTRACT((GETFILELIST('/tmp')), '/files/file')))) GET_XML_FILE_NODE;

Can now get all files with a specific name using regular SQL for example;

select * from (  SELECT
  extractValue(value(GET_XML_FILE_NODE), '/file/@name') FILE_NAME
FROM
  TABLE ((XMLSEQUENCE(EXTRACT((GETFILELIST('/tmp')), '/files/file')))) GET_XML_FILE_NODE) where FILE_NAME like 'STG_FILE%';

For a process flow....

For a process flow rather than using the document, we have one procedure to get the number of files in a directory and another for getting the nth file (so we can process from within a loop):

select getfilecount('/tmp') from dual;

to get the 1st file (index 0):

select getnthfile('/tmp', 0) from dual;

The java stored procedures for both cases...

Here is the java code for both of these examples:

import java.io.File;
import java.io.IOException;
import java.sql.SQLException;
import oracle.xml.parser.v2.*;
import oracle.xdb.XMLType;
import oracle.jdbc.driver.OracleConnection;
import oracle.jdbc.pool.OracleDataSource;

public class UtilFileSystem {
     public static XMLType getDirList(String dir) throws SQLException {
       XMLType xmlt = null;
       try {
         File[] logfiles = new File(dir).listFiles();
         XMLDocument xml = (XMLDocument) new XMLDocument();
         XMLElement docse = (XMLElement) xml.createElement("files");
         XMLElement docs = (XMLElement) xml.appendChild( docse);
         OracleDataSource ods = new OracleDataSource();
         ods.setURL("jdbc:oracle:kprb:");
         OracleConnection conn = (OracleConnection) ods.getConnection();
         for (int f = logfiles.length; f > 0; f--) {
           try {
             XMLElement ele = (XMLElement) xml.createElement("file");
             ele.setAttribute("name", logfiles[f-1].getName());
             XMLElement adoc = (XMLElement) docs.appendChild( ele);
          }catch (ArrayIndexOutOfBoundsException e){
              System.err.println("Usage: java ReadFile filename\n");         
              e.printStackTrace();
              throw e;
          }
        }         
        xmlt = new XMLType(conn,xml);
      }catch (RuntimeException ex){
        ex.printStackTrace();
        throw ex;
      }catch (SQLException gex){
        gex.printStackTrace();
        throw gex;
      }
      return xmlt;
     }

     public static int getFileCount(String dir) throws SQLException {
       try {
         File[] logfiles = new File(dir).listFiles();
         return logfiles.length;
      }catch (RuntimeException ex){
        ex.printStackTrace();
        throw ex;
      }
     }
     public static String getNthFile(String dir, int n) throws SQLException {
       try {
         File[] logfiles = new File(dir).listFiles();
         return logfiles[n].getName();
       }catch (RuntimeException ex){
        ex.printStackTrace();
        throw ex;
      }
     }
}

It was loaded into the database using something like the following:

loadjava -user <TGT_SCH>/<TGT_SCH_PASSWD>@<HOST>:<PORT>:<SID> UtilFileSystem.java -resolve -force -verbose -thin

The PLSQL wrappers...

I also created a PLSQL wrapper for the Java (even this is optional in 10g):

create or replace function getfilelist (dir_path IN VARCHAR2) return XMLType is LANGUAGE JAVA NAME 'UtilFileSystem.getDirList(java.lang.String) return oracle.xdb.XMLType';

/

create or replace function getfilecount (dir_path IN VARCHAR2) return NUMBER is LANGUAGE JAVA NAME 'UtilFileSystem.getFileCount(java.lang.String) return int';
/
create or replace function getnthfile (dir_path IN VARCHAR2, n IN NUMBER) return VARCHAR2 is LANGUAGE JAVA NAME 'UtilFileSystem.getNthFile(java.lang.String, int) return java.lang.String';
/

Granting privs for java in the database...

Finally since this was accessing the file system from within the database, the schema running the java must be granted OS permissions, in the example below I have granted access to SCOTT, this should be your schema where the code executes):

exec dbms_java.grant_permission( 'SCOTT', 'SYS:java.io.FilePermission', '/tmp', 'read' )

In summary...

After this you are ready to go, you can now use the functions in OWB mappings (for example use the function itself via a transformation, or via an expression in a constant or expression operator) or process flows transformation activities/expressions.

Kind of fun and useful at the same time!


...odds and ends...

Filtering....

It is also possible to include filters such that only files with a certain pattern/suffix  etc. are retrieved. For example the snippet below could replace the initialization of the logfiles variable in the functions above to list only files with suffix '.dat' (include import java.io.FilenameFilter; also)

/* This example returns files that end with '.dat' */
FilenameFilter filter = new FilenameFilter() {
  public boolean accept(File dir, String name) {
    return name.endsWith(".dat");
  }
};
String[] logfiles = new File(dir).list(filter);
/* also change getnthfile do not need '.getName()' */

Java DB Permissions
If you use the format <directory_name>- this will grant privileges on all directories and files within a directory. With this you could have additional filters, since this would let you read the file modified times and allow you to get all files modified after a certain timestamp.

May 3, 2007

Designing an XML View in OWB

To write about Travis at The Fillmore or generating views from OWB...? Oh just this once I'll do the OWB thang...


Here we will see one way to design a view in OWB. This is applicable to any view but the XML views are an interesting example, so here goes. The mapping editor can be used to build views and materialized views, this is done by having the target operator as a view operator (or materialized view) for example. After the map is fully designed you can create and bind from the view operator in the map - this will generate the view object with the SQL generated from the map definition for the content of the view.


In the example below we are reading XML from a purchase orders table where the XML is stored in an XMLType column and targeting a view, the map extracts line item information (it uses the components generated from the XML expert on OTN)  along with some details of the purchase order:


XMLViewMap:


When the create and bind is selected a new name can be entered. The view definition generated is shown below:


XMLViewDef:


There are certain parts of the XML/SQL that are not possible in OWB (such as the XMLTable function - these have to be embedded in a view), but the rewrite of these boils down to the same code as the XML/SQL extract style functions anyway. The expression/filter style operators allow a free text style expression to be defined so all kinds of stuff can be done including XQuery within the database.


Back to Travis, I mean work...:)

May 15, 2007

Failing to deploy/execute objects....

Getting the exception RPE-01012 when trying to deploy a map? Or RPE-02245 with a process flow? Or failed to
execute a map with ORA-01017: invalid username/password. How do you execute via a PLSQL function/procedure from the target user? Here's a few things related to failures on
deployment, execution etc that its useful to be aware of.

Failed to deploy Map with RPE-01012

This is generally just a privilege issue because you are deploying to a
regular database user which has no OWB specific privileges. This is the
'RPE-01012: Cannot deploy PL/SQL maps to the target schema because it
is not owned by the Control Center' error.

Check if the schema
where deploying the map to is in the same database instance as
the runtime platform.  If
so, has the schema (where maps are being deployed into) been added as
an OWB user (Global Explorer panel/Security/Users)? This will actually
grant some roles to the schema to enable maps to be deployed. This error (RPE-01012) comes from a simple check to see if
the schema you are deploying the map to has been enabled as an OWB
target user. If not in the same instance, then you will need the OWB repository installed in the remote instance and the user added as a target user in it.



For example to recreate the problem if I have a map X in module M which
uses location L, L is just a plain old database user P. If I try
deploying the map I will get the RPE-01012. You can actually deploy a
table to this schema which kinds of tricks you into believing
everything is OK (it is not, for maps anyway, they have greater
dependency on audit/credentials etc).



One way to setup the database user is using OWB, you can create an OWB
user based on an existing database user (or create a new one), this
functionality is available to the repository owner under the Security
node in the global explorer.
CreateOWBUser:

Ensure the user has the target schema option (I used existing database
user XWEEK).
EnableTargetSchema:

This will grant privileges to the database user which will allow the
deployment etc. to operate.

Deploying the map should be successful now.

Failed to execute Map with ORA-01017: invalid
username/password

With this case executing a map (say M) fails using the sqlplus exec
template SQL script. The error in the audit states ORA-01017: invalid
username/password; logon denied. You are puzzled because you can
connect to SQLPlus as WH, but get some odd error when you execute the
command (and view the execution audit in the control center for
example);

sqlplus WH/WH @sqlplus_exec_template.sql  DESIGN WH_L PLSQL M "," ","

This
commonly happens when location WH_L representing database user WH has
been registered by the repository owner (DESIGN) or another user and
the preference 'Share location password during runtime' is disabled
(this is the default).
SharePasswordRuntime:

It more than
likely means the WH user has no registration details for WH_L, since
the details are not shared. You can logon to OWB as WH and register the
location WH_L, entering the password. Now go back to sqlplus and you
should execute fine.

How to execute Map/Process from Target user in PLSQL procedure
The
routine has to be invoker rights in order to set the roles for the
session (you cannot see the wb_rt_api_exec.run_task method without
enabling this role). Here is the function I created in the target to
allow the execute of any object.

    -- With this you will have to change the name of your CC owner, I have used OWB_CC
    create or replace function execute_code ( obj_loc  varchar2,
                                              obj_typ  varchar2,
                                              obj_name varchar2,
                                              sys_parm varchar2,
                                              cus_parm varchar2)
                              RETURN NUMBER  AUTHID CURRENT_USER as
      r number;
      cc_owner varchar2(256) := 'OWB_CC';
    begin
      execute immediate 'set role OWB_D_'||cc_owner||', OWB_O_' || cc_owner;
     
execute immediate 'begin :r :=
'||cc_owner||'.wb_rt_api_exec.run_task(:obj_loc, :obj_typ, :obj_name,
:sys_parm, :cus_parm,0,0); end;' using OUT r, IN obj_loc, IN obj_typ,
IN obj_name, IN sys_parm, IN cus_parm;
      return r;
    end;

Execute a process flow from a target schema:
    declare v number; begin v := execute_code('OWF_LOC', 'PROCESS', 'DEMOFLOW/FLOW_FTP', ',', ','); end;
    /

Failing to Deploy Process Flows, getting RPE-02245

This happens when OWB and WF are in different instances. Access from WF to OWB is performed through a proxy user. During deployment the proxy user is granted the necessary object privileges required to execute the deployed process flows. The proxy user must already exist and does not require any privileges apart from CONNECT; it can therefore be an ordinary database user or a OWB user. Create the user through OWB with the same password as the existing remote Workflow OWF_MGR user, now go deploy the process flow.

Anyway hope these are useful.

May 18, 2007

Calling Web Services from OWB 10gR2

Consuming a web service in OWB where the web service produces a document? Read on... This entry details how to incorporate a web service into an OWB mapping.

There are a number of ways to execute web services from within the database and therefore from with OWB 10gR2:
·    perform all the work using UTL_HTTP
·    utilize Jpublisher which does a lot of work to interface SQL to the web service
·    utilize the UTL_DWS package that allows generic web service consumption

This example leverages the JPublisher code that generates object types, tables of objects and table functions from a WSDL file related to the 2006 World Cup (this was a topical example from the summer based on a World Cup web service, see WSDL here). It is similar to the example on OTN (see here) but includes how to consume a web service that produces a document.

1. Download JPub 10.2 from OTN (here) - on Linux edit the jpub script for local env.
2. Download DBWS zip for 10.2 (here)
3. The demo uses a web service providing information from the 2006 World Cup, the WSDL was copied to my local machine for reference, this contains complex types with collections etc.
4. Publish PLSQL wrapper and proxy code for the WSDL into the schema (tgt_102), also generated table function proxies;

./jpub -user=<user>/<password> "-sysuser=sys/<passwd>" "-compile=true" "-proxyopts=tabfun" "-proxywsdl=http://<host>:<port>/footballpoolwebservice.wsdl"
If you use the "-proxyopts=tabfun,noload" and remove -user and sysuser, you can simply generate the code independent of a schema - the code is not loaded into the server, but is generated. The code includes SQL scripts with the type definitions, functions and table functions and also Java code that must be loaded into the database that binds the SQL, Java and Web Service worlds.
5. For my example I created a table of stadium names in order to make an interesting call on the generated table function;
create table stads (name varchar2(256));
insert into stads values ('Olympiastadion');
insert into stads values ('Zentralstadion');
commit;

The example built in OWB is one to retrieve all of the stadium information for the tables listed in the table STADS above. From SQL*Plus the query looks like;
exec sys.utl_dbws.set_http_proxy('www-proxy.us.oracle.com:80');
select
  r.arg0 name,
  r.res.iSeatsCapacity_ capacity,
  r.res.sCityName_ city
from table(JPUB_PLSQL_WRAPPER.to_table_stadiuminfo(cursor(select name from stads))) r;

There are options for supplying PLSQL package names (rather than getting JPUB_PLSQL_WRAPPER), and also java package names for the proxies (the default is genproxy).

Let�s see how we leverage this from OWB.....

Import the generated package or just the types and table types. Table function support is loosely coupled in OWB 10gR2 so there is not a great value in reverse engineering the package if you need the table functions (since the names have to be manually entered in the table operator), but certainly the types etc. are used so ensure all the types and table types are imported.
WS Import:
 
Also import the table STADS that has been created above, this will be used in the mapping.


Create a mapping then add a table function operator to the mapping. You will have to enter the table function name in the operator�s property panel.
WS TF Props:

Change the name of the operator for readability;
WS TF Name:

Add the input attribute for the function being used (this is taken from the dependent function):
WS TF Atts:

Add the output attribute; this will be the type returned by the table function (you can assume OBJECT_VALUE here):
WS TF Out Atts:

Now add an expand operator for the type returned by the table function and another expand operator for the RES value (which is of type OBJ_TSTADIUMINFO), here we are expanding into the structure for each row returned by the web service.
WS Map1:

For simplicity let�s create a new table and write all of the records returned to this table;
WS Map2:
 
Just map from the output group of the expand operator to the new table operator�s group:
WS Map3:

Then create and bind the new table;
WS Map4:

We�ll define the source rows for the web service, first let�s define the input to be a ref cursor, this will be based on the STADS source table:
WS Map5:

Now add the STADS source table and map the NAME attribute into the table function�s input:
WS Map6:
 
For calling web services from the database we must set up the HTTP proxy information, add a pre mapping process that uses the procedure INITIALIZE_PROXY (we�ll define this in a second):
WS Map7:

Previewing the code we see the following, looks good:
WS Map8:

After executing the mapping we see the following content in the target:
WS Map9:

Here is the procedure definition that we are using to initialize the HTTP proxy:
WS Map10:

WS Map11:

I like this web service example, it illustrates producing complex documents and how such documents can be consumed from OWB -  but is missing the Scotland football (or soccer) team :).

May 21, 2007

OWB and Oracle Workflow - going forwards

Many people have seen the OWF statement of direction and get worried that OWB uses this heavily in their solution. Well, there is no reason to worry... While OWF as a standalone product will do what the statements say, products that are relying on the OWF solution will keep using the latest version (2.6.4) going forward.

For 11g OWB the integration is getting tighter as we will bundle and install the approriate OWF code with the OWB install. In other words you will be getting a WF directory under the OWB node in the Oracle home.

While we are talking about this, just to give away a little bit on the 11g release for OWB, the install is one of the things we did some major rework on. If you install an 11g database, you will get the core OWB pieces installed making it much simpler to start using the tool. Simply point it at an 11g database and you can create your new repository (no SYSDBA required anymore!) and get going. With OWF installed that step is taken out of the equation as well...

While I know that this is not an official statement of what is going on, I hope it reconfirms that OWF is still a very much alive solution component and will be available for use.

May 22, 2007

Mailing Map Errors

Sending mail from map or process flow activities often pops up in questions. The OWB process flow designer has a mail activity that makes sending a mail simple, you may find this inadequate if so, sending mail from the Oracle database is fairly simple, googling around you can find all kinds of stuff (here is one example). When I was trying this out I tried both approaches.

From within a process flow the simplest way I found to get all errors (possible errors initializing activity parameters for map and the map execution itself) is to have a custom function that takes the process flow execution id which you can get from a global OWB variable and pass the name of the flow and the name of the activity in order to query the OWB audit tables (it is not great passing names, but you will get all error types). I have created a simple function which also takes a boolean indicating the format (using HTML or plain text) and will return a string which will be the email body, it can return plain text or HTML formatted text. The OWB mail activity supports a text body, the example I google'd (see SQL here) has both text and HTML.

Here is the code for the function I used in the process flow;

create or replace function GET_ACTIVITY_ERRORS(
    FLOW_AUDIT_ID IN NUMBER, 
    PROCESS_FLOW_NAME IN VARCHAR2,
    ACTIVITY_NAME IN VARCHAR2,
    HTML_RESULT IN BOOLEAN DEFAULT false)
RETURN VARCHAR2    IS
cursor cAllErrors is
  select to_char(p.created_on, 'HH:MM:SS DD-MON-YY') || ' '||
  p.execution_audit_id || ' '||
  p.message_text  AuditText
  from all_rt_audit_exec_messages p
  where execution_audit_id in (
   select max(execution_audit_id)
    from all_rt_audit_executions a
    where a.parent_execution_audit_id=FLOW_AUDIT_ID
    and a.execution_name=PROCESS_FLOW_NAME||':'||ACTIVITY_NAME);

errorText VARCHAR2(4000) := 'Error in activity '|| ACTIVITY_NAME || ' : ' || chr(10) || chr(10) ;

BEGIN
    if (html_result) then
      errorText := '<br><b><u>Error in activity '|| ACTIVITY_NAME || ' : </u></b><br>' || chr(10) || chr(10) ;
    end if;
    FOR aerr in cAllErrors LOOP
      if (html_result) then
        errorText := errorText || '<br>  ' || aerr.AuditText || '</br>' || chr(10);
      else
        errorText := errorText || '  ' || aerr.AuditText || chr(10);
      end if;
    END LOOP;
    RETURN errorText;
    EXCEPTION
        WHEN OTHERS THEN
            return errorText;  -- return whatever is in the buffer
    RETURN NULL;
END;

This function is used in the flow using the OWB email activity and in the version which will send HTML formatted emails, I use the boolean to get HTML tokens in the result. With this function I'll get errors such as UK violations when executing the map or issues initializing the parameters for the map such as date format issues (say there was an expression during the evaluation of the mapping activity input parameter representing load date).

The basic process flow has a mapping activity that has a transition for success, all other error paths (activity initiation errors, execution warnings and execution errors) go through the mail path. I tried using the assign activity rather than the transformation activity to get the string representing the email body, I preferred the transformation activity - when an assign activity uses a custom transformation, you will have to qualify the function with the schema since the OWB code does an ALTER SESSION SET CURRENT_SCHEMA to the OWB control center so that the OWB code that the assign implementation references will be resolved. Not exactly good for everyone using it having to hard-code schema names, that's after having setup the evaluation location for the flows. So the transformation activity was used. The output of the function is bound to a variable named EMAIL_BODY. The variable is used in the email activity.

Mail Flow:

The transformation activity that retrieves the error codes has a parameter which the flow audit id, the OWB global variable PARENT_AUDIT_ID is passed into this parameter, I've set the literal value to false, since this is an expression that needs evaluated.

Mail Flow Audit Id:

The output of the function is stored in the process flow variable EMAIL_BODY, you can assign the result of the function to the variable by binding the output parameter GET_ACTIVITY_ERRORS to the variable;

Mail Flow Body:

This version of the flow sends a plain simple text email with the audit details;

Mail OWB 1:

If we use the PLSQL procedure that sends HTML content we can get richer emails using all of the HTML controls we need, so the new flow uses the send html mail PLSQL procedure and alters the HTML_RESULT parameter to GET_ACTIVITY_ERRORS.

Mail Flow Xform:

With this approach we can do all kinds of formatting, tables and linking. Here is the output with bold and underlying HTML;

Mail OWB 2:

If you need any manual intervention there are 2 mechanisms, using the notification activity which is an elaborate mechanism which relies on the workflow notification, there is also a manual activity that pauses the flow and allows manual intervention which can be performed in the repository browser using the activity expedite, more to come on this.....


Manual intervention in a process flow (restart, retry or reroute): Part I

As a follow up on the emailing errors from process flows, this post looks at using the Manual activity in the process flow editor to choose a path interactively. If you combine this post with the emailing post, you basically replaced the workflow notification (which requires a middle-tier) with a fully out of the box solution without middle tier.

The Manual activity is essentially a simple activity, in that it always waits for input from the OWB Browser. The input is in the form of the result codes OK, OK with Warnings or Failure. In the browser you can choose which one you have posted as the outcome for the activity. You can however now build very interesting scenarios like restarting or retrying activities based on an administrators decision to do so. And this can be done after taking the required corrective action...

For this posting (don't worry the rest will follow soon) lets just look at a basic example as to how we can get the manual interaction to work.

Create a process flow package and a process flow that looks like this:

ProcessWithManualActitity:

The idea here is to show the results as to what will happen. When I press Failure in the Browser it will go through the error stream and execute the procedure (which simply inserts 'yes' into a table, you do not need this to get Manual to work!) so I can see what is going on...

The parameter values are set as text strings, and in this case I've left the priority at 50 (medium, see the online help for details on priority):

ManualParametersToSet:

Deploy the package to workflow using the normal steps and execute the process flow.

Run the process flow, the control center now looks like this (note you can start this from the browser as well!):

RunningFlowFromCCM:

In my case it will basically sit in "Busy" until I do something with the Manual activity, so lets get going on that one. Launch the OWB browser and to into the Execution Schedule Report. Find your process flow and note it is Busy (as I said)...

MainJobExecutionScreen:

Click the Execution Job Report on the Busy line to drill into the flow, you will get the following page:

ExecutionDetailsToExpedite:

Notice the Expedite Button etc in the red box. Do NOT press that one, this will push the process flow to the end and basically aborts the entire thing. I did this and it does not do what we are trying to do here! Instead click on the Execution Job Report for the Manual Activity (the Green Box), you will get a very similar page, but now just for this activity:

ExpediteScreenForManualActivity:

Here choose Expedite after setting the status (the red box again). In the green box you can see the text and body where you can tell administrators what is going on. We tried doing some formatting with HTML tags, but that does not quite work as you can see.

Now press the Expedite Execution button and you'll see a Busy_Failure (or Busy_OK) status and after the process runs to completion you will see the approriate end status (in my case a failure).

ExpeditedStatusResult:

Next up, doing some loop backs around the manual activity...

Manual intervention in a process flow (restart, retry or reroute): Part II

Doing the actual "go back after I did something" is very simple. I've extended and changed the process flow used in the previous post (part I) to look like this:

LoopBackUponError:

Essentially I have some execution part (my silly procedure which insert yes into a table) which could be a mapping or set of mappings and then I have error handling. In this case if the procedure fails (for whatever reason, I can make this more sophisticated by going on error status or specific outputs) it goes into the manual activity. The administrator can come in to Browser and investigate the issue, resolve it and restart the process with the last execution.

In my example, I renamed the table to insert into, so I got the procedure to fail (imagine the select from an external table fails because the file does not (yet) exist). Go into the Browser and say Expedite Execution with a Status go OK and it tries the procedure again. Since I re-renamed the table it now works nicely...

FinalResultOfLoopback:

Notice that at the highest level in the Browser this job is marked as OK but with some note that something went wrong at some point.

For the next entry in this series (didn't plan to do this, but hey all good things come in 3-s - Indiana Jones, Star Wars (ok 2 times 3) and of course Pirates of the Carribean...) we'll have a play with a mapping and commit control... that should be fun.

May 30, 2007

Mapping Generation Language in OMB (Retrieve and Alter)

For those of you playing with OMB Plus, here is just and example on how to find generation languages in a mapping. Also figured it may be helpful to know (in advance) the results you may get from these queries (including context changes which are required):

OMB+> OMBDCC
/
OMB+> OMBCC 'SAP_DWH_QUICKSTART'
Context changed.
OMB+> OMBCC 'QS_STG'
Context changed.
OMB+> OMBDCC
ORACLE_MODULE /SAP_DWH_QUICKSTART/QS_STG

OMB+> OMBRETRIEVE MAPPING 'STG_COMPANY_MAP' GET PROPERTIES (GENERATION_LANGUAGE)
ABAP

OMB+> OMBRETRIEVE MAPPING 'STG_BILLING_US' GET PROPERTIES (GENERATION_LANGUAGE)
PLSQL

OMB+> OMBRETRIEVE MAPPING 'LOAD_CUST_FILE' GET PROPERTIES (GENERATION_LANGUAGE)
SQLLOADER

Obviously this way you can retrieve all properties, here are some interesting ones for SQL Loader (btw, most people may not know these exist, in the UI you can find them under the Mapping => Configuration menu):

OMB+> OMBRETRIEVE MAPPING 'LOAD_CUST_FILE' GET PROPERTIES (MULTITHREADING)
USE_DEFAULT
OMB+> OMBRETRIEVE MAPPING 'LOAD_CUST_FILE' GET PROPERTIES (OPERATION_RECOVERABLE)
true
OMB+> OMBRETRIEVE MAPPING 'LOAD_CUST_FILE' GET PROPERTIES (CONTINUE_LOAD)
false

And while we are on the topic, here is how you alter a property:

OMB+> OMBALTER MAPPING 'LOAD_CUST_FILE' SET PROPERTIES (CONTINUE_LOAD) VALUES ('true')
Map LOAD_CUST_FILE altered.
OMB+> OMBRETRIEVE MAPPING 'LOAD_CUST_FILE' GET PROPERTIES (CONTINUE_LOAD)
true

Now if you wonder, how does he know the property names? Rather than looking it up I use the following very complex naming algorithm:

  • Look in the configuration UI (in this case for mappings)
  • Find the property you are looking for, for example Continue Load
  • Make it all upper case and substitute spaces with underscores: CONTINUE_LOAD
  • Mostly this works...
And then write your OMB...



May 31, 2007

Additional info on Running mappings from OWB browser

Got some questions on the previous post. I actually used the execution job reports to start the mapping in question. The only problem with that is that mappings (and process flows of course) only show up there after they have been run once before. Now in development that nicely works of course, but if you work in hosted (and even non-hosted) production environments the mapping gets deployed, but not executed until the schedule kicks in or until you go in as operations staff and run via the browser...

That is where we end up in catch-22 as to but in order to run it, it needs to have run once before... Well that is actually not true. It is just the path I took in this instance, but in OWB browser land many roads lead to Rome (or to executing mappings)...

So here is another way. I'm using a different mapping, which has been deployed from the control center (as if it were production) but has not been executed. I'm actually starting this from the location, which makes sense for DBAs or operations personnel, since they will understand the physical schema.

TGTLocDetails:


The location report allows me to Filter on the Object types (they are a bit funny, but overall make sense) and then in this case I sorted on the last deployed time stamp. You can see my NVR_RUN_B4 mapping is listed on top now. I can now click on the object and get to the deployment history:

DeploymentHistoryWithStart:


However since this is a mapping (and this works also with process flows) you now have the option to start the mapping (upper right corner). Hit start and you get the familiar parameter screen as shown here:

StartMappingAgain:


From now on you can either choose to run via the 'Location route' or use the execution history. I'd be happy to get feedback on this... I think it might make sense to have a report of executable objects in general and you can run these, but not sure how many folks out there use this functionality a lot or for a large number of executions.



June 5, 2007

Making Multi-Configuration easier in 10.2 OWB

One of the powerful concepts in OWB today is the fact that we allow for multiple configurations (it is part of EE ETL option!). In 10.2 the concept while present (you may have seen the DEFAULT_CONFIGURATION stuff on your status bar) is a bit hidden. This expert (which becomes a menu in the tool and can be placed on the right mouse menus) makes some of the concepts more visible and usable.

For example you can now (with the expert in place) see the configuration values for an object (or object type, lets say table) across the configurations. You can think of Default -> QA env -> Prod Env, and then view the parallel settings for a table in a simple spreadtable.

Lots of goodies in this one, so do take a look at it.

Synchronizing Pluggable Maps

This post illustrates a simple process to determine which maps depend on a pluggable mapping component and to actually synchronize them. The MDL
for the expert is here, feel free to download, use, comment and enhance;) It works for OWB 10gR2 and 11g.

After importing the MDL the following 2 steps are a one-off to setup the expert in the repository. You can add a shortcut to run the expert from a pluggable map (the expert only supports pluggable mappings just now);
Experts Add Remove:

You will see a dialog, click on the box next to SYNCHALL, now the setup is done!

Experts Sync:

Now you can actually run the expert from the main tree as if it was another OWB component;

Experts Synchronize Dependents:


Now the fun starts! You should see a dialog appear with all the maps
that use your component, I have 2 maps that use my pluggable map;

Experts Sync Run:



You can select multiple maps and click OK to synchronize. Be patient, you will get a dialog indicating the expert is complete. If you want to see what goes on behind the scenes you can switch the log window on and the task assistant (both expert properties) and see the OMB that gets executed, in the end there are statements such as the following executed to synchronize dependent mappings;

OMBSYNCHRONIZE PLUGGABLE_MAPPING '/DIM_EXAMPLES/PLUGGA' TO MAPPING '/DIM_EXAMPLES/WH/ANOTHER_USE_OF_PLUG' OPERATOR 'PLUGGA' USE (RECONCILE_STRATEGY 'REPLACE', MATCHING_STRATEGY 'MATCH_BY_NAME')

Feel free to
look under the hood of how this works and hope it is useful.


June 8, 2007

What tables are used in my view?

Metadata is an interesting thing. Views are also interesting, however they typically break the metadata story as they are code. No one typically knows what exactly is in the view and you struggle therefore understanding the impact of a change in tables to all the views you have. This post is the first as to how to resolve this using Warehouse Builder.

I have a simple example, a view called REFERENCE_EXAMPLE and it (at least in theory) contains a query that has 2 tables used in it. They are EMPLOYEES and CITIES. What I will do is to link them together using the OWB repository and then look at the Lineage and Impact assessments.

In OMB Plus go to the module that has the objects (OMBCC) then type the following:

OMB+> OMBALTER VIEW 'REFERENCE_EXAMPLE' ADD REF TABLE 'EMPLOYEES'
View REFERENCE_EXAMPLE altered.
OMB+> OMBSAVE
Save complete.

OMB+> OMBALTER VIEW 'REFERENCE_EXAMPLE' ADD REF TABLE 'CITIES'
View REFERENCE_EXAMPLE altered.
OMB+> OMBSAVE
Save complete.

Now launch Lineage from the view and you get:

ViewLineageToTables:

All of a sudden you can show graphically what objects are related to views. Now if you incorporate the view into a mapping, you can drill either way and the view is no longer that dreaded black box object. This is a very small piece of code, but it can be very usefull especially when the views are already created and are just imported into OWB. Next post I'll show a nice way of creating views in OWB and getting complete metadata...

Oh, and should you want to go cross module, make sure you either do a full path to the object to be referenced or a relative path, for example:

OMB+> OMBALTER VIEW 'REFERENCE_EXAMPLE' ADD REF TABLE '../DQ_SRC/JOBS'
View REFERENCE_EXAMPLE altered.
OMB+>

June 11, 2007

Some tips and hints about (XML) file as target...

I was playing with the XML as file target in OWB and I noticed some things that are worthwhile mentioning (I thought)... some of these also apply to a generic flat file though...


First things first, if you want to write to file (using the default operator as described here) make sure you understand that we are using UTL_FILE as a mechanism. This means certain things as to how to set up access to directories. More about is in the Oracle documentation on utilities (I think).


Now if you use a file as target, you do NOT have to:



  1. Create a file definition first (provided you are not interested in fancy attribute names etc.)
  2. Synchronize (or create and bind initially) unless you really want to. The reason for doing this is that we will create a file definition for you and it makes lineage more complete (but the code will work without doing this)

What you do need to do is go into mapping configuration and set the data file properties:


FlatFileTgt:


All of this information will be used to generate the actual mapping. Not checking the XML box obviously will give you an ASCII output file.


Before we go to the XML case, do be aware of the fact that the file is defined as having as many fields (and seperators) as the file definition in the mapping. So for a regular flat file with this mapping:


FileTGTMapping:


You get the following result (Note the last ',' in the file!):


1000,999,1,10/1/05,6,
1001,175955,1,10/1/05,6,
1002,175955,1,10/1/05,6,
1003,205852,1,10/20/05,6,
1004,184497,1,11/1/05,8,
1005,180226,1,10/5/05,17,
1006,175955,1,11/1/05,7,
1007,175955,2,9/30/05,4,
1008,406995,1,11/29/05,118,
1009,180226,2,12/12/05,60,
1010,193039,1,12/1/05,118,
1011,197310,1,11/28/05,104,
1012,193039,1,11/2/05,69,


Now if we look at the same mapping, but now creating an XML file (and only for the XML variant) and not mapping an attribute or element in the XML file, you will run into an error upon deployment (PLS-00201: identifier 'NULL$1' must be declared)...


To avoid this and to explicitly model your expected data result (e.g. that last attribute - or any middle attribute in the file) you should create a mapping like this specifically adding a NULL value into the XML file.


XMLMappingWithConstantForNullHandling:


That way you will get the following (just a single 'row' in this case):


<?xml version="1.0" ?>


- <XML_TGT>



- <INOUTGRP1>



  <ORDER_NBR>1000</ORDER_NBR>


  <CUST_NBR>999</CUST_NBR>


  <BILL_TO_ADR>1</BILL_TO_ADR>


  <ORDER_DT>10/1/05</ORDER_DT>


  <PAYMENT_TERMS>6</PAYMENT_TERMS>


  <STATUS />

  </INOUTGRP1>

Note the 'empty' status part for this XML file...  If you work with the operator this way you will get compiling code and an explicit model of what data you are expecting. If you want to exclude attributes from the file, make sure that you remove the attribute from the definition. Since this is only captured (see first part of the post) in the mapping, this can be done quite quickly and efficiently.

Is Predefined Constant....

When using constants, take a look at this 'Is Predefined Constant' checkbox. This is the effect on the usage of a constant in a simple insert statement:


Unchecked (see the red line below):


      INSERT
      /*+ APPEND PARALLEL("TGT_ORDERS") */
      INTO
        "TGT_ORDERS"
        ("ORDER_NBR",
        "CUST_NBR",
        "BILL_TO_ADR",
        "ORDER_DT",
        "PAYMENT_TERMS",
        "STATUS")
        (SELECT
  "ORDERS_STG"."ORDER_NBR" "ORDER_NBR",
  "ORDERS_STG"."CUST_NBR" "CUST_NBR",
  "ORDERS_STG"."BILL_TO_ADR" "BILL_TO_ADR",
  "CST_MAP"."CONSTANT_0_OUTPUT1" "OUTPUT1",
  "ORDERS_STG"."PAYMENT_TERMS" "PAYMENT_TERMS",
  "ORDERS_STG"."STATUS" "STATUS"
FROM
  "ORDERS_STG"  "ORDERS_STG"
        )
      ;


Checked (see the blue line below):


INSERT
      /*+ APPEND PARALLEL("TGT_ORDERS") */
      INTO
        "TGT_ORDERS"
        ("ORDER_NBR",
        "CUST_NBR",
        "BILL_TO_ADR",
        "ORDER_DT",
        "PAYMENT_TERMS",
        "STATUS")
        (SELECT
  "ORDERS_STG"."ORDER_NBR" "ORDER_NBR",
  "ORDERS_STG"."CUST_NBR" "CUST_NBR",
  "ORDERS_STG"."BILL_TO_ADR" "BILL_TO_ADR",
  sysdate "OUTPUT1",
  "ORDERS_STG"."PAYMENT_TERMS" "PAYMENT_TERMS",
  "ORDERS_STG"."STATUS" "STATUS"
FROM
  "ORDERS_STG"  "ORDERS_STG"
        )
      ;


The change comes down to calling the SYSDATE insert directly rather than using a function to do so... Should make a performance improvement in large row counts.

June 19, 2007

Creating Data Rules in Scripting

Here are a bunch of examples for creating data rules from OMB, this will be a useful cheat sheet for anyone generating rules from any repositories where the rules already exist. You can download a tcl file which creates the examples below from here. You can check out this post also which has useful info on applying data rules.

There are some subtle differences between how the UI creates the rules and the API exposed in scripting (specifically on unique key / referential / functional dependency). The UI poses a simple question of how many attributes are interesting for each rule type, in the API you have to add attributes to the respective groups for each type (the pre-built groups and attributes are defined in the OMB documentation in the API guide). See the examples below for more info - it is not exactly obvious!

The following examples will be covered:
   1. create a custom rule
   2. create a custom rule with multiple columns
   3. create a fixed domain list
   4. create a domain range
   5. create a no null rule
   6. create a unique key rule
   7. create a referential rule
   8. create domain pattern list for postcode
   9. customize built-in patterns
  10. create functional dependency
  11. apply a data rule to a table


1. Create a Custom Data Rule
In the example below we create a single column rule that check that the column which can be supplied is > 300 - by default the custom rule has an attribute VALUE in the group THIS:

OMBCREATE DATA_RULE 'CUSTOM_RULE' SET PROPERTIES(RULE_TYPE) VALUES('ATTR_VALUE_RULE')
OMBALTER DATA_RULE 'CUSTOM_RULE' SET PROPERTIES(BUSINESS_NAME,DESCRIPTION) VALUES('Custom Rule', 'Single column rule')
OMBALTER DATA_RULE 'CUSTOM_RULE' GROUP 'THIS' ATTRIBUTE 'VALUE' SET PROPERTIES(DATATYPE) VALUES('NUMBER')
OMBALTER DATA_RULE 'CUSTOM_RULE' SET PROPERTIES(ATTR_VALUE_CLAUSE) VALUES('"THIS"."VALUE" > 300')

2. Create a Custom Data Rule with Multiple Columns
Here we can add additional attributes to the data rule (so many columns can be supplied ie. can check salary > XYZ and job_type in ....):

OMBCREATE DATA_RULE 'CUSTOM_RULE2' SET PROPERTIES(RULE_TYPE) VALUES('ATTR_VALUE_RULE')
OMBALTER DATA_RULE 'CUSTOM_RULE2' SET PROPERTIES(BUSINESS_NAME,DESCRIPTION) VALUES('Custom Rule2', 'Multi column rule')
OMBALTER DATA_RULE 'CUSTOM_RULE2' GROUP 'THIS' ADD ATTRIBUTE 'ANOTHER_VALUE'
OMBALTER DATA_RULE 'CUSTOM_RULE2' GROUP 'THIS' ATTRIBUTE 'VALUE' SET PROPERTIES(DATATYPE) VALUES('NUMBER')
OMBALTER DATA_RULE 'CUSTOM_RULE2' GROUP 'THIS' ATTRIBUTE 'ANOTHER_VALUE' SET PROPERTIES(DATATYPE) VALUES('VARCHAR2')
OMBALTER DATA_RULE 'CUSTOM_RULE2' SET PROPERTIES(ATTR_VALUE_CLAUSE) VALUES('THIS.VALUE> 300 AND THIS.ANOTHER_VALUE IN (SELECT DISTINCT JOBTYPE FROM EMP_JOBS)')

3. Create a Fixed Domain List
Create a domain rule with a fixed set of values:

OMBCREATE DATA_RULE 'DOMAIN_RULE' SET PROPERTIES(RULE_TYPE) VALUES('DOMAIN_LIST_RULE')
OMBALTER DATA_RULE 'DOMAIN_RULE' SET PROPERTIES(BUSINESS_NAME) VALUES('Domain List Rule')
OMBALTER DATA_RULE 'DOMAIN_RULE' ADD DOMAIN_VALUE 'dd' ADD DOMAIN_VALUE 'ee'

4. Create a Domain Range
Create a domain range rule:

OMBCREATE DATA_RULE 'DOMAIN_RANGE_RULE' SET PROPERTIES(RULE_TYPE) VALUES('DOMAIN_RANGE_RULE')
OMBALTER DATA_RULE 'DOMAIN_RANGE_RULE' SET PROPERTIES(BUSINESS_NAME) VALUES('Domain Range Rule')
OMBALTER DATA_RULE 'DOMAIN_RANGE_RULE' SET PROPERTIES(MIN_VALUE) VALUES(500)
OMBALTER DATA_RULE 'DOMAIN_RANGE_RULE' SET PROPERTIES(MAX_VALUE) VALUES(20000)
OMBALTER DATA_RULE 'DOMAIN_RANGE_RULE' GROUP 'THIS' ATTRIBUTE 'VALUE' SET PROPERTIES(DATATYPE) VALUES('NUMBER')

5. Create a No Null Rule
Create a 'no null' rule:

OMBCREATE DATA_RULE 'NO_NULL_RULE' SET PROPERTIES(RULE_TYPE) VALUES('DOMAIN_NO_NULL_RULE')
OMBALTER DATA_RULE 'NO_NULL_RULE' SET PROPERTIES(BUSINESS_NAME) VALUES('No Nulls Rule')

6. Create a Unique Key Rule
Create a unique key rule with a single column:

OMBCREATE DATA_RULE 'UK_RULE' SET PROPERTIES(RULE_TYPE) VALUES('IDENTITY_RULE')
OMBALTER DATA_RULE 'UK_RULE' SET PROPERTIES(BUSINESS_NAME) VALUES('UK Rule')
OMBALTER DATA_RULE 'UK_RULE' SET PROPERTIES(IGNORE_NULLS) VALUES('true')

How to set the number of attributes (for a composite unique key rule)? After this you edit the rule in the UI you will see 'Number of Attributes: 2', the attribute names you use will appear when you apply the data rule to a table (so you will bind the attributes from the rule to columns in the table):

OMBCREATE DATA_RULE 'UK_RULE2' SET PROPERTIES(RULE_TYPE) VALUES('IDENTITY_RULE')
OMBALTER DATA_RULE 'UK_RULE2' SET PROPERTIES(BUSINESS_NAME) VALUES('UK Rule2')
OMBALTER DATA_RULE 'UK_RULE2' SET PROPERTIES(IGNORE_NULLS) VALUES('true')
OMBALTER DATA_RULE 'UK_RULE2' GROUP 'THIS' ADD ATTRIBUTE 'KEY_ATTRIBUTE_2'



7. Create a Referential Rule
Create a referential rule:

OMBCREATE DATA_RULE 'REFERENTIAL_RULE' SET PROPERTIES(RULE_TYPE) VALUES('REFERENCE_RULE')
OMBALTER DATA_RULE 'REFERENTIAL_RULE' SET PROPERTIES(BUSINESS_NAME) VALUES('Referential Rule')
OMBALTER DATA_RULE 'REFERENTIAL_RULE' SET PROPERTIES(LOCAL_MIN_COUNT) VALUES(0)
OMBALTER DATA_RULE 'REFERENTIAL_RULE' SET PROPERTIES(LOCAL_MAX_COUNT) VALUES(1)
OMBALTER DATA_RULE 'REFERENTIAL_RULE' SET PROPERTIES(REMOTE_MIN_COUNT) VALUES(0)
OMBALTER DATA_RULE 'REFERENTIAL_RULE' SET PROPERTIES(REMOTE_MAX_COUNT) VALUES(1)

How to set the number of attributes - use group LOCAL/REMOTE, I use the same naming convention for attribute as OWB built-in name:

OMBCREATE DATA_RULE 'REFERENTIAL_RULE2' SET PROPERTIES(RULE_TYPE) VALUES('REFERENCE_RULE')
OMBALTER DATA_RULE 'REFERENTIAL_RULE2' SET PROPERTIES(BUSINESS_NAME) VALUES('Referential Rule2')
OMBALTER DATA_RULE 'REFERENTIAL_RULE2' SET PROPERTIES(LOCAL_MIN_COUNT) VALUES(0)
OMBALTER DATA_RULE 'REFERENTIAL_RULE2' SET PROPERTIES(LOCAL_MAX_COUNT) VALUES(1)
OMBALTER DATA_RULE 'REFERENTIAL_RULE2' SET PROPERTIES(REMOTE_MIN_COUNT) VALUES(0)
OMBALTER DATA_RULE 'REFERENTIAL_RULE2' SET PROPERTIES(REMOTE_MAX_COUNT) VALUES(1)
OMBALTER DATA_RULE 'REFERENTIAL_RULE2' GROUP 'LOCAL' ADD ATTRIBUTE 'LOCAL_KEY_ATTRIBUTE_2'
OMBALTER DATA_RULE 'REFERENTIAL_RULE2' GROUP 'REMOTE' ADD ATTRIBUTE 'REMOTE_KEY_ATTRIBUTE_2'


8. Create a Domain Pattern List for Postcode
Create a domain pattern list for British postcodes, any arbitrary regular expression can be supplied:

OMBCREATE DATA_RULE 'DOMAIN_PATTERN_RULE' SET PROPERTIES(RULE_TYPE) VALUES('DOMAIN_PATTERN_LIST_RULE')
OMBALTER DATA_RULE 'DOMAIN_PATTERN_RULE' SET PROPERTIES(BUSINESS_NAME) VALUES('Domain Pattern Rule')
OMBALTER DATA_RULE 'DOMAIN_PATTERN_RULE' SET PROPERTIES(DESCRIPTION) VALUES('British Postcode RegExp')
# I have escaped various characters with \ such as [ ] {  }
OMBALTER DATA_RULE 'DOMAIN_PATTERN_RULE' ADD DOMAIN_VALUE '(^(\[A-PR-UWYZ0-9\]\\[A-HK-Y0-9\]\\[AEHMNPRTVXY0-9\]?\[ABEHMNPRVWXY0-9\]? \{1,2\}\[0-9\]\\[ABD-HJLN-UW-Z\]\\{2\}|GIR 0AA)$)'

9. Customize Built in
Create a customized version of the built-in telephone format rule (there are common formats defined for;
  telephone      -DOMAIN_FORMAT_TELEPHONE_RULE
  IP address    -DOMAIN_FORMAT_IP_RULE
  SSN             -DOMAIN_FORMAT_SSN_RULE
  Date             -DOMAIN_FORMAT_DATE_RULE
  Number        -DOMAIN_FORMAT_NUMBER_RULE
  URL             -DOMAIN_FORMAT_URL_RULE
  Email            -DOMAIN_FORMAT_EMAIL_RULE 
Some of these are localized versions but it is fairly straightforward to create versions for other locales.

OMBCREATE DATA_RULE 'COMMON_FORMAT_TELE' SET PROPERTIES(RULE_TYPE) VALUES('DOMAIN_FORMAT_TELEPHONE_RULE')
OMBALTER DATA_RULE 'COMMON_FORMAT_TELE' SET PROPERTIES(BUSINESS_NAME) VALUES('Domain Format Telephone Rule')
OMBALTER DATA_RULE 'COMMON_FORMAT_TELE' SET PROPERTIES(DESCRIPTION) VALUES('Telephone Customized')
# I have escaped [ ] {  }
OMBALTER DATA_RULE 'COMMON_FORMAT_TELE' ADD DOMAIN_VALUE '(^\[\\[:space:\]\\]*\[0-9\]\\{3\}\[\\[:punct:\]|\[:space:\]\\]?\[0-9\]\\{4\}\[\\[:space:\]\\]*$)'

10. Create Functional Dependency
Create a functional dependency rule:

OMBCREATE DATA_RULE 'FUNC_DEP_RULE' SET PROPERTIES(RULE_TYPE) VALUES ('FUNCTIONAL_DEP_RULE')
OMBALTER DATA_RULE 'FUNC_DEP_RULE' SET PROPERTIES(FUNCTIONAL_DEP_THRESHOLD) VALUES (10)

If you want more than 1 determinant then add to the group (there is a group for DETERMINANTS and DEPENDENCY):

OMBALTER DATA_RULE 'FUNC_DEP_RULE' GROUP 'DETERMINANTS' ADD ATTRIBUTE 'DETERMINANT_2'
OMBALTER DATA_RULE 'FUNC_DEP_RULE' GROUP 'DEPENDENCY' ADD ATTRIBUTE 'DEPENDENCY_2'


11. How to Apply a Rule to a Table?
So how do you actually add a rule usage to a table.....you have to tie the table to the rule and the columns to the attributes!

The example below adds a domain rule to the COUNTRIES table binding the VALUE attribute to the COUNTRY_NAME column

OMBALTER TABLE 'COUNTRIES' ADD DATA_RULE_USAGE 'DRU' SET REF DATA_RULE '../DRS/DOMAIN_RULE' GROUP 'THIS' SET REF TABLE 'COUNTRIES' ATTRIBUTE 'VALUE' SET REF COLUMN 'COUNTRY_NAME'

That's a quick run down that I hope its useful to share. Hope this helps!

June 28, 2007

Oracle's ETL direction and OWB

There seem to be some questions around the strategy Oracle will follow around ETL and what OWB's role will be going forward. This is actually described in the OWB roadmap (here).


Click here for the OWB Roadmap


For current customers of OWB and for the many prospects that see the clear value of an Oracle ETL tool, this roadmap discusses where OWB will go in the next couple of releases. Actually most of the features in the roadmap are aimed to come online roughly in the next calendar year (I cannot tell you more specific dates due to legal reasons of course!).

Now the roadmap obviously is a little vague in that it will not describe exact detailed features, but let me assure everyone, OWB will get a lot of really great stuff on top of the already present market leading features (that is a Forrester quote, not an Oracle quote). I think everyone should look at the Forrester Wave in the link and understand the position of all tools in this wave and where they will be going.

I know everyone wants more specifics, so I will try to get you some, which we also did at the ODTUG conference, and which we will run at the BIWA Sig presentations (mark the 11th of July, go here for details). That session is also going to be recorded and I will post that link here as well. Everyone interested in this do dial in!

Some more details then (as a prelude to the webcast):

In the roadmap you will see that OWB will leverage Sunopsis / ODI components, which will allow OWB to combine both the best of the OWB world and the ODI world, allowing you to customize the mappings in the way you can do in ODI. For new and current customers this means that rather than getting worried as to moving to ODI, ODI will come to OWB customers as embedded functionality allowing you to choose! As a current customer this means you can use the new functionality if you need it, but you can happily keep on building regular mappings, and that will NOT go away in OWB. So lets not get worried about migrations, there is no need... you will all get the best of both worlds directly embedded in OWB.

If you are interested in seeing how the ODI KM integration would work in OWB join the webcast, I will be showing this integration live as a demo. In fact we will actually show how you can load data from DB2 into DB2 using OWB and native DB2 code. I will also show you how to customize and import ODI knowledge modules into OWB (future!)...

Some other things, not sure if I can already show this, is the SOA infrastructure support. I think going forward we see more people looking at embedding webservices into a SaaS setup. OWB is of course a prime candidate to generate these webservices. Imagine using the name and address validation as a service across the organization. So we are definitely aware this is something coming into full swing more and more. Hence the roadmap and the features that allow us to create a webservice directly from a mapping (or  a process) in the client. You can also consume webservices of course.

The other thing around SOA, is of course BPEL integration, and the webservices will nicely allow us to integrate into BPEL processes. Actually, today you can use the database webservices features (described on OTN) to do exactly this already, but it will be so much simpler from within the product.

With Oracle focussing squarely on Oracle BI Enterprise Edition, also see the OBI SE One edition which has OWB as a core component, OWB is moving its excellent Discoverer support to OBI EE. This allows you to directly integrate the OWB metadata with the OBI EE repository and dramatically increase productivity... I'll show some of this in the webcast as well, so if you are thinking of either or both tools, do tune in and understand what is coming.

As we go along in development, we will be sharing more and more information and demonstrations on where OWB is going. We figure that showing real software is better than telling you 'greate things are coming, stay tuned'...


June 29, 2007

SQL and OWB - Accelerated Map Construction? Part 1

For those folk who know SQL even a little SQL and then come to OWB, this would be a great helper; build your map from SQL. How do I build a map that looks like this, how do you support this etc. The reality is that although there are a percentage of SQL statements that fit nicely into the OWB mapping paradigm, there are times when you must rethink the SQL to fit into the capabilities of OWB.

Imagine the possibilities, you could use it to learn how to build maps, to create maps quickly, to construct maps from SQL generators such as summary advisors etc. Before getting carried away though... this is a demo, a demo built using OWB experts and an ANSI SQL parser, like most generators there are various caveats, with this demo there are even more, but I'll do my best to maximize the demo appeal;)

A Simple Example
Let's take a simple example from the SQL reference manual, this is an interesting and small example:


INSERT INTO TGT_TAB (DEPT_ID, MN_SAL, MX_SAL)
 SELECT department_id, MIN(salary) MIN_SAL, MAX (salary) MAX_SAL
   FROM HR.employees
    WHERE HIRE_DATE > '01-JAN-98'
    GROUP BY department_id
    HAVING MIN(salary) < 5000
    ORDER BY department_id;

With OWB this would be represented by a number of operators with various properties set. How long would it take to build this by hand? How many mistakes would you make? Would you forget the order in which the operators should be created?  Let's look at this in an accelerated way using copy-paste!

On the 'Mappings' node in the tree, select 'Build from SQL...'. With this we can copy-paste some SQL (ANSI SQL) and get the map automatically built. I did this as a demo there are a few caveats but with what is here you will see the great potential time saver plus resource of information that this can be.

Launch the dialog to create via SQL:

Map Builder 1:

Below we copy-paste the SQL we are interested in

Map Builder 2:


The dialog as well as prompting for the map name and SQL also included an option for the code generation. Since I generally build SQL maps, I defaulted the expert to use set based mode as the default (this set both the generation mode and default operating mode to set based).  After completing this dialog, the generator is off and running - it has conveniently generated the map including all of the operators with respective properties to represent the SQL entered:


Map Builder 3:


So what did the generator do?
  1. Added tables for all of the FROM tables (schema name is mapped to module, alias is operator name)
  2. Added filter operator for WHERE clause if a single table
  3. Added joiner operator for WHERE clause if multiple tables
  4. Added expression operator for any expressions in select list or
  5. Added aggregation operator if using an aggregated function (including group by clause, having clause)
  6. Added constant operator for literals in select list
  7. Added set operator for any set operation such as UNION, MINUS etc
  8. Added sorter for any ordering defined
  9. Inserted a target table if an INSERT clause is used
This demo works nicely for INSERTs and SELECT statements, it breaks down on the richness of the SQL parsing support (I used one that I googled that had a nice API and I could quickly build a demonstration from). It would be better to have rich support for all the Oracle specifics including specific grammar, hints etc. Later I'll cover DELETEs and UPDATEs.

Taking this Further
OK, taking this further it would be great to then click on a view or materialized view and construct the map from the view/materialized view query - then you can get really rich lineage/impact analysis information.

Map Builder 4:

This example gets fairly complex data warehouse SQL reverse engineered into an OWB mapping all with an effortless copy/paste;

Map Builder 5:

The SQL happens to be straight from the Oracle Data Warehousing guide (Example 20-8 GROUPING combined with HAVING), I made it slightly more complex by union the query with itself to illustrate the union operator in the map (just for fun);

SELECT channel_desc, calendar_month_desc, country_iso_code,
       TO_CHAR(
SUM(amount_sold), '9,999,999,999') SALES$,
       GROUPING(channel_desc) CH,
       GROUPING
(calendar_month_desc) MO,
       GROUPING(country_iso_code) CO
FROM sales, customers, times, channels, countries

WHERE sales.time_id=times.time_id AND sales.cust_id=customers.cust_id
AND customers.country_id = countries.country_id
AND sales.channel_id= channels.channel_id
AND channels.channel_desc IN ('Direct Sales', 'Internet')
AND times.calendar_month_desc IN ('2000-09', '2000-10')
AND country_iso_code IN ('GB', 'US')
GROUP BY CUBE(channel_desc, calendar_month_desc, country_iso_code)
HAVING (GROUPING(channel_desc)=1 AND GROUPING(calendar_month_desc)= 1
AND GROUPING(country_iso_code)=1) OR (GROUPING(channel_desc)=1
AND GROUPING (calendar_month_desc)= 1) OR (GROUPING(country_iso_code)=1
AND GROUPING(calendar_month_desc)= 1);


Constructing this would have involved quite a number of mouse clicks I bet. As you see this demos really nicely for a few examples, even as an accelerator for building a percentage of the map it is quite nice ie. quickly get a bunch of operators on the mapping canvas with basic joiners, sorters etc..

Deletes and Updates
I guess this is where it gets tough. How can something so simple in SQL be tricky for me to understand in OWB? Well with any tool it will have a few nuances, a few rough edges that you can forgive for a while. How can the following queries be so tricky to model;

1. DELETE FROM TGTAB;

2. DELETE FROM TGTAB where somedate is null;

3. DELETE FROM TGTAB where somedate < '01-JAN-1995';

This partly falls down to modeling for the complex scenario and not making some simple ones simple. Back to generating the maps, you can  imagine parsing something like this and generating a map becomes a little trickier task.

To build the OWB mapping for the above involves a little more work than you might think, and there are a few ways to do it, here is the SQL for (3) implemented in 2 ways with OWB;

Delete Map 1:

As well as the table operator which has the DELETE loading type, you will need the same table as a source. There are 2 options to the SQL where clause for the date, you can either include a FILTER operator between the tables or define the filter (option 2 above) on the target table using the 'Target Filter for Delete' property (option 1 above). You must also either select match by constraints or set the match column when delete property for each column to be matched. The other SQL statements above are variations on the above either with different filters or no filter at all (1).

The UPDATE SQL statement is another one similar in style to DELETE because of the matching columns.

1. UPDATE TGTAB set somestatus=NULL;
2. UPDATE employees
SET salary = salary * 1.1
WHERE department_id = 100


Add the table also as a source, and ensure that the match column when updating and load column when updating properties are set appropriately.  In this example I made sure SALARY was match when updating was false and load when updating was true. For EMPLOYEE_ID I set as match when updating to true and load when updating to false.

Update Map 1:

Even with some of these simple DELETE/UPDATE statements there are some nuances, so you can imagine the complexity when you get some complex scenarios, generation of the map from SQL needs some more thought. The earlier map generations were a good fit for the OWB mapping paradigm, and were easy to build in a generator, the UPDATE/DELETE statements need some more thought, but there is hope. The table operator when used for update/delete has a lot of properties from the table operator down to each attribute - all of which must be tweaked to ensure you get what you want. These kind of issues will be resolved in OWB to make the
SQL to OWB transition much simpler. When this happens it will not only be simpler to construct such maps but there will be a much simpler model for generating the maps.

In Summary
For some statements at times manual construction of the map is a fine art, so automation via a generator is a little tricky, tricky that is until we resolve some of the ease of use issues, which will make the generator a reality for more than INSERT and SELECT statements.. I thought I'd share some of these ideas anyway and see what people thought.

July 5, 2007

Process Flow - Execute a map for all files in a directory

How to call a map for every file in a directory? Flat file maps have the data file name as a parameter. The map can be called from a flow and the data file name set, the map will be wrapped in a while loop that iterates over every file in a directory. Here is an illustration, it uses the functions getfilecount and getnthfile from the post here.

The flow looks like you'd expect for a basic programming loop;

  • an initialization block to get the number of files in a directory, I have the directory as a parameter to the process flow too. The initialization is the GETFILECOUNT activity which uses the function from the blog post linked above.
  • the while statement with the condition (iterate for every file in directory using the index)
    • get the nth file in the directory using the function GETNTHFILE
    • execute the map (LOAD_F2) passing the fully qualified name of the file
    • increment the index
  • end loop
The process flow looks like this;

Process All Files:

The process flow uses the following variables and parameters;

Process All Files Parameters:

In the above image DIR_PATH is a process flow parameter, so the flow can be executed and directory specified upon execution. There are a few variables; one for holding an index for the loop (FILE_INDEX), one for holding the file name returned from getnthfile (FILE_NAME) and another variable for the number of files (NO_OF_FILES).

The while condition for the loop is defined as loop while index is less than the number of files;

Process All Files While:

The increment index (ASSIGN) activity inside the while loop looks like

Process All Files Increment:

This lets us easily execute a single map many times with a group of files. Its also possible to separate the control file/log file location so they can be written to one directory and source data files processed from another (so getnthfile will not process control files etc..).

July 6, 2007

Process Flow - Execute a map for all files in a directory - Part 2

This note is a description of a useful expert (download at end) for generating process flow components for loading all files from a directory based on the design in the earlier post described here. The expert prompts for minimal information and builds a flow containing the selected map (a flat file/SQL*Loader map), functions and while loop from the earlier design. The expert also generates the 2 PLSQL functions that are java stored procedures. So all you need to do is load the dependent class into the schema where the functions are deployed.

You can add the expert as a short cut of a map, this will let you run the expert from a flat file map;

Flow Accel 1:

You are then prompted for the process flow module, package, process name and the directory where the data files reside;

Flow Accel 2:

That's it basically, you should have a process flow generated, plus the supporting functions in the same module as the map!

Flow Accel 3:

Here we see the 2 PLSQL functions generated, these are the java stored procedures that depend on the Java class in this post;

Flow Accel 4:

Note in OWB you can also define PLSQL functions/procedures that are Java stored procedures (you can load the java directly in the database no need to compile outside), here is the implementation I have generated for one of the functions;

Flow Accel 6:

And finally (but definitely not least) here is the generated process flow that takes care of the nitty gritty details of the activity setup, the loop conditions etc.;

Flow Accel 5:

So in addition to deploying the maps;
  1. perform a loadjava on the UtilFileSystem.java file from this post
  2. grant permissions (using dbms_java.grant_permission) on the directory you need - as is done in this post
  3. deploy the GETFILECOUNT and GETNTHFILE functions generated with this expert
  4. set the location for the process flow module and deploy
  5. it is now ready to run.
The MDL file for this expert can be found here on the blog - after import you will find it in the Global Explorer panel, under FLOW_ACCELERATORS / LOAD_FILES_FROM_DIRECTORY. You should add it as a shortcut ('add/remove experts here' upon right click of a map) to a map in the main tree.

Here are some snippets of the OMB used within the expert;
...
# Create the functions
#
catch {
  OMBCREATE FUNCTION 'GETFILECOUNT' SET PROPERTIES (RETURN_TYPE,IMPLEMENTATION) VALUES ('NUMBER', 'LANGUAGE JAVA NAME ''UtilFileSystem.getFileCount(java.lang.String) return int'';') ADD PARAMETER 'DIR_PATH' SET PROPERTIES (DATATYPE) VALUES ('VARCHAR2')
}
catch {
  OMBCREATE FUNCTION 'GETNTHFILE' SET PROPERTIES (RETURN_TYPE,IMPLEMENTATION) VALUES ('VARCHAR2', 'LANGUAGE JAVA NAME ''UtilFileSystem.getNthFile(java.lang.String, int) return java.lang.String'';') ADD PARAMETER 'DIR_PATH' SET PROPERTIES (DATATYPE) VALUES ('VARCHAR2') ADD PARAMETER 'N' SET PROPERTIES (DATATYPE) VALUES ('NUMBER')
}
...
#
# Create the flow
#
OMBCREATE PROCESS_FLOW '$flow_name'

# Add the IN parameter to the flow
#
OMBALTER PROCESS_FLOW '$flow_name' ADD PARAMETER 'DIR_PATH' SET PROPERTIES (DIRECTION,DATATYPE,VALUE) VALUES ('IN','STRING','$dirp')

#
# Define the local variables used in the flow
#
OMBALTER PROCESS_FLOW '$flow_name' ADD VARIABLE 'FILE_INDEX' SET PROPERTIES (DATATYPE,VALUE) VALUES ('INTEGER','0')
OMBALTER PROCESS_FLOW '$flow_name' ADD VARIABLE 'FILE_NAME' SET PROPERTIES (DATATYPE,VALUE) VALUES ('STRING','')
OMBALTER PROCESS_FLOW '$flow_name' ADD VARIABLE 'NO_OF_FILES' SET PROPERTIES (DATATYPE,VALUE) VALUES ('INTEGER','0')

#
# Add all the activities
#
OMBALTER PROCESS_FLOW '$flow_name' ADD TRANSFORMATION ACTIVITY 'GETFILECOUNT' SET REFERENCE TRANSFORMATION '$map_mod_name/GETFILECOUNT'
OMBALTER PROCESS_FLOW '$flow_name' ADD TRANSFORMATION ACTIVITY 'GETNTHFILE' SET REFERENCE TRANSFORMATION '$map_mod_name/GETNTHFILE'
OMBALTER PROCESS_FLOW '$flow_name' ADD WHILE_LOOP ACTIVITY 'WHILE_LOOP'
OMBALTER PROCESS_FLOW '$flow_name' ADD END_LOOP ACTIVITY 'END_LOOP'
OMBALTER PROCESS_FLOW '$flow_name' ADD END_ERROR ACTIVITY 'END_ERROR'
OMBALTER PROCESS_FLOW '$flow_name' ADD MAPPING ACTIVITY 'FILE_LOAD' SET REFERENCE MAPPING '$map_mod_name/$map_name'
OMBALTER PROCESS_FLOW '$flow_name' ADD ASSIGN ACTIVITY 'ASSIGN'

#
# Define the transitions between the activities
#
OMBALTER PROCESS_FLOW '$flow_name' ADD TRANSITION 'ST_2_GC' FROM  ACTIVITY 'START1' TO 'GETFILECOUNT'
OMBALTER PROCESS_FLOW '$flow_name' ADD TRANSITION 'GC_2_WH' FROM  ACTIVITY 'GETFILECOUNT' TO 'WHILE_LOOP' SET PROPERTIES (TRANSITION_CONDITION) VALUES ('SUCCESS')
OMBALTER PROCESS_FLOW '$flow_name' ADD TRANSITION 'GC_2_EN' FROM  ACTIVITY 'GETFILECOUNT' TO 'END_ERROR' SET PROPERTIES (TRANSITION_CONDITION) VALUES ('ERROR')
OMBALTER PROCESS_FLOW '$flow_name' ADD TRANSITION 'WH_2_GF' FROM  ACTIVITY 'WHILE_LOOP' TO 'GETNTHFILE' SET PROPERTIES (TRANSITION_CONDITION) VALUES ('LOOP')
OMBALTER PROCESS_FLOW '$flow_name' ADD TRANSITION 'GF_2_MP' FROM  ACTIVITY 'GETNTHFILE' TO 'FILE_LOAD' SET PROPERTIES (TRANSITION_CONDITION) VALUES ('SUCCESS')
OMBALTER PROCESS_FLOW '$flow_name' ADD TRANSITION 'GF_2_ER' FROM  ACTIVITY 'GETNTHFILE' TO 'END_ERROR' SET PROPERTIES (TRANSITION_CONDITION) VALUES ('ERROR')
OMBALTER PROCESS_FLOW '$flow_name' ADD TRANSITION 'MP_2_AS' FROM  ACTIVITY 'FILE_LOAD' TO 'ASSIGN'
OMBALTER PROCESS_FLOW '$flow_name' ADD TRANSITION 'AS_2_EL' FROM  ACTIVITY 'ASSIGN' TO 'END_LOOP'
OMBALTER PROCESS_FLOW '$flow_name' ADD TRANSITION 'EL_2_WL' FROM  ACTIVITY 'END_LOOP' TO 'WHILE_LOOP'
OMBALTER PROCESS_FLOW '$flow_name' ADD TRANSITION 'WH_2_EN' FROM  ACTIVITY 'WHILE_LOOP' TO 'END_SUCCESS' SET PROPERTIES (TRANSITION_CONDITION) VALUES ('EXIT')

#
# Define the parameters for the activities
#
OMBALTER PROCESS_FLOW '$flow_name' MODIFY ACTIVITY 'GETFILECOUNT' MODIFY PARAMETER 'DIR_PATH' SET PROPERTIES (BINDING) VALUES ('$flow_name.DIR_PATH')
OMBALTER PROCESS_FLOW '$flow_name' MODIFY ACTIVITY 'GETFILECOUNT' MODIFY PARAMETER 'GETFILECOUNT' SET PROPERTIES (BINDING) VALUES ('NO_OF_FILES')
OMBALTER PROCESS_FLOW '$flow_name' MODIFY ACTIVITY 'WHILE_LOOP' MODIFY PARAMETER 'CONDITION' SET PROPERTIES (VALUE) VALUES ('FILE_INDEX < NO_OF_FILES')
OMBALTER PROCESS_FLOW '$flow_name' MODIFY ACTIVITY 'GETNTHFILE' MODIFY PARAMETER 'GETNTHFILE' SET PROPERTIES (BINDING) VALUES ('FILE_NAME')
OMBALTER PROCESS_FLOW '$flow_name' MODIFY ACTIVITY 'GETNTHFILE' MODIFY PARAMETER 'DIR_PATH' SET PROPERTIES (BINDING) VALUES ('$flow_name.DIR_PATH')
OMBALTER PROCESS_FLOW '$flow_name' MODIFY ACTIVITY 'GETNTHFILE' MODIFY PARAMETER 'N' SET PROPERTIES (BINDING) VALUES ('FILE_INDEX')
OMBALTER PROCESS_FLOW '$flow_name' MODIFY ACTIVITY 'FILE_LOAD' MODIFY PARAMETER 'DATA_FILE_NAME' SET PROPERTIES (VALUE,ISLITERALVALUE) VALUES ('$flow_name.DIR_PATH || ''/'' || FILE_NAME', 'false')
OMBALTER PROCESS_FLOW '$flow_name' MODIFY ACTIVITY 'ASSIGN' MODIFY PARAMETER 'VALUE' SET PROPERTIES (VALUE,ISLITERALVALUE) VALUES ('FILE_INDEX + 1', 'false')
OMBALTER PROCESS_FLOW '$flow_name' MODIFY ACTIVITY 'ASSIGN' MODIFY PARAMETER 'VARIABLE' SET PROPERTIES (BINDING) VALUES ('FILE_INDEX')

There are quite a few useful examples above that are not totally obvious from the scripting documentation.

In summary this is a useful little accelerator which you can also use to see how
basic constructs like LOOPs are constructed in process flows.

July 9, 2007

Dynamically generating target file names

A little known fact is how to dynamically create a file name for a map that has a file as a target (for example). The configuration property 'Target Data File Name' allows a data file name to be entered, it is actually possible to insert a PLSQL expression here given the correct formatting. This let's you define any valid PLSQL expression, so you could add a date to a file name for example.

In the example below  append the SYSDATE to the string 'orders' and this is the name of my file (a file name such as 'orders05-JUL-07' will be written).

Map File Target:

The generated code will be like the following snippet with the expression
being evaluated in PLSQL:

   

        "ORDERS_FILE_Fh" := UTL_FILE.FOPEN(

          'ITD_FILES_LOCATION1',

          'orders'||sysdate||'',

          'w',

          32767);



Basically the file name you type will be wrapped in single quotes, so
if you use quotes in your expression which end up with a valid PLSQL
expression you can do what you want.

Publishing Process Flow as a Web Service

Here we see how to publish an OWB process flow as a web service using 10g components (11g database has such capabilities built-in). OWB contains a PL/SQL package WB_RT_API_EXEC with a function called RUN_TASK that submits execution requests to the OWB runtime engine. The requests could be for mappings, process flows or scheduled jobs. This example illustrates how the function may be published as a web service to allow the execution of OWB process flows.

It is assumed that the reader is able to create Web Services using JDeveloper. For the purposes of this document; please imagine that a process flow MY_PKG / MY_PF (that's package name / process flow name) has been deployed to process flow location PFMOD_LOCATION1.
 
Overview of the RUN_TASK function

The specification for the function is

  function run_task
  ( p_location_name in varchar2
  , p_task_type in varchar2
  , p_task_name in varchar2
  , p_custom_params in varchar2 default null
  , p_system_params in varchar2 default null
  , p_oem_friendly in number default 0
  , p_background in number default 0
  ) return number;

this is the same function used by the sqlplus_exec_template.sql script in OWB 10gR2. It may be called using (assuming a repository owner connection);

declare
  result_num number;
begin
  result_num:= wb_rt_api_exec.run_task('PFMOD_LOCATION1', 'PROCESS', 'MY_PKG/MY_PF', ',', ',', 0 ,0);
end;

Full documentation of the function can be found in <OWB_ORACLE_HOME>/owb/rtasst/wb_rt_api_exec.pls

Publishing the Web Service using JDeveloper

Use the following steps to publish the function via JDeveloper;
  1. Create a new database connection referring to the OWB repository owner schema.
  2. Run JDeveloper�s �Create PL/SQL Web Service� wizard using the database connection and select package WB_RT_API_EXEC and then program unit FUNCTION RUN_TASK
  3. Build then deploy the WAR file to an Oracle Application Server
It is recommended that the deployed web services be protected using J2EE security techniques. Once available the web service endpoint will look like the following;

wb_rt_api_exec_run_task endpoint - For a formal definition, please review the Service Description (rpc style).

wb_rt_api_exec_run_task service
The following operations are supported.

·         runTaskBSSSSSBB
·         runTaskBSSSSSB
·         runTaskBSSSSS
·         runTask
·         runTaskBSSSS

JDeveloper has generated a selection of web services as the run_task function has default parameters. The runTask web service can be used to submit an execution request for any process flow, or map. In our case, to execute the process flow use the following values;

Process WS 1:

Web Services using Target Schema privileges

The previous example creates a Web Service using an OWB repository owner schema, you don't really want to to this, since it opens up the execution of a lot more objects than what you probably desire. Enterprise solutions should create Web Services using OWB target schema credentials so that end users can only access targets for which they know the password ie. they must have previously registered the location�s password.

In order to do this, you must create the following function in an OWB target user schema and then select it as the web service program unit when publishing as a web service from JDeveloper;

-- With this you will have to change the name of your CC owner, I have used OWB_CC
create or replace function execute_code ( obj_loc  varchar2,
                                          obj_typ  varchar2,
                                          obj_name varchar2,
                                          sys_parm varchar2,
                                          cus_parm varchar2)
                          RETURN NUMBER  AUTHID CURRENT_USER as
  r number;
  cc_owner varchar2(256) := 'OWB_CC';
begin
  execute immediate 'set role OWB_D_'||cc_owner||', OWB_O_' || cc_owner;
  execute immediate 'begin :r := '||cc_owner||'.wb_rt_api_exec.run_task(:obj_loc, :obj_typ, :obj_name, :cus_parm, :sys_parm,0,0); end;' using OUT r, IN obj_loc, IN obj_typ, IN obj_name, IN cus_parm, IN sys_parm;
  return r;
end;

This function has been defined with invoker rights and calls the OWB function with definer rights.

If the �share location password preference� is not set then please make sure that all locations have been registered by the OWB user that is submitting execution requests.

Web Services for specific execution objects

A further refinement is to create a Web Service that submits a specific execution to the OWB runtime engine. The above examples illustrated how to have generic functions to execute any process flow. An example function that can only call the process flow mentioned above is;

-- With this you will have to change the name of your CC owner, I have used OWB_CC
create or replace function execute_my_pf ()
                          RETURN NUMBER  AUTHID CURRENT_USER as
  r number;
  cc_owner varchar2(256) := 'OWB_CC';
begin
  execute immediate 'set role OWB_D_'||cc_owner||', OWB_O_' || cc_owner;
  execute immediate 'begin :r := '||cc_owner||'.wb_rt_api_exec.run_task('PFMOD_LOCATION1', 'PROCESS', 'MY_PKG/MY_PF', ',', ',', 0 ,0); end;' using OUT r, IN obj_loc, IN obj_typ, IN obj_name, IN cus_parm, IN sys_parm;
  return r;
end;

This has some advantages in that you can build a much more specific interface for calling the item and parameters can be defined explicitly rather than via the generic interface which provides greater flexibility for calling any flow/map.

July 10, 2007

OWB 11g and beyond webcast

Just a quick reminder, the webcast is tomorrow morning, 11.45 am ET:

http://ioug.itconvergence.com/pls/apex/f?p=219:1:408683087713141

Look at the Next Webcast section. Oh and it is a good idea to visit this once in a while to look for the next webcasts... on various, but also on OWB.

July 11, 2007

Webcast slides and the recording

For all of those who missed the webcast this morning, you can find the recording and the slides I used on the BIWA website. Very good to see so much interest for OWB's future, which I hope you all agree looks very bright indeed... Lots of good things coming.

Anyways, for those who want the direct links:

  • Presentation in PDF: http://ioug.itconvergence.com/pls/htmldb/DWBISIG.download_my_file?p_file=845
  • Recording with my stuff and the all important demonstrations(!): http://ioug.itconvergence.com/pls/htmldb/DWBISIG.download_my_file?p_file=846

July 18, 2007

What is new in 11g and 10.2.0.3 - Feature 1 Federation with OWB

Ok, you all watched the launch of database 11g (or did you tune in to the OWB webcast on the same day?), so lets write down what is coming in handy new things. To clarify the title, I will be using mostly 11.1 software to show the features, but they should work exactly the same in 10.2.0.3 unless stated otherwise...

So the first one that has been in OWB for a while, but that was broken as well, is how do you create Views and MViews while still keeping lineage and impact in tact? Well the simple answer is that you use mappings. In essence what we are doing here is working on Federation of data rather than consolidation (e.g. ETL). So we will leave the data in place but make it look like it is in a single place for any query.

In order to really do federation and use OWB there are some other pieces of the puzzle as far as infrastructure go that should be in place. In its briefest form a federated system with its Oracle technology in place could look like this:

FederationSchematic:

Now the cool bit here is that, while you can just do this with the OWB piece, that piece makes this really work as you need it to because you get:

  • Full metadata transparency delivering impact and lineage analysis
  • A graphical way of managing your data flows using the OWB GUI
  • A graphical debugger for your views (which are mappings at the end of the day)
  • Simplified maintenance and better documentation
  • Version management of all objects
  • A one button choice to either Federate or Consolidate without changing any logical design
The piece I want to show here, is how you can use mappings to actually create the views and how easy it is to change modes between Federation and Consolidation. Last we'll quickly look at the metadata story with a quick look at Lineage and change propagation.

A mapping for federation would look like this:

FedMappingOverview:

where the most notable thing is that the target in this case is a non-existing view. I simply added an empty view operator into the mapping and mapped the approriate columns from the joiner over to the view, which creates the attributes in the operator.

Next I do a Create and Bind on the view (right mouse click for the menu option) and I place it in a module. The result is as follows:

FedMappingGenView:

As you can see the entire view is resolved, the code is generated (the select essentially) including any location information (e.g. dblinks, schema references etc). This is where the maintenance becomes a lot simpler, connection informatino, dblinks etc are decoupled from the actual query and can be changed based on the system. Simply reuse a new configuration and regenerate...

Now if you want to change the strategy (for performance, or for other reasons), what you do is go back to the mapping and replace the view with a table operator. Lets say you created the table in OWB (whichever which way you want to really), you simply do a synchronize in the mapping to replace the view with a table and your system is now an ETL rather than a Federated system and after you deploy the mapping and the table you can run your jobs into the consolidated system.

Obviously if you would want to do this manually, you would create the table (as in OWB) and then move the select into a PL/SQL package and run that. Sounds simple enough, but if you have hundreds of these views in place the automation you can get from OWB will make this task so much faster that it cannot be beaten by hand coding.

Last but not least, changes can be made much easier and faster in OWB using the lineage and impact analysis functionality (not new in 10.2.0.3 nor 11g, but very relevant for this particular feature). If a source column has changed, you can propagate that directly to the view definition.

ChangePropagation:

There is no need to synchronize anything as this is all been taken care off (this includes the view definition)!

July 27, 2007

More process flow basics - for loops

Here we create a typical programming loop within the process flow iterating over a number of activities based on a numeric index. The flow created represents the following pseudocode;

  assign_rslt := 0;
  for (int iter_idx:=0; iter_idx < number_iterations; iter_idx++) {
    // my activity in the for loop is simply assigning the for loop variable
    // to another variable, this can be much more complex and involve other activity types
    assign_rslt := iter_idx;
  }

Using OMB this can be created as below, the script will create a process flow module named FLOW_EXAMPLE, a package named FLOWPACK and a flow named FLOW_FOR_LOOP. The flow can be deployed and executed after the flow module has been editing and configured and the location set. It has no dependencies since the content of the flow is very simple to illustrate the activities.

set flow_mod_name FLOW_EXAMPLE
set flow_pack FLOWPACK
set flow_name FLOW_FOR_LOOP
set iter_cnt 20
OMBCC '/$OMB_CURRENT_PROJECT'
catch { OMBCREATE PROCESS_FLOW_MODULE '$flow_mod_name' }
OMBCC '$flow_mod_name'
catch { OMBCREATE PROCESS_FLOW_PACKAGE '$flow_pack' }
OMBCC '$flow_pack'

OMBCREATE PROCESS_FLOW '$flow_name'
OMBALTER PROCESS_FLOW '$flow_name' ADD PARAMETER 'NUMBER_ITERATIONS' SET PROPERTIES (DIRECTION,DATATYPE,VALUE) VALUES ('IN','INTEGER','$iter_cnt')
OMBALTER PROCESS_FLOW '$flow_name' ADD VARIABLE 'ITER_IDX' SET PROPERTIES (DATATYPE,VALUE) VALUES ('INTEGER','0')
OMBALTER PROCESS_FLOW '$flow_name' ADD VARIABLE 'ASSIGN_RSLT' SET PROPERTIES (DATATYPE,VALUE) VALUES ('INTEGER','0')

OMBALTER PROCESS_FLOW '$flow_name' ADD ASSIGN ACTIVITY 'ACTIV_IN_LOOP'
OMBALTER PROCESS_FLOW '$flow_name' ADD FOR_LOOP ACTIVITY 'FOR_LOOP'
OMBALTER PROCESS_FLOW '$flow_name' ADD END_LOOP ACTIVITY 'END_LOOP'

OMBALTER PROCESS_FLOW '$flow_name' ADD TRANSITION 'ST_2_GC' FROM  ACTIVITY 'START1' TO 'FOR_LOOP'
OMBALTER PROCESS_FLOW '$flow_name' ADD TRANSITION 'WH_2_GF' FROM  ACTIVITY 'FOR_LOOP' TO 'ACTIV_IN_LOOP' SET PROPERTIES (TRANSITION_CONDITION) VALUES ('LOOP')
OMBALTER PROCESS_FLOW '$flow_name' ADD TRANSITION 'AS_2_EL' FROM  ACTIVITY 'ACTIV_IN_LOOP' TO 'END_LOOP'
OMBALTER PROCESS_FLOW '$flow_name' ADD TRANSITION 'EL_2_WL' FROM  ACTIVITY 'END_LOOP' TO 'FOR_LOOP'
OMBALTER PROCESS_FLOW '$flow_name' ADD TRANSITION 'WH_2_EN' FROM  ACTIVITY 'FOR_LOOP' TO 'END_SUCCESS' SET PROPERTIES (TRANSITION_CONDITION) VALUES ('EXIT')

OMBALTER PROCESS_FLOW '$flow_name' MODIFY ACTIVITY 'FOR_LOOP' MODIFY PARAMETER 'CONDITION' SET PROPERTIES (VALUE) VALUES ('ITER_IDX < NUMBER_ITERATIONS')
OMBALTER PROCESS_FLOW '$flow_name' MODIFY ACTIVITY 'FOR_LOOP' MODIFY PARAMETER 'INITIAL_VALUE' SET PROPERTIES (VALUE) VALUES ('0')
OMBALTER PROCESS_FLOW '$flow_name' MODIFY ACTIVITY 'FOR_LOOP' MODIFY PARAMETER 'VARIABLE' SET PROPERTIES (BINDING) VALUES ('ITER_IDX')
OMBALTER PROCESS_FLOW '$flow_name' MODIFY ACTIVITY 'FOR_LOOP' MODIFY PARAMETER 'NEXT_VALUE' SET PROPERTIES (VALUE,ISLITERALVALUE) VALUES ('ITER_IDX + 1', 'FALSE')

# Define the assign activity in the loop
OMBALTER PROCESS_FLOW '$flow_name' MODIFY ACTIVITY 'ACTIV_IN_LOOP' MODIFY PARAMETER 'VARIABLE' SET PROPERTIES (BINDING) VALUES ('ASSIGN_RSLT')
OMBALTER PROCESS_FLOW '$flow_name' MODIFY ACTIVITY 'ACTIV_IN_LOOP' MODIFY PARAMETER 'VALUE' SET PROPERTIES (VALUE) VALUES ('ITER_IDX')


There are a few properties such as the TRANSITION_CONDITION (values LOOP and EXIT) that are important. The actual for loop activity has a bunch of parameters that constitute the loop variable, the initial value the increment expression and the exit criteria. Hope this helps.

August 6, 2007

Set based errors - DML Error Logging

Error logging enables the processing of DML statements to continue despite errors being encountered during the statement execution. The details of the error such as the error code and the associated error message are stored in an error table. After the DML operation completes, you can use the error table to correct rows with errors and subsequently process. DML error logging is supported for SQL statements such as INSERT, UPDATE, MERGE, and multi-table insert.

It is useful in long-running, bulk DML statements - for example processing 1 million records and 10 fail, with DML error logging all good records can be committed and the 10 error rows recorded in an error table. Until OWB 10.2.0.3 this was only possible with row based mapping code, now it is possible in set based mode also.

Warehouse Builder provides error logging for the tables, views, and materialized views used in set-based PL/SQL mappings. DML error logging is supported only for target schemas created in Oracle DB 10g R2 or later.

The Error Table

Error tables store error details. You can define error tables for tables, views, and materialized views only. Error tables are used for the following purposes:

  • DML error logging (including physical errors).
  • Capturing logical errors when data rules are applied to tables, views, or materialized views.
An error  table is generated and deployed along with the base table, view, or materialized view if the shadow table name is set. The error table will have the following columns for DML errors;

Column NameDescription
ORA_ERR_NUMBER$Oracle error number
ORA_ERR_MESG$Oracle error message text
ORA_ERR_ROWID$Rowid of the row in error (for update and delete)
ORA_ERR_OPTYPE$Type of operation: insert (I), update (U), delete (D)
ORA_ERR_TAG$Step or detail audit ID from the runtime audit data. This is the STEP_ID column in the runtime view ALL_RT_AUDIT_STEP_RUNS.

If you do not want OWB to generate the error table, you can always build your own error table and supply the name to the mapping, the database provides a function DBMS_ERRLOG.CREATE_ERROR_LOG for generating an error table that can also be used.

Enabling DML Error Logging in ETL

DML error logging is generated for set-based PL/SQL mappings if the following conditions are satisfied:
  • the Error table name property is set for the operator (table/view/mv)
  • the PL/SQL Generated Mode of the module that contains the mapping is set to 10gR2 and above or Default.
If the value is set to Default, ensure that location associated with the module has the Version set to 10.2 or above.

When you use a data object in a mapping, the Error Table Name property for this data object is derived from the shadow table name property of the data object . If you modify the error table name of a data object (using the shadow table name property), you must synchronize all the operators bound to this data object.

The execution of mappings that contain data objects for which DML error logging is enabled fails if any of the following conditions occur:
  • the number of errors generated exceeds the specified maximum number of errors for the mapping. The default set for this value is 50. You can modify this value by setting the Maximum number of errors configuration property of the mapping. In the Project Explorer, right-click the mapping and select Configure . In the Maximum number of errors property, specify the maximum number of errors that can generated before the mapping execution is terminated.
  • errors occur due to functionality that is not supported. (see SQL Reference manual for details of DML Error logging feature restrictions). Depending on your error logging needs you can configure the table operator in a mapping to use the APPEND or NOAPPEND hint. For example, direct-path insert does not support error logging for unique key violations. To log unique key violations, use the NOAPPEND hint (be aware of performance implications of this, there is an interesting article here on some performance findings of using direct path and conventional path modes with the DML error logging feature).
So in my map
DML Errors 1:

The maximum number of errors was set to 50 (the default).

Now (with OWB 10.2.0.3) I get the following SQL generated now with the 'LOG ERRORS INTO ' clause generated;
INSERT INTO "TSALES"
  ("PROD_ID",
....
  "SALES"."AMOUNT_SOLD" "AMOUNT_SOLD"
FROM
  "SALES"  "SALES"
  )
LOG ERRORS INTO TSALES_ERR (get_audit_detail_id) REJECT LIMIT 50;

There is also a truncate error table property that can be used for housekeeping.

If you want to process the errors from within the mapping you can retrieve the current execution errors (that is if your error table has errors for more than one execution, it may not since your may choose to truncate at the start of the map execution) by joining the error table with the ALL_RT_AUDIT_STEP_RUNS views and use the get_runtime_audit_id variable in the join condition;

DML Errors3:

In the above map you see the first step load the TSALES table, the second step will join the error table with the ALL_RT_AUDIT_STEP_RUNS using the audit id recorded in the error table in the column ORA_ERR_TAG$ and the STEP_ID column. You could choose a different strategy here either always truncating the error table for the map or using the max step id for example.

This is a brief introduction to DML Error logging support in OWB 10.2.0.3. More to come I am sure.

August 13, 2007

Control Flow in a Map

In a multi step mapping here is an approach to determine the status of steps within the map and control processing.

If you have 2 steps in a map step A and B, after each step in your map you can add another step which checks the
status of the previous step and based on a
condition does what you need. This concept is based on a set based (pure SQL map) and also the dependency on step result status is based on the OWB runtime audit trail, so if you do not use the audit trail, this will not fly.

The OWB runtime audit views such as ALL_RT_AUDIT_STEP_RUNS can be used. You can either reverse engineer these into your project and use them in the map or add an unbound table/view operator such as ALL_RT_AUDIT_STEP_RUNS into the map.

Using this view if you add a filter which does something like;

INOUTGRP1.MAP_RUN_ID = get_runtime_audit_id
AND INOUTGRP1.STEP_NAME ='SALES_STEPA'
AND INOUTGRP1.NUMBER_RECORDS_INSERTED > 0

where the
previous step has a target operator named SALES_STEPA (which is bound to
the SALES table for example), the filter will check the current map execution (since get_runtime_audit_id is used), step
SALES_STEPA and see if any records were inserted for example then you
can put whatever logic you need after the filter. You would add this
logic after each step in the map. The filter can do whatever you need, check errors, records inserted, updated etc.

To illustrate, here is simple map with 2 main data steps (2 data targets we are really interested in) named SALES_STEPA and SALES_STEPB. I've added intermediate steps for saving some state information in MY_CONTROL_TABLE after each step.

Map Step Control:

I have used the map's target load order to ensure that the order is SALES_STEPA, MY_CONTROL_TABLE, SALES_STEPB, MY_CONTROL_TABLE2. The runtime view ALL_RT_AUDIT_STEP_RUNS has a number of useful columns for determining the status of the step and what happened.

Above we've looked at the body of the map. There are also pre and post map customization points where you can call a PLSQL function or procedure. For pre map customization use the 'Pre-Mapping Process'. With this operator based upon its result the rest of the mapping can be conditionally executed. Use the property 'Mapping Run Condition' to determine whether the body of the map should be executed, by default this property has value SUCCESS, so only if the routine completes successfully will the body be executed. For post map customization use the 'Post-Mapping Process' operator. With this operator it can be conditionally executed based upon the result of the map body. The property 'Post-Mapping Process Run Condition' is defaulted to ON SUCCESS, but this can be changed to ALWAYS / ON ERROR / ON WARNING if desired.

Hopefully this overview of the anatomy of a map is a useful starter.

August 14, 2007

SQL and OWB - Accelerated Map Construction? Part 2

Regarding reverse engineering OWB maps from SQL (see earlier post here), I've uploaded the expert as stands, it is a demo remember with no support contract:) It will be useful for people to see a range of OMB related to map construction as well as illustrating some of the SQL to OWB potential. Some of the demo SQL below requires the OWB 10.2.0.3 patch since there were improvements and fixes in the aggregator operator that allow you to use the CUBE/ROLLUP functions and more.

Installation Notes:
To run the expert you'll have to do the following steps;

  1. import MDL (build_mapping_from_sql.mdl)
    into your repository. This will create a global expert in Global
    Explorer under Public Experts/SQL_ACCELERATORS/BUILD_MAP_FROM_SQL
  2. Download the SQL parser from
    here. There is a little bit of work here...unzip the archive,
    change into the Zql/classes directory. Create a java archive of the Zql
    classes for the expert  by executing the following (on Windows) 'jar cvf
    zql.jar Zql\*.class' or (on UNIX the following) 'jar cvf zql.jar Zql/*.class' You must have a JDK bin  directory on your
    path (OWB software home has one under the oracle home under jdk/bin).
    ZQL claims 'Zql is not a commercial product: feel free to use it, but we
    provide no warranty.'. For this demo it was really useful for illustrating these capabilities, the API is simple to use and provided an easy way to extend the function support.
  3. Copy the JAR file (zql.jar)  to your OWB client install under
    owb/lib/ext    You will have to restart your OWB client.
  4. In the main tree add the expert as a shortcut to a map. Right click
    map, select Add/Remove experts enable check box for build map from SQL,
    click OK. The expert should now appear on the menu.

Notes:

You are now ready to go. You can use any tables and SQL you have. Here are some adhoc notes:

  • SQL Terminator: SQL must be terminated by a semi-colon (;)
  • Name Resolution: Names are resolved either
    from the context in which the map is being created or if the table
    name is prefixed with the schema the table must exist in an OWB module
    with that name. So if I create a map for select ENAME from MYSCOTT.EMP;
    then the table EMP will be bound to the table EMP in OWB Oracle module
    MYSCOTT.
  • Function support is restricted to ANSI SQL and Oracle functions, if you have custom functions you will have to add an entry in the expert - see the tcl function addOracleFunctions (in the expert code) for examples, this is in the procedure declaration property of the expert.
  • Some Limitations:

  1. INSERT and SELECT statements only..on tables (for now, it is a
    demo).
  2. ANSI SQL so no Oracle specific syntax supported (join syntax etc..)
  3. Limited subquery support

  4. The Oracle function library has been added so it can be parsed,
    but not all functions tested.
  5. Literals must be named (ie select 1 a, 'hello' b from dual; will
    work,
    select 1 from dual will fail)
  6. Datatype analysis of literals is limited (NUMBER/VARCHAR2
    supported)
  7.  +++ TBD

Demo SQL

Some demonstration SQL that illustrates what has been covered. Import the tables from the sample SH schema into an OWB module named
SH and HR schema into an OWB module named HR.



1. Now run the expert from the mappings node in SH module, enter the
following SQL. If you add a target table and map from EXPR1.OUTGRP1 to
the new tables input group you can generate the OWB code. This example
illustrates the OWB aggregator operator setting up the group by clause
and the output expressions.


SQL:

SELECT
  DECODE(GROUPING(channel_desc), 1, 'Multi-channel sum',
channel_desc) AS Channel,
  DECODE (GROUPING (country_iso_code), 1,
'Multi-country sum', country_iso_code) AS Country,
  TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES$
FROM sales,
customers, times, channels, countries
WHERE sales.time_id=times.time_id
AND sales.cust_id=customers.cust_id AND customers.country_id =
countries.country_id AND sales.channel_id= channels.channel_id AND
channels.channel_desc IN ('Direct Sales', 'Internet') AND
times.calendar_month_desc= '2000-09' AND country_iso_code IN ('GB',
'US')
GROUP BY CUBE(channel_desc, country_iso_code);

Generated Map:
SQL Example1:


2. This example illustrates using a target table TGT (it will be
defined by the expert) and again it uses aggregation. It depends on the
SCOTT EMP table (and must be run from SCOTT module);


SQL:

INSERT INTO TGT (sal,deptno)
 SELECT sum(sal) salary, deptno deptno
 from
emp p
group by deptno
having comm>1;


Generated Map:

SQL Example2:




3. This example illustrates using the Oracle UPPER function and simple
datatype analysis of inputs;


SQL:

select 'MY_VAR' stringcol, 1 num_col, SYSDATE today, upper(ename) uname
from emp;


Generated Map:

SQL Example3:


4. Set operations are also supported, the example below illustrates a
compound statement with aliases for tables, sorting, deduplication,
filtering (run from SCOTT mappings);


SQL:

INSERT INTO TGT (ename, deptno, code )
SELECT  distinct e.ename en, 5
dn, upper('ggg') x FROM emp e, dual d where e.deptno>1
union
select   v.ename en, 999 dnx, '45000' xx FROM emp v  where
v.deptno>10000
minus
select  p.ename pn,111 fff, '3333' yy from emp p
order by en DESC;


Generated Map:

SQL Example4:


5. Illustrating some Oracle functions let's see XML extractValue (I
added an alias for the column for it to work, it should be run from HR
depending on PURCHASEORDER table);



SELECT extractValue(OBJECT_VALUE,
'/PurchaseOrder/LineItems/LineItem[Part/@Id="715515011020"]/Description')
ev from purchaseorder;



6. Illustrating some more XML (I
changed alias for the column and removed the double quotes around
"Reference" for it to work, it should be run from HR
depending on PURCHASEORDER/EMPLOYEES table);



SELECT extractValue(OBJECT_VALUE, '/PurchaseOrder/Reference') Reference
FROM purchaseorder, hr.employees e WHERE extractValue(OBJECT_VALUE,
'/PurchaseOrder/User') = e.email AND e.employee_id = 100;




7. An aggregation operator with a subquery (this should be run from HR
mappings node), the subquery is embedded in having expression;


SQL:

SELECT department_id, manager_id
FROM employees
GROUP BY department_id,
manager_id
HAVING (department_id, manager_id) IN   (SELECT
department_id, manager_id FROM employees x WHERE x.department_id =
employees.department_id) ORDER BY department_id;

Generated Map:

SQL Example7:

Summary
The experts and demo SQL provide a real taster for the accelerated map construction from SQL as mentioned this is not a polished product but an example of expert technology that can be a real time saver.

August 28, 2007

Match and Merge - Complex Deduplication Rules Part 1

Often the match and merge operator is completely missed by users in OWB. Turn the light on. It is probably the most powerful mapping operator in the operator set and it is in the Core ETL of OWB. OWB has had this operator for a number of years and it has been quietly improving as time has gone by. For example some core algorithms were added in 10gR2 of the database specifically for improving performance in this area. UTL_MATCH is in the database in 10gR2, OWB had a PLSQL implementation prior to this - includes Levenshtein Distance/Jaro Winkler algorithms etc. OWB also provides double metaphone routines that can be used if desired.

There are some nifty capabilities that are not immediately obvious (what's new you cry!), here I'll lay down an example that shows cross table matching using compound custom rules. We have a number of tables, files (whatever) from various sources that should be consolidated. We have a bunch of rules that define how the rows should be matched and we have rules for how the rows should be merged.

Match rules in order:

Column
Rule description
SSN
If the SSN is not null and not equal to 999-99-9999 then use fuzzy (edit distance) matching
SAP_CUST_IDIf the SAP_CUST_ID is not null then use partial (abbreviation) matching
XYZ_CUST_IDIf the XYZ_CUST_ID is not null then use exact matching
ABC_CUST_IDIf the ABC_CUST_ID is not null then use exact matching
NAMEif first name and the last name are not null then use fuzzy (soundex) matching

So the match rules are read as in match if the rule 1 description is satisfied, or if the rule 2 description is satisfied etc..

Merge Rules:
Column
Rule description
NAME_MThe longest non-null middle name
SSN
The most common SSN
NAME_F
The longest non-null first name from Table A
A_CUST_SEQ
From the same record as the merged SSN
SAP_CUST_IDThe most common SAP_CUST_ID with 7 characters

The merge rules are descriptions of how each column should be projected through the merge.

Datastore A
   
XYZ_CUST_ID ABC_CUST_ID CUST_SEQ SAP_CUST_ID NAME_F NAME_M NAME_L SSN
9138 KI17038 John Martin Smith 915-12-1234
2271 KI17038 Jonathan R. Smith
4805 R5KI9-17038

Smith 915-21-1234
2716 R5KI9-17038N Jonathan V Smith 286-17-5289

Datastore B
XYZ_CUST_ID ABC_CUST_ID CUST_SEQ SAP_CUST_ID NAME_F NAME_M NAME_L SSN
293105


John
Smith
293105


Marianne
Smith
292617


Jon M. Smith

Datastore C
XYZ_CUST_ID ABC_CUST_ID CUST_SEQ SAP_CUST_ID NAME_F NAME_M NAME SSN

006-41803



Smith 999-99-9999

2006-41803



Smith 999-99-9999

2006-41803



Smith 915-21-1234

These 3 tables, files (or whatever) have related data that we wish to match-merge with the rules above.

So how to initially consolidate this disparate data in preparation for match-merging? Simply use a union all operator for example.

The rules when applied mean the following rows would be matched;

Match Merge 1:

In OWB using the match merge operator we create the match rules. For the table of match rules above we define the rules as below. The first rule MA_1_SSN is defined as a custom rule, the implementation
(in a very small amount of PLSQL) checks that the SSN is not null, it
is not equal to '999-99-9999' (for both 'this' and 'that' - the rows
being compared) and that the edit distance is less than 2 (this example
was based on 10gR2, so I used the database's native UTL_MATCH package,
OWB provided OWB_MATCH_MERGE for pre 10gR2 with a pure PLSQL
implementation).

Match Merge 2:

Note all rules in above example are defined as passive and only one is active. OWB does not directly execute passive match rules. It only executes passive rules when they are called through an active custom match rule. All defined match rules appear in a list of available functions in the Custom Match Rule Editor. To reorder rules, click down on the row header on the left hand side, hold down
for a couple of seconds then drag the row up or down.

The second rule is ensuring the SAP_CUST_ID column matches using partial (abbreviation) matching (the rule type is conditional and the algorithm is abbreviation):

Match Merge 3:

Note here, the rule we wanted to defined was 'If the SAP_CUST_ID is not null then use partial (abbreviation) matching'. We will do the not null check in the final custom match rule! This goes for the next few rules.

The next rule checks XYZ_CUST_ID for an exact match:

Match Merge 4:

The fourth rule checks ABC_CUST_ID for an exact match:

Match Merge 5:


Finally the active rule at the end is a composite rule that augments the earlier defined rules with further refinements (for example adds not null checks on SAP_CUST_ID rule) then finally performs SOUNDEX matching on the names:


Match Merge 6:


When you are building your custom rules there is a specific editor when you edit the rule, it has the available parameters and match functions available for building the custom rule. You can incorporate other rules including the passive rules defined (in this way they are triggered).

Match Merge 7:


So finally our match rules are defined!

Merge Rules

The merge rules define how the operator's merged attributes are determined. Let's reflect what the requirements are:

Column
Rule description
SSN
The most common SSN
NAME_M
The longest non-null middle name
NAME_F
The longest non-null first name from Table A
SAP_CUST_ID
The most common SAP_CUST_ID with 7 characters
A_CUST_SEQ
From the same record as the merged SSN

OWB provides a number of built-in merge rule types, for some of the above we need a custom rule, let's dig deeper.

The NAME_M column is populated from the longest middle name, for this attribute the Min/Max rule type can be used, the attribute selected and Longest chosen as the attribute relation (it could be mininum,maximum,longest,shortest):

Match Merge 8:

For NAME_F we will decide that table A will will be identified by all records having a CUST_SEQ value of null. We could also have identified this by adding a constant attribute for each source (or system) when we unioned the tables and used this rather than purely being dependent on the data. We can code custom merge rules to decide on which value to merge, here we loop around and return the longest non-null name - this is just a simple PLSQL loop around the collection M_MATCHES;

fName varchar2(2000) := null;
BEGIN
    -- return the longest first name from table a
    -- in table a, CUST_SEQ is not null
  FOR i IN M_MATCHES.FIRST .. M_MATCHES.LAST LOOP
      IF  M_MATCHES(i)."NAME_F" IS NOT NULL and
          M_MATCHES(i)."CUST_SEQ"  is not null THEN
        IF fName IS NULL OR LENGTH(RTRIM(M_MATCHES(i)."NAME_F")) > LENGTH(RTRIM(fName)) THEN
          fName := M_MATCHES(i)."NAME_F";
        END IF;
      END IF;
  END LOOP;
  RETURN fName;
END;

The SSN rule can be defined using a PLSQL loop that simply counts occurrences and returns the SSN with the highest occurrence count, here we see the definition of the custom rule for SSN. It is simply 2 loops one 2 get the occurrence count, the 2nd loop returns the highest occurrence count:

type info_type is table of varchar2(4000) index by binary_integer;
type infocnt_type is table of pls_integer index by binary_integer;
v_infos info_type;
v_infocnt infocnt_type;
retval varchar2(20);
retcnt pls_integer;
found boolean;
indx pls_integer := 1;
begin
  -- return the modal value of "SSN" that is not null
  for i in M_MATCHES.FIRST .. M_MATCHES.LAST loop
      if M_MATCHES(i)."SSN" is not null and
M_MATCHES(i)."SSN" != '999-99-9999' then

        found := false;
        if v_infos.count > 0 then
          for j in  v_infos.first .. v_infos.last loop

            if found = false and v_infos(j) = M_MATCHES(i)."SSN" then
              -- found it
              v_infocnt(j) := v_infocnt(j) + 1;
              found := true;
            end if;
          end loop;
        end if; -- v_infos is not null
        if found = false then
          v_infos(indx) := M_MATCHES(i)."SSN";
          v_infocnt(indx) := 1;
          indx := indx + 1;
        end if;
      end if;
  end loop;
  -- find value with max count
  if v_infos.count > 0 then
    for i in v_infos.first .. v_infos.last loop
      if retval is null then
        retval := v_infos(i);
        retcnt := v_infocnt(i);
      else
        if (v_infocnt(i) > retcnt) then
          retcnt := v_infocnt(i);
          retval := v_infos(i);
          indx := indx + 1;
        end if;
      end if;
    end loop;
  end if;
  return retval;
end;


This might look like a lot of code, but really is pretty much 97% boilerplate. Look for SSN in the above code, there are only a few specific parts related to this pattern of finding the most common occurrence.

The CUST_SEQ merge rule which returns the CUST_SEQ from the merged SSN is a loop to find the SSN and then a return of the matched SSN, if none found then return the previously merged CUST_SEQ:

BEGIN
  -- select the cust_seq from the same record as the merged SSN.
  for i in M_MATCHES.FIRST .. M_MATCHES.LAST loop
      if M_MATCHES(i)."SSN" is not null and
         M_MATCHES(i)."CUST_SEQ" is not null and
         M_MATCHES(i)."SSN" = M_MERGE."SSN"  then
        return  M_MATCHES(i)."CUST_SEQ" ;
      end if;
  end loop;
  return  M_MERGE."CUST_SEQ";  -- previously selected cust sequence
END;

The rule for SAP_CUST_ID is to select the most common customer id that
is 7 characters in length, we can easily do this using the rank rule
type and the expression can check for length of 7:

Match Merge 9:

I think that's enough on the rules for both matching and merging, you get the idea now that this is very powerful.

Anatomy of the Map

Let's check out the map structure, to consolidate multiple source tables you can simply union all the tables together. The great thing about OWB is that you can actually also do a lot more if needed! You can perform a selection/filter process using the OWB operators up until the data is pushed into the match merge operator. This is a technique that we've suggested in the past to improve the performance so that there is not a bunch of redundant matching going on - so ensure you have reduced the number of bins to those bins with changes.

Match Merge 10:

The binning concept is very simple; it is all to do with divide and conquer! Often misunderstood and misused. Divide the problem up into bins and match and merge these chunks. A good use of binning can dramatically reduce the number comparisons the map will perform.

Here is an excerpt from the documentation:
'When
Warehouse Builder matches the rows, it compares each row with the subsequent row
for all rows within the same grouping. Limiting the number of rows can greatly
enhance performance, because Warehouse Builder searches for matches only within
a bin and not throughout the entire data set.'


The OWB wizard would be better if there was a binning advisor that actually tied into the data and said OK, you have selected column ZIPCODE this will give you a median bin size of W, an average bin size of X, a minimum bin size of Y and a max bin size of Z, this kind of information would help you make some of the decisions. You must also remember that you cannot pick any arbitrary column as the binning since you also want to avoid separating rows that should be matched. The
attributes you select for grouping similar rows depends on your data.

The general matching loops are structed as;

OUTER Loop for each bin
   MIDDLE loop for new records (filtered input records) OR MIDDLE loop for all input records
      INNER loop for all input records

In the MIDDLE loop it is possible to use the 'Match new records only' option which will restrict the MIDDLE loop to only those records identified by the new record condition. Although entitled new records only, you can use this for any specialist filtering that you have such as matching new and updated records. This is another option that again can dramatically alter performance.

In Summary

As you can see OWB has very sophisticated deduplication and match merge capabilities essential for incorporating data quality into your information processes. This is integrated into the OWB designer and works just like the rest of the operator set, and now you can see it truly is one of the richest and powerful operators in Oracle's data quality operator set.

There is a zip file here containing the MDL for this demonstration along with the data for the sources, have a play.

In further posts we'll look at the matching of items based on context information (see an existing OBE here) such as whether the column is a name, an address or whatever. There are a number of ways to compare strings and depending on what the data means, different comparison algorithms can be used, we will look at this further.

September 21, 2007

Leveraging XDB

I've been meaning to post details on leveraging XDB from OWB for some time, so finally here we go. This note is intended to provide an overview of how to build transformations of XML related objects within OWB 10gR2, it is intended to give a brief overview of what can be achieved and by no means is it a definitive list of capabilities (see the XML DB Developer�s Guide for a complete reference). It cross-references a few examples constructed in the XML DB Developer�s Guide (Part Number B14259-02). From this the XML expert utilities were constructed for generating components based upon common transformation patterns (consuming and generating XML), you can find those on the exchange along with the demos (download here).

With the addition of many Oracle data types/object types in the OWB 10gR2 release and additional operators in mapping, the world of XML DB is exposed within OWB. So what can be done? The following XDB functionality is possible with OWB 10gR2.

  • Use an XML source to create Relational targets
  • Use Relational sources to create XML targets
  • Transform XML source to XML targets
  • and probably much more besides
To build XML related maps in OWB involves understanding how the various XML SQL operators can be incorporated in OWB mapping operators, after this is grasped there are endless possibilities.

When constructing the maps there are a couple of rules of thumb for the production 10gR2 release.
  • Always generate set based code (bug 5049926), and
  • Use invoker rights for the mapping (to enable access to the XDBADMIN role from the generated PLSQL package).
Let's build this up example by example....

XML to Relational (XDB Example 3.27)

This example demonstrates how to access the text nodes for each Description element in the PurchaseOrder document. The OWB mapping designed in the following section will construct the following SQL as a source to a target table;
SELECT extractValue(value(des), '/Description')
  FROM purchaseorder p,
       table(XMLSequence(
               extract(p.OBJECT_VALUE,
                       '/PurchaseOrder/LineItems/LineItem/Description'))) des
  WHERE existsNode(p.OBJECT_VALUE,
                   '/PurchaseOrder[Reference="SBELL-2002100912333601PDT"]')

Filtering with existsNode. The where clause can be constructed using the OWB Filter operator, the filter expression is defined in the OWB Expression Builder using the XML SQL operator �existsNode�;
SELECT extractValue(value(des), '/Description')
  FROM purchaseorder p,
       table(XMLSequence(
               extract(p.OBJECT_VALUE,
                       '/PurchaseOrder/LineItems/LineItem/Description'))) des
  WHERE existsNode(p.OBJECT_VALUE,
                   '/PurchaseOrder[Reference="SBELL-2002100912333601PDT"]')
        = 1;

XML Map1:

The correlated join between the results of the SQL function table and the row operated on by the SQL function extract is shown below and is a mixture of an expression with the extract operator and the varray iterator operator. The table that provides input to extract must appear before the table expression in the FROM list - OWB generates this code, XMLSEQUENCE returns a varray of XMLType (XMLTypeSequence � this is seeded in OWB as SYS.XMLSequenceType). The correlated join ensures a one-to-many (1:N) relationship between the rows generated by the SQL function table and the row containing the value that is processed by extract;

SELECT extractValue(value(des), '/Description')
  FROM purchaseorder p,
       table(XMLSequence(
               extract(p.OBJECT_VALUE,
                       '/PurchaseOrder/LineItems/LineItem/Description'))) des
  WHERE existsNode(p.OBJECT_VALUE,
                   '/PurchaseOrder[Reference="SBELL-2002100912333601PDT"]')
        = 1;

XML Map2:

The varray iterator named ITERATE_OVER_TABLE_XMLTYPE is for iterating over the varray (XMLSequenceType) generated by the XMLSequence SQL operator;

SELECT extractValue(value(des), '/Description')
  FROM purchaseorder p,
       table(XMLSequence(
               extract(p.OBJECT_VALUE,
                       '/PurchaseOrder/LineItems/LineItem/Description'))) des
  WHERE existsNode(p.OBJECT_VALUE,
                   '/PurchaseOrder[Reference="SBELL-2002100912333601PDT"]')
        = 1;

XML Map3:

The extraction of the description attribute is performed using the expression operator in OWB. Many output expressions can be defined, for example below the attribute DESCRIPTION has been defined with the expression extractValue(INGRP1.VALUE, �/Description�) as below;

SELECT extractValue(value(des), '/Description')
FROM purchaseorder p,
       table(XMLSequence(
               extract(p.OBJECT_VALUE,
                       '/PurchaseOrder/LineItems/LineItem/Description'))) des
  WHERE existsNode(p.OBJECT_VALUE,
                   '/PurchaseOrder[Reference="SBELL-2002100912333601PDT"]')
        = 1;

Additional attributes can be added to the expression operator�s output group if more information is needed from the XML. For loading large documents in 10g the database introduced API
DBMS_XMLSTORE , which is a good option since it is implemented in the
Oracle kernel in C so will give good performance, but does have the
caveat that the document must be the same shape as the implicit file to
table schema mapping (the mapping between the XML tags and relational
columns names is implicit in the engine, see the Oracle XML
documentation for more details, the map between SQL to XML and XML to
SQL can be found here). (See note 358553.1 for information on
constructing XSL for complex tables).

Loading from non-staged source

It is also possible with OWB to pass the XML document for the ETL mapping as an input parameter to the mapping (or construct it in an expression). You should be wary though you may get the 'ORA-31186 too many nodes' error since for in-memory XMLTypes have a limit in 10gR2 for XPath result sizes of 64k (the 64k limit is the number of nodes matched by the xpath not the total no. of nodes in the document). You can resolve this by loading the XMLType into an XMLTYPE table, registering the XML schema will also be valuable. This approach may be favorable if there is a file input or a real time consumer which takes the XML document from a queue and executes the OWB generated PLSQL mapping package passing the XML document as input parameter (commit control attribute also can be configured). The real time process could be an Oracle Streams apply process for example.
For example to execute the map from a PLSQL block, the XML can be passed as an input parameter to the map execution as follows;
declare
  xml XMLType;
  status VARCHAR2(4000);
begin
  -- read the file
  xml := XMLType(bfilename('SUBDIR',
            'VJONES-20021009123337583PDT.xml'),
           nls_charset_id('AL32UTF8'));
  -- execute the mapping
  example_3_27_from_input.MAIN(status,xml);
end;

Relational to XML (XDB Example 3.44)

Using SQL/XML Functions to Generate XML. This query generates an XML document that contains information from the tables departments, locations, countries, employees, and jobs:
SELECT XMLElement(
         "Department",
         XMLAttributes(d.Department_id AS "DepartmentId"),
         XMLForest(d.department_name AS "Name"),
         XMLElement(
           "Location",
           XMLForest(street_address AS "Address",
                     city AS "City",
                     state_province AS "State",
                     postal_code AS "Zip",
                     country_name AS "Country")),
           XMLElement(
             "EmployeeList",
             (SELECT XMLAgg(
                       XMLElement(
                         "Employee",
                         XMLAttributes(e.employee_id AS "employeeNumber"),
                         XMLForest(
                           e.first_name AS "FirstName",
                           e.last_name AS "LastName",
                           e.email AS "EmailAddress",
                           e.phone_number AS "PHONE_NUMBER",
                           e.hire_date AS "StartDate",
                           j.job_title AS "JobTitle",
                           e.salary AS "Salary",
                           m.first_name || ' ' || m.last_name AS "Manager"),
                         XMLElement("Commission", e.commission_pct)))
                FROM hr.employees e, hr.employees m, hr.jobs j
                WHERE e.department_id = d.department_id
                  AND j.job_id = e.job_id
                  AND m.employee_id = e.manager_id)))
  AS XML
  FROM hr.departments d, hr.countries c, hr.locations l
  WHERE department_name = 'Executive'
    AND d.location_id = l.location_id
    AND l.country_id  = c.country_id;

The joiner operator relates the 3 HR source tables and defines the join condition to use;
SELECT �.
  FROM hr.departments d, hr.countries c, hr.locations l
  WHERE department_name = 'Executive'
    AND d.location_id = l.location_id
    AND l.country_id  = c.country_id;

XML Map4:

Create a forest
of XML elements from the relational items:

XML Map5:



Define the XMLElement for �EmployeeList� that is created from an inline query containing calls to XMLElement, XMLAttributes and XMLForest.

XML Map6:

Collate the expression together to create a Department element.

XML Map7:

That's a quick run through, with some creative use of OWB map operators you can now see how to leverage the XML SQL functions. There are a few experts for consuming and generating XML including useful utility functions for saving XML to the filesytem. On the blog I've posted a few entries for processing files from a directory which are all useful pieces in the puzzle. The XML generation expert will allow you to pick a bunch of tables and build a document from them, you define the join criteria between the tables and the XML element and attribute names and the XML expert will generate the components to generate the XML. The consuming expert is based on an XML schema which you supply and it will generate a number of pluggable mappings for the components in the schema. Remember they are experts so don't be afraid to open the box and change the template code, you can do all sorts including all kinds of complex XQueries.


November 6, 2007

Merge rules in scripting

So it has been a bit busy over here with OpenWorld coming up (are you all going???)... but I recently played around with the match merge case discussed earlier seeing if I can script some pieces of it and that was quite a bit of fun. But here it is...

Keep in mind that this does the entire set of MERGE rules for the operator (I've already created the rest via the UI). Also I'm using some custom rules which is placed in a variable and then loaded.

OMBCC 'MY_PROJECT/TARGET'

#Creating a match bin

OMBALTER MAPPING 'CUST_MATCH_MERGE'
MODIFY OPERATOR 'MATCHMERGE'
SET PROPERTIES (MATCH_KEYS) VALUES('INGRP1.NAME_L')

#Creating a merge rule set
#Rule 1:
#Note that the order of the last two settings has to be exactly like this! TYPE before Attribute.

OMBALTER MAPPING 'CUST_MATCH_MERGE'
    ADD MERGE_RULES 'ME_1_NAMEM'
        OF OPERATOR 'MATCHMERGE' SET PROPERTIES (TYPE) VALUES ('MM_MIN_MAX')
    MODIFY MERGE_RULES 'ME_1_NAMEM'
        OF OPERATOR 'MATCHMERGE' SET PROPERTIES (ATTRIBUTE_NAME) VALUES ('NAME_M')
    MODIFY MERGE_RULES 'ME_1_NAMEM'
        OF OPERATOR 'MATCHMERGE' SET PROPERTIES (MIN_MAX_TYPE) VALUES ('MM_LONGEST')
    MODIFY MERGE_RULES 'ME_1_NAMEM'
        OF OPERATOR 'MATCHMERGE' SET PROPERTIES (MIN_MAX_ATTRIBUTE) VALUES ('NAME_M')        

#Rule 2:
#Custom rule text is added to a variable and called from OMB.
#Note the escape characters in the PL/SQL text:
# Double quotes are escaped by a slash
# Single quotes are escaped by a single quote

set custom_rule2 "fName varchar2(2000) := null;
BEGIN
    -- return the longest first name from table a
    -- in table a, CUST_SEQ is not null
  FOR i IN M_MATCHES.FIRST .. M_MATCHES.LAST LOOP
      IF  M_MATCHES(i)."NAME_F" IS NOT NULL and
          M_MATCHES(i)."CUST_SEQ"  is not null THEN
        IF fName IS NULL OR LENGTH(RTRIM(M_MATCHES(i)."NAME_F")) > LENGTH(RTRIM(fName)) THEN
          fName := M_MATCHES(i)."NAME_F";
        END IF;
      END IF;
  END LOOP;
  RETURN fName;
END;"

OMBALTER MAPPING 'CUST_MATCH_MERGE'
    ADD MERGE_RULES 'ME_2_NAMEF'
        OF OPERATOR 'MATCHMERGE' SET PROPERTIES (TYPE) VALUES ('MM_CUSTOM')
    MODIFY MERGE_RULES 'ME_2_NAMEF'
        OF OPERATOR 'MATCHMERGE' SET PROPERTIES (ATTRIBUTE_NAME) VALUES ('NAME_F')
    MODIFY MERGE_RULES 'ME_2_NAMEF'
        OF OPERATOR 'MATCHMERGE' SET PROPERTIES (CUSTOM_TEXT) VALUES ('$custom_rule2')

#Rule 3:
#Custom rule text is added to a variable and called from OMB.
#Note the escape characters in the PL/SQL text:
# Double quotes are escaped by a slash
# Single quotes are escaped by a single quote

set custom_rule3 "BEGIN\n 
type info_type is table of varchar2(4000) index by binary_integer;
type infocnt_type is table of pls_integer index by binary_integer;
v_infos info_type;
v_infocnt infocnt_type;
retval varchar2(20);
retcnt pls_integer;
found boolean;
indx pls_integer := 1;
begin
  -- return the modal value of "SSN" that is not null
  for i in M_MATCHES.FIRST .. M_MATCHES.LAST loop
      if M_MATCHES(i)."SSN" is not null and M_MATCHES(i)."SSN" != ''999-99-9999''  then
        found := false;
        if v_infos.count > 0 then
          for j in  v_infos.first .. v_infos.last loop

            if found = false and v_infos(j) = M_MATCHES(i)."SSN" then
              -- found it
              v_infocnt(j) := v_infocnt(j) + 1;
              found := true;
            end if;
          end loop;
        end if; -- v_infos is not null
        if found = false then
          v_infos(indx) := M_MATCHES(i)."SSN";
          v_infocnt(indx) := 1;
          indx := indx + 1;
        end if;
      end if;
  end loop;
  -- find value with max count
  if v_infos.count > 0 then
    for i in v_infos.first .. v_infos.last loop
      if retval is null then
        retval := v_infos(i);
        retcnt := v_infocnt(i);
      else
        if (v_infocnt(i) > retcnt) then
          retcnt := v_infocnt(i);
          retval := v_infos(i);
          indx := indx + 1;
        end if;
      end if;
    end loop;
  end if;
  return retval; 
nEND;"

OMBALTER MAPPING 'CUST_MATCH_MERGE'
    ADD MERGE_RULES 'ME_3_SSN'
        OF OPERATOR 'MATCHMERGE' SET PROPERTIES (TYPE) VALUES ('MM_CUSTOM')
    MODIFY MERGE_RULES 'ME_3_SSN'
        OF OPERATOR 'MATCHMERGE' SET PROPERTIES (ATTRIBUTE_NAME) VALUES ('SSN')
    MODIFY MERGE_RULES 'ME_3_SSN'
        OF OPERATOR 'MATCHMERGE' SET PROPERTIES (CUSTOM_TEXT) VALUES ('$custom_rule3')

#Rule 4:
#Custom rule text is added to a variable and called from OMB.
#Note the escape characters in the PL/SQL text:
# Double quotes are escaped by a slash
# Single quotes are escaped by a single quote

set custom_rule4 "BEGIN
  -- select the cust_seq from the same record as the merged SSN.
  for i in M_MATCHES.FIRST .. M_MATCHES.LAST loop
      if M_MATCHES(i)."SSN" is not null and
         M_MATCHES(i)."CUST_SEQ" is not null and
         M_MATCHES(i)."SSN" = M_MERGE."SSN"  then
        return  M_MATCHES(i)."CUST_SEQ" ;
      end if;
  end loop;
  return  M_MERGE."CUST_SEQ";  -- previously selected cust sequence
END;"

OMBALTER MAPPING 'CUST_MATCH_MERGE'
    ADD MERGE_RULES 'ME_4_CUSTSEQ'
        OF OPERATOR 'MATCHMERGE' SET PROPERTIES (TYPE) VALUES ('MM_CUSTOM')
    MODIFY MERGE_RULES 'ME_4_CUSTSEQ'
        OF OPERATOR 'MATCHMERGE' SET PROPERTIES (ATTRIBUTE_NAME) VALUES ('NAME_F')
    MODIFY MERGE_RULES 'ME_4_CUSTSEQ'
        OF OPERATOR 'MATCHMERGE' SET PROPERTIES (CUSTOM_TEXT) VALUES ('$custom_rule4')
       
#Rule 5:
#Custom rule text is added to a variable and called from OMB.
#Note the escape characters in the PL/SQL text:
# Double quotes are escaped by a slash
# Single quotes are escaped by a single quote

OMBALTER MAPPING 'CUST_MATCH_MERGE'
    ADD MERGE_RULES 'ME_5_SAPCUSTID'
        OF OPERATOR 'MATCHMERGE' SET PROPERTIES (TYPE) VALUES ('MM_RANK')
    MODIFY MERGE_RULES 'ME_5_SAPCUSTID'
        OF OPERATOR 'MATCHMERGE' SET PROPERTIES (ATTRIBUTE_NAME) VALUES ('SAP_CUST_ID')
    ADD RANK_RULES 'SAP_RULE'
        OF MERGE_RULES 'ME_5_SAPCUSTID'
            OF OPERATOR 'MATCHMERGE'
    MODIFY RANK_RULES 'SAP_RULE'
        OF MERGE_RULES 'ME_5_SAPCUSTID'
            OF OPERATOR 'MATCHMERGE' SET PROPERTIES (EXPRESSION) VALUES ('length(INGRP1.SAP_CUST_ID) = 7')
       

November 19, 2007

Alcoa Gets It: Oracle Warehouse Builder, Oracle Database, RAC, and BI

According to a new press release, world-leading aluminum producer Alcoa has adopted Oracle Database, Oracle Real Application Clusters, and Oracle Warehouse Builder for constructing a multi-terabyte global data warehouse, as a better alternative for their long-term needs, following "a thorough evaluation of Oracle and a third-party database platform provider." From the release:



With its GDW, Alcoa's business and corporate executives are able to access, manage and integrate global data from a single source to targeted business analytics applications such as Oracle's Hyperion Essbase, Hyperion Financial Management and Hyperion Planning - components of Oracle Fusion Middleware. Alcoa consolidated data from four regional instances of the Oracle E-Business Suite it has deployed throughout the world into a single, global repository to support its business analytics, Enterprise Performance Management, and Master Data Management initiatives. The fast-growing multi-terabyte data warehouse supports approximately 9,000 users across Alcoa's strategic business units in the form of dashboards, Key Performance Indicators (KPIs) and scorecards.


...Alcoa deployed Oracle Database 10g, Oracle Real Application Clusters, Oracle Clusterware, Oracle Automatic Storage Management, Oracle Partitioning, Oracle Warehouse Builder 10g Release 2, Oracle Business Intelligence Discoverer, and Oracle Enterprise Manager 10g on a cluster of three HP 9000 rp8420 servers with eight PA-RISC 8700 processors running HP-UX.
...
The upgraded GDW can support an increased number of applications and data sources.... With Oracle Real Application Clusters, Alcoa is able to incrementally add server capacity to adapt easily to growing business demands without disruption. Finally, Alcoa has a technical architecture for supporting its GDW over time.


"The clustered database architecture that we've deployed has satisfied our business and performance requirements," said Matthew Schroeder, Manager, Business Information & Technologies, Alcoa. "It also offers us the flexibility we need for future growth."


Warehouse Builder is a bit buried in the text of this release, so it may not seem that important. But solving a problem like this is just what Warehouse Builder (and Oracle's range of database, DW and BI offerings) is made for:



  • Drawing from an ever-increasing array of applications and other data sources;
  • Ensuring the quality of the data as it's collected;
  • Populating and managing a data warehouse on whatever scale customers need;
  • Turning the contents of that warehouse into actionable information though BI.

The OWB 11g Overview whitepaper lays out the role of OWB in this story at more length, but the key insight is how each piece of the puzzle enables the others:



  • The richer the data in your warehouse, the more and better the actionable information your BI products can pull out of it;
  • The better your data integration, ETL and DQ processes, the richer the data in your warehouse;
  • The more productive and effective your ETL and DQ developers, the better your processes can be, and the more sources you can incorporate.

Which, of course, is where OWB comes in. :)


The database with RAC, of course, delivers one kind of scalability: handling ever-greater volumes of data. And OWB has a role to play here-- the code it generates has to take optimum advantage of the underlying Oracle database. But OWB also takes care of another kind of scalability: handling more sources, and doing more with them, and getting more, and more consistent, results out of your development resources by getting as much automation and smarts into the tool as possible, and getting as much integration between the tool and the sources as possible.

January 3, 2008

11g Heterogeneous Agent

There are quite a few nice changes in the 11g Gateways that are worth checking out.... from performance to standalone agent to legacy integration. As with a lot of things in life, when you start scratching the surface you find lots of cool and interesting stuff - and in amongst a few things you don't want to find too of course! Here's a few initial findings on setup....

The software firstly for 11g is under the 'See All' section next to the database platform download.

Gateway1:

When you click on 'See All' further down the page you will find the Gateway zip, this is a standalone install that has the Gateway Heterogeneous Agent and also the legacy connect and designer utilities amongst others.

Gateway2:

A change in 11g is that the agent can be installed on its own node or wherever you decide which is great, it can also be used with existing 10.2 database releases which is even better. See the certification matrices below for connection details:

The install and configuration is generally better, there are a few differences such as having to run some scripts on the source system (if you are reverse engineering) - see the SQLServer example for details, and various other gotchas.


Firstly lets check the SQLServer Database Gateway for example...

A lot of people blast on without reading the manual, I am one of those people too. There are some changes to how the Gateway is configured that you should be aware of. In the installation and configuration guide check the section 'Create SQL Server Views for Data Dictionary Support',  these are needed in order to reverse engineer from SQLServer, the views are used by the database's transparent gateway views used by OWB in the import/reverse engineering process - so remember and define them on your source! After this it was fairly straightforward.

Looking at a simple example..
Firstly I thought I'd start off simple using a simple Excel document. With 11g Gateways I can now have an agent on my windows box and use my Linux 11g database to read the document. On my windows host under the gateway Oracle home I create a file initMYEXCEL.ora under hs/admin with the following content:

HS_FDS_CONNECT_INFO = MYEXCEL_DSN
HS_FDS_SUPPORT_STATISTICS=FALSE
HS_FDS_TRANSACTION_ISOLATION=NONE
HS_FDS_TRACE_LEVEL = DEBUG

The trace setting is INVALUABLE whilst you are setting this up, the trace in this case will be written to hs/trace. In the gateway Oracle home's listener.ora file I added a listener for the service:
SID_LIST_LISTENER=
   (SID_LIST=
      (SID_DESC=
         (SID_NAME=MYEXCEL)
         (ORACLE_HOME=C:\product\11.1.0\tg_1)
         (PROGRAM=dg4odbc)
      )
   )

On my database home where I wish to connect to the agent I add an entry in the tnsnames.ora to connect to my Heterogeneous agent:
MYEXCEL_TNS =
(DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP) (Host = myhost) (PORT = 1521))) (CONNECT_DATA = (SID = MYEXCEL ))(HS = OK))

I can then create a database link that uses this to test:
create database link MYEXCEL_DBL 'MYEXCEL_TNS';

Now I can query the catalog for this schema and also query the tables in the Excel document;
select TABLE_NAME from ALL_TABLES@MYEXCEL_DBL;

If this fails (with 'Optional feature not implemented' message) rerun the query, this is a bug (6701400). With 11g this seems to happen with 1st query after the link has been created, every other use is OK. Also this is specific to ODBC Gateway reading from a read-only Excel data source. I was able to use this 11g agent on Windows  with a 10gR2 database on Linux and transparently read the Excel document. So lots of potential once this bug is resolved, also Gateway bug 6883247 is an issue (this was raised after bug 6701461 was raised on OWB, it is actually a database/gateway bug).

This is a quick run through of some initial findings time, permitting I'd like to write up some more of these features that can be leveraged from OWB, the following areas jump to mind;
  • Integrating legacy data using the 11g Studio for legacy integration
  • Parallelizing loading of stage tables using Gateway

January 4, 2008

Advanced Aggregation

The OWB 10.2.0.3 and 11g releases of OWB extended the aggregation capabilities to support Oracle's advanced aggregation capabilities. So for example the CUBE and ROLLUP clauses can now be used in the aggregation operator which is great for building summary maps with these simple yet powerful clauses.

To illustrate I have used the example from the Oracle Data Warehousing guide (Example 20-8 GROUPING combined with HAVING);


SELECT channel_desc, calendar_month_desc, country_iso_code,
       TO_CHAR( SUM(amount_sold), '9,999,999,999') SALES$,
       GROUPING(channel_desc) CH,
       GROUPING (calendar_month_desc) MO,
       GROUPING(country_iso_code) CO
FROM sales, customers, times, channels, countries
WHERE sales.time_id=times.time_id AND sales.cust_id=customers.cust_id AND customers.country_id = countries.country_id AND sales.channel_id= channels.channel_id AND channels.channel_desc IN ('Direct Sales', 'Internet') AND times.calendar_month_desc IN ('2000-09', '2000-10') AND country_iso_code IN ('GB', 'US')
GROUP BY CUBE(channel_desc, calendar_month_desc, country_iso_code)
HAVING (GROUPING(channel_desc)=1 AND GROUPING(calendar_month_desc)= 1 AND GROUPING(country_iso_code)=1) OR (GROUPING(channel_desc)=1 AND GROUPING (calendar_month_desc)= 1) OR (GROUPING(country_iso_code)=1 AND GROUPING(calendar_month_desc)= 1);

In OWB use the AGGREGATOR operator and construct the group by and having clause as:

Map Builder 5:


This and the ability to define a materialized view as a target in a map, then create/bind the materialized view gives a great way for building summaries and importantly capturing the metadata for lineage/impact analysis (see post on federation). Also check out the post on constructing maps from SQL (part 1 and part 2) which uses the example above.



January 25, 2008

Binary Files in OWB - Simple Example

So how do you load binary data into Oracle? Using a variety of ways, thats the good news, this is a little like teaching new dogs old tricks. OWB's file support is rich enough to fulfill the vast majority of cases. I guess it is not 100% obvious though how it is done, especially from the queries on mailing lists and forums. Once you see the pattern then it becomes apparent what the tasks at hand are. I created a viewlet (see here) which uses a nice small example to illustrate how it all hangs together; what you can do with files and external tables. In 2007, a white paper was published that illustrated how to load COBOL
EBCDIC files using OWB's file support, check it out also (here). It illustrates how more complex data structures are represented and how to use OWB to complete the job. Must post the actual MDLs and data files for these too, its always useful to get your hands on something real.

March 11, 2008

11g Remote Agent

Some more cool stuff in 11g, the Gateways install has a lightweight remote/scheduler agent for executing tasks such as native unloaders or any custom activity. This follows on from the post on the 11g Heterogeneous Agent, and the agent is installed from the software downloaded in that post.

The setup and configuration is fairly simple but is kind of scattered across a few manuals, here is a simple example that steps through executing a batch script on a Windows system from an 11g instance on Linux. The database user defining the job needs CREATE JOB and CREATE EXTERNAL JOB database privileges.

Administration/Setup

Here are the administration/setup steps of agent (documented in the Oracle Database Administrator's Guide - Enabling and Disabling Remote External Jobs, in the Administering Oracle Scheduler chapter);

1. Run the script prvtrsch.plb with following command:

SQL> @?/rdbms/admin/prvtrsch.plb

2. configure the http port on the database instance, so the remote agent can register itself as a valid agent. For example below I used port 1540 using a privileged user;

BEGIN
DBMS_XDB.SETHTTPPORT(1540);
END;
/

3. Set the agent registration password using a privileged user;

BEGIN
DBMS_SCHEDULER.SET_AGENT_REGISTRATION_PASS('agent_registration_password');
END;
/

4. Install the agent software on the remote node using the Gateways install:
DatabaseSchedulerAgent:

I then changed the port number in my case in the sched.conf file in the agent's ORACLE_HOME ( I set PORT=1539)

5. On the remote node, register the agent with the database instance using the port used above and your will be prompted for the agent registration password;

.\schagent -registerdatabase database_hostname database_http_port

6. Start the agent (using the Windows services if Windows for example, or 'schagent -start' or 'schagent -stop' for UNIX.

That's the administration side of the configuration complete. Now we can move on to an example simple job, which will execute a batch script for example.

Example Job

I created a user in Windows for running the job and gave the user the right to 'Log on as a batch job', under 'Local Security Settings' in the administrative tools in Windows XP for example.

To setup the job I had to create a credential for the remote system with the user name and password

EXEC DBMS_SCHEDULER.CREATE_CREDENTIAL('TESTCRED', 'TEST', 'test_user_windows_password');

I can then use the credential TESTCRED in a remote job, note I used enabled=false in the CREATE_JOB otherwise the job would have been deemed local and executed - the remote details of the job are defined via the SET_ATTRIBUTE api call;

BEGIN
DBMS_SCHEDULER.CREATE_JOB(job_name => 'UNLOADJOB',
job_type => 'EXECUTABLE',
job_action => 'c:\unloadutils\unload.bat',
enabled=> FALSE);
DBMS_SCHEDULER.SET_ATTRIBUTE('UNLOADJOB', 'logging_level',
DBMS_SCHEDULER.LOGGING_FULL);
DBMS_SCHEDULER.SET_ATTRIBUTE('UNLOADJOB', 'credential_name', 'TESTCRED');
DBMS_SCHEDULER.SET_ATTRIBUTE('UNLOADJOB', 'destination', 'mywindowshost:1539');
DBMS_SCHEDULER.ENABLE('UNLOADJOB');
END;
/

Whilst I was debugging and building the example I used the LOGGING_FULL logging level to get diagnostics, job details can be found in the view USER_SCHEDULER_JOB_RUN_DETAILS.

For example one time I forgot to register the agent with the database and received errors such as (this was recorded in the job run details view above);

ORA-29273: HTTP request failed
ORA-06512: at "SYS.UTL_HTTP", line 1029
ORA-28759: failure to open file
...

Above the port number 1539 is the port number my remote agent service is using. This port number is defined in the schagent.conf on the remote system where the agent is running. In that file you can define other useful info like tracing levels that will get written to the agent.log file on the remote system.

You can incorporate file retrieval and pushing also using this remote agent using the DBMS_SCHEDULER apis, for example to retrieve the unloaded data file from above. The command below will retrieve the remote file scottish_sales.dat via the remote agent on mywindowshost, to the database systems DATA_FILE_DIR SQL directory, in a file named scottish_sales.dat;

exec DBMS_SCHEDULER.GET_FILE ('c:\stage\scottish_sales.dat', 'mywindowshost:1539','TESTCRED','scottish_sales.dat', 'DATA_FILE_DIR')

So here we've seen a few useful areas for executing remote programs based on a lightweight agent, and retrieval/pushing of files via the agent. These APIs can be incorporated in OWB mappings or process flow objects. Only an 11g database instance can
schedule remote jobs using this agent (unlink the Gateway agent where prior releases can be patched to take advantage of it).

Hope this 'hello world' style example of executing remote jobs is a useful overview of some interesting new capabilities in 11g applicable to OWB and SQL/PLSQL development in general. The full documentation for DBMS_SCHEDULER provides a rich set of functionality you can check for much more besides.

Note:
If you are getting an error about bad certificates you may be hitting database scheduler bug
6862726. In 11.1.0.6 there was a bug where agent certificates would expire after 60 days (this has been fixed in 11.1.0.7).
To resolve this is what you can do

- stop the agent
- remove the file $AGENT_HOME/data/agent.key
- re-register the agent with the database
- restart the agent

March 17, 2008

Pivoting Data in OWB

The pivot transformation operator enables you to transform a single row of attributes into multiple rows in an efficient manner. This example illustrates transforming a table that has a row for each year with the quarterly sales in a table with a row for each quarter. The OWB pivot operator makes this simple (there is also an unpivot).

So taking a simple example as follows:

YEAR   Q1_sales      Q2_sales Q3_sales   Q4_sales
---------- ---------- ---------- ---------- ----------
      2005      10000      15000      14000      25000
      2006      12000      16000      15000      35000
      2007      16000      19000      15000      34000

we wish to transform the data set to the following with a row for each quarter:

   YEAR QTR    SALES
---------- -- ----------
      2005 Q1      10000
      2006 Q1      12000
      2007 Q1      16000
      2005 Q2      15000
      2006 Q2      16000
      2007 Q2      19000
      2005 Q3      14000
etc..

We can design this in the OWB mapping as;
Pivot0:

Looking at the internals of the operator we see how this is described. The pivot operator allows you to define the input columns, the output columns and how the data is pivoted. This is achieved by defining a few pieces of information;
  • the key columns (the columns from the source that will appear in the output of the pivoted data)
  • the row locator (this is the pivot column)
  • the pivot transformation (which values to project for the pivoted columns
Firstly in our example we define the key column to be YEAR, this will be the same for each pivoted row;
Pivot1:

Then we define the QUARTER column as the pivot column, this is the row locator (in OWB terms).
Pivot2:

Finally we define how the row is transformed from a row with columns to a number of rows, we do this by entering a row in the table for each case we desire (so we have a row for Q1, a row for Q2, a row ... etc.).
Pivot3:

This makes the map design so much simpler as you can see, since the operator encapsulates the transformation in a simple manner. The example has been scripted (get the script here) so that you can create it and have a look around at how this is done....

Here is the sample data I used also:
--
-- Execute the following where you deployed SALES_BY_QTR
-- I manually added some rows in SALES_BY_QTR for the example:
--
insert into SALES_BY_QTR values (2005, 10000, 15000, 14000, 25000);
insert into SALES_BY_QTR values (2006, 12000, 16000, 15000, 35000);
insert into SALES_BY_QTR values (2007, 16000, 19000, 15000, 34000);
commit;

The reverse of this scenario is the unpivot, the script for the unpivot example can be found here.

You select the key just like the pivot above, then define the row locator (or unpivot column), defining the values for each match row:
Unpivot0:

Then define the output attributes for the unpivot:
Unpivot1:

Finally define the unpivot transformations (how the column data is taken from the matching row):
Unpivot2:

If your data has many rows with sales values for a quarter (for a single year) you will need to aggregate the data before unpivoting, for example the map below first aggregates and sums sales before unpivoting. The data is grouped by YEAR (key) and QUARTER (row locator) and
the output expression has SUM(SALES), the map then unpivots that data. (you cannot tweak the agg function
just now in the unpivot)
Unpivot3:

Hope this is useful and helps illustrates the pivoting transformation capability.

April 14, 2008

What features do you use and what do you want next?

Catch 22 that is what we sometimes suffer from. How do we know what to build when we don't quite know what you really use day-to-day. Sure we can look at our bugs (not that we have many ;-) ), but would it not be great if you could just tell us what is really important for your project?

So fill out the feature survey and let us know what you use in your projects, and as important here, add your feedback to the enhancement request section at the end and drive the new Warehouse Builder release with your requests...!

Please download the feature survey here and send us the results (press the big nice button).

Thanks in advance for any and all feedback!

April 17, 2008

Checking referential integrity with data rules

So you read my KQIs thingie. This one is a bit more practical in that it is real code without coding :-)

I get incoming data into a table, and what I want to do is check referential integrity without FKs, because they will generate errors while loading. Not quite cool. So here is the scenario:

The incoming data in my STG_TAB:

insert into stg_tab values ('1001', 'Product 1001 Name', '1001 description', 'nothing');
insert into stg_tab values ('1002', 'Product 1002 Name', '1002 description', 'nothing');
insert into stg_tab values ('1003', 'Product 1003 Name', '1003 description', 'nothing');
insert into stg_tab values ('1004', 'Product 1004 Name', '1004 description', 'nothing');
insert into stg_tab values ('1005', 'Product 1005 Name', '1005 description', 'nothing');

The table that has the references to check on has the following data (it is called LKP_TAB):

insert into lkp_tab values ('1001', 'IamProduct1001');
insert into lkp_tab values ('1002', 'IamProduct1002');
insert into lkp_tab values ('1003', 'IamProduct1003');

So the point is that 1004 and 1005 are not around, so they are considered not compliant. And I want to now check this with a data rule. These are steps to get this working:

1) Create a new data rule
    Referential rule type
    Use a single attribute and set both cardinality thingies to 1 : n

Create new datarule:

2) Apply the rule to the STG_TAB table in OWB
    Open the data object and apply the rule
    Bind like this

ApplyRuleToTable:

3) Create a mapping in OWB that has the STG_TAB table
    In the property inspector for the operator, find the DataRule node and open it up
    Set the rule to Move to Error

SetRuleActions:

4) On the mapping, while still highlighting the operator


Set Match By Constraint to NO_CONSTRAINTS (this is assuming the STG_TAB has no PKs or UKs that would work for the delete matching)

SetNoConstraint:

Now find the attribute in the table that is doing the lookup (e.g. the pk without being an enforced pk) and highlight that in the map
Set the Match column when Deleting row to YES

SetDeleteAttribute:

5) Validate the mapping
6) Re-deploy the staging table, you must do this because you need to get the error table deployed as well... so don't load data yet, first redeploy! (Or generate the scripts and simply run the DDL for the error table)
7) Deploy the LKP_TAB
8) Deploy the mapping
9) Insert the data from the scripts above in both tables
10) Run the mapping and you will see the 1004 and 1005 records being removed from the STG_TAB and placed in the STG_TAB_ERR

Here is the final data set in both these tables:

FinalDataCorr:

The error table (I have imported the table back from the database after deploying it via the STG_TAB deploy step - OWB does not do this automatically) looks like this, just as we expected.

FinalDataErr:

May 19, 2008

Data Watch and Repair for Master Data Management

Some happy news on the data quality front (yes I'm harping on that quite a bit these days...). We just completed the new Data Watch and Repair solution using the entire OWB technology stack. One of the things we phased when looking at our MDM products was that we need more emphasis on quality of the data. How do we empower the data steward to take charge of data quality.

So we combined forces and worked on something called data watch and repair for master data management. It is a solution using the some exising technology from OWB (the data quality option for data profiling, data rules, data corrections and data auditors) and a new piece, a connector for the MDM products (for CDH, UCM and PIM). The latter is now officially available for OWB (it is part of OWB 11.1.0.6).

Now what makes this so interesting (in my humble opinion)? DWR - don't you love those acronyms! - allows you to leverage the technology we had in house with pre-defined data rules and pre-defined data corrections. This makes your time to market a lot shorter and your data a lot better. The solution follows our best practices and uses the generated corrections to write back data for the master data applications reducing the need to hand code any routines. The DWR solution also ensures you concentrate your effort on profiling and on data rules. These data rules now drive the entire implementation of data quality enabling the previously mentioned generated data corrections.

You will see more and more information appearing. For a first quick glance at DWR take a look at this paper. You will get an idea of best practices around a data quality solution and how DWR first within that picture.

May 27, 2008

OWB, web services and the database

Since we get this question quite often, lets go back into this a little more. What is the story with OWB (or better the database) and web services? Well, you have all the tools and mechanisms around to simply hook it all together. What is all? Anything that speaks web services can speak to the database, and you can hook your DB processes (e.g. OWB mappings, OWB process flows etc) together with the Middleware frameworks like BPEL. You don't need any other tools to anything, just leverage what you have in the database.

How? Here are the details, excellently written out by Kuassi Mensah who is an expert in this at Oracle. There have been enhancements in Oracle Database 11g, but if you are on 10g this should be extremely helpful.

May 29, 2008

Performance tuning mappings

You can leverage the standard Oracle tuning tools for investigating set based SQL performance and PLSQL performance. There are reports in the utility exchange (see
here
) for reporting on the results after using the DBMS_PROFILER.The PLSQL profiles can
be queried (in master detail form - view map profiles, drill to top 20, profile summary etc). The following reports are available;

  1. Top 20 statements (in terms of most
    time spent)
  2. Summary report giving time taken for
    the package and dependent functions/procedures outside of package.
  3. Profile detail, line by line detail of
    occurrences and time (exportable to Excel etc.)
  4. % code coverage

The start/stop of PLSQL profiling has to be manually performed before/after the map execution, see the download for instructions.



For analysis of set based SQL from the design environment, there were some changes in OWB 10.2.0.3 that included the ability to obtain access explain plans/statistics/tuning advisor from within the mapping editor for SQL within the map. There is a viewlet describing some of this here.
In order to utilize statistics and SQL Tuning there are additional privs required.....



1. For statistics the following must be granted to the schema the map is being deployed to:

grant select on V_$SQL_PLAN to <tgt_schema>;

grant select on V_$SQL_PLAN_STATISTICS_ALL to <tgt_schema>;

grant select on V_$SQL to <tgt_schema>;



2. For SQL Tuning the advisor role is required:

grant advisor to <tgt_schema>;


If you want runtime analysis of SQL, the Oracle Database Release 11 has introduced a new performance view v$sql_monitor. You can use v$sql_monitor for near real-time monitoring of SQL statements.
Customers who license the Enterprise Manager Tuning Pack are allowed to use this view. SQL monitoring is automatically started when a SQL statement runs parallel or when it has consumed at least 5 seconds of CPU or I/O time.



In Oracle Database Release 11 the way to get output in a readable format is through PL/SQL or grid control:

variable my_rept clob ;

begin

:my_rept := dbms_sqltune.report_sql_monitor() ;

end ;

/

print :my_rept


Full documentation for v$sql_monitor is in the Database Reference and the Database Performance Tuning Guide.

May 30, 2008

BIWA Summit call for papers

BIWAlogo:

Oracle BIWA Summit - December
2nd & 3rd

Please pass this
information to customers and partners




Oracle
BIWA Summit 2008 is a forum for business intelligence (BI), warehousing
and analytics professionals to exchange information, experiences and
best practices. With over 75 presentations and workshops and the entire
event focused on BI, Warehousing and Analytics (BIWA), you will get the
knowledge and information critical to be successful in your work. You
will hear experts present novel and interesting use cases of Oracle
Database-centric BIWA topics through keynotes, technical talks, hands
on workshops, discussion panels and more.


The summit will be
held December
2-3, 2008 at the Oracle
Conference Center in Redwood Shores, CA
and
offers outstanding
value for BI professionals who use Oracle technology.  


Call for Presentations & Technical
Workshops


You are invited to submit an abstract for a Presentation (50 minutes)
or hands-on Technical Workshop (110 minutes) to the Oracle Business
Intelligence, Warehousing, and Analytics (BIWA) Summit 2008 at the
Oracle Conference Center in Redwood Shore, CA.  See the Submission
Process at
request
for Presentation Website




Important
Dates


Last
date for submission of abstracts:  
August
15, 2008


Presentation
notification:  
September 30, 2008

BIWA
Summit: 
December 2-3, 2008




LEARN MORE

BIWA
Summit 2008 Website


Request
for Presentation Website

OracleBIWA.org

Watch for Summit Registration details coming soon.



About BIWA: 

BIWA
is the Oracle Business Intelligence, Warehousing, and Analytics Special
Interest Group, part of the Independent Oracle User Group (IOUG). BIWA
is a worldwide association of persons seeking the successful deployment
of Oracle Database-centric business intelligence, data warehousing, and
analytical products, features and Options.

June 3, 2008

Table Functions as Source and Target

Using table functions is a useful way to both boost performance and extend the source/target capabilities in OWB. Here we will see how to setup table functions as a source and a target in OWB. Table functions let you define a set of PL/SQL statements that when queried, will behave like a standard SQL query to a table. We will do this by simple examples to illustrate how it hangs together.

Firstly let's define 2 maps, both use the table function as a source, one uses a table function which takes a scalar parameter, the other takes a REF CURSOR, so an arbitrary SQL query can be passed as a parameter. The functions are skeletal just to illustrate how it all works. You can also have a table function as a target so can write to systems using APIs for example, we will look at this later.

Scalar Table Function Example

The scalar table function example uses a constant (a scalar) as a parameter to the table function, and the results are returned from the table function and then the map writes to rows returned to the target table.

Table Function Scalar Source:

Here is the OMB script to create the table function as source example (script 1).

# ===============================================
# Source: Scalar TF Map:
# ===============================================
OMBCREATE MAPPING 'TF_SCALAR_PARAM' ADD TABLE_FUNCTION OPERATOR 'GET_EMPS' SET PROPERTIES (TABLE_FUNCTION_NAME) VALUES ('SCOTT.SCALARF')

OMBALTER MAPPING 'TF_SCALAR_PARAM' ADD INPUT_PARAMETER OPERATOR 'INPUTS'
OMBALTER MAPPING 'TF_SCALAR_PARAM' ADD ATTRIBUTE 'FILTERREC' OF GROUP 'OUTGRP1' OF OPERATOR 'INPUTS' SET PROPERTIES (DATATYPE,DEFAULT_VALUE) VALUES ('NUMBER', '0')

OMBALTER MAPPING 'TF_SCALAR_PARAM' ADD ATTRIBUTE 'FILTER_EMP' OF GROUP 'INGRP1' OF OPERATOR 'GET_EMPS' SET PROPERTIES (DATATYPE) VALUES ('NUMBER')
OMBALTER MAPPING 'TF_SCALAR_PARAM' ADD ATTRIBUTE 'EMPNO' OF GROUP 'OUTGRP1' OF OPERATOR 'GET_EMPS' SET PROPERTIES (DATATYPE) VALUES ('NUMBER')
OMBALTER MAPPING 'TF_SCALAR_PARAM' ADD ATTRIBUTE 'ENAME' OF GROUP 'OUTGRP1' OF OPERATOR 'GET_EMPS' SET PROPERTIES (DATATYPE, LENGTH) VALUES ('VARCHAR2', '256')
OMBALTER MAPPING 'TF_SCALAR_PARAM' ADD ATTRIBUTE 'LVL' OF GROUP 'OUTGRP1' OF OPERATOR 'GET_EMPS' SET PROPERTIES (DATATYPE) VALUES ('NUMBER')


OMBALTER MAPPING 'TF_SCALAR_PARAM' ADD CONNECTION FROM ATTRIBUTE 'FILTERREC' OF GROUP 'OUTGRP1' OF OPERATOR 'INPUTS' TO ATTRIBUTE 'FILTER_EMP' OF GROUP 'INGRP1' OF OPERATOR 'GET_EMPS'

OMBALTER MAPPING 'TF_SCALAR_PARAM' ADD TABLE OPERATOR 'EMPLOYEES'
OMBALTER MAPPING 'TF_SCALAR_PARAM' ADD CONNECTION FROM GROUP 'OUTGRP1' OF OPERATOR 'GET_EMPS' TO GROUP 'INOUTGRP1' OF OPERATOR 'EMPLOYEES'


Ref Cursor Table Function Example

An arbitrary SQL query can also be passed in as a ref cursor into the table function, you define the group with this information in order to get the correct SQL generated;

Table Function Ref Cursor Source:

The table function which is referenced is defined via the property below, if it is in a different schema you prefix the name with the schema name;

Table Function Setup:

Here is the OMB script to create the table function as source example using a ref cursor (script 2).

# ===============================================
# Source: Ref Cursor TF Map:
# ===============================================
OMBCREATE MAPPING 'TF_REFCURSOR_PARAM' ADD TABLE_FUNCTION OPERATOR 'GET_EMPS' SET PROPERTIES (TABLE_FUNCTION_NAME) VALUES ('SCOTT.REFCURSORF')

OMBALTER MAPPING 'TF_REFCURSOR_PARAM' ADD TABLE OPERATOR 'SRC'
OMBALTER MAPPING 'TF_REFCURSOR_PARAM' ADD ATTRIBUTE 'C1' OF GROUP 'INOUTGRP1' OF OPERATOR 'SRC' SET PROPERTIES (DATATYPE) VALUES ('NUMBER')
OMBALTER MAPPING 'TF_REFCURSOR_PARAM' ADD ATTRIBUTE 'C2' OF GROUP 'INOUTGRP1' OF OPERATOR 'SRC' SET PROPERTIES (DATATYPE, LENGTH) VALUES ('VARCHAR2', '256')
OMBALTER MAPPING 'TF_REFCURSOR_PARAM' ADD ATTRIBUTE 'C3' OF GROUP 'INOUTGRP1' OF OPERATOR 'SRC' SET PROPERTIES (DATATYPE) VALUES ('NUMBER')

OMBALTER MAPPING 'TF_REFCURSOR_PARAM' MODIFY GROUP 'INGRP1' OF OPERATOR 'GET_EMPS' SET PROPERTIES (GROUP_TYPE) VALUES ('REF_CURSOR')
OMBALTER MAPPING 'TF_REFCURSOR_PARAM' ADD ATTRIBUTE 'EMPNO' OF GROUP 'INGRP1' OF OPERATOR 'GET_EMPS' SET PROPERTIES (DATATYPE) VALUES ('NUMBER')
OMBALTER MAPPING 'TF_REFCURSOR_PARAM' ADD ATTRIBUTE 'ENAME' OF GROUP 'INGRP1' OF OPERATOR 'GET_EMPS' SET PROPERTIES (DATATYPE, LENGTH) VALUES ('VARCHAR2', '256')
OMBALTER MAPPING 'TF_REFCURSOR_PARAM' ADD ATTRIBUTE 'LVL' OF GROUP 'INGRP1' OF OPERATOR 'GET_EMPS' SET PROPERTIES (DATATYPE) VALUES ('NUMBER')

OMBALTER MAPPING 'TF_REFCURSOR_PARAM' ADD ATTRIBUTE 'EMPNO' OF GROUP 'OUTGRP1' OF OPERATOR 'GET_EMPS' SET PROPERTIES (DATATYPE) VALUES ('NUMBER')
OMBALTER MAPPING 'TF_REFCURSOR_PARAM' ADD ATTRIBUTE 'ENAME' OF GROUP 'OUTGRP1' OF OPERATOR 'GET_EMPS' SET PROPERTIES (DATATYPE, LENGTH) VALUES ('VARCHAR2', '256')
OMBALTER MAPPING 'TF_REFCURSOR_PARAM' ADD ATTRIBUTE 'LVL' OF GROUP 'OUTGRP1' OF OPERATOR 'GET_EMPS' SET PROPERTIES (DATATYPE) VALUES ('NUMBER')


OMBALTER MAPPING 'TF_REFCURSOR_PARAM' ADD CONNECTION FROM ATTRIBUTE 'C1' OF GROUP 'INOUTGRP1' OF OPERATOR 'SRC' TO ATTRIBUTE 'EMPNO' OF GROUP 'INGRP1' OF OPERATOR 'GET_EMPS'
OMBALTER MAPPING 'TF_REFCURSOR_PARAM' ADD CONNECTION FROM ATTRIBUTE 'C2' OF GROUP 'INOUTGRP1' OF OPERATOR 'SRC' TO ATTRIBUTE 'ENAME' OF GROUP 'INGRP1' OF OPERATOR 'GET_EMPS'
OMBALTER MAPPING 'TF_REFCURSOR_PARAM' ADD CONNECTION FROM ATTRIBUTE 'C3' OF GROUP 'INOUTGRP1' OF OPERATOR 'SRC' TO ATTRIBUTE 'LVL' OF GROUP 'INGRP1' OF OPERATOR 'GET_EMPS'

OMBALTER MAPPING 'TF_REFCURSOR_PARAM' ADD TABLE OPERATOR 'EMPLOYEES_SAL'
OMBALTER MAPPING 'TF_REFCURSOR_PARAM' ADD CONNECTION FROM GROUP 'OUTGRP1' OF OPERATOR 'GET_EMPS' TO GROUP 'INOUTGRP1' OF OPERATOR 'EMPLOYEES_SAL'

Types and Table Functions for Source Examples

Here are the types and table functions created in SCOTT, you will have to grant execute on your functions to the target schema;

CREATE OR REPLACE TYPE emp_obj AS OBJECT(
  empno NUMBER,
  ename VARCHAR2(255),
  lvl number
  );
/

CREATE OR REPLACE TYPE tb_emp_obj AS TABLE OF emp_obj;
/

CREATE OR REPLACE FUNCTION REFCURSORF(input_values sys_refcursor) RETURN tb_emp_obj pipelined AS
  out_pipe emp_obj := emp_obj(null,null,null);
  ename VARCHAR2(255):=null;
  empno number; lvl NUMBER;
BEGIN
  LOOP
    FETCH input_values INTO empno, ename, lvl;
    EXIT WHEN input_values%NOTFOUND;
    out_pipe.empno:=empno;
    out_pipe.ename:=ename;
    out_pipe.lvl:=lvl;
    PIPE ROW(out_pipe);
  END LOOP;
  CLOSE input_values;
  RETURN;
END;
/

CREATE OR REPLACE FUNCTION SCALARF(input_val NUMBER) RETURN tb_emp_obj pipelined AS
  out_pipe emp_obj := emp_obj(null,null,null);
  ename VARCHAR2(255):=null;
  empno number; lvl NUMBER;
  cursor INPUT_VALUES is select empno, ename, sal from emp;
BEGIN
  open INPUT_VALUES;
  LOOP
    FETCH INPUT_VALUES INTO empno, ename, lvl;
    EXIT WHEN input_values%NOTFOUND;
    out_pipe.empno:=empno;
    out_pipe.ename:=ename;
    out_pipe.lvl:=lvl;
    PIPE ROW(out_pipe);
  END LOOP;
  CLOSE input_values;
  RETURN;
END;
/

Target Table Function Example

With the target table function example there is a flag on the operator to define that the table function is being used as a target in the mapping. The table function as target in OWB must return one row, in the example below I use OWB to aggregate EMP then the ref cursor is passed to the table function, the table function inserts the aggregated values into another table. You will not get the audit details written to the OWB runtime and needs some error handling, but these kind of things are possible. Also the table function scenario as target would be possible if you use the autonomous pragma transaction within the table function as below.

Table Function Targets:

Here is the OMB script to create the table function as target example (script 3).

set map_name LOAD_WITH_TABFUN
OMBCREATE MAPPING '$map_name' ADD TABLE_FUNCTION OPERATOR 'WRITE_TO_ANYTHING' SET PROPERTIES (TABLE_FUNCTION_NAME,TABLE_FUNCTION_IS_TARGET) VALUES ('TG_TGT', 'true')

OMBALTER MAPPING '$map_name' ADD TABLE OPERATOR 'EMP' BOUND TO TABLE '../SCOTT/EMP'
OMBALTER MAPPING '$map_name' ADD ATTRIBUTE 'DEPTNO' OF GROUP 'INGRP1' OF OPERATOR 'WRITE_TO_ANYTHING' SET PROPERTIES (DATATYPE) VALUES ('NUMBER')

OMBALTER MAPPING '$map_name' ADD ATTRIBUTE 'SALARY' OF GROUP 'INGRP1' OF OPERATOR 'WRITE_TO_ANYTHING' SET PROPERTIES (DATATYPE) VALUES ('NUMBER')

OMBALTER MAPPING '$map_name' ADD ATTRIBUTE 'STATUS' OF GROUP 'OUTGRP1' OF OPERATOR 'WRITE_TO_ANYTHING' SET PROPERTIES (DATATYPE) VALUES ('NUMBER')
OMBALTER MAPPING '$map_name' MODIFY GROUP 'INGRP1' OF OPERATOR 'WRITE_TO_ANYTHING' SET PROPERTIES (GROUP_TYPE) VALUES ('REF_CURSOR')

OMBALTER MAPPING '$map_name' ADD AGGREGATOR OPERATOR 'AGG'

OMBALTER MAPPING '$map_name' ADD CONNECTION FROM ATTRIBUTE 'DEPTNO' OF GROUP 'INOUTGRP1' OF OPERATOR 'EMP' TO GROUP 'INGRP1' OF OPERATOR 'AGG'
OMBALTER MAPPING '$map_name' ADD CONNECTION FROM ATTRIBUTE 'SAL' OF GROUP 'INOUTGRP1' OF OPERATOR 'EMP' TO GROUP 'INGRP1' OF OPERATOR 'AGG'

OMBALTER MAPPING '$map_name' MODIFY OPERATOR 'AGG' SET PROPERTIES (GROUP_BY_CLAUSE) VALUES ('INGRP1.DEPTNO')

OMBALTER MAPPING '$map_name' ADD ATTRIBUTE 'SAL' OF GROUP 'OUTGRP1' OF OPERATOR 'AGG' SET PROPERTIES (DATATYPE, EXPRESSION) VALUES ('NUMBER', 'SUM(INGRP1.SAL)')

OMBALTER MAPPING '$map_name' ADD CONNECTION FROM ATTRIBUTE 'DEPTNO' OF GROUP 'OUTGRP1' OF OPERATOR 'AGG' TO ATTRIBUTE 'DEPTNO' OF GROUP 'INGRP1' OF OPERATOR 'WRITE_TO_ANYTHING'

OMBALTER MAPPING '$map_name' ADD CONNECTION FROM ATTRIBUTE 'SAL' OF GROUP 'OUTGRP1' OF OPERATOR 'AGG' TO ATTRIBUTE 'SALARY' OF GROUP 'INGRP1' OF OPERATOR 'WRITE_TO_ANYTHING'

OMBALTER MAPPING '$map_name' MODIFY GROUP 'OUTGRP1' OF OPERATOR 'WRITE_TO_ANYTHING'
  SET PROPERTIES (RETURN_TABLE_OF_SCALAR) VALUES ('true')

OMBALTER MAPPING '$map_name' MODIFY OPERATOR 'WRITE_TO_ANYTHING' SET PROPERTIES (TABLE_FUNCTION_IS_TARGET) VALUES ('true')

#Work around bug with target table function and generation.
catch {OMBCOMPILE MAPPING '$map_name'}
OMBALTER MAPPING '$map_name' SET PROPERTIES (GENERATION_MODE, DEFAULT_OPERATING_MODE) VALUES ('SET_BASED', 'SET_BASED')
catch {OMBCOMPILE MAPPING '$map_name'}

Here is the demo table function used;

create or replace function TF_TGT(RC IN SYS_REFCURSOR
                          ) return TBL_OF_STRINGS PIPELINED
as
 PRAGMA AUTONOMOUS_TRANSACTION;
 DEPTNO number;
 SALARY number;
begin
 loop
   FETCH RC into DEPTNO,SALARY;
   INSERT INTO TFTGTTAB VALUES (DEPTNO,SALARY);
   EXIT WHEN RC%NOTFOUND;
 end loop;
 COMMIT;
 PIPE ROW('SUCCESS');
end;
/

Here we have seen how the table function is setup in OWB and how we can extend the source and targets using an API based approach. If you would like to see a sample that leverages table functions integrating to web services see the post here - the jpublisher component generates functions and table functions around the web services defined in a WSDL.

June 9, 2008

Can Oracle and OWB Do Email, LDAP and SOA? Yes, We Can.

Sometimes OWB PM gets questions about ways to use Warehouse Builder in consulting situations, sales-related proof of concepts implementations, and so on. Some that came up recently include:



  • How OWB can be used to create or consume Web Services (a subject of a recent whitepaper);
  • How OWB can send email with attachments (the process flow task for sending email makes no provisions for attachments);
  • How OWB can make updates to an Active Directory or other LDAP server

In the second case, we recommended the PL/SQL package UTL_MAIL.  Incorporating a short bit of PL/SQL code that called the needed functions (make the attachment, create the mail message with the attachment, send it out) into a process flow would cover the customer's/developer's need.


The third case was more interesting, because of the thinking behind it, and the questions it raised.


The original question that reached PM was, "Can OWB write to ODBC targets? (Or do I need a different ETL product?)" The line of thinking was something like:



  • I'm using an ETL tool
  • An ETL tool can update data in database targets
  • Therefore, if I want to update something that isn't a database, I should make it look like a database target
  • Therefore, the customer needs to license a third-party ODBC driver for LDAP, and choose an ETL tool that can write to ODBC
  • Can OWB write to ODBC? ("Yes, we can", by the way-- the core product can use the gateway for ODBC with any target, no cost.)

This chain of reasoning is quite understandable, but it misses an important point: OWB is much more than a tool for talking to data sources and targets. OWB is now and has always been a great tool to maximize productivity for developers of PL/SQL-based data integration solutions on top of the Oracle database. (Future mechanisms for heterogeneous data integration in our roadmap don't obsolete any of the strengths of OWB when used with Oracle databases.)


The PL/SQL part is crucial here. From OWB you have access to all the functionality Oracle builds into the PL/SQL packages that come with the Oracle database. (This, for example, is how we have incorporated Web Services/SOA functionality into OWB since our 9.2 release. PL/SQL is generated for OWB mappings, and all PL/SQL packages can be accessed as web services, therefore OWB mappings can be exposed as web services. Likewise, PL/SQL can consume Web services, OWB generated code is PL/SQL code, therefore OWB can consume web services.)


In this instance, OWB access to PL/SQL enables a much more direct solution, with no third-party component and no need to abandon OWB. We proposed a little PL/SQL code to call the DBMS_LDAP and DBMS_LDAP_UTL packages from PL/SQL to update the LDAP server directly, and incorporate that code into a process flow.


Can we do more over time to package up capabilities like the ones above? Yes, we can-- whenever the market demands it. (Tell us what you want-- you, after all, are the market.)


But the next time you look in your palette and you can't find the right icon to drag out to do exactly what you want, don't give up, reframe the question: "Can the Oracle Database platform do this? Are there PL/SQL packages that can do this?" You won't want to or need to do this often-- but when you choose to, yes, you can.

June 16, 2008

Consuming Web Services from Inside Oracle: SQLJ, JPublisher and the Database Web Services Callout Client

Warehouse Builder depends on infrastructure in the Oracle Database for most of what it does... including calling external Web Services. Briefly, this requires that you install JPublisher in your database, which at this point means that you need to install SQLJ from the Oracle Database Client CD, as well as the Web Services Callout Client.


A lucid and concise description of how you configure an Oracle 10gR2 Database to consume Web services was posted in this Oracle By Example (OBE):


http://www.oracle.com/technology/obe/10gr2_db_single/develop/java/java_otn.htm


It takes you through installing all the required components and then calling an example web service from PL/SQL.


Once you have this working you are ready to start using calls to external web services from OWB (of which I shall write more in a future post).


 

July 22, 2008

Full Outer Joins in OWB

One of the questions I noticed recently was on full outer joins. How do you do that in OWB. Well it is quite simple, but I guess not that obvious. So here is a quick post on how to do this.

Here we have a simple mapping, where I join on code. I want to get all records (yeah not a great business case I know) from both tables, so I know if there are records on either side without matches.

My join looks like this:

JoinClause.JPG

Now if I complete the mapping to look like this, I can generate code.

MappingComplete.JPG

As you see by default (this is OWB 10.2.0.3) you get an ANSI join syntax:

ANSICodeForFullOuter.JPG

Want something not ANSI, go to configuration on the mapping and uncheck the ANSI SQL Syntax code option. Note that if you are doing Multi Config, you can change this per deployment of the mapping...

July 30, 2008

Using a DBLink Location for ETL

One of the things that might be interesting to have a play with is the ability to use a predefined dblink as a substitute for the OWB db links generated when you create two locations pointing to two separate hosts.

There are roughly three ways of doing this in OWB:

1) Configure a mapping and set a dblink and schema name property on each mapping (note is not the recommended way)
2) Supply a database link name on the connector between the target module and the source module
3) Use a DBLink type location for the source

Since the third one is most complex one, but used in the EBS cases in house, I figure, let's start with that one.

First thing to do is create the DB link location itself (it will specify the database link to use and the schema etc.)

Create_dblink_location.JPG

To explain this dialog a little bit more, the From Location is the target location in which the mapping will live, the database link is the name of the db link as found in the target schema. The dblink of course connects to your source schema as a normal db link.

Next you would associate the dblink location you just created with the SOURCE (!) module, as shown here, and then set the configuration for the source module to use this dblink location.

Associate_dblinkloc_withModule.JPG

Note that you cannot use the DBLink location to import metadata.

Configure_source_location_withdblink.JPG

This completes the work on the source and on the dblink location.

Once you have done this, you have associated the target with the source because in the dblink location you specify the from (target) and now by adding this to the source, the to is also known. OWB now generates you a connector on the target using the dblink. Open the connector to see the actual dblink details as specified on the dblink location.

Target_connector_withdblink_loc.JPG

Since this target location is already registered, all info above is grayed out...

Next is the target, where nothing else is really required. You create a mapping from source table to target table and generate code.

Generated_code_with_dblink.JPG

As you can see in line 337 the source schema we specified in dblink location and the dblink are taken into the generated code....

Now that you are using this setup, your credentials to for example production schemas are hidden behind the dblink. A DBA will set up the link, you use it without knowing the details for production data.

I'll see if I can quickly add the #2 option at a later point. It is a bit simpler, in that you on the connector override the database link name with your own. However it allows the source to still use full credentials and therefore potentially gives data access to the wrong people.

BTW, not what I was thinking when I started but you can apply the same to a "remote target". Simply deploy the mapping in what I have as a target here. Create a remote location and a dblink loc for target (e.g. data going to that). Drag that operator into the mapping and now you go DBLink two ways...

Great for writing to other DBs like SQL Server of course...

July 31, 2008

DBLinks the sequel

Did the #2 option today on an 11g database (just to remind me that things are different in 11g).

The simple way of getting OWB to use a dblink is to create your own connector (you may have to delete the one OWB creates by default when creating a mapping!) and specify a dblink on it.

connector_created_underloc.JPG

My dblink is created as orcl@tgt_to_hr. It is important to name the connector the same as the dblink without using the servicename entry. So my connector was named TGT_TO_HR.

Once the link is in the db, and you are creating the connector, place the link name in the Details section. Make sure to use Browse... at least you are sure you have the right link name.

choose_dblink_for_connector.JPG

Once you have done this, the generated code will show the dblink being used and uses the schema name (as we saw earlier). Deploying the code will now use the precreated dblink.

August 7, 2008

Working in an SAP production environment with OWB

SAP is a bit different... it is more structured than most other environments and there are specific requirements on running ABAP programs in production systems. Since this is a common topic, and since we have a very nice solution (if I may say so) for it in OWB, I figured to write it up on blog.

Some background, OWB comes with an SAP connector, and has done so for its entire life. We have invested quite a bit of time, effort and knowledge into this capability. Not just from our development staff, but also leveraging SAP expertise from New Frontiers (www.newfrontiers.com). Active users have worked on the connector to make sure it addresses things like the above described need of managing ABAP code in SAP (more later in detail of course). New Frontiers actually created an entire data warehouse out of the box based on the Oracle stack with OWB as the tool to create all ETL from SAP into Oracle and using OWB to model the entire dimensional model. More on that is here.

So what is the problem? In general, you cannot move some custom ABAP into an SAP production system without the SAP administrator actually moving it. So since OWB deploys into the system, that would not be acceptable as the program gets "injected" from the outside. OWB generated ABAP therefore must play within the rules.

Here is how this works in a real environment:

In the development environment you create your SAP location, and deploy the mappings. The system is open, so you run this directly from OWB. Not a problem. Now to move anything into the QA and Production systems, you cannot use the OWB deploy of a mapping (which will go into the SAP app and throw in a custom document - no SAP admin likes this).

QA / Production:

Once you have deployed the actual program into the DEV environment, you will have the SAP admin transfer (that is an SAP term with an SAP application screen) the program to the QA or Production system. While he/she transfers the program, variables can be changes like the data file location etc. These are specifically set up as variables (I forget the actual SAP name for a variable) in our generated ABAP for this purpose. Now the program is correctly loaded and moved without other modifications into the SAP application.

Some customers now want this executed from within SAP, some do still want this executed from within OWB. In the latter case you create a location pointing to the actual SAP production system, and in that location you tell us to use a function module Z_blabla. This function module is custom and we have example code for this. You also tell us in configuration that you do not want to deploy, but just want to execute the ABAP. Now OWB skips deployment (the actual Report is already there) and just executes the transported ABAP from its new location. The data file is generated, OWB will FTP the file and load it.

August 11, 2008

SAP, OWB and the database

Forgot to add this to the SAP blog post... while we are doing all this stuff on the SAP system, many people wonder if this only works on SAP on Oracle Database.

The beauty of using ABAP and SAP JCO components is that it does not matter! To OWB it is completely irrelevant what source database is used underneath SAP R/3. It all works outside of the database and there is no SQL being used whatsoever. That makes the solution so nice and so comfortable for SAP administrators. It is all in SAP, adhering to SAP rules and using SAP code.

So whether SAP runs on SQL Server, DB2 or Oracle (or anything else), you can use OWB to extract data from SAP.

Now, if SAP runs on Oracle, and not many people know this, you can treat the system as Oracle as well. So you can do what many (or most) other ETL tools do and go to the SQL layer. Not a problem. However I would recommend not to do this, and any ETL tool that uses SQL to access SAP is destined to become a problem. SQL looks at database tables, not pool and cluster tables. These are SAP constructs, not database constructs. Some of the most important tables in SAP are non-transparent (e.g. pool or cluster). OWB approaches these tables using ABAP so it can extract from them, no problem!

August 25, 2008

Sizing an OWB system...

Should I even go here...? Maybe not, but one thing that I thought would be interesting to mention here is an avenue to take when you are struggling with sizing. Yes, this will have loads of disclaimers but might be a better place to start than just out of the blue...

So, lets say we are trying to set up an OWB system, which may be a DW or Data mart. What hardware do we run this on so that we get some half decent performance? Good question. Ask the hardware guy, and you'll get lots of cool stuff that you may not need (or may, but find to expensive). Ask the software guys, and you'll get puzzled looks. Or a pointer back to the hardware guys.

Now what do you do. Well for starters, look at this page: http://www.oracle.com/solutions/business_intelligence/optimized-warehouse-initiative.html

Oracle has addressed the issue with pre-defined configurations that create a BALANCED system. It is not a system that will give you infinitely great performance for the smallest $$$ amount... it is a balanced system. This means you will get a good throughput in a normal data warehouse like situation. It does this balancing because it specifies all components. So not just CPU and memory, but disks, HBAs (connect your CPU to your disky things and is kind of required to actually load data and process it - yet often forgotten).

Back to sizing for OWB, these systems will do a good job because they are balanced, and we would expect them to be a good starting point for any DW system.

Since OWB does the ETL in the database, you would not need any extra hardware to host the ETL server, it will go into this single system. Due to that, sizing is simpler and hardware utilization is better.

For data quality sizing is a bit more complex, but even here, if you look at the basic systems, sizing a profiling application is roughly equal. Look at the total data size, determine complexity (much less users on the profile run, a few doing simple queries after it is profiled) and these configs would work as a good starting point as well.

Looking at that optimized warehouse page, note that most vendors are present, so you can kind of pick the one you like. Now I know the temptation is there to skimp on stuff, but if you start with this, and don't skimp on items, you might just have sized your system about right...

September 8, 2008

Event Based Execution - File Arrival Example

If you want to do truly event based execution based on arrival of a file there is a convenient package supplied with the Oracle database scheduler component that can be used, the package can be downloaded from OTN here.

For example if you have an OWB mapping that uses an external table as a source and you wish to execute the mapping when a file arrives on the filesystem (that the external table is dependent on), then you can do this a number of ways, here we will illustrate how the database scheduler is used to execute the map.

After the package otn_sched_file_watch has been installed from the Oracle scheduler download described earlier we can use it to execute any job we have privilege to. For example given the following mapping;

owb_event1.JPG

which uses the file in STAGING_DIR named geog.csv;

owb_event2.JPG

If we wanted to execute the mapping when geog.csv arrives on the file system then we can create a file watch schedule and then create a job, this is described in the readme file for the file watch package. For an 10gR2 mapping this would be like the following (11g would also include the set_workspace code);

begin
otn_sched_file_watch.create_file_watch_schedule('GEOGFILES','STAGING_DIR','geog.csv',FALSE);
dbms_scheduler.create_job(job_name=>'FA_JOB',
schedule_name=>'GEOGFILES',
job_action=>'declare v varchar2(4000);r varchar2(4000); begin load_geog0.main(v,''false'',r); end;',
job_type=>'PLSQL_BLOCK',
auto_drop=>false,
enabled=>true);
end;
/

The mapping is a PLSQL implemented mapping so we can use the MAIN procedure to execute the mapping (as defined in the job_action parameter), we could also have used the OWB runtime APIs to execute the map or any process flow for example based upon events like this. The action in this case is a PLSQL block as defiend by the job_type parameter. We use the schedule GEOGFILES in the job creation which was defined using the api otn_sched_file_watch.create_file_watch_schedule

The file wait package will poll every 5 minutes for a while, as defined in the documentation the dba can change the setting using s function (otn_sched_file_watch_int.set_file_watch_intv) or each job owner can alter the frequency interval using dbms_scheduler.set_attribute using attribute repeat_interval.
for example
exec dbms_scheduler.set_attribute('OTN_SCHED$_FILE_WATCH_JOB', 'REPEAT_INTERVAL', 'FREQ=MINUTELY;INTERVAL=1');

will change the repeat interval to every minute, a bit too frequent but ok for demos and getting the environment setup. See the DBMS_SCHEDULER documentation for the syntax for this property.

As you can see this is fairly straightforward to leverage and allows you to move data upon events rather than time based schedules. There are also some ftp based routines that are worth checking out in dbms_scheduler, these could be embedded as pre-post mapping triggers for example all leveraging the components of the Oracle database.


October 16, 2008

Data types - using complex data types in ETL

Not sure this is a widely known fact, but one of the changes we made in OWB 10.2 (and up of course) is the extensive support for complex Oracle data types. Areas such as spatial data will benefit from this of course, but now even simple complex (if that exists) data types are a piece of cake.

You can import or define user defined database objects in the data object editor. These user defined database objects now inlude things like Varrays, nested tables and object types. OWB also handles nesting types within other types.

If you are on newer database versions, you will also be able to use things like data and timestamp and other simple new data types. One of the things we have not yet completely done is to update the Long data type support. That is actually going to be on the list (no promise) for the next update of OWB.

Some of the above is covered in this whitepaper. It has much more interesting information around data modeling and support for modeling database features such as partitioning, materialized views etc. Again, many of these areas are going to be updated in upcoming releases of OWB.

November 20, 2008

Using 3rd party schedulers with OWB

We get this question a lot, so it warrants a post on the blog I think. Most people are asking how to schedule OWB jobs with a 3rd party scheduler like Control M, or Autosys or any other enterprise wide scheduler.

The other question; should I now ignore workflow and if I do what does that mean? is also quite interesting and related of course.

Now, lets look at workflow and schedulers first. If you use a process flow in OWB to create a chain of jobs with email notifications etc, you will deploy this process flow into the Oracle Workflow engine. That engine lives within the database and you can call the actual job (so the entire process flow) using the regular OWB execution templates. In other words you don't have to make an either/or choice here. You can deploy to Workflow and still use Control M for example to schedule the job.

You can also choose to not use the process flows and therefore not use Workflow at all. You will then need to build your dependencies in the external scheduling/flow engine. In that case you will refer to the mappings as SQL or PL/SQL jobs (more about that later). Now you have full control over the flow in the external scheduler. You will also use its notifications etc.

The interesting information is now how to call either the process flow, or how to call an actual mapping. If we look at the PL/SQL mappings they are quite simple to call. You can call the actual main of the mapping. A simple way would be to do something like this:

-- Run in the schema where the map is deployed
declare
status varchar2(4000);
begin
>mapping_name_goes_here<.MAIN(status);
end;
/

This little block you can now plug into the scheduler and it is now a job that will run the specific mapping.

Since OWB mappings can be non-PL/SQL objects (e.g. a SQL Loader call or an ABAP mapping for SAP) you cannot call all mappings like the above example.

Therefore OWB ships a script called the sqlplus_exec_template.sql, which is located in the /owb/rtp/sql. The exec template allows you to run any mapping from SQL. In that case from an external scheduler you would call SQL Plus and then run the above SQL script with the appropriate parameter values (open up the script for examples and for the actual parameters required).

The sqlplus_exec_template.sql script also allows you to run a process flow. So if you are using the process flows and have deployed them to workflow, use the sqlplus_exec_template.sql script to run the process flow from the external scheduler.

All in all, there are many ways to run jobs. The above is a brief summary on what options you have when running OWB jobs of all sorts. While you are reading about schedulers, make sure to also look at the Oracle Database Scheduler in 11g. It allows you to run jobs on non-Oracle systems via a nifty small agent. David wrote about this in some earlier posts on this blog.

December 8, 2008

Oracle Warehouse Builder for SAP ETL: OWB SAP Application Connector

One of the less well-understood capabilities of Oracle Warehouse Builder is our SAP Application Connector. Many companies implement SAP ERP systems, and need to extract data from them into Oracle data warehouses. Warehouse Builder enables easy access to the data in these SAP systems through the connector, which is a separately priced option.

The connector has been around for several releases now and is a mature product being used successfully by many OWB customers. However, the public documentation on the connector has been rather sparse and in need of a refresh... We recently updated the material to provide a lot more detail on how the connector works and how to use it. You can get the full details here, in the OWB Connector for SAP documentation. Note that while this documentation refers to OWB 11.1, most of the information in it applies to older versions of the connector as well, such as OWB 10.2.

OWB ETL mappings to extract from SAP are designed within the familiar paradigm used by the rest of OWB. Oracle Warehouse Builder generates code in SAP's ABAP language to perform data extraction in a manner completely in line with SAP's supported mechanisms. Furthermore, the SAP connector allows you to comply with the administrative and security processes of the SAP environment, which should make your SAP administrators happy.

If you have more questions after reading the new documentation, let us know...

January 6, 2009

Process Flow Parameters

Ever wondered how to pass values out of one process flow activity into another? The use of variables and activity parameter bindings is the key. In the example below process flows are used as the activities, the same applies for any activity with parameters.

In the image below, the process flow named MASTER has 2 child flows; CHILD and CHILD_1, an output parameter of the sub-flow CHILD is passed into the input parameter of another sub-flow CHILD_1. To do this, the output parameter must be stored in a local variable DATASTORE of the MASTER flow and this variable is then passed into the next activity as an input parameter.

owb_flow_param0.JPG

Another common query is how to actually add variables and parameters in the first place. This is done by selecting the Variables node or START1 node in the selected objects panel, and clicking on the green plus button, not exactly intuitive, but that's how its done;

owb_flow_param0.JPG

After the variable or parameter has been added, you can select it in the tree and change its properties in the Object Details panel. A couple of bits of info that its not obvious how they operate.

January 19, 2009

OWB Integration with Salesforce.com Data: Sometimes The Old Ways are the Best Ways

A request that we get frequently enough to comment on is, "How do I use OWB to move data between Salesforce.com and my Oracle database?" Data warehouse customers in particular seem to be running into this one.

Usually, the person asking expects that some kind of web service call will be the solution of choice. Perhaps it would be, in some arcane cases. There are also ODBC drivers for Salesforce.com, which OWB could connect to with similar results to other ODBC data sources. (Of course, Oracle support won't help with those third-party components.)


However, bulk data movement is the most common case, and there are tried and true ways of doing that without the latest SOA secret sauce.

The moving parts of the simple-enough solution are:

Continue reading "OWB Integration with Salesforce.com Data: Sometimes The Old Ways are the Best Ways" »

March 13, 2009

Data Integration, Data Quality and Cloud Computing: See our LinkedIn Group

Sometimes this blog is very into nitty-gritty, tactical technical suggestions, but sometimes (like when you’re coming up on the end of the week and tired of your to-do list) it’s useful to raise your head and look a few years out at changes in the industry that will probably eventually impact your day-to-day concerns. I tend to look at cloud computing at times like this and try to see how it will affect data integration and data quality.

Recently I ran across this blog post on cloud computing, BI and DI from the Open Group Cloud Computing summit:

A few observations around BI and cloud computing from the show in attending other talks, and just in the hallways over coffee.

First, cloud computing can't not progress forward without a clear data integration and business intelligence strategy, if you ask those in charge.   The core questions are:  How will my information get back into my enterprise when I need it, and better yet, how can I consider data stuck in the cloud in the context of my BI requirements?

Second, security.   Enough said.   

Finally, cultural issues around leveraging platforms we don't own.

Cloud computing is not evil, indeed it's an opportunity to leverage databases-as-a-service, and even information-as-service, at a price point unheard of.   However, you need to place all of this in the context of a cloud computing strategy that specially addresses data security, BI, MDM, and data integration.   

I would like to know how the OWB user community is thinking about such issues as cloud and SAAS become inescapable parts of our environment. I’d also like to shamelessly promote the LinkedIn group the OWB team started on data integration and data quality offerings from Oracle, and foster discussion there.

So:

  • if you use OWB or just follow Oracle data integration offerings, come join the OWB LinkedIn group. It will put you in contact with hundreds of Oracle and outside people who work with or work on Oracle Warehouse Builder and our other data integration offerings. (There are also job listings and recruiters in the group, which may or may not interest you.)
  • If you’re interested in these topics, come join our discussion over at LinkedIn.

Hope to see you there…

March 20, 2009

Process Flow Variables – To Quote or Not To Quote

Here is some information on the process flow variable literal quote or not quote query. Its not exactly intuitive, so hopefully this helps! The flow designer has to know what is a PLSQL activity and what is not. Process flow variables are added in the same manner as process flow parameters (see the post here for details)


Literal = FALSE

When Literal = FALSE is set then the value entered must be a valid PL/SQL expression which is evaluated at the Control Center e.g.
'Hello World!'
22 / 7


Literal = TRUE

When Literal = TRUE then the value is dependent on the the type of Activity.  If the activity is a PL/SQL object i.e. Mapping or Transformation, then the value is PL/SQL snippet.  The critical difference here is that the value is macro substituted into the call for the object.  The format of the value is identical to that entered as default value is the Mapping editor. e.g.
'Hello World!'
sysdate()

If the activity type is not a PL/SQL object then the value is language independent. e.g.
Hello World
3.1427571

Some illustrations

So what happens when you get it wrong? The following variable definitions and default value

variables0

will give the following errors when the flow is executed:

Error
RPE-01003: An infrastructure condition prevented the request from completing.
PFLOWCON
Error
RPE-01038: Failed to evaluate expression declare "$LOOP_DETECT$" NUMBER := 0;function "PFLOWCON" return VARCHAR2 is "WARNING" NUMBER := 2;"RETURN_RESULT_NUMBER" NUMBER := NULL;"NUMBER_OF_ERRORS" NUMBER := 0;"SUCCESS" NUMBER := 1;"ITEM_TYPE" VARCHAR2(4000) := (owbsys.wb_rt_task_variable.get_value(12137));"ERROR" NUMBER := 3;"EVAL_LOCATION" VARCHAR2(4000) := (owbsys.wb_rt_task_variable.get_value(12140));"OK" NUMBER := 1;"FAILURE" NUMBER := 3;"AUDIT_ID" NUMBER := 12136;"RETURN_CODE" NUMBER := NULL;"NUMBER_OF_WARNINGS" NUMBER := 0;"ITEM_KEY" VARCHAR2(4000) := (owbsys.wb_rt_task_variable.get_value(12139));"RETURN_RESULT" VARCHAR(64) := NULL;"PARENT_AUDIT_ID" NUMBER := NULL;"OK_WITH_WARNINGS" NUMBER := 2;begin "$LOOP_DETECT$" := "$LOOP_DETECT$" + 1;if "$LOOP_DETECT$" > 2 then raise_application_error(-20001, 'Loop detected calling "PFLOWCON"');end if;return to_char(defaultvalue);end;begin :result := "PFLOWCON";end;.  Please modify the expression, redeploy and retry again.
PFLOWCON
Error
ORA-06550: line 1, column 831:
PLS-00201: identifier 'DEFAULTVALUE' must be declared
ORA-06550: line 1, column 816:
PL/SQL: Statement ignored

This error happens when you have a variable defined as a STRING, with literal defined as false, and the value of the variable is not quoted.

So the following two variable definitions are equivalent and valid:

variables1

So watch out for the RPE-01038 exception and check the values and literal definitions.

May 6, 2009

Headless Operation – OWB code generation, going it alone

There are a few different implementation scenarios for moving designs from OWB into an execution environment, out of the box OWB has a runtime service component which supports deployment and execution of all of OWB's design objects. The standard deployment through the control center is the approach the majority take, it is the one we document and include in training etc. This post will illustrate how you can roll your sleeves up and go it alone in a headless mode, using the deploy to file utility (on exchange here). In the OWB designer you can generate code for any object, take the code by saving it to file or copy-pasting and run where ever you want. This utility speeds up this process and saves scripts for a bunch of objects into a directory and provides a convenient top-level wrapper script.

The deploy to file utility uses some OMB scripting commands to generate a directory of SQL scripts and control files for objects in OWB's designer. Not all objects can be deployed in this way, but common ones can;

  • tables/views/materialized views
  • dimensions/sequences
  • procedures/functions/packages
  • mappings (PLSQL/SQLLoader)

The utility will generate the code as a bunch of SQL scripts (and SQLLoader control files) and be installed independently of the OWB product (there is still the need for the repository tables for the runtime etc.). The utility provides the ability to deploy from a collection or an Oracle module. It uses the OMBDEPLOY ....AS SCRIPT TO ... grammar to generate a script that can be executed (deployed) in SQLPlus like any other Oracle SQL script. For SQLLoader mappings a control file is generated and this control file can be used by invoke SQLLoader command line and passing the data input file name on the command line. For execution of the PLSQL mappings use the MAIN function method (not the procedure entry point). Process flows cannot be deployed since the code is XPDL (XML) that is interpreted by the runtime adapter and transformed into Oracle Workflow for example. Executing in this manner will also not give the full audit capabilities of OWB, so there are a bunch of caveats.

Users can use collections to group sets of objects together and use this as a deployment unit. For example to deploy all objects in a collection simply right click on the collection node and select 'Deploy To File', the example below will specifically only generate code for the table and mappings in the collection DEP_TEST;

deploy2file3

Next you will be prompted for the directory to generate the SQL scripts in;

deploy2file4

That's it. The utility will generate all of the SQL scripts and save them to file also generating a master create.sql script (dependent order is not guaranteed). The utility is an expert, so can be altered, enhanced to provide such capabilities.

The utility is also available from the Oracle module node :

deploy2file5

When executed from the Oracle module node you will be prompted to select the types you wish to deploy, so you can decide to only generate tables or mappings or any combination can be selected;

deploy2file6

A handy little utility.

I mentioned that not all of the OWB metadata is included since manual deployment was chosen, so you cannot see the deployment information in the OWB runtime tables, or the executions from the control center. The audit details such as errors and row counts are still performed. For example taking the following query;

  • select task_type,exec_location_uoid,object_name,object_type, number_task_errors,number_task_warnings from all_rt_audit_executions

The task_type, exec_location_uoid are normally populated when executing an object deployed through OWB, this ties all the metadata together that the control center manager and browser use, When you manually deploy the script this is unavailable, audit rows are still recorded so the number of task errors and warnings are still reported. The auditing is still done for PLSQL maps, for SQLLoader maps, scraping of the audit details from the log into the audit tables is done by the control center service, so manual execution will not have such audit info.

There is a SQLDeveloper report for viewing the execution information included in the zip. Import the file owb_sqldev_headless.xml from the 'User Defined Reports' node in the Reports panel of SQLDeveloper. This will create a report 'Mappings (headless)' under 'User Defined Reports/OWB/Execution' folders. The report is a master detail report showing;

  • throughput
  • mapping audit
  • mapping errors
  • throughput chart
  • timing analysis

deploy2file9

This approach may be useful for you when you are happy manually building the system and leveraging the code generation capabilities of OWB. This is definitely for the more savvy OWB user since you will be responsible for the database links/directories for example that are used by the mappings and for orchestration of the mappings etc.

July 27, 2009

Why does it take forever to build ETL processes

Since we are having a question over on the Data Warehouse Blog, I figured to cross post that discussion. It is, as the title says a bit of a chat on why ETL often takes so much time to get right.

The original posting is here. A follow up - based on some comments - is posted here. Quite curious to see what the OWB community thinks about this.

August 10, 2009

Writing Master Detail Files

I've seen this question a few times about how to write a master detail file using OWB, so thought a quick post was in order. Here you'll see a technique for writing such files and using the union all and sorter operator to process the records in order. The OWB flat file operator lets you write a single record at a time, so you have to be creative to use in the multi-record master-detail case.

The example we'll look at is writing a master detail set of records representing the EMP and DEPT tables from the SCOTT schema. So the DEPT records will be our master records, and the EMP records the detail.

  • department_information
  • employee_information
  • employee_information
  • employee_information
  • department_information
  • employee_information
  • etc.

The mapping will perform a UNION ALL on the columns from EMP and DEPT, and then sort the columns by DEPTNO and DEPTNAME in order to group the DEPT and EMP records together, then we will have an expression which will result in a single VARCHAR2 which is then written to the target file.

write_master_detail1

The set operation as mentioned above performs a UNION ALL on the EMP AND DEPT records, the sorter sorts based on DEPTNO and DNAME (so we get the DEPT records interspersed with the EMP records).

The sort operation as mentioned includes both the DEPTNO and DNAME columns;

write_master_detail2

The expression results in a single string with the delimiters concatenated into the string, this is the 'record'. Obviously if you had a much larger record size you would need to chunk the expressions.

write_master_detail3

This results in a file like the following with each department and associated employees immediately following;

write_master_detail4

Hopefully this is useful and may inspire some more ideas and thoughts!

August 12, 2009

Parallel Unload to File

Here we see how to to construct a parallel unload mapping leveraging the Oracle Database parallel table function capabilities to write the results from a PLSQL Ref Cursor to file. The example also illustrates how the table function arguments are constructed within OWB. The format of the file is determined by the user in the mapping by an expression defining the data to be written.

The example is a simple table to file scenario just to illustrate the basic mechanics, but the query/operator set can obviously be more complex. Below we see an example writing the results of a SALES table to a file, the target is a PLSQL table function which has a refcursor parameter; a ref cursor which produces the record to be written to the file (the filename and directory name are embedded in the function definition).

parallel_unload8

The record itself is a string and is a concatenation of the columns, below we see the fields delimited by commas, if you wanted any specific locale expressions or wrapped in quotes for example you can do this here.

parallel_unload2

The table function itself has details of the underlying PLSQL function used and an indicator that it is being used as a target in the mapping (you must set this, its not smart enough to figure it out).

parallel_unload3

Now you see that the table function operator has 2 groups, 1 input group and one output group. There is a property which defines the type for the group and if you have a mixture of ref cursor inputs and scalars then you will need more than one group. So I have one group (INGRP1) for the ref cursor which produces the record to be written, it is defined with a type of ref cursor (so the operators sourcing the attribute in this group will be input via a PLSQL refcursor for use in the function).

parallel_unload9

The output group is simply returning a table of scalars so I made sure the group has that property set.

parallel_unload6

That's about it, so when we generate the code we see a call to the table function ParallelUnload wrapped in a TABLE call, and the parameter passed in; a ref cursor containing the query representing the operators sourcing the attribute (the 'select blah from SALES' statement). Note also the PARALLEL hint which I configured on the SALES table and defined 5 as the number of concurrent servers.

parallel_unload10

Running this will allow parallel unload to file. The function

create or replace type NumSet as table of number;

/

create or replace package RefCur_Unload as
    type Line is record (vc varchar2(32767));
    type MultiLine_cur is ref cursor return Line;
end;

/

create or replace function ParallelUnload
       (r RefCur_Unload.MultiLine_cur) return NumSet
   PIPELINED  PARALLEL_ENABLE (PARTITION r BY ANY) is
   i binary_integer := 0;
   rec2 varchar2(32767);
   out utl_file.file_type;
   filename varchar2(256) := 'dbunload';
   directoryname varchar2(256) := 'MY_DIR';
   vsid varchar2(120);
begin
   select sid into vsid from v$mystat where rownum=1;
   filename := filename || vsid || '.dat';
   out := utl_file.fopen (directoryname, filename , 'w');

   loop
     fetch r into rec2;
     exit when r%notfound;

     utl_file.put_line(out, rec2);
     i := i + 1;

   end loop;

   close r;

   utl_file.fclose(out);
   PIPE ROW(i);
   return ;

end;
/

The function uses the PIPELINED AND PARALLEL_ENABLE keywords to indicate the table function and parallel capabilities, the Oracle Database Data Warehousing Guide has more details on this in the 'Loading and Transformation' chapter if you want to dive into more detail. Also each parallel session of the function’s execution will write a data file (so data file will be named dbunload<sessionid>.dat). There was an earlier post on table functions (here) as source and target operators which is also worth checking out, they are a useful way of adding some custom transformation code into the mapping design, and all leveraging Oracle PLSQL.

August 13, 2009

Pattern Matching Conditions

The Oracle database has a range of powerful regular expression based pattern matching functions that can be used to filter and match data, in OWB for example conditions can be used in many places such as filters and joins. This post illustrates how to incorporate pattern matching conditions into OWB today in mappings you design, the data rules within OWB utilize these routines for when they match data (identification numbers, telephone numbers etc.). As the condition types evolved through the releases of the database as did support in OWB, but there is a little hoop to jump which was pointed out from Peter at Rittman Mead Consulting.

The OWB validation for condition varies, so the expression may work in a filter but might be slightly different in a join clause. The REGEXP_LIKE condition is not handled fantastically in the join condition (putting it politely), so we have to do a wee (Scottish accent coming out there) workaround.

So let's look at a basic example from the manuals, the following query returns the first and last names for those employees with a first name of Steven or Stephen (where first_name begins with Ste and ends with en and in between is either v or ph), simple right:

SELECT first_name, last_name
FROM employees
WHERE REGEXP_LIKE (first_name, '^Ste(v|ph)en$')
ORDER BY first_name, last_name;

This uses the EMPLOYEES table in the HR demo schema, it results in the following;

FIRST_NAME           LAST_NAME
-------------------- -------------------------
Steven               King
Steven               Markle
Stephen              Stiles

Now building the same mapping with the same condition in OWB works fine;

owb_regexp1

We get the map validated fine and code generated, great so far.

owb_regexp2

Now we try using the JOIN operator and have a fabricated example where the regular expressions to be matched are in a table MATCHING_TAB and we will join this with the EMPLOYEES and use the REGEXP_LIKE function to match;

owb_regexp3

When you try generating the code for this map design you will get an error stating the join condition is not valid and that a join condition must be a boolean condition. The OWB code generator is parsing your conditions here and has support for a fixed number of condition types, to use REGEXP_LIKE we have to be a little creative.

owb_regexp4

So you can't just do REGEXP_LIKE(...) =1 or anything, one approach is to use the CASE statement, to test the pattern matching condition and return 1 if matched and 0 otherwise, then we can test for 1=1...phew! Here is a working design that provides OWB with a valid boolean condition;

owb_regexp5

You can see the join condition uses the FIRST_NAME column from EMPLOYEES and the EXPR column holding the regular expressions from the matching tab in the REGEXP_LIKE function. A quick run through a query that others will hopefully find useful.

August 27, 2009

Matching Abbreviations and Acronyms

A quick post on how to match data using composite match rules for matching abbreviations or matching acronyms. For example we want to match the following tokens;

  • International Business Machines
  • Intl Bus Machines
  • IBM

One of these is a full name, one is an abbreviation (Intl Bus Machines) and one is an acronym (IBM). How can we match multiple rows and cherry pick information from matching rows to construct a result without using custom code? Let's have a look and see...

The map below has a source DATA1_ABBREV (its an external table on a simple flat file) which includes the 3 strings we are matching above, we pass in a constant with the value 1 for our binning strategy.

match_merge1

The mapping uses the match merge operator in OWB which lets you define rules for how to bin data, match data and merge it. There are a wide set of rules and you can add custom rules too.

Binning

Binning is basically a divide and conquer approach to speed up the processing. For example, if you are loading one million rows into a table with a million rows then you want to avoid doing a million by a million comparison. Binning allows you to subset this processing as matching is only performed within the record set for that bin.

The smaller the match bin, the faster the processing, however, rows will only be considered within the match bin so you need to make sure your binning does not exclude any possible matches (that you do not want eliminated). You can have match bins based on any fields passed into the match-merge operator and since these can contain any operator (including constants,expressions and transformations) you have lots of flexibility. In the case below a constant is passed in so there is one bin (not the typical case, but we have a small set of rows for a demo).

match_merge2

Another binning example is on country name, suppose the bin is based on the first 2 characters of the country name which would result in say 500 records. This would considerably speed up the processing but would miss records if someone has put SOTLAND instead of SCOTLAND. Setting the match bin to just the first character would be more accurate but would result in more match tests.

The Match Rules

As we mentioned earlier we have full strings, abbreviations and acronyms to match. We can add multiple match rules, below we have a match rule MA_0 which is of type conditional and in the detail table we see it uses the Abbreviation algorithm.

match_merge3

There is also a match rule MA_1 which is also of type condition