Main

Metadata Archives

April 25, 2007

Error handling using Data Rules

This has been a topic on the shows we presented, but it is a very powerful feature to use, so here is a little bit on how to use data rules in ETL. Note that data rules are part of the Data Quality option to Warehouse Builder.

To create this solution we identify a number of discrete phases after the initial setup steps which are generic for creating sources and targets in Warehouse Builder:


  1. Create a data rule to document the business rule you want to enforce
  2. Apply the data rule to a specific table (or set of tables is the rule applies on more tables)
  3. Create a mapping with the tables and set the properties for error handling
Create a Data Rule

Data rules are metadata definitions of a business rule you define. You can use some of the pre-defined rules in Warehouse Builder (as-is or customized) or create completely new rules. In this case we will create a new rule.

Steps:

Import the HR schema (you may find that a lot of the data fails the rule, but that is not the point we are making here...)


  1. In the HANDSON project locate the Data Rules node
  2. Right mouse click and choose New...
  3. Name the new module ERROR_HANDLERS
  4. Click OK, this launches the Data Rule Wizard
  5. Click Next in the welcome screen

  6. Call the new rule EMAIL_VERIFICATION
  7. Since we will be verifying the pattern for email addresses, on the Define Rule panel set the following:

    Type: Common Format
    Format: Email Address

    CreateEmailRule:


  8. In the Values section you see the default pattern (in Regular Expressions) applied, since this is a rule that will only recognize 3 letter domains (.com, .org, .net etc.) we will modify the rule Highlight and copy (ctrl-c) the text from the first line. Paste the text into the second line visible (the empty line shows this is extensible)

    RegularExpressionEmail:


  9. In the second line find the +. then locate the {3} behind it, we will change this to recognize 2 letter domains. Change the {3} into {2}

    RegularExpressionModified:


  10. Finish the wizard
Regular expressions are a very interesting language, to learn or read more take a look at the Oracle Regular Expressions Pocket Reference by Jonathan Gennick and Peter Linsley. Nice format, great reference for playing with regexps...

Apply the data rule

The important thing to understand when using data rules is that, first they are applied to a table (this can be a source, but read on for that), only then are they moved into the mapping to produce ETL routines.

Because the rule is applied to the table and because you can choose to remove errors from the main table into an error table (no this is not DML error logging! More on that in a later post, it is coming in 10.2.0.3!), you will need to regenerate the table. The error table DDL is generated only when the main table (after applying the rule) is generated.

Applying the rule to any table is done in the Data Object editor:


  1. Copy the HR.EMPLOYEES table into any target module
  2. Name it STG_EMPS
  3. Open the STG_EMPS table in the editor (double click the table)
  4. Navigate to the Data Rules tab

    ApplyDataRuleDOE:


  5. In the Applied Rules panel, click the Apply Rule button to add a new rule to the table
  6. Click Next on the welcome screen of the wizard
  7. Find the ERROR_HANDLERS module and select the EMAIL_VERIFICATION rule
  8. Click Next
  9. Change the name (of the usage) to ETL_EMAIL_VERIFICATION
  10. Click Next

    ApplyRuleWizardBinding:


  11. In the Binding, find and select EMAIL
  12. Finish the wizard
  13. The rule is now attached (applied) to the STG_EMPS table
If you now generate the STG_EMPS you will see an STG_EMPS_ERR table as well. This table gets deployed when you deploy STG_EMPS (sorry no other way to do this). In configuration you can change the error table name as well if so desired.

Once the STG_EMPS and STG_EMPS_ERR tables are deployed and thus in the database you can start creating a mapping that utilizes the data rule within STG_EMPS.

Data rules in a mapping

Create a new mapping (if you have the table in a mapping already and it was in there before adding the rule, do a synchronize inbound so the mapping gets the latest table definition including the data rule) and drag the STG_EMPS table into it. Make sure you can see the entire operator:

MapTableOperatorwithRule:

Notice how the operator has two groups. The top group is the regular in/out group for the table data and you can map from and to this group. The second group (ERR_GROUP above) can be mapped from (not into) and reflects all the columns that are in the STG_EMPS_ERR table. So there is no need to join these two tables or to import the STG_EMPS_ERR table to retrieve rows from. You can map from both groups in the same mapping and even join them back together (simply think of them as two tables).

Activating a data rule

Once a table with a data rule is in your mapping, you need to choose how you want to use the data rule. For that highlight the operator and find the operator properties:

ActivateDataRuleInMap:

You have three choices:
The default is Ignore. This simply does not act upon the data rule, so it does not influence the generated code or the mapping logic.
MOVE TO ERROR means that the data rule is enforced when running the mapping and that all data rows that fail the rule (or any of the rules if you have multiple) is moved out of the main table STG_EMPS into STG_EMPS_ERR. That failing row does no longer exist in the STG_EMPS table and any selects from it will not see this row.
REPORT means that the error rows are written into the error table, but are NOT removed from the STG_EMPS table and remain in the regular select flow from the main table.

Multiple rules on the same table are possible and can be individually configured. In the case that 2 rules are violated, the error reason is concatenated for the record indicating multiple violations.

ErrorTableProperties:

A second set of properties governs the behavior of the error table. Truncate the error table simply means that the table gets truncated by OWB before loading into it. You would typically do this if you process the error rows in the originating mapping ensuring the information is captured.

Roll up errors gives you a consolidated and aggregated set of rows, ensuring that each row is only given to the extraction query once when selecting from the error table. Note it does not prevent multiple rows to be present, it merely aggregates them to show a single one. If you need all violations as separate records, switch this to No.

Once you have set all properties, you can actually inspect what the pluggable mapping looks like for this operator, and more surprisingly (I think) you can deploy this mapping as is with only a single operator! So it could play the role of validation mapping. In that case you would probably choose Report as strategy.

As a last step, you can create a mapping the extracts both from the source STG_EMPS and from the STG_EMPS_ERR operators in the same mapping.

May 3, 2007

Designing an XML View in OWB

To write about Travis at The Fillmore or generating views from OWB...? Oh just this once I'll do the OWB thang...


Here we will see one way to design a view in OWB. This is applicable to any view but the XML views are an interesting example, so here goes. The mapping editor can be used to build views and materialized views, this is done by having the target operator as a view operator (or materialized view) for example. After the map is fully designed you can create and bind from the view operator in the map - this will generate the view object with the SQL generated from the map definition for the content of the view.


In the example below we are reading XML from a purchase orders table where the XML is stored in an XMLType column and targeting a view, the map extracts line item information (it uses the components generated from the XML expert on OTN)  along with some details of the purchase order:


XMLViewMap:


When the create and bind is selected a new name can be entered. The view definition generated is shown below:


XMLViewDef:


There are certain parts of the XML/SQL that are not possible in OWB (such as the XMLTable function - these have to be embedded in a view), but the rewrite of these boils down to the same code as the XML/SQL extract style functions anyway. The expression/filter style operators allow a free text style expression to be defined so all kinds of stuff can be done including XQuery within the database.


Back to Travis, I mean work...:)

May 30, 2007

Mapping Generation Language in OMB (Retrieve and Alter)

For those of you playing with OMB Plus, here is just and example on how to find generation languages in a mapping. Also figured it may be helpful to know (in advance) the results you may get from these queries (including context changes which are required):

OMB+> OMBDCC
/
OMB+> OMBCC 'SAP_DWH_QUICKSTART'
Context changed.
OMB+> OMBCC 'QS_STG'
Context changed.
OMB+> OMBDCC
ORACLE_MODULE /SAP_DWH_QUICKSTART/QS_STG

OMB+> OMBRETRIEVE MAPPING 'STG_COMPANY_MAP' GET PROPERTIES (GENERATION_LANGUAGE)
ABAP

OMB+> OMBRETRIEVE MAPPING 'STG_BILLING_US' GET PROPERTIES (GENERATION_LANGUAGE)
PLSQL

OMB+> OMBRETRIEVE MAPPING 'LOAD_CUST_FILE' GET PROPERTIES (GENERATION_LANGUAGE)
SQLLOADER

Obviously this way you can retrieve all properties, here are some interesting ones for SQL Loader (btw, most people may not know these exist, in the UI you can find them under the Mapping => Configuration menu):

OMB+> OMBRETRIEVE MAPPING 'LOAD_CUST_FILE' GET PROPERTIES (MULTITHREADING)
USE_DEFAULT
OMB+> OMBRETRIEVE MAPPING 'LOAD_CUST_FILE' GET PROPERTIES (OPERATION_RECOVERABLE)
true
OMB+> OMBRETRIEVE MAPPING 'LOAD_CUST_FILE' GET PROPERTIES (CONTINUE_LOAD)
false

And while we are on the topic, here is how you alter a property:

OMB+> OMBALTER MAPPING 'LOAD_CUST_FILE' SET PROPERTIES (CONTINUE_LOAD) VALUES ('true')
Map LOAD_CUST_FILE altered.
OMB+> OMBRETRIEVE MAPPING 'LOAD_CUST_FILE' GET PROPERTIES (CONTINUE_LOAD)
true

Now if you wonder, how does he know the property names? Rather than looking it up I use the following very complex naming algorithm:

  • Look in the configuration UI (in this case for mappings)
  • Find the property you are looking for, for example Continue Load
  • Make it all upper case and substitute spaces with underscores: CONTINUE_LOAD
  • Mostly this works...
And then write your OMB...



June 5, 2007

Making Multi-Configuration easier in 10.2 OWB

One of the powerful concepts in OWB today is the fact that we allow for multiple configurations (it is part of EE ETL option!). In 10.2 the concept while present (you may have seen the DEFAULT_CONFIGURATION stuff on your status bar) is a bit hidden. This expert (which becomes a menu in the tool and can be placed on the right mouse menus) makes some of the concepts more visible and usable.

For example you can now (with the expert in place) see the configuration values for an object (or object type, lets say table) across the configurations. You can think of Default -> QA env -> Prod Env, and then view the parallel settings for a table in a simple spreadtable.

Lots of goodies in this one, so do take a look at it.

Synchronizing Pluggable Maps

This post illustrates a simple process to determine which maps depend on a pluggable mapping component and to actually synchronize them. The MDL
for the expert is here, feel free to download, use, comment and enhance;) It works for OWB 10gR2 and 11g.

After importing the MDL the following 2 steps are a one-off to setup the expert in the repository. You can add a shortcut to run the expert from a pluggable map (the expert only supports pluggable mappings just now);
Experts Add Remove:

You will see a dialog, click on the box next to SYNCHALL, now the setup is done!

Experts Sync:

Now you can actually run the expert from the main tree as if it was another OWB component;

Experts Synchronize Dependents:


Now the fun starts! You should see a dialog appear with all the maps
that use your component, I have 2 maps that use my pluggable map;

Experts Sync Run:



You can select multiple maps and click OK to synchronize. Be patient, you will get a dialog indicating the expert is complete. If you want to see what goes on behind the scenes you can switch the log window on and the task assistant (both expert properties) and see the OMB that gets executed, in the end there are statements such as the following executed to synchronize dependent mappings;

OMBSYNCHRONIZE PLUGGABLE_MAPPING '/DIM_EXAMPLES/PLUGGA' TO MAPPING '/DIM_EXAMPLES/WH/ANOTHER_USE_OF_PLUG' OPERATOR 'PLUGGA' USE (RECONCILE_STRATEGY 'REPLACE', MATCHING_STRATEGY 'MATCH_BY_NAME')

Feel free to
look under the hood of how this works and hope it is useful.


Multiple Configurations and MDL between systems

As a quick follow up thought on the Multi-config post from today, multi config can be a nice way to ensure moving MDL between systems (lets say dev ->  prod) in a simple way.

First of, everyone should really script these "go to production procedures" so the operations staff can run a single command line file to "get the new version and deploy to the database". All of this is possible and yes we should write it up for all to use, but no time... it will come though, promise!

So if you have 2 configurations, one is the default one, the other is called prdconf, you will have 2 control centers linked to them. You will also create two sets of locations for each of the systems. So this latter part is different from 10.1! In 10.1 you would change runtime connection and reregister the locations, that is something I would definitely change with multi-config.

Those locations can be production locations, and I would completely set them up. I would then use security to restrict access to the locations (to make sure the data viewer cannot be used against production sources from within OWB). Either disallow usage of the metadata object location (right click - properties - security) or by not sharing passwords (preferences - security settings / only available for repos admin users).

Now if someone changes configuration (to prdconf) and would try to run the data viewer  they would either get an error (saying no access) or they would be prompted for the password (which I hope they do not know for the prod system).

Back to multi-config, setting up the configurations to hold the object configurations per system will allow you to export the entire thing and not change any configuration nor location upon arrival in the production system.

So lets say we did all of this, we have:

  • 2 configurations
  • 2 control centers
  • 2 sets of locations attributed to the correct control center
  • Set security as desired
Now you go to export and you choose to export a project (just for the sake of the example). You must check the little box to export related objects! That will bring locations across... You also need to decide up-front whether to use match by UOID or name. Either is ok, but you must use them consistently for ever and ever on this set of systems!

Now you do the export and you get the entire project, its locations and BOTH configurations to come across...

To get the deployment in production to work, you go into the repository (or script it) and you make the prdconf the default configuration for the user that will do the import and deployment in production. That way, once you import the MDL it will be directly in the right configuration. That in turn means it has the correct control center set, and the correct locations. So you will "never" deploy to the wrong place...

Once you got this working, like I said, use experts and scripting to build an install shield like process... Should make life a lot simpler...

June 8, 2007

What tables are used in my view?

Metadata is an interesting thing. Views are also interesting, however they typically break the metadata story as they are code. No one typically knows what exactly is in the view and you struggle therefore understanding the impact of a change in tables to all the views you have. This post is the first as to how to resolve this using Warehouse Builder.

I have a simple example, a view called REFERENCE_EXAMPLE and it (at least in theory) contains a query that has 2 tables used in it. They are EMPLOYEES and CITIES. What I will do is to link them together using the OWB repository and then look at the Lineage and Impact assessments.

In OMB Plus go to the module that has the objects (OMBCC) then type the following:

OMB+> OMBALTER VIEW 'REFERENCE_EXAMPLE' ADD REF TABLE 'EMPLOYEES'
View REFERENCE_EXAMPLE altered.
OMB+> OMBSAVE
Save complete.

OMB+> OMBALTER VIEW 'REFERENCE_EXAMPLE' ADD REF TABLE 'CITIES'
View REFERENCE_EXAMPLE altered.
OMB+> OMBSAVE
Save complete.

Now launch Lineage from the view and you get:

ViewLineageToTables:

All of a sudden you can show graphically what objects are related to views. Now if you incorporate the view into a mapping, you can drill either way and the view is no longer that dreaded black box object. This is a very small piece of code, but it can be very usefull especially when the views are already created and are just imported into OWB. Next post I'll show a nice way of creating views in OWB and getting complete metadata...

Oh, and should you want to go cross module, make sure you either do a full path to the object to be referenced or a relative path, for example:

OMB+> OMBALTER VIEW 'REFERENCE_EXAMPLE' ADD REF TABLE '../DQ_SRC/JOBS'
View REFERENCE_EXAMPLE altered.
OMB+>

June 13, 2007

Unregistering a Location

Just noticed this script, which I think many people might not be aware of. Ok it is not something you would need every day, but in some cases you may find yourself in need of it...

In <OWB_HOME>/owb102/owb/misc you find 3 scripts:

  • unregister_location.sql
  • wb_env_util.plb
  • wb_env_util.pks
The only one you care about is the SQL one... Now use this only (like the script says) when you cannot get to the control center anymore. Say the database got deleted (runtime that is), but your location in the client still says it is registered. Since you cannot go to the control center an unregister, you need to do something. That is where unregister_location.sql comes in...

Before running it do read the script, it nicely tells you which user to use it in (Repository Owner).

Oh and for those of you new to 10.2, you should not declare this the default to unregistering a location... 10.2 has this capability in the Control Center Manager and you do not have to go to the browser... So it is already quite a bit simpler. Remember just use this one when you are really stuck.

June 26, 2007

OWB Model Introspection

There are useful OMB APIs (see OMBDESCRIBE) to introspect the OWB model and query object definitions. They provide a useful insight when building accelerators. I've used them for building a few things including the configuration experts on the exchange (slice and dice object/properties/configurations).

So if you want to see the core properties of an object (for example TABLE) you can perform the following from within OMBPlus (or panel):

OMBDESCRIBE CLASS_DEFINITION 'TABLE' GET  CORE PROPERTY_DEFINITIONS
You can get just configuration properties by executing:
OMBDESCRIBE CLASS_DEFINITION 'TABLE' GET  CONFIGURATION PROPERTY_DEFINITIONS
and if you want all properties just omit CORE/CONFIGURATION.

This goes for all OWB object types.

One of the things I like to have is a reference document for a model which clearly defines the capabilities of stuff in the model. OWB provides an API guide for the OMB grammar but most of the properties are mixed in with the grammar, the formatting of the properties is not friendly and its difficult to find things. I started to play with the OMBDESCRIBE api to see if I could generate a model document for the objects in OWB.

With the report I wanted to indicate the properties for each object type in the OWB model; so for example I could see the properties defined for the mapping itself, or the properties defined for the table operator within the mapping. There are other interesting points such as which properties were configurable (could have different values per configuration), which were standard properties and which were user defined extension properties - these are all things with the regular doc it is hard or impossible to find.

I packaged the generator as an expert then at least people can see the use of the OMBDESCRIBE APIs and also use it to document their extensions to the OWB model (since if you have created user defined objects they will also be documented). The resultant document is very raw and highlights some holes that I hope will get filled over time, but I often use this when I quickly want to see for an object what the property name is.

If you want to have a look at the generated documentation it is here, the expert that generated this is here (import MDL and run Public Experts/OWB_UTILITIES/MODEL from the global explorer).

As you can see from the doc it is very raw, but does at least concisely list all the properties for that particular type in a table(phew). Getting this enriched with all the core associations, descriptions, members for domains, object model etc would be a good step forward.

July 18, 2007

What is new in 11g and 10.2.0.3 - Feature 1 Federation with OWB

Ok, you all watched the launch of database 11g (or did you tune in to the OWB webcast on the same day?), so lets write down what is coming in handy new things. To clarify the title, I will be using mostly 11.1 software to show the features, but they should work exactly the same in 10.2.0.3 unless stated otherwise...

So the first one that has been in OWB for a while, but that was broken as well, is how do you create Views and MViews while still keeping lineage and impact in tact? Well the simple answer is that you use mappings. In essence what we are doing here is working on Federation of data rather than consolidation (e.g. ETL). So we will leave the data in place but make it look like it is in a single place for any query.

In order to really do federation and use OWB there are some other pieces of the puzzle as far as infrastructure go that should be in place. In its briefest form a federated system with its Oracle technology in place could look like this:

FederationSchematic:

Now the cool bit here is that, while you can just do this with the OWB piece, that piece makes this really work as you need it to because you get:

  • Full metadata transparency delivering impact and lineage analysis
  • A graphical way of managing your data flows using the OWB GUI
  • A graphical debugger for your views (which are mappings at the end of the day)
  • Simplified maintenance and better documentation
  • Version management of all objects
  • A one button choice to either Federate or Consolidate without changing any logical design
The piece I want to show here, is how you can use mappings to actually create the views and how easy it is to change modes between Federation and Consolidation. Last we'll quickly look at the metadata story with a quick look at Lineage and change propagation.

A mapping for federation would look like this:

FedMappingOverview:

where the most notable thing is that the target in this case is a non-existing view. I simply added an empty view operator into the mapping and mapped the approriate columns from the joiner over to the view, which creates the attributes in the operator.

Next I do a Create and Bind on the view (right mouse click for the menu option) and I place it in a module. The result is as follows:

FedMappingGenView:

As you can see the entire view is resolved, the code is generated (the select essentially) including any location information (e.g. dblinks, schema references etc). This is where the maintenance becomes a lot simpler, connection informatino, dblinks etc are decoupled from the actual query and can be changed based on the system. Simply reuse a new configuration and regenerate...

Now if you want to change the strategy (for performance, or for other reasons), what you do is go back to the mapping and replace the view with a table operator. Lets say you created the table in OWB (whichever which way you want to really), you simply do a synchronize in the mapping to replace the view with a table and your system is now an ETL rather than a Federated system and after you deploy the mapping and the table you can run your jobs into the consolidated system.

Obviously if you would want to do this manually, you would create the table (as in OWB) and then move the select into a PL/SQL package and run that. Sounds simple enough, but if you have hundreds of these views in place the automation you can get from OWB will make this task so much faster that it cannot be beaten by hand coding.

Last but not least, changes can be made much easier and faster in OWB using the lineage and impact analysis functionality (not new in 10.2.0.3 nor 11g, but very relevant for this particular feature). If a source column has changed, you can propagate that directly to the view definition.

ChangePropagation:

There is no need to synchronize anything as this is all been taken care off (this includes the view definition)!

July 27, 2007

More process flow basics - for loops

Here we create a typical programming loop within the process flow iterating over a number of activities based on a numeric index. The flow created represents the following pseudocode;

  assign_rslt := 0;
  for (int iter_idx:=0; iter_idx < number_iterations; iter_idx++) {
    // my activity in the for loop is simply assigning the for loop variable
    // to another variable, this can be much more complex and involve other activity types
    assign_rslt := iter_idx;
  }

Using OMB this can be created as below, the script will create a process flow module named FLOW_EXAMPLE, a package named FLOWPACK and a flow named FLOW_FOR_LOOP. The flow can be deployed and executed after the flow module has been editing and configured and the location set. It has no dependencies since the content of the flow is very simple to illustrate the activities.

set flow_mod_name FLOW_EXAMPLE
set flow_pack FLOWPACK
set flow_name FLOW_FOR_LOOP
set iter_cnt 20
OMBCC '/$OMB_CURRENT_PROJECT'
catch { OMBCREATE PROCESS_FLOW_MODULE '$flow_mod_name' }
OMBCC '$flow_mod_name'
catch { OMBCREATE PROCESS_FLOW_PACKAGE '$flow_pack' }
OMBCC '$flow_pack'

OMBCREATE PROCESS_FLOW '$flow_name'
OMBALTER PROCESS_FLOW '$flow_name' ADD PARAMETER 'NUMBER_ITERATIONS' SET PROPERTIES (DIRECTION,DATATYPE,VALUE) VALUES ('IN','INTEGER','$iter_cnt')
OMBALTER PROCESS_FLOW '$flow_name' ADD VARIABLE 'ITER_IDX' SET PROPERTIES (DATATYPE,VALUE) VALUES ('INTEGER','0')
OMBALTER PROCESS_FLOW '$flow_name' ADD VARIABLE 'ASSIGN_RSLT' SET PROPERTIES (DATATYPE,VALUE) VALUES ('INTEGER','0')

OMBALTER PROCESS_FLOW '$flow_name' ADD ASSIGN ACTIVITY 'ACTIV_IN_LOOP'
OMBALTER PROCESS_FLOW '$flow_name' ADD FOR_LOOP ACTIVITY 'FOR_LOOP'
OMBALTER PROCESS_FLOW '$flow_name' ADD END_LOOP ACTIVITY 'END_LOOP'

OMBALTER PROCESS_FLOW '$flow_name' ADD TRANSITION 'ST_2_GC' FROM  ACTIVITY 'START1' TO 'FOR_LOOP'
OMBALTER PROCESS_FLOW '$flow_name' ADD TRANSITION 'WH_2_GF' FROM  ACTIVITY 'FOR_LOOP' TO 'ACTIV_IN_LOOP' SET PROPERTIES (TRANSITION_CONDITION) VALUES ('LOOP')
OMBALTER PROCESS_FLOW '$flow_name' ADD TRANSITION 'AS_2_EL' FROM  ACTIVITY 'ACTIV_IN_LOOP' TO 'END_LOOP'
OMBALTER PROCESS_FLOW '$flow_name' ADD TRANSITION 'EL_2_WL' FROM  ACTIVITY 'END_LOOP' TO 'FOR_LOOP'
OMBALTER PROCESS_FLOW '$flow_name' ADD TRANSITION 'WH_2_EN' FROM  ACTIVITY 'FOR_LOOP' TO 'END_SUCCESS' SET PROPERTIES (TRANSITION_CONDITION) VALUES ('EXIT')

OMBALTER PROCESS_FLOW '$flow_name' MODIFY ACTIVITY 'FOR_LOOP' MODIFY PARAMETER 'CONDITION' SET PROPERTIES (VALUE) VALUES ('ITER_IDX < NUMBER_ITERATIONS')
OMBALTER PROCESS_FLOW '$flow_name' MODIFY ACTIVITY 'FOR_LOOP' MODIFY PARAMETER 'INITIAL_VALUE' SET PROPERTIES (VALUE) VALUES ('0')
OMBALTER PROCESS_FLOW '$flow_name' MODIFY ACTIVITY 'FOR_LOOP' MODIFY PARAMETER 'VARIABLE' SET PROPERTIES (BINDING) VALUES ('ITER_IDX')
OMBALTER PROCESS_FLOW '$flow_name' MODIFY ACTIVITY 'FOR_LOOP' MODIFY PARAMETER 'NEXT_VALUE' SET PROPERTIES (VALUE,ISLITERALVALUE) VALUES ('ITER_IDX + 1', 'FALSE')

# Define the assign activity in the loop
OMBALTER PROCESS_FLOW '$flow_name' MODIFY ACTIVITY 'ACTIV_IN_LOOP' MODIFY PARAMETER 'VARIABLE' SET PROPERTIES (BINDING) VALUES ('ASSIGN_RSLT')
OMBALTER PROCESS_FLOW '$flow_name' MODIFY ACTIVITY 'ACTIV_IN_LOOP' MODIFY PARAMETER 'VALUE' SET PROPERTIES (VALUE) VALUES ('ITER_IDX')


There are a few properties such as the TRANSITION_CONDITION (values LOOP and EXIT) that are important. The actual for loop activity has a bunch of parameters that constitute the loop variable, the initial value the increment expression and the exit criteria. Hope this helps.

August 16, 2007

UI Expert Extensions

A number of the experts on the exchange use a series of UI extensions to provide shuttle and data entry controls. They are written in java but project a tcl interface that makes them fairly straightforward to use. You may find them useful or you can use the same technique to construct your own.

The UI extensions include:

  • Capture data input in a table (oracle.owb.jexpert.DataEntryTable)
  • Information selection and ordering (oracle.owb.jexpert.ShuttleObjects)
  • Display info in table, action them (oracle.owb.jexpert.ActionTable)
To utilize invoke the components from OMBPlus or an OWB expert.

The Shuttle

In order to get a list of selections from a list and order those items, you can use the shuttle control. Shuttle from available to selected. The UI labels are configurable, the example below illustrates selecting columns from a table that represents levels (in a dimension to be created). It is a modal dialog.

Invoke this example as;
set selection [java::call oracle.owb.jexpert.ShuttleObjects getSelection "Select dimension levels" "Select columns to identify levels:" "Columns:" "Levels" {"PROD_ID" "PROD_NAME" "PROD_DESC" "CAT_ID" "SUBCAT_ID"}]

JExpert1:

The select items will be returned in the variable 'selected', they will be in the order the user has defined. The above returns;
CAT_ID SUBCAT_ID PROD_ID

Data Entry Component

This dialog allows information to be collected from the user via a table. In the dialog below, dimension attributes plus the type to level attribute (applicability) to column binding is captured in one dialog (used in the create dimension from expert).

The tcl to invoke this dialog is;
set rowheader {"Attributes/Levels" "Usage" "CATEGORY" "SUBCATEGORY" "PRODUCT"}
set thedata {{"DESCRIPTION" "Standard" "CAT_DESC" "SUBCAT_DESCRIPTION" "PRODUCT_DESC"} {"" "Standard" "" "" ""}}
set avail_cols {"" "CAT_ID" "CAT_DESC" "CAT_INFO" "SUBCAT_ID" "SUBCAT_DESCRIPTION" "PRODUCT_ID" "PRODUCT_DESC"}
set selection_model {{} {"Standard" "Long Description" "Short Description"  "Effective Date" "Expiration Date"} }
set selection_model [linsert $selection_model 2 $avail_cols]
set selection_model [linsert $selection_model 3 $avail_cols]
set selection_model [linsert $selection_model 4 $avail_cols]

set data_result [java::call oracle.owb.jexpert.DataEntryTable {getdata java.lang.String java.lang.String tcl.lang.TclObject tcl.lang.TclObject  tcl.lang.TclObject} "Level Attribute-Column Details" "Bind attributes to columns:" $rowheader $thedata $selection_model]

JExpert2:

The dialog returns {DESCRIPTION Standard CAT_DESC SUBCAT_DESCRIPTION PRODUCT_DESC} {{} Standard {} {} {}} into the variable data_result.

This control also supports copy-paste, so results can be copied from and to the table. So for example it is also possible to copy from a series of cells in Excel into this dialog to quickly construct objects.

JExpert3:

Alternatively the construction of the dialog can be like the following where the columns listed in column 1 are all of the columns excluding the level identification columns, and there will be a default stating all columns will be attributes of the leaf (EMP for example).

set rowheader {"Columns/Levels" "Usage" "DEPT" "EMP"}
set thedata {{"GENDER" "Standard" "false" "true"} {"MARITAL_STATUS" "Standard" "false" "true"} }
set avail_cols {"false" "true"}
set selection_model {{} {"Standard" "Long Description" "Short Description"  "Effective Date" "Expiration Date"} }
set selection_model [linsert $selection_model 2 $avail_cols]
set selection_model [linsert $selection_model 3 $avail_cols]
set data_result [java::call oracle.owb.jexpert.DataEntryTable   {getdata java.lang.String java.lang.String tcl.lang.TclObject tcl.lang.TclObject  tcl.lang.TclObject}  "Level Attribute-Column Details" "Columns-Level applicability:" $rowheader $thedata $selection_model]

The result of the dialog is a TclList with the selections;
{GENDER Standard false true} {MARITAL_STATUS Standard false true}

JExpert4:

Action Table

This control is for when you use OWB to get a list of objects (for example results of a search) and display this list and allow a set of actions to be performed on them.

set rowheader {"Name" "Description" "Path" "My Funky Info"}
set displaydata  { {"DATA_0" "Cube for DATA_0" "//OSA_PROJECT/OWBEXP_TGT/DATA_0" ""} {"LOAD_0" "Load Cube Map Sample" "//OSA_PROJECT/OWBEXP_TGT/LOAD_0" ""}}
set idref {{"CUBE" "//OSA_PROJECT/OWBEXP_TGT" "DATA_0"} {"MAPPING" "//OSA_PROJECT/OWBEXP_TGT" "LOAD_0"}}
set buttons {{"Edit" "OMUALTER"} {"Properties" "OMUPROPERTIES"} {"Lineage" "OMULINEAGE"} {"Cancel" java::null}}

java::call oracle.owb.jexpert.ActionTable displayTable "Search Results" "Object matches" $rowheader $idref $displaydata $buttons

JExpert5:

In Summary

These are sample extensions that are supplied via the owb/lib/ext directory in the OWB client and are useful for providing controls for some fairly common UI actions.


The jexpert.jar file is included in the OWB 10.2 release under owb/lib/ext, so you will be up and running. In 11g, it was omitted (a bug) but you can download and copy it to owb/lib/ext from here.




January 18, 2008

Tell Us What You Want: Protecting Private or Sensitive Data with OWB

We know people read this blog, and so we try to use it to disseminate helpful tips and tricks.


But we hope to make this more than a one-way conversation with our customers, current and potential.  So occasionally, we'll post a raw idea here-- something for which we don't necessarily have a complete answer today.


We want to know if these issues are really problems for you, and what you'd like to see done about them.


When designing a report or other output based upon your warehouse, the designer may well not know the origin of the data included in those outputs. Thus the person developing a seemingly innocent report-- say, a list of student grades or housing assignments-- might inadvertently include a column exposing social security numbers in the form of a student identifier. Your organization could suddenly run afoul of some privacy law.


Heterogeneous data sources and targets compound this issue, in that features at the database level may not preserve whatever metadata about sensitive information was available at the source.


There are ways to use Warehouse Builder to mitigate some of these risks. For example:



  • Starting with any columns in the source that contain sensitive data, use impact analysis to identify any objects downstream that are based on the column.

  • If you have metadata for your outputs in OWB, you can trace back the lineage from that end to identify any sensitive fields, and determine whether the values have been adequately obfuscated along the way.

  • Data profiling techniques could be used to determine whether any data "looks like" sensitive data (e.g. matches the built-in data rules for a social security number).

Of course, the more metadata OWB has about your sources, targets and generated outputs, the more effective these mitigation techniques are-- which, conveniently, gives you additional incentives to use Warehouse Builder. :)


But you know more about your problems than we do. Let us know what, if anything, like this keeps you awake at night.


Are we going to build exactly what you ask for? Probably not. (If only life worked that way.) But are we are listening? Absolutely. The more we hear from you, the faster we can evolve OWB to fit your needs.

April 14, 2008

What features do you use and what do you want next?

Catch 22 that is what we sometimes suffer from. How do we know what to build when we don't quite know what you really use day-to-day. Sure we can look at our bugs (not that we have many ;-) ), but would it not be great if you could just tell us what is really important for your project?

So fill out the feature survey and let us know what you use in your projects, and as important here, add your feedback to the enhancement request section at the end and drive the new Warehouse Builder release with your requests...!

Please download the feature survey here and send us the results (press the big nice button).

Thanks in advance for any and all feedback!

May 30, 2008

BIWA Summit call for papers

BIWAlogo:

Oracle BIWA Summit - December
2nd & 3rd

Please pass this
information to customers and partners




Oracle
BIWA Summit 2008 is a forum for business intelligence (BI), warehousing
and analytics professionals to exchange information, experiences and
best practices. With over 75 presentations and workshops and the entire
event focused on BI, Warehousing and Analytics (BIWA), you will get the
knowledge and information critical to be successful in your work. You
will hear experts present novel and interesting use cases of Oracle
Database-centric BIWA topics through keynotes, technical talks, hands
on workshops, discussion panels and more.


The summit will be
held December
2-3, 2008 at the Oracle
Conference Center in Redwood Shores, CA
and
offers outstanding
value for BI professionals who use Oracle technology.  


Call for Presentations & Technical
Workshops


You are invited to submit an abstract for a Presentation (50 minutes)
or hands-on Technical Workshop (110 minutes) to the Oracle Business
Intelligence, Warehousing, and Analytics (BIWA) Summit 2008 at the
Oracle Conference Center in Redwood Shore, CA.  See the Submission
Process at
request
for Presentation Website




Important
Dates


Last
date for submission of abstracts:  
August
15, 2008


Presentation
notification:  
September 30, 2008

BIWA
Summit: 
December 2-3, 2008




LEARN MORE

BIWA
Summit 2008 Website


Request
for Presentation Website

OracleBIWA.org

Watch for Summit Registration details coming soon.



About BIWA: 

BIWA
is the Oracle Business Intelligence, Warehousing, and Analytics Special
Interest Group, part of the Independent Oracle User Group (IOUG). BIWA
is a worldwide association of persons seeking the successful deployment
of Oracle Database-centric business intelligence, data warehousing, and
analytical products, features and Options.

June 12, 2008

Modifying icons for process flow activities

I would guess many people are using user defined activities in process flow. As do we when we do partner work. Wouldn't it be nice if you could change the icons for the user defined activities to better show what is being done in that activity? In some cases it might be, and it is quite simple to do.

First step is to create an iconset. You do this in the Global Explorer under icon sets. Here is an example of the completed dialog.

IconSetScreen:

Next is to actually associate this iconset with the process flow activity. For mappings this can be done within the UI, but for process flow activities this is done via scripting:

OMB+> OMBALTER PROCESS_FLOW 'LOAD_ORCL' MODIFY ACTIVITY 'RUN_SIS_JOB' SET REFERENCE ICONSET 'STARANALYTICS'
Process Flow Activity RUN_SIS_JOB altered.
Process Flow LOAD_ORCL altered.
OMB+>

After this you get a wonderful icon reflecting the actual external job you are calling.

ProcessFlowWithIconReplaced:

If you want to show separate icons on the mappings, here is how you do that. It is on the properties panel and it will change the icon in the design center.

ChangedMapIcons:

Now of course you are wondering what Star Analytics is. Well, this is a company / product that allows you to re-integrate your Essbase data back into the relational data warehouse. It handles the extraction from Essbase, in this case into Oracle base tables. You then run the OWB mappings to load the data into the Oracle data warehouse. Quite cool if you need to integrate for example finance data back into the EDW.

August 7, 2008

OWB Public Views in 11g Release 1

In OWB 11g the repository is a set of workspaces. In the public views this means you need to set the workspace context.This is not an issue from within a map as the workspace context is set prior to invoking the map as the map audit needs this.

However in order to view data inside OWB using the public views it is necessary to have this set and as OWB creates the connection, the only solution is a logon trigger. This also applies if you are going into any system where you do not control the login. In SQL Plus, you simply run the procedure after logging in or do some session stuff...

As the trigger requires a pl/sql procedure that has execution rights on owbsys.wb_workspace_management.set_workspace this
needs to be granted to the user. In addition, the OWB roles need to be enabled.

The workaround for this is to create the following procedure, grant the following rights to the users and then the logon trigger….

Note the logon trigger has to be granted by sysdba.

Package Script

CREATE OR REPLACE procedure wb_initialize_workspace
authid current_user as

/* To return the first workspace owned by a user */
CURSOR ws_user_owner IS
SELECT MIN(workspace_name) KEEP (DENSE_RANK FIRST ORDER BY workspace_id), user_name
FROM owbsys.all_iv_workspace_assignments
where user_name = USER and isworkspaceowner = 1
GROUP BY user_name;

/* To return the first workspace associated with a user */
CURSOR ws_user_user IS
SELECT MIN(workspace_id) KEEP (DENSE_RANK FIRST ORDER BY workspace_id)
FROM owbsys.all_iv_workspace_assignments
where user_name = USER
GROUP BY user_name;

p_id number;

/* To find the owner of a given workspace */
CURSOR ws_owner_user IS
SELECT workspace_name, user_name
FROM owbsys.all_iv_workspace_assignments
where workspace_id = p_id and isworkspaceowner = 1;

p_ws varchar2 (32);
p_user varchar2 (32);

BEGIN

/* First try to find a workspace owned by this user. If found make that the active workspace */
open ws_user_owner;
FETCH ws_user_owner INTO
p_ws,
p_user;

IF not(ws_user_owner%NOTFOUND) THEN
owbsys.wb_workspace_management.set_workspace(p_ws,p_user);
close ws_user_owner;
ELSE

/* If none owned, get the first associated with this user */
open ws_user_user;
FETCH ws_user_user INTO
p_id;
IF not(ws_user_user%NOTFOUND) THEN

/* Now find the owner of this workspace */
close ws_user_user;
open ws_owner_user;
FETCH ws_owner_user INTO
p_ws,
p_user;
IF not(ws_owner_user%NOTFOUND) THEN
owbsys.wb_workspace_management.set_workspace(p_ws,p_user);
close ws_owner_user;
END IF;
END IF;
END IF;

END;
/

Grants script

define user = &1
grant execute on owbsys.wb_workspace_management to &user;
grant execute on owbsys.wb_initialize_workspace to &user;
alter user &user default role all except OWB$CLIENT;

Trigger script

set concat ~
define user = &1
CREATE OR REPLACE TRIGGER &user~_LOGON_TRIGGER
AFTER LOGON ON &user.SCHEMA
BEGIN
owbsys.wb_initialize_workspace;
END;
/

In addition when using PL/SQL to query the views, for any design time views you will need to grant select access to the user from owbsys (as roles don't work from within pl/sql). This is of course outside of the trigger stuff discussed here...

September 3, 2008

SQL for workspace enablement

As a quick follow up on my post on accessing SQL views in 11g OWB, here is a simple one if you are in SQL Dev or SQL Plus:

begin
owbsys.wb_workspace_management.set_workspace('workspacename', 'workspaceowner');
owbsys.start_enable_owb_roles;
end;

Replace the italics for workspacename and workspaceowner with the name of your workspace and the name of the owner (or user of) the workspace.

Now you can do what you need in direct access tools like SQL Plus...

April 3, 2009

User-Contributed Script: Export All Projects in a Repository

Oracle Warehouse Builder user Hanming Tu contrinbuted a Korn Shell script and OMB*Plus script to export all the projects in a OWB repository to MDL. (It should be possible to write a Win32-specific wrapper for use with the OMB*Plus script as well, if you don’t have a korn shell implementation on your Windows box.)

Known limitations:

  • it requires single user mode to connect to the repository
  • it stops the export if users are still in a project

You can download the script here

August 10, 2009

Writing Master Detail Files

I've seen this question a few times about how to write a master detail file using OWB, so thought a quick post was in order. Here you'll see a technique for writing such files and using the union all and sorter operator to process the records in order. The OWB flat file operator lets you write a single record at a time, so you have to be creative to use in the multi-record master-detail case.

The example we'll look at is writing a master detail set of records representing the EMP and DEPT tables from the SCOTT schema. So the DEPT records will be our master records, and the EMP records the detail.

  • department_information
  • employee_information
  • employee_information
  • employee_information
  • department_information
  • employee_information
  • etc.

The mapping will perform a UNION ALL on the columns from EMP and DEPT, and then sort the columns by DEPTNO and DEPTNAME in order to group the DEPT and EMP records together, then we will have an expression which will result in a single VARCHAR2 which is then written to the target file.

write_master_detail1

The set operation as mentioned above performs a UNION ALL on the EMP AND DEPT records, the sorter sorts based on DEPTNO and DNAME (so we get the DEPT records interspersed with the EMP records).

The sort operation as mentioned includes both the DEPTNO and DNAME columns;

write_master_detail2

The expression results in a single string with the delimiters concatenated into the string, this is the 'record'. Obviously if you had a much larger record size you would need to chunk the expressions.

write_master_detail3

This results in a file like the following with each department and associated employees immediately following;

write_master_detail4

Hopefully this is useful and may inspire some more ideas and thoughts!

August 12, 2009

Parallel Unload to File

Here we see how to to construct a parallel unload mapping leveraging the Oracle Database parallel table function capabilities to write the results from a PLSQL Ref Cursor to file. The example also illustrates how the table function arguments are constructed within OWB. The format of the file is determined by the user in the mapping by an expression defining the data to be written.

The example is a simple table to file scenario just to illustrate the basic mechanics, but the query/operator set can obviously be more complex. Below we see an example writing the results of a SALES table to a file, the target is a PLSQL table function which has a refcursor parameter; a ref cursor which produces the record to be written to the file (the filename and directory name are embedded in the function definition).

parallel_unload8

The record itself is a string and is a concatenation of the columns, below we see the fields delimited by commas, if you wanted any specific locale expressions or wrapped in quotes for example you can do this here.

parallel_unload2

The table function itself has details of the underlying PLSQL function used and an indicator that it is being used as a target in the mapping (you must set this, its not smart enough to figure it out).

parallel_unload3

Now you see that the table function operator has 2 groups, 1 input group and one output group. There is a property which defines the type for the group and if you have a mixture of ref cursor inputs and scalars then you will need more than one group. So I have one group (INGRP1) for the ref cursor which produces the record to be written, it is defined with a type of ref cursor (so the operators sourcing the attribute in this group will be input via a PLSQL refcursor for use in the function).

parallel_unload9

The output group is simply returning a table of scalars so I made sure the group has that property set.

parallel_unload6

That's about it, so when we generate the code we see a call to the table function ParallelUnload wrapped in a TABLE call, and the parameter passed in; a ref cursor containing the query representing the operators sourcing the attribute (the 'select blah from SALES' statement). Note also the PARALLEL hint which I configured on the SALES table and defined 5 as the number of concurrent servers.

parallel_unload10

Running this will allow parallel unload to file. The function

create or replace type NumSet as table of number;

/

create or replace package RefCur_Unload as
    type Line is record (vc varchar2(32767));
    type MultiLine_cur is ref cursor return Line;
end;

/

create or replace function ParallelUnload
       (r RefCur_Unload.MultiLine_cur) return NumSet
   PIPELINED  PARALLEL_ENABLE (PARTITION r BY ANY) is
   i binary_integer := 0;
   rec2 varchar2(32767);
   out utl_file.file_type;
   filename varchar2(256) := 'dbunload';
   directoryname varchar2(256) := 'MY_DIR';
   vsid varchar2(120);
begin
   select sid into vsid from v$mystat where rownum=1;
   filename := filename || vsid || '.dat';
   out := utl_file.fopen (directoryname, filename , 'w');

   loop
     fetch r into rec2;
     exit when r%notfound;

     utl_file.put_line(out, rec2);
     i := i + 1;

   end loop;

   close r;

   utl_file.fclose(out);
   PIPE ROW(i);
   return ;

end;
/

The function uses the PIPELINED AND PARALLEL_ENABLE keywords to indicate the table function and parallel capabilities, the Oracle Database Data Warehousing Guide has more details on this in the 'Loading and Transformation' chapter if you want to dive into more detail. Also each parallel session of the function’s execution will write a data file (so data file will be named dbunload<sessionid>.dat). There was an earlier post on table functions (here) as source and target operators which is also worth checking out, they are a useful way of adding some custom transformation code into the mapping design, and all leveraging Oracle PLSQL.

August 27, 2009

Matching Abbreviations and Acronyms

A quick post on how to match data using composite match rules for matching abbreviations or matching acronyms. For example we want to match the following tokens;

  • International Business Machines
  • Intl Bus Machines
  • IBM

One of these is a full name, one is an abbreviation (Intl Bus Machines) and one is an acronym (IBM). How can we match multiple rows and cherry pick information from matching rows to construct a result without using custom code? Let's have a look and see...

The map below has a source DATA1_ABBREV (its an external table on a simple flat file) which includes the 3 strings we are matching above, we pass in a constant with the value 1 for our binning strategy.

match_merge1

The mapping uses the match merge operator in OWB which lets you define rules for how to bin data, match data and merge it. There are a wide set of rules and you can add custom rules too.

Binning

Binning is basically a divide and conquer approach to speed up the processing. For example, if you are loading one million rows into a table with a million rows then you want to avoid doing a million by a million comparison. Binning allows you to subset this processing as matching is only performed within the record set for that bin.

The smaller the match bin, the faster the processing, however, rows will only be considered within the match bin so you need to make sure your binning does not exclude any possible matches (that you do not want eliminated). You can have match bins based on any fields passed into the match-merge operator and since these can contain any operator (including constants,expressions and transformations) you have lots of flexibility. In the case below a constant is passed in so there is one bin (not the typical case, but we have a small set of rows for a demo).

match_merge2

Another binning example is on country name, suppose the bin is based on the first 2 characters of the country name which would result in say 500 records. This would considerably speed up the processing but would miss records if someone has put SOTLAND instead of SCOTLAND. Setting the match bin to just the first character would be more accurate but would result in more match tests.

The Match Rules

As we mentioned earlier we have full strings, abbreviations and acronyms to match. We can add multiple match rules, below we have a match rule MA_0 which is of type conditional and in the detail table we see it uses the Abbreviation algorithm.

match_merge3

There is also a match rule MA_1 which is also of type conditional and in the detail table we see it uses the Acronym algorithm. So records will be matched if any active rule passes.

match_merge4

In our example we have no merge rules (not typical), this will randomly select ANY value for the attribute. We could get it to select the shortest or longest value for example.

match_merge6

Here we have a merge rule that uses the Min Max merge rule type and uses the attribute and picks the 'Longest' value, this will ensure we get 'International Business Machines' and not 'IBM' or an abbreviated version.

So here we've seen a way to build up matching algorithms on data based on prebuilt rules that we can use to built up useful matching functionality when cleaning and consolidating data. For more details check the OWB user's guide for the match merge data quality operator and also other posts such as the one on complex de-duplication. Interested to hear other approaches on this that people utilize.

November 3, 2009

More Developers to Join OWB Blog

David Allan's posts have been a major part of making the OWB Blog a must-read for our users. Soon, though, he will be joined by more members of our development staff, who will be bringing you more how-tos and insights into OWB 11.2 features that help you get the most value out of the tool.

The first of these should be up shortly-- keep an eye out for it.

About Metadata

This page contains an archive of all entries posted to Oracle Warehouse Builder (OWB) Weblog in the Metadata category. They are listed from oldest to newest.

Events is the previous category.

Press is the next category.

Many more can be found on the main index page or by looking through the archives.

Powered by
Movable Type and Oracle