Data Auditor by Example

OWB has a node Data Auditors under Oracle Module in Projects Navigator. What is data auditor and how to use it? I will give an introduction to data auditor and show its usage by examples.

Data auditor is an important tool in ensuring that data quality levels meet business requirements. Data auditor validates data against a set of data rules to determine which records comply and which do not. It gathers statistical metrics on how well the data in a system complies with a rule by auditing and marking how many errors are occurring against the audited table. Data auditors are typically scheduled for regular execution as part of a process flow, to monitor the quality of the data in an operational environment such as a data warehouse or ERP system, either immediately after updates like data loads, or at regular intervals.

How to use data auditor to monitor data quality? Only objects with data rules can be monitored, so the first step is to define data rules according to business requirements and apply them to the objects you want to monitor. The objects can be tables, views, materialized views, and external tables. Secondly create a data auditor containing the objects. You can configure the data auditor and set physical deployment parameters for it as optional, which will be used while running the data auditor. Then deploy and run the data auditor either manually or as part of the process flow. After execution, the data auditor sets several output values, and records that are identified as not complying with the defined data rules contained in the data auditor are written to error tables.

Here is an example.

We have two tables DEPARTMENTS and EMPLOYEES (see pic-1 and pic-2. Click here for DDL and data) imported into OWB. We want to gather statistical metrics on how well data in these two tables satisfies the following requirements:

a. Values of the EMPLOYEES.EMPLOYEE_ID attribute are three-digit numbers.

b. Valid values for EMPLOYEES.JOB_ID are IT_PROG, SA_REP, SH_CLERK, PU_CLERK, and ST_CLERK.

c. EMPLOYEES.EMPLOYEE_ID is related to DEPARTMENTS.MANAGER_ID.

clip_image002

Pic-1 EMPLOYEES

clip_image003

Pic-2 DEPARTMENTS

1. To determine legal data within EMPLOYEES or legal relationships between data in different columns of the two tables, firstly we define data rules based on the three requirements and apply them to tables.

a. The first requirement is about patterns that an attribute is allowed to conform to. We create a Domain Pattern List data rule EMPLOYEE_PATTERN_RULE here. The pattern is defined in the Oracle Database regular expression syntax as ^([0-9]{3})$

rule1

Apply data rule EMPLOYEE_PATTERN_RULE to table EMPLOYEES.

clip_image006

b. For the second requirement we need a data rule that defines a list of values that an attribute is allowed to have. Domain list data rule JOB_LIST_RULE is created for this purpose, with values IT_PROG, SA_REP, SH_CLERK, PU_CLERK, and ST_CLERK.

rule2

Apply data rule JOB_LIST_RULE to table EMPLOYEES.

clip_image009

c. The third one requires the EMPLOYEES.EMPLOYEE_ID has a 1:n relationship with the MANAGER_ID column of the DEPARTMENTS table. Create a referential data rule DEPT_MNGR_REF_RULE for relationship between DEPARTMENTS.MANAGER_ID and EMPLOYEES.EMPLOYEE_ID.

rule3

Apply data rule DEPT_MNGR_REF_RULE to table DEPARTMENTS.

clip_image012

2. After applying data rules to EMPLOYEES and DEPARTMENTS, we can create a data auditor with these two tables. A data auditor can monitor data quality for more than one data object.

da_1

After selecting the objects to audit, the next step is to choose the desired action for records that don't comply with a rule.

Data auditor has thresholds that allow you to create logic based on the fact that too many noncompliant records can divert the process flow into an error or notification stream. You can specify a threshold value for each data rule that the data auditor audits. This value is used to determine if the data in the data object is within the limits that you defined. Based on this threshold, the process can choose actions. In a process flow, you can test this value and branch based upon the result.

There are two kinds of threshold mode:

Percent: The data auditor will set the audit result based on the percentage of records that do not comply with the data rule.

Six Sigma: The data auditor will set the audit result based on the Six Sigma values for the data rules.

For example, you select Percent and set defect threshold value for all rules to 80%. If less than 80% of the data in the tables complies with the data rules, the auditing for the table fails.

In addition to set threshold, user can specify value of Action for each rule for action to be performed if data in the source object does not comply with the data rule. Select Report to ensure that the data rule is audited, or select Ignore if you want the data rule to be ignored.

da_2

3. When a data auditor is created, you need to deploy the data auditor before you can use it to monitor the data in data objects. To deploy a data auditor, right-click the data auditor and select Deploy in Design Center. Then you can run the data auditor, any records that violate the data rules defined on the data objects are written to the error tables.

There are two ways of using data auditors: manually Running Data Auditors or scheduling a Data Auditor to Run.

To run a data auditor manually, right-click the data auditor and select Start in Design Center, or open Control Center Manager, select the data auditor, and select Start from the File menu.

To schedule the execution of a data auditor as part of process flow, firstly create a process flow that contains a Data Auditor Monitor activity that represents the data auditor, then schedule this process flow to run at a predefined time.

clip_image018

In this process flow, the Data Auditor Monitor activity DATA_AUDITOR_DEMO represents the data auditor we created in step2. LOAD_MAP is a mapping that loads data into the table EMPLOYEES and DEPARTMENTS. If the data load is successful, the data auditor DATA_AUDIT_DEMO is run.

4. When executed, the data auditor sets several output values. One of these values is the audit result. Audit results provide information about the extent of data rule violations that occurred while running the data auditor. You can capture audit results and store them for analysis.

job

The above output values contain the following three parameters:

AUDIT_RESULT: Indicates the result of running the data auditor. The possible values for this parameter are as following:

- 0: No data rule violations occurred.

- 1: At least one data rule violation occurred, but no data rule failed to meet the minimum quality threshold as defined in the data auditor.

- 2: At least one data rule failed to meet the minimum quality threshold.

EO_<data_rule_name>: Represents the calculated error quality for the specified data rule. Zero (0) indicates all errors and 100 indicates no errors.

SO_<data_rule_name>: Represents the Six Sigma quality calculated for the specified data rule.

In this example AUDIT_RESULT is 2, which means at least one data rule failed to meet the minimum quality threshold we defined (100 percent) when creating data auditor, they are data rules EMPLOYEE_PATTERN_RULE and JOB_LIST_RULE applied to table EMPLOYEES. Records that do not comply with defined data rules are written into error tables. The default error table name is the object name suffixed by "_ERR". So noncompliant records are inserted into error table EMPLOYEES_ERR and DEPARTMENTS_ERR.

EMPLOYEES_ERR has four records, they are records with EMPLOYEE_ID=1000,1001 which violate data rule EMPLOYEE_PATTERN_RULE, records with JOB_ID=PU_CLIRK, SA_REPP violate JOB_LIST_RULE. No records are written in DEPARTMENTS_ERR since all data comply with data rule DEPT_MNGR_REF_RULE in table DEPARTMENTS.

You can try out the example with the following steps:

1. Run script create_user.sql as sysdba to create database user DA_DEMO and grant appropriate privileges to DA_DEMO.

    clip_image002[1]

    2. Connect to database as DA_DEMO/DA_DEMO and run scripts create_source_tables.sql, create_tables.sql, insert_source_data.sql, and insert_data.sql to create tables and insert data into tables.

      clip_image004[1]

      3. Import MDL file DA_DEMO-11_2_0_1.mdl to your OWB. This MDL file captures the data rules, data auditor and process flow in the example.

        clip_image006[1]

        4. Register database user DA_DEMO as an Oracle Warehouse Builder user, start control center service, and input password for location DA_LOCATION before you can deploy data auditor DATA_AUDITOR_DEMO.

          clip_image008

           

          Extended Reading:

          Oracle® Warehouse Builder Data Modeling, ETL, and Data Quality Guide
          11g Release 2 (11.2)


          Comments:

          Problem when importing DA_DEMO-11_2.mdl following error msg: MDL3009: MDL Release <11.2> is not avalid release. Linux el5, oracle 11.2 simple instalation. Found no solution in internet or metalink. Please help Thanks Nik

          Posted by Nik on March 23, 2010 at 10:28 PM PDT #

          Hi Nik Just tried, it looks like the MDL was reposted on March 25, retry you should be ok. Probably would have been best to post the MDL for OWB 10gR2 release. Cheers David

          Posted by David Allan on April 01, 2010 at 02:57 AM PDT #

          Thanks for trying out the example. The mdl file DA_DEMO-11_2_0_1.mdl I re-uploaded on March 25 was exported from OWB 11.2.0.1.

          Posted by Jinjin Wang on April 01, 2010 at 11:48 AM PDT #

          Hi,

          What should I configure so the Data Auditor writes information in the error table columns ORA_ERR_NUMBER$, ORA_ERR_MESG$ and ORA_ERR_ROWID$ when it detects an invalidation of a data rule?

          Posted by guest on January 09, 2012 at 10:39 PM PST #

          Post a Comment:
          • HTML Syntax: NOT allowed
          About

          ETL, CDC, Real-Time DI and Data Quality for the Oracle Database from the inside.

          Search

          Archives
          « April 2014
          SunMonTueWedThuFriSat
            
          1
          2
          3
          4
          5
          6
          7
          8
          9
          10
          11
          12
          13
          14
          15
          16
          17
          18
          19
          20
          21
          22
          23
          24
          25
          26
          27
          28
          29
          30
             
                 
          Today