Did You Know that ODI Automatically Summarizes Data Errors?
By Christophe Dupupet on Oct 09, 2009
This post assumes that you have some level of familiarity with ODI. The concepts of Interface, Flow and Static Control, as well as Knowledge Module are used here assuming that you understand them in the context of ODI. If you need more details on these elements, please refer to the ODI Tutorial for a quick introduction, or to the complete ODI documentation for detailed information..
TABLES GENERATED BY ODI TO IDENTIFY ERRORS
If you take advantage of either Flow Control or Static Control in your interfaces, you know that ODI will automatically trap errors for you as you run your interfaces.
When you select the Controls tab of your interface, where you will decide which Knowledge Module will be used to identify the errors, you have an option to drop the Error table and another one to drop the Check table. Have you ever wondered what these are?
The Error table is the table that will be created by ODI to store all errors trapped by the FLOW_CONTROL and STATIC_CONTROL of your interface. You have probably already used the error table. This table is structured after your target table, along with administrative information needed to re-cycle or re-process the invalid records. It is loaded by ODI with all records that fail to pass the validation of the rules defined on your Target table. This feature is often referred to as a Data Quality Firewall as only the "good" data will make it to the target table.
Once all errors have been identified for a given interface, ODI will summarize them into another table: the Check table. There will be only one such table per data server: all target tables in the server (irrespectively of their schema) will share the same summary table. The name of this table is defined by default by the CKMs as SNP_CHECK_TAB.
LOCATION OF THE CHECK TABLE
You will find the check table in the default work schema of your server. To locate this schema, you have to go back to topology, in the Physical Architecture tab. Expand your data server to list the different physical schemas. One of the schemas is your default schema and will be identified by a checkmark on the schema icon (see SALES_DWH in the example below).
When you edit the schema, it has an associated work schema. The work schema associated to your default schema is your default work schema: ODI_TMP is the following example.
Note that you can change your default schema by selecting/unselecting the default option in the schema definition. But remember that you will always need exactly one default schema for each server.
Now that we know where to find this table, let's look at its structure:
- CATALOG_NAME, SCHEMA_NAME: location of the table that was being loaded (i.e. the target table)
- RESOURCE_NAME, FULL_RES_NAME: name of the table that was being loaded
- ERR_TYPE: type of control that was performed (Flow Control or Static Control)
- ERR_MESS: plain English error message associated with the error
- CHECK_DATE: date and time of the control
- ORIGIN: name of the ODI process that identified the errors
- CONS_NAME: name of the constraint (as defined in the ODI Models) that defines the rule that the record violated
- CONS_TYPE: type of error (duplicate primary key, invalid reference, conditional check failed, Null Value)
- ERR_COUNT: number of records identified by the process that failed to pass that specific control rule.
A sample of the data available in that summary table is show below (we split the content in 2 screenshots to make this more readable - this is one and only one table):
There are many possible uses for this table: decision making in your ODI processes based on the number of errors identified or the type of errors identified, basic reporting on errors trapped by ODI, trend analysis or the evolution of errors over time...
Do not hesitate and share with us how you leverage this table!
Screenshots were taken using version 10.1.3.5 of ODI. Actual icons and graphical representations may vary with other versions of ODI.