How to handle BIAPPS ETL failures due to duplicate data issues in Oracle DW

Author: Amit Kothari

Sometimes there are duplicate rows in the source tables. Because of this, while running an ODI Load Plan (LP) in BI Applications 11.1.1.7.1, the ODI sessions may error out while creating the unique indexes. It is always recommended that the source data issues be fixed first, but that may not always be feasible, or you may just want to fix the data in the warehouse and continue with the LP run. This blog shows one way to do that: first we find the table and the indexed columns for the failed index, and then we substitute them into a delete SQL statement that removes the duplicate rows.

Sample Error:

ODI-1217: Session EXEC_TABLE_MAINT_PROC (12383500) fails with return code 20000.
ODI-1226: Step TABLE_MAINT_PROC fails after 1 attempt(s).
ODI-1232: Procedure TABLE_MAINT_PROC execution fails.
ODI-1228: Task TABLE_MAINT_PROC (Procedure) fails on the target ORACLE connection BIAPPS_DW.
Caused By: java.sql.SQLException: ORA-20000: Error creating Index/Constraint : W_EMPLOYEE_D_U1=> ORA-01452: cannot CREATE UNIQUE INDEX; duplicate keys found
ORA-06512: at line 152

Solution:

From SQL Developer, run the following SQL in your ODI repository schema, using the failed index name from the error above. It returns the table and the indexed columns for the failed index, for example:

SELECT st.table_name, sc.col_name
FROM SNP_TABLE st, SNP_KEY sk, SNP_KEY_COL skc, SNP_COL sc
WHERE sk.key_name = 'W_EMPLOYEE_D_U1'
  AND sk.i_table = st.i_table
  AND skc.i_key = sk.i_key
  AND sc.i_col = skc.i_col
  AND st.i_table = sc.i_table;

Running the above SQL gives the table name W_EMPLOYEE_D and the columns EFFECTIVE_FROM_DT, DATASOURCE_NUM_ID, and INTEGRATION_ID.
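Before deleting anything, it helps to confirm which key values are actually duplicated. A minimal check, assuming the table and index columns identified above:

-- List the key values that occur more than once (columns taken from the failed index above)
SELECT INTEGRATION_ID, DATASOURCE_NUM_ID, EFFECTIVE_FROM_DT, COUNT(*) AS DUP_COUNT
FROM W_EMPLOYEE_D
GROUP BY INTEGRATION_ID, DATASOURCE_NUM_ID, EFFECTIVE_FROM_DT
HAVING COUNT(*) > 1;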

These key columns carry the duplicate values that made the unique index creation fail. Now we can proceed to delete the duplicate rows with a delete SQL statement; make sure you back up the table data before issuing the delete.
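One simple way to take that backup is a CREATE TABLE AS SELECT copy; the backup table name below is only an illustration:

-- Illustrative backup copy before the delete (the _BKP table name is an assumption; use your own naming standard)
CREATE TABLE W_EMPLOYEE_D_BKP AS SELECT * FROM W_EMPLOYEE_D;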

From SQL Developer, run the following SQL in your Oracle target (warehouse) schema and then simply restart the ODI Load Plan. A sample delete SQL based on the above index is shown below; just substitute your own columns and table name.

DELETE FROM W_EMPLOYEE_D A
WHERE ROWID > (SELECT MIN(ROWID)
               FROM W_EMPLOYEE_D B
               WHERE A.INTEGRATION_ID = B.INTEGRATION_ID
                 AND A.DATASOURCE_NUM_ID = B.DATASOURCE_NUM_ID
                 AND A.EFFECTIVE_FROM_DT = B.EFFECTIVE_FROM_DT);
commit;
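Before restarting the Load Plan, you can sanity-check the result: the duplicate-check query above should now return no rows, and if you took the backup copy suggested earlier you can also see how many rows were removed (table names as assumed above):

-- Number of rows removed by the delete (assumes the W_EMPLOYEE_D_BKP copy from above)
SELECT (SELECT COUNT(*) FROM W_EMPLOYEE_D_BKP)
     - (SELECT COUNT(*) FROM W_EMPLOYEE_D) AS ROWS_REMOVED
FROM DUAL;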

Comments:

Hello Amit,

Thank you for providing a quick fix for this problem I've been having. I don't know how it is with other people hitting this, but in our case I think it's not a problem in the source system. We are attached to a PeopleSoft 9.1 HCM environment, and this problem occurs each time a person has more than one effective-dated row. They then have identical rows except for the row_wid and effective_from_dt fields, yet the index is based only on integration_id and datasource_num_id.

EXECUTE IMMEDIATE 'CREATE INDEX RABO_DW.W_HR_PERSON_D_U1 ON RABO_DW.W_HR_PERSON_D (DATASOURCE_NUM_ID,INTEGRATION_ID) NOLOGGING ';

This can never work, can it?

Posted by John on November 13, 2013 at 10:35 PM PST #

If you are getting errors on W_HR_PERSON_D, there might be a data or code issue that we need to investigate. Please send us the number of the SR already logged with Oracle.

Posted by guest on November 15, 2013 at 11:05 AM PST #

There is a more elegant way of handling such issues.

In ODI Studio, open the interface causing the issue, go to the Flow tab, select the target datastore, and in the KM options that appear in the property inspector set FLOW_CONTROL to yes. Save and generate the scenario. On the next run, any records that violate constraints will be set aside in an E$_<tablename> error table, and an overview of the errors can be found in SNP_CHECK_TABLE.

Posted by guest on February 11, 2014 at 07:30 AM PST #

Yes, that is another way to handle errors, but turning on FLOW_CONTROL has a negative performance impact, which is why it is turned off out of the box for BIAPPS interfaces. The steps given above are just a quick solution for unblocking ETL runs during a POC or testing. It is always better to fix the source data. Thanks for your comments.

Posted by Amit on February 12, 2014 at 10:13 AM PST #

It's not a data issue; it looks like a flaw in the design.

Whenever someone has a job entry and a termination in the same year, their person data gets added to the data warehouse twice instead of once. Rather than loading the person data first, it is derived from the job records, with too few fields to make each row unique.

There are more data loading issues popping up and SRs are being filed. We have already had a couple of fixes to the delivered software.

Posted by John on February 13, 2014 at 01:24 AM PST #
