Report on ETL Status of a Multiple Datasource Environment
By Jeffrey Mcdaniel-Oracle on Jan 22, 2014
Do you have a multiple-datasource Reporting Database environment? In a scenario where you have multiple datasources and multiple ETL processes running, you may want to keep track of those ETL runs in an automated fashion. In an earlier blog we discussed how to set up your own ETL monitoring system; in this blog we will discuss how to use OBI to monitor your ETL runs.
Currently we have an ETL_PROCESSMASTER table in the STAR schema that tracks every ETL run - the start time, the finish time, and whether it was successful. However, there is no column for datasource id. You can get that information by calling a function in the STAR schema - get_source_db_id.
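If you want to see which datasource id a given STAR schema reports, a quick ad hoc sketch (assuming get_source_db_id takes no arguments and is callable from plain SQL) is:

```sql
-- Sketch: return the datasource id of this STAR schema
select get_source_db_id from dual;
```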
If you would like an OBI report to show this information, here is a way to do it. First, let's talk about how this can be accomplished in OBI. The data in ETL_PROCESSMASTER is not exposed in any of the subject areas, because it is not intended for general user consumption. The ETL_ tables in STAR are for internal tracking and usage during the ETL process. While they can contain useful information, their main purpose is not exposure in the subject areas of P6 Analytics, where the focus is on your schedule and resource data, not the internal workings of the ETL process.
But a report can be created outside of the subject areas by using OBI's ability to query a data source directly. In this type of Analysis you do not use any of the subject areas; instead, you use an OBI direct database request and formulate your own SQL. After the SQL has been written, this Analysis can be treated like any other, and an administrator could use it to oversee the ETL process for all the organization's datasources.
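As a sketch, the SQL you paste into the direct database request can start out very simple (these are the ETL_PROCESSMASTER columns used later in this post), before you add the datasource column:

```sql
-- Sketch: list ETL runs, most recent first
select processid, processstartdate, processenddate, processtype
from etl_processmaster
order by processstartdate desc;
```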
We need to add a new column to the ETL_PROCESSMASTER table; we will call it DSID. You can either run this SQL before running the ETL process:
ALTER TABLE etl_processmaster ADD (dsid number);
Or another way is to add a script into the USER_SCRIPTS folder (located in the \star\scripts directory). All contents of the USER_SCRIPTS folder are executed after the ETL process has completed.
Create a script called addcolumn.sql with the following contents:
declare
v_new_col1 number := 0;
begin
select count(*) into v_new_col1 from user_tab_cols where column_name = 'DSID' and table_name = 'ETL_PROCESSMASTER';
if (v_new_col1 = 0) then
execute immediate 'ALTER TABLE etl_processmaster ADD (DSID number)';
end if;
end;
/
This script checks whether the DSID column exists; if it does not, the column is added, and if it does, the step is skipped. Now you have the new column added. Next we need to populate the datasource id for the most recent ETL run.
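If you want to confirm the column was added, you can check the data dictionary yourself; a minimal sketch:

```sql
-- Sketch: verify the DSID column now exists on ETL_PROCESSMASTER
select column_name, data_type
from user_tab_cols
where table_name = 'ETL_PROCESSMASTER' and column_name = 'DSID';
```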
In the USER_SCRIPTS folder, add another new script called datasource.sql:
set serveroutput on size 100000
-- update datasource id on the most recent ETL run row
declare
v_src_id number;
begin
v_src_id := get_source_db_id;
update etl_processmaster set dsid = v_src_id where processstartdate = (select max(processstartdate) from etl_processmaster);
commit;
end;
/
This script finds the datasource id of this ETL run and updates the new DSID column on the most recent ETL run's row. Now you have all the data in place. Next you just need to write a simple query to extract it, such as:
select processid, processstartdate, processenddate, processtype, DSID from etl_processmaster;
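Once DSID is being populated, you can go a little further in the same direct database request. For example (a sketch, assuming the process date columns are standard Oracle DATEs), you can compute each run's duration and sort the newest runs first:

```sql
-- Sketch: ETL run history per datasource, with duration in minutes
select dsid,
       processid,
       processstartdate,
       processenddate,
       round((processenddate - processstartdate) * 24 * 60) as duration_minutes
from etl_processmaster
order by processstartdate desc;
```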