Configuring Custom ETL from Fusion Analytics Warehouse

October 1, 2021 | 12 minute read
Krithika Raghavan
Director, Oracle Analytics

Last Updated: 01 May 2023

Introduction

As an Oracle Fusion Analytics Warehouse (FAW) customer, you might need to design a downstream ETL process that extracts and loads FAW data into another data warehouse or system. This article explains the steps to enable such data extracts from FAW through a simple use case.

 

Prerequisites to Enable Custom ETL

Before enabling custom ETL, you must:

  • Determine when the FAW incremental refresh completed for that day.
  • Determine the changes that happened since the last refresh.
  • Ensure that you have adequate CPUs (and corresponding sessions) configured in ADW to run the custom ETL. For every CPU allocated as part of FAW instance creation, do not exceed 5 concurrent low-service sessions for custom ETL. Exceeding this can impact ETL and reporting performance and hence is not supported. If you need to run more sessions, add CPUs to your ADW instance; each additional CPU added this way allows up to 300 low-service sessions. Avoid medium and high service sessions in all cases, because they block the ETL pipeline. (A sketch for checking your current session count follows this list.)
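As a quick way to monitor this, you can count the sessions currently connected on the low-priority service. This is a minimal sketch, not part of the official steps; it assumes your ADW low service name contains "LOW" and that your user has been granted SELECT access on V$SESSION:

-- Sketch: count concurrent sessions on the LOW consumer-group service.
-- Assumes the ADW service name contains 'LOW' and SELECT access on V$SESSION.
SELECT COUNT(*) AS low_sessions
FROM V$SESSION
WHERE UPPER(service_name) LIKE '%LOW%';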

 

Determine when the FAW Incremental Refresh Completed for that Day

To determine when the FAW incremental refresh completed, check these tables:

 

1. DW_WH_REFRESH_SUMMARY

Holds information about every FAW warehouse refresh and the functional areas that were refreshed as part of it.

The WAREHOUSE_REFRESH_DATETIME column, in UTC, is the time when internal processing completed.

The PROCESS_NAME column can have several values, but only the following are relevant for custom ETL; other values can be ignored for this purpose:

  • WAREHOUSE_REFRESH_ONBOARDING (Full Load)
  • WAREHOUSE_REFRESH_SCHEDULED (Incremental Load)
  • WAREHOUSE_REFRESH_ADHOC (On-Demand Load)

2. DW_WH_REFRESH_DETAILS

Holds the list of the warehouse tables that were refreshed recently. See the FAW documentation for details on the columns.
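For example, a query like the following (a sketch; adapt the filters to your needs) lists the most recent refreshes that matter for custom ETL:

-- Sample: latest refreshes relevant for custom ETL
SELECT PROCESS_NAME, FUNCTIONAL_AREA_CODE, WAREHOUSE_REFRESH_DATETIME
FROM DW_WH_REFRESH_SUMMARY
WHERE PROCESS_NAME IN ('WAREHOUSE_REFRESH_ONBOARDING','WAREHOUSE_REFRESH_SCHEDULED','WAREHOUSE_REFRESH_ADHOC')
ORDER BY WAREHOUSE_REFRESH_DATETIME DESC;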

 

Determine the Changes Since the Last Refresh

To determine the changes since the last refresh, get the last refresh dates by checking these two system date columns, which are present in the FAW tables (see the notes below for exceptions):

  • W$_INSERT_DT: The timestamp when the record was inserted into that table.
  • W$_UPDATE_DT: The timestamp when the record was last updated in that table.

Note the following about these columns:

  1. These aren't the dates and times when the records were inserted or updated in the source; they're the dates and times when the records were inserted or updated in FAW.
  2. To optimize load timings, the FAW pipeline might fully refresh certain tables, or delete and re-insert records instead of updating them. A record that is inserted or updated in an incremental load therefore isn't necessarily a record that was inserted or updated in the source since the last run.
  3. When records are deleted and re-inserted for optimization purposes, the W$_INSERT_DT and W$_UPDATE_DT columns contain the same timestamp.
  4. Certain tables (especially the DFF ones) might not have the W$_INSERT_DT and W$_UPDATE_DT columns included. If the columns aren't included in certain tables, perform a full refresh of those tables. All tables will likely include these two columns in upcoming FAW releases.
  5. An audit of records deleted in the source isn't maintained. A record deleted in the source results in the record being deleted in FAW. You can use the primary keys to compare the records and identify the deletions.
  6. Certain objects are views based on the data warehouse (DW) tables. These objects don't have any W$ columns because they aren't tables but views that join two or more underlying data warehouse tables. A list of such objects is published in the FAW product documentation. These objects don't need to be extracted, because their data is already available in their respective base tables.
  7. The time stored in the W$_INSERT_DT and W$_UPDATE_DT columns corresponds to the time zone of the Autonomous Data Warehouse. Use the "SELECT DBTIMEZONE FROM dual" command to check the time zone.
  8. An entry in DW_WH_REFRESH_DETAILS corresponds to the time when the ETL process for that table ran. If the source had no changes for that day, then you might not see a record in the actual data warehouse table with W$_UPDATE_DT corresponding to that day. You'll still see an entry in DW_WH_REFRESH_DETAILS indicating that the table was refreshed as part of the run (but in this example, there are no changes).
  9. Don't compare these dates across individual tables or against the entries in DW_WH_REFRESH_DETAILS/DW_WH_REFRESH_SUMMARY. Tables won't share the same timestamp, because tables can be loaded at different times depending on available resources. Use these dates solely for custom ETL as detailed below; using them in any other way, for interpretation or analysis, is neither supported nor recommended.
  10. WAREHOUSE_REFRESH_DATETIME in DW_WH_REFRESH_SUMMARY is the UTC time when the data was completely processed internally. W$_INSERT_DT and W$_UPDATE_DT represent the UTC time when the data was published into ADW for that individual table. However, the tables are made available for consumption only after all the tables have been updated, so there can be a gap between when the data in an individual table is updated in ADW and when you can start querying it.
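To make notes 1 and 7 concrete, here is a small sketch that checks the database time zone and then pulls the rows changed in FAW after an illustrative cutoff date (the date literal is only an example):

-- Check the time zone in which W$_INSERT_DT/W$_UPDATE_DT are stored
SELECT DBTIMEZONE FROM dual;

-- Fetch rows inserted or updated in FAW after an illustrative cutoff
SELECT *
FROM DW_LEDGER_D
WHERE W$_UPDATE_DT > TO_DATE('2023-01-01','YYYY-MM-DD');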

Identify the Deletions

To understand how to identify deletions, consider the example of the FAW table DW_LEDGER_D, which is used to populate a downstream custom table, CUSTOM_LEDGER_D. This downstream custom table resides either in your custom data warehouse or in a custom schema in the Oracle Database or Autonomous Data Warehouse. See the FAW Data Model documentation to identify the primary key for a table; per the documentation, the primary key for DW_LEDGER_D is LEDGER_ID. Use the following table to identify the records that have been inserted, updated, and deleted.

Record Availability on PK Columns (such as LEDGER_ID) | Record Present in FAW Table (DW_LEDGER_D) | Record Present in Custom DW Table (CUSTOM_LEDGER_D) | Notes
Insert Record | Y | N | Records to be inserted in the custom data warehouse
Update Record | Y | Y | Records to be updated in the custom data warehouse
Delete Record | N | Y | Records to be deleted from, or marked as inactive in, the custom data warehouse per an end-user requirement
Full Refresh * | Y | N | See the note below

* Tables might undergo a full refresh. You can identify a full refresh when all records in a table have the same W$_UPDATE_DT value, indicating that all records were inserted or updated on that date. In this case, perform a full refresh of those specific tables in the custom data warehouse.
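To illustrate the table above, the following sketch derives each record set with primary-key comparisons. It assumes the CUSTOM_DW.CUSTOM_DW_LEDGER_D table created later in this article already exists:

-- Records to insert: in FAW but not in the custom table
SELECT LEDGER_ID FROM OAX_USER.DW_LEDGER_D
MINUS
SELECT LEDGER_ID FROM CUSTOM_DW.CUSTOM_DW_LEDGER_D;

-- Records to delete (or mark inactive): in the custom table but no longer in FAW
SELECT LEDGER_ID FROM CUSTOM_DW.CUSTOM_DW_LEDGER_D
MINUS
SELECT LEDGER_ID FROM OAX_USER.DW_LEDGER_D;

-- Candidate updates: present in both; filter on W$_UPDATE_DT for rows that actually changed
SELECT f.LEDGER_ID
FROM OAX_USER.DW_LEDGER_D f
JOIN CUSTOM_DW.CUSTOM_DW_LEDGER_D c ON c.LEDGER_ID = f.LEDGER_ID;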


Enable the Custom ETL

The following sections explain the steps to enable a custom ETL through a simple use case: replicating a dimension table (DW_LEDGER_D) and a fact table (DW_GL_BALANCE_F) from the FAW GL functional area into the customer data warehouse.

Create a Custom Schema

To create a custom schema:

1. Prefix the custom tables with CUSTOM_ to differentiate them from the FAW tables. The corresponding tables in the custom data warehouse for this use case are CUSTOM_DW_LEDGER_D and CUSTOM_DW_GL_BALANCE_F.

2. The FAW tables have synonyms created for them in the OAX_USER schema. In this example, the ETL runs as OAX_USER because that schema already has the required SELECT privileges. The CUSTOM tables, however, are stored in the CUSTOM_DW schema.

3. Be aware that the CUSTOM_DW user is only an example in this sample implementation. If you use a different user, you must replace the CUSTOM_DW user and schema with the user and schema name appropriate to your implementation. 

4. Check the required grants and privileges for the user and schema that are equivalent to CUSTOM_DW in your implementation. OAX_USER is the end-user account for accessing FAW objects. If any additional grants are required for your user to read the data, ensure that your DBA or system administrator provides them.

GRANT CONNECT, RESOURCE TO <CUSTOM_DW>; -- Replace <CUSTOM_DW> with the relevant schema/user name

GRANT UNLIMITED TABLESPACE TO <CUSTOM_DW>; -- Replace <CUSTOM_DW> with the relevant schema/user name

 

Run a Full Load

1. Log in as the CUSTOM_DW user and create a table in the custom data warehouse to store the date when the warehouse was last refreshed from FAW:

CREATE TABLE CUSTOM_FAW_ETL_AUDIT (FAW_LAST_REFRESH_DATE DATE);

GRANT ALL ON CUSTOM_FAW_ETL_AUDIT TO OAX_USER;

2. Verify that the functional area GL has been successfully loaded into the warehouse schema. Check the configuration pages to confirm. Alternatively, query the DW_WH_REFRESH_SUMMARY table as shown in this example:

SELECT * FROM DW_WH_REFRESH_SUMMARY WHERE FUNCTIONAL_AREA_CODE='FA_GL' AND PROCESS_NAME='WAREHOUSE_REFRESH_ONBOARDING';

If the module has been onboarded, this sample query returns at least one row.

3. Log in to the OAX_USER schema. Ensure that OAX_USER has been granted privileges to create, update, and delete tables in the CUSTOM_DW schema (log in as ADMIN and run the grant shown below). These grants are required only for the incremental step; they aren't needed for running a full load. For a full load, fetch all the data for the appropriate tables and columns from the FAW schema; the queries below show only the primary key columns and a few other columns for simplicity.
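For reference, the grant mentioned in this step looks like the following when run as ADMIN (a sketch; your DBA might scope privileges differently):

-- Run as ADMIN so that OAX_USER can create tables in the CUSTOM_DW schema
GRANT CREATE ANY TABLE TO OAX_USER;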

4. Create the required tables as the CUSTOM_DW user:

CREATE TABLE CUSTOM_DW.CUSTOM_DW_LEDGER_D AS SELECT LEDGER_ID,LEDGER_NAME,LEDGER_DESCRIPTION FROM OAX_USER.DW_LEDGER_D;

CREATE TABLE CUSTOM_DW.CUSTOM_DW_GL_BALANCE_F AS SELECT FISCAL_PERIOD_NAME, CODE_COMBINATION_ID, LEDGER_ID, TRANSACTION_CURRENCY_CODE, TRANSLATED_FLAG,GL_SEGMENT1,GL_SEGMENT2,TRXN_CRNC_PRD_END_BALANCE FROM OAX_USER.DW_GL_BALANCE_F;

5. Make an entry in the audit table indicating when the tables were refreshed:

INSERT INTO CUSTOM_FAW_ETL_AUDIT(FAW_LAST_REFRESH_DATE) VALUES (SYSDATE);

COMMIT;

6. Log in as the CUSTOM_DW user and grant all privileges on the tables created to OAX_USER:

GRANT ALL ON CUSTOM_DW.CUSTOM_DW_LEDGER_D TO OAX_USER;

GRANT ALL ON CUSTOM_DW.CUSTOM_DW_GL_BALANCE_F TO OAX_USER;

GRANT ALL ON CUSTOM_DW.CUSTOM_FAW_ETL_AUDIT  TO OAX_USER;

Whenever you perform a Target Data Model upgrade in FAW, ensure that you perform a full refresh, because the FAW tables and columns might have changed in the new Target Data Model. The changes are usually backward compatible for the subsequent release.

Run an Incremental Load

1. Verify whether FAW has refreshed since your last custom load. Log in as the OAX_USER and run the following query:

--Verify Functional Area Refreshed

SELECT * FROM DW_WH_REFRESH_SUMMARY

WHERE FUNCTIONAL_AREA_CODE='FA_GL'

AND WAREHOUSE_REFRESH_DATETIME > (SELECT MAX(FAW_LAST_REFRESH_DATE) FROM CUSTOM_DW.CUSTOM_FAW_ETL_AUDIT);

If this sample query returns at least one row, there is newer data; otherwise, there isn't any new data since the last refresh to CUSTOM_DW.

2. Get the primary keys for the tables that you want to refresh. Per the documentation, the primary key for DW_LEDGER_D is LEDGER_ID, and the primary key columns for DW_GL_BALANCE_F are FISCAL_PERIOD_NAME, CODE_COMBINATION_ID, LEDGER_ID, TRANSACTION_CURRENCY_CODE, and TRANSLATED_FLAG.

3. Log in as the OAX_USER. Using the primary keys, prepare the merge statement to refresh incremental data from OAX_USER to CUSTOM_DW using the following sample queries:

--merge into CUSTOM_DW_LEDGER_D  table

MERGE INTO CUSTOM_DW.CUSTOM_DW_LEDGER_D c

USING (SELECT * FROM DW_LEDGER_D WHERE W$_UPDATE_DT > (SELECT max(FAW_LAST_REFRESH_DATE) FROM CUSTOM_DW.CUSTOM_FAW_ETL_AUDIT)) f --get data changed since last refresh

on (f.LEDGER_ID=c.LEDGER_ID) --Formed based on primary key

WHEN MATCHED THEN

    UPDATE SET --Update only non-Primary Key columns   

    c.LEDGER_NAME=f.LEDGER_NAME,

    c.LEDGER_DESCRIPTION=f.LEDGER_DESCRIPTION   

  WHEN NOT MATCHED THEN

    INSERT (LEDGER_ID,LEDGER_NAME,LEDGER_DESCRIPTION) --specify all columns in custom table

    VALUES (f.LEDGER_ID,f.LEDGER_NAME,f.LEDGER_DESCRIPTION);

 

commit;

 

MERGE INTO CUSTOM_DW.CUSTOM_DW_GL_BALANCE_F c

USING (SELECT * FROM DW_GL_BALANCE_F WHERE W$_UPDATE_DT > (SELECT max(FAW_LAST_REFRESH_DATE) FROM CUSTOM_DW.CUSTOM_FAW_ETL_AUDIT)) f --get data changed since last refresh

on (c.FISCAL_PERIOD_NAME=f.FISCAL_PERIOD_NAME

    and c.CODE_COMBINATION_ID=f.CODE_COMBINATION_ID

    and c.LEDGER_ID=f.LEDGER_ID

    and c.TRANSACTION_CURRENCY_CODE=f.TRANSACTION_CURRENCY_CODE

    and c.TRANSLATED_FLAG=f.TRANSLATED_FLAG) --Formed based on primary key columns

WHEN MATCHED THEN

    UPDATE SET --Update only non-Primary Key columns   

    c.GL_SEGMENT1=f.GL_SEGMENT1,

    c.GL_SEGMENT2=f.GL_SEGMENT2,

    c.TRXN_CRNC_PRD_END_BALANCE=f.TRXN_CRNC_PRD_END_BALANCE 

  WHEN NOT MATCHED THEN

    INSERT (FISCAL_PERIOD_NAME,CODE_COMBINATION_ID,LEDGER_ID,TRANSACTION_CURRENCY_CODE,TRANSLATED_FLAG,GL_SEGMENT1,GL_SEGMENT2,TRXN_CRNC_PRD_END_BALANCE)

    VALUES (f.FISCAL_PERIOD_NAME,f.CODE_COMBINATION_ID,f.LEDGER_ID,f.TRANSACTION_CURRENCY_CODE,f.TRANSLATED_FLAG,f.GL_SEGMENT1,f.GL_SEGMENT2,f.TRXN_CRNC_PRD_END_BALANCE);

 

commit;

This performs all insertions and updates from FAW into CUSTOM_DW for these two tables.

4. Optional: Deletions in the source are propagated to FAW. If you want to delete all such records from CUSTOM_DW, run a delete command. This is shown for only one table here, but the same logic applies to any table. Log in as the OAX_USER and run the following sample query. LEDGER_ID happens to be the primary key column for the DW_LEDGER_D table; alter the query to include all the primary key columns of whichever table you perform this operation on:

DELETE FROM CUSTOM_DW.CUSTOM_DW_LEDGER_D c

WHERE c.LEDGER_ID NOT IN (SELECT LEDGER_ID FROM DW_LEDGER_D);

commit;

All the insertions, updates, and deletions from FAW have now been replicated to the CUSTOM_DW tables. If a FAW table has many changes, the MERGE and DELETE statements can take considerable time. In such cases, if performance is a concern, perform a full refresh of those tables alone, as sketched below.
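One possible shape for such a single-table full refresh, run as the OAX_USER, is sketched here. It uses DELETE rather than TRUNCATE so that the object grants from the full-load steps suffice:

-- Sketch: full refresh of one custom table - clear it, reload it from FAW, commit
DELETE FROM CUSTOM_DW.CUSTOM_DW_LEDGER_D;

INSERT INTO CUSTOM_DW.CUSTOM_DW_LEDGER_D (LEDGER_ID,LEDGER_NAME,LEDGER_DESCRIPTION)
SELECT LEDGER_ID,LEDGER_NAME,LEDGER_DESCRIPTION FROM DW_LEDGER_D;

COMMIT;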

5. Update the last refresh date as done during the full load. Make an entry in the audit table indicating when the tables were refreshed:

INSERT INTO CUSTOM_FAW_ETL_AUDIT(FAW_LAST_REFRESH_DATE) VALUES (SYSDATE);

COMMIT;

This completes the incremental refresh.

 

 

Note:

  1. It is important to get the incremental data by looking for data newer than the date in CUSTOM_FAW_ETL_AUDIT, as detailed above. The dates in DW_WH_REFRESH_SUMMARY cannot be used as a substitute for the CUSTOM_FAW_ETL_AUDIT table. Any other mechanism to identify and trigger a custom incremental refresh is not supported. Similarly, using W$_INSERT_DT or W$_UPDATE_DT for purposes other than those listed in this blog is not supported.
  2. Custom ETL or OAC queries that join custom tables with standard OAX DW tables can hold locks on the underlying DW tables. If these run during the scheduled ETL load, they can block and delay the regular loads. The recommended guideline is therefore to trigger custom ETLs (referring to standard DW tables) only after the standard ETL loads complete, and to keep any OAC query (referring to standard DW tables) to no more than a few seconds of run time. A sketch for spotting blocking sessions follows this list.
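If you suspect a custom query is interfering with the pipeline, one way to spot blocking (a sketch, assuming your user has SELECT access on V$SESSION) is:

-- Sessions currently blocked by another session
SELECT sid, serial#, username, blocking_session
FROM V$SESSION
WHERE blocking_session IS NOT NULL;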

     

     

Learn More:

FAW Data Model:

Table Structures

https://docs.oracle.com/en/cloud/saas/analytics/faiae/tables.html#tables

Relationship Diagrams

https://docs.oracle.com/en/cloud/saas/analytics/faiae/relationship-diagram.html#relationship-diagram

Loading Customization Data to Autonomous Data Warehouse

https://docs.oracle.com/en/cloud/saas/analytics/fawag/load-customization-data-autonomous-data-warehouse.html

 

 

Krithika Raghavan

Director, Oracle Analytics

Gunaranjan Vasireddy

Gunaranjan has over 18 years of IT industry experience, with more than 16 years in data warehousing, across services, R&D, and product development organizations. He has been with Oracle since 2011, working throughout on BI Applications product development, and is currently a Director for the Oracle BI Applications product.

Phani Krishna Somayajula

