X

Welcome to All Things Data Integration: Announcements, Insights, Best Practices, Tips & Tricks, and Trend Related...

Slowly Changing Dimensions (SCD) Type 2 Implementation in Oracle Cloud Infrastructure (OCI) Data Integration

ADITYA DUVURI
Product Manager

Today, we will learn about implementing Slowly Changing Dimensions (SCD) in Oracle Cloud Infrastructure (OCI) Data Integration. A Slowly Changing Dimension (SCD) is a dimension that stores and manages both current and historical data over time in a data warehouse. It is considered and implemented as one of the most critical ETL tasks in tracking the history of dimension records.

OCI Data Integration can be used to define, deploy, and load most types of SCDs. The three broadly used SCD types are:

  • Type 1 SCDs - Overwriting: In a Type 1 SCD the new data overwrites the existing data. Thus the existing data is lost as it is not stored anywhere else. This is the default type of dimension you create. You do not need to specify any additional information to create a Type 1 SCD. In Oracle Cloud Infrastructure Data Integration, this can be achieved with the help of "Integration Strategy" as Merge or Overwrite.
     
  • Type 2 SCDs - Creating another dimension record: A Type 2 SCD retains the full history of values. When the value of a chosen attribute changes, the current record is closed. A new record is created with the changed data values and this new record becomes the current record. Each record contains the effective time and expiration time to identify the time period between which the record was active. In OCI Data Integration, this can be achieved with the help of "Integration Strategy" as Merge.
     
  • Type 3 SCDs - Creating a current value field: A Type 3 SCD stores two versions of values for certain selected level attributes. Each record stores the previous value and the current value of the selected attribute. When the value of any of the selected attributes changes, the current value is stored as the old value and the new value becomes the current value.

In this blog, we are going to walk through a complete end-to-end SCD Type 2 implementation.

Pre-Requisites -

Let's Start the Implementation!

There are several ways to implement SCD Type-2. Here, with the help of full-outer and left-outer joins we will identify the new and old records and use conditional statements such as Case/When to implement an SCD Type-2 process. In this example, using a single data flow, we are going to cater to the full and incremental data load to populate the SCD Type -2 table. Below is the list of tables used in the data flow with the structure of the target table and the corresponding mapping with SCD behavior.

List of source and target tables used -

  • Employees
  • Departments
  • Locations
  • Countries
  • Regions
  • Jobs
  • Emp_Tgt [Target Table]

The tables used in this example are from the HR schema and can be referred from Oracle livesql

Target Table "Emp_Tgt" Mapping -

Target and source table mapping

DDL for EMP_TGT -

CREATE TABLE EMP_TGT
(
 ENAME VARCHAR2(100),
 EMPNO NUMBER(10,0),
 JOB VARCHAR2(50),
 MGR NUMBER(4,0),
 HIREDATE DATE,
 SAL NUMBER(10,2),
 COMM NUMBER(7,2),
 DEPTNO NUMBER(10,0),
 DEPARTMENT_NAME VARCHAR2(200),
 STATE_PROVINCE VARCHAR2(200),
 COUNTRY_NAME VARCHAR2(200),
 REGION_NAME VARCHAR2(200),
 SEQ_NO VARCHAR2(500),
 START_DATE DATE,
 END_DATE DATE,
 ACTIVE_FLAG VARCHAR2(1),
  PRIMARY KEY (SEQ_NO)
);

Below is the data flow created for building a Type 2 slowly changing dimension -

Complete Data Flow for SCD Type-2

With the help of the left outer join and full outer join, we have identified the updated, inserted, and changed records based on the primary key, SCD Type 2 column. Here, the left outer join is used to get only the target data matching with the source along with additional records from the source instead of bringing the complete target data for reprocessing.

Left Outer Join used in the data flow

With the help of full outer join, we will be comparing the incoming records with the target records on basis of SCD type 2 columns. Here, COMMISSION_PCT and SALARY are the SCD type-2 columns in the target. We have used NVL conditions to handle NULL values coming from the source.

Join Condition -
NVL(JOIN_7_1.EMPLOYEES.EMPLOYEE_ID,-99) = NVL(JOIN_7_2.EMP_TGT.EMPNO,-99) 
AND NVL(JOIN_7_1.EMPLOYEES.COMMISSION_PCT,-999) = NVL(JOIN_7_2.EMP_TGT.COMM,-999) 
AND NVL(JOIN_7_1.EMPLOYEES.SALARY,-222) = NVL(JOIN_7_2.EMP_TGT.SAL,-222)

Full Outer join in the data flow

In the expression operator i.e. EXPRESSION_1, we have derived the unique ID (generating unique identifier using UUID()), active flag, effective start date, and effective end date.

  • SEQ_NO:
    CASE WHEN EXPRESSION_1.EMPLOYEES.EMPLOYEE_ID IS NOT NULL AND      EXPRESSION_1.EMP_TGT.EMPNO IS NULL THEN UUID()
    ELSE EXPRESSION_1.EMP_TGT.SEQ_NO
    END
  • ACTIVE_FLAG:
    CASE WHEN EXPRESSION_1.EMPLOYEES.EMPLOYEE_ID IS NOT NULL AND EXPRESSION_1.EMP_TGT.EMPNO IS NULL THEN 'Y' 
    WHEN EXPRESSION_1.EMPLOYEES.EMPLOYEE_ID IS NOT NULL AND EXPRESSION_1.EMP_TGT.EMPNO IS NOT NULL THEN 'Y'
    WHEN EXPRESSION_1.EMPLOYEES.EMPLOYEE_ID IS NULL AND EXPRESSION_1.EMP_TGT.EMPNO IS NOT NULL THEN 'N' 
    ELSE 'N'
    END
  • START_DATE:
    CASE WHEN EXPRESSION_1.EMPLOYEES.EMPLOYEE_ID IS NOT NULL AND EXPRESSION_1.EMP_TGT.EMPNO IS NULL
    THEN CURRENT_DATE
    ELSE EXPRESSION_1.EMP_TGT.START_DATE
    END
  • END_DATE:
    CASE WHEN EXPRESSION_1.EMPLOYEES.EMPLOYEE_ID IS NOT NULL AND EXPRESSION_1.EMP_TGT.EMPNO IS NULL
    THEN TO_DATE('31-12-9999','dd-MM-yyyy')
    WHEN EXPRESSION_1.EMPLOYEES.EMPLOYEE_ID IS NULL AND EXPRESSION_1.EMP_TGT.EMPNO IS NOT NULL
    THEN CURRENT_DATE
    ELSE EXPRESSION_1.EMP_TGT.END_DATE
    END

Expression used for SCD type-2

Once the flow is completed, save the data flow, create an Integration Task, publish to an application, and run the corresponding task for implementing SCD type-2.

Runs executed in OCI DI

The 1st run inserts all the records from the source table into the target assuming that there are no records in the target table. In the 1st run, for all the records, ACTIVE_FLAG = 'Y', START_DATE = <<current date>>, and END_DATE = '31-DEC-9999'.

Results after the 1st run

In the 2nd run, changed records coming from the source are identified with the target and SCD type-2 logic is applied to those records. Below are the screenshots which showcase the changes that happened in the target during 2nd run based on the changes in the source.

Records that got affected after the 1st run

SCD 2 changes in the target after the 2nd run

In the 3rd run - further changes were performed in the source table Employee where salary and commision_pct of certain employees were changed. After a successful run of the Integration Task, new rows got inserted for the changed records.

Successful run after 3rd run

Resultset after the 3rd run

This concludes an overview of implementing SCD Type-2 with Oracle Cloud Infrastructure Data Integration.

Have you seen some of our other blogs?  Check out all the blogs related to Oracle Cloud Infrastructure Data Integration. To learn more, check out some Oracle Cloud Infrastructure Data Integration Tutorials and Documentation.

Be the first to comment

Comments ( 0 )
Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.