Monday Nov 09, 2015

An Implementer’s Guide to External Data Support for Universal Sources

Applies to OTBI-Enterprise Version 2 PB 2


Overview of the External Data Support Feature

Support for On-Premise Oracle AU sources

Support for Non-Oracle sources

Supported Subject Areas – HCM+

Practical Use Cases

Getting Started with External Data – Non-Oracle Sources

Sequence of Steps – High Level

Deep Dive into the Actual Steps

Register Source and Configure Storage Service Container

Build Load Plans

Prepare Data Files and Upload

Prepare Data Files

Upload Data Files to SSC

Run your Load Plans

Managing Deletes in Source System

General Tips

Resolving Domains

Special Cases

Data Security with External Data

Potential Automations

File Upload to Storage Service Container

Overview of the External Data Support Feature

Oracle Business Intelligence Applications traditionally supported packaged data warehouses using pre-built ETL adaptors for on-premise Oracle sources such as E-Business Suite, PeopleSoft, Siebel, JD Edwards and so on. For non-Oracle sources, a template-based Universal Adaptor was supported through flat files (CSV format).

With the inception of Oracle Fusion Cloud and other cloud-based applications, such as Oracle Talent Cloud (Taleo), ETL adaptors for those were also delivered.

With its latest product line, OTBI-Enterprise, Oracle delivers its data warehouse solution on the cloud. However, Version 1 supported Oracle Fusion HCM Cloud as the only source. With Version 2, Oracle Talent Cloud (Taleo) was also added to the support matrix, making it possible to analyze HCM data (from Fusion HCM) and recruiting data (from Taleo) together.

With OTBI-Enterprise's latest patch bundle (2), support is extended to HCM data from two on-premise Oracle sources, namely E-Business Suite Applications 12.2 and PeopleSoft Applications 9.2, using the pre-built ETL adaptors for these sources. On top of that, support is also extended to HCM subject areas coming out of non-Oracle sources via the Universal Adaptors. Taken together, this is what is being termed "External Data Support". With this, you can now upload your on-premise EBS or PSFT data to the on-cloud data warehouse, alongside your other cloud-based sources such as Oracle Fusion HCM Cloud or Oracle Talent Cloud (Taleo), in case you have those.

Support for On-Premise Oracle AU sources

For Oracle sources (like E-Business Suite, or PeopleSoft), see my other blog on Oracle Sources, called “An Implementer’s Guide to External Data Support for Oracle Sources”.

Support for Non-Oracle sources

For non-Oracle sources (which could be practically anything, including non-Oracle cloud applications), there obviously are no pre-built adaptors. However, Oracle BI Applications' standardized data warehouse model makes it possible for you to populate the Universal Staging area tables, and from there the Oracle BI Applications ETL takes over. The Source Independent Load routines read from these Universal Staging area tables and populate the end data warehouse tables, and the Post Load Process then populates the required aggregate tables. Finally, the reporting layer is already attached to the warehouse. In short, the majority of the heavy lifting is already taken care of; you just need to populate the Universal Staging tables. How?

Oracle provides documentation that tells you which staging tables (by subject area) are needed, along with the mapping of those tables to the CSV files. Once you know the list of CSV files to work on, Oracle's documentation provides detailed information about the structure of those files, their data types, column (functional) descriptions, key information and other related details. With all that knowledge, you create the required Universal CSV files and upload them to the Storage Service Container (SSC). OTBI-Enterprise then takes over, loads them into the data warehouse, and connects the dots to the reporting layer.

Here is a quick diagrammatic representation of non-Oracle (Universal Adaptor) External Data support, illustrating how it fits into the rest of the OTBI-Enterprise picture.

Supported Subject Areas – HCM+

The term HCM+ stands for core HCM plus certain additional areas outside of it, such as GL Journals. The "Workforce Effectiveness" subject area uses this, and is supported for External Data. The following shows the entire HCM+ support matrix for Universal sources.

Practical Use Cases

You will most likely have a mix of source systems against which you want your data warehouse to be built. For example, you may have upgraded to Oracle Fusion HCM Cloud for your core HCM data, while your payroll processing is outsourced to a third-party vendor (maybe ADP). You might still be using EBS for your Financials (with an upgrade to Oracle Fusion FSCM in progress). And you use Oracle Talent Cloud (Taleo) for recruiting data. Such a mix is quite normal. With the delivery of External Data Support, this is how you can work out your solution in OTBI-Enterprise and have your data warehouse on the cloud:

Ø Use Oracle Fusion HCM Cloud Adaptor for core HCM, except Payroll subject area.

Ø Use External Data – Universal adaptor approach for Payroll data from third party (ADP). This is the case we are dealing with in this document.

Ø Use External Data – EBS for GL Journals [that’s the only subject area supported with HCM+]. See my other blog on Oracle Sources, called “An Implementer’s Guide to External Data Support for Oracle Sources”.

Ø Use Oracle Talent Cloud (Taleo) Adaptor for recruiting subject area.

There could be many such cases and combinations. As long as you understand what is supported (matrix above) you get to decide which data comes from where.

Getting Started with External Data – Non-Oracle Sources

When it comes to non-Oracle sources, OTBI-Enterprise uses the Universal Adaptor to populate the data warehouse tables. The Universal Adaptor relies on CSV data files being present in the ..\srcFiles directory. The actual directory gets provisioned through setup tasks in BI Applications Configuration Manager (BIACM). The ETL maps to these CSV files and loads the Universal Staging area tables. From there, the regular ETL processes like the source independent load, post load etc. kick in. Note that there is no SDS schema when it comes to the Universal Adaptor; the ETL reads directly from the files.

Populating the UA CSV files is usually the biggest challenge for customers: first, figuring out which files are to be populated and what their names should be; next, understanding the content that goes into each of these CSV files. We will go over these in detail below.

Sequence of Steps – High Level

At a high level, the sequence of steps goes like this:

1. Register Source, configure Storage Service Container (SSC)

2. Pick subject areas (FGs, or Fact Groups) and build Load Plans [Regular and Domains-only]

3. Populate required data files and manifest file (list of files)

4. Upload data files (for Domains-only LP) to SSC

5. Run the Domains-only LP

6. Upload data files (for regular LP) and new manifest file to SSC

7. Run regular LP

8. Repeat steps 6 and 7 for incremental ETL runs

Deep Dive into the Actual Steps

Let’s take a deeper look into what goes in each of these steps.

Register Source and Configure Storage Service Container

First, you need to enable BI Applications Offerings. To do that,

Ø Login to BIACM

Ø Click on Tasks > Source Setups > Manage Business Applications

Ø Select the offerings you prefer

Ø Click Save

The following screenshot shows the Human Resources and Financial Offerings enabled. The Custom Offering always comes by default; for OTBI-Enterprise, the recommendation is to disable it, since there is not much customization you will get to do.

Now we proceed to source registration: register the Universal Adaptor source for External Data. This is how:

Ø Click on Tasks > Source Setups > Define Business Intelligence Applications Instance

Ø Source Systems Tab > Click on Create > Then provide a Source Instance Name and a Data Source Number, as shown in the screenshot below. It is recommended that you keep 900 as the data source number for Universal file sources.

Ø Click on Next. Note that the Source Files folder path gets populated. At the end of the successful registration, BIACM will provision this folder automatically.

Ø Create a new Container (use the drop-down under "Container" and pick "Create Container") if configuring for the first time. Pick any name; the screenshot below shows that the name given is "Container_Universal".

Ø Click on Save. The source system will be registered and the Storage Service Container created.

Build Load Plans

Next, build your load plans. Two load plans are needed: a "Domains-Only Load Plan", which you will run first, and your regular load plan, which you will run on an ongoing basis (the first run goes in Full mode, thereafter in Incremental mode).

Ø Click on Tasks > Load Plan Administration > Manage Load Plans > Add

Give a name to your LP; we call it "UADomains_LP1". Pick the right LP type and select the source instance that you set up while registering your source. Since we do the Domains-Only LP first, the correct LP Type is "Domain-Only Extract and Load (SDE and SIL)".

Ø Click on Next, and then pick the FGs (Fact Groups) as needed. Use the information in the HCM+ support matrix mentioned above in this document. Make sure you pick from what is supported.

Ø Click Generate. Wait for the Domains LP generation to complete.

Now build another Load Plan which you will be using for regular loads, full and incremental.

Ø Click on Tasks > Load Plan Administration > Manage Load Plans > Add

Give a name to your LP; we call it "UAFull_LP1". Pick the right LP type and select the source instance that you set up while registering your source. Since we are building the regular LP now, the LP Type should be "Source Extract and Load (SDE, SIL and PLP)".

Ø Click on Next, and then pick the FGs (Fact Groups) as needed. Use the same FGs as you used while creating the Domains-Only LP.

Ø Click Generate. Wait for the Full LP generation to complete.

This completes the step of building all necessary Load Plans.

Prepare Data Files and Upload

This is the most critical part of the implementation project. Let's break it down into two parts: preparation of the files, and upload to the SSC.

Prepare Data Files

Depending upon the FGs you picked for your LP, the first thing you need to gather is the list of CSV files you will actually need. OTBI-Enterprise documents this quite clearly, by subject area. You can download that from the OTBI-Enterprise Document Library under the section "View Universal Adaptor Reference for HCM Subject Areas". A sneak peek of that file is here:

Once you know the names of the physical CSV files, and also the corresponding logical ODI model store names, the next (and probably the biggest) concern is knowing what goes in them. Fortunately, OTBI-Enterprise provides decent documentation about the entire topic – all that you need to know: for example, what the grain of the data set should be, which staging area table and which final dimension or fact table this data will get into, a high-level summary of the nature of the data, and so on. Next, it provides the shape of the CSV file: column header names, data types and lengths, whether the column is mandatory or not [some actually aren't], whether the column is a domain code [including the list of potential target warehouse conformed domain codes], and lastly the description of the column, through which you get to know what the column is about.

You can download this documentation from the OTBI-Enterprise Document Library under the section "View External Data File Load Specifications". A sneak peek at that document is provided here for reference purposes.

This should be enough information for you to get started. OTBI-Enterprise even provides sample CSV files (with no data) so that the structure of each CSV file is already there; you just need to add data to it. There are a whole bunch of rules about the data that you are going to provide. Here is the list you should keep in mind (a short sketch after this list illustrates the naming and formatting rules):

Ø The first four lines are reserved for header information. You can put whatever you want here; OTBI-Enterprise treats these four lines as non-data and skips them.

Ø The fifth line is the column header line. This should match the sample CSVs OTBI-Enterprise provides; if you are using one of their sample files instead of creating your own, you are OK here. This is also treated as a non-data row, and OTBI-Enterprise will skip it.

Ø Data starts from the sixth row.

Ø File name should be of the format file_<table name>-timestamp.csv. For example, file_absence_event_fs-20140916_004935.csv, where timestamp format is YearMonthDay_24HourMinuteSecond [20140916_004935]

Ø Only raw CSVs are accepted. Compressed (zipped) files aren't allowed.

Ø The maximum size limit of a file is 5 GB.

Ø Ensure you structure your CSV file with the columns in the order specified in the OTBI-E documentation, or start from the OTBI-Enterprise provided sample CSVs. When there is no data for a column, leave an empty placeholder. For example, if the format specifies First Name, Middle Name, Last Name, with Middle Name not required, you must still submit the CSV file data as "Mary","","Washington".

Ø Since a comma can itself be part of your data, all character data must be enclosed within double quotes, assuming that is your enclosing character.

Ø Non-character columns must not be encapsulated within double-quotes.

Ø Remove any new line characters such as \r or \n.

Ø Date data must be defined as per OTBI-Enterprise documentation. It is usually of string type in the format: YYYYMMDDHH24MISS. Don’t encapsulate this within double-quotes.
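Here is that sketch – minimal Java, not an OTBI-Enterprise utility. The staging table name, the columns, and the data values are hypothetical; only the layout matters: four reserved header lines, the column header on line five, character data in double quotes, numbers and dates unquoted, and the file name built as file_<table name>-timestamp.csv.

import java.io.PrintWriter;
import java.text.SimpleDateFormat;
import java.util.Date;

public class SampleCsvWriter {
    public static void main(String[] args) throws Exception {
        // Hypothetical staging table; take the real list from the OTBI-Enterprise documentation.
        String table = "absence_event_fs";
        String ts = new SimpleDateFormat("yyyyMMdd_HHmmss").format(new Date());
        String fileName = "file_" + table + "-" + ts + ".csv";   // file_<table name>-timestamp.csv

        try (PrintWriter out = new PrintWriter(fileName, "UTF-8")) {
            // Lines 1-4 are reserved for header information and are skipped by OTBI-Enterprise.
            for (int i = 1; i <= 4; i++) {
                out.println("# header line " + i);
            }
            // Line 5: the column header line, matching the sample CSV (columns here are hypothetical).
            out.println("INTEGRATION_ID,DATASOURCE_NUM_ID,FIRST_NAME,MIDDLE_NAME,LAST_NAME,HIRE_DT,DELETE_FLG");
            // Data starts on line 6: character data in double quotes, numbers and dates unquoted,
            // and an empty placeholder ("") for the optional Middle Name.
            out.println("\"EMP~1001\",900,\"Mary\",\"\",\"Washington\",20140916004935,\"N\"");
        }
        System.out.println("Wrote " + fileName);
    }
}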

So that's about your data files; you will populate a set of such CSV data files. In addition, OTBI-Enterprise External Data support for the Universal Adaptor requires another file, called the "Manifest" file. This is essentially an index file, containing the names of all the data CSV files to be consumed in a given lot. The requirements on the manifest file are as follows (a small sketch after this list shows one way to generate it):

Ø The MANIFEST should be the last file to be uploaded, after all the data files. The download program looks at the manifest to determine which files ought to be downloaded. After successful download and processing of the CSV data files, the MANIFEST file gets renamed so that it is NOT picked up again, and the data CSV files are marked for expiry (purged when they reach the stipulated number of expiry days).

Ø The MANIFEST file naming format is "MANIFEST-TIMESTAMP.MF", where the timestamp format is YearMonthDay_24HourMinuteSecond [20150417_035459]. It is mandatory to have the timestamp in the MANIFEST file name, as identical file names would be overwritten in the Cloud Storage Service. For example, MANIFEST-20150417_035459.MF.

Ø The MANIFEST file should NOT be compressed/zipped.
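And here is the manifest sketch. The MANIFEST-<timestamp>.MF name format comes from the rules above; writing one data CSV file name per line is my assumption of the simplest layout, so verify the exact format against OTBI-Enterprise's sample manifest before relying on it.

import java.io.PrintWriter;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.text.SimpleDateFormat;
import java.util.Date;

public class SampleManifestWriter {
    public static void main(String[] args) throws Exception {
        String ts = new SimpleDateFormat("yyyyMMdd_HHmmss").format(new Date());
        String manifestName = "MANIFEST-" + ts + ".MF";     // e.g. MANIFEST-20150417_035459.MF
        Path uploadDir = Paths.get("C:/temp/upload");        // folder holding this batch of data CSVs
        try (PrintWriter out = new PrintWriter(manifestName, "UTF-8");
             DirectoryStream<Path> files = Files.newDirectoryStream(uploadDir, "file_*.csv")) {
            // Assumed layout: one data CSV file name per line.
            for (Path f : files) {
                out.println(f.getFileName());
            }
        }
        System.out.println("Wrote " + manifestName);
    }
}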

A sample data CSV file and a sample manifest file are shown in the following screenshot.

Finally, you have all your CSV data files ready and populated, along with the manifest file for the first batch. You now upload these to the SSC; we discuss that next. For now, put all your data CSV files and the MF file into a particular folder, say "C:/temp/upload". All files from here will go into the SSC.

Upload Data Files to SSC

OTBI-Enterprise actually leaves it up to the customer how to upload the files to the Storage Service Containers; the assumption is that customers can get a bit technical and prepare their own script to upload the files. There are quite a few ways to do that, and utilities are available as well. One of them is CyberDuck, which has a module for Oracle Storage Cloud Service.

I am going to discuss a specific programming option using Java (see the Potential Automations section). This isn’t supported by OTBI-Enterprise, but hey, it works! You are free to adopt this program, change it to your needs, or come up with your own.

Assuming that you’ve done it, this concludes your data preparation and upload step.

Run your Load Plans

First, you are required to run the Domains-Only Load Plan.

Ø Click on Tasks > Load Plan Administration > Manage Load Plans. Then pick the Domains-Only LP called "UADomains_LP1" and click on Execute. Check the execution status and wait for it to finish.

Once this finishes, you need to re-upload the dated CSV files needed for the regular LP, along with a new dated manifest file. You don't have to clean up the files already in the SSC, but it is good practice to remove clutter from time to time. Then run the regular LP; same navigation as above.

Managing Deletes in Source System

Managing deletes for External Data with non-Oracle sources can be a bit tricky. In this case OTBI-Enterprise, having no clue about the actual source system, relies on you to provide the information about deleted data. Managing deleted data (from the source) in the data warehouse is crucial; otherwise you end up with incorrect numbers. Note that, unlike the case with Oracle sources, OTBI-Enterprise does not maintain a replica schema (an SDS) of the OLTP within the OTBI-E pod, because OTBI-Enterprise isn't aware of what the source system looks like. Therefore, the responsibility falls to you to convey the deleted data through the same flat file (CSV) approach.

The CSV data files, as we discussed above, are to be provided one per staging area table. The column headers of the CSV exactly match the columns of the staging area table. There is one such column in all staging area tables (and hence in all CSV files) called "DELETE_FLG". When you upload data files for the initial load, all values for this column would be "N". In your incremental run, assuming you have found some updates to a previously uploaded record, you would re-upload that record with the changes. Similarly, if a record that you uploaded earlier is now hard deleted in your source, you will need to re-upload that same record, this time with "DELETE_FLG" set to "Y".

Take a close look at the above screenshot. This is an incremental data file that you are about to upload. Usually the PK definition of a universal staging table (and hence of the CSV file) is made up of two columns: DATASOURCE_NUM_ID and INTEGRATION_ID.


For data-effective tables, a third column, the source effective start date, is included as well, making it a three-column PK.


For this example, we have only the first option (the two-column PK). You had originally uploaded data for this PK, '900,300100044288208~300100044268215', with DELETE_FLG = 'N'. The data warehouse has been using this record for its calculations/analysis so far. Later, you figured out that this record was hard deleted in your source. You should now have a process (within your source) that spits out deleted records along with new/updated records. New/updated records go their usual way with DELETE_FLG as "N", but deleted records must have the DELETE_FLG column set to "Y", as shown in the above picture. The downstream data warehouse process will treat this as a soft delete and move on accordingly, removing this record from all calculations and analysis. That's pretty much the idea.

You can choose to provide all deleted records for all files in a completely separate batch, where all records are marked with DELETE_FLG as "Y", or you can mix them with your daily regular data files – your choice.
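If you want a starting point for producing such a delete batch, here is a hedged sketch of the idea: compare the set of primary keys you previously uploaded with the keys present in today's source extract, and re-emit anything missing with DELETE_FLG set to "Y". The key values below are the ones from the example above; a real delete row would of course carry all of the record's columns, not just the key and the flag.

import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class DeleteFlagExample {
    public static void main(String[] args) {
        // Keys (DATASOURCE_NUM_ID,INTEGRATION_ID) previously sent to OTBI-Enterprise (hypothetical).
        Set<String> previouslyUploaded = new HashSet<>(Arrays.asList(
                "900,300100044288208~300100044268215",
                "900,300100044288208~300100044268299"));
        // Keys still present in today's source extract.
        Set<String> currentExtract = new HashSet<>(Collections.singletonList(
                "900,300100044288208~300100044268299"));

        // Anything uploaded before but no longer in the source was hard deleted there,
        // so it must be re-sent with DELETE_FLG = "Y" (a soft delete for the warehouse).
        for (String key : previouslyUploaded) {
            if (!currentExtract.contains(key)) {
                String[] parts = key.split(",", 2);
                // A real row would repeat every column of the record; only key + flag shown here.
                System.out.println("\"" + parts[1] + "\"," + parts[0] + ",\"Y\"");
            }
        }
    }
}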

General Tips

Resolving Domains

Resolving source domain member codes to target domain member codes happens as part of the SDE interfaces in the case of pre-built adaptors (for E-Business Suite, PeopleSoft, Fusion etc.). For Universal sources, however, you are responsible for providing the target domain member codes in the CSV files themselves. This is quite an important step. Fortunately, there are data model standards in place through which you know that a given column is a target domain code column: target domain column names are usually prefixed with "W_", and the available documentation also makes a note of it. We can take an example from the OTBI-Enterprise Document Library under the section "View External Data File Load Specifications". We took a similar example above, but that was for a fact source, which usually doesn't carry a domain; an example of a dimension source will probably help to understand this concept. See below:

So, the first and second columns of this file are for you to populate with your source domain member code values for "Status" (indicating the requisition or applicant status) and "Reason" (indicating the reason, if available, for such a requisition or applicant status). These are your "raw" source values. The third column in this file is a target domain column; its values must come from the list of values OTBI-Enterprise requires. Based on the data in the first two source domain member code columns, you need to do the mapping and "choose" the appropriate target domain member code value.

Assume that the list of target domain member code values include items like “ASSESSMENT_INTERVIEW”, “ASSESSMENT_INTERVIEW1” and “ASSESSMENT_INTERVIEW2” (names are “Assessment Interview”, “Assessment First Interview” and “Assessment Second Interview” respectively). There are actually 22 out of the box values for this target domain.

Also assume that your source domain member code values include items like “WT”, “INT1”, “INT2”, “INT3” and “FINT” (names are “Written Test”, “Interview Round 1”, “Interview Round 2”, “Interview Round 3” and “Final Interview” respectively).

In this case, your job would be to map your 5 source values to the possible list of 3 target values. And you have to get this done in the CSV file itself, before providing it to OTBI-Enterprise. That kind of logic needs to be there at your end. You cannot change the list of target values, unless they are coded as “extensible”. If extensible, you will be allowed to add more to the list in BIACM.
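A sketch of what that mapping logic could look like is below. The source and target codes are the ones from the example above, but the actual pairing of source to target values is purely illustrative – your functional mapping will almost certainly differ.

import java.util.HashMap;
import java.util.Map;

public class RecruitmentEventDomainMap {
    // Illustrative mapping of the 5 source domain member codes to the 3 target (W_) domain member codes.
    private static final Map<String, String> SOURCE_TO_TARGET = new HashMap<>();
    static {
        SOURCE_TO_TARGET.put("WT",   "ASSESSMENT_INTERVIEW");   // Written Test      -> Assessment Interview
        SOURCE_TO_TARGET.put("INT1", "ASSESSMENT_INTERVIEW1");  // Interview Round 1 -> Assessment First Interview
        SOURCE_TO_TARGET.put("INT2", "ASSESSMENT_INTERVIEW2");  // Interview Round 2 -> Assessment Second Interview
        SOURCE_TO_TARGET.put("INT3", "ASSESSMENT_INTERVIEW2");  // Interview Round 3 -> Assessment Second Interview
        SOURCE_TO_TARGET.put("FINT", "ASSESSMENT_INTERVIEW2");  // Final Interview   -> Assessment Second Interview
    }

    // Resolve the target domain member code to write into the W_ column of the CSV.
    public static String resolve(String sourceCode) {
        return SOURCE_TO_TARGET.get(sourceCode);
    }

    public static void main(String[] args) {
        System.out.println("INT3 maps to " + resolve("INT3"));
    }
}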

You must be wondering where to get this list of allowed target domain member code values. Simple:

Ø Log into BIACM and navigate to Manage Warehouse Domains.

Ø Do your search based on the domain code (for example, W_RECRUITMENT_EVENT)

Ø Note down the Warehouse Members in the bottom pane. A screenshot below.

While on this topic, let me go over another closely related issue – domain mapping hierarchies. In the above file, the first and second columns (source domain) lead to mapping the third column (target domain) – that's great. But what about the fourth column? That is also a target domain column. How do we map that? How do we map the fifth, the sixth, etc.? All of them seem to be target domain columns. The best thing is to read up the Functional Task Manager documentation for this task, if there is any; for complicated mappings like this, OTBI-Enterprise provides additional help topics. For the scope of this document, you need to understand only this: you may have to simply copy the mapping from BIACM, if it falls under a domain mapping "hierarchy".

See this screenshot (Navigation: BIACM > Manage Domain Mappings and Hierarchies > Search on Domain Code like "W_RECRUITMENT"). Then go to the Hierarchies tab within the middle pane.

Notice the hierarchy going up from Recruitment Event to Recruitment Sub Stage to Recruitment Stage. This is a target domain hierarchy; the parent value at any level is based on the child values. And since the lowest level is a target domain (Recruitment Event, in this case), OTBI-Enterprise has pre-done the mapping up to Sub Stage (see above). The mapping is visible in the bottom-most pane.

The next screenshot shows the mapping from Sub Stage to Stage level.

As a Universal Adaptor user, all you need to do is copy these "mappings" from BIACM and reflect them in your CSV files. This is true for any target domain column that is part of a domain hierarchy. If not (which you can find out by looking in BIACM as shown above), you need to do the mapping from your source domains to the target domains yourself.

Mapping domains is mandatory before even starting a full load in OTBI-Enterprise. A whole lot depends heavily on these mappings.

Special Cases


The OTBI-Enterprise product dimension uses a customer-configurable source file called "file_unspsc.csv". Out of the box, a few rows of sample data are provided, but customers who have purchased UNSPSC (United Nations Standard Products and Services Code) data are expected to provide their own "paid" data. If you are one such customer, you need to download this sample file from OTBI-Enterprise, put your own UNSPSC data in it in the specified format, and upload it to the Storage Service.

That is no different from how it works for Oracle sources (E-Business Suite or PeopleSoft). However, for non-Oracle sources, there is a bit of a special case, which is why I am talking about UNSPSC in this section.

The problem is that even though this is essentially a one-time activity (until the international standard changes), you need to upload this same file in each and every ETL batch, over and over again. For Universal CSV files, once you upload to the SSC along with a manifest file, OTBI-Enterprise downloads everything to the Source Files folder (remember that from your source registration step?). So file_unspsc.csv gets downloaded and consumed in the ETL. The next time, before downloading, OTBI-Enterprise deletes all files from the Source Files folder, so file_unspsc.csv gets deleted. For genuinely changing data files this is fine, but for UNSPSC data it wasn't really warranted. However, that's the way it is.

Long story short, just remember to provide file_unspsc.csv in every batch. If you do use UNSPSC, provide it with valid data; if not, just keep providing OTBI-Enterprise's sample file. You can find this information in the OTBI-Enterprise Document Library under the Known Issues section for UNSPSC.

Data Security with External Data

For non-Oracle sources, OTBI-Enterprise relies on data security measures built on top of the data warehouse itself, leveraging configurations you do in BIACM. This won't necessarily "match" your source applications, but that is how far you may be able to get; with more configuration options from OTBI-Enterprise in the future, we can only get closer. The following section describes how OTBI-Enterprise supports data security and what you are recommended to configure.

Prior to that, it is assumed that you have all your OTBI-Enterprise users pre-configured in the OTBI-E pod. Also, you should have a clear plan in mind of which of these users should get what kind of data access. And lastly, it is also assumed that you have completed the rest of the configuration and have run a complete full ETL load.

Here are the steps, once you are ready:

Ø Log into BIACM.

Ø On the “Tasks” section on your left hand panel, click on “Manage Data Security”.

Ø Select the source instance that you created before generating your Load Plans. In the following screenshot, the source instance selected is "EBS1222".

Ø From the bottom pane, click the “plus” button to add a security object.

Ø Pick your source instance (EBS1222 in this case), and then select your “Secure by Object”. If you are planning to enable Department based data security, pick the appropriate item from the drop down box.

Ø From the “Users” list, pick your user (or users) who would have similar Department based security restrictions. In this case, we are picking a user called TM-MFITZIMMONS. Select this user and click the single arrow to get this user on the right hand side. Note that if you have multiple users planned, you could use the User Filter to search for them.

Ø Scroll down to see the list of Departments. Select (or multi-select) the Departments to which your user(s) should have access. Click the single arrow to move them to the right-hand side. Then click OK.

Ø The Security objects that you’ve configured will be shown in the bottom pane, as shown below.

Ø If you have other users or wish to secure by other (supported) dimension objects, you need to carry out the same steps.

Instead of having to do this for every user (or group of users with similar security requirements), OTBI-Enterprise provides a way to import all your security configurations from an external CSV file. For the sake of this document, I am not repeating what is already documented in the OTBI-Enterprise Document Library, in the Setting up Data Security section. Even the single-user-based process described above is covered in the same document library; I simply had to pick the approach that relates to the examples discussed earlier, and it happened to be the first one. Anyway, that's pretty much it.

Potential Automations

None of these options is supported by OTBI-Enterprise. They are just options for you to use, or to use as a basis for coming up with similar (maybe "better") ones of your own.

File Upload to Storage Service Container

The first area of potential automation is a way to upload your files to the Storage Service Container programmatically. This is going to be a daily exercise, so it had better be automated. Here is a small and quick Java program that utilizes the class libraries from Oracle Cloud Storage Service and helps you move towards automation.

This program expects a property file as an input; pick any name for it (and load that same name in the initProperties() method of the code below). In this file, you provide certain details about your storage service and container, as follows:

STORAGE_SERVICE_URL > Storage service URL that Oracle gave you.

STORAGE_SERVICE_NAME > Name of the Storage Service. Oracle gives you this as well.

STORAGE_SERVICE_USER_NAME > Your username to access the Storage Service.

STORAGE_SERVICE_PASSWORD > Your password to access the Storage Service.

STORAGE_SERVICE_CONTAINER_NAME > The name of the container, as you configured it while registering your source.

ACTION > The action you want this program to take. Options are:

LIST_CONTAINERS > Gives a listing of all containers within the storage service.

LIST_FILES > Gives a listing of all files currently existing in the container "STORAGE_SERVICE_CONTAINER_NAME".

DELETE_FILES > Deletes all files from the container "STORAGE_SERVICE_CONTAINER_NAME".

UPLOAD_FILES > Uploads all files from the local directory given in "UPLOAD_FILES_FROM_DIRECTORY" into the container "STORAGE_SERVICE_CONTAINER_NAME". (The code below also understands CREATE_CONTAINER and RENAME_FILES.)

UPLOAD_FILES_FROM_DIRECTORY > The location on your local server where all your files are waiting to be uploaded.

Create and save such a property file. A sample, with placeholder values, is shown below:
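Replace the bracketed values with the details Oracle gave you; the container name and upload directory shown are the ones used earlier in this document.

STORAGE_SERVICE_URL=<storage service URL given by Oracle>
STORAGE_SERVICE_NAME=<storage service name given by Oracle>
STORAGE_SERVICE_USER_NAME=<your storage service user>
STORAGE_SERVICE_PASSWORD=<your storage service password>
STORAGE_SERVICE_CONTAINER_NAME=Container_Universal
ACTION=UPLOAD_FILES
UPLOAD_FILES_FROM_DIRECTORY=C:/temp/upload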

Following is the Java code. Copy it into an appropriate IDE such as Eclipse and save it as "StorageService_Manager.java". Then compile it and create a .jar file; call it "StorageService_Manager.jar". Needless to mention, you need to have the Oracle Cloud Storage Service libraries referenced here, and a few more. I had these under my Library folder (see screenshot):


// Imports follow the Oracle Cloud Storage Java API; adjust package paths if your SDK version differs.
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.List;
import java.util.Properties;

import oracle.cloud.storage.CloudStorage;
import oracle.cloud.storage.CloudStorageConfig;
import oracle.cloud.storage.CloudStorageFactory;
import oracle.cloud.storage.exception.SystemException;
import oracle.cloud.storage.model.Container;
import oracle.cloud.storage.model.Key;
import oracle.cloud.storage.model.StorageInputStream;

public class StorageService_Manager {

    public static Properties inParam;
    public static String storageUrl;
    public static String storageServiceName;
    public static String storageUserName;
    public static String storagePassword;
    public static String storageContainerName;
    public static String storageAction;
    public static String uploadFromDirectory;
    public static CloudStorage myConnection;

    public static void main(String[] args) {
        try {
            // read the property file described above
            initProperties();

            storageUrl = inParam.getProperty("STORAGE_SERVICE_URL").trim();
            storageServiceName = inParam.getProperty("STORAGE_SERVICE_NAME").trim();
            storageUserName = inParam.getProperty("STORAGE_SERVICE_USER_NAME").trim();
            storagePassword = inParam.getProperty("STORAGE_SERVICE_PASSWORD").trim();
            storageContainerName = inParam.getProperty("STORAGE_SERVICE_CONTAINER_NAME").trim();
            uploadFromDirectory = inParam.getProperty("UPLOAD_FILES_FROM_DIRECTORY").trim();
            storageAction = inParam.getProperty("ACTION").trim();

            // config and cloud storage object (setter calls as shown in the Oracle Cloud Storage API samples)
            CloudStorageConfig myConfig = new CloudStorageConfig();
            myConfig.setServiceName(storageServiceName)
                    .setUsername(storageUserName)
                    .setPassword(storagePassword.toCharArray())
                    .setServiceUrl(storageUrl);
            myConnection = CloudStorageFactory.getStorage(myConfig);

            // list containers within the storage service
            if (storageAction.toUpperCase().equals("LIST_CONTAINERS")) {
                listContainers();
            }
            // upload files
            if (storageAction.toUpperCase().equals("UPLOAD_FILES")) {
                uploadFiles();
            }
            // list files
            if (storageAction.toUpperCase().equals("LIST_FILES")) {
                listFiles();
            }
            // delete files
            if (storageAction.toUpperCase().equals("DELETE_FILES")) {
                deleteFiles();
            }
            // create container
            if (storageAction.toUpperCase().equals("CREATE_CONTAINER")) {
                createStorageContainer();
            }
            // rename files
            if (storageAction.toUpperCase().equals("RENAME_FILES")) {
                renameFiles();
            }
            System.out.println("End ...");
        } catch (Exception e) {
            System.out.println("Error has occurred \n" + e);
        }
    }

    private static void createStorageContainer() {
        System.out.println("Creating container :" + storageContainerName);
        try {
            Container c = myConnection.createContainer(storageContainerName);
            System.out.println("Created :" + c.getName());
        } catch (Exception e) {
            System.out.println("Error in creating container \n" + e);
        }
    }

    private static void renameFiles() {
        System.out.println("Renaming files in container :" + storageContainerName);
        List<Key> cobjects = myConnection.listObjects(storageContainerName, null);
        for (int i = 0; i < cobjects.size(); i++) {
            System.out.println("Renaming :" + cobjects.get(i).getKey());
            // "rename" = copy the object to a done_ prefixed name and delete the original
            StorageInputStream sins = myConnection.retrieveObject(storageContainerName, cobjects.get(i).getKey());
            myConnection.storeObject(storageContainerName, "done_" + cobjects.get(i).getKey(), "text/plain", sins);
            myConnection.deleteObject(storageContainerName, cobjects.get(i).getKey());
        }
    }

    private static void deleteFiles() {
        System.out.println("Deleting files in container :" + storageContainerName);
        try {
            List<Key> cobjects = myConnection.listObjects(storageContainerName, null);
            for (int i = 0; i < cobjects.size(); i++) {
                System.out.println("Deleting :" + cobjects.get(i).getKey());
                myConnection.deleteObject(storageContainerName, cobjects.get(i).getKey());
            }
        } catch (SystemException ex) {
            System.out.println("Error in deleting files in container :" + storageContainerName);
        }
    }

    private static void listFiles() {
        System.out.println("Listing files in container :" + storageContainerName);
        try {
            List<Key> cobjects = myConnection.listObjects(storageContainerName, null);
            for (int i = 0; i < cobjects.size(); i++) {
                System.out.println(cobjects.get(i) + " :" + cobjects.get(i).getKey());
            }
        } catch (SystemException ex) {
            System.out.println("Error in listing files in container :" + storageContainerName);
        }
    }

    private static void uploadFiles() {
        try {
            File f = new File(uploadFromDirectory);
            File files[] = f.listFiles();
            System.out.println("Uploading files to container :" + storageContainerName);
            for (File file : files) {
                try {
                    FileInputStream fis = new FileInputStream(file);
                    String fileName = file.getName();
                    System.out.println("Uploading :" + fileName);
                    myConnection.storeObject(storageContainerName, fileName, "text/plain", fis);
                } catch (FileNotFoundException ex) {
                    System.out.println("Error in uploading file :" + file.getName());
                }
            }
        } catch (NullPointerException ex) {
            System.out.println("Error in uploading files in container :" + "Upload file directory does not exist");
        } catch (Exception ex) {
            System.out.println("Error in uploading files in container :" + storageContainerName);
        }
    }

    private static void listContainers() {
        System.out.println("Listing Containers");
        try {
            List<Container> containerList = myConnection.listContainers();
            for (int i = 0; i < containerList.size(); i++) {
                System.out.println(containerList.get(i) + " :" + containerList.get(i).getName());
            }
        } catch (Exception ex) {
            System.out.println("Error in listing containers");
        }
    }

    private static void initProperties() throws IOException {
        inParam = new Properties();
        // The original post did not show the property file name; adjust this to whatever name you saved it under.
        inParam.load(new FileInputStream("StorageService_Manager.properties"));
    }
}




That's it. Running it is fairly simple: be in the directory where you have your JAR (with the dependency JARs and your property file available), then run it as:

java -cp .;%CLASS_PATH%;StorageService_Manager.jar StorageService_Manager >> storage_out.log

The output file “storage_out.log” records all that happens on the console.

Wednesday Feb 19, 2014

Notes for implementing Universal adapter for OBIA Project analytics

Author: Amit Kothari

Introduction: This blog outlines the steps for implementing OBIA Project Analytics for Universal Adaptor. Similar steps can be followed for other modules.

Supported OBIA releases: onwards

Supported Apps releases: Universal Adapter.


Please refer to the OBIA documentation and the DMR (Data Model Reference) as a starting point for this exercise. Also refer to this blog entry.

Please log in to ODI Designer to see the OBIA Projects Universal interfaces; the source files can be seen in the Model layer.

1. High level steps to import data into the data warehouse through the Universal adapter.

a. Populate the CSV files with your data (e.g. file_proj_budget_fs.csv is the source file for the w_proj_budget_fs table). Typically a customer writes an extract program, such as a shell script or PL/SQL program, which creates these data files from an unsupported source OLTP system.

b. Refer to the steps below for details on how to populate these files.

c. Build a Load Plan with fact groups: "900: Universal Adaptor Instance"."Project".

d. Run the Load Plan that you created in the previous step.

e. Note: If applicable, this Load Plan must be run after the regular Load Plan (which populates the Oracle Business Analytics Warehouse for the other subject areas) has completed.

2. The configuration file or files for this task are provided on installation of Oracle BI Applications at one of the following locations:

a. Source-independent files: <Oracle Home for BI>\biapps\etl\data_files\src_files\.

b. Source-specific files: <Oracle Home for BI>\biapps\etl\data_files\src_files\<source adaptor>.

c. Your system administrator will have copied these files to another location and configured ODI connections to read from this location. Work with your system administrator to obtain the files. When configuration is complete, your system administrator will need to copy the configured files to the location from which ODI reads these files.

d. Refer to the Appendix section ‘Setting Up the Delimiter for a Source File’.

  1. As a general rule, default to 0 for numeric columns and '__NOT_APPLICABLE__' for string columns, so that we do not run into 'Not Null' errors when the ETL starts loading data.
  2. Date columns should be populated in the CSV file as a number in the format YYYYMMDDHH24MISS, or kept null (see the short sketch after this list).
  3. The dimension ID fields in the fact staging tables have to be populated with the INTEGRATION_ID of the various dimensions. This is very important; otherwise the dimension WID fields in the fact tables will default to 0. Please refer to the ODI Model or the DMR for the star schema diagrams and other FK information.
  4. Similarly, the common dimensions which Projects uses, like W_INT_ORG_D, W_MCAL_DAY_D, W_MCAL_CONTEXT_G, W_EMPLOYEE_D, W_JOB_D, W_INVENTORY_PRODUCT_D etc., also need to be populated correctly via source files.
  5. W_MCAL_CONTEXT_G has a CLASS field that holds two values – GL or PROJECTS. To resolve the project accounting dates in the fact tables, there must be data present in this table for class 'PROJECTS'.
  6. There are various domain codes which are loaded into the warehouse staging table W_DOMAIN_MEMBER_GS. In order to load this table, the generic file file_domain_member_gs.csv has to be populated with the correct domain codes.
    1. The granularity of this file is one record per domain member per language, for any of the domains listed above.
    2. Domain codes for Projects are listed in the Appendix. Just load the domains based on the Facts/Dims you are planning to load.
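A tiny illustration of points 1 and 2 above (the YYYYMMDDHH24MISS date format translated into Java's pattern syntax, plus the recommended defaults); the values are hypothetical:

import java.text.SimpleDateFormat;
import java.util.Date;

public class CsvValueDefaults {
    public static void main(String[] args) {
        // YYYYMMDDHH24MISS corresponds to the Java pattern yyyyMMddHHmmss
        String hireDt = new SimpleDateFormat("yyyyMMddHHmmss").format(new Date());
        // Defaults that avoid 'Not Null' errors when the ETL loads the data
        long numericDefault = 0;
        String stringDefault = "__NOT_APPLICABLE__";
        System.out.println(hireDt + "," + numericDefault + "," + stringDefault);
    }
}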

Table B-164 file_domain_member_gs.csv Field Descriptions (field descriptions listed in file order; other details for these fields are marked "Not available."):

1. This should be populated with the Domain Code corresponding to the Source Domain that is to be configured.
2. Defaulted to 'S' - indicates this is a Source Domain Code.
3. This should be populated with the CODE value supplied in any of the above files.
4. This should be populated with the NAME value that corresponds to the Member Code supplied.
5. Not available.
6. Hardcode to '__NOT_APPLICABLE__'.
7. Not available.
8. Warehouse Language Code.
9. Source Language Code.
10. This is the unique ID for the record. The INTEGRATION_ID for this file can also be populated as DOMAIN_CODE~DOMAIN_MEMBER_CODE.
11. The unique Data Source ID of the Source Instance you are configuring.

A. Setting Up the Delimiter for a Source File

When you load data from a Comma Separated Values (CSV) formatted source file, if the data contains a comma character (,), you must enclose the source data with a suitable enclosing character known as a delimiter that does not exist in the source data.

Note: Alternatively, you could configure your data extraction program to enclose the data with a suitable enclosing character automatically.

For example, you might have a CSV source data file with the following data:

Months, Status
January, February, March, Active
April, May, June, Active

If you loaded this data without modification, ODI would load 'January' as the Months value, and 'February' as the Status value. The remaining data for the first record (that is, March, Active) would not be loaded.

To enable ODI to load this data correctly, you might enclose the data in the Months field within the double-quotation mark enclosing character (" ") as follows:

Months, Status
"January, February, March", Active
"April, May, June", Active

After modification, ODI would load the data correctly. In this example, for the first record ODI would load 'January, February, March' as the Months value, and 'Active' as the Status value.

To set up the delimiter for a source file:

1. Open the CSV file containing the source data.

2. Enclose the data fields with the enclosing character that you have chosen (for example, ").

You must choose an enclosing character that is not present in the source data. Common enclosing characters include single quotation marks (') and double quotation marks (").

3. Save and close the CSV file.

4. In ODI Designer, display the Models view, and expand the Oracle BI Applications folder.

Identify the data stores that are associated with the modified CSV files. The CSV file that you modified might be associated with one or more data stores.

5. In ODI Designer, change the properties for each of these data stores to use the enclosing character, as follows:

1. Double-click the data source, to display the DataStore: <Name> dialog.

2. Display the Files tab.

3. Use the Text Delimiter field to specify the enclosing character that you used in step 2 to enclose the data.

4. Click OK to save the changes.

You can now load data from the modified CSV file.

  1. Project Domains


Tuesday May 21, 2013

Implementing Oracle BI Applications using Universal Adaptors

Authored by : Sushanta Chakraborty

1. Implementing Oracle BI Applications – Universal Adaptors

The purpose of this document is to provide the information one might need while attempting an implementation of one or more offerings and subject areas using the Oracle BI Applications Universal Adaptors. There are several myths around what needs to be done while implementing Universal Adaptors, where things can go wrong if not configured correctly, which columns are a 'must' to populate, how to provide a 'delta' data set for incremental ETL runs, and so on. All of these topics are discussed in this document.

Understanding the ‘entry points’ required to implement a subject area is the key to a successful implementation of Universal Adaptors.

2. General Background of Universal Adaptors

 Oracle BI Applications provide packaged ETL mappings against source OLTP systems like Oracle Fusion Applications, Oracle E-Business Suites, PeopleSoft, JD Edwards and Siebel across various business areas such as Human Resources, Supply Chain & Procurements, Order Management, Financials, Service and so on. However, Oracle BI Applications does acknowledge that there can be quite a few other source systems, including home-grown ones, typically used by SMB customers. And to that extent, some of the enterprise customers may also be using SAP as their source. Until it gets to a point where Oracle BI Applications can deliver pre-built ETL adaptors against each of these source systems, the Universal Adaptor becomes a viable choice.

A mixed OLTP landscape – where one system has a pre-built adaptor and the other doesn't – is also a scenario that calls for the use of Universal Adaptors. For instance, the core portion of Human Resources may be in PeopleSoft, but the Talent Management portion may be maintained in a not-yet-supported application such as "Taleo".

In order for customers to be able to pull data from unsupported source systems into the data warehouse, Oracle BI Applications created the so-called "Universal Adaptor". The reason this was doable in the first place is that the ETL architecture of Oracle BI Applications readily supports it. The Oracle BI Applications data warehouse consists of a huge set of fact, dimension and aggregate tables. The portion of the ETL that loads these 'end' tables is source independent (loaded using the SILOS folder, an acronym for "Source Independent Load Objects"). These ETL maps/interfaces start from a staging table and load data incrementally into the corresponding end table. Aggregates are created upstream and have no dependency on the source system the data came from. The ETL logic in the SDE folders (acronym for "Source Dependent Extract") that extracts into these staging tables (also called Universal Stage tables) is what goes against a given source system, like EBS, PSFT, Fusion and so on. For Universal, it goes against a similarly structured CSV file. Take any adaptor – the universal stage tables are exactly the same, structurally, and the grain expectation is also exactly the same for all adaptors. No wonder, then, that as long as these conditions are met, the SILOS logic will load the data (extracted from Universal) from the universal stage tables seamlessly.

Why did Oracle BI Applications decide to source from CSV files? In short, the answer is "to complete the end-to-end extract-transform-load story". We will cover this, and what the options are, in a bit more detail in the next section.

3. General Implementation Considerations

One myth that implementers have while implementing Universal Adaptors is “Data for the universal staging tables should always be presented to Oracle BI Applications in the required CSV file format”.

If your source data is already present in a relational database, why dump it to CSV files and give it to Oracle BI Applications? You will anyway have to write brand new ETL mappings that read from those relational tables to get to the right grain and right columns. Then why target those to CSV files and then use the Oracle BI Applications Universal Adaptor to read from them and write to the universal staging tables? Why not directly target those custom ETL maps to the universal staging tables? In fact, when your source data is in relational tables, this is the preferred approach.

However, if your source data comes from third-party providers to whom you have outsourced a function, with whom you probably have agreements to send you data files/reports/feeds periodically, and that third party doesn't allow you to access their relational schema, then CSV files are probably the only alternative. A typical example would be Payroll data: a lot of organizations outsource their Payroll to third-party companies like ADP. In those cases, ask for the data in the same shape that the Oracle BI Applications CSV files expect. Another valid example is SaaS applications, which usually do not allow direct database access; files are the best option for them.

Also, if your source data lies in IBM mainframe systems, where it is typically easier to write COBOL programs (or similar) to extract the data into flat files, presenting CSV files to the Oracle BI Applications Universal Adaptor is probably easier. Irrespective of how you populate the universal staging tables (relational sources or CSV sources), five very important points should always be kept in mind:

  • The grain of the universal staging tables is met properly.
  • The uniqueness of records does exist in the (typically) INTEGRATION_ID columns (a small sketch after this list shows one naive way to check this in a CSV file).
  • The mandatory columns are populated the way they should be.
  • The relational constraints are met well while populating data for facts. In other words, the natural keys that you provide in the fact universal staging table must exist in the dimensions. This is with respect to the FK resolution (dimension keys into the end fact table) topic.
  • The incremental extraction policy is well set up. Some overlap of data is OK, but populating the entire dataset into the universal staging tables will not perform well.
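Here is that naive uniqueness check, as a hedged sketch: it assumes the first column of the (hypothetical) CSV is INTEGRATION_ID, skips a single column header line, and does a simple comma split that ignores quoted commas – good enough for a sanity check, not for production.

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.HashSet;
import java.util.Set;

public class IntegrationIdCheck {
    public static void main(String[] args) throws Exception {
        String csv = "file_proj_budget_fs.csv";   // hypothetical universal CSV file
        Set<String> seen = new HashSet<>();
        try (BufferedReader in = new BufferedReader(new FileReader(csv))) {
            String line;
            int lineNo = 0;
            while ((line = in.readLine()) != null) {
                lineNo++;
                if (lineNo == 1) continue;                      // skip the column header line
                String integrationId = line.split(",", -1)[0];  // naive split; assumes INTEGRATION_ID is column 1
                if (!seen.add(integrationId)) {
                    System.out.println("Duplicate INTEGRATION_ID at line " + lineNo + ": " + integrationId);
                }
            }
        }
    }
}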

Note: For the rest of the document, we will assume that you are going with the CSV file approach, although, reiterating, if your source data is stored in a relational database it is recommended that you write your own extract mappings.

4. Know your steps towards a successful implementation of Universal Adaptor

There are several entry points while implementing a subject area using Universal Adaptors. The base dimension tables and base fact tables have their corresponding CSV files, where you should configure the data at the right grain and to the right expectations. Other kinds of tables include 'Exchange Rate' and 'Codes' (Domains). Exchange Rate (W_EXCH_RATE_G) has its own single CSV file, whereas the Codes table (W_CODE_D) has one CSV file per code category. To see all code names properly in the dashboards/reports, you should configure all the 'required' code CSV files for the subject area in question.

Note: The Codes table has been replaced with Domains architecture in newer releases of BI Applications.

Key points:

  • Start with populating the offering specific common dimension CSV files, like Job, Pay Grade, HR Position, GL Account, etc., depending on your offering (such as HR, or FINANCE etc). It is not necessary that all offerings would have “their set of common dimensions used in all facts within the offering”.
  • Then configure subject area specific dimensions, like Job Requisitions, Recruitment Source etc (when implementing Recruitment) or Learning Grade, Course (when implementing Learning) or Pay Type etc (when implementing Payroll) or Absence Event, Absence Type Reason (when implementing Absence) and so on. These are examples from HR offering. Similarly, it applies for Financials subject areas such as AR or AP etc., where you should consider area specific dimension needs at this time.
  • Then configure related COMMON class dimensions applicable for all, like Internal Organizations (logical/applicable partitions being “Department”, “Legal Entity” , “Business Unit” etc), or Business Locations (logical/applicable partitions being “Employee Locations”, “Inventory Locations” etc.).
  • Consider other shared dimensions and helper dimensions like Status, Transaction Types, and Exchange Rate etc. Special handling of time dimension should be addressed here (such as Fiscal or enterprise calendars etc).
  • Then consider the code dimensions. By this time you are already aware of what all dimensions you are considering to implement, and hence, can save time by configuring the CSVs for only the corresponding code categories. For newer releases, you would configure the domain set up at this point.
  • For facts, start with configuring the primary fact for your offering. Since the dimensions are already configured, the natural keys of the dimensions are already known to you and hence should be easy to configure in the fact. For some offerings, there isn't a concept of a "primary fact"; if so, go to the next step.
  • Primary fact should be followed by subject area specific facts, like Payroll, Job Requisition Event, Applicant Event, etc (under HR offering) or AR, AP etc (under Financial offering) and so on.
  • Now that all the CSV files for facts, dimensions, and helper tables are populated, you should move your focus to domains. For the E-Business Suite and PeopleSoft Adaptors, we do mid-stream lookups against preconfigured lookup CSV files; the map between source values/codes and their corresponding domain values/codes comes pre-configured in these lookup files. For the Universal Adaptor, however, no such lookup files exist. This is because we expect the accurate domain values/codes to be configured along with configuring the base dimension tables where they apply. Since everything is from a CSV file, there is no need for the overhead of an additional lookup file acting in the middle. Domain value columns begin with "W_" [excluding system columns like W_INSERT_DT and W_UPDATE_DT] and normally they are mandatory, cannot be null, and the value set cannot be changed or extended. We do relax the extension part on a case-by-case basis, but in no way can the values be changed. The recommendation at this stage is that you go to the DMR (Data Model Reference) guide, get the list of table-wise domain values, understand the relationships clearly in case there are any hierarchical or orthogonal relations, identify the tables where they apply and then their corresponding CSV files, look at the source data, and configure the domain values in the same CSV files. Note that if your source data is in a relational database and you have chosen to go the 'alternate' route of creating all extract mappings yourself, the recommendation is to follow what we have done for the E-Business Suite and PeopleSoft Adaptors: create separate domain value lookup CSV files and do a mid-stream lookup.
  • Note that the above discussion on domains has been revamped in newer releases of BI Applications. Now we only have the CODE columns in our data model, and the NAME and DESC columns have been taken away. We still do a mid-stream lookup in the EBS/PSFT adaptors to resolve the target domain code against the source domain code, but we no longer use lookups to resolve the target domain names and descriptions based on source/target domain codes. Rather, these name/description lookups happen on the fly at the RPD level, where the name or description is brought back to the report depending on the code and the language preferred in the report. The domain mapping happens in Configuration Manager now, and no longer in CSV files. You will be taken to the step of configuring or mapping them in Configuration Manager.
  • Last but not least, configure the ETL parameters. Read up on the documentation for these parameters, understand their expectations, study your own business requirements and then set the values accordingly.

5. Impact of incorrect configurations of domain values

 Domain values constitute a very important foundation for Oracle Business Intelligence Applications. We use this concept heavily all across the board to ‘equalize’ similar aspects from a variety of source systems. The Oracle Business Intelligence Applications provide packaged data warehouse solutions for various source systems such as E-Business Suite, PeopleSoft, Siebel, JD Edwards and so on. We attempt to provide a “source dependent extract” type of a mapping that leads to a “source independent load” type of a mapping, followed by a “post load” (also source independent) type of mapping. With data possibly coming in from a variety of source systems, this equalization is necessary. Moreover, the reporting metadata (OBIEE RPD) is also source independent. The metric calculations are obviously source independent.

The following diagram shows how a worker status code/value is mapped onto a warehouse domain to conform to a single target set of values. The domain is then re-used by any measures that are based on worker status.

Domain values help us equalize similar aspects or attributes as they come from different source systems. We use these values in our ETL logic, sometimes even as hard-coded filters, and we use them in defining our reporting layer metrics. Hence, not configuring, incorrectly configuring, or changing the values of these domain value columns from what we expect will lead to unpredictable results. You may have a single source system to implement, but you still have to go through all the steps and configure the domain values based on your source data. Unfortunately, this is a small price to pay for going with the "buy" approach vs. the traditional "build" approach for your data warehouse.

One of the most frequently asked questions is: what is the difference between domain value code/name pairs and the regular code/name pairs that are stored in W_CODE_D (or in W_DOMAIN_MEMBER_G and W_DOMAIN_MEMBER_MAP_G, along with their corresponding translation tables, in newer versions of BI Applications)?

If you look at the structure of the W_CODE_D table, it also appears capable of standardizing code/name pairs to something common. That is correct. However, we wanted to give users extensive freedom to do that standardization (not necessarily equalization) of their codes/names, and possibly use it for cleansing as well. For example, if the source-supplied code/name is CA/CALIF or CA/California, you can choose the W_CODE_D approach (using the Master Code and Master Map tables – see the configuration guide for details) to standardize on CA/CALIFORNIA. The equivalent in newer versions of BI Applications is the W_DOMAIN_MEMBER_MAP_G table.

Now, to explain the difference between domain value code/name pairs and regular source code/name pairs, it is enough to understand the significance of the domain value concept. To keep it simple: wherever we (Oracle Business Intelligence Applications) felt we should equalize two similar topics that give us analytic value, metric calculation possibilities and so on, we have “promoted” a regular code/name pair to a domain value code/name pair.

If we have a requirement to provide a metric called “Male Headcount”, we cannot calculate it accurately unless we know which part of the headcount is male and which is female. The metric itself has simple calculation logic: the sum of headcount where sex = Male. Since PeopleSoft can call it “M” and EBS can have “male”, we made this a domain value code/name pair, W_SEX_MF_CODE (available in the employee dimension table). Needless to say, if you do not configure the domain value for this column accurately, you will not get this metric right.
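As a small illustration of why this matters, the following sketch (with made-up rows) computes the metric the way a report effectively would, assuming the warehouse domain code for male is ‘M’; any record whose W_SEX_MF_CODE was never configured simply falls out of the metric.

# Minimal sketch: the "Male Headcount" metric only works if W_SEX_MF_CODE
# was resolved to the warehouse domain code during the load.
employees = [
    {"employee_id": "1001", "w_sex_mf_code": "M"},
    {"employee_id": "1002", "w_sex_mf_code": "F"},
    {"employee_id": "1003", "w_sex_mf_code": None},  # domain value never configured
]

male_headcount = sum(1 for e in employees if e["w_sex_mf_code"] == "M")
print(male_headcount)  # 1 -- employee 1003 silently drops out of the metric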

6. Impact of incorrect population of CODE-NAME columns

Oracle BI Applications mostly use Name and Description columns in the out-of-the-box dashboards and reports; Codes are used only in calculations, wherever required. It follows that if the names and descriptions do not resolve against their codes during the ETL, you will see blank attribute values (or, in some cases, depending on the parameter setting, strings like <Source_Code_Not_Supplied> or <Source_Name_Not_Supplied> and so on). In newer versions of BI Applications, where names and descriptions are resolved using on-the-fly RPD lookups rather than in the ETL, the reports might show values such as ‘__UNASSIGNED__’ if the domain members are not mapped properly in Configuration Manager.

Another point to keep in mind is that all codes should have distinct name values. If two or more codes share the same name value, they will appear merged at the report level, while their metric values may still show up on separate lines of the report, because the OBIEE Server typically adds a GROUP BY clause on the lowest-level attribute (the code). A quick check you can run over your prepared CSV files is sketched below.
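This is a minimal Python sketch of such a check, suitable for running against a prepared CSV before uploading it; the file and column names are hypothetical placeholders, so substitute the ones from the documentation.

import csv
from collections import defaultdict

def find_shared_names(csv_path, code_col, name_col):
    """Flag name values shared by more than one code
    (these will appear merged in reports)."""
    codes_by_name = defaultdict(set)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            codes_by_name[row[name_col]].add(row[code_col])
    return {name: codes for name, codes in codes_by_name.items() if len(codes) > 1}

# Hypothetical file and column names.
duplicates = find_shared_names("file_status_d.csv", "STATUS_CODE", "STATUS_NAME")
for name, codes in duplicates.items():
    print(f"Name '{name}' is shared by codes: {sorted(codes)}")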

Once implemented, you are free to promote the source code columns from the logical layer to the presentation layer. You might do this when your business users are more accustomed to the source code values than to the name values, but that is a separate business decision; the out-of-the-box behavior does not expose them.

7. Best practices for extracting incremental changes

Although you can choose to supply the entire dataset during incremental runs, for all practical purposes this is not recommended. First, the ETL then has to process all the records and determine what needs to be applied and what can be rejected. Second, the decision the ETL takes may not be accurate. ETL decisions are based only on the values of the system date columns: CHANGED_ON_DT, AUX1_CHANGED_ON_DT, AUX2_CHANGED_ON_DT, AUX3_CHANGED_ON_DT and AUX4_CHANGED_ON_DT. We do not explicitly compare column by column to determine whether an update is required; we assume that if something has changed, one of these date columns has probably changed as well, and in that case we simply update. If all five date columns are the same, we tend to reject the record. This decision rests entirely on the correctness of the date columns. If your source system does not track the last-updated date on a record reliably, it becomes your responsibility to force an update, no matter what. An easy way to do this is to set SESSSTARTTIME in one of these columns during extract; this forces a change to be detected, and we end up updating (a minimal sketch follows).
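The following is a minimal sketch of that forcing trick applied while producing the CSV rows, rather than via SESSSTARTTIME inside the SDE mapping; the timestamp format is an assumption, so confirm the format the Universal Adaptor expects in the documentation.

from datetime import datetime, timezone

# Assumed timestamp format for the CSV system date columns; confirm the exact
# format expected by the Universal Adaptor before using this.
def stamp_forced_change(row, fmt="%Y%m%d%H%M%S"):
    """If the source cannot supply a reliable last-update date, stamp the
    current time into CHANGED_ON_DT so the load always detects a change."""
    row["CHANGED_ON_DT"] = datetime.now(timezone.utc).strftime(fmt)
    return row

row = {"EMPLOYEE_ID": "1001", "CHANGED_ON_DT": None}
print(stamp_forced_change(row))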

Needless to say, this is not the best idea. By all means, you should provide the true “delta” data set during every incremental run. A small amount of overlap is acceptable, especially when you deal with flat files. Our generally accepted rules for facts or large dimensions are one of the following:

· The customer does their own version of “persisted staging” so they can determine changes at the earliest opportunity and load only the changes into the universal staging tables

· If it is absolutely impossible to determine the “delta” or to go the “persisted staging” route, the customer only does full loads; doing a full extract every time and processing it incrementally will take longer.

Follow the principles below to decide on your incremental strategy:

  • (Applies to relational table sources) Does your source system accurately capture the last update date/time on the source records that change? If so, extracting based on this column is the best idea. Your extract mapping may use two or three different source tables; decide which one is primary and which ones are secondary. The last update date on the primary table goes to the CHANGED_ON_DT column in the stage table, and the last update dates from the other tables go to the auxiliary changed-on date columns. If you design your extract mapping this way, you are almost done. Just make sure you add the filter criterion primary_table.last_update_date >= LAST_EXTRACT_DATE; the value of this parameter is usually maintained within the ETL orchestration layer.
  • (Applies to CSV file sources) Assuming there is a mechanism you can trust that gives you the delta file during each incremental load: does the ‘delta’ file come with changed values of the system dates? If yes, you are OK. If not, you should add an extra piece of logic to the out-of-the-box SDE_Universal** mappings that sets SESSSTARTTIME in one of the system date columns. This will force an update (where possible) no matter what.
  • (Applies to CSV file sources) If there is no mechanism to easily produce your delta file during incremental loads, and it seems easier to get a complete ‘dump’ every time, you actually have a couple of choices:

a. Pass the whole file in every load, but run true incremental loads. Note that this is not an option for large dimensions or facts.

b. Pass the whole file each time and run full load always.

c. Do something at the back-office to process the files and produce the ‘delta’ file yourselves.

Choices (a) and (b) may sound like a bad idea, but we have seen them to be a worthwhile solution compared to (c) when the source data volume is not very high. For an HR Analytics implementation (as an example), this could be OK as long as you have no more than 5,000 employees and no more than 5 years of data. The benchmark might be different for a Financial Analytics implementation.

Choice (c) is more involved but produces the best results. The idea is simple: you store the last image of the full file that your source gave you [call it ‘A’], and you get your new full file today [call it ‘B’]. Compare A and B. There are quite a few data-diff tools available on the market, or better still, you can write a Perl or Python script of your own (a sketch follows below). The result of this script should be a new delta file [call it ‘C’] containing the lines copied from B that have changed compared to A. Use this new file C as your delta data for your incremental runs. Finally, discard A and rename B as A, getting ready for the next batch.
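A minimal Python sketch of such a script is shown below. It treats any line of B that does not appear verbatim in A as new or changed; note that a pure line-level diff like this does not capture deletes, which have to be handled through the separate delete-management mechanism, and it assumes the files share a header line.

# Minimal sketch of choice (c): diff yesterday's full file (A) against
# today's full file (B) and emit only the changed/new lines as the delta (C).
import os

def build_delta(prev_path, curr_path, delta_path):
    with open(prev_path) as f:          # assumes A exists from the previous batch
        previous_lines = set(f.readlines())
    with open(curr_path) as curr, open(delta_path, "w") as delta:
        header = next(curr, "")
        delta.write(header)
        for line in curr:
            if line not in previous_lines:  # new or changed since the last extract
                delta.write(line)

build_delta("file_employee_A.csv", "file_employee_B.csv", "file_employee_C.csv")
# Get ready for the next batch: B becomes the new A.
os.replace("file_employee_B.csv", "file_employee_A.csv")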

Having said that, it is worthwhile to re-iterate that the “Persisted Staging” is a better approach as it is simpler and uses the ETL logic to do the comparison. Oracle BI Applications have used this technique in HR adaptors for E-Business Suite and PeopleSoft, in case you wanted to refer to them.
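For completeness, here is a minimal sketch of what a home-grown, key/hash-based comparison along those lines could look like, assuming EMPLOYEE_ID is the natural key of the file; the stored JSON snapshot plays the role of the persisted staging table. This is only an illustration of the idea, not the actual Oracle BI Applications implementation.

import csv
import hashlib
import json

def load_snapshot(path):
    """Load the previously persisted key-to-hash snapshot, if any."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}

def detect_changes(full_extract_csv, snapshot_path, key_col="EMPLOYEE_ID"):
    """Return only the new or changed rows of a full extract and refresh the snapshot."""
    snapshot = load_snapshot(snapshot_path)
    changed_rows = []
    with open(full_extract_csv, newline="") as f:
        for row in csv.DictReader(f):
            digest = hashlib.md5(json.dumps(row, sort_keys=True).encode()).hexdigest()
            if snapshot.get(row[key_col]) != digest:   # new or changed record
                changed_rows.append(row)
                snapshot[row[key_col]] = digest
    with open(snapshot_path, "w") as f:
        json.dump(snapshot, f)
    return changed_rows

changes = detect_changes("file_employee_B.csv", "employee_snapshot.json")
print(f"{len(changes)} new or changed rows to load")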

If there are other options not considered here, by all means try them out. This list is not comprehensive; it is rather indicative.

