By Saurabh Verma on May 14, 2013
Authored By : Don Co Seng
Oracle BI Applications (BIAPPS) 184.108.40.206.1 was recently released and is now available for download. One of the biggest features in this release is the introduction of Oracle Data Integrator (ODI) as the embedded ETL tool. With the introduction of ODI, a new method of orchestrating the task of loading your data warehouse is now required. This requirement brings about the introduction of Load Plan Generator (LPG).
LPG is a utility for generating ODI load plans for a desired subset of fact tables to be populated into BIAPPS Data Warehouse against one or more source systems. An ODI load plan is an executable object in ODI allowing you to organize tasks based on pre-defined order on the basis of the fact tables being loaded. LPG is invoked from Configuration Manager (CM) and makes use of metadata stored in CM and ODI repository. There is no separate repository required for LPG. This results in significantly lower metadata development and maintenance costs as LPG uses same metadata as in the ETL tool repository.
Before we get to the inner workings of LPG, let’s take a quick look at a few key concepts that drive the LPG behavior.
BI Apps taxonomy
At the top of the BIAPPS taxonomy are the different offerings available such as Financial Analytics, HR, CRM, etc. Under the offerings are the functional areas such as accounts payable in finance or payroll in HR. Below the functional areas are the different fact groups like “AP Transactions and Balance” or “Payroll Balance”. For each fact group, there are dimension groups associated with it. A dimension group can be specific to a particular fact group or shared across different fact group. The BI apps taxonomy drives both load plan generation as well as the setup flows in Functional Setup manager.
Offering (contains) -> Functional Areas (contains) -> Fact Groups (associated to) -> Dimension Groups
As for BIAPPS load phases, it is similar to that of previous BIAPPS releases. There are 3 main phases: Source Data Extract (SDE), Source Independent Load (SIL), and Post Load Process (PLP). The SDE phase consists of tasks extracting data from your different source systems. This phase loads all your staging tables and requires source system downtime. The SIL phase loads data from your staging tables into their respective dimension or fact tables. The PLP phase loads data into aggregate tables or some other facts requiring additional data processing. The SIL and PLP phase requires data warehouse downtime to complete the load process.
With BI apps 220.127.116.11.1, due to clear separation of SDE and SIL/PLP phases it’s possible to control and optimize the source system / warehouse downtime to the corresponding phases.
Design Time vs. Runtime dependencies
The LPG algorithm deals with the design time dependencies and run time dependencies separately. Any intra-entity specific design time dependencies are seeded in the repository. For example if a dimension has multiple steps to load the target table in SIL phase, these steps are seeded once in the ODI repository as they are known at design time and never change. The run time dependencies i.e. association of dimension to fact, or association of a dimension or fact to corresponding staging tables on the basis of sources are calculated by the LPG engine.
This approach reduces the number of iterations and the metadata required to get task ordering correct at the entire graph level as the design time dependencies are always consistently honored. LPG behavior is deterministic and does not require multiple iterations to get your entire graph behave correctly.
Load Plan Generator
Now, let’s take a closer look at how LPG works. At the core of LPG are load plan components. The load plan components capture the design time dependencies. There is a load plan component per dimension or fact per phase in general. Load plan components are used as building blocks by LPG to generate a complete load plan. Load plan components are further classified into two categories:
Development components are defined at the grain of fact groups or dimension groups as described earlier. Each component contains one or more references to actual ODI scenarios. Each development component requires all steps for loading a particular dimension or fact group. The order of scenarios is pre-defined during development. Most of the development components are for a specific dimension or fact group. However, there are also those defined for supporting tables like persistent staging or general tables.
System components are defined to capture the phases and ordering of phases. The system components are mostly static, since BIAPPS load phases rarely change. For special cases, the dependencies across dimension groups and fact groups are defined here. Also, inclusion of support tables such as persistent staging table is controlled here. System components contain references to development components.
Generating a Load Plan
As mentioned earlier, LPG is invoked from CM. Generating a load plan is a two step process. First, a user creates a load plan definition using CM. A load plan definition is mainly a list of one or more fact groups per source system which the user desires to load. The user then invokes LPG to generate a load plan based on this load plan definition. When complete, the generated load plan is stored back in a load plan folder in ODI repository.
Load Plan generation logic
So, how does LPG figure out which dimension or fact group component to include in the load plan to be generated? LPG generates a load plan utilizing the following information:
1. Fact tables belonging to selected fact groups
o LPG determines the fact tables and fact group association via an ODI flex field at the fact table level in the ODI data store.
2. Dimensions dependencies to Facts
o The main source of dependency information between dimensions and facts are the foreign key constraints between them. Dimension to dimension dependency (snow-flake) is also supported up to the 2nd level.
3. Staging tables related to Facts and Dimensions
o Staging and warehouse tables are resolved via BIAPPS data model naming standard. For example XXX_DS is the staging table for XXX_D dimension table.
4. Related PS or TMP tables used as source in scenarios
o PS and TMP dependencies are resolved based on which are used as source or lookup table in scenarios required for the dimension or fact group.
5. Keywords in load plan steps for domains and class dimensions
o An additional resolution is through the use of keywords in load plan steps which are used mainly for resolving steps within class dimensions or domains. These are generic tables used across facts / dimensions and the keywords help associate as subset task for these tables to specific facts or dimensions.
For table maintenance, additional steps are added in the generated load plan to perform truncate, index management, and table analysis based on pre-defined settings and table types. We will cover this in more detail in a follow up post.
For multi-source support, the user can create a load plan definition
containing one or more source system for a fact particular fact group. There are 3 sequencing options in this
release for multi-source support:
1. Extract-Load, Extract-Load, Extract-Load (EL, EL, EL) – LPG generates separate load plans containing all phases for each source system. Each load plan is executed serially of each other.
2. Extract, Extract, Extract, Load (E,E,E,L) – LPG generates multiple SDE only load plan for each source and a single SIL/PLP load plan. Each SDE only load plan will be sourcing from a particular adaptor. The SIL/PLP load plan should be executed after all of the SDE load plans. Each load plan is executed serially of each other.
3. Extract-Extract-Extract-Load (E-E-E-L) – LPG generates a single load plan containing multiple SDE phases and one SIL phase. This load plan will simultaneously extract data from different source systems. After the SDE phase completes, the SIL followed by PLP phase ensues.
For more details and information on LPG, please refer to documentation available in OTN.