Tuesday Apr 22, 2014

ETL Scheduling in a Multiple Data Source Environment

A multiple data source environment is one where the Star schema is populated by either multiple P6 instances or a single P6 instance that has been split into unique sources of data.  You could split a single P6 instance based on specific criteria.  Maybe you want a group of projects under an EPS to be updated in Star only on Fridays, while another group of projects needs to be updated daily.  You could split these into separate ETLs.  See previous blogs for more information on filtering and multiple data sources.

For this blog we are going to cover how the ETLs are executed.  If you have two data sources, you need to run two separate ETL processes at different times.  ETL #1 must be run first and complete before ETL #2 can be started.  You do NOT want to allow both ETL processes to be executed at the same time.  This can be accomplished with a batch process or another queueing mechanism that makes sure ETL #1 completes before ETL #2 is executed.
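As a minimal sketch of that kind of batch/queueing wrapper, the script below simply runs the two ETL commands one after the other and refuses to start ETL #2 if ETL #1 fails.  The installation paths and script names are placeholders, not the actual commands from any particular install; substitute whatever command launches the ETL for each of your data sources.

```python
# Minimal sketch of a sequential ETL runner.  The paths below are
# placeholders -- replace them with the real command that starts the
# ETL for each data source in your environment.
import subprocess
import sys

ETL_COMMANDS = [
    [r"C:\rdb_ds1\staretl.bat"],   # ETL #1 (data source 1) -- placeholder path
    [r"C:\rdb_ds2\staretl.bat"],   # ETL #2 (data source 2) -- placeholder path
]

def run_sequential(commands):
    """Run each ETL in order; stop if one fails so the next never starts."""
    for cmd in commands:
        print("Starting:", " ".join(cmd))
        result = subprocess.run(cmd)          # blocks until this ETL finishes
        if result.returncode != 0:
            print("ETL failed with exit code", result.returncode, "- aborting queue.")
            sys.exit(result.returncode)
        print("Completed:", " ".join(cmd))

if __name__ == "__main__":
    run_sequential(ETL_COMMANDS)
```

Scheduling this one wrapper, instead of scheduling the two ETLs separately, is one way to guarantee the runs stay sequential.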

If the ETLs were run at the same time you could see data issues because they share staging tables.  While the data in the facts and dimensions is contained in rows that are unique to each data source, the staging tables are not.  If both ETLs were running at the same time the staging data could be clobbered, and that clobbered data could then be pulled into the rows for an existing data source.

To help control this problem, a new web configuration utility was created in P6 Reporting Database and P6 Analytics 3.3.  It provides a queueing mechanism that prevents ETLs from running at the same time.

You can set up a separate tab for each ETL and define a schedule for each one.  The ETLs will then queue up and be displayed on the home tab, which shows the running and queued ETLs.  They can also be Stopped or Removed from the queue.  The main takeaway is that in multiple data source environments the ETLs run sequentially, not in parallel.

Monday Apr 07, 2014

Handling Codes and UDFs in a Multiple Data Source Environment

In a single data source environment, codes and UDFs are quite easy.  Log into the configuration utility, select your codes and UDFs, and run your ETL process; as long as you don't exceed the default number of columns in your RPD, the codes and UDFs show up in OBI.  In the configuration utility, the list of codes and UDFs presented to you is populated by reading directly from the P6 Extended Schema you provided connection information for.  When you make your selections, this list is written to your .properties file in the <rdb installation>\res directory.  During the ETL process these selections are read from the .properties file and inserted into staging tables and eventually dimensional tables in the STAR schema.

One thing to note about staging tables is that there is only one set.  In a multiple data source environment the staging tables are shared, meaning they are overwritten during each ETL run.  This is the main reason why, in a multiple data source environment, ETL processes cannot be run at the same time: one ETL process has to finish before the next can be run.  That leads us to how to handle codes and UDFs in a multiple data source environment.  Say you make your code selections for data source 1 and run your ETL process, then make your code selections for data source 2 and run the ETL process.  The selection you made for data source 2 is now your final selection; the values and order you chose for data source 1 have been overwritten.

Accommodating multiple data sources for codes and UDFs requires a little coordination.  Let's say you have 10 codes from your 1st data source and 10 codes from your 2nd data source that you would like to use in P6 Analytics in OBI.  Run the configuration utility for data source 1 and choose your 10 codes using slots 1-10.  Go to your \res folder, make a copy of the staretl.properties file, and save it to a different location (ds1staretl.properties).  We're going to come back later to use this staretl.properties file for data source 1.  Now, for data source 2, log into the configuration utility.  Choose your codes; slots 1-10 don't really matter, but choose codes 11-20 as the ones you want to represent this data source in OBI.  Make a copy of this staretl.properties file and save it in a different location (ds2staretl.properties).  You are saving these backups so you can easily rebuild the codes list at a later time if runSubstitution or the configuration utility is executed.
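If you would rather script that backup step than copy the files by hand, a small sketch is shown below.  The source and backup paths are placeholders for your own <rdb installation>\res folders and backup location.

```python
# Sketch of the backup step described above: copy each data source's
# staretl.properties to a safe location before the codes sections are merged.
# The paths are placeholders -- point them at your own installations.
import shutil
from pathlib import Path

BACKUPS = {
    r"C:\rdb_ds1\res\staretl.properties": r"C:\backups\ds1staretl.properties",
    r"C:\rdb_ds2\res\staretl.properties": r"C:\backups\ds2staretl.properties",
}

for source, backup in BACKUPS.items():
    Path(backup).parent.mkdir(parents=True, exist_ok=True)  # make sure the backup folder exists
    shutil.copy2(source, backup)                             # keep timestamps along with content
    print(f"Backed up {source} -> {backup}")
```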

Now you have 2 properties files.  These 2 files contain all the codes you are going to use; we just need to combine them into 1 file.  Go to data source 1's properties file and change the lines for codes 11-20 to match codes 11-20 from data source 2's properties file.  Find code 11 in data source 2's properties file, copy its three lines, and paste them where code 11 is in data source 1's properties file, or if there is no code 11, just add it below the other codes.  Do this for the rest of the codes in data source 2 until data source 1's properties file contains all the codes you want from data source 2.

Now copy the whole codes section from this data source 1 properties file and overwrite the codes section in data source 2.  You will do this for each type of code (project, activity, resource).  Together this section is the 'master copy' of your codes.  Do not run the configuration utility on either data source 1 or 2 after you have put this list together, or it will overwrite it.  If this 'master copy' is overwritten you will need to coordinate and create it again, so be careful when and where the configuration utility is run after this point.
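The copy-and-paste merge above can also be scripted once you know how the code entries are keyed in staretl.properties.  The sketch below is only an illustration under assumptions: the key prefix shown is hypothetical, so open your own properties file, substitute the prefix your code entries actually share, verify the merged section by eye afterward, and repeat for each code type (project, activity, resource).  It combines the code lines from the two backups, lets data source 1's entries win where slots overlap, and writes the resulting 'master copy' into both live properties files.

```python
# Sketch of scripting the 'master copy' merge.  The key prefix is hypothetical --
# check your own staretl.properties for the prefix the code entries actually use
# (each selected code occupies a small group of lines sharing that prefix).
from pathlib import Path

CODE_KEY_PREFIX = "star.project.code."          # hypothetical prefix; verify in your file
DS1_BACKUP = Path(r"C:\backups\ds1staretl.properties")
DS2_BACKUP = Path(r"C:\backups\ds2staretl.properties")
LIVE_FILES = [
    Path(r"C:\rdb_ds1\res\staretl.properties"),
    Path(r"C:\rdb_ds2\res\staretl.properties"),
]

def code_entries(path):
    """Return {key: full line} for every code line in a properties file."""
    entries = {}
    for line in path.read_text().splitlines():
        if line.startswith(CODE_KEY_PREFIX) and "=" in line:
            key = line.split("=", 1)[0].strip()
            entries[key] = line
    return entries

# Start with data source 2's entries (slots 11-20), then overlay data source 1's
# (slots 1-10) so DS1 wins wherever the slot keys overlap.
merged = code_entries(DS2_BACKUP)
merged.update(code_entries(DS1_BACKUP))
master_block = [merged[k] for k in sorted(merged)]

# Replace the codes section in each live properties file with the master copy.
for live in LIVE_FILES:
    kept = [l for l in live.read_text().splitlines() if not l.startswith(CODE_KEY_PREFIX)]
    live.write_text("\n".join(kept + master_block) + "\n")
    print(f"Wrote {len(master_block)} code lines into {live}")
```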

If you have a scenario where the same codes exist in data source 1 and data source 2, then this process is much easier: you only need to make the 'master copy' in one of the data sources and then add it to the other properties files.  If the data sources are truly unique and there are codes that don't exist in one or the other data source, you must follow the steps above.  Do not worry that when the ETL process runs for data source 1 it might not have the codes for data source 2 that are now defined in the properties file.  This just adds them to the staging tables and the mappings OBI uses for the data associated with data source 2.  When you use such a code with data from data source 1 it will not return any results, just like any other project or activity that is not associated with a chosen code.

