By Jeffrey Mcdaniel-Oracle on Nov 19, 2015
The questions below are aimed at larger installations but are also applicable to single data source installations. The purpose of thinking through these areas before installing and implementing is
to prepare your environment for expansion, to avoid capturing data you don't require, and to avoid building an environment that does not suit your
business needs and may need to be rebuilt.
#1- How many data sources do I plan to have?
Even if you don't know for sure but think you might have 2 to 3 (for example), plan for it. It is better to overestimate than underestimate: overestimating by a few will leave those slots partitioned and available for future use. Partitioning is required if using multiple data sources.
#2- Partitioning ranges
This can affect performance. Setting a large partitioning range (for example, 6 months) is not recommended if you are capturing a large amount of lower-level historical data (SCDs and Activity History). Partitioning is designed to help performance by querying only the data in the relevant partition, rather than all data, whenever possible. Setting partitions equal to 1 month might be
better for a specific environment based on size and historical capturing frequency.
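To make the trade-off concrete, here is a hypothetical back-of-the-envelope sketch (not Primavera code; the daily row volume is an assumed number). It shows how partition pruning limits the rows a date-bounded query must scan, and why a 1-month partition range can beat a 6-month one for short reporting windows.

```python
# Hypothetical illustration: estimate rows scanned for a date-bounded query
# under different partition ranges. Partition pruning means the database only
# reads partitions overlapping the query's date filter, so smaller partitions
# waste fewer rows. ROWS_PER_DAY is an assumed capture volume.

ROWS_PER_DAY = 500_000  # assumed daily history rows captured

def rows_scanned(partition_months: int, query_days: int) -> int:
    """Rows scanned for a query spanning `query_days`, assuming whole
    partitions of `partition_months` months (~30 days each) must be read."""
    partition_days = partition_months * 30
    partitions_hit = -(-query_days // partition_days)  # ceiling division
    return partitions_hit * partition_days * ROWS_PER_DAY

# A 30-day report: a 6-month partition forces a 180-day scan,
# while a 1-month partition scans only the 30 days needed.
print(rows_scanned(6, 30))   # full 180-day partition scanned
print(rows_scanned(1, 30))   # only a 30-day partition scanned
```

The exact numbers are illustrative; the point is that the scanned volume scales with the partition size, not just the query window.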
#3- Activity Daily History
How many projects have this option turned on? This can heavily affect performance and storage. It is recommended that this be enabled for only a small percentage of projects and then turned off when the need for this level of detail has passed. It also has cascading effects beyond history capture: it turns on slowly changing dimensions (SCDs) for any data related to the project, and SCDs capture every change on objects that support them. By default, Activity History is off. It is recommended to opt in a few projects at a time based on requirements.
#4- What is the P6 Extended Schema data range?
If P6 is your primary data source, what date range is set on the project publication? A larger date range represents a larger set of spread data that would need to be pulled over.
Generally this range is 1 year in the past and 2 years in the future. This means you will have daily spread rows for each day, for each activity and resource assignment spread bucket, in each project that is opted in. That can become a large number of rows quickly. If you have units or costs outside this range, that data is not lost; it still comes over to Analytics,
you just won't have a daily bucket for it. Data outside the range is lumped onto the beginning or end of the date range, so it is still represented in the totals.
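A quick sizing sketch shows how fast those daily spread rows accumulate. The project count and buckets-per-project below are assumed example figures, not product defaults; only the 1-year-back / 2-years-forward range comes from the text above.

```python
# Hypothetical sizing sketch: daily spread rows grow with the publication
# date range, the number of opted-in projects, and the spread buckets
# (activities + resource assignments) per project. All counts are assumed.

from datetime import date

range_start = date(2014, 1, 1)   # example: 1 year in the past
range_end = date(2017, 1, 1)     # example: 2 years in the future
days = (range_end - range_start).days

projects = 200                   # assumed opted-in projects
buckets_per_project = 1_000      # assumed activity + assignment spread buckets

total_rows = days * projects * buckets_per_project
print(f"{days} days x {projects} projects x {buckets_per_project} buckets "
      f"= {total_rows:,} daily spread rows")
```

Even these modest assumptions produce hundreds of millions of rows, which is why the publication date range deserves deliberate planning.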
#5- What are my data requirements, how often do I need my data refreshed and at what level of granularity?
Decisions based on these questions will shape your entire Analytics environment and Primavera
ecosystem. For example, if you have 10 data sources, do you need to report at the activity level and have those analyses up to date every day?
Do you also require global reporting for all 10 data sources but those reports are only run weekly?
A scenario like this would lend itself to each data source having its own individual STAR with its own ETL, plus a master environment created to pull from all
10 data sources on a weekly interval for global reporting. The advantage is that each individual data source can then get an ETL run each day. If all 10 data sources were pumping into the same STAR and each ETL took 2 hours to run, it would take 20 total hours before ETL #1 could be run again. Having this global version (while still pulling out of the same data sources) allows individual sites to have their own ETL rules while still having a consolidated global data warehouse that is updated once a week by each ETL.
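The scheduling math above can be checked with a short sketch. The 2-hour ETL duration comes from the example in the text; everything else is simple arithmetic.

```python
# Back-of-the-envelope check of the ETL scheduling example: 10 sources
# loading one shared STAR must run serially, while per-source STARs
# each run their own ETL independently (with a weekly global consolidation).

ETL_HOURS = 2    # example duration from the scenario above
SOURCES = 10

# One shared STAR: ETLs queue one after another.
shared_star_cycle = ETL_HOURS * SOURCES
print(f"Shared STAR: {shared_star_cycle} hours before ETL #1 can run again")

# Per-source STARs: each site's cycle is just its own ETL,
# so a daily refresh per site fits easily.
per_source_cycle = ETL_HOURS
print(f"Per-source STARs: {per_source_cycle} hours per site per day")
```

The 20-hour serialized cycle leaves almost no headroom in a 24-hour day, which is the core argument for the split architecture.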