A follow up on "why does ETL take so long"
By Jean-Pierre Dijcks on Jul 27, 2009
I got this comment from Oliver on the ETL posting: "in my opinion coding or modelling the etl-process is not the most time consuming part of building a data warehouse. The most time is spent on consolidating the dimensions and business definitions across the different source systems and company divisions. Also agreeing on a common definition of the KPIs takes time."
I think this is a very true statement, and it kind of puts the actual building of the technical infrastructure into perspective. As an example, I have been on a project where it took the company over a year to define the actual product hierarchy, and even then there was still massive debate on the outcome.
That said, let's put my comments into perspective. What I was referring to was a three phase/step process to build the DW. First step is the schema design, then we do the ETL and data loading and lastly we do the BI design and build the reports. From the DW side of the house, the actual ETL typically takes the longest to complete. That is what the posting was about...
Now, back to Oliver's thoughts.
There are a couple of things of extreme importance when building a DW:
- Solve a real business problem
- Get executive and business sponsorship
- Come to well defined business definitions (what does revenue mean, what does the product hierarchy look like, what KPIs are you measuring and when are we doing ok, etc...)
If you find that real business problem and you secure the right sponsorship the hard work is just beginning. Most people will realize there is a problem, but not many will understand the solution. Or they get sidetracked with minor details and thus focus on the wrong things.
That is where it makes sense to use prototypes or agile development methods to start to frame the answer for the business users. Use their own data in a simple reporting environment. Keep them focused on the information, not the color scheme or the report layout. Then work your way through the set of definitions and refine the data extraction rules as you go through that data set.
I think that will help define certain parts of the dimensions and business definitions. Seeing is in many cases believing (or discovering issues). It will at least foster understanding and give the users ideas as to what is possible.
With this start it is hopefully easier and faster to come to consensus on business definitions. But keep in mind it is easier and faster, not easy and fast!
Thanks for the comment Oliver!