Rather than entertain you all with my own theories I decided to be a literature critic and discuss a paper / survey done by TDWI's Philip Russom. The report goes into "what a next generation DW platform should try to address and what it could look like". You can read an Oracle sponsored copy here. Do note that the research was sponsored by a large number of vendors - not just Oracle - and that this is just our way of getting a complementary copy to you...
I think there are a couple of interesting outcomes from the report and I'd like to summarize some of it in the following points:
As a true critic, I of course have to say something sour about the report... my sour grape about this report is the question as to whether you would like a DBMS built for transactions as your DW database... Not quite sure what a survey on DW workloads expects to get as an answer. It did surprise me that the DBMS built for DW has flat growth and moderate to good commitment. My conclusion is that this is not necessarily a great discussion to have, as neither seem to be interesting to too many people. I think people focus on finding a platform that qualifies by solving their pain points and/or the above points.
And then of course, to keep on the sour face, the fun discussion on MPP. I think that everyone agrees that large operational data stores (3NF like schemas in many cases) benefit from parallel processing capacities. What we are seeing is that more and more people need more than just full table scans on these schemas. In light of the above trends (and there is more on those in the report) people require multiple workloads on these systems, read consistency without readers blocking writers etc. In other words, a next generation platform needs to deal with both massively parallel queries and with single transaction level reads and writes.
A discussion around MPP versus SMP versus OLTP DB versus DW DB is not very useful I would think. It should be on the issues and merits of a DBMS and platform not on a theoretical argument. And I think most of the report does bring up the right points you should be thinking about while designing what the platform should be doing. I just think that people should keep in mind that theory is not always the best lead argument. Keith has written a long post on a related topic. One other thing, many new DBMS systems have sprung into the DW market, many just featuring a single core feature (compression, columnar storage, in-memory etc.). For a critical system that sees mixed workloads and lots of data, these point solutions may not be the right thing to base an architecture on...
I was also surprised by the Data Quality entry in Good Growth - Good Commitment. While I think that more and more people are paying attention, I still see this as the orphan in the DW environment. Lots of talk, not much action. I just wish it was more action and less talk, because, come on, this is really a problem! So I hope the report is spot on there, but I'm a skeptic (nothing to blame the report for, just me being in this business not seeing too much happening in DQ).
Ok, be done now with the sulking :-)
Couple of things around some new trends and things in Oracle that would help with building an infrastructure for the next generation data warehouse:
So, rather than finding point solutions for all of these challenges as one of the conclusions shows, maybe a solution would be to choose a platform that actually allows you to do what you need from a single vendor... If you are really thinking operational BI, real time DW it is hard to imagine how you are going to do this with multiple DBMS solutions. If the analytics run on a secondary system, you have just created a delay as you now need to update the EDW and then this analytics system. This simply means you cannot actually do real time data warehousing... Loading just to the analytics system means that you are back into data silos.
Concluding, I think most conclusions of the report are spot on. Think Big Data, understand the business drivers, plan for your systems etc. Not so sure a set of separate point solutions is what you need if you are going to do real time data warehousing...
Anyways, do read the report! I think it gives you good food for thought and a great insight in the trends you will be dealing with in your near future.