Data Warehouse Vendor Comparison
By Jean-Pierre Dijcks on Apr 29, 2009
I spent some time looking at various vendors and their offerings and tried to compare them with the HP Oracle Database Machine from a customer perspective. In other words, what would I be looking for if I were evaluating my data warehouse infrastructure?
The following is an excerpt from a new white paper now posted here:
Criteria to look at for a data warehouse platform
Agility – Can your system deal with changing requirements and with mixed workloads that include loading data while querying, and still return the right answers? Can you easily switch to real-time data loading for some of your data and support operational needs in your business?
Enterprise Readiness – Does your infrastructure provide the functionality to always keep your business running? Is the infrastructure delivering the security you need, in terms of who can see what data, but also in terms of disaster recovery and protection against fraudulent manipulation of data? Is your system running when you need it, and does the infrastructure deliver maximum availability to run your business 24x7?
Appliances and How They Stack Up Against Our System Criteria
Most, if not all, of these new, small vendors rely on proprietary platforms that throw massive hardware at the data growth problem. As a business tactic they then benchmark a single query as proof of their superiority over the current data warehouse database. While claims of 100 times faster sound impressive, it is not actually that useful to run such a “benchmark”.
First of all, the benchmark is not comparing apples to apples. Often an old system, typically undersized or out of balance, is compared with the latest and very much oversized appliance hardware for this benchmark.
Second, the response times on the incumbent system are almost always measured under full user and query loads, for example with hundreds of users querying the system while the trickle-feed ETL process is running. The so-called super-fast appliance runs only a single query with only a single user on static data.
Third, most of the current appliances are really one-trick ponies. The trick is to use massive hardware resources to solve a problem with brute force. Long-running data warehouse queries will benefit from this approach, hence the benchmark results. The real problems come into play when the appliance has to ensure that the data is secure, or when a mixed workload has to be run. Now, all of a sudden the appliance starts to run into problems. Write processes create dirty data polluting queries with incorrect results. Locks prevent users from completing their query, or even from starting one. As more and more users come on-line, contention for resources starts to become an issue, quickly reducing the effectiveness of the system.
What is needed for these workloads is a full solution consisting of brainy database software on top of brawny hardware, not just a lot of hardware.
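To make the second point concrete, here is a toy model of why a single-query, single-user number says little about a loaded system. It is only a sketch under stated assumptions: the function names, the idea of a fixed number of parallel "slots", and every figure are hypothetical and are not measurements of any real product.

```python
def query_latencies(users, slots, service_time):
    """Latency seen by each user when all of them submit one query at once.

    Toy model (assumption): the system runs `slots` queries in parallel,
    every query needs `service_time` seconds of work, and waiting queries
    are served in arrival order.
    """
    return [(i // slots + 1) * service_time for i in range(users)]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Hypothetical numbers, chosen only for illustration.
single_user = mean(query_latencies(users=1, slots=4, service_time=2.0))
full_load = mean(query_latencies(users=200, slots=4, service_time=2.0))
slowdown = full_load / single_user

print(f"single user: {single_user:.1f}s, 200 users: {full_load:.1f}s, "
      f"slowdown: {slowdown:.1f}x")
```

In this model the same machine looks many times slower under full load than in the single-query run, which is why the two numbers should not be compared directly.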
Ease of Deployment
Another angle often claimed as an advantage for an appliance is the ease of deployment. As a customer you purchase a completely configured hardware and software solution, wheel it into the data center and start working on it. While this is probably true for most of the appliances to some extent, what comes next is not so pretty.
Your system administrators and the DBAs will all of a sudden have to deal with different hardware, different database software and maybe even a different operating system than the ones they use for all your other systems. As the novelty wears off, these appliances soon become orphans of the data center.
With data centers becoming leaner, these orphans will require a set of new skills and specialists to maintain them. Probably not what your cost-conscious CIO wants to hear.
Cost of Upgrading
While it may be easy to deploy an appliance in your data center initially, once it is time to upgrade – due to that ever-present data growth – you may get stuck with an interesting invoice. Most appliances do not come with perpetual licenses, and most require a forklift upgrade, which means that you will need to buy new licenses for the new machine. So you do not just end up buying new hardware, you also re-purchase the software you already own.
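The arithmetic behind that invoice can be sketched in a few lines. All names and dollar figures below are hypothetical and only illustrate the shape of the calculation; they are not actual vendor prices.

```python
def upgrade_invoice(hardware, licenses_to_buy):
    """Total upgrade invoice: new hardware plus any licenses that must be bought."""
    return hardware + licenses_to_buy


# Hypothetical figures, for illustration only.
new_machine = 800_000         # replacement hardware for the forklift upgrade
existing_licenses = 500_000   # software already paid for on the old machine
growth_licenses = 100_000     # licenses for the added capacity only

# Forklift upgrade where existing licenses do not carry over to the new machine.
forklift = upgrade_invoice(new_machine, existing_licenses + growth_licenses)

# Same hardware purchase if the existing licenses could be carried over.
carry_over = upgrade_invoice(new_machine, growth_licenses)

# The difference is the software you pay for a second time.
repurchased = forklift - carry_over

print(f"forklift: ${forklift:,}, with carry-over: ${carry_over:,}, "
      f"re-purchased software: ${repurchased:,}")
```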
How Viable Is the Company
Data warehouse appliance vendors have been popping up like mushrooms. A lot of these vendors are very small, with few real customers. Their research and development, support capabilities and overall viability are in a different league from those of market leaders like Oracle. In today’s economy, investing a multi-million-dollar sum in one of these orphans of the data center carries the serious risk of seeing your critical system running on unsupported hardware and software.
When comparing [the] unique capabilities of Oracle Database 11g and the HP Oracle Database Machine with some of the vendors in the data warehouse market [see graph], we see a distinct segmentation. The so-called general-purpose database offerings from Oracle and others deliver a much more universal value proposition. Appliances like Netezza really are one-dimensional offerings and, even in that claimed specialty, fall short of the market leaders.
As I said, to read more about all of this, take a look at the paper on OTN.