Information, tips, tricks and sample code for Big Data Warehousing in an autonomous, cloud-driven world

Increase Performance while Reducing Cost

Jean-Pierre Dijcks
Master Product Manager

Exciting times! Well yes the performance piece here is cool and reducing cost is obviously something exciting, but the most exciting thing to happen is that Sinterklaas is about to come and bring us cool presents... December 6th is his birthday and all Dutch folks get nice gifts from him on the evening before his birthday... boy am I glad I'm Dutch :-)

Anywho, you don't want to read about Dutch folklore but about serious matters around performance and reducing cost.

As part of the German Oracle User Group (DOAG) conference in Nuremberg I spent some time debating a presentation about this exact topic. This post summarizes some of those thoughts. The presentation is titled "Top 5 tips on reducing storage cost while improving performance".

The tips are reasonably obvious, so I will just list them and then kind of pick out the interesting pieces:

  1. Think, plan and design
  2. Get appropriate hardware
  3. Design tiered storage
  4. Partition data into smaller chunks
  5. Compress the data

Like I said, these are fairly obvious points, but there is some thought behind them. Step one is always on the list of course, but with the newest hardware - and particularly the price points of components - it will pay off to really think about leveraging new components such as Solid State Disks and most certainly Flash technology.

The below picture shows a little bit of what I mean:

Upward and Downward ILM

In essence what this is saying is that typically the cost is higher for faster storage media (not surprising). However, the cost of those higher speed media is steadily creeping down, making them more feasible. The picture also introduces two variations of Information Lifecycle Management (ILM), which I call upward ILM and downward ILM.

Downward ILM is traditional data management, where older data is placed on lower cost, slower media because access to this data is infrequent and slightly longer response times don't hurt anyone.

Upward ILM is the opposite. Take high value data that is accessed a lot and place it on faster disks or, more interestingly, on faster media like Flash or memory. This will get everyone extremely fast query access to this highly relevant data.
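To make the two directions a bit more concrete, here is a minimal sketch of a placement rule. The function name, the tier labels and every threshold in it are assumptions made up for illustration; they are not Oracle features or recommended values.

```python
def choose_tier(age_days, reads_per_day, hot_reads_per_day=1000):
    """Pick a storage tier for a chunk of data.

    All thresholds are hypothetical; tune them to your own workload.
    """
    # Upward ILM: heavily accessed data is promoted, whatever its age.
    if reads_per_day >= hot_reads_per_day:
        return "flash/memory"
    # Downward ILM: everything else drifts to cheaper media as it ages.
    if age_days <= 7:
        return "fast disk"
    return "low-cost disk"


# A few made-up data sets: (age in days, reads per day)
for age, reads in [(2, 5000), (3, 50), (400, 2000), (400, 1)]:
    print(age, reads, choose_tier(age, reads))
```

Note that the old but heavily read data set (400 days, 2000 reads per day) still lands on the fastest tier. That is the upward part.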

Managing data from both perspectives, not just from a cost perspective, is really what is encapsulated within those first three points in my top 5 tips. Get hardware that allows you to plan for and execute both Upward and Downward ILM. Then tier the storage so you have appropriate performance characteristics across these storage tiers. Plan for all of this from the start (and create budgets and value propositions etc).

The lesson learned, or message taken from this is really: understand the storage technology and hardware available and leverage this accordingly.

The trick to get this done today is to balance all of this in a manner that delivers performance and still keeps the system reasonably priced. Storing all data on SSDs or Flash is going to be really fast and really, really expensive. First of all, you will introduce sticker shock and instantly create antagonism against your proposal. Secondly, you really do not need that performance for all of your data. So balance this out a little and create a data distribution that leverages the hardware, gets great performance and keeps prices in line with the business value delivered.
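A quick back-of-the-envelope calculation shows why the balance matters. This is only a sketch: the tier names, the percentage split and the prices per TB are invented numbers, not real list prices.

```python
# tier name: (percent of total data volume, hypothetical cost per TB)
TIERS = {
    "flash/memory": (5, 20000),
    "fast disk": (35, 4000),
    "low-cost disk": (60, 1000),
}


def tiered_cost(total_tb, tiers):
    """Total storage cost when data is spread over tiers by percentage."""
    return sum(total_tb * pct / 100 * cost_per_tb
               for pct, cost_per_tb in tiers.values())


total_tb = 100
mixed = tiered_cost(total_tb, TIERS)
all_flash = total_tb * TIERS["flash/memory"][1]

print(f"tiered:    {mixed:,.0f}")
print(f"all flash: {all_flash:,.0f}")
print(f"all flash is {all_flash / mixed:.1f}x the tiered cost")
```

With these made-up numbers the all-flash system costs several times more than the tiered one, which is exactly the sticker shock you want to avoid.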

To understand this a little better consider the following example (disclaimer - purely an illustration of a concept, no attempt to tell you that this is the case in your environment):

data distribution over media

This is stating that most of your queries only access a small amount of your total data volume. This may be 60% or 40% or some other number, but a typical system will look a lot at current (for example 1 - 7 day old) data.

This would mean that you have all of these media in one setup and somehow manage the placement of data across them.

Software (and this gets us to tips 4 and 5) allows us to change these numbers and create systems that are less error prone and faster for a good price. Less error prone matters: what happens if your analysis is not correct and people need to wait for a small data set stored on slow storage?

Partitioning data is traditionally a method to speed up the scanning of data in queries (and there are more usages). In its simplest form this means that if I partition data and therefore scan a lot less data to satisfy my query, I can use slower disk and still get adequate performance. BTW, partitioning means Oracle Partitioning but also any other techniques that improve scan rates. Oracle provides things like Storage Indexes and Smart Scans in Exadata which I think are part of the partitioning scheme.

The long and short of it is that partitioning is a downward mobility technology, meaning it allows you to use cheaper and slower media but still achieve good performance.
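The arithmetic behind that claim is simple enough to sketch. The table size, the throughput figures and the weekly partitioning scheme below are assumptions picked purely for illustration.

```python
def scan_seconds(gb_scanned, throughput_gb_per_s):
    """Time to scan a given volume at a given sustained throughput."""
    return gb_scanned / throughput_gb_per_s


TABLE_GB = 1000          # hypothetical table holding one year of data
FAST_GB_PER_S = 2.0      # hypothetical throughput of the fast tier
SLOW_GB_PER_S = 0.5      # hypothetical throughput of the cheap tier
PARTITIONS = 52          # assume one partition per week

# Unpartitioned: the query scans the whole table, even on fast media.
full_scan_fast = scan_seconds(TABLE_GB, FAST_GB_PER_S)

# Partitioned: a query for one week touches only one partition,
# so even the slow tier finishes sooner.
pruned_scan_slow = scan_seconds(TABLE_GB / PARTITIONS, SLOW_GB_PER_S)

print(f"full scan on fast media:   {full_scan_fast:.1f} s")
print(f"pruned scan on slow media: {pruned_scan_slow:.1f} s")
```

Scanning a fraction of the data on media that is four times slower still beats the full scan, which is why partitioning lets you move data down a tier.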

Compression is often seen in the same light. I compress data, this means I can scan it faster, so there you go.

However I would argue that compression is actually upward mobility. If 5% of my capacity is in flash and in memory due to cost considerations, I may not be able to load all data for my 60% of queries into the flash and memory spaces. Compression however allows me to - at least with Oracle - put a lot (10x) more data onto those media. That gives me both a huge cost advantage, in that I do not need to buy more flash, and an incredible speed-up, because flash now serves a lot more queries than it could before applying compression.

Better leverage high performance tiers

The picture above is trying to illustrate this concept. Without spending more on hardware you now have a lot more data available for direct access and high speed response times. This can mean that you can put all data online, but more importantly it can mean that all hot data is in flash and memory and you get ultra fast retrieval rates.
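Here is the same idea as a small calculation. The 5% fast tier and the 10x compression ratio come from the text above; the total volume and the size of the hot data set are invented for the example, and real compression ratios will vary with the data.

```python
def hot_fraction_in_fast_tier(fast_capacity_tb, hot_data_tb, compression_ratio=1.0):
    """Fraction of the (uncompressed) hot data that fits in the fast tier."""
    effective_capacity_tb = fast_capacity_tb * compression_ratio
    return min(1.0, effective_capacity_tb / hot_data_tb)


TOTAL_TB = 100                   # hypothetical total data volume
FAST_TB = TOTAL_TB * 5 / 100     # 5% of capacity in flash and memory
HOT_TB = 30                      # hypothetical volume of frequently queried data

uncompressed = hot_fraction_in_fast_tier(FAST_TB, HOT_TB)
compressed = hot_fraction_in_fast_tier(FAST_TB, HOT_TB, compression_ratio=10)

print(f"hot data in fast tier, uncompressed: {uncompressed:.0%}")
print(f"hot data in fast tier, 10x compressed: {compressed:.0%}")
```

Same hardware, same spend, but with compression all of the hot data in this example fits into the fast tier instead of only a small slice of it.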

The latest developments in hardware, cost and performance combined with the software advances allow us to really reconsider data warehouse storage and performance characteristics. My conclusion would be that it is important to keep track of these new media and start planning on using them. Apply compression and partitioning schemes to move more data into these high performance tiers and get large performance gains at equal hardware costs.

In this case it pays to play around with what is available and define a strategy around the hardware, the software and the business needs. This way you will be able to deliver better performance at lower overall cost to the business.

Comments ( 3 )
  • Gary Sunday, December 6, 2009
    "The long and short of it is that partitioning is a downward mobility technology"..."However I would argue that compression is actually upward mobility"
    But that sort of implies they work against or in opposition to each other, whereas there's no reason they can't work together. I agree with compression allowing upward mobility, so I guess my issue is with tying partitioning to downward mobility. Maybe it is more 'selective' or 'prioritised' mobility?
  • jean-pierre.dijcks Monday, December 7, 2009
    Hi Gary,
    You are right, partitioning will also help with upward mobility. You can pin partitions (the hot ones of course) into these higher speed regions. And yes it will work in conjunction with compression. Actually, both work together in traditional ILM (downward) and upwards. They are complementary.
    I guess that in my quest to find some upward force I zoomed right by the partitioning effect. So good point and thanks for commenting!
  • Jean-Pierre Wednesday, December 9, 2009
    Just noticed this new white paper on the same topic by Mark Townsend:
    It goes into quite a bit of detail, especially on the Automatic Storage Management (ASM) side and on the compression side. Good read!