
Enabling Real-Time Analytics With Database In-Memory

Star Schema Performance Challenge - Part 1

Over the last 20 years or so I’ve had the opportunity to help build a variety of business intelligence solutions for large organizations. It will be no surprise that one of the core requirements is fast query response times. The
reason seems pretty obvious: faster response times lead to greater end-user satisfaction. The
performance of a database can also determine what types of applications are practical, and breakthroughs in performance
often lead to innovative new applications that change how people work. A more efficient application is also usually less
expensive to operate, since it may not require as large a server.

The combination of Oracle Database In-Memory and the availability of more powerful servers
with many cores and plenty of memory really is a game changer when it comes to improving query response times. With Database In-Memory, far fewer CPU cycles are used to scan, filter, join and aggregate data. With modern servers there are plenty of CPU cycles available for processing, and DRAM is relatively inexpensive. So Database In-Memory can do more with less, and since it is a feature of the Oracle Database that works with all of the other performance-enhancing features of the database, it has become yet another tool in my big toolbox of performance-improving techniques.
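To give a flavor of what this looks like in practice, here is a minimal sketch of enabling the In-Memory column store for a table. The table name and the sizing value are just placeholders, not the configuration I will actually use in this challenge.

    -- Size the In-Memory column store (a restart is required); the value here is only an example
    ALTER SYSTEM SET inmemory_size = 200G SCOPE=SPFILE;

    -- Mark a table for population into the column store
    ALTER TABLE sales INMEMORY MEMCOMPRESS FOR QUERY LOW PRIORITY HIGH;

    -- Check which segments have been populated and how much remains
    SELECT segment_name, populate_status, bytes_not_populated
    FROM   v$im_segments;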

Recently, I was presented with a challenge. Could I provide extremely
fast query response times for a large number of concurrent users running
an ad-hoc business intelligence workload against a database on a single
server? More specifically, at least 100 concurrent users querying a 500
million row table with an average response time of just a few seconds.

This is the first post in a series where I will walk you through my solution to this performance challenge. I’m actively working on the challenge as I
write this first post, so I’m not exactly sure how
the numbers will work out, but I’m pretty confident I’ll be successful. And you can be sure that Database In-Memory
will be at the core of the solution!

I have
a plan in mind, but perhaps some of you will have some suggestions. Feel free to let me know via the comment section below if you do.

Here’s what I know today:

  • I have a star schema with a 500 million row fact table and 9 dimension tables. There
    are 5 years of data in the Sales fact table (by day), about 3,000 items in a product
    dimension table, about 100,000 customers, a handful of distribution channels and some
    demographic attributes such as income and age. (A sketch of illustrative DDL for this
    schema appears just after this list.)

[Figure: Star Schema]


  • There are at least 100 users
    actively querying this data set at any given time. They are an active group of users,
    issuing a query on average once every 5 seconds.
  • The workload is diverse. There are business intelligence dashboard-type queries
    (often high-level aggregations), slice-and-dice ad-hoc queries and anything
    in-between. Queries can use any subset, or all, of the dimension tables. In total,
    over 50,000 queries will be run.
  • I can do anything I want
    to optimize the application. I can
    specify how data is organized and physically stored, how the queries are
    formed and what features of the Oracle Database I use. I must, however, use features that are
    available to and easily implemented by anyone.
  • I will be running a single database instance on a Sun
    X4-4
    with 60 cores, 1 TB of DRAM and spinning disks.

[Figure: Sun X4-4]
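To make the schema a little more concrete, here is a rough sketch of what the core tables could look like. The table and column names below are my own illustrative shorthand for the schema described above, not necessarily the actual test schema.

    -- Illustrative shorthand for the star schema described above (all names are assumptions)
    CREATE TABLE product (
      product_id    NUMBER        PRIMARY KEY,
      product_name  VARCHAR2(60),
      category      VARCHAR2(30)
    );

    CREATE TABLE customer (
      customer_id   NUMBER        PRIMARY KEY,
      income_band   VARCHAR2(20),
      age_band      VARCHAR2(20)
    );

    CREATE TABLE channel (
      channel_id    NUMBER        PRIMARY KEY,
      channel_name  VARCHAR2(30)
    );

    -- The 500 million row fact table, one row per item sold per day
    CREATE TABLE sales (
      sale_day      DATE          NOT NULL,
      product_id    NUMBER        NOT NULL REFERENCES product,
      customer_id   NUMBER        NOT NULL REFERENCES customer,
      channel_id    NUMBER        NOT NULL REFERENCES channel,
      quantity      NUMBER,
      amount        NUMBER
    );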

I’m going to step through various designs and optimizations,
documenting the improvements along the way. I’m going to start with a baseline using only features that are
available in Oracle 11.2. That, of
course, means the baseline will not use Database In-Memory.

Since I can organize the data and write the queries any way
I want, I’m going to use a prescriptive approach to this challenge. My implementation will be designed to be CPU
efficient (that is, use as few CPU cycles as possible) and to avoid IO and other
wait events. I will decide what SQL
execution plan I want first and design the implementation around that. This is usually a much better approach than trying
to compensate for an inefficient schema or queries that do not result in the best
plan.
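As an example, a typical slice-and-dice query against a star schema like this might look something like the query below, again using my illustrative table names from the earlier sketch. Before settling on the physical design, I want to see the plan the optimizer chooses for queries of this shape.

    -- A hypothetical ad-hoc query of the kind described in the workload above
    EXPLAIN PLAN FOR
    SELECT   p.category,
             c.channel_name,
             SUM(s.amount) AS total_sales
    FROM     sales s
             JOIN product p ON p.product_id = s.product_id
             JOIN channel c ON c.channel_id = s.channel_id
    WHERE    s.sale_day BETWEEN DATE '2010-01-01' AND DATE '2010-12-31'
    GROUP BY p.category, c.channel_name;

    -- Examine the chosen execution plan
    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);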

Let me know if you have thoughts. In my next post, I’ll share my starting point
and baseline results.
