Workload Management for Operational Data Warehousing

After spending some time discussing some of the new parallel features (like AutoDOP and Statement Queuing), it is about time to put these features in a larger context. That context is managing a diverse set of workloads, with varying degrees of parallelism, for the various actions on the data.

A lot can be said about setting up workload management, so expect this to be the first post in a series on the topic. Since it is the first post, let's start at the beginning: understanding workloads and a framework for setting up such a management infrastructure.

Continuous Improvements

Workload management - the understanding and management of a set of diverse workloads on a system - is really an ecosystem with many participants. It is ever-changing and therefore one of those things in life that will always be in motion. As the workload changes, or as the environment in which the workload runs changes, adjustments will be required to ensure everything keeps running smoothly.

[Figure: the continuous workload management cycle - define the plan, run the workloads, monitor, and adjust]

At a high level, the cycle of continuous improvement begins with the definition of a workload plan. That definition should be based on a clear understanding of the actual workloads running on the system (more later on the questions to ask). The plan is then put to the test when the workloads run on the system, and your main task becomes monitoring and adjusting. Adjusting - if all goes well and your plan is reasonable - is mostly required in the beginning, while the plan is being fine-tuned.

Once the system stabilizes and all small exceptions to the plan are corrected, your main task is to monitor. The whole cycle repeats itself upon changes to the workloads or to the system. It is crucial that major changes are planned for, and not simply resolved with an ad-hoc adjustment.

Planning your Solution

To start creating effective workload management solutions, it is crucial to understand the phases shown in the picture above.

Understand the Workload

To understand the workload for your given system you will need to gather information on the following main points:

Who is doing the work? - Which users and which applications are running workloads?

What types of work are done on the system? - Are these workloads batch, ad-hoc, or resource intensive (and which resources), and are they mixed or separated in some form?

When is each type of work being done? - Are there different workloads at different times of day, and different priorities during different time windows?

Where are the performance problem areas? - Are there specific issues in today's workload, why are those problems there, and what is being done to fix them?

What are the priorities, and do they change during a time window? - Which of the various workloads are most important, and when?

Are there priority conflicts? - And if so, who is going to make sure the right decisions are made on the real priorities?

Understanding the workload is a crucial phase! If your understanding is incorrect, your plans are incorrect, and you will see issues popping up during the initial runs of the workload. A poorly understood workload might even drive you back to square one and cause a lot of issues when the system (like an operational DW) is mission critical.
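
A good way to start answering the "who" and "what" questions is Active Session History. The sketch below breaks down recent activity by user and application module; it assumes you are licensed for the Diagnostics Pack, and the one-hour window is just an illustrative choice:

```sql
-- Rough breakdown of recent activity by database user and module.
-- ASH samples active sessions about once per second, so the sample
-- count is a proxy for time spent (Diagnostics Pack required).
SELECT u.username,
       h.module,
       COUNT(*) AS active_samples
FROM   v$active_session_history h
       LEFT JOIN dba_users u ON u.user_id = h.user_id
WHERE  h.sample_time > SYSDATE - 1/24   -- last hour
GROUP  BY u.username, h.module
ORDER  BY active_samples DESC;
```

For a longer view, the same breakdown can be run against DBA_HIST_ACTIVE_SESS_HISTORY in AWR.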

Creating and Implementing the Plan

Now that you know (in detail) the characteristics of your workload, you can start to document it all and then implement the plan. As with all systems, it is recommended to document the details of, and the reasoning behind, the decisions. From the documented plan you would then create (already in Oracle speak; more later on the products):

Create the required Resource Plans:

For example: Nighttime vs. daytime, online vs. offline

Create the resource groups:

Map to users, application context or other characteristics
Map based on estimated execution time
Etc
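
In Oracle speak, groups and mappings are created with the DBMS_RESOURCE_MANAGER package. A minimal sketch (the group, user and module names are illustrative):

```sql
BEGIN
  DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();

  DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP(
    consumer_group => 'ETL_GROUP',
    comment        => 'Nightly batch loads');
  DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP(
    consumer_group => 'ADHOC_GROUP',
    comment        => 'Ad-hoc queries');

  -- Map sessions by database user ...
  DBMS_RESOURCE_MANAGER.SET_CONSUMER_GROUP_MAPPING(
    attribute      => DBMS_RESOURCE_MANAGER.ORACLE_USER,
    value          => 'ETL_USER',
    consumer_group => 'ETL_GROUP');
  -- ... or by application module
  DBMS_RESOURCE_MANAGER.SET_CONSUMER_GROUP_MAPPING(
    attribute      => DBMS_RESOURCE_MANAGER.MODULE_NAME,
    value          => 'AdhocReporting',
    consumer_group => 'ADHOC_GROUP');

  DBMS_RESOURCE_MANAGER.VALIDATE_PENDING_AREA();
  DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
END;
/
```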

Set the overall priorities:

Which resource group gets the most resources (per plan/window) for IO, CPU and parallel processing
Cap the maximum utilization that sessions in a group can consume on the system
Etc
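
Priorities are expressed as plan directives. A sketch of a daytime plan that favors online work (plan and group names are illustrative, and every plan needs a directive for OTHER_GROUPS):

```sql
BEGIN
  DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();

  DBMS_RESOURCE_MANAGER.CREATE_PLAN(
    plan    => 'DAYTIME_PLAN',
    comment => 'Favor online work during the day');

  -- 70% of the resources at level 1 for online/ad-hoc work
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    plan             => 'DAYTIME_PLAN',
    group_or_subplan => 'ADHOC_GROUP',
    comment          => 'Online users first',
    mgmt_p1          => 70);

  -- Batch gets 30% at level 1
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    plan             => 'DAYTIME_PLAN',
    group_or_subplan => 'ETL_GROUP',
    comment          => 'Batch throttled during the day',
    mgmt_p1          => 30);

  -- Mandatory catch-all for unmapped sessions
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    plan             => 'DAYTIME_PLAN',
    group_or_subplan => 'OTHER_GROUPS',
    comment          => 'Everything else',
    mgmt_p2          => 100);

  DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
END;
/
```

Utilization caps are set on the same directives (MAX_UTILIZATION_LIMIT in 11.2; the attribute is named UTILIZATION_LIMIT in later releases).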

Create thresholds:

Estimated execution times to determine in which group to run
Reject sessions if too much time, CPU or IO is required (both estimated and actual)
Downgrade (or upgrade) based on resources used
Set queuing thresholds for parallel statements
Etc
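
The threshold knobs above are also directive attributes. A sketch reusing the hypothetical plan and group names from earlier (the values need tuning for your system, and exact attribute semantics vary slightly by release):

```sql
BEGIN
  DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();

  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    plan              => 'DAYTIME_PLAN',
    group_or_subplan  => 'ADHOC_GROUP',
    comment           => 'Demote or reject expensive ad-hoc work',
    -- reject statements estimated to run longer than 1 hour
    max_est_exec_time => 3600,
    -- downgrade a session to a lower-priority group
    -- after 300 seconds of CPU time
    switch_group      => 'LOW_GROUP',
    switch_time       => 300,
    switch_estimate   => FALSE);

  DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
END;
/
```

Queuing thresholds for parallel statements are controlled the same way (for example PARALLEL_QUEUE_TIMEOUT, available from 11.2.0.2 onwards).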

Create throttles:

Limit the number of active sessions
Limit degrees of parallelism
Limit the maximum CPU, IO that can be allocated
Etc
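
Throttles, too, live on the plan directive. A sketch (illustrative names and values) limiting batch concurrency and parallelism:

```sql
BEGIN
  DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();

  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    plan                     => 'DAYTIME_PLAN',
    group_or_subplan         => 'ETL_GROUP',
    comment                  => 'Throttle batch concurrency and DOP',
    -- at most 10 concurrently active sessions; the rest queue up
    active_sess_pool_p1      => 10,
    -- give up queuing after 10 minutes
    queueing_p1              => 600,
    -- cap the degree of parallelism of any single statement
    parallel_degree_limit_p1 => 8);

  DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
END;
/
```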

The above is just a small sample of the things to consider when putting your plan into action, and it is mostly focused on Database Resource Manager and IO Resource Manager (IORM is Exadata only!). Also consider working with Services and Server Pools, and when you run several databases on one system, consider instance caging.
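
Instance caging itself is just two settings per instance; a sketch with illustrative values:

```sql
-- Cage this instance to 8 CPUs; a resource plan must be active
-- for the cage to be enforced.
ALTER SYSTEM SET cpu_count = 8 SCOPE = BOTH;
ALTER SYSTEM SET resource_manager_plan = 'DAYTIME_PLAN' SCOPE = BOTH;
```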

Monitoring and Adjusting

Last but not least, you put the plan into action and monitor your workloads. As the system runs, you will adjust the implemented settings. Those adjustments come at various levels:

System Levels:

Memory allocations
Queuing Thresholds
Maximum Parallel Processes running
Server Pools
Etc
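
System-level adjustments are mostly init parameter changes; for example (illustrative values):

```sql
-- Raise the target at which parallel statement queuing kicks in
ALTER SYSTEM SET parallel_servers_target = 128;
-- Hard ceiling on the number of parallel execution processes
ALTER SYSTEM SET parallel_max_servers = 256;
```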

Resource Management Settings:

Increase or Decrease throttles and thresholds
Change the queuing guidelines
CPU and IO levels
Etc
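
Resource Manager settings are adjusted in place with UPDATE_PLAN_DIRECTIVE; a sketch, again using the hypothetical names from earlier:

```sql
BEGIN
  DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();

  -- e.g. loosen the batch throttle from 10 to 15 active sessions
  DBMS_RESOURCE_MANAGER.UPDATE_PLAN_DIRECTIVE(
    plan                    => 'DAYTIME_PLAN',
    group_or_subplan        => 'ETL_GROUP',
    new_active_sess_pool_p1 => 15);

  DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
END;
/
```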

All of these adjustments should be minor tweaks... if major changes are required, you should consider going back to the drawing board and understanding what the issues with your plan are.

Products Used

In this case we are focusing on the database environment, and we are leveraging components that are part of RAC (Services and Server Pools), the Database and IO Resource Managers, and Enterprise Manager to monitor the workloads. To study your workload you will be looking at AWR, for example.
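
To see whether the plan behaves as intended, the Resource Manager statistics views are a good start; a sketch (the column list is indicative, check the reference for your release):

```sql
-- Per-consumer-group resource consumption since instance start
SELECT name,
       active_sessions,
       consumed_cpu_time,
       cpu_waits,
       cpu_wait_time,
       queue_length
FROM   v$rsrc_consumer_group
ORDER  BY consumed_cpu_time DESC;
```

DBA_HIST_RSRC_CONSUMER_GROUP keeps the same statistics historically in AWR.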

The next post will cover these products and how they all relate. Hopefully I get that one done sooner rather than later...

Comments:

What are some of the exceptions in the continuous process?

Posted by walter burciaga on September 02, 2010 at 11:10 AM PDT #

Hi Walter, Not sure I understand the question. It is effectively a process to fine-tune the system to a set of workloads. Once it is all running and humming you stop and just monitor for exceptions. So I guess there should be an arrow out when it is all running well. At that point you are just managing exceptions and no longer spend a lot of time on the process itself. Does that make sense? JP

Posted by jean-pierre.dijcks on September 07, 2010 at 01:52 AM PDT #

About

The Data Warehouse Insider is written by the Oracle product management team and sheds light on all things data warehousing and big data.
