Sunday Nov 15, 2009

Sizing an RTD installation - Part 2

Now that we have the expected throughput in terms of the number of requests per second, let's look at other sizing factors.

Response time - sometimes the volume of requests is not smoothly distributed and there may be peaks of requests arriving at the same time. If there are strict response time requirements, such as an average below 30ms with a maximum of 60ms for 99% of the requests, then we need to consider the maximum number of requests that will be processed in parallel. To achieve the highest performance for the highest number of requests, we design for between 3 and 6 requests being processed in parallel per CPU core or hardware hyperthread.
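As a rough illustration of the 3 to 6 requests per core guideline, the sketch below turns a peak number of parallel requests into a range of core counts. The function name and the peak of 120 parallel requests are hypothetical values chosen for this example; this is only one reading of the guideline, not a complete sizing method.

```python
import math

def cores_needed(peak_parallel_requests: int, requests_per_core: int) -> int:
    """Cores (or hardware hyperthreads) needed so that no core handles
    more than requests_per_core requests in parallel."""
    return math.ceil(peak_parallel_requests / requests_per_core)

# Hypothetical peak: 120 requests being processed at the same time.
peak = 120

# The guideline gives a range: 3 per core is the conservative end,
# 6 per core is the aggressive end.
conservative = cores_needed(peak, 3)
aggressive = cores_needed(peak, 6)

print(f"Between {aggressive} and {conservative} cores for {peak} parallel requests")
```

The stricter the response time requirement, the closer to the 3 per core end of the range one would presumably want to stay.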

Session Initializations - when a session is initialized, a few extra things happen compared with the requests that come after initialization. First, depending on whether the RTD server manages session affinity, a new entry is created in the sessions table, which typically requires at least one database write. Additionally, the in-memory session is typically filled from the configured data sources. The speed of these operations is driven entirely by the performance of the source databases. If an application has many more session initializations than other types of messages, then the throughput may be affected even though the total number of requests is not too high for the configuration.

Single Point of Failure and High Availability - in most cases the system is configured to provide High Availability (HA) and resiliency to server failure or unavailability (for example, during rolling maintenance). RTD is typically configured with a number of servers to avoid a single point of failure. Sometimes it is also configured with multiple sites for HA and Disaster Recovery (DR). In this context it is important to consider the option of relying on default responses to cope with outages of the RTD servers. I know of one RTD server that has been working since 2005 and has been down for maintenance for only a few hours in total since it started.

In the next entry we will finally talk about the sizing of the servers.

Friday Nov 13, 2009

Sizing an RTD installation - Part 1

In every implementation of RTD it is necessary to determine the hardware configuration to support the expected loads of RTD applications. While we try to provide guidelines and generalizations, it helps to understand the most significant factors that affect the desired hardware configuration. In a series of blog entries we describe the different factors that need to be considered.

Throughput

The first factor to consider is the expected load, in terms of number of events per second, that the servers will need to deal with. These events have different types and therefore may cause different loads into the servers.

Estimating the number of events per second usually begins at some given metrics. Examples of typical metrics include:

  • Web site pages served per second/day/month
  • Web site [unique] visitors per month
  • Web site visits/sessions per day
  • Call Center calls per day
  • Average call length
  • Maximum number of concurrent agents
  • IVR calls handled per day

The first thing to do with these metrics is to translate them to "per second" numbers. The translation from large time periods, like months, cannot be done by directly dividing by the number of seconds in a month, because there are typically busier days and busier hours of the day.

Some rules of thumb that I have found to result in numbers that are pretty close to reality for a wide variety of situations are as follows:

  • Monthly numbers can be divided by 10 to produce the numbers for a busy day
  • Daily numbers can be divided by 10 to produce the numbers on a busy hour
  • Hourly numbers are divided by 3000 (or sometimes 2000) to produce the number per second
  • If number of pages per visit is unknown, 10 to 15 can be assumed for many sites
  • If call length is unknown, 5 minutes can be assumed
  • Dividing the number of concurrently active agents by the length of a call (in seconds) gives the number of call starts per second
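The rules of thumb above can be sketched as a few small helper functions. This is only an illustration: the function names are hypothetical and not part of any RTD API, and the divisors and defaults are the ones listed above.

```python
# Illustrative helpers for the rules of thumb above.
# All function names are hypothetical; none of this is an RTD API.

def busy_day_from_month(monthly: float) -> float:
    """Monthly volume divided by 10 approximates a busy day."""
    return monthly / 10

def busy_hour_from_day(daily: float) -> float:
    """Daily volume divided by 10 approximates a busy hour."""
    return daily / 10

def per_second_from_hour(hourly: float, divisor: int = 3000) -> float:
    """Hourly volume divided by 3000 (or sometimes 2000) approximates
    the per-second rate."""
    return hourly / divisor

def call_starts_per_second(concurrent_agents: int,
                           call_length_seconds: float = 300) -> float:
    """Concurrently active agents divided by the call length in seconds.
    The default of 300 seconds is the 5 minute assumption."""
    return concurrent_agents / call_length_seconds

# Usage with a made-up monthly figure of 3 million visits.
monthly_visits = 3_000_000
visit_starts = per_second_from_hour(
    busy_hour_from_day(busy_day_from_month(monthly_visits))
)
print(f"{visit_starts:.1f} visit starts per second")
```

Keeping the divisor as a parameter makes it easy to compare the 3000 and 2000 variants for the same input.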

From these we can compute the expected number of requests per second. Let's look at some examples.

Web example: a bank. Only the following information is available: "The bank has 5M customers, of them 2M have signed up for online banking. They are planning to use RTD to determine content and promotions in several places in most online banking pages."

Since this is all the information we have, we will do a calculation based on many assumptions. Later on we can confirm or adjust our assumptions based on any additional information we are given.

Assuming 1/2 of the signed-up customers are active, and each makes on average 4 visits per month, we have 4M visits per month. Using the rules of thumb above, we can assume 400k visits on a busy day, and 40k in a busy hour. Dividing the hourly number by 2000 gives us about 20 visits started per second. Assuming 10 pages per visit and 3 requests per page, we have 30 requests per visit and 600 requests per second.
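The same calculation, written out step by step as a short script. The variable names are illustrative, and every input is one of the assumptions stated above, so the numbers should be adjusted as better information becomes available.

```python
# Web example: online banking. Every input below is an assumption
# from the text, not a measured value.
signed_up_customers = 2_000_000
active_fraction = 0.5                  # half of the signed-up customers
visits_per_customer_per_month = 4

visits_per_month = (signed_up_customers * active_fraction
                    * visits_per_customer_per_month)   # 4M
visits_busy_day = visits_per_month / 10                # 400k
visits_busy_hour = visits_busy_day / 10                # 40k
visit_starts_per_second = visits_busy_hour / 2000      # 20

pages_per_visit = 10
requests_per_page = 3
requests_per_visit = pages_per_visit * requests_per_page   # 30

requests_per_second = visit_starts_per_second * requests_per_visit
print(f"{requests_per_second:.0f} requests per second")
```

Using the 3000 divisor instead of 2000 for the hourly step would lower the estimate to 400 requests per second, which shows how sensitive the result is to that choice.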

Call Center example: "A telco has 5000 agents in the call center. They are interested in implementing RTD for offer recommendations at the end of service calls."

Let's assume that the maximum number of agents active at any given time is about 2/3 of the agents, rounded up to, say, 3500. Assuming 5 minute calls, which is 300 seconds, we have an average of about 12 call initializations per second. Assuming 4 requests per call, we have about 48 requests per second.
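Written out as a short script, the call center estimate looks like this. The variable names are illustrative, and all inputs are the assumptions stated above.

```python
# Call center example: every input is an assumption from the text.
active_agents = 3500           # about 2/3 of 5000 agents, rounded up
call_length_seconds = 5 * 60   # assumed 5 minute calls
requests_per_call = 4

# Active agents divided by call length gives call starts per second.
call_starts_per_second = active_agents / call_length_seconds

# The text rounds the call start rate to a whole number first.
rounded_call_starts = round(call_starts_per_second)
requests_per_second = rounded_call_starts * requests_per_call

print(f"{call_starts_per_second:.2f} call starts per second "
      f"(about {rounded_call_starts})")
print(f"about {requests_per_second} requests per second")
```

Without the intermediate rounding the result is slightly under 47 requests per second, which is well within the precision of the assumptions.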

In upcoming posts we will explore other considerations that come into play when selecting a configuration.

About

Issues related to Oracle Real-Time Decisions (RTD). Entries include implementation tips, technology descriptions and items of general interest to the RTD community.
