Thursday Jan 28, 2010

Precomputed List of Next Best Offers = Bad Idea

What is the difference between having a batch process that computes the Next Best Offer for every customer every night and computing the best offer in real time?

It is all about context. Any precomputed offer list cannot possibly take into account the context of the interaction between the customer and the company. Examples of attributes that cannot be taken into account in a prebuilt list:

  • Call Reason
  • Recent and Last Transaction
  • Exact state of the account
  • Time of the interaction
  • User Agent (iPhone, Computer, Phone, etc.)
  • Call center agent answering the call

Without utilizing this kind of information you are certain to make the wrong decision in many cases. For example, a customer may be open to hearing and accepting an offer when calling the service call center in the evening after receiving a satisfactory resolution to a service issue, while the same customer browsing the site from an iPhone at 10:30 in the morning would most likely not be receptive to any offer at that time.
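To make the contrast concrete, here is a minimal sketch (in Python, with entirely hypothetical field names, not RTD's actual data model) of the kind of context record that can only be assembled at decision time, never in a nightly batch:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Illustrative sketch only: a context record captured at decision time.
@dataclass
class InteractionContext:
    call_reason: str            # e.g. "billing_dispute"
    last_transaction: str       # most recent transaction type
    account_state: str          # exact state of the account right now
    timestamp: datetime         # time of the interaction
    user_agent: str             # "iPhone", "Computer", "Phone", ...
    agent_id: Optional[str]     # call center agent answering, if any

def build_context(call_reason, last_txn, account_state, user_agent, agent_id=None):
    # The timestamp is taken now, at decision time; a batch job run the
    # night before could not have known any of these values.
    return InteractionContext(call_reason, last_txn, account_state,
                              datetime.now(), user_agent, agent_id)

ctx = build_context("service_resolved", "bill_payment", "in_good_standing", "iPhone")
```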

It has been my experience that in Real Time Marketing implementations in call centers the actual agent answering the call is always in the top 5 predictors that influence the selection of the best offer. Similarly, the call reason and the time of the call tend to be very good predictors.

It is important to understand the difference between inbound and outbound marketing. In addition to the obvious difference in the attitude of the customer and their openness to interact with the company, there is a fundamental difference from the point of view of the customer data. In outbound marketing I can compute the best offer for a customer and then call them a few hours or days later; in most cases there is no reason to assume the customer's data will have changed significantly in the meantime. In contrast, in inbound marketing I can be certain that the customer's data will have changed by the time I am ready to make an offer at the tail end of a call; after all, 100% of those callers decided to call the company for some reason.

Sunday Jan 17, 2010

It's not all about offers

Unfortunately, for too many people, managing their company's relationship with their customers is all about offers. This narrow view of the customer fails to recognize that there are many decisions the company makes day to day that affect its relationship with the customer. These decisions include:

  • Selection of content to present to the customer
  • Selection of process and process alternatives
  • Product offerings
  • Offers
  • Solution to product or service issues
  • Proactive notifications
  • Fraud detection and avoidance

These decisions are made in the context of different business goals, such as:

  • Increasing revenue
  • Reducing cost
  • Enhancing customers' wallet share
  • Increasing brand recognition
  • Fulfilling partner commitments
  • Providing good customer service
  • Increasing loyalty
  • Controlling fraud

The catalog of possible selections for each decision can come from many sources, including:

  • Campaign management
  • Content management
  • Product catalog
  • Risk Rules
  • Process actions

RTD was designed from the ground up to optimize this variety of decisions, balancing the many competing business goals and selecting items from many different sources, without necessarily owning the metadata for those items. This is in contrast with a view of the world where everything looks like an offer and the only goal that matters is an immediate increase in revenue.

Monday Jan 04, 2010

Measuring Reality is much easier than Reconstructing it

When asked about the accuracy of RTD's data mining algorithms I often find myself explaining the reasons behind my belief that as a system RTD is much more accurate than any offline data mining system in most cases. One of the reasons for the enhanced accuracy is the capability of directly measuring reality rather than trying to reconstruct it from disconnected data sources.

For example, assume that you are studying the acceptance of offers in a call center. One of the inputs that may be interesting is the length of the queue at the time of the call. In an offline exercise you would have to obtain the logs from the telephony queue, hope that they were kept at enough accuracy, hope that the clocks in the different systems are synchronized, and then query the log with a time-based query that sorts the log records. The same thing in RTD is accomplished by simply querying the telephony queue for its current length at the time of the call. There is no need to hope that the data was collected properly, at the right granularity and with synchronized clocks. As we are dealing with reality as it happens, we do not care if the clocks are all wrong.
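A hedged sketch of the contrast just described; the telephony client API and the log table and column names are hypothetical:

```python
# Real time: simply ask the queue for its length at the moment of the call.
def queue_length_now(telephony_client):
    return telephony_client.current_queue_length()

# Offline reconstruction: find the last log record at or before the call
# time -- correct only if logging granularity and clock synchronization
# happen to cooperate.
POINT_IN_TIME_SQL = """
SELECT queue_length
FROM   telephony_queue_log
WHERE  logged_at <= :call_time
ORDER  BY logged_at DESC
FETCH  FIRST 1 ROW ONLY
"""
```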

The end result of the difficulty in reconstructing reality is that typical offline data mining studies have much narrower inputs than those typically seen in RTD implementations. The difference in data availability in many cases more than makes up for possible accuracy improvements gained from a manually crafted data mining model.

Just to complete the picture, I have to point out that I said "many cases" or "most cases" but not "all cases". The reason is that there are many good cases for offline data mining where it is worth investing in getting the data and complex queries right. Examples include retention, lifetime value and, in some cases, product affinity models. There are also many areas for which RTD algorithms are not applicable, like data exploration, visualization and clustering.

Nevertheless, for predictive data mining applied to process improvement, it is hard to beat the real time data collection capabilities of real time analytics systems.

Friday Dec 18, 2009

Learning and predicting for short and long term events

In many RTD deployments we see that the business wants to optimize decisions based on the long term effect of the decision. For example, selecting a retention offer to display to a customer on the web site should not be driven by the likelihood that the customer will click on the offer, but by the likelihood that the customer will have been retained, say, 3 months later.

Another simpler example is the decision by a bank to offer a credit card to a customer. The events in this situation may be:

  1. Offer Extended
  2. Clicked
  3. Applied for card
  4. Used card

The goal of the bank is to have the customer use the card. The problem is that the feedback on whether the card is used will come weeks after the initial offering. This not only requires the capability of closing the loop at a later time (the subject of a future entry in this blog), but it also leaves us a long time without reliable models.

RTD provides built-in functionality to handle these cases gracefully, utilizing the maximum of available information. This is why an RTD model can have more than one positive event, and why their order matters. The idea behind this feature is that the events are naturally ordered: the use of the card comes after the application, which comes after the click. Therefore, a model for the "Click" event can be used as a proxy for the deeper events for as long as we do not have a good model for them.

Using a closer event as a proxy for the farther one is a good strategy, but it requires management of events, levels of conversion, etc. and it gets even more complicated when you think that the different offers can be at different levels of conversion. RTD does all this management automatically.

Before we describe how RTD makes this all work, there is one more consideration. When comparing offers it is not fair to compare the likelihood of click for one offer with the likelihood of Card Use for another offer.

The way that RTD works is as follows. When computing the likelihood for a choice:

  1. Compute the likelihood for the deepest event for which we have a converged model.
  2. If that event is the desired one (usually the deepest), stop here and use this likelihood.
  3. Compute the average likelihood across all choices for each event deeper than the one used in step 1.
  4. Using these average likelihoods, compute the proportion between the different events and apply that proportion to the likelihood from step 1.

For example, suppose the only likelihood that can be computed for a specific choice is Click, at 10%, and the averages across all choices are:

  • Click : 20%
  • Apply : 12%
  • Use: 8%
Then for our choice we take the 10% and multiply it by 8/20 to get the likelihood of Use: 4%. The likelihood of Apply for the same choice (and customer) would be 10% × 12/20 = 6%.
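The steps above can be sketched as follows; this is an illustrative reimplementation of the described scaling rule, not RTD's actual code:

```python
def proxy_likelihood(choice_likelihood, choice_event, target_event, averages):
    """Scale the deepest available likelihood for a choice to a deeper
    target event, using the ratio of average likelihoods across choices.

    averages: event name -> average likelihood across all choices,
    e.g. {"Click": 0.20, "Apply": 0.12, "Use": 0.08}.
    """
    return choice_likelihood * averages[target_event] / averages[choice_event]

averages = {"Click": 0.20, "Apply": 0.12, "Use": 0.08}
use_lik = proxy_likelihood(0.10, "Click", "Use", averages)     # 0.10 * 8/20  = 0.04
apply_lik = proxy_likelihood(0.10, "Click", "Apply", averages)  # 0.10 * 12/20 = 0.06
```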

As mentioned before, if you define the events in the proper order as the positive events for the model RTD will take care of the logistics for you, following the algorithm described above.

Happy Holidays.

Saturday Dec 12, 2009

Evaluating Models and Prediction Schemes

It has become quite common in RTD implementations to utilize different models to predict the same kind of value in different situations. For example, if the RTD application is used for optimizing the presentation of Creatives, where Creatives belong to Offers, which in turn belong to Campaigns, which belong to Products, which belong to Product Lines, it may be desirable to predict at the different levels and use the models in a waterfall fashion as they converge and become more precise.

Another example is when using more than one model or algorithm, whether internal to RTD or external.

In all these cases it is interesting to determine which of the models or algorithms is better at predicting the output. While RTD Decision Center provides good Model Quality reports that can be used to evaluate the RTD internal models, the same may not exist for external models. Furthermore, it may be desirable to evaluate the different models on a level playing field, utilizing just one metric that can be used to select the "best" algorithm.

One method of achieving this goal is to use an RTD model to perform the evaluation. This pattern is commonly used in Data Mining to "blend" models or create an "ensemble" of models. The idea is to use the predictors as inputs and the normal positive event as the output. When doing this in RTD, the Decision Center Predictiveness report ranks the different predictors by their predictiveness.

To demonstrate this I have created an Inline Service (ILS) whose sole purpose is to evaluate predictors which represent different levels of noise over a basic "perfect" predictor. The attached image represents the result of this ILS.

The "Perfect" predictor is just a normally distributed variable centered at 3% with a standard deviation of 7%, limited to the range 0 to 1. The output variable follows exactly the probability given by the predictor. For example, if the predictor is 13% there is a 13% probability of the positive output.

The other predictors are defined by taking the perfect predictor and adding a noise component. The noise is also normally distributed and has a standard deviation that determines the amount of noise.

For example, the "Noise 1/5" predictor has noise with a standard deviation of 20% (1/5) of the value of the perfect predictor.
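The setup can be simulated with a short sketch. The distribution parameters follow the description above; everything else (sample count, seed, the noise levels chosen) is illustrative:

```python
import random

random.seed(42)

def perfect_predictor():
    # Normally distributed around 3% with a standard deviation of 7%,
    # clipped to the [0, 1] probability range, as described above.
    return min(1.0, max(0.0, random.gauss(0.03, 0.07)))

def noisy_predictor(p, noise_fraction):
    # Add zero-mean Gaussian noise whose standard deviation is a
    # fraction of the perfect predictor's value (e.g. 1/5 = 20%).
    return min(1.0, max(0.0, p + random.gauss(0.0, noise_fraction * p)))

def outcome(p):
    # The positive event fires with exactly the probability given by
    # the perfect predictor.
    return random.random() < p

rows = []
for _ in range(1000):
    p = perfect_predictor()
    rows.append((p, noisy_predictor(p, 1/5), noisy_predictor(p, 1/2), outcome(p)))
```

Feeding rows like these into a blended model as predictors, with the outcome as the positive event, is what lets the Predictiveness report sort them by noise level.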

You can see that the RTD blended model nicely discovers that the more noise there is in the predictor, the less predictive it is.

This kind of blended model can also be used to create a combined model that has the potential of being better than each of the individual models. This is particularly interesting when the different models are really different, for example because of the inputs they use or because of the algorithms used to develop the models.

If you want a copy of the ILS send me an email.

Monday Dec 07, 2009

Using RTD for recommendations from large number of items

I am often asked whether RTD can be used to recommend items when the number of available items is extremely large, in the tens of thousands to a couple of millions. These situations can be encountered in a number of different industries, including retail, media outlets or portals and news organizations.

Traditional approaches to these situations include Market Basket Analysis and Collaborative Filtering. Collaborative Filtering has its strength in extracting affinity information from ratings, and a good CF algorithm can extract every last bit of information from the ratings data. So these traditional approaches do have their advantages, but they are clearly limited in the following ways:

  1. They cannot recommend new items
  2. They cannot issue recommendations to new users
  3. They require vast numbers of baskets or ratings to cover the space with statistically significant data
  4. They do not provide flexibility in selecting recommendations to optimize for varying and conflicting business goals

With RTD we are capable of overcoming these limitations by using a technique that does not necessarily involve clustering of items or users, and does not start from scratch for every new item.

Intuitively, it should be clear that the recommendation of the movie "Terminator 3" will follow similar patterns to "Terminator 2", so when T3 appears, the knowledge about T2 can be used as a good approximation. Similarly, the demographic and behavioral data about a user, together with the context of an interaction, can give us big clues about what the person will be interested in, even if we have not seen any purchases or ratings from that person.

The way we do item recommendations with RTD in this context is to compute the likelihood that an item will be of interest by dividing the computation into a two-layer model network: the base layer computes the affinity of the user with the characteristics of the item, and the second layer uses one model to blend the results of the first layer into one final prediction.
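A toy sketch of the two-layer idea, with hypothetical models and data; here simple functions and a weighted combination stand in for the trained base-layer and blending models:

```python
# Base layer: one affinity score per item attribute, from the "model"
# trained for that attribute type (here, lookup functions).
def base_layer_scores(user, item, base_models):
    return {attr: base_models[attr](user, value)
            for attr, value in item.items()}

# Second layer: blend the base-layer scores into one final likelihood,
# clipped to [0, 1].  A weighted sum stands in for the blending model.
def blend(scores, weights):
    s = sum(weights[a] * v for a, v in scores.items())
    return min(1.0, max(0.0, s))

# Hypothetical models and data.
base_models = {
    "genre": lambda user, g: user["genre_affinity"].get(g, 0.0),
    "era":   lambda user, e: user["era_affinity"].get(e, 0.0),
}
user = {"genre_affinity": {"action": 0.8}, "era_affinity": {"2000s": 0.5}}
item = {"genre": "action", "era": "2000s"}   # e.g. "Terminator 3"

scores = base_layer_scores(user, item, base_models)
likelihood = blend(scores, {"genre": 0.6, "era": 0.4})
```

Because the base layer keys on item characteristics rather than item identity, a brand-new item with known attributes gets a meaningful score from day one.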

Wednesday Nov 25, 2009

Measuring reality is much easier than reconstructing it

The title of this entry says it all. When it comes to collecting data for any analytic work, it is much easier to measure the current data than attempting to reconstruct it from historical databases.

For example, assume you need to analyze the factors that affect cross selling success in the call center and you want to include data like the wait time in the queue or the number of calls the agent answered in the current shift before the call where cross selling was attempted. Collecting this data from history is very complex because:

  1. Not all data is collected all the time
  2. Data from different systems may end up in very disparate historical databases
  3. Different data may have different retention periods and granularity
  4. Different systems may have uncoordinated clocks
  5. Queries become very complex when trying to pinpoint the state of a data record at a specific time
  6. Queries become complex in order to include only events that happened before the point in time in question

For all these reasons and more, it is much easier to perform analytics in real time, when reality can be measured by directly connecting to other systems. For example, it does not matter if the clocks in the different systems are totally uncoordinated or run in different time zones; all I need to worry about is retrieving the latest data. Similarly, if I need to know the city a person lives in, I just retrieve it from the DB; there is no need to go through the list of address changes.

This is one of the reasons I believe that even if you can hand-craft very accurate models, the real time models automatically generated by a self learning system can, in many cases, end up being much more accurate because they can take advantage of more data that is also more accurate.

Sunday Nov 22, 2009

Sizing an RTD installation - Part 3 (final)

Now that we know how to compute the number of requests per second and we have seen other things that need to be considered, we can finally compute the number of CPUs to cope with the desired load. This number is actually quite easy to compute. For planning purposes we usually account for 100 requests/second/CPU. This leaves enough room for higher peak loads or other underestimations in the process. In typical cases we see a higher throughput per CPU.

For example, if we need to support 300 requests per second we can plan for 3 CPUs for Decision Service. The other processes, Learning Server and Workbench Server can usually be run either on one of the Decision Service CPUs or on their own.
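Under the ~100 requests/second/CPU planning rule, the arithmetic is simply rounding up to whole CPUs; a trivial sketch:

```python
import math

def cpus_for_load(requests_per_second, per_cpu_capacity=100):
    # Planning rule of thumb from the text: about 100 requests/second
    # per CPU, rounded up to whole CPUs.
    return math.ceil(requests_per_second / per_cpu_capacity)

cpus_for_load(300)  # 3 CPUs for Decision Service, as in the example above
```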

Now, let's say there is a desire to use standard servers with 2 CPUs, each CPU with 4 cores. In this case one server would have more than enough computing power to cope with the number of requests per second. Nevertheless, we may choose to have 2 of these servers, 16 cores in total, to provide high availability.

If this same configuration was used with Disaster Recovery then we may end up running two servers in two sites with a total of 32 CPU cores. That, of course, is more computing power than necessary to cope with the load.

An alternative that is counterintuitive for people running transactional applications is to run RTD on just one server and pay the price of non-availability. This may be acceptable depending on the application. For example, in offer optimization, if the expected downtime of a single server is just a couple of hours per year, then the cost of non-redundant servers may be lower than the cost of an HA setup.

In any case, the numbers above are for basic planning purposes. If there are many sessions being initialized and not so many other kinds of events then the equations may look different as a session initialization usually takes more resources. Additionally, the load balancing strategy in front of the RTD servers also affects performance. Maximum speed is attained when the load balancing scheme is capable of maintaining session affinity.

Finally, for really high throughput, in the thousands of requests per second, the strategy is to partition the servers along some strict lines. This partitioning strategy can be taken all the way into the database.

Sunday Nov 15, 2009

Sizing an RTD installation - Part 2

Now that we have the expected throughput in terms of the number of requests per second, let's look at other sizing factors.

Response time - sometimes the volume of requests is not smoothly distributed and there may be peaks of requests arriving at the same time. If there are strict response time requirements, like an average below 30ms with a maximum of 60ms for 99% of the requests, then we need to consider the maximum number of requests that will be processed in parallel. To achieve the highest performance for the highest number of requests we design for between 3 and 6 requests being processed in parallel per CPU core or hardware hyperthread.

Session Initializations - when a session is initialized, a few extra things happen compared with requests that come after initialization. First, depending on whether the RTD server manages session affinity, a new entry is created in the sessions table, which typically requires at least one database write. Additionally, the in-memory session is typically filled from the configured data sources. The speed of these operations is entirely driven by the performance of the source databases. If an application has many more session initializations than other types of messages, then throughput may be affected even though the total number of requests is not too high for the configuration.

Single Point of Failure and High Availability - in most cases the system is configured to provide High Availability (HA) and resiliency to server failure or lack of availability (for rolling maintenance for example). RTD is typically configured with a number of servers to avoid the single point of failure. Sometimes it is also configured with multiple sites for HA and Disaster Recovery (DR). In this context it is important to consider the option of relying on default responses to cope with outages of the RTD servers. I know of one RTD server that has been working since 2005 and has been down for maintenance only for a few hours total since it started.

In the next entry we will finally talk about the sizing of the servers.

Friday Nov 13, 2009

Sizing an RTD installation - Part 1

In every implementation of RTD it is necessary to determine the hardware configuration to support the expected loads of RTD applications. While we try to provide guidelines and generalizations, it helps to understand the most significant factors that affect the desired hardware configuration. In a series of blog entries we describe the different factors that need to be considered.


The first factor to consider is the expected load, in terms of the number of events per second, that the servers will need to handle. These events have different types and therefore may place different loads on the servers.

Estimating the number of events per second usually begins at some given metrics. Examples of typical metrics include:

  • Web site pages served per second/day/month
  • Web site [unique] visitors per month
  • Web site visits/sessions per day
  • Call Center calls per day
  • Average call length
  • Maximum number of concurrent agents
  • IVR calls handled per day

The first thing to do with these metrics is to translate them into "per second" numbers. The translation from large time periods, like months, cannot be done by directly dividing by the number of seconds in a month, as there are typically busier days and busier hours of the day.

Some rules of thumb that I have found to result in numbers that are pretty close to reality for a wide variety of situations are as follows:

  • Monthly numbers can be divided by 10 to produce the numbers for a busy day
  • Daily numbers can be divided by 10 to produce the numbers on a busy hour
  • Hourly numbers are divided by 3000 (or sometimes 2000) to produce the number per second
  • If number of pages per visit is unknown, 10 to 15 can be assumed for many sites
  • If call length is unknown, 5 minutes can be assumed
  • Dividing the number of concurrently active agents by the length of a call (in seconds) gives the number of call starts per second

From these we can compute the expected number of requests per second. Let's look at some examples.

Web example: a bank. Only the following information is available: "The bank has 5M customers, of them 2M have signed up for online banking. They are planning to use RTD to determine content and promotions in several places in most online banking pages."

Since this is all the information we have, we will do a calculation based on many assumptions. Later on we can confirm or adjust our assumptions based on any additional information we are given.

Assuming 1/2 of the signed-up customers are active, and an average of 4 visits per month, we have 4M visits per month. Using the rules of thumb above, we can assume 400k visits on a busy day, and 40k in a busy hour. Dividing by 2000 seconds in an hour gives us about 20 visits started per second. Assuming 10 pages per visit and 3 requests per page, we have 30 requests per visit and 600 requests per second.

Call Center example: "A telco has 5000 agents in the call center. They are interested in implementing RTD for offer recommendations at the end of service calls."

Let's assume that the maximum number of agents active at any given time is about 2/3 of the agents, say 3500. Assuming 5 minute calls, which is 300 seconds, we have an average of about 12 call initializations per second. Assuming 4 requests per call, we have about 48 requests per second.
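Both worked examples follow the rules of thumb from the list above; a small sketch (the helper names are illustrative) reproduces the arithmetic:

```python
def web_requests_per_second(visits_per_month, pages_per_visit=10,
                            requests_per_page=3, busy_hour_seconds=2000):
    # Monthly / 10 -> busy day; daily / 10 -> busy hour;
    # hourly / ~2000 -> per second (rules of thumb from above).
    visits_per_second = visits_per_month / 10 / 10 / busy_hour_seconds
    return visits_per_second * pages_per_visit * requests_per_page

def call_center_requests_per_second(active_agents, call_seconds=300,
                                    requests_per_call=4):
    # Active agents divided by call length gives call starts per second.
    return active_agents / call_seconds * requests_per_call

web_requests_per_second(4_000_000)     # 600.0, matching the bank example
call_center_requests_per_second(3500)  # ~46.7; rounding call starts up to 12 first gives the ~48 above
```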

In upcoming posts we will explore other considerations that come into play when selecting a configuration.


Issues related to Oracle Real-Time Decisions (RTD). Entries include implementation tips, technology descriptions and items of general interest to the RTD community.

