## Friday Jan 06, 2012

### Introducing Lukas Vermeer

I have the pleasure to introduce a new contributor to this blog, Lukas Vermeer. He has been an RTD practicioner for a long time and I'm very happy he will be contributing from his field experience to this blog.

Michel

## Tuesday Jan 03, 2012

### Multiple Presentations - Part 3

Continuing the series of analysis of the effect of multiple presentations of the same content on the likelihood of click.

## Exponential Decay

The curve of likelihood of click after the peak can be approximated by an exponential decay function. For example, the likelihood of click on the 19th presentation may be 93% the likelihood of the 18th presentation. We will model the beginning of the curve as a constant, ignoring for now the variation in particular for the first presentation. The result looks like this.

This is a very simple model that will work well in many situations. It is also technically simple as all the memory it requires is remembering the number of times the content has been presented to each specific visitor.

I believe this model will work when:

• The number of presentations of the same content tends to be high; higher than 5 on average.
• The time between presentations is not long; within the same session or at most a few hours in between presentations.
• The content is presented only in a specific channel and location

The model will not work when:

• The number of presentations for each content is small, but bigger than 1. In these cases the difference between the first presentation and subsequent ones can not be ignored.
• The time between presentations tends to be longer; a few days in between presentations for example. This time is long enough for the effect of previous presentations to wane, but not long enough for it to disappear completely.
• The content is presented in different channels or locations. For example if the visitor may be exposed to the same message in an email as well as the main banner of the home page, or different locations within the pages, or pages with significantly different other content.
• The same message is presented with similar contents, but perhaps not the same image dimensions and text

In order to cope with these shortcomings we need to use a model that will take into account the time between presentations and the impact each presentation has. The impact is affected by factors like the channel, location and visual characteristics of the content.

## Monday Dec 12, 2011

### Multiple Presentations - Part 2

In the first part of this series we explored the problem of predicting likelihood with multiple presentations of the same content. In this entry we will explore a few more options.

## Modeling Options (continued)

### Presentation Cap

With presentation caps a person decides on a threshold which is the maximum number of times a specific content is to be used with any given person. For example, "this offer should not be presented more than 5 times to the same customer."

This scheme avoids  the problem of wasting presentations in the long tail, but as any arbitrary threshold, it is not optimal. If the number is too large, then there are going to be wasted presentations and if it is too small, then the effort will quit too early, leaving potential customers "unimpressed."

An additional problem with this scheme stems from the fact that there may be a change inthe situation of the customer. That is, maybe this offer was not relevant a month ago, but now, with changed circumstances it may be relevant. A way to solve this problem is to set an expiration date for the cap.

In summary, for each choice two numbers are given, the maximum number of times to present the same choice to the same person, and the number of days after which the count is reset.

Purple line represents the believed likelihood before the cap is reached. Yellow marked area represents overestimation and red underestimation of the likelihood.

It is also clear that if the cap is too low then the untapped potential - the red area - grows tremendously. Conversely, if the cap is too large then the wasted presentations - the yellow area - become more and more significant. Therefore, quite literaly, using a presentation cap is like fitting a square peg in a round hole.

A further problem with this approach is one of modeling. Should you train the model with each presentation or only when the cap is reached?

In the next installment we will explore a possible better approach.

## Wednesday Nov 30, 2011

### Predicting Likelihood of Click with Multiple Presentations

When using predictive models to predict the likelihood of an ad or a banner to be clicked on it is common to ignore the fact that the same content may have been presented in the past to the same visitor. While the error may be small if the visitors do not often see repeated content, it may be very significant for sites where visitors come repeatedly.

This is a well recognized problem that usually gets handled with presentation thresholds – do not present the same content more than 6 times.

Observations and measurements of visitor behavior provide evidence that something better is needed.

# Observations

For a specific visitor, during a single session, for a banner in a not too prominent space, the second presentation of the

same content is more likely to be clicked on than the first presentation. The difference can be 30% to 100% higher likelihood for the second presentation when compared to the first.

That is, for example, if the first presentation has an average click rate of 1%, the second presentation may have an average CTR of between 1.3% and 2%.

After the second presentation the CTR stays more or less the same for a few more presentations. The number of presentations in this plateau seems to vary by the location of the content in the page and by the visual attraction of the content.

After these few presentations the CTR starts decaying with a curve that is very well approximated by an exponential decay. For example, the 13th presentation may have 90% the likelihood of the 12th, and the 14th has 90% the likelihood of the 13th. The decay constant seems also to depend on the visibility of the content.

# Modeling Options

Now that we know the empirical data, we can propose modeling techniques that will correctly predict the likelihood of a click.

## Use presentation number as an input to the predictive model

Probably the most straight forward approach is to add the presentation number as an input to the predictive model. While this is certainly a simple solution, it carries with it several problems, among them:

1. If the model learns on each case, repeated non-clicks for the same content will reinforce the belief of the model on the non-clicker disproportionately. That is, the weight of a person that does not click for 200 presentations of an offer may be the same as 100 other people that on average click on the second presentation.
2. The effect of the presentation number is not a customer characteristic or a piece of contextual data about the interaction with the customer, but it is contextual data about the content presented.
3. Models tend to underestimate the effect of the presentation number.

For these reasons it is not advisable to use this approach when the average number of presentations of the same content to the same person is above 3, or when there are cases of having the presentation number be very large, in the tens or hundreds.

## Use presentation number as a partitioning attribute to the predictive model

In this approach we essentially build a separate predictive model for each presentation number. This approach overcomes all of the problems in the previous approach, nevertheless, it can be applied only when the volume of data is large enough to have these very specific sub-models converge.

In the next couple of entries we will explore other solutions and a proposed modeling framework.

## Tuesday Nov 29, 2011

### Customer retention - why most companies have it wrong

At least in the US market it is quite common for service companies to offer an initially discounted price to new customers. While this may attract new customers and rob customers from competitors, it is my argument that it is a bad strategy for the company. This strategy gives an incentive to change companies and a disincentive to stay with the company. From the point of view of the customer, after 6 months of being a customer the company rewards the loyalty by raising the price.

A better strategy would be to reward customers for staying with the company. For example, by lowering the cost by 5% every year (compound discount so it does never get to zero). This is a very rational thing to do for the company. Acquiring new customers and setting up their service is expensive, new customers also tend to use more of the common resources like customer service channels. It is probably true for most companies that the cost of providing service to a customer of 10 years is lower than providing the same service in the first year of a customer's tenure. It is only logical to pass these savings to the customer.

From the customer point of view, the competition would have to offer something very attractive, whether in terms of price or service, in order for the customer to switch.

Such a policy would give an advantage to the first mover, but would probably force the competitors to follow suit. Overall, I would expect that this would reduce the mobility in the market, increase loyalty, increase the investment of companies in loyal customers and ultimately, increase competition for providing a better service.

Competitors may even try to break the scheme by offering customers the porting of their tenure, but that would not work that well because it would disenchant existing customers and would be costly, assuming that it is costlier to serve a customer through installation and first year.

What do you think? Is this better than using "save offers" to retain flip-floppers?

### Analyst Report on RTD

An interesting analyst report on RTD has been published by MWD, a reference and description can be found in this blog entry

## Thursday Nov 17, 2011

### Short Season, Long Models - Dealing with Seasonality

Accounting for seasonality presents a challenge for the accurate prediction of events. Examples of seasonality include:

·         Boxed cosmetics sets are more popular during Christmas. They sell at other times of the year, but they rise higher than other products during the holiday season.

·         Interest in a promotion rises around the time advertising on TV airs

·         Interest in the Sports section of a newspaper rises when there is a big football match

There are several ways of dealing with seasonality in predictions.

## Time Windows

If the length of the model time windows is short enough relative to the seasonality effect, then the models will see only seasonal data, and therefore will be accurate in their predictions. For example, a model with a weekly time window may be quick enough to adapt during the holiday season.

In order for time windows to be useful in dealing with seasonality it is necessary that:

1. The time window is significantly shorter than the season changes
2. There is enough volume of data in the short time windows to produce an accurate model

An additional issue to consider is that sometimes the season may have an abrupt end, for example the day after Christmas.

## Input Data

If available, it is possible to include the seasonality effect in the input data for the model. For example the customer record may include a list of all the promotions advertised in the area of residence.

A model with these inputs will have to learn the effect of the input. It is possible to learn it specific to the promotion – and by the way learn about inter-promotion cross feeding – by leaving the list of ads as it is; or it is possible to learn the general effect by having a flag that indicates if the promotion is being advertised.

For inputs to properly represent the effect in the model it is necessary that:

1. The model sees enough events with the input present. For example, by virtue of the model lifetime (or time window) being long enough to see several “seasons” or by having enough volume for the model to learn seasonality quickly.

## Proportional Frequency

If we create a model that ignores seasonality it is possible to use that model to predict how the specific person likelihood differs from average. If we have a divergence from average then we can transfer that divergence proportionally to the observed frequency at the time of the prediction.

Definitions:

Ft = trailing average frequency of the event at time “t”. The average is done over a suitable period of to achieve a statistical significant estimate.

F = average frequency as seen by the model.

L = likelihood predicted by the model for a specific person

Lt = predicted likelihood proportionally scaled for time “t”.

If the model is good at predicting deviation from average, and this holds over the interesting range of seasons, then we can estimate Lt as:

Lt = L * (Ft / F)

Considering that:

L = (L – F) + F

Substituting we get:

Lt = [(L – F) + F] * (Ft / F)

Which simplifies to:

(i)                  Lt = (L – F) * (Ft / F)  +  Ft

This latest expression can be understood as “The adjusted likelihood at time t is the average likelihood at time t plus the effect from the model, which is calculated as the difference from average time the proportion of frequencies”.

The formula above assumes a linear translation of the proportion. It is possible to generalize the formula using a factor which we will call “a” as follows:

(ii)                Lt = (L – F) * (Ft / F) * a  +  Ft

It is also possible to use a formula that does not scale the difference, like:

(iii)               Lt = (L – F) * a  +  Ft

While these formulas seem reasonable, they should be taken as hypothesis to be proven with empirical data. A theoretical analysis provides the following insights:

1. The Cumulative Gains Chart (lift) should stay the same, as at any given time the order of the likelihood for different customers is preserved
2. If F is equal to Ft then the formula reverts to “L”
3. If (Ft = 0) then Lt in (i) and (ii) is 0
4. It is possible for Lt to be above 1.

If it is desired to avoid going over 1, for relatively high base frequencies it is possible to use a relative interpretation of the multiplicative factor.

For example, if we say that Y is twice as likely as X, then we can interpret this sentence as:

• If X is 3%, then Y is 6%
• If X is 11%, then Y is 22%
• If X is 70%, then Y is 85% - in this case we interpret “twice as likely” as “half as likely to not happen”

Applying this reasoning to (i) for example we would get:

If (L < F) or (Ft < (1 / ((L/F) + 1))

Then  Lt = L * (Ft / F)

Else

Lt = 1 – (F / L) + (Ft * F / L)

## Tuesday Aug 23, 2011

### Don't do Product Recommendations!

While it is attractive to talk about "product recommendations" as one single functionality, in reality it should be seen as a family of different functions for different situations and purposes.

The following example provides the motivation for the differentiation between the different types of product recommendations.

A couple enters a supermarket and their smartphone connects to the store's computer. You have the opportunity to give them an offer. You want to offer them a 10% discount of one product, which product should it be? You have the following information:

• Models predict a likelihood of 10% to purchase Fat Free Milk
• Models predict a likelihood of 2% to purchase Gruyère cheese

Which of these two products would you recommend?

What if in addition, you knew:

• Average margin for milk is \$0.50
• Average margin for cheese is \$1

Would you recommend the milk because it is more likely to be purchased? Or the cheese because it has a higher margin? Or the milk because it has a higher expected margin (likelihood times margin)?

An what if in addition you knew:

• Average likelihood to purchase Fat Free Milk is 15% across all customers
• Average likelihood to purchase Gruyère cheese is 0.1% across all customers

I believe that the best of the two products, for this specific situation is the cheese. The reason s are behind the numbers and the goal of the recommendation.

From the statement of the situation, it is reasonable to infer that the goal of the offer is mainly to increase the basket, with the additional benefits of having happy and loyal customers. While at first sight the 10% associated with the milk is a higher number, this number has two problems. First, it is high, and it is quite possible that the customer would have thought of buying it anyway. Secondly, it is lower than the average. Put in words, the customer is "more than 30% less likely to buy non fat milk than the average customer."

The cheese should be the winner because "the customer is 20 times more likely to buy Gruyère cheese than the average customer."

The variables that come into play for the selection of the best product are:

1. Context of the recommendation. Stage in the process, purpose.
2. Selection universe. The set of products from which the recommendation can be selected.
3. Selection criteria

In future posts we will explore how each of these variables affects the way Product Recommendations should work.

## Monday Aug 22, 2011

### Performance Goals for Human Decisions

I've often asked myself whether we, humans, make decisions in a similar way to what RTD does. Do we balance different KPIs? Do we evaluate the different options and chose the one that maximizes our performance goals?

To answer this question one would have to ask what are your performance goals and how much do you value each one of them. It would seem logical that our decisions would be made in such a rational way that they are totally driven by the evaluation of each alternative and the selection of the best one.

Following this logic, one could surmise that if we were able to discover the performance goals that are relevant for a specific person, and the weights for each one, we could be very good at predicting human behavior. Instead of using the Inductive prediction models like the ones we have today in RTD, we could use Deductive models that mimic the logic of the person to arrive to the predicted behavior.

Fortunately, as learned by modern economists and brilliantly put by Dan Ariely in his book Predictably Irrational, human decisions are typically not the result of rational optimization, but heavily influenced by emotions and instinct.

This is one of the reasons that rule systems perform so poorly in trying to predict human behavior. A rule system would try to detect the reason behind the behavior. Empirical, inductive models work much better because they do not try to discover the pattern behind a behavior, but the common characteristics of people; and while we can not rationally explain many of our behaviors, we do see a lot of commonality. While we are each a unique individual, it is possible to predict our behavior by generalizing from what is observed about people similar to us.

I was recently on a Southwest Airlines flight. As usual, travelers had optimized their seat selection according to what was available and the convenience, mostly preferring seats by the front of the plane, windows and aisle seats. Can we predict whether you will prefer a Window or an Aisle? Absolutely, just look at the history of the seats that the person has chosen in the past. While you may claim that such a model is obvious, it is a good model based on generalization of past experience. I can still not answer the question of what are the motivations for some person to preferring a window seat, but I can predict with great accuracy which one will you prefer on a specific flight.

Once everyone had selected their seat there were about 20 middle seats left in the plane. Just before the door closes, a mother with a child enters the plane in a hurry. She evaluates the situation, and for her the Performance Goal of being beside her child was the most important. Since all the "eligible" choices were not good, she tried to create a new choice, asking the flight attendants for help.

As the attendant was starting to make an announcement to ask for someone to give up their Aisle or Window seat, I saw the situation and immediately offered my aisle seat. Then, of course, I tried to figure out why I decided to do that? Why others didn't? Was it purely emotional or there was a Performance Goal mechanism involved?

Sure there are benefits to giving up you seat in a situation like this. For example I got preferential treatment and free premium drinks from the flight attendants for the rest of the flight, but I did not know about those benefits. Are there other hidden KPIs?

I would like to hear from you. What are the Performance Goals that motivate people to action? Is there a moral framework within which decisions do follow KPI optimization?

## Wednesday Nov 17, 2010

### Performance Tips

As RTD implementations become more and more sophisticated and the applications extend the reach of decisions far beyond selecting the next best offer we have been recommending some design decisions in order to ensure a desired level of performance.

By far, the most significant factor affecting performance is external system access. In particular database access. This goes for reads as well as writes. Here are a few tips that are easy to implement and are good to keep in mind when designing a configuration;

1. Data that is repeatedly used in decisions by different sessions should be cached. Examples include Offer Metadata, Product Catalog, Content Catalog, and Zip Code Demographics.
2. If possible, data that will be needed in decisions should be pre-fetched. For example, customer profiles could be loaded at the very beginning of a session.
3. A good storage system for data is an Oracle Coherence Cache. Particularly if it is configured with local storage in the same app server as RTD. Data can include customer profile, event history, etc.
4. When writing to the database and there is no need to be transactional and synchronous, use RTD's batch writing capabilities. This can increase write performance by an order of magnitude.
5. Avoid unnecessary writes and writing unnecessary data. For example, avoid writing metadata together with event data if the metadata can be linked.
6. Consider using stored procedures when updating several tables to minimize roundtrips to the database
7. If a result set is potentially very large, consider wrapping the query with a stored procedure that limits the number of rows returned. For example, if the application calls for loading the purchase history of a customer, and the median length of the list is 3 purchases, but there are 15 customers with 10000 purchases or more, processing these [good] customers will take long - it may be acceptable from the point of view of the application logic to just load a maximum of the latest 100 purchases.
9. Asynchronous processing is not free - avoid unnecessary processing. For example, a decision may take 5 ms of actual CPU processing. That would limit the theoretical throughput of a single CPU to 200 decisions per second. If we add 15 ms of asynchronous processing per decision, we will not be affecting response time, but the throughput will be afected - the thoretical throughput being reduced to 50 per second.

In addition to these tips, it is important also for the environment to be propery setup to achieve peak performance. Some tips include:

1. Prefer physical servers to virtualized ones
2. Always calculate to have at least two CPU cores per JVM
3. Make sure the memory requirements and settings match the available memory to avoid swapping JVM memory
4. If using virtualized servers make sure that CPUs are not overallocated. That is, do not run 5 virtual machines configured for 2 CPUs each on an 8 core system. While such a setup may be acceptable for some applications, with throughput intensive applications like RTD such a setup would certainly cause performance problems and these problems are difficult to diagnose.
5. If using virtualized servers make sure the virtual machine's configured memory will be resident in physical memory. If the Guest believes that it has 4GB of memory, but the host needs to use swap to fulfill that amount of memory performance and availability will suffer. Problems in this area are very difficult to diagnose because to the Guest OS it looks as if CPU cycles were stolen.

## Thursday Jun 10, 2010

### Is RTD Stateless or Stateful?

Yes.

A stateless service is one where each request is an independent transaction that can be processed by any of the servers in a cluster. A stateful service is one where state is kept in a server's memory from transaction to transaction, thus necessitating the proper routing of requests to the right server. The main advantage of stateless systems is simplicity of design. The main advantage of stateful systems is performance.

I'm often asked whether RTD is a stateless or stateful service, so I wanted to clarify this issue in depth so that RTD's architecture will be properly understood.

The short answer is: "RTD can be configured as a stateless or stateful service."

The performance difference between stateless and stateful systems can be very significant, and while in a call center implementation it may be reasonable to use a pure stateless configuration, a web implementation that produces thousands of requests per second is practically impossible with a stateless configuration.

RTD's performance is orders of magnitude better than most competing systems. RTD was architected from the ground up to achieve this performance. Features like automatic and dynamic compression of prediction models, automatic translation of metadata to machine code, lack of interpreted languages, and separation of model building from decisioning contribute to achieving this performance level. Because of this focus on performance we decided to have RTD's default configuration work in a stateful manner. By being stateful RTD requests are typically handled in a few milliseconds when repeated requests come to the same session.

Now, those readers that have participated in implementations of RTD know that RTD's architecture is also focused on reducing Total Cost of Ownership (TCO) with features like automatic model building, automatic time windows, automatic maintenance of database tables, automatic evaluation of data mining models, automatic management of models partitioned by channel, geography, etcetera, and hot swapping of configurations.

How do you reconcile the need for a low TCO and the need for performance? How do you get the performance of a stateful system with the simplicity of a stateless system? The answer is that you make the system behave like a stateless system to the exterior, but you let it automatically take advantage of situations where being stateful is better.

For example, one of the advantages of stateless systems is that you can route a message to any server in a cluster, without worrying about sending it to the same server that was handling the session in previous messages. With an RTD stateful configuration you can still route the message to any server in the cluster, so from the point of view of the configuration of other systems, it is the same as a stateless service. The difference though comes in performance, because if the message arrives to the right server, RTD can serve it without any external access to the session's state, thus tremendously reducing processing time. In typical implementations it is not rare to have high percentages of messages routed directly to the right server, while those that are not, are easily handled by forwarding the messages to the right server. This architecture usually provides the best of both worlds with performance and simplicity of configuration.

Configuring RTD as a pure stateless service

A pure stateless configuration requires session data to be persisted at the end of handling each and every message and reloading that data at the beginning of handling any new message. This is of course, the root of the inefficiency of these configurations. This is also the reason why many "stateless" implementations actually do keep state to take advantage of a request coming back to the same server. Nevertheless, if the implementation requires a pure stateless decision service, this is easy to configure in RTD. The way to do it is:

1. Mark every Integration Point to Close the session at the end of processing the message
2. In the Session entity persist the session data on closing the session
3. In the session entity check if a persisted version exists and load it

An excellent solution for persisting the session data is Oracle Coherence, which provides a high performance, distributed cache that minimizes the performance impact of persisting and reloading the session. Alternatively, the session can be persisted to a local database.

An interesting feature of the RTD stateless configuration is that it can cope with serializing concurrent requests for the same session. For example, if a web page produces two requests to the decision service, these requests could come concurrently to the decision services and be handled by different servers. Most stateless implementation would have the two requests step onto each other when saving the state, or fail one of the messages. When properly configured, RTD will make one message wait for the other before processing.

A Word on Context

Using the context of a customer interaction typically significantly increases lift. For example, offer success in a call center could double if the context of the call is taken into account. For this reason, it is important to utilize the contextual information in decision making. To make the contextual information available throughout a session it needs to be persisted. When there is a well defined owner for the information then there is no problem because in case of a session restart, the information can be easily retrieved. If there is no official owner of the information, then RTD can be configured to persist this information.

Once again, RTD provides flexibility to ensure high performance when it is adequate to allow for some loss of state in the rare cases of server failure. For example, in a heavy use web site that serves 1000 pages per second the navigation history may be stored in the in memory session. In such sites it is typical that there is no OLTP that stores all the navigation events, therefore if an RTD server were to fail, it would be possible for the navigation to that point to be lost (note that a new session would be immediately established in one of the other servers). In most cases the loss of this navigation information would be acceptable as it would happen rarely. If it is desired to save this information, RTD would persist it every time the visitor navigates to a new page.

Note that this practice is preferred whether RTD is configured in a stateless or stateful manner.

## Thursday May 27, 2010

### Tips on ensuring Model Quality

Given enough data that represents well the domain and models that reflect exactly the decision being optimized, models usually provide good predictions that ensure lift. Nevertheless, sometimes the modeling situation is less than ideal. In this blog entry we explore the problems found in a few such situations and how to avoid them.

1 - The Model does not reflect the problem you are trying to solve

For example, you may be trying to solve the problem: "What product should I recommend to this customer" but your model learns on the problem: "Given that a customer has acquired our products, what is the likelihood for each product". In this case the model you built may be too far of a proxy for the problem you are really trying to solve. What you could do in this case is try to build a model based on the result from recommendations of products to customers. If there is not enough data from actual recommendations, you could use a hybrid approach in which you would use the [bad] proxy model until the recommendation model converges.

2 - Data is not predictive enough

If the inputs are not correlated with the output then the models may be unable to provide good predictions. For example, if the input is the phase of the moon and the weather and the output is what car did the customer buy, there may be no correlations found. In this case you should see a low quality model.

The solution in this case is to include more relevant inputs.

3 - Not enough cases seen

If the data learned does not include enough cases, at least 200 positive examples for each output, then the quality of recommendations may be low.

The obvious solution is to include more data records. If this is not possible, then it may be possible to build a model based on the characteristics of the output choices rather than the choices themselves. For example, instead of using products as output, use the product category, price and brand name, and then combine these models.

4 - Output leaking into input giving the false impression of good quality models

If the input data in the training includes values that have changed or are available only because the output happened, then you will find some strong correlations between the input and the output, but these strong correlations do not reflect the data that you will have available at decision (prediction) time. For example, if you are building a model to predict whether a web site visitor will succeed in registering, and the input includes the variable DaysSinceRegistration, and you learn when this variable has already been set, you will probably see a big correlation between having a Zero (or one) in this variable and the fact that registration was successful.

The solution is to remove these variables from the input or make sure they reflect the value as of the time of decision and not after the result is known.

## Wednesday Apr 07, 2010

### The softer side of BPM

BPM and RTD are great complementary technologies that together provide a much higher benefit than each of them separately. BPM covers the need for automating processes, making sure that there is uniformity, that rules and regulations are complied with and that the process runs smoothly and quickly processes the units flowing through it.

By nature, this automation and unification can lead to a stricter, less flexible process. To avoid this problem it is common to encounter process definition that include multiple conditional branches and human input to help direct processing in the direction that best applies to the current situation. This is where RTD comes into play. The selection of branches and conditions and the optimization of decisions is better left in the hands of a system that can measure the results of its decisions in a closed loop fashion and make decisions based on the empirical knowledge accumulated through observing the running of the process.

When designing a business process there are key places in which it may be beneficial to introduce RTD decisions. These are:

• Thresholds - whenever a threshold is used to determine the processing of a unit, there may be an opportunity to make the threshold "softer" by introducing an RTD decision based on predicted results. For example an insurance company process may have a total claim threshold to initiate an investigation. Instead of having that threshold, RTD could be used to help determine what claims to investigate based on the likelihood they are fraudulent, cost of investigation and effect on processing time.
• Human decisions - sometimes a process will let the human participants make decisions of flow. For example, a call center process may leave the escalation decision to the agent. While this has flexibility, it may produce undesired results and asymetry in customer treatment that is not based on objective functions but subjective reasoning by the agent. Instead, an RTD decision may be introduced to recommend escalation or other kinds of treatments.
• Content Selection - a process may include the use of messaging with customers. The selection of the most appropriate message to the customer given the content can be optimized with RTD.
• A/B Testing - a process may have optional paths for which it is not clear what populations they work better for. Rather than making the arbitrary selection or selection by committee of the option deeped the best, RTD can be introduced to dynamically determine the best path for each unit.
In summary, RTD can be used to make BPM based process automation more dynamic and adaptable to the different situations encountered in processing. Effectively making the automation softer, less rigid in its processing. In order for this to work the people responsible for the process need to understand the what are the important KPIs that the business is really interested in optimizing for, and make a concerted effort in measuring up to those KPIs and optimizing the process to achieve better results. The benefit of making better decisions in a process flow can be tremendous, as exemplified by many of current RTD implementations.

## Monday Mar 22, 2010

### Ignoring Robots - Or Better Yet, Counting Them Separately

It is quite common to have web sessions that are undesirable from the point of view of analytics. For example, when there are either internal or external robots that check the site's health, index it or just extract information from it. These robotic session do not behave like humans and if their volume is high enough they can sway the statistics and models.

One easy way to deal with these sessions is to define a partitioning variable for all the models that is a flag indicating whether the session is "Normal" or "Robot". Then all the reports and the predictions can use the "Normal" partition, while the counts and statistics for Robots are still available.

In order for this to work, though, it is necessary to have two conditions:

1. It is possible to identify the Robotic sessions.
2. No learning happens before the identification of the session as a robot.

The first point is obvious, but the second may require some explanation. While the default in RTD is to learn at the end of the session, it is possible to learn in any entry point. This is a setting for each model. There are various reasons to learn in a specific entry point, for example if there is a desire to capture exactly and precisely the data in the session at the time the event happened as opposed to including changes to the end of the session.

In any case, if RTD has already learned on the session before the identification of a robot was done there is no way to retract this learning.

Identifying the robotic sessions can be done through the use of rules and heuristics. For example we may use some of the following:

1. Maintain a list of known robotic IPs or domains
2. Detect very long sessions, lasting more than a few hours or visiting more than 500 pages
3. Detect "robotic" behaviors like a methodic click on all the link of every page
4. Detect a session with 10 pages clicked at exactly 20 second intervals
Now, an interesting experiment would be to use the flag above as an output of a model to see if there are more subtle characteristics of robots such that a model can be used to detect robots, even if they fall through the cracks of rules and heuristics.

In any case, the basic and simple technique of partitioning the models by the type of session is simple to implement and provides a lot of advantages.

## Monday Feb 22, 2010

### The problem with Process Automation is Automation itself

Automation - (Noun) the use of machines to do work that was previously done by people

Replacing people with machines makes it possible to tremendously increase the capacity of a process, which has obvious economic advantages. Automation has been successful in replacing people's work and improving many aspects of the process in addition to the capacity. For example, automated process are much more uniform processing of units.

So what is wrong with Automation? Nothing really, but the fact that there are a few things that people do better than machines. My two favorite human characteristics that tend to be lost with automation are:

1. The Capability of the process to learn
2. The capability of people to discern between different cases
With automation we are able to run the same process, again and again, sometimes repeating the same mistake, again and again. With automation we tend to treat every unit the same way.

Lets take the simple example of automation of answering the phone. Most companies today use IVR software to answer the phone, but how many differentiate between callers? If a valuable bank customer who is approaching retirement age calls the bank after not calling for 5 years, how many banks will actually do the right thing with this customer, which is to kidnap the customer from the IVR and connect them directly with the best agent? How many companies are setup to discover that a problem that affects 1% of their callers is not possible to solve in the IVR, but these customer still have to go through a frustrating tree of options to get to talk with a person that can actually help them?

If there was an actual human that was capable of watching all the interactions in the IVR, and seeing the short and long term results of these calls, and had the capability of affecting the way decisions are made in the IVR the results from automation would be much better.

RTD was designed to infuse these missing elements into business processes. Learning and differentiating (sometimes called "personalization"), thus taking us a step further into better automation of business process, not yet matching all the capabilities of humans, but at least bringing some "common sense" into it.