Monday Mar 22, 2010

Ignoring Robots - Or Better Yet, Counting Them Separately

It is quite common to have web sessions that are undesirable from the point of view of analytics, for example sessions generated by internal or external robots that check the site's health, index it, or simply extract information from it. These robotic sessions do not behave like humans, and if their volume is high enough they can skew the statistics and models.

One easy way to deal with these sessions is to give all the models a partitioning variable: a flag indicating whether the session is "Normal" or "Robot". All the reports and predictions can then use the "Normal" partition, while the counts and statistics for robots remain available.
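To make the partitioning idea concrete outside of RTD itself, here is a minimal sketch in Python. Each session carries a hypothetical `session_type` flag, and statistics are aggregated separately per partition, so reports can read the "Normal" partition while the robot volume is still tracked. The `Session` record and `stats_by_partition` function are illustrative assumptions, not RTD APIs.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Session:
    # Hypothetical session record; session_type is the partitioning flag.
    session_id: str
    pages_viewed: int
    converted: bool
    session_type: str  # "Normal" or "Robot"


def stats_by_partition(sessions):
    """Aggregate simple statistics separately for each partition."""
    counts = Counter()
    pages = defaultdict(int)
    conversions = defaultdict(int)
    for s in sessions:
        counts[s.session_type] += 1
        pages[s.session_type] += s.pages_viewed
        conversions[s.session_type] += int(s.converted)
    return {
        part: {
            "sessions": counts[part],
            "avg_pages": pages[part] / counts[part],
            "conversion_rate": conversions[part] / counts[part],
        }
        for part in counts
    }


sessions = [
    Session("a", 5, True, "Normal"),
    Session("b", 3, False, "Normal"),
    Session("c", 800, False, "Robot"),
]
stats = stats_by_partition(sessions)
print(stats["Normal"])              # reports use only the human partition
print(stats["Robot"]["sessions"])   # robot volume is still available
```

Without the partition, the single robot session would push the average from 4 pages per session to roughly 269.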

For this partitioning to work, though, two conditions must hold:

1. It is possible to identify the robotic sessions.
2. No learning happens before the session is identified as a robot.

The first point is obvious, but the second may require some explanation. While the default in RTD is to learn at the end of the session, it is possible to learn at any entry point; this is a per-model setting. There are various reasons to learn at a specific entry point, for example to capture the session data precisely as it was when the event happened, rather than as it stands at the end of the session after later changes.

In any case, if RTD has already learned from a session before it was identified as a robot, there is no way to retract that learning.
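The ordering constraint can be illustrated with a toy model that either learns at each entry point or once at session close, much like the per-model setting described above. This is a hypothetical sketch, not RTD's learning API: `ToyModel`, `ToySession`, `learn`, and `close_session` are illustrative names. Learning is append-only, so an observation recorded at an entry point keeps whatever partition was known at that moment.

```python
from collections import defaultdict


class ToyModel:
    """Hypothetical model that counts observations per partition.

    Learning is append-only: once an observation is recorded it cannot be
    retracted, mirroring the constraint described above.
    """

    def __init__(self, learn_at_entry_point: bool):
        self.learn_at_entry_point = learn_at_entry_point
        self.observations = defaultdict(int)

    def learn(self, partition: str):
        self.observations[partition] += 1


class ToySession:
    def __init__(self, model: ToyModel):
        self.model = model
        self.session_type = "Normal"  # default until a rule says otherwise

    def entry_point(self):
        if self.model.learn_at_entry_point:
            # Learns immediately, using the flag as it stands right now.
            self.model.learn(self.session_type)

    def mark_robot(self):
        self.session_type = "Robot"

    def close_session(self):
        if not self.model.learn_at_entry_point:
            # Learns once, after all identification rules have run.
            self.model.learn(self.session_type)


def simulate(learn_at_entry_point: bool) -> dict:
    model = ToyModel(learn_at_entry_point)
    session = ToySession(model)
    session.entry_point()
    session.entry_point()
    session.mark_robot()  # the robot is detected only after two events
    session.entry_point()
    session.close_session()
    return dict(model.observations)


print(simulate(learn_at_entry_point=True))   # {'Normal': 2, 'Robot': 1}
print(simulate(learn_at_entry_point=False))  # {'Robot': 1}
```

In the first run, two observations from the robot have already leaked into the "Normal" partition and cannot be removed; learning at session close avoids this because identification finishes first.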

Identifying robotic sessions can be done with rules and heuristics. For example, we may use some of the following:

  1. Maintain a list of known robotic IPs or domains
  2. Detect very long sessions, lasting more than a few hours or visiting more than 500 pages
  3. Detect "robotic" behaviors, like methodically clicking every link on every page
  4. Detect sessions with 10 pages clicked at exactly 20-second intervals
  5. Detect extensive non-linear navigation
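
The heuristics above can be combined into a simple rule-based classifier whose output is the "Normal"/"Robot" flag. This is a minimal sketch in Python, not RTD code: the `Session` and `Click` records, the IP list, and the thresholds (3 hours, 500 pages, every link followed on at least 3 pages, 10 clicks at identical intervals) are hypothetical choices for illustration. Rule 4 is generalized from "exactly 20 seconds" to any identical interval, and rule 5 is left out because "non-linear navigation" needs a site-specific definition.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Hypothetical thresholds and lists; tune them against your own traffic.
KNOWN_ROBOT_IPS = {"192.0.2.10"}
MAX_DURATION = timedelta(hours=3)
MAX_PAGES = 500
MIN_SWEPT_PAGES = 3   # pages on which every link was followed
REGULAR_CLICKS = 10   # consecutive clicks at one identical interval


@dataclass
class Click:
    timestamp: datetime
    url: str
    links_on_page: int   # links available on the page
    links_followed: int  # distinct links from that page the visitor opened


@dataclass
class Session:
    ip: str
    clicks: list[Click] = field(default_factory=list)


def is_known_robot(s: Session) -> bool:
    # Rule 1: list of known robotic IPs.
    return s.ip in KNOWN_ROBOT_IPS


def is_too_long(s: Session) -> bool:
    # Rule 2: very long sessions, by duration or by page count.
    if not s.clicks:
        return False
    duration = s.clicks[-1].timestamp - s.clicks[0].timestamp
    return duration > MAX_DURATION or len(s.clicks) > MAX_PAGES


def sweeps_every_link(s: Session) -> bool:
    # Rule 3: methodically following every link on several pages.
    swept = [c for c in s.clicks
             if c.links_on_page > 0 and c.links_followed >= c.links_on_page]
    return len(swept) >= MIN_SWEPT_PAGES


def has_regular_intervals(s: Session) -> bool:
    # Rule 4: REGULAR_CLICKS clicks separated by identical gaps,
    # that is, REGULAR_CLICKS - 1 equal gaps in a row.
    times = [c.timestamp for c in s.clicks]
    gaps = [b - a for a, b in zip(times, times[1:])]
    run = 1
    for prev, cur in zip(gaps, gaps[1:]):
        run = run + 1 if cur == prev else 1
        if run >= REGULAR_CLICKS - 1:
            return True
    return False


RULES = (is_known_robot, is_too_long, sweeps_every_link, has_regular_intervals)


def classify(s: Session) -> str:
    """Return the partitioning flag for a session."""
    return "Robot" if any(rule(s) for rule in RULES) else "Normal"


start = datetime(2010, 3, 22, 9, 0, 0)
bot = Session("203.0.113.5", [
    Click(start + timedelta(seconds=20 * i), f"/p{i}", 12, 1) for i in range(10)
])
human = Session("203.0.113.7", [
    Click(start + timedelta(seconds=t), f"/q{i}", 12, 1)
    for i, t in enumerate([0, 14, 95, 130])
])
print(classify(bot), classify(human))  # Robot Normal
```

Because each rule is an independent predicate, new heuristics can be appended to `RULES` without touching `classify`.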

Now, an interesting experiment would be to use the flag above as the output of a model, to see whether robots have more subtle characteristics that a model could learn, so that it can detect robots even when they slip through the cracks of the rules and heuristics.

In any case, partitioning the models by session type is a simple technique to implement, and it keeps robot traffic from distorting the reports and models while still letting you count it.

Monday Feb 22, 2010

The problem with Process Automation is Automation itself

Automation - (Noun) the use of machines to do work that was previously done by people

Replacing people with machines makes it possible to tremendously increase the capacity of a process, which has obvious economic advantages. Automation has been successful in replacing people's work and in improving many aspects of the process beyond capacity. For example, automated processes handle units much more uniformly.

So what is wrong with automation? Nothing, really, except that there are a few things that people do better than machines. My two favorite human characteristics that tend to be lost with automation are:

  1. The capability of the process to learn
  2. The capability of people to discern between different cases

With automation we are able to run the same process again and again, sometimes repeating the same mistake again and again. With automation we also tend to treat every unit the same way.

Let's take the simple example of automated phone answering. Most companies today use IVR (interactive voice response) software to answer the phone, but how many differentiate between callers? If a valuable bank customer who is approaching retirement age calls the bank after not calling for 5 years, how many banks will actually do the right thing with this customer, which is to kidnap the customer from the IVR and connect them directly with the best agent? How many companies are set up to discover that a problem affecting 1% of their callers cannot be solved in the IVR, yet these customers still have to go through a frustrating tree of options to reach a person who can actually help them?

If there were an actual human capable of watching all the interactions in the IVR, seeing the short- and long-term results of these calls, and influencing the way decisions are made in the IVR, the results from automation would be much better.

RTD was designed to infuse these missing elements, learning and differentiating (sometimes called "personalization"), into business processes. This takes us a step further toward better automation of business processes: not yet matching all the capabilities of humans, but at least bringing some "common sense" into them.

Issues related to Oracle Real-Time Decisions (RTD). Entries include implementation tips, technology descriptions and items of general interest to the RTD community.

