Friday Jan 06, 2012

Recording Events

There is always more than one way to skin a cat; it's just that some ways to excoriate a feline are more efficient than others. This is certainly true for the way in which we can record events against choices in RTD, and the differences in performance can be striking.

getChoice

A common approach to record events against static choices is to use the getChoice API.

Choice ch = MyChoiceGroup.getChoice("MyChoiceId");
ch.recordEvent("Clicked");

On the first line, we are asking RTD to go through the list of all choices in MyChoiceGroup and retrieve one particular choice keyed MyChoiceId. On the second, we record a Clicked event against the returned choice. 

Because we are asking RTD to go through the list of all choices, we require that RTD has a full list of choices available; even if we only end up using a single choice. If we use the same code for dynamic choices, RTD will thus have to fetch and instantiate the full list of dynamic choices in order to find the single one we are interested in. This may be an expensive operation, as it may require accessing an external database or web service; execution of complex custom code; and/or instantiating a large number of dynamic choices.

(Note that dynamic choices do not cache at the choice level. They cache at the entity level. These entities hold the data for the dynamic choices but not the choices themselves. The RTD API could perhaps be optimized differently, but the current implementation will instantiate all the dynamic choices and then find the one the API call is looking for. Consider also that the actual source data used for the dynamic choices could come from anywhere and need not come from cached entities at all; it might even be generated on the fly through custom functions. This flexibility in sourcing dynamic choices makes retrieving a single choice non-trivial from the RTD API perspective.)

This approach will certainly work, but may waste precious time and resources retrieving and instantiating dynamic choices that are never used.

getPrototype

For recording an event, it is sufficient that we have a choice that has the desired SDOId; all other choice attributes are irrelevant for this purpose. A more efficient way to record events against dynamic choices is therefore to create a new empty dynamic choice; assign it an SDOId; and record the event against that, rather than retrieving the 'actual' dynamic choice. The result in terms of statistics and learning are the same, but in this approach there is no need for RTD to retrieve any choices at all.

We can instantiate an empty dynamic choice using the getPrototype API.

Choice ch = new MyDynamicGroupChoice(MyDynamicGroup.getPrototype());
ch.setSDOId("MyDynamicGroup" + "$" + "MyChoiceId");
ch.recordEvent("Clicked");

This approach can provide significant performance improvements in implementations with a large number of dynamic choices; or in implementations where retrieving dynamic choices is a non-trivial complex operation and entity caching proves insufficient.

Being able to record events on choices that are not retrievable through the dynamic list is an additional advantage that has several interesting applications which we will explore in future posts.

(Special thanks to Michel Adar for bringing this to my attention and providing an initial draft of this article.)

Wednesday Nov 17, 2010

Performance Tips

As RTD implementations become more and more sophisticated and the applications extend the reach of decisions far beyond selecting the next best offer we have been recommending some design decisions in order to ensure a desired level of performance.

By far, the most significant factor affecting performance is external system access. In particular database access. This goes for reads as well as writes. Here are a few tips that are easy to implement and are good to keep in mind when designing a configuration;

  1. Data that is repeatedly used in decisions by different sessions should be cached. Examples include Offer Metadata, Product Catalog, Content Catalog, and Zip Code Demographics.
  2. If possible, data that will be needed in decisions should be pre-fetched. For example, customer profiles could be loaded at the very beginning of a session.
  3. A good storage system for data is an Oracle Coherence Cache. Particularly if it is configured with local storage in the same app server as RTD. Data can include customer profile, event history, etc.
  4. When writing to the database and there is no need to be transactional and synchronous, use RTD's batch writing capabilities. This can increase write performance by an order of magnitude.
  5. Avoid unnecessary writes and writing unnecessary data. For example, avoid writing metadata together with event data if the metadata can be linked.
  6. Consider using stored procedures when updating several tables to minimize roundtrips to the database
  7. If a result set is potentially very large, consider wrapping the query with a stored procedure that limits the number of rows returned. For example, if the application calls for loading the purchase history of a customer, and the median length of the list is 3 purchases, but there are 15 customers with 10000 purchases or more, processing these [good] customers will take long - it may be acceptable from the point of view of the application logic to just load a maximum of the latest 100 purchases.
  8. When loading metadata, avoid loading data that will not be used. For example, if there are 500k products in the catalog, but realistically only 90k have any real chance of being selected for a recommendation, do the filtering when loading the data and avoid loading the full list.
  9. Asynchronous processing is not free - avoid unnecessary processing. For example, a decision may take 5 ms of actual CPU processing. That would limit the theoretical throughput of a single CPU to 200 decisions per second. If we add 15 ms of asynchronous processing per decision, we will not be affecting response time, but the throughput will be afected - the thoretical throughput being reduced to 50 per second.

In addition to these tips, it is important also for the environment to be propery setup to achieve peak performance. Some tips include:

  1. Prefer physical servers to virtualized ones
  2. Always calculate to have at least two CPU cores per JVM
  3. Make sure the memory requirements and settings match the available memory to avoid swapping JVM memory
  4. If using virtualized servers make sure that CPUs are not overallocated. That is, do not run 5 virtual machines configured for 2 CPUs each on an 8 core system. While such a setup may be acceptable for some applications, with throughput intensive applications like RTD such a setup would certainly cause performance problems and these problems are difficult to diagnose.
  5. If using virtualized servers make sure the virtual machine's configured memory will be resident in physical memory. If the Guest believes that it has 4GB of memory, but the host needs to use swap to fulfill that amount of memory performance and availability will suffer. Problems in this area are very difficult to diagnose because to the Guest OS it looks as if CPU cycles were stolen.
About

Issues related to Oracle Real-Time Decisions (RTD). Entries include implementation tips, technology descriptions and items of general interest to the RTD community.

Search

Categories
Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today