By Michel Adar on Nov 17, 2010
As RTD implementations become more and more sophisticated and the applications extend the reach of decisions far beyond selecting the next best offer we have been recommending some design decisions in order to ensure a desired level of performance.
By far, the most significant factor affecting performance is external system access. In particular database access. This goes for reads as well as writes. Here are a few tips that are easy to implement and are good to keep in mind when designing a configuration;
- Data that is repeatedly used in decisions by different sessions should be cached. Examples include Offer Metadata, Product Catalog, Content Catalog, and Zip Code Demographics.
- If possible, data that will be needed in decisions should be pre-fetched. For example, customer profiles could be loaded at the very beginning of a session.
- A good storage system for data is an Oracle Coherence Cache. Particularly if it is configured with local storage in the same app server as RTD. Data can include customer profile, event history, etc.
- When writing to the database and there is no need to be transactional and synchronous, use RTD's batch writing capabilities. This can increase write performance by an order of magnitude.
- Avoid unnecessary writes and writing unnecessary data. For example, avoid writing metadata together with event data if the metadata can be linked.
- Consider using stored procedures when updating several tables to minimize roundtrips to the database
- If a result set is potentially very large, consider wrapping the query with a stored procedure that limits the number of rows returned. For example, if the application calls for loading the purchase history of a customer, and the median length of the list is 3 purchases, but there are 15 customers with 10000 purchases or more, processing these [good] customers will take long - it may be acceptable from the point of view of the application logic to just load a maximum of the latest 100 purchases.
- When loading metadata, avoid loading data that will not be used. For example, if there are 500k products in the catalog, but realistically only 90k have any real chance of being selected for a recommendation, do the filtering when loading the data and avoid loading the full list.
- Asynchronous processing is not free - avoid unnecessary processing. For example, a decision may take 5 ms of actual CPU processing. That would limit the theoretical throughput of a single CPU to 200 decisions per second. If we add 15 ms of asynchronous processing per decision, we will not be affecting response time, but the throughput will be afected - the thoretical throughput being reduced to 50 per second.
In addition to these tips, it is important also for the environment to be propery setup to achieve peak performance. Some tips include:
- Prefer physical servers to virtualized ones
- Always calculate to have at least two CPU cores per JVM
- Make sure the memory requirements and settings match the available memory to avoid swapping JVM memory
- If using virtualized servers make sure that CPUs are not overallocated. That is, do not run 5 virtual machines configured for 2 CPUs each on an 8 core system. While such a setup may be acceptable for some applications, with throughput intensive applications like RTD such a setup would certainly cause performance problems and these problems are difficult to diagnose.
- If using virtualized servers make sure the virtual machine's configured memory will be resident in physical memory. If the Guest believes that it has 4GB of memory, but the host needs to use swap to fulfill that amount of memory performance and availability will suffer. Problems in this area are very difficult to diagnose because to the Guest OS it looks as if CPU cycles were stolen.