Sizing an RTD installation - Part 3 (final)

Now that we know how to compute the number of requests per second and have seen the other factors that need to be considered, we can finally compute the number of CPUs needed to cope with the desired load. This number is quite easy to compute: for planning purposes we usually account for 100 requests/second/CPU. This leaves enough room for higher peak loads or other underestimations in the process; in typical cases we see a higher throughput per CPU.

For example, if we need to support 300 requests per second, we can plan for 3 CPUs for the Decision Service. The other processes, Learning Server and Workbench Server, can usually run either on one of the Decision Service CPUs or on their own.
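The rule of thumb above can be sketched as a one-line calculation; the 100 requests/second/CPU figure is the planning number from the text, not a measured benchmark:

```python
import math

# Planning rule of thumb from the post: ~100 requests/second/CPU.
REQUESTS_PER_SECOND_PER_CPU = 100

def decision_service_cpus(peak_requests_per_second: float) -> int:
    """CPUs to plan for the Decision Service at a given peak load."""
    return math.ceil(peak_requests_per_second / REQUESTS_PER_SECOND_PER_CPU)

print(decision_service_cpus(300))  # 3
```

Rounding up with `math.ceil` matches the planning intent: any fractional remainder still needs a whole CPU.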

Now, let's say there is a desire to use standard servers with 2 CPUs, each CPU with 4 cores. In this case, one server (8 cores) would have more than enough computing power to cope with the number of requests per second. Nevertheless, we may choose to have two of these servers, that is, 16 cores in total, to provide high availability.

If this same configuration were used with Disaster Recovery, we may end up running two servers in each of two sites, with a total of 32 CPU cores. That, of course, is more computing power than necessary to cope with the load.
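The core counts in the two scenarios above are straightforward multiplication; a quick sketch, using the hypothetical 2-CPU, 4-core server shape from the example:

```python
cpus_per_server = 2
cores_per_cpu = 4
cores_per_server = cpus_per_server * cores_per_cpu  # 8 cores per server

ha_cores = 2 * cores_per_server  # two servers in one site for high availability
dr_cores = 2 * ha_cores          # HA pair mirrored at a second site for DR

print(ha_cores, dr_cores)  # 16 32
```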

An alternative that is counterintuitive for people running transactional applications is to run RTD on just one server and pay the price of non-availability. This may be acceptable depending on the application. For example, in offer optimization, if the expected downtime of a single server is just a couple of hours per year, then the cost of running non-redundant servers may be lower than the cost of an HA setup.

In any case, the numbers above are for basic planning purposes. If many sessions are being initialized and there are not so many other kinds of events, the equations may look different, as session initialization usually takes more resources than other requests. Additionally, the load-balancing strategy in front of the RTD servers also affects performance; maximum speed is attained when the load-balancing scheme is capable of maintaining session affinity.

Finally, for really high throughput, in the thousands of requests per second, the strategy is to partition the servers along some strict lines. This partitioning strategy can be taken all the way into the database.

Those postings are very useful. A couple of points that might be interesting to discuss because they might need to be factored in, in some cases:

- CPU size: "we usually account for 100 requests/second/CPU". I assume this is for a last-generation Xeon core (around 3.6 GHz). Depending on the CPU used (Itanium, Sun T2k vs. M5k servers, IBM P5 or P6, ...), some ratio may have to be applied.

- Memory: is there a baseline for how much memory is used by a typical session? For example, for 1 entity of 100 attributes using 10 bytes on average each, can we assume the session weighs 1k bytes, or is there a baseline cost to start a session even before loading entities? Assume (just as an example) 8k bytes/session, and that at peak time we have 100k concurrent sessions (derived from the number of calls/min and the average session duration): that would be 800 MB. Not much memory in itself, but since RTD is 32-bit it may mean it is necessary to start at least 2 JVMs in a cluster to stand the load (regardless of high availability).

- Load on the database server: customers will typically ask how to size the RTD database server itself (size, CPU and memory) as well as the load on the databases holding the entity information. It is likely to be rather light overall; however, pounding a DB server with a small number of queries repeated at high frequency, with or without bind variables, may require some specific tuning by the DBA depending on the underlying technology.

- Unusual (not to say non-best-practice) processing requirements (complex function processing, calling web services, calling DB stored procedures, calling decision web services with very large payloads, ...) may impact response time and sometimes result in additional CPU needs.

Olivier

Posted by Olivier Lemaire on December 07, 2009 at 07:23 PM PST #
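Olivier's memory arithmetic is easy to check; the 8k bytes/session and 100k concurrent sessions below are his illustrative assumptions, not measured RTD figures:

```python
bytes_per_session = 8_000           # assumed ~8k bytes per session (example figure)
peak_concurrent_sessions = 100_000  # assumed peak concurrency (example figure)

total_mb = bytes_per_session * peak_concurrent_sessions / 1e6
print(f"{total_mb:.0f} MB")  # 800 MB
```

Well under a 32-bit heap limit on its own, as he notes; the concern is headroom once everything else in the JVM is added.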

Thanks Olivier. We use 100 requests per second per CPU as a general rule of thumb, but it is very imprecise as it depends on the CPU. Also, with modern multicore CPUs this performance is probably applicable to each core. I still recommend using the 100 rule of thumb as a first planning swag and fine-tuning it later as the application takes shape. For extremely large volumes, on the order of 1000 or more requests per second, I would recommend more precise planning and measurement, including ensuring proper session affinity in the load-balancing layer. For even higher loads, in the order of thousands per second, partitioning is necessary.

As far as memory usage is concerned, your assumption about 32 bit is incorrect. When running on a 64-bit CPU, OS and JVM, RTD is a 64-bit application and can take advantage of larger heaps. We have several customers running in 64-bit environments. The memory footprint of sessions varies quite a lot, so it is difficult to give estimates. Nevertheless, I agree with you that we would benefit from having rules of thumb for them too; we should be able to get some benchmarking done to produce these numbers.

With respect to the database size, we do have a spreadsheet that allows you to estimate the size of the database. In this context it is interesting to note that most of the size of the database comes from optional tables, like the statistics and history tables. Therefore the size of the DB is mainly dictated by the settings. For example, if you have on the order of 100 requests per second, and ask RTD to keep the event history for 6 months and to emit statistics every minute, you could easily get to 10 terabytes, while the same configuration with statistics emitted every hour and no history being saved would use a couple of hundred GB at most.

Posted by on December 08, 2009 at 12:12 AM PST #
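The 10-terabyte figure in the reply above can be reproduced as an order-of-magnitude sketch; the per-event footprint below (~6.4 KB, covering history rows plus per-minute statistics) is a hypothetical assumption chosen only to make the arithmetic land in the right range, not an RTD number:

```python
requests_per_second = 100
history_days = 183       # ~6 months of event history
bytes_per_event = 6_400  # hypothetical storage per event (assumption)

events = requests_per_second * 86_400 * history_days  # seconds/day * days
history_tb = events * bytes_per_event / 1e12
print(f"~{history_tb:.0f} TB")  # ~10 TB
```

The real estimate should come from the sizing spreadsheet mentioned in the reply, since the per-event footprint depends heavily on which statistics and history tables are enabled.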


