TimesTen In-Memory Database
for Extreme Performance

Server Scalability Curves - a Missing Law of Computing?

Doug Hood
Evangelist for Oracle TimesTen, Oracle In-Memory and Oracle NoSQL

A Puzzling Customer Performance Question

Recently a customer wanted to know why using one quarter fewer connections did not affect their system throughput. The reason was not obvious at first, but then I remembered a fundamental computing concept: server scalability curves. In short, the customer's server was overloaded.


The most important Laws of Computing

Computer Science is a relatively young and evolving discipline, but there are some fundamental truisms or laws:

Some Important Computer Laws


The Missing Law of Server Scalability Curves

While Neil Gunther’s Universal Scalability Law can be great at modelling systems for scalability, it can scare off those who do not have a strong grasp of queuing theory and Little's Law. A simpler, more intuitive concept [without the math] is Server Scalability Curves. Many computer performance tuning books and internet articles talk about concepts related to the performance curve of a specific server, but I am not aware of a generalization of the concept. In over 30 years of computing, I have never seen a server system that does not fit within the model of Server Scalability Curves.


Throughput and Latency vs Connections


Interpretation of the Law of Server Scalability Curves

If you are a computer performance engineer, then these graphs are likely familiar and my comments obvious. If you are not a computer performance engineer, then you may only be familiar with the light load zone of the Server Scalability Curve [i.e. the linear scalability part].

Linear scalability

Obviously not all computer servers (SQL RDBMS, NoSQL, HTTP, REST, SOAP, CORBA, Tuxedo, CICS etc) have the same scalability curve, but they all have a scalability curve with the following characteristics:

  • Under light load, adding more connections will tend to increase the system throughput

Light Load Zone

  • Under heavy load, adding more connections will tend to neither increase nor decrease the throughput

Heavy Workload Zone

  • In the buckle zone, adding more connections will tend to decrease the throughput

Buckle Zone

There is also a direct relationship between latency and throughput:

  • Under light load, the latency will be the lowest and tend to be flat

Latency Light Load Zone

  • Under heavy load, the latency will tend to slowly increase with the number of connections

Latency Heavy Load Zone

  • In the buckle zone, the latency will tend to grow without bounds. This is often called the hockey stick, where the latency becomes unacceptable

Latency Buckle Zone

While I claim that all servers have a scalability curve, they all tend to differ in how they behave in the light load, heavy load and buckle zones.
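To make the three zones concrete, here is a minimal sketch that labels each step between consecutive measurements by how much the throughput changed. The function name `label_zones` and the 5% `tolerance` are my own illustrative choices, not part of any tool, and the input is assumed to be a list of (connections, throughput) pairs sorted by connections with positive throughput values.

```python
from typing import List, Tuple


def label_zones(points: List[Tuple[int, float]],
                tolerance: float = 0.05) -> List[str]:
    """Label each step between consecutive (connections, throughput) points.

    tolerance is an assumed threshold: a relative throughput change smaller
    than this (up or down) is treated as "flat".
    """
    labels = []
    for (_, prev), (_, cur) in zip(points, points[1:]):
        change = (cur - prev) / prev  # relative change in throughput
        if change > tolerance:
            labels.append("light load")   # more connections, more throughput
        elif change < -tolerance:
            labels.append("buckle")       # more connections, less throughput
        else:
            labels.append("heavy load")   # throughput is roughly constant
    return labels
```

A real curve is noisy, so in practice each point would probably be an average over several runs before being labelled this way.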


Real world Servers

Real world servers are not perfect and tend to vary in the following ways:
  • How close they get to linear scalability in the light load zone
  • How long they can continue to scale in the light load zone [i.e. what is the maximum throughput given a workload and hardware/software configuration]
  • How long they can sustain a mostly constant throughput in the heavy load zone
  • How quickly their throughput decreases [and latency increases] in the buckle zone

Scalability Curves for Two Different Servers


An answer to my customer's question

So why did my customer's throughput not change when they decreased their connections by 1/4? Their system was in the heavy load zone of the server scalability curve, where adding or removing connections tends to neither increase nor decrease the throughput.


How can you determine the zones of the Server Scalability Curve?

You need to test the server with a given workload and hardware configuration, as each combination of workload and hardware will tend to give a different scalability curve. A simple way to create a server scalability curve is:

  • Use powers of 2 [e.g. 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536] to vary the number of connections
    • Servers cannot scale forever as they have a finite number of CPU cores, a fixed number of network cards and a finite bandwidth and response time for their persistent storage
  • Once the throughput stops increasing, use a binary search to determine where the light load zone becomes the heavy load zone
  • Once the start of the heavy load zone is determined, increase [double] the number of connections until the throughput decreases
  • Use a binary search to determine where the heavy load zone ends and the buckle zone starts
  • You do not need to test too far within the buckle zone as the latency will tend to increase rapidly [without bounds]
  • Chart the throughput vs connections and latency vs connections graphs
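The steps above can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: `measure` is a hypothetical callback that you would write yourself to run the workload with a given number of connections and return the observed throughput, and the 5% `tolerance` used to decide whether throughput "stopped increasing" or "decreased" is an arbitrary choice of mine. The binary searches also assume the curve is reasonably smooth between the sampled points.

```python
from typing import Callable, Dict, Optional


def find_zones(measure: Callable[[int], float],
               max_connections: int = 65536,
               tolerance: float = 0.05) -> Dict[str, Optional[int]]:
    """Estimate where the heavy load zone and the buckle zone start.

    measure(n) is a hypothetical, user-supplied function that runs the
    workload with n connections and returns the throughput.
    """
    cache: Dict[int, float] = {}

    def tput(n: int) -> float:
        # Each connection count is only measured once.
        if n not in cache:
            cache[n] = measure(n)
        return cache[n]

    # Step 1: double the connections until throughput stops increasing.
    n = 1
    tput(n)
    while 2 * n <= max_connections and tput(2 * n) > tput(n) * (1 + tolerance):
        n *= 2
    if 2 * n > max_connections:
        # Throughput was still growing at the limit: no plateau was found.
        return {"heavy_start": None, "buckle_start": None}

    peak = max(cache.values())
    floor = peak * (1 - tolerance)  # at or above this counts as "flat"

    # Step 2: binary search for the smallest count that reaches the plateau.
    lo, hi = max(n // 2, 1), n
    while lo < hi:
        mid = (lo + hi) // 2
        if tput(mid) >= floor:
            hi = mid
        else:
            lo = mid + 1
    heavy_start = lo

    # Step 3: keep doubling until the throughput drops below the plateau.
    last_ok, first_bad, m = n, None, 2 * n
    while m <= max_connections:
        if tput(m) < floor:
            first_bad = m
            break
        last_ok = m
        m *= 2
    if first_bad is None:
        return {"heavy_start": heavy_start, "buckle_start": None}

    # Step 4: binary search for the first count where throughput has dropped.
    lo, hi = last_ok + 1, first_bad
    while lo < hi:
        mid = (lo + hi) // 2
        if tput(mid) < floor:
            hi = mid
        else:
            lo = mid + 1
    return {"heavy_start": heavy_start, "buckle_start": lo}
```

Because the doubling and the binary searches each take a logarithmic number of steps, the number of test runs stays small even when the connection limit is large. Recording latency alongside throughput in the same runs gives you the data for both charts.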


What can you do with a Server Scalability Curve?

Having a server scalability curve for your server / hardware / workload allows you to:

  • Determine the optimal throughput given an acceptable latency
  • Determine how many servers you may need to use for your desired workload [i.e. use horizontal scalability]
  • Have a baseline for comparing new hardware, server software versions or different [software] servers
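As a rough illustration of the first two uses, the sketch below picks the best operating point from a measured curve and estimates a server count. The function names are hypothetical, the curve is assumed to be a dict mapping connections to (throughput, latency in ms), and the server estimate assumes the workload splits evenly across servers with no coordination overhead, which real systems only approximate.

```python
import math
from typing import Dict, Optional, Tuple

# connections -> (throughput in ops/sec, latency in ms)
Curve = Dict[int, Tuple[float, float]]


def best_operating_point(curve: Curve,
                         max_latency_ms: float) -> Optional[Tuple[int, float]]:
    """Return (connections, throughput) with the highest throughput whose
    latency is acceptable, or None if no measured point qualifies."""
    ok = [c for c, (_, latency) in curve.items() if latency <= max_latency_ms]
    if not ok:
        return None
    # Prefer higher throughput; on a tie, prefer fewer connections.
    best = max(ok, key=lambda c: (curve[c][0], -c))
    return best, curve[best][0]


def servers_needed(target_throughput: float, per_server: float) -> int:
    """Servers required, assuming throughput adds up linearly across servers."""
    return math.ceil(target_throughput / per_server)


# Made-up numbers for illustration only, not measurements.
example_curve: Curve = {
    8: (800.0, 1.0),
    16: (1500.0, 1.1),
    32: (2000.0, 2.0),
    64: (2050.0, 6.0),
    128: (1200.0, 40.0),
}
point = best_operating_point(example_curve, max_latency_ms=5.0)
```

With these made-up numbers, the point at 64 connections has slightly higher throughput but is rejected because its latency exceeds the 5 ms limit, which is the trade-off the curve is meant to expose.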


Disclaimer: These are my personal thoughts and do not represent Oracle's official viewpoint in any way, shape, or form.
