News, tips, partners, and perspectives for the Oracle Solaris operating system

Some thoughts about scalability

In a recent email thread about scalability and why Solaris is especially good at it, some long-time performance gurus summarized the subject matter so well that I thought it worth sharing with a broader community. They agreed, so here it is:

What is scalability, and why is Solaris so good at not preventing applications from scaling?

Good scalability is a classic observation about systems that have been profiled for multiple years. They not only perform well at high load, they also degrade less under overload.

The cause is usually described mathematically: the slope of the response time (degradation) curve is dominated by the service time of the single slowest component. A new product usually has a few large bottlenecks, and because they're large, the response time curve takes off for infinity early and goes almost straight up. Overloading the system even a little bit causes it to "hit the wall" and seem to hang. If X is load and Y is response time, the curve looks like this:

[Figure: response time (Y) versus load (X), nearly flat and then rising almost vertically]

That's called the "hockey-stick" curve in the trade ;-) Response time is fairly flat until an inflection point, then heads up like a homesick angel.

A well-profiled, mature product has lots of little bottlenecks, one of which is the largest and therefore sets the inflection point and the slope. With a small bottleneck, the slope is gentle, and during an overload, the users see the system as somewhat slow, not hung. This looks a little like this:

[Figure: response time (Y) versus load (X), with a gentle slope past the inflection point]
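The difference between the two curves can also be sketched numerically. The snippet below assumes a single-server queueing model (M/M/1), which the original emails do not name; in that model the mean response time is S / (1 - X*S), where S is the service time of the slowest component and X is the arrival rate. Both service times are invented for illustration.

```python
import math


def response_time(arrival_rate, service_time):
    """Mean response time of an M/M/1 queue (an assumed model, not from the emails)."""
    utilization = arrival_rate * service_time
    if utilization >= 1.0:
        return math.inf  # saturated: the queue grows without bound
    return service_time / (1.0 - utilization)


# Hypothetical service times of the slowest component, in seconds.
NEW_PRODUCT = 0.100     # one large bottleneck
MATURE_PRODUCT = 0.010  # the largest of many small bottlenecks

if __name__ == "__main__":
    print("load(req/s)  new(s)  mature(s)")
    for load in (1, 5, 8, 9, 9.5, 9.9):
        new = response_time(load, NEW_PRODUCT)
        mature = response_time(load, MATURE_PRODUCT)
        print(f"{load:>11}  {new:6.2f}  {mature:9.4f}")
```

Under these assumptions the product with the large bottleneck saturates near 10 requests per second, while the tuned one is still on the flat part of its curve at the same load.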

The reason you get bad performance at high loads on unprofiled programs is that above 80% load, there is a good chance that multiple users will make requests at the same time and momentarily drive the system past its inflection point into degradation. Because the system has not been optimized, the degradation at that point is large and user-visible. This usually hits at around 70 or 80%, sometimes even less.

We've been hunting down and fixing the slow bits for a long time, and have a very very gentle degradation curve.  PCs, on the other hand, tend to hit the wall really easily, and often.  Some of their legendary unreliability is really bogus: users overload their machines, assume they've hung, and then reboot.

In particular, the fine-grained spin-locking in Solaris is often celebrated as being responsible for a lot of its superior scaling. In contrast, coarse-grained locks inflate the response time of inherently serial operations, with the resulting impact just as Amdahl's Law would dictate. A large set of evolved, architecturally-aware features makes the Solaris scheduler itself a huge factor in the superior scaling of Solaris. Other features, such as evolved AIO options and preemption control, which have been well integrated by Oracle, provide even more reasons for better scaling.
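To make the Amdahl's Law point concrete, here is a minimal sketch. The law gives the speedup on n CPUs as 1 / (s + (1 - s) / n), where s is the fraction of the work that must run serially. The two serial fractions below are hypothetical stand-ins for fine-grained and coarse-grained locking, not measurements of any real kernel.

```python
def amdahl_speedup(serial_fraction, cpus):
    """Speedup on `cpus` processors when `serial_fraction` of the work is serial."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cpus)


# Hypothetical serial fractions, chosen only for illustration.
FINE_GRAINED = 0.01    # little time spent serialized behind locks
COARSE_GRAINED = 0.10  # a coarse lock serializes ten times as much work

if __name__ == "__main__":
    for cpus in (1, 8, 64, 512):
        fine = amdahl_speedup(FINE_GRAINED, cpus)
        coarse = amdahl_speedup(COARSE_GRAINED, cpus)
        print(f"{cpus:>4} CPUs: fine-grained {fine:7.2f}x, coarse-grained {coarse:6.2f}x")
```

However many CPUs are added, the speedup can never exceed 1 / s, which is why shrinking the serialized portion matters more as the machine gets bigger.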

I should add that superior scaling is not all about peak throughput and the average response-time curve as a function of load, but also tends to manifest as reduced variance in response times in many cases - as well as the "graceful degradation" on the far side of peak throughput that you mentioned.  Those are factors I'd like to see more-frequently characterized - but the habit in the benchmarking world is often to simply celebrate the peak results.

A last thing I'd mention is that the foundation of this was laid out when Sun's version of SVR4 was defined. We pretty much threw out AT&T's implementation and did our own with the idea of full pre-emption, multi-threading and all the rest. It's much, much easier to deliver things on top of a foundation that is built solidly. If you also need to rebuild the foundation, it's far harder to make things work. One could make a pretty solid argument that, without the foundation, the rest would have been much, much harder.

Big hardware on top of a great foundation leads to customers who throw more work at the boxes, which exposes problems that we fix, which leads to customers throwing even more work at the boxes...

To see all this in action, here's what you need for a live demo:

Start the old Gnome perfmeter (or perfbar or mpstat) on a customer system running an interactive load.  You can use the Java2D demo that comes with any JDK.  Then fire up dummy CPU loads and push the %CPU higher and higher in front of the customer's eyes, until it finally starts to feel slow. They'll be amazed at how close to 100% they are before they see any actual, user-visible degradation.
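If no dummy CPU load is at hand, one can be improvised. The following is a minimal sketch, not the tool used in the original demo: each worker process simply spins until a deadline. The script name `cpuburn.py` and the function names are made up for this example.

```python
import multiprocessing
import sys
import time


def burn(seconds):
    """Spin on one CPU for roughly `seconds`; return the number of loop iterations."""
    deadline = time.monotonic() + seconds
    iterations = 0
    while time.monotonic() < deadline:
        iterations += 1
    return iterations


def start_burners(count, seconds):
    """Start `count` processes that each keep one CPU busy for `seconds`."""
    workers = [multiprocessing.Process(target=burn, args=(seconds,))
               for _ in range(count)]
    for worker in workers:
        worker.start()
    return workers


if __name__ == "__main__":
    if len(sys.argv) == 3:
        count, seconds = int(sys.argv[1]), float(sys.argv[2])
        for worker in start_burners(count, seconds):
            worker.join()
    else:
        print("usage: cpuburn.py <processes> <seconds>")
```

Starting it with a small process count and raising the count step by step pushes %CPU up gradually, which is the effect the demo relies on.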

And if you want to make their brains explode, use SRM to grant their app 80% of the CPU and then start dozens of dummy CPU loads in another zone to force the CPU to pin at 100%, while their performance stays fine. Of course, this is incredible enough that you may just convince them that you're faking it ;-)
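Why the app stays responsive can be illustrated with a little arithmetic. The sketch below assumes a simple proportional-share model, in which each active workload is entitled to its shares divided by the total of all active shares. It does not call any Solaris interface, and the workload names and share counts are hypothetical.

```python
def cpu_entitlement(shares):
    """Map each active workload to its CPU fraction under full contention."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("at least one workload needs a positive share count")
    return {name: count / total for name, count in shares.items()}


def per_process_slice(entitlement, process_count):
    """Split one workload's entitlement evenly across its processes."""
    return entitlement / process_count


if __name__ == "__main__":
    # Hypothetical configuration: 80 shares for the app, 20 for the dummy zone.
    fractions = cpu_entitlement({"customer_app": 80, "dummy_zone": 20})
    for dummies in (1, 12, 48):
        each = per_process_slice(fractions["dummy_zone"], dummies)
        print(f"{dummies:>2} dummy loads: app keeps {fractions['customer_app']:.0%}, "
              f"each dummy gets {each:.2%}")
```

In this model, adding more dummy loads only divides the other zone's entitlement into thinner slices; the app's fraction does not change.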

We did this in a demo to techies one immersion week, and even though they knew what we were doing, there were a lot of jaws left on the demo-room floor when they saw the theory in practice.

This article has been compiled from several email messages by

David Collier-Brown
James Litchfield
Bob Sneed

Thank you!
(A German version of this article is available in the German part of this blog.)
