
An Oracle blog about Transactional locks

The perils of negative scalability

Dave Dice
Senior Research Scientist

I've seen the following issue confound customers and colleagues of late, so I thought it worth a blog entry.

Let's say you have an application that exhibits negative scalability. That is, if you were to plot throughput on the Y-axis and concurrency on the X-axis, the curve would be shaped like an inverted U: performance climbs to an apex and then falls off. (How can this happen? A common reason is that communication and synchronization overheads start to dominate, and the benefit of added concurrency is overwhelmed by those costs.) Under such circumstances it's common to introduce some type of admission control -- say, simple back-off or a more elaborate mechanism -- to restrict concurrency. Ideally, this yields an asymptotic curve where performance remains constant after reaching the peak, avoiding any fall-off.
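
To make that concrete, here is a minimal sketch of one simple form of admission control in Java: a counting semaphore that caps the number of threads allowed into the contended section at any one time. The class name, the permit count, and the criticalSection parameter are illustrative placeholders, not code from any particular application.

import java.util.concurrent.Semaphore;

// Sketch of semaphore-based admission control: at most PERMITS threads
// may execute the contended section concurrently; the rest wait at the gate.
public class AdmissionControl {
    private static final int PERMITS = 8;   // assumed concurrency cap; tune to the workload
    private final Semaphore gate = new Semaphore(PERMITS);

    public void execute(Runnable criticalSection) throws InterruptedException {
        gate.acquire();                      // excess threads block here (implicit back-off)
        try {
            criticalSection.run();
        } finally {
            gate.release();
        }
    }
}

A fixed cap like this is the bluntest possible policy; more elaborate schemes adjust the limit dynamically in response to observed contention.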

If you tune the performance of such an application using the usual measure-analyze-modify cycle but pay attention only to throughput at high concurrency levels, you can be badly misled. The usual development feedback loop can fail because a poor "optimization" that slows down the code may serve as inadvertent, implicit back-off (a crude form of contention management or admission control): it attenuates the negative scalability at higher concurrency levels but needlessly impairs performance at lower concurrency levels. Ideally, back-off should be applied only as needed, in response to contention.

A related effect is that inserting diagnostic probes can yield better performance in the region of negative scalability, simply because of the probe overhead -- a "Heisenbug" in which performance improves when execution is observed more closely.

The take-away is that we should be careful to measure the performance of any proposed change over a wide range of concurrency values, and not just at the extremes.
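
One way to act on that advice is to sweep the thread count and report throughput at each level, rather than benchmarking at a single (high) concurrency setting. The harness below is a rough sketch under assumed names: work() stands in for the operation under test, and the 1..64 range and 5-second measurement window are arbitrary choices.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Sketch of a concurrency sweep: run the same fixed-time measurement at
// 1, 2, 4, ... 64 threads and print throughput for each level.
public class ConcurrencySweep {
    static void work() { /* operation under test */ }

    public static void main(String[] args) throws InterruptedException {
        for (int threads = 1; threads <= 64; threads *= 2) {
            LongAdder ops = new LongAdder();
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            for (int i = 0; i < threads; i++) {
                pool.execute(() -> {
                    while (System.nanoTime() < deadline) {
                        work();
                        ops.increment();
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
            System.out.printf("threads=%d ops=%d%n", threads, ops.sum());
        }
    }
}

A regression that only bites at low or mid concurrency -- the inadvertent back-off described above -- shows up immediately in a sweep like this, but is invisible if you only look at the right-hand end of the curve.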

An interesting human analog is Brooks's law: adding people to a late software project makes it later. The same issues -- the overheads of orchestrating large numbers of humans or threads -- may underlie both effects.


Comments (3)
  • GreenEyed Monday, September 28, 2009

    I've seen it happen, usually related to access to databases. You correct a bottleneck in your application and, all of a sudden, all those requests "fly" straight to the database and bring it down to its knees, ending up with worse performance, because you did not know the bottleneck was actually "protecting" the database from excessive load.

    Nothing that cannot be fixed, but it is true that it is sometimes difficult to explain how "fixing" something can result in worse performance... until you fix the hidden-and-now-uncovered problem.

    S!


  • guest Monday, September 28, 2009

    Good point ...

    This type of effect (or close analogs) manifests in quite a number of ways. A few others that come to mind are CSMA/CD communication protocols arbitrating for a shared resource (spectrum, access to the physical "wire", etc.) and "anticipatory scheduling" (http://en.wikipedia.org/wiki/Anticipatory_scheduling).

    Regards, Dave


  • David Dice Monday, October 26, 2009

    See also: www.cs.sfu.ca/~fedorova/papers/wiosca06.pdf.

    Alexandra Fedorova, Margo Seltzer, and Michael D. Smith, "A Non-Work-Conserving Operating System Scheduler for SMT Processors," in Proceedings of the Workshop on the Interaction between Operating Systems and Computer Architecture (WIOSCA), held in conjunction with ISCA-33, June 2006.

