### How many kernel engineers does it take to solve one differential equation?

#### By rvs on May 11, 2008

Now, before you can answer this fundamental questions, lets take care of
some basics first. Do you really know what does "load average" (as reported by
getloadavg()
or by various utilities such as
uptime and
w) really tell
you? Or even what does it actually mean? Out of 10 random folks informally surveyed in
the hallways of Sun Quentin:

First things first: the last dude from the survey got pretty close to giving a correct answer. Indeed, Load Average numbers are moving averages, but they are not simple. In fact, they are exponential moving averages (or EMA for all the financially minded folks out there). So in a nutshell, what kernel does to gather load average (LA) statistics is rather simple: once every X seconds it samples the size of the runqueue (which gives it the number of runnable processes at that point in time)
and updates the current values of LA numbers using this simple EMA formula:

In reality you have 3 variables like current_average to take care of. Each corresponding
to a 1, 5 and 15 minutes window. But they all do the same thing: they tell you how
many processes on average (an exponential moving one, but still) there were waiting
in the runqueue during the last 1, 5 and 15 minutes. To put it in simple terms: if your
LA numbers are constantly smaller than the total number of cores you have -- you got ripped
off when you bought your quad-core AMD cpu (there's nothing for some of the cores to do
at all, you've got more toys than kids). If the numbers are constantly bigger -- go ahead
and get an extra quad core CPU (get some shiny toys for those kids of yours). Simple huh?
Almost, there's one little question still to be answered: what's the ideal value for the α constant?
Now, before you run to your favorite statistics handbook, here's a shocker: have you
realized that everytime you compute a formula like this:

what you're really doing is numerically solving an ordinary differential equation
using
the backward Euler method? Yep, that's right. There's no escape. You can call it
an EMA, you can call it a low-pass filter, you can call it whatever you like, but the
truth of the matter is: you're reconstructing function Y, such that:

and each consecutive sample Yn is h units away from a previous one.
Now that we know how does the equation look like it would be a good time to
dust off that old calculus book you have sitting in your basement. Once you open
it, you'll get the bad news: there's no compact solution for the

But obviously, as pointed out by Dr. Gunther, that's not what the Linux community wants. Nor is it what the OpenSolaris community wants. In fact unless your operating system was done by Fernando Corbato and co. you're out of luck. For some unknown reason all of the opensource implementations of load average stick with defining α as an exponent of some sorts:

This makes very little sense (although doesn't do much harm either, since numerically
these exponent values are pretty close to the real deal) and leads to all sorts of
complications
in calculating the actual numbers. Of course, if instead of a simple α==60/65==12/13
(Linux kernel has to sample the actual length of the runqueue only once per 5 seconds,
hence the h==5) you stick with exp(-5sec/1min) no wonder you get in trouble! And for what?
Well, at least I don't know. Now, don't get me wrong, it is not like different
αs are a crime. It is definitely not more of a problem than using logarithms
of base 7 or something. On the other hand, when a Linux α effectively gives me a time
constant of 59.204812 seconds and is numerically slightly different from what Solaris uses
one has to wonder what kind of a esoteric knowledge needs to be possessed in order to appreciate something like that. Again, I don't really know.

But after reading the excerpt from the Multics manual reproduced by Dr. Gunther, I do know this: it takes at least one kernel engineer from 60s to actually be able to solve a differential equation, the new guys seem to have flunked the calculus.

- 2 said "Duh?"
- 7 said "Load average measures an average number of processes over 1 minute; 5 minutes and 15 minutes respectively"
- 1 said "Load average numbers represent a simple moving average of the number of process in a scheduler's run queue with a window of 1 minute, 5 minutes and 15 minutes respectively

First things first: the last dude from the survey got pretty close to giving a correct answer. Indeed, Load Average numbers are moving averages, but they are not simple. In fact, they are exponential moving averages (or EMA for all the financially minded folks out there). So in a nutshell, what kernel does to gather load average (LA) statistics is rather simple: once every X seconds it samples the size of the runqueue (which gives it the number of runnable processes at that point in time)

**A bit of digression:**one of the primary jobs of the Operating System is to make sure that all of the processes can finish up as quickly as they can. The CPU is a precious toy that needs to be given to each process fairly, as though they were children in a large family and a kernel was their mother. The good news is: most of the time most of the children are sleeping and don't need any toys to begin with. The reasons of why they are asleep are plentiful: they could be waiting for the IO to complete or waiting for somebody to wake them up when a particular condition gets met, etc. Processes, not children, that is. The important thing is: they do not require CPU and do not fight for it. The few that are ready to go are called "runnable" and are placed on a runqueue. If your system has fewer CPUs than the amount of processes in that queue than a kernel has to make sure that CPU time is fairly allocated between all needy kids\^H\^H\^H\^Hprocesses (you have more kids than toys).

**current_average = α \* previous_average + (1-α) \* runqueue_length**

**Y[i+1] = α \* Y[i] + (1-α) \* X[i]**

dY | 1-α | |

-- | = -T\*Y(t) + T\*X(t), where T = | ----- |

dt | α\*h |

**A fundamental digression:**if you think about what a typical ordinary differential equation is, you can quickly realize that, in simple terms, it is exactly the recipe for constructing the resulting function one step at a time. Recall a high school definition of the derivative: f'(t) ~= (f(t+h) - f(t))/h, where the weird looking ~= designates a limit. On the other hand ~= could as well designate an approximation. And if you do read it that way -- congratulations! You are on a brink of discovery of a Euler method of numerically solving ordinary differential equations all by yourself. But what's more important is that you can now see that a derivate is a very good tool for describing how neighboring values of the function are supposed to relate to each other:

**f[next] = f[prev]+h\*f'[prev]**If you want extra precision you can write it like this (and struggle with a bit more complex computation, which is what we have to do for computing LAs):

**f[next] = f[prev]+h\*f'[next]**But the basic fundamental principle remains the same: derivatives describe how neighboring values of the function relate to each other, hence every time you see something like:

**x[i] = x[i-1] + EXPRESSION;**don't be fooled -- there's a differential equation being solved here.

**dY/dt = -T\*Y(t) + T\*X(t)**. You have to integrate X(t) and that's a downer. On the positive side, though: as long as our X(t)==0 we have a much simpler and well know equation to solve:**dY/dt = -T\*Y(t)**. Which, of course makes**Y(t) == Y(0)\*exp(-T\*t)**and also hints at what the good values for T might be. Indeed, since in our case the values of X(t) are the measurements of the runqueue's length, everytime X(t)==0 it means that the runqueue is empty. The next logical question to ask is: how do we want our load average numbers to behave when the size of the runque drops to 0. We don't have much freedom, though, since for every segment where X(t)==0 we actually \*know\* how our numbers will behave: Y(t) == Y(0)\*exp(-T\*t). The only knob we could turn is the value of 'T'. Control engineering, that is also governed by the same differential equation, has long standing conventions for how the controlling device should react on changing in the process. These conventions are based on a notion of a time constant. And the way it is defined makes the following two statements true:- After a period of one time constant the function reaches ~37% of its initial value
- After five time constants the function reaches a value less than 1% of its initial value

But obviously, as pointed out by Dr. Gunther, that's not what the Linux community wants. Nor is it what the OpenSolaris community wants. In fact unless your operating system was done by Fernando Corbato and co. you're out of luck. For some unknown reason all of the opensource implementations of load average stick with defining α as an exponent of some sorts:

Linux | Solaris | What makes sense | Difference | |

h | 5sec | 1sec | 1sec | |

1min α | exp(-5/60) | exp(-1/60) | 60/61 | 0.0001 |

5min α | exp(-5/300) | exp(-1/300) | 300/301 | 0.000005 |

15min α | exp(-5/900) | exp(-1/900) | 900/901 | 0.0000006 |

But after reading the excerpt from the Multics manual reproduced by Dr. Gunther, I do know this: it takes at least one kernel engineer from 60s to actually be able to solve a differential equation, the new guys seem to have flunked the calculus.

Fascinating. I love this old-skool Unix cruft. :-)

Note that in the 3BSD distribution (which is the oldest source code I could find here) in usr/src/sys/sys/vmsched.c it says:

/\*

\* Constants for averages over 1, 5, and 15 minutes

\* when sampling at 5 second intervals.

\*/

double cexp[3] = {

0.9200444146293232, /\* exp(-1/12) \*/

0.9834714538216174, /\* exp(-1/60) \*/

0.9944598480048967, /\* exp(-1/180) \*/

};

The file vmsched.c is dated Feb 14, 1980 in the 3BSD distribution. So it is very likely that Linux just copied from BSD...

Posted by

Volker A. Brandton May 11, 2008 at 08:39 PM PDT #I would guess the exp() is moved in the constant on order to avoid recalculating it. So it the formular looks like this:

if (new_load > old_load)

new_load += scale-1;

cpu_load = (old_load\*(scale-1) + new_load) / scale;

in linux kernel.

Gruss

Bernd

Posted by

Bernd Eckenfelson May 12, 2008 at 05:53 AM PDT #To: Volker A. Brandt

Thanks for quoting 3BSD (do you have a URL for that,

by the way?). It seems that a singularity between how

LAs used to be calculated and how they are calculated

right now lies somewhere in AT&T. Now, if only I worked

for Google, I could've asked Ken ;-)

Thanks,

Roman.

Posted by

Roman V Shaposhnikon May 12, 2008 at 07:01 AM PDT #To: Bernd Eckenfels

Could you, please elaborate on your point? See, my main

argument is that exp() has no place being part of the

numerical solution of the above differential equation to

begin with. Of course, the fact that Load Average numbers

have not been standardized (yep, LAs are NOT part of

POSIX) contributes to clouding the matter. Without prior

knowledge of what they are supposed to measure an alpha constant of 60/61 is as good as exp(-1/60). And if for some

reason we really have to stick with exp(-1/60) than your

argument makes sense. But! I maintain that the most logical way of approaching LAs is to make the first one be

aligned at a time constant (IOW, our system start to enter a stable state), the second one be aligned at 5\*T (IOW, our system is now completely transitioned to the steady state) and the third one just being N\*5\*T just to be able to check that the second number is indeed representative of a complete transition into a steady state. Now, why for the scheduler the steady states is supposed to start after 1 minute is debatable, but 1 minute is at least a nice round number ;-)

Thanks,

Roman.

Posted by

Roman V Shaposhnikon May 12, 2008 at 07:12 AM PDT #Hi Roman:

The Unix Heritage Society in Australia have some nice pages under

http://minnie.tuhs.org/TUHS/

The 3BSD source is in their archive; since their first mirror in the US does not work and the second one has "tux" in its name :-), here's a URL into the Canadian mirror:

ftp://ftp.darwinsys.com/pub/UnixArchive/4BSD/Distributions/3bsd.tar.gz

There should also be some AT&T stuff floating around the net but I can't find the links, offhand. I also don't remember exactly what then-Caldera released...

Regards -- Volker

Posted by

Volker A. Brandton May 12, 2008 at 11:16 PM PDT #i liked the information it helped me withmy work.

Posted by

gueston August 06, 2008 at 02:30 PM PDT #I have received several similar emails like this one.

Posted by

links londonon December 01, 2009 at 09:57 AM PST #<a href=" http://www.linksoflondonbest.com/links-of-london-chains-c-7.html">links of london friendship bracelet</a>

<a href=" http://www.linksoflondonbest.com/links-of-london-rings-c-6.html">links of london sweetie</a>

<a href=" http://www.linksoflondonbest.com/links-of-london-rings-c-6.html">links of london ring</a>

Posted by

linksoflondonon December 15, 2009 at 02:34 PM PST #