### Profiling a rate

#### By Darryl Gove on Sep 02, 2009

Sometimes it's the rate of doing something which is the target that needs to be improved through optimisation. ie increase the widgets per second of some application. I've just been looking at a code that estimated performance by counting the number of computations completed in a known constant length of time. The code was showing a performance regression, and I wanted to find out what changed. The analysis is kind of counter intuitive, so I thought I'd share an example with you.

Here's an example code that does a computation for a fixed length of time, in this case about 30 seconds:

#include <sys/time.h> #include <stdio.h> #include <stdlib.h> double f1(int i) { double t=0; while (i-->0) {t+=t;} return t; } double f2(int i) { double t=0; while (i-->0) {t+=t;} return t; } void main(int argc,char\*\*args) { struct timeval now; long startsecs; long count=0; int vcount; if (argc!=2){ printf("Needs a number to be passed in\\n"); exit(0);} vcount=atoi(args[1]); gettimeofday(&now,0); startsecs=now.tv_sec; do { f1(100); f2(vcount); count++; gettimeofday(&now,0); } while (now.tv_sec<startsecs+30); printf("Iterations %i duration %i rate %f\\n",count, now.tv_sec-startsecs, 1.0\*count/(now.tv_sec-startsecs)); }

The code takes a command line parameter to indicate the number of iterations to do in function f2, function f1 always does 100 iterations.

If I compile and run this code under the performance analyzer with 50 and 70 as the commandline parameters I get the following profile:

Description | 50 Iterations | 70 Iterations |

Total time | 26.6s | 25.87s |

f1 | 11.89s | 10.66s |

gettimeofday | 9.9s | 8.76s |

f2 | 4.53s | 6.09s |

Main | 0.28s | 0.37s |

Total iterations | 942,684 | 841,921 |

We can make the following observation when we go from 70 down to 50 for parameter passed to f2, we see a 12% gain in the total rate. This is to be expected as we are reducing the total number of iterations of the pair of loops in f1 and f2 will reduce from 170 down to 150, which is the same ~12% gain.

Where it gets counter intuitive is that for the run which achieves the higher rate, the time spent in the routines f1 and gettimeofday increases - by the same 12%. This is counter intuitive because increased time in a routine normally indicates that the routine is the one to be investigated, but for a 'rate' situation the opposite is true. These routines are being well behaved. The way to think about it is that each unit of work needs a smidgeon of time in both of these routines, if the number of units of work increases, then the absolute amount of time in these two routines needs to increase linearly with the increase in rate.

However, the time in routine f2 decreases as the rate increases. This is the routine which has been "improved" to get the better rate. The other thing to note is that the time went from ~6s to ~4.5s, but the rate went from 841k to 941k, so the time per unit work dropped further than that - this makes comparing the profiles of the two runs more tricky.

Note that Amdahl's law would still tell us that the routines that need to be optimised are the ones where the time is spent - so in one sense nothing has changed. But my particular scenario today is figuring out what has changed in the executable when compiled in two different ways that leads to the performance gain. In this context, I now know the routine, and I can dig into the assembly code to figure out the why.