By Darryl Gove-Oracle on Apr 19, 2007
The paper on CPU2006 working set size is meant to provide estimates of the realistic memory footprint of an application. The OS can report SIZE (how much memory is reserved for an application) and RSS (how much of an application is resident in memory), but not how much of that memory is actually filled with data that is touched during the run of the application.
A way of envisioning this is to consider a program that allocates an array large enough to handle the largest data set that might be input. In typical runs of the application most of that array will not be used. Looking at the RSS and SIZE metrics provided by the OS will give a very different indication of the amount of memory required than a more careful inspection of the code.
The approach used in the paper was to use the Shade emulator to capture the address of each memory access, and then record the use of each cacheline. Two definitions were used. The working set size is the number of cachelines touched in an interval of a billion instructions (a processor should be able to execute a billion instructions in a time that can be measured in seconds). The core working set size is the number of cachelines that were touched both in this interval and the preceding one (i.e. the line was reused).
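The two definitions can be made concrete with a short sketch. This is not the code used in the paper; it is a minimal Python illustration that assumes the trace is already available as (count, address) pairs, where count is a running instruction count, and assumes a 64-byte cacheline. The names working_sets, interval and line_bytes are made up for the example.

```python
from typing import Iterable, List, Set, Tuple

LINE_BYTES = 64            # assumed cacheline size, not taken from the paper
INTERVAL = 1_000_000_000   # interval length used for both definitions


def working_sets(
    trace: Iterable[Tuple[int, int]],
    interval: int = INTERVAL,
    line_bytes: int = LINE_BYTES,
) -> List[Tuple[int, int]]:
    """Return (working set, core working set) per interval, in cachelines.

    trace yields (count, address) pairs in execution order, where count
    is the running instruction count at the time of the access.
    """
    results: List[Tuple[int, int]] = []
    previous: Set[int] = set()   # lines touched in the preceding interval
    current: Set[int] = set()    # lines touched in the interval being filled
    current_interval = 0

    for count, address in trace:
        idx = count // interval
        # Close every interval that ended before this access,
        # including intervals in which nothing was touched.
        while idx > current_interval:
            results.append((len(current), len(current & previous)))
            previous, current = current, set()
            current_interval += 1
        current.add(address // line_bytes)

    # Close the final (possibly partial) interval.
    results.append((len(current), len(current & previous)))
    return results


# Tiny trace with a 10-instruction interval so the numbers are easy to follow.
demo = [(0, 0), (1, 8), (2, 64), (10, 64), (11, 128), (25, 128)]
print(working_sets(demo, interval=10))  # [(2, 0), (2, 1), (1, 1)]
```

In the tiny trace, the second interval touches two lines but only one of them was also touched in the first interval, so its core working set is one line. Multiplying the counts by the line size gives the sizes in bytes.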
Again, the focus of this methodology was to provide something that was not tied to a particular hardware implementation. An alternative would have been to use a cache simulator and identify cache miss rates for a given configuration, but that would have been 'contaminated' by the choice of cache configuration. The second issue with a cache simulator is that cache miss rates typically drop by orders of magnitude as the cache size increases; even very small caches can be very effective. The object of the study was not to find the most effective cache size, but to find out how much memory the benchmarks were actively using.
I found the results, initially, quite surprising. The working set sizes for CPU2000 and CPU2006 are not that dissimilar. The floating point workloads in CPU2006 do have greater working set sizes, as do a few integer workloads. However, the memory requirements for CPU2006 are significantly greater, at 1GB vs 256MB. So the codes hold a lot more data in memory, but often do not aggressively exercise all of that data.
I recently found this more traditional cache miss rate study for the CPU2006 benchmarks. Unfortunately there seem to be pop-up ads on the site.