By hinkthink on Oct 14, 2005
As promised, here are some comparisons of Perflib to the more popular BLAS implementations available for Opteron hardware. All of these timings were run on the exact same hardware configuration. The Performance Library numbers were produced on a single CPU of a 4-way Opteron running at 2.2 GHz with 8 GB of RAM. The GOTO, ATLAS, and ACML runs were performed on the same hardware, but running SuSE 9.3 Pro.
These are about as useful as the SPARC graphs I posted yesterday. Even such simple timers leave much room for variation. Bind to a processor or not? Run the timers with a cold or warm dcache and icache? Use the 2.5 gnu64 ACML library or the 2.7 pgi64 library? A clever marketing critter could easily take this data and proclaim that virtually any of the libraries tested was the best BLAS implementation.
I contend that benchmarks or timers such as these, while marginally useful for producing glossy sales material, are next to useless for an application developer trying to gauge the performance delivered to a particular application. In my opinion, the much-referenced Linpack benchmark is only marginally superior to these contrived performance measurement tools.
The reader might notice that Perflib isn't the top performer in all graphs I posted. Even though I'm the technical lead for the Perflib product, I don't care. I don't care because it is my belief that timers in this class have very little value when trying to measure the worth of a scientific library. Tomorrow, I'll post some data showing what I believe to be a better method for judging the worth of a scientific library.