cooltst: Cool Threads Selection Tool
By workloadcharacterization on Apr 05, 2006
Sun Fire T1000/T2000 servers excel at workloads that can exploit the parallelism of the T1 chip, which has eight cores, each with four threads. These servers are targeted at commercial environments, which, according to most studies, usually have a low floating point workload component.
The Cool Threads Selection Tool, cooltst, was designed as a screening tool that lets administrators gauge application parallelism and workload floating point intensity. The tool is available from [opensparc.org]. In this entry, I would like to describe the tool's design and capabilities. A description of how to interpret the tool's output is included in the readme file in the tarball.
The starting point for cooltst was the data collector, wcstat, developed for the Workload Characterization Project. We adapted and simplified it for the much narrower purpose of screening workloads for Sun Fire T1000/T2000 servers. We first had to determine what distinguished workloads that ran well from those that performed only moderately. By collecting and analyzing performance data, the engineering team narrowed the scope of test coverage to two things: floating point (FP) content and single-threaded bottlenecks.
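To make those two screening criteria concrete, here is a minimal Python sketch of the idea. It is not cooltst's code: the `Sample` record, the function names, and both threshold values are hypothetical, chosen only for illustration. The sketch assumes you already have an instruction count, an FP instruction count, and per-LWP CPU times from some collector.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical thresholds for illustration only; they are NOT taken from cooltst.
FP_PERCENT_LIMIT = 1.0        # percent of all instructions that are floating point
SINGLE_LWP_SHARE_LIMIT = 0.5  # fraction of total CPU time consumed by one LWP


@dataclass
class Sample:
    """One observation interval (hypothetical record layout)."""
    total_instructions: int
    fp_instructions: int
    lwp_cpu_seconds: Dict[str, float]  # CPU time per LWP, keyed by "pid/lwpid"


def fp_percent(sample: Sample) -> float:
    """Floating point instructions as a percentage of all instructions."""
    if sample.total_instructions == 0:
        return 0.0
    return 100.0 * sample.fp_instructions / sample.total_instructions


def busiest_lwp_share(sample: Sample) -> float:
    """Fraction of the interval's CPU time used by the single busiest LWP."""
    total = sum(sample.lwp_cpu_seconds.values())
    if total == 0:
        return 0.0
    return max(sample.lwp_cpu_seconds.values()) / total


def screen(sample: Sample) -> List[str]:
    """Return a list of warnings; an empty list means nothing was flagged."""
    warnings = []
    if fp_percent(sample) > FP_PERCENT_LIMIT:
        warnings.append("high floating point content")
    if busiest_lwp_share(sample) > SINGLE_LWP_SHARE_LIMIT:
        warnings.append("possible single-threaded bottleneck")
    return warnings
```

A workload in which one LWP consumes most of the CPU time would be flagged by the second check even if its FP content is negligible, which is the kind of case a many-threaded chip handles poorly.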
We are in the process of making wcstat for Solaris (SPARC and x86) and Linux (x86 and x86_64) available through [opensparc.org]. wcstat collects 15-25 MB of data in 25-30 minutes, so we built an environment for processing that data. This processing environment is very complex, and we are considering using a portal to set up a free service bureau of sorts to make all that data more usable. In future blog entries, I will describe wcstat in some detail.
We then looked at the target systems. The migration of workloads from Lintel platforms seemed natural; unfortunately, Linux does not have good tools for monitoring performance counters. We evaluated the public domain tools and found them lacking in surprising ways. None of them had the flexibility and capability of cpustat, a Solaris tool for reading processor performance counters. We wrote our own tool, FPstat, to count floating point instructions and micro-instructions. We then generalized this tool and called it PCstat. We are in the process of open sourcing PCstat for Xeon and Opteron to the Linux community. When it becomes available, I will blog about it here.
We looked at the parallelism of LWPs in Solaris and Linux. Solaris has several ways to look at thread performance. We found that understanding performance of Linux threads was a mess. It depended on the version of Linux, 2.4 or 2.6, and the library being used. I plan to make some blog entries about this because this area is interesting. Solaris offers definite advantages over Linux when you want to analyze multithreaded application performance.
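As a small illustration of why the platform matters, here is a hedged Python sketch that counts the LWPs of a process by looking in the /proc file system. It relies on two layouts: Solaris exposes one entry per LWP under /proc/<pid>/lwp, and Linux 2.6 exposes one entry per thread under /proc/<pid>/task. On Linux 2.4 neither directory exists, so the function gives up and returns None. The function name and the `proc_root` parameter are my own, and this is not how cooltst gathers its data.

```python
import os
from typing import Optional


def count_lwps(pid: int, proc_root: str = "/proc") -> Optional[int]:
    """Count the LWPs (threads) of a process, or return None if the
    /proc layout is not one of the two this sketch knows about."""
    # "lwp" is the Solaris layout; "task" is the Linux 2.6 layout.
    for subdir in ("lwp", "task"):
        path = os.path.join(proc_root, str(pid), subdir)
        if os.path.isdir(path):
            # Each LWP or thread appears as a numerically named entry.
            return sum(1 for name in os.listdir(path) if name.isdigit())
    return None
```

The `proc_root` argument exists only so the function can be pointed at a copy of a /proc tree; in normal use the default is what you want.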
cooltst looks simple on the surface. Under the covers, it hides the complexity of collecting and summarizing data. I encourage you to download it and take it for a test drive.