Is my workload recommended for a CoolThreads UltraSPARC T1 server (T1000 - T2000)?



Since the pre-release and announcement of the UltraSPARC T1 systems (T1000 - T2000),
customers coming to the Sun Solution Benchmark Center have been very interested to know whether their
applications will work well on UltraSPARC T1. While assessing the multi-threaded nature of a
workload is easy using standard system tools, it is less straightforward to obtain on demand
the number and proportion of floating-point instructions executed by a system. Some complex
tools exist, but we would like to have a simple go/no-go binary that answers
only this question. (If you are interested in a more detailed analysis of CPU behavior, please
ask me about a great tool called ripc.)

The key information coming from our UltraSPARC T1 engineers is the choice they had to make (because
of space limitations) to have a single floating-point unit shared by the 8 cores (and 32 strands).
Please note that this limitation has been addressed in the next release of this processor.

They tell us that, in their best estimate, any workload in which floating-point instructions make up more than 2% of
the total instructions executed is not recommended for UltraSPARC T1. Between 1% and 2% is the gray area where
they recommend that we try it, because a number of the simpler FPU instructions were moved to the
core and do not incur a 40-cycle penalty.


The idea of this article is to explain how to get this information and provide a simple tool
(for all UltraSPARC based systems).

The UltraSPARC III (or UltraSPARC IV) core can fetch a maximum of four instructions
from cache per clock cycle, and a total of sixteen fetched instructions
can wait for an execution unit to become available. Six parallel execution units exist on
the chip: one load/store unit, one branch unit, two identical integer Arithmetic Logical
Units, one add (and therefore subtract) floating-point unit named FA_PIPE (see FP 1
on the diagram below) and one multiply (and therefore divide) floating-point unit named FM_PIPE
(see FP 2 below).

[Figure: UltraSPARC III diagram]

For the UltraSPARC III (and IV or IV+), multiple performance instrumentation counters are provided to analyze CPU performance
behavior under load, but for our purpose we need to consider only three of them:

1-The total number of instructions completed, not counting annulled, mispredicted or
trapped instructions. This is the Instr_cnt counter.

2-The total number of instructions completed on the FA_PIPE. This is the FA_pipe_completion
counter.

3-The total number of instructions completed on the FM_PIPE. This is the FM_pipe_completion
counter.

Note that counters 2 and 3 are also incremented for some types of VIS instructions. Therefore,
they should be treated only as estimates.

For the UltraSPARC T1 based systems, it is simpler as the single counter FP_instr_cnt is directly provided.


As you have already deduced, we can determine the percentage of floating-point
operations with the formula:

%FP_ops = 100 * (FA_pipe_completion + FM_pipe_completion) / Instr_cnt

We are also able to provide this simple heuristic :

if (%FP_ops < 1%) -> Recommended for UltraSPARC T1
else if (%FP_ops between 1% and 2%) -> Possible fit for UltraSPARC T1
else -> Not recommended for UltraSPARC T1
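
For readers who already have raw counter values, the formula and heuristic above can be sketched in a few lines of Python. This is only an illustration and not the source of pfp: the function names and the sample counter values are assumptions made for the example, and treating exactly 1% and exactly 2% as part of the gray area is my reading of the thresholds. The sanity check reflects the fact that the floating-point pipes cannot complete more instructions than the CPU completed in total, so a larger value points to inconsistent samples.

```python
def fp_percentage(fa_pipe_completion, fm_pipe_completion, instr_cnt):
    """Return %FP_ops = 100 * (FA_pipe_completion + FM_pipe_completion) / Instr_cnt."""
    if instr_cnt <= 0:
        raise ValueError("Instr_cnt must be positive")
    fp_ops = fa_pipe_completion + fm_pipe_completion
    if fp_ops < 0 or fp_ops > instr_cnt:
        # More FP completions than total instructions means the counters
        # were not sampled consistently; refuse to report a percentage.
        raise ValueError("inconsistent counter values")
    return 100.0 * fp_ops / instr_cnt


def t1_recommendation(fp_pct):
    """Map a floating-point percentage to the three-way heuristic."""
    if fp_pct < 1.0:
        return "Recommended for UltraSPARC T1"
    if fp_pct <= 2.0:  # assumption: both boundaries belong to the gray area
        return "Possible fit for UltraSPARC T1"
    return "Not recommended for UltraSPARC T1"


if __name__ == "__main__":
    # Hypothetical counter values, chosen only to illustrate the arithmetic.
    pct = fp_percentage(30000, 15000, 22500000)
    print("%.2f%% floating point: %s" % (pct, t1_recommendation(pct)))
```

On an UltraSPARC T1, the same functions could be used by passing FP_instr_cnt as the first argument and 0 as the second.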

To do this, here is a program named pfp that you can run as pfp <duration in seconds>.
If you are on a T1000 or T2000 system, please use the -n flag, as this program does not detect the CPU
type in its first release. Please remember to start your workload first and, while it is running,
run this program as shown below.

paris # ./pfp 30
We observed 22756679 instructions separated in 0.20% floating point and 99.80% others
This workload is recommended for UltraSPARC T1 systems.

ontario # ./pfp -n 30
We observed 342593950 instructions separated in 0.77% floating point and 99.33% others
This workload is recommended for UltraSPARC T1 systems.

If you just want the percentage of floating point instructions, you can also do
paris # ./pfp -s 30
0.20

Finally, you can also use the tool on Solaris 8 or Solaris 9 with :
Dtrace # ./pfp -ps 30
1.97

The binary of this tool can be found here.


Comments:

0.77+99.33=100.1 (percent). Is it a bug or a blog typo?

Posted by Igor on December 07, 2005 at 10:50 PM PST #

Floating point error ;-p

Posted by Ceri Davies on December 08, 2005 at 12:48 AM PST #

Can you please post the source so I can compile it on a x86 Solaris 10 machine? I want to see if Oracle Spatial or its open source cousin PostGIS (based on PostgreSQL) is suitable for a T1000/T2000. I think it won't be, because storing the geometry objects involves too many FP ops, but I just want to confirm.

Thanks

Posted by Amit Kulkarni on December 08, 2005 at 01:53 PM PST #

x86 CPUs (or at least, current implementations thereof) probably don't have the counter registers that pfp uses, so I imagine its source code is not likely to be useful for this purpose. But yeah, CDDLed source code would be interesting nonetheless...

Posted by Rich Teer on December 09, 2005 at 01:34 AM PST #

Rich,
thanks for your comments. MrBenchmark graciously emailed me with almost the same info.

Posted by Amit Kulkarni on December 09, 2005 at 03:27 AM PST #

> (for all UltraSPARC based systems).

I tried this tool on our Solaris 8 UltraSPARC systems and got:

% pfp  30
cpustat: %pic0 cannot measure event 'FA_pipe_completion' on this cpu
cpustat: two events must be specified
Usage:
        cpustat [-c events] [-nhD] [interval [count]]
[...]

% uname -a
SunOS shasta 5.8 Generic_117350-12 sun4u sparc SUNW,Sun-Blade-100

We're trying to decide whether we should buy some Niagara systems.

Posted by Jim Gottlieb on December 26, 2005 at 05:43 PM PST #

Is it possible to collect similar information on UltraSPARC I, II or IIIi processors (possibly still running Solaris 8), or do they lack the necessary performance counters? I'd like to evaluate our mailserver (currently an Enterprise 250 ;-), where the primary load is perl running spamassassin, and it has been suggested that perl may be floating point intensive. I'd very much like to check this on the real (production) machine instead of playing with a test box.

Posted by Rainer Orth on January 19, 2006 at 07:20 PM PST #

We were doing some investigation on a SUN v890 8 cpu /16 cores and found some anomalies with the tool.

a.) Executing the tool in a loop with 15 second interval, some executions would fail displaying the usage message. It only happens intermittently, it appears to get worse the busier the system is.

The command that is executed is ./pfp -p 15

The result is:

Incorrect platform - Please check your arguments.
Usage:
        pfp [-nps] <duration>
        -n      If you are running on a UltraSPARC T1 system
        -p      If you are running on a UltraSPARC III, IV or IV+ system with SunOS 2.8 or 2.9
        -s      Strip the output to a single number
        Example : On a V890+ running Solaris 10 monitoring a 60 seconds workload,
        run ./pfp 60, if Solaris 9 run ./pfp -p 60

b.) The results are inaccurate in some cases, which raises concerns of the overall reliability of the data. Notice the impossible % calculations.

We observed 699318219 instructions separated
in 14.94% floating point and 85.06% others.
This workload is not recommended for UltraSPARC T1 systems.

We observed 3864955 instructions separated
in 2719.14% floating point and -2619.14% others.
This workload is not recommended for UltraSPARC T1 systems.

We observed 1366458850 instructions separated
in 7.74% floating point and 92.26% others.
This workload is not recommended for UltraSPARC T1 systems.

Best Regards,
Bob Rion

Posted by Bob Rion on February 05, 2006 at 11:43 PM PST #

We made some tests on SFV800 with 2*1200MHz CPUs (Solaris 8) and 8*1200MHz CPUs (Solaris 10) and the behaviour seen by Bob Rion from the post (February 06, 2006 at 07:43 AM PST) is similar. One question: does pfp know how to compute Instr_cnt, FA_PIPE and FM_PIPE from all CPUs US III, IV and IV+? Best regards, Michel

Posted by Michel Boitos on February 28, 2006 at 10:45 PM PST #

We couldn't start the tool from the crontab. The error message is "Incorrect platform - Please check your arguments". Any idea? Regards bbr

Posted by bbr on March 01, 2006 at 08:30 PM PST #

We are now not able to get pfp to execute on a T2000 system, is there a newer version, I think we have the correct flags specified, see below

root[mil-bcpnetbu:/var/tmp]# ./pfp -n 60
Incorrect platform - Please check your arguments.
Usage:
        pfp [-nps] <duration>
        -n      If you are running on a UltraSPARC T1 system
        -p      If you are running on a UltraSPARC III, IV or IV+ system with SunOS 2.8 or 2.9
        -s      Strip the output to a single number
        Example : On a V890+ running Solaris 10 monitoring a 60 seconds workload,
        run ./pfp 60, if Solaris 9 run ./pfp -p 60
root[mil-bcpnetbu:/var/tmp]# uname -a
SunOS mil-bcpnetbu 5.10 Generic_118822-27 sun4v sparc SUNW,Sun-Fire-T200

Bob Rion

Posted by Bob Rion on March 23, 2006 at 01:19 AM PST #

On our loaner T2000 pfp would only work with a duration of 17 seconds or less. Running Sol 10. SunOS t2ktest 5.10 Generic_118833-03 sun4v sparc SUNW,Sun-Fire-T200. Testing was not exhaustive but it seems that anything longer gives the Incorrect platform message. Useful tool though.

Posted by Taj Newburn on April 06, 2006 at 06:10 AM PDT #

I would like to perform the check over long periods, but I eventually got an error (it works fine with short periods):

# uname -a
SunOS quadri 5.9 Generic_118558-10 sun4u sparc SUNW,Sun-Fire-480R
# ./pfp -p 10
We observed 111190709 instructions separated in 0.80% floating point and 99.20% others.
This workload is recommended for UltraSPARC T1 systems.
# ./pfp -p 3600
Incorrect platform - Please check your arguments.

(The error raised after one hour.)

Posted by Olivier Chédru on April 06, 2006 at 11:21 PM PDT #

pfp works off and on on my 4800. I ran it for 50 minutes and got '-359.20'. Usually at the end of the interval I get

Incorrect platform - Please check your arguments.
Usage:
        pfp [-nps] <duration>
        -n      If you are running on a UltraSPARC T1 system
        -p      If you are running on a UltraSPARC III, IV or IV+ system with SunOS 2.8 or 2.9
        -s      Strip the output to a single number
        Example : On a V890+ running Solaris 10 monitoring a 60 seconds workload,
        run ./pfp 60, if Solaris 9 run ./pfp -p 60

Sometimes it works. This is a 4800 running Solaris 8.

Posted by John on August 11, 2006 at 05:04 AM PDT #

pfp is designed to run for a small period of time, many times and taking an average. But it is now obsolete. Please use CoolTools. See http://cooltools.sunsource.net/cooltst/index.html regards MrBenchmark

Posted by MrBenchmark on August 11, 2006 at 06:52 AM PDT #

Solaris switches to integer calculations if the FPU is too high. Can you please explain what the algorithm is for deciding to switch?

Thanks,
hcc

Posted by hcc on September 25, 2007 at 02:20 AM PDT #
