Corestat for UltraSPARC T2/T2+

With the launch of UltraSPARC T2+ processor based servers, corestat needed an upgrade. An updated version of corestat is now available from the link in this blog. Note that the same version (V1.2.3) should work on T5220 and T5240 servers.

Understanding processor utilization is important for performance analysis and capacity planning. With the launch of UltraSPARC T2 based servers I would like to revisit the topic of core utilization.

As we have seen earlier, for a Chip Multi Threaded (CMT) processor like UltraSPARC T1, the CPU utilization reported by conventional tools such as mpstat/vmstat and the core utilization reported using the processor's hardware performance counters are different metrics, and both are equally important in performance analysis and tuning.

Before discussing core utilization on UltraSPARC T2 and the details of corestat, let us take a quick look at what a core on UltraSPARC T2 looks like. UltraSPARC T2 extends the CMT architecture of T1. It consists of eight cores, where each core has eight hardware threads. Hardware threads within a core are grouped into two sets of four threads each. There are two integer pipelines within a core, and each set of four threads shares one integer pipeline. In this sense, the resources available for computation within a core are doubled compared to UltraSPARC T1. It is worth understanding that threads within a core do not switch pipelines: the assignment of threads to a pipeline is fixed and hardwired.
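The fixed thread-to-pipeline grouping can be sketched in a few lines of Python. This is only an illustration: it assumes that virtual CPU ids are numbered consecutively within a core (ids 0-7 on core 0, 8-15 on core 1, and so on) and that the first four threads of a core use pipeline 0. The function name `locate` is made up for this example.

```python
THREADS_PER_CORE = 8      # eight hardware threads per T2 core
THREADS_PER_PIPELINE = 4  # each integer pipeline serves a fixed set of four


def locate(cpu_id):
    """Map a virtual CPU id to (core, integer pipeline).

    Assumption: ids are consecutive within a core and the lower four
    threads of each core belong to pipeline 0.
    """
    core, thread = divmod(cpu_id, THREADS_PER_CORE)
    pipeline = thread // THREADS_PER_PIPELINE
    return core, pipeline


for cpu in (0, 5, 8, 63):
    print(cpu, locate(cpu))
```

Under these assumptions, virtual CPU 5 lands on core 0, pipeline 1, and virtual CPU 63 on core 7, pipeline 1.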

One more important addition to the compute resources within a core is a Floating Point Unit (FPU). Each core of T2 includes an FPU shared by all eight threads of that core. Other shared resources within a core include the Level-1 Instruction (I) and Data (D) caches and Translation Lookaside Buffers (TLBs) such as the I-TLB and D-TLB. All cores share a 4 MB Level-2 (L2) cache. Together with these, several key features explain why both single thread and multi thread performance of UltraSPARC T2 is better than that of T1.

A quick look at the UltraSPARC T2 architecture features shows the following enhancements, which benefit single thread performance:
  • Increased frequency - 1400 MHz
  • Lower instruction latencies
  • Better Floating Point performance
  • Hardware TLB miss handling for I-TLB and D-TLB
  • Larger D-TLB size (128 entries vs. 64 entries)
  • Larger L2 cache (4 MB vs. 3 MB)
  • Full support of the VIS 2.0 instruction set, with no kernel emulation

Similarly, the following are some of the features of UltraSPARC T2 that benefit multi thread performance:
  • Two integer pipelines per core
  • Twice the number of hardware threads (64 v/s 32)
  • Higher L2 cache set associativity. 16 way compared to 12 way
  • Instruction cache being 8 way associative compared to 4 way
  • Dedicated Floating point unit per core shared by all 8 strands, improved FP throughput
  • Memory interface supports FBDIMMs for higher capacity and bandwidth
  • Support for shared context feature where multiple contexts share the same entry in the TLB for mappings to the same address segment
  • Streaming Processing Unit (SPU) per core for on chip encryption/decryption support

Now, let us look at the topic of core utilization. All the important concepts, such as thread scheduling, idle hardware threads and stalled threads, were introduced in my earlier blog on T1. Those concepts generally hold good for T2; however, there are subtle differences. For example, on T2 an idle integer pipeline does not mean that the full core is idle. Both pipelines within a core can concurrently execute one instruction per cycle each, hence at a frequency of 1417 MHz a core can execute a maximum of 2 x 1417 x 1000 x 1000 instructions/second.
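The arithmetic behind that peak figure, and the utilization percentage that follows from it, can be sketched as below. This is a minimal illustration, assuming utilization is simply the instructions counted in an interval divided by the pipeline's peak for that interval; the function names are made up here and are not taken from the corestat source.

```python
def max_instructions_per_second(freq_mhz, pipelines=2):
    """Peak rate: one instruction per cycle per integer pipeline."""
    return pipelines * freq_mhz * 1000 * 1000


def pipeline_utilization(instr_count, interval_sec, freq_mhz):
    """Percent of one integer pipeline's peak used over the interval.

    Assumption: utilization = instructions counted / peak instructions.
    """
    peak = max_instructions_per_second(freq_mhz, pipelines=1) * interval_sec
    return 100.0 * instr_count / peak


# Peak for a full core (two pipelines) at 1417 MHz.
print(max_instructions_per_second(1417))

# A pipeline that retired 1,417,000,000 instructions in 10 seconds.
print(pipeline_utilization(1_417_000_000, 10, 1417))
```

At 1417 MHz the peak for a core works out to 2,834,000,000 instructions/second, and the second call reports a pipeline running at one tenth of its peak.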

Considering these differences, corestat for UltraSPARC T2/T2+ has been enhanced and can be downloaded from here. The main enhancements are:
  1. It now reports the utilization of each pipeline separately. By default only the integer pipe utilization is reported.
  2. There is a new command line option "-g" added to report the FPU utilization along with integer utilization.
  3. Corestat detects frequency of the target system at run time.

While the usage remains the same, corestat for UltraSPARC T2 can be used in two modes:
  1. For online monitoring, which requires root privileges. This is the default mode of operation. The default reporting interval is 10 sec and it assumes a frequency of 1417 MHz.
  2. To report core utilization by post processing cpustat data that has already been sampled using the following command line:
cpustat -n -c pic0=Instr_cnt,pic1=Instr_FGU_arithmetic \
           -c pic0=Instr_cnt,pic1=Instr_FGU_arithmetic,nouser,sys 1
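To give a feel for what post processing such data involves, here is a rough Python sketch that sums Instr_cnt per core, pipeline and mode. It is not the corestat implementation. It assumes each cpustat line has the form "time cpu event pic0 pic1" followed by a "#" comment naming the event set, that the set containing "nouser" is the system-mode sample, and that virtual CPU ids are consecutive within a core. The sample lines are invented for illustration.

```python
from collections import defaultdict


def parse_cpustat_line(line):
    """Return (cpu_id, mode, instr_cnt, fgu_cnt), or None to skip the line.

    Assumed layout: "time cpu event pic0 pic1  # <event set>".
    """
    data, _, comment = line.partition("#")
    fields = data.split()
    if len(fields) != 5 or fields[2] != "tick":
        return None
    mode = "sys" if "nouser" in comment else "usr"
    return int(fields[1]), mode, int(fields[3]), int(fields[4])


def sum_by_pipeline(lines):
    """Total Instr_cnt keyed by (core, pipeline, mode)."""
    totals = defaultdict(int)
    for line in lines:
        parsed = parse_cpustat_line(line)
        if parsed is None:
            continue
        cpu_id, mode, instr_cnt, _fgu_cnt = parsed
        core, thread = divmod(cpu_id, 8)       # assumed id numbering
        totals[(core, thread // 4, mode)] += instr_cnt
    return dict(totals)


# Hypothetical sample lines, not real measurements.
sample = [
    "  1.008   0  tick   1000   10  # pic0=Instr_cnt,pic1=Instr_FGU_arithmetic",
    "  1.008   5  tick   2000   20  # pic0=Instr_cnt,pic1=Instr_FGU_arithmetic",
    "  2.008   0  tick    300    3  # pic0=Instr_cnt,pic1=Instr_FGU_arithmetic,nouser,sys",
    "  2.008   1  tick    200    2  # pic0=Instr_cnt,pic1=Instr_FGU_arithmetic,nouser,sys",
]
print(sum_by_pipeline(sample))
```

Dividing each total by the peak instruction count for the sampling interval would then give a percentage per pipeline.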

$ corestat
   Frequency = 1050 MHz
   corestat : Permission denied. Needs root privilege...

Usage : corestat [-g] [-v] [[-f <infile>] [-i <interval>] [-r <freq>]]

        Default mode : Report Integer Pipeline Utilization
        -g           : Report FPU usage
        -v           : Report version number
        -f infile    : Filename containing sampled cpustat data
        -i interval  : Reporting interval in sec (default = 10 sec)
        -r freq      : Processor frequency in MHz (default = 1417 MHz)

# corestat -g

        Core Utilization for Integer pipeline
     Core,Int-pipe     %Usr     %Sys     %Usr+Sys
     -------------     -----    -----    --------
         0,0           0.00     0.19     0.20
         0,1           0.00     0.01     0.01
         1,0           0.00     0.03     0.03
         1,1           0.00     0.01     0.01
         2,0           1.15     0.02     1.16
         2,1           0.00     0.01     0.01
         3,0           0.02     0.02     0.04
         3,1           0.00     0.01     0.01
         4,0           0.00     0.02     0.03
         4,1           0.00     0.01     0.01
         5,0           0.02     0.01     0.03
         5,1           0.00     0.01     0.01
         6,0           0.05     0.03     0.08
         6,1           0.00     0.01     0.01
         7,0           0.00     0.03     0.03
         7,1           0.00     0.01     0.01
     -------------     -----    -----    --------
         Avg           0.08     0.03     0.10

                  FPU Utilization
         Core          %Usr     %Sys     %Usr+Sys
     -------------     -----    -----    --------
          0            0.02     0.01     0.03
          1            0.02     0.01     0.03
          2            0.01     0.01     0.03
          3            0.01     0.01     0.03
          4            0.02     0.01     0.04
          5            0.02     0.02     0.04
          6            0.02     0.02     0.04
          7            0.02     0.02     0.04
     -------------     -----    -----    --------
         Avg           0.02     0.02     0.04

As far as interpretation of corestat data is concerned, all the points mentioned in an earlier blog with respect to T1 hold good. Since core saturation (measured using corestat) and virtual CPU saturation (measured using vmstat/mpstat) are two different aspects, we need to monitor both simultaneously in order to determine whether an application is likely to saturate the core by using fewer application threads. In such cases, increasing the workload (e.g. by increasing the number of threads) may not yield any more performance. On the other hand, most often we will see applications with a high Cycles Per Instruction (CPI) value that are therefore unable to saturate the cores fully before reaching 100% CPU utilization.
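The reasoning above can be written down as a small decision helper. This is only a sketch: the 90% threshold and the function name `diagnose` are arbitrary choices for illustration, not values used by corestat, and real analysis should look at trends rather than a single sample.

```python
def diagnose(core_util_pct, cpu_util_pct, threshold=90.0):
    """Compare core utilization (corestat) with CPU utilization (mpstat/vmstat).

    The threshold is a hypothetical cut-off chosen for this example.
    """
    core_saturated = core_util_pct >= threshold
    cpu_saturated = cpu_util_pct >= threshold
    if core_saturated and not cpu_saturated:
        return "core saturated first: adding threads is unlikely to help"
    if cpu_saturated and not core_saturated:
        return "virtual CPUs saturated first: workload likely has high CPI"
    if core_saturated and cpu_saturated:
        return "both saturated"
    return "headroom on both"


print(diagnose(core_util_pct=95.0, cpu_util_pct=40.0))
print(diagnose(core_util_pct=30.0, cpu_util_pct=100.0))
```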

While I make this new version of corestat available here, we are already looking at a number of RFEs received as comments on my earlier blog and via e-mails to me. Some of these points are being considered. Stay tuned!


Comments:

Any reason as to why the current version 1.2.2 does not work on any of my machines even if I set the frequency to 1000?

# ./corestat -r 1000
Frequency = 1000 MHz
ERROR : Invalid cpu counter information!

Posted by Roman Pestka on November 26, 2007 at 11:37 AM IST #

Well, for Niagara-1, please continue using the previous 1.1 version. Use the 1.2.2 version only for Niagara-2. On any other SPARC system, it will not be useful.

Posted by Ravindra Talashikar on November 26, 2007 at 03:50 PM IST #

Thanks for writing this up - it's a very educative article.

Posted by Prashant Srinivasan on April 29, 2008 at 07:59 PM IST #

Seems like you need a patch for the T5240. This seems to have worked. Hopefully it won't break the rest of the script. Can you verify?!?!?

--- corestat.v.1.2.2.orig Fri Jul 18 11:28:17 2008
+++ corestat.v.1.2.2 Fri Jul 18 11:29:13 2008
@@ -55,7 +55,9 @@
 *DEFAULT_FREQUENCY = \1417 ; # 1417 MHz
 *DEFAULT_INTERVAL = \10 ; # 10 sec
 *VERSION = \"1.2.2" ; # Version number
-*MAX_CPUS = \64 ; # Max CPUs for T2
+# JPF - 18jul08 - increase max cpus for T5240
+#*MAX_CPUS = \64 ; # Max CPUs for T2
+*MAX_CPUS = \128 ; # Max CPUs for T2+
 *INT_TYPE = \0 ; # Integer instructions
 *FP_TYPE = \1 ; # FP instructions
 #

Posted by J Ferraro on July 21, 2008 at 02:35 AM IST #

I can't get corestat to work on my T5120. I keep getting:
ERROR : Invalid cpu counter information!

I have SUNWcpc.v and SUNWcpcu installed.

Posted by Daniel Smith on November 20, 2008 at 04:44 PM IST #

Hi Ravi,
When I try to run corestat (version 1.2.3) on my T5210 server (1415 MHz CPU), I get the following o/p:
--
Frequency = 1415 MHz
ERROR : Invalid cpu counter information!
--

I am running it as sudo. Can you please help me identify what I am doing wrong here?

Thanks

Posted by Amir Hameed on May 12, 2009 at 12:20 PM IST #

corestat + zones + pools

I configured a processor set, associated it with a pool and then set-up a zone with that pool.

So in my system cores 0&1 are bound to the zone - using the pool and pset, and cores 2&3 are in the default pool in global zone.

When I run corestat from global zone
it reports the data about core 2&3.

How can I get report for core 0&1 (bound to zone)? When I try corestat inside the zone, it shows only "Frequency = 1165 MHz
ERROR : Invalid cpu counter information!".

Thank you very much.

Posted by Tomas on June 23, 2009 at 07:42 AM IST #

Hello Ravindra,

We are seeing this unusual behaviour of our application on the T1/T2 processor - we have a multi-process application & expect that all of them are equally loading the cpu.

Due to performance problems we used corestat to check the core utilization. Why is only core 1 (and we see inside it's always thread 1) showing significantly more load than others? Why is there no equal distribution of the application across the cpus?

BR
Aditya

Core Utilization for Integer pipeline
Core,Int-pipe     %Usr     %Sys     %Usr+Sys
-------------     -----    -----    --------
    0,0           29.44    5.54     34.98
    0,1            7.90    2.07      9.97
    1,0           12.67    2.80     15.46
    1,1            5.64    1.02      6.66
    2,0            9.83    2.31     12.14
    2,1            4.49    1.78      6.27
    3,0           11.69    3.04     14.74
    3,1            6.01    2.02      8.03
    4,0            8.06    3.45     11.51
    4,1            4.73    3.12      7.84
    5,0           10.50    7.63     18.13
    5,1            3.88    1.68      5.56
    6,0           11.26    2.43     13.69
    6,1            4.82    1.32      6.14
    7,0            9.53    3.87     13.41
    7,1            4.77    2.87      7.64
-------------     -----    -----    --------
    Avg            9.08    2.93     12.01

Posted by Aditya Dhruva on August 13, 2009 at 02:13 PM IST #

We can't get this to run on any of our T2000's either.

{root@testbox1}/var/tmp/corestat.v.1.2.3
$ ./corestat.v.1.2.3 -g
Frequency = 1000 MHz
ERROR : Invalid cpu counter information!

{root@testbox1}/var/tmp/corestat.v.1.2.3
$ ./corestat.v.1.2.3 -r 1000 -g
Frequency = 1000 MHz
ERROR : Invalid cpu counter information!

{root@testbox1}/var/tmp/corestat.v.1.2.3
$ cat /etc/release
Solaris 10 10/08 s10s_u6wos_07b SPARC
Copyright 2008 Sun Microsystems, Inc. All Rights Reserved.
Use is subject to license terms.
Assembled 27 October 2008

{root@testbox1}/var/tmp/corestat.v.1.2.3
$ uname -a
SunOS testbox1 5.10 Generic_138888-08 sun4v sparc SUNW,Sun-Fire-T200

Posted by steve mitchell on September 03, 2009 at 05:51 PM IST #

If you have problems and get error messages like:
"ERROR : Invalid cpu counter information!"

Make sure /usr/sbin is in your $PATH.

Posted by Henkis on September 23, 2009 at 10:25 AM IST #

Shouldn't the instr count be added within the minsamples timeframe rather than assigned, or does the cpustat command take care of that?

if ($nsamples < $minsamples) {
$cpu_stat[$cpu_id][$mode][$INT_TYPE] += $instr_ctr;

Posted by user on December 17, 2009 at 09:09 PM IST #
