Sun Ray Bandwidth Monitoring Scripts

Kent Peacock, one of our illustrious Senior Engineers who helped give birth to the Sun Rays almost 10 years ago in Sun Labs, has some great scripts that we use internally for watching how much bandwidth Sun Rays use. I asked him if it would be OK to share these scripts with our user community and he said "Please do!". You can grab a copy of the scripts here. (Right click, Save Target/Link As...)

There are two scripts, utbwlist2 and utbw. The utbwlist2 script is for looking at all the Sun Ray devices on a server or host group, and utbw is for looking at a single Sun Ray, typically run by the user of the session from a terminal.

UTBWLIST2

utbwlist2 takes the following options:

-a get information from all members of the failover group
-h <server> target the given server
-f <filter> apply this regular expression filter to the IP addresses of the Sun Rays (useful for selecting subnets)
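
As a rough illustration of how a -f style filter can pick out a subnet, here is a minimal Python sketch. It is not taken from utbwlist2 itself: the helper name filter_addresses is hypothetical, and matching from the start of the address is an assumption about how the filter behaves.

```python
import re

# A few addresses taken from the sample utbwlist2 output in this post.
addresses = [
    "129.146.225.93",
    "129.146.97.135",
    "129.144.70.192",
    "129.144.71.225",
    "129.144.88.26",
]


def filter_addresses(addrs, pattern):
    """Keep the addresses that match the regular expression.

    Assumption: the pattern is matched from the start of the address.
    The real script may match anywhere in the string instead.
    """
    rx = re.compile(pattern)
    return [a for a in addrs if rx.match(a)]


# Escape the dots so that they match literal dots, not any character.
subnet_144 = filter_addresses(addresses, r"129\.144\.")
print(subnet_144)
```

Escaping the dots matters: an unescaped 129.144. would also match an address such as 129.14.47.1, because "." matches any character.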

The output is as follows:

IP Address       Token/Name                          Pkts/s  Mbps  KBytes DTCPU
129.146.225.93   OpenPlatform.4090009c24a5d7061e1      91  21.310    2291  21%
129.146.97.135   auth.mk228279                        866   7.839   20067   9%
129.146.127.177  OpenPlatform.4090009c24a1890c091    1288   6.716   34384  14%
129.144.70.192   auth.shinseki                        536   6.033   15444  33%
129.146.55.170   OpenPlatform.4090009c24a5d704360     702   5.951   15234   9%
129.144.71.225   auth.de161882                        595   5.901   15105  18%
129.144.88.26    JavaBadgeNP.4090009c22b1e70d0d10     654   3.202   16392   9%
129.146.39.45    OpenPlatform.4090009c24a18c121f0     254   2.422    6200   9%
129.146.62.218   auth.tt8301                          242   2.376    6083  23%
129.146.35.155   auth.cb201937                        219   1.928    4935   7%
129.146.53.121   JavaBadgeNP.4090009c22b1e70f300c     189   1.801    4610   6%
129.146.39.49    OpenPlatform.4090614573162716094     189   1.709    4375   9%
129.146.32.47    OpenPlatform.4090009c243f2109191     159   1.634    4184   5%
129.144.89.119   OpenPlatform.4090614573162716105     163   1.550    3966   7%
129.144.89.125   OpenPlatform.4090009c24ac6116081     158   1.525    3904   6%

DTUS:                   Packets:                Bandwidth  Mbps     DTU CPU
Active      844  28%    Lost       1748         Total   154.610       1%
Total      2971         Total    515771         Average   0.182
Idle       1972         RActive     999         Ravg      0.154

The top block shows bandwidth use of the top Sun Rays, in decreasing order, where:

IP Address is the IP address of one of the top bandwidth consumers
Token/Name is the token presented by the Sun Ray
Pkts/s average packets received per second over the last 20 seconds
Mbps average bandwidth over 20 seconds
KBytes total KBytes received
DTCPU average CPU used on the Sun Ray.
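
If you want to post-process the top block, each row splits cleanly on whitespace. The following Python sketch parses one row into named fields; the DtuRow type and parse_row function are hypothetical names, and the sketch assumes that tokens never contain spaces.

```python
from typing import NamedTuple


class DtuRow(NamedTuple):
    ip: str
    token: str
    pkts_per_s: int
    mbps: float
    kbytes: int
    dtcpu_pct: int


def parse_row(line: str) -> DtuRow:
    """Split one top-block row into its six whitespace-separated fields."""
    ip, token, pkts, mbps, kbytes, cpu = line.split()
    return DtuRow(ip, token, int(pkts), float(mbps), int(kbytes),
                  int(cpu.rstrip("%")))


# A row copied from the sample output above.
row = parse_row(
    "129.146.97.135   auth.mk228279                        866   7.839   20067   9%"
)
print(row.token, row.mbps, row.dtcpu_pct)
```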

The summary at the end gives totals for the current server, or for the server named with -h. If -a is present, the totals cover the whole failover group rather than a single server.

The summary block fields are defined as follows:

DTUS:
Active DTUs that have exceeded a small threshold (1Kbps) of traffic over the last 20 seconds.
Total Total DTUs connected
Idle Number of DTUs that have not exceeded the traffic threshold in the last minute

Packets:
Lost Total lost packets in last 20 seconds
Total Total packets received by DTUs
RActive "Real" active DTUs (Total - Idle)

Bandwidth Mbps:

Total Total average bandwidth across all Sun Rays in last 20 seconds
Average Average bandwidth per active Sun Ray
Ravg Average bandwidth per RActive Sun Ray

DTU CPU Total per-DTU average CPU use.
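
The derived summary fields can be checked by hand from the definitions above. This Python sketch redoes the arithmetic with the numbers from the sample summary block; the variable names are my own, and the script itself presumably works from unrounded measurements, so the results land close to the printed figures rather than exactly on them.

```python
# Values copied from the sample summary block.
total_dtus = 2971
idle_dtus = 1972
active_dtus = 844
total_mbps = 154.610

# RActive is defined as Total - Idle.
ractive = total_dtus - idle_dtus

# Share of connected DTUs that are active, shown next to the Active count.
active_pct = round(100 * active_dtus / total_dtus)

# Average is bandwidth per active Sun Ray; Ravg is bandwidth per RActive Sun Ray.
average = total_mbps / active_dtus
ravg = total_mbps / ractive

print(ractive, active_pct, round(average, 3), round(ravg, 3))
```

RActive and the Active percentage reproduce the sample exactly (999 and 28%), while Average and Ravg come out within about 0.001 Mbps of the values shown.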


UTBW

The output of the utbw program applies to a single Sun Ray. It is usually run in a shell window on the Sun Ray that you want information about. It can also take a token as an argument, in which case it monitors the Sun Ray bound to that token. The output contains the following fields (all measurements over the last 20 seconds):

lost 0/00% pkts 278697 cpu 0% kbytes 249436 0.145 Mbps 1.6(1.6) ms

The output is decoded as follows:

lost/pct Lost packets and percent of total
pkts Total packets received
cpu DTU cpu used in last 20 seconds
kbytes Kbytes received
Mbps Average bandwidth
xx(yy) Instantaneous latency in milliseconds (xx) and smoothed average (yy)
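
For logging or graphing, a utbw line can be broken into its fields with a regular expression. This Python sketch is built only from the single sample line above, so the exact spacing and number formats in other situations are assumptions; UTBW_RE and parse_utbw are hypothetical names.

```python
import re

# Pattern derived from the one sample line in this post.
UTBW_RE = re.compile(
    r"lost\s+(?P<lost>\d+)/(?P<lost_pct>[\d.]+)%\s+"
    r"pkts\s+(?P<pkts>\d+)\s+"
    r"cpu\s+(?P<cpu>\d+)%\s+"
    r"kbytes\s+(?P<kbytes>\d+)\s+"
    r"(?P<mbps>[\d.]+)\s+Mbps\s+"
    r"(?P<latency_ms>[\d.]+)\((?P<latency_avg_ms>[\d.]+)\)\s+ms"
)


def parse_utbw(line):
    """Return the fields of one utbw output line as a dict, or None if no match."""
    m = UTBW_RE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    return {
        "lost": int(fields["lost"]),
        "lost_pct": float(fields["lost_pct"]),
        "pkts": int(fields["pkts"]),
        "cpu_pct": int(fields["cpu"]),
        "kbytes": int(fields["kbytes"]),
        "mbps": float(fields["mbps"]),
        "latency_ms": float(fields["latency_ms"]),
        "latency_avg_ms": float(fields["latency_avg_ms"]),
    }


sample = parse_utbw(
    "lost 0/00% pkts 278697 cpu 0% kbytes 249436 0.145 Mbps 1.6(1.6) ms"
)
print(sample)
```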

Comments:

This is awesome :)

It would be very nice to have the same kind of scripts for SSGD ;)

Posted by Remold on September 08, 2008 at 05:52 AM PDT #

One small typo:

RAvg is the average bandwidth per RActive Sun Ray.

Posted by Kent Peacock on September 08, 2008 at 07:02 AM PDT #

Is the download complete? Both scripts, utbwlist2 and utbw, seem to be missing. Only the main file utbwscripts is included.

Thanks for sharing!

Posted by Jeroen on September 09, 2008 at 08:39 PM PDT #

utbwscripts is a tar archive. Just unpack it and you'll get both scripts.

Posted by n1p on September 09, 2008 at 09:03 PM PDT #

Thanks, got them now, works perfectly!

Posted by Jeroen on September 09, 2008 at 09:50 PM PDT #

I downloaded and gunzipped the file; however, when I attempt to unpack it using tar xvf I get the following error:
tar: blocksize = 18
tar: utbw - cannot create
tar: utbwlist2 - cannot create

please advise

Posted by billy on October 21, 2008 at 04:36 AM PDT #

Billy

If you haven't solved this, rename the tar file so that it ends in .tar. The problem occurs because the tar archive has the same name as the directory it is trying to create.

Cheers

Posted by Bob on November 28, 2008 at 07:53 PM PST #

I am having an issue with packet loss on a client's network. I have used utcapture to look at the loss and wanted to get your opinion on this.
Below is the output...

SVR05

# TERMINALID TIMESTAMP TOTAL PACKET TOTAL LOSS BYTES SENT PERCENT LOSS LATENCY
0021286dc4ab 20100618103221 484284 82649 453596266 12.623
0021286dc7d8 20100618103221 2215074 371950 103839534 425.225
0021286dc4ab 20100618103236 484294 82650 453597076 10.000
0021286dc4b8 20100618103236 856877 173576 716274686 18.159
0021286dc7d8 20100618103236 2215074 371950 103839534 425.225
0021286dc4b8 20100618103251 861285 173697 720830596 2.745
0021286dc7b0 20100618103251 1131402 175190 981489728 10.000
0021286dc7b7 20100618103251 3185497 294676 954656934 425.277
0021286dc7d8 20100618103251 2215083 371950 103840344 11.008
0021286dc4ab 20100618103306 484313 82651 453598696 10.000 26.228
0021286dc4b8 20100618103306 864166 174644 722944504 32.871
0021286dc7b2 20100618103306 4436748 2228 699186676 10.712
0021286dc7b7 20100618103306 3185497 294676 954656934 425.277
0021286dc80a 20100618103306 4588778 4149 870003498 1.700
0021286dc4ab 20100618103321 484313 82651 453598696 26.228
0021286dc7b7 20100618103351 3185525 294677 954659364 11.111
0021286dc4ab 20100618103406 484341 82652 453601126 11.111 14.715
0021286dc82d 20100618103406 3834845 79820 2050539834 10.000
0021286dc4ab 20100618103421 484341 82652 453601126 14.715
0021286dc4ab 20100618103451 484361 82652 453602926 10.895
002128763a83 20100618103451 1370682 156864 1324428868 10.671
0021286dc4b8 20100618103506 864223 174644 722949634 13.817
0021286dc7b0 20100618103506 1131469 175191 981495668 11.111
0021286dc4b8 20100618103521 864223 174644 722949634 13.817
0021286dc7b7 20100618103521 3185563 294677 954662784 10.272
0021286dc90d 20100618103536 3682620 77448 1894804344 11.111
0021286dc79b 20100618103551 1121654 175241 1014330398 11.104
002128763a29 20100618103551 3182501 63905 1345289944 412.593
0021286dc4b8 20100618103606 864252 174644 722952244 29.663
0021286dc7b0 20100618103606 1131498 175191 981498278 15.274
0021286dc7b2 20100618103606 4486945 2430 754504430 2.261
0021286dc80a 20100618103606 4650770 4284 939905852 1.368
0021286dc4b8 20100618103621 864252 174644 722952244 29.663
0021286dc7b0 20100618103621 1131498 175191 981498278 15.274

SVR04

# TERMINALID TIMESTAMP TOTAL PACKET TOTAL LOSS BYTES SENT PERCENT LOSS LATENCY
0021286dc4ab 20100618103213 484284 82649 453596266 12.623
0021286dc4b8 20100618103213 846695 171727 707338384 49.461
0021286dc4ab 20100618103228 484294 82650 453597076 10.000
0021286dc4b8 20100618103228 856877 173576 716274686 18.159
0021286dc7d8 20100618103228 2215074 371950 103839534 425.225
0021286dc7d8 20100618103243 2215083 371950 103840344 11.008
0021286dc4b8 20100618103258 861285 173697 720830596 2.745
0021286dc7b0 20100618103258 1131402 175190 981489728 10.000
0021286dc7b7 20100618103258 3185497 294676 954656934 425.277
0021286dc80a 20100618103258 4588778 4149 870003498 1.700
0021286dc4ab 20100618103313 484313 82651 453598696 10.000 26.228
0021286dc4b8 20100618103313 864166 174644 722944504 32.871
0021286dc7b2 20100618103313 4436748 2228 699186676 10.712
0021286dc7b7 20100618103358 3185525 294677 954659364 11.111
0021286dc4ab 20100618103413 484341 82652 453601126 11.111 14.715
0021286dc82d 20100618103413 3834845 79820 2050539834 10.000
0021286dc4ab 20100618103443 484361 82652 453602926 10.895
002128763a83 20100618103443 1370682 156864 1324428868 10.671
0021286dc4ab 20100618103458 484361 82652 453602926 10.895
002128763a83 20100618103458 1370682 156864 1324428868 10.671
0021286dc4b8 20100618103513 864223 174644 722949634 13.817
0021286dc7b0 20100618103513 1131469 175191 981495668 11.111
0021286dc7b7 20100618103513 3185563 294677 954662784 10.272
0021286dc7b7 20100618103528 3185563 294677 954662784 10.272
0021286dc90d 20100618103528 3682620 77448 1894804344 11.111
0021286dc79b 20100618103543 1121654 175241 1014330398 11.104
0021286dc80a 20100618103558 4650770 4284 939905852 1.368
002128763a29 20100618103558 3182501 63905 1345289944 412.593
0021286dc4b8 20100618103613 864252 174644 722952244 29.663
0021286dc7b0 20100618103613 1131498 175191 981498278 15.274
0021286dc7b2 20100618103613 4486945 2430 754504430 2.261

^C

Posted by Robert Jaudon on June 18, 2010 at 04:20 AM PDT #

Hi Robert,
Is the server connected via GigE? Have you changed the OS timer by adding set hires_tick=1 to /etc/system? See http://wikis.sun.com/display/SRSS4dot2/Tuning+the+Network

Posted by Thin Guy on June 18, 2010 at 04:53 AM PDT #

Thanks for the help with this. It is maddening.
Yes the servers are all connected via GigE and hires_tick is set to 1.

root@SVR04 /# dladm show-link
nge0 type: non-vlan mtu: 1500 device: nge0
nge1 type: non-vlan mtu: 1500 device: nge1
nge2 type: non-vlan mtu: 1500 device: nge2
nge3 type: non-vlan mtu: 1500 device: nge3

root@SVR04 /# dladm show-dev
nge0 link: up speed: 1000 Mbps duplex: full
nge1 link: unknown speed: 0 Mbps duplex: unknown
nge2 link: unknown speed: 0 Mbps duplex: unknown
nge3 link: unknown speed: 0 Mbps duplex: unknown

root@SVR04 /# cat /etc/system | grep tick
********* High res tick for network throughput
set hires_tick=1

root@SVR05 /# dladm show-link
nge0 type: non-vlan mtu: 1500 device: nge0
nge1 type: non-vlan mtu: 1500 device: nge1
nge2 type: non-vlan mtu: 1500 device: nge2
nge3 type: non-vlan mtu: 1500 device: nge3
root@SVR05 /# dladm show-dev
nge0 link: up speed: 1000 Mbps duplex: full
nge1 link: unknown speed: 0 Mbps duplex: unknown
nge2 link: unknown speed: 0 Mbps duplex: unknown
nge3 link: unknown speed: 0 Mbps duplex: unknown

root@SVR05 /# cat /etc/system | grep tick
********* High res tick for network throughput
set hires_tick=1
terminalID=0021286dc7b0
terminalIPA=10.0.3.92
model=SunRayP8
currentAuth=10.0.0.37
currentFW=GUI4.2_77_2009.10.19.17.01
currentBarrier=420
currentBarrierLevel=420
currentMTU=1300
Subnet=255.255.252.0
Router=10.0.0.11
LeaseTim=43200
DHCPServer=10.0.0.56
INFORMServer=10.0.0.56
FwSrvr=10.0.0.18
FWservType=FWSrvr
speed=100F
parmsVersion=GUI4.2_77_2009.10.19.17.01
parmsBarrier=420
parmsServers=10.0.0.18,10.0.0.35,10.0.0.37
parmsSelect=random
parms.MTU=1300
configMTU=1500
AltAuth=10.0.0.18,10.0.0.35,10.0.0.37
dnsList=10.0.0.54,10.0.0.55
dname=blah.local
confNetType=DHCP
stopqon=0
bandwidth=100000000

Posted by Robert Jaudon on June 18, 2010 at 05:11 AM PDT #

Hmmm. OK, so first the obvious. The server was rebooted after changing /etc/system, correct? (I know you guys are experts, but I have to ask).

Has a DTU been visited to verify that it is indeed coming up @ 100F?

Can you describe the LAN? I see an MTU of 1300. Typically when it is that low we are talking about WAN connections or VPNs, due to the overhead of IPSec.

Can you share what type of switch is at the server and at the DTUs? On the DTU end, if Cisco, is portfast enabled for the ports the DTUs are connected to?

Finally, if you force nge0 to 100 full duplex, does the problem go away?

Posted by Thin Guy on June 18, 2010 at 05:31 AM PDT #

Thanks for your time...

1. Yes....the servers have been rebooted
2. I have verified the DTUs are coming up 100F
3. HP Procurve switches, fiber connected to closet HP switches
4. Closet HP Procurve switches set to 100 full auto negotiation to DTUs
5. I have not tried to set nge0 to 100 full yet....still waiting to talk to the network guy to verify the backend.

Posted by Robert Jaudon on June 18, 2010 at 05:50 AM PDT #
