SGE quick and dirty how to find jobs on 'bad' slots

I occasionally need to find queue instances in Sun Grid Engine that are in one of the potentially problematic states and still have an occupied slot. It comes up just infrequently enough that I never remember exactly how I did it the last time.

qstat -f | awk '$6~/[cdsuE]/ && $3!~/^0/'
queuename qtype used/tot. load_avg arch states
zone.q@r130c24z0.network.com BIP 1/1 -NA- sol-amd64 adu
zone.q@r130c24z1.network.com BIP 1/1 -NA- sol-amd64 adu

An alternative is "qstat -f | awk '$6~/[cdsuE]/ && $3~/^[1-9]/'", which also avoids printing the header line. In the first example the header line is printed because 'states' in $6 matches 's' and 'used/tot.' in $3 does not begin with '0'.

The possibly more elegant 'qstat -f -qs cdsuE' still requires a second comparison in awk, '$0!~/--/', to filter out the queue separator lines: qstat -f -qs cdsuE | awk '$0!~/--/ && $3!~/^0/'
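For the next time I forget, here is a rough sketch that wraps the filters above in one shell function and runs it against canned text instead of a live qmaster. The function name bad_slots and the z2/z3 hosts in the sample are made up for illustration, and the column positions assume the N1GE 6.0 layout shown above (used/tot. in $3, states in $6).

```shell
#!/bin/sh
# Sketch only: bad_slots is a hypothetical helper, not part of SGE.
# Reads `qstat -f` style output on stdin and prints queue instances that
# are in one of the cdsuE states and have at least one used slot.
bad_slots() {
    awk '
        /^-+$/ { next }                  # skip the dashed separator lines
        $1 == "queuename" { next }       # skip the header line
        $6 ~ /[cdsuE]/ && $3 ~ /^[1-9]/ { print $1, $3, $6 }
    '
}

# Canned sample; the z2 and z3 hosts are invented for the example.
sample='queuename qtype used/tot. load_avg arch states
----------------------------------------------------------------
zone.q@r130c24z0.network.com BIP 1/1 -NA- sol-amd64 adu
----------------------------------------------------------------
zone.q@r130c24z2.network.com BIP 0/1 -NA- sol-amd64 au
----------------------------------------------------------------
zone.q@r130c24z3.network.com BIP 1/1 0.02 sol-amd64'

printf '%s\n' "$sample" | bad_slots
```

On a real cluster the same function would be fed with 'qstat -f | bad_slots'. Only the first sample host is printed: z2 has no used slot and z3 has an empty states column.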


Finally because I can never remember what exactly all the queue states are and the qstat man page doesn't have the nice table:


aoACD – Number of queue instances that are in at least one of the following states:
a – Load threshold alarm
o – Orphaned
A – Suspend threshold alarm
C – Suspended by calendar
D – Disabled by calendar

 

cdsuE – Number of queue instances that are in at least one of the following states:
c – Configuration ambiguous
d – Disabled
s – Suspended
u – Unknown
E – Error
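Since I can never remember the letters, a small sketch that expands a states string such as 'adu' into the descriptions from the two tables above might save a lookup. The function name queue_state_names is hypothetical; the letter meanings are copied straight from the tables and nothing else is assumed.

```shell
#!/bin/sh
# Sketch only: queue_state_names is a hypothetical helper, not part of SGE.
# Expands each letter of a queue states string into its description.
queue_state_names() {
    printf '%s\n' "$1" | awk '
        BEGIN {
            # aoACD group
            name["a"] = "Load threshold alarm"
            name["o"] = "Orphaned"
            name["A"] = "Suspend threshold alarm"
            name["C"] = "Suspended by calendar"
            name["D"] = "Disabled by calendar"
            # cdsuE group
            name["c"] = "Configuration ambiguous"
            name["d"] = "Disabled"
            name["s"] = "Suspended"
            name["u"] = "Unknown"
            name["E"] = "Error"
        }
        {
            n = length($0)
            for (i = 1; i <= n; i++) {
                c = substr($0, i, 1)
                desc = ((c in name) ? name[c] : "not in the tables above")
                printf "%s: %s\n", c, desc
            }
        }'
}

queue_state_names adu
```

For the 'adu' state from the example output this prints one line per letter: load threshold alarm, disabled, unknown.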

 

Job State/Status:

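d(eletion), E(rror), h(old), r(unning), R(estarted), s(uspended), S(uspended), t(ransferring), T(hreshold) or w(aiting). Lowercase s means the job itself was suspended, uppercase S means the queue holding the job was suspended, and T means a suspend threshold of the queue was exceeded.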

References: SGE (N1GE 6.0) -- Monitoring and Controlling Queues

Edit: Added Job Status. I couldn't find it in any of the online docs; it is about 40% of the way through the qstat(1) man page, and targeted Google searches do a poor job of finding it.
