SGE: a quick and dirty way to find jobs on 'bad' slots
By yakshaving on Nov 26, 2007
I occasionally need to find queue instances in Sun Grid Engine that are in one of the possibly problematic states and still have an occupied slot. It happens just infrequently enough that I never remember exactly how I did it the last time.
qstat -f | awk '$6~/[cdsuE]/ && $3!~/^0/'
queuename qtype used/tot. load_avg arch states
email@example.com BIP 1/1 -NA- sol-amd64 adu
firstname.lastname@example.org BIP 1/1 -NA- sol-amd64 adu
An alternative is "qstat -f | awk '$6~/[cdsuE]/ && $3~/^[1-9]/'", which also avoids printing the header line. In the example above the header is printed because 'states' in $6 matches 's' and 'used/tot.' in $3 does not begin with '0'.
The possibly more elegant 'qstat -f -qs cdsuE' still requires a second comparison in awk, '$0!~/--/', to filter out the queue separator lines. (qstat -f -qs acduE | awk '$0!~/--/ && $3!~/^0/')
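For reuse, the filter can be wrapped in a small shell function that reads 'qstat -f' output on stdin. This is only a sketch: the name bad_slots and the sample lines (all.q@node1 and so on) are made up for illustration, and it assumes the six-column layout shown in the example above, with used/tot. in $3 and the states in $6.

```shell
# Keep only queue-instance lines whose state column ($6) contains one of
# c, d, s, u or E and whose used-slot count ($3) starts with 1-9.
# The header and the separator lines never match, so they are dropped too.
bad_slots() {
    awk '$6 ~ /[cdsuE]/ && $3 ~ /^[1-9]/'
}

# Hypothetical sample input in the layout shown above (not real qstat output).
sample='queuename qtype used/tot. load_avg arch states
----------------------------------------
all.q@node1 BIP 1/1 -NA- sol-amd64 adu
----------------------------------------
all.q@node2 BIP 0/1 0.01 sol-amd64 d
----------------------------------------
all.q@node3 BIP 1/1 0.02 sol-amd64'

printf '%s\n' "$sample" | bad_slots
```

Against a live cluster the call would be 'qstat -f | bad_slots'. Of the sample lines, only the node1 line has both a matching state and an occupied slot.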
Finally, because I can never remember exactly what all the queue states are, and the qstat man page doesn't have this nice table:
aoACD – Number of queue instances that are in at least one of the following states:
a – Load threshold alarm
o – Orphaned
A – Suspend threshold alarm
C – Suspended by calendar
D – Disabled by calendar
cdsuE – Number of queue instances that are in at least one of the following states:
c – Configuration ambiguous
d – Disabled
s – Suspended
u – Unknown
E – Error
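Since the whole point of the table is not having to remember it, here is a rough sketch of a lookup helper that expands a state string such as 'adu' letter by letter. The function name describe_states is hypothetical, and it knows only the ten letters listed above; anything else is reported as unknown to the table.

```shell
# Print one description per letter of a queue state string, e.g. "adu".
describe_states() {
    printf '%s\n' "$1" | fold -w 1 | while IFS= read -r letter; do
        [ -n "$letter" ] || continue   # skip the empty line from an empty argument
        case "$letter" in
            a) echo "a: Load threshold alarm" ;;
            o) echo "o: Orphaned" ;;
            A) echo "A: Suspend threshold alarm" ;;
            C) echo "C: Suspended by calendar" ;;
            D) echo "D: Disabled by calendar" ;;
            c) echo "c: Configuration ambiguous" ;;
            d) echo "d: Disabled" ;;
            s) echo "s: Suspended" ;;
            u) echo "u: Unknown" ;;
            E) echo "E: Error" ;;
            *) echo "$letter: not in the table above" ;;
        esac
    done
}

describe_states adu
```

For the 'adu' state from the example output this prints the lines for a, d and u, in that order.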
Job states: d(eletion), E(rror), h(old), r(unning), R(estarted), s(uspended), S(uspended), t(ransferring), T(hreshold) or w(aiting).
Edit: Added the job states. I literally couldn't find them in any of the online docs (they are about 40% of the way through the qstat(1) man page, but targeted Google searches do a poor job of finding the link).