By martin on Aug 07, 2007
Today I started to ponder a problem which I can't be alone in having encountered. When you administer over 300 systems and want to perform bulk operations over ssh, there are always one or two systems which are down or unreachable. So your nifty little scripts which log on to each system to install a package, apply a patch, change a configuration setting, tweak a variable or just pull statistics from the system will fail for those hosts.
So I started toying with the idea of an ssh job queue which helps you keep track of bulk operations, so you can see on which systems the operation has successfully completed and retry only the ones that failed. Once I started to try this out I figured that I can't be the first one to face this problem, so I thought I'd ask you for input.
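To make the idea concrete, here is a minimal sketch of what I mean by a job queue: a wrapper that records each host that completed successfully in a state file, so a rerun skips the finished hosts and retries only the failures. The function names, the state-file format and the ssh invocation shown in the usage note are all my own assumptions, not an existing tool.

```python
import subprocess
from pathlib import Path

def run_queue(hosts, make_cmd, done_file):
    """Run make_cmd(host) for every host not yet listed in done_file.

    make_cmd is a function returning the command (as an argv list) to run
    for a given host. Hosts whose command exits 0 are appended to done_file,
    one per line, so a rerun of the same queue skips them. Returns the list
    of hosts that failed this run.
    """
    done_path = Path(done_file)
    # Load the set of hosts already completed in a previous run.
    done = set(done_path.read_text().split()) if done_path.exists() else set()
    failed = []
    with done_path.open("a") as log:
        for host in hosts:
            if host in done:
                continue  # already completed on an earlier run
            result = subprocess.run(make_cmd(host))
            if result.returncode == 0:
                log.write(host + "\n")
                log.flush()  # record progress even if we crash later
            else:
                failed.append(host)
    return failed
```

In a real bulk operation the command would be something like `run_queue(hosts, lambda h: ["ssh", h, "yum -y update foo"], "update-foo.done")` (hypothetical package and file names); rerunning the same call after the dead hosts come back up only touches the hosts still missing from `update-foo.done`.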
How do you deal with this problem? And "pen and paper" isn't the answer I'm looking for :)