The Power of Xargs

Here's an absolutely amazing command I've come to love in the last month - xargs(1).

Here's a good setting to illustrate its power. Say you want to collect data from a lab of machines - like the current load, which users are on them and so on. I will talk about a pulling (or polling, if you prefer) technique rather than a pushing technique.

One way you could do this would be the following (only gathering who data):

cat listOfLabMachines | while read -r HOST
do
   ssh -n "$HOST" who > "savedata/$HOST"
done

Another way you could do this would be through a stub script and xargs, like so:

#!/bin/bash
# Filename: stub-script.sh
HOST=$1
echo "SSHing to $HOST"
ssh -n $HOST who > savedata/$HOST
exit 0

Here is the command that uses xargs to execute the script:

cat listOfLabMachines | xargs -i ./stub-script.sh {}
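For anyone who hasn't met the replace-string form before, here is a minimal sketch that swaps the ssh call for a harmless echo so it can be run anywhere. It uses '-I {}', the POSIX spelling of the same option (the GNU xargs man page marks the lowercase '-i' as deprecated). The three host names are just sample input fed through printf, not the real machine list.

```shell
# Stand-in for the host list: three names, one per line.
# xargs runs the command once per input line, replacing {} with that line.
out=$(printf '%s\n' apple banana cherry | xargs -I {} echo "would ssh to {}")
printf '%s\n' "$out"
```

Each input line produces its own echo invocation, which is exactly how the stub script gets called once per host.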

The problem with this approach is that it's relatively slow. Sure, xargs is nice because it gives you a one-line for loop, but run this way it still executes everything sequentially. Here's a timed trace of it running on our lab of 24 Sun Ultra 45 machines at school.

cwalsh@mint:~$ cat stub-script.sh
#!/bin/bash
# Filename: stub-script.sh
HOST=$1
echo "SSHing to $HOST"
ssh -n $HOST who > savedata/$HOST
exit 0

cwalsh@mint:~$ time cat listOfLabMachines | xargs -i bash stub-script.sh {}
SSHing to apple
SSHing to banana
SSHing to blackberry
SSHing to blueberry
ssh: connect to host blueberry port 22: No route to host
SSHing to brownie
ssh: connect to host brownie port 22: No route to host
SSHing to cake
ssh: connect to host cake port 22: No route to host
SSHing to cherry
SSHing to chocolate
tspencer :0           2008-10-10 10:34
tspencer pts/0        2008-10-10 10:35 (:0.0)
SSHing to cobbler
SSHing to kahlua
SSHing to lemon
SSHing to maple
SSHing to mint
xinyang  pts/0        2008-10-10 08:50 (wl-dhcp143-10.mines.edu)
cwalsh   pts/1        2008-10-10 11:08 (alamode.mines.edu)
xinyang  pts/2        2008-10-08 16:40 (sorbet.mines.edu)
SSHing to peach
SSHing to pecan
SSHing to pie
spoole   pts/2        2008-10-07 12:15 (fiji184.mines.edu)
SSHing to pistachio
SSHing to pumpkin
SSHing to raspberry
SSHing to rhubarb
SSHing to strudel
SSHing to tapioca
SSHing to toffee
SSHing to vanilla
vsingh   :0           2008-10-02 15:20

real    0m13.615s
user    0m0.456s
sys     0m0.264s
cwalsh@mint:~$

See? We can do better than 13.6 seconds. That's too slow :) ssh is an I/O-bound process, which means we can take advantage of the idle CPU while ssh is blocked waiting on data. Here's the coolest part about xargs - the -P flag. If you use xargs with '-P 4' then it will run up to 4 processes at a time to handle the input given. If you give xargs a '-P 0' then it will run as many processes as possible at a time. This is exceptionally handy when you're dealing with multicore machines. Here's the timed trace of the same command run on the lab as before, but this time with '-P 0':

cwalsh@mint:~$ time cat listOfLabMachines | xargs -P 0 -i bash stub-script.sh {}
SSHing to apple
SSHing to banana
SSHing to blueberry
SSHing to cherry
SSHing to cake
SSHing to cobbler
SSHing to blackberry
SSHing to brownie
SSHing to chocolate
SSHing to pumpkin
SSHing to raspberry
SSHing to lemon
SSHing to toffee
SSHing to pistachio
SSHing to maple
SSHing to pecan
SSHing to kahlua
SSHing to tapioca
SSHing to rhubarb
SSHing to mint
SSHing to vanilla
SSHing to pie
SSHing to strudel
SSHing to peach
xinyang  pts/0        2008-10-10 08:50 (wl-dhcp143-10.mines.edu)
cwalsh   pts/1        2008-10-10 11:08 (alamode.mines.edu)
xinyang  pts/2        2008-10-08 16:40 (sorbet.mines.edu)
tspencer :0           2008-10-10 10:34
tspencer pts/0        2008-10-10 10:35 (:0.0)
spoole   pts/2        2008-10-07 12:15 (fiji184.mines.edu)
vsingh   :0           2008-10-02 15:20
ssh: connect to host blueberry port 22: No route to host
ssh: connect to host brownie port 22: No route to host
ssh: connect to host cake port 22: No route to host

real    0m3.106s
user    0m0.412s
sys     0m0.328s
cwalsh@mint:~$

As you can see, this is totally awesome! The run time dropped from 13.6 seconds to 3.1 seconds, better than a 4x speedup! All done just by taking advantage of this new multi-core paradigm all us programmers need to adopt if we're to continue pushing those hardware guys along with Moore's law :)
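If you don't have a lab of machines handy, the effect of -P can be reproduced with a rough sketch like the one below. It assumes an xargs that supports -P (GNU and BSD xargs both do; it is an extension rather than something every xargs has) and a date that understands +%s. The one-second sleeps are only a stand-in for an ssh call that spends its time waiting.

```shell
# Four jobs that each just wait one second (a stand-in for a blocking ssh).
jobs_input=$(printf '%s\n' 1 1 1 1)

# Sequential: xargs runs one sleep at a time.
start=$(date +%s)
printf '%s\n' "$jobs_input" | xargs -I {} sleep {}
sequential=$(( $(date +%s) - start ))

# Parallel: -P 4 lets up to four sleeps run at once.
start=$(date +%s)
printf '%s\n' "$jobs_input" | xargs -P 4 -I {} sleep {}
parallel=$(( $(date +%s) - start ))

echo "sequential: ${sequential}s, parallel: ${parallel}s"
```

The sequential run should take about four seconds and the parallel one about one. Nothing here is CPU-heavy, so the gain comes from overlapping the waiting, which is the same reason the ssh run above got faster.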

Enjoy

Comments:

You could save yourself another process by using an IO redirect (why do people insist on doing cat file | .... ?).

eg xargs .... < listOfLabMachines

alan.

Posted by Alan Hargreaves on October 19, 2008 at 12:11 PM MDT #

Yes, you could in fact. Running `xargs -i ./stub-script.sh {} < listOfLabMachines` works.

That's a good question, why people do the cat thing... I suppose I've known about the immediate redirection and have just never used it (and I've done the cat thing ever since I learned the shell... *shrug*). Thanks for reminding me!

Chris

Posted by cwalsh on October 19, 2008 at 02:56 PM MDT #

[Trackback] Sun Academic Initiative (tags: sun social education learning java students sai) The Power of Xargs - Chris' Corner (tags: unix sysadmin...

Posted by c0t0d0s0.org on October 19, 2008 at 10:00 PM MDT #

About

Christopher Walsh
