Wednesday Nov 25, 2009

Use InfiniBand with Solaris X86 10/09 and HPC-ClusterTools 8.2

I will describe here how to set up HPC-ClusterTools (HPC-CT) 8.2 on Solaris 10 X86 10/09 (SunOS 5.10 Generic_141445-09) to run over an InfiniBand network (here QDR IB). Attention: because I am behind a firewall, I use very open and possibly insecure settings, avoiding passwords, etc. If connected to the outside world, your cluster could become an easy target for hackers. This blog does not describe how to cable and configure the switches; I am counting on your IT admin to do this.

Set up a local NFS file system

In order to install HPC-CT, you need a shared filesystem, visible from all nodes of your cluster.
In my case, I had to do this first.
Let us call node0 your headnode (which will be the server of the NFS filesystem).
Start on node0:
%svcadm -v enable -r network/nfs/server
%mkdir /tools
%chmod 777 /tools
%share -F nfs -o rw /tools

Add the share command to /etc/dfs/dfstab
%cat /etc/dfs/dfstab
share -F nfs -o rw /tools

and the directory will be shared again automatically after a reboot.

Now on all other client nodes (node1 to nodeN) do
%mkdir /tools
%mount -F nfs node0:/tools /tools

and add a line at the end of

%cat /etc/vfstab
server:/disk   -   /mount_point   nfs   -   yes    rw,soft
node0:/tools -   /tools         nfs   -   yes    rw,soft
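
If you script this step over many nodes, it helps to append the line only when it is missing, so that running the script twice does not duplicate the entry. The following is a minimal sketch, not part of HPC-CT: the function name add_line_once is hypothetical, and it assumes a POSIX shell (ksh or bash rather than the csh prompt used in this post) and a POSIX grep (on Solaris 10 that would be /usr/xpg4/bin/grep).

```shell
# Append a line to a file only if that exact line is not already present.
# Assumption: grep understands -q, -x and -F (POSIX grep).
add_line_once() {
    file=$1
    line=$2
    if grep -qxF -- "$line" "$file" 2>/dev/null; then
        return 0    # exact line already there, nothing to do
    fi
    printf '%s\n' "$line" >> "$file"
}

# Example (as root, on each client node):
# add_line_once /etc/vfstab 'node0:/tools -   /tools         nfs   -   yes    rw,soft'
```

The same helper could be used on node0 for the share line in /etc/dfs/dfstab.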

Password-free rsh

The next step is to set up password-free rsh for root.
Edit or create a .rhosts file containing the hostnames and the login:

%cat ~/.rhosts
node0 root
node1 root
node2 root
nodeN root

and add the hostnames to the file

%cat /etc/hosts.equiv

You should now be able to create files under /tools and run an rsh nodeN command
without any password prompt.
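
With more than a handful of nodes, the .rhosts file can be generated rather than typed. This is only a sketch under my own assumptions: make_rhosts is a hypothetical helper name, and the code targets a POSIX shell (ksh or bash), not the csh prompt shown in this post.

```shell
# Print one "host user" line per node, in the .rhosts layout shown above.
# First argument: login name; remaining arguments: hostnames.
make_rhosts() {
    user=$1
    shift
    for host in "$@"; do
        printf '%s %s\n' "$host" "$user"
    done
}

# Example:
# make_rhosts root node0 node1 node2 > ~/.rhosts
```

Redirecting the output into ~/.rhosts overwrites the file, so check the result before copying it to the other nodes.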

Installing HPC-CT

Now it is time to install HPC-CT 8.2. Download the latest version from here.
Stay on your headnode, node0, and put sun-hpc-ct-8.2-SunOS-i386.tar.gz under the shared filesystem /tools
%cd /tools
%gunzip -c sun-hpc-ct-8.2-SunOS-i386.tar.gz | tar xvf -
%cd sun-hpc-ct-8.2-SunOS-i386/Product/Install_Utilities/bin
%./ctinstall -n node0,node1,node2,nodeN -r rsh

For more information, see here for the HPC-CT installation guide. You do not need to have the IB network available during the installation of HPC-CT: the interconnect is selected at run time, not at install time.
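
The comma-separated list passed to ctinstall -n gets long on a real cluster. As a small sketch, assuming you keep a hypothetical file nodes.txt with one hostname per line, the list can be built with the standard paste utility. The function name node_list is mine, and the code is written for a POSIX shell (ksh or bash).

```shell
# Read hostnames from stdin, one per line, and print them as a single
# comma-separated string suitable for an option such as ctinstall -n.
node_list() {
    paste -s -d, -
}

# Example (nodes.txt is a hypothetical file, one hostname per line):
# ./ctinstall -n "$(node_list < nodes.txt)" -r rsh
```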

For the time being, Solaris uses the uDAPL protocol. This protocol requires a TCP interface to be up and running.
Check with

% ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
     inet netmask ff000000
e1000g0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
         inet netmask ffffff00 broadcast
         ether 0:1e:68:2f:1d:9e

that this is the case. You can already try to run an MPI program by specifying the TCP interface:
%mpirun -np 2 -mca btl sm,tcp,self -mca plm_rsh_agent rsh -hostfile ./hostfile ./a.out
%cat hostfile
node0 slots=1
node1 slots=1
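
The value given to -np has to fit the slots declared in the hostfile. Here is a rough sketch for adding them up; total_slots is a hypothetical helper, lines without a slots= field are counted as one slot (my assumption, check it against your Open MPI version), and the awk program needs a POSIX awk (nawk on Solaris 10).

```shell
# Sum the slots=N values of a hostfile read from stdin.
# Assumption: a host line without slots=N counts as one slot.
total_slots() {
    awk '
        /^[ \t]*#/ { next }             # skip comment lines
        NF > 0 {
            n = 1
            for (i = 2; i <= NF; i++)
                if ($i ~ /^slots=/) { split($i, kv, "="); n = kv[2] }
            sum += n
        }
        END { print sum + 0 }
    '
}

# Example:
# mpirun -np "$(total_slots < ./hostfile)" -hostfile ./hostfile ./a.out
```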

Configuring the IB interface

Check that the IB updates and packages are installed.
Run
%pkginfo -x | grep -i ib
Within a long list you should see something like this:

SUNWhermon                        Sun IB Hermon HCA driver
SUNWib                            Sun InfiniBand Framework
SUNWibsdp                         Sun InfiniBand layered Sockets Direct Protocol
SUNWibsdpib                       Sun InfiniBand Sockets Direct Protocol
SUNWibsdpu                        Sun InfiniBand pseudo Sockets Direct Protocol Admin

If you see nothing here, you will have to install the IB patches from the install image.
If you are using an earlier version of Solaris10 X86 (5/09), you can get these packages from here.
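
Scanning a long pkginfo listing by eye is error-prone, so the check can be scripted. The sketch below reads a pkginfo -x style listing on stdin and prints the package names that are missing. missing_pkgs is a hypothetical name, the package list in the example is simply the one shown above (your HCA driver package may differ), and a POSIX shell and awk (nawk on Solaris 10) are assumed.

```shell
# Read a "pkginfo -x" style listing on stdin and print every package
# name given as an argument that does not appear in the first column.
missing_pkgs() {
    listing=$(cat)
    for pkg in "$@"; do
        printf '%s\n' "$listing" |
            awk -v p="$pkg" '$1 == p { found = 1 } END { exit !found }' ||
            printf '%s\n' "$pkg"
    done
}

# Example:
# pkginfo -x | missing_pkgs SUNWhermon SUNWib SUNWibsdp SUNWibsdpib SUNWibsdpu
```

No output means that every listed package was found.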

Check the /usr/sbin/datadm command
%datadm -v
If you see nothing, check whether you have this file:

%cat /usr/share/dat/SUNWudaplt.conf
# Copyright 2008 Sun Microsystems, Inc.  All rights reserved.
# Use is subject to license terms.
# ident "@(#)SUNWudaplt.conf    1.3     08/10/16 SMI"
driver_name=tavor  u1.2 nonthreadsafe default SUNW.1.0 " "
driver_name=arbel  u1.2 nonthreadsafe default SUNW.1.0 " "
driver_name=hermon u1.2 nonthreadsafe default SUNW.1.0 " "

Run the following command on all nodes:
%datadm -a /usr/share/dat/SUNWudaplt.conf

Now datadm should display this
%datadm -v
ibd0 u1.2 nonthreadsafe default SUNW.1.0 " " "driver_name=hermon"

and you should have a file
%cat /etc/dat/dat.conf
ibd0 u1.2 nonthreadsafe default SUNW.1.0 " " "driver_name=hermon"

Now reboot all nodes.
If you have made no mistakes, they should all come back with the NFS-mounted directory /tools and
password-free rsh, and datadm should return the line shown above.

Check that the IB interface shows up under /dev:
%ll /dev/ib*
3120    2 lrwxrwxrwx   1 root     other         29 Nov 11 15:43 /dev/ibd -> 
92901    2 lrwxrwxrwx   1 root     root          72 Nov 16 10:09 /dev/ibd0 -> 

Here my interface is called ibd0. You may have another number at the end.

Now we have to configure the ibd0 interface. In my example, I decided
to give the following IP address for the ibd0 interface:
(Before doing this check with ping that these addresses are really unused ... )

etc ...

Now on every node run the ifconfig command with the correct IP address.
On node0

%ifconfig ibd0 plumb broadcast netmask up

on node1

%ifconfig ibd0 plumb broadcast netmask up


The ibd0 interface should now be plumbed and show

%ifconfig ibd0
ibd0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 2044 index 4
      inet netmask ffff0000 broadcast
      ipib 0:1:0:4a:fe:80:0:0:0:0:0:0:0:21:28:0:1:3e:5c:90

Finally to make this interface persistent across reboots you have to create on every node a file that contains the IP address for the ibd0 interface.
on node0

%cat /etc/hostname.ibd0

and on node1
%cat /etc/hostname.ibd0

etc ...

As a test, you should be able to ping all IP addresses from all nodes.
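
The ping test can be looped over all addresses. The sketch below is deliberately generic: it takes the probe command as its first argument and prints the hosts for which the probe fails. unreachable is a hypothetical name and the addresses in the example are placeholders, not the ones used on my cluster. It assumes a POSIX shell and a probe that returns by itself with a non-zero status on failure, as the Solaris ping does after its timeout; on other systems you may need a small wrapper.

```shell
# Run a probe command against every host and print the hosts that fail.
# Returns non-zero if at least one host did not answer.
unreachable() {
    probe=$1
    shift
    failed=0
    for host in "$@"; do
        if ! "$probe" "$host" >/dev/null 2>&1; then
            printf '%s\n' "$host"
            failed=1
        fi
    done
    return "$failed"
}

# Example with placeholder addresses:
# unreachable ping 192.0.2.10 192.0.2.11
```

Run it from every node, since a one-way test from the headnode does not prove that the clients can reach each other.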

Do a last sanity check by looking at
%ldd /opt/SUNWhpc/HPC8.2/sun/lib/openmpi/
and check that all libraries are found

Now you are ready for rock'n roll and you can run
%setenv LD_LIBRARY_PATH /opt/SUNWhpc/HPC8.2/sun/lib
%mpirun -np  2 -mca btl sm,self,udapl -mca plm_rsh_agent rsh -x LD_LIBRARY_PATH -hostfile ./hostfile ./a.out

with the same hostfile as above
%cat hostfile
node0 slots=1
node1 slots=1

Some additional remarks

As you have seen from the examples above, HPC-CT looks for the best way to communicate with the hosts mentioned in the hostfile by picking the fastest available interconnect among those listed with -mca btl.
Let us suppose that node0 and node1 are connected over IB (as described above), while node2 and node3 are only on the TCP interconnect. Running
%mpirun -np 4 -mca btl sm,self,tcp,udapl -mca plm_rsh_agent rsh -x LD_LIBRARY_PATH -hostfile ./hostfile ./a.out
%cat hostfile
node0 slots=1
node1 slots=1
node2 slots=1
node3 slots=1

will use IB between node0 and node1 and the TCP network for the rest. If you force the IB network by setting -mca btl sm,self,udapl, the run will fail with an error message.
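
The choice between the two btl lists can be made by a script. This is only a sketch under my own assumptions: you keep by hand the list of nodes that have a working IB interface, choose_btl is a hypothetical helper name, and the code is written for a POSIX shell (ksh or bash). It returns the IB-only list when every node of the job is on IB, and the list including tcp otherwise.

```shell
# First argument: space-separated list of nodes known to have IB.
# Remaining arguments: the nodes of the job.
# Prints the value to pass to "-mca btl".
choose_btl() {
    ib_nodes=" $1 "
    shift
    for node in "$@"; do
        case "$ib_nodes" in
            *" $node "*) ;;                                 # node is on IB
            *) echo "sm,self,tcp,udapl"; return 0 ;;        # needs TCP too
        esac
    done
    echo "sm,self,udapl"
}

# Example:
# mpirun -np 4 -mca btl "$(choose_btl 'node0 node1' node0 node1 node2 node3)" ...
```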
