Running LAM MPI
Setting LAM as the Default MPI Environment
LAM is a great MPI environment. Unfortunately, it's not the default MPI environment used on the BCCD. The reason for this is simple: LAM lost the coin toss to MPICH when the BCCD was first created. Historically, the trouble with having LAM-MPI and MPICH in the same environment has been conflicts over libraries, default executables, consistency across hosts, and the expectations of the end user.
Switching the default environment from MPICH to LAM is easy to do, but one needs to be completely thorough. In other words, if the systems are not completely transitioned to use LAM, the resulting environment will be very, very broken.
However, by following these directions completely, all will be well in your LAM-MPI world. For each host in the BCCD cluster, do the following steps:
Edit the bccd User's Default Settings
By default, the bccd user's PATH setting points to the MPICH binaries. This includes the MPICH versions of mpirun, mpicc, mpif77, etc. To point the bccd user's PATH at the LAM compiler wrappers and tools instead, modify the PATH setting in the bccd user's .bashrc file:
vi ~/.bashrc
Look for the line that reads:
export PATH=$PATH:/mpich/bin
Change this to read:
export PATH=/lam-mpi/bin:$PATH
Great!
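If you would rather not edit the file by hand on every host, the same change can be scripted. The following is only a sketch: use_lam_path is a made-up helper name (not a BCCD tool), and it assumes the MPICH line appears in .bashrc exactly as shown above. It keeps a .bak copy of the file before rewriting it.

```shell
# Sketch: swap the MPICH PATH line for the LAM one in a bashrc-style file.
# use_lam_path is a hypothetical helper name, not part of the BCCD.
use_lam_path() {
    rc_file="${1:-$HOME/.bashrc}"
    # Keep a backup copy before rewriting the file.
    cp "$rc_file" "$rc_file.bak" || return 1
    # Replace only the exact MPICH line; every other line is left untouched.
    sed 's|^export PATH=\$PATH:/mpich/bin$|export PATH=/lam-mpi/bin:$PATH|' \
        "$rc_file.bak" > "$rc_file"
}

# Usage: use_lam_path              (edits ~/.bashrc)
#        use_lam_path /some/file   (edits the given file instead)
```

If the line in your .bashrc differs at all (extra spaces, a different path), the sed expression will not match and the file is left unchanged, so check the result before moving on.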
Allow the Changes to Take Effect
Changes to your .bashrc do not take effect immediately. They will "stick" the next time you log in, or as soon as you do what is referred to as "sourcing your .bashrc file" in the current shell. So you will have to do one of the following:
- Shut down X and/or log out of every shell, then log back in.
- Or issue:
. ~/.bashrc
A necessary (but not sufficient) check before going forward is to issue which mpirun. The response that comes back should be the mpirun under /lam-mpi/bin.
If you don't see LAM's version of mpirun, then you've done something wrong (or there's a problem with this howto, so of course you'll create yourself an account, log in, and fix the problem, RIGHT?).
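When you have to repeat this check on many hosts, a tiny function saves some squinting. This is a sketch under assumptions: mpirun_is_lam is a hypothetical name, and /lam-mpi/bin is the LAM location used in this howto. Like which mpirun, it only tells you which mpirun comes first on the PATH.

```shell
# Sketch: confirm that the first mpirun on PATH lives under the LAM bin directory.
# mpirun_is_lam is a hypothetical helper; pass another directory to override the default.
mpirun_is_lam() {
    lam_bin="${1:-/lam-mpi/bin}"
    found=$(command -v mpirun) || { echo "mpirun not found on PATH"; return 1; }
    case "$found" in
        "$lam_bin"/*) echo "OK: $found"; return 0 ;;
        *)            echo "Not LAM: $found"; return 1 ;;
    esac
}

# Usage: mpirun_is_lam && echo "PATH looks right on $(hostname)"
```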
Rebuild the System Library Cache
Become root (su -, password letmein). Then issue:
ldconfig -v | less
Ignore errors, if any, about /usr/local/lib not being there. Somewhere in the output, you need to see that libmpi.so.0 is being provided from the lam-mpi directory:
/lam-mpi/lib:
    libmpi.so.0 -> libmpi.so.0.0.0
    liblammpi++.so.0 -> liblammpi++.so.0.0.0
    liblamf77mpi.so.0 -> liblamf77mpi.so.0.0.0
    liblam.so.0 -> liblam.so.0.0.0
There are likely more entries in the output; the above just illustrates what you're looking for. Do this for every host in the cluster, then log out of the root account.
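Paging through the whole ldconfig listing gets old quickly. As a rough sketch, the filter below prints only the section for one directory. lam_libs is a hypothetical helper name, and it assumes the usual ldconfig -v layout: an unindented line for each directory, followed by indented library entries.

```shell
# Sketch: print only the /lam-mpi/lib section of "ldconfig -v" output.
# lam_libs is a hypothetical helper; it reads the ldconfig output on stdin.
lam_libs() {
    awk -v dir="${1:-/lam-mpi/lib}" '
        # An unindented line starts a new directory section.
        /^[^ \t]/ { in_dir = (index($0, dir ":") == 1) }
        in_dir    { print }
    '
}

# Usage (as root): ldconfig -v 2>/dev/null | lam_libs
```

If nothing is printed, the LAM library directory is not being picked up by ldconfig on that host.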
Booting LAM MPI
LAM-MPI requires a file consisting of a list of current nodes to boot. Make sure that every node has started pkbcast, bccd-allowall, and bccd-snarfhosts, as discussed in Booting up the CD. The bccd-snarfhosts command should generate the appropriate machines file, in the user bccd's local directory. This file contains a list of active nodes, and is exactly what LAM needs. Issue the following command to verify that the cluster is bootable:
If the command is successful, you should see the message below:
To actually start LAM on the specified cluster, issue the following:
If you don't see any error message, then you can now run MPI programs under LAM. Gravy!
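The command listings for the two steps above are not shown on this copy of the page. In a stock LAM/MPI installation, the usual tools are recon (to check that the listed nodes can be booted) and lamboot (to actually start LAM). The sketch below wraps both. Treat the details as assumptions: boot_lam is a made-up helper name, and ~/machines is only a guess at where bccd-snarfhosts writes the node list, so pass the real path if yours differs.

```shell
# Sketch only: recon and lamboot are standard LAM/MPI commands, but the
# default machines-file path (~/machines) is an assumption about where
# bccd-snarfhosts writes its output. boot_lam is a hypothetical helper.
boot_lam() {
    machines="${1:-$HOME/machines}"
    [ -s "$machines" ] || { echo "no node list at $machines"; return 1; }
    # Check that every listed node can be reached and booted.
    recon -v "$machines" || return 1
    # Start the LAM run-time environment on those nodes.
    lamboot -v "$machines"
}

# Usage: boot_lam                  (uses ~/machines)
#        boot_lam /path/to/nodes   (uses the given node list)
```

Running recon first means lamboot is only attempted when the check passes.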
Compiling and Running MPI Programs with LAM
To find out how to compile and run sample MPI programs, take a look at Compiling and Running. Remember, the example programs for LAM are in the directory ~/lam-mpi/examples. The examples are sorted inside directories. You may go into each directory to compile and run each program using the familiar mpicc and mpirun commands.
Shutting Down LAM
Cleaning LAM
Instead of lambooting after each MPI run, we can issue a lamclean command to remove all user processes and messages:
After doing this, we can mpirun another program.
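As a small illustration of that run-then-clean cycle, here is a sketch of a wrapper that runs one MPI job and then calls lamclean, whether or not the job succeeded. run_and_clean is a hypothetical name, and the ./hello program in the usage line is just a placeholder for one of your own binaries.

```shell
# Sketch: run one MPI job under LAM, then clear leftover processes and messages.
# run_and_clean is a hypothetical wrapper; all arguments are passed to mpirun.
run_and_clean() {
    mpirun "$@"
    status=$?
    # Clean up even if the job failed, so the next mpirun starts fresh.
    lamclean -v
    return $status
}

# Usage: run_and_clean -np 4 ./hello
```

The wrapper returns mpirun's exit status, so a failed job is still reported as a failure after the cleanup.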
Halting LAM
After we are all done, the lamhalt command removes all traces of the LAM session on the network.
And just in case...
In the case of a catastrophic failure (i.e., one or more LAM nodes crash), we can issue a wipe command to halt everything instead of issuing lamhalt.
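The two shutdown paths can be sketched as one helper. This is illustrative only: stop_lam and its --force switch are made up for this example, and ~/machines is again an assumed location for the node list that was used with lamboot. wipe needs that same node list so it knows which hosts to clean up.

```shell
# Sketch: normal shutdown with lamhalt, forced shutdown with wipe.
# stop_lam and its --force switch are hypothetical; ~/machines is an assumed path.
stop_lam() {
    if [ "$1" = "--force" ]; then
        # Catastrophic case: tear LAM down on every node in the list.
        wipe -v "${2:-$HOME/machines}"
    else
        # Normal case: shut down the running LAM session.
        lamhalt
    fi
}

# Usage: stop_lam                          (normal shutdown)
#        stop_lam --force                  (wipe, using ~/machines)
#        stop_lam --force /path/to/nodes   (wipe, using the given node list)
```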







