Running GROMACS

From Bootable Cluster CD

GROMACS is a freely available molecular dynamics package. For more information, please see About GROMACS. This tutorial walks through using GROMACS to simulate a molecule. The system we simulate is small, fast and stable, but it doesn't scale particularly well beyond a few processes. This tutorial assumes that you have booted the BCCD on one or more machines.

Installing GROMACS

First we install GROMACS and the GMXBench molecules:

  • Log in to the bccd environment as root. If you have already logged in as bccd, become root by typing su - at the prompt, for example [bccd@host129]>su - (su is the "substitute user" command). You will be prompted for the root user's password, which should be letmein (or see the login splash screen for your image's root password).
  • Type list-packages at the prompt. From the list that appears, select gromacs and then OK. This is the GROMACS package.
  • Once that has finished, issue list-packages again. This time, select gmxbench and then OK. These are sample molecules often run as benchmarks when using GROMACS.
  • Type logout to return to the login prompt or, if you became root with su -, to the bccd user.

Running a Molecular System

Now we are ready to prepare and run a molecular system. Let's start by running on one processor.

Running on One Processor

  • Log in to the bccd environment as bccd. If you are prompted to run the BCCD heartbeat process, choose Yes. If you're currently running as the root user (your prompt will be light-blue and end in a hash, #), log out of the root account by typing logout or pressing ctrl-d, then log in as bccd.
  • Type bccd-allowall and choose Yes at the prompt.
  • Type bccd-snarfhosts to find all the other computers running the BCCD on the same network.
  • Type bccd-checkem machines and note how many hosts (other computers running the BCCD) you have access to.
  • The GMX Benchmarks contain four different simulations that can be run. To see them, move to the gmxbench directory with cd /usr/local/gmxbench/ and type ls. You should see four directories:
    • d.dppc - A phospholipid bilayer membrane, similar in structure to a cell membrane. The simulation consists of 1024 dipalmitoylphosphatidylcholine (DPPC) lipids in water.
    • d.lzm - The lysozyme enzyme, an antibacterial enzyme found in the human body.
    • d.poly-ch2 - A 6000-unit polyethylene molecule.
    • d.villin - The villin headpiece protein. The simulation starts from a chain of thirty-six amino acids (specifically, the sequence DEDFKAVFGMTRSAFANLPLWKQQNLKKEKGLFMLS) placed in water, and follows the protein as it folds. This is the simulation we will run.
  • Now create a directory to run the benchmark in with the command mkdir ~/gmxtmp
  • Next, copy over the files for the simulation we are going to run. Since we're simulating villin, copy the contents of the d.villin directory with the following command:
cp /usr/local/gmxbench/d.villin/* ~/gmxtmp

(If you want to use one of the other molecules, copy that directory's contents instead. For instance, to copy lysozyme, type cp /usr/local/gmxbench/d.lzm/* ~/gmxtmp) We copy the files into a directory of our own because GROMACS assumes by default that all of its input files are in the current working directory; for example, grompp looks for grompp.mdp, conf.gro and topol.top there.
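If grompp complains about missing files, a quick check can help. The Python sketch below is not part of the BCCD tools; it lists which of grompp's default input files are absent from a directory. It assumes the benchmark files use grompp's default names (grompp.mdp, conf.gro and topol.top), and the helper name missing_inputs is hypothetical.

```python
from pathlib import Path

# Assumed default input file names for grompp's -f, -c and -p options;
# adjust these if your benchmark directory uses other names.
GROMPP_DEFAULT_INPUTS = ("grompp.mdp", "conf.gro", "topol.top")


def missing_inputs(directory="."):
    """Return the default grompp input files that are absent from `directory`."""
    folder = Path(directory).expanduser()
    return [name for name in GROMPP_DEFAULT_INPUTS if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_inputs("~/gmxtmp")
    if missing:
        print("Missing from ~/gmxtmp:", ", ".join(missing))
    else:
        print("All default grompp inputs are present.")
```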

  • Move into that directory with the command cd ~/gmxtmp
  • Next we need to prepare the system to run GROMACS with the GROMACS Pre-Processor (grompp). We're going to run with one processor first. Issue the command
grompp -np 1 -shuffle -sort -v

The -shuffle argument tells grompp to distribute the molecules so that the workload is split evenly among the processors; this won't matter when we're using only one processor, but it helps when we use more than one. -sort is another multi-processor optimization: it sorts the molecules according to their coordinates before they are divided up. -np 1 tells grompp that we're going to run on one processor, and -v makes it print verbose progress messages. grompp writes its result to a run input file (topol.tpr by default), which mdrun reads in the next step.
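To picture what sorting and splitting the molecules means, here is a toy Python sketch. It is not GROMACS's actual algorithm; the function name sort_and_split and the (name, (x, y, z)) molecule format are made up for illustration. It sorts molecules by x coordinate and then gives each processor a contiguous block of nearly equal size, so each processor gets a similar amount of work and molecules that are close in space tend to land on the same processor.

```python
def sort_and_split(molecules, n_procs):
    """Toy illustration: sort molecules by x coordinate, then hand out
    contiguous, nearly equal-sized blocks, one per processor.

    `molecules` is a list of (name, (x, y, z)) tuples (a hypothetical format).
    """
    ordered = sorted(molecules, key=lambda mol: mol[1][0])  # sort on x
    base, extra = divmod(len(ordered), n_procs)
    blocks, start = [], 0
    for rank in range(n_procs):
        size = base + (1 if rank < extra else 0)  # spread the remainder
        blocks.append(ordered[start:start + size])
        start += size
    return blocks
```

For example, with five molecules and two processors, the first processor receives the three molecules with the smallest x values and the second receives the other two.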

  • Finally we're ready to run GROMACS itself. (Here, we're actually using MPI to run GROMACS. For more information on MPI, check out Compiling and Running.) Do this with the command
mpirun -np 1 -machinefile ~/machines /usr/local/bin/mdrun -v -c output.gro

Once you hit Enter, mdrun (the computational workhorse of GROMACS) begins running the molecular dynamics simulation. For each atom, a list of its nearby neighbors is made, and then the force that each neighbor exerts on the atom is computed. After all the forces are summed, the atom's new position and velocity are calculated. The force calculation and position update happen at every timestep! (The neighbor list is usually rebuilt less often, every few steps, to save time.)
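The loop described above can be sketched in a few lines of Python. This is a toy model in reduced units with made-up names (neighbor_list, lj_forces, leapfrog_step), not how mdrun is implemented: it uses a plain Lennard-Jones pair force, ignores periodic boundaries, electrostatics and bonded interactions, and rebuilds the neighbor list at every step for simplicity. It does use the leap-frog update, the scheme behind GROMACS's default md integrator.

```python
import math


def neighbor_list(positions, cutoff):
    """Pairs (i, j), i < j, closer than `cutoff` (no periodic boundaries)."""
    pairs = []
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if math.dist(positions[i], positions[j]) < cutoff:
                pairs.append((i, j))
    return pairs


def lj_forces(positions, pairs, epsilon=1.0, sigma=1.0):
    """Sum Lennard-Jones pair forces on each atom (equal and opposite per pair)."""
    forces = [[0.0, 0.0, 0.0] for _ in positions]
    for i, j in pairs:
        d = [positions[i][k] - positions[j][k] for k in range(3)]
        r2 = sum(c * c for c in d)
        sr6 = (sigma * sigma / r2) ** 3
        # (-dU/dr) / r; multiplying by each component of d gives the force on i
        f_over_r = 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r2
        for k in range(3):
            forces[i][k] += f_over_r * d[k]
            forces[j][k] -= f_over_r * d[k]
    return forces


def leapfrog_step(positions, velocities, masses, dt, cutoff):
    """One toy MD step: neighbor list, summed forces, then leap-frog update."""
    pairs = neighbor_list(positions, cutoff)
    forces = lj_forces(positions, pairs)
    # v(t + dt/2) = v(t - dt/2) + F/m * dt
    new_v = [[v[k] + f[k] / m * dt for k in range(3)]
             for v, f, m in zip(velocities, forces, masses)]
    # x(t + dt) = x(t) + v(t + dt/2) * dt
    new_x = [[x[k] + v[k] * dt for k in range(3)]
             for x, v in zip(positions, new_v)]
    return new_x, new_v
```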

Benchmarking Results

At the end of mdrun's output you will notice a number of run-time statistics, which can be used for benchmarking. Note the values for Real (s) (the wall-clock time in seconds, i.e., the total time the run took) and ps/NODE hour (picoseconds of simulated time per node hour).

Output File Results

The final state of the system is also stored in the file we specified when running mdrun: output.gro. This file contains the final positions and velocities of all of the atoms. To view it in scrollable form, use cat output.gro | more (to quit, type the letter q).

This file lists every atom in the system, including the surrounding water, so it is quite long! For instance, the amino acid sequence for villin (DEDFKAVFGMTRSAFANLPLWKQQNLKKEKGLFMLS) contains three A's, which stand for alanine, and a free alanine molecule alone has three carbon atoms, seven hydrogen atoms, one nitrogen atom and two oxygen atoms (C3H7NO2)!
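As a quick check of the counts above, this short Python snippet counts the residues and alanines in the villin sequence and the atoms in a free alanine molecule.

```python
from collections import Counter

VILLIN = "DEDFKAVFGMTRSAFANLPLWKQQNLKKEKGLFMLS"  # sequence quoted above

composition = Counter(VILLIN)
print(len(VILLIN))       # number of residues: 36
print(composition["A"])  # number of alanines: 3

# Atoms in one free alanine molecule, C3H7NO2
alanine_atoms = {"C": 3, "H": 7, "N": 1, "O": 2}
print(sum(alanine_atoms.values()))  # 13
```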

We can view just the atoms making up the alanine residues in our output file with the command

cat output.gro | grep ALA

This searches our output file for any lines that contain 'ALA'. The three-letter code for any other amino acid could be used instead or, to search for a specific kind of atom, that atom's name. For instance, to search for hydrogen atoms, we would use the command cat output.gro | grep ' H ' (We need the spaces because we're only interested in lines that have an 'H' by itself; we don't want lines with the residue name PHE just because it contains an H.) Keep in mind that this only finds atoms named exactly H; hydrogens with longer atom names will not match.
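grep matches text anywhere on a line, so a search can pick up unintended lines. Because a .gro file uses fixed-width columns, a more precise approach is to compare the residue-name and atom-name fields directly. The Python sketch below (select_atoms is a hypothetical name) assumes the standard .gro layout: a title line, a line with the atom count, one line per atom with the residue name in characters 6-10 and the atom name in characters 11-15, and a final box line.

```python
import os


def select_atoms(gro_lines, resname=None, atomname=None):
    """Yield the atom lines of a .gro file whose name fields match exactly.

    gro_lines: all lines of the file (title, atom count, atoms, box).
    resname/atomname: exact names to match, or None to accept any.
    """
    n_atoms = int(gro_lines[1])  # second line holds the number of atoms
    for line in gro_lines[2:2 + n_atoms]:
        if resname is not None and line[5:10].strip() != resname:
            continue  # residue name is characters 6-10
        if atomname is not None and line[10:15].strip() != atomname:
            continue  # atom name is characters 11-15
        yield line


if __name__ == "__main__":
    if os.path.exists("output.gro"):
        with open("output.gro") as handle:
            for atom_line in select_atoms(handle.read().splitlines(), resname="ALA"):
                print(atom_line)
```

Passing atomname="H" instead of resname="ALA" gives the column-exact version of the grep ' H ' search.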

Your results for the alanine residues should look something like this:

   9ALA      N   87   3.492   2.996   1.618  0.1574  0.2410 -0.0154
   9ALA      H   88   3.519   2.922   1.679  2.0612  0.8963  0.0172
   9ALA     CA   89   3.614   3.063   1.563 -0.6239  0.2846  0.0653
   9ALA     CB   90   3.744   3.018   1.632 -0.3756  0.0682  0.1689
   9ALA      C   91   3.627   3.036   1.412  0.0235 -0.4013  0.1374
   9ALA      O   92   3.560   2.952   1.355 -0.9055 -0.2922  0.6419
  17ALA      N  166   3.249   2.174   1.947  0.8748 -0.1217 -0.4455
  17ALA      H  167   3.163   2.174   1.895  1.5885  1.3499 -1.6779
  17ALA     CA  168   3.262   2.077   2.058 -0.1161  0.1543  0.2606
  17ALA     CB  169   3.269   2.138   2.195 -0.2276  0.7956 -0.3209
  17ALA      C  170   3.139   1.988   2.070  0.5609  0.6531  0.0107
  17ALA      O  171   3.028   2.033   2.034 -0.2431  0.5592 -0.3315
  19ALA      N  189   3.013   1.812   2.337  0.1137  0.0036  0.4017
  19ALA      H  190   3.111   1.813   2.357  0.5014  0.0426 -1.3815
  19ALA     CA  191   2.922   1.861   2.448  0.5072  0.0745  0.3275
  19ALA     CB  192   2.980   1.825   2.580 -0.3012  0.7028  0.5064
  19ALA      C  193   2.908   2.016   2.438 -0.0876 -0.0857  0.1006
  19ALA      O  194   3.007   2.087   2.435 -0.1408 -0.5455  0.2537

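Each line represents one atom. The first six lines belong to an alanine that happens to be the ninth group of atoms (called a residue) in this file; the second and third alanines are the seventeenth and nineteenth residues, respectively. The first column combines the residue number and name (for example, 9ALA). The second column gives the atom name: on the first line it is N, for nitrogen, and on the second it is H, for hydrogen (CA and CB are the alpha and beta carbons, and C and O are the backbone carbonyl carbon and oxygen). The third column is the unique identifying number for that particular atom. The remaining numbers are positions and velocities: the first three are the x, y, and z positions (in nanometers), and the last three are the x, y, and z velocities (in nanometers per picosecond) of that atom. Only six atoms are listed per alanine rather than the thirteen of a free alanine molecule, because within the protein chain each alanine has lost the atoms of a water molecule to the peptide bonds linking it to its neighbors, and because this force field does not list the hydrogens bonded to carbon separately (a so-called united-atom representation).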
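The column layout just described can be turned into a small parser. This Python sketch (GroAtom, parse_gro_atom and speed are hypothetical names) slices one atom line using the fixed .gro column widths: five characters each for residue number, residue name, atom name and atom number, then eight characters for each of the six position and velocity values. It assumes the line includes velocities, as output.gro does here.

```python
import math
from typing import NamedTuple


class GroAtom(NamedTuple):
    resnr: int
    resname: str
    atomname: str
    atomnr: int
    position: tuple  # (x, y, z) in nm
    velocity: tuple  # (vx, vy, vz) in nm/ps


def parse_gro_atom(line):
    """Split one fixed-width .gro atom line (with velocities) into its fields."""
    # Six 8-character numeric fields start at character 21 (index 20).
    values = [float(line[i:i + 8]) for i in range(20, 68, 8)]
    return GroAtom(
        resnr=int(line[0:5]),
        resname=line[5:10].strip(),
        atomname=line[10:15].strip(),
        atomnr=int(line[15:20]),
        position=tuple(values[0:3]),
        velocity=tuple(values[3:6]),
    )


def speed(atom):
    """Magnitude of the atom's velocity, in nm/ps."""
    return math.sqrt(sum(v * v for v in atom.velocity))
```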

Two Processors

Now let's run villin on two processors. (This assumes you have access to at least two; see the bccd-checkem machines step above.)

  • Make sure you're still in the directory you created with the command cd ~/gmxtmp
  • Run the GROMACS pre-processor again. Here the 2 indicates that we're going to run on two processors.
grompp -np 2 -shuffle -sort -v
  • This time, because we're going to run across multiple machines, we need to copy the files that grompp produced to all of the computers. To do this, first move into your home directory with cd ~ and then run the following command:
bccd-syncdir ~/gmxtmp machines

This will tell you the specific name of the temporary directory. You'll need this for the next step!

  • Move into the newly created directory: cd /tmp/<your directory>
  • Now we're ready to run GROMACS again. The -np 2 specifies that we're using two processors this time.
mpirun -np 2 -machinefile ~/machines /usr/local/bin/mdrun -v -c output.gro

Benchmarking Results

Again, note the values for Real (s) and ps/NODE hour. How do they compare to the one-processor values? We would expect a speedup with two processors working on the problem, unless the time spent communicating between the two processors outweighs the time saved by sharing the computation. Did it run in half the time? Did the rate (ps/NODE hour) double?

Repeat the above for additional processors, record the data, and graph the results.
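Once you have recorded Real (s) for several processor counts, two standard numbers summarize the scaling: speedup S(p) = T(1)/T(p) and parallel efficiency E(p) = S(p)/p, where T(p) is the wall time on p processors. This Python sketch (scaling_table is a hypothetical name) computes both from your own measurements, passed on the command line as processors=seconds pairs; the printed columns can then be pasted into a spreadsheet or plotting tool for graphing.

```python
import sys


def scaling_table(real_seconds):
    """Given {processors: Real (s)}, return rows of (p, time, speedup, efficiency).

    speedup    S(p) = T(1) / T(p)
    efficiency E(p) = S(p) / p
    """
    t1 = real_seconds[1]  # the one-processor run is the baseline
    rows = []
    for p in sorted(real_seconds):
        speedup = t1 / real_seconds[p]
        rows.append((p, real_seconds[p], speedup, speedup / p))
    return rows


if __name__ == "__main__":
    # Usage: python scaling.py 1=<seconds> 2=<seconds> ...
    # with the Real (s) values you recorded from mdrun.
    times = {}
    for arg in sys.argv[1:]:
        procs, sep, seconds = arg.partition("=")
        if sep:
            times[int(procs)] = float(seconds)
    if 1 in times:
        print(f"{'procs':>5} {'Real (s)':>10} {'speedup':>8} {'effic.':>7}")
        for p, t, s, e in scaling_table(times):
            print(f"{p:>5} {t:>10.1f} {s:>8.2f} {e:>7.2f}")
```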

Other molecular systems are likely to exhibit different characteristics with respect to speedup and efficiency. This has to do with how well a particular simulation can be split up to run on separate processors, as well as how large it is and how long it takes to run.
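One simple way to reason about how well a simulation can be split up is Amdahl's law, a general model that is not specific to GROMACS: if a fraction f of the work can be parallelized, the best possible speedup on p processors is S(p) = 1/((1 - f) + f/p). The Python sketch below evaluates it for a few values of f; because the model ignores communication overhead, real MD runs typically fall short of these numbers.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Ideal speedup when only `parallel_fraction` of the work can be split.

    Amdahl's law: S(p) = 1 / ((1 - f) + f / p). Communication cost is ignored.
    """
    f = parallel_fraction
    return 1.0 / ((1.0 - f) + f / processors)


for f in (0.5, 0.9, 0.99):
    print(f, [round(amdahl_speedup(f, p), 2) for p in (1, 2, 4, 8)])
```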
